2. Recall that we are representing Boolean values by 0 and 1, and not by 1 and −1.
3. The case $d = 1$ has to be handled separately. In that case, it is easy to see that 1 is closed under complementation but not under union or intersection; 1 is closed under union but not under complementation or intersection; and 1 is closed under intersection but not under complementation or union. As for AC […]

1 Introduction
The computing power of small-depth circuits has been extensively studied in the past fifteen years. In particular, several complexity classes defined by polynomial-size constant-depth circuits constructed with various types of unbounded fan-in symmetric gates (i.e., gates whose output value is determined by the sum of the inputs) have been considered. A primary example of such a class is $\mathrm{AC}^0$, which is obtained by allowing AND and OR gates. The seminal result that parity is not computable in $\mathrm{AC}^0$ was first obtained by Furst, Saxe and Sipser (1984) and Ajtai (1983); this lower bound was subsequently improved by Yao (1985) and Håstad (1986), and generalized by Razborov (1987), Barrington (1986) and Smolensky (1987). Furthermore, by combining work of Fagin et al. (1985) and Yao (1985), we now have a complete characterization of the symmetric functions realizable in $\mathrm{AC}^0$: they are (essentially) the threshold functions

$$T_t(x_1, \ldots, x_n) = \begin{cases} 1 & \text{if } \sum_{i=1}^{n} x_i \ge t, \\ 0 & \text{otherwise,} \end{cases}$$

where the parameter $t$ is polylogarithmic in $n$.
Another such class is $\mathrm{TC}^0$, which consists of polynomial-size constant-depth circuits constructed with arbitrary threshold gates. It is well known that the class remains the same if only majority gates are allowed. On the other hand, threshold circuits can compute any symmetric function, hence arbitrary symmetric gates can be used. By the previous remark, $\mathrm{AC}^0 \subsetneq \mathrm{TC}^0$, but otherwise this class is poorly understood. Indeed, threshold circuits have been found to be surprisingly powerful. For example, Beame et al. (1986) gave algorithms to compute powering (given $x$, return $x^2, \ldots, x^n$), iterated multiplication (given $x_1, \ldots, x_n$, return the product $x_1 \cdots x_n$) and division (given $x, y$, return $\lfloor x/y \rfloor$) that are easily checked to be in $\mathrm{TC}^0$, as was observed by Reif (1987), Hajnal et al. (1987) and Immerman and Landau (1989). In fact, it is not known whether all of NP is contained in $\mathrm{TC}^0$. Usually, circuits in $\mathrm{TC}^0$ are classified according to their total depth, but in our work, we will adopt a different perspective. Considering that arbitrary threshold gates are much more powerful than AND and OR gates, we propose a parametrization of TC interconnections. This compares with the best known circuit, in terms of total depth, which is due to Siu and Roychowdhury (1994) and uses four levels of majority gates but no other. In contrast, we will produce a circuit in d TC TC 0 3 with only two levels of AND-OR gates. Thus we save one level of majority gates compared to the circuit of Siu and Roychowdhury (1994), at the expense of increasing the total depth by one. Similar results are obtained for powering and division.
Our paper is organized as follows: in Section 2, we summarize the necessary background and prove some results about various elementary classes of functions. In particular, we develop a simple symbolic calculus for composing bounded-depth circuit classes: this notation allows for a succinct and elegant presentation of our results.
The next section presents a slightly modified version of a known method to solve the iterated addition problem, which can be implemented with only one level of threshold gates (instead of two) at the cost of increasing the total depth by two; this algorithm will be central to the design of further algorithms. Then we give a general presentation of Chinese remaindering and show how its implementation can be used to construct efficient threshold circuits for arithmetic problems.
We are then ready to describe our solutions to the problems of iterated multiplication, powering and division. The ideas presented earlier lead to circuits having one less level of threshold gates than the best known ones, at the cost of increasing the total depth by only one.
Finally, we briefly discuss the hierarchy d TC TC, its relationship to the usual depth hierarchy for $\mathrm{TC}^0$, and the question of lower bounds.
Preliminaries
In this section, we provide basic definitions and some background on Boolean circuits. We also investigate exactly how $\mathrm{TC}^0$ circuits can compute symmetric functions. In the process, we introduce some of the central ideas of this article and establish preliminary results that will be needed later.
Boolean functions, circuits and circuit complexity classes
In this article, Boolean functions are functions, either total or partial, from $\{0,1\}^*$ to $\{0,1\}^*$. In particular, the domain of a Boolean function may be any subset of $\{0,1\}^*$. Boolean functions will always have the property that they map inputs of the same length to outputs of the same length. This implies that every Boolean function can be written as a sequence $f_1, f_2, \ldots$, where $f_k$ is a function from $\{0,1\}^k$ to $\{0,1\}^l$ and $l$ is some function of $k$. Such a sequence is called a family of functions.
Examples of Boolean functions that will be considered in this article are the functions $\mathrm{MOD}_m$ and $T_t$ defined by $\mathrm{MOD}_m(x_1, \ldots, x_n) = 1$ iff $m$ divides $\sum_{i=1}^n x_i$, and $T_t(x_1, \ldots, x_n) = 1$ iff $\sum_{i=1}^n x_i \ge t$. The parameters $m$ and $t$ are functions of $n$ that can, of course, be constant. The $T_t$ are called threshold functions, and $\mathrm{MOD}_2$ and $T_{n/2}$ are also called parity and majority.
A Boolean circuit with $n$ inputs and $m$ outputs naturally defines a total function from $\{0,1\}^n$ to $\{0,1\}^m$, if we assume that the output gates are numbered $1, 2, \ldots, m$. Note that input nodes can be labeled by variables, negated variables, or the constants 0 and 1. Since we will often consider partial functions, we say that $f$ is computed by a circuit $C$ if $f$ is the total function defined by $C$ or one of its restrictions. In other words, $C$ computes $f$ if $C$ outputs the value of $f$ for every input in the domain of $f$. As usual, to compute functions on inputs of arbitrary length, we consider families of circuits, that is, sequences $C_1, C_2, C_3, \ldots$, where $C_n$ is a circuit with $n$ inputs. We will mainly consider classes of functions defined by constant-depth polynomial-size circuits with gates of unbounded fan-in, where the size of a circuit is defined to be the number of edges it contains.
The simplest such class is $\mathrm{AC}^0$, where only NOT, AND and OR gates are allowed. $\mathrm{AC}^0$ circuits cannot compute parity, $\mathrm{MOD}_m$ for any constant $m \ge 2$, or majority (Ajtai 1983, Furst et al. 1984, Håstad 1986, Yao 1985). Moreover, $T_t \in \mathrm{AC}^0$ if and only if $t \in (\log n)^{O(1)}$ (see Ajtai and Ben-Or 1984, Denenberg et al. 1986 or Fagin et al. 1985 for the upper bound; for the lower bound, combine a result of Fagin et al. 1985 with Håstad 1986 or Yao 1985).
By adding majority gates to $\mathrm{AC}^0$ circuits, we get the class $\mathrm{TC}^0$. It is not hard to show that $\mathrm{TC}^0$ contains all the $\mathrm{MOD}_m$ functions, for $m$ constant (Furst et al. 1984, Hajnal et al. 1987). In fact, $\mathrm{TC}^0$ contains all the symmetric functions (Hajnal et al. 1987), i.e., functions whose value is determined by the sum of the inputs.
Both these classes are contained in the well-known circuit complexity class $\mathrm{NC}^1$. For all of these constant-depth circuit classes, we define depth hierarchies by considering circuits of fixed constant depth. Thus, NC […] We will sometimes use the more descriptive notation AND, OR and MAJ for the classes $\Pi_1$, $\Sigma_1$ and $\mathrm{TC}^0_1$. Also, $\mathrm{AND}_k$ and $\mathrm{OR}_k$ will denote AND and OR, but with gates of fan-in bounded by $k$. Let $\mathrm{AND}_{O(f)}$ […] (Sipser 1983) […] (Sipser 1983, Håstad 1986) […] (Hajnal et al. 1987) […]

We can generalize threshold functions by allowing integer weights $w_1, \ldots, w_n$ to be put on the terms of the sum and outputting 1 iff $\sum_{i=1}^n w_i x_i \ge t$. The weights become parameters in terms of which the function is defined, just like the threshold $t$. We call these functions weighted threshold functions.
A weighted threshold gate is a gate labeled with a weighted threshold function, and a weighted threshold circuit is a circuit consisting entirely of weighted threshold gates. The weight of a circuit is the maximum absolute value of any weight occurring in its gates. A family of weighted threshold circuits has small weight if its weight is bounded by some polynomial in n.
Small-weight threshold circuits provide another equivalent definition of $\mathrm{TC}^0$. First, consider a single weighted threshold gate. By feeding several copies of the same input, we can reduce the weights to 1 or −1. Negative weights are then simulated by using negations. Second, given a depth-$d$ small-weight polynomial-size threshold circuit, starting at level $d$, simulate threshold gates with majority gates and combine the NOT gates obtained with the threshold gates from previous levels. It can be verified that the resulting circuit has depth $d$ and polynomial size.
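The first step of this argument, replacing one small-weight threshold gate by one majority gate, can be sketched in Python. This is a minimal illustration with hypothetical function names: each input is copied $|w_i|$ times, negated when $w_i < 0$ (using $w x = |w|(1 - x) - |w|$), and constant 0 and 1 inputs then pad the gate so that its threshold is exactly half of its fan-in.

```python
from itertools import product

def weighted_threshold(weights, t, x):
    """1 iff sum_i w_i * x_i >= t, for integer weights and 0/1 inputs."""
    return int(sum(w * xi for w, xi in zip(weights, x)) >= t)

def majority(bits):
    """Majority gate T_{n/2}: 1 iff at least half of the inputs are 1."""
    return int(2 * sum(bits) >= len(bits))

def threshold_as_majority(weights, t, x):
    """Simulate one small-weight threshold gate by a single majority gate."""
    bits = []
    for w, xi in zip(weights, x):
        # |w| copies of x_i, or of NOT x_i when w < 0
        bits += [xi if w > 0 else 1 - xi] * abs(w)
    # With unit weights, the condition becomes sum(bits) >= t2.
    t2 = t + sum(-w for w in weights if w < 0)
    k = len(bits)
    # Pad with constant ones/zeros so that "at least half" means "at least t2 of the k".
    ones, zeros = (0, 2 * t2 - k) if 2 * t2 >= k else (k - 2 * t2, 0)
    return majority(bits + [1] * ones + [0] * zeros)

def agrees_everywhere(n, max_weight=2):
    """Exhaustively compare both gates for n inputs and small weights and thresholds."""
    bound = n * max_weight
    return all(
        threshold_as_majority(w, t, x) == weighted_threshold(w, t, x)
        for w in product(range(-max_weight, max_weight + 1), repeat=n)
        for t in range(-bound - 1, bound + 2)
        for x in product((0, 1), repeat=n)
    )
```

The fan-in of the resulting majority gate is about $\sum_i |w_i|$ plus the padding, which stays polynomial precisely when the weights (and, without loss of generality, the threshold) are polynomially bounded.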
To summarize, we have the following well-known result: […]

Proof Suppose that $f(x_1, \ldots, x_n) = h(\sum_{i=1}^n x_i)$, where $h : \mathbb{Z} \to \{0,1\}$, and suppose that $h^{-1}(1) \cap \{0, \ldots, n\} = \{v_1, \ldots, v_k\}$ with $0 \le v_1 < \cdots < v_k \le n$. For $j = 1, \ldots, k$, […]

Moreover, if the weights are all nonnegative or all nonpositive, then no negations are needed in the circuit. In particular, SYM contains all functions of a weighted sum of the inputs, provided the weights are small, i.e., bounded by a polynomial in $n$.
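The starting point of this proof, writing a symmetric function as an OR over the values $v_j$, can be checked by brute force. The definitions of $G_j$ and $L_j$ are not given in this passage; the sketch below assumes the natural choice $G_j(x) = 1$ iff $\sum_i x_i \ge v_j$ and $L_j(x) = 1$ iff $\sum_i x_i \le v_j$, the latter written as a threshold gate on the negated inputs. Function names are ours.

```python
from itertools import product

def T(t, bits):
    """Threshold gate T_t: 1 iff at least t of the inputs are 1."""
    return int(sum(bits) >= t)

def symmetric_via_thresholds(h, x):
    """Evaluate f(x) = h(sum x) as OR_j (G_j(x) AND L_j(x)) over v_j in h^{-1}(1)."""
    n = len(x)
    negated = [1 - xi for xi in x]
    values = [v for v in range(n + 1) if h(v)]          # v_1 < ... < v_k
    # Assumed: G_j = T_{v_j}(x); L_j = T_{n - v_j}(negated x), i.e. sum x <= v_j.
    return int(any(T(v, x) and T(n - v, negated) for v in values))

def agrees(h, n):
    """Compare with h(sum x) on every 0/1 input of length n."""
    return all(
        symmetric_via_thresholds(h, x) == int(h(sum(x)))
        for x in product((0, 1), repeat=n)
    )
```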
We will often use symmetric gates as an intermediate step in constructing threshold circuits. In doing this, we will frequently make use of the closure properties of SYM. This will allow symmetric gates to be combined with certain types of gates into which they feed.
The class SYM is obviously closed under complementation, since the negation of a symmetric gate can be computed by a symmetric gate. Less obviously, SYM is also closed under finite intersections and unions. In fact, SYM is closed under finite Boolean functions. The following theorem slightly generalizes results of Hofmeister et al. (1991) and Beigel (1994). […] In particular, if $c$ is constant, then the fan-in of the resulting gate is polynomial in $m$.
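One standard way to see closure under intersection is to merge two symmetric gates into one by repeating the inputs of the second gate more times than the first gate has inputs, so that a single sum determines both original sums. The sketch below illustrates that idea under our own naming; it is not necessarily the construction used by Hofmeister et al. (1991) or Beigel (1994).

```python
from itertools import product

def merged_symmetric_gate(h1, h2, x, y):
    """Compute h1(sum x) AND h2(sum y) from the sum of a single list of inputs.

    Each y-input is fed `base` times, where base exceeds the largest possible
    value of sum(x); the merged gate sees s = sum(x) + base * sum(y).
    """
    base = len(x) + 1
    fed = list(x) + list(y) * base        # fan-in len(x) + base * len(y)
    s = sum(fed)                          # the only quantity the merged gate uses
    s1, s2 = s % base, s // base          # both original sums are recoverable
    return int(h1(s1) and h2(s2))

def agrees(h1, h2, nx, ny):
    """Compare with the two separate gates on all inputs of lengths nx and ny."""
    return all(
        merged_symmetric_gate(h1, h2, x, y) == int(h1(sum(x)) and h2(sum(y)))
        for x in product((0, 1), repeat=nx)
        for y in product((0, 1), repeat=ny)
    )
```

Since both sums can be read off $s$, any Boolean combination of the two gate values (not only AND) is a function of $s$, and the fan-in remains polynomial.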
Combining classes of functions
Circuits are often designed in two or more stages, each using a different type of gate. In order to refer to the class of functions computed by such circuits in a concise way, we will make extensive use of the following notation:
Definition 2.7 If $\Gamma_1$ and $\Gamma_2$ are classes of Boolean functions, then let $\Gamma_1 \circ \Gamma_2$ denote the class of functions $f$ of the form $f(x) = f_1(f_2(x))$, where $f_1 \in \Gamma_1$ and $f_2 \in \Gamma_2$.

This definition is made exclusively in terms of the composition of functions; in particular, no reference is made to Boolean circuits. Nevertheless, if $\Gamma_1$ and $\Gamma_2$ are circuit complexity classes, then $\Gamma_1 \circ \Gamma_2$ is the class of functions that can be computed by circuits consisting of two stages: the first one computes a function in $\Gamma_2$, the second one a function in $\Gamma_1$. More precisely, if $g \in \Gamma_1$ is computed by a family of circuits $D_1, D_2, \ldots$ and $h \in \Gamma_2$ is computed by $E_1, E_2, \ldots$, then $f(x) = g(h(x))$ is computed by $C_1, C_2, \ldots$, where $C_n$ is $D_m(E_n)$, $m$ being the output length of $E_n$.
For example, if $\Gamma_1 = \mathrm{AC}^0_2$ and $\Gamma_2 = \mathrm{TC}^0_3$, then every function in $\Gamma_1 \circ \Gamma_2$ can be computed by a family of circuits of depth six with majority gates on levels one, two and three, NOT gates on level four, and AND-OR gates on levels five and six. (The level of NOT gates is needed since the inputs of the $\mathrm{AC}^0_2$ circuit may be negated.) It is easy to see that the total size of the circuit is polynomial in $n$. Moreover, since $\mathrm{TC}^0$ […]

Conversely, if $f$ is computed by a family of circuits in two stages, then $f$ can be put in a certain class $\Gamma_1 \circ \Gamma_2$. However, some care must be taken. Consider the sequence of functions $g_1, g_2, \ldots$, where $g_n$ is the function computed by the second stage of the $n$th circuit. If $m$, the output length of the first stage, is not an injective function of $n$, it might not be possible to define $g$ such that all the $g_n$ are restrictions of $g$ to inputs of a certain length.
The obvious solution to this technical problem is to “pad” the output of the first stage so that $m$ is strictly increasing. Then $g$ can be defined and $f(x) = g(h(x))$, where $h$ is not the function computed by the first stage of the family of circuits, but its padded version.
Another reason to pad the output of the first stage is so that the complexity of the second stage is measured relative to $n$ and not relative to $m$, which might be much smaller.
To illustrate all this, suppose that $f$ is computed by a family of depth-five polynomial-size circuits with a first stage consisting of three levels of majority gates and a second stage consisting of two levels of AND-OR gates. Notice that even if the second stage of the family of circuits does correctly define a function $g$, this function might be the parity function, if $m$ is equal to $\log n$, for example: a depth-two circuit of size polynomial in $n$ can compute parity on $\log n$ inputs. Since parity is not in $\mathrm{AC}^0$, we would not be able to conclude that $f$ is in $\mathrm{AC}^0_2 \circ \mathrm{TC}^0_3$. In general, whenever the circuit class corresponding to the first stage is closed under polynomial-size padding, we can assume that $m$ is strictly increasing. All the classes considered in this article have this property, and this “padding trick” will be used often.
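The role of padding can be illustrated with a toy Python sketch in which everything is hypothetical: the first stage has output length $m(n) = \lceil \log_2 n \rceil$, which is not injective, and the second-stage functions $g_n$ depend on $n$. Without padding, no single $g$ restricts to every $g_n$; padding the intermediate output to length $n$ makes $g$ well defined, because the input length now identifies $n$.

```python
import math

def m_of(n):
    """Output length of a hypothetical first stage; not injective in n."""
    return max(1, math.ceil(math.log2(n)))

def g_n(n, y):
    """Hypothetical second stage on m(n) bits: parity of y, flipped when n is odd."""
    return (sum(y) + n) % 2

def no_common_g():
    """n = 5 and n = 6 share m = 3, yet g_5 and g_6 disagree on the same input."""
    y = [0, 0, 0]
    return m_of(5) == m_of(6) and g_n(5, y) != g_n(6, y)

def pad(y, n):
    """Pad the first-stage output to length N(n) = n, which is injective in n."""
    return list(y) + [0] * (n - len(y))

def g(z):
    """A single well-defined second stage: the input length N = n identifies g_n."""
    n = len(z)
    return g_n(n, z[:m_of(n)])
```

Choosing $N(n) = n$ also means that the complexity of $g$ is measured in terms of $n$ rather than $m(n)$, which is the second reason for padding given above.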
Polynomials of constant degree
Polynomials are a common and useful way of representing Boolean functions. For example, many results on constant-depth circuits have been obtained by considering various kinds of polynomial representations of Boolean functions. (For more on this, see Beigel 1993.) Simple examples of polynomial representations are $1 - x_1 = \mathrm{NOT}(x_1)$, $x_1 \cdots x_n = \mathrm{AND}(x_1, \ldots, x_n)$ and $1 - (1 - x_1) \cdots (1 - x_n) = \mathrm{OR}(x_1, \ldots, x_n)$.
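These identities are easy to confirm exhaustively for small $n$; the sketch below evaluates each polynomial over the integers on 0/1 inputs. It relies on the 0/1 encoding of Boolean values used throughout; with a ±1 encoding the polynomials would be different.

```python
from itertools import product
from math import prod

def poly_not(x1):
    return 1 - x1

def poly_and(xs):
    return prod(xs)                          # x_1 * ... * x_n

def poly_or(xs):
    return 1 - prod(1 - x for x in xs)       # 1 - (1 - x_1) * ... * (1 - x_n)

def check(n):
    """Compare each polynomial with the Boolean gate on every 0/1 input of length n."""
    return all(
        poly_and(xs) == int(all(xs)) and poly_or(xs) == int(any(xs))
        for xs in product((0, 1), repeat=n)
    ) and [poly_not(0), poly_not(1)] == [1, 0]
```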
We will be particularly interested in polynomials with small integer coefficients and constant degree.
Definition 2.8 Let POL denote the class of Boolean functions whose output length is bounded by a polynomial in $n$ and such that each output can be written as a polynomial in the input variables with integer coefficients bounded by a polynomial in $n$ and degree bounded by a constant. Let SUM be the subclass of POL corresponding to polynomials of degree one.
[…] the $i$th bit of $H(x)$. It is not hard to see that this implies the result. □
This simple observation will be used frequently in this article, since it will allow circuits with symmetric gates to be simulated efficiently by making use of the following result:
Proposition 2.10 $\mathrm{SYM} \circ \mathrm{SUM} = \mathrm{SYM}$. In addition, $\mathrm{MAJ} \circ \mathrm{SUM} = \mathrm{MAJ}$ and […]
The idea of the proof is simply to feed the inputs of the sums directly into the symmetric gate, each input repeated according to its (small) coefficient.
Using AND-OR gates
The approach taken in this article when investigating the complexity of functions in terms of threshold circuits is to minimize first the depth as measured by threshold gates only, and then the total depth. As a consequence, we will be interested in showing that the functions computed by certain stages in our circuits belong to AC […]

Proof Suppose that $k$ is even. Then, any k circuit can be transformed so that level one contains only OR gates and so that all inputs feed into these level-one OR gates. Whenever an output of a d function is a positive input of a k circuit (i.e., $x_i$), compute it with a d circuit; whenever it is a negative input (i.e., $\overline{x_i}$), compute it with a d circuit. By using $\mathrm{NOT}(\mathrm{AND}(x_1, \ldots, x_n)) = \mathrm{OR}(\overline{x_1}, \ldots, \overline{x_n})$ and by combining together OR gates, we get a depth-$(k + d - 1)$ circuit. It can also be verified that the resulting circuit is of polynomial size. We therefore have a $(k + d - 1)$ circuit.
The […]

$$\sum_{i=1}^{n} x_i = v_j \text{ for some } j \iff L_j(x) = 1 \text{ and } G_j(x) = 1 \text{ for some } j.$$
MAJ. Now consider the negations of the symmetric gates. These can also be computed by symmetric gates and thus with circuits similar to the above. By using the fact that $\mathrm{NOT}(\mathrm{OR}(x_1, \ldots, x_n)) = \mathrm{AND}(\overline{x_1}, \ldots, \overline{x_n})$, we get a 1 NC […]

Proposition 2.16 Suppose $f(x) = 1$ iff $t_1 \le \sum_{i=1}^n x_i \le t_2$. Then $f$ can be computed by a depth-two circuit with a majority gate of fan-in $6n^2$ at the output and OR gates of fan-in two at level one.
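Proof The result is trivial if $t_1 > n$ or $t_2 < 0$ (and also if $t_1 > t_2$). In addition, without loss of generality, we can assume that $t_1 \ge 0$ and that $t_2 \le n$. Now observe that

$$t_1 \le \sum_{i=1}^{n} x_i \le t_2 \iff \sum_{i=1}^{n} (t_1 + t_2)\, x_i + \sum_{i=1}^{n} \sum_{j=1}^{n} \overline{x_i x_j} \ge n^2 + t_1 t_2.$$

Indeed, writing $s = \sum_{i=1}^n x_i$, we have $\sum_{i,j} \overline{x_i x_j} = n^2 - s^2$, so the right-hand side is equivalent to $(s - t_1)(s - t_2) \le 0$.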
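The equivalence used in this proof can be checked exhaustively for small $n$. The sketch below (function names are ours) forms the weighted sum over the $x_i$ and the $n^2$ level-one gates $\mathrm{OR}(\overline{x_i}, \overline{x_j})$, including the pairs with $i = j$, and compares the threshold test with the interval test, assuming $0 \le t_1 \le t_2 \le n$.

```python
from itertools import product

def in_interval(x, t1, t2):
    return int(t1 <= sum(x) <= t2)

def via_or_gates(x, t1, t2):
    """One weighted threshold gate over the x_i and the gates OR(not x_i, not x_j)."""
    n = len(x)
    weighted = (t1 + t2) * sum(x)
    # n^2 ordered pairs (i, j), i == j included; each OR(not x_i, not x_j) = NOT(x_i x_j)
    ors = sum(int((1 - xi) or (1 - xj)) for xi in x for xj in x)
    return int(weighted + ors >= n * n + t1 * t2)

def check(n):
    return all(
        via_or_gates(x, t1, t2) == in_interval(x, t1, t2)
        for t1 in range(n + 1)
        for t2 in range(t1, n + 1)
        for x in product((0, 1), repeat=n)
    )
```

All weights here are nonnegative, and the total weight is at most $(t_1 + t_2)n + n^2 \le 3n^2$, which is consistent with the fan-in bound of $6n^2$ once the threshold gate is converted to a majority gate.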
Therefore, $t_1 \le \sum_{i=1}^n x_i \le t_2$ can be determined by a weighted threshold gate whose inputs are the $x_i$ and the $\overline{x_i x_j}$. As indicated in Section 2.2, such a threshold gate can be simulated by a majority gate. The result follows since $\overline{x_i x_j} = \mathrm{OR}(\overline{x_i}, \overline{x_j})$. □

In this section, we construct constant-depth polynomial-size threshold circuits for addition, iterated addition and multiplication. These circuits are all constructed with the objective of minimizing first the number of levels of threshold gates and then the total depth. We provide comparisons between our results and other results that try to minimize the total depth only. The iterated addition circuit will be an important building block of our iterated multiplication and division circuits of Sections 5.1 and 5.3.

[…] (see Hofmeister et al. 1991 and Siu et al. 1991, for example). We show here that addition can be computed by both 2 NC 0 1 and 2 NC 0 1 circuits.

4. Unless otherwise specified, we assume that numbers are given by their binary representation. Also, as is the case here, $n$ will usually not be the exact length of the input, but something polynomially related to it.
Addition

Theorem 3.1 Addition is in 2 NC 0 1 .
Proof Suppose that $x$ and $y$ are the two input numbers, given in binary by $x_n \cdots x_1$ and $y_n \cdots y_1$, so that $x = \sum_{i=1}^n x_i 2^{i-1}$ and $y = \sum_{i=1}^n y_i 2^{i-1}$. Let $z$ be their sum, with binary representation $z_{n+1} \cdots z_1$. Let $C_i$ be the carry coming into position $i$ from the right. Then $z_i = \overline{\mathrm{MOD}_2}(C_i, x_i, y_i)$, i.e., $z_i = 1$ iff $C_i + x_i + y_i$ is odd.
Consider position $i$ in $x$ and $y$. Let $G_i = \mathrm{AND}(x_i, y_i)$, $P_i = \overline{\mathrm{MOD}_2}(x_i, y_i)$ and $A_i = \mathrm{AND}(\overline{x_i}, \overline{y_i})$. For $i > j$, let $R_{ij} = \mathrm{AND}(P_{i-1}, \ldots, P_{j+1}, G_j)$. Then $R_{ij} = 1$ means that position $i$ receives a carry from position $j$. It is easy to see that $C_i = \mathrm{OR}(R_{i(i-1)}, \ldots, R_{i1})$. Now let $Q_{ij} = \mathrm{AND}(P_{i-1}, \ldots, P_{j+1}, A_j)$, for $i > j$. […] This fact is already implicit in Hofmeister et al. (1991) and […].
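A direct Python transcription of this construction, under our own naming and with one-based bit positions as in the text, lets one check it against ordinary integer addition. Here $P_i = 1$ when exactly one of $x_i, y_i$ is 1. The way the $Q_{ij}$ are used below (to certify that no carry arrives at position $i$) is our assumption about their intended role; the code also records how many $R_{ij}$ are 1 at once, the property used for the SUM implementation discussed next.

```python
def bits(v, n):
    """One-based bit list: bits(v, n)[i] is bit i of v (index 0 is a dummy 0)."""
    return [0] + [(v >> (i - 1)) & 1 for i in range(1, n + 1)]

def carry_lookahead_add(a, b, n):
    """Add two n-bit numbers via the R_ij terms; returns (sum, max #active R_ij)."""
    x, y = bits(a, n), bits(b, n)
    G = [xi & yi for xi, yi in zip(x, y)]              # G_i = AND(x_i, y_i)
    P = [xi ^ yi for xi, yi in zip(x, y)]              # P_i: exactly one of x_i, y_i is 1
    A = [(1 - xi) & (1 - yi) for xi, yi in zip(x, y)]  # A_i = AND(not x_i, not y_i)

    def chain(i, j):
        """AND(P_{i-1}, ..., P_{j+1}); true when j = i - 1 (empty AND)."""
        return all(P[k] for k in range(j + 1, i))

    z, most_active = 0, 0
    for i in range(1, n + 2):                          # position n+1 receives the last carry
        R = [int(chain(i, j) and G[j]) for j in range(1, i)]
        Q = [int(chain(i, j) and A[j]) for j in range(1, i)]
        C = int(any(R))                                # C_i = OR(R_{i(i-1)}, ..., R_{i1})
        # Assumed role of the Q_ij: no carry reaches i iff some Q_ij = 1
        # or every position below i propagates.
        assert (C == 0) == (any(Q) or chain(i, 0))
        most_active = max(most_active, sum(R))
        p_i = P[i] if i <= n else 0
        z |= (p_i ^ C) << (i - 1)                      # z_i = (x_i + y_i + C_i) mod 2
    return z, most_active

def check(n):
    """Exhaustive comparison with integer addition for all pairs of n-bit numbers."""
    for a in range(2 ** n):
        for b in range(2 ** n):
            z, most_active = carry_lookahead_add(a, b, n)
            if z != a + b or most_active > 1:
                return False
    return True
```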
Even though, for the purposes of this article, we are mainly interested in circuits of the type 2 NC 0 1, it is interesting to note that our circuits also show that addition can be computed using constant-degree polynomials. […]

Proof Consider the circuit in the proof of the previous theorem. On any given input, at most one $R_{ij}$ can be 1: if $R_{ij} = R_{ij'} = 1$ with $j' < j$, then $R_{ij'}$ requires $P_j = 1$ while $R_{ij}$ requires $G_j = 1$, which is impossible. Therefore, the OR that gives the value of $C_i$ can be computed in SUM. The result follows by the closure properties of POL. □

In terms of total depth, as threshold circuits, these circuits are not optimal; an optimal $\mathrm{TC}^0_2$ circuit was obtained by Siu and Bruck (1991) (see also Alon and Bruck 1994). However, this circuit uses threshold gates on both levels; the 2 NC 0 1 circuit of Theorem 3.1 uses no threshold gates, at the extra cost of one level of gates of constant fan-in.
3.2 Iterated addition

iterated addition
input: $n$ $n$-bit numbers
output: their $(n + \lceil \log n \rceil)$-bit sum

Chandra, Stockmeyer and Vishkin (1984) were the first to show that iterated addition was in $\mathrm{TC}^0$, and a simple analysis of their result shows that iterated addition is in $\mathrm{TC}^0_{10}$. Hofmeister, Hohberg and Köhling (1991) and Siu and Bruck (1991), independently and using very different techniques, showed that iterated addition was in fact in $\mathrm{TC}^0_3$. Siu and Roychowdhury (1994) were the first to obtain an optimal $\mathrm{TC}^0_2$ circuit, by using results of Goldmann et al. (1992) on large-weight threshold gates (see also Goldmann and Karpinski 1993).
The depth-two circuit of Siu and Roychowdhury (1994) uses two levels of threshold gates, as does the depth-three circuit of Hofmeister et al. (1991) .
On the other hand, the depth-three circuit of Siu and Bruck (1991) uses three levels of threshold gates. However, the circuit of Chandra et al. (1984) uses only one level of threshold gates, at the input. By using the results of Section 2.6 and Theorem 3.1, it is possible to implement their technique in 4 SYM. We show here that iterated addition can in fact be computed in 2 SYM. […]

Proof The main idea, as in Hofmeister et al. (1991) and Siu and Bruck (1991), is to reduce the addition of $n$ numbers to the addition of two numbers. Suppose that $x_1, \ldots, x_n$ are the input numbers, given in binary by $x_{in} \cdots x_{i1}$, $i = 1, \ldots, n$.
Let $z$ be their sum, with binary representation $z_{n + \lceil \log n \rceil} \cdots z_1$. Let $l = \lceil \log n \rceil$ and $m = \lceil n/l \rceil$.
Divide each $x_i$ into $m$ blocks of $l$ bits and let $S_k$ be the sum of the $k$th blocks of all the $x_i$; in other words, $S_k = \sum_{i=1}^n \sum_{j=1}^l x_{i((k-1)l+j)} 2^{j-1}$. It is clear that $z = \sum_{k=1}^m S_k 2^{(k-1)l}$.
The maximum value of any $S_k$ is $n(2^l - 1) < 2^{l + \log n} \le 2^{2l}$. Therefore, the binary representation of $S_k$ has no more than $2l$ bits. Let $L_k$ be the low- […] Hofmeister et al. (1991).

Moreover, this circuit is not far from the optimal depth-two circuit, in the sense that the $\mathrm{AC}^0_1$ function computed at the output is of a very restricted type. The second and third circuits have depth four, but have the advantage of using only one level of threshold gates. Among circuits using only one level of threshold gates, these are the best that are known. It is the third circuit that will be the most useful in the following sections, since we will be able to combine the $\mathrm{NC}^0_1$ level of gates with gates at the preceding levels. Once again, it is interesting to note that iterated addition can be computed using constant-degree polynomials. This follows simply from using the $\mathrm{SUM} \circ$ AC […] Siu and Bruck (1991), but in a way different from that of the 1 TC 0 2 circuit of Corollary 3.4. In fact, an intermediate result of Siu and Bruck (1991) is that iterated addition can be approximated as the sum of the outputs of a $\mathrm{TC}^0_2$ circuit. Our $\mathrm{SUM} \circ \mathrm{TC}^0_2$ circuit computes iterated addition exactly, and the underlying proof is much simpler.
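The block decomposition can be sketched in Python. The definitions of $L_k$ and the matching high part are cut off above, so this sketch assumes they are the low-order and high-order $l$ bits of $S_k$. Because each $S_k$ fits in $2l$ bits, all the low parts can be placed side by side without overlap, and so can the high parts, so $z$ becomes the sum of just two numbers that are formed by wiring alone. Function names are ours.

```python
import math
import random

def two_summands(xs, n):
    """Reduce the sum of the n-bit numbers xs to the sum of two numbers (block method)."""
    l = max(1, math.ceil(math.log2(len(xs))))    # block length l = ceil(log n)
    m = math.ceil(n / l)                         # number of blocks
    mask = (1 << l) - 1
    # S_k = sum of the k-th l-bit blocks of all inputs, k = 1..m
    S = [sum((x >> ((k - 1) * l)) & mask for x in xs) for k in range(1, m + 1)]
    assert all(s < 1 << (2 * l) for s in S)      # each S_k has at most 2l bits
    # Assumed split: low l bits of S_k at offset (k-1)l, high l bits at offset kl.
    # The low parts occupy disjoint bit ranges, as do the high parts.
    low = sum((s & mask) << ((k - 1) * l) for k, s in enumerate(S, start=1))
    high = sum((s >> l) << (k * l) for k, s in enumerate(S, start=1))
    return low, high

def check(n, trials=200, seed=0):
    """Random test: low + high must equal the ordinary sum of n random n-bit numbers."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randrange(2 ** n) for _ in range(n)]
        low, high = two_summands(xs, n)
        if low + high != sum(xs):
            return False
    return True
```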
Multiplication

multiplication
input: two $n$-bit numbers
output: their $2n$-bit product
It is well known that multiplication can easily be reduced to iterated addition. In terms of total depth, this is optimal, by the results of Hajnal et al. (1987).
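The standard reduction writes $x \cdot y$ as the sum of $n$ shifted partial products, each bit of which is the AND of one bit of $x$ with one bit of $y$. A minimal sketch, with names of our choosing:

```python
def partial_products(x, y, n):
    """The n rows whose sum is x*y; bit k+i of row i is AND(x_k, y_i)."""
    rows = []
    for i in range(n):
        y_i = (y >> i) & 1
        row = 0
        for k in range(n):
            row |= (((x >> k) & 1) & y_i) << (k + i)
        rows.append(row)
    return rows

def check(n):
    """Every product of two n-bit numbers equals its row sum and fits in 2n bits."""
    return all(
        sum(partial_products(x, y, n)) == x * y < 1 << (2 * n)
        for x in range(2 ** n)
        for y in range(2 ** n)
    )
```

The rows are $2n$-bit numbers computed by one level of fan-in-two AND gates, so multiplication amounts to an iterated addition of $n$ such numbers.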
Using techniques similar to those used for the problems of this section, it is possible to show that subtracting two $n$-bit numbers and comparing two $n$-bit numbers can be done in 2 NC 0 1 and in $\mathrm{SUM} \circ \mathrm{AC}^0_1 \circ \mathrm{NC}^0_1$, that adding $\log n$ $n$-bit numbers can be done in 4 and in $\mathrm{SUM} \circ \mathrm{AC}^0_3$, and that multiplying a $(\log n)$-bit number by an $n$-bit number can be done in 3 and in $\mathrm{SUM} \circ \mathrm{AC}^0_2$. In addition, weighted iterated addition, a variant of iterated addition in which $n$-bit integer weights are associated with the input numbers, can be computed in 2 SYM and in $\mathrm{SUM} \circ \mathrm{AC}^0_1 \circ \mathrm{SYM}$. (For more details, see Maciel 1995.)

4 Chinese Remaindering

Chinese remaindering is the technique that was originally used by Beame et al. (1986) to show that iterated multiplication and division are in $\mathrm{NC}^1$. It is this same technique that is used in obtaining all the small-depth threshold circuits for these two functions (Siu et al. 1993, Siu and Roychowdhury 1994), and it is this technique that will be used in this article. In this section, we present Chinese remaindering as a general tool for computing arbitrary integer functions using small-depth threshold circuits.
Let $f$ be an arbitrary function from $\{0,1\}^*$ to $\mathbb{N}$. The strategy for computing $f$ will be as follows. First, choose, for every $n$, $m$ pairwise relatively prime numbers $q_1, \ldots, q_m$ such that $Q = \prod_{i=1}^m q_i > \max\{f(x) : x \in \mathrm{dom}(f) \cap \{0,1\}^n\}$. Then, given $x$,

A. compute $r_i = f(x) \bmod q_i$, for $i = 1, \ldots, m$;
B. compute $f(x) \bmod Q$ from the residues $r_1, \ldots, r_m$.

This last number is the correct value of $f(x)$, since $Q > f(x)$ implies that $f(x) \bmod Q = f(x)$.
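The two-step strategy can be sketched in Python for a concrete $f$, here the product of a list of numbers. Step A is done naively (a circuit would compute the residues without computing $f(x)$ first), and Step B uses the usual constructive Chinese Remainder Theorem. Function names are ours, and `pow(a, -1, q)` needs Python 3.8 or later.

```python
from math import gcd, prod

def crt(residues, moduli):
    """Step B: the unique value in [0, Q) with the given residues (constructive CRT)."""
    Q = prod(moduli)
    total = 0
    for r, q in zip(residues, moduli):
        Qi = Q // q
        # Qi * (Qi^{-1} mod q) is 1 modulo q and 0 modulo every other modulus.
        total += r * Qi * pow(Qi, -1, q)
    return total % Q

def via_chinese_remaindering(f, x, moduli):
    """Compute f(x), assuming prod(moduli) exceeds every value of f on inputs of this size."""
    assert all(gcd(a, b) == 1 for i, a in enumerate(moduli) for b in moduli[i + 1:])
    residues = [f(x) % q for q in moduli]   # Step A, computed naively here
    return crt(residues, moduli)
```

For example, with moduli $3, 5, 7, 11, 13$ (so $Q = 15015$), `via_chinese_remaindering(prod, [7, 11, 13], [3, 5, 7, 11, 13])` recovers the product 1001.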
In the above, Step A amounts to solving the following problem:
f modulo $q_1, \ldots, q_m$
input: $x \in \mathrm{dom}(f) \subseteq \{0,1\}^*$
output: the $\lceil \log q_i \rceil$-bit number $f(x) \bmod q_i$, for $i = 1, \ldots, m$
The complexity of computing f modulo $q_1, \ldots, q_m$ will of course depend on the particular function $f$ under consideration, and on the choice of $q_1, \ldots, q_m$.
Step B, on the other hand, is independent of $f$. The computation specified there is possible because the residues $f(x) \bmod q_i$ determine the value of $f(x) \bmod Q$, by the Chinese Remainder Theorem. We will compute $f(x) \bmod Q$ by using the following lemma, which is easily obtained from the usual constructive proof of the Chinese Remainder Theorem. […]

Proof Let $x$ be an arbitrary input of length $n$, let $r_i = f(x) \bmod q_i$, $i = 1, \ldots, m$, and let $z = \sum_{i=1}^m u_i r_i$, the $u_i$ being given by Lemma 4.1. According to the above discussion, we simply have to compute $z \bmod Q$.
Clearly, $z \bmod Q = z - kQ$ for some $k \in \mathbb{N}$. Since each $u_i < Q$ and each $r_i < q_i$, we have $z < mQ \max\{q_i\}$, so it must be that $k < m \max\{q_i\}$. The fact that the $q_i$ are pairwise relatively prime implies that they are distinct. Therefore, $m \le \max\{q_i\}$, and, by condition (a), $k \le n^c$ for some constant $c$. For $j = 0, \ldots, n^c$, let $z_j = z - jQ$. Then $z_j = z_k = z \bmod Q$ if and only if $0 \le z_j < Q$.
Therefore, $z \bmod Q$ can be computed as follows:

1. compute $z_j = (\sum_{i=1}^m u_i r_i) - jQ$, for $j = 0, \ldots, n^c$;
2. output the $z_j$ such that $0 \le z_j < Q$.

6. In particular, $\Gamma$ can be any of the complexity classes defined in this article. (See Section 2.4.)
Step 1 is a weighted iterated addition whose inputs are the $r_i$. Recall that the $u_i$ are fixed numbers that depend only on the $q_i$. Since $Q \le (\max\{q_i\})^m \in 2^{n^{O(1)}}$ and $u_i < Q$, there is $N \in n^{O(1)}$ such that $2^N$ is greater than both $n^c Q$ and the maximum […] Therefore, representing $s_j$ by $s_{j(N+1)} \cdots s_{j1}$, we have that $s_{j(N+1)} = 1$ iff $z_j \ge 0$ and, in that case, $s_{jN} \cdots s_{j1}$ is the binary representation of $z_j$. (In the case that $s_{j(N+1)} = 0$, which means that $z_j < 0$, $s_{jN} \cdots s_{j1}$ is not the binary representation of $z_j$. However, the value of $z_j$ will be needed in Step 2 only when $z_j \ge 0$.) The value of $s_j$ can be computed by first distributing the $r_i$ over the binary representation of the $u_i$. This gives a new sum whose terms are easily computed from the $r_i$ without having to use any gates at all. Then, evaluate this sum in […] □

Let us point out where the various hypotheses of this theorem were used. The fact that the $q_i$ are pairwise relatively prime ensures that $f(x) \equiv z \pmod Q$, via the Chinese Remainder Theorem. Condition (a), requiring that the $q_i$ be small, implies that $m$ is small and gives the bound on the number of possible values for $k$. Finally, condition (b), requiring that $Q$ be large, implies that $f(x) \bmod Q = f(x)$.
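The way Step B avoids an explicit division by $Q$ can be mimicked in Python: compute every candidate $z_j = z - jQ$ side by side, add $2^N$ so that the top bit acts as a sign bit, and keep the unique candidate lying in $[0, Q)$. This sketch assumes the $u_i$ of Lemma 4.1 have the usual form $(Q/q_i) \cdot ((Q/q_i)^{-1} \bmod q_i)$, which satisfies $u_i < Q$ as used in the proof, and it bounds $k$ by $m \max\{q_i\}$. Function names are ours.

```python
from math import prod

def select_mod_Q(residues, moduli):
    """Compute z mod Q, where z = sum u_i r_i, by testing all candidates z - jQ."""
    Q = prod(moduli)
    u = [(Q // q) * pow(Q // q, -1, q) for q in moduli]   # assumed form of the u_i (each < Q)
    z = sum(ui * ri for ui, ri in zip(u, residues))      # Step 1: weighted iterated addition
    K = len(moduli) * max(moduli)                        # z < K*Q, hence k < K
    N = (K * Q).bit_length() + 1                         # 2^N > 2*K*Q
    selected = []
    for j in range(K + 1):                               # all candidates, "in parallel"
        s = (1 << N) + z - j * Q                         # s_j = 2^N + z_j, always positive
        nonnegative = (s >> N) & 1                       # top bit is 1 iff z_j >= 0
        z_j = s & ((1 << N) - 1)                         # equals z_j whenever z_j >= 0
        if nonnegative and z_j < Q:                      # Step 2: the unique j with 0 <= z_j < Q
            selected.append(z_j)
    assert len(selected) == 1
    return selected[0]
```

The selection in the last step is an OR over candidates of which exactly one is active, which is why, as noted below, it can also be expressed in SUM.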
Recalling from Section 3.2 that iterated addition can also be computed in $\mathrm{SUM} \circ$ AC […]

Proof We basically show that the right $z_j$ can be selected in POL. Recall that there is only one value of $j$ for which $0 \le z_j < Q$. Therefore, the OR giving the value of $z_{ki}$ can be computed in SUM, which implies that Step 2 can be computed in POL.
Therefore, $f$ can be computed in $\mathrm{POL} \circ \mathrm{SUM} \circ$ AC […] □

Siu and Roychowdhury (1994) also used Chinese remaindering to compute iterated multiplication and division with small-depth threshold circuits. While our implementation of Chinese remaindering will lead to the best known iterated multiplication and division circuits in terms of majority-depth, their implementation leads to the best known circuits in terms of total depth. (In fact, their division circuit is optimal.)
From their work, we can extract a result similar to Theorems 4.2 and 4.3 whose conclusion implies that $f \in \mathrm{TC}^0_2 \circ \Gamma$. A proof would go as follows. First, notice that the proof of Theorem 4.3 shows that $f \in \mathrm{POL} \circ […] \circ \Gamma$ if f modulo $q_1, \ldots, q_m$ is in $\Gamma$, iterated addition is in […], and […] and $\Gamma$ are both closed under polynomial-size padding. Let $\mathrm{LT}_d$ denote the class of functions computed by general weighted threshold circuits of polynomial size and depth $d$. Siu and Roychowdhury (1994) showed that iterated addition is in $\mathrm{SUM} \circ \mathrm{LT}_1$. They then used a result of Goldmann et al. (1992) (see also Goldmann and Karpinski 1993 and Goldmann 1992) which states that every $\mathrm{LT}_1$ function can be closely approximated by the sum of the outputs of a $\mathrm{TC}^0_1$ circuit. This implies that every bit of iterated addition can be so approximated. Therefore, let […] be the class of functions that can be approximated by the sum of the outputs of a $\mathrm{TC}^0_1$ circuit. By observing that constant-degree polynomials of such approximations can also be approximated in the same way, we get that $f$ can be approximated by a sum of the outputs of a TC […] $\Gamma$.

It is a simple exercise to generalize the Chinese remaindering technique to functions from $\{0,1\}^*$ to $\mathbb{Z}$. Condition (b) in Theorems 4.2 and 4.3 becomes $Q > \max\{2|f(x)| : x \in \mathrm{dom}(f) \cap \{0,1\}^n\}$. Details can be found in Maciel (1995).
Iterated Multiplication and Division
In this section, we use the Chinese remaindering technique of the previous section to obtain constant-depth polynomial-size threshold circuits for iterated multiplication, powering and division. We also provide, at the end of Section 5.1, a detailed comparison between our iterated multiplication circuits and the other small-depth iterated multiplication circuits that can be found in the literature.

5.1 Iterated multiplication

iterated multiplication
input: $n$ $n$-bit numbers
output: their $n^2$-bit product

Beame et al. (1986), using Chinese remaindering, were the first to show that iterated multiplication is in $\mathrm{NC}^1$. Realizing that their algorithm could be implemented by $\mathrm{TC}^0$ circuits (Reif 1987, Hajnal et al. 1987, Immerman and Landau 1989), researchers started looking for minimal-depth $\mathrm{TC}^0$ circuits for iterated multiplication. In terms of total depth, the best result obtained so far is that iterated multiplication is in $\mathrm{TC}^0_4$ (Siu and Roychowdhury 1994). However, four levels of threshold gates are not necessary. For example, the circuit of Immerman and Landau (1989) can be implemented using only three levels of threshold gates. Here, we show that iterated multiplication can be computed in 2 TC 0 3. This will be proved using Theorem 4.2. As a consequence, we need, for every $n$, a sequence of pairwise relatively prime numbers satisfying the hypotheses of that theorem. For every $n$, consider the first $n^2$ prime numbers $p_1, \ldots, p_{n^2}$. Then, by the Prime Number Theorem, $\max\{p_1, \ldots, p_{n^2}\} = p_{n^2} \in O(n^2 \log n)$. Also, $Q = \prod_{i=1}^{n^2} p_i \ge 2^{n^2} > \prod_{i=1}^n x_i$ for all $n$-bit input numbers $x_1, \ldots, x_n$, so that conditions (a) and (b) of Theorem 4.2 are both satisfied.
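This choice of moduli can be checked numerically for small $n$. The sketch below (trial division, adequate only for small inputs; names are ours) computes the first $n^2$ primes, confirms that their product exceeds the largest possible product of $n$ $n$-bit numbers, and checks that $p_{n^2}$ stays below $2 n^2 \ln(n^2)$; that explicit constant is our own conservative choice for this small range, not a bound taken from the text.

```python
import math

def first_primes(k):
    """The first k primes, by trial division against the primes found so far."""
    primes, c = [], 2
    while len(primes) < k:
        if all(c % p for p in primes if p * p <= c):
            primes.append(c)
        c += 1
    return primes

def conditions_hold(n):
    """Q >= 2^(n^2) > any product of n n-bit numbers, and p_{n^2} is O(n^2 log n)-small."""
    ps = first_primes(n * n)
    Q = math.prod(ps)
    largest_product = (2 ** n - 1) ** n           # n inputs, each at most 2^n - 1
    return Q >= 2 ** (n * n) > largest_product and ps[-1] <= 2 * n * n * math.log(n * n)
```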
We now need to compute iterated multiplication modulo $p_1, \ldots, p_{n^2}$ efficiently. This is done in the following lemma. […]

Proof (of the lemma) First notice that iterated multiplication modulo 2 is simply the AND of the low-order bits of the input numbers. Now let $p \ge 3$ be a prime number and suppose that $x_1, \ldots, x_n$ are the input numbers, given in binary by $x_{in} \cdots x_{i1}$, $i = 1, \ldots, n$. Let $\mathbb{Z}_p$ denote the ring of integers modulo $p$. It is well known that $\mathbb{Z}_p$ is in fact a field and that $\mathbb{Z}_p^* = \mathbb{Z}_p - \{0\}$, the multiplicative group of $\mathbb{Z}_p$, is cyclic. In particular, there is an element $g \in \mathbb{Z}_p^*$ such that $\mathbb{Z}_p^* = \{1, g, g^2, \ldots, g^{p-2}\}$. For every $i$ with $x_i \not\equiv 0 \pmod p$ (if some $x_i \equiv 0 \pmod p$, the product is simply 0 modulo $p$), let $a_i$ be the unique number such that $0 \le a_i \le p - 2$ and $x_i \equiv g^{a_i} \pmod p$. Then $\prod_{i=1}^n x_i \equiv g^{a_1} \cdots g^{a_n} \equiv g^{a_1 + \cdots + a_n} \pmod p$.
Therefore, $(\prod_{i=1}^n x_i) \bmod p$ can be computed as follows:

1. compute $a_i$, for $i = 1, \ldots, n$;
2. compute $g^{a_1 + \cdots + a_n} \bmod p$.
Note that $g$ and $p$ are fixed and do not have to be computed. In Step 1, the value of $a_i$ is determined by $x_i \bmod p$. Since $x_i = \sum_{j=1}^n x_{ij} 2^{j-1}$, every bit of $a_i$ can be written as a function of $\sum_{j=1}^n (2^{j-1} \bmod p)\, x_{ij}$, which is a sum with weights bounded by $p$. Therefore, each bit of $a_i$ can be computed by a symmetric gate of fan-in $pn$.
In Step 2, the number to be computed is a function of $a_1 + \cdots + a_n$, a sum of the bits of the $a_i$ with weights bounded by $p$. Therefore, every bit of that number can also be computed by a symmetric gate of fan-in $pn$.
Since the largest prime $p_{n^2} \in O(n^2 \log n)$, all symmetric gates used in the above computations have fan-in $O(n^3 \log n)$, and so iterated multiplication modulo $p_1, \ldots, p_{n^2}$ is in $\mathrm{SYM} \circ \mathrm{SYM}$. □ Note that this proof uses both the fact that the $p_i$ are small and the fact that they are prime.
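The discrete-logarithm trick in this proof can be sketched in Python. Here the generator $g$ and the table of discrete logarithms are found by brute force, standing in for values that the circuit has hard-wired, and each residue is obtained from the small-weight sum $\sum_j (2^{j-1} \bmod p)\, x_{ij}$ as in Step 1. Inputs divisible by $p$ have no discrete logarithm; this sketch handles them separately by returning 0, which is our own treatment of that case. Function names are ours.

```python
import math
import random

def generator(p):
    """Smallest g whose powers g^0, ..., g^(p-2) cover Z_p^* (p an odd prime)."""
    for g in range(2, p):
        if len({pow(g, e, p) for e in range(p - 1)}) == p - 1:
            return g
    raise ValueError("p must be an odd prime")

def residue_from_bits(x, n, p):
    """x mod p from the weighted sum sum_j (2^(j-1) mod p) * x_j, weights below p."""
    return sum(pow(2, j, p) * ((x >> j) & 1) for j in range(n)) % p

def iterated_product_mod_p(xs, n, p):
    if p == 2:
        return int(all(x & 1 for x in xs))            # AND of the low-order bits
    g = generator(p)                                   # fixed in advance
    dlog = {pow(g, e, p): e for e in range(p - 1)}     # residue -> a with g^a = residue
    residues = [residue_from_bits(x, n, p) for x in xs]
    if 0 in residues:                                  # our assumption: zero handled separately
        return 0
    a = [dlog[r] for r in residues]                    # Step 1: the a_i
    return pow(g, sum(a), p)                           # Step 2: depends only on a_1 + ... + a_n

def check(n, primes=(2, 3, 5, 7, 11, 13), trials=100, seed=1):
    """Random test against the exact product reduced modulo each small prime."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randrange(2 ** n) for _ in range(n)]
        for p in primes:
            if iterated_product_mod_p(xs, n, p) != math.prod(xs) % p:
                return False
    return True
```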
We have therefore proved the following theorem.

Theorem 5.2 Iterated multiplication is in 2 TC 0 3.

We end this section with a review of the main results concerning the computation of iterated multiplication using small-depth Boolean circuits. All of these follow, more or less explicitly, the same pattern:
1. the computation of iterated multiplication modulo small primes
2. Chinese remaindering

Beame et al. (1986) were the first to compute iterated multiplication in this way; they implemented these two steps in $\mathrm{NC}^1$. It was soon observed that this could even be done in $\mathrm{TC}^0$ (Reif 1987, Hajnal et al. 1987, Immerman and Landau 1989). In fact, only three levels of threshold gates are necessary. For example, Immerman and Landau (1989) implemented Chinese remaindering using the iterated addition circuit of Chandra et al. (1984). This requires only one level of threshold gates, the other two levels being used for the computation of iterated multiplication modulo small primes.
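The two-step pattern can be sketched in Python with exact integers. This sketch computes the residues naively and uses the textbook Chinese remainder formula $r = \left(\sum_i r_i Q_i (Q_i^{-1} \bmod p_i)\right) \bmod Q$, with $Q_i = Q / p_i$; it is not the construction of Theorem 4.2, and `crt` and `iterated_multiplication` are hypothetical names. The formula does show why an iterated addition circuit is the natural tool for the second step. It assumes Python 3.8 or later, for `math.prod` and `pow(a, -1, m)`.

```python
from math import prod


def crt(residues, moduli):
    """Return the unique r in [0, Q), Q = prod(moduli), with r = residues[i] mod moduli[i]."""
    q = prod(moduli)
    total = 0
    for r_i, m_i in zip(residues, moduli):
        q_i = q // m_i
        # q_i * (q_i^{-1} mod m_i) is 1 modulo m_i and 0 modulo every other modulus
        total += r_i * q_i * pow(q_i, -1, m_i)
    return total % q


def iterated_multiplication(xs, primes):
    """Multiply the inputs by working modulo each prime, then Chinese remaindering."""
    residues = [prod(x % p for x in xs) % p for p in primes]   # step 1
    return crt(residues, primes)                               # step 2
```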
From there, progress was made possible, first, by the observation that iterated multiplication modulo small primes could be computed in $\mathrm{SYM} \circ \mathrm{SYM}$, a result obtained independently by Maciel and Thérien (1993) and Siu et al. (1993); second, by the design of iterated addition circuits more efficient than that of Chandra et al. (1984). Siu et al. (1993), trying to minimize total depth, used the same block method that we used in Section 3.2 to reduce iterated addition to the addition of two numbers. Then, following Siu and Bruck (1991), which applies harmonic analysis results of Bruck (1990) and Bruck and Smolensky (1990), they approximated every bit of the sum of these two numbers as the sum of the outputs of a $\mathrm{TC}^0_1$ circuit. The result was that every bit of iterated addition could be approximated as the sum of the outputs of a $\mathrm{TC}^0_2$ circuit. Using the fact that the AND of such approximations can also be approximated in the same way, they obtained a $\mathrm{TC}^0_5$ iterated multiplication circuit. Maciel and Thérien (1993) introduced the idea of first minimizing the number of levels of threshold gates. Using the iterated addition circuit of Chandra et al. (1984), we obtained a $\mathrm{TC}^0$ circuit with threshold gates on only the first three levels.
The results presented in this article improve on that result by reducing the total depth to five. This is done, as we have seen, by computing iterated addition in 2 SYM using the block and the carry-look-ahead methods. The closure properties of 2 then give a 2 TC 0 3 iterated multiplication circuit. The best iterated multiplication circuit known so far, in terms of total depth, was obtained by Siu and Roychowdhury (1994). They improved the circuit of Siu et al. (1993) by using results of Goldmann et al. (1992) to approximate every bit of iterated addition as the sum of the outputs of a TC […] SYM iterated addition circuit of Section 3.2, we can prove a slightly stronger result than that of Siu et al. (1993) with a proof that requires neither harmonic analysis nor the definition of a notion of approximation. □

Table 1 gives a summary of the iterated multiplication circuits mentioned in this section. The third column lists the main ingredients used in the design of the circuits. Note that all of these circuits use Chinese remaindering and that the last four all use the fact that iterated multiplication modulo small primes is in $\mathrm{SYM} \circ \mathrm{SYM}$.
Table 1
Result                Main ingredients
Beame et al. (1986)   …
Powering
The following problem is often considered in conjunction with iterated multiplication.
powering
input   an $n$-bit number $x$
output  the $kn$-bit numbers $x^k$, for $k = 2, \dots, n$

The computation of $x^k$ can be viewed as a special case of iterated multiplication in which the first $k$ numbers are equal to $x$ and the other $n - k$ are equal to one. As such, powering can be computed using an iterated multiplication circuit. However, we can take advantage of the special structure of the problem.
Lemma 5.4 Powering modulo $p_1, \dots, p_{n^2}$ is in SYM.

Proof Suppose that the input number $x$ is given in binary by $x_n \cdots x_1$. Let $k$ and $q$ be arbitrary. Since $x^k \equiv (x \bmod q)^k \pmod q$, the value of $x^k \bmod q$ is determined by $x \bmod q$. Since $x \equiv \sum_{j=1}^{n} (2^{j-1} \bmod q)\, x_j \pmod q$, every bit of $x^k \bmod q$ is a function of a sum with weights bounded by $q$. Therefore, by Proposition 2.4, every bit of $x^k \bmod q$ can be computed by a symmetric gate of fan-in $qn$. The result now follows from the fact that $p_{n^2} \in n^{O(1)}$.
□

Theorem 5.5 Powering is in 2 TC 0 2.

[…] Siu and Roychowdhury (1994) imply a $\mathrm{TC}^0_3$ circuit for powering that uses threshold gates on all levels. In terms of total depth, their circuit is optimal (Hofmeister and Pudlák 1992).
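The argument of Lemma 5.4 can be checked on small cases. In this Python sketch (illustrative; the function names are hypothetical), the only information taken from $x$ is the weighted sum $s = \sum_j (2^{j-1} \bmod q)\, x_j < qn$, which a symmetric gate of fan-in $qn$ can count; $x^k \bmod q$ is then read off as $(s \bmod q)^k \bmod q$. The last function restates powering as the special case of iterated multiplication described at the start of this subsection.

```python
def weighted_sum(x, n, q):
    """s = sum_j (2^(j-1) mod q) * x_j, where x_j is bit j of x (x_1 is the lowest)."""
    return sum(pow(2, j, q) * ((x >> j) & 1) for j in range(n))


def power_mod_from_sum(x, n, q, k):
    """x^k mod q, computed from the weighted sum alone (s is congruent to x mod q)."""
    s = weighted_sum(x, n, q)
    return pow(s % q, k, q)


def power_as_iterated_product(x, n, k):
    """x^k as the product of k copies of x and n - k copies of 1."""
    result = 1
    for factor in [x] * k + [1] * (n - k):
        result *= factor
    return result
```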
Division

division
input   two $n$-bit numbers $x$ and $y$ such that $y > 0$
output  the $n$-bit number $\lfloor x/y \rfloor$
The design of small-depth Boolean circuits for division parallels that of circuits for iterated multiplication and powering. As in the case of these other two functions, the first $\mathrm{NC}^1$ division circuit was obtained by Beame et al. (1986) and the best $\mathrm{TC}^0$ circuit, in terms of total depth, is due to Siu and Roychowdhury (1994). They showed that division is in $\mathrm{TC}^0_3$, and this is optimal, as shown by Hofmeister and Pudlák (1992).
The $\mathrm{TC}^0_3$ circuit of Siu and Roychowdhury (1994) uses threshold gates on all levels. As was the case for iterated multiplication and powering, we show here that division can be computed using one fewer level of threshold gates. More precisely, we show that division is in 2 TC 0 2. The idea behind our circuit is the same as in Beame et al. (1986) and Siu and Roychowdhury (1994): the quotient $x/y$ is approximated using a power series for $1/y$, and this approximation is computed by Chinese remaindering. We will also make use of the following lemma.
Lemma 5.7 Let $x$ and $y$ be two integers such that $|y| < 2^n$. If $x/y \le t \le x/y + 2^{-n}$, then $\lfloor t \rfloor = \lfloor x/y \rfloor$.

Proof If $x/y \le t \le x/y + 2^{-n}$, then $\lfloor t \rfloor = \lfloor x/y \rfloor$ unless there is an integer $s$ such that $x/y < s \le t$, which implies that there is an integer $s$ different from $x/y$ such that $|s - x/y| \le 2^{-n}$.
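Suppose that such an integer $s$ exists. Since $|y| < 2^n$, we have that $|sy - x| = |y|\,|s - x/y| < 1$. But $sy - x$ is a nonzero integer, a contradiction. □

Now let $l$ be the integer such that $2^{l-1} \le y < 2^l$, and let $u = 1 - y 2^{-l}$. Then $0 < u \le 1/2$ and $1/y = 2^{-l}/(1 - u) = 2^{-l} \sum_{k=0}^{\infty} u^k$. We can therefore approximate $1/y$ with arbitrary precision by truncating this series after $N$ terms, with $N$ sufficiently large. Let $v = 2^{-l} \sum_{k=0}^{2n-1} u^k$. We get that $1/y - 2^{-2n} \le v < 1/y$. Let $t = xv$. Then $x/y - 2^{-n} < t \le x/y$, since $0 \le x < 2^n$, and therefore $x/y < t + 2^{-n} \le x/y + 2^{-n}$.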
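The bounds on $v$ and $t$, and their use with Lemma 5.7, can be checked exhaustively for small $n$ with exact rational arithmetic. This is a hypothetical Python sketch (`series_estimate` and `check` are illustrative names) assuming $0 \le x < 2^n$ and $0 < y < 2^n$; `int.bit_length` gives the integer $l$ with $2^{l-1} \le y < 2^l$.

```python
from fractions import Fraction
from math import floor


def series_estimate(x, y, n):
    """Return (v, t): v = 2^-l * sum_{k<2n} u^k with u = 1 - y 2^-l, and t = x v."""
    l = y.bit_length()                   # 2^(l-1) <= y < 2^l
    u = 1 - Fraction(y, 2**l)            # 0 < u <= 1/2
    v = Fraction(1, 2**l) * sum(u**k for k in range(2 * n))
    return v, x * v


def check(n):
    """Verify the bounds on v and t, and the conclusion of Lemma 5.7, for all n-bit x, y."""
    eps = Fraction(1, 2**n)
    for x in range(2**n):
        for y in range(1, 2**n):
            v, t = series_estimate(x, y, n)
            assert Fraction(1, y) - eps**2 <= v < Fraction(1, y)
            assert Fraction(x, y) - eps < t <= Fraction(x, y)
            assert floor(t + eps) == x // y      # Lemma 5.7 applied to t + 2^-n
    return True
```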
Consider $\lfloor t + 2^{-n} \rfloor$. According to the lemma, $\lfloor t + 2^{-n} \rfloor = \lfloor x/y \rfloor$. Division can therefore be computed as follows:
1. compute $F_j(x, y) = 2^{-n} + x 2^{-j} \sum_{k=0}^{2n-1} u^k$, where $u = 1 - y 2^{-j}$, for $j = 1, \dots, n$
2. output $\lfloor F_j(x, y) \rfloor$ if $j = l$
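The two steps can be run end to end with exact rationals. In this hypothetical Python sketch (`F` and `divide` are illustrative names), all the $F_j$ are evaluated, as the circuit does in parallel, and the one with $j = l$ is kept, where $l$ is the position of the leading 1 in the binary representation of $y$. The circuit itself obtains the $F_j$ by Chinese remaindering rather than by rational arithmetic.

```python
from fractions import Fraction
from math import floor


def F(j, x, y, n):
    """F_j(x, y) = 2^-n + x 2^-j sum_{k=0}^{2n-1} u^k, with u = 1 - y 2^-j."""
    u = 1 - Fraction(y, 2**j)
    return Fraction(1, 2**n) + x * Fraction(1, 2**j) * sum(u**k for k in range(2 * n))


def divide(x, y, n):
    """floor(x / y) for n-bit x and y > 0, following steps 1 and 2."""
    candidates = [F(j, x, y, n) for j in range(1, n + 1)]   # step 1: every F_j
    l = y.bit_length()                                       # leading 1 of y is y_l
    return floor(candidates[l - 1])                          # step 2: keep j = l
```

The candidates with $j \ne l$ may be far from $x/y$ (their $u$ need not lie in $(0, 1/2]$); they are computed anyway because the circuit cannot know $l$ in advance.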
The computation of the $F_j(x, y)$ will be done by Chinese remaindering. Let $G_j(x, y) = 2^{2n^2} F_j(x, y)$, so that $G_j(x, y)$ is an integer. Notice that $F_j(x, y)$ can be obtained from the value of $G_j(x, y)$ without using any gates at all. Let $m = 5n^3$ and consider the first $m$ prime numbers $p_1, \dots, p_m$. It is easy to verify that the hypotheses of the integer version of Theorem 4.2 are satisfied (see the remark at the end of Section 4).
The rest is similar to the computation of powering in 2 TC 0 2 (see Lemma 5.4 and Theorem 5.5). Since
$$G_j(x, y) = 2^{2n^2 - n} + x \sum_{k=0}^{2n-1} 2^{2n^2 - j(k+1)} \left(2^j - y\right)^k,$$
the value of $G_j(x, y) \bmod q$ is determined by $x \bmod q$ and $y \bmod q$. Therefore, every bit of $G_j(x, y) \bmod q$ is a function of two sums with weights bounded by $q$ and thus can be computed by a symmetric gate of fan-in $(4qn)^2$, by combining Proposition 2.4 and Theorem 2.5. This implies that computing all the $G_j$ modulo $p_1, \dots, p_m$ can be done in SYM.
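The scaling by $2^{2n^2}$ and the claim that $G_j(x, y) \bmod q$ depends only on $x \bmod q$ and $y \bmod q$ can be checked on small cases. In this Python sketch, `F`, `G` and `G_from_residues` are hypothetical names; `G_from_residues` evaluates the same expression using only residues, treating every power of 2 as a constant reduced modulo $q$. Since $j \le n$ and $k + 1 \le 2n$, every exponent $2n^2 - j(k+1)$ is nonnegative.

```python
from fractions import Fraction


def F(j, x, y, n):
    """F_j(x, y) = 2^-n + x 2^-j sum_{k=0}^{2n-1} (1 - y 2^-j)^k."""
    u = 1 - Fraction(y, 2**j)
    return Fraction(1, 2**n) + x * Fraction(1, 2**j) * sum(u**k for k in range(2 * n))


def G(j, x, y, n):
    """G_j = 2^(2n^2) F_j, written as an integer expression (it may be negative)."""
    return 2 ** (2 * n * n - n) + x * sum(
        2 ** (2 * n * n - j * (k + 1)) * (2**j - y) ** k for k in range(2 * n)
    )


def G_from_residues(j, xr, yr, n, q):
    """G_j mod q computed from xr = x mod q and yr = y mod q only."""
    acc = pow(2, 2 * n * n - n, q)
    for k in range(2 * n):
        # three-argument pow accepts a negative base
        acc += xr * pow(2, 2 * n * n - j * (k + 1), q) * pow(2**j - yr, k, q)
    return acc % q
```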
Theorem 4.2 now implies that each $G_j(x, y)$ can be computed in […] on $G_j$, the same subcircuit can be used in the computation of every $G_j(x, y)$. Therefore, computing all the $G_j(x, y)$ can be done in 2 SYM SYM, which is equal to 2 TC 0 2. To choose $F_l(x, y)$ among $F_1(x, y), \dots, F_n(x, y)$, let $\delta_j = \mathrm{AND}(\overline{y_n}, \dots, \overline{y_{j+1}}, y_j)$. Then $\delta_j = 1$ if and only if $2^{j-1} \le y < 2^j$, which means that $j = l$. Therefore, $F_{li}(x, y)$, the $i$th bit of $F_l(x, y)$, can be computed as
$$F_{li}(x, y) = \mathrm{OR}_{j=1}^{n} \left(\delta_j \wedge F_{ji}(x, y)\right).$$
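The selection step can also be simulated bit by bit. In this hypothetical Python sketch (`delta` and `select_bit` are illustrative names; `ybits[j-1]` holds $y_j$ and `fbits[j-1][i]` holds the $i$th bit of $F_j$), exactly one $\delta_j$ equals 1 whenever $y > 0$, namely the one whose $j$ is the position of the leading 1 of $y$, so the OR has at most one true term.

```python
def delta(j, ybits):
    """delta_j = AND(not y_n, ..., not y_{j+1}, y_j); ybits is little-endian."""
    return ybits[j - 1] == 1 and not any(ybits[j:])


def select_bit(i, ybits, fbits):
    """F_{l,i} = OR over j of (delta_j AND F_{j,i})."""
    n = len(ybits)
    return any(delta(j, ybits) and fbits[j - 1][i] for j in range(1, n + 1))
```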
Since we also have that $F_{li}(x, y) = \overline{\mathrm{OR}_{j=1}^{n} \left(\delta_j \wedge \overline{F_{ji}(x, y)}\right)}$, this means that $F_l$ can be computed in 1 NC 0 1 from the $\delta_j$ and the $F_j$. Notice that the $\delta_j$ are computed in 1, which is trivially included in 2. Therefore, $F_l$ can be computed in 1 NC […]. Now notice that there is only one value of $j$ for which $\delta_j = 1$. Therefore, the OR giving the value of $F_{li}$ can be computed in SUM. This implies that $F_l$ can be computed in POL given the $\delta_j$ and the $F_j$. Therefore, $F_l$ can be computed in POL SUM AC […]

Table 2 summarizes the results that can be obtained in this way. For each problem or class of problems, the best total-depth result is also included. The references given are for the first proof of the results.
There is a close relationship between the d TC TC […] circuits by threshold circuits of fixed small depth. If $\Gamma$ is a complexity class defined in terms of polynomial-size circuits, let $q\Gamma$ denote its quasipolynomial-size version (i.e., size $n^{(\log n)^{O(1)}}$) and let $q\Gamma^+$ denote the class $q\Gamma \circ q\mathrm{AND}_{(\log n)^{O(1)}}$. For example, $q\mathrm{SYM}^+$ denotes the class of functions computed by quasipolynomial-size depth-two circuits with symmetric gates at the output and AND gates of fan-in $(\log n)^{O(1)}$ at the input. Results of Beigel and Tarui (1994) […] Maciel (1995). One consequence of this result is that the design of circuits with small majority-depth can be used as a first step in constructing quasipolynomial-size circuits of small total depth.
Whenever faced with a hierarchy $\Gamma_1 \subseteq \Gamma_2 \subseteq \Gamma_3 \subseteq \cdots$ of complexity classes, it is natural to try to determine whether the hierarchy is infinite or whether it collapses, i.e., whether […] (Sipser 1983, Håstad 1986 […]
[…]$(x_1, \dots, x_n) = \mathrm{MAJ}(M_1, \dots, M_{\sqrt{n}})$. Since this function is the typical function defined using two levels of majority gates, it is reasonable to conjecture that it cannot be computed with a single level of majority gates, even with the help of an arbitrary but constant number of levels of AND-OR gates. In other words, we conjecture that MAJ
