Multiplication is one of the most fundamental operations in arithmetic and algebraic computations. In this paper, we present depth-optimal circuits for performing multiplication, multioperand addition, and symmetric function evaluation with small size and restricted fan-in. In particular, we show that the product of two $n$-bit numbers can be computed using a unit-weight threshold circuit of fan-in $k$, depth $3\log_k n + 1.44\log_2 d + O(1)$, and edge complexity $O(n^{2+1/d}\log(d+1))$, for any integer $d > 0$. All the circuits proposed in this paper have constant depth when $\log_k n$ is a constant and are depth-optimal within small constant factors for any fan-in $k$.
Introduction
Threshold circuits constitute a powerful computational model for arithmetic and other computations [12, 13, 16, 20]. A linear threshold function is defined as a Boolean […] The depth of a circuit is defined as the length of (i.e., the number of nodes on) the longest path from any input to any output node of the circuit, while the fan-in of a circuit is defined as the largest fan-in among all the gates contained in it. The edge complexity and the gate complexity of a circuit are defined as the number of edges and the number of gates in the circuit, respectively.
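To make the gate model concrete, the following is a minimal Python sketch of a single unit-weight threshold gate. It assumes the common convention that a gate outputs 1 exactly when the weighted sum of its inputs reaches its threshold, and that "unit-weight" restricts every weight to +1 or -1. The names `threshold_gate`, `and_gate`, `or_gate`, and `majority_gate` are illustrative and do not come from the paper.

```python
def threshold_gate(inputs, weights, threshold):
    """Return 1 iff the weighted sum of the 0/1 inputs reaches the threshold."""
    # Assumption: "unit-weight" means every weight is +1 or -1.
    assert all(w in (1, -1) for w in weights)
    return int(sum(w * x for w, x in zip(weights, inputs)) >= threshold)


def and_gate(bits):
    # All inputs must be 1, so the threshold equals the fan-in.
    return threshold_gate(bits, [1] * len(bits), len(bits))


def or_gate(bits):
    # A single 1 among the inputs is enough.
    return threshold_gate(bits, [1] * len(bits), 1)


def majority_gate(bits):
    # Strictly more than half of the inputs must be 1.
    return threshold_gate(bits, [1] * len(bits), len(bits) // 2 + 1)
```

Each of these gates has depth one and fan-in equal to the number of inputs, which is why restricting the fan-in to $k$ forces larger functions to be built from several layers.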
In this paper, we propose unit-weight threshold circuits to perform iterated addition and multiplication and to evaluate general symmetric functions. The circuits we propose have depths that are considerably smaller than those in [3, 13, 14]. In particular, we present a unit-weight threshold circuit to compute the sum of $m$ $n$-bit numbers, which has depth approximately equal to $2\log_k m + \log_k n + 1.44\log_2 d$, edge complexity $O(nm^{1+1/d}\log(d+1))$, and fan-in $k$, for any positive integer $d$. The parameter $d$ can be selected to obtain the desired tradeoff between depth and edge complexity of the circuit. The depth of our iterated addition circuit is optimal within a factor of $1 + o(1)$ when $\log_2 d = o(\log_k n)$ and $\log_k m = o(\log_k n)$, and is optimal within a factor of $1.5 + o(1)$ when $\log_2 d = o(\log_k n)$ and $\log_k m \approx \log_k n$.
We derive a unit-weight threshold circuit to compute the product of two $n$-bit integers, which has depth approximately equal to $3\log_k n + 1.44\log_2 d$, edge complexity $O(n^{2+1/d}\log(d+1))$, and fan-in $k$. The depth of this multiplication circuit is optimal within a factor of $1.5 + o(1)$ for any circuit based on the grade-school method (i.e., bit-matrix reduction) and is optimal within a factor of $3 + o(1)$ from a trivial lower bound, assuming $\log_2 d = o(\log_k n)$. Siu et al. [14, 13] have given multiplication threshold circuits of restricted fan-in, which require depth $(7\log_2(d+1) + 4)\log_k n + o(\log d \log_k n) + O(1)$, edge complexity […], and fan-in $k$ for any integer $d \ge 1$. Our multiplication circuit improves on the results in [14] by reducing the required depth by a factor of 3.67 asymptotically for circuits of fan-in $k$ when $\log_k n$ is not a constant and by a factor of 4.86 asymptotically for circuits with similar fan-in and edge complexity when $d$ is large.

We also show that any symmetric function of $n$ inputs can be evaluated using a unit-weight threshold circuit of fan-in $k$, depth approximately equal to $2\log_k n + 1.44\log_2 d$, and edge complexity $O(n^{1+1/d}\log(d+1))$. The depth of this circuit is optimal within a factor of $3 + o(1)$ for the given fan-in when $\log_2 d = o(\log_k n)$. The depth of our circuit for symmetric function evaluation is smaller than the depth of the corresponding circuit given in [3] by a factor of approximately 4.17 for similar edge complexity and fan-in $k = n$.

0-7803-5700-0/99/$10.00 ©1999 IEEE
Iterated Addition
The addition of two operands is the most frequently encountered operation in computer arithmetic units. We can show that addition can be performed using an AND-OR circuit that has almost linear edge complexity and is depth-optimal within a factor of $1 + o(1)$ when $\log_k n$ is not a constant.
Theorem 2.1 The sum of two $n$-bit integers can be computed using an AND-OR circuit of depth $\log_k n + o(\log_k n) + O(1)$, edge complexity O(d2n(log* qn)2), and fan-in $k$, for any […]
In what follows, we focus on the $(m,n)$ iterated (multioperand) addition problem, which is the problem of computing the sum of $m$ integers, each of which consists of $n$ bits. A related problem is the $(m,n)$ sum-reduction problem, where we want to produce two integers whose sum is equal to the sum of the original $m$ $n$-bit numbers. Both problems have been considered extensively in the literature, and many constructions have been proposed to solve them [9, 18, 11, 17].
The (m, n) Sum-Reduction Problem
Given $n$ one-bit numbers, an $(n, \lceil\log_2(n+1)\rceil)$-counter is a circuit that produces the $\lceil\log_2(n+1)\rceil$-bit binary representation of the sum of the $n$ bits [15]. Parallel counters are important in our constructions, since they are used as subblocks in the circuits that we will propose for the sum-reduction problem.

Lemma 2.2 An $(n, \lceil\log_2(n+1)\rceil)$-counter can be constructed using a unit-weight threshold circuit of depth 2, edge complexity $n^2 + O(n)$, and fan-in $n$.
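One way a depth-2 parallel counter can be realized is sketched below in Python. This is an illustrative construction, not necessarily the one behind Lemma 2.2: the first layer computes, for every $j$, whether at least $j$ inputs are 1, and the second layer derives each output bit from the intervals of counts in which that bit is set. It assumes that unit weights may be +1 or -1; the names `threshold_gate` and `parallel_counter` are hypothetical.

```python
def threshold_gate(inputs, weights, threshold):
    """Return 1 iff the weighted sum of the 0/1 inputs reaches the threshold."""
    return int(sum(w * x for w, x in zip(weights, inputs)) >= threshold)


def parallel_counter(bits):
    """Return the binary count of ones in `bits`, least significant bit first."""
    n = len(bits)
    width = n.bit_length()  # equals ceil(log2(n + 1)) for n >= 1

    # Layer 1: at_least[j] = 1 iff at least j of the n inputs are 1.
    at_least = {j: threshold_gate(bits, [1] * n, j) for j in range(1, n + 1)}

    # Layer 2: bit i of the count s is 1 iff s lies in some interval
    # [lo, lo + 2^i - 1] with lo an odd multiple of 2^i.
    out = []
    for i in range(width):
        ins, weights = [], []
        lo = 1 << i
        while lo <= n:
            ins.append(at_least[lo])
            weights.append(1)            # interval opens at lo
            hi = lo + (1 << i)
            if hi <= n:
                ins.append(at_least[hi])
                weights.append(-1)       # interval closes at hi
            lo += 1 << (i + 1)
        out.append(threshold_gate(ins, weights, 1))
    return out
```

In this sketch the first layer uses $n$ gates of fan-in $n$, which accounts for $n^2$ edges, and the second layer reads roughly $n/2^i$ first-layer outputs for bit $i$, which adds $O(n)$ edges in total.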
We are now in a position to present circuits for the sum-reduction problem. Using the techniques developed in [7, 9], we can reduce the number of operands that have to be added from $m = pr$ to $p\lceil\log_2(r+1)\rceil$ by using $(r, \lceil\log_2(r+1)\rceil)$-counters. This reduction will be used repeatedly to reduce the number of operands. Note that the larger the ratio $r/\lceil\log_2(r+1)\rceil$ can be made, given the constraints on the fan-in of the circuits, the faster we will be able to perform the sum-reduction operation.
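A single reduction stage of this kind can be sketched in Python as follows. The sketch works on nonnegative integers rather than on gates: within each group of $r$ operands it counts the ones in every bit column and spreads the bits of each count over $\lceil\log_2(r+1)\rceil$ new operands. The function name `reduce_stage` is hypothetical, and the arithmetic count stands in for the parallel counter.

```python
def reduce_stage(operands, r):
    """Replace each group of r operands by ceil(log2(r+1)) operands with the same sum."""
    width = r.bit_length()  # equals ceil(log2(r + 1)) for r >= 1
    reduced = []
    for start in range(0, len(operands), r):
        group = operands[start:start + r]
        new_ops = [0] * width
        columns = max(x.bit_length() for x in group)
        for c in range(columns):
            # Number of ones in bit column c of this group (the counter's job).
            count = sum((x >> c) & 1 for x in group)
            for j in range(width):
                # Bit j of the count has weight 2^(c + j) in the total sum.
                new_ops[j] |= ((count >> j) & 1) << (c + j)
        reduced.extend(new_ops)
    return reduced
```

For example, `reduce_stage([13, 7, 9, 2, 15, 1, 8], 7)` returns three operands whose sum is still 55, matching the reduction from $pr$ to $p\lceil\log_2(r+1)\rceil$ operands with $p = 1$ and $r = 7$.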
We define the function $f(t)$ as the unique integer $x \le t$ that satisfies the condition […]
In other words, $r = f(t)$ achieves the largest possible value for $r/\lceil\log_2(r+1)\rceil$ for any integer $r \le t$. If multiple values of $x$ maximize the ratio, then $r$ is the smallest among them. The following lemma will be useful in our analysis.

The following theorem supplies a tradeoff scheme between depth and edge complexity in the $(m,n)$ sum-reduction problem. It also gives flexibility in choosing the fan-in of the gates used, which is not the case in Lemma 2.3. The main idea of the following theorem is to use Lemma 2.2 repeatedly to reduce the number of operands to a small number, and then use Lemma 2.3 to obtain the final result.

[…] is no more than $g^{-1}(k)$, the fan-in of the circuit that implements Phase 3 is at most equal to $g(g^{-1}(k)) = k$ from Lemma 2.3. Since the depth of each stage in Phases 2 and 3 is equal to two, the threshold circuit constructed above for the $(m,n)$ sum-reduction problem has depth $q + 2x + 4$, fan-in no more than $k$, and edge complexity $O(nm^{1+1/d}\log(d+1))$.
To find the depth of the circuit, we need to compute the numbers of stages $q$ and $x$ required for Phases 1 and […]
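The function $f(t)$ introduced earlier in this subsection can be computed by direct search. The Python sketch below assumes that the ratio being maximized is $r/\lceil\log_2(r+1)\rceil$, that is, the number of operands consumed per operand produced by an $(r, \lceil\log_2(r+1)\rceil)$-counter; this reading is inferred from the surrounding text, and the helper name `counter_width` is hypothetical.

```python
from fractions import Fraction


def counter_width(r):
    """Number of output bits of an r-input counter: ceil(log2(r + 1))."""
    return r.bit_length()


def f(t):
    """Smallest x <= t that maximizes x / counter_width(x)."""
    best, best_ratio = 1, Fraction(1, counter_width(1))
    for x in range(2, t + 1):
        ratio = Fraction(x, counter_width(x))
        if ratio > best_ratio:  # strict comparison keeps the smallest maximizer
            best, best_ratio = x, ratio
    return best
```

Under this assumption the preferred counter sizes tend to sit just below powers of two: for instance, $f(8) = 7$ and $f(16) = 15$, because one extra input past $2^j - 1$ costs an additional output bit.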
Iterated Addition
In this subsection, we turn our attention to the iterated addition problem, which is the problem of computing the sum of m n-bit integers.
Theorem 2.5 The sum of $m$ $n$-bit integers can be computed using a unit-weight threshold circuit of depth $2\log_k m + \log_k n +$ […]
Proof:
We first use the (m,n) sum-reduction circuit of Theorem 2.4 to reduce the number of operands from m to two, and then compute the sum of the two numbers using the adder of Theorem 2.1.
□
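The two-step structure of this proof, reduce to two operands and then add, can be sketched in Python as follows. This is an arithmetic illustration under stated assumptions rather than the circuit of Theorem 2.4: operands are nonnegative integers, groups of up to `r` operands are compressed with column counts, and Python's built-in addition stands in for the adder of Theorem 2.1. The names `reduce_to_two` and `iterated_add` are hypothetical.

```python
def reduce_to_two(operands, r=7):
    """Compress the operand list with column counters until at most two remain."""
    assert r >= 3, "groups of fewer than three operands cannot be compressed"
    ops, stages = list(operands), 0
    while len(ops) > 2:
        nxt = []
        for start in range(0, len(ops), r):
            group = ops[start:start + r]
            if len(group) < 3:
                nxt.extend(group)           # too small to compress; pass through
                continue
            width = len(group).bit_length() # output width of the counter used
            new_ops = [0] * width
            for c in range(max(x.bit_length() for x in group)):
                count = sum((x >> c) & 1 for x in group)
                for j in range(width):
                    new_ops[j] |= ((count >> j) & 1) << (c + j)
            nxt.extend(new_ops)
        ops, stages = nxt, stages + 1
    return ops, stages


def iterated_add(operands, r=7):
    """Sum the operands by sum-reduction followed by one two-operand addition."""
    pair, _ = reduce_to_two(operands, r)
    return sum(pair)  # stands in for the two-operand adder
```

The returned stage count gives a rough feel for how the group size `r` (bounded by the fan-in in the actual circuits) affects the number of reduction levels.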
A trivial lower bound on the depth required to perform iterated addition is $\log_k m + \log_k n$ since there are $mn$ input bits. The depth of our iterated addition circuit is optimal within a factor of $1 + o(1)$ when $\log_2 d = o(\log_k n)$ and $\log_2 m = o(\log n)$, and is optimal within a factor of $1.5 + o(1)$ when $\log_2 d = o(\log_k n)$ and $\log_2 m \approx \log_2 n$.
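The optimality factors quoted above can be checked with a short calculation. The derivation below is a sketch that uses the approximate depth $2\log_k m + \log_k n + 1.44\log_2 d$ stated in the Introduction and ignores lower-order terms; it also uses the fact that comparing $\log m$ with $\log n$ gives the same result in any fixed base.

```latex
% A gate at depth D with fan-in at most k depends on at most k^D inputs,
% and some output bit depends on all mn input bits.
k^{D} \ge mn
\quad\Longrightarrow\quad
D \ge \log_k (mn) = \log_k m + \log_k n .

% Ratio of the approximate circuit depth to this lower bound,
% assuming \log_2 d = o(\log_k n) in both cases:
\frac{2\log_k m + \log_k n + 1.44\log_2 d}{\log_k m + \log_k n}
\;\longrightarrow\;
\begin{cases}
1,   & \log_k m = o(\log_k n), \\[2pt]
1.5, & \log_k m \approx \log_k n .
\end{cases}
```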
Symmetric Functions
A Boolean function $f$ is said to be symmetric if $f(x_1, \ldots, x_n) = f(x_{\sigma(1)}, \ldots, x_{\sigma(n)})$ for any permutation $(x_{\sigma(1)}, \ldots, x_{\sigma(n)})$ of $(x_1, \ldots, x_n)$. An important property of symmetric Boolean functions is that they are completely specified by the number of ones in their inputs (that is, by the sum $\sum_{i=1}^{n} x_i$).
Therefore, the threshold circuits for iterated addition lead to efficient circuits for evaluating general symmetric functions. The depth of our circuit for $k = n$ is smaller than the depth of the circuit given in [3] by a factor of 4.17 asymptotically. Theorem 2.4 also provides a mechanism to trade off depth for edge complexity with any restricted fan-in $k$ not exceeding $m$. The depth of our circuit for symmetric function evaluation is optimal within a factor of $3 + o(1)$ from the trivial lower bound $\log_k n$ when $\log_2 d = o(\log_k n)$.
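The link between symmetric functions and counting can be illustrated with a small Python sketch. It represents a symmetric function of $n$ inputs by a table of $n + 1$ values indexed by the number of ones, and evaluates the function by counting and looking up. Python's `sum` stands in for the iterated addition of $n$ one-bit numbers; the names `evaluate_symmetric`, `table_from_function`, `parity_table`, and `majority_table` are hypothetical.

```python
def evaluate_symmetric(value_by_count, bits):
    """value_by_count[s] is the function value when exactly s inputs are 1."""
    return value_by_count[sum(bits)]


def table_from_function(func, n):
    """Recover the value table of a symmetric function from n + 1 sample inputs."""
    # The input with s leading ones represents every input containing s ones.
    return [func([1] * s + [0] * (n - s)) for s in range(n + 1)]


def parity_table(n):
    return [s % 2 for s in range(n + 1)]


def majority_table(n):
    return [int(2 * s > n) for s in range(n + 1)]
```

For instance, `evaluate_symmetric(parity_table(4), [1, 0, 1, 1])` looks up entry 3 of the parity table, so the whole evaluation reduces to producing the count.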
Multiplication
The results obtained in Section 2 for iterated addition give rise to a fast and edge-efficient multiplier that uses threshold gates of restricted fan-in, as described in the following theorem.

[…] Let $X = (x_{n-1}, \ldots, x_1, x_0)_2$ and $Y = (y_{n-1}, \ldots, y_1, y_0)_2$ be the two integers to be multiplied. We will transform the problem into the problem of finding the sum of $n$ $n$-bit numbers by means of bit-matrix reduction (i.e., the grade-school method). The binary numbers $p_{i,j} = x_i \wedge y_j$, $i, j = 0, 1, \ldots, n-1$, can be computed using a unit-weight threshold circuit of depth one. We then have […]

Siu et al. [13, 14] have given multiplication threshold circuits of restricted fan-in, which require depth $(7\log_2(d+1) + 4)\log_k n + o(\log d \log_k n) + O(1)$, edge complexity […], and fan-in $k$ for any integer $d \ge 1$. Theorem 4.1 improves on the results in [14] by reducing the required depth by a factor of 3.67 asymptotically for circuits of fan-in $k$ (and $d = 1$) when $\log_k n$ is not a constant and by a factor of 4.86 asymptotically for circuits with similar fan-in and edge complexity when $d$ is large.
The depth of our circuit for multiplication is optimal within a factor of $3 + o(1)$ from a trivial lower bound $\log_k 2n$ when $\log_2 d = o(\log_k n)$. Since any multiplication circuit based on bit-matrix reduction has $n^2$ intermediate values, each of which may affect the most significant bit of the product, the depth of our circuit is optimal within a factor of $1.5 + o(1)$ from the lower bound for any multiplication circuit using bit-matrix reduction.
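The bit-matrix (grade-school) view of multiplication used in this section can be sketched in Python as follows. Each partial-product bit is the AND of one bit of each operand, written here as a two-input threshold test with threshold 2, and the product is the sum of the $n$ shifted rows. Python's `sum` stands in for the iterated addition circuit; the names `partial_product_rows` and `multiply` are hypothetical.

```python
def partial_product_rows(x, y, n):
    """Return the n rows of the bit matrix for two n-bit nonnegative integers."""
    rows = []
    for j in range(n):
        y_j = (y >> j) & 1
        row = 0
        for i in range(n):
            x_i = (x >> i) & 1
            # AND as a depth-one unit-weight threshold gate: fires iff both inputs are 1.
            p = int(x_i + y_j >= 2)
            row |= p << (i + j)  # the bit x_i AND y_j has weight 2^(i + j)
        rows.append(row)
    return rows


def multiply(x, y, n):
    """Multiply two n-bit integers by summing the rows of the bit matrix."""
    return sum(partial_product_rows(x, y, n))
```

The matrix contains $n^2$ partial-product bits, which is the quantity behind the lower bound for circuits based on bit-matrix reduction discussed above.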
Conclusion
We have proposed several threshold circuits to perform iterated addition and multiplication and to evaluate symmetric functions. Our constructions provide effective tradeoffs among edge complexity, circuit depth, and maximum fan-in through the flexibility provided in the choice of the parameters $k$ (fan-in) and $d$ (levels of hierarchy). Our circuits appear to be considerably more depth-efficient than the best previous circuits, assuming similar edge complexity and fan-in (or, alternatively, considerably more cost-effective for similar circuit depth). Moreover, the depths of all the circuits presented in this paper are optimal within a small constant factor under any fan-in restriction.
