Neural computation of arithmetic functions by Siu, Kai-Yeung & Bruck, Jehoshua
Neural Computation of Arithmetic Functions
KAI-YEUNG SIU AND JEHOSHUA BRUCK 
The basic processing unit of a neural network is a linear threshold element. It has been known that neural networks can be much more powerful than traditional logic circuits, assuming that each threshold element can be built at a cost comparable to that of AND, OR, NOT logic elements. Whereas any logic circuit of polynomial size (in n) that computes the product of two n-bit numbers requires unbounded delay, such computations can be done in a neural network with “constant” delay. We improve some known results by showing that the product of two n-bit numbers and sorting of n n-bit numbers can be computed by a polynomial-size neural network using only 4 and 5 unit delays, respectively. Moreover, the weights of each threshold element in our neural networks require O(log n)-bit (instead of n-bit) accuracy.
I. INTRODUCTION 
Neural networks can be viewed as circuits of highly interconnected parallel processing units called “neurons.” The most commonly used models of neurons are linear threshold gates or, when continuity or differentiability is required, elements with a sigmoid input-output function. Because of recent advances in VLSI technology, the neural network has also emerged as a new technology and has found wide application in many areas.
Much of the current research in neural networks is in the area of pattern classification and is concerned with developing efficient “learning” algorithms for adjusting interconnection weights adaptively to perform the desired classification. Heuristics such as the “back propagation algorithm” have obtained surprisingly good empirical results [1]. In this paper, we shall look at another area of application of neural networks. Our model of a neuron is the linear threshold gate, and the network architecture considered here is the layered feedforward network. We shall see how common arithmetic functions such as multiplication and sorting can be efficiently computed in a “shallow” neural network. Whereas the interconnection weights are modified adaptively for different inputs in pattern classification and the desired classification is usually only approximated, in our network the weights are fixed for all inputs and the desired function is computed exactly. We shall confine our attention to operations on numbers represented in binary and we assume the inputs are encoded in $\{+1, -1\}$ instead of $\{1, 0\}$. Little would change in our analysis if we adopted the conventional $\{1, 0\}$ encoding since the transformation $\{+1, -1\} \to \{1, 0\}$ can easily be done by $x \to (x + 1)/2$.
Manuscript received Nov. 14, 1989; revised March 14, 1990. This work was done while K.-Y. Siu was a research student associate at IBM Almaden Research Center and was supported in part by the Joint Services Program at Stanford University (US Army, US Navy, US Air Force) under Contract DAAL03-88-C-0011, and the Department of the Navy (NAVELEX) under Contract N00039-84-C-0211, NASA Headquarters, Center for Aeronautics and Space Information Sciences under Grant NAGW419-S6.
K.-Y. Siu is with the Information Systems Laboratory, Stanford, CA 94305, USA.
J. Bruck is with the IBM Research Division, Almaden Research Center, San Jose, CA 95120-4099.
IEEE Log Number 9039184.
The remainder of this paper is divided into seven major sections. In Section II, we review the classical model of a neuron, indicate the limitation of its capability and address the issues of sensitivity and dynamic range of parameters from the practical point of view. In Section III, we introduce a more practical model of a neuron in which we restrict the weights to be integers and the growth rate of the magnitudes of the weights to be at most polynomial in the size of the inputs. In Section IV, we consider a feedforward network of such neurons and indicate its unrestricted capability to compute any Boolean function. In Section V, we present some known lower-bound results on the classical implementation of arithmetic functions such as multiplication of two n-bit integers to indicate that unbounded delay is required using AND, OR, NOT logic elements. In Section VI, we show that our model of a feedforward neural network is very fast in computing arithmetic functions. In particular, sorting, sum of n n-bit numbers, and multiplication of two n-bit numbers can all be computed by a shallow neural network. The fact that these two functions can be computed in a “constant-depth” neural network was shown in [2] (see also [3]); however, their construction is not depth-efficient and it is not explicitly stated how many constant layers are needed in each step of their construction. We shall see how the constant can be reduced by a more depth-efficient construction and by using the results in [4]. It has been known [5], [6] that more complicated arithmetic functions such as exponentiation and division can be computed in a constant-depth neural network. We shall only review the technique of reducing division to exponentiation and refer interested readers to [5]. In the conclusion, we indicate some possible extension of these results and other directions of research.
PROCEEDINGS OF THE IEEE, VOL. 78, NO. 10, OCTOBER 1990 1669
II. CLASSICAL MODEL OF A NEURON 
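The classical model of a neuron [7] is a linear threshold device, which computes a linear combination of the inputs, compares the value with a threshold, and outputs +1 (or -1) if the value is larger (or smaller) than the threshold. More formally, we have
Input:
$\vec{x} = (x_1, \ldots, x_n) \in \mathbb{R}^n$
Parameters:
weights $\vec{w} = (w_1, \ldots, w_n) \in \mathbb{R}^n$
threshold $\theta \in \mathbb{R}$
Output:
$$f(\vec{x}) = \operatorname{sgn}\Big\{\sum_{i=1}^{n} w_i x_i - \theta\Big\}$$
where
$$\operatorname{sgn}\{y\} = \begin{cases} +1 & \text{if } y \ge 0 \\ -1 & \text{otherwise.} \end{cases}$$
In this paper, we consider only Boolean inputs $\vec{x} \in \{+1, -1\}^n$. It is easy to see that logic elements such as AND, OR, NOT can be simulated by a neuron:
$$\operatorname{sgn}(x_1 + \cdots + x_n - n) = \mathrm{AND}(x_1, \ldots, x_n) = \begin{cases} +1 & \text{iff all } x_i = +1 \\ -1 & \text{otherwise} \end{cases}$$
$$\operatorname{sgn}(x_1 + \cdots + x_n + n - 1) = \mathrm{OR}(x_1, \ldots, x_n) = \begin{cases} +1 & \text{iff some } x_i = +1 \\ -1 & \text{otherwise} \end{cases}$$
$$\operatorname{sgn}(-x) = \mathrm{NOT}(x) = \begin{cases} +1 & \text{if } x = -1 \\ -1 & \text{if } x = +1. \end{cases}$$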
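The three gate identities above can be checked exhaustively for small n. The following is a minimal sketch, assuming the convention sgn(0) = +1 stated in the definition; the helper names `neuron`, `AND`, `OR`, and `NOT` are chosen here for illustration and do not come from the paper.

```python
from itertools import product

def sgn(y):
    # Convention from the definition above: sgn(y) = +1 if y >= 0, else -1.
    return 1 if y >= 0 else -1

def neuron(weights, threshold, x):
    # Linear threshold element: sgn(w . x - threshold).
    return sgn(sum(w * xi for w, xi in zip(weights, x)) - threshold)

def AND(x):
    n = len(x)
    return neuron([1] * n, n, x)          # sgn(x_1 + ... + x_n - n)

def OR(x):
    n = len(x)
    return neuron([1] * n, -(n - 1), x)   # sgn(x_1 + ... + x_n + n - 1)

def NOT(x):
    return neuron([-1], 0, [x])           # sgn(-x)

# Exhaustive check over all inputs in {+1, -1}^3.
for x in product([+1, -1], repeat=3):
    assert AND(x) == (1 if all(v == 1 for v in x) else -1)
    assert OR(x) == (1 if any(v == 1 for v in x) else -1)
assert NOT(+1) == -1 and NOT(-1) == +1
```

All weights here are +1 or -1 and the thresholds are at most n in magnitude, which is why these gates also fit the restricted-weight model discussed later.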
A Boolean function that can be realized by a neuron is called a linear threshold function. However, the class of linear threshold functions only constitutes a vanishingly small subclass of the totality of Boolean functions. In fact, there are $2^{2^n}$ Boolean functions in n variables, but only $2^{O(n^2)}$ are linear threshold functions. On the other hand, since any Boolean function can be implemented by a network of AND, OR, NOT elements, it follows that a network of neurons can implement any Boolean function. Note that we have not yet made any restriction on the size of the network, i.e., the number of elements in the network. In general, an arbitrary Boolean function must require the size of the network to grow exponentially large with the number of input variables. Later on, we shall see that some functions that can be computed by a network with a polynomial number of AND, OR, NOT elements require an unbounded number of delays, whereas only a constant delay is needed if computed by a neural network.
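For very small n, the gap between all Boolean functions and linear threshold functions can be seen by brute force. The sketch below is only an illustration: it enumerates truth tables of the form sgn(w·x + w0) with small integer weights (the search bound `bound` is an assumption made here, sufficient for n = 2) and confirms that EXCLUSIVE-OR is not among them.

```python
from itertools import product

def sgn(y):
    return 1 if y >= 0 else -1

def threshold_functions(n, bound):
    """Truth tables realizable as sgn(w.x + w0) with integers |w_i|, |w0| <= bound."""
    inputs = list(product([+1, -1], repeat=n))
    tables = set()
    for w in product(range(-bound, bound + 1), repeat=n + 1):
        w0, ws = w[0], w[1:]
        tables.add(tuple(sgn(sum(a * b for a, b in zip(ws, x)) + w0)
                         for x in inputs))
    return inputs, tables

inputs, tables = threshold_functions(2, bound=2)
xor = tuple(1 if x[0] != x[1] else -1 for x in inputs)
print(len(tables), "threshold functions out of", 2 ** (2 ** 2))
print("XOR is a threshold function:", xor in tables)
```

For n = 2 the two missing functions are EXCLUSIVE-OR and its complement; the fraction of threshold functions shrinks rapidly as n grows, in line with the $2^{O(n^2)}$ versus $2^{2^n}$ count above.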
In the definition of the classical model of a neuron, the weights can take on real values. Since we would like to implement a neuron using an analog device, from a practical point of view it is important to see if the assumption of real valued weights is necessary. In other words, can all linear threshold functions be realized if the weights are of finite precision? Actually, it was known [8] that each of the weights in a linear threshold function of n variables can be assumed to be integers of O(n log n) bits. However, this still allows the weights to grow exponentially fast with the number of input variables. In fact, most linear threshold functions have weights that must grow exponentially fast. This fact can also be interpreted as the necessity of high accuracy and high sensitivity of parameters in the actual implementation of a neuron. Motivated by this consideration, in the next section we consider a more practical model of a neuron, in which the weights are restricted to grow only polynomially fast.
III. MORE PRACTICAL MODEL OF A NEURON
In the following, we consider a restricted class of neurons, which is more practical as a computational model. Each function $f(X) = \operatorname{sgn}(\sum_{i=1}^{n} w_i \cdot x_i + w_0)$ computed in this subclass is characterized by the property that the weights $w_i$ are integers and bounded by a polynomial in the number of input variables, that is, $|w_i| \le n^c$ for some constant c > 0. For convenience, we refer to this restricted model of a neuron as an $\widehat{LT}_1$ element.
Since the weights in an $\widehat{LT}_1$ element are assumed to be polynomially large integers, this means that we only require O(log n)-bit accuracy in each weight. Thus in actual analog implementation, the device is much less sensitive to small fluctuations of parameters than the classical model. Note that the logic elements AND, OR, NOT are also $\widehat{LT}_1$ elements (see Section II). A natural question to ask is how limited in capability are $\widehat{LT}_1$ elements in comparison with the classical model? In [4], it was shown that any classical neuron can be simulated by three layers of a polynomial number of $\widehat{LT}_1$ elements. In other words, we can trade off exponentially large weights with a polynomial increase in size and a constant increase in delay by a factor of three. Hence any function that can be computed by a network of a polynomial number of classical neurons with constant delay can also be computed by a network of $\widehat{LT}_1$ elements with constant delay and polynomial increase in size. This leads naturally to the consideration of the computational capability of a feedforward neural network of $\widehat{LT}_1$ elements.
IV. FEEDFORWARD NEURAL NETWORK 
A feedforward network is a network of interconnected functional elements from a class $C$ of Boolean functions (taking $\{+1, -1\}$-valued inputs to a $\{+1, -1\}$-valued output) with no feedback. More formally, we define a feedforward network to be an acyclic labeled directed graph, with
• a list of $n_{in}$ distinguished input nodes with indegree 0
• internal nodes with arbitrary indegree which compute functional gates $\in C$ of the outputs from precedent nodes
• a list of $n_{out}$ distinguished output nodes.
The depth of a node v is defined to be the length of the longest path (each edge is a unit length) from the input nodes to v. The depth of the network is defined to be the maximum depth of all output nodes. If we group all gates with the same depth together, we can consider the network to be arranged in layers, where the depth of the network is equal to the number of layers (excluding the input layer) in the network, and gates of the same layer are computed in parallel. Given an assignment of the input nodes from domain $\{+1, -1\}^{n_{in}}$, the value of the network at each output node is obtained by evaluation of the gates in increasing depth order. The network therefore defines a mapping from $\{+1, -1\}^{n_{in}}$ to $\{+1, -1\}^{n_{out}}$ and the depth of the network can be interpreted as the time for its parallel execution of the mapping.
We define a neural network to be a feedforward network of $\widehat{LT}_1$ elements. Similarly, a logic circuit is a feedforward network of AND, OR, NOT logic gates. Obviously, any Boolean function can be computed by a logic circuit (without any restriction on its size) and thus by a neural network, since AND, OR, NOT are also $\widehat{LT}_1$ elements.
Loosely speaking, a network is shallow if it has small depth. Before we show how a shallow neural network can compute arithmetic functions, we first review the classical implementation using AND, OR, NOT logic elements and the limitation of constant depth circuits of such elements. This is the subject of the next section.
V. CLASSICAL IMPLEMENTATION OF ARITHMETIC FUNCTIONS
It is well known among experienced circuit designers that they cannot implement some common functions such as parity and multiplication with small programmable logic arrays (PLAs), a type of integrated circuit used inside microprocessors to compactly represent many functions. Thus it is of both practical and theoretical interest to see how large the size of logic circuits must be to compute such common functions. It turns out that PLAs are well modeled by bounded-depth circuits of AND, OR, NOT logic elements with arbitrary fan-in. In 1961, Lupanov [9] studied bounded-depth circuits and showed that parity circuits of depth 2 must have an exponential number of gates. A breakthrough in theoretical research occurred in 1981 [10]; Furst et al. showed that any bounded-depth logic circuit computing parity must use more than a polynomial number of gates with arbitrary fan-in. This lower-bound result was further improved by several researchers [11], [12], who showed that an exponential number of gates is necessary to implement the parity function in a bounded-depth logic circuit. All these results can be interpreted as proofs that any PLA implementing parity must have an exponential amount of chip area, thus establishing a basis for the common belief among circuit designers.
Another way of interpreting these results is that any parity circuit which uses a polynomial amount of chip area must have unbounded delay. By introducing the notion of constant-depth reduction [2], similar results can be shown for other common functions such as multiplication and division. In fact, currently used multipliers require O(log n) delays for n-bit input numbers. The lower-bound results also imply that the minimum possible delay for multipliers of polynomial size is $\Omega(\log n / \log \log n)$.
We can explain the preceding negative results by the fact that the basic processing logic elements AND, OR, NOT of the circuits are not powerful enough. In practical implementation, these logic elements are built using analog devices; perhaps we can build a more powerful gate out of analog devices to increase the computational power of the circuit? In Section III, because of some issues of implementation, we introduced a new model of a neuron, called an $\widehat{LT}_1$ element, as the basic building block in our neural network. In fact, the main theme of this paper is to see how a shallow neural network of polynomial size can compute common functions such as multiplication and sorting with small constant delay.
VI. COMPUTING WITH SHALLOW NEURAL NETWORKS
In this section we focus on the computational capability of the feedforward neural network model introduced in Section IV. We assume that each neuron takes a unit delay to compute and we consider only neural networks of polynomial size. We shall see how the product of two n-bit numbers and sorting of n n-bit numbers can be computed with only 4 and 5 unit delays, respectively. Since our construction of the "neural multiplier" generalizes a known technique of computing symmetric functions with a neural network, we first show how any symmetric function can be computed in two layers of neural networks [13], [14].
A. Computing a Symmetric Function 
Definition: A Boolean function f is said to be symmetric if
$$f(x_1, \ldots, x_n) = f(x_{(1)}, \ldots, x_{(n)})$$
for any permutation $(x_{(1)}, \ldots, x_{(n)})$ of $(x_1, \ldots, x_n)$, or equivalently, there exists a set of numbers $\{k_1, \ldots, k_l\}$, $|k_j| \le n$, such that
$$f(x_1, \ldots, x_n) = 1 \quad \text{iff} \quad \sum_{i=1}^{n} x_i \in \{k_1, \ldots, k_l\}.$$
In other words, a symmetric function depends only on the sum of input values. Using the same notation, let
$$y_{k_j} = \operatorname{sgn}\Big\{\sum_{i=1}^{n} x_i - k_j\Big\}, \qquad \tilde{y}_{k_j} = \operatorname{sgn}\Big\{k_j - \sum_{i=1}^{n} x_i\Big\}.$$
The first layer of our network consists of neurons which compute the values $y_{k_j}$ and $\tilde{y}_{k_j}$. In the second layer, the output neuron takes as inputs $y_{k_j}$, $\tilde{y}_{k_j}$ and outputs $\operatorname{sgn}\{\sum_{j=1}^{l} (y_{k_j} + \tilde{y}_{k_j}) - 1\}$. If $\sum_{i=1}^{n} x_i \notin \{k_1, \ldots, k_l\}$, then $y_{k_j} = -\tilde{y}_{k_j}$ for all $j = 1, \ldots, l$. Thus, $\sum_{j=1}^{l} (y_{k_j} + \tilde{y}_{k_j}) = 0$ and output = -1. On the other hand, if $\sum_{i=1}^{n} x_i = k_j$ for some $k_j \in \{k_1, \ldots, k_l\}$, then $y_{k_j} = \tilde{y}_{k_j} = 1$ and $y_{k_i} = -\tilde{y}_{k_i}$ for $i \ne j$. Thus, output $= \operatorname{sgn}\{\sum_{j=1}^{l} (y_{k_j} + \tilde{y}_{k_j}) - 1\} = \operatorname{sgn}\{2 - 1\} = 1$. Hence our network correctly computes the desired symmetric function. Since the parity function is symmetric, it follows from the above results that parity can be computed in two layers of neural network, whereas it takes unbounded delay to compute parity in a logic circuit. Figure 1 illustrates a two-layer network for computing the parity function of three variables. Note that
Parity$(x_1, x_2, x_3) = 1$ iff the number of $x_i$'s equal to +1 is odd
iff $x_1 + x_2 + x_3 = -1$ or $3$.
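The two-layer construction for symmetric functions can be simulated directly. The sketch below assumes the first-layer gates are sgn(Σx − k) and sgn(k − Σx) for each target sum k, with sgn(0) = +1, and uses parity as the test function; `symmetric_two_layer` and `parity_sums` are illustrative names, not taken from the paper.

```python
from itertools import product

def sgn(y):
    return 1 if y >= 0 else -1

def symmetric_two_layer(x, ks):
    """Two-layer threshold network for f(x) = +1 iff sum(x) is in ks.

    ks must contain distinct integers.
    """
    s = sum(x)
    # First layer: 2*len(ks) threshold gates, all with unit input weights.
    y = [sgn(s - k) for k in ks]          # +1 iff sum(x) >= k
    y_tilde = [sgn(k - s) for k in ks]    # +1 iff sum(x) <= k
    # Second layer: the inner sum is 2 if sum(x) equals some k, else 0.
    return sgn(sum(a + b for a, b in zip(y, y_tilde)) - 1)

def parity_sums(n):
    # Sums of n values in {+1, -1} with an odd number of +1's: 2*t - n, t odd.
    return [2 * t - n for t in range(1, n + 1, 2)]

# Parity of three variables: target sums are -1 and 3, as in the text.
print(parity_sums(3))
for x in product([+1, -1], repeat=3):
    odd = sum(1 for v in x if v == 1) % 2 == 1
    assert symmetric_two_layer(x, parity_sums(3)) == (1 if odd else -1)
```

The first layer uses at most 2(n + 1) gates for any symmetric function of n variables, since the sum can take only n + 1 distinct values.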
On closer observation, it is evident that the above construction also holds for any Boolean function $f(x_1, \ldots, x_n)$ whose value only depends on a weighted sum of the variables $\sum_{i=1}^{n} w_i \cdot x_i$, where the weights $w_i$ are integers and polynomially bounded. Thus any such function can be computed in two layers of neural networks. Also notice that since we only use the output neuron to compute the linear combination of the outputs from the first layer, which only takes on value +1 or -1, it is redundant to compute the sgn(·) after computing the linear combination. We shall make use of these two observations in the next section.
Fig. 1. Two-layer neural network for computing the Parity$(x_1, x_2, x_3)$ function.
B. Addition and Multiplication 
Using the carry-look-ahead method, it was known that the sum of two n-bit numbers can be computed in a bounded-depth logic circuit of polynomial size with arbitrary fan-in. In fact, it was shown in [4] that a two-layer neural network suffices. This result is based on ideas from harmonic analysis of Boolean functions and we refer interested readers to [4], [15] for more details. Since the least significant bit of the sum is the EXCLUSIVE-OR function of the least significant bits of the two numbers, which is not a linear threshold function, it follows that the sum cannot be computed using only one layer. Thus a two-layer neural network is depth-optimal.
Whereas the sum of two n-bit numbers can be computed in bounded-depth logic circuits, the results in [10] imply that the sum of n n-bit numbers cannot be computed with bounded delay. However, such computations can be done with bounded delay in a neural network. In the following, we first show how to compute the sum of n (log n)-bit numbers in two layers by generalizing the techniques of computing symmetric functions. Based on this technique, we then show how to reduce the sum of n n-bit numbers to that of two O(n)-bit numbers using two layers. The results in [4] imply that two more layers suffice to compute the final sum. Afterwards we shall see how to combine the second and the third layers. Finally, we show how the product of two n-bit numbers can be reduced to the sum of n 2n-bit numbers using one more layer, so that altogether only four layers are needed to compute the product.
1) Computing the Sum of n (log n)-bit Numbers with Two Layers: Given n (log n)-bit numbers, say in binary representation, $z_i = z_{i,\log n} \cdots z_{i,1}$, for $i = 1, \ldots, n$, we would like to compute the binary representation of their sum
$$s = \sum_{i=1}^{n} z_i = \sum_{j=1}^{\log n} 2^{j-1} (z_{1j} + z_{2j} + \cdots + z_{nj}).$$
Clearly, s is a polynomially bounded weighted sum of the variables $z_{ij}$ for $i = 1, \ldots, n$ and $j = 1, \ldots, \log n$. Thus, each bit of the binary representation of the sum s can be regarded as a Boolean function that depends only on a polynomially bounded weighted sum of $n \times \log n$ input variables. From the first remark given at the end of Section VI-A, any such function can be computed using 2 layers.
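To make the argument concrete, the sketch below computes each output bit of the sum as a function of the single weighted sum s, using the same pair-of-thresholds trick as for symmetric functions. It is an illustration under stated assumptions: bits are taken in {0, 1} rather than {+1, -1}, bit positions are 0-based (so the weight 2**j plays the role of $2^{j-1}$ in the text), and the weighted sum is evaluated once in the code although each first-layer gate would compute it itself. The function names are hypothetical.

```python
def sgn(y):
    return 1 if y >= 0 else -1

def sum_bit(bits, b):
    """Bit b (0 = least significant) of the sum of the numbers in `bits`.

    bits[i][j] in {0, 1} is bit j of the i-th number.
    """
    n, width = len(bits), len(bits[0])
    # Weighted sum with weights 2**j < 2**width, polynomial in n if width = log n.
    s = sum((2 ** j) * bits[i][j] for i in range(n) for j in range(width))
    s_max = n * (2 ** width - 1)
    # Target values of s for which output bit b equals 1.
    ks = [k for k in range(s_max + 1) if (k >> b) & 1]
    y = [sgn(s - k) for k in ks]          # first layer
    y_tilde = [sgn(k - s) for k in ks]    # first layer
    out = sgn(sum(a + c for a, c in zip(y, y_tilde)) - 1)   # second layer
    return (out + 1) // 2                 # map {+1, -1} back to {1, 0}

def to_bits(v, width):
    return [(v >> j) & 1 for j in range(width)]

def two_layer_sum(values, width):
    bits = [to_bits(v, width) for v in values]
    out_width = (len(values) * (2 ** width - 1)).bit_length()
    return sum(sum_bit(bits, b) << b for b in range(out_width))

# n = 8 numbers of log n = 3 bits each.
values = [3, 7, 5, 1, 6, 2, 7, 4]
assert two_layer_sum(values, 3) == sum(values)
```

The number of first-layer gates per output bit is at most twice the number of possible values of s, which is polynomial in n because s is at most n(2^{log n} − 1) < n².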
2) Reduction to the Sum of Two O(n)-bit Numbers: Suppose we are given n n-bit binary numbers $x_i = x_{i,n-1} x_{i,n-2} \cdots x_{i,0}$ for $i = 1, \ldots, n$ and we want to compute their sum. We shall see how to reduce this multiple sum to the sum of two numbers. Without loss of generality, we assume that $N = n/\log n$ and $\log n$ are integers, where log denotes logarithm to the base 2. Consider the following scheme: Partition each binary number $x_i$ into N consecutive blocks $\tilde{x}_{i0}, \tilde{x}_{i1}, \ldots, \tilde{x}_{i,N-1}$ of log n bits each so that
$$x_i = \sum_{j=0}^{N-1} \tilde{x}_{ij} \cdot 2^{j \log n}$$
where $0 \le \tilde{x}_{ij} < 2^{\log n}$. Note that in binary representation, $\tilde{x}_{i0} = x_{i,\log n - 1} \cdots x_{i,0}$ and $\tilde{x}_{i,N-1} = x_{i,n-1} x_{i,n-2} \cdots x_{i,n-\log n}$. We say a block $\tilde{x}_{ij}$ is "odd" or "even" if j is odd or even, respectively.
Let $S_{odd}$ denote the sum of the n numbers when the even blocks are set to zero and $S_{even}$ denote the sum when the odd blocks are set to zero. The sum of the original n numbers will be the sum of $S_{odd}$ and $S_{even}$. We now show how to compute $S_{odd}$ and $S_{even}$ in parallel using two layers. Observe that for each $j = 0, \ldots, N - 1$, the sum
$$S_j = \sum_{i=1}^{n} \tilde{x}_{ij} < \sum_{i=1}^{n} 2^{\log n} = 2^{2 \log n}$$
and thus $S_j$ can be represented in 2 log n bits. Observe that each $S_j$ is the sum of n (log n)-bit numbers. It follows from the previous section that the binary representation of each $S_j$ can be computed with two layers. Now
$$S_{odd} = \sum_{j\ \mathrm{odd}} S_j \cdot 2^{j \log n}, \qquad S_{even} = \sum_{j\ \mathrm{even}} S_j \cdot 2^{j \log n}.$$
Since $S_j$ can be represented in 2 log n bits, there is no overlapping in the binary representation between $S_j \cdot 2^{j \log n}$ and $S_{j+2} \cdot 2^{(j+2) \log n}$. Therefore, we can sum each odd block $S_j$ in parallel with two layers and concatenate the resulting bits of each sum together to obtain $S_{odd}$. We can obtain $S_{even}$ in a similar fashion in parallel. To sum the two O(n)-bit numbers $S_{odd}$ and $S_{even}$, another two layers suffice [4].
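The block-partition scheme can be checked numerically. The sketch below uses ordinary integer arithmetic in place of the threshold layers, so it only illustrates the bookkeeping: per-block sums fit in 2 log n bits, shifted sums of equal parity occupy disjoint bit ranges, and $S_{odd} + S_{even}$ equals the total. The function name `block_reduction` and the sample inputs are assumptions made for this illustration.

```python
def block_reduction(xs, n, logn):
    """Return (S_odd, S_even) for n-bit numbers xs, using blocks of logn bits."""
    N = n // logn                      # number of blocks per number
    mask = (1 << logn) - 1
    # S[j] = sum over all numbers of their j-th block; each fits in 2*logn
    # bits when there are at most n = 2**logn numbers.
    S = [sum((x >> (j * logn)) & mask for x in xs) for j in range(N)]
    assert all(s < (1 << (2 * logn)) for s in S)
    # Shifted block sums of equal parity occupy disjoint bit ranges, so OR-ing
    # them together is the same as concatenating their bits.
    s_odd = s_even = 0
    for j, s in enumerate(S):
        shifted = s << (j * logn)
        if j % 2:
            assert s_odd & shifted == 0
            s_odd |= shifted
        else:
            assert s_even & shifted == 0
            s_even |= shifted
    return s_odd, s_even

# Four 16-bit numbers, blocks of log n = 4 bits (sample values chosen here).
xs = [0xFFFF, 0x1234, 0xABCD, 0x8001]
s_odd, s_even = block_reduction(xs, n=16, logn=4)
assert s_odd + s_even == sum(xs)
```

Splitting into odd and even blocks is what leaves room for the carries: each block sum needs up to log n extra bits, and these spill only into the neighbouring block position, which is zero in the same-parity number.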
3) Combining the Second and Third Layers: Recall the second remark given at the end of Section VI-A. Since each output of the second layer is equal to a linear combination of the outputs from the first layer, which only takes on values +1 or -1, the sgn(·) in the second layer is not needed. Therefore we can directly feed the outputs from the first layer and take the linear combination as inputs to the third layer. As a result, the first three layers can be combined into two layers. So altogether only three layers are needed to compute the sum of n n-bit numbers.
A small numerical example will be helpful to illustrate the ideas. We take n = 16 and for simplicity, we only compute the sum of four 16-bit numbers. In Fig. 2, each of the four binary numbers $x_1, x_2, x_3, x_4$ is partitioned into four blocks $\tilde{x}_{i0}, \tilde{x}_{i1}, \tilde{x}_{i2}, \tilde{x}_{i3}$ of log n = 4 bits each, for $i = 1, \ldots, 4$. The even blocks $\tilde{x}_{i0}, \tilde{x}_{i2}$ are denoted by a dotted rectangle and the odd blocks $\tilde{x}_{i1}, \tilde{x}_{i3}$ are denoted by a solid rectangle.
Fig. 2. Computing a multiple sum.
4) Reducing the Product to a Multiple Sum: Computing the product of two n-bit binary numbers $x = x_{n-1} x_{n-2} \cdots x_0$, $y = y_{n-1} y_{n-2} \cdots y_0$ is equivalent to computing the sum of n 2n-bit binary numbers:
$$z_i = z_{i,2n-1} z_{i,2n-2} \cdots z_{i,0}, \qquad i = 0, \ldots, n - 1,$$
where
$$z_{ik} = \begin{cases} 0 & \text{if } (i + n \le k \le 2n - 1) \text{ or } (0 \le k < i) \\ x_{k-i} \wedge y_i & \text{if } i \le k < i + n \end{cases}$$
where $\wedge$ denotes the logic AND function. In other words,
$$z_i = \underbrace{0 \cdots 0}_{n-i} \, (x_{n-1} \wedge y_i)(x_{n-2} \wedge y_i) \cdots (x_0 \wedge y_i) \, \underbrace{0 \cdots 0}_{i}.$$
Given the two n-bit input binary numbers $x = x_{n-1} x_{n-2} \cdots x_0$, $y = y_{n-1} y_{n-2} \cdots y_0$, the first layer of our multiplier network outputs the n 2n-bit binary numbers $z_i = z_{i,2n-1} z_{i,2n-2} \cdots z_{i,0}$ and then computes the sum in three more layers. Thus the product of two n-bit numbers can be computed in four layers.
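The reduction from a product to a multiple sum is the familiar shift-and-add layout of partial products. The following sketch builds the n numbers $z_i$ bit by bit from the definition of $z_{ik}$ (with bits in {0, 1}) and checks that they add up to the product; `partial_products` is a name introduced here for illustration.

```python
def partial_products(x, y, n):
    """Return the n 2n-bit numbers z_0, ..., z_{n-1} whose sum is x * y."""
    xb = [(x >> k) & 1 for k in range(n)]
    yb = [(y >> k) & 1 for k in range(n)]
    zs = []
    for i in range(n):
        z = 0
        for k in range(2 * n):
            # z_{ik} = x_{k-i} AND y_i for i <= k < i + n, and 0 otherwise.
            bit = (xb[k - i] & yb[i]) if i <= k < i + n else 0
            z |= bit << k
        zs.append(z)
    return zs

zs = partial_products(13, 11, 4)
assert sum(zs) == 13 * 11
assert all(z < 2 ** 8 for z in zs)   # each z_i fits in 2n = 8 bits
```

Each bit $z_{ik}$ is a single AND of two input bits, which is why one layer suffices for this step before the three-layer multiple-sum network takes over.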
C. Sorting 
Here we shall see how sorting of n n-bit numbers can be 
computed in a neural network with depth 5. The techniques
are mainly based on the results in [2]. We assume that the 
input is a list of the n n-bit binary numbers and the output 
will be the same list sorted in nondecreasing order. A num- 
ber which appears m times in the input list will be dupli- 
cated m times in the output list. 
In sorting, the basic operation is the comparison of two numbers, i.e., given two n-bit binary numbers $x = x_n x_{n-1} \cdots x_1$, $y = y_n y_{n-1} \cdots y_1$, we want to compute whether $x \ge y$. It is tempting to conclude that comparison can be computed in a single layer since
$$x \ge y \iff \operatorname{sgn}\Big\{\sum_{i=1}^{n} 2^i \cdot (x_i - y_i)\Big\} = +1.$$
However, notice that the weights chosen above are exponential in n and thus do not satisfy the conditions in our definition of an $\widehat{LT}_1$ element. In fact, it was shown in [4] that the comparison function cannot be computed using a single $\widehat{LT}_1$ element, but it can be computed in two layers of $\widehat{LT}_1$ elements.
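The single-gate comparison with exponentially large weights is easy to write down, which is what makes it tempting. The sketch below implements that one gate (with bits in {0, 1}, bit $x_i$ stored at list index i − 1, and sgn(0) = +1) and reports the largest weight it needs; it does not attempt the two-layer polynomial-weight construction of [4]. The function name is hypothetical.

```python
def sgn(y):
    return 1 if y >= 0 else -1

def compare_single_gate(x, y, n):
    """+1 iff x >= y, using one threshold gate with weights +-2**i, i = 1..n."""
    xb = [(x >> (i - 1)) & 1 for i in range(1, n + 1)]   # xb[i-1] is bit x_i
    yb = [(y >> (i - 1)) & 1 for i in range(1, n + 1)]
    weights = [2 ** i for i in range(1, n + 1)]
    # The weighted sum equals 2 * (x - y), so its sign decides the comparison.
    return sgn(sum(w * (a - b) for w, a, b in zip(weights, xb, yb)))

n = 4
for x in range(2 ** n):
    for y in range(2 ** n):
        assert compare_single_gate(x, y, n) == (1 if x >= y else -1)
print("largest weight magnitude:", 2 ** n)
```

The largest weight is $2^n$, so it needs about n bits of precision rather than the O(log n) bits allowed for the restricted elements.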
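Let $z_i = z_{i,n} z_{i,n-1} \cdots z_{i,1}$, for $i = 1, \ldots, n$, denote the input binary numbers. Define
$$c_{ij} = \begin{cases} +1 & \text{if } z_i > z_j \text{ or } (z_i = z_j \text{ and } i \ge j) \\ -1 & \text{otherwise.} \end{cases}$$
Note that for each i, $p_i = \sum_{j=1}^{n} (1 + c_{ij})/2$ is the position of $z_i$ in the sorted list. If we let
$$EQ_m(p_i) = (\operatorname{sgn}\{p_i - m\} + \operatorname{sgn}\{m - p_i\}) - 1 = \begin{cases} +1 & \text{if } p_i = m \\ -1 & \text{otherwise} \end{cases}$$
then the kth bit of the mth number in the sorted list is
$$\bigvee_{i=1}^{n} \big(EQ_m(p_i) \wedge z_{ik}\big)$$
where $\vee$ and $\wedge$ respectively denote the OR and AND functions.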
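The rank-based sorting scheme can be traced with ordinary arithmetic. The sketch below follows the definitions of $c_{ij}$, $p_i$, and $EQ_m$ and assumes that the kth output bit is the OR over i of ($EQ_m(p_i)$ AND $z_{ik}$); it does not model the layer structure of the network, only the functions being computed. Bit positions are 0-based in the code, and `rank_sort` is an illustrative name.

```python
def sgn(y):
    return 1 if y >= 0 else -1

def rank_sort(zs, width):
    n = len(zs)
    # c[i][j] = +1 if z_i > z_j, or z_i == z_j and i >= j; -1 otherwise.
    c = [[1 if (zs[i] > zs[j] or (zs[i] == zs[j] and i >= j)) else -1
          for j in range(n)] for i in range(n)]
    # p[i] in 1..n is the position of z_i in the sorted list (ties by index).
    p = [sum((1 + c[i][j]) // 2 for j in range(n)) for i in range(n)]

    def eq(m, pi):
        # EQ_m(p_i) = sgn(p_i - m) + sgn(m - p_i) - 1, which is +1 iff p_i == m.
        return sgn(pi - m) + sgn(m - pi) - 1

    out = []
    for m in range(1, n + 1):
        value = 0
        for k in range(width):
            # k-th bit of the m-th output: OR over i of (EQ_m(p_i) AND z_ik).
            bit = any(eq(m, p[i]) == 1 and (zs[i] >> k) & 1 for i in range(n))
            value |= int(bit) << k
        out.append(value)
    return out

zs = [5, 3, 5, 0, 7, 3, 1, 6]
assert rank_sort(zs, 3) == sorted(zs)
```

The tie-breaking clause in $c_{ij}$ makes the positions $p_i$ a permutation of 1, ..., n even when the input contains duplicates, so every output slot is filled by exactly one input.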
In our neural network, the comparison functions $c_{ij}$ are
computed in the first two layers. The next two layers are 
used to compute 
D. Extensions to Other Functions 
It is natural to continue our study of neural networks on computation of more complicated arithmetic functions such as exponentiation, division, and extraction of square roots. In fact, it can be shown that multiplication of n n-bit numbers and division of two n-bit numbers can also be computed by a constant-depth neural network. In [5], it was shown how multiplication of n n-bit numbers can be computed in logic circuits of O(log n) depth, using the Chinese Remainder theorem. The basic idea of the construction is to hardwire in a polynomial size table of discrete logarithms for some prime powers and then reduce the problem to one of iterated addition. On closer observation, it is not hard to see how this construction of O(log n) depth logic circuits can be adapted to a construction of a constant-depth neural network. A presentation of such results would take us too far from the scope of this paper because of the necessary number-theoretic background. Moreover, the constant obtained by direct application of the algorithm in [5] will be too large for the resulting neural network to be considered shallow, and therefore the difference between log n and the constant in the delay is not significant unless the input numbers are astronomically large. These results are of theoretical importance, however, because they describe the fundamental difference in computation between neural networks and logic circuits. At this time, we are not able to reduce the constant to obtain a shallow neural network for division and multiple product. Below, we shall only indicate how division can be computed in a constant-depth neural network, provided that exponentiation can be computed in constant delay. (See [5] for more details.)
Suppose we are given two n-bit binary numbers x, y, and we wish to compute the n-bit representation of $\lfloor x/y \rfloor$, i.e., the greatest integer $\le x/y$. We shall assume $2 \le y < x$. Since x/y is equal to the product of x and $y^{-1}$, it is enough to get a finite underapproximation $\tilde{y}^{-1}$ of $y^{-1}$ with error $< 2^{-n}$. Then in a constant-depth neural network, we can compute $\tilde{q} = x \cdot \tilde{y}^{-1}$ with error < 1 and determine which one of $\lfloor \tilde{q} \rfloor$ or $\lfloor \tilde{q} \rfloor + 1$ is $\lfloor x/y \rfloor$.
Let $j \ge 2$ be an integer such that $2^{j-1} \le y < 2^j$. Note that $|1 - y 2^{-j}| \le 1/2$ and we can express $y^{-1}$ as a series expansion
$$y^{-1} = 2^{-j} \cdot \big(1 - (1 - y 2^{-j})\big)^{-1} = 2^{-j} \sum_{k=0}^{\infty} (1 - y 2^{-j})^k.$$
If we put
$$\tilde{y}^{-1} = 2^{-j} \sum_{k=0}^{n} (1 - y 2^{-j})^k$$
then the difference between $y^{-1}$ and $\tilde{y}^{-1}$ is less than $2^{-n}$. Since the exponentiation $(1 - y 2^{-j})^k$ is a special case of multiple products, we can compute them in parallel with a constant-depth network from the previous remarks and compute the multiple sum as shown in Section VI-B.
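The reduction of division to exponentiation can be checked with exact rational arithmetic. The sketch below is not a neural network; it only verifies the numerical claims, under the assumption that the series is truncated after the terms $k = 0, \ldots, n$ (a cut-off chosen here because it provably gives error below $2^{-n}$). `floor_divide` is an illustrative name.

```python
from fractions import Fraction

def floor_divide(x, y, n):
    """floor(x / y) for n-bit integers with 2 <= y < x, via a truncated series."""
    j = y.bit_length()                    # 2**(j-1) <= y < 2**j
    u = 1 - Fraction(y, 2 ** j)           # 0 < u <= 1/2
    # Under-approximation of 1/y: 2**-j * sum_{k=0}^{n} u**k (cut-off assumed).
    y_inv = Fraction(1, 2 ** j) * sum(u ** k for k in range(n + 1))
    assert 0 < Fraction(1, y) - y_inv < Fraction(1, 2 ** n)
    q = x * y_inv                         # x/y - 1 < q < x/y
    lo = q.numerator // q.denominator     # floor(q)
    # floor(x/y) is lo or lo + 1; one multiplication and comparison decides.
    return lo + 1 if (lo + 1) * y <= x else lo

n = 6
for x in range(3, 2 ** n):
    for y in range(2, x):
        assert floor_divide(x, y, n) == x // y
```

Each power $u^k$ is an iterated product and the outer sum is a multiple sum, which is how the division problem is handed off to the constant-depth constructions for those two operations.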
Since evaluation of the numerator and denominator of 
a rational function involves computing a sum of multiple 
products in constant depth, applying the numerator and 
denominator to the division network gives the value of the 
rational function. As a result, any rational function can be 
computed in constant delay by neural networks. In general, 
we can conclude that an analytic function which is well 
approximated by a truncated power series can be evaluated 
by a constant-depth neural network. 
VII. CONCLUSION 
We have introduced a restricted model of a neuron, which 
is more practical as a model of computation than the classical
model. We define our model as a feedforward network
of such neurons. We have shown how common arithmetic 
functions such as multiple addition, multiplication, and 
sorting can be computed by a polynomial-size shallow 
neural network with 3, 4, and 5 unit delays, respectively, 
whereas it was known that these functions cannot be com- 
puted in constant-depth logic circuits. Applying the results 
in [5], we also indicated how these results can be extended 
to more complicated functions such as multiple products, 
division, rational functions, and approximation of analytic 
functions. 
A natural continuation of our study is to consider even more complicated functions such as indicator functions for graph connectivity, bipartite matching, and network flow, all of which have well-known polynomial time algorithms. Another direction of research is to obtain lower-bound results in order to obtain a depth-optimal neural network. In fact, it was shown in [13] that computing the product of two n-bit numbers requires more than two layers, whereas our depth-4 multiplier network provides an upper bound of four layers. However, the well-known lower-bound techniques for unbounded fan-in logic circuits appear to break down completely in the case of neural networks, where the linear threshold elements are the basic processing units. Even though there are many candidate functions which appear not to be computable by a polynomial-size depth-3 neural network, at present no such function is explicitly proved to exist. It is of theoretical interest to note here that any function computable in a polynomial-size constant-depth logic circuit with unbounded fan-in is also computable in a depth-3 neural network of superpolynomial, that is, $n^{O(\log n)}$ (instead of exponential), size [16]. Therefore, proving that our multiplication and sorting networks are depth-optimal will be a difficult task and new lower-bound techniques in circuit complexity for networks of linear threshold elements have to be developed.
The moral of our study is that circuits based on threshold elements could be extremely powerful. Of course, these hopes are based on the assumption that a linear threshold ($\widehat{LT}_1$) element can be implemented using analog devices whose unit cost is small. This would justify research in device technology to investigate the feasibility of building such elements with small cost.
ACKNOWLEDGMENT 
The first author would like to thank Prof. Thomas Kailath 
for his guidance, constant encouragement, and financial 
support. 
REFERENCES 
[1] J. L. McClelland, D. E. Rumelhart, and the PDP Research Group, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 1. MIT Press, 1986.
[2] A. K. Chandra, L. Stockmeyer, and U. Vishkin, "Constant depth reducibility," SIAM J. Comput., vol. 13, pp. 423-439, 1984.
[3] N. Pippenger, "The complexity of computations by networks," IBM J. Res. Develop., vol. 31, no. 2, pp. 235-243, Mar. 1987.
[4] K. Y. Siu and J. Bruck, On the Dynamic Range of Linear Threshold Elements, Tech. Rep. RJ 7237, IBM Research, Jan. 1990 (to be submitted to SIAM J. Discrete Math.).
[5] P. W. Beame, S. A. Cook, and H. J. Hoover, "Log depth circuits for division and related problems," SIAM J. Comput., vol. 15, pp. 994-1003, 1986.
[6] J. Reif, "On threshold circuits and polynomial computation," Proc. 2nd Ann. Structure in Complexity Theory Symp., pp. 118-123, 1987.
[7] M. Minsky and S. Papert, Perceptrons. MIT Press, expanded edition, 1988.
[8] P. Raghavan, Learning in Threshold Networks: A Computation Model and Applications, Tech. Rep. RC 13859, IBM Research, July 1988.
[9] O. Lupanov, "Implementing the algebra of logic functions in terms of constant-depth formulas in the basis +, *, -," Sov. Phys. Dokl., vol. 6, no. 2, 1961.
[10] M. Furst, J. B. Saxe, and M. Sipser, "Parity, circuits and the polynomial-time hierarchy," Proc. IEEE Symp. Found. Comp. Sci., vol. 22, pp. 260-270, 1981.
[11] J. Hastad, "Almost optimal lower bounds for small depth circuits," Proc. ACM Symp. Theor. Computing, vol. 18, pp. 6-20, 1986.
[12] R. Smolensky, "Algebraic methods in the theory of lower bounds for Boolean circuit complexity," Proc. ACM Symp. Theor. Computing, vol. 19, pp. 77-82, 1987.
[13] A. Hajnal, W. Maass, P. Pudlak, M. Szegedy, and G. Turan, "Threshold circuits of bounded depth," IEEE Symp. Found. Comp. Sci., vol. 28, pp. 99-110, 1987.
[14] J. Bruck, "Harmonic analysis of polynomial threshold function," SIAM J. Discrete Math., vol. 3, no. 2, pp. 168-177, May 1990.
[15] J. Bruck and R. Smolensky, Polynomial Threshold Functions, AC0 Functions and Spectral Norms, Tech. Rep. RJ 7140, IBM Research, Nov. 1989; to appear IEEE Symp. Found. Comp. Sci., 1990.
[16] E. Allender, "A note on the power of threshold circuits," in IEEE Symp. Found. Comp. Sci., vol. 30, 1989.
Kai-Yeung Siu was born in Hong Kong on October 9, 1966. He received the B.Sc. degree in mathematics and computer science from New York University, NY, and the B.Eng. degree in electrical engineering from The Cooper Union, NY, both in 1987. In June 1988, he received the M.Sc. degree in electrical engineering from Stanford University, CA.
He is currently associated with the Information Systems Laboratory at Stanford University and pursuing his Ph.D. degree under the guidance of Prof. Thomas Kailath. He is also a research student associate with the Computer Science Department at IBM Almaden Research Center, San Jose, CA. His research interests include computational complexity theory, neural networks and parallel computation.
Jehoshua Bruck was born in Haifa, Israel, on April 19, 1956. He received the B.Sc. and M.Sc. degrees in electrical engineering from the Technion, Israel Institute of Technology, in 1982 and 1985, respectively, and the Ph.D. degree in electrical engineering from Stanford University in 1989.
From 1982 to 1985 he was with the IBM Haifa Scientific Center, Israel. In March, 1989, he joined the IBM Research Division at the Almaden Research Center, San Jose, CA, where he is presently a Research Staff Member.
Dr. Bruck’s research interests include error-correcting codes, fault-tolerant computing, parallel computing, and neural networks.
