In this paper we investigate the reduction of size for small-depth feed-forward linear threshold networks performing binary addition, comparison, and related functions. For n-bit operands we propose a depth-3 network of $O(n^2/\log n)$ asymptotic size for binary addition with polynomially bounded weights. We also propose a depth-3 addition network of optimal $O(n)$ asymptotic size and a depth-2 comparison network of $O(\sqrt{n})$ asymptotic size, both with weight values of $O(2^{\sqrt{n}})$ asymptotic size. For existing architectural formats we show that our schemes, with networks of equal or smaller depth, substantially outperform existing schemes in terms of size and fan-in requirements and, on occasion, in weight requirements.
I. Introduction and Main Results
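A linear threshold gate with a Boolean output $F(X)$ is defined by:
$$F(X) = \operatorname{sgn}(\mathcal{F}(X)) = \begin{cases} 0 & \text{if } \mathcal{F}(X) < 0 \\ 1 & \text{if } \mathcal{F}(X) \geq 0 \end{cases}$$
where $\mathcal{F}(X)$ is a weighted sum of the gate inputs.
In this paper we investigate networks based on feed-forward linear threshold gates for addition and addition-related operations. Regarding such operations, the following has been established using threshold-logic-based parallel networks: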
For the binary addition, Siu et al. [4], [10] suggested that each bit of the sum is computable with depth-2 networks with a network size of $O(n^4)$ and that the network size can be reduced to $O(n^2)$ for depth-3 networks.
In [4] it has been indicated that the comparison function, performed on two operands of length n, can be computed in depth-2 networks with a size of $O(n^4)$. Further, it has been suggested that with depth-3 networks the comparison can be realized with a size of $O(n)$. Roychowdhury et al. [11] suggested that the comparison can be computed in depth-3 networks with a size of $O(n/\log n)$ and polynomially bounded weights.
We investigate the reduction of the network size for depth-3 networks for addition and depth-2 networks for comparison. The main theoretical conclusions of the paper can be summarized as follows:
Addition can be performed by a depth-3 network with a size in the order of $O(n^2/\log n)$ and polynomially bounded weights.
Addition can be performed by a depth-3 network$^1$ with a size of $6n + 2\lceil n/\lceil\sqrt{n}\rceil\rceil$ (i.e. of optimal $O(n)$ size complexity), a maximum fan-in of $2\lceil n/\lceil\sqrt{n}\rceil\rceil + 3$ and a maximum weight size of $2^{\lceil\sqrt{n}\rceil}$. It is not known whether optimal $O(n)$ size depth-3 networks with polynomial weights are possible.
The comparison of two n-bit operands with carry can be computed by a depth-2 network with a size of $2\lceil n/\lceil\sqrt{n}\rceil\rceil + 1$ (i.e. of $O(\sqrt{n})$ size complexity), a maximum fan-in of $\max\{2\lceil n/\lceil\sqrt{n}\rceil\rceil + 1,\ 2\lceil\sqrt{n}\rceil\}$ and a maximum weight size of $2^{\lceil\sqrt{n}\rceil}$. It is not known whether $O(\sqrt{n})$ size depth-2 networks with polynomial weights are possible.
Concerning practical situations, represented by existing architectural formats, we show that our schemes provide sizable advantages over other schemes. In particular we show the following:
The proposed addition scheme with polynomially bounded weights requires up to 71% of the threshold gates and 28% of the fan-in for the realization of 32-bit adders and up to 47% of the threshold gates and 18% of the fan-in for the realization of 64-bit devices when compared to the Siu et al. scheme [4], known to be the best scheme thus far for small depth and size networks for addition.
The proposed $O(n)$ size addition scheme requires up to 18% of the threshold gates for the realization of 32-bit adders and up to 9% for the realization of 64-bit adders when compared to the Siu et al. scheme [4]. Our scheme implies a maximum weight value twice (for 32-bit operands) or four times (for 64-bit operands) the maximum weight value deduced from [4], but it provides an 8.53 times, respectively 13.47 times, lower fan-in.
For equal delay our scheme requires up to 18% of the gates, 50% of the weight size and 28% of the fan-in for the realization of 32-bit comparators and up to 13% of the gates, equal weights and 20% of the fan-in for the realization of 64-bit devices, when compared to the Siu et al. scheme [4]. When compared with the Roychowdhury et al. scheme [4], [11] it requires up to 94% of the gates, 25% of the weight size and 75% of the fan-in for the realization of 32-bit comparators and up to 83% of the gates, 50% of the weight size and 92% of the fan-in for the realization of 64-bit devices.
The presentation is organized as follows: In Section II we present the proposed schemes for addition and addition-related functions. Section III contains comparisons between our approaches and what is known as the state of the art for some usual operand dimensions, and Section IV presents some concluding remarks.
$^1$ It is interesting to note that this scheme also allows an implicit construction of a depth-2 network for the addition with a size in the order of $O(n)$.
II. Recursive Formulae for Binary Addition
Binary addition requires the computation of the carry and the sum. We assume that the operands are partitioned into groups. In order to produce the carry equations, for a group $i$ of length $l$, we define two new quantities, $\Phi_i$ (the carry-force quantity) and $\Pi_i$ (the carry-preserve quantity), defined by the following:
carry-force: $\Phi_i = 1$ when the group's sum has a value $[2^l]^{+=}$ and 0 otherwise$^2$.
carry-preserve: $\Pi_i = 1$ when the group's sum has a value $[2^l - 1]^{+=}$ and 0 otherwise.
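The two definitions above can be illustrated with a minimal Python sketch in which each quantity is produced by one threshold gate over the bits of the group. The helper names (`threshold_gate`, `group_signals`, `check_group`) and the choice of weight $2^j$ for bit $j$ of either operand are assumptions made for this illustration; they are consistent with the group sum being a weighted sum of the operand bits, but they are not taken from the text.

```python
from itertools import product


def threshold_gate(weights, inputs, threshold):
    """Linear threshold gate: 1 when the weighted input sum reaches the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total - threshold >= 0 else 0


def group_signals(x_bits, y_bits):
    """Return (carry_force, carry_preserve) for one group of equal-length bit lists.

    Bits are given least significant first.
    Assumption: bit j of either operand enters both gates with weight 2**j.
    """
    length = len(x_bits)
    weights = [2 ** j for j in range(length)] * 2  # same weights for both operands
    inputs = list(x_bits) + list(y_bits)
    carry_force = threshold_gate(weights, inputs, 2 ** length)         # sum >= 2^l
    carry_preserve = threshold_gate(weights, inputs, 2 ** length - 1)  # sum >= 2^l - 1
    return carry_force, carry_preserve


def check_group(length):
    """Exhaustively compare the two gates with the arithmetic definitions."""
    for bits in product((0, 1), repeat=2 * length):
        x_bits, y_bits = bits[:length], bits[length:]
        group_sum = sum(b << j for j, b in enumerate(x_bits))
        group_sum += sum(b << j for j, b in enumerate(y_bits))
        expected = (int(group_sum >= 2 ** length), int(group_sum >= 2 ** length - 1))
        if group_signals(x_bits, y_bits) != expected:
            return False
    return True
```

Under these assumptions, carry-force plus carry-preserve minus 1 takes only the values 1, 0 and -1, corresponding to a group that produces a carry, passes the incoming carry on, or absorbs it.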
The theorem to follow introduces a carry computation using threshold logic.
Proof: By induction. Given that the expression for the carry presumably computes the true carry $C_i$, it must be that $\mathcal{F}_i - 1 \geq 0$ when the true carry of the addition is $C_i = 1$ and that $\mathcal{F}_i - 1 < 0$ when the true carry of the addition is $C_i = 0$.
basis: Trivial with proper substitutions.
step: Assume that the theorem holds true for $k - 1$; we prove that it is also true for $k$. Assuming that the theorem holds true for $k - 1$, it is implied that:
If the true carry for the addition is $C_{k-1} = 1$ then $\mathcal{F}_{k-1} \geq 1$. If the true carry for the addition is $C_{k-1} = 0$ then $\mathcal{F}_{k-1} < 1$. Further, by removing the recurrence and with substitutions it can be proven that the maximum value of $\mathcal{F}_{k-1}$ is $\max\{\mathcal{F}_{k-1}\} = 2^k$ and the minimum is $\min\{\mathcal{F}_{k-1}\} = -2^k + 1$.
$^2$ We use $x^{+=}$ and $x^{-=}$ to denote greater than or equal to $x$ and less than or equal to $x$, respectively.
The carry $C_{k-1}$ is the carry into the group $i$, thus the logical expression for the carry-out is $C_k = \Phi_k + \Pi_k C_{k-1}$, and it must be proven that $C_k = \operatorname{sgn}\{2^k[\Phi_k + \Pi_k - 1] + \mathcal{F}_{k-1} - 1\}$ is equivalent to this logical expression. The logical expression dictates to consider, after exclusion of irrelevant cases$^3$ [...]
The maximum fan-in is due to either the computation of the carry-out on the second level of threshold gates or the computation of the group $\Phi_m$ and $\Pi_m$. The fan-in required for the carry is equal to $2\lceil n/x\rceil + 1$. The fan-in requirement for $\Phi_m$ and $\Pi_m$ depends on the number of bits comprising a group. It is equal to $2x$ (the bits of both operands are required to compute $\Phi_m$ and $\Pi_m$ for any given $m$). Consequently, the maximum fan-in required for comparison is $\max\{2\lceil n/x\rceil + 1,\ 2x\}$. With appropriate considerations, the maximum weight value required can be computed to be equal to $\max\{2^{\lceil n/x\rceil},\ 2^x\}$.
Consequently, the weight sizes are minimum when $2^{\lceil n/x\rceil} = 2^x$, implying a partition of $\sqrt{n}$ bits per group.
Because the number of blocks has to be an integer we have to assume for $x$ the value $\lceil\sqrt{n}\rceil$. This leads to a maximum weight of $\max\{2^{\lceil n/\lceil\sqrt{n}\rceil\rceil},\ 2^{\lceil\sqrt{n}\rceil}\}$ and to a maximum fan-in of $\max\{2\lceil n/\lceil\sqrt{n}\rceil\rceil + 1,\ 2\lceil\sqrt{n}\rceil\}$. In order to be able to assume an upper bound for the result of the MAX operator we have to establish a relation between $\lceil n/\lceil\sqrt{n}\rceil\rceil$ and $\lceil\sqrt{n}\rceil$. If $n$ is a perfect square then $\lceil n/\lceil\sqrt{n}\rceil\rceil = \lceil\sqrt{n}\rceil$; otherwise it can be proved, based on the fact that $\lceil x\rceil \leq x + 1$ holds true for any $x$, that $\lceil n/\lceil\sqrt{n}\rceil\rceil \leq \lceil\sqrt{n}\rceil$ and that the difference between the two numbers cannot be larger than 1. Therefore the weights are at most $2^{\lceil\sqrt{n}\rceil}$ and the maximum fan-in is upper bounded by $2\lceil\sqrt{n}\rceil + 1$.
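The relation between the number of groups and the group length used in this argument can be checked numerically. The following Python sketch is only a sanity check over a finite range, not a proof; the function names are illustrative.

```python
from math import isqrt


def ceil_sqrt(n):
    """Smallest integer k with k * k >= n."""
    root = isqrt(n)
    return root if root * root == n else root + 1


def ceil_div(a, b):
    """Ceiling of a / b for positive integers."""
    return -(-a // b)


def group_counts(n):
    """Return (bits per group, number of groups) for groups of ceil(sqrt(n)) bits."""
    bits_per_group = ceil_sqrt(n)
    return bits_per_group, ceil_div(n, bits_per_group)


def relation_holds(limit):
    """Check the stated relation for every operand length from 1 to limit."""
    for n in range(1, limit + 1):
        bits_per_group, groups = group_counts(n)
        # The number of groups never exceeds the group length, and differs by at most 1.
        if not 0 <= bits_per_group - groups <= 1:
            return False
        # For perfect squares the two values coincide.
        if isqrt(n) ** 2 == n and groups != bits_per_group:
            return False
    return True
```

For example, `group_counts(32)` gives 6 bits per group and 6 groups, while `group_counts(10)` gives 4 bits per group and 3 groups, the case in which the two values differ by 1.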
Regarding the size, the first level requires $2\lceil n/\lceil\sqrt{n}\rceil\rceil$ threshold gates for the computation of the $\Phi_m$ and the $\Pi_m$ quantities. One threshold gate is required to compute the carry-out on the second level. Consequently, the comparison requires $2\lceil n/\lceil\sqrt{n}\rceil\rceil + 1$ threshold gates.
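The depth-2 structure and its gate count can be simulated with a short Python sketch. Several details are assumptions made for this illustration rather than statements from the text: the second-level gate gives weight $2^k$ to the carry-force and carry-preserve signals of group $k$ and weight 1 to the carry-in (an unrolled form of the recurrence in the carry proof), and the comparison $x \geq y$ is obtained as the carry-out of $x + \bar{y} + 1$. All function names are hypothetical.

```python
from math import isqrt


def sgn(value):
    """Threshold function: 1 for non-negative arguments, 0 otherwise."""
    return 1 if value >= 0 else 0


def ceil_sqrt(n):
    """Smallest integer k with k * k >= n."""
    root = isqrt(n)
    return root if root * root == n else root + 1


def carry_out_depth2(x, y, c_in, n):
    """Carry-out of x + y + c_in for n-bit operands; returns (carry, gate_count)."""
    size = ceil_sqrt(n)
    level1 = []
    for start in range(0, n, size):
        length = min(size, n - start)  # the last group may be shorter
        mask = (1 << length) - 1
        # Weighted sum of the group's operand bits (weight 2**j for bit j).
        group_sum = ((x >> start) & mask) + ((y >> start) & mask)
        force = sgn(group_sum - 2 ** length)            # carry-force gate
        preserve = sgn(group_sum - (2 ** length - 1))   # carry-preserve gate
        level1.append((force, preserve))
    # Level 2: a single gate over all level-1 outputs and the carry-in.
    weighted = sum(2 ** k * (f + p - 1) for k, (f, p) in enumerate(level1))
    carry = sgn(weighted + c_in - 1)
    return carry, 2 * len(level1) + 1


def greater_equal(x, y, n):
    """x >= y, taken as the carry-out of x + not(y) + 1 (assumed reduction)."""
    not_y = ~y & ((1 << n) - 1)
    return carry_out_depth2(x, not_y, 1, n)[0]


def verify(n):
    """Exhaustively check carry and comparison for all n-bit operand pairs."""
    for x in range(1 << n):
        for y in range(1 << n):
            for c_in in (0, 1):
                if carry_out_depth2(x, y, c_in, n)[0] != (x + y + c_in) >> n:
                    return False
            if greater_equal(x, y, n) != int(x >= y):
                return False
    return True
```

Under these assumptions the sketch reports 13 gates for 32-bit operands and 17 gates for 64-bit operands, in agreement with the size expression derived above, and `verify` confirms the carry and the comparison exhaustively for small operand lengths.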
Theorem 2: The 2-1 addition of two n-bit binary numbers can be computed by an explicit depth-3 linear threshold network with $O(n^2/\log n)$ size and polynomially bounded weights.
Proof: Assume that the operands have been subdivided into groups and that each group contains at most $m \log n$ bits$^4$. For the bits in position $j = 0$ inside the group $i$, the bit carry-force and the bit carry-preserve are the group carry-force and, respectively, the group carry-preserve that correspond to the group $i - 1$. We need $4n - 4\lceil n/(m \log n)\rceil$ gates for the computation of all the bit carry-force and bit carry-preserve quantities. Obviously the fan-in and the weight requirements for the bit carry-force and bit carry-preserve are lower than for the group quantities. All these group and bit carry-force and carry-preserve quantities can be computed in parallel at the expense of $4n$ gates.
Given Equation (2), all its products can be computed in parallel in one gate delay with $4(i + 2)$ threshold gates, after which the logical OR of these products can be computed with one threshold gate.
Therefore the entire addition can be performed by a depth-3 network. In the first level we compute the group and bit carry-force and carry-preserve quantities with $4n$ threshold gates. The second level computes the products in Equation (2). Because each bit position $j$ in the group $i$ needs $4(i + 2)$ products and there are $m \log n$ bit positions in each group, we need $4m \log n\,(i + 2)$ threshold gates in order to compute the products that correspond to the sum bits in group $i$. Because [...] The third level of the network contains $n$ threshold gates, one for each bit position. Therefore the entire size of the network is in the order of $O(n^2/\log n)$. Because all the gates in the second level compute logical ANDs, the input weights are 1 and the threshold values are at most $m \log n + 4$. All the gates on the third level perform logical ORs and therefore have all the input weights and the thresholds equal to 1. As a consequence the weight values are dominated by the weights associated with the gates in the first level and therefore are in the order of $O(n^m)$, i.e. polynomially bounded.
Corollary 3: The maximum fan-in for the threshold gates in the network is given by $\max\{2m \log n,\ 4\lceil n/(m \log n)\rceil + 2\}$.
Proof: The maximum fan-in is equal to $2m \log n$ for the gates in the first level. By the inclusion in the products of the bits $X_j$ and $Y_j$ (normal or inverted), the products in Equation (2) contain at most $i + 4$ variables and therefore the maximum fan-in for the gates that compute the AND terms is equal to $\lceil n/(m \log n)\rceil + 4$. Because the Equation (2) [...]
It will be shown that $1^{+=}$, $1^{-=}$, and $3^{+=}$ can be computed by:
Where the $\Phi_k$ and $\Pi_k$ are computed for all $k$, except for $k = i$, using the entire group of bits, and for $k = i$ the quantities $\Phi_k$ and $\Pi_k$ are computed by considering the bits $r$ of the group $i$ where $0 \leq r \leq j - 1$.
case 1 ($1^{+=}$): To prove that the $1^{+=}$ expression is correct we must prove that if any of $X_j$, $Y_j$, and $C_{j-1}$ is equal to 1 then $1^{+=} = 1$ and if none of $X_j$, $Y_j$, and $C_{j-1}$ is 1 then $1^{+=} = 0$. Clearly for $C_{j-1}$ it holds true (proven earlier) and it can be trivially proven (with substitutions) that the case holds true for the $X_j$, $Y_j$ values.
case 2 ($1^{-=}$): Analogous to case 1.
case 3 ($3^{+=}$): Analogous to case 1 with proper considerations.
The equations that compute the sum require an explicit depth-3 network computing: on the first level the $\Phi_k$ and $\Pi_k$ for all groups and bits for the group $i$. On the second level the network computes $1^{+=}$, $1^{-=}$, and $3^{+=}$, and finally on the third level the network computes the $S_j$ for all $j$.
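As an illustration of the last step, the sketch below shows one way a single third-level gate could combine the three second-level signals into the sum bit. It assumes that the three signals indicate whether $X_j + Y_j + C_{j-1}$ is at least 1, at most 1, and at least 3, and it takes the carry $C_{j-1}$ as given instead of deriving the signals from the carry-force and carry-preserve quantities as the network does. The unit weights and the threshold of 2 are a choice made for this example, not taken from the text.

```python
from itertools import product


def sgn(value):
    """Threshold function: 1 for non-negative arguments, 0 otherwise."""
    return 1 if value >= 0 else 0


def sum_bit(x_j, y_j, c_prev):
    """Sum bit of x_j + y_j + c_prev built from three threshold signals."""
    count = x_j + y_j + c_prev
    at_least_1 = sgn(count - 1)  # plays the role of the 1^{+=} signal
    at_most_1 = sgn(1 - count)   # plays the role of the 1^{-=} signal
    at_least_3 = sgn(count - 3)  # plays the role of the 3^{+=} signal
    # Third level (assumed form): unit weights on the three signals, threshold 2.
    return sgn(at_least_1 + at_most_1 + at_least_3 - 2)


def check_sum_bit():
    """Compare the gate output with the parity of the three inputs."""
    return all(
        sum_bit(x, y, c) == (x + y + c) % 2
        for x, y, c in product((0, 1), repeat=3)
    )
```

The three signals sum to 2 exactly when one or three of the inputs are 1, and to 1 otherwise, which is why a threshold of 2 recovers the sum bit in this sketch.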
In order to compute the cost we divide the addition, as we did for the comparison, into groups of length $x$. By following the same reasoning as in Corollary 2 we obtain that the optimum value of the maximum number of bits in each group is $\lceil\sqrt{n}\rceil$. This partition leads to a maximum fan-in of $2\lceil n/\lceil\sqrt{n}\rceil\rceil + 3$ and to a maximum weight of $2^{\lceil\sqrt{n}\rceil}$. Under the assumption that the operands are partitioned into groups of $\lceil\sqrt{n}\rceil$ bits, the following is required regarding the size of the network. In order to compute the group $\Phi_i$ and $\Pi_i$, $2\lceil n/\lceil\sqrt{n}\rceil\rceil$ threshold gates are required in the first level. Further, we require at most $2n$ threshold gates in the first level to compute all bit $\Phi_k$ and $\Pi_k$. On the second level we require $3n$ gates to compute $1^{+=}$, $1^{-=}$, and $3^{+=}$, and finally we require $n$ gates on the third level to compute the sum $S_j$ for all $j$. Thus the entire scheme requires at most $6n + 2\lceil n/\lceil\sqrt{n}\rceil\rceil$ threshold gates to compute the sum. This scheme will increase by 2 the fan-in of the next network that uses the computed sum as input, because the value of each sum bit is carried by 3 signals instead of 1.
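The per-level counts above can be tabulated with a small Python sketch that evaluates the stated expressions for a given operand length. The function name `adder_cost` and the dictionary layout are illustrative choices; the formulas are the ones given in the preceding paragraph.

```python
from math import isqrt


def ceil_sqrt(n):
    """Smallest integer k with k * k >= n."""
    root = isqrt(n)
    return root if root * root == n else root + 1


def adder_cost(n):
    """Upper bounds on the cost of the O(n)-size depth-3 adder for n-bit operands."""
    bits_per_group = ceil_sqrt(n)
    groups = -(-n // bits_per_group)  # ceiling division
    level1 = 2 * groups + 2 * n       # group quantities plus bit quantities
    level2 = 3 * n                    # the three signals per bit position
    level3 = n                        # one gate per sum bit
    return {
        "gates_per_level": (level1, level2, level3),
        "gates": level1 + level2 + level3,  # equals 6n + 2 * groups
        "max_fan_in": 2 * groups + 3,
        "max_weight": 2 ** bits_per_group,
    }
```

For 32-bit operands these expressions give at most 204 gates, a maximum fan-in of 15 and a maximum weight of 64; for 64-bit operands they give at most 400 gates, a maximum fan-in of 19 and a maximum weight of 256.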
III. Comparisons
In the previous discussion we have determined the network requirements in general. Our scheme for addition presented in Theorem 2 provides polynomially bounded weights and a network size in the order of $O(n^2/\log n)$, and it is superior to the scheme presented in [4], which has an $O(n^2)$ size. Consequently, we imposed an optimal $O(n)$ size for depth-3 networks for the addition and investigated the influence such an imposition had on the weight sizes and fan-in. Given that asymptotic complexities need not apply to realistic scenarios, we considered as a final exercise a comparison with other schemes assuming existing architectural formats. In particular we considered 32- and 64-bit architectures and estimated the requirements of the various schemes. The results of our estimations are reported in Table I and Table II. For the evaluation of the performance of the Siu et al. and Roychowdhury et al. schemes we used the formulas reported in [4], [11]. The PW Addition row corresponds to the addition scheme presented in Theorem 2 for $m = 1$, i.e. the division of operands into groups of $\log n$ bits. For the other rows we assumed that the subdivision of the operands is made using $\lceil\sqrt{n}\rceil$. The depth-3 comparison is done by first dividing the operands in two and after that into groups of $\lceil\sqrt{n/2}\rceil$ bits. The first level computes the carry-force and carry-preserve for all the groups of $\lceil\sqrt{n/2}\rceil$ bits. The second level produces the carry-out of the least significant $n/2$ bits and the third level the result of the comparison.
What is noticeable from the Tables is the small number of linear threshold gates needed to realize the addition and the comparison for the common 32- and 64-bit operand sizes. Clearly, the improvement in size over the existing art is substantial. In particular our addition scheme with polynomially bounded weights requires up to 71% of the gates for the realization of 32-bit adders and up to 47% for the realization of 64-bit devices when compared to the Siu et al. scheme [4]. The fan-in reduction is also significant because our scheme requires up to 28% of the fan-in for the realization of 32-bit adders and up to 18% for the realization of 64-bit devices. As the Tables suggest, the scheme proposed in Theorem 3 can be realized with a very small fraction of the gates for the 32- and 64-bit operand sizes. In particular, it requires up to 18% of the gates for the realization of 32-bit adders and up to 9.32% for the realization of 64-bit devices when compared to the Siu et al. scheme [4]. While, as can be observed in Table II for 32-bit operands, our scheme implies a maximum weight value twice the maximum weight value deduced from [4], it provides an 8.5 times lower fan-in.
Regarding the comparison, when we consider a depth-2 network the weight requirements are, as expected, greater for our scheme than for the schemes in [4], [11], while our scheme is superior in size. This conclusion however is reversed when the depths of the networks are assumed to be equal. Our estimations indicate that the scheme we propose is better on all counts, including the size of the weights (with the exception of 64-bit operands, for which the weight sizes of the Siu et al. scheme are equal to ours). In particular, our depth-3 scheme requires up to 18% of the gates, 50% of the weight size and 28% of the fan-in for the realization of 32-bit comparators and up to 13% of the gates, equal weights and 20% of the fan-in for the realization of 64-bit devices, when compared to the Siu et al. scheme [4]. When compared with the Roychowdhury et al. scheme [11] it requires up to 94% of the gates, 25% of the weight size and 75% of the fan-in for the realization of 32-bit comparators and up to 83% of the gates, 50% of the weight size and 92% of the fan-in for the realization of 64-bit devices.
IV. Concluding Remarks
The main concern of this paper was the reduction of the size of networks computing fixed-point arithmetic operations while maintaining small network depths with bounded and unbounded weights. It was shown that the addition can be performed by a depth-3 network with: a size in the order of $O(n^2/\log n)$ and polynomially bounded weights; a size of $6n + 2\lceil n/\lceil\sqrt{n}\rceil\rceil$ (i.e. of optimal $O(n)$ complexity), a maximum fan-in of $2\lceil n/\lceil\sqrt{n}\rceil\rceil + 3$ and a maximum weight value of $2^{\lceil\sqrt{n}\rceil}$. Related to comparison it was shown that the comparison of two n-bit operands with carry can be computed by a depth-2 network with $2\lceil n \ldots$
