Abstract: Assuming signed digit number representations, we investigate the implementation of some addition-related operations assuming linear threshold networks. We measure the depth and size of the networks in terms of linear threshold gates. We show first that a depth-2 network with $O(n)$ size, weight, and fan-in complexities can perform signed digit symmetric functions. Consequently, assuming radix-2 signed digit representation, we show that the two-operand addition can be performed by a threshold network of depth 2 having $O(n)$ size complexity and $O(1)$ weight and fan-in complexities. Furthermore, we show that, assuming radix-$(2n-1)$ signed digit representations, the multioperand addition can be computed by a depth-2 network with $O(n^3)$ size, with the weight and fan-in complexities being polynomially bounded. Finally, we show that multiplication can be performed by a linear threshold network of depth 3 with a size of $O(n^3)$, requiring $O(n^3)$ weights and $O(n^2 \log n)$ fan-in.
INTRODUCTION
High-performance addition and addition-related operations, such as multiplication, play an important role in the computer-based computational paradigm. A major impediment to improving the speed of arithmetic execution units incorporating addition and addition-related operations is the presence of carry and borrow chains. One solution for the elimination of carry chains is the use of redundant representation of operands, proposed by Avizienis in [1]. The Signed Digit (SD) number representation method allows, under certain assumptions, the so-called “totally parallel addition” [1], which limits the propagation of the carries at the expense of some overhead in data storage space and in processing time for the conversion of the results and potentially of the operands.
The redundant representation operates as follows:
For any radix $r \ge 2$, a signed-digit integer number $X = (x_{n-1}, \dots, x_1, x_0)$ has the value $X = \sum_{i=0}^{n-1} x_i r^i$, where each digit satisfies $x_i \in \{-\alpha, \dots, -1, 0, 1, \dots, \alpha\}$ with $\alpha \le r - 1$ (1). In order to have minimum redundancy and, as a consequence, minimum storage overhead, one can assume that $\alpha = \lfloor r/2 \rfloor$, but, in order to break the carry chain, i.e., to have “totally parallel addition,” the value of $\alpha$ should satisfy the relations stated in (2):

$$\left\lceil \frac{r+1}{2} \right\rceil \le \alpha \le r - 1. \qquad (2)$$

Based on signed-digit representation, a number of high-speed architectures have been reported, see, for example, [2], [3], [4], [5], [6]. Thus far, all the investigations in SD arithmetic architectures assumed logic implementation with technologies that directly implement Boolean gates. Currently, other possibilities exist in VLSI for the implementation of Boolean functions using threshold devices in CMOS technology [7], [8], [9], [10]. In assuming Threshold Logic (TL), the basic processing element can be a Linear Threshold Gate (LTG) computing the Boolean function $F$ such that:
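As a small illustration of the definitions above, the following Python sketch evaluates an SD number from its digits and checks relation (2). The function names `sd_value` and `totally_parallel_ok` are hypothetical, chosen for this example only; digits are listed most significant first.

```python
def sd_value(digits, r):
    """Value of a signed-digit number whose digits are listed most significant first."""
    value = 0
    for d in digits:
        value = value * r + d  # Horner evaluation of sum_i x_i * r**i
    return value


def totally_parallel_ok(r, alpha):
    """Relation (2): ceil((r + 1) / 2) <= alpha <= r - 1, using integer arithmetic."""
    return (r + 2) // 2 <= alpha <= r - 1


print(sd_value([1, -1, 0], 10))    # 1*100 - 1*10 + 0 = 90
print(totally_parallel_ok(10, 6))  # True
print(totally_parallel_ok(10, 5))  # False
```

Note that for $r = 2$ no $\alpha$ satisfies (2), since the lower bound 2 exceeds $r - 1 = 1$; this is consistent with the radix-2 scheme of Section 3 also inspecting the digits in position $i - 1$.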
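$$F(X) = \operatorname{sgn}\{\mathcal{F}(X)\} = \begin{cases} 1 & \text{if } \mathcal{F}(X) \ge 0 \\ 0 & \text{if } \mathcal{F}(X) < 0 \end{cases}, \qquad \mathcal{F}(X) = \sum_{i=1}^{n} \omega_i x_i - \theta, \qquad (3)$$

where the sets of input variables and weights are $\{x_1, x_2, \dots, x_{n-1}, x_n\}$ and $\{\omega_1, \omega_2, \dots, \omega_{n-1}, \omega_n\}$, respectively. Such an LTG contains a threshold value, $\theta$, a summation device, $\Sigma$, computing $\mathcal{F}$, and a threshold element computing $F = \operatorname{sgn}\{\mathcal{F}\}$.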
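A minimal Python model of the LTG in (3) may help fix the notation. It assumes the convention $\operatorname{sgn}\{0\} = 1$ shown in (3); `ltg` and `sgn` are illustrative names, not part of any library.

```python
from typing import Sequence


def sgn(v: int) -> int:
    """Threshold element: 1 if v >= 0, otherwise 0."""
    return 1 if v >= 0 else 0


def ltg(x: Sequence[int], w: Sequence[int], theta: int) -> int:
    """Linear threshold gate (3): sgn(sum_i w_i * x_i - theta)."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    return sgn(sum(wi * xi for wi, xi in zip(w, x)) - theta)


# A 3-input majority gate: unit weights, threshold 2.
print(ltg([1, 1, 0], [1, 1, 1], 2))  # 1
print(ltg([1, 0, 0], [1, 1, 1], 2))  # 0
```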
Given that TL may be promising, it is of interest to investigate new schemes applicable to such a new technology. To this end, a number of recent proposals regarding addition and multiplication that assume binary nonredundant representations and threshold, rather than Boolean, logic have been developed, see, for example, [13], [14], [15], [16], [17], [18], [19], [20].
Thus far, there are no studies assuming redundant representations and TL. In this paper, we assume SD number representation and we investigate linear threshold networks for 2-1 addition, multioperand addition, and multiplication. We assume that the operands are n-SD numbers and we are mainly concerned with establishing the limits of the circuit designs using threshold-based networks. We measure the depth and the size of the networks we propose in terms of LTGs.
The main contributions of our proposal can be summarized as:
• Any SD symmetric function can be implemented by a depth-2 feed-forward Linear Threshold Network (LTN) with $O(n)$ size, weight, and fan-in values.
• Assuming radix-2 redundant operand representation, the addition of two n-SD numbers can be computed by a depth-2 LTN with $O(n)$ size and $O(1)$ weight and fan-in values.
• Assuming radix-$(2n-1)$ redundant operand representation, the multioperand addition of n n-SD numbers can be computed by an explicit depth-2 LTN with a size in the order of $O(n^3)$, a maximum weight value in the order of $O(n^3)$, and a maximum fan-in value in the order of $O(n^2)$.
• Assuming radix-$(2n-1)$ operand representation, the multiplication of two n-SD numbers can be computed by an explicit depth-3 LTN with a size in the order of $O(n^3)$. The maximum weight value is in the order of $O(n^3)$ and the maximum fan-in value is in the order of $O(n^2 \log n)$.
We also note here that, while our results are primarily theoretical, there exist technology proposals, see, for example, [10], which may implement at least some of the proposed schemes, e.g., two-operand addition. The presentation is organized as follows: In Section 2, we discuss background information on Boolean symmetric functions and their implementation with TL and introduce some preliminary results; in Section 3, we present TL schemes for the 2-1 addition of radix-2 SD numbers; in Section 4, we study the multiplication of radix-2 SD numbers and we present schemes for the multioperand addition and the multiplication of radix-$(2n-1)$ SD numbers; we conclude the presentation with some final remarks.
BACKGROUND AND PRELIMINARIES
In order to make this presentation self-contained, we introduce in this section the definition of Boolean symmetric functions and some TL-based implementation techniques that we will use in our investigation.
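For any symmetric Boolean function $F_s$ of $n$ input variables, the sum $\mathcal{X} = \sum_{i=1}^{n} x_i$ ranges from 0 (all input variables are 0) to $n$ (all input variables are 1). Inside this definition domain $[0, n]$, there are $\kappa$ intervals $[a_j, b_j]$, $j = 1, 2, \dots, \kappa$, ordered so that $b_j + 1 < a_{j+1}$, such that, if $\mathcal{X} \in [a_j, b_j]$, then $F_s$ is equal to 1 and, outside these intervals, the function is 0. This is graphically depicted in Fig. 1 and formally described by (4):

$$F_s(\mathcal{X}) = \begin{cases} 1 & \text{if } \exists j,\ 1 \le j \le \kappa, \text{ such that } \mathcal{X} \in [a_j, b_j] \\ 0 & \text{otherwise.} \end{cases} \qquad (4)$$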
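The interval description (4) can be sketched in a few lines of Python: given the value of $F_s$ for every $\mathcal{X}$, extract the maximal runs of 1s, and conversely build $F_s$ from its intervals. The helper names are hypothetical. The example function on 5 inputs (1 when an even number of inputs is set) attains the bound $\lceil (n+1)/2 \rceil = 3$ on the number of intervals.

```python
def intervals_of(profile, lo=0):
    """Maximal runs of 1s in profile, where profile[k] is the value of F at X = lo + k."""
    runs, start = [], None
    for k, v in enumerate(profile):
        if v and start is None:
            start = k                       # a run of 1s begins
        elif not v and start is not None:
            runs.append((lo + start, lo + k - 1))
            start = None                    # the run has ended
    if start is not None:
        runs.append((lo + start, lo + len(profile) - 1))
    return runs


def symmetric_from_intervals(intervals):
    """F_s as in (4): 1 iff X = sum(x) lies in some interval [a_j, b_j]."""
    def f(x):
        s = sum(x)
        return int(any(a <= s <= b for a, b in intervals))
    return f


even = [1 if s % 2 == 0 else 0 for s in range(6)]  # n = 5, X in [0, 5]
print(intervals_of(even))  # [(0, 0), (2, 2), (4, 4)]
```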
The number of intervals depends on the function definition and we proved elsewhere [21] that, for any Boolean symmetric function, the maximum number of intervals $\kappa$ is upper bounded by $\lceil (n+1)/2 \rceil$.

Definition 2. A Boolean function $F_{gs}$ of $n$ variables is generalized symmetric⁴ if it entirely depends on $\sum_{i=1}^{n} w_i x_i$, the weighted sum of its input variables, with $w_i$, $i = 1, 2, \dots, n$, given integer constants⁵.
In essence, a generalized symmetric Boolean function $F_{gs}$ is either a symmetric Boolean function or a nonsymmetric Boolean function that can be transformed into a symmetric Boolean function by trivial transformations, e.g., assignment of different weight values to the inputs or input replication. $F_{gs}$ can be described as a function of $\mathcal{X} = \sum_{i=1}^{n} w_i x_i$ and the definition domain extends from $[0, n]$ to $[0, \mathcal{X}_{max}]$, where $\mathcal{X}_{max} = \sum_{i=1}^{n} w_i$. All the results that hold for symmetric Boolean functions can also be applied to generalized symmetric Boolean functions.

4. This definition and, also, Definition 1 are not specific to functions with Boolean input variables. Symmetry is an intrinsic property of the function and does not depend on the input variable type. Therefore, they also apply to functions of other types of input variables, e.g., integer or real.

5. The weights $w_i$ can also be real numbers, but we have assumed integer values here because of practical considerations related to the LTG fabrication technology [7], [10].
To clarify the generalized symmetric Boolean function concept, let us consider the addition of four 2-bit operands producing a 4-bit result. The truth table and the schematic diagram for such a function are depicted in Fig. 2. First, it can be observed that, in order to produce the sum at bit position 0, we need to consider only the bits in the first column (LSB position). It can be easily verified that the Boolean function computing the sum's LSB, $s_0$, is symmetric in $x_0, y_0, z_0, w_0$ (it is their parity), whereas the function computing $s_1$ is not symmetric in its eight input bits. The $s_1$ function is, however, a generalized symmetric Boolean function, as it can be made symmetric if a weight of 2 is associated with the input bits in column 1. Consequently, the sum bit $s_1$ can be computed by a symmetric Boolean function $s_1(\mathcal{X})$, where $\mathcal{X} = x_0 + y_0 + z_0 + w_0 + 2(x_1 + y_1 + z_1 + w_1)$, whose interval-based representation is graphically depicted in Fig. 3.
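The claim about $s_1$ can be checked exhaustively over all 256 input patterns. The Python sketch below (helper names are hypothetical) shows that the plain sum of the eight bits does not determine $s_1$, whereas the weighted sum $\mathcal{X}$ does, and lists the values of $\mathcal{X}$ for which $s_1 = 1$.

```python
from itertools import product


def s1_of_bits(bits):
    """Bit 1 of x + y + z + w for 2-bit operands; bits = (x0, y0, z0, w0, x1, y1, z1, w1)."""
    total = sum(bits[:4]) + 2 * sum(bits[4:])  # arithmetic value of the 4-operand sum
    return (total >> 1) & 1


def determined_by(key):
    """True if s1 takes a single value for each value of key(bits)."""
    seen = {}
    for bits in product((0, 1), repeat=8):
        seen.setdefault(key(bits), set()).add(s1_of_bits(bits))
    return all(len(values) == 1 for values in seen.values())


def plain_sum(bits):
    return sum(bits)                          # all eight inputs weighted by 1


def weighted_sum(bits):
    return sum(bits[:4]) + 2 * sum(bits[4:])  # weight 2 on the column-1 bits


print(determined_by(plain_sum))     # False: s1 is not symmetric in its 8 inputs
print(determined_by(weighted_sum))  # True: s1 is generalized symmetric
print([x for x in range(13) if (x >> 1) & 1])  # [2, 3, 6, 7, 10, 11]
```

The last line corresponds to the intervals $[2, 3]$, $[6, 7]$, and $[10, 11]$ of $\mathcal{X} \in [0, 12]$.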
Given that symmetric (generalized or not) functions constitute a frequently used class of Boolean functions and because they are expensive to implement in hardware, in terms of area and delay, their implementation with feedforward LTNs has been the subject of numerous theoretical and practical scientific investigations, see, for example, [22] , [23] , [24] , [25] , [16] , [21] .
The most network-size-efficient approach known so far for the depth-2 implementation of symmetric Boolean functions with TL is the telescopic sum method, introduced by Minnick in [23]. The method can be used for the implementation of any Boolean symmetric function and produces depth-2 feed-forward LTNs with a size in the order of $O(n)$, measured in terms of LTGs, and with linear weight and fan-in values. We briefly describe this method by introducing the following lemma.
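Lemma 1 [23]. Any Boolean symmetric function $F_s(x_1, x_2, \dots, x_n)$, described as in (4), can be implemented by a two-layer feed-forward LTN with a size complexity, measured in terms of LTGs, in the order of $O(n)$ as follows:

$$F_s = \operatorname{sgn}\Big\{\sum_{i=1}^{n} x_i - a_1 - \sum_{j=1}^{\kappa} t_j \operatorname{sgn}\Big\{\sum_{i=1}^{n} x_i - b_j - 1\Big\}\Big\},$$

where $t_j = a_{j+1} - a_j$ for $j = 1, \dots, \kappa - 1$, and

$$t_\kappa = n + 1 - a_\kappa \ \text{ if } b_\kappa \ne n \quad \text{and} \quad t_\kappa = 0 \ \text{ if } b_\kappa = n.$$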
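The telescopic sum of Lemma 1 can be simulated directly. The sketch below follows the formula above (first layer: one gate per interval detecting $\mathcal{X} > b_j$; second layer: weights $-t_j$ on those gates and threshold $a_1$); `telescopic_ltn` is a hypothetical helper, and the intervals are assumed sorted, disjoint, non-adjacent, and non-empty.

```python
def sgn(v):
    return 1 if v >= 0 else 0


def telescopic_ltn(intervals, n):
    """Depth-2 LTN for a symmetric function of n inputs (telescopic sum of Lemma 1).

    intervals: sorted, disjoint, non-adjacent [a_j, b_j] inside [0, n] where F_s = 1.
    """
    a = [lo for lo, _ in intervals]
    b = [hi for _, hi in intervals]
    k = len(intervals)
    t = [a[j + 1] - a[j] for j in range(k - 1)]
    t.append(0 if b[-1] == n else n + 1 - a[-1])

    def f(x):
        s = sum(x)
        # first layer: y_j = 1 once X has moved past the end of interval j
        y = [sgn(s - b[j] - 1) for j in range(k)]
        # second layer: unit weights on x, weights -t_j on y_j, threshold a_1
        return sgn(s - a[0] - sum(tj * yj for tj, yj in zip(t, y)))

    return f


f = telescopic_ltn([(1, 1), (3, 3), (5, 6)], 6)
print([f([1] * s + [0] * (6 - s)) for s in range(7)])  # [0, 1, 0, 1, 0, 1, 1]
```

The network uses $\kappa$ first-layer gates plus one output gate, which is where the $O(n)$ size bound comes from.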
A formal proof of Lemma 1 and implementation examples can be found in [26]. Given that we assume SD operands (that is, we consider functions with non-Boolean input variables), we need to map them into general Boolean functions. In order to achieve this mapping, we first have to choose a representation for the SDs. One possible representation is the 2's complement [27]. Given a fixed radix $r$, an SD number is represented as $(s_{n-1}, s_{n-2}, \dots, s_1, s_0)$. In this presentation, we will consider that any digit $s_i$ can assume a value in the symmetric digit set $\{-\alpha, -(\alpha - 1), \dots, -1, 0, 1, \dots, \alpha - 1, \alpha\}$, with the maximum digit magnitude $\alpha$ satisfying (1) or (2). The cardinality of the digit set is $2\alpha + 1$ and, consequently, any SD $s_i$ can be binary represented by a $k$-tuple $(x_{k-1}, \dots, x_1, x_0)$ with $k = \lceil \log(2\alpha + 1) \rceil$ and $x_l \in \{0, 1\}$, for $l = 0, 1, \dots, k - 1$.
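For the particular case of the 2's complement codification of the SDs, the dimension of the $k$-tuple can also be computed as $k = \lceil \log(\alpha + 1) \rceil + 1$, and an SD $s_i$ with bits $x_{i,l}$ can be written as $s_i = -2^{k-1} x_{i,k-1} + \sum_{l=0}^{k-2} 2^l x_{i,l}$. Consequently, a function $F$ that depends on the weighted sum $\sum_{i=0}^{n-1} w_i s_i$ of the digits satisfies

$$F(s_{n-1}, \dots, s_0) = F\Big(\sum_{i=0}^{n-1} w_i \Big(-2^{k-1} x_{i,k-1} + \sum_{l=0}^{k-2} 2^l x_{i,l}\Big)\Big). \qquad (7)$$

As a consequence of (7), $F$ is expressed as a generalized Boolean symmetric function of $n(\lceil \log(\alpha + 1) \rceil + 1)$ variables; therefore, it can be computed with the scheme in Lemma 1. The size of the LTN implementing $F$ depends on the number of intervals in the definition domain. Given that, in our case, the maximum absolute value any digit can assume is $r - 1$, the argument of $F$ in (7) lies, in the worst-case scenario, inside the definition domain $\big[-r\sum_{i=0}^{n-1} w_i,\ r\sum_{i=0}^{n-1} w_i\big]$. Consequently, the maximum number of intervals is upper bounded by

$$\left\lceil \frac{2r\sum_{i=0}^{n-1} w_i + 1}{2} \right\rceil.$$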
Because we assumed that the weights $w_i$ and the radix $r$ are integer constants that do not depend on $n$, the LTN cost is in the order of $O(n)$. Obviously, the weight and fan-in values are also in the order of $O(n)$. □
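The mapping behind (7) can be sketched in Python: each digit is encoded in $k$-bit 2's complement and the weighted digit sum is recomputed from the individual bits only, each bit carrying the integer weight $w_i \cdot (\pm 2^l)$. This is what makes $F$ a generalized symmetric function of the $nk$ bits. The helper names and the example digits and weights are hypothetical.

```python
def bits_needed(alpha):
    """k = ceil(log2(2*alpha + 1)) bits for the digit set {-alpha, ..., alpha}, alpha >= 1."""
    return (2 * alpha).bit_length()


def digit_bits(d, k):
    """k-bit 2's complement encoding of digit d, least significant bit first."""
    if not -(1 << (k - 1)) <= d < (1 << (k - 1)):
        raise ValueError("digit does not fit in k bits")
    u = d & ((1 << k) - 1)  # 2's complement pattern as a non-negative integer
    return [(u >> l) & 1 for l in range(k)]


def weighted_bit_sum(digits, weights, k):
    """sum_i w_i * s_i, computed only from the individual bits of each digit s_i."""
    total = 0
    for s, w in zip(digits, weights):
        for l, bit in enumerate(digit_bits(s, k)):
            place = -(1 << l) if l == k - 1 else (1 << l)  # the sign bit weighs -2^(k-1)
            total += w * place * bit
    return total


k = bits_needed(3)                                  # digit set {-3, ..., 3}
print(k)                                            # 3
print(weighted_bit_sum([3, -3, 1], [4, 2, 1], k))   # 4*3 + 2*(-3) + 1*1 = 7
```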
SIGNED DIGIT 2-1 ADDITION
In this section, we investigate 2-1 addition schemes using a “totally parallel” [1] addition approach. We use a fixed radix of 2 and the corresponding digit set $\{\bar{1}, 0, 1\}$, where $\bar{1}$ denotes $-1$. We consider two n-SD integers $X = (x_{n-1}, \dots, x_0)$ and $Y = (y_{n-1}, \dots, y_0)$.
Traditionally, in the context of Boolean logic, the 2-1 addition of radix-2 SD represented operands has been achieved with two-step approaches [2], [27], [3]: First, an intermediate carry $c_i$ and an intermediate sum $s_i$ satisfying the equation $x_i + y_i = 2c_i + s_i$ are computed for each digit position $i$. Second, the sum digit $z_i$, $i = 0, 1, \dots, n - 1$, is computed as $z_i = s_i + c_{i-1}$.
In our approach, we will use the “totally parallel” addition described in Table 1 [3]. We also assume that any digit $x$ in the set $\{\bar{1}, 0, 1\}$ is represented in the 2's complement notation by two bits $(x^-, x)$, as shown in Table 2. Note that, in this codification, the combination $x = 0$ and $x^- = 1$ is not allowed and cannot appear during the computations.
It can be observed in Table 1 that the digits in position $i - 1$ contribute to the computation of $s_i$ and $c_i$ only through their sign. Therefore, in order to implement the scheme presented in the table, we have to compute $s_i$ and $c_i$ as functions of $x_i^-, x_i, y_i^-, y_i, x_{i-1}^-$, and $y_{i-1}^-$.
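The two-step procedure can be sketched in Python. Since Table 1 is not reproduced in this text, the sketch assumes the classic radix-2 rule consistent with the description above: when $x_i + y_i = \pm 1$, the choice of $(c_i, s_i)$ depends only on whether a digit in position $i - 1$ is negative, so that $s_i + c_{i-1}$ never overflows. `sd2_add` is a hypothetical name and digits are listed least significant first.

```python
def sd2_add(x, y):
    """Totally parallel radix-2 SD addition; digits in {-1, 0, 1}, least significant first.

    Assumed rule: for x_i + y_i = +-1, (c_i, s_i) is chosen from the signs of the
    digits in position i - 1 only, so z_i = s_i + c_{i-1} stays in {-1, 0, 1}.
    """
    n = len(x)
    c, s = [0] * n, [0] * n
    for i in range(n):
        p = x[i] + y[i]
        lower_neg = i > 0 and (x[i - 1] < 0 or y[i - 1] < 0)
        if p == 2:
            c[i], s[i] = 1, 0
        elif p == 1:
            c[i], s[i] = (0, 1) if lower_neg else (1, -1)
        elif p == 0:
            c[i], s[i] = 0, 0
        elif p == -1:
            c[i], s[i] = (-1, 1) if lower_neg else (0, -1)
        else:  # p == -2
            c[i], s[i] = -1, 0
    # second step: z_i = s_i + c_{i-1}; no new carry can arise
    z = [s[i] + (c[i - 1] if i > 0 else 0) for i in range(n)]
    z.append(c[n - 1])  # the carry-out becomes the most significant digit
    return z


print(sd2_add([1, -1, 0, 1], [1, 1, -1, 0]))  # [0, 1, 1, 0, 0], i.e., 7 + (-1) = 6
```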
These two functions, as directly implied by the table, are not symmetric in their input variables. They can be made symmetric by computing the weighted sum of the inputs

$$\mathcal{X}_s = w_i(-2x_i^- + x_i - 2y_i^- + y_i) + w_{i-1}(x_{i-1}^- + y_{i-1}^-) \qquad (8)$$

with properly determined weights $w_i$ and $w_{i-1}$, such that $s_i$ and $c_i$ become functions of $\mathcal{X}_s$ alone, (9), (10), for all possible input combinations.
We compute the weights $w_i$ and $w_{i-1}$ by taking into consideration the specific structure of the functions $s_i$ and $c_i$. The choice $w_{i-1} = 1$ is straightforward. Given that, for the digits in position $i - 1$, we take into account only the $x^-$ bits, the minimum value of $w_i$ should be equal to 3.⁸
Consequently, the weighted sum $\mathcal{X}_s$ in (8) can be computed as

$$\mathcal{X}_s = -6x_i^- + 3x_i - 6y_i^- + 3y_i + x_{i-1}^- + y_{i-1}^-,$$

and the symmetric functions computing $s_i$ and $c_i$ are described in Table 3.
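A table of the form of Table 3 can be regenerated by brute force. The sketch below assumes the same classic radix-2 rule used for illustration earlier (not necessarily identical to Table 1), encodes the digits as in Table 2, evaluates $\mathcal{X}_s$, and checks that equal values of $\mathcal{X}_s$ never lead to different $(c_i, s_i)$. It also extracts the intervals of $\mathcal{X}_s$ on which $c_i = -1$ and $s_i = -1$. All names are hypothetical.

```python
ENC = {1: (0, 1), 0: (0, 0), -1: (1, 1)}  # digit -> (x_minus, x), 2's complement as in Table 2


def carry_sum(p, lower_neg):
    """Assumed Table 1 rule: (c_i, s_i) with x_i + y_i = 2*c_i + s_i."""
    if p == 2:
        return 1, 0
    if p == 1:
        return (0, 1) if lower_neg else (1, -1)
    if p == 0:
        return 0, 0
    if p == -1:
        return (-1, 1) if lower_neg else (0, -1)
    return -1, 0  # p == -2


def table3():
    """Map X_s to (c_i, s_i); raises if two inputs with equal X_s disagree."""
    table = {}
    digits = (-1, 0, 1)
    for xi in digits:
        for yi in digits:
            for xl in digits:
                for yl in digits:
                    xm, xb = ENC[xi]
                    ym, yb = ENC[yi]
                    xlm, ylm = ENC[xl][0], ENC[yl][0]
                    x_s = -6 * xm + 3 * xb - 6 * ym + 3 * yb + xlm + ylm
                    cs = carry_sum(xi + yi, xlm == 1 or ylm == 1)
                    if table.setdefault(x_s, cs) != cs:
                        raise AssertionError(f"X_s = {x_s} is ambiguous")
    return dict(sorted(table.items()))


def runs(table, pred):
    """Intervals of consecutive X_s values on which pred((c_i, s_i)) holds."""
    out = []
    for x, cs in table.items():
        if pred(cs):
            if out and out[-1][1] == x - 1:
                out[-1] = (out[-1][0], x)
            else:
                out.append((x, x))
    return out


t = table3()
print(min(t), max(t))                    # -6 8
print(runs(t, lambda cs: cs[0] == -1))   # [(-6, -4), (-2, -1)]
```

Note that the first interval for $c_i = -1$ starts at the lower end $-6$ of the domain, which is the situation covered by Remark 2 below.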
From the table, we derive the interval description (similar to the description of (4)) for the required Boolean functions:
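Assume that $g_j$ and $h_j$, $j = 1, 2, \dots, \kappa$, are computed as in (15), (16), i.e., $g_j = \operatorname{sgn}\{\mathcal{X} - a_j\}$ and $h_j = \operatorname{sgn}\{b_j - \mathcal{X}\}$, where $[a_j, b_j]$ are the intervals of (4).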
We next introduce an implicit depth-1 implementation technique (Lemma 3) based on the fact that any symmetric Boolean function $F_s$, defined as in (4), can be expressed as:

$$F_s = \sum_{j=1}^{\kappa} (g_j + h_j) - \kappa. \qquad (18)$$

8. $w_i$ has to be greater than the maximum value that can be assumed by $w_{i-1}(x_{i-1}^- + y_{i-1}^-)$ which, in this case, is 2.
Proof. To verify (18), it will be shown that $F_s$ is indeed 1 when the sum $\mathcal{X} = \sum_{i=1}^{n} x_i$ lies inside an interval $[a_j, b_j]$ for a specific $j$ and that $F_s$ is 0 when there is no $j$, $1 \le j \le \kappa$, such that $\mathcal{X} \in [a_j, b_j]$.
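• Case 1: $\mathcal{X} \in [a_j, b_j]$ for a specific $j$, $1 \le j \le \kappa$. In this case, $g_l = 1$ for $l \le j$ and $g_l = 0$ for $l > j$, while $h_l = 1$ for $l \ge j$ and $h_l = 0$ for $l < j$. Therefore, $F_s = j + (\kappa - j + 1) - \kappa = 1$, as needed.
• Case 2: There is no $j$, $1 \le j \le \kappa$, such that $\mathcal{X} \in [a_j, b_j]$. In this case, there are three possibilities: $\mathcal{X} \in (b_l, a_{l+1})$ for a given $l$, $1 \le l < \kappa$; $\mathcal{X} \in [0, a_1)$; and $\mathcal{X} \in (b_\kappa, n]$. We will prove that, in all of them, $F_s$ is 0 as needed. In the first subcase, $g_1, \dots, g_l$ and $h_{l+1}, \dots, h_\kappa$ are 1 and all the others are 0, so $F_s = l + (\kappa - l) - \kappa = 0$. In the second subcase, all $g_j$ are 0 and all $h_j$ are 1, so $F_s = \kappa - \kappa = 0$. In the last subcase, all $g_j$ are 1 and all $h_j$ are 0, so $F_s = \kappa - \kappa = 0$.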
Given that any $g_j$ can be obtained with an LTG computing $\operatorname{sgn}\{\mathcal{X} - a_j\}$ and any $h_j$ with an LTG computing $\operatorname{sgn}\{b_j - \mathcal{X}\}$, the entire network is built with $2\kappa$ LTGs, i.e., the implementation cost is in the order of $O(n)$. All the input weights are $\pm 1$ and the fan-in of all the gates is $n$. □
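The implicit scheme of Lemma 3 and the explicit variant of Remark 1 below can be checked numerically. In this sketch (hypothetical helper names), `implicit_value` forms (18) by plain summation of the first-layer outputs, while `explicit_value` adds the single extra gate with threshold $\kappa + 1$.

```python
def gate_outputs(intervals, x):
    """First-layer LTGs of Lemma 3: g_j = sgn(X - a_j), h_j = sgn(b_j - X)."""
    g = [1 if x - a >= 0 else 0 for a, _ in intervals]
    h = [1 if b - x >= 0 else 0 for _, b in intervals]
    return g, h


def implicit_value(intervals, x):
    """(18): F_s = sum_j (g_j + h_j) - kappa, formed by summation (no extra gate)."""
    g, h = gate_outputs(intervals, x)
    return sum(g) + sum(h) - len(intervals)


def explicit_value(intervals, x):
    """Remark 1: one more gate with threshold kappa + 1 yields F_s as a single signal."""
    g, h = gate_outputs(intervals, x)
    return 1 if sum(g) + sum(h) - (len(intervals) + 1) >= 0 else 0


iv = [(1, 2), (4, 4), (7, 9)]
print([implicit_value(iv, x) for x in range(11)])
# [0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0]
```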
The method presented in Lemma 3 can also be applied to the implementation of generalized symmetric functions. Given that, in this case, the number of intervals is upper bounded by $\lceil (\sum_{i=1}^{n} w_i + 1)/2 \rceil$, the implementation cost is upper bounded by $2\lceil (\sum_{i=1}^{n} w_i + 1)/2 \rceil$ LTGs, i.e., it is still in the order of $O(n)$ for weights $w_i$ that do not depend on $n$.
Remark 1. The scheme in Lemma 3 can be changed into an explicit one by connecting all the outputs of the gates computing $g_j$ and $h_j$ to a gate with the threshold value $\kappa + 1$. The output of this extra gate explicitly provides the value of $F_s$ after a delay of 2 TGs.
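Remark 2. If $a_1 = 0$, then $g_1$ is always 1 and (18) becomes:

$$F_s = 1 - \kappa + \sum_{j=2}^{\kappa} g_j + \sum_{j=1}^{\kappa} h_j.$$

If $b_\kappa = n$, then $h_\kappa$ is always 1 and (18) becomes:

$$F_s = 1 - \kappa + \sum_{j=1}^{\kappa} g_j + \sum_{j=1}^{\kappa - 1} h_j.$$

If $a_1 = 0$ and $b_\kappa = n$, then $g_1$ and $h_\kappa$ are always 1 and (18) becomes:

$$F_s = 2 - \kappa + \sum_{j=2}^{\kappa} g_j + \sum_{j=1}^{\kappa - 1} h_j.$$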
It should be noted that, if used in cascaded computation, the method described in Lemma 3 increases the fan-in of the next stage because the value of the function $F_s$ is carried by $2\kappa$ signals.
From Table 3 and using (15), (16), (17), the four Boolean symmetric functions describing the computations of the intermediate sum $s_i$ and carry $c_i$ can be expressed as follows:
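By applying Lemma 3, we derive from (22), (23), (24), (25) an implicit depth-1 implementation of the first step of the “totally parallel” addition scheme. Because the gates computing $\operatorname{sgn}\{\mathcal{X}_s + 6\}$ and $\operatorname{sgn}\{8 - \mathcal{X}_s\}$ always output 1 (the definition domain of $\mathcal{X}_s$ is $[-6, 8]$), by Remark 2 we have that: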
In order to make the way this implicit scheme works more intuitive, we depict it in Fig. 4.
Theorem 1. Assuming radix-2 SD operand representation and the SD codification in Table 2, the addition of two n-SD numbers can be computed by an implicit depth-2 LTN with $11n + 2$ LTGs, a maximum weight value of 6, and a maximum fan-in of 12.
Proof. The quantities in (33), (34) can be computed by doing the proper substitutions, using (26), (27), (28), (29); for all $n$ digit positions, the resulting network is constructed with $11n$ TGs. For digit position $n - 1$, we also have to produce the carry-out. This can be explicitly generated in depth 2 at the expense of two TGs computing:
Therefore, the cost of the entire addition network is $11n + 2$, i.e., of $O(n)$ complexity. Obviously, the weight values and fan-in values do not depend on $n$. The maximum fan-in is 12 and the maximum weight value is 6, i.e., both have $O(1)$ complexity. □
Note that, for this scheme, the value of $z_i$ is carried by two signals and one threshold value, whereas $z_i^-$ is explicitly computed in depth 2. If used in cascaded computation, this method increases the fan-in of the next stage by 1 and contributes 1 to the threshold value of some of the gates in the next stage.
If we compare the scheme introduced in Theorem 1 with the depth-2 scheme presented in [28], which has a network size of $25n + 5$, a maximum fan-in of 26, and a maximum weight value of 123, one can observe that we achieved a substantial reduction in network size, weight, and fan-in values for the same network depth. However, the new depth-2 scheme is implicit and this increases the fan-in of the stage requiring the digits $z_i$ as inputs. In the remainder of this section, we show that it is possible to compute the sum explicitly while maintaining the network depth and complexity.
The method described by (30), (31) is implicit because of the way we compute the final sum bit $z_i$ in (31), (37), (38). To obtain an explicit scheme, we assume that, in order to represent an SD $x$ in the set $\{\bar{1}, 0, 1\}$, we use the codification described in Table 4 instead of the 2's complement codification in Table 2. Note that, with this new codification, the combination $x = 1$ and $x^- = 1$ is not allowed and cannot appear during the computations. Under this assumption, the quantity $\mathcal{X}_s$ can be expressed as in (39) and it can take values in the definition interval $[-12, 8]$.
Thus, the first step of the “totally parallel” addition scheme is described in
The second step of the “totally parallel” addition is the computation of $z_i = s_i + c_{i-1}$. In this case, $z_i$ depends on a weighted sum $\mathcal{X}_z$ of the bits encoding $s_i$ and $c_{i-1}$, and the second step can be described by
Theorem 2. Assuming radix-2 SD operand representation and the SD codification in Table 4, the addition of two n-SD numbers can be computed by an explicit depth-2 LTN with $12n + 2$ LTGs, a maximum weight value of 10, and a maximum fan-in of 14.
Proof. By proper substitutions, (44), (45), (46), (47), (48), (49) provide an explicit depth-2 implementation scheme of the 2-1 addition as follows:
On the first level, we compute, for each digit position $i$, (50), (51). For digit position $n - 1$, we also have to produce the carry-out. This can also be explicitly generated in depth 2 at the expense of two TGs computing:
Therefore, the cost of the entire addition network is $12n + 2$. The maximum fan-in is 14 and the maximum weight value is 10. □
One can observe that all the quantities involved in Theorem 2 are of the same order of magnitude as in Theorem 1. Even though the scheme in Theorem 2 requires a slightly larger maximum fan-in (14 instead of 12) and maximum weight value (10 instead of 6), it has the advantage of explicitly computing the sum digits after a delay of 2 TGs.
SIGNED DIGIT MULTIOPERAND ADDITION AND MULTIPLICATION
Threshold networks for multioperand addition and multiplication of n-bit binary operands have been reported [14], [15], [26], [29]. Generally speaking, multioperand addition and multiplication can be achieved in two steps, namely: First, reduce the multioperand addition (in multiplication, such addition is required for the reduction of the partial product matrix) to two rows; second, add the two rows to produce the final result. In addition to these two steps, multiplication also requires a third step, the generation of the partial product matrix. In this section, we investigate these processes. For such a scheme and nonredundant representations, the following has been suggested:
• The reduction of the multioperand addition (or the reduction of the multiplication partial product matrix) to two rows can be achieved by depth-2 networks with a cost, in terms of LTGs, in the order of $O(n^2)$ and a maximum fan-in in the order of $O(n \log n)$, see, for example, [15], [29].
• The entire multiplication can be implemented by a depth-4 network [14].

It was also suggested in [30], based on a result in [31], that multioperand addition can be computed in depth 2 and multiplication in depth 3, but no explicit construction for the networks and no complexity bounds are provided. A constructive approach can be derived by using the result in [32], which states that a single threshold gate computing $F(X) = \operatorname{sgn}\{\omega_0 + \omega_1 x_1 + \dots + \omega_n x_n\}$ with arbitrary weights can be simulated by an explicit polynomial-size depth-2 network. Such a LOGSPACE-uniform construction, as stated in [32], produces a network with $O(\log^{12} n)$ wires and the weights of those wires in the order of $O(\log^{8} n)$, for a total size of $O(n^{20} \log^{20} n)$. The total size of such a construction was further reduced to $O(n^{12} \log^{12} n)$ in [33].
LOGSPACE-uniform constructions for depth-2 multioperand addition and depth-3 multiplication have been suggested in [32], but the discussion about depth-2 multioperand addition or depth-3 multiplication schemes is marginal and no complexity bounds are explicitly given. In an attempt to assess the complexity of such a scheme for multioperand addition, which operates on an $n^2$-input function instead of an $n$-input function, we can use the least expensive scheme in [32] and estimate that such a depth-2 multioperand addition or depth-3 multiplication network may require a total size of $O(n^{24} \log^{24} n)$. In this section, we investigate the potential benefit that can be expected by using SD represented operands in TL multiplication schemes. First, we prove that multioperand addition can be achieved by a depth-2 network with $O(n^3)$ size, $O(n^3)$ weights, and $O(n^2)$ fan-in complexities. It must be noted that the proposed network performs an n-operand to one-result reduction in depth 2, not an n-operand to two-row reduction in depth 2 as previously proposed schemes [15], [29] do. Subsequently, we show that multiplication (that is, the generation of the partial products and the matrix reduction into one row representing the product) can be achieved with a depth-3 network with $O(n^3)$ size, $O(n^3)$ weights, and $O(n^2 \log n)$ fan-in complexities.
Depth-2 Multioperand Addition
It is well known that, in order to perform n-bit multioperand addition, first, the n rows (representing the n numbers) are reduced to two, then the two rows are added to produce the final result. This two-step process is depicted, for the particular case of eight 8-bit numbers, in Fig. 5a. As indicated in the introduction of this section, the first step of multioperand addition not using redundant digit representations requires a depth-2 network, and additional depth is required to perform the second step.
In the following, we will prove that, if we assume SD operands in an appropriate representation radix, the multioperand addition of n n-SD numbers and, consequently, the reduction of the partial product matrix of the multiplication operation into one row can be achieved in one computation step, as in Fig. 5b, requiring a depth-2 network. This is achieved by determining a radix that allows an n-digit “totally parallel” addition. Avizienis investigated this issue in [1], but from the dual point of view, by assuming a given radix-r SD representation and determining the maximum number of digits that can be added in “totally parallel” mode within that radix-r SD representation. In our investigation, the number of digits n is given and a minimum value for the radix r must be found to compute the addition of n SDs in “totally parallel” mode. We answer this question in the following lemma.
Lemma 4. The simultaneous addition of n SDs can be done in a “totally parallel” mode by assuming a representation radix greater than or equal to $2n - 1$.
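Proof. The simultaneous addition of n SDs can be done in a way similar to the addition of two digits. That is, in order to add the n digits $x_i^1, x_i^2, \dots, x_i^n$ of position $i$ in “totally parallel” mode, we first have to produce an intermediate sum digit $u_i$ and a transport digit $t_i$ that satisfy

$$x_i^1 + x_i^2 + \dots + x_i^n = r\,t_i + u_i \qquad (54)$$

and, also, we have to satisfy the constraint indicating that the subsequent addition

$$z_i = u_i + t_{i-1}, \qquad (55)$$

which gives the value of the sum digit $z_i$ in position $i$, can be performed without generating a carry-out.

We have to find the value of the radix $r$ for which the computation in (54), (55) can be carried out for all possible input digits. Requiring that every sum of n digits can be decomposed as in (54) and that $z_i$ in (55) remains a valid digit, we can derive the following inequalities:

$$n\,|x|_{max} \le r\,|t|_{max} + |u|_{max}, \qquad |u|_{max} + |t|_{max} \le |x|_{max}. \qquad (58)$$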
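In order to obtain the greatest range for $|t|_{max}$, we have to assume the maximum redundancy digit set, i.e., $|x|_{max} = r - 1$ and, for the intermediate sum, an absolute maximum value of $|u|_{max} = \lfloor r/2 \rfloor$. This, together with (58) and depending on whether we assume an odd radix $r_o$ or an even one $r_e$, leads to $r_e \ge 2n$ or $r_o \ge 2n - 1$. For example, for an odd radix $r_o = 2m + 1$, we get $|u|_{max} = |t|_{max} = m$, and the first inequality in (58) becomes $2nm \le (2m + 1)m + m$, i.e., $n \le m + 1$, or $r_o \ge 2n - 1$. Therefore, in order to perform the simultaneous addition of n SDs in a “totally parallel” mode, we have to use a representation radix greater than or equal to $2n - 1$. □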
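The bound of Lemma 4 can be checked numerically. The sketch below (hypothetical helper names) evaluates the inequalities (58) under the proof's assumptions $|x|_{max} = r - 1$ and $|u|_{max} = \lfloor r/2 \rfloor$, cross-checks them against a brute-force search over all admissible $(t, u)$ pairs, and searches for the smallest admissible radix.

```python
def parallel_ok(n, r):
    """Inequalities (58) with |x|max = r - 1 and |u|max = floor(r/2).

    |t|max = r - 1 - |u|max keeps z = u + t_prev a valid digit; every column sum in
    [-n(r-1), n(r-1)] must then be reachable as r*t + u.
    """
    u_max = r // 2
    t_max = r - 1 - u_max
    return n * (r - 1) <= r * t_max + u_max


def parallel_ok_brute(n, r):
    """Same question answered by trying every admissible (t, u) pair."""
    u_max = r // 2
    t_max = r - 1 - u_max
    return all(
        any(abs(s - r * t) <= u_max for t in range(-t_max, t_max + 1))
        for s in range(-n * (r - 1), n * (r - 1) + 1)
    )


def min_radix(n):
    """Smallest radix that allows totally parallel addition of n digits."""
    r = 2
    while not parallel_ok(n, r):
        r += 1
    return r


print([min_radix(n) for n in range(2, 8)])  # [3, 5, 7, 9, 11, 13]
```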
Assuming a representation radix of $2n - 1$, we introduce the depth-2 multioperand addition scheme for n n-SD numbers.
Theorem 3. Assuming radix-$(2n-1)$ SD representation, the multioperand addition of n n-SD numbers (that is, the reduction via addition of an n-digit, n-row matrix to one row) can be computed by an explicit depth-2 LTN with a size of $O(n^3)$. The maximum weight value is in the order of $O(n^3)$ and the maximum fan-in value is in the order of $O(n^2)$.
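Proof. Assume that the n SD numbers we have to add are $X^j = (x_n^j, x_{n-1}^j, \dots, x_1^j)$, $j = 1, 2, \dots, n$, and that all the digits $x_i^j$, $i, j = 1, 2, \dots, n$, can take values within the symmetric digit set $\mathcal{D} = \{-(2n-2), \dots, -1, 0, 1, \dots, 2n-2\}$.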
Given that radix $2n - 1$ allows for “totally parallel” addition of n SDs, we can compute the sum of the n numbers as follows: For each position $i$, produce an intermediate sum digit $u_i$ and a transport digit $t_i$ that satisfy $u_i + (2n-1)t_i = x_i^1 + x_i^2 + \dots + x_i^n$; the sum digit $z_i$ in position $i$ is computed as $z_i = u_i + t_{i-1}$ without generating a carry-out. If we assume that the greatest absolute values for the input digits, transport digits, and intermediate sum digits are $|x|_{max} = 2n - 2$, $|t|_{max} = n - 1$, and $|u|_{max} = n - 1$, respectively, the sum digit $z_i$ depends only on the values of the digits in columns $i$ and $i - 1$ of the multioperand addition matrix and can be computed with the two-step approach. With this scheme, the network implementing the multioperand addition contains one subcircuit performing this computation for each digit position $i$, $i = 1, 2, \dots, n$. Obviously, the cost of the entire network is $n$ times the cost of the circuit performing the “totally parallel” addition of $n$ digits. The delay of the multioperand addition, the maximum weight, and the maximum fan-in values are imposed by the corresponding values of the circuit performing the “totally parallel” addition of $n$ digits.
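The column-wise arithmetic of this proof can be sketched in Python (this models the digit computations, not the threshold network). The transport digit is chosen here as $t_i = \lfloor (S_i + n - 1)/(2n-1) \rfloor$, where $S_i$ is the column sum; this is one valid choice, assumed for the example, that places $u_i$ in $[-(n-1), n-1]$. The names `multioperand_add` and `value` are hypothetical.

```python
def multioperand_add(rows, n):
    """Add n SD numbers in radix r = 2n - 1 in one 'totally parallel' step.

    rows: digit lists (least significant first) with |digit| <= 2n - 2;
    each column may hold at most n digits.
    """
    r = 2 * n - 1
    width = max(len(row) for row in rows)
    t = [0] * width
    u = [0] * width
    for i in range(width):
        column = sum(row[i] for row in rows if i < len(row))
        t[i] = (column + n - 1) // r  # transport digit, |t_i| <= n - 1
        u[i] = column - r * t[i]      # intermediate sum, |u_i| <= n - 1
    # z_i depends only on columns i and i - 1; no further carries are generated
    z = [u[i] + (t[i - 1] if i > 0 else 0) for i in range(width)]
    z.append(t[width - 1])
    return z


def value(digits, r):
    return sum(d * r ** i for i, d in enumerate(digits))


n = 4                                 # radix 7, digits in [-6, 6]
rows = [[6, -6, 5, 0], [6, 6, -6, 3], [6, -1, 2, -6], [6, 6, 6, 6]]
z = multioperand_add(rows, n)
print(z)                                                   # [3, 1, 1, 4, 0]
print(value(z, 7) == sum(value(row, 7) for row in rows))   # True
```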
The direct implementation of this two-step computation procedure with the scheme in Lemma 1 is not convenient because it would lead to a depth-4 LTN. However, given that any generalized symmetric Boolean function can be implemented with a depth-2 network, we can reduce the depth of the network to 2 if we are able to compute the value of $z_i$ with a symmetric function of $2n$ input variables, i.e., all the digits in columns $i$ and $i - 1$ of the multioperand addition matrix. This can be done by observing the direct link that exists between the value of $z_i$ and the value assumed by the weighted sum $\mathcal{X}$ of all the $2n$ digits in columns $i$ and $i - 1$, computed as in (59). The bounds on the digit values lead to $|\mathcal{X}|_{max} = 4n^2(n-1)$ and to a variation domain for $\mathcal{X}$ equal to $[-4n^2(n-1), 4n^2(n-1)]$.
Because the digits involved in the computation in (59) belong to the set $\mathcal{D}$, we need $\lceil \log(2n-1) \rceil + 1$ bits for their 2's complement codification. Under this codification, each digit $x_i^j$ is represented by a tuple of bits, and each of these bits takes part in the computation of $\mathcal{X}$ with a weight that corresponds to its position inside the digit, following the 2's complement codification convention. With this assumption, (59) becomes:
Assuming all of these, the sum digit $z_i$ can be expressed by a function $F(\mathcal{X})$. Obviously, because of the weighted manner in which we computed the sum $\mathcal{X}$, the function $F$ is symmetric in all of its input variables⁹ and, consequently, it can be implemented using the method described in Lemma 1 with a depth-2 LTN. Because $z_i$ can assume any digit value in the set $\mathcal{D}$, we again need $\lceil \log(2n-1) \rceil + 1$ bits for its codification. Therefore, in order to compute $F(\mathcal{X})$, we have to compute $\lceil \log(2n-1) \rceil + 1$ symmetric Boolean functions. Asymptotically speaking, this leads to an implementation of the multioperand addition of n n-SD numbers with a depth-2 network having a number of LTGs in the order of $O(n^3)$.
The maximum weight value is upper bounded by the dimension of the definition domain, i.e., $8n^2(n-1) + 1$, and, consequently, it is in the order of $O(n^3)$. The maximum fan-in value is imposed by the gates in the second level of the network, which take as inputs all the bits participating in the computation, i.e., $2n(\lceil \log(2n-1) \rceil + 1)$, and some outputs of the gates on the first level. The total number of gates in the first level of the network is upper bounded by $\lceil (8n^2(n-1) + 1)/(2n-1) \rceil$ and, consequently, the maximum fan-in value is in the order of $O(n^2)$. □
We conclude our investigation of TL networks for the multiplication of SD operands by introducing a depth-3 LTN for multiplication that uses the multioperand addition scheme presented in Theorem 3.
Depth-3 Multiplication
Multiplication is achieved with the generation and reduction of a partial product matrix. In the previous section, we showed that the multioperand addition (and, by extension, the reduction of the multiplication partial product matrix) can be performed in depth 2 using threshold networks and SD representations. In this section, we investigate the entire multiplication operation, including the generation of the partial product matrix.
In the case of nonredundant operand representation, the partial product matrix can be generated at the expense of $n^2$ TGs in depth 1, because we need one AND gate to produce each partial product $z_{i,j} = x_i \times y_j$, $i, j = 0, 1, \dots, n - 1$. This may not be true for signed-digit operands, where each partial product $z_{i,j}$ is an SD that has to be computed as the product of two SDs $x_i$ and $y_j$. In essence, even though, using TL and SD representation, the partial product reduction can be achieved by a depth-2 network, it does not follow that multiplication can be achieved by a depth-3 network.
To achieve a depth-3 multiplication, we use Theorem 3 for the reduction of the partial product matrix and use implicit computations in the network connecting the partial product generation and the first stage of partial product reduction. Given that, in order to use the scheme in Theorem 3, every partial product must belong to the digit set $\mathcal{D}$, i.e., $|x_i \times y_j| \le 2n - 2$, we limit the maximum absolute value of the digits $x_i$ and $y_j$ to $\lfloor\sqrt{2n-2}\rfloor$. In the following lemma, we assume that the operand digits are represented with the 2's complement codification discussed in Section 2 and prove that the entire partial product matrix can be produced by a depth-2 LTN with polynomially bounded size, weight, and fan-in values.

9. The number of input Boolean variables is given by the product of the number of digits involved in the computation of $z_i$ and the number of bits we need in order to represent a digit in $\mathcal{D}$, i.e., $2n(\lceil \log(2n-1) \rceil + 1)$.

10. If the multioperand addition matrix is the partial product matrix corresponding to the multiplication of two n-SD numbers, the number of columns is $2n$ and the cost changes as a consequence. However, this does not change the asymptotic cost.
Lemma 5. Assuming two $n$-SD operands, the partial product matrix $\|z_{i,j}\|_{i,j=0,1,\ldots,n-1}$, $z_{i,j} = x_i \times y_j$, can be produced by a depth-2 LTN with the size measured in terms of LTGs in the order of $O(n^3)$. The maximum weight value is in the order of $O(n)$ and the maximum fan-in value is in the order of $O(n)$.
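A product of two 2's complement digits is a signed, weighted sum of bit-level ANDs, which is why it lends itself to threshold logic. The following sketch is our own illustration, not the construction used in the proof: it assumes a $d$-bit 2's complement encoding in which the most significant bit has weight $-2^{d-1}$, and it simply picks $d$ = `m.bit_length() + 1` bits for digits of magnitude at most $m$, which suffices for 2's complement but is not necessarily the value of $d$ used below. The function names are hypothetical.

```python
from typing import List

def to_twos_complement(v: int, d: int) -> List[int]:
    """Encode v in d-bit 2's complement, least significant bit first."""
    if not -(1 << (d - 1)) <= v < (1 << (d - 1)):
        raise ValueError(f"{v} does not fit in {d} bits")
    u = v & ((1 << d) - 1)  # bit pattern of v modulo 2**d
    return [(u >> k) & 1 for k in range(d)]

def bit_weight(k: int, d: int) -> int:
    # The most significant bit weighs -2**(d-1); every other bit weighs +2**k.
    return -(1 << k) if k == d - 1 else 1 << k

def product_from_bit_ands(a: List[int], b: List[int]) -> int:
    """x * y as the sum over k, l of w_k * w_l * (a_k AND b_l)."""
    d = len(a)
    return sum(bit_weight(k, d) * bit_weight(l, d) * (a[k] & b[l])
               for k in range(d) for l in range(d))

# Digits bounded by m = 2 fit in d = 3 bits with this choice of d.
m = 2
d = m.bit_length() + 1
for x in range(-m, m + 1):
    for y in range(-m, m + 1):
        assert product_from_bit_ands(to_twos_complement(x, d),
                                     to_twos_complement(y, d)) == x * y
```

Each term $a_k \wedge b_l$ is a single AND, and the weights $\pm 2^{k+l}$ are fixed, so the product is a fixed linear form over $d^2$ bit products.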
Proof. We assume that all the SDs are represented in 2's complement notation by $d$ bits.
The value of $d$ is imposed by the maximum absolute value $\lfloor\sqrt{2n-2}\rfloor$ we have assumed for the operand digits and is equal to $\lfloor\log\lfloor\sqrt{2n-2}\rfloor\rfloor + 1$. With these assumptions, the partial product $z_{i,j}$ can be expressed as in the following equation: […]

The maximum weight value is bounded by the dimension of the definition domain for the p r 1 m functions, i.e., $4(n-1)+1$, and, consequently, it is in the order of $O(n)$. The maximum fan-in value is imposed by the gates in the second level of the network, which take as inputs all the bits participating in the computation, i.e., $2d = 2\lfloor\log\lfloor\sqrt{2n-2}\rfloor\rfloor + 2$, and some outputs of the gates on the first level. Because we proved that the total number of gates in the first level of the network is upper bounded by $\lceil (4(n-1)+1)/2 \rceil$, the maximum fan-in value is also in the order of $O(n)$.
□

By connecting the results for the multioperand addition and the generation of the partial product matrix for SD operands, we obtain a depth-4 scheme for the multiplication of SD numbers, as stated in the following corollary:

Corollary 1. Assuming radix-$(2n-1)$ SD representation, the multiplication of two $n$-SD numbers can be computed by an explicit depth-4 LTN with the size measured in terms of LTGs in the order of $O(n^3)$. The maximum weight value is in the order of $O(n^3)$ and the maximum fan-in value is in the order of $O(n^2)$.
Proof. Trivial from Lemma 5 and Theorem 3. □
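The cost accounting behind Corollary 1 can be written out explicitly. This assumes the Theorem 3 bounds for multioperand addition as summarized in the Conclusions (depth 2, size $O(n^3)$, weights $O(n^3)$, fan-in $O(n^2)$), which, per footnote 10, remain asymptotically valid for the $2n$-column partial product matrix. Cascading the two networks, depths and sizes add, while weight and fan-in bounds take the maximum over the two stages:

$$
\begin{aligned}
\text{depth} &= \underbrace{2}_{\text{Lemma 5}} + \underbrace{2}_{\text{Theorem 3}} = 4,\\
\text{size} &= O(n^3) + O(n^3) = O(n^3),\\
\text{weight} &= \max\{O(n),\, O(n^3)\} = O(n^3),\\
\text{fan-in} &= \max\{O(n),\, O(n^2)\} = O(n^2).
\end{aligned}
$$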
The delay of the multiplication network can still be reduced by producing the partial product matrix using an implicit computation scheme presented in Lemma 3.
Theorem 4. Assuming radix-$(2n-1)$ SD representation, the multiplication of two $n$-SD numbers can be computed by an explicit depth-3 LTN with the size in the order of $O(n^3)$. The maximum weight value is in the order of $O(n^3)$ and the maximum fan-in value is in the order of $O(n^2 \log n)$.
Proof. Trivial. First, use the implicit implementation (Lemma 3) to produce the partial products $z_{i,j}$ with the delay of one TG. This derivation does not change the asymptotic costs derived in Lemma 5. Second, use the depth-2 multioperand addition of Theorem 3 to produce the product. The implicit computation of the partial products only increases the fan-in of the gates in the first level of the network performing the multioperand addition from $2n(\log(2n-1)+1)$ to at most $2n(4n-3)(\log(2n-1)+1)$. This changes the asymptotic bound for the fan-in from $O(n^2)$ to $O(n^2 \log n)$. The asymptotic size of the network and the maximum weight value remain unchanged. Consequently, this depth-3 scheme has a network size in the order of $O(n^3)$ and a maximum weight value in the order of $O(n^3)$. □
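The depth and fan-in figures in the proof of Theorem 4 follow from a short calculation. Assuming base-2 logarithms, $\log(2n-1) \le \log(2n) = \log n + 1$, so the enlarged first-level fan-in is bounded as follows:

$$
\begin{aligned}
\text{depth} &= \underbrace{1}_{\text{partial products (Lemma 3)}} + \underbrace{2}_{\text{Theorem 3}} = 3,\\
2n(4n-3)\bigl(\log(2n-1)+1\bigr) &\le 2n \cdot 4n \cdot (\log n + 2) = 8n^2(\log n + 2) = O(n^2 \log n).
\end{aligned}
$$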
CONCLUSIONS
We investigated LTNs for symmetric Boolean functions, two-operand (2-1) addition, multioperand addition, and multiplication. We assumed SD number representation and were mainly concerned with establishing the limits of circuit designs using threshold-based networks. We have shown that, assuming radix-2 representation, the addition of two $n$-SD numbers can be computed by an explicit depth-2 LTN with $O(n)$ size and $O(1)$ weight and fan-in values. If a higher radix of $2n-1$ is assumed, we proved that the multioperand addition of $n$ $n$-SD numbers can be computed by an explicit depth-2 LTN with the size in the order of $O(n^3)$, with the maximum weight value in the order of $O(n^3)$ and the maximum fan-in value in the order of $O(n^2)$. Finally, we have shown that the multiplication of two $n$-SD numbers can be computed by an explicit depth-3 LTN with the size in the order of $O(n^3)$. The maximum weight value is in the order of $O(n^3)$ and the maximum fan-in value is in the order of $O(n^2 \log n)$.

York, and the Glendale laboratory in Endicott, New York. At IBM, he was involved in a number of projects regarding computer design, organizations, and architectures, and in the leadership of advanced research projects. A number of his design and implementation proposals have been implemented in commercially available systems and processors, including the IBM 9370 model 60 computer system, the IBM POWER II, the IBM AS/400 Models 400, 500, and 510, Server Models 40S and 50S, the IBM AS/400 Advanced 36, and the IBM S/390 G4 and G5 computer systems. For his work, he received numerous awards, including 23 levels of Publication Achievement Awards, 15 levels of Invention Achievement Awards, and an Outstanding Innovation Award for Engineering/Scientific Hardware Design in 1989. Six of his 65 patents have been rated with the highest patent ranking in IBM and, in 1990, he was awarded the highest number of patents in IBM.
Dr. Vassiliadis is a member of the IEEE Computer Society and an IEEE fellow. His research interests include computer architecture, embedded systems, hardware design and functional testing of computer systems, parallel processors, computer arithmetic, neural networks, fuzzy logic and systems, and software engineering.
