Introduction
In this paper we address the problem of multiple-valued multiplication. First, we give an overview of recent approaches to optimizing the multiplication procedure. The first investigations into the use of quasiternary number representations were made in the 1950s. The basic idea was presented by Booth in 1951 [1]. His idea was to combine two binary digits and to recode them into signed digits. The aim of his work was to develop a uniform algorithm for the multiplication of two signed binary numbers. Its disadvantage is an unfavorable behaviour if sequences of "01" occur: in this case the recoded quasiternary number has more covered digits than the original one. A decisive alteration of the recoding structure from two to three bits by MacSorley [11] made the modified Booth recoding the most common recoding structure for binary multipliers. It is an advantage of this recoding algorithm that at most half of the digits can be covered. A design that takes these properties into account achieves a considerable saving in partial products when the recoding is used in a multiplier. In spite of the quasiternary recoding, the hardware realization is carried out in established binary logic [5].
In parallel, circuits using multiple-valued logic (MVL) were developed [7]. In recent years, a number of papers combining MVL techniques and recoding algorithms for the development of multipliers have been presented. Such a combination was given by Chen and Rajashekhara [2] in CMOS technique. However, the complexity of realizing the multiplier is very high, because the chosen radix is 2 and only adders with two inputs are used. A mixed multiple-valued/binary approach was presented by Etiemble and Navi using current-mode circuits [4]. By means of special 3BC cells (3-valued to binary current-mode converters), ternary inputs are recoded to binary outputs that are summed up to multiple-valued currents. While all approaches mentioned so far work with radix 2 instead of higher radices, Kameyama et al. [10] gave an algorithm for multiplication in radix 4. The circuits of Ishizuka et al. [9] also operate in radix 4, using current-mode circuits and a radix-4 redundant number system. Ibrahim and Abdul-Karim [8] developed a radix-3 2x2 trit multiplier in CMOS. Besides the fact that the introduced adders are too extensive, they use the ternary digital COS-MOS technique introduced by Mouftah and Jordan [12]. However, the use of resistors is of no significance for today's VLSI realizations.
None of the approaches presented makes use of all advantages of the MVL technique. On the one hand, this is due to the use of redundancy in the number representation in combination with a suboptimal radix. On the other hand, where an optimal radix is used, the possible number of inputs of an adder is not exhausted. An early attempt with an optimal combination of the number of adder inputs and the radix was made by Vranesic and Hamacher in ternary multi-threshold logic [13]. Their ternary multiplier worked directly in radix 3. The introduced adder has four inputs and accomplishes a four-to-two reduction of summands. Since every output has to be connected to an input of a following adder (provided that the output is not part of the result), this extension leads to a considerable saving of adders compared to a binary technique. A promising approach was presented by De and Sinha [3] in I²L technique. Besides an optimal combination of the number of inputs and the radix, they add parallelism through precarry operations that speed up the multiplication. In this paper we present ternary multipliers using the optimal radix 3, a non-redundant encoding scheme, and adders with up to four inputs.
In the following section we derive some equations needed for the estimations in the later sections. In section 3 the sum and carry circuits of a ternary 4-input adder as well as a simulation result are presented. Section 4 compares the complexity of multiplication circuits using different kinds of adders. The timing behaviour of the resulting multiplication circuits is analyzed in section 5. For further improvements, ternary carry look-ahead circuits are presented in section 6. The main results are summarized in the conclusion.
Basic relations
In the following we assume that the multiplicand U and the multiplier V have the same wordlength n. This is no general limitation, but it simplifies the derived equations. Both numbers, U and V, have the representation

$$U = (u_{n-1}, u_{n-2}, \ldots, u_1, u_0), \qquad V = (v_{n-1}, v_{n-2}, \ldots, v_1, v_0) \tag{1}$$

whereby $u_i$ and $v_i$ can take the values 0 and ±1. The representation of U and V in (1) is assumed to radix 3. Each digit position i of U and V corresponds to a so-called weight, which is $3^i$ in the ternary case. The corresponding decimal value follows as

$$U = \sum_{i=0}^{n-1} u_i 3^i \tag{2}$$
The representation in (2) illustrates that the ternary expression is simpler than a binary one, due to the ternary values 0 and ±1. The multiplication of U and V (M = U·V) leads to equation (3) [3]:

$$M = \sum_{i=0}^{n-1} 3^i \sum_{j=0}^{i} u_{i-j} v_j + \sum_{i=n}^{2n-2} 3^i \sum_{j=i-n+1}^{n-1} u_{i-j} v_j \tag{3}$$
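As a plausibility check of relations (2) and (3), the following minimal Python sketch evaluates a ternary digit vector and multiplies two operands column by column. The function names and the least-significant-digit-first list layout are assumptions made for this illustration only.

```python
def ternary_value(digits):
    """Decimal value per relation (2); digits[i] in {-1, 0, 1} has weight 3**i."""
    return sum(d * 3**i for i, d in enumerate(digits))


def multiply_by_partial_products(u, v):
    """Product per relation (3): sum the partial products u[i-j]*v[j] of each weight i."""
    n = len(u)
    assert len(v) == n, "both operands are assumed to have wordlength n"
    total = 0
    for i in range(2 * n - 1):
        j_lo = max(0, i - n + 1)   # lower index bound (second sum in (3) for i >= n)
        j_hi = min(i, n - 1)       # upper index bound (first sum in (3) for i < n)
        column = sum(u[i - j] * v[j] for j in range(j_lo, j_hi + 1))
        total += 3**i * column
    return total


u = [1, -1, 0, 1]    # 1 - 3 + 0 + 27 = 25
v = [-1, 0, 1, 1]    # -1 + 0 + 9 + 27 = 35
print(ternary_value(u), ternary_value(v), multiply_by_partial_products(u, v))
# prints: 25 35 875
```

Negative operands need no special treatment in this sketch, since the sign is carried by the digits themselves.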
The multiplication $u_i v_j$ will be called a partial product pp(i+j). In the ternary case this partial product can only lie within the range -1 to +1. Usual adders sum up several inputs of the same weight and produce two outputs, one at the same and one at the next higher digit position. As we work to radix 3, the result can lie within the range -4 to +4. This means that a ternary adder can process four inputs of the same digit position, in contrast to three inputs of a binary adder. A further advantage of the ternary number representation is the treatment of the sign, because it causes no additional expense. Leaving aside the recoding algorithms mentioned in the introduction, in binary technique we have to spend one more bit to accommodate the sign. Taking the sign bit into account, the number of digits b of a binary representation with the same resolution as a ternary representation is

$$b = n \cdot \frac{\log(3)}{\log(2)} + 1 \approx n \cdot 1.58 + 1 \tag{4}$$

With increasing n, the number of partial products in a multiplication naturally increases. Independent of the radix,

$$pp(i) = \begin{cases} i + 1, & 0 \le i \le n-1 \\ 2n - 1 - i, & n \le i \le 2n-2 \end{cases}$$

partial products have to be processed for every digit position i. Summed up from the starting index 0 to the highest index 2n-2 of the n × n multiplication result, it follows that

$$PP = \sum_{i=0}^{2n-2} pp(i) = n^2$$
The higher number of digits of a binary representation consequently influences the number of partial products PP quadratically. On the other hand, the expense of ternary gates is much higher than that of binary ones. First, the following analysis will show in detail whether the ternary technique is advantageous compared to the binary one. For this analysis, we have to develop a ternary adder with four inputs and two outputs. Second, we will analyze the effects of a reduction from k inputs to l outputs with k ≥ l for adders with k = 2 to 4 inputs. Since every output of an adder has to be connected to an input of a following adder, unless this output is part of the result, a nonlinear relation follows between the number of adders and the wordlength n. This relation is analyzed in dependence on the types of adders. With this analysis we can compare the complexity of binary and ternary multipliers.
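The quadratic influence of the wordlength can be illustrated numerically. The sketch below counts the partial products per digit position of an n × n multiplication and evaluates relation (4); rounding b to the nearest integer is an assumption of this example, chosen to mirror the wordlengths quoted later in the text, and all function names are hypothetical.

```python
import math


def pp(i, n):
    """Number of partial products of weight index i in an n x n multiplication."""
    if not 0 <= i <= 2 * n - 2:
        raise ValueError("weight index out of range")
    return i + 1 if i < n else 2 * n - 1 - i


def total_pp(n):
    """Total number of partial products, summed over all weights 0 .. 2n-2."""
    return sum(pp(i, n) for i in range(2 * n - 1))


def binary_digits(n):
    """Binary wordlength with the resolution of n trits, per relation (4)."""
    return n * math.log(3) / math.log(2) + 1


for n in (16, 32):
    b = round(binary_digits(n))          # assumption: round to the nearest bit
    print(f"n={n}: ternary PP={total_pp(n)}, b={b}, binary PP={total_pp(b)}")
```

Under these assumptions a binary multiplier of equivalent resolution has to process roughly 2.5 times as many partial products as the ternary one, which is the square of the factor 1.58 in (4).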
TDDNL 4-input adder
As a ternary switching technique we have chosen the ternary dynamic differential no race logic (TDDNL) technique [6]. The realization of logical operations with one or two inputs as well as the interface circuits to the binary system are given in [6]. The design of a TDDNL adder with three inputs is also shown there. This design is now expanded to four inputs.
TDDNL gates use a logical array of nMOS transistors with three outputs, $f_n$, $f_0$, and $f_p$, i.e., a negative, a zero, and a positive output. In the evaluation phase, exactly one of these three outputs must be connected with the source M. Thus, the development of optimal arrays is an optimization problem that can be solved by existing software tools. In contrast, the method presented here for designing the logic arrays is heuristic. With this heuristic method we obtain optimal solutions for the derived adder as well as for the carry look-ahead gate presented in section 6. The heuristic method for determining the logical array, with every variable having a fixed weight, can be described as follows:
Each variable is processed in one step. First, if the variables have differing weights, we have to decide which variable is to be used first. If all variables are equal in weight, the order of the variables is of no importance. In the first step, all values of the first variable are covered. All nodes arising from this step are combined with all three values of the next variable. The current sums are then written to the newly arising nodes. Each sum is calculated from the variables processed up to this point. All nodes with the same sum can be connected together. For a further simplification, attention must be paid to whether a node already represents a fixed result, i.e., a result that does not depend on the following variables. In this case, the node can be connected to the corresponding output. All other nodes are processed in the same manner as described, until all variables have been used.
The following development of a 4-input sum and carry circuit illustrates this method. In the case of a ternary adder with four inputs, the four variables w, x, y, and z have to be processed. The order of the variables is of no significance because the four inputs have the same weight. At each node we form a sum that is composed of the complete path back to the source node M. For the ternary carry signal in fig. 2, the sum after the first variable w is -1 at the left node, 0 at the middle node, and +1 at the right node. These three nodes are combined with the three values of x. Starting from the left node, the sum is -2 for x = -1, -1 for x = 0, and 0 for x = +1. After this procedure has been carried out for all 9 cases, we look for nodes with equal sums. For the carry signal we can reduce the nine nodes to five, since in four cases the sums are equal. Connecting these nodes means that the result is independent of the way in which the sum was obtained. In particular, an intermediate sum of 1 is independent of how it comes about, either by the combination w = 1 / x = 0 or by w = 0 / x = 1. The number of nodes that require further processing is minimal with this method. Each of the five nodes after w and x, with sums in the range from -2 to +2, must be combined with the three values of y. Connecting the nodes with the same sum after y results in seven nodes in the range from -3 to +3. Because a negative carry has to be set if the sum is equal to or less than -2, the node with the sum -3 can be connected directly to the negative carry output in fig. 2. Similar considerations eliminate the node with the sum +3. The remaining five nodes have to be combined with all values of z. The logical array obtained by this method has the least number of transistors for this problem. As a further property of this method, no states are propagated back in the network that could cause a wrong result (compare fig. 12 in [6]).
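The carry rule used in this construction can be checked with a purely behavioural model of the 4-input adder. The sketch below is not a circuit description; the function name `adder4` and the tuple return value are assumptions for illustration. It confirms that the sum and carry digits stay within -1 to +1 for all 81 input combinations, and that an intermediate sum of -3 after three inputs fixes the carry regardless of z.

```python
from itertools import product

TRITS = (-1, 0, 1)


def adder4(w, x, y, z):
    """Behavioural model: return (sum_digit, carry_digit) of four ternary inputs."""
    total = w + x + y + z            # lies in the range -4 .. +4
    if total <= -2:
        carry = -1                   # negative carry for sums of -2 or less
    elif total >= 2:
        carry = 1                    # positive carry for sums of +2 or more
    else:
        carry = 0
    return total - 3 * carry, carry


# Exhaustive check over all 3**4 = 81 input combinations.
for inputs in product(TRITS, repeat=4):
    s, c = adder4(*inputs)
    assert s in TRITS and c in TRITS
    assert 3 * c + s == sum(inputs)

# With w = x = y = -1 the carry is already fixed, whatever z is.
assert all(adder4(-1, -1, -1, z)[1] == -1 for z in TRITS)
```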
The logical array for the sum is even more favorable than that for the carry, because the modulo-3 operation ensures that the intermediate sums cannot exceed the range -1 to +1. Regardless of the number of variables used, at most three nodes have to be processed further after every step. Both logical arrays, for the sum and for the carry, are given in fig. 2 and fig. 3. The carry circuit needs 39 nMOS transistors, whereas the sum circuit uses only 30 transistors. These circuits have to be extended by the TDDNL driver in fig. 1.
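The heuristic construction can be mimicked by a small counting model. The sketch below is a simplified abstraction, not the authors' design procedure: it assumes that every node still to be processed receives exactly three transistors (one per value of the next variable), that nodes with the same intermediate sum are merged, and that a node is wired directly to an output as soon as no remaining variable can change the result. In this model the node with intermediate sum 0 after y also counts as fixed, since no value of z can produce a carry from it. All names are hypothetical.

```python
DIGITS = (-1, 0, 1)


def sum_digit(total):
    """Balanced ternary digit of an integer, in {-1, 0, 1}."""
    return (total + 1) % 3 - 1


def carry_digit(total):
    """Carry digit belonging to sum_digit for totals in -4 .. +4."""
    return (total - sum_digit(total)) // 3


def count_array(num_inputs, output_fn, merge_key):
    """Return (transistors, merged nodes per step, open nodes per step)."""
    nodes = {0}                      # source node M with intermediate sum 0
    transistors = 0
    merged_per_step, open_per_step = [], []
    for step in range(num_inputs):
        remaining = num_inputs - step - 1
        merged, still_open = set(), set()
        for state in nodes:
            for d in DIGITS:         # one transistor per value of the variable
                transistors += 1
                nxt = merge_key(state + d)
                merged.add(nxt)
                # All outputs still reachable with the remaining variables.
                outcomes = {output_fn(nxt + r)
                            for r in range(-remaining, remaining + 1)}
                if len(outcomes) > 1:
                    still_open.add(nxt)   # result not fixed yet
        merged_per_step.append(len(merged))
        open_per_step.append(len(still_open))
        nodes = still_open
    return transistors, merged_per_step, open_per_step


print(count_array(4, carry_digit, lambda s: s))   # carry array
print(count_array(4, sum_digit, sum_digit))       # sum array (modulo-3 merging)
```

Under these assumptions the model yields 39 transistors for the carry array and 30 for the sum array, which coincides with the counts quoted above, and it reproduces the 3, 5, and 7 merged nodes after w, x, and y.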
Complexity of multipliers
We cannot explain our analysis in detail at this point. However, the results in fig. 5 show significant advantages of ternary multiplication over binary multiplication. For the detailed analysis, we have made the same assumptions as in [6]. According to this analysis, the following complexities result for the different adder types: binary adder with 2 inputs: complexity 54 and 30 transistors; binary adder with 3 inputs: complexity 60 and 36 transistors; ternary adder with 2 inputs: complexity 90 and 58 transistors; ternary adder with 3 inputs: complexity 127 and 91 transistors; ternary adder with 4 inputs: complexity 159 and 119 transistors. Fig. 4 shows the relations, standardized to the number of digits n as an exponent to base 3. For n = 32, e.g., a resolution of more than $1.853 \cdot 10^{15}$ follows for the ternary case, whereas in a binary representation according to (4) nearly 52 bits have to be provided. From the above formulas a complexity of 79200 results for a 32-trit multiplier. This is equal to 60032 transistors. Fig. 4 reflects a comparison of the complexity of binary and ternary multipliers.
With regard to the timing behaviour, we can expect the ternary multiplier to be superior to binary multipliers, because considerably fewer steps have to be processed up to the final result. With respect to the switching delays, TDDNL gates can be assumed to have a delay similar to that of binary dynamic gates. This is obvious if we consider the logic block in fig. 2, which is built from n-channel MOSFETs only, with at most four transistors connected in series. Besides faster switching compared to p-channel MOSFETs, this also means a lower input capacitance. For the TDDNL gates only the propagation delay of a TDDNL driver (fig. 1) has to be added. However, this delay is caused by the switching of only one p-channel transistor.
For a more detailed analysis, we make the following assumptions: All partial products are formed in a standardized delay time τ = 1. The propagation delay of a binary dynamic gate is also τ = 1 from every input to both the sum and the carry output. To demonstrate the advantages of the ternary technique, we assume the TDDNL gates to have a propagation delay of τ = 2 from any input to any output.
Timing behaviour
In order to optimize the circuit with regard to its timing behaviour, we introduce a new ordering technique. This technique optimizes the timing behaviour, but complicates the layout of the complete circuit. However, we do not pursue this issue further, since efficient algorithms exist for solving such layout problems. The new ordering technique works as follows:
All partial products are ordered according to their magnitudes (i.e., their standardized arrival times) for each weight and are written into a list. An adder processes the 2, 3 or 4 (depending on the adder used) fastest products and forms two output signals, called intermediate sums is(i). The magnitude of these outputs is equal to the largest magnitude of the inputs plus the standardized propagation delay as specified. The used inputs are removed from the list and the two new magnitudes are written into the list, which is then ordered again. This procedure is carried out from the LSB to the MSB, until only one entry remains for every weight. Fig. 5 shows the behaviour for adders with three inputs (a) which form an input-to-output reduction from 3 to 1/1 (sum/carry). Fig. 6 shows the behaviour with four-input adders, which form an input-to-output reduction from 4 to 1/1. The numbers on the left-hand side of the ellipses show the calculated propagation delay that has to be written to the same and to the next higher weight. This number is calculated by adding the time delay τ to the largest number in the ellipse. We observe that the propagation delay of the ternary multiplier result increases by two per weight, whereas that of the binary result increases by one per weight. The crucial difference, however, is that in the ternary case we can use a carry look-ahead method from much lower weights onwards than in the binary case. This is based on the fact that only one intermediate sum has this high propagation delay. All other intermediate sums is(i) are stable much earlier.
Thus, a carry look-ahead technique can be used in the ternary case from the lowest digits up to the MSB of the result. Fig. 8 shows the behaviour of a ternary 16-trit multiplier in contrast to a binary 26-bit multiplier in fig. 7, which means nearly the same resolution in both cases. The arrows mark the weight from which onwards the use of a CLA circuit is sensible. Leaving aside the carry signal that ripples through the complete wordlength, the largest standardized intermediate sum is 11 in the ternary case in contrast to 26 in the binary case. This is in spite of the fact that the estimated propagation delay of the ternary adder is twice as large as that of the binary adder. Furthermore, it has to be considered that the wordlength of the ternary result is reduced by a factor of 1.58 compared to the binary result. The following section describes the development of the ternary CLA circuits that can be used with the ternary multiplier circuits.
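The ordering procedure of this section can be sketched as a small timing simulation. The code below is a simplified model under several assumptions that are not taken from the paper: an adder with fewer than k pending entries simply uses the entries that are left, the sum output returns to the same weight and the carry output goes to the next higher weight, and carries beyond the highest partial-product weight are kept as additional entries because a pure timing model cannot know whether they are logically zero. The resulting numbers therefore need not coincide with those of fig. 7 and fig. 8.

```python
import heapq


def schedule(n, k, tau, pp_delay=1):
    """Return the final standardized arrival time for every weight.

    n: wordlength, k: number of adder inputs, tau: adder propagation delay,
    pp_delay: time at which all partial products are available.
    """
    # One ordered list (min-heap) of arrival times per weight; weight i
    # initially holds n - |n - 1 - i| partial products.
    lists = [[pp_delay] * (n - abs(n - 1 - i)) for i in range(2 * n - 1)]
    w = 0
    while w < len(lists):                      # process from LSB to MSB
        heap = lists[w]
        while len(heap) > 1:
            m = min(k, len(heap))
            inputs = [heapq.heappop(heap) for _ in range(m)]  # fastest entries
            ready = max(inputs) + tau          # outputs follow the slowest input
            heapq.heappush(heap, ready)        # intermediate sum, same weight
            if w + 1 == len(lists):
                lists.append([])               # carry opens a new weight
            heapq.heappush(lists[w + 1], ready)  # carry, next higher weight
        w += 1
    return [heap[0] for heap in lists]


binary = schedule(n=26, k=3, tau=1)    # binary 26-bit case with 3-input adders
ternary = schedule(n=16, k=4, tau=2)   # ternary 16-trit case with 4-input adders
print(max(binary), max(ternary))
```

A heap is used here only to keep each list ordered after every insertion; any sorted container would serve the same purpose.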
TDDNL carry look-ahead technique
TDDNL gates have the property that the precharge phase concludes with an invalid combination at the output. Whether the output has to be -1, 0 or 1, the TDDNL gate has to change its state. Thus, the carry signal ripples through all stages connected in series, independent of the states of the variables. A following gate can switch only when the preceding gate has a valid output combination. As a result, the total propagation delay depends on the number of gates connected in series. The aim of the CLA logic is therefore to speed up this procedure by combining the function of two or more adders into one special structure.
The basic carry look-ahead principle was developed by Weinberger and Smith in 1956 [14] for binary logic. However, we refrain from using generate and propagate signals as in this early work, because this would require two additional TDDNL gates. Instead, we use the carry and sum outputs of the circuits directly. For the development of a ternary carry look-ahead circuit we assume that two digits per weight have to be processed in a final adder. Fig. 9 shows the CLA structure with three inputs $c_{in,i}$, $c_i$, and $s_i$ of weight 1 and two inputs $c_{i+1}$ and $s_{i+1}$ of weight 3. The logical switching circuit must connect the output $c_{out,n}$ with the source if the sum is equal to or less than -5 (= -9 + 3 + 1). The output $c_{out,p}$ has to be connected if the sum is equal to or greater than +5 (= 9 - 3 - 1). In the last case, the output $c_{out,0}$ has to be connected with the source if the sum of the five inputs lies in the range -4 to +4. The development of the circuit follows the heuristic method described earlier. To speed up the switching of this circuit, the time-critical variable $c_{in,i}$ is connected as closely as possible to the output. Fig. 10 shows the complete logic of a 3-trit CLA circuit. As an argument against such a structure one can remark that the switching delay of seven transistors in series is very high. However, six of the seven inputs are stable much earlier than the critical input $c_{in,i}$.
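The threshold rule of the two-digit look-ahead stage can be checked against a plain ripple computation. The following sketch is a behavioural model only; the argument names (`c_in`, `c_i`, `s_i`, `c_next`, `s_next`) are labels assumed for the three weight-1 inputs and the two weight-3 inputs, and the function names are hypothetical.

```python
from itertools import product

TRITS = (-1, 0, 1)


def add_digits(*digits):
    """Ripple reference: add up to four ternary digits, return (sum, carry)."""
    total = sum(digits)
    carry = -1 if total <= -2 else (1 if total >= 2 else 0)
    return total - 3 * carry, carry


def cla_carry(c_in, c_i, s_i, c_next, s_next):
    """Look-ahead carry of weight 9 from the weighted sum of the five inputs."""
    total = c_in + c_i + s_i + 3 * (c_next + s_next)   # range -9 .. +9
    if total <= -5:      # -5 = -9 + 3 + 1
        return -1
    if total >= 5:       # +5 = 9 - 3 - 1
        return 1
    return 0


def ripple_carry(c_in, c_i, s_i, c_next, s_next):
    """Same carry, obtained by rippling through two adder stages in series."""
    _, k = add_digits(c_in, c_i, s_i)
    _, c_out = add_digits(k, c_next, s_next)
    return c_out


# The look-ahead rule and the two-stage ripple agree on all 3**5 = 243 cases.
assert all(cla_carry(*v) == ripple_carry(*v) for v in product(TRITS, repeat=5))
```

The agreement follows because the two result digits of weight 1 and 3 together cover exactly the range -4 to +4, so any total outside this range must be absorbed by the carry of weight 9.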
Conclusion
We have derived formulas for the number of adders in an n × n binary and ternary multiplication structure. The complexity of a binary array multiplier is nearly twice as large as that of a ternary multiplier with a corresponding resolution. In addition, we have shown the advantages in timing behaviour obtained with an optimal ordering of partial products and intermediate sums. The presented ternary CLA circuits can be used to speed up the processing of the final multiplication result.
