The logic cost and speed of parallel multipliers implemented in both binary and ternary logic are studied. Binary operand lengths of 8 through 32 bits and the corresponding ternary digit range of 6 through 21 are considered. For the particular design technique used, the binary versions are slightly faster, where the speed criterion is the longest logic path from operands to product. Ternary designs show a smaller total cost of gates and a major reduction in the number of required inputs, indicating greatly simplified wiring interconnection complexity.
Introduction
Recent technical literature shows an increasing incidence of papers describing many-valued switching systems. Workable algebras and minimisation techniques for such systems have been proposed (Allen and Givone, 1968; Vranesic, Lee, and Smith, 1970; Pugh, 1967). However, the fundamental question of applicability of nonbinary schemes within the framework of present binary technology has seldom been critically approached. It is apparent that when three-valued storage elements become 'naturally' available, it will be sensible to use them in conjunction with other ternary logic primitives. In fact, a well-known argument is often encountered that since the most efficient radix for implementation of switching systems is the natural base (e = 2.71828...), it seems likely that the 'best' integral radix is 3 rather than 2. Unfortunately there have been few attempts to show the validity of this hypothesis in the realm of currently available devices.

In this paper we take a close look at the possibility of using ternary arithmetic circuitry to facilitate implementation of large units required for parallel multiplication. High speed parallel binary multiplication has been studied for a number of years (Ramamoorthy and Economides, 1969; Wallace, 1964). Implementations incorporating some of these design ideas in prototype models (Habibi and Wintz, 1970; Pezaris, 1971) and commercial production versions (Anderson, Earle, Goldschmidt, and Powers, 1967; Control Data Corporation, 1966) also exist. Our aim is to design a ternary parallel multiplier and compare it with a binary design for equivalent sizes of operands. The comparison is made on the basis of gate and input costs, and of speed in terms of delay along the longest logic path from operands to product, where each gate is assigned a unit delay τ. In the binary case AND-OR-NOT logic is assumed and in ternary the switching primitives of Vranesic et al. (1970), defined in Table 1, comprise the basic gates.
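The radix argument and the operand-length correspondence quoted above (8 through 32 bits against 6 through 21 ternary digits) can be checked numerically. The sketch below is only an illustration: it assumes the usual cost model behind the argument, namely that hardware cost is proportional to the radix times the number of digits, and the function names are hypothetical.

```python
import math


def cost_per_unit_information(radix: int) -> float:
    """Relative cost of a radix, ASSUMING cost ~ radix * number_of_digits.

    Representing N values needs ln(N)/ln(radix) digits, so the cost is
    proportional to radix / ln(radix); the common factor ln(N) is dropped.
    """
    return radix / math.log(radix)


def ternary_digits_for_bits(n_bits: int) -> int:
    """Smallest number of ternary digits whose range covers n_bits bits."""
    digits = 0
    while 3 ** digits < 2 ** n_bits:
        digits += 1
    return digits


if __name__ == "__main__":
    # The continuous minimum of r / ln(r) lies at r = e; among integers
    # the smallest value is obtained for radix 3.
    for radix in (2, 3, 4, 5):
        print(radix, round(cost_per_unit_information(radix), 3))
    for n_bits in (8, 16, 24, 32):
        print(n_bits, "bits ->", ternary_digits_for_bits(n_bits), "ternary digits")
```

Under this model a 24-bit binary operand corresponds to a 16-digit ternary operand, which is the pairing used for the two worked examples in the paper.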
A gate fan-in limit of 8 is used throughout while fan-out problems are ignored (binary fan-out problems are always worse than in the corresponding ternary system). The basic concepts used to speed up binary multiplication are adapted and extended in the ternary design. Included are:
1. Digit grouping of the multiplier in pairs and appropriate summand selection.
2. Carry-save reduction of the summands in a 'carry-save adder (CSA) tree'.
3. Fast addition of the final two summands to obtain the required product, using a design incorporating first-level carry lookahead logic, essentially the same as in Flores (1963).

It is assumed that the 1's-complement is available at the output of the M register and that the 1 to be added in the low-order bit position is easily introduced into the CSA summand reduction tree. Fig. 1 shows a schematic of the multiplier for n = 24. There are 13 input summands to the tree: one for each bit pair of D, plus one last summand which is either M or all zeroes, depending on whether or not d_24 = 1. That is, the thirteenth summand results from recoding an implied 00 digit pair to the left of d_24. Multiplier recoding and summand selection cause a delay of 4τ. The tree is constructed from binary full adders (the CSA technique) and each level contributes 3τ of delay, since input complements are not assumed available. Since there are five levels, the total tree delay is 15τ. The final 48-bit adder is of the first-level lookahead type with a group size of 8, and it contributes 8τ of delay. The total delay through the 24-bit multiplier is thus 27τ. The cost in gates and inputs is given in Table 3. Similar computations were made for other operand lengths and the results are presented in Section 4.
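The delay figures for the 24-bit binary case can be reproduced with a short calculation. This is a minimal sketch, assuming each tree level groups the summands three at a time into full adders and passes any leftover summands on unchanged; the function names are hypothetical, and the 8τ adder delay is the value quoted for the 48-bit adder only.

```python
def csa_levels(summands: int) -> int:
    """Number of 3-to-2 carry-save levels needed to reach two summands."""
    levels = 0
    while summands > 2:
        groups, leftover = divmod(summands, 3)
        summands = 2 * groups + leftover  # each group of 3 becomes sum + carry
        levels += 1
    return levels


def binary_multiplier_delay(n_bits: int = 24,
                            recode_delay: int = 4,
                            level_delay: int = 3,
                            adder_delay: int = 8) -> int:
    """Total delay in units of tau for an n_bits multiplier (n_bits even).

    adder_delay defaults to the figure quoted for n = 24; other operand
    lengths would need their own adder delay.
    """
    summands = n_bits // 2 + 1  # one per bit pair, plus the extra summand
    return recode_delay + level_delay * csa_levels(summands) + adder_delay


if __name__ == "__main__":
    print(csa_levels(13))               # levels for 13 summands
    print(binary_multiplier_delay(24))  # total delay in units of tau
```

For 13 summands the reduction runs 13, 9, 6, 4, 3, 2, which gives the five levels stated in the text.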
Ternary multiplier design
Ternary multiplication is arranged following the basic structure of Section 2. While we are still using the same operand names M and D, all switching variables in this section are ternary, having truth values 0, 1 and 2. We also note that digit by digit multiplication is never needed, since non-trivial summands can be formed initially using the addition process only. The multiplier is recoded in digit pairs, the same grouping as in binary. Whereas in binary all versions of M were easy to obtain, the same is not true in ternary. Besides shifting (× 3) and/or complementing M, it is necessary to generate ±2 × M and ±4 × M. An extension of the Wallace (1964) technique is used for recoding the multiplier, as shown in Table 4. The two nontrivial summands, 2 × M and 4 × M, are obtained at the start of the multiplication process by using both halves of the adder. A new method of summand reduction in a tree structure is proposed for ternary. Four summands are reduced to two at each level. The sum of four ternary digits can have a maximum value of 8, which can just be represented by two ternary digits, each of value 2, one in the same digit position as the four summand digits and the other in the next higher digit position. It is convenient to call these the sum and carry-out digits, respectively. This 4 to 2 summand reduction process makes the reduction tree smaller than in the binary case, and in addition the ternary equivalent of any binary multiplier has fewer summands from the start.
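The 4 to 2 reduction step can be illustrated arithmetically. The sketch below models only the digit arithmetic, not the gate-level switching functions of the paper; the helper names and the little-endian digit lists are assumptions made for the example.

```python
def reduce_digit(a: int, b: int, c: int, d: int) -> tuple[int, int]:
    """Reduce four ternary digits to a (sum, carry_out) pair of ternary digits."""
    total = a + b + c + d          # at most 2 + 2 + 2 + 2 = 8
    return total % 3, total // 3   # 8 -> sum 2, carry-out 2


def to_digits(value: int, width: int) -> list[int]:
    """Little-endian ternary digits of value, padded to width digits."""
    digits = []
    for _ in range(width):
        value, digit = divmod(value, 3)
        digits.append(digit)
    return digits


def from_digits(digits: list[int]) -> int:
    """Integer value of a little-endian ternary digit list."""
    return sum(digit * 3 ** pos for pos, digit in enumerate(digits))


def reduce_4_to_2(rows: list[list[int]]) -> tuple[list[int], list[int]]:
    """Reduce four equal-width summands to a sum word and a carry word."""
    width = len(rows[0])
    sums = [0] * (width + 1)
    carries = [0] * (width + 1)
    for pos in range(width):
        s, c = reduce_digit(*(row[pos] for row in rows))
        sums[pos] = s
        carries[pos + 1] = c       # carry-out belongs to the next position
    return sums, carries


if __name__ == "__main__":
    values = [80, 26, 53, 7]
    sums, carries = reduce_4_to_2([to_digits(v, 4) for v in values])
    # No carry propagates between positions, and the total is preserved.
    print(from_digits(sums) + from_digits(carries) == sum(values))
```

Because every digit position is handled independently, one reduction level has a fixed delay regardless of operand length, which is the property the tree relies on.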
This brief summary of the design technique for ternary multipliers will now be expanded in terms of the switching functions needed in the various subunits, and a 16-digit multiplier will be used as a concrete example. Its schematic is shown in Fig. 2. The incoming carry value C,, -,, to the digit pair position d_i d_{i-1}, i = 2, 4, ..., 16, which is simply the adjacent bit, d_{i-2}, in binary, is somewhat more complicated in ternary. In general, it is a function of all preceding multiplier digits, closely analogous to the way in which carries are determined in parallel binary adders. This is the motivation for the term carry as used here. There is an incoming carry if the previous digit pair, considered as a 2-digit ternary coded integer, has a value equal to or greater than 5, or has a value of 4 with a carry in to its position. Some notation is helpful at this point. Let have a logic value of 2. Note that C, = 2, because (Z,,,), = 2, causing G, to have a value of 2. From the pattern of the above equations for the Q,,, variables, which directly implement the selection procedure of Table 4 for the m = 6 digit pair, it is clear that Q,,, is the only one that has a value of 2. It is set to 2 by the term (ZO,,),Cs. Since Q,,, = 2, tree inputs T_{6,i} attain the values of the summand variables M_{3,i}, which is exactly the action called for by Table 4. It should be noted that when a negative summand is to be used, 3's-complement arithmetic is used, that is, the 2's-complement of the summand is introduced into the tree with a 1 to be added in the low-order position. From the above equations, it is easily shown that the tree inputs are available after 11τ. This assumes that all versions of M are available at the outputs of their registers after 9τ, as explained below. register. This is a convenient point to complete the discussion of the adder. A first-level lookahead type adder has been assumed, similar to the standard procedure in Flores (1963).
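The carry rule for digit pairs stated above can be sketched in arithmetic form. Table 4 is not reproduced here, so the selected multiples in this sketch are an assumption: each pair value plus its incoming carry is mapped to a multiple of M in the range -4 to +4, which is consistent with the stated carry condition and with the multiples ±2 × M and ±4 × M. The sketch also assumes an unsigned multiplier and, by analogy with the binary design, one extra summand from an implied 00 pair. The function name is hypothetical.

```python
def recode_digit_pairs(multiplier: int, n_digits: int) -> list[int]:
    """Recode an unsigned ternary multiplier into signed multiples of M.

    Returns one multiple in -4..4 per digit pair (low-order pair first),
    followed by a final 0 or 1 for an ASSUMED implied 00 pair on the left.
    """
    if n_digits % 2 != 0 or not 0 <= multiplier < 3 ** n_digits:
        raise ValueError("expected an even digit count and a multiplier in range")
    multiples = []
    carry = 0
    for _ in range(n_digits // 2):
        multiplier, pair = divmod(multiplier, 9)  # value of one digit pair, 0..8
        value = pair + carry
        # Carry out if the pair is >= 5, or is 4 with a carry in.
        carry = 1 if value >= 5 else 0
        multiples.append(value - 9 * carry)
    multiples.append(carry)
    return multiples


if __name__ == "__main__":
    multiples = recode_digit_pairs(3 ** 16 - 1, 16)
    print(multiples)
    # The recoded multiples, weighted by powers of 9, rebuild the multiplier.
    print(sum(m * 9 ** k for k, m in enumerate(multiples)) == 3 ** 16 - 1)
```

Each pair position then needs only a selection among a few prepared versions of M, which is why digit by digit multiplication is never required.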
The general fan-in limit of 8 implies a group size of 8 for lookahead purposes, and this results in a worst case delay of 8τ in producing all sum digits. The delay is derived as follows. The propagate and generate functions for each digit position i are formed in 3τ, followed by formation of all carries in 4τ using standard binary logic. A further 1τ is needed to compute the sum function. The delay through a 4 to 2 reduction level is clearly 5τ, and thus the delay through the complete tree of Fig. 2(a) is 15τ. The above discussion shows a 34τ delay through the complete ternary multiplier.

binary, while input count is about 38% lower.

It is more difficult to decide upon measures for cost comparisons. One of the more conventional techniques is to use the total number of gates and inputs as the two measures, and that has been done here. The total number of inputs to gates is a straightforward measure of the combination of circuit package pin count and intra-package circuit connection complexity. The difficulty is in arriving at a way of counting gate costs. The basic assumption made here is that each binary gate has a cost of one unit. Ternary AND, OR and INVERTER gates are also assumed to have unit cost, since they have circuit implementations very similar to the binary gates. The two versions of ternary CYCLING gates (mod 3 adder and mod 3 subtractor) are more expensive to implement than any of the other gates, hence a weight of 3 units has been assigned to each cycling gate. The effect of this choice is not too critical since
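The ternary delay budget and the gate weighting described in this section can be tallied in a few lines. This is a sketch under stated assumptions: the three tree levels are inferred from the quoted 15τ tree delay at 5τ per level, the function names are hypothetical, and the gate counts in the usage lines are made-up placeholders rather than figures from the paper.

```python
# Lookahead adder, in units of tau: propagate/generate, carries, sum.
ADDER_DELAY = 3 + 4 + 1


def ternary_multiplier_delay(tree_input_delay: int = 11,
                             tree_levels: int = 3,
                             level_delay: int = 5,
                             adder_delay: int = ADDER_DELAY) -> int:
    """Total delay in units of tau for the 16-digit ternary multiplier."""
    return tree_input_delay + tree_levels * level_delay + adder_delay


def weighted_gate_cost(unit_gates: int, cycling_gates: int,
                       cycling_weight: int = 3) -> int:
    """Gate cost with AND/OR/INVERTER at 1 unit and cycling gates weighted."""
    return unit_gates + cycling_weight * cycling_gates


if __name__ == "__main__":
    print(ADDER_DELAY)                 # adder delay in units of tau
    print(ternary_multiplier_delay())  # total ternary delay in units of tau
    # Placeholder counts only, to show how the weighting is applied.
    print(weighted_gate_cost(unit_gates=10, cycling_gates=2))
```

Varying `cycling_weight` in such a tally is one way to see how sensitive the gate-cost comparison is to the weight chosen for cycling gates.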
Conclusions
An attempt was made to design ternary parallel multipliers with propagation delays along the longest logic path comparable to those of their binary equivalents. Actual designs show the delays slightly favouring the binary case. Circuit cost in terms of gate count is lower in ternary designs, although not significantly enough to be of major importance. The key result is exhibited by Fig. 5, showing considerable reduction in the number of inputs in ternary cases. This may be sufficient to permit physical implementation of ternary multipliers at a size level where construction of the binary equivalents is currently not feasible because of wiring interconnection complexity.
