## XI. COGNITIVE INFORMATION PROCESSING**

Academic and Research Staff

| Prof. M. Eden | Prof. D. E. Troxel | Dr. D. M. Ozonoff |
| :--- | :--- | :--- |
| Prof. J. Allen | Prof. I. T. Young | Dr. O. J. Tretiak |
| Prof. B. A. Blesser | Dr. R. R. Archer | F. X. Carroll |
| Prof. T. S. Huang | Dr. G. H. Granlund | Deborah A. Finkel |
| Prof. F. F. Lee | Dr. E. G. Guttmann | E. R. Jensen |
| Prof. S. J. Mason | Dr. L. P. A. Henckels | J. W. Modestino |
| Prof. W. F. Schreiber | Dr. K. R. Ingham | J. S. Ventura |
|  | Dr. J. I. Makhoul |  |

## Graduate Students

| B. S. Barbay | M. Hubelbank | R. J. Shillman |
| :--- | :--- | :--- |
| J. Bonn | F. H. Ives | I.. G. Sitler |
| W. T. Borroz | J. W. Klovstad | J. R. Sloan |
| J. E. Bowie | H. S. Magnuski | A. A. Smith |
| B. E. Boyle | L. A. Marks | R. D. Solomon |
| P. Coueignoux | P. L. Miller | M. Sporer |
| J. R. Ellis, Jr. | B. Minkow | J. H. Stahler |
| I. S. Englander | O. R. Mitchell, Jr. | W. D. Stratton |
| A. M. Fakhr | J. A. Myers | H-m. D. Toong |
| A. E. Filip | D. O'Shaughnessy | K. P. Wacks |
| S. G. Finn | R. A. Piankian | Y. D. Willems |
| W. B. Grossman | G. Poonen | H. M. Wolfson |
| D. W. Hartman | R. S. Putnam | K. H. Yung |
|  | R. San Antonio |  |

## A. NEW HIGH-SPEED MULTIPLIER DESIGN

With the advent of MSI and LSI integrated circuit technology, there is no doubt that digital multipliers of very high speed can be achieved, once it is agreed what should be incorporated in these chips. ${ }^{1}$ In the meantime, we can achieve a very fast design by simultaneously exploiting the mathematical structure of binary two's-complement multiplication and existing MSI circuits which can be adapted in a natural way to the structure of this task. Accordingly, we shall show that the expression for binary multiplication can be rewritten to suggest use of the 74181 Arithmetic Logic Unit (ALU) in a straightforward way that achieves high speed, simple layout, and very little logic external to the ALU array.

In order to display the desired structure of multiplication, we shall consider the multiplication of two 4 -bit two's-complement numbers. Let each such number be represented as

[^0](XI. COGNITIVE INFORMATION PROCESSING)
$$
\mathrm{Z} \equiv \mathrm{z}_{3} \mathrm{z}_{2} \mathrm{z}_{1} \mathrm{z}_{0}=-2^{3} \mathrm{z}_{3}+2^{2} \mathrm{z}_{2}+2^{1} \mathrm{z}_{1}+2^{\mathrm{O}} \mathrm{z}_{\mathrm{o}}
$$
where each $z_{i}$ is either 1 or 0 , so that the product of two such numbers is
\[

$$
\begin{aligned}
X Y= & 2^{6} x_{3} y_{3}-2^{5}\left(x_{3} y_{2}+x_{2} y_{3}\right)+2^{4}\left(-x_{3} y_{1}+x_{2} y_{2}-x_{1} y_{3}\right) \\
& +2^{3}\left(-x_{3} y_{o}+x_{2} y_{1}+x_{1} y_{2}-x_{o} y_{3}\right) \\
& +2^{2}\left(x_{2} y_{o}+x_{1} y_{1}+x_{0} y_{2}\right)+2^{1}\left(x_{1} y_{o}+x_{o} y_{1}\right)+2^{o_{0}} x_{o} y_{o}
\end{aligned}
$$
\]

This sum is commonly arranged in an array in which each column contains factors of like powers of 2 , as in Fig. XI-1. The factors can be further rearranged, as

$$
\begin{gathered}
-\mathrm{x}_{3} \mathrm{y}_{0}+\mathrm{x}_{2} \mathrm{y}_{0}+\mathrm{x}_{1} \mathrm{y}_{0}+\mathrm{x}_{0} \mathrm{y}_{0} \\
-\mathrm{x}_{3} \mathrm{y}_{1}+\mathrm{x}_{2} \mathrm{y}_{1}+\mathrm{x}_{1} \mathrm{y}_{1}+\mathrm{x}_{0} \mathrm{y}_{1} \\
-\mathrm{x}_{3} \mathrm{y}_{2}+\mathrm{x}_{2} \mathrm{y}_{2}+\mathrm{x}_{1} \mathrm{y}_{2}+\mathrm{x}_{0} \mathrm{y}_{2} \\
+\mathrm{x}_{3} \mathrm{y}_{3}-\mathrm{x}_{2} \mathrm{y}_{3}-\mathrm{x}_{1} \mathrm{y}_{3}-\mathrm{x}_{0} \mathrm{y}_{3}
\end{gathered}
$$

Fig. XI-1. Array representation of two's -complement multiplication.

| $+\left\{\begin{array}{l}  \\ \\ x_{3} y_{3} \end{array}\right.$ |  | $x_{2} y_{2}$ | $\begin{aligned} & x_{2} y_{1} \\ & x_{1} y_{2} \end{aligned}$ | $\begin{aligned} & \mathrm{x}_{2} \mathrm{y}_{0} \\ & \mathrm{x}_{1} \mathrm{y}_{1} \\ & \mathrm{x}_{0} \mathrm{y}_{2} \end{aligned}$ | $\begin{aligned} & \mathrm{x}_{1} \mathrm{y}_{0} \\ & \mathrm{x}_{0} \mathrm{y}_{1} \end{aligned}$ | $x_{0} y_{0}$ |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| $-\{$ | $\begin{aligned} & \mathrm{x}_{3} \mathrm{y}_{2} \\ & \mathrm{x}_{2} \mathrm{y}_{3} \end{aligned}$ | $\begin{aligned} & \mathrm{x}_{3} \mathrm{y}_{1} \\ & \mathrm{x}_{1} \mathrm{y}_{3} \end{aligned}$ | $\begin{aligned} & x_{3} y_{3} \\ & x_{0} y_{3} \end{aligned}$ |  |  |  |

Fig. XI-2. Multiplication array, grouped in positive and negative terms.
shown in Fig. XI-2, in order to group positive and negative terms. At this stage, each row has a constant factor for each term such as $y_{o}$ for the top row, and these can be factored out as bits that control the conditional inclusion of a given row in the final sum. Thus, in Fig. XI-3, the top row ( $\mathrm{x}_{2} \mathrm{x}_{1} \mathrm{x}_{\mathrm{o}}$ ) will be added into the sum just in case $y_{0}=1$, and similarly for the other rows. Figure XI-3 also shows 6 conditional terms to be summed, but one of these, $x_{3} y_{3}$, affects only the most significant bit position. If this term is included in any other row (say, row 6), the only change


Fig. XI-3. Multiplication array, grouped by rows and their respective control bits.


Fig. XI-4. Illustrating the parallel nature of the multiplication task.

## (XI. COGNITIVE INFORMATION PROCESSING)

in the final result might be a carry out of the most significant bit. When sign extension is not desired, it is thus possible to incorporate the $\mathrm{x}_{3} \mathrm{y}_{3}$ term in row 6 , and we are left with just 5 rows to sum. It is this representation of the product as a sum of conditional terms that can be exploited by the ALU design.

Having rewritten the product as a sum of terms, each conditioned on a control bit, the next step is to minimize the sum-and-carry delays by exploiting the inherent parallelism of the array. Referring to the five rows as A through E, Fig. XI-4 shows how a tree structure permits simultaneous sums to be computed, rather than performing each indicated sum in serial, left-to-right order. In Fig. XI-4, each box denotes an ALU adder, and we assume that each add is completed in $\Delta$ seconds. During the first $\Delta$ seconds two adds are completed, followed by one add in each of the two succeeding intervals. There are two advantages in this scheme. First, the total delay would be $4 \Delta$ seconds if the adds were done serially, but when the parallelism is utilized, only $3 \Delta$ seconds of delay result. More generally, for larger sized numbers N bits long, a similar binary tree would lead to $\left(\log _{2} \mathrm{~N}\right) \Delta$ seconds delay, whereas a serial procedure would require $(\mathrm{N}-1) \Delta$ seconds delay. For $\mathrm{N}=$ 16, the saving is $\Delta(15-4)=11 \Delta$, a very substantial figure. When $N$ is not a power of 2 , some branches of the full binary tree are pruned, but the saving in time because of parallelism is still obtained. The second advantage is that the binary tree arrangement can be implemented in a straightforward and natural way by using the 74181 ALU, 24-pin MSI package.

The 74181 ALU, shown in Fig. XI-5, operates on two 4 -bit inputs in a manner prescribed by the four control bits, $S_{0}$ through $S_{3}$, and the Mode Control bit $M$, to produce a single 4 -bit output. As shown in Fig. XI-6, all of the needed control functions can be realized by appropriate use of $S_{O}$ through $S_{3}, M$, and $\bar{C}_{0}$, the last being the input carry to the least significant bit. Note that only the double conditional sum, ( $A$ if $z$ ) $+(B$ if $y)$, requires extra circuitry to translate the condition bits ( $z$ and $y$ ) into ALU controls, but that this circuitry is very simple, containing only an XOR gate and an inverter.

Figure XI-7 shows the complete design for a $4 \times 4$ multiplier, in which the control circuitry is shown in detail. Depending on the size of the multiplier desired, extra time savings may be realized by appropriate partitioning of the array and insertion of carries, but the basic details remain the same. The authors have designed $16 \times 16$ and $16 \times 24$ arrays, which illustrate further refinements. These designs are available to the interested reader.

A further advantage of the $A L U$ is its wide availability. Originally, it was introduced in TTL, but Schottky TTL and MECL 10,000 versions are now available. Worst-case multiplication times will depend on which one of these packages is used, but a $16 \times 16$ design should yield a completion time of 95-100 ns


Fig. XI-5. Logic diagram for the 74181 Arithmetic Logic Unit.

| BASIC | CONTROL SIGNALS |  |  |  |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| FUNCTION | $S_{0}$ | $S_{1}$ | $\mathrm{S}_{2}$ | $\mathrm{S}_{3}$ | M | $\bar{c}_{0}$ |
| 0 | 1 | 1 | 0 | 0 | 1 | $\times$ |
| A | 0 | 0 | 0 | 0 | 0 | 1 |
| B | 0 | 1 | 0 | 1 | 1 | $\times$ |
| $A+B$ | 1 | 0 | 0 | 1 | 0 | 1 |
| $A-B$ | 0 | 1 | 1 | 0 | 0 | 0 |
| CONTROL |  |  |  |  |  |  |
| FUNCTION |  |  |  |  |  |  |
| $A+B$ | 1 | 0 | 0 | 1 | 0 | 1 |
| A-B | 0 | 1 | 1 | 0 | 0 | 0 |
| $A+(B$ if $y)$ | y | 0 | 0 | $y$ | 0 | 1 |
| $(A$ if $z)+(B$ if $y)$ | $z \oplus y$ | $\bar{z}$ | 0 | y | $\bar{z}$ | 1 |

Fig. XI-6. Control functions for the 74181 Arithmetic Logic Unit.


Fig. XI-7. Four by four multiplier block diagram, with external control circuitry.
in Schottky TTL. This is considerably faster than the performance obtainable from specialized multiplier packages, such as the Fairchild 9344 or the Advanced Micro Devices AM 2505. Since the ALU package has many uses, it is relatively inexpensive, particularly considering the resulting multiplier speed. The package count, and hence power, is high (approximately $N(N+1) / 4$ for an $N \times N$ multiply; for $N=16,69$ ALU's were required) but layout is simple, and no other design incorporating standard commercial MSI packages has been able to yield the speed of this ALU array.

Certainly faster or cheaper multipliers have been built. The ALU in a binary tree, however, appears to be an optimal choice when very high speed is desired from standard commercial packages.
J. Allen, E. R. Jensen

References

1. S. D. Pezaris, "A 40 -ns 17-bit by 17 -bit Array Multiplier," IEEE Trans. on Computers, Vol. C-20, No. 4, pp. 442-447, April 1971.

[^0]:    *This work was supported by the National Institutes of Health (Grant 5 POl GM14940-05), and by the Joint Services Electronics Programs (U.S. Army, U.S. Navy, and U.S. Air Force) under Contract DAAB07-71-C-0300).

