# Binary addition and multiplication in cellular space 

By E. Katona

Cellular automata are highly parallel bitprocessors, so they are suitable for the bitparallel execution of various computational tasks. In this paper powerful bitparallel algorithms are given for fixed point binary addition and multiplication, designed for the cellprocessor architecture developed by T. Legendi [1]. More than 100 cellular algorithms solving different computational tasks have been constructed for this architecture [6]. A large cellular space can embed many cellular adders, multipliers and other processing elements, so that more complex tasks, such as matrix multiplication [4] and certain data processing tasks [5], can be computed in parallel.

## 1. Introduction

A cellular automaton is a highly parallel processor, but programming such a processor economically is not an easy task. If macro-cells are applied (each cell working as a microprocessor), then programming the cellular structure is somewhat easier [7], but the architecture is less flexible (fixed operations, fixed word length, etc.), and in general bitparallel execution of the operations is impossible.

If micro-cells (having at most 16 states) with variable transition functions are applied, then the cellprocessor is highly flexible and fully bitparallel processing is possible. In [3], [4], [5], [6] and in this paper it is shown that a cellprocessor consisting of micro-cells is economically programmable, and that in most cases the speed of the cellular algorithms is independent of the word length.

The cellprocessor architecture proposed in [1] is based on the micro-cell concept. From the point of view of this paper, it has the following characteristic properties:

(i) The cellular space is a two-dimensional rectangular cell matrix bounded by dummy cells. The dummy cells have no transition function, but their states can be set from the outside world. The von Neumann neighbourhood is assumed in the cellular net.

(ii) The cells do not have a fixed transition function. Instead, they receive commands (microinstructions) from a central control unit (CCPU), and an arbitrary local transition function can be realized by executing a suitable sequence of microinstructions. Hence the cellprocessor can work with any local transition function and, moreover, with time-varying transition functions.

(iii) The cellular space is inhomogeneous: individual cells may work with different transition functions at the same time. To ensure this property, each cell has an internal state, and cells with different internal states may work with different transition functions. So if there are $n$ different internal states, at most $n$ different transition functions can work in parallel. The internal states are set at $t=0$ and remain unchanged while the cellprocessor is working.

Following [2], the transition functions are defined by microconfiguration terms. A microconfiguration term has the form:

$$
\boxed{\begin{array}{c}\text{the state of a group}\\ \text{of cells at time } t\end{array}}
\;\Bigg|\;
\boxed{\begin{array}{c}\text{the required state of (another)}\\ \text{group of cells at time } t+1\end{array}}
$$

Each cell on the right side also occurs on the left side, where it is marked by a double frame for identification. Because of the inhomogeneity, a single microconfiguration term may describe several transition functions together.

The notation $[x_{k} x_{k-1} \ldots x_{1}]$ is used frequently in the text. It denotes a $k$-digit binary number with the digits $x_{k}, x_{k-1}, \ldots, x_{1}$ ($x_{i} \in \{0,1\}$).

## 2. Binary addition

Binary addition is the most fundamental arithmetic operation. The cellular algorithm described below is used in many further cellular processing elements (see the cellular multiplier in this paper, and [4], [5], [6]).

Cellular binary addition is based on the "carry save" addition algorithm. Let $x=[x_{k} \ldots x_{1}]$, $y=[y_{k} \ldots y_{1}]$ and $z=[z_{k} \ldots z_{1}]$ be $k$-digit binary numbers to be added. In the first step $x$ and $y$ are added in parallel: a (partial) sum $s=[s_{k} \ldots s_{1}]$ and a carry vector $c=[c_{k} \ldots c_{1}]$ are computed as follows:

$$
\begin{equation*}
\left[c_{i} s_{i}\right]:=x_{i}+y_{i} \quad \text { for any } i \tag{1}
\end{equation*}
$$

In the second step the number $z$ can be added to $s$ and $c$ by the formula

$$
\begin{equation*}
\left[c_{i}^{\prime} s_{i}^{\prime}\right]:=z_{i}+s_{i}+c_{i-1} \quad \text { for any } i \tag{2}
\end{equation*}
$$

(The prime distinguishes the new values of $s$ and $c$ from the old ones; for the rightmost digit, $c_{0}=0$.)
Further operands, if any, are added to $s$ and $c$ in the same way by formula (2). Finally, the complete sum of the operands is computed from the last $s$ and $c$ in $k-1$ steps by applying the formula

$$
\begin{equation*}
\left[c_{i}^{\prime} s_{i}^{\prime}\right]:=s_{i}+c_{i-1} \quad \text { for any } i \tag{3}
\end{equation*}
$$
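The following small worked instance of formulas (1) to (3) uses numbers of our own choosing, not taken from the paper. Let $k=3$, $x=[011]$, $y=[001]$ and $z=[010]$, so that $x+y+z=6$, and let $c_0=0$. Since $c_i$ carries the weight $2^{i}$, a pair $(s,c)$ represents the value $s+2c$.

$$
\begin{aligned}
\text{(1):}\quad & [c_1 s_1] = 1+1 = [10],\; [c_2 s_2] = 1+0 = [01],\; [c_3 s_3] = 0+0 = [00]
 && \Rightarrow\; s=[010],\; c=[001],\; s+2c=4 \\
\text{(2):}\quad & [c_1' s_1'] = 0+0+0 = [00],\; [c_2' s_2'] = 1+1+1 = [11],\; [c_3' s_3'] = 0+0+0 = [00]
 && \Rightarrow\; s'=[010],\; c'=[010],\; s'+2c'=6 \\
\text{(3):}\quad & [c_1'' s_1''] = 0+0 = [00],\; [c_2'' s_2''] = 1+0 = [01],\; [c_3'' s_3''] = 0+1 = [01]
 && \Rightarrow\; s''=[110],\; c''=[000]
\end{aligned}
$$

The second application of (3), which is the $(k-1)$-th, changes nothing. The complete sum is $s''=[110]=6$.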

Based on this parallel addition algorithm, it is easy to construct a cellular automaton for binary addition. It consists of $k$ adder cells, each containing a sum bit $S$ and a carry bit $C$ (4-state cells). Each adder cell has a dummy cell as its upper neighbour (Fig. 1).


Fig. 1
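At $t=0$ the bits $S$ and $C$ are 0, and the bits $I$ contain the first number to be added. In each further step a new number is written into the bits $I$, and the adder cells work with the transition function

$$
\left[C_{i}^{\prime} S_{i}^{\prime}\right]:=I_{i}+S_{i}+C_{i-1},
$$

where the cells are numbered from the right and $C_{0}=0$.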
After the last operand has been input, the dummy cells are set to 0. After $k-1$ further steps, the complete sum of the operands appears in the bits $S$ of the cell row. (In this way the above transition function covers formulas (1), (2) and (3).)

Adding $n$ numbers of $k$ bits each takes $n+k-1$ steps: one step per operand, plus $k-1$ steps to absorb the remaining carries. The parallel addition algorithm is therefore economical when there are many operands.
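The cell row can be simulated directly. The sketch below is a minimal illustration, not the author's implementation. It assumes that cell 0 is the rightmost cell, that all cells update synchronously by $[C' S'] := I + S + C_{\text{right}}$, and that the input of the top cell is checked for a lost carry. The function name `cellular_add` and the `width` parameter are our own. With `width` $=k$, the returned step count is the $n+k-1$ of the text.

```python
def cellular_add(operands: list[int], width: int) -> tuple[int, int]:
    """Simulate a row of `width` adder cells; return (sum, number of steps).

    Cell i (i = 0 is the rightmost cell) holds a sum bit S[i] and a carry bit C[i];
    the dummy cell above it supplies the input bit. All cells update at once by
    [C' S'] := input + S + (carry of the right neighbour).
    """
    S = [0] * width
    C = [0] * width
    # One operand per step, then width - 1 all-zero inputs to absorb the carries.
    schedule = list(operands) + [0] * (width - 1)
    for value in schedule:
        inp = [(value >> i) & 1 for i in range(width)]
        total = [inp[i] + S[i] + (C[i - 1] if i > 0 else 0) for i in range(width)]
        S = [v & 1 for v in total]
        C = [v >> 1 for v in total]
        if C[-1]:
            # A carry out of the leftmost cell would be lost: the sum overflows.
            raise OverflowError("sum does not fit into `width` cells")
    return sum(bit << i for i, bit in enumerate(S)), len(schedule)
```

For example, `cellular_add([3, 1, 2], 3)` returns `(6, 5)`: three operands plus $k-1=2$ flushing steps. The check on the leftmost carry loosely mirrors the overflow-watching cell mentioned in the remark below.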

Remark. To prevent overflow, a cellular adder of $k+\lceil\log_{2} n\rceil$ cells should be used for $n$ operands. If only $k$ cells are used, the leftmost cell needs a special overflow-watching transition function (inhomogeneity).
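The cell count in the remark follows from a one-line bound, with $\log_2 n$ rounded up to an integer. Here $x^{(1)}, \ldots, x^{(n)}$ (our notation) denote the $k$-bit operands:

$$
x^{(1)}+\cdots+x^{(n)} \;\le\; n\,(2^{k}-1) \;<\; 2^{\lceil \log_2 n\rceil}\cdot 2^{k} \;=\; 2^{\,k+\lceil \log_2 n\rceil}.
$$

For example, with $n=5$ and $k=4$ the sum is at most $5\cdot 15=75<2^{7}=128$, so $4+3=7$ cells suffice.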

The above cellular adder has many simple applications, such as a binary counter, the computation of certain number sequences (e.g. Fibonacci numbers), vector addition, etc. [6]. Its most important application, however, is binary multiplication, discussed in the next section.

## 3. The multiplication of two binary numbers

As usual, the cellular multiplication algorithm is based on addition. The partial products are generated in one cell row, and a second cell row beneath it works as an adder (Fig. 2).


Fig. 2

The partial products are generated in an overlapped manner. Zero digits are inserted between the digits of the multiplicand $a=[a_{k} \ldots a_{1}]$ and between those of the multiplier $b=[b_{k} \ldots b_{1}]$. In this form the two numbers move step by step toward each other, and then past each other, in the upper cell row (Fig. 3).
step 1

$$
\begin{array}{ccccccccccccc}
a_{4} & 0 & a_{3} & 0 & a_{2} & 0 & a_{1} & & & & & & \\
 & & & & & & b_{4} & 0 & b_{3} & 0 & b_{2} & 0 & b_{1}
\end{array}
$$

step 2

$$
\begin{array}{ccccccccccc}
a_{4} & 0 & a_{3} & 0 & a_{2} & 0 & a_{1} & & & & \\
 & & & & b_{4} & 0 & b_{3} & 0 & b_{2} & 0 & b_{1}
\end{array}
$$

step 3

$$
\begin{array}{ccccccccc}
a_{4} & 0 & a_{3} & 0 & a_{2} & 0 & a_{1} & & \\
 & & b_{4} & 0 & b_{3} & 0 & b_{2} & 0 & b_{1}
\end{array}
$$

Fig. 3
Cellular algorithm for binary multiplication in the case $k=4$ (the adder row lies beneath the upper cell row).

The products of operand digits occupying the same position are summed by the adder. In Fig. 3, $a_{1} b_{4}$ is summed in the first step, and $a_{2} b_{4}$ and $a_{1} b_{3}$ in the second. Fig. 3 makes clear that in steps 1, 2, 3 and 4 the bit $b_{4}$ is multiplied by $a_{1}, a_{2}, a_{3}$ and $a_{4}$, so the partial product $[a_{4} a_{3} a_{2} a_{1}] \cdot b_{4}$ is generated for the adder. The partial products corresponding to $b_{3}, b_{2}$ and $b_{1}$ are computed in a similar way, and each arrives at the position matching its weight.
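The meeting pattern of Fig. 3 can be checked mechanically. The sketch below is only a model of the figure. It assumes our own column convention, in which column $w$ of the upper row sits above the adder cell of weight $2^{w}$ and $a_1$ starts above $b_k$. Digit $a_i$ moves one column right per step, and $b_j$ moves one column left. The helper name `meeting_schedule` is hypothetical.

```python
def meeting_schedule(k: int) -> dict[int, list[tuple[int, int, int]]]:
    """For each step (1-based), list the triples (i, j, w) such that a_i and b_j
    occupy the same cell in that step; w is the column, i.e. the weight 2**w
    of the product bit a_i * b_j."""
    schedule: dict[int, list[tuple[int, int, int]]] = {}
    for step in range(1, 2 * k):
        t = step - 1
        # Every second cell holds a digit; a moves right (column decreases),
        # b moves left (column increases), one cell per step.
        a_col = {i: k - 1 + 2 * (i - 1) - t for i in range(1, k + 1)}
        b_col = {j: 1 - k + 2 * (j - 1) + t for j in range(1, k + 1)}
        schedule[step] = sorted(
            (i, j, a_col[i])
            for i in range(1, k + 1)
            for j in range(1, k + 1)
            if a_col[i] == b_col[j]
        )
    return schedule
```

For $k=4$, step 1 yields `[(1, 4, 3)]` and step 2 yields `[(1, 3, 2), (2, 4, 4)]`, matching the text. In this model every pair meets exactly once, at step $i-j+k$, in column $i+j-2$. This is the column whose weight is that of $a_i b_j$, which is why a stationary adder accumulates $a\cdot b$.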

The two rows of the cellular multiplier have distinct transition functions, which may be defined together as follows:


If $k$-bit numbers are multiplied, the product has $2k$ bits, so an adder of length $2k$ is needed. Thus the multiplier consists of $4k$ four-state cells.

If the configuration of Fig. 3 (step 1) is assumed at $t=0$, then all the partial products have been generated by $t=2k-1$. It is easy to see that at $t=2k$ the rightmost $k$ cells of the adder have zero carry bits. A further $k$ steps are therefore needed to complete the product, and the whole multiplication process takes $3k$ steps.

Remark. If the digits of two further $k$-bit numbers $x$ and $y$ are written between the digits of $a$ and $b$ (instead of the zeros), then the multiplier computes the expression $a \cdot b+x \cdot y$. The cellular multiplier can be used for vector multiplication in a similar way [4].
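Both the multiplier and the remark can be exercised in a whole-row simulation. The sketch below is not the paper's transition function, which is given only as a figure. It rests on several assumptions of our own:

- The upper row is modelled as two unbounded streams of bits, so digits may lie outside the $2k$ adder columns.
- $x_i$ sits just left of $a_i$, and $y_j$ just right of $b_j$.
- The adder uses $[C' S'] = S + C + A\cdot B$ (the rule quoted in section 4 for control signal 0), reading $C$ from the right neighbour.
- The adder has $2k+1$ cells, one more than the text, so that $a\cdot b+x\cdot y$ cannot overflow. For $x=y=0$ the extra cell stays 0.

The name `cellular_multiply` is hypothetical.

```python
def cellular_multiply(a: int, b: int, k: int, x: int = 0, y: int = 0) -> tuple[int, int]:
    """Simulate the two-row multiplier; return (a*b + x*y, number of steps).

    Column c of both rows has weight 2**c. The upper row is modelled as two
    streams of bits: a and x travel right (c -> c-1), b and y travel left
    (c -> c+1), one cell per step. Each adder cell c updates
    [C' S'] := S + (carry of cell c-1) + A*B, with A*B the bits meeting above it.
    """
    width = 2 * k + 1  # 2k cells as in the text, plus one guard cell for a*b + x*y
    S, C = [0] * width, [0] * width
    right: dict[int, int] = {}
    left: dict[int, int] = {}
    for i in range(k):
        right[k - 1 + 2 * i] = (a >> i) & 1   # a_{i+1}; a_1 starts above b_k
        right[k + 2 * i] = (x >> i) & 1       # x_{i+1}: the slot just left of a_{i+1}
        left[1 - k + 2 * i] = (b >> i) & 1    # b_{i+1}
        left[-k + 2 * i] = (y >> i) & 1       # y_{i+1}: the slot just right of b_{i+1}
    steps = 0
    # All products meet during the first 2k steps; afterwards only carries remain.
    while steps < 2 * k or any(C):
        meet = [right.get(c, 0) & left.get(c, 0) for c in range(width)]
        total = [S[c] + meet[c] + (C[c - 1] if c > 0 else 0) for c in range(width)]
        S = [v & 1 for v in total]
        C = [v >> 1 for v in total]
        right = {c - 1: bit for c, bit in right.items()}
        left = {c + 1: bit for c, bit in left.items()}
        steps += 1
    return sum(bit << c for c, bit in enumerate(S)), steps
```

With this placement, $a$ digits only ever meet $b$ digits and $x$ digits only meet $y$ digits, because the other pairings would need a half-integer step. The returned step count can be compared with the $3k$ estimate of the text. Since the geometry here is our own, we do not claim that it reproduces that figure exactly.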

## 4. Multiplication of more than two numbers

This section gives a cellular algorithm to compute the product $x_{1} \cdots x_{n}$, where each $x_{i}$ is a $k$-bit number with $0 \leq x_{i}<1$ (the leftmost digit of $x_{i}$ has the positional value $2^{-1}$). To solve this task, the cellular multiplier of section 3 is modified: 3-bit (i.e. 8-state) cells are used, and the third bits in the adder cells serve for control (Fig. 4).


Fig. 4
Cellular multiplier for more than two numbers. The control bits are marked by $V$.

At $t=0$ the number $x_{1}$ is stored in the bits $S$ of the adder. The numbers $x_{2}, \ldots, x_{n}$ come from the outside world and travel left along the bits $B$. Each number $x_{i}$ is preceded by a control signal of value 1. This signal travels left along the control bits and copies the bits $S$ into the bits $A$, clearing the adder at the same time. Thus the incoming number $x_{i}$ is multiplied by the product $x_{1} \cdots x_{i-1}$, and the process can be repeated as often as necessary.
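At word level, the process amounts to repeatedly multiplying a running $2k$-bit fraction by a $k$-bit fraction. The sketch below assumes, as our own simplification, that each new product keeps exactly its leading $2k$ fractional bits and drops the rest. The real cellular multiplier may lose low-order bits differently, which may be why the accuracy bound in the text also contains a $\log_2 k$ term. The function name `chained_product` is hypothetical.

```python
def chained_product(digits: list[int], k: int) -> int:
    """Word-level model of the chained multiplication.

    digits[i] encodes x_{i+1} = digits[i] / 2**k. The running product is kept
    as a 2k-bit fraction p / 2**(2k). Multiplying by a k-bit factor gives 3k
    fractional bits, and (our assumption) the lowest k of them are dropped.
    Returns p, so the result is p / 2**(2k).
    """
    p = digits[0] << k          # x_1 loaded into the 2k-bit adder, leading bit 2**-1
    for m in digits[1:]:
        p = (p * m) >> k        # keep the leading 2k fractional bits
    return p
```

For the example used below ($x_1=0.1001$, $x_2=0.1101$, $x_3=0.1110$, $k=4$), `chained_product([0b1001, 0b1101, 0b1110], 4)` returns `0b01100110`, i.e. $0.01100110$. In this model each truncation loses less than $2^{-2k}$, and later factors below 1 do not enlarge earlier errors. The result therefore underestimates the exact product by less than $(n-1)\cdot 2^{-2k}$.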

According to the above principle, the transition functions of section 3 should be modified as follows.

If the adder cell contains control signal 0:

where $\quad\left[C^{\prime} S^{\prime}\right]=S+C+A \cdot B$.

If the adder cell contains control signal 1:

Listing
Simulation of the 8-bit multiplier computing $x_{1} x_{2} x_{3}$, steps 0 to 22.


The multiplication process is demonstrated by a simulation example (see Listing). The product of $x_{1}=0.1001$, $x_{2}=0.1101$ and $x_{3}=0.1110$ is computed by an 8-bit multiplier. The multiplier is displayed in 4 rows, as in Fig. 4, except that in the third row the bits $S$ and $C$ are printed together in the form $[C S]$ (for example, the value 2 means $C=1$ and $S=0$). The dots denote insignificant zeros in each row.

At $t=0$, $x_{1}$ is stored in the adder, and a control signal marked by "$<$" starts at the right end of the multiplier. Between $t=1$ and $t=8$ the number $x_{1}$ is copied into the bits $A$ and shifted right, which inserts zeros between its digits. The number $x_{2}$ comes from outside and is multiplied by $x_{1}$. At $t=10$ the rightmost digit of $x_{1} x_{2}$ is computed. At this moment a new control signal may already be started, which ensures the multiplication of $x_{1} x_{2}$ by $x_{3}$. Consecutive multiplications can thus overlap.

The multiplication of $n$ numbers requires $(2k+2)(n-1)+2k \approx 2kn$ steps, and the modified multiplier consists of $4k$ eight-state cells. The product contains $2k$ digits (the leftmost digit has the positional value $2^{-1}$), and its first $2k-\log_{2} k-\log_{2} n$ bits are always correct.
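Expanding the step count shows what the approximation neglects:

$$
(2k+2)(n-1)+2k \;=\; 2kn+2n-2 \;=\; 2kn\left(1+\frac{n-1}{kn}\right),
$$

so the relative excess over $2kn$ is below $1/k$. For the 8-bit example ($k=4$, $n=3$) the formula gives $10\cdot 2+8=28$ steps, against $2kn=24$.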

## 5. Concluding remarks

This paper has discussed three fundamental cellular processing elements, all designed for the same cellprocessor architecture [1]. Each processing element is based on a bitparallel cellular algorithm in which nearly all cells do useful work in every time step. By interconnecting such simple processing elements, a cellprocessor can solve more complex tasks in a bitparallel way.

```
RESEARCH GROUP ON THEORY OF AUTOMATA
HUNGARIAN ACADEMY OF SCIENCES
SOMOGYI U. 7.
SZEGED, HUNGARY
H-6720
```


## References

[1] Legendi, T., Cellprocessors in computer architecture, Computational Linguistics and Computer Languages, v. 11, 1977, pp. 147-167.

[2] Legendi, T., A 2D transition function definition language for a subsystem of the CELLAS cellular processor simulation language, Computational Linguistics and Computer Languages, v. 13, 1979, pp. 169-194.

[3] Katona, E., T. Legendi, Cellular algorithms for fixed point decimal addition and multiplication, Elektron. Informationsverarb. Kybernet., v. 17, 1981, pp. 637-644.

[4] Katona, E., Cellular algorithms for fixed point vector- and matrix-multiplication, Proceedings of the Conference Programming Systems '81, pp. 262-280, in Hungarian.

[5] Katona, E., The application of cellprocessors in conventional data processing, Proceedings of the Third Hungarian Computer Science Conference, Publishing House of the Hungarian Academy of Sciences, Budapest, 1981, pp. 295-306.

[6] Katona, E., Cellular algorithms (Selected results of the cellprocessor team led by T. Legendi), Von Neumann Society, Budapest, 1981, 160 pages, in Hungarian.

[7] Domán, A., A 3-dimensional cellular space, Sejtautomaták, Gondolat Kiadó, Budapest, 1978, in Hungarian.

[8] Vollmar, R., Algorithmen in Zellularautomaten, B. G. Teubner, Stuttgart, 1979.

[9] Nishio, H., Real time sorting of binary numbers by 1-dimensional cellular automaton, Proceedings of the International Symposium on Uniformly Structured Automata and Logic, Tokyo, 1975, pp. 153-162.

