



## Improved design for parallel multiplier based on phase-mode logic

| authors | Horima Yohei, Onomi Takeshi, Kobori Masayuki, Shimizu Itsuhei, Nakajima Koji |
|---|---|
| journal or publication title | IEEE Transactions on Applied Superconductivity |
| volume | 13 |
| number | 2 |
| page range | 527-530 |
| year | 2003 |
| URL | http://hdl.handle.net/10097/48251 |

doi: 10.1109/TASC.2003.813924

# Improved Design for Parallel Multiplier Based on Phase-Mode Logic

Yohei Horima, Takeshi Onomi, Masayuki Kobori, Itsuhei Shimizu, and Koji Nakajima

Abstract—To improve the Phase-Mode parallel multiplier, we propose using a Booth encoder in place of an AND array. Booth's algorithm is often used to generate partial products. The size of the encoder does not limit its operating frequency because the Phase-Mode Booth encoder has a pipelined structure. We suggest using the encoder as a serial encoder to reduce the number of Josephson junctions (JJ). There are two methods for applying the Booth encoder to the current structure: the first shifts the multiplicands, and the second shifts the partial products and complementary signals. For large-scale multipliers, both methods require fewer JJ's in total than the AND array. According to numerical simulations, the Phase-Mode Booth encoder with 2.5 kA/cm<sup>2</sup> Nb/AlOx/Nb junctions can operate at over 30 GHz.

*Index Terms*—Booth encoder, Booth's algorithm, parallel multiplier, phase-mode, single flux quantum.

#### I. INTRODUCTION

THE development of single flux quantum (SFQ) digital circuits has made dramatic progress since SFQ logic was systematically summarized [1], [2], [3]. The attractive features of SFQ logic are its low power dissipation and high operating speed. Therefore, SFQ digital logic is a candidate for the main electronics of the next generation.

We have proposed Phase-Mode logic as the main SFQ logic. Phase-Mode logic differs from RSFQ logic as follows: 1) an ICF (INHIBIT controlled by fluxon) gate is the fundamental device of Phase-Mode logic [4], [5]; 2) the system operates with a single clock signal, like an asynchronous system. We introduced the basic structure of the Phase-Mode pipelined parallel multiplier [6] to realize a high-performance digital signal processor (DSP). A primitive 2-bit × 2-bit multiplier fabricated with the NEC 2.5 kA/cm<sup>2</sup> Nb/AlOx/Nb process technology was successfully demonstrated [7]. In this paper, we introduce a more efficient method for realizing a large-scale multiplier. To improve the Phase-Mode multiplier, we propose the Phase-Mode Booth encoder for generating partial products, and we compare the performance of the current structure with that of the encoder.

Numerical simulations are carried out with JSIM [8], Verilog-HDL, and SCOPE [9], assuming the NEC standard 2.5 kA/cm<sup>2</sup> Nb/AlOx/Nb process.

The authors are with L.E.I.S., Research Institute of Electrical Communications, Tohoku University, Sendai 980-8577, Japan (e-mail: yhorima@nakajima.riec.tohoku.ac.jp; onomi@riec.tohoku.ac.jp; m-kobori@nakajima.riec.tohoku.ac.jp; a-itsu@nakajima.riec.tohoku.ac.jp; hello@.riec.tohoku.ac.jp).


#### II. PHASE-MODE PARALLEL MULTIPLIER

The Phase-Mode parallel multiplier consists of three major arithmetic blocks: an AND array block, a carry save adder (CSA) block, and a carry lookahead (CLA) block [6]. The AND array generates the partial products, the CSA block adds them, and the CLA block produces the final result of the multiplication. We designed a primitive 2-bit $\times$ 2-bit parallel multiplier consisting of 262 JJ's and confirmed its correct operation [7]. The experiment showed that extra Josephson transmission lines (JTL's) are needed for the interfaces, especially for distributing data to the AND array, because the current process technology provides only two wiring layers. As a result, the number of JJ's increases, and the throughput of the circuits is lower than we estimated. To solve this problem, the number of wiring layers must be increased. It is also important to replace the current structure with a new algorithm that reduces the number of JJ's and improves the performance.

#### III. PHASE-MODE BOOTH ENCODER

Scaling up the circuit size without degrading circuit performance is key to improving a multiplier. Generating and adding partial products occupy almost 90% of the area of a typical multiplier. Many fast multiplication algorithms reduce the number of arithmetic operations. Booth's algorithm [10], [11] is a well-known algorithm for generating partial products. It is often applied in present-day semiconductor electronics to increase throughput and to make circuits compact. The Booth encoder can be realized efficiently with Phase-Mode logic. In this section, we explain Booth's algorithm and the Phase-Mode Booth encoder, which replaces the AND array in the design of a large-scale parallel multiplier.

#### A. Booth's Algorithm

Booth's algorithm processes the multiplier two bits at a time. A multiplier using Booth's algorithm operates faster because the total number of partial products is reduced.

The main idea of Booth's algorithm is to rewrite the multiplier, Y, as follows [11]:

$$
\begin{aligned}
Y &= -y_{n-1}2^{n-1} + \sum_{i=0}^{n-2} y_i 2^i \\
  &= -\sum_{i=0}^{n-1} y_i 2^i + 2\sum_{i=0}^{n-2} y_i 2^i \\
  &= \sum_{i=0}^{n-1} \left(-y_i + y_{i-1}\right) 2^i.
\end{aligned}
\tag{1}
$$
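Identity (1) is easy to check numerically. The Python sketch below is our own illustration, not part of the authors' design flow, and its function names are hypothetical. It recodes an n-bit two's-complement multiplier into the digits $-y_i + y_{i-1}$ (with $y_{-1} = 0$) and confirms that the weighted sum reproduces Y for every n-bit value.

```python
def multiplier_bits(y: int, n: int) -> list[int]:
    """Two's-complement bits y_0 .. y_{n-1} of y (least significant first)."""
    return [(y >> i) & 1 for i in range(n)]


def booth_radix2_digits(y: int, n: int) -> list[int]:
    """Digits d_i = -y_i + y_{i-1} from (1), with y_{-1} = 0; each is -1, 0, or 1."""
    bits = multiplier_bits(y, n)
    previous = [0] + bits[:-1]  # y_{i-1} for i = 0 .. n-1
    return [-b + p for b, p in zip(bits, previous)]


def check_equation_1(n: int) -> bool:
    """True if sum(d_i * 2**i) == Y for every n-bit two's-complement Y."""
    for y in range(-(1 << (n - 1)), 1 << (n - 1)):
        digits = booth_radix2_digits(y, n)
        if sum(d * 2**i for i, d in enumerate(digits)) != y:
            return False
    return True


print(check_equation_1(8))        # True
print(booth_radix2_digits(5, 4))  # [-1, 1, -1, 1]
```

Every digit is 0 or ±1, matching the statement that follows Table I.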

1051-8223/03\$17.00 © 2003 IEEE

Manuscript received August 9, 2002; revised December 1, 2002. This work was supported by the Special Coordination Funds of the Ministry of Education, Culture, Sports, Science and Technology of the Japanese Government.

TABLE I SUMMARY OF BOOTH'S ALGORITHM

| $y_{i+1}$ | $y_i$ | $y_{i-1}$ | operation |
|-----------|-------|-----------|-----------|
| 0         | 0     | 0         | 0         |
| 0         | 1     | 0         | +X        |
| 1         | 0     | 0         | -2X       |
| 1         | 1     | 0         | -X        |
| 0         | 0     | 1         | +X        |
| 0         | 1     | 1         | +2X       |
| 1         | 0     | 1         | -X        |
| 1         | 1     | 1         | 0         |

Y and X denote a multiplier and a multiplicand, respectively.  $y_i$  denotes the *i*th bit of the multiplier.

Here, $y_i$ is the *i*th bit of the multiplier, *n* is the number of bits, and $y_{-1} = 0$. Every coefficient of $2^i$ in (1) is "0" or "$\pm 1$." Assuming that *n* is even, (1) can be rearranged as follows:

$$
\begin{aligned}
Y &= \sum_{i=0}^{n-1} \left(-y_i + y_{i-1}\right) 2^i \\
  &= \sum_{i=0}^{n/2-1} \left(-2y_{2i+1} + y_{2i} + y_{2i-1}\right) 2^{2i}.
\end{aligned}
\tag{2}
$$

The coefficients in (2) are only "0," "$\pm$1," or "$\pm$2." Namely, the product $X \times Y$ equals the sum of the partial products, each of which is "0," "$\pm$1X," or "$\pm$2X." This algorithm is summarized in Table I. With Booth's algorithm, the number of partial products in an n-bit $\times$ n-bit multiplier is $n^2/2$, and all of them are generated in n/2 arithmetic steps. By contrast, an AND array normally generates $n^2$ partial products by $n^2$ multiplications. Full details are given in [11]. As a result, partial products can be generated efficiently by scanning three bits of the multiplier in each operation.
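A minimal Python sketch of the radix-4 recoding in (2), assuming an even n and a two's-complement multiplier. The lookup table transcribes Table I. `booth_radix4_digits` and `booth_multiply` are hypothetical names used only for illustration. Each of the n/2 digits selects one partial product, weighted by $2^{2i} = 4^i$.

```python
# Table I: (y_{i+1}, y_i, y_{i-1}) -> multiple of the multiplicand X.
BOOTH_TABLE = {
    (0, 0, 0): 0, (0, 1, 0): +1, (1, 0, 0): -2, (1, 1, 0): -1,
    (0, 0, 1): +1, (0, 1, 1): +2, (1, 0, 1): -1, (1, 1, 1): 0,
}


def booth_radix4_digits(y: int, n: int) -> list[int]:
    """Recode an n-bit (n even) two's-complement multiplier into n/2 digits.

    Digit i scans the overlapping triple (y_{2i+1}, y_{2i}, y_{2i-1}), y_{-1} = 0.
    """
    if n % 2:
        raise ValueError("(2) assumes that n is even")

    def bit(j: int) -> int:
        return 0 if j < 0 else (y >> j) & 1

    return [BOOTH_TABLE[(bit(2 * i + 1), bit(2 * i), bit(2 * i - 1))]
            for i in range(n // 2)]


def booth_multiply(x: int, y: int, n: int) -> int:
    """X * Y as the sum of n/2 partial products d_i * X * 4**i, following (2)."""
    return sum(d * x * 4**i for i, d in enumerate(booth_radix4_digits(y, n)))


# Table I agrees with the coefficient -2*y_{i+1} + y_i + y_{i-1} in (2).
assert all(v == -2 * a + b + c for (a, b, c), v in BOOTH_TABLE.items())
print(booth_radix4_digits(-3, 4))  # [1, -1]
print(booth_multiply(-7, 13, 8))   # -91
```

For example, the 4-bit multiplier Y = -3 (1101) is recoded into [1, -1], that is, 1 - 4 = -3. Only two partial products are needed, where an AND array would need four rows.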

### B. Booth Encoder

An ICF gate is the basic device in Phase-Mode logic. This gate performs INHIBIT, a universal logic function, so complex digital operations can be realized by combining ICF gates [3], [5]. First, we introduce the extended AND gate, which has two AND functions, $X \cdot Y = B$ and $Y \cdot \text{Re} = C$ (Fig. 1). This gate is useful for realizing the Booth encoder. Its concept is similar to that of the RSFQ B flip-flop [12]. The gate operates as follows:

- If the internal state is "0," an SFQ from terminal X is emitted by J5.
- If the internal state is "0," an SFQ from terminal Y is trapped in the loop consisting of L2, J4, and J6.
- If the internal state is "0," an SFQ from terminal Re is emitted by J1.
- If the internal state is "1," an SFQ from terminal X switches J6–J2 and is transferred to terminal B.
- If the internal state is "1," an SFQ from terminal Re switches J3–J4 and is transferred to terminal C.
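The gate behaviour listed above can be captured as a small state machine. The sketch below is a hypothetical behavioural model in Python, not a circuit simulation. Only the five listed cases come from the text. The following details are our assumptions, and the Moore diagram in Fig. 1(c) is the authoritative source:

- An emitted pulse leaves the state unchanged.
- The stored SFQ is kept after a B output.
- A C output clears the stored SFQ.

Inputs the list does not describe, such as Y arriving in state "1," raise an error instead of being guessed.

```python
# Hypothetical behavioural model of the extended AND gate (not a circuit simulation).
# Keys: (internal state, input terminal) -> (next state, output terminal or None).
TRANSITIONS = {
    (0, "X"): (0, None),   # emitted by J5 (state assumed unchanged)
    (0, "Y"): (1, None),   # trapped in the loop L2, J4, J6
    (0, "Re"): (0, None),  # emitted by J1 (state assumed unchanged)
    (1, "X"): (1, "B"),    # X . Y = B; ASSUMPTION: the stored SFQ is kept
    (1, "Re"): (0, "C"),   # Y . Re = C; ASSUMPTION: the readout clears the state
}


def run_extended_and(pulses: list[str], state: int = 0) -> list[str]:
    """Apply input pulses in order and return the output terminals that fire."""
    outputs = []
    for pulse in pulses:
        key = (state, pulse)
        if key not in TRANSITIONS:
            raise ValueError(f"input {pulse!r} in state {state} is not described")
        state, out = TRANSITIONS[key]
        if out is not None:
            outputs.append(out)
    return outputs


print(run_extended_and(["X", "Re"]))  # []
print(run_extended_and(["Y", "X"]))   # ['B']
print(run_extended_and(["Y", "Re"]))  # ['C']
```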



Fig. 1. Extended AND gate. (a) Circuit diagram. The device parameters are Ic1 = 0.25 mA, Ic2 = 0.21 mA, Ic3 = 0.14 mA, Ic4 = 0.19 mA, Ic5 = 0.25 mA, Ic6 = 0.13 mA, L1 = 2.9 pH, L2 = 2.0 pH, L3 = 2.9 pH. (b) Symbol. (c) Moore diagram.

J denotes a Josephson junction. The internal state indicates whether an SFQ exists in the loop of L2, J4, and J6 (L2, J2, and J3).

Next, we introduce the Phase-Mode Booth encoder, which can easily be formed with extended AND gates. The encoder consists of three arithmetic blocks (Fig. 2).

- First Block: This block is the main block of the Phase-Mode Booth encoder. It examines three bits of the multiplier at a time, $y_{i+1}$, $y_i$, and $y_{i-1}$.
  - If  $y_{i-1}$  or  $y_i$  is applied, the signal "+1" is generated.
  - If both  $y_{i-1}$  and  $y_i$  are applied, the signal "+2" and the signal "+1" are generated.
  - If no input or all inputs are applied, no output signal is generated.
  - If the signals $y_{i+1}$ and $y_i$, or the signals $y_{i+1}$ and $y_{i-1}$, are applied, the output signal "-" is generated. The signal "-" is used both as the complementary signal and to invert the multiplicand, as in "-X."
  - If only the signal $y_{i+1}$ is applied, the signals "-" and "-2" are generated.
- Second Block: This block mainly determines the sign, "+" or "-."
  - If the signal "+1" and the multiplicand $x_i$ are applied, the signal "+" is generated from terminal B of the first ICF gate. In addition, if the signal "+2" from the first block is applied, the multiplicand $x_i$ is shifted.
  - If only the multiplicand $x_i$ is applied, the signal "-" is generated from terminal C of the first ICF gate to start multiplying by a negative value. In addition, if the signal "-2" is applied, the multiplicand $x_i$ is shifted before it is inverted.
- Third Block: If the operation is "-X" or "-2X," the signal $x'_i$ or $2x'_i$, respectively, is inverted by the signal "-."
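The first-block rules above can be checked against Table I. In the sketch below, `first_block` returns the set of signals generated for $(y_{i+1}, y_i, y_{i-1})$ exactly as listed. `decode` is our own reading of how the later blocks interpret those signals: "-2" or "+2" selects a shifted multiplicand, and "-" negates a "+1." Both names are hypothetical, and the code models logic only, not SFQ timing.

```python
def first_block(y_hi: int, y_mid: int, y_lo: int) -> frozenset[str]:
    """Signals generated for the triple (y_{i+1}, y_i, y_{i-1}), per the list above."""
    if y_hi + y_mid + y_lo in (0, 3):   # no input or all inputs: no output
        return frozenset()
    signals = set()
    if y_mid or y_lo:
        signals.add("+1")
    if y_mid and y_lo:                  # both y_i and y_{i-1}: "+2" as well
        signals.add("+2")
    if y_hi and (y_mid or y_lo):        # y_{i+1} with y_i or y_{i-1}: "-"
        signals.add("-")
    if y_hi and not (y_mid or y_lo):    # only y_{i+1}: "-" and "-2"
        signals.update({"-", "-2"})
    return frozenset(signals)


def decode(signals: frozenset[str]) -> int:
    """Our reading of the signal set as a multiple of X (hypothetical helper)."""
    if "-2" in signals:
        return -2
    if "+2" in signals:
        return 2
    if "+1" in signals:
        return -1 if "-" in signals else 1
    return 0


for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            # Table I / (2): the operation is (-2*y_{i+1} + y_i + y_{i-1}) X.
            assert decode(first_block(a, b, c)) == -2 * a + b + c
print(sorted(first_block(1, 0, 0)))  # ['-', '-2']
```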

Each partial product and its complementary number are added in the CSA and CLA blocks. The throughput of the encoder is maintained at any scale because the Phase-Mode Booth encoder has a pipelined structure.
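In arithmetic terms, a negative Booth digit is handled in two's complement. The magnitude $|d|X$ (shifted by one bit when $|d| = 2$) is inverted, and the complementary signal supplies the missing +1 when the rows are summed in the CSA and CLA blocks. The Python sketch below models this at the integer level under our own assumptions:

- Each row is a fixed n + 2 bits wide.
- c.n. is treated as a 1 added at the least significant bit of its row.
- All function names are hypothetical.

```python
def booth_digits(y: int, n: int) -> list[int]:
    """Radix-4 Booth digits -2*y_{2i+1} + y_{2i} + y_{2i-1} from (2), y_{-1} = 0."""
    def bit(j: int) -> int:
        return 0 if j < 0 else (y >> j) & 1
    return [-2 * bit(2 * i + 1) + bit(2 * i) + bit(2 * i - 1) for i in range(n // 2)]


def partial_product_row(x: int, d: int, width: int) -> tuple[int, int]:
    """Return (row bits, c.n.) for digit d in {-2, -1, 0, 1, 2}."""
    mask = (1 << width) - 1
    row = (abs(d) * x) & mask        # |d| = 2 is a one-bit left shift of X
    if d < 0:
        return (~row) & mask, 1      # invert the row; c.n. = 1 completes -|d|X
    return row, 0


def to_signed(bits: int, width: int) -> int:
    """Interpret a width-bit pattern as a two's-complement integer."""
    return bits - (1 << width) if (bits >> (width - 1)) & 1 else bits


def multiply_with_rows(x: int, y: int, n: int) -> int:
    """Sum the rows and their c.n. bits with weights 4**i (done by the CSA/CLA)."""
    width = n + 2                    # assumed row width; enough for |2X|
    total = 0
    for i, d in enumerate(booth_digits(y, n)):
        row, cn = partial_product_row(x, d, width)
        total += (to_signed(row, width) + cn) * 4**i
    return total


print(partial_product_row(5, -1, 6))  # (58, 1): ~000101 = 111010, plus c.n.
print(multiply_with_rows(-7, 13, 8))  # -91
```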

To confirm that these blocks operate correctly, we designed a 2-bit  $\times$  2-bit Phase-Mode Booth encoder (Fig. 3). Here, c.n. and $p_i$ denote the complementary signal and the *i*th bit of the partial product in the multiplication, respectively. The encoder consists of 19 ICF gates and about 800 JJ's. According to the simulations, the encoder can operate at over 30 GHz. We also designed part of the first block of the encoder, consisting of 184 JJ's, for the NEC 2.5 kA/cm<sup>2</sup> standard process technology (Fig. 4).



Fig. 2. Circuit diagram of the Phase-Mode Booth encoder. (a) First block. (b) Second block. (c) Third block.



Fig. 3. 2-bit  $\times$  2-bit Phase-Mode Booth encoder. c.n. and $p_i$ indicate the complementary signal and the *i*th bit of the partial product signal, respectively.

#### IV. MULTIPLIER WITH BOOTH ENCODER

In this section, we introduce methods for applying the Phase-Mode Booth encoder to the multiplier. Next, we compare the performance of the multiplier using an AND array with that of the multiplier using a Booth encoder. Finally, we summarize the comparison for designing a large-scale multiplier.

In general, the CSA block is the main factor that determines the maximum operating frequency of the multiplier [6]. The arithmetic time of one CSA stage is given by

$$
\text{Delay time} = \alpha T_{in} + \beta T_{add}. \tag{3}
$$

Here, $\alpha$ is the number of input SFQ's, $\beta$ is the number of bits of the full adder, $T_{in}$ is the time interval between input SFQ's, and $T_{add}$ is the carry-propagation time per bit. The term $\alpha T_{in}$ dominates (3) because the SFQ's applied to the full adder must arrive in serial order, and serializing them with merge cells takes a certain amount of time. We described a (3,2) CSA with merge cells operating at around 13.3 GHz [6]. As an example, conceptual diagrams of 8-bit multipliers using the Booth encoders are shown in Fig. 5.



Fig. 4. A layout view of a part of the first block of the Phase-Mode Booth encoder consisting of 184 JJ's.



Fig. 5. Conceptual diagrams of 8-bit multipliers. (a) The multiplicands are shifted two bits to the left in each operation. (b) The partial products and complementary signals are shifted in each operation.
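Equation (3) can be evaluated directly. In the hypothetical Python helpers below, times are in picoseconds. The throughput bound assumes that a new set of SFQ's can enter a stage once per stage delay. The numbers in the usage example are placeholders, not measured values from [6]. For reference, the 13.3 GHz reported for the (3,2) CSA corresponds to a stage time of about 75 ps.

```python
def csa_stage_delay_ps(alpha: int, beta: int, t_in_ps: float, t_add_ps: float) -> float:
    """Delay time of one CSA stage, (3): alpha * T_in + beta * T_add (picoseconds)."""
    return alpha * t_in_ps + beta * t_add_ps


def max_throughput_ghz(delay_ps: float) -> float:
    """Upper bound on the operating frequency if one data set enters per stage delay."""
    return 1e3 / delay_ps  # 1 / (delay in ps), expressed in GHz


# Placeholder values only (not from [6]): three serialized input SFQ's, one-bit adder.
delay = csa_stage_delay_ps(alpha=3, beta=1, t_in_ps=20.0, t_add_ps=15.0)
print(delay)                                # 75.0
print(round(max_throughput_ghz(delay), 1))  # 13.3
```

Because $\alpha T_{in}$ grows with every serialized input, shortening $T_{in}$ (the merge-cell serialization) raises the bound more than shortening $T_{add}$ when $\alpha > \beta$.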



Fig. 6. Estimates. AACSA, A, and B in the figure denote the AND array and the CSA block, the input-shift Booth encoders, and the output-shift Booth encoders, respectively. (a) Number of ICF gates, excluding the CLA block. (b) Number of Josephson junctions, excluding the CLA block.

Because the throughput of the encoder exceeds 30 GHz, the encoders can operate three times faster than the CSA and CLA blocks when those blocks operate at around 10 GHz. Each successive partial product must be shifted two bits to the left because Booth's algorithm is a uniform shift method: the weights in (2) are $2^{2i}$. We propose two methods of applying the Booth encoder to the multiplier in order to shift the partial products: 1) the multiplicands are shifted in each operation; 2) the partial products and the complementary numbers are shifted in each operation. For an 8-bit multiplier, the Booth encoder does not need to operate at over 30 GHz because all partial products and complementary signals are generated by two encoders in two operations. Therefore, an encoder operating frequency of 20 GHz is sufficient to generate all partial products.
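The frequency argument above reduces to simple arithmetic, and the two shifting methods produce identically weighted partial products. The Python sketch below uses hypothetical function names and is an integer-level model only. An n-bit multiplication needs n/2 Booth operations shared among the encoders, so each encoder must run at the CSA rate times its share of operations. For n = 8, two encoders, and a 10 GHz CSA, this gives the 20 GHz quoted above.

```python
def required_encoder_freq_ghz(n: int, num_encoders: int, csa_freq_ghz: float) -> float:
    """Encoder rate needed so that all n/2 Booth operations finish each CSA cycle."""
    ops_per_encoder = (n // 2) / num_encoders  # operations each encoder performs
    return csa_freq_ghz * ops_per_encoder


def rows_input_shift(x: int, digits: list[int]) -> list[int]:
    """Method 1: shift the multiplicand two bits left before each operation."""
    rows, shifted_x = [], x
    for d in digits:
        rows.append(d * shifted_x)
        shifted_x <<= 2
    return rows


def rows_output_shift(x: int, digits: list[int]) -> list[int]:
    """Method 2: form d * X, then shift the partial product (with c.n.) by 2i bits."""
    return [(d * x) << (2 * i) for i, d in enumerate(digits)]


print(required_encoder_freq_ghz(n=8, num_encoders=2, csa_freq_ghz=10.0))  # 20.0
digits = [1, -1, 1, 0]  # radix-4 Booth digits of Y = 13 (n = 8)
print(rows_input_shift(-7, digits) == rows_output_shift(-7, digits))     # True
print(sum(rows_input_shift(-7, digits)))                                 # -91
```

The two methods differ only in where the shifter sits (before or after the encoder), which is why their ICF-gate counts in Fig. 6 differ while their arithmetic results agree.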

Now, we compare the performance of the AND array with that of the Booth encoder. For a strict comparison, the CSA block must be included. The estimated total numbers of ICF gates and JJ's required for each word length are shown in Fig. 6. With the current process technology, many JTL's are needed to connect the cells. For example, only $n^2$ AND cells are required to produce all partial products in an *n*-bit multiplier. However, data distribution for the AND array is complicated: the input SFQ's must be amplified to $n^2$ SFQ's. Because data distribution requires many JJ's, the structure of the AND array becomes complicated.

In contrast, the Booth encoder requires less data distribution than the AND array. Fewer JJ's are therefore needed to realize a large-scale parallel multiplier because the extra JTL's are unnecessary (Fig. 6). A single Booth encoder has more ICF gates than an AND array because its arithmetic logic is more complicated. However, when the Booth encoder is used as a serial encoder, its total number of JJ's becomes smaller than that of the AND array. According to the estimated number of ICF gates, the input-shift Booth encoder is the most efficient method. The output-shift Booth encoder is less useful because the output shifter requires additional ICF gates. Nevertheless, both cases require fewer JJ's in total than the AND array. As a result, the Phase-Mode Booth encoder is a useful encoder for generating partial products at a large scale.

#### V. CONCLUSION

Our estimates show that the AND array requires extra JTL's for the interfaces with the current process technology. To improve the multiplier, we investigated other algorithms as substitutes for the current structure. We propose the Phase-Mode Booth encoder as a candidate for generating partial products. There are two methods for applying the Booth encoder to the Phase-Mode parallel multiplier. In the first, the multiplicands are shifted two bits to the left in each operation. In the second, the partial products and the complementary signals are shifted in each operation. Using the Booth encoder as a serial encoder reduces the total number of JJ's. According to the simulations, the encoder operates at over 30 GHz with 2.5 kA/cm<sup>2</sup> Nb/AlOx/Nb junctions.

#### ACKNOWLEDGMENT

The authors gratefully acknowledge the efforts of the NEC Inc. Superconductive Circuits Fabrication Team.

#### REFERENCES

- [1] K. Nakajima, Y. Onodera, and Y. Ogawa, "Logic design of Josephson network," *J. Appl. Phys.*, vol. 47, pp. 1620–1627, Apr. 1976.
- [2] K. K. Likharev and V. K. Semenov, "RSFQ logic/memory family: A new Josephson junction technology for subterahertz-clock-frequency digital systems," *IEEE Trans. Appl. Supercond.*, vol. 1, pp. 3–28, March 1991.
- [3] K. Nakajima, H. Mizusawa, H. Sugahara, and Y. Sawada, "Phase-mode Josephson computer system," *IEEE Trans. Appl. Supercond.*, vol. 1, pp. 29–36, March 1991.
- [4] K. Nakajima, G. Oya, and Y. Sawada, "Fluxoid motion in phase mode Josephson switching system," *IEEE Trans. Magn.*, vol. MAG-19, pp. 1201–1204, May 1983.
- [5] T. Onomi, K. Yanagisawa, and K. Nakajima, "New phase-mode logic gates with large operation regions of circuit parameters," *IEEE Trans. Appl. Supercond.*, vol. 11, pp. 974–977, March 2001.
- [6] —, "Phase-mode pipelined parallel multiplier," *IEEE Trans. Appl. Supercond.*, vol. 11, pp. 541–544, March 2001.
- [7] Y. Horima, I. Shimizu, M. Kobori, T. Onomi, and K. Nakajima, "Comparison between an AND array and a Booth encoder for large-scale phase-mode multipliers," *IEICE Trans. Electron.*, vol. E86-C, Jan. 2003, to be published.
- [8] E. S. Fang and T. Van Duzer, Ext. Abstr. International Superconductive Electronics Conference, 1989, p. 407.
- [9] N. Mori, A. Akahori, T. Sato, N. Takeuchi, A. Fujimaki, and H. Hayakawa, "A new optimization procedure for single flux quantum circuits," *Physica C*, pt. 2, vol. 357–360, pp. 1557–1569, Aug. 2001.
- [10] A. D. Booth, "A signed binary multiplication technique," *Quart. J. Mech. Appl. Math.*, pt. 2, vol. 4, pp. 236–240, 1951.
- [11] L. P. Rubinfield, "A proof of the modified Booth's algorithm for multiplication," *IEEE Trans. Comput.*, vol. 24, pp. 1014–1015, Oct. 1975.
- [12] S. V. Polonsky, V. K. Semenov, and A. F. Kirichenko, "Single flux, quantum B flip-flop and its possible applications," *IEEE Trans. Appl. Supercond.*, vol. 4, pp. 9–18, March 1994.