A 32×32-bit Multiplier Using Multiple-Valued MOS Current-Mode Circuits
Author: Michitaka Kameyama (亀山 充隆)
Journal or publication title: IEEE Journal of Solid-State Circuits
Volume: 23
Number: 1
Page range: 124-132
Year: 1988
URL: http://hdl.handle.net/10097/46838
DOI: 10.1109/4.268
124 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 23, NO. 1, FEBRUARY 1988
A 32 × 32-bit Multiplier Using Multiple-Valued MOS Current-Mode Circuits
Abstract: A 32 × 32-bit multiplier using multiple-valued current-mode circuits has been fabricated in 2-μm CMOS technology. Because the multiplier is based on the radix-4 signed-digit (SD) number system, 32 × 32-bit two's complement multiplication can be performed with only three stages of SD full adders (SDFA's) using a binary-tree addition scheme.
The chip contains about 23 600 transistors, and the effective multiplier size is about 3.2 × 5.2 mm², which is half that of the corresponding binary CMOS multiplier. The multiply time is less than 59 ns. The performance is comparable to that of the fastest binary multiplier reported.
I. INTRODUCTION 
It is well known that multiple-valued logic (MVL) is a very attractive approach for ULSI or wafer-scale integration (WSI) because it reduces both the interconnection complexity and the number of active devices [1]. Recently, the advantages of MVL have been confirmed in various applications such as memories, image processors, and arithmetic circuits [2]-[5]. However, very few LSI chips based on MVL have been fabricated. This paper describes new LSI-oriented multiple-valued CMOS current-mode circuits and the practical implementation of a 32 × 32-bit multiple-valued multiplier chip based on the radix-4 signed-digit (SD) number system [6].
In the SD number representation, carry propagation 
during addition and subtraction is always limited to one 
position to the left. This property of the number system is 
useful not only for addition but also for multiplication. 
Since multiple-valued coding is direct for the representa- 
tion of the signed digit, the arithmetic circuits can be 
implemented very compactly by the use of multiple-valued 
circuits [7]. In particular, the multiple-valued bidirectional 
current-mode circuits proposed here are quite suitable for 
the implementation of SD arithmetic because the linear 
summation including polarity can be performed by simple 
wiring [5]. This property enables the interconnection complexity to be greatly reduced and the resulting arithmetic LSI circuits to be very compact.
Manuscript received July 3, 1987; revised September 10, 1987. 
S. Kawahito, M. Kameyama, and T. Higuchi are with the Department 
of Electronic Engineering, Tohoku University, Aoba, Aramaki, Sendai 
980, Japan. 
H. Yamada is with the Semiconductor Research Center, Matsushita 
Electric Industrial Company, Ltd., Moriguchi, Osaka 570, Japan. 
IEEE Log Number 8718226. 
A 32 × 32-bit multiplier LSI with binary input and output has been designed using multiple-valued current-mode circuits and implemented in 2-μm CMOS technology. The multiplier, based on the radix-4 SD number system, is realized by a regular array structure using a three-stage binary-tree scheme [8], [9]. New hardware algorithms for partial-product generation and an SD-to-binary conversion technique have also been developed for a high-speed compact multiplier. It is confirmed that the multiple-valued multiplier based on the SD number system is superior overall to the fastest binary multiplier [10] in terms of speed, power dissipation, and chip area.
II. BIDIRECTIONAL CURRENT-MODE CIRCUITS
The most important concept of multiple-valued current-mode circuits is that of wired summation, as introduced by Dao et al. with multiple-valued integrated injection logic [11]. However, such "single-directional" current-mode circuits are not always suitable for implementing arithmetic circuits based on a sign-symmetrical number representation such as the SD number system. The bidirectional current-mode circuits proposed here are inherently suited to SD arithmetic and facilitate wired summation including polarity. Fig. 1 illustrates the principle of bidirectional wired summation. From Kirchhoff's current law, the current z is equal to the sum of the two currents x and y. The current z is applied to subsequent bidirectional current-mode circuits, where its polarity and current level are detected and arithmetic operations are performed using several basic circuits. Fig. 2 summarizes the available basic bidirectional current-mode circuits.
Fig. 3(a) shows a current source using a p-channel depletion-mode MOSFET. In the ideal case, the saturation value of the drain current $I_0$, used as a constant current, is written as

$$I_0 = K\,(W/L)\,V_T^2 \tag{1}$$

where $K$, $W$, $L$, and $V_T$ are, respectively, the transconductance parameter, the channel width, the channel length,
0018-9200/88/0200-0124$01.00 01988 IEEE 
Authorized licensed use limited to: TOHOKU UNIVERSITY. Downloaded on March 01,2010 at 23:07:01 EST from IEEE Xplore.  Restrictions apply. 
KAWAHITO et al.: 32 × 32-BIT MULTIPLIER USING CURRENT-MODE CIRCUITS 125
Fig. 1. Bidirectional wired summation (output z = x + y).
Fig. 2. Basic bidirectional current-mode circuits: current source, current mirrors (N-channel and P-channel types), threshold detector, and bidirectional current input circuit.
Fig. 3. Current sources and the threshold detector. (a) Current source. (b) Voltage-switched current source. (c) Threshold detector.
and the threshold voltage of the p-channel depletion-mode MOSFET. The unit current can be set to the specified value by dose control. This type of current source is quite insensitive to fluctuations of the supply voltage $V_{DD}$ and requires no bias source or connection other than $V_{DD}$.
A voltage-switched current source can easily be implemented using a p-channel enhancement-mode MOSFET together with the current source, as shown in Fig. 3(b). Using these current sources, a threshold detector can be constructed as shown in Fig. 3(c) [11], [12]. Its function, with currents expressed in units of the unit current, is given by

$$y = \begin{cases} 0, & \text{if } x < T \\ m, & \text{if } x > T \end{cases} \tag{2}$$

where $T I_0$ and $m I_0$ are the threshold and output currents, respectively, and $I_0$ is the unit current. This threshold detector is denoted by TD($T$, $m$).
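Because (2) expresses currents as multiples of the unit current $I_0$, the threshold detector behaves like a comparator with a scaled output. The minimal Python sketch below models TD($T$, $m$) at that behavioral level only (no device effects). The names `td` and `quantize_unsigned` are hypothetical, and the staircase built from several detectors is an illustration, not a circuit taken from this paper; the case $x = T$, which (2) leaves open, is treated here as below threshold.

```python
def td(x: float, T: float, m: int) -> int:
    """Behavioral model of the threshold detector TD(T, m) of (2).

    x is the input current in units of I_0; the result is the output current
    in the same units. x == T is not specified in the text; this sketch
    treats it as "below threshold".
    """
    return m if x > T else 0


def quantize_unsigned(x: float, levels: int = 3) -> int:
    """Hypothetical staircase: wired sum of TD(k - 0.5, 1) for k = 1..levels.

    Rounds a non-negative input current to an integer level, saturating at
    `levels`. Summing outputs models several detectors wired to one node.
    """
    return sum(td(x, k - 0.5, 1) for k in range(1, levels + 1))


print(td(0.4, 0.5, 1), td(0.6, 0.5, 1))   # 0 1
print(quantize_unsigned(1.7))             # 2
```

Summing the detector outputs is legitimate here only because, as in Fig. 1, currents wired to a common node add.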
In the bidirectional current-mode circuits, a current mirror serves three purposes: inverting the current direction, producing replicas of an input current, and scaling the input current. There are two types of current mirrors: NMOS and PMOS.
The polarity of a bidirectional current can be detected by the bidirectional current input circuit (BDCI) shown in Fig. 4(a). Let the current x, injected from $V_{DD}$ through the current source, be defined as positive, and the current y, flowing into ground through the output of the NMOS current mirror, be defined as negative. z is the current obtained from the wired summation of x and y. If x is greater than the absolute value of y ($|y|$), z is positive, while if x is less than $|y|$, z is negative. If $x = |y|$, z becomes zero. Fig. 4(b) and (c) show the I-V characteristics of x, y, and z for z > 0 and z < 0, respectively, when the switch SW is open and a voltage source is connected to point b. The I-V characteristic of the BDCI when the voltage source is connected to point a is also shown in Fig. 4(b) and (c). The operation of the BDCI is given by

$$\begin{aligned} I^+ &= 0, & I^- &= 0, & &\text{if } V = V_{DD}/2 \\ I^+ &= I, & I^- &= 0, & &\text{if } V > V_{DD}/2 \\ I^+ &= 0, & I^- &= I, & &\text{if } V < V_{DD}/2 \end{aligned} \tag{3}$$

When SW is closed and the voltage source is removed, the crossover points A and B become the operating points for z > 0 and z < 0, respectively. The magnitude of z is then equal to $I^+$ when z is positive and to $I^-$ when z is negative. Replicas of $I^+$ and $I^-$ are obtained at the output of the BDCI through current mirrors. Consequently, a bidirectional current can be decomposed into two single-directional currents by using the BDCI.

Fig. 4. Bidirectional current input circuits. (a) Schematic. (b) I-V characteristics for z > 0. (c) I-V characteristics for z < 0.
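The combination of wired summation and the BDCI can be summarized in a few lines. The Python sketch below is an idealized behavioral model, assuming currents in units of $I_0$ and ignoring mirror mismatch and finite output resistance; `wired_sum` and `bdci` are hypothetical names, and the node voltage V of the BDCI operation is abstracted away into the sign of z.

```python
def wired_sum(*currents: float) -> float:
    """Kirchhoff's current law at one node: the output is the signed sum (Fig. 1)."""
    return sum(currents)


def bdci(z: float) -> tuple[float, float]:
    """Idealized BDCI: split a bidirectional current z into (I+, I-).

    Both outputs are non-negative single-directional currents, and
    z == I+ - I-.
    """
    if z > 0:
        return z, 0.0
    if z < 0:
        return 0.0, -z
    return 0.0, 0.0


# x = 3 units sourced from V_DD, y = -5 units sunk by the NMOS mirror
z = wired_sum(3.0, -5.0)
print(z, bdci(z))   # -2.0 (0.0, 2.0)
```

Downstream circuits can then treat the two outputs with ordinary single-directional current mirrors and threshold detectors.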
III. SIGNED-DIGIT ARITHMETIC CIRCUITS
Radix-4 SD arithmetic is used in the multiplier because of its easy compatibility with the binary system and the compactness of the implemented chip. A radix-4 SD number is represented by the following symmetrical digit set of seven values:

$$L = \{-3, -2, -1, 0, 1, 2, 3\}. \tag{4}$$

Any integer $X \in [-N, N]$, where $N = 4^n - 1$, can be coded as a sequence of radix-4 signed digits $x_i$ according to

$$X = (x_{n-1} \cdots x_i \cdots x_0) = \sum_{i=0}^{n-1} x_i 4^i. \tag{5}$$
Fig. 6. Signed-digit full adder (SDFA).
In general, the number X is not uniquely coded in the SD number representation except for X = 0. This redundancy in the number representation allows fast parallel operation.

The addition of two numbers $X = (x_{n-1} \cdots x_i \cdots x_0)$ and $Y = (y_{n-1} \cdots y_i \cdots y_0)$ is performed by the following three successive steps for each digit:

$$z_i = x_i + y_i \tag{6}$$
$$4c_i + w_i = z_i \tag{7}$$
$$s_i = w_i + c_{i-1} \tag{8}$$

where $z_i$ is the linear sum of $x_i$ and $y_i$, $w_i$ is an intermediate sum, $c_i$ is a carry, and

$$z_i \in \{-6, \ldots, 0, \ldots, 6\}, \quad w_i \in \{-2, -1, 0, 1, 2\}, \quad c_i \in \{-1, 0, 1\}. \tag{9}$$

It follows that $s_i \in L$. The final sum digits $s_i$ can be obtained almost in parallel, and the addition speed is independent of the word length.
Fig. 5 shows the parallel SD adder based on the bidirectional current-mode circuits. The addition steps (6) and (8) can be performed by bidirectional wired summation without active devices. Equation (7) is performed by the signed-digit full adder (SDFA) shown in Fig. 5. From (7) and (9), the intermediate sum $w_i$ and the carry $c_i$ are obtained as follows:

$$\begin{aligned} w_i &= z_i, & c_i &= 0, & &\text{if } -1 \le z_i \le 1 \\ w_i &= z_i - 4, & c_i &= 1, & &\text{if } z_i \ge 2 \\ w_i &= z_i + 4, & c_i &= -1, & &\text{if } z_i \le -2 \end{aligned} \tag{10}$$
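Steps (6)-(8) with the selection rule (10) can be checked in software. The following Python sketch is a behavioral model of the arithmetic, not of the current-mode circuit: digits are plain integers stored least significant first, and the names `sdfa`, `sd_add`, and `sd_value` are hypothetical. It makes the carry-limited property visible, since each output digit depends only on its own position and its right-hand neighbour.

```python
DIGITS = range(-3, 4)  # the digit set L of (4)


def sd_value(digits: list[int]) -> int:
    """Value of a radix-4 SD number stored LSD first, as in (5)."""
    return sum(d * 4**i for i, d in enumerate(digits))


def sdfa(z: int) -> tuple[int, int]:
    """Selection rule (10): return (c_i, w_i) with 4*c_i + w_i == z_i."""
    if z >= 2:
        return 1, z - 4
    if z <= -2:
        return -1, z + 4
    return 0, z


def sd_add(x: list[int], y: list[int]) -> list[int]:
    """Add two n-digit radix-4 SD numbers (LSD first); the result has n + 1 digits."""
    assert len(x) == len(y) > 0
    cw = [sdfa(xi + yi) for xi, yi in zip(x, y)]   # (6), then (7) at every digit
    carries = [c for c, _ in cw]
    sums = [w for _, w in cw]
    # (8): s_i = w_i + c_{i-1} with c_{-1} = 0; the last carry becomes the top digit.
    s = [w + c_prev for w, c_prev in zip(sums, [0] + carries[:-1])]
    return s + [carries[-1]]


x, y = [3, 3, -1], [2, -3, 3]        # values -1 and 38
s = sd_add(x, y)
print(s, sd_value(s))                # [1, 1, -2, 1] 37
```

Because no carry travels more than one position, every digit position can be evaluated at once; the list comprehension mirrors that independence, which is what makes the hardware adder's delay independent of word length.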
From (10), the SDFA can be constructed from the basic circuits of Fig. 2 and bidirectional wired summation, as shown in Fig. 6. The SDFA is composed of 26 transistors. To improve the speed and dc characteristics, it is useful to reduce the number of cascaded current mirrors in the SDFA. One solution is to retain the intermediate sum $w_i$ in inverted form.

Fig. 7. Inverted quantizer with dynamic range from -2 to 2 (the figure labels show a BDCI and threshold detectors TD(1, 1) and TD(3, 1)).
Fig. 8. Photomicrograph of the SDFA with the inverted quantizer.
In general, signal levels must be restored before they are completely degraded. While the carry $c_i$ is quantized by the threshold detector, the intermediate sum $w_i$ is not quantized by the circuit of Fig. 6. Thus an inverted quantizer is essential for the output $w_i$; at the same time, the inversion operation restores the specified polarity. Fig. 7 shows the inverted quantizer, which has a dynamic range from -2 to 2 and is also composed of 26 transistors.
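One plausible behavioral reading of the inverted quantizer is sketched below in Python. The internal structure is an assumption inferred from the labels that survive in Fig. 7 (a BDCI, doubled currents, and detectors TD(1, 1) and TD(3, 1)); the text itself does not spell it out. Under that assumption, doubling each polarity component and thresholding it at 1 and 3 is equivalent to thresholds of 0.5 and 1.5 unit currents, which restores a noisy level to the nearest integer in {-2, ..., 2} and flips its sign. The function names are hypothetical.

```python
def td(x: float, T: float, m: int) -> int:
    """Threshold detector TD(T, m) of (2); currents in units of I_0."""
    return m if x > T else 0


def inverted_quantizer(x: float) -> int:
    """Behavioral sketch of an inverted quantizer with output range -2..2.

    ASSUMED structure (inferred from the labels of Fig. 7, not stated in the
    text): a BDCI splits x into non-negative parts x_pos and x_neg, each part
    is doubled by a current mirror and drives TD(1, 1) and TD(3, 1), and the
    two polarity branches are recombined with the sign inverted.
    """
    x_pos, x_neg = (x, 0.0) if x > 0 else (0.0, -x)
    # Doubling, then thresholds at 1 and 3, maps each part to 0, 1, or 2
    # (equivalent thresholds: 0.5 and 1.5 input units).
    q_pos = td(2 * x_pos, 1, 1) + td(2 * x_pos, 3, 1)
    q_neg = td(2 * x_neg, 1, 1) + td(2 * x_neg, 3, 1)
    return q_neg - q_pos


print([inverted_quantizer(v) for v in (-1.9, -0.4, 0.6, 1.4)])   # [2, 0, -1, -1]
```

If the actual circuit uses different scaling or thresholds, only the constants in the two `td` lines would change; the restore-and-invert behavior is the point of the sketch.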
Fig. 8 shows a photomicrograph of an implemented SDFA with the inverted quantizer as a test circuit. The effective size is about 140 × 205 μm². The characteristics of the implemented circuits are shown in Fig. 9, where the unit current is approximately 31 μA with transistor sizes of L = 2.8 μm and W = 9 μm and a threshold voltage of 1.5 V. Fig. 9(a) shows the current transfer curve for the carry of the SDFA corresponding to (7). By using the inverted quantizer, whose characteristic is shown in Fig. 9(b), the current transfer curve for the intermediate sum $w_i$ of the SDFA is quantized as shown in Fig. 9(c). The worst-case delay time of the SDFA with the inverted quantizer is measured to be about 11 ns, while the value simulated using SPICE2 is 8.9 ns.
The minimum noise margin is about 10 μA (32 percent of the unit current) from Fig. 9(c). If the statistical variation of the current-source output from its desired value exceeds the noise margin, logical errors will occur. The variation of the current-source output current ($\Delta I$) is mainly caused by the variation of the transistor threshold voltage ($\Delta V_T$). Since $I_0 \propto V_T^2$ from (1), the variation is

$$\Delta I = \frac{\Delta V_T}{V_T}\left(2 + \frac{\Delta V_T}{V_T}\right) I_0.$$

For $\Delta V_T = 50$ mV, $V_T = 1.5$ V, and $I_0 = 31$ μA, $\Delta I$ is about 2 μA. This variation is not large enough to cause logical errors.

Fig. 9. Current-transfer curves of the SDFA and the inverted quantizer. (a) Carry of the SDFA: horizontal axis, input z (41 μA/div); vertical axis, carry output c (41 μA/div). (b) Inverted quantizer: horizontal axis, input x (24.4 μA/div); vertical axis, output y (24.4 μA/div). (c) Intermediate sum of the SDFA: horizontal axis, input z (41 μA/div); vertical axis, quantized intermediate sum output w (41 μA/div).
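The expression for $\Delta I$ follows from (1): with $I_0 \propto V_T^2$, $(V_T + \Delta V_T)^2 - V_T^2 = V_T^2 (\Delta V_T / V_T)(2 + \Delta V_T / V_T)$. The short Python check below reproduces the numerical estimate and compares it with the noise margin of about 10 μA read from Fig. 9(c); the function name `current_variation` is hypothetical.

```python
def current_variation(delta_vt: float, vt: float, i0: float) -> float:
    """Shift of the unit current when V_T changes by delta_vt, using I_0 ~ V_T**2 from (1)."""
    r = delta_vt / vt
    return r * (2 + r) * i0


NOISE_MARGIN = 10e-6                      # about 10 uA, from Fig. 9(c)
d_i = current_variation(50e-3, 1.5, 31e-6)
print(f"delta I = {d_i * 1e6:.2f} uA")    # delta I = 2.10 uA
print(d_i < NOISE_MARGIN)                 # True
```

The margin is roughly a factor of five, which is the quantitative basis for the statement that a 50-mV threshold spread does not cause logical errors.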
Shrinking the minimum process feature size improves speed. However, the noise margin then decreases because of the channel-length modulation effect in the transistors used for the current sources and the current mirrors. One solution to this problem is to use a special device with a reduced channel-length modulation effect; for example, a DSA MOS device will be useful [13].
IV. MULTIPLICATION ALGORITHM 
For compatibility with the binary system, the signed- 
binary number (two's complement) representation is used 
at the input and the output of the multiplier, and the 
radix-4 SD number representation is used inside the multi- 
plier. New algorithms for partial-product generation, an 
addition scheme, and SD-to-binary conversion are used to 
perform high-speed multiplication. 
A. Generation of Partial Products
One of the most important techniques for the realization 
of a high-speed compact multiplier is to reduce the number 
of partial products. To reduce the number of partial prod- 
ucts, the modified Booth's algorithm [14] is often used in 
binary multipliers. By extending the modified Booth's algorithm, a recoding algorithm suitable for radix-4 SD partial-product generation has been developed here.
Let Y be the multiplier, an n-bit two's complement number, and assume that n is a multiple of 8. The multiplier Y may be written as

$$Y = -y_{n-1} 2^{n-1} + \sum_{j=0}^{n-2} y_j 2^j \tag{11}$$

where $y_j \in \{0, 1\}$. Equation (11) can also be expressed as follows:

$$Y = (y_{n-5} + y_{n-4} + 2y_{n-3} + 4y_{n-2} - 8y_{n-1})\,2^{n-4} + \cdots + (y_{-1} + y_0 + 2y_1 + 4y_2 - 8y_3)\,2^0 \tag{12}$$
$$\;\;= \sum_{j=0}^{n/4-1} Q_j 16^j \tag{13}$$

where $y_{-1} = 0$ and $Q_j \in \{-8, \ldots, 0, \ldots, 8\}$ is defined as

$$Q_j = y_{4j-1} + y_{4j} + 2y_{4j+1} + 4y_{4j+2} - 8y_{4j+3}. \tag{14}$$

Consequently, the multiplier Y can be divided into n/4 groups of 5 bits each, and each pair of contiguous groups has one bit in common. Equation (13) can be rewritten as the following equation by considering two parts, one for odd j and one for even j:

$$Y = \sum_{k=0}^{n/8-1} Q_{2k} 16^{2k} + \sum_{k=0}^{n/8-1} Q_{2k+1} 16^{2k+1}. \tag{15}$$

For the multiplicand X, the product P of X and Y can be written as

$$P = \sum_{k=0}^{n/8-1} Q_{2k} X\, 16^{2k} + \sum_{k=0}^{n/8-1} Q_{2k+1} X\, 16^{2k+1}. \tag{16}$$
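Equation (14) is easy to check exhaustively for a small word length. The Python sketch below, with the hypothetical name `recode_multiplier`, extracts the overlapping 5-bit groups of an n-bit two's complement multiplier and returns the digits $Q_j$. It accepts any n that is a multiple of 4, although the paper assumes a multiple of 8. For the multiplier Y = 42 of the Fig. 12 example it gives $Q_0 = -6$ and $Q_1 = 3$, since $-6 + 3 \cdot 16 = 42$.

```python
def recode_multiplier(y: int, n: int) -> list[int]:
    """Radix-16 recoding of (14), returning Q_0 .. Q_{n/4-1}.

    y is the multiplier as a Python int in the n-bit two's complement range.
    """
    assert n % 4 == 0 and -(1 << (n - 1)) <= y < (1 << (n - 1))

    def bit(i: int) -> int:
        # y_i of (11); Python's >> on negative ints sign-extends, so this
        # yields the two's complement bits. y_{-1} is defined as 0.
        return 0 if i < 0 else (y >> i) & 1

    return [bit(4*j - 1) + bit(4*j) + 2*bit(4*j + 1) + 4*bit(4*j + 2) - 8*bit(4*j + 3)
            for j in range(n // 4)]


q = recode_multiplier(42, 8)              # Y = (00101010) from Fig. 12
print(q, sum(qj * 16**j for j, qj in enumerate(q)))   # [-6, 3] 42
```

Each group shares its lowest bit with the top bit of the group below it, exactly as the text states, which is why the most negative multiplier still recodes with $|Q_j| \le 8$.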
TABLE I
Q_j | -8 -7 -6 -5 -4 -3 -2 -1  0  1  2  3  4  5  6  7  8
U_j |  0  1  2 -1  0  1 -2 -1  0  1  2 -1  0  1 -2 -1  0
V_j | -8 -8 -8 -4 -4 -4  0  0  0  0  0  4  4  4  8  8  8

Fig. 10. Partial-product generator (PPG). (a) Schematic. (b) Binary-to-SD converter.
To reduce the number of adder delays, the following two expressions for X are used:

$$X = -x_{n-1} 2^{n-1} + \sum_{i=0}^{n-2} x_i 2^i \quad \text{for the product } Q_{2k}X \tag{17}$$
$$X = \bar{x}_{n-1} 2^{n-1} - \sum_{i=0}^{n-2} \bar{x}_i 2^i - 1 \quad \text{for the product } Q_{2k+1}X \tag{18}$$

where $\bar{x}_i$ denotes the complement of $x_i$. Fig. 10(a) shows the partial-product generator (PPG) using the recoding algorithm. $Q_j$ can be decomposed into appropriately selected $U_j$ and $V_j$ as

$$Q_j = U_j + V_j \tag{19}$$

where $U_j \in \{-2, -1, 0, 1, 2\}$ and $V_j \in \{-8, -4, 0, 4, 8\}$ are given in Table I. $Q_j X$ can then be written as

$$Q_j X = U_j X + V_j X = A_j + B_j \tag{20}$$

where $A_j = U_j X \in \{-2X, -X, 0, X, 2X\}$ and $B_j = V_j X \in \{-8X, -4X, 0, 4X, 8X\}$. Since the multiplicand X is represented as a two's complement binary number, $A_j$ and $B_j$ can be generated by shift and complement operations. The values 8X, 4X, and 2X correspond to 3-, 2-, and 1-bit left shifts of X, respectively.
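The arithmetic performed by the recoder and data selectors can be sketched as follows. This is a behavioral Python model under stated simplifications: X is a Python integer rather than a bit vector, the distinction between (17) and (18) and the bit-level mappings of Tables II-V are not modeled, and the names `TABLE_I`, `shift_complement`, and `partial_product` are hypothetical. The point it illustrates is that forming $Q_j X$ needs only shifts, complements, and a deferred increment bit.

```python
# Table I: Q_j -> (U_j, V_j) with Q_j = U_j + V_j, as in (19)
TABLE_I = dict(zip(
    range(-8, 9),
    zip([0, 1, 2, -1, 0, 1, -2, -1, 0, 1, 2, -1, 0, 1, -2, -1, 0],
        [-8, -8, -8, -4, -4, -4, 0, 0, 0, 0, 0, 4, 4, 4, 8, 8, 8]),
))


def shift_complement(x: int, factor: int) -> tuple[int, int]:
    """Form factor * x by shift and complement only, for factor in {0, +-1, +-2, +-4, +-8}.

    Returns (word, increment) with word + increment == factor * x. For a
    negative factor the word is the one's complement of the shifted x and the
    missing "+1 at the LSB" is returned separately, like an increment signal.
    """
    if factor == 0:
        return 0, 0
    shifted = x << (abs(factor).bit_length() - 1)    # 1, 2, 4, 8 -> shift by 0..3
    if factor > 0:
        return shifted, 0
    return ~shifted, 1                               # ~v == -v - 1 in Python


def partial_product(q: int, x: int) -> tuple[int, int, int, int]:
    """Return (A_j, d_j, B_j, e_j) with A_j + d_j + B_j + e_j == q * x, as in (20)."""
    u, v = TABLE_I[q]
    a, d = shift_complement(x, u)
    b, e = shift_complement(x, v)
    return a, d, b, e


print(partial_product(-6, -5))   # (-10, 0, 39, 1): -10 + 0 + 39 + 1 == 30 == (-6) * (-5)
```

Deferring the +1 of each negation keeps the selectors free of carry chains; the increments are absorbed later by the SD adders, whose input range leaves room for them.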
TABLE II
MAPPING FOR EACH BIT OF $A_{2k}$: $a_{n,2k}$ (FOR MSB), $a_{i,2k}$ (FOR $i = 0, \ldots, n-1$), AND $d_{2k}$ (INCREMENT SIGNAL) FROM $U_{2k}$

TABLE III
MAPPING FOR EACH BIT OF $A_{2k+1}$: $a_{n,2k+1}$ (FOR MSB), $a_{i,2k+1}$ (FOR $i = 0, \ldots, n-1$), AND $d_{2k+1}$ (INCREMENT SIGNAL) FROM $U_{2k+1}$

TABLE IV
MAPPING FOR EACH BIT OF $B_{2k}$: $b_{n+2,2k}$ (FOR MSB), $b_{i,2k}$ (FOR $i = 0, \ldots, n+1$), AND $e_{2k}$ (INCREMENT SIGNAL) FROM $V_{2k}$

TABLE V
MAPPING FOR EACH BIT OF $B_{2k+1}$: $b_{n+2,2k+1}$ (FOR MSB), $b_{i,2k+1}$ (FOR $i = 0, \ldots, n+1$), AND $e_{2k+1}$ (INCREMENT SIGNAL) FROM $V_{2k+1}$
The value -X can be obtained by first complementing the number X and then adding 1 to its LSB. The shift and complement control signals are generated by the recoder, which can easily be implemented with conventional CMOS gates. The shift and complement operations on X are then performed by data selectors controlled by these signals, as shown in Tables II-V. The partial product $p_{i,j}$ corresponding to (20),

$$Q_j X = \sum_{i=0}^{n/2} p_{i,j} 4^i,$$

can be generated by the following equations in the radix-4 SD number representation:

$$z_{i,j} = 2b_{2i+1,j} + b_{2i,j} + 2a_{2i+1,j} + a_{2i,j} \tag{21}$$
$$4c_{i,j} + w_{i,j} = z_{i,j} \tag{22}$$
$$p_{i,j} = w_{i,j} + c_{i-1,j}. \tag{23}$$

If $j = 2k$, $p_{i,j} \in \{-1, 0, 1, 2, 3\}$, because $z_{i,j} \in \{0, \ldots, 6\}$, $c_{i,j} \in \{0, 1\}$, and $w_{i,j} \in \{-1, 0, 1, 2\}$. If $j = 2k+1$, $p_{i,j} \in \{-3, -2, -1, 0, 1\}$, because $z_{i,j} \in \{-6, \ldots, 0\}$, $c_{i,j} \in \{-1, 0\}$, and $w_{i,j} \in \{-2, -1, 0, 1\}$. For the partial-product generation of the MSD, the following expression is used instead of (21):

$$z_{n/2,j} = -4b_{n+2,j} + 2b_{n+1,j} + b_{n,j} - a_{n,j} \tag{24}$$

where $z_{n/2,2k} \in \{-5, \ldots, 3\}$ and $z_{n/2,2k+1} \in \{-3, \ldots, 5\}$.
The operations of (21) and (22) are performed by the binary-to-SD converter shown in Fig. 10(b), which can be implemented using binary CMOS gates, pass transistors, and voltage-switched current sources. Equation (23) is performed by bidirectional wired summation.
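For an even group (j = 2k), the conversion (21)-(23) can be sketched in Python as below. The simplifications are ours: A and B are treated as non-negative bit fields, so the MSD rule (24) and the increment signals are left out, and the function name `binary_to_sd` is hypothetical. Because $z_{i,j} \in \{0, \ldots, 6\}$ with $c_{i,j} \in \{0, 1\}$ and $w_{i,j} \in \{-1, 0, 1, 2\}$, the split in (22) is forced: $c_{i,j} = 1$ exactly when $z_{i,j} \ge 3$.

```python
def binary_to_sd(a: int, b: int, num_digits: int) -> list[int]:
    """Sketch of (21)-(23) for an even group j, with A and B as non-negative bit fields.

    a and b must fit in 2 * num_digits bits. Returns p_0 .. p_{num_digits}
    (LSD first); the final carry is kept as an extra top digit instead of
    applying the MSD rule (24).
    """
    def bit(v: int, k: int) -> int:
        return (v >> k) & 1

    digits, c_prev = [], 0
    for i in range(num_digits):
        z = 2 * bit(b, 2*i + 1) + bit(b, 2*i) + 2 * bit(a, 2*i + 1) + bit(a, 2*i)  # (21)
        c = 1 if z >= 3 else 0          # (22): forced by w in {-1, 0, 1, 2}
        w = z - 4 * c
        digits.append(w + c_prev)       # (23): p_i = w_i + c_{i-1}
        c_prev = c
    digits.append(c_prev)
    return digits


print(binary_to_sd(6, 11, 2))   # [1, 0, 1], i.e. 1 + 0*4 + 1*16 = 17 = 6 + 11
```

Each digit uses only its own bit pairs and the carry of the position immediately to its right, so the loop carries no long chain; in hardware all positions are converted in parallel.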
B. Addition of Partial Products 
Based on the above partial-product generation, n/4 operands expressed as radix-4 (n/2 + 1)-digit numbers are produced, where each operand is denoted by $P_j$ (for $j = 0, \ldots, n/4 - 1$). The final product is obtained by the multiple-operand addition of the $P_j$'s as follows:

$$P = \sum_{j=0}^{n/4-1} P_j 16^j. \tag{25}$$
The multiple-operand addition can be performed using 
two-input parallel SD adders in a binary-tree structure. 
For example, a 32 x 32-bit multiplier which contains eight 
operands composed of 17-digit partial products can be 
constructed by three-stage adders based on a binary-tree 
structure as shown in Fig. 11. 
The linear sum $f_{i,k}$ at the input of the first-level SD adders is given by

$$f_{i,k} = p_{i,2k} + p_{i,2k+1} \tag{26}$$

where $p_{i,2k}$ ($\in \{-1, 0, 1, 2, 3\}$) and $p_{i,2k+1}$ ($\in \{-3, -2, -1, 0, 1\}$) are the partial products of the (2k)th and (2k+1)th PPG groups, respectively, so that $f_{i,k}$ is limited to values between -4 and 4. Since the input dynamic range of the SDFA is from -6 to 6, the increment signals $d_j$ and $e_j$ can be added together with the partial products. Without this method, the linear sum at the SDFA input would exceed its dynamic range when $d_j$ and $e_j$ are added.

Fig. 11. Binary-tree scheme for the 32 × 32-bit multiplication (first-, second-, and third-level adders built from SDFA's and wired summation, producing the final product).
Fig. 12 shows an example of 8 × 4-bit multiplication using the proposed algorithm, where the multiplicand X = (1011) and the multiplier Y = (00101010) in two's complement form. The final product $P = -4^4 + 3 \times 4^2 - 2 \times 4^0 = -210$ of $X = -2^3 + 2^1 + 2^0 = -5$ and $Y = 2^5 + 2^3 + 2^1 = 42$ is obtained with a single-level SD adder.
In general, addition in an m-operand adder based on the binary-tree scheme is performed with l stages of SDFA's, where

$$l = \lceil \log_2 m \rceil \tag{27}$$

and $\lceil x \rceil$ denotes the smallest integer such that $\lceil x \rceil \ge x$. In an n × n-bit multiplier, the recoding algorithm above yields m = n/4 partial products expressed as radix-4 SD numbers. Therefore, the number of adder stages l in an n × n-bit multiplier is given by

$$l = \lceil \log_2 (n/4) \rceil = \lceil \log_2 n - 2 \rceil. \tag{28}$$
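Equations (27) and (28) can be evaluated with integer arithmetic only. In the Python sketch below (hypothetical names `adder_stages` and `multiplier_stages`), `(m - 1).bit_length()` equals $\lceil \log_2 m \rceil$ exactly for m ≥ 1, which avoids floating-point logarithms. It reproduces the single level used for the 8 × 4-bit example of Fig. 12 and the three levels of the 32 × 32-bit chip.

```python
def adder_stages(m: int) -> int:
    """(27): l = ceil(log2 m) levels of two-input SD adders for m operands (m >= 1)."""
    assert m >= 1
    return (m - 1).bit_length()


def multiplier_stages(n: int) -> int:
    """(28): an n x n-bit multiplier produces m = n/4 SD partial products."""
    return adder_stages(n // 4)


for n in (8, 16, 32, 64):
    print(n, multiplier_stages(n))   # prints "8 1", "16 2", "32 3", "64 4" on separate lines
```

The stage count grows by one each time the word length doubles, which is the logarithmic behavior compared in Fig. 13.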
Fig. 13 shows the relation between the number of adder 
delays and the word length for various multipliers. In Fig. 
13, the number of adder delays in a binary multiplier is 
defined as that number of adders necessary to reduce all 
the partial products into two operands. The number of 
adder delays for the SD multiplier is half of that of the 
fastest binary multiplier using a Wallace tree and modified 
Booth's algorithm for a word length of 32 bits [14], [15]. 
Also a tenfold reduction of the number of adder delays 
can be achieved compared with a 32 X 32-bit ordinary 
array multiplier. 
C. SD-to-Binary Conversion 
The final product P represented by the radix-4 SD number can be written as

$$P = (p_{n-1} \cdots p_i \cdots p_0) = \sum_{i=0}^{n-1} p_i 4^i \tag{29}$$

where $p_i \in \{-3, -2, -1, 0, 1, 2, 3\}$ for $i = 0, \ldots, n-2$ and $p_{n-1} \in \{-1, 0, 1\}$. The radix-4 signed digit $p_i$ can be decomposed into two symmetrical components ($p_i^+$, $p_i^-$) as follows:

$$\begin{aligned} p_i^+ &= p_i, & p_i^- &= 0, & &\text{if } p_i \ge 0 \\ p_i^+ &= 0, & p_i^- &= -p_i, & &\text{if } p_i < 0 \end{aligned} \tag{30}$$

where $p_i^+, p_i^- \in \{0, 1, 2, 3\}$ and $p_i = p_i^+ - p_i^-$. Since both $P^+ = (p_{n-1}^+ \cdots p_0^+)$ and $P^- = (p_{n-1}^- \cdots p_0^-)$ are ordinary radix-4 weighted-number representations, these values can easily be decoded to two's complement binary numbers. Therefore, the conversion can be performed by subtraction using binary full adders. By extending this conversion algorithm, a high-speed conversion technique using a radix-4 carry lookahead adder has been developed [8]. Consequently, the conversion can be completed with the delay of the decoder plus ten CMOS gates.

Fig. 12. Example of the 8 × 4-bit multiplication (C: complement; NC: no complement; S(i): i-bit shift; PP: partial product; FP: final product; $(SD)_r$: radix-r SD number; $(LS)_r$: linear sum). The final product digits are $s_6, \ldots, s_0 = 0, 0, -1, 0, 3, 0, -2$.

Fig. 13. Comparison of the number of adder delays in various multipliers versus word length (bit) (A: proposed SD multiplier; B: binary multiplier with Wallace tree (WT) and modified Booth's algorithm (MBA); C: binary multiplier with WT; D: binary multiplier with carry-save adder (CSA) and MBA; E: binary multiplier with CSA).
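The decomposition of each digit into $p_i^+$ and $p_i^-$, followed by the subtraction $P^+ - P^-$, can be sketched in Python as follows. This models only the basic algorithm; the radix-4 carry-lookahead technique of [8] is not reproduced, and the function name `sd_to_twos_complement` is hypothetical. Applied to the final product of the Fig. 12 example, $-4^4 + 3 \times 4^2 - 2 \times 4^0$, it yields the 12-bit two's complement pattern of -210.

```python
def sd_to_twos_complement(digits: list[int], width: int) -> int:
    """Convert radix-4 SD digits p_0 .. p_{n-1} (LSD first) to a width-bit pattern.

    Splits each digit into p_i^+ and p_i^-, reads P+ and P- as ordinary
    radix-4 numbers, then subtracts as P+ + ~P- + 1 (what a row of binary full
    adders with carry-in 1 computes), keeping the low `width` bits.
    """
    mask = (1 << width) - 1
    p_plus = sum(max(p, 0) * 4**i for i, p in enumerate(digits))
    p_minus = sum(max(-p, 0) * 4**i for i, p in enumerate(digits))
    return (p_plus + (~p_minus & mask) + 1) & mask


fp = [-2, 0, 3, 0, -1, 0, 0]                # p_0 .. p_6: -2 + 3*4**2 - 4**4
bits = sd_to_twos_complement(fp, 12)
print(format(bits, "012b"))                  # 111100101110, i.e. -210 in 12 bits
```

Since $P^+$ and $P^-$ contain only non-negative digits, each radix-4 digit maps directly onto two binary bits, so the decoding step itself needs no arithmetic.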
V. IMPLEMENTATION OF THE SD MULTIPLIER 
Fig. 14 shows a block diagram of the 32x32-bit SD 
multiplier which is composed of PPGs, parallel SD adders 
(PSDA's), a recoder, decoders, and SD-to-binary con- 
verters. The multiplication can be performed by a three- 
stage binary-tree addition scheme. The chip also contains 
input ( X ,  Y )  registers, an output ( P )  register, and a prod- 
uct selector. 
Fig. 15 shows a photomicrograph of the SD multiplier. The structure is very regular, and the interconnections between the modules are very simple. The chip size is 4.92 × 7.00 mm², and the effective multiplier size is 3.16 × 5.23 mm², which is about half that of the fastest conventional binary multiplier [10].
To measure the multiply time, multiplication-control gates are inserted between the Y register and the recoder. The multiplication starts at the rising edge of the multiplication-enable signal ME, and the multiplication results are latched in the P register at the falling edge of ME. Thus the minimum pulse width of ME at which the multiplication is still performed correctly is the effective multiply time. The maximum delay time $t_M$ of the multiplier is expressed as

$$t_M = t_p + 3t_a + 2t_q + t_d + t_c \tag{31}$$

where $t_p$, $t_a$, $t_q$, $t_d$, and $t_c$ are the maximum delay times of the partial-product generator, the SDFA, the inverted quantizer, the decoder, and the SD-to-binary converter, respectively. Substituting these delay times, obtained from SPICE simulation using the 2-μm device parameters, into (31) gives an estimated multiply time of 42 ns. Fig. 16 shows waveforms of the ME input and the product output (MSB), where the multiplicand X = 00000100 and the multiplier Y = 80000000 in hexadecimal representation. From Fig. 16, the multiply time is measured to be about 59 ns.

Fig. 14. Block diagram of the 32 × 32-bit multiplier (labels include the CK, ME, X/Y, RESET, and CKX inputs, the product selector, and the D0-D31 and OE outputs).

TABLE VI
COMPARISON OF TWO IMPLEMENTED 32 × 32-BIT MULTIPLIERS
Item | The multiple-valued multiplier | The binary multiplier
Multiply time (ns) | 59 | 56
Number of interconnections | 200 | 1,500
Number of transistors | 23,600 (current-mode: 7,200) | 45,000
Effective multiplier size (mm²) | 5.2 × 3.2 | 5.3 × 5.7
Power dissipation (W) | |
Technology | |
Table VI compares the measured performance of the implemented SD multiplier with that of the fastest binary multiplier, presented at ISSCC '86 [10]. The multiply time is comparable to that of the fastest binary multiplier because, although the delay time of the SDFA module is about twice that of a binary full adder, the number of cascaded adder stages is halved.
The number of transistors is approximately 52 percent of that of the binary multiplier. However, the effective area is determined not only by the number of transistors but also by the transistor feature sizes and the interconnections. The transistors used in the current-mode circuits are relatively large compared with those used in CMOS gates, while the number of interconnections among the modules (PPG's, SDFA's, and the decoders) is approximately 13 percent of that of the fastest binary multiplier using the Wallace tree. The regularity of the layout also contributes greatly to the reduction of interconnection area in the multiple-valued multiplier. As a result, a 46-percent reduction of the effective area is achieved compared with the binary multiplier.
The multiple-valued multiplier is superior to the conventional one with regard to power dissipation because of the reduced number of active devices. About 7200 of the transistors are used in current-mode circuits, and the others are used in binary CMOS gates.
Fig. 15. Photomicrograph of the multiplier chip. 
VI. CONCLUSION 
Fig. 16. Waveforms of the ME input and the product output (MSB).
This paper has presented multiple-valued CMOS bidirectional current-mode circuits and their application to a high-speed compact multiplier. The implemented 32 × 32-bit multiplier based on the radix-4 signed-digit number system is superior overall to the fastest CMOS binary multiplier reported.

In submicrometer technology, however, it is difficult to control the saturation current of MOS devices. Therefore, the development of devices better suited to multiple-valued current-mode circuits remains a future problem.
The concept of the bidirectional current-mode circuits is 
potentially effective not only for the radix-4 SD number 
system but also for other-radix SD number systems and 
the symmetrical residue number system. 
ACKNOWLEDGMENT 
The authors wish to thank Dr. S. Horiuchi and Dr. T.
Ishihara of Matsushita Electric Industrial Company, Ltd. 
for many helpful comments. 
REFERENCES 
[1] K. C. Smith, "The prospects for multivalued logic: A technology and applications view," IEEE Trans. Computers, vol. C-30, pp. 619-634, Sept. 1981.
[2] D. A. Rich et al., "A four-state ROM using multilevel process technology," IEEE J. Solid-State Circuits, vol. SC-19, no. 2, pp. 174-179, 1984.
[3] M. Horiguchi et al., "An experimental large capacity semiconductor file memory using 16-level/cell storage," in Proc. Symp. VLSI Circ., May 1987, pp. 4?-50.
[4] M. Kameyama et al., "Design and implementation of quaternary NMOS integrated circuits for pipelined image processing," IEEE J. Solid-State Circuits, vol. SC-22, no. 1, pp. 20-27, Feb. 1987.
[5] S. Kawahito, M. Kameyama, and T. Higuchi, "VLSI-oriented bi-directional current-mode arithmetic circuits based on the radix-4 signed-digit number system," in Proc. 1986 Int. Symp. Multiple-Valued Logic, May 1986, pp. 70-77.
[6] A. Avizienis, "Signed-digit number representations for fast parallel arithmetic," IRE Trans. Electron. Comput., vol. EC-10, pp. 389-400, Sept. 1961.
[7] M. Kameyama and T. Higuchi, "Design of radix-4 signed-digit arithmetic circuits for digital filtering," in Proc. 1980 Int. Symp. Multiple-Valued Logic, June 1980, pp. 272-277.
[8] S. Kawahito et al., "A high-speed compact multiplier based on multiple-valued bi-directional current-mode circuits," in Proc. 1987 Int. Symp. Multiple-Valued Logic, May 1987, pp. 172-180.
[9] S. Kawahito et al., "A 32 × 32 bit multiplier using multiple-valued MOS current-mode circuits," in Proc. Symp. VLSI Circ., May 1987, pp. 99-100.
[10] A. E. Camal et al., "A CMOS 32b Wallace tree multiplier-accumulator," in ISSCC Dig. Tech. Papers, 1986, THPM 15.
[11] T. T. Dao, E. J. McCluskey, and L. K. Russell, "Multivalued integrated injection logic," IEEE Trans. Computers, vol. C-26, pp. 1233-1241, Dec. 1977.
[12] T. Yamakawa, "CMOS multivalued circuits in hybrid mode," in Proc. 1985 Int. Symp. Multiple-Valued Logic, May 1985, pp. 144-151.
[13] Y. Tarui et al., "Diffusion self-aligned enhance-depletion MOS-IC (DSA-ED-MOS-IC)," in Proc. 2nd Conf. Solid-State Devices, 1970, p. 193.
[14] A. D. Booth, "A signed binary multiplication technique," Quart. J. Mech. Appl. Math., vol. 4, part 2, pp. 236-240, 1951.
[15] C. S. Wallace, "A suggestion for a fast multiplier," IEEE Trans. Electron. Comput., vol. EC-13, no. 1, pp. 14-17, Feb. 1964.
Shoji Kawahito (S'85) was born in Tokushima, Japan, on March 21, 1961. He received the B.E. and M.E. degrees in electronic engineering from Toyohashi University of Technology, Toyohashi, Japan, in 1983 and 1985, respectively.
He is currently working towards the D.E. degree at Tohoku University, Sendai, Japan. His main interests include multiple-valued information processing, high-performance arithmetic VLSI, and signal processing.
Mr. Kawahito is a member of the Institute of Electronics, Information, and Communication Engineers of Japan.
Michitaka Kameyama (M’79) was born in 
Utsunomiya, Japan, on May 12, 1950. He re- 
ceived the B.E., M.E., and D.E. degrees in elec- 
tronic engineering from Tohoku University, 
Sendai, Japan, in 1973, 1975, and 1978, respec- 
tively. 
He is currently an Associate Professor in the 
Department of Electronic Engineering, Tohoku 
University. His general research interests include 
multiple-valued logic systems, VLSI-oriented 
special-purpose processors, highly reliable digital 
systems, and robotics. 
Dr. Kameyama is a member of the Institute of Electronics, Information, and Communication Engineers of Japan, the Society of Instrument and Control Engineers of Japan, the Information Processing Society of Japan, and the Robotics Society of Japan. He received the Awards for Excellence at the 1984 and 1985 IEEE International Symposiums on Multiple-Valued Logic (with T. Higuchi et al.) and the Technically Excellent Award from the Society of Instrument and Control Engineers of Japan in 1986 (with T. Higuchi et al.). He was the Program Co-chairman of the 1986 IEEE International Symposium on Multiple-Valued Logic.
Tatsuo Higuchi (M'70-SM'83) was born in
Sendai, Japan, on March 30, 1940. He received
the B.E., M.E., and D.E. degrees in electronic 
engineering from Tohoku University, Sendai, 
Japan, in 1962, 1964, and 1969, respectively. 
He is currently a Professor with the Depart- 
ment of Electronic Engineering, Tohoku Univer- 
sity. His research interests include design of 
one-dimensional and two-dimensional finite 
word-length digital filters, multiple-valued logic 
systems, fault-tolerant computing, and VLSI 
computing structures for signal processing and image processing. 
Dr. Higuchi is a member of the Institute of Electrical Engineers of Japan, the Institute of Electronics, Information, and Communication Engineers of Japan, and the Society of Instrument and Control Engineers of Japan. He received the Awards for Excellence at the 1984 and 1985 IEEE International Symposiums on Multiple-Valued Logic (with M. Kameyama et al.), the Outstanding Transactions Paper Award from the Society of Instrument and Control Engineers of Japan in 1984 (with M. Kawamata), and the Technically Excellent Award from the Society of Instrument and Control Engineers of Japan in 1986 (with M. Kameyama et al.). He was the Program Chairman of the 1983 IEEE International Symposium on Multiple-Valued Logic, and he is the Chairman of the Japan Research Group on Multiple-Valued Logic.
Haruyasu Yamada was born in Tokyo, Japan, in 
1943. He received the B.S. and M.S. degrees in 
electronic engineering from Tohoku University, 
Sendai, Japan, in 1966 and 1968, respectively. 
He joined Matsushita Electric Industrial Com- 
pany, Ltd., Osaka, Japan, in 1968. Since then he 
has been engaged in the research and develop- 
ment of bipolar integrated circuits. Presently, he 
is working on LSI’s of image signal processing 
and others for consumer applications at the 
Semiconductor Research Center. 
