Further Reducing the Redundancy of a Notation Over a Minimally Redundant Digit Set by Daumas, Marc & Matula, David
HAL Id: hal-00157717
https://hal.archives-ouvertes.fr/hal-00157717
Submitted on 26 Jun 2007
HAL is a multi-disciplinary open access
archive for the deposit and dissemination of sci-
entific research documents, whether they are pub-
lished or not. The documents may come from
teaching and research institutions in France or
abroad, or from public or private research centers.
L’archive ouverte pluridisciplinaire HAL, est
destinée au dépôt et à la diffusion de documents
scientifiques de niveau recherche, publiés ou non,
émanant des établissements d’enseignement et de
recherche français ou étrangers, des laboratoires
publics ou privés.
Further Reducing the Redundancy of a Notation Over a
Minimally Redundant Digit Set
Marc Daumas, David Matula
To cite this version:
Marc Daumas, David Matula. Further Reducing the Redundancy of a Notation Over a Mini-
mally Redundant Digit Set. The Journal of VLSI Signal, Springer Verlag, 2003, 33 (1-2), pp.7-18.
￿10.1023/A:1021133616373￿. ￿hal-00157717￿
Journal of VLSI Signal Processing 33, 7–18, 2003
c© 2003 Kluwer Academic Publishers. Manufactured in The Netherlands.
Further Reducing the Redundancy of a Notation Over a Minimally
Redundant Digit Set
MARC DAUMAS∗
Laboratoire de l’Informatique du Parallélisme, CNRS, ENS de Lyon, France
DAVID W. MATULA
Southern Methodist University, School of Engineering, Dallas, Texas, USA
Received September 25, 2001; Revised November 30, 2001
Abstract. Redundant notations are used implicitly or explicitly in many digital designs. They have been studied in
details and a general framework is known to reduce the redundancy of a notation down to the minimally redundant
digit set. We present here an operator to further reduce the redundancy of such a representation. It does not reduce
the number of allowed digits since removing one digit to a minimally redundant digit set is a conversion to a non
redundant digit set and this is an expensive operation. Our operator introduces some correlation between the digits
to reduce the number of possible redundant notations for any represented number. This reduction is visible in small
useful operators like the elimination of leading zeros. We also present a key application with a CMOS Booth recoded
multiplier. Our multiplier is able to accept both a redundant or a non redundant input with very little modifications
and almost no penalty in time or space compared to state-of-the-art non redundant multipliers.
Keywords: computer arithmetic, redundant notation, addition, multiplication
1. Introduction and Summary
Any positive integer has a unique radixβ representation
(β ∈ N and β ≥ 2) with digits in the set {0, . . . , β − 1}.
When the set contains more than β elements, the
notation is redundant and many integers have sev-
eral representations [1–4]. The redundancy increases
with the number of additional elements in the digit
set.
In some cases, we are able to retrieve some correla-
tions between the digits of the redundant representation
computed by an algorithm. That assertion reduces the
redundancy of the notation even though the digit set is
not changed.
One such example is the radix 4 recoding inspired
from Booth [5] as presented in Koren [6]. The number is
∗This work has been partially supported by the PICS 479 from the
French National Center for Scientific Research (CNRS).
written on the digit set {−2, . . . , 2}but a digit 2 can only
be followed by a negative digit possibly preceded by a
string of zeros. For example, (201)4 = 2 · 42 + 0 · 41 +
1 ·40 = 33 is not a valid Booth recoded representation.
The valid one for 33 is (12̄01)4 = 1 · 43 + (−2) · 42 +
0 · 41 + 1 · 40.
The conversion from the usual radix 2 representation
to the radix 4 Booth recoded representation is obtained
from the bit to digit conversion presented in Table 3.
This special radix 4 notation on the redundant digit set
{−2, . . . , 2} is not redundant.
In Section 2, we identify a quantity that describes
part of the correlation of the digits. We have first pre-
sented this quantity in [7], it has been used by Matula
and Nielsen [8], Nielsen et al. [9], Seidel and Even [10],
Dumonteix and Mehrez [11], and refered by Seidel
[12]. It is called the fraction range associated to a set of
valid representations (language). It is the set of all the
possible fractions of units in the last position lost when
8 Daumas and Matula
one truncates any valid representation for any possible
position in the representation.
The fraction range of a general radix 4 notation on
the digit set {−2, . . . , 2} is included in (−2/3, 2/3)
whereas the fraction range of the abovementioned
Booth recoded notation is included in [−1/2, 1/2]. We
will see a first example of how this quantity measures
the redundancy of the notation in the Section 2.3 where
the window of k digits of a radix β representation forms
a new representation radix βk .
Fraction range proves to be appropriate in retaining
most of the information when carry recodings are ap-
plied. Such operations are specific digit set conversions
[13] or rewriting rules [14, 15] that compute in parallel
a ±1 carry and an in-place residual digit for each digit
position. Digit set conversion is a generalization of the
addition of numbers in redundant and/or non redundant
notations on parallel and serial systems. For example,
the binary sum of (101101)2 and (011101)2 can be im-
plemented as the radix 2 conversion of (112202)2 to the
digit set {0, 1}. The fraction range traces the correlation
of the digits in a complex arithmetic algorithm that uses
only recodings as this is the case for most implemented
arithmetic operators on a redundant notation.
We present in Section 3 three applications of redun-
dancy reduction of radix two borrow save and carry
save notations, one of them being our key Booth mul-
tiplier with one redundant operand [16].
2. Fraction Range and Carry Recoding
2.1. Definitions
The radix β notation (dm . . . d0 ·d−1 . . . dl)β on the digit
set S represents the rational number of Eq. (1). The
set of all the possible representations from positions l
through m with digits in S is the formal language noted
Sml (l ≤ 0 ≤ m). The valuation ‖ · ‖β maps Sml into the
set Q| of the rational numbers.
‖dm . . . d0 · d−1 . . . dl‖β =
m∑
i=l
diβ
i with
l ≤ i ≤ m ⇒ di ∈ S (1)
In the following, we are only interested in contiguous
digits sets containing zero, that is S = {Smin, . . . , Smax}
with Smin ≤ 0 ≤ Smax leading to a redundant notation
as soon as Smax − Smin ≥ β. The digit set is minimally
redundant if it contains exactly β + 1 digits, that is
Smax − Smin = β. A carry ripple process with best time
complexity (log(m − l)) is necessary to convert to a
notation with only β elements.
On a computer, a number is always produced as
a vector of digits D = [dm . . . d0 d−1 . . . dl]. An algo-
rithm that computes A(X ) = D does not produce a ra-
tional number (the valuation) but a vector of digits (the
representation). In the introduction, we have seen that
there might be some correlations between the digits
of the vector as long as they have not been given by
the user but computed. Without knowing the actual
value of the digits stored in the vector D, we can define
some representations that are acceptable and prove that
some others cannot occur. This defines the language
A+ = {A(X ) where X is a possible input} ⊂ Sml which
is the subset of all the possible output representations.
By extension, we use the same notation for A+ and A in
the following equations. We define the fraction range
of A and the fraction range at position j by Eqs. (2)
and (3). We will see how this single set captures alone
some very relevant part of the correlation between the
digits.
Fr j (A) = Fr j (A+)
= {‖0 · d j−1 . . . dl‖β, [dm . . . dl] ∈ A+} (2)
Fr(A) = Fr(A+) =
⋃
l≤ j≤m
Fr j (A
+) (3)
For a vector that is input by the user digit after digit,
nothing is known about any correlation. This yields the
largest a priori fraction range associated with the given
radix and the given digit set. It is bounded by Eq. (4).1
Fr
(
Sml
) = {‖0 · d j−1 . . . dl‖β, l ≤ j ≤ m,
[dm . . . dl] ∈ Sml
}
Fr
(
Sml
) ⊂ 1
β − 1 · (Smin; Smax) (4)
For radix two representations, the fraction range of a
borrow save vector where S = {−1, 0, 1} and a carry
save vector where S = {0, 1, 2} are respectively (−1, 1)
and [0, 2). Indeed this fact was recognized in Intel’s
description of the Pentium bug: truncating a number in
carry save format produces an error within 2 ulps called
the “region of uncertainty” [17, 18]. If the number is
stored in borrow save format the error is within ±1 ulp.
A positive carry recoding transforms (dm . . . dl)β on
the digit set S = {Smin, . . . , Smax} to the representation
(em+1 em . . . el)β . It uses a quantity p ∈ S called the
pivot to define the Eqs. (5) and (6) for l ≤ i ≤ m + 1
Further Reducing the Redundancy of a Notation Over a Minimally Redundant Digit Set 9
where cl = dm+1 = cm+2 = 0.
ci+1 =
{
1 if di ≥ p
0 otherwise
(5)
ci+1β + ei = di + ci (6)
We verify by induction on m that a recoding does not
change the value stored as presented below.
m∑
i=l
diβ
i =
m+1∑
i=l
eiβ
i
The recoded digit set is {min(Smin; p − β), . . . ,
max(Smax − β + 1; p)}. The digit set is reduced if
Smin + β ≤ p < Smax. The output digit set is minimally
redundant as soon as Smin ≥ p − β and p + β > Smax,
that is the case for example if the digit set is maximally
redundant and p = 0. If the output digit set is minimally
redundant, it is {p−β, . . . , p}. If p = Smax = Smin +β
the output digit set is minimally redundant and identical
to the input set.
A negative carry conversion is similar to the posi-
tive carry conversion except that ci+1 is defined from
Eq. (7).
ci+1 =
{−1 if di ≤ p
0 otherwise
(7)
The recoded digit set is {min(Smin + β − 1; p), . . . ,
max(Smax; p + β)}. The digit set is reduced if Smin <
p ≤ Smax−β. If the output digit set is minimally redun-
dant, it is {p, . . . , p + β}. It is unchanged, minimally
redundant, if p = Smin = Smax − β. We will see that
even when we do not reduce the number of elements of
the digit set, such conversions perform some legitimate
work visible through its effect on the fraction range.
2.2. Fundamental Property
If A is a language of Sml with fraction range Fr(A)
and P any positive carry recoding with output digit set
{Smin, . . . , Smax} then2
Fr(P(A)) ⊂ 1
β
(Fr(A) + {Smin, . . . , Smax − 1}). (8)
Proof: Let D ∈ A with D = [dm . . . dl]. We define
the carry vector C = [cm+1 cm . . . cl+1] from Eq. (5)
and the result vector E = P(D) = [em+1 em . . . el]
from Eq. (6). We obtain the following equation for any
rational number f ∈ Fr j (P(A)) by extracting a term
from the radix polynomial of E in Eq. (1)
f = ‖0 · e j−1 . . . el‖β
= e j−1 − c j−1
β
+ ‖0 · c j−1 e j−2 . . . el‖β.
We recognize that [c j−1 e j−2 . . . el] is the recoded vec-
tor of [d j−2 . . . dl] and therefore
‖0 · c j−1 e j−2 . . . el‖β = ‖0 · 0 d j−2 . . . dl‖β
∈ 1
β
Fr j+1(A) ⊂ 1
β
Fr(A).
We know from the definition of the output digit set that
e j−1 − c j−1 = d j−1 − βc j < Smax to allow c j−1 to be
incorporated without carry ripple. It follows that
Fr(P(A)) ⊂
Smax−1⋃
k=Smin
1
β
({k} + Fr(A))
⊂ 1
β
(Fr(A) + {Smin, . . . , Smax − 1}).
We prove in a very similar way that if N is a negative
carry recoding then
Fr(N (A)) ⊂ 1
β
(Fr(A) + {Smin + 1, . . . , Smax}). (9)
Considering Eqs. (8) and (9) it is now legitimate to
perform a digit set recoding from one digit set to the
same digit set since it reduces the fraction range of the
final vector. We can recode a borrow save vector to re-
duce its fraction range from (a, b) to (−1/2 + a/2, b/2)
with a positive recoding and to (a/2, b/2+1/2) with a
negative recoding. For a carry save number, we can re-
duce its fraction range from [a, b) to [a/2, b/2 + 1/2)
with a positive recoding and to [1/2 + a/2, b/2 + 1)
with a negative recoding.
We start with digits in the set {p − β, . . . , p − 1 + r}
with 1 ≤ r ≤ β for a generalization of this last obser-
vation. Let P be a positive recoding with pivot p, then
by induction
Fr(Pk) ⊂ 1
β − 1 ·
(
p − β, p − 1 + r
βk
)
∪ {0}. (10)
Comparing the fraction range of such a vector re-
coded k times to the fraction range of a non redundant
10 Daumas and Matula
vector (β − 1)−1 · (p − β; p − 1), we see that the dif-
ference r · β−k is decreasing geometrically with each
new recoding. We can even apply another recoding to
center the added quantity. Let N be the negative re-
coding with pivot p − β, the fraction range NPk(X ) is
bounded by Eq. (11). The centering is best if β = 2
since (β − 1)/β = 1/β = 1/2.
Fr(NPk) ⊂ 1
β − 1 ·
(
p − β + β − 1
β
;
p − 1 + β − 1
β
+ r
βk+1
)
∪ {0} (11)
Proof: The proof of Eq. (10) is easily obtained
by induction on k. We prove it here just for k = 1.
From Eq. (4), we know that Fr(Sml ) ⊂ (β − 1)−1 ·
(p − β, p − 1 + r ) ∪ {0}. The output digit set is mini-
mally redundant since p − 1 + r < p + β and Eq. (8)
gives us the inclusion:
Fr(P) ⊂ 1
β
(
Fr
(
Sml
) + {p − β, . . . , p − 1})
⊂ 1
β
(
1
β − 1(p − β) + p − β;
1
β − 1(p − 1 + r ) + p − 1
)
∪ {0}
⊂ 1
β − 1
(
p − β; p − 1 + r
β
)
∪ {0}.
Equation (11) is proved in a similar way:
Fr(NPk) ⊂ 1
β
(Fr(Pk) + {p − β + 1, . . . , p})
⊂ 1
β
(
1
β − 1(p − β) + p − β + 1;
1
β − 1
(
p − 1 + r
βk
)
+ p
)
∪ {0}
⊂ 1
β − 1
(
p − β + β − 1
β
; p − 1
+ β − 1
β
+ r
βk+1
)
∪ {0}.
We obtain an identical property if k negative re-
codings are applied to a vector of digits in the set
{p + 1 − r, . . . , p + β} with 1 ≤ r ≤ β. In this case
N k has its fraction range bounded by Eq. (12) and PNk
with one last positive recoding of pivot p + β has its
fraction range bounded by Eq. (13).
Fr(N k) ⊂ 1
β − 1 ·
(
p + 1 − r
βk
, p + β
)
∪ {0} (12)
Fr(PNk) ⊂ 1
β − 1 ·
(
p + 1 − β − 1
β
− r
βk+1
;
p + β − β − 1
β
)
∪ {0} (13)
2.3. First Application: Combining Several
Digits in a Window
We will see in Section 3 how the fraction range of a bi-
nary encoding can be used to deduce important proper-
ties on the digits stored in a vector. We will just present
here a first practical application of the fraction range.
Let D = [dm . . . dl] be a vector of radix β digits. The
window of k digits starting at position j · k can be val-
ued alone as an integer d ′j given by Eq. (14) with digits
in the range of Eq. (15). If we read the digits d ′j for any
m/k ≤ j ≤ l/k we obtain a new vector represent-
ing the same number radix βk as shown in Eq. (16). We
commonly use the octal and the hexadecimal notations
because this conversion is very easy to perform back
and forth with β = 2 and k = 3 or k = 4.
d ′j =
∥∥d( j+1)·k−1 . . . d j ·k∥∥β =
k−1∑
i=0
d j ·k+iβ i
(14)
d ′j ∈
βk − 1
β − 1 · [Smin, Smax] (15)
‖dm . . . dl‖β =
∥∥∥d ′ mk  . . . d ′ lk 
∥∥∥
βk
(16)
We can also write d ′j as presented in Eq. (17) leading
to another interval for d ′j as shown Eq. (18).
d ′j =
∥∥0 · d( j+1)·k−1 . . . dl∥∥β · βk −
∥∥0 · d j ·k . . . dl∥∥β
(17)
d ′j ∈ (βk · Fr(D) − Fr(D)) (18)
If the fraction range satisfies Fr(D) ⊂ (a, b), Eq. (18)
yields that a combined digit d ′j is bounded by
a · βk − b < d ′j < b · βk − a. We present Tables 1 and
2 two examples radix 10, with the digit set {−1, . . . , 9}
(generalized borrow save) and up to 3 positive carry
Further Reducing the Redundancy of a Notation Over a Minimally Redundant Digit Set 11
Table 1. Reducing the redundancy to obtain a minimally redundant digit set radix 100.
Digits radix 100
Rec. Fraction range (radix 10) Max Digit set
Non 19 · (−1; 8 + 1) =
(− 19 ; 1) 99 {−11, . . . , 99}
P 19 ·
(−1; 8 + 110 ) = (− 19 ; 910 ) 90 + 19 {−11, . . . , 90}
P2 19 ·
(−1; 8 + 1100 ) = (− 19 ; 89100 ) 89 + 19 {−11, . . . , 89}a
P3 19 ·
(−1; 8 + 11000 ) = (− 19 ; 8891000 ) 88.9 + 19 {−11, . . . , 89}a
aThe digit set is minimally redundant. There will be no further improvement.
Table 2. Reducing the redundancy to obtain a minimally redundant digit set radix 1000.
Digits radix 1000
Rec. Fraction range (radix 10) Max Digit set
Non 19 · (−1; 8 + 1) =
(− 19 ; 1) 999 {−111, . . . , 999}
P 19 ·
(−1; 8 + 110 ) = (− 19 ; 910 ) 900 + 19 {−111, . . . , 900}
P2 19 ·
(−1; 8 + 1100 ) = (− 19 ; 89100 ) 890 + 19 {−111, . . . , 890}
P3 19 ·
(−1; 8 + 11000 ) = (− 19 ; 8891000 ) 889 + 19 {−111, . . . , 889}a
aThe digit set is minimally redundant. There will be no further improvement.
recodings with pivot p = 9. The lower bound on the
combined digit set is given by Eq. (15) and does not
change. The maximum value for one digit is computed
from the fraction range in the table. The first recod-
ing removed approximately a portion 1
β
of the possible
digits and the second one a portion 1
β2
.
Getting rid of almost 1
β
of the digits allows us to
reach easily the minimally redundant digit set. It may
not seem to be important working radix 10 but this
is very relevant radix 2 as we will see in the next
section.
3. Binary Applications
As we have seen in Eqs. (10)–(13), a short sequence
of recodings can be employed to reduce the fraction
range and approach the width of 1 ulp. The prac-
tical value of binary carry recodings is that just a
few recodings provide partial compression sufficient
to obtain almost all the benefits of full conversion
to a non redundant notation while avoiding the high
cost of a carry-ripple addition. The following obser-
vations are straightforward from the definition and
support partial compression applications in round-
ing, leading insignificant digit deletion, and multiplier
recoding.
3.1. Implementation and Bit Specialization
In computers, the digit vector is not stored but encoded
into bits. In the carry save format, each digit di is stored
with two bits pi and qi and the digit value is defined
as di = pi + qi . We specialize the bits of the result
register ei = p′i + q ′i such that p′i+1 stores the carry
bit ci+1 generated by Eq. (5) and q ′i stores the residual
quantity ei − ci = di − ci+1β of Eq. (6). This leads
us to the truth table and the equations for the positive
carry recoding of a carry save number Fig. 1(a). If the
numbers are stored using two’s complement, the sign
digit is moved to position m + 1 and q ′m+1 = q ′m . The
equations of Fig. 1(a) are the ones of an half adder cell
(HA) as we will see soon.
The negative carry recoding of a carry save number
cannot be defined in such an elementary manner but
we may define more recodings if the target encoding
is borrow save instead of carry save. A borrow save
digit di (resp. ei ) is stored with two bits pi and ni
(resp. p′i and n
′
i ) and the digit is defined as di = pi − ni
(resp. ei = p′i − n′i ). We specialize a positive carry or a
negative carry recoding since we can store the carry in
p′i+1 (positive carry P-recoding of Fig. 1(b) or in n
′
i+1
(negative carry N -recoding of Fig. 1(c)).
The negative carry recoding of a carry save number
to borrow save notation is not possible but a positive
12 Daumas and Matula
Figure 1. Truth table and equations of binary recodings with a
specialized carry bit.
Figure 2. N -carry recoding of a borrow save number.
carry recoding is possible. It is presented Fig. 1(d) as
the Q-recoding. Figure 2 illustrates how each output
diagonal of the 2 × (m − l + 2) bit array is determined
by an input column of the 2 × (m − l + 1) bit array for
the N -carry recoding.
The half-adder cell is one of the basic cells of the
computer arithmetic libraries. Functionally, it is imple-
mented with an exclusive or gate and a logical
Figure 3. Half adder cell (HA) a + b = 2c + s.
Figure 4. Recoder cells.
and (see Fig. 3). In choosing the place to add some
new logical inverters, we define the bit level
cells corresponding to the P-, N - and Q-recodings (see
Fig. 4). These inverters may or may not yield an actual
penalty compare to a standard half adder cell. For ex-
ample, the exclusive or gate may be implemented
using pass transistors as it is the case in state-of-the-art
circuit design [19, 20]. If so, an inverted exclusive
or gate is obtained by switching some of the input
wires of a straight exclusive or gate.
3.2. Recodings and Digit Set Conversions
We will see in the remaining of the text that being an
high level quantity, the fraction range can be easily
tracked through an algorithm and it is therefore very
useful when more than one recoding is performed. We
have built the three recoder segments presented Figs. 5
to 7. To obtain the corresponding recoder, the desired
segment is replicated m − l + 1 times, to treat all the
digits of the input from weight 2l to weight 2m as pre-
sented later in Fig. 11. Modified versions of the segment
are used at the ends, to avoid using logic simply to gen-
erate constants. The format and the fraction range are
indicated after each recoding.
Further Reducing the Redundancy of a Notation Over a Minimally Redundant Digit Set 13
Figure 5. Segment for the translated NQ transformation.
Figure 6. Segment for the PN2 transformation.
Figure 7. Segment for the PNQ transformation.
The first precoder Fig. 5 allows transformation of
carry save to sign digit SD. The reader is invited to re-
fer to our [7] report for a more detailed discussion on
the signed digit notation. The second and the third pre-
coder of Figs. 6 and 7 convert from borrow save or carry
save to radix 4 notation on the digit set {−2, . . . , 2}.
Our redundant Booth recoder (Section 3.3) uses some
additional (lower level) information on the last recod-
ing of these conversions.
3.3. Rounding and Leading Digit Deletion
If we truncate the low order part of a word, we may
cause an error up to ±1 ulp or 2 ulps depending on the
notation. On the other hand, if we first apply a PNk−1
transformation (k ≥ 1) to the borrow save register
or a PNk−2 Q transformation (k ≥ 2) to a carry save
register, we obtain a number stored in borrow save
format that has a very limited fraction range included
in [−1/2, 1/2 + 2−k). As a consequence, truncating
it at any position will result in an error less than
1/2 + 2−k ulp.
Three lines of half adder provides a PN2 rounding
with fraction range (−5/8, 1/2) sufficient to reduce
truncation error below 5/8 ulps. This can be quite use-
ful in microcoded redundant binary designs for divi-
sion, square root, and transcendentals.
The next observation has great applicability in ex-
tracting differences from a function table for perform-
ing interpolation [21, 22]. A 2 × (m − l + 1) bit array
may be formed as the difference a − b of two (m −
l + 1)-bit unsigned integers and stored in borrow save
format with no operation. In this case, two recodings
allow to discard the leading digits.
Partial compression realizes virtually all the bene-
fits of leading digit deletion. Let X be a positive num-
ber represented in borrow save such that X ≤ 2k − 1.
We know that if N (X ) = [d ′m+1 d ′m . . . d ′l ], then d ′k = 0
for all i ≥ k + 1. Thus N (X ) may be truncated to a
2 × (k + 1) bit array. If we can only bound the magni-
tude of X (i.e. |X | ≤ 2k − 1), we have to com-
pute PN(X ) and we prove that d ′′i = 0 for all
i ≥ k + 2.
3.4. Booth Recoder
Previous research has shown the feasibility of multi-
plier designs employing redundant binary operands. To
avoid the general increase in hardware size entailed by
redundant binary inputs, recent attention has been fo-
cused on limiting redundant inputs simply to the multi-
plier recoder input [23–25] rather than the multiplicand
[26, 27].
With minimum circuitry, we are ready to derive the
low-power hot-one (only one signal is set at any time)
signal controls {−2, −1, 0, 1, 2} [24, 28] or the com-
mon multiplier controls {negative; doubled factor; un-
changed factor} [29, 30]. But we prefer to compute the
enhanced sign select controls {negative; positive; dou-
bled factor; unchanged factor} published by Goto et al.
[19] who have proved better than previous solutions
[31, 32].
On practical grounds, computing directly the con-
trol signals is more desirable than converting the
14 Daumas and Matula
Figure 8. General purpose Booth multiplier with fast feedback ca-
pacities through a precoder.
number from digit set {0, . . . , 3} (non redundant in-
put), {0, . . . , 6} (carry save input) or {−3, . . . , 3} (bor-
row save input) to digit set {−2, . . . , 2} before convert-
ing each of the digits obtained to control signals. Past
recoders have added critical path delay for the more
frequent case where a non redundant binary input is
available. Our proposed circuit does not lengthen the
time of one multiplication, compared to the state-of-
the-art encoding if both inputs are non redundant.
The common radix 4 Booth multiplier computes in
three steps the representation of D = A × B where
the two numbers A and B are represented radix 2 by
(am . . . al)2 and (bm ′ . . . bl ′ )2. The final organization of
our redundant aware multiplier is visible Fig. 8. The
first step converts B to a radix-4 minimally redundant
recoding on the digit set {−2, . . . , 2} (“Booth encod-
ing”). This conversion is performed for l ′/2 ≤ j ≤
m ′/2 by looking at bits b2 j+1, b2 j and b2 j−1 to com-
pute digit b′j as presented in Table 3. One can check
by induction that this operation does not change the
represented value as written in the first part of Eq. (19).
The second step (“Tree reduction”) accumulates the
partial products b′j · A · 4 j in a redundant format to
Table 3. Radix 4 Booth recoding.
Multiplier representation Enhanced sign select
b2 j+1 b2 j b2 j−1
Booth digit b′j
− 2b2 j+1
+ b2 j + b2 j−1 X j TX j PL j M j
0 0 0 0 0 1 0 0
0 0 1 1 1 0 1 0
0 1 0 1 1 0 1 0
0 1 1 2 0 1 1 0
1 0 0 −2 0 1 0 1
1 0 1 −1 1 0 0 1
1 1 0 −1 1 0 0 1
1 1 1 0 0 1 0 0
compute the product as suggested by the second part
of Eq. (19). The third step converts the redundant re-
sult to the usual non redundant binary representation
and possibly rounds it according to the active rounding
mode (“2-1 compression”).
m∑
i=l
bi 2
i =
m/2∑
j=l/2
b′j 4
j and A × B
=
m∑
i=l
bi · A · 2i =
m/2∑
j=l/2
b′j · A · 4 j (19)
Compared to the usual multiplication, Booth recod-
ing divides by two the number of partial products gen-
erated and accumulated but these products are more
difficult to generate since they cannot be obtained by a
logical and gate as this is the case when we mul-
tiply only by bi ∈ {0; 1}. A naive implementation can
ruin all the advantages of Booth recoding. State of the
art implementations usually present two cells:
– The encoder is responsible of generating a set of
control signals from the input bits b2 j+1, b2 j and
b2 j−1. We present in Table 3 its truth table and in the
Fig. 9 its high level CMOS circuit.
– The multiplexer computes a representation of b′j · A
based on (am . . . al)2 and the control signals gener-
ated by the encoder. The number obtained after mul-
tiplexing is the one’s complement representation of
A × b′j . Little extra circuitry is added to take care of
two’s complement logic.
As noted by Goto et al. [19], an IEEE-754 standard
double precision multiplier uses 27 encoders and 1527
Figure 9. Usual non redundant enhanced sign select Booth encoder.
Further Reducing the Redundancy of a Notation Over a Minimally Redundant Digit Set 15
multiplexers. These later cells may add up to 90% of
the area of the circuit.
The next result supports the factoring of multi-
plier recoding into a two step process. Partial com-
pression is first applied to recode a redundant for-
mat so in a second step it may be passed through
our modified Booth encoder of Fig. 9 with very few
penalty.
The leading negative weight bit of any 2×k bit window
of a borrow save number PNk(X) indicates the sign of
the digit value of that window whenever that digit is
non-zero. The same result applies for a window on a
PNk−1 Q recoded number.
Proof: Let D = N k(X ) with D = [dm . . . dl]. We de-
fine the carry vector C = [cm+1 cm . . . cl+1] from Eq. (5)
and the result vector E = P(D) = [em+1 em . . . el] from
Eq. (6) with ei = pi −ni . We obtain the following equa-
tion for any combined digit e′i radix 2
k
e′i =
∥∥0 · e(i+1)·k−1 . . . el∥∥2 · 2k − ‖0 · ei ·k . . . el‖2.
We extract a term from the radix polynomial of E from
Eq. (1)
e′i =
(
e(i+1)·k−1 − c(i+1)·k−1
) · 2k−1
+ ∥∥0 · c(i+1)·k−1 e(i+1)·k−2 . . . el∥∥2 · 2k
− ‖0 · ek·n . . . el‖2.
We recognize that [c(i+1)·k−1 e(i+1)·k−2 . . . el] is the
recoded vector of [d(i+1)·k−2 . . . dl] and e(i+1)·k−1 −
c(i+1)·k−1 = −n(i+1)·k−1 and therefore
e′i ∈ −n(i+1)·k−1 · 2k−1 + 2k−1 · (−2−k ; 1)
− 1
2
· (−1 − 2−k ; 1)
For any k, if n(i+1)·k−1 = 1 then e′i ≤ −2k−1 + 2k−1 +
1+2−k
2 and e
′
i ≤ 0 since e′i ∈ Z.
We can now list all the possible outputs of two digits
e2 j+1 and e2 j of a PN2 or a PNQ recoder. As presented
Table 4 some outputs are not valid because they violate
the fraction range or the result just presented above
or because they cannot be obtained from the last P
recoding.
We have also listed Table 4 all the valid cases if we
store b2 j+1 in n2 j+1, b2 j both in p2 j+1 and n2 j and
finally b2 j−1 in p2 j . The value of the stored number
Table 4. Radix 4 PN2 or PNQ recoded borrow save number.
Input bits Acceptable digit?
p2 j+1 n2 j+1 p2 j n2 j
Radix 4
digit Recoded Non redundant
0 0 0 0 0 Yes Yes
0 0 0 1 −1 No No
0 0 1 0 1 Yes Yes
0 0 1 1 0 Yes No
0 1 0 0 −2 Yes Yes
0 1 0 1 −3 No No
0 1 1 0 −1 Yes Yes
0 1 1 1 −2 Yes No
1 0 0 0 2 No No
1 0 0 1 1 Yes Yes
1 0 1 0 3 No No
1 0 1 1 2 Yes Yes
1 1 0 0 0 No No
1 1 0 1 −1 Yes Yes
1 1 1 0 1 No No
1 1 1 1 0 Yes Yes
is unchanged since we virtually compute 2B − B as
presented in details in [7].
We design the new encoder presented Fig. 10 to pro-
duce the correct result for each acceptable input of
d2 j+1 and d2 j . It shows how a borrow save register
can be used as the input to the Booth encoding logic
to store either a PNQ precoded carry save result obtain
through the logic described Fig. 11 or a non redundant
number which has been converted to borrow save using
Figure 10. Redundant aware enhanced sign select Booth encoder.
16 Daumas and Matula
Figure 11. PNQ precoder: Convert from carry save to radix 4 Booth
recoding.
the centered conversion 2B − B. The modified Booth
multiplier of Fig. 8 accepts a redundant number as one
of its operands, by the use of a precoder which does
not generate any additional delay for a non-redundant
operand.
4. Conclusion
We have presented a general formalism for the study
of partial compressions and roundings. This approach
proves fruitful in reducing the redundancy of a borrow
save or a carry save bit array to allow radix 2k Booth
recoding with minimal circuitry. Reducing redundancy
is useful in any application that does not allow any full
range redundant number as input but does not require
non redundant inputs.
With the technology used in the design presented by
Goto et al. [19], that is 0.25 µm CMOS technology
with 2.5 V power supply. The authors obtain a 54 × 54
multiplier with a clock cycle of 4.1 ns. From spice sim-
ulations presented by Goto et al. [19] we can predict
that, each row of modified half-adders will deliver its
output with a delay between 100 ps and 200 ps (de-
pending on electrical properties). As a result, our new
circuit can perform a non redundant multiplication in
4.1 ns or prepare a redundant result to be reused by an-
other multiplication in 3.4 ns. For example computing
a product of 11 numbers will take 41 ns with the origi-
nal multiplier and 35 ns with our modified one. Higher
savings could be assessed by an electrical simulation
of the PNQ recoding.
Precoded Booth multipliers have also been imple-
mented [11, 33]. They use 0.5 µm CMOS technol-
ogy with the Alliance standard cell library [34]. Al-
though this implementation is not as sharp as Goto
et al. [19]’s one, Dumonteix [33] have been able to
derive interesting results. For example, a 32 bits mul-
tiplier with one input recoded producing a redundant
result uses 0.81 mm2 instead of 0.72 mm2 (12% over-
head) for the usual Booth multiplier but the latency
drops from 16.83 ns to 12.23 ns (27% improvement).
Other applications arise in the forwarding and feed-
back of a number internal to a floating point unit to re-
duce redundancy as presented by Matula and Nielsen
[8]. Precoders are used to forward some parts of the
redundant result after redundancy is sufficiently re-
duced. This allows an IEEE standard behavior of the
rounding unit albeit the numbers are not compressed
to a non redundant format. The construction of a stan-
dard adder using this technique is depicted by Nielsen
et al. [9].
Other applications will appear in converting the two
bounds of an interval to the step of a linear interpolation
as it is performed for fast reciprocal and square root
approximation [21]. This will allow to forward directly
the pair of recoded bounds in borrow save format to a
modified Booth multiplier.
Acknowledgments
We wish to thank Peter Kornerup for his notations as
presented in [13] and his kind and knowledgeable help
over the time of this work.
Notes
1. The product of a set B by a scalar a is the set of any element of
B scaled by a, a · B = {a · b, b ∈ B}.
2. The sum of two sets A and B is the set of the sum of any two
elements from A and B, A + B = {a + b, (a; b) ∈ A × B}.
References
1. A. Avižienis, “Signed Digit Number Representations for Fast
Parallel Arithmetic,” IRE Transactions on Electronic Comput-
ers, vol. 10, 1961, pp. 389–400.
2. D.W. Matula, “Radix Arithmetic: Digital Algorithms for
Computer Architecture,” in Applied Computation Theory:
Analysis, Design, Modeling, Prentice Hall, 1976, pp. 374–
448.
3. D.W. Matula, “Basic Digit Sets for Radix Representation,” Jour-
nal of the ACM, vol. 29, no. 4, 1982, pp. 1131–1143.
4. A.M. Nielsen and P. Kornerup, “Redundant Radix Representa-
tions of Rings,” IEEE Transactions on Computers, vol. 48, no.
11, 1999, pp. 1153–1165.
5. A.D. Booth, “A Signed Binary Multiplication Technique,” Quar-
terly Journal of Mechanics and Applied Mathematics, vol. 4,
no. 2, 1951, pp. 236–240.
Further Reducing the Redundancy of a Notation Over a Minimally Redundant Digit Set 17
6. I. Koren, Computer Arithmetic Algorithms, Englewood Cliffs,
NJ: Prentice Hall, 1993.
7. M. Daumas and D.W. Matula, “Recoders for Partial Com-
pression and Rounding,” Research Report 97-01, Labo-
ratoire de l’Informatique du Parallélisme, Lyon, France,
1997.
8. D.W. Matula and A.M. Nielsen, “Pipelined Packet-Forwarding
Floating Point: I. Foundations and a Founder,” in Proceed-
ings of the 13th Symposium on Computer Arithmetic, Monterey,
California, T. Lang, J.-M. Muller, and N. Takagi (Eds.), 1997,
pp. 140–147.
9. A.M. Nielsen, D.W. Matula, C.N. Lyu, and G. Even, “An
IEEE Compliant Floating Point Adder that Conforms with the
Pipelined Packet Forwarding Paradigm,” IEEE Transactions on
Computers, vol. 49, no. 4, 2000, pp. 33–47.
10. P.M. Seidel and G. Even, “How Many Logic Levels Does
Floating-Point Addition Require?,” in 1998 International Con-
ference on Computer Design, Austin, Texas, 1998, pp. 142–
149.
11. Y. Dumonteix and H. Mehrez, “A Family of Redundant Mul-
tipliers Dedicated to Fast Computation for Signal Processing,”
in Proceedings of the 2000 IEEE International Symposium on
Circuits and Systems, Geneva, Switzerland, 2000, pp. 325–
328.
12. P.M. Seidel, “High Speed Redundant Reciprocal Approxima-
tion,” in 3rd Real Numbers and Computers Conference, Paris,
France, 1998, pp. 219–229.
13. P. Kornerup, “Digit-Set Conversion: Generalizations and Appli-
cations,” IEEE Transactions on Computers, vol. 43, no. 5, 1994,
pp. 622–629.
14. T.M. Carter and J.E. Robertson, “The Set Theory of Arith-
metic Decomposition,” IEEE Transactions on Computers, vol.
39, no. 8, 1990, pp. 993–1005.
15. M. Ercegovac and T. Lang, “On Recoding in Arithmetic Algo-
rithms,” Journal of VLSI Signal Processing, vol. 14, 1996, pp.
283–294.
16. M. Daumas and D.W. Matula, “A Booth Multiplier Accepting
Both a Redundant or a Non-Redundant Input with no Addi-
tional Delay,” in IEEE International Conference on Application-
specific Systems, Architectures and Processors, Boston, Mas-
sachusetts, E.E. Swartzlander, G.A. Jullien, and M. Schulte
(Eds.), 2000, pp. 205–214.
17. H.P. Sharangpani and M.L. Barton, “Statistical Analysis of
Floating Point Flaw in Pentium Processors,” White Paper 11,
Intel Corporation, 1994.
18. J.-M. Muller, “Algorithmes de division Pour Microprocesseurs:
illustration à l’aide du “bug” du Pentium,” Technique et Science
Informatiques, vol. 14, no. 8, 1995.
19. G. Goto et al., “A 4.1ns Compact 54 × 54b Multiplier Utiliz-
ing Sign Select Booth Encoders,” IEEE Journal of Solid-State
Circuits, vol. 32, no. 11, 1997, pp. 1676–1682.
20. A.M. Shams and M.A. Bayoumi, “A Novel High-Performance
CMOS 1 Bit Full Adder Cell,” IEEE Transactions on Circuits
and Systems—II: Analog and Digital Signal Processing, vol. 47,
no. 5, 2000, pp. 478–481.
21. D. Das Sarma and D.W. Matula, “Faithful Bipartite ROM Re-
ciprocal Tables,” in Proceedings of the 12th Symposium on
Computer Arithmetic, Bath, England, S. Knowles and W.H.
McAllister (Eds.), 1995, pp. 17–28.
22. M.J. Schulte and J.E. Stine, “Approximating Elementary Func-
tions with Symmetric Bipartite Tables,” IEEE Transactions on
Computers, vol. 48, no. 8, 1999, pp. 842–847.
23. N. Takagi, “Arithmetic Unit Based on a High Speed Multiplier
with a Redundant Binary Addition Tree,” in Advanced Signal
Processing Algorithms, Architectures and Implementation II,
vol. 1566 of Proceedings of SPIE, 1991, pp. 244–251.
24. B.W.Y. Wei, H. Du, and H. Chen, “A Complex Number Multi-
plier Using Radix 4 Digits,” in Proceedings of the 12th Sympo-
sium on Computer Arithmetic, Bath, England, S. Knowles and
W.H. McAllister (Eds.), 1995, pp. 84–90.
25. C.N. Lyu and D.W. Matula, “Redundant Binary Booth Recod-
ing,” in Proceedings of the 12th Symposium on Computer Arith-
metic, Bath, England, S. Knowles and W.H. McAllister (Eds.),
1995, pp. 50–57.
26. W.S. Briggs and D.W. Matula, “Rectangular Array Signed
Digit Multiplier,” US Patent 5 184 318, US Patent Office,
1993.
27. W.S. Briggs and D.W. Matula, “A 17 × 69 Bit Multiply and
Add Unit with Redundant Binary Feedback and Single Cycle
Latency,” in Proceedings of the 11th Symposium on Computer
Arithmetic, Windsor, Ontario, E. Swartzlander, M.J. Irwin, and
G. Jullien (Eds.), 1993, pp. 163–170.
28. R.M. Jessani and M. Putrino, “Comparison of Single and Dual
Pass Multiply Add Fused Floating Point Units,” IEEE Transac-
tions on Computers, vol. 47, no. 9, 1998, pp. 927–937.
29. G. Goto, T. Sato, M. Nakajima, and T. Sukemura, “A 54 × 54b
Regularly Structured Tree Multiplier,” IEEE Journal of Solid-
State Circuits, vol. 27, no. 9, 1992, pp. 1229–1236.
30. N. Ohkubo et al., “A 4.4 ns CMOS 54 × 54-b Multiplier Us-
ing Pass Transistor Multiplexer,” IEEE Journal of Solid-State
Circuits, vol. 30, no. 3, 1995, pp. 251–257.
31. J. Mori et al., “A 10 ns 54 × 54b Parallel Structured Full Array
Multiplier with 0.5 µm CMOS Technology,” IEEE Journal of
Solid-State Circuits, vol. 26, no. 4, 1991, pp. 600–605.
32. H. Makino et al., “An 8.8 ns 54 × 54-bit Multiplier with High
Speed Redundant Binary Architecture,” IEEE Journal on Solid
State Circuits, vol. 31, no. 6, 1996, pp. 773–783.
33. Y. Dumonteix, “Optimisations des chemins de données
arithmétiques par l’utilisation de plusieurs systèmes de
numération,” Ph.D. Thesis, Université Pierre et Marie Curie,
Paris, France, 2001.
34. A. Greiner, L. Lucas, and F. Wajsbürt, “Designing a High Com-
plexity Microprocessor Using the Alliance CAD System,” in
Proceedings of the 7th Annual IEEE International ASIC Con-
ference and Exhibit, Rochester, New York, 1994, pp. 223–
226.
35. P. Chai et al., “A 120 MFLOPS CMOS Floating Point Proces-
sor,” in Proceedings of the 1991 Custom Integrated Circuits Con-
ference, San Diego, California, IEEE Computer Society Press,
1991, pp. 15.1.1–15.1.4.
36. H.M. Darley et al., “Floating Point/Integer Processor with Divide
and Square Root Functions,” US Patent 4 878 190, US Patent
Office, 1989.
37. H. Kabuo et al., “Accurate Rounding Scheme for the New-
ton Raphson Method Using Redundant Binary Representa-
tion,” IEEE Transactions on Computers, vol. 43, no. 1, 1994,
pp. 43–51.
38. S.M. Quek, L. Hu, J.P. Prabhu, and F.A. Ware, “Apparatus for
Determining Booth Recoder Input Control Signals,” US Patent
5 280 439, US Patent Office, 1994.
18 Daumas and Matula
Marc Daumas is a senior scientist with the French National Center
for Scientific Research (CNRS). He has joined the CNRS in 1998
in the Laboratoire de l’Informatique du Parallélisme (LIP) in Lyon.
He has previously been an assistant professor in the ENSERG in
Grenoble (1997), a lecturer (1995–1997) and a fellow student (1989–
1993) in the Ecole Normale Supérieure (ENS) in Lyon. Marc Daumas
received his Ph.D. in 1996 from the ENS in Lyon, and his M.Sc.
in 1992 from the Southern Methodist University in Dallas, Texas.
He has been working on formal and hardware aspects of computer
arithmetic with a special interest in number representation systems.
Marc.Daumas@ENS-Lyon.Fr
David W. Matula received the Ph.D. degree in engineering from
the University of California, Berkeley, in 1966. He is currently a
professor in the Computer Science and Engineering Department at
Southern Methodist University, Dallas, Texas. He is the author of
more than 90 papers on computer arithmetic and graph algorithms
and holds 13 patents on computer arithmetic and cellular communi-
cation systems.
Matula@Engr.SMU.Edu
