A new Low-Power recoding algorithm for multiplierless single/multiple constant multiplication. by K. Oudjida, Abdelkrim et al.
Abdelkrim K. Oudjida, Mohamed L. Berrandjia, Nicolas Chaillet
To cite this version:
Abdelkrim K. Oudjida, Mohamed L. Berrandjia, Nicolas Chaillet. A new Low-Power recoding
algorithm for multiplierless single/multiple constant multiplication. IEEE Faible Tension
Faible Consommation (FTFC’13), Jan 2013, France. pp.1-4, 2013. <hal-00869542>
HAL Id: hal-00869542
https://hal.archives-ouvertes.fr/hal-00869542
Submitted on 7 Oct 2013
TABLE I.  ASYMPTOTIC RUNTIME SUMMARY
Name   Author           Year   Runtime
BIGE   Thong [1]        2011   O(2^N)
H(k)   Dempster [7]     2004   O(2^N)
MAG    Gustafsson [8]   2002   Ω(2^N)
–      Bernstein [12]   1986   O(2^N)
Hcub   Voronenko [6]    2007   O(N^6)
BHM    Dempster [13]    1995   O(N^4)
–      Lefèvre [4]      2001   O(N^3)
DBNS   Dimitrov [3]     2007   O(N)
CSD    Avizienis [2]    1961   O(N)
A New Low-Power Recoding Algorithm for 
Multiplierless Single/Multiple Constant Multiplication 
A.K. Oudjida, M.L. Berrandjia 
Microelectronics and Nanotechnology Division 
Centre de Développement des Technologies Avancées 
Algiers, Algeria 
a_oudjida@cdta.dz 
N. Chaillet 
AS2M Department 
FEMTO-ST Institute 
Besançon, France 
nicolas.chaillet@femto-st.fr
 
Abstract— Optimizing the number of additions in constant 
coefficient multiplication is conjectured to be an NP-hard
problem. In this paper, we report a new heuristic requiring an
average of 29.10% and 10.61% fewer additions than the standard
canonical signed digit representation (CSD) and the double base
number system (DBNS), respectively, for 64-bit coefficients. The
maximum number of additions per coefficient is bounded by
(N/4)+2, and the time complexity of the recoding is linear
in N, where N is the bit-size of the constant. This
performance is achieved using a new redundant version of
radix-2^8 recoding.
Keywords—Double Base Number System (DBNS); High-Speed 
and Low-Power Design; Multiplierless Single/Multiple Constant
Multiplication (SCM/MCM); Radix-2^r Booth recoding.
I.  BACKGROUND AND MOTIVATION  
Many applications in DSP and control, such as linear time 
invariant (LTI) filters/controllers, involve the computation of a 
large number of multiplications of one variable by a set of 
constants. To be efficiently handled, the implementation must 
be multiplierless, that is, using exclusively additions, 
subtractions and shifts. This problem is known as 
single/multiple constant multiplication (SCM/MCM) and is 
conjectured to be NP-hard [1]. A large number of heuristics have
been proposed. They are classified into four categories:
• Digit-recoding heuristics such as CSD [2] and  DBNS [3] ; 
• Common subexpression elimination (CSE) using pattern 
matching. Examples are Lefèvre [4] and Boullis [5]; 
• Directed acyclic graph (DAG) based algorithms such as 
Hcub [6], H(k) [7], and MAG [8]; 
• Mixed algorithms combining CSE and DAG such as the 
recent optimal algorithm BIGE [1]. 
A good survey and a detailed comparative study showing the
pros and cons of various algorithms are given in [1][6][9].
Despite the large number of proposed heuristics, the vast
majority of LTI system optimizations use the CSD
representation for constant encoding [10]. The rationale is that:
• CSD recoding is easy to implement; 
• The adder complexity in CSD is known, which is not the
case for the other heuristics [1][11]. In CSD the number of
adders is bounded by (N+1)/2-1 and tends asymptotically
to an average value of (N/3)-8/9, which yields a 33%
saving over the naïve add-and-shift approach.
• CSD requires a linear computational time, contrary to its
counterparts, which require excessive runtime (Table I)
and storage, making them impractical for high values
of N, at least with current computing power.
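The linear-time CSD recoding referred to in the list above can be sketched as follows. This is a minimal illustration using the standard non-adjacent-form construction, not code from the paper; the function names `csd_digits` and `csd_adders` are hypothetical, and the adder count assumes one addition or subtraction per nonzero digit beyond the first.

```python
def csd_digits(n):
    """Return the CSD digits (-1, 0, +1) of n >= 0, least significant first."""
    digits = []
    while n != 0:
        if n & 1:
            d = 2 - (n & 3)   # +1 when n % 4 == 1, -1 when n % 4 == 3
            n -= d            # clears the two lowest bits of the remainder
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits


def csd_adders(n):
    """Additions/subtractions for n*X: one less than the nonzero digit count."""
    nonzero = sum(1 for d in csd_digits(n) if d != 0)
    return max(nonzero - 1, 0)


if __name__ == "__main__":
    print(csd_digits(7))      # 7 = 8 - 1
    print(csd_adders(23453))
```

Each loop iteration consumes one bit of the constant, which is where the linear runtime in N comes from.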
 
 
 
 
 
 
 
 
The central point of this work is the minimization of the
total number of additions. Based on the radix-2^r signed-digit
number system [14] [15], a new Redundant Radix-2^r Recoding
(R3) is proposed as an alternative to existing heuristics.
Applied to the particular case of radix-2^8 with N=64, a saving
of 29.10% is achieved over CSD, which leads to much lower
power consumption and higher speed. In addition, the new
recoding shows a high aptitude for common subexpression
elimination, which makes it a good candidate for MCM.
The paper is organized as follows. Section I outlines the 
necessity of a linear runtime heuristic with a high compression 
ratio to handle large bit-size constants. Section II introduces the 
new R3 algorithm, while Section III compares the results to 
CSD and DBNS recodings. Finally, Section IV provides some 
concluding remarks and suggestions for future work.  
II. NEW REDUNDANT RADIX-2^r ALGORITHM (R3)
FOR MULTIPLICATION BY AN N-BIT CONSTANT
An N-bit constant C is expressed in radix-2^r as follows:

$$C = \sum_{j=0}^{(N/r)-1} \left( c_{rj-1} + 2^{0} c_{rj} + 2^{1} c_{rj+1} + 2^{2} c_{rj+2} + \cdots + 2^{r-2} c_{rj+r-2} - 2^{r-1} c_{rj+r-1} \right) \times 2^{rj} = \sum_{j=0}^{(N/r)-1} Q_j \times 2^{rj} ; \qquad (1)$$

where $c_{-1} = 0$ and $r \in \mathbb{N}^*$. For simplicity and
without loss of generality, we assume that r divides N.
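The slicing defined by eq. (1) can be sketched in a few lines. This is an illustrative reading of the equation rather than the authors' program; the function name `radix2r_digits` and its argument order are assumptions made for the example.

```python
def radix2r_digits(c, n, r):
    """Split the n-bit two's complement constant c into n/r digits Q_j, eq. (1)."""
    assert n % r == 0, "r is assumed to divide N"
    bits = c & ((1 << n) - 1)          # two's complement bit pattern c_0..c_{n-1}

    def bit(i):
        return 0 if i < 0 else (bits >> i) & 1   # c_{-1} = 0

    digits = []
    for j in range(n // r):
        q = bit(r * j - 1)                       # bit shared with the slice below
        for i in range(r - 1):
            q += bit(r * j + i) * 2**i           # weights 2^0 .. 2^(r-2)
        q -= bit(r * j + r - 1) * 2**(r - 1)     # sign bit of the slice
        digits.append(q)
    return digits


if __name__ == "__main__":
    qs = radix2r_digits(23453, 16, 8)
    print(qs, sum(q * 2**(8 * j) for j, q in enumerate(qs)))
```

For the 16-bit constant 23453 and r=8 this yields the two digits -99 and 92, and -99 + 92×2^8 = 23453.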
This work is supported by the Centre de Développement des Technologies
Avancées (CDTA), Algiers, Algeria, in collaboration with the FEMTO-ST
Institute, Besançon, France.
[Figure 1: the 24+1 bits c-1, c0, ..., c23 of C (with c-1 = 0) are partitioned into three overlapping slices of 8+1 bits, giving Q0 = (Z1+Z2)_0 × (-1)^c7, Q1 = (Z1+Z2)_1 × (-1)^c15 and Q2 = (Z1+Z2)_2 × (-1)^c23, where 0 ≤ (Z1+Z2)_j ≤ 128 and c7, c15, c23 are sign bits, so that C = Q0 × 2^0 + Q1 × 2^8 + Q2 × 2^16.]
Figure 1. Partitioning of a 24-bit C constant using R3 algorithm.
In eq. (1), the two’s complement representation of the constant C
is split into N/r two’s complement slices ($Q_j$), each r+1 bits
long. Each pair of contiguous slices has one overlapping
bit. To eq. (1) corresponds a digit set $D(2^r)$ such that

$$Q_j \in D(2^r) = \{-2^{r-1},\ -2^{r-1}+1,\ \ldots,\ -1,\ 0,\ 1,\ \ldots,\ 2^{r-1}-1,\ 2^{r-1}\}.$$

The sign of the $Q_j$ term is given by the bit $c_{rj+r-1}$, and $|Q_j| = 2^{k_j} \times m_j$,
with $k_j \in \{0, 1, 2, \ldots, r-1\}$ and $m_j \in OM(2^r) \cup \{0\} = \{1, 3, 5, \ldots, 2^{r-1}-1\} \cup \{0\}$. $OM(2^r)$ represents the required set of odd multiples in radix-2^r
recoding, with $|OM(2^r)| = 2^{r-2}$. Finally, C can be expressed as
follows:

$$C = \sum_{j=0}^{(N/r)-1} (-1)^{c_{rj+r-1}} \times 2^{k_j} \times m_j \times 2^{rj}. \qquad (2)$$
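The factorization of a digit into a shift and an odd multiple used in eq. (2) can be illustrated with a short helper. The name `odd_multiple` is hypothetical, and returning (0, 0) for a zero digit is a convention chosen for this sketch.

```python
def odd_multiple(q):
    """Return (k, m) with abs(q) == 2**k * m and m odd; (0, 0) when q == 0."""
    q = abs(q)
    if q == 0:
        return 0, 0
    k = (q & -q).bit_length() - 1   # number of trailing zero bits of |q|
    return k, q >> k


if __name__ == "__main__":
    for q in (92, -99, 128):
        print(q, odd_multiple(q))
```

For example, the digit 92 factors as 2^2 × 23, so only the odd multiple 23×X has to be built and the remaining factor is a hard-wired shift.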
Equation (2) is not redundant, since each constant C has
a unique representation (m_j). To enlarge the solution
space so that a less adder-consuming representation of C
can be selected, the recoding must be redundant. To
achieve this goal, we state the following theorem:
Theorem 1. In radix-2^r, $|Q_j| = A_j \times 2^{p} + (-1)^{e} \times B_j \times 2^{h}$, where: $A_j, B_j \in \{0, 1, 3, 5, \ldots, 2^{(r/2)-1}-1\}$; $p \in \{0, 1, 2, \ldots, r-1\}$; $h \in \{0, 1, 2, \ldots, (r/2)-1\}$;
and $e \in \{0, 1\}$.
The proof of the above theorem is based on our Theorem (1)
described in [16][17]. Note that different notations for |Qj| are
possible. For instance: 37 = 1×2^5 + 5×2^0 or 37 = 5×2^3 - 3×2^0. We
illustrate the idea for r=8, where 0 ≤ |Qj| ≤ 128. Equation (2)
becomes:

$$C = \sum_{j=0}^{(N/8)-1} \left( A_j \times 2^{p} + (-1)^{e} \times B_j \times 2^{h} \right)_j \times (-1)^{c_{8j+7}} \times 2^{8j} = \sum_{j=0}^{(N/8)-1} (Z_1 + Z_2)_j \times (-1)^{c_{8j+7}} \times 2^{8j} \qquad (3)$$

where $Z_1 = A_j \times 2^{p}$; $Z_2 = (-1)^{e} \times B_j \times 2^{h}$; $A_j, B_j \in \{0, 1, 3, 5, 7\}$;
$p \in \{0, 1, 2, \ldots, 7\}$; $h \in \{0, 1, 2, 3\}$; and $e \in \{0, 1\}$.
Note that $|Q_j| = (Z_1 + Z_2)_j$. The partitioning of the constant C
according to eq. (3) is depicted in Fig. 1, while the recodings of
odd and even |Qj| digits are listed separately in Table II.
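The redundancy stated by Theorem 1 can be explored by brute force for r=8. The sketch below simply enumerates the parameter ranges quoted after eq. (3); the function name `representations` and the tuple layout (A, p, e, B, h) are assumptions made for this example, not the authors' data structure.

```python
from itertools import product

ODD_OR_ZERO = (0, 1, 3, 5, 7)   # allowed values of A_j and B_j for r = 8


def representations(q):
    """All (A, p, e, B, h) with A*2^p + (-1)^e * B*2^h == q, for r = 8."""
    reps = []
    for a, p, e, b, h in product(ODD_OR_ZERO, range(8), (0, 1),
                                 ODD_OR_ZERO, range(4)):
        if a * 2**p + (-1)**e * b * 2**h == q:
            reps.append((a, p, e, b, h))
    return reps


if __name__ == "__main__":
    for rep in representations(37):
        print(rep)
    # Every digit magnitude 0..128 has at least one representation.
    print(all(representations(q) for q in range(129)))
```

Both notations of 37 given in the text appear in the enumeration, as (1, 5, 0, 5, 0) and (5, 3, 1, 3, 0).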
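The product C×X becomes:

$$C \times X = \sum_{j=0}^{(N/8)-1} \left[ (A_j \times X) \times 2^{p} + (-1)^{e} \times (B_j \times X) \times 2^{h} \right]_j \times (-1)^{c_{8j+7}} \times 2^{8j} \qquad (4)$$

Note that when $A_j, B_j \in \{3, 5, 7\}$, one extra adder is needed,
since for instance: 3×X = 2×X + X.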
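Written out for the three non-trivial odd multiples, each costs exactly one addition or subtraction on top of a shift. Only the 3×X case is given in the text; the decompositions of 5×X and 7×X below are the usual ones and are added here as an illustration.

```latex
\begin{align*}
U_3 &= 3X = 2^{1} X + X, \\
U_5 &= 5X = 2^{2} X + X, \\
U_7 &= 7X = 2^{3} X - X .
\end{align*}
```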
 
 
 
 
 
 
TABLE II: ODD AND EVEN |Qj| DIGIT RECODING USING R3 ALGORITHM
Odd |Qj|   Z1 = Aj×2^p   Z2 = (-1)^e×Bj×2^h   (Z1+Z2)j   Even |Qj|   (Z1+Z2)j
1          1×2^0          0×2^0               U1         2           2^1×U1
3          3×2^0          0×2^0               U3         4           2^2×U1
5          5×2^0          0×2^0               U5         6           2^1×U3
7          7×2^0          0×2^0               U7         8           2^3×U1
9          1×2^3          1×2^0               U9         10          2^1×U5
11         3×2^2         -1×2^0               U11        12          2^2×U3
13         3×2^2          1×2^0               U13        14          2^1×U7
15         1×2^4         -1×2^0               U15        16          2^4×U1
17         1×2^4          1×2^0               U17        18          2^1×U9
19         5×2^2         -1×2^0               U19        20          2^2×U5
21         5×2^2          1×2^0               U21        22          2^1×U11
23         3×2^3         -1×2^0               U23        24          2^3×U3
25         3×2^3          1×2^0               U25        26          2^1×U13
27         7×2^2         -1×2^0               U27        28          2^2×U7
29         7×2^2          1×2^0               U29        30          2^1×U15
31         1×2^5         -1×2^0               U31        32          2^5×U1
33         1×2^5          1×2^0               U33        34          2^1×U17
35         1×2^5          3×2^0               U35        36          2^2×U9
37         1×2^5          5×2^0               U37        38          2^1×U19
39         5×2^3         -1×2^0               U39        40          2^3×U5
41         5×2^3          1×2^0               U41        42          2^1×U21
43         5×2^3          3×2^0               U43        44          2^2×U11
45         3×2^4         -3×2^0               U45        46          2^1×U23
47         3×2^4         -1×2^0               U47        48          2^4×U3
49         3×2^4          1×2^0               U49        50          2^1×U25
51         3×2^4          3×2^0               U51        52          2^2×U13
53         3×2^4          5×2^0               U53        54          2^1×U27
55         7×2^3         -1×2^0               U55        56          2^3×U7
57         7×2^3          1×2^0               U57        58          2^1×U29
59         1×2^6         -5×2^0               U59        60          2^2×U15
61         1×2^6         -3×2^0               U61        62          2^1×U31
63         1×2^6         -1×2^0               U63        64          2^6×U1
65         1×2^6          1×2^0               U65        66          2^1×U33
67         1×2^6          3×2^0               U67        68          2^2×U17
69         1×2^6          5×2^0               U69        70          2^1×U35
71         1×2^6          7×2^0               U71        72          2^3×U9
73         5×2^4         -7×2^0               U73        74          2^1×U37
75         5×2^4         -5×2^0               U75        76          2^2×U19
77         5×2^4         -3×2^0               U77        78          2^1×U39
79         5×2^4         -1×2^0               U79        80          2^4×U5
81         5×2^4          1×2^0               U81        82          2^1×U41
83         5×2^4          3×2^0               U83        84          2^2×U21
85         5×2^4          5×2^0               U85        86          2^1×U43
87         5×2^4          7×2^0               U87        88          2^3×U11
89         3×2^5         -7×2^0               U89        90          2^1×U45
91         3×2^5         -5×2^0               U91        92          2^2×U23
93         3×2^5         -3×2^0               U93        94          2^1×U47
95         3×2^5         -1×2^0               U95        96          2^5×U3
97         3×2^5          1×2^0               U97        98          2^1×U49
99         3×2^5          3×2^0               U99        100         2^2×U25
101        3×2^5          5×2^0               U101       102         2^1×U51
103        3×2^5          7×2^0               U103       104         2^3×U13
105        7×2^4         -7×2^0               U105       106         2^1×U53
107        7×2^4         -5×2^0               U107       108         2^2×U27
109        7×2^4         -3×2^0               U109       110         2^1×U55
111        7×2^4         -1×2^0               U111       112         2^4×U7
113        7×2^4          1×2^0               U113       114         2^1×U57
115        7×2^4          3×2^0               U115       116         2^2×U29
117        7×2^4          5×2^0               U117       118         2^1×U59
119        7×2^4          7×2^0               U119       120         2^3×U15
121        1×2^7         -7×2^0               U121       122         2^1×U61
123        1×2^7         -5×2^0               U123       124         2^2×U31
125        1×2^7         -3×2^0               U125       126         2^1×U63
127        1×2^7         -1×2^0               U127       128         2^7×U1
 
TABLE III: R3 VERSUS CSD: AVERAGE NUMBER OF ADDITIONS (Avg) AND UPPER BOUND (Upb)
Constant        CSD               R3                Saving
bit-width N     Avg       Upb     Avg        Upb    (Avg, %)
8               1.7882    4       1.7254     3      3.5119
16              4.4445    8       4.1050     6      7.6386
24              7.1111    12      6.2846     8      11.6226
32              9.7777    16      8.3194     10     14.9145
64              20.4444   32      14.4932*   18     29.1091
*: Obtained from 10^10 uniformly distributed random C values.
TABLE IV: R3 VERSUS DBNS: AVERAGE NUMBER OF ADDITIONS (Avg) AND UPPER BOUND (Upb)
Constant        DBNS [3]           R3               Saving
bit-width N     Avg        Upb     Avg       Upb    (Avg, %)
32              ≈9.05+*    13*     8.3194    10     8.0729
64              16.2151*   21*     14.4932   18     10.6191
+: Taken from Fig.1 in [3]; *: Obtained from 10^5 uniformly
distributed random constants.
TABLE V: R3 VERSUS CSD: SMALLEST VALUES FOR 32-BIT CONSTANT
Number of
additions (q)   CSD          R3
1               3            3
2               11           11
3               43           43
4               171          139
5               683          651
6               2731         2699
7               10923        34971
8               43691        559259
9               174763       17336475
10              699051       143163547
11              2796203      –
12              11184811     –
13              44739243     –
14              178956971    –
15              715827883    –
Our recoding is highly redundant, i.e., each |Qj| may have
several notations in Z1 and Z2 digits. We fully exploited this
property to minimize the number of adders using a C program
that exhaustively explores, for each odd |Qj|, all possible
notations and selects the least adder-consuming combination
according to the following priority order: (Aj, Bj) = (Aj, 0); (Aj,
Bj) = (1, 1); (Z1, Z2) = (1×2^7, Z2); and finally (Z1, Z2) = (Z1, 1×2^0).
The latter two couples allow the following simplification:
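$$\cdots + \left[(1 \times 2^{7}) + Z_2\right]_j \times 2^{8j} + \left[Z_1 - (1 \times 2^{0})\right]_{j+1} \times 2^{8(j+1)} + \cdots = \cdots + \left[Z_2 - 2^{7}\right]_j \times 2^{8j} + \left[Z_1\right]_{j+1} \times 2^{8(j+1)} + \cdots$$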
If none of those cases applies, the C program
proceeds in the following priority order: (Aj, Bj) = (1, 3) or (3, 1);
(Aj, Bj) = (3, 3); (Aj, Bj) = (1, 5) or (5, 1); (Aj, Bj) = (5, 5); (Aj,
Bj) = (1, 7) or (7, 1); (Aj, Bj) = (7, 7); (Aj, Bj) = (3, 5) or (5, 3);
(Aj, Bj) = (3, 7) or (7, 3); (Aj, Bj) = (5, 7) or (7, 5). This order
maximizes the occurrences of 1, then of 3, and minimizes those
of 5 and 7 in |Qj| digits, which is more likely to reduce the
number of adders in the whole recoding of C. Furthermore, we
perform common Uk digit elimination as a final
optimization step. Only odd |Qj| digits are optimized.
Optimized even digits are directly derived from odd ones using
shift operations, as indicated in Table II.
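The priority order described above can be sketched per digit as follows. This is a simplified reading, not the authors' C program: it ranks each candidate notation of a single odd |Qj| in isolation, ignores the inter-digit simplification and the common Uk elimination, and breaks ties arbitrarily. The names `candidates`, `rank` and `pick` and the tuple layout (A, p, e, B, h) are hypothetical.

```python
from itertools import product

ODD = (1, 3, 5, 7)
# Pair order quoted in the text, after the four special cases.
PAIRS = [(1, 3), (3, 3), (1, 5), (5, 5), (1, 7), (7, 7), (3, 5), (3, 7), (5, 7)]


def candidates(q):
    """Yield (A, p, e, B, h) with A*2^p + (-1)^e * B*2^h == q, for r = 8."""
    for a, p, e, b, h in product(ODD, range(8), (0, 1), (0,) + ODD, range(4)):
        if b == 0 and (e, h) != (0, 0):
            continue                      # keep a single spelling of "B = 0"
        if a * 2**p + (-1)**e * b * 2**h == q:
            yield (a, p, e, b, h)


def rank(rep):
    """Lower rank means higher priority in the order listed in the text."""
    a, p, e, b, h = rep
    if b == 0:
        return 0                          # (Aj, 0)
    if (a, b) == (1, 1):
        return 1                          # (1, 1)
    if (a, p) == (1, 7):
        return 2                          # Z1 = 1 x 2^7
    if (b, h) == (1, 0):
        return 3                          # Z2 = +/- 1 x 2^0
    return 4 + PAIRS.index(tuple(sorted((a, b))))


def pick(q):
    """Highest-priority notation of an odd digit magnitude q in 1..127."""
    return min(candidates(q), key=lambda rep: (rank(rep), rep))


if __name__ == "__main__":
    for q in (7, 37, 99):
        print(q, pick(q))
```

For 37 and 99 this sketch selects 1×2^5 + 5×2^0 and 3×2^5 + 3×2^0, which agree with the corresponding rows of Table II.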
To illustrate the idea, the product P = 23453×X is first
computed in CSD and then in R3. It gives:
P_CSD = 2^15×X - 2^13×X - 2^10×X - 2^7×X + 2^5×X - 2^2×X + X;
P_R3 = 2^8×(2^5×U3 - 2^2×U1) - (2^5×U3 + U3); U1 = X and U3 = 2×X + X.
P_CSD requires 6 operations, while P_R3 needs only 4. Note
that the naïve add-and-shift algorithm would have required 9
operations. We assume that addition and subtraction have the
same area/speed cost, and that shifts are free since they can be
realized by hard wiring, without any gates. Note that in R3
there is no overflow risk since the shift span is fully controlled.
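Both expressions for P = 23453×X can be checked with shifts, additions and subtractions only. The sketch below mirrors the two formulas term by term; the function names `p_csd` and `p_r3` are chosen for the example.

```python
def p_csd(x):
    """23453*x from the CSD expression: 6 additions/subtractions."""
    return ((x << 15) - (x << 13) - (x << 10) - (x << 7)
            + (x << 5) - (x << 2) + x)


def p_r3(x):
    """23453*x from the R3 expression: 4 additions/subtractions."""
    u1 = x
    u3 = (x << 1) + x              # operation 1: U3 = 3*X
    high = (u3 << 5) - (u1 << 2)   # operation 2: digit Q1 = 92
    low = (u3 << 5) + u3           # operation 3: |Q0| = 99, reuses U3
    return (high << 8) - low       # operation 4: Q0 is negative


if __name__ == "__main__":
    for x in (1, 7, -3):
        print(x, p_csd(x), p_r3(x), 23453 * x)
```

The saving comes from U3 being computed once and reused in both digits.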
III. RESULT COMPARISON 
In equation (4), there are N/8 iterations. Each iteration
generates a maximum of 2 partial products (PP). Thus, the
maximal number of PP is N/4. A maximum of 3 supplementary
adders are necessary in case 3×X, 5×X, and 7×X are all
invoked at the same time in the recoding. Therefore, the
maximal number of additions per coefficient (Upb) is bounded
by (N/4)+2. As for the average number of additions (Avg), it
has been exhaustively calculated for C values varying from 0 to
2^N-1, for N = 8, 16, 24, and 32. For N=64, however, we calculated the
average using 10^5, 10^6, 10^9 and 10^10 uniformly distributed
random C values. While the difference between the four
results is insignificant (<10^-3), the average decreases
as the number of C values increases, and converges to 14.4932
additions. Results are reported in Table III. For N=64, R3 uses
29.10% fewer additions than CSD. The saving seems to grow
linearly for low values of N. It will asymptotically converge to
an upper limit which is unknown for the time being.
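The bound (N/4)+2 can be made explicit as follows. This is a worked check, assuming the N/4 partial products are accumulated one addition at a time, so that summing them costs (N/4)-1 additions.

```latex
\begin{align*}
\mathrm{Upb}(N) &= \underbrace{\tfrac{N}{4} - 1}_{\text{summing } N/4 \text{ partial products}}
                 + \underbrace{3}_{3X,\ 5X,\ 7X}
                 = \tfrac{N}{4} + 2, \\
\mathrm{Upb}(16) &= 6, \qquad \mathrm{Upb}(24) = 8, \qquad
\mathrm{Upb}(32) = 10, \qquad \mathrm{Upb}(64) = 18 .
\end{align*}
```

These values match the R3 Upb column of Table III. For N=8 the table reports 3, which is below the bound of 4.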
 Regarding computation-time complexity, it is linearly 
proportional to N as shown by eq. (4). As for the storage 
complexity, a look-up table with 128 entries is required, which 
is insignificant. 
 Concerning DBNS, Dimitrov [3] calculated average and
upper-bound values from 10^5 uniformly distributed random
constants, for 32 and 64 bits only (Table IV). Note that the DBNS
upper bounds will be higher if the worst cases are not attained
by the sample of 10^5 constants.
Another performance indicator of the recoding is the
smallest value that requires q additions, for q varying from 1
to the upper bound of the recoding. Table V summarizes this
information for 32-bit constants. Note that starting from q=7,
higher values are provided by the R3 algorithm.
 
 
 
 
 
 
 
 
 
 
 
 
 
Predictability in the number of additions (Upb and Avg) and in
runtime/storage requirements indicates the capabilities and
limitations of a heuristic. Upb denotes exactly the length of
the critical path formed by successive additions, while Avg
gives an idea of the compression performance of the heuristic.
On the other hand, runtime/storage complexity helps to decide
whether the use of the heuristic is appropriate for a given
constant bit-width (N). While the latter is known for all
heuristics (Table I), addition complexity is unknown for most
of them [1][11]. Pinch was the first to set an asymptotic
complexity O(N/log(N)) for Upb [18]. Better, based on DBNS
arithmetic [19], Dimitrov [20] gave a rough evaluation of the
hidden constant (α) in the big-O notation as being 1≤α≤2. Only
CSD and R3 have exact analytic expressions for addition
complexity (only Upb for R3). For all the remaining heuristics,
no addition complexity exists. This is a real handicap, as
there is no visibility on how the heuristic evolves with respect
to N, other than by exhaustively calculating Avg (Fig. 2) and Upb,
and this is still limited to low values of N (N≤32) since excessive
compute power is required. Though the heuristics of Fig. 2
exhibit higher compression ratios than R3 for N>16, some
values of Table VI are not only greater than the ones provided
by R3, but also equal to or even greater than the Upb of R3. For
N≥128, only the Lefèvre algorithm, in O(N^3), remains practical,
because even when neglecting the hidden constant α in O(N^6),
Hcub requires more than 4398 billion iterations. Another
serious drawback of non-recoding heuristics is the overflow
risk due to uncontrolled shift spans [3]. Such a problem
never occurs in digit-recoding heuristics: CSD, DBNS and R3.
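For scale, the iteration count quoted above for N=128 can be checked by hand, with the hidden constants ignored. The O(N^3) figure is added here for comparison only.

```latex
\begin{align*}
N^{6} &= (2^{7})^{6} = 2^{42} = 4\,398\,046\,511\,104 \approx 4.4 \times 10^{12}
        && \text{(Hcub, } O(N^{6})\text{)}, \\
N^{3} &= (2^{7})^{3} = 2^{21} = 2\,097\,152 \approx 2.1 \times 10^{6}
        && \text{(Lefèvre, } O(N^{3})\text{)}.
\end{align*}
```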
It now becomes clear why, despite the large number of
existing heuristics, CSD is not only used in designing the vast
majority of LTI systems [10], but is also incorporated in most
advanced synthesis tools, such as Synopsys Design
Compiler Ultra [10][21].
IV. CONCLUSION AND FUTURE WORK 
An efficient alternative (R3) to the most commonly used
heuristic (CSD) has been proposed. Using R3 instead of CSD
in designing LTI systems leads to much lower power
consumption and higher speed. A pending issue is to determine
the analytic expression of the average number of additions
(Avg) needed by R3 as a function of the constant bit-width N.
REFERENCES 
[1] J. Thong and N. Nicolici, “An optimal and practical approach to single 
constant multiplication,” IEEE Trans. on Computer-Aided Design of 
Integrated Circuits and Systems, vol. 30, no. 9, pp. 1373–1386, 
September  2011.  
[2] A. Avizienis, “Signed-digit number representation for fast parallel
arithmetic,” IRE Trans. on Electronic Computers, vol. EC-10, No. 3, pp.
389–400, September 1961.
[3] V.S. Dimitrov, L. Imbert, and A. Zakaluzny, “Multiplication by a 
Constant is Sublinear,” Proceedings of the 18th IEEE Symposium on 
Computer Arithmetic (ARITH’18), pp. 261-268, June 2007. 
[4] V. Lefèvre,  “Multiplication by an Integer Constant,” INRIA Research 
Report, No. 4192, Lyon, France, May 2001.  
[5] N. Boullis and A. Tisserand, “Some Optimizations of Hardware 
Multiplication by Constant Matrices,” IEEE Trans. on Computers (TC), 
vol. 54, No. 10, pp. 1271-1282, October 2005. 
[6] Y. Voronenko and M. Püschel, “Multiplierless Multiple Constant
Multiplication,” ACM Trans. on Algorithms (TALG), vol. 3, No. 2,
Article 11, pp. 1-38, May 2007.
[7] A. Dempster and M. Macleod, “Using Signed-Digit Representations to 
Design Single Integer Multipliers Using Subexpression Elimination,” 
Proceedings of the IEEE International  Symp. on Circuits and Systems 
(ISCAS), vol. 3, pp. III-165–168, Vancouver, Canada, May 2004. 
[8] O. Gustafsson, A.G. Dempster, and L. Wanhammar, “Extended Results 
for Minimum-Adder Constant Integer Multipliers,” Proceedings of the 
IEEE International Symposium on Circuits and Systems (ISCAS), vol. 
1, pp. I-73 I-76, Scottsdale Arizona, USA, May 2002. 
[9] F. de Dinechin, “Multiplication by Rational Constant,” IEEE Trans. on 
Circuits and Systems II: Express Brief, vol.  59, No. 2, pp. 98–102, 
February 2012. 
[10] R. Kastner, A. Hosangadi, and F. Fallah, “Arithmetic Optimization 
Techniques for Hardware and Software Design,” Cambridge University 
Press, ISBN-13 978-0-521-88099-2, © 2010. 
[11] O. Gustafsson, “Lower Bounds for Constant Multiplication Problems,” 
IEEE Trans. on Circuits and Systems II: Express Brief, vol.  54, No. 11, 
pp. 974–978, November 2007. 
[12] R.L. Bernstein, “Multiplication by Integer Constant,” Software – 
Practice and Experience 16, 7, pp. 641-652, 1986.    
[13] A.G. Dempster and M.D. Macleod, “Use of Minimum Adder Multiplier
Blocks in FIR Digital Filters,” IEEE Trans. on Circuits and Systems-II:
Analog and Digital Signal Processing 42, 9, pp. 569-577, 1995.
[14] S. Homayoon and A. Gupta, “A Generalized Multibit Recoding of 
Two’s Complement Binary Numbers and its Proof  with Application in 
Multiplier Implementation,” IEEE Trans. on Computers (TC), vol. 39, 
N° 8, August 1990. 
[15] P.M. Seidel, L. D. McFearin, and D.W. Matula, “Secondary Radix 
Recodings for Higher Radix Multipliers,” IEEE Trans. on Computers 
(TC), vol. 54, N°2, February 2005. 
[16] A.K. Oudjida, N. Chaillet, A. Liacha, and M.L. Berrandjia "New High-
Speed and Low-Power Radix-2r Multiplication Algorithms," 
Proceedings of the 11th edition of IEEE-FTFC Low-Voltage Low-
Power Conference, ISSN: 978-1-4673-0821-2/12, Paris, June 2012.  
[17] A.K. Oudjida, N. Chaillet, A. Liacha, and M.L. Berrandjia, “A New 
Recursive Multibit Recoding Algorithm for High-Speed and Low-Power 
Multiplier," Journal of Low Power Electronics (JOLPE), vol. 8, N° 5, 
pp. 579-594, ISSN 1546-1998, American Scientific Publishers (ASP),  
December 2012. 
[18] R. G. E. Pinch, “Asymptotic Upper Bound for Multiplier Design,” 
Electronics Letters, vol. 32, N° 5, pp. 420-421, February 1996. 
[19] V.S. Dimitrov, G.A. Jullien, and W.C. Miller, “Theory and Applications 
of the Double-Base Number System,” IEEE Trans. on Computers (TC), 
vol. 48, No. 10, pp. 1098-1106, October 1999. 
[20] V.S. Dimitrov, K.U. Järvinen, and J. Adikari, “Area Efficient Multipliers
Based on Multiple-Radix Representations,” IEEE Trans. on Computers
(TC), vol. 60, N° 2, pp. 189-201, February 2011.
[21]   Synopsys Datasheet, Design Compiler Ultra-Design Compiler® at  its 
Best. Available at: www.synopsys.com.  
 
Figure 2.  Comparison of R3 with non-recoding heuristics  
                 based on average number of additions (Avg) 
TABLE VI: NUMBER OF ADDERS: SOME PECULIARITIES
Algorithm        (84AB5)H   (595959)H   (64AB55)H   (5959595B)H
                 N=20       N=24        N=24        N=32
Bernstein [12]   8G         7           7           8
Hcub* [6]        6          8E          9G          –
BHM* [13]        5          7           7           –
Lefèvre [4]      4          8E          6           11G
R3               4          5           6           8
*: Limited to 26 bits; x: Lowest number of additions; N: Constant bit-size;
E: Equal to Upb of R3; G: Greater than Upb of R3; Upb of R3 = (N/4)+2
