High Dynamic Range RNS Bases for Modular Multiplication by Shirin Rezaie et al.
Shirin Rezaie (1), Mohammad Esmaeildoust (2), Marzieh Gerami (1), Keivan Navi (2) and Omid Hashemipour (2)
 
 
1 Department of Computer, Science and Research Branch, Islamic Azad University, Tehran, Iran 
 
 
2 Faculty of Electrical and Computer Engineering, Shahid Beheshti University, GC, Tehran, Iran 
 
 
Abstract 
Modular multiplication is the most important operation in public key
cryptography algorithms such as RSA and elliptic curve cryptography.
The Residue Number System (RNS) is an efficient way to speed up these
applications because of its carry-free nature. The efficiency of
modular multiplication in RNS depends on an effective selection of
the RNS bases. This work reports efficient RNS bases which, compared
to the state of the art, offer more efficient arithmetic operations
and more efficient residue-to-binary and binary-to-residue conversion.
Modular multiplication in RNS can therefore be implemented at higher
speed. Comparison with the best work in the literature shows that the
proposed RNS bases achieve a noticeable improvement in speed.
Keywords:  Montgomery  Modular  Multiplication,  Residue 
Number  System  (RNS),  modular  arithmetic,  Elliptic  curve 
cryptography (ECC). 
1. Introduction 
The Residue Number System (RNS) has attracted growing attention from
researchers in recent years for its ability to perform fast arithmetic
operations such as addition, subtraction and multiplication [1].
Computations in RNS are done without carry propagation between
residues and can run concurrently and independently, which speeds up
the different arithmetic components and reduces their complexity. RNS
is an instrumental tool in many applications that require high speed
computation, such as image processing, public key cryptography [2-10]
and digital signal processing (DSP) [11]. Modular multiplication is
the main part of these applications, especially of cryptography
algorithms such as RSA [2], [3] and elliptic curve cryptography (ECC)
[9], [10], which can be implemented in RNS.
Moduli selection plays an important role in the efficiency of the
arithmetic operations, forward conversion and reverse conversion,
which are the three main parts of an RNS system [12-14]. Different
moduli sets with efficient reverse converters have been proposed by
researchers. One of the best-known moduli sets is
{$2^n-1$, $2^n$, $2^n+1$}, for which forward conversion is a simple
process; the best reverse converter for this moduli set is reported
in [15].
Applications such as cryptography algorithms need larger dynamic
ranges. Therefore five-moduli sets have been presented, such as
{$2^n$, $2^{2n+1}-1$, $2^{n/2}-1$, $2^{n/2}+1$, $2^n+1$} [14] and
{$2^n$, $2^n-1$, $2^n+1$, $2^n-2^{(n+1)/2}+1$, $2^n+2^{(n+1)/2}+1$}
for odd n [16]. Modular multiplication is the main part of
cryptography algorithms such as RSA [3] and ECC [10]. One of the best
known algorithms for modular multiplication is Montgomery modular
multiplication, which does not require any division [17]. This
algorithm can be implemented in RNS using an auxiliary basis [18].
Proper selection of the RNS bases increases the efficiency of modular
multiplication. In [19], moduli of the form $2^{k_i}-1$ and
$2^{k_j}+1$, where $i, j = 1, \dots, m$, are proposed for the first
and second basis respectively. The main disadvantages of that work are
unbalanced moduli sets and inefficient multiplicative inverses, which
increase the delay of the reverse converter and reduce the efficiency
of the arithmetic operations. In [2], RNS bases are presented in the
form $2^k-c_i$ where $0 \le c_i < 2^{k/2}$. The Hamming weight of the
moduli in that work is three in the worst case. As discussed in [2],
this form of moduli allows efficient reduction. Simple multiplicative
inverses are another advantage of that work. It is the fastest RNS
implementation reported so far that considers the efficiency of the
arithmetic operations as well as residue-to-binary and
binary-to-residue conversion. The efficient reverse conversion and
arithmetic of the moduli set
{$2^n$, $2^n-1$, $2^n+1$, $2^n-2^{(n+1)/2}+1$, $2^n+2^{(n+1)/2}+1$}
[16] can be exploited by using it as the second basis of the work
reported in [2], in order to realize RNS Montgomery multiplication
with higher speed.
This paper presents efficient RNS bases for public key cryptography,
and for ECC in particular. For the first basis, moduli of the form
$2^k-c_i$ where $0 \le c_i < 2^{k/2}$ with Hamming weight three are
searched for, and for the second basis the moduli set
{$2^n$, $2^n-1$, $2^n+1$, $2^n-2^{(n+1)/2}+1$, $2^n+2^{(n+1)/2}+1$}
[16] is used. These RNS bases combine an efficient arithmetic unit
with efficient forward and reverse conversion. Moreover, for the first
basis, RNS moduli with various bit lengths are proposed in order to
achieve different dynamic ranges, especially for ECC, for example 192,
256 and 320 bits. The results show a noticeable improvement in modular
multiplication compared to the method proposed in [2].
IJCSI International Journal of Computer Science Issues, Vol. 8, Issue 4, No 1, July 2011
ISSN (Online): 1694-0814
www.IJCSI.org
This paper is organized as follows. Section 2 introduces 
RNS  and  modular  multiplication  background.  The 
proposed RNS bases are detailed in section 3. Comparison 
with other RNS bases is presented in Section 4 and finally 
section 5 concludes the paper. 
2. Related background 
2.1 Overview of RNS 
We consider a set of integers $(p_1, p_2, \dots, p_m)$, called an RNS
basis, with $M = \prod_{i=1}^{m} p_i$. The $p_i$ are pairwise
relatively prime, i.e. $\gcd(p_i, p_j) = 1$ for $1 \le i, j \le m$,
$i \ne j$. The RNS representation of an integer $X \in [0, M)$ is
$(x_1, x_2, \dots, x_m)$, where $x_i = X \bmod p_i$. Several
algorithms exist for the reverse converter, which translates the
residues into the equivalent weighted number. Mixed Radix Conversion
(MRC) is one of them and is calculated by:
(1)  $X = v_m \prod_{i=1}^{m-1} p_i + \dots + v_3 p_2 p_1 + v_2 p_1 + v_1$
(2)  $X = v_1 + p_1 (v_2 + p_2 (v_3 + \dots + p_{m-1} v_m) \dots)$
Where
$v_1 = x_1$
$v_2 = \left| (x_2 - v_1) \, |p_1^{-1}|_{p_2} \right|_{p_2}$
$v_3 = \left| \left( (x_3 - v_1) \, |p_1^{-1}|_{p_3} - v_2 \right) |p_2^{-1}|_{p_3} \right|_{p_3}$
And in the general case:
$v_m = \left| \left( \left( (x_m - v_1) \, |p_1^{-1}|_{p_m} - v_2 \right) |p_2^{-1}|_{p_m} - \dots - v_{m-1} \right) |p_{m-1}^{-1}|_{p_m} \right|_{p_m}$
Here $|p_i^{-1}|_{p_j}$ is the multiplicative inverse of $p_i$ modulo $p_j$.
The other algorithm is the Chinese Remainder Theorem (CRT), which
converts a residue number into the weighted number X as follows:
(3)  $X = \left| \sum_{i=1}^{m} |x_i N_i|_{p_i} M_i \right|_M$
Where $M = p_1 p_2 \dots p_m$, $M_i = M / p_i$ and
$N_i = |M_i^{-1}|_{p_i}$ is the multiplicative inverse of $M_i$ modulo
$p_i$. The CRT is implemented in parallel channels followed by a
modulo M adder, which is very large, whereas MRC is a sequential
algorithm. For moduli sets with more than four moduli, a combination
of these two algorithms can be applied to obtain a faster reverse
converter [13-14].
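As an illustration of Eq. 1 to Eq. 3, the following minimal Python sketch computes the mixed-radix digits and reconstructs X by both MRC and CRT. The function names and the toy basis (3, 5, 7) are assumptions chosen for the example and are not taken from the paper; `pow(a, -1, p)` (Python 3.8 or later) supplies the multiplicative inverses.

```python
from math import prod

def to_rns(x, basis):
    """Residues x_i = X mod p_i."""
    return [x % p for p in basis]

def mrc_digits(residues, basis):
    """Mixed-radix digits v_1..v_m, computed with the nested formula for v_m."""
    digits = []
    for j, p_j in enumerate(basis):
        t = residues[j]
        for i in range(j):
            # ((t - v_i) * p_i^{-1}) mod p_j
            t = ((t - digits[i]) * pow(basis[i], -1, p_j)) % p_j
        digits.append(t % p_j)
    return digits

def mrc_to_int(digits, basis):
    """Eq. 2 in Horner form: v_1 + p_1 (v_2 + p_2 (... + p_{m-1} v_m))."""
    acc = digits[-1]
    for v, p in zip(reversed(digits[:-1]), reversed(basis[:-1])):
        acc = v + p * acc
    return acc

def crt_to_int(residues, basis):
    """Eq. 3: X = | sum_i |x_i N_i|_{p_i} M_i |_M."""
    M = prod(basis)
    total = 0
    for x_i, p_i in zip(residues, basis):
        M_i = M // p_i
        N_i = pow(M_i, -1, p_i)      # inverse of M_i modulo p_i
        total += ((x_i * N_i) % p_i) * M_i
    return total % M
```

For the basis (3, 5, 7) and X = 52, the residues are (1, 2, 3) and the mixed-radix digits are also (1, 2, 3), since 52 = 1 + 3(2 + 5·3).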
2.2 Overview of RNS Montgomery multiplication 
In this section we discuss the calculation of Montgomery modular
multiplication in RNS introduced in [2]. Consider two large numbers X
and Y, and two bases $B_m = \{p_1, \dots, p_m\}$ and
$B'_m = \{p'_1, \dots, p'_m\}$, where $M = \prod_{i=1}^{m} p_i$ and
$M' = \prod_{i=1}^{m} p'_i$ are the products of the elements of the
RNS bases. The RNS representations of X and Y are
$(x_1, \dots, x_m)$ and $(y_1, \dots, y_m)$ in the first basis, and
$(x'_1, \dots, x'_m)$ and $(y'_1, \dots, y'_m)$ in the second basis.
We consider a modulus T where T < M < M' and
gcd(T, M) = gcd(T, M') = gcd(M, M') = 1. The term
$X \times Y \times M^{-1} \bmod T$ can be calculated by modular
multiplication as follows:
RNS Montgomery Multiplication
1. Consider D as the product of X and Y in the two bases $B_m$ and
$B'_m$. This means $d_i = |x_i \times y_i|_{p_i}$ in the first basis
and $d'_i = |x'_i \times y'_i|_{p'_i}$ for i = 1,…, m in the auxiliary
basis, and in the general case D = X × Y.
2. Consider $Q = |D \times T^{-1}|_M$, which is evaluated only in the
first basis, thus $q_i = \left| d_i \times |T^{-1}|_{p_i} \right|_{p_i}$.
3. The representation of Q is extended to the auxiliary basis $B'_m$.
4. Consider $R = |(D - Q \times T) \times M^{-1}|_{M'}$, which is
computed only in the auxiliary basis $B'_m$. Thus
$r'_i = \left| (d'_i - q'_i \times T) \times |M^{-1}|_{p'_i} \right|_{p'_i}$.
5. The representation of R is extended to the first basis $B_m$.
 
3. Proposing RNS bases  
The basic operations of RNS Montgomery multiplication are two base
conversions, which include several products, and one addition [2].
Proper selection of the RNS bases therefore speeds up these operations
in each modulus and allows faster forward and reverse conversion.
 
3.1 Selecting RNS bases for modular multiplication
The form of the moduli is very important for the efficiency of
modular multiplication, so selecting efficient RNS bases is the main
purpose of this work. In [2], RNS bases are presented in the form
$2^k-c_i$ where $0 \le c_i < 2^{k/2}$. Reduction modulo $2^k-c_i$ is
easy, and arithmetic operations are more efficient than with general
moduli [2]. Simple multiplicative inverses are another advantage of
the work reported in [2]: they allow the multiplications required to
calculate the $v_i$ in Eq. 2 to be replaced by a few shifts and
additions. The cost of reduction in [2] for moduli of the form
$2^k-c_i$ is reported as $2w(c_i)+2$ additions of k-bit words, where
$w(c_i)$ is the Hamming weight of $c_i$.
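To make the reduction argument concrete, here is a minimal Python sketch of reduction modulo $2^k - 2^t - 1$, i.e. $c = 2^t + 1$ with Hamming weight two. It relies on the congruence $2^k \equiv 2^t + 1$, so the high part of the operand is folded back with one shift and two additions per pass. The function name and the loop structure are assumptions for illustration; they are not the circuit of [2].

```python
def reduce_pseudo_mersenne(x, k, t):
    """Reduce a non-negative x modulo p = 2^k - 2^t - 1 with shifts and adds."""
    p = (1 << k) - (1 << t) - 1
    mask = (1 << k) - 1
    while x >> k:                      # fold until x fits in k bits
        hi, lo = x >> k, x & mask
        x = lo + (hi << t) + hi        # hi * 2^k is congruent to hi * (2^t + 1)
    if x >= p:                         # x < 2^k here, one subtraction suffices
        x -= p
    return x
```

Each pass subtracts a multiple of p from x, so the loop terminates, and a product of two k-bit residues is typically reduced in two or three passes.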
In order to increase the efficiency of the arithmetic unit and the
speed of the reverse converters, the RNS moduli set
{$2^n$, $2^n-1$, $2^n+1$, $2^n-2^{(n+1)/2}+1$, $2^n+2^{(n+1)/2}+1$}
[16] is used for the second basis. For the first basis, as in the
method proposed in [2], an exhaustive search is done for moduli of the
form $2^k-c_i$ where $0 \le c_i < 2^{k/2}$. Although the first basis
has simple multiplicative inverses [2], the second basis offers more
efficient arithmetic operations and more efficient forward and reverse
converters.
Three pairs of RNS bases with different dynamic ranges are proposed,
as shown in Table 1. Selecting RNS moduli sets with various bit
lengths gives different dynamic ranges suitable for ECC, for example
192, 256 and 320 bits. The reverse converter for the moduli set
{$2^n$, $2^n-1$, $2^n+1$, $2^n-2^{(n+1)/2}+1$, $2^n+2^{(n+1)/2}+1$}
[16] is designed for odd n; therefore k is taken even in the first
basis and n is taken as k+1, as shown in Table 1.
Table 1: Proposed RNS bases for various dynamic ranges
RNS bases | First basis $B_m$ | Auxiliary basis $B'_m$
The first 5-moduli RNS bases | $2^{64}-2^{10}-1$, $2^{64}-2^{31}-1$, $2^{64}-2^{16}-1$, $2^{64}-2^{19}-1$, $2^{64}-2^{20}-1$ | $2^{65}$, $2^{65}-1$, $2^{65}+1$, $2^{65}-2^{33}+1$, $2^{65}+2^{33}+1$
The second 5-moduli RNS bases | $2^{52}-2^{10}-1$, $2^{52}-2^{31}-1$, $2^{52}-2^{15}-1$, $2^{52}-2^{19}-1$, $2^{52}-2^{20}-1$ | $2^{53}$, $2^{53}-1$, $2^{53}+1$, $2^{53}-2^{27}+1$, $2^{53}+2^{27}+1$
The third 5-moduli RNS bases | $2^{40}-2^{8}-1$, $2^{40}-2^{10}-1$, $2^{40}-2^{16}-1$, $2^{40}-2^{19}-1$, $2^{40}-2^{20}-1$ | $2^{41}$, $2^{41}-1$, $2^{41}+1$, $2^{41}-2^{21}+1$, $2^{41}+2^{21}+1$
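A small Python sketch can help check a candidate pair of bases such as those in Table 1: it builds both bases from k, the exponents $t_i$ and n, tests pairwise coprimality and reports the dynamic range in bits. The helper names are hypothetical. The sketch does not assert anything about the first-basis entries of Table 1 beyond their bit length; whether a particular set of exponents is pairwise coprime is left for the reader to run.

```python
from math import gcd, prod
from itertools import combinations

def first_basis(k, exponents):
    """Moduli 2^k - 2^t - 1 for each exponent t."""
    return [(1 << k) - (1 << t) - 1 for t in exponents]

def auxiliary_basis(n):
    """{2^n, 2^n - 1, 2^n + 1, 2^n - 2^((n+1)/2) + 1, 2^n + 2^((n+1)/2) + 1}, n odd."""
    s = 1 << ((n + 1) // 2)
    return [1 << n, (1 << n) - 1, (1 << n) + 1,
            (1 << n) - s + 1, (1 << n) + s + 1]

def pairwise_coprime(basis):
    return all(gcd(a, b) == 1 for a, b in combinations(basis, 2))

def dynamic_range_bits(basis):
    """Bit length of the product M of the moduli."""
    return prod(basis).bit_length()
```

For odd n the product of the auxiliary basis collapses to $2^n (2^{4n} - 1)$, because $(2^n - 2^{(n+1)/2} + 1)(2^n + 2^{(n+1)/2} + 1) = 2^{2n} + 1$; this gives a 5n-bit range, slightly larger than the 5k-bit range of the first basis.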
 
 
The conversion from one basis to the other, needed in steps 3 and 5 of
the modular multiplication algorithm described in Section 2.2, is
shown in Figure 1. The delay of the conversion from the first basis to
the second basis and vice versa must be considered in order to obtain
the overall delay.

Fig. 1: (A) conversion from the first to the second basis, (B) conversion from the second to the first basis
 
3.2 RNS to RNS conversion from first to second basis
As shown in Figure 1, RNS to RNS conversion from the first to the
second basis includes two steps: RNS to mixed radix system (MRS) in
the first basis, and MRS to RNS from the first to the second basis.
Eq. 4 shows the delay of these conversions:

(4)  $Delay_{RNS-RNS} = Delay_{RNS-MRS} + Delay_{MRS-RNS}$

Based on [2], the delay of RNS to MRS conversion in the first basis is:

(5)  $Delay_{RNS-MRS} = \sum_{i=1}^{m-1} \max_{j=2,\dots,m;\, i<j} \left( w(|p_i^{-1}|_{p_j}) + 2 w(c_j) + 4 \right) k D_{FA}$

Where $|p_i^{-1}|_{p_j}$ is the multiplicative inverse of $p_i$ modulo
$p_j$, $w(c_j)$ is the Hamming weight of $c_j$, $D_{FA}$ is the delay
of a one-bit full adder and m is the number of moduli in each basis.
For a five-moduli set, Eq. 5 becomes:

(6)  $Delay_{RNS-MRS} = \sum_{i=1}^{4} \max_{j=2,\dots,5;\, i<j} \left( w(|p_i^{-1}|_{p_j}) + 2 w(c_j) + 4 \right) k D_{FA}$

The delay of RNS to MRS conversion from the first to the second basis
for different bit lengths is shown in Table 2. Note that the moduli in
the first basis are the same as in [2]. Therefore the RNS to MRS
conversion from the first to the second basis has the same delay as
in [2].
Table 2: Cost of conversion from RNS to MRS for various bit lengths
Key length | Delay of conversion from RNS to MRS
320 | 5632 $D_{FA}$
256 | 4836 $D_{FA}$
192 | 3220 $D_{FA}$
 
After the MRS digits are calculated, they must be converted to
residues in the next basis. Since the conversion of MRS digits to
residues in the second basis can be performed in parallel, a hardware
implementation is given for the critical moduli of the second basis,
which are $2^n+2^{(n+1)/2}+1$ and $2^n-2^{(n+1)/2}+1$. The proposed
hardware implementation, presented in the following subsections, has
delay
(7)  $Delay_{MRS-RNS} = \sum_{i=1}^{m-1} \left( MA(2^n \pm 2^{(n+1)/2} + 1) + 3\,CSA + CPA \right) D_{FA}$

Where $MA(2^n \pm 2^{(n+1)/2}+1)$ is the modular adder modulo
$2^n \pm 2^{(n+1)/2}+1$, CSA represents a carry save adder and CPA is
the carry propagate adder, for which a ripple carry adder is used in
this design. Based on the hardware and delay obtained for reduction
modulo $2^n \pm 2^{(n+1)/2}+1$, MRS to RNS conversion in the second
basis has delay
(8)  $Delay_{MRS-RNS} = \sum_{i=1}^{m-1} \left( MA(2^n \pm 2^{(n+1)/2} + 1) + 3\,CSA + CPA \right) D_{FA}$

By using the modulo m adder of [21] and considering a delay of 2n full
adders for the modulus $2^n-2^{(n+1)/2}+1$, we have:
(9)  $Delay_{MRS-RNS} = \sum_{i=1}^{4} (2n + 4 + (n+3)) D_{FA} = (12n + 28) D_{FA}$

Similarly, considering a delay of (2n+2) full adders for the modulus
$2^n+2^{(n+1)/2}+1$, we have
(10)  $Delay_{MRS-RNS} = \sum_{i=1}^{4} (2n + 6 + (n+3)) D_{FA} = (12n + 30) D_{FA}$

 
3.2.1 Reduction modulo $2^n-2^{(n+1)/2}+1$
Efficient binary to residue conversions for the moduli $2^n$, $2^n-1$
and $2^n+1$ have been proposed in [22] and can be employed for MRS to
RNS conversion in this work. To calculate the MRS to RNS delay, the
moduli of the form $2^n-2^{(n+1)/2}+1$ and $2^n+2^{(n+1)/2}+1$ must be
considered. Reduction modulo $2^n-2^{(n+1)/2}+1$ is based on Eq. 2,
which we rewrite for simplicity:
(11)  $x_i = \left| v_1 + p_1 (v_2 + p_2 (v_3 + \dots + p_{m-1} v_m) \dots) \right|_{p_j}$

Where $p_1, p_2, \dots, p_{m-1}$ are moduli of the form
$2^k-2^{t_i}-1$ and $p_j$ is $2^n-2^{(n+1)/2}+1$. Therefore Eq. 11 can
be rewritten as:

(12)  $x_i = \left| v_1 + (2^k - 2^{t_1} - 1)(v_2 + (2^k - 2^{t_2} - 1)(\underbrace{v_3 + (2^k - 2^{t_3} - 1)(v_4 + \dots)}_{L}) \dots) \right|_{2^n - 2^{(n+1)/2} + 1}$

Note that n = k+1 in Eq. 12. Taking L as the basic operation in Eq. 12
results in
(13)  $L = \left| v_i + v_{i+1}\underbrace{0 \dots 0}_{k} - v_{i+1}\underbrace{0 \dots 0}_{t_i} - v_{i+1} \right|_{2^n - 2^{(n+1)/2} + 1}$
where $v_{i+1}$ followed by k (or $t_i$) zeros denotes $v_{i+1}$
shifted left by k (or $t_i$) bits.
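The nested evaluation of Eq. 11 to Eq. 13 can be sketched in Python as follows, assuming first-basis moduli $p_i = 2^k - 2^{t_i} - 1$ so that each multiplication by $p_i$ becomes two shifts and two subtractions, as in Eq. 13. The function name and the toy parameters are assumptions; the general `%` operator stands in for the dedicated reduction hardware described in this subsection.

```python
def mrs_to_residue(digits, k, exponents, p_target):
    """Evaluate v_1 + p_1 (v_2 + p_2 (... + p_{m-1} v_m)) modulo p_target.

    digits    : mixed-radix digits v_1..v_m
    exponents : t_1..t_{m-1}, with p_i = 2^k - 2^t_i - 1
    """
    acc = digits[-1]
    for v, t in zip(reversed(digits[:-1]), reversed(exponents)):
        # Basic operation of Eq. 13: v_i + (acc << k) - (acc << t_i) - acc
        acc = (v + (acc << k) - (acc << t) - acc) % p_target
    return acc
```

For example, with k = 8, exponents (1, 2, 3) and target modulus $2^9 - 2^5 + 1 = 481$ (n = 9), the result equals the directly computed value of $v_1 + 253(v_2 + 251(v_3 + 247 v_4)) \bmod 481$.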
               
A (k+1)-bit separation results in
(14)  $L = \left| v_i^1 + v_{i+1}^1 + v_{i+1}^2 - v_{i+1}^3 - v_{i+1}^4 - v_{i+1}^5 \right|_{2^n - 2^{(n+1)/2} + 1}$
Where
$v_i^1 = 0\, v_i$
$v_{i+1}^1 = v_{i+1,0}\, \underbrace{0 \dots 0}_{k}$
$v_{i+1}^2 = 0\, 0\, v_{i+1,k-1} \dots v_{i+1,1}$
$v'_{i+1} = v_{i+1,0}\, 0\, v_{i+1,k-1} \dots v_{i+1,1}$ (the merge of $v_{i+1}^1$ and $v_{i+1}^2$)
$v_{i+1}^3 = v_{i+1,k-t_i} \dots v_{i+1,0}\, \underbrace{0 \dots 0}_{t_i}$
$v_{i+1}^4 = \underbrace{0 \dots 0}_{k-t_i+2}\, v_{i+1,k-1} \dots v_{i+1,k-t_i+1}$
$v''_{i+1} = v_{i+1,k-t_i} \dots v_{i+1,0}\, 0\, v_{i+1,k-1} \dots v_{i+1,k-t_i+1}$ (the merge of $v_{i+1}^3$ and $v_{i+1}^4$)
$v_{i+1}^5 = 0\, v_{i+1}$
A negative number modulo $2^n-2^{(n+1)/2}+1$ can be expressed as
$|-v|_{2^n - 2^{(n+1)/2} + 1} = \left| (2^n - 2^{(n+1)/2} + 1) - v \right|_{2^n - 2^{(n+1)/2} + 1}$
$\qquad = \left| (2^n - 1 - v) - 2^{(n+1)/2} + 2 \right|_{2^n - 2^{(n+1)/2} + 1}$
$\qquad = \left| \bar{v} + (-2^{(n+1)/2} + 2) \right|_{2^n - 2^{(n+1)/2} + 1}$
where $\bar{v} = 2^n - 1 - v$ is the n-bit one's complement of v.
Since n is 41, 53 or 65 as shown in Table 1, the value
$(-2^{(n+1)/2}+2)$ can be computed in advance for each n. Therefore
$(-2^{(n+1)/2}+2)$ is treated as a constant r determined by n. Thus
(15)  $L = \left| v_i^1 + v'_{i+1} + \bar{v}''_{i+1} + \bar{v}_{i+1}^5 + 2r \right|_{2^n - 2^{(n+1)/2} + 1}$
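The negation identity used above is easy to check numerically. The Python sketch below computes $|-v|$ from the n-bit one's complement of v plus the constant r, for the modulus $2^n - 2^{(n+1)/2} + 1$ (sign = -1); passing sign = +1 applies the same reasoning to the modulus $2^n + 2^{(n+1)/2} + 1$, where the constant becomes $2^{(n+1)/2} + 2$. The function name and the `sign` parameter are assumptions made for this illustration.

```python
def negate_mod(v, n, sign=-1):
    """|-v|_p for p = 2^n + sign * 2^((n+1)/2) + 1, with 0 <= v < 2^n and n odd."""
    s = 1 << ((n + 1) // 2)
    p = (1 << n) + sign * s + 1
    complement = ((1 << n) - 1) ^ v    # n-bit one's complement, 2^n - 1 - v
    r = sign * s + 2                   # constant determined by n
    return (complement + r) % p
```

Because r depends only on n, hardware can hard-wire it for n = 41, 53 and 65, and a subtraction then costs one bit inversion plus the addition of a constant.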
 
The hardware implementation of Eq. 15 is shown in Figure 2. In this
figure, the output S of the binary adder has n+3 bits. In order to
calculate S modulo $2^n-2^{(n+1)/2}+1$, S is split by an (n-1)-bit
separation into $S_1$ and $S_2$, where $S_1$ is the n-1 least
significant bits and $S_2$ is the rest. Therefore
$|S_1|_{2^n - 2^{(n+1)/2} + 1} = S_1$, and $S_2$ is applied to a
combinational circuit that computes $|S_2|_{2^n - 2^{(n+1)/2} + 1}$
[20] and produces the variable $S_3$. Finally a modular adder is used
to calculate the result.

Fig. 2: Hardware implementation of the reduction of L modulo $2^n-2^{(n+1)/2}+1$. The inputs $v_i^1$, $v'_{i+1}$, $v''_{i+1}$, $v_{i+1}^5$ and $2r$ pass through three n-bit CSAs and a binary adder; the (n-1)-bit LSB part $S_1$ and the output $S_3$ of the combinational circuit (fed by the 3-bit MSB part $S_2$) enter the modular adder.
 
Since L has (k+1) bits, in the next step
$L' = v_i + (2^k - 2^{t_i} - 1) \times L$ must be calculated as follows:
(16)  $L' = \left| v_i + (2^k - 2^{t_i} - 1) L \right|_{2^n - 2^{(n+1)/2} + 1}$
(17)  $L' = \left| v_i^1 + L^1 + L^2 + \bar{L}^3 + \bar{L}^4 + \bar{L} + 3r \right|_{2^n - 2^{(n+1)/2} + 1}$

Where
$v_i^1 = 0\, v_i$
$L^1 = L_0\, \underbrace{0 \dots 0}_{k}$
$L^2 = 0\, L_k \dots L_1$
$L' = L_0\, L_k \dots L_1$ (the merge of $L^1$ and $L^2$)
$L^3 = L_{k-t_i} \dots L_0\, \underbrace{0 \dots 0}_{t_i}$
$L^4 = \underbrace{0 \dots 0}_{k-t_i+1}\, L_k \dots L_{k-t_i+1}$
$L'' = L_{k-t_i} \dots L_0\, L_k \dots L_{k-t_i+1}$ (the merge of $L^3$ and $L^4$)

So Eq. 17 changes to
(18)  $L' = \left| v_i^1 + L' + \bar{L}'' + \bar{L} + 2r \right|_{2^n - 2^{(n+1)/2} + 1}$
where, on the right-hand side, $L'$ and $L''$ denote the merged words
defined above.
        
The hardware implementation of L' is similar to Figure 2. Based on
this hardware implementation, the delay and area of the conversion
from MRS to RNS can be calculated as
(19)  $Delay_{MRS-RNS} = \sum_{i=1}^{m-1} \left( MA(2^n - 2^{(n+1)/2} + 1) + 3\,CSA + CPA \right) D_{FA}$
(20)  $Area_{MRS-RNS} = \sum_{i=1}^{m-1} \left( MA(2^n - 2^{(n+1)/2} + 1) + 3\,CSA + CPA \right) A_{FA}$

Where $A_{FA}$ is the area of a one-bit full adder.
 
3.2.2 Reduction modulo $2^n+2^{(n+1)/2}+1$
Based on Eq. 12 we have
(21)  $x_i = \left| v_1 + (2^k - 2^{t_1} - 1)(v_2 + (2^k - 2^{t_2} - 1)(\underbrace{v_3 + (2^k - 2^{t_3} - 1)(v_4 + \dots)}_{I}) \dots) \right|_{2^n + 2^{(n+1)/2} + 1}$

Note that n = k+1 in Eq. 21. The basic operation in Eq. 21 is the
calculation of I, which can be done as:
(22)  $I = \left| v_i + v_{i+1}\underbrace{0 \dots 0}_{k} - v_{i+1}\underbrace{0 \dots 0}_{t_i} - v_{i+1} \right|_{2^n + 2^{(n+1)/2} + 1}$
 
A (k+2)-bit separation results in
(23)  $I = \left| v_i^1 + v_{i+1}^1 + v_{i+1}^2 - v_{i+1}^3 - v_{i+1}^4 - v_{i+1}^5 \right|_{2^n + 2^{(n+1)/2} + 1}$

Where
$v_i^1 = 0\, 0\, v_i$
$v_{i+1}^1 = v_{i+1,1}\, v_{i+1,0}\, \underbrace{0 \dots 0}_{k}$
$v_{i+1}^2 = 0\, 0\, 0\, 0\, v_{i+1,k-1} \dots v_{i+1,2}$
$v'_{i+1} = v_{i+1,1}\, v_{i+1,0}\, 0\, 0\, v_{i+1,k-1} \dots v_{i+1,2}$ (the merge of $v_{i+1}^1$ and $v_{i+1}^2$)
$v_{i+1}^3 = v_{i+1,k-t_i+1} \dots v_{i+1,0}\, \underbrace{0 \dots 0}_{t_i}$
$v_{i+1}^4 = \underbrace{0 \dots 0}_{k-t_i+4}\, v_{i+1,k-1} \dots v_{i+1,k-t_i+2}$
$v''_{i+1} = v_{i+1,k-t_i+1} \dots v_{i+1,0}\, 0\, 0\, v_{i+1,k-1} \dots v_{i+1,k-t_i+2}$ (the merge of $v_{i+1}^3$ and $v_{i+1}^4$)
$v_{i+1}^5 = 0\, 0\, v_{i+1}$
Following the same approach used to calculate
$|-v|_{2^n - 2^{(n+1)/2} + 1}$, the value
$|-v|_{2^n + 2^{(n+1)/2} + 1}$ can be expressed as

$|-v|_{2^n + 2^{(n+1)/2} + 1} = \left| (2^n + 2^{(n+1)/2} + 1) - v \right|_{2^n + 2^{(n+1)/2} + 1}$
$\qquad = \left| (2^n - 1 - v) + 2^{(n+1)/2} + 2 \right|_{2^n + 2^{(n+1)/2} + 1}$
$\qquad = \left| \bar{v} + 2^{(n+1)/2} + 2 \right|_{2^n + 2^{(n+1)/2} + 1}$

As mentioned before, n is 41, 53 or 65, so the value
$(2^{(n+1)/2}+2)$ can be computed in advance for each n. Therefore
$(2^{(n+1)/2}+2)$ is treated as a constant r' determined by n. Thus

(24)  $I = \left| v_i^1 + v'_{i+1} + \bar{v}''_{i+1} + \bar{v}_{i+1}^5 + 2r' \right|_{2^n + 2^{(n+1)/2} + 1}$
 
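With I in Eq. 21 written in (k+2)-bit binary form,
$v_i + (2^k - 2^{t_i} - 1) \times I$ must be calculated. Therefore we
have:
(25)  $I' = \left| v_i + (2^k - 2^{t_i} - 1) I \right|_{2^n + 2^{(n+1)/2} + 1}$
(26)  $I' = \left| v_i^1 + I^1 + I^2 + \bar{I}^3 + \bar{I}^4 + \bar{I} + 3r' \right|_{2^n + 2^{(n+1)/2} + 1}$

Where
$v_i^1 = 0\, 0\, v_i$
$I^1 = I_1\, I_0\, \underbrace{0 \dots 0}_{k}$
$I^2 = 0\, 0\, I_{k+1} \dots I_2$
$I' = I_1\, I_0\, I_{k+1} \dots I_2$ (the merge of $I^1$ and $I^2$)
$I^3 = I_{k-t_i+1} \dots I_0\, \underbrace{0 \dots 0}_{t_i}$
$I^4 = \underbrace{0 \dots 0}_{k-t_i+2}\, I_{k+1} \dots I_{k-t_i+2}$
$I'' = I_{k-t_i+1} \dots I_0\, I_{k+1} \dots I_{k-t_i+2}$ (the merge of $I^3$ and $I^4$)

So Eq. 26 changes to
(27)  $I' = \left| v_i^1 + I' + \bar{I}'' + \bar{I} + 2r' \right|_{2^n + 2^{(n+1)/2} + 1}$
where, on the right-hand side, $I'$ and $I''$ denote the merged words
defined above.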
 
The hardware implementation of Eq. 27 is also similar to Figure 2.
Therefore the delay of conversion from MRS to RNS for the moduli
$2^n-2^{(n+1)/2}+1$ and $2^n+2^{(n+1)/2}+1$ is equal to:

(28)  $Delay_{MRS-RNS} = \sum_{i=1}^{m-1} \left( MA(2^n \pm 2^{(n+1)/2} + 1) + 3\,CSA + CPA \right) D_{FA}$

3.3 RNS to RNS conversion from second to first basis
As shown in Figure 1, RNS Montgomery modular multiplication requires a
conversion from the second to the first basis. The second basis is the
five-moduli set
{$2^n$, $2^n-1$, $2^n+1$, $2^n-2^{(n+1)/2}+1$, $2^n+2^{(n+1)/2}+1$}
[16]. The delay of conversion from the second to the first basis can
be calculated as:

(29)  $Delay_{RNS-RNS} = Delay_{RNS-Weighted} + Delay_{Weighted-RNS}$

The delay and area of conversion from RNS to a weighted number for
this five-moduli set, based on [16], are shown in Table 3.
Table 3: Delay and area of reverse conversion in the second RNS basis
RNS basis | Area of conversion from RNS to weighted | Delay of conversion from RNS to weighted
{$2^n$, $2^n-1$, $2^n+1$, $2^n-2^{(n+1)/2}+1$, $2^n+2^{(n+1)/2}+1$} [16] | $(19n)A_{FA} + (7n)A_{XOR} + (7n)A_{AND} + (2n)A_{XNOR} + (2n)A_{OR} + (4n)A_{NOT}$ | $(8n+4)D_{FA}$
 
The forward converter for moduli of the form $2^k-2^{t_i}-1$, where
$0 < t_i < k/2$, is presented in [2]. These moduli have a small
Hamming weight and simple multiplicative inverses. In [2], to
calculate the residues from the MRS digits for a five-moduli set based
on Eq. 2, the following operation must be considered:
(30)  $x_j = \left| v_1 + p_1 (v_2 + p_2 (v_3 + p_3 \underbrace{(v_4 + p_4 v_5)}_{H})) \right|_{2^k - 2^{t_j} - 1}$

The delay of H is $w(c_i) + 2w(c'_j) + 2$ additions of k-bit words,
where $w(c_i)$ and $w(c'_j)$ are the Hamming weights of $c_i$ and
$c'_j$ in the moduli $2^k-2^{t_i}-1$ and $2^k-2^{t_j}-1$,
respectively. In [2] the total delay for m moduli is reported as

(31)  $Delay_{MRS-RNS} = \sum_{i=1}^{m-1} \max_{j=1,\dots,m} \left( w(c_i) + 2 w(c'_j) + 2 \right) k D_{FA}$

For the forward converter modulo $2^k-2^{t_j}-1$ with a 5k-bit dynamic
range we have

(32)  $X = \sum_{i=0}^{4} 2^{ik} x_i = 2^{4k} x_4 + 2^{3k} x_3 + 2^{2k} x_2 + 2^{k} x_1 + x_0$

Eq. 32 can be rewritten as

(33)  $X = 2^k (2^k (2^k (\underbrace{2^k x_4 + x_3}_{z}) + x_2) + x_1) + x_0$

Unlike the calculation needed in Eq. 31, the multiplier in the basic
operation z of Eq. 33 is a power of two, so the Hamming weight term
$w(c_i)$ is equal to zero. Therefore the delay of reduction modulo
$2^k-2^{t_j}-1$ reported in Eq. 31 changes, for the proposed RNS
bases, to
(34)  $Delay_{Weighted-RNS} = \sum_{i=1}^{m-1} \left( 2 w(c'_j) + 2 \right) k D_{FA}$

Therefore, for a five-moduli set, Eq. 34 becomes
(35)  $Delay_{Weighted-RNS} = \sum_{i=1}^{4} \left( 2 w(c'_j) + 2 \right) k D_{FA}$

 
The total delays from the second to the first basis for the proposed
moduli sets are shown in Table 4.
Table 4: Total cost of conversion from second to first basis
Key length | Proposed 5-moduli bases
320 | 2060 $D_{FA}$
256 | 1676 $D_{FA}$
192 | 1292 $D_{FA}$
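The entries of Table 4 can be reproduced from Eq. 29, Table 3 and Eq. 35 with a few lines of Python. The sketch assumes $w(c'_j) = 2$ for every first-basis modulus (since $c = 2^t + 1$) and pairs each key length with the (k, n) values of Table 1; both the function name and this pairing are assumptions of the example.

```python
def second_to_first_delay(n, k, w_c=2, m=5):
    """Delay, in units of D_FA, of RNS-to-RNS conversion from second to first basis."""
    reverse = 8 * n + 4                      # RNS -> weighted, Table 3
    forward = (m - 1) * (2 * w_c + 2) * k    # weighted -> RNS, Eq. 35
    return reverse + forward

# (key length, k, n) taken from Table 1
for key_length, k, n in [(320, 64, 65), (256, 52, 53), (192, 40, 41)]:
    print(key_length, second_to_first_delay(n, k))
```

Under these assumptions the three values are 2060, 1676 and 1292 full-adder delays, matching Table 4.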
4. Complexity of modular multiplication and comparison
The main aim of this work is to increase the efficiency of the
arithmetic operations. Since the second basis of our approach is the
moduli set
{$2^n$, $2^n-1$, $2^n+1$, $2^n-2^{(n+1)/2}+1$, $2^n+2^{(n+1)/2}+1$},
reduction in this moduli set can be implemented with a simpler process
(four levels of CSA and $MA(2^n \pm 2^{(n+1)/2}+1)$ in the worst
case). With this moduli set in the second basis, MRS to RNS conversion
is implemented with faster hardware. As shown in Table 5, the proposed
RNS bases achieve a noticeable improvement in the delay of the RNS to
RNS conversions required in RNS Montgomery multiplication.
Table 5: Comparison of the delay of different RNS bases for $P_{256}$
RNS bases | Total delay | Improvement (%)
5-moduli bases [2] | 11840 $D_{FA}$ | -
The first proposed 5-moduli bases | 8502 $D_{FA}$ | 28%
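The total in Table 5 can be traced back to the earlier delay figures with the short Python sketch below. It assumes that the proposed entry corresponds to the first 5-moduli bases of Table 1 (k = 64, n = 65), takes the closed forms (12n + 28) and (12n + 30) of Eq. 9 and Eq. 10 as stated, and uses the larger of the two as the critical MRS to RNS delay; the function names are hypothetical.

```python
def total_conversion_delay(rns_to_mrs, n, second_to_first):
    """Sum of both RNS-to-RNS conversions, in units of D_FA."""
    mrs_to_rns = max(12 * n + 28, 12 * n + 30)   # Eq. 9 and Eq. 10, worst case
    return rns_to_mrs + mrs_to_rns + second_to_first

def improvement_percent(baseline, proposed):
    return 100.0 * (baseline - proposed) / baseline

proposed = total_conversion_delay(rns_to_mrs=5632, n=65, second_to_first=2060)
print(proposed, round(improvement_percent(11840, proposed), 1))
```

With 5632 (Table 2), 810 (Eq. 10 at n = 65) and 2060 (Table 4), the sum is 8502 full-adder delays, an improvement of about 28% over the 11840 reported for [2].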
 
 
 
 
 
5. Conclusion
This paper presents five-moduli RNS bases that increase the efficiency
of RNS Montgomery multiplication. RNS moduli sets with various bit
lengths are proposed, covering the dynamic ranges needed for ECC with
192, 256 and 320 bits. The proposed RNS bases achieve higher speed in
RNS to RNS conversion. Compared with the best RNS bases in the
literature, the proposed five-moduli RNS bases improve the delay of
RNS to RNS conversion by 28%. Therefore more efficient modular
multiplication is achieved by the proposed RNS bases.
 
References 
[1]  K.  Navi,  A.  S.  Molahosseini,  M.  Esmaeildoust,  "How  to 
Teach Residue Number System to Computer Scientists and 
Engineers," IEEE Transactions on Education, vol. 53, no. 3, 
2010. 
[2] J. C. Bajard, M. Kaihara, T. Plantard, "Selected RNS Bases for
Modular Multiplication," 19th IEEE International Symposium on
Computer Arithmetic, pp. 25-32, 2009.
[3]  J.  C.  Bajard,  L.  Imbert,  "A  Full  RNS  Implementation  of 
RSA," IEEE Transactions on Computers, vol. 53, no. 6, pp. 
769-774, 2004. 
[4]  J.  Bajard,  L.  Didier,  and  P.  Kornerup,  "An  RNS 
Montgomery's  Modular  Multiplication  Algorithm,"  IEEE 
Trans. Computers, vol. 47, no. 2, pp. 167-178, Feb. 1998. 
[5]  J.  Bajard,  L.  Didier,  and  P.  Kornerup,  "Modular 
Multiplication  and  Base  Extensions  in  Residue  Number 
Systems,"  Proc.  15th  IEEE  Symp.  Computer  Arithmetic 
(ARITH '01), pp. 59-65, 2001. 
[6] A.F. Tenca and C.K. Koc, "A Scalable Architecture for Modular
Multiplication Based on Montgomery's Algorithm," IEEE Trans.
Computers, vol. 52, no. 9, pp. 1215-1221, Sept. 2003.
[7]  C.  McIvor,  M.  McLoone,  and  J.V.  McCanny,  "Modified 
Montgomery  Modular  Multiplication  and  RSA 
Exponentiation,"  IEE  Proc.  Computers  and  Digital 
Techniques, vol. 151, pp. 402-408, 2004. 
[8] C. McIvor, M. McLoone, J.V. McCanny, A. Daly, and W. 
Marnane,  "Fast  Montgomery  Modular  Multiplication  and 
RSA  Cryptographic  Processor  Architectures,"  Proc.  37th 
Ann.  Asilomar  Conf.  Signals,  Systems,  and  Computers, 
2003. 
[9] D. M. Schinianakis, A. P. Fournaris, H. E. Michail, A. P. 
Kakarountas, and T. Stouraitis, "An RNS Implementation of 
an  Fp  Elliptic  Curve  Point  Multiplier",  IEEE 
TRANSACTIONS ON CIRCUITS AND SYSTEMS-I, VOL. 
56, NO. 6, JUNE 2009. 
[10] D. M. Schinianakis, A. P. Kakarountas, and T. Stouraitis, 
"A  new  approach  to  elliptic  curve  cryptography:  An  RNS 
architecture,"  in  Proc.  IEEE  Mediterranean  Electrotech. 
Conf., pp. 1241-1245, May 2006. 
[11] G.C. Cardarilli, A. Nannarelli and M. Re, "Residue Number System
for Low-Power DSP Applications," Proc. of 41st IEEE Asilomar
Conference on Signals, Systems, and Computers, 2007.
[12] K. Navi, M. Esmaeildoust, A. S. Molahosseini, “A General 
Reverse  Converter  Architecture  with  Low  Complexity  and 
High  Performance”,  IEICE  TRANSACTIONS  on 
Information  and  Systems  Vol.E94-D  No.2  pp.264-273, 
2011. 
[13] A. S. Molahosseini, K. Navi, C. Dadkhah, O. Kavehei and S.
Timarchi, "Efficient Reverse Converter Designs for the new 4-Moduli
Set {$2^n-1$, $2^n$, $2^n+1$, $2^{2n+1}-1$} and {$2^n-1$, $2^n+1$,
$2^{2n}$, $2^{2n}+1$} Based on New CRTs," IEEE Transactions on
Circuits and Systems-I, vol. 57, no. 4, pp. 823-835, 2010.
[14] Mohammad Esmaeildoust, Keivan Navi and MohammadReza Taheri,
"High speed reverse converter for new five-moduli set {$2^n$,
$2^{2n+1}-1$, $2^{n/2}-1$, $2^{n/2}+1$, $2^n+1$}," IEICE Electronics
Express, Vol. 7, No. 3, pp. 118-125, 2010.
[15] Y. Wang, X. Song, M. Aboulhamid and H. Shen, "Adder based
residue to binary numbers converters for {$2^n-1$, $2^n$, $2^n+1$},"
IEEE Transactions on Signal Processing, vol. 50, no. 7, pp.
1772-1779, 2002.
[16]  A.  A.  Hiasat,  “VLSI  implementation  of  New  Arithmetic 
Residue  to  Binary  decoders,”  IEEE  Transactions  on VLSI 
systems, vol. 13, no. 1, pp. 153-158, 2005. 
[17]  P.Montgomery,  "Modular  Multiplication  without  Trial 
Division," Mathematics of Computation, vol. 44, no. 170, pp. 
519-521, Apr. 1985. 
[18]  K.C  Posch  and  R.  Posch,  Modulo  reduction  in  residue 
number  systems.  IEEE  Transaction  on  Parallel  and 
Distributed Systems, 6:5 (1995) p.: 449-454 
[19] Kooroush Manochehri Kalantari, Saadat Pour Mozafari and 
Babak  Sadeghiyan,  "Improved  RNS  for  RSA  Hardware 
Implementation", Vol. 2, No. 2&4 (b), pp. 31-39, 2004. 
[20] A.A. Hiasat, "Arithmetic binary to residue encoders for moduli
($2^n \pm 2^k + 1$)", IEE Proc.-Comput. Digit. Tech., Vol. 150, No. 6,
November 2003.
[21]  M.A.  Bayoumi,  G.A.  Jullien,  W.C.  Miller,  “A  VLSI 
implementation  of  residue  adder,”  IEEE  Transactions  on 
Circuits and Systems, vol. 34, no. 3, pp. 284-288, 1987. 
[22] B. Guan and E.V. Jones, “Fast conversion between binary 
and residue numbers, ” Electronics Letters, vol. 24, no. 19, 
pp. 1195-1197, 1988.  
 
 
 
Shirin Rezaie was born in Tehran, Iran, in 1986. She received the
B.Sc. degree from Islamic Azad University (IAU), South Tehran Branch,
Tehran, Iran in 2009. She is an M.Sc. student in Computer Architecture
at IAU, Science and Research Branch. She is working on computer
arithmetic, especially on the residue number system.

Mohammad Esmaeildoust received the B.Sc. degree in hardware
engineering from Shahed University, Tehran, Iran, in 2006, and the
M.Sc. degree in computer architecture from Shahid Beheshti University
of Technology, Tehran, Iran, in 2008. He is currently a Ph.D.
candidate in computer architecture at Shahid Beheshti University of
Technology. He is working on reconfigurable computing and computer
arithmetic, especially in the area of the residue number system.

Marzieh Gerami is an M.Sc. student in Computer Architecture at IAU,
Science and Research Branch (Tehran, Iran). She received the B.Sc.
degree from Shahid Bahonar University of Kerman, Iran in 2007. She is
working on computer arithmetic, especially on the residue number
system.

Keivan Navi received the M.Sc. degree in Electrical Engineering
(Computer Hardware) from Sharif University of Technology, Tehran,
Iran, in 1990 and the Ph.D. degree in computer architecture from Paris
XI University, Paris, France, in 1995. He is currently an Associate
Professor in the Faculty of Electrical and Computer Engineering,
Shahid Beheshti University. His research interests include the Residue
Number System, carbon nanotubes, single electron transistors,
reversible logic design, interconnection networks, and quantum
computing.

Omid Hashemipour received the BS (1985), MS (1987) and PhD (1991)
degrees in electrical engineering, all from the University of Arkansas
at Fayetteville, USA. Since 1991 he has been with the Electrical and
Computer Engineering Faculty at Shahid Beheshti University, G.C.,
Tehran, Iran, as an associate professor. His research interests
include low power, low voltage analog and digital integrated circuits.