Estimation of Hardware Complexity of Residue Number System for Signal Processing Application by Chaitali Biswas Dutta et al.
 
 
Estimation of Hardware Complexity of Residue Number System for 
Signal Processing Application 
 
 
Chaitali Biswas Dutta 
Research Scholar, Dept of 
CSE, University of  
Kalyani, India 
Asst. Prof., Dept of CA, 
GIMT, Guwahati, India 
mail.chaitali@yahoo.in
 
 
 
Partha Garai 
Machine Intelligence Unit, 
Indian Statistical Institute, 
203, BT Road, Kolkata - 
700108, India 
 
 
 
 
Amitabha Sinha
 
School of Information 
Technology, West Bengal 
University of Technology, 
BF-142, Sector-1, Salt 
Lake City, Kolkata-
700064, India 
 
Abstract 
 
Residue Number Systems (RNS) are becoming popular for designing high performance DSP processors because of their ability to offer carry free arithmetic operations. The carry free operations lead to concurrent execution of arithmetic operations on the residues. However, in RNS, moduli selection is one of the most important parameters, as it determines bit efficiency, area, power consumption, speed etc. The novelty of the architecture is shown by comparison with the different schemes reported in the literature. A ranking method is also proposed, using which moduli sets can be ranked in terms of bit efficiency and hardware complexity.
 
Keywords: Residue Number System, Moduli Selection, 
Dynamic  Range,  Hardware  Complexity,  Area 
Estimator 
 
1.  Introduction  
Weighted number systems such as the binary number system [1] and the decimal number system have a carry chain, which often limits the performance of computer arithmetic [2,3]. Among all number systems, RNS is attracting research interest because of its concurrency and carry free arithmetic operations. The Residue Number System (RNS) is a non-weighted number system. In RNS, arithmetic operations on large integers are done by splitting them into smaller residues and performing the operations independently and in parallel, thereby speeding up the whole operation.
Hardware complexity, or area overhead, is an important factor for any digital circuit. An increase in area not only raises cost because of more silicon usage but also leads to lower yield and reliability. It is a well-known fact that if the area of a single chip is large, then the chances of defects also rise, leading to lower yields. Further, a large chip generates more heat, leading to a drop in reliability. That is why area overhead or hardware complexity is an important factor that we should minimize.
Moduli selection [4,5,6] is the first step in RNS design, and it determines bit efficiency, hardware complexity, frequency of operation etc. [7]. Several moduli sets have been reported in the literature, such as $(2^n, 2^n+1, 2^n-1)$ [6], $(2^n, 2^n-1, 2^{n-1}-1)$ and $(2^{2n}+1, 2^n+1, 2^n-1)$ [7]. In [6] it was mentioned that the hardware complexity of RNS is minimum for the moduli set $\{2^{n_1}, 2^{n_1}+1, 2^{n_1}-1, 2^{n_2} \pm 1, \ldots, 2^{n_t} \pm 1\}$. In this paper we propose a new scheme to generate moduli sets and show that its hardware complexity is lower than that of [6]. The contributions of the paper are the following:
 
1.  Propose an algorithm to generate a moduli set of any finite cardinality in a given dynamic range. It may be noted that the other moduli selection schemes mentioned above do not have an algorithm to generate them; given a dynamic range, they are generated by trial and error.
2.  Show that the bit efficiency of the proposed scheme is better than that of the other schemes given in the literature.
3.  Show that the hardware complexity of the proposed scheme is lower than that of the existing schemes [6,7].
4.  Propose a heuristic to select an appropriate moduli set (in a given dynamic range) that has minimum hardware complexity, without actual synthesis.
 
 
Chaitali Biswas Dutta et al, Int. J. Comp. Tech. Appl., Vol 2 (5), 1540-1547
IJCTA | SEPT-OCT 2011 
Available online@www.ijcta.com
1540
ISSN:2229-6093 
 
2.  Overview of RNS  
An RNS represents a number X from any other number system by a set of residues $(r_0, r_1, r_2, \ldots, r_{n-1})$, obtained using a set of integers $m_0, m_1, m_2, \ldots, m_{n-1}$ called moduli. The moduli are relatively prime, that is, $\gcd(m_i, m_j) = 1$ for $i \neq j$. Let X be a decimal number and let N be the product of all the moduli; N is called the dynamic range. The RNS can then represent any number from 0 to (N-1). For a single modulus m, r = (X mod m) is the remainder of X with respect to m. The number X is therefore represented by the n-tuple $(r_0, r_1, r_2, \ldots, r_{n-1})$, where $r_i = (X \bmod m_i)$ and $0 \le i \le n-1$.
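The representation above, and the carry free arithmetic it allows, can be illustrated with a minimal Python sketch. The function names (`to_residues`, `rns_add`, `rns_mul`, `pairwise_coprime`) and the small moduli set (3, 5, 7) are illustrative assumptions, not part of the original design.

```python
from math import gcd


def pairwise_coprime(moduli):
    """True if gcd(m_i, m_j) = 1 for every pair i != j."""
    return all(
        gcd(moduli[i], moduli[j]) == 1
        for i in range(len(moduli))
        for j in range(i + 1, len(moduli))
    )


def to_residues(x, moduli):
    """Map X to the n-tuple (r_0, ..., r_{n-1}) with r_i = X mod m_i."""
    return tuple(x % m for m in moduli)


def rns_add(a, b, moduli):
    """Channel-wise addition: no carry crosses from one residue to another."""
    return tuple((ra + rb) % m for ra, rb, m in zip(a, b, moduli))


def rns_mul(a, b, moduli):
    """Channel-wise multiplication, again independent per modulus."""
    return tuple((ra * rb) % m for ra, rb, m in zip(a, b, moduli))


moduli = (3, 5, 7)  # dynamic range N = 3 * 5 * 7 = 105
x, y = to_residues(17, moduli), to_residues(20, moduli)
total = rns_add(x, y, moduli)  # equals to_residues(37, moduli)
```

Each channel works only on values smaller than its modulus, which is why the operations can run concurrently. The results are valid as long as the true sum or product stays within 0 to N-1.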
The standard arithmetic operations of addition [8,9], subtraction [10] and multiplication [11] are easily implemented in a residue number system, depending on the choice of the moduli. The Chinese Remainder Theorem (CRT) may rightly be viewed as one of the most important fundamental results in the theory of RNS. The CRT [12, 13, 14] is useful for many other operations, and above all it is very helpful for RNS to binary conversion. The New Chinese Remainder Theorems were introduced mainly for this conversion [15, 16, 17, 18]. The CRT assures that if the moduli of an RNS are chosen appropriately, then each number in the dynamic range will have a unique representation in the residue system.
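As a minimal sketch of RNS to binary conversion, the textbook form of the CRT can be written in a few lines of Python. This is the classical CRT, not the New CRT converters of [15, 16, 17]; the function name `from_residues` is an assumption, and `pow(x, -1, m)` (modular inverse) requires Python 3.8 or later.

```python
from math import prod


def from_residues(residues, moduli):
    """Recover X in [0, N-1] from its residues using the classical CRT.

    Assumes the moduli are pairwise relatively prime, so each partial
    product N / m_i has a multiplicative inverse modulo m_i.
    """
    dynamic_range = prod(moduli)  # N, the product of all moduli
    x = 0
    for r, m in zip(residues, moduli):
        partial = dynamic_range // m       # N / m_i
        inverse = pow(partial, -1, m)      # (N / m_i)^(-1) mod m_i
        x += r * partial * inverse
    return x % dynamic_range


value = from_residues((1, 0, 2), (3, 5, 7))  # residues of 100 modulo 3, 5, 7
```

Because every number in 0 to N-1 has a unique residue tuple, converting to residues and back returns the original number.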
In the literature, there are a few kinds of moduli sets. A set of any given moduli is called a general-moduli set, as it is efficient for RNS systems with a large dynamic range. The three-moduli sets
$S_{M1} = (2^n, 2^n+1, 2^n-1)$,
$S_{M2} = (2n, 2n+1, 2n-1)$,
$S_{M3} = (2^n, 2^n-1, 2^{n-1}-1)$ and
$S_{M4} = (2^{2n}+1, 2^n+1, 2^n-1)$
are four cases of the general-moduli sets, and these sets are widely used for residue number systems with a medium dynamic range [4, 11].
Given a moduli set, the hardware complexity depends on the functionalities of the RNS. In this paper we compare the hardware complexity using a case study of an RNS processor having the following functionalities:
1.  Binary to RNS conversion [18] 
2.  DEMUX 
3.  Addition  
4.  Subtraction 
5.  Multiplication 
6.  MUX 
7.  RNS to binary conversion [15, 16, 17] 
[Figure 1 block labels: B/R Conv. (two), DEMUX (two), ADD, SUB, MUL, MUX, R/B Conv., LUT; signals R1, R2, Rf; data paths of width n]
 
Figure 1: Basic Block Diagram for RNS 
Processor 
 
The  basic  block  diagram  for  the  RNS  processor  is 
given above (i.e., Figure 1). 
 
3.  Algorithm for Moduli-Set Generation 
and Estimation for Area 
In this section we describe an algorithm to generate a moduli set of any cardinality for a given precision.
 
Module find_moduli (N, n, S_M)

//Input: N (no. of bits), n (no. of moduli in the set)
//Output: S_M (efficient moduli set)
//Note: in the tests n = 3, n = 4, ..., n = p below, n is the number of moduli;
//in the expressions 2n, 2n+1 and 2n-1, 2n is the even number fixed in Step 2.

Step 1:  $x = \lceil \sqrt[n]{2^N - 1} \rceil$

Step 2:  if x is even then
             $2n = x$
         else
             $2n = x + 1$

Step 3:  When n = 3
             if $(2n)(2n+1)(2n-1) \ge (2^N - 1)$ then
                 $S_M = \{2n, 2n+1, 2n-1\}$
             else
                 n will be incremented till the condition
                 $(2n)(2n+1)(2n-1) \ge (2^N - 1)$ is satisfied.

Step 4:  if n = 4 then
             Let $k = \lceil (2^N - 1) / \{(2n)(2n+1)(2n-1)\} \rceil$
             Find the smallest number $k_1 \ge k$, where $k_1$ is relatively prime to 2n, 2n+1 and 2n-1.
             $S_M = \{2n, 2n+1, 2n-1, k_1\}$
         … …       … … … … … … … …
         … …       … … … … … … … …

Step p:  if n = p then
             $k = \lceil (2^N - 1) / \{(2n)(2n+1)(2n-1)\} \rceil$
             $k^{(1)} = \sqrt[(p-3)]{k}$
             Find the smallest number $k_1 \ge k^{(1)}$, where $k_1$ is relatively prime to 2n, 2n+1 and 2n-1.
             Therefore,
                 $S_M = \{2n, 2n+1, 2n-1, k_1\}$
             Again,
             $k = \lceil (2^N - 1) / \{(2n)(2n+1)(2n-1)(k_1)\} \rceil$
             $k^{(2)} = \sqrt[(p-4)]{k}$
             Find the smallest number $k_2 \ge k^{(2)}$, where $k_2$ is relatively prime to 2n, 2n+1, 2n-1 and $k_1$.
             Therefore,
                 $S_M = \{2n, 2n+1, 2n-1, k_1, k_2\}$
             … … … … … … … … … … … … … …
             … … … … … … … … … … … … … …
             $k = \lceil (2^N - 1) / \{(2n)(2n+1)(2n-1)(k_1) \cdots (k_{p-4})\} \rceil$
             $k^{(p-3)} = \sqrt[(p-(p-1))]{k}$
             Find the smallest number $k_{p-3} \ge k^{(p-3)}$, where $k_{p-3}$ is relatively prime to 2n, 2n+1, 2n-1, $k_1, \ldots, k_{p-4}$.
             Therefore,
                 $S_M = \{2n, 2n+1, 2n-1, k_1, k_2, k_3, \ldots, k_{p-3}\}$
End of find_moduli
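The steps of find_moduli can be sketched in Python as follows. This is one reading of the algorithm under stated assumptions rather than a definitive implementation: the rounding in Step 1 and in each k is taken to be a ceiling, candidates for a new modulus start at 2 (so the trivial modulus 1 is never chosen), and the helper names `ceil_root` and `next_coprime` are hypothetical.

```python
from math import gcd


def ceil_root(value, degree):
    """Smallest integer r >= 1 with r**degree >= value (exact integer arithmetic)."""
    r = 1
    while r ** degree < value:
        r += 1
    return r


def next_coprime(start, moduli):
    """Smallest integer >= max(start, 2) that is relatively prime to every modulus."""
    candidate = max(start, 2)  # assumption: skip the trivial modulus 1
    while any(gcd(candidate, m) != 1 for m in moduli):
        candidate += 1
    return candidate


def find_moduli(n_bits, count):
    """Generate `count` pairwise coprime moduli whose product covers 2**n_bits - 1."""
    target = 2 ** n_bits - 1
    x = ceil_root(target, count)               # Step 1
    base = x if x % 2 == 0 else x + 1          # Step 2: base plays the role of 2n
    if count == 3:                             # Step 3: grow 2n until the range is covered
        while base * (base + 1) * (base - 1) < target:
            base += 2
    moduli = [base, base + 1, base - 1]
    product = base * (base + 1) * (base - 1)
    # Steps 4..p: add the remaining moduli k_1, k_2, ... one at a time.
    for remaining in range(count - 3, 0, -1):
        k = -(-target // product)              # ceiling of target / product
        start = ceil_root(k, remaining)        # root of k over the moduli still to add
        k_j = next_coprime(start, moduli)
        moduli.append(k_j)
        product *= k_j
    return moduli


example = find_moduli(16, 4)  # moduli set for N = 16 with four moduli
```

Under these assumptions the sketch returns the sets used in the N = 16 example, such as (42, 43, 41) for three moduli and (16, 17, 15, 19) for four.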
 
 
Module Estimator_Bit_Area (N, R)
  //Input: N (no. of bits), n (cardinality of moduli set)
  //Output: R (rank of the moduli set)

Step 1:  Using the previous algorithm, calculate the required bits $B_i$ (where i is the number of moduli in the set, i.e., i = 3, 4, 5, 6, ..., q) of a particular moduli set for a particular N.

Step 2:  Calculate the sum of the distances (converted into bits) $D_i$ (where i is the number of moduli in the set, i.e., i = 3, 4, 5, 6, ..., q) of each member of a set from its nearest $2^m$, where m = 1, 2, 3, ...

Step 3:  Calculate the LUT size $L_i$ (where i is the number of moduli in the set, i.e., i = 3, 4, 5, 6, ..., q) for each set.
             $L_i$ = row size × column size
                   = highest number of the set × 2 (number of bits of this highest number)

Step 4:  Calculate the distance (converted into bits) $d_i$ (where i is the number of moduli in the set, i.e., i = 3, 4, 5, 6, ..., q) of the product of the members of a set from its dynamic range.

Step 5:  Repeat Steps 1, 2, 3 and 4 for the moduli sets i = 3, 4, 5, 6, ..., q.

Step 6:  Let B be the smallest number among $B_3, B_4, B_5, B_6, \ldots, B_q$.
         Set the value of B to 1, i.e., $B' = 1$.
         Now, value of $B_3$, i.e., $B'_3 = B_3 / B$
              value of $B_4$, i.e., $B'_4 = B_4 / B$
              … … … … … … … …
              … … … … … … … …
              value of $B_q$, i.e., $B'_q = B_q / B$
         Finally, we get the list of values $B'_3, B'_4, B'_5, B'_6, \ldots, B'_q$.

Step 7:  Similarly, as in Step 6, we can calculate $D'_3, D'_4, D'_5, D'_6, \ldots, D'_q$.

Step 8:  Similarly, as in Step 6, we can also calculate $L'_3, L'_4, L'_5, L'_6, \ldots, L'_q$.

Step 9:  Similarly, as in Step 6, we can calculate $d'_3, d'_4, d'_5, d'_6, \ldots, d'_q$.

Step 10: Let $W_1$, $W_2$, $W_3$ and $W_4$ be the weights of the four parameters $B_i$, $D_i$, $L_i$ and $d_i$ respectively.
         We find that $W_1 = 0.25$, $W_2 = 0.75$, $W_3 = 0.75$ and $W_4 = 0.25$.

Step 11: Now calculate
             $R_3 = (W_1 \times B'_3) + (W_2 \times D'_3) + (W_3 \times L'_3) + (W_4 \times d'_3)$
             $R_4 = (W_1 \times B'_4) + (W_2 \times D'_4) + (W_3 \times L'_4) + (W_4 \times d'_4)$
             … … … … … … … … … … … …
             … … … … … … … … … … … …
             $R_q = (W_1 \times B'_q) + (W_2 \times D'_q) + (W_3 \times L'_q) + (W_4 \times d'_q)$

Step 12: Now arrange $R_3, R_4, R_5, \ldots, R_q$ in increasing order. The smallest one is referred to as rank_1, the next larger one as rank_2, and so on.

Step 13: Now arrange the moduli sets according to rank. The required area is also proportional to this rank; that is, if the rank decreases, the area also decreases.

End of Estimator_Bit_Area
 
Now an example is given to illustrate this scheme.
Let N = 16. Therefore the dynamic range is 0 to $2^{16} - 1$, i.e., 0 to 65535.
The required bits for the moduli sets are $B_3 = 18$, $B_4 = 19$, $B_5 = 19$ and $B_6 = 22$ (these required bits can be calculated using the previous algorithm).
When n = 3, $S_{M3} = \{42, 43, 41\}$. The nearest $2^m$ to 42 is $2^5$, i.e., 32. The nearest $2^m$ to 43 and to 41 is likewise $2^5$ in each case.
Therefore,
$D_3 = \lceil \log_2 (42 - 32) \rceil + \lceil \log_2 (43 - 32) \rceil + \lceil \log_2 (41 - 32) \rceil$
 i.e., $D_3 = 12$.
Similarly, $D_4 = 4$, $D_5 = 8$, $D_6 = 7$.
Now for the set $S_{M3} = \{42, 43, 41\}$ the LUT size is
$L_3$ = row size × column size
      = highest number of the set × 2 (number of bits of this highest number)
      = 43 × (2 × 6)
      = 516
[∵ 43 is the highest number of the set and the required bits for 43 is 6]

Similarly, $L_4 = 190$, $L_5 = 104$, $L_6 = 104$.
Now we have to calculate $d_3, d_4, d_5, d_6$ for the sets
$S_{M3} = \{42, 43, 41\}$,
$S_{M4} = \{16, 17, 15, 19\}$,
$S_{M5} = \{10, 11, 9, 13, 7\}$,
$S_{M6} = \{8, 9, 7, 11, 5, 13\}$
respectively.

Therefore,
$d_3 = \lceil \log_2 [(42 \times 43 \times 41) - (2^{16} - 1)] \rceil$,
   i.e., $d_3 = 14$.
Similarly, $d_4 = 14$, $d_5 = 15$, $d_6 = 19$.
 
Now  we  calculate  the  ratio.  The  ratio  table  for 
N = 16  is given below (Table 1). 
 
N    n     Moduli Set         Bi'    Di'    Li'    di'
16   n=3   (42,43,41)         1      3      4.96   1
     n=4   (16,17,15,19)      1.05   1      1.83   1
     n=5   (10,11,9,13,7)     1.05   2      1      1.07
     n=6   (8,9,7,11,5,13)    1.22   1.75   1      1.36
 
Table 1: Ratio table for N=16 
 
Since $W_1 = 0.25$, $W_2 = 0.75$, $W_3 = 0.75$ and $W_4 = 0.25$, therefore $R_3 = 6.47$, $R_4 = 2.64$, $R_5 = 2.78$ and $R_6 = 2.71$.
According to our scheme:
$R_4$ (2.64) ⇒ rank_1,
$R_6$ (2.71) ⇒ rank_2,
$R_5$ (2.78) ⇒ rank_3,
$R_3$ (6.47) ⇒ rank_4.
From Table 3 we can see the areas of these moduli sets, which support the claim that area is proportional to rank.
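The estimator and the N = 16 example can be reproduced with the short Python sketch below. It assumes that "converted into bits" means the bit length of the value (so a distance of 0 costs 0 bits), which matches the B, D, L and d figures of the example; the function names are hypothetical, and the sketch does not guard against a parameter whose minimum over all sets is 0.

```python
def nearest_pow2_distance(m):
    """Distance from m to the nearest power of two (below or above)."""
    lower = 1 << (m.bit_length() - 1)
    return min(m - lower, 2 * lower - m)


def raw_parameters(moduli, n_bits):
    """Return (B, D, L, d) for one moduli set, as in Steps 1 to 4."""
    product = 1
    for m in moduli:
        product *= m
    largest = max(moduli)
    b = sum(m.bit_length() for m in moduli)                       # required bits
    d_sum = sum(nearest_pow2_distance(m).bit_length() for m in moduli)
    lut = largest * 2 * largest.bit_length()                      # rows x columns
    d_range = (product - (2 ** n_bits - 1)).bit_length()          # excess over range
    return b, d_sum, lut, d_range


def rank_moduli_sets(sets, n_bits, weights=(0.25, 0.75, 0.75, 0.25)):
    """Normalise each parameter by its minimum, weight, sum, and sort (Steps 6 to 12)."""
    raw = {name: raw_parameters(s, n_bits) for name, s in sets.items()}
    minima = [min(p[j] for p in raw.values()) for j in range(4)]
    scores = {
        name: sum(w * p[j] / minima[j] for j, w in enumerate(weights))
        for name, p in raw.items()
    }
    order = sorted(scores, key=scores.get)  # smallest R first, i.e. rank_1 first
    return order, scores


sets_16 = {
    3: (42, 43, 41),
    4: (16, 17, 15, 19),
    5: (10, 11, 9, 13, 7),
    6: (8, 9, 7, 11, 5, 13),
}
order, scores = rank_moduli_sets(sets_16, 16)
```

The sketch keeps full precision in the ratios, so an individual R value can differ from the rounded figures in the text in the second decimal place, while the resulting order of the sets is the same.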
 
We could use a standard estimator or an artificial intelligence based estimator, such as a neural network, to determine the rank. However, that needs thousands of sets, whereas here we mainly use four sets. That is why we select the weights of the parameters (i.e., $W_1$, $W_2$, $W_3$ and $W_4$) heuristically. We take $W_1 = 0.25$, $W_2 = 0.75$, $W_3 = 0.75$ and $W_4 = 0.25$. These values of the weights are chosen according to the importance of the parameters. From Table 3 we conclude that our selection is appropriate, because it works with above 80% accuracy.
 
4.  Comparison of Bit Efficiency and 
Hardware Complexity 
 
In this section we compute the number of bits required to implement the moduli sets generated by the algorithm discussed in the last section. We also compare the bit efficiency of the moduli sets proposed by us with $\{2^{n_1}, 2^{n_1}+1, 2^{n_1}-1, 2^{n_2} \pm 1, \ldots, 2^{n_t} \pm 1\}$ [6]. Further, we compare the hardware complexity of the RNS processor discussed in Section 2, implemented using the moduli sets of the proposed scheme, with that obtained using the moduli sets of [6]. Hardware complexity is measured in terms of LUTs on a Virtex-5 FPGA.
The bits required to implement the moduli set $S_M = \{n_1, n_2, n_3, \ldots, n_p\}$ are
$\lceil \log_2 n_1 \rceil + \lceil \log_2 n_2 \rceil + \lceil \log_2 n_3 \rceil + \ldots + \lceil \log_2 n_p \rceil$
In Table 4 we compare the bits required for moduli sets of cardinality three, four, five and six generated by the proposed approach with those of [6]. In [6] the moduli set $S_{MG}$ is generated as $\{2^{n_1}, 2^{n_1}+1, 2^{n_1}-1, 2^{n_2} \pm 1, \ldots, 2^{n_t} \pm 1\}$, where $n_1 > n_i$, and $n_1$ as well as the $n_i$'s (i = 2, 3, ..., t) need to be chosen such that all these moduli are co-prime numbers. Table 4 also shows the areas required to implement the RNS processor on the Virtex-5 platform.
In the proposed approach, the moduli are linear numbers, which helps to keep their values small, and so the difference between their product and $2^N - 1$ is also small. In [6], the moduli are exponential numbers, so their values may grow (if co-primes are not found for lower values of n), and the difference between their product and $2^N - 1$ is then also large. This may be observed from Table 4 for N=32 and n=3. The use of large numbers as moduli increases the bit widths and also the hardware complexity.
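The gap between the product of the moduli and the required range can be checked directly for the N=32, n=3 row of Table 4. The Python sketch below is only an illustration of that comparison; the function name `excess_over_range` is an assumption.

```python
def excess_over_range(moduli, n_bits):
    """Product of the moduli minus the largest value 2**n_bits - 1 to be represented."""
    product = 1
    for m in moduli:
        product *= m
    return product - (2 ** n_bits - 1)


proposed = (1626, 1627, 1625)   # proposed scheme, N = 32, n = 3 (Table 4)
reference = (2048, 2049, 2047)  # scheme of [6], N = 32, n = 3 (Table 4)

excess_proposed = excess_over_range(proposed, 32)
excess_reference = excess_over_range(reference, 32)
```

Both sets cover the 32-bit range, but the set of [6] overshoots it by a much larger amount, which is what forces the wider moduli.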
From the LUT counts it may be noted that, in the case of the proposed scheme, for a given N (dynamic range) the efficiency of the hardware depends on the moduli set selected. In the previous section we proposed a heuristic algorithm that can select, for a given N, the moduli set that yields the hardware with minimum complexity, without synthesis. The accuracy of the scheme, based on synthesis results, is demonstrated in Table 3.
 
5.  Implementation of FIR Filter Using the 
Proposed Scheme 
As shown above, the proposed moduli set gives a bit and area efficient residue system which is better than the schemes discussed in the literature. It can be shown experimentally that this scheme gives an optimum area and speed when VLSI is used as the fabrication medium.
The parallel nature of RNS speeds up the arithmetic operations, which can be performed independently for each modulus. RNS can therefore be applied as the building block for high speed processing elements, especially for digital signal processing applications, where the mathematical operations for each slice must be performed within the sampling time.
 
 
 
 
Figure 2: Sampling of an Analog Signal 
 
One application of our proposed RNS moduli scheme may be the FIR filter [19], where systolic cells are clustered to form a pipelined architecture. All the systolic cells are basically processing elements. In a system with high frequency input, when the analog input is sampled, converted to digital and processed, the time difference between two samples (the sampling time) is very small, so each sampled value must be processed within that time slice. As the process is computationally intensive, RNS can make it possible to do the computation within a minimal time, which helps to implement the filter for high frequency inputs. Using our proposed scheme, the bit efficiency can be increased beyond that of the methods proposed in the literature, increasing speed and decreasing the area required to fabricate the hardware in a VLSI chip.
[Figure 2 labels: analog signal f(t) against t; sampled signal f(n) against n; t = nT; T = 1/f_s]
 
A bit efficient RNS requires a minimal number of bits to implement a moduli scheme of a specific cardinality.
The equation for a finite impulse response filter implemented in RNS can be defined as
$y_j(k) = \sum_{i=0}^{N-1} a_{m_j}(i) \, u_{m_j}(k-i)$
where $u_{m_j}(k-i)$ and $y_j(k)$ are the residue representations of the input and output signals, respectively, of the filter modulo $m_j$, and $a_{m_j}(i)$, i = 0, 1, 2, ..., N-1, are the filter coefficients in RNS representation [20].
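The per-modulus filter equation can be sketched in Python as below: each residue channel computes its own output sequence, and the classical CRT then recombines them. This is a software model under stated assumptions (non-negative integer samples and coefficients, zero input before k = 0, and true outputs that fit in the dynamic range); the function names `fir_direct` and `fir_rns` are hypothetical.

```python
from math import prod


def fir_direct(coeffs, samples):
    """Reference FIR: y(k) = sum_i a(i) * u(k - i), with u(k) = 0 for k < 0."""
    return [
        sum(a * samples[k - i] for i, a in enumerate(coeffs) if k - i >= 0)
        for k in range(len(samples))
    ]


def fir_rns(coeffs, samples, moduli):
    """FIR evaluated independently modulo each m_j, then recombined by the CRT."""
    dynamic_range = prod(moduli)
    channels = []
    for m in moduli:
        a_m = [a % m for a in coeffs]    # coefficients in residue form
        u_m = [u % m for u in samples]   # input samples in residue form
        channels.append([
            sum(a_m[i] * u_m[k - i] for i in range(len(a_m)) if k - i >= 0) % m
            for k in range(len(u_m))
        ])
    outputs = []
    for k in range(len(samples)):
        value = 0
        for m, channel in zip(moduli, channels):
            partial = dynamic_range // m
            value += channel[k] * partial * pow(partial, -1, m)
        outputs.append(value % dynamic_range)
    return outputs


coeffs = [1, 2, 3]
samples = [4, 5, 6, 7]
result = fir_rns(coeffs, samples, (16, 17, 15, 19))
```

The inner loop over `moduli` has no data dependence between channels, which mirrors the parallel hardware channels of the RNS filter.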
An FIR filter can be implemented in a conventional scheme using delay elements (Figure 3). The delay elements pass the values on after a certain amount of delay, so that the signal values of the previous steps are multiplied with the corresponding coefficients.
 
 
Figure 3: FIR Filter using Delay 
 
In this process, at each step, we need the computation 
of  the  whole  function.  The  same  FIR  filter  can  be  
implemented using Systolic Cells (Figure 4).  
 
 
Figure 4: Systolic FIR Filter 
 
As the cells are Transport Triggered, they are triggered as and when the value of the previous cell is set. They are basically data processing units (DPUs), so they compute the values of each step and store them to be processed by the next cell.
The results are thus made available in a pipelined fashion. In this case our proposed scheme can be used to build the DPUs, as in most cases it gives a smaller area than the other schemes given in the literature. The areas of the DPUs for various cardinalities of the moduli set are given in Table 2.
 
 
Proposed Scheme
N    n     Moduli Set            Bit#   DPU Area
16   n=3   (42,43,41)            18     161
     n=4   (16,17,15,19)         19     137
     n=5   (10,11,9,13,7)        19     163
     n=6   (8,9,7,11,5,13)       22     159
20   n=3   (102,103,101)         21     257
     n=4   (32,33,31,35)         23     161
     n=5   (16,17,15,19,23)      24     226
     n=6   (12,13,11,17,7,19)    25     171
32   n=3   (1626,1627,1625)      33     1605
     n=4   (256,257,255,259)     35     510
     n=5   (86,87,85,89,77)      35     401
     n=6   (42,43,41,47,37,53)   36     336
 
Table 2: Area of DPUs for various cardinalities 
 
For the FIR Filter,  
Area complexity =  ( )
2 O     log L N m m and 
Time complexity = $O(\log m)$,
where m is the modulus used in the system [20]. As discussed, our proposed scheme uses moduli with smaller values than the other schemes proposed in the literature. As a result, the area and time complexity decrease when the proposed scheme is implemented in VLSI.
 
6.  Conclusion 
In this paper we proposed an algorithm to generate a moduli set of any finite cardinality for a given dynamic range, and showed through different examples that the bit efficiency of the proposed scheme is better than that of the other schemes introduced in the literature. We also introduced an estimator based on bit and area efficiency. Using this estimator, we can decide which moduli set of finite cardinality requires minimum area and which moduli set should be used for a particular VLSI fabrication.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
   
Proposed Scheme
N    n     Moduli Set            Bi     Di      Li      di       Ri       Total Area
12   n=3   (18,19,17)            1      1.2     2.16    1.19     7.34     318
     n=4   (8,9,7,11)            1      1       1       1        3.76     216
     n=5   (6,7,5,11,13)         1.13   2       1.18    17.9     4.91     272
     n=6   (4,5,3,7,11,13)       1.27   1.8     1.18    38.62    4.95     248
16   n=3   (42,43,41)            1      6       4.96    1        18.14    340
     n=4   (16,17,15,19)         1.05   1       1.83    1.41     6.26     338
     n=5   (10,11,9,13,7)        1.05   2       1       2.88     3.28     284
     n=6   (8,9,7,11,5,13)       1.22   1.8     1       34.64    4.38     328
20   n=3   (102,103,101)         1      15      7.59    1        30.53    542
     n=4   (32,33,31,35)         1.09   1       2.21    7.75     7.44     402
     n=5   (16,17,15,19,23)      1.14   2.4     1.21    58.6     5.4      338
     n=6   (12,13,11,17,7,19)    1.19   3       1       225.9    5.9      356
24   n=3   (258,259,257)         1.08   1.2     20.27   1.85     61.68    898
     n=4   (64,65,63,67)         1.08   1       4.08    3.66     6.41     456
     n=5   (28,29,27,31,25)      1      4       1.35    1        6.31     384
     n=6   (16,17,15,19,23,11)   1.12   2.6     1       13.26    4.65     362
28   n=3   (646,647,645)         1      80.4    41.74   1        165.68   2394
     n=4   (128,129,127,131)     1.03   1       6.76    5.46     21.06    722
     n=5   (50,51,49,47,53)      1      13.6    2.05    37.28    13.39    784
     n=6   (26,27,25,29,23,31)   1      5.8     1       82.12    6.56     478
32   n=3   (1626,1627,1625)      1      253.2   56.28   1        295.7    3292
     n=4   (256,257,255,259)     1.06   1       7.33    12.65    22.82    1050
     n=5   (86,87,85,89,77)      1.06   20.8    1.96    15.94    16.62    856
     n=6   (42,43,41,47,37,53)   1.09   9.2     1       636.63   11.06    714
 
Table 3: Required area and rank of the moduli set are proportional  

           Proposed Scheme                      Most widely used scheme [6]
N    n     Moduli Set            Bi   Area      Moduli Set              Bi   Area
16   n=3   (42,43,41)            18   340       (64,65,63)              20   402
     n=4   (16,17,15,19)         19   306       (32,33,31,5)            20   332
     n=5   (10,11,9,13,7)        19   284       (32,33,31,15,7)         24   356
     n=6   (8,9,7,11,5,13)       22   298       (32,33,31,15,7,1)       25   394
20   n=3   (102,103,101)         21   542       (128,129,127)           23   544
     n=4   (32,33,31,35)         23   402       (64,65,63,17)           25   432
     n=5   (16,17,15,19,23)      24   338       (32,33,31,17,5)         25   376
     n=6   (12,13,11,17,7,19)    25   356       (32,33,31,17,7,5)       28   394
32   n=3   (1626,1627,1625)      33   3292      (2048,2049,2047)        35   3680
     n=4   (256,257,255,259)     35   1050      (512,513,511,65)        36   1230
     n=5   (86,87,85,89,77)      35   856       (512,513,511,17,5)      37   1288
     n=6   (42,43,41,47,37,53)   36   714       (128,129,127,31,17,7)   36   770
 
Table 4:  Comparison of bit and area efficiency of proposed scheme with most widely used 
scheme [6]  
 
 
References 
 
[1]  J.  P.  Hayes  “Computer  Architecture  and 
Organization”, McGraw-Hill, 2004. 
[2]  K.  Hwang,  “Computer  Arithmetic:  Principles, 
Architecture and Design”, John Wiley & Sons, 1979. 
[3]  Chao-L.  Chiang  and  L.  Johnsson,  “Residue 
Arithmetic  and  VLSI”,  IEEE  International 
Conference on Computer Design, pp. 80-83, 1983. 
[4] M. Abdallah and A. Skavantzos, “A systematic 
approach  for  selecting  practical  moduli  sets  for 
residue number systems”, Southeastern Symposium 
on System Theory, pp. 445-459, 1995. 
[5]  E.  Setiaarif  and  P.  Siy,  “A  new  moduli  set 
selection  technique  to  improve  sign  detection  and 
number  comparison  in  Residue  Number  System 
(RNS)”, Fuzzy Information Processing Society, pp. 
766 - 768, 2005. 
[6]  W.  Wang,  M.N.S.  Swamy  and  M.O.  Ahmad, 
“Moduli  Selection  in  RNS  for  Efficient  VLSI 
Implementation”,  International  Symposium  on 
Circuit and System, vol. 4, pp. 512- 515, 2003. 
[7] A. Omondi and B. Premkumar, “Residue Number Systems: Theory and Implementation”, Imperial College Press, 2007.
[8]  A.  A.  Hiasat,  “High-Speed  and  Reduced-Area 
Modular  Adder  Structure  for  RNS”,  IEEE 
Transactions on Computers, vol. 51, issue 1,  pp. 84-
89, 2002. 
[9]  M.  Bayoumi,  G.  Jullien  and  W.  Miller,  “An 
efficient VLSI adder for DSP architectures based on 
RNS”, IEEE International Conference on Acoustics, 
Speech,  and  Signal  Processing,  vol.  10,  pp.  1457- 
1460, 1985.  
[10] S. Timarchi, K. Navi and M. Hosseinzade, “New Design of RNS Subtractor for modulo $2^n+1$”, Information and Communication Technologies, vol. 2, pp. 2803-2808, 2006.
[11] Y. Ma, “A simplified architecture for modulo ($2^n + 1$) multiplication”, IEEE Transactions on Computers, vol. 47, issue 3, pp. 333-337, 1998.
[12]  C  Ding,  D  Pei  and  A  Salomaa  “Chinese 
Remainder  Theorem:  Applications  in  Computing, 
Coding, Cryptography”, World Scientific Publishing 
Company, 1996.  
[13]  Y.  Wang,  “New  Chinese  Remainder 
Theorems”,  Asilomar  Conference  on  Signals, 
Systems & Computers, vol. 1, pp. 165-171, 1998. 
[14]  S.  Bi  and  W.  J.  Gross,  “The  Mixed-Radix 
Chinese Remainder Theorem and Its Applications to 
Residue  Comparison”,  IEEE  Transactions  on 
Computers, vol. 57, issue 12,  pp. 1624-1632, 2008. 
[15] W. Wang, M.N.S. Swamy, M.O. Ahmad and Y. Wang, “A study of residue-to-binary converters for three-moduli sets”, IEEE Transactions on Circuits and Systems I, vol. 50, issue 2, pp. 235-243, 2003.
[16]  W. Wang,  M.N.S.  Swamy  and  M.O.  Ahmad, 
“An area-time-efficient residue-to-binary converter”, 
IEEE Midwest Symposium on Circuits and Systems, 
vol. 2, pp. 904- 907, 2000. 
[17] S. J. Piestrak, “A High-speed Realization of a 
Residue to Binary Number System Converter”, IEEE 
Transactions on Circuits and Systems II: Analog and 
Digital Signal Processing, vol. 42, issue 10, pp. 661 
– 663, 1995. 
[18]  G.  Bi  and  E.  V.  Jones,  “Fast  conversion 
between binary and residual numbers”, Electronics 
Letters, vol. 24, pp. 1195-1197, 1988. 
[19]  V.  Visvanathan,      N.  Mohanty  and  S. 
Ramanathan,  “An  Area-Efficient  Systolic 
Architecture  for  Real-Time  VLSI  Finite  Impulse 
Response  Filters”,  The  Sixth  International 
Conference on VLSI Design, pp. 166-171, 1993. 
[20] M. A. Bayoumi, G. A. Jullien and W. C. Miller, 
“A  systolic  (VLSI)  array  using  RNS  for  digital 
signal  processing  applications”,  ACM  Annual 
Computer Science Conference, pp.115 – 120, 1984. 
 
 
 