An Efficient VLSI Architecture for Matrix Based RNS Backward Converter  by Rayapudi, Bhavana et al.
 Procedia Computer Science  85 ( 2016 )  271 – 277 
1877-0509 © 2016 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license 
(http://creativecommons.org/licenses/by-nc-nd/4.0/).
Peer-review under responsibility of the Organizing Committee of CMS 2016
doi: 10.1016/j.procs.2016.05.233 
 International Conference on Computational Modeling and Security (CMS 2016) 
AN EFFICIENT VLSI ARCHITECTURE FOR MATRIX BASED 
RNS BACKWARD CONVERTER 
Bhavana Rayapudi a,*, I.B.K. Raju a, Gnaneshwara Chary a, Pranay Deekonda a,
Prashanth Ummadisetti a
a B.V. Raju Institute of Technology, Narsapur, Medak-502 313, Telangana, India
Abstract 
The Residue Number System (RNS) has been an important research area for the last five decades. The forward and backward conversion
processes are the bottleneck that limits the use of RNS for computing needs. In this paper, we propose an efficient VLSI architecture for a
Matrix based RNS backward converter. We analysed the performance of the proposed architecture for different moduli sets of size
up to ten. The design was implemented using TSMC standard cell 180 nm CMOS technology libraries, and the result analysis indicates that the
proposed converter achieves about 59% area reduction and is about 30% more efficient with respect to the Time-Delay Product
when compared to state-of-the-art backward converters.
Keywords: Residue Number System; Chinese Remainder Theorem (CRT); Backward Converter; Computer Arithmetic
1.  Introduction 
   In a weighted number system, the main disadvantage is carry chain propagation, which degrades the
performance of computing hardware. Carry chain propagation is therefore the main challenge. To
reduce the carry chain, conventional number systems offer approaches such as carry look-ahead
adders, parallel prefix adders and the ELM adder. All of these approaches speed up carry propagation but do not eliminate
the carry entirely. Whenever integer arithmetic on large numbers is needed, a conventional number system will not
 
 
* Corresponding author. Rayapudi Bhavana. 
E-mail address:bhavana.rayapudi@gmail.com 
be a good choice. There are therefore a few unconventional approaches, such as the Residue Number System [1], which restricts carry
chain propagation to within the residue digits, so that parallel execution can be achieved. This property makes it
suitable for fast computer arithmetic. It has many advantages such as parallelism, fault tolerance, modularity and its carry-free
nature. With these features it is well suited for digital signal processing (DSP) applications [5] such as
digital filtering, convolution, correlation and fast Fourier transforms, as well as computer security (cryptography) [6], fault
tolerance, fault detection, error correction, communication engineering [7] and image processing [8].
     RNS processors have three components [2]: a forward converter, a modulo arithmetic unit and a backward
converter. Among these, backward conversion is the main cost overhead. To overcome this problem there are
different backward conversion algorithms such as CRT [1], Mixed Radix Conversion (MRC) [1], the Matrix Method
(MATR) [3], and CRT-I and CRT-II [4]. CRT is desirable because of its parallelism, but its drawback is the large modulo-M
addition in the last stage. In the MRC algorithm only the mixed radix digits are added in the last stage,
but the algorithm is sequential in nature. The main disadvantage of CRT-I and CRT-II is that they are restricted to a specific class of
moduli sets. MATR, the backward conversion algorithm proposed in [3], is sequential in nature but needs fewer
computation steps than MRC. No VLSI architectures exist for it yet. In this paper, we
propose a VLSI architecture for MATR.
     
   In this paper, we briefly present the necessary background in Section 2. Section 3 describes the proposed VLSI
architecture for the Matrix Method for efficient residue to decimal conversion. We evaluate the performance of our
proposal in Section 4. Finally, Section 5 gives the conclusion.
2. Background 
     RNS is an unconventional number system that is defined in terms of a relatively prime moduli set
$\{m_1, m_2, m_3, \ldots, m_n\}$, that is, $\gcd(m_i, m_j) = 1$ for $i \neq j$. A weighted number $X$ can be represented as $X = (x_1, x_2, \ldots, x_n)$,
where
$$x_i = X \bmod m_i = |X|_{m_i}, \quad 0 \le x_i < m_i \qquad (1)$$
 RNS has a unique representation for any integer in the range $[0, M-1]$, where $M$ is the dynamic range of the
moduli set $\{m_1, m_2, \ldots, m_n\}$, which is equal to the product of the $m_i$ terms.
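As an illustration of definition (1), the following minimal Python sketch computes the residue representation of an integer. The function name `to_residues` and the example value 1000 are illustrative assumptions and are not part of the original work; the moduli set {11, 7, 5, 3, 2} is the five-moduli set used for the proposed architecture.

```python
from math import gcd, prod


def to_residues(x, moduli):
    """Forward conversion: return the residues x mod m_i for every modulus."""
    # The representation is only unique if the moduli are pairwise relatively prime.
    for i in range(len(moduli)):
        for j in range(i + 1, len(moduli)):
            if gcd(moduli[i], moduli[j]) != 1:
                raise ValueError("moduli must be pairwise relatively prime")
    return [x % m for m in moduli]


moduli = [11, 7, 5, 3, 2]
M = prod(moduli)                    # dynamic range of the moduli set
print(M)                            # 2310
print(to_residues(1000, moduli))    # [10, 6, 0, 1, 0]
```

Any integer in the range [0, 2309] maps to a distinct residue tuple under this moduli set.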
   In this paper we evaluate the performance of CRT (Chinese Remainder Theorem), MRC (Mixed Radix
Conversion) and MATR (Matrix Method). CRT is defined for a set of pairwise relatively prime
moduli $\{m_1, m_2, \ldots, m_n\}$ and a residue representation $(x_1, x_2, \ldots, x_n)$ in that system of some number $X$, i.e.
$x_i = |X|_{m_i}$. That number and its residues are related by the equation
$$|X|_M = \left| \sum_{i=1}^{n} x_i \left| M_i^{-1} \right|_{m_i} M_i \right|_M \qquad (2)$$
where $M_i = M/m_i$ and $|M_i^{-1}|_{m_i}$ is the multiplicative inverse of $M_i$ modulo $m_i$.
The ROM based reverse converter architecture for the Chinese Remainder Theorem is shown in Fig. 1. The main drawback
of CRT is the large modulo-M operation.
 
Fig 1: ROM based CRT Architecture 
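A small software model can make equation (2) concrete. The sketch below is only an illustration of the CRT formula, not of the ROM based hardware in Fig. 1; the function name `crt_convert` is an assumption, and the modular inverse is obtained with Python's built-in `pow(a, -1, m)` (available from Python 3.8).

```python
from math import prod


def crt_convert(residues, moduli):
    """Backward conversion with the CRT sum of equation (2)."""
    M = prod(moduli)                      # dynamic range
    total = 0
    for x_i, m_i in zip(residues, moduli):
        M_i = M // m_i                    # M_i = M / m_i
        inv = pow(M_i, -1, m_i)           # |M_i^{-1}|_{m_i}
        total += x_i * inv * M_i
    return total % M                      # the final large modulo-M reduction


print(crt_convert([10, 6, 0, 1, 0], [11, 7, 5, 3, 2]))   # 1000
```

All n terms of the sum can be computed independently, which reflects the parallelism of CRT, while the final `% M` corresponds to the large modulo-M operation named as its drawback.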
 The MRC approach is inherently sequential. Assume that there is a set of residues
$(x_1, x_2, \ldots, x_n)$ with the moduli set $\{m_1, m_2, \ldots, m_n\}$ and that the corresponding mixed radix digits are $z_1, z_2, \ldots, z_n$.
Then the equation for converting the residues to the decimal number $X$ is as follows.
$$X = z_n m_{n-1} m_{n-2} \cdots m_1 + \cdots + z_3 m_2 m_1 + z_2 m_1 + z_1 \qquad (3)$$

where
$$z_1 = x_1 \qquad (4)$$
$$z_2 = \left| \left| m_1^{-1} \right|_{m_2} (x_2 - z_1) \right|_{m_2} \qquad (5)$$
$$z_3 = \left| \left| \left| m_2^{-1} \right|_{m_3} \left| m_1^{-1} \right|_{m_3} \right|_{m_3} \bigl( x_3 - (z_2 m_1 + z_1) \bigr) \right|_{m_3}
= \left| \left| m_2^{-1} \right|_{m_3} \bigl( \left| m_1^{-1} \right|_{m_3} (x_3 - z_1) - z_2 \bigr) \right|_{m_3} \qquad (6)$$
$$z_n = \left| \left| m_{n-1}^{-1} \right|_{m_n} \Bigl( \left| m_{n-2}^{-1} \right|_{m_n} \bigl( \cdots \left| m_2^{-1} \right|_{m_n} \bigl( \left| m_1^{-1} \right|_{m_n} (x_n - z_1) - z_2 \bigr) \cdots \bigr) - z_{n-1} \Bigr) \right|_{m_n} \qquad (7)$$
          
      where $0 \le z_i < m_i$. It is evident that mixed radix is a weighted number system. Computing the mixed radix digits requires
$\frac{n(n-1)}{2}$ multiplications and subtractions, so the asymptotic complexity of MRC is $O(n^2)$. The main drawback of MRC is that it is
sequential in nature, so the computation takes more time. The MRC architecture is shown in Fig. 2.
 
Fig 2: ROM Based Architecture for Mixed Radix Method 
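The nested form of equations (4) to (7) can be sketched in Python as follows. This is a software illustration under stated assumptions, not the ROM based architecture of Fig. 2; the names `mrc_digits` and `mrc_convert` are hypothetical, and `pow(a, -1, m)` (Python 3.8+) supplies the multiplicative inverses.

```python
def mrc_digits(residues, moduli):
    """Mixed radix digits z_1..z_n following the nested form of equations (4)-(7)."""
    z = []
    for i, m_i in enumerate(moduli):
        acc = residues[i]
        for k in range(i):
            # Subtract the already known digit z_k, then multiply by |m_k^{-1}|_{m_i}.
            acc = ((acc - z[k]) * pow(moduli[k], -1, m_i)) % m_i
        z.append(acc)
    return z


def mrc_convert(residues, moduli):
    """Weighted sum of equation (3): X = z_1 + z_2*m_1 + z_3*m_2*m_1 + ..."""
    x, weight = 0, 1
    for z_i, m_i in zip(mrc_digits(residues, moduli), moduli):
        x += z_i * weight
        weight *= m_i
    return x


print(mrc_convert([10, 6, 0, 1, 0], [11, 7, 5, 3, 2]))   # 1000
```

The inner loop runs n(n-1)/2 times in total, and each digit depends on all earlier digits, which is the sequential behaviour described above.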
 MATR is another method for residue to decimal conversion. It is based on a number of jumps, where each jump moves to the
nearest residue number that has at least one more residue equal to zero. The decimal number can be calculated by n consecutive jumps in the residue
table such that each jump increases the number of zeros by one.
3. Proposed VLSI Architecture for Mixed Matrix Method 
 
    To define the mixed matrix method, assume that there is a residue number $(x_1, x_2, \ldots, x_n)$ with moduli set $\{m_1, m_2, \ldots, m_n\}$. Then the decimal number $X$ is obtained as
$$X = \sum_{i=1}^{n} p_i \qquad (8)$$

where, for $i > 1$,
$$p_i = (m_1 m_2 \cdots m_{i-1}) \left| \left| (m_1 m_2 \cdots m_{i-1})^{-1} \right|_{m_i} t_{(i-1)j} \right|_{m_i} \qquad (9)$$
and $t_{(i-1)j}$ is the value to be determined. If $p_1, p_2, \ldots, p_n$ are the jumps and the corresponding residues are
$x_1, x_2, x_3, \ldots, x_n$ respectively, then
$$p_1 = x_1 \qquad (10)$$
And the first location is
$$X - p_1 = \begin{bmatrix}
|x_1 - p_1|_{m_1} = 0 \\
|x_2 - p_1|_{m_2} = t_1 \\
|x_3 - p_1|_{m_3} = t_2 \\
|x_4 - p_1|_{m_4} = t_3 \\
\vdots \\
|x_n - p_1|_{m_n} = t_{n-1}
\end{bmatrix} \qquad (11)$$

The second jump is $p_2 = c_2 m_1$, where $c_2$ has to satisfy $|t_1 - p_2|_{m_2} = 0$:

$$p_2 = m_1 \left| \left| (m_1)^{-1} \right|_{m_2} t_1 \right|_{m_2} \qquad (12)$$
And the second location is
$$X - p_1 - p_2 = \begin{bmatrix}
|0 - p_2|_{m_1} = 0 \\
|t_1 - p_2|_{m_2} = 0 \\
|t_2 - p_2|_{m_3} = t_{21} \\
|t_3 - p_2|_{m_4} = t_{31} \\
\vdots \\
|t_{n-1} - p_2|_{m_n} = t_{(n-1)1}
\end{bmatrix} \qquad (13)$$
The third jump is $p_3 = c_3 m_2 m_1$, where $c_3$ has to satisfy $|t_{21} - p_3|_{m_3} = 0$:
$$p_3 = (m_1 m_2) \left| \left| (m_1 m_2)^{-1} \right|_{m_3} t_{21} \right|_{m_3} \qquad (14)$$
And the third location is
$$X - p_1 - p_2 - p_3 = \begin{bmatrix}
|0 - p_3|_{m_1} = 0 \\
|0 - p_3|_{m_2} = 0 \\
|t_{21} - p_3|_{m_3} = 0 \\
|t_{31} - p_3|_{m_4} = t_{311} \\
|t_{41} - p_3|_{m_5} = t_{411} \\
\vdots \\
|t_{(n-1)1} - p_3|_{m_n} = t_{(n-1)11}
\end{bmatrix} \qquad (15)$$
 The process is continued until the final location is zero, and the final jump is determined by
$p_n = c_n m_1 m_2 \cdots m_{n-1}$, where $c_n$ has to satisfy $|t_{(n-1)11} - p_n|_{m_n} = 0$:
$$p_n = (m_1 m_2 \cdots m_{n-1}) \left| \left| (m_1 m_2 \cdots m_{n-1})^{-1} \right|_{m_n} t_{(n-1)11} \right|_{m_n} \qquad (16)$$
And the final location is
$$X - p_1 - p_2 - p_3 - \cdots - p_n = \begin{bmatrix}
|0 - p_n|_{m_1} = 0 \\
|0 - p_n|_{m_2} = 0 \\
|0 - p_n|_{m_3} = 0 \\
|0 - p_n|_{m_4} = 0 \\
|0 - p_n|_{m_5} = 0 \\
\vdots \\
|t_{(n-1)11} - p_n|_{m_n} = 0
\end{bmatrix} \qquad (17)$$
Therefore the final decimal conversion value X for the mixed matrix method is
$$X = p_1 + p_2 + p_3 + \cdots + p_n \qquad (18)$$
   The matrix method was introduced on a theoretical basis [5]. Here we give a practical architecture for that
theory. When all parameters are considered together, i.e. area, power and timing, the matrix method is the best
of the compared methods. If only power is considered, MRC is the best, and for timing CRT is the better
choice. For area, the matrix method is the best choice. The proposed VLSI architecture for backward conversion
using the Mixed Matrix Method is shown in Fig. 3 for the five-moduli set {11, 7, 5, 3, 2}.
 
Fig 3: Proposed VLSI Architecture for Mixed Matrix Method for 5 moduli set {11, 7, 5, 3, 2} 
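The jump procedure of equations (8) to (18) can be modelled in software to check the arithmetic. The sketch below is a behavioural illustration only and makes no claim about the datapath of Fig. 3; the function name `matr_convert`, the returned list of jumps and the example value 1000 are assumptions introduced for illustration.

```python
def matr_convert(residues, moduli):
    """Matrix method: accumulate the jumps p_1..p_n of equations (10)-(18)."""
    t = list(residues)       # current location in the residue table
    weight = 1               # m_1 * m_2 * ... * m_{i-1}
    x = 0
    jumps = []
    for i, m_i in enumerate(moduli):
        # p_i = weight * | |weight^{-1}|_{m_i} * t_i |_{m_i}; for i = 0 this gives p_1 = x_1.
        c_i = (pow(weight, -1, m_i) * t[i]) % m_i
        p_i = weight * c_i
        jumps.append(p_i)
        # New location: subtract the jump from every residue (done in parallel in hardware).
        t = [(t_k - p_i) % m_k for t_k, m_k in zip(t, moduli)]
        x += p_i
        weight *= m_i
    assert all(v == 0 for v in t)    # the final location must be all zeros
    return x, jumps


x, jumps = matr_convert([10, 6, 0, 1, 0], [11, 7, 5, 3, 2])
print(jumps)   # [10, 66, 154, 770, 0]
print(x)       # 1000
```

Each iteration zeroes one more residue, so the location after the last jump is the all-zero vector of equation (17) and the sum of the jumps is X.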
4. Performance Evaluation 
 
  In order to properly evaluate the performance of the proposed reverse converter, both a theoretical analysis and
an evaluation based on implementation results were performed. We analysed the performance of the proposed
architecture for different moduli sets of size up to ten using TSMC standard cell 180 nm CMOS technology
libraries. In the Matrix Method, every iteration except the first requires n parallel subtractions.
Finding $p_i$ $(i = 2, \ldots, n)$ requires 2 multiplications per iteration, because every computation involves
the product of the moduli and its multiplicative inverse, as can be seen from the MATR equations (12), (14) and (16).
Only (n-1) such conversion steps are required, because among the $p_i$ $(i = 1, \ldots, n)$ the first jump $p_1 = x_1$ is obtained directly. In addition, n-1
additions are needed to compute the backward conversion result X. Therefore the total number of computations required for the mixed
matrix method is 4n-3, and the asymptotic complexity is of order $O(n)$.
 In the Mixed Radix Method, finding the mixed radix digits requires $\frac{n(n-1)}{2}$ multiplications and subtractions. The
decimal conversion equation for X requires a further (n-1) additions and multiplications.
 
 For the Chinese Remainder Theorem (CRT), evaluating the expression for X requires $(n-1)$ additions
and n multiplications. In addition, it requires n modulo operations and n divisions, i.e. $n \times n$ operations for computing the
multiplicative inverses and the modulo divisions. Adding all of these, CRT requires
$n^2 + 2n - 1$ computations. This concludes the theoretical computation requirements.
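The operation counts derived in this section can be tabulated with a short script. The MATR and CRT totals, 4n-3 and n^2+2n-1, are taken from the text. The MRC total is an assumption: it reads the text as n(n-1)/2 multiplications plus n(n-1)/2 subtractions for the digits and (n-1) additions plus (n-1) multiplications for the final sum. With that reading, all three columns coincide with the values listed in Table 1. The function names are hypothetical.

```python
def matr_ops(n):
    """Total computations for the matrix method, as stated in the text."""
    return 4 * n - 3


def mrc_ops(n):
    """Assumed MRC total: n(n-1)/2 multiplications + n(n-1)/2 subtractions
    for the digits, plus (n-1) additions and (n-1) multiplications for X."""
    return n * (n - 1) + 2 * (n - 1)


def crt_ops(n):
    """Total computations for CRT, as stated in the text."""
    return n * n + 2 * n - 1


for n in (2, 5, 7, 10):
    print(n, matr_ops(n), mrc_ops(n), crt_ops(n))
# 2 5 4 7
# 5 17 28 34
# 7 25 54 62
# 10 37 108 119
```

The linear growth of the MATR count against the quadratic growth of the MRC and CRT counts is visible already at n = 10.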
 
 In practice, the matrix method is the best choice compared with the other methods in terms of area, as can be seen
from Table 2. Both the practical and the theoretical results show that the matrix method is the best among the methods considered, i.e.
CRT and MRC. In terms of power, MRC is better than the matrix method. For timing, CRT is the best choice.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
5. Conclusion 
      In this paper we proposed an efficient VLSI architecture for a Matrix based RNS backward converter for moduli sets of up to 10
moduli. The performance of the proposed architecture was evaluated both theoretically and practically. The
experimental results suggest that the proposed architecture achieves a 59% area reduction for the 10-moduli set and is
30% more efficient with respect to the Time-Delay Product when compared to state-of-the-art backward converters.
References 
1. A. Omondi and B. Premkumar, Residue Number Systems: Theory and Implementation, Vol. 2, 2007.
2. Keivan Navi, Amir Sabbagh Molahosseini, and Mohammad Esmaeildoust, How to Teach Residue Number System to Computer Scientists and Engineers, IEEE Transactions on Education, Vol. 54, No. 1, February 2011.
3. Kazeem Alagbe Gbolagade and Sorin Dan Cotofana, Generalized Matrix Method for Efficient Residue to Decimal Conversion, 2008.
4. Yuke Wang, Residue-to-Binary Converters Based on New Chinese Remainder Theorem, IEEE Transactions on Circuits and Systems-II: Analog and Digital Signal Processing, Vol. 47, No. 3, March 2000.

Table 1: Theoretical estimated area ($\mu m^2 \times 10^2$) for different numbers of moduli

| No. of moduli | Moduli set | MATR | MRC | CRT |
|---|---|---|---|---|
| 2 | {3,2} | 5 | 4 | 7 |
| 5 | {11,7,5,3,2} | 17 | 28 | 34 |
| 7 | {17,13,11,7,5,3,2} | 25 | 54 | 62 |
| 10 | {29,23,19,17,13,11,7,5,3,2} | 37 | 108 | 119 |


Table 2: Practical estimated area ($\mu m^2 \times 10^2$) for different numbers of moduli

| No. of moduli | Moduli set | MATR | MRC | CRT |
|---|---|---|---|---|
| 2 | {3,2} | 11.5 | 9.5 | 15 |
| 5 | {11,7,5,3,2} | 46.1 | 51 | 57 |
| 7 | {17,13,11,7,5,3,2} | 63.7 | 126 | 180 |
| 10 | {29,23,19,17,13,11,7,5,3,2} | 91.0 | 225 | 279 |

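The area figures in Table 2 can be turned into relative savings with a few lines of Python. The sketch below only reworks the published numbers; the function name `reduction_percent` is hypothetical, and relating the result to the roughly 59% area reduction quoted in the abstract is an assumption, since the text does not state which baseline that figure refers to.

```python
def reduction_percent(proposed, reference):
    """Relative saving of `proposed` over `reference`, in percent."""
    return 100.0 * (reference - proposed) / reference


# Practical area values for the ten-moduli set, copied from Table 2.
area_10 = {"MATR": 91.0, "MRC": 225.0, "CRT": 279.0}

for name in ("MRC", "CRT"):
    saving = reduction_percent(area_10["MATR"], area_10[name])
    print(f"MATR vs {name}: {saving:.1f}% smaller area")
# MATR vs MRC: 59.6% smaller area
# MATR vs CRT: 67.4% smaller area
```

Under this reading, the quoted area reduction appears to match the comparison of MATR against MRC for the ten-moduli set.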

Table 3: Practical timing ($ps \times 10^2$) for different numbers of moduli

| No. of moduli | Moduli set | MATR | MRC | CRT |
|---|---|---|---|---|
| 2 | {3,2} | 98.7 | 108.0 | 92.8 |
| 5 | {11,7,5,3,2} | 114.5 | 138.3 | 106.1 |
| 7 | {17,13,11,7,5,3,2} | 198.3 | 258.6 | 157.8 |
| 10 | {29,23,19,17,13,11,7,5,3,2} | 235.7 | 301.8 | 198.0 |


Table 4: T x A ($ps \times \mu m^2 \times 10^6$) for different numbers of moduli

| No. of moduli | Moduli set | MATR | MRC | CRT |
|---|---|---|---|---|
| 2 | {3,2} | 19 | 36 | 27 |
| 5 | {11,7,5,3,2} | 52 | 71 | 60 |
| 7 | {17,13,11,7,5,3,2} | 76 | 157 | 113 |
| 10 | {29,23,19,17,13,11,7,5,3,2} | 96 | 178 | 148 |

5. Fred J. Taylor, “Residue Arithmetic: A Tutorial with Examples”, IEEE Trans. on Computer, pp. 50-62, May 1984.
6. S. Antao, J.C. Bajard and L. Sousa, “Elliptic Curve Point Multiplication on GPUs”, in Proc. IEEE Int. Conf. ASAP, Rennes, Apr. 2010, pp. 192-199.
7. V.T. Goh and M.U. Siddiqi, “Multiple Error Detection and Correction Based on Redundant Residue Number Systems”, IEEE Trans. Commun., Vol. 56, No. 3, pp. 325-330, Mar. 2008.
8. W. Wang, M.N.S. Swamy, M.O. Ahmad, “RNS Application for Digital Image Processing”, in Proc. 4th IEEE Int. Workshop System-on-Chip for Real Time Appl., 2004, pp. 77-80.
