Abstract-Residue number system (RNS) is a non-weighted integer number system that represents an ordinary integer by its residues with respect to a set of moduli. In this paper, a reverse converter based on the CRT algorithm is proposed for the general three-moduli set $\{2^n-1, 2^n+1, 2^{pn+1}-1\}$, in which $p$ is an even number greater than zero. The special case of this set for $p=2$, namely $\{2^n-1, 2^n+1, 2^{2n+1}-1\}$, is also described. Since the dynamic range of this set is odd, some difficult problems in RNS can be solved easily with parity checking. The proposed reverse converter is better in speed and hardware cost than reverse converters with a similar dynamic range. Moreover, the internal arithmetic circuits of this moduli set are less complex than those of other sets with a similar dynamic range.
Index Terms-Reverse Converter, Moduli Set, Dynamic Range, Residue Number System
I. INTRODUCTION
Residue number system (RNS) is a non-weighted number system specified by a moduli set $\{m_1, m_2, \dots, m_n\}$, in which an integer $X$ is represented as $(x_1, x_2, \dots, x_n)$ with $x_i = X \bmod m_i$. Arithmetic operations on residues can be performed in all moduli channels in parallel, without carry propagation between them. RNS has been widely considered for efficient hardware implementation of digital signal processing (DSP) [3] and of high-speed FIR filters [4]. Moreover, RNS has applications in image processing systems, especially RNS image coding, which can offer high-speed VLSI implementations of secure image processing algorithms [5].
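As a minimal illustration of the representation and of channel-wise arithmetic, the Python sketch below forward-converts integers and then adds and multiplies them residue by residue. The helper names `to_rns`, `rns_add` and `rns_mul` are hypothetical, the moduli {7, 9, 127} (the instance n=3, p=2 of the set studied in this paper) are only an example, and inputs are assumed to be non-negative and below the dynamic range.

```python
from math import prod


def to_rns(x: int, moduli: list[int]) -> tuple[int, ...]:
    """Forward conversion: residue x_i = x mod m_i for each channel."""
    return tuple(x % m for m in moduli)


def rns_add(a: tuple[int, ...], b: tuple[int, ...], moduli: list[int]) -> tuple[int, ...]:
    # Each channel is reduced on its own; no carry crosses channels.
    return tuple((ai + bi) % m for ai, bi, m in zip(a, b, moduli))


def rns_mul(a: tuple[int, ...], b: tuple[int, ...], moduli: list[int]) -> tuple[int, ...]:
    # Multiplication is also channel-wise and carry-free.
    return tuple((ai * bi) % m for ai, bi, m in zip(a, b, moduli))


if __name__ == "__main__":
    moduli = [7, 9, 127]                 # dynamic range M = 8001
    a, b = to_rns(130, moduli), to_rns(25, moduli)
    print(a)                             # (4, 4, 3)
    print(rns_add(a, b, moduli) == to_rns(155, moduli))
    print(rns_mul(a, b, moduli) == to_rns(130 * 25 % prod(moduli), moduli))
```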
Because operations such as addition, subtraction and multiplication are carry-free, their implementation is easy and fast. Other operations, such as sign determination, number comparison and overflow detection, cannot be performed without carries between moduli, so they are considered fundamental problems in RNS. Different solutions have been proposed for these operations. One of them is parity checking, where the parity is the residue with respect to a redundant modulus 2. However, this solution is easy to implement only when the dynamic range is odd.
Moduli set selection and reverse converter design are very significant in RNS. Reverse conversion is mainly implemented with either the Chinese remainder theorem (CRT) or mixed-radix conversion (MRC), or with a combination of the two.
One of the most popular moduli sets is $\{2^n-1, 2^n, 2^n+1\}$, but its dynamic range is no longer sufficient for many applications. Therefore, moduli sets with a larger dynamic range, and sets with more moduli for increased parallelism, have been proposed. The modulus $2^n$ is attractive with respect to the hardware cost and delay of arithmetic circuits and converters. However, this modulus is even, so the dynamic range is even, and parity checking cannot be used as a solution to the fundamental problems in RNS.
A few moduli sets with odd dynamic ranges have also been proposed, such as {2 -1} [16]. Because they use the moduli $2^n-3$ and $2^n+3$, the internal arithmetic circuits of [6] and [7] are highly complex. Compared with our reverse converter, the reverse converter for the moduli set $\{2^n-1, 2^n+1, 2^{2n}+1\}$ performs better, since its third modulus $2^{2n}+1$ is closely related to the product $(2^n-1)(2^n+1) = 2^{2n}-1$ of the other two moduli. However, arithmetic circuits for moduli of the form $2^k+1$ are complex, and this set contains two of them, which decreases the performance of the overall RNS. In the moduli set $\{2^{n/2}-1, 2^{n/2}+1, 2^n+1, 2^{2n+1}-1\}$, the parallelism is increased, but the moduli are unbalanced and two moduli of the form $2^k+1$ are again present, which decreases the performance of the overall RNS.
In this paper, a reverse converter based on the CRT algorithm is proposed for the general three-moduli set $\{2^n-1, 2^n+1, 2^{pn+1}-1\}$, in which $p$ is an even number greater than zero. By treating $p$ as a parameter, an appropriate dynamic range can be chosen. Since all three moduli are odd, the dynamic range is odd, and consequently the proposed set is amenable to solving difficult RNS problems with parity checking.
The rest of the paper is organized as follows. Section II gives a brief introduction to RNS, Section III presents the design of the proposed converter, first for the general case and then for the special case, and the paper ends with a performance evaluation and the conclusion.
To perform arithmetic operations on a weighted number in an RNS, a converter is needed to decompose the weighted binary number into its residue representation with respect to the moduli set. This converter is a binary-to-residue converter (forward converter). After forward conversion, arithmetic operations can be performed on each modulus independently and simultaneously, without carry propagation between residues. To use the result in the form of a weighted number, the resulting RNS number must be converted back into its equivalent weighted binary number by residue-to-binary conversion (reverse conversion).
Binary-to-residue conversion can be implemented simply with multi-operand modular adders. The arithmetic unit includes modular arithmetic circuits for each modulus channel. Reverse conversion, in contrast, involves a significant degree of complexity.
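The following Python sketch illustrates why forward conversion is cheap for moduli of the form $2^k \pm 1$: because $2^k \equiv 1$ modulo $2^k-1$ and $2^k \equiv -1$ modulo $2^k+1$, a residue is a (signed) sum of the k-bit chunks of the input, which is what a multi-operand modular adder accumulates. It is a software model under these assumptions, not a gate-level design, and the function names are hypothetical.

```python
def residue_mod_2k_minus_1(x: int, k: int) -> int:
    """x mod (2^k - 1) for x >= 0: since 2^k ≡ 1, add the k-bit chunks of x."""
    m = (1 << k) - 1
    acc = 0
    while x:
        acc += x & m                     # next k-bit chunk
        x >>= k
    while acc > m:                       # end-around carry: fold high bits back in
        acc = (acc & m) + (acc >> k)
    return 0 if acc == m else acc        # all ones is the second encoding of zero


def residue_mod_2k_plus_1(x: int, k: int) -> int:
    """x mod (2^k + 1) for x >= 0: since 2^k ≡ -1, alternate the chunk signs."""
    mask = (1 << k) - 1
    acc, sign = 0, 1
    while x:
        acc += sign * (x & mask)
        sign = -sign
        x >>= k
    return acc % ((1 << k) + 1)          # Python's % is non-negative here
```

For example, `residue_mod_2k_minus_1(130, 3)` sums the 3-bit chunks 2, 0 and 2 of 130 = 0b10000010 and returns 4.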
The algorithms for residue-to-binary conversion are mainly based on the Chinese remainder theorem (CRT) and mixed-radix conversion (MRC).
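In the CRT [1], the number $X$ with residue representation $(x_1, x_2, \dots, x_n)$ with respect to the moduli set $\{m_1, m_2, \dots, m_n\}$ is obtained as follows:

$$X = \left| \sum_{i=1}^{n} M_i \left| M_i^{-1} \right|_{m_i} x_i \right|_M$$

where $M = \prod_{i=1}^{n} m_i$ is the dynamic range, $M_i = M/m_i$, and $\left| M_i^{-1} \right|_{m_i}$ is the multiplicative inverse of $M_i$ modulo $m_i$.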
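A direct Python transcription of the CRT formula is shown below as a reference model for checking converter outputs. It is a sketch that assumes pairwise coprime moduli; `crt_reverse` is a hypothetical name, and the modular inverse relies on the built-in `pow(base, -1, mod)`, available since Python 3.8.

```python
from math import prod


def crt_reverse(residues: tuple[int, ...], moduli: tuple[int, ...]) -> int:
    """X = | sum_i M_i * |M_i^{-1}|_{m_i} * x_i |_M  (Chinese remainder theorem)."""
    M = prod(moduli)
    total = 0
    for x_i, m_i in zip(residues, moduli):
        M_i = M // m_i
        inv = pow(M_i, -1, m_i)          # |M_i^{-1}|_{m_i}; raises if not invertible
        total += M_i * inv * x_i
    return total % M                     # final reduction modulo the dynamic range
```

With the first example of this paper, `crt_reverse((4, 4, 3), (7, 9, 127))` returns 130.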
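By the MRC algorithm [1], the residue representation $(x_1, x_2, \dots, x_n)$ with respect to the moduli set $\{m_1, m_2, \dots, m_n\}$ can be converted into the weighted number $X$ as follows:

$$X = a_n \prod_{i=1}^{n-1} m_i + \dots + a_3 m_2 m_1 + a_2 m_1 + a_1$$

The coefficients $a_i$ can be obtained from the residues by

$$a_i = \left| \left( \left( \left( x_i - a_1 \right) \left| m_1^{-1} \right|_{m_i} - a_2 \right) \left| m_2^{-1} \right|_{m_i} - \dots - a_{i-1} \right) \left| m_{i-1}^{-1} \right|_{m_i} \right|_{m_i}$$

where $i > 1$ and $a_1 = x_1$.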
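The MRC recurrence can likewise be sketched in Python. Unlike the CRT, it needs no final reduction modulo $M$, because the mixed-radix digits satisfy $0 \le a_i < m_i$. The function name is hypothetical and pairwise coprime moduli are assumed.

```python
def mrc_reverse(residues: tuple[int, ...], moduli: tuple[int, ...]) -> int:
    """Mixed-radix conversion: X = a_1 + a_2 m_1 + ... + a_n m_1 ... m_{n-1}."""
    a: list[int] = []
    for i, m_i in enumerate(moduli):
        t = residues[i]
        # a_i = |(...((x_i - a_1)|m_1^{-1}| - a_2)|m_2^{-1}| ... - a_{i-1})|m_{i-1}^{-1}||_{m_i}
        for j in range(i):
            t = (t - a[j]) * pow(moduli[j], -1, m_i) % m_i
        a.append(t % m_i)
    # Weighted sum of the mixed-radix digits a_i.
    x, weight = 0, 1
    for a_i, m_i in zip(a, moduli):
        x += a_i * weight
        weight *= m_i
    return x
```

For the moduli set {3, 5, 31} and residues (1, 2, 6), the digits are $a_1 = 1$, $a_2 = 2$, $a_3 = 2$, giving $1 + 2 \cdot 3 + 2 \cdot 15 = 37$.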
RNS has many applications in digital signal processing (DSP), image processing, the RSA algorithm and communication systems. RNS also offers new approaches to the design of error detection and error correction codes. However, some basic operations of arithmetic logic units (ALUs) and DSP systems, such as number comparison, parity checking, base extension, sign determination and overflow detection, are difficult to perform in RNS, which limits many RNS-based applications.
Some difficult problems in RNS, such as number comparison, sign determination and overflow detection, can be solved with parity checking. Moreover, parity checking is a fundamental operation for division and scaling in RNS, and for odd moduli sets it is one of the fundamental issues [9].
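To make the parity argument concrete, the sketch below detects overflow of an RNS addition when the dynamic range $M$ is odd: if $a + b \ge M$, the stored sum is $a + b - M$, and subtracting an odd $M$ flips the parity. Here the parity is obtained by a full CRT reverse conversion purely for illustration (a design could instead keep a redundant modulus-2 channel, as described earlier); all function names are hypothetical.

```python
from math import prod


def crt(residues, moduli) -> int:
    """Reverse conversion by the CRT (used here only to obtain the parity)."""
    M = prod(moduli)
    return sum((M // m) * pow(M // m, -1, m) * r for r, m in zip(residues, moduli)) % M


def parity(residues, moduli) -> int:
    return crt(residues, moduli) & 1


def add_overflows(a, b, moduli) -> bool:
    """With an odd dynamic range M, a + b >= M exactly when the parity of the
    RNS sum differs from parity(a) XOR parity(b)."""
    if prod(moduli) % 2 == 0:
        raise ValueError("parity checking needs an odd dynamic range")
    s = tuple((x + y) % m for x, y, m in zip(a, b, moduli))
    return parity(s, moduli) != parity(a, moduli) ^ parity(b, moduli)
```

With the even dynamic range of $\{2^n-1, 2^n, 2^n+1\}$ this check does not work, because subtracting an even $M$ leaves the parity unchanged; the sketch therefore rejects even dynamic ranges.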
Each RNS is based on a moduli set consisting of pairwise relatively prime integers. Most algorithms for these difficult operations are based on reverse conversion; hence, an efficient reverse converter greatly simplifies their hardware implementation.
The complexity of the residue-to-binary converter and the speed of the RNS arithmetic circuits depend mainly on the form and the number of moduli in the moduli set.
The most widely used moduli set is $\{2^n-1, 2^n, 2^n+1\}$ [10]. Reverse conversion, modular addition and modular multiplication for this set are generally not complex, but because its dynamic range is even, number comparison, sign determination and overflow detection cannot be accomplished with parity checking.
III. DESIGN OF REVERSE CONVERTER

A. Design of Reverse Converter for General Three Moduli Set $\{2^n-1, 2^n+1, 2^{pn+1}-1\}$
To design the reverse converter, we use the CRT algorithm together with the following lemmas and properties.
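Since $\gcd(2^a-1, 2^b-1) = 2^{\gcd(a,b)}-1$ and $\gcd(n, pn+1) = 1$, the moduli $2^n-1$ and $2^{pn+1}-1$ are relatively prime. Because $2^n+1$ divides $2^{2n}-1$ and, for even $p$, $pn+1$ is odd so that $\gcd(2n, pn+1) = 1$, the moduli $2^n+1$ and $2^{pn+1}-1$ are also relatively prime. Finally, $\gcd(2^n+1, 2^n-1) = 1$, because both numbers are odd and differ by 2. Therefore, all three moduli are pairwise relatively prime.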
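The coprimality argument can be checked numerically for small parameters. The sketch below uses hypothetical helpers that build the set for given n and p and test every pair with `math.gcd`; it also shows that the restriction to even p matters, since n=3, p=1 yields {7, 9, 15}, where gcd(9, 15) = 3.

```python
from itertools import combinations
from math import gcd, prod


def moduli_set(n: int, p: int) -> tuple[int, int, int]:
    """{2^n - 1, 2^n + 1, 2^(pn+1) - 1}."""
    return (1 << n) - 1, (1 << n) + 1, (1 << (p * n + 1)) - 1


def pairwise_coprime(ms) -> bool:
    return all(gcd(a, b) == 1 for a, b in combinations(ms, 2))


if __name__ == "__main__":
    for n in range(2, 12):
        for p in (2, 4, 6, 8):
            ms = moduli_set(n, p)
            # pairwise coprime, and the dynamic range is odd
            assert pairwise_coprime(ms) and prod(ms) % 2 == 1
    print(pairwise_coprime(moduli_set(3, 1)))   # False: gcd(9, 15) = 3
```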
Lemma 1: The multiplicative inverse of $(2^n+1)(2^{pn+1}-1)$ modulo $2^n-1$ is $2^{n-1}$.
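Proof: Since $|2^n|_{2^n-1} = 1$, we have $|2^n+1|_{2^n-1} = 2$ and $|2^{pn+1}-1|_{2^n-1} = |2 \cdot (2^n)^p - 1|_{2^n-1} = 1$, so $|(2^n+1)(2^{pn+1}-1)|_{2^n-1} = 2$. Substituting this result into equation (8), we have:

$$\left| 2^{n-1} \times 2 \right|_{2^n-1} = \left| 2^n \right|_{2^n-1} = 1$$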
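Lemma 2: The multiplicative inverse of $(2^n-1)(2^{pn+1}-1)$ modulo $2^n+1$ is $2^{n-1}$.
Proof: Since $|2^n|_{2^n+1} = -1$ and $p$ is an even number, we have $|2^n-1|_{2^n+1} = -2$ and $|2^{pn+1}-1|_{2^n+1} = |2 \cdot (-1)^p - 1|_{2^n+1} = 1$, so $|(2^n-1)(2^{pn+1}-1)|_{2^n+1} = -2$. Substituting this result into equation (10), we have:

$$\left| 2^{n-1} \times (-2) \right|_{2^n+1} = \left| -2^n \right|_{2^n+1} = 1$$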
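Lemma 3: The multiplicative inverse of $(2^n-1)(2^n+1)$ modulo $2^{pn+1}-1$ is

$$-\left(2^{(p-2)n+1} + 2^{(p-4)n+1} + \dots + 2^{2n+1} + 2\right).$$

Proof: Since $(2^n-1)(2^n+1) = 2^{2n}-1$ and the sum telescopes, $(2^{2n}-1)\left(2^{(p-2)n+1} + \dots + 2^{2n+1} + 2\right) = 2^{pn+1}-2$, which is congruent to $-1$ modulo $2^{pn+1}-1$. By substituting this result into equation (11), we have:

$$\left| -(2^{2n}-1)\left(2^{(p-2)n+1} + \dots + 2^{2n+1} + 2\right) \right|_{2^{pn+1}-1} = \left| -(2^{pn+1}-2) \right|_{2^{pn+1}-1} = 1$$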
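The three lemmas can be sanity-checked numerically. The sketch below computes the claimed inverses, writing the Lemma 3 sum $-(2^{(p-2)n+1} + \dots + 2^{2n+1} + 2)$ as $-\sum_{j=0}^{p/2-1} 2^{2nj+1}$, and verifies each against its modulus. The function names are hypothetical, and $n \ge 2$ is assumed so that $2^n-1 > 1$.

```python
def inverses(n: int, p: int):
    """Moduli and the inverses claimed by Lemmas 1-3 (p even, p > 0, n >= 2)."""
    m1, m2, m3 = (1 << n) - 1, (1 << n) + 1, (1 << (p * n + 1)) - 1
    k1 = 1 << (n - 1)                     # Lemma 1: inverse of m2*m3 mod m1
    k2 = 1 << (n - 1)                     # Lemma 2: inverse of m1*m3 mod m2
    # Lemma 3: -(2^((p-2)n+1) + ... + 2^(2n+1) + 2), inverse of m1*m2 mod m3
    k3 = -sum(1 << (2 * n * j + 1) for j in range(p // 2))
    return (m1, m2, m3), (k1, k2, k3)


def check(n: int, p: int) -> bool:
    (m1, m2, m3), (k1, k2, k3) = inverses(n, p)
    return (k1 * m2 * m3 % m1 == 1
            and k2 * m1 * m3 % m2 == 1
            and k3 * m1 * m2 % m3 == 1)   # Python's % is non-negative here


if __name__ == "__main__":
    print(all(check(n, p) for n in range(2, 16) for p in (2, 4, 6, 8)))  # True
```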
Property 1: Multiplying a residue $v$ by $2^p$ modulo $2^n-1$ is equivalent to a $p$-bit circular left shift of $v$, where $p$ is a natural number. The proof is given in [1].
Property 2: The negative of a residue $v$ modulo $2^n-1$, i.e. $|-v|_{2^n-1}$, is equal to the one's complement of $v$, where $0 \le v < 2^n-1$. The proof is given in [1].
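Both properties can be checked bit-accurately with a few lines of Python. In this sketch the shift amount is called `s` (rather than p, to avoid a clash with the parameter p of the moduli set), words are k bits wide, and the helper names are hypothetical.

```python
def rotl(v: int, s: int, k: int) -> int:
    """Circular left shift of the k-bit word v by s positions (Property 1)."""
    s %= k
    mask = (1 << k) - 1
    return ((v << s) | (v >> (k - s))) & mask


def ones_complement(v: int, k: int) -> int:
    """Bitwise NOT of the k-bit word v (Property 2)."""
    return ~v & ((1 << k) - 1)


if __name__ == "__main__":
    k, m = 5, 31
    for v in range(m):                      # 0 <= v < 2^k - 1
        for s in range(12):
            assert rotl(v, s, k) % m == v * 2**s % m
        assert ones_complement(v, k) % m == -v % m
    print(rotl(0b00011, 2, 5))             # 12, i.e. 3 * 2^2 mod 31
```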
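Based on the CRT algorithm, the number $X$ with residue representation $(x_1, x_2, x_3)$ with respect to the moduli set $\{m_1, m_2, m_3\} = \{2^n-1, 2^n+1, 2^{pn+1}-1\}$ is obtained as follows:

$$X = \left| m_2 m_3 k_1 x_1 + m_1 m_3 k_2 x_2 + m_1 m_2 k_3 x_3 \right|_M$$

where $M = m_1 m_2 m_3$ and $k_1$, $k_2$, $k_3$ are the multiplicative inverses given by Lemmas 1, 2 and 3. By substituting these inverses into (14), we have:

$$X = \left| (2^n+1)(2^{pn+1}-1)\,2^{n-1} x_1 + (2^n-1)(2^{pn+1}-1)\,2^{n-1} x_2 - (2^{2n}-1)\left(2^{(p-2)n+1} + \dots + 2^{2n+1} + 2\right) x_3 \right|_M$$

According to [1], we can write $X = x_3 + m_3 Y$ with $Y = \lfloor X / m_3 \rfloor$. Dividing the expression above by $m_3 = 2^{pn+1}-1$ turns the coefficient of $x_3$ into $(m_1 m_2 k_3 - 1)/m_3 = -1$, so that

$$Y = \left| (2^{2n-1}+2^{n-1})\,x_1 + (2^{2n-1}-2^{n-1})\,x_2 - x_3 \right|_{2^{2n}-1}$$

With respect to the moduli set $\{2^n-1, 2^n+1, 2^{pn+1}-1\}$, the residues $(x_1, x_2, x_3)$ have the following binary representations:

$$x_1 = (x_{1,n-1} \dots x_{1,1} x_{1,0})_2, \quad x_2 = (x_{2,n} \dots x_{2,1} x_{2,0})_2, \quad x_3 = (x_{3,pn} \dots x_{3,1} x_{3,0})_2$$

By substituting these representations into equation (17), the value of X is calculated as follows, where $Yx_3$ denotes the concatenation of $Y$ and $x_3$, which requires no additional hardware:

$$X = x_3 + (2^{pn+1}-1)\,Y = 2^{pn+1}\,Y + x_3 - Y = Yx_3 - Y$$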
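Putting the pieces together, the following Python model computes X from its residues. It is our reading of the conversion equations rather than the paper's exact formulation: it assumes the inverses $k_1 = k_2 = 2^{n-1}$ and $k_3 = -(2^{(p-2)n+1} + \dots + 2)$, for which $(m_1 m_2 k_3 - 1)/m_3 = -1$, so that $Y = |(2^{2n-1}+2^{n-1})x_1 + (2^{2n-1}-2^{n-1})x_2 - x_3|_{2^{2n}-1}$ and $X = Yx_3 - Y$. It models only the arithmetic, not the CSA/EAC hardware, and `reverse_convert` is a hypothetical name.

```python
def reverse_convert(x1: int, x2: int, x3: int, n: int, p: int) -> int:
    """Residues (x1, x2, x3) of {2^n - 1, 2^n + 1, 2^(pn+1) - 1} -> X, p even.

    Y = | (2^(2n-1) + 2^(n-1)) x1 + (2^(2n-1) - 2^(n-1)) x2 - x3 |_(2^(2n) - 1)
    X = x3 + (2^(pn+1) - 1) Y = (Y concatenated with x3) - Y
    """
    if p <= 0 or p % 2:
        raise ValueError("p must be an even number greater than zero")
    m12 = (1 << (2 * n)) - 1                  # m1 * m2 = 2^(2n) - 1
    w3 = p * n + 1                            # bit width of x3
    c1 = (1 << (2 * n - 1)) + (1 << (n - 1))  # m2 * k1 (Lemma 1)
    c2 = (1 << (2 * n - 1)) - (1 << (n - 1))  # m1 * k2 (Lemma 2)
    y = (c1 * x1 + c2 * x2 - x3) % m12        # coefficient of x3 reduces to -1 (Lemma 3)
    concat = (y << w3) | x3                   # "Yx3": pure wiring in hardware
    return concat - y
```

Both examples in this section are reproduced: `reverse_convert(4, 4, 3, n=3, p=2)` gives 130 and `reverse_convert(1, 2, 6, n=2, p=2)` gives 37.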
Example:
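For the moduli set $\{2^n-1, 2^n+1, 2^{pn+1}-1\}$ with $p=2$ and $n=3$, we have the moduli set $\{7, 9, 127\}$. We calculate X from the residue representation $(4, 4, 3)$ as follows:

$$Y = \left| (2^5+2^2) \times 4 + (2^5-2^2) \times 4 - 3 \right|_{63} = \left| 144 + 112 - 3 \right|_{63} = 1$$
$$X = Yx_3 - Y = (1 \cdot 2^7 + 3) - 1 = 130$$

Indeed, the residue representation of $X = 130$ with respect to the moduli set $\{7, 9, 127\}$ is $(4, 4, 3)$, so the result is correct.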
The hardware architecture of the proposed reverse converter is shown in Figure 1. It is built from modular adders and logic gates. In this structure, the residue number $(x_1, x_2, x_3)$ is converted into $(p/2)+4$ vectors $v_i$ by operation preparation 1 (O.P.1), which consists of $(n+2)$ NOT gates. To calculate Y, we use a tree of $2n$-bit carry-save adders (CSAs) with end-around carry (EAC), whose first module adds the three vectors $v_1$, $v_2$ and $v'$.
Since $v_2$ in equation (24) has $n-1$ constant "0" bits, $n-1$ full adders (FAs) can be replaced with $n-1$ half adders (HAs). Since $v'$ in equation (31) has $n-2$ constant "1" bits, $n-2$ FAs can be replaced with $n-2$ XNOR/OR pairs. The remaining vectors are added sequentially to the result of the previous module. The worst case is when $(p/2)+1$ addition modules are needed, and the best case is when $p/2$ is a multiple of three, which decreases the delay. Here we consider the worst case. The delay of each CSA is equal to the delay of one FA.
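A carry-save stage modulo $2^k-1$ differs from an ordinary CSA only in that the carry vector's most significant bit wraps around to bit 0. The sketch below models one such stage and a sequential chain of stages in Python. It assumes k-bit inputs, the names are hypothetical, and it does not model the HA and XNOR/OR simplifications for constant bits.

```python
def csa_eac(a: int, b: int, c: int, k: int) -> tuple[int, int]:
    """One k-bit carry-save (3:2) stage modulo 2^k - 1 with end-around carry.

    Returns (s, cy) with s + cy ≡ a + b + c (mod 2^k - 1).
    """
    mask = (1 << k) - 1
    s = a ^ b ^ c                         # full-adder sum bits
    cy = (a & b) | (a & c) | (b & c)      # full-adder carry bits (weight 2)
    # Shifting the carries left by one and wrapping the MSB into bit 0 is a
    # 1-bit rotation, i.e. multiplication by 2 modulo 2^k - 1 (Property 1).
    cy = ((cy << 1) | (cy >> (k - 1))) & mask
    return s, cy


def csa_chain_eac(vectors: list[int], k: int) -> list[int]:
    """Reduce the vectors to two, feeding each stage's outputs into the next."""
    vs = list(vectors)
    while len(vs) > 2:
        a, b, c = vs.pop(), vs.pop(), vs.pop()
        vs.extend(csa_eac(a, b, c, k))
    return vs
```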
In the worst case, the CSA tree needs $(p/2)+1$ CSA modules. Afterwards, a $2n$-bit one's complement adder modulo $2^{2n}-1$ is required, which is a CPA with EAC; its delay is therefore twice the delay of a $2n$-bit CPA consisting of $2n$ FA modules. To calculate equation (33), we use $2n$ NOT gates and, at the end, a $((2+p)n+1)$-bit regular adder.
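The last two steps can be modeled as follows: a one's complement (EAC) adder for the modulo $2^{2n}-1$ result, and the final subtraction $X = Yx_3 - Y$ performed as $Yx_3 + \overline{Y} + 1$ on a $((2+p)n+1)$-bit adder. This is a sketch with hypothetical names. It assumes Y has already been normalized to $[0, 2^{2n}-2]$, since the all-ones word is a second encoding of zero modulo $2^{2n}-1$ and would otherwise add an extra multiple of M.

```python
def add_eac(a: int, b: int, k: int) -> int:
    """k-bit one's complement addition (CPA with end-around carry), mod 2^k - 1.

    The carry-out is fed back into bit 0; for k-bit inputs this cannot carry
    again. The all-ones result is the second encoding of zero.
    """
    mask = (1 << k) - 1
    s = a + b
    return (s & mask) + (s >> k)


def final_stage(y: int, x3: int, n: int, p: int) -> int:
    """X = (Y || x3) - Y on a ((2+p)n+1)-bit adder, for 0 <= y < 2^(2n) - 1.

    The subtraction is done as (Y || x3) + NOT(Y) + 1: NOT(Y) needs 2n
    inverters, the operand bits above position 2n are constant ones, and
    the +1 is the carry-in of the final adder.
    """
    width = (2 + p) * n + 1
    mask = (1 << width) - 1
    concat = (y << (p * n + 1)) | x3      # concatenation: wiring only
    not_y = ~y & mask
    return (concat + not_y + 1) & mask
```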
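If we take the delay of a CSA to be one FA delay and the delay of an $n$-bit CPA with EAC to be $2n$ FA delays, then summing the $(p/2)+1$ CSA levels, the $4n$ delay of the $2n$-bit CPA with EAC and the $(2+p)n+1$ delay of the final adder gives the final delay:

$$\text{delay} = \left((6+p)n + \left(2 + \frac{p}{2}\right)\right) T_{FA} \qquad (34)$$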
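The delay accounting can be written as a small function for comparing parameter choices. This sketch simply encodes the assumptions stated above (one $T_{FA}$ per CSA level, $2 \cdot 2n$ for the $2n$-bit CPA with EAC, $(2+p)n+1$ for the final adder, NOT gates neglected); the function name is hypothetical.

```python
def converter_delay_tfa(n: int, p: int) -> int:
    """Worst-case delay in units of T_FA, following the accounting in the text."""
    if p <= 0 or p % 2:
        raise ValueError("p must be an even number greater than zero")
    csa_levels = p // 2 + 1            # one FA delay per CSA module
    cpa_eac = 2 * (2 * n)              # 2n-bit CPA with EAC: twice a 2n-bit CPA
    final_adder = (2 + p) * n + 1      # ((2+p)n+1)-bit regular (ripple) adder
    return csa_levels + cpa_eac + final_adder   # NOT gates are neglected


if __name__ == "__main__":
    for n in (4, 8, 16):
        print(n, converter_delay_tfa(n, 2), converter_delay_tfa(n, 4))
```

For $p = 2$ it reduces to $(8n+3)\,T_{FA}$, e.g. 67 $T_{FA}$ for $n = 8$.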
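For the special case $p=2$, i.e. the moduli set $\{2^n-1, 2^n+1, 2^{2n+1}-1\}$, X is finally calculated as follows:

$$Y = \left| (2^{2n-1}+2^{n-1})\,x_1 + (2^{2n-1}-2^{n-1})\,x_2 - x_3 \right|_{2^{2n}-1}, \qquad X = 2^{2n+1}\,Y + x_3 - Y = Yx_3 - Y$$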
Example: For the moduli set $\{2^n-1, 2^n+1, 2^{2n+1}-1\}$ with $n=2$, we have the moduli set $\{3, 5, 31\}$. We calculate X from the residue representation $(1, 2, 6)$ as follows:
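With respect to the equations above, we have:

$$Y = \left| (2^3+2^1) \times 1 + (2^3-2^1) \times 2 - 6 \right|_{15} = \left| 10 + 12 - 6 \right|_{15} = 1$$
$$X = Yx_3 - Y = (1 \cdot 2^5 + 6) - 1 = 37$$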
Indeed, the residue representation of $X = 37$ with respect to the moduli set $\{3, 5, 31\}$ is $(1, 2, 6)$, so the result is correct. The delay of this circuit is $(8n+3)\,T_{FA}$, which is lower than that of moduli sets of the same form with a similar dynamic range. The details of the implementation are shown in Figure 2.
The proposed converter has a lower hardware cost than the reverse converter for the moduli set $\{2^n-1, 2^n-3, 2^n+3, 2^n+1\}$ [6]. According to [1], using memory for large values of $n$ is not economically feasible in terms of either delay or hardware. In addition, the two unusual moduli $2^n-3$ and $2^n+3$ decrease the performance of the arithmetic unit of the RNS. Therefore, from the performance point of view, our reverse converter is better than that of [6]. It also has a lower hardware cost than the reverse converter for the moduli set $\{2^n-1, 2^n, 2^{2n+1}-1\}$ [11], although the latter has less delay.

Another important parameter in RNS design is the speed of internal RNS arithmetic processing. To estimate this parameter, we use the method of [15], in which the time performance of moduli sets is compared. The speed of arithmetic computation for a moduli set is determined by its slowest modulus, the critical modulus. We use the unit-gate delay of a parallel-prefix adder for the critical modulus of each moduli set in Table 1, and the results are shown in Table 3. According to Table 1, the moduli set $\{2^n-1, 2^n+1, 2^{2n}+1\}$ [8], with a $4n$-bit odd dynamic range, has better delay and hardware cost than our reverse converter with a $(4n+1)$-bit dynamic range. However, because that set has two moduli of the form $2^k+1$, the performance of its RNS arithmetic unit decreases, so the time performance of our reverse converter is better. We therefore conclude that the proposed reverse converter is better than other reverse converters with a similar dynamic range.

V. CONCLUSION

In this paper, a general three-moduli set for even $p$ and its reverse converter have been proposed. The $((2+p)n+1)$-bit dynamic range of this moduli set can be adjusted through $p$ to suit different applications.
Because its dynamic range is odd, the proposed moduli set allows efficient implementation of the internal circuits for the fundamental problems of RNS arithmetic and of the overall RNS. The proposed reverse converter performs better in hardware cost and delay than other reverse converters with a similar dynamic range.
