Abstract-Residue Number System (RNS) is a valuable tool for fast and parallel arithmetic. It is widely applied in digital signal processing, fault-tolerant systems, etc. In this work, we introduce the 3-moduli set $\{2^n, 2^{2n}-1, 2^{2n}+1\}$ and propose its residue-to-binary converter using the Chinese Remainder Theorem. We present its simple hardware implementation, which mainly consists of one Carry Save Adder (CSA) and a Modular Adder (MA). We compare the performance and area utilization of our reverse converter to the reverse converters of moduli sets that have the same dynamic range, and we demonstrate that our architecture is better in terms of both performance and area utilization. We also show that our reverse converter is faster than the reverse converter of $\{2^n-1, 2^n, 2^n+1\}$ for dynamic ranges such as 8-bit, 16-bit, 32-bit and 64-bit; however, it requires more area.
Index Terms-Residue arithmetic, Residue to binary converter, Chinese remainder theorem (CRT)

I. INTRODUCTION
Residue Number System (RNS) arithmetic is a valuable tool for theoretical studies of fast arithmetic [5]. With its carry-free operations, parallelism and fault tolerance, RNS has been used in computer arithmetic since the 1950s. These properties have made it very useful in applications including digital signal processing and fault-tolerant systems [4]. Different moduli sets have been presented for RNS that have different properties with regard to reverse conversion (Residue to Binary or R/B), Dynamic Range (DR) and arithmetic operations. Moduli of the forms $2^n$, $2^n-1$ and $2^n+1$ are very popular because of their easy arithmetic operations. The most famous moduli set is $\{2^n-1, 2^n, 2^n+1\}$; several methods have been proposed for its reverse conversion, and the best method has been outlined in [11]. On the other hand, there are some other moduli sets that have greater dynamic ranges in comparison with this moduli set. They include the moduli sets {2 -1). It has been shown that the reverse converter of this moduli set has superior area-time complexity in comparison with the reverse converters of [2] and [3]. In [9], the moduli set $\{2^n, 2^n-1, 2^n+1, 2^n-2^{(n+1)/2}+1, 2^n+2^{(n+1)/2}+1\}$ has been considered, which has the same dynamic range of $2^n \times (2^{4n}-1)$, and a new reverse converter has been proposed that is more efficient than the previous converters including [8] and [10]. In this paper, we introduce the moduli set $\{2^n, 2^{2n}-1, 2^{2n}+1\}$, which has the same dynamic range as [1] and [9], but whose reverse conversion can be carried out faster and with lower hardware area than those of [1] and [9]. Our reverse converter is faster than the reverse converter of [11] for dynamic ranges such as 8-bit, 16-bit, 32-bit and 64-bit; however, it utilizes more area than the reverse converter of [11].
In Section II of this paper we provide a short background for RNS and also introduce the moduli set $\{2^n, 2^{2n}-1, 2^{2n}+1\}$. In Section III, we present two lemmas and consider the reverse conversion scheme for the proposed moduli set using the presented lemmas and the CRT. In Section IV, we provide the hardware implementation of the reverse converter, and in Section V we evaluate this converter and compare the results with similar works. Finally, in Section VI we present our conclusions.
II. BACKGROUND
RNS is defined by a set S of N integers that are pair-wise relatively prime. That is +1)=d=1.□ So our proposed moduli set can be used in RNS and we can consider its reverse converter.
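The requirement that the moduli be pairwise relatively prime can be checked numerically for small n. The following Python sketch is only an illustration (the function names are hypothetical and not part of the original work); it also confirms that the product of the three moduli is $2^n \times (2^{4n}-1)$.

```python
from itertools import combinations
from math import gcd


def moduli_set(n):
    """Return the moduli [2^n, 2^(2n) - 1, 2^(2n) + 1] for a positive integer n."""
    return [1 << n, (1 << (2 * n)) - 1, (1 << (2 * n)) + 1]


def is_pairwise_coprime(moduli):
    """True if every pair of moduli has greatest common divisor 1."""
    return all(gcd(a, b) == 1 for a, b in combinations(moduli, 2))


def dynamic_range(n):
    """Product of the three moduli, i.e. the number of representable values."""
    m1, m2, m3 = moduli_set(n)
    return m1 * m2 * m3


if __name__ == "__main__":
    for n in range(1, 9):
        assert is_pairwise_coprime(moduli_set(n))
        assert dynamic_range(n) == (1 << n) * ((1 << (4 * n)) - 1)
```

A numerical check over a finite range of n is of course not a proof; it only guards against transcription errors in the moduli.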
III. REVERSE CONVERTER
In this section, we present the reverse converter of the moduli set $\{2^n, 2^{2n}-1, 2^{2n}+1\}$, but first we provide two lemmas based on properties that have been used in deriving reverse converters.
Lemma 1: The residue of a negative residue number $(-v)$ modulo $(2^n-1)$ is calculated by the one's complement operation, where $0 \le v < 2^n-1$.
Lemma 2: The multiplication of a residue number $v$ by $2^P$ modulo $(2^n-1)$ is carried out by a $P$-bit circular left shift, where $P$ is a natural number.
Now, to calculate the number $X$ from its residues, we can apply the CRT. The CRT is formulated as
for (4) we have
and for (5) we write
where $K$ is an integer that depends on the value of $X$. By substituting (2)-(5) in (6) we have:
By dividing both sides of (7) by $2^n$ and taking the floor values modulo $(2^{4n}-1)$, we have
In this case the number X can be computed by
Equation (8) can be written as 
Now, we consider (12)-(14) and simplify them for implementation in a VLSI system. Note that $r_{i,j}$ denotes the $j$-th bit of $R_i$.
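Before the bit-level simplification, the overall conversion can be cross-checked in software. The sketch below is not the hardware algorithm of this section: it uses generic modular inverses (Python 3.8+ `pow(x, -1, m)`) rather than the simplified operands. It assumes that $R_1$, $R_2$ and $R_3$ are the residues modulo $2^n$, $2^{2n}-1$ and $2^{2n}+1$ respectively, and that the final step has the form $X = R_1 + 2^n Z$ with $Z = \lfloor X/2^n \rfloor$ obtained modulo $(2^{4n}-1)$. All function names are hypothetical.

```python
def to_residues(x, n):
    """Forward conversion: residues of x for the moduli 2^n, 2^(2n)-1, 2^(2n)+1."""
    m1, m2, m3 = 1 << n, (1 << (2 * n)) - 1, (1 << (2 * n)) + 1
    return x % m1, x % m2, x % m3


def reverse_convert(r1, r2, r3, n):
    """Reference residue-to-binary conversion (assumed residue ordering, see text)."""
    m1, m2, m3 = 1 << n, (1 << (2 * n)) - 1, (1 << (2 * n)) + 1
    big = m2 * m3                       # equals 2^(4n) - 1
    inv_m1 = pow(m1, -1, big)           # multiplicative inverse of 2^n mod 2^(4n)-1
    # Residues of Z = floor(X / 2^n) with respect to the two odd moduli.
    z2 = (r2 - r1) * inv_m1 % m2
    z3 = (r3 - r1) * inv_m1 % m3
    # Two-modulus CRT to recover Z modulo 2^(4n) - 1.
    z = (z2 * m3 * pow(m3, -1, m2) + z3 * m2 * pow(m2, -1, m3)) % big
    # Final step: concatenate Z with the n bits of r1.
    return r1 + (z << n)


if __name__ == "__main__":
    n = 2
    dynamic_range = (1 << n) * ((1 << (4 * n)) - 1)
    for x in range(dynamic_range):
        assert reverse_convert(*to_residues(x, n), n) == x
```

Because the last step only appends the $n$ bits of $R_1$ below $Z$, no adder is needed for it; the arithmetic effort lies entirely in obtaining $Z$ modulo $(2^{4n}-1)$.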
Evaluation of $S_1$:
The residue $R_1$ can be represented in $4n$ bits as follows:
where $\bar{r}$ denotes the complement of $r$.
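The complemented bits come from Lemma 1. A small Python sketch of that lemma follows, under the assumption that values are held as non-negative integers of a fixed bit width; the function names are illustrative only.

```python
def ones_complement(v, width):
    """Bitwise complement of v restricted to `width` bits."""
    return ~v & ((1 << width) - 1)


def negate_mod_mersenne(v, width):
    """Return (-v) mod (2^width - 1) using a bitwise complement."""
    modulus = (1 << width) - 1
    # The all-ones pattern (2^width - 1) is a second representation of zero,
    # so it is reduced to 0 here to allow comparison with integer arithmetic.
    return ones_complement(v, width) % modulus


if __name__ == "__main__":
    width = 8
    for v in range((1 << width) - 1):
        assert negate_mod_mersenne(v, width) == (-v) % ((1 << width) - 1)
```

In hardware the final reduction is not performed at this point; the all-ones pattern is simply carried along as a representation of zero.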
Evaluation of $S_2$:
The residue $R_2$ can be represented in $4n$ bits as follows:
We evaluate the two parts of $S_2$ separately using Lemma 2
which is a $4n$-bit residue number.
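Lemma 2, used here, can be illustrated in the same way. The sketch below assumes a residue $v$ with $0 \le v < 2^{w}-1$ stored in $w$ bits and shows that a circular left shift by $P$ positions equals multiplication by $2^P$ modulo $(2^{w}-1)$. The names `rotate_left` and `mul_pow2_mod_mersenne` are hypothetical.

```python
def rotate_left(v, shift, width):
    """Circular left shift of a `width`-bit value by `shift` positions."""
    mask = (1 << width) - 1
    shift %= width
    return ((v << shift) | (v >> (width - shift))) & mask


def mul_pow2_mod_mersenne(v, p, width):
    """Return (v * 2^p) mod (2^width - 1) for 0 <= v < 2^width - 1."""
    # A rotation never turns a valid residue into the all-ones pattern,
    # so no further reduction is needed.
    return rotate_left(v, p, width)


if __name__ == "__main__":
    width = 8
    modulus = (1 << width) - 1
    for v in range(modulus):
        for p in range(2 * width):
            assert mul_pow2_mod_mersenne(v, p, width) == (v << p) % modulus
```

Since a rotation is only a rewiring of bits, such multiplications cost no gates in the converter.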
Evaluation of $S_3$:
The residue $R_3$ can be represented in $4n$ bits as follows:
For the two parts of $S_3$ we use Lemma 2 and we write
So, $S_3$ consists of two $4n$-bit numbers, $S_{3,1}$ and $S_{3,2}$.
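The $4n$-bit operands obtained above are to be added modulo $(2^{4n}-1)$. As a behavioural illustration only (not a gate-level description of the authors' design), the following Python sketch models one carry-save level with end-around carry followed by a modular addition. The function names are hypothetical.

```python
def csa_eac(a, b, c, width):
    """One carry-save level with end-around carry, modulo 2^width - 1.

    Returns (sum_vector, carry_vector) such that their sum is congruent
    to a + b + c modulo 2^width - 1.
    """
    mask = (1 << width) - 1
    sum_vec = a ^ b ^ c
    majority = (a & b) | (a & c) | (b & c)
    # Carries have weight 2: shift left by one, and let the carry out of the
    # most significant position re-enter at the least significant position.
    carry_vec = ((majority << 1) | (majority >> (width - 1))) & mask
    return sum_vec, carry_vec


def add_mod_mersenne(x, y, width):
    """Return (x + y) mod (2^width - 1) for width-bit inputs x and y."""
    mask = (1 << width) - 1
    total = x + y
    total = (total & mask) + (total >> width)   # end-around carry
    return 0 if total == mask else total        # all ones also represents zero


def three_operand_mod_add(a, b, c, width):
    """Add three width-bit operands modulo 2^width - 1."""
    s, carry = csa_eac(a, b, c, width)
    return add_mod_mersenne(s, carry, width)


if __name__ == "__main__":
    width = 8
    assert three_operand_mod_add(200, 100, 255, width) == (200 + 100 + 255) % 255
```

The wrap-around of the carry vector is an instance of Lemma 2: doubling modulo $(2^{w}-1)$ is a one-bit circular left shift.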
IV. HARDWARE IMPLEMENTATION
To implement the reverse converter, four $4n$-bit numbers should be summed modulo $(2^{4n}-1)$. This requires a 2-level Carry Save Adder (CSA) tree that includes two $4n$-bit CSAs. Nevertheless, by considering (17) and (27), it is clear that the $3n$ rightmost bits of $S_1$ and the $n$ leftmost bits of $S_{3,2}$ are ones. So, we replace the $3n$ rightmost bits of $S_{3,2}$ with the same bits of $S_1$. The new numbers resulting from this manipulation are shown in (28) and (29). Consequently, $S_{3,2}$ now contains $4n$ ones, which is equivalent to zero modulo $(2^{4n}-1)$. We are left with three numbers, and therefore the required 2-level CSA tree can be replaced by only one CSA.
Fig. 1 shows the hardware architecture of the reverse converter. The Operand Preparation (OP) component consists of some wires and inverters and prepares the $4n$-bit numbers for the Multi Operand Modular Adder (MOMA). The CSA tree includes only one $4n$-bit CSA with End-Around Carry (EAC) [6]. The last component in the MOMA is a Modular Adder (MA), which can be implemented using the methods of [6], [7] or [15]. The output of this adder is equivalent to
and consequently, $X$ can be computed by using (9).
The moduli sets of [1] and [9] provide the same dynamic range as our moduli set. So, in this section we compare two properties of our moduli set to those of the moduli sets of [1] and [9]: 1) the time and area complexities of the reverse conversion and 2) the time complexity of the arithmetic operations in their moduli. Finally, we compare our reverse converter to the reverse converter of a 3-moduli set proposed in [11].
Now, we compute the hardware utilization of our reverse converter in terms of adders and basic gates. As outlined in the previous section, we should sum up three $4n$-bit numbers: $S_1$, $S_2$ and $S_{3,1}$. For this purpose, one CSA that includes $4n$ Full Adders (FAs) is sufficient. However, by considering the operands, it is clear that some of these FAs can be simplified further. For the $(n-1)$ rightmost bits, we need $(n-1)$ pairs of XNOR/OR gates instead of $(n-1)$ FAs, since one of the inputs of each FA is 1. Similarly, for the middle $(2n-1)$ bits, we replace the $(2n-1)$ FAs with $(2n-1)$ pairs of XOR/AND gates, since one of the inputs of each FA is 0. For the rest of the bits, we use $(n+2)$ FAs. Besides this MOMA, the operand preparation includes some wires and inverters. Ignoring the wires, it includes $(3n+1)$ inverters. The total amount of hardware used is shown in Table I. It is clear from Table I that our proposed reverse converter requires much less hardware area than the reverse converter of [1], and that it is also superior to the reverse converter of [9], which is the most efficient converter for the moduli set $\{2^n, 2^n-1, 2^n+1, 2^n-2^{(n+1)/2}+1, 2^n+2^{(n+1)/2}+1\}$. In [9], one 4×1 multiplexer is required for generating one of the $4n$-bit operands of the CSA tree. So this operand can take four possible values, each containing only fixed ones and zeros. To account for its associated CSA, we have assumed that the number of ones is approximately equal to the number of zeros; this assumption does not affect the comparison.
The total delay of our reverse converter is the sum of the delays of three components: the operand preparation, the CSA and the MA. The delay of the operand preparation is equal to the delay of a NOT gate. For the CSA, the delay is that of one FA. For the MA, different methods can be applied that have different delays [6], [7], [15]. Here we have used the modular adder of [15]. Adopting the unit-gate delay model [11], [13], [15], we assume $t_{inv}=t_{and}=1$, $t_{mux}=2$, $t_{FA}=2$, $t_{xor}=2$ and consequently, using the method of [15], $t_{MA(n)}=2\log_2(n)+3$. Table II shows the delays of the reverse converters. It can be concluded from Table II that we have eliminated the delay of two FAs in comparison with [1] and the delay of three FAs in comparison with [9]. In addition to this delay improvement, we have used much less hardware than [1] and [9].
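The gate counts and unit-gate delays quoted in the text can be tabulated with a few lines of Python. This is only a bookkeeping sketch: applying $t_{MA}$ to an adder of width $4n$, and leaving the logarithm unrounded, are assumptions of this example rather than statements from the tables, and the function names are hypothetical.

```python
from math import log2

T_INV = 1   # unit-gate delay of an inverter, as assumed in the text
T_FA = 2    # unit-gate delay of a full adder, as assumed in the text


def modular_adder_delay(width):
    """Delay 2*log2(width) + 3 quoted in the text for the adder of [15]."""
    return 2 * log2(width) + 3


def proposed_converter_delay(n):
    """Operand preparation + one CSA level + modular adder.

    Assumption of this sketch: the modular adder operates on 4n bits.
    """
    return T_INV + T_FA + modular_adder_delay(4 * n)


def csa_cell_counts(n):
    """Cell counts stated in the text for the simplified 4n-bit CSA and the OP."""
    return {
        "xnor_or_pairs": n - 1,       # positions with a constant-1 input
        "xor_and_pairs": 2 * n - 1,   # positions with a constant-0 input
        "full_adders": n + 2,         # remaining positions
        "inverters": 3 * n + 1,       # operand preparation
    }


if __name__ == "__main__":
    counts = csa_cell_counts(4)
    # The three kinds of CSA cells together cover all 4n bit positions.
    assert counts["xnor_or_pairs"] + counts["xor_and_pairs"] + counts["full_adders"] == 16
    print(proposed_converter_delay(4))
```

The check that the three cell counts add up to $4n$ is a consistency test on the figures given in the text, nothing more.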
So far, we have shown that our converter has better area and time complexities than those of [1] and [9], but we have left one question unanswered: for an equal dynamic range, is a 4- or 5-moduli set always faster than a 3-moduli set? It is the magnitude of the largest modulus that dictates the speed of arithmetic operations; however, speed and cost depend not only on the width of the residues but also on the moduli chosen [5]. Consequently, for the moduli set of [1], modulo $2^{2n}+1$ determines the overall speed of the RNS. The same is true for our proposed moduli set. Therefore our moduli set and the moduli set of [1] are both restricted to the time performance of modulo $2^{2n}+1$. The moduli set of [9] includes the two moduli $(2^n-2^{(n+1)/2}+1)$ and $(2^n+2^{(n+1)/2}+1)$. Here, we compute the delay of addition modulo $(2^n+2^{(n+1)/2}+1)$ by using the method of [11] and compare it to the delay of addition modulo $(2^{2n}+1)$, computed by using the method of [13]. Table III shows that addition modulo $(2^{2n}+1)$ is much faster than addition modulo $(2^n+2^{(n+1)/2}+1)$. So, we can conclude that although [9] has five moduli, it is not faster than our proposed moduli set. Therefore our moduli set
In addition to the comparison with [1] and [9], we would like to compare our reverse converter to the reverse converters of 3-moduli sets. In [14], it has been shown that the moduli set $\{2^n-1, 2^n, 2^n+1\}$ has the fastest and most area-efficient reverse converter among 3-moduli sets for the dynamic ranges of 8-bit, 16-bit, 32-bit and 64-bit. So, we compare our reverse converter to the reverse converter of [11], which is the most efficient reverse converter for $\{2^n-1, 2^n, 2^n+1\}$. For the sake of a fair comparison, we consider the moduli set $\{2^m-1, 2^m, 2^m+1\}$, where $m$ is chosen to provide a dynamic range similar to that of our moduli set; $m$ can be taken as roughly the floor or ceiling of $5n/3$.
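The choice of $m$ follows from comparing bit widths: the dynamic range of the proposed set occupies $5n$ bits, while that of $\{2^m-1, 2^m, 2^m+1\}$ occupies $3m$ bits. The short Python sketch below makes this explicit; the function names are hypothetical.

```python
def proposed_range_bits(n):
    """Bit width of the dynamic range 2^n * (2^(4n) - 1)."""
    return ((1 << n) * ((1 << (4 * n)) - 1)).bit_length()


def three_moduli_range_bits(m):
    """Bit width of the dynamic range of {2^m - 1, 2^m, 2^m + 1}."""
    return (((1 << m) - 1) * (1 << m) * ((1 << m) + 1)).bit_length()


def candidate_m(n):
    """Floor and ceiling of 5n/3, computed with integer arithmetic."""
    return (5 * n) // 3, -((-5 * n) // 3)


if __name__ == "__main__":
    for n in range(1, 13):
        low, high = candidate_m(n)
        print(n, proposed_range_bits(n),
              three_moduli_range_bits(low), three_moduli_range_bits(high))
```

When $5n$ is not a multiple of 3, the two candidates bracket the target width: the floor gives a slightly smaller dynamic range and the ceiling a slightly larger one.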
By using this approximation, the hardware utilization of the reverse converter of [11] has been derived and included in Table I. In Table II, we have compared our reverse converter to the reverse converter of [11] considering two cases. In case (1), our reverse converter is faster than the reverse converter of [11]; it is worth mentioning that, for example, for $n$ in [1, 50], this case covers 73% of dynamic ranges, including 8-bit, 16-bit, 32-bit and 64-bit. In case (2), which covers 26% of dynamic ranges, our reverse converter and the reverse converter of [11] have the same delay, but [11] requires less hardware area. Table IV shows the area and delay comparison of the proposed reverse converter and that of [11] using the unit-gate model, where the hardware area utilization of the gates is $A_{NOT}=A_{AND}=A_{OR}=1$ and $A_{XOR}=2$. The hardware area utilization of the modular adder has been computed using the adder of [15]. It can be concluded that the comparison of our work and [11] is dictated purely by the chosen dynamic range. However, for the discussed dynamic ranges, our reverse converter is faster than the reverse converter of [11], while [11] requires less area.
VI. CONCLUSION
In this paper we proposed the moduli set $\{2^n, 2^{2n}-1, 2^{2n}+1\}$ and its reverse converter. This moduli set provides a dynamic range of $2^n \times (2^{4n}-1)$, and the implementation results have shown that its reverse converter has better area and time complexities than those of the moduli sets with the same dynamic range. We also showed that, for the majority of similar dynamic ranges, our reverse converter is faster than the reverse converter of $\{2^n-1, 2^n, 2^n+1\}$, but the reverse converter of $\{2^n-1, 2^n, 2^n+1\}$ requires less area.
ACKNOWLEDGMENT
The authors wish to acknowledge the valuable help of Dr. T. Vergos with the modular adders. 
REFERENCES
