1. Introduction
The Residue Number System (RNS) is an efficient alternative number system that has attracted researchers for over three decades. In RNS, arithmetic operations such as addition and multiplication are performed on residues without carry propagation between them, which results in parallel arithmetic and high-speed hardware implementations (Parhami, 2000; Mohan, 2002; Omondi & Premkumar, 2007). Owing to this feature, many Digital Signal Processing architectures based on RNS have been introduced in the literature (Soderstrand et al., 1986; Diclaudio et al., 1995; Chaves et al., 2004). In particular, RNS is an efficient method for implementing high-speed finite-impulse response (FIR) filters, where the dominant operations are addition and multiplication. Implementations of RNS-based FIR filters show that performance can be considerably increased in comparison with the traditional two's complement binary number system (Jenkins et al., 1977; Conway et al., 2004; Cardarilli et al., 2007).

As described in (Navi et al., 2011), a typical RNS system is based on a moduli set consisting of pairwise relatively prime integers. The product of the moduli is defined as the dynamic range, and it denotes the interval of integers that can be uniquely represented in RNS. The main components of an RNS system are a forward converter, parallel arithmetic channels and a reverse converter. The forward converter encodes a weighted binary number into a residue-represented number with regard to the moduli set; it can be easily realized using modular adders or look-up tables. Each arithmetic channel includes a modular adder, subtractor and multiplier for one modulus of the set. The reverse converter decodes a residue-represented number into its equivalent weighted binary number. The arithmetic channels work completely in parallel without any dependency between them, and this results in a considerable speed enhancement.
However, the overhead of the forward and reverse converters can counteract this speed gain if they are not designed efficiently. Forward converters can be designed using efficient methods. In contrast, the design of reverse converters is considerably more complex and depends on several important factors, such as the conversion algorithm and the type and number of moduli. An efficient moduli set with moduli based on powers of two can greatly reduce the complexity of the reverse converter as well as of the arithmetic channels. For this reason, many different moduli sets have been proposed for RNS, and they can be categorized by their dynamic range. The best-known 3n-bit dynamic range moduli set is $\{2^n-1, 2^n, 2^n+1\}$ (Gallaher et al., 1997; Bhardwaj et al., 1998; Wang et al., 2000; Wang et al., 2002). The main reasons for the popularity of this set are its well-formed and balanced moduli. However, the modulus $2^n+1$ has lower performance than the other two moduli. Hence, efforts have been made to replace the modulus $2^n+1$ with other well-formed RNS moduli, and the resulting moduli sets are $\{2^n-1, 2^n, 2^{n-1}-1\}$ (Hiasat & Abdel-Aty-Zohdy, 1998; Wang et al., 2000b) and $\{2^n-1, 2^n, 2^{n+1}-1\}$ (Mohan, 2007; Lin et al., 2008).

The dynamic ranges provided by these three-moduli sets are not adequate for recent applications that require higher performance. Two approaches have been proposed to solve this problem. The first uses three-moduli sets of specific forms that provide a large dynamic range, such as $\{2^{\alpha}, 2^{\beta}-1, 2^{\beta}+1\}$ where $\alpha < \beta$ (Molahosseini et al., 2008) and $\{2^{2n}, 2^n-1, 2^{n+1}-1\}$ (Molahosseini et al., 2009). The second uses four- and five-moduli sets to increase the dynamic range and the parallelism of the RNS arithmetic unit. The 4n-bit dynamic range four-moduli sets are $\{2^n-1, 2^n, 2^n+1, 2^{n+1}+1\}$ (Bhardwaj et al., 1999; Mohan & Premkumar, 2007) and $\{2^n-1, 2^n, 2^n+1, 2^{n+1}-1\}$ (Vinod et al., 2000; Mohan & Premkumar, 2007).
Although these four-moduli sets include relatively balanced moduli, their multiplicative inverses are very complicated, and this results in low-performance reverse converters. Furthermore, some recent applications require a dynamic range even larger than 4n bits. This demand has led to a new class of moduli sets called large dynamic range four-moduli sets. The first one is the 5n-bit dynamic range moduli set $\{2^n-1, 2^n, 2^n+1, 2^{2n}+1\}$, proposed by (Cao et al., 2003). Next, (Zhang et al., 2008) enhanced the dynamic range to 6n bits and introduced the set $\{2^n-1, 2^n+1, 2^{2n}-2, 2^{2n+1}-3\}$. Moreover, the four-moduli sets $\{2^n-1, 2^n, 2^n+1, 2^{2n+1}-1\}$ and $\{2^n-1, 2^n+1, 2^{2n}, 2^{2n}+1\}$ have been proposed, with 5n-bit and 6n-bit dynamic range, respectively.

In this chapter, after an introduction to RNS and reverse conversion algorithms, we investigate the architectures of the state-of-the-art reverse converters that have been designed for the efficient large dynamic range four-moduli sets $\{2^n-1, 2^n, 2^n+1, 2^{2n}+1\}$, $\{2^n-1, 2^n+1, 2^{2n}, 2^{2n}+1\}$ and $\{2^n-1, 2^n, 2^n+1, 2^{2n+1}-1\}$. Furthermore, we study a recent contribution on a modified version of the four-moduli set $\{2^n-1, 2^n, 2^n+1, 2^{2n+1}-1\}$, namely $\{2^n-1, 2^n+1, 2^{2n}, 2^{2n+1}-1\}$. Finally, we present a performance comparison of the investigated reverse converters in terms of hardware requirements and conversion delays.
2. Background
The fundamental part of RNS (Omondi & Premkumar, 2007) is the moduli set $\{P_1, P_2, \ldots, P_n\}$, where the moduli are pairwise relatively prime, i.e. $\gcd(P_i, P_j) = 1$ for $i \neq j$. A weighted binary number $X$ can be represented as $X = (x_1, x_2, \ldots, x_n)$, where

$$x_i = X \bmod P_i = |X|_{P_i}, \qquad 0 \le x_i < P_i$$
This representation is unique for any integer $X$ in the range $[0, M-1]$, where $M = P_1 P_2 \cdots P_n$ is the dynamic range of the moduli set $\{P_1, P_2, \ldots, P_n\}$ (Taylor, 1984). Addition, subtraction and multiplication on RNS numbers can be performed in parallel due to the absence of carry propagation between residues. The well-known algorithms for reverse conversion are the Chinese remainder theorem (CRT), mixed-radix conversion (MRC) and the new Chinese remainder theorems (New CRTs).
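As a small illustration of the representation described above, the following Python sketch encodes two integers into residues and performs addition and multiplication channel by channel. The function names (`to_rns`, `rns_add`, `rns_mul`) and the choice of the moduli set $\{2^n-1, 2^n, 2^n+1\}$ with $n = 3$ are assumptions made only for this example.

```python
from math import gcd, prod

def to_rns(x, moduli):
    """Forward conversion: the residues x_i = x mod P_i."""
    return tuple(x % p for p in moduli)

def rns_add(a, b, moduli):
    """Channel-wise modular addition; no carry crosses between channels."""
    return tuple((ai + bi) % p for ai, bi, p in zip(a, b, moduli))

def rns_mul(a, b, moduli):
    """Channel-wise modular multiplication."""
    return tuple((ai * bi) % p for ai, bi, p in zip(a, b, moduli))

n = 3
moduli = (2**n - 1, 2**n, 2**n + 1)          # {7, 8, 9}
# The moduli must be pairwise relatively prime.
assert all(gcd(p, q) == 1 for i, p in enumerate(moduli) for q in moduli[i + 1:])
M = prod(moduli)                              # dynamic range, here 504

a, b = 20, 13                                 # a + b and a * b both stay below M
ra, rb = to_rns(a, moduli), to_rns(b, moduli)
sum_rns = rns_add(ra, rb, moduli)
prod_rns = rns_mul(ra, rb, moduli)
print(ra, rb, sum_rns, prod_rns)
```

The results agree with the residues of the ordinary sum and product only as long as those values stay inside the dynamic range $[0, M-1]$.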
In order to design a reverse converter, we first have to select an appropriate moduli set, taking into account the required parallelism and dynamic range. Next, the moduli are substituted into one of the conversion algorithm formulas mentioned above, and the resulting conversion equations are simplified using modulo arithmetic properties to reduce hardware complexity. Finally, the simplified equations are implemented in hardware using binary components such as full adders, half adders, logic gates or look-up tables. In the following, we briefly review the formulas of the reverse conversion algorithms for four-moduli RNSs. Consider the moduli set $(P_1, P_2, P_3, P_4)$ with the corresponding RNS number $(x_1, x_2, x_3, x_4)$. By CRT (Parhami, 2000), the weighted number $X$ can be calculated by

$$X = \left| \sum_{i=1}^{4} \left| x_i N_i \right|_{P_i} M_i \right|_M$$

where $M = P_1 P_2 P_3 P_4$, $M_i = M / P_i$ and $N_i = |M_i^{-1}|_{P_i}$ is the multiplicative inverse of $M_i$ modulo $P_i$.
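The CRT is well suited to parallel implementation; however, its final large modulo-$M$ adder results in an inefficient hardware realization if it is used in its direct form. By MRC (Koc, 1989), the conversion can be done using the following equation:

$$X = v_4 P_3 P_2 P_1 + v_3 P_2 P_1 + v_2 P_1 + v_1 \qquad (6)$$

The coefficients $v_i$ are as follows:

$$v_1 = x_1 \qquad (7)$$

$$v_2 = \left| (x_2 - v_1) \left| P_1^{-1} \right|_{P_2} \right|_{P_2} \qquad (8)$$

$$v_3 = \left| \left( (x_3 - v_1) \left| P_1^{-1} \right|_{P_3} - v_2 \right) \left| P_2^{-1} \right|_{P_3} \right|_{P_3} \qquad (9)$$

$$v_4 = \left| \left( \left( (x_4 - v_1) \left| P_1^{-1} \right|_{P_4} - v_2 \right) \left| P_2^{-1} \right|_{P_4} - v_3 \right) \left| P_3^{-1} \right|_{P_4} \right|_{P_4} \qquad (10)$$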
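The CRT and MRC procedures reviewed above can be sketched in a few lines of Python. This is a software illustration under stated assumptions, not a hardware design: the function names `crt` and `mrc` are chosen for this example, the modular inverses are computed with Python's built-in `pow(a, -1, m)` (Python 3.8+), and the moduli set $\{7, 8, 9, 17\}$, i.e. $\{2^n-1, 2^n, 2^n+1, 2^{n+1}+1\}$ with $n = 3$, is used only as a test case.

```python
from math import prod

def crt(residues, moduli):
    """Chinese remainder theorem: X = | sum_i |x_i * N_i|_{P_i} * M_i |_M."""
    M = prod(moduli)
    total = 0
    for x_i, p_i in zip(residues, moduli):
        M_i = M // p_i
        N_i = pow(M_i, -1, p_i)          # multiplicative inverse of M_i modulo P_i
        total += ((x_i * N_i) % p_i) * M_i
    return total % M                      # the single large modulo-M reduction

def mrc(residues, moduli):
    """Mixed-radix conversion: X = v1 + v2*P1 + v3*P1*P2 + v4*P1*P2*P3."""
    digits = []
    for i, (x_i, p_i) in enumerate(zip(residues, moduli)):
        t = x_i
        # Each digit depends on all previous digits (the sequential part of MRC).
        for j in range(i):
            t = ((t - digits[j]) * pow(moduli[j], -1, p_i)) % p_i
        digits.append(t)
    x, weight = 0, 1
    for v, p in zip(digits, moduli):
        x += v * weight
        weight *= p
    return x

moduli = (7, 8, 9, 17)
X = 1234
residues = tuple(X % p for p in moduli)
print(crt(residues, moduli), mrc(residues, moduli))
```

The sketch makes the trade-off visible: `crt` computes its terms independently but ends with a reduction modulo the full dynamic range, while `mrc` needs only small moduli but computes its digits one after another.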
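Although MRC implies a sequential process, for two- and three-moduli sets it can lead to simple and efficient reverse conversion equations. The New CRT-I (Wang, 2000) uses a more efficient conversion formula:

$$X = x_1 + P_1 \left| k_1 (x_2 - x_1) + k_2 P_2 (x_3 - x_2) + k_3 P_2 P_3 (x_4 - x_3) \right|_{P_2 P_3 P_4} \qquad (11)$$

where the multiplicative inverses $k_1$, $k_2$ and $k_3$ satisfy $|k_1 P_1|_{P_2 P_3 P_4} = 1$, $|k_2 P_1 P_2|_{P_3 P_4} = 1$ and $|k_3 P_1 P_2 P_3|_{P_4} = 1$.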
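A direct software transcription of the New CRT-I formula may help to check the roles of $k_1$, $k_2$ and $k_3$. The sketch below is only an illustration: the function name `new_crt_1` is an assumption of this example, and the test moduli follow the order $P_1 = 2^n$, $P_2 = 2^n+1$, $P_3 = 2^{2n}+1$, $P_4 = 2^n-1$ with $n = 3$.

```python
def new_crt_1(x, P):
    """New CRT-I for four moduli; x and P are 4-tuples in the same order."""
    x1, x2, x3, x4 = x
    P1, P2, P3, P4 = P
    # Multiplicative inverses required by the formula.
    k1 = pow(P1, -1, P2 * P3 * P4)
    k2 = pow(P1 * P2, -1, P3 * P4)
    k3 = pow(P1 * P2 * P3, -1, P4)
    inner = (k1 * (x2 - x1)
             + k2 * P2 * (x3 - x2)
             + k3 * P2 * P3 * (x4 - x3)) % (P2 * P3 * P4)
    return x1 + P1 * inner

n = 3
P = (2**n, 2**n + 1, 2**(2 * n) + 1, 2**n - 1)     # (8, 9, 65, 7)
# With this ordering the inner modulus collapses to 2^(4n) - 1.
assert P[1] * P[2] * P[3] == 2**(4 * n) - 1

X = 20000
x = tuple(X % p for p in P)
print(new_crt_1(x, P))
```

Because $P_1$ is a power of two in this ordering, the final multiplication by $P_1$ and the addition of $x_1$ amount to a shift and a concatenation, which is the main reason such orderings are attractive in hardware.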
The New CRTs have the potential to yield higher-performance reverse converters than CRT and MRC, particularly for some special four-moduli sets. Hence, much research has been done in recent years to discover efficient four-moduli sets that match the properties of the New CRTs. In the next sections, we investigate the reverse converters previously designed for these four-moduli sets.
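For New CRT-II, the four-moduli form commonly given in the literature (Wang, 2000) first combines the residues in pairs and then merges the two partial results:

$$Z = x_1 + P_1 \left| k_2 (x_2 - x_1) \right|_{P_2}, \qquad Y = x_3 + P_3 \left| k_3 (x_4 - x_3) \right|_{P_4}, \qquad X = Z + P_1 P_2 \left| k_1 (Y - Z) \right|_{P_3 P_4}$$

with $|k_1 P_1 P_2|_{P_3 P_4} = 1$, $|k_2 P_1|_{P_2} = 1$ and $|k_3 P_3|_{P_4} = 1$. The following Python sketch assumes this form; the function name `new_crt_2` is hypothetical, and the test moduli use the order $P_1 = 2^n$, $P_2 = 2^{2n+1}-1$, $P_3 = 2^n+1$, $P_4 = 2^n-1$ with $n = 3$.

```python
def new_crt_2(x, P):
    """New CRT-II for four moduli (assumed form: pairwise merge, then combine)."""
    x1, x2, x3, x4 = x
    P1, P2, P3, P4 = P
    k1 = pow(P1 * P2, -1, P3 * P4)
    k2 = pow(P1, -1, P2)
    k3 = pow(P3, -1, P4)
    Z = x1 + P1 * ((k2 * (x2 - x1)) % P2)      # value of X modulo P1*P2
    Y = x3 + P3 * ((k3 * (x4 - x3)) % P4)      # value of X modulo P3*P4
    return Z + P1 * P2 * ((k1 * (Y - Z)) % (P3 * P4))

n = 3
P = (2**n, 2**(2 * n + 1) - 1, 2**n + 1, 2**n - 1)   # (8, 127, 9, 7)
X = 54321
x = tuple(X % p for p in P)
print(new_crt_2(x, P))
```

The two partial results `Z` and `Y` are independent of each other, so they can be computed in parallel before the final combination step.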
3. Reverse converter for the moduli set $\{2^n-1, 2^n, 2^n+1, 2^{2n}+1\}$
The moduli set $\{2^n-1, 2^n, 2^n+1, 2^{2n}+1\}$ was introduced by (Cao et al., 2003), who used New CRT-I to design a fully adder-based reverse converter. In the following, we briefly review the conversion formulas and hardware architecture of that converter. First, consider the moduli set $\{2^n-1, 2^n, 2^n+1, 2^{2n}+1\}$ with corresponding residues $(x_1, x_2, x_3, x_4)$, each represented at bit level. Substituting the moduli values, i.e. $P_1 = 2^n$, $P_2 = 2^n+1$, $P_3 = 2^{2n}+1$ and $P_4 = 2^n-1$, together with the required multiplicative inverses into the New CRT-I formula (11) gives the main conversion equation, in which the sum is reduced modulo $P_2 P_3 P_4 = 2^{4n}-1$.
This main conversion equation can be simplified based on the following two well-known modulo $(2^n-1)$ arithmetic properties.

Property 1: The residue of a negative residue number $(-v)$ modulo $(2^n-1)$ is the one's complement of $v$, where $0 \le v < 2^n-1$ (Hariri et al., 2008).

Property 2: The multiplication of a residue number $v$ by $2^P$ modulo $(2^n-1)$ is carried out by a $P$-bit circular left shift, where $P$ is a natural number (Hariri et al., 2008).
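Both properties are easy to check numerically. The Python sketch below models an $n$-bit word as an integer; the helper names `neg_mod` and `mul_pow2_mod` are assumptions of this example. Note that the one's complement of $v = 0$ is the all-ones word, which is the second representation of zero modulo $2^n-1$, so the comparison is made modulo $2^n-1$.

```python
def neg_mod(v, n):
    """Property 1: -v modulo (2^n - 1) as the n-bit one's complement of v."""
    mask = (1 << n) - 1
    return v ^ mask

def mul_pow2_mod(v, p, n):
    """Property 2: v * 2^p modulo (2^n - 1) as a p-bit circular left shift."""
    mask = (1 << n) - 1
    p %= n                                   # 2^n is congruent to 1, so shifts wrap
    return ((v << p) | (v >> (n - p))) & mask

n = 4
m = (1 << n) - 1                             # modulus 15
v = 0b0110                                   # 6
print(neg_mod(v, n), (-v) % m)               # both are 9
print(mul_pow2_mod(v, 2, n), (v * 4) % m)    # both are 9
```

In hardware, both operations cost no arithmetic at all: the first is a row of inverters and the second is only a rewiring of the bits.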
Next, the binary vectors $v_i$ are simplified based on Properties 1 and 2. The resulting six operands are added using a modulo $(2^{4n}-1)$ multi-operand adder, which can be realised by four carry-save adders (CSAs) with end-around carry (EAC) followed by a modulo $(2^{4n}-1)$ carry-propagate adder (CPA) with EAC (Piestrak, 1994, 1995). The hardware architecture of the resulting converter is shown in Fig. 1.

Fig. 1. The converter for moduli set $\{2^n-1, 2^n, 2^n+1, 2^{2n}+1\}$ (Cao et al., 2003)

4. Reverse converter for the moduli set $\{2^n-1, 2^n+1, 2^{2n}, 2^{2n}+1\}$
The moduli set $\{2^n-1, 2^n+1, 2^{2n}, 2^{2n}+1\}$ has recently been introduced to provide a large dynamic range (6n-bit) and a high-speed reverse converter. Similar to (Cao et al., 2003), New CRT-I is used to design the converter, but with a different moduli order, i.e. $\{2^{2n}, 2^{2n}+1, 2^n+1, 2^n-1\}$. Therefore, letting $P_1 = 2^{2n}$, $P_2 = 2^{2n}+1$, $P_3 = 2^n+1$ and $P_4 = 2^n-1$, and substituting the multiplicative inverses into the New CRT-I formula (11), gives the conversion equation, in which the sum is again reduced modulo $P_2 P_3 P_4 = 2^{4n}-1$.
Therefore, only five operands need to be added, using three CSAs with EAC followed by a CPA with EAC (Piestrak, 1994, 1995). Hence, in comparison with (Cao et al., 2003), which needs four CSAs, this converter saves one 4n-bit CSA with EAC while providing a larger dynamic range. Fig. 2 shows the hardware implementation of this converter.

Fig. 2. The converter for moduli set $\{2^n-1, 2^n+1, 2^{2n}, 2^{2n}+1\}$

5. Reverse converter for the moduli set $\{2^n-1, 2^n, 2^n+1, 2^{2n+1}-1\}$
The main disadvantage of the moduli sets $\{2^n-1, 2^n, 2^n+1, 2^{2n}+1\}$ and $\{2^n-1, 2^n+1, 2^{2n}, 2^{2n}+1\}$ is the modulus $2^{2n}+1$, because the performance of modulo arithmetic circuits for $2^{2n}+1$ is much lower than for the moduli $2^n-1$ and $2^n+1$. Hence, $2^{2n}+1$ has been replaced by the well-formed modulus $2^{2n+1}-1$, which results in the large dynamic range four-moduli set $\{2^n-1, 2^n, 2^n+1, 2^{2n+1}-1\}$. In addition, New CRT-II has been used to design an efficient reverse converter for this moduli set, as described below. Considering $P_1 = 2^n$, $P_2 = 2^{2n+1}-1$, $P_3 = 2^n+1$, $P_4 = 2^n-1$ and the New CRT-II formulas (15)-(17), the conversion equations are obtained.
Therefore, two modulo adders are needed to realize (46) and (50). Moreover, (55) can be implemented using three CSAs with EAC followed by a CPA with EAC. Note that some of the full adders (FAs) of these CPAs and CSAs are simplified to XOR/AND or XNOR/OR pairs due to the constant bits of the inputs. The final result, i.e. (53), can be obtained by a (4n+1)-bit binary adder with a carry-in of '1'. Fig. 3 presents the reverse converter for the moduli set $\{2^n-1, 2^n, 2^n+1, 2^{2n+1}-1\}$.
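The multi-operand modulo $(2^k-1)$ additions used by these converters (CSAs with EAC followed by a CPA with EAC) can be modelled at bit level as follows. This is a behavioural sketch under stated assumptions, not the circuit of any cited design: the function names are hypothetical, operands are assumed to be $k$-bit words, and the CSAs are chained one after another so that the CSA count can be compared with the text (four CSAs for six operands, three CSAs for five).

```python
def csa_eac(a, b, c, k):
    """One k-bit carry-save adder level with end-around carry."""
    mask = (1 << k) - 1
    s = a ^ b ^ c                                  # bitwise sum
    carry = (a & b) | (a & c) | (b & c)            # bitwise majority
    # The carry vector has weight 2; its top bit wraps to bit 0 (EAC).
    carry = ((carry << 1) | (carry >> (k - 1))) & mask
    return s, carry

def cpa_eac(a, b, k):
    """k-bit carry-propagate adder whose carry-out is fed back as carry-in."""
    mask = (1 << k) - 1
    t = a + b
    return (t & mask) + (t >> k)                   # cannot overflow a second time

def mod_add_many(operands, k):
    """Add k-bit operands modulo (2^k - 1); returns (result, CSA levels used)."""
    ops = list(operands)
    levels = 0
    while len(ops) > 2:
        s, c = csa_eac(ops[0], ops[1], ops[2], k)
        ops = [s, c] + ops[3:]
        levels += 1
    total = cpa_eac(ops[0], ops[1], k)
    if total == (1 << k) - 1:                      # all ones also represents zero
        total = 0
    return total, levels

k = 8                                              # e.g. 4n with n = 2
operands = [200, 131, 77, 254, 9, 66]
result, levels = mod_add_many(operands, k)
print(result, levels, sum(operands) % (2**k - 1))
```

The end-around carry is simply Property 2 applied to the carry vector: doubling modulo $2^k-1$ is a one-bit circular left shift.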
Fig. 3. The converter for moduli set $\{2^n-1, 2^n, 2^n+1, 2^{2n+1}-1\}$

6. Reverse converter for the moduli set $\{2^n-1, 2^n+1, 2^{2n}, 2^{2n+1}-1\}$

The moduli set $\{2^n-1, 2^n, 2^n+1, 2^{2n+1}-1\}$ reduces the total delay of the RNS arithmetic unit compared with the moduli sets $\{2^n-1, 2^n, 2^n+1, 2^{2n}+1\}$ and $\{2^n-1, 2^n+1, 2^{2n}, 2^{2n}+1\}$. However, the inter-channel delay of the modulus $2^{2n+1}-1$ is still larger than that of the other three moduli, i.e. $2^n-1$, $2^n$ and $2^n+1$. For this reason, the moduli set $\{2^n-1, 2^n+1, 2^{2n}, 2^{2n+1}-1\}$ has recently been proposed. The main advantage of this set is that it retains all the merits of the moduli set $\{2^n-1, 2^n, 2^n+1, 2^{2n+1}-1\}$ while providing a larger dynamic range (6n-bit), because enlarging the modulus $2^n$ to $2^{2n}$ does not increase the complexity of the reverse converter. The converter for this set has a two-level architecture. In other words, it uses a combined conversion algorithm consisting of both CRT and MRC. First, the previous CRT-based design of the reverse converter for the subset $\{2^{2n}, 2^n-1, 2^n+1\}$ (Hiasat & Sweidan, 2004) is used to obtain the weighted equivalent of the residues $(x_1, x_2, x_3)$.
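The two-level idea can be illustrated in software as a CRT step on the three-moduli subset followed by a two-modulus MRC step that brings in the fourth residue. The sketch below shows only this general scheme; it does not reproduce the simplified hardware equations of the cited converter, and the function names and the residue order $(2^{2n}, 2^n-1, 2^n+1, 2^{2n+1}-1)$ are assumptions of this example.

```python
from math import prod

def crt(residues, moduli):
    """Plain CRT, used here for the first level on the three-moduli subset."""
    M = prod(moduli)
    return sum(((x * pow(M // p, -1, p)) % p) * (M // p)
               for x, p in zip(residues, moduli)) % M

def two_level_convert(x, n):
    """Level 1: CRT on {2^2n, 2^n - 1, 2^n + 1}; level 2: MRC with 2^(2n+1) - 1."""
    sub = (2**(2 * n), 2**n - 1, 2**n + 1)
    p4 = 2**(2 * n + 1) - 1
    x1, x2, x3, x4 = x
    Z = crt((x1, x2, x3), sub)                # weighted value modulo 2^2n * (2^2n - 1)
    m3 = prod(sub)
    v = ((x4 - Z) * pow(m3, -1, p4)) % p4     # second mixed-radix digit
    return Z + m3 * v

n = 2
moduli = (2**(2 * n), 2**n - 1, 2**n + 1, 2**(2 * n + 1) - 1)   # (16, 3, 5, 31)
X = 5000
x = tuple(X % p for p in moduli)
print(two_level_convert(x, n))
```

Splitting the conversion this way keeps the first level identical to an existing three-moduli converter, so only the second level has to be designed for the new modulus.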
The hardware implementation of this converter relies on two modulo adders for the realization of (61) and (67). More precisely, (61) needs two 2n-bit CSAs with EAC and a 2n-bit CPA with EAC, and a (2n+1)-bit CPA with EAC is used to realize (67). In addition, (66) requires only one (4n+1)-bit regular binary adder; all the required multiplications can be done using shift and concatenation. The converter is depicted in Fig. 4.

Fig. 4. The converter for moduli set $\{2^n-1, 2^n+1, 2^{2n}, 2^{2n+1}-1\}$

7. Complexity comparison

Table 1 presents the total hardware requirements and conversion delays of the reverse converters for the large dynamic range four-moduli sets in terms of logic gates and FAs. Note that $A_{FA}$ and $D_{FA}$ indicate the area and delay of one FA, respectively. It can be seen that the fastest converter is the one for the moduli set $\{2^n-1, 2^n+1, 2^{2n}, 2^{2n}+1\}$. Its conversion delay has the same form as that of the converter for the moduli set $\{2^n-1, 2^n, 2^n+1, 2^{2n}+1\}$, but its dynamic range is 6n-bit while the dynamic range of the latter is 5n-bit. Therefore, to provide the same dynamic range, the value of n for the first set is smaller than for the second set. Furthermore, the reverse converter for the moduli set $\{2^n-1, 2^n, 2^n+1, 2^{2n+1}-1\}$ requires less hardware than the others. On the other hand, the moduli sets $\{2^n-1, 2^n, 2^n+1, 2^{2n+1}-1\}$ and $\{2^n-1, 2^n+1, 2^{2n}, 2^{2n+1}-1\}$ result in faster RNS arithmetic units than the moduli sets $\{2^n-1, 2^n, 2^n+1, 2^{2n}+1\}$ and $\{2^n-1, 2^n+1, 2^{2n}, 2^{2n}+1\}$.

| Moduli set | Hardware requirements | Conversion delay |
|---|---|---|
| $\{2^n-1, 2^n, 2^n+1, 2^{2n+1}-1\}$ | $(8n+2)A_{FA} + (n-1)A_{XOR} + (n-1)A_{AND} + (4n+1)A_{XNOR} + (4n+1)A_{OR} + (7n+1)A_{NOT} + (n)A_{MUX2\times1}$ | $(12n+5)D_{FA} + 3D_{NOT} + D_{MUX}$ |
| $\{2^n-1, 2^n+1, 2^{2n}, 2^{2n+1}-1\}$ | $(10n+3)A_{FA} + (n+1)A_{XOR} + (n+1)A_{AND} + (3n-1)A_{XNOR} + (3n-1)A_{OR} + (7n+3)A_{NOT}$ | $(12n+6)D_{FA} + 2D_{NOT}$ |
| $\{2^n-1, 2^n, 2^n+1, 2^{2n}+1\}$ | $(11n+6)A_{FA} + (2n-1)A_{XOR} + (2n-1)A_{AND} + (4n)A_{XNOR} + (4n)A_{OR} + (5n+3)A_{NOT}$ | $(8n+3)D_{FA} + D_{NOT}$ |
| $\{2^n-1, 2^n+1, 2^{2n}, 2^{2n}+1\}$ | $(10n+6)A_{FA} + (4n-3)A_{XOR} + (4n-3)A_{AND} + (2n-3)A_{XNOR} + (2n-3)A_{OR} + (6n+3)A_{NOT}$ | $(8n+3)D_{FA} + D_{NOT}$ |

Table 1. Hardware requirements and conversion delays of the reverse converters for the large dynamic range four-moduli sets
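To make the comparison concrete, the FA terms of Table 1 can be evaluated for a common target dynamic range. The Python sketch below is a rough aid only and rests on several assumptions: it uses the nominal dynamic ranges stated in the text (5n or 6n bits), counts only the $A_{FA}$ and $D_{FA}$ terms while ignoring the gate, NOT and MUX terms, and the dictionary layout and function name `compare` are hypothetical.

```python
from math import ceil

# name: (nominal bits per n, (a, b) for FA area a*n + b, (a, b) for FA delay a*n + b)
CONVERTERS = {
    "{2^n-1, 2^n, 2^n+1, 2^(2n+1)-1}":    (5, (8, 2),  (12, 5)),
    "{2^n-1, 2^n+1, 2^(2n), 2^(2n+1)-1}": (6, (10, 3), (12, 6)),
    "{2^n-1, 2^n, 2^n+1, 2^(2n)+1}":      (5, (11, 6), (8, 3)),
    "{2^n-1, 2^n+1, 2^(2n), 2^(2n)+1}":   (6, (10, 6), (8, 3)),
}

def compare(target_bits):
    """Return {name: (n, FA count, delay in units of D_FA)} for a target range."""
    rows = {}
    for name, (bits_per_n, area, delay) in CONVERTERS.items():
        n = ceil(target_bits / bits_per_n)       # smallest n reaching the target
        rows[name] = (n, area[0] * n + area[1], delay[0] * n + delay[1])
    return rows

for name, (n, fa_count, fa_delay) in compare(60).items():
    print(f"{name:40s} n={n:2d}  FAs={fa_count:3d}  delay={fa_delay:3d} D_FA")
```

For a 60-bit target, this evaluation gives the smallest FA delay for $\{2^n-1, 2^n+1, 2^{2n}, 2^{2n}+1\}$ and the smallest FA count for $\{2^n-1, 2^n, 2^n+1, 2^{2n+1}-1\}$, in line with the observations drawn from Table 1.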
8. Conclusion
The Residue Number System has been recognized as one of the efficient alternative number systems that can be used for high-speed hardware implementation of Digital Signal Processing algorithms. However, forward and reverse converters are needed as interfaces between RNS and conventional binary digital systems. The overhead of these converters can offset the speed advantage of RNS, and for this reason a lot of research has been done to design efficient reverse converters. This chapter presented a study of the state-of-the-art reverse converters that have been designed for the recently introduced large dynamic range RNS four-moduli sets. We provided an overview of the different reverse conversion algorithms, the recent four-moduli sets, and the reverse converter architectures.
