Adder Based Residue to Binary Number Converters for $(2^n - 1, 2^n, 2^n + 1)$
Yuke Wang, Xiaoyu Song, Mostapha Aboulhamid, Member, IEEE, and Hong Shen
Abstract: Based on an algorithm derived from the New Chinese Remainder Theorem I, we present three new residue-to-binary converters for the residue number system $(2^n - 1, 2^n, 2^n + 1)$ designed using $2n$-bit or $n$-bit adders, with improvements in speed, area, or dynamic range compared with various previous converters. The $2n$-bit adder-based converter is faster and requires about half the hardware required by previous methods. For $n$-bit adder-based implementations, one new converter is twice as fast as the previous method using a similar amount of hardware, whereas another new converter achieves improvement in either speed, area, or dynamic range compared with previous converters.
Index Terms: Adders, algorithm, arithmetic, circuit, residue number system.
I. INTRODUCTION
There has been interest in residue number system (RNS) arithmetic as a basis for computational hardware since the 1950s [1], [2]. During the past decade, the RNS has received considerable attention in arithmetic computation and signal processing applications, such as fast Fourier transforms, digital filtering, and image processing [2], [3]. The main reasons for this interest are the inherent properties of the RNS, such as parallelism, modularity, fault tolerance, and carry-free operations [3]. The technological advantages offered by VLSI have added a new dimension to the implementation of RNS-based architectures. Several high-speed VLSI special-purpose digital signal processors have been successfully implemented.
The two most important issues for residue arithmetic are the choice of moduli sets and the conversion of residue numbers to binary numbers. The residue number system based on the moduli set $(2^n - 1, 2^n, 2^n + 1)$ has gained popularity and is expected to play an increasing role in RNS digital signal processing [5]. For general moduli sets, residue-to-binary conversion is traditionally based on the Chinese Remainder Theorem (CRT) or mixed-radix conversion. Some new general conversion algorithms, called New Chinese Remainder Theorems, have recently been proposed with smaller modulo operations [13], [14]. Several conversion methods for $(2^n - 1, 2^n, 2^n + 1)$ have been reported [6]-[11], [15]-[18]. Early converters [17] for such moduli sets use ROM, which can be limited by the size $n$. In recent years, converters using $2n$-bit or $n$-bit adders have been proposed. These converters are designed using special formulas rather than the general CRT algorithm, and improvement in terms of hardware complexity has been reported. Detailed comparisons of all those converters are presented in Tables I and II.

In this paper, for the moduli set $(2^n - 1, 2^n, 2^n + 1)$, we present new and uniform algorithms designed using the New Chinese Remainder Theorems for RNS-to-binary conversion. Three different converters using either $2n$-bit or $n$-bit adders are proposed. The $2n$-bit adder-based converter is faster and requires about half the hardware required by the previous methods [7]-[9]. For $n$-bit adder-based implementations, one new converter is twice as fast as the previous method [6] using a similar amount of hardware, whereas another new converter achieves improvement in both speed and area. For the $n$-bit adder-based converter, the amount of hardware is similar to that of the converter in [9]. However, the converter in [9] does not use the entire dynamic range.
In the following, we first introduce background material and derive the formulas; then, we show an example and propose three different hardware implementations.

II. BACKGROUND

For a set of pairwise relatively prime moduli $P_1, P_2, \ldots, P_k$ and a residue number $(x_1, x_2, \ldots, x_k)$, the CRT computes the binary number as $X = \left| \sum_{i=1}^{k} N_i \left| N_i^{-1} \right|_{P_i} x_i \right|_M$, where $M = \prod_{i=1}^{k} P_i$, $N_i = M / P_i$, and $\left| N_i^{-1} \right|_{P_i}$ is the multiplicative inverse of $N_i$ modulo $P_i$. The CRT requires a modulo $M$ (large-valued) operation, which is not very efficient. Therefore, the converters proposed in [6]-[11], [15], [16], and [18] use specially designed algorithms to remove the modulo operation or to reduce its size. For example, the converters in [6] and [14] are based on a formula whose coefficients must be computed by additional methods. In [7], [9], and [15], the converters are based on a common formula, and each paper needs its own method for computing one of its terms. In [7], that term is calculated from four numbers obtained from the residues. On the other hand, the third formula in [15] reduces the size of the modulo operation, at the expense that some part of the dynamic range is not usable. Recently, some alternative general conversion algorithms, the New Chinese Remainder Theorems (New CRT-I, II, and III) [13], [14], have been proposed, which reduce the size of the modulo operation required by the CRT.
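As a point of reference for the CRT discussed in this section, the classical conversion can be sketched in Python as follows. The names `crt`, `residues`, and `moduli` are illustrative assumptions, not notation from the paper. The point is only that the final reduction is taken modulo the full product of the moduli, which is the large modulo operation that the specialized converters try to avoid.

```python
from math import prod


def crt(residues, moduli):
    """Classical CRT for pairwise relatively prime moduli (illustrative sketch)."""
    big_m = prod(moduli)                      # M = P_1 * P_2 * ... * P_k
    total = 0
    for x_i, p_i in zip(residues, moduli):
        n_i = big_m // p_i                    # N_i = M / P_i
        inv = pow(n_i, -1, p_i)               # inverse of N_i mod P_i (Python 3.8+)
        total += n_i * inv * x_i
    return total % big_m                      # the large modulo-M reduction


if __name__ == "__main__":
    # Residues of 407 for the moduli set (7, 8, 9), i.e. n = 3.
    print(crt((1, 7, 2), (7, 8, 9)))
```

For the moduli set (7, 8, 9), the reduction is modulo 504, whereas the converters discussed in this paper are organized around smaller modulo operations.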
New Chinese Remainder Theorem I (New CRT-I):
Given the residue number, the binary number $X$ can be computed by (4), which can be easily simplified as (5). Based on the New CRT-I, we have the following theorem for three moduli.

Theorem 1: For a three moduli set $(P_1, P_2, P_3)$ with residue number $(x_1, x_2, x_3)$, the binary number $X$ can be calculated as

$X = x_1 + P_1 \left| k_1 (x_2 - x_1) + k_2 P_2 (x_3 - x_2) \right|_{P_2 P_3}$    (6)

where $k_1 P_1 \equiv 1 \pmod{P_2 P_3}$ and $k_2 P_1 P_2 \equiv 1 \pmod{P_3}$.

In Section III, we apply (6) to the moduli set $(2^n - 1, 2^n, 2^n + 1)$ to design the residue-to-binary converters.
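A three-moduli conversion in the style of the New CRT-I can be sketched as below. This is a minimal illustration under assumptions: it uses the form $X = x_1 + P_1 \left| k_1 (x_2 - x_1) + k_2 P_2 (x_3 - x_2) \right|_{P_2 P_3}$, and the function name `new_crt1_three` and its argument names are hypothetical. The sketch shows that the only modulo operation has size $P_2 P_3$ rather than $P_1 P_2 P_3$.

```python
def new_crt1_three(x1, x2, x3, p1, p2, p3):
    """Three-moduli residue-to-binary conversion (illustrative sketch).

    Assumes p1, p2, p3 are pairwise relatively prime and x_i = X mod p_i.
    """
    k1 = pow(p1, -1, p2 * p3)        # k1 * p1 == 1 (mod p2 * p3)
    k2 = pow(p1 * p2, -1, p3)        # k2 * p1 * p2 == 1 (mod p3)
    # Python's % returns a non-negative result even for negative operands.
    y = (k1 * (x2 - x1) + k2 * p2 * (x3 - x2)) % (p2 * p3)
    return x1 + p1 * y               # no modulo P1*P2*P3 is needed


if __name__ == "__main__":
    # 407 has residues 7 (mod 8), 2 (mod 9), 1 (mod 7).
    print(new_crt1_three(7, 2, 1, 8, 9, 7))
```

The ordering of the moduli in the call is a free choice. Placing the power-of-two modulus first makes the final multiplication by `p1` a plain bit shift in hardware.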
III. BASIC FORMULAS
The following Theorem 2 is a direct application of Theorem 1.
Theorem 2: For the moduli set $(2^n - 1, 2^n, 2^n + 1)$, the number $X$ can be computed from its residues by the formulas (8.1)-(8.4).

Next, we present an example using the above formulas.

Example: Consider the example shown in [6]. Let $n = 3$ and consider the number 407, which is represented as (1, 7, 2) in the moduli set (7, 8, 9). Now, given (1, 7, 2) = (001, 111, 0010), the formulas yield the binary number directly. Compared with the long calculation on [6, p. 56], the above process is much simpler.
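The example can be checked by hand with a three-moduli formula of the form $X = x_1 + P_1 \left| k_1 (x_2 - x_1) + k_2 P_2 (x_3 - x_2) \right|_{P_2 P_3}$, where $k_1 P_1 \equiv 1 \pmod{P_2 P_3}$ and $k_2 P_1 P_2 \equiv 1 \pmod{P_3}$. The following worked computation is a reconstruction under assumptions rather than the paper's own displayed equation: it assumes the ordering $P_1 = 2^n$, $P_2 = 2^n + 1$, $P_3 = 2^n - 1$, and the symbol $Y$ for the intermediate value is a hypothetical name.

```latex
% Assumed ordering: P_1 = 2^n = 8, P_2 = 2^n + 1 = 9, P_3 = 2^n - 1 = 7,
% so (x_1, x_2, x_3) = (7, 2, 1) for the residue number (1, 7, 2) of 407.
\begin{align*}
k_1 &= 2^n = 8, & 8 \cdot 8 = 64 &\equiv 1 \pmod{63},\\
k_2 &= 2^{n-1} = 4, & 4 \cdot 8 \cdot 9 = 288 &\equiv 1 \pmod{7},\\
Y &= \left| 8\,(2 - 7) + 4 \cdot 9\,(1 - 2) \right|_{63} = \left| -76 \right|_{63} = 50,\\
X &= x_1 + 2^n Y = 7 + 8 \cdot 50 = 407.
\end{align*}
```

Under this ordering, both multipliers are powers of two and the only modulo operation is modulo $2^{2n} - 1 = 63$.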
IV. NEW CONVERTERS
In Section III, we presented the necessary formulas for residue-to-binary conversion. In this section, we propose new converters using $2n$-bit or $n$-bit adders based on the formulas (8.1)-(8.4).

A. Basic Operations to Compute and

The addition of is shown in Fig. 1(a) and (b). Fig. 1(b) shows the block diagram of the nFA1 unit. It consists of FAs, two MUXs, one XOR gate, and inverters. The delay of this unit is the delay of an FA plus the delay of an inverter and the delay of a MUX. The circuit produces two numbers and . We denote and , and then, .

Next, we perform the addition using FAs. The signal is connected to the carry-in bit of the last FA in Fig. 2(a). Fig. 2(b) shows the block diagram of the nFA2 unit. It consists of inverters, one HA, and FAs. The delay of this unit (nFA2) is the delay of an FA plus the delay of an inverter. The circuit produces two numbers , . We denote and , and then, .

Therefore, , as defined in (8.2), now becomes (11), where , , , and are all $n$-bit numbers; is a one-bit number.

The addition in (11) can be done in many different ways using $2n$-bit or $n$-bit adders. These different implementations are shown in the following.
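The units described above rely on inverters and plain wiring rather than multipliers. Two general properties of arithmetic modulo $2^k - 1$ make this possible, and they can be sketched as follows. This is an illustration of the general properties only, not a model of the nFA1 or nFA2 circuits. It assumes the sum in (11) is taken modulo $2^{2n} - 1$, as the later use of a 1's complement adder suggests, and the function names are hypothetical.

```python
def neg_mod(x, k):
    """Return a k-bit value congruent to -x modulo 2**k - 1.

    Bitwise inversion of a k-bit number gives (2**k - 1) - x, so negation
    costs only inverters. Note that x = 0 maps to the all-ones pattern,
    which also represents zero modulo 2**k - 1.
    """
    return x ^ ((1 << k) - 1)


def mul_pow2_mod(x, j, k):
    """Return x * 2**j modulo 2**k - 1 as a left rotation by j bits.

    Because 2**k == 1 (mod 2**k - 1), bits shifted out at the top wrap
    around to the bottom, so the multiplication is pure wiring.
    """
    mask = (1 << k) - 1
    j %= k
    return ((x << j) | (x >> (k - j))) & mask


if __name__ == "__main__":
    k = 6                                  # 2n bits with n = 3, modulus 63
    print(neg_mod(5, k) % 63, (-5) % 63)
    print(mul_pow2_mod(45, 3, k), (45 * 8) % 63)
```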
B. $2n$-Bit Adder-Based Converter: Converter I
In the following, we present the new Converter I, which implements the addition in (11) using a $2n$-bit adder.
where , and are two $2n$-bit numbers, and is a one-bit number. In Fig. 3(a), the units nFA1 and nFA2, which are used to produce , , , and , are connected to a $2n$-bit 1's complement adder. The $2n$-bit adder produces the value , which forms the MSBs of the number , whereas forms the LSBs of .
The hardware required in the new Converter I shown in Fig. 3(a) is as follows: FAs, one HA, two MUXs, one XOR gate, inverters, and one $2n$-bit 1's complement adder. The delay of the converter is the sum of the delay of an FA, the delay of an inverter, the delay of a MUX, and the delay of the $2n$-bit 1's complement adder [7], i.e., .

In the literature, one of the best converters using $2n$-bit adders is presented in [7]. To compare the performance, we show the main components used in the converter proposed in [7] in Fig. 3(b). The delay in [7] is . For simplicity, we compare only one version of the implementation in [7]. The second implementation has the same result. From the side-by-side comparison, it is easy to see that we save one $2n$-bit CSA with end-around carry (EAC).
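The behavior of a 1's complement adder, which is an adder with end-around carry, can be sketched as follows. This is a behavioral model only. It says nothing about the gate-level structure or delay of the adder used in Converter I, and the function name `ones_complement_add` is a hypothetical choice.

```python
def ones_complement_add(a, b, k):
    """Add two k-bit values with end-around carry (behavioral sketch).

    The carry out of bit k-1 has weight 2**k, which is congruent to 1
    modulo 2**k - 1, so it is fed back into the least significant bit.
    The result fits in k bits; the all-ones pattern represents zero.
    """
    mask = (1 << k) - 1
    s = a + b
    # One fold is enough: if a carry occurred, the low k bits are at most
    # 2**k - 2, so adding the carry back cannot overflow again.
    return (s & mask) + (s >> k)


if __name__ == "__main__":
    k = 6                                   # 2n bits with n = 3
    print(ones_complement_add(40, 30, k))   # 70 mod 63
```

For $k = 2n$, this gives addition modulo $2^{2n} - 1$ with no separate modulo circuit.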
Detailed comparisons with the other related converters are summarized in Table I, where the data for [8], [9], and [11] are from [7, Table I]. In summary, Converter I is the best converter using $2n$-bit adders, using about half of the hardware used in [7]. The reason for this improvement is that the converters in [8], [9], [11], and [18] use a formula that combines four numbers obtained from the residues, whereas the new Converter I is derived from the New Chinese Remainder Theorem I, which reduces the four-number operation to a two-number operation.
C. $n$-Bit Adder-Based Converters: Converter II and III
The addition in (11) can also be done by $n$-bit adders. In this section, we propose two such converters. Their performance is compared with that of the converters in [6], [15], and [18], which use $n$-bit adders as well. Since we can only generate $n$-bit numbers using $n$-bit adders, we obtain the value in a form composed of two $n$-bit binary numbers.
Recall that (11) expresses the sum in terms of four $n$-bit numbers and a one-bit number. Using an $n$-bit adder, we can add two of the $n$-bit numbers together with the one-bit number, which generates a sum and a carry. Similarly, we can add the other two $n$-bit numbers using an $n$-bit adder, which generates a second sum and a second carry. Since the addition is modulo addition, a carry represents a number that must still be added into the result. When a carry is 0, the corresponding sum needs no correction. However, when the carries are not 0, the sums must be modified to obtain the correct values. In the following, we propose Converter II and Converter III. The selector implements the two correction functions. Note that the selector does not introduce any extra delay since CLAs are used: the carries are generated during the carry-generation phase of the CLAs and are available to the selector while the CLAs perform the summation.
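One way to picture how $n$-bit adders and a selector can stand in for a $2n$-bit end-around-carry adder is the carry-select sketch below. It is a generic illustration under assumptions, not the circuit of Fig. 4: the function name, the variable names, and the exact selection logic are hypothetical. Four independent $n$-bit additions are computed (each half with carry-in 0 and with carry-in 1), and the four carries alone decide which sums are kept.

```python
def add_mod_split(a, b, n):
    """Add two 2n-bit values modulo 2**(2n) - 1 using only n-bit additions.

    Assumes 0 <= a, b <= 2**(2n) - 2 (canonical residues).
    """
    mask = (1 << n) - 1
    a_hi, a_lo = a >> n, a & mask
    b_hi, b_lo = b >> n, b & mask

    # Four n-bit additions that do not depend on each other.
    lo0 = a_lo + b_lo          # low half, carry-in 0
    lo1 = a_lo + b_lo + 1      # low half, carry-in 1
    hi0 = a_hi + b_hi          # high half, carry-in 0
    hi1 = a_hi + b_hi + 1      # high half, carry-in 1

    # The four carries, available before the sums are selected.
    cl0, cl1 = lo0 >> n, lo1 >> n
    ch0, ch1 = hi0 >> n, hi1 >> n

    # Selector: the carry into the low half is the carry out of the high
    # half (end-around), and the carry into the high half is the carry out
    # of the low half.
    cin_lo = ch0 | (ch1 & cl1)
    cin_hi = cl1 if cin_lo else cl0

    lo = (lo1 if cin_lo else lo0) & mask
    hi = (hi1 if cin_hi else hi0) & mask
    return (hi << n) | lo


if __name__ == "__main__":
    print(add_mod_split(40, 30, 3))   # (40 + 30) mod 63
```

Because the selection depends only on carries, it can overlap with the summation phase of the adders, which is consistent with the remark that the selector adds no extra delay.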
The hardware required in Fig. 4 includes FAs, one HA, MUXs, one XOR gate, inverters (including four inverters for the selector), two AND gates for the selector, and $n$-bit CLAs. The delay of the converter is .

Converter III: Considering the fact that and , we can replace CLA2 and CLA4 in Fig. 4 by other combinational circuits that perform the operation and . Fig. 5 shows such a converter. The circuit plus1 performs the function of adding 1 to a -bit input number. Consider , . We have the following equations, which imply that the circuit plus1 requires XOR gates and AND gates plus 1 inverter.
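An add-one circuit built from a chain of XOR and AND gates plus a single inverter can be sketched as follows. This is a generic incrementer under assumptions, not necessarily the exact plus1 circuit of Fig. 5, and the helper names `to_bits`, `from_bits`, and `plus1_bits` are hypothetical. The gate counts stated in the text are not reproduced here.

```python
def to_bits(x, width):
    """Return the bits of x, least significant bit first."""
    return [(x >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Inverse of to_bits."""
    return sum(b << i for i, b in enumerate(bits))


def plus1_bits(bits):
    """Add 1 to a bit vector (LSB first); return (sum_bits, carry_out).

    Bit 0 is simply inverted. Every later bit is XORed with the running
    carry, and the carry is propagated with one AND gate per stage.
    """
    out = [1 - bits[0]]            # one inverter
    carry = bits[0]                # carry into bit 1
    for b in bits[1:]:
        out.append(b ^ carry)      # one XOR gate per remaining bit
        carry = b & carry          # one AND gate per remaining bit
    return out, carry


if __name__ == "__main__":
    s, c = plus1_bits(to_bits(11, 4))
    print(from_bits(s), c)
```

Such a chain is much smaller than a full adder row, which is the motivation for replacing two of the CLAs by add-one circuits.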
The hardware required in Fig. 5 includes FAs, MUXs, XOR gates, inverters (including four inverters for the selector and two for the plus-1 circuit), AND gates for the selector and the plus-1 circuit, one HA, and $n$-bit CLAs. The delay of the converter is .

To make a clear comparison, Fig. 6 shows the main components of the converter proposed in [6]. No detailed implementation is given for each module in [6]. We evaluate the performance based on [4]. The results in [4] were also recently used to evaluate the performance in [7]. Modules M1 and M2 require two CLAs and one CSA, all of which are $n$-bit adders, one XOR for generating C1, and inverters for the 2's complement operation. M3 and M4 require two additional CPAs and inverters for the 2's complement operation. Module M6 uses nine AND gates, one OR gate, eight inverters, and one XOR gate. M5 uses bit memory to store the value. The delay is .

Two more recent converters using $n$-bit adders have also been proposed in [15] and [18]. The one in [18] is based on the approach in [7] and, therefore, has high hardware cost, whereas the one in [15] has a hardware cost similar to that of the new converters proposed here. However, the converter in [15] has some unused dynamic range. Table II summarizes the comparison of the two converters proposed in this paper with the converters in [6], [15], and [18]. The delay of Converter II is almost half of the delay of the converter in [6].
Assume a straightforward implementation of the CLA consisting of a carry look-ahead unit and a summation unit, which, in total, require AND gates, XOR gates, and OR gates. Then the hardware requirement in [6] is even higher than that of Converter III, and its delay is longer.
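The split of a CLA into a carry look-ahead unit and a summation unit can be sketched behaviorally as follows. This is a textbook-style model under assumptions, not the gate-level design evaluated in the comparison, and the function name `cla_add` is hypothetical. In hardware, the carry recurrence is expanded so that all carries are produced in parallel, whereas this sketch evaluates it in a loop.

```python
def cla_add(a, b, n, cin=0):
    """Add two n-bit values; return (n-bit sum, carry_out)."""
    # Generate and propagate signals: AND and XOR gates.
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]

    # Carry look-ahead unit: c[i+1] = g[i] OR (p[i] AND c[i]).
    c = [cin]
    for i in range(n):
        c.append(g[i] | (p[i] & c[i]))

    # Summation unit: s[i] = p[i] XOR c[i].
    s = 0
    for i in range(n):
        s |= (p[i] ^ c[i]) << i
    return s, c[n]


if __name__ == "__main__":
    print(cla_add(11, 6, 4))
```

The carries depend only on the generate and propagate signals, so they are available before the summation unit finishes.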
V. CONCLUSION
Three different residue-to-binary converters for the special moduli set $(2^n - 1, 2^n, 2^n + 1)$ have been presented in this paper. Compared with various previously proposed converters, the new converters have better performance in terms of speed and area. The new converters are designed based on the recently introduced New Chinese Remainder Theorems. It is expected that, for other moduli sets, the New Chinese Remainder Theorems will also improve the design of residue-to-binary converters.
