Residue number systems are popular in high-performance arithmetic applications such as digital signal processing because of their carry-free nature, modularity and error-correcting properties. These advantages, however, are eclipsed by the high area and time requirements of reverse conversion. In this regard, we present two new techniques for residue-to-weighted number system conversion. The first is based on the Chinese Remainder Theorem (CRT): the number is decoded by evaluating the quotient. The second deals with residue-to-mixed radix conversion (MRC). An arithmetic-based technique replaces the conventional hardware-intensive look-up tables with simple adders. A high-speed MRC based on the one-hot residue (OHR) number system is also presented. The mixed radix converters described are memoryless and hardware efficient compared to conventional techniques.
INTRODUCTION
The residue number system (RNS) is popular in high-performance arithmetic applications such as Digital Signal Processing (DSP) systems because of its carry-free nature, modular structure and error-correcting properties [1,2]. It simplifies computation on large numbers by decomposing it into a set of parallel, independent computations on residues generated with respect to a convenient moduli set. Even though RNS is capable of high-speed arithmetic, it was not popular in general computing because of the difficulties involved in division, magnitude comparison, and forward and reverse conversion between binary and residue numbers. Recent work in RNS has provided a number of techniques to reduce the complexity of Residue-to-Binary Converters (RBC).
In RNS, an integer $X$ in the interval $[0, M)$ can be represented as a set of $r$ residues $(x_1, x_2, \ldots, x_r)$ with respect to a moduli set of $r$ pairwise relatively prime integers $(m_1, m_2, \ldots, m_r)$, where $x_i = X \bmod m_i$ and $M = \prod_{i=1}^{r} m_i$.
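As a small illustration of this representation, the following Python sketch computes the residue digits of an integer. The moduli set (3, 5, 7) and the function name `to_residues` are chosen here only for illustration and are not taken from the paper.

```python
from math import prod

def to_residues(x, moduli):
    """Return the residue digits x mod m_i, one per modulus m_i."""
    return tuple(x % m for m in moduli)

# Illustrative pairwise relatively prime moduli (an assumption, not from the paper).
moduli = (3, 5, 7)
M = prod(moduli)                      # integers in [0, M) have unique residue sets
residues = to_residues(52, moduli)    # 52 mod 3, 52 mod 5, 52 mod 7
```

Because the moduli are pairwise relatively prime, every integer in [0, M) maps to a distinct residue tuple, which is what makes reverse conversion possible.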
Arithmetic operations are computed by the formula:
$(x_1, x_2, \ldots, x_r) \otimes (y_1, y_2, \ldots, y_r) = (z_1, z_2, \ldots, z_r)$, where $z_i = |x_i \otimes y_i|_{m_i}$ and $\otimes$ denotes one of the operations of addition, subtraction or multiplication. Thus arithmetic operations on residues can be performed in parallel without any carry propagation among the residue digits. However, before any operation can be performed on residues, the number must first be converted from binary to residue form.

In the table-based MRC of Figure 1, the stored values are accessed in a serial fashion [12]. The main drawback of this method is that it is hardware intensive: as the number of bits per modulus increases, the hardware grows exponentially.

Figure 1. Table Based MRC

In this paper, efficient methods for reducing the hardware of converters based on CRT and MRC techniques are presented. Section 2 introduces the proposed converter, followed by a performance evaluation. In Section 3 the new memoryless residue-to-mixed radix converters are introduced and compared with the traditional look-up table approach. Finally, Section 4 offers conclusions.
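The digit-by-digit arithmetic rule given earlier in this section can be sketched in Python as follows. The helper name `rns_op` and the moduli set (3, 5, 7) are illustrative assumptions; the point is only that each residue channel is computed independently of the others.

```python
import operator

def rns_op(xs, ys, moduli, op):
    """Apply op to each pair of residue digits, reducing modulo m_i.

    Every channel is independent, so no carry passes between digits.
    """
    return tuple(op(x, y) % m for x, y, m in zip(xs, ys, moduli))

moduli = (3, 5, 7)                       # illustrative moduli, M = 105
a = tuple(17 % m for m in moduli)        # residues of 17
b = tuple(4 % m for m in moduli)         # residues of 4

total = rns_op(a, b, moduli, operator.add)      # residues of 17 + 4
diff = rns_op(a, b, moduli, operator.sub)       # residues of 17 - 4
product = rns_op(a, b, moduli, operator.mul)    # residues of 17 * 4
```

The results are valid as long as the true result lies in [0, M); here 21, 13 and 68 are all below 105.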
DECODING USING QUOTIENT EVALUATION
One of the major problems with the above CRT implementation is its hardware complexity. To reduce this complexity we propose the following modification: only the quotient is evaluated, so that less hardware is always required. The only requirement we impose is that the moduli set contains the modulus $2^n$, where $n$ is an integer.
The reduction in the size of the ROM table is based on the following theorem. Figure 2 shows the general scheme of the modified residue-to-binary converter. Functionally, the proposed architecture is similar to the converter given in [11]. In our approach the width of the ROMs that perform the look-up operation is reduced by $n$ bits. This is achieved by storing the contents with respect to modulus $M_i$ instead of $M$. This amounts to an overall reduction of $2^n \times n \times r$ bits of ROM for the entire design. Furthermore, this reduces the size of the CSA tree and the accompanying CSA, CPA and MUX stages in comparison to the design given in [11].
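The theorem itself is not reproduced in this text, so the following Python sketch should be read only as one standard way in which decoding through the quotient can work, not as the authors' exact formulation. It assumes the last modulus in the set is $2^n$ and writes $X = Q \cdot 2^n + x_r$, where $x_r$ is the residue for $2^n$. Only $Q$ is reconstructed, modulo the product of the remaining moduli, which is $n$ bits narrower than $M$. The function name `decode_via_quotient` and the moduli set (7, 9, 8) are hypothetical.

```python
from math import prod

def decode_via_quotient(residues, moduli, n):
    """Decode X from its residues, assuming the LAST modulus is 2**n.

    X = Q * 2**n + x_r, so only the quotient Q has to be reconstructed;
    the low n bits of X are the residue x_r itself.
    """
    *other_moduli, power = moduli
    if power != 2 ** n:
        raise ValueError("last modulus must be 2**n")
    *other_residues, x_r = residues
    M_q = prod(other_moduli)                 # range of the quotient Q
    q = 0
    for x_i, m_i in zip(other_residues, other_moduli):
        # Residue of Q for m_i: (x_i - x_r) / 2**n  (mod m_i)
        q_i = (x_i - x_r) * pow(power, -1, m_i) % m_i
        # Ordinary CRT term, taken over M_q instead of the full M
        M_hat = M_q // m_i
        q += M_hat * (q_i * pow(M_hat, -1, m_i) % m_i)
    q %= M_q
    return (q << n) | x_r                    # concatenate Q with the n low bits

moduli = (7, 9, 8)                           # illustrative set containing 2**3
x = 300
decoded = decode_via_quotient(tuple(x % m for m in moduli), moduli, 3)
```

In this sketch the final step is a bit concatenation rather than an addition, which reflects why working with the quotient narrows every stored or summed value by $n$ bits.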
Performance Evaluation

MEMORYLESS MRC
Mixed radix converters proposed so far are based on look-up tables [12]. In this section we propose a new approach in which all look-up tables of previous implementations are replaced by arithmetic units. The basic idea is that instead of precomputing the values corresponding to the coefficients, we compute them in real time. The method can be explained as follows. In a typical look-up table implementation, the function of each block is to perform a modulo subtraction and a modulo multiplication by a constant.
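To make the role of these blocks concrete, here is a minimal Python sketch of the standard mixed radix conversion algorithm, in which every inner step is exactly one modulo subtraction followed by a modulo multiplication by a constant. The function names and the moduli set (3, 5, 7) are illustrative assumptions; the constants are the multiplicative inverses of one modulus with respect to another, which are fixed once the moduli set is chosen.

```python
def mixed_radix_digits(residues, moduli):
    """Return digits a_1..a_r with X = a_1 + a_2*m_1 + a_3*m_1*m_2 + ..."""
    work = list(residues)
    digits = []
    for i, m_i in enumerate(moduli):
        a_i = work[i]                        # next mixed radix digit
        digits.append(a_i)
        for j in range(i + 1, len(moduli)):
            m_j = moduli[j]
            k = pow(m_i, -1, m_j)            # constant fixed by the moduli set
            # One "block": modulo subtraction, then multiplication by k
            work[j] = (work[j] - a_i) * k % m_j
    return digits

def from_mixed_radix(digits, moduli):
    """Evaluate the weighted sum of the mixed radix digits."""
    x, weight = 0, 1
    for a, m in zip(digits, moduli):
        x += a * weight
        weight *= m
    return x

moduli = (3, 5, 7)
digits = mixed_radix_digits(tuple(52 % m for m in moduli), moduli)
```

A table-based converter stores the result of each inner step for every input pair, whereas a memoryless converter evaluates it with arithmetic units.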
In general, if the two inputs to a block are labeled $X$ and $Y$, each $n$ bits wide, then the output of the block is given by

$$z = |(X - Y)k|_{m_i} \qquad (4.1)$$

where $k$ is an integer constant, and $X$ and $Y$ can be expressed as $X = 2^{n-1}x_{n-1} + 2^{n-2}x_{n-2} + \cdots + x_0$ and $Y = 2^{n-1}y_{n-1} + 2^{n-2}y_{n-2} + \cdots + y_0$. Substituting in ... thus requiring a total of 3840 bytes. This needs a total transistor count of 6597 to implement one look-up table. The number of transistors required to implement a single adder block in the method described is 1040, considerably fewer than the 6597 used in the earlier approach. Table 2 shows the requirement for various moduli with different bit lengths. The hardware requirement grows exponentially for the look-up table approach, whereas it is more or less linear in the number of bits per modulus for our adder-based approach.
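The substitution step is incomplete in this text, so the following Python sketch is only one plausible way to evaluate (4.1) with adders alone and should not be taken as the authors' circuit. It assumes the bit expansions of $X$ and $Y$ are inserted into (4.1), so that each bit position $j$ contributes $(x_j - y_j)\,|2^j k|_{m}$, where $|2^j k|_{m}$ is a constant that can be hard wired. The function name `block_adder_based` and the parameters $m = 13$, $k = 5$, $n = 4$ are hypothetical.

```python
def block_adder_based(x, y, k, m, n):
    """Compute |(x - y) * k|_m by adding fixed constants, one per bit position."""
    acc = 0
    for j in range(n):
        c_j = (k << j) % m                           # hard-wired constant |2^j * k|_m
        bit_diff = ((x >> j) & 1) - ((y >> j) & 1)   # -1, 0 or +1
        if bit_diff == 1:
            acc += c_j
        elif bit_diff == -1:
            acc += m - c_j        # add the additive inverse instead of subtracting
        if acc >= m:
            acc -= m              # single modular correction keeps acc in [0, m)
    return acc

z = block_adder_based(9, 4, 5, 13, 4)    # |(9 - 4) * 5|_13
```

The loop needs one conditional addition per input bit, which is consistent with hardware that grows roughly linearly in the number of bits per modulus.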
An alternative method of eliminating the look-up tables in MRC makes use of the One-hot Residue (OHR) number system described in [14]. With this one-hot coded representation of the residue digits, addition can be performed by cyclic shifts (rotations): one of the operands is rotated by an amount equal to the other operand. The rotation can be performed by barrel shifters. These circuits compute all possible rotations in parallel and pass the appropriate one to the output when required. The barrel shifters can be built using pass transistors or transmission gates. A subtractor is implemented in the same way, except that the subtrahend input bus is permuted to generate the additive inverse (modulo $m_i$) of its operand. Each block of Figure 1 performs a modulo subtraction and a modulo multiplication by a constant. These blocks can be replaced by the OHR cell shown in Figure 4: an OHR subtractor performs the subtraction, and multiplication by a constant modulo $m_i$ is done by wire transposition. This approach is preferable when conversion speed is important. Table 2 shows the hardware requirement in terms of transistors for different moduli with different bit lengths. The hardware requirement grows exponentially for the look-up table approach, more or less linearly with the number of bits per modulus for the adder-based approach, and at a quadratic rate with OHR blocks.
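The behaviour of such an OHR cell can be modelled in software as follows. This is a behavioural sketch only, with hypothetical function names; Python lists stand in for the one-hot wire bundles, and the modulus 7 and constant 3 are illustrative.

```python
def one_hot(value, m):
    """Encode a residue as m lines with exactly one line high."""
    return [1 if i == value else 0 for i in range(m)]

def decode(lines):
    """Return the index of the single active line."""
    return lines.index(1)

def ohr_add(a, b):
    """Rotate a by the value carried on b, as a barrel shifter would."""
    m = len(a)
    shift = decode(b)
    return [a[(i - shift) % m] for i in range(m)]

def ohr_negate(a):
    """Permute the lines so that line v moves to line (-v) mod m."""
    m = len(a)
    return [a[(-i) % m] for i in range(m)]

def ohr_sub(a, b):
    """Subtract by adding the additive inverse of the subtrahend."""
    return ohr_add(a, ohr_negate(b))

def ohr_mul_const(a, k):
    """Wire transposition: line v is routed to line (v * k) mod m.

    This is a pure permutation of the lines when gcd(k, m) == 1.
    """
    m = len(a)
    out = [0] * m
    for v, bit in enumerate(a):
        out[(v * k) % m] |= bit
    return out

def ohr_cell(x, y, k):
    """One MRC block: modulo subtraction followed by multiplication by k."""
    return ohr_mul_const(ohr_sub(x, y), k)

result = decode(ohr_cell(one_hot(2, 7), one_hot(5, 7), 3))   # |(2 - 5) * 3|_7
```

Negation and constant multiplication involve no logic at all in this model, only a reordering of lines, while the rotation needs one path per input line and shift amount, in line with the quadratic growth noted above.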
CONCLUSIONS
In this paper, efficient residue-to-weighted number system converters for RNS have been presented. A new technique has been introduced that reduces the complexity of the converter design by evaluating the quotient alone; the number $X$ is then deduced from the quotient. The second approach, for residue-to-mixed radix conversion, replaces the look-up tables with simple adders or OHR cells. The OHR-based conversion is preferred where speed of conversion is important. The total hardware requirement has been shown to be less than that of traditional designs.
