Abstract-This paper presents a new scheme for designing residue generators using threshold logic. The approach is based on the periodicity of the series of powers of 2 taken modulo $2^n \pm k$. In addition, a new algorithm is proposed to obtain a new set of partitions that are more advantageous in terms of area and delay for the presented topology. Experimental results in the analyzed range of k and n show that the proposed circuits using the novel partitioning are up to 70% faster and provide area savings of up to 64% when compared with similar circuits using the partitioning methods presented to date.
I. INTRODUCTION
Residue arithmetic has been used in digital computing systems for many years [1]. The modular characteristic of the Residue Number System (RNS) offers the potential for high-speed and parallel arithmetic. RNS is a carry-free arithmetic system that improves computation performance and exhibits more parallelism and smaller area [1], [2]. Arithmetic operations such as addition, subtraction, and multiplication can be carried out on residue digits independently and concurrently, more efficiently than in conventional two's complement systems [1]. RNS has shown significant efficiency in implementing different types of Digital Signal Processing (DSP) applications, such as filtering [3], FFT computation [4], fault-tolerant computer systems [5], communication [6], and cryptography [7].
Whatever the application, RNS uses a number of residue generators, which form binary-to-residue converters [1], [8], [9]. However, residue converters can compromise the speed of the whole system. The best-known RNS moduli set is the three-moduli set $\{2^n - k, 2^n, 2^n + k\}$ with $k = 1$. This moduli set has attracted attention since it is well suited for effective regular VLSI implementations [10] and very efficient combinational converters from/to the binary system exist [11]. Different moduli sets have been proposed for the direct/reverse conversion to increase the parallelism and the dynamic range to $m = 4n$, for example $\{2^n \pm 1, 2^n \pm 3\}$ [12] and $\{2^n \pm 1, 2^{2n}\}$ [13]. In addition, another four-moduli set can be derived from moduli of the form $2^n + k$, with $k$ positive/negative odd numbers or zero, chosen in such a way that a valid group of co-prime numbers is obtained [8]. The generators $2^n \pm k$ with $k$ odd and $k \geq 3$ are more complex than the ones with $k = 1$. However, some applications of this sort of residue generator to the field of DSP have been proposed in [5], [14]. At present, one of the most efficient generators mod $A = 2^n \pm k$ is implemented using the periodic properties of the powers of two modulo $A$, $|2^j|_A$ [14]. This paper proposes a quite different approach to build any moduli set with an appropriate dynamic range, which is based on the concept of the threshold gate (TG) [15] rather than on traditional logic. TG logic allows for the implementation of more complex functions while reducing the logic depth and gate count when compared with traditional logic gates [16].
In this paper we propose an improved topology for residue generators mod A. The implementation of this topology is carried out in threshold logic, which takes advantage of a novel input partitioning also proposed herein. Experimental results show that the new residue generators using the novel partitioning can achieve improvements of over 60% in both area and delay in comparison with other ways of partitioning. This paper is organized as follows. Section II presents the topology based on threshold gates and the parameters which define them. Section III introduces the mathematical groundwork for the periodicity of the series of powers of 2 taken mod A. In addition, a new efficient algorithm to compute the periodicity of any A is proposed. Section IV details the proposed general design. Section V presents a complexity analysis, in terms of delay and area, of the obtained circuits, which is completed with the experimental results of Section VI. A summary of the proposed work is presented in Section VII.
II. GENERATOR MOD A
An m-input Boolean function ( ) 
The direct division of X by A to find the remainder is an inefficient solution. The technique proposed in [1] gives us an improved algorithm based on the expression:
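A minimal sketch of this technique, assuming that the expression is the standard identity $|X|_A = \left| \sum_{j=0}^{m-1} x_j \, |2^j|_A \right|_A$ for an m-bit input $X$ with bits $x_j$. The function names are illustrative and are not taken from [1]. The remainder is obtained from a weighted sum of precomputed residues of powers of 2, which is far smaller than $X$ itself:

```python
def residues_of_powers_of_two(A, m):
    """Return [|2^0|_A, |2^1|_A, ..., |2^(m-1)|_A] for a modulus A > 1."""
    residues = []
    value = 1
    for _ in range(m):
        residues.append(value)
        value = (2 * value) % A      # each residue follows from the previous one
    return residues


def residue_via_weighted_sum(X, A, m):
    """Compute X mod A for 0 <= X < 2**m from the bits of X."""
    weights = residues_of_powers_of_two(A, m)
    bits = [(X >> j) & 1 for j in range(m)]
    S = sum(b * w for b, w in zip(bits, weights))   # sum of residues of the active bits
    return S % A                                    # final reduction of the small sum S
```

The final reduction of `S` is the step that the quotient/remainder stage of the generator has to perform in hardware.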
If the values of . In addition, these functions are periodic functions and so they can be implemented [15] by using separate TG networks. 
which has a size of bits of ( )
, provides the quotient Q and the remainder R.
represents the quotient Q, and the a-bit number
is the binary representation of the remainder R. In terms of delay, the division operation is implemented in
levels, also using L threshold gates. In terms of area, the main contribution to the hardware is given not by the weights of the inputs but by the exponential increase of the weights associated with the quotient output, $2^r A$. This topology presents an unfeasible power-delay trade-off for large values of r. Therefore, a more efficient scheme, which reduces the value of the summation of the sequence of residues, is required. The proposed scheme is based on the periodic properties of powers of two mod A.
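The quotient/remainder stage can be modeled behaviorally as a chain of threshold comparisons. The sketch below is written under assumptions that are not confirmed by the text: one ideal threshold gate per level, quotient bits resolved from the most significant one with thresholds $2^i A$, and remainder bits resolved afterwards with thresholds $2^i$. All function names are hypothetical.

```python
def threshold_gate(weighted_sum, threshold):
    """Ideal threshold gate: outputs 1 when the weighted sum reaches the threshold."""
    return 1 if weighted_sum >= threshold else 0


def divide_with_threshold_chain(S, A, r):
    """Return (Q, R, levels) for 0 <= S < (2**r) * A, using r + a gate levels."""
    a = (A - 1).bit_length()           # bits needed for a remainder in [0, A-1]
    acc = S
    Q = 0
    for i in reversed(range(r)):       # quotient bits, most significant first
        q_i = threshold_gate(acc, (1 << i) * A)
        Q |= q_i << i
        acc -= q_i * (1 << i) * A      # feedback of the decided bit with weight 2^i * A
    R = 0
    for i in reversed(range(a)):       # remainder bits, most significant first
        r_i = threshold_gate(acc, 1 << i)
        R |= r_i << i
        acc -= r_i * (1 << i)
    return Q, R, r + a
```

In this model the largest weight is $2^{r-1} A$, so every extra quotient bit doubles the weight that has to be implemented, which illustrates why a smaller sum of residues (and hence a smaller r) matters.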
III. PERIODIC PROPERTIES OF $|2^j|_A$
The periodicity of the series of 
A. Generator Mod A based on the P(A) concept
This sort of generator is based on the adapted definition of the period of the odd modulus A, P(A) [17], which is described as follows:
• Definition 1: The period of the odd modulus A, P(A), is the minimum distance between two distinct 1's in the array of residues of powers of 2 taken mod A, i.e. P(A)
is simply called the period of A [14] .
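Definition 1 amounts to finding the smallest exponent $P > 0$ for which $|2^P|_A = 1$. A minimal sketch follows; the function name `period` is illustrative:

```python
def period(A):
    """Smallest P > 0 with 2**P % A == 1, for an odd modulus A > 1."""
    if A <= 1 or A % 2 == 0:
        raise ValueError("A must be an odd integer greater than 1")
    value, P = 2 % A, 1
    while value != 1:
        value = (2 * value) % A   # next residue of the series of powers of 2
        P += 1
    return P
```

For example, `period(7)` is 3 because the residues of 2, 4, 8 mod 7 are 2, 4, 1.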
B. Generator Mod A based on the HP(A) concept
This sort of generator is based on the adapted definition of the half-period of the odd modulus A, HP(A) [17], which is described as follows:
• 
and from the concept of P(A) it is derived that
similarly, from the concept of HP(A) it is derived that
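As an illustration, and assuming the usual definition in which HP(A) is the smallest exponent $HP > 0$ with $|2^{HP}|_A = A - 1$ (that is, $-1$ mod A), the half-period can be searched for in the same series of residues. Two identities follow directly from the definitions: $|2^{j+P(A)}|_A = |2^j|_A$ and $|2^{j+HP(A)}|_A = |-2^j|_A$. The function name `half_period` is hypothetical, and the function returns `None` for moduli that have no half-period:

```python
def half_period(A):
    """Smallest HP > 0 with 2**HP % A == A - 1, or None if no such exponent exists."""
    if A <= 1 or A % 2 == 0:
        raise ValueError("A must be an odd integer greater than 1")
    value, HP = 2 % A, 1
    while value != 1:                 # reaching 1 closes a full period without finding -1
        if value == A - 1:
            return HP
        value = (2 * value) % A
        HP += 1
    return None
```

For A = 13 this gives 6, since $2^6 = 64 = 5 \cdot 13 - 1$, while A = 7 has no half-period because its residues cycle through 2, 4, 1 only.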
C. Generator Mod A based on a minimal summation of contributions
This novel generator is based on a new algorithm herein proposed, which explores the possibility of using positive and negative residues in the array of powers of 2 (as in the HP(A) case). This algorithm reduces the weight of the residue values in order to obtain a minimal weighted contribution for the summation of residues (4), and also for the value of r (5).
The proposed method to obtain the array of residues of powers of 2 is based on the property which establishes that for 
. Taking into account this idea, a novel algorithm is described applying that [14] :
Applying (11) recursively we can derive the new algorithm (Algorithm 1). Up to now, to obtain the array of residues of powers of 2 mod A, the calculation of each 
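The three ways of building the array of residues can be contrasted with a short sketch. It is an interpretation, not a transcription of Algorithm 1: case 1 is taken as the residues in $[0, A-1]$, case 2 as the half-period form in which the sign flips every HP(A) positions, and case 3 as the signed residue of minimum absolute value in $[-(A-1)/2, (A-1)/2]$. The names `find_half_period` and `residue_arrays` are hypothetical.

```python
def find_half_period(A):
    """Smallest j > 0 with 2**j % A == A - 1, or None if it does not exist."""
    value, j = 2 % A, 1
    while value != 1:
        if value == A - 1:
            return j
        value = (2 * value) % A
        j += 1
    return None


def residue_arrays(A, m):
    """Return (case1, case2, case3) weights for bits x_0 .. x_(m-1), A odd and >= 3."""
    case1 = []
    value = 1
    for _ in range(m):
        case1.append(value)
        value = (2 * value) % A              # each residue obtained from the previous one
    hp = find_half_period(A)
    case2 = None
    if hp is not None:
        # sign alternates every hp positions, magnitudes repeat the first hp residues
        case2 = [(-1) ** (j // hp) * case1[j % hp] for j in range(m)]
    # pick, for every position, the representative with the smallest absolute value
    case3 = [b if b <= A // 2 else b - A for b in case1]
    return case1, case2, case3
```

With A = 13 and m = 12 the entries for bit $x_9$ are 5, -8 and 5 for cases 1, 2 and 3, and the sum of absolute weights drops from 78 (case 1) to 48 (case 2) and 42 (case 3).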
In all cases we can order the partitions into 12 sets $\Gamma_i$, each with the same weight $i$ [14]. Additionally, the weight $i$ of a given partition set may differ from case to case; for example, $x_9$ is included in the set $\Gamma_5$ in case 1, $\Gamma_{-8}$ in case 2, and $\Gamma_5$ in case 3.
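Grouping the input bits into the sets $\Gamma_i$ can be sketched as follows. The example assumes A = 13 (the modulus for which $x_9$ has weight 5) with minimum-absolute-value weights and a hypothetical input width of m = 24 bits. The helper names are illustrative.

```python
from collections import defaultdict


def min_abs_residues(A, m):
    """Signed residues of 2^j mod A with the smallest absolute value, j = 0 .. m-1."""
    weights, value = [], 1
    for _ in range(m):
        weights.append(value if value <= A // 2 else value - A)
        value = (2 * value) % A
    return weights


def partition_inputs(weights):
    """Build the sets Gamma_i: map each weight i to the list of bit indices j that carry it."""
    gamma = defaultdict(list)
    for j, w in enumerate(weights):
        gamma[w].append(j)
    return dict(gamma)


gamma = partition_inputs(min_abs_residues(13, 24))
```

Here `gamma` holds 12 sets, and `gamma[5]` contains the bit indices 9 and 21, which share the weight 5 because the residues repeat with period 12.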
IV. DESIGN PROCEDURE
In this section, we present a novel scheme for the m-input generator mod A based on threshold gates for the three different ways of partitioning presented in the previous section. This generator mod A can be designed by using the following procedure:
Step 1: Obtain the array of residues of powers of 2 for m bits by means of the proposed algorithm.
Step 2: Standardize the output range of the residue generator so that the $\Gamma_i$ sets lie in the positive range $0 \leq i \leq A - 1$, as in case 1. In cases 2 and 3 the uncorrected output of the residue generator spans both positive and negative values. The addition of a correction term, COR, to obtain a typical non-negative range is proposed in [15]. It is important to clarify that the COR factor in case 3 is obtained in the same way, only replacing $b_j$ by $v_j$. From now on, and since COR is a constant, the correction factor is added to the threshold value of each TG of the residue generator.
Step 3: Once the sequence is defined and corrected, we partition the set of input bits $\{x_{m-1}, \ldots, x_1, x_0\}$.
In order to better illustrate the design procedure, the design of a 20-bit generator mod 37 is presented. Table I summarizes the parameters COR, $S_{max}$, Q and r needed for the characterization of this case study.
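A rough way to explore these parameters for the 20-bit generator mod 37 is sketched below. The formulas are assumptions made for illustration and are not necessarily the ones behind Table I: COR is taken as the smallest multiple of A that cancels the most negative possible sum, $S_{max}$ as the largest corrected sum, Q as $\lfloor S_{max} / A \rfloor$, and r as the number of bits of Q. The function names are hypothetical.

```python
def residues(A, m):
    """Residues |2^j|_A for j = 0 .. m-1."""
    out, value = [], 1
    for _ in range(m):
        out.append(value)
        value = (2 * value) % A
    return out


def design_parameters(weights, A):
    """Estimate COR, S_max, Q and r for a list of signed input weights (assumed formulas)."""
    positive = sum(w for w in weights if w > 0)
    negative = -sum(w for w in weights if w < 0)   # magnitude of the most negative sum
    cor = A * -(-negative // A)                    # smallest multiple of A >= negative
    s_max = positive + cor                         # largest value of the corrected sum
    q_max = s_max // A
    return {"COR": cor, "S_max": s_max, "Q": q_max, "r": q_max.bit_length()}


A, m = 37, 20
case1 = residues(A, m)                                   # weights in [0, A-1]
case3 = [b if b <= A // 2 else b - A for b in case1]     # minimum-absolute-value weights
p1 = design_parameters(case1, A)
p3 = design_parameters(case3, A)
```

Under these assumptions the signed weights give a smaller `S_max`, and therefore a quotient with no more bits than in the unsigned case, which is the effect the proposed partitioning aims for.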
Figure 2 depicts the complexity of the topologies with the different ways of partitioning presented. The benefits exhibited by the topology in case 3 can be explained by the significant reduction in the values of $S_{max}$, Q and r presented in Table I.
V. COMPLEXITY ESTIMATION AND COMPARISON
Improved residue generators mod A can be achieved with the proposed way of partitioning, which reduces the value of the summation of the sequence of residues (4). The novel residue generator mod A based on the proposed algorithm (case 3) is compared with the same topology using the other cases of partitioning [14]. All the circuits were designed following the procedure described in Section IV. In order to obtain a reliable comparison of area and delay values, a simplified gate model based on the β-driven approach [18] is used. Fan-out and fan-in delays are not considered.
A. Delay Estimation
To estimate the delay, we need to analyze the depth of the stages which provide the value of the quotient, Q, and the residue, R. The first stage of TGs has a depth of r, whereas the second one has a depth of a, as shown in Figure 1. Thus, the delay estimation is given by:
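Since the two stages are traversed in sequence, one reading consistent with this description is that the delay in threshold-gate levels is the sum of both depths. The symbol $D$ and the bit-width expressions below are assumptions introduced for illustration, with Q taken as the largest quotient value:

```latex
D = r + a, \qquad r = \lceil \log_2 (Q + 1) \rceil, \qquad a = \lceil \log_2 A \rceil
```

For instance, a modulus A = 37 needs a = 6 remainder bits, so a generator whose quotient fits in r = 3 bits would have a depth of 9 levels under this reading.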
Figure 3a depicts the delay estimation, in threshold gate levels for the new residue generator. These values reflect the average results for the several k odd numbers
, for each n. It is noticeable that the topology using case 3 of partitioning is faster than the same topology using the other cases of partitioning, resulting in a delay reduction of at least 6% in the whole range of analyzed n and k values.
B. Area Estimation
Let us consider the transistor-input area with an associated weight equal to 1 as the area unit used for this estimation. In our approach, an input $x_j$ with an associated weight . The obtained estimation suggests a significant area reduction for the topology using the proposed partitioning in comparison with the related art [14]. Thus, area reductions of at least 42% are expected. The area estimation results indicate an exponential increase with n for all three cases, which gives an unfeasible power-delay trade-off for large values of n.
VI. EXPERIMENTAL RESULTS
In order to evaluate the performance of the proposed residue generators, experimental results were also obtained and compared. The topologies have been validated in Cadence using the 0.13 μm standard technology from UMC. Simulations of gates for the β-driven approach with this technology have shown that the fan-in is limited to small values of weighted inputs. Note that the technology used is not optimized for the design of threshold gates. Nevertheless, experimental results for residue generators with 16 input bits using the presented topology have been obtained and compared. , for cases 1 and 2 normalized to the proposed case 3. The proposed topology with the novel method of partitioning exhibits an equal or better area-delay trade-off in comparison with the related art presented in [14]. For example, in the particular case of residue generator mod 19 , ( ) 
VII. CONCLUSIONS
Novel methods to design residue generators modulo $2^n \pm k$ in threshold logic are presented for any n and k. The threshold logic approach takes advantage of the periodic properties of the powers of two modulo $2^n \pm k$. The proposed scheme reduces the hardware cost in comparison with traditional implementations using threshold gates. First, a novel algorithm, which exploits the periodic properties of the series of powers of 2 taken modulo $2^n \pm k$, provides the sequences of residues in a more efficient way than the existing state of the art. With this simpler algorithm, the whole design can be derived for any value of n and k. The novel topology is compared under three different ways of separating the partitions: two from the related art and the proposed one. The novel partitioning proves to be advantageous in terms of area and delay, as confirmed by both the theoretical analysis and the experimental results. These results suggest that improvements of up to 64% in area and 70% in delay can be achieved.
