Abstract-Novel radix-$r$ modulo-$r^n$ arithmetic units for residue number system (RNS)-based architectures are introduced in this paper. The proposed circuits are shown to require several times less area than previously reported architectures for particular moduli of operation, while also being preferable in the area-time complexity sense. The complexity reduction is achieved by extending the carry-ignore property of modulo-$2^n$ operations to radices $r$ higher than two, which are not powers of two. The carry-ignore property is efficiently exploited by introducing simplified digit adders, instead of general radix-$r$ adders. The proposed simplification of digit adders is possible, since the maximum values of certain intermediate digits produced in the architecture are found to be less than $r-1$. Detailed area and time complexity models are derived for the arithmetic units. The proposed radix-$r$ architectures include multipliers, adders, and merged multipliers-adders. In addition, efficient radix-$r$ binary-to-residue and residue-to-binary conversion techniques and architectures are introduced.
[4]-[6]. Piestrak presented a technique for building residue generators and multi-operand adders, which exploits the periodicity of the residues of powers of two for certain moduli [7]. The architectures utilize radix-2 arithmetic. Stouraitis et al. [6] presented full adder-based architectures for RNS multiply-add operations. DiClaudio et al. introduced the pseudo-RNS representation, which enables building reprogrammable-modulus multipliers, allows for systolization, and simplifies the computation of digital signal processing (DSP) algorithms, such as finite impulse response (FIR) filtering, correlations, and the discrete Fourier transform (DFT) [5]. High-radix techniques for the acceleration of multiplications have been proposed in the literature. Takagi presented a radix-4 modular multiplication hardware algorithm suitable for large-modulus operations [8]. Soderstrand and Escott have proposed merging RNS with the multiple-valued logic paradigm, aiming at the VLSI implementation of FIR filters. In their approach, the number of levels used to represent multiple-valued logic signals is chosen to match the modulus of operation, providing a natural representation for RNS [9]. Bases which include moduli of the form $2^n$ and $2^n \pm 1$ have been widely used in RNS architectures, as they provide for low-complexity implementations [10]. The area-time efficiency of the corresponding arithmetic units stems from the short word length of the operands (as implied by RNS) and their resemblance to simple conventional binary arithmetic architectures. However, to comply with the RNS requirement for pair-wise relatively prime moduli in the base, no more than one modulus of the form $2^n$ can be utilized. Therefore, not all residue channels can exploit the efficiency of architectures for operations modulo $2^n$.
In this work, new architectures for multiplication, addition, merged multiplication-addition, and conversion are introduced, based on the extension of the carry-ignore property to radices $r > 2$ which are not powers of two. It is shown that significant complexity reduction is achieved by modulo-$r^n$ multipliers, compared to previously reported combinatorial designs [5], [6], without adopting multiple-valued logic. The introduced architectures circumvent the restriction of using a single modulus channel for which the carry-ignore property is applicable. This is achieved by using different moduli of the form $r_i^{n_i}$, such that
$$\gcd(r_i, r_j) = 1, \quad i \neq j. \tag{1}$$
Hence, more than one carry-ignore channel can be included in an RNS architecture. Notice that the remaining moduli channels can be implemented using any of the conventional radix-2 architectures.
1057-7130/00$10.00 © 2000 IEEE
To exploit the carry-ignore property, radix-$r$ digit adders of low complexity should be developed. In the proposed arithmetic units, low complexity becomes feasible, as it is shown that the digits of the intermediate results do not assume the full range of values possible for a radix-$r$ digit. In particular, the general radix-$r$ adders required in a straightforward radix-$r$ architecture are reduced to adders which do not need to manipulate the full range of single-digit radix-$r$ values. Therefore, simpler hardware is required for their implementation. The derived maximum values do not depend on the modulus of operation and, in some cases, do not depend on the radix $r$; therefore, their applicability extends beyond modulo-$r^n$ arithmetic units. To illustrate the impact of the proposed arithmetic units on a complete RNS-based architecture, design techniques for the derivation of binary-to-radix-$r$-residue and radix-$r$-residue-to-binary converters are also introduced. The binary-to-radix-$r$-residue converter architecture is based on specialized cells of low complexity, while the radix-$r$-residue-to-binary conversion can be performed by modifying an existing architecture [11]. However, it is shown that the introduced adder-based structures can also be utilized in the radix-$r$-residue-to-binary conversion to improve area-time efficiency.
The selection of a particular architecture to implement a residue channel should minimize a performance measure such as area, time, or area-time complexity. To facilitate the selection procedure, area and time complexity bounds for the proposed architectures are derived.
The remainder of the paper is organized as follows. Some RNS and radix-$r$ arithmetic basics are offered in Section II. Section III introduces the radix-$r$ modulo-$r^n$ multiplier architecture, Section IV focuses on radix-$r$ modulo-$r^n$ addition, and Section V focuses on merged radix-$r$ modulo-$r^n$ multiplication-addition. Binary-to-radix-$r$-residue and radix-$r$-residue-to-binary conversion architectures are discussed in Section VI. Performance comparisons to previously published architectures are presented in Section VII. Finally, conclusions are discussed in Section VIII.
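II. REVIEW OF BASIC RNS CONCEPTS AND RADIX-$r$ ARITHMETIC
An RNS is defined through a set of integers
$$\{m_1, m_2, \ldots, m_N\} \tag{2}$$
called the base. The integers $m_i$ are pair-wise relatively prime, i.e.,
$$\gcd(m_i, m_j) = 1 \tag{3}$$
for $i \neq j$. Restriction (3) is necessary to assure that every integer $X$, $0 \le X < M = \prod_{i=1}^{N} m_i$, can be uniquely represented as an $N$-tuple
$$X \rightarrow (x_1, x_2, \ldots, x_N) \tag{4}$$
where $x_i = \langle X \rangle_{m_i}$ denotes the residue of $X$ modulo $m_i$. The main benefit of adopting RNS to perform arithmetic operations is that it allows for the totally parallel addition, subtraction, and multiplication of operands expressed as $N$-tuples of the form (4). In particular, let $(x_1, \ldots, x_N)$ and $(y_1, \ldots, y_N)$ be the RNS representations of the integers $X$ and $Y$, respectively. Then the RNS representation of the product, the sum, or the difference of the two tuples, denoted as $Z = X \circ Y$, is given by
$$Z \rightarrow (z_1, z_2, \ldots, z_N) \tag{5}$$
$$z_i = \langle x_i \circ y_i \rangle_{m_i} \tag{6}$$
where $\circ$ denotes addition, subtraction, or multiplication.
In this paper, the processing of integers modulo $r^n$ is investigated. Assume an integer $A$ of radix-$r$ word length $l$, such that
$$A = \sum_{i=0}^{l-1} a_i r^i \tag{7}$$
where $0 \le a_i \le r-1$ for $i = 0, 1, \ldots, l-1$. Then the residue of $A$ modulo $r^n$ is
$$\langle A \rangle_{r^n} = \sum_{i=0}^{n-1} a_i r^i \tag{8}$$
since $\langle r^i \rangle_{r^n} = 0$ when $i \ge n$, and $\langle r^i \rangle_{r^n} = r^i$ for $i < n$. Therefore, the digits $a_i$, $i \ge n$, do not contribute to the residue $\langle A \rangle_{r^n}$. It is the combined exploitation of this carry-ignore property and the low-complexity reduced radix-$r$ digit adders that leads to the substantial performance improvement for processing of residues modulo $r^n$.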
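The carry-ignore property can be checked numerically. The following Python sketch is an illustration only: it writes `r` for the radix and `n` for the number of retained digits (names assumed here), and the helper names are hypothetical. It confirms that keeping the `n` least significant radix-`r` digits of an integer gives its residue modulo `r**n`.

```python
def to_digits(x, r):
    """Return the radix-r digits of a non-negative integer, least significant first."""
    digits = []
    while x > 0:
        x, d = divmod(x, r)
        digits.append(d)
    return digits or [0]


def residue_by_truncation(x, r, n):
    """Residue of x modulo r**n, obtained by keeping only the n low-order digits."""
    kept = to_digits(x, r)[:n]
    return sum(d * r**i for i, d in enumerate(kept))


# The truncated value agrees with the ordinary remainder for every tested input.
for r, n in [(3, 4), (5, 3), (7, 2)]:
    assert all(residue_by_truncation(x, r, n) == x % r**n for x in range(5000))
```

No arithmetic is spent on the modulo operation itself: the digits of weight `r**n` and above are simply never used.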
III. THE PROPOSED RADIX-$r$ MULTIPLIER MODULO $r^n$
An organization for the proposed high-radix multiplier (HRM) modulo $r^n$ is shown in Fig. 1. Let the multiplier $A$ and the multiplicand $B$ be residues modulo $r^n$ of two integers. They are given via (8) as
$$A = \sum_{i=0}^{n-1} a_i r^i \tag{9}$$
$$B = \sum_{j=0}^{n-1} b_j r^j \tag{10}$$
where $a_i$, $b_j$ are radix-$r$ digits, $i, j = 0, 1, \ldots, n-1$. Then, the function of the HRM, i.e., the computation of the product $P = \langle A B \rangle_{r^n}$, can be described by
$$P = \left\langle \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} a_i b_j r^{i+j} \right\rangle_{r^n} \tag{11}$$
The products of the digits of the multiplier and of the multiplicand are two-digit radix-$r$ values
$$a_i b_j = c_{ij} r + s_{ij} \tag{12}$$
where $c_{ij}$ is the carry digit and $s_{ij}$ is the sum digit. Since $\langle r^k \rangle_{r^n} = 0$ for $k \ge n$, it follows that the products $a_i b_j$ for which $i + j \ge n$ do not contribute to $P$. Similarly, any other digit having a weight of $r^n$ or more does not contribute to a residue modulo $r^n$. Such digits are:
1) the carries produced by adding digits of weight $r^{n-1}$;
2) the most significant digit $c_{ij}$ of the products for which $i + j = n - 1$.
Therefore, the modulo operation in (11) does not impose any computational, and subsequently hardware, complexity on the HRM architecture. The HRM consists of two stages, namely the preprocessor, which generates the two-digit products $a_i b_j$, and the adder array, which sums the outputs of the preprocessor as dictated by the double summation in (11). Several types of digit adders are developed and used in this adder array. Their usage is justified in Section III-B.
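The two-stage organization can be mimicked in software. The Python sketch below is a behavioural model only, not the gate-level architecture of Fig. 1; the function names and the use of `r` for the radix and `n` for the number of digits are assumptions. It forms the two-digit digit products, skips every digit whose weight is `r**n` or more, and sums the rest column by column.

```python
def to_fixed_digits(x, r, n):
    """n radix-r digits of x modulo r**n, least significant first."""
    return [(x // r**i) % r for i in range(n)]


def hrm_model(a, b, r, n):
    """Product of two residues modulo r**n, accumulated column by column."""
    a_d, b_d = to_fixed_digits(a, r, n), to_fixed_digits(b, r, n)
    columns = [0] * n
    for i in range(n):
        for j in range(n - i):                     # products with i + j >= n are never formed
            carry, s = divmod(a_d[i] * b_d[j], r)  # two-digit digit product
            columns[i + j] += s
            if i + j + 1 < n:                      # a carry of weight r**n is ignored
                columns[i + j + 1] += carry
    # Resolve the column sums into digits; the carry out of the last column is dropped.
    result, carry = 0, 0
    for k in range(n):
        carry, digit = divmod(columns[k] + carry, r)
        result += digit * r**k
    return result


# Exhaustive agreement with ordinary modular multiplication for modulus 3**3.
assert all(hrm_model(a, b, 3, 3) == (a * b) % 27 for a in range(27) for b in range(27))
```

The model never applies a remainder operation to the full product, which is the software counterpart of the statement that the modulo operation adds no hardware.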
A. The Preprocessor
The preprocessor computes the products $a_i b_j$ and is organized as a collection of cells operating in parallel, each of which produces a two-digit result composed of a carry digit $c_{ij}$ and a sum digit $s_{ij}$, given by (12). While $s_{ij}$ may assume all legitimate radix-$r$ values, i.e., $0 \le s_{ij} \le r-1$, the carry digit $c_{ij}$ may only assume a limited set of values. Since the maximum value which can be assumed by $c_{ij}$ is less than $r-1$, the binary word length required to represent a carry digit is reduced; thus hardware complexity is reduced, as both wiring and processing circuitry are minimized. The maximum value of the carry digits is given by the following theorem.
Theorem 1: The maximum value of the carry digit generated at a preprocessor cell is $r-2$. Proof: Since the multiplier and the multiplicand are encoded in radix $r$, the maximum value of the product in (12) is
$$(r-1)(r-1) = r^2 - 2r + 1. \tag{13}$$
Since (13) can be written in radix-$r$ form as
$$(r-2)\, r + 1 \tag{14}$$
where $r-2$ and $1$ are the carry and sum digits, it follows that the maximum value of the carry digit is $r-2$. Hence, for a radix-3 multiplier preprocessor, the maximum value of the produced carry digits is $3 - 2 = 1$, i.e., a single-bit value. Two types of preprocessor cells are required, since the cells which multiply the most significant digits of the multiplier and the multiplicand need not produce a carry digit, due to the carry-ignore property of a modulo-$r^n$ operation. Cells of type PP produce a carry, while cells of type CP do not produce a carry digit. The performance of the two types of preprocessor cells for the radices considered is summarized in Table I. The area complexity is quantified in units of the area occupied by a 2-input NOR gate, while the time complexity is normalized to the corresponding gate delay. Efficient preprocessor cells can be obtained by synthesizing and optimizing a VHDL description using commercial logic synthesis tools such as Mentor Graphics' Autologic II [12].
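The bound of Theorem 1 can be confirmed by exhaustive enumeration. In this short Python check, the function names and the symbol `r` for the radix are assumptions made for illustration. It enumerates every digit product and reports the largest carry digit and the number of bits needed to encode it.

```python
def max_preprocessor_carry(r):
    """Largest carry digit over all products of two radix-r digits."""
    return max((a * b) // r for a in range(r) for b in range(r))


def carry_bits(r):
    """Number of bits needed to encode the largest preprocessor carry."""
    return max(1, max_preprocessor_carry(r).bit_length())


for r in range(3, 17):
    assert max_preprocessor_carry(r) == r - 2

assert carry_bits(3) == 1   # radix 3: a single-bit carry
assert carry_bits(5) == 2   # radix 5: carries 0..3 fit in two bits
```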
B. The Radix-$r$ Digit Adders of the Adder Array
The products generated by the preprocessor are summed in the second stage of an HRM, which is an array of radix-$r$ digit adders, as shown in Fig. 1. The introduced array architecture is modeled after the carry-save adder array [13] and it consists of several types of digit adders organized into columns. As carry digits of different maximum value are produced in the HRM, the need for minimal hardware and time complexity dictates that several types of digit adders should be employed to compose the adder array. Digits with maximum values $r-1$ and $r-2$ are produced at the preprocessor due to Theorem 1; hence they are added by digit adders at the top of each column. These adders, the input digits to which have a maximum value of either $r-1$ or $r-2$, are called full adders (FAs) and produce carries of maximum value 2, as proven in Theorem 2 below. Therefore, in the next most significant column, digit adders capable of processing two or one carries of maximum value 2 are placed below the FAs. These digit adders produce a carry of maximum value one. This can be proven as in Theorem 2, and the proof is omitted. Finally, adders which process two or one single-bit carries are placed at the lowest part of the column. Hence, a column consists of adders which process digits of maximum value $r-1$, $r-2$, 2, or 1. All types of digit adders have at least one input which can receive a digit spanning the range $[0, r-1]$, to accommodate a sum digit and, in this way, to reflect the organization of the array into columns.
Theorem 2: The maximum value of a carry generated by an FA cell is two, if $r > 3$, and one, if $r = 3$. Proof: The maximum two-digit result produced by an FA in the array is
$$(r-1) + (r-1) + (r-2) = 3r - 4. \tag{15}$$
Two cases are distinguished. If $r > 3$ then an integer $k > 0$ exists, such that $r = 3 + k$. Hence, (15) can be written as
$$3r - 4 = 2r + (k - 1). \tag{16}$$
Since $0 \le k - 1 < r$, the generated carry digit may assume a maximum value of two. If $r = 3$ then (15) gives
$$3r - 4 = 5 = 1 \cdot 3 + 2. \tag{17}$$
Hence, in a radix-3 architecture the maximum value of a carry digit generated by an FA equals one. The maximum value assumed by the carry digits generated by each possible digit adder type can be found in a manner similar to that of Theorem 2. These maximum values are shown in Table II. Determining the maximum value of the carry is important, since it allows for employing fewer bits to represent the carry digit than a general radix-$r$ digit. In this way, the related hardware complexity is minimized, because wiring is reduced and the circuits which produce or use the carries are simplified.
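Theorem 2 can be checked in the same way. The sketch below assumes, as stated for the FA, that it adds two radix digits (at most `r - 1` each) and one preprocessor carry (at most `r - 2`); the function name and the symbol `r` are illustrative.

```python
def max_fa_carry(r):
    """Largest carry out of an FA adding two radix-r digits and one preprocessor carry."""
    return max(
        (x + y + c) // r
        for x in range(r)          # radix-r digit
        for y in range(r)          # radix-r digit
        for c in range(r - 1)      # preprocessor carry, 0 .. r-2
    )


assert max_fa_carry(3) == 1                               # radix 3: single-bit carry
assert all(max_fa_carry(r) == 2 for r in range(4, 33))    # r > 3: carry of at most two
```

The carry bound is independent of the radix once `r > 3`, so the downstream adders need to handle only carries of value 0, 1, or 2.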
The performance of the various types of digit adders is shown in Table III. The operation of each digit adder is described in the following.
1) FA: adds two radix-$r$ digits and a carry generated at the preprocessor.
2) The second type adds a radix-$r$ digit and two carries generated by FAs.
3) The third type adds a radix-$r$ digit and one carry generated by an FA.
4) The fourth type adds a radix-$r$ digit and two single-bit carries generated by digit adders of types 2)-5).
5) The fifth type adds a radix-$r$ digit and one single-bit carry generated by digit adders of types 2)-5).
All types of digit adders can be further reduced when located at the $(n-1)$th column of the array, since no carry digit needs to be produced by adders of this column due to the carry-ignore property. Reduced adders are denoted as the primed versions of the corresponding adder types and their complexity is displayed in Table III. The large variations in the performance of the various digit adder types noticed in Table III reveal the practical importance of employing several types of adders to exploit the limited set of values assumed by the carry digits.
The general organization of a digit adder requires two stages and is shown in Fig. 3. The first stage is a radix-2 three-operand adder, while the second one is a combinatorial logic structure, which translates the intermediate result produced by the first stage into a two-digit legitimate radix-$r$ result, composed of a radix-$r$ carry digit and a radix-$r$ sum digit. Both stages can be reduced by exploiting don't-care input combinations, and for the simpler types of digit adders the two stages can be merged into a single combinatorial logic implementation. For example, a radix-5 FA receives a two-bit carry and two three-bit digits, the maximum value of which is 4. Therefore digit values 5-7 are don't-care combinations and can be used in logic minimization. Furthermore, the maximum value of the intermediate result produced by the first stage is $4 + 4 + 3 = 11$, which allows the don't-care input combinations 12-15 to be used for the optimization of the second stage. The complexities shown in Table III are obtained by optimizing a gate-level netlist with Autologic II.
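The two-stage organization and the don't-care sets of the radix-5 example can be reproduced with a few lines of Python. This is a behavioural sketch under the stated input ranges (two digits in 0..4, one carry in 0..3); the names are hypothetical and no gate-level detail of Fig. 3 is implied.

```python
def digit_adder(x, y, c, r):
    """Two-stage digit adder: binary sum, then translation to (carry, sum) in radix r."""
    intermediate = x + y + c          # stage 1: radix-2 three-operand addition
    return divmod(intermediate, r)    # stage 2: translation to a radix-r carry and sum digit


r = 5
# Every intermediate value reachable with legal radix-5 FA inputs.
intermediates = {x + y + c for x in range(r) for y in range(r) for c in range(r - 1)}

# Three-bit codes that never appear on a digit input.
dont_care_digit_codes = [v for v in range(8) if v >= r]
# Four-bit codes that never appear at the input of the second stage.
dont_care_stage2_codes = [v for v in range(16) if v > max(intermediates)]

assert dont_care_digit_codes == [5, 6, 7]
assert dont_care_stage2_codes == [12, 13, 14, 15]
assert digit_adder(4, 4, 3, 5) == (2, 1)   # 11 = 2 * 5 + 1
```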
C. Complexity of the High-Radix Multiplier Modulo $r^n$
In this section, the complexity of an HRM is quantified by deriving the number of the various adders and cells which it comprises. The first stage of the HRM is the preprocessor, which computes the digit products, such that . Equation (11) reveals that products correspond to the th digital position; therefore preprocessor cells are required at the particular digital position. The preprocessor area complexity is the sum of the complexity of the PP cells, which produce a carry digit, and the complexity of the CP cells at the th column, which do not produce a carry digit, i.e., (18) where and can be computed as (19) and (20) The CP cells are used because at the digital position , only digit of a product needs to be computed due to the carry-ignore property. Furthermore, the products of weight greater than are ignored. The second stage of the HRM is, as expected, an adder array. The adder array consists of columns, indexed by . The th column adds digits of weight . In particular, the th column adds the least significant digits of the products of weight and the most significant digits of the products of weight . Hence, if a number FA of FAs are required and organized into the th column, the top adder of which column receives two digits, while each of the remaining adders in the column receives one, then the number FA can be evaluated as follows:
Therefore, the total number of FAs in an array of a modulo-HRM is (22) Notice that the th column, the most significant one of the array, is composed of FA digit adders. To derive the number and type of the remainder of the digit adders which-along with FAs-compose a column, the number and type of carry digits produced at the th column, are determined in the following section.
1) Types of Carry Digits in the HRM-Array:
Two types of carry digits are produced within the adder array, those that can assume a maximum value of two and those with maximum value of one. The particular maximum values do not depend on the adopted radix . Let and denote the number of carry digits with maximum value of one and two, respectively, which are added by the th column of adders. The carries of maximum value 2 to be added at the th column, are produced only by the FAs of the th column. Hence, from (21), it is obtained that FA
for and . Carries of maximum value one are produced by two types of digit adders: 1) the digit adders which process the carry digits (having a maximum input value of two) produced by FAs in the nd column; 2) the digit adders which process one-bit carries generated at the nd column, when . In the th column FA carries generated by FAs and one-bit carries are processed. Hence, can be computed in the following recursive way: (24) for with the initial condition . By induction, the following theorem is derived.
Theorem 3: In the radix-carry-save adder array organized into columns, the number of carry digits with maximum value of one added in the th column, is given by (25) for . Proof: The proof is offered in Appendix A.
2) Organization of the Adder Columns:
The numbers and of the two types of carry digits added by the th column, given by (25) and (23) respectively, are utilized in the computation of the number and type of digit adders, which compose a column. Apart from the FAs, which manipulate digits from the preprocessor, the th column comprises adders for processing the two types of carry digits generated at the less significant th column. The number of digit adders in the th column, which add two carries of maximum value two and a radix-digit, is given by
In case is odd, an digit adder is required to process the remaining carry digit. Hence, is given by
In a similar way, it can be found that the number of digit adders in the th column is (28) and the number of digit adders is given from (29) where . Equations (21) and (26)- (29) provide the number and the type of digit adders per digital position (digit adder column in the particular architecture). Since, by virtue of the carry-ignore property, the number of columns is known, exact complexity formulas can be derived by computing the sums of all adder column complexities over the range . The particular complexities are computed in Appendix B and the derived formulas are summarized in Table IV. 3) The HRM-Area Complexity: The total hardware complexity of the HRM-is the weighted sum of the number of each digit adder type in all columns as well as the complexity of the preprocessor given by (18) . Hence, by taking into consideration the complexities of the digit adders shown in Tables I and III, the following expression of the total HRMcomplexity is obtained:
where the s and s denote the complexities of the various digit adders and preprocessor cells.
4) The HRM-Delay: The delay of the path from the inputs of the nd column to the output of the th column is given as FA (31) while the delay along the most significant column is FA (32) where the s denote the preprocessor cell and digit adder delays. Delay values for digit adder and preprocessor cell implementations using a 0.7-m CMOS standard-cell library are shown in Tables I and III. Depending on the particular gate-level implementation of the various types of digit adders, the radix , and the power , the maximum-delay critical path is defined by either the th column or the path from the top of the nd column to the output of the st one. Therefore, the longest path delay is given by (33)
IV. RADIX-$r$ MODULO-$r^n$ ADDITION
Let $A$ and $B$ be residues modulo $r^n$ of two integers, as defined in (9) and (10). Then the residue of their sum modulo $r^n$ is
$$\langle A + B \rangle_{r^n} = \left\langle \sum_{i=0}^{n-1} (a_i + b_i)\, r^i \right\rangle_{r^n}. \tag{34}$$
An architecture for the radix-$r$ modulo-$r^n$ adder is based on digit adders named Complete Full Adders (CFAs). A CFA sums two radix-$r$ digits and a single-bit input carry to produce a full radix-$r$ sum digit and a single-bit carry-out digit. The carry-out digit is always a single-bit quantity, independently of the radix of operation $r$, because the maximum two-digit result produced by a CFA is
$$(r-1) + (r-1) + 1 = 2r - 1 = 1 \cdot r + (r-1). \tag{35}$$
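A behavioural model of such an adder is sketched below in Python. It assumes that the CFAs are chained so that each carry out feeds the next more significant digit position, which is one plausible reading of the description rather than a statement about the actual figure; `r`, `n`, and the function names are illustrative.

```python
def cfa(x, y, carry_in, r):
    """Complete full adder: two radix-r digits plus a one-bit carry in."""
    carry_out, s = divmod(x + y + carry_in, r)
    assert carry_out in (0, 1)        # the sum is at most (r-1) + (r-1) + 1 = 2r - 1
    return carry_out, s


def modulo_add(a, b, r, n):
    """Sum of two residues modulo r**n, computed digit by digit."""
    a_d = [(a // r**i) % r for i in range(n)]
    b_d = [(b // r**i) % r for i in range(n)]
    carry, total = 0, 0
    for i in range(n):
        carry, s = cfa(a_d[i], b_d[i], carry, r)
        total += s * r**i
    return total                      # the carry out of the last position is discarded


# Exhaustive agreement with ordinary modular addition for modulus 5**3 = 125.
assert all(modulo_add(a, b, 5, 3) == (a + b) % 125 for a in range(125) for b in range(125))
```

Discarding the final carry is what allows the most significant digit position to use a reduced adder that never generates one.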
Let CFA$'$ be a CFA which does not generate a carry and is placed in the most significant column, due to the carry-ignore property. A radix-$r$ architecture for the implementation of (34) comprises $n-1$ radix-$r$ CFAs and a reduced one, which does not generate a carry digit, as shown in Fig. 4. Therefore, the area complexity of a two-operand radix-$r$ modulo-$r^n$ adder is that of $n-1$ CFAs plus one CFA$'$, while the maximum delay is set by the carry path through the $n$ digit positions.
V. MERGED RADIX-$r$ MULTIPLICATION-ADDITION
A. The Radix-3 Case
Let , , and denote residues modulo . A radix-3 merged multiplier-adder, which performs the operation (36) can be built using three types of digit adders, namely FA, , and , as shown in Fig. 5 . The required number of FAs in the th column is given by FA
where , as, in addition to the preprocessor digits, a digit from the addend is also considered. The number of and digit adders per column are given as The preprocessor complexity is identical to the complexity of a multiplier preprocessor, i.e.,
The total number of FAs is given as
The total number of digit adders is given as
The complete area complexity of a radix-3 merged multiplieradder is given as (44)
B. General Radix-$r$ Merged Multiplier-Adder
The general radix-$r$ modulo-$r^n$ Merged Multiplier-Adder (MMA) can be derived from the corresponding multiplier architecture by introducing an additional CFA at each column. The organization of the radix-$r$ MMA is shown in Fig. 6. The complexity of the MMA can be computed in a manner similar to the HRM. In fact, the numbers of FAs and of the four other digit adder types required for the MMA are identical to those required for the HRM. However, the introduction of one CFA per column imposes the additional area complexity (45) of $n-1$ CFAs and a reduced CFA at the $(n-1)$th column. The time complexity is computed in a manner similar to the HRM case and is given by (46), where the two path delays are evaluated from (31) and (32).
VI. RADIX-$r$ BINARY-TO-RESIDUE AND RESIDUE-TO-BINARY CONVERSION
In this section, efficient architectures for modulo-$r^n$ binary-to-radix-$r$-residue and radix-$r$-residue-to-binary conversions are introduced.
A. High-Radix Binary-to-Residue Modulo-$r^n$ Converters
An architecture for a high-radix binary-to-residue (HRBR) modulo-$r^n$ converter is proposed. The HRBR architecture directly converts a binary integer to a radix-$r$ residue. Consider a $k$-bit integer $X$ written in binary form as
$$X = \sum_{i=0}^{k-1} x_i 2^i \tag{47}$$
where $x_i \in \{0, 1\}$. The computation of the residues modulo $r^n$ of both sides of (47) allows the evaluation of the residue
$$\langle X \rangle_{r^n} = \left\langle \sum_{i=0}^{k-1} x_i 2^i \right\rangle_{r^n} \tag{48}$$
$$= \left\langle \sum_{i=0}^{k-1} x_i w_i \right\rangle_{r^n} \tag{49}$$
where $w_i = \langle 2^i \rangle_{r^n}$ with $0 \le w_i < r^n$. The modulo operation in (49) does not require any computation, due to the carry-ignore property. Hence, the decomposition recursions used in [14] are avoided. In particular, the carry digits which could be produced at the $(n-1)$th column should be ignored and, therefore, these carries are not generated; hence the corresponding hardware is eliminated.
The computation of the radix-residue is performed by an array of special-purpose cells, called simplified digit adders (SAs). The HRBR converter can be conceived as a generalization of the binary-to-residue modulo-converters, which decompose the binary input into residues and, subsequently, conditionally add them, depending on the value of In the proposed HRBR converter, each residue , , is expressed in radix-format , the high-radix digits of which, are added by a suitable radix-SA array. The SAs do not perform general radix-addition. Instead, SAs placed at the th column add constant values to their input, when the corresponding input bit is set, , for any for which . Therefore, SAs can be used in implementing (49), which implies that either or 0 is added to the th radix-digit column, depending on being 1 or 0, respectively. A three-character label is assigned to each SA to reveal its functionality as described in the following. An SA labeled has three input ports, namely , , and , as shown in Fig. 7(a) . The value assigned to port can be any radix-digit, while ports and receive bits, which, when asserted, denote that the constants and/or respectively, should be added to the result. The constants and correspond to the values of (49). If the bits are assigned to the ports and , respectively, the SA of type computes the radix-two-digit sum 
where the remaining operand is the radix-$r$ digit computed by the SA placed above. An SA placed at the top of a column has three bit inputs, as shown in Fig. 7(b), and it returns (53), where it is assumed that three input bits are assigned to its ports. It should be noted that, when the two types of SAs are used as building blocks of an array, the input bits are either bits of the binary input or carry bits from the less significant column. A carry bit can be assigned to any input port which corresponds to an addend of value one. The definition of an HRBR converter architecture is illustrated by means of an example. Assume a modulo-125 8-bit converter. In this case, $r = 5$, $n = 3$, and the input word length is 8 bits. The corresponding values used in (49) are shown in Table V, while the converter architecture and a possible layout are shown in Fig. 8. The area and time performance of an 8-bit and a 12-bit modulo-125 HRBR converter are compared to previously reported conversion schemes [7], [14] in Table VI. ROM complexity is quantified by the model used in [5]. These examples demonstrate that the proposed conversion of a binary integer into a radix-$r$ residue can be less complicated than the conventional radix-2 residue conversion. Therefore, the performance improvement achieved by the high-radix architectures presented in previous sections is not compromised by binary-to-radix-$r$-residue conversion overhead. In fact, further improvement becomes possible by means of the HRBR converter.
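The conversion principle of the modulo-125 example can be modelled in Python as follows. This is a functional sketch only: it does not reproduce the SA cells or their port labels, and the names `hrbr_model`, `k`, `r`, and `n` are assumptions. For each asserted input bit, the radix-5 digits of the constant residue of the corresponding power of two are added to the digit columns, and the carry out of the most significant column is dropped.

```python
def hrbr_model(x, k, r, n):
    """Residue of a k-bit integer modulo r**n, as n radix-r digits (least significant first)."""
    columns = [0] * n
    for i in range(k):
        if (x >> i) & 1:
            w = pow(2, i, r**n)                  # constant residue of 2**i modulo r**n
            for j in range(n):
                columns[j] += (w // r**j) % r    # add its j-th radix-r digit to column j
    digits, carry = [], 0
    for j in range(n):
        carry, d = divmod(columns[j] + carry, r)
        digits.append(d)
    return digits                                # the carry out of column n-1 is ignored


def digits_to_int(digits, r):
    """Value of a radix-r digit list given least significant digit first."""
    return sum(d * r**j for j, d in enumerate(digits))


# Modulo-125 converter with an 8-bit input (r = 5, n = 3).
assert all(digits_to_int(hrbr_model(x, 8, 5, 3), 5) == x % 125 for x in range(256))
```

The same model with `k = 12` covers the 12-bit case; only the set of constants grows, not the number of digit columns.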
B. High-Radix Residue-to-Binary Converters
Chinese Remainder Theorem (CRT)-based techniques for residue-to-binary conversion are discussed. The total number of bits required to represent a high-radix residue modulo $r^n$ is given by
$$n \lceil \log_2 r \rceil \tag{54}$$
as there are $n$ radix-$r$ digits, each of binary word length $\lceil \log_2 r \rceil$. However, the residue modulo $r^n$, expressed in radix-2 format, has a word length of
$$\lceil \log_2 r^n \rceil = \lceil n \log_2 r \rceil. \tag{55}$$
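The two word lengths are easy to compare numerically. The following Python sketch uses exact integer arithmetic instead of floating-point logarithms; the function names and the symbols `r` and `n` are assumptions made for illustration.

```python
def high_radix_bits(r, n):
    """Bits for n radix-r digits, each stored in ceil(log2 r) bits."""
    return n * (r - 1).bit_length()


def radix2_bits(r, n):
    """Bits for a residue modulo r**n stored as a plain binary number."""
    return (r**n - 1).bit_length()


assert high_radix_bits(5, 3) == 9 and radix2_bits(5, 3) == 7   # modulus 125
assert high_radix_bits(3, 2) == 4 and radix2_bits(3, 2) == 4   # modulus 9: equal lengths
assert high_radix_bits(3, 3) == 6 and radix2_bits(3, 3) == 5   # modulus 27
```

Modulus 9 is an instance where both representations need the same number of bits, whereas modulus 125 needs two more bits in the digit-wise encoding.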
When the word lengths (54) and (55) are equal, the high-radix representation imposes exactly the same complexity as radix-2 residue decoding, assuming that the CRT-based residue decoding architecture introduced by Elleithy et al. [11] is employed, which consists of a partial sum generator, a partial sum adder, a range determinator, and a final converter. To accommodate a radix-$r$ residue, only the partial sum generator is altered. In particular, the residue bit partitioning, used to reduce the number of partial sums and maintain moderate lookup table size, can also be applied in radix-$r$ arithmetic, as shown below. Assume that the $i$th residue of an RNS is
where are the radix-digits, each of which is represented in binary form as (57) where and . From (56) and (57), it is obtained that (58) Following [11] and by taking into account the radix-representation (58) of , it follows that (59) (60) (61) where , , and denotes the modulomultiplicative inverse of . Notice that the equality of (60) to (59) is due to the particular relation of the modulus with the product terms. Equation (61) reveals that the partial sum generator proposed in [11] is also applicable in radix-arithmetic.
In the case , to prevent the increase in lookup table size, an 1-bit radix-2 full adder-based architecture can be employed to evaluate the partial sum . In particular, can be computed by decomposing the residue into bits , each of weight , since it is obtained from (58) that (62) Subsequently, 1-bit full adder-based structures can be derived [6] , [15] , the efficiency of which depends on the choice of the RNS base, since the base dictates the actual values of and . As an example, consider the base . It follows that , , , and , while the multiplicative inverses are , , and . Notice that for it is . The partial sum in (60), which corresponds to is . The weights of the bits are shown in Table VII for . By assigning each to the digital columns dictated by the corresponding weight, a radix-2 full adder-based architecture for the evaluation of is obtained. For example, as has a weight of , it should be added to all columns, which correspond to a "1" in the binary expression of the weight [15] , i.e., to the th significant columns, where . In the particular example, an architecture with an area complexity of and a delay of can be obtained. For comparison, a ROM-based architecture for the evaluation of requires a lookup table of size bits, which has a complexity of at a delay of . Although , in the particular example the proposed full-adder-based generator is better in an area time sense than the lookup table-based generator. Therefore, even when , the radix-residue decoding can be performed efficiently.
VII. PERFORMANCE COMPARISONS
The performance of the radix-$r$ modulo-$r^n$ HRM is compared to that of previously reported architectures [5], [6], [16], as well as to the performance of a straightforward ROM lookup table-based architecture. Comparison to [6] is performed by deriving an architecture as described in [6] for each modulus and evaluating its performance. Two models of ROM complexity are employed in the comparisons, one based on experimental performance data of ROM cells generated by a commercial tool [17] and the ROM complexity model utilized by DiClaudio et al. [5]. The architectures are compared in terms of area, time, and area-time complexities, as shown in Tables VIII-X. By using the digit adder complexities shown in Table III, and the area and time complexities (31) and (33), area and time complexity bounds of the proposed radix-$r$ HRM for $r = 3$ and $r = 5$ are obtained. Tables VIII-X show that radix-$r$ HRMs reduce the area, time, and area-time complexity by a factor larger than one in all cases but one and, for some moduli, complexities are reduced several times. The reduction factor is the ratio of the complexity of an architecture over the corresponding complexity of the proposed architecture. In these tables, the experimental ROM performance model is assumed.
The relation between the radix-word length , i.e., the number of the required radix-digits, of a residue moduloand a corresponding radix-2 word length of a residue modulo is
Equation (63) is utilized in the area, time, area-time, and area-time$^2$ complexity comparisons of the HRM modulo $r^n$ to the published architectures [5], [16] and to a direct ROM lookup-table implementation, which are shown in Figs. 9 and 10. ROM performance in Figs. 9 and 10 is obtained by assuming the complexity model utilized in [5]. Fig. 9 illustrates that radix-5 HRMs are better in the area, area-time, and area-time$^2$ sense, while they are slower in some cases. Fig. 10 shows that radix-3 HRMs are slower, but they are of the smallest area, area-time, and area-time$^2$ complexity. The radix-3 digit adder complexities of Table III are used for the area-time$^2$ complexity diagram in Fig. 10, while the remaining plots in Fig. 10 assume digit adders optimized for low area complexity. The complexity measures of the proposed HRMs utilized in the comparisons are upper bounds of the actually achieved performance, since the HRMs can be further optimized by modern logic synthesis and CAD optimization tools. For example, by optimizing the modulo-125 radix-5 HRM architecture with Autologic II, an area-time complexity of 13 650 is obtained, which is less than the value obtained from Table X. The area-time complexity of optimized radix-5 modulo adders is summarized in Table XI, where it is shown that they are comparable to offset residue adders [18]. Therefore, the radix-$r$ approach provides an area-time efficient solution for a residue channel, which comprises conversions, multiplication, and addition.
VIII. CONCLUSION
Combinatorial arithmetic units based on radix-$r$ arithmetic to implement modulo-$r^n$ operations, which include multiplication, addition, and merged multiplication-addition, have been introduced. Furthermore, efficient radix-$r$-residue-to-binary and binary-to-radix-$r$-residue techniques have been described. A comparative performance evaluation has illustrated that area-time complexity can be reduced several times for moduli of the form $r^n$. The demonstrated area-time efficiency stems from the exploitation of the carry-ignore property, which is inherent in modulo-$r^n$ operations, and from the adoption of low-complexity digit adders. The complexity reduction of the latter is due to the exploitation of the incomplete set of values which can be assumed by interim digits in the architectures.
The actual maximum values are identified and are expressed as a function of the radix $r$, in case they are not constants. The maximum digit values do not depend on the adopted modulus of operation; hence, they could be exploited in RNS architectures independently of the form of the modulus. However, the performance of such architectures is not dealt with in this paper. Finally, to facilitate the evaluation of the proposed arithmetic units when exploring the area-time design space of an RNS-based VLSI DSP system, complexity bounds have been offered. The interest in accelerating computations and reducing the corresponding area requirements has recently received a new dimension: the reduction of power dissipation. To a first approximation, and assuming a constant activity ratio, area reduction reflects switching capacitance reduction, while acceleration can be exploited to reduce the supply voltage, potentially leading to power dissipation savings [19]. Therefore, the RNS concept and the proposed architectures are worth investigating as a means of reducing power dissipation in signal processing. The proposed radix-$r$ arithmetic units may prove an interesting alternative for adoption in RNS DSPs.
APPENDIX A PROOF OF THEOREM 3
Initially, it is noted that the claim (25) holds for , i.e., , as no one-bit carries are generated at columns for which , while a single carry of maximum value two is generated at column . Subsequently, assume that (25) holds for a particular , i.e.,
It is now proven that if (64) is true, then (25) holds when , i.e., that is true. By applying (24) for , it follows that (65) Two cases may be distinguished, depending on being an even or an odd integer. Assume that is even, i.e., for an integer . In this case, the two terms of the sum in (65) can be written as Therefore, from (68) and (71), it follows that for any value, even or odd, of can be expressed as
Therefore the claim (25) is true.
APPENDIX B PROOFS OF THE COMPLEXITY BOUNDS
The proofs of the complexity measures in Table IV 
