Repeated modular additions and overflow detection are possible in redundant hybrid number systems (RHNS). In this paper a circuit is proposed that implements the overflow-detecting procedure in such systems and allows a mean addition time of about 10.5 gate delays for numbers having a magnitude order normally distributed in the range $[-2^{33}, 2^{33} - 1]$, versus the 14 gate delays required by 32-bit CLA adders.
INTRODUCTION
Over the last two decades, considerable attention has been paid to residue number systems (RNSs) because of their fast, parallel arithmetic capabilities [1, 2, 3, 4, 5]. However, the major disadvantage of residue arithmetic, i.e. the absence of explicit information on number magnitude, remains unsolved, and lengthy intermodular operations are required when dealing with sign or overflow detection and magnitude comparison. Many solutions have been proposed to speed up intermodular operations because of a renewed interest in fast parallel arithmetic due to VLSI [6, 7, 8, 9, 10, 11]. Alternatively, in an attempt to devise number systems retaining both the modular properties of RNSs and the explicit knowledge of magnitude offered by weighted representations, several authors have tried to add a 'magnitude index' (MI) to the residue representation of numbers [12, 13, 14, 15]. This approach is reconsidered in this paper, and an RNS with an MI will be referred to as a hybrid number system (HNS), according to Barsi and Pinotti [15]. An adding, overflow-detecting procedure, which strongly reduces the need for intermodular operations, is reconsidered and evaluated by simulations in terms of the mean rate of intermodular operations, using an HNS with redundancy (RHNS). This procedure could be implemented fruitfully by a device which cooperates with a CPU in applications where a massive quantity of data has to be additively accumulated. The mean addition time in such a solution has been derived for a specific-size RHNS, showing its actual effectiveness.
In particular, a circuit was devised that features a mean response time of 10-10.5 gate delays/addition, for iterative additions in a range larger than the range $[-2^{33}, 2^{33} - 1]$, versus 14 gate delays for traditional 32-bit-long carry look-ahead adders, when integers to be added have magnitude order values normally distributed with the maximum value of probability selected in the range from $\approx 3 \times 10^{-2}$ to $1/128 \approx 8 \times 10^{-3}$ (uniform distribution).
HYBRID NUMBER SYSTEMS
The choice of a number system is essential to determine the complexity of arithmetic procedures. In weighted systems, the presence of carries limits the execution speed, while magnitude comparison and overflow detection can be considered as a by-product. On the contrary, residue addition and multiplication are very fast because of the independence of residue digits, but comparison and overflow detection require the knowledge of the magnitude of results and time-consuming intermodular operations are necessary. The hybrid notation partially retains the modular properties of RNSs and reduces the need for intermodular operations drastically. In the following, the essentials of hybrid representation are briefly recalled. From the fundamental remainder theorem [16], any signed integer $X$ can be expressed as
$$X = \lfloor X/\mu \rfloor \, \mu + |X|_\mu \qquad (1)$$
where $\mu$ is a positive integer, $|X|_\mu$ represents the least nonnegative remainder of dividing $X$ by $\mu$ and $\lfloor X/\mu \rfloor$ is the greatest integer not exceeding $X/\mu$. A hybrid number system (HNS) is defined representing $X$ as
$$X \equiv \{R_X, W_X\} \qquad (2)$$
with
$$R_X = |X|_\mu, \qquad W_X = \lfloor X/\mu \rfloor \qquad (3)$$
and assuming that:
• $W_X$, which identifies the interval $[W_X \mu, (W_X + 1)\mu)$ containing $X$, is represented in a weighted notation; we assume that $W_X$ is in the range $[-P, P)$, where $P$ is a positive integer.
• $R_X$, which specifies $X$ in this interval, that is $0 \le R_X \le \mu - 1$, is represented in a residue system of range $\mu$.
• Consequently $X$ is in the range $[-P\mu, P\mu)$.
46 G. ALIA AND E. MARTINELLI
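The hybrid representation can be illustrated with a minimal Python sketch. The function names `hns_encode` and `hns_decode` are hypothetical, the values µ = 210 and P = 7 are borrowed from the paper's later examples, and the residue part is kept as a plain integer rather than as separate residue digits, purely for readability.

```python
def hns_encode(x: int, mu: int, p: int) -> tuple[int, int]:
    """Return (R_X, W_X) such that X = W_X * mu + R_X with 0 <= R_X < mu."""
    if not -p * mu <= x < p * mu:
        raise OverflowError("X lies outside [-P*mu, P*mu)")
    # divmod floors the quotient, so the remainder is the least
    # nonnegative one even when x is negative.
    w_x, r_x = divmod(x, mu)
    return r_x, w_x


def hns_decode(r_x: int, w_x: int, mu: int) -> int:
    """Rebuild X from its residue part and its weighted part."""
    return w_x * mu + r_x


if __name__ == "__main__":
    MU, P = 210, 7  # illustrative values taken from the paper's examples
    print(hns_encode(-939, MU, P))
    print(hns_decode(*hns_encode(-939, MU, P), MU))
```

Since the weighted part is the floor of X/µ, negative integers still get a nonnegative residue part, which is what the condition on $R_X$ requires.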
ADDITION AND OVERFLOW DETECTION IN RHNSs
In this section some properties of addition and overflow detection in RHNSs are recalled from [14]. Let
$$X_1 \equiv \{R_{X_1}, W_{X_1}\}, \ldots, X_t \equiv \{R_{X_t}, W_{X_t}\}$$
be $t$ integers represented in an HNS. Their sum:
$$S = \sum_{i=1}^{t} X_i \qquad (4)$$
has the representation $S \equiv \{R_S, W_S\}$, where, from (3):
$$R_S = \Big| \sum_{i=1}^{t} R_{X_i} \Big|_\mu, \qquad W_S = \sum_{i=1}^{t} W_{X_i} + \delta_S, \qquad \delta_S = \Big\lfloor \frac{1}{\mu} \sum_{i=1}^{t} R_{X_i} \Big\rfloor \qquad (5)$$
that is, the residue part of the representation of $S$ is the sum modulo $\mu$ of the residue parts of the operands, whereas the weighted part may turn out greater than the sum of the weighted parts of the operands. It is possible to avoid the computation of $\delta_S$ by providing the hybrid number system with an appropriate redundancy: the residue range $\mu$ is extended by adding a set of moduli whose product $m_R$ is such that the range $[0, \mu m_R)$ contains the sum of the residue parts of $t$ operands. For an actual extension of the residue range, $\mu$ and $m_R$ are chosen relatively prime. The representation of an integer $X$ in such an HNS, called a redundant hybrid number system (RHNS), will be of the kind $X \equiv \{R_X, W_X\}$, where
$$X = W_X \mu + R_X, \qquad 0 \le R_X < \mu m_R \qquad (6)$$
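with $R_X$ and $W_X$ expressed in a residue and in a weighted system, respectively. However, because of such an extension, an RHNS does not guarantee the uniqueness of the representation, that is, several pairs $R_X$ and $W_X$ satisfy the definition. The RHNS representation of $X$ for which $0 \le R_X < \mu$ coincides with the HNS representation of $X$, that is $R_X = |X|_\mu$ and $W_X = \lfloor X/\mu \rfloor$, and it will be referred to as the normalized representation of $X$. Now, consider the addition of the normalized representations of $t$ integers
$$X_1, \ldots, X_t$$
, available in succession; for the RHNS representation $\{R_S, W_S\}$ of their sum, with $\delta_S = \lfloor R_S/\mu \rfloor$, the following relations hold:
$$0 \le \delta_S \le t - 1 \qquad (7)$$
$$R_S = \sum_{i=1}^{t} R_{X_i}, \qquad W_S = \sum_{i=1}^{t} W_{X_i} \qquad (8)$$
$$t \le t^* = m_R \qquad (9)$$
where $t^*$ represents an upper bound to the number of normalized integers which can be added without overflow occurrences from the extended residue range, provided that the weighted range is $[-tP, t(P - 1)]$. Indeed, each normalized residue part is at most $\mu - 1$, so that $R_S \le t(\mu - 1) < \mu m_R$ whenever $t \le m_R$.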
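The two bounds discussed above (the carry from the residue sum into the weighted part is at most t − 1, and at most $m_R$ normalized operands fit in the extended residue range) can be checked on the worst case, in which every residue part equals µ − 1. The following Python sketch is only illustrative: the helper names are hypothetical, and µ = 210, $m_R$ = 11 are taken from the paper's examples.

```python
def worst_case_residue_sum(mu: int, t: int) -> tuple[int, int]:
    """Largest possible sum of t normalized residue parts and its carry.

    Every normalized residue part is at most mu - 1, so the worst case is
    t * (mu - 1); the carry is the number of whole multiples of mu in it.
    """
    total = t * (mu - 1)
    return total, total // mu


def fits_extended_range(mu: int, m_r: int, t: int) -> bool:
    """True if t normalized residue parts can never leave [0, mu * m_r)."""
    total, _ = worst_case_residue_sum(mu, t)
    return total < mu * m_r


if __name__ == "__main__":
    MU, M_R = 210, 11  # values from the paper's examples
    for t in (2, M_R, M_R + 1):
        total, carry = worst_case_residue_sum(MU, t)
        print(t, total, carry, fits_extended_range(MU, M_R, t))
```

With these values the extended range 2310 accommodates 11 normalized operands but not 12, which agrees with the value of $C_{max}$ used in the examples.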
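Checking $S$ for overflow from $[-P\mu, P\mu)$ can be performed by reconstructing its normalized weighted part $\lfloor S/\mu \rfloor$, which is related to the RHNS weighted part $W_S$ through Equations (6) and (8):
$$\lfloor S/\mu \rfloor = W_S + \delta_S, \qquad \delta_S = \lfloor R_S/\mu \rfloor$$
and an overflow will be detected if and only if:
$$W_S + \delta_S < -P \quad \text{or} \quad W_S + \delta_S > P - 1$$
that is:
$$W_S < -P - \delta_S \quad \text{or} \quad W_S > P - 1 - \delta_S \qquad (10)$$
Since, unfortunately, computing $\delta_S$ would require intermodular operations, conditions (10) can be restated only as sufficient conditions, by means of inequality (7), which bounds $\delta_S$ between $0$ and $t - 1$:
$$W_S < -P - t + 1 \quad \text{or} \quad W_S > P - 1 \qquad (11)$$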
From conditions (11), it follows that testing for overflow requires the knowledge of the number of normalized representations which have been globally added. Such information is kept in a control part $C_X$ that is added to the RHNS representation, i.e. $X \equiv \{R_X, W_X, C_X\}$.
It will be assumed that:
• any normalized representation has a control part $C_X = 1$;
• the control part $C_S$ of the sum is obtained by adding the control parts of the operands;
• all operands have normalized representations or they result as a sum of normalized representations.
Conditions (11) will take the form
$$W_S < -P - C_S + 1 \quad \text{or} \quad W_S > P - 1 \qquad (12)$$
From (9), $C_S$ cannot exceed the quantity
$$C_{max} = m_R \qquad (13)$$
and, consequently, $[1, C_{max}]$ is the representation range for $C_S$. If
$$m_R < \mu \qquad (14)$$
that is, if the redundancy does not exceed the duplication of the residue range, the following adding-overflow-detecting procedure can be adopted. It is worth noting that introducing the control part $C_X$ allows extension of these results to any sequence of data, even if not all represented in the normalized notation.
THE COMPUTER JOURNAL, Vol. 41, No. 1, 1998
Adding procedure (mostly parallel)
1. Add residue, weighted and control parts of operands to obtain $R_S$, $W_S$, $C_S$;
2. if condition (12) is satisfied an overflow is detected, else
3. if $C_S \le C_{max}$ then $S \equiv \{R_S, W_S, C_S\}$ is the RHNS representation of the sum $S$, else $R_S$ may exceed the extended residue range and a preliminary normalization of operands is required before repeating the procedure.
Normalizing procedure (intermodular) 
EXAMPLES. Consider an HNS with $\mu = 210$, $P = 7$ and then $P\mu = 1470$. The HNS will be extended to an RHNS assuming that one modulus $m_R = 11$ ($\mu m_R = 2310$) is added to the residue system and the weighted range is increased accordingly. Assume that a number of additions have been performed, yielding the result $X$, and a further number $Y$ is currently added. For simplicity, $R_X$, $R_Y$, $R_S$ are provided in a positional representation; actually, digits of the residue representations of $R_X$ and $R_Y$ are separately added to yield digits of the residue representation of $R_S$.
• Let $X \equiv \{1320, -6, 7\}$ and $Y \equiv \{23, 2, 2\}$ be two RHNS representations, as derived from the addition of integers in normalized representation. The procedure yields:
1. $R_S = 1343$; $W_S = -4$; $C_S = 9$;
2. condition (12): $W_S < -P - C_S + 1 = -15$ or $W_S > P - 1 = 6$ is not satisfied and no overflow is detected;
3. $C_S \le C_{max} = 11$ and $\{R_S, W_S, C_S\} = \{1343, -4, 9\}$ is recognized as the RHNS representation of the result.
• In the same system, consider $X \equiv \{2221, -7, 10\}$ and $Y \equiv \{1821, -3, 9\}$:
1. $R_S = |2221 + 1821|_{2310} = 1732$; $W_S = -10$; $C_S = 19$;
2. condition (12): $W_S < -P - C_S + 1 = -25$ or $W_S > P - 1 = 6$ is not satisfied and no overflow is detected;
3. $C_S > C_{max} = 11$; a normalization of operands is required, resulting in
4. $\{2221, -7, 10\} \equiv \{121, 3, 1\}$ and $\{1821, -3, 9\} \equiv \{141, 5, 1\}$;
5. the whole procedure must be repeated: $R_S = 262$; $W_S = 8$; $C_S = 2$;
6. condition (12): $W_S < -P - C_S + 1 = -8$ or $W_S > P - 1 = 6$ is satisfied and an overflow is detected.
• Finally, consider integers $X \equiv \{2001, -14, 10\}$ and $Y \equiv \{12, -6, 1\}$:
1. $R_S = 2013$; $W_S = -20$; $C_S = 11$;
2. condition (12): $W_S < -P - C_S + 1 = -17$ or $W_S > P - 1 = 6$ is satisfied and an overflow is detected.
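The three examples can be replayed with a short Python sketch of the adding procedure. This is a reading of the procedure rather than the authors' implementation: the names `Rhns`, `normalize` and `add` are hypothetical, the residue part is held as a plain integer instead of separate residue digits, and the normalization step (whose description is not reproduced above) is inferred from the worked examples, where the multiples of µ contained in the residue part are moved into the weighted part and the control part is reset to 1.

```python
from dataclasses import dataclass

MU, P, M_R = 210, 7, 11   # system used in the examples
C_MAX = M_R               # C_max = 11, as stated in the examples


@dataclass(frozen=True)
class Rhns:
    r: int  # residue part (positional form, for simplicity)
    w: int  # weighted part
    c: int  # control part: number of normalized operands accumulated


def normalize(x: Rhns) -> Rhns:
    """Assumed normalization: move whole multiples of MU into the weighted part."""
    q, r = divmod(x.r, MU)
    return Rhns(r, x.w + q, 1)


def overflow_detected(s: Rhns) -> bool:
    """Sufficient overflow test, condition (12)."""
    return s.w < -P - s.c + 1 or s.w > P - 1


def raw_add(x: Rhns, y: Rhns) -> Rhns:
    """Step 1: add residue (mod MU * M_R), weighted and control parts."""
    return Rhns((x.r + y.r) % (MU * M_R), x.w + y.w, x.c + y.c)


def add(x: Rhns, y: Rhns):
    """Return (sum or None, status) following steps 1-3 of the procedure."""
    s = raw_add(x, y)
    if overflow_detected(s):
        return None, "overflow"
    if s.c <= C_MAX:
        return s, "ok"
    # C_S > C_max: normalize both operands and repeat the procedure.
    s = raw_add(normalize(x), normalize(y))
    if overflow_detected(s):
        return None, "overflow after normalization"
    return s, "ok after normalization"


if __name__ == "__main__":
    print(add(Rhns(1320, -6, 7), Rhns(23, 2, 2)))
    print(add(Rhns(2221, -7, 10), Rhns(1821, -3, 9)))
    print(add(Rhns(2001, -14, 10), Rhns(12, -6, 1)))
```

After a normalization both control parts equal 1, so the repeated addition always yields $C_S = 2 \le C_{max}$ and no second normalization is needed.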
INFLUENCE OF REDUNDANCY ON THE NORMALIZATION RATE
The proposed approach tries to avoid intermodular operations, exploiting the redundancy of RHNSs. Intermodular operations must be executed during the normalization phase.
To evaluate the influence of the choice of $m_R$ on the normalization rate, a computational model for generating sequences of additions must be considered. The model assumes that sequences of normalized representations of integers in the range $[-P\mu, P\mu)$ with a given probability distribution of occurrence are added. Each sequence ends when an overflow is detected or a normalization phase is necessary. Consequently, all the sequences which end with a normalization request have the same length $C_{max} + 1$. The probability $p_N$ of a normalization request when performing an addition has been computed for several probability distributions, redundancy values and weighted ranges. Values $m_R = 2^i$, $i = 1, \ldots, 7$ have been chosen for redundancy, and values $2^j$, $j = 1, \ldots, 6$ for $2P$ (the weighted range). The reason for considering the value $2P$ is that overflow detection is related to this value rather than to the range of representation of integers or to the residue range, when $m_R < \mu$ is assumed, as stated by inequality (14). Consequently, for computational purposes, it is sufficient to process the weighted parts of numbers only. Choosing a probability distribution is both influential and difficult because application requirements vary widely. In this case, Gauss-like distributions for the weighted parts of operands were chosen, assuming that positive and negative numbers have the same probability of occurrence and that the greater $|W|$, the lower the probability of $W$. In fact, the representation range is generally dimensioned so that overflow probability is low.
The distribution is:
with the following conditions: $-P \le W \le P - 1$, so that the mean is $\bar{W} = -0.5$, and $2P \le 64$; this bound is due to the need to limit the number of simulation experiments. Constant $A$ was chosen so as to normalize the distribution, while constant $B$ was given values in the set $\{0, 15, 30, 45\}$, to produce a wide set of shapes, as shown in Figure 1 for $2P = 64$. The behaviour of $p_N$ is shown in Figure 2a-d for the same values of constant $B$ and of the weighted range $2P$ as in Figure 1.
It is immediately seen that the number of normalizations reduces to a negligible rate as soon as $m_R$ takes non-trivial values. Moreover:
• The probability $p_N$ equals $1/m_R$ for $2P = 2$, whatever distribution shape is used. This occurs as the weighted part of operands can take only values $-1$ and $0$, and neither positive nor negative overflow can be detected before the $m_R$-th addition is performed.
• $p_N$ decreases as $2P$ increases. This happens since the greater the value of $2P$, the more likely overflow detection is.
• $p_N$ decreases for increasing values of $m_R$ and the value $1/m_R$ represents a maximum for $p_N$. In fact, at least $m_R$ additions can be performed without normalization requests.
• As parameter $B$ increases, $p_N$ tends to its maximum $1/m_R$, for any value $2P$. In fact, as $B$ increases, the probability of weighted values near zero increases and, correspondingly, the probability of overflow detection decreases.
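The computational model can be sketched as a small Monte Carlo experiment in Python. Several choices here are assumptions rather than the authors' setup: the function name `normalization_rate` is hypothetical, weighted parts are drawn uniformly from [−P, P − 1] instead of from the Gauss-like distribution, $C_{max}$ is taken equal to $m_R$, and $p_N$ is counted as normalization requests per addition.

```python
import random


def normalization_rate(p: int, m_r: int, additions: int = 80_000, seed: int = 0) -> float:
    """Estimate p_N, the fraction of additions that end in a normalization request.

    Only weighted parts are simulated, since condition (12) depends on
    W_S, C_S and P alone.
    """
    rng = random.Random(seed)
    done = 0       # additions performed so far
    requests = 0   # normalization requests observed
    while done < additions:
        # A new sequence starts from one normalized operand (C = 1).
        w_s, c_s = rng.randrange(-p, p), 1
        while done < additions:
            w_s += rng.randrange(-p, p)   # add the next normalized operand
            c_s += 1
            done += 1
            if w_s < -p - c_s + 1 or w_s > p - 1:
                break                     # overflow detected: sequence ends
            if c_s > m_r:
                requests += 1             # C_S > C_max: normalization request
                break
    return requests / done


if __name__ == "__main__":
    for two_p in (2, 8, 64):
        print(two_p, normalization_rate(two_p // 2, m_r=8))
```

For 2P = 2 the weighted parts are −1 or 0, condition (12) can never hold, and every sequence ends with a normalization request after exactly $m_R$ additions, so the estimate equals $1/m_R$, as in the first observation above.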
THE LOGIC DESIGN
In order to perform a quantitative evaluation in an example of specific size, let us consider an RHNS with the weighted part in $[-64, +63]$, the residue part expressed by means of four residue digits, based on moduli $m_1 = 113$, $m_2 = 121$, … Consequently, any sum in the range $[-P\mu, P\mu)$ requires a 7-bit adder delay, that is a total of 10 gate delays, versus the 14 gate delays required by 32-bit carry look-ahead adders.
(We assume an addition time of $6 + 4(h - 1)$ gate delays for operands of $4^h$ bits, exploiting carry look-ahead techniques [17]; this gives 10 gate delays for $h = 2$ and 14 gate delays for $h = 3$.)
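The delay model can be turned into a small Python helper. The sketch reads the operand width as $4^h$ bits and rounds intermediate widths up to the next power of four; both are assumptions, chosen because they reproduce the 10 and 14 gate delays quoted in the text for 7-bit and 32-bit adders. The function name `cla_delay` is hypothetical.

```python
def cla_delay(bits: int) -> int:
    """Gate delays of a carry look-ahead adder under the 6 + 4(h - 1) model.

    h is the smallest integer with 4**h >= bits, i.e. the number of
    look-ahead levels assumed for an operand of the given width.
    """
    if bits < 1:
        raise ValueError("operand width must be positive")
    h = 1
    while 4 ** h < bits:
        h += 1
    return 6 + 4 * (h - 1)


if __name__ == "__main__":
    for width in (4, 7, 16, 32, 35):
        print(width, cla_delay(width))
```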
On the other hand it may be necessary to normalize operands (such an operation can be performed in parallel on the two operands) at a rate depending on the value of parameter B.
From the shape of $p_N$ in Figures 2a-d, normalization rates for this case are likely to be as shown in Table 1, where the probability values for the mean magnitude order $\bar{W} = -0.5$ are also reported.
Referring to Figure 3, let us evaluate the cost of normalization in terms of gate delays.
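First, the 7-bit residue digits $r_i$ must be converted to obtain $R_X$, which is in $[0, \mu m_R)$, for instance following the Chinese Remainder Theorem method:
$$R_X = \Big| \sum_{i} \hat{m}_i \, \Big| \frac{r_i}{\hat{m}_i} \Big|_{m_i} \Big|_{\mu m_R}, \qquad \hat{m}_i = \frac{\mu m_R}{m_i}$$
where the sum runs over the four moduli of $\mu$ and the redundant modulus $m_R$. For the whole conversion from $r_i$ to the terms $\hat{m}_i |r_i/\hat{m}_i|_{m_i}$ of the summation, five 128 × 35 ROMs can be used, for which an access time comparable to 10 gate delays can be assumed.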
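The Chinese Remainder Theorem conversion can be sketched in Python on a toy system. The moduli (2, 3, 5, 7, 11) are an assumption: they factor the range µ = 210 of the paper's small example and append the redundant modulus $m_R$ = 11, and they are not the 7-bit moduli of the actual design. The function name `crt_reconstruct` is hypothetical; each loop iteration plays the role of one ROM lookup, and the final reduction stands in for the multioperand modular adder.

```python
from math import prod


def crt_reconstruct(residues, moduli):
    """Rebuild R_X in [0, prod(moduli)) from its residue digits.

    For each modulus m_i, m_hat = M / m_i and the term
    m_hat * |r_i / m_hat|_{m_i} is formed; the terms are summed modulo M.
    The moduli must be pairwise relatively prime.
    """
    big_m = prod(moduli)
    total = 0
    for r_i, m_i in zip(residues, moduli):
        m_hat = big_m // m_i
        inv = pow(m_hat, -1, m_i)            # multiplicative inverse of m_hat mod m_i
        total += m_hat * ((r_i * inv) % m_i)  # one summation term (one ROM lookup)
    return total % big_m


if __name__ == "__main__":
    MODULI = (2, 3, 5, 7, 11)   # toy system: mu = 210, m_R = 11
    digits = [1320 % m for m in MODULI]
    print(digits, crt_reconstruct(digits, MODULI))
```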
The summation of these five values can be carried out using a five-input, mod $\mu m_R$ multioperand modular adder (MOMA) as described in [18]. MOMA's response time $T_A$ depends on the number $p$ of $n$-bit operands; in this case $T_A = 36$ gate delays. We conclude that about 46 gate delays are necessary to obtain the value of $R_X$ from $r_i$. The increment of the magnitude index, which is less than $m_R$, must be evaluated by the expression $\lfloor R_X/\mu \rfloor$, which can be rewritten as:
$$\Big\lfloor \frac{R_X}{\mu} \Big\rfloor = \Big| \, \big| R_X - |R_X|_\mu \big|_{m_R} \cdot \Big| \frac{1}{\mu} \Big|_{m_R} \Big|_{m_R}$$
To evaluate $|R_X|_\mu$, a mod $\mu$ operation can be carried out as in [19]. Since $R_X$ is in a field of 35 bits and fast multipliers require about $2 \times (\log_2 n - 1)/(\log_2 3 - 1) + 2 + 4 \times 1/2 \times \log_2 n$ gate delays [17], about 87 gate delays are necessary to produce $|R_X|_\mu$.
Furthermore, a 35-bit subtraction (≈ 14 gate delays) and a mod $m_R$ operation (about 69 gate delays) must be performed to obtain $\big| R_X - |R_X|_\mu \big|_{m_R}$. A 128 × 7 ROM (10 gate delays) provides the value of $|\lfloor R_X/\mu \rfloor|_{m_R}$. A further 10 gate delays to increment the magnitude index $W_X$ would be accounted for, but such an operation is performed in parallel with the operation described below, so a total of about 87 + 14 + 69 + 10 = 180 gate delays must be taken into account to evaluate $\lfloor R_X/\mu \rfloor$ from $R_X$.
Finally, it remains only to evaluate the new value of $r_R$ using the relation
$$r_R \leftarrow \Big| \, r_R - \big| \mu \lfloor R_X/\mu \rfloor \big|_{m_R} \Big|_{m_R}$$
in fact, residue digits $r_i$, $i = 1, \ldots, 4$, remain unchanged, as a multiple of $\mu$ is subtracted from $R_X$. Quantity $r_R$ can be obtained by means of a 16-Kbyte ROM, with an access time comparable to 20 gate delays. Therefore, the normalization phase globally amounts to about 46 + 180 + 20 = 246 gate delays.
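The arithmetic behind the last two normalization steps can be checked with a Python sketch on the toy system µ = 210, $m_R$ = 11 from the paper's examples. Both helper names are hypothetical. The update of the redundant digit is written as the subtraction of a multiple of µ modulo $m_R$, which is an interpretation of the statement that a multiple of µ is subtracted from $R_X$; in the circuit these operations are realized by ROMs and modular adders rather than by integer arithmetic.

```python
MU, M_R = 210, 11              # toy values from the paper's examples
MU_INV = pow(MU, -1, M_R)      # multiplicative inverse of mu modulo m_R


def quotient_via_mod(r_x: int) -> int:
    """Compute floor(R_X / mu) using only mod-mu and mod-m_R operations.

    R_X - |R_X|_mu is an exact multiple of mu, and the quotient is below
    m_R, so the quotient coincides with its residue modulo m_R.
    """
    r_mu = r_x % MU                       # |R_X|_mu
    return ((r_x - r_mu) % M_R) * MU_INV % M_R


def updated_redundant_digit(r_red: int, q: int) -> int:
    """Assumed update of r_R after q * mu is removed from R_X."""
    return (r_red - q * MU) % M_R


if __name__ == "__main__":
    r_x = 2221
    q = quotient_via_mod(r_x)
    print(q, r_x % MU, updated_redundant_digit(r_x % M_R, q))
```

The residue digits for the moduli of µ need no update because the subtracted quantity is a multiple of each of them.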
Comparing this value with the normalization rates of Table 1 , it can be concluded that, in the worst simulated case B = 45, the mean addition time, including the normalization, is about 10.5 gate delays against 14 gate delays for traditional 32-bit-long CLA adders.
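The delay budget can be tallied with a few lines of Python. The cost model is an assumption: every addition is charged 10 gate delays and a normalization request, occurring with probability $p_N$, adds the full 246 gate delays. The value $p_N$ = 0.002 used below is purely illustrative and is not taken from Table 1; the function names are hypothetical.

```python
# Stage delays quoted in the text (gate delays).
CRT_CONVERSION = 10 + 36            # ROM lookup + multioperand modular adder
QUOTIENT = 87 + 14 + 69 + 10        # mod mu, subtraction, mod m_R, ROM
REDUNDANT_DIGIT_ROM = 20
NORMALIZATION = CRT_CONVERSION + QUOTIENT + REDUNDANT_DIGIT_ROM

RHNS_ADD = 10                       # 7-bit adder
CLA_32 = 14                         # 32-bit carry look-ahead adder


def mean_addition_time(p_n: float) -> float:
    """Assumed model: each addition costs RHNS_ADD, plus a normalization with probability p_n."""
    return RHNS_ADD + p_n * NORMALIZATION


def break_even_rate() -> float:
    """Normalization rate at which the mean time equals the 32-bit CLA delay."""
    return (CLA_32 - RHNS_ADD) / NORMALIZATION


if __name__ == "__main__":
    print(NORMALIZATION)
    print(mean_addition_time(0.002))   # hypothetical rate, for illustration only
    print(break_even_rate())
```

Under this model the RHNS adder stays ahead of the 32-bit CLA adder as long as the normalization rate remains below roughly 1.6%.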
It is worth noting that in the sketched structure an increment of the residue range obtained by adding further moduli of the same length only slightly affects the normalization time, by increasing the depth of the MOMA and the length of its adders. In particular, adding a further 7-bit modulus in the example does not modify the normalization time at all, while the range of integers is enhanced by a factor of about $2^7$.
Finally, even if multiplication in HNS is considered an inefficient operation [12] , the authors believe that it may be worth studying the possibility of extending the results presented in this work to multiplication procedures as well.
