This paper presents a new multivariate mapping strategy for the recently introduced Modulus Replication Residue Number System (MRRNS). This mapping allows computation over a large dynamic range using replications of extremely small rings. The technique maintains the useful features of the MRRNS, namely: the ease of input coding; the absence of a Chinese Remainder Theorem inverse mapping across the full dynamic range; the replication of identical rings; the natural integration of complex data processing.
UNIVERSITY OF WINDSOR

VLSI Research Group
Large Dynamic Range Computations over Small Finite Rings
N.M. Wigley, G.A. Jullien and D. Reaume
Introduction
Computations using finite rings can offer distinct advantages over integer computations using binary arithmetic. The most visible use of such computations is in the coding of integers as elements of a set of rings, with relatively prime moduli, allowing large dynamic range closed operations (addition, multiplication) to be carried out by a set of parallel small ring calculations.
This is known as the Residue Number System (RNS) [8] . It derives from the classical form of the Chinese Remainder Theorem (CRT).
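As a minimal illustration (our own sketch, not code from the paper), the Python fragment below encodes integers as residues with respect to the moduli 3, 5 and 7 used later in this paper, performs addition and multiplication independently in each residue channel, and recovers the result with the classical CRT. It is only valid while the true result lies in the dynamic range [0, 105); the function names are hypothetical.

```python
from math import prod

MODULI = (3, 5, 7)   # pairwise relatively prime moduli
M = prod(MODULI)     # dynamic range: 105


def to_rns(x: int) -> tuple:
    """Encode an integer as one residue per independent channel."""
    return tuple(x % m for m in MODULI)


def rns_add(u: tuple, v: tuple) -> tuple:
    """Channel-wise addition: no carries cross between channels."""
    return tuple((a + b) % m for a, b, m in zip(u, v, MODULI))


def rns_mul(u: tuple, v: tuple) -> tuple:
    """Channel-wise multiplication."""
    return tuple((a * b) % m for a, b, m in zip(u, v, MODULI))


def from_rns(r: tuple) -> int:
    """Classical CRT: the unique x in [0, M) with the given residues."""
    x = 0
    for ri, mi in zip(r, MODULI):
        Mi = M // mi
        x += ri * Mi * pow(Mi, -1, mi)  # modular inverse (Python 3.8+)
    return x % M


print(from_rns(rns_add(to_rns(9), to_rns(11))))  # 20
print(from_rns(rns_mul(to_rns(9), to_rns(11))))  # 99
```

A product such as 10 × 11 = 110 already exceeds the range and comes back as 5, which is the dynamic range limitation discussed next.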
Normal techniques for arithmetic computation require some form of carry propagation; this occurs at regular intervals during the computation, and the mechanism for carry propagation is one of the main driving forces behind the various number representations used for arithmetic computation. In using residue systems, we are able to defer the entire carry propagation procedure until a considerable part of the computation has been completed. Since carry data flows orthogonally to word data, the finite ring approach, which removes the carry flow, allows easier clocking [5] , testing [6] and fault tolerance strategies [9] to be used.
The RNS does, however, have drawbacks. The dynamic range is the product of the moduli, and since the moduli must be relatively prime only a few small moduli are available; a need for an increased dynamic range is therefore reflected in the need for ever-larger moduli. The structures of the finite rings associated with these larger moduli become increasingly complex to implement in hardware. Indeed, practical implementations of RNS devices have essentially been limited to five-bit, and occasionally six-bit, moduli, with one fixed and one variable input at each stage of the computation.
Treatment of complex integers can lead to more restrictions on the choice of moduli. Use of the QRNS (Quadratic Residue Number System) can, on the one hand, simplify the design and decrease the area of hardware; on the other hand, the QRNS imposes yet another condition on the moduli (that they be of the form 4k + 1) and thus creates the need for even larger moduli.
The authors have recently introduced a modified version of the RNS called the MRRNS (Modulus Replication RNS) [12]. The MRRNS is based on a version of the CRT which holds for polynomial rings. The distinction between the two lies in the fact that in the RNS all of the moduli must be relatively prime; this requirement is completely relaxed in the MRRNS framework.
The MRRNS allows the repeated use of moduli in order to increase the dynamic range of the desired computation. Moreover, the QRNS requirement that the moduli be of the form 4k + 1 can be relaxed, albeit at the cost of some extra hardware (but at no increase in the complexity of design). These two facts together have enabled us to put together a system which will execute a 1024-point FFT with 17-bit input data, using the smallest possible ring moduli of 3, 5, and 7. To do this we introduce a new, multivariate version of the MRRNS.
We also combine the replication of small modulus rings with a recently introduced design technique for complex dynamic CMOS logic, referred to as Switching Trees [14].
This design technique converts look-up table storage into minimized complex transistor circuitry based on a binary tree structure. The technique allows input data up to a wordlength of 6 bits, and outputs of an arbitrary number of bits. In particular we illustrate how multiplications and additions in the finite rings can be accomplished using these tools together with a high speed single phase clock pipeline structure [13].
We start the main body of the paper with the definitions and theorems on which the theory is based. There follows a discussion of the preliminary encoding of integers as polynomials, and then of the further encoding and decoding of polynomials as vectors. We then discuss the example FFT, covering both its architecture (including scaling requirements) and its implementation. We conclude with a discussion of VLSI implementation strategies.
Definitions and Theorems
In this paper we introduce a multivariate approach to the MRRNS [12], whereby several indeterminates are used to stand for various radices, each a distinct power of 2. This use of several variables allows very large dynamic range computations to be performed over very small finite rings. We begin with the definitions of the rings needed, and continue with the principal theorems used.
Definitions
Let M be a positive integer. Let $\mathbb{Z}$ denote the ring of integers, and let $\mathbb{Z}_M$ denote the ring of integers modulo M. The elements of $\mathbb{Z}_M$ can be taken as residue classes of integers, but it is more convenient and customary to represent them by a set of M consecutive integers (in the examples below, a set symmetric about zero). Addition and multiplication in these rings are the usual operations of modular arithmetic.
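Let X denote a variable. If R is any ring then we let R[X] denote the ring of all polynomials in the variable X with coefficients in R, with the usual addition and multiplication of polynomials. Thus $\mathbb{Z}[X]$ consists of all polynomials in X whose coefficients are integers, and $\mathbb{Z}_M[X]$ consists of all polynomials in X whose coefficients are in the ring $\mathbb{Z}_M$. Addition and multiplication in $\mathbb{Z}_M[X]$ are defined by reducing the coefficients of a sum or product modulo M. Thus in $\mathbb{Z}_{13}[X]$, if $f(X) = 5X^2 + 4X$ and $h(X) = 3X^2 - 6$, then $f(X) + h(X) = 8X^2 + 4X - 6 = -5X^2 + 4X - 6$ and $f(X)h(X) = 15X^4 + 12X^3 - 30X^2 - 24X = 2X^4 - X^3 - 4X^2 + 2X$.
Quotient Rings
Let g(X) be a fixed polynomial of degree m with coefficients in $\mathbb{Z}_M$ and leading coefficient 1. The quotient ring $\mathbb{Z}_M[X]/(g(X))$ consists of all polynomials in $\mathbb{Z}_M[X]$ of degree less than m. Addition is as in $\mathbb{Z}_M[X]$; to multiply two elements we form their product and take its least residue modulo g(X). This is done by using the Division Algorithm for Polynomials. Finally we reduce the coefficients of the product (mod M).
If, as above, we have $f(X) = 5X^2 + 4X$ and $h(X) = 3X^2 - 6$, and if $g(X) = X^3 - X$, then by the Division Algorithm we have $f(X)h(X) = g(X)(2X - 1) + (-2X^2 + X)$, and thus in the ring $\mathbb{Z}_{13}[X]/(X^3 - X)$ the product is $f(X)h(X) = -2X^2 + X$.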
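The arithmetic of this example can be checked with a short Python sketch (ours, not from the paper). Polynomials are coefficient lists with the constant term first, the divisor g is assumed monic, and coefficients are reported in the symmetric range about zero; all function names are hypothetical.

```python
def sym(x: int, M: int) -> int:
    """Representative of x mod M in the range symmetric about zero."""
    r = x % M
    return r - M if r > M // 2 else r


def poly_add(f, h, M):
    n = max(len(f), len(h))
    f, h = f + [0] * (n - len(f)), h + [0] * (n - len(h))
    return [sym(a + b, M) for a, b in zip(f, h)]


def poly_mul(f, h, M):
    out = [0] * (len(f) + len(h) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(h):
            out[i + j] += a * b
    return [sym(c, M) for c in out]


def poly_mod(f, g, M):
    """Least residue of f modulo a monic g (division algorithm), coefficients mod M."""
    m = len(g) - 1                              # degree of g
    r = [c % M for c in f] + [0] * max(0, m - len(f))
    for k in range(len(r) - 1, m - 1, -1):      # clear degrees >= m, highest first
        q = r[k]                                # g is monic: quotient term is q * X^(k-m)
        for t in range(m + 1):
            r[k - m + t] = (r[k - m + t] - q * g[t]) % M
    return [sym(c, M) for c in r[:m]]


f = [0, 4, 5]        # 5X^2 + 4X
h = [-6, 0, 3]       # 3X^2 - 6
g = [0, -1, 0, 1]    # X^3 - X
print(poly_add(f, h, 13))                   # [-6, 4, -5]
print(poly_mul(f, h, 13))                   # [0, 2, -4, -1, 2]
print(poly_mod(poly_mul(f, h, 13), g, 13))  # [0, 1, -2], i.e. -2X^2 + X
```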
In the case of two variables, with two fixed polynomials g(X) and h(Y) with coefficients in $\mathbb{Z}_M$, of degrees m and n respectively, we can define the ring $\mathbb{Z}_M[X,Y]/(g(X), h(Y))$ as the set of all polynomials f(X,Y) which have coefficients in $\mathbb{Z}_M$, have degree less than m in the variable X and degree less than n in Y.
Addition is straightforward, and multiplication is defined according to the rule of taking the least residue modulo g(X) and modulo h(Y) as well as reducing the coefficients (mod M).
Direct Product Rings
The direct product of two rings R and S is denoted by R × S and consists of the set of all pairs (r, s) where r ∈ R and s ∈ S. Addition and multiplication are performed componentwise. Thus $(r_1, s_1) + (r_2, s_2) = (r_1 + r_2, s_1 + s_2)$ and $(r_1, s_1) \cdot (r_2, s_2) = (r_1 r_2, s_1 s_2)$. The direct product of two rings is also a ring. Note that some authors write R ⊕ S and use the terminology 'direct sum'.
For example in the direct product ring Z 7 × Z 9 the elements (3,2) and (-3,4) have sum given by (3,2) + (-3,4) = (0, -3), and their product is given by (3,2)⋅(-3,4) = (-2,-1).
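The componentwise arithmetic of this example can be reproduced in a few lines of Python (a sketch of ours; residues are shown in the range symmetric about zero, as in the example, and the function names are hypothetical). Each component is computed without reference to the other, which is the independent-channel property described next.

```python
MODULI = (7, 9)  # the direct product ring Z_7 x Z_9


def sym(x: int, m: int) -> int:
    """Representative of x mod m in the range symmetric about zero."""
    r = x % m
    return r - m if r > m // 2 else r


def dp_add(u, v):
    return tuple(sym(a + b, m) for a, b, m in zip(u, v, MODULI))


def dp_mul(u, v):
    return tuple(sym(a * b, m) for a, b, m in zip(u, v, MODULI))


print(dp_add((3, 2), (-3, 4)))  # (0, -3)
print(dp_mul((3, 2), (-3, 4)))  # (-2, -1)
```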
If $R_1, \ldots, R_p$ are rings then we shall use the notation $\prod_i R_i$ to indicate the direct product $R_1 \times R_2 \times \cdots \times R_p$.
The benefit of direct product rings to the VLSI designer is that a stream of computations will be performed wholly in each individual channel, independently of computations in the other channels. This is to be contrasted with the cross channel communication required by typical binary arithmetic systems.
Homomorphisms and Isomorphisms
If R and S are two rings then a ring homomorphism from R to S is a map Φ from R to S which is additive and multiplicative, i.e. for any two elements $r_1, r_2 \in R$ we have
$$\Phi(r_1 + r_2) = \Phi(r_1) + \Phi(r_2), \qquad \Phi(r_1 r_2) = \Phi(r_1)\,\Phi(r_2).$$
If R is any ring then an example of a homomorphism is the map from R[X] to R given by fixing an element r of R and sending each polynomial f(X) to its value f(r). We have
$$(f + h)(r) = f(r) + h(r), \qquad (fh)(r) = f(r)\,h(r).$$
An isomorphism is a homomorphism Φ from R to S for which the inverse map $\Phi^{-1}$ from S to R exists. A necessary and sufficient condition for $\Phi^{-1}$ to exist is that Φ satisfy two conditions: the range of Φ must be all of S (Φ is onto), and Φ must be 1-1. For a homomorphism, being 1-1 is equivalent to the following: if Φ maps any element of R to the zero element of S then that element of R must itself be zero. In other words, if r ∈ R is nonzero then Φ(r) must be nonzero.
If there is an isomorphism from R to S then we say that R and S are isomorphic. It follows that any additions or multiplications that are performed in the one ring will have these operations echoed in the other ring. Hence computations can be performed in one ring and the results mapped to the other ring. We shall use this fact below to map data from a ring of polynomials to a direct product ring, in order to perform the computations in the direct product ring. This enables us to take advantage of the independent channel arithmetic discussed earlier.
The basic isomorphisms
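We now lay the theoretical groundwork for our conversion of a given algorithm, originally stated in binary form, into a problem which can be implemented in a direct product ring. To do this we give an isomorphism from a ring of polynomials to a direct product ring.
Let $r_1, r_2, \ldots, r_m$ be elements of $\mathbb{Z}_M$ such that $r_i - r_j$ is invertible in $\mathbb{Z}_M$ whenever $i \neq j$, and let $g(X) = (X - r_1)(X - r_2)\cdots(X - r_m)$. For $1 \le i \le m$ define
$$\phi_i(X) = \prod_{j \neq i} (X - r_j)(r_i - r_j)^{-1},$$
so that $\phi_i(r_i) = 1$ and $\phi_i(r_p) = 0$ for $p \neq i$. The term $(r_i - r_j)^{-1}$ denotes the multiplicative inverse of $(r_i - r_j)$ in the ring $\mathbb{Z}_M$, which exists by the assumption above.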
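First we state a lemma which is valid for more general rings.
Lemma 2. Let R be a commutative ring with identity, and let $r_1, \ldots, r_m \in R$ be such that $r_i - r_j$ is invertible in R whenever $i \neq j$. If $f(X) \in R[X]$ has degree less than m and $f(r_i) = 0$ for $i = 1, \ldots, m$, then f(X) is the zero polynomial.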
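Theorem 1. Let f(X) be an element of $\mathbb{Z}_M[X]/(g(X))$, i.e. a polynomial with coefficients in $\mathbb{Z}_M$ and degree less than m. The evaluation map of f(X) at all of the roots of g(X),
$$f(X) \mapsto \big(f(r_1), f(r_2), \ldots, f(r_m)\big),$$
is an isomorphism from $\mathbb{Z}_M[X]/(g(X))$ onto the direct product ring $\mathbb{Z}_M \times \cdots \times \mathbb{Z}_M$ (taken m times).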
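Proof. Let f(X) have coefficients in $\mathbb{Z}_M$ and degree less than m. Then the map $f(X) \mapsto (f(r_1), \ldots, f(r_m))$ is an evaluation map of f(X), and hence it is a homomorphism; it is well defined on the quotient ring because $g(r_i) = 0$ for every i.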
We must show that it is 1-1 and onto.
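To show that it is onto, let $(\alpha_1, \alpha_2, \ldots, \alpha_m)$ be any element in the direct product ring. Using the polynomials $\phi_i(X)$ defined above, set
$$f(X) = \alpha_1\phi_1(X) + \alpha_2\phi_2(X) + \cdots + \alpha_m\phi_m(X).$$
This polynomial has degree less than m, and since $\phi_i(r_p)$ equals 1 when $p = i$ and 0 otherwise, we see that it maps to the element $(\alpha_1, \alpha_2, \ldots, \alpha_m)$. Hence $(\alpha_1, \alpha_2, \ldots, \alpha_m)$ belongs to the range, and thus the map is onto.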
To show that the map is 1-1, suppose that f(X) is a polynomial of degree less than m and it maps to zero. We must show that f(X) is the zero polynomial. Since the image of f(X) is the zero element of the direct product ring, for each i = 1, …, m we have $f(r_i) = 0$. Thus f(X) has m distinct roots $\{r_1, r_2, \ldots, r_m\}$ (distinct in the sense that their pairwise differences are invertible). Since f(X) has degree less than m, by lemma 2 it must be the zero polynomial. ∎
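To see the one-variable theorem at work in the smallest case used later in the paper, the Python sketch below (ours) takes M = 3 and g(X) = X^3 − X, whose roots −1, 0, 1 have pairwise differences that are units mod 3. The forward map evaluates at the roots; the inverse is assembled from the polynomials ϕ_i(X), exactly as in the proof that the map is onto. The function names are hypothetical.

```python
from itertools import product

M = 3
ROOTS = (-1, 0, 1)  # roots of g(X) = X^3 - X; all differences are units mod 3


def sym(x: int) -> int:
    r = x % M
    return r - M if r > M // 2 else r


def evaluate(coeffs):
    """Forward map: f(X) -> (f(r_1), ..., f(r_m)); coefficients lowest degree first."""
    return tuple(sym(sum(c * r**k for k, c in enumerate(coeffs))) for r in ROOTS)


def lagrange_basis(i):
    """Coefficients of phi_i(X) = prod_{j != i} (X - r_j) * (r_i - r_j)^(-1) mod M."""
    poly = [1]
    for j, rj in enumerate(ROOTS):
        if j == i:
            continue
        inv = pow(ROOTS[i] - rj, -1, M)          # exists because r_i - r_j is a unit
        new = [0] * (len(poly) + 1)
        for k, c in enumerate(poly):             # multiply by inv * (X - r_j)
            new[k] += -rj * inv * c
            new[k + 1] += inv * c
        poly = [x % M for x in new]
    return poly


def interpolate(values):
    """Inverse map: (alpha_1, ..., alpha_m) -> sum_i alpha_i * phi_i(X)."""
    out = [0] * len(ROOTS)
    for i, a in enumerate(values):
        for k, c in enumerate(lagrange_basis(i)):
            out[k] += a * c
    return [sym(c) for c in out]


print(evaluate([1, -1, 1]))   # (0, 1, 1) for 1 - X + X^2
print(interpolate((0, 1, 1)))  # [1, -1, 1]
print(all(interpolate(evaluate(list(c))) == list(c) for c in product((-1, 0, 1), repeat=3)))  # True
```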
We now turn to functions of more than one variable. Although in the rest of the paper we are concerned with polynomials of more than two variables, the proofs are made clearer if we restrict ourselves to two variables. The extension to more than two variables is easily made.
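We assume now that $\{r_i : 1 \le i \le m\}$ and $\{s_j : 1 \le j \le n\}$ are two sets of elements of $\mathbb{Z}_M$ with the property that $r_i - r_p$ is invertible for $p \neq i$ and $s_j - s_q$ is invertible for $q \neq j$. Let $g(X) = \prod_{i=1}^{m}(X - r_i)$ and $h(Y) = \prod_{j=1}^{n}(Y - s_j)$, and let $\{\phi_i(X)\}$ and $\{\psi_j(Y)\}$ be polynomials with the property that $\phi_i(r_i) = 1$, $\psi_j(s_j) = 1$, $\phi_i(r_p) = 0$ for $p \neq i$ and $\psi_j(s_q) = 0$ for $q \neq j$; the construction of these polynomials follows exactly as in the case for one variable, given above.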
Then we have the following theorem.
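Theorem 2. Suppose that f(X,Y) is a polynomial with coefficients in $\mathbb{Z}_M$ whose degree in X is less than m, and whose degree in Y is less than n. The evaluation map of f(X,Y) at all pairs of roots of g(X) and h(Y), given by
$$f(X,Y) \mapsto \big(f(r_i, s_j)\big)_{1 \le i \le m,\ 1 \le j \le n},$$
is not only a homomorphism from $\mathbb{Z}_M[X,Y]/(g(X), h(Y))$ to the direct product ring $\mathbb{Z}_M \times \cdots \times \mathbb{Z}_M$ (taken mn times), but also an isomorphism:
$$\mathbb{Z}_M[X,Y]/(g(X), h(Y)) \cong \prod_{i=1}^{mn} \mathbb{Z}_M.$$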
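A brute-force check of the two-variable theorem in the smallest case is easy to run. The Python sketch below (ours) takes M = 3 and g(X) = X^3 − X, h(Y) = Y^3 − Y, whose roots are −1, 0, 1. Multiplication in the quotient ring uses the relations X^3 = X and Y^3 = Y to lower exponents, and the evaluation map sends each of the 3^9 elements to a 9-tuple. The function names and test polynomials are hypothetical.

```python
from itertools import product

M = 3
ROOTS = (-1, 0, 1)  # roots of g(X) = X^3 - X and of h(Y) = Y^3 - Y


def lower(e: int) -> int:
    """X^3 = X in the quotient ring, so X^e = X^(e-2) whenever e >= 3."""
    while e >= 3:
        e -= 2
    return e


def mul(f, h):
    """Product in Z_3[X,Y]/(X^3 - X, Y^3 - Y); f[a][b] is the coefficient of X^a Y^b."""
    out = [[0] * 3 for _ in range(3)]
    for a, b, c, d in product(range(3), repeat=4):
        out[lower(a + c)][lower(b + d)] += f[a][b] * h[c][d]
    return [[x % M for x in row] for row in out]


def evaluate(f):
    """The evaluation map: f(X,Y) -> (f(r, s)) over all 9 pairs of roots."""
    return tuple(
        sum(f[a][b] * r**a * s**b for a in range(3) for b in range(3)) % M
        for r in ROOTS
        for s in ROOTS
    )


f = [[1, 2, 0], [0, 1, 0], [2, 0, 1]]
h = [[0, 1, 1], [2, 0, 0], [1, 0, 2]]
# Homomorphism: multiplying in the quotient ring = multiplying in each of the 9 channels.
print(evaluate(mul(f, h)) == tuple(x * y % M for x, y in zip(evaluate(f), evaluate(h))))  # True

# Isomorphism: all 3^9 elements have distinct images.
images = {evaluate([list(t[0:3]), list(t[3:6]), list(t[6:9])]) for t in product(range(M), repeat=9)}
print(len(images) == M**9)  # True
```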
Proof. The map is an evaluation map and is thus a homomorphism. We must show that it is an isomorphism, i.e. that it is 1-1 and onto.
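To show that the map is onto, suppose that $(\alpha_{ij})$, with $1 \le i \le m$ and $1 \le j \le n$, is any element of the direct product ring. Then the polynomial
$$f(X,Y) = \sum_{i=1}^{m}\sum_{j=1}^{n} \alpha_{ij}\,\phi_i(X)\,\psi_j(Y)$$
has degree less than m in X and less than n in Y, and $f(r_p, s_q) = \alpha_{pq}$ for all p and q; hence the map is onto.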
To show that it is 1-1, suppose that f(X,Y) is a polynomial of degree less than m in X and less than n in Y, and suppose that $f(r_i, s_j) = 0$ for $1 \le i \le m$ and $1 \le j \le n$. It is sufficient to prove that f(X,Y) is the zero polynomial. Write
$$f(X,Y) = \sum_{j=0}^{n-1} f_j(X)\,Y^j,$$
and then, for each i, the polynomial
$$f(r_i, Y) = \sum_{j=0}^{n-1} f_j(r_i)\,Y^j$$
has the n roots $s_1, \ldots, s_n$; since it is of degree less than n in the variable Y, by lemma 2 it must be the zero polynomial. This says that each of its coefficients is zero, i.e. for $1 \le i \le m$ we have $f_j(r_i) = 0$. Since i was arbitrary it follows that each $f_j(X)$ has the m distinct roots $r_1, \ldots, r_m$. But $f_j(X)$ is of degree less than m, and thus by lemma 2 it follows that $f_j(X)$ is the zero polynomial. Hence f(X,Y) is also the zero polynomial, and we have proved that only the zero polynomial gets mapped to the zero element.
This implies that the map from $\mathbb{Z}_M[X,Y]/(g(X), h(Y))$ to the direct product ring is 1-1; since it is also onto, it is an isomorphism. ∎
3. Encoding and Decoding of Integers
We now show how to convert a DSP algorithm into a computation in a direct product ring. For simplicity let us assume that the algorithm to be solved consists of computing the inner product of two vectors of complex integers; this covers a large number of DSP algorithms. We first give a schematic of the process, and accompany it with a simplified example. In Fig. 1 we have a diagram of all of the rings that we shall be using, together with the mappings between them; the rings are denoted by the boxes. In the simplified example, T stands for the complex unit j and X stands for the radix 4. We shall define the quotient rings later by specifying their defining polynomials.
Representing integers as polynomials (the map ϕ)
The input data are assumed to be given as two sequences of complex integers of a given bitlength.
They are then represented as polynomials in $\mathbb{Z}[X_1, \ldots, X_n]$, as shown below. Instead of calculating the inner product of two sequences of complex integers we thus perform the inner product of two sequences of polynomials and then apply the map $\phi^{-1}$, which substitutes the radices for the variables (and j for T). Since $\phi^{-1}$ is a homomorphism, it preserves sums and products and thus it preserves inner products. Thus it is sufficient to find the polynomial in $\mathbb{Z}[X_1, \ldots, X_n]$ which represents the inner product, starting from the polynomial representations of the inputs given by the forward map ϕ (this map is not a homomorphism, and is shown hatched on Fig. 1). The rest of this section is devoted to this problem.
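The specific radices are not given in this part of the text, so the Python sketch below assumes, purely for illustration, four variables with X_1 = 2, X_2 = 4, X_3 = 16 and X_4 = 256. Each monomial that is linear in every variable then stands for a distinct power 2^0, …, 2^15, and the sign-magnitude bits of an integer in (−2^16, 2^16) become coefficients in {−1, 0, 1}; evaluating at the radices (the map ϕ^{-1}) recovers the integer. A complex value would use one such polynomial for each of the real and imaginary parts, joined through T. The radix assignment and all names here are hypothetical.

```python
from itertools import product

# Assumed radices (hypothetical): X1 = 2, X2 = 4, X3 = 16, X4 = 256.
# A monomial X1^e1 X2^e2 X3^e3 X4^e4 then stands for 2^(e1 + 2*e2 + 4*e3 + 8*e4).
RADIX_LOG2 = (1, 2, 4, 8)


def weight(e) -> int:
    """log2 of the value that a monomial with exponent vector e stands for."""
    return sum(ek * w for ek, w in zip(e, RADIX_LOG2))


def encode(x: int) -> dict:
    """Forward map (phi): sign-magnitude bits of x as coefficients in {-1, 0, 1}."""
    if not -(1 << 16) < x < (1 << 16):
        raise ValueError("x must lie in (-2^16, 2^16)")
    sign = -1 if x < 0 else 1
    return {e: sign * ((abs(x) >> weight(e)) & 1) for e in product((0, 1), repeat=4)}


def decode(poly: dict) -> int:
    """phi^{-1}: substitute the radices for the variables and add up."""
    return sum(c * (1 << weight(e)) for e, c in poly.items())


p = encode(-12345)
print(decode(p))                # -12345
print(sorted(set(p.values())))  # [-1, 0]
```

Because `decode` evaluates any exponent vector, it also applies unchanged to product polynomials, whose exponents reach 2 in each variable.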
For the sake of example and to show how the various homomorphisms work, we shall map the complex integer 13 − 11j to the direct product ring. Using T in place of j as well as X in place of the radix 4, we have 13 
The maps µ and Φ
The modulus M is selected in advance, and is assumed in general to factor in the form $M = m_1 m_2 \cdots m_k$, where the $m_i$ are relatively prime; in our example $M = 3 \cdot 5 \cdot 7 = 105$. Since inputs and outputs have different polynomial degrees, the quotient rings must be chosen to accommodate the highest degree that can occur. Inputs in our examples have degree one in each variable, and outputs have degree two. Thus we shall represent the components of Ψ with a 9 × 9 matrix for the modulus m = 3, and with a 6 × 6 matrix for the modulus m = 5.
The map Ψ
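With m = 3, the polynomial
$$f(T,X) = \sum_{a=0}^{2}\sum_{b=0}^{2} c_{ab}\,T^a X^b$$
evaluates at the roots T = −1, 0, 1 and X = −1, 0, 1 to yield the matrix product
$$\begin{pmatrix} f(-1,-1)\\ f(-1,0)\\ f(-1,1)\\ f(0,-1)\\ f(0,0)\\ f(0,1)\\ f(1,-1)\\ f(1,0)\\ f(1,1)\end{pmatrix} = \begin{pmatrix} 1&-1&1&-1&1&-1&1&-1&1\\ 1&0&0&-1&0&0&1&0&0\\ 1&1&1&-1&-1&-1&1&1&1\\ 1&-1&1&0&0&0&0&0&0\\ 1&0&0&0&0&0&0&0&0\\ 1&1&1&0&0&0&0&0&0\\ 1&-1&1&1&-1&1&1&-1&1\\ 1&0&0&1&0&0&1&0&0\\ 1&1&1&1&1&1&1&1&1 \end{pmatrix} \begin{pmatrix} c_{00}\\ c_{01}\\ c_{02}\\ c_{10}\\ c_{11}\\ c_{12}\\ c_{20}\\ c_{21}\\ c_{22}\end{pmatrix} \pmod 3.$$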
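It is significant that this matrix is a tensor product Γ ⊗ Γ of matrices. Indeed, let the polynomial $f(X) = \alpha_1 + \alpha_2 X + \alpha_3 X^2$ be evaluated at the points X = −1, 0, 1. Then the matrix Γ of the transformation is shown in the following product, which represents the evaluation:
$$\begin{pmatrix} f(-1)\\ f(0)\\ f(1)\end{pmatrix} = \underbrace{\begin{pmatrix} 1&-1&1\\ 1&0&0\\ 1&1&1\end{pmatrix}}_{\Gamma} \begin{pmatrix}\alpha_1\\ \alpha_2\\ \alpha_3\end{pmatrix}.$$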
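It is easily seen that Γ ⊗ Γ is the 9 × 9 matrix above. This fact is highly beneficial in the VLSI layout problem, where we may sometimes deal with four or more variables: the transform can be applied one variable at a time using the small matrix Γ, rather than as a single large matrix.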
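The tensor-product structure is easy to confirm numerically. The Python sketch below (ours) builds Γ and Γ_1 as evaluation matrices, forms Kronecker products, and checks them against direct evaluation of f(T, X) at every pair of points, with the coefficient vector ordered by power of T first and power of X second. The function names and test coefficients are hypothetical.

```python
def kron(A, B):
    """Kronecker (tensor) product of matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def eval_matrix(points, degree):
    """Row for each point p: [1, p, p^2, ..., p^degree]."""
    return [[p**k for k in range(degree + 1)] for p in points]


def mat_vec(A, v, m):
    return [sum(a * x for a, x in zip(row, v)) % m for row in A]


def direct_eval(c, t_points, x_points, m):
    """Evaluate sum_{a,b} c[a][b] T^a X^b at every (t, x), with t varying slowest."""
    return [
        sum(c[a][b] * t**a * x**b for a in range(len(c)) for b in range(len(c[0]))) % m
        for t in t_points
        for x in x_points
    ]


def flatten(c):
    return [x for row in c for x in row]


GAMMA = eval_matrix((-1, 0, 1), 2)  # quadratic polynomial at -1, 0, 1
GAMMA_1 = eval_matrix((-2, 2), 1)   # linear in T at T = -2, 2 (the square roots of -1 mod 5)

c3 = [[1, -1, 0], [0, 1, 1], [-1, 0, 1]]  # m = 3: quadratic in T and in X
c5 = [[2, 0, -1], [1, 3, 4]]              # m = 5: linear in T, quadratic in X

print(mat_vec(kron(GAMMA, GAMMA), flatten(c3), 3) == direct_eval(c3, (-1, 0, 1), (-1, 0, 1), 3))  # True
print(mat_vec(kron(GAMMA_1, GAMMA), flatten(c5), 5) == direct_eval(c5, (-2, 2), (-1, 0, 1), 5))   # True
```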
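For the modulus m = 5 the situation is simpler. We treat polynomials which are quadratic in X and linear in T, say
$$f(T,X) = \sum_{a=0}^{1}\sum_{b=0}^{2} c_{ab}\,T^a X^b.$$
We set T = −2, 2 and X = −1, 0, 1, and we obtain
$$\begin{pmatrix} f(-2,-1)\\ f(-2,0)\\ f(-2,1)\\ f(2,-1)\\ f(2,0)\\ f(2,1)\end{pmatrix} = \begin{pmatrix} 1&-1&1&-2&2&-2\\ 1&0&0&-2&0&0\\ 1&1&1&-2&-2&-2\\ 1&-1&1&2&-2&2\\ 1&0&0&2&0&0\\ 1&1&1&2&2&2 \end{pmatrix} \begin{pmatrix} c_{00}\\ c_{01}\\ c_{02}\\ c_{10}\\ c_{11}\\ c_{12}\end{pmatrix} \pmod 5.$$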
The matrix is the tensor product $\Gamma_1 \otimes \Gamma$, where Γ is as before and $\Gamma_1$ is the matrix corresponding to the evaluation of the polynomials $A_1 + A_2 T$ at the points T = −2, 2.
The inner product
Once the data have been converted to elements of this last ring, the inner product is performed in each component of the direct product ring.
The results are then mapped back to C.
Reversing the maps Ψ and q
The map Ψ is an isomorphism, so $\Psi^{-1}$ poses no problem. The matrix components of $\Psi^{-1}$ can be obtained by inverting the matrices Γ and $\Gamma_1$ above and forming the respective tensor products.
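A minimal Python sketch of this inversion (ours; Gauss-Jordan elimination over Z_p is our choice of method, and the function names are hypothetical) inverts Γ modulo 3 and Γ_1 and Γ modulo 5, and confirms that the tensor product of the inverses inverts the tensor product, as the identity (A ⊗ B)(C ⊗ D) = AC ⊗ BD guarantees.

```python
def kron(A, B):
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def mat_mul(A, B, p):
    return [[sum(a * b for a, b in zip(row, col)) % p for col in zip(*B)] for row in A]


def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]


def inv_mod(A, p):
    """Inverse of a square matrix over Z_p (p prime) by Gauss-Jordan elimination."""
    n = len(A)
    aug = [[x % p for x in row] + identity(n)[i] for i, row in enumerate(A)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col])  # StopIteration if singular
        aug[col], aug[piv] = aug[piv], aug[col]
        scale = pow(aug[col][col], -1, p)
        aug[col] = [x * scale % p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col]:
                factor = aug[r][col]
                aug[r] = [(x - factor * y) % p for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


GAMMA = [[1, -1, 1], [1, 0, 0], [1, 1, 1]]
GAMMA_1 = [[1, -2], [1, 2]]

print(inv_mod(GAMMA, 3))  # [[0, 1, 0], [1, 0, 2], [2, 2, 2]]
inv9 = kron(inv_mod(GAMMA, 3), inv_mod(GAMMA, 3))
print(mat_mul(inv9, kron(GAMMA, GAMMA), 3) == identity(9))  # True
inv6 = kron(inv_mod(GAMMA_1, 5), inv_mod(GAMMA, 5))
print(mat_mul(inv6, kron(GAMMA_1, GAMMA), 5) == identity(6))  # True
```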
The map q reduces polynomials of higher degree modulo the polynomials $g_i(X)$; as was stated before and will be shown below, we can choose the degrees of the polynomials $g_i(X)$ high enough to ensure that the output data have degrees strictly less than the degrees of the $g_i$. No reduction then takes place on the outputs, so the map $q^{-1}$ exists there and requires no computation. This is shown as a bold line on Fig. 1.
Reversing the maps Φ, µ and ϕ
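The classical CRT assures that the map $\Phi^{-1}$ exists and is an isomorphism; applied coefficient by coefficient, it maps each tuple of residues in the direct product of the rings $\mathbb{Z}_{m_i}$, for the relatively prime factors $m_i$ of M, to the corresponding element of $\mathbb{Z}_M$.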
The next step, however, must be treated carefully. The map µ has in general no inverse.
Since µ reduces coefficients by finding residues mod M, it is necessary that the coefficients of the answer polynomial be their own residues mod M. This is the crucial restriction on the size of the coefficients of the output polynomials. Since the computation is taken over the ring $\mathbb{Z}_{105}$ (M = 3 · 5 · 7), with residues represented by the integers from −52 to 52, modular overflow will take place if any of the coefficients is greater than 52.5 = 105/2 in absolute value.
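The restriction can be seen directly in a small Python sketch (ours; the function names are hypothetical). Each true coefficient is reduced in the channels 3, 5 and 7, recombined by the CRT, and lifted to the representative in (−52.5, 52.5); the lift is correct exactly when the true coefficient already lies in that range.

```python
from math import prod

MODULI = (3, 5, 7)
M = prod(MODULI)  # 105


def crt_symmetric(residues) -> int:
    """CRT recombination, then the representative of the result in (-M/2, M/2)."""
    x = 0
    for r, m in zip(residues, MODULI):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)
    x %= M
    return x - M if x > M // 2 else x


def recover_coefficients(true_coeffs):
    """Simulate the channels: reduce each coefficient mod 3, 5, 7, then reconstruct."""
    return [crt_symmetric([c % m for m in MODULI]) for c in true_coeffs]


print(recover_coefficients([52, -52, 17]))  # [52, -52, 17]: within range
print(recover_coefficients([53]))           # [-52]: 53 overflows and wraps around
```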
In actual fact, of course, the computation will take place in the direct product ring, and the result will then be mapped back to the polynomial ring. However, this map is an isomorphism, so any difficulty that arises appears in both settings and can be studied in either. Modular overflow is easier to study in the framework of the polynomials.
After each stage of the FFT, i.e. after each DFT, the output must be converted to the same form as required for input data, since the output of one stage becomes the input for the next stage. Thus we require a means for transforming polynomials of degree two, with coefficients up to 52 in magnitude, into polynomials of degree 1 and coefficients from the set {-1, 0, +1}.
At the same time we must also be concerned with number growth. Inputs of 16 bits (plus a sign bit) will give rise to much larger outputs; these outputs must then be scaled down to 16 bit numbers. Thus both scaling and polynomial conversion are necessary.
During each stage of the FFT there is growth in the size of the coefficients of the transform, and normally scaling takes place after each such stage. In [10] a study was done on the growth of numbers, both from the theoretical and the practical standpoints. Experimental results were obtained by considering: pseudorandom numbers; sine waves plus pseudorandom numbers; and speech signals. Although the theoretical worst case upper bounds were 5.058, for N 2048, a practical upper bound on number growth equal to 4 was considered quite reasonable for the input sequences under consideration. In the rest of this paper we shall make use of this empirical assumption. In addition to the growth of numbers arising through the FFT computations, we must also consider the scaling of the twiddle factors that occurs when performing the initial quantization.
Since the polynomial mapping will correctly map values in the interval $(-2^{16}, 2^{16})$, and since the twiddle factors are all equal to 1 in the first stage of the FFT, scaling may be disregarded in that stage provided we limit ourselves to inputs of 14 bits. In the following stages we quantize the twiddle factors with sixteen bits. Clearly scaling will be required after the 2nd, 3rd and 4th stages. For maximum accuracy, the scaling factor should be chosen so as to produce outputs which are as great as the input polynomial mapping scheme will allow. No scaling is performed at the end of the 5th stage.
Scaling, Polynomial Conversion and Error Analysis
Scaling at the end of each stage of the FFT is carried out most efficiently by combining it with the required conversion of polynomials. Reference [10] discusses both the A/D and twiddle factor quantization errors associated with the algorithm. Here we specifically discuss dynamic range overflow, as it pertains to our polynomial ring representation.
RNS Overflow
A potential source of error is overflow of the residue number system. This occurs when one of the coefficients of the output polynomials is greater in magnitude than 52.5 = 105/2. We now show that, under normal circumstances, such an occurrence is extremely rare.
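Let us consider the typical multiplication of polynomials that occurs in the MRRNS. We have two real integer multipliers, say A and B, which are represented as polynomials. For the FFT implementation we have used polynomials which are linear in each of their variables. Each integer A and B can then be written as such a polynomial with coefficients drawn from the set {-1, 0, 1} (see section 3.1):
$$A = \sum_{\mathbf{i}} a_{\mathbf{i}}\,X_1^{i_1} X_2^{i_2} \cdots X_n^{i_n}, \qquad B = \sum_{\mathbf{j}} b_{\mathbf{j}}\,X_1^{j_1} X_2^{j_2} \cdots X_n^{j_n},$$
where each exponent is 0 or 1 and $a_{\mathbf{i}}, b_{\mathbf{j}} \in \{-1, 0, 1\}$.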
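Overflow occurs when one of the coefficients of the product polynomial C = AB is greater than 52 in magnitude. We have the identity
$$c_{\mathbf{k}} = \sum_{\mathbf{i} + \mathbf{j} = \mathbf{k}} a_{\mathbf{i}}\, b_{\mathbf{j}},$$
where the sum is taken over all pairs of exponent vectors whose componentwise sum is $\mathbf{k}$.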
Statistical Error Distribution
In order to examine the incidence of overflow we have simulated the multiplication of such polynomials using a program, MODULUS, developed within our group. Both the A and B multipliers are chosen to be polynomial representations of uniformly distributed random pseudointegers; the A multipliers are assumed to be of 14, 15 or 16 bits (plus sign bit), and the B multipliers have 16 bits.
The user specifies the statistical distributions of the input integers, the number and types of indeterminates to be used in the polynomial representations, and the blocklength of the inner product. Probability generating functions, PGFs, (which are themselves polynomials in some new, unrelated variable) are then computed for the coefficients of the input polynomials. From these, PGFs are computed for the output polynomials using standard statistical theories as well as the statistical independence of the distributions. These computations enable the user to determine requirements for the moduli of the RNS and MRRNS computations, and to determine overflow statistics.
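The program MODULUS is not reproduced here. As a rough sketch of the PGF technique only, the Python fragment below represents each distribution as a dictionary from integer values to exact probabilities (equivalently, the coefficients of a Laurent-polynomial PGF) and obtains the distribution of a sum of independent terms by multiplying PGFs. It assumes, hypothetically, that each input coefficient is 0 with probability 1/2 and ±1 with probability 1/4 each, independently, and it treats a single real product in four variables, for which the coefficient c_1111 is a sum of 16 terms in which every input coefficient appears exactly once. The numbers it prints are not the paper's results.

```python
import operator
from collections import defaultdict
from fractions import Fraction

# Hypothetical digit distribution: 0 with probability 1/2, +1 or -1 with probability 1/4 each.
DIGIT = {-1: Fraction(1, 4), 0: Fraction(1, 2), 1: Fraction(1, 4)}


def combine(p, q, op):
    """Distribution of op(X, Y) for independent X ~ p and Y ~ q (dicts value -> prob)."""
    out = defaultdict(Fraction)
    for x, px in p.items():
        for y, qy in q.items():
            out[op(x, y)] += px * qy
    return dict(out)


def pgf_power(p, n):
    """Distribution of a sum of n independent copies: the n-th power of the PGF."""
    result = {0: Fraction(1)}
    for _ in range(n):
        result = combine(result, p, operator.add)
    return result


term = combine(DIGIT, DIGIT, operator.mul)  # one product a_i * b_j
c1111 = pgf_power(term, 16)                 # 2^4 = 16 such products in c_1111

print(sum(c1111.values()) == 1)             # True
print(max(abs(v) for v in c1111))           # 16, so a single real product cannot exceed 52
print(float(sum(pr for v, pr in c1111.items() if abs(v) >= 8)))
```

Complex products and longer inner products would be handled the same way, by combining more terms before reading off the tail probability beyond 52.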
Overflow for the example FFT is found to be exceedingly rare; the probability of an overflow error is less than $10^{-9}$. For a complex inner product with blocklength 8 (a radix 8 calculation), the statistical distribution of the coefficient most likely to overflow, namely $c_{1111}$, is given in Fig. 2.
The distribution is essentially normal, and coefficient values exceeding 52 in absolute value still have very low probability, of the order of $10^{-4}$.
Scaling and conversion of output polynomials
It has been seen that, after all but the last stage of the FFT, outputs must be scaled and converted to polynomials which are appropriate for the next stage (i.e. multiplication by twiddle factors, followed by another 4-point DFT). The error introduced by scaling is, of course, dependent on the scaling algorithm employed. There are many techniques available. Assuming twiddle factors to have B bits, and assuming a number growth factor of 4, the desired scaling factor is seen to be $2^{B+2}$.
The simplest technique for conversion of the output polynomials into input polynomials, which have been reduced by an appropriate scaling factor, is to convert the polynomials into binary integers, scale, and convert the results to input polynomials. This necessitates a large binary adder.
The number of terms to be added may, however, be reduced by adding together the coefficients of the same power of two (i.e. coefficients of distinct monomials which represent the same power of 2, such as two monomials each of which represents 4) and then treating this result as one coefficient. This permits the additions to be done using modular arithmetic, which both decreases the number of additions to be performed and decreases the number of values which the CRT must process. It also has the salutary effect of keeping down hardware overhead. The most efficient scaling algorithm we have found is given below:
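A small Python sketch of the combining step follows (ours). It again assumes, hypothetically, the radices X_1 = 2, X_2 = 4, X_3 = 16 and X_4 = 256, under which X_1^2 and X_2 both stand for 4. Because the combination is linear, it can equally be carried out on residues in each channel before the CRT, as the text proposes; the sketch works on plain integers for readability.

```python
from collections import defaultdict

# Hypothetical radices (an assumption): X1 = 2, X2 = 4, X3 = 16, X4 = 256.
RADIX_LOG2 = (1, 2, 4, 8)


def collapse(poly: dict) -> list:
    """Add the coefficients of monomials that stand for the same power of 2.

    poly maps exponent vectors (e1, e2, e3, e4) to integer coefficients; the result d
    satisfies value = sum(d[k] * 2**k)."""
    d = defaultdict(int)
    for e, c in poly.items():
        d[sum(ek * w for ek, w in zip(e, RADIX_LOG2))] += c
    return [d[k] for k in range(max(d, default=-1) + 1)]


# X1^2 and X2 both stand for 4 under the assumed radices.
poly = {(0, 0, 0, 0): 5, (2, 0, 0, 0): 3, (0, 1, 0, 0): -1}
d = collapse(poly)
print(d)                                       # [5, 0, 2]
print(sum(dk << k for k, dk in enumerate(d)))  # 13
```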
Scaling Algorithm: Suppose that the desired scaling factor is $2^s$ (in practice we have s = 18, since B = 16). A low-error method of applying this scale factor is to use the recursive relationship:
Using this method, the coefficients corresponding to the first s powers of 2 are processed using 5-bit (or fewer) additions. Absolute error, though roundoff errors occur s times, is limited to:
Further hardware simplification is possible, without introducing significant error, by ignoring the lowest order terms.
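The recursion itself does not survive in this version of the text. The Python sketch below uses one recursion that is consistent with the error analysis that follows, a halving with rounding at each of the s low-order stages; it is our assumption, not necessarily the authors' exact formulation. Ties are rounded to even, so that a rounding error of ±1/2 is equally likely in either direction. The error of this recursion obeys E_{k+1} = E_k/2 + ρ_{k+1} with |ρ| ≤ 1/2, so its absolute error is at most 1 − 2^{−s}.

```python
from fractions import Fraction


def halve_round_even(x: int) -> int:
    """round(x / 2) with ties to even, using exact integer arithmetic."""
    q, r = divmod(x, 2)  # x = 2q + r with r in {0, 1}
    if r and q % 2:      # x/2 = q + 1/2 with q odd: round up to the even neighbour
        q += 1
    return q


def scale(d: list, s: int) -> int:
    """Approximate (sum_k d[k] * 2^k) / 2^s, where d[k] is the combined coefficient of 2^k.

    Assumed recursion: u_0 = 0, u_{k+1} = round((u_k + d[k]) / 2) for k < s;
    the terms with k >= s are then added exactly."""
    u = 0
    for k in range(s):
        u = halve_round_even(u + (d[k] if k < len(d) else 0))
    return u + sum(dk << (k - s) for k, dk in enumerate(d) if k >= s)


d = [1, -3, 2, 0, 5, 1]               # stands for 115
print(scale(d, 3), Fraction(115, 8))  # 14 115/8
```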
Mean squared error
To compute mean squared error, we observe that after the first stage of scaling there is a probability 1/4 of an error of $2^{-s}$, a probability 1/4 of an error of $-2^{-s}$, and a probability 1/2 of an error of 0. Hence the mean squared error after the first stage of scaling is
$$e(0) = \tfrac{1}{4}\left(2^{-s}\right)^2 + \tfrac{1}{4}\left(-2^{-s}\right)^2 = 2^{-2s-1}.$$
After the second stage of scaling another error is introduced, whose mean squared error is
$$e(1) = \tfrac{1}{4}\left(2^{-s+1}\right)^2 + \tfrac{1}{4}\left(-2^{-s+1}\right)^2 = 2^{-2s+1}.$$
Continuing, we obtain $e(k) = 2^{-2(s-k)-1}$, so that $e(s-1) = 2^{-3}$. Since each term being added in the scaling operation is offset from the previous term by one bit, the errors calculated above are all independent. Thus the total mean squared scaling error is the sum of the individual mean squared errors:
$$\sum_{k=0}^{s-1} e(k) = \sum_{j=1}^{s} 2^{-2j-1} = \tfrac{1}{6}\left(1 - 4^{-s}\right) < \tfrac{1}{6}.$$
Accounting for both real and imaginary parts, we see that the total mean squared scaling error is < 1/3. Thus algorithm III produces a mean squared error less than twice that of algorithm I, and hence less than twice the minimum possible mean squared error.
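The closed form and the independence argument can be checked under the same assumed halving-with-rounding recursion. The Python sketch below (ours) sums e(k) exactly with fractions and compares it with a seeded Monte Carlo estimate for s = 18, drawing the combined low-order coefficients uniformly from −52 to 52; that input distribution is a hypothetical choice, not one taken from the paper.

```python
import random
from fractions import Fraction


def halve_round_even(x: int) -> int:
    """round(x / 2) with ties to even."""
    q, r = divmod(x, 2)
    if r and q % 2:
        q += 1
    return q


def predicted_mse(s: int) -> Fraction:
    """Sum of e(k) = 2^(-2(s-k)-1) over the stages k = 0, ..., s-1."""
    return sum(Fraction(1, 2 ** (2 * (s - k) + 1)) for k in range(s))


def empirical_mse(s: int, trials: int = 10000, seed: int = 1) -> float:
    """Mean squared error of the halving recursion on random low-order coefficients."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        d = [rng.randint(-52, 52) for _ in range(s)]
        u = 0
        for dk in d:
            u = halve_round_even(u + dk)
        exact = sum(dk * 2.0 ** (k - s) for k, dk in enumerate(d))
        total += (u - exact) ** 2
    return total / trials


s = 18
print(predicted_mse(s) == Fraction(1, 6) * (1 - Fraction(1, 4**s)))  # True
print(float(predicted_mse(s)), empirical_mse(s))
```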
VLSI Implementation
For the small ring computations allowed by our technique, we need only build arbitrary switching blocks with a maximum of 6 logic inputs. Unfortunately, these blocks do not have the benign decomposition properties of binary arithmetic, where it is possible to build complete arithmetic circuits from 3-bit input, 2-bit output full adders. The traditional approach for residue blocks has been to suggest the use of ROMs, and this still remains the preferred implementation procedure within the residue arithmetic community [8].
Minimized ROM Structures (Switching Trees)
In an attempt to optimize current designs on silicon, we have examined the construction of such small address space ROMs and, in particular, the minimization of the ROM structure based on the specific ROM contents. In looking at ROM decomposition strategies, we can go beyond the normal 2-dimensional structure (row and column decoders) to a maximally decomposed structure consisting, essentially, of a binary tree. In this implementation the decoders reduce to inverters.
The approach we have evolved is to generate a full binary tree, program the bottom of the tree (remove unwanted transistors) and then minimize the resulting structure based on two simple graph theory rules [14] . In doing this we do not invoke any concepts from boolean algebra which may not yield the best transistor configurations (including PLA configurations).
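The two minimization rules of [14] are not given here, so the Python sketch below substitutes, as a stand-in, the two standard reduction rules for binary decision diagrams: drop a test whose two branches are identical, and share identical subtrees. For each output bit of a mod 7 multiplier with two 3-bit operands, it builds the full 6-level binary tree (63 decision nodes), programs the bottom of the tree from the multiplication table, and reports the node count after reduction. Mapping the unused input code 7 to output 0 is an assumption, node counts are not transistor counts, and all names are hypothetical.

```python
M = 7
N_INPUTS = 6  # two 3-bit operands: a in the high bits, b in the low bits


def out_bit(x: int, bit: int) -> int:
    """Bit `bit` of (a * b) mod 7 for input x = (a << 3) | b; code 7 maps to 0 (assumption)."""
    a, b = x >> 3, x & 7
    if a == 7 or b == 7:
        return 0
    return ((a * b) % M >> bit) & 1


def build(bit: int):
    """Full binary tree on the 6 inputs, reduced bottom-up; 0 and 1 are the terminals."""
    unique = {}  # (level, low, high) -> node id; ids start at 2

    def node(level: int, prefix: int) -> int:
        if level == N_INPUTS:
            return out_bit(prefix, bit)       # program the bottom of the tree
        low = node(level + 1, prefix << 1)
        high = node(level + 1, (prefix << 1) | 1)
        if low == high:                       # stand-in rule 1: redundant test removed
            return low
        key = (level, low, high)              # stand-in rule 2: identical subtrees shared
        if key not in unique:
            unique[key] = len(unique) + 2
        return unique[key]

    return node(0, 0), unique


def evaluate(root: int, unique: dict, x: int) -> int:
    """Walk from the root, testing input bits from most to least significant."""
    table = {nid: (level, low, high) for (level, low, high), nid in unique.items()}
    n = root
    while n > 1:
        level, low, high = table[n]
        n = high if (x >> (N_INPUTS - 1 - level)) & 1 else low
    return n


for bit in range(3):
    root, unique = build(bit)
    assert all(evaluate(root, unique, x) == out_bit(x, bit) for x in range(64))
    print(f"output bit {bit}: 63 nodes in the full tree, {len(unique)} after reduction")
```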
Restricting the trees to be n-channel blocks and evaluating only a single node, we can build massively pipelined systems in which every evaluation node is pipelined. A recently introduced true single phase clocking system [13] provides an excellent, stable pipelining technique for quite complex trees. Using double trees evaluating both true and complement nodes, we can implement both domino and differential cascode voltage switch logic [2].
Embedded Single Phase Clocked Latch
The complete single phase clocked latch, with embedded switching tree, is shown in Fig. 3 . Since we are only interested in implementing n-channel logic blocks, we use a single inverter p-channel block at the output of each n-channel block.
Using this dynamic latch arrangement, we have fabricated 6-transistor-high trees that function at pipeline rates of 40 MHz in a 3 µm double-metal, p-well CMOS process.
Results
As an illustration of the application of the switching tree method we have used our CAD package, WOODCHUCK [15], to minimize the full tree structure of a modulo 7 multiplier (Fig. 4).
Comments and Conclusions
In this paper we have discussed a new polynomial ring mapping technique which allows large dynamic range computations to be performed using massively parallel small finite ring computational elements. We have presented complete forward and inverse mapping details, along with suitable scaling procedures and the application of redundant binary representations; the computation of a radix-4 FFT has been used to illustrate the procedure. A complete quantization analysis, including coefficient growth peculiar to this mapping strategy, has been developed.
The technique allows direct mapping of bits of either a purely real or multiplexed bit coded complex number to a set of independent rings, defined by the smallest usable odd relatively prime moduli of 3, 5 and 7. Although the use of such small rings in a traditional RNS system would yield an inadequate computational dynamic range, our new technique allows usefully large dynamic range computations with such moduli.
A suitable VLSI implementation procedure, using the recently introduced Switching Tree approach, has been illustrated with the synthesis of a complete Mod 7 multiplier. Fabricated test cells operate at 40MHz.
An important VLSI architectural point is the massive replication required for the small rings. Note that a complete multiplication is performed within a single pipeline cycle using only 6-transistor-high trees. Essentially we have changed the VLSI footprint of the computational elements from the roughly square footprint of standard binary multipliers to a long, narrow rectangular footprint. Since the narrow dimension is in the temporal direction, we achieve high speed, low latency implementations. Instead of the integrated two-dimensional data flow experienced with standard binary arithmetic elements (associated with carry propagation), our architecture only communicates across the dynamic range at relatively widely spaced intervals (scaling and conversion). Between these points we have independent linear pipelines using only 3-bit variables. At the conversion points we effectively have a corner turning procedure, where the computations across the entire dynamic range are performed; these computations are also linear pipelines. The testability and fault tolerance advantages of such an architecture should not be dismissed, particularly in critical applications; the architecture is ideally suited to current density ULSI fabrication processes.
