This paper presents an architecture for FIR filters based on the residue number system (RNS). Residue number coding has long been recognized as a system that enables high-speed addition and multiplication. This makes RNS coding attractive for high-speed FIR filter design, because a digital FIR filter requires only addition and multiplication. The proposed FIR filter architecture performs a series of modulo multiply-and-accumulate operations in each modulus channel. A numerical example illustrates the principles of FIR filtering with a 31st-order (32-tap) low-pass filter. The architecture is compared with direct synthesis of FIR filters.
Introduction
The residue number system is a non-weighted number system that speeds up arithmetic operations by dividing them into smaller parallel operations. Since the arithmetic operations in each modulus channel are independent of each other, there is no carry propagation among them, so the residue number system offers carry-free addition and multiplication and borrow-free subtraction [1]. The residue number system is one of the most effective techniques for reducing power dissipation in VLSI system design [2].
Applications of the residue number system include digital signal processing [3] [4] [5]. Digital filters are especially important in DSP because they serve a wide variety of applications: noise reduction, band splitting, band limiting, interpolation, decimation, pulse forming, echo suppression, equalization, etc. Two basic filter types are commonly implemented in DSP: Finite Impulse Response (FIR) filters and Infinite Impulse Response (IIR) filters. Both filter implementations include a binary-to-residue converter at the input, which converts the input data into equivalent residues. The filtering itself is performed in the central block. Since there are L residues in the residue set, L sub-filters process the corresponding residues of the input. Reverse conversion takes place at the output, where the residue representation is translated back to binary notation. This paper proposes an architecture for the implementation of FIR filters.

Manuscript received on October 5, 2008. The author is with the Faculty of Natural Science and Mathematics, Lole Ribara 29, 38220 Kosovska Mitrovica, Serbia (e-mail: negovanstamenkovic@gmail.com).
The complexity and efficiency of residue-to-binary conversion, and vice versa, depend primarily on the proper selection of the moduli set and the conversion algorithm. Many different moduli sets have been suggested, such as $\{2^n - 1, 2^n, 2^n + 1\}$ [6] [7] [8] [9], $\{2^n, 2^n - 1, 2^{n-1} - 1\}$ [10, 11], $\{2^n, 2^n - 1, 2^{n+1} - 1\}$ [12], $\{2^{2n} + 1, 2^n + 1, 2^n - 1\}$ [13], $\{2^n, 2^{2n} - 1, 2^{2n} + 1\}$ [14], and $\{2^{2n}, 2^{n+1} + 1, 2^{n+1} - 1\}$ [15]. In this paper we use the three-moduli set $\{2^n - 1, 2^n, 2^n + 1\}$. This moduli set is very popular because of the simple conversion from a positional binary number system and the efficient implementation of some arithmetic operations. Nevertheless, it has the disadvantage that the residue modulo $2^n + 1$ requires $n + 1$ bits to represent $2^n + 1$ states, which means that almost half of the $2^{n+1}$ available states remain unused.
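The properties of the chosen moduli set can be checked numerically. The following is a minimal sketch, assuming n = 6 (which gives the set {63, 64, 65} used later in the paper); the helper names `moduli_set`, `pairwise_coprime` and `channel_bits` are illustrative and do not come from the cited literature.

```python
from math import gcd, prod

def moduli_set(n):
    """Return the three-moduli set (2^n - 1, 2^n, 2^n + 1)."""
    return (2**n - 1, 2**n, 2**n + 1)

def pairwise_coprime(moduli):
    """An RNS requires every pair of moduli to be relatively prime."""
    return all(gcd(a, b) == 1
               for i, a in enumerate(moduli)
               for b in moduli[i + 1:])

def channel_bits(m):
    """Number of bits needed to hold the residues 0 .. m - 1."""
    return (m - 1).bit_length()

moduli = moduli_set(6)                      # (63, 64, 65), assuming n = 6
M = prod(moduli)                            # dynamic range of the set
bits = [channel_bits(m) for m in moduli]    # word length of each channel
unused = 2**bits[2] - moduli[2]             # unused states in the 2^n + 1 channel

print(moduli, M, bits, unused)
```

For n = 6 the third channel needs 7 bits, of which 63 of the 128 available states are never used, which illustrates the disadvantage mentioned above.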
This paper is organized as follows. Section 2 introduces the necessary background on the structure of simple finite fields and the residue number system. Section 3 discusses a method for the design of linear-phase FIR digital filters and a method for translating the 2's-complement binary representation into the residue number system and vice versa. Section 4 presents the proposed design methodology and the RNS filter architecture for a linear-phase FIR filter. Simulated impulse and steady-state responses of the FIR filter in residue arithmetic are illustrated in Section 5.
Residue Number System
Let us introduce the basic terminology [6] unique representation, in base β [16] .
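The dynamic range of representable numbers is usually partitioned into two approximately equal parts, such that approximately half of the numbers are positive and the rest are negative. Thus, every representable integer $X$ that satisfies one of the two relations

$$-\frac{M-1}{2} \le X \le \frac{M-1}{2} \quad (M \text{ odd}), \qquad -\frac{M}{2} \le X \le \frac{M}{2} - 1 \quad (M \text{ even})$$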
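can be represented in RNS form.

4. The operations of addition, subtraction and multiplication are defined over the set of congruence classes in $\mathbb{Z}/M\mathbb{Z}$ as:

$$X \circ Y = \left( |x_1 \circ y_1|_{m_1},\ |x_2 \circ y_2|_{m_2},\ \ldots,\ |x_N \circ y_N|_{m_N} \right), \qquad \circ \in \{+, -, \times\}$$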
These equations illustrate the parallel, carry-free nature of the RNS.

5. The binary-to-residue converter is designed according to the following algorithm [17]. A $K$-bit number $X$ can be expressed as:
where $b_0$ is the sign bit. The residue of $X$ mod $m_i$, where $m_i$, $i = 1, 2, \ldots, N$, is the $i$-th modulus used to define the RNS code, can be written using equation (2),
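where $\oplus$ represents modulo addition.

6. The reconstruction of $X$ from its residues $\{x_1, x_2, \ldots, x_N\}$ is based on the Chinese Remainder Theorem:

$$X = \left| \sum_{i=1}^{N} M_i \left| M_i^{-1} x_i \right|_{m_i} \right|_M$$

where $M = \prod_{i=1}^{N} m_i$, $M_i = M/m_i$, and $M_i^{-1}$ is the multiplicative inverse of $M_i$ modulo $m_i$.

7. Another way to convert an RNS representation into the weighted form $X$ is Mixed Radix Conversion [18]. The vector

$$(x'_1, x'_2, \ldots, x'_N)$$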
is the Mixed Radix System (MRS) representation of an integer $X$ smaller than $M$, such that:

$$X = x'_1 + x'_2 m_1 + x'_3 m_1 m_2 + \cdots + x'_N \prod_{i=1}^{N-1} m_i$$

where $x'_i \in [0, m_i)$ are the mixed radix digits of $X$, and x
and can be computed using Euclid's algorithm [18, 19] . 
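The channel-wise arithmetic and the two reverse conversions described in this section can be illustrated with a short sketch. It is not the converter hardware of [17] or [18]; it only assumes the moduli set {63, 64, 65} used later in the paper, obtains the multiplicative inverses with Python's built-in `pow(a, -1, m)`, and uses illustrative function names (`to_rns`, `rns_op`, `crt`, `mixed_radix_digits`, `from_mixed_radix`, `to_signed`).

```python
import operator
from math import prod

MODULI = (63, 64, 65)   # assumption: the moduli set used later in this paper

def to_rns(x, moduli=MODULI):
    """Forward conversion: residues of a (possibly negative) integer."""
    return tuple(x % m for m in moduli)

def rns_op(a, b, op, moduli=MODULI):
    """Apply +, - or * in every channel; no carries pass between channels."""
    return tuple(op(ai, bi) % m for ai, bi, m in zip(a, b, moduli))

def crt(residues, moduli=MODULI):
    """Chinese Remainder Theorem; returns X in the range [0, M)."""
    M = prod(moduli)
    total = 0
    for x_i, m_i in zip(residues, moduli):
        M_i = M // m_i
        inv = pow(M_i, -1, m_i)          # multiplicative inverse of M_i mod m_i
        total += M_i * ((inv * x_i) % m_i)
    return total % M

def mixed_radix_digits(residues, moduli=MODULI):
    """Digits x'_i with X = x'_1 + x'_2*m_1 + x'_3*m_1*m_2 + ..."""
    r = list(residues)
    digits = []
    for i, m_i in enumerate(moduli):
        d = r[i] % m_i
        digits.append(d)
        # subtract the digit and divide by m_i in each remaining channel
        for j in range(i + 1, len(moduli)):
            r[j] = ((r[j] - d) * pow(m_i, -1, moduli[j])) % moduli[j]
    return digits

def from_mixed_radix(digits, moduli=MODULI):
    """Evaluate the weighted mixed radix expansion."""
    x, weight = 0, 1
    for d, m in zip(digits, moduli):
        x += d * weight
        weight *= m
    return x

def to_signed(x, M):
    """Map [0, M) onto the signed dynamic range."""
    return x - M if x > (M - 1) // 2 else x

product = rns_op(to_rns(123), to_rns(-45), operator.mul)
print(to_signed(crt(product), prod(MODULI)))               # 123 * (-45)
print(from_mixed_radix(mixed_radix_digits(product)) == crt(product))
```

Both reverse conversions return the same integer; the mixed radix route needs only small modular operations, whereas the CRT route needs a final reduction modulo M.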
The Design of RNS Digital Filter
Finite Impulse Response (FIR) digital filters have attracted a great deal of interest because they are inherently stable structures that are much less sensitive to quantization errors than filters of the recursive type. An FIR filter is described by (7), where $x_n$ is the input to the filter, $b_k$ are the filter coefficients, $N$ is the filter order and $y_n$ is the filter output:

$$y_n = \sum_{k=0}^{N} b_k x_{n-k} \tag{7}$$
For a very large N, filters implemented in the traditional binary weighted number system suffer from the carry propagation delay in binary adders and multipliers.
In RNS a large integer is broken into smaller residues that are independent of each other. Each residue digit is processed in parallel, without carry propagation from one digit to another. This leads to a significant speed-up of multiply-and-accumulate (MAC) operations, which in turn results in a high data rate for RNS-based FIR filters [2, 6].
A moduli set must be selected that provides just enough dynamic range for the FIR filter. A set comprising a large number of small-valued integers provides a highly parallel RNS structure while keeping memory requirements low for stored-table operations. Consider the 31st-order lowpass linear-phase filter described by the magnitude response of Figure 1. This lowpass filter, which was designed by means of a published program based on the Parks-McClellan algorithm, is representative of a large class of FIR filters that require relatively high accuracy in the coefficients to prevent serious distortion of the frequency response.
The design and numerical computation of the FIR filter were done in MATLAB® [24] with the Parks-McClellan algorithm, in a two-step process. The first step uses the firpmord command to estimate the order of the optimal Parks-McClellan FIR filter that meets the design specifications. The syntax of the command is [n,fo,mo,w]=firpmord(f,m,dev), where f is the vector of band frequencies, the vector m contains the desired magnitude response values in the passbands and stopbands of the filter, and the vector dev holds the maximum allowable deviations of the magnitude response of the filter from the desired magnitude response. The second step is the actual design of the filter, using the firpm command b=firpm(n,fo,mo) to find the impulse response b of the Parks-McClellan FIR filter.
The filter coefficients are shown in Table 1 in double precision (IEEE 754 standard) and in 10-bit precision in integer notation. The spectrum of the quantization error that results from quantizing the coefficients to 10 bits is shown in Figure 2, where it can be seen that 10 bits are sufficient to keep the quantization error 20 dB below the stopband filter response. The integer values in the third column of Table 1 are obtained from the floating-point values (second column) in two steps. The first step converts the floating-point filter coefficients b into the binary string b_binary using two MATLAB® functions, Q_1=quantizer('round',Format) and b_binary=num2bin(Q_1,b). The value Format in the quantizer function sets the parameters of the binary numbers, [wordlength, fractionlength], for signed fixed-point mode. For 10-bit precision the format is wordlength=12 and fractionlength=10.
The second step converts the binary string b_binary into an integer value using two further MATLAB® functions: q_1=quantizer('round',Format) and b_int=bin2num(q_1,b_binary). In this case Format has no fractional part, i.e. Format=[12, 0]. Finally, the integer values of the filter coefficients are transformed into RNS numbers. This paper investigates a binary-to-residue converter for the moduli set {63, 64, 65}. For example, the double-precision filter coefficient b_1 is b=-0.00039444937475, which is converted to the binary number b_binary=000000010010, then to the integer b_int=18, and finally to the RNS number b_RNS={18, 18, 18}.
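The conversion chain from a floating-point coefficient to an integer and then to residues can be sketched outside MATLAB as follows. This is only an approximation of the quantizer/num2bin/bin2num steps: the tie-breaking rule of the rounding, the helper names and the sample coefficient 0.0176 are assumptions made for illustration and are not taken from Table 1.

```python
import math

MODULI = (63, 64, 65)

def quantize_to_int(b, frac_bits=10):
    """Scale by 2**frac_bits and round to nearest (ties away from zero, an assumption)."""
    scaled = b * (1 << frac_bits)
    magnitude = int(math.floor(abs(scaled) + 0.5))
    return magnitude if scaled >= 0 else -magnitude

def to_twos_complement(value, word_length=12):
    """Bit string of a signed integer in word_length-bit two's complement."""
    return format(value & ((1 << word_length) - 1), f"0{word_length}b")

def to_rns(value, moduli=MODULI):
    """Residues of a signed integer; negative values wrap into [0, m)."""
    return tuple(value % m for m in moduli)

b_int = quantize_to_int(0.0176)        # hypothetical coefficient value
print(b_int)                           # 18
print(to_twos_complement(b_int))       # 000000010010
print(to_rns(b_int))                   # (18, 18, 18)
print(to_rns(-b_int))                  # (45, 46, 47)
```

A negative coefficient is represented in each channel by its additive inverse modulo m_i, so no separate sign handling is needed inside the sub-filters.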
Assume that the data sequence is quantized to 8 bits (including sign) and that the filter must be implemented without rounding error. An absolute upper bound on $|y(n)|$ is given by (8):

$$|y(n)| \le \max_n |x(n)| \sum_{k=0}^{N} |b_k| \tag{8}$$
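A bound of this kind can be compared directly with the dynamic range of a candidate moduli set. The sketch below assumes the usual worst-case bound (largest input magnitude times the sum of coefficient magnitudes) and a signed output range; the coefficient list is hypothetical and is not the content of Table 1.

```python
from math import log2, prod

def output_bound(coeffs, x_max):
    """Worst-case |y(n)| for integer coefficients and |x(n)| <= x_max."""
    return x_max * sum(abs(b) for b in coeffs)

def dynamic_range_bits(moduli):
    """Dynamic range of the moduli set expressed in bits."""
    return log2(prod(moduli))

def fits(moduli, bound):
    """True if every output in [-bound, bound] has a unique RNS code."""
    return prod(moduli) >= 2 * bound + 1

coeffs = [-3, 7, 20, 7, -3]                 # hypothetical integer coefficients
bound = output_bound(coeffs, x_max=128)     # 8-bit input including sign

print(bound)
print(round(dynamic_range_bits((63, 64, 65)), 4))
print(fits((63, 64, 65), bound))
```

Because the bound assumes that every product takes its largest magnitude with the same sign, it is rarely reached in practice.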
The moduli set {63, 64, 65} provides a dynamic range of 17.9996 bits, which is adequate for most practical situations, since the bound of 17.8719 bits given by (8) is extremely pessimistic. To produce linear-phase filters, certain symmetry conditions have to be imposed on the real filter coefficients $\{b_k\}$. Consider a filter of order $N$ whose transfer function is

$$H(z) = \sum_{k=0}^{N} b_k z^{-k}$$
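In this paper the filter order ($N = 31$) is an odd integer; suppose that $\{b_k\}$ has even symmetry, $b_k = b_{N-k}$. The transfer function can then be written as

$$H(z) = \sum_{k=0}^{(N-1)/2} b_k \left( z^{-k} + z^{-(N-k)} \right) \tag{10}$$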
This structure is called the linear-phase direct form. The block diagram implementation of the transfer function (10) is shown in Figure 3 for odd N. As can be seen from Figure 3, the basic arithmetic operation is a multiplication followed by an addition, usually called a multiply-accumulate (MAC) operation.
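The saving offered by the linear-phase direct form can be seen in a small software model. The sketch below assumes zero initial state, an odd filter order and symmetric coefficients; the function names and the test vectors are illustrative only.

```python
def fir_direct(b, x):
    """Direct form: y_n = sum_{k=0}^{N} b_k x_{n-k}, N + 1 multiplications per output."""
    N = len(b) - 1
    return [sum(b[k] * x[n - k] for k in range(N + 1) if n - k >= 0)
            for n in range(len(x))]

def fir_linear_phase(b, x):
    """Linear-phase direct form for odd N and b_k == b_{N-k}.

    Samples that share a coefficient are added first, so each output
    needs only (N + 1) / 2 multiplications.
    """
    N = len(b) - 1
    if N % 2 != 1 or any(b[k] != b[N - k] for k in range(N + 1)):
        raise ValueError("coefficients must be symmetric with odd order")

    def sample(i):
        return x[i] if 0 <= i < len(x) else 0   # zero initial state

    return [sum(b[k] * (sample(n - k) + sample(n - N + k))
                for k in range((N + 1) // 2))
            for n in range(len(x))]

b = [1, -2, 5, 5, -2, 1]                 # hypothetical symmetric coefficients, N = 5
x = [3, 0, -1, 4, 2, 7, -5, 1]
print(fir_direct(b, x) == fir_linear_phase(b, x))
```

Both functions return the same output sequence; only the number of multiplications per output sample differs.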
The Architecture of RNS FIR Filter
The RNS-based implementation of the linear-phase FIR filter is shown in Figure 4. FIR filtering is achieved in the residue number system domain by using one modulo-$m_i$ FIR filter block per modulus. The implementation is generic and assumes that three moduli ($m_1 = 63$, $m_2 = 64$ and $m_3 = 65$) are chosen so as to meet the desired filter precision requirements. The filtering is performed as a series of modulo multiply-and-accumulate (MAC) operations in each of the channels $m_1$ to $m_3$. Designing and optimizing the MAC operator is very important for high-performance, low-power DSP operations. In general, the MAC operator is implemented using multipliers and adders. The Forward converter block is a binary-to-residue converter for the three-moduli set {63, 64, 65}. Note that forward conversion for the modulo-64 channel is achieved simply by keeping the least significant 6 bits of the 2's-complement data.
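To calculate the response $y_n$ of the FIR filter to arbitrary inputs $x_n$, a modulo MAC operation is used in each of the moduli $m_1$, $m_2$ and $m_3$. The input signal $x_n$ is converted into residue form at the filter input. The residue-encoded output $|y_n|_M$ is computed in parallel residue circuits:

$$|y_n|_{m_i} = \left| \sum_{k=0}^{15} |b_k|_{m_i} \left[ |x_{n-k}|_{m_i} + |x_{n-31+k}|_{m_i} \right] \right|_{m_i}, \qquad i = 1, 2, 3$$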
The result $y(n)$ is obtained in the RNS-to-binary conversion block by using the Chinese Remainder Theorem (CRT).
Clearly, the input and output conversions constitute a significant overhead in systems implemented in RNS.
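The complete signal path (forward conversion, three independent modulo MAC channels, reverse conversion) can be modelled in a few lines. This is a behavioural sketch under stated assumptions, not the hardware of Figure 4: it uses plain modulo reduction for the forward converter, the CRT with a signed range for the reverse converter, and hypothetical integer coefficients and input samples.

```python
from math import prod

MODULI = (63, 64, 65)
M = prod(MODULI)                         # 262080

def forward(x):
    """Binary-to-residue conversion; for modulus 64 this equals x & 0x3F."""
    return tuple(x % m for m in MODULI)

def channel_fir(b_res, x_res, m):
    """Modulo-m multiply-and-accumulate filter on residues, zero initial state."""
    out = []
    for n in range(len(x_res)):
        acc = 0
        for k, bk in enumerate(b_res):
            if n - k >= 0:
                acc = (acc + bk * x_res[n - k]) % m
        out.append(acc)
    return out

def reverse(residues):
    """CRT, then mapping of [0, M) onto the signed range [-M/2, M/2)."""
    y = sum((M // m) * ((pow(M // m, -1, m) * r) % m)
            for r, m in zip(residues, MODULI)) % M
    return y - M if y >= M // 2 else y

def rns_fir(b, x):
    """Filter integer samples x with integer coefficients b through the RNS."""
    channels = [channel_fir([bk % m for bk in b], [xn % m for xn in x], m)
                for m in MODULI]
    return [reverse(res) for res in zip(*channels)]

b = [18, -40, 95, 95, -40, 18]           # hypothetical integer coefficients
x = [127, -128, 5, 0, 64, -3, 127, 127]  # hypothetical 8-bit input samples
print(rns_fir(b, x))
```

As long as the true output stays inside the signed dynamic range, the result is identical to ordinary integer filtering, which reflects the absence of roundoff error in the RNS channels.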
Reverse conversion of $y_n$ is the last step of digital signal processing in residue number arithmetic. The result is the integer Y_int. To compare this result with the result obtained through standard signal processing, we complete this example by converting the integer to fixed-point representation.
The first step in converting the integer to a fixed-point number is the conversion of the integer to a binary number. Notice that the product of two positive binary numbers, $|b_k|_{m_1}\left[|x_{n-k}|_{m_1} + |x_{n-31+k}|_{m_1}\right]$, may be n + m digits long, where the multiplicand is n digits long and the multiplier is m digits long.
In our example the input signal and the coefficients are 7 and 10 bits long, respectively, so the result is 17 digits long.
Thus, the format in the function Q_2=quantizer('round',Format) is Format=[17 0].
Using Y_bin=num2bin(Q_2,Y_int) we can convert the results to binary numbers.
Finally, for binary-to-fixed-point conversion we use the following MATLAB

At the end of every subfilter output computation, the data in data memory need to be shifted so that the new data sample $x_{n+1}$ takes the place of $x_n$, and the data value $x_n$ in turn replaces the data value $x_{n-1}$. To avoid this data movement and maximize the clock rate, the data address register can be modified to act as a circular address generator. This can be implemented with a counter that resets to location 0 after counting up to N. The new data sample is read into this location and computation of the next output is resumed.
If the direct-form linear-phase FIR filter is realized, the input data are stored in one memory, while the coefficients are stored in another memory. Each output is then computed by performing (N + 1)/2 MAC operations. Thus, this structure requires 50% fewer multiplications than the direct form.
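The circular address generator can be modelled in software as a delay line whose write pointer wraps around. The sketch below is one possible arrangement and is only illustrative; the class name `CircularDelayLine` and the helper `fir_step` are hypothetical, and the plain (unfolded) sum is used for brevity.

```python
class CircularDelayLine:
    """Delay line of N + 1 samples addressed by a wrap-around counter."""

    def __init__(self, order):
        self.size = order + 1
        self.data = [0] * self.size      # zero initial state
        self.head = 0                    # address of the newest sample

    def push(self, sample):
        # Advance the counter; it returns to location 0 after reaching N.
        self.head = (self.head + 1) % self.size
        self.data[self.head] = sample    # overwrite the oldest sample

    def tap(self, k):
        """Return x_{n-k} without moving any stored data."""
        return self.data[(self.head - k) % self.size]

def fir_step(line, b, sample):
    """Store one new sample and compute one filter output."""
    line.push(sample)
    return sum(bk * line.tap(k) for k, bk in enumerate(b))

b = [1, 2, 3, 4]                         # hypothetical coefficients, N = 3
x = [5, -1, 2, 7, 0, 3]
line = CircularDelayLine(order=len(b) - 1)
print([fir_step(line, b, s) for s in x])
```

Only one memory location is written per input sample, so no block of data has to be shifted between outputs.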
Signal processing in residue arithmetic
The transient response is an important characteristic of a system, although the impulse response and the sinusoidal steady-state response are often used to characterize it.
Impulse response
The impulse response of a discrete-time (or digital) system is defined as the output (response), denoted by $h(n)$, when the input is the unit sample $\delta(n)$. In integer notation the unit sample is multiplied by the scalar 128:

$$x(n) = 128\,\delta(n)$$
In the {63, 64, 65} residue number system the unit sample is

$$128\,\delta(n) \rightarrow \{2\,\delta(n),\ 0,\ 63\,\delta(n)\}$$

Figure 5 shows the impulse responses of the RNS subfilters. The coefficient values of the first and the second subfilter must be stored in 8-bit words, while the values of the third subfilter must be stored in 9-bit words.
The unit sample at the input of the first subfilter is multiplied by the scalar 2, for the second subfilter it is multiplied by zero, and for the third subfilter it is multiplied by the scalar 63. For the impulse response of the whole filter we can use the Chinese Remainder Theorem to convert a number represented in the residue system into the conventional number system. This impulse response is shown in Figure 6.
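The effect of the scaled unit sample on the three subfilters can be reproduced numerically. In the sketch below the integer coefficients are hypothetical (they are not the values of Table 1), and the reverse conversion is a plain CRT with a signed range.

```python
MODULI = (63, 64, 65)
M = 63 * 64 * 65

def crt_signed(residues):
    """CRT followed by mapping of [0, M) onto the signed range."""
    y = sum((M // m) * ((pow(M // m, -1, m) * r) % m)
            for r, m in zip(residues, MODULI)) % M
    return y - M if y >= M // 2 else y

b = [18, -40, 95, 95, -40, 18]             # hypothetical integer coefficients
unit = tuple(128 % m for m in MODULI)      # residues of the scaled unit sample

# For an impulse input, channel i outputs |unit_i * b_k|_{m_i} at time k.
channels = [[(u * bk) % m for bk in b] for u, m in zip(unit, MODULI)]
h = [crt_signed(res) for res in zip(*channels)]

print(unit)          # (2, 0, 63)
print(channels[1])   # the modulo-64 subfilter outputs only zeros
print(h)             # 128 * b_k
```

Although the modulo-64 subfilter contributes only zeros for this input, the three channels together still determine the scaled impulse response uniquely.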
As expected, the impulse response of the linear-phase filter (10) that results from quantizing the coefficients to 10 bits is as shown in Figure 7. It can be seen that 10 bits are sufficient to keep the error below $4 \times 10^{-4}$.
Steady state response
The steady-state response of the RNS FIR digital filter is the second example. The term 'steady-state response' arises naturally in the context of sinewave analysis. In our example the sampled sinewave signal with frequency f = 1/8 Hz is $x_k = 127 \sin(2\pi k/16)$. The sampling frequency is $F_s = 2$ Hz. Figure 8 shows the steady-state response of the RNS subfilters.

Before the signal is applied to the input of a digital filter, the filter's internal "state" is assumed to be zero. When the input sinewave is switched on, the filter takes a while to "settle down" to a perfect sinewave at the same frequency. The filter response during this "settling" period is called the transient response of the filter. The response of the linear time-invariant (LTI) filter after the transient response is called the steady-state response; it consists of a pure sinewave at the same frequency as the input sinewave, but with amplitude and phase determined by the filter's frequency response at that frequency. In other words, the steady-state response begins when the LTI filter is fully "warmed up" by the input signal. More precisely, the filter output is then the same as it would be if the input signal had been applied since time minus infinity. An FIR filter of length N + 1 is fully "warmed up" after N samples of input; that is, for an input starting at time k = 0, by time n = N all internal state delays of the filter contain delayed input samples instead of their initial zeros. When the input signal is a unit step u(n) times a sinusoid (or, by superposition, any linear combination of sinusoids), we may say that the filter output reaches steady state at time n = N.
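The statement that an FIR filter reaches steady state at time n = N can be verified with a short experiment. The sketch below uses the sinewave of this example and a hypothetical 4th-order symmetric filter in floating point; it compares the zero-state output with the output for a sinewave applied since minus infinity.

```python
import math

def fir_output(b, x_of):
    """Return a function y(n) = sum_k b_k x(n - k) for an input given as x(n)."""
    return lambda n: sum(bk * x_of(n - k) for k, bk in enumerate(b))

def sine(n):
    """Sampled sinewave of the example: 127 sin(2 pi n / 16)."""
    return 127 * math.sin(2 * math.pi * n / 16)

def switched_sine(n):
    """Unit step u(n) times the sinewave: zero before n = 0."""
    return sine(n) if n >= 0 else 0.0

b = [0.1, 0.2, 0.4, 0.2, 0.1]          # hypothetical symmetric FIR, N = 4
N = len(b) - 1

y_zero_state = fir_output(b, switched_sine)
y_eternal = fir_output(b, sine)

# Time indices at which the two outputs still differ (the transient).
transient = [n for n in range(40)
             if abs(y_zero_state(n) - y_eternal(n)) > 1e-9]
print(transient, N)
```

All differing indices are smaller than N, so from n = N onward the zero-state output coincides with the steady-state sinewave.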
In general, the complete response of our filter is given by the superposition of its zero-state response and its initial-condition response. The zero-state response is the response of the filter to an input signal when the initial state of the filter is zero. The initial-condition response is the response of the filter to its own initial state, with the input signal being zero.
Note that both the phase delay and the group delay of a linear-phase filter are equal to N/2 samples of plain delay at every frequency. Since an FIR filter of length N + 1 implements N samples of delay, the value N/2 is exactly half of the total filter delay.
As expected, the steady-state response of the linear-phase filter (10) and the steady-state response in Figure 9 are similar. The quantization error of the steady-state response that results from quantizing the coefficients to 10 bits is shown in Figure 10. It can be seen that the error is less than $4 \times 10^{-3}$. To decrease the error, the coefficients must be quantized to more than 10-bit precision. In that case the moduli set must be larger, for example {127, 128, 129}. The subfilter coefficients for this moduli set are stored in 7-, 7- and 8-bit words.
Conclusion
A residue number system finite impulse response filter architecture is presented in this paper. The RNS coding technique is attractive for FIR filters, which require only multiplication and addition, because these operations are very fast in an RNS. Since the RNS implementation, in its fundamental form, produces filter outputs with full precision (no roundoff error), it is particularly attractive for real-time filtering and for image data, both of which are coarsely quantized to minimize processing and storage requirements.
The RNS design proposed for the 31st-order lowpass FIR filter can be based on standard TTL IC packages.
