Abstract-In this paper several issues concerning the design and implementation of multistage decimators, interpolators, and narrow-band filters are discussed. In particular, the question of designing these systems in terms of minimum storage rather than minimum computation rate is examined. It is shown that a design which uses finite impulse response (FIR) filters for each stage, and which is minimized for storage, is essentially minimized in terms of computation rate as well. The problem of further improvements in designing decimators and interpolators by taking advantage of DON'T CARE frequency bands is also discussed. For the early stages in a multistage design it is shown that fairly significant reductions in filter order can be achieved in this manner. A third issue in the design process is the question of practical schemes for efficient implementation of multistage decimators and interpolators in both hardware and software. One such efficient implementation is discussed in this paper. Finally, the problem of designing multistage decimators and interpolators using elliptic infinite impulse response (IIR) filters is discussed. It is shown that multistage IIR designs can be somewhat more efficient computationally than single-stage designs; however, the storage efficiency of the multistage IIR design is worse than that of the single-stage IIR design.
Further Considerations in the Design of Decimators and Interpolators
RONALD E. CROCHIERE, MEMBER, IEEE, AND LAWRENCE R. RABINER, FELLOW, IEEE
I. INTRODUCTION
IN earlier papers [1], [2] a general theory of multistage decimators and interpolators for sample-rate conversion and narrow-band filtering was discussed. In this paper we expand upon some of these ideas and discuss further issues in the design of decimators and interpolators. In particular, in Section II we address the issue of designing multistage decimators and interpolators for minimum storage as opposed to designing for minimum computation. It is shown that the two solutions are relatively close and that a design which is minimized on the basis of storage is essentially minimized in terms of computation as well. A complete set of design curves for minimum storage is given to complement similar curves in [1].
In Section III we discuss further improvements of multistage decimators by using multistopband filter designs in place of low-pass filters. It is shown that additional gains in efficiency up to about 25 percent are possible with this approach.
In Section IV we discuss further issues of implementing finite impulse response (FIR) decimators and interpolators in both software and hardware. A block diagram is presented illustrating a basic strategy for such an implementation.
Finally, Section V deals with the use of infinite impulse response (IIR) filter designs in the implementation of multistage decimators and interpolators. Design curves are presented and compared with those for FIR designs.
Much of the discussion in this paper is a continuation and a direct extension of concepts developed in [1], [2]. For the sake of brevity we will not repeat any of that discussion, but will assume that the reader is familiar with this work. The notation used in this paper is that which was established in [1]. Also, as shown in [1], the design relations for decimators and interpolators are dual and the same set of design curves applies to both designs. Therefore, the subsequent design relations and curves in this paper, although formulated for decimators, apply equally to the design of interpolators.
II. MINIMIZATION OF STORAGE VERSUS MINIMIZATION OF MULTIPLICATIONS
In [1], [2] we considered the total number of multiplications and additions (MADS) as a criterion for optimization in a multistage decimator or interpolator design. It was also observed that a considerable saving in the total number of storage locations over a single-stage implementation could be achieved using a multistage implementation. In this section we consider the problem of designing multistage decimators and interpolators for minimum storage instead of minimum computation. The development of the problem is nearly identical to that of minimizing the amount of computation. It can be assumed that the total number of storage locations (for both coefficient and data storage) in the multistage design is approximately proportional to the sum of the lengths of the filters in each of the stages. That is,
$$N_T \approx G \sum_{i=1}^{K} N_i \qquad (1)$$
where $N_T$ is the total number of storage locations necessary, $N_i$ is the length of the FIR filter for the $i$th stage, $K$ is the total number of stages, and $G$ is a proportionality constant which depends on whether we are implementing a decimator, interpolator, or low-pass filter, and on the particular manner in which it is implemented. From [1, eq. (20)] it was shown [...] (2) [...] stages and $D(\cdot)$ is a function of $\delta_p$ and $\delta_s$, the passband and stopband tolerances, respectively, for the filters in each decimation stage (see [1]).
We will assume in this development that $L_i = 1$, $i = 1, 2, \ldots, K$. That is, the decimator (or interpolator) has integer decimation (or interpolation) ratios, $D_i$, at each stage. For designs with noninteger ratios, the integer $L_i$ must be greater than one and, as seen from (2), this implies a much larger value for $N_i$. This serves as a strong inducement against using stages with noninteger decimation ratios (in addition to the reasons given in [1]).
Using (1) and (2) [...] is a relatively weak function of $K$. The more interesting function is $T$. $N_T$ can be minimized for a given $K$ by minimizing $T$ as a function of the $D_i$'s and then choosing the value of $K$ which minimizes $N_T$. To minimize $T$ as a function of the $D_i$'s an optimization routine (in this case the Hooke and Jeeves algorithm [3]) is used. Fig. 1 gives plots of the minimum $T$ as a function of $D$ and $\Delta f$, and Fig. 2 gives plots of the corresponding decimation ratios for minimum storage designs. For comparison, the dotted lines in Fig. 1 correspond to values of $T$ when the design is minimized for multiplications. For a one-stage implementation the two designs are obviously identical, and for a two-stage implementation they are essentially indistinguishable on the curves in Fig. 1. For three- and four-stage implementations, savings in storage of at most 2:1 are possible using a minimized storage design instead of a minimized multiplication design. Thus the two solutions are relatively close and their difference becomes smaller as $\Delta f$ becomes small. In comparing the decimation ratios of a minimized storage design (Fig. 2) to those of a minimized multiplication design [1, Fig. 7], it is seen that the minimized storage design favors slightly lower decimation ratios for the lower numbered stages and slightly larger ratios for the higher numbered stages.
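The search over stage ratios described above can be sketched numerically. The sketch below is illustrative only and rests on several assumptions that are not taken from this paper: the stage length is approximated as $N_i \approx D_\infty(\delta_p/K, \delta_s)\, f_{r(i-1)} / (f_{ri} - f_s - f_p)$ (a commonly used form of the FIR length estimate), $D_\infty$ uses the widely quoted Herrmann-Rabiner-Chan coefficients, filter symmetry is ignored, the Hooke and Jeeves routine is replaced by an exhaustive search over integer factorizations of $D$, and the numerical specification is hypothetical.

```python
import math

def d_inf(dp, ds):
    """Assumed FIR length function D_inf(dp, ds) (Herrmann-Rabiner-Chan fit)."""
    lp, ls = math.log10(dp), math.log10(ds)
    return ((0.005309 * lp * lp + 0.07114 * lp - 0.4761) * ls
            + (-0.00266 * lp * lp - 0.5941 * lp - 0.4278))

def evaluate(ratios, fr0, fp, fs, dp, ds):
    """Return (storage, mult_rate) for a tuple of integer stage ratios.

    storage   ~ sum of estimated filter lengths N_i
    mult_rate ~ sum of N_i times the output rate of stage i
    """
    d = d_inf(dp / len(ratios), ds)      # passband ripple split over K stages
    storage = rate = 0.0
    f_in = fr0
    for D in ratios:
        f_out = f_in / D
        width = (f_out - fs - fp) / f_in  # normalized transition width
        if width <= 0:
            raise ValueError("stage transition band is empty")
        n = d / width
        storage += n
        rate += n * f_out                 # FIR output computed at the low rate
        f_in = f_out
    return storage, rate

def factorizations(D, K):
    """Yield ordered K-tuples of integers >= 2 whose product is D."""
    if K == 1:
        if D >= 2:
            yield (D,)
        return
    for d in range(2, D + 1):
        if D % d == 0:
            for rest in factorizations(D // d, K - 1):
                yield (d,) + rest

# Hypothetical 100:1 specification (values chosen only for illustration).
spec = dict(fr0=10000.0, fp=45.0, fs=50.0, dp=0.01, ds=0.001)
for K in (1, 2, 3):
    by_storage = min(factorizations(100, K), key=lambda r: evaluate(r, **spec)[0])
    by_rate = min(factorizations(100, K), key=lambda r: evaluate(r, **spec)[1])
    print(K, by_storage, by_rate)
```

Restricting the search to integer ratios is a simplification: the design curves give real-valued optimum ratios, which must then be rounded to practical integer choices.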
In comparing the number of required multiplications in a minimum storage design to that of a minimum multiplication design it was found that the curves are essentially indistinguishable from those of [1, Fig. 6]. Thus a design that is minimized in terms of storage is essentially minimized in terms of the number of multiplications as well. This is a consequence of the fact that minima for $S$ are relatively broad minima whereas minima for $T$ are slightly narrower.
A final important conclusion which can be drawn in comparing curves of $S$ and $T$ ([1, Fig. 6] and Fig. 1) for large values of $D$ is that, in terms of multiplication rate, only small gains in efficiency are obtainable by using more than two or three stages; however, substantial gains in efficiency of storage are still possible by going to three- and four-stage designs.
Together, the curves in Figs. 1 and 2 and those in [1] provide a useful set of guidelines for choosing practical $D_i$ values for a wide range of decimator or interpolator designs. All of these curves are based on the approximation to filter order given by [1, eq. (11)]. A similar set of curves was generated using [1, eq. (9)]$^{2}$ and they were found to be essentially indistinguishable from those presented here and in [1], thus justifying the use of this approximation.
III. FURTHER IMPROVEMENT OF DECIMATOR OR INTERPOLATOR DESIGNS USING MULTIPLE STOPBAND FILTER DESIGNS
In the original design procedure for multistage decimators and interpolators the digital filter at each stage was designed as a low-pass filter whose job was to eliminate the frequency bands which would either alias back into the baseband (for a decimator) or appear as images of the baseband (for an interpolator). In this section it is shown how, for some of the stages, the low-pass filter can be replaced by a multiband digital filter of lower order than the original low-pass filter, thereby reducing the overall computation for the multistage design.
Fig. 3 shows the filtering requirements for the $i$th stage of a multistage decimator. The initial sampling frequency is $f_{r(i-1)}$ and the final sampling frequency is $f_{ri}$, defined as
$$f_{ri} = \frac{f_{r(i-1)}}{D_i} \qquad (5)$$
where $D_i$ is the decimation rate for the $i$th stage. The baseband is defined as the region from $f = 0$ to $f = f_s$. The bands which are aliased back into the baseband after sample rate reduction are shown as dotted regions in Fig. 3. Each of these bands is centered at integer multiples of the final sampling frequency for the stage, i.e., $f = f_{ri}, 2f_{ri}, \ldots$, and is of width $2f_s$. The regions between the dotted bands are DON'T CARE regions (and are marked in Fig. 3) in that the frequency response of the filter can be largely left unspecified and unconstrained in these bands.
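When realized as a conventional low-pass filter the specifications of Fig. 3 are the following:
$$1 - \delta_p \le \left|H_i\!\left(e^{j2\pi f/f_{r(i-1)}}\right)\right| \le 1 + \delta_p, \qquad 0 \le f \le f_p$$
$$\left|H_i\!\left(e^{j2\pi f/f_{r(i-1)}}\right)\right| \le \delta_s, \qquad f_{ri} - f_s \le f \le 0.5 f_{r(i-1)}. \qquad (6)$$
When realized as a multiband filter, the passband specification is unchanged and the stopband specification is imposed only on the bands which alias into the baseband:
$$\left|H_i\!\left(e^{j2\pi f/f_{r(i-1)}}\right)\right| \le \delta_s, \qquad f_{ri} - f_s \le f \le f_{ri} + f_s, \quad 2f_{ri} - f_s \le f \le 2f_{ri} + f_s, \quad \ldots \qquad (7)$$
As can be seen by comparing (6) and (7), the specifications for the passband and the transition band are identical. The differences occur primarily in the placement of the DON'T CARE bands following the initial stopband.
$^{2}$These curves had to be generated for specific values of $\delta_p$ and $\delta_s$, as $N_i$ could no longer be simply factored into a product of a function of $\delta_p$ and $\delta_s$ and a function of the $D_i$'s. The use of [1, eq. (9)] also greatly compounded the difficulty of the optimization routine in finding a minimum, as undesired local minima outside of the range of feasible solutions were introduced by inclusion of the term $f(\delta_1, \delta_2)\,\Delta F$.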
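The band layout described above (aliasing bands of width $2f_s$ centered on multiples of $f_{ri}$, with DON'T CARE regions in between) can be tabulated directly. The following is a minimal sketch; the function names and the numerical values are hypothetical, and the last band is simply clipped at half the input sampling rate.

```python
def multiband_stopbands(f_in, D, f_s):
    """Stopband edges (Hz) for a decimate-by-D stage with input rate f_in.

    Each band is centered on a multiple of the output rate f_in / D and has
    width 2 * f_s; everything between the bands is a DON'T CARE region.
    """
    f_out = f_in / D
    nyquist = f_in / 2
    bands = []
    m = 1
    while m * f_out - f_s < nyquist:
        bands.append((m * f_out - f_s, min(m * f_out + f_s, nyquist)))
        m += 1
    return bands

def lowpass_stopband(f_in, D, f_s):
    """Single stopband of the conventional low-pass specification."""
    return (f_in / D - f_s, f_in / 2)

# Hypothetical stage: 10 kHz input rate, baseband edge f_s = 50 Hz.
print(multiband_stopbands(10000.0, 5, 50.0))  # two narrow stopbands
print(multiband_stopbands(10000.0, 2, 50.0))  # one band, same as low-pass
print(lowpass_stopband(10000.0, 2, 50.0))
```

For a 2:1 stage the multiband list collapses to the single low-pass stopband, which is why no reduction in filter order is available for such a stage.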
To illustrate the possible reductions in filter order which can be obtained by using this multiband approach to the design of the individual filters, Fig. [...] normalized stopband cutoff frequency will be small, there are significant reductions in filter order which can be achieved by using the multiband design approach. However, when the $i$th stage is a later stage in the chain (i.e., the normalized stopband cutoff frequency is relatively large), the percentage reduction in filter order obtained by using a multiband design is small. Fig. 4(b) shows similar results for a stage with a decimation ratio of 10. By extrapolating to the limits of the curves of Fig. 4 (as F becomes 0), it can be shown that a decimation ratio of $D_i$ can be obtained using a simple filter; in particular, a filter whose impulse response is a rectangular window of duration $D_i$ samples will suffice for most practical cases.
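One way to see why a rectangular window of duration $D_i$ samples is adequate in the limit of a very narrow baseband is that its frequency response has zeros exactly at the multiples of the stage output rate, i.e., at the centers of the aliasing bands. The short check below is a sketch of that property only; the function name is hypothetical and frequencies are expressed in cycles per input sample.

```python
import cmath
import math

def boxcar_magnitude(D, f_norm):
    """|H| of h[n] = 1/D, n = 0..D-1, at normalized frequency f_norm."""
    acc = sum(cmath.exp(-2j * math.pi * f_norm * n) for n in range(D))
    return abs(acc) / D

D = 10
# Centers of the aliasing bands for a decimate-by-D stage: m / D, m = 1..D/2.
for m in range(1, D // 2 + 1):
    print(m, boxcar_magnitude(D, m / D))  # numerically zero at each center
print(boxcar_magnitude(D, 0.0))            # unity gain at dc
```

The attenuation falls off away from these zeros, so the approach is suited to the early stages, where the aliasing bands are narrow relative to the sampling rate.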
Goodman [4], [5] has exploited this result in proposing some extremely simple structures for realizing multistage decimators and interpolators.
To illustrate some specific design examples, Figs. 5 and 6 show comparisons between a low-pass filter and its equivalent multiband design for two different sets of specifications. For [...] In this example the reduction in filter order from the low-pass design ($N = 41$) to the multiband design ($N = 39$) was small because of the relatively wide baseband. [...] is quite significant in this example.
The following examples illustrate the overall effect on the computation rate of substituting multiband filters for low-pass filters. [...] In [2] it was shown that a practical implementation of this decimator had decimation rates of $D_1 = 5$, $D_2 = 2$ for the two stages. Thus the filter orders for a low-pass design were $N_1 = 25$, $N_2 = 27$ for the two stages. The total multiplication rate for the two-stage decimator was shown to be [2] [...] $13/5 = 2.6$ for the first decimation stage [...]. If the low-pass filter in the first stage is replaced by a multiband filter, the required filter order is reduced to $N_1 = 22$. For the second stage, no reduction in filter order is possible since $D_2 = 2$ and for a 2:1 reduction the low-pass and multiband designs are identical. Thus for the overall decimator the total multiplication rate becomes [...]
In [2] it was shown that a practical implementation of this [...]
In summary, we have shown that for the early stages in a multistage design of decimators (or interpolators) a small, but significant, amount of savings in the required multiplication rate can be obtained by careful design of the digital filters required to remove the signal components which alias into the baseband. In particular, we have shown that the resulting design is a multiband digital filter with DON'T CARE bands between each of the bands of interest.
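The arithmetic behind the two-stage comparison can be sketched as follows. The only figure taken from the text is the first-stage rate of 13/5 = 2.6 multiplications per input sample for $N_1 = 25$, $D_1 = 5$; the counting rule used here (a symmetric linear-phase filter of length $N$ needs $\lceil N/2 \rceil$ multiplications per output, and stage $i$ produces one output per $D_1 D_2 \cdots D_i$ input samples) is an assumption chosen because it reproduces that figure. The totals it prints are therefore illustrative and are not quoted from [2].

```python
from fractions import Fraction
from math import ceil

def mults_per_input_sample(lengths, ratios):
    """Multiplications per input sample for a cascade of FIR decimator stages.

    Assumes symmetric filters (ceil(N/2) multiplications per output sample)
    with each stage evaluated only at its own output rate.
    """
    total = Fraction(0)
    cumulative = 1
    for n, d in zip(lengths, ratios):
        cumulative *= d                      # input samples per stage output
        total += Fraction(ceil(n / 2), cumulative)
    return total

lowpass = mults_per_input_sample([25, 27], [5, 2])    # low-pass first stage
multiband = mults_per_input_sample([22, 27], [5, 2])  # multiband first stage
print(float(lowpass), float(multiband))
print(float(1 - multiband / lowpass))  # fractional saving under this rule
```

Under this counting rule only the first-stage term changes, since the second stage has $D_2 = 2$ and its low-pass and multiband designs coincide.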
One interesting question which arises out of this discussion is whether or not it is beneficial to increase the number of stages in the design so as to exploit the computational advantages of the multiband approach to the fullest. This question is not a simple one to answer. Goodman [4], [5] has found that when using a large number of stages to implement a large decimator, the individual filters can be economically realized using simple prototypes, e.g., rectangular or Hamming window designs. The argument against using a larger number of stages in the design is that the system complexity in terms of extra control and timing logic increases. Thus there are distinct tradeoffs here, and each specific application must be treated and studied independently.
IV. ADDITIONAL CONSIDERATIONS IN THE IMPLEMENTATION OF FIR DECIMATOR AND INTERPOLATOR DESIGNS
In [1] a practical scheme for implementing a general interpolator/decimator stage was presented. It involved the storage of the coefficients in a "scrambled" order such that both the data and coefficients could be sequentially addressed in the computation of the output samples. In this section we show how this scheme for a single stage can be incorporated into the overall structure of a multistage decimator or interpolator. We will restrict this discussion to the case of integer decimators and interpolators.
A general block diagram for the implementation of a three-stage decimator in cascade with a three-stage interpolator is shown in Fig. 7(a) and its corresponding control sequence is given in Fig. 7(b). Together, the cascade results in the implementation of a low-pass filter. To realize only a decimator or interpolator, appropriate parts of this structure can be partitioned off from the main structure.
The decimator has three data buffers (S1D, S2D, and S3D) for storage of internal data for its three stages. These storage buffers are of durations $N_1$, $N_2$, and $N_3$ words, respectively. Three additional buffers hold the coefficients for the filters in each of the three stages. The interpolator has six data buffers associated with it, three (S1I, S2I, and S3I) of lengths $Q_1$, $Q_2$, and $Q_3$, respectively, and three (T1I, T2I, and T3I) of lengths $D_1$, $D_2$, and $D_3$, respectively. In addition, stage $i$ has a buffer for the "scrambled" coefficients which are partitioned into $D_i$ blocks of $Q_i$ samples each.
The operation of this structure is depicted by the control sequence in Fig. 7(b). The data buffers can be thought of as shift registers although they do not have to be implemented in this way. The process begins by reading $D_1$ samples from the main input/output (I/O) buffer into S1D. One output sample is then computed from stage 1 of the decimator and stored in S2D. This process is repeated $D_2$ times until $D_2$ samples have been computed and stored in S2D. One output is then computed for stage 2 of the decimator and stored in S3D. The above process is repeated $D_3$ times until $D_3$ samples have been stored in S3D, at which point one output sample is calculated from stage 3 of the decimator. This completes one cycle of the decimator; $D = D_1 D_2 D_3$ samples have been read from the main I/O buffer and one output sample has been computed. A similar computation cycle can now proceed for the interpolator. The output sample from the decimator is stored into [...]
The above process was described as a serial process and represents a straightforward approach for a software implementation. If high-speed and/or special-purpose hardware is used, the structure in Fig. 7(a) is also particularly attractive as it easily lends itself (with slight modification) to various degrees of parallel processing and pipelining. For example, in the interpolation stages all $D_i$ outputs of a stage $i$ can be computed in parallel [6].
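The serial control sequence for the decimator half of the structure can be sketched in software as follows. This is a minimal illustration, not the implementation of Fig. 7: the data buffers are modeled as fixed-length shift registers, the coefficient values in the usage lines are hypothetical moving-average filters, and the "scrambled" coefficient ordering and the interpolator half are not modeled.

```python
from collections import deque

def multistage_decimate(x, filters, ratios):
    """Run a cascade of integer FIR decimator stages sample by sample.

    filters[i] holds the coefficients of stage i (its data buffer has the
    same length); ratios[i] is the decimation ratio D_i of that stage.
    """
    buffers = [deque([0.0] * len(h), maxlen=len(h)) for h in filters]
    counts = [0] * len(ratios)
    out = []
    for sample in x:
        value = sample
        for i, (h, D) in enumerate(zip(filters, ratios)):
            buffers[i].appendleft(value)  # shift the new sample into stage i
            counts[i] += 1
            if counts[i] < D:
                break                     # stage i still needs more input
            counts[i] = 0
            # One output of stage i: inner product of coefficients and data.
            value = sum(c * s for c, s in zip(h, buffers[i]))
        else:
            out.append(value)             # every stage fired: one final output
    return out

# Hypothetical three-stage example with D = 2 * 3 * 2 = 12.
filters = [[0.25] * 4, [0.5] * 2, [1.0 / 3.0] * 3]
y = multistage_decimate([1.0] * 120, filters, [2, 3, 2])
print(len(y), y[-1])
```

Each stage computes an output only once it has received $D_i$ new inputs, so no work is spent on samples that would be discarded by the rate reduction.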
Similar degrees of parallelism are possible in the implementation of the decimator stages. Pipelining is also an attractive possibility with this structure since it is essentially a completely feed-forward structure [7]. Each stage can be separately implemented in hardware. In this case it may be more attractive to choose the design such that the amount of computation is equally divided among the stages rather than minimizing the total computation or total storage. Another feature of this structure which could be used in a pipeline scheme is that, because of the duality of the decimation and interpolation stages, many of the same control and timing signals used for the decimator could also be used for the interpolator.
[...] reason for this is that FIR filters can be designed to have exactly linear phase. However, if linear phase is not an important consideration, then IIR filters can be considered as an attractive alternative since they can generally be implemented more efficiently than equivalent FIR designs. In this section it is shown that an optimization procedure similar to the one discussed in [1] can be used to design an optimal IIR multistage decimator, interpolator, or narrow-band filter. In this discussion we will only consider the case of integer decimation or interpolation stages. It is fairly straightforward to extend the results to noninteger stages.
To formulate the problem it is useful to reexamine the design of decimator and interpolator stages with integral ratios of sampling rates. As shown in [1, Fig. 1], a decimator with a decimation rate of $M$ can be implemented by filtering the input signal with a low-pass filter (in this case an IIR filter) and then decreasing the sampling rate by selecting one out of every $M$ samples of the output $w(n)$ of the filter. Unlike the FIR implementation, the output $w(n)$ of the IIR filter (or all states of its recursive part) must be computed for all values of $n$ prior to decimating by $M$. In a multistage decimator design the specifications on the low-pass filters are the same as those discussed in [1] with $D_i = M_i$ for the $i$th stage. Similarly, for a 1:$L$ interpolator, the same design specifications as discussed in [1] apply to the low-pass IIR filters at each stage. As in the FIR case, duality also applies in the case of IIR designs of decimators and interpolators and, therefore, all design formulas and curves for multistage IIR decimators also apply to multistage IIR interpolators.
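The point that a recursive filter must be updated at every input sample, even though only one output in $M$ is kept, can be illustrated with a toy example. The sketch below uses a hypothetical first-order recursion rather than an elliptic design; it is meant only to show where the computation occurs, not to meet any filter specification.

```python
def iir_decimate(x, M, a=0.9):
    """Filter with y[n] = a*y[n-1] + (1-a)*x[n], then keep every Mth sample.

    Returns the decimated output and the number of recursion updates, which
    equals the number of input samples regardless of M.
    """
    y_prev = 0.0
    out = []
    updates = 0
    for n, sample in enumerate(x):
        y_prev = a * y_prev + (1.0 - a) * sample  # state needed at every n
        updates += 1
        if n % M == 0:
            out.append(y_prev)                    # only these are retained
    return out, updates

out, updates = iir_decimate([1.0] * 20, 5)
print(len(out), updates)
```

An FIR stage, by contrast, can skip the inner product for the discarded samples entirely, which is one reason the gains from multistage IIR designs are more modest.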
With the above concepts in mind we can now formulate the relations for the multistage IIR decimator (or interpolator). We will assume that the IIR filters are optimal elliptic designs (obtained from bilinearly transformed analog designs).
Then the filter order for stage $i$ can be shown to be of the form [8]
$$N_i = A(\delta_p/K, \delta_s)\, B_i(D, \Delta f, K; D_1, D_2, \ldots, D_{K-1}) \qquad (9)$$
[...] (10)
where $G'$ is a proportionality factor which is dependent on the method of implementation of the IIR filter structure.
For example, in a conventional cascade structure, three multiplications are required for the implementation of a second-order section and, therefore, for this structure $G'$ is approximately $3/2$ (and we must add one multiplication for the gain constant).
With the aid of (9) the above expression can be written in the form [...] (11)
The reader should note the similarity of this expression to that of [1, eq. (18a) and (18b)]. It is seen that $R_T$ can be expressed as a product of a function of the ripples, the initial sampling frequency, and a function of the cutoff frequencies and decimation ratios. In a similar manner, $R_T$ can be minimized by minimizing $S$ as a function of the decimation (interpolation) ratios for each value of $K$ and then choosing the value of $K$ which minimizes the product [...]. The function $A(\delta_p/K, \delta_s)$ is tabulated in Table I for convenience.
The minimization of $S$ can again be performed by an optimization routine such as the Hooke and Jeeves algorithm. This optimization is not a trivial one. The evaluation of the elliptic functions$^{3}$ can be extremely sensitive numerically. Great care must be used in controlling the range of parameter values to avoid both arithmetic overflow and roundoff error, and also in constraining the optimization search to be within the region of realizable solutions.
Using the Hooke and Jeeves method, the above optimization problem was solved. Plots of minimized values of $S$ are given in Fig. 8 and the corresponding optimum decimation ratios and $B_i(\cdot)$ values are given in Figs. 9 and 10, respectively. These curves and Table I (which gives values for the function $A$ as $\delta_p$ and $\delta_s$ vary) can be used in a manner similar to that of the FIR design curves as a guide toward selecting practical integer values of decimation ratios for decimator (or interpolator) designs.
Several interesting observations can be made from these curves and tables. As in the FIR case, the function of the ripples $A(\delta_p/K, \delta_s)$ (see Table I) is a weakly varying function of $K$ and again most of the interesting observations can be made from curves of the $S$ function. From plots of $S$ in Fig. 8 we observe that improvements in efficiency of approximately two or three are possible for moderate values of $D$ (20-50) by using a multistage design, and gains of up to about eight are possible for large values of $D$ and small values of $\Delta f$. Although these gains are not as striking as those for FIR designs (which can be orders of magnitude), they do represent modest improvements. Another conclusion that can be drawn from the curves in Fig. 8 is that little, if any, improvement in efficiency can be gained by using a three-stage IIR design over a two-stage IIR design and, therefore, two stages, at most, are sufficient for most purposes.
The curves in Fig. 10 represent $B_i$ values for the optimized designs. By noting from (9) that $N_i = A B_i$, it is clear that the $B_i$ values represent the normalized (e.g., $N_i/A$) filter orders. We can observe from these curves that the order of the final stage of a two- or three-stage design is essentially equal to that of a one-stage design. This occurs because the orders of the IIR filters are determined by the ratio $f_p/f_s$ (the unwarped transition ratio) as opposed to FIR designs whose orders are determined by the difference $f_s - f_p$. It can also be seen that the sum of the filter orders for a multistage design, i.e., the total required storage, will always be greater than that of a one-stage design. Therefore, a multistage IIR decimator or interpolator is always less efficient, in terms of storage, than a single-stage IIR design. Thus, unlike the FIR design where both computation and storage could be reduced, the multistage IIR design represents a tradeoff between computation and storage.
$^{3}$See Hastings [9] for the evaluation of [...].
An Example
To illustrate the use of the IIR design tables and curves we will choose a decimator with the following specifications:
These are the same specifications that were used for the 100:1 filter example in [2] and in the 100:1 decimator example in Section III. For a one-stage IIR design it can be seen from Table I [...] and an actual value of 14 must be used. For a cascade implementation this will require a multiplication rate of $(1 + 14 \times 3/2) f_{r0}$ or 22 mults./s. For a two-stage IIR design it is seen from Fig. 9(d) that the optimum (theoretical) decimation ratios are $D_1 = 14.5$ and $D_2 = 6.9$. From Table I and Fig. 10(d) we find that $A = 8.72$, $B_1 = 0.39$, and $B_2 = 1.6$, giving theoretical filter orders of $N_1 = 3.4$ and $N_2 = 13.9$. One practical choice of decimation ratios is $D_1 = 20$ and $D_2 = 5$. This leads to theoretical filter orders of $N_1 = 3.8$ and $N_2 = 13.8$ (these values cannot be obtained from Fig. 10; see instead the tables in [8]). Actual filter orders for this design are, therefore, 4 and 14 for $N_1$ and $N_2$, respectively.
Another practical choice of decimation ratios might be $D_1 = 10$ and $D_2 = 10$. This results in theoretical filter orders of $N_1 = 3.1$ and $N_2 = 13.97$ and actual values of 4 and 14, respectively. With careful design, a third-order filter might be substituted for $N_1$, however.
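The steps of this example can be retraced numerically. In the sketch below the values $A = 8.72$, $B_1 = 0.39$, $B_2 = 1.6$, the orders 4 and 14, and the one-stage cascade count $1 + 14 \times 3/2 = 22$ come from the text. The rule used for the two-stage totals (each IIR stage is evaluated at its own input rate, so stage $i$ is weighted by $1/(D_1 \cdots D_{i-1})$) is an assumption, and the resulting totals are not taken from Table II.

```python
import math

def actual_orders(A, B_values):
    """Round the theoretical orders N_i = A * B_i up to integers."""
    return [math.ceil(A * B) for B in B_values]

def cascade_mults(order):
    """Multiplications per filter input sample for a cascade realization:
    3 per second-order section (3/2 per unit of order) plus one gain."""
    return 1 + order * 3 / 2

def mults_per_input_sample(orders, ratios):
    """Total multiplications per original input sample, assuming each stage
    runs at its own input rate (an assumption, see the lead-in)."""
    total = 0.0
    rate = 1.0                        # stage input rate relative to f_r0
    for n, d in zip(orders, ratios):
        total += cascade_mults(n) * rate
        rate /= d
    return total

print(actual_orders(8.72, [0.39, 1.6]))          # orders for the two stages
print(mults_per_input_sample([14], [100]))       # one-stage design
print(mults_per_input_sample([4, 14], [20, 5]))  # two-stage, D1 = 20, D2 = 5
print(mults_per_input_sample([4, 14], [10, 10])) # two-stage, D1 = D2 = 10
```

Under this assumed rule the first stage dominates the two-stage totals, which is consistent with the observation that a third-order first stage, where it can be used, would be worthwhile.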
The results for these three IIR designs for the $D = 100$ decimation are tabulated in Table II. In order to compare the IIR designs to FIR designs, a two-stage and a three-stage FIR decimator design are also included in Table II. The design of the two-stage FIR decimator was taken from [2] and the three-stage design is that given in Section III. From this comparison we can observe that the FIR designs are more efficient in terms of computation than the IIR designs; however, they require considerably more storage for both data and coefficients.
VI. SUMMARY
In this paper we have discussed several of the important issues which concern the detailed design and implementation of multistage decimators, interpolators, and narrow-band filters. It was shown that, when using FIR filters, multistage decimators and interpolators which are minimized for storage were essentially minimized in terms of computation rate as well. This is because the minimum on the computation rate was a broad minimum of the objective design function and did not vary significantly as the decimation ratios of the individual stages were varied. Thus the optimum design based on storage coincided with the broad optimum based on computation.
A second issue in the design of optimal decimators and interpolators was the question of minimizing FIR filter order at each stage. It was shown that, for the early stages in a multistage design, one could take advantage of the DON'T CARE frequency bands which lay between each of the relevant frequency bands to reduce the filter order required to meet the given design specifications. It was shown that, for the early stages in the design, reductions in filter order of up to 50 percent were achievable in this manner. Another issue in the implementation of multistage designs was the question of how to efficiently implement multistage decimators and interpolators in both hardware and software. In Section IV a modular structure was discussed which was particularly suited for both hardware and software implementations. Techniques for pipelining the hardware structure for maximum efficiency were also discussed.
Finally, the question of the suitability of using IIR filters in the implementation of multistage decimators and interpolators was discussed. It was shown that a multistage IIR design is only slightly more efficient computationally than a single-stage IIR design, and that it was always less efficient in terms of storage than the single-stage design. In comparing the FIR and IIR implementations of decimators and interpolators, it was shown that the storage required for IIR implementations was considerably less than for FIR implementations; however, the computation rates were comparable. In addition, the FIR designs were linear phase designs whereas the IIR designs were elliptic designs whose phase was highly nonlinear.
In summary, we have tried to discuss several of the issues which affect the design and implementation of multistage decimators and interpolators. We have tried to point out the advantages and disadvantages of each of the alternatives which can be used to design and implement these systems.
