Abstract-This paper reports on results from the algorithmic design and simulation of a two-path polyphase decimation filter with 24-bit accuracy over the frequency range from dc to 15.2 kHz. The filter is suited to very high-precision data conversion and measurement applications. The device reported in this paper has been designed for use with a fourth-order, single-loop, ΣΔ modulator running at 4096 kHz. Results of floating- and fixed-point simulations, architectural design, comparative bit-level simulations and silicon implementation of the decimator are also reported, together with a sample baseband measurement of a fourth-order modulator.
I. INTRODUCTION
WITH THE advent of high-speed integrated digital instrumentation, control and signal processing systems, there is a continually increasing need for high-resolution and high-fidelity analog-to-digital converters (ADC's). Recent advances in VLSI technology, coupled with advances in digital signal processing (DSP) know-how, make it possible to address applications necessitating ADC resolutions in excess of the once highly regarded 16-bit data converter. The need to achieve resolution in excess of 16 bits monolithically has popularized the potential offered by deploying oversampled ΣΔ modulators, in conjunction with digital decimation filters, to overcome the limitations inherent in conventional analog techniques.
It is well known that the noise power of the ΣΔ modulator in the baseband must be less than $2^{-2(b-1)}/12$ relative to the reference levels ±1, where $b$ is the converter resolution. However, it is less well appreciated that, in order to maintain the conversion precision over the entire bandwidth of the converter, the peak-to-peak passband ripple of the decimation filter's magnitude response must be less than the least significant bit (LSB), $2^{-(b-1)}$, of the converter resolution specification. For a 24-bit converter this implies that the total baseband noise power must be less than -149 dB, and the passband ripple must not exceed ±0.5 µdB. The baseband noise power level can be achieved with an ideal fourth-order single-loop modulator running with an oversampling ratio of 128. The decimation filter must not allow significant amounts of out-of-band noise to be aliased into the baseband, while the passband ripple must not exceed ±0.25 µdB. This tighter figure for the full-band ripple has been used in order to allow for the nonidealities in the rest of the system. Given this very stringent specification, the questions arise as to what type of filtering approach is to be employed, how the algorithmic design must be carried out and what test procedures are to be deployed in evaluating such a system. If designed appropriately, a linear-phase FIR filter could satisfy these stringent requirements, but at the expense of huge dimensionality, which inevitably leads to inefficient silicon implementations. The filtering approach described in this paper employs a parallel combination of all-pass IIR recursive digital filters in a two-path cascaded polyphase environment, implementing high-order half-band decimation filters with few coefficients and moderately short coefficient wordlengths, thus rendering them suitable for efficient physical realization.
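The two 24-bit figures quoted above can be checked numerically. The sketch below assumes that the LSB of the magnitude response is $2^{-(b-1)}$ of the ±1 reference range (a reading chosen because it reproduces the quoted ±0.5 µdB); the function names are illustrative and do not come from the paper.

```python
import math

def baseband_noise_limit_db(bits):
    """Noise power 2^(-2(b-1)) / 12 relative to the +/-1 references, in dB."""
    return 10.0 * math.log10(2.0 ** (-2 * (bits - 1)) / 12.0)

def lsb_ripple_microdb(bits):
    """Peak-to-peak gain ripple, in micro-dB, equal to one LSB (assumed 2^(-(b-1)))."""
    lsb = 2.0 ** (-(bits - 1))
    # log1p keeps full precision for the very small argument
    return 20.0 * math.log1p(lsb) / math.log(10.0) * 1e6

noise_db = baseband_noise_limit_db(24)
ripple_pp = lsb_ripple_microdb(24)
print(f"baseband noise limit: {noise_db:.2f} dB")
print(f"ripple: {ripple_pp:.3f} micro-dB peak-to-peak, +/-{ripple_pp / 2:.3f} micro-dB")
```

Under these assumptions the noise limit comes out just below -149 dB and the ripple at roughly 1 µdB peak-to-peak, in line with the rounded values given in the text.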
In the following sections we shall briefly describe the basic polyphase two-path structure and the technique employed in the algorithmic design of its filter coefficients, and report results from ideal (floating-point precision) simulations of the filter, together with the simulation/evaluation environment (including simulated performance measurement results from a fourth-order single-loop ΣΔ modulator) and the corresponding stimuli employed. The paper further elaborates on and gives details of the architectural design, comparative bit-level simulation results (with finite-wordlength two's-complement arithmetic) and custom silicon implementation of the high-fidelity polyphase decimation filter chip.
II. ALL-PASS TWO-PATH POLYPHASE APPROACH
The basic recursive (IIR) all-pass filter in a polyphase two-path configuration, with the appropriate delay in one of the branches and purpose-designed coefficients, results in very high-performance and easily implemented half-band filters. Design techniques for such filters, employing parallel/cascade combinations of elementary all-pass sections having one coefficient per second-order stage as the starting point for an eventual elliptic approximation, have been reported in depth [1]-[3]. The algorithm for generating the prototype all-pass filter coefficients is fairly straightforward for floating-point precision coefficients [1]. However, for effective real-time physical realizations, (fixed-point) finite-wordlength coefficients are required and need to be established [4]-[6].
A. The Basic All-Pass Building Block
The basic building block is the second-order IIR all-pass filter having its two poles on the imaginary axis, and its zeros on the same axis but at the reciprocal distance from the origin, having the transfer function

$$A(z) = \frac{\alpha + z^{-2}}{1 + \alpha z^{-2}} \qquad (1)$$
There exists a variety of physical structures which implement (1). The structure chosen here computes the Numerator first, followed by the Denominator (N-D form). Doing the calculations in this manner results in relatively low peak gains at intermediate points in the structure with the minimum of computations, at the relatively small expense of an extra delayer in the first section. The physical structure of the basic N-D form second-order all-pass filter is shown in Fig. 1(a). Fig. 1(b) shows the two-coefficient fourth-order structure resulting from the cascade of two basic second-order units. Higher even-order filters can be realized by similar extensions to the structure.
Configuring the appropriate order all-pass sections in a parallel fashion, with a delay in one of the branches, as shown in Fig. 2, results in an overall lowpass half-band filter. The net effect on the pole-zero pattern (PZP) of the resultant composite filter, and hence its response, is that its poles remain at the same locations as before with the addition of an extra pole at z = 0 (due to the delay added to the lower branch).
The zeros, on the other hand, get transported to new locations, with an additional zero introduced at the Nyquist frequency. For the case when second-order all-pass sections are used, the magnitude response of each stage in Fig. 2 is given by
The gain at dc (z = 1) for this class of filter is unity, with zero gain at Nyquist (z = -1), and is down by 3 dB at half-Nyquist (z = j), irrespective of the filter coefficients α0, α1, ... or the order of the all-pass sections. A concise yet very clear way of visualizing what happens in this filter is to consider the phase, as it is this attribute of the filter that effects the action (since the magnitude of each branch is a tame unity throughout). There exists a phase shift of exactly π (due to the unit delay in the bottom path) at Nyquist between the top and bottom branches, and the two branches are in phase at dc. There is, however, a sharp transition in the phase at half-Nyquist as the poles on the imaginary axis are approached and passed while traversing the unit circle from dc to Nyquist. Hence the rationale here is that the (top and bottom branch) filter responses add constructively (as they are in phase) from dc to half-Nyquist, forming the new filter's passband, and add destructively (as they are π out of phase) from half-Nyquist to Nyquist, forming the new filter's stopband.
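The three fixed gain points described above can be checked with a short script. This is a minimal sketch, assuming each all-pass section has the form (α + z⁻²)/(1 + αz⁻²) and that the two branches are averaged with a unit delay in the lower one; the coefficient values are arbitrary illustrations, not the paper's optimized coefficients.

```python
import math

def allpass_section(z, alpha):
    """Evaluate the second-order all-pass (alpha + z^-2) / (1 + alpha z^-2) at z."""
    zi2 = (1 / z) ** 2
    return (alpha + zi2) / (1 + alpha * zi2)

def two_path_halfband(z, top_alphas, bottom_alphas):
    """0.5 * [top all-pass cascade + z^-1 * bottom all-pass cascade]."""
    top = complex(1.0)
    for alpha in top_alphas:
        top *= allpass_section(z, alpha)
    bottom = 1 / z                      # unit delay in the lower branch
    for alpha in bottom_alphas:
        bottom *= allpass_section(z, alpha)
    return 0.5 * (top + bottom)

def gains(top_alphas, bottom_alphas):
    """Magnitudes at dc (z = 1), half-Nyquist (z = j) and Nyquist (z = -1)."""
    return tuple(abs(two_path_halfband(z, top_alphas, bottom_alphas))
                 for z in (1 + 0j, 1j, -1 + 0j))

# Two unrelated, arbitrary coefficient sets (hypothetical values)
for tops, bottoms in [((0.125,), (0.5,)), ((0.1, 0.6), (0.3, 0.85))]:
    dc, half, nyq = gains(tops, bottoms)
    print(f"dc={dc:.6f}  half-Nyquist={20 * math.log10(half):.2f} dB  Nyquist={nyq:.1e}")
```

Both coefficient sets give the same three values, which illustrates that these points are fixed by the structure and that the coefficients only shape the response in between.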
A simple yet very effective way of getting high levels of stopband attenuation, without substantially affecting the passband performance, is through cascading of lower-order structures as shown in Fig. 2. Detailed frequency-domain magnitude response performances for the overall decimation filter, designed and implemented around this idea, are given in the later sections. It should be noted that the sample-rate reduction inherent to the decimation process (by a factor of 2, as half-band filters are used) takes place at the end of each cascade block. The implication of this is that the computations are performed at the higher rate, and subsequently half the computed samples are thrown away.
Since the transfer functions of the half-band all-pass filters involve only polynomials in $z^2$, the sample-rate reduction can be moved to the input of the cascade blocks. The unit delay in the lower branch is effected by staggering the undersampling, feeding even samples into the top branch and odd samples into the lower branch [7], [8]. Fig. 3 shows the equivalent of Fig. 2, with its sample rate reduced at the input. The double delayers within the all-pass sections (as shown in Fig. 1) are replaced with unit delayers at the lower rate. It is this basic structure that has been used throughout the decimator cascade. This enables us to design and optimize each lowpass filter stage to its required specification independently of its successive stages.
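The equivalence between filtering at the high rate and then discarding every other sample, and splitting the input into even and odd samples first, can be illustrated with a small floating-point sketch. It assumes one all-pass section per branch; the coefficients 0.125 and 0.5625 and all function names are hypothetical.

```python
def allpass(x, alpha, delay):
    """y[n] = alpha*x[n] + x[n-delay] - alpha*y[n-delay], zero initial state."""
    y = []
    for n, xn in enumerate(x):
        x_d = x[n - delay] if n >= delay else 0.0
        y_d = y[n - delay] if n >= delay else 0.0
        y.append(alpha * xn + x_d - alpha * y_d)
    return y

def filter_then_decimate(x, a0, a1):
    """Run both branches at the input rate, then keep every second output."""
    u0 = allpass(x, a0, 2)                      # top branch, A0(z^2)
    u1 = allpass(x, a1, 2)                      # bottom branch, A1(z^2)
    y = [0.5 * (u0[n] + (u1[n - 1] if n >= 1 else 0.0)) for n in range(len(x))]
    return y[::2]

def decimate_then_filter(x, a0, a1):
    """Split into even/odd samples first, then run unit-delay all-passes."""
    even = x[0::2]                              # x[2m]
    odd = ([0.0] + x[1::2])[:len(even)]         # x[2m-1]: the staggered stream
    v0 = allpass(even, a0, 1)
    v1 = allpass(odd, a1, 1)
    return [0.5 * (p + q) for p, q in zip(v0, v1)]

x = [1.0, 0.5, -0.25, 0.75, 0.0, -1.0, 0.3, 0.2, -0.6]
high_rate = filter_then_decimate(x, 0.125, 0.5625)
low_rate = decimate_then_filter(x, 0.125, 0.5625)
print(max(abs(p - q) for p, q in zip(high_rate, low_rate)))
```

The second form performs half as many all-pass updates per input sample, which is the saving the text attributes to moving the sample-rate reduction to the input.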
B. The Decimator Cascade
The overall decimation filter was constructed by cascading the input sample-rate reduced building blocks of Fig. 3. Seven such stages were employed, providing a sample-rate reduction of 128. Fig. 4 shows the 118th-order decimator filter cascade, comprising two-, four- and eight-coefficient double-lowpass filter sections. The filter coefficients were optimized to have 13-bit length (signed), utilizing a specially developed "bit-flipping algorithm" described in the next section.
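The rate bookkeeping of the seven-stage cascade can be sketched as follows; the constant and variable names are illustrative only.

```python
MODULATOR_RATE_KHZ = 4096.0   # modulator sampling rate quoted in the abstract
NUM_STAGES = 7                # each half-band stage decimates by 2

rates_khz = [MODULATOR_RATE_KHZ / 2 ** k for k in range(NUM_STAGES + 1)]
overall_ratio = 2 ** NUM_STAGES
output_nyquist_khz = rates_khz[-1] / 2

print(rates_khz)            # [4096.0, 2048.0, 1024.0, 512.0, 256.0, 128.0, 64.0, 32.0]
print(overall_ratio)        # 128
print(output_nyquist_khz)   # 16.0
```

The 32 kHz output rate places the output Nyquist frequency at 16 kHz, just above the 15.2 kHz upper edge of the specified band.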
The decimation filter output quantization noise power is the sum of the noise introduced by the modulator into the signal baseband and the part aliased into it during the decimation. The total baseband magnitude response passband ripple, assuming that no distortion is introduced by the modulator, is the sum of the lowpass filter passband ripples of all the stages and the noise spectrum aliased into the signal band. For the case of a multistage decimation filter decreasing the sampling rate by two at each stage, both the output quantization noise power and the decimation filter passband ripples can be calculated at the end of each stage in terms of an "overall lowpass transfer function"
where M = fs/fN is the oversampling ratio, fs the modulator sampling frequency, fN the baseband signal sampling frequency and ν = f/fs the normalized frequency. This idea is presented in Fig. 5 for a three-stage decimation filter. For a small number of decimation filter stages, the only significant noise aliasing into baseband originates from the modulator noise spectrum, at frequencies where only one lowpass filter has its stopband replica. Then the noise power Sg and the filter passband ripples PR at the end of the ith stage can be computed using du. (5)
III. THE COEFFICIENT DESIGN ALGORITHM
The prime motivation here is to find simple powers-of-two coefficients for use in the half-band decimation filters. The individual specification of each half-band filter in the cascade chain is automatically set by the desired ADC resolution and choice of ΣΔ modulator, with the specified passband and stopband performances of the overall decimator equally split between the n stages in the cascade. However, to find the relevant binary-constrained coefficients capable of satisfying the given decimator specification, a bit-flipping algorithm has been developed [4]-[6]. This algorithm is seeded with the floating-point coefficients delivered from an elliptic approximation [1]. A structured exhaustive search of the possible bit patterns yielding improvement in the filter frequency response in question, starting from the most significant digit of the fixed-point coefficient and working toward the least significant one, is at the heart of this approach. The optimization process starts with the first-stage filter in the decimator and proceeds sequentially forward until the last stage is reached. If the performance of a given filter in the cascade is better than required at the end of any one stage's optimization, this fact is used to advantage by relaxing the specification of the following stage, hence giving the possibility of reduced implementation complexity. The bit-flipping approach delivers more efficient filters for a given wordlength in comparison to a truncated elliptic filter result [6].
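The flavor of an MSB-to-LSB bit search can be conveyed with a much simplified sketch. It is not the algorithm of [4]-[6]: it assumes a single half-band filter with one coefficient per branch, uses the peak stopband gain on a coarse frequency grid as the only cost, and accepts a flipped bit greedily whenever the cost improves. The seed values 0.14 and 0.59, the stopband edge and all names are hypothetical.

```python
import cmath
import math

FRAC_BITS = 12  # 13-bit signed coefficient: sign bit plus 12 fractional bits

def stopband_peak(codes, edge=0.3, points=64):
    """Peak gain of 0.5*(A0(z^2) + z^-1 A1(z^2)) for nu in [edge, 0.5]."""
    a0, a1 = (c / 2 ** FRAC_BITS for c in codes)
    peak = 0.0
    for k in range(points):
        nu = edge + (0.5 - edge) * k / (points - 1)
        zi = cmath.exp(-2j * math.pi * nu)       # z^-1 on the unit circle
        zi2 = zi * zi
        h = 0.5 * ((a0 + zi2) / (1 + a0 * zi2)
                   + zi * (a1 + zi2) / (1 + a1 * zi2))
        peak = max(peak, abs(h))
    return peak

def bit_flip(seed_coeffs):
    """Greedy search: flip one bit at a time, MSB first, keep improvements."""
    codes = [round(c * 2 ** FRAC_BITS) for c in seed_coeffs]
    best = stopband_peak(codes)
    for bit in range(FRAC_BITS - 1, -1, -1):
        for i in range(len(codes)):
            trial = list(codes)
            trial[i] ^= 1 << bit                 # flip this bit of coefficient i
            cost = stopband_peak(trial)
            if cost < best:
                codes, best = trial, cost
    return codes, best

seed = [0.14, 0.59]                              # illustrative floating-point seed
rounded = [round(c * 2 ** FRAC_BITS) for c in seed]
codes, best = bit_flip(seed)
print("rounded seed :", rounded, f"{20 * math.log10(stopband_peak(rounded)):.1f} dB")
print("after search :", codes, f"{20 * math.log10(best):.1f} dB")
```

Because a flip is kept only when it lowers the cost, the result is never worse than the directly rounded seed on this grid; a fuller search would also weigh passband ripple and explore combinations of bits rather than single flips.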
IV. THE SIMULATION SETUP AND RESULTS
In order to assess and validate the designed filter, and to establish the overall datapath widths for the fixed-point realization, a simulation test setup with the appropriate input stimuli was configured in the Comdisco SPW and HDS environments [9]. The behavioral floating- and fixed-point models of the structure were exercised with three input stimuli: a unit impulse, a composite signal of twenty sine waves input directly to the filter, and the same composite signal fed through the fourth-order modulator.
A. The Floating-point Simulations
As mentioned earlier, three sets of simulations were performed. Fig. 6 shows the full-band (0-2048 kHz) magnitude response of the decimation filter, obtained from a 300000-point impulse test followed by a 262144-point Hanning-windowed FFT, normalized to the peak. As can be seen from this response, the passband ripple (zoomed in) very comfortably meets the set requirements. The stopband attenuation of the filter is also within the specification.
The result of Fig. 7 is for the composite test signal applied to the filter directly. Examining the zoomed baseband response, the filter faithfully delivers the twenty spectral lines of the test sines with -0.105 µdB worst-case amplitude distortion, with its stopband showing the harmonic distortion of the baseband sinusoids below -150 dB. The phase response of the decimation filter reported here is nonlinear (minimum-phase), as the filter itself is an IIR filter (close to an elliptic approximation). Hence the group delay of the filter is nonconstant and is monotonically increasing, as is typical of an elliptic approximation. The worst-case deviation of the group delay was found to be 26.4 µs referred to the input rate. As the group delay of the filter was not of paramount importance for this particular application and was within acceptable limits, no action was taken to flatten the group delay (linearize the phase). However, it should be noted that an almost linear-phase version of this filter could have been designed, but at the expense of increased implementation complexity. It should also be noted that the full-band magnitude response spectra displayed in Figs. 7, 8, 10, and 11 are not normalized to the peak, whereas their accompanying zoomed-in baseband portions are.
Applying the composite sines to the fourth-order modulator, and feeding the one-bit modulator output to the decimation filter, results in the response depicted in Fig. 8. This figure clearly exposes the baseband amplitude distortion of the modulator, and what we effectively see here is the baseband transfer function of the modulator for the given input conditions. From this result it can be said that our very high-fidelity decimation filter can be used as a measuring tool in a ΣΔ modulator evaluation environment.
B. The Fixed-point Simulations
The same sets of simulations done for the floating-point model of the filter were performed for the fixed-point model too. The comparative results from these fixed-point simulations led to the establishment of the datapath widths as well as the type of rounding to be employed in the fixed-point filter. To satisfy the specifications, it was found necessary to employ 32-bit two's-complement arithmetic in the first three double-filter decimator stages and 40-bit two's-complement arithmetic in the later four stages. The precision-reduction scheme employed throughout was convergent rounding. Fig. 9 shows the full-band (0-2048 kHz) magnitude response of the decimation filter, obtained from a 300000-point impulse test followed by a 262144-point Hanning-windowed FFT. As can be seen from this response, the passband ripple (zoomed in) comfortably meets the requirements. It is, however, worth noting that the ripple structure, though well within the required specification, is notably different from that of the "ideal" floating-point one. The stopband attenuation of the filter is also within the required specification.
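Convergent rounding (round half to even) on a two's-complement word can be sketched as below. This is an illustrative integer model, not the chip's datapath; the function name and the bit counts in the example are assumptions.

```python
def convergent_round_shift(value, shift):
    """Drop `shift` low-order bits of a signed integer, rounding ties to even."""
    if shift <= 0:
        return value
    floor = value >> shift                  # arithmetic shift: rounds toward -infinity
    remainder = value - (floor << shift)    # always in [0, 2**shift)
    half = 1 << (shift - 1)
    if remainder > half or (remainder == half and floor & 1):
        floor += 1                          # round up; on a tie only if floor is odd
    return floor

# Values in units of 1/2 (shift = 1): -1.5, -0.5, 0.5, 1.5, 2.5
print([convergent_round_shift(v, 1) for v in (-3, -1, 1, 3, 5)])   # [-2, 0, 0, 2, 2]

# Ties are split evenly between rounding up and down, so the mean error
# over a full set of remainders is zero (unlike plain round-half-up).
errors = [convergent_round_shift(v, 4) * 16 - v for v in range(-64, 64)]
print(sum(errors))                                                  # 0
```

The absence of a systematic offset in the rounding error is the usual reason for preferring this scheme in recursive filters, where a small bias would otherwise accumulate in the feedback loops.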
The results shown in Fig. 10 are those for the composite test signal fed directly into the filter input. It can be seen that the amplitude distortion of the fixed-point structure appears to be marginally worse than the floating-point case, but is well within the required specification. The results for the composite 
V. ARCHITECTURAL DESIGN OF THE FILTER
The custom-designed decimation filter processor architecture that implements the decimator cascade comprises two bit-parallel sequenced processor structures. The total computational load of all the all-pass sections is 27648 thousand difference-multiply-accumulate (DMAC) cycles per second, in addition to the recombining additions. In order to realize the filter effectively in a regular CMOS process, without the need for very high-performance arithmetic blocks, a two-processor solution has been chosen. The processor computing the first two stages is the high-speed processor shown in Fig. 12. It is clocked at 18.432 MHz, employing a two-phase clocking scheme. The processor can be viewed as comprising two distinct subprocessors, the inner and outer processors. The inner processor comprises two single-port RAMs, latches, a subtractor, a 13 × 32 multiplier and an adder forming a DMAC. This subprocessor computes the all-pass sections; the outer processor, comprising the dual-port RAM (DRAM), latches and an adder, combines the all-pass outputs to form the half-band lowpass output for the following stages.
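One reading of the difference-multiply-accumulate operation, offered here as an interpretation rather than a description of the chip, is that the all-pass recurrence at the reduced rate can be rearranged so that each update needs one subtraction, one multiplication and one addition. The sketch below compares that rearrangement with the direct form in floating point; the names and test values are hypothetical.

```python
def allpass_direct(x, alpha):
    """y[n] = alpha*x[n] + x[n-1] - alpha*y[n-1]: two multiplications per sample."""
    y, x_prev, y_prev = [], 0.0, 0.0
    for xn in x:
        yn = alpha * xn + x_prev - alpha * y_prev
        y.append(yn)
        x_prev, y_prev = xn, yn
    return y

def allpass_dmac(x, alpha):
    """Same recurrence as x[n-1] + alpha*(x[n] - y[n-1]): one DMAC per sample."""
    y, x_prev, y_prev = [], 0.0, 0.0
    for xn in x:
        diff = xn - y_prev          # difference
        prod = alpha * diff         # multiply (coefficient times data)
        yn = x_prev + prod          # accumulate
        y.append(yn)
        x_prev, y_prev = xn, yn
    return y

x = [1.0, -0.5, 0.25, 0.75, -1.0, 0.0, 0.5, 0.125]
direct = allpass_direct(x, 0.375)
dmac = allpass_dmac(x, 0.375)
print(max(abs(p - q) for p, q in zip(direct, dmac)))
```

Under this reading the multiplier takes a coefficient on one input and a data difference on the other, which would fit the stated 13 × 32 multiplier alongside the 13-bit coefficients and 32-bit arithmetic quoted for the early stages.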
The remaining five stages of the decimator are calculated by the low-speed processor operating at 9.216 MHz, employing a four-phase clocking scheme. Fig. 13 shows the structure of the low-speed processor.
The low-speed processor is similar in overall structure to the high-speed processor in that it includes an inner and an outer subprocessor, with a DMAC forming the core of the inner processor. However, the low-speed processor uses only one single-port RAM, which operates at twice the frequency of the DMAC, and has the additional data pathways required to implement the higher-order all-pass filters used in the later stages of the decimator.
The overall chip plot of the entire decimation filter is seen in Fig. 14. The digital system design, logic simulation and physical layout of the chip were undertaken through the Chipcrafter silicon compilation environment [12], targeting a 1.0 µm double-metal n-well bulk CMOS process.
VI. CONCLUSION
In this paper we have presented the complete design procedure, from specification through coefficient optimization and comparative high- and low-level simulation to an efficient architectural realization amenable to custom silicon integration, of a very high-fidelity decimation filter that is capable of resolving up to 24 bits of accuracy. The simulation results presented in the paper confirm the filter's potential, as it would be on the final implementation. At present we have the filter architecture implemented on a 1.0 µm double-metal n-well bulk CMOS process, with an approximate silicon area of 32 mm².
He worked in the field of television for Decca Records Ltd. and, during a period at the Imperial College of Science and Technology, London, was engaged in the design of electro-myographic equipment for the Migraine Trust. He joined the staff of the University of Westminster (formerly The Polytechnic of Central London) in 1971. He is currently Director of the Division of Electronic Systems. His research activities have included packet-switching local-area networks, discrete-time signal processing structures, aircraft braking systems, design methodologies for integrated circuit design, sigma-delta conversion techniques and teaching methods in electronic engineering. He is currently working on the theory and practical realization of high-fidelity sigma-delta and D/A converters.
Artur Krukowski has been working in the School of Electronic and Manufacturing Systems Engineering of the University of Westminster since his graduation and has been involved in the design of a DSP CODEC chip and other DSP integrated circuits.
