Abstract-A power-efficient multi-rate multi-stage Comb decimation filter for mono-bit and multi-bit ΣΔ A/D converters is presented. Polyphase decomposition in all stages, with a high decimation factor in the first stage, is used to significantly reduce the sampling frequency of the Comb filter. Several implementations indicate that a proper choice of the first-stage decimation factor can considerably improve power consumption, area and maximum sampling frequency. In multi-bit ΣΔ A/Ds, this optimum first-stage decimation factor is a function of the input wordlength.
These filters were usually implemented using the IIR-FIR technique [4], Fig. 1(b). Recently, lower power consumption has been achieved using the FIR2 [3] and the POLY-FIR2 [5] structures.
In this paper, we present a different representation of the Comb filter. This representation allows us to exploit Polyphase decomposition in order to apply higher decimation factors at the input of the first stage. Although the coefficients resulting from this decomposition require expensive multiplication operations and a larger wordlength, the overall power consumption is lower, owing to the significant reduction of the operating frequency. We show that an optimum decimation factor exists that balances the added complexity of the Polyphase decomposition against the reduction of the operating frequency. This optimum decimation factor depends on the output wordlength of the ΣΔ modulator.
II. PREVIOUS WORK COMPARISON
In the IIR-FIR structure, shown in Fig. 1(b), the FIR filter operates at a sampling frequency M times lower than that of the IIR filter. In order to ensure stability of the IIR filter, its wordlength has to be equal to $(W_0 + k \log_2 M)$ bits [4], where $W_0$ is the number of bits at the filter input. The major drawback of this architecture is that the IIR filter operates at the maximum sampling frequency and with a very large wordlength. Equation (1) can be written in the following form:
$$H(z) = \prod_{i=0}^{\log_2 M - 1} \left(1 + z^{-2^i}\right)^k \qquad (2)$$
Applying the commutative rule [6], we get the FIR2 structure shown in Fig. 1(c). In this structure, the Comb filter is realized by cascading $\log_2 M$ identical FIR filters, $(1 + z^{-1})^k$, each decimating by 2. The POLY-FIR2 structure [5], illustrated in Fig. 1(d), is obtained by applying Polyphase decomposition [7] to the FIR2 structure. In this case, the decimation occurs at the input of each filter, thus halving the sampling frequency at which each stage operates. The FIR2 and POLY-FIR2 structures have the advantage of being free of stability problems, and the wordlength of each stage $i$ is limited to $(W_0 + k i)$ bits.
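The factorization behind the FIR2 structure can be checked numerically. The sketch below assumes the unnormalized Comb transfer function $((1 - z^{-M})/(1 - z^{-1}))^k$ for Equation (1), which is not reproduced in this section, and compares its expanded coefficients with those of the product of $\log_2 M$ factors $(1 + z^{-2^i})^k$ referred to the input rate. The helper names (`poly_mul`, `poly_pow`, `comb_direct`, `comb_fir2`) are illustrative and do not come from the paper.

```python
def poly_mul(a, b):
    """Multiply two polynomials in z^-1 given as coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def poly_pow(p, k):
    """Raise a polynomial to the non-negative integer power k."""
    out = [1]
    for _ in range(k):
        out = poly_mul(out, p)
    return out


def comb_direct(M, k):
    """Coefficients of (1 + z^-1 + ... + z^-(M-1))^k (assumed form of the Comb filter)."""
    return poly_pow([1] * M, k)


def comb_fir2(M, k):
    """Coefficients of the product over i of (1 + z^-(2^i))^k, for M a power of two."""
    stages = M.bit_length() - 1  # log2(M) when M is a power of two
    out = [1]
    for i in range(stages):
        factor = [1] + [0] * (2 ** i - 1) + [1]  # 1 + z^-(2^i)
        out = poly_mul(out, poly_pow(factor, k))
    return out


if __name__ == "__main__":
    # 5th-order Comb filter with a decimation factor of 32, as in the comparison.
    print(comb_direct(32, 5) == comb_fir2(32, 5))
```

Because both expansions use exact integer arithmetic, equality of the coefficient lists confirms the identity for the chosen M and k without any rounding concerns.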
The average power consumption of a digital signal processing system is proportional to the number of operations performed per sample, the wordlength, and the sampling frequency. In Comb filters, we assume that the number of operations is equal to the number of partial products to be added. The power consumption, P, can then be defined by the following relation:
where $NP_i$ is the number of partial products to be added in stage $i$, $W_i$ the input wordlength of stage $i$, $M_j$ the decimation factor in stage $j$, and $I$ the total number of decimation stages.
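As a rough illustration of how such a power figure can be evaluated, the sketch below sums, over all stages, the product of partial-product count, input wordlength and operating rate. It is only an assumed reading of the relation: stage $i$ is taken to run at the input rate divided by $M_1 \cdots M_i$, which corresponds to decimating at the input of each stage. The function name `relative_power` and the numbers in the usage example are hypothetical and are not results from the paper.

```python
def relative_power(np_per_stage, w_per_stage, m_per_stage, fs=1.0):
    """Sum over stages of NP_i * W_i * (rate at which stage i operates).

    Assumption: stage i runs at fs / (M_1 * ... * M_i), i.e. decimation
    takes place at the input of every stage (polyphase form).
    """
    total = 0.0
    rate = fs
    for np_i, w_i, m_i in zip(np_per_stage, w_per_stage, m_per_stage):
        rate /= m_i                  # operating rate of this stage
        total += np_i * w_i * rate   # contribution of this stage
    return total


if __name__ == "__main__":
    # Hypothetical two-stage example, values chosen only for illustration.
    print(relative_power([4, 4], [1, 6], [2, 2]))
```

For a structure that decimates at the output of each stage instead, the rate update would move after the accumulation, so the same helper can be adapted to compare both cases.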
Equation (3) is used to compare the power consumption of different implementations of a 5th-order Comb filter with a decimation factor of 32. In the next section, we introduce a different architecture that reduces power consumption even further, especially for low input wordlengths.
III. PROPOSED COMB FILTER ARCHITECTURE
As shown in Fig. 3(a), we propose to decompose the Comb decimation filter into a first-stage FIR filter $H_1(z)$ with a decimation factor $M_1$, followed by a cascade of FIR filters $(1 + z^{-1})^k$, each with a decimation factor of 2. The reason behind choosing this representation is that we would like to decimate as much as possible in the first stage. The following stages are kept at the minimum decimation ratio of 2 because, when the wordlength of the input signal is high, reducing the sampling frequency does not compensate for the added complexity of the Polyphase decomposition. In the following, we explain how Polyphase decomposition is applied to Comb decimation filters. Equation (1) can be written in the following form:
$$H(z) = H_1(z) \prod_{i=\log_2 M_1}^{\log_2 M - 1} \left(1 + z^{-2^i}\right)^k \qquad (4)$$
where
$$H_1(z) = \left(\frac{1 - z^{-M_1}}{1 - z^{-1}}\right)^k \qquad (5)$$
$$H_1(z) = \left(\sum_{i=0}^{M_1 - 1} z^{-i}\right)^k \qquad (6)$$
The expansion of $H_1(z)$ results in an FIR filter of order $N$:
$$H_1(z) = \sum_{n=0}^{N} h_1(n)\, z^{-n} \qquad (7)$$
The coefficients of this filter are integers and symmetrical, $h_1(n) = h_1(N - n)$, where $N = k(M_1 - 1)$. Applying Polyphase decomposition to the filter of equation (7), we get
$$H_1(z) = \sum_{m=0}^{M_1 - 1} z^{-m} E_m(z^{M_1}), \qquad E_m(z) = \sum_{n=0}^{\lfloor (N - m)/M_1 \rfloor} h_1(nM_1 + m)\, z^{-n} \qquad (8)$$
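To make the expansion of the first-stage filter concrete, the following sketch computes its coefficients by repeated convolution of a length-$M_1$ all-ones sequence and splits them into $M_1$ polyphase components. It assumes $H_1(z)$ is the unnormalized $k$-th power of $1 + z^{-1} + \dots + z^{-(M_1 - 1)}$; the function names are illustrative only.

```python
def h1_coefficients(M1, k):
    """Coefficients of (1 + z^-1 + ... + z^-(M1-1))^k by repeated convolution."""
    h = [1]
    box = [1] * M1
    for _ in range(k):
        nxt = [0] * (len(h) + len(box) - 1)
        for i, hi in enumerate(h):
            for j, bj in enumerate(box):
                nxt[i + j] += hi * bj
        h = nxt
    return h


def polyphase_components(h, M1):
    """Component m holds the coefficients h(n*M1 + m), n = 0, 1, ..."""
    return [h[m::M1] for m in range(M1)]


if __name__ == "__main__":
    h = h1_coefficients(4, 2)
    print(h)                           # integer coefficients, symmetric about the middle tap
    print(polyphase_components(h, 4))  # one short subfilter per phase
```

The coefficient list has $k(M_1 - 1) + 1$ entries, so both the number of taps and their magnitude grow with $M_1$ and $k$.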
An efficient Polyphase implementation of $H_1(z)$ is shown in Fig. 3(b). As we can see, decimation takes place before filtering, so multiplications and additions are performed at a sampling frequency $M_1$ times lower than the frequency of the input signal. The subsequent filters decimating by 2 are simply a special case of the general case described above.
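The equivalence between filtering at the full rate followed by decimation and the Polyphase form, where decimation comes first, can be illustrated with a small behavioural model. This is a minimal sketch in plain Python, not the hardware architecture of the paper; the coefficient list used in the example is the expansion of $(1 + z^{-1} + z^{-2} + z^{-3})^2$, and the function names are hypothetical.

```python
def fir_then_decimate(x, h, M):
    """Reference: full-rate FIR filtering, then keep every M-th output."""
    y = []
    for n in range(0, len(x), M):
        acc = 0
        for i, hi in enumerate(h):
            if n - i >= 0:
                acc += hi * x[n - i]
        y.append(acc)
    return y


def polyphase_decimate(x, h, M):
    """Split x and h into M phases; every subfilter runs at the low rate."""
    n_out = (len(x) + M - 1) // M
    y = [0] * n_out
    for m in range(M):
        e_m = h[m::M]  # subfilter E_m: coefficients h(n*M + m)
        # Phase m of the input: x(n*M - m), with zeros before time 0.
        x_m = [x[n * M - m] if n * M - m >= 0 else 0 for n in range(n_out)]
        for n in range(n_out):
            for j, c in enumerate(e_m):
                if n - j >= 0:
                    y[n] += c * x_m[n - j]
    return y


if __name__ == "__main__":
    h = [1, 2, 3, 4, 3, 2, 1]         # (1 + z^-1 + z^-2 + z^-3)^2
    x = [1, -1, 1, 1, -1, -1, 1, -1, 1, 1, 1, -1, -1, 1, -1, 1]  # mono-bit style input
    print(fir_then_decimate(x, h, 4) == polyphase_decimate(x, h, 4))
```

In the Polyphase version every multiply-accumulate is indexed by the low-rate sample counter, which is the property that lets the arithmetic run at the reduced sampling frequency.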
Higher values of $M_1$ significantly reduce the sampling frequency of the first stage, which lowers power consumption. On the other hand, the order of the filter $H_1(z)$ increases with $M_1$, which implies more complex coefficients and a higher number of partial products. Note also that the wordlength of the polyphase filter increases, since it is equal to $(W_0 + k \log_2 M_1)$ bits.
The transposed form requires a larger wordlength for the intermediate registers, which can increase power consumption. The direct form has a long critical path, which limits the maximum sampling frequency of the filter. Since the use of Polyphase decomposition greatly reduces the operating frequency of the filter, the critical path is no longer a problem. Thus we have chosen the direct-form implementation.
Fig. 4 shows the general architecture for one stage of the Comb decimation filter. All the subfilters $E_0, E_1, \ldots, E_{M_1 - 1}$ resulting from the Polyphase decomposition operate at the same sampling frequency. One way of reducing the required hardware is to gather all additions from the different subfilters into one adder tree. This adder tree is also used in the multipliers to sum all the partial products. In fact, the partial products resulting from the different multiplications are gathered with the addition operations in the same adder tree. The Wallace tree [8] is an efficient realization of the adder tree. This technique is usually used in the implementation of high-speed multipliers.
[...] (1, 2, 3, 4, 5, 6 bits). These filters have been realized using the proposed system architecture described in Section III and the implementation described in Section IV. Table I lists the number of partial products $NP_1$ for all possible decimation factors $M_1$.
Three criteria have been chosen for evaluation: power consumption, area and maximum sampling frequency. Power consumption is estimated using equation (3). A similar equation can be deduced to estimate the area of the circuit. We assume, as in Section II, that the hardware required to add the multiplication partial products is dominant. The area, A, can then be defined as
The maximum operating frequency, $F_{max}$, [...] For multi-bit (6-bit) ΣΔ, minimum power consumption and area are achieved for decimation factors $M_1 = 2$ and $M_1 = 4$. Since a higher frequency of operation can be achieved with $M_1 = 4$, the implementation with $M_1 = 4$ is preferable.
In general, for multi-bit ΣΔ, we can see that, as the number of bits at the input of the Comb filter decreases, the proposed architecture becomes more advantageous. Although the main purpose of this architecture was to achieve low power consumption, significant improvements in area and maximum sampling frequency have also been obtained.
VI. CONCLUSION
Low-power implementations of a Comb decimation filter for mono-bit and multi-bit ΣΔ A/D converters have been presented. A multi-stage polyphase structure with maximum decimation factor in the first stage has been used. The proper choice of this first-stage decimation factor can significantly improve power consumption, area and maximum sampling frequency. In order to find this optimum first-stage decimation factor, simple equations have been developed to estimate the circuit performance of the proposed architecture. Gathering all the partial-product additions into one adder tree has also considerably reduced the hardware required for the circuit.
