Introduction: Decimation is one of the most common and useful operations in communication systems. In a receiver or a digital radio, the incoming signal has to be decimated and its spectrum limited according to the given channel filter requirements. The cascaded integrator comb (CIC) filter introduced by Hogenauer [1] is the first in the chain of filters following the analogue-to-digital converter, as it can support the high incoming sample rate and provide high decimation ratios. CIC filters are available as two's complement arithmetic designs, as a Xilinx IP block [2], and as commercial off-the-shelf (COTS) ICs from Harris/Intersil, such as the HSP43220 [3]. Our previous attempt used residue number system (RNS) arithmetic [4]. Although the RNS implementation improved the speed of the design, the overall cost, measured as the product of area and time (A * T), was not favourable.
CIC filter theory:
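The principal blocks of a CIC filter are an integrator and a differentiator (comb) with a rate changer in between. The transfer function of a CIC decimation filter with S stages, referred to the input sample rate, is given by
$$H(z) = \left( \frac{1 - z^{-RD}}{1 - z^{-1}} \right)^{S}$$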
where D is the number of delays in the comb section and R is the downsampling factor. The CIC filter with two's complement adders uses the fewest resources, but as the number of stages and the number of input bits increase, the adder becomes very slow owing to carry ripple. Several techniques for multiple-operand addition that attempt to lower the carry propagation penalty have been proposed and implemented [5]. Among these, carry save adders (CSAs) are the fastest, since there is no carry propagation until the last stage; in all other stages a partial sum and a sequence of carries are generated separately. In our designs we have incorporated these adders only in the integrator sections; the comb sections use two's complement addition. Owing to the feedback in the integrator, the standard CSA design becomes larger with the number of stages in the CIC design. The first stage has a (3,2) CSA, followed by a (5,3) CSA, and all subsequent stages have (6,3) CSAs, as shown in Fig. 1. Thus, as the number of stages increases, the performance of this adder deteriorates: it uses more silicon resources and its speed of computation decreases. This drawback was overcome by the use of a 'modified CSA' (MCSA), which is obtained by combining multiple (3,2) CSAs in a so-called Wallace tree. Fig. 2 shows the resulting MCSA structure.
Pruning technique: The quantisation introduced through pruning in the final stage is very large compared with the quantisation introduced in the output by pruning some least significant bits (LSBs) at the previous stages. If $\sigma_{T,2S+1}^2$ is the quantisation noise introduced through pruning at the output, Hogenauer suggested setting it equal to the sum of the (truncation) noise $\sigma_{T,k}^2$ introduced by all previous sections. From Figs. 1 and 2 it can be seen that the CSA and MCSA designs introduce more noise sources than the original two's complement design.
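As a rough illustration of the baseline rule, before any adjustment for the extra CSA or MCSA noise sources, the following Python sketch computes how many LSBs could be discarded at each stage when the output truncation noise budget is shared equally among the noise sources. It is a minimal sketch under stated assumptions and not the program used to produce the results in this Letter: the function names, the polynomial helpers and the `n_sources` parameter are hypothetical, and the equal split of the budget over `n_sources` is an assumption made here for experimentation, not either of the adjusted equations.

```python
def poly_mul(a, b):
    """Multiply two polynomials in z^-1 given as integer coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def poly_pow(p, n):
    """Raise a polynomial to a non-negative integer power."""
    out = [1]
    for _ in range(n):
        out = poly_mul(out, p)
    return out


def noise_gain_sq(j, S, R, D):
    """Squared gain from an error injected at stage j to the filter output.

    Stages 1..S are integrators, stages S+1..2S are combs.
    """
    if j <= S:
        boxcar = [1] * (R * D)                     # (1 - z^-RD) / (1 - z^-1)
        comb_hi = [1] + [0] * (R * D - 1) + [-1]   # 1 - z^-RD at the input rate
        h = poly_mul(poly_pow(boxcar, S - j + 1), poly_pow(comb_hi, j - 1))
    else:
        comb_lo = [1] + [0] * (D - 1) + [-1]       # 1 - z^-D at the output rate
        h = poly_pow(comb_lo, 2 * S + 1 - j)
    return sum(c * c for c in h)


def discarded_bits(S, R, D, b_in, b_out, n_sources=None):
    """Return (LSBs discarded at stages 1..2S, LSBs discarded at the output).

    n_sources is a hypothetical knob: the number of noise sources that share
    the output noise budget equally (assumed default: one per stage, 2S).
    """
    n = 2 * S if n_sources is None else n_sources
    # Full register width: input bits plus ceil(S * log2(R * D)) growth bits.
    width = ((R * D) ** S - 1).bit_length() + b_in
    b_last = max(width - b_out, 0)
    budget = 4 ** b_last                           # 12 * output noise variance
    bits = []
    for j in range(1, 2 * S + 1):
        f2 = noise_gain_sq(j, S, R, D)
        b = 0
        # Largest b with (4**b / 12) * f2 <= (4**b_last / 12) / n, in integers.
        while 4 ** (b + 1) * n * f2 <= budget:
            b += 1
        bits.append(b)
    return bits, b_last
```

For example, `discarded_bits(4, 25, 1, 16, 16)` returns `([1, 6, 9, 13, 14, 15, 16, 17], 19)`. Passing a larger `n_sources` can only keep or reduce the number of bits discarded per stage, which is the qualitative effect of additional noise sources. Integer arithmetic is used throughout so that cases lying exactly on a rounding boundary are not affected by floating-point error.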
The MCSA and CSA configurations introduce a total of S and (2S − 1) additional noise sources, respectively. These additional noise sources can be accounted for by adjusting (A17) in Hogenauer's pruning equations in one of two ways. In the error distribution technique, we distribute the additional noise sources over all stages, including the comb sections, i.e. we replace Hogenauer's equation (A17) with
In the direct quantisation noise adjustment, we reduce the extra noise in each stage by scaling all noise sources to the allocated noise margin for that stage. In this case we replace Hogenauer's (A17) by
for CSA with k = 2, ...
The comb section is unchanged in this case. The cic.exe program from [5] was modified to compute the modified bit widths for the CSA and MCSA designs, and the results for the COTS IC HSP43220 are shown in Table 1. Error distribution yields larger required bit widths in the comb sections, and we therefore used the direct quantisation method without error distribution (shown in bold in Table 1) for our designs. The synthesis results are shown in Tables 2 and 3. These tables show only the best results with respect to the area * time cost metric. For a complete listing, including the optimum synthesis results for maximum speed optimisation, we refer to [6].
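To make the arithmetic described under 'CIC filter theory' concrete, the following behavioural Python sketch models a CIC decimator whose integrators keep their state as separate sum and carry words, and resolves the carries with a single carry-propagate addition just before the rate change. It is a sketch under assumptions and not the CSA or MCSA hardware of Figs. 1 and 2: each integrator here reduces four operands to two with a pair of (3,2) cells, no pruning or pipelining is modelled, and all function names and parameters are hypothetical.

```python
def csa32(a, b, c, mask):
    """(3,2) carry save adder on W-bit words.

    Returns (s, cy) with s + cy == a + b + c (mod 2**W).
    """
    s = (a ^ b ^ c) & mask
    cy = (((a & b) | (a & c) | (b & c)) << 1) & mask
    return s, cy


def to_signed(v, width):
    """Interpret a W-bit word as a two's complement integer."""
    return v - (1 << width) if v >> (width - 1) else v


def cic_decimate(x, S, R, D, width, carry_save=True):
    """Behavioural S-stage CIC decimator with wrap-around W-bit registers.

    carry_save=True keeps each integrator as a (sum, carry) pair;
    carry_save=False uses plain two's complement accumulation.
    """
    mask = (1 << width) - 1
    sums = [0] * S
    carries = [0] * S
    comb_delays = [[0] * D for _ in range(S)]
    out = []
    for n, sample in enumerate(x):
        s_in, c_in = sample & mask, 0
        for k in range(S):
            if carry_save:
                # Two (3,2) cells fold state (2 words) and input (2 words) into 2.
                t_s, t_c = csa32(sums[k], carries[k], s_in, mask)
                sums[k], carries[k] = csa32(t_s, t_c, c_in, mask)
            else:
                sums[k] = (sums[k] + s_in + c_in) & mask
            s_in, c_in = sums[k], carries[k]
        if (n + 1) % R:
            continue                      # keep only every R-th sample
        v = (s_in + c_in) & mask          # single carry-propagate addition
        for k in range(S):
            delayed = comb_delays[k].pop(0)
            comb_delays[k].append(v)
            v = (v - delayed) & mask      # comb: y[m] = x[m] - x[m - D]
        out.append(to_signed(v, width))
    return out
```

With S = 3, R = 4, D = 1, a 14-bit register and a constant input of 1, the model returns 20, 60, 64, 64, settling at the DC gain (RD)^S = 64. Because both paths compute the same sums modulo 2^W, the carry-save and plain two's complement variants agree sample for sample for any input.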
Conclusion: This Letter evaluates the use of carry save adders (CSAs) to implement Hogenauer's popular CIC filters. The quantisation error analysis for the CSA designs shows that no more than one additional guard bit of precision is needed compared with Hogenauer's two's complement (2C) pruning. Synthesis results for a typical design example, as used in the Harris/Intersil HSP43220, have been compiled. Compared with the conventional two's complement design, they show a speed improvement of 84% to 164% and an improvement of up to 31% in the cost metric (A * T) for the Altera EPF10K130EQC240-1 FPLD, and a speed improvement of 53% to 129% and an improvement of up to 44% in the cost metric (A * T) using CBIC Synopsys tools.
