A new high-speed, programmable FIR filter is presented. It is a multiplierless filter whose coefficients are encoded in canonic signed-digit (CSD) form. This encoding scheme improves the speed of the filter and reduces its area. To make the filter more widely applicable, we employ a new programmable CSD encoding structure that makes the CSD coefficients programmable. Finally, we design a 10-bit, 18-tap video luminance filter with the presented filter structure. The completed filter core occupies 6.8 × 6.8 mm² of silicon area in Wu-Xi Shanghua 0.6-µm 2P2M CMOS technology, and its maximum operating frequency is 100 MHz.
I. Introduction
Finite impulse response (FIR) filters are used ever more widely in video and communication circuits, and higher performance in speed and area is demanded. The traditional FIR filter structure [1], shown in Fig. 1, no longer meets the speed demands of high-performance systems, because it is limited by its multiplier and adder circuits.
The transfer function of the FIR filter is

$$H(z) = \sum_{k=0}^{M-1} h_k z^{-k}, \tag{1}$$

where $h_k$ is the coefficient of tap $k$.
The critical path delay of the FIR filter in Fig. 1 is $T_M + M T_A$, where $T_M$ is the delay of one multiplication, $T_A$ is the delay of one adder, and $M$ is the number of taps. The critical path therefore grows linearly with the number of taps. High-sample-rate applications call for the use of custom application-specific integrated circuits (ASICs), because programmable signal processors (such as DSPs) cannot accommodate such high sample rates without an excessive amount of parallel processing, and for dedicated applications the flexibility of a filter with high-speed multipliers [2] is not necessary.
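To make the delay argument concrete, the following is a minimal behavioral sketch of a direct-form FIR filter, together with the critical-path expression quoted above. The function names and the delay values in the usage note are illustrative assumptions, not taken from the design.

```python
def direct_form_fir(samples, coefficients):
    """Direct-form FIR: y[n] = sum over k of h[k] * x[n - k], with x[n] = 0 for n < 0."""
    outputs = []
    for n in range(len(samples)):
        acc = 0
        for k, h in enumerate(coefficients):
            if n - k >= 0:
                # One multiplication per tap, followed by one addition in the chain.
                acc += h * samples[n - k]
        outputs.append(acc)
    return outputs


def critical_path_delay(t_mult, t_add, taps):
    """Critical path of the direct form as stated in the text: T_M + M * T_A."""
    return t_mult + taps * t_add
```

With hypothetical delays of 5 time units per multiplication and 1 per addition, `critical_path_delay(5.0, 1.0, 18)` gives 23.0 units for an 18-tap filter, and the adder term dominates as taps are added.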
In this paper we present a new high-speed CSD-coefficient FIR filter structure, developed by studying CSD-coefficient filters, Booth multipliers and high-speed adders. With this structure, high-speed FIR filters of any order can be implemented, and the critical path delay is nearly independent of the number of taps. To make the filter more widely applicable, we employ a new programmable CSD encoding structure that makes the CSD coefficients programmable. Section II addresses CSD encoding in FIR filters, Section III the programmable CSD encoding structure, and Section IV the structure of the partial-product adder tree. Finally, in Section V we use this type of filter to implement a 10-bit, 18-tap video luminance filter.
II. CSD encoding in FIR filter
The traditional FIR filter is shown in Fig. 1. For fixed-coefficient filters, however, we can simply shift the data bus to the left or right by an appropriate number of bits and employ a small number of adders/subtracters instead of multipliers. The resulting hardware complexity is a small fraction of that of a general filter with multipliers, so a significantly larger number of taps can be integrated into a single chip.

Fig. 2. Frequency response and CSD coefficients for the 10-bit, 18-tap luminance filter.
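As an illustration of replacing a multiplier with shifts and adders/subtracters, the sketch below multiplies a sample by a fixed coefficient expressed as a short list of signed powers of two. The function name and the `(sign, shift)` term format are assumptions made for this example.

```python
def shift_add_multiply(sample, terms):
    """Multiply by a fixed coefficient given as signed powers of two.

    terms is a list of (sign, shift) pairs, so the coefficient equals
    the sum of sign * 2**shift over all pairs.
    """
    result = 0
    for sign, shift in terms:
        shifted = sample << shift          # a shift is only wiring in hardware
        result += shifted if sign > 0 else -shifted
    return result


# Coefficient 7 written as 8 - 1: one subtracter instead of a multiplier.
SEVEN = [(+1, 3), (-1, 0)]
```

Here `shift_add_multiply(25, SEVEN)` returns 175, using a single subtraction where the plain binary form 4 + 2 + 1 would need two additions.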
Any fractional number $x$ can be written as [3]

$$x = \sum_{k=1}^{L} s_k 2^{-p_k}, \tag{2}$$
where $s_k \in \{-1, 0, 1\}$ and $p_k \in \{0, 1, \ldots, M\}$. The representation given by (2) has $M+1$ total (ternary) digits and $L$ nonzero digits. A canonic signed-digit (CSD) representation is the minimal representation in which no two nonzero digits $s_k$ are adjacent. The number of adders/subtracters required to realize a CSD coefficient is therefore one less than the number of nonzero digits in the fraction. Because any FIR filter coefficient can be translated into a CSD coefficient [4], we developed a MATLAB program that generates the CSD code of a general FIR filter's coefficients. The CSD coefficients and frequency response of a 10-bit, 18-tap luminance filter are shown in Fig. 2.
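The MATLAB program itself is not listed in the paper. As a stand-in, the following Python sketch shows one common way to obtain CSD digits, the non-adjacent-form recoding of an integer. It assumes the coefficient has already been scaled to an integer (for example, a fraction multiplied by a power of two), and the function names are hypothetical.

```python
def to_csd(value):
    """Return the CSD digits of an integer, least significant digit first.

    Each digit is -1, 0 or +1, and no two adjacent digits are nonzero.
    """
    digits = []
    while value != 0:
        if value & 1:
            # +1 when the two low bits are 01, -1 when they are 11.
            digit = 2 - (value & 3)
            value -= digit
        else:
            digit = 0
        digits.append(digit)
        value >>= 1
    return digits


def csd_value(digits):
    """Rebuild the integer from its CSD digits (least significant first)."""
    return sum(d << i for i, d in enumerate(digits))


def adders_needed(digits):
    """Adders/subtracters for one coefficient: nonzero digits minus one."""
    nonzero = sum(1 for d in digits if d != 0)
    return max(nonzero - 1, 0)
```

For example, `to_csd(23)` returns `[-1, 0, 0, -1, 0, 1]`, that is 32 - 8 - 1, which needs two adders/subtracters, whereas the binary form 10111 has four nonzero bits and would need three.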
III. Programmable CSD encoding structure
The complexity of an FIR filter decreases sharply when CSD-coefficient multipliers replace conventional multipliers, but its flexibility decreases as well. In this paper we explore a new programmable CSD encoding structure that keeps the complexity low while restoring flexibility. Fig. 3 shows the CSD encoding structure for one tap; its CSD coefficients are generated by the MATLAB program. A CSD coefficient has no more than three nonzero digits. The shift operation corresponding to the position of each nonzero digit is listed in Table 1. The input of the CSD encoding structure is a 5-bit signed binary number. The MSB is the sign bit, which selects negation; the four low bits give the number of left-shift positions. If the four low bits are all ones ("1111"), the output is all zeros. Finally, the three outputs (partial products) of one tap are added together.
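The behavior described above can be sketched as a small software model. It assumes that one tap takes three 5-bit control words (one per nonzero digit) and that the four low bits are used directly as the left-shift count. The exact mapping in Table 1 is not reproduced here, so both points are assumptions, and all names are hypothetical.

```python
ZERO_CODE = 0b1111  # four low bits all ones: this partial product is forced to zero


def partial_product(sample, control_word):
    """Decode one 5-bit control word: MSB = negate, low four bits = left shift."""
    shift = control_word & 0b1111
    negate = (control_word >> 4) & 1
    if shift == ZERO_CODE:
        return 0
    shifted = sample << shift
    return -shifted if negate else shifted


def csd_tap(sample, control_words):
    """One programmable tap: the sum of three decoded partial products."""
    if len(control_words) != 3:
        raise ValueError("one tap takes exactly three control words")
    return sum(partial_product(sample, word) for word in control_words)
```

Programming the coefficient 7 = 8 - 1 takes the words `0b00011` (plus, shift by 3), `0b10000` (minus, shift by 0) and `0b01111` (unused digit), so `csd_tap(25, [0b00011, 0b10000, 0b01111])` evaluates to 175. Changing the coefficient only means loading different control words, with no change to the datapath.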
With the programmable CSD encoding structure above, the number of partial products and the internal data length both decrease, while the resolution of the filter does not degrade. An N-bit × N-bit Booth multiplier has [N/2] partial products [5] [6] and an internal data length of 2N bits, whereas the CSD encoding structure has three partial products, and its internal data length is smaller than N, set to guarantee that the truncation error is less than the quantization noise. The programmable CSD encoding structure is therefore advantageous in both complexity and flexibility.
IV. Partial product adder tree

A. Wallace adder tree in Booth multiplier
The multiplier is a fundamental unit in digital signal processing circuits, and research on multiplier architectures has grown accordingly. The multipliers (Fig. 4) in [5] [6] employ the modified Booth algorithm and a parallel Wallace adder tree. Such a multiplier consists of three parts: the Booth encoder that generates the partial products, the partial-product adder array, and the final adder. To improve parallelism, we adopt a 4:2 compression adder instead of a 3:2 full adder in the partial-product adder array. The final adder then adds the 2N-bit C (Carry) and S (Sum) words and generates the product.
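A word-level sketch of the reduction step may help here. It models a 4:2 compression adder as two cascaded 3:2 carry-save rows, which is one common construction and an assumption about the actual cell, and it reduces a list of partial products to a Sum word and a Carry word while counting the levels used. All names are illustrative.

```python
def carry_save_add(a, b, c):
    """3:2 full-adder row on whole words: a + b + c == sum_word + carry_word."""
    sum_word = a ^ b ^ c
    carry_word = ((a & b) | (a & c) | (b & c)) << 1
    return sum_word, carry_word


def compress_4_2(a, b, c, d):
    """4:2 compression: four words in, two words out, total preserved."""
    s1, c1 = carry_save_add(a, b, c)
    return carry_save_add(s1, c1, d)


def reduce_partial_products(operands):
    """Reduce operands to (sum_word, carry_word, levels) without carry propagation."""
    words = list(operands)
    levels = 0
    while len(words) > 2:
        full = len(words) - len(words) % 4
        next_words = []
        for i in range(0, full, 4):
            next_words.extend(compress_4_2(*words[i:i + 4]))
        leftover = words[full:]
        if len(leftover) == 3:
            next_words.extend(carry_save_add(*leftover))  # odd group: one 3:2 row
        else:
            next_words.extend(leftover)                   # 1 or 2 words pass through
        words = next_words
        levels += 1
    while len(words) < 2:
        words.append(0)
    return words[0], words[1], levels
```

For eight operands the loop runs 8 → 4 → 2, that is two levels, and only the final `sum_word + carry_word` addition needs a carry-propagating adder.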
B. Partial product adder tree in FIR filters
While designing FIR filters, we developed an adder tree suited to them by studying the Wallace adder tree used in the Booth multiplier. The CSD encoding structure of one tap outputs three partial products, so an N-tap FIR filter has 3N partial products. To add all of these partial products, we need [log_2 3N] levels of partial-product adders (4:2 compression adders), which produce two final partial products, C (Carry) and S (Sum).
C. Final adder
The critical path delay of the partial-product adder array in the FIR filter is only the delay of 2×[log_2 3N] serial full adders. The final adder, however, is an (N+M)-bit full adder (M is the number of guard bits that reduce the truncation error), and its delay is greater than that of the adder array. We therefore adopt a carry-select adder (CSA) built from carry look-ahead adders (CLA). The adder is shown in Fig. 6; it uses two 4-bit CLAs to compose one 8-bit CSA. The critical path delay is that of one 8-bit CLA plus some multiplexers, so an adder of this structure can work at high speed.

Fig. 7. Programmable CSD-coefficient FIR filter.
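The following bit-level sketch illustrates the idea of a carry-select stage built from 4-bit carry look-ahead blocks. Fig. 6 is not reproduced here, so the arrangement is an assumed textbook-style one: the lower block is computed directly, and the upper block is computed for both possible incoming carries and then selected by a multiplexer. Function names are hypothetical.

```python
def cla4(a, b, carry_in):
    """4-bit carry look-ahead adder: returns (4-bit sum, carry_out)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]      # generate signals
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]    # propagate signals
    c0 = carry_in
    # Every carry is a two-level function of g, p and c0 (no ripple).
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    carries = [c0, c1, c2, c3]
    total = sum((p[i] ^ carries[i]) << i for i in range(4))
    return total, c4


def carry_select_add8(a, b, carry_in=0):
    """8-bit carry-select adder: returns (8-bit sum, carry_out)."""
    low_sum, low_carry = cla4(a & 0xF, b & 0xF, carry_in)
    # The upper nibble is evaluated for both carry values in parallel.
    high_if_0 = cla4((a >> 4) & 0xF, (b >> 4) & 0xF, 0)
    high_if_1 = cla4((a >> 4) & 0xF, (b >> 4) & 0xF, 1)
    high_sum, carry_out = high_if_1 if low_carry else high_if_0  # multiplexer
    return (high_sum << 4) | low_sum, carry_out
```

Because the upper block does not wait for the lower carry, the delay of the stage is roughly one look-ahead block plus the multiplexer, which is the property the text relies on for the final adder.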
V. Implementation of programmable CSD
We construct the programmable CSD-coefficient filter (Fig. 7) from the modules described above. In Fig. 7, the top is the programmable CSD encoding structure, the middle is the 4:2 compression adder tree, and the bottom is the carry-select adder built from carry look-ahead adders. To eliminate the DC gain, the filtered signal is passed to a DC-gain cancellation module.
For a digital video encoder (DVE) system, we designed a 10-bit, 18-tap high-speed luminance filter using the above FIR filter structure. Its frequency response is shown in Fig. 2. To program the coefficients of the FIR filter, we designed a slave-mode I²C bus controller. The whole chip is implemented in Wuxi-Shanghua 0.6-µm 2P2M CMOS technology. The filter core area is 6.8 mm × 6.8 mm, and the maximum operating frequency is 100 MHz.
VI. Conclusion
In this paper we have presented a new high-speed, programmable FIR filter. It is a multiplierless filter whose coefficients are encoded in CSD form. This encoding scheme improves the speed of the filter and reduces its area. To make the filter more widely applicable, we employ a new programmable CSD encoding structure that makes the CSD coefficients programmable. We have also designed a 10-bit, 18-tap video luminance filter with the presented filter structure. The completed filter core occupies 6.8 × 6.8 mm² of silicon area in Wu-Xi Shanghua 0.6-µm 2P2M CMOS technology, and its maximum operating frequency is 100 MHz.
