Abstract-FIR filters are widely used in high-fidelity signal processing. The traditional approaches to FIR filter design, namely the serial DA algorithm, the parallel DA algorithm, and the combined serial-parallel DA algorithm, each have shortcomings that make them a poor fit. In this design, we improve the DA algorithm by using the 4-BAAT method, which optimizes the structure of the LUT (look-up table), to realize the FIR filter. Multiplication by the simple tap coefficients is performed by shifting and sign-bit inversion, which increases speed and saves logic resources. The design is written in Verilog HDL, and FDATool is used to determine and quantize the coefficients from the given filter parameters. Quartus II 8.0 is then used for synthesis, placement, and routing, and Modelsim for simulation and verification. Compared with a filter built on the standard DA algorithm, this filter reduces complexity, saves logic resources, and runs faster; it also offers good reconfigurability, a simple hardware structure, and good real-time performance.
I. INTRODUCTION
FIR filters offer good stability and linear phase in digital signal processing, so they are widely used in high-fidelity applications such as digital audio, image processing, and biomedical signal processing [1].
There are currently three ways to implement an FIR filter in hardware: a dedicated digital integrated circuit, a DSP chip, and a programmable logic device [2].
With its regular logic blocks and abundant routing resources, an FPGA is well suited to FIR filters with a fine-grained, highly parallel structure. In addition, the parallelism and scalability of an FPGA are better than those of a general-purpose DSP chip, which is dominated by serial arithmetic, so FPGA-based FIR filters have good application prospects.
However, an FPGA implementation of an FIR filter needs a multiply-and-accumulate (MAC) structure, which consumes a large amount of logic resources [3].
We therefore optimize the DA algorithm to address this problem.
In this design, we study an FIR low-pass filter and focus on the look-up table technique of the DA algorithm, using FDATool to determine and quantize the coefficients from the given filter parameters. To increase speed and save logic resources, we adopt the 4-BAAT method to optimize the structure of the LUT (look-up table), and we handle the simple tap coefficients by shifting and sign-bit inversion.
II. BRIEF INTRODUCTION OF THE FIR FILTER AND DA ALGORITHM

A. The introduction of the FIR filter
The FIR filter is described by
$$y(n) = \sum_{k=0}^{K-1} h(k)\, x(n-k)$$
where K is the number of filter taps; h(k) is the k-th tap coefficient, that is, the unit impulse response; x(n-k) is the input signal delayed by k taps; and x(n) and y(n) are the input and output, respectively [4].
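The convolution sum can be illustrated with a minimal Python sketch. The function name `fir_direct` and the 4-tap coefficient list are hypothetical and chosen only for illustration; samples before n = 0 are assumed to be zero.

```python
def fir_direct(h, x):
    """Direct-form FIR: y(n) = sum over k of h(k) * x(n - k)."""
    K = len(h)
    y = []
    for n in range(len(x)):
        acc = 0
        for k in range(K):
            if n - k >= 0:              # samples before n = 0 are taken as zero
                acc += h[k] * x[n - k]  # one multiplication per tap
        y.append(acc)
    return y


# Hypothetical 4-tap example: a unit impulse reproduces the tap coefficients.
print(fir_direct([1, 2, 3, 4], [1, 0, 0, 0, 0]))
```

Each output sample costs K multiplications in this form, which is the cost the rest of the design tries to avoid.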
Its basic structure is shown in Fig. 1. A linear-phase FIR filter can save resources because its impulse response coefficients are symmetric.
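The saving from symmetry can be sketched as follows: when h(k) = h(K-1-k), the two samples that share a coefficient are added first and multiplied once. The sketch assumes an even number of taps; `fir_symmetric` and the numeric values are hypothetical.

```python
def fir_symmetric(half, window):
    """Linear-phase FIR with an even number of taps.

    half   : first K/2 coefficients, h(0) .. h(K/2 - 1)
    window : the K most recent samples, window[k] = x(n - k)
    """
    K = len(window)
    assert K == 2 * len(half)
    acc = 0
    for k, c in enumerate(half):
        pre_sum = window[k] + window[K - 1 - k]  # add the mirrored pair first
        acc += c * pre_sum                       # one multiplication per pair
    return acc


half = [1, -2, 3]            # hypothetical coefficients
full = half + half[::-1]     # symmetric impulse response
window = [5, -1, 4, 0, 2, 7]
direct = sum(c * s for c, s in zip(full, window))
print(fir_symmetric(half, window) == direct)
```

The folded form needs K/2 multiplications instead of K, at the price of K/2 extra additions.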
Accordingly, the core of FIR filter design is computing a K-term convolution, which leads to a large number of multiplications in a hardware implementation. We therefore use the optimized DA algorithm to solve this problem.
B. The introduction of the DA algorithm
Using a LUT (look-up table), the DA algorithm replaces the multiplications in the convolution with table look-ups and additions, which saves computation time.
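As an illustration of the idea, the sketch below implements the textbook bit-serial form of DA for two's-complement inputs. It is not the improved structure proposed in this paper; the function names and the coefficient values are hypothetical.

```python
def build_da_lut(coeffs):
    """Entry `addr` holds the sum of the coefficients whose address bit is 1."""
    n = len(coeffs)
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << n)]


def da_inner_product(coeffs, samples, bits=8):
    """Bit-serial DA for signed, `bits`-wide two's-complement samples."""
    lut = build_da_lut(coeffs)
    mask = (1 << bits) - 1
    words = [s & mask for s in samples]           # two's-complement bit patterns
    acc = 0
    for b in range(bits):
        addr = 0
        for k, w in enumerate(words):
            addr |= ((w >> b) & 1) << k           # bit b of every sample forms the address
        term = lut[addr] << b                     # look-up, then shift by the bit weight
        acc += -term if b == bits - 1 else term   # the sign bit carries negative weight
    return acc


coeffs = [3, -1, 4, 2]            # hypothetical coefficients
samples = [3, -7, 100, -128]      # 8-bit signed inputs
print(da_inner_product(coeffs, samples) == sum(c * s for c, s in zip(coeffs, samples)))
```

No multiplier appears in `da_inner_product`: every step is a table read, a shift, and an addition.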
DA algorithms come in three forms: serial, parallel, and combined serial-parallel. The serial DA algorithm has a relatively simple structure and occupies few resources, but it is slow because its processing time grows with the data word length. The parallel DA algorithm has a regular structure and is widely used where high speed is required, but it consumes more resources [5, 6].
Whatever the variant, a ROM serves as the look-up table. Experiments show that as the filter order increases, the ROM size grows as a power of 2 [7].
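The growth can be made concrete with a short sketch. It compares a single ROM addressed by one bit from every tap with the common remedy of splitting the taps into groups of four, each with its own small ROM. The partitioning shown is a standard DA technique and is given only for comparison; the function names are hypothetical.

```python
def full_lut_words(taps):
    """One ROM addressed by one bit from every tap: 2**taps words."""
    return 1 << taps


def partitioned_lut_words(taps, group=4):
    """Several smaller ROMs of `group` inputs each, whose outputs are added."""
    full_groups, rest = divmod(taps, group)
    words = full_groups * (1 << group)
    if rest:
        words += 1 << rest     # one smaller ROM for the leftover taps
    return words


for taps in (4, 8, 16):
    print(taps, full_lut_words(taps), partitioned_lut_words(taps))
```

For 16 taps the single ROM needs 65536 words, while four 16-word ROMs need 64 words plus the adders that combine them.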
Therefore, we optimize the look-up table structure of the combined serial-parallel DA algorithm to balance processing speed against logic resource usage.
III. DESIGN OF IMPROVED FIR
Because filter performance parameters have been studied in many papers, this design does not focus on them; it mainly discusses how to optimize the hardware structure of the filter once its parameters are given.
To determine the impulse response coefficients conveniently, the filter is designed with FDATool in MATLAB [8]. The specifications are as follows: the sampling frequency is 10 MHz, the cutoff frequency is 1 MHz, the filter order is 16, the input data width is 8 bits, and the coefficient data width is 8 bits. The impulse response coefficients generated by FDATool and the quantized coefficients are shown in Table 1.
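The quantization step can be sketched as scaling and rounding. The scale factor of 2^9 is an assumption, not a value stated in the paper; it is used here because it reproduces the two quantized values that can be read from Table 1 (80 and 100). The function name `quantize` is hypothetical.

```python
def quantize(coeff, frac_bits=9):
    """Round a real coefficient to an integer with `frac_bits` fractional bits.

    frac_bits=9 is an ASSUMED scale, chosen to match the Table 1 values.
    """
    return round(coeff * (1 << frac_bits))


print(quantize(0.155284877548770))   # coefficient of h(6) and h(9)
print(quantize(0.194576857477141))   # coefficient of h(7) and h(8)
```

With this scale the largest coefficient becomes 100, which fits in an 8-bit signed word.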
Multiplication by the first four quantized coefficients (0, -1, -2, 4) can be done by shifting and sign-bit inversion, which saves considerable time and resources. For the four general coefficients (21, 49, 80, 100), we use the improved look-up table method to perform the multiplications.
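A minimal sketch of the multiplier-free products for the four simple coefficients is shown below. The function name `mul_simple` is hypothetical, and Python integers stand in for the fixed-width signed words of the hardware.

```python
def mul_simple(x, coeff):
    """Multiply by 0, -1, -2 or 4 using only shifts and sign inversion."""
    if coeff == 0:
        return 0               # product is always zero
    if coeff == -1:
        return -x              # sign inversion only
    if coeff == -2:
        return -(x << 1)       # shift left by one, then invert the sign
    if coeff == 4:
        return x << 2          # shift left by two
    raise ValueError("not a simple coefficient")


print(all(mul_simple(x, c) == x * c
          for c in (0, -1, -2, 4)
          for x in range(-256, 256)))
```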
During the look-up using 4-BAAT (4 bits at a time), the sign bit does not take part in the look-up and is passed directly to the result as its sign bit.

Table 1. Impulse response coefficients and quantized coefficients (rows for h(6) to h(9))
Coefficient      Value                Quantized    Binary
h(6) = h(9)      0.155284877548770    80           0101 0000
h(7) = h(8)      0.194576857477141    100          0110 0100

The schematic diagram of the improved DA algorithm is shown in Fig. 2. Because the FIR filter uses a linear-phase symmetric structure, the impulse response coefficients are symmetric, so we add the inputs x(n) in symmetric pairs before the look-up to simplify the design. The SRL box loads the four 8-bit sums; the right column holds the four lower bits of each number, and the left column holds the four higher bits.
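One plausible reading of this look-up is sketched below: each general coefficient has a 16-entry table indexed by a 4-bit group, the table is read once for the lower four bits and once for the higher four bits, and the sign is applied outside the table. The table layout, the function names, and the restriction to an 8-bit magnitude are assumptions made for illustration, not the authors' implementation.

```python
def build_nibble_lut(coeff):
    """16-entry table holding coeff * nibble for nibble = 0 .. 15."""
    return [coeff * nibble for nibble in range(16)]


def lut_multiply(lut, value):
    """Multiply a signed value by the coefficient behind `lut`.

    ASSUMES |value| fits in 8 bits, so two look-ups are enough.
    """
    negative = value < 0
    magnitude = -value if negative else value
    assert magnitude < 256, "sketch assumes an 8-bit magnitude"
    low = lut[magnitude & 0xF]                 # four lower bits
    high = lut[(magnitude >> 4) & 0xF]         # four higher bits
    product = low + (high << 4)                # shift summation
    return -product if negative else product   # sign handled outside the table


lut_80 = build_nibble_lut(80)
print(lut_multiply(lut_80, -93) == 80 * -93)
```

Under this reading each general coefficient costs 16 table words and one addition per product, independent of the number of taps.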
The FPGA-based FIR filter is composed of five modules: input, pre-addition, LUT, shift summation, and adder tree. The module structure is shown in Fig. 3.
Figure 1. The basic structure of linear phase type FIR filter
Figure 3. The structure diagram of modules
The input module accepts the sampled input data and passes it to the pre-addition module, which adds the two samples that share the same impulse response coefficient. The LUT module turns each multiplication into a look-up using the 4-BAAT search method. The shift summation module shifts and adds the values returned by the LUT. The adder tree module adds the products of the pre-added data and the coefficients, which completes the summation inherent in the convolution.
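The data flow through the modules can be checked with a behavioural sketch. The coefficient order h(0) to h(7) = 0, -1, -2, 4, 21, 49, 80, 100 is an assumption based on the order in which the text lists the values, and all function names are hypothetical; this is a software model, not the Verilog design.

```python
SIMPLE = [0, -1, -2, 4]          # handled by shift / sign inversion
GENERAL = [21, 49, 80, 100]      # handled by look-up tables
HALF = SIMPLE + GENERAL          # ASSUMED order h(0) .. h(7)
TAPS = HALF + HALF[::-1]         # symmetric 16-tap impulse response

LUTS = {c: [c * nib for nib in range(16)] for c in GENERAL}


def pre_add(window):
    """Pre-addition: add the two samples that share a coefficient."""
    return [window[k] + window[15 - k] for k in range(8)]


def simple_product(value, coeff):
    """Shift / sign-inversion products for 0, -1, -2 and 4."""
    return {0: 0, -1: -value, -2: -(value << 1), 4: value << 2}[coeff]


def lut_product(value, coeff):
    """One look-up per 4-bit group, then shift summation; sign applied last."""
    magnitude, shift, acc = abs(value), 0, 0
    while magnitude:
        acc += LUTS[coeff][magnitude & 0xF] << shift
        magnitude >>= 4
        shift += 4
    return -acc if value < 0 else acc


def adder_tree(values):
    """Pairwise reduction, as an adder tree does in hardware."""
    while len(values) > 1:
        values = [sum(values[i:i + 2]) for i in range(0, len(values), 2)]
    return values[0]


def fir_output(window):
    """One output sample from the 16 most recent inputs, window[k] = x(n - k)."""
    sums = pre_add(window)
    products = [simple_product(s, c) for s, c in zip(sums[:4], SIMPLE)]
    products += [lut_product(s, c) for s, c in zip(sums[4:], GENERAL)]
    return adder_tree(products)


window = [((37 * i * i + 11 * i) % 256) - 128 for i in range(16)]  # arbitrary 8-bit samples
print(fir_output(window) == sum(c * s for c, s in zip(TAPS, window)))
```

The comparison against the direct sum of products confirms that the multiplier-free path computes the same convolution.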
IV. THE DESIGN SIMULATION AND ANALYSIS
The RTL description of the FIR filter is written in Verilog HDL.
Synthesis, placement, and routing are carried out on the Quartus II platform, and timing simulation is then performed with Modelsim [9]. With a 100 MHz clock, the simulation waveform is shown in Fig. 4. The timing simulation uses a 0.8 MHz signal source and 5 MHz noise. The waveform shows that the design is correct and meets the desired filter performance parameters. Fig. 5 and Fig. 6 give the MATLAB simulation results under the same conditions: Fig. 5 shows the waveform with the noise source, and Fig. 6 shows the waveform after filtering.
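A test stimulus of this kind can be sketched in software. The sketch assumes the 10 MHz sampling frequency given in Section III applies to the test signal, assumes amplitudes of 50 for both sources, and assumes the coefficient order h(0) to h(7) = 0, -1, -2, 4, 21, 49, 80, 100; none of these is confirmed by the paper, and the function names are hypothetical.

```python
import math

FS = 10e6          # sampling frequency from Section III (ASSUMED to apply here)
F_SIG = 0.8e6      # signal source
F_NOISE = 5e6      # noise source, equal to FS / 2

HALF = [0, -1, -2, 4, 21, 49, 80, 100]   # ASSUMED order h(0) .. h(7)
TAPS = HALF + HALF[::-1]


def stimulus(n_samples, amp_sig=50, amp_noise=50):
    """Integer samples of the 0.8 MHz signal plus the 5 MHz noise."""
    return [round(amp_sig * math.sin(2 * math.pi * F_SIG * n / FS)
                  + amp_noise * math.cos(2 * math.pi * F_NOISE * n / FS))
            for n in range(n_samples)]


def fir(taps, x):
    """Direct-form reference filter; samples before n = 0 are zero."""
    return [sum(c * x[n - k] for k, c in enumerate(taps) if n - k >= 0)
            for n in range(len(x))]


# At FS / 2 the noise alternates in sign from sample to sample.
noise_only = [50 * (-1) ** n for n in range(64)]
print(fir(TAPS, noise_only)[15:20])
```

Under these assumptions the noise sits exactly at half the sampling frequency, where a symmetric filter with an even number of taps has zero gain, so the noise-only output is zero once the first 15 samples have passed.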
As the figures show, the Modelsim and MATLAB simulation results are broadly similar. Small differences remain because the signal used in Modelsim is sampled and the impulse response coefficients are quantized by FDATool, so the two results differ by a small error.
Table 2 shows the resource consumption of the FIR filter realized by three methods, synthesized in Quartus II under the same conditions. The methods are the serial DA algorithm, the parallel DA algorithm, and the combined serial-parallel DA algorithm. From the table, we can conclude that the improved DA algorithm overcomes both the low speed of the serial DA algorithm and the heavy resource use of the parallel DA algorithm, in keeping with the FPGA design principle of balancing area against speed.
V. CONCLUSIONS
Based on the DA algorithm, this paper uses the 4-BAAT method to optimize the look-up table structure in the design of an FIR filter. In addition, the simple tap coefficients are handled by shifting and sign-bit inversion, which increases speed and saves logic resources. Finally, we verify the correctness and advantages of the design with a 0.8 MHz signal source and a 5 MHz noise source. FIR filters are used in many different settings, each with its own performance requirements [10], so the next step is to address the remaining shortcomings of the traditional DA algorithm in order to broaden its practical application.
ACKNOWLEDGMENT
This work was financially supported by the Yunnan Reform Project Foundation "Research and Exploration in co
