This paper presents an area-efficient architecture for implementing a Finite Impulse Response (FIR) filter using Distributed Arithmetic with Offset Binary Coding (DA-OBC). The area complexity of an FIR filter is caused mainly by its multipliers. Among the multiplierless FIR techniques, Distributed Arithmetic (DA) is the most preferred area-efficient technique. In this technique, partial products of the filter coefficients are precomputed and stored in a lookup table (LUT), and filtering is performed by shift-and-accumulate operations on these partial products. However, the size of the LUT grows exponentially with the number of coefficients. If the number of coefficients is small, the filter is convenient to realize; if it is large, the LUT takes up a large share of the FPGA's storage resources and reduces the calculation speed. This paper improves the DA algorithm by reducing the LUT size and delay using Offset Binary Coding. The design targets the Altera EP2S15F484C3 device and is synthesized in the Quartus II 9.1 integrated environment. Simulation and test results show that Offset Binary Coding greatly reduces the FPGA hardware resources and achieves higher-speed filtering than the conventional DA algorithm.
I. Introduction
Due to the intensive use of FIR filters in video and communication systems, high performance in speed, area and power consumption is demanded. Digital filters are used to modify the characteristics of signals in the time and frequency domains and have been recognized as a primary digital signal processing operation [1]. Over the last few years there has been a growing trend to implement DSP functions in FPGAs, which offer a balanced solution in comparison with Application Specific Integrated Circuits (ASICs) and Digital Signal Processors (DSPs). The advantages of the FPGA approach to digital filter implementation include higher sampling rates than are available from traditional DSP chips and lower costs than ASICs for moderate-volume applications. The research community has therefore put great effort into designing efficient architectures for DSP functions such as finite impulse response filters, which are used extensively in signal processing applications such as digital communication, speech processing, wireless/satellite communication, biomedical signal processing and many others, owing to their linearity and stability. Their main limitation is the large number of taps needed to obtain the desired frequency response, which leads to area complexity. In general form, the FIR filter [2] is characterized by
Equation (2) shows that the output of an FIR filter is the sum of products of the impulse response and the input sequence, i.e. an extensive sequence of multiplication operations. Since multipliers are costly in terms of area, several multiplierless schemes have been proposed. These methods can be classified into two categories based on how the filter coefficients are manipulated for the multiplication operation. The first category is the conversion-based approach, in which the filter coefficients are transformed into another numerical representation whose hardware implementation or arithmetic manipulation is more efficient than that of the traditional binary representation. One example is the Canonic Signed Digit (CSD) method, in which the filter coefficients are represented as a combination of powers of two in such a way that multiplication can be implemented simply by adders/subtractors and shifters [3]. The Dempster-Macleod method is also a conversion-based approach, but in this case the partial results are arranged in cascade to obtain further savings in the number of adders [4].
Area Efficient Implementation of FIR Filter Using Distributed Arithmetic with Offset Binary Coding www.iosrjournals.org 2 | Page
The second category is the memory-based approach, which uses memory or lookup tables to store precomputed operations on the filter coefficients. The constant coefficient multiplier [5] and Distributed Arithmetic [6] are memory-based methods. Distributed Arithmetic (DA), first developed by Croisier et al. [7], is one of the most efficient techniques for realizing higher-order filters, as it can achieve high throughput without the help of a hardware multiplier. The complicated multiply-accumulate operation is converted to shift and add operations when the DA algorithm is applied directly to realize an FIR filter [8]. DA has proved to be an area-efficient technique for FIR filter implementation.
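To illustrate the conversion-based category, the following Python sketch recodes an integer coefficient into CSD digits (no two adjacent nonzero digits) and multiplies by it using only shifts, additions and subtractions. The function names `to_csd` and `csd_multiply` are hypothetical, and integer coefficients are assumed; this is an illustration of the CSD idea cited in [3], not the implementation from that reference.

```python
def to_csd(value):
    """Return the CSD digits of an integer, least significant first.

    Each digit is -1, 0 or +1, and no two neighbouring digits are nonzero.
    """
    digits = []
    while value != 0:
        if value & 1:
            # Low two bits 01 -> digit +1, low two bits 11 -> digit -1.
            digit = 2 - (value & 3)
            value -= digit
        else:
            digit = 0
        digits.append(digit)
        value >>= 1
    return digits


def csd_multiply(x, coeff):
    """Multiply x by coeff using only shifts, adds and subtracts."""
    acc = 0
    for shift, digit in enumerate(to_csd(coeff)):
        if digit == 1:
            acc += x << shift
        elif digit == -1:
            acc -= x << shift
    return acc
```

For example, the coefficient 7 (binary 111, two additions) is recoded as 8 - 1, which needs a single subtraction.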
When using this technique, special care is required to guard against the exponential growth of the LUT size. Slicing the LUT to the desired length offers an effective solution [9].
In this paper, an area-efficient implementation of an FIR filter using Distributed Arithmetic with Offset Binary Coding is proposed. The proposed algorithm reduces the LUT size by a factor of 2, to $2^{N-1}$ entries. The implementation results of the proposed method are compared with those of the conventional DA algorithm. Section II describes the DA-based FIR filter, and Section III describes the FIR filter based on DA with Offset Binary Coding. Section IV gives the performance analysis of the FIR filter using the proposed algorithm.
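The effect of LUT slicing can be sketched in Python as follows. The sketch assumes one common partitioning scheme, in which the N address lines are split into groups of `slice_width` bits, each group addresses its own small LUT, and the partial outputs are summed; the scheme in [9] may differ in detail. The helper names and the coefficient values used in the usage note are hypothetical.

```python
def build_lut(coeffs):
    """Full DA LUT: entry[addr] is the sum of coeffs[i] for every set bit i of addr."""
    n = len(coeffs)
    return [
        sum(h for i, h in enumerate(coeffs) if (addr >> i) & 1)
        for addr in range(1 << n)
    ]


def build_sliced_luts(coeffs, slice_width):
    """One small LUT per group of slice_width consecutive coefficients."""
    return [
        build_lut(coeffs[start:start + slice_width])
        for start in range(0, len(coeffs), slice_width)
    ]


def sliced_lookup(luts, slice_width, addr):
    """Sum the partial LUT outputs selected by each slice of the address."""
    mask = (1 << slice_width) - 1
    total = 0
    for index, lut in enumerate(luts):
        total += lut[(addr >> (index * slice_width)) & mask]
    return total
```

With 8 coefficients, a single LUT needs 256 entries, whereas two 4-input slices need 16 entries each at the cost of one extra adder.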
II. Distributed Arithmetic Based FIR Filter
Distributed arithmetic is a bit-level rearrangement of a multiply-accumulate operation that hides the multiplications. It is a powerful technique for reducing the size of a parallel hardware multiply-accumulate unit and is well suited to FPGA designs. At any instant n, the output y[n] of an N-tap FIR filter is given as
where the $h_i$ are M-bit filter coefficients and the $x_i$ are input samples coded as k-bit two's complement numbers given by
 
By substituting equation (4) in equation (3), we get
Therefore, by interchanging the summing order of i and j, the initial multiplications in equation (3) are distributed to another computation pattern. Since the inner sum $\sum_i h_i x_{i,j}$ depends only on the $x_{i,j}$ values and has only $2^N$ possible values, it is possible to precompute these values and store them in a read-only memory (ROM) or LUT. An input set of N bits $(x_{0,j}, x_{1,j}, x_{2,j}, \ldots, x_{N-1,j})$ is used as an address to retrieve the corresponding precomputed value. These intermediate results are accumulated in k clock cycles to produce one y value. This leads to a multiplier-free realization of vector multiplication. Table I shows the content of the ROM for N=5. Fig. 1 shows the DA implementation of an N-tap FIR filter. The shift-accumulate unit is a bit-parallel carry-propagate adder that adds the LUT content to the previously accumulated result. The inverter and the MUX are used to invert the output of the ROM in order to compute the term for j = k-1; the control signal S is 1 when j = k-1 and 0 otherwise. The computation runs from j = 0 to j = k-1.
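The derivation described in this section can be sketched in its standard textbook form as follows. The sketch assumes that $x_i = x[n-i]$ and that the samples are integer-weighted two's complement numbers with the sign bit at $j = k-1$; this scaling is an assumption, and a fractional convention changes only the powers of two. The tags follow the in-text references to equations (3) to (5).

```latex
\begin{align}
y[n] &= \sum_{i=0}^{N-1} h_i \, x_i \tag{3} \\
x_i  &= -x_{i,k-1}\, 2^{k-1} + \sum_{j=0}^{k-2} x_{i,j}\, 2^{j},
        \qquad x_{i,j} \in \{0, 1\} \tag{4} \\
y[n] &= -2^{k-1} \sum_{i=0}^{N-1} h_i \, x_{i,k-1}
        + \sum_{j=0}^{k-2} 2^{j} \left( \sum_{i=0}^{N-1} h_i \, x_{i,j} \right) \tag{5}
\end{align}
```

In this form, the bracketed inner sum is the quantity stored in the ROM, and the negative weight on the $j = k-1$ term is the reason the ROM output is inverted when S = 1.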
III. Distributed Arithmetic with Offset Binary Coding Based FIR Filter
The size of the ROM is very important for high speed and area efficiency, and it increases exponentially with each added input address line. The proposed Offset Binary Coding can reduce the ROM size by a factor of 2, to $2^{N-1}$. By rewriting equation (4) as
[...] values are mirrored along the line between the 16th and 17th rows. In other words, the term $D_j$ has only $2^{N-1}$ possible values depending on the $x_{i,j}$ values. Therefore, it is possible to reduce the ROM size by a factor of 2. Table III shows the content of the reduced ROM with OBC.
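The mirror symmetry and the halved ROM can be illustrated with a small Python model. It assumes integer coefficients and integer-weighted k-bit two's complement samples, and it uses the standard OBC identity $2x_i = \sum_{j=0}^{k-1} d_{i,j} 2^j - 1$, where $d_{i,j} = 2x_{i,j} - 1$ for $j < k-1$ and $d_{i,k-1} = -(2x_{i,k-1} - 1)$. To stay in integers, the ROM here stores $\sum_i h_i d_{i,j}$, which is twice the usual $D_j$, and the result is halved at the end. All function names are hypothetical, and this is a behavioural sketch rather than the authors' hardware design.

```python
def build_obc_rom(coeffs):
    """Half-size OBC ROM with 2**(N-1) entries (top address bit fixed at 0).

    entry[addr] = sum_i h_i * (+1 if bit i of addr is set else -1).
    """
    n = len(coeffs)
    return [
        sum(h * (2 * ((addr >> i) & 1) - 1) for i, h in enumerate(coeffs))
        for addr in range(1 << (n - 1))
    ]


def obc_lookup(rom, n, addr):
    """Look up an N-bit address in the half ROM using the mirror symmetry."""
    low_mask = (1 << (n - 1)) - 1
    if (addr >> (n - 1)) & 1:
        # Top bit set: read the complemented address and negate the result.
        return -rom[~addr & low_mask]
    return rom[addr & low_mask]


def fir_output_obc(coeffs, samples, k):
    """One output sample: sum_i coeffs[i] * samples[i], computed with DA-OBC."""
    n = len(coeffs)
    rom = build_obc_rom(coeffs)
    acc = -sum(coeffs)  # constant offset term (in the doubled scale)
    for j in range(k):
        addr = 0
        for i, x in enumerate(samples):
            addr |= ((x >> j) & 1) << i  # bit j of each sample forms the address
        term = obc_lookup(rom, n, addr)
        if j == k - 1:
            term = -term  # the sign bit carries negative weight
        acc += term << j
    return acc // 2  # acc holds exactly 2*y, so the division is exact
```

For N = 5 the full table would have 32 rows, and the entry at address a is the negative of the entry at address 31 - a, so only 16 entries need to be stored.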
IV. Implementation Results
The existing method and the proposed method are designed in Verilog HDL and synthesized with Altera Quartus II 9.1. The Altera Quartus II design software provides a complete, multiplatform design environment that adapts to specific design needs. It is a comprehensive environment for system-on-a-programmable-chip (SOPC) design. The Quartus II software includes solutions for all phases of FPGA and CPLD design, as shown in Fig. 3. In addition, either the Quartus II graphical user interface or the command-line interface can be used for each phase of the design flow.
Fig. 3 Quartus II Design Flow
Design procedure:
Step 1: Derive the filter coefficients according to the filter specification.
Step 2: Store the input values in the input register.
Step 3: Design the LUT, which holds all possible sum combinations of the filter coefficients.
Step 4: Accumulate and shift the partial terms, beginning with the LSB of the input: shift each partial result to the right before adding it to the next partial term.
Step 5: Analyze the output of the filter against the specification.
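The steps above can be sketched as a behavioural Python model of a conventional DA filter. It is not the authors' Verilog design: the class name `DAFir` is hypothetical, the coefficients are assumed to be already derived as integers (Step 1), and the samples are assumed to be integer-weighted k-bit two's complement values. The accumulator is kept in a scale of $2^{k-1}$ so that the right shifts in Step 4 lose no bits.

```python
from collections import deque


def build_lut(coeffs):
    """Step 3: entry[addr] is the sum of coeffs[i] for every set bit i of addr."""
    n = len(coeffs)
    return [
        sum(h for i, h in enumerate(coeffs) if (addr >> i) & 1)
        for addr in range(1 << n)
    ]


class DAFir:
    def __init__(self, coeffs, k):
        self.k = k                      # sample word length in bits
        self.lut = build_lut(list(coeffs))
        # Step 2: input register holding the N most recent samples (zero initial state).
        self.reg = deque([0] * len(coeffs), maxlen=len(coeffs))

    def step(self, sample):
        """Shift in one sample and return one output sample."""
        self.reg.appendleft(sample)     # reg[i] now holds x[n-i]
        acc = 0
        for j in range(self.k):         # Step 4: LSB first, k cycles
            addr = 0
            for i, x in enumerate(self.reg):
                addr |= ((x >> j) & 1) << i
            term = self.lut[addr]
            if j == self.k - 1:
                term = -term            # sign bit: subtract instead of add
            # Shift the running result right, then add the new term at the top.
            acc = (acc >> 1) + (term << (self.k - 1))
        return acc
```

For Step 5, the output of `step` can be compared sample by sample against a direct multiply-accumulate of the same coefficients and inputs.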
(a) Test bench result
Table IV shows that the 8-tap DA-OBC FIR filter reduces the memory requirement and the time delay. The proposed design therefore uses considerably less chip area than the DA algorithm, because it requires half the memory size of the DA algorithm and uses less combinational logic. The proposed algorithm also reduces the time delay substantially compared to the conventional method.
V. Conclusion
The complicated multiply-accumulate operation is converted to shift and add operations when the DA algorithm is applied directly to realize an FIR filter. However, the size of the LUT increases exponentially with each added input address line. The proposed FIR filter algorithm, synthesized in the Altera Quartus II 9.1 integrated environment, is area efficient: it reduces the memory requirement by a factor of 2 compared to the conventional DA algorithm, and it reduces the delay by a factor of approximately 5 compared to the conventional FIR filter. These filters can therefore be used in various applications such as adaptive filtering for noise and echo cancellation, pulse shaping in WCDMA, software-defined radio and high-speed signal processing systems. In future work, the power consumption could be reduced by shortening the critical path using either pipelining or parallel processing.
