I. Introduction
The rapid advancement of multimedia, computing and mobile communications has increased the demand for Digital Signal Processing (DSP) [1]. In DSP applications, filters serve two purposes: signal restoration and signal separation. Digital filters are categorized into Finite Impulse Response (FIR) and Infinite Impulse Response (IIR) filters. FIR filters are widely used in mobile communication applications for operations such as matched filtering, channel equalization, spectral shaping and interference cancellation because of their stability and exact linear phase. With the recent development of Software Defined Radio (SDR) technology [2], work on FIR filters has concentrated on reconfigurable implementations, in which the filter coefficients change dynamically at run time. In biomedical applications such as the electrocardiogram (ECG), by contrast, the coefficients of the FIR filter remain fixed. There is therefore a need for both Reconfigurable and Fixed FIR filter structures to support the multi-standard communications mentioned above. Several researchers have proposed different types of VLSI architectures for the implementation of Fixed and Reconfigurable FIR (RFIR) filters. Distributed Arithmetic (DA) based FIR filters use Lookup Tables (LUTs) to store the filter coefficients and to minimize the system complexity [3], [4]. Multiple Constant Multiplication (MCM) based architectures have been proposed for fixed-coefficient implementations; in this method, the multiplication is performed with the help of a shift-and-add unit. A computation sharing programmable FIR filter has been proposed by J. Park [5]. It targets high-speed and low-power applications and minimizes redundant computation in FIR filters by using the Computation Sharing Multiplier (CSHM). In 2006, Chen and Chiueh proposed an RFIR filter architecture based on the Canonic Sign Digit (CSD) representation. It reduces the precision of the filter coefficients, but it occupies more area and does not provide an area-delay efficient structure [6].
The architectures in [5] and [6] are better suited to low-order filters and are not applicable to channel filters because of their high complexity. In 2010, R. Mahesh et al. proposed a VLSI architecture for RFIR filters based on the Programmable Shifts Method (PSM) and the Constant Shifts Method (CSM) [7]. PSM offers low area and low power consumption, whereas CSM offers higher speed than PSM. In 2013 and 2014, B.K. Mohanty et al. proposed an RFIR filter based on the Block Least Mean Square (BLMS) algorithm. However, it is not suitable for higher-order filters and reconfigurable filter coefficients, as required in multichannel filters [8], [9]. Recently, S.Y. Park and P.K. Meher [10] proposed an area- and delay-efficient DA-based RFIR filter architecture. The block-processing method is not used in this architecture. The need to reduce area while increasing computational speed therefore remains. This paper addresses the problem of developing high-speed and area-efficient VLSI architectures for FIR filters in fixed and reconfigurable applications.
An Efficient VLSI Architectures for FIR Filter in Fixed and Reconfigurable Applications
DOI: 10.9790/4200-0704016372 www.iosrjournals.org .
Fig.4. Internal structure of the (l + 1)th IPC.
The Pipelined Adder Unit (PAU) receives the partial products from all M IPUs. As shown in Fig.5, the PAU uses an array of Kogge Stone Adders (KSA) to add these partial products. The KSA belongs to the family of carry-tree adders, also known as parallel prefix adders, and is preferred over other adder types because of its high performance.
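The parallel prefix principle behind a Kogge Stone Adder can be illustrated with a small software model. The following is a minimal sketch in Python, not the Verilog HDL design used in this work; the function name `kogge_stone_add` and the default width of 16 bits are assumptions made for illustration only.

```python
def kogge_stone_add(a, b, width=16):
    """Add two unsigned width-bit integers with a Kogge-Stone prefix network.

    Illustrative model only (hypothetical helper, not the paper's RTL).
    Returns (sum modulo 2**width, carry_out).
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    p = a ^ b            # per-bit propagate signals
    g = a & b            # per-bit generate signals
    half_sum = p         # saved for the final sum stage
    dist = 1
    while dist < width:
        # Combine the (g, p) pair at bit i with the pair at bit i - dist.
        # g is updated first so that it still sees the previous stage's p.
        g = (g | (p & (g << dist))) & mask
        p = (p & (p << dist)) & mask
        dist <<= 1
    # After the last stage, bit i of g is the carry out of bits 0..i.
    carries = (g << 1) & mask          # carry into bit i comes from bit i - 1
    total = half_sum ^ carries
    carry_out = (g >> (width - 1)) & 1
    return total, carry_out
```

For example, `kogge_stone_add(0x3210, 0x0321)` returns `(0x3531, 0)`. The number of prefix stages grows with the logarithm of the word length (four stages for 16 bits), which is the source of the speed advantage over a ripple carry chain.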
Fixed FIR filter architecture
The architecture of the block FIR filter for fixed applications is shown in Fig.6. For the Fixed FIR filter implementation, the CSU is not necessary because the filter coefficients are fixed. Similarly, IPUs are not used, because the multiplications are performed by Multiple Constant Multiplication (MCM) units to reduce the complexity of the architecture. The MCM-based method, shown in Fig.7, is most efficient when a given input variable is multiplied by a large number of fixed constants using the shift-and-add method. It can be implemented using adders/subtractors and shifters. First, the constants are expressed in binary form. Then, for every non-zero digit in the binary representation of a constant, the input variable is shifted according to the digit position, and the shifted copies are added to obtain the result. MCM is employed in many applications, such as error-correcting codes, frequency multiplication and Multiple Input Multiple Output (MIMO) systems. The MCM-based technique for a Fixed FIR filter with block size L=4 exploits the symmetry in the input matrix S_k^0 to perform vertical and horizontal common subexpression elimination and thereby reduce the number of shift-and-add operations in the MCM units. MCM can be applied along both the vertical and the horizontal direction of the coefficient matrix. The MCM-based structure takes six input samples, one for each of the six MCM blocks. All MCM blocks compute the required product terms using the shift-and-add method. The outputs of all MCM blocks are passed to the adder network to produce the inner product terms. In the Pipelined Adder Unit (PAU), an array of KSAs adds the inner product values to produce a block of the filter output.
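The shift-and-add procedure described above can be sketched in a few lines of Python. This is an illustrative model under stated assumptions, not the proposed hardware: the function names `shift_add_multiply` and `mcm` are hypothetical, the constants are assumed to be non-negative integers, and the only sharing shown is the reuse of shifted copies of the input, which is simpler than the vertical and horizontal common subexpression elimination used in the architecture.

```python
def shift_add_multiply(x, constant):
    """Multiply x by a non-negative integer constant using shifts and adds only."""
    result = 0
    position = 0
    while constant:
        if constant & 1:              # non-zero digit at this bit position
            result += x << position   # add the input shifted by the position
        constant >>= 1
        position += 1
    return result


def mcm(x, constants):
    """Multiply one input by several constants (hypothetical helper).

    The shifted copies of x are formed once and reused by every constant.
    Expects a non-empty list of non-negative integers.
    """
    max_bits = max(c.bit_length() for c in constants)
    shifted = [x << k for k in range(max_bits)]
    return [
        sum(shifted[k] for k in range(max_bits) if (c >> k) & 1)
        for c in constants
    ]
```

With the 8-bit sample 00011101 (decimal 29) and the illustrative constants 3, 5, 6 and 7, `mcm(0b00011101, [3, 5, 6, 7])` returns `[87, 145, 174, 203]`. No multiplier is involved; each product is a sum of at most three shifted copies of the input.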
III. Simulation Results
The proposed VLSI architectures for fixed and reconfigurable applications are written in Verilog HDL, then synthesized and simulated using the Xilinx ISE 12.2 design tool and the ISIM simulator. The device settings used for the simulation results are the Spartan 3E family, the XC3S500E device and the FG320 package, with a speed grade of -5.
Fig.8. Simulation Result of Reconfigurable FIR Filter
The simulation result of the Reconfigurable FIR filter is shown in Fig.8. The input sample xk is a 16-bit binary value, and the filter coefficient cm is also a 16-bit binary value. The input values are applied to the RU, which produces four rows of 16-bit input samples by performing a shift operation. The 16-bit filter coefficient is partitioned into groups of four bits. The filter coefficient groups are then multiplied by the input samples. The resulting partial products are added in the PAU by an array of Kogge Stone Adders (KSA), which delivers the filter output. To illustrate the functionality of the proposed architecture, consider a 16-bit input sample xk=0011001000010000 and a filter coefficient cm=0011010001010110 with clock=1 and reset=0. The shift operation on the input sample gives four rows of input. The first row, w1, is the same as the input sample. For the second row, the four most significant bits of the input sample (0011) become the least significant bits; continuing this process gives w2, w3 and w4. The filter coefficient cm is partitioned into groups of four bits, which gives w5, w6, w7 and w8. The four rows of input w1, w2, w3 and w4 are multiplied by the coefficient groups w8, w7, w6 and w5 to obtain the outputs w9, w10, w11 and w12, respectively. The KSA array then sums w9, w10, w11 and w12. Finally, the filter output yk=1101011010111100 is obtained.
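The row generation and coefficient partitioning steps of this example can be reproduced with a short Python sketch. It rests on assumptions: the shift operation is read as a circular left rotation by four bits per row, the coefficient groups are taken most significant first (w5 to w8), and the helper names are hypothetical. The sketch stops before the multiplication, because the word lengths and truncation of the products are not stated in the text.

```python
def rotate_left(value, shift, width=16):
    """Circularly rotate a width-bit value to the left by shift bits."""
    mask = (1 << width) - 1
    value &= mask
    shift %= width
    return ((value << shift) | (value >> (width - shift))) & mask


def input_rows(x, rows=4, group=4, width=16):
    """Rows w1..w4: each row rotates the sample by a further group bits (assumed)."""
    return [rotate_left(x, group * r, width) for r in range(rows)]


def coefficient_groups(c, group=4, width=16):
    """Groups w5..w8: split c into group-bit fields, most significant first (assumed)."""
    mask = (1 << group) - 1
    return [(c >> s) & mask for s in range(width - group, -1, -group)]


xk = 0b0011001000010000   # input sample from the example
cm = 0b0011010001010110   # filter coefficient from the example
w1, w2, w3, w4 = input_rows(xk)
w5, w6, w7, w8 = coefficient_groups(cm)
```

Under these assumptions the rows are 0x3210, 0x2103, 0x1032 and 0x0321, and the coefficient groups are 0011, 0100, 0101 and 0110.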
Fig.9. Simulation Result of Fixed FIR Filter
The simulation result of the Fixed FIR filter is shown in Fig.9. The input sample X is an 8-bit binary value, and the filter length is N=16. The input values are applied to the RU, which produces six 8-bit input samples. These input samples are multiplied by several constant coefficients using the MCM method. The outputs of all the MCM blocks are passed to the adder network, which produces the inner product values. An array of KSAs in the PAU sums all the inner product values and delivers the filter output. To illustrate the functionality of the proposed architecture, consider an 8-bit input sample Xk=00011101 and fixed coefficients. The MCM operation is performed between the input sample and the 4-wide MCM (h0, h1, h2, h3), whose coefficients are 8-bit binary values, to obtain w1. This process is repeated for the 8-, 12- and 16-wide MCMs, and the outputs obtained are w2, w3, w4, w5, w6 and w7. In the adder network, these partial inner products are added, which gives w8, w9, w10 and w11. The KSA array then sums w8, w9, w10 and w11. Finally, the filter output yk is obtained.
V. Conclusion
In this paper, a high-speed and area-efficient transpose form block FIR filter is implemented for both Fixed and Reconfigurable applications. The proposed architecture is compared, in terms of Area Delay Product (ADP), with RFIR and FFIR filters that use Ripple Carry Adders (RCA) and Carry Select Adders (CSLA) in the pipelined adder unit. The ADP of the proposed architecture is better than that of the RFIR and FFIR filters with RCA by 14.8% and 4.51%, respectively, and better than that of the RFIR and FFIR filters with CSLA by 19.4% and 17.9%, respectively. In the future, the area and delay of the transpose form FIR filter can be reduced further by using a Dadda multiplier in the Inner Product Unit.
