FIR Filter Implementation Using DA with LUT less Design Structure Based on FPGA by Patel, M. S. (Mr) & K, M. J. (Mr)
                                                                                
International Journal of Modern Communication Technologies & Research (IJMCTR) 
 ISSN: 2321-0850, Volume-2, Issue-3, March 2014   
                                                                                               23                                                                 www.erpublication.org 
 
 
Abstract- This paper describes the implementation of FIR
filters on FPGA. Implementations based on the traditional method
cost considerable hardware resources, which works against
reducing the circuit scale and increasing the system speed. It is
well known that an FIR filter consists of delay elements,
multipliers and adders. The use of multipliers in a design has
two demerits: (i) an increase in area and (ii) an increase in
delay, which ultimately results in low performance (lower
speed). A new design and implementation of FIR filters using
Distributed Arithmetic is provided in this paper to solve this
problem. The Distributed Arithmetic structure is used to improve
resource utilization, while a pipeline structure is used to
increase the system speed. In addition, the LUT LESS method
is used to decrease the required memory units. The
simulation results indicate that FIR filters using Distributed
Arithmetic work stably at high speed and save almost 50
percent of hardware resources, which decreases the circuit
scale, and that they can be applied to a variety of areas thanks
to their flexibility and reliability.
 
Index Terms—Distributed Arithmetic (DA), Field 
programmable gate arrays (FPGA), Finite impulse response 
(FIR), Look up table (LUT), Pipeline. 
I. INTRODUCTION 
Filters are a basic component of all signal processing
and telecommunication systems. They are widely employed
in applications such as channel equalization, noise reduction,
radar, audio processing, video processing, biomedical signal
processing, and analysis of economic and financial data. For
example, in a radio receiver, band-pass filters, or tuners, are
used to extract the signals from a radio channel. Digital
filters are divided into two categories: Finite Impulse
Response (FIR) and Infinite Impulse Response (IIR).
FIR filters are widely applied in a variety of digital
signal processing areas because they provide linear
phase and system stability.
    
Fig.1. FIR Filter 
 
Manuscript received March  12, 2014. 
 Mr. Shrikant Patel, MTech. Student (VLSI Design), Department of 
Electronics and Communication, Oriental University, Indore, M.P.  
 Mr. Jeevan Reddy K., Guide and HOD, Department of Electronics and 
Communication, Oriental University, Indore- 453555, M.P., India 
 
 
c(i) = constant filter coefficient
x(i) = i-th point of the input sequence (a variable)
y(n) = system response
Finite impulse response (FIR) filters are the most popular
type of filters implemented in software. A digital filter takes a
digital input, gives a digital output, and consists of digital
components. In a typical digital filtering application,
software running on a digital signal processor (DSP) reads
input samples from an A/D converter. FPGA-based FIR
filters using traditional direct arithmetic consume a considerable
number of multiply-and-accumulate (MAC) blocks as the filter
order increases. However, with Distributed Arithmetic, the MAC
results can be obtained from a Look-Up-Table (LUT) or, in the
LUT LESS design, from multiplexers and adders, selected
according to the input data. Therefore, a LUT LESS design can
take the place of the MAC units so as to save hardware
resources. This paper provides the principles of Distributed
Arithmetic, introduces it into FIR filter design, and then
presents a 31-order FIR low-pass filter using modified
Distributed Arithmetic, which saves considerable MAC blocks
and decreases the circuit scale. Meanwhile, the LUT LESS
design method is used to decrease the required memory units,
and a pipeline structure is used to increase the system speed.
 
II. DISTRIBUTED ARITHMETIC FOR FIR FILTER 
Distributed Arithmetic is one of the best-known
methods of implementing FIR filters. DA computes the
inner product efficiently when the coefficients are known
in advance, as is the case in FIR filters. An
FIR filter of length K is described as:

$$y[n] = \sum_{k=0}^{K-1} h[k]\, x[n-k] \tag{1}$$

where h[k] is the filter coefficient and x[k] is the input data.
For convenience of analysis, x'[k] = x[n - k] is substituted
into equation (1), and we have:

$$y = \sum_{k=0}^{K-1} h[k]\, x'[k] \tag{2}$$

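Then we use B-bit two's complement binary numbers to
represent the input data:

$$x'[k] = -2^{B-1}\, x_{B-1}[k] + \sum_{b=0}^{B-2} x_b[k]\, 2^b \tag{3}$$

where x_b[k] denotes the b-th bit of x'[k], x_b[k] ∈ {0,1}.
Substitution of (3) into (2) yields:

$$y = \sum_{k=0}^{K-1} h[k] \left( -2^{B-1}\, x_{B-1}[k] + \sum_{b=0}^{B-2} x_b[k]\, 2^b \right)$$
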
 
We have

$$y = -2^{B-1} \sum_{k=0}^{K-1} h[k]\, x_{B-1}[k] + \sum_{b=0}^{B-2} 2^b \sum_{k=0}^{K-1} h[k]\, x_b[k] \tag{4}$$

In equation (4), we observe that all possible values of the inner
sums over the filter coefficients can be pre-stored in a LUT and
addressed by x_b = [x_b[0], x_b[1], ..., x_b[K-1]]. This way, the
MAC blocks of the FIR filter are reduced to LUT accesses and
summations. The
implementation of digital filters using this arithmetic is done
by using registers, memory resources and a scaling
accumulator. The original LUT-based DA implementation of a
4-tap (K=4) FIR filter is shown in Figure 2. The DA
architecture includes three units: the shift register unit, the
DA-LUT unit, and the adder/shifter unit.
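
The LUT-based computation described above can be sketched in software. The following Python fragment is only an illustrative model, not the authors' HDL: the function names, the coefficient values and the LSB-first bit ordering are assumptions made for the example. It builds the 2^K-word LUT, forms one address per bit plane of the inputs, and accumulates the shifted LUT outputs, subtracting the sign-bit plane as two's complement requires.

```python
def direct_fir_output(h, x):
    """Reference inner product y = sum_k h[k] * x'[k] using multipliers."""
    return sum(hk * xk for hk, xk in zip(h, x))


def build_da_lut(h):
    """2**K-word LUT: the word at address a is the sum of h[k] for each set bit k of a."""
    K = len(h)
    return [sum(h[k] for k in range(K) if (addr >> k) & 1)
            for addr in range(1 << K)]


def da_fir_output(h, x, B):
    """Bit-serial DA: one LUT access per bit plane, then shift and accumulate.

    x holds K signed integers that fit in B-bit two's complement.
    """
    lut = build_da_lut(h)
    mask = (1 << B) - 1
    acc = 0
    for b in range(B):
        # Address = bit b of every input sample (bit k of the address comes from x[k]).
        addr = 0
        for k, xk in enumerate(x):
            addr |= (((xk & mask) >> b) & 1) << k
        term = lut[addr] * (1 << b)
        # The most significant bit plane carries negative weight.
        acc += -term if b == B - 1 else term
    return acc


# Hypothetical 4-tap example (values chosen only for illustration).
h = [3, -1, 4, 2]
x = [5, -7, 0, -8]
assert da_fir_output(h, x, B=4) == direct_fir_output(h, x)
```

No multiplication by a coefficient occurs inside the loop; the only products are by powers of two, which correspond to shifts in the scaling accumulator.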
 
Fig.2. Original LUT-based DA implementation of a 4-tap 
filter 
 
 
A. LUT-less DA architectures for a 4-tap FIR filter 
 
In Fig.2, we can see that the lower half of the LUT (locations
where b3 = 1) equals the sum of the upper half of the
LUT (locations where b3 = 0) and h[3]. Hence, the LUT size can
be halved at the cost of an additional 2x1 multiplexer and a full
adder, as shown in Figure 3. By repeating the same LUT reduction
procedure, we obtain the final LUT-less DA architecture,
as shown in Figure 4. On the other hand, the use of
combinational logic affects the filter performance.
But when the number of taps of the filter is a prime, we can use
4-input LUT units with additional multiplexers and full
adders to trade off filter performance against
resource usage.
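
The reduction step can be checked with a small software model. The sketch below is an illustration under assumed coefficient values, and the helper names are hypothetical. It compares the full 16-word LUT of a 4-tap filter with the halved LUT plus multiplexer and adder, and with the fully LUT-less form in which every tap is selected by a multiplexer.

```python
def full_lut(h):
    """Full DA LUT: the word at address a is the sum of h[k] for each set bit k of a."""
    K = len(h)
    return [sum(h[k] for k in range(K) if (a >> k) & 1) for a in range(1 << K)]


def half_lut_lookup(h, addr):
    """Halved LUT (2**(K-1) words) plus a 2x1 multiplexer and an adder."""
    K = len(h)
    small = full_lut(h[:-1])                 # only the half where the top bit is 0
    low_addr = addr & ((1 << (K - 1)) - 1)   # remaining address bits
    top_bit = (addr >> (K - 1)) & 1          # b3 for a 4-tap filter
    mux_out = h[-1] if top_bit else 0        # multiplexer selects h[K-1] or 0
    return small[low_addr] + mux_out         # adder


def lutless_lookup(h, addr):
    """LUT-less form: the reduction applied to every tap leaves multiplexers and adders."""
    total = 0
    for k, hk in enumerate(h):
        total += hk if (addr >> k) & 1 else 0
    return total


# Hypothetical coefficients, used only to exercise the three forms.
h = [3, -1, 4, 2]
reference = full_lut(h)
for addr in range(16):
    assert half_lut_lookup(h, addr) == reference[addr]
    assert lutless_lookup(h, addr) == reference[addr]
```

All three forms return the same value for every address, so the choice between them is a matter of memory versus combinational logic.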
 
 
 
Fig.3. Modified DA architecture for a 4-tap filter (2³ word 
LUT implementation of DA) 
 
 
 
 
 
 
Fig.4. LUT-less DA architectures for a 4-tap FIR filter 
 
 
III. FINAL BLOCK DIAGRAM OF 31-TAP FIR FILTER
 
The block diagram in Fig.6 shows the final structure of the
31-tap FIR filter. It consists of a PISO (parallel in, serial
out) shift register, which receives data in parallel form and
gives output in serial form, and eight 4-tap FIR sub-filters.
Each sub-filter is realized as a LUT LESS Design, that is, the
LUT-less form of the modified LUT. The LUT LESS Designs are
connected between the shift register and the pipeline registers;
the pipeline registers increase the system speed. LUT LESS
Design 0 and LUT LESS Design 1 are connected to one adder,
and the remaining six LUT LESS Designs are connected to
adders in pairs in the same way. The four resulting sums are
then added in pairs by two further adders, and the two results
are added by the final adder. The final result of the addition
is saved to the accumulator.
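
The data flow of this structure can be modelled in a few lines of Python. This is a behavioural sketch only, under stated assumptions: the coefficient list is zero-padded to 32 taps so that it fills eight 4-tap units, the shift register delivers bits LSB first, and the pipeline registers are not modelled because they change timing but not the computed value. All function names are hypothetical.

```python
def lutless_unit(h4, bits4):
    """One 4-tap LUT-less unit: each multiplexer passes h[k] or 0, adders sum them."""
    return sum(hk if bit else 0 for hk, bit in zip(h4, bits4))


def adder_tree(values):
    """Pairwise adder tree, e.g. 8 -> 4 -> 2 -> 1 partial sums."""
    values = list(values)
    while len(values) > 1:
        values = [values[i] + values[i + 1] for i in range(0, len(values), 2)]
    return values[0]


def grouped_da_output(h, x, B, group=4, units=8):
    """Bit-serial DA with the taps split into `units` groups of `group` taps."""
    K = group * units
    assert len(h) <= K and len(x) <= K
    # Assumption: unused taps are padded with zeros to fill the last unit.
    h = list(h) + [0] * (K - len(h))
    x = list(x) + [0] * (K - len(x))
    mask = (1 << B) - 1
    acc = 0
    for b in range(B):  # the PISO register presents one bit plane per step
        bits = [((xk & mask) >> b) & 1 for xk in x]
        partial = [lutless_unit(h[g:g + group], bits[g:g + group])
                   for g in range(0, K, group)]
        plane_sum = adder_tree(partial)
        # Scaling accumulator: the sign-bit plane has negative weight.
        weight = -(1 << b) if b == B - 1 else (1 << b)
        acc += weight * plane_sum
    return acc


# Hypothetical 31-tap data with 8-bit inputs, for illustration only.
h = [(-1) ** k * (k + 1) for k in range(31)]
x = [((7 * k) % 256) - 128 for k in range(31)]
assert grouped_da_output(h, x, B=8) == sum(a * b for a, b in zip(h, x))
```

Splitting the filter into small groups keeps each unit at four address bits, while the adder tree combines the eight partial sums before the accumulator.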
 
 
 
                                                                                
 
 
 
 
 
 
Fig.6. Structure of 31-Tap FIR filter based on Distributed
Arithmetic (LUT – Look Up Table LESS Design, P.R. – Pipeline
Register)
 
 
IV. SIMULATION RESULT OF MODIFIED LUT 
 
Fig.7. Simulation Result of modified LUT Architecture 
 
V. SIMULATION RESULT FOR LUT LESS 
ARCHITECTURE 
 
 
Fig.8. Simulation Result of LUT LESS Architecture 
 
 
VI. SYNTHESIS RESULT FOR LUT LESS
ARCHITECTURE

Fig.9. Synthesis of LUT LESS Architecture
 
VII. CONCLUSION 
This paper reports LUT LESS Design DA architectures for
high-order filters. The architectures reduce the memory usage
through the LUT LESS Design at the cost of a limited decrease
in system frequency. We also divide the high-order filter into
several groups of small filters. To obtain a high-speed
implementation of FIR filters, a full-parallel version of the
DA architecture is adopted. We have successfully
implemented a highly efficient 31-tap full-parallel DA filter,
using both an original DA architecture and a modified DA
architecture, on a 4VLX40FF668 FPGA device. The results show
that the proposed DA architectures are hardware efficient for
FPGA implementation. The design and implementation are
based on Distributed Arithmetic, which is used to realize a
31-order FIR low-pass filter. The Distributed Arithmetic
structure is used to improve resource utilization, while the
pipeline structure is used to increase the system speed. The test
results indicate that the designed filter using Distributed
Arithmetic works stably at high speed and saves almost 50
percent of hardware resources. Meanwhile, the filter is easy to
port to other applications by modifying the order,
bit width and other parameters, and
therefore has great practical application in digital signal
processing.
 
After implementing and simulating both the modified LUT
design and the LUT LESS Design, we compare their results.
Fig.7 and Fig.8 show the waveforms of the modified LUT and
the LUT LESS Design respectively, and the waveform results of
both structures are the same. Next we take the device
utilization summary of the LUT LESS Design.
 
Logic utilization        Used    Available    Utilization
No. of slices             386       4656          13%
No. of 4 input LUTs       759       9312          12%
No. of bonded IOBs        273        232         117%

Table 1. Device utilization summary of LUT LESS Design
 
In the device utilization summary of the LUT LESS Design, the
number of slices used is 386 of 4656 (13%), the number of
4-input LUTs used is 759 of 9312 (12%), and the number of
bonded IOBs used is 273 of 232 (117%).
It is shown that the LUT LESS architecture behaves like the
modified LUT architecture and that both give the same results,
which means the modified LUT architecture can be replaced by
the LUT LESS architecture. We have designed an FIR filter using
the DA architecture with the LUT LESS architecture. This is the
main novelty of this paper.
ACKNOWLEDGMENT 
I wish to thank and honour my internal guide Mr. Jeevan
Reddy K., HOD, Department of Electronics and
Communication, and all faculty members and staff of the
Department of Electronics and Communication, Oriental
University, Indore, who have helped me in many ways, directly
or indirectly, with this paper. I would like to dedicate this
paper to my family and friends and to God, who gave me support
during the tough times in the course of completion of the
paper.
 
 
 
Mr. Shrikant Patel received his bachelor's and
master's degrees in Electronics Engineering at the Department of Electronics, Dr.
Hari Singh Gour University, Sagar (M.P.), India. He also received the M.Tech
degree in VLSI Design at the Department of Electronics and Communication
Engineering, Oriental University, Indore (M.P.), India.
 
