Design and Analysis of Distributed Arithmetic based FIR Filter by Akhter, Shamim & Bareja, Divya
APTIKOM J. CSIT Vol. 5, No. 3, 2020 : 116 – 128  
APTIKOM Journal on Computer Science and Information Technologies 
Vol. 5, No. 3, 2020, pp. 116~128 
ISSN: 2528-2417 
DOI: 10.34306/APTIKOM.J.CSIT.42
 
 
Design and Analysis of Distributed Arithmetic based FIR 
Filter 
Shamim Akhter1, Divya Bareja2 
JIIT Noida1, ECE Department2 -India 
E-mail: shamim.akhter@jiit.ac.in1, bareja@gmail.com2 
 
 
Abstract 
Digital filters are at the core of digital signal processing applications. They are conventionally built around a
Multiply Accumulate (MAC) unit, which comprises a multiplier, an adder and an accumulator, and a high-speed MAC
unit requires fast adder and multiplier circuits. MAC-based structures, however, have disadvantages such as high
power dissipation and slow processing. Multiplying the input by fixed coefficients also requires considerable
storage for temporary data. Memory-based structures are therefore used in place of multipliers to reduce the area
and latency of the system. Distributed Arithmetic (DA) is one such memory-based technique, and it replaces the
multipliers in FIR filters. In this paper, a detailed analysis is presented for designing an FIR filter using the
DA technique in VHDL. Synthesis is done using Xilinx ISE for the Virtex-4 ML402 board. Area, delay and power
analysis is performed using Synopsys Design Compiler with a 32/28 nm standard-cell library.
 
Keywords: FIR Filter, Distributed Arithmetic, Off Set Binary Coding 
 
Copyright © 2020 APTIKOM - All rights reserved. 
 
 
1. Introduction 
An FIR filter is a non-recursive filter that has no feedback; its input-output relation is given by

$$y(n) = \sum_{k=0}^{N-1} h(k)\,x(n-k)$$
 
 
The mathematical expression given above can be realized by the MAC structure shown in (Figure 1). 
 
Figure 1. FIR Filter 
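
As a point of reference for the DA-based designs that follow, the MAC-style evaluation of the input-output relation can be sketched in software. The Python sketch below is illustrative only: it assumes the usual convolution form y(n) = sum of h(k) x(n-k) with samples before n = 0 taken as zero, and the function name fir_direct and the demo coefficients are hypothetical, not part of the paper's VHDL design.

```python
def fir_direct(h, x):
    """Direct-form FIR: y(n) = sum_k h(k) * x(n - k), with x(m) = 0 for m < 0."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, hk in enumerate(h):
            if n - k >= 0:
                acc += hk * x[n - k]  # one multiply-accumulate per tap
        y.append(acc)
    return y


# Hypothetical 3-tap moving-sum filter applied to a short ramp.
y_demo = fir_direct([1.0, 1.0, 1.0], [1.0, 2.0, 3.0, 4.0])
```

Each output sample costs one multiplication per tap, which is the cost that DA aims to remove.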
 
It can be seen that the above implementation requires a large number of multipliers, and in general
each multiplier needs a large amount of hardware in terms of logic gates. The values of the filter
coefficients h0, h1, ..., hN-1 are known in advance, since they are decided by design constraints such as
the sampling frequency, the type and order of the filter and the chosen window [1]. A Distributed
Arithmetic (DA) based sum-of-products technique is more efficient for this type of computation. The paper
is organized as follows. Section 2 deals with DA and its various topologies for increasing the speed of
operation. Section 3 describes the FIR filter, its design in the FDA Tool to obtain the filter coefficient
values [2], and the methodology for the DA based FIR filter design. Section 4 discusses the simulation
results. Section 5 gives the OBC-DA based FIR filter design and analysis, followed by a comparative
analysis. Section 6 gives the conclusion and future work, followed by the references [3].
 
 
2. Basics of Distributed Arithmetic
Suppose we have to compute the sum-of-products term:

Y= ∑HiXi, where the index i ranges from 0 to N-1,
Hi = multiplicand and Xi = multiplier.
The Multiply and Accumulate (MAC) technique requires more time for this computation, and the
multiplier also requires a large area in terms of logic gates. When Hi is constant [4], i.e. when one of
the inputs (multiplicand or multiplier) is known a priori, distributed arithmetic (DA) is very useful [1].
Instead of using a multiplier directly, DA uses a Look Up Table (LUT), shift registers and an accumulator
(adder). For example, assuming N=3, we have the following expression to evaluate using DA with the fixed
constant values H0= 1.5, H1= 2.5 and H2= -3.5:
 
Y=H0X0+ H1X1 + H2X2 (1) 
 
Using the concept of DA as available in the open literature, the LUT for the above case is generated
as given in Table 1. The memory address is formed by the bit values of X0X1X2, and each entry holds the
sum of the constants Hi whose address bit is 1. The LUT content is stored in 16-bit 2's complement format,
with 8 bits for the integer portion and the remaining 8 bits for the fractional portion, so that the
resolution of the final result is high. The values of the constant terms are H0 =
0000000110000000, H1= 0000001010000000, H2= 1111110010000000. LUT contents are also written in
Hex format for clarity [5,6].
 
Table 1. LUT Contents for Expression (1)

Address    LUT Content         LUT Content
X0X1X2     (16 Bit)            (in Hex Format)
000        0000000000000000    0000
001        1111110010000000    FC80
010        0000001010000000    0280
011        1111111100000000    FF00
100        0000000110000000    0180
101        1111111000000000    FE00
110        0000010000000000    0400
111        0000000010000000    0080
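
The construction of Table 1 can be reproduced with a few lines of Python. This is a minimal sketch under the assumptions stated in the text (X0 is the most significant address bit, 16-bit 2's complement with 8 fractional bits); the helper names build_da_lut and to_twos_complement_hex are hypothetical.

```python
def to_twos_complement_hex(value, frac_bits=8, width=16):
    """Encode a real value as a fixed-point 2's complement word, in hex."""
    scaled = int(round(value * (1 << frac_bits)))
    digits = width // 4
    return format(scaled & ((1 << width) - 1), "0{}X".format(digits))


def build_da_lut(coeffs):
    """Entry at address X0 X1 ... holds the sum of coeffs[i] whose bit Xi is 1."""
    n = len(coeffs)
    lut = []
    for addr in range(1 << n):
        total = 0.0
        for i, c in enumerate(coeffs):
            # X0 is the most significant bit of the address.
            if (addr >> (n - 1 - i)) & 1:
                total += c
        lut.append(total)
    return lut


lut_values = build_da_lut([1.5, 2.5, -3.5])
lut_hex = [to_twos_complement_hex(v) for v in lut_values]
```

With H0 = 1.5, H1 = 2.5 and H2 = -3.5, lut_hex lists the entries in address order 000 to 111.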
 
The inputs are in 8-bit 2's complement form, with the first 4 bits for the integer portion and the
remaining 4 bits for the fractional part. Taking the inputs as X0= 1.5, X1= -0.75 and X2= -0.5, we have
X0= 00011000, X1= 11110100 and X2= 11111000. The addresses formed from these input values, taken bit by
bit from LSB to MSB, for accessing the contents of the LUT are given in Table 2 [7].
 
Table 2. Address for accessing LUT contents 
 
000 
000 
010 
101 
111 
011 
011 
011 
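
The address sequence of Table 2 can be generated by slicing the three input words bit by bit. The following Python sketch assumes the 8-bit format described above (4 fractional bits) and the address order X0X1X2; the function names to_fixed and da_addresses are hypothetical.

```python
def to_fixed(value, frac_bits=4, width=8):
    """Quantize a real value to a 2's complement word of the given width."""
    return int(round(value * (1 << frac_bits))) & ((1 << width) - 1)


def da_addresses(inputs, frac_bits=4, width=8):
    """Return one LUT address per bit position, from LSB (first) to MSB (last)."""
    words = [to_fixed(v, frac_bits, width) for v in inputs]
    addresses = []
    for bit in range(width):  # bit 0 is the LSB
        # Concatenate the selected bit of X0, X1, X2 in that order.
        addresses.append("".join(str((w >> bit) & 1) for w in words))
    return addresses


addresses = da_addresses([1.5, -0.75, -0.5])
```

The first four addresses come from the fractional bits and the last four from the integer bits, matching the order used in Table 2.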
 
It is to be noted that the first four addresses are formed by the fractional portion of the inputs and
the remaining four by the integer portion. DA uses shift-and-add operations. For the fractional part
(starting from the LSB), the LUT contents are recursively added and right shifted. For the integer part,
the LUT contents are first left shifted and then recursively added [8]. A point of caution here: if any
one of the inputs is negative (as seen from its MSB), the last shifted LUT content is to be subtracted.
The block diagram for the same is given in (Figure 2) below.

[Block diagram: inputs X0, X1, X2; LUT; Shift R and Shift L paths, each with an adder; final adder.]
 
Figure 2. Block Diagram for DA based computation 
 
The adders are initialized with zero. The sequence of output from each section is given below:
For fractional terms:

$$[(-2) + (2.5 + (0 + (0)2^{-1})2^{-1})2^{-1}]2^{-1} = -0.375$$

For integer terms:

$$0.5 + (-1)2^{1} + (-1)2^{2} - (-1)2^{3} = 2.5$$
 
Note that the last shifted item is subtracted since some of the inputs are negative. The final result is
the sum of the two computed values, i.e. -0.375 + 2.5 = 2.125, which agrees with direct evaluation of
Expression (1): (1.5)(1.5) + (2.5)(-0.75) + (-3.5)(-0.5) = 2.125.
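
The whole worked example can be checked with a short bit-serial model. The Python sketch below is an illustration under the usual 2's complement interpretation, in which the MSB always carries a negative weight (so the MSB LUT word is subtracted; when no input is negative that word is simply zero). It weights each LUT word directly instead of modelling the shift registers, and the function name da_sum_of_products is hypothetical.

```python
def da_sum_of_products(coeffs, inputs, frac_bits=4, width=8):
    """Evaluate sum(coeffs[i] * inputs[i]) using only LUT reads, scaling and adds."""
    n = len(coeffs)
    # LUT indexed by the address X0 X1 ... (X0 is the MSB of the address).
    lut = [sum(c for i, c in enumerate(coeffs) if (a >> (n - 1 - i)) & 1)
           for a in range(1 << n)]
    words = [int(round(v * (1 << frac_bits))) & ((1 << width) - 1) for v in inputs]
    acc = 0.0
    for bit in range(width):  # bit 0 is the LSB
        addr = 0
        for w in words:
            addr = (addr << 1) | ((w >> bit) & 1)
        weight = 2.0 ** (bit - frac_bits)
        if bit == width - 1:
            acc -= lut[addr] * weight  # sign bit: subtract the last LUT word
        else:
            acc += lut[addr] * weight
    return acc


result = da_sum_of_products([1.5, 2.5, -3.5], [1.5, -0.75, -0.5])
```

For the inputs of the example, result equals the value obtained by multiplying and adding directly.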
 
In order to speed up the computation, various topologies such as Offset Binary Coding (OBC), LUT
partitioning and 2-BAAT (Bit At A Time) are suggested in [1]. Efficient implementation of a Finite Impulse
Response (FIR) filter using the Distributed Arithmetic (DA) architecture is discussed in [2, 3]. A
detailed analysis for the FIR filter is presented in the next section [9].
 
3. DA Based FIR Filter Design and Analysis
In this section, we discuss the implementation of an FIR filter [3-10] using the DA approach.
 
3.1 FIR Filter Design Approach 
An FIR filter is a non-recursive filter that has no feedback; its input-output relation is given by

$$y(n) = \sum_{k=0}^{N-1} h(k)\,x(n-k) \tag{2}$$
 
 
It is characterized by the filter coefficients h(k). The filter coefficients are computed using the
Filter Design and Analysis (FDA) tool available in Simulink under the Xilinx Blockset. The GUI for the
same, as available in the MATLAB software, is shown in (Figure 3).
 
Figure 3. MATLAB FDA tool GUI 
 
 
 
 
As an example, for analysis purposes only, we have considered the following settings.
a) Response type: Lowpass
b) Filter order: 15
c) Filter design method: FIR (window, Hamming)
d) Specifications
Sampling frequency fs: 5 MHz
Cutoff frequency fc: 1.5 MHz
The magnitude response of this filter is as given below:
 
Figure 4. Magnitude response of designed FIR filter 
The tool also provides the filter coefficients, which are given below.
Table 3. Filter Coefficients 
 
 
Due to filter symmetry, h(k)=h(15-k) for this order-15 (16-tap) filter. The filter coefficients are
represented in 24-bit 2's complement format, with 9 bits for the integer part and the remaining 15 bits
for the fractional part. For clarity, the coefficients are written below in Hex format [10].
 
h(0) = h(15) = 00006E;
h(1) = h(14) = FFFFC5;
h(2) = h(13) = FFFE9D;
h(3) = h(12) = 0002E7;
h(4) = h(11) = 00021B;
h(5) = h(10) = FFF37E;
h(6) = h(9) = 0007A1;
h(7) = h(8) = 00410C;
 
 
 
Filter equation given in Equation (2) can be written as 
Y(n)=x(n)h(0) + x(n-1)h(1)+……+ x(n-15)h(15) (3) 
 
Due to even symmetry, h(0)=h(15), h(1)=h(14), ….. h(7)=h(8). Hence Equation (3) can be rewritten as: 
Y(n) = [x(n-0)+x(n-15)]h(0) + [x(n-1) + x(n-14)]h(1) 
+[x(n-2)+x(n-13)]h(2) + [x(n-3) + x(n-12)]h(3) 
+[x(n-4)+x(n-11)]h(4) + [x(n-5) + x(n-10)]h(5) 
+[x(n-6)+x(n-9)]h(6) + [x(n-7)+ x(n-8)]h(7) (4) 
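Writing each bracketed pair sum as a single signal, i.e. X015 = x(n-0) + x(n-15), X114 = x(n-1) + x(n-14),
and so on up to X78 = x(n-7) + x(n-8), the above expression can be rewritten as

Y(n) = X015h(0) + X114h(1) + X213h(2) + X312h(3) + X411h(4) + X510h(5) + X69h(6) + X78h(7) (5)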
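
The step from Equation (3) to Equation (5) can be checked numerically. The Python sketch below uses the hex coefficients listed above as 24-bit 2's complement integers and compares the 16-term sum with the 8-term folded sum. The helper names and the demo window of input samples are hypothetical, chosen only for illustration.

```python
def from_hex24(text):
    """Interpret a 6-digit hex string as a 24-bit 2's complement integer."""
    value = int(text, 16)
    return value - (1 << 24) if value & (1 << 23) else value


# h(0)..h(7) as listed in Section 3.1; h(8)..h(15) follow by symmetry.
H_HALF = [from_hex24(c) for c in
          ["00006E", "FFFFC5", "FFFE9D", "0002E7",
           "00021B", "FFF37E", "0007A1", "00410C"]]
H_FULL = H_HALF + H_HALF[::-1]


def y_direct(window):
    """Equation (3): window[k] holds x(n-k) for k = 0..15."""
    return sum(h * x for h, x in zip(H_FULL, window))


def y_folded(window):
    """Equations (4)/(5): add each symmetric pair first, then scale by h(k)."""
    pair_sums = [window[k] + window[15 - k] for k in range(8)]
    return sum(h * s for h, s in zip(H_HALF, pair_sums))


# Hypothetical window of 16 integer samples.
demo_window = [0, 243, 150, -150, -243] * 3 + [0]
```

Both functions return the same integer for any 16-sample window, while the folded form needs only eight coefficient terms.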
 
Direct computation of Equation (5) using distributed arithmetic would require a large LUT (2^8 = 256
entries for the eight terms). Hence it is partitioned into two smaller parts of four terms each, so that
each LUT needs only 2^4 = 16 entries. The block diagram in (Figure 5) shows the computation technique.
 
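[Block diagram: X015, X114, X213 and X312 feed a DA block with a LUT built from h(0), h(1), h(2), h(3); X411, X510, X69 and X78 feed a DA block with a LUT built from h(4), h(5), h(6), h(7); an adder combines the two block outputs to give Y.]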
 
Figure 5. DA based computation after LUT Partitioning 
 
The filter has a single input x(n), so the signals X015, X114, ..., X78 are generated by adding delayed
(shifted) versions of the sampled signal. Initially the signals X015, X114, ..., X78 are initialized to
zero. On each subsequent clock the stored samples are shifted, x(15)=x(14), ..., x(2)=x(1), x(1)=x(0),
and x(0) takes the current input value.

The address generation step, input preprocessing and LUT contents are discussed in the subsequent sections.
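
The shifting of samples and the formation of the pair sums can be modelled in a few lines. The Python sketch below is a behavioural illustration only, not the paper's VHDL; the class name InputPreprocessor and its push method are hypothetical.

```python
from collections import deque


class InputPreprocessor:
    """16-stage delay line that returns the eight pair sums X015 ... X78."""

    def __init__(self, taps=16):
        # All delayed samples start at zero, so the pair sums start at zero too.
        self.delay_line = deque([0] * taps, maxlen=taps)

    def push(self, sample):
        # x(0) takes the current value and every older sample moves one stage;
        # the oldest sample drops off the far end because of maxlen.
        self.delay_line.appendleft(sample)
        x = list(self.delay_line)
        half = len(x) // 2
        return [x[k] + x[len(x) - 1 - k] for k in range(half)]


pre = InputPreprocessor()
first_sums = pre.push(1)
for value in range(2, 17):
    last_sums = pre.push(value)
```

After sixteen pushes every stage holds a real input sample, which mirrors the start-up latency discussed in Section 4.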
 
3.2. Input Pre-processing Stage 
We have considered an input signal of frequency 1 MHz and a sampling frequency of 5 MHz in order to
check the working and performance of the low pass filter designed in VHDL. In this example the signal
amplitude is taken as A=1, but the design works for other amplitudes too.

x(t) = A sin(ωt), where ω = angular frequency
= sin(2π × 10^6 t)

Based on the sampling frequency, the sample time instants are 0 us, 0.2 us, 0.4 us, 0.6 us and 0.8 us.
The sampled values of the input signal are given below in 16-bit 2's complement format, with 8 bits for
the integer part and the remaining 8 bits for the fractional part. The Hex representation is shown
alongside [11].
sin(0.0)  =  0.000000 = 0000000000000000 = 0000
sin(0.4π) =  0.951056 = 0000000011110011 = 00F3
sin(0.8π) =  0.587785 = 0000000010010110 = 0096
sin(1.2π) = -0.587785 = 1111111101101010 = FF6A
sin(1.6π) = -0.951056 = 1111111100001101 = FF0D
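
The sampled values above can be regenerated as follows. This Python sketch assumes rounding to the nearest step of 1/256 (8 fractional bits), which reproduces the listed Hex words; the function names are hypothetical.

```python
import math


def to_q8_8_hex(value):
    """Quantize to 16-bit 2's complement with 8 fractional bits, as 4 hex digits."""
    scaled = int(round(value * 256))
    return format(scaled & 0xFFFF, "04X")


def sampled_sine_hex(f_in_hz, f_s_hz=5e6, count=5):
    """Hex words for sin(2*pi*f_in*t) sampled at t = n / f_s, n = 0..count-1."""
    return [to_q8_8_hex(math.sin(2 * math.pi * f_in_hz * n / f_s_hz))
            for n in range(count)]


samples_1mhz = sampled_sine_hex(1e6)
```

Calling sampled_sine_hex(2e6) gives the corresponding words for a 2 MHz input at the same sampling rate.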
 
The signals X015, X114, ..., X78 must now be prepared to form the address lines for accessing the
LUT contents (see Section 3.1). The block diagram given in (Figure 6) shows the input processing stage.
 
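[Block diagram: the input x(0) passes through a chain of delay elements producing x(1), x(2), ..., x(14), x(15); adders combine the delayed samples into X015, ..., X78, which form the address for accessing the LUT.]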
 
 
 
 
Figure 6. Input Preprocessing Block 
 
The signals X015, X114, X213, X312, X411, X510, X69 and X78 are generated using 16-bit binary adders.
 
3.3 LUT Contents 
As discussed in Section 3.1, we need two different LUTs, say LUT1 and LUT2. The contents of LUT1
are given in (Table 4) and are accessed by the address lines formed by X015 X114 X213 X312.
 
Table 4. LUT1 Contents

Address                LUT Contents as derived    LUT Contents
X015 X114 X213 X312    from Equation (5)          (in Hex)
0000                   0                          000000
0001                   h(3)                       0002E7
0010                   h(2)                       FFFE9D
0011                   h(2)+h(3)                  000184
0100                   h(1)                       FFFFC5
0101                   h(1)+h(3)                  0002AC
0110                   h(1)+h(2)                  FFFE62
0111                   h(1)+h(2)+h(3)             000149
1000                   h(0)                       00006E
1001                   h(0)+h(3)                  000356
1010                   h(0)+h(2)                  FFFF0E
1011                   h(0)+h(2)+h(3)             0001F3
1100                   h(0)+h(1)                  000033
1101                   h(0)+h(1)+h(3)             00031B
1110                   h(0)+h(1)+h(2)             FFFED1
1111                   h(0)+h(1)+h(2)+h(3)        0001B8
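
The structure of Table 4 can be reproduced from the hex coefficients of Section 3.1. The Python sketch below sums the already rounded 24-bit coefficients, so a few entries may differ from the table in their last digit, presumably because the table was derived from coefficients held at higher precision before rounding (an assumption, not stated in the paper). The helper names are hypothetical.

```python
def from_hex24(text):
    """Interpret a 6-digit hex string as a 24-bit 2's complement integer."""
    value = int(text, 16)
    return value - (1 << 24) if value & (1 << 23) else value


def to_hex24(value):
    """Encode an integer as a 24-bit 2's complement word in hex."""
    return format(value & 0xFFFFFF, "06X")


def build_lut(coeff_hex):
    """Entry at each address is the sum of the coefficients whose address bit is 1."""
    coeffs = [from_hex24(c) for c in coeff_hex]
    n = len(coeffs)
    lut = []
    for addr in range(1 << n):
        # The first coefficient corresponds to the most significant address bit.
        total = sum(c for i, c in enumerate(coeffs) if (addr >> (n - 1 - i)) & 1)
        lut.append(to_hex24(total))
    return lut


lut1 = build_lut(["00006E", "FFFFC5", "FFFE9D", "0002E7"])
```

The same function applied to h(4) to h(7) gives the layout of LUT2.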
 
The contents of LUT2 are given in (Table 5) and are accessed by the address lines formed by X411 X510 X69 X78.
Table 5. LUT2 Contents

Address               LUT Contents as derived    LUT Contents
X411 X510 X69 X78     from Equation (5)          (in Hex)
0000                  0                          000000
0001                  h(7)                       00410C
0010                  h(6)                       0007A1
0011                  h(6)+h(7)                  0048AE
0100                  h(5)                       FFF37E
0101                  h(5)+h(7)                  00348A
0110                  h(5)+h(6)                  FFFB20
0111                  h(5)+h(6)+h(7)             003C2B
1000                  h(4)                       00021B
1001                  h(4)+h(7)                  004328
1010                  h(4)+h(6)                  0009BD
1011                  h(4)+h(6)+h(7)             004AC9
1100                  h(4)+h(5)                  FFF59A
1101                  h(4)+h(5)+h(7)             0036A6
1110                  h(4)+h(5)+h(6)             FFFD3C
1111                  h(4)+h(5)+h(6)+h(7)        003E47
 
The complete block diagram representation for FIR filter operation is given in (Figure 7). 
 
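[Block diagram: Input x(n) -> Input Pre-processing Block (X015 X114 X213 X312 X411 X510 X69 X78) -> Distributed Arithmetic Based Computation using LUT1 and LUT2 -> Output of Filter Y(n)]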
Figure 7. Block Diagram of the Designed System 
 
The next section presents the simulation and synthesis results. Each module is designed in VHDL and
simulated in ModelSim. Synthesis is performed with Xilinx ISE and Synopsys Design Compiler.
 
4. Simulation and Synthesis Results
The VHDL design includes ports for clk, reset, input (x) and output (y). As discussed in Section 3.2, one
period of the input consists of only five samples, which repeat. The sampled data are shifted/delayed to
form x(1), x(2), x(3), up to x(15), so the correct filter output is available only after 15 clock pulses.
Table 6 shows the outputs for 25 clock pulses. The 24-bit filter output from the VHDL simulation is
represented in hexadecimal. These values are then converted into decimal values, keeping in mind that they
are in 2's complement representation with 9 bits for the integer part and the remaining 15 bits for the
fractional part. (Table 6) also shows the hand-calculated decimal values of the expected output. It can be
seen that the simulated and hand-calculated values are nearly equal.
 
 
 
Table 6. Output values comparison for Input freq (1MHz) 
 
S. No   Hand Calculation   Simulation output (in Hexadecimal)   Simulation output (in Decimal)
1 0.0 000000 0.0 
2 +0.0032 000065 +0.003082 
3 +0.0002 000005 +0.000153 
4 -0.0134 FFFE49 -0.013397 
5 +0.0131 0001AC +0.013061 
6 +0.0372 0004B7 +0.036835 
7 -0.0833 FFF54F -0.083526 
8 -0.0318 FFFBFA -0.031433 
9 +0.5486 004613 +0.547454 
10 +0.8558 006D1F +0.852508 
11 +0.0372 0004B7 +0.036834 
12 -0.9260 FF8A5C -0.919067 
13 -0.5938 FFB4B2 -0.588317 
14 +0.5807 004A2E +0.579528 
15 +0.9424 00782F +0.938934 
16 0.0 000000 0.0 
17 -0.9391 FF88B3 -0.932037 
18 -0.5804 FFB665 -0.575042 
19 +0.5804 004A26 +0.579284 
20 +0.9391 0077C6 +0.935729 
21 0.0 000000 0.0 
22 -0.9391 FF88B3 -0.932037 
23 -0.5804 FFB665 -0.575042 
24 +0.5804 004A26 +0.579284 
25 +0.9391 0077C6 +0.935729 
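
The conversion from the hexadecimal simulation output to the decimal column can be sketched as follows. The Python function below assumes the format stated in the text (24-bit 2's complement with 15 fractional bits); its name q9_15_to_float is hypothetical.

```python
def q9_15_to_float(hex_word):
    """Convert a 24-bit 2's complement hex word with 15 fractional bits to a float."""
    raw = int(hex_word, 16)
    if raw & 0x800000:       # sign bit set: the word encodes a negative value
        raw -= 1 << 24
    return raw / 2 ** 15     # scale by the weight of the last fractional bit


row_2 = q9_15_to_float("000065")
row_4 = q9_15_to_float("FFFE49")
row_10 = q9_15_to_float("006D1F")
```

The three rows shown here agree with the decimal column of Table 6 to the printed precision.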
 
The input and output waveforms plotted in MATLAB are shown in (Figure 8). The filtered output starts
appearing after 15 clock cycles, i.e. 1.5 us.
 
Figure 8. Input and output waveform from data plot of Table 6 
 
 
The output waveform from VHDL simulation is shown in (Figure 9). 
 
 
Figure 9. Simulated output from Model-Sim 
 
 
It takes 15 clock cycles for the shifted input data to fill the contents of x(0) to x(15). From the 16th
clock onwards, the filter output appears normally. Since this is a low pass filter and the selected
frequency lies in the pass band, the output has nearly the same amplitude as the input. The design was also
tested for other input magnitudes (less than 1), and the output is again nearly the same as the input
(after the 16th clock pulse).
 
We have also computed the output of the FIR filter for an input signal frequency of 2 MHz, i.e. higher
than the cut-off frequency. The sampled input values are as given below:
 
sin(0)    =  0.000000 = 0000000000000000 = 0000
sin(0.8π) =  0.587785 = 0000000010010110 = 0096
sin(1.6π) = -0.951056 = 1111111100001101 = FF0D
sin(2.4π) =  0.951056 = 0000000011110011 = 00F3
sin(3.2π) = -0.587785 = 1111111101101010 = FF6A
 
Table 7. Output values comparison for Input freq (2 MHz) 
 
S. No. Hex values Decimal values 
1 000000 0.0 
2 00003E +0.00189 
3 FFFF72 -0.00433 
4 FFFFCF -0.00149 
5 000284 +0.01965 
6 FFFD48 -0.02124 
7 FFFA7C -0.04309 
8 001005 +0.12515 
9 00118A +0.13702 
10 FFF9BC -0.04895 
11 FFFD48 -0.02124 
12 0003D5 +0.02993 
13 FFFE58 -0.01294 
14 0000F4 +0.00745 
15 FFFF85 -0.003753 
16 000000 0.0 
17 000148 +0.01001 
18 FFFE87 -0.01150 
19 00017E +0.01165 
20 FFFF44 -0.00573 
21 000000 0.0 
22 000148 +0.01001 
23 FFFE87 -0.01150 
24 00017E +0.01165 
25 FFFF44 -0.00573 
 
 
These values are plotted in MATLAB as shown below.
 
 
Figure 10. Input and output waveform from data plot of Table.7 
 
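It is clear from the above results that the filter strongly attenuates signals whose frequency is greater
than the cut-off frequency: the steady-state output for the 2 MHz input stays within about ±0.012, compared
with about ±0.94 for the 1 MHz input. To reduce the LUT size, the filter can be designed using Off-Set
Binary Coding (OBC), as discussed in the next section.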
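
The pass-band and stop-band behaviour seen in Tables 6 and 7 can be cross-checked directly from the coefficients. The Python sketch below evaluates the magnitude of the frequency response of the 16-tap filter, using the hex coefficients of Section 3.1 scaled by 2^-15. It is an independent floating-point check, not part of the VHDL design, and the function names are hypothetical.

```python
import cmath
import math


def from_hex24(text):
    """Interpret a 6-digit hex string as a 24-bit 2's complement integer."""
    value = int(text, 16)
    return value - (1 << 24) if value & (1 << 23) else value


# h(0)..h(7) with 15 fractional bits; h(8)..h(15) follow by symmetry.
H_HALF = [from_hex24(c) / 2 ** 15 for c in
          ["00006E", "FFFFC5", "FFFE9D", "0002E7",
           "00021B", "FFF37E", "0007A1", "00410C"]]
H_FULL = H_HALF + H_HALF[::-1]


def gain(freq_hz, fs_hz=5e6):
    """Magnitude of H(e^{jw}) = sum_k h(k) e^{-jwk} at w = 2*pi*f/fs."""
    w = 2 * math.pi * freq_hz / fs_hz
    return abs(sum(h * cmath.exp(-1j * w * k) for k, h in enumerate(H_FULL)))


gain_1mhz = gain(1e6)
gain_2mhz = gain(2e6)
```

The gain comes out close to 1 at 1 MHz and a small fraction of that at 2 MHz, consistent with the tabulated simulation outputs.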
 
 
5. OBC-DA Based FIR Filter
In OBC-DA, the LUT size is reduced to half of that of normal DA [1]. The details of OBC-DA based
sum-of-products computation are available in the literature. In this section, we discuss the design of
the FIR filter using the OBC technique. The address for accessing the LUT contents is generated by XORing
each bit of the first input with the corresponding bit of each of the other inputs. For example, LUT1 is
accessed by the address formed by [X015 ⊕ X114, X015 ⊕ X213, X015 ⊕ X312]. Similarly, LUT2 is accessed by
the address formed by [X411 ⊕ X510, X411 ⊕ X69, X411 ⊕ X78].
 
 
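[Block diagram: X015 ⊕ X114, X015 ⊕ X213 and X015 ⊕ X312 address the OBC-DA block with LUT 1; X411 ⊕ X510, X411 ⊕ X69 and X411 ⊕ X78 address the second OBC-DA block; an adder combines the two block outputs.]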
 
 
Figure 11. Block Diagram for OBC-DA based 
 
The LUT contents shown in Tables 4 and 5 are modified according to the OBC-DA technique. Table 8
shows the LUT1 contents for the block diagram given in (Figure 11).
 
Table 8. LUT1 Contents for OBC-DA

New Address       LUT Contents as derived from    LUT Contents
X114 X213 X312    Equation (5) based on OBC       (in Hex)
000               -1/2(h0+h1+h2+h3)               FFFF24
001               -1/2(h0+h1+h2-h3)               00020C
010               -1/2(h0+h1-h2+h3)               FFFDC1
011               -1/2(h0+h1-h2-h3)               0000A7
100               -1/2(h0-h1+h2+h3)               FFFEEA
101               -1/2(h0-h1+h2-h3)               0001D1
110               -1/2(h0-h1-h2+h3)               FFFD86
111               -1/2(h0-h1-h2-h3)               00006C
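
The expressions in the middle column of Table 8 can be evaluated with the short Python sketch below. It works in units of the least significant bit of the 24-bit coefficient format, using the integer values of h(0) to h(3) from Section 3.1 (110, -59, -355 and 743). The results are unrounded halves, so they agree with the Hex column only up to rounding. The function name obc_lut is hypothetical.

```python
def obc_lut(coeffs):
    """Return -1/2 * (c0 +/- c1 +/- ... ), one entry per address.

    The first coefficient always enters with a plus sign; address bit 1
    (MSB first) flips the sign of the corresponding remaining coefficient.
    """
    n = len(coeffs)
    table = []
    for addr in range(1 << (n - 1)):  # half as many entries as plain DA
        total = coeffs[0]
        for i in range(1, n):
            bit = (addr >> (n - 1 - i)) & 1
            total += -coeffs[i] if bit else coeffs[i]
        table.append(-0.5 * total)
    return table


obc_lut1 = obc_lut([110, -59, -355, 743])
```

The table has eight entries for four coefficients, compared with sixteen entries in Table 4.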
 
Similarly, (Table 9) shows the LUT2 contents.
 
Table 9. LUT2 Contents for OBC-DA

New Address      LUT Contents as derived from    LUT Contents
X510 X69 X78     Equation (5) based on OBC       (in Hex)
000              -1/2(h4+h5+h6+h7)               FFE0DD
001              -1/2(h4+h5+h6-h7)               0021E8
010              -1/2(h4+h5-h6+h7)               FFE87E
011              -1/2(h4+h5-h6-h7)               00298A
100              -1/2(h4-h5+h6+h7)               FFD45B
101              -1/2(h4-h5+h6-h7)               001566
110              -1/2(h4-h5-h6+h7)               FFDBFC
111              -1/2(h4-h5-h6-h7)               001D08
 
 
The filter designed with OBC-DA is tested with the same set of input data as used for DA. The
hand-calculated values and those obtained from VHDL simulation are summarized in (Table 10).
Table 10. Output values comparison for Input freq (1MHz) using OBC-DA 
 
S. No   Hand Calculation   Simulation output (in Hexadecimal)   Simulation output (in Decimal)
1 0.0 000000 0.0 
2 +0.0032 000068 +0.00317 
3 +0.0002 000006 +0.00018 
4 -0.0134 FFFE45 -0.01352 
5 +0.0131 0001A4 +0.01281 
6 +0.0372 0004B9 +0.03689 
7 -0.0833 FFF567 -0.08279 
8 -0.0318 FFFBF7 -0.03152 
9 +0.5486 0045D5 +0.54556 
10 +0.8558 006CE7 +0.85079 
11 +0.0372 0004B9 +0.03690 
12 -0.9260 FF8A27 -0.92069 
13 -0.5938 FFB46D -0.59042 
14 +0.5807 0049E7 +0.57736 
15 +0.9424 0077F0 +0.93701 
16 0.0 FFFFFA -0.00018
17 -0.9391 FF887C -0.93323
18 -0.5804 FFB623 -0.57705
19 +0.5804 0049DF +0.57712
20 +0.9391 007785 +0.93375
21 0.0 FFFFFA -0.00018
22 -0.9391 FF887C -0.93323
23 -0.5804 FFB623 -0.57705
24 +0.5804 0049DF +0.57712
25 +0.9391 007785 +0.93375
 
The simulated output is nearly the same as that obtained by the DA technique. The synthesis results for
the Xilinx Virtex-4 ML402 FPGA board are given in (Table 11).
 
Table 11. Comparison between DA and OBC-DA

Comparison Parameters     DA      OBC-DA
Number of Slices          733     722
Number of 4-Input LUT     1186    1166
Delay (in ns)             19.4    21.69
 
Synthesis has also been performed using Synopsys Design Compiler with a 32 nm standard-cell library.
The results are given in (Table 12).
Table 12. Area, Delay and Power Comparison

Technique   Area (µm2)   Power (µW)   Delay (ns)
DA          11059        305          3.86
OBC         10654        275          4.07
 
It can be seen that OBC-DA requires less hardware than DA (about 4% less cell area and about 10% less
power in Table 12) but has slightly more delay, due to the extra overhead in address generation.
 
 
 
6. Conclusion
Multiply Accumulate (MAC) structures consume considerable power and area, and have large latency
and low throughput. We have therefore analyzed FIR filter design using Distributed Arithmetic
architectures with two topologies: simple LUT based DA, and DA-OBC processing 1 bit at a time (1-BAAT).
Both topologies have advantages over MAC structures, since they avoid explicit multipliers. DA-OBC using
1-BAAT consumes less area than DA (10654 µm2 against 11059 µm2 in Table 12), at the cost of slightly
higher delay.
 
References 
 
[1] Stanley A. White, "Applications of Distributed Arithmetic to Digital Signal Processing: A Tutorial Review",
IEEE ASSP Magazine, July 1989, pp. 4-19.
[2] Cui Guo-wei, Wang Feng-ying, "The Implementation of FIR Low-pass Filter Based on FPGA and DA", Fourth
International Conference on Intelligent Control and Information Processing (ICICIP), 9-11 June 2013, pp. 604-605.
[3] Ramesh R., Nathiya R., "Realization of FIR Filter Using Modified Distributed Arithmetic Architecture", An
International Journal (SIPIJ) on Signal & Image Processing, Vol. 3, No. 1, February 2012, pp. 83-94.
[4] M. Yazhini and R. Ramesh, "FIR Filter Implementation using Modified Distributed Arithmetic Architecture",
Indian Journal of Science and Technology (INDJST), Vol. 6 (5), May 2013, pp. 4485-4491.
[5] Yajun Zhou, "Distributed Arithmetic for FIR Filter Implementation on FPGA", IEEE International Conference on
Multimedia Technology (ICMT), July 2011, pp. 294-297.
[6] Heejong Yoo and David V. Anderson, "Hardware-Efficient Distributed Arithmetic Architecture for High-Order
Digital Filters", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), April
2005, pp. 125-128.
[7] Distributed Arithmetic FIR Filter, Xilinx DS240 (www.xilinx.com) Product Specification, V 9.0, April 28, 2005,
pp. 1-45.
[8] Haw-Jing Lo, "Distributed Arithmetic", in "Design of a Reusable Distributed Arithmetic Filter and Its
Application to the Affine Projection Algorithm", Georgia, UMI Microform, 2009, pp. 3-12.
[9] Wang Sen, Tang Bin, Zhu Jun, "Distributed Arithmetic for FIR Filter Design on FPGA", Supported by the
National Defense Pre-research Fund of China, pp. 620-623.
[10] B. Ayyappa Reddy, G. Sambasiva Rao, "Distributed Arithmetic Unit Design for FIR Filter", International Journal
and Magazine of Engineering, Technology, Management and Research, Feb 2015, pp. 7-11.
[11] Febriyanto, E., & Naufal, R. S. (2019). Attitude Competency Assessment in the 2013 Curriculum Based On
Elementary School Prototyping Methods. IAIC Transactions on Sustainable Digital Innovation, 1(1), 87-96.
