IMPLEMENTATION OF LOW AREA AND DELAY BIT LEVEL ADDER-TREES by Jhansi, D. & Eswara Rao, P.
D. Jhansi* et al. 
(IJITR) INTERNATIONAL JOURNAL OF INNOVATIVE TECHNOLOGY AND RESEARCH 
 Volume No.5, Issue No.2, February – March 2017, 5772-5776. 
2320 –5547 @ 2013-2017 http://www.ijitr.com All rights Reserved.  Page | 5772 
 
Implementation of Low Area and Delay Bit 
Level Adder-Trees 
D.JHANSI 
Student 
Sarada Institute of Science Technology and 
Management, Ampolu Road, Srikakulam 532404, 
Andhra Pradesh 
P.ESWARA RAO 
Assistant Professor 
Sarada Institute of Science Technology and 
Management, Ampolu Road, Srikakulam 532404, 
Andhra Pradesh. 
Abstract: Digital filters are becoming ubiquitous in audio applications. As a result, good digital filter 
performance is important to audio system design. Digital filters differ from analog filters in that they 
use finite precision to represent signals and finite-precision arithmetic to compute the filter 
response. In this project, an FIR filter is implemented in Verilog using Xilinx ISE, and the 
waveforms are observed through simulation. 
Adders cost less than multipliers in terms of silicon area, which is an 
advantage in the FIR structure. The multipliers chosen for this project are Booth and Wallace, and the 
adders considered are carry save and carry skip. RTL is developed for the 
structures, their functionality is verified, and synthesis is performed using the Xilinx 
synthesizer. The results are compared in terms of area (LUTs), power, delay and memory for the various FIR 
structures. 
I. INTRODUCTION 
A filter is a frequency-selective network: it 
passes a particular band of frequencies and 
attenuates all the remaining frequencies. Filters are 
of two types, analog and digital. Depending on 
its impulse response, a filter is classified as 
either finite impulse response (FIR) or 
infinite impulse response (IIR). 
Digital filters: 
Digital filters are widely used in the electronics industry. 
Compared to analog filters, digital filters can attain 
a much better signal-to-noise ratio, which is why 
they are preferred over analog filters. Digital filters 
perform noiseless mathematical operations at 
each intermediate step in the transform. Design 
engineers use digital filters to achieve 
performance levels that are difficult to obtain with 
analog filters. 
A digital filter performs the following three 
operations: 
1. Addition or subtraction of signals. 
2. Multiplication of a signal by a constant value. 
3. Delaying a digital signal by one or more 
sample periods. 
A digital filter can be described graphically by a 
block diagram, as shown in the figure below. 
[Figure: block diagram with input X(n), a time delay (Z^-1) of T producing X(n-1), multiplication by A producing AX(n-1), a summation, and output Y(n).] 
 
Fig.1: Block Diagram of a Simple Digital Filter 
δ(n) is the unit impulse function; when it is given as input to 
a filter, the response is the impulse response h(n). If the impulse 
response of a system is known, it is possible to 
calculate the system response for any input 
sequence x(n). The unit impulse is applied to the 
system at sample index n = 0, so the impulse response is 
non-zero only for values of n greater than or equal 
to zero, i.e., h(n) is zero for n < 0. Such an impulse 
response is said to be causal; otherwise the system 
would be producing a response before an input has 
been applied. It is known from the time-invariance 
property of a Linear Time Invariant system that the 
response of a system to a delayed unit impulse 
δ(n-k) is h(n-k). 
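The relationship between the impulse response and the output for an arbitrary input can be illustrated with a short behavioural model. The following is a minimal Python sketch, not the Verilog RTL used in this project; the function name fir_filter and the coefficient values are assumptions chosen only for illustration. It uses just the three operations listed earlier (delay, multiplication by a constant, and addition).

```python
def fir_filter(x, h):
    """Causal FIR filter: y(n) = sum over k of h(k) * x(n - k)."""
    y = []
    for n in range(len(x)):
        acc = 0
        for k, coeff in enumerate(h):
            if n - k >= 0:                 # causal: no input samples before n = 0
                acc += coeff * x[n - k]    # delayed sample times a constant, then add
        y.append(acc)
    return y


h = [1, 2, 3]                              # hypothetical impulse response h(n)
print(fir_filter([1, 0, 0, 0, 0], h))      # unit impulse in: [1, 2, 3, 0, 0]
print(fir_filter([0, 0, 1, 0, 0], h))      # impulse delayed by 2: [0, 0, 1, 2, 3]
```

The second call shows the time-invariance property: delaying the unit impulse by two samples delays the response by the same amount.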
 
Multipliers are among the major components 
used in digital signal processing systems. Multiplication 
can be carried out as a series of repeated additions; 
historically it has been implemented as a 
combination of addition, subtraction and shifting. 
The multiplicand is the number to be added, 
the multiplier is the number of times it is added, and 
the product is the result. A partial product is 
generated at each step of the addition. To preserve 
the information content, the operands are 
interpreted as integers and the product is generated 
with twice the length of the operands. Multiplication 
is carried out in two steps: the first 
generates the partial products of the given 
operands, and the second adds all the partial 
products. 
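The two steps can be sketched in a few lines of Python. This is an illustrative model for unsigned operands only; the function names partial_products and multiply are assumptions and do not come from the project's RTL.

```python
def partial_products(multiplicand, multiplier, width):
    """Step 1: one partial product per multiplier bit, shifted into position."""
    return [
        (multiplicand << i) if (multiplier >> i) & 1 else 0
        for i in range(width)
    ]


def multiply(multiplicand, multiplier, width):
    """Step 2: add all partial products; the result needs up to 2 * width bits."""
    return sum(partial_products(multiplicand, multiplier, width))


print(partial_products(13, 11, 4))   # [13, 26, 0, 104]
print(multiply(13, 11, 4))           # 143
```

Each partial product is either zero or a shifted copy of the multiplicand, so the cost of the multiplier is dominated by the adders that sum them.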
The basic multiplication principle is twofold: the 
partial products are evaluated first, and then all of 
them are added up to get the final result. 
A shift operation is used to gate the appropriate bit 
with the multiplicand. To perform 
multiplication on both signed and 
unsigned numbers, the two’s complement 
method is used. 
Because the multiplier is a key element of any such system, the 
performance of the system is determined by the 
performance of the multiplier. Its limitation is that 
it consumes a large area, so reducing the area 
and improving the speed of the multiplier 
are the major issues. Speed and area trade off 
against each other: a system designed 
for high-speed operation occupies more 
area. Because of this area problem, the 
system is designed using parallel processing. 
These multipliers have moderate performance in 
both speed and area. The existing multiplier 
design is complicated by its switching 
system and by irregularities in the design. These 
problems arise when digits are processed serially, so 
a radix-2^n multiplier is used, which operates on the 
digits in parallel. This concept was 
introduced by M.K. Ibrahim in 1993. 
Pipelining gives the advantage of an operating 
speed that is constant regardless of the size of the 
multiplier. The digit size is set by the required clock 
speed and is fixed before the design is implemented. 
 
Fig.2: FIR Filter 
Adders are used not only to perform addition 
but also to calculate addresses in 
processors and to perform similar operations. 
Adders can be built for many number representations, 
such as binary-coded decimal and 
excess-3, but the most common adders operate on 
binary numbers. Where two’s or one’s 
complement is used to represent negative numbers, 
the same adder can also perform subtraction. 
When three or more operands are to be added, a 
compressor is used to speed up the summation process. 
The carry save adder is one example of such an 
adder. The circuit may contain more than one 
adder when there are more than four addends. The Wallace 
tree is one of the most widely used circuits of this kind 
and is notably used in 
multipliers. 
II. ADDERS  
Carry-Save Adder: 
A carry-save adder is used to add three or more 
addends. There are many cases 
where it is desired to add more than two numbers 
together. Consider first the ordinary addition below: 
12345678 
+87654322 
=100000000 
Addition starts from the least significant digit, and the 
carry is passed to the next digit, and so on 
until the final result is obtained, so the 
carry is passed through the whole process. An n-digit 
addition therefore takes a time proportional to n, to allow a possible carry to 
propagate from one end of the number to the other. Until the carry has propagated: 
1. The result of the addition is not known. 
2. It is not known whether the result is larger or 
smaller than a given number. 
3. It is not known whether the result is positive or negative. 
The major limitation of the ripple carry 
adder is the “carry propagation delay”, which can be 
reduced by using a “carry look-ahead adder”. 
For large numbers, however, this adder 
also introduces some delay. 
Here is an example of a binary sum: 
10111010101011011111000000001101 
+11011110101011011011111011101111 
Carry-save arithmetic computes the sum digit by digit, without propagating any carry, as 
10111010101011011111000000001101 
+11011110101011011011111011101111 
=21122120202022022122111011102212. 
Because each digit is computed independently, 
without waiting for the carry 
generated by the previous bit, the whole operation 
is performed in a single clock tick. 
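The digit-by-digit sum above can be reproduced with a short Python sketch. The helper names digitwise_sum and digits_to_int are assumptions introduced only for this illustration.

```python
def digitwise_sum(a_bits, b_bits):
    """Add two equal-length binary strings digit by digit, with no carry."""
    return "".join(str(int(a) + int(b)) for a, b in zip(a_bits, b_bits))


def digits_to_int(digits):
    """Evaluate a base-2 digit string whose digits may be 0, 1 or 2."""
    value = 0
    for d in digits:
        value = 2 * value + int(d)
    return value


a = "10111010101011011111000000001101"
b = "11011110101011011011111011101111"
s = digitwise_sum(a, b)
print(s)                                              # 21122120202022022122111011102212
print(digits_to_int(s) == int(a, 2) + int(b, 2))      # True
```

The digit string with entries 0, 1 and 2 still represents the correct sum in base 2; converting it back to ordinary binary is the step that requires carry propagation.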
Each adder cell adds individual bits and 
produces a sum bit and a carry bit. A carry save adder 
is not preferred for adding just two numbers; in the 
multiplication process it is used to add up all the 
generated partial products. 
At each stage of carry save addition we observe 
the following: 
1. The result of the addition is known at once. 
2. It is still not known whether the result is larger or 
smaller than a given number. 
Consider three n-bit numbers a, b, and c given as 
inputs to the carry save unit. Each bit position 
generates its sum and carry bits independently of the 
sum and carry of the previous position. The carry save unit consists of 
n full adders. 
Adding three or more numbers with two ripple carry 
adders in series produces a delay, which is 
reduced by using a carry save adder followed by a 
single ripple carry adder. The delay arises because a ripple carry 
adder cannot complete the addition at a bit position 
until it receives the carry from the previous bit. 
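A behavioural sketch of a carry save unit followed by a ripple carry adder is given below. It is written in Python for illustration and is not the project's Verilog; the function names carry_save_add and ripple_carry_add are assumptions.

```python
def carry_save_add(a, b, c):
    """n independent full adders: return (sum word, carry word)."""
    sum_word = a ^ b ^ c                              # per-bit sum
    carry_word = ((a & b) | (b & c) | (a & c)) << 1   # per-bit carry, weight doubled
    return sum_word, carry_word


def ripple_carry_add(a, b, width):
    """Add two words bit by bit, passing the carry to the next position."""
    result, carry = 0, 0
    for i in range(width):
        abit, bbit = (a >> i) & 1, (b >> i) & 1
        result |= (abit ^ bbit ^ carry) << i
        carry = (abit & bbit) | (carry & (abit ^ bbit))
    return result | (carry << width)


s, c = carry_save_add(5, 6, 7)
print(s, c)                         # 4 14
print(ripple_carry_add(s, c, 5))    # 18
```

Only the final ripple carry stage has to wait for a carry chain; the carry save stage has the delay of a single full adder regardless of the word length.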
Carry Skip Adder: 
A carry skip adder, also known as a carry-bypass 
adder, contains special blocks that examine the bits 
to be added in order to detect whether a carry can skip over a block. 
At each bit position, the carry is either generated or 
propagated. The signal produced by this 
detection circuit is known as the “propagation signal”. The carry 
signal is transmitted through the stages of 
blocks, and the propagation time depends on the 
position at which the carry is generated. If 
no carry needs to be computed, only the time 
required to compute the sum is 
considered. The block diagram below has four 
multiplexers and represents a 16-bit carry skip 
adder. 
 
Fig.3: Structure of Carry Skip Adder 
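The behaviour of a 16-bit carry skip adder with four multiplexers can be modelled as follows. This is a hedged Python sketch that assumes four equal blocks of 4 bits; the block size, the function name carry_skip_add and its parameters are assumptions, and the model captures only the logic, not the timing, of the circuit.

```python
def carry_skip_add(a, b, width=16, block=4):
    """Return (sum, carry_out) using ripple blocks with a skip multiplexer each."""
    result = 0
    carry = 0
    for base in range(0, width, block):
        block_cin = carry
        ripple = block_cin
        all_propagate = True
        for i in range(base, base + block):
            abit, bbit = (a >> i) & 1, (b >> i) & 1
            p = abit ^ bbit                  # propagate signal for this bit
            g = abit & bbit                  # generate signal for this bit
            result |= (p ^ ripple) << i
            ripple = g | (p & ripple)
            all_propagate = all_propagate and p == 1
        # Multiplexer: if every bit propagates, the block carry-in skips the block.
        carry = block_cin if all_propagate else ripple
    return result, carry


print(carry_skip_add(0xFFFF, 0x0001))    # (0, 1)
print(carry_skip_add(0x1234, 0x0F0F))    # (8515, 0)
```

In hardware the benefit is in delay: when a block's propagation signal is high, the carry reaches the next block through the multiplexer instead of rippling through four full adders.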
III. BOOTH’S MULTIPLICATION 
ALGORITHM 
The algorithm examines adjacent pairs of bits of the N-bit 
multiplier y in signed two’s 
complement representation, including an implicit 
bit below the least significant bit, y_(-1) = 0. 
For each bit y_i, with i running 
from 0 to N-1, the bits y_i and y_(i-1) are considered. Where these two bits are equal, the 
product accumulator P is left unchanged. Where 
y_i = 0 and y_(i-1) = 1, the multiplicand (weighted by 2^i) is added to P; 
where y_i = 1 and y_(i-1) = 0, it is subtracted 
from P. The final value of P is the 
signed product. 
The representations of the multiplicand and the product 
are not specified; typically both are in 
two’s complement, but any 
number system that supports addition and 
subtraction will work. The order of the steps is also 
not fixed; by default the algorithm proceeds from LSB to MSB, 
starting at i = 0. 
The algorithm can be implemented by repeatedly adding one of two predetermined 
values A and S to a product P and then 
performing a rightward arithmetic shift on P. 
In the steps below, m and r represent the 
multiplicand and multiplier respectively, and 
x and y represent the number of bits 
in m and r. 
1. Determine the values of A and S and the 
initial value of P. All three values 
should have a length of (x+y+1) bits. 
   1. A: fill the most significant (leftmost) x bits 
   with the value of m, and fill the 
   remaining (y+1) bits with zeros. 
   2. S: fill the most significant x bits with the 
   value of -m in two’s complement notation, 
   and fill the remaining 
   (y+1) bits with zeros. 
   3. P: fill the most significant x bits with zeros, 
   append the value of r to the right of these, 
   and set the least significant bit 
   to zero. 
2. Examine the two least significant (rightmost) bits of P. 
   1. If they are 01, find the value of 
   P+A. 
   2. If they are 10, find the value of 
   P+S. 
   3. If they are 00 or 11, no computation is 
   needed; use P as it is. 
   4. Arithmetically shift the value obtained 
   above one place to the right. P 
   now takes this new value. 
3. Repeat step 2 until it has been performed y 
times. 
4. Drop the least significant (rightmost) bit from P. The 
remaining bits are the product of m and r. 
Example: 
Find 3*(-4), with m = 3 and r = -4; the values of x and 
y are 4 and 4 respectively. 
1. m = 0011, -m = 1101, r = 1100 
2. A = 0011 0000 0 
3. S = 1101 0000 0 
4. P = 0000 1100 0 
5. The loop is performed four times, as y 
is four bits: 
   1. P = 0000 1100 0. The last two bits are 
   00, so shift right: 
   P = 0000 0110 0. 
   2. P = 0000 0110 0. The last two bits are 
   00, so shift right again: 
   P = 0000 0011 0. 
   3. P = 0000 0011 0. The last two bits are 
   10, so P = P+S = 1101 0011 0. 
   Arithmetic right shift: P = 1110 1001 1. 
   4. P = 1110 1001 1. The last two bits are 
   11, so no addition is needed. 
   Arithmetic right shift: P = 1111 0100 1. 
The final product, after dropping the least significant bit, is 1111 0100, which is equal to -12. 
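The A, S and P procedure can be checked with a small behavioural model. The following Python sketch is an illustration only, not the project's Verilog; the function name booth_multiply is an assumption, and the sketch does not handle the most negative multiplicand (for example m = -8 with x = 4), which would need an extra bit.

```python
def booth_multiply(m, r, x, y):
    """Booth multiplication of signed m (x bits) and r (y bits)."""
    total = x + y + 1
    mask = (1 << total) - 1
    a_val = ((m & ((1 << x) - 1)) << (y + 1)) & mask      # A: m in the top x bits
    s_val = (((-m) & ((1 << x) - 1)) << (y + 1)) & mask   # S: -m in the top x bits
    p = ((r & ((1 << y) - 1)) << 1) & mask                # P: zeros, then r, then 0
    for _ in range(y):
        last_two = p & 0b11
        if last_two == 0b01:
            p = (p + a_val) & mask
        elif last_two == 0b10:
            p = (p + s_val) & mask
        sign = p >> (total - 1)                           # arithmetic right shift
        p = (p >> 1) | (sign << (total - 1))
    product = p >> 1                                      # drop the rightmost bit
    if product >> (x + y - 1):                            # interpret as signed
        product -= 1 << (x + y)
    return product


print(booth_multiply(3, -4, 4, 4))    # -12
```

Additions and subtractions are performed only at the boundaries of runs of ones in the multiplier, which is why runs of 00 or 11 require only a shift.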
Wallace Tree Multiplier: 
Carry save adders do not propagate the carry through all the bits, so 
they are faster than conventional parallel adders and are 
widely used to implement the multiplication 
process. The Wallace tree multiplier is one such 
circuit used to speed up multiplication; 
it is built from combinational logic 
and multiplies binary 
integers. Basic adders such as the full adder and half 
adder improve the speed of the multiplication process 
and are essential elements in its 
implementation. Other approaches, such as 
ROM look-up tables and the shift-and-add method, have the 
disadvantage that the time required to calculate the 
result increases linearly with the number of 
bits. 
 
Fig.4: Wallace multiplier 
The figure above shows the realization of an 
8-bit Wallace tree multiplier. 
Multiplication is the major source of complexity in many signal processing 
systems. 
The Wallace multiplier follows the three steps below: 
1. Full adders are used to add the bits in 
each group. 
2. As in conventional Wallace reduction, single bits 
are passed to the next stage; in contrast to the 
conventional method, a group of bits too small for a full adder is 
not processed but is also passed on. 
3. Half adders are used only where needed, so that 
the number of stages does not exceed the number 
in the conventional Wallace multiplier. 
Half adders are used in the final stage of reduction only in 
exceptional cases. 
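The column-wise reduction can be sketched as a behavioural model. The Python code below is a simplified illustration under stated assumptions: it handles unsigned operands, uses full adders only (left-over bits are simply passed to the next stage and no half adders are inserted), and finishes with an ordinary addition of the last two rows. The function name wallace_multiply is an assumption and the code is not the RTL synthesized in this project.

```python
def wallace_multiply(a, b, width=8):
    """Multiply two unsigned width-bit integers by column-wise reduction."""
    # columns[w] collects every partial-product bit of weight 2**w.
    columns = [[] for _ in range(2 * width)]
    for i in range(width):
        for j in range(width):
            columns[i + j].append(((a >> i) & 1) & ((b >> j) & 1))

    # Reduce until no column holds more than two bits.
    while any(len(col) > 2 for col in columns):
        reduced = [[] for _ in range(len(columns) + 1)]
        for w, col in enumerate(columns):
            k = 0
            while len(col) - k >= 3:                  # full adder on three bits
                x, y, z = col[k], col[k + 1], col[k + 2]
                reduced[w].append(x ^ y ^ z)          # sum stays in this column
                reduced[w + 1].append((x & y) | (y & z) | (x & z))  # carry moves up
                k += 3
            reduced[w].extend(col[k:])                # left-over bits pass through
        columns = reduced

    # Final stage: add the two remaining rows with a carry-propagating adder.
    row0 = sum(col[0] << w for w, col in enumerate(columns) if len(col) > 0)
    row1 = sum(col[1] << w for w, col in enumerate(columns) if len(col) > 1)
    return row0 + row1


print(wallace_multiply(200, 123))    # 24600
```

Every full adder replaces three bits by two, so the total number of bits falls at each stage until only two rows remain for the final adder, which in this project is either a carry save or a carry skip structure.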
4. Results  
RTL schematic 
 
Fig.5: Top module. 
 
Fig.6: Wallace multiplier with carry save adder. 
 
Fig.7: Wallace multiplier with carry skip adder. 
Technology schematics 
 
Fig.8: Wallace multiplier with carry save adder. 
 
Fig.9: Wallace multiplier with carry skip adder. 
Waveforms 
 
Fig.10: Outputs of Wallace with carry save adder. 
 
Fig.11: Wallace with carry skip adder. 
IV. CONCLUSION 
In the last two decades, many architectures have 
been introduced for the design of low-complexity 
FIR operation, but there has been little corresponding 
improvement in the FIR design itself. This project offers a solution for 
such requirements. From the results it can be 
concluded that the FIR Wallace carry 
save structure occupies less area, uses less memory 
and consumes less power, whereas the FIR 
Wallace carry skip structure has less delay 
when compared with the other structures. The 
appropriate structure can therefore be chosen 
according to the industrial requirements. In 
future, layouts may be developed 
for these structures. 
