Abstract
Introduction
Digital filters are used extensively in the electronics industry. With analog filters the noise level can gradually increase, and digital filters give better noise performance than analog filters. The proposed design optimizes bit width and hardware resources without any impact on the frequency response or the output signal [2]. A digital filter uses three basic mathematical operations, shown in Figure 1.1: addition (or subtraction), multiplication, and delaying the digital signal by one or more sample periods. The behavior of the filter can be described in terms of these operations. In a multiplier block, multiplication by the fixed-point coefficient constants is implemented using only additions, subtractions and shifts [5]. Two types of digital filters are widely used in VLSI signal processing: FIR (Finite Impulse Response) and IIR (Infinite Impulse Response) filters. In an FIR filter the impulse response is finite, no feedback is used, and the phase is kept linear in order to avoid distortion. Compared with IIR filters, FIR filters are very simple to design, and they are used in DSP processors for high-speed operation. In digital signal processing, multiplication and addition take a lot of time; high-speed addition can be achieved with a parallel prefix adder, and multiplication with an improved truncated multiplier that uses fewer components [4]. IIR filters are used when the amount of computation must be limited, even though they require feedback. When implemented with finite precision, FIR filters can use fewer bits: because there is no feedback, quantization errors are not fed back into the computation, whereas feedback in an IIR filter makes a limited word length problematic. FIR filters can be implemented using fractional arithmetic. However, an FIR filter requires more coefficients than an IIR filter to realize the same frequency response, and therefore needs more memory and hardware resources to carry out the mathematical operations [1].
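The three basic operations (delay, multiply, add) can be seen together in a minimal direct-form FIR sketch. It assumes integer (fixed-point) samples and coefficients stored in a Q-format with `frac_bits` fractional bits; the function name and the coefficient values in the usage note are hypothetical illustrations, not the filter designed in this paper.

```python
from collections import deque


def fir_direct_form(samples, coeffs, frac_bits):
    """Direct-form FIR filter y[n] = sum_k h[k] * x[n - k] in integer arithmetic.

    Each coefficient is assumed to be stored as round(h[k] * 2**frac_bits), i.e. a
    fixed-point constant; the accumulator is shifted right by frac_bits to restore scale.
    """
    # Delay line holding x[n], x[n-1], ..., x[n-N+1]; initially all zeros
    delay = deque([0] * len(coeffs), maxlen=len(coeffs))
    out = []
    for x in samples:
        delay.appendleft(x)            # delay: every stored sample becomes one period older
        acc = 0
        for h, x_delayed in zip(coeffs, delay):
            acc += h * x_delayed       # multiply by a constant coefficient, then add
        out.append(acc >> frac_bits)   # arithmetic shift back to the input scale (floors)
    return out
```

With `coeffs=[64, 128, 64]` and `frac_bits=8` (h = 0.25, 0.5, 0.25, a hypothetical smoothing filter), an impulse `[256, 0, 0, 0]` produces `[64, 128, 64, 0]`, i.e. the scaled impulse response.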
The proposed FIR filter design can use fewer coefficients.
Multipliers
Nowadays, fast co-processors, digital signal processing chips and graphics processors have been created to satisfy the demand for high-speed, area-efficient multipliers. Current designs range from small, low-performance shift-and-add multipliers to large, high-performance array multipliers. Tree structures achieve higher performance than linear arrays, but the tree interconnection is more complex and less regular. The multiplier is one of the key hardware blocks in digital signal processors, microprocessors, and most other digital and high-performance systems. The main motivation behind this paper is to offer high speed and lower power consumption without an increase in silicon area. Figure 2.1 represents the multiplication of two binary numbers, the multiplicand and the multiplier, according to the multiplication rules. If the inputs are n bits wide, the output is 2n bits wide. The first step is to form the partial product matrix, which is obtained by ANDing each multiplier bit with the multiplicand bits: if the multiplier bit is 0, the partial product is 0, and if the multiplier bit is 1, the partial product equals the multiplicand. This is repeated for every multiplier bit, so the number of partial products equals the width of the multiplier. To obtain the final product, the elements in each column are added (from right to left) using binary addition [7]. The result of each column addition is stored in one bit of the product, any carries are passed on to the next column, and the operation is repeated for each remaining column.
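The shift-and-add procedure just described can be sketched in a few lines, assuming unsigned operands; `shift_add_multiply` is a hypothetical name used only for illustration. Each multiplier bit selects either zero or the multiplicand, shifted to that bit's position, and the rows are then summed.

```python
def shift_add_multiply(multiplicand, multiplier, n):
    """Unsigned n-bit x n-bit multiplication by summing partial products."""
    assert 0 <= multiplicand < (1 << n) and 0 <= multiplier < (1 << n)
    partial_products = []
    for i in range(n):                        # one partial product per multiplier bit
        bit = (multiplier >> i) & 1
        # Bit 0 selects a row of zeros; bit 1 selects the multiplicand, shifted i places left
        partial_products.append((multiplicand if bit else 0) << i)
    product = sum(partial_products)           # column-wise binary addition with carries
    assert product < (1 << (2 * n))           # an n x n product always fits in 2n bits
    return product, partial_products
```

For example, `shift_add_multiply(13, 11, 4)` forms the four rows 13, 26, 0 and 104 (one per bit of 1011) and returns the 8-bit product 143.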
Binary Multiplications

Truncated Multiplier
Truncated multiplication is a technique in which only the most significant columns of the multiplication matrix are used, which reduces the area requirement. In truncation, a number of the least significant columns are not formed. 'T' defines the degree of truncation, and the T least significant bits of the product are always '0'. The algorithm behind truncated multiplication is the same as for non-truncated multiplication, regardless of the degree of truncation. The effect is illustrated in Figure 3, where the truncated columns of the partial product matrix are not formed for degrees T = 8 and T = 12; notice that the columns to the right of the maroon vertical line are missing. The truncated multiplier is implemented with extended operand bit widths, and increasing the bit width increases the complexity of the multiplication. In the truncated method, the 8-bit and 12-bit operations are implemented with the three operations of the 4-bit truncated multiplier, namely deletion, truncation and rounding [3], described in the following subsections.
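Column truncation can be sketched at the bit level, assuming unsigned n-bit operands: the partial product bit a_j AND b_i has weight i + j, and every column with weight below T is simply never formed. This is a plain truncated multiplier without any error compensation, deletion or rounding step, so it illustrates only the column-dropping idea, not the paper's complete 8-bit and 12-bit design; `truncated_multiply` is a hypothetical name.

```python
def truncated_multiply(a, b, n, t):
    """n x n unsigned multiply that forms only partial-product columns of weight >= t."""
    total = 0
    for i in range(n):              # row i comes from multiplier bit b_i
        for j in range(n):          # bit a_j AND b_i sits in column i + j
            if i + j >= t:          # columns to the right of the truncation line are not formed
                total += (((a >> j) & 1) & ((b >> i) & 1)) << (i + j)
    return total                    # a multiple of 2**t: the t LSBs are always 0
```

With `t = 0` the function returns the exact product; for `a = b = 255, n = 8, t = 8` it returns 63232 instead of the exact 65025, and the difference is exactly the value of the omitted columns.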
Deletion
In the truncated multiplier, the multiplication process starts with the deletion operation. More than half of the partial product bits are removed, and the remaining bits become the partial products used in the rest of the process. This is the main criterion of deletion.
Truncation
Truncation is a method in which the least significant columns of the partial product matrix are not formed. 'T' defines the degree of truncation, and the T least significant bits (LSBs) of the product always result in 0. The algorithm behind fixed-width multiplication is the same as for non-fixed-width multiplication, regardless of the truncation degree. In the FIR filter, zero-order non-uniform coefficient quantization is used to minimize cost and area [4].
Rounding
Conventionally, an n-bit multiplicand and an n-bit multiplier produce a 2n-bit product. Sometimes an n-bit output is desired to reduce the number of stored bits. Consider, for instance, an 8x8-bit multiplier.
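Reducing a 2n-bit product to an n-bit output can be sketched as follows, assuming unsigned operands and round-half-up on the full product (a common textbook rule swapped in for illustration; the rounding logic of the paper's design may differ). Truncation simply drops the low n bits, while rounding first adds half an output LSB.

```python
def fixed_width_product(a, b, n, mode="round"):
    """Keep only the n most significant bits of an n x n unsigned product."""
    p = a * b                                 # full 2n-bit product
    if mode == "round":
        return (p + (1 << (n - 1))) >> n      # add half an output LSB, then drop n bits
    return p >> n                             # plain truncation of the low n bits
```

For 180 x 180 with n = 8 the full product is 32400; truncation keeps 126 while rounding gives 127, because the discarded low byte (144) is more than half of 256. Since (2^n - 1)^2 + 2^(n-1) < 2^(2n), the rounded result always fits in n bits.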
Figure 3.2. Partial Product Selection Logic for 8x8 Bit Multiplication
Consider next the 12x12-bit multiplier in Figure 3.3. Truncated multiplication provides an efficient method for reducing the power dissipation and area of a rounded parallel multiplier. At such large bit widths, implementations focus mainly on multiplier performance, because complexity increases at many levels. With the truncated method, the area shrinks at any bit width, so power and delay, which depend on the amount of hardware, are also reduced considerably.
Wallace Tree Multiplier
The Wallace Tree multiplier is used to reduce the number of partial products that must be added to form the final result. Its basic operation is the multiplication of two unsigned integers. The Wallace Tree multiplier, devised by the Australian computer scientist Chris Wallace in 1964, is an efficient hardware implementation of a digital circuit that multiplies two integers. There are three steps in the Wallace Tree multiplier.
Partial Product Generation Stage
The first step of a binary multiplier is the generation of partial products, which depend on the multiplier value: if a multiplier bit is '0' (zero), the corresponding partial product row is also '0' (zero), and if it is '1' (one), the row equals the multiplicand. Starting from the second multiplier bit, each partial product row is shifted one position to the left. In signed multiplication, the sign bit is also extended to the left. A conventional multiplier uses partial product generators consisting of a series of logic AND gates, as shown in Figure 4.1. In the multiplication of two numbers, the main operation is the addition of partial products; thus the performance and speed of the multiplier depend on the performance of the adder, which forms the core of the multiplier. To achieve higher performance, the multiplier must be pipelined.
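AND-gate partial product generation can be sketched in column form, assuming unsigned operands (no sign extension): multiplicand bit j ANDed with multiplier bit i lands in column i + j, which is exactly the left shift of row i. The result is the dot diagram that the reduction stage consumes; both function names are hypothetical, and `column_value` is only a check that the matrix still sums to the product.

```python
def partial_product_columns(a, b, n):
    """AND-gate partial products of an n x n unsigned multiplier, grouped by column.

    columns[k] holds every bit a_j AND b_i with i + j == k.
    """
    columns = [[] for _ in range(2 * n - 1)]
    for i in range(n):                  # row i comes from multiplier bit b_i
        b_i = (b >> i) & 1
        for j in range(n):              # row i is shifted i places to the left
            columns[i + j].append(((a >> j) & 1) & b_i)   # one AND gate per bit
    return columns


def column_value(columns):
    # Weighted sum of all bits; equals a * b when the matrix is formed correctly
    return sum(sum(col) << k for k, col in enumerate(columns))
```

For a 4x4 multiplier the column heights are 1, 2, 3, 4, 3, 2, 1, the familiar diamond-shaped matrix.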
Partial Product Reduction Stage
The design analysis begins with the elementary multiplication algorithm of the Wallace Tree multiplier. The algorithm for an 8-bit x 8-bit multiplication performed by the Wallace Tree multiplier is shown in Figure 4.1. The multiplication process is done in 5 stages. Each stage uses half adders and full adders, which are drawn with separate symbols in the figure, a circle denoting a 1-bit full adder. The partial products are reduced with half adders and full adders combined to build a carry-save adder (CSA). In the next step, the remaining two rows are added using a fast carry-propagate adder; the conventional 8-bit x 8-bit schematic uses a ripple-carry adder (RCA) for this addition. The high-speed Wallace multiplier is designed by following this algorithm, and the block diagram of the conventional high-speed 8-bit x 8-bit Wallace Tree multiplier is shown in Figure 4.1. The main aim of the proposed architecture is to reduce the overall latency. WALLACE TREE and DADDA are two reduction techniques discussed in [8].
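The reduction just described can be modelled behaviourally, assuming unsigned operands: in every stage each column is split into groups of three bits (full adders) and a leftover pair (half adder), sums stay in the column and carries move to the next column, until no column holds more than two bits; a ripple-carry adder then adds the last two rows. This is a sketch of the dot-diagram algorithm only, not the gate-level netlist of Figure 4.1, and the stage count it returns is not guaranteed to match the count used in the figure.

```python
def full_adder(x, y, z):
    return x ^ y ^ z, (x & y) | (x & z) | (y & z)    # (sum, carry)


def half_adder(x, y):
    return x ^ y, x & y                              # (sum, carry)


def wallace_multiply(a, b, n):
    """Unsigned n x n Wallace Tree model; returns (product, reduction stages)."""
    width = 2 * n
    cols = [[] for _ in range(width)]
    for i in range(n):                   # stage 1: AND-gate partial products
        for j in range(n):
            cols[i + j].append(((a >> j) & 1) & ((b >> i) & 1))
    stages = 0
    while max(len(c) for c in cols) > 2:     # stage 2: reduce until two rows remain
        new = [[] for _ in range(width)]
        for k, col in enumerate(cols):
            idx = 0
            while len(col) - idx >= 3:       # every group of three bits -> full adder
                s, c = full_adder(col[idx], col[idx + 1], col[idx + 2])
                new[k].append(s)
                if k + 1 < width:            # carries out of the top column are always 0
                    new[k + 1].append(c)
                idx += 3
            if len(col) - idx == 2:          # a remaining pair -> half adder
                s, c = half_adder(col[idx], col[idx + 1])
                new[k].append(s)
                if k + 1 < width:
                    new[k + 1].append(c)
                idx += 2
            new[k].extend(col[idx:])         # a single leftover bit passes through
        cols = new
        stages += 1
    product, carry = 0, 0                    # stage 3: ripple-carry addition of two rows
    for k in range(width):
        bits = cols[k] + [0] * (2 - len(cols[k]))
        s, carry = full_adder(bits[0], bits[1], carry)
        product |= s << k
    return product, stages
```

Because every full and half adder preserves the weighted sum of its input bits, the model returns `a * b` for any operands, which makes it a convenient reference when checking a hardware description.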
Partial Product Addition Stage
In this stage of the Wallace Tree method, ripple-carry adders (RCA) are used to perform the addition. The Wallace method processes a multiplication in three steps:
1. Construct the bit products.
2. Using conventional (carry-save) adders, combine the whole partial product matrix into two vectors, the sum and the carry outputs.
3. Add the remaining two rows with a fast carry-propagate adder to produce the product.
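The three steps above can also be sketched at word level, assuming unsigned operands: a 3:2 carry-save adder turns three rows into a sum row and a carry row without propagating any carry, and applying it to groups of three rows in parallel forms a Wallace tree of CSAs. The function names are hypothetical, and Python's built-in `+` stands in for the fast carry-propagate adder of step 3.

```python
def carry_save_add(x, y, z):
    """3:2 carry-save adder on whole rows: x + y + z == s + c, with no carry propagation."""
    s = x ^ y ^ z                                   # per-column sum bits
    c = ((x & y) | (x & z) | (y & z)) << 1          # per-column carries, moved one column left
    return s, c


def csa_stage(rows):
    # One tree level: every group of three rows becomes two; leftover rows pass through
    out = []
    for i in range(0, len(rows) - 2, 3):
        out.extend(carry_save_add(*rows[i:i + 3]))
    out.extend(rows[len(rows) - len(rows) % 3:])
    return out


def wallace_rows_multiply(a, b, n):
    rows = [(a if (b >> i) & 1 else 0) << i for i in range(n)]   # step 1: bit products
    stages = 0
    while len(rows) > 2:                                         # step 2: reduce to two rows
        rows = csa_stage(rows)
        stages += 1
    return sum(rows), stages                                     # step 3: carry-propagate add
```

For n = 8 the row count shrinks 8, 6, 4, 3, 2, so `wallace_rows_multiply(200, 123, 8)` returns `(24600, 4)`: four CSA levels followed by one carry-propagate addition.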
Dadda Multiplier
The DADDA multiplier was devised by the computer scientist Luigi Dadda in 1965. It is a refined form of the parallel multiplier [6] that increases speed and uses fewer gates. Parallel multipliers use different reduction schemes; DADDA is one such scheme, which minimizes the number of adders used in the stages that sum the partial products. Full adders and half adders reduce the number of rows in the matrix, i.e. the number of bits in each column, at each summation stage. A Wallace Tree multiplier is more expensive than a DADDA multiplier. In this paper, the DADDA multiplier is designed and analyzed using full adders built in different logic styles.
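Dadda's scheme reduces each column only as far as the next target height, taken from the sequence d1 = 2, d(j+1) = floor(1.5 * dj), which gives 2, 3, 4, 6, 9, 13, ... The sketch below lists the targets used for an n x n multiplier, largest first; `dadda_heights` is a hypothetical helper illustrating this standard textbook formulation.

```python
def dadda_heights(n):
    """Target column heights for the Dadda reduction of an n x n multiplier."""
    d = [2]
    while d[-1] < n:
        d.append(d[-1] * 3 // 2)               # d_(j+1) = floor(1.5 * d_j)
    return [h for h in reversed(d) if h < n]   # only targets below the initial height n
```

The length of the list is the number of reduction stages: an 8x8 multiplier uses the targets 6, 4, 3, 2 (four stages) and a 12x12 multiplier uses 9, 6, 4, 3, 2 (five stages), after which a carry-propagate adder sums the final two rows.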
Implementation of DADDA MULTIPLIER
The DADDA multiplier algorithm is based on a matrix form, as represented in Figure 5.1. The arrangement of the partial product bits in the first stage is demonstrated in Figure 9, which illustrates how the DADDA multiplier works.
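The matrix-based Dadda flow can be modelled behaviourally, assuming unsigned operands and the standard rule: at each stage a column is reduced only while its height, counting carries arriving from the previous column, exceeds the current target; a half adder is used when the column is exactly one bit too tall, otherwise a full adder. This is a sketch of the textbook algorithm, not the paper's gate-level design or Figure 9, and the adder counts it returns describe only this model.

```python
def full_adder(x, y, z):
    return x ^ y ^ z, (x & y) | (x & z) | (y & z)    # (sum, carry)


def half_adder(x, y):
    return x ^ y, x & y                              # (sum, carry)


def dadda_multiply(a, b, n):
    """Unsigned n x n Dadda model; returns (product, #full adders, #half adders)."""
    width = 2 * n
    cols = [[] for _ in range(width)]               # partial-product matrix by column weight
    for i in range(n):
        for j in range(n):
            cols[i + j].append(((a >> j) & 1) & ((b >> i) & 1))
    targets = [2]                                   # Dadda heights 2, 3, 4, 6, 9, ...
    while targets[-1] < n:
        targets.append(targets[-1] * 3 // 2)
    fa = ha = 0
    for d in reversed([t for t in targets if t < n]):
        new_cols, carries_in = [], []
        for k in range(width):
            pool = cols[k] + carries_in             # this column plus carries from column k - 1
            out, carries_out, i = [], [], 0
            h = len(pool)
            while h > d:                            # reduce this column only down to height d
                if h == d + 1:                      # one bit too many: a half adder suffices
                    s, c = half_adder(pool[i], pool[i + 1])
                    i, h, ha = i + 2, h - 1, ha + 1
                else:                               # otherwise a full adder removes two bits
                    s, c = full_adder(pool[i], pool[i + 1], pool[i + 2])
                    i, h, fa = i + 3, h - 2, fa + 1
                out.append(s)
                carries_out.append(c)
            new_cols.append(out + pool[i:])
            carries_in = carries_out                # carries out of column 2n - 1 are always 0
        cols = new_cols
    product, carry = 0, 0                           # final ripple-carry addition of two rows
    for k in range(width):
        bits = cols[k] + [0] * (2 - len(cols[k]))
        s, carry = full_adder(bits[0], bits[1], carry)
        product |= s << k
    return product, fa, ha
```

In this model a 4x4 multiplier needs 3 full adders and 3 half adders before the final addition; the counts depend only on n, not on the operand values.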
Steps Involved in DADDA Multipliers Algorithm
The wires carry different weights depending on the position of the multiplied bits, as shown in Figure 5.2. Two layers of full adders are used to reduce the number of partial products. The wires are then grouped into two numbers, which are added with a conventional adder. A ripple-carry adder (RCA) is used for this addition, with the carry-ins and carry-outs chained together. By using several full adders, it is possible to create a logic circuit that adds multi-bit numbers: the Cin input of each full adder is the Cout of the previous full adder. Because each carry bit "ripples" to the next full adder, the structure is called a ripple-carry adder, and the DADDA multiplier architecture uses this RCA procedure. Data are taken from 3 wires at a time and added using adders, and the carry of each stage is added to the next two data bits in the same stage. At the final stage, the same ripple-carry addition is performed, and hence the product terms p1 to p8 are obtained, as shown in Figure 5.3. The performance comparison of the proposed 12-bit integrated ALU design is shown in Table 7.1. The 12-bit ALU with the MCMAT-based WALLACE Tree multiplier gives better results than the 12-bit ALU with the MCMAT-based DADDA multiplier. The RTL schematic of the 12-bit ALU with the WALLACE Tree multiplier is similar to the RTL schematic of the 12-bit ALU with the DADDA multiplier.
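The ripple-carry adder described above can be sketched as a chain of full adders, assuming unsigned operands of a fixed width; `ripple_carry_add` is a hypothetical name. The carry-out of each full adder is fed into the carry-in of the next, so the carry ripples from the LSB to the MSB.

```python
def full_adder(a, b, cin):
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(x, y, width):
    """Add two unsigned width-bit numbers with a chain of full adders."""
    carry = 0                                   # Cin of the first full adder
    total = 0
    for k in range(width):                      # from LSB to MSB
        # The Cout of this full adder becomes the Cin of the next one
        s, carry = full_adder((x >> k) & 1, (y >> k) & 1, carry)
        total |= s << k
    return total, carry                         # width-bit sum and the final carry-out
```

For example, `ripple_carry_add(200, 100, 8)` returns `(44, 1)`: the 8-bit sum bits plus a carry-out, together representing 300. The delay grows linearly with the width, which is why faster carry-propagate adders are often preferred for the final stage.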
Experimental Results
Conclusion
In this paper, the direct form of the digital FIR filter is recommended. It reduces area by decreasing the number of structural elements, namely adders and storage elements. 8-bit and 12-bit multiplier designs are proposed, and parameters such as power, area and delay are compared for the DADDA, normal and WALLACE Tree multipliers. It is observed that the truncated multiplier is not very efficient in terms of power, whereas the DADDA and WALLACE Tree multipliers are efficient in terms of delay and power. A 12-bit Arithmetic Logic Unit is designed with the MCMAT-based 12-bit FIR filter using WALLACE Tree and DADDA multipliers. The results show that the WALLACE Tree multiplier is more efficient than the DADDA multiplier, because its speed and power are better. Integrating the ALU with the MCMAT-based digital FIR filter design can improve overall speed and area in real-time applications. In future, the design can be implemented for larger bit widths.
