I. Introduction
Advances in VLSI (Very Large Scale Integration) technology have greatly extended the range of real-time digital signal processing applications. Within digital signal processing, the FIR (Finite Impulse Response) filter has many applications and is particularly well suited to the elimination of power line interference (PLI) [1]. PLI is the most common type of noise in the ECG signal; it arises when the human body absorbs electromagnetic radiation from 50 Hz power lines [2]. Low-power, high-speed filtering is essential to eliminate noise from raw biomedical signals and to make the monitoring device portable. An FIR filter basically consists of a multiplier and an accumulator that holds the sum of the previous consecutive products [3]. A digital arithmetic unit called the MAC (multiply-accumulate) unit performs the same multiplication and accumulation. The repetitive multiplication and addition in an FIR filter can therefore be implemented conveniently with MAC units, as shown in Fig. 1. This paper presents an FIR filter design based on a reusable MAC (R-MAC) unit. The performance of this filter depends largely on the speed and power of the MAC unit employed inside it. The filter architecture for given specifications can be designed using MATLAB, and the performance of the designed filter can be verified using Xilinx.
The current work compares various MAC units on power, performance and area (PPA) benchmarks. In this work, the MAC is designed using a Vedic multiplier and a Carry Select Adder (CSA). The Vedic multiplier is faster than the array multiplier and the Booth multiplier. It also needs very little area compared with other multiplier architectures, and higher-order multipliers can be built easily from lower-order ones [4]. The CSA is chosen mainly for its low power consumption in the MAC unit; it also occupies less area and operates at higher speed [5]. The Vedic multiplier-CSA MAC is used to design an FIR filter that meets the speed and power requirements. In a MAC-based FIR filter, each tap and its summer must be replaced by one MAC unit. The design therefore requires as many MAC units as taps, which increases the utilization area and power consumption. The power consumption can be reduced by reusing a single MAC unit, instead of multiple MAC units, through a multiplexing technique. The main goal of this work is to design a low-power, high-speed FIR filter based on an R-MAC unit by employing time-division multiplexing. The entire FIR filter is coded in Verilog and synthesized in Xilinx for speed and power analysis.
The rest of the paper is organized as follows. Section II presents the basic design of a MAC-based FIR filter. Section III presents the proposed work, namely how the FIR filter is designed using the R-MAC concept. Section IV describes the validation and comparison of results. Section V gives the conclusion and future scope of this work.
II. Basic MAC based FIR filter
The design of an FIR filter involves many multipliers and adders, which consume a lot of power and take a long time to compute. Power and delay are significantly reduced by introducing a MAC unit in the design of the FIR filter. In computing, especially digital signal processing, the multiply-accumulate operation is a common step that computes the product of two numbers and adds that product to an accumulator. Modern computers may contain a dedicated MAC unit, consisting of a multiplier implemented in combinational logic followed by an adder and an accumulator register that stores the result. The output of the register is fed back to one input of the adder, so that on each clock cycle the output of the multiplier is added to the register.
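The multiply-accumulate step described above can be sketched as a small behavioural model. This is only an illustration in Python, not the Verilog design of this work; the class name `MAC` and its methods are hypothetical names chosen for the sketch.

```python
class MAC:
    """Behavioural sketch of a multiply-accumulate unit (hypothetical names)."""

    def __init__(self):
        self.acc = 0  # accumulator register

    def reset(self):
        """Clear the accumulator register."""
        self.acc = 0

    def clock(self, a, b):
        """One clock cycle: multiply, add to the fed-back register, store."""
        product = a * b                 # combinational multiplier
        self.acc = self.acc + product   # adder with register feedback
        return self.acc


mac = MAC()
for a, b in [(1, 2), (3, 4), (5, 6)]:
    mac.clock(a, b)
print(mac.acc)  # 1*2 + 3*4 + 5*6 = 44
```

Each call to `clock` stands for one clock cycle, so three operand pairs take three cycles to accumulate.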
Fig. 1: MAC based FIR filter (4-tap)
Combinational multipliers require a large amount of logic, but can compute a product much more quickly than the shift-and-add method typical of earlier computers. The first processors to be equipped with MAC units were digital signal processors, but the technique is now also common in general-purpose processors. In this method, an Nth-order FIR filter requires as many MAC units as it has taps (N+1). As the order of the filter increases, the number of taps increases and so does the number of MAC units. Hence the utilization area and power consumption increase with the number of MAC units. The power requirements and area utilization can be greatly reduced by using a single MAC instead of multiple MACs, through the R-MAC concept with a time-switching mechanism.
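To make the tap count concrete, the following sketch models a direct-form FIR filter in which every tap contributes one multiply-add per output sample, which is the work that one MAC unit per tap would do in hardware. The function name `fir_direct` and the coefficient values are assumptions made for illustration only.

```python
def fir_direct(x, h):
    """Direct-form FIR: y[n] = sum over k of h[k] * x[n-k]."""
    taps = len(h)                   # N + 1 taps for an Nth-order filter
    y = []
    for n in range(len(x)):
        acc = 0
        for k in range(taps):       # in hardware: one MAC unit per tap
            if n - k >= 0:          # samples before n = 0 are taken as zero
                acc += h[k] * x[n - k]
        y.append(acc)
    return y


h = [1, 2, 3, 4]         # hypothetical 4-tap coefficients (order N = 3)
x = [1, 0, 0, 0, 0]      # unit impulse
print(fir_direct(x, h))  # impulse response reproduces the coefficients: [1, 2, 3, 4, 0]
```

With four taps, each output sample needs four multiply-adds, so a fully parallel implementation needs four MAC units.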
III. Proposed Filter Design

A. Proposed MAC Unit
In this work, the multiplier stage is a Vedic multiplier, the adder stage is a Carry Select Adder, and the accumulation register is a PIPO (Parallel In Parallel Out) register built from D flip-flops. The PIPO register stores data as shown in Fig. 2. It acts as the accumulator in the MAC unit and as a delay unit when the filter is considered.
Fig. 2: Block diagram of MAC unit
Design of a Low Power and High Speed FIR filter based on Reusable MAC Unit
DOI: 10.9790/2834-1204014652 www.iosrjournals.org
The multiplier based on Vedic mathematics is one of the fastest and lowest-power multipliers. Employing this technique in computation algorithms reduces complexity, execution time and power. The 2×2 Vedic multiplier module is implemented using two half-adder modules. The total delay is two half-adder delays once the bit products are generated. A 4×4 multiplication is decomposed into four 2×2 multiplications that can be performed in parallel, as shown in Fig. 3. This reduces the number of logic stages and thus the delay of the multiplier. This example illustrates a better, parallel implementation style of the Urdhva Tiryagbhyam sutra.
Vedic mathematics is useful because it reduces typical calculations of conventional mathematics to very simple ones; the Vedic formulae are claimed to be based on the natural principles on which the human mind works. It is a set of arithmetic rules that allow faster, more efficient implementations, and it offers effective algorithms that can be applied to various branches of engineering, such as computing. Higher-order multipliers are also designed using lower-order multipliers, as shown in Fig. 4.
The Carry Select Adder (CSA) is used mainly for its low power consumption in the MAC unit; it also occupies less area and operates at higher speed. A CSA generally consists of two ripple carry adders and a multiplexer. Adding two n-bit numbers with a carry select adder uses the two adders to perform the calculation twice, once assuming a carry-in of zero and once assuming a carry-in of one. After the two results are calculated, the correct sum and the correct carry-out are selected by the multiplexer once the actual carry-in is known. The combination of Vedic multiplier and carry select adder meets the speed requirements of the present work.
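The two building blocks described above can be sketched at bit level in Python. This is a behavioural illustration under stated assumptions, not the Verilog modules of this work: all function names are hypothetical, the partial products of the 4×4 multiplier are combined with ordinary integer addition rather than a specific adder structure, and the carry select adder assumes an 8-bit width split into 4-bit blocks.

```python
def half_adder(a, b):
    """Return (sum, carry) of two single bits."""
    return a ^ b, a & b


def vedic_2x2(a, b):
    """2x2 Urdhva Tiryagbhyam multiplier built from two half adders."""
    a0, a1 = a & 1, (a >> 1) & 1
    b0, b1 = b & 1, (b >> 1) & 1
    p0 = a0 & b0                             # vertical product
    p1, c1 = half_adder(a1 & b0, a0 & b1)    # crosswise products
    p2, p3 = half_adder(a1 & b1, c1)         # vertical product plus carry
    return (p3 << 3) | (p2 << 2) | (p1 << 1) | p0


def vedic_4x4(a, b):
    """4x4 multiplier from four 2x2 multipliers that can run in parallel."""
    a_lo, a_hi = a & 0b11, (a >> 2) & 0b11
    b_lo, b_hi = b & 0b11, (b >> 2) & 0b11
    q0 = vedic_2x2(a_lo, b_lo)
    q1 = vedic_2x2(a_hi, b_lo)
    q2 = vedic_2x2(a_lo, b_hi)
    q3 = vedic_2x2(a_hi, b_hi)
    # Assumption: partial products combined with plain integer addition.
    return q0 + ((q1 + q2) << 2) + (q3 << 4)


def ripple_carry_add(a, b, cin, width):
    """Ripple carry adder: return (sum, carry_out) for width-bit operands."""
    total, carry = 0, cin
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        total |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (carry & (ai ^ bi))
    return total, carry


def carry_select_add(a, b, width=8, block=4):
    """Carry select adder; width must be a multiple of block (assumed sizes)."""
    result, carry = 0, 0
    mask = (1 << block) - 1
    for lo in range(0, width, block):
        a_blk, b_blk = (a >> lo) & mask, (b >> lo) & mask
        sum0, cout0 = ripple_carry_add(a_blk, b_blk, 0, block)  # assume carry-in 0
        sum1, cout1 = ripple_carry_add(a_blk, b_blk, 1, block)  # assume carry-in 1
        # Multiplexer: pick the precomputed result matching the actual carry-in.
        blk_sum, carry = (sum1, cout1) if carry else (sum0, cout0)
        result |= blk_sum << lo
    return result, carry


print(vedic_4x4(13, 11))           # 143
print(carry_select_add(200, 100))  # (44, 1), i.e. 44 + 256 = 300
```

In hardware both ripple carry adders of a block work at the same time, so the block delay is one ripple chain plus one multiplexer rather than a chain that waits for the incoming carry.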
B. FIR Filter based on R-MAC
In the proposed work, the designed low-power, high-speed MAC unit is reused at each tap of the filter by allocating a specific time slot to that tap. The hardware requirement is thus greatly reduced by using a single MAC instead of multiple units. To suppress the 50 Hz interference in the ECG signal, a band-stop filter is designed whose coefficients are obtained from the MATLAB FIR filter design function. The filter can be implemented in many ways depending on the number of multipliers and accumulators available. In this paper it is implemented as a single-MAC-based FIR filter unit. Its block diagram is shown in Fig. 5 and consists of two multiplexers and a single MAC unit. Each multiplexer selects only one input at a time, which is fed to the multiplier. As each product term is generated, it is added to the previously accumulated sum in the MAC unit. Each input sample is delayed from the previous sample by 8T, where T is the time taken by the multiplier and accumulator to compute one product term and add it to the previously accumulated sum in the accumulator.
Fig. 5: Block diagram of proposed R-MAC unit
Here x(n) has 16 samples and h(n) has 16 coefficients, so two 16:1 multiplexers are used, one for each quantity. The multiplexers first select the sample x(n) and the coefficient h(0) and apply them to the MAC unit. A single MAC unit is used, and its output is saved in the accumulator, which is wider than the inputs. In the next clock cycle the multiplexers select the next sample x(n-1) and the next coefficient h(1), and the MAC operation is performed on these inputs. This is repeated for all the inputs one by one, and the final output y(n) is held in the accumulator. A counter, together with the clock (clk), reset (rst) and enable (en) signals, controls the operation, which is performed on the trailing edge of the clock while reset is 0.
The multiplexer, shortened to "MUX" or "MPX", is a combinational logic circuit designed to switch one of several input lines through to a single common output line by the application of a control signal. Multiplexers operate like very fast multiple-position rotary switches, connecting multiple input lines, called "channels", one at a time to the output. The selection of each input line is controlled by an additional set of inputs called control lines; according to the binary condition of these control inputs, either "HIGH" or "LOW", the appropriate data input is connected directly to the output. Normally, a multiplexer has 2^n data input lines and n control inputs.
The main advantage of the single-MAC-based FIR filter unit for 16 taps is that it provides less delay than the multiple MACs used in the 4-tap MAC-based FIR filter, i.e., 4 MACs. The power reduction is achieved through the use of a single MAC unit inside the filter, which reduces the total switching activity and therefore the dynamic power.
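The time-multiplexed operation described above can be sketched as follows. This is a behavioural Python model under stated assumptions, not the Verilog implementation of this work: the function names `mux` and `rmac_fir` are hypothetical, the coefficient values are placeholders rather than the MATLAB-designed band-stop coefficients, and the accumulator is assumed to be cleared before each output sample.

```python
def mux(inputs, select):
    """N:1 multiplexer: route the selected input line to the output."""
    return inputs[select]


def rmac_fir(x, h):
    """FIR filter that reuses one MAC for all taps (time multiplexing)."""
    taps = len(h)
    delay_line = [0] * taps              # holds x(n), x(n-1), ..., x(n-taps+1)
    y = []
    for sample in x:
        delay_line = [sample] + delay_line[:-1]  # shift in the new sample
        acc = 0                          # assumed: accumulator cleared per output
        for count in range(taps):        # counter drives both mux select lines
            xs = mux(delay_line, count)  # sample multiplexer
            hs = mux(h, count)           # coefficient multiplexer
            acc += xs * hs               # the single MAC, one time slot per tap
        y.append(acc)
    return y


h = list(range(1, 17))                   # placeholder 16-tap coefficients
x = [1] + [0] * 19                       # unit impulse
print(rmac_fir(x, h) == h + [0, 0, 0, 0])  # True
```

The inner loop takes one time slot per tap, so a 16-tap filter spends 16 MAC operations on each output sample while using only one multiplier and one adder.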
IV. Validation and Discussion of Results
A. Simulation Results
The Vedic multiplier is implemented using the Xilinx ISE simulator, and the results are shown in Fig. 6.
Fig. 6: Vedic Multiplier simulation results
The carry select adder is implemented using the Xilinx ISE simulator. All the internal blocks are individually implemented as sub-modules, and the components are then combined using module instantiation to form the top module. The simulation results are shown in Fig. 7.
Fig. 7: Carry Select Adder simulation results
The 4-tap MAC-based FIR filter is implemented using the Xilinx ISE simulator. All the internal blocks are individually implemented as sub-modules, and the components are then combined using module instantiation to form the top module. The simulation results are shown in Fig. 8. The FIR filter based on the R-MAC unit is implemented in the same way, and its simulation results are shown in Fig. 9.
A noise-free ECG signal is generated in MATLAB with a sampling frequency of 1000 Hz and a bandwidth of 100 Hz. This ECG signal is normalized to a peak-to-peak value of 1 (Fig. 11). A 50 Hz sinusoidal noise is generated with a sampling frequency of 1000 Hz and added to the ECG signal, as shown in Fig. 12. The noise-affected ECG signal is then applied to the filter input. The signal-to-noise ratio of the noise-affected ECG signal is 10.08 dB. Almost all of the 50 Hz noise is suppressed at the output of the filter, as depicted in Fig. 13, and this signal can be reliably used for diagnosis.
The MAC unit design was simulated using the Cadence NCSim simulator and synthesized with Synopsys Design Compiler; PrimeTime PX was used for power analysis. All timing, power and area results of the MAC unit are computed using 28 nm UTBB-FDSOI (Ultra-Thin Body and Box Fully Depleted Silicon on Insulator) technology. The filter has been designed in Verilog HDL, and Xilinx ISE is integrated with MATLAB to apply the ECG signal to the filter and view the output. The convergence time for the algorithm was calculated on a 64-bit x86 Linux platform running at 2.8 GHz.
It is clear that, compared with the MAC-based FIR filter using multiple MAC units, the proposed R-MAC-based FIR design greatly improves the speed and also reduces the power consumption.
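The construction of the noisy test signal and its signal-to-noise ratio can be sketched as below. This is only an illustration: the ECG waveform is not reproduced, so a 1 Hz sine stands in for it, and the noise amplitude of 0.1 and the one-second duration are hypothetical values. The resulting SNR therefore differs from the 10.08 dB reported for the actual ECG signal. Only the 1000 Hz sampling frequency, the 50 Hz noise frequency and the peak-to-peak normalization come from the text.

```python
import math

FS = 1000        # sampling frequency in Hz (from the text)
F_NOISE = 50     # power-line frequency in Hz (from the text)
N = 1000         # one second of samples (assumed duration)


def normalize_peak_to_peak(signal):
    """Scale a signal so that max - min equals 1."""
    span = max(signal) - min(signal)
    return [v / span for v in signal]


def snr_db(clean, noise):
    """SNR in dB: 10 * log10(signal power / noise power)."""
    p_signal = sum(v * v for v in clean)
    p_noise = sum(v * v for v in noise)
    return 10 * math.log10(p_signal / p_noise)


# Surrogate for the ECG: a 1 Hz sine (assumption, not the real waveform).
clean = normalize_peak_to_peak(
    [3.0 * math.sin(2 * math.pi * 1 * n / FS) for n in range(N)]
)
# 50 Hz interference with a hypothetical amplitude of 0.1.
noise = [0.1 * math.sin(2 * math.pi * F_NOISE * n / FS) for n in range(N)]
noisy = [c + w for c, w in zip(clean, noise)]  # signal applied to the filter input

print(round(snr_db(clean, noise), 2))  # 13.98 for these assumed amplitudes
```

After normalization the surrogate has amplitude 0.5, so the power ratio is 0.125 / 0.005 = 25, which corresponds to about 13.98 dB.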
V. Conclusion
The basic FIR filter, the MAC-based FIR filter and the R-MAC-based FIR filter are designed using the combination of a Vedic multiplier and a Carry Select Adder. The delay and power consumption are calculated for these three cases. The results show that the R-MAC-based FIR filter performs better than the MAC-based FIR filter in terms of both delay and power consumption. Synthesis of the design has been completed successfully. The speed is improved by almost 10.95% and the power consumption is reduced to 38.09%. This architecture can be used effectively in areas requiring high throughput, such as real-time digital signal processing. The performance of this design could be improved further with reversible logic gates, which can be used in the multiplier design. Future work can also include the integration of a divider block in the MAC unit.
