Abstract: This paper presents a comparative performance analysis of an adaptive FIR filter based on the delayed LMS (DLMS) algorithm. The DLMS algorithm is obtained by using a delayed error signal, which allows efficient pipelining and therefore a short critical path and an area-efficient implementation. Hardware results (device utilization parameters) and power consumption are reported for three FPGA families (Artix-7, Virtex-7, and Kintex-7) from a low-voltage perspective. The synthesis results showed that the Artix-7 CMOS family achieves the lowest power consumption of 1.118 mW with 83.18% device utilization. Different Precision strategies, such as speed optimization and power optimization, were applied to achieve these results. The algorithm was implemented in MATLAB (2013b) and synthesized with Leonardo Spectrum.
Introduction
Adaptive filters have attracted considerable attention because of their inherent capacity for adaptation. From an implementation point of view, the most desirable features of an FPGA are its flexibility and programmability, which make it the best choice for digital algorithm implementation. Adaptive algorithms have a wide range of applications, such as adaptive filters, smart antennas (for beam formation) [1], low-power hearing aids [2], filters that use an adaptive fuzzy dividing frequency control mechanism for reducing harmonics and improving the power factor [3], and power systems, for estimating low-frequency power modes [4]. Among the different adaptive algorithms, LMS has the most applications because of its simplicity [5]. Owing to the number of important applications of adaptive filters, the LMS algorithm has attracted considerable research interest. In each sample period, the weights of an LMS filter are adapted using the error, which is calculated as the difference between the desired response and the response of the filter.
Every digital algorithm should have low power consumption in its synthesis and implementation. Therefore, when these adaptive algorithms are implemented for filter synthesis on an FPGA, they should consume as little power as possible. On the other hand, some of the major advantages of FPGAs are diminished by this low-power requirement. The main reasons behind this problem are the static leakage power and the large implementation area. Therefore, the main objective is to minimize the power requirement using different hardware implementation strategies along with an intelligent choice of board [6]. Over the last decade, most studies have treated power consumption as the main factor, expressed in the form of the area-delay product (ADP) and energy per sample (EPS). Factors such as the weight, efficiency and size of devices have shifted researchers' objectives towards miniaturization. Low operating and cooling costs, reliability and the growing requirement for low-power handheld communication and computer systems are the main drivers of new low-power implementations, particularly for FPGAs.
From an implementation point of view, it has been shown that, of the two forms of implementation, the direct-form adaptive filter has the same critical path as the transposed form but with lower register complexity [7]. Fig. 1 presents a generalized adaptive filter implementation, in which the output is fed back to the filter and the weights are readjusted to minimize the difference between the response and the desired signal. The power consumption also depends on the step size, which must be chosen carefully within its bound; the tap-input power is an important term in this bound.
The remainder of the paper is structured as follows. Section II presents a brief review of the LMS adaptive filter. Section III provides the results and discussion of the implementations on the different FPGA families and an analysis of their architectures. Section IV reports the conclusions.
Review of LMS Adaptive Filter
The LMS adaptive filter has attracted considerable research attention because of its easy implementation and potential modifications, such as the delayed LMS (DLMS) adaptive filter. Fig. 1 presents the basic implementation arrangement for general adaptive filters. Fig. 2 shows the conventional approach for the direct-form implementation of LMS adaptive filters. In this implementation, each input sample is convolved with the filter weights at every sampling instant using an FIR filter. These weights form a weight vector that is updated and reapplied to the input samples to minimize the error. The error signal is estimated from the desired output signal and the output of the FIR filter. This error signal drives the weights with the help of a step size parameter, which is used to minimize the error [9]. The step size must lie within an appropriate range, which is given in the equations. If the input signal is assumed to be x(n) and the output is denoted as r(n), then the weights of the filter are updated by the following equations:
The new weights are then calculated with the help of the correlation matrix using the above equations. From the figure, it is clear that for one sample duration, two multiplications and one addition must be performed to obtain the desired output, and the process is repeated with new weights to minimize the error signal.
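The weight adaptation described above can be illustrated with a minimal Python sketch of the textbook LMS recursion. This is an assumption-laden illustration rather than the paper's MATLAB implementation: the function name lms_filter, the symbol d(n) for the desired signal, the step size mu and the two-tap test system are all hypothetical choices made for the example.

```python
import random


def lms_filter(x, d, n_taps, mu):
    """Textbook LMS recursion; returns (outputs, errors, final weights)."""
    w = [0.0] * n_taps        # filter weights, initialised to zero
    taps = [0.0] * n_taps     # tap-delay line, newest sample first
    outputs, errors = [], []
    for x_n, d_n in zip(x, d):
        taps = [x_n] + taps[:-1]                          # shift in the new input sample
        r_n = sum(wi * xi for wi, xi in zip(w, taps))     # FIR filter output r(n)
        e_n = d_n - r_n                                   # error = desired - output
        # weight update: w(n+1) = w(n) + mu * e(n) * x(n)
        w = [wi + mu * e_n * xi for wi, xi in zip(w, taps)]
        outputs.append(r_n)
        errors.append(e_n)
    return outputs, errors, w


# Hypothetical usage: identify an unknown 2-tap system d(n) = 0.5 x(n) - 0.3 x(n-1).
rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
d = [0.5 * x[n] - 0.3 * (x[n - 1] if n > 0 else 0.0) for n in range(len(x))]
_, _, weights = lms_filter(x, d, n_taps=2, mu=0.05)
print(weights)  # the weights should approach the system coefficients
```

In this noise-free example the weights settle close to the coefficients of the assumed system, which is the behaviour the error-driven update is meant to produce.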
The step size plays an important role in determining both the error and the weights, but it is subject to a bound, which is given in Eq. (5).
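As an illustration of how such a bound depends on the tap-input power, the following Python sketch evaluates a commonly quoted conservative limit, 0 < mu < 2 / (tap-input power). The exact form of the paper's Eq. (5) is not assumed here; the function name step_size_bound and the estimate of the tap-input power as the number of taps times the mean input power are assumptions made for the example.

```python
def step_size_bound(x, n_taps):
    """Conservative upper limit 2 / (tap-input power) for the LMS step size."""
    mean_power = sum(v * v for v in x) / len(x)   # estimate of E[x^2(n)]
    tap_input_power = n_taps * mean_power         # total power across the tap-delay line
    if tap_input_power == 0.0:
        raise ValueError("input has zero power; the bound is undefined")
    return 2.0 / tap_input_power


# Hypothetical usage: a unit-power input and an 8-tap filter.
print(step_size_bound([1.0, -1.0, 1.0, -1.0], n_taps=8))  # 0.25
```

A step size well below this limit is normally chosen in practice, since the limit only indicates where the recursion is expected to stop converging.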
Owing to the larger critical paths and implementation issues, a modified algorithm was developed in which the currently calculated error signal is used to update the weights for a later sample. This approach enables implementation with short critical paths and better feedback of the error sample to the filter. Pipelining is applied to the feedback signals, but the conventional LMS algorithm does not support pipelining in this case. A delay is therefore introduced, and the weight update equation for the DLMS adaptive filter is given in Eq. (6).
When the FIR filter block is combined with the error calculation block, the latency in the path from the response of the filter to the subtraction can be compensated, as shown in Fig. 3 [8]. Parallel processing is involved in cases where high throughput is necessary. When pipelining is performed, buffer elements, which the DLMS filter accommodates, are inserted. From the figure, it is clear that the error signal is delayed by n1 cycles and is then passed to the weight update block. The same delay is applied to the input signal. In the weight update block, the signals are used according to Eq. (6). A delay of n2 cycles is then applied and, finally, the output is distributed to the FIR filter as the new weights.
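The delayed update described above can be sketched in Python as follows. This is a behavioural model under stated assumptions, not the paper's architecture: it assumes the commonly used DLMS form in which the error and the input vector are both delayed by n1 cycles before the weight update, and the updated weights reach the FIR block n2 cycles later. The function name dlms_filter, the symbol d(n) and the test system are hypothetical.

```python
import random
from collections import deque


def dlms_filter(x, d, n_taps, mu, n1, n2):
    """Behavioural DLMS model; returns (errors, final weights)."""
    w = [0.0] * n_taps
    taps = [0.0] * n_taps
    # (error, input vector) pairs waiting n1 cycles before the weight update
    pending = deque([(0.0, [0.0] * n_taps)] * n1)
    # weight vectors waiting n2 cycles before they reach the FIR block
    w_delayed = deque([list(w) for _ in range(n2)])
    errors = []
    for x_n, d_n in zip(x, d):
        taps = [x_n] + taps[:-1]                  # shift in the new input sample
        w_delayed.append(list(w))
        w_fir = w_delayed.popleft()               # weights from n2 cycles ago
        r_n = sum(wi * xi for wi, xi in zip(w_fir, taps))
        e_n = d_n - r_n
        pending.append((e_n, taps))
        e_old, taps_old = pending.popleft()       # error and inputs from n1 cycles ago
        # delayed update: w(n+1) = w(n) + mu * e(n - n1) * x(n - n1)
        w = [wi + mu * e_old * xi for wi, xi in zip(w, taps_old)]
        errors.append(e_n)
    return errors, w


# Hypothetical usage: same 2-tap identification task, now with pipeline delays.
rng = random.Random(1)
x = [rng.uniform(-1.0, 1.0) for _ in range(3000)]
d = [0.5 * x[n] - 0.3 * (x[n - 1] if n > 0 else 0.0) for n in range(len(x))]
_, weights = dlms_filter(x, d, n_taps=2, mu=0.05, n1=3, n2=2)
print(weights)
```

With n1 = 0 and n2 = 0 the model reduces to the conventional LMS recursion; with small step sizes the delayed version still converges, only more slowly, which is the trade accepted in exchange for a shorter critical path.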
Results and Discussion
This section discusses the optimized results of the LMS adaptive filter on the different CMOS families, with low-power and high-speed objectives along with the efficient utilization of the different FPGAs. Three families, Artix-7, Virtex-7 and Kintex-7, are considered for the simulation. To decrease the power consumption, the LUTs are combined and the tap power is minimized, which enables efficient device utilization. A fixed-point implementation is used, and all the results are shown in Table 1.
The 7 series families were considered because of their low-power performance and architecture-based properties. As all FPGA families have different architectural advantages, the architecture alignments [10] are shown in Fig. 4. The common elements in these families allow easy IP reuse for quick design. In Fig. 5, color coding is used to distinguish the different parameters.
The fourth-generation ASMBL architecture is used in the present low-power families. In the 7 series FPGAs, the architecture comprises columns of different resources, such as clocking, DSP, HSSIO, I/O, etc. Fig. 6 shows the column-wise arrangement of the architecture.
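As a rough illustration of what a fixed-point implementation involves, the Python sketch below quantizes real values to signed integers and multiplies them with rescaling. The word length (16 bits) and the number of fractional bits (12) are assumptions chosen for the example, since the formats used in the paper are not stated; the helper names to_fixed and fixed_mul are likewise hypothetical.

```python
def to_fixed(value, frac_bits=12, word_bits=16):
    """Quantise a real value to a signed fixed-point integer with saturation."""
    scaled = int(round(value * (1 << frac_bits)))
    max_int = (1 << (word_bits - 1)) - 1      # largest representable integer
    min_int = -(1 << (word_bits - 1))         # smallest representable integer
    return max(min_int, min(max_int, scaled))


def fixed_mul(a, b, frac_bits=12):
    """Multiply two fixed-point integers and rescale to the same format."""
    return (a * b) >> frac_bits


# Hypothetical usage: 0.5 * (-0.25) in the assumed 16-bit format with 12 fractional bits.
product = fixed_mul(to_fixed(0.5), to_fixed(-0.25))
print(product / (1 << 12))  # -0.125
```

Shorter word lengths reduce the multiplier and register area at the cost of quantization error in the weights and the error signal, which is one reason the format matters for device utilization.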
The adaptive filter was implemented in MATLAB version 2013b and synthesized with Leonardo Spectrum. High-speed operation of the adaptive filter was achieved by reducing the critical path. A function-based approach was used in the MATLAB implementation, and the critical path was minimized by implementing an efficient pipeline. Recursive loops were minimized in the MATLAB implementation using loop unrolling. Different power-saving strategies were applied to the implemented filter during synthesis. These optimization strategies include speed optimization, for which the clock speed is increased, and different power optimization schemes. These results are for an LMS adaptive filter of order N = 8. The LMS filter implementation is used for noise cancellation. From the results, it is clear that the Artix-7 family has the lowest power consumption, 1.118 mW, with a maximum device utilization of 83%. In addition, when optimization is applied for speed performance, it has the highest device utilization. The Artix-7 architecture showed the most efficient performance with respect to power consumption. This study employed a strategy of optimized, balanced pipelining across the time-consuming blocks to reduce the power as well as the delay.
The Virtex-7 family has the maximum power consumption among the three families when a power optimization strategy is employed. The duty cycle in all cases was almost 5 ns. Therefore, the Artix-7 family has proven to be efficient in terms of both power consumption and device utilization. It is clear from the results table that, upon applying register balancing and reducing the unused register space, 82% device utilization can be achieved for Artix-7 with 1.231 mW. The results show that the device utilization decreases as the speed improves. Hence, there is a trade-off between speed and area. Table 1 shows the results when power optimization is applied to improve the performance of the adaptive filter.
Conclusion
This paper presents an analysis of the LMS adaptive filter on different CMOS families, among which Artix-7 is best suited for both power consumption and speed. The device utilization achieved in this work was 83.14% with minimum power consumption for Artix-7. The general strategy for reducing the power consumption in adaptive filters operates at the architectural level, but the power consumption can also be minimized at the interconnect level because all the signals are clock triggered.
