Engineering Science and Technology, an International Journal 19 (2016) 212-226



Review


### Engineering Science and Technology, an International Journal

journal homepage: http://www.elsevier.com/locate/jestch

# Design of hardware efficient FIR filter: A review of the state-of-the-art approaches




### Abhijit Chandra<sup>a,\*</sup>, Sudipta Chattopadhyay<sup>b</sup>

<sup>a</sup>Department of Instrumentation & Electronics Engineering, Jadavpur University, Kolkata, India <sup>b</sup>Department of Electronics & Telecommunication Engineering, Jadavpur University, Kolkata, India

#### ARTICLE INFO

Article history: Received 5 May 2015 Received in revised form 11 June 2015 Accepted 24 June 2015 Available online 1 September 2015

Keywords: Common sub-expression elimination (CSE) Differential coefficient method (DCM) Genetic algorithm (GA) Minimal difference differential coefficients method (MDDCM) Mixed integer linear programming (MILP) Multiple constant multiplication Multiplier-less filter Pseudo floating point (PFP)

#### ABSTRACT

Digital signal processing (DSP) is one of the most powerful technologies shaping the science, engineering and technology of the twenty-first century. Since 1970, revolutionary changes in the broad area of DSP have made it an essential tool in many engineering applications. The digital filter is one of the most important components of almost every DSP sub-system, and researchers have therefore carried out extensive work on its design. To meet stringent filter specifications, the order of the designed filter is generally very large, which leads to high power and area consumption during implementation. As a result, the design of hardware efficient digital filters has drawn enormous attention and has been addressed by various means. One popular approach has been to encode the tap coefficients of such a filter as a sum of signed powers-of-two, so that multiplications can be replaced by shift and add operations. This article reviews the basic design approaches applicable to the synthesis of hardware efficient finite duration impulse response (FIR) filters. Both traditional and heuristic search algorithms have been included and properly arranged in this review.

Copyright © 2015, The Authors. Production and hosting by Elsevier B.V. on behalf of Karabuk University. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

### 1. Introduction

Digital filter design has attracted significant attention amongst researchers over the last few decades. Digital filters may broadly be categorized into finite duration impulse response (FIR) and infinite duration impulse response (IIR) filters. FIR filters exhibit significant advantages over their IIR counterparts, such as bounded-input-bounded-output (BIBO) stability, phase linearity, and low coefficient sensitivity, which have made them suitable for many applications [1–3]. A major drawback of the FIR filter is the large number of arithmetic operations involved in its implementation, which limits its speed and demands more power [4]. This has motivated researchers to work on hardware efficient, low-power filter design, and the field has accordingly been enriched with a number of valuable contributions from scientists all over the world. The first article was published in 1982 [5] and the concept is still studied extensively today [6–8]. Hence it can be identified as an active area of research. FIR filters are generally characterized by their impulse response coefficients, which act as the multiplication constants applied to the input samples. These multipliers consume power and area, and thus make the filter ill-suited to portable wireless devices such as mobile phones, tablets and laptops. One of the most efficient ways to reduce the complexity of a digital filter is to confine the tap coefficients to values in the form of sums of signed-powers-of-two (SPT). Multipliers can then be replaced by a small number of shifters and adders, with a corresponding improvement in area and power efficiency. Low area and low power design of FIR filters can also be achieved with the aid of parallel or block processing, which is also suitable for increasing the effective throughput. This has been achieved by various means, which include frequency spectrum characteristics [9], iterated short convolution [10,11] and so on. These power efficient digital filters have found application in modern digital communication systems [12,13], wireless sensor networks [14] and so on.
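The replacement of a multiplier by shifters and adders can be illustrated with a minimal Python sketch. The function name `spt_multiply`, the `(sign, shift)` encoding and the example constant 23 are assumptions made for illustration; they are not taken from the reviewed works.

```python
def spt_multiply(x, terms):
    """Multiply integer x by a constant written as signed powers-of-two.

    `terms` is a list of (sign, shift) pairs representing sum(sign * 2**shift).
    Only shifts, additions and subtractions are used; a constant with T terms
    needs T - 1 adders/subtractors in hardware.
    """
    acc = 0
    for sign, shift in terms:
        shifted = x << shift      # a fixed shift is just wiring in hardware
        if sign > 0:
            acc += shifted        # add the shifted copy
        else:
            acc -= shifted        # subtract the shifted copy
    return acc


# Hypothetical example constant: 23 = 2**5 - 2**3 - 2**0, i.e. three SPT
# terms, whereas plain binary 10111 has four nonzero bits.
TERMS_23 = [(+1, 5), (-1, 3), (-1, 0)]

if __name__ == "__main__":
    for sample in (5, -7, 1024):
        print(sample, spt_multiply(sample, TERMS_23))
```

The sketch works on integers; a fractional coefficient with word length B would first be scaled by $2^{B}$.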

One of the simplest methods to realize a finite word length FIR filter is to round the optimum infinite precision coefficients to their B-bit representation. However, the performance of such filters is significantly degraded compared with filters using the optimum real coefficients. Some suboptimal algorithms [15–17] can be found in this context which may improve the coefficients obtained by rounding the optimal floating-point values through global search, univariate search, modified univariate search [15], and a random search optimization method [17]. The application of the branch and bound technique for nonlinear discrete optimization in selecting the coefficients of a recursive digital filter with a given word length has been shown in Reference 16 to meet an arbitrary response specification. These methods can only be applied to filters of lower order, and the obtained solution is suboptimal in most cases.

<sup>\*</sup> Corresponding author. Tel.: +91 33 2335-2587; Fax: +91 33 2335-7254. *E-mail address*: abhijit922@yahoo.co.in (A. Chandra). Peer review under responsibility of Karabuk University.

The design of FIR filters with SPT coefficients has largely been treated as an optimization problem in a discrete space, with the aim of reducing the error power between the desired and the actual frequency response. In this connection, the methods of integer linear programming and integer quadratic programming are particularly useful for designing FIR filters on a powers-of-two coefficient grid [18]. However, integer programming has some serious disadvantages: the finite word length solution obtained with it saves only a few bits in coefficient word length in comparison with the solution obtained by rounding. Moreover, it demands a huge amount of computing resources and thus limits the maximum size of the filter. The design problem of the powers-of-two FIR filter has subsequently been formulated as mixed integer linear programming (MILP) [19], integer semi-infinite linear programming [20], semi definite programming (SDP) [21], a discrete semi-infinite linear programming problem (DSILP) [22], and mixed integer programming (MIP) [23] by various researchers over a number of years. Methods using the branch and bound (B & B) technique based on linear programming are the most useful for MILP [20,24].

In the context of powers-of-two filter design, the problem of multiple constant multiplication (MCM) has been an active area of research over the last two decades. As the coefficients are constants, they can be realized using shifts, additions and subtractions, thus eliminating the need for multipliers in the filter structure. MCM is the problem of realizing the multiplication of the same input by a number of constant integers using the minimum number of multiplier-less operations. The idea of MCM is to exploit redundancies between the coefficients so as to minimize the required number of adders. Generally speaking, the MCM algorithms available in the literature can be divided into three groups: adder graph methods [25–27], common sub-expression elimination [28,29] and difference method algorithms [30,31].

Graph-based algorithms are bottom-up methods that iteratively construct the graph representing the multiplier block. The graph construction is guided by a heuristic that determines the next vertex to be added to the graph. Graph-based algorithms offer more degrees of freedom because they are not restricted to a particular representation of the coefficients, and they typically produce solutions with the lowest number of operations. The first article in this regard was published in 1995 [25] and proposed the n-dimensional reduced adder graph (RAG-n) algorithm for the reduction of adders in filter design. This was considered the best approach for more than a decade, until the introduction of the HCUB algorithm in 2007 [27]. More recently, algorithms such as the difference based adder graph heuristic for MCM problems [26] and truncated MCM using the pattern modification technique (PMT) [32] have appeared and have established their superiority over the previously best approaches.

Complexity reduction of multiplication coefficients has been carried out by various means, amongst which the common sub-expression elimination (CSE) algorithm is the most popular. A number of research articles [29,33–35] have dealt with CSE in different aspects of multiplier-less FIR filter design. A common feature of CSE algorithms is to identify common bit patterns in the coefficient set and to share the identified common sub-expressions to minimize the adder cost. Hartley [29] took the pioneering initiative of sub-expression sharing in filters using canonic signed digit (CSD) multipliers. Inspired by the application of CSE in designing hardware efficient digital filters, algorithms such as non recursive signed common sub-expression elimination (NR-SCSE) [36] and heuristic common sub-expression elimination [37] have been successively introduced, and their superiority over their predecessors has been substantiated. However, the filter structure obtained using CSE is hard to pipeline because it is highly irregular. In addition, since the coefficients of programmable and reconfigurable filters are not fixed, it is not easy to find the common sub-expressions for newly applied coefficients [38].

Reordering of filter coefficients has emerged as one of the efficient techniques to reduce the hardware cost of digital filters. In this regard, the differential coefficients method (DCM) [30] follows the intuition that recasting the filter computation in terms of the differences between adjacent coefficients can reduce the number of ones required to represent the coefficients. DCM works well when consecutive coefficients are similar, e.g.  $h_k = (1011001111)$ and  $h_{k-1} = (100100111)$ . However, it suffers when large differences result in many 1's being needed to store the difference. Moreover, if either of the two coefficients is zero, DCM offers no additional advantage. To overcome this problem, Vinod et al. proposed the minimal difference differential coefficients method (MDDCM) [39,40] which, rather than storing the differences of the FIR coefficients in order from  $h_0$  to  $h_{N-1}$ , sorts the coefficients so that adjacent coefficients have minimum differences in magnitude. Based on the fact that the adder width can be minimized by limiting the shifts of the operands to shorter lengths, an efficient coefficient partitioning algorithm, called pseudo floating point (PFP) representation, was introduced in Reference 41. This has been integrated with the vertical common sub-expression elimination (VCSE) algorithm for the design of low complexity channel filters.
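The differential idea can be sketched in Python under simplifying assumptions: first-order differences only, integer coefficients and zero initial state. The names `fir_direct`, `fir_dcm`, `ones` and the example coefficients `H_EXAMPLE` are hypothetical; the sketch is not the implementation of Reference 30. It relies on the identity $h_k x(n-k) = h_{k-1} x(n-k) + (h_k - h_{k-1})\,x(n-k)$, where the first term is the partial product of tap $k-1$ from the previous time step.

```python
def fir_direct(h, x):
    """Reference FIR output y(n) = sum_k h[k] * x(n - k), zero initial state."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def fir_dcm(h, x):
    """FIR output using first-order coefficient differences.

    Only h[0] and the differences h[k] - h[k-1] are multiplied by data;
    the remainder of each partial product is reused from the previous step.
    """
    taps = len(h)
    delta = [h[k] - h[k - 1] for k in range(1, taps)]
    prev = [0] * taps                 # prev[k] = h[k] * x(n - 1 - k)
    y = []
    for n in range(len(x)):
        cur = [0] * taps
        cur[0] = h[0] * x[n]
        for k in range(1, taps):
            xk = x[n - k] if n - k >= 0 else 0
            cur[k] = prev[k - 1] + delta[k - 1] * xk
        y.append(sum(cur))
        prev = cur
    return y


def ones(value):
    """Number of nonzero bits in the binary magnitude of an integer."""
    return bin(abs(value)).count("1")


H_EXAMPLE = [115, 119, 122, 119, 115]   # slowly varying, hypothetical taps

if __name__ == "__main__":
    x = [3, -1, 4, 1, -5, 9, 2, -6]
    print(fir_direct(H_EXAMPLE, x) == fir_dcm(H_EXAMPLE, x))
    print("ones in coefficients:", sum(ones(c) for c in H_EXAMPLE))
    print("ones in h[0] + differences:",
          ones(H_EXAMPLE[0]) + sum(ones(H_EXAMPLE[k] - H_EXAMPLE[k - 1])
                                   for k in range(1, len(H_EXAMPLE))))
```

The saving in nonzero bits comes at the cost of storing the previous partial products, which is the trade-off that the coefficient reordering of MDDCM tries to improve.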

Inspired by the genetic and social behaviour of animals, the last quarter of the twentieth century has brought various intelligent optimization techniques into limelight. These techniques, classified into evolutionary and swarm optimization, have subsequently been applied in a number of research applications of proper relevance. In regard to powers-of-two FIR filter design, methods like tabu search (TS) [42], genetic algorithm (GA) [43–45], micro GA ( $\mu$ GA) [46], modified  $\mu$ GA [47], orthogonal GA (OGA) [48], differential evolution (DE) [49], self-organizing random immigrants genetic algorithm [50,51], particle swarm optimization (PSO) [52], and artificial bee colony (ABC) algorithm [53,54] have been successfully incorporated and their supremacy over their predecessors has been firmly established.

Success of hardware efficient powers-of-two one dimensional (1D) FIR filter design has also been extended towards the synthesis of two-dimensional (2D) multiplier-less image filter when Pei and Jaw published their article in the year 1987. Since then, researchers throughout the world have significantly contributed in this field for improving the quality of digital image. In connection to this, conventional methods like linear programming (LP) [55], semi definite programming (SDP) and artificial methods like genetic algorithm (GA) [56,57], gravitational search algorithm (GSA) [58], differential evolution [59] have proven their effectiveness.

It is undeniable that the field of hardware efficient FIR filter design has been enriched with numerous valuable contributions from researchers for more than 30 years. It therefore seems appropriate to summarize the concepts which have been adopted to address this problem over so many years. Motivated by this aim, this work presents a detailed review of the evolution of the design flow for hardware efficient FIR filters. The rest of the paper is organized as follows. Section 2 describes the growth of mathematical programming in the design of linear phase powers-of-two FIR filters. CSE and its numerous advancements are presented in Section 3. Section 4 collects the approaches focusing on the minimization of adders in hardware efficient FIR filter design. In Section 5, an extensive survey of coefficient representation schemes is carried out. The application of intelligent optimization techniques in the area of multiplier-less FIR filter design is thoroughly investigated in Section 6, followed by the design strategies for two-dimensional hardware efficient FIR filters in Section 7. Experimental observations are listed in Section 8, and the paper is concluded in Section 9 with a possible scope of future research in this field.

## 2. Linear phase powers-of-two FIR filter design using mathematical programming

Design of FIR filters over a discrete powers-of-two coefficient space has long been an active research topic. The first article in this area of signal processing was that of Lim et al. [5], where the discrete coefficients are selected by integer programming. The frequency response  $H(\omega)$  of a linear phase FIR filter of length N can always be expressed as a trigonometric function of the frequency variable  $\omega$  [18]:

$$|H(\omega)| = \begin{cases} h\left(\frac{N-1}{2}\right) + 2\sum_{n=0}^{(N-3)/2} h(n)\cos\left[\omega\left(\frac{N-1}{2}-n\right)\right], & \text{for odd } N\\ 2\sum_{n=0}^{(N/2)-1} h(n)\cos\left[\omega\left(\frac{N-1}{2}-n\right)\right], & \text{for even } N \end{cases} \tag{1}$$

The magnitude part of the response is shown, and symmetry of the impulse response, i.e. h(n) = h(N-1-n), is assumed in equation (1). The resulting phase response for both odd and even N therefore has the form:

$$\Theta(\omega) = \begin{cases} -\omega \left(\frac{N-1}{2}\right), & \text{if } H(\omega) > 0\\ -\omega \left(\frac{N-1}{2}\right) + \pi, & \text{if } H(\omega) < 0 \end{cases} \tag{2}$$
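As a numerical check of equations (1) and (2), the following Python sketch evaluates the right-hand side of equation (1) and compares it with the discrete-time Fourier transform after removing the linear phase term $e^{-j\omega(N-1)/2}$. The function names and test coefficients are illustrative assumptions. The quantity returned by `amplitude_response` may be negative, which is the case handled by the additional $\pi$ in equation (2).

```python
import cmath
import math


def amplitude_response(h, w):
    """Right-hand side of equation (1) for a symmetric impulse response h."""
    n_taps = len(h)
    mid = (n_taps - 1) / 2
    if n_taps % 2 == 1:
        centre = (n_taps - 1) // 2
        # Sum runs over n = 0 .. (N-3)/2, plus the centre tap.
        return h[centre] + 2 * sum(h[n] * math.cos(w * (mid - n))
                                   for n in range(centre))
    # Even length: sum runs over n = 0 .. N/2 - 1.
    return 2 * sum(h[n] * math.cos(w * (mid - n)) for n in range(n_taps // 2))


def dtft(h, w):
    """Direct evaluation of H(e^{jw}) = sum_n h(n) e^{-jwn}."""
    return sum(h[n] * cmath.exp(-1j * w * n) for n in range(len(h)))


def max_mismatch(h, points=64):
    """Largest difference between equation (1) and the de-rotated DTFT."""
    mid = (len(h) - 1) / 2
    worst = 0.0
    for i in range(points + 1):
        w = math.pi * i / points
        derotated = dtft(h, w) * cmath.exp(1j * w * mid)
        worst = max(worst, abs(derotated - amplitude_response(h, w)))
    return worst


if __name__ == "__main__":
    print(max_mismatch([1, 2, 3, 2, 1]))   # odd length, symmetric
    print(max_mismatch([1, 3, 3, 1]))      # even length, symmetric
```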

The filter design problem is nothing but obtaining the set of coefficients h(n) such that  $H(\omega)$  is the best approximation to some desired function  $D(\omega)$  with respect to some optimality criterion. During their design using minimax strategy, the value of  $H(\omega)$  is subject to the following constraint:

$$D(\omega) - \delta k(\omega) \le H(\omega) \le D(\omega) + \delta k(\omega) \tag{3}$$

 $\delta k(\omega)$  in equation (3) stands for the ripple to be minimized. One possible criterion to optimize the filter coefficient h(n) is to reduce the output error power [60]. This may be expressed as:

$$|E(\omega)|^{2} = |V(\omega)|^{2} |D(\omega) - H(\omega)|^{2} \tag{4}$$

 $E(\omega)$  and  $V(\omega)$  signify the frequency spectrum of error signal and input signal respectively in the above equation. According to Parseval's theorem, the above equation may be observed as the minimization of the variable J as follows [60]:

$$J = \int_{0}^{\pi} |V(\omega)|^{2} |D(\omega) - H(\omega)|^{2} d\omega \tag{5}$$

As can be seen from equation (5), optimum coefficient values may be obtained by reducing the weighted average of the error signal, with different weights assigned to frequencies in the pass-band and stop-band. For a practical solution of equation (5), the integration can simply be replaced by a summation,  $\sum_i |V(\omega_i)|^2 |D(\omega_i) - H(\omega_i)|^2$ . Minimization of J subject to linear constraints on the elements of h(n) is a quadratic programming problem, and therefore a general purpose integer quadratic programming method can be used to solve it. However, a combination of linear and quadratic programming packages with a branch-and-bound (B & B) technique may sometimes be useful for designing filters in discrete space, and was successfully employed by the authors in Reference 17. The authors made use of two variants of the branch-and-bound algorithm, namely isocost and breadth-first branch-and-bound search. The former always selects the best sub-problem for further branching in order to reduce the search cost, and the latter continues until a sub-optimum discrete solution is achieved.
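The discretized form of equation (5) can be evaluated with a few lines of Python. This is a minimal sketch, assuming a symmetric impulse response, a uniform frequency grid on $[0, \pi]$ and a user-supplied weight playing the role of $|V(\omega)|^2$; the names `zero_phase_response` and `error_power` are hypothetical.

```python
import math


def zero_phase_response(h, w):
    """Real-valued response of a symmetric h with the linear phase removed."""
    mid = (len(h) - 1) / 2
    return sum(c * math.cos(w * (mid - n)) for n, c in enumerate(h))


def error_power(h, grid, desired, weight):
    """Discretized J: sum_i weight(w_i) * (D(w_i) - H(w_i))**2."""
    return sum(weight(w) * (desired(w) - zero_phase_response(h, w)) ** 2
               for w in grid)


GRID = [math.pi * i / 64 for i in range(65)]

if __name__ == "__main__":
    # Hypothetical target: the response of h = [0.25, 0.5, 0.25] itself.
    target = lambda w: 0.5 + 0.5 * math.cos(w)
    flat = lambda w: 1.0
    print(error_power([0.25, 0.5, 0.25], GRID, target, flat))
    print(error_power([0.25, 0.5625, 0.25], GRID, target, flat))
```

A discrete-coefficient search would call `error_power` on each candidate coefficient set and keep the one with the smallest value.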

The method of synthesizing hardware efficient FIR filters requiring fewer arithmetic operations than the conventional one is based on a cascade structure of a multiplier-less pre-filter and an FIR equalizer. One such optimal method for designing multiplierless FIR and IIR filters has been demonstrated [61] with cascaded prefilter-equalizer architecture. During the course of design, both the prefilter and equalizer are simultaneously designed using MILP that yields a resulting filter with minimal complexity, assuming that FIR filter consists of a cyclotomic polynomial (CP) prefilter and interpolated second order polynomial (ISOP) equalizer. As far as the design of IIR filter is concerned, all pole IIR equalizers consisting of inverse of interpolated first order polynomials (IIFOPs) are introduced and a CP-prefilter cascaded with this type of equalizer has been designed.

The most convenient way of representing the coefficients of hardware friendly FIR filter is that of signed-powers-of-two (SPT) illustration. For a fixed word length B of the DSP processor, the impulse response coefficient may have its general form like:

$$h(n) = \sum_{i=1}^{B} s_i 2^{-i} \quad \text{with} \quad s_i \in \{-1, 0, 1\} \tag{6}$$

A number of such representations are available in the literature, with a common emphasis on reducing the overall hardware complexity. Mixed integer linear programming (MILP) has been used for this purpose, formulated to minimize the number of SPT terms for given filter specifications. One representation with the minimum number of SPT terms is the canonic signed digit code (CSDC) representation, in which no two nonzero digits are adjacent. The SPT representation in equation (6) may alternatively be written in binary form as:

$$h(n) = \sum_{i=1}^{B} (s_{n,i}^{+} - s_{n,i}^{-}) 2^{-i} \quad \text{with} \quad s_{n,i}^{+}, s_{n,i}^{-} \in \{0, 1\} \tag{7}$$

Equation (7) makes the optimization goal function linear and makes it possible to place linear constraints on the number of SPT terms per coefficient, which system designers seek to limit. The authors of Reference 19 (Gustafsson, 2001) addressed this criterion in the optimization problem by including the constraint  $\sum_{i=1}^{B} (s_{n,i}^{+} + s_{n,i}^{-}) \le L_{max}$ , where  $L_{max}$  is the maximum number of SPT terms per coefficient. The solution obtained by the optimization is not necessarily in CSDC form; this is enforced by adding the inequality  $s_{n,i}^{+} + s_{n,i}^{-} + s_{n,i+1}^{+} + s_{n,i+1}^{-} \le 1 \ \forall\, n \in \{0, 1, 2, \dots, N-1\}$ .
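The signed-digit quantities used above can be made concrete with a short Python sketch. It converts an integer to canonic signed digit form, splits the digits into the binary pair $(s^{+}, s^{-})$ of equation (7), counts SPT terms as the number of nonzero digits, and checks the adjacency inequality. The function names are hypothetical, and the sketch works on integers: a fractional coefficient with word length B is assumed to be scaled by $2^{B}$ first, so that bit position p corresponds to $2^{p-B}$.

```python
def to_csd(value):
    """Canonic signed digits of an integer, least significant digit first."""
    digits = []
    while value != 0:
        if value % 2 == 0:
            digit = 0
        else:
            # +1 when value is 1 mod 4, -1 when value is 3 mod 4; this choice
            # makes the next digit zero, so nonzero digits are never adjacent.
            digit = 2 - (value % 4)
        digits.append(digit)
        value = (value - digit) // 2
    return digits


def split_signed(digits):
    """Binary vectors (s_plus, s_minus) with digit = s_plus - s_minus."""
    s_plus = [1 if d == 1 else 0 for d in digits]
    s_minus = [1 if d == -1 else 0 for d in digits]
    return s_plus, s_minus


def spt_term_count(digits):
    """Number of SPT terms, i.e. the sum of s_plus + s_minus over all digits."""
    s_plus, s_minus = split_signed(digits)
    return sum(s_plus) + sum(s_minus)


def is_csd(digits):
    """Check s+_i + s-_i + s+_{i+1} + s-_{i+1} <= 1 for every adjacent pair."""
    s_plus, s_minus = split_signed(digits)
    return all(s_plus[i] + s_minus[i] + s_plus[i + 1] + s_minus[i + 1] <= 1
               for i in range(len(digits) - 1))


def from_digits(digits):
    """Rebuild the integer from its signed digits."""
    return sum(d << i for i, d in enumerate(digits))


if __name__ == "__main__":
    for k in (23, 115, -45):
        d = to_csd(k)
        print(k, d, spt_term_count(d), is_csd(d))
```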

The integer programming approach to designing FIR filters with sums of SPT coefficients was relaxed much later, when Lu [21] proposed a semi definite programming (SDP) formulation that can be solved in polynomial time using efficient SDP solvers. SDP is a constrained optimization problem in which a linear objective function is minimized subject to matrix constraints that depend on the variable vector *h*. A typical formulation of the SDP problem has the form [21]:

$$\text{minimize} \quad C^T h, \quad \text{where } C(\omega) = [1, \cos\omega, \cos 2\omega, \dots, \cos(N-1)\omega]^T \tag{8a}$$

$$\text{subject to:} \quad F(h) = F_0 + \sum_{i=1}^{r} h_i F_i \succcurlyeq 0 \tag{8b}$$

Matrices  $F_i$  for  $0 \le i \le r$  in the above equation are symmetric and  $\succcurlyeq$  symbolizes positive semi definite.

In the following year, Ito et al. [62] proposed another design method for linear phase SPT filters based on an SDP relaxation method. Their method includes a linear programming (LP) relaxation and a relaxation obtained by adding triangle inequalities. From a theoretical viewpoint, SDP relaxation with triangle inequalities is stronger than simple SDP relaxation, LP relaxation, or LP relaxation with triangle inequalities. In the same year, Yao and Chien [63] proposed a three-step algorithm for designing linear phase FIR filters with SPT coefficients, where MILP is applied in the last step to the three least significant digits of the filter coefficients to reduce the number of SPT terms.

In order to ensure the optimality of the obtained solution, the SPT FIR design problem has been formulated as a discrete semi-infinite linear programming problem (DSILP) and consequently solved by the branch and bound (B & B) method [19]. The authors start solving the optimization problem by ignoring the requirement that each coefficient be of SPT form, i.e. by relaxing DSILP to simple SILP. As SILP is a continuous optimization problem, the achievable solution does not always guarantee that each coefficient is of SPT form, and hence SILP is combined with the B & B method.

A new approach to low-power FIR filter design has been formulated in Reference 64 as an MILP problem that minimizes the Chebyshev error and synthesizes coefficients consisting of pre-specified alphabets. The number of alphabets corresponding to the coefficients is reduced significantly, and the near optimal coefficients satisfy the filter characteristics as well. In the same year, two-step and three-step schemes were proposed in Reference 65 for the design of variable digital filters with sums of SPT coefficients using the minimax or least-square criterion. For the least-square criterion, an effective application of the B & B method to this complex non-linear integer programming problem has been accomplished through the introduction of a reduced search area and an efficient cutting scheme. Through numerical examples, the authors have also claimed that the obtained finite precision filters yield approximately the same performance as the infinite precision solution with a small number of additions and subtractions.

Recently, the design of discrete coefficient FIR filters has been facilitated by MILP and subsequently been solved by B & B technique [66]. The filter design problem has been formulated as a minimization problem such as:

$$\text{minimize} \quad \gamma \tag{9a}$$

$$\text{subject to:} \quad |H(\omega) - 1| \le \delta_p \ \text{ for } \omega \in [0, \omega_p], \qquad |H(\omega)| \le \gamma \ \text{ for } \omega \in [\omega_s, \pi] \tag{9b}$$

 $\delta_p$  and  $\gamma$  identify the maximum allowable ripple of  $H(\omega)$  in the pass-band and stop-band regions of interest. Authors have pointed out the minimization problem with trigonometric semi-infinite constraints (TSICs). According to the Markov-Lukacs theorem, linear TSICs in the variable  $h \in \mathbb{R}^{N+1}$  can always be changed to non-negative trigonometric polynomial. The filter design problem may thus be formulated as:

$$\underset{h,\gamma}{\text{minimize}} \ \gamma \quad \text{subject to} \quad A_i h + d_i \in C_i^*(7), \quad i = 1, 2, 3, 4 \tag{10}$$

with  $A_1 = -A_2 = A_3 = -A_4 = I$ ,  $d_1 = (\delta_p - 1, 0, ..., 0)^T$ ,  $d_2 = (\delta_p + 1, 0, ..., 0)^T$ ,  $d_3 = d_4 = (\delta_p, 0, ..., 0)^T$ . Equation (10) introduces a new term  $C_i^*$ , derived from  $C_i$ ; here  $C_i$  is the description of the TSICs in terms of trigonometric curves and  $C_i^*$  is its polar. They are defined mathematically as follows:

$$C_{a,b} = \left\{ c_N(\omega) : \cos\omega \in [\cos a, \cos b] \right\} \subset \mathbb{R}^{N+1}, \quad \text{where} \quad c_N(\omega) = (1, \cos\omega, \cos 2\omega, \dots, \cos N\omega)^T \tag{11}$$

$$C_{a,b}^* = \{ u: \langle u, v \rangle \ge 0 \ \forall \ v \in C_{a,b} \} \tag{12}$$

Semi definite programming (SDP) of equation (10) has therefore been solved by using SeDuMi [67] and consequently the optimal filter coefficients can be synthesized very easily.

The branch and bound (B & B) technique was later utilized for designing low power linear phase FIR filters in Reference 68 by fixing a coefficient to a certain value, which is determined by finding the boundary values of the coefficient using linear programming. Although the worst case run time of the algorithm is exponential, its capability to find appreciably good solutions in a reasonable amount of time makes it a desirable CAD tool for designing such filters. The superiority of the algorithm over existing methods such as those in References 18, 62, 69, 70 and 71 in terms of SPT term count, design time, hardware complexity and power performance has been explicitly demonstrated with several design examples.

The same problem of discrete coefficient FIR filter design using mixed integer programming (MIP) was formulated in Reference 23, where the MIP is transformed into an equivalent integer programming problem on the basis of a transformation between two integer spaces and the computation of the optimum scaling factor for a given set of coefficients. An efficient algorithm based on a discrete filled function has subsequently been developed for solving the equivalent problem. The authors have demonstrated the superiority of their design over those in References 70–74 with the help of some numerical examples.

An integer linear programming (ILP) approach to design optimal finite word length linear-phase FIR filters in the logarithmic number system (LNS) domain has been recently proposed [75]. Authors have optimized the filter directly in the LNS domain with finite word length constraints in which several branch variable selection and branching direction schemes were suggested and evaluated. By means of different design examples, it has been shown that the resultant filters are optimal in the minimax sense under finite word length conditions.

### 3. Common sub-expression elimination for the design of hardware efficient FIR filter

Multiplication by a constant value can generally be carried out using a sequence of shifters and adders. Subtractions are also used in order to use the hardware efficiently. In most cases, the best results are obtained when the multipliers are represented by CSD digits, as reported in the literature. A number of articles are available in which researchers have optimized the design of CSD multipliers by eliminating the common sub-expressions in filter coefficients. Common sub-expression elimination (CSE) has been extensively studied, and various algorithms have been proposed in References 28, 29, 76 and 77. The basic feature of the CSE method is to identify common bit patterns in the set of coefficients and to share those identified common sub-expressions in order to reduce the number of addition operations.
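The sharing of a common bit pattern can be illustrated with a toy Python sketch. The two constants 80 (binary 1010000) and 165 (binary 10100101) are hypothetical and were chosen because both contain the pattern 101; the function names are likewise illustrative and do not reproduce any specific algorithm from the cited references.

```python
def shared_products(x):
    """Compute 80*x and 165*x while sharing the sub-expression 5*x."""
    s = x + (x << 2)          # pattern 101: 5*x, one adder, computed once
    y1 = s << 4               # 80*x  = (5*x) * 16, shift only
    y2 = (s << 5) + s         # 165*x = (5*x) * 32 + 5*x, one more adder
    return y1, y2


def adders_without_sharing(constants):
    """Adders needed when each positive constant is built from its own bits."""
    return sum(bin(c).count("1") - 1 for c in constants)


SHARED_ADDER_COUNT = 2        # one adder for s, one for y2 in the sketch above

if __name__ == "__main__":
    print(shared_products(7))
    print(adders_without_sharing([80, 165]), "adders without sharing,",
          SHARED_ADDER_COUNT, "with sharing")
```

In this small case the adder count drops from four to two; practical CSE algorithms search for such patterns automatically, usually on CSD rather than plain binary digits.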

The first available article in this respect was published by Hartley in 1996 and reduced the number of adders by approximately 50% [29]. The proposed algorithm considers sub-expressions mixing terms in different versions of the input signal; additionally, it explicitly takes into account the number of delay latches in the circuit and attempts to minimize the number of adders and delays.

Fig. 1. Five occurrences of the same common sub-expressions in four coefficients of FIR filter.

The algorithm is based on finding several common sub-expressions between coefficients. The main idea may be illustrated by a simple example of a 4-tap FIR filter whose output can be expressed as:

$$y(n) = h_0 x(n) + h_1 x(n-1) + h_2 x(n-2) + h_3 x(n-3) \tag{13}$$

with  $h_0 = (10101000010)_2$ ,  $h_1 = (0100010101010101010101)_2$ ,  $h_2 = (01001000010)_2$ , and  $h_3 = (1000001010000)_2$ .

It unambiguously identifies five occurrences of the same common sub-expressions between three different coefficients as shown in Fig. 1.

The first sub-expression can be expressed as:

$$x_2 = x(n) - x(n-1)2^1 = x_1 - x_1[-1] \gg (-1) \tag{14}$$

The term [-1] in equation (14) represents a unit delay, the sign '>>n' corresponds to an n-step right shift, and the bar indicates a negative expression. The method proposed by Hartley is then applied recursively to identify common sub-expressions. Fig. 2 shows the location of the previous sub-expression in a new matrix and its recursive use.

From these figures, complete definition of the filter may be written as:

$$\begin{aligned} y = {} & -x_1 + x_3 \gg 2 + x_2 \gg 10 - x_3[-1] \gg 5 - x_2[-1] \gg 11 - x_2[-2] \gg 1 \\ & + x_1[-3] \gg 6 - x_1[-3] \gg 8, \quad \text{with} \quad x_3 = x_2 + x_1 \gg 2 \end{aligned} \tag{15}$$

In case any sub-expression definition involves negative shift, it is to be modified accordingly to remove the negative shift as shown below:

$$x_2' = x_1 \gg 1 - x_1 [-1] \tag{16}$$

$$x_3' = x_2' + x_1 \gg 3 \tag{17}$$

$$\begin{aligned} y = {} & -x_1 + x'_3 \gg 1 + x'_2 \gg 9 - x'_3[-1] \gg 4 - x'_2[-1] \gg 10 - x'_2[-2] \\ & + x_1[-3] \gg 6 - x_1[-3] \gg 8, \quad \text{with} \quad x'_2 = x_2 \gg 1 \quad \text{and} \quad x'_3 = x_3 \gg 1 \end{aligned} \tag{18}$$
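The equivalence of the two filter definitions can be checked numerically. The Python sketch below evaluates equation (15) and equation (18) with exact rational arithmetic and compares the outputs. It assumes that a shift binds more tightly than addition, that the initial state is zero, and that the delayed term in equation (18) is $x'_3[-1] \gg 4$, which is what the substitution $x'_3 = x_3 \gg 1$ gives. All function names are hypothetical.

```python
from fractions import Fraction


def delay(sig, d):
    """Delay a finite sequence by d samples, assuming zero initial state."""
    return [Fraction(0)] * d + sig[:len(sig) - d]


def shift_right(sig, k):
    """Exact k-step right shift (division by 2**k); negative k shifts left."""
    return [v * Fraction(1, 2) ** k for v in sig]


def combine(terms):
    """Sum of sign * (sig delayed by d) >> k over (sign, sig, d, k) tuples."""
    out = [Fraction(0)] * len(terms[0][1])
    for sign, sig, d, k in terms:
        part = shift_right(delay(sig, d), k)
        out = [o + sign * p for o, p in zip(out, part)]
    return out


def output_eq15(x1):
    x2 = combine([(+1, x1, 0, 0), (-1, x1, 1, -1)])      # equation (14)
    x3 = combine([(+1, x2, 0, 0), (+1, x1, 0, 2)])
    return combine([(-1, x1, 0, 0), (+1, x3, 0, 2), (+1, x2, 0, 10),
                    (-1, x3, 1, 5), (-1, x2, 1, 11), (-1, x2, 2, 1),
                    (+1, x1, 3, 6), (-1, x1, 3, 8)])


def output_eq18(x1):
    x2p = combine([(+1, x1, 0, 1), (-1, x1, 1, 0)])      # equation (16)
    x3p = combine([(+1, x2p, 0, 0), (+1, x1, 0, 3)])     # equation (17)
    return combine([(-1, x1, 0, 0), (+1, x3p, 0, 1), (+1, x2p, 0, 9),
                    (-1, x3p, 1, 4), (-1, x2p, 1, 10), (-1, x2p, 2, 0),
                    (+1, x1, 3, 6), (-1, x1, 3, 8)])


if __name__ == "__main__":
    samples = [Fraction(v) for v in (3, -1, 4, 1, -5, 9, 2, -6)]
    print(output_eq15(samples) == output_eq18(samples))
```

Because the arithmetic is exact, agreement of the two outputs on arbitrary test data indicates that removing the negative shift leaves the filter unchanged.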

In the same year, Potkonjak and his co-workers [28] have described another common sub-expression based technique which finds the maximum number of coincidences between two signed digits. A new solution of the MCM problem is presented in Reference 77 that combines exhaustive search for multiple pattern identification with a steepest descent approach for pattern selection. Results have identified a significant reduction in either arithmetic operations or necessary hardware along with satisfactory runtimes.

Towards the elimination of common sub-expressions during the design of multiplier-less filters, the non-recursive signed common sub-expression elimination (NR-SCSE) algorithm has been proposed in Reference 36, and several of its applications have been discussed. The limitation resulting from the recursive utilization of a common sub-expression is the high logic depth of the resulting digital circuit. This has been solved in Reference 36 by using each sub-expression only once. This new array splitting algorithm combines the advantages of previous methods in the sense that it reduces the logic depth relative to the Hartley algorithm [29] and uses approximately the same number of logic operators as the Bull-Horrocks modified (BHM) algorithm [25]. It searches for the non-recursive signed common sub-expressions which must be eliminated from the original CSD array.

Heuristic common sub-expression elimination (CSE) and the coefficient quantization by successive approximation algorithm have been integrated in Reference 37 to precisely distribute a predefined addition budget to the quantized coefficients. An improved exploration algorithm with variable step-sizes has also been proposed to find an optimum scale factor that collectively settles the filter coefficients into the quantization space. Authors have claimed to reduce approximately 30% budgets for comparable filter responses. The improved scale factor exploration helps to find an identical or a better quantization result with significantly less run time irrespective of the application of CSE.

| <u>1</u> |          | 2 | 1 |   |   |                 |          | 2 |          |
|----------|----------|---|---|---|---|-----------------|----------|---|----------|
|          |          |   |   | 2 |   | <u>1</u>        | >        |   | <u>2</u> |
|          | <u>2</u> |   |   |   |   |                 |          |   |          |
|          |          |   |   |   | 1 |                 | <u>1</u> |   |          |

| <u>1</u> |   | 3 |  |          |   |          | 2 |          |
|----------|---|---|--|----------|---|----------|---|----------|
|          |   |   |  | <u>3</u> |   |          |   | <u>2</u> |
|          | 2 |   |  |          |   |          |   |          |
|          |   |   |  |          | 1 | <u>1</u> |   |          |

Fig. 2. Recursive use of the algorithm [29] over the array in Fig. 1.

As time progressed, researchers considered not only the SPT patterns in the coefficients but also the length of the critical path in the multiplier block. In this connection, Yao et al. proposed a novel CSE algorithm [78] for the synthesis of fixed-point FIR filters which trades off complexity against throughput rate. The number of adders synthesized by this method is comparable to that required by algorithms such as those in References 25, 28, 29 and 36. The authors have also claimed that their method can synthesize complicated higher-order FIR filters within a few seconds.

In the year 2005, Macleod and Dempster introduced a new CSE algorithm which searches over a bounded number of minimal signed digit (MSD) representations [79]. The proposed algorithm first finds all the possible MSD representations of each distinct coefficient value by utilizing the method described in Reference 80. The authors have established the superiority of their algorithm by comparing its performance with the existing algorithms in References 81–83. A genetic programming-based method for CSE in multiplierless digital filter realization has been introduced in Reference 84, which searches for the common factors in higher order digital filters with a few non-zero digits in their coefficients. The fitness measure for this optimization technique involves the number of common sub-expressions which reduce interconnections and latency. The authors have also established the efficiency of their approach by experiments on 1D and 2D filters.

In order to implement low-complexity parallel multiplier-less digital FIR filters using the concept of shift inclusive differential (SID) coefficients and CSE, a new computation reduction method has been proposed by Wang and Roy in Reference 33. The idea of SID coefficients has been reformulated by introducing a new graph representation and mapping the optimization problem into an equivalent problem of determining a directed minimum spanning tree (DMST) of a directed multi-graph which has subsequently been solved by an optimal graph theoretic algorithm. For further reduction of design complexity, a novel CSE method has been proposed which recursively eliminates 2-bit sub-expressions with a steepest descent approach for sub-expression selection. As far as the efficiency of the proposed method is concerned, up to 75% reduction has been achieved in terms of number of additions as compared to other multiplier-less architectures like References 28, 29, 76 and 77. In comparison with one contemporary CSE algorithm, Wang and Roy's algorithm [33] achieves an improvement up to 19%.
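The selection step of a 2-bit CSE scheme can be illustrated with a short sketch. This is a simplified illustration and not the algorithm of Reference 33: it assumes integer coefficients, considers only patterns inside a single coefficient (horizontal sub-expressions), and counts overlapping occurrences, so the counts are upper bounds on what can actually be shared. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def to_csd(n):
    """CSD digits of a non-negative integer, least significant digit first."""
    digits = []
    while n != 0:
        # An odd n gets digit +1 if n = 1 (mod 4) and -1 if n = 3 (mod 4).
        d = 0 if n % 2 == 0 else 2 - (n % 4)
        digits.append(d)
        n = (n - d) // 2
    return digits

def two_digit_patterns(coeffs):
    """Count every pair of non-zero CSD digits by (distance, relative sign)."""
    counts = Counter()
    for c in coeffs:
        nonzero = [(i, d) for i, d in enumerate(to_csd(abs(c))) if d != 0]
        for (i, di), (j, dj) in combinations(nonzero, 2):
            counts[(j - i, di * dj)] += 1
    return counts

# Hypothetical integer coefficient set used only for illustration.
coeffs = [5, 21, 45]
counts = two_digit_patterns(coeffs)

# Steepest-descent style choice: take the most frequent pattern first.
best_pattern, best_count = counts.most_common(1)[0]
print(best_pattern, best_count)
```

Here the pattern `(2, 1)` stands for a sub-expression of the form $x + x \gg 2$ up to an overall sign and shift. A full CSE pass would replace its non-overlapping occurrences by a new symbol and repeat the count on the reduced array.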

Towards the synthesis of low-complexity powers-of-two FIR filter, minimization of SPT terms has been considered as the optimization goal. This problem statement has been reformulated in Reference 35 to account for the sharable adders where the authors address the optimization of the reusability of the adders for two major types of common sub-expressions, together with the reduction of adders for spare SPT terms. By limiting the number of common SPT (CSPT) terms to be no more than that of the rounded CSD coefficient set, first stage of the algorithm freely allocates any CSD coefficient in the neighbourhood of the rounded coefficient set to enhance the occurrences of the two chosen common subexpressions while reducing the total number of spare SPT terms in the minimal CSD coefficient set. The unachievable normalized peak ripple magnitude (NPRM) in the first stage has been compensated in the second stage by an efficient word length dependent adaptive neighbourhood search method. The algorithm uses a common sub-expression-based hamming weight pyramid to locate low-cost candidate coefficients with preferential consideration of shared common sub-expressions. The performance of the algorithm was compared with a number of state-of-the-art multiplierless algorithms like References 18, 70, 71, 85, 86 and 87. Experimental results have demonstrated that this method is capable of synthesizing FIR filters with least CSPT terms in comparison with previous approaches.

### 4. Approaches for the minimization of adders in hardware efficient FIR filter design

Design complexity resulting from the implementation of non-recursive digital filters in custom or semi-custom integrated circuits without any built-in multiplier is often measured in terms of the number of addition operations used to realize the multiplication operation. To reduce this complexity, circuit designers have long used the CSD representation [88].
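As a minimal sketch of why CSD lowers the adder count, the following code converts a non-negative integer coefficient to CSD and compares the number of adders needed by a plain shift-and-add multiplier in binary and in CSD form. It assumes that a coefficient with $k$ non-zero digits costs $k-1$ adders or subtractors; the function names are illustrative.

```python
def to_csd(n):
    """CSD digits of a non-negative integer, least significant digit first."""
    digits = []
    while n != 0:
        # An odd n gets digit +1 if n = 1 (mod 4) and -1 if n = 3 (mod 4),
        # which guarantees that the next digit is zero.
        d = 0 if n % 2 == 0 else 2 - (n % 4)
        digits.append(d)
        n = (n - d) // 2
    return digits

def csd_adders(n):
    """Adders/subtractors for a shift-and-add multiplier using CSD digits."""
    return max(sum(1 for d in to_csd(n) if d != 0) - 1, 0)

def binary_adders(n):
    """Adders for a shift-and-add multiplier using plain binary digits."""
    return max(bin(n).count("1") - 1, 0)

for c in (7, 23, 45):
    print(c, to_csd(c), binary_adders(c), csd_adders(c))
```

For example, 7 is `111` in binary (two adders) but $2^3 - 1$ in CSD (one subtractor). For 45 both forms have four non-zero digits, so CSD alone brings no saving and sharing across coefficients becomes the remaining source of reduction.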

The year 1995 was marked by significant progress in the field of circuits and systems, when many researchers came up with innovative ideas for reducing the adder cost in filter design. Some of them have shown that using multiplier blocks to exploit the redundancy across the coefficients results in a considerable reduction in complexity over the CSD representation, which in turn is less complex than the standard binary representation. Three such new algorithms have consequently been proposed in Reference 25: an efficient modification of an existing algorithm, one novel algorithm for better results, and a hybrid of these two which trades off performance against computational time. The authors have investigated the shortcomings of the popular BH algorithm, proposed by Bull and Horrocks [89], which used multiplier blocks for reducing the implementation cost of FIR filters. The BH algorithm [89] yields the same result as the original design built from several single-coefficient multipliers, but with fewer adders and subtractors, because it allows all products of the input sample to be produced simultaneously. The limitations and the corresponding solutions are described in Reference 25, which has significantly improved the results obtained.

As a part of their major contribution in reducing the adder cost, Dempster and Macleod have introduced an n-dimensional reduced adder graph (RAG-n) algorithm that is divided into two parts. The first section is optimal in the sense that it ensures minimum adder cost provided that the set of the coefficients had been completely synthesized by this part of the algorithm. The second part of the algorithm is heuristic which uses two look-up-tables generated by the MAG algorithm [81], covering a range from 1 to 4096. For each coefficient value in the prescribed range, the cost look-up-table contains the optimum single-coefficient costs of multiplication and fundamental look-up-table contains different sets of fundamentals which can be used to implement the multiplication at optimal cost. It has been well established that for small set sizes, BH [88] and modified BH (BHM) [81] algorithms are significantly faster than the RAG-n algorithm.

As far as their contribution to the field is concerned, it has been demonstrated that the heuristic RAG-n multi-coefficient multiplier block design algorithm results in an average improvement of about 20% over the popularly known BH algorithm for five coefficients of 12-bit word length. The BHM algorithm, which is identified as less efficient than RAG-n because of its higher graph cost, still yields a 10% improvement over the BH algorithm of 12-bit word length and the hybrid algorithm. RAG-n is slower than BHM for small coefficient sets but quicker for large sets, since the computation time of BH and BHM grows with the square of the set size whereas that of RAG-n grows linearly.
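The benefit of a multiplier block over independent CSD multipliers can be illustrated with a toy adder-graph search. The sketch below is not RAG-n, BH or BHM: it assumes odd positive integer targets, only tests whether a target can be built with one adder from already available fundamentals as $|2^s a \pm b|$, and gives up on targets that would need intermediate fundamentals (the case the heuristic parts of the published algorithms address). All names are hypothetical.

```python
def csd_nonzero(n):
    """Number of non-zero digits in the CSD form of a positive integer."""
    count = 0
    while n:
        d = 0 if n % 2 == 0 else 2 - (n % 4)
        count += 1 if d else 0
        n = (n - d) // 2
    return count

def one_adder_from(target, fundamentals, max_shift=12):
    """True if target equals |(a << s) + b| or |(a << s) - b| for available a, b."""
    for a in fundamentals:
        for b in fundamentals:
            for s in range(1, max_shift + 1):
                if (a << s) + b == target or abs((a << s) - b) == target:
                    return True
    return False

def greedy_adder_graph(targets, max_shift=12):
    """Return (adders used, targets left unrealised) for a one-adder-per-step search."""
    fundamentals = {1}                       # the input itself is always available
    pending = {t for t in targets if t != 1}
    adders = 0
    progress = True
    while pending and progress:
        progress = False
        for t in sorted(pending):
            if one_adder_from(t, fundamentals, max_shift):
                fundamentals.add(t)          # t can now feed later targets
                pending.discard(t)
                adders += 1
                progress = True
    return adders, sorted(pending)

targets = [5, 45]                            # illustrative coefficient set
shared_cost, unrealised = greedy_adder_graph(targets)
separate_cost = sum(csd_nonzero(t) - 1 for t in targets)
print(shared_cost, unrealised, separate_cost)
```

In this example 5 is built as $2^2 + 1$ and 45 is then obtained from it as $2^3 \cdot 5 + 5$, so the block needs two adders, whereas two independent CSD multipliers would need four.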

In the same year, a method for reducing the complexity of fixed-point multipliers with fixed or programmable multiplicands was presented by Li [90], which has drawn enormous attention in the relevant field. The approach finds the minimum number of adders needed to implement a multiplier for a given multiplicand. Before this article, CSD expressions were normally used for multiplicands; the proposed minimum number of shift-add operations (MNSAO) representation challenged this practice as far as the number of adders in the structure is concerned.

In comparison with CSD expressions using no more than the same number of shift-add operations (SAOs), the MNSAO expression significantly increases the largest representable contiguous range and the number of representable integers in a given range, and thus reduces the mean approximation error. It has therefore been applied to the design of multiplier-less digital filters subject to a pre-specified implementation cost determined by the total number of adders in the entire filter. It has been shown that the filters designed in Reference 70 are significantly superior to those designed by MILP [91] and simulated annealing (SA) [92], which restrict the number of SPT terms per coefficient to no more than two. Li et al. have shown that the designed filter can achieve up to 4.2 dB smaller normalized peak ripple (NPR) than the technique in Reference 70, subject to the same number of adders for the entire filter.

Another promising article [76] demonstrates the use of optimizing transformations to reduce the number of additions and subtractions for a given set of filter coefficient values and coefficient representation schemes. For a direct form FIR filter structure, the number of additions has been minimized by eliminating the common sub-expressions in the binary representation of the coefficients. Reduction of the adders in the transposed form, i.e. the MCM-based form of the FIR filter, has also been addressed by the authors through a modification of their algorithm. It has been demonstrated that, through the incorporation of the CSE algorithm, the total number of addition and subtraction operations has been reduced by as much as 35% for the direct structure and 38% for the transposed architecture. In effect, the total number of addition and subtraction operations has been reduced by an average factor of 2.2 in comparison with 1.43, as achieved in Reference 93.

Pearson and Parhi [94] had introduced a novel approach towards the design of low power FIR filter by means of parallel or block processing with duplication of hardware. They have achieved considerable reduction in multiplier element at the cost of doubling the number of adder elements. However, the reduced multiplier implementation yields lower hardware cost and less power consumption by virtue of the fact that the area required to implement a multiplier element is significantly larger than that of the adder element. In continuation to this, an adjacent coefficient sharing based sub-structure sharing technique along with maximum absolute difference quantization process has been introduced in References 95 and 96 and has subsequently been employed to reduce the hardware cost of parallel FIR filters. Based on the given examples, authors had shown that their proposition results in 45% reduction in hardware cost as compared to traditional parallel filtering methods.

Reduction of the total number of adders for synthesizing multiplier-less FIR filters has been achieved through a number of favourable approaches amongst which the systematic algorithm as proposed by Kaakinen and Saramaki [85] finds its suitable place in the literature. During the optimization procedure, one linear programming algorithm has been initially used for determining the parameter space of the infinite-precision coefficients as well as the feasible space where the filter meets the given amplitude specifications. The second step locates the filter parameters in this space such that the resulting filter satisfies the criterion with the simplest coefficient representation form. The main advantage associated with the approach in comparison with other existing techniques is that it finds all the solutions which can satisfy the given magnitude specifications.

Although the complexity of multiplier blocks was significantly reduced by techniques like decomposing multiplication into simple shift and add operations and sharing common sub-expressions, reducing the delay of multiplier blocks remained unexplored until Kang and Park [97] presented new algorithms to minimize the complexity of multiplier blocks under given delay constraints. The authors have combined three proposed methods with the BHM [81] and RAG-n [25] algorithms to implement filters which satisfy a given specification on the number of adder-steps. A trade-off between delay and hardware complexity is enabled by changing the delay constraints. Experimental results have shown that the algorithm in Reference 97 can reduce the delay of multiplier blocks at the cost of a small increase in complexity.

It took several years before researchers aggressively reduced both the coefficient word length and the number of non-zero bits in the filter coefficients with the aim of minimizing the adder step [98]. The authors have modified the representation of the filter coefficients such that the number of full adders resulting from the hardware implementation is proportional only to the product of the signal word length and the number of adders. In effect, the number of full adders is entirely independent of the coefficient word length and of the number of shifts between the non-zero bits in the coefficient. This algorithm yields promising results for filters with up to 500 taps. In terms of the number of multiplier block (MB) adders and multiplier block full adders (FAs), the authors have demonstrated the superiority of their technique over some existing ones. More explicitly, while the algorithm proposed in Reference 99 achieves a 25% to 44% reduction in the number of MB adders, the reduction achieved in Reference 98 is as high as 67%. In terms of the number of FAs, the reduction is around 71% in Reference 98 as compared to 25% to 54% in Reference 99.

For a long time, RAG-n was considered to be probably the best algorithm for solving MCM problems. However, a new algorithm called HCUB has emerged as a way of improving on the results of RAG-n [27]. Both are adder graph algorithms divided into two stages: an optimal part and a heuristic part. The heuristic part can be viewed as adding extra coefficients to be realized such that the basic operation in the optimal part can continue. It is explicitly mentioned that the HCUB algorithm finds solutions that require up to 20% fewer additions and subtractions than the solutions found by the previously best known algorithms, RAG-n [25] and BHM [81].

An adder graph type algorithm for solving the MCM problem has been introduced in Reference 26 with a novel heuristic inspired by difference method class of algorithms. Unlike the previous algorithms, it does not rely on look-up tables for its execution. It has been shown that the proposed heuristic provides better or comparable results than RAG-n. Compared to HCUB, the algorithm is slightly better on average for most of the conditions.

During the optimization of the coefficients of a multiplier-less filter, common sub-expression sharing proves to be very fruitful; here the coefficient multipliers are represented as a multiplier block (MB) with shared shifters and adders. As far as the power consumption in MBs is concerned, not only the total number of adders but also the adder depth of every coefficient contributes significantly. A few years back, an MILP based technique [100] was employed to optimize the filter coefficients by minimizing the ripples in the frequency response of the filter subject to a constraint on the total number of adders and an allowable maximum adder depth. The authors have established the superiority of their algorithm by means of a design example which reveals that it generates filters using fewer adders with minimum adder depth than the approaches in References 25, 78 and 101.

Recently, truncated MCM using pattern modification technique (PMT) has been developed for FIR filter implementation [32]. This algorithm truncates every node adder in DAG generated by different MCM algorithms with a common principle of ensuring that every two inputs to the same node have the same weight. Superiority of PMT has been established by virtue of the fact that compared to non-truncated MCM algorithms, it reduces the area cost by 35% without increasing quantization error.

### 5. Coefficient representation schemes in multiplier-less filter design

Tap coefficients of a multiplier-less FIR filter are encoded in different forms so as to yield a hardware efficient architecture. Many of the approaches have tried to select common sub-expressions after representing the constants in CSD form. Although the CSD representation is effective for one constant, it is not the best for multiple constants because the CSD representation of a constant is unique and independent of the other constants. For multiple constant multiplications, it is more efficient to use the minimal signed digit (MSD) representation, which has the same number of non-zero digits as CSD but provides multiple representations for a constant [102,103]. An algorithm has been proposed in this regard [83] to find all MSD representations of a constant and to synthesize digital filters based on the MSD representation. It utilizes the redundancy of the MSD representation to create as many common sub-expressions as possible and thus leads to smaller filters. The superiority of the proposition has been established by implementing several filters and comparing the results with conventional ones obtained from the CSD representation.
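The redundancy of the MSD representation can be made concrete with a brute-force enumeration. The sketch below is only an illustration under the assumption of a small fixed digit width; it is not the algorithm of Reference 83, and its cost grows as $3^{\text{width}}$. The function name is hypothetical.

```python
from itertools import product

def msd_representations(n, width):
    """All signed-digit forms of n (most significant digit first) that use
    the minimum number of non-zero digits within the given width."""
    best_weight, reps = None, []
    for digits in product((-1, 0, 1), repeat=width):
        value = 0
        for d in digits:                 # Horner evaluation in radix 2
            value = 2 * value + d
        if value != n:
            continue
        weight = sum(1 for d in digits if d != 0)
        if best_weight is None or weight < best_weight:
            best_weight, reps = weight, [digits]
        elif weight == best_weight:
            reps.append(digits)
    return reps

# The constant 3 has two minimal forms: 2 + 1 and 4 - 1 (the latter is its CSD form).
print(msd_representations(3, 3))
```

A CSE search that may choose between such alternatives for every coefficient has more opportunities to align patterns across coefficients than one restricted to the single CSD form.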

CSE technique decomposes all the constants in terms of several common bases. With a vision to optimize the storage of filter coefficients, this algorithm effectively extracts the commonly occurring sub-expressions. However, because of its highly irregular structure, filter model using CSE is hard to pipeline [104]. This has seriously drawn the attention of several researchers towards the low power, high speed realization of FIR filters. Sankarayya and his co-workers have been considered to be pioneers in this regard when they had proposed a new algorithm [30] for efficient representation of FIR filter coefficients. Instead of the direct coefficients, this algorithm uses various orders of differences between the coefficients along with the stored pre-computed results to compute the convolution sum and accordingly has been termed as differential coefficients method (DCM) in literature. As differential coefficients have shorter word length than the original, it can reduce the number of ones required to represent the coefficients and hence reduce the power consumption. An N-tap FIR filter with coefficient  $h_k$ , input sequence  $x_j$  and output sequence  $y_i$ , can be expressed as:

$$y_j = \sum_{k=0}^{N-1} h_k \cdot x_{j-k} \quad \forall j$$
(19)

DCM technique, on the other hand, first computes the partial product with differential coefficients and then computes the sum of the stored partial product of previous computation to obtain the result corresponding to the original coefficient set. Two consecutive outputs of the filter may readily be obtained by expanding equation (19) as:

$$y_j = h_0 \cdot x_j + h_1 \cdot x_{j-1} + h_2 \cdot x_{j-2} + \dots + h_{N-1} \cdot x_{j-N+1}$$
(20)

$$y_{j+1} = h_0 \cdot x_{j+1} + h_1 \cdot x_j + h_2 \cdot x_{j-1} + \dots + h_{N-1} \cdot x_{j-N+2}$$
(21)

The term  $y_{j+1}$  may be written in terms of first order difference DCM as:

$$y_{j+1} = h_0 \cdot x_{j+1} + \left(dh_1^1 \cdot x_j + h_0 \cdot x_j\right) + \dots + \left(dh_{N-1}^1 \cdot x_{j-N+2} + h_{N-2} \cdot x_{j-N+2}\right)$$
(22)

The variable $dh_k^1 = h_k - h_{k-1}$, $\forall k = 1, 2, ..., N-1$, is termed the first order difference between the adjacent coefficients $h_k$ and $h_{k-1}$, and terms like $h_0 \cdot x_j$ and $h_{N-2} \cdot x_{j-N+2}$ are the compensating terms. As can be inspected from equation (22), DCM suffers from overheads since it needs extra adders to compute the sums of stored partial products of previous computations in order to compensate for the effect of differential coefficients [27]. Apart from differential coefficients, differential inputs have also been considered in an algorithm termed the differential coefficients and input method (DCIM) [105]. For three consecutive outputs $y_{j-1}$, $y_j$ and $y_{j+1}$, the first order differences may be defined as follows:

$$y_{j}^{1} = y_{j} - y_{j-1} = h_{0} \cdot (x_{j} - x_{j-1}) + h_{1} \cdot (x_{j-1} - x_{j-2}) + \dots + h_{N-1} \cdot (x_{j-N+1} - x_{j-N})$$
(23)

$$y_{j+1}^{1} = y_{j+1} - y_{j} = h_{0} \cdot (x_{j+1} - x_{j}) + h_{1} \cdot (x_{j} - x_{j-1}) + \dots + h_{N-1} \cdot (x_{j-N+2} - x_{j-N+1})$$
(24)

The sum of the first (N-1) partial products of $y_j^1$ may be defined as [105]:

$$y_{j,(N-1)} = h_0 \cdot (x_j - x_{j-1}) + h_1 \cdot (x_{j-1} - x_{j-2}) + \dots + h_{N-2} \cdot (x_{j-N+2} - x_{j-N+1})$$
(25)

Now,

$$\begin{aligned}
y_{j+1}^{1} &= h_{0} \cdot (x_{j+1} - x_{j}) + \{(h_{1} - h_{0}) \cdot (x_{j} - x_{j-1}) + h_{0} \cdot (x_{j} - x_{j-1})\} + \cdots + \{(h_{N-1} - h_{N-2}) \cdot (x_{j-N+2} - x_{j-N+1}) + h_{N-2} \cdot (x_{j-N+2} - x_{j-N+1})\} \\
&= h_{0} \cdot (x_{j+1} - x_{j}) + (h_{1} - h_{0}) \cdot (x_{j} - x_{j-1}) + \cdots + (h_{N-1} - h_{N-2}) \cdot (x_{j-N+2} - x_{j-N+1}) + \{h_{0} \cdot (x_{j} - x_{j-1}) + h_{1} \cdot (x_{j-1} - x_{j-2}) + \cdots + h_{N-2} \cdot (x_{j-N+2} - x_{j-N+1})\} \\
&= h_{0} \cdot (x_{j+1} - x_{j}) + (h_{1} - h_{0}) \cdot (x_{j} - x_{j-1}) + \cdots + (h_{N-1} - h_{N-2}) \cdot (x_{j-N+2} - x_{j-N+1}) + y_{j,(N-1)}
\end{aligned} \tag{26}$$

and

$$\begin{aligned}
y_{j+1} &= y_{j+1}^{1} + y_{j} \\
&= h_{0} \cdot (x_{j+1} - x_{j}) + (h_{1} - h_{0}) \cdot (x_{j} - x_{j-1}) + \cdots + (h_{N-1} - h_{N-2}) \cdot (x_{j-N+2} - x_{j-N+1}) + y_{j,(N-1)} + y_{j}
\end{aligned} \tag{27}$$

As can be inspected from equation (27), except for the first term, the remaining (N-1) partial products are multiplications between differential coefficients and differential inputs, and therefore a shorter multiplier than that in DCM may be used in DCIM, which stores the sum of the compensating terms in $y_{j,(N-1)}$. This has the consequence of avoiding (N-2) unnecessary memory accesses and additions. For each output $y_j$, however, two extra storage and addition operations are required. Since the basic technique used in DCIM is the same as that of DCM, their overheads are also common. In addition, DCIM suffers from input propagation delay since the difference cannot be derived prior to the arrival of the input.
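The two recurrences can be checked against the direct convolution of equation (19). The following is a minimal sketch under stated assumptions: integer coefficients and inputs chosen arbitrarily, zero input before the first sample, and function names that are illustrative only. The DCM routine keeps the partial products of the previous output as in equation (22); the DCIM routine simply evaluates equations (25) to (27) and does not model the storage scheme of Reference 105.

```python
def x_at(x, i):
    """Input sample x_i, taken as zero outside the recorded range."""
    return x[i] if 0 <= i < len(x) else 0

def direct_filter(h, x):
    """Equation (19): y_j = sum_k h_k * x_{j-k}."""
    return [sum(h[k] * x_at(x, j - k) for k in range(len(h)))
            for j in range(len(x))]

def dcm_filter(h, x):
    """Equation (22): differential coefficients plus stored partial products."""
    n_taps = len(h)
    dh = [h[k] - h[k - 1] for k in range(1, n_taps)]   # dh[k-1] = h_k - h_{k-1}
    stored = [0] * n_taps            # stored[k] = h_k * x_{j-1-k} from previous output
    out = []
    for j in range(len(x)):
        new = [h[0] * x[j]]
        for k in range(1, n_taps):
            # h_k * x_{j-k} = dh_k * x_{j-k} + (stored h_{k-1} * x_{j-k})
            new.append(dh[k - 1] * x_at(x, j - k) + stored[k - 1])
        stored = new
        out.append(sum(new))
    return out

def dcim_filter(h, x):
    """Equations (25)-(27): differential coefficients and differential inputs."""
    n_taps = len(h)
    dx = lambda i: x_at(x, i) - x_at(x, i - 1)
    out, y_prev = [], 0
    for j in range(-1, len(x) - 1):                    # each pass produces y_{j+1}
        comp = sum(h[k] * dx(j - k) for k in range(n_taps - 1))        # eq. (25)
        y_diff = (h[0] * dx(j + 1)
                  + sum((h[k] - h[k - 1]) * dx(j + 1 - k)
                        for k in range(1, n_taps))
                  + comp)                                              # eq. (26)
        y_prev = y_diff + y_prev                                       # eq. (27)
        out.append(y_prev)
    return out

h = [3, 5, 4, -2]                 # illustrative coefficients
x = [1, 4, -2, 7, 0, 3]           # illustrative input samples
print(direct_filter(h, x) == dcm_filter(h, x) == dcim_filter(h, x))
```

The multiplications in `dcm_filter` involve only $h_0$ and the differences $dh_k^1$, which is where the word length saving arises when adjacent coefficients are close in value.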

Both the DCM and DCIM methods calculate the difference between adjacent tap coefficients in order to minimize the resulting hardware cost. This approach may not always lead to a shorter word length of the difference signals when adjacent coefficients differ by a large margin. This issue has been studied in the recent past by a few researchers [39,40], who calculated the difference between those coefficients having the least difference between their magnitudes and subsequently used these minimal difference values to encode the differential coefficients. The use of minimal difference coefficients reduces the effective word length and in turn minimizes the number of full adders and the net memory. This approach, known as the minimal difference differential coefficients method (MDDCM) [40], first sorts the coefficients such that adjacent coefficients have minimal differences in their magnitudes before computing the difference representation.
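The effect of sorting before differencing can be seen in a small numerical sketch. It assumes integer-scaled coefficients, measures word length by the bit length of each difference, and treats the signs and the original tap positions as side information that is stored separately; these choices and all names are illustrative rather than taken from References 39 and 40.

```python
def total_bits(values):
    """Sum of the magnitudes' bit lengths, a crude proxy for word length."""
    return sum(abs(v).bit_length() for v in values)

def adjacent_differences(h):
    """DCM-style differences between neighbouring taps."""
    return [h[k] - h[k - 1] for k in range(1, len(h))]

def sorted_differences(h):
    """MDDCM-style differences after sorting the taps by magnitude.
    Returns the tap order (needed to undo the sort) and the differences."""
    order = sorted(range(len(h)), key=lambda k: abs(h[k]))
    mags = [abs(h[k]) for k in order]
    return order, [mags[i] - mags[i - 1] for i in range(1, len(mags))]

h = [2, 97, 5, 100, 3, 99]        # illustrative taps with large adjacent jumps
adj = adjacent_differences(h)
order, srt = sorted_differences(h)
print(adj, total_bits(adj))
print(order, srt, total_bits(srt))
```

With this coefficient set the adjacent differences all need seven bits, whereas after sorting only one difference remains large.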

Almost all the algorithms available in the literature for designing multiplier-less filters have primarily focused on minimizing the total number of full adders in realizing the filter. There are a few reported articles which have represented the powers-of-two tap coefficients in such a way that the full adder count can be reduced. The pseudo floating point (PFP) representation scheme is one such approach which has drawn considerable attention amongst researchers. Any arbitrary coefficient $h_i$ of word length B, represented in CSD form as $h_i = \sum_{j=0}^{B-1} s_j \cdot 2^{-a_{ij}}$, has the PFP representation [41]:

$$h_i = 2^{-a_{i0}} \sum_{j=0}^{B-1} s_j \cdot 2^{-(a_{ij}-a_{i0})} = 2^{-a_{i0}} \sum_{j=0}^{B-1} s_j \cdot 2^{-c_{ij}}$$
(28)

where  $s_j \in \{1, 0, -1\}$  and  $c_{ij} = a_{ij} - a_{i0}$ . The term  $a_{i0}$  is known as the 'shift' and the maximum of  $c_{ij}$ , i.e.  $(a_{i(B-1)} - a_{i0})$  is termed as the 'span' part. As can be inspected from equation (28), PFP representation makes it possible to express any B-bit CSD coefficient as a (shift, span) pair using fewer bits.
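Equation (28) can be illustrated with a short sketch that factors the leading power of two out of a coefficient. It assumes the coefficient is given as a list of its non-zero terms $(s_j, a_{ij})$ only, and uses exact rational arithmetic to confirm that the value is unchanged; the function names are hypothetical.

```python
from fractions import Fraction

def to_pfp(terms):
    """terms: list of (s_j, a_ij) with h_i = sum of s_j * 2**(-a_ij).
    Returns (shift, relative terms (s_j, c_ij), span)."""
    terms = sorted(terms, key=lambda t: t[1])     # most significant term first
    shift = terms[0][1]                           # a_i0
    relative = [(s, a - shift) for s, a in terms] # c_ij = a_ij - a_i0
    span = relative[-1][1]                        # largest c_ij
    return shift, relative, span

def value_from_terms(terms):
    return sum(s * Fraction(1, 2 ** a) for s, a in terms)

def value_from_pfp(shift, relative):
    return Fraction(1, 2 ** shift) * value_from_terms(relative)

# Illustrative coefficient: h = 2^-3 - 2^-5 + 2^-8
terms = [(1, 3), (-1, 5), (1, 8)]
shift, relative, span = to_pfp(terms)
print(shift, relative, span)
print(value_from_terms(terms) == value_from_pfp(shift, relative))
```

Only the span part has to pass through the adders; the shift part is applied once to the result, which is the property the coefficient partitioning scheme exploits further.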

Coefficient partitioning [41] is another well developed algorithm which has been effective in reducing the range of the span part of PFP by partitioning it into two sub-components. This method divides the entire span part into two sub-components of length $\frac{M}{2}$ for even M (or two sub-components of length $\left\lceil \frac{M}{2} \right\rceil$ and $\left\lfloor \frac{M}{2} \right\rfloor$ for odd M), where M represents the span of the PFP representation. The latter sub-component is further scaled by its order to reduce its span. As a result, the partitioned and scaled version of the PFP coefficients can be added with fewer full adders. Moreover, attempts have been made to examine the adder complexity reduction achieved by partitioning the coefficients into more than two sub-components. The authors have observed that the widths of the adders in the intermediate stages of the multiplier are then larger, which calls for more full adders. On the other hand, when the coefficient is partitioned into two sub-components, only one inner shift operation exists and the widths of the adders in the preceding stages are smaller, while the final stage adder requires the highest width. Therefore partitioning a coefficient into two halves offers a better reduction of full adders than partitioning into multiple parts.

Limitations of PFP scheme have very recently been pointed out in Reference 106 by the introduction of minimum index floating point (MIFP) representation for the powers-of-two coefficients of FIR filter. Computational cost of MIFP scheme has been measured with respect to various performance metrics like number of one bit full adders, number of one bit shifters and total delay count. Superiority of the scheme has subsequently been established in terms of those parameters. Resultant coefficient representation under MIFP scheme may be outlined as [106]:

$$h_{i} = 2^{-\mu_{i}} \sum_{j=1}^{\mathbb{N}_{i}} s_{j} \cdot 2^{-a_{i}^{j}} \quad \text{where } \mu_{i} = a_{i}^{1} + \frac{\left(a_{i}^{\mathbb{N}_{i}} - a_{i}^{1}\right)}{2}$$
(29)

The term $\mu_i$ in the above equation identifies the overall shift applied to the terms inside the span part and hence it is known as the 'shift' part in MIFP. The variable $a_i^j$ in equation (29) implies the index of a non-zero term relative to the position of the term $\mu_i$ and hence it may assume both positive and negative integer values including zero depending upon its position. The collection of powers-of-two terms with positive (including zero) $a_i^j$ constitutes the 'left span' and that with negative $a_i^j$ constitutes the 'right span' part in the MIFP scheme.

In order to reduce the power consumption of FIR filter, a novel coefficient ordering algorithm has been described in Reference 107 where the implementations are based on processing the coefficients in a non-conventional order using both direct form (DF) and transposed form (TF) FIR filters. An overall power reduction of up to 34% with up to 56% area overhead for TF structure is reported as compared to conventional filter implementation. However, DF structure results in 19% power reduction without incurring any area overhead.

A new hardware efficient reconfigurable FIR filter architecture has been recently proposed in Reference 38 where filter coefficients have been partitioned into smaller sub-coefficients based on novel binary signed sub-coefficients. Partial products of all possible sub-coefficients and input data have been calculated in precomputer block and results are distributed on filter taps to compose the coefficient multiplication.

### 6. Intelligent optimization techniques in the field of hardware efficient FIR filter design

Efficient design of multiplier-less powers-of-two FIR filters has already been addressed as an optimization problem by several researchers. A number of mathematical optimization algorithms like MILP [18] and SDP [21] have been employed for solving the problem. The last decade of the twentieth century is considered to have had a significant impact on the field of signal processing because of its resourceful amalgamation with artificial intelligence. In this connection, Benvenuto and his co-researchers [92] have presented a simulated annealing (SA) algorithm for the design of linear phase powers-of-two digital filters. To reduce computational complexity, new features have also been added with respect to traditional SA algorithms. As an attempt to combat the large computation time of SA, the entropy directed deterministic annealing (EDDA) optimization algorithm [108] has been presented for the design of digital filters with discrete coefficients. It utilizes estimates of conditional entropy to prune the problem during the optimization and thereby reduces the computational time by 30% to 50%. The concept of SA has recently been applied to the sum of powers-of-two optimization problem by minimizing the total number of non-zero digits of the FIR coefficients [109]. Apart from dealing with classical filter specifications like in-band ripple and stop-band rejection, it has also considered additional uncommon shape constraints, even in the transition band.

Moreover, quite a few evolutionary and swarm optimization techniques have proven their competency in substituting many of the traditional optimization mechanisms, which occasionally fail to perform suitably in engineering problems. Design of multiplier-less FIR filters has also been strongly influenced by the application of evolutionary optimization techniques, amongst which the genetic algorithm (GA) is the most common. In connection to this, Cemes and Ait-Boudaoud initiated the GA-based power-of-two FIR filter design problem by using simple genetic operators like reproduction, cross-over and mutation to search the discrete coefficient space of predefined powers-of-two coefficients [43,110]. Their approach has outperformed traditional techniques that restrict their coefficients to single power-of-two terms. Two years later, Gentili and his co-workers [45] shed further light on the same problem by adopting a specific filter coefficient coding scheme. The authors have claimed that their proposed approach is capable of attaining better or almost comparable results than other methods of interest like MILP [18], simulated annealing (SA) [92], the Parks-McClellan method [111], the proportional relation preserve (PRP) method [112] and so on. Because of its implicit parallel nature, a GA-based approach can explore many possible solutions at each generation and hence can be easily implemented on a parallel machine. Design of high-speed low-power FIR filters has also been facilitated by GA in Reference 113, where the required goal has been achieved by factorizing a long filter into several cascaded subfilters, each with coefficient values constrained to sums of SPT terms. GA has made it possible to implement filters in the signed powers-of-two space with near global minimum and low hardware cost. Very recently, a novel GA has been proposed for the design of multiplier-less linear phase FIR filters both in single stage and cascade forms [8]. The discrete search space is partitioned into smaller ones based on pass-band gains, and the search efficiency has been improved by adjusting the cross-over and mutation rates in an adaptive way. Unlike the conventional GA, the algorithm in Reference 8 uses the adder cost of the filter as the objective function, and penalties are applied when ripple requirements are not met. The proposition proves to be superior to [68,114] in terms of design time, and hardware cost is saved in most of the cases.
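An objective of the kind described above, adder cost plus penalties for violated ripple limits, can be sketched as follows. This is only an illustrative sketch and not the algorithm of Reference 8: the function names, the penalty weight, the coarse frequency grid and the simple adder count (one adder per extra nonzero signed digit and one structural adder per extra nonzero tap, with no sub-expression sharing) are all assumptions.

```python
import math

def csd_nonzero_digits(value):
    """Count the nonzero digits in the canonic signed digit form of an integer."""
    value = abs(value)
    count = 0
    while value:
        if value & 1:
            digit = 2 - (value & 3)  # +1 if value % 4 == 1, otherwise -1
            value -= digit
            count += 1
        value >>= 1
    return count

def adder_cost(coeffs):
    """Assumed cost model: adders of a direct-form filter without any sharing."""
    nonzero = [c for c in coeffs if c != 0]
    multiplier_adders = sum(csd_nonzero_digits(c) - 1 for c in nonzero)
    structural_adders = max(len(nonzero) - 1, 0)
    return multiplier_adders + structural_adders

def amplitude(coeffs, scale, omega):
    """Magnitude response of the integer-coefficient filter divided by scale."""
    re = sum(c * math.cos(omega * n) for n, c in enumerate(coeffs))
    im = sum(c * math.sin(omega * n) for n, c in enumerate(coeffs))
    return math.hypot(re, im) / scale

def fitness(coeffs, scale, passband_edge, stopband_edge,
            delta_p, delta_s, penalty=1000.0, grid=64):
    """Adder cost plus a penalty proportional to every ripple violation."""
    cost = float(adder_cost(coeffs))
    for k in range(grid + 1):
        omega = math.pi * k / grid
        mag = amplitude(coeffs, scale, omega)
        if omega <= passband_edge:
            violation = max(abs(mag - 1.0) - delta_p, 0.0)
        elif omega >= stopband_edge:
            violation = max(mag - delta_s, 0.0)
        else:
            violation = 0.0  # transition band is left unconstrained here
        cost += penalty * violation
    return cost

# Hypothetical 3-tap example: h = [1, 2, 1] / 4
loose = fitness([1, 2, 1], 4, 0.1 * math.pi, 0.9 * math.pi, 0.1, 0.1)
tight = fitness([1, 2, 1], 4, 0.1 * math.pi, 0.9 * math.pi, 0.1, 0.001)
```

With the loose specification the value reduces to the bare adder cost, whereas the tight stop-band limit adds a penalty, so a GA minimizing this function is steered towards low-cost filters that still meet the ripple requirements.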

Optimization of FIR filters over the CSD coefficient space based on GA has been developed in Reference 115. The proposed optimization technique exploits the restoration of CSD numbers in conjunction with the conventional cross-over and mutation operators, in addition to a new local mutation operator. Application of GA for optimizing filters generated by the FRM technique has been presented in Reference 116. It has been demonstrated that GA is capable of producing a better discrete coefficient solution than that obtained from the linear optimization technique, and one that is very close to the continuous solution obtained from the non-linear optimization technique. Another novel genetic algorithm for the design and discrete optimization of FRM FIR digital filters over the conventional CSD as well as the new double base number system (DBNS) multiplier coefficient spaces has been introduced in Reference 117. The proposed genetic algorithm is based on a pair of indexed look-up tables of permissible CSD/DBNS numbers whose indices form a closed set under the operations of cross-over and mutation. It therefore automatically leads to legitimate CSD/DBNS coefficients without any recourse to gene repair during optimization. Finally, it has been successfully applied to the design of a pair of low-pass and band-pass FRM FIR digital filters. Through proper design examples, the authors have established that the resulting optimized CSD/DBNS filters outperformed the corresponding infinite precision FRM FIR digital filters in some cases [117].
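The idea of evolving table indices rather than raw digits can be illustrated with a small sketch. It is not the implementation of Reference 117: the table construction, the clamped index mutation, the one-point cross-over and all function names are assumptions chosen only to show why no gene repair is needed when every chromosome entry is an index into a table of permissible CSD values.

```python
import random

def to_csd(value):
    """CSD digits of an integer, least significant first, each in {-1, 0, 1}."""
    negative = value < 0
    value = abs(value)
    digits = []
    while value:
        if value & 1:
            digit = 2 - (value & 3)  # +1 if value % 4 == 1, otherwise -1
            value -= digit
        else:
            digit = 0
        digits.append(digit)
        value >>= 1
    return [-d for d in digits] if negative else digits

def build_csd_table(word_length, max_nonzero):
    """Sorted integers whose CSD form fits the word length and nonzero budget."""
    limit = 2 ** word_length
    table = []
    for v in range(-limit, limit + 1):
        digits = to_csd(v)
        if len(digits) <= word_length and sum(1 for d in digits if d) <= max_nonzero:
            table.append(v)
    return table

def mutate_index(index, table_size, rng, step=2):
    """Perturb an index and clamp it, so the result is always a valid index."""
    return min(max(index + rng.randint(-step, step), 0), table_size - 1)

def crossover(parent_a, parent_b, rng):
    """One-point cross-over on index chromosomes; children reuse parent indices."""
    point = rng.randint(1, len(parent_a) - 1)
    return parent_a[:point] + parent_b[point:], parent_b[:point] + parent_a[point:]

rng = random.Random(0)
table = build_csd_table(word_length=6, max_nonzero=2)
parent_a = [rng.randrange(len(table)) for _ in range(5)]
parent_b = [rng.randrange(len(table)) for _ in range(5)]
child_a, child_b = crossover(parent_a, parent_b, rng)
child_a = [mutate_index(i, len(table), rng) for i in child_a]
coefficients = [table[i] for i in child_a]  # decoded values are always in the table
```

Because both operators map valid indices to valid indices, every decoded coefficient satisfies the word-length and nonzero-digit limits by construction.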

Although conventional GA (CGA) has proven itself to be a potential search tool for the design of multiplier-less FIR filters, it requires a comparatively large computational time, since the repeated evaluation of a large population of candidate solutions is relatively slow. This issue has later been addressed by Cen and Lian [46], who incorporated a variant of GA known as micro GA (µGA) in the same design problem. The µGA-based algorithm requires only a small population for its execution, which makes the convergence of µGA relatively faster than that of CGA. However, µGA may become trapped in a local optimum because of the small population. This has been addressed through a modification of µGA that varies the probabilities of cross-over and mutation during the evolution, consequently termed modified µGA [47]. The authors have claimed that, compared to CGA, modified µGA speeds up the optimization process significantly. This has been substantiated by experimental analysis in the sense that modified µGA is about seven times faster than CGA and yields a better solution than the MILP-based design. A new variant of GA, called orthogonal genetic algorithm (OGA), has been incorporated in the design of cascade form multiplier-less FIR filters [48], exploring two objective functions based on a single and a multiple amplitude response criterion. The authors have claimed that the OGA approach leads to improved amplitude response relative to that of an equivalent direct-form cascade filter obtained using the Remez exchange algorithm.

Traferro et al. [42] have added a global constraint which fixes the total number of shift registers in such a way that each coefficient can be represented using different precisions. Optimization of FIR filter coefficient has been solved by a specific tabu search (TS) method which is computationally lighter than other heuristics like SA and GA. Supremacy of the design algorithm has been substantiated by several experimental results and comparisons with previously reported works like References 112 and 118. A hybrid genetic algorithm (GST), composed of the main features of adaptive GA (AGA), simulated annealing (SA) and tabu search (TS), has been introduced in Reference 119 towards the design of powers-of-two FIR filter. AGA with varying population size and varying probabilities of genetic operations works as the basis of the hybrid algorithm. Use of SA is to help AGA escape from the local optima and prevent premature convergence. The concept of tabu has been introduced to speed up convergence by reducing search space according to the properties of FIR filter coefficients. It has been established by means of design examples that the normalized peak ripples of the designed filters can largely be reduced with the help of GST. Unlike the other GA, the method of GST improves the solution quality and reduces the computational effort as well.

Powers-of-two design of FIR filter has been recently achieved with the aid of some evolutionary computational algorithms which have outperformed GA along with its different variants in many benchmark problems. In regard to this, differential evolution (DE) algorithm was used to design multiplier-less FIR filter with powers-of-two coefficients [49]. Impact of different mutation strategies of DE in the design process has subsequently been studied in References 120–122 and a new self-adaptive DE algorithm has also been proposed for the design purpose [123]. The same problem has later been targeted by means of self-organizing random immigrants genetic algorithm (SORIGA) [49,50] and its supremacy over the previous design strategies had been established.

Design of a CSD based FRM filter with reduced computational complexity has been accomplished by means of swarm optimization technique like artificial bee colony (ABC) algorithm [53]. Reduced computational complexity has been achieved due to fewer generations for convergence as well as the reduced dimension of the food source along with its appropriate initial selection. Moreover, quality of the solution has been ensured through efficient exploration and exploitation of the search space in the modified ABC algorithm. Design of non-uniform filter bank trans-multiplexer has been achieved in Reference 54 where the filter coefficients are synthesized in the CSD format and ABC algorithm has been employed for the purpose of optimization. Simulation result has established that the performance of the proposed algorithm is better than that obtained by rounding the continuous coefficients of the filter to the nearest CSD number.

### 7. Design strategies of two-dimensional multiplier-less FIR filter

Design of two-dimensional multiplier-less filters has also gained serious attention from researchers over the last few decades. Considerable progress has taken place in this field since 1987, when Pei and Jaw [124] took a pioneering initiative for the design of 2D multiplier-less digital FIR filters using a special class of multiplier-less 1D filters with coefficients as sums or differences of powers-of-two. The authors incorporated the McClellan transformation to map the one-dimensional filter into a two-dimensional one. As far as hardware implementation is concerned, these filters are very attractive, efficient and reliable for high speed computation. However, the structure proposed by Pei is valid only for the original first order McClellan transformation. In connection to this, a new analytical approach for the determination of the coefficients of the first order McClellan transformation has been presented by Kwan and Chan [125]. On comparing the results with those of the original first order McClellan transformation, the authors have established the improvement resulting from their analytical approach.
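The mapping from a 1D to a 2D response can be sketched with the original first order McClellan transformation as it is commonly stated, cos ω = ½(cos ω₁ + cos ω₂ + cos ω₁ cos ω₂ − 1). The sketch below is an illustration under assumptions and not the structure of Reference 124: the 1D prototype is taken as a zero-phase cosine series, and the coefficient list `a` and the function names are hypothetical.

```python
import math

def mcclellan(w1, w2):
    """Original first order McClellan transformation; returns the value of cos(omega)."""
    return 0.5 * (math.cos(w1) + math.cos(w2) + math.cos(w1) * math.cos(w2) - 1.0)

def zero_phase_1d(a, x):
    """Evaluate sum_n a[n] * T_n(x), x = cos(omega), via the Chebyshev recurrence."""
    t_prev, t_curr = 1.0, x
    total = a[0]
    for n in range(1, len(a)):
        total += a[n] * t_curr
        t_prev, t_curr = t_curr, 2.0 * x * t_curr - t_prev
    return total

def response_2d(a, w1, w2):
    """2D zero-phase response obtained by substituting the transformation."""
    return zero_phase_1d(a, mcclellan(w1, w2))

# Hypothetical 1D prototype H(omega) = 0.5 + 0.5*cos(omega), powers-of-two coefficients
a = [0.5, 0.5]
dc_gain = response_2d(a, 0.0, 0.0)
corner_gain = response_2d(a, math.pi, math.pi)
```

Since the transformation coefficients are all ±1/2, the substitution itself needs only shifts and adders, so a multiplier-less 1D prototype yields a multiplier-less 2D filter.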

Use of a generalized McClellan transformation with order more than one for the design of 2D linear phase FIR digital filters has been illustrated in Reference 55. The design problem is formulated as a linear programming (LP) optimization problem to maximize the transition width of 1D FIR filter subject to the inequality constraints in the 2D frequency domain. A local search method has finally been adopted for efficiently finding the appropriate powers-of-two coefficients. The optimization algorithm eliminates the drawbacks of high computational cost and huge memory storage in using conventional LP based algorithms. Three simple and efficient transformations have been proposed in Reference 126 for designing circularly symmetric wideband and multiple bands 2D FIR filter. The first transformation has been regarded as the  $k^{th}$  order version of the original McClellan transformation and other two transformations are developed on the basis of  $k^{th}$  order McClellan transformation. Effectiveness and flexibility of the proposed transformations have been fully depicted by the presented illustrations. Authors have claimed that in comparison with other transformations, approach in Reference 126 has provided significant savings in the number of multiplies at the expense of slightly large number of adders and delays.

An optimal minimax design of 2D FIR digital filters with finite precision coefficients and linear phase has been developed in Reference 127. This algorithm associates linear programming and a branch and bound technique for which two strategies are compared, namely depth-first-search and hybrid strategy consisting of depth-first-search and breadth-first-search. A large number of design examples are presented to show the efficiency of the method for the design of 2D filters with different specifications and sizes. One simulated annealing (SA) based design technique has been proposed for the minimax design of 2D multiplier-less FIR filters [128] whose coefficients have been written as the sum or difference of two power-of-two terms. The algorithm proves to be intrinsically very flexible. Usefulness of the technique in the context of video filters has been demonstrated by a number of filter design examples. Minimax design problem of two-dimensional linear phase FIR filters with continuous and discrete coefficients has later been described in Reference 129. Authors have initially formulated the minimax continuous-coefficient design problem as an LP problem with inequality constraints. Based on the obtained continuous coefficients, an efficient method was proposed for designing 2D filter with powers-of-two coefficients in the spatial domain.

A number of artificially intelligent optimization techniques have found suitable application in the design of 2D multiplier-less filters too. In connection to this, the very first paper appeared in 1995, when Sriranganathan and his co-workers designed circularly symmetric and diamond shaped low-pass linear phase powers-of-two FIR filters with the aid of GA [56]. The authors adopted the minimax error criterion, which leads to a minimization of the weighted ripple in both pass-band and stop-band. The designed filter has been found to yield better or comparable performance than those designed with the aid of LP and SA. Another efficient GA-based design method for multiplier-less 2D state-space digital filters (SSDF), which are attractive for high speed operation and simple implementation, has been proposed in Reference 130. The design problem is described by Roesser's local state-space model and formulated subject to the stability of the resultant filter. Thamvichai et al. [131] have incorporated two different types of GA, namely binary-GA and integer-GA, to find the periodically shift variant (PSV) coefficients of a 2D filter. The design involves finding the impulse response of the 2D PSV filter in closed form and then using GA to find the filter coefficients.

An effective GA-based approach has been proposed [57] for designing two-dimensional FIR filters with complex-valued frequency responses by extending the concept of 1D filter design. Through minimization of a quadratic measure of error in the frequency band, real-valued chromosomes are evolved to realize the filter coefficients with an evolutionary algorithm. It has also been shown that some coefficients of the designed filters are inherently zero, which results in a significant saving in design time. An advanced GA was developed in Reference 132 to design 2D FIR filters; it can adapt the genetic operators during the genetic life while remaining simple and easy to implement. The adaptive GA has produced filters with good response characteristics while greatly reducing the error criteria and CPU time. GA combined with singular value decomposition (SVD) has been used to design 2D FIR filters, in which the role of GA was to optimize the design of the 1D filter [133]. An improvement to SVD was made by varying the order of the 1D filter in each branch in accordance with its singular values. This improvement has resulted in a more efficient design by reducing the number of coefficients by 20% with acceptable error in pass-band and stop-band. Recently, design of 2D multiplier-less linear phase FIR filters has been accomplished by designing a multiplier-free 1D linear phase FRM FIR filter followed by a multiplier-less transformation [43,58]. The resulting 1D filter is converted to the CSD space using a new discrete optimization based on a modified gravitational search algorithm (GSA) [58] and a modified harmony search algorithm (HSA) [134]. GSA and HSA have been adapted in such a way that, during the course of optimization, candidate solutions turn out to be integers and efficient exploration and exploitation of the search space are achieved. The approaches in References 58 and 134 offer reduced computational complexity and time.

A new strategy of multiplier-less image filter design with the aid of DE algorithm has been presented very recently [59]. Designed filter has accordingly been used to reduce the effect of Gaussian noise from standard test images and resulting performance has been studied with respect to relevant parameters. Authors have claimed the superiority of their design by comparing those parameters with other design approaches. One comparative study of evolutionary algorithms applied for the design of 2D FIR filters has been elaborated in Reference 135. Several stochastic methodologies capable of handling large spaces have also been explored. Finally, a new GA has been proposed where some concepts are introduced to optimize the trade-off between diversity and elitism in the genetic population.

### 8. Experimental results

This section attempts to shed light on the progress and impact of powers-of-two FIR filter design from various perspectives. The objective of the design process is to achieve the desired filter specification with as low a hardware cost as possible. Frequency characteristics of the designed filter are governed by a few performance parameters like pass-band ripple, transition-band and stop-band attenuation, width of the transition band and so on. Similarly, hardware efficiency of the multiplier-less filter can be calculated on the basis of different indices like total number of powers-of-two terms (TPT), total number of adders (TA) divided into multiplier adders (MA) and structural adders (SA), total number of D flip flops (TDF) divided into multiplier D flip flops (MDF) and structural D flip flops (SDF), and total number of zero-valued filter coefficients (ZFC). A detailed comparative study amongst various hardware efficient FIR filters is summarized in Tables 1 and 2 below, in which Table 1 demonstrates the behaviour of the filters in the frequency domain while Table 2 emphasizes the associated hardware cost.

Looking at the numerical entries in Table 1, the supremacy of the DEMLFIR filter [49] can easily be established, as it yields higher attenuation in the transition band of the frequency response. On the other hand, the design in Reference 136 outperforms the other design algorithms by a large margin in terms of the stop-band behaviour of the frequency characteristics. However, except for the design in Reference 85, the rest of the powers-of-two FIR filters have produced an acceptable stop-band behaviour in the sense that the minimum stop-band attenuation is always higher than 80 dB. In an attempt to compare the hardware complexity of multiplier-less FIR filters, the associated indices are calculated per unit length of the filter, as the filters are of different orders. It is clearly seen from Table 2 that DEMLFIR [49] provides a favourable design for implementation in terms of TPT, TDF and ZFC as compared to other powers-of-two filters. Moreover, a larger fraction of its coefficients are zero-valued than in the other designs, which makes the structure more applicable for low power design.

#### Table 1

Comparative analysis with respect to frequency response of hardware efficient FIR filters. Attenuation values are in dB; frequency points are normalized (rad/π).

| Method | Length of the filter | Transition-band attenuation at 0.35 | Transition-band attenuation at 0.4 | Transition-band attenuation at 0.45 | Stop-band attenuation at 0.65 | Stop-band attenuation at 0.75 | Stop-band attenuation at 0.85 | Minimum stop-band attenuation |
|---|---|---|---|---|---|---|---|---|
| Samueli [69] | 25 | 4.279 | 15.03 | 34.02 | 98.19 | 87.92 | 112.2 | 84.66 |
| Chen and Willson [71] | 28 | 2.958 | 13.48 | 39.58 | 135.2 | 118.1 | 124.6 | 115.8 |
| Kaakinen and Saramaki [85] | 29 | 1.002 | 4.239 | 15.76 | 45.8 | 50.33 | 53.6 | 30.28 |
| Jheng, Jou and Wu [136] | 30 | 2.954 | 13.77 | 40.6 | 162.1 | 121 | 168.1 | 117.9 |
| Xu, Chang and Jong [35] | 28 | 3.732 | 13.8 | 38.3 | 150.2 | 116.7 | 89.06 | 80.25 |
| Feng and Teo [23] | 34 | 2.41 | 13.71 | 44.2 | 154.4 | 147 | 143 | 130.6 |
| Chandra and Chattopadhyay [49] | 29 | 22.28 | 40.93 | 70.05 | 120.5 | 125.2 | 120.5 | 110.7 |

Hardware complexity of the designed filter may further be improved by incorporating proper representation techniques. Since powers-of-two filters substitute multipliers by adders and shifters only, the hardware cost of such filters is generally measured in terms of the full adder (FA) count only. In order to compare different representation schemes, all possible binary vectors of length 10, 12 and 14 have been considered in the present study and the average FA count has subsequently been calculated, as listed in Tables 3–5.

Supremacy of the MIFP scheme in minimizing the FA count is firmly established by the results in Tables 3–5. It can be seen that, irrespective of the coefficient word length and the number of non-zero bits, MIFP always requires fewer FAs than the direct method or PFP. More specifically, with a total of 8 non-zero bits in the filter coefficient, MIFP requires 12.44%, 13.76% and 14.88% fewer full adders than PFP for coefficient word lengths of 10, 12 and 14 respectively. The corresponding improvement with respect to the direct method is 13.92%, 16.45% and 18.58% respectively.
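The savings for 8 non-zero bits can be recomputed directly from the corresponding rows of Tables 3–5. The short sketch below does only that arithmetic; the dictionary layout and the function name are illustrative choices.

```python
# Average FA counts for 8 non-zero bits, copied from Tables 3-5:
# word length -> (direct multiply, PFP, MIFP)
rows = {
    10: (91.7778, 90.2222, 79.0),
    12: (99.5556, 96.4444, 83.1778),
    14: (107.3333, 102.6667, 87.3863),
}

def saving_percent(reference, mifp):
    """Relative reduction in FA count achieved by MIFP, in percent."""
    return 100.0 * (reference - mifp) / reference

savings = {
    word_length: (round(saving_percent(pfp, mifp), 2),
                  round(saving_percent(direct, mifp), 2))
    for word_length, (direct, pfp, mifp) in rows.items()
}
# savings -> {10: (12.44, 13.92), 12: (13.76, 16.45), 14: (14.88, 18.58)}
```

Each tuple holds the saving relative to PFP followed by the saving relative to the direct method.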

#### Table 2

Comparative analysis with respect to hardware cost per unit length of multiplier-less FIR filters.

| Method | Length of the filter | Word length | TPT | TA | TDF | ZFC |
|---|---|---|---|---|---|---|
| Samueli [69] | 25 | 9 | 1.8 | 1.76 | 10.4 | 0 |
| Chen and Willson [71] | 28 | 12 | 2.143 | 2.179 | 15.893 | 0.07 |
| Kaakinen and Saramaki [85] | 29 | 11 | 1.69 | 1.862 | 12.207 | 0.207 |
| Jheng, Jou and Wu [136] | 30 | 11 | 1.733 | 1.9 | 12.367 | 0.2 |
| Xu, Chang and Jong [35] | 28 | 13 | 2.214 | 2.393 | 17.679 | 0.214 |
| Feng and Teo [23] | 34 | 13 | 2.176 | 2.324 | 17.441 | 0.176 |
| Chandra and Chattopadhyay [49] | 29 | 8 | 1.621 | 1.931 | 8.483 | 0.345 |

#### Table 3

Average number of full adders for a coefficient word length of 10.

| Number of non-zero bits | Direct multiply | PFP | MIFP |
|---|---|---|---|
| 4 | 40.8 | 37.2 | 33.6476 |
| 5 | 53.6667 | 50.3333 | 45.3095 |
| 6 | 66.4286 | 63.5714 | 56.719 |
| 7 | 79.125 | 76.875 | 67.925 |
| 8 | 91.7778 | 90.2222 | 79 |

#### Table 4

Average number of full adders for a coefficient word length of 12.

| Number of non-zero bits | Direct multiply | PFP | MIFP |
|---|---|---|---|
| 4 | 44.4 | 39.6 | 35.3354 |
| 5 | 58.3333 | 53.6667 | 47.6692 |
| 6 | 72.1429 | 67.8571 | 59.7175 |
| 7 | 85.875 | 82.125 | 71.5316 |
| 8 | 99.5556 | 96.4444 | 83.1778 |
| 9 | 113.2 | 110.8 | 94.7091 |
| 10 | 126.8182 | 125.1818 | 106.1667 |

#### Table 5

Average number of full adders for a coefficient word length of 14.

| Number of non-zero bits | Direct multiply | PFP | MIFP |
|---|---|---|---|
| 6 | 77.8571 | 72.1429 | 62.7253 |
| 7 | 92.625 | 87.375 | 75.153 |
| 8 | 107.3333 | 102.6667 | 87.3863 |
| 9 | 122 | 118 | 99.4805 |
| 10 | 136.6364 | 133.3636 | 111.4765 |
| 11 | 151.25 | 148.75 | 123.4038 |
| 12 | 165.8462 | 164.1538 | 135.2857 |

Computation of the total number of full adders has finally been carried out by considering an arbitrary coefficient h = 0.000100100101001 using an 8-bit quantized input signal. The coefficient may be written as $h = 2^{-4} + 2^{-7} + 2^{-10} + 2^{-12} + 2^{-15}$, whose MIFP form is given by $h = 2^{-10} (2^6 + 2^3 + 2^0 + 2^{-2} + 2^{-5})$. The resultant multiplier structure is shown in Fig. 3 below, which indicates that the MIFP scheme needs only 56 FAs. On the other hand, the direct multiply and PFP schemes require 80 and 64 FAs respectively. Hence, for the example coefficient at hand, MIFP outperforms the other two representation strategies by 30% and 12.5% respectively.
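The worked example can be checked with exact rational arithmetic. The sketch below only verifies that the signed powers-of-two expansion and the MIFP factorization quoted in the text describe the same value and reproduces the two quoted savings; the helper names are hypothetical, the pivot exponent of −10 is taken from the text, and the FA counts (80, 64, 56) are simply copied from it rather than derived.

```python
from fractions import Fraction

def spt_value(exponents):
    """Exact value of a sum of powers of two with the given exponents."""
    return sum(Fraction(2) ** e for e in exponents)

def mifp_form(exponents, pivot):
    """Factor out 2**pivot; returns the pivot and the remaining relative exponents."""
    return pivot, [e - pivot for e in exponents]

spt_exponents = [-4, -7, -10, -12, -15]           # expansion quoted in the text
pivot, relative = mifp_form(spt_exponents, pivot=-10)
h_spt = spt_value(spt_exponents)
h_mifp = Fraction(2) ** pivot * spt_value(relative)

def saving(reference, mifp):
    """Relative FA saving of MIFP as an exact fraction."""
    return Fraction(reference - mifp, reference)

saving_vs_direct = saving(80, 56)  # FA counts as quoted in the text
saving_vs_pfp = saving(64, 56)
```

The relative exponents come out as 6, 3, 0, −2 and −5, matching the bracketed MIFP term, and the two savings evaluate to 3/10 and 1/8, that is 30% and 12.5%.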

### 9. Conclusion

Design of hardware efficient multiplier-less FIR digital filters has received significant attention from researchers over the last few decades. A number of promising algorithms have been developed towards the efficient design of such filters. These include conventional techniques like integer quadratic programming, MILP, SDP and so on. Similar bit patterns in such powers-of-two filters have been eliminated by means of CSE and its improved variants. Contributory algorithms have been developed towards the reduction of adders in such filter circuits. In recent times, this field has been enriched by the amalgamation of powerful intelligent optimization techniques. This paper has attempted to provide an overall picture of the state-of-the-art research carried out in this particular field. With a comprehensive introduction to the necessity of hardware efficient digital filter design, this article has shed light on the evolution of such design processes along with their associated advantages and limitations. It has also provided a brief overview of the design procedure of two-dimensional image filters whose mask coefficients are in the form of powers-of-two.

Fig. 3. Multiplier structure for the example coefficient using MIFP.

Implementation of such hardware efficient filters deals with a number of design objectives, which include required area, consumed power, speed or latency of the designed filter and associated throughput. The literature review suggests that most of the design algorithms aim to attain one specific objective while leaving the other objectives unattended. However, an appropriate trade-off amongst those objectives is essential in most practical applications. Intelligent optimization techniques of recent interest may be employed to address this issue. More specifically, multi-objective optimization algorithms could be applied to the same design problem and the resultant impact over single objective optimization may be studied in the future. The role of fuzzy logic and fuzzy systems in the design of such powers-of-two filters may emerge as an active line of research for the next generation of researchers. Design of minimum phase multiplier-less digital filters may be pursued as a future extension of this area of research. Since application of the designed multiplier-less FIR filters in various fields of communication and signal processing has not yet been examined, it could also be studied extensively in the future.

### References

- [1] S.K. Mitra, Digital Signal Processing: A Computer-based Approach, 2nd ed., McGraw Hill, New York, 2001.
- [2] J.G. Proakis, Digital Signal Processing: Principles, Algorithms, and Applications, Prentice Hall of India, New Delhi, 1997.
- [3] B. Somanathan Nair, Digital Signal Processing: Theory, Analysis and Digital-filter Design, Prentice-Hall of India, New Delhi, 2004.
- [4] L. Tan, Digital Signal Processing: Fundamentals and Applications, Academic Press, New York, 2011.
- [5] Y.C. Lim, S.R. Parker, A.G. Constantinides, Finite word length FIR filter design using integer programming over a discrete coefficient space, IEEE Trans. Acoust. ASSP-30 (4) (1982) 661–664.
- [6] J. Tian, G. Li, Q. Li, Hardware-efficient parallel structures for linear-phase FIR digital filter, in: Proceedings of 56th IEEE International Midwest Symposium on Circuits and Systems (MWSCAS 2013), 2013, pp. 995–998.
- [7] V. Pavlovic, M. Lutovac, M. Lutovac, Efficient implementation of multiplierless recursive low pass FIR filters using computer algebra system, in: Proceedings of 11th IEEE International Conference on Telecommunication in Modern Satellite, Cable and Broadcasting Services (TELSIKS 2013), Vol. 1, 2013, pp. 65–68.

- [8] W.B. Ye, Y.J. Yu, Single-stage and cascade design of high order multiplierless linear phase FIR filters using genetic algorithm, IEEE Trans. Circuits Syst. I Regular Pap. 60 (11) (2013) 2987–2997.
- [9] J.-G. Chung, K.K. Parhi, Frequency spectrum based low-area low-power parallel FIR filter design, EURASIP J. Appl. Signal Processing 9 (2002) 944–953.
- [10] C. Cheng, K.K. Parhi, Hardware efficient fast parallel FIR filter structures based on iterated short convolution, IEEE Trans. Circuits Syst. I Regular Pap. 51 (8) (2004) 1492–1500.
- [11] C. Cheng, K.K. Parhi, Hardware efficient fast parallel FIR filter structures based on iterated short convolution, in: Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS 2004), 2004, pp. 361–364.
- [12] K. Ichige, H. Munemasa, A. Hiroyuki, An efficient signed-power-of-two term allocation for filter coefficients in digital communication system, IEICE Trans. Commun. 89 (12) (2006) 3266–3268.
- [13] A.F. Shalash, K.K. Parhi, Power efficient FIR folding transformation for wireline digital communications, in: Proceedings of 32nd Asilomar Conference on Signals, Systems and Computers, vol. 2, 1998, pp. 1816–1820.
- [14] C. Xu, S. Yin, Y. Qin, H. Zou, A novel hardware efficient FIR filter for wireless sensor networks, in: Proceedings of Fifth IEEE International Conference on Ubiquitous and Future Networks (ICUFN 2013), 2013, pp. 197–201.
- [15] E. Avenhaus, On the design of digital filters with coefficients of limited word length, IEEE Trans. Audio Electro Acoust. 20 (3) (1972) 206–212.
- [16] C. Charalambous, M. Best, Optimization of recursive digital filters with finite word lengths, IEEE Trans. Acoust. 22 (6) (1974) 424–431.
- [17] M. Suk, S.K. Mitra, Computer-aided design of digital filters with finite word lengths, IEEE Trans. Audio Electro Acoust. 20 (5) (1972) 356–363.
- [18] Y.C. Lim, S.R. Parker, FIR filter design over a discrete powers-of-two coefficient space, IEEE Trans. Acoust. 31 (3) (1983) 583–591.
- [19] O. Gustafsson, H. Johansson, L. Wanhammar, An MILP approach for the design of linear-phase FIR filters with minimum number of signed-power-of-two terms, in: Proceedings of European Conference on Circuit Theory Design (ECCTD 2001), 2001, pp. 217–220.
- [20] R. Ito, K. Suyama, R. Hirabayashi, Optimal design of FIR filter with discrete coefficients based on integer semi-infinite linear programs, in: Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS 2001), vol. 2, 2001, pp. 629–632.
- [21] W.S. Lu, Design of FIR filters with discrete coefficients: a semi definite programming relaxation approach, in: Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS 2001), vol. 2, 2001, pp. 297–300.
- [22] R. Ito, R. Hirabayashi, Optimal design of FIR filter with SP2 coefficients based on semi-infinite linear programming method, in: Proceedings of 14th European Signal Processing Conference (EUSIPCO 2006), 2006.
- [23] Z.G. Feng, K.L. Teo, A discrete filled function method for the design of FIR filters with signed-powers-of-two coefficients, IEEE Trans. Signal Processing 56 (1) (2008) 134–139.
- [24] N.I. Cho, S.U. Lee, Optimal design of finite precision FIR filters using linear programming with reduced constraints, IEEE Trans. Signal Processing 46 (1) (1998) 195–199.
- [25] A.G. Dempster, M.D. Macleod, Use of minimum-adder multiplier blocks in FIR digital filters, IEEE Trans. Circuits Syst. II Analog Digit. Signal Processing 42 (9) (1995) 569–577.

- [26] O. Gustafsson, A difference based adder graph heuristic for multiple constant multiplication problems, in: Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS 2007), 2007, pp. 1097–1100.
- [27] Y. Voronenko, M. Puschel, Multiplierless multiple constant multiplication, ACM Trans. Algorithms 3 (2) (2007) 1–39.
- [28] M. Potkonjak, M.B. Srivastava, A.P. Chandrakasan, Multiple constant multiplications: efficient and versatile framework and algorithms for exploring common sub expression elimination, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 15 (2) (1996) 151–165.
- [29] R. Hartley, Sub expression sharing in filters using canonic signed digit multipliers, IEEE Trans. Circuits Syst. II Analog Digit. Signal Processing 43 (10) (1996) 677–688.
- [30] N. Sankarayya, K. Roy, D. Bhattacharya, Algorithms for low power and high speed FIR filter realization using differential coefficients, IEEE Trans. Circuits Syst. II Analog Digit. Signal Processing 44 (6) (1997) 488–497.
- [31] O. Gustafsson, H. Ohlsson, L. Wanhammar, Improved multiple constant multiplication using minimum spanning trees, in: Proceedings of Thirty-Eighth Asilomar Conference on Signals, Systems and Computers, vol. 1, 2004, pp. 63–66.
- [32] R. Guo, L.S. DeBrunner, K. Johansson, Truncated MCM using pattern modification for FIR filter implementation, in: Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS 2010), 2010, pp. 3881–3884.
- [33] Y. Wang, K. Roy, A novel low-complexity method for parallel multiplierless implementation of digital FIR filters, in: Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS 2005), 2005, pp. 2020–2023.
- [34] Y. Wang, K. Roy, CSDC: a new complexity reduction technique for multiplierless implementation of digital FIR filters, IEEE Trans. Circuits Syst. I Regular Pap. 52 (9) (2005).
- [35] F. Xu, C.H. Chang, C.C. Jong, Design of low-complexity FIR filters based on signed powers-of-two coefficients with reusable common subexpressions, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 26 (10) (2007).
- [36] M. Peiro, E.I. Boemo, L. Wanhammar, Design of high-speed multiplierless filters using a nonrecursive signed common subexpression algorithm, IEEE Trans. Circuits Syst. II Analog Digit. Signal Processing 49 (3) (2002) 196–203.
- [37] T.J. Lin, T.H. Yang, C.W. Jen, Area-effective FIR filter design for multiplier-less implementation, in: Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS 2003), vol. 5, 2003, pp. 173–176.
- [38] A. Abbaszadeh, K.D. Sadeghipour, A new hardware efficient reconfigurable FIR filter architecture suitable for FPGA applications, in: Proceedings of 17th IEEE International Conference on Digital Signal Processing, 2011, pp. 1–4.
- [39] A.P. Vinod, C.H. Chang, P.K. Meher, A. Singla, Low power FIR filter realization using minimal difference coefficients: part I-complexity analysis, in: Proceedings of IEEE Asia Pacific Conference on Circuits and Systems (APCCAS 2006), 2006, pp. 1547–1550.
- [40] A.P. Vinod, C.H. Chang, P.K. Meher, A. Singla, Low power FIR filter realization using minimal difference coefficients: part II-Algorithm, in: Proceedings of IEEE Asia Pacific Conference on Circuits and Systems (APCCAS 2006), 2006, pp. 1551–1554.
- [41] A.P. Vinod, E.K. Lai, Optimizing vertical common subexpression elimination using coefficient partitioning for designing low complexity software radio channelizers, in: Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS 2005), 2005, pp. 5429–5432.
- [42] S. Traferro, F. Capparelli, F. Piazza, A. Uncini, Efficient allocation of power of two terms in FIR digital filter design using tabu search, in: Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS 1999), vol. 3, 1999, pp. 411–414.
- [43] R. Cemes, D. Ait-Boudaoud, Genetic approach to design of multiplierless FIR filters, Electron. Lett. 29 (24) (1993) 2090–2091.
- [44] G. Wade, A. Roberts, G. Williams, Multiplier-less FIR filter design using a genetic algorithm, in: Proceedings of IEE on Vision, Image and Signal Processing, vol. 141, no. 3, 1994, pp. 175–180.
- [45] P. Gentili, F. Piazza, A. Uncini, Efficient genetic algorithm design for power-of-two FIR filters, in: Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 1995), vol. 2, 1995, pp. 1268–1271.
- [46] L. Cen, Y. Lian, Complexity reduction of high-speed FIR filters using micro-genetic algorithm, in: Proceedings of First IEEE International Symposium on Control, Communications and Signal Processing, 2004, pp. 419–422.
- [47] L. Cen, Y. Lian, A modified micro-genetic algorithm for the design of multiplierless digital FIR filters, in: Proceedings of IEEE Region 10 Conference (TENCON 2004), 2004, pp. 52–55.
- [48] S.U. Ahmad, A. Antoniou, Cascade-form multiplierless FIR filter design using orthogonal genetic algorithm, in: Proceedings of IEEE International Symposium on Signal Processing and Information Technology, 2006, pp. 932–937.
- [49] A. Chandra, S. Chattopadhyay, A novel approach for coefficient quantization of low-pass finite impulse response filter using differential evolution algorithm, Signal Image Video Process. 8 (7) (2014) 1307–1321.
- [50] A. Chandra, S. Chattopadhyay, Novel design strategy of multiplier-less low-pass finite impulse response filter using self-organizing random immigrants genetic algorithm, Signal Image Video Process. 8 (3) (2014) 507–522.
- [51] A. Chandra, S. Chattopadhyay, Design optimization of powers-of-two FIR filter using self-organizing random immigrants GA, Int. J. Electron. 102 (1) (2015) 127–140.
- [52] V.J. Manoj, E. Elias, On the design of multiplier-less nonuniform filter bank transmultiplexer using particle swarm optimization, in: Proceedings of World Congress on Nature & Biologically Inspired Computing (NaBIC 2009), 2009, pp. 55–60.

- [53] M. Manuel, E. Elias, Design of multiplier-less FRM FIR filter using Artificial Bee Colony Algorithm, in: Proceedings of 20th IEEE European Conference on Circuit Theory and Design (ECCTD 2011), 2011, pp. 322–325.
- [54] V.J. Manoj, E. Elias, Artificial bee colony algorithm for the design of multiplierless nonuniform filter bank transmultiplexer, Int. J. Inf. Sci. 192 (2012) 193–203.
- [55] C. Chen, J. Lee, McClellan transform based design techniques for two-dimensional linear phase FIR filters, IEEE Trans. Circuits Syst. I Fundam. Theory Appl. 41 (8) (1994) 505–517.
- [56] S. Sriranganathan, D.R. Bull, D.W. Redmill, Design of 2-D multiplierless FIR filters using genetic algorithms, in: Proceedings of 1st International Conference on Genetic Algorithms in Engineering Systems: Innovations and Applications, (GALESIA 1995), 1995, pp. 282–286.
- [57] S.T. Tzeng, Design of 2-D FIR digital filters with specified magnitude and group delay responses by GA approach, Signal Processing 87 (2007) 2036–2044.
- [58] M. Manuel, R. Krishnan, E. Elias, Design of multiplierless 2-D sharp wideband filters using FRM and GSA, Glob. J. Res. Eng. Electr. Electron. Eng. 12 (2012).
- [59] A. Chandra, S. Chattopadhyay, A new strategy of image denoising using multiplier-less FIR filter designed with the aid of differential evolution algorithm, Multimedia Tools Appl. (2014) doi:10.1007/s11042-014-2358-7.
- [60] Y.C. Lim, S.R. Parker, Discrete coefficient FIR digital filter design based upon an LMS criteria, IEEE Trans. Circuits Syst. 30 (10) (1983) 723–739.
- [61] H.J. Oh, Y.H. Lee, Design of discrete coefficient FIR and IIR digital filters with prefilter-equalizer structure using linear programming, IEEE Trans. Circuits Syst. II Analog Digit. Signal Processing 47 (6) (2000) 562–565.
- [62] R. Ito, T. Fujie, K. Suyama, R. Hirabayashi, New design methods of FIR filters with signed power of two coefficients based on a new linear programming relaxation with triangle inequalities, in: Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS 2002), vol. 1, 2002, pp. 813–816.
- [63] C.Y. Yao, C.J. Chien, A partial MILP algorithm for the design of linear phase FIR filters with SPT coefficients, IEICE Trans. Fundam. E85-A (2002) 2302–2310.
- [64] G. Karakonstantis, K. Roy, An optimal algorithm for low power multiplierless FIR filter design using Chebyshev criterion, in: Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2007), vol. 2, 2007, pp. 49–52.
- [65] H.H. Dam, A. Cantoni, K.L. Teo, S. Nordholm, FIR variable digital filter with signed power-of-two coefficients, IEEE Trans. Circuits Syst. I Regular Pap. 54 (6) (2007) 1348–1357.
- [66] H.Q. Ta, T.L. Nhat, Design of FIR filter with discrete coefficients based on mixed integer linear programming, in: Proceedings of 9th IEEE International Conference on Signal Processing (ICSP 2008), 2008, pp. 9–12.
- [67] J.F. Sturm, Using SeDuMi 1.02: a MATLAB toolbox for optimization over symmetric cones, Optim. Methods Softw. 11–12 (1999) 625–653.
- [68] M. Aktan, A. Yurdakul, G. Dundar, An algorithm for the design of low-power hardware efficient FIR filters, IEEE Trans. Circuits Syst. I Regular Pap. 55 (6) (2008) 1536–1545.
- [69] H. Samueli, An improved search algorithm for the design of multiplierless FIR filters with powers-of-two coefficients, IEEE Trans. Circuits Syst. 36 (7) (1989) 1044–1047.
- [70] D. Li, J. Song, Y.C. Lim, A polynomial-time algorithm for designing digital filters with power-of-two coefficients, in: Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS 1993), 1993, pp. 84–87.
- [71] C.L. Chen, A.N. Willson Jr., A trellis search algorithm for the design of FIR filters with signed-powers-of-two coefficients, IEEE Trans. Circuits Syst. II Analog Digit. Signal Processing 46 (1) (1999) 29–39.
- [72] C.L. Chen, K.Y. Khoo, A.N. Willson Jr., An improved polynomial-time algorithm for designing digital filters with power-of-two coefficients, in: Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS 1995), vol. 1, 1995, pp. 223–226.
- [73] T. Ciloglu, Design of FIR filters for low implementation complexity, Electron. Lett. 35 (7) (1999) 529–530.
- [74] D. Ait-Boudaoud, R. Cemes, Modified sensitivity criterion for the design of powers-of-two FIR filters, Electron. Lett. 29 (16) (1993) 1467–1469.
- [75] S.A. Alam, O. Gustafsson, Design of finite word length linear-phase FIR filters in the logarithmic number system domain, VLSI Des. 2014 (2014) 1–14.
- [76] M. Mehendale, S.D. Sherlekar, G. Venkatesh, Synthesis of multiplier-less FIR filters with minimum number of additions, in: Proceedings of IEEE International Conference on Computer-Aided Design (ICCAD-95), 1995, pp. 668–671.
- [77] R. Pasko, P. Schaumont, V. Derudder, S. Vernalde, D. Durackova, A new algorithm for elimination of common subexpressions, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 18 (1) (1999) 58–68.
- [78] C.Y. Yao, H.H. Chen, T.F. Lin, C.J. Chien, C.T. Hsu, A novel common-subexpression-elimination method for synthesizing fixed-point FIR filters, IEEE Trans. Circuits Syst. I Regular Pap. 51 (11) (2004) 2215–2221.
- [79] M.D. Macleod, A.G. Dempster, Multiplierless FIR filter design algorithms, IEEE Signal Processing Lett. 12 (3) (2005) 186–189.
- [80] A.G. Dempster, M.D. Macleod, Generation of signed-digit representations for integer multiplication, IEEE Signal Processing Lett. 11 (8) (2004) 663–665.
- [81] A.G. Dempster, M.D. Macleod, Constant integer multiplication using minimum adders, in: Proceedings of IEE on Circuits, Devices and Systems, vol. 141, no. 5, 1994, pp. 407–413.
- [82] R. Hartley, Optimization of canonic signed digit multipliers for filter design, in: Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS 1991), 1991, pp. 1992–1995.

- [83] I.C. Park, H.J. Kang, Digital filter synthesis based on an algorithm to generate all minimal signed digit representations, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 21 (12) (2002) 1525–1529.
- [84] H. Safiri, M. Ahmadi, G.A. Jullien, W.C. Miller, A new algorithm for the elimination of common subexpressions in hardware implementation of digital filters by using genetic programming, J. VLSI Signal Processing 31 (2) (2002) 91–100.
- [85] J.Y. Kaakinen, T. Saramaki, A systematic algorithm for the design of multiplierless FIR filters, in: Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS 2001), vol. 2, 2001, pp. 185–188.
- [86] C.Y. Yao, A study of SPT-term distribution of CSD numbers and its application for designing fixed-point linear phase FIR filters, in: Proceedings of IEEE International Symposium on Circuits and Systems, no. 2, 2001, pp. 301–304.
- [87] O. Gustafsson, L. Wanhammar, Design of linear-phase FIR filters combining subexpression sharing with MILP, in: Proceedings of IEEE 45th Midwest Symposium on Circuits and Systems, (MWSCAS 2002), vol. 3, 2002, pp. 9–12.
- [88] A. Avizienis, Signed-digit number representations for fast parallel arithmetic, IRE Trans. Electron. Comput. 3 (1961) 389–400.
- [89] D.R. Bull, D.H. Horrocks, Primitive operator digital filters, in: Proceedings of IEE Circuits, Devices and Systems, 1991, pp. 401–412.
- [90] D. Li, Minimum number of adders for implementing a multiplier and its application to the design of multiplierless digital filters, IEEE Trans. Circuits Syst. II Analog Digit. Signal Processing 42 (7) (1995) 453–460.
- [91] Y.C. Lim, Design of discrete-coefficient-value linear phase FIR filters with optimum normalized peak ripple magnitude, IEEE Trans. Circuits Syst. 37 (12) (1990) 1480–1486.
- [92] N. Benvenuto, M. Marchesi, A. Uncini, Applications of simulated annealing for the design of special digital filters, IEEE Trans. Signal Processing 40 (2) (1992) 323–332.
- [93] M. Potkonjak, M.B. Srivastava, A.P. Chandrakasan, Efficient substitution of multiple constant multiplications by shifts and additions using iterative pair-wise matching, in: Proceedings of 31st Conference on Design Automation, 1994, pp. 189–194.
- [94] D.N. Pearson, K.K. Parhi, Low-power FIR digital filter architectures, in: Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS 1995), vol. 1, 1995, pp. 231–234.
- [95] D.A. Parker, K.K. Parhi, Area-efficient parallel FIR digital filter implementations, in: Proceedings of International Conference on Application Specific Systems, Architectures and Processors (ASAP 1996), 1996, pp. 93–111.
- [96] D.A. Parker, K.K. Parhi, Low-area/power parallel FIR digital filter implementations, J. VLSI Signal Processing 17 (1997) 75–92.
- [97] H.J. Kang, I.C. Park, FIR filter synthesis algorithms for minimizing the delay and the number of adders, IEEE Trans. Circuits Syst. II Analog Digit. Signal Processing 48 (8) (2001) 770–777.
- [98] D.L. Maskell, J. Leiwo, J.C. Patra, The design of multiplierless FIR filters with a minimum adder step and reduced hardware complexity, in: Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS 2006), 2006, pp. 605–608.
- [99] A.P. Vinod, M.K. Lai, On the implementation of efficient channel filters for wideband receivers by optimizing common subexpression elimination methods, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 24 (2005) 295–304.
- [100] Y.J. Yu, Y.C. Lim, Optimization of FIR filters in subexpression space with constrained adder depth, in: Proceedings of the 6th International Symposium on Image and Signal Processing and Analysis (ISPA 2009), 2009, pp. 766–769.
- [101] Y.J. Yu, Y.C. Lim, Roundoff noise analysis of signals represented using signed power-of-two terms, in: Proceedings of 14th European Signal Processing Conference (EUSIPCO 2006), 2006.
- [102] T. Chang, C. Kung, C. Jen, A simple processor core design for DCT/IDCT, IEEE Trans. Circuits Syst. Video Technol. 10 (3) (2000) 439–447.
- [103] J.T. Kim, Design and implementation of computationally efficient FIR filters and scalable VLSI architectures for discrete wavelet transform, PhD dissertation, Advanced Institute of Science and Technology, Korea, 1998.
- [104] A.P. Vinod, E. Lai, D.L. Maskell, P.K. Meher, An improved common subexpression elimination method for reducing logic operators in FIR filter implementations without increasing logic depth, Integr. VLSI J. 43 (2010) 124–135.
- [105] T. Chang, Y. Chu, C. Jen, Low power FIR filter realization with differential coefficients and input, IEEE Trans. Circuits Syst. II 47 (2000) 137–145.
- [106] A. Chandra, S. Chattopadhyay, Efficient encoding of powers-of-two coefficients through minimum index floating point representation (MIFPR), in: Proceedings of 2014 International Conference on Control, Instrumentation, Energy and Communication (CIEC 2014), 2014, pp. 650–653.
- [107] A.T. Erdogan, T. Arslan, Low power FIR filter implementations based on coefficient ordering algorithm, in: Proceedings of the IEEE Computer Society Annual Symposium on VLSI Emerging Trends in VLSI Systems Design (ISVLSI 2004), 2004.
- [108] P. Persson, S. Nordebo, I. Claesson, Design of discrete coefficient FIR filters by a fast entropy-directed deterministic annealing algorithm, IEEE Trans. Signal Processing 53 (3) (2005) 1006–1014.
- [109] R. Baudin, G. Lesthievent, Design of FIR filters with sum of power-of-two representation using simulated annealing, in: Proceedings of 2014 7th Advanced Satellite Multimedia Systems Conference and the 13th Signal Processing for Space Communications Workshop (ASMS/SPSC), 2014, pp. 339–345.

- [110] R. Cemes, D. Ait-Boudaoud, Multiplierless FIR filter design with power-of-two coefficients, Inst. Electr. Eng. 6 (1993) 1–4.
- [111] T.W. Parks, C.S. Burrus, Digital Filter Design, Wiley, New York, 1989.
- [112] Q. Zhao, Y. Tadokoro, A simple design of FIR filters with powers-of-two coefficients, IEEE Trans. Circuits Syst. 35 (5) (1988) 566–570.
- [113] L. Cen, Y. Lian, High speed frequency response masking filter design using genetic algorithm, in: Proceedings of IEEE International Conference on Neural Networks & Signal Processing, 2003, pp. 735–738.
- [114] D. Shi, Y.J. Yu, Design of discrete-valued linear phase FIR filters in cascade form, IEEE Trans. Circuits Syst. I Regular Pap. 58 (7) (2011) 1627–1636.
- [115] T.G. Fuller, B. Nowrouzian, F. Ashrafzadeh, Optimization of FIR digital filters over the canonical signed-digit coefficient space using genetic algorithms, in: Proceedings of Midwest Symposium on Circuits and Systems, 1998, pp. 456–459.
- [116] Y.J. Yu, Y.C. Lim, Genetic algorithm approach for the optimization of multiplierless sub-filters generated by the frequency-response masking technique, in: Proceedings of 9th IEEE International Conference on Electronics, Circuits and Systems, vol. 3, 2002, pp. 1163–1166.
- [117] P. Mercier, S.M. Kilambi, B. Nowrouzian, Optimization of FRM FIR digital filters over CSD and CDBNS multiplier coefficient spaces employing a novel genetic algorithm, J. Comput. 2 (7) (2007) 20–31.
- [118] T. Çiloglu, Y. Hoon Lee, Efficient allocation of power-of-two terms in complex FIR filter design, in: Proceedings of the 1999 IEEE International Symposium on Circuits and Systems, (ISCAS'99), vol. 3, 1999, pp. 411–414.
- [119] L. Cen, A hybrid genetic algorithm for the design of FIR filters with SPOT coefficients, Signal Processing 87 (2007) 528–540.
- [120] A. Chandra, S. Chattopadhyay, Selection of computationally efficient mutation strategy of differential evolution algorithm for the design of multiplier-less low-pass FIR filter, in: Proceedings of 14th International Conference on Computer and Information Technology (ICCIT 2011), 2011, pp. 274–279.
- [121] A. Chandra, S. Chattopadhyay, Role of mutation strategies of differential evolution algorithm in designing hardware efficient multiplier-less low-pass FIR filter, J. Multimedia 7 (5) (2012) 353–363.
- [122] A. Chandra, S. Chattopadhyay, Computationally efficient design of multiplierless low-pass FIR filter using trigonometric mutation strategy of differential evolution algorithm, in: Proceedings of Fourth International Conference on Sustainable Energy and Intelligent System (SEISCON 2013), 2013, pp. 272–277.
- [123] A. Chandra, S. Chattopadhyay, A novel self-adaptive differential evolution algorithm for efficient design of multiplier-less low-pass FIR filter, in: Proceedings of Second International Conference on Sustainable Energy and Intelligent System (SEISCON 2011), 2011, pp. 733–738.
- [124] S.C. Pei, S.B. Jaw, Efficient design of 2D multiplierless FIR filters by transformation, in: Proceedings of IEEE International Conference of Acoustics, Speech and Signal Processing (ICASSP 1987), vol. 12, 1987, pp. 1669–1672.
- [125] H.K. Kwan, C.L. Chan, Circularly symmetric two-dimensional multiplierless FIR digital filter design using an enhanced McClellan transformation, in: Proceedings of IEE vol. 136, no. 3, 1989, pp. 129–134.
- [126] J.C. Liu, Y.L. Tai, Design of 2-D wideband circularly symmetric FIR filters by multiplierless high-order transformation, IEEE Trans. Circuits Syst. I Regular Pap. 58 (4) (2011) 746–754.
- [127] P. Siohan, A. Benslimane, Finite precision design of optimal linear phase 2-D FIR digital filters, IEEE Trans. Circuits Syst. 36 (1) (1989) 11–22.
- [128] L. Banzato, N. Benvenuto, G.M. Cortelazzo, A design technique for two-dimensional multiplierless FIR filters for video applications, IEEE Trans. Circuits Syst. Video Technol. 2 (3) (1992) 273–284.
- [129] J. Lee, S. Yang, D. Tang, Minimax design of 2-D linear-phase FIR filters with continuous and powers-of-two coefficients, Signal Processing 80 (2000) 1435–1444.
- [130] Y.H. Lee, M. Kawamata, T. Higuchi, Design of multiplierless 2-D state-space digital filters over a powers-of-two coefficient space, IEICE Trans. Fundam. E79-A (3) (1996) 374–377.
- [131] R. Thamvichai, T. Bose, R.L. Haupt, Design of 2-D multiplierless filters using the genetic algorithm, in: Proceedings of 35th Asilomar Conference on Signals, Systems and Computers, vol. 2, 2001, pp. 588–591.
- [132] K. Boudjelaba, F. Ros, D. Chikouche, An advanced genetic algorithm for designing 2-D FIR filters, in: Proceedings of Pacific Rim Conference on Communications, Computers and Signal Processing, 2011, pp. 60–65.
- [133] B. Elkarami, M. Ahmadi, An efficient design of 2-D FIR digital filters by using singular value decomposition and genetic algorithm with canonical signed digit (CSD) coefficients, in: Proceedings of IEEE 54th International Midwest Symposium on Circuits and Systems (MWSCAS), 2011, pp. 1–4.
- [134] M. Manuel, E. Elias, Design of sharp 2D multiplier-less circularly symmetric FIR filter using harmony search algorithm and frequency transformation, J. Signal Inf. Processing 3 (2012) 344–351.
- [135] K. Boudjelaba, D. Chikouche, F. Ros, Evolutionary techniques for the synthesis of 2-D FIR filters, IEEE Stat. Signal Processing Workshop (2011) 601–604.
- [136] K. Jheng, S. Jou, A. Wu, A design flow for multiplierless linear-phase FIR filters: from system specification to Verilog code, in: Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS 2004), vol. 5, 2004, pp. 293–296.