Abstract-The most computationally intensive part of a wideband receiver is the channelizer. The computational complexity of linear phase finite impulse response (LPFIR) filters employed in the channelizer is dominated by the number of adders used in the implementation of the multipliers. In this paper, two methods are proposed to efficiently implement the channel filters in a wideband receiver based on common subexpression elimination (CSE). We exploit the fact that a significant number of redundant multiplications exist in the filter-bank channelizer as it extracts multiple narrowband channels from the wideband signal. By forming three and four nonzero-bit super-subexpressions utilizing redundant identical shifts that exist between a two-nonzero-bit common subexpression (CS) and a third nonzero bit, or between two two-nonzero-bit CS, the number of adders to implement the channel filters can be reduced considerably. Furthermore, the complexity of the adders is analyzed, and design examples of the channel filters employed in the digital advanced mobile phone system (D-AMPS) and the personal digital cellular (PDC) channelizers show that the proposed methods offer considerable reduction in the number of full adders when compared to conventional CSE methods.
I. INTRODUCTION
Digital filters employed in the channelizer of a wideband receiver, which extracts several narrowband channels from a wideband signal, present a hardware design challenge [1]. Linear phase finite impulse response (LPFIR) filters that operate at high speed and low power are required in channelizers. Although programmable filters based on digital signal processing cores offer the advantage of flexibility, they are not suitable for wideband receiver applications that demand high throughput and low power consumption. Therefore, application-specific digital filters are frequently adopted to meet the constraints of performance and power consumption in such applications. However, these filters employ a large number of multipliers that lead to excessive area and power consumption even if they are implemented in full custom integrated circuits. Therefore, the problem of implementing digital filters with small area and low power consumption has received great attention in the last decade. The algorithms that minimize the complexity of multiplication in LPFIR filters focus on reducing the number of adders needed to implement the multipliers.
Multiple constant multiplication (MCM) is a transformation closely related to the widely used substitution of multiplications with constants by shifts and additions. While the latter considers multiplication of only one constant at a time, the MCM considers multiplication of one variable with multiple constants. Common subexpression elimination (CSE) tackles the MCM problem by minimizing the number of additions through extracting common parts among the constants represented in canonic signed digit (CSD) [2] - [10] . In general, these methods eliminate redundant computations in multiplier blocks by employing the most common subexpressions consisting of two nonzero-bits. In this paper, we show that conventional horizontal CSE (HCSE) and vertical CSE (VCSE) methods using two nonzero-bit common subexpression (CS) can be optimized to form three and four nonzero-bit super-subexpression (SS) by exploiting redundant identical shifts among them. Our proposed techniques offer considerable reduction in implementing LPFIR filters employed in the channelizer of a wideband receiver where SS among the coefficients of several filters are utilized. They are particularly suitable for filter bank channelizers (FBC). We extend the conventional MCM problem proposed for individual LPFIR filters to FBC for multiplication of one variable (wideband signal) with multiple constants (coefficients) of a bank of bandpass filters (called channel filters) as shown in Fig. 1 .
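The CSD representation and the two-nonzero-bit horizontal patterns mentioned above can be illustrated with a minimal Python sketch. It assumes the coefficients have already been scaled to non-negative integers, and the function names `to_csd` and `count_two_bit_hcs` are hypothetical helpers introduced here for illustration; the greedy left-to-right matching is one simple choice, not necessarily the selection rule used in [2]-[10].

```python
from collections import Counter


def to_csd(value):
    """Return the CSD digits (-1, 0, +1) of a non-negative integer, LSB first."""
    digits = []
    while value != 0:
        if value & 1:
            # +1 when value mod 4 == 1, -1 when value mod 4 == 3;
            # this choice guarantees that the next digit is zero.
            digit = 2 - (value & 3)
            value -= digit
        else:
            digit = 0
        digits.append(digit)
        value >>= 1
    return digits


def count_two_bit_hcs(digits):
    """Greedily count non-overlapping patterns [d 0 d] and [d 0 -d] in one coefficient."""
    counts = Counter()
    i = 0
    while i + 2 < len(digits):
        first, middle, last = digits[i], digits[i + 1], digits[i + 2]
        if first != 0 and middle == 0 and last != 0:
            counts["1 0 1" if first == last else "1 0 -1"] += 1
            i += 3  # the two nonzero bits are consumed by this subexpression
        else:
            i += 1
    return counts


if __name__ == "__main__":
    for coefficient in (7, 21, 45):
        csd = to_csd(coefficient)
        print(coefficient, csd[::-1], dict(count_two_bit_hcs(csd[::-1])))
```

Each nonzero CSD digit corresponds to one shifted copy of the input, so a coefficient with k nonzero digits needs k - 1 adders before any subexpression sharing is applied.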
The rest of this paper is organized as follows. In Section II, the HCSE algorithm used to implement MCM in LPFIR filters is analyzed. The complexity of implementation is analyzed in terms of the full adders required for each adder. A horizontal super-subexpression elimination (HSSE) algorithm, obtained by optimizing the HCSE method, is presented in Section III. In Section IV, a vertical SS elimination (VSSE) algorithm is presented by optimizing the VCSE algorithm. We relate our method to high-level synthesis methods in Section V. The implementation of channel filters for the digital advanced mobile phone system (D-AMPS) and the personal digital cellular (PDC) standards using the proposed HSSE and VSSE techniques is illustrated in Section VI. We also provide a comparison of the hardware reduction achieved by the proposed methods with that of conventional CSE methods. Section VII provides our conclusions.
II. CSE

A. Analysis of the HCSE Method
In this section, we discuss the implementation of an LPFIR filter using HCSE and provide an analysis of the issues related to its complexity. An 8-tap LPFIR filter whose coefficients are shown in 16-bit CSD form in Fig. 2 is used as an example to illustrate the HCSE method. The numbers in the first row in Fig. 2 represent the number of bitwise right shifts.
Definition 1 [Multiplier Block Adders (MBA)]:
The adders used in the multiplier block (MB) to compute the sum of partial products formed when $x$ is multiplied with $h_i$ are called MBAs.
Definition 2 (Structural Adders):
The intertap adders used to compute the sum of convolved signals (between each delay stage) are called structural adders (SAs). The number of SAs in a filter structure is the same as the number of distinct delay stages.
It is well known that LPFIR filters are symmetric since their impulse response satisfies the condition

$$h(n) = h(N - 1 - n) \tag{1}$$

where $N$ is the number of taps (filter length). Thus, only $\lfloor N/2 \rfloor$ extra structural adders are required (floor value considered if $N$ is odd) to obtain the filter output corresponding to the symmetric part. If $N_b$ is the number of nonzero bits in the symmetric half coefficient set represented in CSD, it requires $N_b - 1$ adders to obtain $\sum_{i=0}^{\lfloor N/2 \rfloor - 1} x_1 h_i$. Therefore, the number of adders required to implement the filter is given by

$$(N_b - 1) + \lfloor N/2 \rfloor. \tag{2}$$

In the HCS expressions (3) for the coefficient set in Fig. 2, $x_1$ is the input signal and $\gg$ represents the bitwise right shift operation. If $N_{hs}$ is the total number of 2-bit HCS in the symmetric half coefficient set and $N_{as}$ is the number of adders required for distinct HCS, the reduction of adders achieved using HCSE is $N_{hs} - N_{as}$. Hence, the number of adders required to implement the filter using HCSE can be obtained by modifying (2) as

$$(N_b - 1) + \lfloor N/2 \rfloor - (N_{hs} - N_{as}). \tag{4}$$

In this case, $N_{hs} = 13$ and $N_{as} = 2$. According to (4), 21 adders are required to implement the filter. This offers a reduction of 34% over direct implementation without HCSE.
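The adder counts quoted above can be checked with a short sketch of expression (4). The value $N_b = 29$ used below is not stated in the text; it is an assumption obtained by solving (4) for the reported total of 21 adders with $N = 8$, $N_{hs} = 13$, and $N_{as} = 2$. The function names are hypothetical.

```python
def direct_adders(n_b, n_taps):
    """Adders without CSE: (N_b - 1) multiplier-block adders plus floor(N/2)."""
    return (n_b - 1) + n_taps // 2


def hcse_adders(n_b, n_taps, n_hs, n_as):
    """Adders with HCSE: the direct count minus the saving (N_hs - N_as)."""
    return direct_adders(n_b, n_taps) - (n_hs - n_as)


if __name__ == "__main__":
    N_B_ASSUMED = 29  # inferred from the reported total, not given in the text
    direct = direct_adders(N_B_ASSUMED, 8)
    shared = hcse_adders(N_B_ASSUMED, 8, 13, 2)
    print(direct, shared, 100.0 * (direct - shared) / direct)
```

Under this assumption the direct implementation needs 32 adders, and the saving of 11 adders corresponds to the quoted reduction of about 34%.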
B. Adder Complexity
All of the CSE techniques presented in the literature discuss the reduction of hardware at the adder level to show the efficiency of those methods. However, the complexity of each adder is significant in practical implementations with high-speed/low-power requirements. In this section, we analyze the complexity of the adders, since it determines the actual cost of implementation. An adder that adds two n-bit numbers requires n full adders (FAs) to compute the sum. We consider ripple carry adders (RCAs) throughout the paper on account of their low power consumption. The area, power, and speed of an adder depend on the value of n, which is called the adder width. Efforts to optimize these parameters should focus on minimizing the adder width, i.e., the number of FAs. We first derive the expressions for analyzing the complexity of adders in HCSE optimized filters and then compute the number of FAs required to implement them.
Definition 3 (Nonzero Terms):
The subexpressions and the nonzero bits other than the subexpressions of a coefficient are termed as its nonzero terms. For example, the two nonzero terms of a coefficient represented in CSD, (0.101 000 1), are [1 0 1] (CS) and 1 (least significant bit).
Definition 4 (Operands):
The input signal shifted corresponding to the positional weights of the nonzero terms of the coefficient forms the operands of the adders. For instance, in the case of the coefficient (0.101 000 1), the operands are $x_2 \gg 1$ and $x_1 \gg 7$, where $x_1$ is the input and $x_2 = x_1 + x_1 \gg 2$ is the CS [1 0 1]. Note that the number of nonzero terms and the number of operands are identical. The number of adders required to compute the output for a coefficient is equal to one less than the number of operands.
Definition 5 (Span):
The span is analogous to the wordlength, which is equal to the number of bits of an operand. Considering the above example, if $x_1$ is an 8-bit quantized signal, the span of the operand $x_2 \gg 1$ is 11 and that of $x_1 \gg 7$ is 15.
Definition 6 (Adder-Step):
One addition stage in a maximal path of decomposed multiplications is called the adder-step. A multiplication can have different adder-steps, depending on the structure of multiplication.
We employ the high-speed tree structure shown in Fig. 3 to implement the MB. The spans of the operands are indicated as $s_i$ in Fig. 3. The $s_i$'s shown adjacent to the adders represent the adder widths. Using the binary tree in Fig. 3, the number of adder-steps ($A_n$) required to compute the sum of partial products of $n$ operands (nonzero bits of the coefficient) is given by $2^{A_n} \geq n$. From this, we obtain

$$A_n = \frac{\log_{10}(n)}{\log_{10}(2)}. \tag{5}$$
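As a quick check on (5), the following sketch evaluates the adder-step count for a given number of operands. Rounding up to the next integer is an assumption made here so that the result is a whole number of addition stages; it agrees with the four adder-steps reported later for the nine-operand tap. The function name `adder_steps` is hypothetical.

```python
def adder_steps(n_operands):
    """Smallest integer A_n with 2**A_n >= n_operands (depth of a balanced adder tree)."""
    if n_operands < 1:
        raise ValueError("at least one operand is required")
    # (n - 1).bit_length() equals ceil(log2(n)) for every integer n >= 1.
    return (n_operands - 1).bit_length()


if __name__ == "__main__":
    for n in (2, 3, 4, 9, 12):
        print(n, adder_steps(n))
```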
The $A_n$ obtained in (5) is the lowest number of adder-steps (lower bound) possible to achieve in an addition structure since the tree structure considered in our method performs parallel addition. Therefore, our method always results in a minimum adder-step implementation and hence has the lowest delay.

Case I-Odd Number of Operands: Consider the coefficient $h(n) = (0.100\,101\,010\,1)$. If $x_1$ represents the input signal, the output can be expressed as

$$y(n) = x_1 \gg 1 + x_2 \gg 4 + x_2 \gg 8 \tag{6}$$

where $x_2 = x_1 + x_1 \gg 2$ is the HCS corresponding to the bit pattern [1 0 1]. In this case, the number of operands is three (odd) and, hence, two adders are required to compute $y(n)$. If $x_1$ is represented using eight bits, the minimum span (neglecting the carry part) of $x_2$ is 10 and those of the first, the second, and the third operands of (6) are 9, 14, and 18, respectively. For an adder whose operands have spans $s_1$ and $s_2$, such that $s_2 > s_1$, the adder width is $s_2$. There are two possible ways to implement (6), as shown in Fig. 4.

Filter coefficients in CSD form with wordlengths up to 24 bits are considered here. Since no two adjacent bits in CSD are nonzero, a 24-bit CSD number can have a maximum of 12 nonzero bits and, hence, at most 12 nonzero operands can occur in a multiplication. Consider the filter tap shown in Fig. 3 that has an odd number of operands (nine). The total number of FAs required to implement this filter tap is given by the sum of the widths of all adders, i.e., $s_2 + 2s_4 + s_6 + 3s_8 + s_9$. By extending this minimum adder-step structure to 24-bit CSD coefficients, it can be shown that the number of FAs $N_o$ required to compute the output corresponding to a coefficient with $n$ operands can be determined using the expression

$$N_o = s_2 + a_1 s_3 + 2s_4 + a_3 s_5 + s_6 + a_5\,2s_7 + 3s_8 + a_7 s_9 + s_{10} + a_9\,2s_{11} + 2s_{12} \tag{7}$$

where $s_n$ is the span of the $n$th operand and the $a_i$'s are equal to zero except $a_{n-2} = 1$. For instance, if seven operands are present, using (7) we get $N_o = s_2 + 2s_4 + s_6 + 2s_7$. Expression (7) can be represented in matrix form for easier computation of FAs for any coefficient with $n$ operands ($n \leq 11$ since $n$ is odd) as
where $U_H(\lceil k \rceil)$ represents the elements of the $\lceil k \rceil$th row of the matrix and $S$ is the span vector.
Case II-Even Number of Operands: In this case, the number of operands is four (even) and, hence, three adders are required to compute $y(n)$. The possible addition sequences to obtain (9) are shown in Fig. 5(a) and (b). Let the spans of the operands of (9) be $s_1$, $s_2$, $s_3$, and $s_4$, respectively. In (10), $c_0 = 2$ for $n = 6$ and $c_0 = 1$ for $n \neq 6$, and $c_1 = 2$ for $n = 10$ and $c_1 = 1$ for $n \neq 10$.
For example, if six operands are present, it would require $(s_2 + 2s_4 + 2s_6)$ FAs. Using the matrix form, the number of FAs for computing the output is given by
where $S$ is the span vector as in (8).
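The FA expressions in this subsection can be reproduced with a small sketch of a pairwise adder tree. It rests on two assumptions: operands are paired in order at each level with any unpaired operand passed to the next level, and each adder has a width equal to the larger of its two input spans (carry growth neglected, as in the text). The function names are hypothetical, and the pairing rule is inferred from the worked examples rather than taken from Fig. 3.

```python
from collections import Counter


def tree_adder_widths(spans):
    """Widths of all adders in a pairwise tree that sums operands with the given spans."""
    widths = []
    level = list(spans)
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level) - 1, 2):
            width = max(level[i], level[i + 1])  # adder width = larger input span
            widths.append(width)
            next_level.append(width)
        if len(level) % 2 == 1:
            next_level.append(level[-1])  # unpaired operand moves up one level
        level = next_level
    return widths


def full_adders(spans):
    """Total number of full adders, i.e., the sum of all adder widths."""
    return sum(tree_adder_widths(spans))


if __name__ == "__main__":
    # Using s_k = k as a stand-in shows which span each adder inherits.
    for n in (6, 7, 9):
        print(n, sorted(Counter(tree_adder_widths(range(1, n + 1))).items()))
    # Spans of the three operands of the Case I example.
    print(full_adders([9, 14, 18]))
```

With the stand-in spans, the multiplicities printed for n = 6, 7, and 9 correspond to $s_2 + 2s_4 + 2s_6$, $s_2 + 2s_4 + s_6 + 2s_7$, and $s_2 + 2s_4 + s_6 + 3s_8 + s_9$, which are the expressions quoted in the text.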
C. FA Requirements in HCSE Method
The number of FAs (MBA) required to compute the partial products for the filter in Fig. 2 can be determined using (8) and (11) for odd and even numbers of operands, respectively. We consider the first two coefficients in Fig. 2 
Using this method, the total number of FAs required to compute the partial products of the MBAs of the LPFIR filter in Fig. 2 is 376. In the next section, we present an optimization technique that minimizes the number of FAs.
III. OPTIMIZATION OF HCSE METHOD
We observe that several 3- and 4-bit horizontal SS (HSS) can be formed by exploiting identical shifts between an HCS and a nonzero bit or between two HCS, which eliminates redundant computations of HCS. When implementing multiplication using shifts and adds, the adder width can be minimized if the addition is performed prior to the shift. Note that in CSE implementations, the adders employed for CS have shorter widths since the shift operations for obtaining the final partial products are performed after the addition at the CS stage. In the proposed HSSE method, shift operations are performed after additions at two stages: first at the HCS stage and then at the HSS stage. Therefore, the adders at these two stages have shorter adder widths. Utilizing this fact, we shall now show that considerable reduction of FAs is possible by forming HSS from HCS using the HSSE method.
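The effect of adding before shifting can be illustrated numerically under the span model of Section II-B, where an operand shifted right by k bits has a span of the input wordlength plus k and an adder is as wide as its larger input. The shift values 4 and 6 below are arbitrary illustrative choices, and the helper names are hypothetical.

```python
def span(wordlength, shift):
    """Span of the input shifted right by `shift` bits (carry neglected)."""
    return wordlength + shift


def adder_width(span_a, span_b):
    """An adder is as wide as the larger of its two operand spans."""
    return max(span_a, span_b)


WORDLENGTH = 8

# Shift first, then add: (x >> 4) + (x >> 6).
shift_then_add = adder_width(span(WORDLENGTH, 4), span(WORDLENGTH, 6))

# Add first, then shift: form x2 = x + (x >> 2) once and shift the sum by 4.
add_then_shift = adder_width(span(WORDLENGTH, 0), span(WORDLENGTH, 2))

if __name__ == "__main__":
    print(shift_then_add, add_then_shift)
```

In this sketch the adder that forms the subexpression before the common shift is four bits narrower, which is exactly the size of the deferred shift.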
A. HSSE Method
The HSSE procedure is as follows. First, the 2-bit HCS are extracted from the coefficient set represented in CSD. These HCS are then examined for multiple occurrences of identical shifts with a nonzero bit or with another HCS within the same coefficient to form 3- and 4-bit HSS, respectively. Consider the example in Fig. 2, where the HCS are given by (3). Note that the following multiple bit patterns can be formed.
It may be noted that several HSS in "shifted and delayed" forms of (16), (17), and (18) occur in the coefficient set. We observe that several HSSs exist in LPFIR filters, especially in the case where the number of taps is large and the wordlength is higher (16-bit and higher). We have investigated several examples of LPFIR filters with taps ranging from 100 to 1200 corresponding to different stop-band attenuation specifications. The infinite-precision filter coefficients were generated by the Parks-McClellan FIR filter design program provided by the MATLAB "remez" function. Filter coefficients represented in CSD form for wordlengths of 16, 20, and 24 bits were considered. From the extensive examples we worked out, it has been observed that among the possible HSS, the 3-bit expressions [...] (16), (17), and (18), the output of the filter whose coefficients are given in Fig. 2 [...]
Fig. 6 shows the filter structure using the HSSE method. Note that only 17 adders (10 MBA and 7 SA) are required to implement the filter: two for HCS (3), three for HSS (16)-(18), and 12 for the filter output (19). Although 17 additions are present in (19), the symmetry of LPFIR filters means that only twelve adders are sufficient to compute the sum, as shown in Fig. 6. This is because the outputs of adders $A_6$, $A_8$, $A_9$, and $A_{10}$ can be shared by the respective symmetric filter tap pairs. The sharing of symmetric parts is indicated in Fig. 6 using the @ symbol. Thus, by sharing, we can save one adder each corresponding to $A_6$, $A_9$, and $A_{10}$, and two adders corresponding to $A_7$ and $A_8$ (sharing the output of $A_8$ results in a saving of two adders, $A_7$ and $A_8$). The total saving due to sharing is therefore five adders, and only 12 adders are required to obtain (19). Consequently, the HSSE implementation requires four adders fewer than the HCSE implementation. The number of adder-steps required to compute the partial products in the proposed method is four, which is the same as that of the HCSE method. Thus, both methods have identical critical paths of four adder-steps.
B. FA Requirements
The number of full adders required to compute the partial products of the filter in Fig. 6 can be determined using (8) and (11) for odd and even number of operands respectively. Ten adders (A 1 to A 10 ) are required to compute the partial products. Note that the adders A 3 , A 4 , and A5 that compute the HSS part have relatively short adder-width when compared to other adders in subsequent stages. This is because the use of HSS adders allows us to perform most of the "right shift" operations after addition, which is more efficient than the usual "shift and add" method. As a result, fewer FAs are required to compute the partial products. Thus, using the HSSE method, only 253 FAs are required to compute the partial products of the MBAs of the LPFIR filter in Fig. 2 . This is a reduction of 32.7% over the HCSE method. Design examples of implementing channel filters of a wideband receiver using the HSSE method are discussed in Section VI.
IV. OPTIMIZATION OF VCSE METHOD
The VCSE method [8] reduces the number of adders by exploiting the fact that many vertical common subexpressions (VCS) exist in an LPFIR filter, since adjacent coefficients have similar patterns in the MSB part. In this section, we show that the SS technique used for optimizing the HCSE method can also be applied to the VCSE method.
Consider an 8-tap LPFIR filter whose coefficients in 16-bit CSD form are shown in Fig. 7 . In this case, N b = 22 (considering symmetric half coefficients) and N = 8. 
Using these VCS, 18 adders (12 MBA and 6 SA) are required to implement the filter, two for the VCS (20) and 16 for the output. This offers a reduction of 28% over the direct implementation without VCSE.
A. Proposed VSSE Method
The 2-bit VCS used in the VCSE method can be extended to obtain several 3- and 4-bit vertical super-subexpressions (VSS) by exploiting identical shifts between a VCS and a nonzero bit or between two VCS.
Note that several VSSs in "shifted and delayed" forms of (21) and (22) occur in the coefficient set. Employing the VSS, the output of the filter in Fig. 7 is given by (23). The filter structure using the VSSE method is shown in Fig. 8. Only 15 adders are required to implement this filter: two for VCS (20), one each for VSS (21) and (22), and 11 for the filter output (23) after using the symmetry of coefficients. Thus, the VSSE method offers better reduction than the VCSE method. The adder-steps in both methods are identical (four) and, hence, their critical paths are the same. The reduction of FAs, $FA_R$, offered by the VSSE method over the VCSE method can be determined using
where SCS is the span of a VSS, SSD is the span of the shift differential between the VCS of a VSS, m is the number of distinct VSS in the symmetric half coefficient set, and n is the total number of VSS for each distinct VSS set.
We illustrate this using the coefficients of the filter in Fig. 7 . Consider the VSS and h (1) . If the wordlength of x1 is 16 bits, then these VSS have spans 16+7 = 23 and 16+16 = 32, respectively. The spans of the VSS, x 5 , across h(1) and h(2) is 16 + 11 = 27 and that of x 4 across h(2) and h(3) are 16+ 6 = 22 and 16+ 15 = 31. Thus, the sum of spans is 135. The spans of the shift differentials (SSD) of the VSS, x 4 and x 5 , are 18 each. Using (24), it can be found that the proposed VSSE method requires 99 FAs fewer than the VCSE method, which is a reduction of 31%.
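The numbers in this example can be re-derived with a short sketch. It assumes one plausible reading of (24), namely that the FA reduction equals the sum of the spans of all VSS occurrences minus the spans of the shift differentials of the distinct VSS; this reading is an assumption adopted here because it reproduces the stated totals of 135 and 99. The variable names are hypothetical.

```python
WORDLENGTH = 16  # wordlength of x1 in this example

# Shift values read from the text: 7 and 16, then 11 for x5, then 6 and 15 for x4.
vss_shifts = [7, 16, 11, 6, 15]
vss_spans = [WORDLENGTH + shift for shift in vss_shifts]

# One shift-differential span per distinct VSS (x4 and x5), 18 each.
shift_differential_spans = [18, 18]

sum_of_spans = sum(vss_spans)
fa_reduction = sum_of_spans - sum(shift_differential_spans)

if __name__ == "__main__":
    print(vss_spans, sum_of_spans, fa_reduction)
```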
B. Compatibility Issue in VS Methods
An inherent drawback of the VCSE method is that the symmetry of LPFIR filter coefficients cannot be completely exploited for efficient implementation of the filter. In the case of HCSE method, since all the bits forming an HCS exist within the coefficient, its symmetric counterpart can be easily implemented using delays and structural adders. Thus, extra adders (MBA) are not required to compute the symmetric half coefficients when HCSE method is used. However, the bits that form a VCS in VCSE method occur across the coefficients and hence the symmetry is destroyed when the bits are of opposite sign [4] . Hence, in VCSE implementations, extra adders are required to obtain the symmetric part of the coefficients when more than one VCS with bits of opposite sign exist. Since the basic ideas of VCSE and VSSE methods are the same, the limitation inherent in the former exists in the latter also. Therefore, compatible VCS patterns have to be identified to form a VSS. Two VCS (4-bit VSS) or a VCS and a nonzero bit (3-bit VSS) are said to be compatible to form a VSS if its symmetry is not affected, i.e., no extra adders are required to compute the symmetrical part of the LPFIR filter. The signs of the bits in VCS determine the compatibility. We use the notation s(b) to represent "sign of bit" b in defining compatibility. Fig. 9 (c) illustrates 3-bit compatible VSS patterns. Note that the bit 1, which is combined with VCS 101 can be anywhere in the second row. The notation x denotes don't cares, since the bits in these locations will not affect the compatibility of VSS. Any VSS that does not satisfy the conditions mentioned above is incompatible.
Definition 7 (Compatible 4-Bit VSS): Let
We investigated the same examples of LPFIR filters designed using the Parks-McClellan method discussed in the previous section. CSD coefficients with wordlengths of 8, 12, 14, and 16 bits were considered. It has been observed that the most common VSS are the 3-bit VSS, which form around 60% of all the VSS and hence account for the major reduction of adders in the VSSE method. Design examples of the HSSE and VSSE methods are provided in Section VI.
V. EXTENSION OF CSE TO HIGH-LEVEL SYNTHESIS
A. High-Level Synthesis Transformation
In high-level synthesis, the primary goal of transformations has been to optimize the application specific integrated circuit (ASIC) design to reduce cost metrics (area and power) while meeting throughput constraints [11] . The high-level synthesis literature has an extensive coverage of CSE as a powerful transformation to reduce power consumption and area [2] , [12] - [14] . Iqbal et al. [12] used CSE in their algebraic speed-up procedure for throughput improvement. The objective function in [12] was to reduce the critical path. The approaches in [2] , [13] , and [14] focused on a more apparent goal of reducing the number of operations and, therefore, area and power of designs. The significant advancement for the transformation using CSE was achieved by Potkonjak et al. in [13] and [14] . They first formulated the MCM problem in high-level synthesis by considering the multiplications of one variable with several constants at a time and also reduced the number of shifts and additions based on an iterative pairwise matching. Mehendale et al. [2] considered the CSE problem by examining the filter coefficient matrix and the iterative elimination of the most frequently occurring common subexpressions.
In general, the high-level synthesis tasks of the methods in [2] , [13] , and [14] are based on elimination of two nonzero-bit common subexpressions as shown in Fig. 10(a) . The operands, a, b, c, and d in Fig. 10(a) represent the input signal of the filter and its shifted versions. The sums e and f are the common subexpressions that are shared for minimizing adders and s 1 , s 2 , s 3 , s 4 represent the shifts. Note that four adders are required to obtain the final expressions, h and i. Fig. 10(b) illustrates our super-subexpression method, where the super-subexpression g is shared for further reduction of adders to obtain h and i using appropriate shifts. Note that only three adders are required using our method. Thus, by employing the new transformation (super-subexpression), our method improves the efficiency of CSE in high-level synthesis and offers a more power-efficient solution by reducing the number of operations (additions).
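The saving of one adder described above can be sketched as follows. The exact expressions for $h$ and $i$ in Fig. 10 are not reproduced in the text, so the forms used here are assumptions: each output is taken to combine shifted copies of $e$ and $f$, and the two outputs are taken to share the same shift differential. Left shifts on integers are used so that the check is exact, whereas the paper works with right shifts; the function names are hypothetical.

```python
def cse_outputs(a, b, c, d, s1, s2, s3, s4):
    """Two-nonzero-bit CSE: e and f are shared, h and i each need their own adder."""
    e = a + b                      # adder 1
    f = c + d                      # adder 2
    h = (e << s1) + (f << s2)      # adder 3
    i = (e << s3) + (f << s4)      # adder 4
    return h, i, 4


def sse_outputs(a, b, c, d, s1, s2, s3, s4):
    """Super-subexpression: g is formed once and reused through shifts alone."""
    if s2 - s1 != s4 - s3 or s2 < s1:
        raise ValueError("h and i must share the same non-negative shift differential")
    e = a + b                      # adder 1
    f = c + d                      # adder 2
    g = e + (f << (s2 - s1))       # adder 3 (the super-subexpression)
    return g << s1, g << s3, 3


if __name__ == "__main__":
    print(cse_outputs(3, 5, 7, 11, 1, 3, 4, 6))
    print(sse_outputs(3, 5, 7, 11, 1, 3, 4, 6))
```

Both functions return the same pair of outputs for these inputs, with four adders in the first case and three in the second.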
B. Area and Power Reduction
In CMOS technology, there are three sources of power dissipation, arising from switching (dynamic) currents, short-circuit currents, and leakage currents. Among these, the switching component, which is a function of the effective capacitance, plays the most significant role [15]. It is possible to reduce power by employing transformations such as reductions in critical path, number of operations, and average transition activity. These transformations result in architectures that minimize the effective capacitance of the circuit [15]. The basic motivation behind critical path reduction is that the supply voltage can be lowered while keeping the throughput fixed. It can be noted from the design examples of previous sections that the tree-structured (parallel) addition (shown in Fig. 3) adopted in our method results in considerable reduction of the critical path. Moreover, when compared with a chain (serial) implementation, the signal paths are more balanced in a tree implementation and the amount of extra transitions is reduced. For example, the capacitance switched for a chained implementation is a factor of 1.5 larger than the tree implementation for a four-input addition [15]. Thus, the filter structure used in our method is efficient in terms of critical path length and transition activity. Having optimized these two parameters, the most obvious approach for capacitance reduction is to reduce the number of operations (and, hence, the number of switching events) in the data control flow graph. The super-subexpression elimination methods proposed in this paper are efficient transformations that directly reduce the number of operations through the reduction of FAs required for each adder.
We illustrate the area reduction achieved in our method using the example of the 8-tap filter coefficients in Fig. 2. [...] Fig. 2 using HCSE. On the other hand, our HSSE method needs only 253 FAs, which is equivalent to 3658.38 units for the MB. The higher reduction of FAs achieved using our method in the case of channel filters (which possess a large number of taps) employed in wideband receivers significantly reduces the cost metrics of area and power.
VI. DESIGN EXAMPLES
The channel filters of a wideband receiver that operate in the intermediate frequency (IF) have a large number of taps due to their narrow transition band and high sampling-frequency requirements. Therefore, the CSE optimization methods proposed in this paper offer considerable complexity reduction when used to implement the channel filters. We present examples of implementing channel filters for the D-AMPS and the PDC cellular standards using the HSSE and VSSE methods. The proposed optimization methods are compared with conventional 2-bit CSE techniques and the reductions of FAs are determined. Based on the simulation results obtained for filters with different wordlengths, certain guidelines on the choice of HSSE and VSSE methods are also drawn.
A. Example 1
We consider the LPFIR filters employed in the filter bank channelizer of D-AMPS in [16]. Note that decimation by N is moved to the left of the bandpass filters using the noble identity and the sampling rate chosen is 34.02 MHz as in [16]. The channel filters extract 30 kHz D-AMPS channels from the input signal after downsampling by a factor of 350. The pass-band and stop-band edges are 30 and 30.5 kHz, respectively. The peak pass-band ripple specification is chosen as 0.1 dB. The filter stop-band specifications at different frequencies are chosen as in the D-AMPS standard [17]. The lengths of the LPFIR filters required to meet these specifications are determined using [18]

$$N = \frac{-10 \log_{10}(\delta_1 \delta_2) - 13}{14.6\,\Delta f} + 1$$

where $\delta_1$ and $\delta_2$ represent the passband and stopband ripples, and $\Delta f$ is the normalized width of the transition region. We applied the proposed HSSE and VSSE methods to implement the filters using 12-bit and 16-bit CSD coefficients. The 3- and 4-bit SS formed from the 2-bit CS are utilized for optimization. Table I shows a comparison of the number of adders for computing the partial products (PP adders) required for implementing the filters using conventional HCSE and VCSE methods and the proposed methods.
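The filter-length estimate from [18] can be evaluated with a few lines of Python. The ripple and transition-width values used in the example call are purely illustrative placeholders, not the D-AMPS specifications, and the function name `estimate_fir_length` is hypothetical.

```python
import math


def estimate_fir_length(delta1, delta2, delta_f):
    """Length estimate N = (-10*log10(delta1*delta2) - 13) / (14.6*delta_f) + 1.

    delta1, delta2: passband and stopband ripples on a linear scale.
    delta_f: transition width normalized to the sampling frequency.
    """
    if delta1 <= 0 or delta2 <= 0 or delta_f <= 0:
        raise ValueError("ripples and transition width must be positive")
    return (-10.0 * math.log10(delta1 * delta2) - 13.0) / (14.6 * delta_f) + 1.0


if __name__ == "__main__":
    # Illustrative values only: 1% ripples and a transition width of 0.1.
    print(math.ceil(estimate_fir_length(0.01, 0.01, 0.1)))
```

The estimate grows in inverse proportion to the transition width, which is why the narrow transition bands of channel filters lead to the large tap counts discussed in this section.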
We compare the reduction rates of the HCSE, VCSE, HSSE, and VSSE methods with respect to the conventional CSD implementation without using any CS methods. The reduction rates of adders achieved using the proposed methods and the CSE methods for 12-bit and 16-bit wordlengths are shown in Tables II and III, respectively. It can be observed that the VSSE method offers a better reduction rate than the HSSE method when the bit precision of implementation is lower (12-bit). The VSSE technique offers an average reduction of 39.9% for the 12-bit implementation, whereas an average reduction of 43.7% is achieved using the HSSE technique for the 16-bit implementation. Note that both methods require fewer PP adders than the 2-nonzero-bit CSE methods. The number of FAs required for implementation is shown in Table IV. The reduction rates of FAs achieved for 12-bit and 16-bit wordlengths are shown in Tables V and VI, respectively. Both HSSE and VSSE methods result in significant reduction of FAs when compared to HCSE and VCSE methods. For the 12-bit implementation, the VSSE method offers the best reduction (47.2%), whereas the reduction offered by the HSSE method is the best (54.2%) for the 16-bit case.
We also examine the reduction achieved when the proposed methods are used to implement the D-AMPS channelizer, where extraction of each channel requires a separate narrowband filter. The wideband signal considered for channelization consists of 1134 D-AMPS channels, each of 30 kHz spacing. We analyzed the requirement of PP adders to implement the filters for extracting 70, 141, 283, 567, and 1134 channels. The number of filter taps chosen is 1180 and the coefficient wordlength considered is 16 bits to meet the requirement of attenuating blockers that can be potentially 96 dB stronger than the wanted signal. Simulation results shown in Fig. 11 depict the adder reductions achieved using our proposed methods as a function of the number of extracted channels. The percent reductions are shown with respect to the conventional implementation without using any CSE methods. Both the HSSE and VSSE methods offer considerable hardware reduction, and their hardware reduction also improves more rapidly than that of the CSE methods as the number of channels increases.
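The channel counts in this example follow from the sampling rate and channel spacing given in the text, as the short check below shows. Treating the smaller channel counts as successive halvings of 1134 (rounded down) is an observation made here, not a statement from the paper; the constant names are hypothetical.

```python
SAMPLING_RATE_HZ = 34_020_000   # 34.02 MHz wideband sampling rate
CHANNEL_SPACING_HZ = 30_000     # 30 kHz D-AMPS channel spacing
DECIMATION_FACTOR = 350

total_channels = SAMPLING_RATE_HZ // CHANNEL_SPACING_HZ
decimated_rate_hz = SAMPLING_RATE_HZ // DECIMATION_FACTOR

# Successive halvings of the total channel count, rounded down.
channel_subsets = sorted(total_channels // (2 ** k) for k in range(5))

if __name__ == "__main__":
    print(total_channels, decimated_rate_hz, channel_subsets)
```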
B. Example 2
In this example, we consider the channel filters employed in receivers for the PDC standard. The sampling rate of the wideband signal is 25.6 MHz, which covers 1024 channels of 25 kHz spacing. We fix the filter length as 1000 to meet the maximum attenuation requirement of 09 dB, and 24-bit coefficients are considered. The number of PP adders required to implement the filter is shown in Table VII . The requirement of FAs are also shown in Table VII , which shows that the proposed HSSE and VSSE methods offer a minimum reduction of 20% over the CSE methods.
Based on the simulation results, the following guidelines for choosing the best implementation method can be formulated.
1) As in the case of the VCSE method, the coefficient symmetry of LPFIR filters cannot be completely exploited in the VSSE method. Hence, the proposed VSSE method offers better hardware reduction than the HSSE method only when the bit precision of implementation is lower. For larger wordlength implementations, the spans of the operands are also larger and, hence, the HSSE method results in a better reduction of adders.
2) For the 12-bit implementation, the VSSE method results in an average FA reduction of 20% over the VCSE method, whereas for wordlengths of 16 bits and higher, the HSSE method offers an average reduction of 25% over the HCSE method.
VII. CONCLUSION
In this paper, we have proposed two techniques for optimizing the CSE methods to efficiently implement low-complexity LPFIR filters. They are based on the extension of conventional 2-nonzero-bit HCS to form 3- and 4-nonzero-bit HSS by exploiting identical shifts between an HCS and a nonzero bit, or between two HCS. These HSS eliminate redundant computations of HCS and, hence, reduce the number of adders. We have also applied the optimization technique to the VCSE method and formulated the VSSE algorithm. Furthermore, the complexities of the adders are analyzed and expressions for determining the number of FAs required for each adder in a filter are derived. The experimental results show that considerable reduction of FAs can be achieved using the proposed methods. Certain guidelines on the choice of HSSE and VSSE methods are also provided. We have applied the proposed methods to filter bank channelizers, where CS that occur among a bank of filters are utilized. The design examples of channelizers based on the D-AMPS and PDC cellular standards show that the optimization techniques presented in this paper offer an average reduction rate of 50% over conventional channel filter implementations.
