This work presents a novel coefficient mapping method that reduces the area cost of finite impulse response (FIR) filter designs by optimizing the realization of their coefficients. The proposed method, which reduces the area cost and improves the filter performance, consists of four steps: quantization of coefficients, import of parameters, constitution of prime coefficients with parameters, and constitution of residual coefficients with prime coefficients. The method is verified using the 48-tap filter of the IS-95 code division multiple access (CDMA) standard as the benchmark. Experimental results indicate that the proposed design with canonical signed digit (CSD) coefficients can operate at 86 MHz with an area of 241,813 μm², giving a throughput of 1,382 Mbps. Its throughput/area ratio is 5,715 Kbps/μm², which is higher than that of previous designs. In summary, compared with the best previously reported design, the proposed design reduces the total filter area by 5.7%, shortens the critical path delay by 25.7%, and improves the throughput/area ratio by 14.8%.
Introduction
Digital signal processing applications are common in home entertainment systems, television sets, high-fidelity audio equipment, and information systems. The digital filter is an important component that performs mathematical operations on a sampled, discrete-time signal to enhance certain aspects of that signal. A digital filter is characterized by its transfer function. The two types of digital filter are the infinite impulse response (IIR) filter and the finite impulse response (FIR) filter. The transfer function of an IIR filter includes feedback, whereas that of an FIR filter does not. Commonly found in image processing, audio processing, and wireless communications, FIR filters are characterized by a linear phase, an arbitrary magnitude response, and relatively easy implementation. The filter hardware consists of adders, subtractors, shifters, and registers. Many related works [1-9] attempt to reduce the number of these components required in a filter implementation, especially by optimizing the realization of the coefficients. Experimental results demonstrate that the proposed coefficient mapping method performs better than previous designs in terms of the throughput/area ratio.
The rest of this paper is organized as follows. Section 2 briefly describes previous research on filter optimization. Section 3 then describes the coefficient mapping method. Next, Section 4 summarizes the experimental results and compares them with those of previous designs. Conclusions are finally drawn in Section 5, along with recommendations for future research.
Background
2.1. Digital FIR Filter. Digital filters generally vary in coefficients, based on their specifications. The design of coefficients in a filter can be divided into four portions: coefficient selection, coefficient identification, searching algorithm, and coefficient quantization.
(1) Coefficient Selection. Coefficient selection is typically determined by a set of filter specifications and must consider the number of taps, the bit width, and the filter complexity. Depending on the complexity of the coefficients, different algorithms are used to find the common subexpressions (CSs) and eliminate them to obtain the best area reduction.
(2) Coefficient Identification. Coefficients must be encoded to determine the area cost of a filter and the frequency of extracting common subexpressions. The most common encoding is binary. However, binary encoding produces more 1's in the data expression and therefore more calculations in the hardware implementation. Hence, further optimization of the coefficients [10, 11] uses the canonical signed digit (CSD) expression, which eliminates many 1's and uses fewer common subexpressions.
Mathematical Problems in Engineering
(3) Searching Algorithm. The searching algorithm finds common subexpressions to reduce the area cost of the filter. Although many works [1-9, 12, 13] have attempted to find as many common subexpressions as possible, a more complex algorithm may not yield higher performance, especially for coefficients with low complexity.
(4) Coefficient Quantization. Coefficient quantization is an effective means of reducing the number of logic gates while implementing a filter. When the coefficients are quantized for implementation, the commonly used rounding method causes a deviation in the time and frequency responses of the implemented filter from the ideal response. Sensitivity of the filter response is of priority concern when quantizing the coefficients.
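The CSD expression mentioned in item (2) can be illustrated with a short sketch. The function below converts an integer coefficient into signed digits from the set {−1, 0, 1} such that no two adjacent digits are nonzero, which is the defining property of CSD. The function names `to_csd` and `from_csd` and the least-significant-digit-first ordering are assumptions made for this illustration and are not taken from the designs discussed here.

```python
def to_csd(value: int) -> list[int]:
    """Return the CSD digits of an integer, least significant digit first."""
    digits = []
    while value != 0:
        if value % 2 == 0:
            digits.append(0)
        else:
            # Pick +1 or -1 so that the remainder is divisible by 4,
            # which forces the next digit to be 0 (no adjacent nonzeros).
            digit = 2 - (value % 4)  # value % 4 == 1 -> +1, == 3 -> -1
            digits.append(digit)
            value -= digit
        value //= 2
    return digits


def from_csd(digits: list[int]) -> int:
    """Rebuild the integer from CSD digits (least significant digit first)."""
    return sum(d * (1 << i) for i, d in enumerate(digits))


# 7 is 111 in binary (three nonzero bits) but 8 - 1 in CSD (two nonzero digits).
print(to_csd(7))  # [-1, 0, 0, 1]
```

Each nonzero digit corresponds to one shifted operand in a shift-and-add multiplier, so fewer nonzero digits generally means fewer adders or subtractors.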
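2.2. Optimization of the FIR Filter. The equation of the FIR filter can be expressed as (1), and its transfer function can be expressed as (2):

$$y(n) = \sum_{k=0}^{N-1} h(k)\, x(n-k), \tag{1}$$

$$H(z) = \sum_{k=0}^{N-1} h(k)\, z^{-k}. \tag{2}$$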
Parameter N in (1) and (2) denotes the number of taps. This parameter is related to the output of the filter system and its frequency response. Two implementation methods used for a filter are the direct and cascade architectures. Here, the direct architecture is the main concern, especially for enhancing the frequency response of a filter with all zeros. The direct architecture can be further divided into direct and transposed forms. The direct form consists mainly of multipliers, adders, and registers. Figure 1 shows the architecture of the direct form. An N-tap filter in direct form requires N multipliers, N − 1 adders, and N registers.
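As a software illustration of the direct form described above, the following minimal sketch keeps the most recent input samples in a delay line (the registers), multiplies each by its coefficient, and sums the products. The function name `fir_direct_form` and the list-based delay line are assumptions for illustration only; a hardware implementation would use registers, multipliers, and an adder chain instead.

```python
def fir_direct_form(coeffs: list[float], samples: list[float]) -> list[float]:
    """Filter `samples` with a direct-form FIR defined by `coeffs` (h(0) first)."""
    n_taps = len(coeffs)
    delay_line = [0.0] * n_taps  # models the registers holding x(n), x(n-1), ...
    outputs = []
    for x in samples:
        # Shift the newest sample in and drop the oldest one.
        delay_line = [x] + delay_line[:-1]
        # One multiplication per tap, then accumulate the products.
        outputs.append(sum(h * d for h, d in zip(coeffs, delay_line)))
    return outputs


# Feeding a unit impulse returns the coefficients themselves.
print(fir_direct_form([1.0, 2.0, 3.0], [1.0, 0.0, 0.0, 0.0]))  # [1.0, 2.0, 3.0, 0.0]
```

The impulse response equals the coefficient list, which is a quick sanity check for any FIR implementation.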
Coefficients h(0) to h(N − 1) can be expressed as binary numbers, and common subexpressions (CSs) can be found among these binary expressions. Previous studies [1-9] use three kinds of common subexpression elimination (CSE) methods: horizontal, vertical, and mixed searching. The method in [1] uses CSD expressions [2] for the coefficients and a horizontal searching algorithm to find CSs including (1, 0, 1), (1, 0, −1), (1, 0, 0, 1), and (1, 0, 0, −1). This method first optimizes the CS that appears most often and repeats until no CS can be extracted. The method in [3, 7] also uses CSD expressions but applies a vertical search to find CSs including (1, −1) and (−1, 1). By modifying the vertical searching method in [8], the method in [4] first uses a horizontal search to extract the CSs (1, 0, 1), (1, 0, −1), (1, 0, 0, 1), and (1, 0, 0, −1), expressed in CSD format. Thereafter, this method performs a vertical search to extract CSs including (1, 0, 1) and (−1, 0, 1).
In addition to using two horizontal searching methods and one vertical searching method to extract the CSs, the method in [5] uses a multiplier-adder block (MAB) and a structure adder (SA) to construct the CSs and their residues. When two CSs appear equally often, the method in [6] extracts the CS with the smaller bit width. A previous work [8] developed two methods for extracting CSs in CSD format. The first method analyzes CSs of 3, 4, and 5 bits by collecting statistics on how often they appear. The second method scans the coefficients from top to bottom and extracts the CSs between them by vertical search. The method in [9] adds the rule that the depth of logic gates must not be increased when performing the horizontal search as in [4] and the vertical search as in [8].
Once these searching methods have been applied, the CSs are extracted and each CS is computed only once. The extractions therefore reduce the number of calculations in the filter as well as the required number of adders and subtractors. To compare the different searching methods, the 48-tap filter of IS-95 CDMA is selected as the benchmark.
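The first stage of a horizontal search, counting how often each candidate CS appears within the CSD digits of each coefficient, can be sketched as follows. This is only an illustrative counting pass under simplifying assumptions: digits are stored most significant first, overlapping matches are not resolved, and negated patterns are not merged. The names `PATTERNS` and `count_horizontal_patterns` are hypothetical and do not come from the cited works.

```python
from collections import Counter

# Candidate horizontal common subexpressions named in the text.
PATTERNS = [(1, 0, 1), (1, 0, -1), (1, 0, 0, 1), (1, 0, 0, -1)]


def count_horizontal_patterns(csd_rows: list[list[int]]) -> Counter:
    """Count occurrences of each candidate pattern inside every coefficient row."""
    counts = Counter()
    for row in csd_rows:
        for pattern in PATTERNS:
            width = len(pattern)
            # Slide a window of the pattern's width along the digits of one coefficient.
            for start in range(len(row) - width + 1):
                if tuple(row[start:start + width]) == pattern:
                    counts[pattern] += 1
    return counts


rows = [
    [1, 0, 1, 0, 0, -1],  # hypothetical CSD coefficient
    [0, 1, 0, 1, 0, 0],   # hypothetical CSD coefficient
]
counts = count_horizontal_patterns(rows)
print(counts.most_common(1))  # [((1, 0, 1), 2)]
```

A full CSE pass would then replace the most frequent pattern with a shared adder output and repeat the count on the remaining digits.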
Proposed Coefficient Mapping Method
The mapping method divides the coefficients into two parts: primary coefficients and remaining coefficients. The parameters that are set up in the algorithm are ℎ and . The constitution of coefficients has two steps: constitution of primary coefficients and calculation of multiple relations between the primary and remaining coefficients. The operation steps of the mapping method are described as follows.
Step 1. Normalize the -tap coefficients by multiplying 2 . The normalized results are ℎ = × 2 , for 0 ≤ ≤ − 1.
Step 2. Separate normalized coefficients ℎ into two parts, 1 and 2 :
Step 3. Quantize ℎ and ℎ based on ℎ . Equations (4) and (5) show the relations between ℎ , ℎ , and ℎ
Parameters Δℎ and Δℎ are the variances after performing the quantization. Let ℎ = ℎ ( ℎ, ) and let ℎ = ℎ ( ℎ, ).
Equations (4) and (5) thus become the following equations separately:
Step 4. Find a multiple value , that is between ℎ and ℎ , and find a multiple value , that is between ℎ and ℎ .
The relations are shown as
Step 5. Quantize , based on . Equation (10) shows the relation between and , . Parameter Δ , denotes the variance after performing the quantization:
Let , = ( , , ), and (11) shows the relation between , and , :
Step 6. Constitute ℎ and substitute (11) with (9). The relation between ℎ , ℎ , and , becomes
Step 7. Revert the parameters ℎ and ℎ by substituting (9) and (12) with (7). Equations (13) and (14) show the relations as follows:
Let ℎ = ℎ × , , and (14) is then modified to
According to the variance, the calculation error and correct rates can be expressed as
Substitute the variances of ℎ and ℎ into (16). The error rate can be obtained, for 0 ≤ ≤ − 1. 
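The general idea of the steps above, building a remaining coefficient as a quantized multiple of a quantized primary coefficient and then measuring the resulting error rate, can be illustrated with a rough sketch. The sketch rests on several assumptions that are not confirmed by the text: both parameters are treated as quantization step sizes, quantization is taken to be rounding to the nearest multiple of the step, and the error rate is taken to be the relative deviation from the original coefficient. The names `quantize`, `map_coefficient`, `coeff_step`, and `multiple_step` and the sample coefficient values are hypothetical; only the values 32 and 0.25 correspond to the parameter settings reported in this work.

```python
def quantize(value: float, step: float) -> float:
    """Round `value` to the nearest multiple of `step` (assumed quantization rule)."""
    return round(value / step) * step


def map_coefficient(h_remaining: float, h_primary: float,
                    coeff_step: float = 32, multiple_step: float = 0.25):
    """Approximate a remaining coefficient as (quantized primary) x (quantized multiple).

    Assumes the quantized primary coefficient is nonzero.
    """
    hp_q = quantize(h_primary, coeff_step)      # quantized primary coefficient
    hr_q = quantize(h_remaining, coeff_step)    # quantized remaining coefficient
    multiple = quantize(hr_q / hp_q, multiple_step)  # quantized multiple between them
    approx = hp_q * multiple                    # remaining coefficient rebuilt from the primary
    error_rate = abs(h_remaining - approx) / abs(h_remaining)
    return hp_q, multiple, approx, error_rate


# Hypothetical normalized coefficients, chosen only to exercise the sketch.
hp_q, multiple, approx, error_rate = map_coefficient(h_remaining=2470, h_primary=1000)
print(hp_q, multiple, approx)  # 992 2.5 2480.0
```

With a step of 0.25 for the multiple, the product can be formed from the primary coefficient with a small number of shifts and additions, which is one plausible reason for choosing such a value when the goal is low hardware complexity.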
Before performing the mapping method, this work first sets up two parameters in which ℎ equals 32 and equals 0.25 to reduce the hardware design complexity. After Steps 1 to 7 of the proposed method are performed, Table 1 lists the generated ℎ values. Figure 2(a) shows the frequency responses, poles, and zeros distributions of the original filter. Figure 2
Experimental Results
For comparison, Table 2 lists various 48-tap filter designs for IS-95 CDMA. These filters generate 16-bit output data in one clock cycle. The table shows the architecture- and gate-level information of the filters. The architecture-level information gives the number of adders, subtractors, and registers used for implementation. The gate-level information includes the area cost, critical path delay, throughput, and throughput per area. At the architecture level, the original filter with binary coefficients has the largest area cost of all the designs. Among the previous designs, the design in [7] uses the fewest adders and subtractors in total. Besides having the smallest area cost, the proposed design also achieves the highest throughput and the highest throughput/area ratio in gate-level synthesis. Table 2 lists two versions of the proposed filter, with coefficients expressed in binary and CSD formats. At the architecture level, the proposed filter with CSD coefficients requires only 56 adders and subtractors in total, fewer than any of the previous designs. The best previous design is the method in [7], which has an area of 255,450 μm² and achieves a throughput/area ratio of 4,979 Kbps/μm². The proposed design with CSD coefficients can operate at 86 MHz with an area of 241,813 μm², giving a throughput of 1,382 Mbps. Its throughput/area ratio is 5,715 Kbps/μm², the highest among all the designs compared. In summary, compared with the best design in [7], the proposed design reduces the total filter area by 5.7%, shortens the critical path delay by 25.7%, and improves the throughput/area ratio by 14.8%.
Conclusions
This work has developed a novel filter design based on a coefficient mapping method. The proposed method reduces the area cost by finding the primary coefficients and using them to construct the remaining coefficients. In this way, a few coefficients suffice to construct all of the filter coefficients. Experimental results demonstrate that the proposed design with binary or CSD coefficients reduces the area cost and improves the throughput/area ratio compared with previous designs. Implementation results further demonstrate that the proposed design has the highest throughput with the lowest area cost.
