This paper proposes a novel algorithm for the design and hardware reduction of a class of multiplier-less two-channel perfect reconstruction (PR) filter banks (FBs) using sum-of-powers-of-two (SOPOT) coefficients. It minimizes a more realistic hardware cost, such as adder cells, subject to a prescribed output accuracy, taking into account rounding and overflow effects, instead of using just the number of SOPOT terms as in conventional methods. Furthermore, by implementing the filters in the FBs using a multiplier block (MB), significant overall savings in hardware resources can be achieved. An effective random search algorithm is also proposed to solve the design problem; it is also applicable to PR IIR FBs with highly nonlinear objective functions.
I. INTRODUCTION

Perfect reconstruction (PR) multirate filter banks (FBs) have important applications in signal analysis, signal coding and the design of wavelet bases. A number of techniques for designing linear-phase and low-delay PR two-channel filter banks are now available [1] [2] [3]. Recently, there has been increasing interest in designing PR filter banks with very low implementation complexity. One of the applications is to provide an efficient hardware implementation of the 9/7 wavelet filter for the JPEG2000 standard. FBs using sum-of-powers-of-two (SOPOT) coefficients are particularly attractive for VLSI or hardware implementation because multiplication by SOPOT coefficients can be implemented efficiently using hard-wired shifters and adders only (i.e. multiplier-less). The design of such SOPOT PR FBs using the 2-channel lossless lattice structure and a genetic algorithm was studied in [6]. Another family of multiplier-less PR two-channel FIR/IIR FBs and wavelets, using SOPOT coefficients and the structure in [1], was studied recently by the authors in [4] [7]. They are attractive because of their low hardware and design complexities. Furthermore, the PR condition is structurally imposed and is robust to coefficient quantization.

It is well known that there are two sources of error in implementing a digital filter: coefficient round-off error and signal round-off error [10]. Coefficient round-off error happens when the real-valued coefficients of the filter, obtained say by the Parks-McClellan algorithm, are rounded to their fixed-point representations to simplify the hardware implementation. The frequency response of the filter is therefore changed, and might no longer satisfy the specification. Signal round-off error, on the other hand, occurs when overflow occurs due to insufficient internal wordlength and improper scaling, and when rounding is performed on long intermediate data after multiplication with the filter coefficients.
Signal round-off error is usually more difficult to handle in hardware implementation because complicated hardware for detecting overflows, etc., would significantly slow down the throughput of the system. The SOPOT FBs mentioned above are free from coefficient round-off noise because the FBs are optimized using the SOPOT coefficients as variables. Unfortunately, most of these methods focus only on minimizing the number of SOPOT terms to meet a given frequency specification, and pay little attention to signal round-off error. In order to satisfy a given output accuracy, one usually employs a fixed and long wordlength for all intermediate data, which means increased hardware complexity.
Therefore, the design problem should be to minimize the hardware complexity of the system while satisfying the given frequency specification and the output accuracy. The hardware complexity could be the number of adder cells and registers used in the FBs, which is related to the exact wordlength used for each intermediate data. The output accuracy of a digital filter is usually specified statistically by its output noise power due to the rounding operations performed, using a given noise model. For fine quantization, round-off noise is usually modeled as white and uncorrelated with the signal and other noise sources. To satisfy a given output accuracy (say 16-bit), one has to determine the appropriate scaling and quantization wordlength of each intermediate data to avoid signal overflow and to achieve a noise power less than the given specification (say -96 dB for 16-bit accuracy).
The purpose of this paper is to provide a solution to the above problem. [...] can also be designed using the proposed method. Our paper is organized as follows: in Section II, the SOPOT FBs considered and the MB technique will be described. The round-off noise and overflow problems will be addressed in Section III. Section IV is devoted to the 'random search' design algorithm. This is followed by several design examples in Section V. Finally, conclusions are drawn in Section VI.

0-7803-7011-2/01/$10.00 ©2001 IEEE

II. 2-CHANNEL PR SOPOT FB

[...] be to the original real number. In practice, the number of non-zero terms is usually kept small while satisfying a given specification, so that the multiplication can be implemented as a limited number of shift and add (subtract) operations, giving rise to a multiplier-less realization. Multiplier-less filter banks and wavelet bases with linear phase and low system delay can be obtained from this structure by searching for the SOPOT coefficients using the genetic algorithm [6] [7]. As mentioned earlier, the number of adders needed to implement α(z) and β(z) can be further reduced by rewriting them in transposed form. It can be seen that, instead of multiplying the delayed input samples with the filter coefficients as in the direct form, the input sample is now multiplied with all the coefficients. This can be efficiently implemented using a multiplier block (MB) [9].

Let us consider a simple example with two filter coefficients: 3 and 21. The SOPOT representations of these two numbers are 3 = 2^1 + 1 and 21 = 2^4 + 2^2 + 1. This requires 3 adders and 3 shifts. If implemented in an MB, the multiplication of the input with the coefficient 3 will also be generated by decomposing 3 as 2^1 + 1, requiring one addition. The multiplication with 21, however, can be simplified by re-using the intermediate result generated by the first filter coefficient '3', since 21 = 3 × 7 = 3 × (2^3 - 1). That is, the intermediate result, after multiplication by 3, is multiplied by 7, which requires one less adder than generating 21 directly. In principle, it is possible to remove all the redundancy found in the constant multipliers, leading to a realization with the minimum number of adders. This can drastically reduce the number of adders required for realizing such FBs when there is a large number of filter coefficients to be implemented in the transposed form FIR structure (around 50% in our example).
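The adder saving in the 3 and 21 example can be checked with a short sketch. The function names `direct_form` and `multiplier_block` are hypothetical and only illustrate the shift-and-add arithmetic on integers; they are not the MB generation algorithm of [9].

```python
def direct_form(x):
    """Multiply x by 3 and by 21 independently, using shifts and adds only."""
    y3 = (x << 1) + x              # 3x  = 2x + x        -> 1 adder
    y21 = (x << 4) + (x << 2) + x  # 21x = 16x + 4x + x  -> 2 adders
    adders = 3
    return y3, y21, adders


def multiplier_block(x):
    """Multiply x by 3 and by 21, re-using the intermediate product 3x."""
    y3 = (x << 1) + x              # 3x  = 2x + x        -> 1 adder
    y21 = (y3 << 3) - y3           # 21x = 3x * (8 - 1)  -> 1 subtractor
    adders = 2
    return y3, y21, adders
```

Both functions return the same products; the shared intermediate result is what removes one adder in the second version.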
III. ROUND-OFF NOISE AND OVERFLOW ANALYSES

Analysis of Round-off Noise
As mentioned earlier, round-off noise occurs when rounding is performed during arithmetic computation. In fixed-point arithmetic, the round-off operation is usually performed after multiplication to limit the wordlength of the intermediate data in order to save hardware resources. Round-off error is thus generated. Because rounding errors are difficult to analyze exactly, they are usually treated as a white random process, uncorrelated with the signal and other noise sources. For the rounding operation, the quantization noise has zero mean and variance σ² = Δ²/12, where Δ is the quantization step size, which is determined by the number of fractional bits retained after multiplication.

Consider the transposed form FIR filter in Fig. 2. The blocks D and Q(.) represent respectively a register and the round-off operator. Any signal in this filter, for example the input signal x[n], has a fixed-point representation of the form <n | m>, which means that the total wordlength is n + m bits, where n is the number of integer bits (including the sign bit) and m the number of fractional bits. For notational convenience, any signal will be represented as x[n]: <n | m>, meaning that it has n integer bits and m fractional bits. Now, consider the input sample x[n]: <1 | 7>, which is an 8-bit number gated into the digital filter at every clock cycle. In general, to have 16-bit output accuracy, the output noise power must be below the -96 dB level. From these results, we can see that the larger the number of noise sources, the lower the accuracy of the computation. The noise power can, however, be reduced by increasing the wordlength of the fractional part, at the expense of increased hardware complexity.
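The trade-off between the number of noise sources and the fractional wordlength can be illustrated with a minimal sketch. It assumes the white-noise model above with step size Δ = 2^(-m) for m fractional bits, and additionally assumes that every noise source reaches the output with unity gain, which is a simplification; the function names are hypothetical.

```python
import math


def rounding_noise_variance(frac_bits):
    """Variance of one rounding-noise source: step**2 / 12, step = 2**-frac_bits."""
    step = 2.0 ** (-frac_bits)
    return step * step / 12.0


def output_noise_db(frac_bits, num_sources):
    """Total noise power in dB, assuming uncorrelated unity-gain sources."""
    total = num_sources * rounding_noise_variance(frac_bits)
    return 10.0 * math.log10(total)


def min_frac_bits(num_sources, spec_db=-96.0, max_bits=64):
    """Smallest number of fractional bits whose total noise is below spec_db."""
    for m in range(max_bits + 1):
        if output_noise_db(m, num_sources) < spec_db:
            return m
    raise ValueError("specification not reachable within max_bits")
```

Under these assumptions, each doubling of the number of noise sources raises the output noise by about 3 dB, while each extra fractional bit lowers it by about 6 dB.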
Preventing Overflow
Another important source of error is signal overflow [10], which occurs when the wordlength allocated to the integer part is insufficient to represent correctly the fixed-point output of an addition (such as the adders in Fig. 2). In order to avoid overflow, we must allocate more bits to the integer part of the register (say D in Fig. 2). We are given the option to retain or decrease the number of bits in the fractional part, depending on the required accuracy. To determine whether overflow will occur for a given adder, we can compute certain measures of the transfer function from the input to this particular adder. Here, we prefer a more conservative measure, the absolute sum of the impulse response, to determine the required integer wordlength at each position to avoid any overflow. The number of fractional bits will be optimized to satisfy the given output accuracy. It should be noted that there are other scaling methods, such as L2 scaling, which can also be used. With such methods, however, there is still a small probability that overflow will occur. In digital signal processors, special hardware is usually used to detect the presence of overflow, and the result is clipped to the maximum/minimum values of the representation (saturation arithmetic).
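A minimal sketch of the conservative absolute-sum rule is given below. It assumes a two's-complement format in which the n integer bits include the sign bit, and that `impulse_response` holds the taps of the transfer function from the filter input to the adder under consideration; the function name is hypothetical.

```python
import math


def integer_bits_needed(impulse_response, input_int_bits):
    """Integer bits (sign included) at an adder output, from the L1-norm bound.

    The output magnitude is bounded by sum(|h[k]|) times the input magnitude,
    so the integer part must grow by ceil(log2(sum(|h[k]|))) bits.
    Boundary case not handled here: if the input can take its most negative
    value and the absolute sum is an exact power of two, one more bit may be
    required.
    """
    l1_norm = sum(abs(h) for h in impulse_response)
    if l1_norm <= 1.0:
        return input_int_bits          # no growth needed
    return input_int_bits + math.ceil(math.log2(l1_norm))
```

For an input of format <1 | 7> and taps whose absolute values sum to 2.25, the sketch returns 3 integer bits, that is, two bits of growth.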
Our design method consists of two parts. First, the parameters of the filters α(z) and β(z), such as their coefficients and their orders (parameters N and M), are determined from the frequency specification (system delay, stopband attenuation, cutoff frequencies) using the method in [4]. Then, the SOPOT coefficients are determined using a random search algorithm to generate the MB (see 1 below). The hardware complexity of the FBs is then minimized while maintaining the output accuracy, using the noise models mentioned earlier (see 2 below).
Search for the SOPOT filter coefficients.
The optimization procedure consists of two stages. First, a random search algorithm, to be discussed in the sequel, is used to search for the SOPOT coefficients of α(z) and β(z) such that a given performance measure is minimized. Then, the minimum number of adders needed in the multiplier block is determined. The generation of the multiplier block from the SOPOT coefficients follows the algorithms proposed in [9]. Let x_0 be the vector containing the real-valued coefficients of α(z) and β(z) obtained by the method in [4]. The principle of the random search algorithm is to generate random candidate SOPOT coefficients in the neighborhood of x_0 so as to search for the optimal discrete solution. More precisely, a new coefficient vector x_new is generated by adding a scaled random vector to the original coefficient vector,

x_new = x_0 + a·x_R, (1)

where a is a scale factor which controls the size of the neighborhood to be searched and x_R is a vector whose elements are random numbers within a given range.
The process is repeated with different vectors x_R so that the SOPOT space in the neighborhood of x_0 is sampled randomly. Since the sampled solutions are close to the real-valued optimal solution, their frequency responses will also be close to the ideal one, but with different hardware complexity. The set that yields the minimum score with a given number of terms is recorded. As this is a random search algorithm, the longer the search time, the higher the chance of finding the optimal solution.
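The random search described above can be sketched as follows. Several details are assumptions made only for illustration: the random vector is drawn uniformly from [-1, 1], candidates are mapped to SOPOT form by a simple greedy signed power-of-two quantizer, and `score` is a user-supplied performance measure. None of these choices is taken from the paper, and all function names are hypothetical.

```python
import math
import random


def to_sopot(value, max_terms, min_exp=-8):
    """Greedy signed power-of-two expansion: list of (sign, exponent) terms."""
    terms = []
    residual = value
    for _ in range(max_terms):
        if abs(residual) < 2.0 ** (min_exp - 1):
            break                      # residual below the finest resolution
        exp = max(min_exp, round(math.log2(abs(residual))))
        sign = 1 if residual > 0 else -1
        terms.append((sign, exp))
        residual -= sign * 2.0 ** exp
    return terms


def sopot_value(terms):
    """Numerical value represented by a list of (sign, exponent) terms."""
    return sum(sign * 2.0 ** exp for sign, exp in terms)


def random_search(x0, score, scale, max_terms, iterations, seed=0):
    """Sample SOPOT vectors near x0 and keep the one with the lowest score."""
    rng = random.Random(seed)
    best = [sopot_value(to_sopot(c, max_terms)) for c in x0]
    best_score = score(best)
    for _ in range(iterations):
        # Perturb each real-valued coefficient, then quantize to SOPOT form.
        candidate = [
            sopot_value(to_sopot(c + scale * rng.uniform(-1.0, 1.0), max_terms))
            for c in x0
        ]
        candidate_score = score(candidate)
        if candidate_score < best_score:
            best, best_score = candidate, candidate_score
    return best, best_score
```

In a filter bank design, `score` would typically measure the deviation of the frequency response from the specification; any callable that maps a coefficient list to a number can be used with this sketch.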
Minimization of the filter bank hardware structures with prescribed output accuracy
After the MB is generated, the maximum wordlength of all the products, Fig. 2, is calculated. If we do not perform any rounding using the operator Q(.), and sufficient wordlength is allocated to all adders, then there is no rounding error. Of course, this will require excessive hardware cost, especially when the required output accuracy is low. Our goal is to determine the [...] Given the rounded output format of the MB, δ, one can determine, using the method described in Section III.2, the formats of the registers, D's, and the structure of the adders, in order to avoid any overflow. The fractional part of each output scaled to prevent overflow can either retain its wordlength or have it reduced by one, as mentioned in Section III.2. This option is stored in a vector δ_s, to be optimized together with δ. The noise power at the filter output is readily computed according to the analysis described in Section III.1. Note that the output noise powers from α(z) and β(z) will be evaluated, and their contributions at the lowpass (and highpass) analysis filters will be properly summed, using their respective power transfer functions mentioned in Section III.1. Our design algorithm seeks to lower the wordlength of each intermediate data, in the format specified by δ and δ_s, and hence to minimize the hardware cost. Using δ and δ_s, the hardware cost, C, given by the adder cells in the MB and the subsequent adders in Fig. 2, can be evaluated. In summary, the design problem is

minimize C(δ, δ_s) over (δ, δ_s), subject to P_noise ≤ P_spec,

where P_noise is the output noise power at the lowpass and highpass filters and P_spec is the specified output accuracy. Using a random search algorithm similar to that mentioned in Section IV.1, the vector (δ, δ_s) is searched in the neighborhood of its full-precision value (that is, no rounding) for feasible solutions that satisfy the given output accuracy. The one with the minimum hardware cost C(δ, δ_s) is declared the solution of this problem.

There are several advantages of this algorithm. First of all, with the computational power of today's personal computers (PCs), the time for obtaining high-quality solutions is manageable, especially when an initial real-valued solution is available by some means. In fact, for the problem considered here, the overall design time is less than 10 minutes using a Pentium-400 PC with Matlab 5.3, including the design of the SOPOT coefficients, the generation of the MB and the internal wordlength allocation. Secondly, it is applicable to problems with general objective functions, possibly with very complicated inequality constraints, as illustrated in this work. It is also possible to combine the search with the MB generation process for better performance, but the computational time will be greatly increased. We now present a few design examples.
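The constrained wordlength search of this section can be sketched in simplified form. The sketch below makes several assumptions that are not taken from the paper: only the fractional wordlengths of the MB products are searched (the retain-or-reduce option vector is omitted), the hardware cost is approximated by the total number of bits, and each rounded product contributes a unity-gain white-noise source. All names are hypothetical.

```python
import random


def noise_power(frac_bits, full_bits):
    """Sum of step**2 / 12 over products that are actually rounded."""
    return sum(
        (2.0 ** (-2 * m)) / 12.0
        for m, full in zip(frac_bits, full_bits)
        if m < full                    # unrounded products add no noise
    )


def hardware_cost(frac_bits, int_bits):
    """Cost proxy: total wordlength (integer plus fractional bits)."""
    return sum(n + m for n, m in zip(int_bits, frac_bits))


def search_wordlengths(full_bits, int_bits, spec_db, iterations, max_drop, seed=0):
    """Random search near full precision for the cheapest feasible wordlengths."""
    rng = random.Random(seed)
    spec = 10.0 ** (spec_db / 10.0)
    best = list(full_bits)             # full precision is always feasible
    best_cost = hardware_cost(best, int_bits)
    for _ in range(iterations):
        candidate = [max(0, f - rng.randint(0, max_drop)) for f in full_bits]
        if noise_power(candidate, full_bits) > spec:
            continue                   # violates the output accuracy
        cost = hardware_cost(candidate, int_bits)
        if cost < best_cost:
            best, best_cost = candidate, cost
    return best, best_cost
```

Because infeasible candidates are simply discarded, the same loop structure accommodates other inequality constraints without changes to the search itself.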
V. DESIGN EXAMPLES
1. Two-channel PR FBs with β(z) and α(z) FIR Filters
To demonstrate the effectiveness of our algorithm for solving the complicated design problem, a two-channel FB with the following frequency specification is designed: passband and stopband cutoff frequencies %= 0. 47 n [...]

VI. CONCLUSION

A novel algorithm for the design and hardware reduction of a class of multiplier-less two-channel PR FBs using SOPOT coefficients is presented. It minimizes a more realistic hardware cost, such as adder cells, subject to a prescribed output accuracy, taking into account rounding and overflow effects. Further, by implementing the filters in the FBs using a multiplier block (MB), significant overall savings in hardware resources can be achieved. An effective random search algorithm is also proposed to solve the design problem, which is also applicable to PR IIR FBs with highly nonlinear objective functions.
Design Results
