Abstract - The ANSI S1.11 1/3-octave filter bank is well suited to digital hearing aids, but its large group delay and high computational complexity hinder its practical use. This study presents a 10-ms 18-band quasi-ANSI S1.11 1/3-octave filter bank for processing 24 kHz audio signals. We first discuss a filter order optimization algorithm to define the quasi-ANSI filters. The group delay of each filter is constrained to 10 ms. The proposed design adopts an efficient prescription-fitting algorithm to reduce inter-band interference, enabling the proposed quasi-ANSI filter bank to compensate any type of hearing loss (HL) using the NAL-NL1 or HSE prescription formulas. Simulation results reveal that the maximum matching error in the prescriptions of mild HL, moderate HL, and severe-to-profound HL is less than 1.5 dB. This study also investigates a complexity-effective multirate IFIR quasi-ANSI filter bank. For an 18-band digital hearing aid with a 24 kHz sampling rate, the proposed architecture eliminates approximately 93% of the multiplications and up to 74% of the storage elements, compared with a parallel FIR filter architecture. The proposed analysis filter bank (AFB) was designed in UMC 90 nm CMOS high-VT technology, and on the basis of post-layout simulations, it consumes 73 µW. With voltage scaling (to 0.6 V), the simulation results show that the power consumption decreases to 27 µW, which is approximately 30% of that consumed by the most energy-efficient AFB available in the literature for use in hearing aids.
Introduction
Hearing loss [1]-[3] can be characterized as conductive, sensorineural, and mixed hearing loss. Conductive hearing loss means that sound is not conducted well through a disordered outer or middle ear. Sensorineural hearing loss (SNHL) means that the sensory cells in the cochlea are absent or not functioning appropriately. If both conductive and sensorineural losses are present, the result is mixed hearing loss. Conductive hearing loss can be recovered with adequate treatment, but most people with SNHL are fitted with hearing aids. SNHL can degrade the functions of the human ear in several different ways and introduce phenomena such as a raised hearing threshold, a decreased and squeezed hearing range, reduced temporal and spectral resolution, and the loss of noise tolerance [1]. These factors make hearing aids more complex than simple sound amplifiers.
Audiologists usually identify and diagnose hearing loss with the pure tone audiogram (PTA) test, which uses sinusoidal signals over octave frequencies from 250 Hz to 8 kHz to measure the minimum audible sound levels (i.e., hearing thresholds). The results of the PTA test are generally recorded on an audiogram. Fig. 1 demonstrates a typical example of moderate-to-severe hearing loss. Fitting hearing aids usually requires a prescription formula. The widely used NAL-NL1 [4], or the HSE for Chinese [5], generates the ideal electro-acoustic response (i.e., the gain-curve) of a hearing aid. The gain-curve specifies the insertion gain, or the amplification, at each standard 1/3-octave frequency from 150 Hz to 8 kHz. The goal of the NAL-NL1 is to maximize speech intelligibility while maintaining the loudness of the amplified sound equal to, or less than, that perceived by people with normal hearing. The NAL-NL1 produces different gain-curves for different input sound pressure levels (SPLs) (e.g., 40, 50, 60, 65, 70, 80, and 90 dB). The right side of Fig. 1 illustrates the example prescription for a 40 dB SPL input level.
Advanced hearing aids are currently battery-powered digital devices consisting of a microphone, a digital signal processing (DSP) circuit, and a receiver (i.e., the loudspeaker) [1]-[3]. The microphone and the receiver perform the transformation between acoustic and electrical signals. The DSP circuit performs sophisticated functions, including the auditory compensation algorithm to overcome the hearing loss, and noise reduction and feedback cancellation to improve speech quality and intelligibility. The DSP circuit also uses adaptive directional microphones and spectral shaping for speech enhancement. According to Kates [3], a DSP block performing the entire set of DSP functions typically consumes up to 61% of the overall power budget of a digital hearing aid. One common approach to realizing the auditory compensation algorithm, which makes sound audible for the hearing-aid wearer, is to employ an analysis filter bank (AFB) followed by sub-band amplification, multichannel wide dynamic-range control (WDRC), and a synthesis filter bank (SFB) [1]-[3], [6], [14]. A low-power Mandarin-specific hearing aid test chip was recently implemented in UMC 90 nm CMOS technology with high-VT standard cells [6]. The test chip contains an 18-band filter bank, a 3-channel WDRC auditory compensator, and multi-band noise reduction with entropy-enhanced voice activity detection (VAD). The power consumed by the AFB is approximately 27% of the total power in [6].
Filter banks proposed for hearing aids include octave-band [9]-[11], critical-band [12], symmetric-band [13], and 1/3-octave-band [14] filter banks. A 7-band octave filter bank was designed in [9] and [10] using the interpolated FIR (IFIR) filter technique. Lian and Wei [11] proposed an 8-band octave filter bank with the IFIR and frequency-response masking (FRM) techniques to reduce the computational complexity. Chong et al. designed a critical-band filter bank to match human perception well [12]. However, the irregular property of the critical bands makes their implementation difficult. In addition to [11], Wei and Lian proposed a 16-band symmetric filter bank [13] that guarantees high frequency resolution in both the high and low frequency regions but rather low resolution in the middle frequency region. Kuo et al. recently proposed an efficient 18-band ANSI S1.11 1/3-octave filter bank [14]. This 1/3-octave filter bank is suitable for both the NAL-NL1 and HSE prescription formulas for hearing aids.
To design a filter bank for hearing aids, the frequency response should match the prescription as closely as possible. Suppose that the prescriptions by NAL-NL1 or HSE in Fig. 1 are the target specification, and we evaluate the matching capability of different types of filter bank (Fig. 3). Further assume that the filter banks in Fig. 3 possess 18 bands and that the prescription-fitting algorithm, described in Section II, is applied to minimize the matching error. The uniform filter bank has equally spaced sub-band bandwidths, which results in a fixed frequency resolution. Its low resolution in the low frequency region contributes the maximum matching error, which is approximately 8.4 dB. The symmetric filter bank, on the other hand, has a rather low frequency resolution in the middle frequency region, where its maximum matching error of 6.2 dB appears. By matching human hearing characteristics, the critical-like filter bank reduces the maximum matching error to 3 dB. Finally, the 18-band ANSI S1.11 1/3-octave filter bank achieves zero matching error because the frequency sampling points of NAL-NL1 or HSE are the same as the central frequencies of the ANSI filter bank [15].
Filters usually cause delays in the datapath of the hearing aid. Although the 1/3-octave filter bank has the best matching capability, it suffers from a 78 ms delay when processing 24 kHz audio [14]. The delay of the 1/3-octave filter bank is still up to 27 ms if parallel minimum-phase infinite-impulse response (IIR) filters are applied [14]. This is because the sharp transition bandwidth of the ANSI filter is defined in a low frequency region [15]. Except for the ANSI filter bank, the filter banks in Fig. 3 have delays of approximately 10 ms. The matching capability of different filter banks obeys the acoustic uncertainty principle, which states that the time-bandwidth product is constant. That is, if spectral resolution increases, temporal resolution decreases, and vice versa.
Hearing aids transmit signals into the ear canals through two different paths. One is the directly received sound and the other is the sound processed by the hearing aid.
Fig. 3. Matching capability comparisons for different types of filter bank
Previous studies have investigated the acceptable delay introduced by the hearing aid. The general requirement of a delay below 12 ms [2] prevents visual cues from becoming unsynchronized with the sound heard. Stone and Moore [16], [17] indicated that a delay of 20-30 ms can be judged as objectionable for mild-to-moderate hearing loss. The popularity of open-canal (OC) fitting hearing aids, which leave the ear canal much more open than traditional close-fitting ear-molds, makes hearing aid delays even more of a concern. In the OC fitting hearing aid, more sound travels directly into the ear canal. A delay of approximately 10 ms might create a comb filter effect [18], [19] if the direct-path signal amplitude is comparable to that produced by the hearing aid, although this will not be the case at most frequencies.
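The comb filter effect mentioned above can be made concrete with a two-path model. The following is a sketch under stated assumptions, not a result from the cited studies: the symbol a (the amplitude of the processed path relative to the direct path, taken as positive) and the single-delay model are introduced here for illustration only.

```latex
% Assumed model: direct sound plus one processed copy delayed by \tau.
% a > 0 is a hypothetical relative amplitude of the processed path.
\begin{align}
y(t) &= x(t) + a\,x(t-\tau), \\
|H(f)|^2 &= 1 + a^2 + 2a\cos(2\pi f \tau), \\
f_k^{\mathrm{notch}} &= \frac{2k+1}{2\tau}, \qquad k = 0, 1, 2, \dots
\end{align}
% For \tau = 10 ms the notches fall at 50, 150, 250 Hz, ..., spaced 1/\tau = 100 Hz apart.
% Peak-to-notch ratio: 20\log_{10}\frac{1+a}{|1-a|} dB, which grows without bound as a \to 1.
```

Under this model the notches are deep only where a is close to 1, which is consistent with the remark that the effect matters only when the two path amplitudes are comparable.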
Starting from the high-performance ANSI 1/3-octave filter bank, this study designs and implements a relaxed version with a low group delay, called the quasi-ANSI filter bank, for digital hearing aids. This study proposes a filter order optimization algorithm for developing the FIR filters. The delay of each filter is constrained to 10 ms. To reduce the matching error, this study also considers an efficient prescription-fitting algorithm. Simulation results show that the maximum matching error to various prescriptions for different types of hearing loss is less than 1.5 dB. Moreover, a low-complexity multirate IFIR filter bank architecture is proposed. Compared with 18-band parallel FIR filters, this design saves approximately 93% of the multiplications and 74% of the storage elements. The proposed analysis filter bank has also been implemented in UMC 90 nm CMOS technology with a high-VT standard cell library. When processing 24 kHz audio, the chip consumes only 73 µW. Applying voltage scaling enables further energy savings. If the supply voltage decreases to 0.6 V, the simulation result reveals that the power consumption of the proposed analysis filter bank equals 27 µW, which is about 30% of that consumed by the most energy-efficient AFB [14] available in the literature for hearing aids.
The rest of this paper is organized as follows. Section II presents a low delay quasi-ANSI S1.11 1/3-octave filter bank using a filter order optimization algorithm and an efficient prescription-fitting algorithm to minimize the matching error.
Several simulation results in this section verify the effectiveness of the proposed filter bank. Section III develops the low-complexity VLSI architecture of the proposed filter bank by exploiting the IFIR and multirate signal processing techniques. Section IV demonstrates the implementation result of the proposed filter bank. Finally, Section V presents some concluding remarks.
Low Delay Filterbank Design
This section presents the proposed low-delay quasi-ANSI S1.11 1/3-octave FIR filter bank.
A. Quasi-ANSI S1.11 1/3-Octave Filter Bank
The ANSI S1.11 standard [15] defines 43 1/3-octave bands, in three classes, covering the frequency range of 0-20 kHz. Each 1/3-octave band is specified by its midband frequency (or central frequency) and bandwidth. Motivated by the good matching performance of the standard filters, this study designs a relaxed version of the standard ANSI filters with a constrained tap length for digital hearing aids. Fig. 5 outlines the proposed filter coefficient optimization algorithm, which contains two iterative design procedures: one meets the 10 ms group delay constraint, and the other limits the relaxation in the matching error. Note that an advanced noise reduction algorithm, such as the Siemens SoundSmoothing noise reduction algorithm [20], contributes a group delay of nearly 1 ms [19]. Therefore, the 10 ms group delay constraint on the filter bank is sufficient to meet the general requirement that the hearing aid not lose visual cues with respect to hearing [2]. Moreover, to design a filter bank for the hearing aid, the frequency response should match the prescription as closely as possible. A 3 dB error bound is therefore also imposed as a necessary constraint to achieve the preferable compensation for each hearing loss pattern.
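To make the band definition concrete, the sketch below computes midband and band-edge frequencies from the ANSI S1.11 base-2 relation f_m = 1000 * 2^((x-30)/3) with edges f_m * 2^(±1/6). The choice of band numbers 22 to 39 for an 18-band hearing-aid filter bank is an assumption made here for illustration (it spans roughly 160 Hz to 8 kHz); the function name is hypothetical.

```python
def third_octave_band(x, base2=True):
    """Return (midband, lower edge, upper edge) in Hz for ANSI band number x."""
    g = 2.0 if base2 else 10 ** 0.3      # octave ratio: base-2 or base-10 system
    fm = 1000.0 * g ** ((x - 30) / 3)    # band 30 is the 1 kHz reference band
    return fm, fm * g ** (-1 / 6), fm * g ** (1 / 6)


# Assumption: the 18 hearing-aid bands are band numbers 22..39.
bands = [third_octave_band(x) for x in range(22, 40)]

if __name__ == "__main__":
    for x, (fm, lo, hi) in zip(range(22, 40), bands):
        print(f"band {x}: {lo:8.1f} Hz  {fm:8.1f} Hz  {hi:8.1f} Hz")
```

The absolute bandwidth shrinks in proportion to the midband frequency, which is why the lowest bands need the sharpest (and therefore longest) filters.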
Note that, according to (4) and (5), expanding the transition bandwidth reduces the group delay of the designed filter.
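The trade-off between transition bandwidth and delay can be illustrated with a short sketch. It relies on two generic facts rather than on the paper's equations (4) and (5): a linear-phase FIR filter with N taps has a group delay of (N-1)/2 samples, and the tap count can be roughly estimated with the fred harris rule of thumb N ≈ (fs/Δf)·(A/22). The 60 dB attenuation and the transition widths below are hypothetical values chosen only for illustration, and the calculation is single-rate.

```python
import math

FS_HZ = 24_000  # sampling rate used throughout the paper


def linear_phase_delay_ms(taps, fs_hz=FS_HZ):
    """Group delay of a linear-phase FIR filter: (N - 1) / 2 samples."""
    return (taps - 1) * 1000 / (2 * fs_hz)


def max_taps_for_delay(delay_ms, fs_hz=FS_HZ):
    """Largest odd tap count whose group delay does not exceed delay_ms."""
    delay_samples = (delay_ms * fs_hz) // 1000
    return 2 * delay_samples + 1


def harris_taps(transition_hz, atten_db, fs_hz=FS_HZ):
    """Rule-of-thumb tap estimate; an approximation, not the paper's (4), (5)."""
    return math.ceil(fs_hz / transition_hz * atten_db / 22)


if __name__ == "__main__":
    print("tap budget for 10 ms:", max_taps_for_delay(10))
    for df in (50, 100, 200):  # hypothetical transition bandwidths in Hz
        n = harris_taps(df, 60)
        print(f"{df} Hz transition: about {n} taps, {linear_phase_delay_ms(n):.1f} ms")
```

Doubling the transition bandwidth roughly halves the estimated filter length and hence the delay, which is the mechanism the relaxation of the ANSI specification exploits.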
TABLE I Exploration Results of filter
The matching-error minimization algorithm reduces the matching error caused by inter-band interference.
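The idea of compensating inter-band interference can be sketched as follows. This is not the paper's algorithm, whose details are not given here; it is a generic fixed-point gain adjustment under simplifying assumptions: band outputs add in phase at each center frequency, each filter leaks only into its two neighbors, and the leakage level (0.05) and target gains are hypothetical.

```python
import math


def adjacent_leak(n, leak):
    """Hypothetical response matrix: row i is center frequency i, column j is filter j."""
    return [[1.0 if i == j else (leak if abs(i - j) == 1 else 0.0)
             for j in range(n)] for i in range(n)]


def response(leak, gains):
    """Summed magnitude at each center frequency (in-phase addition assumed)."""
    return [sum(a * g for a, g in zip(row, gains)) for row in leak]


def max_error_db(leak, gains, target_db):
    """Largest deviation from the target gain-curve, in dB."""
    return max(abs(20 * math.log10(r) - t)
               for r, t in zip(response(leak, gains), target_db))


def fit_band_gains(leak, target_db, iters=50):
    """Rescale each band gain by target/actual until the summed response matches."""
    target = [10 ** (t / 20) for t in target_db]
    gains = target[:]  # start from the prescription itself
    for _ in range(iters):
        resp = response(leak, gains)
        gains = [g * t / r for g, t, r in zip(gains, target, resp)]
    return gains


TARGET_DB = [20, 25, 30, 35, 30, 25]  # hypothetical insertion gains
LEAK = adjacent_leak(len(TARGET_DB), 0.05)

if __name__ == "__main__":
    naive = [10 ** (t / 20) for t in TARGET_DB]
    print("error without fitting (dB):", max_error_db(LEAK, naive, TARGET_DB))
    fitted = fit_band_gains(LEAK, TARGET_DB)
    print("error after fitting (dB):", max_error_db(LEAK, fitted, TARGET_DB))
```

With strong leakage or steep gain-curves this simple iteration may fail to converge, so a practical fitting procedure would need safeguards that are outside the scope of this sketch.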
To evaluate the effectiveness of the proposed filter bank, this study uses audiograms from the Independent Hearing Aid Information, a public service of Hearing Alliance of America [21]. These audiograms include mild hearing loss, moderate hearing loss, and severe-to-profound hearing loss. These audiograms also appear in [11], but that work considered fitting the audiograms only, and not their prescriptions.
The audiogram in Fig. 7(a) depicts low frequency mild-to-moderate hearing loss and mild high frequency hearing loss. People with this type of hearing loss lose overall loudness because most vowels cannot be heard. Conversation at very close distance becomes necessary. The maximum matching error of the proposed filter bank is approximately 0.1 dB. The audiogram in Fig. 7(b), like that in Fig. 1, reveals moderate-to-severe hearing loss in the middle to high frequency region, which is the common type of hearing loss caused by aging. The sensitivity at low frequencies is good enough to get some vowel information, helping the person realize that someone is talking.
However, without consonants, such listeners cannot easily distinguish one word from another. The maximum matching error of the proposed filter bank is approximately 0.4 dB, which is slightly worse than the 0 dB of the standard ANSI filter bank, but much better than the others in Fig. 3. The audiogram in Fig. 7(c) reveals severe-to-profound hearing loss in the middle to high frequency region, which occurs commonly in older workers exposed to noisy environments for prolonged periods. The maximum matching error of the proposed filter bank is approximately 0.6 dB. Finally, the audiogram in Fig. 7(d) shows severe flat hearing loss at all frequencies, where the hearing thresholds are more than 70 dB. Although this is a difficult case to compensate for, the maximum matching error is less than 1.5 dB, thus validating the effectiveness of the proposed filter bank.
Multirate IFIR Quasi-ANSI Filter Bank
This section presents the efficient VLSI architecture of the proposed filter bank by exploiting the IFIR and multirate signal processing techniques. Although normal human ears are not sensitive to phase delay, designing the filter bank with exact linear phase [11], [12], [14] offers advantages for the development of advanced binaural hearing aids, which target not only the compensation of hearing loss but also music signals and sound localization.
Low-Power VLSI Implementation
One important issue in the early stage of system design is to decide the appropriate design parameters among the possible design alternatives or design spaces. The design spaces usually involve multiple metrics of interest, such as timing, resource usage, power, and cost. In general, fewer functional units require a higher clock rate and more temporary storage or more complicated control logic. Consider the silicon implementation in [14] as an example. By applying a single multiply-and-accumulate (MAC) unit, the standard ANSI analysis filter bank was implemented in TSMC 130 nm CMOS technology, and the chip operated at 6.13 MHz for real-time processing of 24 kHz data.
This clock rate may be too high for hearing aid applications. Moreover, the MAC unit occupies only approximately 25% of the chip area and consumes approximately 30% of the total power [14]. That is, the control logic and the storage are dominant, which may not make for a good low-power VLSI architecture.
A. Multi-MAC Architecture
Instead of a single MAC unit, consider a set of 25 parallel multipliers, which can perform up to a 49-tap linear-phase FIR filtering calculation in one cycle. With 25 multipliers, the first delay line requires 5 cycles to complete the filtering calculations for every sample, the second delay line requires 3 cycles for every two samples, and the third delay line requires 21 cycles for every four samples. If the data are well scheduled, there will be no stall cycle and the hardware can operate at 288 kHz for real-time processing of 24 kHz audio. Otherwise, a higher clock rate will be necessary.
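The 288 kHz figure follows directly from the cycle counts quoted above, as the short check below shows. It assumes that all three delay lines share the single set of 25 multipliers and that the schedule repeats every four input samples; the function and variable names are hypothetical.

```python
import math

FS_HZ = 24_000

# (cycles needed, number of input samples that work covers) per delay line,
# taken from the text: 5 per sample, 3 per two samples, 21 per four samples.
DELAY_LINES = [(5, 1), (3, 2), (21, 4)]


def shared_clock_hz(lines, fs_hz):
    """Minimum stall-free clock when all lines share one multiplier set."""
    period = math.lcm(*(p for _, p in lines))          # schedule length in samples
    cycles = sum(c * (period // p) for c, p in lines)  # total cycles per schedule
    cycles_per_sample = math.ceil(cycles / period)
    return cycles, period, cycles_per_sample * fs_hz


if __name__ == "__main__":
    cycles, period, clock = shared_clock_hz(DELAY_LINES, FS_HZ)
    print(f"{cycles} cycles per {period} samples, clock = {clock / 1e3:.0f} kHz")
```

The schedule needs 47 cycles per four samples, which rounds up to 12 cycles per sample and leaves one idle cycle in every four-sample period.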
1) Filter-Oriented RPA Algorithm: For simplicity, assume that within a clock cycle one, and only one, sub-filter has the right to access the set of 25 multipliers. An efficient data scheduling algorithm, called the filter-oriented RPA algorithm, can be derived by modifying the RPA in Fig. 11. Therefore, at most 12 cycles per sample are required to accomplish the filtering operations. Note that the unused multipliers in each cycle can be clock-gated to save power.
The second row of Table VI shows the implementation result of the proposed quasi-ANSI AFB, using the filter-oriented RPA, in UMC 90 nm CMOS high-VT technology. Three 250 ms input sequences were used for power estimation: a female voice, a male voice, and a random signal. The Synopsys PrimeTime suite and Nanosim were applied to gate-level and circuit-level simulations, respectively, to evaluate the power performance. The clock rate of the proposed quasi-ANSI AFB was 288 kHz and the power consumption was 91 µW.
2) DelayLine-Oriented RPA Algorithm: The filter-oriented RPA is easy to understand; however, the data fed into the set of 25 multipliers would switch between delay lines frequently. This might consume extra dynamic power. To address this issue, the set of 25 multipliers is partitioned into three independent subsets, each dedicated to one delay-line (i.e., 9 multipliers for the first delay-line, 3 multipliers for the second, and 13 multipliers for the third).
The third row of Table VI outlines the implementation result of the proposed AFB when the delayline-oriented RPA is applied. The clock rate was 288 kHz and the power consumption was 84 µW. Note that switching between delay lines infrequently reduces the dynamic power to 31 µW, compared with 41 µW for the filter-oriented RPA.
B. Adder-Based FIR Architecture
Although the control logic is simple, the results in Section IV.A suggest that the allocation of 25 multipliers is an overdesign. One efficient method to reduce the redundant operations is to apply the multiple constant multiplications (MCM) [28] or common sub-expression elimination (CSE) [29] method. An efficient multiplierless (adder-based) quantization framework for FIR filters was recently proposed in [30], which allows explicit tradeoffs between the hardware complexity and the quantization error to facilitate FIR filter design exploration. Simulation results reveal that the adder-based architecture saves approximately 43% of the redundant additions, compared with the direct implementation of each sub-filter.
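The saving that sub-expression sharing offers can be seen in a toy example. The sketch below is only an illustration of the general idea, not the framework of [30] or the methods of [28], [29]: the two integer coefficients (5 and 45) are hypothetical, and the adder counts are tallied by hand in the comments.

```python
def separate_products(x):
    """Shift-and-add for each constant independently: 4 adders in total."""
    y5 = (x << 2) + x                          # 5 = 4 + 1           -> 1 adder
    y45 = (x << 5) + (x << 3) + (x << 2) + x   # 45 = 32 + 8 + 4 + 1 -> 3 adders
    return y5, y45, 4


def shared_products(x):
    """Reuse the 5x term as a common sub-expression: 2 adders in total."""
    y5 = (x << 2) + x        # 5x                       -> 1 adder
    y45 = (y5 << 3) + y5     # 45x = 8 * (5x) + (5x)    -> 1 adder
    return y5, y45, 2


if __name__ == "__main__":
    for x in (1, 7, -3):
        print(x, separate_products(x), shared_products(x))
```

Both versions produce identical products, but the shared form halves the adder count at the cost of holding the intermediate value 5x, which hints at the storage overhead discussed next.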
To achieve the same clock rate (i.e., 288 kHz), a chain of 45 adders is allocated. The fourth row of Table VI shows the implementation results of an adder-based 18-band AFB, which consumes 137 µW. Both the chip area and the power consumption are significantly worse than those of the multi-MAC cases. This is because the adder-based architecture usually entails a large increase in storage elements for temporary values [27]. Despite its rather limited arithmetic units, the control logic of the adder-based filter bank is overly complicated and requires many large multiplexers. This outweighs the benefit of the reduced resource usage.
For a fair comparison, we have re-implemented the design in [14] using the same CMOS technology (i.e., UMC 90 nm CMOS high-VT technology). The simulation results in Table VI show that the single-MAC architecture of [14] consumes 102 µW.
C. The Optimized Low-Power Architecture
The implementation results in Table VI show that the optimized hardware would be a compromise design consisting of fewer, but sufficient, parallel multipliers, limited storage, and limited control logic. As described in Section III, the integer ratios of the multiplicative complexity of the three delay-lines are approximately 3 : 1 : 4, respectively. Because the second delay-line has the least complexity, one MAC unit is allocated to serve its filtering calculations. To guarantee adequate computing power and avoid stall or wait cycles, the numbers of MACs designated for the first and the third delay-lines are 3 and 4, respectively. With 3 multipliers, 33 cycles are required to complete the filtering calculations for the first delay-line. The second and the third delay-lines require 52 and 125 cycles to complete their calculations with 1 and 4 multipliers, respectively.
The system controller coordinates the data flow according to the scheduling algorithm and handles the input interface. The register module contains the coefficient memory and the data memory. The coefficient memory stores the 14 sub-filter coefficients, while the data memory maintains 3 separate delay-lines. The filter engine contains 3 independent sets of MAC units (i.e., 3, 1, and 4, respectively), dedicated to the three delay-lines. The optimized 10 ms 18-band quasi-ANSI AFB has been implemented with the UMC 90 nm CMOS high-VT standard cell library. The chip has an area of approximately 33274 (2-input NAND) gates and operates at 792 kHz. For processing of 24 kHz audio, the power consumption is approximately 73 µW, estimated using three 250 ms input sequences: the female voice, the male voice, and the random signal.
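The 792 kHz clock is consistent with the cycle counts quoted for the three dedicated MAC sets, as the sketch below checks. It rests on an assumption carried over from the multi-MAC discussion rather than stated for this design: the 52 and 125 cycles of the second and third delay-lines are taken to cover two and four input samples, respectively, and the three MAC sets are taken to run in parallel. All names are hypothetical.

```python
import math

FS_HZ = 24_000

# (cycles, input samples covered) per delay line; the sample spans of 2 and 4
# for the second and third lines are an assumption, not stated in the text.
DEDICATED_LINES = {"first": (33, 1), "second": (52, 2), "third": (125, 4)}


def dedicated_clock_hz(lines, fs_hz):
    """Clock set by the busiest delay line when each has its own MAC set."""
    per_sample = {name: c / p for name, (c, p) in lines.items()}
    worst = max(per_sample.values())
    return per_sample, math.ceil(worst) * fs_hz


if __name__ == "__main__":
    per_sample, clock = dedicated_clock_hz(DEDICATED_LINES, FS_HZ)
    print(per_sample)
    print(f"required clock: {clock / 1e3:.0f} kHz")
```

Under this reading the first delay-line is the bottleneck at 33 cycles per sample, and 33 x 24 kHz gives the reported 792 kHz.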
Conclusion
This study presents a low-delay, high-performance, and low-power filter bank design for advanced digital hearing aids. The standard ANSI S1.11 1/3-octave filter bank is rarely adopted in hearing aids because of its high computational complexity and rather large group delay, even though it has the advantage of a good match to human hearing characteristics. This study proposes a 10-ms 18-band quasi-ANSI S1.11 1/3-octave filter bank with a slight relaxation of the ANSI specification. The computational complexity is 226 MACs. The storage complexity is 187 registers for the delay-lines, 506 coefficients, and 300 buffer registers to meet the linear-phase requirements. The proposed AFB was implemented in UMC 90 nm CMOS high-VT technology, operated at 792 kHz for real-time processing of 24 kHz audio, and consumed approximately 73 µW with V supply voltage. The chip can also operate at a low voltage (0.6 V) without any performance degradation. The contributions of this study include the following: (1) a systematic framework for developing a more appropriate quasi-ANSI filter specification for hearing aids that is easier to implement and realize, as Section II shows; (2) a thorough design space exploration method that exploits multirate and IFIR techniques to construct a VLSI architecture that significantly reduces the multiplicative complexity of the filter bank without unduly increasing the latency, as described in Section III; and (3) an efficient data scheduling algorithm and appropriate hardware resource allocation for a small chip area and an ultra-low-power implementation of the proposed filter bank, as Section IV shows. For business reasons, the detailed specifications of modern hearing aids are not disclosed, and it is difficult to compare them with the proposed filter bank. Nevertheless, we believe that, if the NAL-NL1 or HSE prescription formula is applied, the proposed design is superior.
