A series of speech perceptual studies was conducted on normally hearing subjects using synthesized acoustic signals based on two speech processing schemes: Zero-Crossing and Filter Bank. The Filter Bank scheme was shown to provide more speech information than the Zero-Crossing scheme. These schemes were implemented in a laboratory based speech processor. A Low Power Switched Capacitor Speech Spectrum Analyzer embodying several novel design methodologies is also described. This spectrum analyzer can be used to implement the Filter Bank scheme in a wearable speech processor.
I. INTRODUCTION
In the design of speech processors that would convey useful speech information to cochlear implant recipients, extensive perception and engineering studies are required. These studies and work include:
1. Psychophysical studies to examine the nature of hearing sensations produced by electrical stimulation of residual auditory nerve fibres;
2. Formulation of speech coding strategies on the basis of the psychophysical results;
3. Implementation of the chosen speech coding strategy using different signal processing schemes and different acoustic-to-electric parameter conversion algorithms;
4. Speech perceptual studies on normally hearing subjects using synthesized acoustic signals to assess the relative merits of potentially useful signal processing schemes; and
5. Speech perceptual studies on cochlear implant recipients to assess the relative merits and appropriateness of different signal processing schemes for electrical stimulation.
Results from psychophysical studies on cochlear implants have indicated that loudness increased with current level, pitch increased with electric pulse rate, and pitch and sharpness of the hearing sensation produced by individual electrode positions varied in accordance with the tonotopic organization of the cochlea [1, 2]. On the basis of these psychophysical results, a logical speech coding strategy is to convert sound intensity to electric current, the fundamental frequency of the speech signal to electric repetition rate, and the frequencies of spectral peaks of the speech signal to electrode positions. As far as the spectral peaks are concerned, formant frequencies may be estimated by counting the number of zero crossings at the output of formant filters, or spectral peaks may be detected by a peak picking algorithm at the output of a filter bank spectrum analyzer. In addition, the number of formants or spectral peaks would be one of the variables of the selected signal processing scheme.
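The zero-crossing estimate mentioned above can be illustrated with a minimal sketch. It assumes the signal has already been band-limited by a formant filter (the filter itself is omitted), and the function name, sample rate and test tone are hypothetical choices made for illustration only, not details taken from the processor described in this paper.

```python
import math


def zero_crossing_frequency(samples, sample_rate):
    """Estimate the dominant frequency (Hz) of a band-limited signal.

    A sinusoid crosses zero twice per cycle, so the frequency is
    approximately (number of sign changes) / (2 * duration).
    """
    crossings = 0
    for prev, cur in zip(samples, samples[1:]):
        # A sign change between adjacent samples counts as one crossing.
        if (prev < 0) != (cur < 0):
            crossings += 1
    duration = (len(samples) - 1) / sample_rate
    return crossings / (2.0 * duration)


# Hypothetical test signal: a 500 Hz tone standing in for a formant filter output.
sample_rate = 16000
tone = [math.sin(2 * math.pi * 500 * n / sample_rate + 0.1) for n in range(1600)]
estimate = zero_crossing_frequency(tone, sample_rate)
```

Over a 0.1 s window the estimate falls within a few hertz of the true tone frequency; shorter windows trade frequency resolution for faster tracking.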
ISCAS '89
The present signal processing scheme used in the University of Melbourne/Nucleus 22 channel cochlear implant [3] estimates the first and second formant frequencies by counting the number of zero crossings at the output of two formant filters, and converts these two frequencies into electrode positions [2]. This scheme will be referred to as the Zero-Crossing (ZC) scheme in this paper. A wearable speech processor has been developed and encouraging speech perception results were obtained from cochlear implant recipients. In order to provide additional spectral information, a different speech processing scheme estimates the frequencies of 4 spectral peaks at the output of a filter bank spectrum analyzer and converts these spectral peaks to 4 electrode positions. This scheme will be referred to as the Filter Bank (FB) scheme. In this paper, the amount of useful speech information provided by the two speech processing schemes, ZC and FB, is compared.
II. A REAL-TIME LABORATORY BASED SPEECH PROCESSOR
A real-time laboratory-based speech processor has been developed to permit speech processing and perceptual studies. The processor is interfaced with acoustic transducers for studies on normally hearing subjects, and with cochlear implant hardware for cochlear implant recipients. It comprises 14 TMS32010 digital signal processors (DSPs) interfaced to an IBM compatible personal computer. Seven of the DSPs are used in the analysis front-end, while the remaining seven are used in the synthesis output section.
The ZC scheme, depicted in figure 1, used estimates of the first and second formant frequencies (F1 and F2) and their amplitudes (A1 and A2) to select the channel number and amplification factor of two out of twenty synthesis filters. The fundamental frequency, F0, and the presence of voicing were also determined. For voiced speech, the two synthesis filters were excited at a rate equal to F0, while for unvoiced speech, a random rate was used. The outputs of the two filters were summed to form the synthesized speech signal.
The FB scheme, depicted in figure 2, estimated a running spectrum of the speech signal using a 24 channel speech spectrum analyzer, each channel comprising a Bandpass filter, Full-Wave Rectifier, and Lowpass filter. The analysis front-end of the FB scheme implemented in the laboratory-based speech processor emulates the function of the Switched Capacitor Speech Spectrum Analyzer described in Section V below. The 24 spectrum channels were processed by a peak picking and synthesis filter selection algorithm which selects four (out of 20) synthesis filters and their corresponding amplification factors. The rate of excitation was the same for the four filters, and was determined by F0 and the presence or absence of voicing as described for the ZC scheme. The outputs of the four filters were summed.
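The peak picking and synthesis filter selection step of the FB scheme can be sketched as follows. This is only an illustration under stated assumptions: the paper does not give the algorithm, so the local-maximum rule, the linear channel-to-filter mapping, the function names and the example channel levels are all hypothetical.

```python
def pick_spectral_peaks(channel_levels, num_peaks=4):
    """Return the indices of the largest local maxima among the channel outputs."""
    n = len(channel_levels)
    peaks = []
    for i, level in enumerate(channel_levels):
        left = channel_levels[i - 1] if i > 0 else float("-inf")
        right = channel_levels[i + 1] if i < n - 1 else float("-inf")
        # Strictly above the left neighbour, so a flat top is counted once.
        if level > left and level >= right:
            peaks.append(i)
    # Keep the strongest peaks, then report them in channel order.
    peaks.sort(key=lambda i: channel_levels[i], reverse=True)
    return sorted(peaks[:num_peaks])


def to_synthesis_filter(channel_index, num_channels=24, num_filters=20):
    """Hypothetical linear mapping from analysis channel to synthesis filter index."""
    return channel_index * (num_filters - 1) // (num_channels - 1)


# Hypothetical outputs of the 24 analysis channels for one frame.
levels = [1, 3, 9, 4, 2, 1, 2, 6, 12, 7, 3, 2,
          1, 1, 4, 8, 5, 2, 1, 1, 2, 3, 2, 1]
peak_channels = pick_spectral_peaks(levels)
selected_filters = [to_synthesis_filter(c) for c in peak_channels]
amplification = [levels[c] for c in peak_channels]
```

In this sketch the channel level at each peak stands in for the amplification factor of the selected synthesis filter; a real implementation would apply its own amplitude mapping.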
III. TESTING AND TRAINING SCHEDULES
Each subject was initially exposed to two sessions of synthesized speech containing only F0 information. These sessions were conducted to accustom the subject to degraded and synthesized speech before formal assessment and training so that order effects could be reduced.
After these introductory sessions, the study followed a schedule of pre-training assessment, training and post-training assessment for each scheme.
Two of the four subjects were first tested and trained using the ZC scheme while the other two used the FB scheme. The test and training schedules were then repeated for the unused scheme.
IV. RESULTS
The post-training percentage scores for consonant perception in noise are shown in figure 3, where 16 consonants in the /a/-/C/-/a/ frame were used. For all four subjects, scores were higher for the FB scheme than for the ZC scheme at signal-to-noise ratios ranging from 5 to 20 dB. The results obtained using unprocessed (normal) speech materials are also shown in figure 3. At signal-to-noise ratios of 5 and 10 dB, it can be seen that the scores were highest for the unprocessed speech, medium for the FB scheme, and lowest for the ZC scheme. From figure 3, it can also be seen that the intersubject variations in consonant perception were small for each of the three conditions: unprocessed speech, FB and ZC. These results indicate that the FB scheme provided more information that is pertinent to consonant perception than the ZC scheme. However, the performance for the FB scheme is far from perfect, as indicated by the much better performance for the unprocessed condition. Figure 4 depicts the pre-training and post-training percentage correct scores for the 16 consonants in the /a/-/C/-/a/ frame when no noise was added to the signal. As in figure 3, there were only small intersubject variations in the post-training scores.
It is important to note that for all subjects, the effects of training, as indicated by the difference between the pre- and post-training scores, were larger for the speech processing scheme that was tested first. As an example, for subject 1, FB was trained and tested before ZC; the improvement in performance from pre- to post-training was much larger for the scheme that was tested first (FB) than for the second scheme (ZC). Vowel perception performance was similar across subjects and speech processing schemes for the 11 vowels in the /hVd/ format. There were also small improvements from pre-training to post-training.
These results indicate that information pertinent to vowel perception was provided by both schemes. Figure 5 depicts the pre- and post-training percentage correct scores for open set monosyllabic CNC words. For all four subjects, CNC performance was better for FB than for ZC and, as in the case of the consonant results in figure 4, the effects of training were larger for the speech processing scheme that was trained and tested first. For both FB and ZC, subjects 3 and 4 scored higher in CNC words than subjects 1 and 2. In summary, the FB scheme provides more information than the ZC scheme for normally hearing subjects.
Speech perception studies on cochlear implant patients are now being conducted to compare the performance of these two speech processing schemes.

V. A LOW POWER SWITCHED CAPACITOR SPEECH SPECTRUM ANALYZER
The development of a Low Power Switched Capacitor (SC) Speech Spectrum Analyzer which can be used to implement the FB scheme in a wearable speech processor is now described. The primary requirements are minimum chip area and power dissipation. A Time-Multiplexed approach has been adopted as it allows operational amplifiers (op amps) and some capacitors to be shared, hence reducing hardware requirements. The analyzer comprises 24 channels, each of which consists of a cascade of a Bandpass filter (BPF), Full-Wave Rectifier (FWR) and Lowpass filter (LPF). Several novel design methodologies have been employed in order to satisfy the requirements of the speech spectrum analyzer. A transitional maximally flat magnitude (Butterworth) and maximally flat group delay (Bessel) filter approximation is used in the SC BPF synthesis. This approximation provides a suitable compromise between the frequency and temporal resolutions. DC offset differences between BPF channels are an important consideration as they would inadvertently be measured as energy of the bandlimited BPF outputs. Methods cited in the literature to reduce these DC offsets include cascading a Highpass filter section to the preceding Bandpass filter section [7] and the use of resistive strings [8] to provide voltage division so that the capacitor ratios of all BPF channels are made equal.
The latter method results in identical DC transfer functions from the input of each biquadratic filter (biquad) op amp to the output of the biquad for all BPF channels. However, these methods are hardware inefficient in terms of chip area and power dissipation. A solution proposed here is to design biquads such that the above mentioned DC transfer functions are independent of capacitor ratios. Figure 6(a) depicts such a Time-Multiplexed biquad whose input/output transfer function (Vout/Vin) and transfer functions from the inputs of op amps 1 and 2 to the biquad output (Vout/Vin1, Vout/Vin2) are given in figure 6(c). At DC, Vout/Vin1 and Vout/Vin2 are shown to be -1 and 0 respectively. This result is significant because the BPF section is now not only hardware efficient but also micropower compatible without serious DC offset differences between BPF channels. A circuit is termed micropower compatible if all its op amps satisfy the following criterion: when an op amp samples an input, its output is not sampled by another op amp during the same instant.
Usual design techniques are employed to further minimize DC offsets, including using minimum sized complementary switches, clock signals with fairly slow turn off rates, and capacitors as large as tolerable.
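A small numerical sketch may help show why DC offset differences between BPF channels matter. It models the FWR and LPF stages crudely as an absolute value followed by an average, and the offset values (in volts) are hypothetical and chosen only for illustration; none of these numbers come from the measured chip.

```python
import math


def bpf_output(amplitude, dc_offset, num_samples=1600, cycles=50):
    """Hypothetical BPF output: a sinusoid riding on a DC offset."""
    return [dc_offset + amplitude * math.sin(2 * math.pi * cycles * n / num_samples)
            for n in range(num_samples)]


def rectified_mean(samples):
    """Crude stand-in for the FWR + LPF stages: rectify, then average."""
    return sum(abs(s) for s in samples) / len(samples)


# With no speech input, an ideal channel reads zero energy ...
silent_ideal = rectified_mean(bpf_output(0.0, 0.0))
# ... but a channel with a 10 mV offset reports a non-zero level.
silent_with_offset = rectified_mean(bpf_output(0.0, 0.010))

# Different (hypothetical) offsets give each channel a different noise floor.
offsets = [0.002, -0.010, 0.005]
floors = [rectified_mean(bpf_output(0.0, o)) for o in offsets]
```

Because the rectifier cannot distinguish an offset from signal, unequal offsets would appear as unequal spectral floors across channels, which is the effect the biquad design above is intended to avoid.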
The Time-Multiplexed biquad uses the clock signals depicted in figure 6(b). Capacitors of the BPF biquad vary considerably across channels due to the different BPF transfer functions. In figure 6(a), the non-integrating capacitors 'A', 'F', 'I' and 'U' do not carry charge information from one local clock period to the next. As the value of 'J' remains invariant for all channels, it is shared amongst all channels as depicted in figure 6(a), hence achieving some chip area saving. A modular layout, where equal areas are allocated to each channel of the spectrum analyzer, is desirable in an integrated circuit implementation to simplify interconnection and reduce the area due to interconnection. These equal allocated areas are primarily taken up by capacitors and, because of the large differences in the total capacitance of the different channels, the allocated areas are quite large. For the Time-Multiplexed biquad used, the non-integrating capacitor 'A' has the largest capacitance variations across channels. The large variation in total capacitance of the BPF biquad can be reduced by employing a new capacitor sharing technique (figure 6(a)) applied to capacitor array 'A', which is now described. First, a common capacitor 'AO' of 1 unit capacitance is shared by all filter channels, as the smallest capacitor of the 'A' array is greater than 2 units. In this manner, an 'A' capacitor is made up by connecting 'AO' in parallel to a residual 'Ax' capacitor (x being the BPF channel under consideration). The size of the 'A' array can be further reduced if channels with large 'A' values, channels 9-24, share a further common capacitor 'AA' of 11 units. In these cases, the 'A' capacitor is realized by a parallel combination of 'AO' + 'AA' + 'Ax'. In this fashion, 'Ax' for channels 9-24 are reduced by 11 units each, hence the Time-Multiplexed BPF biquad capacitance is reduced by a notable 23%. This capacitor sharing corresponds to an approximate 15% chip area saving for the Time-Multiplexed BPF section.
A clock signal Pd9-24, shown in figure 6(b), that goes high at the commencement of Pd9 and low at the end of Pd24, is used to connect 'AA' into the circuit, or alternatively, a bank of switches may be used.
The proposed capacitor sharing technique is also applicable to array 'U', which has been simplified to a two capacitor array shown in figure 6(a). Use of this array results in a chip area saving of 5% and simplifies the chip layout. Furthermore, as the smallest 'D' capacitor is increased to maintain the same capacitor ratio at the input of op amp 1, the worst case noise performance of the Time-Multiplexed BPF biquad is improved.
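The arithmetic behind the capacitor sharing applied to array 'A' can be sketched as below. The per-channel 'A' values are hypothetical (the paper does not list them), so the resulting saving illustrates the bookkeeping only and is not expected to reproduce the 23% figure, which refers to the whole biquad. The sketch assumes 'AO' = 1 unit shared by all channels and 'AA' = 11 units shared by channels 9-24.

```python
def shared_array_total(a_values, common=1, extra=11, extra_from_channel=9):
    """Return (total units after sharing, residual 'Ax' per channel).

    Each channel's 'A' is realized as common + (extra, for high channels) + residual.
    The shared capacitors are counted once for the whole array.
    """
    residuals = []
    for channel, a in enumerate(a_values, start=1):
        shared = common + (extra if channel >= extra_from_channel else 0)
        if a < shared:
            raise ValueError("channel %d: 'A' smaller than its shared part" % channel)
        residuals.append(a - shared)
    uses_extra = len(a_values) >= extra_from_channel
    total = sum(residuals) + common + (extra if uses_extra else 0)
    return total, residuals


# Hypothetical 'A' values in unit capacitors for channels 1-24.
a_values = [3, 4, 5, 6, 7, 8, 9, 10] + [13 + k for k in range(16)]
unshared_total = sum(a_values)
shared_total, residuals = shared_array_total(a_values)
saving_fraction = 1.0 - shared_total / unshared_total
```

With these made-up values the array shrinks from 380 to 192 units; the actual saving depends on the real capacitor values and on the other capacitors in the biquad.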
The SC FWR, which is also new, is shown in figure 7. Op amp 1 and comparator 2 are autozeroed during the even phase, hence DC offset compensated. In the following odd phase, op amp 1 serves as a delay-free unity gain inverting amplifier and comparator 2 compares the input and its inversion (obtained from the unity gain inverting amplifier). If the input is positive (negative) with reference to analog ground, the output is simply the input signal routed via 'path +' ('path -') in figure 7. The circuit thus functions as a FWR, i.e. Vout(n) = |Vin(n)| [...] In the authors' view, this error is negligible for most speech applications. The BPF dynamic range using ±1.5 V supplies is 64 dB. The crosstalk between the BPF channels is -40 dB. The measured impulse responses of the BPF section settle within 20 ms. These results satisfy typical speech spectrum analyzer specifications. The PSRR of the BPF section and FWR at 1 kHz is -20 dB and -25 dB respectively. The PSRR of the LPF section at DC is -35 dB.
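The odd-phase behaviour of the SC FWR can be captured in an idealized behavioural model. The sketch below assumes a perfect unity gain inverter and an offset-free comparator, ignores the autozero phase and all circuit non-idealities, and uses a hypothetical function name; it is not a description of the actual circuit of figure 7.

```python
def sc_fwr(samples):
    """Idealized full-wave rectifier: select the input or its inversion."""
    out = []
    for v_in in samples:
        # Op amp 1, assumed ideal: delay-free unity gain inversion.
        v_inv = -v_in
        # Comparator 2 compares the input with its inversion and selects
        # 'path +' for a non-negative input, 'path -' otherwise.
        out.append(v_in if v_in >= v_inv else v_inv)
    return out


# Samples referenced to analog ground (0 V), in volts; values are arbitrary.
rectified = sc_fwr([0.5, -0.25, 0.0, -1.0])
```

Comparing the input against its own inversion is equivalent to comparing it against analog ground, so each output sample equals the magnitude of the corresponding input sample in this ideal model.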
With a ±1.5 V supply, the current drain of the prototype chip is 5~. The op amps used in this realization were deliberately overdesigned for measurement purposes.
An integrated circuit comprising 24 channels (1 8 poles and 24 FWRs), now under design, is expected to draw less than 2.5 mW with a ±1.5 V supply, hence suitable for its application in a wearable speech processor.
VI. CONCLUSIONS
Speech perceptual studies have been conducted which show that the Filter Bank speech processing scheme provides more speech information than the Zero-Crossing scheme. These schemes were implemented in a laboratory based speech processor which has been described. A Low Power Switched Capacitor Speech Spectrum Analyzer embodying several novel design methodologies has also been described.
Consonant Confusion In Noise
