Abstract - This contribution presents the four phases of a project aiming at the VLSI realization of a digital audio equalizer with a linear phase characteristic. The first step is the identification of the system requirements, based on experience and (psycho-acoustical) literature. Secondly, the signal processing algorithms constituting the global design of the equalizer are simulated by computer. The third step is the realization of the equalizer design using one or more programmable DSPs. To minimize the number of DSP chips needed for the realization, this step requires optimizing the structure of the algorithm and its mapping onto the resources of the DSP; the number of processor cycles is crucial in this optimization. The purpose of the resulting prototype is to test and validate, in a digital audio environment, the specification generated in the first step. The programmability of the DSPs allows the specification to be changed at this stage of the project. The fourth step is the VLSI implementation of the algorithm validated in the previous phase. For this purpose the structure of the algorithm is optimized to take full advantage of the silicon resources; speed and required area are the crucial parameters in this optimization. The final step is the testing of the completed chips, together with a PCB designed and built in parallel, in a digital audio environment. The presentation will emphasize the algorithmic and design considerations together with the results.
INTRODUCTION
In the field of audio recording and reproduction, equalizers are widely used, e.g. to overcome shortcomings of analog or natural sources, or to give the frequency characteristic of the sound a preferred shape. An equalizer consists of a set of bandpass filters, and in each band the gain can be set by the user. The bandpass filters in many analog equalizers are only of second order. In general, analog equalizers have the disadvantage that the gain controls of adjacent bands are interdependent, due to the relatively large overlap of the frequency bands and the small stopband attenuation; they also show a large ripple in the passband. Furthermore, analog equalizers inevitably add phase distortion, especially in the enhanced or attenuated parts of the spectrum. For monaural sources this distortion is of no importance. In a multipoint acoustic field, however, the stereo image can be severely distorted by the phase nonlinearities.
At least in the lower frequency range, the human auditory system locates a source in a sound field by means of the phase difference perceived between the two ears. The perceptibility of phase distortion, however, continues to generate much controversy, and we will stay away from that topic.
These considerations led us to the idea of realizing the equalizer with digital FIR bandpass filters having a strictly linear phase characteristic. To test the performance and the user preference, a project was started to design and realize such a system. As linear phase FIR filters have symmetrical impulse responses, in this approach we must expect pre-echoes on impulsive sounds. Whether these are worse than phase distortion is an interesting psycho-acoustical question, but outside our scope. This paper emphasizes the design and realization of the digital linear-phase equalizer.
The final step of this project is the testing of the completed chips, together with a PCB designed and built in parallel, in a digital audio environment. This point has not been reached yet and is outside the scope of this paper.
SYSTEM SPECIFICATION
First we specify the input signal of the equalizer system. Adopting the CD digital audio standard, the audio signal is sampled at a rate of 44.1 kHz and digitized with 16 bits. The output of the equalizer follows this standard as well. The equalizer can be realized by a set of N parallel bandpass filters, each with a gain factor that the user can set independently. The system should have a linear overall phase characteristic, and a flat amplitude characteristic when the gain factors of all bands are set equal.
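The parallel structure just specified can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the two 3-tap filters below are hypothetical toy stand-ins for the ten bands of Table 1, chosen because they are symmetric (linear phase) and add up to a pure one-sample delay, so that equal gains give a flat response.

```python
def convolve(x, h):
    """Full linear convolution (direct-form FIR filtering with zero initial state)."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for l, hl in enumerate(h):
            y[n + l] += hl * xn
    return y


def equalize(x, filters, gains):
    """Sum of N parallel bandpass outputs, each scaled by its own user gain."""
    if len(filters) != len(gains):
        raise ValueError("one gain factor per band is required")
    y = [0.0] * (len(x) + max(len(h) for h in filters) - 1)
    for h, g in zip(filters, gains):
        for n, v in enumerate(convolve(x, h)):
            y[n] += g * v
    return y


# Hypothetical toy "bands": a lowpass and its complement. Both impulse
# responses are symmetric, and their sum is [0, 1, 0], a one-sample delay.
LOW = [0.25, 0.5, 0.25]
HIGH = [-0.25, 0.5, -0.25]

print(equalize([1.0, 2.0, 3.0], [LOW, HIGH], [1.0, 1.0]))  # [0.0, 1.0, 2.0, 3.0, 0.0]
```

With equal gains the input reappears delayed and scaled; unequal gains are the user's tone control. In this view, the flatness requirement says that the band impulse responses must add up to a pure delay.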
The frequency resolution of the human auditory system decreases in proportion to absolute frequency. Therefore, for a perceptually uniform resolution, the filter bandwidths can be made proportional to frequency as well. Ten octave bands (N = 10) adequately cover the audible range of 30 Hz to 20 kHz and are therefore a good starting point for the design of the equalizer. Each octave band is twice as wide as the one below it, and the transition widths of the bandpass filters can also increase in proportion to frequency. The octave bands are defined in Table 1. The table shows that the overlapping slopes of two consecutive filters have to be realized in such a way that the overall amplitude characteristic becomes flat when the gain factors are equal. The dynamic range of 16 bits corresponds to a stopband attenuation of 90 dB. Psycho-acoustical literature reports audible effects 40 dB below the main signal; therefore we set the requirement on the overall passband ripple to ±0.1 dB. This completes the specification of the equalizer system.

Table 1: The specification of the ten octave bands; note that band 10 is a highpass filter.
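The text does not spell out how the -40 dB audibility figure leads to ±0.1 dB. One plausible link, offered here as our own reading rather than the authors' derivation, is the classical paired-echo argument: a cosine-shaped passband ripple of relative amplitude δ produces two echoes of amplitude δ/2 around the main response. The sketch converts between ripple and echo level under that assumption.

```python
import math


def ripple_to_echo_db(ripple_db):
    """Echo level (dB relative to the main signal) caused by a cosine-shaped
    passband ripple of +/- ripple_db, using the paired-echo model."""
    delta = 10 ** (ripple_db / 20) - 1   # relative amplitude deviation
    return 20 * math.log10(delta / 2)     # each of the two echoes has amplitude delta/2


def echo_db_to_ripple(echo_db):
    """Inverse: the ripple (+/- dB) whose paired echoes sit exactly at echo_db."""
    delta = 2 * 10 ** (echo_db / 20)
    return 20 * math.log10(1 + delta)


print(f"{ripple_to_echo_db(0.1):.1f} dB")    # -44.7 dB
print(f"{echo_db_to_ripple(-40.0):.2f} dB")  # 0.17 dB
```

Under this model a ±0.1 dB ripple keeps each echo near -45 dB, roughly 5 dB below the quoted audibility level, whereas echoes at exactly -40 dB would correspond to about ±0.17 dB.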
EQUALIZER ALGORITHM
In principle, the required bandpass filters can be realized using the direct form of Finite Impulse Response (FIR) filters [1]. For the i-th band we have, e.g.:

$$y_i(n) = \sum_{l=0}^{L_i} h_i(l)\, x(n-l)$$
The impulse response of the FIR filter is given by the set of coefficients h_i(l), l = 0, 1, ..., L_i. A symmetric impulse response ensures a linear phase. To gain some insight into the task we had set ourselves, we designed the 10 bandpass filters of Table 1 as equiripple filters with the (worst-case) specification of 90 dB stopband attenuation and ±0.1 dB passband ripple for each bandpass filter. For this design we used the filter design software of the Signal Processing Workstation (SPW) [2]. The number of coefficients L_i necessary for the realization in this "brute force" approach is summarized in Table 2.
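The statement that a symmetric impulse response ensures linear phase can be checked numerically. The sketch below uses an arbitrary 5-tap example filter, not one of the SPW designs. For an N-tap filter it removes the pure delay of (N-1)/2 samples from the frequency response. If the remainder is real, the phase is exactly linear in frequency.

```python
import cmath


def freq_response(h, omega):
    """H(e^{j*omega}) = sum_l h[l] * e^{-j*omega*l} for an FIR filter h."""
    return sum(hl * cmath.exp(-1j * omega * l) for l, hl in enumerate(h))


def is_symmetric(h, tol=1e-12):
    """True if h[l] == h[N-1-l] for all l (within tol)."""
    n = len(h)
    return all(abs(h[l] - h[n - 1 - l]) <= tol for l in range(n))


def zero_phase_response(h, omega):
    """H(e^{j*omega}) with the pure delay of (N-1)/2 samples removed.
    For a symmetric h this is real, so the phase of H is -omega*(N-1)/2
    (plus jumps of pi where the real value changes sign)."""
    centre = (len(h) - 1) / 2
    return freq_response(h, omega) * cmath.exp(1j * omega * centre)


h = [1.0, 3.0, 5.0, 3.0, 1.0]   # arbitrary symmetric example filter
for w in (0.3, 1.0, 2.5):
    print(is_symmetric(h), abs(zero_phase_response(h, w).imag) < 1e-9)
```

The same check on an asymmetric filter such as [1, 2, 3] leaves a clearly nonzero imaginary part, i.e. a frequency-dependent group delay.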
For this realization the total number of filter coefficients is 37277. To test this realization against the system specification, we would therefore need a processing power of 1.65 Giga Operations per Second (GOPS) and at least 76 DSP chips, based upon the assumption that a multiply-and-accumulate instruction is performed in two clock cycles and with a clock cycle of 44. [...] We have chosen the multi-rate signal processing structure of Fig. 1, which appeared promising for a further reduction of computations [4]. By applying decimation techniques the number of multiplications per second can be reduced considerably. Of course, the original frequency content must be restored by interpolation. To reduce the sample rate by a factor M, we first have to filter the signal to remove the frequency components above half the reduced sample rate; this prevents alias distortion. Of every M samples only one sample is processed, the others are discarded.
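The processing-power figure follows from multiplying the total coefficient count by the sampling rate, since in the brute-force structure every coefficient costs one multiply-accumulate per output sample. The sketch reproduces that arithmetic. The DSP clock rate is left as a parameter of the hypothetical helper `dsp_count`, because its value is cut off in the text above.

```python
import math

FS = 44_100                  # CD sampling rate in Hz
TOTAL_COEFFICIENTS = 37_277  # sum of L_i over the 10 brute-force bands (Table 2)


def mac_rate(coefficients, fs=FS, decimation=1):
    """Multiply-accumulates per second for an FIR filter running at fs/decimation."""
    return coefficients * fs / decimation


def dsp_count(required_macs, clock_hz, cycles_per_mac=2):
    """Number of DSPs needed if each delivers clock_hz / cycles_per_mac MAC/s."""
    return math.ceil(required_macs / (clock_hz / cycles_per_mac))


print(f"{mac_rate(TOTAL_COEFFICIENTS) / 1e9:.3f} GMAC/s")  # 1.644 GMAC/s
```

The result, about 1.64 × 10^9 multiply-accumulates per second, agrees with the quoted 1.65 GOPS. The `decimation` argument shows the lever used next: a filter running after decimation by M costs 1/M of its full-rate price.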
Because the requirements on these anti-alias filters are less stringent, it is often advantageous to perform the decimation (interpolation) in more than one stage.
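The decimation step described above (anti-alias filtering, then keeping one sample in M) can be sketched as follows. The point is that only the retained outputs are ever computed, so each stage costs 1/M of a full-rate filter, and a cascade realizes the product of the stage factors. The stage filter and factors in the example are hypothetical; the 3-tap lowpass is deliberately short and far from a real anti-alias design.

```python
def decimate(x, h, m):
    """Anti-alias FIR filtering with h and down-sampling by m in one pass.
    Only every m-th output is computed; the m-1 discarded outputs cost
    no multiplications at all."""
    out = []
    for n in range(0, len(x), m):
        acc = 0.0
        for l, hl in enumerate(h):
            if n - l >= 0:
                acc += hl * x[n - l]
        out.append(acc)
    return out


def decimate_multistage(x, stages):
    """Cascade of (h, m) stages; the overall decimation factor is prod(m)."""
    for h, m in stages:
        x = decimate(x, h, m)
    return x


# Hypothetical two-stage decimation by 4 with a short lowpass (DC gain 1).
LP = [0.25, 0.5, 0.25]
y = decimate_multistage([1.0] * 64, [(LP, 2), (LP, 2)])
print(len(y), y[:4])  # 16 [0.0625, 0.8125, 1.0, 1.0]
```

After the start-up transient the constant input passes unchanged at one quarter of the original rate.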
DSP IMPLEMENTATION
The implementation on the DSP can be divided into two parts: after discussing the implementation of a single band, we turn to the multi-band case. For a single band we choose an input buffer size of twice the decimation factor M of the first stage. Once M samples have been read, M new output samples can be computed. The algorithm includes modulation, filtering, demodulation, etc., as indicated in the block diagram of Fig. 1. The modulations/demodulations at the input and at the output of the block diagram are performed with cosine and sine tables, because the DSP cannot execute a simple cos(.) instruction. The length of such a table depends on the modulation frequency (i.e. the ratio of the sampling frequency and the center frequency of the pertinent bandpass filter). With octave bands, the lower frequencies need more coefficients to obtain a cyclic table. The modulations/demodulations in the middle of the diagram are simple frequency shifts of π/2, so these tables become very short, containing the coefficients 0, 1, 0, -1.
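The table-length remark can be made concrete. Assuming integer frequencies in Hz, if f_c/f_s reduces to p/q in lowest terms, cos(2π f_c n / f_s) repeats after exactly q samples, so q is the shortest cyclic table. The sketch below builds such tables with Python's `fractions.Fraction`. The modulation frequencies in the example are hypothetical, not the center frequencies of Table 1.

```python
import math
from fractions import Fraction


def cyclic_tables(fc_hz, fs_hz):
    """Shortest cosine/sine tables that repeat exactly for modulation at fc_hz.
    Both arguments are integers in Hz; with fc/fs = p/q in lowest terms the
    sequences cos(2*pi*p*n/q) and sin(2*pi*p*n/q) have period q."""
    ratio = Fraction(fc_hz, fs_hz)        # automatically reduced to lowest terms
    p, q = ratio.numerator, ratio.denominator
    cos_t = [math.cos(2 * math.pi * p * n / q) for n in range(q)]
    sin_t = [math.sin(2 * math.pi * p * n / q) for n in range(q)]
    return cos_t, sin_t


FS = 44_100
for fc in (11_025, 1_000, 63):            # hypothetical modulation frequencies
    print(fc, len(cyclic_tables(fc, FS)[0]))
# table lengths: 4, 441 and 700
```

A shift of a quarter of the sampling frequency (π/2 rad per sample) gives the 4-entry tables 1, 0, -1, 0 (cosine) and 0, 1, 0, -1 (sine), matching the short tables mentioned above.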
For the filter operations we use tables of filter coefficients with a pointer to the current coefficient. The input and output of such a filter are also stored in tables addressed by pointers, one table for the real components and one for the imaginary components. By handling these pointers properly, the samples are multiplied by the pertinent coefficients and accumulated; each filter is computed in this way. Decimation by a factor M is performed by removing (not using) M - 1 of every M samples. The interpolation output must be multiplied by the interpolation factor to restore the level of the signal. The single-band implementation is illustrated in Fig. 2.
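The gain correction after interpolation follows from zero-stuffing. Inserting L - 1 zeros between samples lowers the average sample value by a factor L, so a low-pass filter with unit passband gain returns a signal L times too small. A minimal sketch, with a hypothetical 3-tap interpolation filter of unit DC gain:

```python
def interpolate(x, h, factor):
    """Up-sample by `factor`: insert factor-1 zeros after every sample,
    low-pass filter with h (unit DC gain assumed), and multiply by `factor`
    to compensate for the level lost by the inserted zeros."""
    up = []
    for v in x:
        up.append(v)
        up.extend([0.0] * (factor - 1))
    out = []
    for n in range(len(up)):
        acc = 0.0
        for l, hl in enumerate(h):
            if n - l >= 0:
                acc += hl * up[n - l]
        out.append(factor * acc)
    return out


# Hypothetical interpolation filter for factor 2 (linear interpolation, DC gain 1).
LP = [0.25, 0.5, 0.25]
y = interpolate([1.0] * 16, LP, 2)
print(len(y), y[:3])  # 32 [0.5, 1.0, 1.0]
```

Without the final multiplication by `factor` the constant input would come out at half its level.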
Expanding this single-band system to a multi-band system raises some problems. For the input and output we should take a buffer of at least twice the maximum decimation factor. In particular cases, however, this size might prove too short, and samples might be processed after they have already been overwritten (for example, if the two largest decimation factors do not differ much). For this reason the input/output buffer should have a size of twice the maximum decimation factor plus the second largest decimation factor. A pointer giving the current position in the main input/output buffer is kept for each band. The bands are processed according to a scheduling table, a list of pointers to the bands to be processed.
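The buffer rule and the scheduling table can be sketched as below. `io_buffer_size` encodes the rule stated in the text. `scheduling_table` is a hypothetical construction of ours, since the text does not describe the table's contents. In this construction, a band with decimation factor M is marked for processing each time a full block of M new input samples is available, over one period equal to the least common multiple of all factors. The decimation factors in the example are also hypothetical.

```python
import math


def io_buffer_size(decimation_factors):
    """Shared input/output buffer size: twice the largest decimation factor
    plus the second largest one (the rule given in the text)."""
    ordered = sorted(decimation_factors, reverse=True)
    second = ordered[1] if len(ordered) > 1 else 0
    return 2 * ordered[0] + second


def scheduling_table(decimation_factors):
    """Hypothetical schedule: entry n-1 lists the bands whose block of M
    input samples is complete after input sample n, for n = 1 .. lcm(M)."""
    period = math.lcm(*decimation_factors)
    return [
        [band for band, m in enumerate(decimation_factors) if n % m == 0]
        for n in range(1, period + 1)
    ]


factors = [2, 4, 8, 16]                 # hypothetical per-band decimation factors
print(io_buffer_size(factors))          # 40
print(scheduling_table([2, 4]))         # [[], [0], [], [0, 1]]
```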
The necessary causality of the filters introduces a signal delay in each individual filter. The total delay of each band is the sum of the delays of its individual filters. Since each band has different filters and different decimation factors, the total delays of the bands will differ. At this point the DSP can be programmed. The main problem in programming is, of course, the processing time available on the DSP per sample. With the Motorola DSP56001, about 460 clock cycles are available between two consecutive input samples. This does not seem much for realizing the entire system; however, efficient (pre)decimation techniques raise this number to M times the original number of cycles, depending on the (pre)decimation factor M used.
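Because the band outputs are added, the unequal band delays presumably have to be compensated for the overall phase to stay linear. This is our inference; the text only notes that the delays differ. For linear-phase FIR filters the delay is easy to compute: an N-tap filter delays by (N-1)/2 samples at the rate where it runs, and each such sample spans D input-sample periods when that rate is the input rate divided by D. The sketch sums these contributions per band, derives the padding each band would need, and also computes the cycle budget per sample. All filter lengths, factors and the clock frequency in the example are hypothetical.

```python
from fractions import Fraction


def band_delay(stages):
    """Delay of one band in input-sample periods. `stages` lists
    (num_taps, rate_divisor) for each linear-phase FIR filter in the band,
    where rate_divisor is the total decimation in force where it runs."""
    return sum(Fraction(taps - 1, 2) * divisor for taps, divisor in stages)


def alignment_delays(bands):
    """Extra delay to add to each band so all bands match the slowest one."""
    delays = [band_delay(stages) for stages in bands]
    longest = max(delays)
    return [longest - d for d in delays]


def cycles_per_sample(clock_hz, fs_hz=44_100, decimation=1):
    """Clock cycles available per processed sample when the processing runs
    at fs/decimation (decimation=1 gives the full-rate budget)."""
    return clock_hz * decimation / fs_hz


bands = [
    [(31, 1)],                    # hypothetical band processed at the full rate
    [(15, 1), (63, 4), (15, 1)],  # hypothetical band: decimate, filter at fs/4, interpolate
]
print(alignment_delays(bands))          # [Fraction(123, 1), Fraction(0, 1)]
print(round(cycles_per_sample(20e6), 1))  # 453.5 with an assumed 20 MHz clock
```

An assumed clock of about 20 MHz gives a full-rate budget in the same range as the quoted 460 cycles. A band processed after decimation by M gets M times that budget.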
VLSI IMPLEMENTATION
Besides the DSP implementation of the developed algorithm, we wanted to survey the possibility of realizing the algorithm in dedicated hardware. Compared to DSPs, a (V)LSI implementation of an algorithm can yield gains in speed and area, at the price of less flexibility. One can optimize towards speed and/or area while preserving the necessary functionality. In contrast with the DSP realization, where the number of cycles is the main cost factor (because of the sequential behaviour of the processor), dedicated hardware can make full use of parallelism and pipelining. In this part two prototype designs are briefly discussed: the first makes use of extensive pipelining, thereby minimizing the number of functional blocks, whereas the second uses several functional parts working in parallel, each executing part of the algorithm.
To assure a correct design, the algorithm is first described and simulated using the Modeling and Design Language (MoDL) [5]. Starting from the behavioral description, the design is worked out via an initial implementation towards a more detailed hardware description, so that every design step can be verified. From this hardware description the datapath is entered in the Mentor Graphics Idea Station (schematic entry) [6] using a standard cell library. The controller part of the design is generated using a logic synthesis package [7] with an interface to the Mentor Graphics Idea Station. The layout is generated using the Mentor Graphics Cell Station (placement and routing). Processing is done via Eurochip in the chosen standard cell technology (ES2 1.5 µm CMOS or MIETEC 2.4 µm CMOS). After fabrication the design will be tested with an ASIC tester using the determined test patterns, as well as in-circuit in an audio test environment.
Looking at the algorithm, we can distinguish several functions that have to be performed. It needs 14 convolutions, each of which has to perform as many multiplications as possible within one sample period (this determines the maximal filter lengths). In addition, 9 slower multiplications for modulation and attenuation are required, each performing just one multiplication per sample period. Finally, some additions and control are needed. If the numbers of coefficients of the various filters and the sample period are known, one can compute the maximum time allowed for a multiplication. This knowledge is needed to match the hardware to its function.
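The maximum multiplication time follows directly. If all coefficients of the filters sharing one multiplier must be processed within one sample period T_s = 1/f_s, each multiplication may take at most T_s divided by the total coefficient count. The sketch uses the 90-coefficient filter mentioned for the first prototype; the three-filter call is a hypothetical example.

```python
FS = 44_100  # Hz


def max_multiply_time(coefficient_counts, fs=FS, multipliers=1):
    """Longest allowed time per multiplication (seconds) when all listed filter
    coefficients must be processed within one sample period, with the work
    spread evenly over `multipliers` identical hardware multipliers."""
    return multipliers / (fs * sum(coefficient_counts))


print(f"{max_multiply_time([90]) * 1e9:.1f} ns")          # 252.0 ns
print(f"{max_multiply_time([40, 40, 20]) * 1e9:.1f} ns")  # 226.8 ns (hypothetical filters)
```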
The first prototype design uses a parallel multiplier. In the given 1.5 µm CMOS technology this multiplier is fast enough to compute the convolution of a filter with 90 coefficients within a single sample period. With 8-stage pipelining it is possible to compute the 14 required convolutions, together with some of the necessary modulation and attenuation computations, with just two parallel multipliers and a large number of pipeline registers. Looking at the first design, however, we see that no use is made of the sample rate reduction introduced by the decimation factors of the algorithm. This reduction would allow much more time for computing the intermediate filter convolutions of the algorithm.
This first prototype results in a two-chip solution requiring approximately 64 mm² per chip in the ES2 1.5 µm CMOS standard cell technology. To this end the algorithm is divided into two parts (an upper and a lower part), each performed by one chip. This is easily done if the intermediate and final results are combined on each chip; in this way both chips are exactly equal. The chips will run at 33 MHz and will have 64 pins, 52 of which are used to address and read the two 40 ns 1K RAM banks of coefficient and sample memory [8]. See Table 3.
For the second design we aimed to exploit the sample rate reduction for subsequent filters within the algorithm while avoiding pipelining, at the cost of more but smaller multipliers. Because several multipliers are used instead of one or two, the filter lengths and multiplier speeds can be trimmed so that large filters are computed at the lower sample rates. This results in a design consisting of 4 serial-parallel multipliers that make optimal use of the lower intermediate sample rates.
The algorithm is divided over four multipliers, see Fig. 3. The first multiplier performs the first 4 filter convolutions (h1, h2, h4, h5) and decimations (m1 and m4) at the initial (and highest) sample rate. The second multiplier (second stage) handles filters h3, h6, h7, h8 and h11, as well as decimations m2, m3, m5, m6 and interpolations I1 and I4; this is possible because its sample rate is m1 times lower. The third multiplier computes filters h9 and h12 together with interpolations I2 and I5. The last multiplier (again working at the highest sample rate) computes filters h10 and h13. By optimizing the filters for this design, i.e. making filters h1, h4, h10 and h13 small while allowing the other filters to become relatively large, one makes more efficient use of the available computing power without the need for parallel multipliers or pipelining techniques [9], [10].

Figure 3: Division of the algorithm over different multipliers
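Why this partition pays off can be seen from the load on each multiplier. A filter of N taps running at f_s/D costs N·f_s/D multiplications per second, so a long filter placed after decimation can cost no more than a short one at the full rate. The sketch computes that load and the resulting utilisation of one multiplier. The filter lengths, rate divisors, clock frequency and cycles per multiplication are hypothetical, not the values of the actual design.

```python
FS = 44_100  # Hz


def multiplier_load(filters, fs=FS):
    """Multiplications per second on one multiplier; filters = [(num_taps, rate_divisor)]."""
    return sum(taps * fs / divisor for taps, divisor in filters)


def utilisation(filters, clock_hz, cycles_per_mult, fs=FS):
    """Fraction of the multiplier's capacity used (must stay below 1.0)."""
    return multiplier_load(filters, fs) * cycles_per_mult / clock_hz


# Hypothetical: four short filters at the full rate versus five long ones at fs/8.
stage_1 = [(15, 1)] * 4
stage_2 = [(96, 8)] * 5
for name, filters in (("stage 1", stage_1), ("stage 2", stage_2)):
    print(name, multiplier_load(filters), round(utilisation(filters, 25e6, 8), 3))
# both stages: 2646000.0 multiplications/s, utilisation 0.847
```

Here five 96-tap filters at one eighth of the rate load their multiplier exactly as much as four 15-tap filters at the full rate, which is the effect exploited by keeping h1, h4, h10 and h13 short.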
In this second prototype, too, the equality of the FIR filter coefficients around the center coefficient is turned to advantage. In a filter of length N, coefficients 1 and N, 2 and N-1, etc. are equal, so samples 1 and N, 2 and N-1, etc. can first be added before the multiplication with the filter coefficient is performed. This reduces both the number of multiplications required and the size of the coefficient memory by about a factor of two, which makes it possible to compute more filters with one multiplier or to use larger filters. See Fig. 4.
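The pre-addition can be sketched as follows for a window holding the last N input samples. This illustrates the technique, not the chip's datapath. The folded version also returns the number of multiplications it used, so the saving is visible. For odd N the centre tap has no partner and is multiplied on its own, giving ceil(N/2) multiplications instead of N.

```python
def fir_direct(window, h):
    """Direct form: y = sum_k h[k] * window[k], with window[k] = x[n-k]."""
    return sum(hk * xk for hk, xk in zip(h, window))


def fir_folded(window, h):
    """Same output for a symmetric h (h[k] == h[N-1-k]): samples sharing a
    coefficient are added first, then multiplied once. Only the first
    ceil(N/2) coefficients are read, so the coefficient memory halves too.
    Returns (output, number_of_multiplications)."""
    n = len(h)
    acc = 0.0
    mults = 0
    for k in range(n // 2):
        acc += h[k] * (window[k] + window[n - 1 - k])
        mults += 1
    if n % 2:                    # odd length: centre tap has no partner
        acc += h[n // 2] * window[n // 2]
        mults += 1
    return acc, mults


h = [1.0, 3.0, 5.0, 3.0, 1.0]
window = [2.0, -1.0, 4.0, 0.5, 7.0]
print(fir_direct(window, h), fir_folded(window, h))  # 27.5 (27.5, 3)
```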
This second prototype results in an estimated chip size of 35 mm² in the ES2 1.5 µm CMOS standard cell technology. The chip will run at 25 MHz and will have the same pin count of 64 pins. The required amount of off-chip memory will be around 2 kB of RAM, and the access time of the RAMs used must be less than 20 ns. See Table 3.

Figure 4: Reduction of multiplications through symmetric filter characteristics
In the current stage of the VLSI implementation, external memory will be used to store the filter coefficients and data samples; embedded memory may replace this storage in the future. Obviously, the number of memory accesses is limited by the read/write access time of the memory devices used. There are two reasons for minimizing the number of memory accesses of a particular filter (bank) implementation. First, fewer accesses reduce the cost of the memory environment, since the prices of memories are proportional to the speed of the devices; the size of the memories should be taken into account for the same reason. Secondly, minimizing the number of memory accesses makes it possible to implement filters with more stringent specifications.
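The link between memory accesses and access time can be quantified with a rough count. Assume, as a simplification of ours, a single memory that serves all accesses strictly one after another. Each output of an N-tap filter then needs N sample reads and N coefficient reads, or only ceil(N/2) coefficient reads with symmetric folding. Dividing the sample period by the accesses per input sample gives the slowest memory that still keeps up. The filter bank in the example is hypothetical.

```python
import math

FS = 44_100  # Hz


def accesses_per_input_sample(filters, folded=True):
    """Memory reads per input-sample period; filters = [(num_taps, rate_divisor)].
    Each output needs num_taps sample reads plus num_taps coefficient reads,
    or ceil(num_taps / 2) coefficient reads when symmetric folding is used."""
    total = 0.0
    for taps, divisor in filters:
        coeff_reads = math.ceil(taps / 2) if folded else taps
        total += (taps + coeff_reads) / divisor
    return total


def max_access_time(filters, fs=FS, folded=True):
    """Slowest allowed access time (s) for one strictly sequential memory."""
    return 1.0 / (fs * accesses_per_input_sample(filters, folded))


bank = [(15, 1)] * 4 + [(95, 8)] * 5   # hypothetical filter bank
for folded in (False, True):
    print(folded, f"{max_access_time(bank, folded=folded) * 1e9:.1f} ns")
```

Folding relaxes the access-time requirement here. Equivalently, for a given memory speed it leaves room for longer filters, which is the second reason given above.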
