Analogue VLSI Implementation of a 2-D Sound Localisation System by Grech, Ivan
Analogue VLSI Implementation of a 2-D Sound Localisation System
Ivan Grech 
Submitted for the Degree of 
Doctor of Philosophy 
from the 
University of Surrey 
Surrey Space Centre 
School of Electronic Engineering, Information Technology and Mathematics 
University of Surrey 
Guildford, Surrey GU2 5XH, UK 
February 2002 
© I. Grech 2002 
Summary 
The position of a sound source can be accurately determined in both azimuth and 
elevation through the use of localisation cues extracted from the incident audio signals. 
Compared to lateral localisation, 2-D hardware localisation is novel and requires the 
extraction of spectral cues in addition to time delay cues. The objective of this work is to 
develop an analogue VLSI system which extracts these cues from audio signals arriving 
at the left and right channels of the system and then maps these cues to the source
position. The use of analogue hardware, which is broadly adapted from the biological
auditory system, enables fast, low-power computation.
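The time delay cue mentioned above can be illustrated in software. The following is a minimal discrete-time sketch, not the switched-capacitor correlator developed in the thesis: it evaluates the cross-correlation of the left and right signals over a bank of candidate delays and returns the delay with the largest correlation. The function name `estimate_itd`, the maximum search delay and the synthetic white-noise test signal are assumptions made for illustration only.

```python
import random


def estimate_itd(left, right, fs, max_lag_s):
    """Delay (s) of `right` relative to `left` that maximises their cross-correlation.

    A positive value means the right signal lags the left one, i.e. the
    sound reached the left channel first.
    """
    max_lag = int(round(max_lag_s * fs))
    n = len(left)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Correlate left[t] with right[t + lag] over the overlapping samples only.
        lo, hi = max(0, -lag), min(n, n - lag)
        score = sum(left[t] * right[t + lag] for t in range(lo, hi))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag / fs


# Usage: white noise reaching the right channel 10 samples after the left.
fs = 48000
rng = random.Random(0)
left = [rng.gauss(0.0, 1.0) for _ in range(2048)]
right = [0.0] * 10 + left[:-10]
itd = estimate_itd(left, right, fs, max_lag_s=1e-3)  # about 208 microseconds
```

Searching a fixed, finite set of delays mirrors the idea of a correlator evaluated at discrete delay taps; a finer delay resolution would need a higher sampling rate or interpolation between taps.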
To obtain accurate 2-D localisation from the hardware-extracted cues a novel 
algorithm for the mapping process has been developed. The performance of this algorithm 
is evaluated via simulation under different environmental conditions. The effects of 
hardware non-idealities on the localisation accuracy, including mismatches and noise, are
also assessed.
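As an illustration of what a cue-to-position mapping has to do, the sketch below performs an exhaustive nearest-template search: each candidate (azimuth, elevation) on a grid has a stored cue vector, and the estimate is the grid position whose template is closest, in squared Euclidean distance, to the measured cue vector. This is not the three-step search or the perceptual distance developed in the thesis. The cue function `toy_cues`, the grid spacing and the direction convention in `angular_error_deg` (azimuth treated as longitude, elevation as latitude) are hypothetical.

```python
import math


def toy_cues(az_deg, el_deg):
    """Hypothetical smooth cue vector for a direction (stands in for measured templates)."""
    az, el = math.radians(az_deg), math.radians(el_deg)
    return (math.sin(az), math.sin(el), math.cos(az) * math.cos(el))


def nearest_position(cues, templates):
    """Exhaustive search: the position whose template is closest to `cues`."""
    best_pos, best_d2 = None, math.inf
    for pos, tmpl in templates.items():
        d2 = sum((c - t) ** 2 for c, t in zip(cues, tmpl))
        if d2 < best_d2:
            best_pos, best_d2 = pos, d2
    return best_pos


def angular_error_deg(p, q):
    """Great-circle angle between two (azimuth, elevation) directions, in degrees."""
    az1, el1, az2, el2 = map(math.radians, (*p, *q))
    c = (math.sin(el1) * math.sin(el2)
         + math.cos(el1) * math.cos(el2) * math.cos(az1 - az2))
    return math.degrees(math.acos(max(-1.0, min(1.0, c))))  # clamp rounding errors


# Usage: 5 deg azimuth x 15 deg elevation template grid, slightly perturbed cues.
templates = {(az, el): toy_cues(az, el)
             for az in range(-90, 91, 5) for el in range(-45, 91, 15)}
true_pos = (31, 14)
measured = tuple(c + 0.01 for c in toy_cues(*true_pos))
estimate = nearest_position(measured, templates)
error = angular_error_deg(estimate, true_pos)
```

An exhaustive search costs one distance evaluation per template, which is why faster, coarse-to-fine search strategies become attractive as the template grid is refined.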
The analogue hardware implementation is divided into three main sections: a 
front-end for splitting the input signal into different frequency bands and extracting
spectral cues, an onset detector for distinguishing between the incident portion and the 
echo portion of the acoustic signal, and a correlator for determination of time delay cues. 
Novel building blocks have been designed in standard CMOS to enable low-voltage, low-power
operation of the differential architecture, which is essential for the accuracy of
the extracted cues. A novel feedback technique enables accurately controlled Class AB 
operation of a low voltage switched-current memory cell. A novel cross-coupling 
technique ensures correct Class AB operation of a log-domain bandpass filter. The five 
chips developed here operate from a ± 0.9 V supply. The system has been tested by applying
audio signals convolved with a position-dependent transfer function at the input, and then 
processing the resulting hardware-generated cues. Measurement results show that 2-D 
localisation to within 5° is achievable using hardware-extracted cues.
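The role of the onset detector, separating the incident portion of a sound from its echoes, can be sketched in discrete time under a simple assumed echo model: an echo estimate decays exponentially with time constant `tau_s`, each envelope sample raises it to at least `reverb` times that sample, and an onset is flagged only when the envelope exceeds `threshold` times the predicted echo, after which a fixed-length window is opened for cue extraction. The parameter names and values are hypothetical, and this is not the current-mode onset detector circuit of the thesis; it only shows why a weaker sound arriving shortly after a strong one does not open a new window.

```python
import math


def onset_gate(env, fs, reverb=0.8, tau_s=0.05, threshold=1.5,
               floor=1e-3, window_s=0.01):
    """Return one boolean per envelope sample: True while an onset window is open.

    Hypothetical echo model: the echo estimate decays as exp(-t / tau_s) and is
    raised to at least reverb * env[n] by every new sample.
    """
    decay = math.exp(-1.0 / (tau_s * fs))   # per-sample echo decay factor
    window_len = int(round(window_s * fs))  # onset window length in samples
    echo = 0.0
    remaining = 0                           # samples left in the open window
    gate = []
    for x in env:
        predicted = decay * echo            # echo expected at this sample
        # Trigger only if the envelope clearly exceeds the predicted echo.
        if remaining == 0 and x > floor and x > threshold * predicted:
            remaining = window_len
        gate.append(remaining > 0)
        if remaining > 0:
            remaining -= 1
        echo = max(predicted, reverb * x)   # update the echo estimate
    return gate


# Usage: strong burst, weaker "echo" 20 ms after it ends, then a new strong burst.
fs = 1000
env = ([0.0] * 100 + [1.0] * 50 + [0.0] * 20 + [0.5] * 30
       + [0.0] * 400 + [1.0] * 50 + [0.0] * 50)
gate = onset_gate(env, fs)
onsets = [n for n in range(1, len(gate)) if gate[n] and not gate[n - 1]]
```

In this test signal the weaker burst stays below 1.5 times the decaying echo estimate, so only the two strong bursts (at samples 100 and 600) open a 10 ms window.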
Key words: sound localisation, analogue VLSI, silicon cochlea, log domain, switched 
capacitor, switched current, current mode, analogue processing. 
Email: igrech@eng.um.edu.mt
WWW: http://www.eng.um.edu.mt/microelectronics/igrech.html
Acknowledgements 
I would like to express my gratitude to my supervisor, Dr. Tanya
Vladimirova, and co-supervisor, Prof. Joseph Micallef, for their constant guidance and
support throughout the course of this research work. 
I would also like to thank the University of Malta for offering me the 
opportunity and funding necessary to carry out this research work. 
Finally, I would like to thank my family for the constant support and
encouragement they have shown during the course of my studies. 
Publications 
Parts of this thesis have been published as follows: 
1. I. Grech, J. Micallef, and T. Vladimirova, "The Silicon Cochlea and its Adaptation to Sound Localisation", IEE Proceedings on Circuits, Devices and Systems, 146, pp. 70-76, 1999. (Awarded the 2000 IEE Ambrose Fleming Premium).
2. I. Grech, J. Micallef, and T. Vladimirova, "± 0.9 V Switched Capacitor Multiplier with Rail-to-Rail Input", Electronics Letters, 35, pp. 1688-1689, 1999.
3. I. Grech, J. Micallef, and T. Vladimirova, "Two ± 0.7 V S2I Class AB Fully Differential Memory Cells", Electronics Letters, 36, pp. 2062-2063, 2000.
4. I. Grech, J. Micallef, and T. Vladimirova, "Low-Voltage, High-Speed Fully Differential CMOS Op amp", Proceedings of the 7th IEEE International Conference on Electronics, Circuits and Systems, 1, pp. 7-11, 2000.
5. I. Grech, J. Micallef, and T. Vladimirova, "Low-Voltage, SC TDM Correlator for the Extraction of Time Delay", Proceedings of the 7th IEEE International Conference on Electronics, Circuits and Systems, 1, pp. 112-115, 2000.
6. I. Grech, J. Micallef, and T. Vladimirova, "Low Voltage Programmable Current Mode Circuit for Onset Detection in a Sound Signal", Proceedings of the 8th IEEE International Conference on Electronics, Circuits and Systems, 3, pp. 1119-1122, 2001.
7. I. Grech, J. Micallef, and T. Vladimirova, "Low Voltage Log Domain Front End for the Extraction of 2-D Sound Localisation Cues", Proceedings of the 8th IEEE International Conference on Electronics, Circuits and Systems, 3, pp. 1473-1476, 2001.
8. I. Grech, J. Micallef, and T. Vladimirova, "Cue-to-Position Mapping Algorithm for 2-D Spatial Localisation using Interaural and Monaural Cues Derived from Analogue Hardware", submitted for publication.
Contents 
Chapter 1 - Introduction
1.0 Sound Localisation Applications ... 1
1.1 Sound Localisation Techniques ... 2
1.2 Analogue Versus Digital ... 3
1.3 Low Voltage Low Power Design ... 5
1.4 Deviations from the Biological System ... 8
1.5 Scope and Outline of Thesis ... 9
Chapter 2 - The Silicon Cochlea with Application to Sound Localisation
2.0 Introduction ... 12
2.1 The Biological Cochlea ... 13
2.2 Hardware Models of the Auditory System ... 16
2.2.1 Middle Ear Models ... 16
2.2.2 Inner Ear Models ... 17
2.2.2.1 Transconductance-C Implementations ... 17
2.2.2.2 Log-Domain Implementations ... 19
2.2.2.2.1 The Translinear Principle ... 20
2.2.2.3 Digital Implementation ... 21
2.2.3 Inner Hair Cell Models ... 22
2.2.4 Outer Hair Cell Modelling and Automatic Gain Control ... 24
2.2.5 Spiral Ganglion Neurone Models ... 25
2.2.6 Cochlear Nucleus Models ... 26
2.3 Design Issues Related to the Performance of the Silicon Cochlea ... 27
2.3.1 Second Order Lowpass Filter Stability ... 27
2.3.2 Dynamic Range ... 28
2.3.3 Device Mismatch ... 29
2.3.4 Area Requirement ... 30
2.4 Application of the Silicon Cochlea to Sound Localisation ... 31
2.4.1 Localisation Theory ... 31
2.4.2 VLSI Implementation of Localisation Systems ... 32
2.5 Enhancements in VLSI Implementations of Sound Localisation ... 34
2.6 Conclusions ... 35
Chapter 3 - Cue-To-Position Mapping Algorithm for 2-D Spatial Localisation
3.0 Introduction ... 36
3.1 Coordinate System Used ... 39
3.2 System Model Overview ... 40
3.2.1 Block Diagram ... 40
3.2.2 Stimulus Generation ... 42
3.3 Cue Extraction Analogue Blocks ... 42
3.3.1 The Filter Bank ... 42
3.3.2 Interaural Phase Delay ... 43
3.3.3 Envelope Detection ... 44
3.3.4 Interaural Intensity Difference Computation ... 44
3.3.5 Interaural Envelope Delay Computation ... 44
3.3.6 Monaural Spectral Cues Computation ... 44
3.3.7 Onset Detection ... 46
3.4 Template Generation and Interpolation of the Resulting Cues ... 47
3.5 Cue-To-Position Mapping Algorithm ... 49
3.5.1 The Search Method ... 49
3.5.1.1 Basis for the Algorithm ... 49
3.5.1.2 Three-Step Search Method ... 53
3.6 Simulation Results ... 54
3.6.1 Environmental Effects ... 55
3.6.1.1 Effect of Noise Interference ... 55
3.6.1.2 Effect of Echo Interference ... 56
3.6.2 Hardware-Induced Effects ... 57
3.6.2.1 Effect of Channel Cross-talk ... 57
3.6.2.2 Effect of Hardware Accuracy on Localisation Error and Individual Cues ... 60
3.6.3 Performance at Different Source Positions ... 63
3.6.4 Accuracy and Speed of the 3-Step Mapping Algorithm ... 67
3.7 Generic Search Algorithm ... 68
3.8 Conclusions ... 69
Chapter 4 - Hardware Detection of Onsets in a Sound Signal
4.0 Introduction ... 71
4.1 Principle of Operation ... 71
4.2 Analogue and Mixed-Mode VLSI Design Overview ... 76
4.2.1 Analogue VLSI Design ... 76
4.2.2 Mixed-Mode IC Design ... 77
4.2.3 Analogue Layout Techniques ... 78
4.3 Implementation ... 78
4.3.1 Voltage-to-Current Converter and Harmonic Mean Splitter ... 79
4.3.2 Squaring Circuit ... 80
4.3.3 Envelope LPF ... 82
4.3.3.1 Log-Domain Circuit Synthesis ... 82
4.3.4 Delay Line ... 84
4.3.4.1 Class AB S2I Differential ± 0.7 V Current Memories ... 85
4.3.4.1.1 Circuit Implementation I ... 85
4.3.4.1.2 Circuit Implementation II ... 86
4.3.4.1.3 Simulation Results ... 88
4.3.5 Echo Decay Model ... 90
4.3.6 Onset Trigger Threshold ... 91
4.3.7 Window Generator ... 92
4.4 Simulation Results ... 92
4.5 Testing Results ... 95
4.5.1 Voltage-to-Current Converter and Current Splitter ... 95
4.5.2 Squaring Circuit ... 95
4.5.3 Low-Pass Filter ... 98
4.5.4 Delay Line ... 99
4.5.5 Echo Decay Model ... 100
4.5.6 Onset Window Generator ... 104
4.6 Conclusions ... 106
Chapter 5 - Log Domain Front-End for the Extraction of 2-D Sound Localisation Cues
5.0 Introduction ... 108
5.1 Front End System Overview ... 109
5.2 Log-Domain Bandpass Filters ... 110
5.2.1 Log-Domain Filter Based on Gm-C Cochlea Model ... 111
5.2.2 Signal Flow Graph for the Second Log-Domain BPF ... 114
5.2.3 Differential Class AB Architecture ... 117
5.2.4 Implementation of the Non-Linear Cross-Coupling Terms ... 119
5.2.5 Bandpass Filters Circuit Implementation ... 120
5.3 Envelope Extraction ... 122
5.4 Biasing Arrangement of the BPF Array ... 122
5.5 Computation of IID and First Order MSC Cues ... 124
5.6 Second Order MSC Computation ... 125
5.7 Automatic Gain Control Loop ... 125
5.8 Output Multiplexing ... 132
5.9 Simulation Results ... 132
5.10 Testing Results ... 135
5.10.1 Bandpass Filters ... 136
5.10.2 Complete System Response Including AGC ... 141
5.11 Conclusions ... 142
Chapter 6 - Time Delay Cues Generation
6.0 Introduction ... 146
6.1 ITD Extraction Using Analogue Hardware ... 146
6.2 Cascade Delay Lines ... 147
6.2.1 Circuit Design ... 148
6.2.2 Test Results for the Delay Line ... 151
6.3 Time Division Multiplexed SC Correlator ... 153
6.4 High Speed Op Amp Design ... 155
6.4.1 Class AB Slew-Boosted Differential Op Amp ... 155
6.4.2 Experimental Results for the First Differential Op Amp ... 158
6.4.3 Enhanced-Gain Class AB Differential Op Amp ... 159
6.4.3.1 Cascoded Input Stage ... 160
6.4.3.2 Class AB Output Stage ... 161
6.4.3.3 Common-Mode Feedback ... 161
6.4.3.4 Slew Rate Enhancement ... 162
6.4.3.5 Biasing Arrangement ... 163
6.4.4 Simulation Results ... 164
6.5 The Parallel S/H Bank ... 167
6.6 SC Multiplier ... 170
6.7 Integrator Bank ... 173
6.8 Analogue Memory ... 174
6.9 Dual Differential Input Comparison Circuitry ... 176
6.9.1 Adder ... 176
6.9.2 Comparator ... 177
6.10 Digital Control ... 177
6.10.1 Reset Phase ... 178
6.10.2 Correlation Mode ... 178
6.10.3 Comparison Mode ... 179
6.10.4 Deadband Generation ... 181
6.10.5 Clock Voltage Doubler ... 182
6.10.6 Voltage Level Shifters ... 183
6.11 Simulation Results ... 184
6.12 Experimental Results for the Correlator ... 184
6.13 Conclusions ... 190
Chapter 7 - System Testing
7.0 Introduction ... 192
7.1 Test Setup ... 192
7.2 Hardware Cues Template ... 193
7.2.1 IID Cues ... 193
7.2.2 ITD Cues ... 195
7.2.3 First Order and Second Order MSC Cues ... 195
7.3 Test Results at Different Source Positions with Different S/N Values ... 202
7.4 Test Results with Echo - Effect of Onset Detector ... 208
7.5 Test Results with Different Sound Sources ... 208
7.6 Conclusions ... 210
Chapter 8 - Conclusions And Further Work
8.0 Conclusions ... 211
8.1 Further Work ... 214
8.1.1 Improvements on the Cue-To-Position Mapping Algorithm and its Hardware Implementation ... 214
8.1.2 Improvements to the Onset Detector ... 215
8.1.3 Improvements to the Front-End ... 216
8.1.3.1 Microelectromechanical (MEMS) Technology ... 218
8.1.4 Improvements on ITD Extraction ... 218
8.1.5 Supply Voltage Regulation ... 220
8.1.6 Other Applications for the Hardware Building Blocks ... 220
Bibliography ... 222
Appendices
A.1 Layout of Onset Detector Chip ... 234
A.2 Test Setup for Onset Detector and Front-End Chip ... 235
A.3 Layout of Front-End Chip ... 236
A.4 Layout of SC Cascade Delay Line ... 237
A.5 Layout of the Differential Class AB Op Amp ... 238
A.6 Layout of the TDM Correlator Chip ... 239
A.7 Test Setup for the Correlator Chip ... 240
List of Figures
Figure 1-1 Power and area costs for analogue and digital designs ... 5
Figure 1-2 Effects of scaling down CMOS technology ... 6
Figure 2-1 Simplified single-chamber 2-D model of the cochlea ... 14
Figure 2-2 2nd order LPF with separate cut-off frequency and Q-factor tuning ... 17
Figure 2-3 Principle of the current-mode log-domain filter ... 20
Figure 2-4 An MOS 4-transistor translinear loop ... 21
Figure 2-5 Inner hair cell model including temporal adaptation ... 23
Figure 2-6 Cochlea model including AGC ... 25
Figure 2-7 Wide linear range OTA ... 29
Figure 2-8 Block diagram of the 1-D sound localiser chip ... 33
Figure 3-1 Coordinate system used ... 39
Figure 3-2 Synthetic stimulus generation ... 40
Figure 3-3 Proposed front-end ... 41
Figure 3-4 Batteau's model of the external ear ... 45
Figure 3-5 Model used for onset detection ... 47
Figure 3-6 Contour map for the perceptual distance ... 51
Figure 3-7 IPD cues and IED cues for the 9th and 10th BPF ... 52
Figure 3-8 Angular error between the estimated source location and the actual source location plotted as a function of the S/N ratio ... 56
Figure 3-9 Normalised values of the error in (a) IPD, IED and IID cues and (b) 1st and 2nd order monaural spectral cues plotted as a function of the S/N ratio ... 57
Figure 3-10 Angular error between the estimated source position and the actual position plotted versus the echo reverberation constant (Afe) and the echo envelope decay constant (T) ... 58
Figure 3-11 Angular error between the estimated source location and the actual source location plotted as a function of the crosstalk ratio ... 59
Figure 3-12 The angular error between the estimated source position and its actual position plotted versus Q factor ... 61
Figure 3-13 The angular error between the estimated source position and its actual position, for the case of unmatched and matched (2-sided) centre frequency variation in the BPF bank ... 63
Figure 3-14 Simulation results (uncorrelated noise) ... 65
Figure 4-1 Observed sound and total estimated echo envelope for the echo model used as a basis of the onset detector circuit ... 72
Figure 4-2 Discrete-time algorithm for maximum echo estimation in a sound signal ... 73
Figure 4-3 Onset detector block diagram ... 73
Figure 4-4 Operation of the onset detector ... 74
Figure 4-5 Layout of the translinear loop shown in Chapter 2, Figure 2-4 ... 79
Figure 4-6 V-I converter and harmonic mean splitter ... 80
Figure 4-7 Squaring circuit consisting of three translinear loops ... 81
Figure 4-8 Signal flow graph for the envelope LPF ... 82
Figure 4-9 Log-domain building blocks ... 82
Figure 4-10 One side of the 2nd Order LPF ... 84
Figure 4-11 Differential S2I cell using feedback current control ... 87
Figure 4-12 S2I cell intended for pseudo-differential operation using feed-forward current control ... 87
Figure 4-13 Simulation results showing the coarse MC pMOS and nMOS currents for the first S2I cell and the second S2I cell ... 89
Figure 4-14 Transient simulation results for the first S2I cell used in a delay line ... 89
Figure 4-15 Differential echo decay model ... 91
Figure 4-16 Estimated echo to composite signal envelope comparison ... 92
Figure 4-17 Transient response of the V-I converter and harmonic mean splitter ... 93
Figure 4-18 Transient response of the squaring circuit ... 93
Figure 4-19 Transient response of the 2nd order LPF ... 94
Figure 4-20 Echo-decay model transient response ... 94
Figure 4-21 Output waveforms of the V-I converter and current splitter ... 96
Figure 4-22 THD measured as a function of the input amplitude ... 96
Figure 4-23 Output waveforms of the squaring circuit ... 97
Figure 4-24 Simulated and measured LPF frequency responses ... 98
Figure 4-25 Output waveforms of the LPF ... 99
Figure 4-26 Output waveforms of the S2I delay line ... 100
Figure 4-27 Echo decay model output signal ... 102
Figure 4-28 Measured window generator output ... 104
Figure 5-1 Block diagram of the front-end ... 109
Figure 5-2 2nd order LPF / BPF used for cochlea modelling ... 111
Figure 5-3 Signal flow graph for the cochlea model ... 112
Figure 5-4 Signal flow graph for the simplified cochlea model ... 112
Figure 5-5 Log-domain BPF based on cochlea model and minimised circuit ... 113
Figure 5-6 Alternative, simplified and log-domain BPF signal flow graphs ... 115
Figure 5-7 Log-domain implementation of the second BPF ... 117
Figure 5-8 Signal flow graph for the Class AB BPF ... 118
Figure 5-9 Implementation of the non-linear cross-coupling terms via a translinear loop ... 119
Figure 5-10 Simplified implementation of the non-linear cross-coupling ... 120
Figure 5-11 Second order Class AB differential BPF ... 121
Figure 5-12 Resistive bias line, associated driver and Q-factor tuning ... 123
Figure 5-13 PTAT current source together with start-up circuit ... 124
Figure 5-14 Circuit used for IIDs and 1st order MSCs ... 124
Figure 5-15 Circuit used for 2nd order MSCs ... 125
Figure 5-16 AGC loop arrangement ... 126
Figure 5-17 Maximum current detector ... 126
Figure 5-18 Differential variable gain amplifier ... 127
Figure 5-19 Steady state output current and VGA gain Ai as a function of input amplitude ... 128
Figure 5-20 Transient response of the GCA current gain Ai to a step input change in Imax ... 129
Figure 5-21 Simulink model of the AGC loop ... 130
Figure 5-22 Simulink transient simulation results for the AGC loop showing input current, VGA output, envelope signal and current gain of the VGA ... 131
Figure 5-23 Multiplexing arrangement for the BPF, envelope, IID and MSC cue outputs ... 132
Figure 5-24 Frequency response of the BPFs for Itue I nA to I µA ................... 
133 
Figure 5-25 Frequency response of the LPFs for Itune = 200 pA to 200 nA ............. 
134 
Figure 5-26 PTAT current source characteristic for T= -20 to 120 °C and 
R1=61kQ ........................................................................ 
134 
Figure 5-27 Divider transfer characteristic .................................................. 
135 
Figure 5-28 Estimated and measured resonant frequency of each of the 24 BPFs..... 136 
Figure 5-29 Adjusted Q-factors for the L and R BPFs .................................... 
137 
Figure 5-30 Frequency response of the 3`d, 10`h, 18th and 24`h BPF measured using 
Avantest spectrum analyser together with a tracking generator............ 138 
Figure 5-31 THD as a function of input amplitude ......................................... 140 
Figure 5-32 Complete front-end test with a modulated sinusoidal input signal........ 143 
Figure 6-1 Cascade SC delay line circuit .................................................. 148 
Figure 6-2 Non-ovelapping clock phases Kt, K2 and corresponding two-phase 
clock generation for K1, K2 together with their complements nKi, nK2.. 150 
Figure 6-3 Op amp used in the SC delay ................................................... 
151 
Figure 6-4 Block diagram of the analogue TDM correlator ............................. 154 
Figure 6-5 Circuit diagram of the first op amp .................. ..................... 
156 
Figure 6-6 Input stage of the second op amp .............................................. 
160 
Figure 6-7 One side of the class AB output stage ......................................... 161 
Figure 6-8 CMFB amplifier .................................................................. 162 
Figure 6-9 Slew boost circuit ................................................................ 163 
Figure 6-10 Bias generator ..................................................................... 
164 
Figure 6-11 Op amp frequency response and input referred noise ....................... 165 
Figure 6-12 Transient response for the op amp in unity gain configuration, 
operating at a supply voltage of 1.8 V ......................................... 166 
Figure 6-13 Transient response of the op amp in unity gain configuration operating 
at a supply voltage of 1.4 V ..................................................... 
167 
Figure 6-14 Parallel S/H bank ................................................................. 168 
Figure 6-15 SC Multiplier core and CM adjustment ....................................... 171 
Figure 6-16 OTA used for output CM feedback ............................................ 173 
Figure 6-17 Integrator bank ................................................................... 
174 
Figure 6-18 Analogue memory ............................................................... 175 
Figure 6-19 SC Adder
Figure 6-20 Clocked comparator
Figure 6-21 Comparator control waveforms
Figure 6-22 Scheme for dead-band generation
Figure 6-23 Voltage doubler
Figure 6-24 Voltage level shifter
Figure 6-25 Simulation results for S/H bank, integrator and analogue memory output, with and without delay
Figure 6-26 Measured and ideal correlator chip responses at fck = 754 kHz
Figure 6-27 Main timing diagram showing Reset, Comp_state<0> and Comp_mode signals
Figure 6-28 Comparison stages: D-latch clock and comparator output signals with the L-R delay set to 0, 15, 30, 40 and 50 µs
Figure 6-29 Measured and ideal correlator chip responses at fck = 3.88 MHz
Figure 7-1 Test setup for system testing
Figure 7-2 Measured IID values as a function of azimuth for different elevation angles
Figure 7-3 IPD values as a function of azimuth for different elevation angles
Figure 7-4 IED values as a function of azimuth for different elevation angles
Figure 7-5 Contour map for the measured right channel first order MSC cues
Figure 7-6 Contour map for the measured left channel first order MSC cues
Figure 7-7 Contour map for the measured right channel second order MSC cues
Figure 7-8 Contour map for the measured left channel second order MSC cues
Figure 7-9 Localisation performance using hardware-generated cues for an input S/N of 80 dB
Figure 7-10 Localisation performance using hardware-generated cues for an input S/N of 60 dB
Figure 7-11 Localisation performance using hardware-generated cues for an input S/N of 40 dB
Figure 7-12 Localisation performance using hardware-generated cues for an input S/N of 20 dB
Figure 7-13 Localisation performance using hardware-generated cues for an input S/N of 10 dB
Figure 7-14 Localisation accuracy under different reflection coefficient (Afe) and decay time constant (τ) conditions with (i) onset detector circuit enabled and (ii) disabled
Figure 8-1 Onset detector improvement
Figure 8-2 Proposed mixed-signal ITD extraction architecture
Acronyms 
ADC Analogue to digital converter 
AGC Automatic gain control 
BiCMOS Bipolar-complementary metal oxide semiconductor
BPF Bandpass filter 
BSIM Berkeley Short Channel IGFET model 
CM Common mode 
CMFB Common mode feedback 
CMOS Complementary metal oxide semiconductor 
CN Cochlear nucleus 
DRC Design rule check 
Gm-C Transconductance-Capacitance 
GBW Gain bandwidth product 
HDL Hardware description language 
HRTF Head related transfer function 
IED Interaural envelope delay 
IGFET Insulated gate field effect transistor 
IHC Inner hair cell 
IID Interaural intensity difference
ILD Interaural level difference 
IPD Interaural phase difference 
ITD Interaural time delay 
LD Log domain 
LED Light emitting diode 
LMS Least mean square 
LPF Low pass filter 
LVS Layout versus schematic 
MC Memory cell 
MEMS Microelectromechanical systems
MOS Metal oxide semiconductor 
MSC Monaural spectral cue 
OHC Outer hair cell 
OTA Operational transconductance amplifier 
PFM Pulse frequency modulation 
PTAT Proportional to absolute temperature 
PTSH Post stimulus time histogram 
PWM Pulse width modulation 
RBF Radial basis function 
S/H Sample and hold 
S2I Double sampling switched current 
SC Switched capacitor 
SFG Signal flow graph 
SGN Spiral ganglion neurone 
SI Switched current 
S/N Signal to noise 
TDE Time delay estimation 
TDM Time division multiplexed 
THD Total harmonic distortion 
VGA Variable gain amplifier 
VLSI Very large scale integration 
VT Threshold voltage 
WTA Winner take all 
Chapter 1. Introduction 
Chapter 1 
Introduction 
1.0 Sound Localisation Applications 
Localisation of a sound signal has various applications such as automatic camera 
orientation in teleconferencing, surveillance systems [1] and robotics (automatic guided 
vehicles) [2]. Sound localisation is also required when using a microphone array for beam 
forming with electronic steering; microphone arrays can be used to enhance a desired 
source signal and attenuate undesired signals and are used in the field of 
teleconferencing [3], speech [4] and speaker recognition [5], speech acquisition in an 
automotive environment and hearing aid devices [6]. Another emerging area of interest is 
"spatialisation" of sound where an acoustic signal is manipulated in such a way as to give 
it the impression of coming from a particular location, and applications range from 
entertainment [7] to military [8]. A sound localisation system could be used in this case in 
order to assess the quality of the spatial sound. A related application of sound localisation 
systems is to assess the acoustic properties of buildings. In many applications, real-time 
processing is required, which often necessitates the use of dedicated hardware. A sound 
localisation system based on the biological auditory system is also interesting in that it
provides insights into how the biological system performs under different conditions (such
as different input stimuli and environmental acoustic parameters). This work is focussed
on passive sound localisation (i.e. localisation of a source generating an audio signal), as
opposed to active (or sonar) sound localisation where the object to be localised is in fact a 
reflecting surface and the sound source is part of the localisation system. 
1.1 Sound Localisation Techniques 
A possible approach to sound localisation is to use an array of microphones 
(typically more than eight) in order to perform time delay estimation (TDE) [9], also 
known as time difference of arrival. TDE methods compare the phase difference between 
the microphone signals using analysis techniques such as normalised cross correlation, 
least mean square adaptive filtering and cross-power spectrum phase. The main advantage 
of TDE-based methods is that, with a relatively low computational complexity, they can 
still achieve good localisation performance [10-12]. The main disadvantage of using a 
TDE-based approach is that, in the case of multiple sound sources, these methods tend to 
give unreliable results. 
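A minimal sketch of the cross-correlation principle behind TDE, assuming two sampled sensor signals and an integer-sample delay: the lag that maximises the plain (unnormalised) cross-correlation is taken as the delay estimate. The function names, the white-noise test stimulus and the 7-sample delay are illustrative assumptions, not details of the cited methods [9-12].

```python
import random


def estimate_delay(left, right, max_lag):
    """Return the lag (in samples) at which `right` best matches `left`.

    A positive result means the right signal lags the left one, i.e. the
    sound reached the left sensor first. The score for each candidate lag
    is the plain cross-correlation sum over the overlapping samples.
    """
    n = len(left)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(left[i] * right[i + lag]
                    for i in range(n) if 0 <= i + lag < n)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def make_test_pair(n, delay, seed=1):
    """Hypothetical stimulus: white noise at the left sensor and the same
    noise arriving `delay` samples later at the right sensor."""
    rng = random.Random(seed)
    source = [rng.gauss(0.0, 1.0) for _ in range(n + delay)]
    left = source[delay:delay + n]
    right = source[:n]
    return left, right


left, right = make_test_pair(n=2000, delay=7)
print(estimate_delay(left, right, max_lag=20))  # prints 7
```

With a single broadband source the correlation peak is sharp; with several sources, each contributes its own peak, which is one way to see why TDE becomes unreliable in the multi-source case.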
Another approach is to find the source location by maximising the power of a 
steered beam-former using a microphone array [13-15]. Here the location estimate is 
derived from a filtered, weighted, and summed version of the signal data received at the 
sensors. Optimal performance depends on a priori knowledge of the spectral content of 
both the primary signal and background noise, which is rarely available in practice. 
Another problem with this approach is that the objective function to be maximised (the
output power) does not have a strong global peak and frequently contains several local
maxima. Hence non-linear optimisation requiring a large number of computations is
necessary, making this approach poorly suited to real-time applications.
High-resolution spectral estimation has also been used for sound source
localisation [16,17]. These techniques include beam-forming methods adapted from the 
field of high-resolution spectral analysis such as autoregressive modelling, minimum 
variance spectral estimation and eigenanalysis-based techniques. These techniques are all 
based upon the spatio-spectral correlation matrix derived from the signals received at the 
sensors. Often this matrix is not known, but has to be estimated from the observed data 
and is prone to errors with conventional sources. In addition, high-resolution methods are 
all designed for narrowband signals and although they can be extended to wideband 
signals [18], these extensions increase the computational requirements substantially.
Although the search for the source location is not as complex as in steered beam-forming 
techniques, the function to be maximised typically still contains several sharp peaks and 
hence fast optimisation techniques cannot be used. 
The above mentioned approaches typically use digital signal processing techniques 
and the computations involved do not resemble in any way the processing which takes 
place in the biological auditory system. The technique used in this work achieves 2-D 
localisation implementing processes which broadly resemble what takes place in the 
biological system. The system uses just two sensors and localisation is estimated using the 
difference (interaural) information between the two sensor outputs, together with the 
spectral information from each sensor in order to solve certain ambiguities. A front end 
whose functionality resembles the processing which takes place in the biological cochlea 
has been designed for the separation of the input signal into frequency bands. Models of 
the cochlea have been reported in the literature with applications to pitch perception [19],
speech recognition [20], 1-D localisation [21] and hearing aid implants [22]. In this work, 
it is assumed that sensor position is fixed and hence dynamic source localisation 
information is not available (in the biological system, dynamic localisation information is 
thought to play an important part in resolving certain localisation ambiguities which arise 
at specific source positions [23]). Furthermore, signals due to echoes are treated as 
unwanted interference rather than used as localisation information. Fully analogue 
techniques have been used here for the extraction of localisation information (or cues) 
from the sensor output. These techniques allow direct mapping to analogue VLSI with a 
low power and area overhead. A novel algorithm has also been developed and tested in 
order to process the resulting cues and determine the source location in 2-D. 
1.2 Analogue Versus Digital 
Any form of signal processing can be carried out in broadly three different ways:
(a) using software executed on a multi-purpose digital processor, (b) using a digital 
application specific integrated circuit (ASIC), (c) using an analogue ASIC. The first 
technique has the advantage of a relatively fast design cycle time, partly because of the 
ease of modification of the algorithm being implemented. However, software 
implementations running on a general processor usually suffer from a high power 
dissipation and large area; furthermore computation often cannot be done in real-time. 
Considerable improvement is achieved if a custom digital circuit is designed, in which
case the power and area requirements are greatly reduced, while the speed is enhanced
and excellent accuracy is maintained. Signal processing can then be carried out using
customised analogue circuitry, in which case, further reduced power dissipation and area 
overhead is expected. Whilst digital systems are essentially discrete-level and discrete- 
time (sampled data) systems, analogue systems are essentially continuous-level but can 
still be classified into continuous time and discrete time systems. Continuous time (and to 
a certain extent also discrete time) analogue systems are generally able to perform real- 
time computation. 
When choosing between digital and analogue application specific designs, one has 
to take into account the aspects of power and area costs in each case. Fundamental work 
in this regard can be found in the literature [24], where it has been shown that the power and area
cost of an analogue system is considerably lower than that of a digital system for low
precision requirements, and vice versa for high precision requirements. A typical plot of
power and area cost as a function of precision (measured as S/N ratio), taken from [24] is 
shown in Figure 1-1. Analogue computation at low precision is cheap because the 
circuitry is usually a direct mapping of the problem to be solved and therefore there is 
little area and power overhead, but at high precision, the costs of maintaining low noise 
and offset in a single wire representation make it expensive. The crossover point between 
digital and analogue systems in today's CMOS technology occurs near 8 bits of
accuracy, which is equivalent to an S/N of 48 dB. A cochlea model implemented using a
software programme running on a general purpose (Dec-Alpha) processor would consume
about 50 W of power, while a digital ASIC implementation consumes about 150 mW of
power [25]. In contrast, a typical analogue cochlea consumes just 0.5 mW [26]. The area
requirements are 299 mm², 25 mm² and 7.7 mm² for the Dec-Alpha, digital ASIC and
analogue ASIC implementations, respectively. In contrast, the human brain, with its high
processing capability, is estimated to consume just 12 W [27] and processing seems to be
done using a mix of discrete time and continuous time analogue distributed techniques, 
each carried out in a highly noisy fashion. The issue of low power dissipation is obviously 
very critical for portable battery-powered applications. Compared to digital VLSI 
techniques, a full-custom layout is required in analogue designs in order to achieve good 
performance. 
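The 8-bit/48 dB equivalence quoted above follows from taking 20·log10(2) ≈ 6.02 dB per bit. The quick sketch below checks it, together with the power and area ratios implied by the cochlea figures cited from [25] and [26]; the dictionary layout is only an illustrative choice. (For a full-scale sinusoid with ideal quantisation, the familiar 6.02N + 1.76 dB formula gives slightly more, about 49.9 dB at 8 bits.)

```python
import math

DB_PER_BIT = 20.0 * math.log10(2.0)  # about 6.02 dB per bit


def bits_to_db(bits):
    """Dynamic range of an ideal `bits`-bit word, 20*log10(2**bits)."""
    return bits * DB_PER_BIT


def db_to_bits(snr_db):
    """Number of bits equivalent to a given S/N in dB."""
    return snr_db / DB_PER_BIT


# Cochlea implementations quoted in the text: (power in W, area in mm^2).
COCHLEA = {
    "Dec-Alpha software": (50.0, 299.0),
    "digital ASIC": (150e-3, 25.0),
    "analogue ASIC": (0.5e-3, 7.7),
}

if __name__ == "__main__":
    print(f"8 bits -> {bits_to_db(8):.1f} dB")
    print(f"40 dB  -> {db_to_bits(40):.2f} bits")
    p_dig, a_dig = COCHLEA["digital ASIC"]
    p_ana, a_ana = COCHLEA["analogue ASIC"]
    print(f"digital/analogue power ratio: {p_dig / p_ana:.0f}")
    print(f"digital/analogue area ratio:  {a_dig / a_ana:.2f}")
```

On this scale, the roughly 40 dB of S/N found sufficient for localisation corresponds to about 6.6 bits, below the 8-bit crossover, and the analogue ASIC draws about 300 times less power than the digital ASIC.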
The high level simulations presented in this work show that around 40 dB of S/N 
ratio is sufficient to yield satisfactory sound localisation results and hence the use of 
analogue computation techniques is attractive for this application. Most of the hardware 
building blocks have been designed using continuous time analogue techniques (front-end 
filter bank, spectral cue computation circuits and most blocks in the onset detector). 
Discrete-time techniques have been used in cases where long time delays are required 
(echo estimation and time-delay computation), since in this case discrete time techniques 
are advantageous compared to continuous time techniques in terms of precision and area 
cost. Furthermore, this thesis combines a variety of analogue processing
techniques, namely continuous-time current-mode techniques for the front-end, switched- 
current techniques for the delay line in the onset detector and switched-capacitor 
techniques for time delay computation. 
[Figure 1-1 plots: (a) Power costs and (b) Area costs versus output S/N ratio (dB), 0 to 100 dB, for digital and analogue designs. Annotations: (a) limit set by 1/f noise for a fixed area; (b) limit set by thermal noise for a fixed power consumption.]
Figure 1-1: (a) Power and (b) area costs for analogue and digital designs. Analogue 
approaches are cheap for low precision because the hardware is a direct mapping of the 
problem to be solved. At high precision, however, analogue circuits become resource 
expensive due to noise and offset requirements. 
1.3 Low Voltage Low Power Design 
Most CMOS circuits designed in this work have been optimised to operate at a 
low supply voltage of ± 0.9 V. The use of a low supply voltage is not merely dictated by 
considerations of low power dissipation, but is a necessity as a result of maximum voltage 
limits imposed by today's CMOS technologies. In fact, low power and low voltage are 
often conflicting specifications. The digital market has pushed technology into higher 
component densities, resulting in reduced line-widths and hence lower gate-channel 
breakdown voltages. Figure 1-2 shows the roadmap of the supply voltage VDD, threshold
voltage VT and corresponding ratio VDD/VT, as a function of channel gate length, for past
and future CMOS technologies [28]. Although supply voltages have been significantly
reduced, threshold voltages have not been reduced at the same rate, mainly due to the
requirement of low static power dissipation in digital circuits. The reduced ratio of
maximum supply voltage to threshold voltage is not critical for digital designs, but has
made analogue design more challenging.
[Figure 1-2 plot: VDD, VT and VDD/VT versus gate channel length (µm), from 1 µm down to 0.01 µm.]
Figure 1-2: Effects of scaling down CMOS technology: power supply voltage is going down
as a result of a reduced gate oxide breakdown voltage. Threshold voltage is also decreasing,
but at a slower pace due to static power dissipation in digital circuits [28].
Some of the implications of low voltage operation are clear if one considers just
the noise issue. For a hypothetical analogue circuit operating at a supply voltage of VDD,
and with a rail-to-rail output capability, the maximum sinusoidal output power is VDD²/8,
resulting in an S/N ratio of:
$$
\frac{S}{N} = \frac{V_{DD}^{2}}{8\,\overline{v_n^{2}}}
= \frac{V_{DD}^{2}}{8\sum_{i=1}^{N} G_i^{2}\left[\frac{K_w(p)}{I_i^{\,p}}\,(f_h - f_l) + \frac{K_f}{A_i}\,\ln\frac{f_h}{f_l}\right]}
\qquad (1.1)
$$
where v_n² is the noise at the output, K_w(p) and K_f are technology dependent parameters, N
is the number of MOS devices in the circuit, I_i is the dc current through each MOS
device, A_i is the gate area of each device, G_i² weights the noise contribution of each
device to the output, and f_h - f_l represents the bandwidth being considered. The
parameter p is equal to 0.5 for the MOS device in strong inversion and 1.0 for the MOS
device in weak inversion. The first term in the brackets represents the thermal noise while
the second term represents the flicker (or 1/f) noise. The parameters K_w and K_f are given by [24]:
K, (1.0) = 
4kTU,. 
; Kw(0.5) = 
4kT(2 / 3) 
.Kf=B /Cox (1.2) 2K 2, uCox (W / L) 
where k is Boltzmann's constant, T is the absolute temperature, U_T is the thermal voltage,
κ is the subthreshold exponential coefficient, µ is the mobility of the electron (or hole),
C_ox is the oxide capacitance, W and L are the width and length of the transistor and B is a
measure of the number of surface states, impurities and defects in the gate oxide. The
coefficients G_i and the number of devices used in the processing stage are to a certain
extent dependent on the ingenuity of the designer. However, it can be easily deduced that
in order to maintain a specific S/N ratio at low values of VDD, both flicker and thermal
noise contributions have to be reduced. Flicker noise can be reduced by increasing the
area of the MOS devices, while thermal noise can be reduced by either increasing the dc
current (and hence power dissipation) or, in strong inversion, increasing the W/L ratio. The
minimum L is defined by technology constraints and also by the need to minimise errors due to
mismatches and channel length modulation effects. Hence an increase in W/L ratio
effectively also means an increase in the area cost together with associated parasitics; in 
order to maintain the same bandwidth, with an increased value of parasitics, the current 
consumption has to be further increased. 
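Equations (1.1) and (1.2) can be evaluated numerically to see the trade-offs discussed above. The sketch below is a minimal implementation under stated assumptions: each device is described by a hypothetical tuple (G_i, K_w(p), p, I_i, A_i), and the device count, currents, areas, κ and K_f values are made-up illustrative numbers, with temperature defaulting to 300 K. It makes two consequences of Eq. (1.1) easy to check: halving VDD costs 20·log10(2) ≈ 6 dB of S/N, while in weak inversion (p = 1) doubling every bias current, and hence the power dissipation, recovers only 10·log10(2) ≈ 3 dB when thermal noise dominates.

```python
import math

K_BOLTZMANN = 1.380649e-23    # J/K
Q_ELECTRON = 1.602176634e-19  # C


def k_w_weak(temp=300.0, kappa=0.7):
    """Weak-inversion coefficient K_w(1.0) = 4kT*U_T / (2*kappa^2), Eq. (1.2)."""
    kt = K_BOLTZMANN * temp
    u_t = kt / Q_ELECTRON  # thermal voltage kT/q
    return 4.0 * kt * u_t / (2.0 * kappa ** 2)


def k_w_strong(mu_cox, w_over_l, temp=300.0):
    """Strong-inversion coefficient K_w(0.5) = 4kT*(2/3) / sqrt(2*mu*Cox*(W/L))."""
    return 4.0 * K_BOLTZMANN * temp * (2.0 / 3.0) / math.sqrt(2.0 * mu_cox * w_over_l)


def snr_db(vdd, devices, f_l, f_h, k_f):
    """S/N of Eq. (1.1) in dB for a rail-to-rail sinusoid of power VDD^2/8.

    `devices` is a list of hypothetical (G_i, K_w(p), p, I_i, A_i) tuples.
    """
    noise = 0.0
    for gain, k_w, p, current, area in devices:
        thermal = k_w / current ** p * (f_h - f_l)
        flicker = k_f / area * math.log(f_h / f_l)
        noise += gain ** 2 * (thermal + flicker)
    return 10.0 * math.log10((vdd ** 2 / 8.0) / noise)


# Hypothetical circuit: 10 weak-inversion devices, unit gain to the output,
# 100 nA each, 10 um^2 gate area, flicker noise neglected (K_f = 0).
circuit = [(1.0, k_w_weak(), 1.0, 100e-9, 10e-12)] * 10
print(f"S/N at 1.8 V: {snr_db(1.8, circuit, 100.0, 10e3, 0.0):.1f} dB")
```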
Other challenges arise during the design of practical CMOS circuits due to the use 
of low supply voltages. The technology used in this work has a nominal threshold voltage 
of 0.8 V, which can increase up to 1.2 V due to the body effect. This means that for 
operation with VDD = 1.8 V, circuits have to be designed using a maximum of only one
gate-source interface between the supply rails. Taking a VGS value of 1.0 V leaves only
0.8 V available for signal swing and additional drain-source interfaces as required, for 
example, for current sources. Typically the drain-source saturation voltage for an MOS 
device (VDSsat) is around 100 to 200 mV, which limits the number of drain-source 
interfaces which can be placed between the rails. Whereas with supply voltages of 5 V or
more it is possible to place multiple cascode devices and thus achieve a high gain in a
single stage, this option is not available for low voltage operation. The situation is further
aggravated, since most devices operate on the verge of pinch off and hence exhibit low
output resistance, which further reduces the gain of amplifiers and accuracy of current
mirrors.
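The headroom argument above can be made concrete with a little integer arithmetic (in millivolts, to avoid floating-point rounding). The sketch assumes, as in the text, a single gate-source drop of 1.0 V between the rails, and counts how many saturated drain-source interfaces of VDSsat each still fit after reserving a chosen signal swing; the function name and the 400 mV swing are illustrative assumptions.

```python
def max_stacked_drops(vdd_mv, vgs_mv, vds_sat_mv, swing_mv):
    """Number of saturated drain-source drops (each vds_sat_mv) that fit
    between the rails after one gate-source drop and the reserved swing.
    All arguments are in millivolts."""
    headroom_mv = vdd_mv - vgs_mv - swing_mv
    return max(headroom_mv, 0) // vds_sat_mv


for vdd in (1800, 5000):
    for vds_sat in (100, 200):
        n = max_stacked_drops(vdd, 1000, vds_sat, swing_mv=400)
        print(f"VDD = {vdd} mV, VDSsat = {vds_sat} mV, 400 mV swing: {n} drops")
```

With 400 mV of swing reserved, a 1.8 V supply leaves room for only two to four such interfaces, against eighteen or more at 5 V, which is why stacking cascodes becomes impractical at low voltage.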
Log-domain (LD) current-mode circuits offer a substantial advantage in this case
since they require only a low gate overdrive voltage and a reduced VDSsat, and need no
high gain amplifiers. Furthermore, voltage swings are compressed relative to the current
signals. The design of switched capacitor (SC) circuits at low supply voltage is
critical for several reasons. First of all, sufficient gate overdrive must be applied in
order to efficiently turn on the MOS switches such that a steady state result is obtained 
within the required time interval. Secondly, errors due to non-idealities such as clock 
feedthrough and charge injection tend to be high compared to the available signal swing, 
which necessitates the use of a differential topology and techniques to reduce such errors. 
The design of differential, high gain and high slew-rate op amps as required for SC 
circuits is also a challenging task at low supply voltages. On the other hand, SC circuits
offer an advantage for low supply voltage operation, in that the capacitors may be used in 
order to provide the required voltage shift between two successive stages such that 
adequate overdrive is available for the next stage. 
1.4 Deviations from the Biological System 
Although this work is broadly based on the processing which is carried out in the 
biological auditory system, and relies on the extraction of localisation features resulting 
from the head related transfer function (HRTF), the techniques used here are not intended 
to be an exact model of the auditory system. Several deviations have been made from the 
biological system in order to improve applicability to VLSI implementation and/or 
improve the localisation accuracy. Important deviations of this work from the biological 
system are listed below: 
(a) Onset detection is carried out using an algorithm which is not related to the 
biological system, but is easily implemented using analogue VLSI techniques. 
(b) A parallel bandpass filter architecture has been used instead of a cascade of low 
pass filters for the "cochlea" front-end in order to reduce problems due to accumulation of 
noise, offset and delay. 
(c) The automatic gain control system used in the front-end acts on all frequencies 
in order to preserve spectral information. This is different from what happens in the 
biological system, where small amplitude components seem not to be masked by high 
amplitude components [29]. 
(d) Envelope extraction is carried out using signal squaring rather than half-wave 
rectification, in order to reduce the filtering requirements and hence minimise chip area. 
(e) Signal communication between the cochlea and the cochlear nucleus in the brain 
is carried out in pulse form. In this work, continuous time analogue signal processing is 
used for the spectral cues and discrete time processing is used for the extraction of time 
delay cues. 
(f) Mapping of the localisation cues into the actual source azimuth and elevation is 
carried out using an algorithm optimised for speed and accuracy; this algorithm does not 
reflect in any way the processing which is carried out in the brain. 
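One way to see why squaring, as in item (d), can relax the smoothing filter is to compare spectra. For a tone at frequency f, squaring produces only a dc term (equal to half the squared amplitude) and a component at 2f, whereas half-wave rectification leaves a large component at f itself, so the lowpass filter after a squarer only has to reject components from 2f upwards. The sketch below checks this with a direct DFT of a sampled tone; the record length, tone bin and amplitude are arbitrary assumptions, and it illustrates the general property rather than the thesis circuit.

```python
import cmath
import math


def dft_mag(x, k):
    """Magnitude of DFT bin k of sequence x, normalised by the record length."""
    n = len(x)
    total = sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(x))
    return abs(total) / n


N = 256          # samples in the record (assumed)
K0 = 8           # tone completes exactly 8 cycles, i.e. it falls in DFT bin 8
AMPLITUDE = 2.0  # tone amplitude (assumed)

tone = [AMPLITUDE * math.cos(2 * math.pi * K0 * i / N) for i in range(N)]
squared = [v * v for v in tone]
half_wave = [max(v, 0.0) for v in tone]

for name, sig in (("squared", squared), ("half-wave", half_wave)):
    dc, fund, second = (dft_mag(sig, k) for k in (0, K0, 2 * K0))
    print(f"{name:9s}: dc={dc:.3f}  f={fund:.3f}  2f={second:.3f}")
```

Because the dc term of the squared tone is A²/2, the smoothed squarer output tracks the squared envelope rather than the envelope itself.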
1.5 Scope and Outline of Thesis 
The scope of this thesis is the design of an analogue CMOS front end for the 
extraction of localisation cues from which the azimuth and elevation of a sound source 
can be determined. The analogue hardware is implemented in a standard CMOS process, 
operating at a low supply voltage of ± 0.9 V with low power dissipation. 
Chapter 2 of the thesis reviews various hardware implementations of the models 
for the cochlea, and also applications of the silicon cochlea for sound localisation. First 
the main features of the cochlea are presented. Then hardware models for the different 
sections of the cochlea, namely the middle ear, inner ear, inner and outer hair cells, spiral 
ganglion neurones and cochlear nucleus are reviewed. Special emphasis is given to the 
inner ear models, where different VLSI topologies are introduced. In particular, design 
issues for the popular Gm-C based inner ear models are discussed. Sound localisation 
theory is introduced and VLSI implementations of sound localisation models are 
reviewed, together with possible enhancements. 
The development of a novel cue-to-position mapping algorithm intended for 
mapping certain sound features into the sound source position is presented in chapter 3. 
The system topology being investigated is first explained, together with the techniques used
to generate the test stimulus signal. Then the various processing stages required for
extracting both the monaural and binaural cues are described, namely the bandpass 
filtering and cue extraction methods. A technique for onset detection is also presented. 
The process of generating and interpolating a cue template for subsequent comparison is 
described. Then, the chapter focuses on the actual algorithm developed, outlining the 
basis for this algorithm. The new algorithm is tested via simulation for various 
environmental and hardware non-idealities. The accuracy of the new algorithm is 
compared with that of an existing single-step search method. The performance of the new 
algorithm at different source positions is also assessed under the influence of different 
signal-to-noise ratios. The accuracy and speed of the developed algorithm are then
discussed. Finally, a more generic search algorithm is introduced.
Chapter 4 discusses the design, simulation and testing of an analogue 
programmable CMOS circuit for onset detection in sound signals. First the principle of 
operation is reviewed. The chapter then describes the design of the various building 
blocks (namely voltage-to-current converter, harmonic mean splitter, envelope detection, 
delay line, echo decay model, and onset window generator) forming part of the onset 
detector which is based on log-domain and switched current techniques. A novel feedback 
technique enables accurately controlled class AB operation of a low voltage switched 
current memory cell used in the delay line. The design is optimised for low power and low 
supply voltage operation. Simulation results for each building block are then discussed. 
Finally measurement results for each individual stage and also for the whole onset 
detector chip are presented showing good correlation between the simulated and measured 
characteristics. 
Chapter 5 presents the design, simulation and testing of an analogue front-end 
CMOS chip intended for the extraction of spectral sound localisation cues from two 
sound signals. This chip is designed entirely using log-domain techniques, resulting in 
very low power dissipation. Two possible implementations of bandpass filters (BPF) are 
first discussed. Then the necessary requirements and modifications for differential 
class AB operation are presented. A novel cross-coupling technique is used to ensure the 
proper operation of differential class AB log-domain BPFs. The chapter then describes 
the actual cue extraction CMOS block, as required for interaural and monaural spectral 
cues. An automatic gain control mechanism is also implemented in this chip in order to 
enhance the dynamic range. Simulation results are presented for each building block. 
Measurement results for the fabricated chip show that salient parameters which affect the 
accuracy of the localisation system, such as BPF dynamic range and divider accuracy, 
exhibit the required response. 
The design, simulation and testing of an analogue low voltage correlator chip for 
the extraction of time delay cues are presented in chapter 6. This chip is based on 
switched capacitor techniques with a multiplexed op amp, used in order to reduce the area 
requirement. An overview of the required building blocks is first given, together with a 
presentation of delay lines, which are fundamental to time delay cue computation. The
design and test results of a cascade delay line are discussed. The chapter then focuses on 
a time-division multiplexed topology. For this implementation a high slew-rate op amp 
was developed and experimental results obtained after fabrication. A second high slew- 
rate op amp was designed in order to enhance the gain. A novel low voltage switched 
capacitor multiplier, required to compute correlation between two signals, has also been 
developed. The chapter discusses the various building blocks in the multiplexed correlator 
chip, including the digital control circuitry. Testing results show that the correlator chip 
output follows closely the expected output in extracting the ITD cues. 
Chapter 7 describes the test setup used for testing the complete system and the 
extraction of a cue template customised to the designed hardware. Measured spectral cue 
values are obtained using the hardware front-end, while time delay cues are computed via 
software correlation of the front-end outputs. Test results for different source positions 
using hardware-derived cues show that localisation accuracy within 5° is obtained for 
over 95% of the test cases considered. The effect of the onset detector is also verified by 
applying test stimuli containing echo interference and comparing the results obtained 
when the onset detector is first enabled and then disabled. The performance of the system 
using different test stimuli has also been assessed. 
General conclusions drawn from the previous chapters, together with a summary 
of the original contributions carried out at each stage are presented in chapter 8. Future 
work, including possible enhancements on the work presented in this thesis, is discussed 
and possible extensions to this work in order to implement a complete hardware sound 
localisation system are described. 
Chapter 2. Silicon Cochlea with Application to Spatial Localisation 
Chapter 2 
The Silicon Cochlea with Application to Spatial 
Localisation 
2.0 Introduction 
In this chapter, reported silicon cochlea model implementations with their 
adaptation to sound localisation hardware applications are reviewed. The chapter is 
organised as follows: in section 2.1, the functional operation of the human auditory 
pathway is described; section 2.2 reviews silicon models for the cochlea, inner hair cells, 
outer hair cells and the spiral ganglion cells. Major emphasis is given to the inner ear 
(cochlea) silicon model, where the different filtering techniques are described, together 
with their respective advantages and disadvantages: in particular, the popular 
transconductance-C topology is examined with respect to stability, dynamic range, device 
mismatch effects and area requirement. Section 2.3 looks at sound localisation techniques, 
with major emphasis on the silicon cochlea used as a front-end in VLSI implementations. 
The first cochlea models were electrical transmission-line models [30]. These were superseded by digital computer simulations in the 1950s and 1960s, although analogue electrical models continued to be built from discrete components [31]-[33]. The availability of low-cost, compact VLSI technology revived interest in analogue cochlea models [34], which can be used in real-time, low-power applications such as auditory modelling, speech recognition, cochlear implants and sound localisation.
Spatial localisation of a sound source is an important research topic, since a localisation system can serve as a model of the biological system and also has practical applications such as sound-guided vehicles and automatic camera orientation in teleconferencing systems [35]. Several software models capable of accurate 2-D sound localisation have been implemented. Such implementations may require several minutes of computation time, and therefore hardware implementations are necessary for real-time applications. Existing hardware implementations are based on the interaural time difference between the acoustic signals received by the left and right channels, computed in several frequency bands, and are capable only of lateral localisation [19], [21], [36]. The silicon cochlea model can be used to split the input signal into these frequency bands.
2.1 The Biological Cochlea 
The human auditory pathway consists of the external ear, the middle ear and the inner ear. Transduction from mechanical energy to electrical pulses occurs along the inner ear via the inner hair cells (IHCs). The external ear, mainly the pinna, introduces spectral shaping of the sound that depends on the location of the sound source. This spectral shaping is important for sound localisation, especially for the determination of elevation [37]. The acoustic wave entering the middle ear may be considered to be the sum of two waves: the direct wave and the wave reflected from the pinna. The phase difference between these two waves decreases monotonically with increasing elevation [38], so the resulting composite wave contains spectral shaping that carries information about the elevation of the sound source. Azimuth localisation is made possible by the spectral shaping introduced by the head, as well as by the differences in energy level and arrival time of the sound signal between the left and right ears. The manipulation of the sound signal by the head and the pinna can be characterised by a direction-dependent transfer function known as the Head Related Transfer Function (HRTF).
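The direct-plus-reflected description above behaves like a comb filter. The following is a minimal sketch, assuming the pinna contributes a single reflected copy with a fixed, hypothetical gain `reflection_gain` and path delay `delay_s`, and ignoring all other head and ear effects. It shows where the first spectral notch falls for a given delay:

```python
import cmath
import math


def pinna_comb_gain(freq_hz: float, delay_s: float, reflection_gain: float) -> float:
    """Magnitude of 1 + a*exp(-j*2*pi*f*tau): direct wave plus one delayed pinna reflection."""
    return abs(1.0 + reflection_gain * cmath.exp(-2j * math.pi * freq_hz * delay_s))


def first_notch_hz(delay_s: float) -> float:
    """Lowest frequency at which the reflection arrives in antiphase (phase shift of pi)."""
    return 1.0 / (2.0 * delay_s)


# Hypothetical values: a 100 us path difference and a reflection at half amplitude.
delay = 100e-6
notch = first_notch_hz(delay)  # about 5 kHz for this delay
print(f"first notch: {notch:.0f} Hz, gain there: {pinna_comb_gain(notch, delay, 0.5):.3f}")
```

Because the notch frequency in this simplified picture is inversely proportional to the path delay, any elevation-dependent change in the reflection delay moves the notch. This is the kind of spectral cue referred to above.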
The middle ear serves several functions. It provides impedance matching between the air-borne wave and the liquid-borne wave in the cochlea, protects the basilar membrane (which runs along the length of the cochlea duct) from high-energy sounds, and provides a filtering effect. The spectral shaping introduced by the middle ear may be modelled as a 5th-order low-pass filter with a double pole at approximately 24 kHz and a triple pole at approximately 16 kHz [37], [39].
The inner ear consists of the cochlea duct together with the cells necessary for sensory transduction. The cochlea may be modelled as an incompressible liquid bounded by a hard wall on one side and a flexible membrane on the other, as shown in Figure 2-1.
[Figure 2-1 (diagram): the duct extends from the audio signal entry from the middle ear at x = 0 to the end of the duct at x = L, with the hard wall at y = 0 and the basilar membrane at y = h.]
Figure 2-1: Simplified single-chamber 2-D model of the cochlea [40]: the x and y axes denote the direction along and across the cochlea duct, respectively.
A full biological description of the cochlea is given in [40]. The IHCs are located on the flexible membrane. The liquid obeys Laplace's equation and is subject to the boundary conditions imposed by the flexible membrane, the hard wall, the signal entry point and the end of the duct. These conditions may be summarised as follows [40]:
$$
\begin{aligned}
&\nabla^2\phi = 0 && \text{(Laplace's equation for the liquid)}\\
&\left.\frac{\partial\phi}{\partial x}\right|_{x=0} = f(t) && \text{(signal entry point)}\\
&\left.\frac{\partial\phi}{\partial x}\right|_{x=L} = 0 && \text{(end of duct)}\\
&\left.\frac{\partial\phi}{\partial y}\right|_{y=0} = 0 && \text{(hard wall)}\\
&-2\rho\,\frac{\partial^2\phi}{\partial t^2} = S(x)\,\frac{\partial\phi}{\partial y} + \beta(x)\,\frac{\partial^2\phi}{\partial y\,\partial t} + M(x)\,\frac{\partial^3\phi}{\partial y\,\partial t^2} \quad \text{at } y = h && \text{(basilar membrane)}
\end{aligned}
\tag{2.1}
$$
where f(t) is the acoustic wave signal, L is the length of the cochlea duct, ρ is the liquid density and S(x), β(x), M(x) are the stiffness, damping and mass of the membrane, respectively. In Eqn. 2.1, φ is the velocity potential of the liquid; it is related to the velocity by v = -∇φ and to the pressure difference p by p = ρ(∂φ/∂t). The x-direction is taken along the cochlea duct, while the y-direction is taken across it.
The membrane parameters vary along the cochlea duct according to the following relationships [40]:
$$
\begin{aligned}
S(x) &= S(0)\exp(-2x/\Lambda)\\
\beta(x) &= \beta(0)\exp(-x/\Lambda)\\
M(x) &= M(0)
\end{aligned}
\tag{2.2}
$$
where Λ is called the space constant. The above analysis leads to 2-D or 3-D cochlea
models. Early models were 1-D models, also known as long-wave models, since the acoustic wavelength was assumed to be long compared with the width of the cochlea duct, so that wave propagation could be taken to be predominantly along the duct. Since the basilar membrane is stiff at the signal entry point, the signal velocity there is high; as the signal propagates along the duct, its velocity becomes progressively lower. The effect of the basilar membrane on the signal is a low-pass filtering characteristic whose cut-off frequency is high at the entry point and becomes progressively lower with increasing x. The frequency response at a particular position along the duct is a low-pass characteristic with a slight, broad peak near the cut-off frequency: this peak is termed pseudoresonance, and is not the result of a true resonant structure. As a result, the spectral content of the input acoustic wave is effectively distributed spatially along the cochlea duct. In [41], a more sophisticated 2-chamber model of the cochlea has been developed, which characterises the reflections from the cochlea, known as Kemp echoes, that occur in response to an incident acoustic signal.
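The membrane scalings of Eqn. 2.2 imply an exponential place-frequency map. This is a rough guide only, because each membrane segment is treated here as an isolated mass-spring-damper and the fluid coupling that actually produces the pseudoresonance is ignored. Under that assumption, a segment's natural frequency is sqrt(S/M) = sqrt(S(0)/M(0))·e^(-x/space constant) and its quality factor is sqrt(S·M)/β = sqrt(S(0)·M(0))/β(0). The characteristic frequency therefore falls by a factor of e for every space constant along the duct, while Q stays the same. The sketch below evaluates both quantities with hypothetical parameter values:

```python
import math


def local_resonance(x, s0, beta0, m0, space_const):
    """Natural frequency (rad/s) and Q of an isolated membrane segment at position x.

    Uses the Eqn. 2.2 scalings; fluid loading is ignored (simplifying assumption).
    """
    s = s0 * math.exp(-2.0 * x / space_const)   # stiffness S(x)
    beta = beta0 * math.exp(-x / space_const)   # damping beta(x)
    m = m0                                      # mass M(x) = M(0)
    omega_n = math.sqrt(s / m)                  # sqrt(S/M) falls as exp(-x/space_const)
    q = math.sqrt(s * m) / beta                 # sqrt(S*M)/beta does not depend on x
    return omega_n, q


# Hypothetical normalised parameters, chosen only to show the trend along the duct.
for x in (0.0, 0.5, 1.0, 1.5):
    w, q = local_resonance(x, s0=1e10, beta0=2e4, m0=1.0, space_const=0.5)
    print(f"x = {x:.1f}: omega_n = {w:.3e} rad/s, Q = {q:.2f}")
```

The constant Q combined with an exponentially falling frequency gives a scale-invariant response along the duct. The cascaded-filter silicon models discussed later also aim for this property.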
The transduction from mechanical vibration to electrical signals is carried out by the IHCs, which respond to deflection in only one direction; the IHCs therefore perform a form of half-wave rectification. The IHCs also respond to the velocity of the membrane rather than to its displacement, and this displacement-to-velocity conversion may be regarded as a time differentiation process. The IHC exhibits a compressive non-linear response, which maps the wide range of input signal levels onto a more manageable range [19]. The IHCs are assumed not to interfere with the actual cochlear mechanics.
The outputs of the IHCs feed the spiral ganglion neurons (SGNs). These neurons convert the analogue output of each IHC into fixed-width, fixed-height pulses of variable frequency (called action potentials), whose average firing rate encodes the analogue signal strength of the IHC output. The SGNs also perform temporal adaptation: under a persistent stimulus, their output tends to decrease with time. In the absence of an input stimulus, the SGNs produce spontaneous action potentials at a rate that can be as high as half of that produced by the maximum input stimulus [42].
The pulses from the SGNs are relayed via the auditory (cochlear) nerve to the cochlear nucleus in the brain. Here these pulses feed several types of cells, each having a particular firing characteristic, and each cell type tends to extract specific key features from the input pulse stream. For example, the Lateral Superior Olive and Medial Superior Olive nuclei are believed to be responsible for sound localisation based on interaural intensity difference and interaural time difference, respectively. A detailed description of these cells can be found in [42].
The above description assumes a purely passive, time-invariant cochlea; in reality, the cochlea also exhibits a form of active automatic gain control (AGC). The AGC compensates for losses due to the inherent damping of the basilar membrane and also increases the sensitivity of the cochlea to low-energy acoustic signals [43]. The gain control is carried out by active, adaptive outer hair cells (OHCs), which are under the control of the brain. The OHCs can reduce the loss term for low-energy signals and can even make it negative. An important aspect of this AGC loop is that it is distributed rather than local: signals from the inner parts of the cochlea duct affect the damping at the outer sections. This is known as broadly-coupled AGC. In this way, the spectral contrasts between the various cochlea sections are not reduced. Broadly-coupled AGC also results in better control, since signals arriving at the inner sections of the duct must first travel through the outer sections. This effect also results in two-tone suppression, in which frequency components near the cut-off frequency are suppressed by the presence of high-level signals at lower frequencies. Another aspect of the AGC mechanism is that the gains of the left and right cochleas are probably coupled, so that the AGC loop does not suppress the interaural intensity differences (IIDs) that are useful for lateral localisation.
2.2 Hardware Models of the Auditory System 
2.2.1 Middle Ear Models 
The first electrical models of the auditory pathway consisted of passive networks of resistors, inductors and capacitors (RLC), and of transmission lines [37]. In these models, voltage and current are analogous to pressure and velocity, respectively. RLC networks can be used to model the behaviour of the middle and inner ear sections. In the transmission-line model, the signal propagates along series inductors, with series RLC networks connected from the inductor nodes to ground (i.e. acting as shunt elements). These shunt elements are tuned to high frequencies near the base of the cochlea (stapes) and to low frequencies near the apex (helicotrema).
The middle ear has been modelled in silicon as a 5th-order LPF, implemented as a cascade of five Gm-C low-pass filters based on the damped integrator topology. Since two poles of the filter coincide at one frequency and the remaining three coincide at another, the filter was divided into two groups, each with a separate tuning bias voltage [39].
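The middle-ear response described earlier has a double pole near 24 kHz and a triple pole near 16 kHz, and it can be evaluated directly. This is a minimal sketch, assuming that all five poles are real first-order sections (as a cascade of damped integrators suggests) and that the DC gain is unity:

```python
import math


def middle_ear_gain_db(freq_hz: float, f_double: float = 24e3, f_triple: float = 16e3) -> float:
    """Gain of 1/((1 + s/w1)^2 (1 + s/w2)^3) at s = j*2*pi*f, in dB (real poles assumed)."""
    mag_double = 1.0 / (1.0 + (freq_hz / f_double) ** 2)      # |1/(1 + jf/f1)|^2
    mag_triple = (1.0 + (freq_hz / f_triple) ** 2) ** -1.5    # |1/(1 + jf/f2)|^3
    return 20.0 * math.log10(mag_double * mag_triple)


for f in (1e3, 4e3, 8e3, 16e3, 24e3):
    print(f"{f / 1e3:5.1f} kHz: {middle_ear_gain_db(f):7.2f} dB")
```

Each group of coincident poles corresponds to one tuning bias. This is why two bias voltages are enough to tune the five-stage silicon implementation.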
2.2.2 Inner Ear Models 
Various VLSI implementations of the cochlea itself (i.e. the inner ear, modelled as the basilar membrane) have been reported. Most silicon cochlea models have been based on weak-inversion Gm-C filters [19], [34], [38], [44], [45] or on switched-capacitor filters [46]-[50]; the key issues in such implementations are low-power, low-voltage operation, large dynamic range, small area and temperature drift. This section reviews the different filtering techniques used in analogue implementations, together with a brief description of a digital implementation.
2.2.2.1 Transconductance-C Implementations 
The most popular cochlea model is a 1-D unidirectional model consisting of a cascade of single-ended 2nd-order low-pass filter sections with progressively decreasing cut-off frequency [19], [34], [38], [44], [45]. The 2nd-order low-pass filters are based on MOS Gm-C filters with independent tuning voltages for the cut-off frequencies and Q-factors, as shown in Figure 2-2.
[Figure 2-2 (schematic): input Vin, output Vout, ω-tuning bias for the integrating OTAs, Q-tuning via OTA3.]
Figure 2-2: 2nd-order LPF with separate cut-off frequency and Q-factor tuning. For cochlea modelling, several of these sections, tuned to different cut-off frequencies, are cascaded.
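The cascade arrangement can be explored numerically. This is a minimal sketch that assumes each section has the generic second-order low-pass response H(s) = 1/(τ²s² + τs/Q + 1); the exact OTA-level transfer function of the circuit in Figure 2-2 is not reproduced. The cut-off frequencies and the common Q used here are hypothetical. Each tap sees the product of all the sections before it:

```python
def lpf2(freq_hz: float, fc_hz: float, q: float) -> complex:
    """Generic 2nd-order low-pass section evaluated at s = j*2*pi*f, with tau = 1/(2*pi*fc)."""
    s_tau = 1j * freq_hz / fc_hz
    return 1.0 / (s_tau * s_tau + s_tau / q + 1.0)


def tap_gains(freq_hz: float, cutoffs_hz, q: float):
    """|H| at every tap of the cascade: tap k sees the product of sections 0..k."""
    h = 1.0 + 0.0j
    gains = []
    for fc in cutoffs_hz:
        h *= lpf2(freq_hz, fc, q)
        gains.append(abs(h))
    return gains


# Hypothetical 8-section cascade, cut-offs falling by a constant ratio from 8 kHz.
cutoffs = [8000.0 * 0.8 ** k for k in range(8)]
for k, g in enumerate(tap_gains(2000.0, cutoffs, q=0.9)):
    print(f"tap {k} (fc = {cutoffs[k]:6.0f} Hz): |H| = {g:.3f}")
```

For a 2 kHz tone, the cumulative gain grows slightly through the sections whose cut-offs lie above the tone, where each section has a mild pseudoresonant peak. Beyond them the gain falls sharply, so the position of the peak tap encodes the tone frequency.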
Typically, the filters operate in the weak-inversion (subthreshold) region, in which case the g_m value that sets the cut-off frequency is directly proportional to the bias current. The drain current of an MOS transistor operating in weak inversion is given by [51], [52]:
$$I_D = I_0\, e^{\kappa V_{GS}/U_T}\, e^{(1-\kappa) V_{BS}/U_T}\left(1 - e^{-V_{DS}/U_T}\right) \tag{2.3}$$
where κ is the subthreshold exponential coefficient, I_0 is the subthreshold current-scaling parameter, U_T is the thermal voltage, equal to kT/q, V_GS is the gate-to-source voltage, V_DS is the drain-to-source voltage and V_BS is the bulk-to-source voltage.
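Eqn. 2.3 can be checked numerically. The sketch below uses hypothetical values for κ and I_0. It evaluates the weak-inversion drain current and confirms that, in saturation (V_DS ≫ U_T) with V_BS = 0, the transconductance is g_m = κI_D/U_T. This proportionality is why the cut-off frequency of a subthreshold Gm-C stage tracks its bias current:

```python
import math

U_T = 0.0258  # thermal voltage kT/q at about 300 K, in volts


def drain_current(v_gs: float, v_ds: float, v_bs: float = 0.0,
                  i0: float = 1e-15, kappa: float = 0.7) -> float:
    """Weak-inversion drain current of Eqn. 2.3 (i0 and kappa are hypothetical values)."""
    return (i0 * math.exp(kappa * v_gs / U_T)
            * math.exp((1.0 - kappa) * v_bs / U_T)
            * (1.0 - math.exp(-v_ds / U_T)))


def numeric_gm(v_gs: float, v_ds: float, dv: float = 1e-6) -> float:
    """Central-difference estimate of dI_D/dV_GS."""
    return (drain_current(v_gs + dv, v_ds) - drain_current(v_gs - dv, v_ds)) / (2.0 * dv)


i_d = drain_current(0.4, 0.5)
print(f"I_D = {i_d:.3e} A, g_m = {numeric_gm(0.4, 0.5):.3e} S, "
      f"kappa*I_D/U_T = {0.7 * i_d / U_T:.3e} S")
```

Raising V_GS by (U_T/κ)·ln 10, roughly 85 mV with these values, multiplies the current by ten. This exponential sensitivity is exploited for tuning, as described next.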
The I_D-V_GS characteristic of the MOS transistor operating in the subthreshold region is exponential, and thus a linearly decreasing bias voltage produces an exponentially decreasing cut-off frequency, as required for cochlea modelling. This is achieved using a polysilicon resistive line with two bias voltages applied to its ends. Another Gm-C 2nd-order LPF section is described in [53], which uses only two OTAs, but in this case the Q-factor and cut-off frequency cannot be tuned independently. In [53], a 3rd-order LPF is also described, which is a more accurate model of the cochlea hydrodynamics. Gm-C filters are popular for this application, since they have low power dissipation, a wide tuning range and a relatively small area requirement.
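The exponential tuning scheme can be sketched as follows. Three hypothetical assumptions are made: the polysilicon line is tapped at equal intervals so that the tap voltages fall linearly; each tap sets a subthreshold bias current I ∝ exp(κV/U_T); and each section's cut-off frequency is proportional to g_m/C and hence to that current:

```python
import math


def tap_cutoffs(n_taps: int, v_start: float, v_end: float, f_start_hz: float,
                kappa: float = 0.7, u_t: float = 0.0258):
    """Cut-off frequency at each tap of a resistive line with linearly spaced tap voltages.

    f_c is assumed proportional to the subthreshold bias current, I ~ exp(kappa*V/U_T),
    and is referenced to f_start_hz at the first tap.
    """
    step = (v_end - v_start) / (n_taps - 1)
    return [f_start_hz * math.exp(kappa * k * step / u_t) for k in range(n_taps)]


# Hypothetical: 50 taps, 0.2 V drop across the line, first section at 20 kHz.
fcs = tap_cutoffs(50, v_start=0.80, v_end=0.60, f_start_hz=20e3)
print(f"first {fcs[0]:.0f} Hz, last {fcs[-1]:.1f} Hz, ratio per section {fcs[1] / fcs[0]:.4f}")
```

Because the frequency ratio between adjacent sections is constant, the sections are spaced uniformly on a logarithmic frequency axis. This matches the exponential place-frequency map of the biological cochlea.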
A more accurate cochlea model is presented in [40], where separate circuits are used to model the fluid and the basilar membrane. In this case, the fluid is modelled using a 2-D resistive network, while the membrane is modelled using Gm-C circuits that mimic the basilar membrane impedance. This 2-D model can be reduced to a 1-D model. Although its area requirement is about 10 times that of the cascaded LPF model, this improved model has the advantages of being bidirectional and fault-resistant, and of having a continuum limit.
A multi-resolution basilar membrane, again based on the Gm-C approach, can be found in [36], where separate voltages are used to control the propagation delay and the cut-off frequency: in this way, the frequency resolution can be increased without excessively increasing the propagation delay. Other models have been based on a cascade
of V' order LPFs, with BPFs tapped to them [40], [54]-[56]. The cascade approach is 
very sensitive to errors introduced by mismatch, offset and noise, since these errors tend to accumulate along the cascade. In [21], the cascade was avoided altogether by using a parallel bank of BPFs as the cochlea, although this is not a true cochlea model.
Subthreshold Gm-C implementations suffer from mismatch errors, relatively low dynamic range and temperature drift. For this reason, switched-capacitor (SC) implementations are preferred when more power and chip area can be traded for accuracy: the accuracy of an SC filter is determined by capacitor ratios rather than by absolute component values, and hence high accuracy can be achieved. SC implementations also offer a high dynamic range at reduced voltage levels and do not depend on the linearity of the active devices used. Several SC implementations of the cochlea model have been investigated [46]-[50]. In [46], the design consisted of a cascade of 2nd-order SC filters based on unity-gain buffers built using a Miller OTA; this design results in a parasitic-sensitive circuit. A variable clock frequency arrangement is used in this case to achieve the desired frequency range and to reduce the area and power requirements. The designs in [47]-[50] are not exact cochlea models, since they are all based on a parallel structure of BPFs. The area requirement can be reduced using techniques such as time-division multiplexing (TDM) with resistive strings [47], and very-large-time-constant charge-differencing biquads [48], [49], in which low cut-off frequencies can be obtained with a very small capacitor spread. The latter design also allows for offset voltage cancellation. TDM SC filters are again used in [50], together with a special capacitor-sharing technique to reduce the area requirement. In this case, the op amp bias arrangement is such that maximum bias current flows during clock transitions, so that high slew rates are achieved at the instants when transitions occur, while the quiescent bias current is kept low.
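The capacitor-ratio argument can be made concrete. In a standard SC integrator, a capacitor C1 switched at f_clk behaves approximately like a resistor R = 1/(f_clk·C1), so an RC time constant becomes C2/(f_clk·C1). The sketch below assumes, purely for illustration, a first-order damped SC integrator whose damping capacitor equals C1; the formula is valid only for cut-offs well below f_clk. It shows that a common process error scaling every capacitor leaves the cut-off unchanged, and that the clock frequency tunes the cut-off directly:

```python
import math


def sc_equivalent_resistance(c_switched: float, f_clk: float) -> float:
    """Average resistance of a capacitor switched at f_clk: R = 1/(f_clk*C)."""
    return 1.0 / (f_clk * c_switched)


def sc_lowpass_cutoff(c_switched: float, c_integ: float, f_clk: float) -> float:
    """Cut-off of a damped SC integrator, 1/(2*pi*R*C2); valid only for f_c << f_clk."""
    return 1.0 / (2.0 * math.pi * sc_equivalent_resistance(c_switched, f_clk) * c_integ)


nominal = sc_lowpass_cutoff(0.1e-12, 10e-12, 100e3)             # C2/C1 = 100
skewed = sc_lowpass_cutoff(0.1e-12 * 1.2, 10e-12 * 1.2, 100e3)  # both caps 20% too large
print(f"{nominal:.2f} Hz vs {skewed:.2f} Hz")
```

The proportionality to f_clk is also what makes a variable clock frequency, as in [46], a convenient way of setting the frequency range.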
2.2.2.2 Log-Domain Implementations 
Although SC filters can be used at low supply voltages, they suffer from several problems arising from clock feedthrough and sampling errors. To use low supply voltages and still obtain a good dynamic range, together with very low power dissipation, log-domain filters can be used. Log-domain filters are continuous-time filters that require minimal linearisation circuitry (unlike Gm-C filters), since the non-linearity of the circuit elements is compensated by the circuit topology itself: log-domain filters are thus internally non-linear, externally linear systems. The companding nature of these filters minimises the internal voltage swing, and hence large input signals can be handled with low supply voltages. Figure 2-3 shows the general concept of log-domain filters.
A log-domain implementation of the cochlea model, based on the 2nd-order LPF cascade, can be found in [57]. Here, MOS transistors biased in weak inversion (Eqn. 2.3) were used to implement the log(x) and exp(x) functions required for log-domain filters. Log-domain filters, like Gm-C filters, suffer from low accuracy, but temperature drift can be greatly reduced by deriving the bias current from a self-biased (PTAT) current generator [58]. The main problem with log-domain filters is that exact noise analysis is difficult owing to the non-linear behaviour of the circuit: in fact, the noise is affected by the input signal level. Unlike in linear systems, device mismatches in log-domain filters can cause distortion. For these reasons, several specific tests are necessary in order to completely assess the performance of such filters [59].
[Figure 2-3 (block diagram): input signal → logarithmic compression → non-linear processing in the log domain → exponential expander → output signal.]
Figure 2-3: Principle of the current-mode log-domain filter. 
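The compress-process-expand principle can be illustrated with a first-order log-domain low-pass filter. This is an idealised behavioural sketch, not a transistor-level description of [57]. The state is a capacitor voltage V_C, the output current is I_out = I_S·exp(V_C/U_T), and the capacitor obeys C·dV_C/dt = I_τ·(I_in/I_out - 1). The state equation is non-linear. However, since dI_out/dt = (I_out/U_T)·dV_C/dt, the input-output relation is the linear low-pass τ·dI_out/dt = I_in - I_out with τ = C·U_T/I_τ, and the simulation compares the two forms. I_S, C and I_τ are hypothetical values:

```python
import math


def log_domain_lpf(i_in, dt, c, i_tau, u_t=0.0258, i_s=1e-15):
    """Integrate C dVc/dt = I_tau*(I_in/I_out - 1) with I_out = I_s*exp(Vc/U_T) (forward Euler)."""
    v_c = u_t * math.log(i_in[0] / i_s)        # compress: start in steady state for the first input
    out = []
    for x in i_in:
        i_out = i_s * math.exp(v_c / u_t)      # expand: current represented by the state
        v_c += dt * i_tau * (x / i_out - 1.0) / c
        out.append(i_s * math.exp(v_c / u_t))
    return out


def linear_lpf(i_in, dt, tau):
    """Reference first-order low-pass, tau*dy/dt = x - y (forward Euler)."""
    y = i_in[0]
    out = []
    for x in i_in:
        y += dt * (x - y) / tau
        out.append(y)
    return out


# Hypothetical values: C = 1 pF, I_tau = 1 nA, so tau = C*U_T/I_tau = 25.8 us.
c, i_tau, dt = 1e-12, 1e-9, 1e-7
step = [1e-9] * 10 + [2e-9] * 2000             # input current steps from 1 nA to 2 nA
y_log = log_domain_lpf(step, dt, c, i_tau)
y_lin = linear_lpf(step, dt, c * 0.0258 / i_tau)
worst = max(abs(a - b) / b for a, b in zip(y_log, y_lin))
print(f"final output {y_log[-1]:.4e} A, worst relative mismatch {worst:.2e}")
```

The remaining mismatch comes only from the finite time step of the numerical integration and not from the non-linear internal representation.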
2.2.2.2.1 The Translinear Principle 
The translinear principle is an important tool for the analysis and design of log-domain circuits and can be stated as follows: in a closed loop containing an equal number of oppositely-connected translinear elements, the product of the current densities in the elements connected in the clockwise direction is equal to the corresponding product for the elements connected in the counter-clockwise direction [60]. A translinear element is a device whose transconductance is proportional to its current. The bipolar transistor is a classic example of such a device. However, CMOS transistors also exhibit the translinear property when operated in the subthreshold region, as is evident from Eqn. 2.3, provided the body-effect term due to V_BS is removed and V_DS ≫ U_T. A 4-transistor translinear loop is shown in Figure 2-4. The loop is formed by transistors M1-M4, while M5 and M6 provide local feedback around M1 and M3, respectively. Applying Kirchhoff's voltage law around the loop gives V_GS1 - V_GS2 + V_GS3 - V_GS4 = 0.
Figure 2-4: An MOS 4-transistor translinear loop.
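Using Eqn. 2.3 for weak-inversion operation and neglecting the term due to V_DS:
$$\frac{U_T}{\kappa}\ln\frac{I_1}{I_0} - \frac{1-\kappa}{\kappa}V_{BS1} - \frac{U_T}{\kappa}\ln\frac{I_2}{I_0} + \frac{1-\kappa}{\kappa}V_{BS2} + \frac{U_T}{\kappa}\ln\frac{I_3}{I_0} - \frac{1-\kappa}{\kappa}V_{BS3} - \frac{U_T}{\kappa}\ln\frac{I_4}{I_0} + \frac{1-\kappa}{\kappa}V_{BS4} = 0 \tag{2.4}$$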
Assuming all the devices are fabricated in the same common substrate, it can be noted that V_BS1 = V_BS2 and V_BS3 = V_BS4, and thus I_1 I_3 = I_2 I_4. In the above circuit, the output terminal of the translinear loop is the drain of M4. However, the output terminal can be moved to any transistor position simply by changing the position of the diode-connected transistor and/or the gate connection of the local feedback transistors. The above topology is well suited to low-voltage applications, since it requires a supply voltage of only V_GS2 + V_DS5 + V_DS,sat (for the upper current sources), which comes to around 1-1.2 V for a typical CMOS technology with a threshold voltage of 0.8 V.
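The translinear relation I_1·I_3 = I_2·I_4 can be verified numerically from Eqn. 2.3. The sketch below makes four assumptions: all four devices are saturated (V_DS ≫ U_T, so the V_DS term is dropped); V_BS1 = V_BS2 and V_BS3 = V_BS4, as stated above; and κ, I_0 and the two body-bias voltages take hypothetical values. It solves each V_GS from its current, applies the KVL constraint V_GS4 = V_GS1 - V_GS2 + V_GS3, and recovers I_4:

```python
import math

U_T, KAPPA, I0 = 0.0258, 0.7, 1e-15   # hypothetical device parameters


def v_gs_for_current(i_d: float, v_bs: float) -> float:
    """Invert the saturated form of Eqn. 2.3 for V_GS."""
    return (U_T / KAPPA) * (math.log(i_d / I0) - (1.0 - KAPPA) * v_bs / U_T)


def current_for_v_gs(v_gs: float, v_bs: float) -> float:
    """Saturated form of Eqn. 2.3 (V_DS term neglected)."""
    return I0 * math.exp(KAPPA * v_gs / U_T) * math.exp((1.0 - KAPPA) * v_bs / U_T)


def loop_output(i1: float, i2: float, i3: float,
                vbs_a: float = -0.1, vbs_b: float = -0.3) -> float:
    """I4 from KVL around the loop: V_GS1 - V_GS2 + V_GS3 - V_GS4 = 0.

    M1 and M2 share body bias vbs_a; M3 and M4 share vbs_b (hypothetical values).
    """
    v4 = (v_gs_for_current(i1, vbs_a) - v_gs_for_current(i2, vbs_a)
          + v_gs_for_current(i3, vbs_b))
    return current_for_v_gs(v4, vbs_b)


i4 = loop_output(2e-9, 5e-9, 3e-9)
print(f"I4 = {i4:.4e} A, I1*I3/I2 = {2e-9 * 3e-9 / 5e-9:.4e} A")
```

The body-bias terms cancel pairwise, exactly as in Eqn. 2.4. As a result, I_4 = I_1·I_3/I_2 holds whatever values are chosen for vbs_a and vbs_b.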
2.2.2.3 Digital Implementation 
In a digital hardware implementation of the silicon cochlea [25], the cochlea model was based on a cascade of 2nd-order digital LPFs with independent cut-off frequency and damping coefficients. Bit-serial arithmetic together with TDM was used to implement the LPFs: in bit-serial arithmetic systems, low area and power requirements are achieved at the expense of reduced effective speed. The comparative study of digital and analogue processing carried out in [24] indicates that, in terms of power and area overhead, analogue computation is cheaper than digital computation for low-precision applications (such as cochlea modelling), and that the reverse holds for high-precision applications.
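The area/speed trade-off of bit-serial arithmetic can be seen in the simplest bit-serial operator: an adder built from a single full adder and a one-bit carry register. This is a generic textbook sketch, not the arithmetic unit of [25]. An n-bit addition takes n clock cycles instead of one, but only one full-adder cell is needed regardless of word length:

```python
def bit_serial_add(a: int, b: int, n_bits: int):
    """Add two n-bit two's-complement words LSB first using one full adder and a carry flip-flop.

    Returns (sum as a signed integer, number of clock cycles used).
    """
    carry = 0                            # the single carry register, cleared before the word
    result = 0
    cycles = 0
    for k in range(n_bits):              # one bit per clock cycle, least significant first
        bit_a = (a >> k) & 1
        bit_b = (b >> k) & 1
        s = bit_a ^ bit_b ^ carry                                    # full-adder sum
        carry = (bit_a & bit_b) | (bit_a & carry) | (bit_b & carry)  # full-adder carry out
        result |= s << k
        cycles += 1
    if result & (1 << (n_bits - 1)):     # reinterpret the n-bit pattern as two's complement
        result -= 1 << n_bits
    return result, cycles


print(bit_serial_add(13, -5, 8))  # (8, 8): the sum 8, obtained in 8 clock cycles
```

With TDM, the same serial hardware can also be shared among several filter channels, which further reduces area at the cost of throughput.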
2.2.3 Inner Hair Cell Models 
An electrical model of the inner hair cell (IHC) can be found in [37]; it consists of an R-C transmission line with a resistive load, followed by a leaky differentiator. The input to the network is a current representing the calcium current, which arises from voltage changes in the IHC caused by displacement. The R-C transmission line represents the diffusion of calcium, while the leaky differentiator models the transformation of the resulting calcium concentration into a postsynaptic voltage. This circuit models the chemical processes that take place inside the cell, rather than the global behaviour of the IHC as a mechanical-to-electrical transducer.
Silicon models of the IHC are mostly behavioural models of the IHC as a transducer. Commonly, the IHC is modelled using a differentiator, a saturating non-linearity and a half-wave rectifier [19], [22], [61], [62], and in some cases a temporal adaptation circuit [42], [63]-[65]. In most cases, a hysteretic differentiator [61], [62] is used to differentiate the cochlear model output, while in some cases a linear differentiator is used [22]. The hysteretic differentiator [38] typically consists of a Gm-C circuit with a non-linear element in its feedback loop: this type of differentiator enhances the zero-crossings of the input waveform, and thus emphasises the phase information in the waveform. In other cases, the normalised differentiated output is taken as the voltage difference between the input and the output of the second feedforward damped integrator in the 2nd-order LPF section [29], [66], in which case the differentiation process is essentially linear.
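The behavioural IHC chain (differentiator, saturating non-linearity, half-wave rectifier) can be sketched on a sampled signal. The specific choices are assumptions for illustration: a backward-difference differentiator, a tanh saturation with a hypothetical level `sat_level`, and an ideal half-wave rectifier. Temporal adaptation is omitted:

```python
import math


def ihc_chain(samples, fs_hz, sat_level):
    """Differentiate, compress and half-wave rectify a sampled basilar-membrane signal."""
    out = []
    prev = samples[0]
    for x in samples:
        velocity = (x - prev) * fs_hz                              # displacement -> velocity
        prev = x
        compressed = sat_level * math.tanh(velocity / sat_level)  # saturating non-linearity
        out.append(max(compressed, 0.0))                          # half-wave rectification
    return out


fs = 16000.0
tone = [math.sin(2 * math.pi * 500.0 * n / fs) for n in range(64)]
resp = ihc_chain(tone, fs, sat_level=1000.0)
print([round(v, 1) for v in resp[:8]])
```

Replacing `max(compressed, 0.0)` with `abs(compressed)` would give the full-wave variant discussed next.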
Rectification is usually half-wave, using an MOS transistor as a diode. Although half-wave rectification is biologically plausible, full-wave rectification is sometimes preferred when the output of the IHC has to be temporally smoothed to obtain the AGC control signal. This is because full-wave rectification requires a smaller smoothing capacitor than half-wave rectification for the same amount of ripple, and therefore requires less chip area [29].
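The ripple argument can be checked with a quick simulation. The sketch assumes the rectified unit sinusoid is smoothed by a simple first-order RC low-pass (not a peak detector), with a hypothetical input frequency and time constant. The full-wave signal has its lowest ripple component at twice the input frequency, so for the same RC it leaves noticeably less ripple. Equivalently, it needs a smaller capacitor for the same ripple:

```python
import math


def smoothed_ripple(rectifier, f_hz=1000.0, tau_s=2e-3, fs_hz=1e6, periods=40):
    """Peak-to-peak ripple after first-order RC smoothing of a rectified unit sinusoid."""
    dt = 1.0 / fs_hz
    y = 0.0
    history = []
    for n in range(int(periods * fs_hz / f_hz)):
        x = rectifier(math.sin(2 * math.pi * f_hz * n * dt))
        y += dt * (x - y) / tau_s                 # RC low-pass, forward Euler
        history.append(y)
    last_period = history[-int(fs_hz / f_hz):]    # measure once the filter has settled
    return max(last_period) - min(last_period)


def half_wave(v):
    return max(v, 0.0)


def full_wave(v):
    return abs(v)


print(f"half-wave ripple {smoothed_ripple(half_wave):.4f}, "
      f"full-wave ripple {smoothed_ripple(full_wave):.4f}")
```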
One technique for implementing temporal adaptation [65] is to integrate the output of the IHC using a leaky integrator and to use the result to control the current ratio of a modified current mirror [67] that couples the IHC output to the next stage. A slightly different approach (Figure 2-5) is used in [42], [63], [64], where temporal adaptation is implemented by subtracting from the IHC output a current that increases monotonically with the integrated value of the IHC output.
[Figure 2-5 (schematic): saturation, half-wave rectification and adaptation stages; labels include Vref, R, Iout and Ispont.]
Figure 2-5: Inner hair cell model including temporal adaptation: with a sustained stimulus, the capacitor C charges up via R, causing the output current Iout to decrease. Ispont is used to represent the spontaneous firing of the spiral ganglion neurones with zero stimulus [42], [63], [64].
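The following is a behavioural sketch of the subtractive adaptation scheme. It assumes, hypothetically, that the RC branch acts as a leaky integrator of the rectified IHC current with time constant `tau_s`. It further assumes that the subtracted current is a fixed fraction `k` of the integrator state and that the spontaneous current is simply added at the output:

```python
def adapting_ihc(i_rect, dt, tau_s, k, i_spont):
    """Output current of a simple adaptation stage: Iout = max(I - k*Vint, 0) + Ispont."""
    v_int = 0.0                               # integrator state (charge on C through R)
    out = []
    for i in i_rect:
        out.append(max(i - k * v_int, 0.0) + i_spont)
        v_int += dt * (i - v_int) / tau_s     # leaky integration of the rectified IHC current
    return out


stim = [0.0] * 100 + [1.0] * 2000             # sustained step stimulus (arbitrary units)
resp = adapting_ihc(stim, dt=1e-4, tau_s=20e-3, k=0.6, i_spont=0.05)
print(f"rest {resp[0]:.3f}, onset {resp[100]:.3f}, adapted {resp[-1]:.3f}")
```

The response reproduces the qualitative behaviour described in the caption. It shows a spontaneous level at rest, a large onset response, and a decay towards a lower sustained level as the integrator charges.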
A less common IHC model with temporal adaptation [39] is based on discrete charge storage and models the biological IHC in finer detail. This model is based on the reservoir model of the hair cell, with three levels of store: the global store, the local store and a number of immediate stores. The constant global store charges the local store with a specific time constant. When the local store voltage exceeds the voltage of an immediate site, that site also starts to be charged from the local store, with another time constant. The number of immediate sites discharging current into the output node is a compressive, monotonically increasing function of the basilar membrane velocity signal taken from the cochlea model.
The IHC model in [68] is intended as a zero-crossing detector and frequency-to-voltage (time-to-voltage) converter, and assumes that the relevant information in the cochlea output is contained solely in the zero-crossings. This is not an exact model of the biological IHC, but it may be useful in some applications.
2.2.4 Outer Hair Cell Modelling and Automatic Gain Control 
The OHC can be modelled using an integrator, a 2nd-order LPF and a saturating non-linearity. The integrator models the transformation of membrane velocity into displacement, while the LPF computes a filtered and delayed version of the membrane displacement, corresponding to the delayed OHC response to bending of the stereocilia [40]. The OHC implementation in [40] is, however, only an open-loop system, with no automatic gain control (AGC) mechanism.
Silicon models of the cochlea that implement AGC (also called adaptive or active models) can be found in [24], [25], [29], [69], [70]. In [70], the cochlea model consists of a cascade of notch filters with BPF-LPF structures tapped from them. The AGC control signal is derived from the BPF output and is used to control the gain of the LPF. The AGC method is thus essentially feed-forward, with no spatial averaging: this is not an exact model of the cochlea, but the technique guarantees AGC stability. In [24], [25], [29], a feedback AGC loop is used, which is more biologically plausible. In these cases, the AGC is implemented by adjusting the Q-factor of the 2nd-order LPFs in the cochlea filter cascade using the processed IHC signal. The main problem with feedback AGC techniques is stability: the feedback signal is derived by sensing the cochlea output (IHC/OHC circuit), followed by temporal and spatial averaging. All these processes introduce a time delay, which can cause stability problems. This condition is further aggravated in the cascade approach, where a single unstable stage can corrupt the outputs of all the successive stages (indeed, in a malfunctioning biological cochlea, instability does occasionally occur, resulting in a symptom called tinnitus). Spatial averaging is obtained either by a crosstalk connection (Figure 2-6) between adjacent channels [24], [25] or via a weighted summation involving nearby channels, as in [29].
In [69], the AGC mechanism is implicitly implemented using a non-linear positive 
feedback block for the feedback OTA. At high signal levels, the nonlinear block shuts off 
the positive feedback signal and thus effectively reduces the gain of the stage, resembling 
the adaptation which takes place in the biological cochlea. 
[Figure 2-6 (block diagram): signal input → cascade of filters with Q-control, each filter followed by HWR and AGC blocks; outputs taken from each stage; the cascade continues to other filters.]
Figure 2-6: Cochlea model including AGC [25]: the outputs of the 2nd-order filters are half-wave rectified (HWR) and used to control the AGC blocks, which perform feedback Q-factor tuning. Crosstalk connections between adjacent AGC blocks prevent loss of contrast information between channels.
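One update step of a feedback AGC with crosstalk can be sketched as below. Everything here is a hypothetical behavioural model, not the circuit of [25]. Each channel's half-wave-rectified level is temporally smoothed and then blended with its neighbours' smoothed levels (the crosstalk path). The result lowers that channel's Q-factor through Q = Q_max/(1 + g·level), where `q_max` and `gain` are illustrative parameters:

```python
def agc_step(levels, state, alpha=0.5, coupling=0.2, q_max=1.2, gain=2.0):
    """One AGC update: returns the new per-channel control state and the resulting Q-factors."""
    n = len(levels)
    # temporal smoothing of the half-wave-rectified (HWR) channel outputs
    smoothed = [s + alpha * (max(x, 0.0) - s) for s, x in zip(state, levels)]
    # crosstalk: blend each channel with the mean of its two neighbours (edges reuse themselves)
    new_state = []
    for k in range(n):
        left = smoothed[k - 1] if k > 0 else smoothed[k]
        right = smoothed[k + 1] if k < n - 1 else smoothed[k]
        new_state.append((1.0 - coupling) * smoothed[k] + coupling * 0.5 * (left + right))
    q = [q_max / (1.0 + gain * c) for c in new_state]   # louder channel -> lower Q
    return new_state, q


state, q = agc_step([0.0, 0.0, 1.0, 0.0, 0.0], [0.0] * 5)
print([round(v, 3) for v in q])
```

The crosstalk term spreads part of a loud channel's control signal to its neighbours, which is the spatial averaging referred to in the text. Because the update is fed back through the filters on the next step, the delay introduced by the smoothing is also where the stability concerns mentioned above arise.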
2.2.5 Spiral Ganglion Neurone Models 
The spiral ganglion neurone (SGN) models, which encode amplitude information into a pulse-frequency modulated (PFM) pulse stream, have been based exclusively on circuits derived from Mead's self-resetting axon [38], [71]. This circuit consists of a non-inverting, non-linear amplifier with capacitive positive feedback. The input current charges the input capacitor. As soon as the threshold voltage of the amplifier is exceeded, the output abruptly goes high owing to the positive feedback action. A logic-high state at the output causes the input capacitor to discharge at a specific rate, which controls the pulse width. Typical applications of this circuit as an SGN model can be found in [19], [65], [72], where the circuit has been slightly modified to inhibit the input for the duration of the pulse. The SGN model can also be modified to include an adjustable refractory period after the output returns low [36]. In most cases, the non-inverting amplifier is implemented using a cascade of two CMOS inverters; however, a modified low-power version of this amplifier [67] can be used, in which all the transistors remain biased in the weak inversion region.
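The pulse-frequency coding of the self-resetting axon can be illustrated with a simplified behavioural model of the modified version, in which the input is inhibited during the pulse. All values are hypothetical. The input current charges C until the threshold `v_th` is reached. The output then goes high and a fixed reset current discharges C back to zero, which ends the pulse. The capacitive positive-feedback jump and any refractory period are ignored:

```python
def self_resetting_axon(i_in, c=1e-12, v_th=0.5, i_reset=10e-9, dt=1e-6, t_end=10e-3):
    """Count output pulses of a simplified self-resetting axon driven by a constant current."""
    v = 0.0
    high = False
    pulses = 0
    for _ in range(int(round(t_end / dt))):
        if high:
            v -= dt * i_reset / c        # input inhibited; reset current sets the pulse width
            if v <= 0.0:
                v, high = 0.0, False     # pulse ends, capacitor ready to integrate again
        else:
            v += dt * i_in / c           # input current charges the input capacitor
            if v >= v_th:
                high = True              # threshold crossed: output goes high
                pulses += 1
    return pulses


for i in (0.5e-9, 1e-9, 2e-9, 4e-9):
    print(f"{i * 1e9:.1f} nA -> {self_resetting_axon(i)} pulses in 10 ms")
```

The pulse rate rises monotonically with the input current, while the fixed reset current keeps the pulse width constant. This gives the fixed-width, variable-frequency output described for the biological SGNs.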
In some applications, the conversion of the amplitude information output by the IHC into PFM may not be required for further processing. The output of the IHC model, which is usually a current-mode signal, is then processed directly by other circuitry without prior conversion into pulse form [42], [63], [64]. In such cases, the spontaneous firing of neurones can be modelled by simply adding a constant current to the IHC output.
An exception to the PFM characteristic of SGN models can be found in [21], where pulse width modulation (PWM) is used instead: the pulse width is a monotonic function of the amplitude of the low-pass filtered, half-wave rectified signal output from the cochlea model. This is not an exact model of the SGN, but it is preferred in this case because the delay line is composed of fixed-delay elements, unlike the delay elements used in [19], [36], which delay each pulse by its own width.
2.2.6 Cochlear Nucleus Models 
Models for the cochlear nucleus (CN) inside the brain, which performs low-level auditory processing, are also being developed [42], [63], [64]. In order to model the different neurones in the CN, a "universal neurone" has been developed in which several parameters, such as the membrane threshold potential and several currents which mimic the sodium and potassium currents, can be adjusted. These currents effectively control the input threshold, firing rate and refractory period of the neurone. Different types of neurones can thus be simulated by adjusting these parameters such that the model's output response matches neurological data. These data often take the form of post-stimulus time histograms (PSTHs), which are thought to characterise the neurones inside the CN. In particular, in [63] a model for chopper cells, which synchronise to specific amplitude modulation frequencies, has been developed, while in [64] results obtained for spherical bushy cells, globular bushy cells, octopus cells and stellate cells are documented. Other neurone circuits can be found in [73]-[75].
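A PSTH, the form in which the neurological data above is usually reported, is the spike count in each time bin after stimulus onset, pooled over repeated presentations of the same stimulus and normalised to a firing rate. The Python sketch below assumes spike times are already available as lists measured from stimulus onset; the function name `psth` and its arguments are illustrative and are not taken from [63] or [64].

```python
def psth(spike_trains, bin_width, duration):
    """Post-stimulus time histogram.

    spike_trains: one list of spike times (s) per stimulus presentation,
                  each time measured from stimulus onset.
    Returns the mean firing rate (spikes/s) in each bin of width bin_width.
    """
    n_bins = int(round(duration / bin_width))
    counts = [0] * n_bins
    for train in spike_trains:
        for t in train:
            if 0.0 <= t < duration:
                # clamp guards against floating-point round-up at the last edge
                counts[min(int(t / bin_width), n_bins - 1)] += 1
    n_trials = len(spike_trains)
    return [c / (n_trials * bin_width) for c in counts]


# Three presentations, 1 ms bins over the first 6 ms after onset
rates = psth([[0.0012, 0.0051], [0.0013, 0.0049], [0.0011]], 0.001, 0.006)
print(rates)
```

A sharp peak in the bin just after onset, as in this toy example, is the kind of feature used to match a model neurone to a recorded cell type.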
In order to achieve useful processing, a large number of cells would have to be fabricated on a single chip, together with the required interconnections. For this reason, in most cases, post-cochlear processing has been limited to specific applications such as a stereausis model for speech recognition [62], pitch perception [19] and lateral sound localisation [19], [21], [36]. The stereausis model in [62] derives a 2-D representation of binaural sound from the outputs of two cochlea channels via correlation elements placed between the left and right channels. Pitch perception involves autocorrelation of the cochlea output, while lateral sound localisation (based on interaural delay) involves cross-correlation of the left and right cochlea outputs for each frequency band.
In the cochlear nucleus, it is believed that some neurones act as pulse delay lines, while others fire only when signals arrive synchronously on both of their inputs. These two types of neurones can thus be used together to implement the correlation function. The silicon model for the time-delay line is based on a cascade of Mead's axon repeater circuit [38], while the correlator cells are based on analogue multipliers. In [19], a winner-take-all (WTA) network [76] is also used to select the cell corresponding to maximum correlation and to inhibit all the others.
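The delay-line correlator with a WTA output can be illustrated in discrete time. The Python sketch below assumes the classic Jeffress arrangement (left pulses travel one way along the line and right pulses the other, one tap per time step), binary pulse trains and a multiply, i.e. AND, at each coincidence cell; it is a behavioural illustration with hypothetical function names, not the circuit of [19].

```python
def delay_line_correlate(left, right, n_taps):
    """Jeffress-style correlator on binary pulse trains (lists of 0/1).

    Left pulses enter tap 0 and travel towards tap n_taps-1; right pulses
    travel in the opposite direction.  Cell k multiplies (ANDs) the left
    signal delayed by k samples with the right signal delayed by
    n_taps-1-k samples, and accumulates the products over time.
    """
    scores = [0] * n_taps
    for t in range(len(left)):
        for k in range(n_taps):
            tl = t - k
            tr = t - (n_taps - 1 - k)
            if tl >= 0 and tr >= 0:
                scores[k] += left[tl] * right[tr]
    return scores


def winner_take_all(scores):
    """Index of the most active cell (ties go to the lowest index)."""
    return max(range(len(scores)), key=lambda k: scores[k])


left = [0] * 30
right = [0] * 30
for i in (3, 10, 20):
    left[i] = 1
for i in (5, 12, 22):   # right channel lags the left by 2 samples
    right[i] = 1
scores = delay_line_correlate(left, right, 9)
k = winner_take_all(scores)
print(scores, k, 2 * k - (9 - 1))
```

Cell k detects a right-channel lag of 2k − (n_taps − 1) samples, so the winning cell index maps directly to the interaural delay; in this example the winner is cell 5, i.e. a lag of 2 samples.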
2.3 Design Issues Related to the Performance of the Silicon Cochlea 
The Gm-C 2nd order filter cascade model (based on the model used in [34]), operating in weak inversion, has been extensively investigated in silicon cochlea designs, since it is simple, compact and capable of good performance with low power requirements. The discussion in this section relates mainly to this type of cochlea model implementation.
2.3.1 Second Order Lowpass Filter Stability 
The transfer function of a 2nd order LPF can be written as:

$$H(s) = \frac{A_0\,\omega_n^2}{s^2 + 2\zeta\omega_n s + \omega_n^2} \qquad (2.5)$$
where A0 is the low-frequency gain, ζ is the damping factor and ωn is the natural frequency of oscillation. The Q-factor is given by Q = 1/(2ζ). Small-signal analysis of the classical 2nd order section [38] in Figure 2-2 shows that:

$$A_0 = 1, \qquad \omega_n = \frac{g_m}{C} \qquad \text{and} \qquad \zeta = 1 - \frac{g_{m3}}{g_{m1} + g_{m2}} \qquad (2.6)$$
where gm1 and gm2 are the transconductances of the forward-path OTAs (assumed equal to gm), gm3 is the transconductance of the feedback OTA and C is the node capacitance. It is evident that, to preserve small-signal stability, ζ must be strictly greater than zero (implying finite Q). This means that gm3 must be smaller than the sum of the other two transconductances.
Large-signal analysis of the 2nd order LPF [38], however, shows that a more stringent condition must be obeyed if large-signal stability is to be preserved, namely ζ ≥ 0.191 (Q ≤ 2.62). Large-signal instability can be eliminated by increasing the linear range of the forward-path OTAs while keeping that of the feedback OTA small, using the techniques described in the following section.
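Equations (2.5) and (2.6) can be turned into a quick design check. The Python sketch below assumes gm1 = gm2 = gm and equal node capacitances C, as implied by ωn = gm/C; it computes ωn, ζ and Q and flags sections that violate the small-signal (ζ > 0) or large-signal (ζ ≥ 0.191) conditions quoted above. The function names and example values are illustrative only.

```python
import math


def second_order_params(gm, gm3, C):
    """Small-signal parameters of the classical 2nd order section.

    Assumes gm1 = gm2 = gm and equal node capacitances C, so that
    omega_n = gm / C and zeta = 1 - gm3 / (gm1 + gm2) (eqn. 2.6).
    Returns (omega_n in rad/s, zeta, Q); Q is reported as infinite
    when zeta <= 0, since no finite Q exists there.
    """
    omega_n = gm / C
    zeta = 1.0 - gm3 / (2.0 * gm)
    q = 1.0 / (2.0 * zeta) if zeta > 0.0 else math.inf
    return omega_n, zeta, q


def stability(zeta):
    """Classify a section against the small- and large-signal limits."""
    if zeta <= 0.0:
        return "unstable (small signal)"
    if zeta < 0.191:          # Q > 2.62
        return "small-signal stable, large-signal unstable"
    return "stable"


for gm3 in (1.2e-9, 1.7e-9, 2.1e-9):          # feedback gm in siemens
    w, z, q = second_order_params(1e-9, gm3, 1e-12)
    print(f"gm3={gm3:.1e} S: f_n={w / (2 * math.pi):.0f} Hz, "
          f"zeta={z:.2f}, Q={q:.2f} -> {stability(z)}")
```

With gm = 1 nS and C = 1 pF, a feedback gm of 1.2 nS gives ζ = 0.4 (Q = 1.25), while 1.7 nS gives ζ = 0.15, which satisfies the small-signal condition but not the large-signal one.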
2.3.2 Dynamic Range 
The dynamic range of a Gm-C filter is bounded at the lower end by the noise floor, which is typically around 0.1 mVpp, and at the upper end by the input linear range of the differential pair used in the Gm-C filter, which is typically 60 mVpp. If the differential pair is operated beyond its linear range, the output is distorted. Several approaches can be used to increase the input linear range of a differential pair: these include input capacitive voltage dividers [53], source degeneration via diode-connected transistors [45], [53], diffusive source degeneration [54] and a special OTA structure comprising well (back-gate) input, source and gate degeneration with bump linearisation [51], [69].
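The upper limit comes from the standard weak-inversion result that the output current of a differential pair follows a tanh of the differential input voltage. The Python sketch below, assuming κ = 0.7 and UT ≈ 25.9 mV (typical values, not taken from this work), computes how far the output falls below its linear extrapolation; at ±30 mV, i.e. the 60 mVpp figure quoted above, the shortfall is already about 5 %.

```python
import math

KAPPA = 0.7    # subthreshold slope factor (assumed typical value)
U_T = 0.0259   # thermal voltage kT/q at about 300 K, in volts


def diff_pair_output(v_diff, i_bias, kappa=KAPPA, u_t=U_T):
    """Output current of a subthreshold differential pair (OTA):
    I_out = I_b * tanh(kappa * V_diff / (2 * U_T))."""
    return i_bias * math.tanh(kappa * v_diff / (2.0 * u_t))


def linearity_error(v_diff, kappa=KAPPA, u_t=U_T):
    """Fractional shortfall of the tanh output relative to the ideal
    linear (small-signal) response at the same input voltage."""
    x = kappa * v_diff / (2.0 * u_t)
    if x == 0.0:
        return 0.0
    return 1.0 - math.tanh(x) / x


for mv in (1, 10, 30, 60):
    print(f"+/-{mv} mV: {100 * linearity_error(mv * 1e-3):.1f} % below linear")
```

Every linearisation technique listed above works, in effect, by reducing the effective κ seen by the input so that the same output swing needs a larger input voltage.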
Capacitive voltage dividers have shown little success, since the Miller capacitance between the OTA output and its inverting input severely reduces the open-loop voltage gain: this also reduces the voltage gain of the resulting voltage-follower structure, which means that a cascade of filters will quickly attenuate the signal. Source degeneration is effective for increasing the linear range of a differential pair; however, it also severely reduces the common-mode operating range of the amplifier and increases thermal noise. Cascading diode-connected transistors further increases the linear range, with a greater reduction of the common-mode input range [53]. A similar technique is diffusive source degeneration, which yields no net increase in the current noise density and no decrease in common-mode input range. Diffusive source degeneration can be applied using single or multiple diffusers. A single diffuser can improve the linear range by 18 dB, but has the disadvantage that it requires extra common-mode circuitry to determine the diffuser transistor gate bias voltage. Double diffusers give a 12 dB increase in the linear range, but require no common-mode circuitry.
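The dB figures quoted for diffusive degeneration refer to voltage ratios, so they translate directly into linear-range multipliers. The Python sketch below applies them to the typical 60 mVpp range quoted earlier; it also reports the dynamic range that would result if the 0.1 mVpp noise floor stayed unchanged, which is an optimistic assumption made here only for illustration.

```python
import math


def extended_linear_range(base_pp, improvement_db):
    """Scale a baseline input linear range by an improvement quoted in dB.

    A voltage ratio r corresponds to 20*log10(r) dB, so the new range is
    base_pp * 10**(improvement_db / 20).
    """
    return base_pp * 10.0 ** (improvement_db / 20.0)


def dynamic_range_db(v_max_pp, v_noise_pp):
    """Dynamic range between the linear-range limit and the noise floor."""
    return 20.0 * math.log10(v_max_pp / v_noise_pp)


BASE_PP = 60e-3    # typical differential-pair linear range quoted above (V pk-pk)
NOISE_PP = 0.1e-3  # typical noise floor quoted above; ASSUMED unchanged here

print(f"baseline: DR = {dynamic_range_db(BASE_PP, NOISE_PP):.1f} dB")
for name, gain_db in (("single diffuser", 18.0), ("double diffuser", 12.0)):
    v_pp = extended_linear_range(BASE_PP, gain_db)
    print(f"{name}: {v_pp * 1e3:.0f} mVpp, "
          f"DR <= {dynamic_range_db(v_pp, NOISE_PP):.1f} dB")
```

An 18 dB improvement corresponds to roughly an eightfold wider input range and 12 dB to roughly fourfold; the baseline dynamic range implied by the figures above is about 56 dB.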
The OTA in Figure 2-7 [51], [69] uses the back gate as the signal input terminal, which has an intrinsically lower transconductance than the normal gate terminal. Diode-connected transistors are used for source degeneration, while the gate terminals in this case are used to implement gate degeneration. These three characteristics reduce the effective transconductance of the OTA, thus increasing the linear range. In addition, a bump circuit [77] is used to reduce the current through the differential pair when the differential input voltage is near zero: this technique further linearises the OTA. One problem with this OTA arises from the parasitic bipolar transistors present when the back gate is used as the input terminal: these restrict the common-mode input range. Another effect is the change of the subthreshold exponential parameter κ (in eqn. 2.3) with changes in the d.c. bias current and common-mode input voltage, giving rise to deviations in the transconductance value.
[Figure 2-7 schematic: differential inputs Vin+ and Vin−, bias voltage Vb, output current Iout]
Figure 2-7: Wide linear range OTA comprising well-input transistors (W), source degeneration (S), gate degeneration (GM) and bump linearisation (B). The (M) transistors perform current mirroring to the output node [51].
2.3.3 Device Mismatch 
Owing to low-power and low-voltage requirements, it is desirable to operate analogue signal processing systems in the weak inversion regime: in particular, the Gm-C filters used for cochlear modelling have all been operated in this region. Matching in the subthreshold region is poor, and the quiescent current of small transistors can vary by a factor of two for the same nominal transistor dimensions and bias conditions [45]. Matching of the tail current sources is critical, since these affect the tuning and Q-factor of the 2nd order section and hence also its stability; mismatch in the differential pair gives rise to an offset voltage, which, however, is not a critical issue in this application.
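The factor-of-two spread follows from the exponential subthreshold law: a threshold-voltage difference ΔVT between two nominally identical transistors scales the drain current by exp(κΔVT/UT). The Python sketch below, assuming κ = 0.7 and UT ≈ 25.9 mV, shows that roughly 26 mV of threshold mismatch is enough to double the current. It also shows how a fractional error in the feedback tail current propagates into ζ through eqn. (2.6), using the fact that an OTA's gm is proportional to its tail current in weak inversion. The function names and the nominal ratio are illustrative.

```python
import math

U_T = 0.0259   # thermal voltage at about 300 K (V), assumed
KAPPA = 0.7    # subthreshold slope factor, assumed typical value


def current_ratio(delta_vt, kappa=KAPPA, u_t=U_T):
    """Current ratio of two nominally identical subthreshold transistors
    whose thresholds differ by delta_vt, from I ~ exp(kappa*(V_G - V_T)/U_T)."""
    return math.exp(kappa * delta_vt / u_t)


def vt_offset_for_ratio(ratio, kappa=KAPPA, u_t=U_T):
    """Threshold difference that produces a given current ratio."""
    return u_t * math.log(ratio) / kappa


def zeta_with_mismatch(gm3_over_sum, gm3_error):
    """zeta = 1 - gm3/(gm1 + gm2) when gm3 (proportional to the feedback
    tail current) is off by the fractional error gm3_error."""
    return 1.0 - gm3_over_sum * (1.0 + gm3_error)


print(f"dVT for 2x current: {vt_offset_for_ratio(2.0) * 1e3:.1f} mV")
for err in (0.0, 0.1, 0.2):
    z = zeta_with_mismatch(0.7, err)
    print(f"gm3 error {err:+.0%}: zeta = {z:.2f}, below 0.191: {z < 0.191}")
```

With a nominal gm3/(gm1 + gm2) of 0.7 (ζ = 0.3), a 20 % excess in the feedback tail current already pushes ζ below the 0.191 large-signal limit, which is why tail-current matching is singled out above.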
One technique for reducing mismatch is the use of pseudo-BiCMOS [66]: this technique uses lateral bipolar transistors, built in a standard CMOS process, as the tail current sources. Using bipolar transistors instead of MOS transistors removes bias current variations due to threshold voltage mismatch. Temperature-independent biasing can be achieved in this case by using diode-connected bipolars to establish the end voltages of the polysilicon bias line. Another technique for reducing variation in the cut-off frequency and Q-factor is to use large dimensions for the bias transistors [45]. The Q-factor of the 2nd order section is related to the ratio of the feedback OTA gm to the sum of the forward OTA gm values. Hence, to reduce mismatch problems, the common-centroid layout technique can be applied by simply duplicating the feedback (Q-control) tail current transistor, without a severe area overhead [45].
In the original cochlea filter cascade, based on the LPF in Figure 2-2, two polysilicon bias lines are used, one for Q control and one for cut-off frequency control. Non-uniformity of the polysilicon lines can also give rise to errors and stability problems. One technique for reducing this problem is to use the same bias line for the gates of all tail current transistors. Q-factor tuning is then achieved by controlling the source voltage of the tail current source in the feedback OTA using a global voltage source [45].
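Sharing one gate bias line makes Q independent of position along the cochlea because, in weak inversion, the tail current depends on the gate and source voltages through exp((κVG − VS)/UT). The ratio of the feedback and forward tail currents in each section therefore depends only on the difference of their source voltages, not on the local gate voltage. The Python sketch below checks this under the standard simplified subthreshold model (drain and body effects ignored). The forward-tail source voltage of 50 mV, the value of I0 and the function names are hypothetical choices for illustration, not details of [45].

```python
import math

U_T = 0.0259   # thermal voltage at about 300 K (V), assumed
KAPPA = 0.7    # subthreshold slope factor, assumed typical value
I_0 = 1e-15    # pre-exponential current (A), hypothetical value


def tail_current(v_gate, v_source):
    """Saturated subthreshold current I = I_0 * exp((kappa*V_G - V_S) / U_T),
    with voltages referred to the substrate; drain and body effects ignored."""
    return I_0 * math.exp((KAPPA * v_gate - v_source) / U_T)


def feedback_ratio(v_gate, v_src_fwd, v_src_fb):
    """gm3 / (gm1 + gm2) for one section whose three tail transistors share
    the gate voltage v_gate from the common bias line.  In weak inversion
    each OTA's gm is proportional to its tail current, so the ratio is
    I_fb / (2 * I_fwd) = exp((v_src_fwd - v_src_fb) / U_T) / 2."""
    i_fwd = tail_current(v_gate, v_src_fwd)
    i_fb = tail_current(v_gate, v_src_fb)
    return i_fb / (2.0 * i_fwd)


# Gate voltage varies along the cochlea; the global source voltages do not.
for v_gate in (0.5, 0.6, 0.7):
    r = feedback_ratio(v_gate, v_src_fwd=0.05, v_src_fb=0.04)
    print(f"V_G = {v_gate:.1f} V: gm3/(gm1+gm2) = {r:.4f}, zeta = {1 - r:.3f}")
```

The cut-off frequency still follows the local gate voltage, while ζ, and hence Q, is set by one global source voltage difference for every section.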
In [61], dynamic correction of Q-factor values is obtained by checking for 
instability using the IHC circuit output itself, and reducing the Q-factor accordingly, until 
stability is achieved together with a specific safety margin. The Q-factor control voltage is 
stored using a floating gate transistor. 
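The control strategy of [61] can be summarised as a simple search loop. The Python sketch below is only a behavioural illustration: `is_unstable` is a hypothetical stand-in for the IHC-based oscillation check, and the step size, safety margin and lower limit are illustrative values. In [61] the resulting setting is held on a floating gate rather than in software.

```python
def tune_q(q_start, is_unstable, step=0.05, margin=0.1, q_min=0.5):
    """Lower the Q setting until the (hypothetical) instability detector
    reports a stable section, then back off by a further safety margin."""
    q = q_start
    while is_unstable(q) and q - step >= q_min:
        q -= step
    if is_unstable(q):
        raise RuntimeError("could not stabilise the section above q_min")
    return max(q_min, q - margin)


# Example detector: treat any Q above the 2.62 large-signal limit as unstable
print(tune_q(3.0, lambda q: q > 2.62))
```

Backing off by a margin after the first stable setting mirrors the "specific safety margin" described above, so that later drift does not immediately push the section back into oscillation.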
2.3.4 Area Requirement 
In order to implement a large number of cochlea sections on a single chip, it is 
desirable to minimise the area requirement of the individual sections as much as possible. 
Some area minimisation can be achieved by removing redundant transistors which result 
when the outputs of two OTAs are tied together [45]. Furthermore, large transistor 
dimensions should only be used when matching is critical (bias current transistors), while 
other transistors should be kept small (differential pair). 
Additional area savings can be made in the differentiation and rectification processes carried out by the IHC, since the current flowing through the second capacitor in the 2nd order LPF section (Figure 2-2) is proportional to the derivative of the output voltage. Therefore, a half-wave rectified derivative of the output voltage can readily be obtained as a current-mode signal using a single mirror transistor. Since the magnitude of the derivative is proportional to frequency, the mirror ratio would have to be scaled using an additional resistive bias line [45] in order to achieve an equal peak current at each tap. An alternative approach, which computes the normalised derivative without the need for an additional bias line, is to use an additional OTA to compute the difference between the non-inverting input of the 2nd OTA in the forward path and its output [66].
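The normalisation achieved by the approach of [66] can be seen from the node equation of the second capacitor, C dV2/dt = gm2 (V1 − V2), where V1 is the non-inverting input of the 2nd forward OTA and V2 its output. This assumes the classical follower-integrator connection of Figure 2-2. Rearranging gives V1 − V2 = (C/gm2) dV2/dt = (1/ωn) dV2/dt when gm2 = gm and ωn = gm/C, so the difference signal is the derivative already divided by the section's own natural frequency. The Python sketch below checks this numerically for taps at 100 Hz and 10 kHz, each driven at its own ωn with the same amplitude; the amplitude, capacitance and sample count are arbitrary illustrative values.

```python
import math


def peak_rectified(samples):
    """Peak of the half-wave rectified signal (negative values clipped to 0)."""
    return max(max(s, 0.0) for s in samples)


def tap_signals(omega_n, amplitude=0.01, cap=1e-12, n=1000):
    """Peak rectified capacitor current and peak normalised difference signal
    over one cycle, for a tap whose output is V2 = A*sin(omega_n*t).

    Capacitor current i = C*dV2/dt grows with omega_n; the normalised
    signal V1 - V2 = (dV2/dt) / omega_n does not.
    """
    period = 2.0 * math.pi / omega_n
    cap_current, normalised = [], []
    for k in range(n):
        t = k * period / n
        dv_dt = amplitude * omega_n * math.cos(omega_n * t)
        cap_current.append(cap * dv_dt)
        normalised.append(dv_dt / omega_n)
    return peak_rectified(cap_current), peak_rectified(normalised)


low = tap_signals(2 * math.pi * 100.0)
high = tap_signals(2 * math.pi * 1e4)
print("capacitor current peak ratio:", high[0] / low[0])   # grows with frequency
print("normalised peaks:", low[1], high[1])                # equal at both taps
```

The capacitor-current peaks differ by the frequency ratio (100 here), which is what the scaled mirror ratio of [45] must cancel, whereas the difference signal has the same peak at both taps.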
If the back gate is used as the input terminal of the OTAs, its capacitance to ground is sufficient to implement the Gm-C filters without extra explicit capacitors. This further reduces the area requirement [69].
2.4 Application of the Silicon Cochlea to Sound Localisation 
The main applications of the silicon cochlea fall into four categories: cochlear implants [22], neural modelling [64], pitch and speech recognition [19], [39], [62], [78]-[82] and sound localisation [19], [21], [36], [79]. In [20], [82], the analogue cochlea model is used as part of a continuous-time linear predictive code estimator, which provides a good compromise between time and frequency resolution and, compared to digital linear predictive code estimators, is capable of real-time processing at low power consumption. In localisation applications, the cochlea has been used to split the input signal into several frequency components prior to the extraction of the various localisation cues.
2.4.1 Localisation Theory 
Sound localisation in the auditory system relies on three main cues [83]: the spectral cue, and the interaural intensity difference (IID) and interaural time difference (ITD) between the left and right channels. The IID is sometimes also referred to as the acoustic head-shadow effect, while the ITD cues are sometimes referred to as the interaural time disparity [38]. The IID cue is computed as the ratio of the signal energy between the right and left ears, while the ITD cues are computed using cross-correlation. Since both the IID and ITD cues are a function of frequency, this process has to be carried out for each frequency component; the prior spectral analysis of the input sound signal is carried out by the cochlea. The dependence of the IID and ITD cues on the actual location of the sound source is a result of the HRTF. The HRTF obviously varies from individual to individual; however, results obtained in [84] indicate that the IID and ITD cues are still useful for localising sound synthesised using non-individualised HRTF data.
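The two interaural cue computations just described can be sketched for a single frequency band. The Python example below is a discrete-time illustration under assumed conditions (sampled signals for one band, ITD searched over integer sample lags); it is not a model of the analogue hardware, and the function names are hypothetical.

```python
import math


def band_energy(x):
    """Signal energy of one band (sum of squared samples)."""
    return sum(v * v for v in x)


def iid_db(left, right):
    """IID as the right-to-left energy ratio, expressed in dB."""
    return 10.0 * math.log10(band_energy(right) / band_energy(left))


def itd_samples(left, right, max_lag):
    """Lag (in samples) maximising the cross-correlation
    sum_n L[n] * R[n + lag]; a positive value means the right channel
    lags the left."""
    n = len(left)
    best_lag, best_val = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                acc += left[i] * right[j]
        if acc > best_val:
            best_lag, best_val = lag, acc
    return best_lag
```

For example, if the right-channel band signal is the left one delayed by 7 samples and halved in amplitude, `itd_samples` returns 7 and `iid_db` returns approximately −6 dB. In the full scheme this pair of computations is repeated for every cochlear channel.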
The use of IID and ITD cues together is known as the duplex theory of localisation. The spectral cue, which is the result of the HRTF (mainly due to pinna interaction), is responsible for determining elevation and distinguishing between front and back locations. Although it is possible to achieve accurate 2-D localisation using only IID and ITD cues [83], combining the spectral cue with the IID and ITD cues should improve accuracy.
The biological system is exceptionally good at ignoring cues resulting from 
echoes, thus avoiding localisation ambiguities. This is the result of what is known as the 
precedence effect [35], [85], [86], where sound localisation is determined by interaural 
cues associated with the earlier-arriving direct sound, while later-arriving reflections are 
neglected. 
2.4.2 VLSI Implementation of Localisation Systems 
In order to split the input waveform into separate frequency bands, a silicon 
cochlea (or a variation of it) can be used. Hardware implementations of sound localisation 
models to date have been based on ITD cues and developed for azimuth determination of 
the sound source. The cross-correlation circuit is usually based on discrete delay elements, 
although a continuous-time delay computation circuit has been recently developed for this 
purpose [87]. 
In [19], the left and right cochleae are based on the 2nd order LPF cascade model of [38], as shown in Figure 2-8. Each cochlea output drives an IHC circuit which performs differentiation (via a hysteretic differentiator) and half-wave rectification. The IHC output drives an SGN circuit based on Mead's self-resetting axon. The output pulse rate of the SGN circuit depends on the IHC output current, while the pulse width is fixed by a constant bias voltage. The SGN of each channel of each cochlea feeds a tapped pulse delay line formed by a cascade of delay elements. Each tapped output on the delay line connected to the left cochlea is multiplied by the corresponding tapped output on the delay line connected to the right cochlea for the same frequency channel. The resulting correlation outputs are summed across all frequency channels and temporally integrated. These outputs feed a WTA network which selects the line with maximum correlation and inhibits all the others.
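The SGN stage behaves as a pulse frequency modulator: the IHC current charges a node until a threshold is reached, a fixed-width pulse is emitted and the node is reset. The Python sketch below is a behavioural integrate-and-fire illustration of that idea, with an optional refractory period like the one added in [36]. It is not a transistor-level model of Mead's self-resetting axon, and the capacitance, threshold and timing values are hypothetical.

```python
def pfm_pulses(currents, dt, cap, v_thresh, pulse_width, refractory=0.0):
    """Integrate-and-fire pulse generator; returns the pulse start times.

    The (half-wave rectified) input current charges a capacitor; when its
    voltage reaches v_thresh a pulse of fixed width is emitted and the
    capacitor is reset.  No charge is accumulated during the pulse or
    during the optional refractory period that follows it.
    """
    v = 0.0
    dead_until = 0.0
    pulses = []
    for k, i_in in enumerate(currents):
        t = k * dt
        if t < dead_until:
            continue
        v += max(i_in, 0.0) * dt / cap
        if v >= v_thresh:
            pulses.append(t)
            v = 0.0
            dead_until = t + pulse_width + refractory
    return pulses


# 0.1 s of constant input at two levels: a larger IHC current gives a higher pulse rate
for i_in in (1e-9, 2e-9):
    p = pfm_pulses([i_in] * 10000, 1e-5, 1e-12, 1.0, 1e-4)
    print(f"{i_in:.0e} A -> {len(p)} pulses in 0.1 s")
```

The pulse rate rises with input current while the pulse width stays fixed, which is the property the delay-line correlator relies on; a non-zero `refractory` caps the maximum rate.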
[Figure 2-8 block diagram: left and right cascades of 2nd order sections, pulse delay lines, winner-take-all network and TDM scanner output]
Figure 2-8: Block diagram of the 1-D sound localiser chip used in [19]: the left and right cochleae consist of a cascade of 2nd order low-pass filters. Each cochlea tap drives an IHC circuit followed by a pulse generator (PG) which emulates the function of the spiral ganglion neurones in the biological system. The outputs from each pair of corresponding left and right PGs are cross-correlated using the pulse delay line (Δ) and coincidence detectors (C). The WTA network determines the maximum correlation output. A time-division multiplexed (TDM) scanner is used to monitor the WTA outputs.
The hardware localisation system in [36] is similar to that in [19] in its general architecture, but differs in some implementation details. The system consists of two separate chips: a cochlea chip and a correlator chip. The cochlea chip is again based on the cascade LPF approach, but in this case a special filter section is used which allows the frequency resolution to be increased without excessively increasing the time delay. Similarly, the differentiator used in the IHC circuit differs from the hysteretic differentiator used in [19]: at low frequencies, the IHC pumps current into the SGN for approximately half of every input cycle, while at high frequencies current is pumped in for a much smaller portion of the cycle; at high frequencies, an increase in the amplitude of an input sinusoid causes a sudden but temporary increase in the current pumped into the auditory neurone circuit. The SGN circuit is again based on Mead's self-resetting axon, but also includes refractory period control. The correlator chip is similar to that of [19], but in this case parallel delay lines are used in order to minimise the accumulation of errors. The multiplier is based on a simple NAND gate circuit with an additional bias transistor to control the output current. Each multiplier output is summed with the corresponding outputs of all the other frequency channels and integrated using a leaky integrator before being applied to the output driver. No WTA circuit is used in this case.
The localiser chip in [21] deviates in some aspects from the true biological system in order to achieve better localisation accuracy. The frequency components of the input signal are determined using a parallel bank of BPFs instead of a cascade structure: this is not an accurate cochlea model, but it improves localisation accuracy by removing the accumulation of errors inherent in the cascade structure. The IHC-SGN circuit half-wave rectifies the input, low-pass filters the result and generates a pulse at the peaks of the resultant waveform, with a pulse width that is a monotonic function of the amplitude of the low-pass filter output. Hence PWM is used in this case instead of PFM. Parallel delay lines are used, with the delay elements modified to implement a form of the precedence effect: the delay-correlation elements contain the necessary logic to implement this effect. A delay-correlation element only generates a high correlation output if the signal it has to delay is high, its output is low, and the undelayed signal from the other side is high. Once a high output is generated, the delay-correlation element prevents the pulse from propagating further down the line. The output of the delay-correlation element remains high for a tuneable refractory period in order to make the system insensitive to reflections. This arrangement has a design flaw: although downstream pulses are inhibited, upstream sections can still operate and respond to reflections, which causes the peak correlation to move towards the centre of the array.
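The delay-correlation logic of [21], and the flaw just described, can be reproduced with a small clocked model. The Python sketch below is based only on the description above and assumes the following: pulses advance one element per time step along a single line, the other channel's undelayed signal is seen by every element, and an element whose output is still high blocks any pulse reaching it. The actual circuit in [21] may differ, and the function name is hypothetical.

```python
def precedence_line(delayed_in, other_side, n_elems, refractory):
    """Behavioural sketch of a delay line with precedence-effect logic.

    delayed_in: 0/1 samples entering element 0, advancing one element per step
    other_side: 0/1 undelayed samples from the opposite channel (all elements)
    Returns the number of correlation events produced by each element.
    """
    pulse = [0] * n_elems        # pulse currently held in each delay element
    high_until = [-1] * n_elems  # last time step at which each output is high
    events = [0] * n_elems
    for t in range(len(delayed_in)):
        pulse = [delayed_in[t]] + pulse[:-1]       # advance pulses one element
        for k in range(n_elems):
            if not pulse[k]:
                continue
            if t <= high_until[k]:                 # output still high: block pulse
                pulse[k] = 0
            elif other_side[t]:                    # input high, output low, other side high
                events[k] += 1
                high_until[k] = t + refractory     # hold output high (refractory period)
                pulse[k] = 0                       # stop further propagation
    return events


d = [0] * 60
o = [0] * 60
d[0], o[5] = 1, 1     # direct sound: 5-step interaural delay
d[20], o[22] = 1, 1   # reflection with a 2-step delay, upstream of element 5
d[30], o[35] = 1, 1   # repeat of the direct delay inside the refractory period
print(precedence_line(d, o, 10, 50))
```

The direct sound fires element 5 and the repeat inside the refractory period is suppressed, but the reflection still fires the upstream element 2, shifting correlation towards the start of the line, which is the behaviour criticised above.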
2.5 Enhancements in VLSI Implementations of Sound Localisation 
Although software models of sound localisation are well developed, dedicated VLSI implementations are needed to achieve real-time processing, especially in the front-end section that calculates the localisation cues. Preferably, this hardware should operate at low power, for use in mobile battery-operated applications, and should have a wide dynamic range. The capability of operating at low supply voltages is essential because of the reduced breakdown voltages of modern CMOS technologies arising from their reduced linewidths. Existing implementations use only one localisation cue, the interaural time difference, and suffer mainly from inaccuracy due to weak inversion operation of the MOS transistors. Efficient VLSI implementation of all three localisation cues is, however, required in order to realise real-time 2-D localisation and to further increase the accuracy of sound localisation in the horizontal direction.
2.6 Conclusions 
The literature review presented here has shown that the silicon cochlea offers a promising approach to the VLSI implementation of sound localisation. As part of a sound localisation front-end section, it splits the sound signal into different frequency bands prior to the computation of the localisation cues and allows low-power analogue filtering to be carried out in real time. A sound localisation front-end section should contain dedicated AGC blocks to increase the dynamic range, as well as circuitry for onset detection to implement the precedence effect. The AGC should be broadly coupled both between different frequency bands and across the left and right channels, such that the spectral and IID cues are not diminished. Absolute accuracy of the cut-off frequencies of the cochlear filters may not be necessary; however, they should be free of temperature drift, such that the system can be calibrated. The inaccuracies can then be compensated in the back-end section of the system, which maps the localisation cues into azimuth and elevation.
Chapter 3. Cue-to-Position Mapping Algorithm for 2-D Spatial Localisation 
Chapter 3 
Cue-to-Position Mapping Algorithm for 2-D 
Spatial Localisation 
3.0 Introduction 
Localisation of a sound source entails the evaluation of a number of cues, which are then used by a suitable cue-to-position mapping algorithm to extract the source azimuth and elevation. Hardware evaluation of the cues enables localisation to be carried out in real time. In this chapter, a novel cue-to-position mapping algorithm for 2-D spatial localisation is developed. Its accuracy and robustness are investigated via simulation, taking into consideration the effect of environmental non-idealities and hardware-induced inaccuracies. A generalisation of the algorithm that can be adapted to different head-related transfer function data is also considered.
A novel 3-step cue-to-position mapping algorithm developed here for 2-D sound 
source localisation uses the interaural and monaural cues found in the biological auditory 
system. A high level model of a front-end subsystem, suitable for implementation in 
analogue hardware, is used to extract the required cues. The search algorithm developed 
is then used to map the obtained cues into the azimuth and elevation angles of the source. 
The accuracy of the search algorithm is determined via simulation taking into 
consideration the effects of noise and echo interference. For comparison purposes, the 
simulations are repeated using a reported single-step cue-to-position mapping 
algorithm [88]. It is shown that significantly improved results are obtained with the new 
search method. 
Since the localisation cues can be generated in hardware, the accuracy and robustness of the algorithm are also investigated with reference to hardware-induced non-idealities such as crosstalk and variations in centre frequency and filter quality factor Q. The 3-step algorithm developed here is then modified to allow adaptation to different HRTF data sets, and the simulations described above are repeated. Comparable results are again obtained for this more generic algorithm.
Spatial localisation of a sound source can be carried out using several techniques. In the human auditory system, the localisation cues arise from the manipulation of the source signal by the HRTF, which is a function of the source location. Although in this work the HRTF is used to generate the test stimuli for the system, it should be possible to use the system in applications having two sensors mounted on a spherical structure, with the pinna modelled using artificial reflecting surfaces as in [89].
Localisation cues can be classified into spectral (or monaural) and interaural (or binaural) cues. Monaural cues are derived from certain characteristics in the spectral response of the received audio waveform which are due to the HRTF. Interaural cues are derived from differences between the left (L) and right (R) channel waveforms and can be further divided into interaural intensity (level) difference (IID or ILD) and interaural time delay (ITD) cues. In the biological system, IID cues may be important at frequencies above 1 kHz [90], although there are few, if any, data that actually suggest that IID cues are important in the biological system: in fact, it has been shown that an overall 10 dB interaural level imbalance has no effect on localisation performance [91]. ITD cues can be further divided into interaural phase delays (IPDs) and interaural envelope delays (IEDs): both IPD and IED cues result from approximately the same overall time shift of the whole waveform, and hence their values are approximately equal. However, a distinction is made between IED and IPD cues since, at higher frequencies, a correlation algorithm for determining IPD cues may give ambiguous results. In fact, in the biological system, IED cues play an important role for frequencies above 1.5 kHz, while at lower frequencies IPD cues are more important [85]. ITD cues are particularly useful for lateral (1-D) localisation, since the delay between the L and R signals increases monotonically as the azimuth angle of the sound source, measured from the median plane, increases up to 90° [92]. IID cues follow similar patterns [93], yet 2-D localisation is possible when they are combined with ITD cues [94]. Monaural spectral cues are affected by both the lateral and vertical position of the sound source; however, in the lateral direction interaural cues give more salient information, while spectral cues are important because they are highly dependent on the elevation of the sound source [95]. Nevertheless, it is possible to perform lateral localisation in the absence of interaural cues [96].
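The IPD ambiguity mentioned above arises because phase is only defined modulo one cycle: once the period of a frequency component becomes comparable with the range of possible interaural delays, different ITDs produce the same IPD. The Python sketch below assumes a pure-tone component and a maximum ITD of about 0.7 ms, an approximate figure for a human-sized head that is not taken from this work. It shows two ITDs that are indistinguishable at 1 kHz and the frequency below which every ITD in ±0.7 ms maps to a unique IPD.

```python
def ipd_cycles(itd, freq):
    """Interaural phase difference of a pure tone, in cycles, wrapped to [-0.5, 0.5)."""
    phase = itd * freq
    return (phase + 0.5) % 1.0 - 0.5


def max_unambiguous_freq(max_itd):
    """Highest frequency for which every |ITD| < max_itd gives a distinct IPD
    within (-pi, pi), i.e. 2*pi*f*max_itd < pi."""
    return 1.0 / (2.0 * max_itd)


MAX_ITD = 0.7e-3   # approximate maximum ITD for a human-sized head (assumed)

print(ipd_cycles(0.3e-3, 1000.0), ipd_cycles(-0.7e-3, 1000.0))  # same IPD
print(ipd_cycles(0.3e-3, 500.0), ipd_cycles(-0.7e-3, 500.0))    # distinct IPDs
print(f"unambiguous below about {max_unambiguous_freq(MAX_ITD):.0f} Hz")
```

Envelope delays do not suffer from this wrap-around, since the envelope varies much more slowly than the carrier, which is why the IED/IPD distinction is drawn above.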
A number of software-based sound localisation models have already been developed that use only monaural cues [95] or interaural cues [88], [97], [98] for 2-D localisation. In [95], only monaural cues were used, and the accuracy depended mainly on the spectrum of the source: in this case, under zero-noise conditions, localisation errors varied from 0.3° to 38°, depending on the type of input and on the spectral cue being used. A single-step search method using only interaural cues was developed in [88] to determine the most likely source position; the reported localisation errors were mainly due to the discretisation of the spatial likelihood map, and no attempt was made to systematically model the effect of environmental and hardware non-idealities. Interaural cues have also been used in [97], where the source position was confined to a plane and cue-to-position mapping was carried out using a neural network (NN); localisation errors of approximately 6° were obtained for a single S/N ratio of 40 dB. An NN approach was also adopted in [98], where monaural and IID cues were used: in this case, the localisation error due to NN interpolation errors varied between 6° and 20°, depending on whether monaural or binaural training was used. Some 1-D localisation systems based on analogue hardware have also been proposed [21], [36], [99]; however, these systems only generated interaural time delay cues, and no cue-to-position mapping was carried out.
Software-based generation of cues from the received signals is time-consuming because of the computationally intensive processing involved: filtering, envelope detection, correlation, integration and division. For real-time 2-D localisation of a sound source, an alternative approach is to extract the interaural and monaural cues using dedicated analogue hardware and then process these cues with an appropriate cue-to-position mapping algorithm to determine the azimuth and elevation angles.
The coordinate convention used throughout this work is briefly introduced in section 3.1. The processing blocks used for generating a binaural sound source conveying localisation information, together with various non-idealities, are discussed in section 3.2. Section 3.3 introduces the building blocks required for the extraction of both monaural and binaural localisation cues. The extraction of discrete-space cue templates and the interpolation used to obtain continuous-space cue values are discussed in section 3.4. Section 3.5 discusses the basis of the developed cue-to-position mapping algorithm, together with the details of the algorithm itself. Simulation results under different environmental and hardware non-idealities are presented in section 3.6. A generalised 3-step algorithm, which can be adapted to different HRTF data sets, is then described in section 3.7.
3.1 Coordinate System Used 
Figure 3-1 shows the coordinate convention used throughout this work. Azimuth angles are represented by θ and are measured relative to the median plane, with points on the right-hand side taken as positive. Elevation angles are represented by φ and are measured relative to the horizontal plane, with points above it taken as positive.
[Figure 3-1 diagram: median plane (θ = 0° or θ = 180°); left sensor (θ = −90°); right sensor (θ = 90°); horizontal plane (φ = 0°)]
Figure 3-1: Coordinate system used: azimuth angles θ are measured from the median plane (the vertical plane passing midway between the two sensors). Elevation angles φ are measured from the horizontal plane passing through the two sensors. θ is taken as positive for locations to the right of the median plane, while φ is taken as positive for locations above the horizontal plane passing through the two sensors.
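When plotting source estimates or placing artificial sources, it can help to turn (θ, φ) into a direction vector. The Python sketch below assumes a Cartesian frame chosen here only for illustration: x points towards the right sensor, y straight ahead in the median plane (θ = 0°) and z upwards. The sign conventions follow Figure 3-1, and the function name is hypothetical.

```python
import math


def to_cartesian(azimuth_deg, elevation_deg):
    """Unit direction vector for (theta, phi) in the convention of Figure 3-1.

    Assumed axes: x towards the right sensor, y straight ahead (theta = 0),
    z upwards.  theta > 0 to the right of the median plane, phi > 0 above
    the horizontal plane through the sensors.
    """
    th = math.radians(azimuth_deg)
    ph = math.radians(elevation_deg)
    return (math.cos(ph) * math.sin(th),
            math.cos(ph) * math.cos(th),
            math.sin(ph))


print(to_cartesian(90.0, 0.0))    # right sensor direction, about (1, 0, 0)
print(to_cartesian(180.0, 0.0))   # directly behind, about (0, -1, 0)
```

Note that θ = 0° and θ = 180° both lie in the median plane, which is where ITD and IID cues alone cannot separate front from back.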
3.2 System Model Overview 
3.2.1 Block Diagram 
The framework used for simulation is divided into three sections: the stimulus 
generator, a front-end section and a back-end section. A block diagram of this system, 
which has been modelled using MATLAB, is shown in Figure 3-2 and Figure 3-3. The 
stimulus generator (Figure 3-2) generates the L and R signals given the source signal and 
the location of the source. 
[Figure 3-2 block diagram: sound source → left and right HRTF (θ, φ); echo generation model → left and right HRTF (θ′, φ′); cross-talk; correlated noise source and uncorrelated noise sources → L-signal and R-signal]
Figure 3-2: Synthetic stimulus generation: the sound source signal is convolved with the HRTF in order to generate the L and R channel signals. The echo signal is generated by convolution of the sound source with an exponentially decaying pulse train, and is then convolved with the HRTF pertaining to the assumed echo location (θ′, φ′). The generator also allows for inter-channel cross-talk and the addition of correlated and uncorrelated noise.
The front-end section (Figure 3-3 (a)) determines localisation cues from these 
signals, while the back-end section (Figure 3-3 (b)) maps these resulting cues into an 
estimate of the source location. 
[Figure 3-3 block diagrams. (a) Front-end: L-input and R-input → BPF banks → envelope detection → cross-correlation (IEDs); interaural spectral cue generator (IIDs); cross-correlation (IPDs); left and right monaural spectral cue generators (spectral cues); onset detection. (b) Back-end: computed cues and cue template → position estimation algorithm → source position (θ, φ).]
Figure 3-3: (a) Proposed front-end consisting of onset detection, BPF bank and other 
blocks necessary for the generation of ITD, IID and monaural cues. (b) Position 
estimation is carried out using the computed cues together with a cue template. 
3.2.2 Stimulus Generation 
In the biological system, the sound reaching each ear is shaped by the HRTF. Thus, in this model, the left and right test signals are generated by convolving a common source signal with the HRTF, which is a function of position. Existing HRTF data have been used for this purpose [100] (http://xenia.media.mit.edu/~kdm/hrtf.html). In addition, the stimulus generator allows for the following environmental and hardware non-idealities:
(a) correlated Gaussian noise on both L and R channels 
(b) uncorrelated Gaussian noise on both L and R channels 
(c) cross-talk between L and R channels 
(d) echo interference 
Any signal arriving at both receiving microphones and not originating from the intended sound source location may be regarded as correlated "noise". In analogue hardware, typical forms of correlated "noise" are mains pickup, noise injected from the power supply rails and noise coupled from digital control circuitry. Uncorrelated Gaussian noise is present in all hardware and is generated by the microphones themselves and by the electronic devices forming the rest of the hardware. Some degree of cross-talk between the L and R channels is also to be expected in hardware implementations, and is usually due to capacitive coupling. The cross-talk can be particularly high in systems where some blocks are multiplexed between the left and right channels in order to reduce chip area.
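The stimulus generator of Figure 3-2 can be sketched as follows. This is a minimal Python illustration, not the MATLAB model used in this work: the function names, the use of impulse responses (HRIRs) in place of the HRTF data, the pulse spacing of the echo pulse train, the symmetric cross-talk leakage and the common reference level used for the S/N ratios are all assumptions of the sketch.

```python
import numpy as np


def rms(x):
    """Root-mean-square value of a signal."""
    return float(np.sqrt(np.mean(np.square(x))))


def decaying_pulse_train(fs, td, afe, tau, spacing, duration):
    """Hypothetical echo impulse response: pulses every `spacing` s starting at the
    echo delay td, with heights afe * exp(-(t - td) / tau). The spacing is assumed."""
    h = np.zeros(int(round(duration * fs)))
    times = np.arange(td, duration, spacing)
    idx = np.round(times * fs).astype(int)
    keep = idx < len(h)
    h[idx[keep]] = afe * np.exp(-(times[keep] - td) / tau)
    return h


def _pad(x, n):
    """Zero-pad x at the end to length n."""
    return np.pad(x, (0, n - len(x)))


def generate_stimulus(src, hrir_l, hrir_r, echo_ir=None, echo_hrir_l=None,
                      echo_hrir_r=None, sxr_db=None, snr_corr_db=None,
                      snr_unc_db=None, seed=0):
    """Sketch of a Figure 3-2 style generator; None disables a non-ideality.

    hrir_l, hrir_r          : impulse responses for the source direction (theta, phi)
    echo_hrir_l, echo_hrir_r: impulse responses for the echo direction (theta', phi')
    sxr_db                  : signal/cross-talk ratio in dB
    snr_corr_db, snr_unc_db : S/N in dB of common and of independent Gaussian noise
    """
    rng = np.random.default_rng(seed)
    left, right = np.convolve(src, hrir_l), np.convolve(src, hrir_r)
    n = max(len(left), len(right))
    left, right = _pad(left, n), _pad(right, n)
    if echo_ir is not None:
        echo = np.convolve(src, echo_ir)
        el, er = np.convolve(echo, echo_hrir_l), np.convolve(echo, echo_hrir_r)
        n = max(len(left), len(el), len(er))
        left, right = _pad(left, n) + _pad(el, n), _pad(right, n) + _pad(er, n)
    if sxr_db is not None:
        g = 10.0 ** (-sxr_db / 20.0)
        left, right = left + g * right, right + g * left   # assumed symmetric leakage
    ref = np.sqrt(0.5 * (rms(left) ** 2 + rms(right) ** 2))  # assumed reference level
    if snr_corr_db is not None:
        common = ref * 10.0 ** (-snr_corr_db / 20.0) * rng.standard_normal(len(left))
        left, right = left + common, right + common          # identical waveform
    if snr_unc_db is not None:
        s = ref * 10.0 ** (-snr_unc_db / 20.0)
        left = left + s * rng.standard_normal(len(left))     # independent waveforms
        right = right + s * rng.standard_normal(len(right))
    return left, right
```

Correlated noise is added as one waveform to both channels, while uncorrelated noise uses separate draws, which mirrors the distinction between non-idealities (a) and (b) above.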
3.3 Cue Extraction Analogue Blocks 
The front-end section takes the two channels (L and R) as inputs, each feeding a separate filter bank. From the filter-section outputs, the IPD, IED and IID cues, together with first-order and second-order monaural spectral cues, are extracted. For simulation purposes, the inputs are sampled at a frequency fs of 44.1 kHz, which is also the frequency at which the HRTF data were sampled.
3.3.1 The Filter-bank 
Each filter bank consists of 24 filter sections. Each filter section is a cascade of two second-order bandpass filters (BPFs), each having the transfer function:

$$H_n(s) = \frac{(\omega_n/Q)\,s}{s^2 + (\omega_n/Q)\,s + \omega_n^2} \qquad (3.1)$$

where Q is set to 10 and $\omega_n = 2\pi f_1 k^{n-1}$, n being the filter number (n = 1 to 24). In this case $f_1$ and k are set to 80 Hz and 1.265, respectively, resulting in a frequency range of 80 Hz to 18 kHz. From an analogue implementation point of view, a parallel filter bank would be preferred to a cascade version, since offset voltage and noise do not accumulate.
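As a quick check of the filter-bank parameters, the following Python sketch evaluates the centre frequencies $f_1 k^{n-1}$ and the response of Eq. (3.1) at s = j2πf. It assumes that the two cascaded BPFs of a section are identical, as the text implies; the names are illustrative only.

```python
import math

F1_HZ, K_RATIO, Q_NOMINAL, N_FILTERS = 80.0, 1.265, 10.0, 24   # Section 3.3.1 values


def centre_frequency(n, f1=F1_HZ, k=K_RATIO):
    """Centre frequency (Hz) of filter n (1..24): omega_n / (2*pi) = f1 * k**(n - 1)."""
    return f1 * k ** (n - 1)


def bpf(f, fc, q=Q_NOMINAL):
    """Complex response of Eq. (3.1) at s = j*2*pi*f for a BPF centred at fc."""
    s = 2j * math.pi * f
    wn = 2.0 * math.pi * fc
    return (wn / q) * s / (s * s + (wn / q) * s + wn * wn)


def section(f, fc, q=Q_NOMINAL):
    """One filter section: a cascade of two identical second-order BPFs."""
    return bpf(f, fc, q) ** 2


centres = [centre_frequency(n) for n in range(1, N_FILTERS + 1)]
for n in (1, 9, 10, 24):
    fc = centres[n - 1]
    print(f"n={n:2d}  fc={fc:8.1f} Hz  |H| at fc = {abs(section(fc, fc)):.3f}")
```

The printed centre frequencies are approximately 80 Hz, 525 Hz, 664 Hz and 17.8 kHz, matching the band edges quoted for the IPD and IED cues, and each section has unity gain at its own centre frequency.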
3.3.2 Interaural Phase Delay Computation 
The interaural phase delay (IPD) is calculated using cross-correlation:

$$r(\tau_d) = \begin{cases} \displaystyle\int_0^T L(t)\,R(t+\tau_d)\,dt, & -\tau_{max} \le \tau_d \le 0 \\[6pt] \displaystyle\int_0^T L(t-\tau_d)\,R(t)\,dt, & 0 < \tau_d \le \tau_{max} \end{cases} \qquad (3.2)$$

where T is the integration interval, which is set to 100 ms in this case, and $\tau_{max}$ is the maximum delay to be considered, which is set to 1 ms [101]. It is known that humans show little improvement in localisation accuracy for increases in stimulus duration beyond 10 ms [102]. The 100 ms value is large compared to this; however, it is only 8 times longer than the period of the lowest filter centre frequency (12.5 ms at 80 Hz) and is therefore reasonable from the correlation point of view. Note also that the duration of the input stimulus itself may be significantly shorter than the integration interval. The choice of the integration interval T is a compromise between localisation accuracy and system response speed: increasing T results in more accurate cue estimates, especially for the low-frequency bands, and thus improves localisation accuracy at the expense of a longer response time.

The time delay $C_{IPD}$ is computed as the value of $\tau_d$ for which $r(\tau_d)$ is maximum. If the input signals L(t) and R(t) are sinusoids given by $\alpha\sin(\omega t)$ and $\beta\sin[\omega(t+T_d)]$, then for sufficiently large values of the integration interval T, $r(\tau_d) \approx \tfrac{1}{2}\alpha\beta T\cos[\omega(T_d+\tau_d)]$. It should therefore be noted that the correlation algorithm breaks down when $|\omega(T_d+\tau_d)| \ge 2\pi$, that is, when the input frequency exceeds $1/(2\tau_{max})$ (500 Hz for $\tau_{max}$ = 1 ms), which corresponds to n ≥ 9 (a centre frequency of 525 Hz). In this work, IPDs are computed for filters with n = 1 to 9, since the delay $T_d$ never exceeded 0.8 ms for the particular HRTF data set used. During the evaluation of the IPD cues, $\tau_d$ values are taken as integer multiples of the sampling period. Hence the resulting IPD values are discrete, lie in the range [−44/fs, 44/fs] and correspond to multiples of the sampling period. In order to facilitate hardware implementation, no interpolation is attempted.
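A discrete-time version of Eq. (3.2) and of the peak search can be sketched in plain Python as follows. It is an illustrative sketch rather than the model used here: the integral is replaced by a sum over the overlapping samples, lags are restricted to whole samples within ±τmax (44 samples at 44.1 kHz) and, as in the text, no interpolation is attempted. The same routine applied to envelope signals would give the IED cue of Section 3.3.5.

```python
import math
import random


def xcorr(left, right, lag):
    """Discrete form of Eq. (3.2): sum of L(t) R(t + lag) over the overlapping samples."""
    n = len(left)
    if lag <= 0:
        return sum(left[t] * right[t + lag] for t in range(-lag, n))
    return sum(left[t - lag] * right[t] for t in range(lag, n))


def ipd_cue(left, right, fs=44100.0, tau_max=1e-3):
    """Lag (in s) maximising r(tau_d), searched over integer sample lags with
    |tau_d| <= tau_max; ties resolve to the most negative lag."""
    max_lag = int(tau_max * fs)
    best = max(range(-max_lag, max_lag + 1), key=lambda lag: xcorr(left, right, lag))
    return best / fs


rng = random.Random(0)
base = [rng.gauss(0.0, 1.0) for _ in range(2100)]
left = base[50:2050]
right = base[40:2040]          # right[t] = left[t - 10]: R lags L by 10 samples
print(ipd_cue(left, right) * 44100.0)
```

With this sign convention a positive cue means that the R channel lags the L channel; swapping the inputs flips the sign.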
3.3.3 Envelope Detection 
Envelope detection is carried out using a full-wave rectifier followed by a first-order low-pass filter whose corner frequency is set to 1/5 of the corresponding BPF centre frequency: this ratio gives a good compromise between settling time and the ripple present at the output.
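The envelope detector can be sketched as a full-wave rectifier followed by a one-pole digital low-pass filter. The impulse-invariant discretisation of the first-order RC filter used below is an assumption of the sketch. For a unit-amplitude tone, the settled envelope averages 2/π (the mean of |sin|), with a small ripple at twice the tone frequency.

```python
import math


def envelope(x, fc, fs=44100.0):
    """Full-wave rectification followed by a first-order low-pass with corner fc/5.
    The RC low-pass is discretised as y[n] = a*y[n-1] + (1 - a)*|x[n]| with
    a = exp(-2*pi*(fc/5)/fs); this discretisation is an assumption of the sketch."""
    a = math.exp(-2.0 * math.pi * (fc / 5.0) / fs)
    y, out = 0.0, []
    for v in x:
        y = a * y + (1.0 - a) * abs(v)   # rectify, then one-pole smoothing (unity DC gain)
        out.append(y)
    return out


fs, fc = 44100.0, 1000.0
x = [math.sin(2 * math.pi * fc * n / fs) for n in range(2205)]   # 50 ms tone
tail = envelope(x, fc, fs)[-441:]                                # last 10 periods
print(sum(tail) / len(tail), max(tail) - min(tail))
```

Raising the corner frequency would shorten the settling time but increase this ripple, which is the compromise mentioned above.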
3.3.4 Interaural Intensity Difference Computation 
The interaural intensity difference is computed using the following equation:

$$C_{IID}(n) = 20\log_{10}\frac{\int_0^T R_{env}(n,t)\,dt}{\int_0^T L_{env}(n,t)\,dt} \qquad (3.3)$$

where $R_{env}$ and $L_{env}$ are the envelope signals of the R and L channels, respectively. An accurate estimate of the integrated envelope over the interval T is only possible for the higher frequency bands, since for the lower frequencies the period is comparable with the integration time: thus, in this model, IIDs were computed for n = 11 to 24.
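Eq. (3.3) reduces to a ratio of sums when the envelopes are uniformly sampled, since the dt factors cancel. The short Python sketch below assumes that the envelopes for each band are available as lists keyed by filter number (a hypothetical data layout) and computes IIDs only for n = 11 to 24, as in the text.

```python
import math


def iid_cue(r_env, l_env):
    """Eq. (3.3) for one band: 20*log10 of the ratio of the integrated R and L envelopes.
    With uniformly sampled envelopes the dt factors cancel, so plain sums are used."""
    return 20.0 * math.log10(sum(r_env) / sum(l_env))


def iid_cues(r_envs, l_envs, bands=range(11, 25)):
    """IIDs for the bands used in the text (n = 11 to 24); envelopes keyed by band."""
    return {n: iid_cue(r_envs[n], l_envs[n]) for n in bands}


print(iid_cue([2.0] * 100, [1.0] * 100))   # R envelope twice L: about +6.02 dB
```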
3.3.5 Interaural Envelope Delay Computation 
The interaural envelope delay (IED) is computed in the same manner as the IPD; 
however, the envelope signals are used in this case. The IEDs are here evaluated for 
n= 10 to 24, corresponding to centre frequencies 664 Hz to 18 kHz. 
3.3.6 Monaural Spectral Cues Computation 
Monaural spectral cues (MSC) are cues which are evaluated using the envelope 
information of one channel at a time. In the human auditory system, these cues arise due 
to the direction-dependent interaction of the pinna with the incoming acoustic wave. The 
pinna may be modelled as having multiple reflecting surfaces: a classical model is the one 
by Batteau [103] and is depicted in Figure 3-4. These multiple reflection paths cause 
constructive and destructive interference at different frequencies, and thus the spectral content of the composite signal exhibits peaks and troughs whose positions depend on the source azimuth and elevation.
[Figure 3-4 diagram: Batteau's pinna model, from Input to Output]
Figure 3-4: Batteau's model of the external ear. The delays $\tau_\theta$ and $\tau_\phi$ are functions of the azimuth and elevation, respectively, while $a_1$ and $a_2$ are reflection coefficients.
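The peaks and troughs produced by multiple reflection paths can be illustrated with a Batteau-style model made up of a direct path plus two delayed, scaled reflections. The unity direct-path gain and the parameter values below are assumptions chosen for illustration only; with a single reflection of equal strength and a delay of 100 µs, the first notch falls at 1/(2τ) = 5 kHz and the response peaks at 10 kHz.

```python
import cmath
import math


def batteau_gain(f, a1, tau1, a2, tau2):
    """|H(f)| for H = 1 + a1*exp(-j*w*tau1) + a2*exp(-j*w*tau2), w = 2*pi*f.
    Unity direct-path gain is an assumption of this sketch."""
    w = 2.0 * math.pi * f
    return abs(1.0 + a1 * cmath.exp(-1j * w * tau1) + a2 * cmath.exp(-1j * w * tau2))


for f in (2500.0, 5000.0, 10000.0):
    print(f, batteau_gain(f, 1.0, 100e-6, 0.0, 0.0))
```

Changing a delay with source direction moves the notch frequencies, which is the kind of direction-dependent spectral shaping that the monaural cues are intended to capture.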
Since it is not possible to distinguish between the spectral shaping introduced by 
the pinna and that which is inherent in the original signal source, these cues depend on 
both the HRTF and the source spectrum. Hence, in order to validate their use, some 
assumption about the source spectrum has to be made [95]. Nevertheless, these cues are 
essential in order to locate a sound source precisely in terms of both azimuth and 
elevation. They are also indispensable for determining the elevation of sound sources at 
zero azimuth angle, where the interaural cues are essentially all equal to zero. Two types 
of MSCs have been computed, called first order and second order MSCs. 
The first-order MSCs are analogous to the first derivative of the magnitude-frequency curve, and the values of the MSCs for the right channel are computed as:

$$C'_{MSCR}(n) = 20\log_{10}\frac{\int_0^T R_{env}(n+1,t)\,dt}{\int_0^T R_{env}(n,t)\,dt} \qquad (3.4)$$

For the left channel, the value of $C'_{MSCL}(n)$ is computed in a similar way. The first-order MSCs assume that the source spectrum is locally flat, and therefore the value of the MSC depends only on the HRTF.
The second-order MSCs are analogous to the second derivative of the magnitude-frequency curve and are computed as:

$$C''_{MSCR}(n) = C'_{MSCR}(n) - C'_{MSCR}(n-1) = 20\log_{10}\frac{\int_0^T R_{env}(n+1,t)\,dt\;\int_0^T R_{env}(n-1,t)\,dt}{\left(\int_0^T R_{env}(n,t)\,dt\right)^2} \qquad (3.5)$$

A similar equation is used for $C''_{MSCL}(n)$. Second-order MSCs assume only that the slope of the sound source spectrum is locally constant, and they are therefore more generic with regard to source spectral properties.
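The difference between the two assumptions can be checked numerically. In the Python sketch below, `e[n]` stands for the integrated envelope of band n (a hypothetical dict keyed by filter number, with made-up values). Multiplying every band by $g^n$ models a source whose spectrum has a constant slope in dB: this shifts every first-order MSC by $20\log_{10} g$ but leaves the second-order MSCs unchanged.

```python
import math


def first_order_msc(e):
    """Eq. (3.4): C'(n) = 20*log10(e[n+1] / e[n]); e maps filter number -> integrated envelope."""
    return {n: 20.0 * math.log10(e[n + 1] / e[n]) for n in e if n + 1 in e}


def second_order_msc(e):
    """Eq. (3.5): C''(n) = C'(n) - C'(n-1) = 20*log10(e[n+1] * e[n-1] / e[n]**2)."""
    return {n: 20.0 * math.log10(e[n + 1] * e[n - 1] / e[n] ** 2)
            for n in e if n - 1 in e and n + 1 in e}


# Hypothetical band energies shaped by an HRTF, and the same with a tilted source spectrum.
hrtf_e = {n: 1.0 + 0.3 * math.sin(n) for n in range(11, 25)}
tilted = {n: v * 1.5 ** n for n, v in hrtf_e.items()}

c1, c1t = first_order_msc(hrtf_e), first_order_msc(tilted)
c2, c2t = second_order_msc(hrtf_e), second_order_msc(tilted)
print(c1t[15] - c1[15], c2t[15] - c2[15])   # about 3.52 dB shift, and about 0 dB
```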
3.3.7 Onset Detection 
The determination of onsets is an important aspect of a sound localisation system, since it provides a degree of robustness against ambiguities that may result from echoes. In fact, it is known that the biological system ignores localisation information arising from echoes via the "precedence effect" [86]. In order to determine the onset time of the input signal, an estimate of the echo envelope E(t) is first determined using the model shown in Figure 3-5, which has been adapted from [35]. This method has been chosen since it can be readily implemented using analogue hardware, even though it has no direct biological justification. The low-pass filter (LPF) in this model is first-order, with a corner frequency set to 500 Hz. The delay Td represents the time delay from the incident signal to the first echo wave-front. The ratio Afe is the reverberation ratio, while the RC decay product τ represents the echo decay constant. The values of Td and Afe are set to 6 ms and 0.5, respectively, while τ is set to 0.1 s. An onset is detected when the ratio S(t)/E(t) exceeds a threshold Thr, set to 1.5, where S(t) is the composite envelope signal. This value of Thr resulted in accurate onset detection: if the threshold is set too low, spurious onsets are generated, while if it is set too high, true onsets are not detected. For optimal performance, the values of Td, Afe and τ need to be adjusted according to the acoustic properties of the environment. The detection of an onset generates a window function having a flat portion of 4 ms duration followed by an exponentially decaying tail of 6 ms. During the 10 ms duration of the whole window function, further onsets are inhibited.
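The threshold rule, the 10 ms inhibition and the onset window can be sketched in Python as below. The sketch assumes that S(t) and E(t) have already been computed by the model of Figure 3-5, and it compares S against Thr·E to avoid dividing by E = 0. The decay constant of the window's exponential tail is not given in the text, so the value used here (2 ms) is an assumption. The window is applied to the input signal, as described in the next paragraph.

```python
import math


def detect_onsets(s, e, fs=44100.0, thr=1.5, inhibit=10e-3):
    """Sample indices where S(t)/E(t) exceeds thr, with further onsets inhibited for
    `inhibit` seconds. S (composite envelope) and E (estimated echo envelope) are
    assumed precomputed; the test S > thr*E avoids dividing by E = 0."""
    hold = int(round(inhibit * fs))
    onsets, last = [], None
    for n, (sn, en) in enumerate(zip(s, e)):
        if (last is None or n - last >= hold) and sn > thr * en:
            onsets.append(n)
            last = n
    return onsets


def onset_window(fs=44100.0, flat=4e-3, tail=6e-3, tail_tau=2e-3):
    """Window started by an onset: 4 ms flat top, then a 6 ms exponential tail.
    tail_tau (the tail's decay constant) is not given in the text and is assumed."""
    n_flat, n_tail = int(round(flat * fs)), int(round(tail * fs))
    return [1.0] * n_flat + [math.exp(-(k + 1) / (tail_tau * fs)) for k in range(n_tail)]


def apply_windows(x, onsets, w):
    """Multiply the input signal (not the filtered cues) by the onset windows."""
    y = [0.0] * len(x)
    for o in onsets:
        for k, wk in enumerate(w):
            if o + k < len(x):
                y[o + k] = x[o + k] * wk
    return y
```

Because the window length equals the inhibition period, windows triggered by successive onsets never overlap.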
[Figure 3-5 diagram: the L and R signals are full-wave rectified, summed and low-pass filtered to give the composite envelope S(t); an echo decay model (delay Td, gain Afe, RC decay τ) produces the estimated echo E(t); an onset is flagged when S(t) exceeds Thr times E(t).]
Figure 3-5: Model used for onset detection. The composite envelope S(t) of the L and R 
signals is extracted via full-wave rectification and low-pass filtering. 
The window function is multiplied directly with the input signal, rather than with the cues derived after bandpass filtering as in previous models [104]. This approach is not biologically justified, but it yields more accurate cues, since post-filter windowing would substantially truncate the cues pertaining to the incident signal. This is because the effective length of the impulse response of the filters is much longer than the window width, especially in the lower frequency bands: for example, a second-order BPF with Q = 10 centred at 80 Hz has an impulse-response envelope that decays with a time constant of $2Q/\omega_n \approx 40$ ms, compared with the 10 ms window.
3.4 Template Generation and Interpolation of the Resulting Cues 
In order to be able to determine the location of a sound source from a given set of 
cues, a template of cue values for different sound positions is required. This template has 
been computed using an impulse function as sound source input with the ideal conditions 
of no echo, zero additive noise and zero cross-talk. The template of cues was generated 
using available HRTF data, which exists for a set of discrete azimuth and elevation 
angles. The existing data set has HRTF values for elevation angles between -40° and 90°. 
At an elevation of -40°, the azimuth is uniformly sampled at 56 angles (from 0 to 180°). 
The system is then extended to include azimuth angles in the range 0 to −180°, by noting that, from symmetry:

$$\begin{aligned} C_{IID}(-\theta,\phi) &= -C_{IID}(\theta,\phi); \quad C_{IED}(-\theta,\phi) = -C_{IED}(\theta,\phi); \quad C_{IPD}(-\theta,\phi) = -C_{IPD}(\theta,\phi); \\ C'_{MSCR}(-\theta,\phi) &= C'_{MSCL}(\theta,\phi); \quad C''_{MSCR}(-\theta,\phi) = C''_{MSCL}(\theta,\phi); \\ C'_{MSCL}(-\theta,\phi) &= C'_{MSCR}(\theta,\phi); \quad C''_{MSCL}(-\theta,\phi) = C''_{MSCR}(\theta,\phi) \end{aligned} \qquad (3.6)$$
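The symmetry relations of Eq. (3.6) translate directly into a template-mirroring step. In the Python sketch below, the dictionary keys ("IPD", "MSC1_R" and so on) and the template layout {(θ, φ): cues} are hypothetical. Interaural cues change sign, while left and right monaural cues swap. Points on the median plane (θ = 0) and at θ = 180°, which maps onto itself, are not duplicated.

```python
def mirror_template_entry(cues):
    """Cues at (-theta, phi) from cues at (theta, phi) via the symmetry of Eq. (3.6).
    `cues` maps hypothetical keys to per-band lists."""
    return {
        "IPD": [-c for c in cues["IPD"]],      # interaural cues change sign
        "IED": [-c for c in cues["IED"]],
        "IID": [-c for c in cues["IID"]],
        "MSC1_R": list(cues["MSC1_L"]),        # monaural cues swap channels
        "MSC1_L": list(cues["MSC1_R"]),
        "MSC2_R": list(cues["MSC2_L"]),
        "MSC2_L": list(cues["MSC2_R"]),
    }


def extend_template(template):
    """template: {(theta, phi): cues} for 0 <= theta <= 180; adds -180 < theta < 0."""
    full = dict(template)
    for (theta, phi), cues in template.items():
        if 0.0 < theta < 180.0:
            full[(-theta, phi)] = mirror_template_entry(cues)
    return full
```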
For further processing, some form of spherical interpolation is required. This was achieved using a radial basis function (RBF) network [105] as the interpolation tool. The RBF network centres were chosen to lie at equal arc distances on a spherical surface, with $N_\phi$ = 12 different elevation values starting from −40° to 90° in steps of 10.8°. This value of $N_\phi$ was chosen as a compromise between over-fitting and under-fitting. For each elevation angle φ, the azimuth was uniformly divided into $N_\theta$ sections, where $N_\theta$ is given by:

$$N_\theta(\phi) = \begin{cases} \dfrac{180\cos\phi}{10.8}, & -40^\circ \le \phi < 90^\circ \\[6pt] 1, & \phi = 90^\circ \end{cases} \qquad (3.7)$$
In the actual model, it was found that the interpolation accuracy improved if $N_\theta(\phi)$ was increased by a factor of two. The interpolated cue values $C^*$ are given by:

$$C^* = H W_c, \qquad C^*(\theta_i,\phi_i) = \sum_{m=1}^{N_\phi}\sum_{k=1}^{N_\theta(\phi_m)} w_c(k,m)\,h_i(k,m) \qquad (3.8)$$

where $W_c$ is a weight matrix and H is the matrix of two-dimensional Gaussian radial basis functions given by:

$$h_i(k,m) = \exp\left(-\frac{\gamma^2(\theta_i,\phi_i,\theta_k,\phi_m)}{r^2}\right) \qquad (3.9)$$

In the above expression, r is the radius parameter, set to 10°, while $\gamma(\theta_i,\phi_i,\theta_k,\phi_m)$ represents the angular distance between two points $(\theta_i,\phi_i)$ and $(\theta_k,\phi_m)$ located on a spherical surface and is given by:

$$\gamma(\theta_i,\phi_i,\theta_k,\phi_m) = \cos^{-1}\left[\cos(\theta_i-\theta_k)\cos\phi_i\cos\phi_m + \sin\phi_i\sin\phi_m\right] \qquad (3.10)$$
The optimal weight matrix $W_c$, used in the RBF interpolating the original vector of cues C, was computed using the least-squares solution [106] given by:

$$W_c = (H^T H)^{-1} H^T C \qquad (3.11)$$
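The interpolation of Eqs. (3.7) to (3.11) can be sketched with NumPy as below. Several details are assumptions of the sketch rather than statements about the model used here: the elevation rings are taken as 13 evenly spaced values from −40° to 90° (so that the last ring is the pole), the count of Eq. (3.7) is rounded up and doubled, and the centres of each ring are spread over the full −180° to 180° azimuth circle. `np.linalg.lstsq` returns the same least-squares weights as Eq. (3.11) when H has full column rank, without forming $(H^T H)^{-1}$ explicitly. The test cue, sin θ cos φ, is a made-up smooth function.

```python
import numpy as np


def angular_distance(t1, p1, t2, p2):
    """Eq. (3.10): great-circle angle (deg) between (theta, phi) points in degrees."""
    t1, p1, t2, p2 = (np.radians(v) for v in (t1, p1, t2, p2))
    c = np.cos(t1 - t2) * np.cos(p1) * np.cos(p2) + np.sin(p1) * np.sin(p2)
    return np.degrees(np.arccos(np.clip(c, -1.0, 1.0)))


def rbf_centres(elevations, arc=10.8, az_factor=2):
    """Centres on equal arcs following Eq. (3.7); rounding up, the factor az_factor and
    the full-circle azimuth span are assumptions of this sketch."""
    centres = []
    for phi in elevations:
        if phi >= 90.0:
            n_theta = 1
        else:
            n_theta = az_factor * int(np.ceil(180.0 * np.cos(np.radians(phi)) / arc))
        centres += [(-180.0 + 360.0 * k / n_theta, phi) for k in range(n_theta)]
    return np.array(centres)


def design_matrix(points, centres, r=10.0):
    """Eq. (3.9): H[i, j] = exp(-gamma^2 / r^2) for point i and centre j (r in deg)."""
    g = angular_distance(points[:, None, 0], points[:, None, 1],
                         centres[None, :, 0], centres[None, :, 1])
    return np.exp(-(g / r) ** 2)


def fit_weights(points, cues, centres, r=10.0):
    """Eq. (3.11), solved by least squares without an explicit matrix inverse."""
    w, *_ = np.linalg.lstsq(design_matrix(points, centres, r), cues, rcond=None)
    return w


def interpolate(query, centres, w, r=10.0):
    """Eq. (3.8): C* = H W_c evaluated at arbitrary (theta, phi) query points."""
    return design_matrix(query, centres, r) @ w


# Hypothetical usage on a smooth cue, sin(theta) cos(phi), sampled on a 5 deg grid.
centres = rbf_centres(np.linspace(-40.0, 90.0, 13))
grid = np.array([(t, p) for t in range(-180, 180, 5) for p in range(-40, 95, 5)], dtype=float)
cue = np.sin(np.radians(grid[:, 0])) * np.cos(np.radians(grid[:, 1]))
w = fit_weights(grid, cue, centres)
```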
3.5 Cue-to-Position Mapping Algorithm 
The position of the sound source is estimated by computing the likelihood values L(θ, φ) for the estimated cues $C_{ei}$, where (θ, φ) are points inside the whole search space for which the template cues $C^*_i$ are known. The objective is to find the position (θe, φe) that maximises L(θ, φ), given by:

$$L(\theta,\phi) = K\exp\left(-\frac{1}{2}\sum_{i=1}^{N}\left[\frac{C_{ei} - C^*_i(\theta,\phi)}{\sigma_i}\right]^2\right) \qquad (3.12)$$

In the above expression, N is the total number of cues and $\sigma_i$ is the spread (standard-deviation) parameter for each particular cue, while K is a normalising factor computed as:

$$K = \frac{1}{(2\pi)^{N/2}\prod_{i=1}^{N}\sigma_i} \qquad (3.13)$$

The model allows L(θ, φ) to be calculated using only ITD, IID, first-order or second-order monaural spectral cues, or a combination of these cues. The sum inside the exponential function is a measure of the "perceptual distance". Clearly, maximising the likelihood function is equivalent to minimising the perceptual distance.
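Eqs. (3.12) and (3.13) can be evaluated in the log domain, which avoids numerical underflow when many cues are combined. In the NumPy sketch below, `best_position` is a hypothetical helper that performs an exhaustive search over a dict of template cue vectors, in the spirit of a single-step search. The three-step method described next instead restricts the set of candidate positions.

```python
import numpy as np


def perceptual_distance(c_est, c_tmpl, sigma):
    """Half the sum of squared, sigma-normalised cue differences (Eq. (3.12))."""
    z = (np.asarray(c_est, float) - np.asarray(c_tmpl, float)) / np.asarray(sigma, float)
    return 0.5 * float(np.sum(z * z))


def log_likelihood(c_est, c_tmpl, sigma):
    """log L(theta, phi) from Eqs. (3.12) and (3.13)."""
    sigma = np.asarray(sigma, float)
    log_k = -0.5 * sigma.size * np.log(2.0 * np.pi) - float(np.sum(np.log(sigma)))
    return log_k - perceptual_distance(c_est, c_tmpl, sigma)


def best_position(c_est, template, sigma):
    """Exhaustive maximisation over a dict {(theta, phi): template cue vector}."""
    return max(template, key=lambda pos: log_likelihood(c_est, template[pos], sigma))


template = {(0, 0): [0.0, 0.0], (10, 5): [1.1, 1.9], (20, 5): [3.0, 3.0]}
print(best_position([1.0, 2.0], template, [1.0, 1.0]))   # the nearest template point
```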
3.5.1 The Search Method 
3.5.1.1 Basis for the Algorithm 
Previous algorithms based on spatial likelihood maps [88] perform a single-step search over the whole search space, using all available cues concurrently. The algorithm presented here is based on the proposition that ITD cues are approximately constant along the locus of points defining conical surfaces commonly known as "cones of confusion". In fact, an approximation for ITDs (assuming a spherical model for the head) has been proposed by Kuhn [92] as:

$$\mathrm{ITD} \approx \begin{cases} \dfrac{3r\sin\delta}{c}, & f < 1\ \mathrm{kHz} \\[6pt] \dfrac{2r\sin\delta}{c}, & f > 1\ \mathrm{kHz} \end{cases} \qquad (3.14)$$
In this approximation, r is the radius of the sphere modelling the head, c is the speed of sound and δ is the angle between the median plane and a ray passing from the centre of the head through the source position. Further evidence supporting this proposition is obtained from contour maps of the perceptual distance (or likelihood value) plotted using only ITD cues. A typical plot is shown in Figure 3-6(a), where the source was assumed to be at (24°, 50°). Similar plots are obtained for other source locations. The proposition was further verified by plotting ITD cues (at different values of θ, φ) against the cone parameter β (= 90° − δ), defined as:

$$\beta = \cos^{-1}(\sin\theta\cos\phi) \qquad (3.15)$$

Figure 3-7 shows this plot for the IPD cues computed for the 9th filter and the IED cues computed for the 10th filter, obtained for values of θ ranging from 0 to 180° and values of φ ranging from −35° to 90°. Similar plots can be obtained for the other filters. It is noted that there exists a strong correlation between the ITD cues and the parameter β. Numerical values for the degree of correlation $R_S$ between the various cues and the parameter β were determined using Spearman's rank correlation method, and the results are summarised in Table 3-1.
Cue          IPD      IED      IID      MSCL     MSCR     MSCL2    MSCR2
BPF No.      1-9      10-24    11-24    11-23    11-23    13-23    13-23
Mean |R_S|   0.9919   0.9574   0.8961   0.1898   0.1886   0.0815   0.1295
Table 3-1: Computed mean correlation coefficients $|R_S|$ between the respective cues and the cone parameter β. ITD cues exhibit a strong correlation with β, while IID cues have a lower (but still significant) correlation with β. Monaural cues exhibit no correlation with β.
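Under Kuhn's spherical-head approximation, the ITD is a function of β alone, so its Spearman rank correlation with β should have magnitude 1. This ideal case is the reference against which the values in Table 3-1 can be read. The Python sketch below checks it on a coarse (θ, φ) grid. The head radius (8.75 cm) and speed of sound (343 m/s) are assumed typical values that are not taken from the text, and the small rank-correlation routine is written out rather than taken from a statistics library.

```python
import math

HEAD_RADIUS = 0.0875      # m, assumed typical value
SPEED_OF_SOUND = 343.0    # m/s, assumed


def cone_parameter(theta_deg, phi_deg):
    """Eq. (3.15): beta = arccos(sin(theta) cos(phi)), in degrees."""
    t, p = math.radians(theta_deg), math.radians(phi_deg)
    return math.degrees(math.acos(max(-1.0, min(1.0, math.sin(t) * math.cos(p)))))


def kuhn_itd(theta_deg, phi_deg, low_freq=True, r=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """Eq. (3.14) with delta = 90 deg - beta: 3 r sin(delta)/c below 1 kHz, else 2 r sin(delta)/c."""
    delta = math.radians(90.0 - cone_parameter(theta_deg, phi_deg))
    return (3.0 if low_freq else 2.0) * r * math.sin(delta) / c


def ranks(x):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    out, i = [0.0] * len(x), 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return out


def spearman(x, y):
    """Spearman's rank correlation: the Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var = sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var)


pos = [(t, p) for t in range(0, 181, 15) for p in range(-35, 91, 25)]
rho = spearman([cone_parameter(t, p) for t, p in pos], [kuhn_itd(t, p) for t, p in pos])
print(rho)   # close to -1: the model ITD decreases monotonically with beta
```

Measured ITD cues fall slightly short of this ideal (0.9919 and 0.9574 in Table 3-1) because real HRTFs depart from the spherical-head model.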
[Figure 3-6 plots: perceptual-distance contour maps against elevation (deg.), panels (a), (b) and (c)]
Figure 3-6: (a) Contour map of the perceptual distance for a source (denoted by "*") at location (24°, 50°), computed using only ITD cues. The contour curves are approximately cones over which the ITD cues exhibit very small variations. (b) Contour map obtained for the same case, but using only IID cues. In this case, the contour curves indicate that the IIDs vary significantly over the cones shown in (a). (c) Contour map obtained for the same case, but using only monaural cues. It is evident that these cues are not related to the cones shown in (a) and that they convey significant information regarding the elevation of the sound source.
The values in the second row of Table 3-1 are the magnitudes of the correlation coefficients, averaged over the respective filters. Table 3-1 indicates that IID cues are also significantly correlated with β (see also Figure 3-6(b)), but to a lesser extent than IPD cues. MSCs are not constant over the conical surfaces defined by β, as can also be deduced from the contour maps of the perceptual distance obtained using these cues alone (Figure 3-6(c)). This is because all cues except the ITD cues depend heavily on the spectral shaping of the signal by the HRTF, and this shaping is highly non-linear with respect to changes in direction in both azimuth and elevation. Also, since ITDs are computed by cross-correlation of bandpass-filtered signals, they are relatively robust against source spectrum variation and noise. Hence the value of β giving the most likely cone of confusion can almost always be accurately determined from the ITD cues, even under the most adverse conditions. It will be shown in Section 3.6 that the proposition that ITD cues are approximately constant over a conical surface can be relaxed to a more general surface, in order to improve the applicability of the algorithm to different HRTF data sets.
[Figure 3-7 plots: (a) IPD and (b) IED cue values against β (deg.), 0 to 100°]
Figure 3-7: (a) IPD cues and (b) IED cues for the 9th and 10th BPF, respectively, plotted against the cone parameter β. The vertical scatter for a given value of β is quite small, indicating that these cues are primarily a function of β only.
3.5.1.2 Three-Step Search Method 
The search process for (θe, φe) is carried out in the following three steps:
(i) determination of the most likely cone of confusion based on ITD cues;
(ii) determination of the approximate source location along the cone locus using all cues, via a discrete "grid" search: at this point approximate values for θ and φ are determined;
(iii) determination of the most likely source location using a gradient descent algorithm.
This method requires relatively low computation time, since the effective search space is drastically reduced, and it is also robust against additive noise and changes in the source spectrum. The most likely cone of confusion is determined by fixing the elevation at 0° and searching over the whole possible azimuth range (−180° to 180° in this case) for the angle β (measured along the azimuth) which results in the maximum likelihood value. A step size of 5° is used for this search, which is carried out using only ITD values.
Having determined the value of the angle β, a search is now made which is free in both azimuth and elevation but is restricted to the cone defined by this angle, that is:

$$\theta = 90^\circ \pm \cos^{-1}\left[\frac{\cos(90^\circ - \beta)}{\cos\phi}\right] \qquad (3.16)$$

During this search, φ is swept over the whole valid range (−35° to 90° − |β|) in steps of 5°. In this way, an approximate value for (θ, φ) is obtained. This search also uses the IID cues together with the monaural cues, which are important in determining the most likely source angle along the cone of confusion.
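The candidate points of step (ii) follow directly from Eq. (3.16). Here β is the azimuth at which the cone crosses zero elevation, so every point on the locus satisfies sin θ cos φ = sin β. The Python sketch below generates both the front (90° − a) and back (90° + a) solutions, wrapped to [−180°, 180°). In the search these points would then be scored with the likelihood of Eq. (3.12). The function names are illustrative only.

```python
import math


def wrap_azimuth(theta):
    """Wrap an azimuth in degrees to [-180, 180)."""
    return (theta + 180.0) % 360.0 - 180.0


def cone_locus(beta, step=5.0, phi_min=-35.0):
    """Candidate (theta, phi) points on the cone of Eq. (3.16); beta is the cone's
    azimuth at zero elevation and phi runs from phi_min to 90 - |beta| in `step` deg."""
    points, phi = [], phi_min
    while phi <= 90.0 - abs(beta) + 1e-9:
        x = math.cos(math.radians(90.0 - beta)) / math.cos(math.radians(phi))
        a = math.degrees(math.acos(max(-1.0, min(1.0, x))))
        for theta in sorted({wrap_azimuth(90.0 - a), wrap_azimuth(90.0 + a)}):
            points.append((theta, phi))        # front and back solutions
        phi += step
    return points


pts = cone_locus(24.0)
print(pts[:4])
```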
Having determined the approximate source location (θe, φe), a gradient descent algorithm is used to locate the exact position (θ0, φ0). This step is carried out without any restriction in azimuth or elevation, and thus allows for deviations of the true locus of the cones of confusion from the approximate formula used in the previous step. Through the use of RBF interpolation, it also allows localisation accuracy to be improved without restriction to a finite angle discretisation, unlike some previous localisation systems based on discrete likelihood maps [88]. In fact, under ideal environmental and hardware conditions, the final localisation accuracy depends only on the RBF interpolation accuracy.

Using Taylor's theorem in two variables, expanded up to the second-order terms, gives:

$$L(\theta_0,\phi_0) = L(\theta_e+\varepsilon_\theta,\,\phi_e+\varepsilon_\phi) \approx L + \varepsilon_\theta L_\theta + \varepsilon_\phi L_\phi + \tfrac{1}{2}\left(\varepsilon_\theta^2 L_{\theta\theta} + 2\varepsilon_\theta\varepsilon_\phi L_{\theta\phi} + \varepsilon_\phi^2 L_{\phi\phi}\right) \qquad (3.17)$$

Partially differentiating the above expression with respect to $\varepsilon_\theta$ and $\varepsilon_\phi$ separately, and setting these two derivatives to zero (at the point of maximum likelihood), gives:

$$\begin{bmatrix}\varepsilon_\theta \\ \varepsilon_\phi\end{bmatrix} = \begin{bmatrix}-L_{\theta\theta} & -L_{\theta\phi} \\ -L_{\phi\theta} & -L_{\phi\phi}\end{bmatrix}^{-1}\begin{bmatrix}L_\theta \\ L_\phi\end{bmatrix} \qquad (3.18)$$

In the above expressions, L and all its partial derivatives are evaluated at (θe, φe). These derivatives are evaluated using central finite-difference approximations. In this way, a better estimate of the source location is computed using:

$$\begin{bmatrix}\theta_0 \\ \phi_0\end{bmatrix} \approx \begin{bmatrix}\theta_e \\ \phi_e\end{bmatrix} + \begin{bmatrix}\varepsilon_\theta \\ \varepsilon_\phi\end{bmatrix} \qquad (3.19)$$

The above procedure is repeated until the angular distance between two successive values of (θe, φe) is less than a critical threshold.
The above method works well if the likelihood function can be approximated (at least locally) by a polynomial of degree 2 and if the starting value is in fact near a local maximum. If the first condition is not met, the error in the estimate of the second derivative may be so large that the next computed point has a lower likelihood value than the previous one. This problem is solved by accepting a new point only if its likelihood value is greater than that of the previous one; if this is not so, the values of $\varepsilon_\theta$ and $\varepsilon_\phi$ are halved until such a point is found. If the second condition is not met, the algorithm may head towards a minimum rather than a maximum. This condition is easily detected by verifying that:

$$L_{\theta\theta} < 0 \quad \text{and} \quad L_{\phi\phi} < 0 \qquad (3.20)$$

If either of these conditions is found to be false, the sign of the respective ε value is reversed.
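The refinement step of Eqs. (3.17) to (3.20) can be sketched as follows. This Python illustration makes its own choices: the finite-difference step h (0.5°), the iteration and halving limits, and the made-up Gaussian "likelihood" used to exercise it are all assumptions. As described above, it solves Eq. (3.18) with central finite-difference derivatives, reverses a step component whose second derivative is not negative, halves the step until the likelihood increases and stops when successive estimates are closer than a threshold in the angular distance of Eq. (3.10).

```python
import math


def angular_distance(t1, p1, t2, p2):
    """Eq. (3.10) in degrees, used here as the convergence measure."""
    t1, p1, t2, p2 = map(math.radians, (t1, p1, t2, p2))
    c = math.cos(t1 - t2) * math.cos(p1) * math.cos(p2) + math.sin(p1) * math.sin(p2)
    return math.degrees(math.acos(max(-1.0, min(1.0, c))))


def refine(L, theta, phi, h=0.5, tol=1e-3, max_iter=50, max_halvings=30):
    """Newton-type refinement of a likelihood L(theta, phi), Eqs. (3.17) to (3.20)."""
    for _ in range(max_iter):
        f0 = L(theta, phi)
        fp_t, fm_t = L(theta + h, phi), L(theta - h, phi)
        fp_p, fm_p = L(theta, phi + h), L(theta, phi - h)
        Lt, Lp = (fp_t - fm_t) / (2 * h), (fp_p - fm_p) / (2 * h)
        Ltt, Lpp = (fp_t - 2 * f0 + fm_t) / h ** 2, (fp_p - 2 * f0 + fm_p) / h ** 2
        Ltp = (L(theta + h, phi + h) - L(theta + h, phi - h)
               - L(theta - h, phi + h) + L(theta - h, phi - h)) / (4 * h * h)
        det = Ltt * Lpp - Ltp * Ltp
        if det == 0.0:
            break
        # Eq. (3.18): solve [Ltt Ltp; Ltp Lpp] [et, ep]^T = -[Lt, Lp]^T
        et = (-Lt * Lpp + Lp * Ltp) / det
        ep = (-Lp * Ltt + Lt * Ltp) / det
        # Eq. (3.20): heading for a minimum along an axis, so reverse that component
        if Ltt >= 0.0:
            et = -et
        if Lpp >= 0.0:
            ep = -ep
        # Accept only an uphill point; otherwise halve the step
        for _ in range(max_halvings):
            if L(theta + et, phi + ep) > f0:
                break
            et, ep = et / 2.0, ep / 2.0
        else:
            return theta, phi                      # no improving point was found
        new_theta, new_phi = theta + et, phi + ep  # Eq. (3.19)
        done = angular_distance(theta, phi, new_theta, new_phi) < tol
        theta, phi = new_theta, new_phi
        if done:
            break
    return theta, phi


bump = lambda t, p: math.exp(-((t - 24.0) ** 2 + (p - 50.0) ** 2) / (2 * 15.0 ** 2))
print(refine(bump, 20.0, 45.0))   # converges close to (24, 50)
```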
3.6 Simulation Results 
The robustness of the three-step search method is examined by simulation experiments designed to investigate separately the effects of correlated noise, uncorrelated noise, echo and cross-talk on the estimated source location. The effect of hardware variations on the individual cue values and on the final localisation error is also investigated. The tests presented here were obtained with the sound source fixed at (24°, 50°), although similar results have also been achieved using other source locations. In each case, broadband pseudo-random noise was used as the input sound source. The results are also compared with those obtained under the same conditions but using a single-step search.
3.6.1 Environmental Effects 
3.6.1.1 Effect of Noise Interference 
This simulation is carried out by adding random Gaussian noise to the L and R signals after HRTF filtering, for S/N ratios varying from 5 to 80 dB. In each case, the simulation is run 20 times, and the average angular localisation error is computed (using Eqn. 3.10) and plotted in Figure 3-8. The solid curve (i) shows the variation of localisation error with identical noise waveforms injected into the L and R channels. The dashed curve (ii) is obtained by injecting uncorrelated noise waveforms into the L and R channels separately. The localisation error is less than 1° for S/N ratios greater than 34 dB. The corresponding errors obtained when the single-step search is used are shown in curves (iii) and (iv), where it can be seen that the localisation error in this case is significantly higher than that obtained using the three-step search method.
The effect of noise on the individual cues is also investigated by calculating the root mean square cue error E, normalised by the maximum magnitude $C_{max}$ of the cue over the whole search space, that is, using the expression:

$$E = \frac{1}{C_{max}}\sqrt{\frac{1}{N_c}\sum\left(C_{actual} - C_{ideal}\right)^2} \qquad (3.21)$$

where $N_c$ is the number of cue values in the sum. The corresponding plots are shown in Figure 3-9(a), (b) for the case when the same noise signal is injected into the L and R channels, and in Figure 3-9(c), (d) for the case when independent noise is injected.
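Eq. (3.21) amounts to the following short Python function. The argument names are illustrative, and `c_max` is assumed to be the largest magnitude of the cue over the whole template search space, supplied by the caller.

```python
import math


def normalised_rms_cue_error(actual, ideal, c_max):
    """Eq. (3.21): RMS difference between actual and ideal cue values, divided by
    c_max, the largest cue magnitude over the whole search space."""
    n = len(actual)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(actual, ideal)) / n) / c_max


print(normalised_rms_cue_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0], 4.0))   # sqrt(4/3)/4
```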
These results are, however, conditioned by the integration interval used in the cross-correlation for determining the ITD cues. The integration interval of 100 ms captures just 8 cycles of the lowest-frequency BPF output. Over a longer integration interval, the effect of uncorrelated noise is therefore expected to be considerably attenuated, and the above results would change if the integration interval were increased significantly, at the expense of a much longer computation time. However, when the cues are generated in hardware, real-time computation is possible, so that a long integration interval becomes a practical proposition. Furthermore, the integration interval mainly affects the IPD cues obtained for the low-frequency bands, so that improved accuracy is possible by reducing the weights attributed to these IPD values.
[Figure 3-8 plot: angular error against S/N ratio (dB), 0 to 80 dB, curves (i) to (iv)]
Figure 3-8: Angular error between the estimated source location and the actual source 
location plotted as a function of the S/N ratio. The solid curve (i) was obtained using a 
single noise source applied to both the left and right channels, while the dashed curve (ii) 
was obtained using different noise sources for the L and R channels. Curves (iii) and (iv) 
represent the corresponding results obtained via a single step search. 
3.6.1.2 Effect of Echo Interference 
The effect of echo is investigated with regard to variations in the reverberation ratio Afe and the echo decay constant τ. During this test, the onset detection parameters are kept fixed as described in Section 3.3.7, while the parameters in the stimulus generator are varied. A single point echo source is assumed at a location of (−24°, 50°). Figure 3-10(a) shows the localisation error for Afe and τ in the ranges 0 to 1 and 1 ms to 1 s, respectively, obtained using the new algorithm. In this particular case, it can be seen that the localisation error is less than 1° if the values of Afe or τ are kept below 0.5 and 0.05 s, respectively. The corresponding plot for a single-step search is shown in Figure 3-10(b). Comparison of the two results shows that the three-step search method is more tolerant of residual echo errors not eliminated by the onset detector.
[Figure 3-9 plots: normalised cue error against S/N ratio (dB), panels (a) to (d)]
Figure 3-9: Normalised values of the error in (a) IPD, IED and IID cues and (b) 1st and 2nd order monaural spectral cues, plotted as a function of the S/N ratio for the case when the same noise signal is applied to both the L and R channels. Corresponding results for the case when uncorrelated noise signals are added to the L and R channels are shown in (c) and (d), respectively.
3.6.2 Hardware-Induced Effects 
3.6.2.1 Effect of Channel Cross-talk 
During this simulation, a portion of the L-channel signal is added to the R-channel 
signal and vice-versa. The ratio of the original signal to the cross-talk component is varied 
from 5 to 80 dB and the resulting angular error is shown in Figure 3-11 (a) for (i) new 
57 
Chapter 3. Cue-to-Position Mapping Algorithm for 2-D Spatial Localisation 
12 
10 
b 
Cg 0 
4 
2 
0 
1 
1 
I 
60 
60 
.d 40 
30 
20 
1 10 
0 
-10 1 
I 
Rever 
(Afe) 
(b) 
Figure 3-10: Angular error between the estimated source position and the actual position 
plotted versus the echo reverberation constant (Afe) and the echo envelope decay constant 
(t), for (a) new algorithm, and (b) single step search. During this simulation, the onset 
detector parameters were kept constant while the echo generator parameters (Afe, T) In the 
stimulus generator were varied. 
58 
Q 
_3 
Loglok-E, S) 
(a) 
0 _3 
Chapter 3. Cue-to-Position Mapping Algorithm for 2-D Spatial Localisation 
[Figure 3-11 plots: (a) angular error versus signal/crosstalk ratio, dB; (b), (c) normalised cue error versus signal/cross-talk ratio, dB]
Figure 3-11: (a) Angular error between the estimated source location and the actual source 
location plotted as a function of the crosstalk ratio using (i) the new algorithm and (ii) a single 
step search. The normalised values of the error in the IPD, IED and IID cues are shown in (b), 
and the 1st and 2nd order monaural spectral cues are plotted in (c), as a function of the 
cross-talk ratio. 
algorithm and (ii) single step search. It can be seen that for signal/crosstalk values higher 
than 30 dB, the localisation error is less than 1°. Below 30 dB, the 3-step search method 
results in a significantly smaller error than the single step search. The effect of cross-talk 
on the individual cues is shown in Figure 3-11 (b), (c). 
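The cross-talk condition used in this simulation can be reproduced with a short sketch. It assumes that the signal/crosstalk ratio is an amplitude ratio expressed in dB (so 20 dB leaks 10 % of each channel into the other) and that the leakage is symmetric; `add_crosstalk` is a hypothetical helper, not the thesis's stimulus generator.

```python
def add_crosstalk(left, right, scr_db):
    """Mix a fraction of each channel into the other (symmetric cross-talk).

    scr_db: signal/crosstalk ratio in dB, assumed here to be an amplitude
    ratio, so the leakage gain is k = 10**(-scr_db / 20).
    """
    k = 10.0 ** (-scr_db / 20.0)
    new_left = [l + k * r for l, r in zip(left, right)]
    new_right = [r + k * l for l, r in zip(left, right)]
    return new_left, new_right


# Example: a 20 dB ratio leaks 10 % of each channel into the other.
l_out, r_out = add_crosstalk([1.0, 0.0], [0.0, 1.0], 20.0)
```

Sweeping `scr_db` from 5 to 80 dB and feeding the mixed signals to the front end reproduces the kind of sweep shown in Figure 3-11.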
3.6.2.2 Effect of Hardware Accuracy on Localisation Error and Individual Cues 
Hardware inaccuracies arise due to mismatches between components and also due 
to variations of component parameters (such as threshold voltage) from the nominal 
design and process values. In particular, component parameter variations can lead to 
changes in the resonant frequency and Q-factor. The effect of hardware inaccuracies is 
assessed by varying the Q-factor and centre frequency ωn from their nominal values by 
factors in the range 0.5-2 and 0.8-1.25, respectively, while keeping the noise, echo 
and crosstalk equal to zero. Two cases are considered: in the first case, the variations are 
assumed to be identically matched between the L-channel and R-channel filter banks. In 
the second case, the Q and ωn values of only one filter bank are varied, while the other 
filter bank is kept at its nominal parameters. In all cases, the template used 
for localisation is kept fixed and equal to the one generated for nominal filter 
characteristics, that is, the template is not adapted to the hardware in any way. 
Simulation results for hardware variations in the Q value are shown in 
Figure 3-12. These results show that unmatched (one-sided) variations in Q result in a 
significant angular error, while the error for matched variations in Q is very low, within 2°. 
This result emphasises the importance of good matching between the L and R filter banks 
in order to achieve accurate localisation. Unequal variations in Q in the left and right filter 
banks result in differences in the intensity and delay between the outputs of the two filter 
banks, thus causing a significant error in the computed interaural cues, as is evident from 
Figure 3-12 (b)-(d). In contrast, matched variations in Q-factor produce equal variations in 
the outputs of both channels, so that the error in the interaural cues is practically negligible. 
The variations of monaural cues with matched variations in the Q-factor essentially follow 
the same trend as for the unmatched case. 
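The mechanism behind these interaural-cue errors can be illustrated with an idealised second-order band-pass section. This is an assumption made for illustration only: the thesis front-end filter itself is not modelled, and the function names are hypothetical. When both channels use the same Q, the filter contributes identical gain and phase to L and R, which cancel in the interaural ratio; a one-sided Q change leaves a spurious IID and IPD.

```python
import cmath
import math


def bpf_response(f, f0, q):
    """Complex response of a 2nd-order band-pass filter (unity gain at f0)."""
    s = 1j * 2 * math.pi * f
    w0 = 2 * math.pi * f0
    return (w0 / q) * s / (s * s + (w0 / q) * s + w0 * w0)


def filter_induced_interaural_error(f, f0, q_left, q_right):
    """Spurious IID (dB) and IPD (rad) produced by the filters alone when the
    same signal is applied to the L and R filter banks."""
    h_left = bpf_response(f, f0, q_left)
    h_right = bpf_response(f, f0, q_right)
    iid_db = 20 * math.log10(abs(h_left) / abs(h_right))
    ipd = cmath.phase(h_left / h_right)
    return iid_db, ipd


# Matched Q versus a one-sided Q change, evaluated 10 % above the centre
# frequency of a 1 kHz channel.
matched = filter_induced_interaural_error(1100.0, 1000.0, 10.0, 10.0)
unmatched = filter_induced_interaural_error(1100.0, 1000.0, 5.0, 10.0)
```

In this idealised model the unmatched error vanishes exactly at the centre frequency (both sections have unity gain and zero phase there) and grows off-centre, where broadband signals carry most of their energy in each channel.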
[Figure 3-12 plots: (a) angular error versus Qactual / Qnominal; (b) to (d) normalised cue error versus Qactual / Qnominal]
Figure 3-12: The angular error between the estimated source position and its actual position 
plotted versus Q-factor, for the case of a(i) unmatched and a(ii) matched (2-sided) Q variation 
in the BPF bank. Curves a(iii) and a(iv) indicate the corresponding results obtained 
using a single step search. The corresponding normalised variations in the individual 
cues are shown: (b) interaural cues versus unmatched Q variation; (c) monaural 
cues versus unmatched Q variation; (d) interaural cues versus matched Q variation. 
Figure 3-13 shows the simulation results obtained for variations in the centre frequency ωn. 
Again, the angular error in the case of unmatched variation, Figure 3-13 (a), is 
significantly larger than that for matched variation, Figure 3-13 (b). However, 
matched ωn variations induce a significantly larger localisation error than matched Q 
variations: this is explained by the fact that all cues are a function of frequency, in 
particular the IID and monaural cues [107], as is evident from Figure 3-13 (c)-(e). Matched 
variations in the resonant frequency also result in relatively low interaural cue variations. 
In particular, it can be seen that ITD errors are smaller than IID or monaural cue errors: 
this is because time delay cues depend to a lesser extent on frequency than IID or 
monaural cues [107]. However, the variations in monaural cues are still significant in this 
case, so that matched variations in the resonant frequency still result in high localisation 
errors. This is because the magnitude response of the HRTF exhibits sharp variations with 
frequency, which are attributable to interference effects introduced by the pinna. In the 
case of mismatched ωn variations, an additional error contribution arises from the ITD 
cues. In particular, ITD cues show very large variations at low ωn values, because 
the equivalent time delay errors can be large even for small phase errors (which arise 
from ωn variations) at low frequencies. 
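The last point follows from converting a phase error into a delay: Δt = Δφ / (2πf), so the same phase error costs more time at low frequency. The sketch below assumes the same idealised second-order band-pass section as an illustration (not the thesis filter) and a hypothetical one-sided detuning of one channel's centre frequency; because the phase error depends only on the detuning ratio and Q, the resulting delay error scales as 1/f0.

```python
import cmath
import math


def bpf_phase(f, f0, q):
    """Phase (rad) of a 2nd-order band-pass filter with centre f0 and quality factor q."""
    s = 1j * 2 * math.pi * f
    w0 = 2 * math.pi * f0
    return cmath.phase((w0 / q) * s / (s * s + (w0 / q) * s + w0 * w0))


def itd_error_from_detuning(f0, q, ratio):
    """Equivalent delay error (s) at f0 when one channel's centre frequency is f0 * ratio.

    The phase difference between the two channels is converted to time by
    dividing by the angular frequency 2*pi*f0, so the same phase error maps to
    a larger delay error at lower frequencies.
    """
    dphi = bpf_phase(f0, f0, q) - bpf_phase(f0, f0 * ratio, q)
    return dphi / (2 * math.pi * f0)


# 5 % one-sided detuning with Q = 5: compare a low and a high channel.
err_low = itd_error_from_detuning(200.0, 5.0, 1.05)
err_high = itd_error_from_detuning(4000.0, 5.0, 1.05)
```

With these assumed values the 200 Hz channel shows a delay error of roughly 360 µs, twenty times that of the 4 kHz channel.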
[Figure 3-13 plots: (a), (b) angular error and (c) to (e) normalised cue error versus ωn actual / ωn nominal]
Figure 3-13: The angular error between the estimated source position and its actual position, 
for the case of (a) unmatched and (b) matched (2-sided) centre frequency (ωn) variation in 
the BPF bank, plotted versus the normalised ωn value. Curves (i) are obtained using the 
new algorithm, while curves (ii) are obtained using a single step search. The corresponding 
normalised variations in the individual cues are shown: (c) interaural cues versus 
unmatched ωn variation; (d) monaural cues versus unmatched ωn variation; (e) interaural 
cues versus matched ωn variation. 
3.6.3 Performance at Different Source Positions 
The performance of the new algorithm at different source positions was tested under the 
influence of additive noise, and the results are shown in Figure 3-14. The figure shows 
localisation performance in the right-hand hemisphere. The results shown in this section 
are obtained using a single run of the algorithm without any form of averaging, unlike in 
the previous section, where the errors were averaged over 20 runs. For S/N ratios greater than 
60 dB (Figure 3-14 (a)), localisation errors are small, within 5°. These errors arise mainly 
from interpolation errors in the RBF used to store the cue templates, rather than from the 
additive noise itself. For an S/N ratio of 40 dB (Figure 3-14 (b)), good 
localisation accuracy is still achieved, except for a few errors which occur along cones of 
confusion. This result confirms that spectral cues (which are responsible for 
discriminating source positions along cones of confusion) are more susceptible to noise 
than ITD cues. Furthermore, several front-back ambiguities can be noticed along the 
median plane (θ = 0°). Sound localisation along the median plane is prone to errors even 
in the biological system, as evidenced in several experiments [108]; this is because 
interaural cues are practically zero along this plane, and localisation has to rely solely on 
monaural spectral cues, whose accuracy depends strongly on the source spectrum and which are 
highly susceptible to additive noise. In humans, localisation accuracy in the median plane 
is known to depend strongly on the source spectrum, and breaks down completely for 
pure sinusoidal sound signals, since no monaural spectral cues are available in that 
case. Figure 3-14 (c), (d) show the results obtained for S/N = 20 dB, for both 
uncorrelated and correlated additive noise. Again, it is evident that most 
errors occur along cones of confusion, and in particular in the median plane. 
The results are in line with those obtained in section 3.6.1.1, where it can be seen 
that localisation accuracy is substantially impaired for S/N values below about 30 dB. 
At S/N values greater than or equal to 30 dB, errors are mostly due to spectral cue errors. 
Below 30 dB, however, errors in ITD cues also become significant, and localisation errors 
across cones of confusion start to occur. As pointed out in the previous section, errors in 
ITD cues due to uncorrelated additive noise can be reduced by increasing the correlation 
time. 
[Figure 3-14 (a), (b) plots: (a) S/N > 60 dB (uncorrelated noise); (b) S/N = 40 dB (uncorrelated noise). Front θ = 0°, rear θ = 180°; * original position, o resolved position]
Figure 3-14 (a), (b): Simulation results (uncorrelated noise) for: (a) S/N > 60 dB, showing that 
most positions can be resolved with an error of less than 5°. A front-back error is evident at 
the source position of (8011, -30°). It is also evident that most errors occur along cones of 
confusion and arise due to the low robustness of spectral cues; (b) S/N = 40 dB, in which case 
it is also evident that most errors occur along cones of confusion. In particular, large 
errors occur for source locations along the median plane (θ = 0°). 
[Figure 3-14 (c), (d) plots: (c) S/N = 20 dB (uncorrelated noise); (d) S/N = 20 dB (correlated noise). Front θ = 0°, rear θ = 180°; * original position, o resolved position]
Figure 3-14 (c), (d): Simulation results with S/N = 20 dB for (c) uncorrelated and 
(d) correlated additive noise. In both cases, most errors occur along cones of confusion, 
indicating the susceptibility of spectral cues to noise. In particular, most errors occur along 
the median plane, where all interaural cues are zero. For S/N values less than or equal to 
20 dB, localisation errors across cones of confusion also start to occur. 
3.6.4 Accuracy and Speed of the 3-Step Mapping Algorithm 
The above simulation results clearly show that more precise 2-D localisation can be 
achieved with the proposed 3-step search algorithm than with a single step search, using all 
the biological interaural and monaural cues, when the system is subjected to noise, echo, 
crosstalk and hardware inaccuracies. There are two reasons for this. First, as already 
suggested, ITD cues are relatively robust compared to other cues. Unlike monaural spectral 
cues, they are independent of variations in the source spectrum, provided that a sufficiently 
narrow BPF bandwidth is used. The ITD values can be evaluated quite accurately even for 
moderate BPF bandwidths, since they have only a slight frequency dependence. Thus the 
probability of choosing the right "cone of confusion" at the first stage of the search 
algorithm is always high. Secondly, in order to implement a full single-step search for the 
maximum likelihood value using all cues, one has to attribute some weights (or variances) 
to the individual cue distance terms. The particular choice of weights can greatly affect the 
localisation accuracy, since the distance term due to monaural spectral cues (and, to a lesser 
extent, IID cues) can have multiple maxima at various locations. Thus setting the weight of 
the monaural spectral cues too high could result in choosing a position of maximum 
likelihood which is in fact far away from the true source location, while setting this weight 
too low would reduce the resolving power of monaural cues along the "cones of confusion". 
In the current search method, only ITD cues are used during the first step, while during the 
second step the ITD values are relatively constant along the selected cone of confusion: thus 
the choice of the approximate source position along a predetermined "cone of confusion" 
depends primarily on the IID and monaural cues. The relative weight values are only 
important during the final step, when the fine search is done: typically this step causes the 
approximate source position to "move" to a nearby local likelihood maximum. 
The current algorithm requires 72 comparisons during the first step and a 
maximum of 25 comparisons (for θ = 0°) during the second step, that is, at most 97 
comparisons in total. In contrast, a full single-step search at the same resolution would 
require about 1800 comparisons, indicating that the proposed algorithm also offers a 
significant reduction in computation time. The computational complexity is also low 
compared to neural network (NN) techniques [97], [98]. 
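The three steps can be sketched as follows. This is an illustrative sketch rather than the thesis implementation: the template layout, the use of the first position on each cone as that cone's ITD representative, the weighted squared distance (minimising it stands in for maximising the likelihood) and the discrete hill-climb used in place of the fine gradient-descent step are all assumptions, and `three_step_search` and its arguments are hypothetical names.

```python
def weighted_sq_distance(a, b, weights):
    """Weighted squared Euclidean distance between two cue vectors."""
    return sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights))


def three_step_search(measured, template, cones, itd_w, all_w, neighbours):
    """Coarse-to-fine cue-to-position search (illustrative sketch).

    measured   -- {'itd': [...], 'all': [...]} cue vectors from the front end
    template   -- {position: {'itd': [...], 'all': [...]}} stored cue template
    cones      -- {cone_id: [positions]}; the first position in each list is
                  used as the ITD representative of that cone (an assumption)
    neighbours -- callable returning the grid neighbours of a position
    """
    # Step 1: select the cone of confusion using ITD cues only.
    cone = min(cones, key=lambda c: weighted_sq_distance(
        measured['itd'], template[cones[c][0]]['itd'], itd_w))

    # Step 2: select the best position on that cone using all cues.
    def cost(pos):
        return weighted_sq_distance(measured['all'], template[pos]['all'], all_w)

    best = min(cones[cone], key=cost)

    # Step 3: local refinement. A discrete hill-climb over grid neighbours
    # stands in for the fine gradient-descent step described in the text.
    while True:
        candidates = [p for p in neighbours(best) if p in template]
        step = min(candidates, key=cost, default=best)
        if cost(step) >= cost(best):
            return best
        best = step
```

Only step 3 depends on the relative weights `all_w`, mirroring the argument above that the weights matter only during the fine search.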
3.7 Generic Search Algorithm 
The search method described in section 3.5.1 achieves accurate results when 
tested with the available HRTF data. However, its main drawback is that it assumes 
that the ITD cues are approximately constant over a conical surface. The method can be 
further generalised such that no assumption is made about the shape of the surface over 
which the ITD cues are approximately constant. 
In this method, the search space is divided into "meshes" ranging in azimuth from 
-180° to 180° and in elevation from -35° to 90°, with a step size of 5° in both cases. The 
weighted inter-vector distances are then computed using only ITD cues, according to the 
following equation: 
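$$
D(i,j) = \sum_{n=1} \frac{\left(C_{\mathrm{IPD}}(i,n) - C_{\mathrm{IPD}}(j,n)\right)^{2}}{\sigma_{\mathrm{IPD}}(n)}
+ \sum_{n=10} \frac{\left(C_{\mathrm{IED}}(i,n) - C_{\mathrm{IED}}(j,n)\right)^{2}}{\sigma_{\mathrm{IED}}(n)}
\tag{3.22}
$$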
In the above equation, i and j represent points in the search space defined by 
(θi, φi) and (θj, φj), respectively, while n is the filter number. The points (θ, φ) are then 
grouped into 40 clusters, in such a way that for each cluster the value of D(i, j) does not 
exceed some specific threshold (or "radius"). This threshold is set such that the number of 
resulting clusters roughly corresponds to the number of discrete angles taken along the 
azimuth at 0° elevation during the generation of the cue template. This is a good choice for the 
number of clusters, since each discrete point lying at 0° elevation is likely to be attributed to a 
different "cone of confusion", which in this case defines a new cluster. The values of 
(θ, φ) contained in a particular cluster thus define a set of points over which the ITD cues 
are approximately constant. Each cluster c is characterised by the mean values CIPDmean(c, n) 
and CIEDmean(c, n) of the points inside that particular cluster (n being the filter 
number). These cluster parameters, together with the angle values which are members of 
that cluster, are computed once and saved in a database. 
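The offline clustering step can be sketched with a simple greedy ("leader") threshold clustering. This is an assumption: the thesis does not state which clustering procedure was used. The IPD and IED terms of Eq. (3.22) are concatenated here into one ITD cue vector, each entry divided by its own weight σ(n). The leader method guarantees only that every member lies within the radius of its cluster's first member, not that all pairs do. Function names are hypothetical.

```python
def itd_distance(a, b, sigma):
    """Weighted ITD-cue distance in the form of Eq. (3.22): sum of (a_n - b_n)**2 / sigma_n."""
    return sum((x - y) ** 2 / s for x, y, s in zip(a, b, sigma))


def threshold_clusters(points, itd_cues, sigma, radius):
    """Greedy ('leader') clustering of grid points by their ITD cue vectors.

    Each point joins the first cluster whose leader lies within `radius`;
    otherwise it starts a new cluster. Returns the member lists and the mean
    ITD cue vector of each cluster.
    """
    leaders, members = [], []
    for p in points:
        for k, lead in enumerate(leaders):
            if itd_distance(itd_cues[p], itd_cues[lead], sigma) <= radius:
                members[k].append(p)
                break
        else:
            leaders.append(p)
            members.append([p])
    means = [
        [sum(itd_cues[p][n] for p in group) / len(group) for n in range(len(sigma))]
        for group in members
    ]
    return members, means
```

In practice `radius` would be tuned until the number of clusters is close to the target (40 in the text), and the member lists and means stored as the cluster database.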
During the first step of the search, the cluster c which minimises the distance 
between the front-end generated ITD cue values and the cluster mean ITD cue values is 
selected. Next, the member of that cluster c which minimises the distance due to all cues 
is determined. The azimuth and elevation angles of this member thus define the 
approximate location of the sound source. As with the previous search method, a fine 
gradient-descent is finally carried out in order to retrieve the most likely angle. 
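The two selection stages at run time can be sketched as follows, assuming the clusters and their mean ITD vectors have already been computed. The weighted squared distance and the function names are illustrative assumptions, and the final gradient-descent refinement is omitted.

```python
def sq_distance(a, b, weights):
    """Weighted squared distance between two cue vectors."""
    return sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights))


def cluster_search(measured_itd, measured_all, clusters, cluster_means,
                   all_cues, itd_weights, all_weights):
    """Two-stage selection: nearest cluster by ITD cues, then best member by all cues.

    clusters      -- list of member-position lists
    cluster_means -- mean ITD cue vector of each cluster (same order)
    all_cues      -- {position: full cue vector (ITD, IID and monaural)}
    """
    best_c = min(range(len(clusters)),
                 key=lambda c: sq_distance(measured_itd, cluster_means[c], itd_weights))
    return min(clusters[best_c],
               key=lambda p: sq_distance(measured_all, all_cues[p], all_weights))
```

The number of comparisons is the number of clusters plus the size of the selected cluster, which is the count used in the comparison with a full search.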
In the current model, the biggest cluster contains 284 members. Thus, the 
maximum number of comparisons involved in determining the approximate localisation 
angle is only 324 (40 comparisons to select the cluster plus at most 284 within it), 
compared to the 2585 comparisons which would have been required had a single-step full 
search been carried out instead at the same resolution. Using this method, it is possible to 
give more weight to the IID and monaural cues when selecting the most likely member in a cluster. 
The simulations carried out in section 3.6 were also repeated using this cluster-based 
algorithm for localisation. The resulting localisation errors for variations in S/N 
ratio, cross-talk and hardware non-idealities were almost identical to those obtained using 
the 3-step algorithm, indicating that the generic algorithm performs as well as the 
previous algorithm, at least for this particular set of HRTF data. However, the cluster-based 
algorithm allows for some degree of adaptation (as with neural network models), 
which can be advantageous when used with other HRTF data sets. 
3.8 Conclusions 
A 3-step cue-to-position mapping algorithm for 2-D sound localisation has been 
developed, which uses all the interaural and monaural cues found in the biological 
auditory system, and where the cues can be hardware-extracted rather than software-based. 
Under ideal conditions, the proposed cue-to-position mapping algorithm achieves the same 
localisation accuracy as a single step search, but is significantly faster. It is also more 
robust in the presence of both environmental non-idealities, such as echo and noise, and 
hardware-induced inaccuracies, such as channel crosstalk and variations in filter Q-factor 
and centre frequency. The algorithm breaks the "2-D" search problem into two "1-D" searches 
and utilises the various types of cues in such a way that their discriminative power is 
maximised. The generic version of this algorithm, based on vector clustering, allows for 
additional adaptation to different HRTF data sets. 
In the current implementation the cue weight values were kept constant, 
independent of the filter number, when computing the likelihood values. This approach 
has been adopted since in the majority of cases the most common sound sources (such as 
speech) are typically broad-band in nature. 
The simulation results presented above highlight the conditions required when 
designing analogue 2-D localisation systems. The accuracy of the hardware system depends 
mainly on good matching between the Q-factors of the left and right filter banks, and also 
on the absolute accuracy of the centre frequency of the filter banks. Unmatched variations 
have to be minimised through appropriate circuit design and silicon layout 
techniques, while absolute variation of the centre frequency can be taken into account if 
the template is customised to the cue-generating hardware. 
Chapter 4. Hardware Detection of Onsets in a Sound Signal 
Chapter 4 
Hardware Detection of Onsets in a Sound Signal 
4.0 Introduction 
In practical applications, sound localisation is carried out in a reverberant 
environment, where echoes introduce localisation cues which do not pertain to the 
original sound source position, and can therefore introduce ambiguities in the 
position estimated by a localisation system. In the auditory system of humans and other 
animals, reflections have little influence on the perceived position, since localisation is 
mainly based on the onset of the signal: this effect is known as the precedence effect [86]. 
Thus the detection of onsets, that is, the incident portions of the audio signal, is an 
important aspect of a sound localisation system, since it provides a degree of robustness 
against ambiguities which may result from echoes. 
The hardware proposed in this chapter is based on an algorithm which is 
insensitive to non-transient noise and automatically adapts to different sound levels [35]. 
A current-mode, low-voltage CMOS circuit, which generates a digital output at instants 
where the ratio of the received audio signal to the estimated echo signal exceeds a specific 
threshold, has been designed for this purpose. This novel hardware is designed to 
operate from a supply voltage of ± 0.9 V, using a standard double-poly double-metal 0.8 µm 
CMOS technology with a threshold voltage of 0.8 V. The principle of operation, together 
with a block diagram of the onset detector adapted for analogue hardware 
implementation, is presented in section 4.1. Section 4.2 presents a brief overview of the 
design cycle of analogue and mixed-mode integrated circuits, together with some 
analogue layout issues. The actual circuitry is discussed in section 4.3, with the 
respective simulation results in section 4.4. Test results for the various stages and for the 
whole chip are then discussed in section 4.5. 
4.1 Principle of Operation 
The principle of operation of the onset detector circuit is based on the echo 
estimation model depicted in Figure 4-1. The impulse response of the received signal, 
including both the incident and echo components, is assumed to be given by: 
$$
I(t) = \begin{cases}
1, & t = 0 \\
0, & 0 < t < T_d \\
\alpha_{fe}\, e^{-(t - T_d)/\tau}, & t > T_d
\end{cases}
\tag{4.1}
$$
[Figure 4-1 plot: amplitude versus time, showing the total estimated echo envelope]
Figure 4-1: Observed sound and total estimated echo envelope for the echo model used as a 
basis of the onset detector circuit [35]. 
The corresponding original discrete-time algorithm on which the hardware is 
based relies on a maximum echo estimation method, depicted in Figure 4-2 [35]. αfe 
represents the attenuation of the first echo wave-front envelope relative to the incident 
signal envelope S(t). The initial delay from the incident signal to the first echo wave-front 
is denoted Td, while Ts is the sampling period of the system. The echo signal envelope is 
assumed to decay with a time constant τ. The block MAX outputs the maximum of 
its inputs, so that E(t) represents an estimate of the maximum echo envelope. The estimated 
echo envelope is compared to the input signal envelope S(t), and if the ratio S(t)/E(t) exceeds a 
specific threshold, an onset signal is generated. 
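A minimal discrete-time sketch of this maximum-echo estimator is given below. The exact structure of Figure 4-2 is assumed from the description: the echo estimate takes the larger of the envelope delayed by Td and scaled by αfe, and the previous estimate decayed by exp(-Ts/τ). An onset is flagged when S exceeds the threshold times E. `detect_onsets` is a hypothetical name.

```python
import math


def detect_onsets(envelope, fs, td, alpha_fe, tau, threshold):
    """Maximum-echo onset detector on a sampled envelope S[k].

    The echo estimate is E[k] = max(alpha_fe * S[k - D], exp(-Ts/tau) * E[k-1]),
    with D = Td/Ts samples; an onset is flagged when S[k] > threshold * E[k].
    """
    delay = round(td * fs)                 # Td expressed in samples
    decay = math.exp(-1.0 / (fs * tau))    # per-sample decay factor exp(-Ts/tau)
    echo = 0.0
    onsets = []
    for k, s in enumerate(envelope):
        delayed = envelope[k - delay] if k >= delay else 0.0
        echo = max(alpha_fe * delayed, decay * echo)
        onsets.append(s > threshold * echo)
    return onsets
```

With Td = 6 ms, αfe = 0.5, τ = 0.1 s and a threshold of 1.5, an impulse followed 6 ms later by its half-amplitude echo produces an onset only at the impulse; the echo stays below 1.5 times the estimate, while a fresh impulse 30 ms later is detected again.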
[Figure 4-2 block diagram; labels: S(t), αfe, Ts, MAX, estimated echo E(t)]
Figure 4-2. Discrete-time algorithm for maximum echo envelope estimation in a sound 
signal. 
The values of Td, αfe and τ depend on the acoustic environment, and therefore 
these parameters have to be programmable: typical values for Td, αfe and τ are 6 ms, 0.5 
and 0.1 s, respectively. The algorithm depicted in Figure 4-2 is implemented using 
current-mode analogue hardware, with some slight modifications. The complete block 
diagram of the onset detector chip is shown in Figure 4-3. Onset detection is carried out 
on the composite L and R signal envelope, which is extracted via a process of squaring, 
summation and low-pass filtering. The composite envelope is delayed using a delay line 
ΔTd. The coefficient αfe is compensated for in the threshold value (THR) set for onset 
detection. The echo decay model can be thought of as a half-wave rectifier with RC 
filtering. The resulting estimated echo E(t) is then multiplied by the threshold used for 
onset detection and compared to the original composite envelope signal S(t). The output 
of the comparator triggers a window generator whose pulse width and off-time are 
programmable. 
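The envelope extraction and window generation stages can be sketched in discrete time as follows. The first-order low-pass filter with corner frequency `fc`, and the hold-off behaviour of the window generator (further triggers ignored during the off-time), are assumptions made for illustration; both function names are hypothetical.

```python
import math


def composite_envelope(left, right, fs, fc):
    """Composite L/R envelope: square, sum, then first-order low-pass filter.

    fc is the low-pass corner frequency in Hz (a hypothetical parameter).
    """
    a = math.exp(-2 * math.pi * fc / fs)  # pole of the one-pole IIR low-pass
    y, out = 0.0, []
    for l, r in zip(left, right):
        y = a * y + (1 - a) * (l * l + r * r)
        out.append(y)
    return out


def onset_windows(trigger, width, off_time):
    """Window generator: on a trigger, output `width` high samples, then hold
    off for `off_time` samples during which further triggers are ignored."""
    out, busy = [], 0
    for t in trigger:
        if busy == 0 and t:
            busy = width + off_time
        out.append(busy > off_time)
        busy = max(busy - 1, 0)
    return out
```

In this sketch the comparator output would be the trigger list, and `width` and `off_time` correspond to the programmable pulse width and off-time.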
[Figure 4-3 block diagram; labels: L(t), R(t), envelope extraction, S(t), echo decay model, E(t), threshold (THR), onset]
Figure 4-3: Onset detector block diagram 
The simulated operation of the onset detector is shown in Figures 4-4 (a), (b). In 
both cases, the incident signal consists of a sequence of impulses. The sensor signal 
73 
[Figure 4-4 (a) plots: six panels versus time, s]
Figure 4-4 (a): Operation of the onset detector (τ = 0.1 s, Td = 6 ms, αfe = 0.5 for both echo 
generator and onset detector): 1st panel - incident signal; 2nd panel - signal arriving at the 
sensor, including the echo component; 3rd panel - extracted envelope after the LPF; 4th 
panel - envelope signal delayed by Td and corresponding estimated echo; 5th panel - 
output of threshold comparator; 6th panel - corresponding window signal. All signal 
levels are normalised with respect to the input. 
[Figure 4-4 (b) plots: six panels versus time, s]
Figure 4-4 (b): Operation of the onset detector (onset detector parameters kept equal to 
those in the previous case, while the echo generator τ and αfe are increased to 0.5 s and 0.8, 
respectively): 1st panel - incident signal; 2nd panel - signal arriving at the sensor, including the 
echo component; 3rd panel - extracted envelope after the LPF; 4th panel - envelope signal 
delayed by Td and corresponding estimated echo; 5th panel - output of threshold 
comparator; 6th panel - corresponding window signal. All signal levels are normalised 
with respect to the input. 
is the summation of the incident signal and the echo signal, each convolved with its 
respective HRTF. The echo signal is computed by convolving the incident signal with an 
echo impulse function modelled as an exponentially decaying pulse train (decay time 
constant τ), starting with amplitude αfe at time Td and spaced at intervals of Td. The 
threshold for onset detection was kept constant at 1.5 in both cases. In Figure 4-4 (a), 
the echo generator and onset detector parameters are identical. It can be seen that the 
onset window is correctly placed at the instants where the incident signal has the highest 
energy level compared to the echo signal. In Figure 4-4 (b), the onset detector parameters 
were maintained as in the previous case, but the parameters αfe and τ of the echo impulse 
function were increased to 0.8 and 0.5 s, respectively. It can be seen that, in this case, the 
onset detector output does not coincide with the peak energy positions of the incident signal. 
Hence, for correct operation, the onset detector has to be programmed according to the 
acoustic properties of the environment, which depend on the surface characteristics and 
the room size. 
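The echo stimulus described above can be sketched as follows. The pulse-train form follows the text (first pulse of amplitude αfe at Td, pulses spaced Td apart, exponential decay with time constant τ). The HRTF convolution of the incident and echo paths is omitted, and the function names are hypothetical.

```python
import math


def echo_impulse_response(fs, td, alpha_fe, tau, length):
    """Echo impulse function: pulses spaced Td apart, starting at t = Td with
    amplitude alpha_fe and decaying as exp(-(t - Td) / tau)."""
    step = round(td * fs)  # Td in samples (must be at least 1)
    h = [0.0] * length
    for k in range(step, length, step):
        h[k] = alpha_fe * math.exp(-(k - step) / (fs * tau))
    return h


def add_echo(incident, h):
    """Sensor signal = incident + (incident convolved with h), truncated to len(incident)."""
    out = list(incident)
    for k, hk in enumerate(h):
        if hk:
            for n in range(len(incident) - k):
                out[n + k] += hk * incident[n]
    return out
```

Raising `alpha_fe` and `tau` in the generator while keeping the detector parameters fixed reproduces the mismatch condition of Figure 4-4 (b).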
4.2 Analogue and Mixed-Mode VLSI Design Overview 
4.2.1 Analogue VLSI Design 
Analogue VLSI design is essentially a full-custom process, starting with 
schematic capture, where all MOS devices, capacitors and resistors are sized in order to 
obtain the required circuit functionality. The circuit performance can be checked via 
simulation tools such as SpectreS, which allows d.c. operating point, a.c. (small-signal) sweep, 
noise and transient analyses. Three types of simulation are carried out in this case, using 
three different MOS models, namely nominal, worst speed and worst power. Worst speed 
simulation uses the highest threshold voltages and parasitic capacitances: this 
simulation is essential for low-voltage designs. Worst power simulation uses 
the lowest threshold voltage values. 
After correct functionality has been verified, the circuit layout has to be 
drawn. In contrast to digital design, standard cell libraries do not exist in analogue 
design, and each device has to be laid out and connected manually, taking into account 
matching and parasitic-element constraints. The layout is checked using design rule 
check (DRC) and layout versus schematic (LVS) tools. The DRC tool checks that the 
layout conforms to the technology design rules, which include parameters such as 
minimum feature sizes, minimum layer spacings and layer stacking capability. The LVS 
tool is used to check that the layout truly maps the schematic in terms of both connections 
and device sizing, using the input and output pins as a starting point reference. The final 
layout has to be completed with the input/output and supply pins and the associated pad 
rings and scribe outline. 
The next stage after the layout process is parasitic element extraction and back 
annotation, where parasitic elements (capacitances, gate resistances and diodes) 
introduced in the layout are identified and added to the original schematic netlist. A 
simulation is then carried out using this netlist in order to ensure that the circuit still 
functions as required. Once the layout is ready, an appropriate package has to be selected 
and a bonding diagram for the pin-pad connections is drawn. 
4.2.2 Mixed-Mode IC Design 
In a mixed-mode design, the analogue section is designed in essentially the same 
manner described in section 4.2.1 except that the pads and associated pad rings are not 
placed at this stage. The design of the digital section is, however, more automated and is 
carried out in a different way. The digital section design starts with a hardware 
description language (HDL) code: in this case Verilog was used. A Verilog simulation is 
carried out in order to verify the high level functionality of the HDL-defined digital block. 
The HDL-defined digital block is then simulated together with the analogue circuitry 
using a mixed-mode simulation tool such as SpectreSVerilog. In order to carry out the 
simulation, the circuit first has to be partitioned into analogue and digital sections; 
between them, interface elements (IEs) are introduced for simulation purposes, which act as 
1-bit A/D and D/A converters. By default, the IE parameters are set for a digital circuit 
operating with a 5 V supply. Since in this case the circuit has to operate with a supply 
of ± 0.9 V, the IE parameters have to be changed accordingly. 
Once satisfactory functionality has been obtained, the HDL code is converted 
into a schematic using a synthesis tool such as Synergy. During synthesis, several 
parameters such as timing, fanout constraints and implementation methods have to be 
defined. The synthesis tool uses a pre-defined library of primitive cells in order to 
generate the circuit. The resulting digital circuit can then be simulated using either 
gate-level Verilog simulation or SpectreS transistor-level simulation. In this case, the digital 
primitives library was originally designed to operate at 3.3 V; even so, it was found that 
correct operation (at a reduced speed) is still achieved at voltages down to 1 V. However, 
since the circuit is intended to operate at ± 0.9 V, it is important that correct operation is 
verified via a transistor-level simulation taking into account the loading effect of the 
analogue circuitry, since the gate-level models are only accurate when the circuit is 
operated at the nominal primitive cell library supply voltage. 
Layout of the digital section consists of the following automated stages: 
placement of gates and interconnect space, power routing and signal routing. Routing between 
the digital and analogue sections can also be done automatically, provided an 
abstract view of the analogue section is prepared in which the access direction of each pin to 
be connected is defined. It is important to have some degree of isolation (via guard rings 
in this case) between the analogue and digital sections in order to prevent digital noise 
from being injected into the analogue circuitry. The layout process then continues with 
the DRC and subsequent steps described in the previous section. 
4.2.3 Analogue Layout Techniques 
During the layout stage of an analogue circuit, it is important to take into account 
any matching constraints which are essential for the correct functionality of the circuit. It 
is also important that some layout considerations are included in the schematic itself: this 
is known as layout-oriented design. As an example, if a scaled current mirror of ratio 
1:1.5 is required, the best approach is to use 5 identical transistors (2 for the input side 
and 3 for the output side), since identical devices match better than devices of different 
sizes. Furthermore, for devices which have to be closely matched (such as differential 
pairs), minimum device sizes should be avoided. 
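The unit-device rule can be phrased as finding the smallest whole numbers of identical transistors whose output-to-input count equals the required mirror ratio. A minimal sketch, assuming Python's standard fractions module; unit_devices is a hypothetical helper and max_den is an assumed limit on the number of units per side.

```python
from fractions import Fraction


def unit_devices(ratio, max_den=10):
    """Return (n_in, n_out): the smallest numbers of identical unit transistors
    whose output/input ratio equals `ratio` (searched up to max_den units)."""
    r = Fraction(ratio).limit_denominator(max_den)
    return r.denominator, r.numerator


n_in, n_out = unit_devices(1.5)
print(n_in, n_out, n_in + n_out)  # 2 3 5
```

For the 1:1.5 mirror discussed above this gives 2 input and 3 output devices, 5 in total.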
Devices to be matched have to be placed in close proximity and should have the 
same orientation and boundary conditions in the layout. Matching can be further 
enhanced via the use of interdigitation, common-centroid and dummy strip techniques. 
Interdigitation also helps to reduce parasitics. A typical common-centroid layout is 
shown in Figure 4-5 for transistors M1-M4 of the translinear loop shown in Chapter 2, 
Figure 2-4. In this particular case, transistor pairs M1, M2 and M3, M4 have to be matched. 
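Whether an arrangement is common-centroid can be checked by comparing the centroids of each device's unit fingers on a grid. In this sketch the first row follows the finger order legible in Figure 4-5, while the second, mirrored row is an assumption made for illustration; a single interdigitated row on its own does not share a centroid.

```python
from collections import defaultdict


def centroids(layout):
    """Centroid (x, y) of each device's unit fingers. layout is a list of rows;
    the finger in row r, column c is taken to sit at (c, r) on a unit grid."""
    acc = defaultdict(lambda: [0.0, 0.0, 0])
    for y, row in enumerate(layout):
        for x, name in enumerate(row):
            acc[name][0] += x
            acc[name][1] += y
            acc[name][2] += 1
    return {name: (sx / n, sy / n) for name, (sx, sy, n) in acc.items()}


def is_common_centroid(layout, a, b, tol=1e-9):
    """True if devices a and b share the same centroid within tol."""
    c = centroids(layout)
    return abs(c[a][0] - c[b][0]) < tol and abs(c[a][1] - c[b][1]) < tol


ROW = ["M2", "M1", "M2", "M1", "M4", "M3", "M4", "M3"]       # legible row of Figure 4-5
MIRRORED = ["M1", "M2", "M1", "M2", "M3", "M4", "M3", "M4"]  # assumed second row

print(is_common_centroid([ROW], "M1", "M2"))            # False
print(is_common_centroid([ROW, MIRRORED], "M1", "M2"))  # True
print(is_common_centroid([ROW, MIRRORED], "M3", "M4"))  # True
```

The mirrored second row cancels the one-pitch offset that a single interdigitated row leaves between each matched pair, which is why 2-D cross-coupled arrangements are preferred for pairs such as M1, M2.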
4.3 Implementation 
Although the concept of onset detection described in Section 4.1 has already been 
implemented in software, no hardware implementations have been reported. The 
hardware architecture adopted relies on a current-mode topology, which allows for 
low-voltage and low-power operation. Log-domain building blocks have been used for the 
envelope extraction filter and for the echo decay model: this technique allows a compact 
design to be achieved while still maintaining good linearity and ease of 
programmability. An S2I delay line is used for the implementation of Td. The translinear 
loops involved in the log-domain blocks have been designed to be insensitive to the body 
effect, which is present in CMOS but not in bipolar technology. A current-mode 
approach also allows compact implementation of some of the mathematical functions 
involved, such as signal addition and squaring. In particular, the squaring circuit, echo 
decay model and S2I delay presented here are novel building blocks. 
[Layout plot: unit devices of M1-M4 (one row reads M2 M1 M2 M1 M4 M3 M4 M3) with drain, gate and source connections]
Figure 4-5. Layout of the translinear loop shown in Chapter 2, Figure 2-4, excluding the 
tail transistors M5, M6. Matching of transistors M1, M2 (and M3, M4) is ensured via the 
common-centroid technique. 
4.3.1 Voltage-to-Current Converter and Harmonic Mean Splitter 
Most of the onset detector chip is designed in a fully differential topology in order 
to reduce the effects of systematic offsets and common-mode noise, such as noise injected 
via the power supply. Hence a single-ended to differential converter is required at the 
input stage. The circuit diagram of the V-I converter (which has to be duplicated for the L 
and R inputs) and the associated single-ended to differential current converter is shown in 
Figure 4-6. The V-I converter is built around M15-23. The values of R1 and R2 are equal 
and much higher than 1/gm22,23. The output current of the V-I converter is given by: 
$$I_{23} - I_{18} = \frac{V_{in+} - V_{in-}}{R_1} \qquad (4.1)$$
The V-I converter can be used with both single-ended and differential inputs: 
single-ended input operation is achieved by leaving one of the inputs open or tying it to 
Vin-Ref. Since the currents through M20, M22 and M23 are approximately equal, their 
source voltages will be at the same potential. Thus Vin-Ref is used to set the common-mode 
input level of Vin+ and Vin-, which must be at least one saturation voltage above VSS. Most 
of the processing in the onset detector chip is carried out using log-domain circuits: for 
differential log-domain circuits, either a geometric mean splitter [109] or a harmonic mean 
splitter [110] can be used in order to ensure that both complementary current signals 
remain strictly positive. A harmonic mean splitter adapted for low supply voltage 
operation is used here to transform the single-ended current generated by the V-I 
converter into two differential current outputs. Devices M1-5 operate in weak inversion; 
thus, using the translinear principle: 
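$$I_{1,2}\, I_4 = I_3\, I_5$$
$$\frac{I_{o+}\, I_{o-}}{I_{o+} + I_{o-}} = \frac{I_{dc}}{2}, \quad \text{where} \quad I_{o+} - I_{o-} = \frac{V_{in+} - V_{in-}}{R_1} \qquad (4.2)$$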
[Circuit diagram: inputs Vin+, Vin- and Vin-Ref; resistors R1, R2; transistors M1-M23; bias currents Ibias and Idc; outputs Io+ and Io-; negative supply VSS]
Figure 4-6: V-I converter and harmonic mean splitter. 
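Eliminating Io+ between the two relations of Eq. (4.2) gives a quadratic in Io-, whose positive root can be used to model the splitter behaviourally. The sketch below is a closed-form derivation of ours, not a circuit-level model; it assumes ideal translinear behaviour and reuses the 500 mV pk, 1 MΩ and Idc = 100 nA operating point quoted in the simulation results. The function names are illustrative.

```python
import math


def v_to_i(v_in_plus, v_in_minus, r1=1e6):
    """Differential output current of the V-I converter, Eq. (4.1)."""
    return (v_in_plus - v_in_minus) / r1


def harmonic_mean_split(i_diff, i_dc):
    """Return (i_plus, i_minus) such that i_plus - i_minus = i_diff and
    i_plus * i_minus / (i_plus + i_minus) = i_dc / 2, i.e. Eq. (4.2).
    Uses the positive root of the resulting quadratic in i_minus."""
    root = math.hypot(i_diff, i_dc)          # sqrt(i_diff**2 + i_dc**2)
    i_minus = (i_dc - i_diff + root) / 2
    i_plus = (i_dc + i_diff + root) / 2
    return i_plus, i_minus


# 500 mV peak into 1 MOhm with I_dc = 100 nA, as in the simulation section
i_plus, i_minus = harmonic_mean_split(v_to_i(0.5, 0.0), 100e-9)
print(f"{i_plus * 1e9:.1f} nA, {i_minus * 1e9:.1f} nA")  # 555.0 nA, 55.0 nA
```

At zero input both outputs equal Idc, and for large signals the smaller output approaches Idc/2 rather than zero, which is the Class AB behaviour the splitter is meant to provide.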
4.3.2 Squaring Circuit 
A single translinear loop cannot be used for current-mode signal squaring, since 
the differentially represented input signal is essentially bipolar and an adequate (positive) 
dc current has to be ensured through each MOS device. The proposed solution is based on 
three translinear loops as shown in Figure 4-7. This topology ensures that an adequate dc 
component exists in all branches, since each current term through each device is strictly 
positive provided Iin+ and Iin- are also positive. Using the translinear principle it can be 
shown that: 
$$I_{+} = I_4 + I_{13} = \frac{(I_{in+})^2}{I_{ref}} + \frac{(I_{in-})^2}{I_{ref}}, \qquad I_{-} = \frac{2\, I_{in+}\, I_{in-}}{I_{ref}}$$
Hence the differential output is given by: 
$$I_{+} - I_{-} = \frac{(I_{in+} - I_{in-})^2}{I_{ref}} = \frac{I_{in}^2}{I_{ref}} \qquad (4.3)$$
[Circuit diagram: three interconnected translinear loops; legible device labels include M19, M20 and M22-M25]
Figure 4-7: Squaring circuit consisting of three translinear loops. 
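The relations above can be checked numerically by driving the squaring circuit from the splitter with one period of a sinusoid. This is a behavioural sketch assuming ideal translinear loops, with Iref = 200 nA, Idc = 100 nA and a 500 nA peak input (the simulation conditions quoted later in this chapter); the function names are illustrative.

```python
import math


def harmonic_mean_split(i_diff, i_dc):
    """Eq. (4.2) splitter in closed form: returns (i_plus, i_minus)."""
    root = math.hypot(i_diff, i_dc)
    return (i_dc + i_diff + root) / 2, (i_dc - i_diff + root) / 2


def square_outputs(i_in_plus, i_in_minus, i_ref):
    """Complementary squaring-circuit outputs; their difference is Eq. (4.3)."""
    i_p = (i_in_plus ** 2 + i_in_minus ** 2) / i_ref
    i_m = 2 * i_in_plus * i_in_minus / i_ref
    return i_p, i_m


N, I_REF, I_DC, I_PK = 1000, 200e-9, 100e-9, 500e-9
diff = []
for k in range(N):  # one period of a sinusoidal input
    i_in = I_PK * math.sin(2 * math.pi * k / N)
    i_p, i_m = square_outputs(*harmonic_mean_split(i_in, I_DC), I_REF)
    diff.append(i_p - i_m)

peak, mean = max(diff), sum(diff) / N
print(f"peak {peak * 1e6:.3f} uA, mean {mean * 1e9:.1f} nA")  # peak 1.250 uA, mean 625.0 nA
```

The 1.25 µA peak matches the theoretical value quoted for Figure 4-18, and the 625 nA mean is the mean-square value that the envelope LPF is expected to extract.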
4.3.3 Envelope LPF 
[Signal flow graph (not recoverable from the scan)]
Figure 4-8: Signal flow graph for the envelope LPF 
The signal flow graph for the LPF used for extraction of the signal envelope from 
the squared signal is shown in Figure 4-8. Non-zero quiescent conditions for the state 
variables exist provided the input is non-zero, making the signal flow graph useful for 
log-domain applications. 
4.3.3.1 Log-Domain Circuit Synthesis 
The log-domain circuit is based on the three main building blocks shown in 
Figure 4-9, which comprise a signal compander, a log-domain integrator and a signal 
expander. These blocks have been adapted from the BiCMOS versions proposed in [111]. 
[Schematics: log compander (input current Iin, device M1, output voltage Va); log-domain integrator (devices M2, M3, inputs Vin+ and Vin-); expander (device M5, input Vin, output current Io)]
Figure 4-9: Log-domain building blocks 
For correct operation, it is important that M1-5 operate in weak inversion and are 
matched. The number of inputs in the integrator can be increased by simply adding 
transistors in parallel with M2 and/or M3. Although these blocks were originally intended 
for BiCMOS implementation, they can be readily used in CMOS processes: signals are 
coupled from one stage to another via the source terminal, hence the body-effect term 
is identical for the driving and driven transistors and thus cancels out. Compared to 
bipolar devices, however, CMOS devices exhibit exponential ID-VGS behaviour over a much 
smaller current range, as limited by the subthreshold region; this property limits 
the dynamic range of CMOS translinear circuits. On the other hand, CMOS translinear 
circuits do not suffer from the base-current effects present in bipolar translinear circuits, 
which can lead to distortion. 
The actual implementation is based on two 2nd-order log-domain sections in a 
pseudo-differential topology, as shown in Figure 4-10, using the log-domain building 
blocks introduced above. In this circuit M1-8 operate in weak inversion. The circuit 
operation can be verified as follows: 
Translinear equations: 
$$I_{in}\, I_{tune}^2 = I_4\, I_6\, I_{out}, \qquad I_2\, I_6 = I_{tune}^2$$
Capacitor equations: 
$$C\,\frac{dV_{C1}}{dt} = I_2 - I_4, \qquad C\,\frac{dV_{C2}}{dt} = I_{tune} - I_6$$
Drain current equations: 
$$I_{tune} = I_{D0}\, e^{(V_{C1} - V_1 - V_T)/nU_T}, \quad I_6 = I_{D0}\, e^{(V_{C2} - V_1 - V_T)/nU_T} \;\Rightarrow\; I_{tune} = I_6\, e^{(V_{C1} - V_{C2})/nU_T}$$
$$I_{tune} = I_{D0}\, e^{(V_{C2} - V_2 - V_T)/nU_T}, \quad I_{out} = I_{D0}\, e^{(V_B - V_2 - V_T)/nU_T} \;\Rightarrow\; I_{tune} = I_{out}\, e^{(V_{C2} - V_B)/nU_T}$$
Simplifying the above equations, the transfer function of this circuit can be verified as: 
$$\frac{I_{out}(s)}{I_{in}(s)} = \frac{\omega_n^2}{s^2 + \omega_n s + \omega_n^2} \qquad (4.4)$$
where ωn = Itune/(nCUT). The -3 dB cut-off frequency is given by: 
$$\omega_{-3dB} = \omega_n \sqrt{\frac{1 + \sqrt{5}}{2}} \approx 1.27\,\omega_n \qquad (4.5)$$
The current through M3 is always equal to Itune. However, the current source Itune and M3 
are still required in order to ensure correct start-up of the filter. 
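The magnitude of Eq. (4.4) depends only on ω/ωn, so the -3 dB point of Eq. (4.5) and the mild peaking expected for this damping factor can be checked directly. A minimal sketch; the slope factor n and thermal voltage UT passed to natural_freq_hz are assumptions that must be supplied for a given process.

```python
import math


def lpf_gain(x):
    """|H(jw)| of Eq. (4.4) as a function of x = w / w_n."""
    return 1.0 / math.sqrt((1.0 - x * x) ** 2 + x * x)


def natural_freq_hz(i_tune, c, n, u_t=0.0259):
    """f_n = I_tune / (2*pi*n*C*U_T); n and U_T are process assumptions."""
    return i_tune / (2.0 * math.pi * n * c * u_t)


x_3db = math.sqrt((1.0 + math.sqrt(5.0)) / 2.0)                      # Eq. (4.5): about 1.272
x_peak = 1.0 / math.sqrt(2.0)                                        # peak location for this damping
peak_db = 20.0 * math.log10(lpf_gain(x_peak))                        # about 1.25 dB
rolloff_db = 20.0 * math.log10(lpf_gain(100.0) / lpf_gain(1000.0))   # about 40 dB per decade
print(f"{x_3db:.3f} {20 * math.log10(lpf_gain(x_3db)):.2f} dB {peak_db:.2f} dB {rolloff_db:.1f} dB")
```

The roughly 1.25 dB peak near 0.71 ωn is the peaking referred to in the measured results, and ωn (hence the cut-off) scales linearly with Itune.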
[Circuit schematic: one side of the 2nd-order log-domain LPF, with bias voltage VB]
Figure 4-10: One side of the 2nd-order LPF 
4.3.4 Delay Line 
A continuous-time delay section is not feasible for this application owing to the 
relatively large delay required. Consider a continuous-time delay element with the 
first-order all-pass transfer function: 
$$H(s) = \frac{s - \omega_0}{s + \omega_0}, \qquad T_d(\omega) = \frac{2}{\omega}\tan^{-1}\frac{\omega}{\omega_0} \qquad (4.6)$$
For Td(ω) to be approximately independent of ω, ω0 must be set to at least 5 times ωmax, 
which gives a 1.3 % error in Td. Taking ωmax to be 15.7 krad/s (5 times the LPF corner 
frequency) implies that ω0 has to be set to 78.5 krad/s, and hence the achievable delay per 
element, Td ≈ 2/ω0, is around 25.5 µs. To achieve a total delay of 6 ms (as used in the 
high-level simulations), 235 cascaded delay elements would be required. Such a cascade is 
not practical owing to area, noise, distortion and offset problems. On the other hand, if a 
discrete-time system is used with a sampling frequency of, say, 5 kHz (10 times the LPF 
corner frequency), only 30 delay elements are required. In addition, SC and SI circuits 
provide much better accuracy than continuous-time circuits, at the expense of higher 
power consumption. 
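The sizing argument above can be reproduced in a few lines. This sketch evaluates Eq. (4.6) at ωmax = 15.7 krad/s with ω0 = 5ωmax and compares the number of all-pass sections needed for 6 ms with the number of samples needed at 5 kHz; it uses only values quoted in the text.

```python
import math


def allpass_delay(w, w0):
    """Phase delay of H(s) = (s - w0)/(s + w0), Eq. (4.6); tends to 2/w0 as w -> 0."""
    if w == 0:
        return 2.0 / w0
    return (2.0 / w) * math.atan(w / w0)


W_MAX = 15.7e3   # rad/s (5 x LPF corner), from the text
W0 = 5 * W_MAX   # 78.5 krad/s
td0 = allpass_delay(0.0, W0)                 # delay per section
err = 1 - allpass_delay(W_MAX, W0) / td0     # delay error at w_max
n_cont = 6e-3 / td0                          # all-pass sections for 6 ms
n_disc = round(6e-3 * 5e3)                   # samples at fs = 5 kHz
print(f"{td0 * 1e6:.1f} us, {err * 100:.1f} %, {n_cont:.1f}, {n_disc}")  # 25.5 us, 1.3 %, 235.5, 30
```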
4.3.4.1 Class AB S2I Differential ± 0.7 V Current Memories 
Two novel Class AB double-sampled switched current memory cells have been 
investigated for the purpose of the discrete-time delay line [112]. These cells exhibit a 
quiescent current that can be accurately controlled and is independent of the supply 
voltage. Furthermore, it is ensured that the minimum current passing through the memory 
transistors is equal to a well defined value, throughout the signal range. The first memory 
cell is based on a fully-differential architecture, while the second one can be used in 
single-ended or pseudo-differential circuits. 
Switched-current memory cells (MCs) constitute a fundamental building block in 
current-mode circuits and can be used in applications such as delay lines, filters and 
ADCs. Class AB operation of these MCs allows a wide signal range while still 
maintaining a low quiescent current. It is also desirable that the MC can operate from a 
low supply voltage, in line with today's reduced technology linewidths. Class AB 
SI MCs have been reported [113], [114]; however, these circuits require a minimum 
supply voltage of twice the threshold voltage VT to operate correctly, and their quiescent 
current depends mainly on the supply voltage. A feedback technique for controlling the 
quiescent current in a basic single-ended Class AB MC, which is compatible with low-voltage 
operation, has been proposed [115]; it uses a high-gain feedback loop containing a 
high-impedance node, which, however, may result in speed or stability problems. 
The coarse-fine (S2I) MC [116] is a well-known technique for eliminating the various 
errors which arise in the basic SI MC. A Class AB version of this cell has been 
proposed [117]; however, the quiescent current in this case still depends mainly on the 
supply voltage. The first MC investigated here is a fully differential structure 
with a stable feedback current control mechanism which is not prone to slew-rate-limited 
speed. The second architecture uses feed-forward current control and may be used in 
single-ended or pseudo-differential systems. In both cases, the fine memory cell may be 
operated in Class A or Class AB, and the circuits can operate at a minimum supply 
voltage of VT + 2VDS,sat, taking into account the body effect on the value of VT. 
4.3.4.1.1 Circuit Implementation I 
The circuit shown in Figure 4-11 is a true differential Class AB S2I MC. M1A,B 
and M2A,B constitute the main MC, while M15A,B and M16A,B form the fine MC. Suppose 
the current through M9A,B is equal to ICQ: then during the coarse memorising phase φ1, 
I1A = kI7B = kI5B = I2B and I1B = kI7A = kI5A = I2A, where Iin+ = I2A - I1A, Iin- = I2B - I1B 
and k = (W/L)1/(W/L)7 = (W/L)2/(W/L)5. During this phase, the currents through the fine 
MC transistors are set equal to IFQ. The structure formed by M11A,B and M10A,B 
(which are equal in size to M5A,B) is used to detect the minimum current through M5A and 
M5B. When I5A = I5B, M11A and M10B (M11B and M10A) may be regarded as a single device 
with twice the gate length of M5A (M5B), and thus the current flowing through M12 will be 
equal to the current flowing through M5A (M5B). When I5A >> I5B, M10A will operate in the 
triode region, while M11A will act as a cascode transistor to M10B; the current through M12 
will thus be equal to 2I5B. This current is mirrored to M9A,B and compared to ICQ, and the 
resulting current difference is used to adjust the common-mode currents through M1A and 
M1B. Thus this arrangement ensures a minimum current through M2A,B of kICQ/2 when 
|I2A - I2B| >> 0, and of kICQ when I2A = I2B, allowing Class AB operation with accurate 
quiescent current control determined by ICQ, which can be set independently of the supply 
voltage. The feedback loop contains no high-impedance nodes, and thus a fast response with 
good stability is easily achieved. During the fine memorising phase φ2, the main current 
control circuit is disabled and the current through M1A,B is kept constant. During this 
phase, the residual error current is memorised via M16A,B. Since the residual error current 
is typically low, the fine MC can operate in Class A without limiting the circuit 
performance, in which case I15A,B is kept constant and equal to IFQ. However, it is still 
possible to introduce a low-swing Class AB operation of the fine MC by including the 
dotted components CXA,B, CF2A,B and two additional switches, in which case the currents 
through M15A,B can be altered during φ2. The memorised current is read during φ3: for 
delay line applications, the φ3-switch is implemented as the input sampling switch of the 
succeeding stage. Resistors RCA,B are set equal to the on-resistance of the φ3-switch: 
their function is to ensure that the voltage at the drains of M3A,B and M4A,B will be the 
same (= VGS2A,B - Iin RCA,B) during both the φ1,2 and φ3 phases, such that the finite 
output conductance of the transistors does not affect the accuracy of the circuit. 
These resistors are implemented using permanently switched-on MOS switches. 
4.3.4.1.2 Circuit Implementation II 
An alternative approach for low-voltage Class AB S2I MCs is shown in 
Figure 4-12, where control of the current is ensured via a feed-forward arrangement rather 
than a feedback mechanism as in the previous case, thus rendering the system inherently 
stable at the expense of requiring an additional replica, Iin,AUX, of the input current. 
The circuit in Figure 4-12 shows an S2I cell intended to be used in a pseudo-differential 
current-mode architecture. The coarse MC is formed around M11 and M14, while the fine 
MC is formed around M8 and M5. Transistors M1,4,11,14 generate a replica of the output 
current, Io,AUX, which is required by the subsequent memory cell. During the coarse 
memorising phase φ1, M23 acts as a half-wave rectifier, and thus the current through M14 is 
set to ICQ if Iin,AUX > 0 or to ICQ + |Iin,AUX| if Iin,AUX < 0. In this way, it is ensured that 
the minimum current through M11 or M14 is equal to ICQ, which is also equal to the 
quiescent current and is independent of the supply voltage. In this phase, the currents 
through the fine MC transistors are set to IFQ. During the fine memorising phase φ2, the 
current through M14 is kept constant while the residual error current is memorised via M8. 
As in the previous circuit, the fine MC can be made to operate in Class AB via the addition 
of the dotted components. The memorised current is read during φ3. The output replica 
Io,AUX incurs an error due to fabrication process mismatches in M4,8 and M11,17; however, 
since this current is only used to control the current through M14 of the subsequent stage, 
this error does not ultimately affect the accuracy of the circuit. 
[Circuit schematic: transistors M1A,B-M18A,B and M12; capacitors CC1A,B, CC2A,B, CF1, CF2A,B and CXA,B; resistors RCA,B; switches φ1-φ3; bias voltages VCP and VCN; currents ICQ and IFQ]
Figure 4-11: Proposed differential S2I cell using feedback current control. Dotted 
components are shown for Class AB operation of the fine memory cell. 
[Circuit schematic: transistors M1-M23; bias voltages VCP and VCN; currents ICQ, IFQ, Iin,AUX and Io,AUX; switches φ1-φ3]
Figure 4-12: Proposed S2I cell intended for pseudo-differential operation using 
feed-forward current control. Dotted components are shown for Class AB operation of the 
fine memory cell. 
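The feed-forward control can be summarised behaviourally: the current through one coarse-MC transistor is boosted by the rectified negative part of the input, so that neither transistor falls below ICQ. The sketch assumes the sign convention Iin = I11 - I14 and ideal current mirrors; feedforward_class_ab is a hypothetical name, and the sweep reuses the ±50 µA range and ICQ = 2 µA quoted in the simulation results.

```python
def feedforward_class_ab(i_in, i_cq):
    """Return (i_11, i_14) for input i_in, assuming i_in = i_11 - i_14.
    The rectified negative part of the input (the role attributed to M23)
    is added to i_14, so the smaller of the two currents never drops below i_cq."""
    i_14 = i_cq + max(0.0, -i_in)
    i_11 = i_14 + i_in
    return i_11, i_14


I_CQ = 2e-6                                  # second cell, from the text
sweep = [k * 1e-6 for k in range(-50, 51)]   # -50 uA ... 50 uA
pairs = [feedforward_class_ab(i, I_CQ) for i in sweep]
print(min(min(p) for p in pairs))            # smallest device current over the sweep
```

Because the control is a direct (feed-forward) computation rather than a loop, there is nothing to stabilise; the price, as noted above, is the extra input replica needed to drive the rectifier.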
4.3.4.1.3 Simulation Results 
Both MCs have been designed using a CMOS process with a nominal threshold 
voltage of 0.8 V and simulated at a supply voltage of ± 0.7 V. Figure 4-13 shows the 
currents through the coarse MC pMOS and nMOS transistors for an input current range of 
-50 µA to 50 µA. In the first case, ICQ was set to 1 µA and k = 5, while in the second case 
ICQ = 2 µA. It can be seen that the quiescent current is well defined and that the current 
reaches a non-zero minimum value throughout the signal range. Figure 4-14 shows the 
transient simulation of the first MC connected as a delay line, clocked at 200 kHz, with a 
20 µA pk sinusoidal current input applied to the first stage. The transmission error 
between two stages is 0.05 %. For both MCs, the main speed limitation comes from the 
considerable on-resistance of the switches at low supply voltages. The speed can be 
improved by increasing the aspect ratio of the switch transistors, but for very high 
sampling frequencies, clock voltage boosting is required. 
[Plots (a) and (b): coarse MC nMOS and pMOS currents (0 to 60 µA) versus input current (-50 µA to 50 µA); the quiescent level is marked kICQ in (a) and ICQ in (b)]
Figure 4-13: Simulation results showing the coarse MC pMOS and nMOS currents for (a) 
the first S2I cell and (b) the second S2I cell. 
[Plots (a) and (b): delay line output current (µA) versus time (0 to 300 µs)]
Figure 4-14: Transient simulation results for the first S2I cell used in a delay line after 
(a) the 1st stage and (b) the 11th stage, for a sinusoidal input current. 
In the actual implementation, 31 pseudo-differential Class AB S2I current memory 
cells (described in Section 4.3.4.1.2) have been used for the delay line: the 
pseudo-differential version is used since the first MC version is intended for applications 
where Iin+ = -Iin-, whereas in this case both Iin+ and Iin- are positive. The value of Td is 
programmable from 1 to 20 ms by varying the clock frequency using an on-chip 
frequency divider. At high values of Td, the sampling frequency is reduced, and thus the 
LPF cut-off frequency has to be lowered accordingly in order to avoid aliasing. The 
log-domain (LD) circuits operate with signal currents of the order of a few µA in order to 
ensure subthreshold operation of the MOS devices. However, the S2I line is capable of 
handling signals of around 100 µA. Hence, in order to improve the dynamic range, 
scaling-up (×10) and scaling-down (÷10) current mirrors are used before and after the 
S2I line, respectively. 
4.3.5 Echo Decay Model 
The echo decay is modelled using the following rule: 
$$I_o(t + \Delta t) = \begin{cases} I_{in}(t) & \text{for } I_{in}(t) > I_o(t) \\ I_o(t)\, e^{-\Delta t/\tau} & \text{for } I_{in}(t) < I_o(t) \end{cases} \qquad (4.7)$$
The echo decay model is based on a pseudo-differential circuit. Since in the proposed 
circuit the echo signal is represented by two complementary signals, where 
Io(t) = Io+(t) - Io-(t), the above equations need to be modified as follows: 
$$I_{o-}(t + \Delta t) = \begin{cases} I_{in-}(t) & \text{for } I_{in-}(t) < I_{o-}(t) \\ I_{dc} + \left(I_{o-}(t) - I_{dc}\right) e^{-\Delta t/\tau} & \text{for } I_{in-}(t) > I_{o-}(t) \end{cases} \qquad (4.8a)$$
$$I_{o+}(t + \Delta t) = \begin{cases} I_{in+}(t) & \text{for } I_{in+}(t) > I_{o+}(t) \\ I_{dc} + \left(I_{o+}(t) - I_{dc}\right) e^{-\Delta t/\tau} & \text{for } I_{in+}(t) < I_{o+}(t) \end{cases} \qquad (4.8b)$$
where Idc = (Iin+ + Iin-)/2. This modification is required in order to ensure that the dc 
component is not attenuated. The corresponding circuit used to implement the above 
model is shown in Figure 4-15. 
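Equations (4.7) and (4.8) can be compared in discrete time. The sketch below assumes a constant common-mode level Idc and an arbitrary step size Δt/τ = 0.1; it shows that the pseudo-differential update reproduces the single-ended model in the difference Io+ - Io- while holding the common mode at Idc. The function names and the test envelope are illustrative.

```python
import math


def echo_single(i_in, dt_over_tau):
    """Eq. (4.7): follow a rising input, otherwise decay by exp(-dt/tau)."""
    a = math.exp(-dt_over_tau)
    i_o, out = 0.0, []
    for x in i_in:
        i_o = x if x > i_o else i_o * a
        out.append(i_o)
    return out


def echo_differential(i_in_plus, i_in_minus, dt_over_tau):
    """Eqs. (4.8a)/(4.8b): each side decays towards I_dc instead of zero."""
    a = math.exp(-dt_over_tau)
    o_p = o_m = (i_in_plus[0] + i_in_minus[0]) / 2   # start at the common mode
    out = []
    for ip, im in zip(i_in_plus, i_in_minus):
        i_dc = (ip + im) / 2
        o_p = ip if ip > o_p else i_dc + (o_p - i_dc) * a   # Eq. (4.8b)
        o_m = im if im < o_m else i_dc + (o_m - i_dc) * a   # Eq. (4.8a)
        out.append((o_p, o_m))
    return out


# Illustrative envelope (arbitrary units) and a constant common mode of 1.0
env = [0.0, 0.5, 1.0, 0.8, 0.2, 0.0, 0.0, 0.0, 0.9, 0.3, 0.0, 0.0]
i_dc = 1.0
plus = [i_dc + e / 2 for e in env]
minus = [i_dc - e / 2 for e in env]

single = echo_single(env, 0.1)
diff = [p - m for p, m in echo_differential(plus, minus, 0.1)]
for s, d in zip(single, diff):
    print(f"{s:.4f}  {d:.4f}")
```

Both columns agree: decaying each side towards Idc, rather than towards zero, scales the differential echo by e^(-Δt/τ) while leaving the dc component untouched.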
The circuit is based on two 1st-order lowpass sections, one handling Iin+ and 
the other Iin-. Devices M1_6 and M1_23 operate in weak inversion. Devices M14,32 and 
M15,31 are used to extract the common-mode component Idc of Iin+(t) and Iin-(t), where 
Idc = (Iin+ + Iin-)/2. When Iin+(t) < Io+(t), the current through M4 will be less than Itune, 
causing M9,16,10 to be turned off. Using the translinear principle for M2,3,5,6 results in: 
$$I_C(t) = I_{tune} - \frac{I_{tune}\, I_{dc}}{I_{o+}(t)}$$
Also, considering M5, M6 and C: 
$$\frac{dI_{o+}(t)}{dt} = -\frac{I_{o+}(t)\, I_C(t)}{nCU_T}$$
Thus the output current Io+(t) may be written as: 
$$I_{o+}(t) = I_{dc} + \left(I_{o+}(0) - I_{dc}\right) e^{-t/\tau}$$
where τ = nCUT/Itune, which is set at a nominal value of 100 ms for the current application. 
When Iin+ > Io+(t), M9,10,16 conduct, causing C to be discharged abruptly until I4 just 
equals Itune. At this stage, Io+(t) = Iin+(t). The circuit handling Iin-(t) works in a similar 
way; however, the inequality conditions are reversed. When Iin-(t) > Io-(t), the current 
through M21 will be higher than Itune, causing M22-24 to be switched off. The circuit will 
thus behave as the previous one with Iin+(t) < Io+(t). When Iin-(t) < Io-(t), M22,23,24 will 
be turned on, causing C to be charged until I21 just equals Itune: at this point 
Io-(t) = Iin-(t). M17 and M24 are used to provide an adequate bias level for the source of 
M22, ensuring that M22 is switched off when Iin-(t) > Io-(t). 
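The time constant τ = nCUT/Itune can be evaluated for the component values used in this chapter. Both n = 2 and UT = 26 mV are assumptions, not values given in the text; with them the formula lands close to the 87 ms, 24.5 ms and 17.5 ms simulated time constants quoted later, and the ratio between the two 940 nF cases depends only on the tuning currents. i_tune_for is a hypothetical helper.

```python
def echo_time_constant(c, i_tune, n=2.0, u_t=0.026):
    """tau = n*C*U_T / I_tune; n = 2 and U_T = 26 mV are assumed values."""
    return n * c * u_t / i_tune


def i_tune_for(tau, c, n=2.0, u_t=0.026):
    """Tuning current needed for a target time constant (same assumptions)."""
    return n * c * u_t / tau


tau_sim = echo_time_constant(1e-6, 600e-9)   # simulation: C = 1 uF, 600 nA
tau_a = echo_time_constant(940e-9, 2e-6)     # test: C = 940 nF, 2 uA
tau_b = echo_time_constant(940e-9, 2.8e-6)   # test: C = 940 nF, 2.8 uA
print(f"{tau_sim * 1e3:.1f} ms, {tau_a * 1e3:.1f} ms, {tau_b * 1e3:.1f} ms")
print(f"I_tune for 100 ms with 1 uF: {i_tune_for(0.1, 1e-6) * 1e9:.0f} nA")
```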
4.3.6 Onset Trigger Threshold 
The current-mode differential estimated echo signal is multiplied by a factor β in order to 
set the threshold for onset window generation, and is compared to the composite envelope 
current signal using the circuit shown in Figure 4-16. It can be seen that Vo goes high when 
(Is+ - Is-) > β(Iecho+ - Iecho-), where the threshold β can be digitally set between 1 and 4, 
Is is the composite signal envelope and Iecho is the estimated echo signal envelope. 
[Circuit schematic not recoverable from the scan]
Figure 4-15: Differential echo decay model 
[Circuit schematic: transistors M1-M16 with switches controlled by bits B<0>-B<2> and their complements nB<0>-nB<2>; inputs Iecho+, Iecho-, Is+ and Is-; output Vo]
Figure 4-16: Estimated echo to composite signal envelope comparison 
4.3.7 Window Generator 
When triggered, the onset window generator produces a digital signal with an 
on-time t1 followed by an off-time of at least t2: further onsets are masked out for the 
duration of t2. This section has been synthesised using Verilog HDL. Both t1 and t2 are 
programmable and are set to default values of 4 ms and 6 ms, respectively. This digital 
output is used by the sound localisation system to discriminate between incident signals 
and echo signals. 
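The comparator of Figure 4-16 and the window generator can be modelled together in sampled time. This Python sketch is not the synthesised Verilog: it assumes a 10 kHz sampling rate, β = 2 as an example setting, and that triggers arriving during t1 are ignored as well as those during t2. onset_windows is a hypothetical name.

```python
def onset_windows(i_s, i_echo, beta=2.0, fs=10e3, t1=4e-3, t2=6e-3):
    """Sampled-time model: trigger when i_s > beta * i_echo (differential values),
    then output 1 for t1 and 0 for at least t2, ignoring triggers meanwhile."""
    n1, n2 = round(t1 * fs), round(t2 * fs)
    out, remaining = [], 0  # remaining > 0: a window (on + masked off) is active
    for s, e in zip(i_s, i_echo):
        if remaining == 0 and s > beta * e:
            remaining = n1 + n2
        out.append(1 if remaining > n2 else 0)
        if remaining > 0:
            remaining -= 1
    return out


# A sustained strong envelope with no echo estimate triggers repeatedly,
# giving 4 ms on / 6 ms off at the default settings.
w = onset_windows([1.0] * 200, [0.0] * 200)
print(sum(w[:100]), w[0], w[39], w[40], w[100])  # 40 1 1 0 1
```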
4.4 Simulation Results 
The transient simulation results for the V-I converter and harmonic mean splitter, 
operated with a single-ended 3 kHz, 500 mV pk input voltage and R1 = R2 = 1 MΩ, are 
shown in Figure 4-17. Curves Io+ and Io- show the individual outputs of the current 
splitter, which is operated with Idc = 100 nA; both remain positive, showing that Class AB 
operation is achieved. The curve Idiff shows the difference between the two complementary 
outputs. The corresponding squaring circuit complementary outputs I+ and I- are shown in 
Figure 4-18, together with their difference Idiff. The squaring circuit is operated with 
Iref = 200 nA. The differential output signal is quite close to the expected shifted cosine 
waveform, which has a theoretical peak value of 1.25 µA. 
Figure 4-19 shows the LPF output Iout, with the squared 3 kHz signal as input Iin 
(with the other channel input suppressed). During this simulation, Itune is set to 50 nA with 
C = 250 pF, resulting in a -3 dB cut-off frequency of around 850 Hz. The capacitor C is 
an on-chip capacitor implemented using the gate-oxide capacitance of a MOSFET. A 
smoothed output signal of around 625 nA is obtained, which is a scaled representation of 
the mean square of the input signal. 
[Plot: Io+, Io- and Idiff (nA, -600 to 600) versus time (0 to 0.9 ms)]
Figure 4-17: Transient response of the V-I converter and harmonic mean splitter. 
[Plot: I+, I- and Idiff (nA, 0 to 1800) versus time (0 to 0.9 ms)]
Figure 4-18: Transient response of the squaring circuit 
[Plot: LPF output current Iout (nA, 0 to 1400) versus time (0 to 5 ms)]
Figure 4-19: Transient response of the 2nd-order LPF 
The transient output (Io+, Io-) of the echo decay model block in response to 
periodic bursts at the input Iin is shown in Figure 4-20, with C = 1 µF (external) and 
Itune = 600 nA, resulting in a time constant of 87 ms. The resulting differential signal 
Io,Diff follows the required echo decay model characteristic. However, for very fast input 
rise times, considerable overshoot occurs due to the feedback loops around M4 and M21. 
This does not present a severe limitation in this case, since the input envelope signals have 
inherently slow rise times owing to the prior low-pass filtering. The whole onset detection 
system consumes 5.4 mW, with most of the power dissipated in the S2I delay line 
implementing Td. 
[Plot: Io+, Io- and Io,Diff (axis 0 to 180) versus time (0 to 400 ms)]
Figure 4-20: Echo-decay model transient response 
4.5 Testing Results 
The chip has been fabricated (layout in Appendix A.1); it has a die size of 
4.9 mm × 3.5 mm and consists of about 8300 transistors. The test set-up is shown in 
Appendix A.2. The external components are used for the generation of bias currents and 
voltages. The current-mode output signals are converted into voltage-mode signals via a 
differential transresistance amplifier with a transresistance of 1 MΩ. The correct 
functionality of each stage in the onset detector is verified by monitoring the 
appropriate current-mode outputs, which are available for each stage. 
4.5.1 Voltage-to-Current Converter and Current Splitter 
Figure 4-21 shows the test waveforms obtained with a PM3394B oscilloscope for 
a sinusoidal input voltage of 1 kHz and an amplitude of 875 mV (input resistors set to 
1 MΩ). Waveforms ch3 and ch1 show the differential signals, while ch2 is the resulting 
difference. The harmonic mean splitter was also tested for THD using an Advantest 
R3265A spectrum analyser. The results, plotted in Figure 4-22, indicate a THD value of 
1.9 % for an input voltage of 1.5 V pk, corresponding to 1.5 µA pk of differential output 
current, and a dynamic range of 70 dB. The -3 dB cut-off frequency was measured to be 
124 kHz. Two-tone intermodulation tests were also carried out using input sinusoidal 
frequencies of 1.5 kHz and 4 kHz. Table 4-1 shows the results for different input 
amplitudes; the sidebands were measured at the difference and sum frequencies, 2.5 kHz 
and 5.5 kHz. 
4.5.2 Squaring Circuit 
Figure 4-23 shows the test result waveforms obtained for the squaring circuit. The 
results were obtained using an input signal (ch4) of 1 V pk (Rin = 1 MΩ). The differential 
output signals are shown in traces ch1 and ch3, while the resulting difference signal is 
shown in trace ch2. 
[Oscilloscope screenshot (PM3394B): traces ch1, ch2 and ch3]
Figure 4-21: Output waveforms of the V-I converter and current splitter, showing 
differential signals (ch1, ch3) and resulting difference (ch2). 
[Plot: THD (%, 0 to 1.8) versus input voltage Vin (V pk into a 1 MΩ resistor, 10^-2 to 10^1, logarithmic)]
Figure 4-22: THD measured as a function of the input amplitude. The input resistance 
is 1 MΩ, and thus 1 V pk at the input corresponds to 1 µA pk at the output. 
Input amplitude (V pk)    Output sidebands, relative to the outputs at 1.5 kHz and 4 kHz
1.5 kHz     4 kHz         2.5 kHz       5.5 kHz
0.2         0.2           -60 dB        < -70 dB
0.4         0.4           -55 dB        -58 dB
0.8         0.8           -51 dB        -53 dB
1.0         1.0           -47 dB        -49 dB
Table 4-1: Intermodulation tests carried out on the V-I converter and harmonic mean 
splitter at 1.5 kHz and 4 kHz. 
[Oscilloscope screenshot (PM3394B): traces ch1 to ch4]
Figure 4-23: Output waveforms of the squaring circuit, showing input signal (ch4), 
differential signals (ch1, ch3) and resulting difference (ch2). 
4.5.3 Lowpass Filter 
The frequency response of the lowpass filter, measured at three different tuning 
currents (54 nA, 99 nA and 425 nA), is shown in Figure 4-24, together with the simulated 
response curves. The measured -3 dB corner frequencies are 800 Hz, 1.4 kHz and 
6.2 kHz, respectively, which are close to the simulated values. The measured 
responses also exhibit the expected -40 dB/decade roll-off. A notable discrepancy 
between the measured and simulated characteristics is the lack of peaking in the measured 
results, whereas peaking is expected for a damping factor of 0.5. However, for this 
application, this discrepancy does not limit the circuit functionality. 
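Since ωn in Eq. (4.4) is proportional to Itune, the measured corners should scale linearly with the tuning current. The sketch below scales the 54 nA measurement and reports how far the other two measurements deviate; it uses only the values quoted above, and linear_tuning_check is an illustrative name.

```python
# Measured -3 dB corners from the text: I_tune (A) -> f_-3dB (Hz)
MEASURED = {54e-9: 800.0, 99e-9: 1.4e3, 425e-9: 6.2e3}


def linear_tuning_check(data, i_ref):
    """Predict each corner by scaling the reference point linearly with I_tune
    (w_n = I_tune / (n C U_T)) and return {I_tune: (f_pred, relative deviation)}."""
    f_ref = data[i_ref]
    result = {}
    for i_tune, f_meas in data.items():
        f_pred = f_ref * i_tune / i_ref
        result[i_tune] = (f_pred, f_meas / f_pred - 1.0)
    return result


for i_tune, (f_pred, dev) in linear_tuning_check(MEASURED, 54e-9).items():
    print(f"{i_tune * 1e9:.0f} nA: predicted {f_pred:.0f} Hz, deviation {dev * 100:+.1f} %")
```

The deviations of about -4.5 % and -1.5 % indicate that the tuning stays close to linear over almost a decade of Itune.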
[Plot: gain (dB, -40 to 10) versus frequency (1 Hz to 1 MHz, logarithmic) for Itune = 54 nA, 99 nA and 425 nA]
Figure 4-24: Simulated and measured LPF frequency responses: the curve marked with an 
`*' is the measured response while the corresponding smooth curve is the simulated result. 
The functionality of the LPF as an envelope detector is shown in the measured 
waveforms in Figure 4-25, where a 10 kHz sinusoidal signal modulated with a 210 Hz 
triangular waveform is applied at the input (ch4). The LPF is operated at a tuning current of 
425 nA for this result and for the subsequent results shown in this chapter. The extracted 
envelope signal is shown in ch2. 
[Oscilloscope screenshot (PM3394B): traces ch1 to ch4]
Figure 4-25: Output waveforms of the LPF, showing input signal (ch4), differential signals 
(ch1, ch3) and resulting difference (ch2). 
4.5.4 Delay-line 
The delay exhibited by the whole S2I delay line, consisting of 31 stages, can be calculated as:

$$T_d = \frac{31 \times 2 \times Div}{f_{clk}} \qquad (4.9)$$

where Div is the on-chip frequency divider factor and fclk is the input clock frequency. The circuit was tested with fclk = 20 kHz and Div set to 1, 2 and 4, and the measured waveforms are shown in Figures 4-26 (a)-(c). The input was a 10 kHz sinusoidal signal modulated with a 77 Hz triangular waveform. The corresponding delay values of 3.1 ms, 6.2 ms and 12.4 ms can be seen in these figures. At the higher Div settings the sampling rate is reduced, as can be seen in Figures 4-26 (b) and (c). It should be noted that the last stage of the S2I line works as a sample-and-hold circuit, with no LPF smoothing characteristic to remove the switching spikes.
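Equation (4.9) can be checked against the delay values quoted for both clock frequencies used in this chapter. The 31 stages and the factor of 2 are taken directly from (4.9), while the function name is purely illustrative.

```python
def s2i_delay(div, f_clk, stages=31):
    """Eq. (4.9): Td = stages * 2 * Div / f_clk, returned in seconds."""
    return stages * 2 * div / f_clk

# 20 kHz clock: Figures 4-26 and 4-27; 40 kHz clock: Figure 4-28
for f_clk, divs in ((20e3, (1, 2, 4)), (40e3, (1, 2, 4, 8))):
    for div in divs:
        td_ms = s2i_delay(div, f_clk) * 1e3
        print(f"f_clk = {f_clk / 1e3:.0f} kHz, Div = {div}: Td = {td_ms:.2f} ms")
```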
4.5.5 Echo Decay Model 
The output waveforms of the echo decay model are shown in Figures 4-27 (a) and (b). They were measured at two different tuning currents (Itune (decay) = 2 µA and 2.8 µA) with an external capacitor C = 940 nF. The measured decay time constants are 22 ms and 16 ms, respectively, while the corresponding time constants obtained in simulation are 24.5 ms and 17.5 ms. The waveforms in Figures 4-27 (a) and (b) are obtained for a 10 kHz sinusoidal input signal modulated with a 77 Hz triangular waveform. Figures 4-27 (c) and (d) show the measured echo decay model output at Itune (decay) = 2.8 µA and Div equal to 2 and 4, corresponding to Td values of 6.2 ms and 12.4 ms, respectively.
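Assume the decay time constant follows the usual log-domain scaling τ = nCU_T/Itune, by analogy with ωn = Itune/nCU_T used for the filters in Chapter 5 (this is an assumption). The product τ·Itune should then be roughly constant. The sketch below checks this using only the measured and simulated values quoted above; the names are illustrative.

```python
measured = {2.0e-6: 22e-3, 2.8e-6: 16e-3}      # Itune (A) -> tau (s), from the text
simulated = {2.0e-6: 24.5e-3, 2.8e-6: 17.5e-3}

def tau_current_product(data):
    """tau * Itune for each point; constant if tau is inversely proportional to Itune."""
    return {i: tau * i for i, tau in data.items()}

for name, data in (("measured", measured), ("simulated", simulated)):
    for i, q in tau_current_product(data).items():
        print(f"{name}: Itune = {i * 1e6:.1f} uA, tau * Itune = {q * 1e9:.1f} nC")

# Scale the 2 uA measurement by 1/Itune to predict tau at 2.8 uA
tau_pred = measured[2.0e-6] * 2.0e-6 / 2.8e-6
print(f"predicted tau at 2.8 uA: {tau_pred * 1e3:.1f} ms (measured 16 ms)")
```

The measured products (44.0 nC and 44.8 nC) agree within about 2%, which is consistent with τ being inversely proportional to the tuning current.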
Figure 4-26 (a): Output waveforms of the S2I delay line at fclk = 20 kHz, showing the input signal (ch4) and the corresponding delayed envelope signal (ch2), for Div = 1 (Td = 3.1 ms).
Figure 4-26 (b): Same as Figure 4-26 (a) but for Div = 2 (Td = 6.2 ms).
Figure 4-26 (c): Same as Figure 4-26 (a) but for Div = 4 (Td = 12.4 ms).
Figure 4-27 (a): Echo decay model output signal at fclk = 20 kHz, showing the input signal (ch3) and the corresponding echo decay model output signal (ch4), for Div = 1 (Td = 3.1 ms) and Itune (decay) = 2 µA.
Figure 4-27 (b): Same as Figure 4-27 (a) but for Itune (decay) = 2.8 µA.
Figure 4-27 (c): Same as Figure 4-27 (a), but with Div = 2 (Td = 6.2 ms) and Itune (decay) = 2.8 µA.
Figure 4-27 (d): Same as Figure 4-27 (a), but with Div = 4 (Td = 12.4 ms) and Itune (decay) = 2.8 µA.
4.5.6 Onset Window Generator 
The operation of the whole onset detector has been tested at fclk = 40 kHz, Itune (LPF) = 425 nA, Itune (decay) = 2.8 µA and THR = 1, with the on and off times of the onset generator set to 4 ms and 6 ms, respectively. The results for Div values of 1, 2, 4 and 8, corresponding to Td = 1.55, 3.1, 6.2 and 12.4 ms, are shown in Figures 4-28 (a)-(d), respectively. For Figures 4-28 (a)-(c), the input signal is a 10 kHz sinusoidal waveform modulated with a 40 Hz triangular waveform. As expected, the onset window is located around the peaks of the input envelope. In Figure 4-28 (c), onsets occur only at alternate peaks. At the peaks with a missed onset, the estimated echo energy is still high relative to the input signal envelope. Figure 4-28 (d) was obtained with Div = 8, but with the input signal modulated with a 20 Hz triangular waveform.
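The following hypothetical discrete-time behavioural model makes the on/off timing concrete. It is not a description of the chip's circuits. It assumes four things:

- The echo estimate is the envelope delayed by Td and held by a peak detector that decays with time constant τ.
- An onset is declared when the envelope exceeds THR times this estimate.
- An onset opens a window for the on-time.
- The window then blocks new onsets for the off-time.

The function name and these rules are illustrative assumptions.

```python
import math

def onset_window(env, fs, td, tau, thr=1.0, t_on=4e-3, t_off=6e-3):
    """Hypothetical behavioural model of the onset detector (not the chip's circuit).

    env: envelope samples, fs: sample rate (Hz), td: initial echo delay (s),
    tau: echo decay time constant (s). Returns a 0/1 onset-window sample list.
    """
    d = int(round(td * fs))              # initial echo delay in samples
    decay = math.exp(-1.0 / (tau * fs))  # per-sample exponential decay factor
    n_on, n_off = int(round(t_on * fs)), int(round(t_off * fs))
    echo = 0.0                           # estimated echo envelope
    state, count = "idle", 0             # idle -> on -> off -> idle
    out = []
    for n, x in enumerate(env):
        delayed = env[n - d] if n >= d else 0.0
        # Assumed echo model: delayed envelope held by a peak detector
        # whose output decays exponentially with time constant tau.
        echo = max(delayed, echo * decay)
        if state == "idle" and x > thr * echo:
            state, count = "on", n_on    # onset: open the window
        if state == "on":
            out.append(1)
            count -= 1
            if count <= 0:
                state, count = "off", n_off
        else:
            out.append(0)
            if state == "off":
                count -= 1
                if count <= 0:
                    state = "idle"       # off-time over: re-arm the detector
    return out

fs = 40e3
env = [0.0] * 10 + [1.0] * 990           # envelope step at sample 10
w = onset_window(env, fs, td=3.1e-3, tau=16e-3)
edges = sum(1 for i in range(1, len(w)) if w[i] and not w[i - 1])
print("onset windows:", edges, "| window length:", sum(w) / fs * 1e3, "ms")
```

With a single envelope step, only one window is produced. By the time the off-time expires, the delayed envelope has already raised the echo estimate to the envelope level, so no further onset is declared.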
The measured power dissipation of the whole circuit, operating from a supply voltage of ±0.9 V, is 4.0 mW under quiescent conditions, rising to 5.4 mW when an input signal is applied. This dependence of the power dissipation on the input signal level is expected and is mainly due to the Class AB operation of the S2I delay line.
Figure 4-28 (a): Measured window generator output for fclk = 40 kHz, Itune (LPF) = 425 nA, Itune (Decay) = 2.8 µA, On-time = 4 ms, Off-time = 6 ms, Div = 1 (Td = 1.55 ms).
Figure 4-28 (b): Same as Figure 4-28 (a), but with Div = 2 (Td = 3.1 ms).
Figure 4-28 (c): Same as Figure 4-28 (a), but with Div = 4 (Td = 6.2 ms).
Figure 4-28 (d): Same as Figure 4-28 (a), but with Div = 8 (Td = 12.4 ms) and the modulating frequency set to 20 Hz.
4.6 Conclusions 
The onset detector minimises ambiguities that may be introduced by the presence of echoes. It will be used as the first block of the hardware cue extraction system, processing the incident signals by removing echo information. The analogue chip developed here for the detection of onsets in an acoustic signal is based on echo envelope estimation. It includes novel circuits for envelope extraction, initial echo delay and echo decay modelling. The squaring circuit is based on three translinear loops which ensure adequate d.c. components in the currents passing through the MOS devices. The echo decay model is based on a modified log-domain first order lowpass section, which allows the decay constant to be programmed. Most of the processing is done in pseudo-differential mode in order to reduce the effects of common-mode noise and offsets. A current-mode approach has been adopted and all blocks have been optimised to operate at a low supply voltage. In order to minimise power consumption, log-domain circuits have been used for envelope extraction and echo decay modelling. The initial echo delay has been implemented using a switched-current delay line, which is easily interfaced with the log-domain circuits. The circuit thus also demonstrates that S2I circuits can be successfully interfaced with log-domain circuits.
Test results are close to the simulated results, indicating that subthreshold MOS operation is viable, provided good layout techniques are used. Analogue signal processing with MOS transistors operating in the subthreshold region is attractive from the low-power, low-voltage point of view. These circuits also tend to be compact, requiring no extra linearisation circuitry or high-gain amplifiers, and therefore have a low cost in terms of area. Circuit variations due to parameter tolerances can be compensated for, since the operation of each stage can be tuned separately via suitable tuning currents.
Chapter 5 
Log-Domain Front-End for the Extraction of 2-D 
Sound Localisation Cues 
5.0 Introduction 
This chapter describes the design of a current-mode front end for the extraction of spectral localisation cues. These are the interaural intensity difference (IID) cues and the monaural spectral cues (MSCs), which are essential for determining the elevation of a sound source. In addition, this chip outputs the information required by the subsequent time delay extraction chip to evaluate interaural time delays. Previous hardware implementations usually relied on Gm-C topologies [44], which require extra linearisation techniques [54] in order to improve the dynamic range. This implementation uses a current-mode log-domain (LD) topology with subthreshold MOS operation for compact, micropower operation, while still achieving a good bandwidth. In addition, LD processing results in good linearity, since the circuits are essentially large-signal linear. A current-mode solution is also preferred because certain mathematical operations are easy to implement in the current domain. A fully differential Class AB architecture has been adopted for the filters in order to minimise errors and the effects of power-supply noise. The filters are implemented using CMOS versions of bipolar LD building blocks, modified to allow Class AB operation. The chosen topology minimises the errors arising from the body effect in CMOS LD circuits. Compared with bipolar LD circuits, CMOS circuits exhibit a smaller dynamic range because of the limited weak-inversion operating range. However, they offer the advantage of zero gate current, whereas the base current is often a source of distortion in bipolar LD circuits. In this design, the dynamic range of the front end has been further enhanced via automatic gain control (AGC).
The front end splits the input signals into different frequency bands. From the resulting signal envelopes in each band, it computes the monaural (first and second order) and interaural spectral cues. The front end has been optimised to operate at a supply voltage of ±0.9 V. This is the first LD implementation of a front end with localisation cue extraction. The chip has been designed and implemented using a standard double-poly, double-metal 0.8 µm CMOS process with a nominal VT = 0.8 V.
An overview of the front-end chip is given in Section 5.1, where the main building blocks of the chip are introduced. Section 5.2 deals with the development stages of the LD bandpass filters up to the final Class AB differential version. Section 5.3 describes the envelope extraction method, while Section 5.4 discusses the biasing details for the BPF array. The hardware computation of the interaural intensity difference cues and of the first and second order monaural spectral cues is described in Sections 5.5 and 5.6. The automatic gain control loop hardware details are discussed in Section 5.7, while Section 5.8 briefly describes the output multiplexing involved. Simulation and testing results are presented in Sections 5.9 and 5.10, respectively.
5.1 Front-end System Overview 
A block diagram of the front end is shown in Figure 5-1. The inputs to the system are the left and right audio channel signals. These are first converted to current signals and then split into complementary differential signals using the harmonic splitter described in Chapter 4. The signals are then processed by an AGC block and logarithmically compressed. Each channel is then processed by a parallel bank of 24 BPFs whose centre frequencies range from 80 Hz to 20 kHz. Envelope processing is carried out for centre frequencies above 1 kHz (i.e. from the 12th filter onwards). The outputs of the filters with centre frequencies below 1 kHz are processed off-chip in order to extract the interaural phase delay (IPD) cues [118].
[Block diagram: left and right inputs, log compression and AGC, L/R BPF banks (24 filters), L/R envelope detectors (x13), L/R MSC1 (x12) and MSC2 (x11) blocks, IID blocks (x13), Q-factor calibration memory, maximum current detector with AGC reference current, and output multiplexers routing the L/R BPF, envelope, MSC1, MSC2 and IID signals to the output pins]
Figure 5-1: Block diagram of the front-end
The BPF outputs are squared and lowpass filtered in order to extract the
envelope of the signal. The envelope signals are output by the front end for further 
processing for the extraction of interaural envelope delay (IED) cues. IIDs are extracted 
by dividing the envelope waveform of the L-channel by that of the R-channel. Monaural 
cues are extracted by dividing the envelopes of adjacent filters of the same channel. 
In order to allow for Q-factor variations due to component mismatches and parasitic 
elements in the hardware implementation, a digital calibration memory is used which 
controls the current sources adjusting the Q-factor of each filter. Calibration of the system 
is done off-chip. In order to avoid distortion due to excessive signal amplitudes, a 
maximum current detector is used to determine the maximum envelope signal. This 
current is then used to adjust the gain of the AGC block. The number of output pins 
required has been minimised by multiplexing the processed signals from all filters onto a 
single pin. 
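The short Python sketch below assumes the 24 centre frequencies are spaced geometrically between 80 Hz and 20 kHz, which is consistent with the exponential progression described in Section 5.4. It checks which filters fall above 1 kHz and how many envelope, MSC and IID outputs this implies. Treating each second order MSC as a combination of two adjacent first order MSCs is an assumption; the counts obtained agree with those marked in Figure 5-1. The names are illustrative.

```python
N_FILTERS = 24
F_LOW, F_HIGH = 80.0, 20e3      # centre-frequency range quoted in Section 5.1

def centre_frequencies(n=N_FILTERS, f_low=F_LOW, f_high=F_HIGH):
    """Geometrically spaced centre frequencies for filter index k = 1..n."""
    ratio = (f_high / f_low) ** (1.0 / (n - 1))
    return [f_low * ratio ** (k - 1) for k in range(1, n + 1)]

fc = centre_frequencies()
env_k = [k for k, f in enumerate(fc, start=1) if f > 1e3]   # envelope-processed filters
n_env = len(env_k)
print(f"first envelope filter: k = {env_k[0]} ({fc[env_k[0] - 1]:.0f} Hz)")
print("envelopes per channel, and IIDs:", n_env)
print("first order MSCs (adjacent envelope pairs):", n_env - 1)
print("second order MSCs (adjacent MSC1 pairs):", n_env - 2)
```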
5.2 Log-Domain Bandpass Filters 
The cochlea has often been modelled as a cascade of LPFs followed by a 
differentiator. Although this approach has biological counterparts, from the hardware 
implementation point of view, a parallel BPF bank offers the following advantages 
compared to a cascade of LPFs: 
(a) noise and offset errors do not accumulate 
(b) system is more fault tolerant: if one filter malfunctions, the other filters will still 
operate and localisation is still possible (although less accurate), that is, there is 
graceful performance degradation 
(c) signal delay does not accumulate, and hence the filters react immediately to the onset 
of the signal, independent of the number of filters used 
The main problem with the BPF architecture is that it is very difficult to achieve the steep high-frequency cut-off slopes that are inherent to the biological system. However, the high-level system simulations carried out in Chapter 2 indicate that satisfactory localisation results can still be obtained using a cascade of two second order BPFs with a Q-factor of only 10.
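The effect of cascading two sections can be estimated from the ideal second order BPF response normalised to unity peak gain. The sketch below assumes ideal, identical sections with Q = 10. It computes the high-frequency roll-off of the cascade and its -3 dB bandwidth relative to a single section; the function names are illustrative.

```python
import math

def bpf_mag(x, q):
    """|H(jw)| for H(s) = (wn/Q) s / (s^2 + (wn/Q) s + wn^2), x = w / wn (unity peak gain)."""
    return 1.0 / math.sqrt(1.0 + (q * (x - 1.0 / x)) ** 2)

def cascade_db(x, q=10.0, stages=2):
    """Gain in dB of `stages` identical BPF sections in cascade."""
    return 20.0 * stages * math.log10(bpf_mag(x, q))

# Roll-off measured well above resonance
slope = cascade_db(1000.0) - cascade_db(100.0)
print(f"cascade slope above resonance: {slope:.1f} dB/decade")

# Relative -3 dB bandwidth, from |x - 1/x| at the band edges
q = 10.0
bw_single = 1.0 / q                                 # one section: Q^2 u^2 = 1
bw_cascade = math.sqrt(math.sqrt(2.0) - 1.0) / q    # two sections: (1 + Q^2 u^2)^2 = 2
print(f"relative bandwidth: single {bw_single:.3f}, cascade {bw_cascade:.3f}")
```

Each section contributes -20 dB/decade above resonance, so two sections give -40 dB/decade. The passband of the cascade narrows to about 64% of that of one section.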
For the implementation of the BPFs, the use of LD filters has been studied. This type of filter requires only a few transistors and can achieve very good linearity in the current-mode domain, even at low supply voltages, owing to its companding nature with respect to voltage swings. Since LD circuits are externally large-signal linear, they are advantageous compared with Gm-C filters, where linearity can be improved only within certain limits [54]. LD filters can be implemented in CMOS using transistors operating in weak inversion, and therefore with low bias currents (typically of the order of 1 µA). They can achieve good performance at audio frequencies. The main problem with such circuits is transistor mismatch. However, the effects of mismatch can be eliminated by generating the cue template using the actual implemented hardware. In that case, thermal stability of the circuit is essential. The time constants inside an LD filter can be kept constant with temperature by using PTAT current sources. Three LD BPFs have been investigated:
- one based on an LD transformation of the classical cochlea model;
- one derived directly from the BPF transfer function using signal flow graph techniques;
- a fully differential architecture, which is used for the final chip implementation.
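The benefit of PTAT biasing follows directly from ωn = Itune/nCU_T with U_T = kT/q: if Itune is proportional to T, the temperature dependence cancels. The sketch below uses C = 50 pF (the value quoted in Section 5.4) and an assumed slope factor n = 1.5. It compares a fixed bias current with a PTAT bias current; the function name and n are illustrative.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant (J/K)
Q_E = 1.602176634e-19   # elementary charge (C)

def f_res(i_tune, temp_k, c=50e-12, n=1.5):
    """f_n = Itune / (2*pi*n*C*UT), UT = kT/q; n = 1.5 is an assumed slope factor."""
    u_t = K_B * temp_k / Q_E
    return i_tune / (2 * math.pi * n * c * u_t)

T0 = 300.15   # 27 degC reference temperature (K)
I0 = 1e-9     # 1 nA bias at T0
for t_c in (0.0, 27.0, 70.0):
    t = t_c + 273.15
    fixed = f_res(I0, t)            # temperature-independent bias current
    ptat = f_res(I0 * t / T0, t)    # PTAT bias current, proportional to T
    print(f"{t_c:5.1f} degC: fixed I -> {fixed:5.1f} Hz, PTAT I -> {ptat:5.1f} Hz")
```

With a fixed current, the resonant frequency drifts by roughly 1/T (about 13% lower at 70 °C than at 27 °C). The PTAT current keeps it constant, provided n itself does not vary with temperature.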
5.2.1 Log-Domain Filter Based on Gm-C Cochlea Model 
Figure 5-2 shows a Gm-C second order LPF, which is often used in electronic cochlea models [29], [34], [45], [51].
Figure 5-2: 2nd order LPF / BPF used for cochlea modelling
Although in the literature this filter has mainly been used as a LPF for cochlea modelling, a BPF transfer function can be obtained by taking the difference between the input and the output of the last stage:
$$\frac{V_{LPF}(s)}{V_{in}(s)} = \frac{\left(\frac{g_{m\tau}}{C}\right)^2}{s^2 + s\,\frac{2g_{m\tau} - g_{mq}}{C} + \left(\frac{g_{m\tau}}{C}\right)^2}, \qquad \frac{V_{BPF}(s)}{V_{in}(s)} = \frac{s\,\frac{g_{m\tau}}{C}}{s^2 + s\,\frac{2g_{m\tau} - g_{mq}}{C} + \left(\frac{g_{m\tau}}{C}\right)^2} \qquad (5.1)$$

The signal flow graph for the above second order section is shown in Figure 5-3.
[Signal flow graph with nodes Vin, Vx and VLPF, integrator branches gmτ/sC and feedback branch gmq/sC]
Figure 5-3: Signal flow graph for the cochlea model 
The SFG in Figure 5-3 can be simplified by scaling, shifting and changing the position of the feedback paths going to node Vin, so that the system can be implemented using only two integrators, as shown in Figure 5-4.
[Signal flow graph with nodes Vin, Vx and VLPF, integrator branches gmτ/sC and feedback branches scaled by gmq/gmτ]
Figure 5-4: Signal flow graph for the simplified cochlea model 
The circuit diagram for the second order section is derived by connecting the LD building blocks introduced in Chapter 4 (Section 4.3.3.1) according to the SFG, as shown in Figure 5-5 (a). It can be seen that compander and expander blocks are only required at the input and output interfaces. The circuit in Figure 5-5 (a) can be minimised by noting that some transistor pairs form current mirrors carrying a d.c. current. These pairs can therefore be replaced by a single d.c. current source, as shown in Figure 5-5 (b).
Correct operation of the LD filter can be verified by writing down the translinear loop equations, the capacitor voltage equations and the equations for the drain currents of M5, M6, M8 and M9.
Translinear loop equations: 
IQl tune = 1112, 'in 'tune 
2= 13 121 LPF ' 
181 tune =I LPF 12 
Figure 5-5 (a): Log-domain BPF based on cochlea model
Capacitor equations: 
Cd 
dt 
!=I 
tune +1ý- I3 - IQ Cd c2 ='tune - I2 
Figure 5-5 (b): Minimised log-domain filter
Drain current equations: 
VC, -Vo -VTVB -Vo -VT 
VC' - vB 
Itune =ID0e 
nU TI LPF =fD0e 
nU pI 
tune =f LPF e 
nU T 
VCt-VX -VT VC, -VX -VT Vct-VC, 
nU T nU T nU T I tune =IDOe12=IDOe tune = 
12 e 
Simplifying the above equations gives:

$$\frac{I_{LPF}(s)}{I_{in}(s)} = \frac{\left(\frac{I_{tune}}{nCU_T}\right)^2}{s^2 + \frac{2I_{tune} - I_Q}{nCU_T}\,s + \left(\frac{I_{tune}}{nCU_T}\right)^2} \quad \text{and} \quad I_{BPF}(s) = s\,\frac{nCU_T}{I_{tune}}\,I_{LPF}(s) \qquad (5.2)$$

Thus IBPF(s) represents a bandpass transfer function with ωn = Itune/nCUT and Q = Itune/(2Itune − IQ), where UT = kT/q.
5.2.2 Signal Flow Graph for the Second Log-Domain BPF 
One drawback of the above implementation is that the ratio γ = IQ/Itune is related to the Q-factor by γ = 2 − 1/Q. For a Q-factor of 10, γ has a value of 1.9, while a γ value of 2 results in an unstable system. This high sensitivity of Q to γ for γ values approaching 2 can be particularly troublesome, since small circuit mismatches can result in large Q variations or even an unstable system. To avoid this problem, the Q-factor should be linearly related to the ratio of two bias currents. A new filter topology has therefore been investigated.
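A small calculation quantifies the sensitivity problem. The sketch below applies the same relative error in the bias-current ratio to two cases. The first is Q = 1/(2 − γ) for the cochlea-model filter. The second is a topology in which Q equals a current ratio directly, as for Q = Itune/IQ in Section 5.2.5. The function names are illustrative.

```python
def q_cochlea(gamma):
    """Cochlea-model LD BPF: Q = Itune / (2*Itune - IQ) = 1 / (2 - gamma), gamma = IQ/Itune."""
    if gamma >= 2.0:
        raise ValueError("gamma >= 2: zero or negative damping, the filter is unstable")
    return 1.0 / (2.0 - gamma)

def q_ratio(ratio):
    """Topology in which Q equals a current ratio directly (e.g. Q = Itune/IQ)."""
    return ratio

gamma_nom = 2.0 - 1.0 / 10.0       # gamma = 1.9 gives Q = 10
for err in (0.0, 0.005, 0.01):     # relative error in the bias-current ratio
    g = gamma_nom * (1.0 + err)
    print(f"error {err:.1%}: cochlea-model Q = {q_cochlea(g):5.2f}, "
          f"ratio-set Q = {q_ratio(10.0 * (1.0 + err)):5.2f}")
# dQ/dgamma = 1/(2 - gamma)^2 = Q^2, i.e. about 100 at Q = 10
```

A 1% error in γ moves Q from 10 to about 12.3, whereas the same error in a linearly set ratio changes Q by only 1%.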
If X and Y are the input and output signals of a BPF, respectively, then it can be 
seen that: 
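$$X\omega_n = \left(s + \frac{\omega_n}{Q} + \frac{\omega_n^2}{s}\right)Y \;\Rightarrow\; sY = X\omega_n - \frac{\omega_n}{Q}Y - \frac{\omega_n^2}{s}Y \qquad (5.3)$$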
The corresponding SFG for a second order BPF is shown in Figure 5-6 (a). In the LD filter, the ωn coefficient is included as part of the integrator. It is also preferable to scale Y by 1/Q so that unity gain is achieved at the resonant frequency. The scaled SFG, with s1 and s2 as state variables, is shown in Figure 5-6 (b).
In this scaled SFG, it can be seen that the second integrator does not have a local negative feedback (damping) path. This means that under quiescent conditions its input node (s1) will be forced to a zero d.c. value by the main negative feedback loop. In LD circuits this condition is a problem: since the logarithm of zero is undefined, it is not possible to establish a d.c. operating point. To establish the d.c. operating point of the filter, a d.c. offset signal is added, as shown in Figure 5-6 (c) [109].
Figure 5-6 (a): Alternative BPF signal flow graph
Figure 5-6 (b): Simplified BPF signal flow graph
Figure 5-6 (c): Signal flow graph adapted for log-domain implementation
The d.c. offset signal Jdc/s forces the signal s1 to have a quiescent value of Jdc. It can be seen that the new output signal Y in this case is given by:

$$Y(s) = \frac{\frac{\omega_n}{Q}s}{s^2 + \frac{\omega_n}{Q}s + \omega_n^2}\,X(s) + \frac{\frac{\omega_n^2}{Q}}{s^2 + \frac{\omega_n}{Q}s + \omega_n^2}\cdot\frac{J_{dc}}{s} \qquad (5.4)$$
Using the final value theorem, it can be verified that the d.c. value of Y is Jdc/Q, while the transfer function with respect to the X input remains unchanged. The circuit diagram, shown in Figure 5-7, is thus obtained by connecting the LD building blocks according to the above modified SFG. In this circuit, IBPF2 is equal to IBPF scaled by 1/Q. It can also be noted that M12 and M4 effectively form a current mirror. The circuit can therefore be simplified by placing the current source Itune/Q in parallel with the capacitor of the first stage and eliminating M4. Correct filter operation is verified by writing down the translinear loop equations, the capacitor voltage equations and the drain current equations for M5, M11, M7 and M8, as follows:
Translinear loop equations: 
, 
tunet = 15,3 9 
I3I 
dc - 
I6I 
BPF' 
I 
dc'3I2 - 161 tunet in' 
I 
BPF 2Q -I BPF 9 
I2I 
BPF -I tunet in 
Capacitor equations: 
CdV t =""e +I5 I z' Cdvc2 =I6-I3 dt Q dt 
Drain current equations (VDs >3 UT, IDO = subthreshold current scaling parameter): 
VB -VQ I -Vr Vc I -Vol -VT Vc l -VB 
BPF Doe 
nUr 
1 tune Doe 
RUT 
tune 1BPFe 
nUr 1 =1 1 =1 '= 
VCI-Vat-Vr vc2-V02_VT 
_Vc2 -VrI 15 ='Doe 
RUT 'tune =1 Doe 
RUT 
=> Itune = 15e RUT 
Simplifying the above equations gives:

$$I_{BPF}(s) = \frac{s\,\frac{I_{tune}}{nCU_T}\,I_{in}(s) + \left(\frac{I_{tune}}{nCU_T}\right)^2 I_{dc}(s)}{s^2 + \frac{I_{tune}}{Q\,nCU_T}\,s + \left(\frac{I_{tune}}{nCU_T}\right)^2} \qquad (5.5)$$

which is a second order BPF with ωn = Itune/nCUT, where UT = kT/q. Provided the transistors are matched, the resonant frequency and Q-factor are independent of absolute transistor parameters such as VT and (W/L). Increasing the W/L ratio of these transistors raises the maximum drain current that can be obtained within the weak inversion region. However, it also increases the associated transistor parasitic capacitances, which modify the filter response.
5.2.3 Differential Class AB Architecture 
A fully differential topology was finally chosen for the design of the front end. This reduces the errors due to systematic offsets and the influence of externally induced noise, such as power-supply noise and noise coupled from nearby circuitry. One approach to this topology is to split the input signal into two complementary signals, which are then processed by two separate single-ended filters. For correct operation of the LD filters, two rules have to be obeyed:
- A non-zero d.c. operating point solution has to exist for each of the state variables. For this, it must be possible for the derivatives of the state variables to be zero while the state variables themselves remain strictly positive.
- The state variables must remain strictly positive provided that the complementary inputs are strictly positive.

Considering the SFG in Figure 5-6 (c), the state space equations can be written as:

$$\frac{ds_1}{dt} = \omega_n\left(X - s_2 - \frac{s_1}{Q}\right), \qquad \frac{ds_2}{dt} = \omega_n\left(s_1 - J_{dc}\right) \qquad (5.6)$$
Figure 5-7: Log-domain implementation of the second BPF

A d.c. quiescent solution is ensured by the introduction of Jdc, since s1 and s2 remain positive even at steady state. However, to obey the second rule, Jdc/Q has to be chosen to be higher than the peak value of the a.c. component of Y. This condition restricts the maximum output signal and thus does not allow Class AB operation. To enable Class AB operation, the derivative of each state variable must be positive in the limit as that state variable approaches zero [119]. In a fully differential system, each state variable is represented by two complementary signals, thus s1 = s1P − s1N and s2 = s2P − s2N. The state space equations can then be modified for a fully differential BPF as follows:
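$$\frac{ds_{1P}}{dt} = \omega_n\left(X_P + s_{2N} - \frac{s_{1P}}{Q} - \frac{s_{1P}s_{1N}}{k}\right), \qquad \frac{ds_{2P}}{dt} = \omega_n\left(s_{1P} - \frac{s_{2P}s_{2N}}{k}\right)$$
$$\frac{ds_{1N}}{dt} = \omega_n\left(X_N + s_{2P} - \frac{s_{1N}}{Q} - \frac{s_{1P}s_{1N}}{k}\right), \qquad \frac{ds_{2N}}{dt} = \omega_n\left(s_{1N} - \frac{s_{2P}s_{2N}}{k}\right) \qquad (5.7)$$

where XP and XN are the complementary input signals (X = XP − XN) and k is a constant scaling parameter.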
From the above equations, it can be seen that the derivative of each state variable tends to a positive value in the limit as the state variable itself tends to zero. The state variables therefore can never become equal to zero, which allows Class AB operation. The non-linear cross-coupling terms s1P·s1N and s2P·s2N have been added so that the derivatives of the state variables can take a zero value while the state variables themselves remain positive. These non-linear cross-coupling terms cancel out in the differential output expression, so the system still implements a linear BPF transfer characteristic. The corresponding signal flow graph for the differential Class AB BPF is shown in Figure 5-8.
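The Class AB argument can be checked numerically. The Python sketch below assumes the state equations take the following form, which is our reading of (5.7) and Figure 5-8:

- ds1P/dt = ωn(XP + s2N − s1P/Q − s1P·s1N/k)
- ds2P/dt = ωn(s1P − s2P·s2N/k)
- the N signals follow symmetrically.

It also assumes normalised values k = 1, ωn = 2π·1 kHz and Q = 10. The sketch integrates these equations with forward Euler, alongside the linear differential model ds1/dt = ωn(X − s2 − s1/Q), ds2/dt = ωn·s1. It reports two quantities:

- the smallest value reached by any state variable;
- the largest difference between s1P − s1N and the linear model.

```python
import math

def simulate_class_ab(wn=2 * math.pi * 1e3, q=10.0, k=1.0, dt=1e-6,
                      t_end=10e-3, amp=0.1):
    """Forward-Euler run of the assumed Class AB state equations and of the
    linear differential model; returns (smallest state value, largest mismatch)."""
    s1p = s1n = s2p = s2n = 1.0     # strictly positive initial states
    s1 = s2 = 0.0                   # linear model, s1 = s1p - s1n, s2 = s2p - s2n
    min_state, max_err = 1.0, 0.0
    for i in range(int(round(t_end / dt))):
        x = amp * math.sin(wn * i * dt)
        xp, xn = 1.0 + x, 1.0 - x   # complementary, strictly positive inputs
        c1 = s1p * s1n / k          # non-linear cross-coupling terms
        c2 = s2p * s2n / k
        d1p = wn * (xp + s2n - s1p / q - c1)
        d1n = wn * (xn + s2p - s1n / q - c1)
        d2p = wn * (s1p - c2)
        d2n = wn * (s1n - c2)
        d1 = wn * ((xp - xn) - s2 - s1 / q)
        d2 = wn * s1
        s1p, s1n = s1p + dt * d1p, s1n + dt * d1n
        s2p, s2n = s2p + dt * d2p, s2n + dt * d2n
        s1, s2 = s1 + dt * d1, s2 + dt * d2
        min_state = min(min_state, s1p, s1n, s2p, s2n)
        max_err = max(max_err, abs((s1p - s1n) - s1))
    return min_state, max_err

m, e = simulate_class_ab()
print(f"smallest state value: {m:.3f}, max differential mismatch: {e:.2e}")
```

The cross-coupling products appear identically in the P and N equations, so they cancel exactly in the differential update. The mismatch therefore stays at rounding level while all four state variables remain positive.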
Figure 5-8: Signal flow graph for the Class AB BPF
5.2.4 Implementation of the Non-linear Cross-coupling Terms 
The non-linear cross-coupling terms required for the implementation of the Class AB BPF are not accounted for in the LD integrator described in Section 5.2.2. However, it will be shown that these terms can be readily introduced by a slight modification of the integrator. Suppose it is required to add the input term −Iout1·Iout2/Iscale (Iscale being a constant current scaling parameter) to an integrator whose output is Iout1. A possible approach is to use a translinear loop M3-6, as shown in Figure 5-9.
It can be seen that the source voltages of M6 and M11 must be identical. M6 and M11 can therefore be removed and the source of M8 fed directly (dotted line) by M5. It can also be noted that M3-5 and M8-10 form a translinear loop, thus

$$I_8 = \frac{I_{out2}\,I_{tune}}{I_{scale}}$$

The value of Iscale is arbitrary and can conveniently be chosen equal to Itune, in which case the translinear loop forces I8 to be equal to Iout2. The non-linear cross-coupling term can thus be efficiently implemented by injecting the current Iout2 directly into the capacitor, as shown in Figure 5-10. The operation of the modified integrator can be verified as follows:
Translinear loop equation:

$$I_{in}\,I_{tune} = \left(I_{out2} - I_C\right) I_{out1}$$

Figure 5-9: Implementation of the non-linear cross-coupling terms via a translinear loop
Figure 5-10: Simplified implementation of the non-linear cross-coupling terms

Capacitor equation:

$$C\,\frac{dV_C}{dt} = I_C$$

Drain current equation:

$$I_{out1} = I_{tune}\,e^{\frac{V_B - V_C}{nU_T}}$$

Simplifying the above equations gives:

$$\frac{dI_{out1}}{dt} = \omega_n\left(I_{in} - \frac{I_{out1}\,I_{out2}}{I_{tune}}\right) \qquad (5.8)$$
5.2.5 Bandpass Filters Circuit Implementation 
The resulting circuit implementation of the differential second order Class AB BPF is shown in Figure 5-11. For correct operation, M3-6, M10, M11 and M13-15 must operate in the weak inversion region. M15 is used as a log compressor, while M14 implements an exponential expander. The non-linear cross-coupling terms are implemented via M1 and M2, and the linear coupling terms via M4. In order to reduce the effect of VDS variations on the drain current, all transistors are in fact cascoded; for clarity, the cascode transistors are not shown in the diagrams. The translinear loop equations are given by:
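$$\frac{I_{x-}\,I_{tune}}{I_4\,I_{o+}} = 1, \qquad \frac{I_{in+}\,I_{tune}}{I_{o+}\left(I_{o-} - I_4 + I_Q - I_{C1}\right)} = 1 \quad \text{and} \quad \frac{I_{o+}\,I_{tune}}{I_{x+}\left(I_{x-} - I_{C2}\right)} = 1$$

The drain current equations are given by:

$$I_{o+} = I_{tune}\,e^{\frac{V_B - V_{C1}}{nU_T}} \;\Rightarrow\; \frac{dI_{o+}}{dt} = -\frac{I_{o+}}{nU_T}\frac{dV_{C1}}{dt} = -\frac{I_{o+}\,I_{C1}}{nU_T C}$$
$$I_{x+} = I_{tune}\,e^{\frac{V_B - V_{C2}}{nU_T}} \;\Rightarrow\; \frac{dI_{x+}}{dt} = -\frac{I_{x+}}{nU_T}\frac{dV_{C2}}{dt} = -\frac{I_{x+}\,I_{C2}}{nU_T C}$$

Simplifying the above equations, it can be shown that the above section implements the following time domain equations:

$$\frac{dI_{o+}}{dt} = \omega_n\left[I_{in+}(t) + I_{x-}(t) - \frac{I_{o+}(t)}{Q} - \frac{I_{o+}(t)\,I_{o-}(t)}{I_{tune}}\right] \quad \text{and} \quad \frac{dI_{x+}}{dt} = \omega_n\left[I_{o+}(t) - \frac{I_{x+}(t)\,I_{x-}(t)}{I_{tune}}\right] \qquad (5.9)$$

where ωn = Itune/nCUT and Q = Itune/IQ. Similar equations are obtained for Io− and Ix−. The overall transfer function is given by:

$$H(s) = \frac{I_{o+}(s) - I_{o-}(s)}{I_{in+}(s) - I_{in-}(s)} = \frac{s\,\omega_n}{s^2 + \frac{\omega_n}{Q}s + \omega_n^2} \qquad (5.10)$$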
Figure 5-11: Second order Class AB differential BPF.
5.3 Envelope Extraction 
Envelope extraction, used in obtaining the spectral and IED cues, is carried out by signal squaring followed by second order lowpass filtering. The lowpass filter cut-off frequency is set equal to one-fifth of the resonant frequency of the corresponding BPF. The cut-off frequency is set by a scaled current mirror, which is used to generate the tuning currents of the BPFs and LPFs. The squaring and lowpass filtering circuits are described in detail in Chapter 4, Sections 4.3.2 and 4.3.3.
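The choice of an LPF cut-off at one-fifth of the BPF resonant frequency can be related to the residual ripple. Squaring a tone A·sin(ω0t) gives A²/2 − (A²/2)·cos(2ω0t), so the ripple-to-d.c. ratio of the envelope equals the LPF gain at 2f0. The sketch below assumes an ideal second order LPF with natural frequency f0/5 and ζ = 0.5. This damping is that of the Chapter 4 LPF and is assumed to carry over here; the function names are illustrative.

```python
import math

def lpf2_mag(x, zeta=0.5):
    """|H(jw)| of an ideal 2nd-order lowpass, x = w / w_c; zeta = 0.5 is assumed."""
    return 1.0 / math.sqrt((1.0 - x * x) ** 2 + (2.0 * zeta * x) ** 2)

def envelope_ripple(f0, ratio=5.0, zeta=0.5):
    """Ripple-to-d.c. ratio after squaring A*sin(2*pi*f0*t) and lowpass filtering.

    Squaring gives (A^2/2) * (1 - cos(4*pi*f0*t)): equal d.c. and 2*f0 terms,
    so the ratio is the LPF gain at 2*f0 (cut-off f0/ratio) over its d.c. gain.
    """
    f_c = f0 / ratio
    return lpf2_mag(2.0 * f0 / f_c, zeta) / lpf2_mag(0.0, zeta)

r = envelope_ripple(1e3)
print(f"ripple / d.c. = {r:.4f} ({20 * math.log10(r):.1f} dB)")
```

In this idealised model, both the cut-off and 2f0 scale with f0. The relative ripple (about 1%, or -40 dB) is therefore the same for every band.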
5.4 Biasing Arrangement of the BPF Array 
From the equations derived for LD filters, it can be seen that a PTAT current source is needed to keep the resonant frequency the same at different temperatures, so that the ratio Itune/nUT remains constant. The filters in the bank require an exponential progression in resonant frequency. To achieve this, linearly spaced voltages are produced by a resistive divider line together with an associated driver circuit, as shown in Figure 5-12. These voltages are applied to the gates of MOS devices operated in weak inversion. Since the drain current in weak inversion is exponential in the gate voltage, the linear voltage steps produce the required exponential current steps. M1 in Figure 5-12 (b) is matched to M1 in Figure 5-12 (a), which forms a negative feedback loop around the driver circuit M2-5. The driver circuit has three poles in its transfer function, which pose a stability problem. Compensation capacitors are therefore introduced at the high-impedance nodes to ensure stability. Values shown adjacent to transistors in Figure 5-12 (b) indicate the relative (W/L) ratios. The section shown in Figure 5-12 (b) is repeated for each filter. The currents through M1a and M1b are set to IPTAT and IPTAT/225 (generated by a PTAT source). These correspond to the bias currents of the two extreme filters in the bank (k = 24 and k = 1, respectively). For correct operation, two groups of transistors must be matched and operate in weak inversion: M1 in Figure 5-12 (a), and all of M1-8 in Figure 5-12 (b), whose gates are connected to the uniformly tapped polysilicon line. The resulting resonant frequencies are therefore given by:

$$\omega_n(k) = \frac{I_{tune\_BPF}(k)}{nCU_T} = \frac{I_{PTAT}}{225\,nCU_T}\,(225)^{\frac{k-1}{23}}, \qquad k = 1 \ldots 24 \qquad (5.11)$$
Using the scaled current mirror, the cut-off frequency of the LPF sections is set to 1/5 of the BPF centre frequency. Transistors M9-M13 are used as switches and are controlled by the 24 × 5-bit Q-factor calibration memory, which is synthesised via Verilog HDL. A default value of B(4:0) equal to (10101)2 (M4, M6 on; M3, M5, M7 off), resulting in a Q-factor of 10.25, is pre-programmed into the memory.
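The relation between the calibration word and the Q-factor can be sketched from the relative (W/L) ratios of Figure 5-12 (b). This minimal model rests on assumptions that the text does not state explicitly. It takes Q = I_tune/I_Q and assumes M8 (ratio 10) always contributes to I_Q. It also assumes bit B(i) drives the switch of transistor M(3+i) and that a 0 bit closes the switch. These mappings were chosen only because they reproduce the quoted default, (10101)2 giving M4 and M6 on and Q = 10.25. The function name is hypothetical.

```python
# Relative (W/L) ratios read from Figure 5-12 (b)
W_TUNE_BPF = 205                 # M1: BPF tuning current
W_TUNE_LPF = 41                  # M2: LPF tuning current (205 / 5)
W_Q_FIXED = 10                   # M8: assumed always to contribute to I_Q
W_Q_SWITCHED = (1, 2, 4, 8, 16)  # M3..M7, switched by M9..M13


def q_factor(code):
    """Return I_tune/I_Q for a 5-bit calibration word B(4:0).

    Assumption (not stated in the text): bit B(i) controls the switch of
    M(3+i) and a 0 bit closes the switch (active-low). This reproduces
    the stated default (10101)2 -> M4, M6 on -> Q = 10.25.
    """
    if not 0 <= code <= 0b11111:
        raise ValueError("calibration word must fit in 5 bits")
    i_q = W_Q_FIXED
    for bit, width in enumerate(W_Q_SWITCHED):
        if not (code >> bit) & 1:    # active-low: a 0 bit adds this device
            i_q += width
    return W_TUNE_BPF / i_q


for code in (0b00000, 0b10101, 0b11111):
    print(f"B = {code:05b} -> Q = {q_factor(code):.2f}")
```

Under these assumptions the 5-bit word spans Q = 5.0 (all switched devices on) to Q = 20.5 (all off). The LPF branch keeps the fixed 5:1 current ratio regardless of the word.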
Using C = 50 pF in the BPF circuit of Figure 5-11, and a tuning current I_tune,BPF(k) in the range 1 nA to 225 nA at 27 °C, a frequency range of approximately 80 Hz to 18 kHz can be achieved. The implementation of the PTAT source was based on the CMOS PTAT current reference shown in Figure 5-13. Transistors M1 and M2 are set such that the current through M3 is negligible during normal operation; however, this device is required in order to ensure that the PTAT source starts up. Since both M6 and M7 have their sources connected to ground, the body effect is cancelled out. Provided M6-M9 operate in weak inversion, and since (W/L)4 = 2(W/L)5, the current I_PTAT and the resulting resonant frequencies are given by:
$$ I_{PTAT} = \frac{nU_T \ln 2}{2R_1}, \qquad \omega_n(k) = \frac{\ln 2}{450\,R_1 C}\,225^{\frac{k-1}{23}}, \qquad k = 1,\dots,24 \tag{5.12} $$
This scheme makes ω_n(k) independent of the absolute transistor parameters I_D0, n and V_T. R1 is an external component and allows for trimming of the BPF resonant frequencies.
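Equation (5.12) can be evaluated directly to check the frequency plan of the bank. The sketch below assumes R1 = 61 kΩ, the resistance quoted in Section 5.9 for the 80 Hz to 18 kHz range, and C = 50 pF. The function name is illustrative. Because n and U_T cancel, only R1 and C set the absolute scale. The factor 225 and the exponent (k - 1)/23 fix a constant ratio of 225^(1/23) between adjacent filters.

```python
import math


def bpf_centre_frequencies(r1, c, n_filters=24, span=225.0):
    """Centre frequencies in Hz from Eq. (5.12).

    omega_n(k) = ln2 / (2 * span * R1 * C) * span**((k - 1) / (n_filters - 1)),
    i.e. ln2 / (450 R1 C) * 225**((k - 1) / 23) for the default arguments.
    """
    omega_1 = math.log(2) / (2 * span * r1 * c)       # lowest filter, rad/s
    return [omega_1 * span ** ((k - 1) / (n_filters - 1)) / (2 * math.pi)
            for k in range(1, n_filters + 1)]


# Assumed component values: R1 = 61 kOhm (external), C = 50 pF
freqs = bpf_centre_frequencies(r1=61e3, c=50e-12)
for k in (1, 12, 24):
    print(f"filter {k:2d}: {freqs[k - 1]:8.1f} Hz")
```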
Figure 5-12 (a): Resistive bias line and associated driver (transistors M1a-M5a and M1b-M5b; bias currents I_PTAT and I_PTAT/225; tapped outputs V_bias(k)).
Figure 5-12 (b): Bias transistors with Q-factor tuning (relative (W/L) ratios: M1 = 205, M2 = 41, M3-M7 = 1, 2, 4, 8, 16, M8 = 10; switches M9-M13; outputs I_tune,BPF, I_tune,LPF and I_Q,BPF).
Figure 5-13: PTAT current source together with start-up circuit (external resistor R1; outputs I_PTAT and I_PTAT/225).
5.5 Computation of IID and First Order MSC Cues 
The IID cues are a measure of the intensity difference between the L and R channels. The IIDs are in this case computed via a 4-transistor translinear circuit formed using MOS devices operating in weak inversion, as shown in Figure 5-14. M7,8 and M9,10 are used as current mirrors in order to transform the differential current signal generated by the LPF into a single-ended signal. M1-M4 operate in weak inversion; thus:
$$ \frac{I_{out}}{I_{ref}} = \frac{I_{in1+} - I_{in1-} + I_{ofs}}{I_{in2+} - I_{in2-} + I_{ofs}} \;\Rightarrow\; I_{out} = I_{ref}\,\frac{I_{in1} + I_{ofs}}{I_{in2} + I_{ofs}} \tag{5.13} $$
where I_in1 = I_in1+ - I_in1- and I_in2 = I_in2+ - I_in2- are the L and R envelope signals, and I_ofs is a constant current used to establish dc equilibrium for the case when I_in1 = I_in2 = 0. Since the envelope extraction is carried out via signal squaring without subsequently taking the square root, the IID and MSC values are in fact computed on the square of the input amplitude. First order MSCs are computed by dividing the envelope signals of adjacent filters in the same channel, using the same circuit as for the IIDs. In this case I_out is a representation of the 1st derivative of the square of the signal magnitude with respect to frequency.
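The ideal behaviour of Eq. (5.13) can be captured in a few lines. This is a sketch that assumes perfect weak-inversion translinear operation with no mismatch. The default I_ref = 100 nA and I_ofs = 100 pA are borrowed from the simulation settings of Figure 5-27. The helper for first order MSCs divides filter k+1 by filter k. That ordering is a hypothetical choice, since the text only says that adjacent envelopes are divided.

```python
I_REF = 100e-9   # reference current (Figure 5-27 setting)
I_OFS = 100e-12  # offset current (Figure 5-27 setting)


def translinear_divider(i_in1_p, i_in1_m, i_in2_p, i_in2_m, i_ref=I_REF, i_ofs=I_OFS):
    """Ideal output of the 4-transistor divider, Eq. (5.13), with differential inputs."""
    i_in1 = i_in1_p - i_in1_m            # L envelope (or filter k+1 for MSCs)
    i_in2 = i_in2_p - i_in2_m            # R envelope (or filter k for MSCs)
    return i_ref * (i_in1 + i_ofs) / (i_in2 + i_ofs)


def first_order_mscs(envelopes, i_ref=I_REF, i_ofs=I_OFS):
    """Ratios of adjacent squared envelopes within one channel.

    Hypothetical ordering: filter k+1 is divided by filter k.
    """
    return [translinear_divider(hi, 0.0, lo, 0.0, i_ref, i_ofs)
            for lo, hi in zip(envelopes, envelopes[1:])]


print(translinear_divider(0.0, 0.0, 0.0, 0.0))      # zero input -> I_ref
print(translinear_divider(100e-9, 0.0, 25e-9, 0.0))  # about 4 * I_ref
```

With both inputs at zero the offset terms cancel and the output sits at I_ref. This is why I_ofs defines a well-behaved equilibrium rather than an undefined 0/0.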
Figure 5-14: Circuit used for IIDs and 1st order MSCs (differential inputs I_in1+, I_in1-, I_in2+, I_in2-).
5.6 Second Order MSC Computation 
Second order MSCs are computed using the 6-transistor translinear loop circuit shown in Figure 5-15, where M1-M6 operate in weak inversion and M10-M15 are used for signal subtraction. Thus the output current is given by:
$$ I_{out} = I_{ref}\,\frac{(I_{in1} + I_{ofs})(I_{in3} + I_{ofs})}{(I_{in2} + I_{ofs})^2} \tag{5.14} $$
where I_in1, I_in2 and I_in3 are the squared envelope signals generated by three consecutive adjacent filters of the same channel. In this case I_out is a representation of the 2nd derivative of the squared signal magnitude with respect to frequency.
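The "second derivative" interpretation follows from taking logarithms of Eq. (5.14). With I_ofs negligible, ln(I_out/I_ref) = ln I_in1 - 2 ln I_in2 + ln I_in3. This is the second finite difference of the log squared envelope across three log-spaced filters. The sketch below assumes ideal translinear operation and reuses the Figure 5-27 current settings as illustrative defaults.

```python
import math


def second_order_msc(i_in1, i_in2, i_in3, i_ref=100e-9, i_ofs=100e-12):
    """Ideal 6-transistor translinear loop output, Eq. (5.14)."""
    return i_ref * (i_in1 + i_ofs) * (i_in3 + i_ofs) / (i_in2 + i_ofs) ** 2


# A spectrum that is a straight line on a log scale gives I_out = I_ref
print(second_order_msc(1e-9, 2e-9, 4e-9, i_ofs=0.0))
# A spectral peak at the middle filter pushes I_out well below I_ref
print(second_order_msc(1e-9, 4e-9, 1e-9, i_ofs=0.0))
```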
Figure 5-15: Circuit used for 2nd order MSCs.
5.7 Automatic Gain Control Loop
An AGC mechanism has been implemented in order to ensure that the BPFs operate within their dynamic range. The AGC loop adjusts the gain of the system according to the maximum of the envelope outputs of the filters, as depicted in the block diagram of Figure 5-16, thus ensuring that none of the BPF outputs are distorted. Since the AGC signal acts on both the L and R channel variable gain amplifiers (VGAs) in the same manner, IID cues are not modified by the AGC loop. This feature is also present in the biological auditory system, where the AGC signals of the L and R ears are known to be somewhat cross-coupled [43].
Figure 5-16: AGC loop arrangement (the L and R inputs pass through VGAs of gain A_i into the L and R channel filter banks and envelope detectors; a maximum current detector derives I_max from the envelope outputs E_L and E_R and feeds it back to the VGAs).
The circuit for detection of the maximum envelope current is shown in Figure 5-17. Considering the 2-cell arrangement shown, if I_in1 = I_in2, then the currents through M1 and M7 will be equal, causing I_o to be equal to I_in1 = I_in2. If I_in1 is now increased, then V_GS1 will have to increase such that I_DS1 reaches the new value of I_in1. V_GS7 will also increase, causing M7 to enter the triode region; thus I_o will be equal to I_in1. In general, I_o will be equal to the maximum of I_in1 and I_in2. In the actual front-end, 11 × 2 cells are used, one for each envelope signal.
Figure 5-17: Maximum current detector.
The input signal amplitude is controlled by the VGA shown in Figure 5-18. The VGA is formed around the translinear loops M1,3,10,11 and M4,6,10,11. C_agc is an external capacitor which controls the AGC reaction time. The steady-state gain of the VGA is given by:
$$ A_i = \frac{I_{out+} - I_{out-}}{I_{in+} - I_{in-}} = \frac{I_{agc1} - I_{max}}{I_{agc2}} \tag{5.15} $$
Maximum gain occurs with I_max = 0, in which case A_i,max = I_agc1/I_agc2. Under closed-loop conditions, for large input amplitudes, the maximum value of I_max will be equal to I_agc1. M2,5,7,8,9,12,13,14 ensure that the dc component in the differential outputs remains equal to that in the input, so that it is not attenuated for very low A_i values. Because of this feature, A_i,max has to be restricted to 1 by making I_agc1 = I_agc2.
Figure 5-18: Differential variable gain amplifier (inputs I_in+, I_in-; outputs I_out+, I_out-; bias currents I_agc1, I_agc2; control voltage V_agc; I_max supplied by the maximum detector).
In order to analyse the steady-state response of the AGC loop under different input amplitudes, consider a single sinusoidal signal applied at the input with amplitude I_i and frequency equal to one of the resonant frequencies of the filters. Since the envelope signals are extracted by a process of squaring and lowpass filtering, I_max will be equal to (I_i A_i Q²)²/(2 I_scale), where Q is the Q-factor of one section of the BPF cascade and I_scale is the scaling current used in the squaring circuit. Substituting the expression for A_i from Eq. (5.15) gives:
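$$ I_{max} = \frac{Q^4 I_i^2}{2 I_{scale}}\left(\frac{I_{agc1} - I_{max}}{I_{agc2}}\right)^2 \;\Rightarrow\; I_{max} = \frac{1 + 2 I_{agc1} k - \sqrt{1 + 4 I_{agc1} k}}{2k} \tag{5.16} $$
where $k = Q^4 I_i^2 / (2 I_{scale} I_{agc2}^2)$, and the root with I_max ≤ I_agc1 has been taken.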
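Equation (5.16) can be evaluated numerically to reproduce the steady-state curves. The sketch below uses the algebraically equivalent form I_max = 2 I_agc1² k / (1 + 2 I_agc1 k + sqrt(1 + 4 I_agc1 k)). This rationalised expression avoids cancellation when k is small. The default currents and Q = 10 match the values quoted for Figure 5-19. The function name is illustrative.

```python
import math


def agc_steady_state(i_in, q=10.0, i_agc1=1e-6, i_agc2=1e-6, i_scale=1e-6):
    """Steady-state (I_max, A_i) for a sinusoid of peak current i_in at a filter resonance.

    Implements Eq. (5.16) in the rationalised form
        I_max = 2 * I_agc1**2 * k / (1 + 2*I_agc1*k + sqrt(1 + 4*I_agc1*k)),
    with k = Q**4 * I_i**2 / (2 * I_scale * I_agc2**2).
    """
    k = q ** 4 * i_in ** 2 / (2 * i_scale * i_agc2 ** 2)
    a = i_agc1
    i_max = 2 * a ** 2 * k / (1 + 2 * a * k + math.sqrt(1 + 4 * a * k))
    gain = (a - i_max) / i_agc2          # Eq. (5.15)
    return i_max, gain


for i_in in (1e-9, 10e-9, 80e-9, 1e-6):
    i_max, gain = agc_steady_state(i_in)
    print(f"I_i = {i_in * 1e9:7.1f} nA -> I_max = {i_max * 1e9:7.1f} nA, A_i = {gain:.3f}")
```

The 10 nA and 80 nA cases correspond to the 1 µA pk input scaled by 0.01 and 0.08 in the Simulink test described later in this section. For these inputs the formula gives A_i ≈ 0.73 and A_i ≈ 0.16, in line with the gains quoted there.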
The plots in Figure 5-19 show the steady-state variation of I_max and A_i as the input amplitude is varied from 1 nA to 1 µA for the case when I_agc1 = I_agc2 = I_scale = 1 µA and Q = 10. The circuit used in the VGA ensures correct start-up whatever the initial voltage stored on C_agc. If the voltage across C_agc is initially zero, then I_max will be zero and C_agc is charged via I_agc1. Another condition may arise when the voltage across C_agc is so high that the translinear loop does not operate properly and I_max is also zero. Under this condition C_agc is discharged via M10 until a valid operating condition is obtained.
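The transient open-loop response of the VGA can be analysed by considering the transient current through C_agc. Writing A_i = I_o/I_agc2, where I_o depends exponentially on the capacitor voltage V_c, and noting that the current charging C_agc is I_agc1 - I_max - I_o:
$$ I_o \propto e^{(V_c - V_B - V_T)/(nU_T)}, \qquad \frac{dI_o}{dt} = \frac{I_o}{nU_T}\,\frac{dV_c}{dt} = \frac{I_o\,(I_{agc1} - I_{max} - I_o)}{nCU_T} $$
Thus,
$$ \frac{dA_i}{dt} = \frac{A_i\,(I_{agc1} - I_{max} - I_{agc2} A_i)}{nCU_T} \tag{5.17} $$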
The time domain response of the AGC as a result of a step input change in I_max is given by:
$$ A_i(t) = \begin{cases} \dfrac{I_{agc1} - I_{max}}{I_{agc2} + \dfrac{I_{agc1} - I_{max} - I_{agc2} A_i(0)}{A_i(0)}\, e^{-(I_{agc1} - I_{max})\,t/(nCU_T)}} & \text{for } I_{max} < I_{agc1} \\[2ex] \dfrac{A_i(0)}{1 + A_i(0)\, I_{agc2}\, t/(nCU_T)} & \text{for } I_{max} = I_{agc1} \end{cases} \tag{5.18} $$
Figure 5-19: Steady-state output current I_max and VGA gain A_i as a function of input amplitude I_i (1 nA to 1 µA).
The transient response of the VGA block with I_agc1 = I_agc2 = 1 µA, A_i(0) = 1 and C = 270 nF, for a step input I_max with amplitudes ranging from 0 to 1 µA, is shown in Figure 5-20. The capacitor C (C_agc) is external and allows for trimming of the AGC reaction time.
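The closed form (5.18) can be cross-checked against a direct numerical integration of (5.17). The sketch below uses the Figure 5-20 settings: I_agc1 = I_agc2 = 1 µA, A_i(0) = 1 and C = 270 nF. It also assumes n = 1.88, the value extracted from the PTAT slope in Section 5.9, and U_T = 26 mV. Whether that n applies to the VGA devices is an assumption. Both function names are illustrative.

```python
import math

N = 1.88      # assumed slope factor (value extracted from the PTAT slope in Section 5.9)
U_T = 0.026   # thermal voltage in volts, as used in the Simulink model


def gain_closed_form(t, i_max, a0=1.0, i_agc1=1e-6, i_agc2=1e-6, c=270e-9, n=N):
    """A_i(t) after a step in I_max, Eq. (5.18)."""
    q_c = n * c * U_T                    # n*C*U_T, in coulombs
    b = i_agc1 - i_max
    if b == 0:
        return a0 / (1 + a0 * i_agc2 * t / q_c)
    return b / (i_agc2 + (b - i_agc2 * a0) / a0 * math.exp(-b * t / q_c))


def gain_euler(t_end, i_max, a0=1.0, i_agc1=1e-6, i_agc2=1e-6, c=270e-9, n=N, steps=100_000):
    """Forward-Euler integration of Eq. (5.17), used as an independent check."""
    q_c = n * c * U_T
    dt = t_end / steps
    a = a0
    for _ in range(steps):
        a += dt * a * (i_agc1 - i_max - i_agc2 * a) / q_c
    return a


for i_max in (0.0, 0.2e-6, 0.4e-6, 0.6e-6, 0.8e-6, 1e-6):
    print(f"I_max = {i_max * 1e6:.1f} uA: A_i(0.5 s) = {gain_closed_form(0.5, i_max):.3f}")
```

For I_max < I_agc1 the gain settles exponentially to (I_agc1 - I_max)/I_agc2. For I_max = I_agc1 it decays only hyperbolically, which is why the largest step is the slowest to settle.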
Figure 5-20: Transient response of the VGA current gain A_i to a step input change in I_max (curves for I_max = 0, 0.2, 0.4, 0.6, 0.8 and 1 µA; time axis 0 to 0.5 s).
The closed-loop transient response of the AGC is simulated using the simplified MATLAB Simulink model shown in Figure 5-21. For this simulation, the AGC parameters are set as follows: I_agc1 = I_agc2 = I_scale = 1 µA, C = 220 nF and U_T = 26 mV. A BPF with a centre frequency of 1 kHz and an envelope LPF with a cut-off frequency of 200 Hz are modelled by the corresponding transfer functions. The response of the AGC loop is checked by applying a 1 kHz, 1 µA pk sinusoidal input multiplied by a step function having a value of 0.01 for the first 0.5 s and 0.08 thereafter. The corresponding graphs of the input current, VGA output, envelope output and VGA gain A_i are shown in Figure 5-22, where it can be seen that the VGA gain drops from 0.73 to 0.16 due to the increased input amplitude.
Figure 5-21: Simulink model of the AGC loop.
Figure 5-22: Simulink transient simulation results for the AGC loop showing input current, VGA output, envelope signal and current gain of the VGA (time axis 0 to 1 s).
5.8 Output Multiplexing 
In order to reduce the number of output pins, the individual output current mode 
signals from each of the 24 BPFs are multiplexed onto a single output pin, using nMOS 
pass transistors. Multiplexing is also applied to the processed envelope, IID and MSC 
cues. The five multiplexers are controlled by a single digital decoder, synthesised using 
Verilog HDL, as shown in Figure 5-23. The decoder is designed in such a way that 
applying (11111)B on its digital input causes all pass transistors to be turned on: this 
feature is useful for simulation purposes in order to enable termination of the various 
outputs such that current mode signals can be monitored. 
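The decoder behaviour can be modelled as a one-hot selection with an all-on override. This is a hypothetical sketch. The text only specifies that (11111)B turns every pass transistor on. The mapping of word k-1 to filter k, and all gates being off for the unused words 24 to 30, are assumptions made for illustration.

```python
N_FILTERS = 24
ALL_ON = 0b11111   # select word that enables every pass transistor


def pass_gates(select):
    """On/off state of the 24 pass transistors for a 5-bit select word.

    Hypothetical mapping: word k-1 selects filter k (words 0..23).
    Words 24..30 are assumed to leave every gate off.
    """
    if not 0 <= select <= 0b11111:
        raise ValueError("select word must fit in 5 bits")
    if select == ALL_ON:
        return [True] * N_FILTERS
    return [i == select for i in range(N_FILTERS)]


gates = pass_gates(11)   # under the assumed mapping, routes filter 12 to the pin
print([i + 1 for i, on in enumerate(gates) if on])
```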
5.9 Simulation Results 
The 2nd order BPF sections are simulated with a tuning current ranging from 1 nA to 1 µA. The corresponding frequency response for 8 filters, with I_q = I_tune/10 and C = 50 pF, is shown in Figure 5-24. Cascoding of the translinear loop devices and the current mirrors greatly improves the attenuation at low frequencies. For the low frequency bands the low frequency roll-off is limited; however, these bands are only used to extract phase information, so this limited roll-off is not critical.
Figure 5-23: Multiplexing arrangement for the BPF, envelope, IID and MSC cue outputs (a 5-to-24 decoder driven by a 5-bit select word switches pass transistors M1-M24, connecting I_BPF1 to I_BPF24 to the BPF output pin; the arrangement is repeated for the envelope, IID and MSC cues).
Figure 5-24: Frequency response of the BPFs for I_tune = 1 nA to 1 µA (curves for 1.0 nA, 2.7 nA, 7.2 nA, 19 nA, 52 nA, 133 nA, 373 nA and 1.0 µA; frequency axis 1 Hz to 100 kHz). The BPFs have been simulated with tuning currents of up to 1 µA, corresponding to a centre frequency of 50 kHz; however, 18 kHz is the highest centre frequency used.
The corresponding frequency response of the 2nd order LPF sections used for envelope extraction, with C = 50 pF and the tuning currents set to one fifth of those of the respective BPFs, is shown in Figure 5-25. The filter response shows the expected -40 dB/decade roll-off, with some degradation at high frequencies due to the parasitic capacitances associated with the MOS devices.
Figure 5-26 shows the PTAT output current as a function of temperature with R1 = 61 kΩ. This resistance setting corresponds to a BPF frequency range of 80 Hz to 18 kHz. The response is very linear within the range -20 to 120 °C, and the slope corresponds to a value of n equal to 1.88.
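Extracting n from the slope of the PTAT characteristic amounts to a straight-line fit followed by inverting dI/dT = n k ln2 / (2 q R1). That relation follows from the expression I_PTAT = nU_T ln2/(2R1) used with Eq. (5.12). The sketch below generates ideal synthetic data with n = 1.88 and R1 = 61 kΩ and recovers n. It does not use the measured curve of Figure 5-26. Measured (temperature, current) pairs would be passed to the same fitting function. The function names are illustrative.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant, J/K
Q_E = 1.602176634e-19  # elementary charge, C


def i_ptat(temp_c, n, r1):
    """Ideal PTAT current n * U_T * ln(2) / (2 * R1), with U_T = kT/q."""
    u_t = K_B * (temp_c + 273.15) / Q_E
    return n * u_t * math.log(2) / (2 * r1)


def slope_factor_from_sweep(temps_c, currents, r1):
    """Least-squares slope dI/dT, then n = 2 * q * R1 * (dI/dT) / (k * ln 2)."""
    m = len(temps_c)
    t_mean = sum(temps_c) / m
    i_mean = sum(currents) / m
    num = sum((t - t_mean) * (i - i_mean) for t, i in zip(temps_c, currents))
    den = sum((t - t_mean) ** 2 for t in temps_c)
    slope = num / den                    # A/K
    return 2 * Q_E * r1 * slope / (K_B * math.log(2))


# Synthetic sweep over the -20 to 120 degC range (assumed n = 1.88, R1 = 61 kOhm)
temps = list(range(-20, 121, 10))
currents = [i_ptat(t, n=1.88, r1=61e3) for t in temps]
print(f"recovered n = {slope_factor_from_sweep(temps, currents, r1=61e3):.3f}")
```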
Figure 5-25: Frequency response of the LPFs for I_tune = 200 pA to 200 nA.
The simulated transfer characteristic I_out versus I_in1 for the IID/MSC divider, with I_ref = 100 nA and I_ofs = 100 pA, is shown in Figure 5-27 for different values of I_in2. For the case when I_in1 = I_in2 = 0 nA, the output is equal to 100 nA (= I_ref), as determined by I_ofs.
Figure 5-27: Divider transfer characteristic (I_out versus I_in1 = 0 to 100 nA, for I_in2 = 25, 50, 75 and 100 nA).
The power dissipation for the whole front-end is approximately 0.94 mW when 
operated at ± 0.9 V. Most of the power is dissipated in the sections tuned to higher 
frequencies due to the associated higher tuning currents. This value of power dissipation 
is obtained with the multiplexer input set to (11111)B and hence all outputs enabled. 
Under normal operation it is expected that the power dissipation is slightly less, since then 
only the outputs generated by a single filter will be active at a time. 
5.10 Testing Results 
The chip has been fabricated (layout in Appendix A.3); it has a die size of 13 × 11 mm and consists of about 150 k transistors. The test set-up is shown in Appendix A.2. The external components are used for the generation of bias currents and voltages. The current-mode output signals are converted into voltage-mode signals via a differential transresistance amplifier having a transresistance of 1 MΩ.
5.10.1 Bandpass Filters 
The frequency response of the BPFs was tested using an Advantest R3265A spectrum analyser together with a tracking generator. The output currents from the BPFs are converted into voltages using a 1 MΩ transresistance differential amplifier based on low-noise (TL074) op amps, and attenuated by 20 dB before being fed into the spectrum analyser. The measured and corresponding estimated (using Eqn. 5.12) centre frequencies are shown in Figure 5-28. The measured values represent the average resonant frequencies of the L and R channels; the discrepancy between the L and R resonant frequencies is found to be within 1.7 %. The estimated and measured values track well, especially for the low frequency filters. However, for the higher frequency filters (21-24), the measured resonant frequency is lower than the estimated value. This discrepancy possibly arises from parasitic capacitances and also from operation near the boundary between the subthreshold and strong inversion regions.
Figure 5-28: Estimated and measured resonant frequency of each of the 24 BPFs (* = ideal resonant frequency, O = measured resonant frequency).
During measurement, it is found that with the nominal Q-tuning current setting most filters exhibit a Q-factor which is too high, which also renders them marginally unstable. At high Q-factor values, the Q-factor is very sensitive to component variations and any unaccounted-for parasitics. Hence, during the test procedure, the Q-factors are tuned by appropriately loading the calibration memory, in order to achieve a uniform peak response for all filters. Figure 5-29 shows the adjusted I_tune/I_Q value required to achieve a uniform, stable peak response for all filters. The resulting frequency response of the BPFs is measured, and the results are shown in Figures 5-30 (a)-(d) for the left channel, filter numbers 3, 10, 18 and 24.
Figure 5-29: Adjusted I_tune/I_Q for uniform peak response of the L and R BPFs (* = left channel, O = right channel; vertical axis 8.9 to 9.9, filters 1 to 24).
Figure 5-30 (a): Frequency response of the 3rd BPF measured using an Advantest spectrum analyser together with a tracking generator.
Figure 5-30 (b): Frequency response of the 10th BPF measured using an Advantest spectrum analyser together with a tracking generator.
Figure 5-30 (c): Frequency response of the 18th BPF measured using an Advantest spectrum analyser together with a tracking generator.
Figure 5-30 (d): Frequency response of the 24th BPF measured using an Advantest spectrum analyser together with a tracking generator.
(Analyser settings for all panels: REF 44.9 dBmV, ATT 20 dB, 10 dB/div, RBW 10 Hz, VBW 1 Hz, sweep time 50 s, start frequency 100 Hz; display line at -46.1 dBmV.)
The THD, measured for the 12th filter for different input amplitudes at the resonant frequency of 1.12 kHz, is shown in Figure 5-31. All readings are taken with the V-I converter input resistors equal to 1 MΩ. As expected in fully differential topologies, the distortion components mainly consist of odd harmonics, since even harmonics are cancelled out. At a THD of 1.9 % (corresponding to V_in = 500 mV), the output differential current is already 6 µA pk, and hence some of the MOS devices are indeed operating in strong inversion. This shows that LD CMOS circuits provide satisfactory results even in moderate strong inversion. The measured dynamic range at 1.9 % THD is around 68 dB.
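The cancellation of even harmonics in a balanced topology follows from odd symmetry. If the transfer characteristic satisfies f(-x) = -f(x), a sinusoidal input produces only odd harmonics. The sketch below illustrates this with tanh, the large-signal characteristic of an ideal differential pair. tanh is used only as a stand-in, not as the transfer function of the LD filter. The sketch compares it with a version that has an added even-order term and computes THD from the harmonic amplitudes.

```python
import numpy as np


def harmonic_amplitudes(transfer, amplitude, n_harmonics=7, n_samples=4096):
    """Amplitudes of harmonics 1..n_harmonics of transfer(amplitude * sin(wt)).

    Exactly one period is sampled, so each harmonic falls on a single FFT bin.
    """
    t = np.arange(n_samples) / n_samples
    y = transfer(amplitude * np.sin(2 * np.pi * t))
    spectrum = np.abs(np.fft.rfft(y)) * 2 / n_samples   # single-sided amplitudes
    return spectrum[1:n_harmonics + 1]                   # index 0 = fundamental


def thd_percent(h):
    """THD in percent: RMS sum of harmonics 2..N relative to the fundamental."""
    return 100 * np.sqrt(np.sum(h[1:] ** 2)) / h[0]


balanced = harmonic_amplitudes(np.tanh, 1.0)                           # odd-symmetric
unbalanced = harmonic_amplitudes(lambda x: np.tanh(x) + 0.1 * x ** 2, 1.0)
print("2nd/4th/6th, balanced:  ", balanced[1::2])
print("2nd/4th/6th, unbalanced:", unbalanced[1::2])
print(f"THD (balanced) = {thd_percent(balanced):.2f} %")
```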
Figure 5-31: THD as a function of input amplitude V_in into the 1 MΩ V-I converter resistor, measured for the 12th filter at an input frequency of 1.12 kHz (equal to the filter resonant frequency).
Results obtained from two-tone IMD tests carried out on the 12th BPF for two frequencies near the pass-band, namely 1.02 kHz and 1.22 kHz, are tabulated in Table 5-1 (a). The test is repeated with two out-of-band signals (500 Hz and 620 Hz) whose IMD product (1.12 kHz) falls in the pass-band. In this case, the intermodulation products measured relative to the thermal noise floor (at -46.1 dBmV) are tabulated in Table 5-1 (b).
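The frequencies at which two-tone intermodulation products appear can be listed systematically as |m f1 + n f2| with m, n ≠ 0. The sketch below enumerates products up to third order. For the out-of-band pair, f1 + f2 = 500 + 620 = 1120 Hz lands on the 12th filter's resonance. For the in-band pair, the tabulated sidebands at 200 Hz and 2.2 kHz appear to correspond to f2 - f1 and f1 + f2 = 2.24 kHz. The function name is illustrative.

```python
from itertools import product


def intermod_products(f1, f2, max_order=3):
    """Map each product frequency |m*f1 + n*f2| to the (m, n) pairs producing it.

    Only mixed products (m != 0 and n != 0) with |m| + |n| <= max_order are kept.
    """
    out = {}
    for m, n in product(range(-max_order, max_order + 1), repeat=2):
        if m == 0 or n == 0 or abs(m) + abs(n) > max_order:
            continue
        f = abs(m * f1 + n * f2)
        if f > 0:
            out.setdefault(round(f, 6), []).append((m, n))
    return out


for pair in ((1020, 1220), (500, 620)):
    print(pair, sorted(intermod_products(*pair)))
```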
1.02 kHz input (V pk)   1.22 kHz input (V pk)   Sideband at 200 Hz   Sideband at 2.2 kHz
0.2                     0.2                     -70 dB               -75 dB
0.4                     0.4                     -67 dB               -69 dB
0.8                     0.8                     -60 dB               -65 dB
1.0                     1.0                     -54 dB               -59 dB
Table 5-1 (a): Intermodulation tests carried out on the 12th BPF with inputs at 1.02 kHz and 1.22 kHz (output sidebands measured relative to the outputs at 1.02 (and 1.22) kHz).
500 Hz input (V pk)   620 Hz input (V pk)   Sideband at 1.12 kHz relative to noise floor
0.2                   0.2                   3 dB
0.4                   0.4                   5 dB
0.8                   0.8                   12 dB
1.0                   1.0                   15 dB
Table 5-1 (b): Intermodulation tests carried out on the 12th BPF with inputs at 500 Hz and 620 Hz.
5.10.2 Complete System Response Including AGC 
A 1.12 kHz sinusoidal input modulated by a square-wave function (frequency 2 Hz) is used to obtain the response of the front end with the AGC loop enabled. The same input is applied to both channels; however, the signal applied to the R channel is set to one-half of that applied to the L channel via a voltage divider. The 12th L and R envelope signals are shown, together with their respective IID output, in Figure 5-32 (a). During this test, the IID reference current is set to 1 µA, while the offset current is set to 100 pA. The AGC capacitor value is 0.22 µF. Ch1 shows the R-input waveform, whose amplitude changes from 25 mV to 50 mV. The ch2 and ch3 waveforms are the corresponding L and R channel (squared) envelope outputs. Due to the compression imposed by the AGC loop, the envelopes only change by a factor of 1.3. Ch4 shows the corresponding IID waveform, which is around 4 V and represents the ratio between the (squared) L and R envelope signals. A 2:1 amplitude ratio gives a 4:1 ratio of squared envelopes, i.e. 4 × I_ref = 4 µA, or 4 V across the 1 MΩ transresistance. As expected, the IID value is not affected by the AGC.
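The expected IID reading, and its insensitivity to the AGC, can be checked with the ideal divider of Eq. (5.13). The AGC gain A_i scales both input amplitudes, so both squared envelopes scale by A_i² and the ratio is unchanged. The test settings are I_ref = 1 µA, I_ofs = 100 pA and a 1 MΩ transresistance. The unity-gain envelope currents (2 µA and 0.5 µA) are hypothetical values chosen for illustration.

```python
I_REF = 1e-6      # IID reference current used in the test
I_OFS = 100e-12   # IID offset current used in the test
R_T = 1e6         # transresistance of the output amplifier, ohms


def iid_volts(env_l, env_r, agc_gain=1.0, i_ref=I_REF, i_ofs=I_OFS, r_t=R_T):
    """IID output voltage for given unity-gain squared-envelope currents.

    The AGC gain scales both input amplitudes, so both squared envelopes
    scale by agc_gain**2 before entering the ideal divider of Eq. (5.13).
    """
    e_l = env_l * agc_gain ** 2
    e_r = env_r * agc_gain ** 2
    return r_t * i_ref * (e_l + i_ofs) / (e_r + i_ofs)


# R input is half the L input, so its squared envelope is a quarter (hypothetical currents)
env_l, env_r = 2e-6, 0.5e-6
for g in (1.0, 0.5, 0.2):
    print(f"A_i = {g}: IID = {iid_volts(env_l, env_r, g):.3f} V")
print(f"no input: IID = {iid_volts(0.0, 0.0):.3f} V")
```

The small drift below 4 V at low gain comes from I_ofs becoming comparable to the shrinking envelopes. With no input, the output falls to I_ref × R_T = 1 V, matching the dips described for the triangular-modulation test.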
The above test is repeated with the input modulated using a triangular waveform instead (frequency 2 Hz), and the results are shown in Figure 5-32 (b). The corresponding L and R (squared) envelopes, shown by ch2 and ch3 respectively, clearly demonstrate the AGC compression. The corresponding IID value (ch4) is also around 4 V, except for dips around the zero crossings of the modulating triangular waveform. There the IID signal drops to around 1 V, as determined by the I_ofs setting of the IID divider. Figure 5-32 (c) shows the same test repeated with the AGC loop disabled and with the L and R inputs halved in order to prevent distortion. In this case the (squared) envelope signals show the expected uncompressed quadratic response (with some distortion near the peaks due to overloading of the envelope detection circuits). The IID value is also around 4 V in this case, except for some distortion near the peaks of the triangular waveform. The measured quiescent current consumption of the whole front end, including the output currents, is 494 µA (power consumption 890 µW), of which 2 µA are due to the output currents. The total power consumption increases to 1.1 mW when a 1 V, 1 kHz signal is applied to both the L and R inputs.
Figure 5-32 (a): Complete front-end test with a modulated sinusoidal input signal (1.12 kHz modulated by a 2 Hz square wave). Ch1 - right input, ch2 - L-envelope, ch3 - R-envelope, ch4 - corresponding IID signal.
Figure 5-32 (b): Complete front-end test with a modulated sinusoidal input signal (1.12 kHz modulated by a 2 Hz triangular wave). Ch1 - right input, ch2 - L-envelope, ch3 - R-envelope, ch4 - corresponding IID signal. The compression introduced by the AGC loop is evident.
Figure 5-32 (c): Complete front-end test with a modulated sinusoidal input signal (1.12 kHz modulated by a 2 Hz triangular wave) with the AGC disabled. Ch1 - right input, ch2 - L-envelope, ch3 - R-envelope, ch4 - corresponding IID signal. In this case, the inputs are halved with respect to those used in Figures 5-32 (a) and (b); however, the envelope signals are large in magnitude and on the verge of distortion.
5.11 Conclusions
An LD CMOS low-voltage front-end chip intended for use in a 2-D sound localisation system, comprising a BPF bank and circuits for the extraction of IIDs and MSCs, has been designed, fabricated and tested. LD processing enables low power operation and a wide dynamic range, while the blocks themselves are capable of operating at a low supply voltage of ± 0.9 V. Results show that good linearity is still achieved even when the CMOS LD circuits operate in the moderate strong inversion region. A parallel BPF architecture has been adopted rather than a more precise cochlea model in order to reduce the accumulation of errors, noise and delay. Furthermore, the response of one filter does not affect the response of the other filters, as would happen in a cascade approach. Variations in Q-factor due to mismatches are compensated via a calibration memory. Distortion due to overdrive conditions is prevented by an AGC loop which reduces the gain at the input for large signal amplitudes, thus further increasing the effective dynamic range of the front end.
Test results show good agreement between the estimated and measured resonant frequencies of the BPFs, and also good matching between the L and R channels. However, Q-factor tuning is necessary in order to obtain the same peak response for all filters, which is required for the correct evaluation of the IID and monaural spectral cues. In practice, a highly regulated power supply voltage has to be used in order to achieve a stable Q-factor value, even though the nominal Q-factor is accurately determined by a current mirror ratio. One reason for this is the finite output conductance of the MOS devices, which causes the drain current, and hence the Q-factor, to vary with supply voltage. Another reason could be the voltage-dependent parasitic capacitances which exist in the MOS structure, such as the channel-to-substrate depletion capacitance.
Chapter 6. Computation of Time Delay Cues 
Chapter 6 
Computation of Time Delay Cues 
6.0 Introduction 
The time delay between the acoustic signals arriving at the left and right inputs conveys important information about the sound source's position [108]. In the biological system, the superior olive region in the brainstem contains cells that are preferentially responsive to sounds with specific interaural time differences. These cells receive inputs from both the left and right cochleae via their respective hair cells and auditory neurones. They can discriminate differences in the arrival time of sounds at the two ears on the order of microseconds, up to a maximum of about 1 ms in humans [120].
This chapter describes the development of a time delay estimation chip capable of extracting the time delay cues, namely interaural phase delay and interaural envelope delay cues. The final chip is based on a time-division multiplexed topology in order to minimise the area requirement. Switched-capacitor techniques have been used for this application because of the time delay and accuracy requirements involved. Novel building blocks (namely a switched-capacitor multiplier and a high speed op amp) have been designed in order to enable operation at low supply voltage, while still achieving high speed and low power. Section 6.1 discusses the analogue hardware building blocks required for time delay extraction. The design of a chip based on a cascade delay line topology is described in section 6.2. Section 6.3 gives the details of a time division multiplexed topology. This topology necessitates the use of a high speed operational amplifier, which is described in section 6.4. The various building blocks for the time division multiplexed topology are described in sections 6.5 to 6.10: the parallel sample-and-hold bank, multiplier, integrator, analogue memory, comparator and digital control. Simulation and testing results for this chip are then reported in sections 6.11 and 6.12, respectively.
6.1 ITD Extraction using Analogue Hardware 
The time delay between two signals is obtained by first computing the cross-correlation function at specific delay positions:
$$ r(n) = \int_0^T R(t)\,L(t - n\tau_d)\,dt, \qquad n \in [0 \dots 43] \tag{6.1} $$
In the above expression R(t) and L(t) represent the right and left input signals, τ_d is the minimum detectable delay difference (resolution), n corresponds to discrete delay positions and r(n) is the correlation result. In order to determine the actual delay between R(t) and L(t), it is necessary to find the value of n for which r(n) is maximised. In terms of hardware, the cross-correlation process requires a multiplier, an integrator and a tapped delay line to act as a storage space for previous samples. Furthermore, a comparator (together with some temporary storage) is required in order to determine the value of n which maximises r(n). In this implementation, all blocks have been designed using SC techniques in order to achieve high accuracy together with the possibility of operation at low supply voltages. Thus, a digital control block is also required, both to generate the various clock phases needed by the different building blocks and to coordinate the interaction between the blocks.
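The multiply, integrate and compare sequence described above can be sketched in software as a discrete form of Eq. (6.1). This sketch uses NumPy dot products where the chip uses SC multipliers and integrators. It assumes both signals are sampled at the resolution τ_d and have equal length, and it keeps the 44 non-negative delay positions n = 0 to 43 of Eq. (6.1). The test signals are synthetic noise, and the function names are illustrative.

```python
import numpy as np


def correlation_profile(right, left, max_lag=43):
    """Discrete form of Eq. (6.1): r[n] = sum_t R[t] * L[t - n], n = 0..max_lag.

    Both inputs are assumed sampled with spacing tau_d and of equal length.
    """
    right = np.asarray(right, dtype=float)
    left = np.asarray(left, dtype=float)
    n_samples = len(right)
    return np.array([np.dot(right[n:], left[:n_samples - n]) for n in range(max_lag + 1)])


def estimate_delay(right, left, max_lag=43):
    """Delay position n (in units of tau_d) that maximises r(n)."""
    return int(np.argmax(correlation_profile(right, left, max_lag)))


rng = np.random.default_rng(0)
left = rng.standard_normal(2000)
right = np.concatenate([np.zeros(7), left[:-7]])   # R lags L by 7 * tau_d
print(estimate_delay(right, left))
```

The argmax step plays the role of the comparator and temporary storage. It only needs to remember the largest r(n) seen so far and its index.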
6.2 Cascade Delay Lines 
The calculation of IPDs and IEDs requires the use of a delay line in order to store previous signal values, which are used during the calculation of the cross-correlation. One method of implementing delay lines is to use a cascade of delay elements, where the signal is transferred from one element to the next after a specific time delay τ_d. Some hardware techniques for the implementation of cascade delay lines have been reported [21], [36], [99]. Continuous-time delay lines are not well suited to this application since a large number of cascaded delay elements is needed to achieve the required delay. Asynchronous pulse delay lines using CMOS circuitry operating in the subthreshold region consume little power but suffer from very low accuracy due to hardware mismatch. This is because hardware mismatches not only affect the amplitude of the propagating signal, but also cause variations in τ_d of the individual delay elements of the delay line. The use of SC delay elements is preferred since in these circuits τ_d of each individual element is precisely determined by the clocking frequency; low THD values are also possible in SC circuits [121]. In this way the IPD and IED cues can be computed with good accuracy.
In a SC implementation, a 2-phase clocking system is usually sufficient to control such a delay line. The main problem with this approach is that the signal may degrade as it propagates along the delay line due to offset, gain mismatch and noise. Another approach is to use a parallel bank of sample-and-hold (S/H) circuits or analogue memories. In this case, the digital control circuit is often more complex, since it needs to replace the oldest value by a new input sample and also to output the stored values in the required sequence.
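The extra control complexity of the parallel S/H approach comes from addressing: each new sample overwrites the oldest cell, and the cells must then be read out in delay order. A minimal sketch of that bookkeeping follows. It assumes a simple write pointer that wraps around the bank; the class and method names are hypothetical.

```python
class SampleHoldBank:
    """Hypothetical model of a parallel bank of N sample-and-hold cells.

    Stored values never move between cells (unlike a cascade delay line);
    only a write pointer advances, so a value is not copied from cell to cell.
    """

    def __init__(self, n_cells):
        self.cells = [0.0] * n_cells
        self.write_ptr = 0  # index of the cell holding the oldest sample

    def write(self, sample):
        # Overwrite the oldest value, then advance the pointer (modulo N).
        self.cells[self.write_ptr] = sample
        self.write_ptr = (self.write_ptr + 1) % len(self.cells)

    def read_sequence(self):
        # Cell addresses ordered newest -> oldest, i.e. delays 0, 1, ..., N-1.
        n = len(self.cells)
        newest = (self.write_ptr - 1) % n
        return [(newest - k) % n for k in range(n)]

    def delayed_values(self):
        return [self.cells[i] for i in self.read_sequence()]


bank = SampleHoldBank(4)
for x in [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]:
    bank.write(x)
print(bank.delayed_values())  # [6.0, 5.0, 4.0, 3.0]
```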
6.2.1 Circuit Design 
A dual-channel 20-stage SC delay line operating at ±0.9 V has been designed, fabricated and tested. The delay line uses a SC integrator as a delay element, where the integrating capacitor is discharged by the following stage after each clock cycle. The delay line consists of identical delay elements (Figure 6-1 (b)), except for the first one (Figure 6-1 (a)), which has to be interfaced to the inputs. The last element requires a termination block containing two switches, which are essential for discharging the integration capacitor, as shown in the complete diagram of Figure 6-1 (c).
Figure 6-1 (a): First element of the SC delay line.
Figure 6-1 (b): Subsequent delay elements.
Figure 6-1 (c): Complete SC delay line (signals Lin, Rin, Lprop(n), Rprop(n), Lout(n), Rout(n)).
Since all parasitic capacitors associated with the switches are either connected to a voltage source with a low output resistance (for example, an op amp output) or discharged every clock cycle, the above configuration is parasitic-insensitive. It can also be seen that the maximum offset voltage is always due to three op amp offsets, and thus in general it does not accumulate along the delay line. Also, since the delay line relies on charge transfer from the capacitor of the previous stage to the next one, the output voltage at every stage depends on only one capacitor ratio, and hence the ratio error does not accumulate. All switches are implemented using CMOS transmission gates. The design is quite area efficient, requiring one op amp and two capacitors per delay unit, where each delay unit processes both channels. Unlike in continuous-time circuits, the delay in this case is uniquely defined by the clock frequency.
Figure 6-2: (a) Non-overlapping clock phases K1, K2, separated by a dead-band; (b) corresponding two-phase clock generator for K1, K2 together with their complements nK1, nK2 (inputs CK and RST/nRST).
The switches are controlled by the 2-phase clock generator shown in Figure 6-2. In order to allow for the finite turn-off times of the switches, a dead-band is introduced between the two clock phases; it is produced by the propagation delay Tdb of the weak inverters in the feedback path of the latch. The complement of each phase is also required in order to drive the CMOS transmission gates used as switches. Since the outputs of the clock generator drive all the switches in the delay line, they must be adequately buffered; otherwise capacitive loading can severely increase the rise and fall times of these signals. This would reduce the maximum sampling frequency and possibly also the effective dead-band between the two phases. The RST signal disables both clock phases and discharges all integration capacitors. This signal is mainly used during simulation in order to initialise the circuit to a known state.
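The dead-band mechanism can be illustrated with a behavioural, discrete-time model of a generic cross-coupled NOR non-overlapping clock generator. In this model each phase can only rise after the other phase has been low for a feedback delay, which stands in for Tdb. This is an assumed textbook topology, not a transcription of Figure 6-2, and the gates themselves are treated as delay-free.

```python
def non_overlapping_clocks(ck, delay):
    """Return (k1, k2) for a clock sample list ck (0/1) and a feedback delay in samples (>= 1)."""
    k1 = [0] * len(ck)
    k2 = [0] * len(ck)
    for t, c in enumerate(ck):
        # Each NOR gate sees the clock (or its inverse) and the delayed other phase.
        k1_fb = k1[t - delay] if t >= delay else 0
        k2_fb = k2[t - delay] if t >= delay else 0
        k1[t] = int(not ((not c) or k2_fb))
        k2[t] = int(not (c or k1_fb))
    return k1, k2


def dead_bands(k1, k2):
    """Lengths of the intervals during which both phases are low."""
    runs, length = [], 0
    for a, b in zip(k1, k2):
        if a == 0 and b == 0:
            length += 1
        elif length:
            runs.append(length)
            length = 0
    return runs


ck = ([0] * 10 + [1] * 10) * 5
k1, k2 = non_overlapping_clocks(ck, delay=2)
print(dead_bands(k1, k2))  # one dead-band of 2 samples at every clock edge
```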
6.2.2 Test Results for the Delay Line 
The op amp is designed to have a rail-to-rail output, so that the delay line is capable of handling rail-to-rail signals. It consists of a differential pair, a folded-cascode stage and a common-source output stage, as shown in Figure 6-3.
Figure 6-3: Op amp used in the SC delay line.
Ten prototypes of the 20-stage delay line, together with a separate op amp, have been fabricated; the layout of the chip is shown in Appendix A.4. The op amp has been tested individually with the bias current set to around 20 µA using an external bias resistor. The op amps on the test chips exhibit a measured open loop gain of 80 dB and a phase margin of 40°. The -3 dB corner frequency is at 4 kHz, while the unity gain bandwidth is 25 MHz. The probe used in carrying out these measurements presents an effective loading of 12 pF in parallel with 10 MΩ. A slew rate of 4 V/µs is obtained for the op amp in unity gain non-inverting configuration with a 400 mV pk-pk, 300 kHz square wave applied as input. The magnitude of the input offset voltage, as measured on the 10 prototypes, is less than 5 mV in each case.
Measurements have been carried out on the whole delay line for total harmonic distortion (THD), inter-channel crosstalk, intermodulation distortion and d.c. offset. The THD is measured using a 10 kHz, 200 mV pk sinusoidal input (the test signal itself having a THD of -56 dB) applied to each channel separately, with the other channel input set to zero. A clock frequency of 500 kHz is used during this test. The THD measured along the delay line is -40 dB or less at each tap position, and there is no tendency for the THD to increase with distance from the first tap. At a higher supply voltage (±1.5 V), the THD values obtained are -50 dB or less. The most probable cause of distortion is the on-resistance of the transmission gates, which can be very high at low supply voltages and severely increases the settling time of the circuit. At a clock frequency of 100 kHz, the THD is -45 dB or less.
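The THD figures above compare the harmonic content at a delay line tap with the fundamental. A minimal sketch of such a calculation follows. It assumes coherent sampling (an integer number of signal periods in the record), so that a single-bin DFT at each harmonic gives its exact amplitude. The 500 kHz sample rate matches the clock used in the test, but the synthetic waveform is purely illustrative and the function names are hypothetical.

```python
import math


def tone_amplitude(x, fs, f):
    """Peak amplitude of the component at frequency f (single-bin DFT)."""
    n = len(x)
    re = sum(v * math.cos(2.0 * math.pi * f * i / fs) for i, v in enumerate(x))
    im = sum(v * math.sin(2.0 * math.pi * f * i / fs) for i, v in enumerate(x))
    return 2.0 * math.hypot(re, im) / n


def thd_db(x, fs, f0, n_harmonics=5):
    """THD in dB: root-sum-square of harmonics 2..n_harmonics relative to the fundamental."""
    v1 = tone_amplitude(x, fs, f0)
    vh = [tone_amplitude(x, fs, k * f0) for k in range(2, n_harmonics + 1)]
    return 20.0 * math.log10(math.sqrt(sum(v * v for v in vh)) / v1)


# Synthetic 10 kHz, 200 mV pk tone with a 2 mV second harmonic, sampled at
# 500 kHz for 10 periods (500 samples): THD = 20*log10(2/200) = -40 dB.
fs, f0 = 500e3, 10e3
x = [0.2 * math.sin(2 * math.pi * f0 * i / fs)
     + 0.002 * math.sin(2 * math.pi * 2 * f0 * i / fs) for i in range(500)]
print(round(thd_db(x, fs, f0), 2))
```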
Crosstalk measurements have been carried out by applying a 10 kHz, 200 mV pk sinusoidal signal at the input of one channel and measuring the output of the other channel, whose input is suppressed. In each case, the measured crosstalk is at least 40 dB below the injected signal, and it shows very little tendency to increase along the delay line. At a sampling frequency of 100 kHz, the crosstalk is approximately 8 dB lower than at 500 kHz. Crosstalk can result from coupling via stray capacitances or through the power supply. Another possible source of crosstalk in the time division multiplexed (TDM) delay line is the incomplete discharge of certain parasitic capacitances due to the on-resistance of the CMOS switches, which is significant at low supply voltages; this crosstalk component increases as the sampling frequency is increased.
Intermodulation distortion measurements have been carried out for two cases. In the first case, two sinusoidal signals (200 mV pk) of different frequencies (10 and 7 kHz) are applied to the L and R channels, respectively. In the second case, both signals are applied to the same channel via a resistive network, while the input of the other channel is suppressed. In both cases, the measured intermodulation components are 56 dB or more below the injected signals, indicating that intermodulation distortion is negligible in this delay line.
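For the two-tone test it helps to know where the intermodulation products fall. The short sketch below lists the second- and third-order products |m·f1 ± n·f2| (with m, n ≥ 1) for the 10 kHz and 7 kHz tones used above. These are the frequencies at which intermodulation components would be looked for; the function name is hypothetical.

```python
def im_products(f1, f2, max_order=3):
    """Map each order m+n (2..max_order) to the sorted product frequencies."""
    products = {}
    for order in range(2, max_order + 1):
        freqs = set()
        for m in range(1, order):
            n = order - m
            freqs.add(abs(m * f1 - n * f2))  # difference product
            freqs.add(m * f1 + n * f2)       # sum product
        products[order] = sorted(freqs)
    return products


print(im_products(10, 7))  # frequencies in kHz
# {2: [3, 17], 3: [4, 13, 24, 27]}
```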
The d.c. offset voltage at different positions along the delay line has been measured with the inputs of both channels suppressed. At some tap positions, offset voltage magnitudes as high as 50 mV are observed. The offset voltage limits the dynamic range of the delay line when it is used as part of a correlator, since the input signal level must be large enough that the correlation peak is not affected by the offset generated by hardware non-idealities. The offset voltage varies considerably with the amplitude of the digital signals controlling the clock, and it shows no tendency to increase along the delay line. These results suggest that the main component of the offset voltage is clock feedthrough or charge injection in the transistor switches rather than op amp offset. The effects of clock feedthrough and charge injection can be reduced by using dummy switches and fully differential techniques.
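The dependence of the offset on the clock amplitude is consistent with the usual first-order models of switch errors on a hold capacitor CH. Clock feedthrough through the gate overlap capacitance Cov gives ΔV = Vclk·Cov/(Cov + CH). Channel-charge injection gives ΔV ≈ k·W·L·Cox·(VGS − VT)/CH, where k is the fraction of channel charge dumped onto CH (often taken as 1/2). Both terms grow with the gate drive. The sketch below evaluates these textbook expressions; all device values in it are hypothetical and are not taken from the fabricated chip.

```python
def clock_feedthrough(v_clk, c_ov, c_hold):
    """Step on the hold capacitor coupled through the gate overlap capacitance."""
    return v_clk * c_ov / (c_ov + c_hold)


def charge_injection(w, l, c_ox, v_overdrive, c_hold, fraction=0.5):
    """Step caused by the fraction of channel charge dumped onto the hold capacitor."""
    q_channel = w * l * c_ox * v_overdrive
    return fraction * q_channel / c_hold


# Hypothetical values: 2 um x 0.8 um switch, Cox = 2e-3 F/m^2 (2 fF/um^2),
# Cov = 1 fF, CH = 1 pF, 1.8 V clock swing, 0.5 V gate overdrive.
ft = clock_feedthrough(1.8, 1e-15, 1e-12)
ci = charge_injection(2e-6, 0.8e-6, 2e-3, 0.5, 1e-12)
print(f"feedthrough {ft * 1e3:.2f} mV, injection {ci * 1e3:.2f} mV")
```

In a transmission gate the NMOS and PMOS contributions have opposite signs and partially cancel, and a fully differential structure cancels the common part to first order. This is why those techniques are suggested above.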
The variation of the signal amplitude along the line has also been noted. The maximum variation is 8% and is due to mismatch between the input capacitor and the integration capacitor used at each particular stage. In the chip layout, the delay elements are placed close to each other in a matrix, so the integration capacitors are not all physically close to the input capacitor. Better capacitor matching could have been obtained by placing all the capacitors near the input capacitor or by using the common-centroid technique. Nevertheless, the variation of the signal magnitude does not increase along the delay line: this is expected from the circuit topology, which relies on charge transfer rather than voltage duplication.
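The claim that charge transfer prevents the gain error from accumulating can be checked with a simple Monte Carlo sketch. It assumes, hypothetically, independent Gaussian capacitor errors of 1% and compares the gain at the 20th tap for two idealised topologies. In charge transfer, the tap gain is the single ratio Cin/Cint,k. In voltage duplication, each stage copies the previous stage's voltage with its own gain error, so the errors multiply.

```python
import random
import statistics


def tap_gain_spread(n_stages=20, sigma=0.01, trials=2000, seed=1):
    """Standard deviation of the last-tap gain for the two idealised topologies."""
    rng = random.Random(seed)
    charge, cascade = [], []
    for _ in range(trials):
        c_in = 1.0 + rng.gauss(0.0, sigma)
        c_int = [1.0 + rng.gauss(0.0, sigma) for _ in range(n_stages)]
        # Charge transfer: last-tap gain set by one ratio, C_in / C_int of that stage.
        charge.append(c_in / c_int[-1])
        # Voltage duplication: every stage multiplies in its own gain error.
        gain = 1.0
        for _ in range(n_stages):
            gain *= 1.0 + rng.gauss(0.0, sigma)
        cascade.append(gain)
    return statistics.stdev(charge), statistics.stdev(cascade)


charge_sd, cascade_sd = tap_gain_spread()
print(f"charge transfer: {charge_sd:.4f}, voltage duplication: {cascade_sd:.4f}")
```

With these assumptions the spread at tap 20 is roughly √2·σ for charge transfer and roughly √20·σ for voltage duplication.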
6.3 Time Division Multiplexed SC Correlator 
A SC cross-correlator has been designed using time-division multiplexing, such that only one op amp per channel is required for each processing block. This method reduces the area requirement, although it can potentially increase the power consumption due to the higher op amp speed (slew rate) requirements [122] and the more complex digital control circuitry. However, the power consumption of the op amps can be reduced via dynamic biasing. A fully differential structure has been used in order to reduce the effects of clock feedthrough, charge injection and common-mode noise coupling; reducing the effects of clock feedthrough and charge injection is one of the main challenges in designing SC circuits. SC circuits are useful for low voltage operation, since the capacitors can be used to generate the required voltage shifts between adjacent circuit blocks. However, the reliable switching of the CMOS gates themselves is another challenge.
Figure 6-4: Block diagram of the analogue TDM correlator (blocks: parallel S/H bank, SC multiplier, integrator bank, diff-to-single-ended converter, comparator and fully differential analogue memory; inputs L-in, R-in; output: max. correlation position).
The analogue section of the correlator is divided into six blocks, as shown in Figure 6-4. All blocks are under the control of a digital control unit, which has been synthesised using Verilog HDL. As is common in most SC circuits, the output of each block is only valid during a particular clock phase; hence it must be ensured that the sampling phase of each block coincides with the appropriate phase of the preceding block. The parallel S/H bank, integrator bank and analogue memory contain just one op amp each. For ITD extraction the maximum delay to be measured is 1 ms. High-level software simulations indicate that it is sufficient to split this delay into 44 discrete steps. The above arrangement implements the cross-correlation and comparison processes described in Section 6.1.
The input data is sampled at 44.1 kHz, that is every 22.7 µs, which is equal to τd. After each input sample is taken, the parallel S/H bank, SC multiplier and integrator bank have to process all 44 stored left-channel samples before the next input sample is taken. Thus the processing rate for these blocks has to be one process per 0.52 µs, that is, 1.94 Mprocesses/s.
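The timing budget quoted above follows directly from the sampling rate and the number of delay positions. The short calculation below reproduces it, and also shows that 44 steps of 22.7 µs cover just under the 1 ms maximum delay.

```python
fs = 44.1e3          # input sampling rate (Hz)
n_delays = 44        # number of discrete delay positions
tau_d = 1.0 / fs     # delay resolution = sampling period
t_slot = tau_d / n_delays   # time available per multiply-integrate process
rate = 1.0 / t_slot

print(f"tau_d = {tau_d * 1e6:.1f} us, max delay = {n_delays * tau_d * 1e3:.3f} ms")
print(f"time per process = {t_slot * 1e6:.3f} us, rate = {rate / 1e6:.3f} Mprocesses/s")
```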
The system works in two modes: correlation mode and minimum delay selection mode. During the correlation mode, input samples are taken and stored in the parallel S/H bank in a first-in-first-out (FIFO) manner. After a new input sample is acquired, all stored left-channel values L(t - nτd) are multiplied by the current right-channel input R(t) and integrated on the corresponding capacitor in the integrator bank. This process continues for a specific integration period T. After this period has elapsed, the delay position corresponding to the maximum correlation is selected using a comparator and an analogue memory. At the start of this process, the analogue memory is loaded with the first integrated value (corresponding to zero delay). Each subsequent integrated value is then compared with the value stored in the analogue memory. If it is greater, the analogue memory is overwritten with the new value and the position of this value is stored digitally. In this way, the ITD value, computed as the position of the maximum correlation value, is determined after 44 comparisons. Only 5 stages have been implemented in the current chip; this still enables the functionality of the correlator to be assessed.
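The two operating modes can be summarised by a behavioural model of the TDM correlator. The sketch below is a hypothetical software analogue, not the Verilog control unit. A FIFO stands in for the parallel S/H bank, a list of accumulators for the integrator bank, and a single stored value plus index for the analogue memory and the digital position register.

```python
import random
from collections import deque


class TdmCorrelatorModel:
    """Behavioural sketch of the correlation and delay-selection modes."""

    def __init__(self, n_taps=44):
        # fifo[n] holds L(t - n*tau_d); maxlen drops the oldest sample automatically.
        self.fifo = deque([0.0] * n_taps, maxlen=n_taps)
        self.integrators = [0.0] * n_taps

    def correlate(self, l_sample, r_sample):
        """Correlation mode: store the new L sample, then multiply-accumulate every tap with R(t)."""
        self.fifo.appendleft(l_sample)
        for n, l_delayed in enumerate(self.fifo):
            self.integrators[n] += r_sample * l_delayed

    def select_peak(self):
        """Delay selection: sequential compare against a single stored maximum."""
        memory, position = self.integrators[0], 0  # start with the zero-delay value
        for n in range(1, len(self.integrators)):
            if self.integrators[n] > memory:
                memory, position = self.integrators[n], n
        return position


rng = random.Random(0)
left = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
right = [0.0] * 17 + left[:-17]  # R lags L by 17 sample periods
model = TdmCorrelatorModel()
for l_s, r_s in zip(left, right):
    model.correlate(l_s, r_s)
print(model.select_peak())
```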
6.4 High Speed Op Amp Design 
Two op amps have been designed for the current application. The present ITD computation circuit uses the first op amp. The second op amp is an enhanced version of the first, offering higher gain and better performance at low supply voltages.
6.4.1 Class AB Slew-Boosted Differential Op Amp 
The schematic diagram for the first op amp is shown in Figure 6-5. The op amp 
has been designed to operate in class AB mode: this enables the use of low quiescent bias 
current while still enabling high output currents. Furthermore, a slew-rate boosting block 
has been incorporated in order to enhance the speed of the op amp during large voltage 
transitions. A common-mode feedback (CMFB) circuit is required in order to keep the 
common mode voltage of the outputs at 0 V. 
The op amp consists of two gain stages: a differential pair (M10–M13) and a common-source output stage (M1, M6). The input common mode voltage is set to VSS. Transistors M2, M3 and M7–M9 are used to transfer the signal from the drain of M10 to the gate of M1. In the current implementation, Ibias is set to 60 µA and M3, M4 and M20 are all set to the same size. Transistors M7–M11 are also identical. Thus the quiescent current through these transistors is equal to 30 µA. The aspect ratio of M6 is set to 8 times that of M9, and similarly that of M1 is set to 8 times that of M2. Thus the current through M1 and M6 is equal to 240 µA. The low frequency gain of the whole amplifier is approximately given by:
gm12 A,, _ 
gds 12 +gds/0 +gds22 +gds23 
cgm7 öm6 + gm/ 
gm1 
gds 1 +gds6 +g 
(6.2) 
Figure 6-5: Circuit diagram of the first op amp.
From the transistor sizes used, it can be deduced that gm12 = gm7·gm1/gm6, which ensures that the signal gain of the path through M1 is equal to that of the path through M6. Each node in the signal path contains a diode-connected transistor, except for the output node and the node connected to the gate of M6. Miller frequency compensation is thus achieved by connecting a Miller capacitor between the gate and drain terminals of M6. A zero-nulling resistor RZ, with its value set to 1/gm6, is also used in order to improve the phase margin and the settling time of the op amp. The dominant pole is introduced at a node which is common to both signal paths via M1 and M6 and, therefore, no additional zero terms are introduced in the frequency response.
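The choice RZ = 1/gm6 can be understood from the standard first-order result for a Miller-compensated stage with a nulling resistor in series with the compensation capacitor Cc. In that case the zero lies at fz = 1/[2πCc(1/gm − RZ)]. The sketch below evaluates this textbook expression with hypothetical values (gm = 1 mS, Cc = 2 pF). It ignores the second signal path through M1, so it is only indicative for this class AB stage.

```python
import math


def miller_zero_hz(gm, c_c, r_z):
    """Zero of a Miller stage with nulling resistor: f_z = 1 / (2*pi*C_c*(1/gm - R_z)).

    Positive result: right-half-plane zero; negative: left-half-plane zero;
    math.inf when R_z = 1/gm exactly (zero pushed to infinity).
    """
    denom = c_c * (1.0 / gm - r_z)
    if denom == 0.0:
        return math.inf
    return 1.0 / (2.0 * math.pi * denom)


gm6, cc = 1e-3, 2e-12  # hypothetical transconductance and compensation capacitor
for rz in (0.0, 1.0 / gm6, 2.0 / gm6):
    print(f"R_Z = {rz:7.1f} ohm -> f_z = {miller_zero_hz(gm6, cc, rz):.3g} Hz")
```

Without RZ, the zero sits in the right half-plane and erodes phase margin. At RZ = 1/gm it moves to infinity, and a slightly larger RZ turns it into a left-half-plane zero.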
The slew rate of an op amp is generally limited by the value of the compensation capacitor, the load capacitance and the bias current. In general, the slew rate can be improved by increasing the bias currents, but this also increases the power consumption and lowers the gain of the op amp due to a reduced gm/gds ratio. The slew rate of this op amp has been improved by boosting the bias currents during slewing conditions [123]. The slew detector has been built around M14–M17 and controls the current-boost transistor M19. The slew detector operates by computing the difference between the currents through M11 and M11'. Under quiescent conditions, the currents through these two transistors are identical, and so are the currents through M14 and M15; in this case the current through M18 and M19 is negligible. During slewing, a current difference exists between M14 and M15, and this difference is injected into M18. The slew detector has been replicated in such a way that it performs full-wave rectification of the current difference. Depending on its direction, a current difference is therefore reflected as current in M18 or M18'. In either case, M19 or M19' conducts, causing the bias current in the whole op amp to rise. The maximum peak bias current is limited by the linear range of the slew detector: the maximum boost current is reached when M17 or M17' enters the triode region, at which point the gate voltage of M19 or M19' is nearly equal to VDD. The slew detector only gives an additional speed improvement if its reaction time is small enough. For this reason, parasitic capacitances have to be minimised in this block: the transistors have been sized as small as possible while still achieving accurate mirroring. Errors in the mirror transistors inside the slew detector, arising from the finite output resistance of the MOS devices, result in some leakage boost current under quiescent conditions. In the present design this has been limited to less than 5 µA.
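The static behaviour of the slew detector can be summarised as a full-wave rectifier with a limited linear range plus a small leakage term. The sketch below is a hypothetical behavioural model. The unity gain factor and the 200 µA limit are assumed values, and the 5 µA leakage is chosen to match the bound quoted above. The model is not derived from the transistor-level circuit.

```python
def boost_current(i_a, i_b, gain=1.0, i_max=200e-6, leakage=5e-6):
    """Full-wave rectified boost current (A) for the branch currents i_a, i_b (A).

    Symmetric in the sign of the imbalance, clipped where the detector leaves
    its linear range, plus a quiescent leakage term.
    """
    return min(gain * abs(i_a - i_b), i_max) + leakage


print(boost_current(30e-6, 30e-6))   # quiescent: leakage only
print(boost_current(80e-6, 30e-6))   # slewing in one direction
print(boost_current(30e-6, 80e-6))   # same boost for the opposite direction
print(boost_current(1e-3, 0.0))      # large imbalance: clipped at i_max + leakage
```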
Common-mode feedback in low voltage rail-to-rail output op amps is a critical issue [124]. In the current implementation, the output common-mode voltage is sensed directly via the resistors Rcm. This method avoids the need for a rail-to-rail input amplifier in the CMFB circuit. The CMFB amplifier is based on a single differential pair with a split output node; in this stage, M21 and M24 are sized twice M22 and M23, respectively. The gate voltage of the differential transistors is around 0 V, which leaves very little headroom for the tail current source. Thus a resistor, rather than a current source, has been used to provide the tail current. Although this lowers the CMRR of the CMFB amplifier, CMRR is not an important property in CMFB circuits. The value of Rcm must be high enough that it does not degrade the voltage gain of the op amp by a large amount. However, a large value of Rcm also introduces a pole with the parasitic capacitance at the gate of M21, which degrades the frequency response of the CMFB circuit and can also cause instability. For this reason capacitor Ccm has been introduced in order to provide coupling at high frequencies and thus cancel the effect of this parasitic pole. The CMFB loop is frequency-compensated by the main compensation capacitors Cc.
The above CMFB topology allows very fast settling of the common mode output voltage. It also makes the op amp quite versatile, since its action is continuous and independent of the clocking of the SC circuit in which the op amp is used. Compared to SC CMFB techniques, the sensing resistors limit the gain of the op amp and consume extra current. However, resistive sampling has been chosen in this case because SC sampling techniques often require an extra clock phase in order to refresh the sampling capacitors. This means that during one clock phase the op amp is configured for CMFB refresh rather than for the actual processing, and therefore the effective speed of the op amp would have to be doubled in order to achieve the same processing throughput. Furthermore, SC CMFB circuits often require several clock cycles before an accurate CM output voltage can be guaranteed [125]. SC CMFB circuits have further problems: the accuracy of the output CM voltage is limited by clock feedthrough and charge-injection errors, and MOS switches in the signal path are difficult to operate reliably at low supply voltages.
6.4.2 Experimental Results for the First Differential Op Amp 
The above op amp has been fabricated and ten prototypes were tested; the corresponding chip layout is shown in Appendix A.5. For testing purposes, three versions of the op amp have been implemented on the same die. The first is the full op amp (with Rcm = 10 kΩ), the second requires external CMFB, and the third is the full op amp with several internal probe points and with the facility to disable the current boost circuit. The op amps have been tested using a ±0.9 V supply, and all measurements were carried out using a 10:1 probe having an equivalent input impedance of
10 MΩ // 12 pF. For the complete op amp the open loop DC gain is around 52 dB, while with external CMFB resistors set to 100 kΩ the gain increases to about 55 to 58 dB. The -3 dB bandwidth is around 120 kHz for the complete op amp and 100 kHz for the second version. The experimental gain values are considerably lower than the 65 dB obtained in simulation using the MOS transistor model (Model 15) available in the AMS 0.8 µm technology kit. The gain of the first stage obtained in simulation (MOS Model 15) is around 40 (gate of M1 or gate of M6 relative to the single-ended input), while that of the second stage is around 44 (output relative to the gate of M1 or M6). The corresponding values measured on the probed circuit are 20 for the first stage and 22 for the second stage. Simulation with an updated MOS model (Model 53, BSIM3 ver. 3) gives results practically identical to the experimental ones. The difference between the gains obtained using Model 15 and Model 53 arises mainly from different calculated values of output conductance, probably because the transistors operate near the triode/pinch-off boundary due to the restricted supply voltage. In fact, the output conductance of the relevant transistors obtained using Model 53 is about twice that obtained using Model 15 in the present circuit, which explains the gain reduction by a factor of two in each stage.
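The overall gains quoted above are consistent with the stage gains. Multiplying the two stage gains and converting to decibels gives roughly 65 dB for the Model 15 simulation and about 53 dB for the measured stages, and halving the gain of each stage costs 20·log10(4) ≈ 12 dB:

```python
import math


def db(gain):
    """Voltage gain expressed in decibels."""
    return 20.0 * math.log10(gain)


sim_model15 = 40 * 44   # first-stage gain x second-stage gain (simulation, Model 15)
measured = 20 * 22      # corresponding measured stage gains

print(f"Model 15: {db(sim_model15):.1f} dB, measured: {db(measured):.1f} dB")
print(f"difference: {db(sim_model15) - db(measured):.1f} dB")
```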
Since this op amp is used throughout the ITD extraction circuit, the latter has also been simulated using MOS Model 53. The results are practically identical to those obtained using Model 15. This is explained by the fact that the op amps are used in a unity-gain feedback configuration, so the circuit behaviour is highly desensitised to their open loop gain. Furthermore, the parallel S/H bank has been designed to tolerate a low op amp gain.
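The insensitivity to open loop gain can be quantified for a unity-gain buffer, whose closed-loop gain is A/(1 + A), assuming a feedback factor of one. The sketch compares the static gain error for the Model 15 and measured open loop gains derived from the stage gains above:

```python
def unity_gain_error(a_ol):
    """Relative static error of a unity-gain follower with open loop gain a_ol."""
    closed_loop = a_ol / (1.0 + a_ol)
    return 1.0 - closed_loop


for a_ol in (1760, 440):   # about 65 dB (Model 15) and about 53 dB (measured)
    print(f"A = {a_ol}: gain error = {100 * unity_gain_error(a_ol):.3f} %")
```

A fourfold drop in open loop gain moves the error only from about 0.06% to about 0.23%.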
6.4.3 Enhanced Gain Class AB Differential Op Amp 
The second op amp is based on the first, but includes additional novel enhancements that improve the gain and low supply voltage operation and reduce the quiescent power dissipation. The enhancements are:
- a novel cascoded input stage with a regulated CM output voltage;
- a continuous-time CMFB with a CM current introduced in the sensing resistors, to ensure correct operation of the CMFB amplifier at low voltages;
- a regenerative current boosting technique with an inherent activation threshold, which ensures a very high slew rate while keeping the boost currents negligible under quiescent conditions, even in the presence of circuit mismatches.
The proposed op amp achieves a gain bandwidth (GBW) of 35 MHz and a slew rate of 210 V/µs when operated at 1.8 V. It should be useful for most switched capacitor circuits operating at a clock frequency of 13 MHz.
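A rough check of the 13 MHz figure rests on four assumptions. Settling is taken to be single-pole and linear in unity-gain feedback, so the closed-loop time constant is 1/(2π·GBW). A full half clock period is assumed to be available for settling, and slewing is ignored. Real SC configurations with a smaller feedback factor would settle more slowly.

```python
import math

gbw = 35e6        # gain-bandwidth product (Hz)
f_clk = 13e6      # SC clock frequency (Hz)

tau = 1.0 / (2.0 * math.pi * gbw)   # closed-loop time constant at unity feedback
t_settle = 1.0 / (2.0 * f_clk)      # half a clock period
n_tau = t_settle / tau
error = math.exp(-n_tau)            # residual linear settling error

print(f"tau = {tau * 1e9:.2f} ns, half period = {t_settle * 1e9:.1f} ns")
print(f"{n_tau:.1f} time constants, settling error = {100 * error:.3f} %")
```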
6.4.3.1 Cascoded Input Stage 
The differential input stage is shown in Figure 6-6. M1 and M2 form the differential pair, while M13 provides the tail current, which is set to 60 µA. M9–M12 are cascode transistors. The currents I3Ba and I3Bb are negligible under non-slewing conditions. Transistor pairs M3, M4 and M5, M6 are driven by the complementary outputs of the first stage and therefore effectively act as constant current sources. The current through the cascode branches is set by M7 and M8 and is equal to 30 µA, and thus the current through M3–M6 is also equal to 30 µA. This topology achieves the same differential gain as a standard cascoded differential pair, given by:
Avl = gm 
öm, 
ords, a // gm, rds, _ (6.3) 
gds, Sds5 +gds6 +gds_ 
However, it has the advantage that the output CM voltage of the stage is well defined by: 
2L 113 + 2I, (6.4) 
'k "n 
Výmo - VTn + 
F2 
4)+ 
Vss 
This feature is essential in order to be able to establish the quiescent operating current of 
the next stage, without additional CMFB networks, except for the one which is required 
for the output stage. 
Figure 6-6: Input stage of the second op amp.
6.4.3.2 Class AB Output Stage 
In order to reduce the power consumption while still achieving a high output drive capability, a class AB output stage based on [126], shown in Figure 6-7, has been used. The output node of the cascode stage directly drives the pull-down transistor M23, while transistors M15, M17, M19, M25 and M26 provide the necessary voltage shift for the pull-up transistor M21. In addition, M26 is used for output CM voltage correction. The nominal current through M26 is equal to 60 µA, while transistors M15, M17 and M25 are identical to M3–M6. Thus the current through M15, M17, M19 and M25 is also equal to 30 µA. Hence, the current through M21 and M23 can be accurately controlled, and it relies mainly on device matching rather than on absolute process parameter values. Frequency compensation of the op amp is achieved via C1, while R1 is a zero-nulling resistor set to provide some feed-forward compensation and hence improve the phase margin [127].
Figure 6-7: One side of the class AB output stage.
6.4.3.3 Common-Mode Feedback 
In order to avoid the problems associated with SC CMFB circuits discussed in Section 6.4.1, a continuous-time CMFB approach has been adopted, as shown in Figure 6-8. Resistors R3 and R4 form a resistive summing network used to extract the output CM voltage. The OTA formed by M28–M31 constitutes the CMFB error amplifier. The main problem with such an arrangement is that the CM voltage reference is only 0.9 V above VSS, which leaves very little headroom for M30 and M32 to operate correctly. This problem is solved by forcing a CM current ISHIFT2 through R3 and R4 in order to generate the voltage shift which ensures correct operation of the CMFB error amplifier. In this design, R3 and R4 are equal to 10 kΩ and ISHIFT2 is equal to 60 µA, resulting in a CM input voltage of 1.2 V above VSS at the input of the CMFB amplifier. In order to compensate for the CM current passing through R3 and R4, M23 and M24 are sized to pass 30 µA more than the current in M21 and M22. The reference voltage for the CMFB amplifier is set by R6 (equal to half the value of R3) and ISHIFT1 (equal to ISHIFT2). Capacitors C5 and C6 provide a direct signal path at high frequencies and thus generate a zero which compensates for the pole introduced by R3, R4 and the input capacitance of the CMFB amplifier.
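The level shift follows from Ohm's law. The calculation below rests on two stated assumptions. First, ISHIFT2 is taken to split equally between R3 and R4. Second, both the sensed CM and the reference start from the 0.9 V (above VSS) output CM level before being shifted upwards.

```python
r3 = r4 = 10e3        # CM sensing resistors (ohm)
i_shift2 = 60e-6      # CM current forced through R3 and R4 (A)
v_cm_out = 0.9        # desired output CM level, volts above VSS

i_per_resistor = i_shift2 / 2                    # assumed equal split: 30 uA each
v_cmfb_in = v_cm_out + i_per_resistor * r3      # shifted CM seen by the CMFB OTA

r6 = r3 / 2           # reference-side resistor (half of R3)
i_shift1 = i_shift2   # reference-side shift current (equal to ISHIFT2)
v_ref = v_cm_out + i_shift1 * r6                 # assumed same 0.3 V upward shift

print(f"extra sink current per output: {i_per_resistor * 1e6:.0f} uA")
print(f"CMFB input: {v_cmfb_in:.2f} V, reference: {v_ref:.2f} V above VSS")
```

The 30 µA per resistor matches the excess current that M23 and M24 are sized to sink. The equal shifts on the sensing and reference sides leave the CM error unchanged.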
Frequency compensation of the CMFB loop is provided by C3 and C4, which 
effectively act as Miller capacitance since the voltage gain between the gate of M26 and 
the gate of M21 is close to unity and all poles in this section of the circuit are essentially at 
high frequencies. R5 is used as a zero-nulling resistor. This arrangement allows the GBW 
and phase margin of the CMFB loop to be set independently of the op amp GBW and 
phase margin. The current through M32 is set to 60 µA while the (W/L) ratio of M28 and 
M29 is equal to one half that of M26 in order to maintain a balanced current condition with 
zero CM error. 
Figure 6-8: CMFB amplifier.
6.4.3.4 Slew Rate Enhancement 
Since the output stage operates in class AB, the slew rate of the op amp is mainly limited by the current available to charge or discharge the compensation capacitors C1 and C2 in Figure 6-7. In order to improve the slew rate, the current through M7 (M8) and M13 has to be increased, and additional current sinks I3Ba (I3Bb) have to be added in parallel with M3 and M4 (M5 and M6). Denoting the boost currents passed through M7 and M13 during slewing by I7B and I13B, respectively, the single-ended slew rate of the op amp may be written as:
SRS =17B-I3B+j2 
+I13a 
t 
_I38+I2 
_I 
SR Cl C, (6.5) 
In order to achieve symmetric slewing, SR+ must be equal to SR−; this condition is achieved by ensuring that I13B = I3B = 2I7B.
The boost currents are generated by the circuit shown in Figure 6-9. Under non-slewing conditions, the currents through M68–M71 are approximately equal, and therefore the currents through M54–M58 and M63–M67 are negligible. During slewing, the current difference between M68 and M71 (M69 and M70) generates a boost current through M54–M58 (M63–M67), which effectively increases the charging current available for the compensation capacitors. M54–M58 and M63–M67 also act as half-wave rectifiers, such that only one branch is active during slewing. In this design, the (W/L) ratio of M59 (and M61) is set slightly larger than that of M62 (and M60). This ensures that the boost currents are negligible under quiescent conditions, even in the presence of some mismatch in M68–M71 and M59–M62. The current boosting scheme is regenerative, which ensures a rapid enhancement of the slew rate.
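The combined effect of the half-wave rectification and the deliberate W/L offset between M59 (M61) and M62 (M60) can be pictured as two one-sided rectifiers with a built-in threshold. The sketch below is a hypothetical static model: the threshold and gain values are assumed, and the regenerative dynamics are not modelled. It shows that a small quiescent imbalance produces no boost, while a large slewing imbalance activates only one branch.

```python
def half_wave_boost(i_a, i_b, threshold=3e-6, gain=1.0):
    """Return (boost_up, boost_down) in amperes for the two half-wave branches.

    Each branch conducts only when the imbalance in its own direction
    exceeds the built-in threshold, so at most one branch is active.
    """
    up = gain * max(0.0, (i_a - i_b) - threshold)
    down = gain * max(0.0, (i_b - i_a) - threshold)
    return up, down


print(half_wave_boost(31e-6, 30e-6))   # quiescent mismatch of 1 uA: no boost
print(half_wave_boost(80e-6, 30e-6))   # slewing: only the first branch conducts
```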
Figure 6-9: Slew boost circuit.
6.4.3.5 Biasing Arrangement 
The biasing circuitry is shown in Figure 6-10, which also shows the boost current injection points for I7B and I13B. The quiescent current through the op amp is set by I1, which is mirrored to the various op amp sections. High-swing cascode mirrors have been used in order to ensure good mirroring, even though the transistors operate close to the triode region due to the low supply voltage. M46 and M47 mimic the operation of M1 and M2; their gates are connected to VSS since, in the intended application, the input CM voltage will be set to VSS. Similarly, M52 mimics the operation of M30 and M31 in the CMFB OTA, while M42 mimics the operation of M9 and M10 in the input stage.
Figure 6-10: Bias generator 
6.4.4 Simulation Results 
Simulation results using the BSIM3 version 3 MOS model show that the high-speed Class AB fully differential op amp is capable of operating at supply voltages down to 1.4 V (± 0.7 V). The open-loop frequency response of the op amp with a load capacitance of 5 pF is shown in Figure 6-11(a). Although the op amp is optimised for operation at a supply voltage of 1.8 V (± 0.9 V), the frequency response is also plotted for supply voltages of 1.6 V and 1.4 V. Figure 6-11(b) shows the input-referred noise of the op amp as a function of frequency. The thermal noise floor is below -150 dB and is mainly due to transistors M3 - M6 in the input stage.
Figure 6-12 shows the transient response of the op amp in unity-gain configuration, with a capacitive load of 5 pF and a supply voltage of 1.8 V (± 0.9 V). In this case the input amplitude is set to 1.4 V. The test is also repeated with the current-boost circuit disabled: without slew boosting, the slew rate falls from 210 V/µs to 42 V/µs. The op amp exhibits fast settling of both the differential output and the output CM level. Fast settling of the output CM level is necessary not only to achieve rail-to-rail output operation but also to ensure correct operation of the subsequent stages. Results of the transient simulation carried out with a supply voltage of 1.4 V and an input amplitude of 1.2 V are shown in Figure 6-13. Rail-to-rail operation and accurate control of the output CM level are still obtained at this reduced supply voltage; however, the slew rate is somewhat lower.
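As a rough check on the boost figures above, the time spent slewing is the step size divided by the slew rate. The sketch below assumes a 1.4 V output step (the input amplitude quoted above; the actual differential output swing in Figure 6-12 may be different) and ignores the small-signal settling that follows the slewing interval.

```python
def slew_time_ns(step_v: float, slew_rate_v_per_us: float) -> float:
    """Time (ns) needed to traverse a voltage step at a constant slew rate."""
    return step_v / slew_rate_v_per_us * 1e3  # V / (V/us) gives us; x1000 gives ns


STEP_V = 1.4  # assumed output step in volts (illustrative only)

t_boost = slew_time_ns(STEP_V, 210.0)     # slew boost enabled
t_no_boost = slew_time_ns(STEP_V, 42.0)   # slew boost disabled

print(f"with boost:    {t_boost:.1f} ns")
print(f"without boost: {t_no_boost:.1f} ns")
print(f"improvement:   {t_no_boost / t_boost:.1f}x")
```

Whatever step is assumed, the slew-limited portion of the transition shrinks by the ratio of the two slew rates, i.e. about five times.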
Table 6-1 summarises the simulation results for the op amp operated at different supply voltages. In all cases, the load capacitance is set to 5 pF. The total quiescent power dissipation at ± 0.9 V is 3.8 mW. For the transient simulations, the input voltage is 1.4 V (for ± 0.8 V and ± 0.9 V supplies) and 1.2 V (for a ± 0.7 V supply). The settling time to within 0.1 % of the final value is measured from the start of the input transition. With this op amp, a clock frequency of 13 MHz can be achieved with an accuracy of 0.1 % at a ± 0.9 V supply. These results compare well with recently published designs [128], [129].
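The 13 MHz figure is consistent with the 38 ns settling time if each clock phase, taken here as half the period of a 50 % duty-cycle clock (an assumption), must accommodate one full settling interval. The sketch applies the same rule to the settling times obtained at the other supply voltages.

```python
def max_clock_mhz(settling_ns: float) -> float:
    """Highest clock frequency (MHz) if each half period must cover the settling time."""
    period_s = 2.0 * settling_ns * 1e-9
    return 1.0 / period_s / 1e6


# 0.1 % settling times from the simulation results
settling = {"±0.9 V": 38.0, "±0.8 V": 49.0, "±0.7 V": 60.0}

for supply, t_ns in settling.items():
    print(f"{supply}: {max_clock_mhz(t_ns):.1f} MHz")
```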
Supply (V)   GBW (MHz)   Phase margin   Gain (dB)   SR (V/µs)   0.1 % settling time (ns)
± 0.9        35          62°            93          210         38
± 0.8        34          62°            89          118         49
± 0.7        32          60°            69          88          60
[129]        26          64°            94          104         100
[128]        75          65°            68          400         -

Table 6-1. Summary of simulation results; the rows labelled [128] and [129] are the results reported in those references. In [128], the load was 50 pF // 500 Ω, while in [129] the load was 4 pF.
Figure 6-11 (a): Frequency response of the op amp at supply voltages of 1.8 V (± 0.9 V), 1.6 V (± 0.8 V) and 1.4 V (± 0.7 V) with a load capacitance of 5 pF.

Figure 6-11 (b): Input-referred noise as a function of frequency (supply voltages of 1.4 V, 1.6 V and 1.8 V).
Figure 6-12: Transient response of the op amp in unity-gain configuration operating at a supply voltage of ± 0.9 V (load, 5 pF) and with an input voltage of 1.4 V: (a), (b) single-ended outputs, (c) differential output, (d) differential output with no current boosting, (e) current waveform in the tail current source M13.

Figure 6-13: Transient response of the op amp in unity-gain configuration, operating at ± 0.7 V with a 1.2 V square-wave input (load, 5 pF).
6.5 The Parallel S/H Bank 
A parallel S/H bank has been chosen instead of a cascaded delay line, thus avoiding the transfer of charge from one capacitor to another. This reduces the errors that arise from the various non-idealities encountered as the charge propagates along a delay line, such as clock feedthrough, charge injection, capacitor mismatch, noise and offset. The use of a parallel S/H topology rather than a delay line, however, complicates the generation of the digital control signals. The main design issues for the S/H circuit are rail-to-rail signal operation at low supply voltages, low offset voltage and accurate retention of the input sample. The schematic diagram of the FIFO S/H block is shown in Figure 6-14; it is based on the fast-settling, high-slew-rate, low-voltage op amp described in Section 6.4.
The circuitry has been designed to operate from a ± 0.9 V supply. The input common-mode voltage of the op amp was set to -0.9 V rather than 0 V, which allows adequate gate overdrive for the pMOS input pair. The input is sampled by closing switches KA. The input voltage is then stored on capacitors CB(n) by closing switches K1 and switches SnB2, SnB3. On the next cycle the voltage is also copied onto CA(n) by closing SnB2, SnB3, SnA3 and SnA1.
Figure 6-14: Parallel S/H bank. 
Due to the finite op amp gain, the inputs of the op amp do not perfectly track each other, and therefore some charge remains stored on the parasitic capacitances present at these nodes. The charge on these capacitances depends on the present op amp output and therefore introduces crosstalk between adjacent samples during the read process. Since the value on each capacitor is read 44 times before it is re-initialised, the resulting crosstalk can greatly affect the accuracy of the circuit. Using two hold capacitors for each sample reduces the severity of this problem. During read mode, CA(n) is read first: this brings the op amp output close to the ideal stored value. However, some error will be present due to the change in charge on the input parasitic capacitances, which also corrupts the value on CA(n). Then CB(n) is read while CA(n) is refreshed: during this step the change in charge across the parasitic capacitances is small, and therefore the value stored on CB(n) is not affected. The reading actually used for processing is the one taken using CB(n). The resulting crosstalk coefficient during the read process of adjacent samples is approximately given by:
For the 1st step:
$$\frac{\partial v_o(i+1)}{\partial v_o(i)} \approx \frac{C_p}{C_H A} \qquad (6.6a)$$
For the 2nd step:
$$\frac{\partial v_o(i+1)}{\partial v_o(i)} \approx \left(\frac{C_p}{C_H A}\right)^{2} \qquad (6.6b)$$
where A is the open-loop op amp gain, Cp is the parasitic capacitance present at the input node of the op amp and CH is the storage capacitance.
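To get a feel for the magnitudes involved in (6.6), the sketch below evaluates both coefficients and a pessimistic accumulated error over the 44 reads mentioned above. It assumes that (6.6a) has the form Cp/(CH·A) and that (6.6b) is its square; the 93 dB gain is the simulated value at ± 0.9 V, while `C_P`, `C_H` and `SWING` are hypothetical values, and the linear accumulation over all reads is a worst-case assumption rather than a property of the circuit.

```python
def crosstalk_single(c_p: float, c_h: float, gain: float) -> float:
    """First-step coefficient, assumed form C_p / (C_H * A)."""
    return c_p / (c_h * gain)


def crosstalk_double(c_p: float, c_h: float, gain: float) -> float:
    """Second-step coefficient, assumed to be the square of the first."""
    return crosstalk_single(c_p, c_h, gain) ** 2


A = 10 ** (93 / 20)   # open-loop gain of 93 dB
C_P = 0.5e-12         # hypothetical input parasitic capacitance, F
C_H = 1.0e-12         # hypothetical hold capacitance, F
READS = 44            # reads per stored sample in the 44-stage design
SWING = 1.8           # hypothetical worst-case change between adjacent samples, V

for label, k in (("CA only", crosstalk_single(C_P, C_H, A)),
                 ("CA then CB", crosstalk_double(C_P, C_H, A))):
    # Pessimistic: every read is assumed to disturb the value in the same direction.
    worst_error_v = READS * k * SWING
    print(f"{label}: coefficient {k:.2e}, worst-case drift {worst_error_v * 1e3:.6f} mV")
```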
Errors due to charge injection and clock feedthrough have been reduced to practically zero by introducing small delays between the switch phases: the switches connected to the output node of the op amp (SnA3, SnB3) are switched off slightly after the switches connected to the input nodes of the op amp (SnA2, SnB2). The input nodes of the op amp are always at approximately -0.9 V, while the output nodes can be at any value between Vdd and VSS. The charge injected by SnA2 and SnB2 is therefore the same for both the positive and negative branches of the circuit, and its effect is cancelled by the differential topology. The charge injected by SnA3 and SnB3 has a negligible effect on the capacitor charge, since the other terminal of the capacitor is already open. Similarly, the charge-injection effect due to the input switches is reduced by switching off the KA switches connected to the -0.9 V potential before those connected to the input source.
The selection of switch transistor sizes is another important issue. Choosing a very large transistor increases its associated parasitics, while choosing a small one results in a large on-resistance, which increases the settling time of the circuit; the additional poles in the feedback path may even make the circuit unstable. For switches connected to the -0.9 V potential or to the inputs of the op amp, small single nMOS transistors are sufficient, since a full 1.8 V gate-source drive is available to switch them on. A transmission gate gives no additional advantage in these cases, since its pMOS transistor would always be off. For nodes connected to 0 V or to the outputs of the op amp, a transmission gate is used in order to ensure that at least one transistor remains switched on over the whole signal range during the on state. It is found that, even with relatively large transistor sizes, the on-resistance is still too high, especially when the voltage is near 0 V, where the minimum available VGS is 0.9 V. Although the nominal threshold voltage of the process used is around 0.7 V, it could be as high as 0.8 V in the worst case and is further increased by the body effect.
There are two possible solutions to this problem: either to eliminate these "critical" switches by using the switched op amp technique, or to increase the gate voltage via clock voltage doubling. The first solution is not practical in this case, since the op amp is shared between different integration capacitors and a switched op amp architecture [130] would require replicas of the output stage for each integration capacitor. Also, the op amp needs to operate at a fairly high clock frequency, which means that the switching action of a switched op amp would need to be very fast. The second solution is therefore chosen here: the chip contains a voltage multiplier, together with the digital voltage level shifters necessary for driving the critical switches.
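The square-law triode model gives a quick picture of why the region around 0 V is critical and why boosting the clock helps. At an input of 0 V, both devices of an unboosted transmission gate have only (0.9 V - VT) of overdrive. The sketch assumes hypothetical transconductance parameters `K_N` and `K_P`, takes the worst-case 0.8 V threshold quoted above, ignores the body effect, and compares an nMOS gate drive of VDD with a boosted drive of 2.7 V (the design value of the clock voltage doubler).

```python
# Hypothetical device parameters (not taken from the process used in this work)
K_N = 100e-6   # nMOS mu*Cox*W/L, A/V^2
K_P = 40e-6    # pMOS mu*Cox*W/L, A/V^2
VT = 0.8       # worst-case |VT| quoted in the text; body effect ignored
VDD, VSS = 0.9, -0.9


def tgate_r_on(v_in: float, n_gate: float) -> float:
    """On-resistance of a transmission gate in deep triode (square-law model).

    The nMOS gate is driven to n_gate (VDD or a boosted level); the pMOS gate
    is driven to VSS.
    """
    g_n = K_N * max(n_gate - v_in - VT, 0.0)   # nMOS: VGS = n_gate - v_in
    g_p = K_P * max(v_in - VSS - VT, 0.0)      # pMOS: VSG = v_in - VSS
    g = g_n + g_p
    return float("inf") if g == 0.0 else 1.0 / g


for v_in in (-0.9, -0.45, 0.0, 0.45, 0.9):
    r_plain = tgate_r_on(v_in, VDD)
    r_boost = tgate_r_on(v_in, 2.7)   # boosted nMOS gate drive
    print(f"v_in = {v_in:+.2f} V: plain {r_plain / 1e3:7.1f} kOhm, "
          f"boosted {r_boost / 1e3:6.1f} kOhm")
```

With these illustrative numbers the unboosted gate never turns fully off across the signal range, but its resistance peaks just above 0 V; the boosted drive lowers it by more than an order of magnitude there.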
6.6 Switched Capacitor Multiplier 
The multiplier used to generate the correlation products is based on an SC topology, as shown in Figure 6-15. The multiplier core relies on the quadratic ID-VGS behaviour of MOS transistors operating in strong inversion; a detailed analysis of this circuit is given in [131]. The analysis shows that the voltages on nodes A-D may exceed the supply voltages under certain input conditions, and therefore a special switch configuration (two MOS devices in series with their bulks connected to each other) has to be used for the switches connected between these nodes and Vdd. The original work used nMOS transistors as switches, which required an isolated p-tub in order to eliminate leakage to the bulk. In the current CMOS process only isolated n-tubs are available, and therefore the circuit has been modified to use pMOS rather than nMOS switches in this position.
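The way a square-law device yields a product can be illustrated with the generic quarter-square identity (a + b)² - (a - b)² = 4ab. The sketch below is not the switched-capacitor circuit of [131]: it combines four ideal square-law drain currents biased at a hypothetical gate voltage `VB`, with a hypothetical parameter `K`. The bias and threshold terms cancel, leaving 8·K·x·y as long as all four devices stay in strong inversion.

```python
K = 50e-6   # hypothetical square-law parameter, A/V^2
VT = 0.7    # nominal threshold voltage quoted in the text, V
VB = 1.5    # hypothetical common gate bias, V (needs VB - VT > |x| + |y|)


def drain_current(v_gs: float) -> float:
    """Ideal square-law saturation current; zero below threshold."""
    v_ov = v_gs - VT
    return K * v_ov * v_ov if v_ov > 0 else 0.0


def quarter_square(x: float, y: float) -> float:
    """Difference of four square-law currents, equal to 8*K*x*y in strong inversion."""
    i_plus = drain_current(VB + x + y) + drain_current(VB - x - y)
    i_minus = drain_current(VB + x - y) + drain_current(VB - x + y)
    return i_plus - i_minus


x, y = 0.1, -0.2
print(quarter_square(x, y), 8 * K * x * y)
```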
Figure 6-15 (a): SC multiplier core.
During K1, the inputs are sampled and the required multiplication result is available as (Vo+ - Vo-), while Vcm = Vdd + Vss - (Vo+ + Vo-)/2. On K2, assuming that the output node of the fine common-mode correction OTA remains at 0 V, the output voltages Vo2+ and Vo2- are given by:
$$V_{o2}^{+} = V_{ss} - \frac{V_{ss} + V_{dd}}{2} + V_{o}^{+} - \frac{V_{o}^{+} + V_{o}^{-}}{2} \qquad (6.7a)$$
$$V_{o2}^{-} = V_{ss} - \frac{V_{ss} + V_{dd}}{2} + V_{o}^{-} - \frac{V_{o}^{+} + V_{o}^{-}}{2} \qquad (6.7b)$$
It can thus be seen that the new common-mode voltage of Vo2+ and Vo2- is equal to VSS - 0.5(VSS + Vdd), which equals VSS for VSS = -Vdd. The nodes Vo2+ and Vo2- directly drive the inputs of the integrator op amp, which is intended to operate at an input common-mode voltage of VSS. The feedback OTA provides a further fine correction of the common-mode input voltage of the integrator op amp, in order to allow for errors introduced by circuit non-idealities such as clock feedthrough and charge injection.
Figure 6-15 (b): SC multiplier common-mode adjustment.
The switches connected to the input nodes and to the 0 V potential are implemented as transmission gates and are clocked with a boosted supply voltage. Switches connected to Vdd are implemented as two pMOS transistors sharing the same floating bulk. Switches connected to Vss are implemented as nMOS transistors.
The feedback OTA is based on a single-stage cascoded structure, as shown in Figure 6-16. Transistors M2, M3, M8 and M9 are identical, as are M1 and M4. M6 and M7 are also identical and are sized at one half of M5. M12, M13 and M15 are identical, while M14 is sized at one half of M4. The current Ibias is set to 100 µA, and therefore the current through M12 and M13 is 50 µA, which is the same current set through M14 and M15. The value of Rbias is 4.5 kΩ, which results in a voltage drop of 225 mV: this voltage is the drain-to-source voltage of M8 and M9, ensuring that these transistors operate in the pinch-off region. All transistors have a channel length of 1 µm except M10 and M11, whose channel length is set to 2 µm in order to enhance the voltage gain of the OTA, which is approximately given by gm5/gds11.
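The bias arithmetic above, and the reason for lengthening M10 and M11, can be checked with first-order numbers. The 50 µA branch current and the 4.5 kΩ resistor come from the text; the transconductance `GM5`, the channel-length-modulation constant `LAMBDA_L` and the assumption that M11 carries the same 50 µA are hypothetical, and the model uses the usual first-order rule that λ, and hence gds, scales as 1/L.

```python
import math

I_D = 50e-6       # branch current through Rbias, A (from the text)
R_BIAS = 4.5e3    # ohms (from the text)
print(f"V(Rbias) = {I_D * R_BIAS * 1e3:.0f} mV")

GM5 = 0.5e-3       # hypothetical transconductance of M5, S
LAMBDA_L = 0.1e-6  # hypothetical: lambda = LAMBDA_L / L (V^-1 * m)


def gds(i_d: float, length_m: float) -> float:
    """First-order output conductance gds = lambda * I_D, with lambda ~ 1/L."""
    return (LAMBDA_L / length_m) * i_d


def gain_db(gm: float, g_ds: float) -> float:
    """Single-stage voltage gain gm/gds expressed in dB."""
    return 20 * math.log10(gm / g_ds)


for length in (1e-6, 2e-6):
    print(f"L = {length * 1e6:.0f} um: gain ~ {gain_db(GM5, gds(I_D, length)):.1f} dB")
```

Under this model, doubling the channel length of the output device adds about 6 dB of gain, independent of the hypothetical values chosen.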
Figure 6-16: OTA used for output CMFB.

6.7 Integrator Bank
A standard fully differential SC integrator has been implemented in order to integrate and store the correlation result. The schematic diagram of the integrator is shown in Figure 6-17. The switches connected to the multiplier input are used to isolate the integrator from the multiplier during the comparison phase; during the correlation phase, these switches are kept closed. The capacitor values are reset at the beginning of the correlation phase by switching on Sr1 and Sr4 simultaneously. Integration is carried out during the K2 phase of the multiplier by simultaneously closing Si2 and Si3 of the capacitor corresponding to the particular delay position of the delayed input. During comparison mode, the integrator bank is read sequentially by closing Sc2 and Sc3, with the input-mode switches kept open.
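Taken together, the S/H bank, the multiplier and the integrator bank compute a short cross-correlation with one integrator capacitor per delay point. The behavioural sketch below ignores the switched-capacitor timing and the pipelining: bin k simply accumulates the product of the right-channel sample with the left-channel sample k steps earlier. The Gaussian noise test signal and the three-sample lag are illustrative assumptions, and the function names are hypothetical.

```python
import random


def correlate_bank(left, right, n_delays):
    """Integrator bank model: bin k accumulates left[n - k] * right[n]."""
    bank = [0.0] * n_delays
    for n, r in enumerate(right):
        for k in range(min(n_delays, n + 1)):   # skip samples before the start
            bank[k] += left[n - k] * r
    return bank


def max_position(bank):
    """Index of the largest integrator value (the comparison-mode result)."""
    return max(range(len(bank)), key=lambda k: bank[k])


rng = random.Random(1)                           # reproducible test signal
left = [rng.gauss(0.0, 1.0) for _ in range(2000)]
DELAY = 3                                        # right channel lags by 3 samples
right = [0.0] * DELAY + left[:-DELAY]

bank = correlate_bank(left, right, n_delays=5)
print([round(v, 1) for v in bank], max_position(bank))
```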
Figure 6-17: Integrator bank.

6.8 Analogue Memory
An analogue memory has been designed which allows sampling of a new input 
during the read mode phase, and optional update of the stored value during an update 
phase, thus requiring just two clock cycles for read and update. The memory is used for 
temporary storage of the highest correlation value during comparison phase. The 
schematic diagram of the analogue memory element is shown in Figure 6-18. 
Figure 6-18: Analogue memory.
The analogue memory is controlled by two clock phases: the read (φRD) phase and the update (φUPD) phase. Capacitor C2 stores the current memory value. During the read phase, C2 is connected in the feedback path of the op amp, so that the voltage across it appears at the op amp output. Meanwhile, capacitor C3 is discharged and the input voltage is sampled on C1. If the read phase is followed by an update phase, the charge on C1 is transferred to C3, causing the op amp output to equal the previously sampled value. At the same time, this value is also stored on C2 so that it can be read at a later stage. In the current application, the value stored in the analogue memory is read and compared with the integrated value being output by the integrator. If the integrator output is found to be higher than the analogue memory output, the analogue memory is updated by issuing an update signal after the read signal.
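The read/update behaviour, and the way it supports a serial search for the largest integrator value, can be captured in a few lines. This is a behavioural sketch with ideal charge transfer; the class and function names are hypothetical, and the comparator is reduced to a plain `>` test.

```python
class AnalogueMemory:
    """Behavioural model of the read/update analogue memory.

    read() returns the stored value (held on C2) while sampling a new input (on C1);
    update(), issued straight after a read, makes that sample the stored value.
    """

    def __init__(self) -> None:
        self.stored = 0.0
        self.sampled = None

    def read(self, new_input: float) -> float:
        self.sampled = new_input
        return self.stored

    def update(self) -> None:
        if self.sampled is None:
            raise RuntimeError("update must follow a read")
        self.stored = self.sampled


def find_max_position(integrator_values):
    """Serial maximum search over the integrator bank; returns the winning index."""
    mem = AnalogueMemory()
    mem.read(integrator_values[0])
    mem.update()                                  # CI[0] becomes the initial maximum
    max_pos = 0
    for pos in range(1, len(integrator_values)):
        current_max = mem.read(integrator_values[pos])
        if integrator_values[pos] > current_max:  # comparator decision
            mem.update()
            max_pos = pos
    return max_pos


print(find_max_position([0.10, 0.30, 0.20, 0.50, 0.40]))
```

Only two operations per bin are needed (a read, optionally followed by an update), matching the two clock cycles quoted for the memory.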
In the current implementation all switches are implemented as transmission gates, although for switches connected to the -0.9 V potential a single nMOS pass transistor would have been sufficient. A boosted clock voltage is used for switches connected to 0 V, to the input nodes and to the op amp outputs, in order to ensure sufficient gate overdrive over the whole signal swing. In order to minimise clock-feedthrough effects, switches connected to the inputs of the op amp are switched off slightly before the other switches.
6.9 Dual Differential Input Comparison Circuitry 
6.9.1 Adder 
A passive SC adder, shown in Figure 6-19, is used to combine the two differential signals coming from the integrator bank and the analogue memory into a single differential signal before it is processed by the comparator.
The input Vin1 is derived from the integrator, while Vin2 is the output of the analogue memory. During K2, one side of the capacitors is held at 0 V, while the other side is held at a specific input voltage Vcm by the comparator. Thus, during K1 the outputs Vo+ and Vo- are respectively given by:
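$$V_o^{+} = V_{cm} + \frac{C}{2C + C_p}\left(V_{in1}^{+} + V_{in2}^{-}\right)$$
$$V_o^{-} = V_{cm} + \frac{C}{2C + C_p}\left(V_{in1}^{-} + V_{in2}^{+}\right) \qquad (6.8)$$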
In the above expression, Cp is the parasitic capacitance from the comparator input nodes to ground. It can be seen that Vo+ - Vo- is proportional to (Vin1 - Vin2). Only the sign of this expression matters for the comparison, and thus the parasitic capacitance Cp does not affect the behaviour of the circuit. Furthermore, any clock feedthrough introduced by the K2 switches has the same effect on both branches and is therefore cancelled out completely under ideal device-matching conditions. Clock feedthrough introduced by the K1 switches is not important, since the capacitors are refreshed on the next phase.
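The insensitivity to Cp can be checked numerically. The sketch assumes that (6.8) has the cross-coupled charge-sharing form in which each output collects one branch of each input with weight C/(2C + Cp). The capacitor values and input levels are hypothetical; Vcm is set to 0 V because it cancels in the differential output.

```python
def adder_outputs(vin1_p, vin1_m, vin2_p, vin2_m, v_cm, c, c_p):
    """Passive SC adder outputs during K1 (assumed form of the charge-sharing result)."""
    k = c / (2 * c + c_p)
    vo_p = v_cm + k * (vin1_p + vin2_m)
    vo_m = v_cm + k * (vin1_m + vin2_p)
    return vo_p, vo_m


# Integrator output 50 mV and memory output 30 mV (differential), C = 1 pF
for c_p in (0.0, 0.5e-12, 2e-12):   # hypothetical parasitic capacitances
    vo_p, vo_m = adder_outputs(0.025, -0.025, 0.015, -0.015,
                               v_cm=0.0, c=1e-12, c_p=c_p)
    print(f"Cp = {c_p * 1e12:.1f} pF: Vo+ - Vo- = {(vo_p - vo_m) * 1e3:.2f} mV")
```

The magnitude of the differential output shrinks as Cp grows, but its sign, which is all the comparator uses, follows Vin1 - Vin2 in every case.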
Figure 6-19: SC adder.
6.9.2 Comparator 
A two-stage comparator, shown in Figure 6-20, is used to select the maximum correlation value. The first stage consists of the differential pair M3 and M4. The bias current for this pair is determined by M6 and M7, while M1 and M2 set the voltage at the drains of M3 and M4 [132] in order to provide common-mode control of the first stage. A second voltage amplification stage is built around M8 and M9. The speed of the comparator is enhanced by two latches, (M13, M14) and (M11, M12). The comparator operates in two phases: refresh mode and comparison mode. During refresh mode (K2), the input transistors of the first and second stages are diode-connected. This charges the capacitors in the adder circuit and the inter-stage coupling capacitors C to the required voltage shift and provides a form of auto-zeroing [127]. Transistors M8 and M9 are sized such that, during this track (refresh) mode, the gate-source voltage of M11 and M12 is well below the threshold voltage, so that the latch is disabled. The upper latch M13, M14 is disabled via switch M10. The quiescent current through M8 and M9 is therefore determined by the mirror transistors M15, M16. During comparison mode the K2 switches are off and the inputs V+ and V- are generated by the adder circuit. The nK1-Del signal is then driven low (slightly after K1 goes high), enabling the latch formed by M13, M14 and thus producing a fast comparator action. Two additional inverters are included at the output in order to convert the voltage levels at M15, M16 into the required digital levels.
Under ideally matched conditions, the clock-feedthrough and charge-injection errors introduced at the end of the track phase are identical for the positive and negative paths and are therefore completely cancelled. The K2 switches are operated at a boosted clock voltage in order to ensure reliable switching.
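The benefit of the auto-zeroing refresh phase can be shown with a generic offset-storage model. This is a simplification and not a model of the actual transistor-level circuit: the preamplifier is linear, the offset it produces with shorted inputs is stored during refresh and subtracted during comparison, and the gain and offset values are hypothetical.

```python
def preamp(v_in: float, gain: float, offset: float) -> float:
    """Linear preamplifier with an input-referred offset."""
    return gain * (v_in + offset)


def decide(v_in: float, gain: float = 50.0, offset: float = 0.02,
           autozero: bool = True) -> bool:
    """Return True if the comparator would decide that v_in > 0."""
    # Refresh phase: inputs effectively shorted, so the amplified offset
    # appears at the output and is stored on the coupling capacitor.
    stored = preamp(0.0, gain, offset) if autozero else 0.0
    # Comparison phase: the stored value is subtracted before the latch decides.
    return preamp(v_in, gain, offset) - stored > 0.0


print(decide(-0.005), decide(-0.005, autozero=False))
```

With a 20 mV offset, a -5 mV input is misjudged without auto-zeroing but resolved correctly with it.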
6.10 Digital Control 
The digital block of the correlator generates all the signals required by the S/H bank, multiplier, integrator, analogue memory and comparator. In addition, the digital block outputs a 3-bit signal containing the position of the maximum correlation value determined during comparison mode. The digital control block has four inputs: clock (CK), compare mode (comp_mode), reset (rst) and compare result (comp_result). The comp_mode signal selects between the correlation phase and the comparison phase (in which the correlation results are compared and the position of the maximum value is recorded).
Figure 6-20: Clocked comparator.
Comp_result is the input from the analogue comparator used during the comparison 
phase. Two state variables i and j are internally used in the control logic. 
6.10.1 Reset Phase 
The reset operation is asynchronous to the clock and is implemented by assigning i = 1 and j = 10. In this condition, all capacitors in the S/H block and integrator block are reset, while the multiplier is disabled by applying K1 and φRST.
6.10.2 Correlation Mode 
During correlation mode the digital control system is governed by the following two nested loops, which control the values of i and j:
if (j < 9)
    j = j + 1;
else
begin
    j = 0;
    if (i > 0) i = i - 1; else i = 4;
end
The above process is triggered on the positive edge of the clock signal. The S/H bank, multiplier and integrator are controlled according to the values of i and j as described by the following pseudo-code:
if (j == 0)
begin
    Reset capacitors CA[i] and CB[i] in the S/H bank
    Read CA[(i+4) mod 5]
    Multiplier in K2 phase (output valid)
    Integrate multiplier output on CI[0]    // corresponds to 0-delay point
end
else if (j[0] == 1)    // odd phase: actual read phase and multiplier input
begin
    Read CB[(i+4-j[3:1]) mod 5]
    Refresh CA[(i+4-j[3:1]) mod 5]
    Multiplier in K1 phase (sample input)
    Integrator is disabled
    if (j == 7) Sample input on input capacitors of S/H circuit
end
else    // even phase: finite-gain compensation of S/H bank and multiplier output
begin
    Multiplier in K2 phase (output valid)
    Integrate multiplier output on CI[(5-j[3:1]) mod 5]    // delay point (5-j[3:1]) mod 5
    if (j == 8) Load sampled input on CB[i]
    else Read CA[(i+4-j[3:1]) mod 5]
end
As can be seen from the above process, the input is sampled once in every i-iteration (during j = 7). The previously stored values are then scanned one by one and correlated with the current values coming from the other channel. The multiplier output is integrated on the integrator capacitor corresponding to the relevant delay point. In every i-iteration, the S/H capacitor pair (CA[i], CB[i]) holding the "oldest" value is discarded (during j = 0) and the new value is loaded into it (during j = 8). Pipelining the processes that can be carried out in parallel reduces the number of cycles required for the correlation and thus also lowers the minimum clock frequency requirement.
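The index arithmetic of the pseudo-code can be checked with a small model of the 5-stage prototype. It assumes j[3:1] is simply j >> 1, and it restates the schedule rather than describing the chip's logic. The slot read on an odd j holds the sample taken 4 - j[3:1] iterations earlier, and its product is integrated on the very next (even) clock into the bin with that same delay.

```python
N = 5  # stages in the prototype; j runs 0..2N-1 and j[3:1] == j >> 1


def next_state(i: int, j: int) -> tuple:
    """Correlation-mode update of (i, j) on each positive clock edge."""
    if j < 2 * N - 1:
        return i, j + 1
    return (i - 1 if i > 0 else N - 1), 0


def read_slot(i: int, j: int) -> int:
    """S/H slot (CB) read on an odd j: (i + 4 - j[3:1]) mod 5."""
    return (i + N - 1 - (j >> 1)) % N


def integrated_bin(j: int) -> int:
    """Integrator capacitor CI[.] written on an even j (j = 0 gives CI[0])."""
    return (N - (j >> 1)) % N


state = next_state(1, 10)          # first clock edge after reset (i = 1, j = 10)
for _ in range(12):
    i, j = state
    if j % 2:
        print(f"i={i} j={j}: read CB[{read_slot(i, j)}]")
    else:
        print(f"i={i} j={j}: integrate on CI[{integrated_bin(j)}]")
    state = next_state(i, j)
```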
6.10.3 Comparison Mode 
Determination of the maximum correlation position takes place during comparison mode, which is enabled by setting the comp_mode input high. During this mode, the state variable j is set to 11 and another state variable, comp_state, is used to control the comparison process. The value of comp_state is initially 1 and is incremented on every positive edge of the clock CK2. With j = 11, the digital control block resets all capacitors in the S/H bank, resets the multiplier and opens the input switches of the integrator bank, thus isolating it from the multiplier. The comparison process takes place serially as follows:
if (comp_state == 1)
begin
    Read position CI[0] on integrator bank
    Set analogue memory to READ mode
    Set adder to K2 phase and comparator to TRACK mode
end
else if (comp_state == 2)
begin
    Read position CI[1] to allow the integrator bank to settle for the next read
    Set analogue memory to UPDATE mode to load the value stored in CI[0]
    Set adder to K2 phase and comparator to TRACK mode
    Max_pos = 0
end
else if (comp_state[0] == 1)    // odd phase
begin
    Read position CI[comp_state[3:1]]
    Set analogue memory to READ mode
    Set adder to K1 phase (sample input from integrator bank)
    Switch off comparator TRACK switches: comparison is thus enabled
end
else    // even phase
begin
    Read position CI[comp_state[3:1]] to prepare integrator for next phase
    Switch off READ switches in analogue memory
    Set adder to K2 phase and comparator to TRACK mode
    if (mem_update == 1)
    begin
        Max_pos = comp_state[3:1] - 1    // record current maximum value position
        Set analogue memory to UPDATE mode
    end
    else
        Switch off analogue memory UPDATE switches
end
The above process is triggered on any change in comp_state or mem_update. The variable mem_update indicates that the analogue memory value must be replaced by a new value; it is controlled by the comparator output in the following manner:
always @(negedge CK3)
    if (adder is in K1 phase)
        mem_update = comp_result;
In addition, the nK1-Del latch-enable input of the comparator (nCOMP) is controlled directly by the K1 signal of the adder and by CK1 using nCOMP = (Add_K1 & ~CK1). The resulting timing sequence is shown in Figure 6-21. Input data to the comparator is valid as soon as Add_K1 goes high. After about one clock period, the comparator output is latched by driving the nCOMP signal low. Its output is then sampled on the falling edge of CK3, in order to ensure sufficient time for the output to settle. The comparator output becomes invalid again when Add_K1 goes low.

Figure 6-21: Comparator control waveforms (CK1, CK2, CK3, TRK, Add_K1, nCOMP and the sampling point of the comparator output).

6.10.4 Dead-Band Generation
The digital control block generates a number of clock phases and in each case a 
finite amount of dead-band must be ensured between adjacent phases in order to allow for 
the finite turn-off time of the MOS switches and also for the propagation delays of the 
switch control signals. The technique shown in Figure 6-22 has been used in order to 
generate the dead-band periods. 
Figure 6-22: Scheme for dead-band generation.
The digital delays were implemented by cascading six buffer sections, each consisting of a weak inverter followed by a strong inverter. The total delay achieved in this way is around 10 - 15 ns. The synchronous section of the digital control logic is triggered on the positive edge of CK2. An output enable (OE) signal is generated using the following combination of CK3 and CK1:
OE = ~(CK1 & ~CK3)
OE is an active-high signal with a dead-band (logic low) interval equal to the total delay between CK1 and CK3. This dead-band interval is centred around the rising edge of CK2. The dead-band between the various output signals is thus generated by ANDing these signals (which have no dead-band of their own) with the OE signal.
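The position and width of the OE dead-band can be checked with a 1 ns time-step simulation. The sketch assumes that CK2 and CK3 are copies of CK1 delayed by one and two equal delay-line sections respectively. The 6 ns section delay (12 ns between CK1 and CK3, within the 10 - 15 ns quoted above) and the 1 MHz clock are hypothetical values chosen for illustration.

```python
PERIOD_NS = 1000      # hypothetical clock period (1 MHz), simulated in 1 ns steps
STAGE_DELAY_NS = 6    # hypothetical delay CK1->CK2 and CK2->CK3 (12 ns in total)


def clock(t_ns: int, delay_ns: int = 0) -> bool:
    """50 % duty-cycle square wave, delayed by delay_ns."""
    return (t_ns - delay_ns) % PERIOD_NS < PERIOD_NS // 2


def output_enable(t_ns: int) -> bool:
    """OE = ~(CK1 & ~CK3)."""
    ck1 = clock(t_ns)
    ck3 = clock(t_ns, 2 * STAGE_DELAY_NS)
    return not (ck1 and not ck3)


dead_band = [t for t in range(PERIOD_NS) if not output_enable(t)]
ck2_rise = STAGE_DELAY_NS
print(f"OE low for {len(dead_band)} ns, from t = {dead_band[0]} to t = {dead_band[-1]} ns")
print(f"centre {(dead_band[0] + dead_band[-1] + 1) / 2} ns, CK2 rises at {ck2_rise} ns")
```

OE goes low only around the rising edges, where CK1 has already risen but CK3 has not, so the dead-band straddles the CK2 edge that triggers the synchronous logic.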
6.10.5 Clock Voltage Doubler 
A boosted supply voltage is required in order to drive certain critical switches in 
various blocks forming part of the correlator. The schematic diagram of the voltage 
doubler which has been used in this application is shown in Figure 6-23. The digital logic 
generates complementary clock waveforms (including a dead-band interval) at the outputs 
of IN1 and IN2. The sizes of these two inverters have to be large enough in order to drive 
capacitors Ct and C2. Assume that Cl and C2 are initially discharged. When the output of 
IN1 is high and that of IN2 is low, the gate voltage of M2 will be at Vdd potential. This 
causes C2 to charge to (Vdd - VT - Vss). The sources of Mt and M2 will be at Vdd and 
(Vdd - VT) respectively, causing M4 to turn on and M3 to turn off. A capacitive divider is 
thus formed by Ct and C3, causing VDD_BOOST to reach a value which is somewhat 
smaller than VDD. When the output of IN2 goes high, the source of M2 will be at 
(2Vdd - VT - Vss). This causes Cl to charge to (Vdd - Vss). At this point M3 turns on and 
M4 turns off. Under steady state conditions and with zero load current, the value of 
VDD BOOST will reach a maximum value of (2Vdd - Vss) which is 2.7 V in this case. 
The actual output value will be somewhat less and depends on the load current. It is 
important that the bulk terminals of M3 and M4 are connected to VDD BOOST, since this 
is the highest potential on the drain/source terminals, in order to avoid leakage to the 
n-well. 
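The unloaded value 2Vdd - Vss follows directly from the supply rails. For the loaded case, the sketch uses the common first-order charge-pump estimate that the output droops by roughly I_load/(f·C) below the unloaded value. That estimate is an assumption here: the exact factor depends on the topology and on the divider formed with C3. The load current, clock frequency and pump capacitance are also hypothetical.

```python
def boost_max(vdd: float, vss: float) -> float:
    """Unloaded steady-state output of the doubler: 2*Vdd - Vss."""
    return 2 * vdd - vss


def boost_loaded(vdd: float, vss: float, i_load: float,
                 f_clk: float, c_pump: float) -> float:
    """First-order estimate: droop of i_load / (f_clk * c_pump) below the unloaded value."""
    return boost_max(vdd, vss) - i_load / (f_clk * c_pump)


VDD, VSS = 0.9, -0.9
print(f"unloaded: {boost_max(VDD, VSS):.2f} V")
# Hypothetical load: 1 uA drawn by the switch drivers, 3.88 MHz clock, 10 pF pump capacitor
print(f"loaded:   {boost_loaded(VDD, VSS, 1e-6, 3.88e6, 10e-12):.3f} V")
```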
Figure 6-23: Voltage doubler.

6.10.6 Voltage Level Shifters
A standard CMOS voltage level shifter, shown in Figure 6-24, has been used to translate the output level (VDD) of the digital control block to the boosted voltage level (VDD_BOOST) required by some of the transistor switches. When Vin is low, M1 is switched on while M2 is off. This forces M5 to be switched off and M3 to be switched on. The latch formed by M3 and M5 ensures that they are never both on, so that no quiescent power is dissipated. Thus, with Vin low, M8 is turned on while M7 is switched off, causing the output to go low. The reverse situation occurs when the input is high. In order to save area, all pMOS transistors share the same substrate. This circuit has to be repeated for every separate switch control voltage to be boosted.
Figure 6-24: Voltage level shifter.
6.11 Simulation Results 
A 5-stage version of the correlator has been designed and simulated. Figure 6-25 (a) shows the S/H output waveform and the corresponding input waveform with the correlator operating at an input sampling frequency of 500 kHz. Figure 6-25 (b) shows the corresponding integrator and analogue memory outputs during comparison mode after an integration time of 50 µs, for the case when the R-input is delayed by 6 µs relative to the L-input. In this case the analogue memory is updated after each comparison stage, and the value output by the digital control circuit is 5, which is the correct position of the maximum correlation. Figure 6-25 (c) shows the same output waveforms for the case of zero time delay: here the first value stored in the analogue memory is retained and the digital output value is 1, indicating maximum correlation at zero delay.
6.12 Experimental Results for the Correlator 
The 5-stage correlator chip, which measures 3.4 × 3.1 mm and contains 5172 transistors, has been fabricated (layout in Appendix A.6) and tested using the test set-up shown in Appendix A.7. The chip was first tested at a clock frequency fck of 754 kHz, thus achieving a maximum detectable delay of 46.4 µs. This test was purposely carried out at a lower clock frequency than the intended clocking frequency of 3.88 MHz (for a 44-stage correlator), in order to compensate for the reduced number of stages. The time delay between the L and R test inputs (spectrum 50 Hz-1 kHz) was varied from 0 to 60 µs, while the position of maximum correlation was indicated by LEDs. The present TDM correlator has no automatic gain control mechanism, and therefore the integration interval and input amplitude must be chosen correctly to ensure that the integrator does not saturate. The integration time was set to 100 ms and the input amplitude was limited to around 100 mV. The supply voltage was set to ± 0.9 V. Although the boosted clock voltage was designed to be 2.7 V, a value of 1.1 V (referenced to 0 V) was found to be sufficient. The corresponding test results are shown in Figure 6-26, where both the ideal and actual chip responses are plotted: at the operating clock frequency, the time delay resolution of the chip is 13.3 µs (effective input sampling rate 75.4 kHz), while the measured maximum error is 1.9 µs. This characteristic is practically the same for the 5 prototypes that were tested. The power dissipated by the analogue section of the chip is 16 mW, while that dissipated by the digital control circuit (excluding the voltage-boosted clock circuitry) is 160 µW. The boosted voltage control circuitry consumes 16 µW.
Figure 6-25: Simulation results for (a) the S/H bank (input and output waveforms), and for the integrator and analogue memory outputs (b) with delay and (c) without delay (voltage against time in µs).
Figure 6-26: Measured and ideal correlator chip responses at fck = 754 kHz (correlator output against L-R delay, 0-60 µs; traces: actual output, ideal output).
A timing diagram depicting the main control signals (comp_mode, reset, comp_state<0>) is shown in Figure 6-27. The chip output is valid when comp_mode is high and after all comparisons have taken place. The reset signal is applied soon after comp_mode goes low in order to reset the S/H and integrator banks: correlation takes place during the rest of the period when comp_mode is low.
Figures 6-28 (a)-(e) show in detail the comparator waveforms during the comparison phase for different L-R delay conditions. The upper waveform shows the actual comparator output, while the lower waveform shows the D-latch clock signal used to latch the comparator output: the positive edge of this waveform corresponds to the comparator output sampling point.
Figure 6-27: Main timing diagram showing (a) reset (ch3), (b) comp_state<0> (ch1) and (c) comp_mode (ch2) signals.
Figure 6-28 (a): Comparison stages: D-latch clock (ch1) and comparator output (ch3) signals with the L-R delay set to 0 s.
Figure 6-28 (b): Comparison stages: D-latch clock (ch1) and comparator output (ch3) signals with the L-R delay set to 15 µs.
Figure 6-28 (c): Comparison stages: D-latch clock (ch1) and comparator output (ch3) signals with the L-R delay set to 25 µs.
Figure 6-28 (d): Comparison stages: D-latch clock (ch1) and comparator output (ch3) signals with the L-R delay set to 40 µs.
Figure 6-28 (e): Comparison stages: D-latch clock (ch1) and comparator output (ch3) signals with the L-R delay set to 50 µs.
The chip has also been tested at the clock frequency intended for the 44-stage correlator, 3.88 MHz. At this clock frequency, the time delay resolution is 2.58 µs (effective input sampling rate of 388 kHz). The corresponding test results are shown in Figure 6-29, where both the ideal and actual chip responses are plotted: the maximum error is 0.29 µs. At this frequency, the power dissipated by the analogue section is 16 mW, while that dissipated by the digital control circuit (excluding the voltage-boosted clock circuitry) is 710 µW. The boosted voltage control circuitry consumes 70 µW.
Figure 6-29: Measured and ideal correlator chip responses at fck = 3.88 MHz (correlator output against L-R delay, 0-12 µs; traces: actual output, ideal output).
6.13 Conclusions 
A novel low-voltage SC correlator, intended for the extraction of ITD cues in a sound localisation system, has been developed. The correlator chip has been tested successfully at ± 0.9 V for different delays presented at the input, each of which should activate the corresponding correlator output. The chip has been tested at two clock frequencies, and in both cases the measured output closely follows the expected output. Measurements at the higher clock frequency, necessary for a 44-stage correlator, show that the chip output follows the expected output with a maximum error of 0.29 µs against a time delay resolution of 2.58 µs.
The observed errors in the ITD cues result from an accumulation of inaccuracies arising from clock feedthrough and charge injection in the parallel S/H, integrator bank and comparator SC stages, and from mismatch errors in the integrator bank capacitors. Previously reported asynchronous pulse delay lines are prone to both sampling-interval errors and amplitude errors, whereas a delay line implemented using an SC topology is affected only by amplitude errors.
The errors in the measured time delay were found to be sensitive to the supply voltage used. This effect could be due to two reasons:
(i) the coarse common-mode output correction in the multiplier forces the common-mode value to VCM = 0.5(VSS + VDD), which is equal to the required value of 0 V only if VSS = -VDD. Any discrepancy between the magnitudes of VDD and VSS will cause an error in the common-mode voltage which, although attenuated by the fine common-mode adjustment, will still leave a finite residual error that can affect the performance of the subsequent integrator stage.
(ii) clock feedthrough and charge injection errors are both a function of the clock voltage. 
The differential topology with the delayed clock switching should reduce the errors by a 
large amount compared to a single-ended topology. However, residual errors still exist 
which are mainly due to mismatches between the two paths forming the differential 
topology. 
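Reason (i) can be quantified directly: the coarse correction drives the output common mode to the supply midpoint, so any supply asymmetry appears as an offset from the required 0 V before the fine adjustment acts. The sketch below evaluates that offset; the asymmetric supply values are illustrative only, not measurements from the chip.

```python
def coarse_cm_offset(v_dd, v_ss, v_required=0.0):
    """Residual common-mode error (V) left by the coarse correction,
    which forces the output common mode to 0.5 * (VSS + VDD)."""
    v_cm = 0.5 * (v_ss + v_dd)
    return v_cm - v_required


# Symmetric supplies give no offset; a 20 mV asymmetry gives 10 mV.
print(f"{coarse_cm_offset(0.9, -0.9) * 1e3:.1f} mV")
print(f"{coarse_cm_offset(0.9, -0.88) * 1e3:.1f} mV")  # illustrative values
```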
Since the correlator chip operates successfully at the higher clock rate, the chip 
can be easily extended to 44 stages by adding the necessary capacitors and switches and 
modifying the digital control circuit. 
Chapter 7. System Testing 
Chapter 7 
System Testing 
7.0 Introduction 
This chapter presents the results obtained from the system developed here for 2-D sound localisation. The hardware tested in the system consists of the onset detector chip, which minimises errors arising from echo interference, and the front-end chip, which extracts IID cues and 1st-order and 2nd-order MSC cues. These hardware-extracted cues, together with software-generated ITD cues, are used to obtain the 2-D source location via the novel 3-step cue-to-position mapping algorithm. The test set-up is described first, followed by the measured results. Results are reported for the individual cue measurements under ideal conditions and for 2-D source localisation performance under different noise conditions. The performance of the onset detector in reducing errors arising from echo interference is also assessed. Finally, the localisation performance of the system using different sound sources is presented.
7.1 Test Setup 
The system is tested using the PC/C32 DSP board made by Loughborough Sound Images, equipped with the AM/D16SA Burr-Brown ADC/DAC daughter module; this board is used to generate the required test signals and to capture the resulting cue signals generated by the designed chips. The board has 2 input and 2 output dc-coupled analogue channels capable of real-time operation. The associated A/D and D/A converters are both 16-bit and have maximum sampling rates of 200 kHz and 500 kHz, respectively. During the tests, the sampling frequency is set to 44 kHz. Two of the channels are used to output the left and right test signals, while the other two are used to capture two cues at a time. The test stimulus signals for different source positions are generated using Matlab, in a similar way to that described in chapter 3, and pre-stored in files on the host PC. A block diagram of the test set-up is shown in Figure 7-1.
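Stimulus generation amounts to convolving a source signal with the left and right head-related impulse responses (HRIRs) for the chosen position. The original files were produced in Matlab; the Python sketch below only illustrates the same operation, and the two short HRIRs are made-up placeholders rather than measured HRTF data.

```python
def convolve(x, h):
    """Direct-form linear convolution; returns len(x) + len(h) - 1 samples."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def make_stimulus(source, hrir_left, hrir_right):
    """Return the (L, R) test signals for one source position."""
    return convolve(source, hrir_left), convolve(source, hrir_right)


impulse = [1.0] + [0.0] * 7    # impulse-type sound source
# Placeholder HRIRs (not measured data): the right ear receives a copy that is
# delayed by 2 samples (about 45 µs at 44 kHz) and attenuated.
hrir_l = [0.9, 0.3, 0.1]
hrir_r = [0.0, 0.0, 0.5, 0.2]
left, right = make_stimulus(impulse, hrir_l, hrir_r)
```

With an impulse source, each output simply reproduces its HRIR, which is why an impulse is convenient for building the cue template.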
The front-end chip is capable of generating the BPF output signals, envelope signals, IID cues, and first-order and second-order MSC cues. The ITD cues are computed in software by cross-correlation of the hardware BPF output signals and envelope signals.
During these tests, the reference currents are set to 10 nA for the IID dividers and 1 µA for the MSC dividers. These values are chosen to optimise the dynamic range. The BPF centre frequencies are set in the range 80 Hz-17 kHz.
Figure 7-1: Test setup for the system testing (blocks: L and R test data files generated using Matlab via HRTF convolution; Loughborough Sound Images PC/C32 DSP board with host computer; onset detector and front-end chip; cues data file; cues-to-position mapping using Matlab). The L and R test data are pre-recorded using Matlab and are used by the PC/C32 DSP board to generate the L and R signals. The various cues are then captured using the PC/C32 DSP board and stored in a file, which is again processed by Matlab to evaluate the source position.
7.2 Hardware Cues Template 
A cue template file, customised to the chip under test, is first extracted. The L and R stimulus signals are computed by convolving an impulse sound source with the appropriate position-dependent HRTF. A test sequence length of 100 ms is used. The generated IID and MSC cues are sampled and averaged over the same 100 ms interval.
7.2.1 IID Cues 
Figure 7-2 (a) shows the measured IID cues for the 24th filter as a function of azimuth, for elevation values of -40° to 80°. The IID value shows a general trend to increase with increasing azimuth, with a peak at around 90°, although for certain elevations the peak occurs at other azimuth angles. Figure 7-2 (b) shows a similar plot for the 14th filter. In this figure, IID peaks at azimuths other than 90° are evident. These peaks are due to the interference effect induced by the pinna (modelled in the HRTF data), whose frequency response is a function of both elevation and azimuth. In both figures, the IID values vary little with azimuth at high elevations. The frequency dependence of the IID cues is also evident from these plots. The maximum measured IID value is around 80 dBnA, corresponding to an absolute IID value of 60 dB.

Figure 7-2 (a), (b): Measured IID values as a function of azimuth for different elevation angles for (a) 24th BPF, (b) 14th BPF.
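The link between 80 dBnA and an absolute IID of 60 dB can be checked under one assumption: that the IID divider outputs the reference current (10 nA in these tests) multiplied by the ratio of the two channel envelopes, and that the reading is expressed in dB relative to 1 nA. Subtracting the 20 dB contributed by the reference current then leaves the absolute IID. The divider equation is an assumption consistent with the quoted numbers, not a statement of the circuit's exact transfer function.

```python
import math

I_REF_NA = 10.0  # IID divider reference current used in these tests (nA)


def iid_db_from_dbna(reading_dbna, i_ref_na=I_REF_NA):
    """Absolute IID (dB) from a divider output in dB re 1 nA,
    assuming output = i_ref * (ratio of L and R envelopes)."""
    return reading_dbna - 20.0 * math.log10(i_ref_na)


print(iid_db_from_dbna(80.0))  # 60.0, the peak value reported above
```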
7.2.2 ITD Cues 
The ITD cues are computed via software correlation of the BPF outputs for IPD values and of the envelope outputs for IED values. Figure 7-3 shows the computed IPD values as a function of azimuth for elevation values in the range -40° to 80° for the 9th and 2nd BPFs, corresponding to centre frequencies of 520 Hz and 100 Hz, respectively. At low frequencies, the finite window length used in the correlation process introduces a significant error in the computed ITD values, as evidenced by the ripples in the IPD plots for the 2nd BPF. Similar comments apply to the computed IED values, shown in Figure 7-4 for the 24th and 12th BPFs. The ITD values range from 0 to around 0.9 ms, with a peak around an azimuth of 90°; they show very little frequency dependence since they are primarily the result of a global signal delay.
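The software ITD computation amounts to finding the lag that maximises the cross-correlation between the left and right signals (BPF outputs for IPD, envelopes for IED). A minimal sketch, assuming equal-length sampled signals and a plain unnormalised correlation over a finite window; the thesis does not give the Matlab windowing or normalisation details, so none are modelled, and the function name and test pulse are hypothetical.

```python
def xcorr_delay(left, right, max_lag, fs):
    """Delay (s) of `right` relative to `left` maximising
    sum(left[i] * right[i + lag]) over lags -max_lag..max_lag.

    Only overlapping samples contribute, so the estimate is limited by the
    finite window, which matters most at low frequencies.
    """
    n = len(left)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(left[i] * right[i + lag]
                    for i in range(max(0, -lag), min(n, n - lag)))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag / fs


fs = 44_000                      # test sampling frequency
left = [0.0] * 64
left[10], left[11] = 1.0, 0.5
right = [0.0] * 64
right[13], right[14] = 1.0, 0.5  # same pulse, 3 samples later (about 68 µs)
# max_lag = 40 samples covers about ±0.9 ms at 44 kHz.
print(xcorr_delay(left, right, max_lag=40, fs=fs) * 1e6)
```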
7.2.3 First Order and Second Order MSC Cues 
Monaural spectral cues do not exhibit any particular pattern with increasing azimuth, and hence contour maps are used here to depict the variation of the MSCs with both azimuth (0° to 180°) and elevation (-40° to 90°). First-order MSCs, which represent the ratio of the (n+1)th to the nth BPF output envelope, are shown for the right channel (Figure 7-5) and the left channel (Figure 7-6). MSC values adjacent to the contour lines are in dBµA. Plots are shown for both the 13th-12th BPFs and the 24th-23rd BPFs, corresponding to resonant frequencies of around 1.1 kHz and 14 kHz. The strong variation of the MSC cues with elevation shows the salient information they carry about the elevation of a sound source. It is interesting to note that there are regions (depicted by horizontal contour lines) where the MSC cues vary very little with azimuth but are strongly elevation dependent.
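The first-order MSC definition can be written out explicitly as the envelope ratio of adjacent bands, expressed in dB. The sketch below omits any reference-current scaling applied by the hardware divider (an assumption), and the envelope values are hypothetical.

```python
import math


def first_order_msc_db(envelopes, n):
    """First-order MSC between BPFs n+1 and n (1-based band numbers),
    as the envelope ratio E[n+1] / E[n] in dB."""
    e = envelopes  # e[0] holds the envelope of BPF 1
    return 20.0 * math.log10(e[n] / e[n - 1])


# Hypothetical envelope amplitudes for BPFs 1..4 of one channel.
env = [1.0, 2.0, 0.5, 0.5]
for band in (1, 2, 3):
    print(band + 1, "vs", band, round(first_order_msc_db(env, band), 2), "dB")
```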
Figure 7-3 (a), (b): IPD values as a function of azimuth for different elevation angles for (a) 9th BPF, (b) 2nd BPF; these IPD values were computed from measured BPF outputs.
Figure 7-4 (a), (b): IED values as a function of azimuth for different elevation angles for (a) 24th BPF, (b) 12th BPF; these IED values were computed from measured envelope outputs.
Figure 7-5 (a), (b): Contour map for the measured right channel first-order MSC cues for (a) 12th-13th BPFs, (b) 23rd-24th BPFs: values shown on contour lines are in dBµA (elevation in degrees against azimuth in degrees).
Figure 7-6 (a), (b): Contour map for the measured left channel first-order MSC cues for (a) 12th-13th BPFs, (b) 23rd-24th BPFs: values shown on contour lines are in dBµA (elevation in degrees against azimuth in degrees).
Figure 7-7 (a), (b): Contour map for the measured right channel second-order MSC cues for (a) 12th-13th-14th BPFs, (b) 22nd-23rd-24th BPFs: values shown on contour lines are in dBµA (elevation in degrees against azimuth in degrees).
Figure 7-8 (a), (b): Contour map for the measured left channel second-order MSC cues for (a) 12th-13th-14th BPFs, (b) 22nd-23rd-24th BPFs: values shown on contour lines are in dBµA (elevation in degrees against azimuth in degrees).
The corresponding contour plots for the second-order MSC cues, which represent the ratio of the product of the (n+1)th and (n-1)th BPF envelope outputs to the nth envelope output, are shown in Figures 7-7 and 7-8. The plots are shown for the 14th-13th-12th and 24th-23rd-22nd BPFs, corresponding to centre frequencies of around 1.3 kHz and 14 kHz. As with the first-order MSC cues, the importance of second-order MSC cues for elevation information is evident from these plots.
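The second-order definition can likewise be written out. Because the product of two envelope currents divided by a third still has units of current, the result can be quoted in dBµA, matching the contour labels. The sketch below assumes envelopes given in µA and uses hypothetical values.

```python
import math


def second_order_msc_dbua(envelopes_ua, n):
    """Second-order MSC centred on BPF n (1-based): E[n+1] * E[n-1] / E[n],
    with envelopes in µA, so the result is a current expressed in dBµA."""
    e = envelopes_ua  # e[0] holds the envelope of BPF 1
    ratio_ua = e[n] * e[n - 2] / e[n - 1]
    return 20.0 * math.log10(ratio_ua)


# Hypothetical envelopes (µA) for BPFs 1..3: the result is 4 * 1 / 2 = 2 µA.
print(round(second_order_msc_dbua([1.0, 2.0, 4.0], 2), 2), "dBuA")
```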
7.3 Test Results at Different Source Positions with Different Input S/N Values 
The performance of the sound localisation system is assessed by applying input stimuli corresponding to different source locations, with azimuths in the range 0° to 180° and elevations in the range -40° to 90°. The 3-step algorithm described in chapter 3 is then used to determine the source location. The test is carried out with noise added to the L and R channel inputs, corresponding to input S/N ratios of 80, 60, 40, 20 and 10 dB. In all cases, both L-R common-mode (correlated) additive noise and uncorrelated additive noise are considered.
The mean angular error over the tested source locations, as a function of input S/N, is presented in Table 7-1, which summarises the results shown in the localisation plots of Figures 7-9 to 7-13. Test results for an input S/N of 80 dB, shown in Figure 7-9 (a), (b), exhibit a few localisation errors, which arise mainly from the noise of the front-end system itself and from interpolation errors in the cue template. The largest errors occur along the median plane, where the interaural cues are zero; in this case localisation relies solely on monaural spectral cues, which are very susceptible to errors arising from additive noise.
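The two noise conditions differ only in whether the same noise sequence is added to both channels. The sketch below scales white Gaussian noise to a target S/N. Taking the signal power as the mean power of both channels and using Gaussian noise are assumptions, since the text does not say how the S/N was defined; the function name is hypothetical.

```python
import math
import random


def add_noise(left, right, snr_db, correlated, seed=0):
    """Return noisy (L, R) copies at the requested input S/N (dB).

    correlated=True adds the same noise sequence to both channels
    (L-R common-mode noise); otherwise independent sequences are used.
    Signal power is the mean power over both channels (assumption).
    """
    rng = random.Random(seed)
    n = len(left)
    p_sig = (sum(x * x for x in left) + sum(x * x for x in right)) / (2 * n)
    sigma = math.sqrt(p_sig / 10 ** (snr_db / 10))
    noise_l = [rng.gauss(0.0, sigma) for _ in range(n)]
    noise_r = noise_l if correlated else [rng.gauss(0.0, sigma) for _ in range(n)]
    return ([x + w for x, w in zip(left, noise_l)],
            [x + w for x, w in zip(right, noise_r)])


# Illustrative 1 kHz tone, 100 ms at 44 kHz, applied to both channels.
sig_l = [math.sin(2 * math.pi * 1000 * k / 44_000) for k in range(4400)]
sig_r = list(sig_l)
for snr in (80, 60, 40, 20, 10):
    noisy_l, noisy_r = add_noise(sig_l, sig_r, snr, correlated=False)
```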
Input S/N | Mean localisation error (degrees), uc / c | % positions with 5° < error ≤ 10°, uc / c | % positions with 10° < error ≤ 20°, uc / c | % positions with error > 20°, uc / c
80 dB | 4.4 / 4.9 | 0 / 1.1 | 0 / 0 | 3.3 / 3.3
60 dB | 7.9 / 7.8 | 0 / 0 | 1.1 / 1.1 | 5.5 / 5.5
40 dB | 7.9 / 7.9 | 0 / 0 | 1.1 / 1.1 | 5.5 / 5.5
20 dB | 8.7 / 8.4 | 4.4 / 0 | 2.2 / 3.3 | 5.5 / 5.5
10 dB | 13.6 / 12.7 | 7.7 / 12.1 | 15.4 / 8.8 | 11.0 / 9.9
Table 7-1: Mean localisation error and distribution of errors for different input S/N ratios, for both non-correlated noise (uc, first value) and correlated noise (c, second value).
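Table 7-1 reduces each test run to a mean angular error and the percentage of positions falling in each error band. The text does not state how the angular error between the original and resolved positions is measured; the sketch below assumes the great-circle angle between the two directions (azimuth and elevation in degrees) and uses made-up position pairs.

```python
import math


def unit_vector(az_deg, el_deg):
    az, el = math.radians(az_deg), math.radians(el_deg)
    return (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))


def angular_error_deg(true_pos, resolved_pos):
    """Great-circle angle (degrees) between two (azimuth, elevation) directions."""
    a, b = unit_vector(*true_pos), unit_vector(*resolved_pos)
    dot = sum(p * q for p, q in zip(a, b))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))


def summarise(errors):
    """Mean error and percentage of positions in each error band of Table 7-1."""
    n = len(errors)

    def pct(low, high=math.inf):
        return 100.0 * sum(1 for e in errors if low < e <= high) / n

    return {"mean": sum(errors) / n,
            "5-10": pct(5, 10), "10-20": pct(10, 20), ">20": pct(20)}


# Made-up (original, resolved) pairs: exact, 8° elevation error, front/back flip.
pairs = [((30, 0), (30, 0)), ((90, 20), (90, 28)), ((0, 40), (180, 40))]
errors = [angular_error_deg(t, r) for t, r in pairs]
print(summarise(errors))
```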
Figure 7-9: Localisation performance using hardware-generated cues for an input S/N of 80 dB: (a) uncorrelated noise, (b) correlated noise (legend: * original position, o resolved position; front and rear directions marked).
Figure 7-10: Localisation performance using hardware-generated cues for an input S/N of 60 dB: (a) uncorrelated noise, (b) correlated noise (legend: * original position, o resolved position; front and rear directions marked).
Figure 7-11: Localisation performance using hardware-generated cues for an input S/N of 40 dB: (a) uncorrelated noise, (b) correlated noise (legend: * original position, o resolved position; front and rear directions marked).
Figure 7-12: Localisation performance using hardware-generated cues for an input S/N of 20 dB; panel (a): uncorrelated noise (legend: * original position, o resolved position; front and rear directions marked).
Figure 7-13: Localisation performance using hardware-generated cues for an input S/N of 10 dB: (a) uncorrelated noise, (b) correlated noise (legend: * original position, o resolved position; front and rear directions marked).
Corresponding localisation plots for input S/N ratios of 60, 40, 20 and 10 dB are shown in Figures 7-10 to 7-13. For input S/N ratios of 20 dB and above, good localisation (within 5°) is achieved for most source locations, although a slight increase in localisation errors along the median plane is evident. For S/N ratios below 20 dB, considerable degradation in localisation performance is evident.
7.4 Test Results with Echo - Effect of Onset Detector 
The onset detector chip is interfaced to the front-end cue extraction chip via the AGC control voltage pin, using a Schottky diode with its cathode connected to the onset detector output pin. During the off-state of the onset window, the onset detector output overrides the AGC mechanism by pulling it low, causing the outputs of the voltage-controlled gain amplifiers in the front-end to be turned off. During the on-state of the onset window, normal AGC loop control prevails. During this test, the onset detector parameters Td, Afe and τ are set to 6.2 ms, 0.5 and 0.1 s, respectively. An input signal corresponding to source location (24°, 50°), together with a superimposed point echo signal corresponding to location (-24°, 50°) and with different values of Afe and τ, is applied at the input of the system. The resulting localisation error is shown in Figure 7-14 (a). For Afe values less than 0.4 or τ values less than 10 ms, the angular error is within 2°. The test was repeated with the onset detector disabled, and the corresponding results are plotted in Figure 7-14 (b), from which it can be noted that the localisation errors are significantly larger and begin to increase as soon as Afe and τ are increased from zero.
7.5 Test Results with Different Sound Sources 
In the tests carried out in sections 7.3 and 7.4, an impulse-type sound source was used. The test results reported in this section were obtained using three different sound sources corresponding to the English phonemes "sh", "ow" and "ah" uttered by a male speaker. The test was carried out with no additive noise or echo. Different source locations with azimuths in the range 0° to 180° and elevations in the range -40° to 90° were considered, and each test was repeated 20 times. The test results obtained are tabulated in Table 7-2. The best performance is obtained for the phoneme "sh" (88% of the locations resolved within 5° accuracy), probably because the spectral content of this phoneme closely resembles that of broadband noise and therefore satisfies the requirement for the use of monaural spectral cues.
Figure 7-14: Localisation accuracy under different reflection coefficient (Afe) and decay time constant (τ) conditions with the onset detector circuit (a) enabled and (b) disabled (angular error plotted against Afe and log10(τ)).
The other phonemes produce a significant decrease in localisation accuracy (80% of locations resolved within 5° for "ow" and 76% for "ah"). This is mainly because these phonemes are narrowband in nature, so the evaluation of the spectral cues is prone to significant error; hence, in this case, the source position cannot be resolved accurately along the cone of confusion.
Sound source | Mean localisation error (degrees) | % positions with 5° < error ≤ 10° | % positions with 10° < error ≤ 20° | % positions with error > 20°
"sh" | 8 | 3 | 5 | 4
"ow" | 18 | 2 | 10 | 8
"ah" | 15 | 3 | 11 | 10
Table 7-2: Mean localisation error and distribution of errors for three input signals corresponding to the English phonemes "sh", "ow" and "ah".
7.6 Conclusions 
The system for 2-D sound localisation tested here uses hardware-extracted IID and MSC cues and software-generated ITD cues, which are then mapped via the 3-step cue-to-position algorithm developed in this work. The hardware tested includes the onset detector chip and the front-end chip, which extracts IID cues and first-order and second-order MSC cues, and provides the signals used to generate the ITD cues. The results reported in this chapter show that a localisation accuracy within 5° is possible for 96% of the test cases, provided that the sound source is broadband in nature and a cue template customised to the fabricated hardware is used. The system is tolerant to additive noise, with only a graceful degradation in localisation performance down to S/N ratios of 20 dB. The onset detector provides significant protection against echo-induced errors, provided that the echo properties of the environment are well characterised. Localisation performance degrades when relatively narrowband signals are used, because the cue-to-position mapping algorithm has not been adapted to such signals.
Chapter 8. Conclusions and Further Work 
Chapter 8 
Conclusions and Further Work 
8.0 Conclusions 
The scope of this work is to design the analogue building blocks needed to extract the cues required for 2-D sound localisation via two sensors, together with the processing algorithm for mapping the cues into source azimuth and elevation. The aims of this research work have been fully met. Analogue hardware for 2-D sound source localisation, consisting of an onset detector chip, a spectral cue extraction front-end chip and a correlator chip, operating at ± 0.9 V, has been designed, implemented and tested. The requirement for the hardware to operate at a low supply voltage entailed a series of challenges in the development of the various building blocks, which were implemented using log domain, switched capacitor and switched current techniques. The building blocks have been designed in standard 0.8 µm AMS CMOS for low cost, and consume low power (onset detector: 5.4 mW, spectral cue extraction: 890 µW, correlator: 17 mW), thus enabling their use in battery-powered applications.
A novel algorithm for mapping the cues generated by the front-end hardware into source location has been developed and systematically tested via simulations for various environmental and hardware non-idealities. To date, single-step search methods have been used, which search directly in 2-D space for the most likely source location using all cues simultaneously. The 3-step algorithm developed here first searches along the azimuth (at 0° elevation) using only interaural cues, then moves along the chosen conical surface to determine the most likely source position using all cues. Finally, a gradient descent method is used to obtain the source location accurately. This algorithm maximises the discriminative power of each localisation cue, leading to greater robustness with respect to hardware and environmental non-idealities. It is also faster, as a result of splitting the 2-D search problem into two consecutive 1-D searches. This algorithm has been tested using cues generated by the designed hardware, and a localisation accuracy better than 5° was achieved in 96% of the cases.
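The three steps can be illustrated on a toy problem. The sketch below is not the implementation developed in this work: `cues_at` is a made-up smooth cue model standing in for the measured cue template, the cone of confusion is taken as the set of directions with sin(az)·cos(el) equal to its 0° elevation value, and step 3 uses a plain central-difference gradient descent on the squared cue error. All names and the learning-rate settings are hypothetical.

```python
import math


def cues_at(az, el):
    """Hypothetical smooth cue model standing in for the measured template.
    Returns (interaural, spectral) cue tuples; angles in degrees."""
    a, e = math.radians(az), math.radians(el)
    interaural = (math.cos(e) * math.sin(a),)              # constant on a cone
    spectral = (math.sin(e), math.cos(e) * math.cos(a))    # resolves the cone
    return interaural, spectral


def cue_error(az, el, measured, use_spectral=True):
    """Squared distance between model cues at (az, el) and measured cues."""
    interaural, spectral = cues_at(az, el)
    err = sum((x - y) ** 2 for x, y in zip(interaural, measured[0]))
    if use_spectral:
        err += sum((x - y) ** 2 for x, y in zip(spectral, measured[1]))
    return err


def localise(measured, lr=1000.0, iters=100, h=1e-3):
    # Step 1: 1-D search over azimuth at 0° elevation, interaural cues only.
    az1 = min(range(0, 181), key=lambda a: cue_error(a, 0.0, measured, False))
    s1 = math.sin(math.radians(az1))
    # Step 2: 1-D search along the cone sin(az)cos(el) = s1, using all cues.
    candidates = []
    for el in range(-40, 91):
        c = math.cos(math.radians(el))
        if c > 1e-9 and abs(s1) <= c:
            a = math.degrees(math.asin(s1 / c))
            candidates += [(a, float(el)), (180.0 - a, float(el))]
    az, el = min(candidates, key=lambda p: cue_error(p[0], p[1], measured))
    # Step 3: gradient descent on the full cue error (central differences).
    for _ in range(iters):
        g_az = (cue_error(az + h, el, measured)
                - cue_error(az - h, el, measured)) / (2 * h)
        g_el = (cue_error(az, el + h, measured)
                - cue_error(az, el - h, measured)) / (2 * h)
        az, el = az - lr * g_az, el - lr * g_el
    return az, el


print(localise(cues_at(60.0, 30.0)))   # close to (60.0, 30.0)
```

Steps 1 and 2 are coarse 1-D grid searches; step 3 only has to remove the residual grid error, which is why a simple local method suffices in this toy setting.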
During this research work, novel design solutions had to be developed to ensure operation of the low power analogue hardware at ± 0.9 V. The building blocks implemented include Class AB S²I memory cells, a squaring circuit and an echo decay model in the onset detector chip; filters, a maximum current detector and automatic gain control in the spectral cue extraction front-end chip; and a high speed op amp and an SC multiplier in the correlator chip. The circuits have been tested and the test results reported.
The two Class AB S²I memory cells designed here, one using feedforward current control and the other using a feedback control loop, achieve accurate quiescent current control and hence optimised power consumption. In addition, both exhibit improved accuracy of stored values and are capable of operating at a higher clock speed than similar cells reported to date. The squaring circuit consists of three translinear cells with transistors operating in the subthreshold region, allowing low power differential Class AB operation. Echo decay has to date been modelled in software. A tuneable echo decay model has been implemented here in hardware using a fully differential architecture, and is able to operate at a low supply voltage.
The log domain CMOS bandpass filters implemented here employ a novel 
technique developed in order to enable differential Class AB operation at low voltages. 
The AGC loop, used to improve the dynamic range of these log domain filters, has been implemented using current mode techniques rather than the voltage mode techniques normally employed. In this application, the AGC loop needs to detect which of the 48 BPFs exhibits the strongest signal envelope, so a maximum current detector operating at low voltage had to be designed.
The extraction of ITD cues required the design of a novel high speed Class AB op amp, with slew rate enhancement for power dissipation optimisation, and a continuous time CMFB network adapted for low voltage operation. Regenerative slew boosting is used here, incorporating an activation threshold to further reduce power consumption. At ± 0.9 V operation, a slew rate of 210 V/µs and a 0.1% settling time of 38 ns are achieved, which, when compared with results reported to date for comparable circuits [129], represent a 100% increase in slew rate and a 60% reduction in settling time. In addition, a novel SC multiplier with accurate rail-to-rail input operation and output common mode voltage control has been designed.
The review carried out during the course of this work shows that analogue VLSI techniques offer specific advantages when used for audio signal processing applications. In particular, the current mode approach leads to compact circuits that are also power efficient, because a direct mapping of the problem to be solved into hardware is often possible. Typical problems inherent in analogue implementations of the cochlea arise mainly from matching inadequacies, especially when the MOS device is operated in weak inversion. It has been found that, to achieve satisfactory performance, the following points are essential:
(i) the design should use a differential topology in order to reduce errors that arise from first order non-idealities;
(ii) good matching requires good analogue layout techniques, which in turn entail a layout-oriented circuit design;
(iii) switched capacitor or switched current techniques are advantageous compared to continuous time techniques where high accuracy and long time delays are required, as in the case of ITD computation.
The new algorithm for mapping localisation cues into source position, developed in chapter 3, achieves good localisation accuracy when simulated with additive noise. Its ability to work reliably even at low S/N ratios makes an analogue implementation a feasible solution. An onset detection mechanism is required in order to reduce errors arising from echoes. Simulations carried out to assess the effect of hardware non-idealities point to the following requirements:
(i) good matching between the left and right channel processing elements is 
essential; 
(ii) provided that the left and right channels are well matched, the absolute value of the filter Q factors is relatively unimportant, as long as a minimum Q factor is ensured;
(iii) a customised template is necessary in order to allow for BPF centre 
frequency variations. 
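Point (iii) can be made concrete with a generic template search. The sketch below is not the chapter 3 algorithm; it is a minimal weighted nearest-template classifier, assuming that each candidate source position has a stored cue vector (its "template"). A customised template simply means that these vectors are measured through the actual hardware, so any shift in the BPF centre frequencies is already contained in them. The function name `locate` and the data layout are hypothetical.

```python
import math


def locate(cues, templates, weights=None):
    """Hypothetical weighted nearest-template search.

    cues:      measured cue vector (e.g. spectral and ITD cue values per band)
    templates: dict mapping (azimuth_deg, elevation_deg) -> stored cue vector
    weights:   optional per-cue weights (default 1.0 for every cue)
    Returns the position whose template has the smallest weighted squared error.
    """
    if weights is None:
        weights = [1.0] * len(cues)
    best_pos, best_err = None, math.inf
    for pos, tmpl in templates.items():
        err = sum(w * (c - t) ** 2 for w, c, t in zip(weights, cues, tmpl))
        if err < best_err:
            best_pos, best_err = pos, err
    return best_pos


if __name__ == "__main__":
    templates = {(0, 0): [0.0, 0.0, 0.0],
                 (30, 0): [1.0, 0.5, -0.2],
                 (30, 20): [1.0, -0.5, 0.3]}
    print(locate([0.9, 0.4, -0.1], templates))
```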
The novel low voltage current mode circuit for detection of onsets in a sound 
signal, described in chapter 4, has been successfully tested. This circuit can be adapted to 
different environmental conditions by tuning the main onset detection parameters. The 
circuit also demonstrates the successful interfacing of CMOS log domain and SI building 
blocks. The differential topology used in the onset detector circuitry greatly reduces the 
problems commonly associated with subthreshold MOS operation. 
The design and test results obtained for the spectral cue extraction front-end chip 
described in chapter 6 show that it is possible to achieve a reasonable amount of signal 
processing with micropower low voltage operation using continuous current-mode 
log domain CMOS circuits. The bandpass filters in the front end achieve a dynamic range of 68 dB at 1.9% THD. Test results point to the following:
(i) provision for Q-tuning is necessary in order to allow for parasitic elements and mismatches which are not catered for during the design and simulation phase;
(ii) good power supply regulation is essential for the reliable operation of low voltage analogue circuits: in this circuit, the Q-factors are very sensitive to supply voltage variation;
(iii) low harmonic distortion is possible in CMOS log domain circuits, even when the MOS devices are operated in the moderate inversion region; the differential topology helps reduce harmonic distortion by attenuating even-order harmonics;
(iv) accumulation of noise and offset is prevented via the use of a parallel BPF 
architecture rather than a cascade approach. 
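The role of the differential topology in point (iii) follows from a simple polynomial model. Assume two perfectly matched half-circuits with the same static nonlinearity, driven by +x and -x. The even-order terms then appear identically in both outputs and cancel in the difference. With mismatch the cancellation is incomplete, which is why the even harmonics are attenuated rather than removed.

```latex
\begin{aligned}
y^{+} &= a_0 + a_1 x + a_2 x^{2} + a_3 x^{3} + \cdots\\
y^{-} &= a_0 - a_1 x + a_2 x^{2} - a_3 x^{3} + \cdots\\
y^{+} - y^{-} &= 2a_1 x + 2a_3 x^{3} + \cdots
\end{aligned}
```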
The complete system was tested using a broadband signal as a sound source, 
convolved with the appropriate HRTF response. Measurement results for 2-D 
localisation, obtained in chapter 7, using a customised cue template, show that around 
96% of the test positions can be resolved within an accuracy of 5°. A graceful 
degradation of localisation performance with input S/N ratio is achieved; at an input S/N 
of 20 dB, 88% of the source locations considered can be resolved within 5° accuracy. 
The onset detector also provides good protection against echo-induced errors, provided that the echo properties of the environment are well characterised: under echoic conditions, enabling the onset detector reduces the localisation error by around 20%.
However, with narrow-band sound signals, some degradation in the localisation 
performance occurs as a result of the assumptions inherent in the use of monaural spectral 
cues. 
8.1 Further Work 
8.1.1 Improvements on the Cue-to-Position Mapping Algorithm and Its Hardware 
Implementation 
The cue-to-position mapping algorithm presented in this thesis achieves good performance with broadband sound sources; however, it could be adapted to achieve better performance with other types of sound source. One approach would be to adapt the weights given to the cues of each filter dynamically, according to the input spectrum. In this way, cues from filters whose output level is low (and which hence have a low S/N ratio) are given a lower weight than other cues. The weight attributed to each cue could be evaluated from the average energy output of a number (such as four) of adjacent filters of both the L and R channels: in this way the spectral shaping introduced by the external ear is not masked out.
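A minimal sketch of this weighting rule, under stated assumptions: the per-filter output energies of both channels are available, the window of four adjacent filters is placed as close to centred as the ends of the bank allow, and the weights are normalised to sum to one. The function name and the normalisation are hypothetical choices, not part of the thesis.

```python
def cue_weights(energy_l, energy_r, group=4):
    """Hypothetical dynamic cue weighting.

    The weight of filter k is the mean output energy of `group` adjacent filters
    (a window around k) taken from both the L and R channels; the weights are then
    normalised to sum to 1. Averaging over neighbours, rather than using each
    filter's own energy, keeps narrow spectral notches from being suppressed.
    """
    n = len(energy_l)
    if len(energy_r) != n:
        raise ValueError("L and R channels must have the same number of filters")
    if n == 0:
        return []
    raw = []
    for k in range(n):
        lo = max(0, min(k - group // 2, n - group))  # keep the window inside the bank
        hi = lo + group
        window = list(energy_l[lo:hi]) + list(energy_r[lo:hi])
        raw.append(sum(window) / len(window))
    total = sum(raw)
    if total <= 0:
        return [1.0 / n] * n  # no signal: fall back to uniform weights
    return [w / total for w in raw]
```

Weights computed this way could be passed directly as the per-cue weights of a weighted template search.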
The choice of the integration interval used for ITD cue computation is also an important parameter which affects localisation accuracy. In this work, the interval was kept the same for all filters; a possible enhancement would be to scale the integration interval according to the BPF or envelope filter cut-off frequency, attributing longer intervals to the lower frequency bands.
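One simple form of this enhancement, stated as an assumption rather than a tested design, is to integrate each band over a fixed number of cycles of its centre (or envelope cut-off) frequency, clamped to practical limits. The number of cycles and the limits below are placeholders.

```python
def integration_intervals(freqs_hz, cycles=20.0, t_min=1e-3, t_max=20e-3):
    """Hypothetical per-band ITD integration interval in seconds.

    Each band is integrated for a fixed number of cycles of its centre (or
    envelope cut-off) frequency, clamped to [t_min, t_max], so lower bands
    receive longer intervals.
    """
    return [min(max(cycles / f, t_min), t_max) for f in freqs_hz]


if __name__ == "__main__":
    print(integration_intervals([250.0, 2000.0, 8000.0]))
```

A fixed cycle count keeps the number of signal periods seen by each correlator roughly constant across bands, while the upper clamp stops the lowest bands from making the overall response too slow.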
In order to develop a complete hardware sound localisation system, it is necessary to design the "back-end" hardware that maps the cues generated by the front-end into source position. Neural networks are an interesting option, because the back-end could then be adapted to account for non-idealities in the front-end. RBF neural networks could be useful in this application and can be efficiently mapped into hardware [133]-[135].
8.1.2 Improvements to the Onset Detector 
The designed onset detector allows for programmability of the delay for the first 
wavefront, decay constant and reflection coefficient. These parameters, however, still 
need to be set manually using a priori knowledge of the environmental conditions. It 
would thus be interesting to develop an algorithm to compute these parameters by 
applying test sound signals and measuring the resulting ambient reverberation 
characteristics. 
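A minimal sketch of such an estimation step, assuming that the reverberant tail measured after a test burst decays approximately exponentially: the decay constant then follows from a least-squares straight-line fit to the logarithm of the envelope. The reflection coefficient could be estimated in a similar spirit, for example from the ratio of the first reflection peak to the direct-sound peak, although how that ratio maps onto the detector's reflection-coefficient setting is not specified here. `fit_decay` is a hypothetical helper.

```python
import math


def fit_decay(times_s, envelope):
    """Fit envelope ~= amp * exp(-t / tau) by least squares on ln(envelope).

    Returns (amp, tau). Non-positive envelope samples are ignored.
    """
    pts = [(t, math.log(e)) for t, e in zip(times_s, envelope) if e > 0]
    if len(pts) < 2:
        raise ValueError("need at least two positive envelope samples")
    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in pts)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in pts)
    if sxx == 0:
        raise ValueError("sample times must not all be equal")
    slope = sxy / sxx  # slope of ln(envelope) versus time, i.e. -1/tau
    if slope >= 0:
        raise ValueError("envelope is not decaying")
    amp = math.exp(mean_y - slope * mean_t)
    return amp, -1.0 / slope
```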
Most of the power dissipated in the onset detector is consumed by the S2I delay line. Some power reduction is still possible by carefully optimising the quiescent currents of the S2I memory cells. In the present implementation, the final current signal is taken from the auxiliary current output terminal of the last stage, so the signal varies slightly when switching from the coarse to the fine sampling phase. Glitches due to switching transients are also evident in the output current during testing. Performance could therefore be improved by introducing a sample and hold circuit at the last stage, possibly followed by a LPF to remove the switching transients.
The current comparator used to trigger the onset window generator consists of a basic current mirror. Its speed is therefore limited and depends on the resulting current difference and on parasitic capacitances. Comparator speed is an important issue, since the time delay between the onset of a sound signal and its echo can be quite small. The speed of the current comparator can be enhanced by introducing positive feedback; the comparator will then have to be reset by the window generator itself.
The onset detector incurs a delay which is mainly due to the LPF used for 
envelope extraction and the current comparator. This means that the onset window is in 
fact delayed compared to the actual signal onset, causing a portion of the incident signal 
to be truncated. A possible solution to this problem is to introduce a delay in the signal 
path feeding the cue generation circuitry in order to equalise the onset detector delay as 
shown in Figure 8-1. In this way, the onset window can be centred around the incident 
portion of the sound signal. 
[Figure 8-1 diagram: L-input and R-input; L and R signals to cue extraction circuits]
Figure 8-1: Onset detector improvement: the delay incurred in the onset detector can be compensated using parallel delays for the L and R inputs.
8.1.3 Improvements to the Front-end 
Most of the power dissipated in the front-end is consumed in the filters tuned to the high frequencies. From a power consumption point of view, it would thus be interesting to explore how sound localisation degrades with a reduced bandwidth (such as up to 8 kHz). The possibility of using more filters per unit of frequency span at the lower frequencies, in order to increase the frequency resolution of the cues, would also be worth investigating.
The front-end developed in this work allows for individual Q-factor adjustment and for global adjustment of the centre frequencies. Provided that reasonable matching is maintained between adjacent filters, monotonicity of the centre frequencies is guaranteed. However, the frequency span of the whole BPF bank is determined by the ratio of the current mirrors feeding the first and last BPF. A useful enhancement in future designs would be to provide a programmable frequency span and/or individual centre frequency adjustment, so that the front-end can be adjusted for input signals of different bandwidths. Non-volatile adjustment of the Q-factors via an on-chip EPROM would also be a useful feature.
In the designed front-end, the capacitor values used in the BPFs are kept constant 
(50 pF) for all the resonant frequencies and frequency tuning is solely achieved by 
varying the tuning current. This approach practically guarantees monotonicity of resonant 
frequencies because of the excellent matching of capacitors of equal size in CMOS 
processes. However, this approach also leads to a high area cost and a relatively high power dissipation, since high resonant frequencies could alternatively be achieved using lower capacitor values and correspondingly lower tuning currents. The use of low capacitor values for filters with
higher resonant frequencies, however, has to be carried out cautiously due to noise issues 
and also in order to guarantee a monotonic progression of the resonant frequencies. When 
considering the latter issue, it is important to take into account the increased effects of 
parasitic capacitances at higher frequencies and the matching of the frequency- 
determining capacitors. A possible approach to ensure good matching is to organise the 
BPFs in such a way that the capacitor values are halved every octave of frequency. 
Within the same octave the capacitor values are maintained constant and tuning is 
achieved by changing the current. In this way, all capacitors can be constructed using 
integral multiples of a unit capacitor. 
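The trade-off can be quantified with a short sizing script. It assumes, as is usual for log-domain filters, that the resonant frequency is proportional to the tuning current divided by the capacitance, so the relative tuning current of each filter is f0·C. It also assumes a geometrically spaced bank and uses placeholder values: 48 filters from 200 Hz to 12.8 kHz (exactly six octaves), with 50 pF in the lowest octave. The function names are hypothetical.

```python
import math


def design_bank(f_low, f_high, n_filters, c_max_pf=50.0, octave_scaling=True):
    """Hypothetical sizing of a geometrically spaced BPF bank.

    Assumes f0 is proportional to I_tune / C, so each filter's relative tuning
    current is f0 * C. With octave_scaling, C is halved for every octave above
    f_low and kept constant within an octave; otherwise C = c_max_pf everywhere.
    """
    step = (f_high / f_low) ** (1.0 / (n_filters - 1))
    bank = []
    for k in range(n_filters):
        f0 = f_low * step ** k
        # Small epsilon guards against rounding just below an octave boundary.
        octave = math.floor(math.log2(f0 / f_low) + 1e-9) if octave_scaling else 0
        cap = c_max_pf / 2 ** octave
        bank.append({"f0_hz": f0, "octave": octave, "cap_pf": cap,
                     "rel_current": f0 * cap})
    return bank


def unit_multiples(bank):
    """Express every capacitor as an integer multiple of the smallest one."""
    c_unit = min(f["cap_pf"] for f in bank)
    return c_unit, [round(f["cap_pf"] / c_unit) for f in bank]


if __name__ == "__main__":
    const = design_bank(200.0, 12800.0, 48, octave_scaling=False)
    scaled = design_bank(200.0, 12800.0, 48)
    saving = sum(f["rel_current"] for f in scaled) / sum(f["rel_current"] for f in const)
    print("relative total tuning current:", round(saving, 3))
    print("unit capacitor (pF) and multiples:", unit_multiples(scaled))
```

Comparing the summed relative currents of the two designs gives the tuning-current saving. `unit_multiples` confirms that every capacitor is a power-of-two multiple of the smallest one. With these placeholder values the highest filter alone falls in the seventh octave band and needs 50/64 ≈ 0.78 pF, which illustrates the noise and parasitic concerns raised above.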
The concept of log-domain processing was originally developed for bipolar devices, whose collector current exhibits an excellent exponential dependence on base-emitter voltage over more than 8 decades of collector current. In contrast, the drain current of the MOS device is exponential over only about 5 decades: above this range the device enters strong inversion and signal distortion takes place in circuits which rely on its exponential characteristic.
This characteristic limits the dynamic range of CMOS log domain circuits. Although the dynamic range can be enhanced using AGC techniques, it would be interesting to explore the use of CMOS devices operating in strong inversion for current-mode signal processing using root-domain techniques [136]. Root-domain circuits are not as compact as log-domain circuits but can be used to extend the dynamic range of CMOS current-mode circuits; additionally, the matching properties of a CMOS device operating in strong inversion are better than those of the same device operating in weak inversion. The advantages of root-domain circuits need, however, to be weighed against the associated increase in power dissipation and circuit complexity.
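The contrast between the two operating regions can be summarised with the simplified EKV expression for the forward drain current of a saturated MOS transistor. Here I_S is the specific current, V_P the pinch-off voltage, V_S the source voltage and U_T the thermal voltage. The model is simplified: forward current only, with no mobility degradation. The exponential asymptote is what log-domain circuits exploit. The square-law asymptote is the basis of root-domain circuits, in which the square root of the current is linear in the control voltage.

```latex
\begin{aligned}
I_D &= I_S \ln^{2}\!\left(1 + e^{(V_P - V_S)/(2U_T)}\right)\\
I_D &\approx I_S\, e^{(V_P - V_S)/U_T} && (V_P - V_S \ll 0:\ \text{weak inversion})\\
I_D &\approx I_S \left(\frac{V_P - V_S}{2U_T}\right)^{2} && (V_P - V_S \gg 0:\ \text{strong inversion})
\end{aligned}
```

For reference, 5 decades of current correspond to 20 log10(10^5) = 100 dB, and 8 decades to 160 dB.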
8.1.3.1 Microelectromechanical (MEMS) Technology 
The filtering process of the biological cochlea is essentially mechanical, and it is thus interesting to investigate the use of microelectromechanical systems (MEMS) technology for this application [137], [138]. MEMS technology is attractive because it can lead to low power and highly compact devices, even though the technology is still evolving and is expensive compared to standard CMOS. The architecture used in [137] consists of an array of polysilicon beams supported at the edges. The beams have different lengths, resulting in different resonant frequencies, and thus act as BPFs; the deflection of each beam is proportional to the intensity of the sound at its resonant frequency. Beam movement is measured through the capacitance of the beam with respect to the substrate. Sensing can also be carried out using the piezoresistive effect [139]; however, this method is not easily integrated with CMOS electronic devices. The type of resonant structure, transduction techniques, packaging and on-chip signal processing are all still open for research.
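To see how beam length sets the resonant frequency, the sketch below uses the textbook fundamental-mode formula for a doubly clamped Euler-Bernoulli beam. It assumes that "supported at the edges" means clamped at both ends, that there is no residual stress or air damping, and that the material values are typical polysilicon figures. The 2 µm thickness in the usage lines is a placeholder. Because length scales as 1/sqrt(f0), a fourfold frequency range needs only a twofold range of lengths; for audio frequencies and micrometre-thick beams the lengths come out in the millimetre range.

```python
import math

# Doubly clamped Euler-Bernoulli beam, fundamental flexural mode:
#   f0 = 1.028 * (t / L**2) * sqrt(E / rho)
# Default material values are typical polysilicon figures (assumptions).
E_POLY_PA = 160e9
RHO_POLY = 2330.0


def resonant_freq_hz(length_m, thickness_m, youngs_pa=E_POLY_PA, density=RHO_POLY):
    """Fundamental resonant frequency of a doubly clamped beam."""
    return 1.028 * thickness_m / length_m ** 2 * math.sqrt(youngs_pa / density)


def beam_length_m(f0_hz, thickness_m, youngs_pa=E_POLY_PA, density=RHO_POLY):
    """Invert the formula above for the beam length giving resonance f0_hz."""
    return math.sqrt(1.028 * thickness_m * math.sqrt(youngs_pa / density) / f0_hz)


if __name__ == "__main__":
    for f in (500.0, 1000.0, 4000.0, 8000.0):
        print(f, "Hz ->", round(beam_length_m(f, 2e-6) * 1e3, 2), "mm")
```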
8.1.4 Improvements on ITD Extraction 
The present ITD extraction circuit only caters for 5 discrete ITD values. The 
circuit has to be extended to a larger number of stages (around 44). In order to increase 
the number of stages, the digital control block will have to be extended; however, the 
op amps, multiplier and comparator used in the analogue section are capable of operating 
at the increased clock rate as required for 44 stages. A form of automatic gain control 
also has to be included together with a suitable interface for translating the current mode 
signals from the front-end to voltage signals. 
Other topologies for ITD extraction would also be worth investigating. The 
biological auditory system seems to use coincidence detection of pulses for the extraction 
of ITD cues. In VLSI, pulse delaying and coincidence detection can be very efficiently 
implemented using digital techniques. A possible approach is depicted in Figure 8-2. 
Sigma-delta techniques could be used to transform the analogue input from the front end into a digital pulse stream. A second order modulator would be sufficient for this purpose and can be designed to operate at very low supply voltages [132] using switched op amp techniques rather than clock voltage doubling (eliminating clock voltage doubling is essential to prevent gate oxide breakdown in sub-micron technologies, and it also reduces power dissipation). It is also possible to design a current-mode sigma-delta modulator using the S2I building blocks described in chapter 4: in that case, the
front-end current mode outputs can be directly fed into the modulator. It is also possible 
to incorporate a BPF characteristic in the modulator transfer function. The delay between 
the L and R signals can then be extracted using a digital delay line and coincidence 
detection: an AGC mechanism can be easily implemented on the digital side in order to 
avoid overflow. 
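A behavioural sketch of the proposed architecture in Python follows. `sigma_delta_2` is an ideal second-order, single-bit modulator whose linear model has STF = 1 and NTF = (1 - z^-1)^2. `xnor_correlator` stands in for the delay lines, gates and per-lag counters. Two details are assumptions. First, coincidences are counted with XNOR so that the largest count marks the best alignment; counting raw XOR outputs would make the smallest count the winner instead. Second, the overflow rule halves every counter as soon as any one exceeds its `counter_bits` range. Both input streams are assumed to have equal length, and all names are hypothetical.

```python
import random


def sigma_delta_2(x):
    """Ideal second-order, single-bit sigma-delta modulator (behavioural model).

    Linear model: Y = X + (1 - z^-1)^2 E, i.e. STF = 1 with second-order noise
    shaping. Inputs are assumed to stay well inside the +/-1 feedback levels.
    Returns a list of bits (1 for +1, 0 for -1).
    """
    i1 = i2 = 0.0
    y = 1.0  # previous quantiser output, fed back to both integrators
    bits = []
    for s in x:
        i1 += s - y   # first integrator
        i2 += i1 - y  # second integrator
        y = 1.0 if i2 >= 0 else -1.0  # 1-bit quantiser
        bits.append(1 if y > 0 else 0)
    return bits


def xnor_correlator(left_bits, right_bits, max_lag, counter_bits=8):
    """Delay-line coincidence correlator with one counter per lag.

    For each lag d in [-max_lag, max_lag] the counter is incremented when
    left[n - d] == right[n] (XNOR). Positive d means the right channel lags.
    If any counter exceeds its counter_bits range, all counters are halved.
    Returns (lag with the largest count, list of counts).
    """
    limit = (1 << counter_bits) - 1
    lags = list(range(-max_lag, max_lag + 1))
    counts = [0] * len(lags)
    for n in range(max_lag, len(right_bits) - max_lag):
        for j, d in enumerate(lags):
            if left_bits[n - d] == right_bits[n]:
                counts[j] += 1
        if max(counts) > limit:  # "carry" detected on some counter
            counts = [c >> 1 for c in counts]
    best = max(range(len(lags)), key=lambda j: counts[j])  # digital comparator
    return lags[best], counts


if __name__ == "__main__":
    # Correlator check: a random bit stream and a copy delayed by 3 samples.
    rng = random.Random(7)
    left = [rng.randint(0, 1) for _ in range(3000)]
    right = [0] * 3 + left[:-3]
    print("estimated delay (samples):", xnor_correlator(left, right, 8)[0])
```

The usage lines feed the correlator a random bit stream and a delayed copy, so they exercise only the delay-line and counter logic. With real modulator outputs, the peak follows the cross-correlation of the band signals and is broadened by quantisation noise. Halving all counters together also acts as a form of exponential forgetting, which keeps the counters narrow at the cost of a shorter effective integration window.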
[Figure 8-2 diagram: inputs from the L-channel BPF and the R-channel BPF; output: ITD value]
Figure 8-2: Proposed mixed-signal ITD extraction architecture: sigma-delta modulators are used to convert the input analogue signals into a digital pulse stream. Cross-correlation is then carried out using digital delay lines, XOR gates and counters. Overflow can be prevented by dividing all the counter outputs by 2 once a carry is detected on any one of the counters. The digital comparator is used to determine the maximum correlation position.
Time delay cues are primarily the result of a global signal delay, and therefore their frequency dependence is minimal. On the other hand, the separate extraction of time delay cues for each frequency band is expensive in terms of hardware. It would therefore be interesting to assess the loss (if any) in localisation accuracy if a single global time delay is extracted from the composite signal rather than one for each frequency band. The results obtained in this thesis indicate that the loss in accuracy is likely to be negligible, whereas the savings in hardware complexity would be significant.
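A minimal sketch of how such an assessment could be set up: one correlator per band, as in the present architecture, versus a single correlator operating on the sum of the band outputs. The band signals in the usage lines are independent white-noise stand-ins rather than real BPF outputs (an assumption). A real study would feed HRTF-filtered signals through the filter bank and compare both estimates against the true ITD. The function names are hypothetical.

```python
import random


def xcorr_lag(left, right, max_lag):
    """Lag d in [-max_lag, max_lag] maximising sum_n left[n - d] * right[n].

    Positive d means the right channel lags the left one.
    """
    best_lag, best_val = 0, float("-inf")
    for d in range(-max_lag, max_lag + 1):
        val = sum(left[n - d] * right[n] for n in range(max_lag, len(right) - max_lag))
        if val > best_val:
            best_lag, best_val = d, val
    return best_lag


def per_band_itds(bands_l, bands_r, max_lag):
    """One correlator per frequency band."""
    return [xcorr_lag(l, r, max_lag) for l, r in zip(bands_l, bands_r)]


def global_itd(bands_l, bands_r, max_lag):
    """Single correlator on the composite signal: the band outputs are summed first."""
    comp_l = [sum(samples) for samples in zip(*bands_l)]
    comp_r = [sum(samples) for samples in zip(*bands_r)]
    return xcorr_lag(comp_l, comp_r, max_lag)


if __name__ == "__main__":
    # Stand-in band signals: Gaussian noise, right channel delayed by 4 samples.
    rng = random.Random(3)
    bands_l = [[rng.gauss(0.0, 1.0) for _ in range(1500)] for _ in range(3)]
    bands_r = [[0.0] * 4 + b[:-4] for b in bands_l]
    print(per_band_itds(bands_l, bands_r, 8), global_itd(bands_l, bands_r, 8))
```

The hardware saving shows up directly in the structure: one correlator replaces one per band (48 per channel pair in this front end).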
8.1.5 Supply Voltage Regulation 
During testing, it was found that good power supply regulation is essential to ensure good performance of the various chips. Variations in the supply voltage cause variations in the resonant frequency and Q-factor of the BPFs in the front-end chip. This effect is mainly due to the finite output conductance of the MOS devices, especially when they are operated at low supply voltages (and hence near the onset of the triode region). For the ITD extraction chip it is also essential to keep Vss exactly equal to -Vdd for the correct operation of the multiplier common mode output voltage control.
It can be concluded that, for circuits to perform reliably at very low supply voltages, good supply regulation is of utmost importance; an on-chip low-voltage, low-dropout regulator is therefore required, especially where battery ageing has to be compensated for. The design of low-voltage, low-dropout (and low power) CMOS regulators with good temperature stability is a challenging task which provides scope for further research.
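A first-order model makes the supply sensitivity explicit. Assume (i) that the BPF resonant frequency is proportional to a bias current divided by a capacitance, as in the log-domain filters used here, and (ii) that a supply change appears mainly as a change ΔV_DS across the current-setting transistors. Channel-length modulation with parameter λ then gives the relative frequency shift below. The Q-factor, which is set by a ratio of bias currents, shifts when the transistors setting those currents see different V_DS changes. Near the edge of the triode region the output conductance rises well above this first-order value, so the expression underestimates the sensitivity there.

```latex
\begin{aligned}
I_D &= I_{D0}\,(1 + \lambda V_{DS}), \qquad \omega_0 \propto \frac{I_D}{C}\\
\frac{\Delta\omega_0}{\omega_0} &\approx \frac{\Delta I_D}{I_D}
  = \frac{\lambda\,\Delta V_{DS}}{1 + \lambda V_{DS}}
\end{aligned}
```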
8.1.6 Other Applications for the Hardware Building Blocks 
The hardware building blocks designed during the course of this thesis can be 
broadly classified into the following categories: continuous time current mode processing 
(building blocks used in the front end and parts of the onset detector), discrete time 
current mode processing (delay line used in the onset detector) and discrete time voltage 
mode processing (ITD extraction circuit). The current mode blocks can be further subdivided into linear (such as the BPFs) and non-linear (such as the squaring circuit, the echo decay model in the onset detector and the maximum current selector in the cue extraction front end). All these building blocks have a variety of applications: for example, discrete time current and voltage mode processing is extensively used for A/D and D/A conversion and filtering, while continuous time current mode processing is useful in a variety of area-efficient wide bandwidth signal processing applications, including neural networks. In most cases,
it is possible to directly map the problem to be solved into hardware using the same basic 
current mode building blocks with very slight modifications. The possibility of using 
these building blocks at low supply voltages thus allows a variety of analogue signal 
processing applications to be carried out using sub-micron CMOS technologies. 
Furthermore, the complete front end described in chapter 6 could have other applications 
in the area of audio signal processing such as speech or speaker recognition. 
Bibliography 
[1] M. Cowling and R. Sitte, "Sound identification and direction detection in Matlab for surveillance applications", Proceedings of The Matlab User Conference, Australia, 2000.
[2] R. Kuc and M. W. Siegel, "Physically based simulation model for acoustic sensor robot navigation", IEEE Transactions on Pattern Analysis and Machine Intelligence, 9, pp. 766-778, 1987.
[3] J. Flanagan, D. Berkley, G. Elko, J. West and M. Sondhi, "Autodirective microphone systems", Acustica, 73, pp. 58-71, 1991.
[4] D. Giuliani, M. Omologo and P. Svaizer, "Talker localization and speech recognition using a microphone array and a cross-power spectrum phase analysis", Proceedings of the International Conference on Spoken Language Processing (ICSLP), 3, pp. 1243-1246, 1994.
[5] Q. Lin, E. Jan and J. Flanagan, "Microphone arrays and speaker identification", IEEE Transactions on Speech and Audio Processing, 2, pp. 622-629, 1994.
[6] J. E. Greenberg and P. M. Zurek, "Evaluation of an adaptive beamforming method for hearing aids", Journal of the Acoustical Society of America, 91, pp. 1662-1676, 1992.
[7] D. R. Begault, "3-D Sound for virtual reality and multimedia", Academic Press, Cambridge, MA, 1994.
[8] A. W. Bronkhorst, J. A. Veltman, and L. van Breda, "Application of a three-dimensional auditory display in a flight task", Human Factors, 38, pp. 23-33, 1996.
[9] M. S. Brandstein and H. Silverman, "A closed-form location estimator for use with room environment microphone arrays", IEEE Transactions on Speech and Audio Processing, 5, pp. 45-50, 1997.
[10] R. Schmidt, "A new approach to geometry of range difference location", IEEE Transactions on Aerospace Electronics, AES-8, pp. 821-835, 1972.
[11] J. Smith and J. Abel, "Closed-form least squares source location estimation from range-difference measurements", IEEE Transactions on Acoustics, Speech and Signal Processing, 35, pp. 1661-1669, 1987.
[12] M. Brandstein, J. Adcock, and H. Silverman, "A Closed-form Method for Finding Source Locations from Microphone-array Time-delay Estimates", IEEE Transactions on Speech and Audio Processing, 5, pp. 45-50, 1997.
[13] W. Bangs and P. Schultheis, "Space-time processing for optimal parameter estimation", in Signal Processing (Q. Griffiths, P. Stocklin, and C. V. Schooneveld, editors), pp. 577-590, Academic Press, New York, 1973.
[14] G. Carter, "Variance bounds for passively locating an acoustic source with a symmetric line array", Journal of the Acoustical Society of America, 36, pp. 953-964, 1988.
[15] H. F. Silverman and S. E. Kirtman, "A two-stage algorithm for determining talker location from linear microphone-array data", Computer, Speech and Language, 6, pp. 129-152, 1992.
[16] S. Haykin, "Adaptive filter theory", Prentice Hall, 1991.
[17] D. Johnson and D. Dudgeon, "Array signal processing - concepts and techniques", Prentice Hall, 1993.
[18] J. Krolik, "Focussed wide-band array processing for spatial spectral estimation", in Advances in Spectrum Analysis and Array Processing (S. Haykin, editor), 2, pp. 221-261, Prentice Hall, 1991.
[19] J. Lazzaro, "Silicon models of early audition", Ph.D. thesis, California Institute of Technology, 1990.
[20] E. Fragniere, F. Schaik, E. Vittoz, "Cochlear linear predictive coding: interfacing an analogue cochlear model with a conventional speech recognition system", MANTRA Centre for Neuro-Mimetic Systems, 1992-1996 Activity Report, pp. 105-116, EPFL, Lausanne, Switzerland, January 1996.
[21] N. Bhadkamkar, "Binaural source localizer chip using subthreshold analog CMOS", Proceedings of the IEEE International Conference on Neural Networks, Orlando, FL, pp. 1866-1870, 1994.
[22] R. J. W. Wang, R. Sarpeshkar, M. Jabri, C. Mead, "A low power analog front-end module for cochlear implants", Proceedings of the XVI World Congress on Otorhinolaryngology, Sydney, 1997.
[23] J. L. van Soest, "Richtungshooren bij sinusvormige geluidstrillingen" (Directional hearing of sinusoidal sound waves), Physica, 9, pp. 271-282, 1929.
[24] R. Sarpeshkar, "Analog versus digital: extrapolating from electronics to neurobiology", Neural Computation, 10, pp. 1601-1638, 1998.
[25] C. D. Summerfield and R. F. Lyon, "ASIC implementation of the Lyon cochlea model", Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, San Francisco, CA, 5, pp. 673-676, 1992.
[26] R. Sarpeshkar, R. F. Lyon, and C. A. Mead, "A low-power wide dynamic range analog VLSI cochlea", Analog Integrated Circuits and Signal Processing, 16, pp. 254-274, 1998.
[27] L. C. Aiello and P. Wheeler, "The expensive-tissue hypothesis: the brain and digestive system in human and primate evolution", Current Anthropology, 36, pp. 199-221, 1995.
[28] Y. Taur and T. J. Watson, "The incredible shrinking transistor", IEEE Spectrum, 36, pp. 25-29, 1999.
[29] E. Fragniere, A. Schaik, E. Vittoz, "Design of an analogue VLSI model of an active cochlea", Analog Integrated Circuits and Signal Processing, 13, pp. 19-35, 1997.
[30] L. C. Peterson and B. P. Bogert, "A Dynamical Theory of the Cochlea", Journal of the Acoustical Society of America, 22, pp. 269-381, 1950.
[31] J. L. Stewart, "A theory and physical model for cochlear mechanics", Acta Otolaryngologica Suppl., 294, pp. 1-24, 1972.
[32] E. Zwicker, "Suppression and (2f1-f2)-difference tones in a nonlinear cochlear preprocessing model with active feedback", Journal of the Acoustical Society of America, 80, pp. 163-176, 1986.
[33] E. Zwicker and W. Peisl, "Cochlear preprocessing in analog models, in digital models and in human inner ear", Hearing Research, 44, pp. 209-216, 1990.
[34] R. F. Lyon and C. Mead, "An analog electronic cochlea", IEEE Transactions on Acoustics, Speech and Signal Processing, 36, pp. 1119-1134, 1988.
[35] J. Huang, N. Ohnishi, and N. Sugie, "Sound localization in reverberant environment based on the model of the precedence effect", IEEE Transactions on Instrumentation and Measurement, 46, pp. 842-846, 1997.
[36] N. Bhadkamkar and B. Fowler, "A sound localisation system based on biological analogy", Proceedings of the IEEE International Conference on Neural Networks, San Francisco, CA, pp. 1902-1907, 1993.
[37] J. B. Allen, "Cochlear modeling", IEEE ASSP (Acoustics, Speech and Signal Processing) Magazine, 2, pp. 3-29, 1985.
[38] C. A. Mead, "Analog VLSI and Neural Systems", Addison-Wesley, 1989.
[39] W. Liu, A. G. Andreou, and M. H. Goldstein, "Voiced-speech representation by an analog silicon model of the auditory periphery", IEEE Transactions on Neural Networks, 3, pp. 477-487, 1992.
[40] L. Watts, "Cochlear mechanics: analysis and analog VLSI", Ph.D. thesis, California Institute of Technology, 1993.
[41] L. Sellami and R. W. Newcomb, "A digital scattering model of the cochlea", IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, 44, pp. 174-180, 1997.
[42] A. Schaik and R. Meddis, "The electronic ear", Neurobiology, NATO ASI Series A - Life Sciences, 289, pp. 233-250, 1996.
[43] R. F. Lyon, "Automatic gain control in cochlear mechanics", in The Mechanics and Biophysics of Hearing, Springer-Verlag, pp. 395-402, 1990.
[44] R. F. Lyon and C. A. Mead, "A CMOS VLSI cochlea", Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, New York, NY, pp. 2172-2175, 1988.
[45] L. Watts, D. A. Kerns, and R. F. Lyon, "Improved implementation of the silicon cochlea", IEEE Journal of Solid-State Circuits, 27, pp. 692-700, 1992.
[46] J. P. Cornil and P. G. A. Jespers, "A micropower switched capacitor implementation of the silicon cochlea", Proceedings of the European Solid State Circuits Conference, Ulm, Germany, pp. 100-103, 1994.
[47] Y. Kuraishi, K. Nakayama, K. Miyadera, and T. Okamura, "A single-chip 20-channel speech spectrum analyzer using a multiplexed switched-capacitor filter bank", IEEE Journal of Solid-State Circuits, SC-19, pp. 964-970, 1984.
[48] J. Lin, W. Ki, K. Thompson, and S. Shamma, "Realization of cochlear filters by VLT switched capacitor biquads", IEEE International Conference on Acoustics, Speech and Signal Processing, San Francisco, CA, 2, pp. 245-247, 1992.
[49] J. Lin, W. Ki, T. G. Edwards, and S. Shamma, "Analog VLSI implementations of auditory wavelet transforms using switched-capacitor circuits", IEEE Transactions on Circuits and Systems I, 41, pp. 572-583, 1994.
[50] J. S. Chang and Y. C. Tong, "A micropower-compatible time-multiplexed SC speech spectrum analyzer design", IEEE Journal of Solid-State Circuits, 28, pp. 40-48, 1993.
[51] R. Sarpeshkar, R. F. Lyon, and C. Mead, "A low-power wide-linear-range transconductance amplifier", Analog Integrated Circuits and Signal Processing, 13, pp. 123-151, 1997.
[52] Y. Tsividis, "Mixed Analog-Digital VLSI Devices and Technology (An Introduction)", McGraw-Hill, pp. 47-97, 1996.
[53] R. F. Lyon, "Analog implementations of auditory models", Proceedings of the DARPA Workshop on Speech Recognition and Natural Language, Pacific Grove, pp. 212-216, 1991.
[54] P. M. Furth and A. G. Andreou, "Cochlear models implemented with linearized transconductors", Proceedings of the IEEE International Symposium on Circuits and Systems, Atlanta, GA, 3, pp. 491-494, 1996.
[55] P. M. Furth and A. G. Andreou, "A design framework for low power analog filter banks", IEEE Transactions on Circuits and Systems I, 42, pp. 966-971, 1995.
[56] N. Kumar, W. Himmelbauer, G. Cauwenberghs, and A. G. Andreou, "An analog VLSI chip with asynchronous interface for auditory feature extraction", Proceedings of the IEEE International Symposium on Circuits and Systems, Hong Kong, 1, pp. 553-556, 1997.
[57] C. Toumazou, J. Ngarmnil, and T. S. Lande, "Micropower log-domain filter for electronic cochlea", Electronics Letters, 30, pp. 1839-1841, 1994.
[58] R. Fox, M. Nagarajan, and J. Harris, "Practical design of single-ended log-domain filter circuits", Proceedings of the International Symposium on Circuits and Systems, Hong Kong, pp. 341-344, 1997.
[59] Y. Tsividis, "Externally linear, time-invariant systems and their application to companding signal processors" (invited paper), IEEE Transactions on Circuits and Systems II, 44, pp. 65-85, 1997.
[60] B. Gilbert, "Translinear circuits: a proposed classification", Electronics Letters, 11, pp. 14-16, 1975.
[61] R. Sarpeshkar, R. F. Lyon, and C. A. Mead, "Nonvolatile correction of Q-offsets and instabilities in cochlear filters", Proceedings of the IEEE International Symposium on Circuits and Systems, Atlanta, GA, 3, pp. 329-332, 1996.
[62] C. Mead, X. Arreguit, and J. Lazzaro, "Analog VLSI model of binaural hearing", IEEE Transactions on Neural Networks, 2, pp. 230-236, 1991.
[63] A. Schaik, E. Fragniere, and E. Vittoz, "A silicon model of amplitude modulation detection in the auditory brainstem", Advances in Neural Information Processing Systems, No. 9, Ch. 152, pp. 741-747, 1997.
[64] A. Schaik, E. Fragniere, and E. Vittoz, "An analogue electronic model of ventral cochlear nucleus neurons", Proceedings of MicroNeuro, Lausanne, Switzerland, pp. 52-59, 1996.
[65] J. Lazzaro, "Temporal adaptation in a silicon auditory nerve", Advances in Neural Information Processing Systems, 4, pp. 813-820, 1992.
[66] A. Schaik, E. Fragniere, and E. Vittoz, "Improved silicon cochlea using compatible lateral bipolar transistors", Advances in Neural Information Processing Systems, No. 8, Ch. 152, pp. 671-677, 1996.
[67] J. Lazzaro and J. Wawrzynek, "Low-power silicon neurons, axons and synapses", Kluwer International Series in Engineering and Computer Science, No. 1266, p. 153, 1994.
[68] N. Kumar, G. Cauwenberghs, and A. G. Andreou, "A circuit model of hair-cell transduction for temporal processing and auditory feature extraction", Proceedings of the 29th Annual Conference on Information Sciences and Systems, Baltimore, MD, pp. 350-354, 1995.
[69] R. Sarpeshkar, R. F. Lyon, and C. Mead, "An analog VLSI cochlea with new transconductance amplifiers and nonlinear gain control", Proceedings of the IEEE International Symposium on Circuits and Systems, Atlanta, GA, 3, pp. 292-295, 1996.
[70] T. Hirahara and T. Komakine, "A computational cochlear nonlinear preprocessing model with adaptive Q circuits", Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 496-499, 1989.
[71] J. Lazzaro, "anaLOG: A functional simulator for VLSI neural systems", M.Sc. dissertation, Computer Science Dept., California Institute of Technology (Technical Report No. 5229:TR:86), 1986.
[72] J. Lazzaro and C. Mead, "Circuit models of sensory transduction in the cochlea", in Analog VLSI Implementations of Neural Networks (C. Mead and M. Ismail, editors), pp. 85-101, Kluwer, 1989.
[73] L. Watts, "Designing Networks of Spiking Silicon Neurons and Synapses", Proceedings of the Computation and Neural Systems Meeting CNS92, San Francisco, CA, 1992.
[74] R. Sarpeshkar, L. Watts, and C. A. Mead, "Refractory Neuron Circuit", CNS Memorandum CNS-TR-92-08, California Institute of Technology, 1992.
[75] M. Mahowald and R. Douglas, "A silicon neuron", Nature, 354, 19/26, pp. 515-518, 1991.
[76] J. Lazzaro, S. Ryckebusch, M. A. Mahowald, and C. Mead, "Winner-take-all networks of O(n) complexity", Caltech Technical Report No: CS-TR-21-88 (ftp://www.cs.berkeley.edu/~lazzaro/biblio/wta-nips.ps.gz); also in Advances in Neural Information Processing Systems 1 (D. Touretzky, editor), San Mateo, CA: Morgan Kaufmann Publishers, pp. 703-711, 1988.
[77] T. Delbruck, "Bump circuits for computing similarity and dissimilarity of analog voltages", Caltech CNS Memo #26, May 24, 1993 (ftp://www.pcmp.caltech.edu/anaprose/tobi/bump/bump.pdf).
[78] J. Lazzaro and J. Wawrzynek, "Speech recognition experiments with silicon auditory models", Analog Integrated Circuits and Signal Processing, 13, pp. 37-51, 1997.
[79] J. Lazzaro, "Biologically-based auditory signal processing in analog VLSI", Proceedings of the IEEE 25th Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, pp. 790-794, 1991.
[80] J. Lazzaro, "A silicon model of an auditory neural representation of spectral shape", IEEE Journal of Solid State Circuits, 26, pp. 772-777, 1991.
[81] J. Lazzaro and J. Wawrzynek, "Silicon models for auditory scene analysis", Advances in Neural Information Processing Systems, 8, pp. 699-705, 1996.
[82] E. Fragniere, A. Schaik, and E. Vittoz, "Linear predictive coding of speech using an analogue cochlear model", Proceedings of Eurospeech, Madrid, 1, pp. 119-122, 1995.
[83] K. D. Martin, "A computational model of spatial hearing", Proceedings of the 126th Meeting of the Acoustical Society of America, Austin, 1994.
[84] E. M. Wenzel, "Localisation using non-individualised head-related transfer functions", Journal of the Acoustical Society of America, 94, pp. 111-123, 1993.
[85] P. M. Zurek, "A note on onset effects in binaural hearing", Journal of the Acoustical Society of America, 93, pp. 1200-1201, 1993.
[86] P. M. Zurek, "The precedence effect and its possible role in the avoidance of interaural ambiguities", Journal of the Acoustical Society of America, 67, pp. 952-963, 1980.
[87] C. Pu and J. G. Harris, "A continuous-time analog circuit for computing time delays between signals", Proceedings of the IEEE International Symposium on Circuits and Systems, Atlanta, GA, 3, pp. 357-360, 1996.
[88] K. D. Martin, "Estimating azimuth and elevation from interaural differences", Proceedings of the IEEE Mohonk Workshop on Applications of Signal Processing to Acoustics and Audio, 1995.
[89] J. G. Harris, C. J. Pu, and J. C. Principe, "A monaural cue sound localizer", Analog Integrated Circuits and Signal Processing, 23, pp. 163-172, 2000.
[90] L. Rayleigh, "On our perception of sound direction", Philosophical Magazine, 13, pp. 214-232, 1907.
[91] F. L. Wightman and D. J. Kistler, "Monaural sound localization revisited", Journal of the Acoustical Society of America, 101, pp. 1050-1063, 1997.
[92] G. F. Kuhn, "Physical acoustics and measurements pertaining to directional hearing", in Directional Hearing (W. A. Yost and G. Gourevitch, editors), Springer-Verlag, pp. 3-25, 1987.
[93] J. C. Middlebrooks, "Narrow-band sound localization related to external ear acoustics", Journal of the Acoustical Society of America, 92, pp. 2607-2624, 1992.
[94] C. Lim and R. O. Duda, "Estimating the azimuth and elevation of a sound source from the output of a cochlear model", Proceedings of the 28th Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, pp. 399-403, 1994.
[95] P. Zakarauskas and M. S. Cynader, "A computational theory of spectral cue localization", Journal of the Acoustical Society of America, 94, Pt. 1, pp. 1323-1331, 1993.
[96] W. H. Slattery and J. C. Middlebrooks, "Monaural sound localization - acute versus chronic unilateral impairment", Hearing Research, 75, pp. 38-46, 1994.
[97] M. S. Datum, "An artificial neural network for sound localization using binaural cues", Journal of the Acoustical Society of America, 100, pp. 372-383, 1996.
[98] C. Neti, E. D. Young, and M. H. Schneider, "Neural network models of sound localization based on directional filtering by the pinna", Journal of the Acoustical Society of America, 92, pp. 3140-3156, 1992.
[99] J. Lazzaro and C. Mead, "A silicon model of auditory localization", Neural Computation, 1, pp. 41-70, 1989.
[100] W. G. Gardner and K. Martin, "HRTF measurements of a KEMAR", Journal of the Acoustical Society of America, 97, pp. 3907-3908, 1995.
[101] W. Gaik, "Combined evaluation of interaural time and intensity differences: psychological results and computer modeling", Journal of the Acoustical Society of America, 94, pp. 98-110, 1993.
[102] P. M. Hofman and A. J. van Opstal, "Spectro-temporal factors in two-dimensional human sound localization", Journal of the Acoustical Society of America, 103, pp. 2634-2648, 1998.
[103] D. W. Batteau, "The role of the pinna in human localization", Proceedings of the Royal Society London, Series B, 168, pp. 158-180, 1967.
[104] K. D. Martin, "Echo suppression in a computational model of the precedence effect", Proceedings of the IEEE Mohonk Workshop on Applications of Signal Processing to Acoustics and Audio, 1997.
[105] D. S. Broomhead and D. Lowe, "Multivariate Functional Interpolation and Adaptive Networks", Complex Systems, 2, pp. 312-355, 1988.
[106] J. O. Rawlings, "Applied Regression Analysis", (Wadsworth & Brooks/Cole, Pacific Grove, CA), 1988.
[107] M. S. Brainard, E. I. Knudsen, and S. D. Esterly, "Neural derivation of sound source location: Resolution of spatial ambiguities in binaural cues", Journal of the Acoustical Society of America, 91, pp. 1015-1027, 1992.
[108] J. Blauert, "Spatial Hearing - The psychophysics of human sound localization", (MIT Press, Cambridge, Massachusetts), 1997.
[109] D. R. Frey, "Exponential state space filters: a generic current mode design strategy", IEEE Transactions on Circuits and Systems I, 43, pp. 34-42, 1996.
[110] R. M. Fox, "Enhancing dynamic range in differential log-domain filters based on the two-filters approach", Proceedings of the IEEE International Symposium on Circuits and Systems, Geneva, 2, pp. 617-620, 2000.
[111] M. Punzenberger and C. Enz, "A new 1.2 V BiCMOS log-domain integrator for companding current mode filters", Proceedings of the IEEE International Symposium on Circuits and Systems, 3, pp. 292-295, 1996.
[112] I. Grech, J. Micallef, and T. Vladimirova, "Two ±0.7 V S2I class AB differential memory cells", IEE Electronics Letters, 36, pp. 2062-2063, 2000.
[113] H. Traff and S. Eriksson, "Class A and AB compact switched current memory circuits", Electronics Letters, 29, pp. 1454-1455, 1993.
[114] A. Worapishet, J. B. Hughes, and C. Toumazou, "Class AB technique for high performance switched-current memory cells", IEEE International Symposium on Circuits and Systems, II, pp. 456-459, 1999.
[115] R. Srowik and R. Schuffny, "Low-power class AB current memory cell", Electronics Letters, 35, pp. 2014-2015, 1999.
[116] J. B. Hughes and K. W. Moulding, "S2I: A switched-current technique for high performance", Electronics Letters, 29, pp. 1400-1401, 1993.
[117] A. Worapishet, J. B. Hughes, and C. Toumazou, "Low-voltage class AB two-step sampling switched currents", IEEE International Symposium on Circuits and Systems, Geneva, May 28-31, II, pp. 413-416, 2000.
[118] I. Grech, J. Micallef, and T. Vladimirova, "Low-Voltage, SC TDM correlator for the extraction of time delay", Proceedings of the 7th IEEE International Conference on Electronics, Circuits and Systems, 1, pp. 112-115, 2000.
[119] D. R. Frey, "A state-space formulation for externally linear class AB dynamical circuits", IEEE Transactions on Circuits and Systems-II, 46, pp. 306-314, 1999.
[120] R. F. Thompson, "The brain", W. H. Freeman and Company, 1985.
[121] T. Enomoto, T. Ishihara, and M. Yasumoto, "Integrated tapped MOS analogue delay-line using switched capacitor technique", Electronics Letters, 18, pp. 193-194, 1982.
[122] D. J. Allstot and K. Tan, "Simplified MOS switched capacitor ladder filter structures", IEEE Journal of Solid State Circuits, SC-16, pp. 724-729, 1981.
[123] M. G. Degrauwe, J. Rijmenants, E. A. Vittoz, and H. J. De Man, "Adaptive biasing in CMOS amplifiers", IEEE Journal of Solid State Circuits, SC-17, pp. 522-528, 1982.
[124] T. Pasch, U. Kleine, and R. Klinke, "A low voltage differential op amp with novel common mode feedback", IEEE International Conference on Electronics, Circuits and Systems, 1, pp. 345-348, 1998.
[125] A. N. Karanicolas, K. O. Kenneth, and J. Y. Wang, "A high-frequency fully differential BiCMOS operational amplifier", IEEE Journal of Solid State Circuits, 26, pp. 203-208, 1991.
[126] Texas Instruments Inc., "TLV2211, TLV2211Y Advanced LinCMOS Rail-to-Rail Micropower Single Operational Amplifiers", Product Datasheet, 1997.
[127] D. A. Johns and K. Martin, "Analog Integrated Circuit Design", John Wiley and Sons, 1997.
[128] G. Palmisano, G. Palumbo, and R. Salerno, "A 1.5-V high drive capability CMOS op-amp", IEEE Journal of Solid State Circuits, 34, No. 2, pp. 248-252, 1999.
[129] P. Malcovati, F. Maloberti, and M. Terzani, "An high-swing, 1.8 V, push-pull OPAMP for sigma-delta modulators", Proceedings of the IEEE International Conference on Electronics, Circuits and Systems, 1, pp. 33-36, 1998.
[130] M. Steyaert, J. Crols, and S. Gogaert, "Switched-opamp, a technique for realising full CMOS switched-capacitor filters at very low voltages", Proceedings of the European Solid State Circuits Conference, pp. 178-181, 1993.
[131] I. Grech, J. Micallef, and T. Vladimirova, "±0.9 V switched capacitor multiplier with rail-to-rail input", Electronics Letters, 35, pp. 1688-1689, 1999.
[132] I. Grech, J. Micallef, C. J. Debono, P. Malcovati, and F. Maloberti, "A 1 V second order sigma-delta modulator", Analog Integrated Circuits and Signal Processing, 24, pp. 151-164, 2001.
[133] L. Theogarajan and L. A. Akers, "A scalable low voltage analog Gaussian radial basis circuit", IEEE Transactions on Circuits and Systems, 44, pp. 977-979, 1997.
[134] J. Choi, B. J. Sheu, and J. C.-F. Chang, "A Gaussian synapse circuit for analog VLSI neural networks", IEEE Transactions on VLSI Systems, 2, pp. 129-133, 1994.
[135] J. Anderson, J. C. Platt, and D. B. Kirk, "An analog VLSI chip for radial basis functions", Advances in Neural Information Processing Systems (S. J. Hanson, J. D. Cowan, and C. L. Giles, editors), 5, San Mateo, CA: Morgan Kaufmann, pp. 765-772, 1993.
[136] M. Eskiyerli, C. Toumazou, and A. Payne, "State-space synthesis of integrators based on the MOSFET square law", Electronics Letters, 32, pp. 505-506, 1996.
[137] J. Chenu, T. Massengill, and K. Bohringer, "EE502 / ME504 Introduction to MEMS resonant beams for auditory recognition", University of Washington, Seattle, WA 98195 (ftp://www.ee.washington.edu/class/502/aut00/Projects/ChenuMassengill.pdf).
[138] D. Haronian and N. C. MacDonald, "A Microelectromechanics Based Artificial Cochlea (MEMBAC)", Proceedings of the 8th International Conference on Solid-State Sensors and Actuators and Eurosensors IX, pp. 708-711, 1995.
[139] D. Haronian and N. C. MacDonald, "A microelectromechanics-based frequency-signature sensor", Sensors and Actuators A: Physical, 53, pp. 288-298, 1996.
Appendix A.1 - Onset Detector Chip Layout
Die size: 3.5 x 4.9 mm
[Layout plot; labelled blocks: LPF; Splitter, Squarer & Echo Decay Window Gen.]
Appendix A.2: Test Setup for Onset Detector and Front-End Chip (1 of 2)
[Test-board schematic]
Appendix A.2: Test Setup for Onset Detector and Front-End Chip (2 of 2)
[Test-board schematic]
Appendix A.3 - Spectral Cue Extraction Front-end Chip Layout
Die size: 13 x 11 mm
[Layout plot; labelled blocks: Bias sources, IID & MSC extraction; Calibration; Memory; Splitter & AGC; Max detect]
Appendix A.4 - SC Cascade Delay Line Layout
Die size: 3.00 x 2.95 mm
[Layout plot; labelled block: Delay 1]
Appendix A.5 - Layout of the Differential Class AB Op amp
Die size: 1.78 x 1.78 mm
[Layout plot; labelled blocks: complete op amp with on-chip test pads; complete op amp; op amp without CMFB sense resistors]
Appendix A.6 - Layout of TDM Correlator Chip
Die size: 3.10 x 3.43 mm
[Layout plot; labelled blocks: Digital Control; Adder & Comparator; Analogue Memory; Integrator Bank; Clock Level Shifters; Multiplier; Parallel S/H Bank]
Appendix A.7: Test Setup for the Correlator Chip
[Test-board schematic]
