An Analog Architecture for Auditory Feature Extraction and Recognition by Smith, Paul Devon
AN ANALOG ARCHITECTURE FOR AUDITORY
FEATURE EXTRACTION AND RECOGNITION







of the Requirements for the Degree
Doctor of Philosophy in Electrical Engineering
School of Electrical and Computer Engineering
Georgia Institute of Technology
November 2004
Copyright © 2004 by Paul Devon Smith
AN ANALOG ARCHITECTURE FOR AUDITORY
FEATURE EXTRACTION AND RECOGNITION
Approved by:
Dr. Paul E. Hasler, Advisor
Professor, School of ECE
Georgia Institute of Technology
Dr. David V. Anderson
Professor, School of ECE
Georgia Institute of Technology
Dr. Robert J. Butera
Professor, School of ECE
Georgia Institute of Technology
Dr. Chin-Hui Lee
Professor, School of ECE
Georgia Institute of Technology
Dr. Bradley A. Minch
Professor, School of ECE
Franklin W. Olin College of Engineering
Date Approved: August 2004
DEDICATION
To my family,
for continued support and guidance;
and to the eternal creator
without whom this journey would have never started,
nor been completed.
ACKNOWLEDGEMENTS




PREFACE
Speech recognition systems have been implemented using a wide range of signal processing
techniques, including neuromorphic/biologically inspired and digital signal processing (DSP)
techniques. Neuromorphic/biologically inspired techniques, such as silicon cochlea models,
are based on fairly simple yet highly parallel computation and/or computational units.
DSP, by contrast, is based on block transforms and statistical or error minimization methods.
Essential to each of these techniques is the first stage of extracting meaningful information
from the speech signal, which is known as feature extraction. This can be done
using biologically inspired techniques such as silicon cochlea models, or techniques
beginning with a model of speech production and then trying to separate the vocal tract
response from an excitation signal. Even within each of these approaches, there are
multiple techniques, including cepstrum filtering, which sits under the class of homomorphic
signal processing, and techniques using FFT-based predictive approaches. The underlying
reality is that multiple techniques have attacked the problem of speech recognition,
but the problem is still far from being solved. The techniques that have been shown to
have the best recognition rates involve cepstrum coefficients for the feature extraction and
hidden Markov models to perform the pattern recognition.
The presented research develops an analog system based on programmable analog array
technology that can perform the initial stages of auditory feature extraction and recognition
before passing information to a digital signal processor. The goal is a low-power system
that can be fully contained on one or more integrated circuit chips. Results show that it is
possible to realize advanced filtering techniques such as Cepstrum Filtering and Vector
Quantization in analog circuitry. Prior to this work, applications of analog signal
processing have focused on vision, cochlea models, anti-aliasing filters and other
single-component uses. Furthermore, classic designs have looked heavily at utilizing op-amps
as a basic core building block for these designs. This research also shows a novel design
for a Hidden Markov Model (HMM) decoder utilizing circuits that take advantage of the




TABLE OF CONTENTS
DEDICATION . . . . . iii
ACKNOWLEDGEMENTS . . . . . iv
PREFACE . . . . . v
LIST OF TABLES . . . . . ix
LIST OF FIGURES . . . . . x
CHAPTER 1 INTRODUCTION . . . . . 1
1.1 Physiology of the human ear . . . . . 5
1.2 The Speech Signal . . . . . 8
1.3 Existing Approaches in Analog/Digital Speech Recognition . . . . . 14
1.4 Neuromorphic Engineering . . . . . 18
1.5 An Analog Front End for Speech Recognition . . . . . 19
1.5.1 Frequency Decomposition . . . . . 21
1.5.2 Amplitude Detection . . . . . 21
1.5.3 V-to-I Linear Transconductor . . . . . 22
1.5.4 Weighted multiplication . . . . . 22
1.5.5 Distance Measure . . . . . 22
1.5.6 Sequence Detection . . . . . 23
CHAPTER 2 PROGRAMMABLE ANALOG MEMORIES . . . . . 24
2.1 Floating-gate Transistor Element . . . . . 24
2.2 Device Programming . . . . . 26
2.3 Programming Hardware . . . . . 34
2.4 Kappa Projection Algorithm (KPA) for Ultra-low programming . . . . . 37
2.5 Floating-gate Memory Arrays . . . . . 43
CHAPTER 3 PROGRAMMABLE CONTINUOUS-TIME FILTER BANKS . . . . . 47
3.1 A Compact Band-pass Filter Element . . . . . 47
3.2 Designing for Q . . . . . 53
3.3 Designing for multiple filter stages . . . . . 55
3.4 Decreasing Distortion . . . . . 59
3.5 Programmed Filter Bank . . . . . 60
3.6 Programming Out Offsets . . . . . 62
3.7 Biasing Techniques . . . . . 66
3.7.1 Floating-gate Direct Biasing . . . . . 66
3.7.2 Programmable Bias Generators . . . . . 67
CHAPTER 4 ANALOG SIGNAL PROCESSING BLOCKS . . . . . 70
4.1 Frequency Decomposition . . . . . 70
4.2 Amplitude Detection . . . . . 78
4.3 Linear transconductance . . . . . 80
4.4 Programmable Multiplier . . . . . 85
4.5 Programmable bump element . . . . . 87
4.6 Programmable diffuser element . . . . . 88
CHAPTER 5 ANALOG SIGNAL PROCESSING SYSTEMS . . . . . 95
5.1 Auditory Feature Extraction . . . . . 95
5.2 Analog Pattern Recognition Blocks . . . . . 102
5.2.1 Analog Vector Quantization . . . . . 102
5.2.2 Applications of Programmable Diffusers to HMM classifiers . . . . . 105
CHAPTER 6 CONCLUSIONS . . . . . 114
REFERENCES . . . . . 116
LIST OF TABLES
Table 1 Tunnelling and injection voltages for various feature sizes available through MOSIS . . . . . 27
Table 2 Summary of Kappa Projection Algorithm (KPA) performance . . . . . 43
LIST OF FIGURES
Figure 1 Overview of CADSP concept. . . . . . 2
Figure 2 Analog and Digital system partitioning. . . . . . 3
Figure 3 DSP power/MMAC vs. Time . . . . . 4
Figure 4 Development of Analog computational arrays. . . . . . 5
Figure 5 Physiology of the outer ear. . . . . . 6
Figure 6 Frequency vs. position response of the cochlea. This system is similar to a filter bank decomposition. . . . . . 19
Figure 7 Analog speech recognition system block diagram. . . . . . 20
Figure 8 Cross section of a floating-gate element. . . . . . 25
Figure 9 Flow chart of first iteration of floating-gate programming algorithm. . . . . . 28
Figure 10 Plot of injection rate versus injection pulse width for different drain-to-source voltages. . . . . . 29
Figure 11 Extraction of V_inj from the injection rate parameters. . . . . . 30
Figure 12 Extraction of α from the injection rate parameters. . . . . . 31
Figure 13 Example data using the programming algorithm to program a device to various operating points. . . . . . 32
Figure 14 Floating-gate programmability of a single device to various operating points. . . . . . 33
Figure 15 Programming control structure hardware evolution. . . . . . 35
Figure 16 Figure to illustrate the device programming setup and current measurement accuracy. . . . . . 36
Figure 17 A novel predictive algorithm for programming floating-gate devices into pico-amp and sub-pico-amp current ranges. . . . . . 38
Figure 18 Figure showing kappa and what is meant by the "effective" kappa of a floating-gate device as used in the KPA. . . . . . 39
Figure 19 κ variation with drain current and variation across an entire chip at the single deep sub-threshold projection point. . . . . . 40
Figure 20 Sub-threshold plots showing a set of programming points using the Kappa Projection Algorithm (KPA). . . . . . 41
Figure 21 Band pass filter corner frequency characterization. . . . . . 42
Figure 22 Band pass filter low corner frequencies set using the KPA. . . . . . 42
Figure 23 Floating-gate array demonstrating element isolation by controlling the gate and drain voltage of each column and row respectively. . . . . . 44
Figure 24 Early injection results showing an array of floating-gate devices programmed to exponentially spaced currents. . . . . . 45
Figure 25 Injection results showing a single row of floating-gate multiplier blocks programmed to cosine coefficients. . . . . . 46
Figure 26 Schematic of a single C4 structure. . . . . . 48
Figure 27 C4 frequency response plots with various gains and independently tunable corner frequencies. . . . . . 49
Figure 28 C4 biasing using resistors and programmable devices, respectively. . . . . . 52
Figure 29 Simulation and measurement data of filter "Q" vs. current ratio. . . . . . 54
Figure 30 C4 circuit and frequency response for various filter structures. . . . . . 55
Figure 31 Change in C4 input capacitance vs. frequency. . . . . . 56
Figure 32 Characterization plot of C4 low corner frequency for decreasing values of iTauLow. . . . . . 57
Figure 33 Characterization plot of C4 high corner frequency for increasing value of iTauHigh. . . . . . 57
Figure 34 Extraction of C4 frequency response parameters. . . . . . 58
Figure 35 C4 change in linearity with DIBL voltage. . . . . . 59
Figure 36 Example of applying correction factor to low and high corner frequencies. . . . . . 60
Figure 37 Filter bank frequency response error after correction factor . . . . . 61
Figure 38 C4 filter bank response data. . . . . . 61
Figure 39 C4 filter bank response data. . . . . . 62
Figure 40 C4 filter bank response data. . . . . . 63
Figure 41 Second-order C4 array programmed with a Q of 1. . . . . . 64
Figure 42 C4 first-order filter bank measured and theoretical fit. . . . . . 65
Figure 43 C4 second-order filter bank measured data and theoretical fit. . . . . . 66
Figure 44 C4 third-order measured data and theoretical fit. . . . . . 67
Figure 45 C4 fourth-order filter bank response data. . . . . . 68
Figure 46 C4 fifth-order filter bank response data. . . . . . 68
Figure 47 Frequency response of the cochlea. . . . . . 71
Figure 48 Filter bank approach to decomposition and similarities to classic way of thinking about FFTs. . . . . . 72
Figure 49 Schematic of a single C4 structure. . . . . . 73
Figure 50 Floating-gate version of a capacitively coupled current conveyer second-order section (C4SOS). . . . . . 73
Figure 51 Illustration of the method used in the most recent filter bank chip with attenuation on the input as well as a buffer on the output. . . . . . 74
Figure 52 C4 filter bank response data. . . . . . 75
Figure 53 Second-order C4 array programmed with a Q of 1. . . . . . 76
Figure 54 Filter bank outputs from a speech signal with two segments of spoken words. . . . . . 77
Figure 55 The initial peak detector circuit using a classical diode for the input and the corresponding step response. . . . . . 78
Figure 56 The redesigned peak detector circuit using op-amps in a feedback configuration. . . . . . 79
Figure 57 The redesigned min detector circuit using op-amps in a feedback configuration. . . . . . 80
Figure 58 Circuit diagram illustrating a floating-gate implementation of a linearized transconductance stage. . . . . . 81
Figure 59 Response of Linear Transconductance stage used to change the differential input voltage into a single-ended output current. . . . . . 82
Figure 60 Schematic drawing of a hair cell and corresponding response. . . . . . 83
Figure 61 Circuit diagram illustrating a floating-gate implementation of a normal differential input stage. . . . . . 84
Figure 62 Response of the floating-gate differential input pair. . . . . . 85
Figure 63 Voltage-mode floating-gate multiplier. . . . . . 86
Figure 64 The programmable bump element and the corresponding output given three different programmed offset values. . . . . . 87
Figure 65 Programmable VQ using floating-gate circuits. . . . . . 88
Figure 66 A "classical" diffuser element and a single floating-gate diffuser cell. . . . . . 89
Figure 67 Output current for a "classical" diffuser element. . . . . . 90
Figure 68 Diffuser line output voltage for each element vs. time with symmetrical conductances and input applied to the center. . . . . . 92
Figure 69 Diffuser line output voltage for each element vs. time with symmetrical conductances and input applied at each end. . . . . . 93
Figure 70 Respective digital and analog cepstrum implementations. . . . . . 98
Figure 71 Analog Cepstrum building blocks. . . . . . 99
Figure 72 Cepstrum system output. . . . . . 101
Figure 73 Overview of Vector Quantization (VQ). . . . . . 103
Figure 74 Basic circuit, architecture, and measurements from the VQ circuit. . . . . . 104
Figure 75 Dendrite figure showing component structures. . . . . . 107
Figure 76 Circuit design for the HMM branch element as well as the corresponding HMM classifier network. . . . . . 110




CHAPTER 1 INTRODUCTION
As portable electronics continue to advance, greater signal processing is required at lower
power and at faster rates. While digital systems and DSPs are highly programmable and
easy to use, their high power consumption quickly drains batteries and their slow processing
can exclude them from real-time situations. Instead of looking to the digital
world for all of life’s answers, analog blocks should be used when there are clear wins.
Specifically, MOSFETs running in the subthreshold regime can be used to create extremely
low-power, continuous-time systems. Moreover, with the addition of floating-gate MOSFETs
within this analog circuitry, these analog systems take on one of the main characteristics of
digital circuitry: ease of use through programmability.
This thesis outlines current efforts in creating cooperative analog/digital signal processing
(CADSP) systems for auditory signal processing applications [2, 25]. We discuss
current analog circuit approaches to front-end signal processing, along with our
current IC approaches using this technology for an analog signal processing front-end
system for speech recognition.
New advances in analog VLSI circuits have made it possible to perform operations
that more closely reflect those done in DSP applications, or that are desired in future DSP
applications. Further, analog circuits and systems can be programmable, reconfigurable,
adaptive, and at a density comparable to digital memories (for example, 100,000+
multipliers on a single chip). Therefore, with both digital and analog signal processing
(DSP and ASP, respectively) available, one might wonder how to choose a particular
solution for a given auditory application. The question is where to place the
analog-digital boundary, as shown in Figure 1, to enhance the overall functionality of a system by
utilizing analog/digital computations in mutually beneficial ways. By adding functionality
















Figure 1. CADSP concept. Computations that have typically been implemented completely in a digital
system are being moved to analog implementations for improvements in power and area. In a purely
digital system, the bottleneck is in the analog-to-digital conversion. There are also significant gains from
reducing the amount of data being transmitted over the analog-to-digital pipe. Accuracy of the data
being used in the digital system reduces down to 10 bits of accuracy due to quantization noise and
fixed-point errors.
therefore, the entire product under consideration. Further, this additional computational
power allows current DSP algorithms to be expanded to incorporate more biologically
inspired techniques.
Speech recognition systems have been implemented using a wide range of signal processing
techniques, including neuromorphic/biologically inspired [44, 45, 47, 85] and digital
signal processing techniques [65, 9, 74, 16, 86, 87]. Neuromorphic/biologically inspired
techniques, such as silicon cochlea models, are based on simple and highly parallel
computation and computational units. Digital signal processing (DSP), by contrast, is based
on block transforms and statistical or error minimization methods, and its
computation occurs serially. Over the past years DSP has grown into a very popular topic.
Applications using DSPs have also grown, fuelled by the increase in computing power and
decrease in cost, which has been a very positive step, particularly in the area of speech
recognition.












































Figure 2. Partitioning a system between analog and digital implementations is critical. There is a point
where one domain is more efficient based on cost factors such as die area and hours of development.
Essential to each of these techniques is the first stage of extracting meaningful information
from the speech signal, which is known as feature extraction. This can be done using
biologically inspired techniques such as silicon cochlea models, or techniques beginning
with a model of speech production as a convolution of the vocal tract response with an
excitation signal and then trying to separate the two signals. Even within each of these
approaches, there are multiple techniques, including cepstrum filtering, which sits under the
class of homomorphic signal processing, and techniques using classic FFT-based predictive
approaches. The underlying reality is that multiple techniques have attacked the problem of
speech recognition, and the problem is still far from being solved. The techniques that have
been shown to have the best recognition rates involve cepstrum coefficients for the feature
extraction and hidden Markov models to perform the pattern recognition.
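The homomorphic idea behind cepstrum filtering can be sketched numerically. The following is a minimal illustration and not the analog implementation developed in this work: the naive DFT, the toy sequences, and the use of circular convolution are all assumptions made for brevity. It shows that a convolution in time becomes a sum in the cepstral domain, which is what allows the vocal tract response and the excitation to be separated.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform (illustrative only)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def real_cepstrum(x):
    """Inverse DFT of the log-magnitude spectrum of x."""
    n = len(x)
    log_mag = [math.log(abs(v)) for v in dft(x)]
    return [sum(log_mag[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def circular_convolve(a, b):
    """Circular convolution, so the DFT of the result is exactly the product of DFTs."""
    n = len(a)
    return [sum(a[m] * b[(t - m) % n] for m in range(n)) for t in range(n)]


N = 32
# Hypothetical toy signals: a short "vocal tract" response and a sparse "excitation".
vocal_tract = [1.0, 0.5] + [0.0] * (N - 2)
excitation = [1.0] + [0.0] * 7 + [-0.3] + [0.0] * (N - 9)

speech = circular_convolve(excitation, vocal_tract)
c_speech = real_cepstrum(speech)
c_sum = [a + b for a, b in zip(real_cepstrum(excitation), real_cepstrum(vocal_tract))]
max_error = max(abs(a - b) for a, b in zip(c_speech, c_sum))
print(f"max |c(speech) - (c(excitation) + c(tract))| = {max_error:.2e}")
```

The printed error is at the level of floating-point round-off, so the cepstrum of the convolved signal equals the sum of the two individual cepstra.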
The proposed research will develop an analog system based on floating-gate technology
that can perform the initial stages of auditory feature extraction and recognition before
passing information to a digital signal processor. This concept is illustrated in Figure 4. The
research aims to develop a low-power system that can be fully contained on one or more
integrated circuit chips. Preliminary results show that it is possible to realize an advanced
filtering technique known as Cepstrum Filtering and a Vector Quantization technique, in
















Figure 3. DSP power/MMAC vs. Time: DSP power per MMAC has tracked Moore’s Law for over
two decades [14]. Current limitations for high-end systems are imposed by the analog-to-digital
converter. This bottleneck can be removed by shifting some of the processing from the digital domain to
the analog domain. Results show an equivalent of up to a 20 year increase in power per MMAC of
computation by implementing Cooperative Analog/Digital Signal Processing techniques.
analog circuitry. Prior to this work, applications of analog signal processing
have focused on vision, cochlea models, anti-aliasing filters and other single-component
uses. Furthermore, classic designs have looked heavily at utilizing op-amps as a basic core
building block for these designs. This research also proposes a novel design for a Hidden
Markov Model (HMM) decoder. This project will utilize circuits that take advantage of

































































Figure 4. Development of Analog computational arrays. a.) Micro-level: single floating-gate transistor
element. b.) Macro-level: Array of floating-gate elements. c.) System-level: Analog Computing
Arrays are a combination of computing and memory elements as detailed in the digital system. This
consolidation reduces overall die area and substantially reduces overall system power.
1.1 Physiology of the human ear
The human ear is an exceedingly complex organ. Figure 5 illustrates the major structures
and processes that comprise the human ear. The outer ear is composed of two parts, the
visible flap of skin and cartilage attached to the side of the head, and the ear canal, a tube
about 0.5 cm in diameter extending about 3 cm into the head. These structures direct
environmental sounds to the sensitive middle and inner ear organs located safely inside of
the skull bones. Stretched across the end of the ear canal is a thin sheet of tissue called the
tympanic membrane or ear drum. Sound waves striking the tympanic membrane cause it
to vibrate. The middle ear is a set of small bones that transfer this vibration to the cochlea
(inner ear) where it is converted to neural impulses. The cochlea is a liquid filled tube
roughly 2 mm in diameter and 3 cm in length. The cochlea is curled up and looks like a
small snail shell as shown in Figure 5. In fact, cochlea is derived from the Greek word for
snail.
When a sound wave tries to pass from air into liquid, only a small fraction of the sound
is transmitted through the interface, while the remainder of the energy is reflected. This
is because air has a low mechanical impedance (low acoustic pressure and high particle
velocity resulting from low density and high compressibility), while liquid has a high
mechanical impedance. In less technical terms, it requires more effort to wave your hand in
water than it does to wave it in air. This difference in mechanical impedance results in most
of the sound being reflected at an air/liquid interface.
[Figure labels: Pinna, Tympanum, Malleus, Incus, Stapes, Cochlea; Outer Ear, Middle Ear, Inner Ear]
Figure 5. Physiology of the outer ear.
The middle ear is an impedance matching network that increases the fraction of sound
energy entering the liquid of the inner ear. For example, fish do not have an ear drum or
middle ear, because they have no need to hear in air. Most of the impedance conversion
results from the difference in area between the ear drum (receiving sound from the air) and
the oval window. The ear drum has an area of about 60 mm², while the oval window has an
area of roughly 4 mm². Since pressure is equal to force divided by area, this difference in
area increases the sound wave pressure by about 15 times.
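The area argument can be checked with a few lines of arithmetic. This is a sketch under a simplifying assumption that is not stated in the text: the force collected by the ear drum is delivered unchanged to the oval window, with every other effect ignored. The constant and function names are illustrative only.

```python
import math

EAR_DRUM_AREA_MM2 = 60.0     # approximate value quoted in the text
OVAL_WINDOW_AREA_MM2 = 4.0   # approximate value quoted in the text


def pressure_gain(area_in_mm2, area_out_mm2):
    """Pressure ratio when the same force is applied to a smaller area (P = F / A)."""
    return area_in_mm2 / area_out_mm2


gain = pressure_gain(EAR_DRUM_AREA_MM2, OVAL_WINDOW_AREA_MM2)
gain_db = 20.0 * math.log10(gain)  # pressure is an amplitude quantity, hence 20*log10
print(f"pressure gain: {gain:.0f}x ({gain_db:.1f} dB)")
```

Under this assumption the factor of 15 in pressure corresponds to roughly 23.5 dB.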
Contained within the cochlea is the basilar membrane, the supporting structure for about
12,000 sensory cells forming the cochlear nerve. The basilar membrane is stiffest near
the oval window, and becomes more flexible toward the opposite end, allowing it to act
as a frequency spectrum analyzer. When exposed to a high frequency signal, the basilar
membrane resonates where it is stiff, resulting in the excitation of nerve cells close to the
oval window. Likewise, low frequency sounds excite nerve cells at the far end of the basilar
membrane. This makes specific fibers in the cochlear nerve respond to specific frequencies.
This organization is called the place principle, and is preserved throughout the auditory
pathway into the brain.
Another information encoding scheme is also used in human hearing, called the volley
principle. Nerve cells transmit information by generating brief electrical pulses called
action potentials. A nerve cell on the basilar membrane can encode audio information by
producing an action potential in response to each cycle of the vibration. For example, a
200 hertz sound wave can be represented by a neuron producing 200 action potentials per
second. However, this only works at frequencies below about 500 hertz, the maximum
rate at which neurons can produce action potentials. The human ear overcomes this problem
by allowing several nerve cells to take turns performing this single task. For example, a
3000 hertz tone might be represented by ten nerve cells alternately firing at 300 times per
second. This extends the range of the volley principle to about 4 kHz, above which the
place principle is exclusively used.
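The turn-taking described above can be illustrated with a small counting exercise. The strict round-robin firing order is an idealization assumed here for clarity, and the function names are hypothetical; real neurons do not share the work this evenly.

```python
import math

MAX_FIRING_RATE_HZ = 500  # approximate per-neuron limit quoted in the text


def min_cells_needed(tone_hz, max_rate_hz=MAX_FIRING_RATE_HZ):
    """Smallest group of cells that can jointly mark every cycle of the tone."""
    return math.ceil(tone_hz / max_rate_hz)


def round_robin_spike_counts(tone_hz, n_cells, duration_s=1):
    """Assign each cycle of the tone to the next cell in turn and count spikes per cell."""
    counts = [0] * n_cells
    for cycle in range(int(tone_hz * duration_s)):
        counts[cycle % n_cells] += 1
    return counts


counts = round_robin_spike_counts(3000, 10)
print(counts)                  # each of the ten cells fires 300 times in one second
print(min_cells_needed(3000))  # 6
print(min_cells_needed(4000))  # 8
```

Ten cells sharing a 3000 hertz tone each fire at 300 hertz, comfortably below the 500 hertz limit; six cells would be the minimum under this idealized scheme.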
Our impression of the loudness of a sound corresponds better to the log of the acoustic
power than to its linear value. Thus, it is common to express sound intensity on a
logarithmic scale, called decibel SPL (Sound Pressure Level). On this scale, 0 dB SPL is a
sound wave power of 10^-16 watts/cm², about the weakest sound detectable by the human
ear. Normal speech is at about 60 dB SPL, while painful damage to the ear occurs at about
140 dB SPL. Successively doubling the power in a sound gives an impression of roughly
equal steps in loudness. The loudness of a sound is therefore normally expressed on the
logarithmic decibel (dB) scale, where a factor of ten increase in power corresponds to 10 dB.
(The cube root of the power is now generally considered to match perceptual loudness even
more closely, but the use of the log scale has been retained.)
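As a quick check of the numbers above, the conversion between acoustic power density and dB SPL can be written out directly. This is a minimal sketch; the function names are illustrative, and the reference value is the 0 dB SPL figure quoted in the text.

```python
import math

REFERENCE_INTENSITY_W_PER_CM2 = 1e-16  # 0 dB SPL, as stated in the text


def intensity_to_db_spl(intensity_w_per_cm2):
    """Power quantities use 10*log10 of the ratio to the reference."""
    return 10.0 * math.log10(intensity_w_per_cm2 / REFERENCE_INTENSITY_W_PER_CM2)


def db_spl_to_intensity(level_db):
    return REFERENCE_INTENSITY_W_PER_CM2 * 10.0 ** (level_db / 10.0)


speech = db_spl_to_intensity(60.0)   # normal speech
pain = db_spl_to_intensity(140.0)    # onset of painful damage
print(f"60 dB SPL  -> {speech:.1e} W/cm^2")
print(f"140 dB SPL -> {pain:.1e} W/cm^2")
print(f"doubling the power adds {10.0 * math.log10(2.0):.2f} dB")
```

Normal speech is a million times the reference power, and each doubling of power adds about 3 dB, which is why successive doublings appear as equal steps on this scale.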
The difference between the loudest and faintest sounds that humans can hear is about
120 dB, a range of one million in amplitude. Listeners can detect a change in loudness
when the signal is altered by about 1 dB (a 12% change in amplitude). In other words,
there are only about 120 levels of loudness that can be perceived from the faintest whisper
to the loudest thunder. The sensitivity of the ear is amazing; when listening to very weak
sounds, the ear drum vibrates less than the diameter of a single molecule!
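The amplitude figures quoted here follow from the 20·log10 convention used for amplitude ratios. A short sketch, with an illustrative function name:

```python
def db_to_amplitude_ratio(level_db):
    """Amplitude ratio corresponding to a level difference in decibels."""
    return 10.0 ** (level_db / 20.0)


full_range = db_to_amplitude_ratio(120.0)  # faintest to loudest audible sound
one_db_step = db_to_amplitude_ratio(1.0)   # smallest detectable change
print(f"120 dB -> amplitude ratio of {full_range:.0e}")
print(f"1 dB   -> {100.0 * (one_db_step - 1.0):.1f}% change in amplitude")
```

A 1 dB step is an amplitude change of about 12%, and 120 such steps span the full factor of one million.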
The perception of loudness relates roughly to the sound power with an exponent of 1/3.
For example, if you increase the sound power by a factor of ten, listeners will report that the
loudness has increased by a factor of about two. This scale can work to your disadvantage
in the opposite direction when eliminating undesired environmental sounds. An example
would be to cover 99% of your wall with a perfect soundproof material in an effort to block
unwanted noise. Even though the sound power has been reduced by 99% to only 1% of its
former value, the perceived loudness has only dropped to about 20% of its former level.
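The cube-root relationship is easy to evaluate for the two cases discussed above. This is a sketch of the approximate power law only; the exponent is the rough value quoted in the text and the function name is illustrative.

```python
LOUDNESS_EXPONENT = 1.0 / 3.0  # approximate exponent quoted in the text


def relative_loudness(power_ratio, exponent=LOUDNESS_EXPONENT):
    """Perceived loudness ratio for a given ratio of sound power."""
    return power_ratio ** exponent


print(f"10x power   -> {relative_loudness(10.0):.2f}x loudness")
print(f"1% of power -> {100.0 * relative_loudness(0.01):.0f}% of the original loudness")
```

Ten times the power gives a little over twice the loudness, while removing 99% of the power leaves roughly a fifth of the perceived loudness.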
The range of human hearing is considered to be 20 Hz to 20 kHz, with a peak in sensitivity
to sounds between 1 kHz and 4 kHz. Sensitivity falls off markedly somewhere below 100 Hz
and, depending on our age, somewhere above 5 to 10 kHz. For example, listeners can
detect sounds as low as 0 dB SPL at 3 kHz, but require 40 dB SPL at 100 hertz (an amplitude
increase of 100). Listeners can tell that two tones are different if their frequencies differ
by more than about 0.3% at 3 kHz. This increases to 3% at 100 hertz. For comparison,
adjacent keys on a piano differ by about 6% in frequency.
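These percentages can be turned into absolute frequency differences. The sketch below assumes equal-tempered tuning for the piano comparison, which the text does not specify; the function name is illustrative.

```python
SEMITONE_RATIO = 2.0 ** (1.0 / 12.0)  # assumes equal temperament (not stated in the text)


def just_noticeable_difference_hz(frequency_hz, fraction):
    """Smallest detectable frequency change, given the fractional threshold."""
    return frequency_hz * fraction


semitone_percent = 100.0 * (SEMITONE_RATIO - 1.0)
print(f"adjacent piano keys differ by {semitone_percent:.2f}%")
print(f"threshold at 3 kHz:  {just_noticeable_difference_hz(3000.0, 0.003):.0f} Hz")
print(f"threshold at 100 Hz: {just_noticeable_difference_hz(100.0, 0.03):.0f} Hz")
```

The thresholds come to about 9 Hz at 3 kHz and 3 Hz at 100 hertz, both much finer than the roughly 6% spacing between adjacent piano keys.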
1.2 The Speech Signal
The organs primarily involved in producing speech are the larynx, which is visible as the
"Adam’s apple" in men and contains a pair of muscular folds called the vocal cords, and the
vocal tract, which is a tube leading from the larynx along the pharynx and then branching
into the oral cavity leading to the lips and through the nasal cavity to the nostrils.
Acoustic energy in speech can be generated in two different ways. The primary mechanism,
known as voiced excitation, occurs in the larynx. The muscles in the larynx place
the vocal cords close together and make them loose enough that when air from the lungs is
driven through them they open and close quasiperiodically at an average rate of about 110
times a second for a man and about twice that for a woman. The main instant of voiced
excitation occurs not, as one might expect, on opening, but when the air flow from the lungs
is suddenly stopped as the cords are pulled together by Bernoulli forces resulting from air
flow through the opening. The forces are caused by a pressure drop and change of flow
direction. Bernoulli forces tend to close and constrict any form of valve opening and limit
the controllable power through the valve [32]. The resulting voiced speech sounds include
all vowels (unless whispered) and many consonant sounds: the words Roman, yellow, and
wiring, for example, are composed entirely of voiced sounds.
In the second mechanism for generating acoustic energy in speech, air passes from the
lungs through the larynx with the vocal cords held apart and is forced through a constriction
formed by the tongue or lips, causing turbulence and resulting in an aperiodic, noise-like
excitation. Sounds generated purely in this way (such as "s" and "ft" in soft) are said to be
voiceless, and they generally play a less important role in speech than voiced sounds.
Vowel sounds are voiced. They are produced without any obstruction in the oral cavity.
If the branch to the nasal tract is open, the vowel is said to be nasalized (such as the vowels
in the French words bon, sans, faim, etc.). Vowels can be further divided into so-called pure
vowels, which can be produced in isolation with a stationary vocal tract, and diphthongs
(such as in the words say, sow and sigh), where a movement of the articulators (the tongue,
lips or jaw) is necessary.
In contrast to vowels, consonants always involve a narrowing in the oral tract and in the
extreme case, the narrowing may result in total obstruction. Sounds involving such total
obstruction come under the general heading of stops. Nasal consonants are produced when
the nasal branch is open. Examples of this can be found in the final sounds of sim, sin and
sing. Voiceless plosives are generated when the nasal branch is closed and pressure builds
prior to the oral closure being released. Examples include the sounds at the beginnings of
the words pin, tin and kin.
Airflow through a constriction causes turbulence. When this process is steady, the
resulting sound is known as a fricative, either voiceless (as in the initial sounds of fat, sip and
thick) or voiced (as in the corresponding sounds in vat, zip and the). The noise-like
component of voiced fricatives is generally much weaker than that in their voiceless homologues.
Indeed, the whole sound is less intense, and this intensity difference forms one of the cues
used in discriminating between voiced and voiceless fricatives.
When the vocal tract is narrowed but not enough to cause turbulence a class of conso-
nant sounds such as the initial sounds in way, ray and lay is produced. They are lumped
together under the general heading of sonorants.
Whether the excitation in a speech sound is voiced or voiceless, the acoustic signal
generated by the excitation is modified by the resonant structure of the vocal tract, which
behaves as an acoustic tube along which planar propagation of sound waves occurs.
Differences in the cross-sectional area along the length of the tube cause reflections, and it
is these reflections that give rise to the resonances or formants. The resonant structure
therefore depends on the position of the tongue, lips and jaw.
The generation of the excitation and its spectral modification by the vocal tract turn
out to be largely independent of each other. To a good approximation, they can therefore
be considered as a source isolated from, and leading into, a linear filter. If we look at the
power spectrum of a section of speech waveform, the regularly spaced spikes occur at each
integer multiple of the fundamental frequency of the excitation, and are harmonics of the
fundamental. The intensity of the harmonics is determined by the product of two factors.
The first is the spectrum resulting from the details of the airflow through the larynx from
one closure of the vocal cords to the next; and the second is the spectrum corresponding to
the impulse response of the vocal tract.
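The product of the two factors can be illustrated numerically. The following is only a minimal sketch under stated assumptions, not a model used in this work: the excitation spectrum is assumed to fall off as 1/k with harmonic number k, the vocal tract is assumed to be a single second-order resonance, and the function names, the 110 Hz fundamental, and the 550 Hz formant are all hypothetical choices made for illustration.

```python
import numpy as np

def source_amplitudes(num_harmonics):
    """Assumed excitation spectrum: harmonic k has amplitude 1/k."""
    k = np.arange(1, num_harmonics + 1)
    return 1.0 / k

def tract_gain(freqs_hz, formant_hz=550.0, bandwidth_hz=80.0):
    """Assumed vocal tract: magnitude response of one second-order resonance."""
    s = 1j * 2.0 * np.pi * np.asarray(freqs_hz)
    w0 = 2.0 * np.pi * formant_hz
    bw = 2.0 * np.pi * bandwidth_hz
    return np.abs(w0**2 / (s**2 + bw * s + w0**2))

def harmonic_intensities(f0_hz=110.0, num_harmonics=20):
    """Harmonic amplitudes as the product of the source and tract factors."""
    freqs = f0_hz * np.arange(1, num_harmonics + 1)
    return freqs, source_amplitudes(num_harmonics) * tract_gain(freqs)

freqs, amps = harmonic_intensities()
# The harmonic closest to the assumed formant is emphasized the most.
print(freqs[np.argmax(amps)])
```

Changing the formant frequency moves the emphasized harmonic without changing the harmonic spacing, which reflects the independence of excitation and vocal tract described above.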
The frequency sensitivity of the ear can be measured in various ways – by having listen-
ers determine subjectively equal frequency intervals at different locations in the spectrum;
by testing their ability to detect small changes in frequency; by measuring the frequency
range over which spectral components interact; or even by direct physiological
measurements on the inner ear. All these methods lead to strikingly similar perceptual frequency
scales, with sensitivity being roughly constant over the first few hundred Hertz and then
decreasing with increasing frequency. The perceptual frequency scale is often approximated
by a scale, the technical mel scale, that is linear to 1 kHz and logarithmic from then on
[35].
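A widely used closed-form approximation of the mel scale can serve as an illustration. This sketch assumes the common formula mel = 2595 log10(1 + f/700), which is not necessarily the definition given in [35]; it is roughly linear at low frequencies, roughly logarithmic at high frequencies, and maps 1 kHz to approximately 1000 mel. The function names are hypothetical.

```python
import math

def hz_to_mel(f_hz):
    """Common mel approximation (assumed here, not taken from [35])."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

# Equal 1 kHz steps cover fewer mels as frequency increases.
low_step = hz_to_mel(1000.0) - hz_to_mel(0.0)
high_step = hz_to_mel(8000.0) - hz_to_mel(7000.0)
print(low_step, high_step)
```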
Just as one might expect from signal processing considerations, the degradation in
frequency resolution at higher frequencies is associated with an improvement in temporal
resolution. This trade-off is well matched to the acoustic properties of speech. The higher
formants have large bandwidths and do not therefore require high frequency resolution. In
voiceless sounds, energy tends to be concentrated at high frequencies, representing large
transients temporally. Spectral fine structure is absent, but such sounds, particularly voice-
less plosives, contain events that are sharply defined temporally. Voiced sounds therefore
require good frequency resolution at low frequencies (below 2 kHz) and voiceless sounds
require good temporal resolution above 2 kHz.
Unless two frequency components are within a certain critical distance of each other
on a perceptual frequency scale, their phase relationship has no perceptual effect. Con-
sequently, a sound can be substantially characterized by its power spectrum, ignoring its
phase spectrum. This characteristic will be exploited throughout this work.
Strong frequency components can suppress the ear’s response to weaker components.
In temporal masking, the strong component masks a weaker component at the same or a
nearby frequency. The stronger component can occur just before or just after the weaker
component, though the effect operates over much greater temporal separations in the former
case (so-called forward masking) than in the latter. In simultaneous masking, or frequency
masking, a strong component masks the presence of a weaker component presented at the
same time at a different frequency. The effect decreases as the frequency separation
between the components increases, but the decrease is slower when the weaker component
lies above rather than below the stronger component. Frequency masking therefore oper-
ates primarily upwards in frequency.
The perception of a continuous sound, such as a note from a musical instrument, is often
divided into three parts: loudness, pitch, and timbre (pronounced "timber"). Loudness is a
measure of sound wave intensity, as previously described. Pitch is the frequency of the
fundamental component in the sound, that is, the frequency with which the waveform repeats
itself. While there are subtle effects in both these perceptions, they are a straightforward
match with easily characterized physical quantities.
Timbre is more complicated, being determined by the harmonic content of the signal.
Because of this, two waveforms can have very different time domain waveforms, yet sound
identical. This is because hearing is based on the amplitude of the frequencies, and is very
insensitive to their phase. The shape of the time domain waveform is only indirectly related
to hearing, and usually not considered in audio systems.
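The point that two different waveforms can share the same harmonic amplitudes can be checked numerically. The sketch below is illustrative only; the three harmonic amplitudes, the 100 Hz fundamental, the 8 kHz sampling rate, and the function name are assumptions chosen so that each harmonic falls exactly on an FFT bin.

```python
import numpy as np

def harmonic_signal(phases, f0_hz=100.0, fs_hz=8000.0, n=800):
    """Sum of three harmonics with fixed (assumed) amplitudes and given phases."""
    t = np.arange(n) / fs_hz
    amps = [1.0, 0.5, 0.25]
    return sum(a * np.cos(2.0 * np.pi * f0_hz * (k + 1) * t + p)
               for k, (a, p) in enumerate(zip(amps, phases)))

x = harmonic_signal([0.0, 0.0, 0.0])
y = harmonic_signal([0.0, 1.5, 3.0])      # same amplitudes, different phases

mag_x = np.abs(np.fft.rfft(x))
mag_y = np.abs(np.fft.rfft(y))

print(np.max(np.abs(x - y)))              # the time-domain waveforms differ
print(np.max(np.abs(mag_x - mag_y)))      # the magnitude spectra agree
```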
One benefit to the ear being phase insensitive can be understood by examining how
sound propagates through the environment. Suppose you are listening to a person speaking
across a small room. Much of the sound reaching your ears is reflected from the walls,
ceiling and floor. Since sound propagation depends on frequency (such as: attenuation,
reflection, and resonance), different frequencies will reach your ear through different paths.
This means that the relative phase of each frequency will change as you move about the
room. Since the ear disregards these phase variations, you perceive the voice as unchanging
as you move position. From a physics standpoint, the phase of an audio signal becomes
randomized as it propagates through a complex environment. Put another way, the ear is
insensitive to phase because it contains little useful information.
However, it cannot be said that the ear is completely deaf to the phase. This is because
a phase change can rearrange the time sequence of an audio signal. An example is the
chirp system that changes an impulse into a much longer duration signal. Although they
differ only in their phase, the ear can distinguish between the two sounds because of their
difference in duration. For the most part, this is just a curiosity, not something that happens
in the normal listening environment.
It is often said that timbre is determined by the shape of the waveform. This is true,
but slightly misleading. The perception of timbre results from the ear detecting harmonics.
While harmonic content is determined by the shape of the waveform, the insensitivity of the
ear to phase makes the relationship very one-sided. That is, a particular waveform will have
only one timbre, while a particular timbre has an infinite number of possible waveforms.
The ear is very accustomed to hearing a fundamental plus harmonics. If a listener is
presented with the combination of a 1 kHz and 3 kHz sine wave, they will report that it
sounds natural and pleasant. If sine waves of 1 kHz and 3.1 kHz are used, it will sound
objectionable.
This is the basis of the standard musical scale, as illustrated by the piano keyboard.
Striking the farthest left key on the piano produces a fundamental frequency of 27.5 hertz,
plus harmonics at 55, 110, 220, 440, 880 hertz, etc. (there are also harmonics between these
frequencies, but they aren’t important for this discussion). These harmonics correspond to
the fundamental frequency produced by other keys on the keyboard. Specifically, every
seventh white key is a harmonic of the far left key. That is, the eighth key from the left
has a fundamental frequency of 55 hertz, the 15th key has a fundamental frequency of 110
hertz, etc. Being harmonics of each other, these keys sound similar when played, and are
harmonious when played in unison. For this reason, they are all called the note, A. In
this same manner, the white key immediately to the right of each A is called B, and they are all
harmonics of each other. This pattern repeats for the seven notes: A, B, C, D, E, F, and G.
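The doubling pattern of the A keys can be written out as a short calculation. This is a small illustrative sketch; the function names and the count of eight A keys are assumptions, and the key positions count white keys only, starting from 1 at the far left.

```python
def a_key_frequencies(lowest_hz=27.5, count=8):
    """Fundamental frequency of each successive A key: one doubling per octave."""
    return [lowest_hz * 2**n for n in range(count)]

def a_key_white_positions(count=8):
    """White-key position of each A, counting the far-left key as 1."""
    return [1 + 7 * n for n in range(count)]

for position, freq in zip(a_key_white_positions(), a_key_frequencies()):
    print(position, freq)
```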
Speech is a non-stationary signal, while pitch determination looks at the "simple" task of
extracting fundamental frequency and period from such a signal. This is an inherently difficult
task because of: 1. the non-stationarity of speech; 2. the variation in articulatory gestures
leading to a wide range of possible temporal structures; 3. the variance of fundamental
frequency, almost four octaves; 4. irregularity in the excitation signal.
Pitch and voicing analysis differs from analyses that look at vocal tract
parameters. Within the scope of this research, parameters associated with pitch were used as the
processing basis. Pitch, i.e., fundamental frequency (or rate of vocal fold vibration) F0,
as well as fundamental period T0, has a key position in the acoustic speech signal. Pitch
determination deals with voice source analysis. The parameters that are being determined
are the manner of excitation (the presence of a voiced or voiceless excitation) and the rate
of vocal cord vibration (referring to the pitch or fundamental frequency). The ear is an
order of magnitude more sensitive to changes of fundamental frequency than to changes
of other speech signal parameters [33]. Pitch or fundamental frequency is the predominant
factor in the prosodic information of the speech signal. Considering the ear sensitivity and
prosodic information, pitch determination is a key to good and reliable speech measurement
methods.
Pitch detection algorithms can be categorized into Short-Term and Time-Domain
analysis algorithms. With completely stationary and periodic signals, these algorithms yield
similar results. However, variables such as stationarity/time variance, windowing length,
averaging, and operating domain will influence the results of individual algorithms. Within
the short-term algorithms there are many methods of pitch detection including: Correlation
techniques such as autocorrelation and average magnitude distance function, Maximum
likelihood techniques, and Frequency domain analysis techniques such as Harmonic anal-
ysis and Cepstrum analysis. Cepstrum analysis is of particular interest because the overall
architecture lends itself well to a parallel analog implementation.
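As a point of reference for the cepstrum-based approach, a discrete-time (digital) version of cepstral pitch estimation can be sketched as follows. This is a hedged software illustration, not the continuous-time analog architecture developed in this work: the function name, the Hamming window, the 80 to 400 Hz search range, and the synthetic test signal are all assumptions.

```python
import numpy as np

def cepstrum_pitch(x, fs_hz, fmin_hz=80.0, fmax_hz=400.0):
    """Estimate F0 from the largest real-cepstrum peak in a quefrency range."""
    windowed = x * np.hamming(len(x))
    # Log magnitude spectrum; the small constant avoids log(0).
    log_mag = np.log(np.abs(np.fft.rfft(windowed)) + 1e-12)
    cepstrum = np.fft.irfft(log_mag)
    q_lo = int(fs_hz / fmax_hz)   # shortest period searched, in samples
    q_hi = int(fs_hz / fmin_hz)   # longest period searched, in samples
    peak = q_lo + int(np.argmax(cepstrum[q_lo:q_hi + 1]))
    return fs_hz / peak

fs = 16000.0
t = np.arange(4096) / fs
# Synthetic "voiced" signal (assumed): harmonics of 125 Hz with 1/k amplitudes.
x = sum(np.cos(2.0 * np.pi * 125.0 * k * t) / k for k in range(1, 64))
f0 = cepstrum_pitch(x, fs)
print(f0)
```

The regularly spaced harmonics appear as a periodic ripple in the log spectrum, and the second transform turns that ripple into a single peak at the fundamental period. Each stage (frequency decomposition, magnitude, logarithm, second transform) operates on all frequency bands at once, which is the property that makes the structure attractive for a parallel implementation.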
1.3 Existing Approaches in Analog/Digital Speech Recognition
Speech recognition systems have been implemented using a wide range of signal process-
ing techniques including Neuromorphic/biologically inspired and Digital Signal Processing
(DSP) techniques. Neuromorphic/biologically inspired techniques such as silicon cochleas
[44, 45, 47, 85] and Neural Network implementations [72, 66, 62, 46, 82] are based on
simple and highly parallel computation and computational units, whereas DSP techniques are
based on block transforms and statistical or error minimization methods that occur serially
[65, 9, 74, 16, 86, 87]. In addition, for DSP the input signal must first be converted from
analog to digital before the computation.
Over the past years DSP has grown into a very popular topic. Applications using DSPs
have also grown, fuelled by the increase in computing power and decrease in cost, which
has been a very positive step, particularly in the area of speech recognition. Analog signal
processing chips that are able to perform the initial stages of speech recognition before
passing the data to a digital processor would offer several advantages over conventional
systems, including lower power dissipation and higher speed (closer to real-time implementation).
With all that being said, algorithms aside, the different techniques still leave the huge
question of implementation. Historically, many computational systems began as analog
implementations, and as digital systems became cheaper and faster and demonstrated their
benefits from ease of programmability (reducing design time), much of current signal
processing has migrated to digital signal processing implementations. Going one step beyond
this, digital systems have powered a significant portion of the advancement in signal
processing applications. The current paradigm is to perform as much processing as reasonably
possible, within the design constraints, in the digital domain for the benefits listed above.
Neuromorphic implementations range from all-analog [54] to all-digital FPGA implementations
[84]. So even in the implementation, the range is very broad.
Speech recognition is typically handled in two steps, Analog-to-Digital conversion fol-
lowed by large amounts of computation performed on one or more digital signal processors.
This approach minimizes the time the input signal spends in the analog domain and is pri-
marily due to the improved processing efficiency of digital signal processing as compared
to most analog signal processing circuits. In this case, processing efficiency is measured in
Millions of Multiply-Accumulates per milliwatt (MMAC/mW). An increase in processing
efficiency equates to more MMACs/mW, or the same MMACs at less power. In addition
to this, specialized Digital Signal Processors have made signal processing problems work-
able tasks and have proven to be a viable solution.
Specialized DSP architectures have been designed to increase computational efficiency
while performing many of today’s signal processing algorithms. DSPs, however, are very
power hungry, and because of this, have not lent themselves to extensive processing tasks
in portable devices. Furthermore, digital signal processors can require large amounts of die
area for memory, which, at times, could be even larger than the processor itself. Typical
applications are very scaled-down custom versions in order to meet specific power budgets.
Power is directly related to the amount of computation being performed [81, 38], or for a
DSP, the MIPS (Millions of Instructions Per Second) being executed.
Advances have come as geometries have decreased, following Moore’s Law [14].
Figure 3 shows digital processing and the steady decrease in power per computation vs. time.
Power and die area reductions can also be realized through custom ASICs designed to
perform a specific processing task. One step further, mixed-signal and analog implementations
show further reductions in power and area.
On the other hand, using biology as an example, low-power and robust analog signal
processing is proven to be an efficient solution every day in all living organisms. Biological
approaches are typically massively parallel, highly noise tolerant and very low-power
systems [57]. ASICs that are based on biological approaches are able to outperform general
purpose processing systems. Taking this one step further, for a given task where an
analog implementation is possible, an analog implementation provides significant benefits in
power consumption and processing speed, on the order of 2 to 3 orders of magnitude power
reduction [70].
Neuromorphic approaches to speech recognition have focused on cochlea modelling
and Neural Network approaches. This makes sense because of the biological attachment
to cochlea forms of processing and the inherent parallel processing of neural networks.
Neuromorphic/biologically inspired techniques tend to fall into two categories: a digital
implementation of a biologically inspired circuit, or an analog implementation of the same.
Rarely will you see a mix of the two with instances of processing very similar to digital
signal processing. However, digital signal processing has proven to be the best method,
aside from the human brain, to perform speech recognition tasks. What is needed is a
system that combines the parallelism of neuromorphic circuits with the processing power
of digital processing techniques, such as block transforms, etc.
Speech recognition involves a feature extraction step, followed by various recognition
steps which can be implemented using vector matching to a stored set of feature vectors,
Neural Network/Neuromorphic engineering, or statistical/probabilistic approaches such as
Hidden-Markov Model decoding. Our system implements a continuous-time cepstrum as
the feature extraction block. The cepstrum computation produces very useful feature sets
because it has an exponential frequency spacing for the filters. The cochlea also has
exponentially spaced frequencies, which ties the cepstrum closely to biological processing for a
well-suited balance.
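Exponential spacing means that adjacent filter center frequencies differ by a constant ratio rather than a constant offset. The sketch below shows one way to compute such a set of center frequencies; the function name, the band edges, and the number of filters are hypothetical values chosen for illustration and are not the values used in this system.

```python
def exponential_center_frequencies(f_low_hz, f_high_hz, num_filters):
    """Center frequencies with a constant ratio between neighbours."""
    ratio = (f_high_hz / f_low_hz) ** (1.0 / (num_filters - 1))
    return [f_low_hz * ratio**i for i in range(num_filters)]

# Assumed example: 7 filters from 100 Hz to 6.4 kHz gives one filter per octave.
centers = exponential_center_frequencies(100.0, 6400.0, 7)
print(centers)
```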
Early models for speech recognition used straightforward feature vector matching.
These were relatively simple to implement, with algorithm differences primarily being
changes to the error distance metric. Later approaches involved more Neural-Network
architectures, and Neuromorphic approaches took advantage of parallel processing and
the underlying circuitry to improve recognition rates. Current approaches use
statistical/probabilistic approaches such as building Hidden-Markov Models of speech signals.
Hidden-Markov Models have shown the most success in speech recognition systems, with
recognition rates above 90%. Each of these systems takes feature vectors directly as inputs.
We have combined the VQ and HMM approaches: our architecture operates in the
continuous-time domain, lending itself well to Neuromorphic and parallel architectures,
while the feature vectors are discrete, pre-determined vectors, which simplifies the
signal flow to later processing blocks. These pre-determined vectors are used to train and
build our Pattern Recognition Block, which is based on Hidden-Markov Model decoding.
1.4 Neuromorphic Engineering
This research borrows heavily from IC design methodologies within the Neuromorphic
community. Neuromorphic Engineering looks at doing quality engineering from a
biological viewpoint. The responsibility of the circuits community is to do good engineering in
parallel with developing systems that borrow from biological systems. Many of the designs
have biologically inspired architectures and are used to perform the tasks
of a part of the body such as the cochlea [84, 69, 37, 59, 54], retina [56, 6, 3, 4], and at
a lower level, systems of multi-neuron oscillators [60] down to single neurons [55, 63, 18].
These families of Neuromorphic chips will serve as a model for our Analog Signal
Processing (ASP) circuits because they are well understood, biologically inspired and integrate
well with our standard computational memory technology.
The cochlea and human auditory system is a relatively simple and well studied system
when compared to other systems such as vision and cortex. However, the human auditory
system still represents a significant level of complexity when compared to modern
computer systems. Early research in cochlea design focused on designing systems by modelling
the cochlea [53] and higher levels of the auditory nervous system [85]. The cochlea itself
is a complex three-dimensional fluid-dynamic system, and implementations of its function
have involved low-pass/band-pass filter implementations [17], while other implementations
have looked at the core mechanics involved in its functionality [84].
Figure 6. Frequency vs. position response of the cochlea. The cochlea responds most strongly to
different frequencies at different positions. This system is similar to a filter bank decomposition.
1.5 An Analog Front End for Speech Recognition
The tendency in the signal processing realm for dealing with incoming audio signals has
gone towards immediately passing the audio signal, which is in the form of an analog
signal, to an analog-to-digital converter (ADC) so that the signal can be manipulated digitally.
Typically, the FFT of the signal is performed digitally so that the individual subbands can
be manipulated. Digital signal processing has many advantages, and the greatest is the ease
of programming a digital system to meet the given requirements.
However, there is another option which is to introduce an analog system that does more
than simply convert a signal into a digital version as soon as possible. By placing an analog
signal processing block immediately before an ADC, as is shown in Figure 1, much of the
processing can be done with the low-power and real-time computation of analog circuitry.
This, therefore, alleviates a large portion of the burden of the digital circuitry. The overall
system can either have a smaller digital processing block than was previously required, or
it can have the same size digital block that will allow for more functionality since the basic
processing has already been conducted in analog.
The final goal is to develop an analog architecture for speech recognition that can












Figure 7. Analog speech recognition system block diagram.
DSP to perform higher-level more complex computations. The various processing blocks
are basic blocks modified from ideal DSP blocks that are used throughout speech
recognition systems. These include an analog Cepstrum-like processor that consists of an analog
frequency decomposition similar to a Fourier transform [77], a Vector-Quantization stage
[29] that reduces the overall data set for later matching, and a continuous-time HMM
block built from programmable diffusers [31]. The processing cores take advantage of the
inherent computational abilities of analog circuitry and exhibit low power dissipation. Fur-
thermore, by using floating-gate technology, each component block can be programmed to
both eliminate any offsets and also reconfigure the overall system.
The architecture in Figure 7 serves as our starting point, as it was the initial block-level
description of an analog speech recognition architecture [75]. Yet building a full
system requires a thorough understanding of specific areas including: large-scale
programming, filter banks, peak detectors, linear transconductance elements, vector matrix
multipliers, vector quantizers, and programmable diffuser elements. Large-scale programming
will be covered in the next chapter, followed by a discussion of filter banks and the design
considerations associated with their use. The next chapter will cover the components
necessary for analog signal processing operations, followed by a chapter discussing the analog
signal processing systems that can be developed using these components including
cepstrum processing, vector quantization, and programmable diffuser elements, all of which
fall under the scope of creating a useful analog architecture for auditory feature extraction
and recognition.
We commonly use several basic circuit elements for our auditory signal processing
structures; Figure 71 on page 99 shows one example system using these circuits. We will
look at these circuits, as well as others, in the following chapters. Floating-gate circuit
techniques enable using these circuits for a wide range of signal processing functions [28].
1.5.1 Frequency Decomposition
We have been using coupled bandpass IC filter models for cochlear modeling, which are
designed to be used for front-end signal processing. The spectrum decomposition is done
using differential C4 second-order-section bandpass filters [17]. For simplicity only one
half of the differential structure is shown in Fig. 71a on page 99. The spacing of the
bandpass filters is arbitrary because each can be programmed to have a desired high-frequency
corner and low-frequency corner [27]. Programming the C4s is handled as if each filter
were two floating-gate elements [41].
As a bandpass filter array, the C4 filter banks are not cascaded as are many cochlea
models [57], therefore eliminating the typical distortion or noise accumulation. In speech,
particularly in noisy environments, the signal power is more evenly distributed across a
broad frequency range than a simple tone, therefore allowing for large input
amplitudes with minimal output distortion (higher system signal-to-noise ratio). As a result, we
typically have signal amplitudes through each filter that are 10 mV to 30 mV or less for input
amplitudes between 0.25 V and 1 V, resulting in harmonic distortion through the system less
than -30 dB at each tap; differential circuits will further reduce these effects.
1.5.2 Amplitude Detection
The output of each filter bank passes through a peak detector stage to produce a constant
magnitude output, or an envelope of the incoming signal. This magnitude is similar to tak-
ing the power spectrum density or real spectrum of an input signal. The circuit is shown in
Figure 71b on page 99. We program the peak detectors to the desired frequency response of
each frequency band. The floating-gate transistor on the output provides an offset current
to set the DC output voltage. Each peak detector has an individually programmable corner
frequency. Because the output magnitude is continuous, this allows us to capture additional
high frequency content within each band. The peak detector programming blocks are
isolated similarly to the C4s. The entire bank is treated as a single row and within that row the
individual elements are accessed by column. Control circuitry on the rows and columns
ensures isolation.
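The behavior of a peak detector can be approximated in discrete time by a value that jumps up to each new peak and otherwise decays slowly. The following is only a behavioral sketch and not a model of the circuit in Figure 71b: the function name, the use of the absolute value, and the per-sample decay factor (which loosely plays the role of the programmable corner frequency) are assumptions.

```python
def peak_detect(samples, decay=0.99):
    """Track an envelope: follow new peaks immediately, decay between them."""
    envelope = []
    level = 0.0
    for s in samples:
        # Assumed behavior: rectify the input and hold the larger of the
        # new sample and the decayed previous level.
        level = max(abs(s), level * decay)
        envelope.append(level)
    return envelope

print(peak_detect([0.0, 1.0, 0.0, 0.0], decay=0.5))
```

A decay factor closer to 1 gives a smoother envelope that responds more slowly, while a smaller factor lets the output follow faster changes within the band.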
1.5.3 V-to-I Linear Transconductor
The output from each amplitude detector block is a voltage signal. In order to generate the
log of this voltage, which represents the magnitude, we need to transform this signal from
the voltage to current domain. The major hurdle is doing this operation linearly over a
wide-enough range for the signal to be useful. This section gives an initial circuit description for
such a block.
1.5.4 Weighted Multiplication
Figure 71 on page 99 shows our analog differential multiplier that multiplies the incoming
differential voltage signal with a stored differential weight. We program the positive and
negative weights by setting programmable floating-gate voltages. These values can be
programmed to any arbitrary value. Their differential operation requires each pair to have
a DC bias voltage.
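The arithmetic performed by a differential weighted multiplier can be summarized in a few lines. This is an idealized sketch, assuming perfectly linear multiplication and ignoring the DC bias; the function name and the arrangement of the weights as a matrix (one row per output) are illustrative assumptions. It shows how a pair of positive stored values can represent a signed weight.

```python
import numpy as np

def differential_multiply(v_plus, v_minus, w_plus, w_minus):
    """Ideal output: (w_plus - w_minus) applied to (v_plus - v_minus)."""
    v = np.asarray(v_plus) - np.asarray(v_minus)    # differential input signal
    w = np.asarray(w_plus) - np.asarray(w_minus)    # differential stored weight
    return w @ v

# Two inputs, one output; the second weight is negative (1.0 - 3.0 = -2.0).
out = differential_multiply([1.0, 0.5], [0.0, 0.5], [[2.0, 1.0]], [[1.0, 3.0]])
print(out)
```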
1.5.5 Distance Measure
In order to reduce the data set output from the continuous-time cepstrum operation, we will
quantize each vector to a known vector stored in memory. As a parallel operation, we are able
to perform the quantization stage using an array of single distance measure blocks
implemented using "bump" circuits. This provides a parallel and programmable
method of performing large sets of vector distance measures instantaneously.
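Vector quantization with a bump-like similarity measure can be sketched in software as follows. This is a minimal illustration under stated assumptions: the bump response is modeled as a sech-squared function of the difference between an input and a stored value, the per-element responses are simply summed, and the function names, the width parameter, and the example codebook are hypothetical.

```python
import numpy as np

def bump(delta, width=1.0):
    """Assumed bump-like similarity: maximal when the two inputs match."""
    return 1.0 / np.cosh(np.asarray(delta) / width) ** 2

def quantize(vector, codebook):
    """Index of the stored vector with the largest summed bump response."""
    scores = [np.sum(bump(np.asarray(vector) - np.asarray(c))) for c in codebook]
    return int(np.argmax(scores))

codebook = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]   # assumed stored vectors
print(quantize([0.9, 1.2], codebook))
```

In the sketch every score is independent of the others, which is what allows all of the distance measures to be evaluated in parallel and the best match to be selected in a single step.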
1.5.6 Sequence Detection
The final stage of recognition will involve a sequence detection step in order to match a set
of vectors to a model for a given sound that corresponds to a phoneme, or sub-phoneme.
At this stage the outputs are digital and the overall system lends itself well to cascading
multiple stages together to produce higher levels of recognition.
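Sequence detection over quantized vectors is conventionally done in software with Viterbi decoding of a Hidden-Markov Model. The following discrete-time sketch is given only for orientation and is not the continuous-time diffuser-based implementation discussed in this work; the function name and the two-state model parameters are assumptions, and all probabilities are kept nonzero so that the logarithms are finite.

```python
import numpy as np

def viterbi(obs, start_p, trans_p, emit_p):
    """Most likely state sequence for a list of observed symbol indices."""
    start_p, trans_p, emit_p = map(np.asarray, (start_p, trans_p, emit_p))
    log_delta = np.log(start_p) + np.log(emit_p[:, obs[0]])
    back = []
    for o in obs[1:]:
        # scores[i, j]: best log-probability of being in state i, then moving to j.
        scores = log_delta[:, None] + np.log(trans_p)
        back.append(np.argmax(scores, axis=0))
        log_delta = np.max(scores, axis=0) + np.log(emit_p[:, o])
    path = [int(np.argmax(log_delta))]
    for pointers in reversed(back):
        path.append(int(pointers[path[-1]]))
    return path[::-1]

# Assumed two-state model: state 0 mostly emits symbol 0, state 1 mostly symbol 1.
start = [0.9, 0.1]
trans = [[0.7, 0.3], [0.1, 0.9]]
emit = [[0.9, 0.1], [0.2, 0.8]]
print(viterbi([0, 0, 1, 1], start, trans, emit))
```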
With this as the basis, the following chapters will look at each of the required pieces in
detail and outline the process toward developing an analog IC for auditory feature extrac-
tion and recognition. The chapters will begin with the programmable analog elements as
they are critical to all of the following blocks and require quite a bit of discussion on their
own. Following the programmable elements, I will cover the programmable filter banks,
the first block in the system, and the most critical block for the feature extraction phase.
The next chapter will discuss all of the ancillary blocks required, including those needed to
maintain a useable signal flow from the filter banks through each of the core blocks used
in each of the following recognition blocks. The final chapter will discuss the recognition
blocks of the system in detail, with the final goal of this thesis being the introduction of an




This chapter will discuss the core memory element used throughout the analog computing
arrays, the floating-gate transistor. The floating-gate transistor is a standard p-type
transistor with no DC path to ground leading from the input gate of the device. This allows for
charge, and a resulting effective voltage, to be stored on the gate with negligible degradation
in the stored value. Critical characteristics of the device that make it useful to us include:
its ease of integration into standard analog circuit design; the combination of memory
within the computational element, reducing area; and the accuracy of values stored in the
element.
2.1 Floating-gate Transistor Element
The elements used in the computational memory structures are vanilla pFET transistors
with additional polysilicon area to accommodate a poly-poly capacitor as the input to the
device, as shown in Figure 8. This structure provides a method for storage and computation
within the device.
The use of floating-gate devices as a memory element is not new in the circuits
community [10]. Since their discovery much effort has gone into making them a viable non-volatile
memory element, which we now find in digital cameras and mp3 players. More recently,
the beneficial use of floating-gates in analog circuits has been realized, and it has been
published that not only can they be used as memory devices but they can also function as programmable,
compact, computational elements [39, 19, 23].
Floating-gate technology is used in Electrically Erasable and Programmable Read-Only
Memory (EEPROM), which uses special processes. Floating-gate devices can also be created
using standard CMOS processes which contain oxide thicknesses that are uniform over all




















Figure 8. Cross section of a floating-gate element. The structure is very similar to a classic pFET
transistor. The floating-gate is made from polysilicon (poly1), however, it does not have any physical
contacts to its surface and is electrically insulated by silicon-dioxide, thus able to store charge that is
on the gate indefinitely. The poly1 gate can be modulated by the poly-poly capacitor that is formed
by an electrode (poly2) and an input gate voltage (Vin). Source-to-Drain current is modulated through
the capacitor divider from the gate input into the channel through the series capacitance from gate
to channel. Tunnelling occurs through the MOS capacitor that is formed from the polysilicon gate
to the nwell. The nwell voltage (Vtun) is controlled externally. Injection is controlled by modulating
the channel current, by modulating the gate input voltage; and the drain-to-channel voltage, which is
modulated directly through controlling the drain voltage. Programming is possible through the control
of the injection and tunnelling phenomena.
using Fowler-Nordheim Tunneling [50] (Tunneling) and Channel-Hot Electron Injection
[20] (Injection). Table 1 lists voltages required for the tunneling and injection phenomena
for various processes; as technologies advance, the device geometries decrease, leading
to lower programming voltages. Lifetime and variability of stored values on the
floating-gate, in the listed geometries, is not an issue as shown in [11]. However, smaller oxide
thicknesses and higher electric fields raise interesting design considerations with regard to
retention as geometry sizes continue to decrease.
There are, however, fundamental differences in usage between EEPROM, multi-level
memories and these computational memories in that the programming values do not
represent discrete levels of charge being programmed to the device. This is both a limitation and
benefit, depending on how the device will be used in subsequent circuitry. For digital
circuits the desired solution is only interested in resolving two distinct values for each cell, one
or zero. Newer digital systems store information using multiple levels but fundamentally use
approaches similar to binary valued cells. Conversely, an analog memory theoretically has
unlimited accuracy, limited only by a single electron moving across the barrier, and requires a
continuous range of programmable values. Additionally, instantaneous programming of
each cell is not as critical because in an analog system we are generally more interested in
programming a device once and then reading or, as we use them, computing continuously
afterwards. This represents a fundamental trade-off between precision and time and thus,
the programming scheme for analog cells must be different from that used for digital
memory. Designing for large scale integration of these memory elements into systems requires
that the elements themselves be relatively small, however they can be scaled and sized just
as traditional transistors. Due to their small design sizes, mismatch between elements can
be noticeable, as one would expect from any standard process. With the ability to program each
device, in addition to smart integration techniques such as programmable current mirrors
and proper layout, we are able to show mismatch cancellation equivalent to 14-16 bits of
accuracy [1]. Exploiting this accuracy allows us to program a single device down to the
sub-picoampere range, by including some considerations for the fundamental mechanics
of the transistor.
2.2 Device Programming
As mentioned earlier, a key requirement for using floating-gate elements is the ability to
accurately program each element. Programming the analog memory elements is a complicated
task, as outlined in [80, 42]. By controlling device parameters such as gate voltage and
drain-source voltage, we are able to accurately program the devices using
Min. Channel Length   Oxide Thickness   Tunnelling Voltage   Injection Voltage
2.0 µm                400-450 Å         27 V                 14 V
1.5 µm                300-350 Å         20 V                 10 V
0.5 µm                110-160 Å         15 V                 6 V
0.35 µm               70-85 Å           12 V                 5 V
0.25 µm               50-60 Å           8 V                  4 V
Table 1. Tunnelling and injection voltages for various feature sizes available through MOSIS
(http://www.mosis.org). The required voltages start noticeably high for minimum channel lengths of
2.0 µm, but as the oxide thickness decreases, each of the programming voltages also decreases.
Current programming voltages are now within the range of on-chip charge pumps and high-voltage
control circuitry. Tunnelling refers to Fowler-Nordheim tunnelling [50]. Injection refers to
channel hot-electron (CHE) injection [20].
Hot-Electron Injection and Fowler-Nordheim Tunnelling. Other mechanisms such as UV
photo-injection may also be used, but are not used in our programming schemes due to
process variations. Table 1 lists various programming voltages over different processes. The
required voltages start noticeably high, but as the oxide thickness decreases, each of the
programming voltages also decreases. Current programming voltages are now within the
range of on-chip charge pumps and high-voltage control circuitry.
Using large floating-gate arrays on the order of 1K to 10K elements, it becomes obvious
that programming each element by hand would be a very time-intensive process. Figure
9 shows a block diagram of the programming algorithm. The programming algorithm has
been automated in MATLAB and comprises three lower-level functions. These functions
include:
1. EraseArray()
Tunnels the entire array until all current levels are below their desired values. This
function is required because tunnelling is used for global erase, while injection is used
for fine programming but only works in a single direction.
2. GetInRange()















[Flow chart blocks: START, Get in Range, Fine element programming]
Figure 9. Flow chart of the first iteration of the floating-gate programming algorithm. Future
revisions use a similar relation for calculating the necessary injection voltage but use on-chip
circuitry to program multiple elements in parallel and also to measure currents on chip.
the speed of injection. Injection requires drain current. Tunnelling is a global function,
so some elements may have significantly lower current levels than others. Future
revisions will allow isolation of tunnelling as well.
3. InjectArray()
Controls the injection of an entire array by calculating the optimal drain voltage
to ensure that each injection pulse brings the element closer to the desired current
without overshooting.
The programming algorithm is based upon a representation of the physical channel
hot-electron (CHE) injection equation as shown below, and solving for the injection
drain-to-source voltage required to move the device output characteristics to a desired value.






















Figure 10. Plot of injection rate versus injection pulse width for different drain-to-source voltages. The
injection pulse width was limited by programming hardware and is shown to occur at a pulse width of
100µs. The minimum injection pulse width can be decreased by moving the control circuitry on chip.
To solve for the injection drain-to-source voltage for a desired current, we start from
hot-electron injection in a pFET, which is described by
\[ I_{inj} = I_{inj0} \left( \frac{I_S}{I_{S0}} \right)^{\alpha} e^{-\Delta V_d / V_{inj}} \qquad (1) \]
where $I_{S0}$ is the initial current, $I_S$ is the final current, and $V_{inj}$ and $I_{inj0}$ are
physical device parameters. The injection current is also dependent upon the change in the
floating-gate voltage $V_{fg}$ [26]. Solving for $I_S$ we get
\[ I_S^{-\alpha} = \frac{-\alpha \, I_{inj0}}{C_0 \, I_{S0}^{\alpha}} \, e^{-\Delta V_d / V_{inj}} \, t + I_{S0}^{-\alpha} \qquad (2) \]
where $C_0 = U_T C_T / \kappa$ [22, 24].
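Rearranging this solution into a single equation for $\Delta V_d$, we get
\[ \Delta V_d = -\ln\left[ \frac{C_0}{\alpha \, I_{inj0} \, t} \left( 1 - \left( \frac{I_{S0}}{I_S} \right)^{\alpha} \right) \right] \cdot V_{inj} \qquad (3) \]

[Figure 11 plot annotation: Vinj = 217 mV]
Figure 11. Using the injection rate parameters in Figure 10 we are able to extract the value of $V_{inj}$.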
The required drain voltage is calculated for each element in the array given its present
value of current and the final desired current. These voltage values are then applied to each
element in the array.
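As a minimal sketch of how (2) and (3) fit together, the Python below evaluates the closed form for the drain-voltage change and checks it against the pulse model. The function names, the parameter values (α, I_inj0, C0, pulse width, start and target currents), and the factor-of-two error in the assumed I_inj0 are illustrative assumptions, not values from this work; only Vinj = 217 mV is taken from Figure 11.

```python
import math

def final_current(i_s0, dvd, t, alpha, i_inj0, c0, v_inj):
    """Pulse model of (2): source current after one injection pulse of width t."""
    step = alpha * i_inj0 * t * math.exp(-dvd / v_inj) / (c0 * i_s0 ** alpha)
    return (i_s0 ** (-alpha) - step) ** (-1.0 / alpha)

def injection_drain_voltage(i_s0, i_target, t, alpha, i_inj0, c0, v_inj):
    """Closed form of (3): drain-voltage change that takes i_s0 to i_target in one pulse."""
    arg = c0 * (1.0 - (i_s0 / i_target) ** alpha) / (alpha * i_inj0 * t)
    return -v_inj * math.log(arg)

def program_element(i_start, i_target, n_pulses, t, alpha,
                    i_inj0_est, i_inj0_true, c0, v_inj):
    """Predict each drain voltage with the estimated parameter, then pulse the 'real' device."""
    history = [i_start]
    for _ in range(n_pulses):
        dvd = injection_drain_voltage(history[-1], i_target, t, alpha,
                                      i_inj0_est, c0, v_inj)
        history.append(final_current(history[-1], dvd, t, alpha,
                                     i_inj0_true, c0, v_inj))
    return history

# Hypothetical values chosen only for illustration (V_INJ is the Figure 11 value).
ALPHA, C0, V_INJ, T_PULSE = 0.9, 4.3e-15, 0.217, 100e-6
I_INJ0_EST, I_INJ0_TRUE = 1e-12, 0.5e-12  # the model overestimates injection efficiency

trace = program_element(5e-9, 10e-9, 10, T_PULSE, ALPHA,
                        I_INJ0_EST, I_INJ0_TRUE, C0, V_INJ)
for k, i_s in enumerate(trace):
    print(f"pulse {k:2d}: {i_s * 1e9:.4f} nA")
```

In this sketch an overestimated injection efficiency makes every pulse undershoot, so the current approaches the target from below. An underestimate would overshoot, which injection alone cannot undo, so a conservative parameter estimate seems the safer choice.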
Accurately programming a device is highly dependent upon the accurate representation
of the device parameters. There are methods for extracting the parameters, as will
be explained shortly, and each extraction is dependent upon the accuracy of the measurement
equipment, which is where we see a direct relationship between speed and accuracy.




















Figure 12. Using the injection rate parameters in Figure 10 we are able to extract the value of α.
through curve-fit extraction, whereas $I_{inj0}$ is a constant.
Figure 10 shows the injection process at multiple values of drain-to-source voltage.
The plots are typically done over multiple devices to speed up the process. This data shows
ΔI on the Y-axis and I on the X-axis because we know injection is current dependent. From
the injection rate equation
\[ \Delta I = m \, I^{1+\alpha} \exp\left( -\frac{\Delta V_{ds}}{V_{inj}} \right) \qquad (4) \]
on a log-log scale,
\[ \ln \Delta I = (1+\alpha) \ln I + \ln m - \frac{\Delta V_{ds}}{V_{inj}} \]
we see that the slope of each line equals 1 + α. Figure 12 shows the extracted value of
α. Next, using Figure 10, $V_{inj}$ can be extracted as follows. Choose a current I to be the
base current for operation. From the selected I, extract the vertical column of data from






















Figure 13. Example data using the programming algorithm to program a device to various operating
points. This plot shows convergence in three different trials: twice to 10nA within 10 steps and
once to 7nA within 5 steps.
Figure 10. The resulting plot will resemble the data in Figure 11, and the slope equals IdesiredVinj.
Performing a similar linear fit we are able to extract $V_{inj}$.
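The two fits can be sketched in a few lines of Python. This is an illustration on synthetic data generated from (4), not the extraction code used for Figures 11 and 12: the function names, the constant m, and the data points are made up, and the second fit assumes ln ΔI is plotted against ΔVds at a fixed current, in which case (4) gives a slope of −1/Vinj.

```python
import math

def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def injection_rate(i, dvds, m, alpha, v_inj):
    """Equation (4): change in current per pulse."""
    return m * i ** (1.0 + alpha) * math.exp(-dvds / v_inj)

def extract_alpha(currents, rates):
    """Slope of ln(dI) versus ln(I) at fixed drain-to-source voltage is 1 + alpha."""
    slope, _ = linear_fit([math.log(i) for i in currents],
                          [math.log(r) for r in rates])
    return slope - 1.0

def extract_v_inj(dvds_values, rates):
    """Slope of ln(dI) versus dVds at fixed current is -1 / V_inj (assumed plot axes)."""
    slope, _ = linear_fit(dvds_values, [math.log(r) for r in rates])
    return -1.0 / slope

# Synthetic data with made-up parameters.
M, ALPHA, V_INJ = 2.0e3, 0.9, 0.217
currents = [1e-9, 2e-9, 5e-9, 1e-8, 2e-8]
rates_vs_i = [injection_rate(i, -1.0, M, ALPHA, V_INJ) for i in currents]
dvds_values = [-0.8, -0.9, -1.0, -1.1, -1.2]
rates_vs_v = [injection_rate(5e-9, v, M, ALPHA, V_INJ) for v in dvds_values]

print("alpha =", extract_alpha(currents, rates_vs_i))
print("V_inj =", extract_v_inj(dvds_values, rates_vs_v))
```

With measured data the points would scatter around each line, so the quality of both fits is bounded by the accuracy of the current measurements.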
Injection has been shown to occur with pulses down to 10µs as shown in Figure 10. The
change in current per pulse increases with both pulse width and drain-to-source
voltage. Figure 13 shows convergence in eight to ten iterations, which, at 10µs per pulse,
results in an individual programming time of 80 - 100µs. For an array with 1K floating-gate
elements, the total programming time could be as low as 80 - 100 ms.








where $t_{meas}$ is the time required to perform a current measurement, $t_{pulse}$ is the time
required for a pulse, $t_{ramp}$ is the time required in ramping, E is the number of elements to be
injected, P is the number of pulses, and T is the total time required to inject E elements. Although
























Figure 14. Floating-gate programmability of a single device to various operating points. An operating
point is defined by a Drain Current at a given gate voltage. This figure shows three (3) device operating
points. Each operating point is reached by programming the device.
$t_{meas}$ will decrease as the current increases, an overestimate of the programming time can
be obtained by using the slowest current to be measured. Typical values for $t_{pulse}$ and
$t_{ramp}$ are 10µs and 200µs. Using the FPGA board as the direct interface to our commercial
ammeter, current measurement time is on the order of 100ms. It can be seen clearly that
the dominant factor in our speed is the current measurement. Future implementations will
use on-chip integrators with small capacitance values to decrease the current measurement
time, thus reducing $t_{meas}$. A 100fF capacitance will give us a $t_{meas}$ of 400µs (including
averaging). Assuming P=10 for every element, it will take at most 2s to program a
500-element array.
The programming algorithm is computationally intensive and requires solving (3) before
every injection pulse. Current implementations transform (3) into a look-up table that is
loaded into the FPGA, thus avoiding any run-time computation and greatly increasing the
programming speed. Now the flow of data is only between the PC board and the FPGA.
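One way such a table could be organized is sketched below in Python. It assumes the closed form ΔVd = −Vinj · ln[C0 (1 − (IS0/IS)^α) / (α Iinj0 t)] for (3), so that for a fixed pulse width the result depends only on the ratio of present to target current. The table size, the nearest-entry lookup, and all parameter values except Vinj = 217 mV are hypothetical; the actual FPGA table format is not described here.

```python
import math

def build_dvd_table(ratios, t, alpha, i_inj0, c0, v_inj):
    """Precompute the drain-voltage change for each current ratio i_s0 / i_target in (0, 1)."""
    return [-v_inj * math.log(c0 * (1.0 - r ** alpha) / (alpha * i_inj0 * t))
            for r in ratios]

def lookup_dvd(table, ratios, r):
    """Nearest-entry lookup: no logarithm or power is evaluated at programming time."""
    idx = min(range(len(ratios)), key=lambda k: abs(ratios[k] - r))
    return table[idx]

# Hypothetical 63-entry table over ratios 1/64 ... 63/64.
RATIOS = [k / 64 for k in range(1, 64)]
TABLE = build_dvd_table(RATIOS, t=100e-6, alpha=0.9, i_inj0=1e-12,
                        c0=4.3e-15, v_inj=0.217)

# Present current 5 nA, target 10 nA: ratio 0.5.
print(lookup_dvd(TABLE, RATIOS, 5e-9 / 10e-9))
```

A uniform grid is the simplest choice; a denser grid near a ratio of one would give finer steps close to the target, at the cost of a larger table.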
Using this predictive algorithm, one is able to reliably and repeatably shift the operating
point of a single memory element along a continuous range of values for gate voltage vs. drain
current, as shown in Figure 14. Speed is still relatively important as large arrays are to be
programmed, but our approach prioritizes accuracy over speed, as is evident
from the iterative approach.
2.3 Programming Hardware
We designed a custom floating-gate programming board to control the floating-gate pro-
gramming scheme. The board is controlled by a serial port through MATLAB. The board
controls the power supply, gate voltage, tunnelling voltage, and drain voltage. The board
is also able to measure current while varying the drain voltage, which is essential during
injection.
Part of the system involved in accessing the array is on chip with the devices. Future
revisions of this system are being developed that move more of the external circuitry on
chip. This will provide cleaner data and increase programming speed. Speed improvements
are also possible by utilizing parallelism that can only be obtained, at reasonable costs of
time, money, and overhead, by moving the control on chip.
Digital control of our PCB is done with the use of an FPGA. The FPGA on the programming
board is an Altera Stratix EP1S10 device. This FPGA has approximately 200K
equivalent system gates, just under 1 Mbit of on-chip memory, and 48 embedded multipliers.
The Stratix FPGA is installed on one of Altera's standard development boards that
contains 25 MBytes of external memory (including SRAM, SDRAM, and Flash), a 10/100
Mbit Ethernet interface, and 81 general-purpose I/O pins.
The FPGA is configured to implement a customized soft-core processor (Altera's 32-bit
Nios processor) along with specialized VHDL modules that handle timing-critical
communication between the programming board and the soft-core processor. The Nios processor
controls the overall system and coordinates the parallel operation of the different VHDL
(a) First Generation
     Programming Board (1G)
     Programming algorithm
     implemented in MATLAB.
     Slow Interface.
     No on-board memory.
     1 element / 10 min.
(b) Second Generation
     Programming Board  (2G)
     Programming algorithm
     implemented in VHDL.
     High-speed interface
     On-board memory.
     100 elements / 1 sec.
(c) Third Generation
     Programming Board (3G)
     Programming algorithm
     implemented ON-CHIP.
     High-speed interface.
     ON-CHIP memory/control
     100K - 1M elements / 1 sec.
Figure 15. Programming control structure hardware evolution.
modules. In addition, software running on the Nios processor implements the TCP/IP
protocol and communicates with the PC via a 100 Mbit Ethernet connection. On the PC side,
a MATLAB interface has been developed that provides a direct link to the FPGA from the
MATLAB command line.
Programming board serial port communication is limited by port speed and also by the
operating system running the algorithm. Using Windows machines has been shown to add
additional serial port timeout delays. We are currently moving the algorithm onto a
programmable integrated circuit controller (PIC) or onto a field-programmable gate array
(FPGA), thereby removing the OS delays.
A four layer PC board was fabricated. With the use of DACs, ADCs, and level shifters,
this board provides the user with 7 bias voltages (0− 3.3V), 4 programming voltages (0−













































Figure 16. a.) Setup used to measure the device current and to program the device, showing the
ammeter, the custom programming board, and the FPGA, which implements a set of instructions from
the programming algorithm. b.) The accuracy of the current measurement equipment
decreases sharply as the current being measured decreases.
(0 − 5V). A clock of 10MHz is used for the DACs, while the ADC has a maximum rate
of 200M samples per second. Figure 16 (a) shows the block diagram for the board. The
board is fully controlled with the use of an FPGA.
External current measurements are limited by the large line capacitance of wires
running off chip and by the equipment used to perform the current measurement. The current
measurement circuitry can typically provide one current measurement in between 10µs, for
large currents, and 100ms, for very small currents. Off-chip measurement also requires
additional filtering of the data to ensure accurate results, which can make the measurements
even slower. Future versions will have on-chip current measurement circuitry.
The current measurements are the primary bottleneck in floating-gate programming.
A precise and fast reading is required for accurate programming. Taking the current out
of the chip into the ammeter introduces some error in our measurements. The noise and
the parasitic capacitances introduced by the protoboard greatly decrease our accuracy in
measuring the current. This in turn affects our programming precision. Our current
measurement accuracy is shown in Figure 16 (b).
Averaging will increase the accuracy of measurement but will also increase the time
required for the current measurement. To minimize the time delay, an "intelligent" averaging
is performed. This consists of increasing the amount of averaging as we get closer to the target.
This is implemented as a look-up table that is loaded into the FPGA.
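The idea can be illustrated with a small Python sketch. The thresholds and sample counts in the table are hypothetical placeholders, as are the function and variable names; the actual table contents loaded into the FPGA are not given here.

```python
# Hypothetical table: (maximum relative distance to target, samples to average).
AVERAGING_TABLE = [
    (0.01, 64),  # within 1% of the target: average heavily
    (0.05, 16),
    (0.20, 4),
]

def samples_to_average(i_measured, i_target):
    """Return how many current readings to average, given the distance to the target."""
    relative_distance = abs(i_target - i_measured) / i_target
    for limit, n_samples in AVERAGING_TABLE:
        if relative_distance <= limit:
            return n_samples
    return 1  # far from the target: a single fast reading is enough

for present in (5e-9, 9e-9, 9.95e-9):
    print(present, samples_to_average(present, 10e-9))
```

Early pulses then cost one measurement each, and the long averaged measurements are spent only on the final steps, where accuracy determines the programmed value.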
2.4 Kappa Projection Algorithm (KPA) for Ultra-low Programming
Biological applications such as filtering for hearing aids and signal decomposition in the
auditory frequency range can have time constants from tens of milliseconds to seconds.
Achieving such long time constants can be extremely challenging in integrated circuit
technology due to limits in capacitance sizes. This becomes extremely obvious in Gm-C
type analog filters, where corner frequencies as low as 100Hz may be desired. For a corner
frequency of 100Hz, τ would be approximately 1.6ms. Limiting capacitance to 1pF requires
a Gm of 0.628nS, corresponding to a bias current of 32pA, which for many already
represents a significant design challenge [51].
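The arithmetic behind these numbers can be checked with a short Python sketch. It assumes a subthreshold transconductor with Gm = κ·Ibias/(2·UT), κ taken as 1, and UT = 25.8 mV; these assumptions are not stated in the text but reproduce the quoted 0.628nS and 32pA. The function names are illustrative.

```python
import math

U_T = 0.0258  # thermal voltage near room temperature, in volts (assumed)

def time_constant(f_corner):
    """First-order time constant for a corner frequency in Hz."""
    return 1.0 / (2.0 * math.pi * f_corner)

def gm_for_corner(f_corner, capacitance):
    """Transconductance needed so that Gm / (2*pi*C) equals the corner frequency."""
    return 2.0 * math.pi * f_corner * capacitance

def bias_current(gm, kappa=1.0):
    """Bias current, assuming a subthreshold transconductor with Gm = kappa * I / (2 * U_T)."""
    return 2.0 * U_T * gm / kappa

gm = gm_for_corner(100.0, 1e-12)
print(f"tau = {time_constant(100.0) * 1e3:.2f} ms")
print(f"Gm  = {gm * 1e9:.3f} nS")
print(f"I   = {bias_current(gm) * 1e12:.1f} pA")

# The same corner with smaller capacitors needs proportionally less current.
for cap in (100e-15, 50e-15):
    i_bias = bias_current(gm_for_corner(100.0, cap))
    print(f"C = {cap * 1e15:.0f} fF -> I = {i_bias * 1e12:.2f} pA")
```

Because the required current scales linearly with both capacitance and corner frequency, shrinking the capacitors or lowering the corner pushes the bias point toward the sub-picoampere range.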
To make the problem even more challenging, for analog processing architectures such
as Cepstrum Filtering [77], arrays of parallel analog filters (more than 10) are the initial
processing step, making the use of 1pF capacitors for each filter unreasonable. Typical
capacitance values for these filters are on the order of 50-100fF. Keeping the time constants
the same, but scaling currents, we are now required to provide bias currents from 100fA -
1pA for the low frequencies. This trend can be seen in Figure 21. Additionally, we need
to be able to provide a large number of these bias values and also the ability to adjust each
bias. This would allow the programming scheme to account for device mismatch and adjust
for errors that cause a shift in the desired pass band of the filter.
These types of biasing schemes are particularly useful in low-power applications where
the supply voltages are severely reduced. These elements allow us to program the device to
any effective threshold voltage we desire. This programming method is particularly useful


































Figure 17. A novel predictive algorithm for programming floating-gate devices into pico-amp and
sub-pico-amp current ranges. The desired operating point of the device is "C". "A" is the current
operating point of the device. Rather than program the device directly to "C", the device is
programmed to "B", taking advantage of the improved accuracy in that range for the current
measurement equipment. Operating point "C" is achieved by taking into account the sub-VT slope of
the device.
are no issues using this device as part of a current mirror to generate a current sink through
an nFET.
The Kappa Projection Algorithm (KPA) is a method of predicting the final operating
current of a device from a known, but higher, starting current. The device parameter κ is
critical to the deep sub-threshold current projection and to accurately predicting the final
operating point. Because the predictive algorithm uses this value to scale the effective shift,
any error has a multiplicative effect on the final current. The κ of the floating-gate devices
is decreased due to the capacitive coupling of the input, as shown in Figure 18(a). Typical
values for κ are approximately 0.4 - 0.7 for a corresponding Cin of approximately 30fF. To
make the problem more difficult, κ varies with current. Our extraction method fits κ over the
entire sub-threshold range and uses the largest value. However, for much lower currents we
predict that κ will
(a) (b)
Figure 18. a.) Figure showing kappa and what is meant by the ”effective” kappa of a floating-gate
device. Kappa represents the attenuation on the input gate voltage due to the capacitive voltage division
from input to the channel of the device. b.) Using this effective kappa, we are able to project the
operating current for a device well below the limits of our measurement equipment.
decrease, introducing an error in the prediction. Figure 19(a) shows how κ varies
throughout the sub-threshold region.
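The sensitivity to κ can be illustrated with a short Python sketch of one reading of the projection step. It assumes the ideal subthreshold relation I = I_B · exp(−κ_eff · ΔV / UT) for a gate-voltage shift ΔV, with UT = 25.8 mV; the function names, the 1 nA and 1 pA operating points, and the 1% error in κ are hypothetical.

```python
import math

U_T = 0.0258  # thermal voltage near room temperature, in volts (assumed)

def gate_shift_for_target(i_measured, i_target, kappa_eff):
    """Gate-voltage shift that scales a subthreshold current from i_measured to i_target."""
    return (U_T / kappa_eff) * math.log(i_measured / i_target)

def projected_current(i_measured, gate_shift, kappa_eff):
    """Ideal subthreshold projection: current falls exponentially with the gate shift."""
    return i_measured * math.exp(-kappa_eff * gate_shift / U_T)

# Program point "B" at 1 nA, then project down three decades to point "C" at 1 pA.
shift = gate_shift_for_target(1e-9, 1e-12, kappa_eff=0.60)

# Suppose the true kappa is 1% lower than the value used for the projection.
landed = projected_current(1e-9, shift, kappa_eff=0.594)
print(f"shift  = {shift * 1e3:.1f} mV")
print(f"landed = {landed * 1e12:.3f} pA "
      f"({(landed / 1e-12 - 1.0) * 100:.1f}% from target)")
```

Under these assumptions a 1% error in κ turns into an error of roughly 7% in current over three decades, and the error grows with the projection distance, which is consistent with the multiplicative effect described above.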
The κ of a transistor relates directly to the amount of coupling achieved from the gate
into the channel of the device. Floating-gate devices use capacitive coupling into the
gate of a transistor; thus the effective coupling from gate to channel is decreased due to
a capacitive division from two series capacitors. There are two widely used methods for
extracting this value κ from a single device: direct extraction from the
subthreshold slope of a device in saturation, or extraction from the gain of a device in
common-gate configuration. Figure 18(b) models the first of these two methods, which is
what was used here.
Accurate measurement of current and voltage, respectively, is critical in each of these
methods. Current measurement error using commercial current measurement equipment
vs. on-chip integrator-style current measurement has been investigated elsewhere [73]. For














































Figure 19. κ variation with drain current and variation across an entire chip at the single deep
sub-threshold projection point. a.) Average κ across the entire array of devices. κ changes with
device current. The expression for κ is only valid in the sub-threshold region of operation. As the
device current continues to decrease, we see that κ approaches an asymptote, but tends to continue
to increase slightly. The projection data was generated using the largest value of κ measured, but
this is still less than the actual κ for very small currents. This accounts for the error increasing
as we project the operating point of the device to lower and lower currents. b.) Effective kappa
variance across a chip of computational memory elements. $\kappa_{eff} = \kappa \, C_{input} / C_T$,
where $C_{input}$ is the input capacitance for the device and $C_T$ is the total capacitance for the
node. Variance in both the X and Y directions is less than 1%.
set was taken for an array of 320 programmable memory elements. The standard deviation of κ
is 0.0025, as shown in Figure 19 (b). Thus all of our parameter extraction and predictive
calculations depend on taking accurate current measurements to give an exact location for the
current element. Our measurement setup using commercial picoammeters has given us an error of
2% or less within our desired range of operation, as shown in Figure 16.
Other methods of current measurement include on-chip methods such as integrating
structures. These structures provide cleaner current measurements for low currents and,
when integrated with external digital control circuitry, are able to measure currents as fast
as 100 - 1000 measurements per second. Noise in the system includes power-supply
noise along with other noise sources such as substrate coupling into the floating gate,
but these are well-understood noise sources. The limitations for the on-chip current
measurements are determined by the linearity and dynamic range of the operational amplifiers used
























Max error =     2.3034%
Min error =     0.0341%
























Min error =     0.1653%
Max error =     1.5221%
(a) (b)
Figure 20. Sub-threshold plots showing a set of programming points using the Kappa Projection
Algorithm (KPA). Programmed currents were set as low as 2pA with a maximum error of 2.3%.
in the circuitry, along with charge feed-through of any switches used in the configuration.
Previous experiments have shown an accuracy as low as 1pA and as high as 10nA at a base
current of 1µA.
In addition to measurement errors, the device parameters are also dependent upon the
operating condition of the device. For example, κ is known to vary between the sub-threshold
and above-threshold regions. This variance in κ is between 1-2%, which
also introduces limitations to our predictive algorithms. One solution to the variation in
parameters over the operating range would be to use an algorithm that continuously updates
the parameters by comparing the desired change with the actual change. The parameter differences
could then be extrapolated back out of the original equation to account for the differences.
This type of approach would have the advantage of being self-updating and self-optimizing. It
would, however, require additional computational power, and thus currently does not lend
itself very well to an on-chip implementation.
Figure 22 was done assuming κ does not change in the sub-threshold region. With this
assumption, we were able to achieve an accuracy of greater than 90%. Many of our test
applications require an accuracy of 95% or higher. Taking into account the change in κ we are






























[Figure 21 panel title: C4 Low Corner Response for decreasing iTauLow]





















Figure 21. Band pass filter corner frequency characterization. a.) Characterization plot of C4 Low
corner frequency for decreasing values of iTauLow, showing programming across the range of frequencies.























































Figure 22. a.) C4 plot of Low corner frequency for decreasing values of iTauLow. All points are
generated using the Kappa Projection Algorithm (KPA). b.) Error plot of the projected frequency
points. Maximum error = 8%.
Due to measurement limitations, we were not able to measure the extremely low
currents directly. Instead, we extrapolated the current given a linear relationship
between current and the corner frequency of an analog filter called the C4 [78].
The characterization plot in Figure 21(b) shows there is a linear relationship between
bias current and corner frequency. Assuming this curve continues to be linear, and
extrapolating the effective current from the measured frequency response, we were able to easily
Table 2. Summary of Kappa Projection Algorithm (KPA) performance
Technology               0.5 µm N-well CMOS
Array size               10 × 32
Programming mechanisms   Hot-electron injection and electron tunneling
Programming error        < 2.0% at 10pA
                         < 10.0% at 100fA
measure currents in the 200fA range, with the limiting factor being the noise floor for the
currents in the device, which should be on the order of the reverse-bias junction currents, 1 fA
or less.
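The extrapolation step can be sketched in Python as a straight-line fit followed by an inversion. The characterization points below are invented for illustration (a slope of about 3 × 10^13 Hz/A), not measured data from Figure 21, and the function names are hypothetical.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def current_from_corner(f_corner, slope, intercept):
    """Invert the fitted line to infer a bias current from a measured corner frequency."""
    return (f_corner - intercept) / slope

# Hypothetical characterization in the range where current can be measured directly.
bias_currents = [10e-12, 20e-12, 50e-12, 100e-12]  # amperes
corner_freqs = [300.0, 600.0, 1500.0, 3000.0]      # hertz

slope, intercept = linear_fit(bias_currents, corner_freqs)

# A corner measured at 6 Hz then implies a bias current below the ammeter's range.
inferred = current_from_corner(6.0, slope, intercept)
print(f"inferred bias current = {inferred * 1e15:.0f} fA")
```

The inferred value is only as good as the assumption that the line stays straight below the last directly measured point.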
2.5 Floating-gate Memory Arrays
The floating-gate core used in the computational arrays differs from those used in other
analog memory circuits such as Epots [19]. Epots are made "user-friendly" by the addition of
several control circuits around each floating-gate element, such that the overall circuit block
is one order of magnitude larger than our floating-gate elements. Analog computational
memories are a viable method of storing analog values and, due to their relatively small
size, lend themselves well to being scaled up as sizeable analog memory arrays. The
benefit of large floating-gate computational arrays is the compactness of each core element,
with support circuits moved to the array periphery. Large arrays of these devices
can be easily integrated with standard CMOS analog design, making them ideal for use as
bias devices for filters, multipliers, and many other applications.
Isolation within an array is critical to controlling the thousands, even millions, of
programmable elements that may be on a chip. To that end, a functional "AND" is done for
each element. Injection requires two conditions to be satisfied within the device for
injection to occur, namely drain current and sufficient drain-to-source voltage. The drain current








[Figure 23 label: Gate Control Voltage]
Figure 23. Floating-gate array demonstrating element isolation by controlling the gate and drain
voltage of each column and row respectively. Selection of gate and drain voltages is controlled by
on-chip switching circuitry. The gate, source, and drain voltages are controlled by DACs on a custom
programming board.
is a sufficient electric field from the drain to channel region to accelerate the carriers
sufficiently to cause impact ionization. Thus, by simply using the gate voltage in the column
direction and the drain voltage in the row direction, we can effectively isolate a device
by ensuring that only a single device has conditions sufficient to cause injection. This is
illustrated further in Figure 23.
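The selection logic amounts to a logical AND of a column condition and a row condition, which the following Python sketch makes explicit. It models only the logic, not the device physics; the function name and the chosen indices are hypothetical, and the 10 × 32 size is borrowed from Table 2.

```python
def injecting_elements(n_rows, n_cols, selected_row, selected_col):
    """List the (row, column) positions where both injection conditions hold."""
    injecting = []
    for row in range(n_rows):
        for col in range(n_cols):
            has_channel_current = (col == selected_col)  # gate voltage is set per column
            has_drain_field = (row == selected_row)      # drain voltage is set per row
            if has_channel_current and has_drain_field:
                injecting.append((row, col))
    return injecting

# Select the element in row 3, column 7 of a 10 x 32 array.
print(injecting_elements(10, 32, selected_row=3, selected_col=7))
```

Elements that share the selected row see the drain field but carry no channel current, and elements that share the selected column carry current but see no field, so neither group injects.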
To ensure isolation within an array, each device is connected through various control
circuits that switch a single element into and out of program mode, where its gate and drain
voltages can be explicitly controlled. For the aforementioned isolation reasons, tunneling

























[Figure 24 plot title: Final Sweep of Programmed Floating-gate Elements; legend: Used Projected
Targets, Used Real Targets]
Figure 24. Early injection results showing an array of floating-gate devices programmed to
exponentially spaced currents.
is used to erase the entire array and injection is used to program each element.
The isolation circuitry is made of muxes that switch the drain and gate voltages of the
desired element onto a common bus for each signal. All other elements are switched to a
separate voltage, which ensures that those devices will not inject. The external voltages are
routed off chip and controlled by an external programming board [39]. A typical
programming scenario is shown in Figure 23.
The external control circuits used to access each element and program the array are
contained on a custom programming board which interfaces to a computer via a serial
connection. Current versions of the programming board use FPGAs to provide the digital
interface and control loop. Using these boards we have been able to program over 120dB
of dynamic range to an accuracy of 99%.

























Max error = 0.70%
Figure 25. Injection results showing a single row of floating-gate multiplier blocks programmed to
cosine coefficients. These blocks are essential to performing analog frequency transform functions. Be-
cause the values are arbitrary, one can also set these linearly or to increase or decrease logarithmically.
This chapter illustrated the floating-gate core used in the computational arrays and a
programming method to accurately control the operating condition of the device. Using
these elements as programmable memory elements in standard CMOS analog circuits, we
are able to build larger analog signal processing systems. We also discussed the benefit of
large floating-gate computational arrays: their ease of integration with standard CMOS analog
design makes them ideal for use as bias devices for filters, multipliers, and many other
applications. The next chapter will outline one of the core computational blocks used for
the analog signal processing operations, specifically filtering.
CHAPTER 3
PROGRAMMABLE CONTINUOUS-TIME FILTER BANKS
This chapter explores a five-transistor continuous-time band-pass filter element called the
Capacitively Coupled Current Conveyer (C4) with a programmable pass-band. We show
measurement data from 1Hz to 100kHz. This chapter discusses the effects of various
design parameters on frequency range, gain, and linearity. Experimental data is presented
from circuits fabricated on a 0.5µm nwell CMOS process available through MOSIS.
Band-pass filter elements have a long history, from simple linear systems to standard
implementations [68]. Their main use is some form of signal decomposition, whether
it be to amplify or attenuate a specific signal frequency, or to separate multiple frequencies
[71, 67, 30]. Tuning of these devices is critical [36, 64], in addition to matching, power,
and overall die area. The filter discussed here has a simple topology (5 transistors), uses
a single power supply, uses very little power, and is easily tunable. It will also serve as a
good starting point for developing higher-order bandpass filters.
3.1 A Compact Band-pass Filter Element
The filter used in the programmable filter array is based on the capacitively coupled current
conveyer (C4) that has been presented [41] and characterized [78] elsewhere. Two C4s
are shown in Figure 50 on page 73, separated by a unity-gain buffer. The entire circuit
of Figure 50 is called the C4 second-order section (C4SOS) and is the basic filter of this
programmable filter bank. Since the C4SOS is simply a cascade of two high-gain C4s,
understanding the C4 makes analysis of the C4SOS straightforward.
The initial theory on the capacitively coupled current conveyer (C4) was developed from
the Autozeroing Floating-Gate Amplifier (AFGA) [23], which had widely separated corner
frequencies due to limitations of the device. The C4 is a capacitively based bandpass filter















Figure 26. Schematic of a single C4 structure. The capacitors model all explicit and parasiticcapaci-
tances in the signal path of the circuit.
corner frequency limitations. The circuit is shown in Figure 26. This circuit has previously
been used in many systems [43, 30], but these applications were designed with very low
Qs, typically less than 1. Such low values of Q will not work for some applications; one
example is a cochlea stage, which requires Qs as high as 30 for low-amplitude
signals. In designing systems with moderate to high Qs, certain properties
play a significant role. In this chapter, we cover these issues to clarify the
design and use of the C4 within these systems.
Figure 26 illustrates our bandpass filter element, which serves as the initial building
block for our spectral decomposition operations. The size of the block is critical since
one element is required for each sub-band of the decomposition. Typical applications
have used 32 sub-bands with an overall size of 800 µm × 100 µm, making them easy to
integrate into larger analog signal processing systems.
Here we develop the transfer function of the C4 filter using large-signal analysis. From


















































































Figure 27. C4 frequency response curves. (a) Frequency response curve for a C4 with a gain of 5,
showing that each corner frequency can move independently. (b) Unity-gain C4 showing tunability
over a wide range of frequencies. (c) The Q for a C4 can also be tuned, up to the maximum theoretical
Q. (d) C4 change in gain with C2.













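$$C_2\frac{dV_{fg}}{dt} - (C_2 + C_L)\frac{dV_{out}}{dt} = I_{M3}\left[1 - e^{-\kappa\Delta V_{fg}/U_T}\,e^{-\Delta V_{out}/V_A}\right] \tag{7}$$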
If M4 is a long-channel device (> 5λ), we assume a large Early voltage (> 25 V) for that
device and that the change in Vout will be small compared to this. This allows us to make
the following assumption
$$e^{-\Delta V_{out}/V_A} \approx 1$$
where VA is the Early voltage of M4. This assumption holds to within 4% when ∆Vout is
close to its largest swing of 1000 mV, and is much more reasonable, at 0.4%, for voltage
swings closer to realistic values such as 100 mV. We must keep in mind, however, that even
a 0.4% error corresponds to an error floor of about -47.9 dB relative to the signal, which
this linearization imposes on the model.
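The size of this approximation error can be checked numerically. The short sketch below assumes an Early voltage of exactly 25 V (the lower bound quoted above) and evaluates how far the neglected exponential term departs from 1 for the two output swings mentioned in the text; the function name is illustrative only.

```python
import math

def early_term_error(delta_vout, v_a):
    """Relative error of replacing exp(-delta_vout / v_a) with 1."""
    return 1.0 - math.exp(-delta_vout / v_a)

V_A = 25.0  # assumed Early voltage in volts (lower bound quoted in the text)

for swing in (1.0, 0.1):  # output swings in volts
    err = early_term_error(swing, V_A)
    print(f"swing {swing:.1f} V: error {100 * err:.2f} %, "
          f"{20 * math.log10(err):.1f} dB relative to the signal")
```

Under this assumption the 1000 mV swing gives an error just under 4% and the 100 mV swing gives about 0.4%, in line with the figures quoted above.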





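$$C_1\frac{dV_{in}}{dt} + C_2\frac{dV_{out}}{dt} - (C_1 + C_2 + C_W)\frac{dV_{fg}}{dt} = I_{M2}\left[e^{-(\kappa\Delta V_{out} - \Delta V_{fg})/U_T} - 1\right] \tag{8}$$
$$C_2\frac{dV_{fg}}{dt} - (C_2 + C_L)\frac{dV_{out}}{dt} = I_{M3}\left[1 - e^{-\kappa\Delta V_{fg}/U_T}\right] \tag{9}$$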
At this point we must make further assumptions to even begin to solve this system
of equations. We make two undesirable, yet necessary, assumptions. The errors
introduced by linearizing at this stage will be discussed later when looking
at distortion. First we assume
$$-(\kappa\Delta V_{out} - \Delta V_{fg})/U_T \ll 1 \tag{10}$$
from equation (8). However, we have already seen that ∆Vout can be as large as 1000 mV, and
given that UT ≈ 25 mV, this assumption is not accurate for large swings. Next we assume that
$$-\kappa\Delta V_{fg}/U_T \ll 1 \tag{11}$$
from equation (9). Keep in mind that, ideally, ∆Vfg would be zero if the follower in the
feedback path is operating correctly, which means this approximation holds during
normal operating conditions. For simplicity we continue to use both of these assumptions
and make the following simplifications
$$e^{x} - 1 \approx x, \qquad 1 - e^{-x} \approx x$$
Applying these to (8) and (9) gives the linearized system
$$C_1\frac{dV_{in}}{dt} + C_2\frac{dV_{out}}{dt} - (C_1 + C_2 + C_W)\frac{dV_{fg}}{dt} = -\frac{I_{M2}}{U_T}\left(\kappa\Delta V_{out} - \Delta V_{fg}\right) \tag{12}$$
$$C_2\frac{dV_{fg}}{dt} - (C_2 + C_L)\frac{dV_{out}}{dt} = \frac{\kappa I_{M3}}{U_T}\Delta V_{fg} \tag{13}$$
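To give a feel for how quickly the first-order expansion degrades, the following sketch computes the relative error of replacing exp(x) − 1 with x. The value of κ and the voltage deviations are illustrative assumptions, not measured values from this work.

```python
import math

def linearization_error(x):
    """Relative error of approximating exp(x) - 1 by x."""
    exact = math.exp(x) - 1.0
    return abs(exact - x) / abs(exact)

U_T = 0.025   # thermal voltage in volts, as quoted in the text
KAPPA = 0.7   # assumed subthreshold slope factor (illustrative)

for dv in (0.001, 0.010, 0.100):  # hypothetical voltage deviations in volts
    x = -KAPPA * dv / U_T
    print(f"dV = {1e3 * dv:.0f} mV, x = {x:.3f}, "
          f"relative error = {100 * linearization_error(x):.1f} %")
```

The error stays small only while the voltage deviation is a small fraction of UT, which is why the assumption on ∆Vout is the weaker of the two.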




















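By taking the Laplace transforms of (12) and (13), these equations become
$$sC_1V_{in} + sC_2V_{out} - s(C_1 + C_2 + C_W)V_{fg} = -\frac{I_{M2}}{U_T}\left(\kappa V_{out} - V_{fg}\right) \tag{14}$$
$$sC_2V_{fg} - s(C_2 + C_L)V_{out} = \frac{\kappa I_{M3}}{U_T}V_{fg} \tag{15}$$
The sums of capacitances are renamed as
$$C_T = C_1 + C_2 + C_W, \qquad C_O = C_2 + C_L$$
for the total capacitance and the output capacitance, respectively. Again, rewriting the
equations, they become
$$sC_1V_{in} + sC_2V_{out} - sC_TV_{fg} = -\frac{I_{M2}}{U_T}\left(\kappa V_{out} - V_{fg}\right) \tag{16}$$
$$sC_2V_{fg} - sC_OV_{out} = \frac{\kappa I_{M3}}{U_T}V_{fg} \tag{17}$$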









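With the time constants $\tau_l = C_2U_T/(\kappa I_{M2})$ and $\tau_f = C_2U_T/(\kappa I_{M3})$,
(16) and (17) become
$$s\frac{C_1}{C_2}\tau_lV_{in} + s\tau_lV_{out} - s\frac{C_T}{C_2}\tau_lV_{fg} = -V_{out} + \frac{1}{\kappa}V_{fg}$$
$$s\tau_fV_{fg} - s\frac{C_O}{C_2}\tau_fV_{out} = V_{fg} \tag{18}$$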













$$V_{fg}\left(s\tau_f - 1\right) = s\frac{C_O}{C_2}\tau_fV_{out} \tag{19}$$
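The C4 is a capacitively based bandpass filter with electronically tunable corner
frequencies. Substituting (19) into the first line of (18) and solving for the ratio of
output to input gives
$$\frac{V_{out}}{V_{in}} = -\frac{C_1}{C_2}\,\frac{s\tau_l(1 - s\tau_f)}{s^2\tau_l\tau_h + s\left(\tau_l + \tau_f\left(\frac{C_O}{\kappa C_2} - 1\right)\right) + 1}$$
where the time constants are $\tau_l = C_2U_T/(\kappa I_{\tau_l})$, $\tau_f = C_2U_T/(\kappa I_{\tau_h})$,
and $\tau_h = (C_TC_O - C_2^2)U_T/(\kappa C_2 I_{\tau_h})$,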















































Figure 28. C4 biasing using resistors and programmable devices, respectively. a.) Array of 32 C4 filters
biased using a resistor ladder. Transistor mismatch is significant and the filter taps are not even
monotonic. b.) Array of 32 programmable C4s biased using programmable memories with exponentially
spaced currents within 95% accuracy. Adding programming to the filter bank gives a significant
improvement that is immediately visible.












and the total capacitance, CT, and the output capacitance, CO, are defined as CT = C1 +
C2 + CW and CO = C2 + CL. The currents Iτl and Iτh are the currents through M2 and M3,
respectively, in Figure 26. With normal usage, τf is so fast that the zero it produces lies far
outside of the operating range. Hence, the C4 takes on the form of a bandpass filter within
the region of interest, with ±20 dB/decade slopes outside the passband. The midband gain
is −C1/C2. UT is the thermal voltage (approximately 25.7 mV at 25 °C) and κ is the
subthreshold slope. The plots of Figure 27 show measured data from a 0.5 µm process
available through MOSIS that summarize the frequency response of the C4.
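The behavior described above can be explored with a small numerical sketch. It assumes the transfer function obtained by eliminating Vfg from (18), namely Vout/Vin = −(C1/C2)·sτl(1 − sτf) / (s²τlτh + s(τl + τf(CO/(κC2) − 1)) + 1), with τl = C2UT/(κIτl), τf = C2UT/(κIτh), and τh = τf(CTCO − C2²)/C2². All component values and bias currents below are hypothetical and chosen only to give widely separated corners.

```python
import math

def c4_time_constants(c1, c2, cw, cl, i_tau_l, i_tau_h, kappa=0.7, u_t=0.0257):
    """Return (tau_l, tau_f, tau_h, c_o) for the assumed C4 model."""
    c_t = c1 + c2 + cw           # total capacitance at the floating node
    c_o = c2 + cl                # output capacitance
    tau_l = c2 * u_t / (kappa * i_tau_l)
    tau_f = c2 * u_t / (kappa * i_tau_h)
    tau_h = tau_f * (c_t * c_o - c2 ** 2) / c2 ** 2
    return tau_l, tau_f, tau_h, c_o

def c4_transfer(f_hz, c1, c2, cw, cl, i_tau_l, i_tau_h, kappa=0.7, u_t=0.0257):
    """Complex gain Vout/Vin of the assumed C4 model at frequency f_hz."""
    tau_l, tau_f, tau_h, c_o = c4_time_constants(
        c1, c2, cw, cl, i_tau_l, i_tau_h, kappa, u_t)
    s = 2j * math.pi * f_hz
    num = s * tau_l * (1 - s * tau_f)
    den = s ** 2 * tau_l * tau_h + s * (tau_l + tau_f * (c_o / (kappa * c2) - 1)) + 1
    return -(c1 / c2) * num / den

# Hypothetical unity-gain design with widely separated corners.
PARAMS = dict(c1=100e-15, c2=100e-15, cw=1e-12, cl=1e-12,
              i_tau_l=1e-12, i_tau_h=10e-9)

tau_l, tau_f, tau_h, _ = c4_time_constants(**PARAMS)
f_center = 1.0 / (2 * math.pi * math.sqrt(tau_l * tau_h))

for f in (0.1, 1.0, f_center, 1e5):
    gain_db = 20 * math.log10(abs(c4_transfer(f, **PARAMS)))
    print(f"{f:12.2f} Hz: {gain_db:7.2f} dB")
```

With these assumed values the gain at the center frequency is close to C1/C2 = 1, and the response well below the low corner rises by about 20 dB per decade.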
The C4 takes on the properties of a bandpass filter with first-order roll-off and a
passband gain set by the ratio of the two coupling capacitors as Av = −C1/C2. The overlap
capacitance of the MOSFET ensures that there is always some effective C2 capacitance, even
if it is only a few fF, so the gain is not infinite. The overall time constant of the filter, which




Furthermore, since the corner frequencies of the C4 are completely independent of each
other, as shown in Figure 27a, either one of the corner frequencies can be pulled to the
extreme, allowing the C4 to take the form of a lowpass or highpass filter.
Figure 28 (a) shows a C4 array biased using a resistor ladder. Transistor mismatch is
significant and the filter taps are not even monotonic. Figure 28 (b) shows an array of
32 programmable vanilla C4s programmed with exponentially spaced corner frequencies
within 95% accuracy. Adding programming to the filter bank gives an improvement that is
immediately visible. Further improvements require an additional calibration
step.
3.2 Designing for Q


















Figure 29 is a plot of the Q peak versus the ratio of Iτl/Iτh for a unity-gain C4 using the
extracted values of the capacitances from the layout. As can be seen from this MATLAB
plot, there is a maximum value that the resonance can achieve. The maximum value occurs
when Iτh is slightly larger than Iτl for the capacitances of the fabricated circuit that yielded
the plots of Figure 29. The maximum value of the Q peak can be predicted for a certain set























































[Figure 29 plot annotations: Max Q = 1.8869; R = 0.0550. Panels (a) and (b).]
Figure 29. Simulation and measurement data of filter "Q" vs. current ratio. a.) Calculated Q as a
function of Iτh/Iτl. There is a maximum Q at a particular current ratio, and the curve is relatively flat
around that maximum point. The maximum achievable Q for this circuit was designed to be
approximately 1. Higher maximum values of Q can be explicitly designed into the circuit. b.) Experimental

















for $C_2 \ll C_O$ (25)
This represents the maximum Q for small amplitudes; the Q drops as amplitude increases.
This phenomenon is not discussed here, but will be discussed later.
As can be seen from Figure 29, the C4 was not designed to have a large Q peak. However,
by changing the capacitances, a much larger Q peak can be achieved. For example,
reducing the value of C2 increases the resonance. A good method to increase Q is to not
explicitly draw any C2 capacitance and to use the gate-drain overlap capacitance of M4 in
Figure 26 as C2.
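The existence of a maximum Q, and its growth as C2 shrinks, can be illustrated numerically. This sketch assumes a second-order denominator of the form s²τlτh + s(τl + τf(CO/(κC2) − 1)) + 1 with τh = τf(CTCO − C2²)/C2², for which Q = sqrt(τlτh)/(τl + τf(CO/(κC2) − 1)). The capacitor values are hypothetical, and κ is set to 1 for simplicity.

```python
import math

def q_terms(c1, c2, cw, cl, kappa=1.0):
    """Return (a, b): a = tau_h / tau_f, b = weight of tau_f in the damping term."""
    c_t = c1 + c2 + cw
    c_o = c2 + cl
    a = (c_t * c_o - c2 ** 2) / c2 ** 2
    b = c_o / (kappa * c2) - 1.0
    return a, b

def q_factor(rho, c1, c2, cw, cl, kappa=1.0):
    """Q for a current ratio rho = I_tau_h / I_tau_l (equal to tau_l / tau_f)."""
    a, b = q_terms(c1, c2, cw, cl, kappa)
    return math.sqrt(a * rho) / (rho + b)

def q_max_closed_form(c1, c2, cw, cl, kappa=1.0):
    """Maximum of q_factor over rho; it occurs at rho = b."""
    a, b = q_terms(c1, c2, cw, cl, kappa)
    return 0.5 * math.sqrt(a / b), b

CAPS = dict(c1=50e-15, c2=50e-15, cw=200e-15, cl=100e-15)  # hypothetical values

# Scan the current ratio on a logarithmic grid and pick the largest Q.
grid = [10 ** (k / 200) for k in range(-400, 401)]
rho_best = max(grid, key=lambda rho: q_factor(rho, **CAPS))
q_numeric = q_factor(rho_best, **CAPS)
q_closed, rho_closed = q_max_closed_form(**CAPS)
print(f"numeric: Q = {q_numeric:.4f} at ratio {rho_best:.3f}")
print(f"closed form: Q = {q_closed:.4f} at ratio {rho_closed:.3f}")

# Shrinking C2 (for example, down to an overlap capacitance) raises the peak.
small_c2 = dict(CAPS, c2=2e-15)
print(f"with C2 = 2 fF: Q_max = {q_max_closed_form(**small_c2)[0]:.2f}")
```

In this model the peak sits at a current ratio slightly above 1 whenever CO is a little more than twice C2, and the peak value grows roughly as the square root of CT/C2 when C2 is much smaller than CO.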
Removing the feedback capacitor C2 transforms the C4 into a high-gain filter since the
effective value of C2 is simply the overlap capacitance of M4. This configuration is
called a vanilla C4. In addition to being high gain, a vanilla C4 has increased resonance that








































































Figure 30. (a) C4 circuit and a cascade structure to generate higher-order filters. The center frequency,
along with the Q for each filter stage, is programmable by integrating programmable elements into the
circuit structure. (b) The frequency response for various filter structures designed for a 1st-order up
to a 3rd-order filter response. The 2nd-order stage was designed to have a very high Q, and a Q of 62
was measured, which is ample for most auditory applications.
The larger the Q peak, the greater the isolation of the center frequency when the vanilla C4
is tuned to have a narrow bandwidth.
3.3 Designing for multiple filter stages
These filters can be cascaded to increase the stop-band rolloff, as well as the overall Q
of the filter. Operation of the C4 Second-Order Filter is similar to that of a vanilla C4, since the
C4 Second-Order Filter is simply a cascade of two vanilla C4s isolated by a buffer. By
tuning each of the C4s comprising the C4SOS to have identical time constants, the overall
response of the C4SOS has ±40 dB/decade slopes outside the passband and a potentially
large Q peak.
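The doubling of the stop-band slope under cascading can be checked with an idealized model. The sketch below assumes each stage is a bandpass with first-order roll-off on both sides and that the buffer isolates the stages perfectly, so the cascade response is the product of the stage responses; the corner frequencies are hypothetical.

```python
import math

def bandpass_first_order(f_hz, f_low_hz, f_high_hz):
    """Idealized bandpass with first-order roll-off on each side."""
    s = 2j * math.pi * f_hz
    tau_l = 1.0 / (2 * math.pi * f_low_hz)
    tau_h = 1.0 / (2 * math.pi * f_high_hz)
    return s * tau_l / ((1 + s * tau_l) * (1 + s * tau_h))

def slope_db_per_decade(response, f_hz):
    """Gain change in dB between f_hz and 10 * f_hz."""
    return 20 * math.log10(abs(response(10 * f_hz)) / abs(response(f_hz)))

def single(f_hz):
    return bandpass_first_order(f_hz, 100.0, 1000.0)

def cascade(f_hz):
    # Two identical stages separated by an ideal buffer.
    return single(f_hz) ** 2

print(f"single stage, low side:  {slope_db_per_decade(single, 0.1):+.1f} dB/decade")
print(f"cascade, low side:       {slope_db_per_decade(cascade, 0.1):+.1f} dB/decade")
print(f"single stage, high side: {slope_db_per_decade(single, 1e5):+.1f} dB/decade")
print(f"cascade, high side:      {slope_db_per_decade(cascade, 1e5):+.1f} dB/decade")
```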
One potentially hazardous trait of the C4 is that the input capacitance of the circuit





































[Figure 31 legend: CW = 36.54 fF; CW = 365.4 fF; CW = 3.654 pF]
Figure 31. Change in C4 input capacitance vs. frequency. Low-frequency input capacitance = C1. This
transitions to the high-frequency input capacitance, which is roughly C1‖CW.
critical when cascading multiple C4 stages to create higher-order filter banks. If the previous
stage in the overall system depends on the input capacitance of the C4 for its load
capacitance, then this could be a serious issue.
Referring back to Figure 26, the input capacitance can easily be found for the cases of
very low frequencies and very high frequencies. For very low frequencies, the middle node
is an AC ground because of the high-gain amplifier. Hence, the input capacitance for low
frequencies is simply Cin = C1. For very high frequencies, the transistors can no longer
follow the signals, so the C4 reduces to a network of capacitors and the input capacitance
becomes the series/parallel combination of the capacitances in this network seen looking
into the input, Vin. The input capacitances for the two extreme cases are given by
$$C_{in}(f \to 0) = C_1$$
$$C_{in}(f \to \infty) = C_1 \,\|\, (C_W + C_2 \,\|\, C_L) \approx C_1 \,\|\, C_W \tag{27}$$
This approximation holds when C2 is significantly smaller than the load capacitance.




























































Low Corner Frequency vs. Bias Current




Figure 32. a.) Characterization plot of C4 low corner frequency for decreasing values of iTauLow,
showing programming across the range of frequencies. b.) Characterization plot of C4 corner
frequency vs. bias current.


























































High Corner Frequency vs. Bias Current




Figure 33. a.) Characterization plot of C4 high corner frequency for increasing values of iTauHigh,
showing programming across the range of frequencies. b.) Characterization plot of C4 corner frequency vs.
bias current.
very high frequencies, and since these values are not identical, there is a transition region
for the input capacitance between these states. The figure shows that the transition
between these two values of Cin occurs over a confined frequency band near
the center frequency of the C4. Figure 31 shows results of a SPICE simulation in which the
input capacitance was computed. Simulation and experiments closely agree for this circuit.
However, since the width of the swing in input capacitance is a function of the values of



















































Figure 34. Extraction of C4 frequency response parameters. A linear regression is applied to the
result of the polynomialization (shown below). The solid line in the upper plot is the ideal lowpass
response calculated with the parameters that have been extracted. Frequency response information is
only needed in the region surrounding the corner frequency; additionally, parameters can be obtained
from less dense data with nearly the same accuracy. Measuring fewer data points allows for increased
speed in the tuning algorithm.
choice of capacitor values. Increasing the drawn size of CW is the best choice for reducing
the shifting input capacitance for the simple reason that the larger the value of CW, the more
closely C1‖CW ≈ C1, and hence the more closely the high-frequency Cin approaches the
low-frequency Cin. Figure 31 shows the results of 10-fold increases in the capacitance of
CW. The larger that CW is drawn, the less the effect of the input capacitance shift on the
system.
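The effect of a larger CW on the high-frequency input capacitance can be sketched directly from (27), reading ‖ as the product-over-sum combination. The three CW values are those in the Figure 31 legend; C1, C2, and CL are hypothetical values chosen only for illustration.

```python
def series(a, b):
    """Product-over-sum combination of two capacitances (the || operator in (27))."""
    return a * b / (a + b)

def cin_low(c1):
    """Input capacitance at very low frequency."""
    return c1

def cin_high(c1, c2, cw, cl):
    """Input capacitance at very high frequency, following (27)."""
    return series(c1, cw + series(c2, cl))

C1 = 100e-15   # hypothetical input capacitor
C2 = 1.5e-15   # hypothetical feedback (overlap) capacitance
CL = 100e-15   # hypothetical load capacitance
CW_VALUES = (36.54e-15, 365.4e-15, 3.654e-12)  # from the Figure 31 legend

ratios = [cin_high(C1, C2, cw, CL) / cin_low(C1) for cw in CW_VALUES]
for cw, ratio in zip(CW_VALUES, ratios):
    print(f"CW = {cw * 1e15:8.2f} fF: Cin(high) / Cin(low) = {ratio:.3f}")
```

With these assumed values, each 10-fold increase in CW moves the high-frequency input capacitance closer to C1, which is the trend described above.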
Another way of reducing the effects of the shifting input capacitance is to put a buffer
in front of the C4 so that the previous circuit always sees the same load capacitance.
However, in many cases the real estate required for a buffer could be better spent
by simply increasing the size of the CW capacitor, because increasing CW has advantages
besides keeping Cin relatively fixed. The primary reason that CW is drawn in the C4 is to



































Figure 35. (a) C4 change in linearity with DIBL voltage. The linear range of a C4 bandpass filter is
improved by using source degeneration in the feedback path. A DIBL transistor is chosen to provide
an exponential relationship that is flatter than that of a regular transistor because of its very small Early
voltage. For clarity each curve is offset by +500 mV. The lower curve shows the output when the DIBL
voltage is closest to its ideal value. (b) A circuit implementation to provide a self-bias for the DIBL
transistor. The optimal DIBL voltage is determined by the bias point of a given C4; therefore a
separate circuit would be required for each filter.
capacitively divide the input signal and thus increase the input linear range.
3.4 Decreasing Distortion
The C4 uses a short-channel device, named a DIBL (Drain-Induced Barrier Lowering)
transistor, to decrease the loop gain in the feedback path and increase the overall linearity of
the device. A DIBL transistor is chosen to provide an exponential relationship that is
flatter than that of a regular transistor because of its very small Early voltage. During initial
characterization this device has its gate pulled to VDD in order to remove it from the loop;
however, an optimal bias point for the device is shown to improve linearity. Figure 35(a)
shows the decrease in the second harmonic as the DIBL bias approaches its optimal value.
The optimal DIBL voltage is dependent on the DC operating point, which changes with
corner frequency for each filter. Figure 35(b) shows a self-biasing scheme that is currently
being tested to work over all frequencies, thus increasing linearity without external biases.
In the case of an array of these devices, the self-biasing DIBL circuitry would be required
for each filter tap because VDIBL is dependent upon Vτl and Vτh, which change depending
























































[Figure 36 plot annotation: 1.9304 kHz. Panels (a) and (b).]
Figure 36. Example of applying correction factor to (a) low frequency and (b) high frequency corners.
The correction factor is calculated as a ratio of the actual time constant to the desired time constant
and is used in determining the correct bias current to program.
3.5 Programmed Filter Bank
An array of 32 C4s was fabricated with a 0.5 µm process available through MOSIS. While
the bandpass filter elements can be programmed to any desired center frequency spacing
and bandwidth, we will use the example of exponentially spaced center frequencies with
narrow bandwidths and moderate Qs, as this is a highly advantageous configuration in
audio signal processing. This type of configuration closely models the biology of the human
cochlea [cite] and allows subbands of frequency to be independently
manipulated, since a real-time frequency decomposition is occurring. Figure 28b shows
the frequency response of each of the 32 filter taps. These filters were programmed so that
the Iτls and Iτhs for each filter tap had exponentially spaced currents that were programmed
within 95% of their desired values.
By programming the filter bank to have exponentially spaced center frequencies with


































Figure 37. This plot shows the percentage error in corner frequency for a bank of filters, with a
single correction factor and with multiple, adjusted correction factors. The adjusted correction factors
account not only for mismatch in the circuits from filter to filter, but also for offsets resulting from the
programming algorithm. The final errors can be attributed to errors introduced in the programming




















































Figure 38. C4 filter bank response data. a.) All filter corners programmed to 100 Hz and 1 kHz. b.) All
filter corners programmed to 500 Hz and 5 kHz.
narrow bandwidths and moderate resonance, an input signal is broken down into its respec-
tive frequency components. This type of filter bank is extremely useful in signal processing

























































Figure 39. C4 filter bank response data. a.) All filter corners programmed tight to 5 kHz and 10 kHz.
b.) Bias currents programmed to a "nice" spacing rather than to explicit frequencies.
applications in which the specific algorithm is easier to perform on individual subbands of
the input signal, as is often done in auditory processing. These same signal processing
techniques can also be extended into the IF band with this same filter bank. Several
versions of this programmable array of bandpass filters have been fabricated using 0.5 µm and
0.35 µm processes available through MOSIS.
3.6 Programming Out Offsets
Figure 28 (b) illustrates one of the fundamental design issues with analog circuits:
no matter how accurately biases can be set, circuit performance is affected by
mismatches that occur during the fabrication process. The 32 traces from the programmable
bandpass array are monotonically spaced, which is a difficult task and has otherwise only been
achieved by very clever circuit design [71]. In contrast, this monotonicity and spacing was
simply programmed by the floating-gate biases. Inspection of Figure 28 (b) shows
that the corner frequencies are not perfectly spaced. However, this is of no concern because
these errors, which are due to mismatch of transistor and capacitor sizes, can simply be
programmed out with the floating gates.
Using the time constant equations and estimated values for the device constants and

























































Figure 40. First-order C4 filter bank response data. a.) C4 filter corners programmed to have a Q=2
with the high frequency starting at 64 kHz and all the other corners falling out from there. b.) C4
filter corners programmed to have a Q=3 with the high frequency starting at 64 kHz and all the other
corners falling out from there.
outset, the corners are chosen so as to be well-spaced to ensure that the high frequency
corner is independent of the influence of the low corner's time constant, and vice versa. The
magnitude of the frequency response for the filter is then measured. Frequency response
parameters can be easily extracted by polynomialization of the magnitude response. This
involves transforming the magnitude response data so that a linear regression can be applied
and the parameters extracted. The transformation for highpass and lowpass responses (and


























where A is the passband gain. A typical frequency response for the high corner of a C4
is shown in Figure 34 along with the regression of the resulting polynomial. Using the
extracted frequency response parameters, a correction constant is calculated as the ratio of the
actual time constant to the target time constant. This correction factor is then multiplied
with the original target current and the floating-gate devices are reprogrammed.
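One way such a transformation can work is sketched below for a first-order lowpass corner. It assumes the magnitude follows |H(f)| = A / sqrt(1 + (2πfτ)²), so that 1/|H|² is linear in f² with intercept 1/A² and slope (2πτ)²/A². This is an illustrative linearization and not necessarily the exact transformation used in this work; the data here are synthetic.

```python
import math

def fit_lowpass(freqs_hz, magnitudes):
    """Least-squares fit of 1/|H|^2 against f^2; returns (gain, tau)."""
    xs = [f ** 2 for f in freqs_hz]
    ys = [1.0 / m ** 2 for m in magnitudes]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
             / sum((x - x_mean) ** 2 for x in xs))
    intercept = y_mean - slope * x_mean
    gain = 1.0 / math.sqrt(intercept)
    tau = math.sqrt(slope / intercept) / (2 * math.pi)
    return gain, tau

# Synthetic measurement: hypothetical 2 kHz corner and passband gain of 1.5.
TRUE_FC = 2000.0
TRUE_GAIN = 1.5
freqs = [500.0 * k for k in range(1, 17)]
mags = [TRUE_GAIN / math.sqrt(1 + (f / TRUE_FC) ** 2) for f in freqs]

gain, tau = fit_lowpass(freqs, mags)
fitted_fc = 1.0 / (2 * math.pi * tau)
print(f"fitted gain = {gain:.4f}, fitted corner = {fitted_fc:.2f} Hz")
```

Because the fit is linear, it needs only a handful of points near the corner, which is consistent with the observation that sparser data give nearly the same accuracy.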





























Figure 41. Second-order C4 array programmed with a Q of 1.
programmable C4 circuit fabricated through MOSIS is shown in Figure 36. The plots show
both the original and the corrected frequency response curves of the low and high frequency
corners. In the case of the high frequency corner, the target corner frequency was 2.0 kHz;
after one correction step, the corner frequency was measured as 1.93 kHz, an error of about
3.5 percent. The error was within the five percent tolerance that was characteristic of the algorithm
used for floating-gate programming; it is reasonable to assume that an increase in the accuracy of
the programming algorithm would result in a further increase in the accuracy of the tuning
algorithm.
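The correction step can be written out as a short calculation. The sketch assumes that the corner frequency is inversely proportional to the time constant and proportional to the bias current, and it uses a hypothetical 1 nA target current together with the 2.0 kHz target and 1.93 kHz measurement quoted above.

```python
import math

def time_constant(corner_hz):
    """Time constant of a first-order corner at corner_hz."""
    return 1.0 / (2 * math.pi * corner_hz)

def correction_factor(f_target_hz, f_measured_hz):
    """Ratio of the actual time constant to the target time constant."""
    return time_constant(f_measured_hz) / time_constant(f_target_hz)

def corrected_current(i_target, f_target_hz, f_measured_hz):
    """Bias current to program on the next pass."""
    return i_target * correction_factor(f_target_hz, f_measured_hz)

I_TARGET = 1e-9  # hypothetical original target current in amperes

factor = correction_factor(2000.0, 1930.0)
print(f"correction factor = {factor:.4f}")
print(f"next programmed current = {corrected_current(I_TARGET, 2000.0, 1930.0):.4e} A")
print(f"residual corner error = {100 * (2000.0 - 1930.0) / 2000.0:.1f} %")
```

A measured corner below the target gives a factor above 1, so the next pass programs a slightly larger current.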
The correction factor was calculated initially for both the high and low frequency
corners in only a single filter. These single correction factors were then applied to all of the
filters in the filter bank, with all of the corner frequencies targeted for the same value. From
these results, correction factors unique to each filter were calculated and stored. The
corner frequencies were then reprogrammed using the unique correction factors. Figure 37


























C4 First−Order Filters used to make the third−order filter
Figure 42. C4 first-order filter bank measured and theoretical fit. The "dark" dotted line is the
theoretical filter used, while the solid line is the experimental filter response. The other two "lighter" dotted
lines are the theoretical responses used to create the third-order filter data. They do not refer to any of
the other experimental curves.
plot shows data from the filter bank programmed with a single correction factor and with
multiple, adjusted correction factors.
Biasing of these arrays is critical. The bias currents typically scale according to the
following relationship:
$$I_{n+1} = A \cdot I_n \tag{29}$$
where A is a scale factor larger than 1, and n goes from 1 to the number of sub-bands in the





























C4 Second−Order Filters.  Each tap has Q of 2
Figure 43. C4 second-order filter bank measured data and theoretical fit. The ”dark” dotted line is the
theoretical filter used, while the solid line is the experimental filter response.
3.7 Biasing Techniques
Since the bias values for each filter tap in the bank of filters are ideally related by a constant
scale factor, this raises some interesting questions about biasing methods for these
devices. The two methods that will be discussed are biasing using the floating gates directly
and biasing using a scaled bias generator of some kind; for our purposes we have used a
modified bias generator structure originally introduced in [7].
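The constant-scale relationship between adjacent taps can be made concrete with a few lines of code. This sketch generates a geometric sequence of bias currents for a 32-tap bank; the end-point currents are hypothetical, and the function name is illustrative only.

```python
def exponential_bias_currents(i_first, i_last, n_taps):
    """Return (scale, currents) with currents[k + 1] = scale * currents[k]."""
    scale = (i_last / i_first) ** (1.0 / (n_taps - 1))
    currents = [i_first * scale ** k for k in range(n_taps)]
    return scale, currents

# Hypothetical range: 10 pA to 10 nA across 32 taps.
scale, currents = exponential_bias_currents(10e-12, 10e-9, 32)
print(f"scale factor A = {scale:.4f}")
for n in (0, 1, 15, 31):
    print(f"tap {n + 1:2d}: {currents[n]:.3e} A")
```

Either biasing method has to realize this same sequence; they differ in whether each value is stored individually or generated from a small number of programmed parameters.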
3.7.1 Floating-gate Direct Biasing
One method is using floating-gate biasing directly. This method allows the current for
each filter element to be programmed independently. The benefit to this is flexibility and
also space because there is less overhead associated with simply programming each device















































































Figure 44. C4 third-order measured data and theoretical fit. a.) Bands used to create the Third-order
filter bank (data set 3). b.) Bands used to create the shifted Third-order filter bank (data set 3). c.)
Third-order filter bank (data set 3) showing difference by shifting one filter.
3.7.2 Programmable Bias Generators
Because the scale is constant between all filters, one can also use Bias Generators which




























C4 Fourth−Order Filters.  Each tap has Q of 2 or less


























C4 Fifth−Order Filters.  Each tap has Q of 2 or less
Figure 46. C4 fifth-order filter bank response data.
a predefined filter bank. However, to expand the spectrum decomposition to any arbitrary
range of frequencies, one must provide a means to set the corner frequencies to a set of
values after fabrication.
This can be accomplished by using programmable Bias Generators. These cells
integrate programmable elements to allow the spread of the bias values and the scale between
values to vary based on the programmed value in an analog memory element.
The advantage of this method is that all bands can be spaced across the frequency range of
interest without much programming. However, the cost of this ease of use is the extra
circuitry required to provide the spreading function of the bias currents.
In this chapter we have shown the design parameters that affect the frequency response,
gain, and linearity of a 5-transistor filter element. We have shown primarily experimental
results, along with some simulation, to verify the tunability and linearity of the device.
Taken together, these results yield a low-power, compact filter
that can be easily tuned over 6 orders of magnitude in frequency, with maximum Q factors
that can be explicitly designed into a single stage. Using these design parameters, one can
begin to design higher-order C4 filter banks for numerous signal processing
applications.




ANALOG SIGNAL PROCESSING BLOCKS
In this chapter we present the dynamics of several floating-gate computational building
blocks and accompanying experimental measurements. These blocks include programmable
signal spectrum decomposition, peak detection, vector-matrix multiplication for
performing analog frequency transforms, and, specifically in our case, cepstrum processing. This
chapter will also cover analog blocks that perform a trigonometric distance measure and
diffuser elements for wave propagation networks. Using these blocks we are able to begin
developing analog architectures as front-ends for larger digital or analog speech processing
and recognition systems.
4.1 Frequency Decomposition
Transforms are widely used and in a general sense can refer to any type of conversion
operation that may be applied to a signal. Some common transforms include the Fourier
Transform (FT), Discrete Fourier Transform (DFT), Wavelet Transform (WT), etc. Block
transforms have a fundamental issue of time-frequency localization. This is particularly
important to discuss when dealing with windowing issues associated with performing the
transform. The longer the window, the better the frequency resolution; the shorter the
window, the better the time resolution. The tradeoff is unavoidable and is limited by the
Fourier uncertainty principle, which also limits these circuits even though they operate on
continuous signals.
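The same tradeoff can be seen in a continuous-time resonant filter. As a minimal sketch, assume each filter behaves like an ideal second-order bandpass with center frequency f0 and quality factor Q: its −3 dB bandwidth is f0/Q and the envelope of its impulse response decays with time constant Q/(πf0), so the product of the two is fixed at 1/π regardless of tuning. The frequencies and Q values below are hypothetical.

```python
import math

def resonator_bandwidth_hz(f0_hz, q):
    """-3 dB bandwidth of an ideal second-order bandpass."""
    return f0_hz / q

def resonator_decay_time_s(f0_hz, q):
    """Envelope decay time constant, 2Q / omega_0, of the same filter."""
    return q / (math.pi * f0_hz)

for f0, q in ((100.0, 2.0), (1000.0, 2.0), (1000.0, 30.0)):
    bw = resonator_bandwidth_hz(f0, q)
    tau = resonator_decay_time_s(f0, q)
    print(f"f0 = {f0:6.0f} Hz, Q = {q:4.1f}: bandwidth = {bw:7.2f} Hz, "
          f"decay = {1e3 * tau:7.3f} ms, product = {bw * tau:.4f}")
```

Raising Q sharpens the frequency selectivity of a tap but lengthens its response in time by the same factor.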
The wavelet transform is a tool that cuts up data, functions, or operators into different
frequency components, and then studies each component with a resolution matched to its
scale. Wavelets provide a tool for time-frequency localization of a signal evolving in time
(e.g., the amplitude of the pressure on an eardrum, for acoustical applications). Our ear
uses a wavelet-like transform when analyzing sound, at least in the very first stage. Given
a signal f(t), one is interested in its frequency content locally in time. The standard Fourier
transform also gives a representation of the frequency content of f, but information
concerning time-localization of, e.g., high-frequency bursts cannot be read off easily from the
Fourier transform. Time-localization can be achieved by first windowing the signal.
Figure 47. Frequency response of the cochlea.
In some filterbank configurations, particularly when using octave bands, one obtains a
discrete-time wavelet series. Such a configuration has been popular in signal processing
less for its mathematical properties than because an octave-band or logarithmic spectrum
is more natural for certain applications, such as audio compression, since it emulates the
hearing process. Such an octave-band filter bank can be used, under certain conditions, to
generate wavelet bases, as shown by Daubechies [71].
This section explains the concept of spectral decomposition using analog programmable
filter bank elements. We present an analog filterbank operation that is
analogous to a continuous-time Fourier transform. Given specific implementation properties
of the filterbank, similarities to the wavelet transform are also possible. These elements
operate at extremely low power and lend themselves well to low-power real-time audio
signal processing. This bank of analog bandpass filters can be placed before an
analog-to-digital converter to perform a frequency decomposition of an audio signal and also allow
for analog preprocessing before the signal is converted to the digital domain. The individual
filter taps are easily tunable to any given frequency and bandwidth because capacitively
coupled current conveyers (C4s) are used with floating-gate transistors as biasing elements.
Offsets and mismatches within the circuit elements are shown to be inconsequential
because they can be accounted for and programmed out.
[Figure 48 signal labels: Vin(t), Vout(t)]
Figure 48. Filter bank approach to decomposition and similarities to classic way of thinking about
FFTs.










Figure 49. Schematic of a single C4 structure. The capacitors model all explicit and parasitic
capacitances in the signal path of the circuit.















Figure 50. Floating-gate version of a capacitively coupled current conveyer second-order section
(C4SOS).
filters has to be created in which the basic bandpass filter element must be relatively small
so that it can be placed in an array, the bandpass element must be easily tunable so that
multiple elements can cover the entire auditory spectrum, and the center frequencies of the
bandpass elements must follow an orderly spacing. Typically, for frequency decomposition,
exponential spacing is desired. Having a moderate amount of resonance (Q ≈ 30) is
also desirable for better isolation of the center frequency. The rest of this chapter discusses
how we have gone about designing and building this type of programmable bandpass array.
A similar operation to sub-band coding, one can view the classical transform operations





















[Figure 51 labels: C1 = 1.5*C2 (moderate Q); C2 = 1.5f, CW = 200*C2 (high Q, two stages); attenuation stages; buffer stages]
Figure 51. This figure illustrates the method used in the most recent filter bank chip, with attenuation on
the input as well as a buffer on the output. This allows for the most input swing for improved linearity,
as well as isolating the filter from capacitance shifts on later stages.
analytically, the specific transformation taking place. Because we are using constant-Q
filterbanks, these operations are similar to wavelet transforms. The difference here is that
for our designs, the maximum achievable Q is predesigned, while the resulting Q of each
filter depends on the placement of the high and low corners, which are both programmable.
In order to simplify the comparison with wavelets, we will describe all filterbanks as if they
have the same Q, which is required for the comparison.
Using these continuous-time filter banks for signal decomposition is possible. Each
core filter, and the higher-order filters built from this base structure,
has the following characteristic
$$H(s) = \frac{s\,(\omega_o/Q)\,H_{BP}}{s^2 + s\,(\omega_o/Q) + \omega_o^2}$$
where ωo is the center frequency and HBP is the pass band gain of the filter.
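The role of each parameter in this characteristic can be checked numerically. The sketch below assumes the standard second-order bandpass form H(s) = s(ωo/Q)HBP / (s² + s(ωo/Q) + ωo²) and verifies that the gain at the center frequency equals HBP and that the −3 dB band edges are separated by f0/Q; the center frequency and Q are hypothetical.

```python
import math

def biquad_bandpass(f_hz, f0_hz, q, h_bp=1.0):
    """Complex gain of the assumed second-order bandpass at f_hz."""
    s = 2j * math.pi * f_hz
    w0 = 2 * math.pi * f0_hz
    return s * (w0 / q) * h_bp / (s ** 2 + s * (w0 / q) + w0 ** 2)

def band_edges(f0_hz, q):
    """Lower and upper -3 dB frequencies of the same filter."""
    half = 1.0 / (2 * q)
    root = math.sqrt(1 + half ** 2)
    return f0_hz * (root - half), f0_hz * (root + half)

F0, Q, H_BP = 1000.0, 2.0, 1.0  # hypothetical tap
f_lo, f_hi = band_edges(F0, Q)

print(f"|H(f0)|   = {abs(biquad_bandpass(F0, F0, Q, H_BP)):.4f}")
print(f"|H(f_lo)| = {abs(biquad_bandpass(f_lo, F0, Q, H_BP)):.4f} at {f_lo:.1f} Hz")
print(f"|H(f_hi)| = {abs(biquad_bandpass(f_hi, F0, Q, H_BP)):.4f} at {f_hi:.1f} Hz")
print(f"bandwidth = {f_hi - f_lo:.1f} Hz (f0 / Q = {F0 / Q:.1f} Hz)")
```

Because the bandwidth scales with the center frequency for a fixed Q, a bank of such filters with exponentially spaced centers overlaps by the same relative amount at every tap.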
A filter bank of this type does not allow for perfect reconstruction because some
information will be lost during the filtering operation, where even within the pass band of the

























































Figure 52. First-order C4 filter bank response data. a.) C4 filter corners programmed to have a Q=2
with the high frequency starting at 64 kHz and all the other corners falling out from there. b.) C4
filter corners programmed to have a Q=3 with the high frequency starting at 64 kHz and all the other
corners falling out from there.
is applicable in many places. The key question in these applications is how much overlap
is reasonable between filters, which leads to the underlying design characteristics of the
filters themselves, such as the quality factor (Q) and pass-band gain.
The operation of this element is covered in much more detail in Chapter 3. For audio
signals, we are interested in using this element in the range from 100Hz up to 100kHz.
An array of 32 C4s was fabricated in a 0.5µm process available through MOSIS. While
the bandpass filter elements can be programmed to any desired center frequency spacing
and bandwidth, we will use the example of exponentially spaced center frequencies with
narrow bandwidths and moderate Qs, as this is a highly advantageous configuration in audio
signal processing. This type of configuration closely models the biology of the human
cochlea [cite], and it allows subbands of frequency to be manipulated independently
because a real-time frequency decomposition is occurring. Figure 52 shows the
frequency response of 16 of the 32 filter taps. Only the output from the first stage (a single
C4) is shown. These filters were programmed so that the Iτl and Iτh values for each filter tap were
exponentially spaced currents programmed to within 95% of their desired values.
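The exponential spacing described above can be sketched numerically. The helper names below are hypothetical, and the relation between the two corner frequencies and Q (center taken as the geometric mean of the corners, Q taken as center over bandwidth) is an assumption made for illustration; the mapping from corner frequency to programmed bias current on the actual chip is not modelled.

```python
import math

def exponential_centers(f_low, f_high, n_taps):
    """Center frequencies with a constant ratio between neighboring taps."""
    ratio = (f_high / f_low) ** (1.0 / (n_taps - 1))
    return [f_low * ratio ** i for i in range(n_taps)]

def corners_for_q(center, q):
    """Low and high corners, assuming center = sqrt(lo * hi) and Q = center / (hi - lo)."""
    half = 1.0 / (2.0 * q)
    root = math.sqrt(1.0 + half * half)
    return center * (root - half), center * (root + half)

# 32 taps spanning 100 Hz to 100 kHz, as in the range quoted in the text.
centers = exponential_centers(100.0, 100e3, 32)
corner_pairs = [corners_for_q(fc, 2.0) for fc in centers]
```

Because every tap uses the same Q, the bandwidth of each filter grows in proportion to its center frequency, which is the constant-Q behavior discussed earlier in this chapter.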





























[Figure 53 plot. Plot title: frequency response (final).]
Figure 53. Second-order C4 array programmed with a Q of 1.
narrow bandwidths and moderate resonance, an input signal is broken down into its respective
frequency components. This type of filter bank is extremely useful in signal processing
applications in which the specific algorithm is easier to perform on individual subbands of
the input signal, as is often done in auditory processing.
Useful applications:
1) Cepstrum Encoding ASR
2) Noise Reduction
3) Biological Hearing Models
Show outputs of speech data.
In this chapter we have shown the design parameters that affect the frequency response,
gain, and linearity of a five-transistor filter element. We have shown primarily experimental
results, along with some simulation, to verify the tunability and linearity of the device.
Taking all of these into account, this ultimately results in a low-power and compact filter
































Figure 54. Output from a speech signal with two segments of spoken words. The outputs for the
following filter banks are shown.
which can be easily tuned over 6 orders of magnitude in frequency, with maximum Q factors
that can be explicitly designed into a single stage. Using these design parameters, one can
begin to design higher-order C4 filter banks for numerous signal processing
applications.









































Figure 55. a.) The initial peak detector circuit using a classical diode for the input. This circuit suffered
from decreased input range because of the initial diode drop, as well as very large offsets. b.) Initial
step response data showing the adjustable time constant of the circuit.
The magnitude of each spectrum passes through a peak detector stage to produce a
constant magnitude output. This magnitude is similar to taking the power spectral density
or real spectrum of an input signal. At this point, phase information is unchanged; however,
the frequency response of each peak detector must be programmed to its respective
frequency band.





This current I = V2/R, where R is the total resistance of the bias transistor and diode. The
floating-gate transistor on the output provides an offset current to set the DC output voltage.









Each peak detector has an individually programmable corner frequency. Because the out-
























[Figure 56 schematic and plot, panels (a) and (b). Plot title: Peak Detector Output Characteristics.]
Figure 56. a.) The new peak detector circuit, which uses feedback to track signal peaks. This circuit
does not suffer from decreased input range due to diode drops. There are also no system offsets other
than those of the amplifiers, which are typically small and are minimized by the overall loop gain of the feedback
configuration. b.) Tracking data of the new peak detector as the bias current is changed.
within each band. The peak detector programming blocks are isolated similarly to the C4s.
The entire bank is treated as a single row, and within that row the individual elements are
accessed by column. Control circuitry on the rows and columns ensures isolation.
A new design for the floating-gate minimum and peak detectors is shown in Figures 57
and 56, respectively. The circuit uses a high-gain feedback loop to track the maximum and
minimum of input signals. The circuit is also capable of locking to the maximum/minimum;
however, for many signals with a varying amplitude, such as speech, it is advantageous for
the circuit to track these changes. The tracking speed for an instantaneous increase in the
speech signal depends on the bandwidth of the feedback amplifier, which is in the 10MHz
range, well above the frequencies of interest. However, as signal energy within a particular
band decreases, the tracking speed of the envelope detector is related to a linear decay





























[Figure 57 schematic and plot, panels (a) and (b). Plot title: Minimum Detector Output Characteristics.]
Figure 57. a.) The new min detector circuit, which uses feedback to track signal minima. This circuit
does not suffer from decreased input range due to diode drops. There are also no system offsets other
than those of the amplifiers, which are typically small and are minimized by the overall loop gain of the feedback
configuration. b.) Tracking data of the new min detector as the bias current is changed.
The initial design used a 100fF capacitor, which resulted in bias currents on the order of
picoamps to track signals in the kilohertz range. The latest version doubled the capacitor
to increase the low-end usable range of the circuit. This circuit has the added advantage of
being easily modified to operate as a minimum detector. The combined circuit performs a
signal envelope function, which is later used as our magnitude output from the filter banks.
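The tracking behavior described above can be illustrated with a simple behavioral sketch. It is an idealization under stated assumptions, not a circuit simulation: the attack is treated as instantaneous (standing in for the wide-bandwidth feedback amplifier), and the release is modelled as a linear droop whose slope is the bias current divided by the hold capacitance. The function names and numeric values are illustrative.

```python
def track_peak(samples, dt, i_bias, cap):
    """Follow rising inputs immediately; otherwise droop linearly at i_bias / cap volts per second."""
    droop = (i_bias / cap) * dt
    held = samples[0]
    out = []
    for x in samples:
        held = max(x, held - droop)
        out.append(held)
    return out

def track_min(samples, dt, i_bias, cap):
    """Minimum tracker, obtained by mirroring the peak tracker."""
    mirrored = track_peak([-x for x in samples], dt, i_bias, cap)
    return [-v for v in mirrored]

def envelope(samples, dt, i_bias, cap):
    """Peak-to-peak envelope formed from the two trackers."""
    highs = track_peak(samples, dt, i_bias, cap)
    lows = track_min(samples, dt, i_bias, cap)
    return [h - l for h, l in zip(highs, lows)]

# Illustrative values: 1 pA into 200 fF gives a droop of about 5 V/s.
peaks = track_peak([0.0, 1.0, 0.0, 0.0], dt=0.01, i_bias=1e-12, cap=200e-15)
```

In this model a larger capacitor or a smaller bias current slows the decay, which is one way to read the remark that doubling the capacitor extends the low-frequency end of the usable range.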
4.3 Linear transconductance
The initial stages of the system operate in voltage mode on both inputs and outputs, and up
to this stage that has worked well. At this point, however, we would like to convert our
frequency-dependent voltage signals into currents to be weighted during the transform
operation. This section explores the phases of development toward our transconductance
element, or more simply a voltage-to-current converter, designed using floating-gate
inputs. The element displays a linear transconductance response over a limited range
of voltage inputs. To increase the input range, we can implement capacitive source attenuation.
Capacitively coupling the inputs does more than simply attenuate the input signal; it
also allows us to apply our programmable memory concept to these cells in order to
cancel any offsets at the input. Experimental data is presented from circuits fabricated on a
0.5µm n-well CMOS process available through MOSIS.
Figure 58. Circuit diagram illustrating a floating-gate implementation of a linearized transconductance
stage.
Initially one would think to use a fully linear transconductance stage at this point, which
offers linearity along with classic and well-known design methodologies [34]. These techniques,
however, rely on the square-law behavior of transistors operating above threshold, which
increases the power dissipation of these stages. Again, assuming the final implementation will use an array
of these elements, power is of significant concern. Along these same lines, there are also
techniques that can operate with the devices in sub-threshold using what is called source
degeneration, and specifically for these circuits, the technique bears a striking resemblance
to diffuser concepts [76].
[Figure 59 plot. Plot title: Linear V-to-I Output Characteristics. Labels: V+, V-.]
Figure 59. Response of the linear transconductance stage used to change the differential input voltage
into a single-ended output current.
An initial design combined the classic and low-power transconductance linearization
techniques. The circuit shown in Figure 58 splits the
differential input current on each leg such that both input signals (V+ and V-) are applied to
each leg. Along with this, I included cross-connected transistors at the sources of the
devices to provide additional source degeneration. The final goal was a circuit with a
linear voltage-to-current relationship over a fairly wide input voltage range. The results
in Figure 59 show a final input range of 500mV, but a less than ideal linear response. There
is a significant drop in the transconductance around zero differential input voltage, due
primarily to the transistors used for source degeneration. Ideally we would want these devices
connected to a voltage relative to the common-mode input voltage, but for the sake of
area these voltages were connected to the input. This technique proved to be less than ideal
for the final application.
Figure 60. A. A schematic drawing of a hair cell with a recording electrode inserted into its cytoplasm.
B. Application of a mechanical force to the hair bundle deflects this elastic structure. C. When the
top of a hair bundle is displaced back and forth by a stimulus probe (lower trace), the opening and
closing of mechanically sensitive channels produces an oscillatory receptor potential (upper trace). D.
The sigmoidal relation between hair-bundle deflection (abscissa) and receptor potential (ordinate) for a
stimulated hair cell. Figure taken from [?].
Along a different train of thought, looking more at the biological structures involved in
auditory transduction, including the outer ear, cochlea, and inner hair cells, there are some








Figure 61. Circuit diagram illustrating a floating-gate implementation of a normal differential input
stage.
interest is the hair cells as shown in Figure 60. The hair cells transduce mechanical motion
on the cochlea into electrical signals processed by the brain. The transduction curves
are sigmoidal in nature, meaning they saturate beyond a given input magnitude. This is
caused by two physical characteristics of the hair cells: limited range of motion and limited
neurotransmitter response. Because of this, many other structures in the
very complex auditory pathway compensate for these limiting factors and maintain
the sensitivity of this portion of the auditory system, but we focus primarily on the
hair cells as they relate to our method of transducing voltages to currents at this critical
stage. Using this sigmoidal response as a basis, we can simply implement this stage as a
differential pair.
The well-known differential input pair, shown in Figure 61, has two major benefits
for this system: it is very compact and it is low-power, depending on whatever value we choose,






























[Figure 62 plot. Plot title: Output Characteristics of a differential pair.]
Figure 62. Response of the floating-gate differential input pair showing improved input linear range,
in addition to the desired saturation effects also seen in the hair cell in the auditory pathway. The
floating gates also allow the zero-bias point to be shifted by changing the charge on the floating gate.
the obvious choice being sub-threshold for consistency. The differential input pair equation




gives us the same sigmoidal response as an inner hair cell, as shown in Figure ??. The input
linear range for this circuit is typically on the order of +/- 50mV. In order to increase the
input linear range, we take advantage of our floating-gate techniques. This circuit is shown
in Figure 61; using capacitive attenuation, we are able to increase the input linear range
to +/- 250mV, or 500mV peak-to-peak, as shown in Figure 62.
Although the differential pair has a linear input range of up to 500mV peak-to-peak,
this circuit actively takes advantage of the saturation of the differential pair. This
saturation performs a limiting function on the inputs but still preserves the sensitivity for
small inputs, which is ideal for this circuit. The current outputs are also ideal as inputs to
the next stage, which is the vector-matrix multiplier block.
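The saturating characteristic and the effect of capacitive attenuation can be sketched with the standard sub-threshold differential-pair model, in which the output current follows a hyperbolic tangent of the input difference. This form is assumed here for illustration. The coupling coefficient, the thermal voltage, and the attenuation factor of 0.2 (chosen so that a +/- 250mV input maps onto a +/- 50mV range) are illustrative assumptions rather than measured parameters.

```python
import math

U_T = 0.02585   # thermal voltage kT/q near room temperature, in volts (assumed)
KAPPA = 0.7     # gate coupling coefficient (assumed, illustrative)

def diff_pair_current(delta_v, i_bias, attenuation=1.0):
    """Assumed sub-threshold differential pair: I = I_bias * tanh(kappa * a * dV / (2 * U_T)).

    `attenuation` models a capacitive divider at the floating-gate inputs (1.0 = none).
    """
    return i_bias * math.tanh(KAPPA * attenuation * delta_v / (2.0 * U_T))

# With a 5:1 attenuation, 250 mV at the input looks like 50 mV at the gate.
plain = diff_pair_current(0.05, 1e-9)
attenuated = diff_pair_current(0.25, 1e-9, attenuation=0.2)
```

The two currents above are equal in this model, which is the sense in which attenuation widens the linear range while leaving the saturating shape intact for large inputs.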
4.4 Programmable Multiplier
The analog differential multiplier shown in Figure 63 takes voltage input signals, scales
them by the stored weights, and generates an output current. The equation governing




































Figure 63. Voltage-mode floating-gate multiplier.
its operation is as follows:
I_{Out} = I_{SO} (W^+ + W^-) \cosh(\Delta V_{in} / V_y) - I_{SO} (W^+ - W^-) \sinh(\Delta V_{in} / V_y) \qquad (35)
The linearized case, including only the first-order terms, simplifies to:
I_{Out} = I_{SO} (W^+ + W^-) - I_{SO} (W^+ - W^-) (\Delta V_{in} / V_y)
where the two weights W+ and W− are programmable floating-gate voltage values. These
values can be programmed to any arbitrary value, but for operations involving spectrum
decomposition and transforms, they are programmed to cosine scale factors. Their
differential operation requires each pair to have a DC bias voltage, and from there the
cosine values are scaled around this bias. A sample application would be for each
row to act as a 32-tap DCT basis vector. Similar vectors are programmed into each row, and
typically they will be set to produce a wavelet type of decomposition by scaling each basis




























Figure 64. The programmable bump element and the corresponding output given three different pro-
grammed offset values.
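The cosine weighting described for the multiplier rows in Section 4.4 can be sketched as follows. Signed cosine coefficients are split into two positive weights placed symmetrically around a common bias, so that their difference carries the coefficient and their sum carries the bias. The function names, the DCT-II form of the basis, and the bias value are assumptions for illustration, and device constants such as the current scale and Vy are omitted.

```python
import math

def dct_row_weights(k, n_taps=32, bias=1.0):
    """Split row k of an assumed DCT-II basis into positive weight pairs (W+, W-)."""
    w_plus, w_minus = [], []
    for n in range(n_taps):
        c = math.cos(math.pi * k * (n + 0.5) / n_taps)
        w_plus.append(bias + 0.5 * c)    # W+ - W- equals the cosine coefficient
        w_minus.append(bias - 0.5 * c)   # W+ + W- equals twice the bias
    return w_plus, w_minus

def row_output(inputs, w_plus, w_minus):
    """Return (common-mode term, signal term) of the summed row current, up to device constants."""
    common = sum(p + m for p, m in zip(w_plus, w_minus))
    signal = sum((p - m) * x for p, m, x in zip(w_plus, w_minus, inputs))
    return common, signal

# A flat input has no content at k = 3, so the signal term is (numerically) zero.
wp, wm = dct_row_weights(3)
common_term, signal_term = row_output([1.0] * 32, wp, wm)
```

In this sketch the common-mode term is fixed by the bias alone, so it can be subtracted as a constant offset, while the signal term is the projection of the inputs onto the chosen cosine basis vector.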
4.5 Programmable bump element
Figure 65 shows the circuit and measured data for a single node in the VQ classifier array.
Each cell in the array compares the value of that column's input to the value it has
memorized; the output current flows out of the Vout node. This circuit is a variation on the
bump circuit [8], which compares the two inputs to this circuit; this cell returns a high value
if the two values match (minimal difference). The circuit performs a continuous distance










where Vk is the differential input voltage, Vmk is the resulting stored voltage representing the
ideal mean value for this particular element, κ is the coupling from gate to surface potential,
and UT = kT/q is the thermal voltage. This system outputs a measure of the similarity;
therefore, the outputs of all the elements can be added (by KCL), and the largest output identifies
the vector with the maximum similarity. The summed current outputs are sent through
a Winner-Take-All circuit that outputs the N largest results, where N can be 1 or more [41].
Programmable elements [21] are used at the inputs to store and subtract off each
cell's mean value. Setting the floating-gate charge establishes the mean value and
eliminates any mismatch between the two transistor pairs. Setting the size of the input
capacitor, as well as the other capacitors around the floating gate, sets the linear range
[Figure 65 schematic and plot. Schematic label: Vtun. Plot x-axis: Differential input voltage (V), from -1.5 to 1.5.]
Figure 65. Programmable VQ using floating-gate circuits. We reconfigure the VQ circuit so that
it fits within the standard floating-gate programming architecture and algorithms [40]. Capacitors
with the additional arrows are our symbol for a tunnelling junction, which is a MOS capacitor that can
also remove charge from the floating-gate node. We show experimental results after programming
eight cells to different offset voltages. The difference in the bump peaks is due to mismatch in the
MOSFET transistors. We reset the floating-gate charge using electron tunnelling, and program positive
or negative offsets using hot-electron injection. If we inject the floating gate associated with the positive
input terminal, then we increase the offset. If we inject the floating gate associated with the negative
input terminal, then we decrease the offset.
of the circuit, and therefore sets the width of the bump element. We increase the floating-gate
charge (remove electrons) by electron tunnelling, and decrease the floating-gate charge
(add electrons) by hot-electron injection.
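The classification flow of Section 4.5 (compare each input with a stored mean, sum the similarity outputs along a row, then pick the largest row) can be sketched behaviorally. The bell-shaped function used below, an inverse squared hyperbolic cosine, is only an assumed stand-in for the bump circuit's characteristic. The `width` parameter plays the role of the capacitor-set linear range, and all names and values are hypothetical.

```python
import math

def bump(v_in, v_mean, width):
    """Assumed bell-shaped similarity: 1.0 at a perfect match, falling toward 0 with distance."""
    return 1.0 / math.cosh((v_in - v_mean) / width) ** 2

def classify(inputs, codebook, width=0.25):
    """Sum per-element similarities for each stored vector (as KCL would) and pick the winner."""
    scores = [sum(bump(v, m, width) for v, m in zip(inputs, row)) for row in codebook]
    winner = max(range(len(scores)), key=scores.__getitem__)
    return winner, scores

# Two stored vectors; the input lies close to the second one.
stored = [[0.0, 0.5], [1.0, -0.5]]
winner, scores = classify([0.9, -0.4], stored)
```

Here the maximum is taken in software; on chip this selection is the job of the Winner-Take-All stage, and a wider bump makes the classifier more tolerant of deviation from the stored mean.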
4.6 Programmable diffuser element
This section presents a programmable diffuser using floating-gate circuits. We present the
dynamics of classical diffuser circuits and show the differences between the classical and
the programmable case. Programmable diffusers offer many advantages, including removing
individual element mismatch and giving the user the ability to reconfigure the
overall system behavior. Programmable diffusers can be used for spreading, just as in a




[Figure 66 schematics, panels (a) and (b). Labels: in[n], Vleak, Vdiffuse, Vfg[n].]
Figure 66. (a.) A "classical" diffuser element. Vdiffuse controls the lateral current propagation, while
Vleak controls the vertical current propagation. The input gate voltages to the leak and diffusive elements
are typically a single voltage, so the conductance of all elements is equal. (b.) A single floating-gate
diffuser cell. The floating-gate voltage of each element controls the individual conductances. The
individual conductances can be programmed to any arbitrary value. Some interesting effects can be
observed when the voltages are equal or increase/decrease down the length of the diffuser line. Using
floating gates to set the voltages of each node allows this circuit to remain compact.
Numerous applications require temporal filtering in either one or two dimensions. Simple
resistive grids have been designed to solve this problem. They offer a low-pass filtering
operation on the output signal and spread the input response across multiple elements.
Resistors are very expensive when used within integrated circuit technology. With
this in mind, denser networks have been designed that
use subthreshold MOSFETs as the diffusive elements. This offered huge savings in terms
of chip area and also gave way to more elaborate diffusive networks. These elements are
compact and programmable. The network characteristics are dependent upon the relationship
between the various floating-gate voltages and can be programmed to exhibit multiple
behaviors.
A diffuser network is a current-mode implementation of a one-dimensional resistive
network. Classic diffuser circuits have been used for many years in computational networks
that require varying degrees of spreading. They can also be used to perform local spatial
averaging, which can be used to improve signal-to-noise ratio or obtain a local reference



























Figure 67. (a.) Output current for a "classical" diffuser element. The change in output current is due
to the change in input voltage Vin. The input occurs at the center of the array, node Vk. Vdiffuse and
Vleak were held constant. (b.) Diffuser circuit with node voltages labelled. The difference equations are
calculated at each node and determine the direction of propagation down the diffuser line.
vision chips [57]. They have also been used to model layers of visual cortex when input
signals are distributed across multiple elements. This property acts as a low-pass filter by
softening any abrupt changes on the output signals.
Standard current-mode diffusers spread current from a given input node. These types of
networks can be realized using either passive or active components. Classic diffusers
have a constant conductance and a constant leak term at each node [57, 4]. Diffusers made
with active elements, such as transistors, can vary the conductance by changing the gate
voltage. This change is typically global. Vdiffuse controls the lateral current propagation,
while Vleak controls the vertical current propagation. By using floating-gate transistors, we







I_{in} = I(x), \quad \text{where } x = 0 \qquad (38)
with x = 0 representing the input node, and L the space constant. There is also a capacitance
at each node that results in low-pass filtering behavior, which is nonlinear.
Implementing a diffuser network using floating-gate elements to set the conductances
of the individual diffuser and leak elements has two distinct advantages. The first is cancelling
mismatch when floating-gate diffusers are used in the classical diffuser configuration with a global
constant conductance. By programming the floating-gate charge to a desired value, and thus
setting a desired conductance, offset is limited only by the accuracy with which one can program
a floating-gate element [22, 80]. The second is the ability to change the individual conductance
of each element. The relative floating-gate voltages can be programmed such that
the conductances of adjacent nodes increase in one direction and decrease in the other, thus
introducing a propagation gradient.
Figure 67 shows a standard diffuser circuit with node voltages labelled. Beginning at
node Vk, we define the node currents as if the nodes of the diffusive elements to the left
and right are sources with currents I1 and I2, respectively. The connected node of the leak
element is the source terminal of the element, with current Ik leaving the node. There is also
an input current from a synchronous input, which is labelled Iin,k, and a capacitance (C) at
each node, which produces a voltage delay in response to an input current pulse at the node.
Summing these node currents gives us:




Rearranging and grouping terms we get











Defining a backward propagation term and a forward propagation term, respectively, we get
B(k) = e^{-(\kappa / U_T)(V_{a,k} - V_{b,k})}, \qquad F(k) = e^{-(\kappa / U_T)(V_{a,k-1} - V_{b,k})}















































Figure 68. Diffuser line output voltage for each element vs. time with symmetrical conductances and
input applied to the center. (a.) Diffuser line output voltage for each element vs. time. There is a single
input applied at the center of the diffuser line. The time delay at each node is dependent upon the
conductance of each diffuser block and also the magnitude of the input pulse. (b.) Diffuser line output
voltage for each element vs. time with a larger input pulse.
and the previous leak element and F(k) the forward gate voltage differences between the
previous diffusive element and the current leak element, see Figure 67(b).
Using these simplifications and substituting for F(k) and B(k), we can further relate the
spatial delay to a time delay for each node, and move the delays to a unit delay in order
to rewrite everything in terms of the Z-transform in space. This results in the following
equation:
[...] \bigl(1 + B(k) + F(k)\bigr) I_k - B(k) I_k z^{-1} - F(k) I_k z \qquad (40)
From here we raise the question of relating the equation, whose components are discrete
in position, to wave propagation in the continuous case.
Using the Taylor series expansion for the discrete form of the second derivative, we get the
following equation:


















































Figure 69. Diffuser line output voltage for each element vs. time with symmetrical conductances and
input applied at each end. (a.) Diffuser line output voltage for each element vs. time for a single input
applied to the right of the diffuser line. (b.) Diffuser line output voltage for each element vs. time with
a single input applied to the left of the diffuser line. The output voltage drop looks linear for short
diffusion distances but is actually exponential. The current drop is linear, therefore the voltage drop is
exponential. This can easily be seen with a drop shown across the entire diffuser line.
This differential equation is capable of supporting diffusive and wave-propagating solutions.
From (40) we are able to get three distinct behaviors from the network.
(1) Diffusive Case: F = B
B [z - 3 + z^{-1}] = -B + B \underbrace{[z - 2 + z^{-1}]}_{d^2/dx^2}
(3) Propagating Case 2: F \ll B for all elements
-(1 + B) + B z^{-1} = -1 + B \underbrace{[z^{-1} - 1]}_{-d/dx}
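The diffusive and propagating behaviors can be explored with a small numerical sketch. It is a hedged illustration rather than the model used for the measurements: each node current is assumed to relax according to tau * dI_k/dt = I_in,k - (1 + B + F) I_k + B I_(k-1) + F I_(k+1), which matches the terms visible in Equation (40) if z^-1 is read as a shift to the previous node. Constant B and F along the line, the forward-Euler update, and the step size are all assumptions.

```python
def simulate_diffuser(n_nodes, input_node, b, f, steps=400, dt_over_tau=0.1):
    """Relax a 1-D diffuser line toward steady state with a unit input at one node.

    b weights the current arriving from node k-1 and f the current from node k+1
    (assumed reading of Equation (40)); the line is terminated with zero current.
    """
    state = [0.0] * n_nodes
    for _ in range(steps):
        padded = [0.0] + state + [0.0]   # zero current beyond both ends
        nxt = []
        for k in range(n_nodes):
            left, here, right = padded[k], padded[k + 1], padded[k + 2]
            drive = 1.0 if k == input_node else 0.0
            rate = drive - (1.0 + b + f) * here + b * left + f * right
            nxt.append(here + dt_over_tau * rate)
        state = nxt
    return state

symmetric = simulate_diffuser(11, 5, b=1.0, f=1.0)   # F = B: diffusive case
skewed = simulate_diffuser(11, 5, b=1.0, f=0.1)      # F much smaller than B
```

With F = B the settled profile is symmetric about the input node and decays on both sides. With F much smaller than B the profile leans to one side, which is the discrete counterpart of the first-derivative (propagating) term in the case analysis above.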














Figures 68 and 69 show experimental results for a system with all of the conductances
set to be equal, which is the diffusive case. Any input on a single node will
spread symmetrically to its neighbors on both sides. In order to create wave propagation,
the floating-gate elements can be programmed with a voltage gradient in the desired direction.
The system can propagate faster than seconds, but we show these speeds to show how




ANALOG SIGNAL PROCESSING SYSTEMS
5.1 Auditory Feature Extraction
This section presents a novel approach to programmable spectrum decomposition, analog
frequency transforms, and spectrum compaction. This system can act as a feature extraction
front end for larger digital or analog speech processing systems.
From the general model of speech production, speech is the convolution of an excitation
sequence, which is a pseudorandom sequence, with an impulse response of the vocal system
[9]. Extracting the excitation or vocal tract response from a speech signal is non-trivial
because they are combined by convolution:
s(n) = e(n) \ast \theta(n), \qquad (44)
where s(n) is the speech signal, e(n) is the excitation, and θ(n) is the vocal tract response.
Many applications require some knowledge, or at least an estimate, of either the pseudorandom
excitation or the vocal tract response, or both.
There are many different methods of feature extraction. Different feature extraction
techniques can be used depending on whether the desired features relate to auditory properties
or general signal properties. Examples of this are cepstrum processing, linear predictive
coding (LPC), and zero-crossing processing. The cepstrum has its roots in the characteristics of the
human auditory system in that a speech signal can be modelled as a random excitation sequence
convolved with the transfer characteristics of the physical articulatory components,
such as the larynx, vocal cords, tongue, teeth, and lips. LPC processing is dependent upon
correlations in the auditory sequence and statistics of previous sequence values. Other
techniques look at the zero crossings of a signal to extract useful information and base their
processing techniques on properties of biological components of the auditory system, such
as the hair cell [44, 45] and cochlea [54, 84]. Other implementations look at modelling the
overall auditory system [83, 48]. There may be other techniques, but these are some of
the most common. Regardless of the feature extraction technique, each has benefits and
hurdles, and one must understand the information that is obtained
from any of the coding techniques. For our purposes, cepstrum processing was chosen because
of its parallelism with a filter bank implementation, its resulting feature set, and because
it follows closely what has been implemented using DSPs.
Feature extraction techniques that have been implemented either in DSP or in analog/mixed-signal
ICs have different limiting factors. In DSP, limitations include power vs. computational
cycles, if power is a concern; otherwise computations are limited by real-time
implementation constraints. In the analog domain, design constraints include power, though
typically orders of magnitude less than in digital circuitry; chip or die area; signal factors such
as dynamic range and signal-to-noise ratio; and general mismatch in devices due to process
variance. For our purposes, one can follow good design procedures to minimize limiting
factors such as signal-to-noise ratio or process variance, but die area, for the algorithms we
are proposing, is a critical limiting factor. Performing real-time matrix multiplications,
frequency transforms, or high-order signal processing tasks in analog circuitry would
require huge die area to accommodate sample-and-hold circuitry.
The speech signal is modelled by a convolution of a random excitation sequence e(n)
with the vocal tract v(n). The excitation sequence can be modelled as primarily high-frequency
information, while the vocal tract changes at a much lower frequency. The cepstrum
essentially separates the high-frequency from the low-frequency data by performing
a log-compression on the spectrum information.
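The separation follows from a short chain of identities, sketched here with the symbols of Equation (44); the vocal tract response θ(n) is the sequence written v(n) in the paragraph above. The notation c_x(n) for the inverse Fourier transform of log|X(ω)| (the real cepstrum of x) is introduced here for illustration.

```latex
\begin{align*}
s(n) &= e(n) \ast \theta(n)
  && \text{convolution in time} \\
S(\omega) &= E(\omega)\,\Theta(\omega)
  && \text{product in frequency} \\
\log|S(\omega)| &= \log|E(\omega)| + \log|\Theta(\omega)|
  && \text{sum after log compression} \\
c_s(n) &= c_e(n) + c_\theta(n)
  && \text{sum after the (linear) inverse transform}
\end{align*}
```

Because the last step is linear, the slowly varying and rapidly varying parts of the log spectrum remain additive and land in different regions of the cepstral index n.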
Several kinds of spectral matching measures have been studied. In particular, spectral
matching measures based on linear predictive coding (LPC), which seem to be superior to
matching measures based on the filter bank or on the fast Fourier transform (FFT), have
been shown to be very effective in speech recognition, speech analysis, speech synthesis,
and speech coding.
The LPC cepstrum distance (CEP) was introduced by Atal (1974). Itakura (1975) introduced
an efficient measure called the Itakura distance, or the likelihood ratio measure, and
showed its usefulness for spoken word recognition. Moreover, he clarified its meaning as a
spectral matching measure in the frequency domain. Gray and Markel (1976) summarized
the LPC-based measures, together with the Cosh measure, from the standpoint of matching
in the frequency domain and successfully applied these measures to vector quantization.
The LPC cepstrum, the Itakura distance, and the Cosh measure show the same characteristics
in matching between similar spectra (Gray and Markel, 1976). According to our
evaluation of these measures by spoken word recognition (Shikano and Sugiyama, 1982)
and vowel verification (Shikano and Kohda, 1980), the Itakura distance is somewhat inferior
to the LPC cepstrum distance and the Cosh measure because of its asymmetrical
spectral matching in the frequency domain. The LPC cepstrum distance and the Cosh measure
demonstrate quite similar performance for word recognition and vowel recognition.
These measures, however, lack two characteristics of frequency-domain spectral matching.
One is a local peak-weighted matching, like formant matching. The other is a frequency
axis weighting such as that provided by Mel scaling.
Cepstral analysis is a special case within a general class of methods known as “homomorphic”
signal processing [61]. The cepstrum is able to resolve the two convolved pieces
of the speech, e(n) and θ(n), into two additive components that can then be separated or
analyzed using spectral (cepstral) analysis. In general, the spectrum Θ(ω) of the vocal tract
response is assumed to be smooth, with only slow changes as a function of frequency; the
spectrum E(ω) of the excitation signal is assumed to have rapid variations as a function of
frequency. When cepstrum analysis is used, the energy from log|E(ω)| will be mapped to
high values of n and the energy from log|Θ(ω)| will be mapped to low values of n. Therefore,
the two signals, ce(n) and cθ(n), will largely occupy different parts of the cepstral
(quefrency) axis and can be analyzed as separate entities.
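The separation described above can be reproduced on a small synthetic example. The sketch below is an illustration under stated assumptions, not the processing chain of this work: the excitation is a decaying pulse train with a period of 16 samples, the vocal tract is a smooth decaying exponential, the two are combined by circular convolution so that the discrete transforms multiply exactly, and a naive DFT from the Python standard library stands in for any spectrum analysis. All signal parameters are arbitrary.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (adequate for short illustrative signals)."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out

def circular_convolve(a, b):
    """Circular convolution, so that the DFT of the result is the product of the DFTs."""
    n = len(a)
    return [sum(a[m] * b[(t - m) % n] for m in range(n)) for t in range(n)]

def real_cepstrum(signal):
    """Inverse DFT of the log magnitude spectrum."""
    log_mag = [math.log(abs(v)) for v in dft(signal)]
    return [v.real for v in dft(log_mag, inverse=True)]

N, PITCH = 128, 16
excitation = [0.6 ** (t // PITCH) if t % PITCH == 0 else 0.0 for t in range(N)]
vocal_tract = [0.8 ** t for t in range(N)]
speech = circular_convolve(excitation, vocal_tract)

cepstrum = real_cepstrum(speech)
# Largest cepstral value away from the low-index region dominated by the vocal tract.
pitch_peak = max(range(8, N // 2), key=lambda q: cepstrum[q])
```

In this example the cepstrum of the combined signal is the sum of the cepstra of its two parts: the smooth vocal tract term is concentrated at low indices, while the pulse train produces an isolated peak at its period.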




Figure 70. Respective digital and analog cepstrum implementations. a.) The traditional cepstrum
computation, which is performed in digital circuitry. b.) Floating-gate system to perform cepstrum
front-end computation for speech processing systems. The system contains 32 frequency taps that can
be spaced arbitrarily by programming the corner frequencies for the bandpass filter banks. The peak
detectors provide a power spectrum of the input signal for any given time slice. Outputs are generated
by summing the output currents for each row through KCL.
output of the log|S(ω)| into critical band energies prior to performing the inverse DCT.
The Mel-cepstrum has certain qualities that make it useful for speech recognition. First,
the critical band filtering stage produces an output that is very similar to that observed at
early stages of human auditory perception. Second, the final DCT serves to decorrelate
the critical band energies for improved automatic pattern recognition performance. The
DCT also has the advantage over the Fourier transform that it yields real values. It is for
these reasons that the Mel-Frequency Cepstrum is the model used in designing our analog
cepstrum implementation.
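As a rough sketch of the final two stages described above (log of the critical band energies followed by a DCT), the example below uses a plain DCT-II written with the standard library. The function names, the choice of 32 bands and 12 coefficients, and the unnormalized DCT scaling are assumptions made for illustration only.

```python
import math

def dct_ii(values):
    """Unnormalized DCT-II; every output is real-valued."""
    n = len(values)
    return [sum(values[m] * math.cos(math.pi * k * (m + 0.5) / n)
                for m in range(n))
            for k in range(n)]

def cepstral_coefficients(band_energies, num_coeffs=12):
    """Log of the band energies followed by a DCT, truncated to num_coeffs."""
    log_energies = [math.log(e) for e in band_energies]
    return dct_ii(log_energies)[:num_coeffs]

# A flat set of band energies puts everything into coefficient 0.
flat = cepstral_coefficients([2.0] * 32)

# A spectrum that tilts downward across the bands excites coefficient 1.
tilted = cepstral_coefficients([math.exp(-0.1 * m) for m in range(32)])
print(flat[0], tilted[1])
```

Because the DCT basis vectors are mutually orthogonal, slowly varying structure in the log energies is concentrated in the first few coefficients, which is one way to view the decorrelation property mentioned above.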
Within the analog circuits domain, an analog Mel-Frequency Cepstrum is implemented
beginning with a band-pass filter approach to signal decomposition. Large arrays of floating
Figure 71. Analog Cepstrum building blocks. a.) Floating-gate C4 second-order section and its corresponding
frequency response. The high and low corner frequencies can be independently tuned for
each filter bank. Arbitrarily programmable corner frequencies allow these filters to be spaced linearly,
by octave, logarithmically, or at any other values desired by the user. b.) Differential floating-gate multiplier
structures multiply two differential signals by constant factors that are stored on the floating-gate elements.
c.) Floating-gate peak detectors. The frequency response of the peak detector is controlled by a
bias voltage which controls the gate of nFET M3. This element sets a constant resistance, and the total
RC value shifts the high corner frequency. The frequency response is shown for different values of vtau.
gate computational blocks have been developed that perform processing similar to the mel-cepstrum
algorithm. The processing is all done in continuous analog circuitry.
The basic building block of the cepstrum begins with a continuous spectrum decomposition
similar to a Discrete Fourier Transform (DFT). The spectrum decomposition is
done using differential Capacitively Coupled Current Conveyor (C4) second-order-section
bandpass filters [17]. The spacing of the bandpass filters is arbitrary because each can
be programmed to have a desired high-frequency corner and low-frequency corner [40].
These structures can also be cascaded to design higher-order filters. Figure 30 shows the
frequency response of the C4 structure for 1st-, 2nd-, and 3rd-order filters. For auditory
applications, Qs of up to 30 are useful, which would limit implementation to a minimum
of a 2nd-order filter structure. For simplicity, only one half of the differential structure is
shown in Figure 71. Programming the C4s is handled as if each filter were two floating-gate
elements. The entire filter bank is viewed as a single row, and the floating-gate elements for
the high and low corner frequencies are accessed by column. Control circuitry guarantees
injection isolation by latching the gate and drain voltages to the power supply for elements
within C4s not selected [39].
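To make the idea of programmable corner frequencies concrete, the sketch below computes one possible set of targets for a 32-tap bank. It assumes a generic bandpass model in which the center frequency is the geometric mean of the two corners and Q is the center frequency divided by the bandwidth; this is an illustrative assumption, not the C4 transfer function. The 100 Hz to 8 kHz range and Q = 4 are likewise hypothetical.

```python
import math

def log_spaced_centers(f_min, f_max, taps):
    """Center frequencies spaced by a constant ratio between f_min and f_max."""
    ratio = (f_max / f_min) ** (1.0 / (taps - 1))
    return [f_min * ratio ** k for k in range(taps)]

def corners_for(center, q):
    """Low and high corners with geometric mean `center` and bandwidth center/q."""
    half = 1.0 / (2.0 * q)
    root = math.sqrt(1.0 + half * half)
    return center * (root - half), center * (root + half)

centers = log_spaced_centers(100.0, 8000.0, 32)      # hypothetical range
bank = [corners_for(fc, q=4.0) for fc in centers]    # hypothetical Q

for fc, (low, high) in list(zip(centers, bank))[:3]:
    print(round(fc, 1), round(low, 1), round(high, 1))
```

Linear, octave or any other spacing only changes how `centers` is generated; each pair of corner values would then be the programming target for one filter.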
The magnitude of each spectrum passes through a peak detector stage to produce a constant
magnitude output. This magnitude is similar to taking the power spectral density or
real spectrum of an input signal. At this point, phase information is unchanged; however,
the frequency response of the peak detectors must be programmed to its respective fre-





This current is I = V2/R, where R is the total resistance of the bias transistor and diode. The
floating-gate transistor on the output provides an offset current to set the DC output voltage.









Each peak detector has an individually programmable corner frequency. Because the output
magnitude is continuous, this allows us to capture additional high-frequency content
within each band. The peak detector programming blocks are isolated similarly to the C4s.
The entire bank is treated as a single row, and within that row the individual elements are
accessed by column. Control circuitry on the rows and columns ensures isolation.
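The role of the programmable corner frequency can be seen in a simple behavioral model. The sketch below is an idealized discrete-time peak detector (instant attack, exponential decay with time constant tau), which is an assumption for illustration and not a model of the floating-gate circuit; the sample rate, tone frequency and tau are hypothetical.

```python
import math

def peak_detect(samples, dt, tau):
    """Track positive peaks: jump up instantly, decay with time constant tau."""
    decay = math.exp(-dt / tau)
    level, out = 0.0, []
    for x in samples:
        level = max(x, level * decay)
        out.append(level)
    return out

FS = 48000.0                                   # hypothetical sample rate (Hz)
tone = [0.5 * math.sin(2 * math.pi * 1000.0 * n / FS) for n in range(4800)]
envelope = peak_detect(tone, dt=1.0 / FS, tau=0.01)
print(min(envelope[-480:]), max(envelope))
```

A shorter tau lets the output follow faster changes within a band at the cost of more ripple at the band frequency, which is why each detector's corner should be matched to its own band.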
The analog differential multiplier is shown in Figure 71. The equation governing its
operation is as follows:
\[ I_{out} = I_{SO}(W^{+} + W^{-})\cosh(\Delta V_{in}/V_y) - I_{SO}(W^{+} - W^{-})\sinh(\Delta V_{in}/V_y) \tag{47} \]
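The linearized case, including only the first-order terms (cosh x ≈ 1 and sinh x ≈ x for small x), simplifies to:
\[ I_{out} \approx I_{SO}(W^{+} + W^{-}) - I_{SO}(W^{+} - W^{-})\,\Delta V_{in}/V_y \]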
Figure 72. Cepstrum system output. The system input is a sequence of speech from a standard speech
database. There are 12 continuous cepstrum coefficients calculated for this section of speech; adding more
coefficients is only a matter of chip area, since the calculation is performed in parallel analog circuits.
From the graph one can see the two distinct periods of speech.
Here the two weights W+ and W− are programmable floating-gate voltage values. These
values can be programmed to any arbitrary value, but for operations involving spectrum
decomposition and transforms, these values are programmed to cosine scale factors. Their
differential operation requires each pair to have a DC bias, and from there the cosine
values are scaled around this bias. Each row acts as a 32-tap DCT basis vector.
Similar vectors are programmed into each row, and typically they will be set to produce a
wavelet type of decomposition by scaling each basis vector frequency by a constant factor.
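The behavior of one multiplier row can be sketched numerically. The example below evaluates Eq. (47) and its small-signal expansion (cosh x ≈ 1, sinh x ≈ x), and then builds one row whose weight pairs are cosine values placed around a common bias. The unit values of I_SO and V_y, the bias of 1.0, the half-amplitude split of the cosine between W+ and W−, and all function names are assumptions for illustration.

```python
import math

def multiplier_current(dv, w_plus, w_minus, i_so=1.0, v_y=1.0):
    """Output current of one differential multiplier, following Eq. (47)."""
    x = dv / v_y
    return (i_so * (w_plus + w_minus) * math.cosh(x)
            - i_so * (w_plus - w_minus) * math.sinh(x))

def multiplier_linear(dv, w_plus, w_minus, i_so=1.0, v_y=1.0):
    """First-order expansion of Eq. (47) for small dv / v_y."""
    return i_so * (w_plus + w_minus) - i_so * (w_plus - w_minus) * dv / v_y

def dct_row_weights(k, taps=32, bias=1.0):
    """Weight pairs whose difference is the k-th DCT-II basis vector."""
    row = []
    for m in range(taps):
        c = math.cos(math.pi * k * (m + 0.5) / taps)
        row.append((bias + 0.5 * c, bias - 0.5 * c))
    return row

def row_output(inputs, weights):
    """Summed (KCL) output of one row in the linearized regime."""
    return sum(multiplier_linear(v, wp, wm)
               for v, (wp, wm) in zip(inputs, weights))

weights = dct_row_weights(1)
inputs = [0.01 * math.cos(math.pi * (m + 0.5) / 32) for m in range(32)]
total = row_output(inputs, weights)
offset = 2.0 * 1.0 * 32               # constant term from the bias on 32 taps
print("signal term:", offset - total)  # projection onto basis vector 1
```

In this linearized picture the row output is a constant offset minus the inner product of the inputs with the programmed cosine vector, so subtracting the offset leaves the transform coefficient.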
The mel-cepstrum, as used in digital signal processing (DSP), is based on a signal
sampled in time and in frequency. The analog cepstrum is an approximation to the mel-cepstrum
or cepstrum (depending on how the filters are defined) in which frequency is sampled
but time is not. The output of each filter contains information similar to the short-time
Fourier transform and can likewise be assumed to represent the product of the excitation
and vocal-tract within that filter band. The primary difference here is that the DSP mel-cepstrum
approximates the critical band log frequency analysis of the human ear by combining
DFT bands, while the analog system actually performs a critical band-like analysis
on the input signal. Thus higher-frequency critical band energies are effectively computed
using shorter basis functions than the lower-frequency bands. This is more in agreement
with analysis in the human auditory system and is better suited to identifying transients.
One other difference between the analog cepstrum described herein and the real cepstrum
described in Eq. (44) is that the magnitude function (inside the log) is estimated using
a peak detector rather than using the true magnitude of the complex spectrum.
Chips performing an analog cepstrum have been fabricated through MOSIS. Initial results
show that the analog cepstrum will be a useful component within speech recognition
systems. The full system is currently being tested, with preliminary results shown in
Figure 72. The system output from the analog peak detector was processed using simulated
multiplier cells in MATLAB.
5.2 Analog Pattern Recognition Blocks
5.2.1 Analog Vector Quantization
Vector quantization (VQ) is typically used in data compression and in classifying signals to
symbols [72]. For example, in speech processing VQ is used to reduce the set of detectable
spectrum vectors to a manageable set for later classification. The goal of VQ is to provide








































































Figure 73. Overview of Vector Quantization (VQ). (a) Single-input VQ is equivalent to Analog-to-
Digital Conversion. The incoming waveform is sampled and converted into digital symbols, often as a
literal representation of the incoming waveform. In this diagram, we input a single channel signal at
three time samples and observe that each signal is converted to the closest stored symbol (N-bit words
for an N-bit ADC). (b) Multiple-input VQ is a multidimensional extension of the single-input VQ case.
In this diagram, we input a multichannel signal at three time samples and observe that each signal is
converted to the closest stored codeword (N-bit codewords for an N-bit VQ). (c) One application for
VQ is in automatic speech recognition. In particular, a VQ block is critical in an analog IC front-end
for speech recognition [75].
complexity of signal processing algorithms such as classification. VQ, like any Analog-to-Digital
Conversion (ADC), is a lossy operation, but an ADC only classifies data in a
single dimension, or on a vector of length one, whereas a vector quantizer classifies data
in an arbitrary number of dimensions, or on a vector of length N. Figure 73 shows the basic














































































































































Figure 74. Basic circuit, architecture, and measurements from the VQ circuit. The core cell is built
from a floating-gate bump circuit, which allows the target mean value to be stored and subtracted from
the broadcast input signal. In 1 mm x 1 mm of 0.5 um die area, one should be able to get roughly
1k programmable bump elements. We see that the output current (experimental measurements) of the
middle leg of the bump circuit reaches a maximum at its center value and falls off exponentially as
one moves from that center value. This output current is summed together with the output from other
bump circuits. We also show experimental data when sweeping two inputs to see the multidimensional
structure of the bump function. We get a region of high response where both bumps occur, and far
away from the center, the response is nearly invariant with input voltage. Near the center of a bump in
one dimension, but away from the center of the bump in the second dimension, we still see a significant
response. To get an ideal VQ type bump, we would want to assume that our output is a log-encoded
version of the actual output. We also show a nearly exponential expanded result using a power of 10
circuit, which is fairly easy to perform using translinear techniques (most on-chip techniques for taking
the exponential of a current effectively result in a large power law).
Vector quantization can also be implemented in a programmable analog, floating-gate
based system. This analog implementation of VQ provides power vs. computation savings
not easily realizable in the equivalent digital VQ implementation. One can choose
to compute at higher input signal frequencies at the same power consumption, compute at
the same input signal frequencies at a much lower power consumption, or use a combination of
these approaches. This computational efficiency is important for low-power applications,
such as portable battery-powered devices. We see this technology as a fundamental component
in a low-power speech recognition system [75], as well as in other sensory processing
applications.
A VQ system will compute how far away a particular input vector is from the desired
target vectors, and pick the code vector that is closest to the input vector. We find
the closest code vector by choosing an appropriate distance metric, d(x,m), between the
incoming vector signal (x) and the desired or target mean value (m) for these signals. Two
particular measures we discuss are





where n and σ depend upon the problem and input statistics. The first approach is preferred
for real-time implementation, whereas the second is preferred for algorithmic reasons.
Previous IC implementations have used simple ||x−m|| metrics for their difference functions
[15, 5, 52].
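The selection step of a vector quantizer can be sketched in a few lines. The example below is illustrative only: it uses a sum of absolute differences raised to a power n as the distance, which is an assumed stand-in for the metrics discussed in the text, and the codebook values and function names are hypothetical.

```python
def distance(x, m, n=2):
    """Sum over dimensions of |x_i - m_i|**n (assumed form of the metric)."""
    return sum(abs(a - b) ** n for a, b in zip(x, m))

def quantize(x, codebook, n=2):
    """Index of the code vector with the smallest distance to x."""
    return min(range(len(codebook)), key=lambda k: distance(x, codebook[k], n))

codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]   # hypothetical code vectors
index = quantize([0.9, 0.1], codebook)
print("closest code vector:", codebook[index])
```

With a single-element input vector and a uniformly spaced codebook, the same function behaves like the analog-to-digital conversion case described for single-input VQ.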









where σ is dependent on circuit parameters. One variant uses an exponential function
of this metric, which effectively turns the summation into a product, resulting in a more
Gaussian-like formulation. This metric is close to the Gaussian approach and is easily implementable
in CMOS.
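The effect of applying an exponential to a summed metric can be checked numerically. In the sketch below each dimension contributes a Gaussian-shaped bump, which is an assumed stand-in for the circuit's bump response; σ, the test vectors and the function names are hypothetical. Summing the bumps leaves a large response when only one dimension matches, whereas the exponential of the summed squared differences equals the product of the bumps and is large only when every dimension matches.

```python
import math

def bump(v, m, sigma):
    """Assumed per-dimension bump: 1 at v == m, falling off away from it."""
    return math.exp(-((v - m) / sigma) ** 2)

def summed_response(x, mean, sigma):
    """Sum of the per-dimension bump outputs."""
    return sum(bump(a, b, sigma) for a, b in zip(x, mean))

def exponential_of_sum(x, mean, sigma):
    """exp(-sum of squared scaled differences): a Gaussian-like response."""
    return math.exp(-sum(((a - b) / sigma) ** 2 for a, b in zip(x, mean)))

def product_of_bumps(x, mean, sigma):
    result = 1.0
    for a, b in zip(x, mean):
        result *= bump(a, b, sigma)
    return result

mean, sigma = [0.5, 0.5], 0.3
off_axis = [0.5, 2.0]   # matches the first dimension only
print(summed_response(off_axis, mean, sigma),
      exponential_of_sum(off_axis, mean, sigma))
```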
5.2.2 Applications of Programmable Diffusers to HMM classifiers
In this section we show connections between dendritic processing structures and Hidden
Markov Model (HMM) decoding, and a circuit topology we believe can be used to implement
such a decoding structure using analog programmable diffuser networks. Going from
integrated circuits (ICs) toward biology, these simple spreading networks relate
well to cable theory and are similar to biological structures such as dendrites and cortical
cells. Going in the other direction, from ICs to classical digital signal processing (DSP),
these structures hold similarities to HMM decoders. Implementing both structures
requires a compact array of variable conductance elements. Using floating-gate transistors
allows us to get hundreds of state nodes in 1 mm x 1 mm of 0.5 um die area, and we
are able to individually vary the conductance of each diffuser element in the array, which
dramatically changes the analysis of these arrays. This approach illustrates the design synergy
of neuromorphic engineering: biological systems inspire engineering design, and
engineering practice inspires biological theory. This research provides a good example
of how an integrated-circuit approach can bridge signal processing techniques and neural
modelling.
Both techniques are based on diffusor networks, originally presented by Boahen et al.
[4]; classic diffusors have a constant conductance and a constant leak term. We use programmable
diffusor networks, built from floating-gate transistors, for which we first presented
experimental results elsewhere [76]. In the diffusor network, floating-gate transistors
set the conductance of each element individually, thereby cancelling mismatch and allowing
a desired conductance to be programmed. The result is that by scaling the floating-gate
voltages, particularly in a linear fashion, we can choose between classic diffusive behavior
[4], described by parabolic PDEs, and forward or backward wave propagation, described
by hyperbolic PDEs. Charge on each floating gate determines not only the directionality of
the wave, but also its speed of propagation.
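The difference between diffusive spreading and directed propagation can be illustrated with a toy line of nodes. The sketch below is a strongly idealized linear model, not the subthreshold transistor network: each pair of neighboring nodes exchanges a quantity u at a rightward rate g_right and a leftward rate g_left, there is no leak term, and the line length, rates and step size are hypothetical. Equal rates spread the initial packet symmetrically, while unequal rates also move its center, with a speed set by the rate difference.

```python
def simulate_line(g_right, g_left, steps, dt=0.01, n=64, start=32):
    """Explicit-Euler update of a 1-D line with directional coupling rates."""
    u = [0.0] * n
    u[start] = 1.0
    for _ in range(steps):
        new = u[:]
        for i in range(n - 1):
            # Net transfer from node i to node i + 1 during one step.
            flow = dt * (g_right * u[i] - g_left * u[i + 1])
            new[i] -= flow
            new[i + 1] += flow
        u = new
    return u

def center_of_mass(u):
    return sum(i * v for i, v in enumerate(u)) / sum(u)

diffusive = simulate_line(g_right=1.0, g_left=1.0, steps=1000)
directed = simulate_line(g_right=1.0, g_left=0.0, steps=1000)
print(center_of_mass(diffusive), center_of_mass(directed))
```

In this toy model the per-node rates play the role that the programmed floating-gate charge plays in the text: changing them selects both the direction and the speed with which a packet travels along the line.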
Dendritic computation is a mixture of digital and analog computing paradigms. Dendrites
are a major portion of neurons. They function as the inputs to the cell and use
complex connectivity to make a wide range of connection morphologies. They do not,
however, merely transmit data from pre-synaptic neurons to the cell body to which they
are connected; recent experimental measurements suggest 70 percent of power in the brain
is supplied at dendrites. Instead, they are instrumental in performing sometimes complex
computations. Since synapses (connections between cells) are made on the surface of these
cells, dendrites increase the number of connections which can be made while still optimizing
the cell to fit into a small space [13]. As we previously stated, it is the dendrites that
carry signals from pre-synaptic cells toward the cell body of the post-synaptic one (the cell






























Figure 75. Dendrite figure showing component structures. (a) The membrane of a cell (including
the dendrites) is comprised of a two-layer lipid and channels which span both layers of the lipid. The
channels selectively allow certain ions to pass across, and can be modelled by a transistor [12]. The
membrane itself does a very good job of separating charge and has been classically modelled with a
capacitor. (b) Inside the cell, ions spread from point A to B by means of diffusion. A subthreshold
MOSFET models this well, as electrons spread through the channel of a subthreshold MOSFET
by diffusion. Putting these pieces together gives us the classic diffusor [4], which is
similar to the circuit shown in (d). (c) A cartoon picture of a dendrite. The diameter of a dendrite
decreases exponentially as you move from the proximal to the distal end (from the cell body to the end of
the dendrite). (d) Here we have made the diffusive elements floating-gate elements to greatly reduce
the number of wires needed, and to facilitate programming of these voltages. (e) The classical
view that neurobiologists have used to model dendrites. It is similar to our view, but
falls short in several areas.
neuron is able to generate a response in the post-synaptic cell depends on several factors:
strength of the synapse, amplification by other excitatory signals, suppression by inhibitory
signals, the distance the signal must be transmitted, and the morphology of the dendrite
itself.
A Hidden Markov Model (HMM) can be viewed as a state machine in which the states
themselves are not observable, but an output, whose statistics are determined by the current
state, is observable. For example, in using an HMM to model speech production, the states
are the desired utterances (phonemes and words) and the observations are features of the
audio signal produced by the talker. The audio features are determined by the spoken word,
but they are randomly distributed since each time that same word is spoken it will sound a
little different. For recognition problems, the goal is to estimate the underlying states of the
state machine based on the observed outputs. For speech recognition, the HMM decoder
takes as inputs the signal statistics or features and generates a probability of occurrence for
each of a set of speech “symbols.” These “symbols” can be grouped over multiple short
windows to generate larger symbols, such as phonemes. The ongoing input train
of symbols is used to map a path through a trellis of probabilities for these larger blocks of
phonemes and words [66].
It is widely accepted that Hidden Markov Model decoding is one of the most robust
and accurate methods of speech recognition being used today. We contend that the reason
for the success of HMMs, particularly in speech recognition, lies in their similarities with
biological processing that is occurring at the dendrite level.
HMMs may be looked at as probabilistic state machines or as some sequential processing
structure. Here, we look at HMMs as propagating waves, where the probabilities relate to
the velocity of propagation. Previous work in this area used analog circuitry to decode the
HMM states [49]; this work very clearly explained the computational paradigm for HMM
classification, although the circuits were not elegant implementations.
Figure 76 shows our HMM branch implementation and HMM network implementation.
The transition equation is the key link between HMMs and dendritic computation
structures. Because of our ability to program the floating gates, we are able to implement
the following transition equation
\[ \phi_i[n] = b_i[n]\left(\phi_i[n-1] + \phi_{i-1}[n-1]\right) \tag{50} \]
using the diffuser structure
\[ \underbrace{\phi_i(t) - \phi_i(t-\delta)}_{\text{capacitor (a)}} = \underbrace{[b_i(t) - 1]\,\phi_i(t-\delta)}_{\text{leak transistor (b)}} + \underbrace{b_i(t)\,\phi_{i-1}(t-\delta)}_{\text{diffusion transistor (c)}} \tag{51} \]









This shows that we can directly implement (52) using a floating-gate diffusor network,
where the currents I_i, which we define as
\[ I_i = I_o\, e^{-\kappa V_{leak,i}/U_T}\, e^{\kappa V_i/U_T}, \]
represent the probability along that path.
In discrete HMM state decoding, a decoder is used with a similar structure to the true
HMM system. The decoder contains states corresponding to the hidden states in the HMM
system. Decoding is accomplished by selecting the most probable path through the hidden
states based on the observations [9] [58].
To facilitate HMM decoding in a discrete-time system, we associate a likelihood, φi(n),
with each state i in the decoder. Then, for each time n, the following steps occur:
• the outputs of the HMM system are observed, and the probability that the observed output
was produced by state i of the HMM system is estimated as bi(n);
• for each state i, the likelihood is updated based on bi(n), the cumulative previous
probabilities represented in the likelihoods of all states that can transition to state i,
and the probabilities of the respective transitions.
For the stereotypical speech production HMM, the likelihood update equation is
\[ \phi_i(n) = b_i(n)\left((1 - a_i)\,\phi_i(n-1) + a_{i-1}\,\phi_{i-1}(n-1)\right). \tag{53} \]
[65] For the case that the transition probabilities are equal (a common assumption in speech

















Figure 76. Circuit design for the HMM branch element as well as the corresponding HMM classifier
network. Our branch element design is based upon diffusor elements to perform the classical HMM
calculation. In this framework, each branch element exhibits wave propagation. We can build these
branch elements into an array for classification. In a practical implementation, we need to choose the
largest useful result, which in practice is done by a winner-take-all (WTA) circuit, where only a subset of winning outputs are
real outputs. These real outputs reset the HMM function. At the bottom of our branch element, we
also show the relationship between a dendrite cell and an HMM cell. The difference between the
two approaches is in the implementation of the wave propagation mechanisms and in how corresponding
inputs are allowed to enhance wave propagation. The HMM cell explicitly combines two state elements and
eliminates the leakage from the target cell. The dendrite first combines the probability to the resulting
state element and then uses nonlinear gain to transmit the result to the next element. The shaded areas
show the transistors that are signal dependent in each case.
to unity) leaving
\[ \phi_i(n) = b_i(n)\left(\phi_i(n-1) + \phi_{i-1}(n-1)\right). \tag{54} \]
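One time step of the likelihood update in Eqs. (53) and (54) can be written directly. The sketch below is illustrative: the function name, the list-based state representation and the numerical values are assumptions, and the first state is taken to have no predecessor.

```python
def update_likelihoods(phi_prev, b, a=None):
    """One step of the left-to-right update.

    With transition probabilities `a` this follows Eq. (53); with a=None the
    transition terms are dropped, as in Eq. (54).
    """
    phi = []
    for i in range(len(phi_prev)):
        stay = phi_prev[i] if a is None else (1.0 - a[i]) * phi_prev[i]
        if i == 0:
            enter = 0.0                      # no state to the left of state 0
        else:
            enter = phi_prev[i - 1] if a is None else a[i - 1] * phi_prev[i - 1]
        phi.append(b[i] * (stay + enter))
    return phi

phi0 = [1.0, 0.0, 0.0]                       # start in the first state
b_n = [0.2, 0.8, 0.1]                        # hypothetical observation probabilities
phi1 = update_likelihoods(phi0, b_n, a=[0.5, 0.5, 0.5])
phi1_simple = update_likelihoods(phi0, b_n)
print(phi1, phi1_simple)
```

Calling the function once per observation frame moves likelihood from left to right along the chain, and the value in the last state can be read out as the score for that pattern.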
In digital signal processing, the Viterbi algorithm is typically the process used to calculate a path
weight; however, a negative log-likelihood is typically used for reasons of computing
accuracy. Taking the log also simplifies the computation from multiplication to addition
[9] [58] and also increases the dynamic range of the probabilities. The log encoding compresses
the range of probabilities and ensures that accuracy is preserved.
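The numerical motivation for the log encoding is easy to demonstrate. The sketch below first shows a long product of small probabilities underflowing in double precision while the sum of logs stays representable, and then gives one log-domain step of a left-to-right Viterbi recursion, in which the sum over incoming paths is replaced by a maximum. The function name and the numerical values are hypothetical.

```python
import math

NEG_INF = float("-inf")

def viterbi_step(psi_prev, log_b, log_stay, log_move):
    """One log-domain Viterbi step for a left-to-right chain."""
    psi = []
    for i in range(len(psi_prev)):
        stay = psi_prev[i] + log_stay[i]
        enter = psi_prev[i - 1] + log_move[i - 1] if i > 0 else NEG_INF
        psi.append(log_b[i] + max(stay, enter))
    return psi

# Multiplying many small probabilities underflows; adding their logs does not.
probs = [1e-5] * 1000
direct_product = 1.0
for p in probs:
    direct_product *= p
log_sum = sum(math.log(p) for p in probs)

psi = viterbi_step([0.0, NEG_INF],
                   log_b=[math.log(0.2), math.log(0.8)],
                   log_stay=[math.log(0.5), math.log(0.5)],
                   log_move=[math.log(0.5)])
print(direct_product, log_sum, psi)
```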
In continuous-time HMM decoding, the likelihood update becomes continuous but the
states remain discrete. We define a continuous-time HMM decoder in the same spirit as
the discrete-time decoder, with a likelihood update equation:
\[ \frac{d\Phi_i(t)}{dt} = \gamma_i\left(\lambda_{i-1}\Phi_{i-1}(t) - \lambda_i\Phi_i(t)\right) + \beta_i(t). \tag{55} \]
In Eq. 55, Φi(t) typically represents a log-likelihood measure for state i; λi−1Φi−1(t) represents
the contribution from state i − 1 scaled by a probability of transition; λiΦi(t) is
the likelihood that the HMM is already in state i and that it is leaving state i; βi(t) is the
log probability that the current observation is generated by the HMM system state i; and
γi controls the speed of the system. Conceptually, the λ terms describe the propagation of
the likelihood from left to right, γi controls the speed of propagation, and βi increases the
likelihood in state i when the observed output matches that expected for state i in the HMM
system.
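Equation (55) can be explored with a simple forward-Euler integration. The sketch below is a numerical illustration under stated assumptions, not a circuit model: all γ and λ values are set to 1, the first state has no predecessor, and each input β_i(t) is a unit-area pulse lasting one time step. When the pulses arrive in state order with a spacing that matches the propagation delay, their contributions coincide at the last node; when they arrive in reverse order, they do not.

```python
def simulate_decoder(pulse_times, t_end=8.0, dt=0.01, gamma=1.0, lam=1.0):
    """Euler integration of Eq. (55); returns the peak of the last state."""
    n = len(pulse_times)
    phi = [0.0] * n
    pulse_steps = [int(round(t / dt)) for t in pulse_times]
    peak = 0.0
    for k in range(int(round(t_end / dt))):
        new = []
        for i in range(n):
            inflow = lam * phi[i - 1] if i > 0 else 0.0
            beta = 1.0 / dt if k == pulse_steps[i] else 0.0  # unit-area pulse
            new.append(phi[i] + dt * (gamma * (inflow - lam * phi[i]) + beta))
        phi = new
        peak = max(peak, phi[-1])
    return peak

peak_in_order = simulate_decoder([0.0, 1.0, 2.0, 3.0])   # inputs follow the wave
peak_reversed = simulate_decoder([3.0, 2.0, 1.0, 0.0])   # inputs oppose the wave
print(peak_in_order, peak_reversed)
```

Changing gamma in this sketch changes the delay per stage, so a given decoder responds most strongly to input sequences presented at a matching rate.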
The state transition behavior of continuous-time HMMs is similar to viewing multiple
waves travel along parallel paths with multiple inputs occurring at different places down
the path. Depending on the current state of the wave and the input, the wave will either
be amplified or attenuated. Outputs taken at the end of these paths are dependent upon the
coincidence of a wave phenomenon and a series of input sequences.
When applying this continuous-time HMM decoder to problems where the rate of the
state sequence under observation varies greatly, it may be desirable to implement multiple
HMM decoders for each state sequence. For example, if this HMM decoder is applied to
speech recognition, then two or three HMM decoders may be used for each phoneme to
optimally identify the same phoneme spoken fast or slow. The nature of the continuous-time
HMM is such that moderate variations in the rate of state changes do not present a
problem, but large changes may defeat the method of identification using coincident waves.
It should also be noted that the disadvantage of placing multiple HMM decoders for each
sequence on a chip is far outweighed by the small size of each decoder.
Using programmable diffusers gives us the ability to implement Hidden Markov Model
decoders on an analog chip. We achieve this using passive elements for the diffuser structure,
or spreading, and active elements to apply inputs at a given position and time. The
core structure is a small diffuser cell containing 2-3 transistors and capacitors. The compact
structure will allow us to implement very long HMM diffuser lines and/or many HMM diffuser
lines. Size is important since each pattern that we desire to recognize must have an
HMM decoder line associated with it [79] [49].
Figure 77. Results from an HMM branch with 32 nodes as a function of time. The input waveforms
varied regularly with time.
The core of the analog HMM decoder is the programmable diffuser. We store a
wave directionality term on each floating-gate element, and as a wave propagates down
a particular path, these stored values determine the amount of additional driving force a
wave receives as it passes through a node. Each node has a single input source, and it is the
coincidence of the passing wave and the input which determines the wave behavior.
Linking this explanation to the continuous-time HMM decoder takes us back to (55),
where the output at a single node is dependent upon a weighted sum of the propagating
wave and the current observation match to that node.
The programmable diffuser equation also shows a similar relationship between the current
state and previous behaviors. The floating-gate charge, as shown in Figure 76, provides
the weighting from the previous node to the current node, and the magnitude of the input scales
the input current at the particular node.
The state transition behavior of HMMs is similar to viewing multiple waves travel along
parallel paths with multiple inputs occurring at different places down the path. Depending
on the current state of the wave and the input, the wave will either be amplified or attenuated.
Outputs taken at the end of these paths are dependent upon the coincidence of a wave




Research presented here provides a substantial leap forward in the development of analog
computational systems, from the study of interesting device physics to fully programmable
analog networks, and lays a solid foundation for future development of large-scale, compact,
low-power analog computation systems. This work builds on earlier investigations of
floating-gate devices as computational memory elements. These investigations
led to the first fully analog system suitable for auditory feature extraction and recognition.
The core of this research has centered around the floating-gate device as an analog
programmable memory element. Extending this concept to computational arrays required
methods of accurately programming large numbers of these devices rapidly. Methods for
rapid programming, ultra-low programming and various evolutions of programming hardware
implementations and algorithms were also presented.
System-level investigations started with observation of the analog computational systems
for feature extraction and led to the first fully analog programmable bandpass filter
bank. The programmability was proven through experiments illustrating that each filter
bank was programmable, and because of the compactness of the filter, making large arrays
was fairly straightforward. Furthermore, this investigation identified the key issues associated
with implementing higher-order filter topologies, including effective input capacitance,
Q, input linearity, and programming accuracy for each filter. Results from this system were
presented.
With the filterbank as the initial processing block, additional blocks were presented to
further process the incoming signal to complete the feature extraction stage. These stages
included peak detection, a linear voltage-to-current transform, and a vector-matrix multiplication
step. The final feature extraction block was based on cepstrum-like processing,
but in the analog domain. Also, core processing blocks required for performing an analog
distance measure and implementing wave propagation networks were introduced. These
blocks were the core blocks used in the vector quantization and the hidden Markov model-like
decoding blocks, respectively.
This research concluded by presenting various signal processing systems that could
now be implemented using all of the pieces previously introduced. These systems included
a feature extraction block based on cepstrum processing, a vector quantization block, and a
hidden Markov model-like decoding block. With this system as an initial attempt, it may
now be said that analog computational systems can operate cooperatively with digital signal
processing systems, with this being the example. Analog processing blocks have been used
to perform some of the initial processing stages of speech recognition before passing their
results to a DSP for further processing. The result is an analog architecture for auditory
feature extraction and recognition.
The core concepts in this research are part of the underlying technology in the commercial
venture GTronix, Inc.
GTroniX, Inc.
Redefining the space of analog circuits through revolutionary signal processing.
Founded, February 2003.





[1] A, F. and H, P., “Offset Removal from Floating-Gate Differential Amplifiers
and Mixers.,” inProceedings of the IEEE Midwest Symposium on Circuits and Sys-
tems, (Tulsa, OK), Aug. 2002.
[2] A, D. V. and H, P., “Cooperative analog/digital signal processing,” in
World Conference on Systemics, Cybernetics, and Informatics, (Orlando, FL), July
2001.
[3] A, A. and B, K., “A 590,000 transistor 48,000 pixel, contrast sensitive,
edge enhancing, cmos imager-silicon retina,” inIEEE Symposium on Circuits and
system, 1995.
[4] B, K. and A, A., “A contrast-sensitive retina with reciprocal synapses,”
in Advances in Neural Information Processing Systems 4(M, J., ed.), San Mateo,
CA: Morgan Kaufman Publishers, 1991.
[5] C, G. and P, V., “A low-power cmos analog vector quantizer,”
IEEE Journal of Solid State Circuits, vol. 32, pp. 1278–1283, August 1997.
[6] D, T., “Silicon retina with correlation-based velocity-tuned pixels,” IEEE
Transactions on Neural Networks, vol. 4, no. 3, pp. 529–541, 1993.
[7] D̈, T. and S, A., “Bias current generators with wide dynamic range,”
in Proceedings of the IEEE International Symposium on Circuits and Systems.
[8] D̈, T., “Bump circuits for computing similarity and dissimilarity of analog
voltages,” inIJCNN-91-Seattle International Joint Conference on Neural Networks,
vol. 1, pp. 475–479, July 8-12 1991.
[9] D, J. R., P, J. G., and H, J. H. L.,Discrete-time Processing of Speech
Signals. New York: Institute of Electrical and Electronics Engineers Press, 2000.
[10] D.K and S, S., “A floating gate and its application to memory devices,”Bell
Syst. Tech. J., vol. 46, p. 1288, 1967.
[11] D, C., F, E., and H, P., “Practical issues using e-pot circuits,” in
Proceedings of the IEEE International Symposium on Circuits and Systems, vol. V,
(Phoenix, AZ), pp. 493 – 496, May 2002.
[12] F, E., “A biologically inspired silicon neuron,” Master’s thesis, Georgia In-
stitute of Technology, 2003.
[13] F, J. C. and H, K. M., “Dendrite structure,” inDendrites(S, G., S-
, N., and H, M., eds.), Oxford: Oxford University Press, 1999.
116
[14] F, G., “Digital signal processor trends,”IEEE Micro, pp. 52 – 59, Nov 2000.
[15] F, B., W, J., and A, A., “CMOS analog IC implementing the backprop-
agation algorithm,” inAbstracts of the First Annual INNS Meeting, vol. 1, p. 381,
1988.
[16] G, A., W, S., and Z, K., “Vector quantization techniques in speech cod-
ing,” in Advances in speech signal processing(F, S. and S, M. M., eds.),
pp. 3 – 48, New York: M. Dekker, 1992. Not terribly useful.
[17] G, D. W. and H, P., “Capacitively-coupled current conveyer second-order
section for continuous-time bandpass filtering and cochleamodeling,” inProceedings
of the IEEE International Symposium on Circuits and Systems, 2002.
[18] H, R. R. and K, C., “An analog VLSI implementation of a visual interneu-
ron: enhanced sensory processing through biophysical modeling,” International Jour-
nal of Neural Systems, vol. 9, pp. 391–395, Oct. 1999.
[19] H, R., B, J., H, P., M, B., , and D, S., “A cmos pro-
grammable analog memory-cell array using floating-gate circuit,” in IEEE Transac-
tions on Circuits and Systems Special Issue, vol. 48, p. 4, January 2001.
[20] H, P.,Foundations of Learning in Analog VLSI. PhD thesis, California Institute
of Techonology, February 1997.
[21] H, P. and L, T., “Overview of floating-gate devices, circuits, and systems,”
IEEE Journal of Circuits and Systems II, pp. 1 –3, January 2001.
[22] H, P., M, B. A., and D, C., “Adaptive circuits using pfet floating-gate
devices,” inProceedings of the 20th Anniversary Conference on AdvancedResearch
in VLSI, (Atlanta, GA), pp. 215–229, March 1999.
[23] H, P., M, B. A., and D, C., “An autozeroing floating-gate amplifier,”
IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing,
vol. 48, no. 1, pp. 74–82, 2001.
[24] H, P., “Continuous-time feedback in floating-gate mos circuits,” IEEE Journal
of Circuits and Systems II, pp. 56–64, January 2001.
[25] H, P. and A, D. V., “Cooperative analog-digital signal processing,” in
Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal
Processing, vol. IV, (Orlando, FL), pp. 3972–5, May 2002.
[26] H, P., D, C., M, B. A., and M, C. A., “Single transistor learning
synapses,” inAdvances in Neural Information Processing Systems 7(T, G.,
T, D. S., and L, T. K., eds.), pp. 817–824, Cambridge, MA: MIT Press,
1995.
[27] H, P., K, M., and M, B. A., “A transistor-only circuit model of the au-
tozeroing floating-gate amplifier,” inMidwest Conference on Circuits and Systems,
(Las Cruces, NM), 1999.
[28] H, P. and M, B. A., Floating-Gate Devices, Circuits, and Systems. IEEE
Press, 2002.
[29] H, P., S, P., D, C., G, C., D, J., and A, D., “A
floating-gate vector-quantizer,” inIEEE Midwest Circuits and Systems, (Tulsa, OK),
Aug. 2002.
[30] H, P., S, P., E, R., G, D., and A, D. V., “Biologically
inspired auditory sensing system interfaces on a chip,” in2002 IEEE Sensors Confer-
ence, vol. 1, (Orlando, FL), pp. 669–674, June 2002.
[31] H, P., S, P. D., F, E., and A, D. V., “A neuromorphic ic
connection between cortical dendritic processing and hmm classification,” inDSP
Workshop, Taos, 2004.
[32] H, H. and K, C., “Bifurcations in a vocal fold model,”
[33] H, W. J., “Pitch and voicing determiniation,” inAdvances in speech signal pro-
cessing(F, S. and S, M. M., eds.), pp. 49 – 84, New York: M. Dekker,
1992.
[34] H, S.-C., I, M., and Z, S. R., “A wide range differential difference
amplifier: A basic block for analog signal processing in MOS technology,” IEEE
Transactions on Circuits and Systems II, vol. 40, pp. 289–301, May 1993.
[35] H, M. J., “The speech signal,” inDigital Speech Processing; Speech Coding, Syn-
thesis and Recognition(I, A. N., ed.), pp. 43 – 71, Boston: Kluwer Academic
Publishers, 1992.
[36] K, P., S-S, E., and K, A., “An enhanced adaptive Q-
tuning scheme for a 100-mhz fully symmetric ota-based bandpss filter,”IEEE Jour-
nal of Solid-State Circuits, vol. 38, pp. 585 – 593, April 2003.
[37] K, J. M., “Accurate tuning curves in a cochlear model,”IEEE Transactions on
Speech and Audio Processing, vol. 1, pp. 453–462, October 1993.
[38] K, J., N, M., and G, P., “A dsp powered solid state audio system,”
in icassp, vol. 4, pp. 2283 – 2286, 1999.
[39] K, M., D, J., H, P., and A, D., “Programmable and adaptive
analog filters using arrays of floating-gate circuits,” inProceedings of the 21st Con-
ference on Advanced Research in VLSI, (Atlanta, GA), pp. 148 –162, March 2001.
[40] K, M., L, A., N, J., and H, P., “A programmable continuous-time
floating-gate fourier processor,”IEEE Journal of Circuits and Systems II, January
2001.
[41] K, M., H, P., D, J., and A, D. V., “Programmable and adaptive
analog filters using arrays of floating-gate circuits,” in2001 Conference on Advanced
Research in VLSI(B, E. and M, C., eds.), pp. 148–162, IEEE Computer
Society, March 2001.
[42] K, M., H, P., D, J., and A, D. V., “Programmable and adaptive
analog filters using arrays of floating-gate circuits,” in2001 Conference on Advanced
Research in VLSI(B, E. and M, C., eds.), pp. 148–162, IEEE Computer
Society, March 2001.
[43] K, M., L, A., H, P., and N, J., “A programmable continuous-time
floating-gate fourier processor,”IEEE Transactions on Circuits and Systems II: Ana-
log and Digital Signal Processing, vol. 48, pp. 90–99, January 2001.
[44] K, N. and A, G. C. A. G., “Level crossing time interval circuit for micro-
power analog vlsi auditory processing,” inProceedings of the IEEE Workshop on
Neural Networks for Signal Processing, vol. V, pp. 581 – 590, Aug. 1995.
[45] K, N. and A, G. C. A. G., “A circuit model of hair-cell transduction
for temporal processing and auditory feature extraction,”in Proceedings of the IEEE
International Symposium on Circuits and Systems, vol. 3, pp. 301 – 304, 1996.
[46] L, T. S., Neuromorphic Systems Engineering, Neural Networks in Silicon.
Kluwer Academic Publishers, 1 ed., 1998.
[47] L, J. and M, C., “A silicon model of auditory localization,”Neural Compu-
tation, vol. 1, pp. 47–57, 1989.
[48] L, J., W, J., and K, A., “Systems technologies for silicon au-
ditory models,”IEEE Micro, pp. 7 – 15, June 1994.
[49] L, J., W, J., and L, R., “A micropower analog VLSI HMM
state decoder for wordspotting,” inAdvances in Neural Information Processing Sys-
tems 9(M, M. C., J, M. I., and P, T., eds.), pp. 727–733, Cambridge,
Massachusetts: MIT Press, 1996.
[50] L, M. and S, E. H., “Fowler-nordheim tunneling in thermally grown
sio2,” Journal of Applied Physics, vol. 40, p. 278, 1969.
[51] L-B, V. and S-G, T., “On the design and character-
ization of femtoampere current-mode circuits,”IEEE Journal of Solid-State Circuits,
vol. 38, pp. 1353 – 1363, August 2003.
[52] L, J. and C, G., “A micropower learning vector quantizer for paral-
lel analog-to-digital data compression,” inI ternational Conference on Circuits and
Systems, (Monterey, CA), 1988.
[53] L, R. F., “A computational model of filtering, detection and compression in the
cochlea.,” inProc. of the IEEE Intl. Conf. on Acoust. Speech and Signal Proc., (Paris),
May 1982.
[54] L, R. and M, C., “An analog electronic cochlea.,”IEEE Transactions on Acous-
tics, Speech, and Signal Processing, vol. 36, pp. 1119 – 1134, July 1988.
[55] M, M. and D, R., “A silicon neuron,”Nature, vol. 354, no. 6345,
pp. 515–518., 1991.
[56] M, M. and M, C., “The silcon retina,”Scientific American, vol. 264, no. 5,
pp. 76–82, 1991.
[57] M, C.,Analog VLSI and Neural Systems. Massachusetts: Addison-Wesley, 1 ed.,
1989.
[58] M, T. K. and S, W. C., Mathematical Methods and Algorithms for Signal
Processing. Upper Saddle River, NJ 07548: Prentice Hall, 1 ed., 2000.
[59] N, S. T. and K, D. O., “A model for active elements in cochlear biomechanics,”
Journal of the Acoustical Society of America, vol. 79, pp. 1472–1480, 1986.
[60] N, J. D., M, B. K., B, E. A., DW, S. P., and H, P. E., “A
cmos coupled nonlinear oscillator array,” inProceedings of the IEEE International
Symposium on Circuits and Systems, vol. IV, (Phoenix, AZ), pp. 301 – 304, may
2002.
[61] O, A. V. and S, R. W., Digital Signal Processing. New Jersey:
Prentice-Hall, 1975.
[62] P, Y.-H., Adaptive Pattern Recognition and Neural Networks. Oxford University
Press, Inc., 1996.
[63] P, G.N.; DW, S., “Analogue vlsi morris-lecar neuron,”Electronics Letters,
vol. 33, pp. 997 – 998, June 5 1997.
[64] P, T. K. and A, P. E., “A highly accurate step-response-based successive-
approximation frequency tuning scheme for high-q continuous-time bandpass filters,”
IEEE Transactions on Circuits and Systems I, vol. 50, pp. 221 – 227, May 2003.
[65] R, L., “A tutorial on hidden markov models and selected applications in speech
recognition,”Proceedings of the IEEE, vol. 77, pp. 257–286, 1989.
[66] R, S., M, N., B, H., C, M., and F, H., “Connectionist
probability estimators in hmm speech recognition,” inIEEE Transactions On Speech
And Audio Processing, vol. 2, pp. 161–174, January 1994.
[67] S, C. and S, R., “A practical micropower programmable bandpass
filter for use in bionic ears,”IEEE Journal of Solid-State Circuits, vol. 38, pp. 63 –
70, January 2003.
[68] S-S, E. and S-M, J., “CMOS transconductance amplifiers,
architectures and active filters: a tutorial,” inIEE Proceedings - Circuits, Devices and
Systems, vol. 147, pp. 3 – 12, February 2000.
[69] S, R., Efficient precise computation with noisy components: extrapolting
from an electronic cochlea to the brain. PhD thesis, California Institute of Technol-
ogy, Pasadena, CA, 1997.
[70] S, R., “Analog versus digital: Extrapolating from electronics to neurobiol-
ogy,” Neural Computation, vol. 10, pp. 1601–1608, 1998.
[71] S, A. V., F, E., and V, E., “Improved silicon cochlea using compat-
ible lateral bipolar transistors,” inAdvances in Neural Information Processing Systems
8 (T, D., ed.), (Cambridge, MA), pp. 671–677, MIT Press, 1996.
[72] S, J., Pattern Classification, A Unified View of Statistical and Neural Ap-
proaches. New York: John Wiley and Sons, Inc., 1996.
[73] S, G., S, P., L, H., C, R., H, T., T, C., and P.H, “Au-
tomatic rapid programming of large arrays of floating-gate elements,” inProceedings
of the IEEE International Symposium on Circuits and Systems, (Vancouver, Canada),
2004. Accepted.
[74] S, K. and I, F., “Spectrum distance measures for speech recognition,”
in Advances in speech signal processing(F, S. and S, M. M., eds.), pp. 419
– 452, New York: M. Dekker, 1992.
[75] S, P. and H, P., “The analog speech recognition project,” inProceedings
of the IEEE International Conference on Acoustics, Speech,and Signal Processing,
vol. 4, (Orlando, FL), pp. 3988–3991, May 2002.
[76] S, P. and H, P., “A programmable diffuser circuit based on floating-gate
devices,” inMidwest Circuits and Systems, (Tulsa, OK), Aug. 2002.
[77] S, P., K, M., E, R., H, P., and A, D. V., “Mel-frequency
cepstrum encoding in analog floating-gate circuitry,” inProceedings of the IEEE In-
ternational Symposium on Circuits and Systems, vol. IV, (Phoenix, AZ), pp. 671–674,
May 2002.
[78] S, P. D., G, D. W., C, R., and H, P., “A five-transistor bandpass
filter element,” inProceedings of the IEEE International Symposium on Circuits and
Systems, (Vancouver, Canada), 2004. Submitted.
[79] S, P. D., H, P., and A, D. V., “Analog speech recognition project,” in
Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal
Processing, vol. 4, (Orlando, FL), pp. 3988–3991, 2002.
[80] S, P. D., K, M., and H, P., “Accurate programming of analog floating-
gate arrays,” inProceedings of the IEEE International Symposium on Circuits and
Systems, vol. 5, (Phoenix, AZ), pp. 489–492, May 2002.
[81] T, C., “Calculation of TMS320LC54x power dissipation,” Application Report
SPRA164, Texas Instruments, Digital Signal Processing Solutions - Semiconductor
Group, June 1997.
[82] W, M., H, P., and A, L., CMOS Neural Networks for Pattern Associa-
tion. Special Feature, IEEE Micro, October 1989.
[83] W, K. and S, S. A., “A functional model of the early auditory system.,”
in Proceedings of the IEEE-SP International Symposium, Time-Frequency and Time-
Scale Analysis, pp. 45 – 48, Oct 4-6 1992.
[84] W, L., K, D., L, R., and M, C., “Improved implementation of the
silicon cochlea.,”IEEE Journal of Solid-State Circuits, pp. 692 –700, May 1992.
[85] W, M., O, M., M, M., and S, E., “Speech recognition in ana-
log multichannel cochlear prostheses: initial experiments i controlling classifica-
tions.,” in IEEE Transactions on Biomedical Engineering, pp. 1002 –1010, Oct 1990.
[86] W, E. and S, S., “Comparison of linear prediction cepstrum coefficients
and mel-frequency depstrum coefficients for language identification,” inI ternational
Symposium on Intelligent Multimedia, Video and Speech Processing, (Hong Kong),
pp. 95 – 98, May 2001.
[87] Z, M., “Comparison of four approaches to automatic language identification of
telephone speech.,” inIEEE Transactions on Speech and Audio Processing, vol. 4,
pp. 31–44, January 1996.
