FEEDFORWARD ARTIFICIAL NEURAL NETWORK DESIGN UTILISING SUBTHRESHOLD MODE CMOS DEVICES by COUE, Dominique Xavier Henri Leon
FEEDFORWARD ARTIFICIAL NEURAL 
NETWORK DESIGN UTILISING 
SUBTHRESHOLD MODE CMOS 
DEVICES 
by 
Dominique Xavier Henri Leon COUE
A thesis submitted to the University of Plymouth 
in partial fulfilment for the degree of 
DOCTOR OF PHILOSOPHY 
School of Electronic, Communication and Electrical Engineering
Faculty of Technology 
June 1997 
Dominique Xavier Henri Leon COUE 
FEEDFORWARD ARTIFICIAL NEURAL NETWORK DESIGN
UTILISING SUBTHRESHOLD MODE CMOS DEVICES
Abstract 
This thesis reviews various previously reported techniques for simulating artificial
neural networks and investigates the design of fully-connected feedforward networks
based on MOS transistors operating in the subthreshold mode of conduction, since such
devices are suitable for implementing compact, low power, implantable pattern recognition systems.
The principal objective is to demonstrate that the transfer characteristic of the devices
can be fully exploited to design basic processing modules which overcome the problems
of linearity range, weight resolution, processing speed, noise and component mismatch
associated with weak inversion conduction, and so can be used to implement
networks which can be trained to perform practical tasks.
A new four-quadrant analogue multiplier, one of the most important cells in the 
design of artificial neural networks, is developed. Analytical as well as simulation 
results suggest that the new scheme can efficiently be used to emulate both the synaptic 
and thresholding functions. To complement this thresholding-synapse, a novel 
current-to-voltage converter is also introduced. The characteristics of the well known 
sample-and-hold circuit as a weight memory scheme are analytically derived and 
simulation results suggest that a dummy compensated technique is needed to obtain the
required minimum of 8 bits weight resolution. The performance of the combined load and
thresholding-synapse arrangement and of an on-chip update/refresh mechanism is
analytically evaluated, and simulation studies on the Exclusive OR network as a
benchmark problem indicate a useful level of functionality.
Experimental results on the Exclusive OR network and a 'QRS' complex detector 
based on a 10:6:3 multilayer perceptron are also presented and demonstrate the potential 
of the proposed design techniques in emulating feedforward neural networks. 
Human intelligence is defined concretely as the ability to 
construct a new, unique, accurate response to each new, unique experience which confronts 
each human at each moment of his/her existence. 
Harvey Jackins The Human side of Human Beings (1978) 
Abbreviations 
AN Artificial Neuron 
ANN Artificial Neural Network 
CMOS Complementary Metal Oxide Semiconductor 
DAC Digital-to-Analogue Converter 
EBP Error Back-Propagation 
ECG electrocardiogram 
EEPROM Electrically Erasable Programmable Read Only Memory
FGMOS Floating Gate Metal Oxide Semiconductor 
FIR Finite Impulse Response 
HRes Horizontal Resistance 
LSB Least Significant Bit 
MLP Multi-Layer Perceptron
NETSIM NETwork SIMulator 
op amp operational amplifier 
ON Output Neuron 
PE Processing Element 
RAM Random Access Memory 
RMS Root Mean Square 
SPICE Simulation Program with Integrated Circuit Emphasis 
TS Thresholding-Synapse 
TTS Transconductance-Thresholding-Synapse 
VCO Voltage Controlled Oscillator
VLSI Very Large Scale Integration 
WP Weight Perturbation 
XOR Exclusive OR 
Contents 
1 Introduction 1
1.1 Biological to artificial neural networks: an overview 1 
1.1.1 The brain 2 
1.1.2 Sensory systems 4 
1.1.3 The effectors 5 
1.1.4 Artificial neural networks 5 
1.2 Artificial neural network models 9 
1.2.1 Synapse 9 
1.2.2 Neuron 10 
1.2.3 A neuron and its synaptic connections 13
1.2.4 A perceptron 14 
1.3 Neural network topologies 14 
1.3.1 Feedforward neural networks 15 
1.3.2 Feedback neural networks 16 
1.4 Brief history of artificial neural networks 17 
1.5 Overview of simulation of artificial neural networks 20 
1.6 Motivation for the study 25 
2 VLSI implementation of artificial neural networks 41 
2.1 Synapse designs 43 
2.1.1 Resistor-based synapse 43 
2.1.2 Transistor-based synapse 44 
2.1.3 Hybrid transistor/capacitor-based synapse 58 
2.2 Adder designs 59 
2.2.1 Current-based summer 59 
2.2.2 Charge-based summer 60 
2.3 Activation function generator 60 
2.3.1 Squashing function based on a transimpedance amplifier 61 
2.3.2 Diode-based current-to-voltage converter 62 
2.3.3 Thresholding generator for pulse-stream 64 
2.4 Weight storage 65 
2.4.1 Digital weight storage 66 
2.4.2 Capacitive weight storage 67 
2.4.3 EEPROM technology for weight storage 68 
2.5 Summary 70 
3 Subthreshold design for feedforward neural networks 92 
3.1 Statement of problems 93 
3.2 Design method 95 
3.2.1 Possible circuit implementation 96 
3.2.2 Current memory circuit 97 
3.3 The conceptual building blocks of the Neural Network 102 
3.3.1 A four-quadrant multiplier 102 
3.3.2 The load 106 
3.4 Circuit Implementation 109 
3.4.1 Combined load and transconductance-thresholding-synapse 110 
3.4.2 Speed of the network 112 
3.4.3 Weight range 114 
3.5 Summary 114 
4 Design and implementation of a neural network chip 128 
4.1 The complete circuits of the neural network 129 
4.1.1 The transconductance-thresholding-synapse 129 
4.1.2 The load 132 
4.1.3 The pre-processing cell 133 
4.1.4 The output layer activation cell 136 
4.1.5 The core of a feedforward neural network 137 
4.2 Weight storage scheme 137 
4.2.1 Capacitive storage 138 
4.2.2 The refresh mechanism 141 
4.3 Simulation of a neural network 145 
4.4 The neural network chip 147 
4.4.1 Exclusive OR network 148 
4.4.2 QRS complex detector 148 
4.5 Layout techniques 149 
4.6 Summary 150 
5 Performance of the prototype neural network chip 172 
5.1 Experimental characteristics of the basic building modules 173 
5.1.1 Transconductance-thresholding-synapse 173 
5.1.2 Current-to-voltage converter 174 
5.1.3 Horizontal resistance 175 
5.1.4 Output neuron 176 
5.1.5 Weight storage scheme 177 
5.1.6 Refreshing mechanism 179 
5.2 Training algorithm for analogue feedforward neural networks 181 
5.2.1 Back-propagation 184 
5.2.2 Weight perturbation 186 
5.3 Test set-up 189 
5.4 Performance of the analogue neural networks 191 
5.4.1 Exclusive OR network 191 
5.4.2 QRS complex detector 193 
5.5 Summary 195 
6 Conclusions 219 
6.1 Discussion of results 219 
6.2 Recommendations for future work 222 
6.2.1 Weight storage 222
6.2.2 On-chip learning 223 
6.2.3 Learning algorithm 223 
6.2.4 Sensitivity to temperature 224 
6.3 Summary 224 
Bibliography 226 
A The MOS transistor 239 
A.1 Strong inversion 239
A.2 Weak inversion 242 
B CMOS parameters for MIETEC 2.4 µm double poly, double metal process 247
C Analysis of a diode-based current-to-voltage converter 249
D Analysis of a diode-based current-to-voltage converter incorporating a feed-back mechanism 252
E Switch-induced error voltage on an elementary PMOS-based sample-and-hold circuit 255
F Switch-induced error voltage on a dummy-compensated PMOS-based sample-and-hold circuit 260
G Yield analysis of the digital-to-analogue converter 265
H Published papers 268 
List of Figures 
1.1 Block diagram representation of the nervous system 28 
1.2 Typical biological neuron cell 29 
1.3 Stained section of brain taken from the visual cortex of a rat 30 
1.4 Reconstitution of a section of a vertebrate retina 31
1.5 Partial representation of an artificial neural network signal processor . . . . 32 
1.6 Electronic-circuit representation of a model neuron and its synaptic 
connections 33 
1.7 Silicon retina of Mead and Mahowald 33 
1.8 Symbolic notation of a synapse 34 
1.9 Symbolic notation of a neuron 34 
1.10 Activation functions 35 
1.11 A neuron and its synaptic connections 36 
1.12 A perceptron 36 
1.13 Single-layer feedforward neural network 37 
1.14 Multi-layer feedforward neural network 38 
1.15 Single-layer recurrent network 39 
1.16 McCulloch-Pitts model of a neuron 40
1.17 NETSIM neurocomputer 40 
2.1 Resistor-based synapse 72 
2.2 An MOS transistor synapse 73 
2.3 Static characteristics of an MOS transistor synapse 73 
2.4 Dual-transistor synapse 74 
2.5 Static characteristics of a dual-transistor synapse 74 
2.6 Cross-coupled quad synapse 75 
2.7 Static characteristics of the cross-coupled quad 75
2.8 Differential-pair multiplier 76 
2.9 Static characteristics of the differential-pair in saturation MOS 77 
2.10 Static characteristics of the differential-pair in subthreshold MOS 77 
2.11 Sign switching cell 78 
2.12 Modified Gilbert cell 78 
2.13 Static characteristics of the modified Gilbert multiplier in saturation 
MOS 79 
2.14 Static characteristics of the modified Gilbert multiplier in 
subthreshold MOS 79 
2.15 MOS version of the Gilbert multiplier 80 
2.16 Static characteristics of the modified Gilbert multiplier in saturation 
MOS 81 
2.17 Static characteristics of the modified Gilbert multiplier in 
subthreshold MOS 81 
2.18 Wide range Gilbert multiplier 82
2.19 Switched-capacitor synapse 83
2.20 Current summer 84 
2.21 Squashing function based on a transimpedance amplifier 85 
2.22 Static characteristic of the transimpedance amplifier 86
2.23 I/V converter implemented utilising a CMOS arrangement of biased
diodes 87
2.24 Static characteristics of the diode-based I/V converter 88
2.25 Thresholding generator for pulse-stream 89 
2.26 Digital memory based on weighted current sources 90 
2.27 Sample/hold memory cell 91 
2.28 Floating-gate MOS transistor 91 
3.1 Local section of a fully connected feedforward neural network 116 
3.2 Feedforward neural network with the activation function distributed 
over the following synapses 117 
3.3 Transconductance multiplier based on the modified Gilbert cell 118
3.4 Current memory cell 119 
3.5 NMOS version of Pain and Fossum current memory cell 120 
3.6 Circuit diagram of the four-quadrant multiplier 121 
3.7 Static characteristics of the four-quadrant analogue multiplier 122 
3.8 Current-to-voltage converter comprising a feedback mechanism 123
3.9 Static characteristics of the current-to-voltage converter for various
supply voltages 123
3.10 Analogue implementation of a section of a feedforward neural 
network 124 
3.11 Sigmoid activation function 125 
3.12 Analogue implementation of a fully connected feedforward neural 
network 126 
3.13 Simplified small-signal equivalent circuit of a TTS and load 127 
3.14 Transient characteristics of the combined TTS and load circuits 127 
4.1 The complete circuit of a transconductance-thresholding-synapse 152
4.2 Array of TTS circuits 152 
4.3 The complete circuit of the load 153 
4.4 Horizontal resistor 154 
4.5 Static transfer characteristic of the horizontal resistance 155 
4.6 Static transfer characteristics of the transconductance-thresholding-synapse
fed by a horizontal resistance 155
4.7 Circuit diagram of the output neuron 156 
4.8 The complete analogue implementation of a fully connected 
feedforward neural network 157 
4.9 Basic PMOS-based sample-and-hold circuit 158 
4.10 Level of clock feedthrough voltage of a single 2.4/2.4 µm PMOS
switch 158 
4.11 PMOS-based dummy-compensated sample-and-hold circuit 159
4.12 Level of clock feedthrough voltage of a PMOS-based 
dummy-compensated sample-and-hold circuit 159 
4.13 The complete refreshing mechanism 160 
4.14 Circuit diagram of the digital-to-analogue converter 161 
4.15 DAC yield versus current-source mismatch 162 
4.16 Circuit diagram of the differential amplifier 163 
4.17 Simulation results of the open-loop frequency response of the 
differential amplifier 164 
4.18 XOR neural network 165 
4.19 Simulated transient characteristics of the XOR network 166 
4.20 Structure of the QRS complex detector 167
4.21 Layout of a matrix of TTSs 168
4.22 Floor plan of the neural network chip 169 
4.23 Photomicrograph of the prototype chip 170 
5.1 Experimental set-up for measuring the transfer characteristics of the 
TTS 197 
5.2 Measured static transfer curves of the TTS 198 
5.3 Test configuration used for measuring the characteristics of the load . . . 199 
5.4 Experimental static characteristics of the I/V converter 199
5.5 Experimental set-up for measuring the characteristics of the horizontal 
resistance 200 
5.6 Experimental static characteristic of the horizontal resistor 200 
5.7 Experimental static characteristics of the transconductance-thresholding-synapse
when fed by the horizontal resistance 201
5.8 Measured static transfer characteristics of the output neuron 202 
5.9 Transient characteristics of the output neuron when loaded by its 
associated sensing scheme 203 
5.10 Variation of the TTS differential output current during the charge 
leakage process 204 
5.11 Effect of clock feedthrough and capacitive coupling onto the output of
the TTS during weight refreshing 205 
5.12 Transient characteristic of the refreshing mechanism 206 
5.13 Signal flow of the error back-propagation algorithm 207 
5.14 Signal flow of the weight perturbation algorithm 208 
5.15 ECG training patterns 209 
5.16 Learning curve of the computer simulated QRS complex detector 209 
5.17 Experimental set-up for training and testing the analogue ANNs 210 
5.18 Learning curve of the analogue XOR neural network using the weight 
perturbation algorithm 211 
5.19 Performances of the XOR network when trained using stepwise mode . . 212 
5.20 Transient characteristics of the XOR network 212 
5.21 Learning curve of the analogue QRS detector network using the 
weight perturbation algorithm 213 
5.22 Frequency response of the linear phase FIR filter 214
5.23 Example of QRS detection for medium quality ECG 214 
6.1 Transconductance-thresholding-synapse incorporating an
FGMOS-based memory device 225 
A.1 An n-channel MOS transistor 244
E.1 Equivalent lumped models of the basic PMOS-based sample-and-hold
circuit 259
F.1 Equivalent lumped models of the PMOS-based dummy-compensated
sample-and-hold circuit 264
List of Tables 
4.1 The chip features 171 
5.1 Output current offset of the individual TTSs 215 
5.2 Nominal resistance of the I/V converter for various bias currents 216
5.3 Output voltage offset of the individual I/V converters 216
5.4 Output voltage offset of the individual HRess 217 
5.5 Output current offset of the individual ONs 217 
5.6 DACs bit current sources 218 
5.7 Performances of the analogue and digitally simulated QRS complex 
detectors 218 
A.1 List of Symbols 245
Acknowledgements 
I would very much indeed like to thank the following people for their help and support 
during the course of this research: 
Dr G. Wilson for suggesting the field of study, introducing me to research, and his 
encouragement over the last three years. 
F. Fuchs who has freely given his time to many hours of discussion on the subjects of 
MOS transistor and clock feedthrough phenomenon and his constant stream of 
assistance. 
All the technical and administrative staff within the department, particularly Mr K.
Jackson, Mr D. Groom and Mr A. Jerram for their support with the CAD facilities. 
Dr P. Culverhouse for his valuable help in ANN and for critically reading this thesis and 
making suggestions for improvements. 
Y. Tcha and G. Novello of the Institut Universitaire de Technologie d'Angers,
Maine-et-Loire, France, for building the experimental set-ups. 
Finally, I am deeply grateful to my wife Heather for reading the manuscript in full and
for making many helpful suggestions to improve the English. 
Author's declaration
At no time during the registration for the degree of Doctor of Philosophy has the author 
been registered for any other University award. 
This study was financed with the aid of a studentship from the University of Plymouth. 
A relevant scientific conference was attended and the following papers published: 
Coue, D. and Wilson, G.: 'CMOS subthreshold-mode I/V converter for analogue neural 
network applications', Electronics Letters, Vol. 32, pp. 990-991, 1996. 
Coue, D. and Wilson, G.: 'A four-quadrant subthreshold mode multiplier for analog 
neural network applications', IEEE Transactions on Neural Networks, Vol. 7, pp. 
1212-1219, 1996. 
Signed
Date 
Chapter 1 
Introduction 
1.1 Biological to artificial neural networks: an 
overview 
Creativity has been an important, fulfilling and driving element throughout the evolution
of humankind and has, for example, driven development from the wheel through to
modern vehicles. Today, we would identify the von Neumann computer as a key 20th
century invention. Although state of the art computers can now outperform biological 
neuron systems in executing tasks such as repetitive arithmetic operations, their abilities 
to perform complex functions such as image processing in real time are inadequate due 
to their limited processing power. In contrast to electronic computers, human 
(biological) neural systems seem to offer enormous computing power combined with an 
amazing flexibility. The desire to duplicate human computing performances artificially 
has been one of the principal driving forces in the quest for a greater understanding of 
the human neural system. The computational abilities of artificial neural systems are 
based on the collective operations of a dense network of inter-connected Artificial 
Neurons (ANs). As with biological systems, networks of artificial neurons draw their 
powerful computing abilities from the fact that they can be taught to perform desired 
tasks. These characteristics have led to the increasing use of artificial neural systems as 
a solution for a wide range of complex applications, for example, automatic speech 
recognition and image processing.
A description of the structure and function of the human nervous system and of the 
development of artificial neural networks is given in the following sections. The human 
nervous system has received a great deal of attention during the past century, providing 
an ever better understanding of its operation [1-11]. This outcome has been enhanced 
by the combined efforts of researchers involved in cognate disciplines, such as 
neurobiology, neuropsychology, physics, mathematics, computer science and electrical 
engineering. Due to its extreme complexity, the human nervous system may be viewed, 
in a rough caricature, as a three-stage system depicted in the block diagram of Fig. 1.1 
[1], [12]: the brain, the receptors and the effectors.
1.1.1 The brain 
The central part of the nervous system is the brain, described in Fig. 1.1 as the neural 
network. The term neural network has been derived from the fact that the human brain 
tissue is made up of a network of nerve cells also referred to as neurons. The human 
brain is thought to contain around 10^11 of these neurons. Although structural features
may vary from cell to cell, the arrangement of a biological neuron, shown in Fig. 1.2
[12], consists of three major elements: 
• A cell body, called the soma; 
• A long transmission-line like structure, called the axon; and 
• A branching structure, comprising what are called dendrites, where the
neuron picks up signals from other neurons.
In general terms, the cell body of a neuron collects incoming signals from its dendrites 
and aggregates them. If the sum reaches a certain level, referred to as the thresholding
level, a pulse is sent (or "fired") down the axon which abuts other neurons and dendrites. 
The signal generated by the neuron and transported along its axon is an electrical 
impulse. This electrical information is subsequently passed on to other nerve cells via 
connecting links which are also commonly referred to as synapses. A synapse is 
believed to be an electrochemical transmitter, in the sense that it converts a presynaptic 
electrical signal into a chemical signal and then back into a postsynaptic electrical signal 
[2], [6]. Some synapses are excitatory in that they tend to promote firing, whereas others 
are inhibitory and so are capable of cancelling signals that otherwise would excite a 
neuron to fire. A typical neuron may receive information from anywhere between 
hundreds and thousands of adjacent nerve cells and in turn feeds a similar number of 
other neurons. This may give a rough indication of the number of synapses that make up 
the human brain which is understood to be in the order of 10'**. Fig. 1.3 shows a stained 
section of brain taken from a cat cortex [1]. This section indicates that the cell bodies
may be arranged in a layered fashion. Although this arrangement suggests that the
information flows principally in a forward direction, i.e. from receptors to effectors, it 
appears that there are also large numbers of lateral and feedback connections. 
A biological neuron may be viewed as an input/output Processing Element (PE); in 
the sense that it collects input signals from either receptors or neighbouring neurons and 
produces an output impulse, depending upon both the incoming signals and the state of 
the synaptic connections. Based on this assumption, the brain could be viewed as a 
massive parallel computer that enables human beings to understand, generate speech, 
recognise images, generate movements and make complex decisions involving 
reasoning. However, this analogy is probably too simplistic in the sense that brain 
mechanisms involve complex electrical and chemical relationships that are not yet fully 
understood [6] and that the central nervous system cannot be viewed as a whole. The 
human brain is a wonderfully well-organised and structured system. Investigations into
the organisation of the brain have established that specific areas of the cerebral cortex
are dedicated to elementary sensory and motor functions, such as Broca's area in the
left frontal lobe associated with the ability to speak and Wernicke's area related to the
ability to understand natural language [4], [9]. 
1.1.2 Sensory systems 
As mentioned earlier, the central nervous system continuously receives information, 
processes it, and makes appropriate decisions. The principal sources of information for 
the brain are the receptors, which are modified nerve cells that are specialised in
transforming externally generated stimuli into electrical signals. Some sensors
respond to light, others to chemicals (taste and smell) and still others to mechanical 
deformation (touch and hearing). It may be pointed out that our knowledge of the 
structure and function of these sensory neurons is more developed than that of the cells 
making up the central nervous system, since they are more readily accessible [1]. Fig. 
1.4 shows a reconstituted section of a vertebrate retina. The structure of the retina is 
similar to that found in the visual cortex, see Fig. 1.3, in the sense that the inner nerve 
cells are organised in layers. The outer layer of the retina is made up of photoreceptors 
that transduce light into electrical signals. 
1.1.3 The effectors 
An effector may be best described as an electrochemical transmitter, in the sense that it 
transforms an electrical signal into a chemical signal that activates a neuromuscular 
junction and subsequently generates a muscular contraction [8]. The electrical signal 
that activates few or many effectors, depending on the muscle function, is generated by 
a motor-neuron whose cell body is located in the spinal cord. A motor-neuron has a 
similar structure to that of a neuron located in the cortex. However, some of its 
characteristics, such as size of the axon and velocity of nerve-impulse, may vary from 
one to another, depending on the muscles innervated. 
1.1.4 Artificial neural networks 
Research that has led to today's knowledge of the human nervous system has not only 
been driven by medical [10] and psychological [11] purposes but also by the fact that
the brain has the ability to perform complex functions, such as thinking and reasoning, 
that are not yet achievable by any other means. The fact that the nervous system is able 
to perform such functions with ease, precision and at remarkable speed has fascinated
humankind, in much the same way that the universe fascinated Copernicus four centuries
ago. This interest, combined with the revolution in electricity since its discovery
by the British physicist Faraday, less than two centuries ago, and the discovery of the
fundamental processing element, referred to as the transistor, by the American physicist
Shockley and his co-workers, no more than half a century ago, has led to the
possibility of duplicating some neuronal functions artificially. The aims of the research
into artificial neural systems in the past five decades have been to understand the 
operation of nervous functions, as well as to model their behaviour using mathematical 
tools and to try to simulate those using either software on a digital computer or 
dedicated integrated circuits. 
The field of artificial neural system research is vast and diverse and, from an
engineering viewpoint, it may be divided into three distinct areas, in much the same
way as the biological nervous system was described earlier:
pattern/signal processing, adaptive sensors and motor control. Although this thesis
concentrates mainly on the simulation of a specific type of pattern/signal processor, the 
next sections present a succinct description of a number of important areas. 
1.1.4.1 Pattern/signal processing 
A pattern/signal processor based on an artificial neural system is a computational unit 
that consists of a network of interconnected ANs. In the literature, such an artificial
neural system is not only referred to as an Artificial Neural Network (ANN) but also as 
neurocomputing, network computation, connectionism, parallel distributed processing, 
layered adaptive system, self-organizing network, neuromorphic system or network. The 
properties of such ANN systems mainly depend on the adopted neuron model, the 
number of interconnected PEs and the pattern of interconnection (namely, the topology). 
A partial representation of such an ANN system is shown in Fig. 1.5. Furthermore, the 
function that is performed by a complete ANN is also determined by the interneuron
connection strengths known as the synaptic weights. Similar to its biological 
counterpart, ANN signal processors acquire the ability to perform complex tasks such as 
pattern recognition [14], speech processing [15], image processing [16], etc., through 
experience commonly referred to as learning. The learning procedure consists of 
modifying the synaptic weights of the ANN in an orderly manner so as to attain the
desired function. Although a reasonable understanding of the functions of biological neurons and
synapses has been acquired, duplicating them fully appears to be an impractical task. Many
different neuron and synaptic models have been suggested but each of them either 
embodies a limited number of features compared with that of their biological 
equivalents or are highly speculative [12], [17-18]. One of the most commonly used 
models simulates the synaptic connection as an arithmetic multiplier weighting the 
neural signal passing through it. The neuron aggregates its post-synaptic signals and 
generates a single thresholded output. A more detailed explanation of concepts and 
terminology of neural network models and topologies is provided in paragraph 1.2 and 
paragraph 1.3, respectively. 
Considerable advances in Very Large Scale Integration (VLSI) technologies have 
facilitated the simulation of reasonable size ANN systems [12], [18-19]. A straightforward
approach consists of substituting a neuron with an analogue operational
amplifier (op amp) configured as a summator. The interneuron connections are made
using silicon resistors [20-21]. Fig. 1.6 shows an electronic-circuit representation of a 
model neuron and its synaptic connections. However, such electronic-circuit simulators 
suffer from some drawbacks: 
• Communication between neurons is done in a synchronous manner that is not 
a characteristic exhibited by biological neural systems [1]; and 
• The number of interconnection links that can be associated to an AN is limited 
due to the wiring restrictions of VLSI technologies. 
On the other hand, the time response of such an electronic simulator is measured in the 
order of a microsecond which is approximately three orders of magnitude faster than 
that of its biological equivalent. 
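The op amp summator with resistive connections described above can be sketched numerically. This is a minimal illustration that assumes an ideal op amp in the standard inverting summing configuration; the text does not specify the configuration, and the function name, argument names and component values are hypothetical.

```python
def summing_amplifier(inputs_v, resistors_ohm, feedback_ohm):
    """Ideal inverting summing amplifier (assumed configuration).

    Each input resistor R_j converts its input voltage into a current
    V_j / R_j; the currents add at the virtual-earth node and the
    feedback resistor R_f converts the total back into a voltage:
        V_out = -R_f * sum(V_j / R_j)
    so the effective weight of input j has magnitude R_f / R_j.
    """
    if len(inputs_v) != len(resistors_ohm):
        raise ValueError("one input resistor per input voltage is required")
    total_current = sum(v / r for v, r in zip(inputs_v, resistors_ohm))
    return -feedback_ohm * total_current


# Hypothetical values: two inputs with weights 1.0 and 0.5.
v_out = summing_amplifier([0.1, 0.2], [10e3, 20e3], 10e3)
print(f"V_out = {v_out:.3f} V")
```

In this idealised model the weight magnitudes are fixed by resistor ratios, which illustrates why a purely resistive connection is difficult to adapt once fabricated.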
1.1.4.2 Adaptive sensors 
Modelling and simulating adaptive sensory systems such as those found in the retina 
[22-24] and the cochlea (sense organ of hearing) [22], [25-26] have received an 
incredible amount of attention during the past decades. Although both of these 
biological stimulus transducers have distinct sensory characteristics, they are largely 
regarded as parallel systems. Research into artificial sensory systems has mainly 
consisted of developing biologically inspired models and simulating those using either 
software on a digital computer or a dedicated analogue VLSI chip [22-26]. However, 
real time simulation of artificial vision and auditory systems is computationally 
intensive, and is not practically achievable using software simulators even by employing 
today's most powerful von Neumann machines. Nevertheless, this technique may still be used
as a means of correlating the behaviour of a newly developed sensory model with that of
its biological counterpart. On the other hand, Carver Mead and his fellow researchers at
Caltech [22-23] have pioneered a new approach that consists of developing a set of 
analogue VLSI subcircuits that correspond to primary vision and hearing functions. At 
the basis of his ground breaking work Mead has exploited the exponential characteristic 
of Complementary Metal Oxide Semiconductor (CMOS) transistors operating in the 
subthreshold mode of conduction. In addition to this fundamental characteristic, power 
consumption is extremely low since current levels are in the range of 10''^  to 10'' A 
[22-23]. Fig. 1.7 shows an electronic representation of the silicon retina developed by 
Mead and Mahowald [22-23]. 
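The exponential characteristic mentioned above can be illustrated with the commonly used simplified weak-inversion model for a saturated MOS transistor, I_D = I_0 exp(V_GS / (n V_T)). This is only a sketch: the pre-exponential current `i_0` and the slope factor `slope_factor` are hypothetical placeholder values, not parameters of the process used in this thesis.

```python
import math

THERMAL_VOLTAGE = 0.0258  # kT/q in volts, approximate value near 300 K


def subthreshold_current(v_gs, i_0=1e-15, slope_factor=1.5):
    """Simplified weak-inversion drain current (saturation assumed).

    i_0 and slope_factor are illustrative assumptions only.
    """
    return i_0 * math.exp(v_gs / (slope_factor * THERMAL_VOLTAGE))


def volts_per_decade(slope_factor=1.5):
    """Gate-voltage increase that multiplies the current by ten."""
    return slope_factor * THERMAL_VOLTAGE * math.log(10)


step = volts_per_decade()
ratio = subthreshold_current(0.3 + step) / subthreshold_current(0.3)
print(f"{step * 1e3:.1f} mV of gate drive changes the current by a factor of {ratio:.1f}")
```

Under this model a fixed gate-voltage step always scales the current by the same factor, which is the property exploited when building multipliers from subthreshold devices.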
1.1.4.3 Motor control 
Research in this field ranges from restoration of movement in paralysed human limbs 
[27-28], to mimicking human movements using ANN based manipulators [29]. 
Although diverse, the basis of movement control investigations consists of determining 
an optimum mathematical model of motor systems that is based on experimental 
observation of human mechanisms of movement. However, such an empirical and 
theoretical model contains dynamic characteristics that are highly non-linear, making 
motor control systems extremely difficult to control. This computational problem may 
be overcome by judiciously using an adequate ANN system that can be trained to 
generate complex patterns of stimulation required by the motor system that produces the 
desired movements. This technique is usually referred to as Functional Electrical 
Stimulation (FES) when applied to the restoration of movement to paralysed human 
limbs. 
1.2 Artificial neural network models 
In this section, a basic definition of each function making up an ANN and its graphic 
illustration is given. 
1.2.1 Synapse 
The function performed by an artificial synapse is the mathematical operation of
multiplication. In addition to this feature, a synaptic connection is also characterised by 
a weight. Hence, the output of a connection link is the product of its input and its weight 
value. The input of a synapse is either the output of a neighbouring neuron or an input to 
the network. Fig. 1.8 shows the symbolic notation for a synapse. Although a square box 
embodying a multiplication sign is an adequate graphic illustration, such a symbol may 
at times appear to be an encumbrance, especially when the number of synaptic 
connections becomes dense. In such conditions, connection links are illustrated as 
arrows. The input and output of a synapse are related as 
$$U_j = W_{ij} X_j \qquad (1.1)$$
where $X_j$ is the input signal, $U_j$ is the output signal and $W_{ij}$ is the synaptic weight. Since 
the number of synapses associated with a single neuron is usually greater than one, the 
weights are identified by a subscript notation. The first subscript, $i$, refers to the neuron 
to which the synapse is attached and the second subscript, $j$, identifies the input of that 
neuron. The property of a synaptic connection is also determined by the weight value 
associated with it. Hence, if the weight $W_{ij}$ is positive the synapse is considered 
excitatory, and if it is negative the synapse is regarded as inhibitory. 
1.2.2 Neuron 
As in its biological equivalent, an artificial neuron has several inputs (represented in 
number by N), and one output. The inputs of a neuron may be represented using a vector 
notation 
$$U = [U_1, U_2, \ldots, U_j, \ldots, U_N] \qquad (1.2)$$
where $U$ represents the input vector and $U_j$ is the $j$-th input. A neuron performs two 
mathematical operations which are, first, a linear summation function and, second, a 
squashing function, also referred to as an activation function. 
1.2.2.1 Summation 
Initially, a neuron forms the sum of its inputs expressed as 
$$S_i = \sum_{j=1}^{N} U_j \qquad (1.3)$$
This neuronal function is usually symbolically represented as an encircled capital sigma, 
see Fig. 1.9 (a). In a similar manner to the synaptic weight, the sum of a neuron is also 
identified using a subscript notation. 
1.2.2.2 Activation function 
The role of the activation function is to provide an output neuron signal, depending 
upon both the type of function and the active level of its input (output of the adder), that 
is limited (or squashed) between two specified boundaries. The activation function is 
denoted by $f(\cdot)$ and the output of the neuron is expressed as 
$$Y_i = f(S_i) \qquad (1.4)$$
Combining (1.3) and (1.4), the output of a PE is related to its inputs as 
$$Y_i = f\left(\sum_{j=1}^{N} U_j\right) \qquad (1.5)$$
The squashing operation of the neuron effectively introduces a non-linearity into the PE 
input-output relation. This characteristic enables ANNs to model highly complex non-
linear systems [12], [18]. Although a literature review demonstrates that a large number 
of squashing functions have been proposed, a much smaller number have been adopted. 
The next section introduces the three most frequently used activation functions and their 
properties. In these descriptions, the activation functions are presented as having a 
bipolar characteristic, in the sense that the function allows the response of the neuron to 
be either positive or negative. Although most ANN architectures and applications require 
a bipolar neuron response, a unipolar characteristic is simply obtained by shifting and 
scaling the bipolar function [18]. 
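The shift-and-scale step mentioned above can be written out explicitly. The following is a sketch only: the symbol $f_u$ for the unipolar function is introduced here for illustration and is not part of the original notation, and the second line assumes that the bipolar function is the hyperbolic tangent.

$$f_u(S_i) = \frac{1 + f(S_i)}{2}, \qquad f(S_i) \in [-1, 1] \;\Rightarrow\; f_u(S_i) \in [0, 1]$$
$$f(S_i) = \tanh(S_i) \;\Rightarrow\; f_u(S_i) = \frac{1}{1 + e^{-2 S_i}}$$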
The activation function is symbolically represented by its own functional 
characteristic which is, for the bulk of this thesis, a sigmoid symbol. For reasons of 
convenience, both summation function and activation function symbols are brought 
together, as shown in Fig. 1.9 (b). 
1.2.2.2.1 Step activation function 
The step function, see Fig. 1.10 (a), produces two output values, -1 and 1, in the 
following fashion: 
• If the input of the step function, $S_i$, is greater than or equal to zero, then the 
output of the function takes on the value 1; 
• Otherwise the output of the function is -1. 
This type of function is mathematically described as 
$$f(S_i) = \begin{cases} 1 & \text{if } S_i \geq 0 \\ -1 & \text{if } S_i < 0 \end{cases} \qquad (1.6)$$
This type of function is also referred to as a binary function, since it produces a binary 
type value. 
1.2.2.2.2 Ramp activation function 
The ramp function, see Fig. 1.10 (b), contains three regions of operation of which two 
are areas of saturation. Between the saturation areas the ramp function produces an 
output signal that is a linear function of the active level of its input and is 
mathematically defined by 
$$f(S_i) = \begin{cases} 1 & \text{if } S_i > 1 \\ S_i & \text{if } -1 \leq S_i \leq 1 \\ -1 & \text{if } S_i < -1 \end{cases} \qquad (1.7)$$
For this particular case, within the linear region of operation, the gain of the ramp 
function is unity. Although the ramp function allows the output of the neuron to take on 
any value between the two saturation boundaries (in this case 1 and -1), like the step 
function it is not smooth: its slope is discontinuous at the two saturation boundaries. 
1.2.2.2.3 Sigmoid activation function 
The sigmoid activation function, see Fig. 1.10 (c), is a continuous function characterised 
by a shape similar to an "S". It is by far the most widely exploited activation function in 
the design of ANNs. This is due to the fact that the sigmoid function has a monotonic 
characteristic which is an activation function feature required by many learning 
algorithms [18]. Several functions offer such a sigmoid characteristic; however, the most 
commonly exploited is the hyperbolic tangent function, usually written as tanh. Such an 
activation function is represented mathematically by the following expression 
$$f(S_i) = \tanh(S_i) = \frac{e^{S_i} - e^{-S_i}}{e^{S_i} + e^{-S_i}} \qquad (1.8)$$
Despite being a non-linear function, it may be noted that the sigmoid function, within a 
narrow range around the origin, has a linear approximation. 
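The three activation functions described in this section can be compared numerically. The following is a minimal Python sketch, not code from the thesis: the function names are chosen here for illustration, and the sigmoid is assumed to be a plain hyperbolic tangent with unity gain.

```python
import math


def step(s):
    """Bipolar step function (1.6): +1 for s >= 0, otherwise -1."""
    return 1.0 if s >= 0 else -1.0


def ramp(s):
    """Bipolar ramp function (1.7): unity gain between saturation levels -1 and +1."""
    return max(-1.0, min(1.0, s))


def sigmoid(s):
    """Bipolar sigmoid, assumed here to be tanh with unity gain."""
    return math.tanh(s)


# Tabulate the three functions over a few illustrative input levels.
for s in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(f"S={s:+.1f}  step={step(s):+.0f}  ramp={ramp(s):+.2f}  tanh={sigmoid(s):+.3f}")
```

Close to the origin the tanh output stays near the ramp output, which is the linear approximation referred to above; further out the two diverge as the sigmoid saturates smoothly.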
1.2.3 A neuron and its synaptic connections 
In the literature, a neuron is usually represented with its associated synaptic connections 
as shown in Fig. 1.11. The operation performed by a neuron having N synaptic weighted 
inputs is obtained by combining (1.1) and (1.5) and is 
$$Y_i = f\left(\sum_{j=1}^{N} W_{ij} X_j\right) \qquad (1.9)$$
1.2.4 A perceptron 
The addition of a bias allows the threshold of the activation function to be varied from 
one neuron to another, see Fig. 1.12. The bias, $b$, is usually obtained by adding an extra 
synaptic connection to the neuron for which the input is set to a fixed value of 1. Thus 
the value of the bias is given by the value of the weight associated with the biasing 
connection, i.e. $b = W_{i0}$. The operation of a neuron with its synaptic connections and 
bias is expressed as 
$$Y_i = f\left(\sum_{j=1}^{N} W_{ij} X_j + b\right) \qquad (1.10)$$
The structure shown in Fig. 1.12 is the basic element that is used to build ANNs, and is 
usually referred to as a perceptron after Rosenblatt [30]. 
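The perceptron operation can be illustrated with a short Python sketch. It is an illustration under stated assumptions rather than the thesis implementation: the function names, the choice of tanh as activation function and all numerical values are hypothetical. The second function shows the bias folded into an extra weight driven by a fixed input of 1, as described above.

```python
import math


def perceptron(x, w, b, f=math.tanh):
    """Output of one perceptron: f(sum_j w[j] * x[j] + b), following (1.10)."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    s = sum(wj * xj for wj, xj in zip(w, x)) + b
    return f(s)


def perceptron_bias_as_weight(x, w_ext, f=math.tanh):
    """Same neuron with the bias stored as w_ext[0] on a fixed input X0 = 1."""
    return perceptron([1.0] + list(x), w_ext, 0.0, f)


# Illustrative values only (assumptions, not taken from the thesis).
x = [0.5, -1.0, 0.25]   # inputs X_j
w = [0.8, 0.3, -0.5]    # weights: positive = excitatory, negative = inhibitory
b = 0.1                 # bias

y_explicit = perceptron(x, w, b)
y_folded = perceptron_bias_as_weight(x, [b] + w)
print(y_explicit, y_folded)
```

Both calls describe the same neuron, so the two printed values agree to within floating-point rounding.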
1.3 Neural network topologies 
Neural networks may be constructed as interconnected perceptrons. ANNs are typically 
organised in layers in a similar way to their biological equivalents. Within a layer, 
neurons are fed by synaptic connections from sources which are either the outputs of 
neurons situated within a neighbouring layer or inputs to the network. The choice of 
architecture is strongly influenced by the task that the neural network has to perform 
[18]. Although this thesis is primarily focused on the behaviour of a particular ANN 
topology, namely the class of feedforward networks, a discussion of some of the most 
common topologies and their characteristics is presented in the following section. 
1.3.1 Feedforward neural networks 
A feedforward network may be described as a network where signals are exclusively 
propagated in a forward fashion, through layers of perceptrons until the information 
reaches the last level, i.e. the output layer. This type of configuration is commonly used 
in ANN design since it can be trained to perform pattern recognition [16], pattern 
classification tasks, etc. A feedforward neural network is said to be fully connected 
when each neuron within a layer is connected to every neuron in the abutting forward 
layer. Otherwise, it is said to be partially connected. 
1.3.1.1 Single-layer feedforward network 
The simplest form of feedforward neural network is the single-layer perceptron. The 
term single-layer is derived from the fact that the inputs of the network (namely, the 
input nodes) are projected onto one layer of neurons which is also the output layer. Fig. 
1.13 shows a single-layer perceptron which contains M perceptrons which have N 
inputs. The fundamental characteristic of the single-layer feedforward neural network is 
that it is limited to the classification of linearly separable input patterns [31]. In order to 
extend the capability of such networks to non-linear classification, one or more additional 
layers of perceptrons are required. 
1.3.1.2 Multi-layer feedforward networks 
Multi-layer feedforward configuration is the most extensively used architecture in ANN 
design. Such a type of architecture is also commonly known as a Multi-Layer 
Perceptron (MLP). An MLP is a network which combines two or more layers of 
perceptrons. Fig. 1.14 shows a multi-layer feedforward neural network which includes 
M layers. For simplicity of notation the network shown in Fig. 1.14 is referred to as an 
N:P:Q:...:R MLP, where N represents the number of source nodes, P the number of 
neurons in the first layer, Q the number of neurons in the second layer and R the number 
of neurons in the output layer. It may also be noted that an additional superscript 
number needs to be added to the weight notation in order to associate a set of synaptic 
weights with a layer. As an example, synaptic weight $W_{ij}^{k}$ would be the weight associated 
with the synapse which is connected to the $i$-th neuron of the $k$-th layer and which takes as its 
source the output of the $j$-th neuron of the antecedent layer. It may also be added that the 
layers which are located between the input layer of source nodes and the layer of output 
neurons are referred to as hidden layers. 
This thesis focuses on multi-layer feedforward networks since they are capable of 
implementing, within a prescribed degree of accuracy, many complex input/output 
mapping functions of practical interest [32]. This capability is due to the development of a 
suitable learning algorithm, commonly known as Error Back-Propagation 
(EBP). The characteristics of this learning algorithm are introduced in chapter 5. It may 
also be added that the output signals of such a structure, at any given time, depend 
entirely on the state of the input pattern and the synaptic weights. 
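The forward propagation through a fully connected MLP can be sketched in a few lines of Python. This is an illustrative sketch only: the 3:2:1 network size, the weight and bias values and the function names are hypothetical, and tanh is assumed as the activation function of every neuron.

```python
import math


def layer(x, W, b, f=math.tanh):
    """One fully connected layer; W[i][j] links input j to neuron i."""
    return [f(sum(wij * xj for wij, xj in zip(row, x)) + bi)
            for row, bi in zip(W, b)]


def mlp_forward(x, layers):
    """Propagate x through a list of (W, b) pairs, one pair per layer."""
    for W, b in layers:
        x = layer(x, W, b)
    return x


# Hypothetical 3:2:1 network; every number below is a placeholder.
layers = [
    ([[0.5, -0.4, 0.1], [0.3, 0.8, -0.6]], [0.0, 0.1]),  # hidden layer: 2 neurons, 3 inputs
    ([[1.2, -0.7]], [0.05]),                             # output layer: 1 neuron, 2 inputs
]

y = mlp_forward([1.0, -1.0, 0.5], layers)
print(y)
```

Because the functions hold no internal state, repeating the call with the same input and weights returns the same output, which reflects the statement that the outputs depend only on the input pattern and the synaptic weights.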
1.3.2 Feedback neural networks 
Feedback networks are also commonly known as recurrent neural networks. A recurrent 
network is differentiated from a feedforward type structure by the fact that one or more 
output signals are fed back into the network via delay units. A single-layer recurrent 
neural network is depicted in Fig. 1.15. The distinct characteristic of recurrent networks 
is that the output signals, at any instant, depend not only on the state of the input pattern 
and the strength of the synaptic connections but also on the internal state of the network 
(i.e., the level of output activity of the previous instant). Feeding the outputs back into 
the network also has the effect of introducing a memory characteristic into the system 
[18]. However, unlike the feedforward networks where outputs are "instantaneously" 
generated, the output signals are only available after the network has reached a steady 
state. One of the most common feedback neural networks is the Hopfield scheme [33], 
which has a structure similar to that shown in Fig. 1.15. The characteristics of such a 
neural network have also attracted a lot of interest within the ANN research community. 
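The settling behaviour of a recurrent network can be illustrated with a small Python sketch. Several assumptions are made here that do not come from the thesis: the network has four neurons with a bipolar step activation, external inputs are omitted, updates are synchronous, and the weights are set with the outer-product rule commonly used to store a pattern in a Hopfield-type network.

```python
def step(s):
    """Bipolar step activation."""
    return 1 if s >= 0 else -1


def recurrent_settle(W, y0, max_iters=20):
    """Iterate y(t+1) = step(W y(t)) until the output stops changing.

    Each pass of the loop plays the role of one delay unit period.
    Returns the steady state and the number of iterations needed to confirm it.
    """
    y = list(y0)
    for t in range(1, max_iters + 1):
        y_next = [step(sum(wij * yj for wij, yj in zip(row, y))) for row in W]
        if y_next == y:
            return y, t
        y = y_next
    raise RuntimeError("no steady state reached")


# Store one pattern p with outer-product weights and no self-connections (assumed rule).
p = [1, -1, 1, -1]
W = [[0 if i == j else p[i] * p[j] for j in range(4)] for i in range(4)]

# Start from the stored pattern with one element flipped.
state, iters = recurrent_settle(W, [1, -1, -1, -1])
print(state, iters)
```

The output at each iteration depends on the previous output, so the result is only meaningful once the loop has stopped changing, in contrast to the single pass of a feedforward network.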
1.4 Brief history of artificial neural networks 
As mentioned earlier, the field of ANN research is vast and diverse and this has 
consequently led to a large amount of literature. This historical review concentrates on 
the most significant research developments. However, if the reader requires more 
historical details, additional information may be found in the articles of Grossberg [17] 
and Widrow et al. [34] and the Haykin [12] and Zurada [18] texts. The following 
summary is presented in chronological order. However, it is worth mentioning that since 
its early days, research into ANN has followed discontinuous paths. 
1943 is usually considered as a key year in terms of ANN research. During that year, 
McCulloch and Pitts [35], proposed the idea of a logical calculus for modelling the 
nervous system. The McCulloch and Pitts model of the neuron is shown in Fig. 1.16. 
Although revolutionary at the time, the McCulloch-Pitts model contains some severe 
limitations such as: 
• The inputs and output of the neuron are limited to binary type values; and 
• The synaptic weights are constant and are confined to either $W_i = +1$ for 
excitatory synapses or $W_i = -1$ for inhibitory synapses. 
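Even within these limits a single unit can realise simple logic functions. The Python sketch below is an illustration under assumptions: binary values are taken as 0 and 1, and the threshold values are chosen here for the example rather than taken from the thesis.

```python
def mp_neuron(inputs, weights, threshold):
    """Return 1 if the weighted sum of binary inputs reaches the threshold, else 0."""
    if not all(x in (0, 1) for x in inputs):
        raise ValueError("inputs must be binary (0 or 1)")
    if not all(w in (1, -1) for w in weights):
        raise ValueError("weights must be +1 (excitatory) or -1 (inhibitory)")
    s = sum(w * x for w, x in zip(weights, inputs))
    return 1 if s >= threshold else 0


def AND(a, b):
    return mp_neuron([a, b], [1, 1], threshold=2)


def OR(a, b):
    return mp_neuron([a, b], [1, 1], threshold=1)


def NOT(a):
    return mp_neuron([a], [-1], threshold=0)


# Print the truth tables of the three gates.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", AND(a, b), "OR:", OR(a, b))
print("NOT 0:", NOT(0), "NOT 1:", NOT(1))
```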
Despite these severe restrictions, McCulloch and Pitts showed that their neuron 
can perform basic logic operations such as AND, OR and NOT. It was not long after 
this pioneering work, in 1949, that Hebb [36] suggested the first simple learning rule 
which is still in use today. The rule, known as the Hebbian learning algorithm, is 
formulated by Hebb as follows: 
• When an axon of cell A is near enough to excite a cell B and repeatedly or 
persistently takes part in firing it, some growth process or metabolic change 
takes place in one or both cells such that A's efficiency, as one of the cells 
firing B, is increased [36]. 
That is to say, the learning rule updates the synaptic weight in proportion to 
the correlation between its presynaptic signal $X_j$ and its postsynaptic 
activity [12]. At that time Hebb's work had a great deal of influence in the world of 
neural network theory and has since been exploited by many other researchers and 
evolved in many directions. However, the ANN research community had to wait until 
1956 to see the first computer simulation of a neural network. This work was carried out 
by Rochester and his colleagues [37]. The system simulated 512 neurons and made use 
of the Hebbian learning rule. Inspired by Hebb's work, in 1958, Rosenblatt [30] 
presented his theory on the perceptron; Rosenblatt's probabilistic approach introduced a 
link between the learning rule of Hebb and the neuron model of McCulloch and Pitts. 
Subsequently, the perceptron received a considerable amount of attention. However, the 
excitement generated by the perceptron was soon to disappear with the publication of a 
book by Minsky and Papert [31] in 1969. Using a rigorous mathematical approach 
Minsky and Papert demonstrated the computational limits of the perceptron. These 
results were to leave the research community more or less at a standstill for more than a 
decade. In the meantime, in 1960, Widrow and Hoff suggested a basic neural network 
building block known as the ADAptive LINEar element, also referred to as the 
ADALINE, along with a new powerful but simple learning rule known as the least mean 
square algorithm, also referred to as the Widrow-Hoff learning rule [34]. The major 
difference between the ADALINE and the perceptron lies in their learning 
characteristics; for the former the adaptive algorithm requires the knowledge of a target 
output while in the case of the latter this information is not needed. Here lies the origin 
of the supervised learning algorithm. 
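The two learning rules named in this paragraph can be contrasted with a short Python sketch. It is a simplified illustration, not the formulation used in the cited works: the learning rate `eta`, the function names and the example values are assumptions. The Hebbian update uses only the presynaptic and postsynaptic signals, whereas the Widrow-Hoff update additionally uses a target output.

```python
def hebbian_update(w, x, y, eta=0.1):
    """Hebbian rule: each weight changes in proportion to presynaptic x_j times postsynaptic y."""
    return [wj + eta * xj * y for wj, xj in zip(w, x)]


def lms_update(w, x, target, eta=0.1):
    """Widrow-Hoff (LMS) rule: correct the weights using the error of the linear output."""
    y = sum(wj * xj for wj, xj in zip(w, x))
    err = target - y
    return [wj + eta * err * xj for wj, xj in zip(w, x)], err


# Hebbian step: correlated activity strengthens (or weakens) each connection.
w_hebb = hebbian_update([0.0, 0.0], [1.0, -1.0], y=1.0)

# LMS training of a single linear element towards a target output of 0.5.
w = [0.0, 0.0]
x = [1.0, -1.0]
for _ in range(50):
    w, err = lms_update(w, x, target=0.5)

print(w_hebb, w, err)
```

In this example the LMS error shrinks by a constant factor at every step, so the linear output approaches the target, while the Hebbian update has no target and simply follows the correlation of the two signals.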
While many researchers abandoned the field of neural networks during the 1970s, 
Anderson [38] and Kohonen [39] independently published their ground-breaking work 
on a model for associative memories. 
Interest in neural network research was revived in the 1980s, due in part to the 
publication of Hopfield's paper in 1982 [33]. In his article, he suggested that artificial 
neural systems draw their computational characteristics from the collective properties of 
a fully connected network of PEs. His theory put an end to the scepticism brought up by 
the revelations of Minsky and Papert more than a decade earlier. He also presented a 
new descriptive approach of associative memory using differential equations. He 
additionally drew attention to the possibility of implementing artificial PEs using 
integrated circuits. Since he formulated his theories, artificial neural systems have been 
extensively studied and have also generated a lot of interest within the engineering 
community for whom ANN offers a new solution to some complex problems. As an 
example, Sejnowski and Rosenberg [15] have used an ANN that can be taught to 
convert a string of characters into a string of phonemes. 
In addition to all of these influential developments, the advance of VLSI 
technologies has made possible the implementation of neural networks in electronic 
hardware. Amongst the first electronic neural network designs were those suggested by 
Graf and his collaborators [20] and Hopfield and Tank [21], in 1986. Their designs 
made use of basic electronic devices such as the resistor and the capacitor. However this 
was immediately followed by many other designs which were using more 
computationally advanced electronic devices such as the MOS transistor. Mead's book, 
published in 1989, presented a new view on the similarity between primary biological 
functions and circuits and devices based on CMOS transistors operating in the 
subthreshold mode of conduction [22]. It may be added that Mead's philosophy and 
work have been a great source of inspiration for much of the work presented in this 
thesis. 
1.5 Overview of simulation of artificial neural 
networks 
Within the last decade, computer scientists and electrical engineers have dedicated time 
to the design of ANN simulators as a means for complex problem solving. The aim of 
this section is to provide a broad view of the many different lines of thought that have 
evolved over the years. 
One of the most inexpensive and readily available solutions, to the simulation of 
ANN functions, is based on software implemented on conventional single-processor 
computers. However, the superior computational power of an ANN in performing tasks 
such as speech processing or image processing in real time results from the parallel 
operation of a large number of interconnected neurons. Unfortunately, computational 
loads such as these require extensive execution times on a conventional computer 
architecture. Even with the introduction of architecturally advanced processors such as 
the Digital Signal Processor (DSP) and the Reduced Instruction Set Computer (RISC) 
processor, leading to speed improvements of an order of magnitude every five to ten 
years, many real-time applications remain too demanding [18], [40-41]. 
To overcome this speed trap, Forrest and his colleagues [41] have advocated the use 
of Multiple Instruction Multiple Data (MIMD) arrays of transputers where the 
processing load is distributed over many processors. Their simulator consists of 40 
transputers, a host acting as an overall controller and a graphics processor used as a 
display generator. Each transputer includes a 32-bit microprocessor with on-chip 
memory, inter-processor links and a 32-bit-wide external memory interface. A processor 
can be programmed, using a dedicated programming language, to compute a share of 
the processing load and to communicate with up to four of its neighbouring processors. 
Forrest and his collaborators applied their general-purpose parallel computer to the 
simulation of a Hopfield neural network [21] which was trained to perform image 
restoration. However, the limited number of links between processors poses a special 
challenge to the interconnection needs of ANNs. The efficient programming of parallel 
processors poses further challenges. On the other hand, such a simulator offers some 
degree of flexibility, in the sense that the topology of the simulated ANN is easily 
alterable through programming and arithmetic precision is high. 
Much attention is now focused on exploiting VLSI technologies for the hardware 
implementation of dedicated ANN simulators [42-43]. CMOS technologies offer the 
capability of fabricating chips with tens of millions of transistors on a single silicon die. 
Garth [44] and Hammerstrom [45] have suggested digital approaches, where arrays of 
identical modules, consisting of an arithmetic processor with sufficient local memory to 
simulate moderate size single layer networks, operate in parallel. For example, Garth's 
simulator consists of a 3-dimensional array of interconnected NETwork SIMulator 
(NETSIM) cards and a host computer, see Fig. 1.17. Each NETSIM card is an 
autonomous element with sufficient local memory and processing power to simulate a 
single-layer network of 256 neurons with 256 synaptic connections per neuron (i.e., 
65,536 synapses) in less than 20 milliseconds. A NETSIM card contains: 
• A communication chip which enables the fast exchange of messages between 
cards or the host and a card; 
• A local microprocessor; 
• The solution engine designed to act as a co-processor which accelerates the 
computation of the extensive number of multiply-and-add operations required 
by ANNs; and 
• Some memory to store data such as the synaptic weights, the input vectors, etc. 
Memory access is pipelined, allowing low-cost 120-nanosecond 
Dynamic Random Access Memories (DRAMs) to be used. 
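For the single-layer network quoted above (256 neurons with 256 synapses each, evaluated in under 20 ms), the implied throughput of one card can be estimated as follows. This is a rough sketch that assumes one multiply-and-add operation per synapse per network evaluation; since the 20 ms figure is an upper bound on the evaluation time, the result is a lower bound on the rate.

$$256 \times 256 = 65\,536 \ \text{synapses}, \qquad \frac{65\,536}{20 \times 10^{-3}\ \text{s}} \approx 3.3 \times 10^{6}\ \text{multiply-and-add operations per second}$$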
This approach results in a low-cost, efficient and fast means of simulating ANNs. The 
X1 neural network chip of Hammerstrom is also based on a similar approach. However, 
it is worth noting that neither neural network simulator fully exploits the parallel 
processing and fault tolerance characteristics of ANNs [21] since synaptic and neuronal 
operations are distributed in time. Alternative digital approaches [43], [46] which do 
offer such features are based on a systolic array of processing units (i.e., one multiplier 
per synapse and one adder and activation function generator per neuron). However, due 
to the digital complexity and the area of silicon required for the multiplying operation 
the number of processing units which can be implemented on a silicon die is limited. 
For example, the GENES IV chip of Ienne and Viredaz [46] integrates a matrix of 2 x 2 
processing units. This small number is mainly due to the extended arithmetic precision 
of a processing unit (e.g., 17 bits). However Murray [43] indicates that even utilising a 
reduced precision arithmetic approach, the number of elements which can be integrated 
on a silicon die is considerably limited (<100). Furthermore, digital approaches also 
involve relatively high power consumption, since dynamic power dissipation rises with the 
operating frequency and with the square of the supply voltage. On the positive side, however, digital systems have a greater 
immunity to noise, operate at high speed (frequency in excess of 100 MHz), offer high 
levels of precision and ease of weight storage. 
The simulation of large numbers of interconnected neurons, and therefore synaptic 
links, in a parallel systolic manner can be achieved by implementing the processing 
elements using an analogue approach [19-26], [47-69]. Successful integration of large 
numbers of synaptic connections and neuron functions on a single silicon die demands 
that each element uses the smallest possible area. One analogue approach [20-21], 
requires an op amp for each neuron and simulates synaptic connections using simple 
resistors. Unfortunately, such systems cannot easily be programmed. Proposed solutions 
[19], [22-26], [47-69] have exploited the ease with which the transconductance of MOS 
transistors can be changed by modifying the device's bias point. These implementation 
techniques may be classified in distinct categories depending upon the mode of 
conduction of the transistors [72-74]: 
• Artificial neural functions can be implemented using building blocks 
consisting of MOS devices operating in the above-threshold mode of operation 
[47-60]. Here the designers exploit the fundamental quadratic function of the 
transconductor. 
• An alternative approach simulates ANN functions exploiting the exponential 
characteristics of MOS transistors operating in the subthreshold mode of 
conduction (also known as weak inversion) [61-69]. 
Within these two sub-sections of ANN design many implementation techniques have 
been suggested ranging from the single transistor synapse [47-48] to the MOS version 
of the Gilbert transconductance multiplier, which utilises six transistors [70]. Although 
analogue techniques are well suited to the implementation of artificial neural networks, 
issues such as dynamic weight storage [71], arithmetic precision [67-68], and noise 
immunity, pose significant challenges. 
A hybrid approach which combines analogue and digital technologies has been 
suggested by Murray and his colleagues [75-77] and subsequently investigated by other 
groups [78]. The pulse-stream technique uses digital signals to both carry information 
and to control analogue circuitry, allowing the compactness of analogue computation to 
be effectively coupled with the simplicity and robustness of digital signals. However, 
one such design [76] requires 100 pulses to drive a neuron fully "on" or "off". With a 
pulse frequency conservatively set at 0.5 MHz, a two-layer perceptron would settle in 
around 3 ms, which is approximately three orders of magnitude slower than alternative 
analogue techniques [54], [58]. 
The capacities of previously published analogue and hybrid designs that emulate 
ANN functions systolically are further detailed in chapter 2. 
1.6 Motivation for the study 
The still emerging revelation that ANNs are applicable to a wide range of problems 
coupled with the growing availability of an already extended range of simulators is 
leading to an increasing use of artificial neural systems in many different fields and 
especially in the fertile areas of medicine and healthcare. ANN systems are particularly 
capturing interest in medical applications which are directly connected to improving the 
quality of human life. For example, several research groups are working on the problem 
of restoring movement to paralysed human limbs [27-28] using ANN based functional 
electrical stimulators. For such medically related problems patients could further benefit 
from the design of an implantable system. Such an approach has already been suggested 
for the design of cardioverter defibrillators providing on-line heart therapy to patients 
suffering from life-threatening arrhythmia [79], a task similar to the detection of 
heart beats, i.e. QRS complexes, in foetal electrocardiograms [80]. 
In chapter 3, the design of multi-layer feedforward neural systems which exploit the low 
current levels associated with MOS devices operating in subthreshold mode is 
investigated. Such designs have received, and will continue to receive, much attention, 
because they satisfy the fundamental characteristics required by implantable neural 
network systems: 
• Low power consumption enables a battery power source; 
• Compact analogue circuits allow systolic implementation, hence facilitating 
pattern classification in real time; 
• Computation is carried out in the analogue domain resulting in simplified 
interfaces to outside systems such as sensors and effectors; and 
• Analogue parallel architecture exhibits robust performance in the presence of 
hardware faults. 
Although subthreshold operation exhibits valuable characteristics, it also lacks strength 
in some other areas: 
• Subthreshold currents have reduced abilities to charge/discharge capacitive 
elements, hence the operating speed of ANN systems is degraded [66], [69], 
[72]; 
• MOS devices operating in weak inversion are affected by mismatching 
effects [65], [81-83]. These effects are characterised at the circuit level by 
offsets, hence limiting arithmetic precision; 
• High weight storage resolution (at least 8 bits) is unobtainable in practice 
without a trade-off in silicon area; 
• Subthreshold currents are vulnerable to thermal noise; and 
• The transconductance of an MOS transistor operating in weak inversion is 
strongly sensitive to temperature [72-74] and biasing variations. 
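The temperature and mismatch sensitivities listed above can be made concrete with the widely used simplified weak-inversion model $I_D = I_0 \exp(V_{GS} / (n U_T))$ with $U_T = kT/q$, which gives $g_m = I_D / (n U_T)$. The Python sketch below is an illustration under assumptions: this simplified saturated model, the symbol names, and the values of the slope factor `n`, the scale current `i0`, the 1 nA bias and the 5 mV threshold offset are all hypothetical and are not taken from the thesis. In a real device `i0` itself also varies with temperature, which the sketch ignores.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K
Q_E = 1.602176634e-19   # elementary charge, C


def thermal_voltage(temp_c):
    """U_T = kT/q in volts, for a temperature given in degrees Celsius."""
    return K_B * (temp_c + 273.15) / Q_E


def drain_current(v_gs, temp_c, i0=1e-15, n=1.5):
    """Simplified saturated weak-inversion model: I_D = i0 * exp(V_GS / (n * U_T))."""
    return i0 * math.exp(v_gs / (n * thermal_voltage(temp_c)))


def transconductance(i_d, temp_c, n=1.5):
    """g_m = dI_D/dV_GS = I_D / (n * U_T) for the model above."""
    return i_d / (n * thermal_voltage(temp_c))


def mismatch_current_ratio(delta_vt, temp_c, n=1.5):
    """Current ratio of two nominally identical devices whose thresholds differ by delta_vt."""
    return math.exp(delta_vt / (n * thermal_voltage(temp_c)))


# Same 1 nA bias current at room temperature and at body temperature.
gm_27 = transconductance(1e-9, 27.0)
gm_37 = transconductance(1e-9, 37.0)
ratio_5mv = mismatch_current_ratio(5e-3, 27.0)
print(gm_27, gm_37, ratio_5mv)
```

Under these assumptions the transconductance at a fixed bias current scales inversely with absolute temperature, and a threshold offset of a few millivolts produces a current error of the order of ten percent, which illustrates why offsets limit arithmetic precision in weak inversion.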
To address some of these issues, a novel building block called the thresholding-synapse 
which exploits the non-linearity associated with the input of transconductance 
multipliers to perform both the activation function and synaptic multiplication is 
presented. A new four-quadrant multiplier is developed in order to efficiently simulate 
this thresholding synapse, complemented by an equally original current-to-voltage 
converter. Critical discussions on the effectiveness of the proposed schemes to emulate 
neuronal functions are derived from analysis and simulation results. The unit's potential 
in terms of speed and ease of programmability is also assessed. 
Based on these suggested arrangements the implementation of a whole neural 
network chip which comprises a 2:2:1 feedforward network, to solve the Exclusive OR 
(XOR) problem as well as to check the ANN operations, and a QRS complex detector, 
founded on a 10:6:3 MLP, is then the subject of chapter 4. Two extra processing 
cells, namely the horizontal resistance and output neuron, are introduced in order to 
alleviate problems associated with the effect of combining the thresholding and synaptic 
functions. Structural details of all of these basic building blocks and their vulnerability 
to process parameter variations are presented. Further discussions are also devoted to 
the issue of an adequate weight storage scheme and tailoring of an on-chip weight 
update/refresh mechanism to suit it. In order to evaluate the performance of the 
proposed system as a whole, simulation studies on the XOR neural network as a 
benchmark problem have also been conducted. 
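As an illustration of the XOR benchmark, the following Python sketch evaluates a 2:2:1 feedforward network with tanh neurons. It is not the trained network of the thesis: the weights, biases and gain are hand-picked for the example, bipolar input levels of -1 and +1 are assumed, and the function names are hypothetical.

```python
import math


def neuron(x, w, b, gain=2.0):
    """tanh neuron with a common gain applied to the weighted sum plus bias."""
    return math.tanh(gain * (sum(wj * xj for wj, xj in zip(w, x)) + b))


def xor_221(x1, x2):
    """2:2:1 network: two hidden neurons feeding one output neuron."""
    h1 = neuron([x1, x2], [1.0, 1.0], 1.0)     # OR-like: negative only for (-1, -1)
    h2 = neuron([x1, x2], [1.0, 1.0], -1.0)    # AND-like: positive only for (+1, +1)
    return neuron([h1, h2], [1.0, -1.0], -1.0) # positive when h1 is high and h2 is low


# Evaluate the network on the four bipolar input patterns.
for x1 in (-1, 1):
    for x2 in (-1, 1):
        print(x1, x2, round(xor_221(x1, x2), 3))
```

The output is positive when the two inputs differ and negative when they agree, which is the XOR mapping; a single-layer network cannot realise it because the two classes are not linearly separable.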
Experimental results from each of the primitive structural elements are subsequently 
presented in chapter 5. For each experiment the set-up and chip configuration are 
provided. The concept of learning is also discussed in this chapter. The suitability of the 
back-propagation and weight perturbation learning algorithms for chip-in-loop training 
is then evaluated using illustrative examples. The experimental set-up used for training 
and testing the networks is then presented. Finally the performance of the two low-
power analogue VLSI ANNs is provided and compared with that of a digital simulator. 
Figure 1.1: Block diagram representation of the nervous system. 
(Labels: receptors, sensory organs; neural network, the brain; effectors, motor organs.) 
Figure 1.2: Typical biological neuron cell. Adapted from Haykin [12]. 
(Labels: dendritic spines, apical dendrites, segment of dendrite, basal dendrites, synaptic terminals.) 
Figure 1.3: Stained section of brain taken from the visual cortex of a rat. Adapted from 
Hubel [1]. The numbers on the right-hand side identify cellular layers; the capital 
letters label individual neurons. 
Figure 1.4: Reconstitution of a section of a vertebrate retina. Adapted from Spooner [13]. 
There are three layers of cells: the outer receptors, R, the internal bipolar cells, BC, 
and the inner ganglion cells, G. 
Figure 1.5: Partial representation of an artificial neural network signal processor. 
(Label: processing element.) 
Figure 1.6: Electronic-circuit representation of a model neuron and its synaptic 
connections. (Labels: connection from other neurons, op amp.) 
Figure 1.7: Silicon retina of Mead and Mahowald. A single pixel element 
is illustrated in the circular window. 
Figure 1.8: Symbolic notation of a synapse. 
Figure 1.9: Symbolic notation of a neuron. (a) Expanded form, (b) compacted form. 
Figure 1.10: Activation functions, $f(S_i)$ plotted against $S_i$. (a) Step function, 
(b) Ramp function, (c) Sigmoid function. 
Figure 1.11: A neuron and its synaptic connections. 
Figure 1.12: A perceptron. (Labels: inputs $X_j$, with a fixed bias input $X_0 = 1$.) 
Input layer Ouput layer 
Figure 1.13: Single-layer feedforward neural network. 
37 
Input layer Hidden layers Output layer 
Figure 1.14: Multi-layer feedforward neural network. 
38 
Delay unit 
Figure 1.15: Single-layer recurrent network. 
Figure 1.16: McCulloch-Pitts model of a neuron.
Figure 1.17: NETSIM neurocomputer.
Chapter 2 
VLSI implementation of artificial neural networks 
Since the beginning of the 1970's the evolution of VLSI technologies has mainly been 
driven by an increasing demand for more computationally evolved digital systems. 
During the past decade, growth has, however, been more noticeable due to the maturing 
of analogue VLSI. Developments in both analogue and digital VLSI and MOS 
techniques have recently attracted interest in the field of ANN simulation for several 
reasons: 
• State of the art VLSI technologies offer the capability of fabricating chips with 
tens of millions of transistors on a single silicon die [22]; 
• An MOS transistor is a powerful computational element. Several of these 
devices can be combined to form advanced processing circuits such as 
multipliers and adders; 
• Today's VLSI processes allow for both analogue and digital circuits to be 
integrated on the same silicon chip; 
• VLSI technologies are well suited to the highly parallel, regular, and modular 
architecture of artificial neural systems; 
• Computer-Aided-Design (CAD) tools for design and simulation of analogue 
and digital MOS circuits are well developed; and 
• VLSI technologies are more readily accessible to the public. 
All these evolving characteristics make VLSI technologies increasingly better 
candidates for the implementation of artificial neural systems. 
As stipulated in the introduction, to possess the fundamental features of parallel 
processing and fault tolerance, ANN hardware simulators require the use of a design 
architecture which is based on a systolic approach. To fulfil such a design condition, it 
is highly desirable, if not essential, that each element making up a neural network 
simulator occupies a minimum silicon area. Such an issue arises if one considers the 
implementation of Sejnowski-Rosenberg's MLP [15], since 18,629 synaptic connections 
and 106 neuronal functions are required, or when designing an implantable system [79]. 
This chapter reviews design techniques that have been developed as a means of 
simulating ANN systems in a systolic manner. These include a family of reported 
schemes based on either analogue [19], [22-26], [47-69] or analogue/digital [75-78] (i.e. 
hybrid) design methods. These building blocks are presented in the following sections 
according to which of the three basic operations each simulates, namely synaptic 
weighting (multiplication), signal summation and activation functions. These various 
cells are implemented using basic electronic devices such as resistors, capacitors, op 
amps and MOS transistors. The electronic characteristics of the resistor, capacitor and 
op amp can be found in the literature [84], while a brief description of the operation of 
an MOS transistor is presented in Appendix A. 
The overall objective of this review is to discuss and evaluate the performances of 
previously published computational circuits based on both theoretical and SPICE [85] 
(Simulation Program with Integrated Circuit Emphasis) simulation results. All the 
SPICE simulations presented in this chapter have been generated using the parameters 
of the MIETEC 2.4 µm CMOS process, which are presented in Appendix B. 
Finally the last section of this chapter presents an overview of the issue of weight 
storage. 
2.1 Synapse designs 
The synapse is by far the most important building block in the hardware implementation 
of ANNs. Using Sejnowski-Rosenberg's MLP as a design example, the number of 
synaptic circuits would represent 99.4% of the total number of cells. This illustration 
gives a clear indication of why researchers involved in the design of ANN simulators 
have continued, and will continue, to dedicate attention to the development of efficient 
synaptic cells. The function of a synapse is to produce an output signal which is the 
result of the product of the synaptic input signal and a weight. Both input and weight 
signals can be of either sign resulting in a four-quadrant multiplication. 
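The 99.4% figure can be checked from the cell counts quoted earlier for Sejnowski-Rosenberg's MLP (18,629 synaptic connections and 106 neuronal functions). The following is a minimal sketch of that arithmetic; the function and constant names are illustrative only.

```python
# Cell counts quoted in the text for Sejnowski-Rosenberg's MLP.
N_SYNAPSES = 18_629
N_NEURONS = 106


def synapse_fraction(n_synapses: int, n_neurons: int) -> float:
    """Fraction of all cells (synapses plus neurons) that are synapses."""
    return n_synapses / (n_synapses + n_neurons)


if __name__ == "__main__":
    share = 100.0 * synapse_fraction(N_SYNAPSES, N_NEURONS)
    print(f"Synaptic cells: {share:.1f} % of all cells")
```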
2.1.1 Resistor-based synapse 
The most straightforward synaptic design is based on Ohm's law [84]. Ohm's law states 
that the current flowing through a resistor is related to the product of the differential 
potential across that resistor and its conductance, thus offering the multiplying feature 
required by the synaptic element. Fig. 2.1 shows a resistor and its equivalent synaptic 
representation. One of the terminals is connected to a virtual ground so that the 
differential potential across the resistor is $V_X$, which also corresponds to the input signal 
of the synapse. The current flowing through the resistance, illustrated in Fig. 2.1 as $I_U$, is 
usually selected as the output signal of the synapse since it simplifies the 
implementation of the summing function. Consequently the conductance $G$ determines 
the value of the weight, as the resistor-based synapse function is expressed as 
$$I_U = G \cdot V_X \qquad (2.1)$$
The advantage of such a technique is that it allows the integration of large neural 
networks, since high resistor density is achievable using dedicated material such as 
amorphous silicon (approximately one resistor per 10 µm²) [20]. However one of the 
major drawbacks is that once a circuit is fabricated, the function of the network is not 
easily alterable since the system does not allow for reprogrammability. As a 
consequence the design has no learning capability. Furthermore, the mapping 
characteristics of a network may vary from one chip to another. These variations are 
mainly due to the effect of limited accuracy of integrated resistors. Moreover a resistor-
based synapse does not allow for an inhibitory type of connection, unless the driving 
neuron provides both inverting and non-inverting outputs [20-21]. 
2.1.2 Transistor-based synapse 
Learning in artificial neural systems requires a synaptic circuit that permits the weight to 
be externally adjustable. Such an increase in flexibility is achieved in VLSI technology 
by exploiting the ease with which the conductance or transconductance of MOS 
transistors can be altered [47]. The techniques discussed in this section utilise MOS 
devices as either voltage-controllable resistors or electronically adjustable 
transconductors. 
2.1.2.1 Single-Transistor Synapse 
An MOS transistor operating in the linear mode of conduction (Appendix A) may be 
viewed as a voltage-controllable two-terminal resistor [47-49]. This may be illustrated 
by considering the drain current flowing through an N-channel MOS device 
$$I_D = K_N \left\{ (V_{GS} - V_{FB} - \phi_B) V_{DS} - \tfrac{1}{2} V_{DS}^2 - \tfrac{2}{3} \gamma \left[ (V_{DS} - V_{BS} + \phi_B)^{3/2} - (\phi_B - V_{BS})^{3/2} \right] \right\} \qquad (2.2)$$
where $V_{GS}$ is the gate-to-source potential, $V_{DS}$ is the drain-to-source potential, $V_{BS}$ is the 
bulk-to-source potential, $V_{FB}$ is the flat-band voltage, $\phi_B$ is the surface potential and $\gamma$ is 
the body effect coefficient (Appendix A). The so-called transconductance factor is given 
as 
$$K_N = \mu_N \cdot C_{ox} \cdot \frac{W}{L} \qquad (2.3)$$
where $\mu_N$ is the electron mobility, $C_{ox}$ is the thin oxide capacitance per unit area, and $W$ 
and $L$ are the channel width and channel length of the transistor respectively. 
Utilising a binomial expansion technique, (2.2) can be expanded in Taylor's series 
resulting in the following expression 
$$I_D = K_N (V_{GS} - V_T) V_{DS} + h(V_{DS}) \qquad (2.4)$$
where the first term is linearly dependent on $V_{DS}$ and $h(V_{DS})$ represents all the non-linear 
terms of the drain current. 
Considering the synaptic design shown in Fig. 2.2 and assuming that the input signal 
$V_X$ is small enough that the contribution of the non-linear terms is negligible compared 
to that of the linear term, the output current may be expressed as 
$$I_U = K_N (V_W - V_T) V_X \qquad (2.5)$$
When this is related to (1.1) it can be deduced that the value of the synaptic weight is 
given by $w_{ij} = K_N (V_W - V_T)$, which is tuneable via the control potential $V_W$. However, to 
ensure that the synaptic circuit performs as desired, it is necessary that both source/bulk 
and drain/bulk junctions of the transistor be reverse-biased, which requires that 
$$V_B < V_{X,\min} \qquad (2.6)$$
The family of simulated static characteristics of Fig. 2.3 shows the output current versus 
the input signal over the range -1 V to 1 V for different values of $V_W$, indicating the 
practicability of an MOS transistor as a variable synaptic element. These simulations 
were performed with a bulk potential of -1 V, in order to fulfil the proviso of (2.6), and 
the channel width and channel length of the transistor set to 2.4 µm and 24 µm 
respectively. These results show that the linear characteristic of the resistor deteriorates 
significantly as the control signal decreases toward the threshold voltage. This non-linearity 
leads to substantial multiplication inaccuracies which are undesirable during 
the learning process. With reference to (2.4), the distortion is due to the higher order 
terms in $V_{DS}$ contained in $h(V_{DS})$. 
2.1.2.2 Dual-Transistor Synapse 
The extent of the distortion can be appreciated by considering the second order 
approximation to the drain current of an NMOS transistor operating in the so-called 
linear mode of conduction 
$$I_D = K_N \left[ (V_{GS} - V_T) V_{DS} - \tfrac{1}{2} V_{DS}^2 \right] \qquad (2.7)$$
This second-order approximation is sufficient since the effect of the third and 
subsequent higher orders are relatively small. It may be noted that the square term 
contributes significantly to the drain current when the drain-source potential approaches 
the non-saturation conduction limit given by 
$$V_{DS} = V_{GS} - V_T \qquad (2.8)$$
Several linearisation techniques have been suggested [48-49]. Here a dual-transistor 
synapse is used as an illustrative example [48]. The scheme is shown in Fig. 2.4 and its 
kernel is essentially a pair of matched devices which are driven by a control signal $V_W$ 
and are fed by a balanced differential input voltage $v_X = V_X - (-V_X)$. Note that both 
output terminals share the same reference potential $V_{Ref}$. In practice this virtual 
short-circuit is achieved using an op amp. 
The current flowing through each transistor can be expressed as follows 
$$I_{M1} = K_N \left[ (V_W - V_{Ref} - V_T)(V_X - V_{Ref}) - \tfrac{1}{2} (V_X - V_{Ref})^2 \right] \qquad (2.9)$$
and 
$$I_{M2} = K_N \left[ (V_W - V_{Ref} - V_T)(-V_X - V_{Ref}) - \tfrac{1}{2} (-V_X - V_{Ref})^2 \right] \qquad (2.10)$$
Thus the difference output current of the synapse is given by 
$$i_U = I_{M1} - I_{M2} = K_N (V_W - V_T) v_X \qquad (2.11)$$
This expression shows that, at the expense of an additional device, the linearity is 
considerably improved. The simulation results shown in Fig. 2.5 further demonstrate the 
improvement. 
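The cancellation of the square term can be checked numerically. The sketch below evaluates the second-order triode model (2.7) for a single device and for the matched pair of (2.9) and (2.10). The values of $K_N$, $V_T$ and the bias voltages are hypothetical and are not the MIETEC process parameters; the single-device case assumes the source sits at a 0 V virtual ground.

```python
def triode_current(k_n, v_gs, v_ds, v_t):
    """Second-order triode model (2.7): K_N[(V_GS - V_T) V_DS - V_DS^2 / 2]."""
    return k_n * ((v_gs - v_t) * v_ds - 0.5 * v_ds ** 2)


def single_synapse(k_n, v_w, v_x, v_t):
    """Single-transistor synapse, source assumed at a 0 V virtual ground."""
    return triode_current(k_n, v_w, v_x, v_t)


def dual_synapse(k_n, v_w, v_x, v_t, v_ref=0.0):
    """Dual-transistor synapse: I_M1 - I_M2 from (2.9) and (2.10)."""
    i_m1 = triode_current(k_n, v_w - v_ref, v_x - v_ref, v_t)
    i_m2 = triode_current(k_n, v_w - v_ref, -v_x - v_ref, v_t)
    return i_m1 - i_m2


if __name__ == "__main__":
    K_N, V_T = 10e-6, 0.8        # hypothetical device values (A/V^2, V)
    V_W, V_X = 2.0, 0.5          # hypothetical control and input voltages (V)
    ideal_single = K_N * (V_W - V_T) * V_X
    ideal_dual = K_N * (V_W - V_T) * (2.0 * V_X)   # v_X = V_X - (-V_X)
    print("single:", single_synapse(K_N, V_W, V_X, V_T), "ideal:", ideal_single)
    print("dual:  ", dual_synapse(K_N, V_W, V_X, V_T, 0.3), "ideal:", ideal_dual)
```

Under this model the single device deviates from the ideal product by the fraction $V_X / [2 (V_W - V_T)]$, whereas the pair reproduces (2.11) for any $V_{Ref}$.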
Although such a structure enhances the performance of synaptic multiplication, it is 
still lacking the fundamental bipolar weight feature and has the additional 
inconvenience of requiring a balanced differential input signal. 
2.1.2.3 Cross-coupled quad synapse 
To achieve a synaptic connection which offers a bipolar weight characteristic implies 
the use of a four-quadrant multiplier. This feature is readily obtainable by combining a 
pair of two-quadrant multipliers. Such a design technique has been frequently used in 
analogue VLSI technology. If one considers the dual-transistor arrangement as a basic 
building block, then the circuit presented in Fig. 2.6 is a direct product of such a design 
approach. This scheme is commonly known as the cross-coupled quad [57-60] and 
consists of four matched devices $M_1$-$M_4$ driven by input voltages $V_X^+ = V_X + v_X/2$ and 
$V_X^- = V_X - v_X/2$, and control signals $V_W^+ = V_W + v_W/2$ and $V_W^- = V_W - v_W/2$, where $V_X$ is 
the common-mode input level, $v_X$ is the dynamic input voltage, and $V_W$ and $v_W$ are the 
quiescent and dynamic control potentials, respectively. It can be shown that, utilising 
(2.7), the difference output currents of the dual-transistor arrangements $DT_1$ and $DT_2$ are 
given by 
$$i_{U1} = I_{M1} - I_{M2} = K_N \left( V_W + \frac{v_W}{2} - V_X - V_T \right) v_X \qquad (2.12)$$
and 
$$i_{U2} = I_{M4} - I_{M3} = K_N \left( V_W - \frac{v_W}{2} - V_X - V_T \right) v_X \qquad (2.13)$$
Thus the difference output current of the synapse is expressed as 
$$i_U = i_{U1} - i_{U2} = (I_{M1} + I_{M3}) - (I_{M2} + I_{M4}) = K_N \cdot v_W \cdot v_X \qquad (2.14)$$
Since the synaptic output is independent of the threshold voltage of the transistors, the 
cross-coupled scheme can be implemented using either enhancement or depletion 
devices of either n- or p-channel, thus permitting its integration into a wide range of 
CMOS processes. It is also interesting to note that the cell is symmetrical, hence 
allowing for permutability between the input and output nodes. Lehmann and Bruun 
[60] have exploited this valuable characteristic to minimise hardware requirements 
when implementing the back-propagation learning algorithm in analogue ANNs. 
Finally, balanced input and weight signals are not required. 
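As a numerical illustration of (2.12) to (2.14), the sketch below builds the four device currents from the second-order model (2.7) and confirms that the threshold voltage, the common-mode levels and the reference potential drop out of the result. The assignment of gate and drain signals to $M_1$-$M_4$ is an assumption chosen to match (2.12) and (2.13), and all numerical values are hypothetical.

```python
def triode_current(k_n, v_gs, v_ds, v_t):
    """Second-order triode model (2.7)."""
    return k_n * ((v_gs - v_t) * v_ds - 0.5 * v_ds ** 2)


def cross_coupled_quad(k_n, v_w_cm, v_w, v_x_cm, v_x, v_t, v_ref=0.0):
    """Difference output current (I_M1 + I_M3) - (I_M2 + I_M4) of the quad.

    v_w_cm, v_x_cm: quiescent / common-mode levels; v_w, v_x: dynamic parts.
    Assumed wiring: M1, M2 gated by V_W+ and M3, M4 by V_W-;
    M1, M4 see V_X+ and M2, M3 see V_X- on their input terminals.
    """
    v_x_p, v_x_n = v_x_cm + v_x / 2.0, v_x_cm - v_x / 2.0
    v_w_p, v_w_n = v_w_cm + v_w / 2.0, v_w_cm - v_w / 2.0
    i_m1 = triode_current(k_n, v_w_p - v_ref, v_x_p - v_ref, v_t)
    i_m2 = triode_current(k_n, v_w_p - v_ref, v_x_n - v_ref, v_t)
    i_m3 = triode_current(k_n, v_w_n - v_ref, v_x_n - v_ref, v_t)
    i_m4 = triode_current(k_n, v_w_n - v_ref, v_x_p - v_ref, v_t)
    return (i_m1 + i_m3) - (i_m2 + i_m4)


if __name__ == "__main__":
    K_N = 10e-6  # hypothetical transconductance factor (A/V^2)
    for v_w in (-0.4, 0.4):          # weight of either sign
        for v_x in (-0.2, 0.2):      # input of either sign
            i_u = cross_coupled_quad(K_N, 3.0, v_w, 0.5, v_x, v_t=0.8)
            print(f"v_W={v_w:+.1f} V, v_X={v_x:+.1f} V -> i_U={i_u:+.3e} A")
```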
The family of simulated static characteristics of Fig. 2.7 shows the output current 
versus the input signal over the range -1 V to 1 V for various differential weight values. 
These simulation results confirm that, for an aspect ratio of $W/L$ = 2.4 µm / 24 µm for all 
transistors and a bulk potential of -1 V, the cross-coupled arrangement offers the 
possibility of programming weight values of either sign. It may be seen that the transfer 
characteristics in the first and second quadrants are asymmetrical and that the error 
reaches approximately 15%. This is mainly due to higher order non-linearities in the 
drain currents (i.e. cubic and quadratic terms in $V_{DS}$). However the major drawback is 
the need for an op amp, in order to provide for the virtual output short-circuit, which can 
occupy substantial silicon area. 
2.1.2.4 Differential pair synapse 
As mentioned above the role of the op amp, in synaptic designs based on MOS 
transistors acting as voltage-controllable two-terminal resistors, is to maintain a pair of 
nodes at the same potential. This section demonstrates how this requirement can be 
avoided when utilising the MOS device as a transconductor operating either in the 
saturation or subthreshold mode of conduction. 
At the basis of their synaptic designs several researchers have exploited the versatile 
transconductor-based differential pair multiplier shown in Fig. 2.8. This structure is also 
referred to as the long-tail pair and divides a biasing current $I_W$ between a pair of 
transistors $M_1$ and $M_2$ as a function of the difference between the input voltages 
$V_X^+ = V_X + v_X/2$ and $V_X^- = V_X - v_X/2$, where $V_X$ and $v_X$ are the common-mode and 
dynamic input levels, respectively. Note that independent of the regime of operation the 
tail and the drain currents are related by 
$$I_W = I_{M1} + I_{M2} \qquad (2.15)$$
2.1.2.4.1 A differential pair in saturation MOS 
The voltage-to-current transfer characteristic of the long-tail pair based on saturated 
devices can be determined utilising the simplified square-law given in Appendix A by 
$$I_D = \frac{K_N}{2} (V_{GS} - V_T)^2 \qquad (2.16)$$
If the two transistors making up the pair have well matched features, their drain currents 
are expressed as 
$$I_{M1} = \frac{K_N}{2} \left( V_X + \frac{v_X}{2} - V_S - V_T \right)^2 \qquad (2.17)$$
and 
$$I_{M2} = \frac{K_N}{2} \left( V_X - \frac{v_X}{2} - V_S - V_T \right)^2 \qquad (2.18)$$
where $V_S$ represents the common-source node potential. Combining (2.15), (2.17) and 
(2.18), the difference output current is given by 
$$i_U = I_{M1} - I_{M2} = \sqrt{K_N I_W} \, v_X \sqrt{1 - \frac{K_N v_X^2}{4 I_W}} \qquad (2.19)$$
This expression identifies the inherently non-linear nature of the transfer characteristic. 
This can be assessed as follows. Substituting (2.19) into (2.17) and (2.18), the drain 
currents of the devices can be rewritten as 
$$I_{M1} = \frac{I_W}{2} + \frac{\sqrt{K_N I_W}}{2} \, v_X \sqrt{1 - \frac{K_N v_X^2}{4 I_W}} \qquad (2.20)$$
and 
$$I_{M2} = \frac{I_W}{2} - \frac{\sqrt{K_N I_W}}{2} \, v_X \sqrt{1 - \frac{K_N v_X^2}{4 I_W}} \qquad (2.21)$$
Thus it may be noted that these expressions are valid as long as $v_X$ does not exceed the 
operating range given by 
$$|v_X| \le \sqrt{\frac{2 I_W}{K_N}} \qquad (2.22)$$
Beyond this range the difference output current saturates at $-I_W$ or $I_W$ as 
appropriate. However, for input signals well within this range of operation, (2.19) may be 
approximated to a first order as 
$$i_U \approx \sqrt{K_N I_W} \, v_X \qquad (2.23)$$
It can be appreciated that the difference output current is independent of the drain-to-source 
potentials of $M_1$ and $M_2$, as long as these do not reach the minimum saturation 
limit $V_{DS}$ given in (2.8), thus allowing for a resistive load to be directly connected to the 
synapse output [50-52], [69] (i.e. without the need of an impedance adaptor such as an 
op amp). However the saturation conditions expressed in (2.22) are dependent upon the 
weight control signal $I_W$. This brings to the fore the fundamental problem of the 
differential pair in saturation MOS, which is that the smaller the weight value the 
narrower the input operating range. This is illustrated in the graph of simulated static 
characteristics depicted in Fig. 2.9. 
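The dependence of the operating range on the tail current can be illustrated with a short numerical sketch of (2.19) and (2.22). The transconductance factor and tail currents used here are hypothetical values chosen only for illustration.

```python
import math


def input_range(k_n, i_w):
    """Largest |v_X| for which (2.19) holds, from (2.22): sqrt(2 I_W / K_N)."""
    return math.sqrt(2.0 * i_w / k_n)


def diff_pair_saturation(k_n, i_w, v_x):
    """Difference output current (2.19), clipped to +/- I_W outside (2.22)."""
    if abs(v_x) >= input_range(k_n, i_w):
        return math.copysign(i_w, v_x)
    return math.sqrt(k_n * i_w) * v_x * math.sqrt(1.0 - k_n * v_x ** 2 / (4.0 * i_w))


if __name__ == "__main__":
    K_N = 10e-6  # hypothetical transconductance factor (A/V^2)
    for i_w in (4e-6, 1e-6, 0.25e-6):  # hypothetical tail currents (A)
        print(f"I_W={i_w:.2e} A -> usable |v_X| up to {input_range(K_N, i_w):.3f} V")
```

In this model, reducing $I_W$ by a factor of four halves the usable input range, which is the weight-dependent narrowing described above.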
2.1.2.4.2 A differential pair in subthreshold MOS 
With the assumption that the drain-to-source quiescent potential is greater than four 
times the thermal voltage $V_t = kT/q$ (approximately 26 mV at room temperature), but 
low enough to disregard the Early effect, the drain current of an NMOS transistor biased 
in the subthreshold region is related to its gate-to-source potential (Appendix A) as 
$$I_D = I_X \exp\left( \kappa \frac{V_{GS}}{V_t} \right) \qquad (2.24)$$
where $\kappa$ is a measure of the effectiveness of the gate in controlling the channel current 
and $I_X = I_{D0} \cdot W/L$, where $I_{D0}$ is the characteristic current of the transistor. Thus the 
currents flowing through a differential pair of matched devices can be expressed as 
$$I_{M1} = I_X \exp\left( \kappa \frac{V_X + v_X/2}{V_t} \right) \exp\left( -\kappa \frac{V_S}{V_t} \right) \qquad (2.25)$$
and 
$$I_{M2} = I_X \exp\left( \kappa \frac{V_X - v_X/2}{V_t} \right) \exp\left( -\kappa \frac{V_S}{V_t} \right) \qquad (2.26)$$
Combining these two expressions and (2.15), it can readily be shown that the sum of the 
device currents at the common-source node is given by 
$$I_W = I_X \exp\left( \kappa \frac{V_X - V_S}{V_t} \right) \left[ \exp\left( \kappa \frac{v_X}{2 V_t} \right) + \exp\left( -\kappa \frac{v_X}{2 V_t} \right) \right] \qquad (2.27)$$
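Hence the difference output current [22], [79] can be formulated as 
$$i_U = I_W \, \frac{\exp\left( \kappa \frac{v_X}{2 V_t} \right) - \exp\left( -\kappa \frac{v_X}{2 V_t} \right)}{\exp\left( \kappa \frac{v_X}{2 V_t} \right) + \exp\left( -\kappa \frac{v_X}{2 V_t} \right)} = I_W \tanh\left( \kappa \frac{v_X}{2 V_t} \right) \qquad (2.28)$$
As in the saturation mode of conduction, the transfer characteristic has a non-linear 
response with respect to the differential input and saturates at $|I_W|$ if $|v_X|$ is greater 
than a few $V_t$ (e.g. $|v_X| \approx 6 V_t$). Nevertheless, given that the tanh function can be 
expanded in series as 
$$\tanh(x) = x - \frac{1}{3} x^3 + \frac{2}{15} x^5 - \cdots \quad \text{for } |x| < \frac{\pi}{2} \qquad (2.29)$$
then for $|v_X| < V_t / \kappa$, (2.28) can be approximated to a first order as 
$$i_U \approx \frac{\kappa I_W}{2 V_t} \, v_X \qquad (2.30)$$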
Thus, biased in weak inversion, the long-tail pair also operates as a linear 
transconductance multiplier for input voltages less than $V_t / \kappa$. Although the linear range 
of operation is smaller than when the scheme is operated in the saturation region, it is 
independent of the biasing current, as can be seen in the family of simulated 
static characteristics shown in Fig. 2.10. It may also be deduced from (2.30) that, as 
with saturated devices, the scheme can be directly loaded [22], [67], [79]. However the 
fundamental advantage of subthreshold operation over the saturation mode of 
conduction is that power dissipation is diminished since current and biasing voltage 
levels are typically lower. 
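To give a feel for the size of the linear range, the sketch below compares the tanh law (2.28) with its first-order approximation (2.30). The thermal voltage is the 26 mV quoted above; the value of $\kappa$ and the tail current are hypothetical.

```python
import math

V_THERMAL = 0.026  # thermal voltage V_t in volts (about 26 mV at room temperature)
KAPPA = 0.7        # gate effectiveness; hypothetical value for illustration


def diff_pair_subthreshold(i_w, v_x, kappa=KAPPA, v_t=V_THERMAL):
    """Difference output current of the weak-inversion pair, eq. (2.28)."""
    return i_w * math.tanh(kappa * v_x / (2.0 * v_t))


def diff_pair_linear(i_w, v_x, kappa=KAPPA, v_t=V_THERMAL):
    """First-order approximation, eq. (2.30)."""
    return kappa * i_w * v_x / (2.0 * v_t)


def linearisation_error(v_x, kappa=KAPPA, v_t=V_THERMAL):
    """Relative overshoot of (2.30) over (2.28); it does not depend on I_W."""
    x = kappa * v_x / (2.0 * v_t)
    if x == 0.0:
        return 0.0
    return x / math.tanh(x) - 1.0


if __name__ == "__main__":
    v_edge = V_THERMAL / KAPPA  # edge of the linear range, |v_X| = V_t / kappa
    print(f"range edge: {1e3 * v_edge:.1f} mV, "
          f"error there: {100 * linearisation_error(v_edge):.1f} %")
```

At the edge of the stated range, $|v_X| = V_t / \kappa$, the linear approximation overshoots the tanh law by roughly 8%, whatever the tail current.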
2.1.2.5 Multiple differential pair synapse 
The differential pair is however a two-quadrant multiplier and can therefore only 
provide unipolar weight values. Different design techniques have been suggested to 
extend this feature to a four-quadrant multiplication. 
The first approach is based on a long-tail pair loaded by a sign switching cell which 
is made up of four switching transistors $M_1$-$M_4$ controlled by a binary value $B_s$ [51], [79] 
as shown in Fig. 2.11. The difference output current of the complete structure is then 
given by 
$$i_{out} = I_{out}^+ - I_{out}^- = \begin{cases} i_{in} & \text{if } B_s = 1 \\ -i_{in} & \text{if } B_s = 0 \end{cases} \qquad (2.31)$$
where $i_{in}$ is the difference output current of a long-tail pair biased either in strong or 
weak inversion. The difficulty with this technique is that the weight is controlled by 
both an analogue and a digital signal. 
The second approach consists of coupling two differential pairs $DP_1$, $DP_2$ which 
share the same input voltages $V_X^+$ and $V_X^-$, and are biased by $I_W^+ = I_W + i_W/2$ and 
$I_W^- = I_W - i_W/2$ respectively, where $I_W$ and $i_W$ are the quiescent and dynamic components 
of the biasing currents, respectively. This scheme is also known as the modified Gilbert 
multiplier and is shown in Fig. 2.12. In the saturation regime [47], [52] the 
transconductance characteristic of this cell can be obtained utilising (2.19) and is 
expressed as 
$$i_U = i_{DP1} - i_{DP2} = \sqrt{K_N I_W} \left[ \sqrt{1 + \frac{i_W}{2 I_W} - \frac{K_N v_X^2}{4 I_W}} - \sqrt{1 - \frac{i_W}{2 I_W} - \frac{K_N v_X^2}{4 I_W}} \right] v_X \qquad (2.32)$$
which simplifies to 
$$i_U = \sqrt{K_N I_W} \left[ \sqrt{1 + \frac{i_W}{2 I_W}} - \sqrt{1 - \frac{i_W}{2 I_W}} \right] v_X \qquad (2.33)$$
for $|v_X|$ much smaller than $\sqrt{2 I_W / K_N}$, where $i_{DP1}$ and $i_{DP2}$ represent the difference 
output currents of the differential pairs $DP_1$ and $DP_2$ respectively. For devices 
operated in the subthreshold region [22], [67], the output current 
$$i_U = i_W \tanh\left( \kappa \frac{v_X}{2 V_t} \right) \qquad (2.34)$$
is deduced by substituting (2.28) for $i_{DP1}$ and $i_{DP2}$, where it has been assumed that all devices 
have identical features. The substantial advantage of subthreshold over saturation mode 
is that the synaptic weight is controlled in a linear fashion. The static behaviour of this 
multiplier operated in strong and weak inversion is depicted in Fig. 2.13 and 2.14 
respectively. Note that, when biased in saturation mode (2.32), the linearity range 
can be widened at the expense of increasing the quiescent component of the biasing 
currents. 
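The difference between the two weight-control laws can be made concrete with a small numerical sketch. It evaluates the small-signal gain $i_U / v_X$ implied by (2.33) for saturated devices and by the linearised form of (2.34) for subthreshold devices. The function names and every numerical value are illustrative assumptions.

```python
import math


def weight_gain_saturation(k_n, i_w_q, i_w_d):
    """Small-signal gain i_U / v_X from (2.33).

    i_w_q: quiescent bias component I_W; i_w_d: dynamic component i_W.
    """
    r = i_w_d / (2.0 * i_w_q)
    if abs(r) > 1.0:
        raise ValueError("|i_W| must not exceed 2 I_W (a bias current would go negative)")
    return math.sqrt(k_n * i_w_q) * (math.sqrt(1.0 + r) - math.sqrt(1.0 - r))


def weight_gain_subthreshold(i_w_d, kappa, v_t):
    """Small-signal gain i_U / v_X from (2.34) with tanh(x) ~ x."""
    return kappa * i_w_d / (2.0 * v_t)


if __name__ == "__main__":
    K_N, I_WQ = 10e-6, 1e-6      # hypothetical values (A/V^2, A)
    KAPPA, V_T = 0.7, 0.026      # hypothetical kappa; V_t of about 26 mV
    for frac in (0.5, 1.0, 2.0):
        g_sat = weight_gain_saturation(K_N, I_WQ, frac * I_WQ)
        g_sub = weight_gain_subthreshold(frac * 1e-9, KAPPA, V_T)
        print(f"i_W = {frac:.1f} x ref: saturation {g_sat:.3e} S, subthreshold {g_sub:.3e} S")
```

Doubling $i_W$ exactly doubles the subthreshold gain, whereas in the saturation model the gain grows faster than proportionally as $i_W$ approaches $2 I_W$.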
The alternative and by far the most commonly used synaptic design, shown in Fig. 
2.15, exploits the MOS version of the Gilbert cell [70] which utilises an additional 
long-tail pair to provide for the biasing currents of the preceding scheme. The fixed 
biasing current $I_B$ is then distributed between the differential pairs $DP_1$ and $DP_2$ via a 
third differential pair $DP_3$ in a manner determined by the difference weight voltage 
$v_W = V_W^+ - V_W^-$. Operated in the saturation mode of conduction [53-55], the synapse 
output current can be derived by combining (2.20), (2.21) and (2.32) and is formulated 
as 
$$i_U = \sqrt{\frac{K_N I_B}{2}} \left[ \sqrt{1 + \sqrt{\frac{K_N}{I_B}} \, v_W \sqrt{1 - \frac{K_N v_W^2}{4 I_B}} - \frac{K_N v_X^2}{2 I_B}} - \sqrt{1 - \sqrt{\frac{K_N}{I_B}} \, v_W \sqrt{1 - \frac{K_N v_W^2}{4 I_B}} - \frac{K_N v_X^2}{2 I_B}} \right] v_X \qquad (2.35)$$
which is non-linearly related to $v_X$ and $v_W$ in an interactive fashion. Nevertheless, for 
small input and weight signals (2.35) can be approximated as 
$$i_U \approx \frac{K_N}{\sqrt{2}} \, v_W \, v_X \qquad (2.36)$$
Note that the linearity range of the grounded differential pair is fixed and defined by the 
value of $I_B$ and $K_N$, while the synaptic weight is only related to the transconductance 
factor of the transistors. Thus the operating range can be adjusted at will without 
affecting the connection strength. Combining (2.34) and (2.28), the difference output 
current of the Gilbert structure becomes 
$$i_U = I_B \tanh\left( \kappa \frac{v_W}{2 V_t} \right) \tanh\left( \kappa \frac{v_X}{2 V_t} \right) \qquad (2.37)$$
when biased in the subthreshold region [22], [62-64], [68-69]. If the signal swing of 
both $v_X$ and $v_W$ is limited to $|V_t / \kappa|$, $\tanh(x)$ approximates to $x$ (2.29) and the Gilbert 
cell then behaves as a linear transconductance multiplier since its transfer characteristic 
(2.37) simplifies to 
$$i_U \approx I_B \, \frac{\kappa^2}{4 V_t^2} \, v_W \, v_X \qquad (2.38)$$
The static characteristics shown in Fig. 2.16 and 2.17 confirm that irrespective of the 
mode of conduction, the Gilbert cell generates a difference output current related to the 
product of two differential voltages which may be used to represent the input and weight 
signal of a synapse. The major difference is that in the saturation mode, Fig. 2.16, the 
linearity range of the inputs can be adjusted to fit a given design, while in the weak 
inversion, Fig. 2.17, it is fixed and substantially smaller. 
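The four-quadrant behaviour of the subthreshold Gilbert cell can be sketched directly from (2.37) and its small-signal form (2.38). The bias current and $\kappa$ below are hypothetical; only the 26 mV thermal voltage is taken from the text.

```python
import math

V_THERMAL = 0.026  # thermal voltage V_t in volts (about 26 mV at room temperature)
KAPPA = 0.7        # gate effectiveness; hypothetical value for illustration


def gilbert_subthreshold(i_b, v_w, v_x, kappa=KAPPA, v_t=V_THERMAL):
    """Difference output current of the subthreshold Gilbert cell, eq. (2.37)."""
    return (i_b * math.tanh(kappa * v_w / (2.0 * v_t))
            * math.tanh(kappa * v_x / (2.0 * v_t)))


def gilbert_linearised(i_b, v_w, v_x, kappa=KAPPA, v_t=V_THERMAL):
    """Small-signal approximation, eq. (2.38)."""
    return i_b * kappa ** 2 * v_w * v_x / (4.0 * v_t ** 2)


if __name__ == "__main__":
    I_B = 100e-9  # hypothetical 100 nA bias current
    for v_w in (-0.01, 0.01):
        for v_x in (-0.01, 0.01):
            exact = gilbert_subthreshold(I_B, v_w, v_x)
            approx = gilbert_linearised(I_B, v_w, v_x)
            print(f"v_W={v_w:+.2f} V, v_X={v_x:+.2f} V: "
                  f"(2.37) {exact:+.3e} A, (2.38) {approx:+.3e} A")
```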
2.1.2.5.1 Wide range Gilbert multiplier 
From the above analysis and simulation results, it would appear that the Gilbert 
multiplier biased either in strong or weak inversion offers an ideal solution to the 
simulation of artificial synapses. Unfortunately the circuit is constrained by some 
biasing conditions which are mainly due to the fact that its design is based on a stack of 
long-tail pairs. One of the solutions to this problem is to isolate the top and bottom 
levels of the structure utilising a pair of current mirrors as shown in Fig. 2.18 [22], 
[53-54]. Although the wide range Gilbert multiplier virtually eliminates the biasing 
constraints of the Gilbert cell, it creates further inconveniences namely: 
• The weight and input stages of the multiplier are integrated using opposite 
channel devices, thus requiring the need for a twin well process; 
• The power dissipated by the modified scheme is twice that of the Gilbert 
circuit since the sum of currents flowing in both current mirrors is $2 I_B$; 
• Compared to the original scheme, the number of transistors is increased by 
70%, therefore considerably reducing the number of synaptic connections that 
are implantable on a given area of silicon; and 
• Since the number of transistors to be matched is greater, the wide range 
Gilbert cell will typically generate larger offset errors. 
In the next chapter it will be shown that it is possible to eliminate these effects whilst 
maintaining a similar transfer characteristic by utilising an arrangement in which the 
bias current is distributed across two differential pairs by modulating the bulk potentials 
of the devices. 
2.1.3 Hybrid transistor/capacitor-based synapse 
The multiplying operation performed by the synaptic connection can also be simulated 
using a hybrid transistor/capacitor technique. One approach is based on the fact that a 
charge stored on a capacitor is related to the product of the difference potential $V_X$ 
across that capacitor and its capacitive value $C_W$, which is given by 
$$Q_U = C_W \cdot V_X \qquad (2.39)$$
Here the charge is chosen as the output of the synapse in order to facilitate the 
implementation of the summation function. This issue is discussed in the next section. It 
may be noted that this technique is similar to that used for the resistor-based synapse 
with the exception that the output signal is a quantity of electrical charge instead of a 
current. Although the relationship between $Q_U$ and $V_X$ is valid at any time, the synapse 
is simulated using a switched-capacitor structure, as shown in Fig. 2.19 (a), which 
exploits the charge conservation principle [86-88]. The transistor switches are controlled 
via a pair of two-phase clock signals, as depicted in Fig. 2.19 (c). When the switch 
associated with $\phi_1$ is closed, a charge packet, with magnitude proportional to $V_X$, is 
stored on the capacitor, which is then transferred to the output of the synapse during the 
clock phase $\phi_2$ (see Fig. 2.19 (c) for more details). Since $C_W$ is regarded as the weight, an 
array of weighted capacitors is required in order to emulate variable connection 
strengths. The additional disadvantage is that only positive weight values are simulated, 
unless a complementary opposite sign structure as shown in Fig. 2.19 (b) is adopted. 
The alternative solution to the former problem is to view the switched-capacitor 
circuit of Fig. 2.19 (a) as a conductance of value 
$$G_C = C_W \cdot f \qquad (2.40)$$
where f is the switching frequency of the control signals which can therefore be used to 
modify the strength of a synaptic connection. However it has been reported [75-76] that 
it is more elegant to utilise the switching speed as the synaptic input when the post-
neuronal states are represented as sequences of pulses modulated in time, thus limiting 
the number of on-chip Voltage Control Oscillators (VCO) to a minimum. The weight is 
then represented as the input voltage of the switched-capacitor scheme. 
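The equivalence between a switched capacitor and a conductance follows from the charge packet of (2.39) being delivered once per clock period, which gives an average current of $C_W \cdot V_X \cdot f$. The sketch below works through this with hypothetical component values.

```python
def charge_packet(c_w, v_x):
    """Charge stored on the capacitor during one clock phase, eq. (2.39)."""
    return c_w * v_x


def average_current(c_w, v_x, f_clk):
    """Average output current when one packet is transferred per clock period."""
    return charge_packet(c_w, v_x) * f_clk


def equivalent_conductance(c_w, f_clk):
    """Equivalent conductance of the switched capacitor, eq. (2.40)."""
    return c_w * f_clk


if __name__ == "__main__":
    C_W, F_CLK, V_X = 1e-12, 1e6, 0.5  # hypothetical: 1 pF, 1 MHz, 0.5 V
    g_c = equivalent_conductance(C_W, F_CLK)
    print(f"G_C = {g_c:.2e} S, average current = {average_current(C_W, V_X, F_CLK):.2e} A")
```

With these assumed values a 1 pF capacitor clocked at 1 MHz behaves as a conductance of about 1 µS, so either the clock frequency or the input voltage can carry the weight.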
However all of these switched-capacitor techniques suffer from the clock 
feedthrough phenomenon [89] which occurs during the turning-off transient of an MOS 
switch. 
2.2 Adder designs 
Most of the synaptic schemes presented in the literature are developed such that their 
output signals are either currents or charges. This choice is primarily made for the 
straightforward reason that the summation function can be achieved simply by 
connecting the synapse outputs to a common bus bar. Thus such an implementation 
technique leads to a compact summer circuit since it does not involve the use of 
additional components. 
2.2.1 Current-based summer 
Kirchhoff's current law stipulates that the sum of all the currents flowing into a node is 
equal to the sum of all the currents flowing out of that same node. This theorem is at the 
basis of the current summer circuit shown in Fig. 2.20. The currents $I_{U1}, I_{U2}, \ldots, I_{Uj}, \ldots, I_{UN}$ 
which enter the node are the output signals of the synaptic connections linked to an 
analogue AN. Thus, applying Kirchhoff's principle, the output of the summer is given 
by 
$$I_S = \sum_{j=1}^{N} I_{Uj} \qquad (2.41)$$
where $N$ represents the number of connecting links associated to an AN. 
2.2.2 Charge-based summer 
A charge-based summer exploits the charge conservation principle that an electrical 
charge Q cannot be either created or destroyed and that, at any given time, the sum of all 
the charges stored in an electrical node is equal to zero. Applying these basic electrical 
concepts to a neuron circuit involving connected switched-capacitor synapses, the 
quantity of electrical charge flowing out of the summing node, similar to the current 
summer scheme depicted in Fig. 2.20, is then given by 
$$Q_S = \sum_{j=1}^{N} Q_{Uj} \qquad (2.42)$$
where $Q_{Uj}$ is the output charge produced by the $j$-th switched-capacitor synapse during 
the switching phase $\phi_2$ (see Fig. 2.19 (c) for details). 
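Both summers amount to a plain sum over the synapse outputs sharing a node, as the following sketch of (2.41) and (2.42) shows. The synapse currents, capacitances and input voltages are hypothetical example values.

```python
def current_summer(synapse_currents):
    """Output of the current summer, eq. (2.41): sum of the synapse currents."""
    return sum(synapse_currents)


def charge_summer(capacitances, input_voltages):
    """Output of the charge summer, eq. (2.42), with Q_Uj = C_Wj * V_Xj (2.39)."""
    return sum(c * v for c, v in zip(capacitances, input_voltages))


if __name__ == "__main__":
    # Hypothetical synapse output currents in amperes (both signs allowed).
    print("I_S =", current_summer([1.0e-6, -2.0e-6, 0.5e-6]), "A")
    # Hypothetical weight capacitors (farads) and input voltages (volts).
    print("Q_S =", charge_summer([1e-12, 2e-12], [0.5, -0.25]), "C")
```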
2.3 Activation function generator 
The role of the activation function generator is to contain (squash) its weighted-sum 
input between two specified limits and supply an output signal which is compatible with 
the input levels required by the synapses it feeds. As a consequence its design is 
dependent upon the type of pre- and post-synaptic signals and the model of the 
squashing function which can be either a step, a ramp or a sigmoid function of either 
unipolar or bipolar characteristics. Although applications exploiting the inherent 
properties of a fully connected feedforward neural network require far less activation 
function generators than connection links, it is however still desirable that the silicon 
area associated to each squashing building block, as determined by the number and size 
of electronic devices, be minimised in order to maximise implementation efficiency. 
This feature becomes a requirement when reconfigurability is involved, since an 
identical number of synaptic and thresholding building blocks are needed [53]. 
2.3.1 Squashing function based on a transimpedance amplifier 
When the design of the interconnection link is based on the principle of either a fixed or 
variable conductor, the characteristic of the thresholding generator is that of a non-
linear resistor, since the signal provided by the summing junction is a current and the 
signal required by the input of the synapse is a voltage. The non-linearity in the transfer 
function of the resistive element represents the squashing function. It may also be noted 
that such a current-to-voltage converter needs to provide low input and output 
impedances so that the multiplying characteristics of the synapse are unaltered whatever 
the number of connections it is either fed by or supplying to, thus allowing for a large 
fan-in and fan-out. All these features are displayed by a transimpedance amplifier 
scheme [20-21], [47-49], [54], [57-58]. Fig. 2.21 depicts the structure of a transimpedance 
amplifier, which is associated with the cross-coupled quad multiplier, 
consisting of an op amp and four MOS transistors operating as variable two-terminal 
resistors. When the op amp is operating in its linear range the transfer characteristic of 
such a scheme can be obtained utilising (2.7), since the matched transistors ($M_1$-$M_4$) are 
biased in their triode region, and is given by 
where $i_S$ is the difference input current produced by the current summer. The gain of the 
current-to-voltage conversion can be adjusted via the difference control potential 
$v_C = V_{C1} - V_{C2}$. The transfer characteristic displays a saturation behaviour, thus 
providing the thresholding required by an AN, whenever the circuit attempts to generate an 
output signal greater than the supply voltage of the op amp. This is illustrated in the 
graph of simulated static characteristic depicted in Fig. 2.22. 
Although a complete multi-input programmable analogue MOS vector multiplier 
offers a feasible solution to the implementation of multi-layer feedforward ANNs, it is 
not efficient since the silicon area associated with the analogue neuron can be an
order of magnitude larger than that of the synapse [58].
2.3.2 Diode-based current-to-voltage converter 
When utilising long-tail pairs, biased either in the saturation or subthreshold mode of 
conduction, as the basis of a synaptic design the characteristic of the thresholding 
function generator is also that of a non-linear resistor for the same reasons as stated 
above. However, the differential pair possesses the following valuable features: 
• High input impedance, because the difference signal is applied to the gates of 
the devices; and 
• Low output conductance, since the drain-to-source voltages of the transistors
have a minute influence on the multiplying characteristics;
thus, the current-to-voltage converter does not need to be buffered in order to allow for
large fan-in/fan-out. As a consequence the non-linear load can be implemented using a
simple CMOS arrangement of biased diodes as shown in Fig. 2.23 [53], [56], [64]. Note
that, for a fully differential scheme, the resistor consists of four matched transistors
$M_1$-$M_4$ and two biasing voltage sources $E_1$, $E_2$. The transfer characteristic of such a
scheme can be determined assuming that the current flowing through a diode connected
transistor is exponentially related to its gate-to-source potential as in (2.24). Thus the
output difference voltage (see Appendix C) is given by
$$V_Y = V_+ - V_- = \frac{2 V_t}{\kappa} \sinh^{-1}\left(\frac{I_S}{4 I_Q}\right) \qquad (2.44)$$
where $I_Q$ is the quiescent bias current and is given by
$$I_Q = I_X \exp\left(\kappa \frac{V_{dd} - (E_1 + E_2)}{2 V_t}\right) \qquad (2.45)$$
where $V_{dd}$ is the supply voltage.
Because the drain current of the device is an exponential function of the gate-to-source
voltage, the I/V characteristic of the resistor is highly sensitive to the biasing
arrangement. This may be illustrated as follows. Given that the $\sinh^{-1}$ function can be
expanded as
$$\sinh^{-1}(x) = x - \frac{1}{2 \cdot 3} x^3 + \frac{1 \cdot 3}{2 \cdot 4 \cdot 5} x^5 - \frac{1 \cdot 3 \cdot 5}{2 \cdot 4 \cdot 6 \cdot 7} x^7 + \dots \quad \text{for } |x| < 1 \qquad (2.46)$$
then for $I_S < 4 I_Q$ the effective driving point resistance may be approximated to the
first order as
$$r_S \approx \frac{V_t}{2 \kappa I_Q} = \frac{V_t}{2 \kappa I_X \exp\left(\kappa \frac{V_{dd} - (E_1 + E_2)}{2 V_t}\right)} \qquad (2.47)$$
It can therefore be appreciated that the resistance is highly sensitive to variation in both
the supply and bias sources. The simulation results depicted in Fig. 2.24 clearly
highlight this undesirable effect. It can be seen that within the supply range 2.9 to 3.1 V
the nominal resistance of the scheme varies by at least 60%. It may also be added that
the non-linearity associated with the I/V characteristic is not necessarily the ideal
squashing function. The next chapter will demonstrate how these two issues have been
tackled.
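The supply sensitivity of the first-order resistance can be illustrated numerically. The following Python sketch assumes only that the resistance is proportional to exp(-κ(V_dd - (E1 + E2)) / (2 V_t)), as in (2.47), so that the bias sources and the pre-exponential current cancel in a ratio. The values κ = 0.84 and a thermal voltage of 25.9 mV are assumptions, and the function name is illustrative only.

```python
import math

KAPPA = 0.84            # assumed gate-coupling coefficient
V_THERMAL = 0.0259      # assumed thermal voltage in volts (about 300 K)


def resistance_ratio(v_dd, v_dd_nominal=3.0, kappa=KAPPA, v_t=V_THERMAL):
    """Small-signal resistance at v_dd relative to its value at v_dd_nominal.

    Uses r proportional to exp(-kappa * (V_dd - (E1 + E2)) / (2 * V_t));
    the bias sources E1, E2 and the pre-exponential current cancel in the ratio.
    """
    return math.exp(-kappa * (v_dd - v_dd_nominal) / (2.0 * v_t))


for v_dd in (2.9, 3.0, 3.1):
    ratio = resistance_ratio(v_dd)
    print(f"V_dd = {v_dd:.1f} V -> r / r_nominal = {ratio:.2f}")
```

Under these assumptions a supply change of only 0.1 V alters the first-order resistance by a factor of roughly five, which is in line with the strong supply dependence reported for the simulated circuit.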
2.3.3 Thresholding generator for pulse-stream 
The role of the thresholding function circuit, when the synapse array exploits a 
switched-capacitor design, is to convert, in a non-linear fashion, a sum of charges into a 
stream of pulses whose frequency is modulated by the magnitude of that sum. This has 
been achieved by converting the charge packets into a correlated analogue voltage, 
utilising a leaky integrator whose design exploits a standard op amp integrated into a 
resistive/capacitive negative-feedback loop, and then subsequently transforming that 
voltage into a frequency modulated signal using a VCO [75-76]. Such an electronic 
system is depicted in Fig. 2.25. Note that the feedback resistor can be implemented as 
a switched-capacitor resistor driven by a global clock signal (common to all neurons), 
thus allowing the gain of the generator to be externally controlled. The non-linearity is 
introduced either by clamping the voltage across the feedback resistor or utilising a 
VCO which embodies a sigmoid transfer characteristic [77]. However, within its linear 
region the output voltage of the leaky integrator is defined by the following differential 
equation
$$R_S C_S \frac{dV_Y(t)}{dt} + V_Y(t) = -R_S \sum_j Q_{Uj} f_j \qquad (2.48)$$
where $Q_{Uj}$ is the amount of charge, per clock cycle, delivered by the $j$th synapse, $f_j$ is the
frequency of the clock (i.e. pre-synaptic input) and $R_S$ and $C_S$ are respectively the
feedback resistor and capacitor.
Due to the nature of the above expression, the system requires five times the time
constant $\tau_S = R_S C_S$ to settle to an output steady state activity given by
$$V_Y(t) = -R_S \sum_j Q_{Uj} f_j \qquad (2.49)$$
thus limiting the computational speed of an ANN. It should be emphasised that due to
the transient behaviour of the synapse, a ripple signal, superimposed onto the output of
the leaky integrator, is generated and has a magnitude given by
$$V_{Y\,ripple} = V_W \frac{C_W}{C_S} \qquad (2.50)$$
where $V_W$ is the weight voltage of the switched-capacitor connection and $C_W$ is its
associated capacitor. This undesired variation has the effect of limiting the processing
accuracy of the neuron. Both expressions (2.48) and (2.50) encapsulate the fundamental 
compromises between speed and accuracy of computation of pulse stream neural 
networks based on a switched-capacitor design. 
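The compromise between speed and accuracy can be made concrete with a short numerical sketch. The Python code below assumes a first-order response towards the steady state -R_S Σ Q_Uj f_j with time constant R_S C_S, and a ripple of V_W C_W / C_S. All component values and function names are hypothetical and chosen only for illustration.

```python
import math


def leaky_integrator(t, r_s, c_s, charges, freqs):
    """Output voltage at time t of a leaky integrator starting from 0 V.

    First-order response towards the steady state -R_S * sum(Q_j * f_j)
    with time constant R_S * C_S.
    """
    v_steady = -r_s * sum(q * f for q, f in zip(charges, freqs))
    tau = r_s * c_s
    return v_steady * (1.0 - math.exp(-t / tau))


def ripple(v_w, c_w, c_s):
    """Ripple magnitude V_W * C_W / C_S produced by one charge packet."""
    return v_w * c_w / c_s


# Hypothetical component values, for illustration only.
R_S = 10e6                    # feedback resistance in ohms
V_W, C_W = 1.0, 50e-15        # weight voltage (V) and synapse capacitor (F)
charges = [V_W * C_W] * 4     # charge per clock cycle from four synapses (C)
freqs = [1e6] * 4             # pre-synaptic pulse frequencies (Hz)

for c_s in (1e-12, 10e-12):
    tau = R_S * c_s
    v_end = leaky_integrator(5 * tau, R_S, c_s, charges, freqs)
    print(f"C_S = {c_s * 1e12:.0f} pF: settling time 5*tau = {5 * tau * 1e6:.0f} us, "
          f"V_Y(5*tau) = {v_end:.3f} V, ripple = {ripple(V_W, C_W, c_s) * 1e3:.1f} mV")
```

Increasing C_S by a factor of ten reduces the ripple by the same factor but lengthens the settling time tenfold, which is the trade-off expressed by the two relations.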
2.4 Weight storage 
As mentioned earlier, the characteristics of a neural network are determined by its 
topology and the weights associated with each connection. This section will only deal 
with issues related to the weights. The strength of each weight is
estimated during the learning procedure. The ability of the network to use these
weights in a subsequent application is ensured by storing their values in a
non-volatile memory circuit. The building block utilised to perform this task is usually
referred to as either the weight storage cell or the long-term memory scheme. It may be
noted that as one of these cells is associated with each synapse, it is desirable that its
silicon area be minimised allowing for a systolic simulation of ANNs. Since learning is
an intensive iterative process, whereby the weight values can be altered many times 
[12], [18], [34], [91-92] (i.e. depending upon the type of algorithm), it is then essential 
that the speed at which the data is written in the memory cell be high in order to allow 
for rapid training. To implement this feature either for an on-chip or an off-chip system, 
it is also recommended that the memory cell offers a linear characteristic which may 
lead to a simpler and more compact weight storage controlling circuitry. The storage 
scheme should also offer a weight resolution of at least 8 bits in order to allow
convergence during learning [93-94]. Utilising the above mentioned criteria, the 
following subsections will assess the viability of some storage circuits which have been 
developed in relation to the synaptic designs described earlier. 
2.4.1 Digital weight storage 
One straightforward weight storage design is based on a binary weighted array of 
devices. The elements can be either resistors [49] or capacitors [88] or current sources 
[50-51], [79] depending on the principle upon which the synapse is designed. Note that 
this technique has been inspired by the memory system used in a digital computer. A 
memory scheme based on weighted current sources is depicted in Fig. 2.26. For this 
example, transistors $M_1$ to $M_N$ behave as current sources with magnitudes determined by
the biasing voltage $V_{bias}$. Transistors $S_1$ to $S_N$ act as switches which are controlled by
binary signals stored on non-volatile digital memory cells. The output current of the
circuit is given by
$$I_W = I b_1 + 2 I b_2 + 4 I b_3 + \dots + 2^{N-1} I b_N = \sum_{i=1}^{N} 2^{i-1} I b_i \qquad (2.51)$$
where $b_i$ is either "0" or "1" according to the binary state of $B_i$, $I$ is the unit current
source and $N$ represents the number of bits. Note that the number of transistors needed
to implement the digital-to-analogue converter circuitry alone (i.e. without including the
digital memory cells) is $2^N + N - 1$. In the case of learning rules which require high
weight resolution (i.e. at least 8 bits), this inherent feature militates against the small
area condition required for a practical synaptic connection. Furthermore the size of those 
devices must also be large enough to obtain a monotonic conversion [82], even though 
mismatching effects can in practice be reduced by taking advantage of a common 
centroid layout [19]. To optimise the silicon area associated with digital storage cells, a 
design based on shift registers has been suggested [50-51]. However such schemes 
suffer from low learning speeds since the weight data are processed serially. 
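The area penalty of the binary weighted scheme can be illustrated with a short Python sketch of (2.51) and of the transistor count quoted above. The function names are illustrative, and the reading of the count as 2^N - 1 unit current sources plus N switches is an assumption.

```python
def dac_current(bits, unit_current):
    """Output current of (2.51) for bits given least significant bit first.

    bits[0] corresponds to b_1, so bit i (0-based) carries a weight of 2**i.
    """
    return sum((2 ** i) * unit_current * b for i, b in enumerate(bits))


def dac_transistor_count(n_bits):
    """Transistors in the converter alone, 2**N + N - 1.

    Assumed breakdown: 2**N - 1 unit current sources plus N switches.
    """
    return 2 ** n_bits + n_bits - 1


# Example with a hypothetical unit current of 1 nA and the code b1=1, b2=0, b3=1.
print(dac_current([1, 0, 1], 1e-9))
for n in (3, 4, 8):
    print(n, "bits ->", dac_transistor_count(n), "transistors")
```

The count grows exponentially with the number of bits, which is why this form of storage becomes impractical for each synapse at resolutions of 8 bits or more.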
2.4.2 Capacitive weight storage
A basic capacitive sample/hold system can be used as a storage scheme [47], [52-54], 
[56], [58], [67-69], [75-76]. Its structure is shown in Fig. 2.27, and consists of a
switching transistor $M_S$ which is controlled by a switching signal $\phi$ and a hold
capacitance $C_{hold}$. In order for the quantum of stored charges to remain fixed during the
holding time the sensory circuit (i.e. synapse) must provide for a high impedance. In
practice this can be achieved by connecting the capacitor to either the gate or bulk of an
MOS device. Although most mixed analogue/digital processes offer the possibility of
integrating poly-poly capacitors, the cell area may be significantly reduced by
implementing $C_{hold}$ as an NMOS device $M_{hold}$ for which the bulk, drain and source
terminals are grounded. This is justified by the fact that the capacitance is inversely
proportional to the thickness of the dielectric which is that of the gate oxide. However
this approach is subject to one requirement which is that the difference potential across
the device must, at all times, be greater than the threshold voltage in order for it to be
biased in strong inversion [74]. Provided that this condition is fulfilled, then the gate of
the transistor acts as the top plate, the inversion layer as the bottom plate and the value
of the capacitor may be approximated by
$$C_{hold} = W_{hold} L_{hold} \frac{\varepsilon_{ox}}{t_{ox}} \qquad (2.52)$$
where $W_{hold}$ and $L_{hold}$ are respectively the width and the length of $M_{hold}$, $t_{ox}$ is the
thickness of the insulator and $\varepsilon_{ox}$ is its permittivity.
The basic sample/hold circuit suffers from two major drawbacks: 
• During the holding time, some stored charge (information) is lost due to 
leakage occurring into the substrate via the reverse-biased source/drain-to-
substrate diodes of the switching transistor, and also through subthreshold 
conduction. 
• The stored charge is perturbed during the switch-off transient of the access 
transistor $M_S$.
These issues will be thoroughly discussed in chapter 4. 
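As a rough numerical illustration of (2.52) and of the effect of leakage, the Python sketch below evaluates the gate capacitance of the hold device and the resulting droop rate dV/dt = I_leak / C_hold. The geometry, oxide thickness and leakage current are hypothetical values, not taken from any particular process, and the function names are illustrative.

```python
EPS_0 = 8.8541878128e-12   # permittivity of free space in F/m
EPS_R_SIO2 = 3.9           # relative permittivity of silicon dioxide


def hold_capacitance(width, length, t_ox):
    """Gate capacitance W * L * eps_ox / t_ox in farads (strong inversion assumed)."""
    return width * length * EPS_R_SIO2 * EPS_0 / t_ox


def droop_rate(i_leak, c_hold):
    """Rate of change of the stored voltage in V/s for a constant leakage current."""
    return i_leak / c_hold


# Hypothetical device: 10 um x 10 um gate, 20 nm oxide, 1 fA leakage.
c_hold = hold_capacitance(10e-6, 10e-6, 20e-9)
print(f"C_hold = {c_hold * 1e15:.1f} fF")
print(f"droop  = {droop_rate(1e-15, c_hold) * 1e3:.2f} mV/s")
```

Even a leakage of the order of a femtoampere produces a droop of several millivolts per second on such a capacitor, which indicates why the retention time of this scheme is limited.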
2.4.3 EEPROM technology for weight storage
An alternative approach to the implementation of a non-volatile, alterable, compact 
weight memory cell is to use the programmable threshold-voltage characteristic of 
either charge trapping or floating gate MOS transistors [71], [95-96]. This technology 
was originally developed for the design of Electrically Erasable Programmable Read Only
Memory (EEPROM). These devices operate on the principle that charges trapped in
one or more extra layers of insulated gate situated between the control gate and the
channel effectively produce a shift in the threshold voltage $V_T$. Thus the current flowing
through a Floating Gate MOS (FGMOS) device is determined by the amount of charge 
deposited on the floating-gate as well as the potential applied on the control gate, drain, 
source and substrate. Fig. 2.28 shows the cross-section of an FGMOS transistor (charge 
injector mechanism is not shown). Note that such a structure can be fabricated utilising 
a standard double-poly CMOS process wherein poly-1 and poly-2 layers are 
respectively utilised to act as the floating and controlling gate [96]. FGMOS transistors 
can be grouped in different categories depending upon the type of mechanism used to 
inject or remove charge from the floating gate. The most common methods [71] are: 
• Avalanche injection; and 
• Fowler-Nordheim tunnelling injection. 
The process of charging and discharging a floating gate device is energy intensive. To 
obtain an avalanche breakdown phenomenon from a reverse biased n-p junction, a 
voltage in excess of +30 V is required, whereas tunnelling either electrons or holes from 
the channel to a floating gate through a thin oxide necessitates pulses of ±15 V of 
amplitude. Although tunnelling is compatible with today's low voltage VLSI
technologies, the shift in $V_T$ created by a train of constant pulses is proportional to the
logarithm of the number of pulses, and the characteristics of the charging and
discharging procedures differ. This strong non-linearity makes the weight value
extremely difficult to program accurately. On the positive side however, since leakage 
through the oxide insulator is extremely low, the retention time of programmed devices 
can range from 1 to 10 years. 
2.5 Summary 
In this chapter, the most common analogue and hybrid designs used to simulate the 
primary functions of ANN have been discussed. 
It has been shown that synaptic programmability can be obtained utilising transistors 
acting either as voltage controllable resistors or as adjustable transconductors biased 
either in the subthreshold or saturation mode of conduction. To provide for signal 
adaptation, the resistor-based synapses have, however, the disadvantage of requiring an 
op amp which is area hungry. The alternative solutions, which are based on 
combinations of differential pairs, remove this burden. Analysis and simulations 
revealed that the Gilbert multiplier and its derivative may be programmed to behave 
either as an inhibitory or as an excitatory synapse and that the saturation mode of 
conduction offers more control over signal range than subthreshold operation, which is 
an advantage acquired at the expense of an increase in power dissipation since current 
and biasing voltage levels are typically higher. However the Gilbert cell is liable to 
some biasing conditions which are costly, in power dissipation and silicon area, to 
overcome. 
The switched-capacitor synapse as a variable conductor is compact; however it lacks
computing resolution due to the clock feedthrough phenomenon, and its associated
squashing generator trades speed against accuracy of computation and is area hungry
since it involves at least one op amp.
It has also been established that when utilising synaptic schemes based on either 
analogue or hybrid designs the summation function is simple to implement since no 
electronic devices are required. 
Weight storage schemes based on designs such as digital memory, capacitive 
element and EEPROM technologies respectively suffer from area inefficiency, limited 
retention time and non-linear programmability. 
Figure 2.1: Resistor-based synapse, (a) electronic representation, (b) equivalent synaptic model.
Figure 2.2: An MOS transistor synapse.
Figure 2.3: Static characteristics of an MOS transistor synapse. $I_U$ versus input voltage.
Figure 2.4: Dual-transistor synapse.
Figure 2.5: Static characteristics of a dual-transistor synapse. $I_U$ versus input voltage for $(W/L)_{M1,2}$ = 2.4 μm / 24 μm and a reference potential of 0 V.
Figure 2.6: Cross-coupled quad synapse.
Figure 2.7: Static characteristics of the cross-coupled quad. $I_U$ versus input voltage for $(W/L)_{M1,2,3,4}$ = 2.4 μm / 24 μm, a reference potential of 0 V, a quiescent input signal of 0 V and a common-mode control signal of 3 V.
Figure 2.8: Differential-pair multiplier.
Figure 2.9: Static characteristics of the differential-pair in saturation MOS. $I_U$ against input voltage for $(W/L)_{M1,2}$ = 2.4 μm / 6 μm and a common-mode input signal of 3 V.
Figure 2.10: Static characteristics of the differential-pair in subthreshold MOS. $I_U$ against input voltage for $(W/L)_{M1,2}$ = 24 μm / 2.4 μm and a common-mode input signal of 1 V.
Figure 2.11: Sign switching cell.
Figure 2.12: Modified Gilbert cell.
Figure 2.13: Static characteristics of the modified Gilbert multiplier in saturation MOS. $I_U$ against input voltage for $(W/L)_{M1,2,3,4}$ = 2.4 μm / 6 μm, a quiescent biasing current of 5 μA and a common-mode input voltage of 3 V.
Figure 2.14: Static characteristics of the modified Gilbert multiplier in subthreshold MOS. $I_U$ against input voltage for $(W/L)_{M1,2,3,4}$ = 24 μm / 2.4 μm, a quiescent biasing current of 100 nA and a common-mode input voltage of 1 V.
Figure 2.15: MOS version of the Gilbert multiplier.
Figure 2.16: Static characteristics of the Gilbert multiplier in saturation MOS. $I_U$ against input voltage for $(W/L)_{M1,2,3,4,5,6}$ = 2.4 μm / 6 μm, a biasing current of 10 μA, a common-mode input voltage of 5 V and a quiescent weight potential of 2 V.
Figure 2.17: Static characteristics of the Gilbert multiplier in subthreshold MOS. $I_U$ against input voltage for $(W/L)_{M1,2,3,4,5,6}$ = 24 μm / 2.4 μm, a biasing current of 200 nA, a common-mode input voltage of 3 V and a quiescent weight potential of 1 V.
Figure 2.18: Wide range Gilbert multiplier.
Figure 2.19: Switched-capacitor synapse, (a) excitatory synaptic connection, (b) inhibitory synapse and (c) transient characteristics.
Figure 2.20: Current summer.
Figure 2.21: Squashing function based on a transimpedance amplifier.
Figure 2.22: Static characteristic of the transimpedance amplifier. $V_Y$ versus input current for $(W/L)_{M1,2,3,4}$ = 2.4 μm / 6 μm and $V_{C1} - V_{C2}$ = 0.2 V.
Figure 2.23: I/V converter implemented utilising a CMOS arrangement of biased diodes.
Figure 2.24: Static characteristics of the diode-based I/V converter. $V_Y$ against input current for $(W/L)_{M1,2,3,4}$ = 12 μm / 2.4 μm and $E_1 = E_2$ = 0.89 V.
Figure 2.25: Thresholding generator for pulse-stream.
Figure 2.26: Digital memory based on weighted current sources.
Figure 2.27: Sample/hold memory cell.
Figure 2.28: Floating-gate MOS transistor.
Chapter 3 
Subthreshold design for feedforward 
neural networks 
The aim of this chapter is to identify the problems associated with the design of existing 
building blocks, namely multipliers, current-to-voltage converters and weight storage 
schemes, in the context of a larger system such as a feedforward ANN when MOS 
devices are operated in the subthreshold mode of conduction and to present alternative 
solutions. A design method which exploits the exponential transfer characteristic of 
MOS transconductors is presented, based on the fact that the activation function of a 
neuron, which is incorporated into a whole analogue system, can be distributed over the 
next layer of synapses without affecting the overall behaviour of the network [64]. Two 
different cells have been specifically designed, to overcome some of the disadvantages 
associated with subthreshold biasing and to emulate the functions required by the above 
mentioned design method, namely a four-quadrant multiplier [64] and a load [99]. The 
structure of the former circuit is based on a cross-coupled quad design in which 
differential multiplication is obtained by driving the bulk (also known as the back gate)
terminals of MOS transconductors. The latter scheme displays a transfer characteristic
similar to that of the CMOS configuration of biased diodes presented in the previous
chapter, with the difference that the sensitivity to power supply variations has been
substantially reduced utilising an arrangement incorporating feedback. The
performances of these building blocks are assessed both analytically and via
simulation. The discussions will include their relative advantages and disadvantages.
The final objective of this chapter is to examine the potential, in terms of speed and
ease of programmability, of a neural network based on the proposed technique.
3.1 Statement of problems 
It has been demonstrated in the previous chapter how four-quadrant multipliers, 
exploiting the transconductance characteristic of MOS transistors, such as the Gilbert 
cell and its derivatives, offer a viable solution to the emulation of synaptic connections. 
It has also been shown that when the devices are operated in the subthreshold mode of
conduction power dissipation is diminished not only because current levels are typically
lower than for transistors biased in strong inversion, but also because the biasing voltages
can be reduced. However in the context of a whole system, combination of this
primitive building block with that of a weight storage cell based on a sample-and-hold
design and a non-linear load poses some challenges. This section concentrates on the
problems associated with the non-linear dependence on both of the inputs of the Gilbert
cell.
For a typical $\kappa$ value of 0.84, the linear range of the inputs of the Gilbert multiplier,
implemented utilising NMOS devices, is limited to $\pm V_t / \kappa \approx \pm 30$ mV. Such a
confined linear input swing imposes limits on both the weight resolution and the
dynamic range of the thresholding generator as shown below.
The weight resolution is determined by both the accuracy of the weight storage
scheme and the linearity range of the synaptic weight input and is given by
$$W_R \le \log_2\left(\frac{\text{linear range of the weight input}}{\text{accuracy of the storage scheme}}\right) \quad \text{(bits)} \qquad (3.1)$$
Even if utilising compensation techniques such as the dummy device [90] to minimise
error due to the clock feedthrough phenomenon, one may expect the relative accuracy of
a sample-and-hold storage scheme to be 2 mV at best. Thus, it can be seen from (3.1)
that the resolution of the weights is limited to 4 bits which is well below the precision
necessary for the successful training procedure of a feedforward ANN [93-94].
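The 4-bit figure can be checked with a few lines of Python. The sketch assumes that the usable range is the full bipolar span of twice V_t/κ, with κ = 0.84 and a thermal voltage of 25.85 mV, and that the number of whole bits is obtained by rounding down; the function name and these choices are assumptions made for illustration.

```python
import math


def weight_resolution_bits(linear_range, accuracy):
    """Whole number of bits resolvable within linear_range at the given accuracy."""
    return math.floor(math.log2(linear_range / accuracy))


KAPPA = 0.84          # typical value quoted in the text
V_THERMAL = 0.02585   # assumed thermal voltage in volts (about 300 K)

full_span = 2.0 * V_THERMAL / KAPPA      # bipolar linear range, about 62 mV
print(weight_resolution_bits(full_span, 2e-3))

# Storage accuracy that 8 bits would require over the same span, in volts.
print(full_span / 2 ** 8)
```

Under these assumptions an 8-bit weight would require a storage accuracy of roughly a quarter of a millivolt, about an order of magnitude better than the 2 mV expected from a compensated sample-and-hold cell.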
As mentioned in the previous chapter, in order to achieve compatibility, the 
activation function generator must supply a signal which is squashed within the input 
range of the synapses it is feeding. With signal levels as low as those mentioned above 
the system would therefore be highly susceptible to noise. 
A possible solution to these problems would be to linearise the inputs of the 
multiplier utilising two pre-processing cells which would be characterised by a $\tanh^{-1}$
type distortion. This may be achieved using diode connected transistors as current-to-
voltage converters [70]. However this would have the inconvenience of considerably 
increasing the silicon size, as determined by the number of transistors, of the synaptic 
cell. 
It is shown in the next section how the undesired saturation associated with the input 
of a modified Gilbert multiplier (i.e. tanh function) could be exploited in the design of a 
feedforward neural network. 
3.2 Design method 
Let us consider the local section of a fully connected feedforward neural network as
depicted in Fig. 3.1. It has been shown in the introduction that the neuron transfer
function can be expressed in the form
$$X_i^{m+1} = f(S_i^m) = f\left(\sum_j w_{ij}^m X_j^m + b\right) \qquad (3.2)$$
where $f(\cdot)$ is the sigmoid activation function, $X_i^{m+1}$ is the output of the $i$th neuron in the
layer $m$ and $X_j^m$ represents the input of the $j$th connection link in the layer $m$.
The solution that is presented here makes use of the non-linearity associated with the 
input of a transconductance multiplier to perform both the activation function and 
synaptic multiplication. The proposed design technique is based on the observation that 
the activation function of the $i$th neuron in layer $m$, depicted in Fig. 3.1, can be
distributed over each of the forward layer synapses without modifying the overall
behaviour of the neural network [64] as shown in Fig. 3.2. It may be noted that the
transfer function of that neuron is unchanged and is given by (3.2). It can consequently
be seen that for the hidden and output layers, the thresholding and synaptic weighting
functions could be combined to form what will be referred to as a Thresholding-Synapse
(TS). The output and input of a TS block are related as
$$P_{ki}^{m+1} = w_{ki}^{m+1} f(S_i^m) \qquad (3.3)$$
where $P_{ki}^{m+1}$ and $S_i^m$ are the output and input of the TS block respectively; $i$ and $k$ define
the driving neuron in the layer $m$ and the driven node in the layer $m+1$ respectively. The
following subsection indicates that it is possible to implement (3.3) using a simple
non-linear transconductance multiplier, consequently eliminating the need for a
non-linear load.
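The equivalence between the conventional arrangement and the distributed one can be checked with a small Python sketch. It compares a layer in which each neuron applies the activation function to its own sum before the weights are applied with a layer of thresholding-synapses computing w_ki f(S_i). The function names, the use of tanh as the activation function and the numerical values are illustrative assumptions.

```python
import math


def conventional_layer(s_prev, weights, f=math.tanh):
    """Each neuron applies f to its sum; each synapse then multiplies by its weight."""
    x = [f(s) for s in s_prev]                                   # X_i = f(S_i)
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def ts_layer(s_prev, weights, f=math.tanh):
    """Each thresholding-synapse computes P_ki = w_ki * f(S_i); node k sums the P_ki."""
    return [sum(w * f(s) for w, s in zip(row, s_prev)) for row in weights]


# Three driving neurons (index i) feeding two driven nodes (index k).
s_prev = [-0.3, 0.1, 0.8]
weights = [[0.5, -1.0, 0.25],
           [1.5, 0.2, -0.7]]

print(conventional_layer(s_prev, weights))
print(ts_layer(s_prev, weights))
```

Both functions return the same sums for the driven nodes, which is the property that allows the separate non-linear load to be removed.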
3.2.1 Possible circuit implementation 
An analogue circuit implementation of the TS cell as required by the neural architecture 
shown in Fig. 3.2 can be realised utilising the transconductance multiplier depicted in 
Fig. 3.3. The core of the scheme is a modified Gilbert cell. Note that the top two current
sources denoted $I_{Ref}$ have the effect of removing the quiescent components of the output
current of the multiplier. Assuming that all transistors have matched characteristics, the
difference output current of the suggested circuit can, from (2.34), be expressed in the form
$$I_{Pk} = (I_W - I_{Ref}) \tanh\left(\kappa \frac{V_{Si}}{2 V_t}\right) \qquad (3.4)$$
When this expression is related to (3.3) it can be seen that the value of the synaptic
weight of the TS is given by $W_{ki} = I_W - I_{Ref}$, which is adjustable via the biasing current
source $I_W$, and the activation function is a tanh function. Although such a circuit offers an
ideal solution to the simulation of a TS block in so far as:
* It provides for the most commonly used activation function in the simulation
of feedforward ANNs which is the sigmoid function;
* The weight is linearly controllable over a wide bipolar range; and
* It is compact since it integrates a minimum number of elements (4 active
transistors and 4 current sources, one of which is required to be
alterable),
some problems arise when one needs to consider a suitable analogue memory.
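The behaviour of such a thresholding-synapse can be sketched numerically. The Python code below evaluates a transfer characteristic of the form (I_W - I_Ref) tanh(κ V / (2 V_t)), where V is the differential input voltage. The values κ = 0.84, a thermal voltage of 25.9 mV and a reference current of 100 nA are assumptions, and the function name is illustrative.

```python
import math


def ts_output_current(v_in, i_w, i_ref, kappa=0.84, v_t=0.0259):
    """Difference output current (I_W - I_Ref) * tanh(kappa * v_in / (2 * v_t))."""
    return (i_w - i_ref) * math.tanh(kappa * v_in / (2.0 * v_t))


I_REF = 100e-9   # hypothetical reference current in amperes

for i_w in (50e-9, 100e-9, 150e-9):
    currents = [ts_output_current(v, i_w, I_REF)
                for v in (-0.2, -0.03, 0.0, 0.03, 0.2)]
    row = ", ".join(f"{i * 1e9:+.1f} nA" for i in currents)
    print(f"I_W = {i_w * 1e9:.0f} nA: {row}")
```

A weight current below the reference gives an inhibitory (negative) weight, a weight current above it gives an excitatory one, and for inputs much larger than V_t/κ the output saturates at plus or minus the weight magnitude.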
3.2.2 Current memory circuit 
Utilising the above transconductance multiplier to simulate the TS cell the weight signal
is represented as a current. The memory must behave as a linearly controllable current
source. As mentioned in the previous chapter, such a storage circuit can be constructed
by digitally controlling the sum of weighted current sources which are generated via
transistors operating as transconductors. However, this digital-to-analogue conversion
technique is not suitable for weight resolution values greater than 3 bits for reasons of
poor efficiency in silicon area. An alternative solution would be to make use of a basic
voltage-to-current transfer cell, whereby a voltage stored on a capacitor is transduced to
a current via a transistor [97]. The circuit diagram of such a scheme is shown in Fig. 3.4
(a). Here the two switch-transistors are driven by a two-phase clock signal $\phi$ whose
transient states are illustrated in Fig. 3.4 (b). When both switches are closed, the input
current $I_{mem}$ charges the holding capacitor $C_S$ to a specified voltage $V_S$ such that at the
end of this sampling period the current flowing through the storage transconductor
$M_S$ is the memorised current. During the holding phase the switches are turned off
and $M_S$ operates as a current source which is loaded by a diode connected transistor.
Note that during this period the voltage stored on the capacitor is isolated and is
therefore memorised. Note also that $V_S$ is used to generate the weighted current required
by the synaptic circuit via an additional transconductor whose characteristics match
those of $M_S$. However several problems are associated with this type of current memory.
First, during the switching off transient, part of the charge stored in the inversion
layer of the switch-transistor, $q_{inj}$, is injected onto $C_S$, reducing the gate voltage of the
storage transconductor by
$$\Delta V_{S1} = \frac{q_{inj}}{C_S} \qquad (3.5)$$
Furthermore, during the sampling mode, the drain-to-source voltage of $M_S$, which
operates as a diode biased in the subthreshold region, is approximately $V_T$. In the
holding mode this voltage changes to $V_{dd}$ minus the voltage drop across the load which
is also approximately $V_T$. Hence, when the transfer circuit is switched from tracking to
holding this variation is fed back to $C_S$ via a coupling capacitance $C_f$, which is represented
by the dotted capacitor in Fig. 3.4 (a). This effect increases $V_S$ by roughly the following
amount
$$\Delta V_{S2} \approx (V_{dd} - 2 V_T) \frac{C_f}{C_S} \qquad (3.6)$$
It may be noted that in the weak inversion region, the value of $C_f$ is approximately that
of the drain-to-gate overlap capacitance [74].
Finally, during the holding phase the charges accumulated on $C_S$ are discharged at a
rate determined by both the reverse-biased current of the source/drain-to-substrate
junction, represented by a dotted diode in Fig. 3.4 (a), and the subthreshold conduction
of the feedback transistor. This leakage decreases the stored voltage and therefore the
memorised current.
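Because the memorised current depends exponentially on the stored gate voltage in weak inversion, even small voltage errors translate into appreciable current errors. The following Python sketch assumes a drain current proportional to exp(κ V / V_t), with κ = 0.84 and a thermal voltage of 25.9 mV as assumed values. The function names and the 0.1 % error budget used in the example are illustrative.

```python
import math


def relative_current_error(delta_v, kappa=0.84, v_t=0.0259):
    """Fractional change in subthreshold current for a gate-voltage error delta_v (V)."""
    return math.exp(kappa * delta_v / v_t) - 1.0


def allowed_voltage_error(rel_error, kappa=0.84, v_t=0.0259):
    """Gate-voltage error (V) that produces the given fractional current error."""
    return (v_t / kappa) * math.log(1.0 + rel_error)


for dv in (0.1e-3, 1e-3, 5e-3):
    print(f"dV = {dv * 1e3:.1f} mV -> dI/I = {relative_current_error(dv) * 100:.2f} %")

print(f"0.1 % budget -> dV = {allowed_voltage_error(1e-3) * 1e6:.1f} uV")
```

Under these assumptions a 1 mV disturbance on the storage node already changes the memorised current by about 3 %, so the combined effect of charge injection, capacitive coupling and leakage must be kept to a few tens of microvolts if the current is to be accurate to 0.1 %.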
To reduce $\Delta V_{S1}$ (3.5), Guggenbühl et al. [97] have suggested the combination of two
solutions which are:
• Reducing the amount of injected charge utilising a dummy-switch
compensation technique; and
• Increasing the value of the holding capacitor by means of a Miller
enhancement method.
$\Delta V_{S2}$ (3.6), in turn, is attenuated by replacing the storage transconductor by a regulated
cascode device whose feedback capacitance is lowered by about 100 times. However
these compensating techniques are only applicable when the current memory cell is
operated in strong inversion.
To bring the current error caused by clock feedthrough and parasitic capacitive
coupling within acceptable limits (i.e. < 0.1%), Pain and Fossum [98] have proposed an
alternative scheme which makes use of a feedback arrangement. The cell, shown in Fig.
3.5 (a) in its NMOS version, has been developed for a low-power application and is
therefore suitable for weak inversion current levels. Note that the circuit is controlled by
a pair of two-phase clock signals $\phi_1$ and $\phi_2$ and their respective inverses $\bar{\phi}_1$ and $\bar{\phi}_2$. The
transient states of $\phi_1$ and $\phi_2$ are depicted in Fig. 3.5 (b). The sampling operation takes
place when the switches associated with $\phi_1$ and $\phi_2$ are closed, with the dynamics of this
phase defined by the non-linear differential equation
$$I_{mem} = I_{CS} + I_{MS} = C_S \frac{dV_{gs}}{dt} + I_x \exp\left(\kappa \frac{V_{gs}}{V_t}\right) \qquad (3.7)$$
where $V_{gs}$ is the gate-to-source potential of the storage transconductor. The solution to
this differential equation is
$$i_{MS}(t) = \frac{I_{mem}}{1 - c_p \exp\left[-\frac{\kappa I_{mem}}{C_S V_t} t\right]} \qquad (3.8)$$
where the constant $c_p$ is dependent on the magnitude of $i_{MS}$ at the origin.
The current error $i_e$ resulting from the change of state in $\phi_1$ is then partially fed back
to the holding capacitor, via a pair of current mirrors $M_1$-$M_2$ and $M_3$-$M_4$, creating a
negative change in the stored voltage. This compensation takes place until
the error decreases to a level which cannot be detected by the feedback loop. As a
consequence, the current flowing through $M_S$ is approximately $I_{mem}$ to the precision of a
residual current. This correction process is also described by a non-linear differential
equation
$$i_e = -\frac{C_S}{\eta} \frac{dV_{gs}}{dt} = I_{MS} - I_{mem} = I_x \exp\left(\kappa \frac{V_{gs}}{V_t}\right) - I_{mem} \qquad (3.9)$$
where $\eta$ is the ratio of the current mirrors. The solution to this differential equation is
given as
$$i_e(t) = \frac{I_{mem}}{\left(1 + \frac{I_{mem}}{I_{e0}}\right) \exp\left[\frac{\kappa \eta I_{mem}}{C_S V_t} t\right] - 1} \qquad (3.10)$$
where $I_{e0} = i_e(t = 0)$. Note that for a stable operation of the feedback structure the ratio of
the current mirrors should be smaller than 1. Furthermore, the correction scheme relies
on the change in $V_{gs}$, induced during the transition from sampling to error compensation
mode, being positive. This requires the coupling capacitance $C_c$ to be greater than the
gate capacitance of the $\phi_1$ switch.
(3.8) and (3.10) indicate that the time required to charge the gate capacitor and
subsequently compensate for errors due to the switching transient, such that the difference
between the current flowing in $M_S$ and $I_{mem}$ is within one Least Significant Bit (LSB)
precision, is approximately
$$T_{SC} = T_S + T_E = 5 \cdot \frac{C_S V_t}{\kappa I_{mem}} \left(1 + \frac{1}{\eta}\right) \qquad (3.11)$$
which is inversely proportional to the memorising current. Thus for currents as low as
10 nA, and a typical storage capacitance of 1 pF with a mirroring factor of 0.2, this
period can extend to up to 90 µs. Note that during the holding mode, $\phi_1 = \phi_2 = 0$,
transistor $M_1$ acts as the load and its mirrored current is short-circuited to ground via
the $\bar{\phi}_2$ switch.
To compensate for the loss of information due to charge leakage, the stored current
value may be periodically regenerated [71] via a Digital-to-Analogue Converter (DAC)
circuit. A sequential refreshing technique is desirable to minimise the number of DACs.
The number of synaptic weights that can be refreshed by a single DAC is therefore
determined by both the holding and the sampling/compensation periods
$$N_S = \frac{T_H}{T_{SC}} \qquad (3.12)$$
where $T_H$ is the interval between update phases such that the precision of the weight is
maintained at 1 LSB. A numerical approach suggests that for a maximum current of 510
nA with 1 LSB corresponding to 2 nA, and assuming that the leakage current is
approximately 1 pA, the duration of the holding phase would be about 120 µs, since the
accuracy of the gate voltage between two quantized levels is of the order of 120 µV
for weight values situated in the upper range. These results would imply that the
maximum number of synaptic weights that could be refreshed by a single DAC would
only be one, thus indicating the impracticability of a current memory scheme, and
therefore that of the modified Gilbert cell as a means to simulate a TS cell. Note that if
the devices were operated in the strong inversion region the ratio of synapses to DACs
would be increased by 3 orders of magnitude.
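As a rough numerical check of the refresh budget discussed above, the following sketch recomputes the sampling/compensation period, the hold time and their ratio. It assumes that each phase settles in about five time constants, with $\tau_S = C_S V_t / (\kappa I_{mem})$ read from (3.8) and $\tau_E = \tau_S / \eta$ read from (3.10). The thermal voltage (25.9 mV), the NMOS coefficient $\kappa = 0.85$ and all function names are assumptions made for illustration; none of them is stated in the text.

```python
import math

V_T = 0.0259     # thermal voltage near 300 K, in volts (assumed)
KAPPA_N = 0.85   # NMOS gate-coupling coefficient (assumed, not given in the text)

def sample_compensate_time(i_mem, c_s, eta, n_tau=5.0):
    """Sampling plus error-compensation time, taking n_tau time constants per phase."""
    tau_s = c_s * V_T / (KAPPA_N * i_mem)   # sampling time constant, from (3.8)
    tau_e = tau_s / eta                     # compensation time constant, from (3.10)
    return n_tau * (tau_s + tau_e)

def lsb_gate_voltage(i_full, i_lsb):
    """Gate-voltage step for one LSB of current; small-signal dV = (V_t / kappa) dI / I."""
    return V_T / KAPPA_N * i_lsb / i_full

def hold_time(dv, c_s, i_leak):
    """Time for the leakage current to shift the stored voltage by dv."""
    return dv * c_s / i_leak

t_sc = sample_compensate_time(i_mem=10e-9, c_s=1e-12, eta=0.2)
dv_lsb = lsb_gate_voltage(i_full=510e-9, i_lsb=2e-9)
t_h = hold_time(dv_lsb, c_s=1e-12, i_leak=1e-12)
n_s = math.floor(t_h / t_sc)   # whole number of weights one DAC can refresh

print(f"T_SC = {t_sc * 1e6:.1f} us, dV_LSB = {dv_lsb * 1e6:.1f} uV, "
      f"T_H = {t_h * 1e6:.1f} us, N_S = {n_s}")
```

Under these assumptions the hold time exceeds the sampling/compensation period by less than a factor of two, which is consistent with the conclusion that a single DAC could refresh only one weight.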
3.3 The conceptual building blocks of the Neural 
Network 
This section shows that it is possible to obtain a transconductance relationship similar to 
that of the modified Gilbert multiplier in an arrangement in which the bias current is 
distributed across two differential pairs by modulating the bulk potentials of the devices, 
thus eliminating the need for a current memory design. It will also be shown that this 
circuit combined with that of a compensated sample/hold using a dummy switch scheme 
offers the possibility of simulating an 8-bit weight resolution. Finally, it will be
demonstrated that the sensitivity to biasing variations of a load arrangement based on
diode-connected transistors can be substantially improved.
3.3.1 A four-quadrant multiplier 
The four-quadrant multiplier [64], as shown in Fig. 3.6, has been developed to
efficiently implement the TS block and consists of a current source $I_B$ and two
differential pairs $P_1$ and $P_2$ of matched transistors operating in the subthreshold mode of
conduction. Differential multiplication is obtained by applying one differential input
$v_x = V_x^+ - V_x^-$ between the gate terminals of $M_1$ ($M_4$) and $M_2$ ($M_3$) whilst the second
signal $v_w = V_w^+ - V_w^-$ appears between the bulk terminals of $M_1$ ($M_2$) and $M_3$ ($M_4$). It
may be added that the differential pairs must be integrated in separate wells.
With the assumptions that the drain-source potential $|V_{DS}| > 4 V_t$ but remains low
enough to disregard the Early effect, the drain current of a PMOS transistor operating in
weak inversion is given in Appendix A by
$$I_D = -I_x \exp\left(-\kappa \frac{V_{GS}}{V_t}\right) \exp\left(-(1 - \kappa) \frac{V_{BS}}{V_t}\right) \qquad (3.13)$$
where all the parameters are defined in Appendix A. Typical parameters for a 
minimum-size device (2.4 µm x 2.4 µm) fabricated in a standard analogue 2.4 µm
p-substrate are for a p-type transistor, = 26.6 x 10"'^  A and K = 0.7. It should also be 
emphasised that the normal operation of a PMOS transistor requires that the source/bulk
and drain/bulk junctions be reverse-biased, i.e. $V_{BS} > 0$ and $V_{BD} > 0$.
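The transfer characteristics can be determined as follows. Utilising (3.13), it can be
shown that the sum of the device currents at the common source node is expressed as
$$I_B = -(I_{M1} + I_{M2} + I_{M3} + I_{M4}) = I_x \exp\left(\frac{V_S + v_s}{V_t}\right) \left[\exp\left(-\kappa \frac{V_x^+}{V_t}\right) + \exp\left(-\kappa \frac{V_x^-}{V_t}\right)\right] \left[\exp\left(-(1 - \kappa) \frac{V_w^+}{V_t}\right) + \exp\left(-(1 - \kappa) \frac{V_w^-}{V_t}\right)\right] \qquad (3.14)$$
while the difference output current is given by
$$i_o = (I_{M2} + I_{M4}) - (I_{M1} + I_{M3}) = I_x \exp\left(\frac{V_S + v_s}{V_t}\right) \left[\exp\left(-\kappa \frac{V_x^+}{V_t}\right) - \exp\left(-\kappa \frac{V_x^-}{V_t}\right)\right] \left[\exp\left(-(1 - \kappa) \frac{V_w^+}{V_t}\right) - \exp\left(-(1 - \kappa) \frac{V_w^-}{V_t}\right)\right] \qquad (3.15)$$
Thus combining (3.14) and (3.15) it can readily be shown that
$$i_o = I_B \tanh\left(\kappa \frac{v_x}{2 V_t}\right) \tanh\left((1 - \kappa) \frac{v_w}{2 V_t}\right) \qquad (3.16)$$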
This transconductance relationship is similar to the one obtained by the MOS version of
the Gilbert cell (2.37) with the exception that the coefficient of the factor $v_w / 2 V_t$ is
$1 - \kappa$ instead of $\kappa$. This variation has the desirable effect of extending the linearity range
of the differential input from $V_t / \kappa \approx 37$ mV to $V_t / (1 - \kappa) \approx 90$ mV. It will be shown
later how this extended linearity has been exploited. Note that the gain of the multiplier
is adjustable through $I_B$.
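The behaviour of the tanh-tanh transfer characteristic (3.16) can be illustrated with a short numerical sketch. It evaluates the product form, compares it with a small-signal version that is linear in the bulk input, and computes the two linearity ranges $V_t / \kappa$ and $V_t / (1 - \kappa)$. The thermal voltage of 25.9 mV and the function names are assumptions for illustration; $\kappa = 0.7$ and the 250 nA bias are the values quoted in this chapter.

```python
import math

V_T = 0.0259   # thermal voltage near 300 K, in volts (assumed)
KAPPA = 0.7    # gate-coupling coefficient quoted for the PMOS device

def i_out(v_x, v_w, i_b=250e-9):
    """Difference output current of the multiplier, product-of-tanh form of (3.16)."""
    return (i_b * math.tanh(KAPPA * v_x / (2.0 * V_T))
                * math.tanh((1.0 - KAPPA) * v_w / (2.0 * V_T)))

def i_out_linear(v_x, v_w, i_b=250e-9):
    """Small-signal form, linear in the bulk input v_w with gain g = I_B (1 - kappa) / (2 V_t)."""
    g = i_b * (1.0 - KAPPA) / (2.0 * V_T)
    return g * v_w * math.tanh(KAPPA * v_x / (2.0 * V_T))

gate_range = V_T / KAPPA           # linearity range of a gate-driven input
bulk_range = V_T / (1.0 - KAPPA)   # linearity range of the bulk-driven input

print(f"gate range = {gate_range * 1e3:.1f} mV, bulk range = {bulk_range * 1e3:.1f} mV")
for v_x, v_w in [(0.05, 0.05), (-0.05, 0.05), (-0.05, -0.05), (0.05, -0.05)]:
    print(f"v_x = {v_x:+.2f} V, v_w = {v_w:+.2f} V, i_o = {i_out(v_x, v_w) * 1e9:+.1f} nA")
```

The sign of the output follows the product of the input signs in all four quadrants, and for small bulk inputs the linear form tracks the full expression closely.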
Since the core of the multiplier uses only four devices, mismatching and its effect on
input offset may be more readily manageable. This will be covered in the next chapter. 
It may also be noted that having only one stack of transistors will permit a decrease in 
supply voltage, thus reducing the power dissipation. 
3.3.1.1 Limits of operation 
The operating range of the differential input is, however, limited by the fact that the
source/bulk junctions of transistors $M_1$-$M_4$ should not be forward-biased. This condition
leads to the following requirement
$$V_W + v_w > V_S + v_s \qquad (3.17)$$
where $V_W$ is the quiescent bulk common-mode input level, and $V_S$ and $v_s$ are the quiescent
and dynamic source potentials, respectively. The quiescent and dynamic source voltages
can be obtained by rearranging (3.14) as follows
$$V_S = V_t \left[\ln\left(\frac{I_B}{4 I_x}\right) + \kappa \frac{V_X}{V_t} + (1 - \kappa) \frac{V_W}{V_t}\right] \qquad (3.18)$$
and
$$v_s = V_t \left\{\ln\left[\mathrm{sech}\left(\kappa \frac{v_x}{2 V_t}\right)\right] + \ln\left[\mathrm{sech}\left((1 - \kappa) \frac{v_w}{2 V_t}\right)\right]\right\} \qquad (3.19)$$
where $V_X$ is the quiescent gate potential. Note that whatever the sign or amplitude of the
dynamic potential applied to either the gate or bulk terminals, the dynamic signal at the
common-source node will always be negative. Hence combining (3.17), (3.18) and
disregarding the influence of the dynamic signal $v_s$, the maximum input swing of the
bulk input can be expressed as
$$|v_w|_{max} = \kappa (V_W - V_X) - V_t \ln\left(\frac{I_B}{4 I_x}\right) \qquad (3.20)$$
Therefore, to fully exploit the extended linearity range of the bulk dynamic input, $I_B$, $V_X$,
and the aspect ratio of the transistors ($W/L$) need to be carefully chosen so that the
signal swing remains within the limits defined by the above expression.
3.3.1.2 Simulation results 
The transconductance multiplier performance has been optimised via SPICE simulations
based on a 3 V power supply using Level 3 model parameters for the MIETEC 2.4 µm
CMOS process. With bias currents set at 250 nA, extensive simulations have shown that
a maximum bulk dynamic and linearity range of ± 250 mV can be obtained for an aspect
ratio of W/L = 10/1 for the transistors where the common-mode voltages of the gate and
bulk terminals are set to 1.5 V and 2.55 V respectively. The family of simulated static
characteristics shown in Fig. 3.7 (a) and (b) suggests that over the linear range of $v_w$,
(3.16) may be approximated as
$$i_o = g \cdot v_w \cdot \tanh\left(\kappa \frac{v_x}{2 V_t}\right) \qquad (3.21)$$
where $g = I_B (1 - \kappa) / 2 V_t$, which is similar to the transfer function of the modified
Gilbert cell. Thus this transconductance multiplier may be employed to implement the
function described by the TS building block.
It has however been observed in Fig. 3.7 (b) that the linearity range of the bulk
difference input is greater than that which was predicted by the analytical results. This
variation is mainly due to the fact that the coefficient which measures the effectiveness
of the gate in controlling the channel current is a function of the bulk-to-source potential
of the device [72], [61], and that its value tends to 1 as the level of $V_{BS}$ gets larger. This
effect is however desirable. Comparing (3.21) and (3.3) it may be noted that the signal $v_w$
could be used to control the value of the weight. The fact that its linearity range is of
the order of ± 250 mV indicates, from (3.1), that the system would allow for an 8-bit weight
resolution as long as the precision of the weight storage scheme is within 2 mV. Note
that the effectiveness of the gate at controlling the barrier energy is inversely
proportional to the square-root of the device substrate doping [74]. Thus a weak doping
would tend to increase $\kappa$ towards unity and further extend the linearity range of the
$v_w$ input of the multiplier. This influence may be observed by simulating an NMOS version
of the circuit.
Further simulations based on a range of values for the key static variables $I_B$, $I_x$, $V_X$
and $V_W$ have been carried out and suggest that the dynamic range estimate given in
(3.20) is reasonable.
3.3.2 The load 
The load converts the summed (neural) current to a potential as required to drive the 
inputs of the following layer. Given that the thresholding function is incorporated within 
the synapse, to fulfil the conditions of implementation suggested in (3.3), it might be 
assumed that a linear current-to-voltage converter would be needed. However, it will be 
shown in the next section that moderate levels of non-linearity in the load resistor 
characteristics would have little effect on the overall performance of the MLP, and 
consequently a design based on a structure with a minimum number of transistors can be used.
It has been shown in the previous chapter that the current-to-voltage conversion can
be achieved using a CMOS arrangement of biased diodes. It has also been noted that
because the drain current in a subthreshold mode device is an exponential function of
the gate-source voltage, the I/V characteristic of such a load resistor would be highly
sensitive to biasing arrangements.
Note that the sensitivity could be substantially reduced by keeping the difference
potential $V_{dd} - (E_1 + E_2)$ at a constant level, see (2.44) for details. This may be achieved
[99] by rearranging the position of the bias sources $E_1$ and $E_2$, as indicated in the circuit
diagram of Fig. 3.8. It is important to note that unlike in the original scheme, the core
diode group $M_1$-$M_4$ and the associated biasing elements can be implemented using either
PMOS or NMOS devices, thus allowing for a more compact integration. The biasing
potentials are generated by two diode-connected transistors $M_5$ and $M_6$ and are
controlled by a feedback arrangement comprising transistors $M_7$-$M_9$ together with a
current source $I_Z$. Since the potential across the diodes $M_8$ and $M_9$ is nominally fixed by
$I_Z$, variations in the supply ($V_{dd}$) are distributed across the gate-source junctions of the
shunt transistor $M_7$ and the biasing transistor $M_5$. With the proviso that these transistors
have matching characteristics and that their quiescent currents are similar, it follows that
approximately half of the supply voltage variation will appear across the device $M_5$.
Given the symmetry of the scheme it also follows that an equal variation will appear
across the diode-connected transistor $M_6$. With the biasing potential across the core
diode group (i.e. $V_{dd} - E_1 - E_2$) buffered from supply variations the load resistance
sensitivity is correspondingly reduced.
This phenomenon can be accounted for by assuming that the drain current of a
PMOS transistor is related to its gate-source potential as
$$I_D = -I_x \exp\left(-\kappa \frac{V_{GS}}{V_t}\right) \qquad (3.22)$$
It is shown in Appendix D that, based on this transistor model, the transfer characteristic
of the scheme is given by
$$V_o = \frac{2 V_t}{\kappa} \sinh^{-1}\left[\frac{I}{4 I_Z} \left(1 + \frac{I_C}{I_{M7}}\right)\right] \qquad (3.23)$$
where $I_C$ is the current biasing the core diode group and $I_{M7}$ is the drain current of the
shunt device $M_7$. It can be seen that since $I_Z$ is nominally independent it follows that the
sensitivity of the resistance value is determined by the static ratio $I_C / I_{M7}$. Any given
level of insensitivity is achievable by ensuring that the shunt current is sufficiently
larger than the core group current. Although the sensitivity is improved at the expense
of an increase in the static current consumption, for a battery-powered
system this is a justifiable expense since the supply voltage cannot in practice be
regarded as a constant.
3.3.2.1 Simulation results 
The family of simulated static characteristics in Fig. 3.9 is based on a bias current $I_Z$ of
3.125 nA, an aspect ratio of W/L = 5/1 for all transistors and a 3 V power supply. They
show the level of sensitivity to a power supply variation of ± 100 mV. For such biasing
conditions the sensitivity is improved by a factor of 40 while the static current consumption
is increased by a factor of 30. However, it can be appreciated that under these conditions
of operation the transfer characteristic of the load can be approximated as
$$V_o = \frac{2 V_t}{\kappa} \sinh^{-1}\left(\frac{I}{4 I_Z}\right) \qquad (3.24)$$
Note that even in this mode of operation the power dissipated by the load is of the same
order as the multiplier.
Extended simulations have also indicated that increasing the aspect ratio W/L of all
devices has the effect of decreasing the static current ratio $I_C / I_{M7}$, and consequently
improves the sensitivity of the nominal resistance to power supply variation. It may be
pointed out, as a matter of interest, that to render the load completely insensitive to
supply variation it would be necessary to substitute both biasing diodes $M_5$ and $M_6$ by
two ideal current-controlled voltage sources. This could be achieved by buffering the
biasing potentials $E_1$ and $E_2$ utilising two op amps operating as followers. However, in
the context of neural network design, this solution would not be area efficient.
3.4 Circuit Implementation 
Following the results presented in the above section it can be seen that an analogue
implementation of the neural network architecture depicted in Fig. 3.2 can be realised,
as shown in Fig. 3.10, using a Transconductance-Thresholding-Synapse (TTS) [64] to
emulate the function of the TS block. The TTS building cell is composed of a
four-quadrant multiplier and two grounded current sources whose values are half of the
biasing current source. The role of the latter devices is to eliminate the quiescent
component of the output signal of the multiplier so that the difference output current of
the TTS contains only a dynamic term. This choice has been made in order to avoid
high current levels at the summing junction which would require larger wiring busses
and create undesirable parasitic coupling capacitance. It can readily be demonstrated
(3.21) that in the linear range of $v_w$ (i.e. $|v_{wi}| < 2 V_t / (1 - \kappa)$) the difference output current
of a TTS can be expressed in the form
$$I_{Pk} = g \cdot v_{wi} \cdot \tanh\left(\kappa \frac{V_{Si}}{2 V_t}\right) \qquad (3.25)$$
Note that the graphic illustration of a TTS cell is a circle embodying a thresholding and a
multiplying sign. The summation function can simply be achieved by connecting the
outputs of the TTS to a common bus bar. Finally the current-to-voltage conversion can
be realised using the load arrangement described earlier whose transfer characteristics
(3.24) can be approximated as
$$V_{Si} = \frac{2 V_t}{\kappa} \sinh^{-1}\left(\frac{I_{Si}}{4 I_Z}\right) \qquad (3.26)$$
For reasons of convenience, the symbols of both the summation and load circuits have
been brought together, as shown in Fig. 3.10.
3.4.1 Combined load and transconductance-thresholding-synapse 
The transfer characteristic of an analogue TS block can be obtained by combining (3.25)
and (3.26) and its normalised input and output variables are related as
$$P_k = \frac{g}{4 I_Z} \cdot v_{wi} \cdot \tanh\left[\sinh^{-1}(S_i)\right] \qquad (3.27)$$
where $P_k = I_{Pk} / 4 I_Z$ and $S_i = I_{Si} / 4 I_Z$.
When this expression is compared to (3.3) it can be seen that the value of the
synaptic weight is given by
$$W_i = v_{wi} \cdot \frac{g}{4 I_Z} \qquad (3.28)$$
which is adjustable in a bipolar fashion via the difference potential $v_{wi}$, and the
activation function is $\tanh[\sinh^{-1}(\cdot)]$. The fact that the transfer function is not an ideal
sigmoid is due to the non-linearity associated with the current-to-voltage converter.
However, the $\tanh[\sinh^{-1}(\cdot)]$ function is similar to the most commonly used form of
activation function, which is the hyperbolic tangent, as shown in Fig. 3.11. As a
consequence (3.27) may be approximated as
$$P_k = \frac{g}{4 I_Z} \cdot v_{wi} \cdot \tanh[S_i] \qquad (3.29)$$
Following this analysis it can be appreciated that the non-linearity associated with the
load circuit has a minor influence on the transfer characteristic of a feedforward neural
system. It will be shown in the next subsection that the presence of the distortion
actually enhances the processing speed of the ANN.
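The closeness of the two activation functions can be checked numerically. The sketch below uses the identity $\tanh[\sinh^{-1}(s)] = s / \sqrt{1 + s^2}$, which follows from $\cosh(\sinh^{-1} s) = \sqrt{1 + s^2}$, and measures the largest gap to $\tanh(s)$ over a sampled range. The sampling range of $\pm 5$ and the function names are arbitrary choices made for illustration.

```python
import math

def load_synapse_activation(s):
    """Activation obtained when the sinh-type load drives a tanh-type synapse input."""
    return math.tanh(math.asinh(s))

def algebraic_form(s):
    """Equivalent closed form s / sqrt(1 + s^2)."""
    return s / math.sqrt(1.0 + s * s)

samples = [i / 100.0 for i in range(-500, 501)]

# Largest numerical difference between the two equivalent forms.
max_identity_error = max(abs(load_synapse_activation(s) - algebraic_form(s))
                         for s in samples)

# Largest deviation from the ideal hyperbolic tangent over the sampled range.
max_gap_to_tanh = max(abs(load_synapse_activation(s) - math.tanh(s))
                      for s in samples)

print(f"identity error = {max_identity_error:.2e}, max gap to tanh = {max_gap_to_tanh:.3f}")
```

Both functions are odd, pass through the origin with unit slope and saturate at $\pm 1$; the difference is confined to the knee of the curve, where the combined characteristic saturates more slowly than the hyperbolic tangent.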
Note that to provide a better immunity against noise all signals associated with the 
TTS and load are differential. Although this improvement is at the expense of lowering 
transistor integration density, this trade-off is however tenable when one considers the 
low current and voltage levels associated with weak inversion operation. 
Comparing (3.29) and (3.3) it can be deduced that an analogue implementation of a
fully connected feedforward neural network can be achieved utilising TTS and load
circuits as is illustrated in Fig. 3.12. It can be noticed that since the thresholding
function is combined with the input of the synapse:
• An activation function (i.e. tanh(.)) is associated with each node of the input 
layer of the network; and 
• The output of each neuron situated in the output layer does not contain a 
thresholding element. 
A solution to both of these problems will be offered in the next chapter. 
3.4.2 Speed of the network
The processing speed of the analogue system discussed above is mainly determined by
the amount of current that the TTS circuit can source/sink at its output and by the
resistive and capacitive loads it is driving. It can be expected that operating speed will
be lower than for a system whose devices are biased in the strong inversion since
subthreshold current levels are much lower. However, this is partially compensated for
by the fact that, in weak inversion in contrast to strong inversion, the layer of charge
throughout the length of the channel is negligible, thus the gate-to-source capacitances
are relatively small [74]. The transient characteristic of the analogue structure shown in
Fig. 3.12 is described by the simplified small-signal equivalent circuit shown in Fig.
3.13. The current I is controlled by the output current of the TTS, R represents the
resistive load, and the total capacitance at the TTS output is given by
$$C_L = N \cdot C_i + C_r \qquad (3.30)$$
where $C_i$ is the average input capacitance of a TTS, N is the number of TTS inputs
connected at the output of the driving TTS-load, and $C_r$ is the input capacitance of the
current-to-voltage converter, which is an order of magnitude smaller than $C_i$. Thus for
large networks (i.e. N > 10), $C_L = N \cdot C_i$ may be used as an approximate relation.
If $I_{max}$ is the maximum current available at the output of a TTS, and $C_L$ is assumed to
be linear, then the rate of change of the output voltage is defined by the non-linear
differential equation
$$I_{max} = C_L \frac{dv}{dt} + 4 I_Z \sinh\left(\kappa \frac{v}{2 V_t}\right) \qquad (3.31)$$
One of the solutions to this differential equation is given by
$$v(t) = \frac{4 V_t}{\kappa} \tanh^{-1}\left[\frac{I_{max} \tanh\left(\frac{\sqrt{I_{max}^2 + 16 I_Z^2}}{4 V_t C_L} \kappa t\right)}{\sqrt{I_{max}^2 + 16 I_Z^2} + 4 I_Z \tanh\left(\frac{\sqrt{I_{max}^2 + 16 I_Z^2}}{4 V_t C_L} \kappa t\right)}\right] \qquad (3.32)$$
Evaluating this expression with the software package Mathcad shows that for a
maximum output current $I_{max} = I_B / 2 = 125$ nA (single ended) and a load biasing current
of 3.125 nA, the settling time of (3.32), to within 1% of error, is about 1.5 µs for a
capacitive load of 0.25 pF (i.e. N ≈ 5). As depicted in Fig. 3.14, a SPICE transient
simulation also indicates a similar processing speed. Note that this rate of change
corresponds to that of a single layer and is mainly dependent upon the number of TTS
connected at the output of the driving TTS-load arrangement. An approximation of the
processing speed of an MLP would be determined by the sum of the transient times for
each layer, which is consequently proportional to the size of the network. The above
results may therefore be considered as exemplars. However the fact that the resistive
element has a non-linear characteristic substantially improves the time response of the
neural network, as is demonstrated below.
To obtain an ANN system that would accurately possess the characteristic given in
(3.29), the transfer function of the resistor would need to be
$$R = \frac{V_t}{2 I_Z \kappa} \qquad (3.33)$$
which is the effective driving-point resistance of the non-linear load at the origin (3.26).
Alternatively the computation rate would be determined by the solution of the following
differential equation
$$I_{max} = C_L \frac{dv}{dt} + v \cdot \frac{2 I_Z \kappa}{V_t} \qquad (3.34)$$
which yields, as shown in Fig. 3.14, a relatively longer settling time of 8 µs. This result
may be explained, in physical terms, by the fact that for the structure shown in Fig. 3.12
the effective value of the resistive element decreases as the current flowing through it
increases, thus one would expect the charging/discharging time to be reduced.
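The speed advantage of the non-linear load can be reproduced with a simple numerical experiment. The sketch below integrates $C_L \, dv/dt = I_{max} - I_{load}(v)$ by forward Euler for a sinh-type load, $I_{load} = 4 I_Z \sinh(\kappa v / 2 V_t)$, and for a linear load equal to its small-signal resistance at the origin, $I_{load} = 2 I_Z \kappa v / V_t$, and reports the time to reach 99% of the final value. It also evaluates one closed-form way of writing the step response of the sinh-type load for $v(0) = 0$. The thermal voltage (25.9 mV), the use of $\kappa = 0.7$ for the load devices, the step size and all function names are assumptions; the absolute times therefore need not coincide with the values quoted in the text, and the point of interest is the ordering of the two results.

```python
import math

V_T = 0.0259     # thermal voltage near 300 K, in volts (assumed)
KAPPA = 0.7      # gate-coupling coefficient (assumed equal to the quoted PMOS value)
I_MAX = 125e-9   # maximum single-ended TTS output current, in amperes
I_Z = 3.125e-9   # load bias current, in amperes
C_L = 0.25e-12   # capacitive load, in farads

def load_current_nonlinear(v):
    """Current drawn by the sinh-type load."""
    return 4.0 * I_Z * math.sinh(KAPPA * v / (2.0 * V_T))

def load_current_linear(v):
    """Current drawn by a linear load matching the small-signal resistance at the origin."""
    return v * 2.0 * I_Z * KAPPA / V_T

def settling_time(load_current, v_final, tol=0.01, dt=1e-10, t_max=1e-4):
    """Forward-Euler integration of C_L dv/dt = I_MAX - load_current(v), starting from v = 0."""
    v, t = 0.0, 0.0
    while abs(v - v_final) > tol * v_final and t < t_max:
        v += (I_MAX - load_current(v)) / C_L * dt
        t += dt
    return t

def v_closed_form(t):
    """Closed-form step response of the sinh-type load for v(0) = 0."""
    d = math.sqrt(I_MAX ** 2 + 16.0 * I_Z ** 2)
    th = math.tanh(KAPPA * d * t / (4.0 * V_T * C_L))
    return 4.0 * V_T / KAPPA * math.atanh(I_MAX * th / (d + 4.0 * I_Z * th))

# Steady-state voltages: where the load current equals I_MAX in each case.
v_final_nl = 2.0 * V_T / KAPPA * math.asinh(I_MAX / (4.0 * I_Z))
v_final_lin = I_MAX * V_T / (2.0 * I_Z * KAPPA)

t_nl = settling_time(load_current_nonlinear, v_final_nl)
t_lin = settling_time(load_current_linear, v_final_lin)

print(f"non-linear load: {t_nl * 1e6:.2f} us, linear load: {t_lin * 1e6:.2f} us")
```

With these assumed values the sinh-type load settles several times faster than the linear one, in line with the physical explanation that its effective resistance falls as the current through it rises.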
3.4.3 Weight range 
It can be noted in (3.28) that, since the signal range of the weight control input is limited
to ± 250 mV, the maximum and minimum weight values (i.e. weight range) are
determined by the ratio $I_B / I_Z$. Thus for a synapse tail current of 250 nA and a load
biasing current of 3.125 nA the weight range is set to ± 10. Note that this range can
easily be altered to suit any given neural network problem since $I_B$ and $I_Z$ can be
externally programmed.
It is also interesting to note that the power dissipation can be traded off against
processing speed (3.32) or vice versa without effectively changing the weight range.
3.5 Summary 
The limited input signal range for a subthreshold mode transconductance multiplier has
been extended by distributing some thresholding operations for feedforward neural
networks over to the inputs of the synapses.
The conceptual circuit of a TTS cell based on a new four-quadrant MOS analogue
multiplier has been presented. The circuit is area efficient and less likely to be
vulnerable to mismatching effects since a minimum number of devices are utilised. It
has a low power dissipation because a minimum stack of devices permits a reduction in
the supply voltage. It offers a wide-range linear differential input and the non-linear tanh
function associated with the other differential input has been used to achieve the activation
function required by the TS. The conditions of operation of the multiplier have also
been analytically derived and verified via a series of SPICE simulations.
An alternative means to resolve the problems associated with load arrangements
based on diode-connected transistors has also been presented. Sensitivity to power
supply is however improved at the expense of power dissipation, which remains low
enough not to be considered a burden. The cell is also area efficient since a single type
of MOS transistor is utilised.
Analytical and simulation studies have also shown that the non-linearity associated
with the load does not interfere with the characteristics of the network and considerably
enhances the processing speed.
It has also been established that the weight range of the synapses can readily be
adjusted, and that power dissipation can be traded off against processing speed without
affecting it.
Figure 3.1: Local section of a fully connected feedforward neural network (layers m and m+1).
Figure 3.2: Feedforward neural network with the activation function distributed over the following synapses (layers m-1, m and m+1).
Figure 3.3: Transconductance multiplier based on the modified Gilbert cell.
Figure 3.4: Current memory cell, (a) circuit diagram, (b) transient characteristics.
Figure 3.5: NMOS version of Pain and Fossum current memory cell, (a) circuit diagram, (b) transient characteristics.
Figure 3.6: Circuit diagram of the four-quadrant multiplier, with $i_o = (I_{M2} + I_{M4}) - (I_{M1} + I_{M3})$.
Figure 3.7: Static characteristics of the four-quadrant analogue multiplier, (a) against $v_x$ for $v_w$ between -300 mV and 300 mV, and (b) against $v_w$ for $v_x$ between -100 mV and 100 mV, for W/L = 24 µm / 2.4 µm, $I_B$ = 250 nA, a common-mode gate potential of 1.5 V and a quiescent bulk signal of 2.55 V.
Figure 3.8: Current-to-voltage converter comprising a feedback mechanism.
Figure 3.9: Static characteristics of the current-to-voltage converter ($V_o$ against input current) for supply voltages $V_{dd}$ from 2.9 V to 3.1 V, for W/L = 12 µm / 2.4 µm and $I_Z$ = 3.125 nA.
Figure 3.10: Analogue implementation of a section of a feedforward neural network (TTS, summing node and load).
Figure 3.11: Sigmoid activation function (tanh(x) compared with tanh(sinh^-1(x))).
Figure 3.12: Analogue implementation of a fully connected feedforward neural network.
Figure 3.13: Simplified small-signal equivalent circuit of a TTS and load.
Figure 3.14: Transient characteristics of the combined TTS and load circuits (simulation results, analytical results and linear load).
Chapter 4 
Design and implementation of a neural 
network chip 
An analogue technique for implementing a fully connected feedforward neural network
and the conceptual circuits of the basic building blocks along with their performances
have been discussed in the previous chapter. The objective of this chapter is to present
the design, in a detailed fashion, of all the elements that have been developed for the
implementation of a whole neural network chip. The complete structures of the two
previously discussed schemes which make up the body of the network are presented
along with an extra two cells that have been conceived to overcome the problems
associated with the effect of combining the thresholding and synaptic functions. The
issue of weight storage is also covered in this chapter. This will include an overall
presentation of the system and a thorough discussion, based on analytical results, of its
performance and that of the cells it is incorporating. The overall behaviour of a basic
2-layer analogue feedforward neural network such as the XOR, which incorporates 9
synapses, 3 neurons and a weight storage scheme, has been assessed via simulation
studies. A part of this chapter will also be dedicated to introducing the whole design of
a neural network chip that comprises a 10:6:3 MLP and a 2:2:1 feedforward network. The
final section of this chapter will be devoted to a discussion on the layout of the entire
system and the physical features of the neural network chip will also be outlined.
During the development of the chip, the choice of the design techniques, and 
consequently the circuit structures, has been made keeping in mind five different issues 
namely, the size of the most frequently used cells as determined by the number and 
dimensions of the transistors involved, the power dissipation, the speed of operation, the 
accuracy of computation and the ease of integration. 
4.1 The complete circuits of the neural network 
It has been indicated in the first chapter that an MLP is composed of multiple arrays of 
identical modules, namely multipliers, summers and thresholding generators. It follows
from such a framework that any sort of MLP topology can be created by simply
modifying the size of these arrays. However since the combination of the basic
neural units has been modified, as explained in the previous chapter, it may be shown 
how this valuable feature has been retained at the circuit level. 
4.1.1 The transconductance-thresholding-synapse 
As mentioned earlier, the TTS cell includes three current sources of which one is
biasing the core of the four-quadrant multiplier and the other two have the function of
eliminating the quiescent component of the difference output current. It is essential that
the current values of the latter two sources are matched and are half that of the former
source in order to avoid a quiescent current build-up at the summing
junction which could drive the load arrangement into saturation. It is therefore desirable
that these current sources are controlled at the circuit level rather than at the chip level
to avoid large mismatching between devices resulting from long-distance parameter
variations [82]. This would be even more apparent as the devices are biased in weak
inversion. Thus in order to render the circuit modular and to prevent substantial current
offsets, all the transistorised current sources are driven by a single bias potential $V_B$, as
depicted in Fig. 4.1. The output DC cancellation is performed by the two matched
NMOS devices $M_9$ and $M_{10}$, while a third transistor $M_8$, of identical characteristics, is
generating the biasing current via a PMOS current mirror consisting of equal-size
elements $M_5$, $M_6$ and $M_7$. The dimensions of the transistors (W/L) are given in µm. A
nominal gain of 2 is achieved for the current mirror by utilising two multiplier
transistors ($M_6$-$M_7$) of identical dimensions rather than a single device whose width
would be double that of the reference element ($M_5$). This choice was made in order to
minimise error in the current ratio due to geometrical mismatching engendered by the
etching effect of the polysilicon gate of the transistors [74]. Furthermore, it has been
shown in the previous chapter that the common source node potential changes when a
signal is applied to either input of the multiplier. A similar phenomenon also takes place
between the output nodes of the TTS as it is directly feeding a load circuit. Since both of
these variations appear between the drain and source terminals of the transistors forming
the current sources, long channel devices have been used as a means to diminish the
effect of channel length modulation. Note the compromise between size and accuracy of
computation.
The effect of transistor mismatch on the difference output current of the complete 
TTS scheme was assessed via a series of Monte Carlo SPICE simulations [19]. Pelgrom 
et al. [82] have suggested that the dominant source of mismatch between two identical
MOS transistors biased in weak inversion is the variation in the zero bias threshold
voltage and that its variance is given by
$$\sigma^2(V_{T0}) = \frac{A_{VT0}^2}{W L} + S_{VT0}^2 D^2 \qquad (4.1)$$
where $A_{VT0}$ is the area proportionality constant of the threshold voltage and $S_{VT0}$
describes the variation of $V_{T0}$ with respect to the distance $D$ between the components.
Both of these parameters are process dependent and can be empirically determined.
However, the experimental results of Pelgrom et al. suggest that for a 2.5 µm, 50 nm gate-oxide
CMOS process, which is a wafer fabrication technology similar to that of
MIETEC, the values of these parameters are:
• For an NMOS: $A_{VT0}$ = … mV·µm, $S_{VT0}$ = 4 µV/µm; and
• For a PMOS: $A_{VT0}$ = 35 mV·µm, $S_{VT0}$ = 4 µV/µm.
It may be noted that, for small devices situated at a short distance apart, the distance 
dependent component of the variance is negligible with respect to the area component. 
According to this data, a series of 100 Monte Carlo simulations of the whole TTS
circuit, with both of its inputs short-circuited and a biasing current of 250 nA,
indicates a standard deviation $\sigma$ of the difference output current of 13 nA. Note
that $\sigma$ is approximately 10% of the full scale output current. This offset current can
be compensated for, during training, by modifying the weight of the biasing connection
of the neuron. This error correction technique is in practice adequate since it is most
unlikely that the sum of the offset currents of a batch of TTSs will exceed the maximum
output current of the biasing synapse.
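The relative weight of the two terms in (4.1) can be illustrated numerically. The following is a minimal sketch using the PMOS constants quoted above; the 10 µm x 10 µm device geometry and the 20 µm spacing are illustrative assumptions and are not taken from the design.

```python
import math

def sigma_vt0(a_vt0_mv_um, s_vt0_uv_per_um, w_um, l_um, d_um):
    """Evaluate (4.1): returns (sigma in mV, area term in mV^2, distance term in mV^2)."""
    area_var = (a_vt0_mv_um ** 2) / (w_um * l_um)        # A_VT0^2 / (W.L)
    dist_var = (s_vt0_uv_per_um * 1e-3 * d_um) ** 2      # (S_VT0.D)^2, uV converted to mV
    return math.sqrt(area_var + dist_var), area_var, dist_var

# PMOS constants from the text; geometry and spacing are assumed for illustration.
sigma, area_var, dist_var = sigma_vt0(35.0, 4.0, w_um=10.0, l_um=10.0, d_um=20.0)
print(f"sigma(VT0) = {sigma:.2f} mV")
print(f"distance term / area term = {dist_var / area_var:.5f}")
```

For closely spaced devices of this size the distance term contributes well under one percent of the variance, which is the sense in which it is described as negligible.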
As shown in Fig. 4.2, the advantage of utilising a single bias potential (Vg) is that it
allows the gain of a large group of TTSs to be straightforwardly controlled. Vg can
be generated utilising a diode-connected transistor biased by an external current source.
However, because TTS cells 1 and N can be physically far from each other,
their biasing currents can be substantially mismatched. This difference can be as high as
15% for distances of the order of 1 mm and manifests itself as a variation of the gain,
which can also be compensated for during training (see (3.28)).
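The order of magnitude of this long distance mismatch can be checked with a first-order estimate. The sketch below assumes a weak inversion drain current proportional to exp(-kappa.VT0/Vt), so that a small threshold shift produces a relative current change of kappa.dVT0/Vt; the values kappa = 0.7 and Vt = 25.9 mV are assumptions, and only the distance term of (4.1) is used.

```python
def distance_sigma_vt_mv(s_vt0_uv_per_um, d_um):
    """Distance-dependent part of sigma(VT0) from (4.1), in mV."""
    return s_vt0_uv_per_um * d_um * 1e-3

def relative_current_sigma(sigma_vt_mv, kappa=0.7, v_thermal_mv=25.9):
    """First-order sigma(I)/I for a weak inversion device (kappa and Vt are assumed values)."""
    return kappa * sigma_vt_mv / v_thermal_mv

sigma_vt = distance_sigma_vt_mv(4.0, 1000.0)   # S_VT0 = 4 uV/um over 1 mm
rel = relative_current_sigma(sigma_vt)
print(f"sigma(VT0) = {sigma_vt:.1f} mV, sigma(I)/I = {100 * rel:.1f} %")
```

Under these assumptions the one-sigma current mismatch at 1 mm is of the order of ten percent, which is the same order as the figure quoted above.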
4.1.2 The load 
The biasing technique described above can also be applied to the load. However, since 
the value of the current source is substantially lower than that of the TTS, the effect of 
process parameter variations over long distance is potentially more devastating. 
According to the transfer characteristic of the I/V converter given in (3.26), it can be
appreciated that a change of 1 nA in the biasing current leads to a 20% variation in the effective
driving-point resistance. To prevent this undesired phenomenon from occurring, the
structure of the feedback mechanism has been altered to allow a biasing current level
similar to that of the TTS to be used while retaining identical resistive characteristics.
As mentioned in the previous chapter, the role of the biasing source is to generate a constant potential
$E_0$ across two diode connected transistors. Note that the biasing requirements could be
satisfied by substantially increasing the width of these diodes. However, this would be
achieved to the detriment of silicon area. The alternative solution, as indicated in the
circuit diagram of Fig. 4.3, is area efficient since a single diode (Mg) is used to
induce $E_0$. Furthermore, the structure of the feedback mechanism, which includes the
shunt and diode devices together with the biasing source, has been rearranged so that the biasing source
can be implemented utilising an NMOS device (M,). That is because, as suggested by
Pelgrom et al., matching is poorer for p-channel transistors than for n-channel ones [82].
For the same reason as stated earlier, a long channel biasing element has also been
employed to limit the effect of variations in the supply voltage on the biasing current. The
choice of the other components' aspect ratios was influenced by the necessity for
suitable power dissipation, sensitivity to supply voltage variations and silicon area.
Using (4.1) as a means to model the standard deviation of the threshold voltage of 
each transistor, a series of Monte Carlo SPICE simulations of the whole load have 
shown that, for an un-driven input and a biasing current set to 100 nA, a differential 
input voltage offset of about 10 mV can in practice be expected. A similar batch of 
simulations also indicated that the resistance of the load can vary from its nominal value 
by as much as 13% if situated as far as 1 mm from the biasing generator.
4.1.3 The pre-processing cell 
As mentioned in the previous chapter, the adoption of a TTS approach results in the 
introduction of a tanh function non-linearity at the input layer connection. This requires 
that the input data be pre-processed [64] to compensate for the influence of this non-
linear function. Where the neural network is incorporated in a programmable loop, the 
pre-processing can easily be accommodated within the host computer software. 
However, if the neural hardware is physically interfaced to real world analogue
sensors, the non-ideality demands a physical solution. Although this can be achieved by
designing an extra synaptic cell to perform the required operation, the preferred approach
makes use of a Horizontal Resistance (HRes) [22], [100] arrangement which displays a
$\tanh^{-1}(\cdot)$ pre-distortion function. The structure of the resistor and its symbol are shown in
Fig. 4.4. It consists of a core of four diode-connected transistors ( M j - M J and two 
current sources implemented by (M^-Mg) and (M,o-M,3). Note that the values of these 
biasing currents are identical and are twice that of the TTS ones. This choice was 
primarily made for the following two reasons: 
• To make sure that the synaptic weights associated to either the input or hidden 
layers have a similar range of operation. 
• To ensure input/output compatibility for the network design, and allow for the 
possibility of interconnected neural network chips and the implementation of 
recurrent networks. 
The additional advantage is that both HRes and TTS cells can share the same biasing 
generator as long as their biasing transistors have identical aspect ratios. 
Following Kirchhoff's current law, it can readily be shown that
l, = (h^M^^ (4 .2 ) 
For a core of matched transistors operating in the subthreshold mode of conduction 
(2.28) it can also be demonstrated that the difference output currents of the top and 
bottom differential pairs are respectively given by 
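$$I_1 - I_2 = -2 I_B \, \frac{\exp\left(-\kappa \frac{V_{i+}}{V_t}\right) - \exp\left(-\kappa \frac{V_{i-}}{V_t}\right)}{\exp\left(-\kappa \frac{V_{i+}}{V_t}\right) + \exp\left(-\kappa \frac{V_{i-}}{V_t}\right)} = 2 I_B \tanh\left(\frac{\kappa V_i}{2 V_t}\right) \qquad (4.3)$$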
and 
exp K - ^ -exp K - ^ 
exp K — +exp K — 
\ V , y \ V, 
Thus combining (4.2), (4.3) and (4.4) it follows that the difference input voltage can be
written as
$$V_i = \frac{2 V_t}{\kappa} \tanh^{-1}\left(\frac{I_i}{2 I_B}\right) \qquad (4.5)$$
where $I_i$ is the differential input current and $V_i = V_{i+} - V_{i-}$. Accordingly, the
normalised function of the input layer synapse is obtained by combining (3.25) and (4.5) 
and can be expressed as 
where $\bar{I}_i = I_i / 2 I_B$. This expression confirms the linearisation of the input layer synaptic
function with the weight given as: Wj = Vy^i.lB.(l-K) / S.Iz-V,. The other benefit of using
a horizontal resistance is that a differential current is needed to drive the input
connection, which consequently limits external noise pickup. Note that (4.5) is valid for
differential input currents smaller than $|2 I_B|$. The simulated static transfer
characteristic of the HRes is presented in Fig. 4.5 and confirms the $\tanh^{-1}$ distortion.
Extra simulations, whose results are depicted in Fig. 4.6, have also been conducted to
confirm that when a TTS is fed by an HRes its characteristics are linearised.
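The linearisation argument can be checked numerically. The following is a minimal sketch, assuming that the HRes obeys the inverse hyperbolic tangent law of (4.5) and that the input characteristic of the TTS is a normalised tanh with the same argument scaling; the thermal voltage (25.9 mV) and the value of kappa (0.7) are assumed, and the function names are hypothetical.

```python
import math

V_T = 0.0259    # thermal voltage in volts (assumed, about 300 K)
KAPPA = 0.7     # subthreshold gate coupling coefficient (assumed)

def hres_voltage(i_in, i_bias):
    """Differential voltage across the HRes following (4.5); needs |i_in| < 2*i_bias."""
    return (2.0 * V_T / KAPPA) * math.atanh(i_in / (2.0 * i_bias))

def tts_input_nonlinearity(v_in):
    """Normalised tanh input characteristic assumed for the TTS."""
    return math.tanh(KAPPA * v_in / (2.0 * V_T))

i_bias = 250e-9   # TTS biasing current quoted in the text
results = []
for frac in (-0.9, -0.5, 0.0, 0.5, 0.9):
    i_in = frac * 2.0 * i_bias                       # differential input current
    out = tts_input_nonlinearity(hres_voltage(i_in, i_bias))
    results.append((frac, out))
    print(f"normalised input {frac:+.2f} -> normalised output {out:+.4f}")
```

Under these assumptions the cascade returns the normalised input current unchanged, independently of the values chosen for kappa and the thermal voltage, since both cancel in the composition.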
A series of Monte Carlo SPICE simulations of the HRes circuit suggested that for a
null differential input current one can expect a 4 mV output offset due to mismatching 
effects. Observe that this standard deviation is noticeably smaller than that of the 
previously discussed scheme. This result may be explained by the fact that in the latter 
circuit the biasing sources have been implemented using a multiple transistor design 
[19], [65]. 
4.1.4 The output layer activation cell
Having combined the thresholding and synaptic functions, it has previously been 
mentioned that the neurons of the output layer are deprived of their squashing functions, 
see Fig. 3.11 for more details. To conform to MLP design, these activation functions 
can readily be implemented utilising extra TTS circuits for which the weight values are 
set to one. These specific TTSs will be referred to as Output Neurons (ONs). 
Implementing the output layer neuron function using a TTS results in a differential 
current output which also reduces external noise pickup. Furthermore the structure of 
the TTS has been slightly modified to introduce an internal current amplifier thereby 
allowing an external resistive/capacitive load to be driven directly. The scheme and its 
symbolic representation are shown in Fig. 4.7 and consists of a basic TTS cell and 
additional current mirrors. The gain of the amplifier has been limited to eight, mainly 
for reasons of power consumption and silicon area. Over the linear range of $V_{wi}$, the
differential output current of the ON is determined by the transfer characteristic of the
TTS (3.25) times the gain of the current amplifier, which may be approximated as
$$I_o = 8\, g\, V_{wi} \tanh\left(\frac{\kappa V_i}{2 V_t}\right) \qquad (4.7)$$
Since a multiple transistor technique has been employed to minimise mismatching in the 
current amplifier circuit, one can therefore expect the standard deviation of the 
difference output current offset of the ON to be roughly that of the TTS times the gain, 
which is an approximation that has been confirmed by a series of Monte Carlo 
simulations. 
4.1.5 The core of a feedforward neural network 
As shown in Fig. 4.8 the core of a fully connected feedforward ANN can be efficiently 
implemented in analogue low power VLSI technology using: 
• TTS circuits to simulate synaptic and sigmoid thresholding functions; 
• Common bus bars to achieve the summation operations; 
• Non-linear resistive elements to convert the summed currents to potentials as 
required by the following layer of TTSs. 
• HRess to restore the linearity of the input nodes; and 
• ONs to emulate the activation function of the output layer neurons. 
It may be noted that, since both the inputs and outputs of the network are compatible, 
the design technique allows for recurrent neural networks, such as the Hopfield network, 
to also be implemented. 
4.2 Weight storage scheme 
So far, it has been theoretically shown that, based on the principle of distributing the 
thresholding function over the next layer of synapses, the basic processing functions of
feedforward and possibly recurrent neural networks can readily be simulated, in a 
systolic manner, utilising TTS, load, HRes and ON analogue circuits. However, before 
proceeding to the design of a prototype neural network chip, it is necessary to consider 
the issue of analogue weight storage. 
The design strategy for the weight storage scheme is based on the need to maintain the 
weight quiescent potential at a fixed level so as to ensure maximum swing capability
(3.20). 
4.2.1 Capacitive storage 
It may be noted that in the proposed neural network design the weight values are 
controlled in a bipolar fashion by the differential potentials $V_{wi}$ at the bulk terminals of
the transistors forming the core of the TTS cells (3.28). As mentioned in chapter 2, 
ideally the analogue memory cell in a neural network design would be characterised by 
nonvolatility, a high resolution (at least 8 bits), easy access, high update speed (needed 
in the learning phase) and minimum use of die area. Unfortunately, the analogue storage 
device that satisfies all of these conditions has yet to be developed. 
To surmount this issue, the proposed solution was to investigate thoroughly the
characteristics of a sample/hold circuit, in order to optimise its performance.
As mentioned in chapter 2, the basic capacitive sample/hold system shown in Fig.
4.9, which consists of a PMOS switching transistor Mg and a holding capacitance $C_{hold}$,
suffers from both charge injection and charge leakage. These effects can severely limit
the resolution of the weight signal when the corresponding sensing systems are biased in the
weak inversion region (3.1). Note that the holding capacitor can be implemented using
an NMOS transistor since the quiescent component of the weight signal will always be
much greater than $V_T$.
4.2.1.1 Charge injection 
Throughout the sampling period the switching transistor conducts, thus a finite amount 
of hole carriers are stored in its channel region to form the inversion layer. During the 
turning-off transient, when the gate voltage of the switching transistor rises from ground toward $V_{dd}$, these
charges evacuate the channel through the source and the drain connections. Thus the
packet of charge injected onto the holding capacitor induces an error voltage. Although
the inversion layer disappears as the source-to-gate voltage of the switching device
drops below $-V_T$, the offset voltage continues to rise, since charges are still fed through
via the gate-source overlap capacitance, until the clock signal reaches a steady state (i.e. 
Based on the lumped model suggested by Sheu et al. [89], the magnitude of the error 
voltage induced on the holding capacitor (see Appendix E) can be approximated as
$$V_e = \sqrt{\frac{\pi U C_{hold}}{2 K_p}} \left(\frac{C_{ov} + C_G/2}{C_{hold}}\right) \mathrm{erf}\left(\sqrt{\frac{K_p}{2 U C_{hold}}}\,(V_w + V_T)\right) + \frac{C_{ov}}{C_{hold}}\left(V_{dd} - [V_w + V_T]\right) \qquad (4.9)$$
where $U$ is the rising-rate of the clock (assumed to be constant), $K_p$ is the
transconductance parameter of Mg, $C_{ov}$ is the gate overlap capacitance, $C_G$ is the gate
capacitance, $V_w$ is the input signal voltage and $V_T$ is the threshold voltage of the
switching device, which is $V_T = V_{T0} - \gamma\left(\sqrt{V_{dd} - V_w - \phi_B} - \sqrt{-\phi_B}\right)$.
For a minimum feature size switching transistor, a holding capacitor of 1 pF and a
switching-off period of 5 ns, Fig. 4.10 shows the level of clock feedthrough voltage
predicted by (4.9) and by a SPICE simulation. These results indicate that over the weight
signal range one can expect a maximum switch-induced error voltage of the order of 5
mV, thus limiting the resolution of the weight to 6 bits.
In order to improve the accuracy of such a storage scheme, a standard dummy
compensation technique has been considered [90]; see Fig. 4.11, which consists of an
additional half-width short-circuited transistor driven by an antiphase clock ($\bar{\phi}$).
Following [89], the effect of clock feedthrough (see Appendix F) is given by
Ve = T T . U . C hold 
2 .Kp 2.Choid 
exp r j | ^ ( v J , + 2 . V ^ ^ - 3 . V < , , . V T w ) 
erfl 
Cox 
lV2.U.Cho,d ™ y - e n 2 . U . C hold 
.(2 .VTw-Vdd) 
2.C hold 
. (Vdd-Vrw) 
(4.10)
where $V_{Tw} = V_w + V_T$.
Given that the erf(·) function may be approximated as
$$\mathrm{erf}(x) \approx \frac{2x}{\sqrt{\pi}}\left(1 - \frac{x^2}{3}\right) \quad \text{for } x \ll 1 \qquad (4.11)$$
Then for a fast switching-off transient, i.e. $U \gg K_p V_{Tw}^2 / (2 C_{hold})$, (4.10) may be simplified
to
V e = - r ^ . ( V d d - V T w ) / l - e x p 
^•(-hold V U . C hold 
•(Vdd-VTw).(Vdd-2 .VTw) (4.12) 
It can be appreciated that, within this condition of operation, for an input signal set to
$(V_{dd}/2) - V_T$ the error voltage induced by the turning off of the switching transistor is
completely compensated for by the dummy device. However, for a 4.8 µm / 2.4 µm
switching device, a switching-off period of 5 ns and a holding capacitor of 1 pF, as
shown in Fig. 4.12, (4.12) and a SPICE simulation show that the absolute error over the
full range of stored voltage is reduced to a level where the required 8-bit weight
resolution is readily achievable.
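The simplification above relies on the small-argument approximation of the error function, which corresponds to the first two terms of its Taylor series. The following minimal sketch compares that approximation with the library implementation; the sample arguments are arbitrary illustrative values.

```python
import math

def erf_small(x):
    """Two-term Taylor approximation of erf(x), intended for x << 1."""
    return (2.0 * x / math.sqrt(math.pi)) * (1.0 - x * x / 3.0)

for x in (0.05, 0.1, 0.3, 0.5):
    exact = math.erf(x)
    approx = erf_small(x)
    print(f"x = {x:4.2f}  erf = {exact:.6f}  approx = {approx:.6f}  "
          f"abs. error = {abs(exact - approx):.2e}")
```

The error grows roughly as the fifth power of the argument, so the approximation is only appropriate when the clock edge is fast enough to keep the argument of the error function well below unity.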
4.2.1.2 Charge leakage
As mentioned in the previous chapter, the problem of leakage may be addressed by
continuously refreshing the weight information using on-chip circuitry that converts
values stored in digital memory into an analogue signal. However, the number of
parasitic leakage paths for the dummy compensated sample/hold circuit is more than
double that of the simple scheme. Assuming a total leakage current of 5 pA, the voltage
stored on the capacitor would need to be restored once every 400 µs if an error of no
more than 2 mV is to be allowed. In order to reduce this refreshing frequency and
consequently power dissipation, a twin-capacitor structure has been adopted, although
such a choice does impact on die size.
Assuming that both storage arrangements have matched characteristics (i.e. similar
leakage current), the weight signal is only affected, and therefore only needs to be updated, when
one of the capacitors has reached complete discharge. Under worst case conditions, that
is when $V_w$ is set to 2.8 V, a refreshing cycle of 40 ms could then be tolerated.
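Both refresh intervals follow from the droop of a capacitor discharged by a constant current, t = C.dV / I. The sketch below reproduces the two figures under stated assumptions: a 1 pF holding capacitor (the value used in the charge injection analysis) and, for the twin-capacitor case, a usable drift of 0.2 V between the 2.8 V stored level and a 3 V supply. The latter interpretation is an assumption made for illustration.

```python
def refresh_interval(c_hold, delta_v, i_leak):
    """Time (s) for a constant leakage current to shift the stored voltage by delta_v."""
    return c_hold * delta_v / i_leak

C_HOLD = 1e-12   # holding capacitor, 1 pF (assumed to apply here)
I_LEAK = 5e-12   # total leakage current, 5 pA

single = refresh_interval(C_HOLD, 2e-3, I_LEAK)       # 2 mV error budget
twin = refresh_interval(C_HOLD, 3.0 - 2.8, I_LEAK)    # assumed 0.2 V of headroom

print(f"single capacitor: {single * 1e6:.0f} us")
print(f"twin capacitor:   {twin * 1e3:.0f} ms")
```

The hundredfold relaxation comes entirely from the larger voltage excursion that can be tolerated once the leakage appears as a common-mode rather than a differential error.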
4.2.2 The refresh mechanism 
The differential weight potential is generated using the on-chip circuit shown in Fig. 
4.13. The structure consists of a DAC, two silicon resistors and a pair of buffer 
amplifiers. Row and column decoders direct the differential voltage to the refreshed 
TTS. In our case, the digital weights are stored in an external Random Access Memory
(RAM); however, this component could easily be integrated within the chip. The
maximum refreshing speed is mainly dependent on the RAM access time, the
conversion time and the time required to charge the storage capacitor. In the switch-on
mode, the switching transistor is biased in the triode region and the charging time constant is
T - R s . C h o i d = J. ^ — r - C h o i d (4.13) 
where $R_s$ represents the channel resistance of the path transistor. For $V_w$ set to 2.8 V, the
maximum time constant is approximately 22 ns, therefore the holding capacitor is
refreshed to within 1% of its desired value in 75 ns (i.e. $5\tau$).
4.2.2.1 Digital-to-analogue converter 
For reasons of compactness and speed, the DAC uses a binary-weighted current source
technique, see Fig. 4.14. The basic current multiplier consists of 70 identical transistors
$M_1$ to $M_{70}$ with common gate and source nodes. Note that the Least Significant Bit (LSB) is implemented using
four transistors connected in series whereas the Most Significant Bit (MSB) consists of
thirty-two devices combined in parallel. This design approach was adopted since it
limits the number of transistors while the influence of the body effect on the weight of
the LSB still remains negligible. The currents are controlled by PMOS current switches.
The clock feedthrough associated with the bit switch is compensated by the addition of
a dummy transistor in the output current line. To limit the influence of the output
voltage swing, the output conductance of the transistors forming the current sources of 
the four most significant digits (b^-b,) have been decreased utilising a feedback 
amplifier as in the simple regulated cascode circuit [101]. The output current may be
expressed as
$$I_{DAC} = \frac{I_F}{4} \sum_{i=1}^{8} 2^{\,i-1}\, b_i \qquad (4.14)$$
where $b_i$ is either "0" or "1" according to the binary state of $B_i$ and $I_F$ is the biasing
current (set to 80 nA). A further bank of transistors, not shown in Fig. 4.14, generates a
reference current, $I_{ref}$, identical to the most significant binary-weighted source.
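The binary weighting described above can be sketched as follows. Only the LSB (four unit devices in series) and the MSB (thirty-two in parallel) are stated in the text; the intermediate branches and the resulting LSB step of one quarter of the unit current are assumptions made for illustration, as is the indexing of the bits from 1 to 8.

```python
I_F = 80e-9  # unit biasing current in amperes, as quoted in the text

# Unit transistors per bit, LSB first. The first and last entries come from the
# text; the intermediate entries are an assumed binary progression.
DEVICES_PER_BIT = [4, 2, 1, 2, 4, 8, 16, 32]

def dac_current(code, i_f=I_F):
    """Output current for an 8-bit code, assuming an LSB step of i_f / 4."""
    if not 0 <= code <= 255:
        raise ValueError("code must be an 8-bit value")
    bits = [(code >> (i - 1)) & 1 for i in range(1, 9)]   # b_1 (LSB) to b_8 (MSB)
    return (i_f / 4.0) * sum(2 ** (i - 1) * b for i, b in zip(range(1, 9), bits))

print("devices in the bit branches:", sum(DEVICES_PER_BIT))
for code in (1, 128, 255):
    print(f"code {code:3d} -> {dac_current(code) * 1e9:.0f} nA")
```

With this reading the bit branches use 69 unit devices; one further device, for example a diode-connected reference, would account for the total of 70 mentioned above, although this allocation is an assumption.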
The linearity error associated with random mismatches in the conversion elements is 
critically important. Hence, the circuit yield is a function of the matching accuracy of 
the current sources. Following Lakshmikumar et al. [81], the yield of the proposed 
structure may be expressed (Appendix G) as 
G = nerf1 
j=i j.(255-j) • f2 ^ 
IF > 
(4.15) 
where $\sigma(I_F)/I_F$ represents the standard deviation for a unit current source in percent. As
can be seen in Fig. 4.15, a yield of 100 percent can therefore be achieved for a matching
standard deviation in the unit current source of about 1.5 percent. However, Pelgrom et
al. [82] indicate that such a standard deviation can only be obtained for devices operating
in the saturation mode of conduction. Within this condition of operation and assuming
the variance of the transconductance factor ($\sigma^2(K_n)$) to be negligible with respect to the
variance of the threshold voltage, the standard deviation in the current source may be
approximated by
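$$\frac{\sigma(I_F)}{I_F} = \frac{2 A_{VT0}}{\sqrt{W L}\,(V_{gs} - V_T)} \qquad (4.16)$$
where $W$, $L$ and $V_{gs}$ are respectively the channel width, length and gate-to-source
potential of the transistor used as a current source. Given that for a saturated n-type
MOS device $(V_{gs} - V_T)^2 = 2 I_F L / (W K_n)$, the length of the transistor for a given standard
deviation may be expressed as
$$L = \frac{A_{VT0}}{\sigma(I_F)/I_F} \sqrt{\frac{2 K_n}{I_F}} \qquad (4.17)$$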
It may be pointed out that in the saturation mode of conduction, the standard deviation
of the drain current is inversely proportional to the drawn length (L) of the device. In
our process $K_n$ is nominally $51.7 \times 10^{-6}$ A/V². Consequently, to obtain a DAC with an
integral non-linearity of ±1 LSB with a 100% yield requires a length greater than or equal to
70 µm. The smallest possible size transistor (W/L = 3.6 µm / 70 µm) was used to build
the current multiplier and a common centroid layout adopted to further reduce
mismatching effects.
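The length requirement can be checked by eliminating the gate overdrive between the mismatch expression and the saturation current law, which gives L = (A_VT0 / (sigma(I)/I)) . sqrt(2.K / I). The following is a minimal sketch of that calculation. The NMOS area constant of 30 mV·µm is an assumption (the value commonly quoted from Pelgrom et al. for a comparable process), as is the reading of the transconductance parameter as 51.7 µA/V².

```python
import math

def min_length_um(a_vt0_mv_um, k_n, i_f, rel_sigma):
    """Channel length (um) giving a relative current sigma of rel_sigma in saturation."""
    a_v_um = a_vt0_mv_um * 1e-3                  # mV.um converted to V.um
    return (a_v_um / rel_sigma) * math.sqrt(2.0 * k_n / i_f)

length = min_length_um(
    a_vt0_mv_um=30.0,    # assumed NMOS area constant
    k_n=51.7e-6,         # transconductance parameter in A/V^2 (assumed reading)
    i_f=80e-9,           # unit current of the DAC
    rel_sigma=0.015,     # 1.5 percent matching target
)
print(f"required length = {length:.1f} um")
```

Under these assumptions the result is close to 72 µm, in line with the 70 µm figure adopted for the current multiplier; note that the width cancels, which is why the matching depends only on the drawn length.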
4.2.2.2 Operational amplifier 
The DAC outputs are converted to differential potentials via two silicon resistors. To 
ensure high speed operation and sufficient current driving capability, two op amps 
configured as buffers interface the DAC-resistor structure and the weight storage 
mechanism. With the supply voltage equal to 3 V and the common-mode weight
inputs set at approximately 2.55 V, the op amps are required to operate with near top-rail
inputs, to provide a wide unity-gain bandwidth allowing a fast settling time on a low
load capacitance (1 pF), and to dissipate as little power as possible within a small die area. A
structure offering most of these features is shown in Fig. 4.16, and is based on a design
described by Hogervorst et al. [102]. This two-stage structure comprises a folded
cascode input stage and a class-AB output stage. Since common-mode input bottom-rail
operation is not needed in our system, the P-channel differential input pair and the $g_m$
control circuitry have been omitted, leading to a more compact and simple structure. The
initial choice of device aspect ratios was determined by power dissipation, slew-rate, 
silicon area and input offset considerations. Final drawn values were developed using a 
SPICE simulator. 
The simulated open loop frequency response of the op amp loaded by a 10 kΩ
resistor and 10 pF capacitor is shown in Fig. 4.17. The result indicates that a unity-gain
bandwidth of approximately 3.5 MHz and a unity-gain phase margin of 63° are achieved
for a biasing current of 2.5 µA. Extra simulations have also confirmed that, configured
as a unity gain buffer, the amplifier offers adequate performance even when operating
with a near top-rail common-mode input signal. Note that a multiple transistor technique
was used in order to limit the effects of process parameter variations and consequently
minimise the offset voltage of the input stage. A batch of Monte Carlo simulations has
suggested that a 5 mV input offset can be expected. However, this error level can be
reduced if one considers using a common-centroid layout structure.
4.2.2.3 Decoders 
Since the chip includes 98 TTSs and 5 ONs, analogue weight information is necessarily 
refreshed and updated in a sequential manner. The clock signal ($\phi$) for the access switch
in each TTS (or ON) is generated by addressing the appropriate row and column. To 
reduce the number of external pin connections to a minimum, the signals are generated 
using a 3-to-8 row decoder and 4-to-14 column decoder based on standard commercial 
designs. However, to synchronise the clock signal with the analogue weight 
information, an extra control command line has been added to the decoders. 
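The addressing scheme can be illustrated with a behavioural sketch. The code below models two one-hot decoders gated by a common enable input, which stands in for the extra control command line; the function names and the way cells are mapped onto the grid are hypothetical and are not taken from the chip.

```python
N_ROWS, N_COLS = 8, 14      # 3-to-8 row decoder and 4-to-14 column decoder
N_CELLS = 98 + 5            # TTSs plus ONs to be addressed

def one_hot(address, n_outputs, enable=True):
    """One-hot decoder output; all lines stay low when enable is False."""
    if not 0 <= address < n_outputs:
        raise ValueError("address out of range")
    return [enable and i == address for i in range(n_outputs)]

def select_cell(row, col, enable=True):
    """Grid of access-switch clocks: a cell is selected when its row and column are both high."""
    rows = one_hot(row, N_ROWS, enable)
    cols = one_hot(col, N_COLS, enable)
    return [[r and c for c in cols] for r in rows]

grid = select_cell(2, 5)
print("addressable locations:", N_ROWS * N_COLS, "for", N_CELLS, "cells")
print("cells selected:", sum(cell for line in grid for cell in line))
```

Gating both decoders with the same enable signal is one simple way of ensuring that no access switch closes before the analogue weight voltage has settled.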
4.3 Simulation of a neural network 
The capabilities of the proposed designs (i.e. TTS, load, ON, HRes, capacitive storage
and weight refreshing scheme) have been assessed via a whole neural network
simulation in SPICE. To limit the size of the complete system, as determined by the
number of transistors, and consequently the simulation time, a simple 2:2:1 MLP performing
an XOR function was selected. The structure of the network is shown in Fig. 4.18 (a)
and comprises 3 input nodes (including bias), 2 hidden processing units and an output
neuron. It is emulated, in Fig. 4.18 (b), using 3 HRess, 9 TTSs, 3 I/V converters, an
output neuron and a complete weight refreshing scheme. For reasons of simplicity, the
holding capacitors are not represented in the diagram.
An extra TTS has been added to each neuron to bias the weighted sum. The inputs to 
these bias synapses were set to unity via an HRes. Note that since this input level is
common to all biasing TTSs, a single bias input has been used. This biasing
technique will also be applied to the design of a neural network chip since:
• It limits the number of biasing inputs to one and therefore reduces the amount 
of HRes cells; 
• Power dissipation is consequently diminished, while processing speed is 
unaffected since the bias is kept to a constant level; and 
• The number of connection pads is also minimised. 
To avoid having to train the analogue network, the sequences of digital data applied at
the inputs of the DAC were determined utilising a set of weight values, given by a
trained neurocomputer, which were then converted using (3.28) and (4.14). Fig. 4.19
shows the performance of the simulated XOR network. The first 20 µs of the
simulation were dedicated to charging each weight storage capacitor pair to its appropriate
difference analogue voltage using time multiplexing. The four distinct input patterns in
Fig. 4.19 (a) were subsequently fed to the network in a sequential manner. It may be
noted that, as expected, the propagation time of the 2 layer network is approximately
4 µs since the maximum weight value was set to 10, i.e., the bias current of the TTSs
was 250 nA and that of the load was 100 nA.
Extra simulations have also shown that the complete refreshing scheme (i.e., DAC,
op amp and sample/hold cells) offers a suitable level of functionality and that one can
expect a refreshing time as short as 1 µs per weight. This result also suggests that one such
structure could be used to refresh or update, in a time multiplexed manner, the weight
values of 40 000 TTSs, since a 40 ms holding time is expected.
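The capacity figure follows directly from the ratio of the holding time to the refresh time per weight. A minimal sketch of this arithmetic is given below; the figure of 103 cells used for comparison is the number of TTSs and ONs on the prototype, and the function name is hypothetical.

```python
def max_refreshable_cells(hold_time_s, refresh_time_s):
    """Number of weights one refresh channel can service within one holding period."""
    return round(hold_time_s / refresh_time_s)

capacity = max_refreshable_cells(40e-3, 1e-6)
cycle_for_chip = 103 * 1e-6     # time to refresh 98 TTSs and 5 ONs once

print("cells per refresh channel:", capacity)
print(f"full refresh of the prototype: {cycle_for_chip * 1e6:.0f} us")
```

On this basis the prototype uses only a small fraction of the available refresh bandwidth.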
It may be appreciated that a physical silicon implementation would not necessarily 
exhibit the performances predicted either by simulation or theoretical results for several 
reasons including modelling imperfections and process/device tolerances. Therefore the 
above mentioned results should be regarded as guidelines. 
4.4 The neural network chip 
A prototype chip was fabricated using the Eurochip 2.4 µm double metal, double poly
p-well type process. Since information characterising the behaviour of subthreshold 
mode devices was unavailable, much attention has been directed to the design of 
individual test structures. The primary aim of this prototype design was to demonstrate 
that the proposed technique along with the suggested circuits can be trained to perform 
practical pattern recognition tasks. The neural network chip consists of: 
• Two TTSs, an ON and two I/V converters as test structures;
• An XOR network comprising 3 source nodes (including bias), 2 hidden 
processing neurons and 1 output unit; 
• A 10:6:3 MLP;
• A row and column decoder for weight addressing; 
• A DAC, two silicon resistors and two differential amplifiers for weight 
refreshing; and 
• Two biasing generators which provide the bias potentials required by the TTS, 
HRes, ON and load circuits. 
Note that no HRes circuit was integrated for test purposes, because its transfer
characteristic is readily available from any input of the two networks.
4.4.1 Exclusive OR network 
The XOR function has been included for test purposes since its simple but highly non-linear
characteristics serve as a benchmark problem. As mentioned earlier, the network
consists of 3 input nodes (including bias), 2 hidden processing units and an output
neuron and is implemented using 3 HRess, 9 TTSs, 3 I/V converters and an ON. The
outputs of the 2 hidden processing units were made available for measurement purposes 
via two pairs of CMOS switches. These switches were utilised so that the capacitive 
loads associated with the bonding pads could be isolated from the network and 
consequently processing speed would be optimised. 
4.4.2 QRS complex detector 
To further demonstrate the capability of the proposed approach, a 10:6:3 perceptron, 
trained to perform the seemingly complex function of detecting the QRS complex of 
foetal electrocardiogram (ECG) has been included. This type of application has already 
received much attention [80]. It may also be noted that this exercise is similar to the
task performed by cardioverter defibrillators [79].
Although the results of Ifeachor et al. [80] suggest that a 20:6:1 feedforward network
performs marginally better than a 10:6:1 MLP in detecting QRS complexes, the latter
scheme was integrated in order to comply with wafer fabrication process
limitations (i.e. limited number of bonding pads). Nevertheless, an extra two output
neurons were added so that the quality of the detected QRS complexes could be
classified in three different categories (i.e. good, average and poor). The complete
structure of the QRS complex detector is shown in Fig. 4.20 and was simulated using 87
TTSs, 11 HRess, 9 I/V converters and 3 ONs.
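The cell counts quoted for both networks follow from the topology alone. The sketch below is a hypothetical helper that assumes a single hidden layer, one bias input shared by all neurons, one HRes per input node (including the bias), one I/V converter per neuron and one ON per output.

```python
def cell_counts(n_in, n_hidden, n_out):
    """Building blocks needed for an n_in:n_hidden:n_out network with a shared bias input."""
    return {
        "TTS": (n_in + 1) * n_hidden + (n_hidden + 1) * n_out,  # one synapse per connection
        "HRes": n_in + 1,                                       # inputs plus bias
        "I/V": n_hidden + n_out,                                # one load per neuron
        "ON": n_out,                                            # one output neuron cell each
    }

xor_net = cell_counts(2, 2, 1)
qrs_net = cell_counts(10, 6, 3)
print("XOR 2:2:1  ->", xor_net)
print("QRS 10:6:3 ->", qrs_net)
print("TTSs on chip (networks plus 2 test cells):", xor_net["TTS"] + qrs_net["TTS"] + 2)
```

The totals agree with the 98 TTSs and 5 ONs addressed by the decoders once the test structures are included.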
4.5 Layout techniques
To exploit the modular structure of fully connected feedforward neural networks, a full 
custom layout approach [19] was adopted for generating the masks of the basic building 
blocks described earlier in this chapter. In order to minimise the complexity of inter-cell 
connections and consequently enhance device density, signals generally shared by 
adjacent circuits were distributed via banks of horizontal and vertical metal wires. This 
layout strategy, as shown in Fig 4.21, allows for the implementation of any size matrix 
of unit elements. At the circuit level, to achieve acceptable matching between critical 
devices (i.e. differential-pair transistors and mirror transistors) several considerations 
were taken into account: 
• The devices were integrated as close as possible to each other and arranged 
with the same orientation; and 
• A common-centroid-symmetry layout design was employed, in which devices 
are connected around a central point. 
Since the chip includes both analogue and digital circuits, to attenuate the noise 
interaction attention was paid to the: 
• Use of separate analogue and digital supply connections; 
• Maximum physical separation of the analogue and digital elements where 
possible, see floor plan of the chip in Fig. 4.22; 
• Use of guard rings. 
Since a large capacitance is usually associated with the bonding-pad diodes of an
electrostatic discharge protection cell, all the analogue inputs and outputs of the chip
were left unprotected against electrostatic discharge so as to maximise the processing speed
of the networks. Note, however, that some protection is offered by the reverse-biased
source/bulk and drain/bulk junctions of the transistors incorporated in the HRes circuits and
ONs.
The chip includes 100 pins and the circuits occupy an active area of 2.79mm x 
2.26mm (containing approximately 10 000 transistors and 214 capacitors). A photo-
micrograph of the prototype chip is shown in Fig. 4.23 and its principal features are 
given in Table 4.1. 
4.6 Summary 
The complete structures of the building blocks making-up the core of feedforward 
ANNs have been presented and their vulnerability to process parameter variations 
evaluated via series of Monte Carlo SPICE simulations. Results indicated that the 
expected imperfections in the basic processing cells may be compensated for during 
training. 
The non-linear input and un-thresholded output problems, associated with 
distributing the squashing function over to the next layer of synapses, have respectively 
been overcome by introducing an HRes and an ON which also facilitate compatibility 
between the inputs and outputs of the network and decrease vulnerability to external 
noise pickup. 
A dummy-compensated sample-and-hold circuit has been suggested as a means to 
implement the storage scheme since it readily allows for an 8-bit weight resolution. To
control the weight voltages an on-chip update/refresh mechanism consisting of a DAC, 
two silicon resistors, a pair of op amps configured as buffers and a row and column 
decoder has been developed. 
The behaviour of the suggested ANN design has been assessed via a series of SPICE 
simulations of the XOR network. Results revealed a reasonable level of performance. 
Finally a prototype neural network chip has been developed in order to experimentally 
evaluate the performance of the proposed feedforward ANN design technique. 
Figure 4.1: The complete circuit of a transconductance-thresholding-synapse.
Figure 4.2: Array of TTS circuits.
Figure 4.3: The complete circuit of the load.
Figure 4.4: Horizontal resistor.
Figure 4.5: Static transfer characteristic of the horizontal resistance.
Figure 4.6: Static transfer characteristics of the transconductance-thresholding-synapse fed by a horizontal resistance (curves shown for Vw from -300 mV to 300 mV in 100 mV steps).
Figure 4.7: Circuit diagram of the output neuron.
Figure 4.8: The complete analogue implementation of a fully connected feedforward neural network.
Figure 4.9: Basic PMOS-based sample-and-hold circuit.
Figure 4.10: Level of clock feedthrough voltage of a single 2.4/2.4 µm PMOS switch, plotted against Vw (V). Switching speed U = 3 V / 5 ns, holding capacitor C_hold = 1 pF. Calculated and simulated results are shown.
Figure 4.11: PMOS-based dummy-compensated sample-and-hold circuit.
Figure 4.12: Level of clock feedthrough voltage of a PMOS-based dummy-compensated sample-and-hold circuit, plotted against Vw (V). Switching speed U = 3 V / 5 ns, holding capacitor C_hold = 1 pF, Mg 4.8/2.4 µm. Calculated and simulated results are shown.
Figure 4.13: The complete refreshing mechanism.
Figure 4.14: Circuit diagram of the digital-to-analogue converter.
Figure 4.15: DAC yield versus current-source mismatch.
Figure 4.16: Circuit diagram of the differential amplifier.
Figure 4.17: Simulation results of the open-loop frequency response of the differential amplifier (gain and phase versus frequency, 1 Hz to 10 MHz).
Figure 4.18: XOR neural network.
Figure 4.19: Simulated transient characteristics of the XOR network (inputs A and B and output O versus time in µs).
(a) Logic table of the XOR gate.

A  B  O
0  0  0
0  1  1
1  0  1
1  1  0
Figure 4.20: Structure of the QRS complex detector. 
Figure 4.21: Layout of a matrix of TTSs.
Figure 4.22: Floor plan of the neural network chip (blocks shown: ECG classifier with TTS arrays (11x6) and (7x3), XOR network with TTS arrays (3x2) and (3x1), HRes, I/V converters, output neurons, DAC, resistors, op amps, row and column decoders, and test elements).
Figure 4.23: Photomicrograph of the prototype chip. 
Feature                              Value
Fabrication process                  2.4 µm CMOS, double-poly, double-metal
TTS unit size                        174 µm × 147 µm
I/V converter unit size              147 µm × 102 µm
Output-neuron unit size              202 µm × 176 µm
Horizontal-resistance unit size      174 µm × 87 µm
DAC size                             405 µm × 382 µm
OpAmp size                           444 µm × 209 µm
Silicon resistances                  201 µm × 145 µm
Row decoder size                     468 µm × 277 µm
Column decoder size                  944 µm × 296 µm
Number of TTSs                       98
Number of I/V converters             14
Number of output neurons             5
Number of horizontal resistances     14
Weight memory device                 capacitor
Active die size                      2.79 mm × 2.26 mm
Supply voltage                       3 V
Package                              100-pin PGA

Table 4.1: The chip features.
Chapter 5 
Performance of the prototype neural 
network chip 
The potential of the various neural network modules has been analytically estimated
in the previous chapter and simulation studies have also indicated similar levels of
practicality. In this chapter, the performance of these elements is experimentally
assessed and compared with those predictions. To this end, special test set-ups
have been developed which allow for measurements to at least 1% accuracy. A
thorough review of the potential of various learning algorithms suitable for an analogue
neural structure in-loop is presented. The objective of this analysis is to examine the
basic concept of these algorithms in order to highlight their performance level in terms
of learning quality, flexibility, convergence speed and hardware cost, and consequently
to select the most suitable for the given neural network structure. A section of this chapter
is also dedicated to introducing the experimental set-up that has been developed to
monitor the training and testing of the analogue neural networks. The performances of
the XOR and QRS complex detector networks are subsequently presented and
compared with those predicted by the analysis and SPICE simulations. Finally, these
results are used to judge the usefulness of the proposed implementation
technique in simulating fully connected feedforward neural networks.
5.1 Experimental characteristics of the basic building 
modules 
An assessment of the performance of the analogue neural network modules, i.e. the TTS,
I/V converter, ON, HRes, weight storage scheme and refreshing mechanism, is
presented in this section. Ten samples of the prototype chip were made available by
MIETEC for test purposes, thus allowing for an evaluation of the effects of local and
global process parameter variations on the test structures.
5.1.1 Transconductance-thresholding-synapse 
For measurement purposes, the differential output currents of the individual TTSs were
converted to differential potentials by a pair of matched transimpedance
amplifiers, each consisting of a low-noise JFET-input op amp and a passive feedback resistor
of 1 MΩ. As shown in Fig. 5.1, these outputs were subsequently sensed using a
difference amplifier whose gain was set to 5. The differential input potential at the gate
terminals was applied externally using a function generator, whereas the weight
differential voltage was generated using the on-chip DAC-op amp arrangement and
directed to the bulk inputs of the TTS under test by the weight address decoders.
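As a rough illustration of this measurement chain, the sketch below converts between a TTS differential output current and the voltage read at the output of the difference amplifier. It assumes ideal amplifiers, takes the 1 MΩ feedback resistance and the gain of 5 from the text, and ignores the sign inversion of the transimpedance stage; the function and constant names are hypothetical.

```python
# Illustrative conversion between the sensed voltage and the TTS differential
# output current, assuming ideal amplifiers (all names are hypothetical).

R_FEEDBACK_OHM = 1e6   # feedback resistor of each transimpedance amplifier (from the text)
DIFF_AMP_GAIN = 5.0    # gain of the difference amplifier (from the text)


def current_to_sensed_voltage(i_diff, r_feedback=R_FEEDBACK_OHM, gain=DIFF_AMP_GAIN):
    """Return the sensed voltage magnitude (V) produced by a differential current (A)."""
    return i_diff * r_feedback * gain


def sensed_voltage_to_current(v_out, r_feedback=R_FEEDBACK_OHM, gain=DIFF_AMP_GAIN):
    """Return the differential current magnitude (A) corresponding to a sensed voltage (V)."""
    return v_out / (r_feedback * gain)


if __name__ == "__main__":
    # Example: a 100 nA differential output current.
    v = current_to_sensed_voltage(100e-9)
    print(f"sensed voltage: {v:.3f} V")
    print(f"recovered current: {sensed_voltage_to_current(v) * 1e9:.1f} nA")
```

With these values every nanoampere of differential output current corresponds to 5 mV at the oscilloscope, which is why small offset currents remain measurable.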
Experiments were carried out over a wide range of biasing currents and confirmed
that the gain factor of the differential output current is proportional to I_B. The measured
static transfer curves relating the output voltage to the gate and weight differential input
voltages are shown in Fig. 5.2 (a) and (b), respectively, and correspond to a bias current of 250 nA.
These transfer characteristics were obtained using the X-Y mode measurement facility
of a digital oscilloscope and show excellent correlation with the simulation results of
Fig. 3.7. The measurements also confirm that, over the linear range of the bulk
differential input potential, the differential output current can be approximated using
(3.25). Although the bias current was set to 250 nA, the gain of the TTS is about 50 %
higher than anticipated. This increase may principally be due to the significant distance
(approximately 2 mm) between the biasing generator and the TTS under test. A distance
of this order will exacerbate parameter variations associated with unavoidable non-uniform
doping.
Extensive measurements on all available TTSs have also been conducted to establish
the magnitude of the output offset current when both inputs are short-circuited. To ensure
accuracy, the offset contribution of the test set-up was evaluated and accounted for
during the experiments. The results shown in Table 5.1 indicate a standard deviation of
approximately 12 nA, which closely agrees with the prediction.
5.1.2 Current-to-voltage converter
Since the TTS biasing current is set to 250 nA, to obtain a weight range of ± 10 the
non-linear load must display a driving-point resistance of approximately 15 MΩ. To
measure the transfer characteristic of such a high-value resistor, the experimental
configuration shown in Fig. 5.3 was used in order to minimise external noise pickup
and measurement distortion. The floating current source was implemented using an
arrangement of high-precision passive resistors and amplifiers and controlled by a
triangular voltage signal of 0.4 Vpp. The scheme exhibited a 1 µA/V
transconductance. The generated differential potential was subsequently detected using
low-noise JFET-input op amps operated as voltage followers driving a difference amplifier
whose gain was also set to 5. Since the test set-up loaded the resistive element with a
substantially large capacitance (i.e. at least 5 pF), measurements were conducted at 10 Hz
so as to avoid frequency-related attenuation.
Results suggested, as shown in Table 5.2, that the nominal resistance (i.e. the slope
resistance measured at the origin) is inversely proportional to the bias current I2 over
the range 15 nA to 60 nA, thus facilitating the control of the weight range. It has also
been observed that within this range of biasing current the nominal resistance is three
times smaller than expected. This discrepancy may be caused by batch-to-batch or
wafer-to-wafer variations rather than by local or global die process parameter variations, since the
scale of the error was approximately the same for all 10 prototype chips. Fig. 5.4 depicts a
family of measured I-V static characteristics obtained for a bias current of 30 nA and
supply voltages ranging from 2.9 V to 3.1 V in 100 mV increments. These
measurements confirm that the transfer characteristic is relatively insensitive to power
supply variations and displays a sinh^-1 non-linearity.
Extra tests have also indicated that, as shown in Table 5.3, the load arrangement is
less vulnerable to local process parameter variations than anticipated. The standard
deviation of the output offset voltage is of the order of 5 mV. These experiments were
conducted with the floating current source removed from the above-mentioned test
set-up.
5.1.3 Horizontal resistance 
The experimental arrangement, shown in Fig. 5.5, consisted of an inverting amplifier
in which the device under test (i.e. the HRes) together with a passive resistor defined the
closed-loop gain. Given that the expected value of the horizontal resistance, within its
linear range, is of the order of 210 kΩ, a passive resistor of 200 kΩ was employed to
obtain an absolute gain of approximately one. To fix the quiescent common-mode input
level to Vdd/2, the non-inverting input of the op amp was set to 1.5 V. With a triangular
signal, having an amplitude of 0.2 V and a common-mode component of 1.5 V,
applied at the input of the structure, the current-to-voltage static characteristic of the
horizontal resistance shown in Fig. 5.6 was obtained for a bias current I_B of 250 nA.
With the gain of the inverting amplifier chosen to be one, the settings for the digital
oscilloscope were 50 mV/div in X mode and 50 mV/div in Y mode, which was also
inverted. The results also indicate close agreement with the theoretical prediction (4.5),
i.e. the HRes exhibits the tanh^-1 distortion and a nominal resistance of approximately
250 kΩ. It has also been noted that, as expected, the resistance value is a function of the
controlled bias current generator; the higher the bias current, the lower the resistance.
The performance of a TTS fed by an HRes was experimentally assessed utilising the
set-up presented in Fig. 5.1, wherein the function generator was replaced by the
arrangement previously described. The experimental results,
depicted in Fig. 5.7, confirm the predicted linearisation of the input of the TTS for
differential input currents within ± 2×I_B.
The output voltage offset measurements were conducted on the HRes associated with
the A input of the XOR network of each prototype chip. The results are given in Table
5.4 and exhibit a standard deviation of 4 mV, which closely agrees with the value
predicted by the Monte Carlo study.
5.1.4 Output neuron 
Measurements were conducted utilising an experimental set-up similar to that used for
characterising the TTS, with the exception that the value of the passive feedback
resistances of the transimpedance amplifiers was halved so as to contain the output
signal well within the supply voltages of the op amps, since the differential output
current is magnified. With the biasing current set to 250 nA, Fig. 5.8 (a) and (b)
present the static measurements which relate the differential output current to the
gate and weight differential input voltages, respectively. These results, when compared with
those of the TTS, confirm that the gain of the internal current buffer of the ON is about
8. This outcome was expected because both structures are juxtaposed on the die, thus
their biasing currents must be similar. One can therefore conclude with some certainty
that the 50 % gain error of the TTS, as mentioned earlier, and that of the ON are due to
global process parameter variations.
As predicted by the series of Monte Carlo simulations, the standard deviation of the
output current offset of the individual ONs, given in Table 5.5, is 97 nA, which is
approximately that of the TTSs times the gain of the internal current amplifier.
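A quick consistency check of this statement, using the rounded TTS offset standard deviation of about 12 nA reported in Section 5.1.1 and the current-buffer gain of about 8 measured above (the symbols σ_ON, σ_TTS and A_I are introduced here only for this check):

```latex
\sigma_{ON} \approx A_I \cdot \sigma_{TTS} = 8 \times 12\ \mathrm{nA} = 96\ \mathrm{nA} \approx 97\ \mathrm{nA}
```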
To determine the driving capacity of the ON a transient experiment was conducted.
The step response of the ON exhibited in Fig. 5.9 was acquired with the structure
supplied by a square-wave signal of 0.4 V superimposed on a 1.5 V quiescent
common-mode component and loaded by its associated detecting scheme. This measurement
indicates, for a maximum positive weight signal, a propagation delay of 1.5 µs and a
transition time of 1.4 µs, both of which are mainly due to the limitations of the op amps
(i.e. slew rate) utilised within the sensing arrangement.
5.1.5 Weight storage scheme 
As foreshadowed, the holding time is a critical parameter for any weight storage
scheme. The weight persistence was determined by applying a 200 mV differential
potential at the test TTS input and loading its weight storage capacitors with a
maximum positive value. To maximise subthreshold conduction during the charge
leakage process, the input of the access switches was subsequently set to the lowest
possible weight control voltage. The differential output current detection was achieved
utilising the sensing scheme depicted in Fig. 5.1 in order to accurately correlate the
weight voltage decay to the static transfer characteristics formerly presented in Fig. 5.2.
Fig. 5.10 shows the output signal variation of the TTS in the absence of refreshing.
Two distinct periods may be identified. During the initial 60 seconds, the charges stored on
both capacitors leak at a similar rate. Nonetheless, the differential output current is
slightly corrupted for the following two reasons:
• Mismatch in the leakage paths degrades the stored differential potential; and
• The gain of the TTS is influenced by the change in the quiescent common-mode
weight signal. The higher the common-mode, the lower the gain.
In the subsequent phase, the capacitor precharged with the most positive weight
signal, C_hold, has reached complete discharge. However, charges still flow
between the complementary capacitive element and the substrate, thus reducing
the magnitude of the differential weight voltage and consequently that of the output
signal at a much quicker rate. The maximum holding time, at room temperature,
corresponds to the time taken for the output to drop by 1/127. Measurements indicate
that a refresh cycle time of 5 seconds would be acceptable. Results also suggest a
mismatch in the leakage paths of approximately 0.4 fA and a leakage current per path
of about 5 fA rather than 5 pA as anticipated. In the event of a design based on a single
capacitive storage device, this latter result implies that the weight would need to be
restored at least three times per second.
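The orders of magnitude quoted above can be checked with a simple constant-current droop estimate, t = ΔV · C / I. The sketch below is only a back-of-envelope illustration: the 1 pF holding capacitance and the 300 mV full-scale differential weight voltage are assumptions made here (they match values used elsewhere in the thesis for simulation but are not stated in this section), and the function name is hypothetical.

```python
# Back-of-envelope droop estimate for a capacitive weight memory.
# ASSUMPTIONS (not measured values from this section): a 1 pF holding
# capacitor and a 300 mV full-scale differential weight voltage.

C_HOLD_F = 1e-12          # assumed holding capacitance (F)
V_FULL_SCALE_V = 0.3      # assumed full-scale differential weight voltage (V)
DROOP_FRACTION = 1 / 127  # tolerated relative drop, as stated in the text


def holding_time(i_leak, c_hold=C_HOLD_F, v_fs=V_FULL_SCALE_V, fraction=DROOP_FRACTION):
    """Time (s) for a constant leakage current i_leak (A) to cause the tolerated droop."""
    allowed_droop_v = v_fs * fraction
    return allowed_droop_v * c_hold / i_leak


if __name__ == "__main__":
    t_twin = holding_time(0.4e-15)   # twin capacitors: only the 0.4 fA mismatch matters
    t_single = holding_time(5e-15)   # single capacitor: the full 5 fA leakage matters
    print(f"twin-capacitor holding time   ~ {t_twin:.1f} s")
    print(f"single-capacitor holding time ~ {t_single:.2f} s")
```

Under these assumptions the twin-capacitor scheme holds its weight for roughly 6 s, in line with the 5 s refresh cycle, whereas a single capacitor would droop by the same amount in about half a second and would therefore need several refreshes per second.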
Charge injection also has an important influence on the weight storage mechanism.
As noted earlier, clock feedthrough has the effect of modifying the charge stored on
the holding capacitors and occurs during the switch-off transient of the access
transistors, thereby inducing an error in the differential output of either a TTS or an
ON. The level of switch-induced error voltage was experimentally evaluated utilising
the test set-up presented in Fig. 5.1. A differential potential of 100 mV was applied at
the input of the individual TTS while a null weight was loaded onto its holding
capacitors. Fig. 5.11 shows the variations in the TTS output signal when the weight is
continuously refreshed at a rate of 80 times a second. This measurement clearly
illustrates that the quantity of stored charge is corrupted due to clock feedthrough and
capacitive coupling between the decoder signal lines and the holding capacitors. To
estimate the contribution of each effect, additional experiments have been conducted.
The results revealed that when the row and column signal lines crossing over C_hold are
switched independently, disturbances of 30 mV and -10 mV, respectively, are caused,
while charge injection induces a relatively larger change of 120 mV. When this is
related to the TTS static transfer characteristics, it implies that the error caused by clock
feedthrough occurring at the holding capacitors level is 30 mV, which appears to be
much larger than expected. A similar amplitude has been detected over the whole
weight range, thus suggesting that the source of this interference is related to additional
digital signal wires running above the top plate of C_hold rather than charge injection.
This hypothesis seems plausible as charge injection should be further compensated
since a twin-capacitor memory structure is utilised.
5.1.6 Refreshing mechanism
Linearity and monotonicity in the DAC element of the weight-update circuitry are
essential to the learning process. Monotonicity is assured if the difference between a bit
current and the sum of the lower-order bit currents is positive, and the differential
linearity depends on the magnitude of that difference. Table 5.6 shows the measured
values of the weighted current sources for each of the 10 prototype DACs. For a biasing
current Ij of 80 nA, these measurements confirm the monotonicity of the static transfer
characteristics and indicate a differential linearity of ± 1 LSB with a circuit yield of
100 %. It may also be noted that when the absolute value of the DAC input changes from
127 to 128, the DACs exhibit an accuracy of ± 1/2 LSB. This higher than expected level
of precision may be due to the fact that the bit current transistors were integrated around
a common centroid point.
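The monotonicity criterion stated above can be applied directly to a set of measured bit currents. The following sketch is illustrative only: the function names are hypothetical and the example currents are made-up numbers in LSB units, not the data of Table 5.6.

```python
# Monotonicity check for a binary-weighted current-source DAC.
# The bit currents used in the example are made-up illustrative numbers.


def monotonicity_margins(bit_currents):
    """bit_currents: list ordered from LSB to MSB.

    Returns, for each bit above the LSB, the bit current minus the sum of all
    lower-order bit currents. Every margin must be positive for monotonicity.
    """
    margins = []
    for k in range(1, len(bit_currents)):
        margins.append(bit_currents[k] - sum(bit_currents[:k]))
    return margins


def is_monotonic(bit_currents):
    """True if every bit current exceeds the sum of the lower-order ones."""
    return all(m > 0 for m in monotonicity_margins(bit_currents))


if __name__ == "__main__":
    ideal = [1, 2, 4, 8, 16, 32, 64]        # ideal binary weights (LSB units)
    skewed = [1, 2, 4, 8, 16, 32, 62.5]     # MSB source 1.5 LSB too small
    print(is_monotonic(ideal), monotonicity_margins(ideal))
    print(is_monotonic(skewed), monotonicity_margins(skewed))
```

For ideal binary weights every margin equals exactly 1 LSB, so the deviation of a measured margin from 1 LSB gives a direct feel for the differential non-linearity at the corresponding major code transition.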
The dynamic characteristic of the weight refreshing system is also an important 
feature. The maximum settling time is the interval required to fully charge the holding 
capacitors when the digital stored weight changes from one extreme to the other. This 
refreshing speed is a determining factor of the following two fundamental limits of our 
analogue ANN design: 
• The maximum number of TTSs that could be refreshed by a single 
asynchronously time-share refreshing scheme; and 
• The learning time. 
As depicted in Fig. 5.12, a maximum refreshing time of 3 µs was obtained for an op
amp biasing current I_op of 2.5 µA. This result indicates that for a holding time of 5
seconds a maximum of 1.5×10^6 synaptic weights could be refreshed/updated
using the suggested design. Since this is greater than the number of synapses that can
readily be integrated on a large silicon die (i.e. 10 mm × 10 mm), refresh rate
considerations appear not to be restrictive.
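The order of magnitude of this limit can be checked by dividing the holding time by the per-weight refreshing time, assuming the weights are refreshed strictly one after another with no other overhead (N_max, t_hold and t_refresh are symbols introduced here for illustration):

```latex
N_{max} \approx \frac{t_{hold}}{t_{refresh}} = \frac{5\ \mathrm{s}}{3\ \mu\mathrm{s}} \approx 1.7 \times 10^{6}
```

This is of the same order as the figure quoted above.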
5.2 Training algorithm for analogue feedforward 
neural networks 
As mentioned earlier, the input/output mapping function achieved by a feedforward
neural network is dependent upon the weight associated with each synaptic connection. To
determine a point in the weight space that may fit the desired function, the
neural network is stimulated with an input pattern drawn from a set of example
input-output pairs. The difference between the actual output and the desired target is then
used to modify the free parameters of the network, which are usually initially set to
small random values, in order to diminish the mapping error. This error measure for a
pattern p is defined as the sum of the squared errors
\[ E_p = \sum_{k=1}^{n} (T_{pk} - O_{pk})^2 \tag{5.1} \]
where n is the number of output units. T_pk and O_pk represent the target and actual
outputs, respectively, for pattern p at the output k. When this procedure is repeated for
each of the training samples, one epoch is complete. This method is commonly referred
to as stepwise supervised learning [18] and takes place until the averaged sum of the
error measure from each input/output vector in the training set has reached an acceptable
limit. The mean-squared mapping error or cost function is expressed as
\[ E_{AV} = \frac{1}{N} \sum_{p=1}^{N} \sum_{k=1}^{n} (T_{pk} - O_{pk})^2 \tag{5.2} \]
where N is the number of training patterns. At any given time during the procedure, the
magnitude and direction by which the connection strengths of the network are altered
depend on the value of the error measure and the nature of the update rule (also known
as the algorithm) that is used to minimise it. However, since the training process is
conducted on a pattern-by-pattern basis, the size of each weight change is usually kept small in
order to avoid the loss of previously stored information. As a consequence, to make the
feedforward neural network emulate the teacher many training epochs are usually
required, thus rendering the learning procedure intensively iterative and computationally demanding.
Generally, to make the path through the weight space more stochastic, thus allowing a
wider exploration of the error surface, the patterns are presented in a random order
from one epoch to another [12].
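A minimal software sketch of the two error measures and of the randomised presentation order is given below. It is an illustration only: the function names are hypothetical, the example network responses are made-up numbers, and the 1/N normalisation of the average error is assumed from its description as a mean-squared error.

```python
# Error measures (5.1) and (5.2) for a set of training patterns, plus a
# randomised presentation order for stepwise learning. Pure-Python sketch;
# all names are illustrative.
import random


def pattern_error(targets, outputs):
    """E_p of (5.1): sum of squared errors over the output units of one pattern."""
    return sum((t - o) ** 2 for t, o in zip(targets, outputs))


def average_error(all_targets, all_outputs):
    """E_AV of (5.2): pattern errors averaged over the N training patterns."""
    n_patterns = len(all_targets)
    total = sum(pattern_error(t, o) for t, o in zip(all_targets, all_outputs))
    return total / n_patterns


def epoch_order(n_patterns, rng):
    """Random presentation order of the patterns for one stepwise epoch."""
    order = list(range(n_patterns))
    rng.shuffle(order)
    return order


if __name__ == "__main__":
    targets = [[0.0], [1.0], [1.0], [0.0]]   # XOR targets
    outputs = [[0.1], [0.8], [0.9], [0.2]]   # made-up network responses
    print(average_error(targets, outputs))
    print(epoch_order(len(targets), random.Random(0)))
```

In stepwise learning the weights would be updated after each pattern visited in the shuffled order, with the average error evaluated once per epoch to decide when to stop.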
An alternative approach to stepwise learning is batch mode. In batch-mode learning
the complete set of training vectors is consecutively presented to the network in order
to estimate the average squared error. The weights are updated once, at the end of an
epoch rather than after each training pattern, utilising (5.2). This technique may
counteract the forgetting effect of stepwise learning, especially if a large number of
irregular patterns are to be taught. However, this is achieved at the expense of an
increased likelihood of the network becoming trapped in a local minimum of the error
surface, since the influence of a random update sequence is not available.
A neural network is said to be trained when the global minimum of the error surface has
been reached, i.e. the weight vector of the MLP approximates, within some degree of
accuracy, the required input/output mapping function. When this state of operation is
attained one can dispose of the teacher and independently use the MLP to perform the
task it was trained for. However, it is important to note that a neural network subject to a
training procedure may never converge toward a global solution. This outcome does not
necessarily indicate the non-existence of the set of weights needed to produce
the desired mapping characteristics. In the event of failure one may
consider altering either the initial starting conditions (i.e. the initial values of the free
parameters), the learning rule or the topology of the network, or eventually revising the
number and diversity of the training patterns [12].
The fundamental characteristic of a trained neural network is that it has the ability to 
generalise well [18]. That is to say that if all the training patterns were sampled 
throughout the mapping space, the network should be able to provide a satisfactory 
response to an apparently unknown input pattern by interpolation. 
Several supervised learning algorithms for feedforward neural networks have been 
developed for neurocomputer simulators [12], [18], [34], [91-92]. However, a limited 
number of those are adaptable to analogue hardware implementations [51], [93-94] for 
various reasons that are presented later. The EBP algorithm, also commonly referred to as
back-propagation, is by far the most commonly used algorithm to train MLPs in software
on conventional digital computers. It was first described by Paul Werbos in his Ph.D.
thesis in 1974 and was popularised more than a decade later, in 1986, by Rumelhart and
his colleagues [91]. The standard EBP learning rule [56], [60] and derivatives [55], [59]
have been successfully implemented in analogue VLSI hardware in order to speed up
the learning process and to render the neural network system completely autonomous.
However, it will be shown in the next section that the update technique is also
computationally demanding. Thus, for a given silicon area, the hardware cost of
integrating the EBP on-chip can severely degrade the density of primary neuronal
circuits.
In the following section our experience with back-propagation and weight-
perturbation techniques is discussed. 
5.2.1 Back-propagation
The back-propagation algorithm is based on a first-order approximation of the steepest 
descent method which uses the Jacobian gradient to determine a suitable direction of 
movement of the weights. According to the steepest descent method, the correction 
applied to a weight is proportional to the negative of the gradient of the error surface 
with respect to that particular weight [12], [18] and is given by 
\[ \Delta W_{ij}^{a} = -\frac{\eta}{2} \cdot \frac{\partial E_p}{\partial W_{ij}^{a}} \tag{5.3} \]
where η is referred to as the learning rate. The factor 1/2 was introduced in order to
simplify subsequent analytical derivations. For a feedforward neural network O_pk in
(5.1) may be expressed as follows
\[ O_{pk} = f\Big[\sum_{l} b_{k}^{m} + W_{kl}^{m} \cdot f\Big[\sum b^{m-1} + W^{m-1} \cdot f\Big[\ldots f\Big[\sum_{n} b^{1} + W_{n}^{1} \cdot x_{n}\Big]\ldots\Big]\Big]\Big] \tag{5.4} \]
where W_kl^m represents the synaptic weight that is attached to the k-th neuron of the output
layer and is fed by the l-th neuron located in the antecedent layer. W^1 is the weight
vector associated with the input layer and x_n is the n-th input node of the network.
Combining (5.1), (5.3) and (5.4) the update rule of the back-propagation algorithm is
given [12], [18] by
\[ \Delta W_{ij}^{a} = \eta \cdot Y_{j}^{a-1} \cdot \delta_{i}^{a} \tag{5.5} \]
where Y_j^(a-1) is the output signal of the j-th neuron situated in the (a-1)-th layer and δ_i^a is referred
to as a local error signal and is defined as follows
\[ \delta_{i}^{a} = \begin{cases} f'(S_{k}^{a}) \cdot [T_{pk} - O_{pk}] & \text{if } a \text{ represents the output layer } (i = k) \\ f'(S_{i}^{a}) \cdot \sum_{l} \delta_{l}^{a+1} \cdot W_{li}^{a+1} & \text{otherwise} \end{cases} \tag{5.6} \]
where S_i^a is the sum signal of the i-th neuron located in the a-th layer and f'(.) is the first
derivative of the thresholding function. To be able to compute this first derivative the
EBP learning rule requires that the activation function of the neurons be continuous.
This is the reason why the sigmoid function is by far the most commonly used 
activation function in the design of feedforward neural networks. It may be observed 
that the local error signal of a given hidden unit is a function of all of those associated 
to each neuron located in the subsequent layer. Hence, a complete learning iteration 
consists of feeding the input of the network with a training pattern and propagating the 
signal forward toward the output nodes. This first step is usually referred to as the 
forward path. The local error signal vector of the output layer is then estimated and 
propagated backward toward the input nodes in order to evaluate those of the hidden 
layers. The weights are then adjusted as follow 
W?(ncw) = W^(o,d) + T i . Y f ' . S r (5.7) 
Fig. 5.13 depicts the flow of information of the back-propagation algorithm for a 3:3:2
MLP. It may be noticed from (5.6) that as the output gets closer to the target, the value of
the local error signal $\delta_i^{a}$ reduces, leading to a slower progression towards the optimum. For an analogue
hardware simulator such as the one described above, implementing this process with a
limited weight resolution of 8 bits may present some problems. Several researchers
[93-94] have stated that successful MLP gradient descent learning requires weight 
precision of between 8 and 13 bits. The actual number of bits depends on the problem 
to be solved and on the choice of $\eta$. Speed improvements and better convergence can be
obtained by modifying the algorithm so as to estimate a near-optimum value of the 
learning parameter [12] and/or incorporating a momentum term [18]. However, it can be
appreciated that for each synaptic connection located in a hidden layer of the forward
path an extra three multipliers are required to update its weights. These additional
computational costs make the standard back-propagation algorithm a questionable 
candidate for on-chip training. 
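
The forward path, the backward propagation of the local error signals and the update (5.7) can be sketched in software as follows. This is only a minimal illustration, not the implementation used in this work: it assumes NumPy, tanh neurons without biases, a sum-of-squares error with a factor ½, and the hypothetical function names `forward` and `ebp_step`.

```python
import numpy as np

def forward(weights, x):
    """Forward path: return the outputs of every layer, starting with the input."""
    outputs = [np.asarray(x, dtype=float)]
    for W in weights:
        outputs.append(np.tanh(W @ outputs[-1]))   # assumed tanh thresholding
    return outputs

def ebp_step(weights, x, target, eta=0.05):
    """One back-propagation iteration; updates `weights` in place.

    Returns the (assumed) error 0.5 * sum((T - O)**2) measured before the update.
    """
    target = np.asarray(target, dtype=float)
    outputs = forward(weights, x)
    # Output layer: delta = f'(S) * (T - O); for tanh, f'(S) = 1 - Y**2.
    delta = (1.0 - outputs[-1] ** 2) * (target - outputs[-1])
    for a in reversed(range(len(weights))):
        W = weights[a]
        # Local error of the preceding layer, computed with the old weights.
        # (For a == 0 this value is simply discarded.)
        delta_prev = (1.0 - outputs[a] ** 2) * (W.T @ delta)
        # Weight update: eta * (output of preceding layer) * (local error).
        W += eta * np.outer(delta, outputs[a])
        delta = delta_prev
    return 0.5 * np.sum((target - outputs[-1]) ** 2)
```

For a 3:3:2 MLP, `weights` would be a list holding a 3×3 and a 2×3 array. Note how each hidden-layer update needs the weights, the neuron outputs and the derivative of the thresholding function, which is the information the text identifies as costly to obtain from analogue hardware.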
To successfully evaluate the weight correction signal $\Delta W_{ij}^{a}$, the learning rule requires
an accurate knowledge of the network connection strengths, the output state and internal
activity level of each neuron and the sigmoid function derivative. For an in-loop
training architecture (see section 5.3 for details) extra input/output pads would be
required to provide the necessary information from the forward path to the teacher. This
additional hardware cost also causes the EBP learning rule to be incompatible with
chip-in-loop training. Note that for the analogue simulator described earlier the output
states of the hidden neurons are not available since the thresholding and synaptic
functions have been combined.
5.2.3 Weight perturbation
The Weight Perturbation (WP) learning algorithm was first presented by Jabri and 
Flower [92], and is also based on an approximation of the gradient descent method. The 
weight correction signal is estimated using a finite difference of the error/weight 
gradient which is a first order approximation of (5.3) and is given by 
$$\frac{\partial E}{\partial W_{ij}^{a}} \approx \frac{E(W_{ij}^{a} + pert) - E(W_{ij}^{a})}{(W_{ij}^{a} + pert) - W_{ij}^{a}} = \frac{\Delta E(W_{ij}^{a})}{pert} \qquad (5.8)$$
where pert is the weight perturbation signal. Note that the factor ½ is not introduced
since the gradient is approximated rather than analytically evaluated. However it is 
required that, for the approximation to remain satisfactory, both the perturbation and 
error variation are kept small in order to optimise the probability of convergence and 
the level of generalisation. These conditions may cause the WP learning algorithm to
converge at a slower rate than the EBP learning rule.
A complete training procedure involves feeding an input pattern to the network and 
evaluating the error signal (5.1). The connection strengths are then perturbed in turn 
and the corresponding changes in the network error are measured. Finally the weights 
are updated using 
$$W_{ij}^{a}(\text{new}) = W_{ij}^{a}(\text{old}) - \eta \cdot \frac{\Delta E}{pert} \qquad (5.9)$$
Fig. 5.14 depicts the signal flow of the weight perturbation algorithm for a 3:3:2 
feedforward neural network. Although the WP method is similar to the back-
propagation technique, it is more suitable to analogue hardware implemented systems, 
because the weight update is simply a factor of the change in the network error. Note 
that knowledge about the non-linearity associated with the neuron and its first derivative is
also no longer required. These advantages are partially offset in that for each iteration
Z+1 forward paths are required, where Z is the number of weights. Obviously, this may
generate greater concern for a computer simulator than for an analogue VLSI system,
since in the former the forward-path computation time may be greater than in the latter.
For on-chip implementation of the WP algorithm Z memory elements are required to
temporarily store the error variation associated with each weight. This hardware cost is
clearly lower than that of the EBP learning algorithm. It is also worth noting that
for an in-loop training system, access to hidden neuron activity is not needed. 
Comparing the EBP and WP algorithms, one can trade off training speed for
complexity of implementation. In our case the weight perturbation scheme appears to
offer the best approach for a chip in the loop architecture. 
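
The training procedure described above (one unperturbed forward path, Z perturbed forward paths, Z stored error variations, then the update (5.9)) can be sketched as follows. This is a minimal software illustration under stated assumptions rather than the program used in this work: NumPy, tanh neurons without biases, a plain sum-of-squares error standing in for (5.1), and the hypothetical names `network_error` and `wp_step`.

```python
import numpy as np

def network_error(weights, x, target):
    """Sum-of-squares error of a tanh MLP (an assumed stand-in for (5.1))."""
    y = np.asarray(x, dtype=float)
    for W in weights:
        y = np.tanh(W @ y)
    return np.sum((np.asarray(target, dtype=float) - y) ** 2)

def wp_step(weights, x, target, eta=0.05, pert=0.08):
    """One weight-perturbation iteration: Z + 1 forward paths for Z weights.

    Updates `weights` in place and returns the error measured before the update.
    """
    base = network_error(weights, x, target)        # unperturbed forward path
    delta_e = [np.zeros_like(W) for W in weights]   # the Z memory elements
    for W, dE in zip(weights, delta_e):
        for idx in np.ndindex(W.shape):
            old = W[idx]
            W[idx] = old + pert                     # perturb one weight
            dE[idx] = network_error(weights, x, target) - base
            W[idx] = old                            # restore it
    for W, dE in zip(weights, delta_e):
        W -= eta * dE / pert                        # update in the form of (5.9)
    return base
```

Only the network error is observed here: neither the hidden neuron outputs nor the derivative of the thresholding function appear anywhere in the update, which is what makes the scheme attractive for a chip-in-loop arrangement where `network_error` would be replaced by a measurement on the hardware.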
5.2.3.1 Computer experiment 
The performance of the WP learning algorithm has been assessed via extensive 
computer simulations for the task of detecting QRS complexes in a foetal ECG signal.
The training data set included 306 input patterns of good and poor quality QRS
complexes and noise signals. Fig. 5.15 shows the ECG signals from which the training
patterns were obtained. A training pattern consisted of 10 samples which were
extracted at even intervals from a window of 30 data points. The network, whose
structure was depicted in Fig. 4.20, was trained using various perturbation and learning
rate values, ranging from 0.02 to 0.32 and 0.05 to 0.2, respectively. A stepwise mode of
training was adopted whereby the patterns were presented in an incremental fashion. In 
all cases, identical initial weight values and stopping criteria were imposed. The 
simulations showed that the quality of the generalisation was best when the learning 
rate and the perturbation values were set to 0.05 and 0.08, respectively. Fig. 5.16 shows 
the learning curve of the network. The training was terminated when the root mean 
square error of the network decreased below the set limit of 2%, and required 
approximately 135 epochs. During these experiments it was noted that increasing the 
learning rate had the effect of reducing the number of epochs required to reach the 
stopping criterion. However, it also had the effect of lowering the generalisation ability 
of the network. It may also be noted that these (computer simulated) learning 
experiments were achieved using high precision arithmetic and therefore "ideal" weight 
resolution and dynamic range. 
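
The construction of a training pattern described above (10 samples taken at even intervals from a window of 30 data points) amounts to keeping every third sample of the window. A minimal sketch, assuming NumPy and the hypothetical helper name `extract_pattern`:

```python
import numpy as np

def extract_pattern(signal, start, window=30, n_samples=10):
    """Pick n_samples evenly spaced points from signal[start:start + window]."""
    step = window // n_samples                      # 3 for the values quoted above
    segment = np.asarray(signal[start:start + window], dtype=float)
    if segment.size < window:
        raise ValueError("window runs past the end of the signal")
    return segment[::step][:n_samples]
```

Called on a digitised ECG record, each returned vector would form one 10-element input pattern for the network.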
5.3 Test set-up 
Although the neural network chip includes a weight refreshing mechanism and weight
storage capacitors, the architecture is not completely autonomous in the sense that it
does not incorporate an on-chip learning mechanism and memory elements to store the
digital value of the weights. For the former issue a few solutions are available. First, the
neural networks can be trained off-chip. This technique involves evaluating a set of
weights that will match the required mapping function utilising a digital neurocomputer
simulator and downloading those connection strengths onto the chip. Unless the
simulator takes into account the non-idealities of the analogue hardware, namely
offsets and gain errors, the analogue neural systems may not display the desired
mapping function. Furthermore, the possibility of characterising each processing unit of
a large analogue structure is not always available since it increases hardware
cost (i.e. extra connection pads). An alternative solution consists of inserting the neural
network in a loop whereby the learning procedure is monitored by an external host
computer. This technique is commonly referred to as chip-in-loop training. Since the
forward path is provided by the analogue ANN rather than a digital ideal system, the 
learning mechanism is therefore able to compensate for the distortions. The latter 
method emerges as the most suited to our present design. 
To facilitate communication between the host computer and the analogue neural 
networks an interface board has been purpose designed. Fig. 5.17 shows the block 
diagram of the complete test set-up. The test bench includes a high speed digital weight 
memory circuit which incorporates a 128 × 8 bit static RAM to store the digitised
value of the synaptic connections. The external storage scheme was designed so that the
weights could be refreshed either using data stored in the static RAM or by the host
computer. In the former case, i.e. once the networks have been trained, the weights were
sequentially updated at a rate of 80 times a second. This is achieved using a 7-bit serial
counter driven by a 10 kHz clock signal. To avoid synchronisation problems during a
weight refresh cycle, the RAM is enabled before the on-chip decoders and then disabled
afterwards. Twelve 8-bit high-speed DACs are utilised to convert the input pattern data,
which are supplied by the host computer, into differential currents. The inputs of the 
DACs are multiplexed and latched since the host computer provides a limited number 
of digital input/output data channels. This design burden has the unfortunate 
consequence of limiting the speed of inter-communication and therefore increasing the 
training time. The differential output currents of the networks are converted into single-
ended voltages utilising sensing schemes similar to that exploited to characterise the 
ON. These voltages are then transformed, also in a multiplexed manner, into digital
words of 12 bits using a single flash Analogue-to-Digital Converter (ADC). 
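
The refresh rate quoted above can be checked with a short calculation. The sketch below assumes that the 7-bit counter addresses one RAM location per clock period, which the text does not state explicitly; the constant names are illustrative only.

```python
CLOCK_HZ = 10_000        # counter clock quoted in the text
COUNTER_BITS = 7         # serial counter width quoted in the text

addresses = 2 ** COUNTER_BITS            # 128 RAM locations
cycle_time_s = addresses / CLOCK_HZ      # time to step through every weight
refresh_rate_hz = 1.0 / cycle_time_s     # refreshes of each weight per second
```

Under that assumption a full pass through the 128 locations takes 12.8 ms, i.e. roughly 78 refreshes of each weight per second, in line with the figure of about 80 given in the text.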
The whole test scheme can be operated either in training or generalisation mode. In
the former state, since the retention time of the capacitive storage schemes is 5 seconds, 
a complete WP training cycle can be executed without having to refresh the weights. 
Hence analogue noise generated either by the effect of charge injection or capacitive 
coupling is disregarded. The analogue system can therefore fully exploit the 8-bit weight
resolution during training. When a network is trained its final connection strengths are 
stored in a file for possible later use. In the generalisation mode, the weights are 
downloaded onto the RAM and test patterns are fed to the system by the host computer. 
5.4 Performance of the analogue neural networks 
5.4.1 Exclusive OR network 
The XOR network was trained using the weight perturbation algorithm described
above. To avoid saturation of the system during the early phase of the training, the 
synaptic weights were initially set to zero. 
Training was performed using a stepwise approach, where one of the four patterns 
was presented at the inputs and all weights were updated prior to a pattern change. It 
may also be added that the patterns were presented to the network in an incremental 
fashion rather than in a random order. The neural network trained optimally when 
parameters $\eta$ and pert were set to 0.05 and 0.2, respectively. Tests showed that if the
value of the learning rate was greater than that suggested above, the system tended to
settle to a local minimum rather than converging to a global solution. On the other
hand, reducing $\eta$ below 0.05 had the effect of increasing the number of epochs required
for the network to converge. In an ideal system, the accuracy of the weight perturbation
algorithm improves as the perturbation is reduced. In practice however, thermal noise
imposes a lower limit on the perturbation size and it is necessary to trade off
gradient accuracy for signal-to-noise ratio. Our tests established that a perturbation of 
0.2 offers a good compromise. 
A batch mode approach was then used to train the network whereby the weights 
were adjusted following a complete cycle through the training patterns. Results 
indicated that over its stepwise counterpart, batch mode training allowed larger values 
of the learning parameter before the system encountered any convergence difficulties. 
This approach resulted in a substantial improvement in the learning speed. According to 
191 
the experiments, as long as r| was smaller than 0.5 the WP training algorithm offered 
satisfactory generalisation levels. 
Fig. 5.18 shows the learning curve of the X O R network for both stepwise and batch 
mode training. The training procedures were terminated when changes in the Root 
Mean Square (RMS) error were so minor as to be undetectable by the learning system. 
Notwithstanding substantial noise constraints, the network was able to reach a 3% RMS 
error. Fig. 5.19 illustrates the performances of the XOR network when trained using
stepwise mode. As mentioned earlier, for training purposes, the differential output
current was converted into a single-ended voltage using a circuit arrangement similar to
the one suggested for the ON tests. With the XOR network driving such a load, as
shown in Fig. 5.20, the propagation delay was typically 7 µs of which 1.5 µs is due to
the limitations of the op amps utilised within the sensing arrangement. Thus the
propagation time of the forward path is approximately 5.5 µs. This experimental
propagation delay is slightly larger than that predicted by the simulation. This increase 
may principally be due to the fact that in the simulation study the wiring and coupling 
capacitances were not taken into account. 
When the weight point, which satisfied the mapping function of the XOR network
located in the first prototype chip, was downloaded onto the other 9 prototype 
structures, measurements indicated that in all cases, performances did not reach an 
acceptable level of operation. This was clearly due to the effect of random circuit 
offsets. However, results improved to a level comparable to the one shown in Fig. 5.19 
after readjustment of the weights and clearly indicate that errors due to offsets and non-
linearity can be accommodated by the learning process. 
5.4.2 QRS complex detector 
The 10:6:3 network was trained to detect QRS inputs using the same learning mode and 
set of input/output patterns as those employed for the digital equivalent network. The 
training program employed a weight perturbation procedure, similar to that used during 
computer simulation, with additional functions dedicated to communication between 
the host and the network inputs/outputs and the weight storage mechanism. It may also 
be mentioned that the weight perturbation procedure provided updated digital weights
which had been rounded to their nearest feasible value in order to virtually increase
the weight resolution by ½ bit. It was initially found that with all the synaptic weights
set to a null value, the outputs of the QRS complex detector network were fully 
saturated. The solution adopted involved training the network to eliminate the saturated 
nodes. The weight point was subsequently adopted as an initial training set and the QRS 
complex detector successfully trained using 0.05 and 0.2 as learning rate and 
perturbation, respectively. Fig. 5.21 shows the learning curve of the network. It may be 
noted that the RMS error could not be reduced below 7%; a value comparatively greater 
than that of its analogous software simulator. This result is unsurprising given the 
limited weight resolution and the analogue thermal noise. Although the forward 
propagation time for the analogue network is substantially lower than that of its digital 
counterpart, the overall training time was, however, longer. The increased training time
was mainly due to slow communication between the host and the test bench. 
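
The rounding of updated weights to their nearest feasible value, mentioned above, can be illustrated with the following sketch. The coding scheme is an assumption made purely for illustration (a signed, symmetric 8-bit code spanning a normalised range of ±`w_max`), as is the helper name `quantise_weight`; the actual mapping between digital codes and synaptic weights on the chip may differ.

```python
import numpy as np

def quantise_weight(w, bits=8, w_max=1.0):
    """Round w to the nearest level of an assumed signed code spanning [-w_max, w_max].

    Returns the integer code and the weight value that code represents.
    """
    levels = 2 ** (bits - 1) - 1                    # 127 codes each side of zero
    code = np.clip(np.rint(np.asarray(w) / w_max * levels), -levels, levels)
    return code.astype(int), code / levels * w_max
```

Rounding to the nearest level bounds the quantisation error of an in-range weight by half a code step, whereas truncation would allow an error of up to a full step; this is the sense in which rounding gains roughly half a bit of resolution.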
The effectiveness of both the digital and analogue simulators in detecting the QRS 
was assessed using test samples. The foetal ECG data was obtained from the Plymouth 
Perinatal Research Group. The ECG signals were digitised at 500 samples/s to a 
resolution of 8 bits. To minimise the influence of external noise sources, such as mains,
during signal acquisition, it has been necessary to pre-process the raw ECGs. The
pre-processing was achieved using an optimal linear phase Finite Impulse Response (FIR)
band-pass digital filter, whose frequency response is shown in Fig. 5.22. The filter
attenuates the mains frequency and successive harmonic components by approximately
40 dB. In the case of cardiac arrhythmia classification [79], it may be noted that
pre-processing might not be needed since the system would be implanted near the
information source, i.e. the heart. The ECG was then normalised to lie in the range ±1 and
serially fed to the inputs of the networks using time delays. Since the raw ECG was
sampled 500 times a second, one in three samples was supplied to the systems so that a
complete QRS complex fitted the window formed by the time delays. The 
generalisation level of the networks was determined using the following measure of 
performance 
$$\text{Performance} = \frac{\text{Total No. of QRS complexes} - \text{No. of misses} - \text{No. of false detections}}{\text{Total No. of QRS complexes}} \times 100\% \qquad (5.10)$$
This performance measure attains a value of 100% only if all the QRS complexes in an
ECG are correctly detected, i.e. no misses and no false detections. The numbers of
misses and false detections were determined by visually comparing the outputs of the
neural networks and the filtered ECG. A QRS was assumed to be present if the output
of the neural network exceeded a threshold level set to half of the maximum output 
neuron swing, i.e. zero. 
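
The performance measure defined above is straightforward to evaluate from the three counts. A minimal sketch, with `detection_performance` as a hypothetical helper name:

```python
def detection_performance(total_qrs, misses, false_detections):
    """Performance measure in percent: misses and false detections both count against it."""
    return 100.0 * (total_qrs - misses - false_detections) / total_qrs
```

For example, a record of 800 QRS complexes with 1 missed beat and 20 false detections gives `detection_performance(800, 1, 20)`, i.e. about 97.4%.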
The ability of the network to detect previously unseen sets of QRS complexes was 
tested on two foetal ECGs taken from different foetuses. The first ECG signal contained
800 heart beats and the quality of the data could be described as medium, whilst the 
second was composed of 465 QRS complexes of poor quality. Table 5.7 shows the 
performances of both digital and analogue neural networks. The results suggest that a 
digital simulator performs marginally better than an ANN designed using MOS devices 
operating in subthreshold mode on the problem of detecting QRS complexes from 
foetal ECG. The reasons are that the analogue system offers a limited weight resolution 
and thermal noise imposed a restriction on the minimum value of the perturbation 
parameter. Fig. 5.23 shows an example of QRS detection for a medium quality raw 
ECG signal. Note the constant delay between the raw and filtered ECG, which is 
characterised by the length of the FIR filter (i.e. 200). 
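
The size of this constant delay can be estimated from the filter length. The sketch below assumes that "length" means the number of taps and that the filter has the symmetric coefficients of a standard linear-phase design, in which case the group delay is (N − 1)/2 samples; the constant names are illustrative only.

```python
FILTER_LENGTH = 200      # taps, as quoted in the text
SAMPLE_RATE_HZ = 500     # ECG digitisation rate quoted in the text

delay_samples = (FILTER_LENGTH - 1) / 2          # group delay of a linear-phase FIR
delay_s = delay_samples / SAMPLE_RATE_HZ         # the same delay expressed in seconds
```

Under these assumptions the filtered ECG lags the raw ECG by 99.5 samples, or roughly 0.2 s.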
5.5 Summary 
The experimental characteristics of each building block used in the design of fully 
connected feedforward neural networks have been presented in this chapter. The 
measurements indicated that for all arrangements except the capacitive weight storage 
scheme, the performances were in close agreement with the theoretical and simulation 
predictions. The concept of the learning process has also been dealt with in this chapter. 
Although the back propagation learning algorithm is well suited to training MLPs
simulated on a digital neurocomputer, its application to analogue simulators
incurs considerable hardware cost. The WP learning rule, which trades
learning speed for lower hardware complexity, offers a more adequate solution to
training utilising a chip-in-loop architecture. Computer simulation studies revealed that,
based on the task of detecting QRS complexes in a foetal ECG signal, the WP reached
excellent levels of convergence. To implement the WP learning algorithm a specially
designed chip-in-loop system has been developed. Finally, both XOR and QRS detector 
analogue neural networks were successfully trained. These results suggest that in spite 
of offsets, gain errors and analogue noise, feedforward ANNs may be efficiently
simulated by exploiting the exponential relation of current and voltage offered by MOS
transistors biased in weak inversion.
Figure 5.1: Experimental set-up for measuring the transfer characteristics of the TTS.
Figure 5.2: Measured static transfer curves of the TTS. (a) $V_{TS}$ against $V_{SI}$,
(b) $V_{TS}$ against $V_W$.

Figure 5.3: Test configuration used for measuring the characteristics of
the load.
Figure 5.4: Experimental static characteristics of the I/V converter. Horizontal axis: $V_C$.

Figure 5.5: Experimental set-up for measuring the
characteristics of the horizontal resistance.
Figure 5.6: Experimental static characteristic of the horizontal resistor.
$V_H$ against $V_C$, Isi = Vc/2.10^
Figure 5.7: Experimental static characteristics of the transconductance-thresholding-synapse
when fed by the horizontal resistance. $V_{TS}$ against $V_C$.
Figure 5.8: Measured static transfer characteristics of the output
neuron. (a) $V_O$ against $V_{SI}$ and (b) $V_O$ against $V_W$.

Figure 5.9: Transient characteristics of the output neuron when loaded
by its associated sensing scheme. CH1 = $V_{SI}$.
Figure 5.10: Variation of the TTS differential output current during
the charge leakage process.

Figure 5.11: Effect of clock feedthrough and capacitive coupling onto
the output of the TTS during weight refreshing.

Figure 5.12: Transient characteristic of the refreshing mechanism.
Figure 5.13: Signal flow of the error back-propagation algorithm.

Figure 5.14: Signal flow of the weight perturbation algorithm.

Figure 5.15: ECG training patterns.

Figure 5.16: Learning curve of the computer simulated QRS
complex detector. Learning rate: $\eta$ = 0.05 and perturbation:
pert = 0.08.
Figure 5.17: Experimental set-up for training and testing the analogue ANNs.
(Blocks shown: training data, test board, digital weight memory, neural network, biasing circuitry, weight controller, gradient evaluation, comparison of network output and target output.)
Figure 5.18: Learning curve of the analogue XOR neural
network using the weight perturbation algorithm. Curves: stepwise mode; batch mode ($\eta$ = 0.5, pert = 0.2).

Figure 5.19: Performances of the XOR network when trained using
stepwise mode.

Figure 5.20: Transient characteristics of the XOR network.
Figure 5.21: Learning curve of the analogue QRS detector network
using the weight perturbation algorithm. Learning rate: $\eta$ = 0.05 and
perturbation: pert = 0.2.

Figure 5.22: Frequency response of the linear phase FIR filter (sampling frequency 500 Hz).

Figure 5.23: Example of QRS detection for medium quality ECG. Traces: raw ECG, filtered ECG, output from digital ANN, output from analogue ANN.
                  Offset Current (nA)
Prototype Chip    Transconductance          Transconductance
                  Thresholding Synapse 1    Thresholding Synapse 2
1                 Damage during test        7
2                 32                        31
3                 10                        2
4                 29                        11
5                 13                        22
6                 18                        5
7                 20                        13
8                 10                        12
9                 29                        48
10                2                         35

Table 5.1: Output current offset of the individual TTSs.
Biasing current (nA)        15    30    60    120
Nominal resistance (MΩ)     30    15    8     2.5

Table 5.2: Nominal resistance of the I/V converter for various
bias currents.
                  Offset voltage (mV)
Prototype chip    Current-to-Voltage    Current-to-Voltage
                  converter 1           converter 2
1                 0.4                   0.6
2                 11.5                  0.3
3                 2.9                   4.2
4                 1.7                   6
5                 7.4                   2.3
6                 1.5                   7
7                 12.4                  14.4
8                 7                     5
9                 !.(>                  0.1
10                0.4                   7.2

Table 5.3: Output voltage offset of the individual I/V converters.
Prototype Chip    Offset voltage (mV)
                  HRes 1
1                 13.2
2                 4
3                 15.8
4                 7.2
5                 11.4
6                 7.9
7                 13.7
8                 15.2
9                 14.1
10                6.4

Table 5.4: Output voltage offset of the individual
HRes.
Prototype Chip    Offset Current (nA)
1                 14
2                 347
3                 69
4                 165
5                 121
6                 32
7                 42
8                 63
9                 11
10                23

Table 5.5: Output current offset of the individual ONs.
Prototype    Bit Current (nA)
Chip         B0    B1    B2    B3     B4     B5     B6       B7
1            20    40    82    166    322    647    1,285    2,589
2            20    40    82    164    322    647    1,288    2,582
3            20    41    82    163    321    646    1,289    2,581
4            21    41    82    165    323    651    1,294    2,594
5            20    41    82    165    321    650    1,288    2,583
6            20    41    81    165    322    647    1,289    2,583
7            21    41    81    165    321    645    1,283    2,579
8            21    42    83    162    322    650    1,294    2,589
9            21    41    82    161    322    648    1,282    2,582
10           20    41    82    164    323    651    1,295    2,598

Table 5.6: DACs bit current sources.
Network Type          Number of QRS    Number of       Number of           Performances
                      complexes        missed beats    false detections    of NN (%)
Analogue simulator    800              1               20                  97.4
Digital simulator     800              0               24                  97
Analogue simulator    465              20              2                   95.2
Digital simulator     465              8               2                   97.8

Table 5.7: Performances of the analogue and digitally simulated QRS complex
detectors.
Chapter 6 
Conclusions 
6.1 Discussion of results 
A succinct description of the evolution of artificial neural systems and their 
categorisation has been presented. After a detailed explanation of the concept and 
terminology of AN models and ANN topologies, it has been established that artificial 
neuronal functions may be simulated either as software on a digital computer or in 
hardware using either digital, analogue or hybrid VLSI technologies. It emerged that 
although software-based simulators offer flexibility in terms of computing resolution 
and topology alteration they lack processing speed and fault tolerance which are 
characteristics required by most real time applications. To acquire these features a 
systolic design approach is required and appears to be best achieved when using either 
analogue or hybrid VLSI techniques since they allow for compact processing units. 
The adequacy of various previously reported analogue and hybrid circuits to perform 
as either synapses or neurons was explored. Studies revealed that an MOS transistor 
operated as a voltage controllable transconductor biased in the weak inversion is a 
resourceful processing unit with which to build primitive ANN functions since it allows 
for compact structures which dissipate power levels compatible with implantable 
neuronal systems. However, because of the exponential transfer characteristic, the 
computational schemes either display narrow linear range or are highly sensitive to 
biasing conditions which limit the computing resolution to a level below that required 
for the training of feedforward ANNs. A survey also revealed that existing memory 
schemes such as the capacitor, the floating-gate transistor and the binary weighted array 
of devices, suffer from substantial drawbacks. 
To fully enjoy the transfer characteristic of multipliers operating in the weak 
inversion, distribution of the thresholding operation of the hidden neurons of a fully 
connected feedforward neural network over to the inputs of the synapses they are 
feeding to form TS blocks, has been suggested. Although the modified Gilbert scheme 
can fulfil the role of a TS, the approach was compromised because of the lack of an
adequate current memory cell. To overcome this problem a TTS based on a novel four-
quadrant multiplier was introduced. The scheme is compact since it utilises a minimum 
number of devices. Experimental results correlated well with the analytical and 
simulation predictions and revealed that it can be supplied with voltages as low as 3 V, 
has a linear differential input as wide as ± 250 mV while the other input displays the 
required tanh function and that process parameter variations have a limited impact. 
It has also been shown that, although the required characteristic of the neuron is that 
of a linear load, moderate levels of non-linearity in the load resistor characteristic have a 
negligible effect on overall performance of MLPs with the exception that it considerably 
enhances the processing speed. To acquire this feature, a novel diode-based load
arrangement has also been developed. Analytical and simulation predictions as well as 
experimental measurements indicated that immunity to power supply variations of 
diode-based resistive elements can be substantially increased at the tolerable expense of 
power dissipation. 
It has also been established that, by combining the novel TTS and load schemes to form
the core of a feedforward ANN, the weight range of the synapses can readily be adjusted 
and power dissipation traded off against processing speed independently while the input 
and output nodes of the system are thresholded and un-thresholded respectively. An 
output neuron was developed as a means of squashing the output signals and horizontal 
resistances were used to linearise the inputs. Furthermore, since the input and output of 
the network were made to be compatible the proposed design could also be used to 
simulate recurrent ANNs. 
Simulation and analytical predictions suggested that to obtain the required minimum 
weight resolution of 8 bits, the effect of clock feedthrough on a basic PMOS-based 
sample-and-hold circuit as a storage element had to be compensated for by utilising a 
dummy transistor. However this approach has the side effect of decreasing the retention 
time of the memory since the number of leakage paths is increased. A twin capacitor 
structure was then proposed to alleviate this problem and correspondingly attenuate 
refreshing speed. However experimental results revealed that such a storage scheme is 
vulnerable to either charge injection or capacitive coupling phenomenon. 
To assess experimentally the viability of the proposed design, a neural network chip 
including the XOR network as a benchmark problem and a QRS complex detector based 
on a 10:6:3 MLP with an on-chip weight update/refresh scheme was designed and
fabricated. A special purpose interface board was also built in order to train the 
networks utilising a chip-in-loop architecture. It has been suggested that since the 
retention time of the capacitive storage schemes is of the order of 5 seconds, during 
training, refreshing of the connection strengths of the networks is best achieved
between epochs in order to avoid switch-induced interference, which reduces the
weight resolution below the required 8 bits. This approach facilitated successful training
of the networks utilising the weight perturbation learning rule. In spite of noise 
constraints, the analogue networks were able to reach generalisation levels similar to 
those obtained from a digital simulator. Experimental results have also indicated that 
distortion due to device mismatching does not interfere with the learning performance of
the networks and that the processing speed of the XOR marginally corresponds to that 
predicted by the simulations. 
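The weight perturbation rule mentioned above can be illustrated with a minimal software sketch. It is not the chip-in-loop procedure used in this work: the bipolar XOR coding, the tanh activation, the step sizes and all function names are assumptions chosen for illustration only. Each weight is perturbed in turn, the resulting change in the total error is measured, and the weight is updated in proportion to that change.

```python
import math
import random

# XOR with bipolar coding; the coding itself is an illustrative assumption.
XOR_PATTERNS = [((-1.0, -1.0), -1.0), ((-1.0, 1.0), 1.0),
                ((1.0, -1.0), 1.0), ((1.0, 1.0), -1.0)]


def forward(w, x):
    """Forward pass of a 2:2:1 network; w holds 9 weights, biases included."""
    h0 = math.tanh(w[0] * x[0] + w[1] * x[1] + w[2])
    h1 = math.tanh(w[3] * x[0] + w[4] * x[1] + w[5])
    return math.tanh(w[6] * h0 + w[7] * h1 + w[8])


def total_error(w, patterns):
    """Sum-of-squares error accumulated over one epoch."""
    return 0.5 * sum((forward(w, x) - t) ** 2 for x, t in patterns)


def train_weight_perturbation(patterns, epochs=500, delta=1e-3, lr=0.05, seed=1):
    """Serial weight perturbation: one weight is perturbed and updated at a time."""
    rng = random.Random(seed)
    w = [rng.uniform(-0.5, 0.5) for _ in range(9)]
    history = [total_error(w, patterns)]
    for _ in range(epochs):
        for i in range(len(w)):
            base = total_error(w, patterns)
            original = w[i]
            w[i] = original + delta                      # perturb one weight
            slope = (total_error(w, patterns) - base) / delta
            w[i] = original - lr * slope                 # update from the measured change
        history.append(total_error(w, patterns))
    return w, history
```

Only forward passes are required, which is the property that makes the rule attractive for analogue hardware. Calling `train_weight_perturbation(XOR_PATTERNS)` returns the final weights and the error recorded after each epoch.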
To conclude, the results of this study revealed that the transfer characteristic of an 
MOS transistor biased in the weak inversion can be fully exploited (i.e. gate and bulk 
terminals are driven) to design compact and low power neuronal functions which 
alleviate issues such as limited linearity range, restricted weight resolution, processing 
speed and mismatching effect. 
6.2 Recommendations for future work 
A review of the work reveals four possible refinements which could be investigated in 
further research. 
6.2.1 Weight storage 
As mentioned in chapter 5, the effects of either charge injection or capacitive coupling 
onto the weight signals are much larger than anticipated. Whilst the capacitive coupling 
can (in hindsight) be significantly decreased by re-routing analogue and digital signal 
lines, the evaluation and reduction of switch-charge injection may be the focus of future
activities. A solution to these issues may lie in the use of FGMOS transistors. Although,
as specified in chapter 2, this technology suffers from the fact that the threshold voltage
$V_T$ varies linearly with the logarithm of the number of pulses and that the write and
erase operations are asymmetrical, a diode-connected FGMOS transistor acting as a
variable resistor biased by a constant current source, as shown in Fig. 6.1, may be used
to replace the storage capacitors and thus substantially improve area and power
efficiency.
6.2.2 On-chip learning 
An on-chip learning mechanism may be added to overcome the two aforementioned 
problems of FGMOS devices and also provide for a self adaptable neuronal system that 
could be used as an implantable functional electrical stimulator for people suffering 
from paralysed limbs, such as those with spinal cord injuries, and consequently facilitate 
interaction between the network and the changing condition of the patients due to 
fatigue. 
6.2.3 Learning algorithm 
In the previous chapter, it has been mentioned that although the weight perturbation 
learning algorithm is suitable to analogue on-chip implementation, this is partly negated 
by the fact that learning is slower than the error back-propagation. It would be of 
interest to investigate optimisation methods based on simple search techniques such as 
the feedforward method introduced by Petridis and Paraschidis [103] since, firstly, their 
extensive simulation results suggested that their method usually converges faster than 
the back-propagation and, secondly, it facilitates analogue hardware implementation 
even more than the weight perturbation would. 
6.2.4 Sensitivity to temperature 
As stipulated in the introduction, the transconductance characteristic of a weakly 
inverted MOS device is extremely sensitive to temperature drifts. Although for 
implantable neuronal systems the human body offers a stable thermal environment, it 
would also be of interest to evaluate the sensitivity of the suggested system to 
temperature variations. 
6.3 Summary 
In summary, this thesis presents four novel circuits, namely a four-quadrant multiplier, a 
current-to-voltage converter, an output neuron and an on-chip weight update/refresh 
mechanism, designed to simulate low energy feedforward neural networks suitable for 
battery-powered implantable neuronal systems. These have been analytically studied, 
simulated using the CAD tool SPICE, fabricated in a low-cost 2.4 µm double poly,
double metal CMOS process and their operations experimentally confirmed. A 2:2:1
feedforward network and a 10:6:3 MLP were designed and successfully trained, to solve
the XOR benchmark problem and to detect QRS complexes respectively, using the 
weight perturbation learning algorithm. 
Figure 6.1: Transconductance-thresholding-synapse
incorporating an FGMOS-based memory device.
Bibliography 
[1] HUBEL, D. H.: 'The brain', Scientific American, Vol. 241, pp. 39-47, 1979.
[2] STEVENS, C. F.: 'The neuron', Scientific American, Vol. 241, pp. 49-59, 1979.
[3] KANDEL, E. R.: 'Small systems of neurons', Scientific American, Vol. 241, pp. 61-70, 1979.
[4] NAUTA, W. J. H. and FEIRTAG, M.: 'The organization of the brain', Scientific American, Vol. 241, pp. 78-105, 1979.
[5] COWAN, W. M.: 'The development of the brain', Scientific American, Vol. 241, pp. 107-117, 1979.
[6] IVERSEN, L. L.: 'The chemistry of the brain', Scientific American, Vol. 241, pp. 118-129, 1979.
[7] HUBEL, D. H. and WIESEL, T. N.: 'Brain mechanisms of vision', Scientific American, Vol. 241, pp. 130-144, 1979.
[8] EVARTS, E. V.: 'Brain mechanisms of movement', Scientific American, Vol. 241, pp. 146-154, 1979.
[9] GESCHWIND, N.: 'Specializations of the human brain', Scientific American, Vol. 241, pp. 158-168, 1979.
[10] KETY, S. S.: 'Disorders of the human brain', Scientific American, Vol. 241, pp. 172-179, 1979.
[11] CRICK, F. H. C.: 'Thinking about the brain', Scientific American, Vol. 241, pp. 181-188, 1979.
[12] HAYKIN, S.: 'Neural networks: a comprehensive foundation', New York: Macmillan College, 1994.
[13] SPOONER, J. D.: 'Ocular anatomy', London: The Hatton Press, 1957.
[14] SIMPSON, R.; WILLIAMS, R.; ELLIS, R. and CULVERHOUSE, P. F.: 'Biological pattern recognition by neural networks', Marine Ecology Progress Series, Vol. 79, pp. 303-308, 1992.
[15] SEJNOWSKI, T. J. and ROSENBERG, C. R.: 'Parallel networks that learn to pronounce English text', Complex Systems, Vol. 1, pp. 145-168, 1987.
[16] POMERLEAU, D. A.: 'Neural network perception for mobile robot guidance', London: Kluwer Academic, 1993.
[17] GROSSBERG, S.: 'Nonlinear neural networks: principles, mechanisms, and architectures', Neural Networks, Vol. 1, pp. 17-61, 1988.
[18] ZURADA, J. M.: 'Introduction to artificial neural systems', New York: West Publishing, 1992.
[19] ISMAIL, M. and FIEZ, T.: 'Analog VLSI signal and information processing', New York: McGraw-Hill, 1994.
[20] GRAF, H. P.; JACKEL, L. D.; HOWARD, R. E.; STRAUGHN, B.; DENKER, J. S.; HUBBARD, W.; TENNANT, D. M. and SCHWARTZ, D.: 'VLSI implementation of a neural network memory with several hundreds of neurons', Proc. Conf. on Neural Networks for Computing, pp. 182-187, 1986.
[21] HOPFIELD, J. J. and TANK, D. W.: 'Computing with neural circuits: a model', Science, Vol. 233, pp. 625-633, 1986.
[22] MEAD, C.: 'Analog VLSI and neural systems', New York: Addison-Wesley, 1989.
[23] MAHER, M. A. C.; DEWEERTH, S. P.; MAHOWALD, M. A. and MEAD, C. A.: 'Implementing neural architectures using analog VLSI circuits', IEEE Transactions on Circuits and Systems, Vol. 36, pp. 643-652, 1989.
[24] KOCH, C.: 'Seeing chips: analog VLSI circuits for computer vision', Neural Computing, Vol. 1, pp. 184-200, 1989.
[25] LIU, W.; ANDREOU, A. G. and GOLDSTEIN, M. H.: 'Voiced-speech representation by an analog silicon model of the auditory periphery', IEEE Transactions on Neural Networks, Vol. 3, pp. 477-487, 1992.
[26] LAZZARO, J.: 'A silicon model of an auditory neural representation of spectral shape', IEEE Journal of Solid-State Circuits, Vol. 26, pp. 772-777, 1991.
[27] LAN, N.; FENG, M. and CRAGO, P. E.: 'Neural network generation of muscle stimulation patterns for control of arm movements', IEEE Transactions on Rehabilitation Engineering, Vol. 2, pp. 213-224, 1994.
[28] GRAUPE, D. and KORDYLEWSKI, H.: 'Artificial neural network control of FES in paraplegics for patient responsive ambulation', IEEE Transactions on Biomedical Engineering, Vol. 42, pp. 699-707, 1995.
[29] KAWATO, M.; UNO, Y.; ISOBE, M. and SUZUKI, R.: 'Hierarchical neural network model for voluntary movement with application to robotics', IEEE Control Systems Magazine, pp. 8-16, 1988.
[30] ROSENBLATT, F.: 'The perceptron: a probabilistic model for information storage and organization in the brain', Psychological Review, Vol. 65, pp. 386-408, 1958.
[31] MINSKY, M. and PAPERT, S.: 'Perceptrons', Cambridge, MA: MIT Press, 1969.
[32] HORNIK, K.; STINCHCOMBE, M. and WHITE, H.: 'Multilayer feedforward networks are universal approximators', Neural Networks, Vol. 2, pp. 359-366, 1989.
[33] HOPFIELD, J. J.: 'Neural networks and physical systems with emergent collective computational abilities', Proceedings of the National Academy of Sciences, Vol. 79, pp. 2554-2558, 1982.
[34] WIDROW, B. and LEHR, M. A.: '30 years of adaptive neural networks: perceptron, madaline, and backpropagation', Proceedings of the IEEE, Vol. 78, pp. 1415-1441, 1990.
[35] McCULLOCH, W. S. and PITTS, W.: 'A logical calculus of the ideas immanent in nervous activity', Bulletin of Mathematical Biophysics, Vol. 5, pp. 115-133, 1943.
[36] HEBB, D. O.: 'The organization of behavior', New York: Wiley, 1949.
[37] ROCHESTER, N.; HOLLAND, J. H.; HAIBT, L. H. and DUDA, W. L.: 'Tests on a cell assembly theory of the action of the brain, using a large digital computer', IRE Transactions on Information Theory, IT-2, pp. 80-93, 1956.
[38] ANDERSON, J. A.: 'A simple neural network generating an interactive memory', Mathematical Biosciences, Vol. 14, pp. 197-220, 1972.
[39] KOHONEN, T.: 'Correlation matrix memories', IEEE Transactions on Computers, Vol. C-21, pp. 353-359, 1972.
[40] ATLAS, L. E. and SUZUKI, Y.: 'Digital systems for artificial neural networks', IEEE Circuits and Devices Magazine, Vol. 5, pp. 20-24, 1989.
[41] FORREST, B. M.; ROWETH, D.; STROUD, N.; WALLACE, D. J. and WILSON, G. V.: 'Implementing neural network models on parallel computers', The Computer Journal, Vol. 30, pp. 413-419, 1987.
[42] TRELEAVEN, P.; PACHECO, M. and VELLASCO, M.: 'VLSI architectures for neural networks', IEEE Micro, Vol. 9, pp. 8-27, 1989.
[43] MURRAY, A. F.: 'Silicon implementations of neural networks', IEE Proceedings-F, Vol. 138, pp. 3-12, 1991.
[44] GARTH, S. C. J.: 'A chipset for high speed simulation of neural network systems', IEEE Conference on Neural Networks, Vol. 3, pp. 443-452, 1987.
[45] HAMMERSTROM, D.: 'A VLSI architecture for high-performance, low-cost, on-chip learning', International Joint Conference on Neural Networks, Vol. 2, pp. 537-544, 1990.
[46] IENNE, P. and VIREDAZ, M. A.: 'GENES IV: a bit-serial processing element for a multi-model neural-network accelerator', in Neural Networks Theory, Technology, and Applications; SIMPSON, P. K.; New York: IEEE, pp. 797-808, 1996.
[47] TSIVIDIS, Y. and SATYANARAYANA, S.: 'Analogue circuits for variable-synapse electronic neural networks', Electronics Letters, Vol. 23, pp. 1313-1314, 1987.
[48] ZURADA, J. M.: 'Analog implementation of neural networks', IEEE Circuits and Devices, Vol. 8, pp. 36-41, 1992.
[49] FOO, S. Y.; ANDERSON, L. R. and TAKEFUJI, Y.: 'Analog components for the VLSI of neural networks', IEEE Circuits and Devices, Vol. 6, pp. 18-26, 1990.
[50] HOLLIS, P. W. and PAULOS, J. J.: 'Artificial neural networks using MOS analog multipliers', IEEE Journal of Solid-State Circuits, Vol. 25, pp. 849-855, 1990.
[51] HOLLIS, P. W. and PAULOS, J. J.: 'A neural network learning algorithm tailored for VLSI implementation', IEEE Transactions on Neural Networks, Vol. 5, pp. 784-791, 1994.
[52] KUB, F. J.; MOON, K. K.; MACK, I. A. and LONG, F. M.: 'Programmable analog vector-matrix multipliers', IEEE Journal of Solid-State Circuits, Vol. 25, pp. 207-214, 1990.
[53] SATYANARAYANA, S.; TSIVIDIS, Y. P. and GRAF, H. P.: 'A reconfigurable VLSI neural network', IEEE Journal of Solid-State Circuits, Vol. 27, pp. 67-81, 1992.
[54] CHOI, J.; BANG, S. H. and SHEU, B. J.: 'A programmable analog VLSI neural network processor for communication receivers', IEEE Transactions on Neural Networks, Vol. 4, pp. 484-495, 1993.
[55] MORIE, T. and AMEMIYA, Y.: 'An all-analog expandable neural network LSI with on-chip backpropagation learning', IEEE Journal of Solid-State Circuits, Vol. 29, pp. 1086-1093, 1994.
[56] DOLENKO, B. K. and CARD, H. C.: 'Tolerance to analog hardware of on-chip learning in backpropagation networks', IEEE Transactions on Neural Networks, Vol. 6, pp. 1045-1052, 1995.
[57] BIBYK, S. and ISMAIL, M.: 'Issues in analog VLSI and MOS techniques for neural computing', in Analog VLSI implementation of neural systems; MEAD, C. and ISMAIL, M.; Boston: Kluwer Academic, pp. 103-133, 1989.
[58] LANSNER, J. A. and LEHMANN, T.: 'An analog CMOS chip set for neural networks with arbitrary topologies', IEEE Transactions on Neural Networks, Vol. 4, pp. 441-444, 1993.
[59] SALAM, F. M. A. and CHOI, M.: 'An all-MOS analog feedforward neural circuit with learning', Proceedings of the IEEE International Symposium on Circuits and Systems, pp. 2508-2511, 1990.
[60] LEHMANN, T. and BRUUN, E.: 'Analogue VLSI implementation of back-propagation learning in artificial neural networks', Proceedings of the 11th European Conference on Circuit Theory and Design, Vol. 1, pp. 491-496, 1993.
[61] VITTOZ, E. A.: 'Analog VLSI signal processing: why, where, and how?', Journal of VLSI Signal Processing, Vol. 8, pp. 27-44, 1994.
[62] LINARES-BARRANCO, B.; SANCHEZ-SINENCIO, E.; RODRIGUEZ-VAZQUEZ, A. and HUERTAS, J. L.: 'A modular T-mode design approach for analog neural network hardware implementations', IEEE Journal of Solid-State Circuits, Vol. 27, pp. 701-713, 1992.
[63] PICKARD, S. J.; JABRI, M. A.; LEONG, P. H. W.; FLOWER, B. and HENDERSON, P.: 'Low power analogue VLSI implementation of a feed-forward neural network', Proceedings of the Third Australian Conference on Neural Networks, pp. 88-91, 1992.
[64] COUE, D. and WILSON, G.: 'A four-quadrant subthreshold mode multiplier for analog neural-network applications', IEEE Transactions on Neural Networks, Vol. 7, pp. 1212-1219, 1996.
[65] ANDREOU, A. G.; BOAHEN, K. A.; POULIQUEN, P. O.; PAVASOVIC, A.; JENKINS, R. E. and STROHBEHN, K.: 'Current-mode subthreshold MOS circuits for analog VLSI neural systems', IEEE Transactions on Neural Networks, Vol. 2, pp. 205-213, 1991.
[66] BOAHEN, K. A.; POULIQUEN, P. O.; ANDREOU, A. G. and JENKINS, R. E.: 'A heteroassociative memory using current-mode MOS analog VLSI circuits', IEEE Transactions on Circuits and Systems, Vol. 36, pp. 747-755, 1989.
[67] COHEN, M. H. and ANDREOU, A. G.: 'Current-mode subthreshold MOS implementation of the Herault-Jutten autoadaptive network', IEEE Journal of Solid-State Circuits, Vol. 27, pp. 714-727, 1992.
[68] LINARES-BARRANCO, B.; SANCHEZ-SINENCIO, E.; RODRIGUEZ-VAZQUEZ, A. and HUERTAS, J. L.: 'A CMOS analog adaptive BAM with on-chip learning and weight refreshing', IEEE Transactions on Neural Networks, Vol. 4, pp. 445-455, 1993.
[69] COHEN, M. H. and ANDREOU, A. G.: 'Analog CMOS integration and experimentation with an autoadaptive independent component analyzer', IEEE Transactions on Circuits and Systems-II, Vol. 42, pp. 65-77, 1995.
[70] GILBERT, B.: 'A precise four-quadrant multiplier with subnanosecond response', IEEE Journal of Solid-State Circuits, Vol. SC-3, pp. 365-373, 1968.
[71] HORIO, Y. and NAKAMURA, S.: 'Analog memories for VLSI neurocomputing', in Artificial neural networks: paradigms, applications, and hardware implementations; SANCHEZ-SINENCIO, E. and LAU, C.; IEEE Press, pp. 344-363, 1992.
[72] VITTOZ, E. and FELLRATH, J.: 'CMOS analog integrated circuits based on weak inversion operation', IEEE Journal of Solid-State Circuits, Vol. SC-12, pp. 224-231, 1977.
[73] PAVASOVIC, A.: 'Subthreshold region MOSFET mismatch analysis and modeling for analog VLSI systems', Ph.D. dissertation, Johns Hopkins University, Baltimore, MD, 1991.
[74] TSIVIDIS, Y. P.: 'Operation and modeling of the MOS transistor', New York: McGraw-Hill, 1988.
[75] BROWNLOW, M.; TARASSENKO, L. and MURRAY, A.: 'Results from pulse-stream VLSI neural network devices', in VLSI for artificial intelligence and neural networks; DELGADO-FRIAS, J. G. and MOORE, W. R.; London: Plenum, pp. 215-224, 1991.
[76] MURRAY, A. F.; CORSO, D. D. and TARASSENKO, L.: 'Pulse-stream VLSI neural networks mixing analog and digital techniques', IEEE Transactions on Neural Networks, Vol. 2, pp. 193-204, 1991.
[77] MURRAY, A. F.; CHURCHER, S.; HAMILTON, A.; HOLMES, A. J.; JACKSON, G. B.; REEKIE, H. M. and WOODBURN, R. J.: 'Pulse stream VLSI neural networks', IEEE Micro, Vol. 14, pp. 29-39, 1994.
[78] ZAGHLOUL, M. E.; MEADOR, J. L. and NEWCOMB, R. W.: 'Silicon implementation of pulse coded neural networks', Boston: Kluwer Academic, 1994.
[79] COGGINS, R.; JABRI, M.; FLOWER, B. and PICKARD, S.: 'A low-power network for on-line diagnosis of heart patients', IEEE Micro, Vol. 15, pp. 18-25, 1995.
[80] IFEACHOR, E. C.; PATEL, S. R.; WESTGATE, J.; CURNOW, J. S. and GREENE, K. R.: 'Applications of artificial neural networks to fetal monitoring during labour', in Techniques and Applications of Neural Networks; TAYLOR, M. and LISBOA, P.; New York: Ellis Horwood, pp. 93-107, 1993.
[81] LAKSHMIKUMAR, K. R.; HADAWAY, R. A. and COPELAND, M. A.: 'Characterization and modeling of mismatch in MOS transistors for precision analog design', IEEE Journal of Solid-State Circuits, Vol. SC-21, pp. 1057-1066, 1986.
[82] PELGROM, M. J. M.; DUINMAIJER, A. C. J. and WELBERS, A. P. G.: 'Matching properties of MOS transistors', IEEE Journal of Solid-State Circuits, Vol. 24, pp. 1433-1440, 1989.
[83] FORTI, F. and WRIGHT, M. E.: 'Measurement of MOS current mismatch in the weak inversion region', IEEE Journal of Solid-State Circuits, Vol. 29, pp. 138-142, 1994.
[84] MEADE, R. L.: 'Foundations of electronics', New York: Delmar, 1991.
[85] ANTOGNETTI, P. and MASSOBRIO, G.: 'Semiconductor device modeling with SPICE', New York: McGraw-Hill, 1988.
[86] TSIVIDIS, Y. P. and ANASTASSIOU, D.: 'Switched-capacitor neural networks', Electronics Letters, Vol. 23, pp. 958-959, 1987.
[87] RODRIGUEZ-VAZQUEZ, A.; RUEDA, A.; HUERTAS, J. L. and DOMINGUEZ-CASTRO, R.: 'Switched-capacitor neural networks for linear programming', Electronics Letters, Vol. 24, pp. 496-498, 1988.
[88] HANSEN, J. E.; SKELTON, J. K. and ALLSTOT, D. J.: 'A time-multiplexed switched-capacitor circuit for neural network applications', Proceedings of the IEEE International Symposium on Circuits and Systems, pp. 2177-2180, 1989.
[89] SHEU, B. J. and HU, C.: 'Switch-induced error voltage on a switched capacitor', IEEE Journal of Solid-State Circuits, Vol. SC-19, pp. 519-525, 1984.
[90] EICHENBERGER, C. and GUGGENBÜHL, W.: 'On charge injection in analog MOS switches and dummy switch compensation techniques', IEEE Transactions on Circuits and Systems, Vol. 37, pp. 256-264, 1990.
[91] WERBOS, P. J.: 'Backpropagation through time: what it does and how it does it', Proceedings of the IEEE, Vol. 78, pp. 1550-1560, 1990.
[92] JABRI, M. and FLOWER, B.: 'Weight perturbation: an optimal architecture and learning technique for analog VLSI feedforward and recurrent multilayer networks', Neural Computation, Vol. 3, pp. 546-565, 1991.
[93] CAIRNS, G. and TARASSENKO, L.: 'Precision issues for learning with analog VLSI multilayer perceptrons', IEEE Micro, Vol. 15, pp. 54-56, 1995.
[94] EBERHARDT, S. P.; TAWEL, R.; BROWN, T. X.; DAUD, T. and THAKOOR, A. P.: 'Analog VLSI neural networks: implementation issues and examples in optimization and supervised learning', IEEE Transactions on Industrial Electronics, Vol. 39, pp. 552-564, 1992.
[95] BORGSTROM, T. H.; ISMAIL, M. and BIBYK, S. B.: 'Programmable current-mode neural network for implementation in analogue MOS VLSI', IEE Proceedings, Vol. 137, pp. 175-184, 1990.
[96] LEE, B. W.; SHEU, B. J. and YANG, H.: 'Analog floating-gate synapses for general-purpose VLSI neural computation', IEEE Transactions on Circuits and Systems, Vol. 38, pp. 654-658, 1991.
[97] GUGGENBÜHL, W.; DI, J. and GOETTE, J.: 'Switched-current memory circuits for high-precision applications', IEEE Journal of Solid-State Circuits, Vol. 29, pp. 1108-1116, 1994.
[98] PAIN, B. and FOSSUM, E. R.: 'A current memory cell with switch feedthrough reduction by error feedback', IEEE Journal of Solid-State Circuits, Vol. 29, pp. 1288-1290, 1994.
[99] COUE, D. and WILSON, G.: 'CMOS subthreshold-mode I/V converter for analogue neural network applications', Electronics Letters, Vol. 32, pp. 990-991, 1996.
[100] SIVILOTTI, M. A.; MAHOWALD, M. A. and MEAD, C. A.: 'Real-time visual computations using analogue CMOS processing arrays', Advanced Research in VLSI: Proceedings of the Stanford Conference, Cambridge, MA: MIT Press, pp. 295-312, 1987.
[101] SÄCKINGER, E. and GUGGENBÜHL, W.: 'A high-swing, high-impedance MOS cascode circuit', IEEE Journal of Solid-State Circuits, Vol. 25, pp. 289-298, 1990.
[102] HOGERVORST, R.; TERO, J. P.; ESCHAUZIER, R. G. H. and HUIJSING, J. H.: 'A compact power-efficient 3V CMOS rail-to-rail input/output operational amplifier for VLSI cell libraries', IEEE Journal of Solid-State Circuits, Vol. 29, pp. 1505-1513, 1994.
[103] PETRIDIS, V. and PARASCHIDIS, K.: 'On the properties of the feedforward method: a simple training law for on-chip learning', IEEE Transactions on Neural Networks, Vol. 6, pp. 1536-1541, 1995.
Appendix A 
The MOS transistor 
A brief analytical description of the operation of Complementary Metal Oxide 
Semiconductors (CMOS) is presented in this Appendix. The reader may consult 
references [22], [61], [72-74] for more detailed discussions. The basic structure of a four 
terminal n-channel MOS (NMOS) transistor is shown in Fig. A.1 (a) and its symbolic
representation in Fig. A.1 (b). The terminals are known as the Gate (G), the Drain (D),
the Source (S) and the Bulk (B) which is also referred to as the substrate or back-gate. 
The current that flows between the drain and source nodes of the device depends upon 
the density of channel charge carriers which is controlled by the level of gate and source 
potentials. This current is commonly referred to as the drain current $I_D$.
The NMOS transistor models presented in the subsequent sections are also applicable
to the p-channel MOS (PMOS) device by multiplying all currents and voltages by -1.
A.1 Strong inversion
For a gate-to-source voltage ($V_{GS}$) greater than the threshold voltage ($V_T$), the concentration
of electron carriers in the channel is high. In this mode of conduction, the channel, also
known as the inversion layer, is said to be strongly inverted and the drain current is
mainly due to drift [74] and may be expressed as
\[ I_D = K_N \left\{ (V_{GS} - V_{FB} - \phi_B) V_{DS} - \tfrac{1}{2} V_{DS}^2 - \tfrac{2}{3} \gamma \left[ (V_{DS} - V_{BS} + \phi_B)^{3/2} - (\phi_B - V_{BS})^{3/2} \right] \right\} \qquad \text{(A.1)} \]
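where the parameters are defined in Table A.1. For analytical development this
expression may appear too complicated and therefore needs to be simplified. This may
be achieved utilising a binomial expansion technique. The Taylor series of $(1 + x)^n$ is
given by
\[ (1 + x)^n = 1 + n x + \frac{n(n-1)}{2!} x^2 + \frac{n(n-1)(n-2)}{3!} x^3 + \dots \quad \text{for } x^2 < 1 \qquad \text{(A.2)} \]
If one considers the first 3/2 term in (A.1),
\[ (\phi_B - V_{BS} + V_{DS})^{3/2} = (\xi + V_{DS})^{3/2} = \xi^{3/2} \left( 1 + \frac{V_{DS}}{\xi} \right)^{3/2} \qquad \text{(A.3)} \]
where
\[ \xi = \phi_B - V_{BS} \qquad \text{(A.4)} \]
Utilising (A.2) to expand (A.3) gives
\[ (\phi_B - V_{BS} + V_{DS})^{3/2} = (\phi_B - V_{BS})^{3/2} + \frac{3}{2} V_{DS} \sqrt{\phi_B - V_{BS}} + \frac{3}{8} \frac{V_{DS}^2}{\sqrt{\phi_B - V_{BS}}} - \frac{1}{16} \frac{V_{DS}^3}{(\phi_B - V_{BS})^{3/2}} + \frac{3}{128} \frac{V_{DS}^4}{(\phi_B - V_{BS})^{5/2}} - \dots \qquad \text{(A.5)} \]
for $V_{DS} < (\phi_B - V_{BS})$. Thus substituting (A.5) into (A.1), the drain current may be
approximated to
\[ I_D = K_N \left\{ \left( V_{GS} - V_{FB} - \phi_B - \gamma \sqrt{\phi_B - V_{BS}} \right) V_{DS} - \frac{1}{2} \left( 1 + \frac{\gamma}{2 \sqrt{\phi_B - V_{BS}}} \right) V_{DS}^2 + \frac{\gamma V_{DS}^3}{24 (\phi_B - V_{BS})^{3/2}} - \frac{\gamma V_{DS}^4}{64 (\phi_B - V_{BS})^{5/2}} + \dots \right\} \qquad \text{(A.6)} \]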
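The binomial expansion used above can be checked numerically. The following sketch is illustrative only: the function names and the sample values of $\phi_B - V_{BS}$ and $V_{DS}$ are assumptions, not figures taken from this work. It evaluates the generalised binomial coefficients for $n = 3/2$ and compares the truncated series with the exact value of $(\phi_B - V_{BS} + V_{DS})^{3/2}$.

```python
def binomial_coefficient(n, k):
    """Generalised binomial coefficient n(n-1)...(n-k+1)/k! for real n."""
    result = 1.0
    for j in range(k):
        result *= (n - j) / (j + 1)
    return result


def three_halves_series(xi, v_ds, order):
    """Truncated expansion of (xi + v_ds)**1.5 in powers of v_ds/xi.

    xi stands for (phi_B - V_BS); the series only converges for v_ds < xi.
    """
    x = v_ds / xi
    return xi ** 1.5 * sum(binomial_coefficient(1.5, k) * x ** k
                           for k in range(order + 1))


if __name__ == "__main__":
    xi, v_ds = 0.7, 0.2          # hypothetical values in volts
    exact = (xi + v_ds) ** 1.5
    for order in range(1, 5):
        approx = three_halves_series(xi, v_ds, order)
        print(order, approx, abs(approx - exact))
```

The coefficients returned for $k = 2, 3, 4$ are 3/8, -1/16 and 3/128, and multiplying the last two by $2\gamma/3$ gives the factors 1/24 and 1/64 that multiply the cubic and quartic terms.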
It can be seen that the coefficients associated with the cubic and higher-order terms
in $V_{DS}$ are comparatively much smaller than that of the square term. Hence the drain
current may be further approximated to
\[ I_D \approx K_N \left[ (V_{GS} - V_T) V_{DS} - \frac{\delta}{2} V_{DS}^2 \right] \qquad \text{(A.7)} \]
where
\[ V_T = V_{TO} + \gamma \left( \sqrt{\phi_B - V_{BS}} - \sqrt{\phi_B} \right) \qquad \text{(A.8)} \]
is the threshold voltage,
\[ \delta = 1 + \frac{\gamma}{2 \sqrt{\phi_B - V_{BS}}} \qquad \text{(A.9)} \]
and $V_{TO}$ is the zero bias threshold voltage given by
\[ V_{TO} = V_{FB} + \phi_B + \gamma \sqrt{\phi_B} \qquad \text{(A.10)} \]
Note that when the substrate is lightly doped and the gate oxide is thin, $\gamma$ is small
and the drain current (A.7) simplifies to the well known form
\[ I_D \approx K_N \left[ (V_{GS} - V_T) V_{DS} - \frac{1}{2} V_{DS}^2 \right] \qquad \text{(A.11)} \]
Operated in strong inversion (also called above threshold), the NMOS transistor therefore
displays an ohmic characteristic as long as the drain-to-source voltage does not reach the
saturation level, which occurs when $\partial I_D / \partial V_{DS} = 0$ and is given by
\[ V_{DS} = V_{GS} - V_T \qquad \text{(A.12)} \]
If so, substituting (A.12) into (A.11), an NMOS device biased in its saturation region of
strong inversion is modelled by
\[ I_D = \frac{K_N}{2} (V_{GS} - V_T)^2 \qquad \text{(A.13)} \]
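As an illustration, the simplified strong inversion model can be written as a piecewise function: the non-saturated form $K_N[(V_{GS} - V_T)V_{DS} - V_{DS}^2/2]$ below the saturation voltage and the square law $(K_N/2)(V_{GS} - V_T)^2$ above it. This is a minimal sketch; the function name and the numerical values in the usage lines are hypothetical, and returning zero below threshold is a simplifying assumption, since that region is treated separately under weak inversion.

```python
def strong_inversion_current(v_gs, v_ds, v_t, k_n):
    """Drain current of an NMOS device in strong inversion, for v_ds >= 0.

    v_t is the threshold voltage and k_n the transconductance parameter.
    """
    v_ov = v_gs - v_t                                   # gate overdrive
    if v_ov <= 0.0:
        return 0.0                                      # assumption: subthreshold current ignored
    if v_ds < v_ov:
        return k_n * (v_ov * v_ds - 0.5 * v_ds ** 2)    # non-saturated region
    return 0.5 * k_n * v_ov ** 2                        # saturated region


if __name__ == "__main__":
    # Hypothetical bias point: K_N = 100 uA/V^2, V_T = 0.86 V, V_GS = 2 V.
    for v_ds in (0.2, 0.6, 1.14, 2.0):
        print(v_ds, strong_inversion_current(2.0, v_ds, 0.86, 1e-4))
```

The two branches meet at $V_{DS} = V_{GS} - V_T$, so the modelled characteristic is continuous at the onset of saturation.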
A.2 Weak inversion 
When the density of mobile charge carriers is lower than depletion charge, the channel is 
weakly inverted. This condition of operation arises for gate-to-source biasing voltages 
lower than the threshold voltage. This is referred to as the subthreshold mode of 
conduction or weak inversion. The flow of electrons is caused by diffusion [22], [61],
[72-74] which materialises as
L 2^(1)8 - 5 .V , - VBS ^ ^ ^ 
. e x p ( l , . K , . ^ ] . | , . » p ( X M ) . ^ | „ , 4 , 
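where $\kappa$ and $V_0$ are defined in Table A.1. This expression may be rewritten in a simpler
form as
\[ I_D = \frac{W}{L} I_{D0} \exp\left( \kappa \frac{V_{GS}}{V_t} \right) \exp\left( [1 - \kappa] \frac{V_{BS}}{V_t} \right) \left[ 1 - \exp\left( -\frac{V_{DS}}{V_t} \right) + \frac{V_{DS}}{V_0} \right] \qquad \text{(A.15)} \]
where $I_{D0}$ is the characteristic current and is given by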
lDQ = ^N.Cox.Vf. • ^ = . e x p f - 2 . f ^ l (A.16) 
One can see that for a drain-to-source quiescent potential greater than $4 V_t$ but remaining
low enough to disregard the Early effect, (A.15) reduces to
\[ I_D = \frac{W}{L} I_{D0} \exp\left( \kappa \frac{V_{GS}}{V_t} \right) \exp\left( [1 - \kappa] \frac{V_{BS}}{V_t} \right) \qquad \text{(A.17)} \]
When the substrate potential with reference to that of the source is null, the above
expression further simplifies to the well known form
\[ I_D = \frac{W}{L} I_{D0} \exp\left( \kappa \frac{V_{GS}}{V_t} \right) \qquad \text{(A.18)} \]
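The weak inversion model can be sketched in a few lines, taking the drain current as $(W/L) I_{D0} \exp(\kappa V_{GS}/V_t) \exp([1-\kappa] V_{BS}/V_t) [1 - \exp(-V_{DS}/V_t) + V_{DS}/V_0]$. The function names and the device values in the usage lines ($I_{D0}$, $\kappa$, $W/L$) are hypothetical and serve only to show why a drain-to-source potential of about $4 V_t$ is sufficient for the drain term to be neglected.

```python
import math

BOLTZMANN = 1.38e-23        # J/K
ELECTRON_CHARGE = 1.6e-19   # C


def thermal_voltage(temperature_k):
    """Thermal voltage V_t = kT/q in volts."""
    return BOLTZMANN * temperature_k / ELECTRON_CHARGE


def weak_inversion_current(v_gs, v_ds, v_bs, i_d0, kappa, w_over_l, v_t,
                           v_early=math.inf):
    """Subthreshold drain current of an NMOS device.

    With v_early left at infinity the Early effect is disregarded.
    """
    drain_term = 1.0 - math.exp(-v_ds / v_t) + v_ds / v_early
    return (w_over_l * i_d0
            * math.exp(kappa * v_gs / v_t)
            * math.exp((1.0 - kappa) * v_bs / v_t)
            * drain_term)


if __name__ == "__main__":
    v_t = thermal_voltage(300.0)
    # Hypothetical device: I_D0 = 1 fA, kappa = 0.7, W/L = 1.
    saturated = weak_inversion_current(0.5, 1.0, 0.0, 1e-15, 0.7, 1.0, v_t)
    at_4vt = weak_inversion_current(0.5, 4.0 * v_t, 0.0, 1e-15, 0.7, 1.0, v_t)
    print(v_t, at_4vt / saturated)
```

At $V_{DS} = 4 V_t$ the drain term is $1 - e^{-4}$, so the current is already within about 2 % of its saturated value.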
Figure A.1: An n-channel MOS transistor: (a) structure, (b) symbol.
Symbol                                                        Unit       Description
$K_N$                                                         A/V^2      Transconductance parameter
$\mu_N$                                                       cm^2/Vs    Electron mobility
$C_{ox} = \epsilon_{ox} / t_{ox}$                             F/m^2      Gate oxide capacitance per unit area
$\epsilon_{ox} = k_{ox} \epsilon_0$                           F/m        Permittivity of silicon dioxide
$\epsilon_0 = 8.854 \times 10^{-12}$                          F/m        Permittivity of free space
$k_{ox}$                                                      -          Dielectric constant of the insulator
$t_{ox}$                                                      m          Thin gate oxide thickness
$W$                                                           m          Drawn channel width
$L$                                                           m          Drawn channel length
$V_{GS}$                                                      V          Gate to source voltage
$V_{DS}$                                                      V          Drain to source voltage
$V_{BS}$                                                      V          Bulk to source voltage
$V_{FB}$                                                      V          Flat band voltage
$\phi_B = 2 V_t \ln(N_{SUB} / n_i)$                           V          Surface potential
$V_t = kT/q$                                                  V          Thermal voltage
$k = 1.38 \times 10^{-23}$                                    J/K        Boltzmann's constant
$T$                                                           K          Temperature
$q = 1.6 \times 10^{-19}$                                     C          Electron charge
$N_{SUB}$                                                     cm^-3      Substrate doping
$n_i = 1.45 \times 10^{10}$                                   cm^-3      Intrinsic carrier concentration
$\gamma = (2 q \epsilon_{si} N_{SUB})^{1/2} / C_{ox}$         V^(1/2)    Bulk threshold parameter
$\epsilon_{si} = 1.04 \times 10^{-10}$                        F/m        Permittivity of silicon
$V_T = V_{TO} + \gamma (\sqrt{\phi_B - V_{BS}} - \sqrt{\phi_B})$   V     Threshold voltage
$V_{TO} = V_{FB} + \phi_B + \gamma \sqrt{\phi_B}$             V          Zero bias threshold voltage
$\kappa = 1 / (1 + \gamma / (2 \sqrt{\phi_B - 5 V_t - V_{BS}}))$   -     Effectiveness of the gate in controlling the channel current
$V_0$                                                         V          Early voltage
Table A.1: List of Symbols.
Appendix B 
CMOS parameters for MIETEC 2.4 µm
double poly, double metal process
Parameter NMOS PMOS 
VTO, V 0.86 -0.85 
TOX, X lO ' m 40.29 42.46 
NSUB, X 10" cm"' 1.39 9.1 
XJ, X 10-^  m 0.3 0.5 
LD, X lO'^m 0.22 0.35 
UO, X cmWs 611.37 233.84 
VMAX, X 10^  m/s 158 225.67 
DELTA 0.85 0.96 
THETA, V 0.05 0.12 
ETA 0.07 0.06 
KAPPA 1.4 9.23
GAMMA, V'^  0.26 0.69 
NFS, X 10" cm-' 1.35 3.92 
CGBO, X 10-'** F/m 5.57 5.57 
PB, V 0.65 0.76 
CJ, 10-" F/m 0.69 3.1 
MJ 0.5 0.5 
CJSW, X lO-'Op/m 3.43 3.67 
MJSW 0.27 0.38 
PHI, V 0.62 0.67 
RSH, Ω 33.42 35.01
KP, × 10^-6 A/V^2 51.71 19.14
JS, A 0.001 0.001 
Appendix C 
Analysis of a diode-based 
current-to-voltage converter 
The objective of this Appendix is to derive the transfer characteristic of a diode-based 
current-to-voltage converter. The notation and configuration refer to Fig. 2.23. The 
analysis is based on the assumption that the operating conditions are such that the 
current flowing through an MOS device biased in weak inversion, as in Appendix A, is 
exponentially related to its gate-source potential and that all transistors have matched 
characteristics. For theoretical purposes the characteristic current and the effectiveness 
of the gate in controlling the channel current parameters of n- and p-channel devices are 
assumed to be identical. However this assumption does not necessarily hold true in 
practice. It is shown in the next Appendix that this limitation may be avoided by using a 
core of single type transistors and a feedback mechanism. 
Consider the resistive branch made up of transistors M1 and M3. Based on
Kirchhoff's current principle, the input current is related to the drain currents by
\[ I_S^+ = -I_{M1} - I_{M3} \qquad \text{(C.1)} \]
where $I_{M1}$ and $I_{M3}$ are the drain currents of M1 and M3, respectively. Based on (A.18),
these drain currents can be expressed as
\[ I_{M1} = I_X \exp\left( \kappa \frac{V_{GS1}}{V_t} \right) = I_X \exp\left( \kappa \frac{V_{dd} - E_1 - V_Y^+}{V_t} \right) \qquad \text{(C.2)} \]
and
\[ I_{M3} = -I_X \exp\left( -\kappa \frac{V_{GS3}}{V_t} \right) = -I_X \exp\left( -\kappa \frac{E_2 - V_Y^+}{V_t} \right) \qquad \text{(C.3)} \]
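where $V_{GS1}$ and $V_{GS3}$ are respectively the gate-to-source potentials of M1 and M3, $V_{dd}$ is
the supply voltage and $E_1$ and $E_2$ are the diodes' biasing potentials. Thus combining
(C.1), (C.2) and (C.3) it can readily be shown that
\[ I_S^+ = I_X \exp\left( \kappa \frac{V_{dd} - E_1 - E_2}{2 V_t} \right) \left[ \exp\left( \frac{\kappa}{V_t} \left[ V_Y^+ - \frac{V_{dd} - E_1 + E_2}{2} \right] \right) - \exp\left( \frac{\kappa}{V_t} \left[ -V_Y^+ + \frac{V_{dd} - E_1 + E_2}{2} \right] \right) \right] \qquad \text{(C.4)} \]
which reduces to
\[ I_S^+ = 2 I_X \exp\left( \kappa \frac{V_{dd} - E_1 - E_2}{2 V_t} \right) \sinh\left( \frac{\kappa}{V_t} \left[ V_Y^+ - \frac{V_{dd} - E_1 + E_2}{2} \right] \right) \qquad \text{(C.5)} \]
Hence
\[ V_Y^+ = \frac{V_{dd} - E_1 + E_2}{2} + \frac{V_t}{\kappa} \sinh^{-1}\left( \frac{I_S^+}{2 I'_X} \right) \qquad \text{(C.6)} \]
where $I'_X$ is the quiescent bias current given by
\[ I'_X = I_X \exp\left( \kappa \frac{V_{dd} - [E_1 + E_2]}{2 V_t} \right) \qquad \text{(C.7)} \]
(C.6) shows that $V_Y^+$ is the sum of a quiescent and a dynamic component. Note that the
former can be adjusted to $V_{dd}/2$ if $E_1 = E_2$.
Given the symmetry of the scheme, it can be readily derived that
\[ V_Y^- = \frac{V_{dd} - E_1 + E_2}{2} + \frac{V_t}{\kappa} \sinh^{-1}\left( \frac{I_S^-}{2 I'_X} \right) \qquad \text{(C.8)} \]
Since $I_S^+ = -I_S^- = I_S/2$, then it can be concluded that
\[ V_Y = V_Y^+ - V_Y^- = \frac{2 V_t}{\kappa} \sinh^{-1}\left( \frac{I_S}{4 I'_X} \right) \qquad \text{(C.9)} \]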
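The result of this Appendix, a differential output of the form $V_Y = (2 V_t/\kappa) \sinh^{-1}(I_S / 4 I'_X)$, can be explored with a short numerical sketch. The function names and the values chosen for $I'_X$, $\kappa$ and $V_t$ are assumptions for illustration. The sketch checks that the hyperbolic-sine branch relation and the inverse-hyperbolic-sine output are mutually consistent, and shows the linear behaviour for small inputs and the logarithmic compression for large ones.

```python
import math


def converter_output(i_s, i_xq, kappa, v_t):
    """Differential output voltage V_Y for an input current i_s.

    i_xq is the quiescent bias current I'_X of each branch.
    """
    return (2.0 * v_t / kappa) * math.asinh(i_s / (4.0 * i_xq))


def branch_current(v_offset, i_xq, kappa, v_t):
    """Current of one branch whose node sits v_offset above its quiescent level."""
    return 2.0 * i_xq * math.sinh(kappa * v_offset / v_t)


if __name__ == "__main__":
    i_xq, kappa, v_t = 1e-10, 0.7, 0.0259     # hypothetical operating point
    for i_s in (1e-12, 1e-10, 1e-9, 1e-8):
        v_y = converter_output(i_s, i_xq, kappa, v_t)
        # Each branch carries half the input and develops half the output.
        print(i_s, v_y, branch_current(v_y / 2.0, i_xq, kappa, v_t))
```

For $I_S \ll 4 I'_X$ the characteristic is approximately linear with slope $V_t / (2 \kappa I'_X)$, which is why the quiescent bias current sets the equivalent resistance of the load.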
Appendix D 
Analysis of a diode-based 
current-to-voltage converter 
incorporating a feed-back mechanism 
The aim of this Appendix is to derive an analytical expression for the transfer 
characteristic of the diode-based current-to-voltage converter depicted in Fig. 3.8 (a).
The analysis is based on the simplified model of a PMOS device operated in the 
subthreshold mode of conduction (3.22). It has also been assumed that all devices have 
similar features. The notation and configuration refer to Fig. 3.8 (a). 
Since it was assumed that all the transistors in the resistive scheme shown in Fig. 
2.23 had identical characteristics, it can therefore be appreciated that the operation of 
the core diode group of the resistive load incorporating the feedback arrangement is 
described by (C.7) and (C.9) as 
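\[ V_O = \frac{2 V_t}{\kappa} \sinh^{-1}\left( \frac{I_S}{4 I'_X} \right) \qquad \text{(D.1)} \]
where
\[ I'_X = I_X \exp\left( \kappa \frac{V_{dd} - [E_1 + E_2]}{2 V_t} \right) \qquad \text{(D.2)} \]
in which $V_{dd}$ is the supply voltage and $E_1$ and $E_2$ are respectively the potentials across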
diodes M5 and M^. An expression for V^j - [E^ + E^] in (D.2) may be derived as follow. 
Consider the shunt device M7, its drain current may be expressed as 
IM7 = - I x . e x p ( - K . ^ ) = - l x . e x p ( K . ^ ' ^ ' * " ^ | ' ^ ° ) (D.3) 
where VQJ^ is the gate-to-sou roe potential of My and E^ is the potential across diodes Mg 
and M^. It may be noticed that the biasing current of the core diode group is given by 
IC = IM7 - IM6 (D.4) 
where I^g is the drain current of and can be expressed as 
IM6 = - I x . c x p [ - K . ^ ] = - i x . e x p ^ K . - ^ ] (D.5) 
where $V_{GS6}$ is the gate-to-source potential of M6. Thus combining (D.3), (D.4) and
(D.5), it can readily be shown that
Vdd - [E, + E2] = Ea - ^ . I n f l + ^ ] (D.6) 
^ IM7>' 
Given that the current source biases both diodes M8 and M9, the sum of their
gate-to-source potentials can be expressed as
(D.7) 
Then combining (D. l ) , (D.2), (D.6) and (D.7) it can be derived that the differential 
potential generated by the resistive load is given by 
Vo = —;r-.sinh ' ( D . 8 ) 
Appendix E 
Switch-induced error voltage on an 
elementary PMOS-based 
sample-and-hold circuit 
The aim of this appendix is to determine an analytical expression that evaluates the
level of switch-induced error voltage on the elementary PMOS-based sample-and-hold
circuit depicted in Fig. 4.9. The analysis is based on the lumped models developed by
Sheu and Hu [89] and assumes that half of the channel carriers exit at the hold node and
a linear variation of the gate voltage between the "ON" and "OFF" steady states (i.e.
ground and supply voltage $V_{dd}$). During the switching-off transient of the PMOS device
two distinct periods may be identified.
In the first phase, when the absolute value of the gate-to-source potential is greater
than that of $V_T$, the PMOS transistor is biased in the strong inversion region. This
conductive state is represented by the equivalent lumped model given in Fig. E.1 (a).
$C_G$ and $C_{ov}$ are respectively the gate and gate-drain overlap capacitances which may be
expressed as
$$C_G = C_{ox}\,W\,(L - 2 L_D) \tag{E.1}$$
and
$$C_{ov} = C_{ox}\,W\,L_D \tag{E.2}$$
where $C_{ox}$ is the gate capacitance per unit area, $W$ and $L$ are respectively the drawn
width and drawn length of the transistor and $L_D$ is the lateral diffusion distance.
Applying Kirchhoff's current law at the drain node, it can be deduced that
$$I_{hold} = I_{ov} + I_G - I_D \tag{E.3}$$
Assuming the input signal to be constant
$$I_{hold} = C_{hold}\,\frac{d(V_W + V_D)}{dt} = C_{hold}\,\frac{dV_D}{dt} \tag{E.4}$$
Since the gate signal is a ramp which starts rising at the origin from ground toward
supply voltage at a rate $U$
$$V_G = U\,t \tag{E.5}$$
the sum of the gate and gate-drain overlap capacitance currents can be expressed as
$$I_{ov} + I_G = \left(C_{ov} + \frac{C_G}{2}\right)\frac{d(V_G - V_D - V_W)}{dt} \tag{E.6}$$
which may be approximated to
$$I_{ov} + I_G = \left(C_{ov} + \frac{C_G}{2}\right)\frac{dV_G}{dt} = \left(C_{ov} + \frac{C_G}{2}\right)U \tag{E.7}$$
if one realistically assumes that $|dV_G/dt| \gg |dV_D/dt|$. Operated in the strong inversion
region, the drain current of the PMOS transistor is given in Appendix A by
$$I_D = K_p\left\{(V_T - [V_G - V_W])\,V_D + \frac{V_D^2}{2}\right\} \tag{E.8}$$
where $K_p$ is the transconductance parameter of the PMOS device and $V_T$ its threshold
voltage which is given by
V T = V T O - Y.(yVdd-Vw-(J)B- f ^ ] ( E . 9 ) 
Since the drain-to-source potential remains small with respect to $V_T - (V_G - V_W)$, (E.8)
can be approximated to
$$I_D = K_p\,(V_{TW} - U\,t)\,V_D \tag{E.10}$$
where $V_{TW} = V_T + V_W$. Thus combining (E.3), (E.4), (E.7) and (E.10), during the
conductive phase, the behaviour of the basic PMOS-based sample-and-hold circuit is
described by the following differential equation
$$C_{hold}\,\frac{dV_D}{dt} = \left(C_{ov} + \frac{C_G}{2}\right)U - K_p\,(V_{TW} - U\,t)\,V_D \tag{E.11}$$
A solution to this equation may be obtained by utilising the well-known method of
variation of parameters and is given by
$$V_D(t) = \sqrt{\frac{\pi U C_{hold}}{2 K_p}}\left(\frac{C_{ov} + C_G/2}{C_{hold}}\right)\exp\left(\frac{K_p}{2 U C_{hold}}\{V_{TW} - U t\}^2\right)\left[\operatorname{erf}\left(\sqrt{\frac{K_p}{2 U C_{hold}}}\,V_{TW}\right) - \operatorname{erf}\left(\sqrt{\frac{K_p}{2 U C_{hold}}}\,\{V_{TW} - U t\}\right)\right] \tag{E.12}$$
This expression is valid until the gate-to-source potential reaches the limit of the
above-threshold conduction. At this time, $t_1 = V_{TW}/U$, the first period is over and the error
voltage is given by
$$V_D(t_1) = \sqrt{\frac{\pi U C_{hold}}{2 K_p}}\left(\frac{C_{ov} + C_G/2}{C_{hold}}\right)\operatorname{erf}\left(\sqrt{\frac{K_p}{2 U C_{hold}}}\,V_{TW}\right) \tag{E.13}$$
The equivalent lumped model of the circuit, in the subsequent turn-off phase, is given
in Fig. E.1 (b). Since the transistor is biased in weak inversion, the drain current and
the gate capacitances have a negligible influence; the operation of the circuit is thus
dictated by the following differential equation
$$C_{hold}\,\frac{dV_D}{dt'} = C_{ov}\,U \tag{E.14}$$
for which the solution is
$$V_D(t') = \frac{C_{ov}}{C_{hold}}\,U\,t' + \delta \tag{E.15}$$
where $\delta = V_D(t' = 0)$ is the magnitude of the switch-induced error voltage at the end of
the first period. Thus the complete solution is
$$V_D(t') = \frac{C_{ov}}{C_{hold}}\,U\,t' + \sqrt{\frac{\pi U C_{hold}}{2 K_p}}\left(\frac{C_{ov} + C_G/2}{C_{hold}}\right)\operatorname{erf}\left(\sqrt{\frac{K_p}{2 U C_{hold}}}\,V_{TW}\right) \tag{E.16}$$
This expression is valid for the gate-to-source voltage rising from the threshold limit to the supply
voltage.
At the time $t' = t_2 = (V_{dd} - V_{TW})/U$, the switching-off transient is complete and the
total amount of switch-induced error voltage on the holding capacitor is
$$V_{DT} = \sqrt{\frac{\pi U C_{hold}}{2 K_p}}\left(\frac{C_{ov} + C_G/2}{C_{hold}}\right)\operatorname{erf}\left(\sqrt{\frac{K_p}{2 U C_{hold}}}\,V_{TW}\right) + \frac{C_{ov}}{C_{hold}}\,(V_{dd} - V_{TW}) \tag{E.17}$$
Figure E.1: Equivalent lumped models of the basic PMOS-based
sample-and-hold circuit, (a) conductive phase, (b) turn-off phase.
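The two-phase result of this appendix can be checked numerically. The following is a minimal sketch, not part of the original analysis: it evaluates the closed form (E.17) and compares it with a direct Runge-Kutta integration of (E.11) followed by the overlap-capacitor ramp of the turn-off phase. All component values in `PARAMS` are assumptions chosen only for illustration; they are not taken from the thesis.

```python
import math


def error_closed_form(U, K_p, C_hold, C_ov, C_G, V_TW, V_dd):
    """Total switch-induced error voltage according to (E.17), in volts."""
    a = K_p / (2.0 * U * C_hold)
    prefactor = math.sqrt(math.pi * U * C_hold / (2.0 * K_p))
    strong = prefactor * (C_ov + C_G / 2.0) / C_hold * math.erf(math.sqrt(a) * V_TW)
    weak = C_ov / C_hold * (V_dd - V_TW)  # turn-off phase ramp
    return strong + weak


def error_numerical(U, K_p, C_hold, C_ov, C_G, V_TW, V_dd, steps=20000):
    """Integrate (E.11) with RK4 up to t1 = V_TW / U, then add the turn-off ramp."""

    def dv_dt(t, v):
        return ((C_ov + C_G / 2.0) * U - K_p * (V_TW - U * t) * v) / C_hold

    t, v = 0.0, 0.0
    h = (V_TW / U) / steps
    for _ in range(steps):
        k1 = dv_dt(t, v)
        k2 = dv_dt(t + h / 2, v + h * k1 / 2)
        k3 = dv_dt(t + h / 2, v + h * k2 / 2)
        k4 = dv_dt(t + h, v + h * k3)
        v += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return v + C_ov / C_hold * (V_dd - V_TW)


# Hypothetical values: 1e8 V/s gate ramp, 1 pF hold capacitor, 10 fF gate,
# 1 fF overlap, V_TW = V_T + V_W = 1.5 V, 3 V supply.
PARAMS = dict(U=1e8, K_p=20e-6, C_hold=1e-12, C_ov=1e-15,
              C_G=10e-15, V_TW=1.5, V_dd=3.0)

if __name__ == "__main__":
    print(error_closed_form(**PARAMS), error_numerical(**PARAMS))
```

Agreement between the two functions only confirms that the closed form solves the differential equations as stated; it says nothing about how well the lumped model matches a fabricated device.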
Appendix F 
Switch-induced error voltage on a 
dummy-compensated PMOS-based 
sample-and-hold circuit 
An analytical expression which approximates the magnitude of the switch-induced error
voltage on the dummy-compensated PMOS-based sample-and-hold circuit depicted in
Fig. 4.10 is derived in this appendix. The analysis is based on the lumped models
suggested by Sheu et al. [89] and assumes that the width of the switching device is
twice that of the dummy transistor, the impedances on the drain and source side of the
switch are identical and the clock signals have opposite slopes and ramp simultaneously
in a linear fashion. Three different modes of operation may be identified.
At the origin of the switching-off phase, the switching transistor is biased in
strong inversion while the dummy device is weakly inverted; the equivalent lumped
model of the dummy-compensated scheme is shown in Fig. F.1 (a), where all the
parameters have been defined in the previous appendix. From Kirchhoff's current
law
Ihoid - lov + I G + I G - ID ( F . I ) 
Based on the analysis presented in the previous appendix, and assuming that the clock
signal of the dummy device falls as
$$V_{\bar{\phi}} = V_{dd} - U\,t \tag{F.2}$$
it can readily be shown that, in the early stage of the switching, the transient
characteristic is described by
$$C_{hold}\,\frac{dV_D}{dt} = \frac{C_G}{2}\,U - K_p\,(V_{TW} - U\,t)\,V_D \tag{F.3}$$
The solution of this differential equation is
$$V_D(t) = \sqrt{\frac{\pi U C_{hold}}{2 K_p}}\left(\frac{C_G}{2 C_{hold}}\right)\exp\left(\frac{K_p}{2 U C_{hold}}\{V_{TW} - U t\}^2\right)\left[\operatorname{erf}\left(\sqrt{\frac{K_p}{2 U C_{hold}}}\,V_{TW}\right) - \operatorname{erf}\left(\sqrt{\frac{K_p}{2 U C_{hold}}}\,\{V_{TW} - U t\}\right)\right] \tag{F.4}$$
which is valid until the gate-to-source potential of the dummy transistor falls to $V_T$, at
which point the dummy device enters the strong inversion regime. At this time, $t_1 = (V_{dd} - V_{TW})/U$, the level of error voltage is
$$V_D(t_1) = \sqrt{\frac{\pi U C_{hold}}{2 K_p}}\left(\frac{C_G}{2 C_{hold}}\right)\exp\left(\frac{K_p}{2 U C_{hold}}[2 V_{TW} - V_{dd}]^2\right)\left[\operatorname{erf}\left(\sqrt{\frac{K_p}{2 U C_{hold}}}\,V_{TW}\right) - \operatorname{erf}\left(\sqrt{\frac{K_p}{2 U C_{hold}}}\,[2 V_{TW} - V_{dd}]\right)\right] \tag{F.5}$$
and the second period commences. During this phase both switching and dummy
devices are strongly inverted. This state is modelled in Fig. F.1 (b) and described by the
following system equation
$$C_{hold}\,\frac{dV_D}{dt'} = -I_D = -K_p\,(V_{TW} - U\,t')\,V_D \tag{F.6}$$
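for which the solution is
$$V_D(t') = \delta\,\exp\left(\frac{K_p}{2 U C_{hold}}[U t' - V_{TW}]^2\right) \tag{F.7}$$
where $\delta$ is determined by the level of switch-induced voltage at the origin of this period
(F.5). Since $V_D(t' = 0) = V_D(t_1)$, it can readily be shown that
$$\delta = \sqrt{\frac{\pi U C_{hold}}{2 K_p}}\left(\frac{C_G}{2 C_{hold}}\right)\exp\left(\frac{K_p}{2 U C_{hold}}\left\{[V_{dd} - 2 V_{TW}]^2 - V_{TW}^2\right\}\right)\left[\operatorname{erf}\left(\sqrt{\frac{K_p}{2 U C_{hold}}}\,V_{TW}\right) - \operatorname{erf}\left(\sqrt{\frac{K_p}{2 U C_{hold}}}\,[2 V_{TW} - V_{dd}]\right)\right] \tag{F.8}$$
Substituting (F.8) back into (F.7), the complete solution of (F.6) is
$$V_D(t') = \sqrt{\frac{\pi U C_{hold}}{2 K_p}}\left(\frac{C_G}{2 C_{hold}}\right)\exp\left(\frac{K_p}{2 U C_{hold}}\left\{[V_{dd} - 2 V_{TW}]^2 - V_{TW}^2 + [U t' - V_{TW}]^2\right\}\right)\left[\operatorname{erf}\left(\sqrt{\frac{K_p}{2 U C_{hold}}}\,V_{TW}\right) - \operatorname{erf}\left(\sqrt{\frac{K_p}{2 U C_{hold}}}\,[2 V_{TW} - V_{dd}]\right)\right] \tag{F.9}$$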
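which is valid until the switching device reaches the turn-off state (i.e. $V_G = V_{TW}$). The
time required to reach this state is $t_2 = (2 V_{TW} - V_{dd})/U$. At this point the magnitude of
the error is
$$V_D(t_2) = \sqrt{\frac{\pi U C_{hold}}{2 K_p}}\left(\frac{C_G}{2 C_{hold}}\right)\exp\left(\frac{K_p}{U C_{hold}}\left[V_{dd}^2 + 2 V_{TW}^2 - 3 V_{dd} V_{TW}\right]\right)\left[\operatorname{erf}\left(\sqrt{\frac{K_p}{2 U C_{hold}}}\,V_{TW}\right) - \operatorname{erf}\left(\sqrt{\frac{K_p}{2 U C_{hold}}}\,[2 V_{TW} - V_{dd}]\right)\right] \tag{F.10}$$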
and the switching device enters the final phase of turn-off. During this phase, for which
the model is depicted in Fig. F.1 (c), the gate-overlap capacitor of the switching device
tends to increase the error voltage while the gate and gate-overlap capacitors of the
dummy transistor over-compensate for this effect. Based on a technique similar to that
used to evaluate the error signal during the turn-off state of the basic sample-and-hold
scheme (see previous appendix for details), it can be deduced that, when both clock
signals $\phi$ and $\bar{\phi}$ have reached their final steady states, the total amount of
switch-induced error voltage on the capacitor is
$$V_{DT} = \sqrt{\frac{\pi U C_{hold}}{2 K_p}}\left(\frac{C_G}{2 C_{hold}}\right)\exp\left(\frac{K_p}{U C_{hold}}\left[V_{dd}^2 + 2 V_{TW}^2 - 3 V_{dd} V_{TW}\right]\right)\left[\operatorname{erf}\left(\sqrt{\frac{K_p}{2 U C_{hold}}}\,V_{TW}\right) - \operatorname{erf}\left(\sqrt{\frac{K_p}{2 U C_{hold}}}\,[2 V_{TW} - V_{dd}]\right)\right] - \frac{C_G}{2 C_{hold}}\,(V_{dd} - V_{TW}) \tag{F.11}$$
Figure F.1: Equivalent lumped models of the PMOS-based
dummy-compensated sample-and-hold circuit, (a) $0 < V_G < V_{dd} - (V_T + V_W)$;
(b) $V_{dd} - (V_T + V_W) < V_G < V_T + V_W$; (c) $V_T + V_W < V_G$.
Appendix G 
Yield analysis of the digital-to-analogue 
converter 
The yield analysis presented in this appendix is derived in connection with the structure
of the digital-to-analogue converter (DAC) presented in Fig. 4.12 and is a direct
extension of the method presented by Lakshmikumar et al. [81].
The gain error and the integral linearity of the DAC are mainly determined by the
accuracy of the individual current sources. While the gain error can be compensated for
by adjusting the biasing current $I_F$, the integral linearity depends on the matching
properties of the n-channel MOS transistors (M1-M70). It is well known that, in CMOS
technology, process parameter deviations are the results of random variations which can
be characterised by the probability density function known as the normal (or Gaussian)
distribution [65]. Thus, as shown in [81], the variance of the normalised output of the
DAC, for any given input digital word, can be expressed as a function of the variance of
the unit current source
$$\sigma_z^2 = \frac{\sigma_I^2\,I_{DAC}\,\bar{I}_{DAC}}{I_F\,(I_{DAC} + \bar{I}_{DAC})^3} \tag{G.1}$$
where $I_F$ and $\sigma_I^2$ are respectively the mean and variance of the unit current source.
$I_{DAC}$ is the mean value of the DAC's output and $\bar{I}_{DAC}$ its analogue complement which
can respectively be formulated as
$$I_{DAC} = \frac{I_F}{4}\sum_{i=1}^{8} 2^{\,i-1}\,b_i \tag{G.2}$$
and
$$\bar{I}_{DAC} = \frac{I_F}{4}\sum_{i=1}^{8} 2^{\,i-1}\,(1 - b_i) \tag{G.3}$$
where $b_i$ is the $i$th digital input of the DAC and is either 0 or 1. The expected value of
the DAC's output, normalised to full scale, may be shown to be
$$\bar{z} = \frac{I_{DAC}}{I_{DAC} + \bar{I}_{DAC}} \tag{G.4}$$
Substituting (G.2) and (G.3) into (G.4), it can readily be shown that
$$\bar{z} = \frac{j}{255} \tag{G.5}$$
where $j$ is a decimal representation of the input digital word and can take any value
between 0 and 255. Thus, substituting (G.2), (G.3) and (G.5) into (G.1), the variance of
the normalised DAC's output may be re-written as
$$\sigma_z^2 = \frac{j\,(255 - j)}{255^2\left(63 + \frac{1}{2} + \frac{1}{4}\right)}\,\frac{\sigma_I^2}{I_F^2} \tag{G.6}$$
Note that the variance is maximum (i.e. $d\sigma_z^2/dj = 0$) when the digital input is half that
of the full scale and nil for a maximum and minimum output. Based on a Gaussian
distribution, the probability that the normalised output of the DAC, for any given input
digital word, has an integral linearity of ±1 LSB is given by
$$P = \frac{1}{\sigma_z\sqrt{2\pi}}\int_{\bar{z} - 1/255}^{\bar{z} + 1/255}\exp\left(-\frac{(z - \bar{z})^2}{2\sigma_z^2}\right)dz \tag{G.7}$$
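which may be simplified to
$$P = \operatorname{erf}\left(\frac{1}{255\sqrt{2}\,\sigma_z}\right) \tag{G.8}$$
The circuit yield of the DAC is obtained [81] by multiplying the probabilities that each
of the 256 possible outputs have less than ±1 LSB error
$$G = \prod_{j=1}^{254}\operatorname{erf}\left(\frac{1}{255\sqrt{2}\,\sigma_z}\right) \tag{G.9}$$
Substituting (G.6) into (G.9), the circuit yield of the DAC is
$$G = \prod_{j=1}^{254}\operatorname{erf}\left(\sqrt{\frac{63 + \frac{1}{2} + \frac{1}{4}}{2\,j\,(255 - j)}}\;\frac{I_F}{\sigma_I}\right) \tag{G.10}$$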
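The yield expression lends itself to direct evaluation. The sketch below is an illustration only: it computes the per-code standard deviation from (G.6) and the product of (G.9) for a given relative mismatch $\sigma_I / I_F$. The function names and the mismatch values used in the example are assumptions, not figures from the thesis.

```python
import math

# Full-scale output in units of I_F, i.e. the (63 + 1/2 + 1/4) factor of (G.6).
FULL_SCALE_UNITS = 63 + 0.5 + 0.25


def sigma_z(j, rel_sigma):
    """Standard deviation of the normalised output for input code j, from (G.6).

    rel_sigma is the assumed relative mismatch sigma_I / I_F of the unit source.
    """
    return rel_sigma * math.sqrt(j * (255 - j) / (255 ** 2 * FULL_SCALE_UNITS))


def dac_yield(rel_sigma):
    """Probability that every code stays within +/-1 LSB, following (G.9)."""
    result = 1.0
    for j in range(1, 255):  # codes 0 and 255 have zero variance
        result *= math.erf(1.0 / (255 * math.sqrt(2) * sigma_z(j, rel_sigma)))
    return result


if __name__ == "__main__":
    for rel in (0.01, 0.02, 0.05):  # hypothetical mismatch levels
        print(rel, dac_yield(rel))
```

Sweeping `rel_sigma` in this way gives a quick feel for how tightly the unit current sources must match before the predicted yield starts to collapse.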
Appendix H 
Published papers 
CMOS subthreshold-mode I/V converter for
analogue neural network applications
D. Coué and G. Wilson
Indexing terms: Neural networks, CMOS integrated circuits
An improved I/V converter for analogue neural network
applications is presented, using devices operating in the
subthreshold mode of conduction. The proposed scheme employs
diode connected transistors biased by a network incorporating
feedback. Analysis and simulation suggest that the modified
structure offers a substantial reduction in the sensitivity to supply
voltage variations. As the proposed scheme uses only NMOS
transistors, it can be implemented in a standard single well
process.
Introduction: Analogue neural networks (ANNs) are typically
organised as parallel layers of processing units (neurons) interconnected
by elements defined as synapses. Synaptic multiplications in
an ANN are often implemented using linear transconductance
multipliers [1, 2] allowing the neuronal summation operation to be
achieved by a simple physical connection of the synaptic outputs.
The summed output current is then converted to an equivalent
potential, as required by the following layer of synapses, using a
nonlinear resistive load.
Where ANN architectures require a large number of resistive
loads, as for example when the backpropagation algorithm is
implemented on-chip [3] or reconfigurability is involved [4], it is
highly desirable, if not essential, that the silicon area associated
with each load-resistor circuit, as determined by the number and
size of the transistors involved, be minimised, and in particular
that power dissipation be kept to a minimum. In this context,
designs exploiting the low current levels associated with subthreshold
operation are receiving much attention. However, because the
drain current in a subthreshold mode device is an exponential
function of the gate-source voltage, the I/V characteristics of load
resistors are highly sensitive to the biasing arrangements. This Letter
illustrates the sensitivity problem and describes a load-resistor
configuration for which the sensitivity to power supply variations
is substantially improved.
Sensitivity problem: In several reported ANN designs [3 - 5], I/V
conversion has been achieved using the CMOS arrangement of
biased diodes as shown in Fig. 1. Note that, for a fully differential
scheme, the resistor circuit consists of four matched transistors
M1-M4 and two biasing voltages $E_1$, $E_2$. With the devices operating
in the subthreshold mode of conduction, the output difference
voltage is given by
$$v_o = \frac{2 V_t}{\kappa}\sinh^{-1}\left(\frac{i_o}{4 I}\right) \tag{1}$$
where $V_t = kT/q \approx 26$ mV at room temperature, $\kappa$ is a measure of
the effectiveness of the gate in controlling the channel current and
$i_o$ is the input current.
$I$ is the quiescent bias current given by
$$I = I_s \exp\left[\kappa\,\frac{V_{dd} - (E_1 + E_2)}{2 V_t}\right] \tag{2}$$
where $V_{dd}$ is the supply voltage and $I_s$ is the characteristic current
for the transistors. Given that the sinh function can be approximated as
$$\sinh(x) \approx x \quad \text{for } |x| < 0.5 \tag{3}$$
then for $|i_o| \le 2I$ the effective driving point resistance may be
approximated to the first order as
$$R \approx \frac{V_t}{2\kappa\,I_s \exp\left[\kappa\,\frac{V_{dd} - E_1 - E_2}{2 V_t}\right]} \tag{4}$$
It will be appreciated therefore that this resistance is highly sensitive
to variations in both the supply and bias sources. The following
Section describes a modified resistive load incorporating a
feedback mechanism which substantially improves the sensitivity
to power supply and biasing variations and is suitable for NMOS
processes.
Fig. 1 I/V converter implemented using a CMOS arrangement of biased diodes
Fig. 2 Circuit diagram of proposed I/V converter
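The supply sensitivity of the basic biased-diode load can be illustrated numerically. The sketch below evaluates the first-order driving-point resistance, taken here as $R \approx V_t / (2 \kappa I)$ with the quiescent bias current $I = I_s \exp[\kappa (V_{dd} - E_1 - E_2) / (2 V_t)]$, and reports how much it changes for a small shift in supply voltage. The values of $\kappa$, $I_s$, $E_1$ and $E_2$ are assumptions made for illustration; the Letter does not state them.

```python
import math

V_T = 0.026    # thermal voltage kT/q at room temperature, volts
KAPPA = 0.8    # assumed gate-effectiveness factor (not given in the Letter)
I_S = 1e-15    # assumed characteristic current, amperes


def load_resistance(v_dd, e1, e2):
    """First-order driving-point resistance of the biased-diode load."""
    bias = I_S * math.exp(KAPPA * (v_dd - e1 - e2) / (2.0 * V_T))
    return V_T / (2.0 * KAPPA * bias)


def resistance_ratio(delta_v_dd, v_dd=3.0, e1=1.0, e2=1.0):
    """R at the nominal supply divided by R after the supply rises by delta_v_dd."""
    return load_resistance(v_dd, e1, e2) / load_resistance(v_dd + delta_v_dd, e1, e2)


if __name__ == "__main__":
    # With these assumed values a 100 mV supply increase changes R by a
    # factor of exp(KAPPA * 0.1 / (2 * V_T)), i.e. between 4 and 5.
    print(resistance_ratio(0.1))
```

The ratio depends only on $\kappa$, $V_t$ and the size of the supply shift, so the assumed $I_s$ and bias potentials cancel out; this is the exponential dependence that the feedback arrangement is intended to suppress.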
Proposed scheme: After rearranging the positions of the bias
sources $E_1$ and $E_2$, as indicated in the circuit diagram of Fig. 2, the
core diode group (M1-M4) and the associated biasing elements can
be implemented using only NMOS devices. The biasing potentials
are generated by two diode connected transistors (M5, M6) and are
controlled by a feedback arrangement comprising transistors M7-M9
together with current source $I_B$. Since the potential across the
diodes M8-M9 is nominally fixed by $I_B$, variations in the supply
($V_{dd}$) are distributed across the gate-source junctions of the shunt
transistor M7 and the biasing transistor M6. With the proviso that
these transistors have matching characteristics, it follows that half
of the supply voltage variations will appear across the device M6.
Given the symmetry of the scheme it also follows that an equal
variation will appear across the diode connected transistor M5.
With the biasing potential across the core diode group buffered
from supply variations the load resistance sensitivity is correspondingly
reduced.
The reduction in sensitivity can be demonstrated with the
assumption that the operating conditions are such that the NMOS
transistor currents are related to their gate-source potentials as
$$I = I_s \exp\left(\kappa\,\frac{V_{gs}}{V_t}\right) \tag{5}$$
Reprinted from ELECTRONICS LETTERS 23rd May 1996 Vol. 32 No. 11 pp. 990-991
It can be shown that, for well matched devices, the potentials
across diodes M5 and M6 can be expressed as
(6) 
However, the potential across diodes M8 and M9 is also related to the bias current by the following
expression:
/a = / . e x p ( « ^ ) (7) 
Combining eqns. 1, 6 and 7, the output difference voltage may be
shown to be
(8)
where $I_{M7}$ is the drain current for device M7 and $I_c$ is the current
biasing the core diode group. It can be seen that the proposed
scheme offers a resistive characteristic which is set by the bias current
$I_B$ and the current ratio $I_c/I_{M7}$. Since $I_B$ is nominally independent
of the supply voltage, it follows that the sensitivity of the resistance value is determined
by the static current ratio $I_c/I_{M7}$. Any given level of insensitivity
can be achieved by ensuring that the shunt current $I_{M7}$ is sufficiently
larger than the core group current $I_c$. Improved sensitivity
is thus acquired at the expense of an increase in the static current
consumption. Given the subthreshold mode of operation and the
advantages of an NMOS implementation, the static current burden
should not present a problem.
The behaviour of the proposed I/V converter has been assessed
via a series of PSPICE simulations based on a 3 V power supply
using level 3 models with parameters for a 2.4 μm CMOS process.
The circuit was simulated for power supply voltage variation of
±100 mV, with a bias current $I_B$ of 2 nA and an aspect ratio of
W/L = 10/1 for all transistors. The results suggest a sensitivity
improvement >40:1 is readily achievable, and confirm that as the
static current ratio $I_c/I_{M7}$ decreases the sensitivity of the nominal
resistance to power supply variations improves.
© IEE 1996 21 February 1996
Electronics Letters Online No: 19960639
D. Coué and G. Wilson (School of Electronic, Communication and
Electrical Engineering, University of Plymouth, Drake Circus,
Plymouth, Devon PL4 8AA, United Kingdom)
References
1 MEAD, C.: 'Analog VLSI and neural systems' (Addison-Wesley, New York, 1989)
2 ISMAIL, M., and FIEZ, T.: 'Analog VLSI signal and information processing' (McGraw-Hill, 1994)
3 DOLENKO, B.K., and CARD, H.C.: 'Tolerance to analog hardware of on-chip learning in backpropagation networks', IEEE Trans. Neural Netw., 1995, 6, pp. 1045-1052
4 SATYANARAYANA, S.: 'Analog VLSI implementation of reconfigurable neural networks'. Ph.D. dissertation, Columbia University, New York, NY, 1991
5 COUÉ, D., and WILSON, G.: 'A four-quadrant subthreshold mode multiplier for analogue neural network applications', IEEE Trans. Neural Netw., to be published
IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 7, NO. 5, SEPTEMBER 1996
A Four-Quadrant Subthreshold Mode Multiplier
for Analog Neural-Network Applications
Dominique Coué and George Wilson
Abstract: A new four-quadrant CMOS analog multiplier is
presented, based on devices operating in the subthreshold mode
of conduction. The proposed circuit is a cross-coupled quad
structure in which differential multiplication is obtained by driving
the gate and bulk (back gate) terminals of the devices.
Analysis and simulation have shown that the new structure has
the characteristics required for the design of very large scale
integration (VLSI) analog neural networks. Although operating
at subthreshold current levels, reasonable speed can nevertheless
be obtained since voltage swings are in the range of a few $V_t$. The
behavior of the basic multiplier has been assessed experimentally
using transistor-arrays and simulation studies on a network
including 11 neurons and 31 synapses indicate a useful level of
functionality.
I. INTRODUCTION
THE four-quadrant multiplier is an important building block for a large number of signal processing applications,
and particularly in analog neural networks (ANN's).
Neural networks are typically organized as parallel layers of
processing units (neurons) interconnected by elements defined
as synapses [1] as illustrated in Fig. 1. The output of a
synapse is the product of its input (output of previous layer)
and a weight. The function performed by the neural network
is determined by its topology and the weights associated
with each interconnection. Applications typically require a
large number of interconnected neurons and therefore synaptic
connections, i.e., multipliers. It is desirable therefore, if not
essential, that multiplying elements use a minimum number
of active devices and dissipate minimum power. In MOS
technologies the synaptic element may be designed using
transistors operating in the saturation, linear, or subthreshold
regions. Subthreshold operation has the advantage that current
levels are typically lower than devices biased into strong inversion,
but operating speeds are diminished due to the reduced
ability to charge/discharge capacitive elements. Nevertheless,
subthreshold mode of operation may be attractive, because
of the lower power levels involved and with voltage swing
requirements in the order of a few $V_t$ the relatively low device
capacitances allow reasonable speeds to be achieved.
Manuscript received November 17, 1994; revised June 20, 1995 and
December 10, 1995.
The authors are with the School of Electronic, Communication and Electrical
Engineering, University of Plymouth, Plymouth, Devon PL4 8AA, U.K.
Publisher Item Identifier S 1045-9227(96)06606-4.
1045-9227/96$05.00 © 1996 IEEE
Fig. 1. Architecture of a neural network.
Fig. 2. MOS version of the Gilbert multiplier.
Several reported synaptic designs [2]-[5] exploit the MOS
version of the Gilbert cell [6] (see Fig. 2) which utilizes six
transistors arranged as stacked differential pairs. A fixed tail
current $I_b$ is distributed between the differential pairs P1 and
P2 via a third differential pair P3 in a manner determined by
the difference voltage, $v_Y = V_{Y1} - V_{Y2}$. The difference output
current of the Gilbert structure is given by
$$i_o = (I_1 + I_3) - (I_2 + I_4) = I_b \tanh\left(\kappa\,\frac{v_X}{2 V_t}\right)\tanh\left(\kappa\,\frac{v_Y}{2 V_t}\right) \tag{1}$$
where $v_X = V_{X1} - V_{X2}$, $V_t = kT/q$ is approximately 26 mV
at room temperature and $\kappa$ is a measure of the effectiveness
of the gate in controlling the channel current. Given that the
tanh function can be expanded as
$$\tanh x = x - \frac{1}{3}x^3 + \frac{2}{15}x^5 - \frac{17}{315}x^7 + \cdots \quad \text{for } |x| < \frac{\pi}{2} \tag{2}$$
then for $|v_X|$ and $|v_Y| < V_t/\kappa$, (1) may be approximated to
a first order as
$$i_o = \frac{I_b\,\kappa^2}{4 V_t^2}\,v_X\,v_Y \tag{3}$$
COUÉ AND WILSON: A FOUR-QUADRANT SUBTHRESHOLD MODE
Fig. 3. Circuit diagram of the proposed multiplier.
Fig. 4. Static characteristics of the analog multiplier (a) $i_o$ against $V_X$ and
(b) $i_o$ against $V_Y$.
Fig. 5. Experimental static characteristics of the proposed multiplier.
It can be seen that the Gilbert circuit operates as a linear
transconductance multiplier for input voltage differences less
than $V_t/\kappa$. In the next section it will be shown that it is possible
to obtain a similar relationship in an arrangement in which
the bias current is distributed across two differential pairs
by modulating their bulk potentials. In Section III, simulation
and experimental results for the multiplier are given and in
Section IV, it is shown how the proposed structure can be
used in the design of a neural network.
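The quality of the first-order approximation can be gauged with a short calculation. The following sketch compares the tanh product of the Gilbert structure with its linearised form, $i_o \approx I_b \kappa^2 v_X v_Y / (4 V_t^2)$. The bias current and the value $\kappa = 0.8$ are assumptions chosen for illustration.

```python
import math

V_T = 0.026   # thermal voltage at room temperature, volts
KAPPA = 0.8   # assumed gate-effectiveness factor


def gilbert_exact(i_b, v_x, v_y):
    """Difference output current of the Gilbert structure (tanh product)."""
    return (i_b * math.tanh(KAPPA * v_x / (2 * V_T))
            * math.tanh(KAPPA * v_y / (2 * V_T)))


def gilbert_linear(i_b, v_x, v_y):
    """First-order approximation, valid for |v_x| and |v_y| below V_T / KAPPA."""
    return i_b * KAPPA ** 2 * v_x * v_y / (4 * V_T ** 2)


if __name__ == "__main__":
    i_b = 100e-9  # hypothetical 100 nA tail current
    for v in (0.001, 0.010, V_T / KAPPA):
        exact = gilbert_exact(i_b, v, v)
        approx = gilbert_linear(i_b, v, v)
        print(f"v = {v * 1e3:.1f} mV  exact/linear = {exact / approx:.4f}")
```

At the edge of the quoted range the linear form already overestimates the output noticeably, which is why the range $V_t/\kappa$ is best read as a soft limit rather than a hard boundary.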
II. THE PROPOSED MULTIPLIER
An NMOS version of the proposed four-quadrant multiplier
is shown in Fig. 3 and consists of a current source and
differential pairs P1 and P2 of matched transistors operating
in the subthreshold mode of conduction. One differential input
is applied as in the Gilbert structure while the second appears
between the bulk terminals of $M_1(M_2)$ and $M_3(M_4)$. It should
be noted that using the well as a multiplier input has previously
been described in connection with single-quadrant multipliers
[3]. The proposed scheme thus represents a natural extension
of the back gate effect to yield the increased functionality
associated with full four-quadrant multiplication.
For an NMOS transistor, the subthreshold current [7], [8]
is given by
$$I_{ds} = I_s \exp\left(\kappa\,\frac{V_{gs}}{V_t}\right)\exp\left([1 - \kappa]\,\frac{V_{bs}}{V_t}\right)\left[1 - \exp\left(-\frac{V_{ds}}{V_t}\right) + \frac{V_{ds}}{V_o}\right] \tag{4}$$
where $V_{gs}$ is the gate-to-source potential, $V_{ds}$ is the drain-to-source
potential, $V_{bs}$ is the bulk-to-source potential, $V_o$ is
the Early voltage and $I_s = W/L \cdot I_{D0}$, where $I_{D0}$ is a
characteristic current. With the assumptions that the drain-source
quiescent potential $V_{ds} > 4 V_t$ but remains low enough
to disregard Early effect, (4) reduces to
$$I_{ds} = I_s \exp\left(\kappa\,\frac{V_{gs}}{V_t}\right)\exp\left([1 - \kappa]\,\frac{V_{bs}}{V_t}\right) \tag{5}$$
Fig. 6. Feedforward neural network.
It should be emphasized that the normal operation of an MOS
transistor requires the source/bulk and drain/bulk junctions be
reverse-biased, i.e., $V_{bs} < 0$ and $V_{bd} < 0$.
Given that the sum of the device currents at the common
source node can be expressed as
(6)
the difference current output becomes
$$i_o = I_b \tanh\left(\kappa\,\frac{v_X}{2 V_t}\right)\tanh\left([1 - \kappa]\,\frac{v_Y}{2 V_t}\right) \tag{7}$$
Note that (1) and (7) differ only in the coefficient of the
factor $v_Y/(2 V_t)$, which is $1 - \kappa$ for the proposed scheme
and $\kappa$ for the Gilbert cell. For a typical $\kappa$ value of 0.8 [9],
this variation has the effect of increasing the linear range
of the Y differential input of the proposed multiplier to
$V_t/(1 - \kappa) \approx 130$ mV. Thus over the signal ranges $v_X < V_t/\kappa$
and $v_Y < V_t/(1 - \kappa)$, (7) can be approximated to a first order
$$i_o = \frac{I_b\,\kappa\,(1 - \kappa)}{4 V_t^2}\,v_X\,v_Y \tag{8}$$
and is similar to the result obtained for the Gilbert cell.
1) Limits of Operation: To ensure proper operation of
the proposed circuit, the source/bulk junctions of transistors
M1-M4 should not be forward biased. This requires that
$$V_Y + v_Y < V_s + v_s \tag{9}$$
where $V_Y$ is the quiescent bulk common-mode input level, $v_Y$
is the dynamic bulk input voltage, $V_s$ and $v_s$ are the quiescent
and dynamic source potentials, respectively. The quiescent and
dynamic source voltage can be obtained by rearranging (6) as
follows:
(10)
(11)
where $V_X$ and $v_X$ are the quiescent and dynamic gate potentials,
respectively. Combining (9), (10), and (11), the bulk
dynamic input range can be expressed as
(12)
and an upper limit for $v_Y$ can be obtained by setting $v_X$ at
$v_X = -10\,V_t$ (which represents a practical lower bound)
giving
(13)
Fig. 7. Feedforward neural network with the activation function distributed over the following synapses.
Fig. 8. Transconductance-thresholding-synapse.
Therefore, in the context of neural-network design,
$I_b$, $V_X$, $V_Y$ and the aspect ratio of the transistors (W/L),
would be chosen in order to obtain a bulk dynamic input
range of a few $V_t$.
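The effect of driving the bulk rather than a second gate can be illustrated with a few lines of arithmetic. The sketch below evaluates the proposed multiplier's transfer function, $i_o = I_b \tanh(\kappa v_X / 2V_t) \tanh([1-\kappa] v_Y / 2V_t)$, alongside its first-order form, and computes the nominal linear ranges of the two inputs. The value $\kappa = 0.8$ is the typical figure quoted in the text; the bias current is an assumption.

```python
import math

V_T = 0.026   # thermal voltage at room temperature, volts
KAPPA = 0.8   # typical value quoted in the text


def multiplier_exact(i_b, v_x, v_y):
    """Difference output current with gate input v_x and bulk input v_y."""
    return (i_b * math.tanh(KAPPA * v_x / (2 * V_T))
            * math.tanh((1 - KAPPA) * v_y / (2 * V_T)))


def multiplier_linear(i_b, v_x, v_y):
    """First-order approximation of the product of the two tanh terms."""
    return i_b * KAPPA * (1 - KAPPA) * v_x * v_y / (4 * V_T ** 2)


def linear_ranges():
    """Nominal linear ranges (gate input, bulk input) in volts."""
    return V_T / KAPPA, V_T / (1 - KAPPA)


if __name__ == "__main__":
    x_range, y_range = linear_ranges()
    print(f"gate input range ~ {x_range * 1e3:.1f} mV")
    print(f"bulk input range ~ {y_range * 1e3:.1f} mV")
    print(multiplier_exact(100e-9, 0.01, 0.05), multiplier_linear(100e-9, 0.01, 0.05))
```

With $\kappa = 0.8$ the bulk input range is four times the gate input range, at the cost of a gain that is lower by the same factor; this trade is what makes the bulk terminal convenient as the weight input.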
III. SIMULATION AND EXPERIMENTAL RESULTS
1) Simulation Results: The behavior of the proposed multiplier
has been assessed via a series of PSpice simulations
based on a 2 V power supply using Level 3 models with
parameters for a 1.5 μm CMOS process. The simulations
typically show that for a bias current of 100 nA and an
aspect ratio of W/L = 6/1 for the transistors M1-M4, a
bulk dynamic input range of 150 mV can be obtained with the
common mode voltage of the gate and bulk terminals set to 1
V and 100 mV, respectively.
The family of simulated static characteristics in Fig. 4(a)
and (b) shows the differential output current as a function of
the differential inputs $v_X$, $v_Y$ over the range -50 mV to 50
mV in equal 20 mV increments, respectively, and confirm
that the multiplier generates difference output currents related
274 
IEEE TTlANSACnONS ON NEURAL NETWORKS. WL, ,7 . NO. 3. SEPTEMBER 1996 
L o a d 
8 u a . 
1-
Vdd 
A 
•TTt 
PI \1J' 
Fig. 9. Load. 
i*i>h ( M l 
•as 
^ -6 0 
Fig. 10. Sigmoid aciivuion function. 
10 ihc product of the differeniial input voltages via a lanh 
funcUon. It may aJso be noted thai as expected, the linear 
range of the ">"* differential input is substantially larger than 
that of "X" input Therefore, it can be appreciated thai over 
the extended signal range i/y < v ; / ( l - «) « 130 mV the 
difference output current can be approximalcd as 
of the gate and bulk terminals were set to 2 V and 0.5 V, 
respectively, to obtain a maximum bulk dynamic input voltage 
of 250 mV for a maximum bias current of 1 nA. With a bias 
current of 250 nA. the static transfer curves of the differential 
, output voltage versus input voltage i/y of the multiplier for 
different values of are shown in Fig. 5. 
It may be observed in Fig. 5 thai the output signal is not 
symmetrical about the y axis. This asymmetiy is mainly due to 
mismatching between components. Measurements have shown 
a 5% mismatch for traosiscon on the same chip and 10% for 
transistors from different chips [9]. 
IV. DESIGN AND SIMULATION OF 
AN ANALOG NEURAL NETWORK 
1) Design Method: For a feedforward network (see 
Fig. 6), the neuron transfer function can be expressed in 
the form 
to = S - ^v • tanh (14) 
where J = h '{l-K)/2 'V,. 
It will be shown later that the saturation associated with the 
tanh function can be exploited in the design of neural networks. 
Simulations based on a range of values for the key static 
variables A . / j , Vx and Vy. have been carried out and con-
firm that the gain factor of the differential output current is 
proponional to the bias current h and thai (he dynamic range 
estimate given in (13) is reasonable. 
2) Experimental Results: The performance of the pro-
posed multiplier has been experimentally assessed using 
Siliconix SD 5000 transistor-arrays. For measurement pur-
poses, (he difference current output was converted into a 
differential voltage via the use of matched load resistors. The 
difference output was sensed using tow noise JFET input 
operational amplifiers operated as voltage followers driving a 
differential amplifier. The quiescent common-mode voltages 
/(57') = / (15) 
where f(·) is a nonlinear activation (threshold) function and
x_j^m is the output of the jth neuron in layer m. The inputs
to the neuron are usually referred to as connections, and the
w_ij^m as weights. Optionally, a bias θ is added to the weighted
sum. Each input is driven either by the output of a neuron of
the previous layer, x_i^{m-1}, or by an external input.
In the following section we describe an ANN implementation
in which the inherent tanh function of the transconductance
multiplier described earlier is exploited to perform both
the activation function and multiplication.
The modified ANN structure is based on the observation
that the activation function of a neuron can be distributed over
each of the forward layer synapses without affecting the overall
behavior, as shown in Fig. 7. We shall refer to the combined
thresholding and synaptic weighting as a thresholding-synapse
(TS). The output and input of the TS block are related as
P_{jn}^{m+1} = w_{jn}^{m+1} \cdot f(S_j^m)    (16)
Here P_jn^{m+1} and S_j^m are the output and input of the TS
block, respectively, where j and n define the driving neuron
in the layer m and the driven neuron in the layer m + 1,
respectively.
COUE AND WILSON: A FOUR-QUADRANT SUBTHRESHOLD MODE
Fig. 11. Feedforward ANN.
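The observation behind the thresholding-synapse can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes tanh as the activation f and uses made-up sums and weights (the names `next_sums_conventional`, `next_sums_ts`, `S` and `w` are illustrative only). It compares the weighted sums seen by the driven layer when f is applied inside the driving neurons with the sums obtained when f is applied inside each synapse.

```python
import math


def f(s):
    """Activation function; tanh is assumed here purely for illustration."""
    return math.tanh(s)


def next_sums_conventional(S, w):
    """Neurons apply f to their own sums; synapses then weight the outputs."""
    x = [f(s_j) for s_j in S]  # x_j = f(S_j)
    return [sum(w[j][n] * x[j] for j in range(len(S)))
            for n in range(len(w[0]))]


def next_sums_ts(S, w):
    """Each thresholding-synapse computes P_jn = w_jn * f(S_j);
    the driven neuron only adds up the P_jn it receives."""
    P = [[w[j][n] * f(S[j]) for n in range(len(w[0]))]
         for j in range(len(S))]
    return [sum(P[j][n] for j in range(len(S)))
            for n in range(len(w[0]))]


# Hypothetical weighted sums of three driving neurons in layer m
S = [0.3, -1.2, 2.0]
# Hypothetical weights w[j][n] from driving neuron j to driven neuron n
w = [[0.5, -1.0], [1.5, 0.25], [-0.75, 2.0]]

a = next_sums_conventional(S, w)
b = next_sums_ts(S, w)
print(a)
print(b)
```

Both functions produce the same sums for the driven layer, which is why the activation can be moved from the neuron into its forward synapses without changing the network function.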
2) Circuit Implementation: An analog circuit implementation
of the neural architecture shown in Fig. 7 can be realized
using the transconductance-thresholding-synapse (TTS) shown
in Fig. 8. The summation function can be achieved simply
by connecting TTS outputs to a common bus bar. With the
assumption that the designer has access to a twin-well process,
the summed current can subsequently be converted to an
equivalent balanced potential, as required by the following
layer TTS blocks, using the load arrangement shown in Fig. 9.
It has been demonstrated in (14) that in the linear range of
V_w (|V_w| < 2 · V_t/(1 − κ)) the difference output current of a
TTS can be expressed in the form
I_{Pj} = g \cdot V_w \cdot \tanh\left(\frac{\kappa \cdot V_{Sj}}{2 \cdot V_t}\right)    (17)
3) Load: The summed current I_Sj is converted to a voltage
V_Sj using a load comprising four transistors operating as
diodes. For matched transistors, the difference output voltage
is given by
V_{Sj} = \frac{2 \cdot V_t}{\kappa} \cdot \sinh^{-1}\left(\frac{I_{Sj}}{4 \cdot I_s'}\right)    (18)
where I_s' = I_s · exp[κ · (V_dd − 2E)/(2 · V_t)]; V_dd is the supply
voltage and E is the biasing voltage of the diodes. Combining
(17) and (18) the normalized output and input variables are
related as
P_j = \frac{g \cdot V_w}{4 \cdot I_s'} \cdot \tanh(\sinh^{-1}(S_j))    (19)
where P_j = I_Pj/(4 · I_s') and S_j = I_Sj/(4 · I_s').
When this expression is related to (16) it can be seen that
the value of the synaptic weight is given by w = g · V_w/(4 · I_s')
and the activation function is tanh(sinh^{-1}(·)). The
tanh(sinh^{-1}(·)) function is similar to the most common form
of activation function such as the sigmoid function (hyperbolic
tangent), as shown in Fig. 10.
An analog implementation of a feedforward neural network
using TTS and load circuits is illustrated in Fig. 11. It can
be noticed that the activation function of the output layer is
achieved using a load and a TTS circuit for which the weight
is set to one. It may also be noted that the input data of the
ANN will have to be preprocessed to account for the influence
of the tanh functions on the input layer.
4) Design Aspects: It can be noted in (19) that, by proper
choice of the bias current I_b and the current I_s', the maximum
weight can be set to a desired value. The maximum weight
value was set to 10 when the bias current was 250 nA and
I_s' = 3.125 nA.
The speed of a TTS is determined by the current it can
source/sink at its output and by the resistive-capacitive load it
has to drive. A simplified small-signal equivalent circuit of a
TTS and load is shown in Fig. 12. The current I is controlled
by the output current of the TTS, R represents the resistive
load, and C_L is the total capacitance at the TTS output
C_L = N \cdot C_i + C_c    (20)
where C_i is the average input capacitance of a TTS (including
wiring capacitances), N is the number of TTS inputs connected
at the output of the driving TTS-load, and C_c is the
capacitance of the current-to-voltage converter.
Fig. 12. Simplified small-signal equivalent circuit of a TTS and load.
IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 7, NO. 5, SEPTEMBER 1996
Fig. 13. Block diagram of the analog neural network.
If I_max is the maximum current available at the output of a
TTS, and C_L is assumed to be linear, then the rate of change
of the output voltage is defined by the nonlinear differential
equation
C_L \cdot \frac{dV_{Sj}}{dt} = I_{max} - 4 \cdot I_s' \cdot \sinh\left(\frac{\kappa \cdot V_{Sj}}{2 \cdot V_t}\right)    (21)
For a maximum output current I_max = I_b/2 = 125 nA (single-
ended), a solution of (21) suggests a settling time to within
5% of error of about 0.65 µs for a capacitive load of 0.25 pF
(N ≈ 10).
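The settling estimate can be explored with a simple numerical integration. The sketch below is an illustration under stated assumptions rather than the authors' calculation: it assumes the load current follows the inverse of (18), i.e. 4 · I_s' · sinh(κ · V/(2 · V_t)), takes I_max, I_s' and C_L from the text, and assumes κ = 0.7 and V_t = 25.85 mV, neither of which is given in this section. The function names and the forward-Euler scheme are illustrative choices.

```python
import math

# Values quoted in the text
I_MAX = 125e-9        # maximum single-ended output current, A
I_S_PRIME = 3.125e-9  # I_s', A
C_L = 0.25e-12        # total load capacitance, F

# Assumed values (not given in this section)
KAPPA = 0.7           # subthreshold slope factor
V_T = 0.02585         # thermal voltage at about 300 K, V


def dv_dt(v):
    """(I_max - 4*I_s'*sinh(kappa*v/(2*V_t))) / C_L."""
    load = 4.0 * I_S_PRIME * math.sinh(KAPPA * v / (2.0 * V_T))
    return (I_MAX - load) / C_L


def settling_time(tolerance=0.05, dt=1e-10, t_stop=1e-5):
    """Forward-Euler integration from v = 0 until v is within
    `tolerance` of its steady-state value; returns (time, v_final)."""
    v_final = (2.0 * V_T / KAPPA) * math.asinh(I_MAX / (4.0 * I_S_PRIME))
    v, t = 0.0, 0.0
    while t < t_stop:
        if abs(v_final - v) <= tolerance * v_final:
            return t, v_final
        v += dv_dt(v) * dt
        t += dt
    raise RuntimeError("output did not settle within t_stop")


t_settle, v_final = settling_time()
print(f"steady-state voltage: {v_final * 1e3:.1f} mV")
print(f"5% settling time:     {t_settle * 1e6:.2f} us")
```

The result depends on the assumed κ and V_t, so it need not reproduce the quoted figure exactly; the sketch is mainly useful for seeing how the settling time scales with C_L and with the available output current.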
The choice of W/L ratios for the transistors in the
transconductance-thresholding-synapse is made with
five different issues in mind, namely, the accuracy of
computation, the area of the TTS, the speed of operation,
the maximum weight value, and the power dissipation.
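The tanh(sinh^{-1}(·)) activation that results from cascading the load and the TTS has a simple closed form, since tanh(sinh^{-1}(s)) = s/sqrt(1 + s^2). The short sketch below (function names are illustrative) evaluates both expressions next to an ordinary tanh, to show that the combined function is sigmoid-shaped but saturates more gradually.

```python
import math


def ts_activation(s):
    """Activation of the load followed by the TTS: tanh(sinh^-1(s))."""
    return math.tanh(math.asinh(s))


def algebraic_form(s):
    """Closed form of the same function: s / sqrt(1 + s^2)."""
    return s / math.sqrt(1.0 + s * s)


samples = (-10.0, -1.0, 0.0, 0.5, 1.0, 10.0)
for s in samples:
    print(f"{s:6.1f}  {ts_activation(s):+.4f}  "
          f"{algebraic_form(s):+.4f}  {math.tanh(s):+.4f}")
```

The first two columns of results agree, while the plain tanh in the last column approaches its limits of ±1 faster for the same normalized input.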
5) Simulation: The utility of the proposed multiplying
structure has been assessed via a neural-network simulation
in PSpice of the function [1]
y = 0.8 \cdot \sin(\cdot), \quad -1 < x < 1.    (22)
The structure of the network implementing (22) is shown in 
Fig. 13 and comprises 11 neurons and 31 synapses arranged in 
three layers. It may be noted that an extra TTS circuit has been 
added to each neuron to bias the weighted sum. The inputs to 
these bias synapses were set to unity, i.e., to 2 · V_t/κ.
The neural network was trained using the error back-propagation
algorithm with the learning rate set to 0.2 [10].
Whereas the weights of a network to be trained are usually
initialized to small random values, in this case the training
time was shortened by setting the initial weights to the values
given by the trained digital equivalent. Fig. 14 shows the
trained output of the network after 80 iterations, together with
the desired function and the corresponding error, which has an
RMS value of approximately 6%.
It will be appreciated that a physical silicon implementation
would not necessarily exhibit the performance predicted by
simulation for several reasons including modeling imperfections
and process/device tolerances [8], [9]. However, these
nonidealities can be compensated for during training of the
neural-network system [5].
In practice it would also be necessary to provide additional 
dynamic analog weight storage circuitry to facilitate weight 
adjustment in each synapse during the learning process. These 
weights can be stored as potentials on storage capacitors, and 
set, via access switches, by circuits that convert values stored 
in digital memory into analog signals [11]. It may be noted that
the weights cannot be stored precisely due to charge leakage
through parasitic paths and charge injection from the access
switches. However, switch charge injection can be reduced
to a few millivolts using compensation techniques [12] and the
error introduced by the charge leakage can be limited by 
refreshing the weight voltages periodically [13]. 
V. CONCLUSION 
An analog multiplier based on transistors operating in the 
subthreshold mode of conduction has been presented. Analysis 
and simulation have shown that the new scheme produces a 
difference output current related to the product of difference 
voltages via a tanh function. It has been demonstrated that the 
new multiplier can be applied to the design of analog very 
large scale integration (VLSI) neural networks. The proposed 
circuit is area efficient, has a low power dissipation and the 
nonlinearity due to the tanh function can be used to perform the 
activation function. Although operating at subthreshold current 
levels, reasonable speeds can be obtained since voltage swings 
are in the range of a few V_t. 
REFERENCES 
[1] J. M. Zurada, Introduction to Artificial Neural Systems. St. Paul, MN:
West, 1992.
[2] C. Mead, Analog VLSI and Neural Systems. Reading, MA: Addison-
Wesley, 1989.
[3] M. H. Cohen and A. G. Andreou, "Current-mode subthreshold MOS
implementation of the Herault-Jutten autoadaptive network," IEEE J.
Solid-State Circuits, vol. 27, pp. 714-727, May 1992.
[4] B. Linares-Barranco, E. Sánchez-Sinencio, A. Rodríguez-Vázquez, and
J. L. Huertas, "A modular T-mode design approach for analog neural-
network hardware implementation," IEEE J. Solid-State Circuits, vol.
27, pp. 701-713, May 1992.
[5] B. Linares-Barranco, E. Sánchez-Sinencio, A. Rodríguez-Vázquez, and
J. L. Huertas, "A CMOS analog adaptive BAM with on-chip learning and
weight refreshing," IEEE Trans. Neural Networks, vol. 4, pp. 445-455,
May 1993.
[6] B. Gilbert, "A precise four-quadrant multiplier with subnanosecond
response," IEEE J. Solid-State Circuits, vol. SC-3, pp. 365-373, Dec.
1968.
[7] E. Vittoz and J. Fellrath, "CMOS analog integrated circuits based on
weak inversion operation," IEEE J. Solid-State Circuits, vol. SC-12, pp.
224-231, June 1977.
[8] A. Pavasovic, "Subthreshold region MOSFET mismatch analysis and
modeling for analog VLSI systems," Ph.D. dissertation, Johns Hopkins
Univ., Baltimore, MD, 1990.
[9] A. G. Andreou, K. A. Boahen, P. O. Pouliquen, A. Pavasovic, R. E.
Jenkins, and K. Strohbehn, "Current-mode subthreshold MOS circuits
for analog VLSI neural systems," IEEE Trans. Neural Networks, vol. 2,
pp. 205-213, Mar. 1991.
[10] S. Haykin, Neural Networks: A Comprehensive Foundation. New
York: Macmillan, 1994.
[11] Y. Wang, "A modular analog CMOS LSI for feedforward neural
networks with on-chip BEP learning," in Proc. IEEE Int. Symp. Circuits
and Syst., 1993.
[12] C. Eichenberger and W. Guggenbuhl, "On charge injection in analog
MOS switches and dummy switch compensation techniques," IEEE
Trans. Circuits and Syst., vol. 37, pp. 256-264, Feb. 1990.
[13] A. F. Murray, "Silicon implementations of neural networks," IEE Proc.,
vol. 138, pp. 3-12, Feb. 1991.
Dominique Coué received the Diplôme Universitaire
de Technologie en Génie Electronique et Informatique
Industrielle from the Université d'Angers,
France, in 1990, and the B.Eng. degree in electrical
and electronic engineering from the University of
Plymouth, U.K., in 1992. He is currently a Ph.D.
degree candidate in analog VLSI and neural networks
at Plymouth University.
His research interests include artificial neural
networks and integrated circuit design.
George Wilson received the B.Sc. and Ph.D.
degrees in electrical engineering from Sunderland
Polytechnic, Sunderland, U.K., in 1970 and 1973,
respectively.
From 1974 to 1983 he was an Academic
Staff Member at the James Cook University of
North Queensland, Australia. He spent three years
as a Lecturer in the Department of Electronics
and Information Engineering at Southampton
University, U.K., and is currently a Reader at the
University of Plymouth, U.K. His research interests
include electronic filters, integrated circuits, and neural networks.
