A solid-state electronic two-tap weight linear adaptive neuron by Chen, Chun-Yu Malcolm
Lehigh University
Lehigh Preserve
Theses and Dissertations
1991
A solid-state electronic two-tap weight linear
adaptive neuron
Chun-Yu Malcolm Chen
Lehigh University
Follow this and additional works at: https://preserve.lehigh.edu/etd
Part of the Electrical and Computer Engineering Commons
This Thesis is brought to you for free and open access by Lehigh Preserve. It has been accepted for inclusion in Theses and Dissertations by an
authorized administrator of Lehigh Preserve. For more information, please contact preserve@lehigh.edu.
Recommended Citation
Chen, Chun-Yu Malcolm, "A solid-state electronic two-tap weight linear adaptive neuron" (1991). Theses and Dissertations. 5373.
https://preserve.lehigh.edu/etd/5373
A SOLID-STATE ELECTRONIC TWO-TAP
WEIGHT LINEAR ADAPTIVE NEURON
by 
Chun-Yu Malcolm Chen 
A Thesis 
Presented to the Graduate Committee 
of Lehigh University 
in Candidacy for the Degree of 
Master of Science 
in Electrical Engineering
September 24, 1990 
Certificate of Approval 
This thesis is accepted and approved in partial fulfillment of the
requirements for the degree of Master of Science.
Acknowledgments
The author wishes to express his immense gratitude and respect to Dr. Marvin H. White for his guidance and support. Without his indispensable advice and encouragement, this work would not have been possible. The author would like to extend his appreciation to Richard Siergiej for his ever-ready help and his patience with the author's endless questions. A special word of thanks goes to Peggy French for providing the SONOS transistor samples. The author also takes this opportunity to acknowledge his fellow colleagues, Sukyoon Yoon, Yin Hu, Zhigang Ma, and Paul Orphanos, for creating and sharing a harmonious working environment and for their valuable comments.
The author treasures the precious discussions with Dr. Frank Libsch, Dr. Anirban Roy, Dr. Thomas Krutsick, and Dr. Umesh Sharma during his first year of graduate studies. The author is grateful to Dr. Richard Booth for providing the software package used in part of this thesis. The contributions of a summer student, Harikaran Sathianathan, have been most helpful.
The author is indebted to the Office of Naval Research (ONR) and the Defense Advanced Research Projects Agency (DARPA) Artificial Intelligence Neural Networks Technology Program for funding the work at the Sherman Fairchild Center. Fellowship support from the National Science Foundation (NSF) Engineering Research Center of Advanced Technology for Large Structural Systems (ATLSS) at Lehigh University is greatly appreciated.
This work is dedicated to the parents of the author for their everlasting 
love and support and their spiritual influence toward excellence. 
Table of Contents
1. Introduction
1.1 Background
1.2 Electrical Implementation of Synaptic Weights
1.2.1 Analog versus Digital Hardware Implementation
1.2.2 Nonvolatile Memory Devices in Neural Networks
1.3 Scope of this Thesis
2. Theory of the Linear Adaptive Neuron
2.1 Discussion of Terminology
2.2 Architectures of Various Neural Networks
2.2.1 Structure of Neural Networks
2.2.2 Classification of Neural Networks
2.2.3 Topology of Neural Networks
2.3 Advantages and the Applications of Neural Networks
2.4 Synaptic Weight Formulation
2.5 The Programming Algorithm
2.5.1 The Least Mean Square Error Algorithm
2.5.2 The Clipped Data Least Mean Square Error Algorithm
2.5.3 Other Forms of Least Mean Square Error Algorithm
3. Technology and Characteristics of Modifiable Synapses
3.1 Background
3.2 Fabrication Sequence
3.3 Characterization of the SONOS Devices
3.3.1 High Frequency C-V Measurements
3.3.2 Linear Voltage Ramp Technique
3.3.3 Dynamic Range Characterization
3.3.4 Erase/Write Characterization
3.3.5 Retention Characterization
4. Theoretical Analysis of the Linear Adaptive Neuron
4.1 Operational Theory
4.2 Theoretical Analysis of the Two-Tap Weight Linear Adaptive Neuron with the LMS Learning Algorithm
4.3 Theoretical Analysis of the Two-Tap Weight Linear Adaptive Neuron with the Clipped-data LMS Learning Algorithm
5. Experimental Setup and Results
5.1 The Linear Adaptive Neuron Description
5.1.1 Digital Control/Clocking Module
5.1.2 The Analog Delay Line
5.1.3 The Analog Signal Processor
5.1.4 The Learning Algorithm Module
5.1.5 The Steering Network
5.2 Measurement Setup and Results
5.2.1 Output and Training Signals versus Time Characteristics
5.2.2 Error Signal versus Time Characteristics
6. Conclusions
References
Appendix A. Learning Algorithms
A.1 Hopfield Net
A.2 The Hamming Net
A.3 The Carpenter/Grossberg Classifier
A.4 Single Layer Perceptron
A.5 Multi-Layer Perceptron
A.6 Kohonen's Self Organizing Feature Maps
Appendix B. Derivation of the Varying Convergence Factor
Appendix C. Software Simulation of the Linear Adaptive Neuron
Vita
List of Figures 
Figure 1-1: Biological Neuron¹
Figure 1-2: Electrical Implementation of Neural Networks
Figure 1-3: Comparison between the Biological and Electrical Neuron
Figure 2-1: Different Types of Nonlinearities in Neurons
Figure 2-2: Simplified Block Diagram of the Linear Adaptive Neuron
Figure 2-3: Conceptual View of the Multi-layer Neural System
Figure 2-4: Highlight of the Thesis in the Multi-layer Neural System
Figure 2-5: Schematic Comparison between AI and Neural Networks⁹
Figure 2-6: Schematic Diagram of the Linear Combiner
Figure 2-7: Comparison of Various Algorithms for an Adaptive Equalizer¹⁵
Figure 3-1: Comparison between a Floating Gate Device and a Floating Trap Device
Figure 3-2: Comparison between the MNOS and the SONOS Device
Figure 3-3: Photograph of the TP300 Design
Figure 3-4: Photograph of the Transistor Array in the TP300
Figure 3-5: Block Diagram of the High Frequency C-V Measurement Setup
Figure 3-6: High Frequency C-V Curve of a SONOS Device¹⁷
Figure 3-7: Block Diagram of the Linear Voltage Ramp Measurement Setup
Figure 3-8: Linear Voltage Ramp C-V Curve of a SONOS Device
Figure 3-9: Dynamic Range Characteristics of the SONOS Devices, W=150 microns, L=100 microns
Figure 3-10: Block Diagram of the Dynamic Measurement Setup
Figure 3-11: Four Testing Modes Performed by the Dynamic Measurement Setup¹⁸
Figure 3-12: Erase/Write Curve of a SONOS Device, W=150 microns, L=100 microns
Figure 3-13: Retention Curve, W=150 microns, L=100 microns
Figure 5-1: Block Diagram of the Single Level Linear Adaptive Neuron
Figure 5-2: Timing Diagram of the Digital Control/Clocking Module
Figure 5-3: Schematic of the Digital Control/Clocking Module
Figure 5-4: Timing Diagram of the Digital Control/Clocking Module with Idle Clock Signal
Figure 5-5: Schematic of the Digital Control/Clocking Module with Idle Clock Signal
Figure 5-6: Schematic of the Analog Delay Line with Input Formation Circuitry
Figure 5-7: Schematic of the Analog Signal Processor
Figure 5-8: Schematic of the Learning Algorithm Module
Figure 5-9: Schematic of the Steering Network
Figure 5-10: Output and Training Signals versus Time Characteristics: (a) Initialized and (b) Adapted
Figure 5-11: Error versus Time Characteristics - Initialization Scheme: + - + -
Figure 5-12: Error versus Time Characteristics - Initialization Scheme: + - - +
Figure 5-13: Error versus Time Characteristics - Initialization Scheme: - + + -
Figure 5-14: Error versus Time Characteristics - Initialization Scheme: - + - +
Figure A-1: Taxonomy of the Neural Networks
Figure C-1: Simulated Convergence Behavior of the Linear Adaptive Neuron with W0=10 and W1=-10: (a) Variable Convergence Factor, (b) Fixed Convergence Factor
List of Tables 
Table 3-1: Operation Mode of SONOS Devices with a Dual Power Supply
Table 3-2: Operation Mode of SONOS Devices with a Single Power Supply
Table 5-1: Clock Signal Definition
Table 5-2: Clock Signal Generation Operation
Table 5-3: Analog Delay Line Operation
Table 5-4: Correlated Double Sampling Circuit Operation
Table 5-5: Steering Network Circuit Operation
Table 5-6: Summary of the Experimental Results
Abstract 
A solid-state electronic two-tap weight linear adaptive neuron has been designed and breadboarded. The electronic neuron employs a novel nonvolatile memory device, configured as an electrically reprogrammable analog conductance, as the synaptic element. A learning rule known as the Widrow-Hoff delta rule is also incorporated in the neuron. In particular, the neuron trains itself according to the so-called Clipped-data Least Mean Square Error (CLMSE) learning algorithm implemented in the linear adaptive neuron.
The electrical properties of the nonvolatile memory device used to realize the analog reprogrammable conductance are examined. These properties include the erase/write, memory retention, and dynamic range characteristics. We conclude from our investigation that the nonvolatile memory device possesses many attractive features which make it an ideal candidate for the implementation of the synaptic element.
The electrical performance of the linear adaptive neuron has been evaluated and is presented. Theoretical analysis and a software simulation routine are also included in this thesis. The unique combination of the learning algorithm and the synaptic element provides an opportunity to study the basic functional block of a large neural network.
Chapter 1
Introduction
1.1 Background
Neural networks have received tremendous attention over the past few years. Parallel architecture, a distinctive feature of neural networks, offers engineers an additional dimension for realizing learning machines, beyond software simulation on existing serial computers. A neural network is a system composed of many simple processing elements operating in parallel, whose function is determined by the network structure, the connection strengths, and the processing performed at the computing elements or nodes. Neural networks can perform high-level tasks such as adaptation, or low-level tasks such as speech recognition and the preprocessing of sensory input data for vision tasks.¹
In the language of neurobiology, adaptation requires modification of the synaptic strengths between neurons. A neuron fires an electrical impulse when the sum of all its input signals, each coupled through its synaptic strength, exceeds a certain threshold value. Figure 1-1 shows what a neuron might look like in a biological neural system. In the electrical implementation of neural networks, a neuron cell body is represented by a summing amplifier, and the variable synaptic weights are represented by variable resistors. The neural networks also need learning algorithm implementations for updating or adjusting the synaptic weights. Figure 1-2 shows a conceptual diagram of an electrical neural network implementation, and Figure 1-3 compares the biological neuron with its electrical counterpart. A more detailed discussion of how the electrical neuron mimics the biological neuron in each category will be presented in this thesis.
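The firing rule just described (fire when the weighted sum of the inputs exceeds a threshold) fits in a few lines. The Python sketch below is illustrative only: the function name `fires` and the sample weights and threshold are hypothetical, and it models the idealized rule, not the thesis hardware.

```python
# Idealized firing rule: the neuron outputs 1 ("fires") when the sum of its
# inputs, each scaled by its synaptic weight, exceeds a threshold.
# Names and numbers here are hypothetical illustrations.

def fires(inputs, weights, threshold):
    """Return 1 if sum(w_i * x_i) exceeds threshold, otherwise 0."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    activation = sum(w * x for w, x in zip(weights, inputs))
    return 1 if activation > threshold else 0

# Positive weights behave like excitatory synapses, negative ones like
# inhibitory synapses.
print(fires([1.0, 1.0, 1.0], [0.6, 0.5, -0.2], threshold=0.5))  # sum 0.9 -> 1
print(fires([1.0, 0.0, 1.0], [0.6, 0.5, -0.2], threshold=0.5))  # sum 0.4 -> 0
```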
Figure 1-1: Biological Neuron¹ (labels: synapse, dendrite)
1.2 Electrical Implementation of Synaptic Weights 
1.2.1 Analog versus Digital Hardware Implementation
From a biological point of view, neurons are almost always nonlinear, typically analog in nature, and may be slow compared to modern digital circuitry. Neurons may also include temporal integration and other types of time dependencies, as well as mathematical operations more complex than summation.¹ Whether artificial electrical neural networks should be implemented in analog or in digital fashion is an ongoing debate. Some researchers have implemented the synaptic weights and the network itself digitally.² Digital implementation of electrical neural networks enjoys advantages such as ease of design, owing to the vast number of available digital component libraries, the speed of modern digital circuitry, and better immunity to random noise present in the system. However, it also suffers from drawbacks: (1) the complexity of the system increases sharply as the designer increases the accuracy of each synaptic weight value (bits/weight), (2) the area consumption of the system increases as the number of synaptic weight interconnections increases, and (3) power consumption will be a bottleneck for large digital neural networks. Furthermore, the digital approach suffers from quantization error in the weight values.

Figure 1-2: Electrical Implementation of Neural Networks (labels: axon, synapse, learning algorithm, training signal d, error)

Figure 1-3: Comparison between the Biological and Electrical Neuron

Biological Neuron                Electrical Neuron
Dendrite                         Input Line
Axon                             Output Line
Cell Body                        Summing Operational Amplifier
Synapses                         SONOS Nonvolatile Transistors
Biological Clock                 Synchronized Clock
Nonlinear                        Linear Approximation
Excitatory/Inhibitory Synapse    Positive/Negative Differential Conductance
Reinforced Learning              Reinforced Reading
Memory Retention/Loss            Weight Information Retention/Loss

On the other hand, analog artificial neural networks, although suffering from problems in design and speed, offer attractive features such as lower power consumption, less complexity for large networks, small chip area per weight, and the absence of quantization error in the synaptic weight values. Therefore, an efficient hardware implementation of neural networks calls for an analog network with analog synaptic weights. Efforts to implement the synaptic weights with temporary charge storage on the input capacitor of a MOS transistor, which alters the analog conductance of that transistor, have been reported in the literature.³,⁴ This approach gains an advantage in chip area consumption, but the weight is temporary and requires periodic refresh, similar to a DRAM. Therefore, neither the digital approach nor the capacitor-storage approach is well suited to synaptic weight implementation.
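The quantization error of a digitally stored weight can be made concrete. The sketch below assumes a uniform quantizer over a hypothetical weight range of [-1, 1); the function name `quantize` and the sample weight are illustrative, not taken from any cited digital design. Each added bit halves the worst-case error (half of one quantization step), which is the accuracy the text says is bought with sharply growing hardware complexity.

```python
# Weight quantization in a hypothetical digital synapse: the weight is
# rounded to the nearest of 2**bits evenly spaced levels in [w_min, w_max).

def quantize(weight, bits, w_min=-1.0, w_max=1.0):
    """Return the stored (quantized) value of weight for a given word length."""
    levels = 2 ** bits
    step = (w_max - w_min) / levels
    index = round((weight - w_min) / step)
    index = max(0, min(levels - 1, index))   # clip to the representable codes
    return w_min + index * step

w = 0.3141
for b in (2, 4, 8):
    q = quantize(w, b)
    print(f"{b} bits: stored {q:+.4f}, error {abs(w - q):.4f}")
```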
1.2.2 Nonvolatile Memory Devices in Neural Networks
A nonvolatile memory cell, on the other hand, provides features such as read enhancement and retention loss, which mimic the biological synapse. Two basic types of nonvolatile memory exist: the floating gate device and the floating trap device. The floating gate device stores weight information in the form of charge on a 'floating' polysilicon layer under a polysilicon control gate. Holler et al.⁵ have investigated the implementation of modifiable synaptic weights using floating gate devices. Although the floating gate device shares the same desirable features as the floating trap device, its high programming voltage is a barrier to realizing large electrical neural networks. The conventional floating trap device, known as the Metal Nitride Oxide Silicon (MNOS) memory transistor, has previously been used as a modifiable synaptic weight.⁶ The scaled floating trap device, such as the Silicon Oxide Nitride Oxide Silicon (SONOS) device, provides the salient features of low programming voltage, low power dissipation, wide dynamic range of analog conductance, and small chip area consumption.⁷ All these attractive features suggest that the SONOS transistor cell should be used in future electrical implementations of neural networks.
1.3 Scope of this Thesis 
My research was initiated to demonstrate that SONOS nonvolatile memory transistors can be used effectively as electrically modifiable synaptic weights in neural network implementations. To this end, a two-tap weight linear adaptive neuron has been designed and breadboarded. I have selected a particular learning algorithm for the linear adaptive neuron, namely the clipped-data least mean square error (CLMSE) algorithm. The research is intended to provide a means of examining a basic functional building block for neural networks.
The work presented in this thesis includes an exploration of the operational theory of the two-tap weight linear adaptive neuron and a description of the available programming algorithms. This thesis also devotes a chapter to the technology of the electrically modifiable synapses, namely the SONOS nonvolatile memory transistors. In addition, I give a detailed discussion of the experimental setup and the operation of the two-tap weight linear adaptive neuron.
This thesis identifies the attractive features of SONOS nonvolatile memory transistors as electrically reprogrammable synaptic elements for the hardware implementation of neural networks. Furthermore, the thesis demonstrates the feasibility of integrating CMOS VLSI technology with SONOS technology. This integration of both technologies is essential for the realization of large neural networks in the future.
Chapter 2
Theory of the Linear Adaptive Neuron
2.1 Discussion of Terminology
The biological neural system is made up of billions of nerve cells, called neurons, massively interconnected with each other to perform all everyday cognitive activities. Each neuron is composed of three main parts: the cell body, the input branches called dendrites, and the output branch called the axon, as previously shown in Figure 1-1. Input and output signals are transmitted in the form of electrical impulses. The cell body acts as a summing point for all the dendrite inputs and produces the output for transmission by the axon. As mentioned before, the basic computing node sums N weighted inputs and passes the result through a nonlinearity. Figure 2-1 shows three common types of nonlinearities: hard limiters, threshold logic elements, and sigmoidal functions.⁸
Figure 2-1: Different Types of Nonlinearities in Neurons (panels: Hard Limiter, Threshold Logic, Sigmoid; each plots the output "out" against the input "in", with the +1 and -1 levels marked)
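The three nonlinearities can be written as simple functions of the summed input. This is a sketch under stated assumptions: the exact output ranges drawn in Figure 2-1 cannot be recovered here, so bipolar outputs between -1 and +1 are assumed, the ramp of the threshold logic element is assumed to have unit slope, and tanh is used as one common sigmoidal function.

```python
import math

# Assumed bipolar (-1 to +1) versions of the three nonlinearities.

def hard_limiter(a):
    """Step from -1 to +1 at a = 0."""
    return 1.0 if a >= 0 else -1.0

def threshold_logic(a):
    """Unit-slope linear ramp, saturating (clipping) at -1 and +1."""
    return max(-1.0, min(1.0, a))

def sigmoid(a):
    """Smooth S-shaped curve between -1 and +1; tanh is one common choice."""
    return math.tanh(a)

for a in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(f"in {a:+.1f}: hard {hard_limiter(a):+.2f}, "
          f"ramp {threshold_logic(a):+.2f}, sigmoid {sigmoid(a):+.3f}")
```

Only the hard limiter discards all amplitude information; the other two preserve it over part or all of their input range.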
The entire neural system is massively coupled, meaning that the dendrites of a particular neuron can be crossed by the axons of all the neighboring neurons. Sometimes the axon of a particular neuron can even cross the dendrites of its own neuron. Wherever an axon and a dendrite cross each other, a synaptic interconnection is formed between them. The electrical impulses travelling along the dendrites and axons are caused by the potential difference of the sodium and potassium ions on each side of the membrane that separates the dendrites and axons from the surrounding fluid. The signal from an axon is transferred to the dendrite at these synaptic interconnections by a neurotransmitter. The learning process requires modifying how the signals are carried or coupled from the axons to the dendrites. The cell body determines whether or not to fire an electrical impulse depending on the result of the summation process. If the result of the summation exceeds a threshold value, the neuron fires an electrical impulse. Conversely, if the result of the summation is lower than the threshold value, the neuron does not fire any electrical impulse down the axon. Generally speaking, the threshold value is a nonlinear function of the input signal strength. The strength of the signal travelling along a dendrite is determined by the strength of the synapses connected to that dendrite. Two types of synaptic connections have been found in the human body, namely excitatory and inhibitory synaptic connections. An excitatory synaptic connection adds the signal to the summation in the cell body, while an inhibitory synaptic connection reduces the summation in the cell body.
For artificial electrical neural networks to successfully mimic biological neural networks, they must have an equivalent counterpart for each biological component. In artificial analog electrical neural networks, an operational amplifier takes the place of the cell body. The operational amplifier is configured to serve two purposes: one is to sum all the current entering the inverting or noninverting terminal, and the other is to convert the summed current into a corresponding voltage for further processing. The output of the operational amplifier represents the axon of the biological neuron. The synapses of the biological neural system are replaced by variable analog conductances. Since the inputs of the electrical neural system, which act like the dendrites of the biological neural system, are voltages, the variable analog conductances convert the voltage inputs into currents, which are summed by the operational amplifier. To achieve programmability, the electrical neural network must operate with a learning algorithm. The purpose of the learning algorithm is to change the system characteristics such that the output signal matches the teacher signal. Figure 2-2 shows a simplified block diagram of the linear adaptive neuron. The learning algorithm addressed in this thesis is called the clipped-data least mean square error (CLMSE) algorithm. Compared with the more familiar least mean square error (LMSE) algorithm, the CLMSE algorithm clips the amplitude information of the input variable, retaining only its sign. A more detailed discussion of the learning algorithms appears later in this thesis.
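The conductance-and-op-amp arrangement above amounts to Ohm's law followed by current summation. The sketch assumes an ideal op-amp in the inverting current-summing configuration with a feedback resistor R_f, so the output is V_out = -R_f * sum(G_i * V_i); the function names and component values are hypothetical. Because physical conductances are non-negative, a negative weight would need something like the positive/negative differential conductance listed in Figure 1-3, which this sketch does not model.

```python
# Ideal inverting current-summing amplifier (hypothetical component values).
# Each synapse is a conductance G_i in siemens; Ohm's law turns the input
# voltage V_i into a current I_i = G_i * V_i. The currents add at the
# virtual-ground inverting input, and the feedback resistor R_f converts the
# total into V_out = -R_f * sum(I_i).

def synapse_currents(voltages, conductances):
    """Current contributed by each synapse, in amperes."""
    return [g * v for g, v in zip(conductances, voltages)]

def summing_amplifier(voltages, conductances, r_feedback):
    """Output voltage of the ideal current-summing amplifier."""
    if len(voltages) != len(conductances):
        raise ValueError("one conductance is needed per input voltage")
    total_current = sum(synapse_currents(voltages, conductances))
    return -r_feedback * total_current

# Two inputs, 10 uS and 20 uS synapses, 10 kOhm feedback resistor:
# total current = 1 uA - 4 uA = -3 uA, so V_out = +30 mV.
v_out = summing_amplifier([0.1, -0.2], [10e-6, 20e-6], r_feedback=10e3)
print(f"V_out = {v_out:.4f} V")   # prints V_out = 0.0300 V
```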
2.2 Architectures of Various Neural Networks
2.2.1 Structure of Neural Networks 
The biological neural system consists of numerous neurons interconnected with each other. It is known that not all neurons and nerve cells look alike. The muscle nerve cell, for example, is long and thin in shape and is responsible for muscle contraction. Most researchers agree that the biological neural system is made up of different 'layers'. Each layer is composed of many neurons serving similar functions. The neurons in the same layer are not only massively interconnected with each other but also have connections to neighboring layers. A conceptual multi-layer architecture of a neural network is shown in Figure 2-3.

Figure 2-2: Simplified Block Diagram of the Linear Adaptive Neuron (labels: input x, output y, training signal d, error)

The first layer is normally the input layer, which can be considered as the sensory neurons of the human body. This layer of neurons gathers all the input data and passes it down to the next layer, the hidden layer, which resides between the input and output layers of the neural system. The hidden layer is responsible for gathering the input information from the input layer and processing it before sending the result to the output layer or to another hidden layer. While the neurons in the input layer may not have interconnections among them, the neurons in the hidden layer are believed to be highly interconnected. The neurons in the output layer sum the results from the hidden layer and determine their states from the results they receive. This thesis concentrates on the part that consists of two hidden-layer neurons and one output-layer neuron; this part is highlighted in Figure 2-4.
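The layer-to-layer flow just described, with two hidden-layer neurons feeding one output neuron, can be sketched as a forward pass. Linear (identity) activations are assumed, in keeping with a linear adaptive neuron; the number of inputs, the weights, and the helper name `layer_forward` are all hypothetical.

```python
# Forward pass: input layer -> two hidden-layer neurons -> one output neuron.
# Identity activation by default; every number below is hypothetical.

def layer_forward(inputs, weight_rows, activation=lambda a: a):
    """Return one output per neuron; each row holds that neuron's weights."""
    return [activation(sum(w * x for w, x in zip(row, inputs)))
            for row in weight_rows]

x = [0.5, -1.0, 0.25]                  # outputs of the input layer
hidden_weights = [[0.2, -0.4, 1.0],    # hidden neuron 1
                  [0.7, 0.1, -0.3]]    # hidden neuron 2
output_weights = [[1.5, -2.0]]         # the single output neuron

hidden = layer_forward(x, hidden_weights)
output = layer_forward(hidden, output_weights)
print(hidden, output)   # approximately [0.75, 0.175] and [0.775]
```

Passing a nonlinearity as `activation` would turn the same routine into a conventional nonlinear multi-layer network.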
Figure 2-3: Conceptual View of the Multi-layer Neural System (layers, top to bottom: Output Layer, Hidden Layer 2, Hidden Layer 1, Input Layer)
Figure 2-4: Highlight of the Thesis in the Multi-layer Neural System (layers, top to bottom: Output Layer, Hidden Layer, Input Layer)
2.2.2 Classification of Neural Networks
According to the method by which neural networks train or program themselves, electrical neural networks can be classified into three main categories: supervised training, unsupervised training, and self-supervised training. A supervised training neural network requires the presence of an external training signal, or teacher, and the labelling of the data used to train the network. The teacher knows the correct response desired from the network and gives a corresponding error signal when the network produces an incorrect response. The network then uses the error to learn and correct its response. Therefore, this type of neural network requires two input signals: one for the system input and the other for the teacher, or training, signal. An unsupervised neural network, on the other hand, uses unlabeled training data and thus requires no external teacher. When data are presented to the network, it forms internal clusters that compress the input data into classification categories. The third type, the self-supervised training neural network, monitors its performance internally and therefore does not require an external teacher. A corresponding error signal is generated just as in its supervised counterpart; however, this error signal is generated by the system itself. The error signal is fed back to the system, and the correct response is produced after a number of iterations. A general review of various classes of neural networks is included in Appendix A. The linear adaptive neuron presented in this thesis belongs to the class of single-level perceptrons, which perform supervised training.
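Supervised training of a single-level perceptron can be sketched as follows: each training sample carries a teacher label d, the error d - y is formed from the network output y, and the weights change only when that error is nonzero. For illustration the sketch uses the classic Rosenblatt perceptron rule. This is not the CLMSE rule adopted in this thesis. The bipolar AND training set, the learning rate, and the function names are hypothetical.

```python
# Supervised training of a single-level perceptron with Rosenblatt's rule.
# The teacher label d supplies the error signal d - y.

def sign(a):
    """Hard-limiter output of the perceptron."""
    return 1 if a >= 0 else -1

def train_perceptron(samples, n_inputs, rate=0.5, max_epochs=100):
    """samples: list of (inputs, teacher) pairs with teacher in {+1, -1}."""
    w = [0.0] * (n_inputs + 1)        # w[0] is a bias weight on a constant input of 1
    for _ in range(max_epochs):
        mistakes = 0
        for x, d in samples:
            xb = [1.0] + list(x)
            y = sign(sum(wi * xi for wi, xi in zip(w, xb)))
            error = d - y             # error signal derived from the teacher
            if error != 0:
                mistakes += 1
                w = [wi + rate * error * xi for wi, xi in zip(w, xb)]
        if mistakes == 0:             # every sample answered correctly: stop
            break
    return w

# Teacher labels for a bipolar AND function (hypothetical training set).
and_samples = [((-1, -1), -1), ((-1, 1), -1), ((1, -1), -1), ((1, 1), 1)]
w = train_perceptron(and_samples, n_inputs=2)
print(w)   # [-1.0, 1.0, 1.0]
```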
2.2.3 Topology of Neural Networks 
The actual implementation of the interconnections in the biological neural system is still an open question. However, only a limited number of electrical interconnection schemes are available for implementing neural networks. Currently, three types of interconnection within the same layer are under investigation: the locally connected network, the fully connected network, and the sparsely connected network. In the locally connected network, interconnections are made between neighboring neurons only. In the fully connected network, interconnections are made from one neuron to all other neurons in the same layer. In the sparsely connected network, where applicable, interconnections are created from one neuron to a few distant neurons in the system. There are two types of connection between different layers in the system, namely feedforward and feedback connections. In a feedforward system, connections are made only from a lower layer of neurons to a higher layer. Signals propagate forward through such a network once, ending when the output emerges from the highest layer. An example of this type of connection is the set of connections from the neurons in the input (sensory) layer to the neurons in the hidden layer. In a feedback system, on the other hand, part of the higher-layer output is fed back as input to a lower layer. This type of operation depends on an iterative process to produce the final output. An example of this type of connection is the set of connections between the output-layer neurons and the hidden-layer neurons. It should be pointed out that this thesis addresses the electrical implementation of a neural network that operates on the principle of supervised training, utilizing a feedback (backpropagation) connection between the input and output layers.
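The three intra-layer topologies, and the feedforward/feedback distinction between layers, can be expressed as connection masks. In the sketch below, mask[i][j] == 1 means neuron i receives a synaptic input from neuron j, and self-connections are excluded. The neighbourhood radius, the sparse link list, and all function names are hypothetical choices made for illustration.

```python
# Connection masks for one layer of n neurons: mask[i][j] == 1 means
# neuron i receives a synaptic input from neuron j.

def local_mask(n, radius=1):
    """Locally connected: only neighbours within `radius` positions."""
    return [[1 if i != j and abs(i - j) <= radius else 0 for j in range(n)]
            for i in range(n)]

def full_mask(n):
    """Fully connected: every neuron connects to every other neuron."""
    return [[1 if i != j else 0 for j in range(n)] for i in range(n)]

def sparse_mask(n, links):
    """Sparsely connected: only the listed (receiver, sender) pairs."""
    mask = [[0] * n for _ in range(n)]
    for i, j in links:
        mask[i][j] = 1
    return mask

def is_feedforward(layer_links):
    """Between layers: feedforward iff every (source, destination) link
    goes from a lower layer index to a higher one."""
    return all(src < dst for src, dst in layer_links)

def count(mask):
    """Number of synaptic connections in a mask."""
    return sum(map(sum, mask))

n = 6
print(count(local_mask(n)), count(full_mask(n)), count(sparse_mask(n, [(0, 4), (5, 1)])))
print(is_feedforward([(0, 1), (1, 2)]), is_feedforward([(0, 1), (2, 1)]))
```

The counts show why wiring cost matters: a fully connected layer needs n(n - 1) synapses, while a locally connected one with radius 1 needs only 2(n - 1).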
2.3 Advantages and the Applications of Neural Networks 
Electrical neural networks have many advantages over the traditional serial computer. First of all, neural networks are naturally parallel, while the von Neumann architecture on which modern computers are built is inherently serial. Neural networks find rules for Large Complex Systems (LCS) rather than following models implemented in serial computers, which might be too simple or rigid, or too detailed and complicated for the time scales involved.⁹ In addition, neural networks are adaptive, or trainable. Artificial Intelligence (AI) implemented on von Neumann computers has to depend on preset rules or models in order to learn or adapt; in other words, AI has to operate in an environment that is knowable and controllable. In contrast, adaptive learning in neural networks can operate in an unknown or noisy environment. Because of their massive parallelism, neural networks are far more fault tolerant than serial computers. If some of the neurons in a neural network are destroyed, the collective effort of the network will still produce a correct response. On the other hand, if part of the AI software code is destroyed, the serial computer may not produce the desired response. Since all electrical components deteriorate with time, the performance of serial computers will degrade as time progresses; the parallelism of neural networks circumvents this problem, since the output is determined by the collective effort of all the components, not just one or a few. In addition, neural networks are insensitive to small variations between computing elements. One of the most important advantages of a neural network is its ability to perform tasks in real time. If von Neumann computers are used to simulate the real-time performance of neural networks, they must have tremendous computing speed and memory space, which makes the system unreliable. Neural networks provide an alternative approach to real-time system control and real-time signal processing, areas that have gained increasing attention over the years. Figure 2-5 shows a schematic comparison between AI and neural networks.
Figure 2-5: Schematic Comparison between AI and Neural Networks⁹

Artificial Intelligence    Neural Network
Serial                     Parallel
Software                   Hardware
A Priori                   A Posteriori
Left Brain/CC              Right Brain
Deductive                  Inductive
Vulnerable                 Reliable

The emergence of neural networks promises several applications that their serial computer counterparts perform inefficiently. Pattern recognition, a computation-intensive task, is one of the natural applications for neural networks. If pattern recognition is performed by a serial computer, the computation and memory requirements of the task force the computer to process in a non-real-time environment. This bottleneck must be solved if the task requires real-time control and signal processing. On the other hand, the inherent massively parallel computation of neural networks can reduce the workload of the individual computing elements, so that real-time processing can be achieved. Other applications of neural networks include associative memory. In the conventional approach, each stored word is retrieved by providing its address, whereas in an associative memory, a memory word is retrieved by providing part of the word itself. If the given example is a reasonable match to the corresponding part of the stored word, the entire corrected word appears at the memory output.¹⁰ The data compression and speech prediction capabilities of neural networks make them an excellent candidate in the area of speech communications: communication systems need only transmit vital information to the receiving end, where the entire speech signal can be reconstructed by neural networks.
One of the applications for neural networks is to configurate the neural network 
as an adaptive signal processor for system modeling. Once the adaptive signal 
processor converges, the transfer function of the neural network represents the 
transfer function of the 11nknown system. Recently, parallel distribution 
# 
processing has provided another territory for neural networks. As time 
progresses, neural networks have found their usefii)ness in fields where 
traditional techniques are deter11aine·d to be ineffective. In the future, neural 
networks may integrate with modern signal processing techniques to attain 
optimum system performance. 
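To make the contrast between address-based and content-based retrieval concrete, here is a minimal Python sketch of associative recall. It is not a neural-network implementation: it assumes binary stored words and simply returns the stored word whose supplied bits best match a partial, possibly corrupted probe (smallest Hamming distance over the known positions). The function name `recall` and the use of `None` for unknown bits are illustrative assumptions.

```python
from typing import List, Optional, Sequence

def recall(memory: Sequence[Sequence[int]], probe: Sequence[Optional[int]]) -> List[int]:
    """Return the stored word that best matches the known bits of `probe`.

    `probe` has the same length as the stored words; `None` marks a bit that
    is not supplied. Disagreements are counted only over supplied positions.
    """
    def mismatches(word: Sequence[int]) -> int:
        return sum(1 for w, p in zip(word, probe) if p is not None and w != p)

    return list(min(memory, key=mismatches))  # first word with the fewest mismatches

stored = [
    [1, 0, 1, 1, 0, 0, 1, 0],
    [0, 1, 1, 0, 1, 0, 0, 1],
    [1, 1, 0, 0, 0, 1, 1, 1],
]
# Only the first half of the second word is given, and bit 1 of it is wrong.
probe = [0, 0, 1, 0, None, None, None, None]
print(recall(stored, probe))  # → [0, 1, 1, 0, 1, 0, 0, 1]
```

The full, corrected second word is returned even though the probe supplied only four bits, one of them in error.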
2.4 Synaptic Weight Formulation

The synaptic weights in electrical neural networks require programmable analog conductances. The programmable analog conductance in our work uses the variable threshold voltage of nonvolatile memory devices, such as an EEPROM. If we restrict the operation of the nonvolatile memory transistor to the linear region, then the drain current of the transistor can be written as

$$ I_{ds} = \beta_{eff}\left[(V_{gs} - V_{th})\,V_{ds} - \tfrac{1}{2}V_{ds}^{2}\right] \tag{2.1} $$

where

$$ \beta_{eff} = \mu_{eff}\,(W/L)\,C_{eff} \tag{2.2} $$

$\mu_{eff}$ is the effective electron mobility, $C_{eff}$ is the effective gate dielectric capacitance per unit area, and $W/L$ is the transistor width-to-length ratio. The channel conductance of the transistor can be expressed as

$$ g_{ds} = \frac{I_{ds}}{V_{ds}} = \beta_{eff}\left[(V_{gs} - V_{th}) - \tfrac{1}{2}V_{ds}\right] \tag{2.3} $$

where $V_{th}$ is the electrically modifiable threshold voltage of the nonvolatile memory device. There are two ways to alter the analog channel conductance of a semiconductor transistor: (1) fix the threshold voltage and vary the gate-to-source voltage, or (2) fix the gate-to-source voltage and vary the threshold voltage. A conventional Metal Oxide Semiconductor (MOS) transistor has a fixed threshold voltage, so programming its analog conductance depends on charging and discharging the gate capacitor. However, the charge on the gate capacitor slowly leaks away through the leakage current of the address switch, resulting in a loss of weight information. Some form of additional control/refresh circuitry is therefore required to maintain proper storage of the weight information (i.e., this approach lacks the nonvolatile property of the neuron). In our work, we take the second approach: we program the analog conductance of the weight element by programming the threshold voltage of the nonvolatile memory transistor. The nonlinear $V_{ds}$ term in equation (2.3) can be ignored if either (1) $V_{ds}$ is kept small, or (2) the neural network is implemented so that this term in the conductance is removed after the summation operation. In our work, the latter approach is employed.
The electronic synapses must be bipolar in weight value; that is, a weight must be able to take either a positive or a negative value at any given time. Positive weight values represent excitatory synapses and negative weight values represent inhibitory synapses. Since a single transistor can only have a positive conductance, we use two analog programmable conductances that share a common drain voltage as one weight element. The two conductances are configured so that the actual weight value is proportional to the difference between them. If we denote one conductance of the weight element by $g_{ds+}$ and the other by $g_{ds-}$, then for matched transistors with the same $\beta_{eff}$ and $V_{gs}$ we can write the weight as

$$ W \propto g_{ds+} - g_{ds-} = \beta_{eff}\,(V_{th-} - V_{th+}) \tag{2.4} $$

where $V_{th+}$ and $V_{th-}$ are the threshold voltages associated with $g_{ds+}$ and $g_{ds-}$. Because the two transistors share the same drain voltage, the $V_{ds}/2$ term of equation (2.3) cancels in the difference. Therefore, if $V_{th+}$ is smaller than $V_{th-}$, the weight is positive; conversely, if $V_{th+}$ is larger than $V_{th-}$, the weight is negative.
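A short numerical check of equations (2.3) and (2.4), written as a Python sketch. It assumes two matched transistors (same $\beta_{eff}$, same $V_{gs}$ and $V_{ds}$); the parameter values are arbitrary illustrative numbers, not measured SONOS data. It shows that the weight does not change with the drain bias and depends only on $V_{th-} - V_{th+}$.

```python
def g_ds(beta_eff: float, v_gs: float, v_th: float, v_ds: float) -> float:
    """Linear-region channel conductance I_ds / V_ds, equation (2.3)."""
    return beta_eff * ((v_gs - v_th) - 0.5 * v_ds)

def weight(beta_eff: float, v_gs: float, v_ds: float,
           v_th_plus: float, v_th_minus: float) -> float:
    """Differential weight g_ds+ - g_ds- of a two-transistor synapse, equation (2.4)."""
    return g_ds(beta_eff, v_gs, v_th_plus, v_ds) - g_ds(beta_eff, v_gs, v_th_minus, v_ds)

# Illustrative (assumed) values: beta_eff in A/V^2, voltages in volts.
beta, vgs = 50e-6, 3.0
for vds in (0.05, 0.5):
    w = weight(beta, vgs, vds, v_th_plus=0.8, v_th_minus=1.2)
    print(f"V_ds = {vds:4.2f} V -> W = {w:.3e} S")  # same W for both drain biases
```

Swapping the two threshold voltages flips the sign of the weight, which is how one pair of positive conductances represents an inhibitory synapse.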
The error generated in the linear adaptive neuron is used to formulate the weight increment (or decrement, if the error is negative). In our work, we employ the Widrow-Hoff Delta Rule11 as the weight update algorithm. The Delta Rule can be expressed as

$$ W(m+1) = W(m) + \Delta W(m) \tag{2.5} $$

where $\Delta W$ is the incremental weight. By equation (2.4), an incremental change in the weight value is proportional to an incremental change in the differential threshold voltage. Since the incremental weight varies proportionally with the error, $\Delta W$ has its largest value in the early stage of adaptation, when the error is largest, and decreases as the neuron adapts to its desired response. If the incremental weight is too large, the linear adaptive neuron may overcorrect itself. In that case the sign of the error changes, and the next update steers the neuron back toward the desired output.
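The behavior just described (large corrections early, smaller ones as the error shrinks, and a sign flip of the error when a step overshoots) can be seen in a scalar sketch of the Delta Rule. It assumes a single weight, a constant input `x`, and the increment $\Delta W = 2\mu\varepsilon x$, which is the form derived later in equation (2.24); the two values of `mu` are chosen only to contrast a small step with one that overcorrects.

```python
def adapt(w0: float, x: float, d: float, mu: float, steps: int) -> list:
    """Run the scalar Delta Rule W <- W + 2*mu*eps*x and return the error history."""
    w, errors = w0, []
    for _ in range(steps):
        eps = d - w * x            # error between desired and actual output
        errors.append(eps)
        w += 2 * mu * eps * x      # increment proportional to the error
    return errors

# With a constant input the error obeys eps <- (1 - 2*mu*x**2) * eps:
# it shrinks steadily for 0 < 2*mu*x**2 < 1, alternates in sign while still
# shrinking for 1 < 2*mu*x**2 < 2, and grows without bound beyond 2.
print([round(e, 4) for e in adapt(0.0, 1.0, 1.0, mu=0.2, steps=5)])  # → [1.0, 0.6, 0.36, 0.216, 0.1296]
print([round(e, 4) for e in adapt(0.0, 1.0, 1.0, mu=0.8, steps=5)])  # → [1.0, -0.6, 0.36, -0.216, 0.1296]
```

The second run illustrates the overcorrection case: the error changes sign every step but is still steered toward zero, as long as the step stays below the stability limit noted in the comment.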
2.5 The Programming Algorithm

The operation of the linear adaptive neuron is based on a linear combiner together with a learning algorithm that reconfigures the elements of the linear combiner to obtain the desired response. Figure 2-6 shows a schematic diagram of a linear combiner. The input signal is fed into a delay line and tapped out nondestructively. The tapped signals are then multiplied by their respective weights and linearly combined at the summer. The output of the summer is the neuron output, which is compared with a desired signal that the neuron is trained to match. From this comparison, the linear adaptive neuron determines how the weight elements should be altered.
Figure 2-6: Schematic Diagram of the Linear Combiner

In our work, the input tapped signals are $x_0(m)$ and $x_1(m)$, and their respective synaptic weights are $W_0(m)$ and $W_1(m)$. The output of the neuron can be expressed as

$$ y(m) = \sum_{k=0}^{1} W_k(m)\,x_{m-k} \tag{2.6} $$

where $m$ is the time index and $k$ is the spatial tap index. Equation (2.6) can be expressed in matrix form. The input signal vector is defined as

$$ X_m \equiv \begin{pmatrix} x_0(m) \\ x_1(m) \end{pmatrix} = \begin{pmatrix} x_{m} \\ x_{m-1} \end{pmatrix} \tag{2.7} $$

and the adjustable weight vector is defined as

$$ W(m) \equiv \begin{pmatrix} W_0(m) \\ W_1(m) \end{pmatrix} \tag{2.8} $$

so that the output $y_m$ can be written in matrix form as

$$ y_m = W^T X_m \tag{2.9} $$
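The two-tap linear combiner of Figure 2-6 and equations (2.6)-(2.9) can be sketched in a few lines of Python. The delay line is modeled as a two-element buffer holding $x_m$ and $x_{m-1}$; the zero initial state and the function name `combiner_outputs` are assumptions made for the illustration.

```python
from typing import List, Sequence

def combiner_outputs(x: Sequence[float], w: Sequence[float]) -> List[float]:
    """y_m = W_0 x_m + W_1 x_{m-1} for a two-tap delay line with zero initial state."""
    taps = [0.0, 0.0]                       # X_m = [x_m, x_{m-1}]
    y = []
    for sample in x:
        taps = [sample, taps[0]]            # shift the delay line by one sample
        y.append(sum(wk * xk for wk, xk in zip(w, taps)))  # y_m = W^T X_m
    return y

print(combiner_outputs([1.0, 2.0, 3.0], [0.5, -0.25]))  # → [0.5, 0.75, 1.0]
```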
where $W^T$ is the transpose of the weight vector. An error is generated when there is a mismatch between the neuron output $y_m$ and the desired response $d_m$. The error is defined as

$$ \varepsilon_m = d_m - y_m = d_m - W^T X_m = d_m - X_m^T W \tag{2.10} $$

The purpose of the learning algorithm is to adjust the weights in the system so as to minimize this error. Two learning algorithms are discussed in the following sections: the Least Mean Square Error (LMSE) algorithm and the Clipped-data Least Mean Square Error (CLMSE) algorithm. Both algorithms seek to minimize the mean, or average, of the squared error. The squared error can be written as

$$ \varepsilon_m^2 = d_m^2 - 2\,d_m X_m^T W + W^T X_m X_m^T W \tag{2.11} $$

The mean of a quantity is its expected value, so the mean squared error can be expressed as

$$ E[\varepsilon_m^2] = E[d_m^2] - 2P^T W + W^T R\, W \tag{2.12} $$

where $P = E[d_m X_m]$ is the cross-correlation vector between the desired response and the input signal, and $R$ is the input correlation matrix, defined as

$$ R = E[X_m X_m^T] \tag{2.13} $$

$$ R = \begin{pmatrix} E[x_m^2] & E[x_m x_{m-1}] \\ E[x_{m-1} x_m] & E[x_{m-1}^2] \end{pmatrix} \tag{2.14} $$

Equation (2.12) is a quadratic function of the weights. To adjust the weights so as to minimize the mean squared error, we take the gradient of equation (2.12) with respect to the weight vector:

$$ \nabla \equiv \frac{\partial E[\varepsilon_m^2]}{\partial W} = 2RW - 2P \tag{2.15} $$

The optimum weight vector, generally called the Wiener weight vector, is obtained by setting the gradient equal to zero, which yields

$$ W_{opt} = R^{-1} P \tag{2.16} $$

The minimum mean squared error is then obtained by substituting equation (2.16) into equation (2.12).
$$ E[\varepsilon_m^2]_{min} = E[d_m^2] - P^T W_{opt} = E[d_m^2] - P^T R^{-1} P \tag{2.17} $$

Clearly, this minimization requires computationally intensive operations such as summation, averaging, and matrix multiplication and inversion. This becomes a bottleneck that must be removed as the system grows in size. We therefore employ a reasonable approximation of the squared error, known as the Widrow-Hoff Least Mean Square Error algorithm, which is discussed in the next two sections.
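As a worked illustration of equations (2.12)-(2.17), the sketch below estimates $R$ and $P$ from sample averages, solves the 2×2 system for the Wiener weights, and evaluates the minimum mean squared error. The "unknown system" that generates $d_m$ (weights 0.8 and -0.3) and the Gaussian input are assumptions made only for the example; note that the sketch needs the whole data record and a matrix inversion, which is exactly the cost the text points out.

```python
import random

def wiener_two_tap(x, d):
    """Estimate R = E[X X^T], P = E[d X], E[d^2] and return (W_opt, R, P, E[d^2])."""
    n = len(x) - 1
    r00 = r01 = r11 = p0 = p1 = dd = 0.0
    for m in range(1, len(x)):
        x0, x1 = x[m], x[m - 1]              # X_m = [x_m, x_{m-1}]
        r00 += x0 * x0
        r01 += x0 * x1
        r11 += x1 * x1
        p0 += d[m] * x0
        p1 += d[m] * x1
        dd += d[m] * d[m]
    r00, r01, r11, p0, p1, dd = (v / n for v in (r00, r01, r11, p0, p1, dd))
    det = r00 * r11 - r01 * r01
    w0 = (r11 * p0 - r01 * p1) / det         # W_opt = R^{-1} P for a 2x2 R
    w1 = (r00 * p1 - r01 * p0) / det
    return (w0, w1), ((r00, r01), (r01, r11)), (p0, p1), dd

random.seed(1)
x = [random.gauss(0.0, 1.0) for _ in range(5000)]
d = [0.0] + [0.8 * x[m] - 0.3 * x[m - 1] for m in range(1, len(x))]  # hypothetical system
(w0, w1), R, P, dd = wiener_two_tap(x, d)
mse_min = dd - (P[0] * w0 + P[1] * w1)        # equation (2.17): E[d^2] - P^T W_opt
print(round(w0, 3), round(w1, 3))
```

Because $d_m$ here is noise-free, the estimated $W_{opt}$ reproduces the assumed system and the minimum mean squared error is essentially zero.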
2.5.1 The Least Mean Square Error Algorithm

The least mean square algorithm uses the method of steepest descent to minimize the difference between the desired signal and the output of the neuron. The weight adjustment is therefore made in the direction of the negative gradient of a function $F$, where $F$ is the mean square error, i.e., the expected value of the squared error signal:

$$ \frac{dW}{dt} = -\mu\,\nabla_W F, \qquad F = E[\varepsilon^2] \tag{2.18} $$

where $\mu$ is called the convergence factor. Since our work implements a discrete-time, sampled-data system, we can write the differential equation (2.18) as a difference equation

$$ W(m+1) = W(m) - \mu\,\nabla_W E[\varepsilon_m^2] \tag{2.19} $$

where $m$ is the time index. Equation (2.19) is clearly in the form of the Delta Rule shown in equation (2.5). If we substitute the result of equation (2.15) into equation (2.19), then we can write the difference equation in matrix form as

$$ W(m+1) = (I - 2\mu R)\,W(m) + 2\mu P \tag{2.20} $$

where $I$ is the identity matrix. Equation (2.20) is a first-order difference equation that can be solved iteratively for the response of the mean weight vector:12

$$ W(m) = W_{opt} + (I - 2\mu R)^{m}\,[W(0) - W_{opt}] \tag{2.21} $$

If we combine equations (2.19) and (2.10), then we can write the $j$th element of the difference equation as

$$ W_j(m+1) = W_j(m) + 2\mu\,E[\varepsilon_m\,x_{m-j}] \tag{2.22} $$
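A quick check that iterating the mean-weight recursion (2.20) reproduces the closed form (2.21). The 2×2 correlation matrix $R$ and cross-correlation vector $P$ are assumed illustrative numbers, not values from this work. The recursion converges when $0 < \mu < 1/\lambda_{max}$, where $\lambda_{max}$ is the largest eigenvalue of $R$ (1.5 here).

```python
def matvec(a, v):
    """2x2 matrix times 2-vector."""
    return [a[0][0] * v[0] + a[0][1] * v[1], a[1][0] * v[0] + a[1][1] * v[1]]

R = [[1.0, 0.5], [0.5, 1.0]]      # assumed input correlation matrix (eigenvalues 1.5, 0.5)
P = [0.65, 0.1]                   # assumed cross-correlation vector
mu = 0.1                          # convergence factor, inside 0 < mu < 1/1.5
STEPS = 200

det = R[0][0] * R[1][1] - R[0][1] * R[1][0]
R_inv = [[R[1][1] / det, -R[0][1] / det], [-R[1][0] / det, R[0][0] / det]]
w_opt = matvec(R_inv, P)          # equation (2.16)

A = [[1 - 2 * mu * R[0][0], -2 * mu * R[0][1]],
     [-2 * mu * R[1][0], 1 - 2 * mu * R[1][1]]]   # I - 2*mu*R

w = [0.0, 0.0]                    # W(0)
for _ in range(STEPS):
    w = [a + 2 * mu * p for a, p in zip(matvec(A, w), P)]   # equation (2.20)

# Closed form (2.21): W(m) = W_opt + (I - 2*mu*R)^m (W(0) - W_opt)
e = [0.0 - w_opt[0], 0.0 - w_opt[1]]
for _ in range(STEPS):
    e = matvec(A, e)
w_closed = [w_opt[0] + e[0], w_opt[1] + e[1]]
print(w, w_closed, w_opt)
```

The slowest mode decays as $(1 - 2\mu\lambda_{min})^m = 0.9^m$ here, which is why the eigenvalue spread of $R$ governs the convergence time.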
The LMS error algorithm shown above does not require explicit measurement or calculation of the correlation matrix, and therefore does not require large memory space for matrix storage and inversion. Since no matrix inversion is necessary in this algorithm, the computation time can be dramatically reduced. Notice that the incremental weight update is the cross-correlation between the error and the input signal vector. The update stops when the error and the input signal vector are orthogonal to each other; that is, the system converges when no part of the error is correlated with the input. This feature is extremely attractive if the system is configured to perform noise and echo cancellation.

A particular form of the LMS error algorithm, introduced by Widrow and Hoff, uses the approximation13

$$ E[\varepsilon_m\,x_{m-j}] \approx \varepsilon_m\,x_{m-j} \tag{2.23} $$

which eliminates the need to compute the average shown in equation (2.22). Therefore, we can rewrite equation (2.22) as

$$ W_j(m+1) = W_j(m) + 2\mu\,\varepsilon_m\,x_{m-j} \tag{2.24} $$

Although the approximation made by Widrow and Hoff greatly simplifies the computation of the incremental weight, it requires a four-quadrant multiplier at each weight location. We shall therefore investigate another form of the LMS error algorithm, namely the "clipped-data" LMS error (CLMSE) algorithm, discussed in the next section.
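The Widrow-Hoff update (2.24) needs only the current error and tap values, with no averaging or matrix storage. The sketch below applies it to the system-modeling application mentioned earlier in this chapter. The "unknown" two-tap system (0.8, -0.3), the value of `mu`, and the noise-free Gaussian input are assumptions chosen for illustration.

```python
import random

def lms_identify(x, d, mu):
    """Two-tap LMS: W_j(m+1) = W_j(m) + 2*mu*eps_m*x_{m-j}, equation (2.24)."""
    w = [0.0, 0.0]
    prev = 0.0                                        # x_{m-1}, zero initial state
    for xm, dm in zip(x, d):
        taps = (xm, prev)                             # X_m = [x_m, x_{m-1}]
        eps = dm - (w[0] * taps[0] + w[1] * taps[1])  # equation (2.10)
        w = [wj + 2 * mu * eps * xj for wj, xj in zip(w, taps)]
        prev = xm
    return w

random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(3000)]
unknown = (0.8, -0.3)                                 # hypothetical system to be modeled
d = [unknown[0] * x[m] + unknown[1] * (x[m - 1] if m else 0.0) for m in range(len(x))]
print([round(v, 3) for v in lms_identify(x, d, mu=0.01)])
```

After convergence the adapted weights, and hence the combiner's transfer function, match the assumed system.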
2.5.2 The Clipped-Data Least Mean Square Error Algorithm

The four-quadrant multiplier needed to implement the LMSE algorithm raises cost and area concerns for an integrated-circuit realization. Replacing the four-quadrant analog multiplier with a non-ideal multiplier has therefore been proposed14 to overcome this barrier. Although the non-ideal multiplier has quite different characteristics from a conventional four-quadrant multiplier, it is easy and inexpensive to implement. The convergence of a system built with non-ideal multipliers is therefore an important property that must be considered carefully. The general form of a non-ideal multiplier can be written as

$$ (a \cdot b)(t) = a(t)\,f[b(t)] \tag{2.25} $$

where $(a \cdot b)$ is the output of the non-ideal multiplier with input signals $a$ and $b$, and $f[b(t)]$ denotes a monotonic function operating on the input signal $b(t)$. Consider the particular monotonic function

$$ (\varepsilon \cdot x)(t) = \varepsilon(t)\,f[x(t)] = \varepsilon(t)\,\mathrm{sgn}[x(t)] \tag{2.26} $$

in which the input signal is "clipped" to retain just its sign information. Moschner referred to the algorithm

$$ W_j(m+1) = W_j(m) + 2\mu\,\varepsilon_m\,\mathrm{sgn}[x_{m-j}] \tag{2.27} $$

as the clipped-data LMS error algorithm. The N multiplications of the LMSE algorithm are then replaced by N conditional branch operations: depending on the sign of $x_{m-j}$, the incremental weight is either $+2\mu\varepsilon_m$ or $-2\mu\varepsilon_m$, so explicit multiplication of the input data is eliminated. However, this simplification of the LMSE algorithm increases the convergence time by a factor of $\pi/2$ for the case of Gaussian noise. Equation (2.27) can be rewritten as

$$ W_j(m+1) = W_j(m) + 2\mu\,|\varepsilon_m|\,\mathrm{sgn}[\varepsilon_m]\,\mathrm{sgn}[x_{m-j}] \tag{2.28} $$

where $\mathrm{sgn}[\varepsilon_m]\,\mathrm{sgn}[x_{m-j}]$ represents a binary multiplication, so an exclusive-OR gate can perform this function. The output of the exclusive-OR gate can control a single-pole double-throw (SPDT) switch that steers $2\mu|\varepsilon_m|$ either to increment or to decrement the weight. Other forms of the LMSE algorithm are available and are outlined in the next section.
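The clipped-data update (2.28) can be sketched by encoding each sign as one bit and letting an exclusive-OR choose between incrementing and decrementing the weight by $2\mu|\varepsilon_m|$, mirroring the XOR/SPDT arrangement described above. The bit encoding (1 for negative), the treatment of zero as positive, the hypothetical two-tap system, and `mu` are assumptions of this sketch, not details of the hardware.

```python
import random

def sign_bit(v: float) -> int:
    """Encode a sign as one bit: 1 for negative, 0 otherwise (zero counts as positive)."""
    return 1 if v < 0 else 0

def clms_identify(x, d, mu):
    """Two-tap clipped-data LMS following equation (2.28)."""
    w = [0.0, 0.0]
    prev = 0.0                                    # x_{m-1}, zero initial state
    for xm, dm in zip(x, d):
        taps = (xm, prev)                         # X_m = [x_m, x_{m-1}]
        eps = dm - (w[0] * taps[0] + w[1] * taps[1])
        step = 2 * mu * abs(eps)                  # magnitude routed by the switch
        for j in range(2):
            # XOR of sign bits = 1 means sgn(eps)*sgn(x_{m-j}) = -1: decrement.
            if sign_bit(eps) ^ sign_bit(taps[j]):
                w[j] -= step
            else:
                w[j] += step
        prev = xm
    return w

random.seed(2)
x = [random.gauss(0.0, 1.0) for _ in range(5000)]
d = [0.8 * x[m] - 0.3 * (x[m - 1] if m else 0.0) for m in range(len(x))]  # hypothetical system
print([round(v, 3) for v in clms_identify(x, d, mu=0.01)])
```

Only a comparison, an XOR, and an add or subtract of $2\mu|\varepsilon_m|$ are needed per tap; the price is the slower convergence noted in the text.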
2.5.3 Other Forms of the Least Mean Square Error Algorithm

The previous sections discussed the correlation functions of both the LMSE and CLMSE algorithms. We may quantify this aspect with a correlation coefficient defined as

$$ \rho_j(\mathrm{LMSE}) = \sum_{m=1}^{M} \varepsilon(m)\,x(m-j) \tag{2.29} $$

$$ \rho_j(\mathrm{CLMSE}) = \sum_{m=1}^{M} \varepsilon(m)\,\mathrm{sgn}[x(m-j)] \tag{2.30} $$

Other algorithms treated in the literature are15

$$ \rho_j(\mathrm{Hybrid}) = \sum_{m=1}^{M} \mathrm{sgn}[\varepsilon(m)]\,x(m-j) \tag{2.31} $$

$$ \rho_j(\mathrm{Modified\ Zero\ Forcing}) = \sum_{m=1}^{M} \mathrm{sgn}[\varepsilon(m)]\,\mathrm{sgn}[x(m-j)] \tag{2.32} $$

$$ \rho_j(\mathrm{Zero\ Forcing}) = \sum_{m=1}^{M} \mathrm{sgn}[\varepsilon(m)]\,\mathrm{sgn}[y(m-j)] \tag{2.33} $$

A comparison of the various algorithms is shown in Figure 2-7 in terms of the equalized peak distortion, defined as

$$ D_{max} = \sum_{j=1}^{N} |p_j| \tag{2.34} $$

where the summation is taken over all N taps except the center tap. Clearly, a simpler hardware implementation of the algorithm comes at the cost of a longer convergence time, or even instability. Therefore, the trade-off between simple hardware implementation and system performance requires serious consideration.
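To make the differences between equations (2.29)-(2.33) concrete, the sketch below evaluates each correlation coefficient for one tap index `j` from the same error, input, and output sequences. The convention `sgn(0) = +1`, the short made-up sequences, and the 0-based sum starting at `m = j` (so that `x[m - j]` is always defined) are assumptions of the illustration.

```python
def sgn(v: float) -> int:
    """Sign function with sgn(0) taken as +1."""
    return 1 if v >= 0 else -1

def correlations(eps, x, y, j):
    """rho_j for equations (2.29)-(2.33), summed over m = j .. len(eps)-1."""
    ms = range(j, len(eps))
    return {
        "LMSE": sum(eps[m] * x[m - j] for m in ms),
        "CLMSE": sum(eps[m] * sgn(x[m - j]) for m in ms),
        "Hybrid": sum(sgn(eps[m]) * x[m - j] for m in ms),
        "Modified ZF": sum(sgn(eps[m]) * sgn(x[m - j]) for m in ms),
        "ZF": sum(sgn(eps[m]) * sgn(y[m - j]) for m in ms),
    }

eps = [0.5, -0.2, 0.1, -0.4]
x = [1.0, -2.0, 0.5, 1.5]
y = [0.8, -1.1, 0.3, 1.2]
print(correlations(eps, x, y, j=1))
```

Moving down the list, each form replaces one more analog multiplication by a sign operation, which is the hardware-versus-convergence trade-off shown in Figure 2-7.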
Figure 2-7: Comparison of Various Algorithms for an Adaptive Equalizer15 (equalized peak distortion, 0 to 2.5, versus number of received pulses, 0 to 20,000, for the clipped, linear, and zero-forcing (ZF) algorithms)
Chapter 3
Technology and Characteristics of Modifiable Synapses

3.1 Background

The analog electrically reprogrammable synaptic weight is composed of nonvolatile memory transistors. Unlike regular semiconductor memories such as DRAM or SRAM, nonvolatile semiconductor memory transistors retain their stored information even when the power supplies are off. There are two basic types of nonvolatile memory transistors, namely floating gate devices and floating trap devices. Figure 3-1 compares these two types of devices. Floating gate devices typically have a fairly thick tunneling oxide (80-100 Å) beneath the polysilicon storage layer and store charge in the polysilicon layer as free charge in the conduction band. Floating trap devices, on the other hand, normally have a thinner tunneling oxide (20-30 Å), and the charge is stored in deep-level traps in the nitride layer. The analog electrically reprogrammable synaptic weights used in this thesis are floating trap devices.
The earlier version of nonvolatile semiconductor memory devices took the form of Metal Nitride Oxide Semiconductor (MNOS) devices. Recent efforts at Lehigh have scaled down both the multi-layer gate dielectric dimensions and the programming voltages with the so-called Silicon Blocking-Oxide Nitride Tunneling-Oxide Silicon (SONOS) devices. Figure 3-2 compares the MNOS and SONOS devices. The main difference between them is the additional blocking oxide layer beneath the gate electrode of the SONOS device. The blocking oxide, typically 35-55 Å thick, prevents the injection of carriers from the gate electrode and prevents the charge stored in the nitride layer from tunneling to the gate electrode.

Figure 3-1: Comparison between a Floating Gate Device and a Floating Trap Device
Figure 3-2: Comparison between the MNOS and the SONOS device

To operate the SONOS devices, there are four possible operating conditions: erase, write, read, and idle. The write state is defined as the low-conductance state and the erase state as the high-conductance state. We can program the SONOS device into either of these two states with either a dual or a single power supply applied to the four terminals of the device, namely the drain, source, gate, and bulk. If we use a dual power supply, i.e., either a positive or a negative voltage is available to the gate, then the programming operations can be summarized as in Table 3-1. $V_P$ is the programming gate voltage that alters the threshold voltage, and therefore the analog conductance, of the SONOS device; $V_r$ is the reading voltage, which is normally chosen midway between the two extreme threshold voltages of the fully erased and fully written states; and $V_{ds}$ is the drain bias applied during the read operation. In this table, the bulk terminal is tied to the source terminal to avoid the so-called body effect of the transistor during operation.

Table 3-1: Operation Mode of SONOS Devices with a Dual Power Supply (n-channel device)

mode    drain   source   gate    bulk
erase   0       0        -V_P    0
write   0       0        +V_P    0
read    V_ds    0        V_r     0
idle    0       0        0       0

If, however, only a single power supply is available for programming, then the operations can be summarized as shown in Table 3-2.

Table 3-2: Operation Mode of SONOS Devices with a Single Power Supply (n-channel device)

mode    drain   source   gate    bulk
erase   V_P     V_P      0       V_P
write   0       0        V_P     0
read    V_ds    0        V_r     0
idle    0       0        0       0

Throughout this thesis, the programming operations of the SONOS synaptic weights are performed with a dual power supply. Therefore, the erase operation of the SONOS devices is performed with a negative gate bias, while the write operation is performed with a positive gate bias.
If we perform a C-V measurement of the SONOS device, we can extract its flatband voltage, which can be expressed as16

$$ V_{FB} = \Phi_{gs} - \frac{Q_f}{C_{eff}} - Q_N\left(\frac{x_{ob}}{\varepsilon_{ox}} + \frac{x_n - \bar{X}}{\varepsilon_n}\right) \tag{3.1} $$

where $\Phi_{gs}$ is the work function difference between the gate material and the bulk semiconductor, $Q_f$ is the fixed charge at the Si-SiO2 interface, $Q_N$ is the trapped charge in the nitride layer, $x_{ob}$ and $x_n$ are the blocking oxide and nitride thicknesses respectively, $\bar{X}$ is the charge centroid location (measured here from the tunneling-oxide/nitride interface), $\varepsilon_{ox}$ and $\varepsilon_n$ are the permittivities of the oxide and the nitride, and $C_{eff}$ is the effective capacitance of the device, given as

$$ C_{eff} = \frac{\varepsilon_{ox}}{x_{eff}} \tag{3.2} $$

and

$$ x_{eff} = x_{ot} + \frac{\varepsilon_{ox}}{\varepsilon_n}\,x_n + x_{ob} $$

where $x_{ot}$ is the tunneling oxide thickness. In equation (3.2), we assume that the permittivities of the tunneling oxide and the blocking oxide are the same. However, this may not be the case, since the tunneling oxide is more silicon-rich in nature while the blocking oxide is more oxynitride-like.

The threshold voltage of the SONOS device can be extracted from the normal I-V characteristics of the device. We can write the threshold voltage of the device as

$$ V_{th} = V_{FB} + 2\phi_F + \frac{\sqrt{2\varepsilon_s q N_B\,(2\phi_F)}}{C_{eff}} \tag{3.3} $$

$$ \phi_F = \frac{kT}{q}\,\ln\frac{N_B}{n_i} \tag{3.4} $$

where $\phi_F$ is the bulk potential, $\varepsilon_s$ is the permittivity of silicon, $N_B$ is the bulk doping density, $kT/q$ is the thermal voltage, and $n_i$ is the intrinsic carrier density. Sometimes the measurement of the device characteristics yields the so-called "turn-on" voltage, which is obtained by measuring the gate-to-source voltage of the device when a predetermined drain current flows through the device under test. We can write the turn-on voltage, $V_T$, as

$$ V_T = V_{th} + \sqrt{\frac{2I_{ds}}{\beta_{eff}}}, \qquad \beta_{eff} = \mu_{eff}\,\frac{W}{L}\,C_{eff} \tag{3.5} $$

where $I_{ds}$ is the predetermined drain current, $\mu_{eff}$ is the effective mobility, $W$ is the width of the device, and $L$ is the length of the device. Combining equations (3.1) and (3.3) yields

$$ V_{th} = \Phi_{gs} - \frac{Q_f}{C_{eff}} - Q_N\left(\frac{x_{ob}}{\varepsilon_{ox}} + \frac{x_n - \bar{X}}{\varepsilon_n}\right) + 2\phi_F + \frac{\sqrt{2\varepsilon_s q N_B\,(2\phi_F)}}{C_{eff}} \tag{3.6} $$

From this equation, the threshold voltage can be changed by changing $Q_N$, the trapped charge in the nitride layer. During the write operation, electrons are injected into the nitride layer by tunneling through the tunneling oxide. Since electrons carry negative charge, the increase in the electron population in the nitride layer shifts the threshold voltage positively. During the erase operation, on the other hand, holes are injected into the nitride layer, causing the threshold voltage to shift negatively.
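The sketch below evaluates equations (3.1)-(3.5) for the 20 Å / 95 Å / 25 Å stack of the characterized device, to show the size and sign of the threshold shift caused by trapped charge. Everything other than the three thicknesses and the 150/100 µm geometry is an assumption: the nitride permittivity (7.5 ε0), $n_i$, and the hypothetical `phi_gs`, `q_f`, `x_bar`, `n_b`, drain current, and mobility are illustrative values, not measurements from this work. The charge centroid is measured from the tunneling-oxide/nitride interface, as in the reconstructed equation (3.1).

```python
import math

EPS0 = 8.854e-14            # vacuum permittivity, F/cm
Q = 1.602e-19               # elementary charge, C
EPS_OX = 3.9 * EPS0         # SiO2 permittivity
EPS_N = 7.5 * EPS0          # assumed Si3N4 permittivity (typical literature value)
EPS_S = 11.7 * EPS0         # silicon permittivity
KT_Q = 0.0259               # thermal voltage kT/q near 300 K, V
N_I = 1.0e10                # assumed intrinsic carrier density, cm^-3

ANG = 1e-8                  # one angstrom in cm
X_OT, X_N, X_OB = 20 * ANG, 95 * ANG, 25 * ANG   # stack of the characterized device

def c_eff() -> float:
    """Equation (3.2), with equal tunneling- and blocking-oxide permittivities."""
    return EPS_OX / (X_OT + (EPS_OX / EPS_N) * X_N + X_OB)

def v_fb(phi_gs, q_f, q_n, x_bar) -> float:
    """Equation (3.1); x_bar is measured from the tunneling-oxide/nitride interface."""
    return phi_gs - q_f / c_eff() - q_n * (X_OB / EPS_OX + (X_N - x_bar) / EPS_N)

def v_th(phi_gs, q_f, q_n, x_bar, n_b) -> float:
    """Equations (3.3)-(3.4) for an n-channel device on a p substrate."""
    phi_f = KT_Q * math.log(n_b / N_I)
    q_b = math.sqrt(2 * EPS_S * Q * n_b * 2 * phi_f)   # depletion charge at threshold
    return v_fb(phi_gs, q_f, q_n, x_bar) + 2 * phi_f + q_b / c_eff()

def v_turn_on(v_threshold, i_ds, mu_eff, w, l) -> float:
    """Equation (3.5): gate voltage at which the device carries the chosen I_ds."""
    beta_eff = mu_eff * (w / l) * c_eff()               # A/V^2 with mu in cm^2/(V s)
    return v_threshold + math.sqrt(2 * i_ds / beta_eff)

# Hypothetical device parameters, chosen only for illustration.
params = dict(phi_gs=-0.9, q_f=Q * 1e10, x_bar=50 * ANG, n_b=1e16)
print(f"C_eff = {c_eff():.3e} F/cm^2")
for label, q_n in (("erased (holes)", +Q * 1e12),
                   ("no trapped charge", 0.0),
                   ("written (electrons)", -Q * 1e12)):
    vt = v_th(q_n=q_n, **params)
    print(f"{label:20s} V_th = {vt:+.3f} V, "
          f"V_T(1 uA) = {v_turn_on(vt, 1e-6, 400.0, 150.0, 100.0):+.3f} V")
```

With these assumptions a trapped electron density of $10^{12}$ cm⁻² raises $V_{th}$ by roughly 0.22 V and the same density of holes lowers it by the same amount, matching the write/erase directions described above.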
3.2 Fabrication Sequence

The SONOS transistors are fabricated with the TP300-3 mask sequence developed at Lehigh University's Microelectronics Research Laboratory. The masks are designed as device test patterns for various projects. The data presented and the actual devices used in the linear adaptive neuron are taken from the transistor array shown in Figure 3-3. Figure 3-4 shows a close-up view of the entire transistor array. The transistor array is designed to accommodate various gate lengths; the mask lengths designed for the array are 100 µm, 50 µm, 20 µm, 10 µm, 7 µm, 5 µm, 4 µm, 3 µm, 2 µm, and 1 µm.
The n-channel transistor fabrication sequence is as follows:
• Starting Material: p substrate, <100>, 2-3 Ω·cm
• Front/Backside Implantation
1. Furnace Clean
2. 160 Å Pad Oxide (Dry, 950°C, 20 min.)
3. Implant Front Side (Boron, 32 keV, 1.2×10^13 cm^-2)
4. Implant Back Side (Boron, 32 keV, 2×10^15 cm^-2)
5. Furnace Clean
6. Anneal (Dry N2, 950°C, 30 min.)
7. Etch Pad Oxide (BHF 10:1)
• Field Oxidation
1. Furnace Clean
2. 5000 Å Field Oxide (Wet, 1100°C, 50 min.)
• Photolithography (N+ S/D)
1. Apply Photoresist (Baker)
2. Prebake Photoresist (98°C, 30 min.)
3. UV Exposure and Development
4. Postbake Photoresist (120°C, 30 min.)
5. Etch Field Oxide (BHF 10:1)
6. Strip Photoresist (PRS-2000)
• Gate Multi-layer Dielectric
1. Furnace Clean
2. 20 Å Tunneling Oxide (Dry, 720°C, 9 min.)
3. 120 Å Silicon Nitride (LPCVD, 0.25 torr, 100 sccm NH3, 20 sccm SiCl2H2, 735°C, 5 min.)
4. 40 Å Blocking Oxide (Wet, 1000°C, 50 min.)
5. 5000 Å Polysilicon (LPCVD, 0.8 torr, 180 sccm SiH4, 625°C, 30 min.)
• Photolithography (Polysilicon)
1. Apply Photoresist (Baker)
2. Prebake Photoresist (98°C, 30 min.)
3. UV Exposure and Development
4. Postbake Photoresist (120°C, 30 min.)
5. Plasma Etch Polysilicon and Triple Dielectric
6. Strip Photoresist (PRS-2000)
• Source/Drain Diffusion
1. Furnace Clean
2. Diffusion (POCl3, 900°C, 20 min.)
3. Drive-in (Dry N2, 900°C, 30 min.)
4. Etch P-glass (BHF, 15 sec.)
5. Furnace Clean
6. 1200 Å Oxide (Wet, 900°C, 30 min.)
• Photolithography (Contact Window)
1. Apply Photoresist (Baker)
2. Prebake Photoresist (98°C, 30 min.)
3. UV Exposure and Development
4. Postbake Photoresist (120°C, 30 min.)
5. Etch Oxide (BHF 10:1)
6. Strip Photoresist (PRS-2000)
• Metallization: RF Magnetron Sputtered Aluminum
• Photolithography (Metal)
1. Apply Photoresist (Baker)
2. Prebake Photoresist (98°C, 30 min.)
3. UV Exposure and Development
4. Postbake Photoresist (120°C, 30 min.)
5. PAN Etch
6. Strip Photoresist (PRS-2000)
• Backside Metallization
1. Plasma Etch Backside
2. RF Magnetron Sputtered Aluminum
• Post-Metallization Anneal, PMA (H2/N2, 400°C, 30 min.)

Figure 3-3: Photograph of the TP300 Design
Figure 3-4: Photograph of the transistor array in the TP300
The fabrication sequence listed above produces n-channel transistors only. To produce a p-channel transistor, we can start the fabrication sequence with an n-type substrate or use an n-well full CMOS fabrication sequence. The CMOS sequence requires more masking steps and implantations and thus takes more time to complete a full run.
3.3 Characterization of the SONOS Devices

To fully characterize the electrical performance of the SONOS devices, several measurements must be carried out. The data presented in this thesis are taken from a SONOS nonvolatile memory transistor with an optically designed length of 100 µm, an optically designed width of 150 µm, and gate dielectric dimensions of 20 Å tunneling oxide, 95 Å silicon nitride, and 25 Å blocking oxide. Typical measurement techniques are discussed in the following sections.

3.3.1 High Frequency C-V Measurements
The high-frequency C-V measurement can be used to determine the effective capacitance of the device under test. During the measurement, a stepping voltage source with a small high-frequency AC signal (1 MHz) superimposed on it is applied to either the gate terminal or the bulk terminal, and the measured differential capacitance of the device is recorded. In our measurement, the high-frequency C-V curve is obtained with an HP4280A 1 MHz C-V meter connected through an HPIB interface to an HP9836 computer. Figure 3-5 shows the block diagram of the high-frequency C-V measurement setup, and a typical high-frequency C-V curve is shown in Figure 3-6.

We can express the capacitance of the device as

$$ \frac{1}{C} = \frac{1}{C_{eff}} + \frac{1}{C_D} \tag{3.7} $$

where $C_{eff}$ is the effective capacitance defined in equation (3.2) and $C_D$ is the depletion capacitance, given by

$$ C_D = \frac{\varepsilon_s}{x_d} \tag{3.8} $$

where $x_d$ is the depletion width of the device. The device under test is first biased into inversion; the stepping voltage then moves the device toward depletion and accumulation. In accumulation, the measured capacitance corresponds to the effective capacitance of the device, because no depletion width is present. As the device moves toward depletion, its depletion width starts to increase, which corresponds to a decrease in the measured capacitance. Since the small high-frequency signal is fast compared with the response time of the minority carriers, the inversion charge cannot follow the signal. Therefore, the depletion width increases steadily until the steady-state depletion width is reached, which corresponds to $C_{min}$ in Figure 3-6. The effective capacitance result can be used to check the film thicknesses measured during fabrication, for example by ellipsometry.

Figure 3-5: Block Diagram of the High Frequency C-V Measurement Setup (HP9836 computer, HPIB interface, HP4280 C-V meter, device under test)
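Equations (3.7) and (3.8) describe the dielectric and depletion capacitances in series. The sketch below evaluates that expression as the depletion width grows and estimates $C_{min}$ using the standard strong-inversion limit for a uniformly doped substrate, $x_{d,max} = \sqrt{4\varepsilon_s\phi_F/(qN_B)}$, which is not stated in the text. The `c_eff` value (roughly what the 20/95/25 Å stack gives with an assumed nitride permittivity of 7.5 ε0), the doping `n_b`, and $n_i$ are illustrative assumptions.

```python
import math

EPS0, Q = 8.854e-14, 1.602e-19     # F/cm, C
EPS_S = 11.7 * EPS0                # silicon permittivity, F/cm
KT_Q, N_I = 0.0259, 1.0e10         # thermal voltage (V) and assumed n_i (cm^-3)

def series_capacitance(c_eff: float, x_d: float) -> float:
    """Equations (3.7)-(3.8): 1/C = 1/C_eff + 1/C_D with C_D = eps_s / x_d."""
    if x_d == 0.0:
        return c_eff                       # accumulation: no depletion layer
    c_d = EPS_S / x_d
    return 1.0 / (1.0 / c_eff + 1.0 / c_d)

def max_depletion_width(n_b: float) -> float:
    """Standard strong-inversion limit x_d,max = sqrt(4 eps_s phi_F / (q N_B))."""
    phi_f = KT_Q * math.log(n_b / N_I)
    return math.sqrt(4 * EPS_S * phi_f / (Q * n_b))

c_eff = 3.66e-7                    # F/cm^2, assumed (20/95/25 angstrom stack estimate)
n_b = 1e16                         # hypothetical substrate doping, cm^-3
x_max = max_depletion_width(n_b)
for x_d in (0.0, 0.25 * x_max, 0.5 * x_max, x_max):
    print(f"x_d = {x_d * 1e4:6.3f} um  C = {series_capacitance(c_eff, x_d):.3e} F/cm^2")
print(f"C_min / C_eff = {series_capacitance(c_eff, x_max) / c_eff:.2f}")
```

The capacitance falls monotonically from $C_{eff}$ as $x_d$ grows and bottoms out at $C_{min}$ once the maximum depletion width is reached.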
Figure 3-6: High Frequency C-V Curve of a SONOS device17 (capacitance versus substrate voltage for a p-type device with 25 Å blocking oxide, 95 Å silicon nitride, and 20 Å tunneling oxide; $C_{min}$ is marked)
3.3.2 Linear Voltage Ramp Technique

A Linear Voltage Ramp (LVR) technique is used as a quasistatic C-V measurement to determine the memory window of the SONOS device. Unlike in the high-frequency C-V measurement described in the previous section, the ramp is slow enough that the minority carriers can respond to the change in bias voltage. In our measurements, the tests are performed with the LVR setup, which consists of an HP9836 computer, an HP 8116A function generator, A/D and D/A converters, and other equipment, as shown in Figure 3-7.

Figure 3-7: Block Diagram of the Linear Voltage Ramp Measurement Setup (HP9836 computer, HP 8116A function generator, HP 59313A A/D converter, Keithley electrometer)
The LVR technique applies a ramping voltage source to either the gate or the bulk terminal and measures the displacement current from the gate or the bulk terminal. The drain and source terminals are normally tied together to the bulk terminal and thus serve as the source of minority carriers. The voltage ramp that biases the device can be written as

$$ V_{GB} = \alpha t + V_0 \tag{3.9} $$

where $\alpha$ is the ramp rate in units of mV/sec and $t$ is time. Note that $\alpha$ is positive when the bias voltage ramps up and negative when it ramps down. A DC bias $V_0$ may be present; however, it is normally set to zero. The gate or bulk displacement current is monitored during the measurement and recorded along with the bias voltage. The gate displacement current can be expressed as

$$ I_G = \frac{dQ_G}{dt} = \frac{\partial Q_G}{\partial V_{GB}}\,\frac{dV_{GB}}{dt} = \alpha\,C_M \tag{3.10} $$

where we define the measured capacitance as

$$ C_M = \frac{\partial Q_G}{\partial V_{GB}} $$

From equation (3.10), the displacement current is directly proportional to the measured capacitance of the device. Therefore, if we plot the displacement current versus the bias voltage, we are essentially plotting the effective capacitance versus the bias voltage.
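Equation (3.10) means that the quasistatic capacitance is obtained by dividing the measured displacement current by the ramp rate. A minimal sketch, assuming the current samples are already available as a list; the ramp rate is converted from mV/s (the unit used in the text) to V/s so the result comes out in farads. The current values are synthetic, not measured data.

```python
def lvr_capacitance(i_g, alpha_mv_per_s):
    """Equation (3.10) solved for C: C = I_G / alpha, with alpha given in mV/s."""
    alpha = alpha_mv_per_s * 1e-3          # convert mV/s to V/s so C is in farads
    return [i / alpha for i in i_g]

# Synthetic displacement currents (A), not measured data.
up = lvr_capacitance([5.0e-12, 3.0e-12, 1.0e-12], +100.0)   # ramping up
down = lvr_capacitance([-5.0e-12, -3.0e-12], -100.0)         # ramping down
print(up)
print(down)   # I_G and alpha both change sign, so C stays positive
```

The sign reversal of $I_G$ on the down-ramp is what produces the separate upper and lower traces of the LVR curve.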
Figure 3-8 shows a curve measured with the LVR setup.

[Figure 3-8: Linear Voltage C-V Curve of a SONOS Device. Gate displacement current I_G (A) vs. gate voltage V_G (V), with points A through F marked on the loop. Device cross-section: 25 Å blocking oxide, 95 Å silicon nitride, 20 Å tunneling oxide, n+ source/drain.]

The measurement starts at point A, where the device is in accumulation. As the curve moves from point A to point B, the device is swept from accumulation toward depletion. Because the depletion width increases, the measured capacitance decreases as expected. When the device is swept from point B to point C, however, it starts to move into inversion. Since the source/drain regions provide minority carriers and the ramp rate is normally slow enough for the
minority carriers to respond, the surface is inverted. As a result, the measured capacitance starts to increase due to the collapse of the depletion region. The increase in effective capacitance observed from point C to point D is caused by the movement of the charge centroid into the nitride. The displacement current reverses its sign when the ramp rate reverses its sign, so the value at point E is approximately the negative of the value at point D. On the return sweep, the device starts in inversion and moves toward accumulation, completing the cycle by returning to point A.
The flatband voltage of the upper trace differs from the flatband voltage of the bottom trace. In an ideal MOS system, these two voltages would lie vertically over one another. In a SONOS measurement, however, the positive gate bias writes (programs) the device while it is in inversion. The threshold voltage therefore shifts positively, and this effect is clearly visible in the bottom trace. Conversely, the device is erased while it is in accumulation, so the threshold voltage shifts negatively, as is evident in the upper trace. The voltage difference between the two flatband voltages (ΔV_FB) is defined as the memory window. The memory window measures the electrical performance of the SONOS device, particularly when SONOS devices are used as digital nonvolatile memory cells.
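The memory window can be computed from the two traces once each flatband voltage is located. The text does not say how V_FB is extracted. As a hedged sketch, the code below assumes that V_FB is the bias at which a normalized trace crosses an assumed flatband capacitance C_FB, found by linear interpolation. The function names, the traces, and C_FB = 0.6 are all hypothetical.

```python
def crossing_voltage(v, c, c_target):
    """Bias at which a sampled C-V trace first crosses c_target (linear interpolation)."""
    for i in range(len(v) - 1):
        c0, c1 = c[i], c[i + 1]
        if c0 != c1 and (c0 - c_target) * (c1 - c_target) <= 0:
            return v[i] + (c_target - c0) * (v[i + 1] - v[i]) / (c1 - c0)
    raise ValueError("trace never crosses the target capacitance")


def memory_window(v_erased, c_erased, v_written, c_written, c_fb):
    """Delta V_FB = V_FB(written trace) - V_FB(erased trace)."""
    return (crossing_voltage(v_written, c_written, c_fb)
            - crossing_voltage(v_erased, c_erased, c_fb))


# Hypothetical normalized traces: the written trace is the erased one shifted by +1.5 V.
c_trace = [1.0, 0.8, 0.5, 0.3]
v_erased = [-3.0, -2.0, -1.0, 0.0]
v_written = [v + 1.5 for v in v_erased]
window = memory_window(v_erased, c_trace, v_written, c_trace, c_fb=0.6)
print(round(window, 3))  # 1.5
```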
3.3.3 Dynamic Range Characterization 
For an adaptive system to perform effectively, the modifiable element, the 
nonvolatile analog conductance, should have a wide dynamic range. The 
dynamic range of the conductance determines how well the circuit adapts under 
different training conditions. The dynamic range is measured as the ratio of the 
highest conductance to the lowest conductance of the device. If a digital memory cell is used as the synaptic element, each bit of the weight corresponds to 6 dB of dynamic range, because each added bit doubles the representable range (20 log10 2 ≈ 6.02 dB). A weight with 60 dB of dynamic range therefore requires at least 10 bits of digital memory. This area cost of the digital weight element has pushed researchers to seek an analog conductance that offers the same dynamic range in much less area.
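The bit-count arithmetic above can be sketched as follows. The function name is hypothetical.

```python
import math

DB_PER_BIT = 20 * math.log10(2)  # each bit doubles the weight range: about 6.02 dB


def bits_for_dynamic_range(dynamic_range_db):
    """Minimum number of weight bits n whose 2**n range covers the requested dynamic range."""
    return math.ceil(dynamic_range_db / DB_PER_BIT)


print(bits_for_dynamic_range(60))  # 10
```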
To measure the dynamic range of the SONOS nonvolatile transistor, we measure its channel conductance at the two extreme states: the fully erased state and the fully written state. The data are taken with an HP4145 Semiconductor Parameter Analyzer controlled by an HP 9836 technical computer running the TECAP (Transistor Electrical Characteristics Analysis Program) software package, which handles data acquisition, parameter extraction and graphics display. The SONOS device is first subjected to a positive gate bias, say 5V, for 5 minutes to ensure that it is saturated in the fully written state. The gate electrode is then biased at a low reading voltage, say 1.25V, with the source and bulk terminals grounded. A drain current versus drain-to-source voltage characteristic is taken with the drain bias swept from -50 mV to 50 mV. Next, the SONOS device is subjected to a negative gate bias (-5V) for 10 minutes to reach the fully erased state. The drain current versus drain-to-source voltage characteristic is taken again and plotted on the common drain-to-source voltage axis. Figure 3-9 shows typical dynamic range characteristics of the SONOS device. Note that the SONOS device offers a 60 dB dynamic range between the two states.
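To turn the two I_D-V_D sweeps into a single dynamic range figure, one can take each state's channel conductance as the slope of its I_D-V_D line near the origin. The sketch below makes the following assumptions:

- the slope is a least-squares fit constrained through the origin;
- the conductance ratio is expressed as 20 log10(G_high/G_low), the convention implied by the 6 dB-per-bit rule;
- the sweep values and the 80 µS / 80 nS conductances are hypothetical.

```python
import math


def conductance_through_origin(v_d, i_d):
    """Least-squares slope of I_D vs V_D constrained through the origin (siemens)."""
    num = sum(v * i for v, i in zip(v_d, i_d))
    den = sum(v * v for v in v_d)
    return num / den


def dynamic_range_db(v_d, i_state_a, i_state_b):
    """20*log10 of the larger to the smaller channel conductance of the two states."""
    g_a = abs(conductance_through_origin(v_d, i_state_a))
    g_b = abs(conductance_through_origin(v_d, i_state_b))
    return 20 * math.log10(max(g_a, g_b) / min(g_a, g_b))


# Hypothetical sweep from -50 mV to 50 mV with a 1000:1 conductance ratio.
v_sweep = [k * 0.01 for k in range(-5, 6)]  # volts
i_high = [80e-6 * v for v in v_sweep]       # 80 uS state
i_low = [80e-9 * v for v in v_sweep]        # 80 nS state
print(round(dynamic_range_db(v_sweep, i_high, i_low), 2))  # 60.0
```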
[Figure 3-9: Dynamic Range Characteristics of the SONOS Devices, W = 150 microns, L = 100 microns. Drain current vs. V_D (millivolts, -50 to 50) at V_G = 1.25 V for the written and erased states after V_P = ±5 V. Device: 25 Å blocking oxide, 95 Å silicon nitride, 20 Å tunneling oxide, n+ source/drain, p substrate.]
3.3.4 Erase/Write Characterization 
Erase/write measurements of the SONOS devices show how fast the programmable synaptic weights can be programmed. The erase/write measurement setup is shown in figure 3-10. A resident program stored in an AIM 65 computer interfaces with the pattern generator to generate the control waveforms for the switching circuitry used in the measurements¹⁸. Instead of measuring the capacitance of the device, we use three of its terminals (the source, the drain and the gate) to measure the turn-on voltage of the device. The turn-on voltage was defined previously in equation (3.5). In this measurement, the drain-to-source current is set to 10 μA.

[Figure 3-10: Block Diagram of the Dynamic Measurement Setup (Perkin-Elmer computer terminal, AIM 65 computer, pattern generator, dynamic measurement circuit, Tektronix 7854 oscilloscope, n+/p device under test)]
The erase/write characteristics are constructed by plotting the turn-on voltage (and thereby the threshold voltage) vs. pulse width. The programming voltage is normally kept constant during these measurements. Figure 3-11 shows the four modes that a device under test may be subjected to during the measurements. To obtain a write curve, for example, the device is first fully erased by applying a negative gate bias for a long period of time, say 10 seconds. The device is then written with a positive gate bias of a given pulse width, and the turn-on voltage is read out during the read operation. This procedure is repeated to gather turn-on voltage data points for various write pulse widths.

[Figure 3-11: Four Testing Modes Performed by the Dynamic Measurement Setup¹⁸ (including the idle and read modes)]
Figure 3-12 shows a typical erase/write curve of a SONOS device. As expected, the turn-on (and thereby threshold) voltage shifts more positively as the write pulse width increases. For small write pulse widths, however, the SONOS device does not respond appreciably to the programming pulses. For very long write pulses, the device is driven into the fully written state. The erase curve is the opposite of the write curve. The pulse width at which the erase curve and the write curve cross is called the cross-over time, Tc. A device with a smaller cross-over time can be programmed faster than a device with a larger one. Higher programming voltages yield smaller cross-over times because more carriers are injected into the silicon nitride layer.

[Figure 3-12: Erase/Write Curve of a SONOS Device, W = 150 microns, L = 100 microns. Turn-on voltage (V) vs. pulse width (s) on a logarithmic axis, V_P = ±5 V, with the cross-over time Tc marked. Device: 25 Å blocking oxide, 95 Å silicon nitride, 20 Å tunneling oxide, n+ source/drain, p substrate.]

Figure 3-12 shows that the cross-over time of the SONOS devices is in the range of seconds with ±5V programming voltages. The SONOS device may therefore respond only weakly to a programming pulse of one tenth of a millisecond. This insensitivity to short programming pulses makes the system converge much more slowly than a system built with devices that have smaller cross-over times.
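The cross-over time can be read off the two curves numerically. This sketch assumes that both curves are sampled at the same pulse widths. It interpolates their difference linearly in log10(pulse width), matching the logarithmic time axis of the erase/write plot. The data points and the function name are hypothetical and are not taken from Figure 3-12.

```python
import math


def crossover_time(pulse_widths_s, v_write, v_erase):
    """Pulse width at which the write and erase turn-on voltage curves intersect.

    Interpolates linearly in log10(pulse width), i.e. along a log time axis.
    """
    logs = [math.log10(t) for t in pulse_widths_s]
    diff = [w - e for w, e in zip(v_write, v_erase)]
    for i in range(len(diff) - 1):
        d0, d1 = diff[i], diff[i + 1]
        if d0 == 0:
            return pulse_widths_s[i]
        if d0 * d1 < 0:
            frac = -d0 / (d1 - d0)
            return 10 ** (logs[i] + frac * (logs[i + 1] - logs[i]))
    if diff[-1] == 0:
        return pulse_widths_s[-1]
    raise ValueError("curves do not cross in the sampled range")


widths = [1e-3, 1e-2, 1e-1, 1.0, 10.0]  # seconds (hypothetical)
v_write = [0.0, 0.2, 0.6, 1.2, 1.8]     # turn-on voltage after a write pulse
v_erase = [1.8, 1.6, 1.2, 0.6, 0.0]     # turn-on voltage after an erase pulse
tc = crossover_time(widths, v_write, v_erase)
print(f"Tc = {tc:.3f} s")  # Tc = 0.316 s
```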
3.3.5 Retention Characterization
Retention measurements show how well the SONOS devices retain the charge trapped in the nitride. The setup used for the erase/write measurement can be reused here, with a slight modification to the control sequences sent to the pattern generator. The retention characteristics are obtained by plotting the turn-on voltage vs. delay time. The write curve is constructed by first fully writing the device. The threshold voltage is then monitored at various delay times to determine how it decays with time. The erase curve is again the opposite case of the write curve. A typical retention measurement is shown in figure 3-13. This curve can be used to project memory retention after a long delay, say 10 years. In this curve, 30% of the original memory information is preserved after a projected 10-year period.
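The 10-year projection described above amounts to extrapolating each retention curve along a logarithmic time axis. This minimal sketch rests on three assumptions:

- both curves are straight lines in log10(delay time);
- "memory retained" means the fraction of the initial write/erase window that remains;
- the delay times and voltages are hypothetical.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def projected_window_fraction(delays_s, v_write, v_erase, years=10.0):
    """Fraction of the initial write/erase window left after `years`, extrapolating
    each retention curve linearly in log10(delay time)."""
    logs = [math.log10(t) for t in delays_s]
    s_w, b_w = fit_line(logs, v_write)
    s_e, b_e = fit_line(logs, v_erase)
    x_end = math.log10(years * SECONDS_PER_YEAR)
    window_start = v_write[0] - v_erase[0]
    window_end = (s_w * x_end + b_w) - (s_e * x_end + b_e)
    return window_end / window_start


delays = [1.0, 10.0, 100.0, 1e3, 1e4]  # seconds (hypothetical)
v_w = [2.0 - 0.10 * math.log10(t) for t in delays]
v_e = [-1.0 + 0.08 * math.log10(t) for t in delays]
frac = projected_window_fraction(delays, v_w, v_e)
print(f"{100 * frac:.1f}% of the initial window remains after 10 years")
```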
There is a trade-off between programming speed and retention. To make the SONOS devices program faster, the tunneling oxide thickness must be reduced. A thinner tunneling oxide, however, also increases the probability that trapped charge back-tunnels into the bulk silicon, which degrades the retention of a thin-tunneling-oxide SONOS device. A more detailed study and optimization of the SONOS device is needed to balance programming speed against memory retention.
[Figure 3-13: Retention Curve, W = 150 microns, L = 100 microns. Turn-on voltage (V) vs. delay time (s) on a logarithmic axis with extrapolation to 10 yrs, V_P = ±5 V. Device: 25 Å blocking oxide, 95 Å silicon nitride, 20 Å tunneling oxide, n+ source/drain, p substrate.]
Chapter 4
Theoretical Analysis of the Linear Adaptive Neuron
4.1 Operational Theory 
In this section, a derivation of the performance of a two-tap weight linear 
adaptive neuron with the LMS and the clipped-data LMS algorithms in the 
continuous time domain is presented. The LMS algorithm has been proven to
work for stationary signals. This derivation is based on sinusoidal signals with
and without the presence of Gaussian narrow-band noise. The sinusoidal 
signals are chosen because we can readily verify the theoretical results against 
actual experimental results. The derivation involves a training (desired) signal 
that has a phase and magnitude difference with respect to the input signal. The 
analog delay line of the input signal is implemented in such a way that the 
phase difference between the two tapped signals is adjustable by varying the 
sampling clocks. 
We begin with the mathematical expressions for the input and desired signals:

$$x_0(t) = x_a\cos(\omega_s t) + n_0(t), \qquad x_1(t) = x_a\cos[\omega_s(t-T)] + n_1(t), \qquad d(t) = d_a\cos(\omega_s t + \phi) \qquad (4.1)$$

where x0(t) and x1(t) are the input tapped signals at time t, d(t) is the desired signal at time t, d_a and x_a are the desired and input signal magnitudes, φ is the phase of the desired signal relative to x0(t), n0(t) and n1(t) are the noise associated with x0 and x1 at time t, T is the delay time, and ω_s is the input signal frequency. The following subsections give the derivations for the LMS and clipped-data LMS learning algorithms separately.
4.2 Theoretical Analysis of the Two-Tap Weight Linear 
Adaptive Neuron with the LMS Learning Algorithm 
As discussed in Chapter 2, the weight adjustment can be expressed as

$$\frac{dW}{dt} = -\mu\,\nabla_W E[e^2] \qquad (4.2)$$

with the gradient expressed as

$$\nabla_W E[e^2] = -2P + 2RW \qquad (4.3)$$

where P is the cross-correlation vector between the desired response and the input signal and R is the input correlation matrix. Combining equations (4.2) and (4.3) yields

$$\frac{dW}{dt} = 2\mu P - 2\mu R W \qquad (4.4)$$
where P and R are given by the definitions in Chapter 2. The expected value of a quantity is computed by taking its time average. For the first element of the P vector, the expected value is therefore

$$P_0 = \frac{1}{T_1}\int_0^{T_1} d_a\cos(\omega_s t + \phi)\; x_a\cos(\omega_s t)\,dt \qquad (4.5)$$

where T_1 is the period of the input signal. Using the trigonometric identity

$$\cos A\cos B = \tfrac{1}{2}\left[\cos(A-B) + \cos(A+B)\right]$$

equation (4.5) becomes

$$P_0 = \frac{d_a x_a}{2}\cos\phi$$
The other elements of the matrix R and the vector P are computed in the same way, giving

$$P = \frac{d_a x_a}{2}\begin{bmatrix}\cos\phi\\ \cos(\phi+\omega_s T)\end{bmatrix}, \qquad R = \begin{bmatrix}\dfrac{x_a^2}{2}+\sigma_n^2 & \dfrac{x_a^2}{2}\cos(\omega_s T)\\[6pt] \dfrac{x_a^2}{2}\cos(\omega_s T) & \dfrac{x_a^2}{2}+\sigma_n^2\end{bmatrix} \qquad (4.6)$$

where σ_n² is the variance of the assumed narrowband Gaussian noise. The time-varying noise components n0(t) and n1(t) are assumed to be uncorrelated, so that

$$E[n_0^2(t)] = E[n_1^2(t)] = \sigma_n^2, \qquad E[n_0(t)\,n_1(t)] = 0$$
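Equation (4.6) can be checked numerically. The sketch below averages the signal products of equation (4.1) over one period, with the noise set to zero, and compares them with the closed-form elements of P and R. The parameter values are hypothetical. The averaging rule is uniform sampling over one full period, which is exact (up to rounding) for these low-order trigonometric products.

```python
import math

# Hypothetical noise-free parameters for equation (4.1).
X_A, D_A, PHI = 1.0, 0.8, 0.3
F_S = 100.0               # input frequency in Hz
W_S = 2 * math.pi * F_S   # omega_s in rad/s
T = (math.pi / 3) / W_S   # tap delay chosen so that omega_s * T = pi/3
PERIOD = 1 / F_S          # T_1, one period of the input


def x0(t):
    return X_A * math.cos(W_S * t)


def x1(t):
    return X_A * math.cos(W_S * (t - T))


def d(t):
    return D_A * math.cos(W_S * t + PHI)


def time_average(f, period=PERIOD, n=1000):
    """Mean of f over one period from n uniformly spaced samples."""
    return sum(f(period * k / n) for k in range(n)) / n


numeric = {
    "P0": time_average(lambda t: d(t) * x0(t)),
    "P1": time_average(lambda t: d(t) * x1(t)),
    "R00": time_average(lambda t: x0(t) ** 2),
    "R01": time_average(lambda t: x0(t) * x1(t)),
}
closed_form = {
    "P0": D_A * X_A / 2 * math.cos(PHI),
    "P1": D_A * X_A / 2 * math.cos(PHI + W_S * T),
    "R00": X_A ** 2 / 2,
    "R01": X_A ** 2 / 2 * math.cos(W_S * T),
}
for key in numeric:
    print(key, numeric[key], closed_form[key])
```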
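Combining equations (4.4) and (4.6) yields

$$\frac{d}{dt}\begin{bmatrix}W_0\\ W_1\end{bmatrix} = \mu d_a x_a\begin{bmatrix}\cos\phi\\ \cos(\phi+\omega_s T)\end{bmatrix} - 2\mu\begin{bmatrix}\dfrac{x_a^2}{2}+\sigma_n^2 & \dfrac{x_a^2}{2}\cos(\omega_s T)\\[6pt] \dfrac{x_a^2}{2}\cos(\omega_s T) & \dfrac{x_a^2}{2}+\sigma_n^2\end{bmatrix}\begin{bmatrix}W_0\\ W_1\end{bmatrix} \qquad (4.7)$$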
To solve for the steady-state weight values, the R matrix must be diagonalized. We therefore introduce the transformation

$$W = \Phi\,\eta \qquad (4.9)$$

where

$$\Phi = \frac{1}{\sqrt{2}}\begin{bmatrix}1 & 1\\ -1 & 1\end{bmatrix}, \qquad \Phi^{-1} = \frac{1}{\sqrt{2}}\begin{bmatrix}1 & -1\\ 1 & 1\end{bmatrix} = \Phi^{T}$$

This transformation diagonalizes the R matrix, and we obtain
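$$\frac{d}{dt}\begin{bmatrix}\eta_0\\ \eta_1\end{bmatrix} = \frac{\mu d_a x_a}{\sqrt{2}}\begin{bmatrix}\cos\phi-\cos(\phi+\omega_s T)\\ \cos\phi+\cos(\phi+\omega_s T)\end{bmatrix} - 2\mu\begin{bmatrix}\dfrac{x_a^2}{2}(1-\cos\omega_s T)+\sigma_n^2 & 0\\[6pt] 0 & \dfrac{x_a^2}{2}(1+\cos\omega_s T)+\sigma_n^2\end{bmatrix}\begin{bmatrix}\eta_0\\ \eta_1\end{bmatrix} \qquad (4.10)$$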
The steady-state solution is found by setting the time derivative in equation (4.10) to zero and solving for the transformed quantities η0 and η1. The steady-state weight values then follow from η0 and η1 through equation (4.9):
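$$\eta_0 = \frac{d_a x_a\left[\cos\phi - \cos(\phi+\omega_s T)\right]}{\sqrt{2}\left[x_a^2(1-\cos\omega_s T) + 2\sigma_n^2\right]}, \qquad \eta_1 = \frac{d_a x_a\left[\cos\phi + \cos(\phi+\omega_s T)\right]}{\sqrt{2}\left[x_a^2(1+\cos\omega_s T) + 2\sigma_n^2\right]} \qquad (4.11)$$

and the weight values become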
$$W_0 = \frac{d_a x_a\left[x_a^2\cos\phi + 2\sigma_n^2\cos\phi - x_a^2\cos(\omega_s T)\cos(\phi+\omega_s T)\right]}{(x_a^2+2\sigma_n^2)^2 - x_a^4\cos^2(\omega_s T)}$$

$$W_1 = \frac{d_a x_a\left[x_a^2\cos(\phi+\omega_s T) + 2\sigma_n^2\cos(\phi+\omega_s T) - x_a^2\cos(\omega_s T)\cos\phi\right]}{(x_a^2+2\sigma_n^2)^2 - x_a^4\cos^2(\omega_s T)} \qquad (4.12)$$
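As a consistency check, the closed form (4.12) can be compared with a direct solution of R W = P built from equation (4.6). The sketch below uses Cramer's rule for the 2x2 system. The function names and parameter values are hypothetical, and sigma2 stands for σ_n².

```python
import math


def weights_closed_form(x_a, d_a, phi, wT, sigma2):
    """Steady-state LMS weights from equation (4.12); sigma2 is sigma_n^2."""
    c = math.cos(wT)
    den = (x_a**2 + 2 * sigma2) ** 2 - x_a**4 * c**2
    w0 = d_a * x_a * (x_a**2 * math.cos(phi) + 2 * sigma2 * math.cos(phi)
                      - x_a**2 * c * math.cos(phi + wT)) / den
    w1 = d_a * x_a * (x_a**2 * math.cos(phi + wT) + 2 * sigma2 * math.cos(phi + wT)
                      - x_a**2 * c * math.cos(phi)) / den
    return w0, w1


def weights_direct(x_a, d_a, phi, wT, sigma2):
    """Solve R W = P from equation (4.6) with Cramer's rule (R = [[a, b], [b, a]])."""
    a = x_a**2 / 2 + sigma2
    b = x_a**2 / 2 * math.cos(wT)
    p0 = d_a * x_a / 2 * math.cos(phi)
    p1 = d_a * x_a / 2 * math.cos(phi + wT)
    det = a * a - b * b
    return (a * p0 - b * p1) / det, (a * p1 - b * p0) / det


params = {"x_a": 1.0, "d_a": 0.8, "phi": 0.3, "wT": math.pi / 3, "sigma2": 0.05}
print(weights_closed_form(**params))
print(weights_direct(**params))
```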
Equation (4.12) is the general expression for the steady-state weight values. In the special case where ω_sT equals π/2 (or, more generally, π/2 + 2nπ), x1(t) lags x0(t) by 90 degrees and the off-diagonal terms of R vanish. Equation (4.12) then simplifies to

$$W_0 = \frac{d_a x_a\left[x_a^2\cos\phi + 2\sigma_n^2\cos\phi\right]}{(x_a^2+2\sigma_n^2)^2}, \qquad W_1 = \frac{-d_a x_a\left[x_a^2\sin\phi + 2\sigma_n^2\sin\phi\right]}{(x_a^2+2\sigma_n^2)^2} \qquad (4.13)$$
Further simplification is possible if the noise is neglected. With the noise contribution ignored (σ_n = 0), the steady-state weight values become

$$W_0 = \frac{d_a\cos\phi}{x_a}, \qquad W_1 = \frac{-d_a\sin\phi}{x_a} \qquad (4.14)$$

With these weights, W0x0(t) + W1x1(t) = d_a[cos φ cos ω_st − sin φ sin ω_st] = d(t), so the output matches the desired signal exactly.
The analysis for the clipped-data LMS error algorithm is very similar to the one above. The main difference between the LMS and clipped-data LMS analyses is outlined in the next section.
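To see equation (4.14) emerge from adaptation itself, the sketch below runs a discrete-time, per-sample version of the LMS update (W ← W + 2μ e x, with e = d − y) on the noise-free signals of equation (4.1) with ω_sT = π/2. The following are illustrative assumptions, not the circuit's parameters:

- a sampling rate of 37 samples per cycle;
- μ = 0.002;
- the iteration count;
- the signal values.

```python
import math


def simulate_lms(x_a, d_a, phi, mu, samples_per_cycle=37, n_iter=20000):
    """Two-tap LMS on noise-free sinusoids with x1 lagging x0 by 90 degrees."""
    w0 = w1 = 0.0
    dtheta = 2 * math.pi / samples_per_cycle     # omega_s times the sample interval
    for k in range(n_iter):
        theta = k * dtheta
        x0 = x_a * math.cos(theta)
        x1 = x_a * math.cos(theta - math.pi / 2)  # omega_s * T = pi/2
        d = d_a * math.cos(theta + phi)
        e = d - (w0 * x0 + w1 * x1)               # error = desired - output
        w0 += 2 * mu * e * x0                     # per-sample form of dW/dt = 2*mu*e*x
        w1 += 2 * mu * e * x1
    return w0, w1


x_a, d_a, phi = 1.0, 0.8, math.radians(30)
w0, w1 = simulate_lms(x_a, d_a, phi, mu=0.002)
print(round(w0, 4), round(d_a * math.cos(phi) / x_a, 4))   # 0.6928 0.6928
print(round(w1, 4), round(-d_a * math.sin(phi) / x_a, 4))  # -0.4 -0.4
```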
4.3 Theoretical Analysis of the Two-Tap Weight Linear 
Adaptive Neuron with the Clipped-data LMS Learning 
Algorithm
The difference between the LMS error algorithm and the clipped-data LMS error algorithm is that the clipped-data algorithm does not use the amplitude information of the input signal when computing the incremental weight updates. P and R therefore become

$$P = E\begin{bmatrix} d_m\,\mathrm{sgn}[x_m]\\ d_m\,\mathrm{sgn}[x_{m-1}]\end{bmatrix}, \qquad R = E\begin{bmatrix} x_m\,\mathrm{sgn}[x_m] & x_m\,\mathrm{sgn}[x_{m-1}]\\ x_{m-1}\,\mathrm{sgn}[x_m] & x_{m-1}\,\mathrm{sgn}[x_{m-1}]\end{bmatrix} \qquad (4.15)$$
The matrix elements of the autocorrelation matrix R can be calculated as

(4.16)

and, with the identity

$$\mathrm{sgn}(A+B) = \frac{A\,\mathrm{sgn}(A) + B\,\mathrm{sgn}(B)}{A+B}$$

which holds exactly when A and B have the same sign, equation (4.16) can be rewritten as

where

We can proceed with the calculation of the elements of P and R as described in the previous section.
(4.17) 
Following the procedure of the previous section, the time derivative of the weight vector can be written as

(4.18)

and, using the matrix transformation to diagonalize the R matrix, we arrive at

(4.19)
We can now solve equation (4.19) for the steady-state weight values. Following the procedure of the previous section, the steady-state transformed weight vector components become
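$$\eta_0 = \frac{d_a\left[\cos\phi - \cos(\phi+\omega_s T)\right]}{\sqrt{2}\left[x_a(1-\cos\omega_s T) + \sqrt{2}\,\sigma_n\right]}, \qquad \eta_1 = \frac{d_a\left[\cos\phi + \cos(\phi+\omega_s T)\right]}{\sqrt{2}\left[x_a(1+\cos\omega_s T) + \sqrt{2}\,\sigma_n\right]} \qquad (4.20)$$

and the steady-state weight values are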
$$W_0 = \frac{d_a\left[x_a\cos\phi + \sqrt{2}\,\sigma_n\cos\phi - x_a\cos(\omega_s T)\cos(\phi+\omega_s T)\right]}{(x_a+\sqrt{2}\,\sigma_n)^2 - x_a^2\cos^2(\omega_s T)}$$

$$W_1 = \frac{d_a\left[x_a\cos(\phi+\omega_s T) + \sqrt{2}\,\sigma_n\cos(\phi+\omega_s T) - x_a\cos(\omega_s T)\cos\phi\right]}{(x_a+\sqrt{2}\,\sigma_n)^2 - x_a^2\cos^2(\omega_s T)} \qquad (4.21)$$
In the particular case where ω_sT equals π/2 (or, more generally, π/2 + 2nπ), equation (4.21) simplifies to

$$W_0 = \frac{d_a\left[x_a\cos\phi + \sqrt{2}\,\sigma_n\cos\phi\right]}{(x_a+\sqrt{2}\,\sigma_n)^2}, \qquad W_1 = \frac{-d_a\left[x_a\sin\phi + \sqrt{2}\,\sigma_n\sin\phi\right]}{(x_a+\sqrt{2}\,\sigma_n)^2} \qquad (4.22)$$
If the effect of noise is ignored, as in the previous section, the steady-state weight solution becomes

$$W_0 = \frac{d_a\cos\phi}{x_a}, \qquad W_1 = \frac{-d_a\sin\phi}{x_a} \qquad (4.23)$$
When no noise is present in the system, the LMS error algorithm and the clipped-data LMS algorithm have the same steady-state weight values. In the special configuration where φ equals 45 degrees, the two steady-state weight values are equal in magnitude and opposite in sign.
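The noise-free equivalence stated above can be illustrated with the clipped-data update, in which only sgn(x) of each tap enters the weight increment (W ← W + 2μ e sgn(x)). This is a hedged sketch. The step size, sampling rate, signal values and the convention sgn(0) = 0 are assumptions for illustration.

```python
import math


def sgn(v):
    """Sign function with sgn(0) = 0."""
    return (v > 0) - (v < 0)


def simulate_clipped_lms(x_a, d_a, phi, mu, samples_per_cycle=37, n_iter=20000):
    """Two-tap clipped-data LMS on noise-free sinusoids with omega_s * T = pi/2."""
    w0 = w1 = 0.0
    dtheta = 2 * math.pi / samples_per_cycle
    for k in range(n_iter):
        theta = k * dtheta
        x0 = x_a * math.cos(theta)
        x1 = x_a * math.cos(theta - math.pi / 2)
        d = d_a * math.cos(theta + phi)
        e = d - (w0 * x0 + w1 * x1)   # error still uses the full-amplitude output
        w0 += 2 * mu * e * sgn(x0)    # only the sign of each tapped input is used
        w1 += 2 * mu * e * sgn(x1)
    return w0, w1


x_a, d_a, phi = 1.0, 0.8, math.radians(30)
w0c, w1c = simulate_clipped_lms(x_a, d_a, phi, mu=0.002)
print(round(w0c, 4), round(w1c, 4))  # 0.6928 -0.4, the same as plain LMS
```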
Solving equations (4.10) and (4.19) gives the time constants of the transient weight values:

LMS Algorithm

$$\tau_0 = \frac{1}{\mu\left[x_a^2(1-\cos\omega_s T) + 2\sigma_n^2\right]}, \qquad \tau_1 = \frac{1}{\mu\left[x_a^2(1+\cos\omega_s T) + 2\sigma_n^2\right]}$$

Clipped-data LMS Algorithm

$$\tau_0 = \frac{\pi}{4\mu\left[x_a(1-\cos\omega_s T) + \sqrt{2}\,\sigma_n\right]}, \qquad \tau_1 = \frac{\pi}{4\mu\left[x_a(1+\cos\omega_s T) + \sqrt{2}\,\sigma_n\right]} \qquad (4.24)$$
Noise in the system actually tends to reduce the time constants, which leads to faster convergence of the weight values. Because of the noise, however, the steady-state weight values are pulled away from the optimum weight values of equation (4.23).
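Equation (4.24) can be evaluated directly. The sketch below compares the two algorithms at ω_sT = π/2, with and without noise. The function names are hypothetical, μ and the signal amplitude are arbitrary, and the resulting time constants are in whatever time unit μ is expressed against. The comparison illustrates the point above: adding noise (σ_n > 0) shortens each time constant.

```python
import math


def lms_time_constants(x_a, mu, wT, sigma_n):
    """(tau_0, tau_1) for the LMS algorithm, equation (4.24)."""
    c = math.cos(wT)
    return (1 / (mu * (x_a**2 * (1 - c) + 2 * sigma_n**2)),
            1 / (mu * (x_a**2 * (1 + c) + 2 * sigma_n**2)))


def clipped_time_constants(x_a, mu, wT, sigma_n):
    """(tau_0, tau_1) for the clipped-data LMS algorithm, equation (4.24)."""
    c = math.cos(wT)
    return (math.pi / (4 * mu * (x_a * (1 - c) + math.sqrt(2) * sigma_n)),
            math.pi / (4 * mu * (x_a * (1 + c) + math.sqrt(2) * sigma_n)))


for sigma_n in (0.0, 0.1):
    print(sigma_n,
          lms_time_constants(1.0, 0.01, math.pi / 2, sigma_n),
          clipped_time_constants(1.0, 0.01, math.pi / 2, sigma_n))
```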
The convergence factor, μ, controls the speed of convergence of the algorithm. A large convergence factor makes the weights adjust quickly to minimize the overall error. With a large convergence factor, however, each weight adjustment is so coarse that the circuit overcorrects itself every time the weights change. This produces an oscillation in the error signal and a large misadjustment caused by the variance of the weight values. A small convergence factor, on the other hand, minimizes the misadjustment of the circuit, but the fine incremental weight adjustments take longer to bring the circuit to its steady state. Ideally, then, the circuit would use a large convergence factor at the initial stage of adaptation and reduce it as the error signal decreases. Such a variable convergence factor scheme would reduce both the convergence time and the misadjustment. In our linear adaptive neuron setup, a variable convergence factor is achieved by combining a unique property of the SONOS synaptic elements with the signal processing circuitry. Appendix B presents a detailed discussion of the variable convergence factor scheme.
Chapter 5
Experimental Setup and Results 
5.1 The Linear Adaptive Neuron Description 
The linear adaptive neuron circuit is built on four separate breadboards to improve isolation between the various parts of the circuit. The circuit is divided into the following sections: the digital control/clocking module, the analog delay line section, the analog signal processor section, and the SONOS synaptic weight section. The common points (grounds) of the digital and analog subsections are kept separate except for a connection at a single point. Special precautions are taken to provide clean analog and digital power supplies and to avoid ground loop problems. All the components in the linear adaptive neuron circuit, except the SONOS synaptic weight elements, are commercially available. All the digital logic and the analog switches are CMOS components, while the operational amplifiers and the comparators are from the TTL linear series. The power rails of the circuit are set to ±7.5V, both to demonstrate the low-voltage programming capability of the SONOS weights and to stay within the voltage handling capability of the CMOS analog switches used in the circuit.
The circuit operation can be divided into three modes: the disabled mode, the initialized mode, and the programmed mode. In the disabled mode, the error feedback path is disconnected from the circuit, and the gate electrodes of the SONOS synaptic weight elements are connected to a read voltage so that the channel conductance of those elements can be read out. Because the error feedback path is disconnected, there is no weight update or weight adjustment, and the learning algorithm is therefore said to be disabled. In the initialized mode, the gate electrodes of the SONOS synaptic weight elements are connected to an initializing voltage. The user selects the polarity of the initializing voltage with an SPDT switch in order to preset the weight value to a predetermined state. Since the weight value is determined by the differential conductance of two SONOS synaptic weight elements, the predetermined state can be either positive or negative. The programmed mode is the normal mode of operation, and adaptation takes place during this mode. The programmed mode has two subprocesses: the updating process and the reading process. During the updating process, a programming voltage is applied to the gate electrode of the SONOS synaptic weight, which alters the threshold voltage of the weight element accordingly. During the reading process, a read voltage is applied to the gate electrode instead of the programming voltage, so the SONOS synaptic weight element acts as a conductance. Generally, the linear adaptive neuron is first initialized, then goes through the disabled mode, and finally enters the programmed mode.
The block diagram of the single level linear adaptive neuron circuit is shown in figure 5-1. The input signal, x(m), is first passed through an analog delay line to create two tapped signals, x0(m) and x1(m). The two tapped signals are multiplied by their respective weight values, W0 and W1, and the products are added in a summing amplifier. The summed signal is then fed into a correlated double sampling module, which removes the unwanted noise and offset voltages of the amplifiers used in the analog delay line and the summing amplifier¹⁹. The 'clean' signal, called the output signal y(m), is then compared to the training signal, d(m). If there is any mismatch between them, an error signal is generated as the difference between the output signal and the training signal. This error is fed into the clipped-data learning algorithm to calculate the weight adjustments. A digital delay line built from D flip-flops provides the sign information of the input signal to the learning algorithm. The result of the learning algorithm is fed into the steering network, which steers the proper programming voltage to the appropriate gate electrode of the SONOS synaptic weight elements. The following sections discuss the various parts of the linear adaptive neuron.

[Figure 5-1: Block Diagram of the Single Level Linear Adaptive Neuron: analog delay line producing x0(m) and x1(m); weights W0 and W1; correlated double sampling producing y(m); comparison with d(m) to form error(m); steering network with the CLMS error algorithm; digital delay line producing sgn[x0(m)] and sgn[x1(m)].]
5.1.1 Digital Control/Clocking Module
The digital control/clocking module generates all the clock signals required for the different operation modes during adaptation. The clocking scheme sets the positions of the analog switches in the circuit and thereby controls the signal path within the circuit. Since the circuit is analog in nature but controlled by digital clock signals, its operation may be classified as Discrete Analog Signal Processing (DASP).
Because of the power limitation of the analog switches, the clocking waveforms are either unipolar (0 to 7.5V) or bipolar (-7.5 to 7.5V). For a unipolar clock, the low level (logic 0) is 0V and the high level (logic 1) is 7.5V. For a bipolar clock, the low level is -7.5V and the high level is 7.5V. Since the CMOS logic chips in the digital control module are powered between GND and 7.5V, a level shift is required to produce the bipolar waveforms. The level shifting is done with a comparator that compares the input signal against a 2.5V reference voltage. When the input signal is at logic 1 (7.5V), the comparator outputs 7.5V, and when the input signal is at logic 0 (0V), the comparator outputs -7.5V. The module is built on a printed circuit board with its own power supply, and each chip in the module has its own decoupling capacitor.
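The comparator-based level shift can be modeled, as a minimal sketch, by an ideal comparator using the 2.5V reference and ±7.5V rails given above. Hysteresis, propagation delay and slew rate are ignored, and the function name is hypothetical.

```python
def level_shift(v_in, v_ref=2.5, rail=7.5):
    """Ideal comparator: unipolar CMOS logic (0 V / 7.5 V) in, bipolar (-7.5 V / +7.5 V) out."""
    return rail if v_in > v_ref else -rail


unipolar_clock = [0.0, 7.5, 7.5, 0.0]
bipolar_clock = [level_shift(v) for v in unipolar_clock]
print(bipolar_clock)  # [-7.5, 7.5, 7.5, -7.5]
```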
Each clock signal is defined in Table 5-1. The timing diagram of the output waveforms is shown in figure 5-2. As shown in Table 5-1, Φ0, the master clock, is generated by a CMOS 555 timer multivibrator whose oscillation frequency is set by two external resistors and one external capacitor. All the data presented in this thesis correspond to a 12.8 kHz master clock frequency. A series of T flip-flops divides the master clock frequency by 2, 4, and 8 to provide the signals needed to generate all the other control clocks. If Φ0/2, Φ0/4, and Φ0/8 denote the clock signals at one half, one quarter, and one eighth of the master clock frequency, respectively, then all the control clocks can be expressed by the logical operations listed in Table 5-2, where an overbar denotes the inversion of a clock signal.

[Figure 5-2: Timing Diagram of the Digital Control/Clocking Module (waveforms Φ0, Φ1, Φ2, Φir, Φis, Φc, Φs, Φup, Φread)]
The analog delay line sampling clocks, Φ1 and Φ2, provide a two-phase, non-overlapping clock for the switched-capacitor analog delay line. The input clocks, Φir and Φis, coincide with Φ1 but run at half its frequency, so whenever Φ1 is high, either Φir or Φis is high. When Φir is high, the analog delay line receives the reference voltage as its input. When Φis is high, the analog delay line is fed with the sum of the reference voltage and the input signal. This input scheme makes the reference voltage and the sum of the reference voltage and the input signal appear alternately at the tapped location, which allows the adaptive signal processor to remove the nonlinear effects caused by excessive drain bias. Φc and Φs are the two clock signals used by the correlated double sampling circuit of the analog signal processor (described in a later section) to remove these nonlinear effects. After Φs is closed, the output signal is valid, and the error signal and the weight adjustment can be calculated. Once the weight adjustment is calculated, Φup goes high so that the threshold voltage of the SONOS synaptic elements can be altered. While the SONOS synaptic weights are not being updated, the Φread clock is high to read out the conductance of the SONOS weight elements.

Table 5-1: Clock Signal Definition

Signal   Description                                    Polarity
Φ0       Master Clock                                   Unipolar
Φ1       Sampling Clock for Analog Delay Line           Unipolar
Φ2       Sampling Clock for Analog Delay Line           Unipolar
Φir      Input Reference Clock                          Unipolar
Φis      Input Signal Clock                             Unipolar
Φc       Clamping Clock (Correlated Double Sampling)    Bipolar
Φs       Sampling Clock (Correlated Double Sampling)    Bipolar
Φup      Weight Updating Clock                          Bipolar
Φread    Weight Reading Clock                           Bipolar

[Table 5-2: Clock Signal Generation Operation, giving the logical operation (AND combinations of Φ0, Φ0/2, Φ0/4, Φ0/8, Φ1 and their inversions) that generates each control clock]
The schematic of the digital control/clocking module is shown in figure 5-3. During the read operation of the SONOS synaptic weight elements, a small amount of charge is injected into the nitride layer of the gate dielectric. This allows for read enhancement of the SONOS weight. However, excessive read voltages will cause the weight value to drift away from the steady-state optimum value calculated by the circuit. It is therefore necessary to incorporate an idling process for the SONOS devices, in which all the terminals of each SONOS device sit at the same potential. The programmed mode thus consists of three subprocesses: the updating process, the reading process, and the idling process. Incorporating the idling process requires a modification of the switching network described in a later section. A new timing scheme with the idle clock signal is shown in figure 5-4, and the corresponding schematic with an EPROM is shown in figure 5-5. All the data presented in this thesis were taken without the idle clock signal.
Figure 5-3: Schematic of the Digital Control/Clocking Module (74HC08, 74HC11 and 74HC21 gates; ±7.5 V supplies)

Figure 5-4: Timing Diagram of the Digital Control/Clocking Module with Idle Clock Signal (Φ1, Φ2, Φir, Φis, Φc, Φs, Φup, Φread, Φidle)

To sample the input waveform adequately, the Nyquist theorem requires a sampling frequency of at least twice the input signal frequency. However, the circuit relies on accurate knowledge of the phase relationship between the tapped signals x0 and x1. They must be delayed to give $\omega_0 T = (2n+1)\pi/2 + \Delta\theta$, to an accuracy determined by Δθ. We therefore have to sample the input signal much faster than the Nyquist theorem requires. An error Δθ in the phase relationship between x0 and x1 determines the mean square error. For the linear adaptive neuron setup, the sampling clock is 16 times the input signal frequency, and therefore Φ1 and Φ2 run at 32 times the input signal frequency.
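The relation $\omega_0 T = (2n+1)\pi/2 + \Delta\theta$ can be turned into a quick design check. The sketch below assumes that each delay stage delays the signal by exactly one sampling-clock period, which is a simplification of the actual two-phase timing. It then computes how many stages give the closest match to a 90 degree tap spacing and the residual phase error Δθ left by that choice.

```python
def phase_per_stage_deg(samples_per_period):
    """Phase delay in degrees of one stage, assuming each stage delays the
    signal by exactly one sampling-clock period."""
    return 360.0 / samples_per_period


def stages_for_phase(target_deg, samples_per_period):
    """Whole number of stages closest to target_deg (ties round to even),
    plus the residual phase error delta-theta in degrees."""
    step = phase_per_stage_deg(samples_per_period)
    n = round(target_deg / step)
    return n, n * step - target_deg


if __name__ == "__main__":
    print(stages_for_phase(90.0, 16))  # 16 samples per input period
    print(stages_for_phase(90.0, 9))   # a coarser clock leaves a phase error
```

With 16 samples per input period, each stage contributes 22.5 degrees, so four stages give exactly 90 degrees. A coarser clock of 9 samples per period can only reach 80 degrees, which leaves Δθ = -10 degrees. This illustrates why the input is sampled well above the Nyquist rate.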
5.1.2 The Analog Delay Line
The function of an analog delay line is to transfer information, either as charge or as voltage, through distinct but similar stages in the circuit. A main objective of this transfer is to keep the information loss as small as possible. Common analog delay lines are implemented in Charge Coupled Device (CCD) or Bucket Brigade technologies. In the linear adaptive neuron circuit, the analog delay line is implemented in switched capacitor fashion. The switched capacitor scheme is chosen because it makes the analog delay line easy to incorporate into CMOS technology. Switched capacitor analog delay lines operate by transferring charge and storing the charge information on storage capacitors. The clock-controlled switches are responsible for charging and discharging the storage capacitors. The operational amplifier provides the charging current for the storage capacitor.

Figure 5-5: Schematic of the Digital Control/Clocking Module with Idle Clock Signal (555 timer multivibrator, 74LS161 0-15 counter, 2k×8 EPROM, D flip-flops producing Φ1, Φ2, Φir, Φis and Φread)
Figure 5-6: Schematic of the Analog Delay Line with Input Formation Circuitry (with the timing of Φ1, Φ2, Φis and Φir)
The analog delay line is tapped at different spatial locations to provide delayed versions of the input signal. Because the input signal must continue down the delay line, the tapped signal has to be read out nondestructively. The operational amplifiers in the delay line also serve as buffers at these tapped locations. The speed of the delay line is determined by the control/clocking scheme described in the previous section. If there are more stages between two adjacent tapped locations, we can sample the input waveform in finer detail and obtain better information about the input signal. As mentioned in the previous section, the sampling clock frequency is 16 times the input signal frequency, and the analog delay line sampling clocks, Φ1 and Φ2, run at 32 times the input signal frequency. We can therefore adjust the analog delay line sampling clock and/or the spatial tapped location to achieve a 90 degree phase difference between the two tapped signals, x0 and x1. Figure 5-6 shows the circuit schematic of the analog delay line with the input formation circuitry.
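The transfer of samples down the tapped line can be illustrated with an idealized behavioral model. This is not a circuit simulation. It assumes unity charge transfer, no leakage and no amplifier offset. Each stage is modeled as a pair of hypothetical capacitors, `c1` charged on Φ1 and `c2` charged on Φ2. Reading `c2` corresponds to the buffered, nondestructive tap.

```python
def run_delay_line(samples, n_stages):
    """Ideal two-phase delay line: no leakage, unity charge transfer.

    Returns, for every input sample, the list of tap voltages (one per stage).
    Tap k holds the input delayed by k clock periods.
    """
    c1 = [0.0] * n_stages  # capacitors charged while phi1 is high
    c2 = [0.0] * n_stages  # capacitors charged while phi2 is high (buffered taps)
    history = []
    for x in samples:
        # phi1: each stage samples the buffered output of the previous stage
        for k in range(n_stages - 1, 0, -1):
            c1[k] = c2[k - 1]
        c1[0] = x
        # phi2: the charge is copied onto the second capacitor of each stage
        for k in range(n_stages):
            c2[k] = c1[k]
        history.append(list(c2))
    return history


if __name__ == "__main__":
    v_ref = 2.5
    # Alternate reference and reference-plus-signal frames, as Phi_ir/Phi_is do
    frames = [v_ref, v_ref + 0.3, v_ref, v_ref - 0.2, v_ref, v_ref + 0.1]
    for step, taps in enumerate(run_delay_line(frames, 3)):
        print(step, taps)
```

Because the reference and reference-plus-signal frames alternate at the input, every tap also alternates between the two, one clock period later per stage.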
The figure also shows the clock waveforms of Φ1, Φ2, Φir and Φis to help explain the circuit operation. Let us divide time into frames T1, T2, and so on, and label reference points along the analog delay line as A, B, and so on. During T1, both Φ1 and Φir are high, and the voltage Vref appears across the capacitors C1 and Ci. During T2, the voltage on capacitor C1 is transferred to capacitor C2. When time progresses to T3, Φis is high, and point A takes on the potential of the input voltage. The capacitor still holds the reference voltage Vref across it, and the voltage on a capacitor cannot change instantaneously. Point B therefore rises to the sum of the input signal voltage and the reference voltage. If we concentrate on one reference point, say point C, we observe an alternation between the reference voltage and the sum of the input signal voltage and the reference voltage as time progresses from one frame to the next. Table 5-3 summarizes the analog delay line operation for each time frame and reference point.
Table 5-3: Analog Delay Line Operation

| Time Frame | A | B, C, D | E, F | G, H |
|---|---|---|---|---|
| T1 | Ground | Vref | Vref + x | Vref + x |
| T2 | Ground | Vref | Vref | Vref + x |
| T3 | x | Vref + x | Vref | Vref |
| T4 | x | Vref + x | Vref + x | Vref |

The capacitance value in the analog delay line must be chosen carefully to ensure proper signal transfer, and several criteria apply. First, the capacitance has to be large enough to hold the stored charge and thus increase the signal transfer efficiency. Let Δt be the time during which the charging switch is open, and let Ileak be the leakage current during that period. The loss of capacitor voltage ΔV is then

$$\Delta V = \frac{I_{leak}\,\Delta t}{C} \qquad (5.1)$$

where C is the value of the storage capacitor. However, if the storage capacitance is too large, the long charging RC time constant will leave the capacitor undercharged during the short charging time allowed by the clock signal. A large capacitance also yields a long discharging time constant, which is undesirable. These conflicting criteria must be satisfied simultaneously when choosing the capacitance value. The leakage current and ON resistance can be obtained from the data book for the 4066 analog switch. On this basis, a value of 0.1 µF was found to perform well in my experimental setup.
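The trade-off between droop (5.1) and undercharging can be checked numerically. The sketch below treats charging as a single-pole RC step through the switch ON resistance, which is an assumption. The leakage current, hold time, ON resistance and charging window used in the usage lines are hypothetical round numbers, not values from the 4066 data book or from the thesis setup.

```python
import math


def droop_volts(i_leak, dt, c):
    """Voltage lost while the switch is open: dV = I_leak * dt / C (Eq. 5.1)."""
    return i_leak * dt / c


def charge_fraction(r_on, c, t_charge):
    """Fraction of the final voltage reached in the charging window, assuming
    a single-pole RC charge through the switch ON resistance r_on."""
    return 1.0 - math.exp(-t_charge / (r_on * c))


if __name__ == "__main__":
    i_leak, t_hold = 1e-9, 1e-3   # hypothetical: 1 nA leakage, 1 ms hold time
    r_on, t_on = 100.0, 100e-6    # hypothetical: 100 ohm switch, 100 us window
    for c in (0.01e-6, 0.1e-6, 10e-6):
        print(f"C = {c:.2e} F: droop = {droop_volts(i_leak, t_hold, c):.2e} V, "
              f"charged to {charge_fraction(r_on, c, t_on):.4f} of final value")
```

With these assumed numbers, 0.1 µF keeps the droop in the microvolt range and still charges to essentially the full value. A 10 µF capacitor would reach less than 10% of the final voltage within the window.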
5.1.3 The Analog Signal Processor 
The tapped signals from the analog delay line, x0 and x1, and the desired signal, d, are the inputs to this module. The output signal is the error signal, ε. Figure 5-7 shows the circuit schematic of the analog signal processor. The input signal x0 is connected to the drain terminals of a pair of SONOS nonvolatile transistors that together serve as one weight element. The drain current of each SONOS transistor is the product of the input signal and the channel conductance. The source terminal of each transistor is connected to a summing path, where the drain currents are summed. In the linear adaptive neuron setup, two summing paths are provided for each pair of SONOS transistors. The upper summing path is defined as the positive summing path, and the lower summing path as the negative summing path. The two summing paths are eventually combined into a single-ended output. This output feeds a correlated double sampling circuit, which removes the unwanted noise and the offset voltages introduced by the operational amplifiers and the switches. The output of the neuron, y, is then compared with the desired or training signal, d, to generate the error signal, ε, which is minimized in order to train the system.
Figure 5-7: Schematic of the Analog Signal Processor

Let us denote the conductances connected to the positive summing path with a superscript + and the conductances connected to the negative summing path with a superscript −. We first analyze the positive summing path. The product of the input signal x0 and the conductance of the left transistor in the pair, $g^{+}_{ds0}$, is combined with the product of the input signal x1 and the conductance $g^{+}_{ds1}$. The operational amplifier, with a gain of −Rf, converts this sum into a corresponding voltage. For small Vds, the voltage at point A of the differential amplifier is therefore

$$v_A = -R_f\left(g^{+}_{ds0}\,x_0 + g^{+}_{ds1}\,x_1\right) \qquad (5.2)$$

Similarly, the same input signals are multiplied by the conductances of the right transistors and converted to a voltage, which gives at point B

$$v_B = -R_f\left(g^{-}_{ds0}\,x_0 + g^{-}_{ds1}\,x_1\right) \qquad (5.3)$$

A single-ended output is required to generate the error signal, so vA and vB are combined by the differential amplifier to yield the output yn:

$$y_n = \frac{R_f R_2}{R_1}\left[\left(g^{+}_{ds0} - g^{-}_{ds0}\right)x_0 + \left(g^{+}_{ds1} - g^{-}_{ds1}\right)x_1\right] \qquad (5.4)$$

If we define the output yn to be the linear product sum of the input signals and the weight values,

$$y_n = \frac{R_f R_2}{R_1}\left(W_0\,x_0 + W_1\,x_1\right)$$

then each weight value is the difference in conductance within a transistor pair:

$$W_k = g^{+}_{dsk} - g^{-}_{dsk} \qquad (5.5)$$

A more exact analysis that includes the second-order Vds nonlinear terms in Ids gives the same expressions for the weight values when correlated double sampling is employed. From this point on, the quantity $R_f R_2 / R_1$ will be referred to as the gain factor of the differential amplifier section. Because each weight is a difference of two conductances, it can be either positive or negative, as discussed in the previous chapter.
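The claim that the second-order Vds terms drop out can be illustrated with the textbook long-channel triode model, $I_{ds} = \beta[(V_{gs}-V_t)V_{ds} - V_{ds}^2/2]$. Two assumptions are made here: the two transistors of a pair have matched β and share the same read gate voltage, and the weight is stored only as a threshold difference. Under these assumptions the $V_{ds}^2$ terms are identical on both paths and cancel in the difference. Correlated double sampling then removes the Vref term. The parameter values are hypothetical.

```python
def triode_current(beta, v_gs, v_t, v_ds):
    """Long-channel square-law drain current in the triode region."""
    return beta * ((v_gs - v_t) * v_ds - 0.5 * v_ds ** 2)


def pair_output(v_ds, beta, v_read, vt_plus, vt_minus, r_f, gain=1.0):
    """One weight pair: each summing path is a transimpedance stage of gain
    -r_f, and the differential stage forms gain * (v_b - v_a)."""
    v_a = -r_f * triode_current(beta, v_read, vt_plus, v_ds)   # positive path
    v_b = -r_f * triode_current(beta, v_read, vt_minus, v_ds)  # negative path
    return gain * (v_b - v_a)


def cds_output(x, v_ref, **pair):
    """Sampling phase (drain at v_ref + x) minus clamping phase (drain at v_ref)."""
    return pair_output(v_ref + x, **pair) - pair_output(v_ref, **pair)


if __name__ == "__main__":
    pair = dict(beta=1e-4, v_read=3.0, vt_plus=1.0, vt_minus=2.0, r_f=1e4)
    w = pair["beta"] * (pair["vt_minus"] - pair["vt_plus"])  # W = g+ - g- (Eq. 5.5)
    for v_ref in (0.3, 0.5):
        print(v_ref, cds_output(0.2, v_ref, **pair), pair["r_f"] * w * 0.2)
```

The pair output after correlated double sampling equals Rf·W·x for both reference levels. A single transistor on its own does not behave this way. Its CDS difference, `triode_current(1e-4, 3.0, 1.0, 0.7) - triode_current(1e-4, 3.0, 1.0, 0.5)`, is 2.8e-5 A rather than g·x = 4e-5 A.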
The weighted sum of the input signals, yn, contains the DC reference voltage and noise, and the differential amplifier amplifies these as well. The correlated double sampling circuit is designed to remove all signal components common to the clamping and sampling phases of each cycle. The clamping clock, Φc, goes high immediately after the Φ1 and Φir clocks go high. At that moment both input terminals, x0 and x1, are at the DC reference potential. Point D, at one end of the clamping capacitor Cc, is therefore charged to the gain factor multiplied by the reference voltage and the sum of the two weight values. The other plate of the capacitor (point E) is grounded. During the next time frame, the sampling clock Φs goes high, and the input terminals carry the sum of the DC reference voltage and the AC input signal. The differential amplifier network amplifies the input signals, which then appear at point D. The voltage across the clamping capacitor cannot change instantaneously. The voltage at point E therefore becomes the gain factor multiplied by the products of the pure AC input signals and their respective weight values. The correlated double sampling operation is summarized in Table 5-4. Note that the 'clean' output of the correlated double sampling circuit is valid only after the clock Φs is high. Once the output signal is valid, it is compared with the sampled version of the desired (training) signal. The differential amplifier computes the difference to generate the error signal.

Table 5-4: Correlated Double Sampling Circuit Operation

| Time Frame | Reference Point | Voltage Level |
|---|---|---|
| Clamping | D | $\frac{R_f R_2}{R_1}\left[V_{ref}W_0 + V_{ref}W_1\right]$ |
| Clamping | E | Ground |
| Sampling | D | $\frac{R_f R_2}{R_1}\left[(V_{ref}+x_0)W_0 + (V_{ref}+x_1)W_1\right]$ |
| Sampling | E | $\frac{R_f R_2}{R_1}\left[x_0 W_0 + x_1 W_1\right]$ |
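Table 5-4 can be reproduced with an ideal clamp-and-sample model. The sketch assumes an ideal clamping capacitor and switch, and adds a hypothetical amplifier offset that drifts slightly between the clamping and sampling frames. The offset is common to both frames, so only its change between the frames survives at point E. The gain factor, weights and signal values are illustrative.

```python
def cds(v_clamp, v_sample):
    """Ideal clamp-and-sample stage. While phi_c is high, point E is grounded,
    so the clamping capacitor stores v_clamp. While phi_s is high, point E
    equals point D minus that stored voltage."""
    stored = v_clamp  # voltage across C_c with E at ground
    return v_sample - stored


def demo(gain=2.0, w=(0.5, -0.25), v_ref=2.5, x=(0.1, -0.3),
         offset=0.3, drift=1e-4):
    """Point-D voltages with an amplifier offset that drifts by `drift`
    between the clamping and sampling frames (all values hypothetical)."""
    d_clamp = gain * (v_ref * w[0] + v_ref * w[1]) + offset
    d_sample = gain * ((v_ref + x[0]) * w[0] + (v_ref + x[1]) * w[1]) + offset + drift
    ideal = gain * (x[0] * w[0] + x[1] * w[1])  # last row of Table 5-4
    return cds(d_clamp, d_sample), ideal


if __name__ == "__main__":
    out, ideal = demo()
    print(f"CDS output {out:.6f} V, ideal {ideal:.6f} V")
```

The output differs from the ideal value only by the 0.1 mV drift. Both the Vref term and the constant 0.3 V offset are removed.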
5.1.4 The Learning Algorithm Module
The primary function of this module is to calculate the incremental weight value so that the overall error signal is minimized iteratively. The error is generated in the analog signal processor module and fed to the learning algorithm module. The learning algorithm module takes the magnitude and sign information of the error and outputs two programming paths. The positive programming path amplifies the error magnitude by 2G, where the module gain G ranges from 0 to 1. The negative programming path amplifies the error magnitude by −2G. Each programming path supplies the positive or negative programming voltage applied to the gates of the SONOS transistors during the update mode. The programming paths are level shifted to add a DC bias to the amplified AC error signal, so that the SONOS transistors are programmed more effectively.

The circuit schematic of the learning algorithm module is shown in figure 5-8. The magnitude of the error signal is obtained by passing the error signal through an absolute value extractor, shown in the input section of figure 5-8. First, a comparator with its inverting terminal connected to ground extracts the sign of the error signal. The sign signal is then inverted, and both the sign and the inverted sign signals drive the switches in the absolute value extractor. If the error signal is positive, the switches configure the operational amplifier as a noninverting amplifier and the error signal passes through unchanged. If the error signal is negative, the switches configure the operational amplifier as an inverting amplifier, which inverts the incoming error signal. The sign information is also used by the algorithm to calculate the incremental weight.
Figure 5-8: Schematic of the Learning Algorithm Module (PPP: Positive Programming Path; NPP: Negative Programming Path; sign logic forming sgn(error) ⊕ sgn(x0) and sgn(error) ⊕ sgn(x1))

The learning algorithm implemented in the linear adaptive neuron is the clipped data algorithm discussed in chapters 2 and 4. The incremental weight can be written as

$$\Delta W_k = 2G\,|\varepsilon|\,\mathrm{sgn}(\varepsilon)\,\mathrm{sgn}(x_k)$$

where the index k denotes the spatial location of the weight. Let logic level 1 represent the sign of a positive error signal and logic 0 the sign of a negative error signal. The operation sgn(ε)·sgn(xk) can then be treated as a digital multiplication and implemented with a simple Exclusive OR operation. Depending on the result of the Exclusive OR, the steering network described in the following section takes either the positive or the negative programming path voltage. During the update mode, it directs that voltage to the gate of the SONOS memory transistor.
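The clipped data update can be simulated at the algorithm level, ignoring all SONOS device physics. The sketch below makes the following assumptions. The weights change by exactly μ·ε·sgn(xk), with μ standing in for 2G. The input is a unit sinusoid sampled 16 times per period, and the second tap lags it by exactly 90 degrees. The training signal borrows the amplitude and phase used in the measurements of section 5.2 (10 times the input, 45 degree lead). For this idealized case, the exact solution is W0 = 10·cos 45° and W1 = −10·sin 45°.

```python
import math


def sign_data_lms(n_samples, mu=0.01, amp=10.0, lead_deg=45.0, spp=16):
    """Two-tap clipped-data LMS: W_k <- W_k + mu * e * sgn(x_k).

    x0 is a unit sinusoid, x1 lags it by 90 degrees (the two tapped signals),
    and the training signal d has amplitude `amp` and leads x0 by `lead_deg`.
    `spp` is the number of samples per input period. Samples are taken half a
    step off zero so that no tap value is exactly zero.
    Returns the final weights and the last error sample.
    """
    w = [0.0, 0.0]
    lead = math.radians(lead_deg)
    err = 0.0
    for n in range(n_samples):
        theta = 2.0 * math.pi * ((n % spp) + 0.5) / spp
        x = (math.sin(theta), math.sin(theta - math.pi / 2))
        d = amp * math.sin(theta + lead)
        err = d - (w[0] * x[0] + w[1] * x[1])
        for k in range(2):
            # mu * err stands in for 2G|err|sgn(err); sgn(x_k) is the clipped tap
            w[k] += mu * err * math.copysign(1.0, x[k])
    return w, err


if __name__ == "__main__":
    weights, last_err = sign_data_lms(20000)
    print("weights:", weights, "last error:", last_err)
```

Only the sign of each tap enters the update, which is what allows the hardware to form the product with a single Exclusive OR gate. The error magnitude still sets the step size.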
5.1.5 The Steering Network 
The steering network is composed of switches, controlled by the output of the learning algorithm module, that steer the proper programming voltages from the programming paths. To program the SONOS weight element more effectively, the channel conductances of each transistor pair are programmed in opposite directions. For example, suppose the algorithm requires an increase in the overall weight value. The channel conductance of the left transistor in the pair, $g^{+}_{ds}$, is then increased by connecting the negative programming path to its gate terminal (erase operation). At the same time, the channel conductance of the right transistor, $g^{-}_{ds}$, is decreased by the voltage from the positive programming path (write operation). This scheme is chosen because it gives faster convergence of the weight value. Figure 5-9 shows the circuit schematic of the steering network. Let $g^{+}_{ds0}$ denote the left transistor of the W0 pair and $g^{-}_{ds0}$ the right transistor of the W0 pair. The steering network operation (programming voltage) is then summarized in Table 5-5.
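The steering network also switches the appropriate voltages to the gate terminals of the SONOS transistors in each mode, to ensure proper operation of the entire circuit. It applies the read voltage during the read mode, the initializing voltages during the initializing mode, and the read voltage during the disabled mode.

Table 5-5: Steering Network Circuit Operation

| sgn ε · sgn x0 | sgn ε · sgn x1 | $g^{+}_{ds0}$ | $g^{-}_{ds0}$ | $g^{+}_{ds1}$ | $g^{-}_{ds1}$ |
|---|---|---|---|---|---|
| + | + | $-2G\lvert\varepsilon\rvert$ | $+2G\lvert\varepsilon\rvert$ | $-2G\lvert\varepsilon\rvert$ | $+2G\lvert\varepsilon\rvert$ |
| + | - | $-2G\lvert\varepsilon\rvert$ | $+2G\lvert\varepsilon\rvert$ | $+2G\lvert\varepsilon\rvert$ | $-2G\lvert\varepsilon\rvert$ |
| - | + | $+2G\lvert\varepsilon\rvert$ | $-2G\lvert\varepsilon\rvert$ | $-2G\lvert\varepsilon\rvert$ | $+2G\lvert\varepsilon\rvert$ |
| - | - | $+2G\lvert\varepsilon\rvert$ | $-2G\lvert\varepsilon\rvert$ | $+2G\lvert\varepsilon\rvert$ | $-2G\lvert\varepsilon\rvert$ |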
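The steering decision in Table 5-5 for one weight pair can be sketched as a small function. It assumes the logic convention stated in the text (logic 1 for a positive sign, combined by Exclusive OR). Only the AC programming term is modeled; the DC level shift added to each programming path is omitted. The function name and argument conventions are hypothetical.

```python
def steering_voltages(err, x_sign, gain):
    """AC programming voltages (v_plus, v_minus) for the gates of g+ and g-.

    err: signed error sample; x_sign: +1 or -1 (sign of the tap input);
    gain: learning gain G in [0, 1]. The DC level shift is not modelled, and
    err == 0 is treated as a negative sign (logic 0).
    """
    err_bit = 1 if err > 0 else 0   # logic 1 = positive error
    x_bit = 1 if x_sign > 0 else 0  # logic 1 = positive tap input
    agree = (err_bit ^ x_bit) == 0  # XOR low => sgn(err) * sgn(x) positive
    mag = 2.0 * gain * abs(err)
    if agree:
        return -mag, mag  # raise W: erase g+ (negative gate), write g- (positive gate)
    return mag, -mag      # lower W: write g+, erase g-


if __name__ == "__main__":
    for err in (0.5, -0.5):
        for x_sign in (+1, -1):
            print(err, x_sign, steering_voltages(err, x_sign, gain=0.5))
```

Each of the four printed rows matches the corresponding column pair of Table 5-5 with G|ε| = 0.25.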
5.2 Measurement Setup and Results 
The electrical performance of the linear adaptive neuron is demonstrated by two major characteristics: (1) the output and training signals versus time, and (2) the error signal versus time. The former shows how well the output signal approximates the training signal, especially in the phase relationship between the two signals. The latter shows how fast the linear adaptive neuron adapts before it reaches its minimum error. All the waveforms and photographs are taken from a Tektronix 7854 Digital Storage Oscilloscope. The power supplies provide the power rails for circuit operation. They also offer a convenient way to adjust the variable reading, DC reference, positive programming shift, and negative programming shift voltages. A function generator provides the input and training signals. The training signal amplitude is set to 10 times the input signal amplitude. The training signal is also phase-shifted by 45 degrees with respect to the input signal, in order to demonstrate both the amplitude and the phase adaptation of the linear adaptive neuron. There is a 90 degree phase shift between the two tapped input signals, as described and analyzed in section 5.1.

Figure 5-9: Schematic of the Steering Network (switches CON A to CON H route PPP, NPP, Vinit and Vread to the SONOS gates under control of Φup and Φread)
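For this test setup, the exact weights that reproduce the training signal can be worked out in closed form. The idealized model assumes noise-free sinusoids: x0 = sin θ, x1 = sin(θ − φ) with tap spacing φ, and d = A sin(θ + α). It also ignores the gain factor, so the weights are in units of the normalized output. Matching the sin θ and cos θ terms gives W1 = −A sin α / sin φ and W0 = A cos α − W1 cos φ. With A = 10, α = 45° and φ = 90°, this gives W0 = −W1 ≈ 7.07, which requires both a positive and a negative weight.

```python
import math


def optimal_weights(amp, lead_deg, tap_deg=90.0):
    """Exact two-tap weights reproducing d = amp*sin(theta + lead) from
    x0 = sin(theta) and x1 = sin(theta - tap). Requires tap not 0 or 180 deg."""
    a = math.radians(lead_deg)
    p = math.radians(tap_deg)
    w1 = -amp * math.sin(a) / math.sin(p)  # matches the cos(theta) terms
    w0 = amp * math.cos(a) - w1 * math.cos(p)  # matches the sin(theta) terms
    return w0, w1


if __name__ == "__main__":
    print(optimal_weights(10.0, 45.0))             # ideal 90 degree taps
    print(optimal_weights(10.0, 45.0, tap_deg=80.0))  # taps with a 10 degree phase error
```

In this idealized model, a tap phase error shifts the required weights but does not prevent an exact match unless the taps collapse to 0 or 180 degrees.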
5.2.1 Output and Training Signals versus Time Characteristics
The output and training signals versus time characteristics consist of two parts: the initialized part and the adapted part. In the initialized part, the weights are set to a known state, either the fully positive state or the fully negative state. The fully positive state is achieved by applying a negative programming gate bias to the transistors connected to the positive summing path and a positive programming gate bias to the transistors connected to the negative summing path. For the fully negative state, the programming gate biases applied to the channel conductances are exactly reversed.

The circuit is then placed in the disabled mode, and the output and training signals are monitored on the oscilloscope. The input and training signals differ in both amplitude and phase. The output signal, configured to represent the weighted sum of the input signals, therefore starts with an amplitude and phase mismatch relative to the training signal. Initially, the output signal is larger in amplitude than the training signal because of the built-in gain of the circuit. When the circuit is placed in the programmed mode, the output signal shrinks in amplitude and phase-locks with the training signal as the error signal is minimized. Figures 5-10 a and b show the output and training signals versus time before and after adaptation.
Figure 5-10: Output and Training Signals versus Time Characteristics: (a) Initialized and (b) Adapted (traces: D, training signal; y, output)
5.2.2 Error Signal versus Time Characteristics
The error signal versus time characteristics provide quantitative information on how fast the linear adaptive neuron adapts to the training signal, that is, how fast the error signal reaches its minimum. Again, the circuit has to start from known weight states. However, the measurement technique differs from that used for the output and training signals versus time characteristics in two ways. First, the circuit does not have to go through the disabled mode. In the disabled mode, the error feedback path is disconnected from the gate terminals of the SONOS devices, and a read voltage is applied to all the SONOS gate electrodes to read out the conductance information. Second, the digital oscilloscope is configured for a single-sweep, one-shot triggered time base instead of its normal 'free-running' automatic time base. The single sweep is triggered by the flip of the SPDT switch that changes the circuit from the initialized mode to the programmed mode. The oscilloscope continuously monitors the error signal, digitizes the data, and stores it in memory for later display. Once the time base limit is reached, the oscilloscope stops sampling the error signal and displays all the data points on the CRT screen. The data can then be captured photographically or transferred over the oscilloscope's IEEE 488 bus to an HP 9836 computer.
Interestingly, different weight initialization schemes affect how the circuit converges to a minimum error. One possible explanation is that the SONOS transistors have different erase and write characteristics. When the weights are initialized differently, the circuit at times does more erasing than writing, and vice versa, which produces different error versus time characteristics. Figures 5-11, 5-12, 5-13 and 5-14 show the error versus time characteristics for the four initialization schemes.

Figure 5-11: Error versus Time Characteristics - Initialization Scheme: + - + -
We define the system time constant as the time it takes for the error signal to decay from its maximum value to 1/e of that maximum. We also define the adaptivity as

$$\text{Adaptivity} = 20\log_{10}\left(\frac{\varepsilon}{\varepsilon_{max}}\right)\Big|_{\text{steady state}} \qquad (5.6)$$

The experimental results are summarized in Table 5-6. The adaptivity listed in Table 5-6 is calculated from the final error amplitude at the end of the 50 second adaptation period. If the circuit is allowed to adapt for a longer period, the adaptivity is found to improve. Table 5-6 shows that the initialization scheme with the smallest system time constant yields the poorest adaptivity. More experiments have to be performed before any direct relationship between the system time constant and the adaptivity can be drawn.

Figure 5-12: Error versus Time Characteristics - Initialization Scheme: + - - +

Figure 5-13: Error versus Time Characteristics - Initialization Scheme: - + + -

Figure 5-14: Error versus Time Characteristics - Initialization Scheme: - + - +

Table 5-6: Summary of the Experimental Results

| Initialization | System Time Constant (s) | Adaptivity (dB) |
|---|---|---|
| + - + - | 3 | -20 |
| + - - + | 1 | -4.4 |
| - + + - | 2 | -16.4 |
| - + - + | 3 | -7.9 |
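Both figures of merit can be extracted from a digitized error envelope, such as the data transferred to the HP 9836. The sketch below takes the adaptivity as 20 log10 of the final error over the maximum error. The system time constant is measured from the time of the maximum to the first 1/e crossing. The envelope in the usage lines is synthetic and is not measured data from Table 5-6.

```python
import math


def time_constant(times, envelope):
    """Time from the envelope maximum to its first fall to 1/e of that
    maximum; None if it never decays that far within the record."""
    i_max = max(range(len(envelope)), key=envelope.__getitem__)
    threshold = envelope[i_max] / math.e
    for t, e in zip(times[i_max:], envelope[i_max:]):
        if e <= threshold:
            return t - times[i_max]
    return None


def adaptivity_db(envelope):
    """Adaptivity (Eq. 5.6): 20*log10(final error / maximum error)."""
    return 20.0 * math.log10(envelope[-1] / max(envelope))


if __name__ == "__main__":
    ts = [i * 0.01 for i in range(5001)]  # 50 s record, 10 ms spacing
    env = [0.9 * math.exp(-t / 2.0) + 0.1 for t in ts]  # synthetic envelope
    print("time constant (s):", time_constant(ts, env))
    print("adaptivity (dB):", adaptivity_db(env))
```

Because the final error floor enters the 1/e threshold, a large residual error also lengthens the measured time constant. Keeping the two figures separate avoids conflating them.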
Chapter 6
Conclusions
A solid-state electronic two-tap weight linear adaptive neuron has been designed and constructed in breadboard form. The theory of operation of the linear adaptive neuron was explored in Chapter 4, and its electrical performance and experimental results are presented in Chapter 5. The source code and results of a software simulation, written to mimic the electrical operation of the neuron, are included in Appendix C. The two-tap weight linear adaptive neuron with the Widrow-Hoff delta learning rule serves as a test vehicle. It demonstrates the salient features of SONOS nonvolatile memory transistors used as the synaptic elements in a hardware implementation of neural networks. The SONOS fabrication technology and the characterization techniques for the SONOS devices are outlined in Chapter 3. These characterization techniques include the high frequency C-V characteristics, the linear voltage ramp characteristics, the erase/write characteristics, the retention characteristics and the dynamic range characteristics. The attractive features of the SONOS electrically modifiable synaptic elements are:
• Analog (Free from quantization error)
• Small Size (Estimated 20 µm²/weight cell for 1.2 µm feature size)
• Low Programming Voltages (< 7.5 V)
• Low Power Consumption (< 1 µW/weight cell)
• Good Dynamic Range (60 dB)
• Good Memory Retention (20% window at a projected 10 year period)
• Reinforced Learning
• Excitatory/Inhibitory Synaptic Behavior
The electrical performance of the linear adaptive neuron is found to depend on the weight initialization. It is postulated that this dependence is due to the asymmetry between the erase and write operations of the SONOS devices. The DC voltage shift added to the programming voltage also influences the circuit convergence. The circuit performance is also timing dependent. If all the control signals are slowed by a factor of two, the error amplitude becomes larger, because the longer programming and reading times result in a 'soft write' operation during reading. A theoretical analysis of the learning algorithms and the convergence factor is presented in Appendix B.
The analog delay line, which is designed to give a 90 degree phase shift between the tapped locations, may introduce a 'phase error' between the two tap locations. When no noise is present in the circuit, the theoretical analysis predicts that this phase error makes the error signal sinusoidal in the steady state. The experimental results shown in Chapter 5 confirm this prediction. The convergence time constant of the system is found to be in the range of seconds, corresponding to several thousand programming cycles. This observation indicates that the SONOS nonvolatile memory transistors are fairly insensitive to the short programming pulse duration (156 µs) used in the circuit. The adaptivity figures presented in Chapter 5 are based on 50 seconds of adaptation time. The adaptivity is much better if the circuit is allowed to adapt for a long period, say 1 hour; the typical adaptivity after such a long adaptation time is around -26 dB. In the Chapter 5 results, the weight initialization schemes with better convergence time showed poorer adaptivity. Even so, more experiments have to be conducted before a direct relationship between the system convergence time constant and the adaptivity can be drawn.
The dependence of the adaptivity on the phase angle between the training signal and the input signal has not been fully investigated. It is important to optimize not only the circuit design but also the SONOS nonvolatile transistors. Ideally, SONOS devices would program faster, retain memory better, and have symmetric erase and write characteristics. The placement of the memory window is also a concern. If the memory window is centered at ground, the reading voltage can be at ground level, which minimizes the soft write operation of the SONOS memory transistor. To minimize the noise introduced in the circuit, an offset-free switched capacitor analog delay line may be implemented, at the expense of a more elaborate clocking module.
To realize a larger neural network, the synaptic element must be readily integrated onto silicon wafers. Since SONOS technology is fully compatible with CMOS technology, the work presented in this thesis should provide a leap toward the realization of artificial neural networks.
References 
1. Bernard Widrow, Study Director, DARPA Neural Network Study, 10/87-2/88, Final Report, Lincoln Laboratory, MIT, MA, 1988.
2. H.P. Graf and P. deVegvar, "A CMOS Implementation of a Neural Network Model", Proceedings of Stanford Conference on Advanced Research in VLSI, 1987, pp. ..
3. Y. Tsividis and S. Satyanarayana, "Analog Circuits for Variable-Synapse Electronic Neural Networks", Electronics Letters, Vol. 23, No. 24, Nov. 1987, pp. 1313-1314.
4. F.J. Kub, I.A. Mack, K.K. Moon, C.T. Yao, and J.A. Modla, "Programmable Analog Synapses for Microelectronic Neural Networks Using a Hybrid Digital-Analog Approach", IEEE Device Research Conference on Neural Networks, 1988, pp. 24-27.
5. Mark Holler, Simon Tam, Hernan Castro, and Ronald Benson, "An Electrically Trainable Artificial Neural Network (ETANN) with 10240 'Floating Gate' Synapses", Proceedings of IJCNN, 1989.
6. M.H. White, I.A. Mack, G.M. Borsuk, D.R. Lampe, and F.J. Kub, "Charge-Coupled Device (CCD) Adaptive Discrete Analog Signal Processing", IEEE J. of Solid-State Circuits, Vol. SC-14, 1979, pp. 132.
7. Marvin H. White and Chun-Yu Chen, "Electrically Modifiable Nonvolatile Synapses for Neural Networks", Proceedings of IEEE International Symposium on Circuits and Systems, 1989, pp. 1213-1216.
8. Richard P. Lippmann, "An Introduction to Computing with Neural Nets", IEEE ASSP Magazine, Vol. 4, No. 2, April 1987, pp. 4-22.
9. J.E. Spencer, "Real-Time Applications of Neural Nets", IEEE Trans. on Nuclear Science, Vol. 36, No. 5, October 1989, pp. 1485-1489.
10. Hans P. Graf and Lawrence D. Jackel, "Analog Electronic Neural Network Circuits", IEEE Circuits and Devices Magazine, Vol. 5, No. 4, July 1989, pp. 44-49.
11. B. Widrow and S.D. Stearns, Adaptive Signal Processing, Prentice-Hall, 1985.
12. B. Widrow, P.E. Mantey, L.J. Griffiths, and B.B. Goode, "Adaptive Antenna Systems", Proceedings of the IEEE, Vol. 55, No. 12, December 1967, pp. 2143-2159.
13. B. Widrow and M. Hoff, Jr., "Adaptive Switching Circuits", IRE WESCON Conv. Rec., Pt. 4, 1960, pp. 96.
14. J.L. Moschner, "Adaptive Filter with Clipped Input Data", Tech. Report, Stanford Lab. Report No. 6796-1, June 1970.
15. D. Hirsch and W. Widrow, "A Simple Adaptive Equalizer for Efficient Data Transmission", IEEE Trans. Comm. Tech., Vol. COM-18, 1970, pp. 5.
16. Frank Robert Libsch, Physics, Technology and Electrical Aspects of Scaled MONOS/SONOS Devices for Low Voltage Nonvolatile Semiconductor Memories (NVSMs), PhD dissertation, Lehigh University, 1989.
17. Margaret Larson French, "Memory Window Studies of Nonvolatile Silicon-Oxide-Nitride-Oxide-Silicon (SONOS) Memory Devices", Master's thesis, Lehigh University, 1990.
18. Anirban Roy, "Retention, Endurance and Interface Traps in MONOS Memory Transistors", Master's thesis, Lehigh University, 1985.
19. Marvin H. White, Donald R. Lampe, Franklyn C. Blaha, and Ingham A. Mack, "Characterization of Surface Channel CCD Image Array at Low Light Levels", IEEE J. of Solid-State Circuits, Vol. SC-9, No. 1, February 1974, pp. 1-13.
20. John J. Hopfield and David W. Tank, "Computing with Neural Circuits: A Model", Science, Vol. 233, August 1986, pp. 625-633.
21. J. Davis, R. Newburgh, and E. Wegman, editors, "Neural Dynamics of Category Learning and Recognition: Attention, Memory Consolidation, and Amnesia", Brain Structure, Learning, and Memory, AAAS Symposium Series, 1986.
22. F. Rosenblatt, Principles of Neurodynamics, Spartan Books, 1959.
23. D.E. Rumelhart, G.E. Hinton, and R.J. Williams, Learning Internal Representations by Error Propagation, MIT Press, 1986.
24. T. Kohonen, Self-Organization and Associative Memory, Springer-Verlag, 1984.
25. Marvin H. White and J. Ronald Cricchi, "Characterization of Thin-Oxide MNOS Memory Transistors", IEEE Transactions on Electron Devices, Vol. ED-19, No. 12, December 1972, pp. 1280-1288.
Appendix A 
Learning Algorithms 
Several algorithms have been proposed as learning algorithms for neural networks used as classifiers. Figure A-1 shows a taxonomy of six neural networks [8]. Generally, there are two types of input signals: binary and continuous-valued (analog). Each signal category can be subdivided into two training methods, supervised and unsupervised. Nets trained with supervision are best used as associative memories or as classifiers; nets trained without supervision are used as vector quantizers. These six algorithms are summarized and described in the following sections.
Figure A-1: Taxonomy of the Neural Networks (neural network classifiers for fixed patterns)
Binary input, supervised: Hopfield Net, Hamming Net
Binary input, unsupervised: Carpenter/Grossberg Classifier
Analog input, supervised: Single-Layer Perceptron, Multi-Layer Perceptron
Analog input, unsupervised: Kohonen Self-Organizing Feature Map
A.1 Hopfield Net
The Hopfield net [20] has contributed to the surge of interest in neural networks. The operation of a Hopfield net of N nodes starts with the assignment of the connection weights as follows:

$$t_{ij} = \begin{cases} \sum_{s=0}^{M-1} x_i^s\,x_j^s, & i \neq j \\ 0, & i = j \end{cases} \qquad 0 \le i, j \le N-1 \qquad \text{(A.1)}$$

where $t_{ij}$ is the connection weight from node i to node j, M is the number of classes, and $x_i^s$ is the ith element of the exemplar for class s, with a value of either +1 or -1. After initialization, the unknown pattern is imposed on the net at time t = 0 by forcing the output of the net to match the unknown pattern:

$$\mu_i(0) = x_i, \qquad 0 \le i \le N-1 \qquad \text{(A.2)}$$

where $\mu_i(t)$ is the output of node i at time t and $x_i$ is the ith element of the input pattern. The net then iterates according to the following rule until the outputs remain unchanged:

$$\mu_j(t+1) = f_h\left[\sum_{i=0}^{N-1} t_{ij}\,\mu_i(t)\right], \qquad 0 \le j \le N-1 \qquad \text{(A.3)}$$

where $f_h$ is the hard-limiting nonlinearity. The node outputs then represent the exemplar pattern that best matches the unknown input.
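As an illustration of equations (A.1)-(A.3), the Python sketch below stores ±1 exemplars and recalls a noisy pattern. It is a minimal software model, not part of the thesis hardware. Two choices are assumptions not fixed by the text: nodes are updated asynchronously (one at a time, which lets the iteration settle for symmetric weights with a zero diagonal), and a node whose input is exactly zero keeps its previous output.

```python
from typing import List

Pattern = List[int]


def hopfield_weights(exemplars: List[Pattern]) -> List[List[int]]:
    """Equation (A.1): t_ij = sum over classes s of x_i^s * x_j^s, with t_ii = 0."""
    n = len(exemplars[0])
    t = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                t[i][j] = sum(x[i] * x[j] for x in exemplars)
    return t


def hard_limit(a: int, previous: int) -> int:
    """f_h: +1 for positive input, -1 for negative input; zero keeps the old output (assumed)."""
    if a > 0:
        return 1
    if a < 0:
        return -1
    return previous


def hopfield_recall(t: List[List[int]], pattern: Pattern, max_sweeps: int = 100) -> Pattern:
    """Equations (A.2)-(A.3), iterated asynchronously until no output changes."""
    mu = list(pattern)  # (A.2): force the outputs to the unknown pattern
    n = len(mu)
    for _ in range(max_sweeps):
        changed = False
        for j in range(n):
            new = hard_limit(sum(t[i][j] * mu[i] for i in range(n)), mu[j])
            if new != mu[j]:
                mu[j] = new
                changed = True
        if not changed:
            break
    return mu


if __name__ == "__main__":
    p1 = [1, 1, 1, 1, -1, -1, -1, -1]
    p2 = [1, -1, 1, -1, 1, -1, 1, -1]
    t = hopfield_weights([p1, p2])
    noisy = [-1] + p1[1:]  # p1 with its first bit flipped
    print(hopfield_recall(t, noisy))
```

With the two orthogonal exemplars in the example, flipping the first bit of p1 and iterating returns p1.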
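The Hopfield net, however, has two major limitations. First, the number of patterns that can be stored and accurately recalled is severely limited by the number of nodes and connections; a commonly quoted estimate is that recall is reliable only when the number of stored patterns is below about 0.15N for an N-node net, and beyond that the net may converge to a spurious pattern. Second, an exemplar pattern becomes unstable when it shares many bits with another exemplar.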
A.2 The Hamming Net 
In a communication system in which binary fixed-length signals are transmitted through a memoryless channel, a classifier can calculate the Hamming distance to the exemplar of each class and select the class with the minimum Hamming distance. The Hamming net is a neural-network implementation of such a classifier. The Hamming distance is defined as the number of bits in the input that do not match the corresponding exemplar bits.
A Hamming net implementation normally consists of two subnets: the lower subnet and the upper MAXNET. The lower subnet calculates the matching scores from the input, and the MAXNET selects the node with the maximum output. With N input nodes and M output nodes, the operation of the Hamming net begins by setting the weights and thresholds of the lower subnet so that each of its outputs calculates the matching score, defined as N minus the Hamming distance to the exemplar pattern. The matching score therefore ranges from 0 to N; the higher the matching score, the better the input matches the exemplar of the corresponding class. The thresholds and weights of the MAXNET are fixed: all thresholds are set to zero, the weight from each node to itself is 1, and the weights between nodes are inhibitory with a value of $-\varepsilon$, where $\varepsilon < 1/M$. The operation of the Hamming net can therefore be expressed as follows.
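In the lower subnet:

$$w_{ij} = \frac{x_i^j}{2}, \qquad \theta_j = -\frac{N}{2}, \qquad 0 \le i \le N-1,\; 0 \le j \le M-1 \qquad \text{(A.4)}$$

In the upper subnet:

$$t_{kl} = \begin{cases} 1, & k = l \\ -\varepsilon, & k \neq l \end{cases}, \qquad \varepsilon < \frac{1}{M}, \qquad 0 \le k, l \le M-1$$

where $w_{ij}$ is the connection weight from input node i to node j in the lower subnet, $\theta_j$ is the threshold of node j, $t_{kl}$ is the connection weight from node k to node l in the upper MAXNET, and $x_i^j$ is the ith element of exemplar j. With ±1 elements, $\sum_i x_i^j x_i = N - 2\,HD_j$, where $HD_j$ is the Hamming distance between the input and exemplar j, so these values make $\sum_i w_{ij} x_i - \theta_j = N - HD_j$, the matching score. After the weights and thresholds are initialized, a pattern with N elements is presented at the bottom of the Hamming net. The pattern must be present long enough for the matching scores to settle.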
$$\mu_j(0) = f_t\left(\sum_{i=0}^{N-1} w_{ij}\,x_i - \theta_j\right), \qquad 0 \le j \le M-1 \qquad \text{(A.5)}$$

where $\mu_j$ is the output of node j in the lower subnet, $x_i$ is the ith element of the input, and $f_t$ is a nonlinear threshold-logic function. The nonlinearity is assumed not to saturate for the maximum input it can receive. After the net is initialized with the pattern, the pattern can be removed and the MAXNET iterates until the output of only one node is positive:

$$\mu_j(t+1) = f_t\left(\mu_j(t) - \varepsilon \sum_{k \neq j} \mu_k(t)\right), \qquad 0 \le j, k \le M-1 \qquad \text{(A.6)}$$

Once this condition is achieved, the classification process is complete, and the input is assigned to the class of the remaining positive node. It has been shown in the literature that the MAXNET always converges and finds the node with the maximum value when $\varepsilon < 1/M$.
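The sketch below follows the Hamming net for ±1 patterns: the lower subnet uses $w_{ij} = x_i^j/2$ and a threshold of $-N/2$, the sign that makes each score equal N minus the Hamming distance, and the MAXNET iterates lateral inhibition. It is illustrative only: the choice $\varepsilon = 0.5/M$ (any value below 1/M satisfies the convergence condition) and the iteration cap are assumptions, and an exact tie between the two best matches leaves no winner, which the function reports as None.

```python
from typing import List, Optional


def f_t(a: float) -> float:
    """Threshold-logic nonlinearity: linear above zero, zero below (assumed not to saturate)."""
    return a if a > 0 else 0.0


def matching_scores(exemplars: List[List[int]], x: List[int]) -> List[float]:
    """Lower subnet: w_ij = x_i^j / 2 and theta_j = -N/2, so each score is N - Hamming distance."""
    n = len(x)
    theta = -n / 2
    return [f_t(sum(ex[i] / 2 * x[i] for i in range(n)) - theta) for ex in exemplars]


def maxnet(mu: List[float], eps: float, max_iters: int = 10_000) -> Optional[int]:
    """Upper subnet: iterate lateral inhibition until at most one node stays positive."""
    m = len(mu)
    if not 0 < eps < 1 / m:
        raise ValueError("the MAXNET requires 0 < eps < 1/M")
    for _ in range(max_iters):
        positive = [j for j, v in enumerate(mu) if v > 0]
        if len(positive) <= 1:
            return positive[0] if positive else None  # None: exact tie, no winner
        total = sum(mu)
        mu = [f_t(mu[j] - eps * (total - mu[j])) for j in range(m)]
    return None


def hamming_classify(exemplars: List[List[int]], x: List[int]) -> Optional[int]:
    return maxnet(matching_scores(exemplars, x), eps=0.5 / len(exemplars))
```

For example, with exemplars [1,1,1,1,1,1], [1,1,1,-1,-1,-1], and [-1,-1,-1,-1,-1,-1], the input [1,1,1,-1,-1,1] receives scores 4, 5, and 2, and the MAXNET selects the second class.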
The Hamming net has several advantages over the Hopfield net. First, since the Hamming net implements the optimum minimum-error classifier when bit errors are random and independent, its performance is at least as good as, and often better than, that of the Hopfield net. The Hamming net also requires fewer connections than the Hopfield net. Furthermore, the number of connections in the Hamming net grows only linearly with the number of inputs, whereas in the Hopfield net it grows with the square of the number of inputs. In addition, the Hamming net does not suffer from spurious output patterns, which can produce a no-match result.
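To make the connection-count comparison concrete, the short sketch below counts weights under one counting convention, an assumption made here: each directed weight is counted once, and the MAXNET self-connections are included. With 100 inputs and 10 classes, the Hopfield net needs 9,900 weights against 1,100 for the Hamming net, and the gap widens as N grows.

```python
def hopfield_connections(n_inputs: int) -> int:
    """Fully connected net of N nodes without self-connections: N(N - 1) weights."""
    return n_inputs * (n_inputs - 1)


def hamming_connections(n_inputs: int, n_classes: int) -> int:
    """Lower subnet (N x M weights) plus MAXNET (M x M weights, self-connections included)."""
    return n_inputs * n_classes + n_classes * n_classes


if __name__ == "__main__":
    for n in (100, 1000):
        print(n, hopfield_connections(n), hamming_connections(n, 10))
```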
A.3 The Carpenter/Grossberg Classifier 
Carpenter and Grossberg have designed a net that forms clusters and is trained without supervision [21]. This net implements a clustering algorithm similar to the simple sequential leader algorithm. The leader algorithm selects the first input as the exemplar for the first cluster. When the second input is presented, it is compared to the first cluster exemplar and a matching score is computed. If the matching score is within a certain threshold, the second input is assigned to the first cluster; otherwise, it becomes the exemplar for a new cluster. This process is repeated for all following inputs. Therefore, the number of clusters grows with time and depends on both the matching score and the threshold set for the net.

Matching scores are computed using feed-forward connections, and the maximum value is enhanced using lateral inhibition among the output nodes. Thus, the structure of the Carpenter/Grossberg net is similar to that of the Hamming net described in the previous section. However, there are distinct differences between the two nets. The Carpenter/Grossberg net provides feedback connections from the output nodes to the input nodes. Furthermore, mechanisms are provided to turn off the output node with the maximum value and to compare exemplars with the input for the threshold test required by the leader algorithm. The net is completely described by nonlinear differential equations, including extensive feedback.
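A minimal sketch of the sequential leader algorithm described above, assuming binary patterns and using the Hamming distance as the mismatch measure, so that "within the threshold" means a distance no larger than `threshold`. The function and parameter names are hypothetical.

```python
from typing import List, Sequence, Tuple


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    return sum(ai != bi for ai, bi in zip(a, b))


def leader_cluster(patterns: List[List[int]], threshold: int) -> Tuple[List[List[int]], List[int]]:
    """Each input joins its closest cluster if close enough, otherwise it leads a new cluster."""
    exemplars: List[List[int]] = []
    labels: List[int] = []
    for p in patterns:
        distances = [hamming_distance(p, e) for e in exemplars]
        if distances and min(distances) <= threshold:
            labels.append(distances.index(min(distances)))
        else:
            exemplars.append(list(p))  # this input becomes the exemplar of a new cluster
            labels.append(len(exemplars) - 1)
    return exemplars, labels
```

With the patterns [1,1,0,0], [1,1,0,1], [0,0,1,1], [0,0,1,0] and a threshold of 1, two clusters form; lowering the threshold to 0 makes every pattern its own cluster, which shows how the number of clusters depends on the threshold.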
The operation of the Carpenter/Grossberg net starts by initializing the exemplars, which are represented by the top-down connection weights, to all ones, and the bottom-up weights to equal small values. In addition, a matching threshold called vigilance, ranging from 0 to 1, must be set. The vigilance determines how close a new input pattern must be to a stored exemplar to be considered similar: a vigilance value near 1 requires a close match between the input pattern and the stored exemplar, while a smaller value accepts less similarity. The initialization can be described as follows:

$$t_{ij}(0) = 1, \qquad b_{ij}(0) = \frac{1}{1+N}, \qquad 0 \le i \le N-1,\; 0 \le j \le M-1, \qquad 0 \le \rho \le 1 \qquad \text{(A.7)}$$

where $b_{ij}(t)$ and $t_{ij}(t)$ are the bottom-up and top-down connection weights between input node i and output node j at time t, and $\rho$ is the vigilance value.
After the input pattern is presented to the bottom of the net, it is compared in parallel with all stored exemplars, as in the Hamming net, to produce the matching scores

$$\mu_j = \sum_{i=0}^{N-1} b_{ij}(t)\,x_i, \qquad 0 \le j \le M-1 \qquad \text{(A.8)}$$

where $\mu_j$ is the output of node j and $x_i$ is the ith element of the input pattern. The maximum matching score is selected by lateral inhibition:

$$\mu_{j^*} = \max_j\,(\mu_j) \qquad \text{(A.9)}$$

The selected exemplar $j^*$ is then compared with the input pattern by computing the ratio of the dot product of the input pattern and the selected exemplar to the number of 1 bits in the input. If the ratio is greater than the vigilance threshold, the input is classified as belonging to the selected exemplar, which is then updated by a logical AND between the bits of the input pattern and the best-matched exemplar. If the ratio is less than the vigilance threshold, the input is considered a new exemplar and is added to the net. Each additional exemplar requires one more node and 2N new connections to compute the matching score.
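The vigilance test can be summarized as

$$\|X\| = \sum_{i=0}^{N-1} x_i, \qquad \|T \cdot X\| = \sum_{i=0}^{N-1} t_{ij^*}\,x_i$$

$$\text{Is } \frac{\|T \cdot X\|}{\|X\|} > \rho\,?$$

YES: match the selected exemplar. NO: new exemplar.

and the update of the selected exemplar $j^*$ (the node chosen in (A.9)) is

$$t_{ij^*}(t+1) = t_{ij^*}(t)\,x_i \qquad \text{(A.10)}$$

$$b_{ij^*}(t+1) = \frac{t_{ij^*}(t)\,x_i}{0.5 + \sum_{i'=0}^{N-1} t_{i'j^*}(t)\,x_{i'}}, \qquad 0 \le i \le N-1 \qquad \text{(A.11)}$$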
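The Python sketch below walks through equations (A.7)-(A.11) for binary (0/1) inputs. It illustrates the procedure under stated assumptions and is not the thesis circuit: M nodes are preallocated with the initial weights of (A.7), so "adding a new exemplar" corresponds to committing a previously unused node; when the best node fails the vigilance test it is disabled and the search moves to the next-best node (the reset-and-search step of the full Carpenter/Grossberg procedure); ties go to the lowest node index; and an all-zero input is accepted by the first node it reaches.

```python
from typing import List, Optional, Tuple


def carpenter_grossberg(patterns: List[List[int]], n_nodes: int, rho: float
                        ) -> Tuple[List[Optional[int]], List[List[int]]]:
    n = len(patterns[0])
    t = [[1] * n for _ in range(n_nodes)]                # (A.7): top-down weights
    b = [[1.0 / (1 + n)] * n for _ in range(n_nodes)]    # (A.7): bottom-up weights
    labels: List[Optional[int]] = []
    for x in patterns:
        norm_x = sum(x)                                   # ||X||: number of 1 bits
        enabled = list(range(n_nodes))
        chosen = None
        while enabled:
            # (A.8)-(A.9): highest bottom-up score among enabled nodes (lowest index on ties)
            j = max(enabled, key=lambda k: sum(b[k][i] * x[i] for i in range(n)))
            match = sum(t[j][i] * x[i] for i in range(n))  # ||T.X||
            if norm_x == 0 or match / norm_x > rho:       # vigilance test
                chosen = j
                break
            enabled.remove(j)                              # reset: disable node j, search again
        labels.append(chosen)  # None means every node failed the vigilance test
        if chosen is not None:
            new_t = [t[chosen][i] * x[i] for i in range(n)]  # (A.10): logical AND
            denom = 0.5 + sum(new_t)
            t[chosen] = new_t
            b[chosen] = [v / denom for v in new_t]           # (A.11)
    return labels, t
```

For the inputs [1,1,0,0], [1,1,1,0], [0,0,1,1] and three nodes, a vigilance of 0.7 assigns them to three separate nodes, while a vigilance of 0.5 merges the first two, which shows the effect of $\rho$.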
There is a problem associated with the Carpenter/Grossberg net: it works well with perfect input patterns, but even a small amount of noise in the input pattern can cause the net to classify the noisy pattern as a new exemplar. Modifications are necessary to improve the performance of this algorithm in noise.
A.4 Single Layer Perceptron 
The single-layer perceptron is the first of these nets that can accept analog as well as binary inputs. It initially generated much interest because of its ability to learn to recognize simple patterns. The perceptron, as first developed, decides whether an input belongs to one of two classes, A and B, so it can be used as a classifier. The single node computes a weighted sum of the input elements, subtracts a threshold $\theta$, and passes the result through a hard-limiting nonlinearity so that the output y is either +1 or -1. The input is assigned to class A if the output is +1 and to class B if the output is -1. To analyze the net behavior, the decision regions created in the multidimensional space spanned by the input variables can be plotted. With two input variables, the decision regions are separated by a line (the locus where the weighted sum equals the threshold); with more than two input variables, they are separated by a hyperplane.
The connection weights and the threshold of the net can be either fixed or adaptive. If adaptive weights are used, an updating (or learning) algorithm is necessary. The original convergence procedure was developed by Rosenblatt [22]. The net is initialized by assigning small random nonzero weight values. When an input of N elements is applied to the net, the output is computed as

$$y(t) = f_h\left(\sum_{i=0}^{N-1} w_i(t)\,x_i(t) - \theta\right) \qquad \text{(A.12)}$$

where $w_i(t)$ is the ith weight value at time t, $f_h$ is a hard-limiting nonlinearity, and $\theta$ is the threshold value. If the output y(t) differs from the desired response d(t), an error is generated and the weights must be adjusted:

$$w_i(t+1) = w_i(t) + \eta\,[d(t) - y(t)]\,x_i(t), \qquad 0 \le i \le N-1 \qquad \text{(A.13)}$$

where d(t) is +1 if the input is from class A and -1 if the input is from class B, and $\eta$ is the gain term, or convergence factor, which controls the adaptation rate. This gain must be chosen to satisfy the conflicting requirements of fast adaptation to real changes in the input distributions and averaging of past inputs to provide stable weight estimates. One problem of the perceptron is that the decision boundaries may oscillate continuously when the inputs are not separable and the distributions overlap. Another problem is the averaging process in the adaptation algorithm, which not only requires a complex hardware implementation but also consumes time and memory.
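A minimal sketch of Rosenblatt's procedure in equations (A.12)-(A.13), assuming ±1 targets, a hard limiter that maps zero to -1, and a threshold that is adapted like a weight on a constant -1 input (one common choice; the text allows the threshold to be fixed or adaptive). The logical-AND data set used below is a hypothetical linearly separable test case.

```python
import random
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]


def f_h(a: float) -> int:
    """Hard limiter: +1 for class A, -1 for class B (zero maps to -1, an arbitrary choice)."""
    return 1 if a > 0 else -1


def train_perceptron(samples: List[Sample], eta: float = 0.1, epochs: int = 100, seed: int = 0):
    rng = random.Random(seed)
    n = len(samples[0][0])
    w = [rng.uniform(-0.05, 0.05) for _ in range(n)]  # small random weights
    theta = rng.uniform(-0.05, 0.05)
    for _ in range(epochs):
        mistakes = 0
        for x, d in samples:
            y = f_h(sum(wi * xi for wi, xi in zip(w, x)) - theta)        # (A.12)
            if y != d:
                mistakes += 1
                w = [wi + eta * (d - y) * xi for wi, xi in zip(w, x)]  # (A.13)
                theta -= eta * (d - y)  # threshold adapted like a weight on a constant -1 input
        if mistakes == 0:
            break
    return w, theta


def classify(w: Sequence[float], theta: float, x: Sequence[float]) -> int:
    return f_h(sum(wi * xi for wi, xi in zip(w, x)) - theta)


if __name__ == "__main__":
    and_data = [((-1, -1), -1), ((-1, 1), -1), ((1, -1), -1), ((1, 1), 1)]
    w, theta = train_perceptron(and_data)
    print([classify(w, theta, x) for x, _ in and_data])
```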
A modification of the perceptron convergence procedure leads to the least mean square (LMS) error algorithm, which minimizes the mean square error between the desired response and the actual output of the perceptron. This algorithm is called the Widrow-Hoff algorithm [13], [11]. This thesis employs a modified version of the Widrow-Hoff LMS error learning algorithm. The LMS algorithm is identical to the perceptron convergence procedure described above except that the hard-limiting function is replaced by a linear function. The weights are thus corrected on every trial by an amount that depends on the difference between the desired and the actual output, that is, on the error.
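The Widrow-Hoff update can be sketched for a two-tap case as follows. This is a software illustration under assumptions: the taps are the current and previous input samples, the update is written in the form $W_k(m+1) = W_k(m) + 2\mu\,\varepsilon(m)\,x_k(m)$ used by Widrow and Stearns [11], and the "unknown" two-tap filter (0.6, -0.3) in `identify` is a hypothetical test system.

```python
import random
from typing import List


def lms_two_tap(inputs: List[float], desired: List[float], mu: float) -> List[float]:
    """Widrow-Hoff LMS on a two-tap delay line: W_k <- W_k + 2*mu*eps*x_k."""
    w = [0.0, 0.0]
    previous = 0.0  # x(m-1), assumed zero before the first sample
    for x, d in zip(inputs, desired):
        taps = [x, previous]
        y = w[0] * taps[0] + w[1] * taps[1]  # linear output (no hard limiter)
        eps = d - y                           # error
        w = [w[k] + 2 * mu * eps * taps[k] for k in range(2)]
        previous = x
    return w


def identify(target=(0.6, -0.3), n: int = 2000, mu: float = 0.05, seed: int = 1) -> List[float]:
    """Hypothetical test case: learn an unknown two-tap filter from noise-free data."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ds = [target[0] * xs[m] + target[1] * (xs[m - 1] if m > 0 else 0.0) for m in range(n)]
    return lms_two_tap(xs, ds, mu)


if __name__ == "__main__":
    print(identify())
```

Because the error is driven to zero by the linear output itself, the weights settle at the target filter rather than merely on the correct side of a decision boundary, which is the key difference from the perceptron rule.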
The perceptron training algorithm makes no assumptions about the shape of the underlying distributions but focuses on the errors that occur where the distributions overlap. It is therefore more robust than classical techniques and works well when the inputs are generated by nonlinear processes and are heavily skewed and non-Gaussian. The adaptation algorithm is simple to implement and requires no information other than the present values of the error and the input variables. However, the perceptron does not work well as a classifier if the classes cannot be separated by a hyperplane.
A.5 Multi-Layer Perceptron
Multi-layer perceptrons are feed-forward nets with one or more layers of nodes between the input and output nodes. These additional layers contain hidden units or nodes that are not directly connected to both the input and output nodes. Multi-layer perceptrons overcome many of the limitations of single-layer perceptrons, but they were generally not used in the past because effective training algorithms were not available. This has recently changed with the development of new training algorithms [23]. The capability of multi-layer perceptrons stems from the nonlinearities used within the nodes. If the nodes were linear, a single-layer net with appropriately chosen weights could exactly duplicate the calculations performed by any multi-layer net. While the single-layer perceptron forms half-plane decision regions, a two-layer perceptron can form any, possibly unbounded, convex region in the space spanned by the inputs. These convex regions are formed from the intersections of the half-plane regions formed by each node in the first layer of the multi-layer perceptron. The number of nodes must be large enough to form a decision region as complex as the given problem requires. On the other hand, it must not be so large that the many weights required cannot be reliably estimated from the available training data.
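To illustrate how a two-layer perceptron forms a convex region from half-planes, the sketch below hand-wires the exclusive-OR (XOR) function, which no single-layer perceptron can realize because its two classes are not linearly separable. The weights and thresholds are illustrative choices: each first-layer node defines one half-plane, and the output node ANDs them, so the decision region for class 1 is the strip 0.5 < x0 + x1 < 1.5.

```python
def step(a: float) -> int:
    """Hard-limiting node with a 0/1 output."""
    return 1 if a > 0 else 0


def xor_two_layer(x0: int, x1: int) -> int:
    h1 = step(x0 + x1 - 0.5)    # first-layer node: half-plane x0 + x1 > 0.5
    h2 = step(1.5 - x0 - x1)    # first-layer node: half-plane x0 + x1 < 1.5
    return step(h1 + h2 - 1.5)  # output node: AND, i.e. the intersection (a convex strip)


if __name__ == "__main__":
    for x0, x1 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(x0, x1, xor_two_layer(x0, x1))
```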
A three-layer perceptron can form arbitrarily complex decision regions. This property depends on partitioning the desired region into small hypercubes (squares if there are only two inputs). Each hypercube requires 2N nodes in the first layer, one for each of its faces, and one node in the second layer. The hypercubes are assigned to the proper decision regions by connecting the output of each second-layer node only to the output node corresponding to the decision region that the node's hypercube lies in. The construction procedure can be generalized to use arbitrarily shaped convex regions instead of small hypercubes and is capable of generating disconnected and nonconvex regions. Since a three-layer net can generate arbitrary decision regions, a classifier never needs more than three layers. The number of nodes in the second layer must be greater than one when the decision regions are disconnected or meshed and cannot be formed from one convex region. In the worst case, the number of second-layer nodes equals the number of disconnected regions in the input distributions. The number of first-layer nodes must be sufficient to provide three or more edges for each convex area generated by every second-layer node, so there should typically be more than three times as many nodes in the first layer as in the second. This analysis applies to a multi-layer perceptron with one output node that uses a hard-limiting nonlinearity.
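The hypercube construction can be sketched for N = 2 inputs with a hypothetical decision region made of two disconnected unit squares. Each square uses 2N = 4 first-layer half-plane nodes and one second-layer AND node, and the output node ORs the second-layer nodes; the squares and thresholds are assumptions chosen for illustration.

```python
from typing import List, Tuple

Square = Tuple[float, float, float, float]  # (x0_low, x0_high, x1_low, x1_high)


def step(a: float) -> int:
    return 1 if a > 0 else 0


def first_layer(x0: float, x1: float, sq: Square) -> List[int]:
    """2N = 4 half-plane nodes, one per side of the square."""
    lo0, hi0, lo1, hi1 = sq
    return [step(x0 - lo0), step(hi0 - x0), step(x1 - lo1), step(hi1 - x1)]


def three_layer(x0: float, x1: float, squares: List[Square]) -> int:
    # Second layer: one AND node per square (fires only if all four sides agree).
    second = [step(sum(first_layer(x0, x1, sq)) - 3.5) for sq in squares]
    # Output node: OR of the second-layer nodes assigned to this decision region.
    return step(sum(second) - 0.5)


if __name__ == "__main__":
    region = [(0, 1, 0, 1), (2, 3, 2, 3)]
    for point in [(0.5, 0.5), (2.5, 2.5), (1.5, 1.5)]:
        print(point, three_layer(*point, region))
```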
Similar behavior is exhibited by multi-layer perceptrons with multiple output nodes when sigmoidal nonlinearities are used and the decision rule is to select the class corresponding to the output node with the largest output. However, the decision regions are then typically bounded by smooth curves instead of straight line segments, which makes analysis more difficult. Such nets can, however, be trained with the back-propagation training algorithm, which is a generalization of the LMS algorithm. It uses a gradient descent technique to minimize a cost function equal to the mean square difference between the desired and actual outputs. An essential component of the algorithm is the iterative method that propagates the error terms required to adapt the weights back from nodes in the output layer to nodes in the lower layers. In a multi-layer architecture in which the first hidden layer is indexed by j, the second hidden layer by k, and the output layer by l, the output of each layer can be expressed as
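$$x'_j = f\left(\sum_{i=0}^{N-1} w_{ij}\,x_i - \theta_j\right), \qquad 0 \le j \le N_1 - 1$$

$$x''_k = f\left(\sum_{j=0}^{N_1-1} w'_{jk}\,x'_j - \theta'_k\right), \qquad 0 \le k \le N_2 - 1 \qquad \text{(A.14)}$$

$$y_l = f\left(\sum_{k=0}^{N_2-1} w''_{kl}\,x''_k - \theta''_l\right), \qquad 0 \le l \le M - 1$$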
where $\theta_j$, $\theta'_k$, and $\theta''_l$ are the thresholds of the first hidden layer, the second hidden layer, and the output layer, respectively, and f is the sigmoidal nonlinearity $f(\alpha) = 1/(1 + e^{-\alpha})$, whose derivative $f(1 - f)$ produces the factors $y_j(1 - y_j)$ and $x'_j(1 - x'_j)$ in the error terms. The weights are updated as follows:

$$w_{ij}(t+1) = w_{ij}(t) + \eta\,\delta_j\,x'_i \qquad \text{(A.15)}$$

where $w_{ij}(t)$ is the weight from hidden node i, or from an input node, to node j at time t, $x'_i$ is either the output of node i or an input, $\eta$ is the gain factor, and $\delta_j$ is an error term for node j. Equation (A.15) is a general expression for how the weights are updated. For example, between the input and the first hidden layer the indices remain unchanged, whereas between the first and second hidden layers the indices i and j in equation (A.15) become j and k. The error term $\delta_j$ is likewise general. If node j is an output node, then

$$\delta_j = y_j\,(1 - y_j)\,(d_j - y_j) \qquad \text{(A.16)}$$

where $d_j$ is the desired output of node j and $y_j$ is the actual output. To conform this equation to the index convention of equation (A.14), the index j must be changed to l. If node j is an internal hidden node, then
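$$\delta_j = x'_j\,(1 - x'_j) \sum_k \delta_k\,w_{jk} \qquad \text{(A.17)}$$

where k ranges over all nodes in the layer above node j. Convergence is sometimes faster if a momentum term is added and the weight changes are smoothed by

$$w_{ij}(t+1) = w_{ij}(t) + \eta\,\delta_j\,x'_i + \alpha\,[w_{ij}(t) - w_{ij}(t-1)], \qquad 0 < \alpha < 1 \qquad \text{(A.18)}$$

where $\alpha$ is the momentum coefficient.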
The back-propagation algorithm, however, may find a local minimum instead of the global minimum. In addition, the number of presentations of training data required for convergence is often large.
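The update rules (A.15)-(A.17) can be sketched for a net with one hidden layer and a single output node. This is a minimal illustration rather than a reference implementation: it assumes the sigmoid $f(\alpha) = 1/(1 + e^{-\alpha})$, updates the weights after every training pattern, omits the momentum term of (A.18), and treats each threshold as a weight on a constant -1 input. The class and method names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Data = List[Tuple[Sequence[float], float]]


def f(a: float) -> float:
    """Sigmoid nonlinearity; its derivative is f(a) * (1 - f(a))."""
    return 1.0 / (1.0 + math.exp(-a))


class TwoLayerNet:
    """N inputs -> H sigmoid hidden nodes -> 1 sigmoid output node."""

    def __init__(self, n_inputs: int, n_hidden: int, seed: int = 0):
        rng = random.Random(seed)
        self.w = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_inputs)]
        self.theta = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.theta_out = rng.uniform(-1, 1)

    def forward(self, x: Sequence[float]):
        h = [f(sum(self.w[i][j] * x[i] for i in range(len(x))) - self.theta[j])
             for j in range(len(self.theta))]
        y = f(sum(self.w_out[j] * h[j] for j in range(len(h))) - self.theta_out)
        return h, y

    def loss(self, data: Data) -> float:
        return sum(0.5 * (d - self.forward(x)[1]) ** 2 for x, d in data)

    def train_step(self, x: Sequence[float], d: float, eta: float) -> None:
        h, y = self.forward(x)
        delta_out = y * (1 - y) * (d - y)                              # (A.16)
        delta_hid = [h[j] * (1 - h[j]) * delta_out * self.w_out[j]      # (A.17)
                     for j in range(len(h))]
        for j in range(len(h)):                                         # (A.15), output layer
            self.w_out[j] += eta * delta_out * h[j]
        self.theta_out -= eta * delta_out  # threshold = weight on a constant -1 input
        for i in range(len(x)):                                         # (A.15), hidden layer
            for j in range(len(h)):
                self.w[i][j] += eta * delta_hid[j] * x[i]
        for j in range(len(h)):
            self.theta[j] -= eta * delta_hid[j]


if __name__ == "__main__":
    and_data = [((0, 0), 0.0), ((0, 1), 0.0), ((1, 0), 0.0), ((1, 1), 1.0)]
    net = TwoLayerNet(2, 2)
    before = net.loss(and_data)
    for _ in range(2000):
        for x, d in and_data:
            net.train_step(x, d, eta=0.5)
    print(before, net.loss(and_data))
```

A finite-difference check of $\partial E/\partial w$ against $-\delta_j x'_i$ is a quick way to confirm that the error terms are implemented correctly.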
A.6 Kohonen's Self Organizing Feature Maps 
Kohonen's algorithm [24] creates a vector quantizer by adjusting the weights from common input nodes to M output nodes arranged in a two-dimensional grid. Input vectors are presented sequentially in time without specifying the desired output. After enough input training vectors have been presented, the weights specify cluster or vector centers that sample the input space such that the point density function of the vector centers tends to approximate the probability density function of the input vectors. In addition, the weights are organized such that topologically close nodes are sensitive to inputs that are physically similar. The algorithm that forms feature maps requires a neighborhood, which slowly decreases in size with time, to be defined around each node. The weights are initially set to small random values, as for the single- and multi-layer perceptrons in the previous sections. When an input is presented to the map, the distance between the input and each node is computed as

$$d_j = \sum_{i=0}^{N-1} [x_i(t) - w_{ij}(t)]^2 \qquad \text{(A.19)}$$
where $x_i(t)$ is the input to node i at time t, $w_{ij}(t)$ is the weight from input node i to output node j at time t, and $d_j$ is the distance between the input and output node j. If the weight vectors are normalized to a constant length (i.e., the sum of the squared weights from all inputs to each output is identical), the node with the minimum Euclidean distance can be found by forming the dot product of the input and the weights: since $d_j = \sum_i x_i^2 - 2\sum_i x_i w_{ij} + \sum_i w_{ij}^2$ and the last term is the same for every node, the smallest $d_j$ belongs to the node with the largest dot product $\sum_i x_i w_{ij}$. This selection can be performed with extensive lateral inhibition, as in the MAXNET described in Section A.2. Once this node is selected, the weights to it and to the nodes in its neighborhood are modified to make these nodes more responsive to the current input. Denoting the selected node by $j^*$, the weights are updated as

$$w_{ij}(t+1) = w_{ij}(t) + \eta(t)\,[x_i(t) - w_{ij}(t)], \qquad j \in NE_{j^*}(t),\; 0 \le i \le N-1 \qquad \text{(A.20)}$$

where $NE_{j^*}(t)$ is the neighborhood of node $j^*$ and $\eta(t)$ is a gain term, between 0 and 1, that decreases in time. Once the weights converge, they are fixed by setting the gain term to 0.
This map can be used as a vector quantizer in speech recognition. Unlike the Carpenter/Grossberg classifier, this algorithm performs relatively well in noise because the number of classes is fixed, the weights adapt slowly, and adaptation stops once convergence is reached. The algorithm is thus a viable sequential vector quantizer when the number of clusters desired can be specified before use and the amount of training data is large relative to the number of clusters desired.
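A minimal sketch of the feature-map update, equations (A.19)-(A.20). For brevity it assumes a one-dimensional line of nodes rather than the two-dimensional grid described above, a neighborhood of all nodes within `radius` positions of the winner, and linear schedules for the gain and the neighborhood size; all of these are illustrative choices.

```python
import random
from typing import List

Vector = List[float]


def sq_distance(x: Vector, w: Vector) -> float:
    """Equation (A.19): d_j = sum_i (x_i - w_ij)^2."""
    return sum((xi - wi) ** 2 for xi, wi in zip(x, w))


def som_update(weights: List[Vector], x: Vector, radius: int, eta: float) -> int:
    """Find the winning node j*, then apply (A.20) to every node within `radius` of j*."""
    winner = min(range(len(weights)), key=lambda j: sq_distance(x, weights[j]))
    lo, hi = max(0, winner - radius), min(len(weights), winner + radius + 1)
    for j in range(lo, hi):
        weights[j] = [wi + eta * (xi - wi) for xi, wi in zip(x, weights[j])]
    return winner


def train_som(samples: List[Vector], n_nodes: int, epochs: int, seed: int = 0) -> List[Vector]:
    rng = random.Random(seed)
    dim = len(samples[0])
    weights = [[0.5 + rng.uniform(-0.01, 0.01) for _ in range(dim)] for _ in range(n_nodes)]
    for epoch in range(epochs):
        frac = epoch / epochs
        eta = 0.5 * (1.0 - frac)                             # gain decreasing in time
        radius = int(round((n_nodes // 2) * (1.0 - frac)))   # shrinking neighborhood
        for x in samples:
            som_update(weights, x, radius, eta)
    return weights


def quantization_error(samples: List[Vector], weights: List[Vector]) -> float:
    """Mean squared distance from each sample to its nearest node."""
    return sum(min(sq_distance(x, w) for w in weights) for x in samples) / len(samples)
```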
Appendix B
Derivation of the Varying Convergence Factor
The convergence factor $\mu$ in the LMS and clipped-data LMS algorithms plays an important role in the convergence speed of the circuit. If the convergence factor is too large, each calculated weight increment may be too large for the weights to settle at their steady-state values; the circuit then overcorrects itself and the error signal oscillates in amplitude. A larger convergence factor does, however, speed convergence, since the error signal is reduced at a much faster pace. A small convergence factor, on the other hand, guarantees better convergence, but convergence is much slower. Designing an adaptive circuit therefore requires a careful balance between convergence speed and convergence performance. If the convergence factor were variable, large when the error signal is large and reduced accordingly as the error signal shrinks, the circuit performance could be optimized.
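The trade-off can be seen in the simplest possible case, a single LMS weight with a constant input x = 1 and desired output d = 1 (a hypothetical example, not the thesis circuit). The error then obeys $\varepsilon(m+1) = (1 - 2\mu)\,\varepsilon(m)$: a small $\mu$ converges slowly, a moderate $\mu$ converges quickly, $0.5 < \mu < 1$ converges while the error alternates in sign, and $\mu > 1$ makes the oscillation grow.

```python
from typing import List


def lms_error_trajectory(mu: float, steps: int, w0: float = 0.0,
                         x: float = 1.0, d: float = 1.0) -> List[float]:
    """Single-weight LMS with a constant input: e(m+1) = (1 - 2*mu*x**2) * e(m)."""
    w = w0
    errors = []
    for _ in range(steps):
        e = d - w * x
        errors.append(e)
        w += 2 * mu * e * x
    return errors


if __name__ == "__main__":
    for mu in (0.05, 0.45, 0.8, 1.2):
        print(mu, [round(e, 4) for e in lms_error_trajectory(mu, 6)])
```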
This appendix derives the variable convergence factor scheme used in the linear adaptive neuron discussed in this thesis. The scheme is a direct result of using SONOS nonvolatile memory transistors as synaptic elements. Recall first that the channel conductance of a SONOS transistor can be written as

$$g_{dsk}^{+}(m) = \beta\,[V_r - V_{thk}^{+}(m)] \qquad \text{(B.1)}$$
where k is the spatial index, m is the time index, $V_r$ is the read voltage, $V_{thk}$ is the electrically programmable threshold voltage, the superscript + denotes the channel conductance connected to the positive programming path, and $\beta$ is the beta of the MOS transistor, defined as

$$\beta = \mu_{eff}\,(W/L)\,C_{eff}$$

where $\mu_{eff}$ is the effective carrier mobility, W and L are the width and length of the transistor, and $C_{eff}$ is the effective gate capacitance. Increasing the time index by one, the channel conductance becomes

$$g_{dsk}^{+}(m+1) = \beta\,[V_r - V_{thk}^{+}(m+1)] \qquad \text{(B.2)}$$
Combining equations (B.1) and (B.2), the channel conductance can be rewritten in the following form:

$$g_{dsk}^{+}(m+1) = g_{dsk}^{+}(m) - \beta\,[V_{thk}^{+}(m+1) - V_{thk}^{+}(m)] = g_{dsk}^{+}(m) - \beta\,\Delta V_{thk}^{+}(m+1) \qquad \text{(B.3)}$$

where $\Delta V_{thk}^{+}(m+1)$ denotes the difference in threshold voltage between adjacent time slots. Equation (B.3) is a general expression relating the channel conductance at different time slots, so the following relationships also hold:

$$g_{dsk}^{+}(m) = g_{dsk}^{+}(m-1) - \beta\,\Delta V_{thk}^{+}(m), \qquad g_{dsk}^{-}(m) = g_{dsk}^{-}(m-1) - \beta\,\Delta V_{thk}^{-}(m) \qquad \text{(B.4)}$$
From the analysis in Chapter 4, the weight value can be written as the differential conductance between the positive and negative programming paths multiplied by the gain factor of the circuit; that is, the weight is proportional to the conductance difference, with the resistive gain factor of the circuit as the constant of proportionality:

$$W_k(m) \propto g_{dsk}^{+}(m) - g_{dsk}^{-}(m) \qquad \text{(B.5)}$$

Substituting equation (B.1) into (B.5) and assuming that both transistors have the same $\beta$, we can rewrite the weight as

$$W_k(m) \propto \beta\,[V_{thk}^{-}(m) - V_{thk}^{+}(m)] \qquad \text{(B.6)}$$

From equation (B.6), the incremental weight can be obtained. Consider the clipped-data LMS error algorithm implemented in the linear adaptive neuron, repeated here for easier comparison:

$$W_k(m+1) = W_k(m) + 2\mu_{clmse}\,|\varepsilon(m)|\,\mathrm{sgn}(\varepsilon(m))\,\mathrm{sgn}(x_k(m))$$

where $\mu_{clmse}$ denotes the convergence factor for the clipped-data LMS error algorithm. Combining this expression with equation (B.6) yields the following result:
$$W_k(m+1) - W_k(m) = 2\mu_{clmse}\,|\varepsilon(m)|\,\mathrm{sgn}(\varepsilon(m))\,\mathrm{sgn}(x_k(m)) \qquad \text{(B.7)}$$

$$W_k(m+1) - W_k(m) \propto \beta\,[V_{thk}^{-}(m) - V_{thk}^{+}(m)] - \beta\,[V_{thk}^{-}(m-1) - V_{thk}^{+}(m-1)]$$

$$= \beta\,\{[V_{thk}^{-}(m) - V_{thk}^{-}(m-1)] - [V_{thk}^{+}(m) - V_{thk}^{+}(m-1)]\} = \beta\,[\Delta V_{thk}^{-}(m) - \Delta V_{thk}^{+}(m)]$$

where the constant of proportionality is the resistive gain factor of the circuit. Since the result of the digital multiplication, $\mathrm{sgn}(\varepsilon(m))\,\mathrm{sgn}(x_k(m))$, is either +1 or -1 (+1 if both signs have the same polarity and -1 if they are opposite), the following relationship between the convergence factor and the change in threshold voltage of the SONOS nonvolatile memory transistors can be drawn:

(B.8)
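In software, the clipped-data update of equation (B.7) can be sketched as below for a two-tap delay line. The sketch models only the algorithm, not the SONOS devices: the weight change is applied directly rather than through threshold-voltage shifts, the comments note which circuit quantity each factor corresponds to, and the target filter in `identify_clipped` is a hypothetical test system.

```python
import random
from typing import List


def sgn(v: float) -> int:
    """Sign function returning +1, -1, or 0."""
    return (v > 0) - (v < 0)


def clipped_lms_two_tap(inputs: List[float], desired: List[float], mu: float) -> List[float]:
    """Clipped-data LMS, equation (B.7): W_k <- W_k + 2*mu*|eps|*sgn(eps)*sgn(x_k)."""
    w = [0.0, 0.0]
    previous = 0.0  # x(m-1), assumed zero before the first sample
    for x, d in zip(inputs, desired):
        taps = [x, previous]
        eps = d - (w[0] * taps[0] + w[1] * taps[1])
        for k in range(2):
            # |eps| plays the role of the programming-voltage magnitude, and the
            # digital product sgn(eps)*sgn(x_k) selects the positive or negative path.
            w[k] += 2 * mu * abs(eps) * sgn(eps) * sgn(taps[k])
        previous = x
    return w


def identify_clipped(target=(0.6, -0.3), n: int = 4000, mu: float = 0.02,
                     seed: int = 2) -> List[float]:
    """Hypothetical test case: adapt toward a noise-free two-tap target filter."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ds = [target[0] * xs[m] + target[1] * (xs[m - 1] if m > 0 else 0.0) for m in range(n)]
    return clipped_lms_two_tap(xs, ds, mu)


if __name__ == "__main__":
    print(identify_clipped())
```

Only the sign of each input sample is needed, which is what allows the circuit to realize the input factor with a digital sign multiplication.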
As described in Section 4.2, the incremental weight change is achieved by applying the programming voltage to the gate electrode of the SONOS nonvolatile memory transistors. The programming voltage $V_p$ takes one of the two forms below:

$$V_p(m) = 2G\,|\varepsilon(m)| + V_{pdc}^{+} \qquad \text{(positive programming path)}$$

$$V_p(m) = -2G\,|\varepsilon(m)| - V_{pdc}^{-} \qquad \text{(negative programming path)} \qquad \text{(B.9)}$$

where G is the gain factor of the summing amplifier section.
Depending on the result of the digital multiplication, the steering network selects either the positive or the negative programming voltage for programming the SONOS devices. The following analysis concentrates on the positive incremental weight; the negative incremental weight follows almost the same derivation with minor sign changes. Differentiating equation (B.8) with respect to the error signal $\varepsilon$ gives

(B.10)

Differentiating equation (B.9) with respect to the error ($dV_p = 2G\,d|\varepsilon|$ on the positive path) and substituting the result into equation (B.10), we arrive at

$$2\mu_{clmse} \propto 2G\,\beta\,\frac{d}{dV_p}\left[\Delta V_{thk}^{-}(m) - \Delta V_{thk}^{+}(m)\right]$$

where the constant of proportionality is the same resistive gain factor as in equation (B.7); thus

(B.11)
With a further approximation, the convergence factor can be rewritten as

(B.12)
The following analysis is based on the MNOS device structure, which exhibits the same nonvolatile memory behavior as the SONOS device structure. The change in threshold voltage, $\Delta V_{thk}$, can be written in the following form [25]:

(B.13)

where $t_p$ is the programming pulse width, and $C_0$ and $C_N$ are the oxide and nitride capacitances, defined as

$$C_0 = \frac{\varepsilon_0}{x_0}, \qquad C_N = \frac{\varepsilon_N}{x_N}$$

where $\varepsilon_0$ and $\varepsilon_N$ are the permittivities of the oxide and nitride, $x_0$ and $x_N$ are the oxide and nitride thicknesses, respectively, $V_T$ is the characteristic tunnel oxide voltage, and $\tau$ is the characteristic time, written as

(B.14)

where $J_T$ is the tunneling current density and $V_0$ is the voltage drop across the tunneling oxide.
From the relationship shown above, we can draw the following result.a 
Substitute equation (B.13) into (B.15), then 
=-
XO 
-
xeff 
tp 
-
't 
-
t 
l+g 
t 
1+ -( C0 ) tP 
CN 't 
t 
l+g 
't 
t 
l+g 
t 
Therefore, we can rewrite the convergence factor as follows
(B.15) 
(B.16) 
$$\mu_{clmse} = 2\beta R_f\left(\frac{R_2}{R_1}\right) G\,\frac{t_p/\tau}{1 + t_p/\tau} \qquad \text{(B.17)}$$
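The step from the threshold-voltage model to equation (B.17) depends on the slope $d\Delta V_{thk}/dV_P$. The sketch below checks that slope numerically. It assumes a specific closed form for (B.13), because that equation is not legible above. The assumed form is the standard logarithmic MNOS charge-injection model $\Delta V_{thk} = (1 + C_0/C_N)V_T\ln(1 + t_p/\tau)$, with $\tau = \frac{(C_0 + C_N)V_T}{J_T}e^{-V_0/V_T}$ and $V_0 = V_P/(1 + C_0/C_N)$. Under these assumptions the slope should equal $(t_p/\tau)/(1 + t_p/\tau)$. With $V_P = 2G|E| + V_{pdc}$, the ratio $t_p/\tau$ also becomes $K e^{\alpha|E|}$ with the $K$ and $\alpha$ of (B.20). That consistency is why this model was chosen for the sketch. The function names are hypothetical.

```python
import math


def tau(v_p, c_ox, c_n, v_t, j_t):
    """Characteristic time, ASSUMING tau = (C0 + CN) V_T / J_T * exp(-V_0 / V_T)
    with V_0 = V_P / (1 + C0/CN), the oxide share of V_P before charge is stored."""
    v_0 = v_p / (1.0 + c_ox / c_n)
    return (c_ox + c_n) * v_t / j_t * math.exp(-v_0 / v_t)


def delta_vth(v_p, t_p, c_ox, c_n, v_t, j_t):
    """ASSUMED logarithmic MNOS threshold shift (magnitude) after a pulse of width t_p."""
    return (1.0 + c_ox / c_n) * v_t * math.log1p(t_p / tau(v_p, c_ox, c_n, v_t, j_t))


def slope_closed_form(v_p, t_p, c_ox, c_n, v_t, j_t):
    """d(delta_vth)/dV_P = (t_p/tau) / (1 + t_p/tau) under the same assumption."""
    ratio = t_p / tau(v_p, c_ox, c_n, v_t, j_t)
    return ratio / (1.0 + ratio)


# Device values taken from the defaults in the Appendix C listing.
params = dict(t_p=1e-3, c_ox=1.72653e-6, c_n=1.278911e-6, v_t=0.8, j_t=1e-6)
v_p, h = 6.2, 1e-4
finite_diff = (delta_vth(v_p + h, **params) - delta_vth(v_p - h, **params)) / (2 * h)
print(finite_diff, slope_closed_form(v_p, **params))
```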
If we rearrange equation (B.17) and substitute the expression for $\tau$, then the
following result may be obtained
(B.18) 
where $\mu_{clmse}(0)$ is a constant containing the gain factor of the circuit, given by
$$\mu_{clmse}(0) = 2G\beta R_f\left(\frac{R_2}{R_1}\right)$$
To further simplify the convergence factor expression, we can rewrite equation 
(B.18) in the following form 
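$$\mu_{clmse} = \mu_{clmse}(0)\,\frac{K e^{\alpha|E|}}{1 + K e^{\alpha|E|}} \qquad \text{(B.19)}$$
where
$$K = \frac{t_p J_T}{(C_0 + C_N)V_T}\exp\!\left[\frac{V_{pdc}}{(1 + C_0/C_N)V_T}\right], \qquad \alpha = \frac{2G}{(1 + C_0/C_N)V_T} \qquad \text{(B.20)}$$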
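Equations (B.19) and (B.20) translate directly into a few lines of code. The sketch below mirrors the formulas used in the HP BASIC listing of Appendix C (`Cox`, `Cn`, `Ceff`, `Beta`, `U_eff_0`, `K`, `Alph`, `Mu_eff`). The Python names are hypothetical, and units follow the listing (cm, F/cm², cm²/V·s). The feedback resistance `r_f = 1e4` in the example is an illustrative value only.

```python
import math

EPSILON_VACUUM = 8.854e-14  # F/cm (listing: Epsilon_0)
K_OX, K_N = 3.9, 6.5        # relative permittivities (listing: K_ox, K_n)


def convergence_parameters(gain, r_f, r2_over_r1, mobility, w_over_l,
                           t_ox, x_n, v_t, j_t, t_p, v_pdc):
    """Return (mu0, K, alpha) following the listing's formulas for (B.19)-(B.20)."""
    c_ox = K_OX * EPSILON_VACUUM / t_ox              # Cox
    c_n = K_N * EPSILON_VACUUM / x_n                 # Cn
    c_eff = c_ox * c_n / (c_ox + c_n)                # Ceff (series combination)
    beta = mobility * w_over_l * c_eff               # Beta
    mu0 = 2.0 * gain * beta * r_f * r2_over_r1       # U_eff_0, i.e. mu_clmse(0)
    divider = 1.0 + c_ox / c_n                       # (1 + C0/CN)
    k = t_p * j_t / (v_t * (c_ox + c_n)) * math.exp(v_pdc / (divider * v_t))
    alpha = 2.0 * gain / (divider * v_t)
    return mu0, k, alpha


def mu_effective(error, mu0, k, alpha):
    """Equation (B.19): mu0 * K e^(alpha|E|) / (1 + K e^(alpha|E|))."""
    g = k * math.exp(alpha * abs(error))
    return mu0 * g / (1.0 + g)


# R_f = 1e4 ohm is illustrative; the other values are the listing defaults.
mu0, k, alpha = convergence_parameters(gain=0.6, r_f=1e4, r2_over_r1=10,
                                       mobility=400, w_over_l=1.5, t_ox=2e-7,
                                       x_n=4.5e-7, v_t=0.8, j_t=1e-6,
                                       t_p=1e-3, v_pdc=5)
for e in (0.0, 1.0, 5.0):
    print(e, mu_effective(e, mu0, k, alpha))
```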
Notice from equation (B.19) that when the error signal is large at the
beginning of the adaptation process, the convergence factor approaches the value
$\mu_{clmse}(0)$. As the error signal decreases under the error-minimization
algorithm, the convergence factor also decreases, falling toward
$\mu_{clmse}(0)\,K/(1+K)$ as $|E| \to 0$. Therefore, the speed of convergence depends not only on the gain factor of the circuit, but also
on the SONOS device parameters. Equations (B.19) and (B.20) also provide
first-cut information on which device parameters will give the best
convergence performance for optimum circuit operation.
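This observation can be made more concrete. The ratio in (B.19) is a logistic function of $\alpha|E| + \ln K$, so $\mu_{clmse}$ falls to half of $\mu_{clmse}(0)$ at $|E|_{1/2} = \ln(1/K)/\alpha$ (when $K < 1$). Raising $V_{pdc}$, $t_p$, or $J_T$ increases $K$ and moves this knee toward smaller errors, while $\alpha$ sets how sharply the factor drops around it. The sketch below checks the logistic identity and the half-point numerically. The helper names and the values of $K$ and $\alpha$ are illustrative assumptions.

```python
import math


def mu_ratio(error, k, alpha):
    """mu_clmse / mu_clmse(0), written as a logistic function of alpha|E| + ln K."""
    return 1.0 / (1.0 + math.exp(-(alpha * abs(error) + math.log(k))))


def half_gain_error(k, alpha):
    """|E| at which mu_clmse = mu_clmse(0)/2; zero if K >= 1 (already past the knee at E = 0)."""
    return max(0.0, math.log(1.0 / k) / alpha)


# Illustrative K and alpha, close to what the listing defaults produce.
k, alpha = 5.9e-3, 0.64
print(half_gain_error(k, alpha))
```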
Appendix C
Software Simulation of the Linear Adaptive Neuron
This appendix presents a software simulation program designed to simulate
the behavior of the linear adaptive neuron under different learning algorithms.
The program is written as a subroutine and is embedded in a larger software
package called FIDDLER, written by Dr. Richard Booth. The program takes the
circuit component values and device parameters as inputs, so that it can
simulate the circuit operation as closely as possible. The variable convergence
factor scheme discussed in the previous appendix is also incorporated in the
program. One of the benefits of simulating the linear adaptive neuron circuit in
software is that the weight changes are available as a function of time. The
program is written in HP BASIC, and the source code is listed below.
    SUB Fiddleneur
    !##################################################
    Fiddleneur: OPTION BASE 1
    Routine36: !
    !--------------------------------------------------
    COM /Datastuff/ Title$(*),Cname$(*),Generator(*),Datarray(*)
    COM /Datastuff/ Nchan_max,Npts_max,Nchan,Npts
    COM /Character/ Blank$,Uline$,Bold$,Norm$,Cr$,Lf$
    COM /Character/ Wht$,Red$,Yel$,Grn$,Cya$,Blu$,Mag$,Blk$
    COM /Leveldata/ Kp,Kc,Version$
    COM /Constants/ C0,E0,Qe,Me,Kb,Eox,Esi,Mec,Mhc,Vt,Ni,Vsat,Egap
    DIM Vbfile$[30],File$[30],Drive$[30],L$(30)[200]
    OFF KEY
    Kp=Kp+1                  ! INCREMENT KEY PRIORITY
    Okc=Kc                   ! SAVE LAST KEY COLOR
    Kc=5                     ! KEY COLOR
    Colormode                ! COLOR GRAPH MODE
    !--- PUT DATA INTO:      ! USER SPECIFIC CODE
    !    1> NCHAN = NUMBER OF CHANNELS <= NCHAN_MAX
    !    2> NPTS  = NUMBER OF POINTS IN EACH CHANNEL
    !    3> TITLE$(1) = TITLE
    !             (2) = SUBTITLE
    !             (3) = COMMENTS
    !    4> CNAME$(CHAN)
    !    5> DATARRAY(PT,CHAN) = DATA
    !vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
r,,nd co~t (300) 
    Nchan=12
    Npts=Npts_max
    ALLOCATE Signal(Npts),X1(Npts),X2(Npts),Desire(Npts),Y(Npts)
    !--------------------------------------------------
    Form1: IMAGE 8A,MD.5DE
    DATA "  RF        = 1E4"
    DATA "  R2/R1     = 10"
    DATA "  SIG. AMPL = .2"
    DATA "  DES. AMPL = 2"
    DATA "  SIG. FREQ = 100"
    DATA "  DES. FREQ = 100"
    DATA "  PHASE DIF = 45"
    DATA "  GAIN      = .6"
    DATA "  W/L       = 1.5"
    DATA "  MOBILITY  = 400"
    DATA "  TOX       = 2E-7"
    DATA "  XN        = 4.5E-7"
    DATA "  VT (MNOS) = .8"
    DATA "  JT        = 1E-6"
    DATA "  TP        = 1E-3"
    DATA "  V PDC     = 5"
    RESTORE
    Nlab=16
    FOR I=1 TO Nlab
      READ L$(I)
    NEXT I
    GOSUB Getpars
    Axes_flag=1
    !
    OFF KEY
    Kp=Kp+1
    Okc=Kc
    Kc=9
    Colormode
    ON KEY 0 LABEL "SIMU. PAR'S",Kp GOSUB Spar
    ON KEY 1 LABEL "SIMU. START",Kp GOSUB Simu
    ON KEY 2 LABEL "",Kp GOTO Wait
    ON KEY 3 LABEL "",Kp GOTO Wait
    ON KEY 4 LABEL "",Kp GOTO Wait
    ON KEY 5 LABEL "",Kp GOTO Wait
    ON KEY 6 LABEL "",Kp GOTO Wait
    ON KEY 7 LABEL "",Kp GOTO Wait
    ON KEY 8 LABEL "",Kp GOTO Wait
    ON KEY 9 LABEL "QUIT <SIMU.>",Kp GOTO Quit
    Wait: GOTO Wait
    Quit: BEEP 5000,.01
    DEALLOCATE Signal(*),X1(*),X2(*),Desire(*),Y(*)
    Clearscreen
    Kp=Kp-1
    Kc=Okc
    Colormode
    SUBEXIT
    !--------------------------------------------------
    Spar: BEEP 5000,.01
    GCLEAR
    CALL Entry(L$(*),Nlab,"SIMULATION PARAMETERS: ")
    GOSUB Getpars
    RETURN
    Getpars:
    R_f=VAL(L$(1)[15,29])
    R2_over_r1=VAL(L$(2)[15,29])
    Sampl=VAL(L$(3)[15,29])
    Dampl=VAL(L$(4)[15,29])
    Sfreq=VAL(L$(5)[15,29])
    Dfreq=VAL(L$(6)[15,29])
    Phase=VAL(L$(7)[15,29])
    Gain=VAL(L$(8)[15,29])
    W_over_l=VAL(L$(9)[15,29])
    U_n=VAL(L$(10)[15,29])
    T_ox=VAL(L$(11)[15,29])
    X_n=VAL(L$(12)[15,29])
    V_t=VAL(L$(13)[15,29])
    J_t=VAL(L$(14)[15,29])
    T_p=VAL(L$(15)[15,29])
    V_pdc=VAL(L$(16)[15,29])
    RETURN
    Simu: BEEP 5000,.01
    DISP Cya$;Bold$;"SIMULATION IN PROGRESS!";Norm$;Grn$
    WAIT 2
    DISP
    Title$(1)="EXAMPLE"
    Title$(2)="EXAMPLE"
    Title$(3)="EXAMPLE"
    Cname$(1)="TIME"
    Cname$(2)="W1"
    Cname$(4)="W2"
    Cname$(5)="Error"
    Cname$(6)="EFFECTIVE MU"
    !##################################################
    K_ox=3.9
    Epsilon_0=8.854E-14
    K_n=6.5
    Cox=K_ox*Epsilon_0/T_ox
    Cn=K_n*Epsilon_0/X_n
    Ceff=Cox*Cn/(Cox+Cn)
    Beta=U_n*W_over_l*Ceff
    U_eff_0=2*Gain*Beta*R_f*R2_over_r1
    K=T_p*J_t/(V_t*(Cox+Cn))*EXP(V_pdc/((1+Cox/Cn)*V_t))
    Alph=2*Gain/((1+Cox/Cn)*V_t)
    Phase=Phase*PI/180
    Dt=(1/Sfreq)*.05         ! PERIOD/20 (TIME STEP)
    T=0                      ! TIME
    !******************** WEIGHT INITIALIZATIONS ********************
    W1=20                    ! FIRST WEIGHT
    W2=20                    ! SECOND WEIGHT
    FOR I=1 TO Npts
      Signal(I)=Sampl*SIN(Sfreq*T)
      Desire(I)=Dampl*SIN(Dfreq*T+Phase)
      X1(I)=Signal(I)        ! SAMPLE INPUT SIGNAL
      X2(I)=Sampl*SIN(Sfreq*T-PI/2)
      Y(I)=X1(I)*W1+X2(I)*W2 ! CONVOLUTION
      Err=Desire(I)-Y(I)
      Mu_eff=U_eff_0*K*EXP(Alph*ABS(Err))/(1+K*EXP(Alph*ABS(Err)))
      PRINT "I, MU EFF",I,Mu_eff
      W1=W1+2*Mu_eff*Err*X1(I)   ! UPDATE WEIGHTS
      W2=W2+2*Mu_eff*Err*X2(I)
      Datarray(I,1)=T
      Datarray(I,2)=W1
      Datarray(I,4)=W2
      Datarray(I,5)=Err
      Datarray(I,6)=Mu_eff
      T=T+Dt
    NEXT I
    !^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    Outuser: !
    Genval
    BEEP 5000,.01
    DISP Cya$;Bold$;"FINISHED !!!";Norm$;Grn$
    WAIT 2
    DISP
    Kc=Okc
    Colormode
    Kp=Kp-1
    RETURN
    SUBEND
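For readers without an HP BASIC system, the `Simu` loop above can be ported to Python roughly as follows. This sketch reproduces the listing's update rule and signal definitions, including `SIN(Sfreq*T)` with no factor of $2\pi$ and a time step of 0.05 times `1/Sfreq`. Several parts are hypothetical additions: the `npts` default, the `fixed_mu` switch for emulating a fixed-convergence-factor scheme, and all Python names.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Trace:
    """Columns the listing stores in Datarray: time, W1, W2, error, effective mu."""
    time: List[float] = field(default_factory=list)
    w1: List[float] = field(default_factory=list)
    w2: List[float] = field(default_factory=list)
    error: List[float] = field(default_factory=list)
    mu: List[float] = field(default_factory=list)


def simulate(mu0: float, k: float, alpha: float, npts: int = 300,
             s_ampl: float = 0.2, d_ampl: float = 2.0,
             s_freq: float = 100.0, d_freq: float = 100.0,
             phase_deg: float = 45.0, w1: float = 20.0, w2: float = 20.0,
             fixed_mu: Optional[float] = None) -> Trace:
    """Two-tap LMS loop mirroring Simu; fixed_mu (hypothetical) bypasses (B.19)."""
    phase = math.radians(phase_deg)
    dt = (1.0 / s_freq) * 0.05                            # Dt=(1/Sfreq)*.05
    t = 0.0
    trace = Trace()
    for _ in range(npts):
        x1 = s_ampl * math.sin(s_freq * t)                # tap 1: input signal
        x2 = s_ampl * math.sin(s_freq * t - math.pi / 2)  # tap 2: quadrature copy
        desire = d_ampl * math.sin(d_freq * t + phase)
        err = desire - (x1 * w1 + x2 * w2)
        if fixed_mu is None:
            g = k * math.exp(alpha * abs(err))
            mu = mu0 * g / (1.0 + g)                      # variable factor (B.19)
        else:
            mu = fixed_mu                                 # fixed factor
        w1 += 2.0 * mu * err * x1                         # weight updates
        w2 += 2.0 * mu * err * x2
        trace.time.append(t)                              # stored after update,
        trace.w1.append(w1)                               # as in the listing
        trace.w2.append(w2)
        trace.error.append(err)
        trace.mu.append(mu)
        t += dt
    return trace


variable = simulate(mu0=0.8, k=5.9e-3, alpha=0.64)
fixed = simulate(mu0=0.8, k=5.9e-3, alpha=0.64, fixed_mu=0.005)
print(variable.error[-1], fixed.error[-1])
```

You can plot `error` against `time` for the two runs to get the kind of comparison discussed below. Not all parameter values behind the figure are stated, so the numbers from this sketch should not be expected to match it.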
The variable convergence factor is found to have a significant effect on the
convergence behavior of the circuit. Figures C-1a and C-1b show the circuit
convergence performance (error versus time) under two different schemes: one
with a variable convergence factor and the other with a fixed convergence factor.
From Figures C-1a and C-1b, the scheme with the variable convergence factor
converges at a much faster rate, as postulated in the previous appendix. From
the simulation results, the convergence factor reaches its steady-state value in
the range of 0.03 to 0.06. Therefore, more experiments have to be performed with
a higher convergence factor, obtained by increasing the gain of the error feedback
path, to determine whether the system time constant (or the convergence behavior)
improves as the software simulation suggests.

[Plots: FIDDLER "EXAMPLE" output, error versus TIME, panels (a) and (b)]
Figure C-1: Simulated Convergence Behavior of the Linear Adaptive Neuron with W_0 = 10 and
W_1 = -10: (a) Variable Convergence Factor, (b) Fixed Convergence Factor
Vita 
Chun-Yu Malcolm Chen was born August 11, 1967 in Taipei, Taiwan to
Cheng-Hsiung and Chin-Chu Wu Chen. He attended Lehigh University from
August 1985 to August 1988, graduated with highest honors, and earned a
bachelor's degree in Electrical Engineering. Since graduating, he has been
enrolled as a full-time graduate student at Lehigh University in the Department
of Computer Science and Electrical Engineering. He is a student member of the
IEEE and was elected to the honor societies of Eta Kappa Nu, Tau Beta Pi, and
Phi Eta Sigma.