Design of Building Blocks for Trit Algorithm by Parthasarathy, Balaji




Bachelor of Engineering 
College of Engineering 
Guindy, India 
1990 
Submitted to the Faculty of the 
Graduate College of the 
Oklahoma State University 
in partial fulfillment of 
the requirements for 
the Degree of 
MASTER OF SCIENCE 
May, 1993 
PREFACE 
This thesis attempts to design the building blocks for the
TRIT algorithm. PSPICE was used for simulation. The
building blocks were laid out in Magic.
I would like to express my sincere gratitude to Dr.
Chriswell Hutchens for his support, guidance and
encouragement. I appreciate the time and effort that he
spent on this project. I would also like to thank the Naval
Ocean Systems Center for the facilities and funding of this
project. I am thankful to Dr. Johnson and Dr. Baker for
serving on my committee. I thank my research colleagues and
friends in Stillwater.
I am always indebted to my good friend Gunna, and would 
like to thank Viswanathan, Sunil Mathews and Raja. 
This work is dedicated to my parents Ranganayaki and 
Parthasarathy, my grandmother Rukmani, uncles Dr.S.V.Kannan 
and Dr.S.V.Vijayaraghavan, Mr.A.V.S.Raja and their families, 
my brothers and their families, and my sister Padmini 
Vasudevan and her family. 
TABLE OF CONTENTS
Chapter
I. INTRODUCTION AND LITERATURE SURVEY
    1.1 Introduction
    1.2 Comparison of Analog and Digital ANN's
    1.3 Comparison of Current and Voltage Mode Approaches
    1.4 Learning
    1.5 Review of Standard Back Propagation
    1.6 Advantages
    1.7 Literature Survey
    1.8 Proposal for Hardware Implementation of TRIT
II. TRIT MODEL SIMULATIONS
    2.1 TRIT Program Description
    2.2 Standard BP Program Description
    2.3 Adaptive BP Program Description
    2.4 Testing Procedures
    2.5 Results
    2.6 Comparison of Adaptive BP and TRIT
    2.7 Results
III. SYSTEM BUILDING BLOCKS
    3.1 TRIT Interface Specifications
    3.2 Current Conveyors
        3.2.1 Introduction
        3.2.2 High Power Conveyor
            3.2.2.1 Design
            3.2.2.2 Simulations
        3.2.3 Low Power CC
            3.2.3.1 Simulations
        3.2.4 Opamp
            3.2.4.1 Analysis of Opamp
            3.2.4.2 Biasing Circuit of the Opamp
            3.2.4.3 Power Circuit Design
            3.2.4.4 Simulations
        3.2.5 Squashing Function
            3.2.5.1 Design
            3.2.5.2 Simulations
            3.2.5.3 Approximate Sigmoid Function
                3.2.5.3.1 Simulations
        3.2.6 Dynamic Cascode Biasing
    3.3 Derivative Circuits
    3.4 Weight Matrix
        3.4.1 Floating Gate Analog Memories
        3.4.2 Weight Adjustment Circuitry
    3.5 Sample and Hold Circuit
        3.5.1 Introduction
        3.5.2 Errors in S/H Circuit
            3.5.2.1 Charge Injection Error
            3.5.2.2 Switch Feedthrough Error
            3.5.2.3 Cascode Configurations
        3.5.3 Simulations
IV. CONCLUSION AND FUTURE PROSPECTS
REFERENCES
APPENDIXES
    APPENDIX A - STARTUP PROGRAM
    APPENDIX B - TRINARY PROGRAM
    APPENDIX C - STANDARD BP PROGRAM
LIST OF TABLES
Table
Comparison of Standard BP and TRIT
Comparison of Adaptive BP and TRIT
LIST OF FIGURES
Figure
1. Multi Layer Perceptrons (MLP)
2. IC Signal Flow Floor Plan
3. Flowchart of the TRIT program
4. Flowchart of the TRIT program
5. Regions of updation of weight vectors
6. Current Conveyor (CC) based Weight-matrix drivers
7. Block diagram of the Current Conveyor
8. Current Conveyor symbol
9. Current Conveyor circuit
10. D.C. characteristics of High Power CC
11. A.C. response of High Power CC
12. D.C. characteristics of Low Power CC
13. Opamp circuit diagram
14. D.C. characteristics of opamp
15. A.C. characteristics of opamp
16. Transient characteristics of opamp
17. Output function F(.) circuit
18. D.C. characteristics of squashing (F(.)) function
19. Sigmoidal function (Single Quadrant)
20. Sigmoidal Nonlinearity (Two Quadrant)
21. D.C. characteristics of sigmoidal circuit
22. A.C. characteristics of sigmoidal function
23. Dynamic cascode biasing circuit
24. Derivative circuit
25. D.C. characteristics of Derivative circuit
26. Weight Cell
25. Weight adjustment circuitry-1
26. Weight adjustment circuitry-2
27. Basic Sample and Hold circuit
28. Sample and Hold (S/H) circuit
Width to length ratio of subscripted MOSFET x
Transconductance of MOSFET x
Channel length modulation parameter of MOSFET x
Voltage gain of subscripted operational amplifier x
Gate to drain capacitance
Gate to source capacitance
Gate capacitance of subscripted MOSFET
Oxide capacitance/unit area
Small signal drain to source conductance
Small signal channel transconductance of MOSFET x
Drain current of MOSFET x
Subscripted MOSFET
Positive supply voltage
Drain to source voltage of MOSFET x
Gate to source voltage of MOSFET x
Negative supply voltage
Threshold voltage of MOSFET x
Weight matrix
Transpose of the weight matrix
CHAPTER I 
INTRODUCTION AND LITERATURE SURVEY 
This thesis addresses the design and simulation of the
building blocks for the Trinary Backpropagation (TRIT)
algorithm, a modified form of the Backpropagation (BP) [1,2]
algorithm for Artificial Neural Networks (ANN). The aim of
this work is an architecture for parallel on-board learning
based on the TRIT algorithm. TRIT quantizes the BP algorithm
by updating the weights of a Multi-Layer Perceptron (MLP)
network in parallel. The weight updates are not unique for
each element of the weight matrix but can take only one of
three values: an increment, a decrement of the same
magnitude, or zero. This results in a large saving in silicon
area because of the reduced complexity of the weight updates.
Also, the same weight matrix is used in both the forward
propagation and back propagation modes, which reduces the
area by a factor of two. Larger layer sizes can be
implemented in less area because of this modification to the
BP algorithm. On-board learning is





The resurgence of interest in Artificial Neural 
Networks (ANN) that started in the late eighties has led to 
a host of new potential applications for these ANN models 
[3]. These Neural Network models offer great potential in 
the areas of speech processing, image recognition and 
pattern classification due to their high fault tolerance and 
parallel computation capability [4].
The complexity of these neural networks does not stem 
from the complexity of the individual components but from 
the multitude of ways in which a large collection of the 
components can interact. These network models reflect highly 
parallel, regular, and modular architectures that make them 
attractive for Very Large Scale Integrated (VLSI) systems 
[5]. The implementation of such models in hybrid VLSI 
analog/digital circuitry is one of the active current 
research areas [6]. 
The technologies used in special purpose ANN
implementations are broadly classified as analog [7,8],
digital, mixed analog/digital (hybrid) IC's [9], and optical
and electro-optical [10].
1.2 Comparison of Analog and Digital ANN's
Analog VLSI neural networks perform better than digital
circuits in specific applications. Current studies [11]
indicate that 10^9 to 10^11 interconnections can be achieved
with analog circuits, a rate much higher than that of digital
circuits [11]. Analog VLSI ANN's make use of very simple
building blocks that are reconfigurable and versatile. The
simple building block approach shortens the design time and
makes efficient use of Computer Aided Design (CAD) tools.
The design of simple, well-defined analog cells that can be
interconnected to achieve different linear and/or nonlinear
functions is the key to the success of the analog ANN
approach. This approach will bring neural net VLSI design a
step closer to automation. Also, some of the traditional
analog design requirements, such as accurate absolute
component values, device matching, and precise time
constants, are often of lesser concern in neural net
applications because the computational precision of
individual neurons is not of paramount importance [12].
Schneider and Card [13] have discussed the effect of
low-accuracy components on the design of ANN chips. They
argue that ANN's with in-situ learning, i.e. networks in
which the synapses contain circuitry that performs local
computation of weight updates, can adapt the weights to
compensate for the component to component variations present
in analog networks. This thesis attempts to incorporate many
of the transistor imperfections in the simulations to test
the validity of our algorithm. The results are discussed in
Chapter II of this thesis. Frye et al. [14] have shown that
the average error becomes less than 4% in an adaptive ANN
which uses hardware components with 30% variation.
Analog rather than digital VLSI has been identified as 
a major technology for future ANN applications. A large 
effort is being devoted to the ANN implementation in analog 
Metal Oxide Semiconductor (MOS) VLSI [5]. Efficient tools 
for the synthesis at both circuit and layout levels, 
simulation and testing of large scale analog IC's are being 
developed [12]. 
1.3 Comparison of Current and Voltage Mode Approaches
Many of the neural network functions involve current
rather than voltage. The summing of many signals is readily
achieved when those signals are currents. The dynamic range
of the signals is greatly increased when MOS transistors
are operated over the range from weak to strong inversion.
This dynamic range is critical for scaled VLSI technologies,
which are expected to see a reduction in supply voltages.
The frequency of operation is also potentially increased due
to lower impedance internal nodes and reduced full scale
swing [12]. Reduced power consumption and increased speed of
operation are other inherent advantages of the analog
current-mode approach [15].

For the reasons outlined above, we make full use of
current-mode processing in analog VLSI for the
implementation of the TRIT algorithm.
1.4 Learning 
ANN models include Hopfield, Hamming, single layer
perceptrons, multilayer perceptrons (MLP), Grossberg and
Carpenter, Boltzmann machines, Kohonen self-organizing maps,
bidirectional associative memories, and Neocognitrons. A
discussion of all these types of networks is beyond the
scope of this thesis, which reviews only BP and MLP's.

Backpropagation (BP) is a supervised learning scheme
used in multi-layer perceptrons (feed-forward networks).
Backpropagation networks are very attractive in
applications such as pattern and speech recognition,
waveform classification, etc.

It has been shown that multi-layer perceptrons (MLP) can
approximate any function of interest to any degree of
accuracy. The multilayer perceptron network is shown in
Figure 1. MLP's are an important and popular subclass of
ANN's. They have simple dynamics because of the absence of
feedback paths. The simple dynamics ensure the stability of
the multi-layer perceptrons. Also, the existence of powerful
learning and adaptation algorithms for these networks makes
them very attractive from the engineering perspective.

Figure 1. Multi Layer Perceptrons (MLP)

The learning capability of ANN's is one of the most
intriguing and challenging areas in theoretical
neuroscience. Some researchers [16] have used fixed
interconnection weights between processing units to
implement learning in ANN's using the various algorithms
discussed before. This, however, limits the application of
the network. There have been several attempts [17-22] to
address the problem of modifiable weight circuitry.
Learning historically requires connectionist elements to
have a considerable amount of circuitry and, hence, a large
amount of silicon area, in addition to high inaccuracy [11].
On-chip learning procedures have been reported by several
authors [17,18]. Furman et al. [18] used a dynamic memory
cell and circuitry for weight modification and storage to
implement the BP algorithm. They attempted analog storage in
digital fashion by storing graded charges on a capacitor.
The value of the charge represents the weight value, which
complicates the whole circuitry. Alspector [17] implemented
a digital/analog weight stochastic learning network. In
Alspector's work, the weights are subjected to a fixed
increment or decrement at each step of the learning process.
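1.5 Review of Standard Backpropagation

Standard BP [27,28] can be represented as

\[
O_i =
\begin{cases}
I_i & \text{(unit } i \text{ an input unit)} \\
F\Big(\sum_j w_{ij} O_j + b_i\Big) & \text{(otherwise)}
\end{cases}
\tag{1}
\]

where I_i is the external input to the unit i, w_ij is the
weight associated with the interconnection from the jth
processing unit in the network to the ith unit, b_i is the
bias or the offset term, and F is the sigmoidal activation
function for the hidden and output units. Propagation in the
forward direction can be represented by the following
equations

\[
\begin{aligned}
\mathbf{a}^{0} &= \mathbf{p} \\
\mathbf{a}^{k+1} &= F^{k+1}\big(W^{k+1}\mathbf{a}^{k} + \mathbf{b}^{k+1}\big), \quad k = 0, 1, \dots, M-1 \\
\mathbf{a} &= \mathbf{a}^{M}
\end{aligned}
\tag{2}
\]

while the backpropagation of the error can be represented as

\[
\begin{aligned}
\boldsymbol{\delta}^{M} &= F^{M\prime}(\mathbf{n}^{M})\,(\mathbf{t} - \mathbf{a}) \\
\boldsymbol{\delta}^{k} &= F^{k\prime}(\mathbf{n}^{k})\,(W^{k+1})^{T}\boldsymbol{\delta}^{k+1}, \quad k = M-1, M-2, \dots, 1
\end{aligned}
\tag{3}
\]

Here n^k = W^k a^(k-1) + b^k is the net input to layer k and
t is the desired output. Weights and offsets are changed
according to

\[
\begin{aligned}
\Delta W^{k} &= \eta\,\boldsymbol{\delta}^{k}(\mathbf{a}^{k-1})^{T}, \quad k = 1, 2, \dots, M \\
\Delta \mathbf{b}^{k} &= \eta\,\boldsymbol{\delta}^{k}, \quad k = 1, 2, \dots, M
\end{aligned}
\tag{4}
\]
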
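The forward pass, error backpropagation, and weight update reviewed above can be sketched in a few lines. This is a minimal pure-Python illustration, not the thesis program (which was written in Matlab). The function names, the logistic choice for F, and the sign convention (δ defined so that the update is +η δ a^T) are assumptions made for the sketch.

```python
import math

def logistic(x):
    # Assumed sigmoidal activation F; its derivative is F'(n) = a * (1 - a).
    return 1.0 / (1.0 + math.exp(-x))

def forward(p, weights, biases):
    """Return the layer outputs [a0, a1, ..., aM] for input vector p."""
    outputs = [list(p)]
    for W, b in zip(weights, biases):
        a_prev = outputs[-1]
        n = [sum(w * a for w, a in zip(row, a_prev)) + bi
             for row, bi in zip(W, b)]
        outputs.append([logistic(x) for x in n])
    return outputs

def backward(outputs, weights, target):
    """Return the deltas [d1, ..., dM], one vector per layer."""
    a_out = outputs[-1]
    deltas = [[a * (1 - a) * (t - a) for a, t in zip(a_out, target)]]
    # Walk back through the hidden layers, from layer M-1 down to layer 1.
    for k in range(len(weights) - 1, 0, -1):
        W_next, d_next, a_k = weights[k], deltas[0], outputs[k]
        back = [sum(W_next[i][j] * d_next[i] for i in range(len(d_next)))
                for j in range(len(a_k))]
        deltas.insert(0, [a * (1 - a) * s for a, s in zip(a_k, back)])
    return deltas

def update(weights, biases, outputs, deltas, eta):
    """Apply graded updates: dW = eta * delta * a_prev^T, db = eta * delta."""
    for k, (W, b) in enumerate(zip(weights, biases)):
        for i, d in enumerate(deltas[k]):
            for j, a in enumerate(outputs[k]):
                W[i][j] += eta * d * a
            b[i] += eta * d
```

Calling `forward`, `backward`, and `update` in turn performs one training step for a single pattern; repeating the step over the training set gives one epoch.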
The BP algorithm tends to converge very slowly. Also,
the incremental changes in the delta (error propagating) and
output vectors near convergence are extremely small. The
weight and bias changes are proportional to the error, so as
the error becomes small and the circuit approaches a stable
weight configuration, the weight changes become small as
well. Depending on the noise figure of the analog process,
the incremental changes can be too small to be implemented
practically in analog hardware. The lower bound on the
convergence error is set by the limitations of the analog
hardware. The signal to noise ratio (S/N) expected for our
circuit is 60 dB. If the network fails to converge within
this range of signal changes, the noise in the circuit takes
over, and the final circuit state then depends on the noise
present in the circuit.
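As a rough worked example of this bound, and assuming the 60 dB figure is an amplitude ratio (20 log10), the noise floor sits at about one part in a thousand of full scale, so any graded update smaller than that is lost in the noise. The helper names below are hypothetical and serve only to make the arithmetic explicit.

```python
def smallest_resolvable_fraction(snr_db):
    """Noise floor as a fraction of full scale, for an amplitude S/N in dB."""
    return 10.0 ** (-snr_db / 20.0)

def update_is_resolvable(update, full_scale, snr_db=60.0):
    """True if |update| is at or above the noise floor implied by snr_db."""
    return abs(update) >= full_scale * smallest_resolvable_fraction(snr_db)
```

Under this assumption a full-scale signal of 1.0 can resolve an update of 0.01 but not one of 0.0001.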
The potential difficulties associated with computing
and imposing graded weight updates in parallel in analog
hardware have led researchers to investigate simpler
parallel learning procedures in which weight changes are
coarsely quantized [6]. Small modifications of learning
procedures can considerably enhance the computational power
of neural networks and can make practical implementation of
such networks easier [23].
Peterson and Hartman [24] examined the effect of
quantizing updates into two states (increment or decrement)
on the performance of a mean field theory learning
algorithm. Alspector et al. [17] implemented a hybrid
digital/analog circuit in which weights are subjected to
fixed increments or decrements per step of the learning
process. M. Marchesi studied the effect of restricting the
weights in multi-layer perceptrons to powers of two or sums
of powers of two. A learning procedure based on
backpropagation was used for a neural network with these
discretized weights [25].

Shoemaker [6] proposed a modified Sgn-Sgn or trinary
learning algorithm, which forces the same weight update
increment for all elements in the network, for efficient
implementation of backpropagation in electronic perceptrons.
It will henceforth be referred to as the TRIT algorithm.
Several electronic neural network solutions have been
offered to date, but none with parallel on-board TRIT
learning. The importance of on-board learning has been
demonstrated by Frye, Wong et al. [14]. They argue that the
performance of the hardware when it learns by simulation is
much poorer than that obtained by learning on the network
itself.
In our architecture, charge is stored on an
electrically isolated floating gate of a MOS device [6], and
this charge represents the weight value. However, precise
control of increments of charge, and hence of changes in
weights, is difficult in existing technology because charge
tunneling in the gate of a floating gate MOS device is
difficult to control [6] due to extreme nonlinearities. If
a unique and small weight change has to be accomplished on
each element of the weight matrix, very large and complex
circuitry is required even for small layers. The routing
complexities related to the high voltage problem and
individual program control also necessitate a large area of
silicon. Therefore it is advantageous from an
implementation perspective to increment or decrement all
the elements of the weight matrix in parallel across an
entire network [6], or at the very least a complete row or
column simultaneously.
Trinary Backpropagation (TRIT) is a simple variant of 
the classical BP algorithm which makes the practical 
implementation of on-board chip learning feasible. In this 
algorithm, the weight changes are assumed to be only one of 
the three values: an increment, a decrement of the same 
magnitude, or zero. 
The TRIT algorithm allows a parallel implementation of
learning rules with coarsely quantized parameter changes in
analog integrated circuitry [6]. Conceptually, the
implementation of the trinary algorithm is relatively easy
and eliminates the need for complicated weight update
circuitry [6]. However, the dynamic range of the weight
updating will be limited in practice by the lack of local
control at the elemental weight cell.
Trinary Backpropagation uses the same forward and
backward equations. It departs from regular BP in how the
weights and biases are modified after the values of the
deltas are calculated:

\[
\begin{aligned}
\Delta w_{ij} &=
\begin{cases}
\eta\,\mathrm{Sgn}(\delta_i)\,\mathrm{Sgn}(o_j) & (|o_j| \ge \epsilon_1,\ |\delta_i| \ge \epsilon_2) \\
0 & (|o_j| < \epsilon_1 \ \text{or}\ |\delta_i| < \epsilon_2)
\end{cases} \\
\Delta b_i &=
\begin{cases}
\eta\,\mathrm{Sgn}(\delta_i) & (|\delta_i| \ge \epsilon_2) \\
0 & (|\delta_i| < \epsilon_2)
\end{cases}
\end{aligned}
\tag{5}
\]

where η, ε1, ε2 are positive constants. For constant η, the
learning process corresponds to motion on a lattice in
weight/bias space, in a direction of decreasing sum-square
error for each training pattern pair, although not generally
in the direction of steepest descent. ε1 and ε2 are current
or voltage programmable constants.
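The trinary rule can be illustrated with a short sketch. It assumes the reading of the update rule in which a weight changes by +η, -η, or 0 according to the signs of δ_i and o_j once both magnitudes reach their thresholds (ε2 for δ, ε1 for o); the function names are hypothetical and the code is not taken from the thesis appendices.

```python
def sgn(x):
    """Sign of x as -1, 0, or +1."""
    return (x > 0) - (x < 0)

def trit_update(delta, o, eta, eps1, eps2):
    """Return (dW, db) for one layer under the trinary rule.

    delta : deltas of the layer's units (index i)
    o     : outputs of the previous layer (index j)
    Each entry of dW and db is one of +eta, -eta, or 0.
    """
    dW = [[eta * sgn(d) * sgn(oj)
           if abs(d) >= eps2 and abs(oj) >= eps1 else 0.0
           for oj in o]
          for d in delta]
    db = [eta * sgn(d) if abs(d) >= eps2 else 0.0 for d in delta]
    return dW, db
```

Because every nonzero entry has the same magnitude η, the hardware only needs to decide among three actions per weight, which is what removes the graded update circuitry.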
The derivative F'(s_i) is implemented in a piecewise
fashion as follows:

\[
F'(s_i) =
\begin{cases}
\dots & \text{for } \dots \\
R_H & \text{for } F(s_i) > V_f
\end{cases}
\tag{6}
\]

The programming circuitry will conceptually consist of
a three position switch:
1) One position allows application of a programming
current or voltage pulse to the weight circuit,
which increments the stored charge by a discrete
amount. This is equivalent to a fixed positive
increment in the weight value.
2) A second position allows decrementing the stored
charge by tunneling off charge. This is equivalent
to a fixed discrete decrement of the weight value.
3) The third and final position leaves the circuit
open and prevents any program modification of the
stored charge. This is equivalent to zero change in
the weight matrix.
1.6 Advantages 
The primary advantage is the on-board chip learning
implementation. It makes it practically feasible to train
any network by building application specific hardware. It
also conserves area in a VLSI implementation.

The network is inherently faster than standard BP, so it
is possible to build real time systems with this algorithm
for use in Natural Language Processing, vision control, etc.

This algorithm has been shown to converge faster than
standard BP and even BP with adaptive weight modification.
Therefore it is our intention to realize faster convergence
through hardware learning.

From the simulations, it can be stated that
component-to-component variation has a negligible effect on
the convergence of a hardware implementation. Thus
hardwired TRIT neural networks appear to be robust and
tolerant of the imperfections of poorly matched devices.
Because of the large number of processing nodes involved,
damage to a few nodes or links does not significantly impair
their functionality.

Massively parallel analog hardware networks, which are
very fast, can be developed on the basis of these
simulations, which show that the algorithm converges for
small benchmark problems.
Such hardware, which can be used to test the validity 
of the practical implementation of ANN's and collective 
systems, can be designed for various specific applications. 
Software learning accomplished on Von Neumann
computers does not exploit the inherent parallelism of
neural networks [14]. By using hardware rather than software
learning, the inherent parallelism of the neural network is
fully utilized.
TRIT implementation can be easily understood from the
single layer system diagram (Figure 2), which consists of an
input function, an output circuit function (F(.)), a delta
input function, a delta output function and the weight
matrix array.

Figure 2. IC Signal Flow Floor Plan

1.7 Literature Survey
Several CMOS analog implementations of ANN's have been
reported in the literature. Some are inspired by biological
models [23] while others are derived from artificial models.
They are dedicated to signal processing, image processing or
pattern recognition, with or without in-situ learning. They
use digital or continuous valued analog signals. These
networks use different learning algorithms such as Hopfield,
Grossman, Backpropagation, etc. Some of these works also
involve the building of basic cells on a chip, with their
test results. Since it is impractical to discuss the
building block approaches of all types of learning
algorithms, we restrict our discussion to fabricated VLSI
implementations of backpropagation learning for ANN's.
Boser et al. [4] implemented an optical digit recognizer
on a neural network chip which is trained by
backpropagation. It recognizes handwritten digits from a
20x20 pixel image with 2.9% misclassifications, compared to
a typical value of 2.5% for human beings. The network
consists of 133000 connections of 3500 neurons arranged in 5
layers. The throughput of the chip is 130 MC/s and the
operating frequency is 20 MHz.

Nijhuis et al. [26] have fabricated a collision
avoidance neural network in a 2 micron double metal CMOS
technology. They used a fully digital network which was
laid out using a standard cell library. It has an operating
frequency of 20 MHz and 10M interconnects per second. The
chip consists of 12 neurons, 144 synapses and 134 I/O
pads.
1.8 Proposal for Hardware Implementation of TRIT
We propose the hardware implementation of the
potentially useful TRIT model in a 2 micron Thin-Film
Silicon-on-Sapphire (TSOS) process.

The most significant reasons for preferring TSOS over a
bulk process are the reduced V_T variation, due to the
reduction of the bulk threshold parameter (γ), and the thin
oxide film of 250 Å, which facilitates electron tunneling.
The TSOS process has both depletion and enhancement mode
devices.

TSOS devices are made with epitaxial silicon islands on
a sapphire substrate. There is no body (substrate) contact
on the device. The threshold voltage V_T for an n-channel
transistor is given by

\[
V_T = V_{T0} + \gamma\left(\sqrt{2\phi_F + V_{SB}} - \sqrt{2\phi_F}\right)
\tag{7}
\]

where
V_T0 = V_T at zero source to body potential
γ = bulk threshold parameter
2φ_F = strong inversion potential
V_SB = source to body potential

For TSOS devices, V_T = V_T0. The threshold shifts are
minimized to a great extent. Also, source-to-body and
drain-to-body capacitances are negligible, reducing
parasitics and increasing bandwidth (BW).
The electronic implementation of TRIT BP involves
building sub-components or blocks. The final architecture
can be realized by interconnecting such blocks on a single
substrate. Being parallel processing structures,
backpropagation networks are both iterative and highly
structured. The building blocks take advantage of that fact,
simplifying design and testing at both the cell and the
system levels. This thesis addresses the design, simulation,
and layout of these basic building blocks. System level
integration is beyond the scope of this thesis.
The proposed IC is universal in the sense that a single
IC implements each layer of a multilayer perceptron. There
is a one to one relationship between the weight matrix and
each IC. The building block is, at least theoretically,
extensible in the horizontal direction to any number of
layers. However, the maximum number of neurons is fixed
vertically by fabrication and limited by the pin count. The
power supply rails also limit the magnitude of the
backpropagated term, which limits the horizontal extension.
The vertical extension is the number of neurons per layer of
the chip. The horizontal extension is the number of layers
used in an application. Normally a three-layer network with
one hidden layer is sufficient to approximate any function
to a reasonable degree [28].
As opposed to traditional voltage mode analog signal
processing, in which inherently current signals are
converted to the voltage domain before any analog signal
processing takes place, a recently reintroduced current
mode analog signal processing approach is taken. The
current mode approach takes advantage of 1) the convenience
of summing inner product and backpropagation currents and
2) the bidirectionality of triode floating gate CMOS based
weight multipliers. The use of current rather than voltage
as the active parameter can result in higher gain, accuracy,
and wider bandwidth due to the reduced voltage excursion at
dynamic nodes [15].

Simulations were performed using SPICE. Layouts were
produced with the CAD layout tool MAGIC. All circuits are
fabricated using NRaD's fabrication facilities.
Chapter II will discuss the software implementation of 
this model. Chapter III will focus on the design, 
simulation, and performance testing of all building blocks. 
Chapter IV will offer conclusions based on the results and 
suggestions for the future work connected with investigation 
of this proposed hardware implementation. 
CHAPTER II 
TRIT MODEL SIMULATIONS 
This chapter discusses the software program developed
to test and validate the behavioral aspects of the TRIT
algorithm. The purpose of the simulations is to test the
convergence properties of the TRIT algorithm with variation
in the learning rate and ε2 under non-ideal conditions. The
component non-idealities are incorporated in the TRIT
simulations. The TRIT algorithm is compared to Standard
Backpropagation in speed of convergence and sensitivity to
device imperfections. A character mapping problem and a
pattern fitting problem are used to establish the baseline
performance. The programs were developed in Matlab.
In VLSI circuits, effects including random offsets and
mismatch, system distortion, frequency response, and
temperature variations perturb the system outputs [7]. The
effects that dominate the error in the system depend on
the system implementation. In our simulations, we have
attempted to include most of the device imperfection effects
that play a major role in our implementation. The variation
in transconductance, the V_T mismatch, and the channel
length modulation parameter (λ) effect are the major
concerns in analog circuit inaccuracies. The errors due to
the above-mentioned inaccuracies in the current conveyors,
weight matrix, and output function (F(.)) are discussed in
this chapter.
2.1 TRIT Program Description
A program which calculates the initial weight matrix
and biases and does the feed-forward calculation is run
first. It is enclosed in Appendix C.
The program sets all the weight and bias elements to
0.5 so that all the circuit points start from the same
initial condition. A random number corresponding to a 5%
variation of the weight elements is added to all the
elements of the network. This number simulates the error
present in the multiplier circuit: the multiplier and the
weight update circuits are non-ideal, and a detailed
analysis of the errors is presented in Appendix D. The
current conveyors are driven by an opamp which is non-ideal.
The opamp error is due to the mismatch of the input
transistors shown in Figure 13. This mismatch introduces
both V_T and β errors. Also, the current mirroring
transistors in the current conveyor (see Figure 8) are not
ideal because of the channel length modulation parameter (λ)
effect. Dynamic cascoding is utilized to reduce the λ
effect. The resulting effective λ is then negligible
compared to the V_T and β errors. The β error is reduced by
laying out the transistors M1 and M2 in a common centroid
geometry and maintaining moderate geometries. These errors
are represented by adding a 5% random number to the
following elements of the network.
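The initialization described above can be sketched as follows. The sketch assumes that a "5% variation" means a uniform random perturbation of up to ±5% of the nominal value; the thesis program may model the variation differently, and the function name and arguments are illustrative only.

```python
import random

def init_matrix(rows, cols, nominal=0.5, variation=0.05, rng=None):
    """Return a rows x cols matrix of nominal values with random mismatch.

    Each element is nominal * (1 + u), with u drawn uniformly from
    [-variation, +variation] (an assumed model of device mismatch).
    """
    rng = rng or random.Random()
    return [[nominal * (1.0 + rng.uniform(-variation, variation))
             for _ in range(cols)]
            for _ in range(rows)]
```

Passing a seeded generator, for example `init_matrix(3, 4, rng=random.Random(0))`, makes a simulation run repeatable.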
In the main program, shown in Appendix A, the values of
ε1, ε2 and η are set. The value of the learning rate
determines the number of iterations needed by the network to
converge. Then the iteration is started as shown by the
flowchart in Figures 3 and 4. If the steady state error
(SSE) is less than 0.1, the program is terminated.
If the SSE is greater than 0.1, the following procedure
is started.

The values of the δs are calculated. The values of the
δs determine whether the bias elements are to be adjusted.
The values of the δs and the outputs at the previous node
corresponding to the weight matrix determine whether the
weight matrix has to be updated. The regions where the
weights are to be updated are shown in Figure 5. As seen
from the figure, the weight element W_ij is updated if and
only if the values of δ_i and O_j are greater than the
threshold values. Otherwise that weight element remains
unchanged. Similarly, the bias elements are updated if and
only if the value of δ is greater than ε2.

Figure 3. Flowchart of the TRIT program

Figure 5. Regions of updation of weight vectors

After the modification of the weights, a noise term of
0.01 (σ) is added to each element of the weight and bias
matrices. The value of 0.01 stems from the fact that the
values of the weight matrix and the bias elements vary from
0.5 to 1.5 and the noise floor is assumed to be at least
60 dB down and centered around the mean value of 1.0. The
input vector is multiplied by the multiplier circuit, which
is explained in Chapter III. The analysis of the multiplier
circuit indicates a maximum error of 1% due to the β and
threshold mismatch. Appendix D analyzes the errors in the
multiplier circuit. This error effect is introduced in the
simulations by adding a value of 0.01.
The forward computation is now completed and the output 
error has been determined. The output error is plotted with 
respect to the number of epochs to observe the behavior of 
the network. Since there is always a finite range of weight 
values, which depends on the dynamic range of the weight 
multiplier circuit, weight variation is bounded. The upper 
limit is set at ±3 and the lower limit at ±0.2V. The 
squashing current conveyor at each layer's output is also not 
ideal, due to VT and β effects. This effect is not 
symmetrical. A detailed analysis of the effect is given in 
Appendix D. The effect is accounted for by adding 
noise values to the output values at each layer. 
2.2 Standard BP Program Description 
A copy of the program is enclosed in Appendix B. The 
same initial weight matrix and the feedforward values are 
used to start the program. 
First, the value of the learning rate η is set. Then the 
iteration is started. The SSE is checked for a value less 
than 0.1. If it is less than 0.1, the program is terminated. 
Otherwise the following procedure is continued. 
The weight matrix and the bias vector elements are 
modified according to the Standard BP formulas. The forward 
computation is completed and the network output is 
calculated. The network output is subtracted from the 
desired output to get the output steady state error. The 
error is plotted with respect to the number of epochs to 
observe the behavior of the network. The next loop is then 
started by checking the steady state error (SSE) and 
calculating the values of the δs. 
2.3 Adaptive BP Program Description 
The adaptive modification is enclosed in Appendix C. 
Backpropagation networks typically employ adaptive 
modification to eliminate local minima or to speed up the 
convergence of the network. The speed of convergence is 
increased by increasing the learning rate if the error 
vectors tend toward minima, and vice versa. 
If the steady state error decreased in successive 
iterations, the learning rate η is 
multiplied by a factor of 1.07. If the steady state error 
increased in successive iterations, the learning rate is 
decreased by a factor of 1.02. 
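The learning rate rule above can be written as a short sketch. The function name `adapt_learning_rate` is hypothetical, and "decreased by a factor of 1.02" is interpreted here as division by 1.02; the program in Appendix C may differ in detail.

```python
def adapt_learning_rate(eta, sse_prev, sse_curr, up=1.07, down=1.02):
    """Return the learning rate for the next iteration.

    eta      : current learning rate
    sse_prev : steady state error of the previous iteration
    sse_curr : steady state error of the current iteration
    """
    if sse_curr < sse_prev:
        # Error decreased: speed up.
        return eta * up
    if sse_curr > sse_prev:
        # Error increased: slow down (interpreted as division by 1.02).
        return eta / down
    # Error unchanged: the learning rate is left alone in this sketch.
    return eta
```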
If the steady state error is constant, the value of ε2 
is decreased. Care is taken to ensure that the value of ε2 is 
not decreased by more than a factor of 1000. ε2 was 
initially set to 0.02. If the steady state error was constant 
in successive iterations, the value of ε2 was halved. 
If the error remained constant even after ε2 had been 
reduced by a factor of 1000, the network was 
considered nonconvergent. The argument is that 
the variation of the parameter ε2 in an actual network is 
limited by the noise floor of the network. We assume an SNR 
of 60dB, or a noise floor of -60dB. If we reduce the 
value of ε2 below -60dB, the noise in the circuit takes over, 
and it becomes practically impossible to control the 
circuit. 
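As a rough check on this limit, the sketch below lists the ε2 values reachable by repeated halving from 0.02 without exceeding a total reduction of 1000, and confirms that a factor of 1000 in amplitude corresponds to 60dB. The function name `epsilon2_schedule` and the exact stopping test are assumptions for illustration.

```python
import math

def epsilon2_schedule(eps2_init=0.02, max_reduction=1000.0):
    """Successive epsilon2 values obtained by halving, stopping before the
    total reduction relative to the initial value would exceed max_reduction."""
    values = [eps2_init]
    while values[0] / (values[-1] / 2.0) <= max_reduction:
        values.append(values[-1] / 2.0)
    return values

schedule = epsilon2_schedule()
# A factor of 1000 in amplitude expressed in decibels.
reduction_db = 20.0 * math.log10(1000.0)
```

Under these assumptions nine halvings are allowed (a reduction of 512); a tenth would give 1024 and cross the 60dB floor.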
2.4 Testing Procedures 
The initial simulations were performed to compare the 
convergence properties of back propagation using this 
three-state, or trinary, quantization of weight and bias 
updates with those of standard back propagation when applied 
to the same problems. During each iteration of the learning 
trial, the input/desired output pattern pairs in the training 
set were presented to the network in a fixed sequence, with 
the weights updated for each pair, and then the network 
output was tested over all pairs of the training set. This 
was continued until convergence was obtained. 
Convergence was defined as the condition in which all errors 
between desired and actual network outputs were smaller in 
magnitude than 0.1. As mentioned earlier, the values of the 
Wij's and bi's are varied by adding a random constant at 
every iteration. 
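The convergence criterion can be stated compactly in code. This is only a sketch of the test described above; the function name `has_converged` and the array layout (one row per training pair) are assumptions.

```python
import numpy as np

def has_converged(desired, actual, tol=0.1):
    """True when every output error over the whole training set is
    smaller in magnitude than tol."""
    errors = np.asarray(desired, dtype=float) - np.asarray(actual, dtype=float)
    return bool(np.all(np.abs(errors) < tol))
```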
A character-mapping problem and a pattern-matching problem 
were selected. The number of hidden units and the learning 
rate were varied for standard BP, whereas for the trinary 
scheme the value of ε2 was also varied. The value of ε1 was 
set at 0.33. The initial weights were set to 0.5 plus a 
random number (σ = 0.2). 
2.5 Results 
Table 1 depicts the result of the simulations in which 
the number of hidden units and the learning rate were 
varied. The striking result is the difference in the 
convergence time of the two algorithms. Standard BP takes a 
very long time to converge. After an initial period of 
correcting the weights, the network goes through a prolonged 
phase in which the improvement is very low. In fact, the 
learning rate has to be large (> 0.2) for standard BP to 
converge. As the number of hidden units is increased, the 
number of iterations goes down significantly. For learning 
rates less than 0.1, it performs very poorly compared to the 
trinary algorithm. 
In the case of trinary BP, as shown in Table 1, for 
learning rates less than 0.05 it is 5 to 10 times faster 
than standard BP, and for a learning rate in the range of 
0.3 it is even 15 times faster than BP. However, it is 
doubtful whether the trinary algorithm can use such a large 
learning rate, because the RMS weight correction per 
iteration may be very large. Nevertheless, this simulation 
gives a feel for the convergence of the trinary algorithm and 
its speed of convergence relative to standard BP. 
The rapid convergence of the trinary algorithm 
may be due to the scaling imposed upon the weight and bias 
vector updates by quantization; its investigation is 
beyond the scope of this thesis. 
Failure to converge within the iteration limit occurred 
for the TRIT problem whenever the value of ε2 was set 
greater than 0.01, for the various numbers of hidden units 
and learning parameters. This occurs because the SSE 
limit was set at 0.1, and unless the delta terms become 
smaller than 0.01, convergence is not possible. If the delta 
terms fall below ε2, the weight corrections cease due to 
quantization and the network does not converge. 
TABLE I 
CONVERGENCE COMPARISON OF STANDARD 
BP AND TRIT 
Algorithm  Nh=20  Nh=40  Nh=10  Nh=20  Nh=40  Nh=10  Nh=20  Nh=40 
BP         260    145    >500   490    290    >500   >500   >500 
           (.3)   (.3)   (.1)   (.1)   (.15)  (.05)  (.05)  (.05) 
TRIT       9      9      45     33     27     158    53     54 
           (.3)   (.3)   (.1)   (.1)   (.1)   (.02)  (.05)  (.05) 
2.6 Comparison of Adaptive BP and TRIT 
It is clear from the simulations that BP with adaptive 
weight modification converges more slowly than the TRIT 
implementation. See the simulation results shown in Table 2. 
TRIT BP is 2 times faster than BP with adaptive weight 
modification. The number of iterations remains constant as 
the number of hidden units is varied in standard BP 
with adaptive weight modification, while it varies very 
little in TRIT BP. So, based on our limited simulation 
results, even the adaptive weight modification of BP 
does not result in faster convergence than TRIT. 
2.7 Summary 
These simulations give a feel for the potential success 
of a TRIT based algorithm. The TRIT algorithm is found to 
converge faster than the BP algorithm. The 
component-to-component variation appears to have a 
negligible effect on the convergence property of TRIT, which 
is encouraging because analog hardware components are 
inherently low-accuracy components. The various building 
blocks that went into this development of TRIT hardware and 
their simulation results are discussed in Chapter III. 
TABLE II 
CONVERGENCE COMPARISON OF ADAPTIVE 
BP AND TRIT 
Algorithm  Nh = 20  Nh = 40  Nh = 60 
BP         74       74       74 
CHAPTER III 
SYSTEM BUILDING BLOCKS 
This chapter presents the design and simulation results 
for the basic building blocks of the TRIT algorithm. The 
proposed architecture takes advantage of the 
bidirectionality of the current conveyors and the EEPROM 
CMOS based weight multipliers. The architecture of the 
network is shown in Figure 2. The EEPROM based weight 
multipliers are arranged in a matrix form as shown in the 
Figure 2. The input circuit consists of two current 
conveyors driving the weight matrix. The current conveyor 
functions both as a processing element and as a 
bi-directional voltage/current buffer (BiVI) [52]. The 
current conveyor based weight matrix drivers are shown in 
Figure 6. The current conveyors on the input side (CC1 and 
CC2) function as voltage buffers driving a single weight 
matrix column. The current conveyor CC3 is used as a current 
controlled current source. The current conveyor CC4 
functions as an output buffer, driving a non-linear mapping 
(squashing) function (F(.)). The derivative circuit, which 






Figure 6. CC Based Weight Matrix Drivers 
system architecture. A weight adjustment logic, based on 
the TRIT algorithm, is needed for the weight update during 
network learning. 
From this discussion, it is clear that the current 
conveyors form the most basic element of this architectural 
approach, and care must be taken to specify their 
performance. This is necessary for the input/output 
circuits to interface properly, because both the input and 
the output circuits are made up of current conveyors. The 
following section is a brief summary of the interface 
specifications. 
3.1 TRIT Interface Specifications 
Linear Limiting Squashing CCII+ 
Iin (Full Scale): 50uA @ X = ±0.5V 
Iout (sat): ±3uA 
Iout (sat) = β*VwtFS*VFS/4 
Rout > 10Meg at Iout <= 3uA 
RL = 4/(β*VwtFS) 
Input & Output Biasing CCII+ 
X: ±2.5V 
Y: Iout >= ±1500uA @ Z = ±2V 
Z: Scale 1:1 at follower and 1:1 @ mirror 

Y: Iout >= ±50uA @ Z = ±0.5V 
Z: Scale 1:1 at follower and 1:1 @ mirror 
Rout >= 1Meg @ 50uA (Cascoded) 
Current Error <= 1% 
3.2 Current Conveyors 
3.2.1 Introduction 
A current conveyor is a four terminal device which 
performs many useful analog signal processing functions when 
used in arrangement with other electronic elements. Current 
conveyors are functionally flexible and versatile. They can 
form an integral part of all I/O circuits [49-51]. Current 
conveyors offer several advantages over conventional 
operational amplifiers. They provide the highest gain 
bandwidth product of the process, which depends only on the 
opamp used [48]. They are used within this thesis as a 
practical building block for the implementation of the TRIT 
algorithm. 
The block diagram of a current conveyor (CC) is shown 
in Figure 7. Class-I (CCI±) and class-II (CCII±) conveyors 




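Figure 7. Block Diagram of the CC-II± 
as: 
\[
\begin{bmatrix} I_Y \\ V_X \\ I_Z \end{bmatrix}
=
\begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & \pm 1 & 0 \end{bmatrix}
\begin{bmatrix} V_Y \\ I_X \\ V_Z \end{bmatrix}
\]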
From the above equation it is clear that no current 
flows into terminal Y. The voltage applied to terminal Y 
will cause an equal voltage to appear on terminal X. 
Terminal Y exhibits an infinite input impedance and terminal 
X exhibits a zero input impedance. An input current Ix on 
terminal X causes an equal current to flow into or out of 
the high impedance output terminal Z. The positive sign 
indicates that at any instant both IX and IZ flow in the 
same direction (CCII+), while the minus sign denotes 
opposite directions of the currents, signifying CCII-. 
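The ideal terminal relations just described (no current into Y, X following Y, Z conveying the X current) can be captured in a small behavioral sketch. The function name `ccii_ideal` and the dictionary return format are hypothetical; a real conveyor departs from these relations through finite gain, offset and mirror mismatch.

```python
def ccii_ideal(v_y, i_x, polarity=+1):
    """Ideal second-generation current conveyor.

    v_y      : voltage applied to terminal Y
    i_x      : current at terminal X
    polarity : +1 for CCII+, -1 for CCII-
    """
    if polarity not in (+1, -1):
        raise ValueError("polarity must be +1 (CCII+) or -1 (CCII-)")
    return {
        "i_y": 0.0,             # Y draws no current (infinite input impedance)
        "v_x": v_y,             # X follows the voltage at Y
        "i_z": polarity * i_x,  # Z conveys the X current
    }
```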
The current CCII configuration allows convenient 
switching between the current conveyor mode and the Voltage 
controlled Voltage Source (VCVS) mode. This choice supports 
the generation of matrix inverse function which is essential 
for implementing backpropagation. 
"The CCII may be viewed as an ideal transistor" [45]. 
The ideal behavior of the NMOS transistor (MFN in Figure 8) 
can be achieved by using it in the negative feedback loop of 
the operational amplifier. In that case current can only flow 
away from the X terminal. If a PMOS transistor (MFP) is used in 
the feedback loop, current is restricted to flow into 
the X terminal. Bi-directional current flow can be achieved 
by using a complementary pair of MOS transistors (MFN and 
MFP) in the opamp feedback loop. The drain currents of MFN 
and MFP can be mirrored to the output node Z. Thus the 
input current IX is conveyed to the output current IZ. This 
is a CCII+ realization, since both IX and IZ simultaneously 
flow in the same direction. 
Figure 8. Current Conveyor Symbol 
The bi-directional voltage/current (BiVI) buffers, 
which are based on the current conveyor concept, are shown 
in Figure 6. These buffers provide the dual function of 
voltage drivers and current sources/sinks to isolate the W 
matrix in the forward/reciprocal mode. During the feed 
forward cycle, the buffers on the input side are configured 
as voltage controlled voltage sources, and the buffers on 
the output side are configured as current controlled current 
sources. The output side consists of a current conveyor 
driving the non-linear squashing function F(.) which 
develops the output voltage to the next layer. A nearly 
identical structure is duplicated to achieve 
Backpropagation. Each current conveyor accepts current in 
the input mode or supplies the drive voltage (and current) 
to the weight row matrix (math column). 
During the forward propagation cycle the Y points of CC1 
and CC2 are tied to Oi while the Y points of CC3 and CC4 are 
grounded. The voltage controlled voltage source configured 
CC ensures VX1 = VY1. This results in the voltage Oi being 
applied across the drain to source of both the floating gate 
and reference transistors in the ith row. The current flowing 
in the reference transistor is mirrored and applied to the 
floating gate column. Due to the presence of the weight 
charge on the floating gate, these currents will differ by 
the inner product of the applied voltage (Oi) and the stored 
weights. The difference, or signal current Oj, is then 
measured by CC3. Current squashing is accomplished by the 
modified CC3. The current controlled current source 
configured CC ensures IZ3 = IX3. This current is the input to 
a linear saturating function which limits the current to 
3uA. Similarly, if all the column currents are summed at 
the X4 terminal, the resulting current represents the 
product of the input vector and the weight matrix. 
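The linear saturating function mentioned above can be modeled behaviorally as a clip at the saturation current. This sketch assumes an ideal unity-slope limiter with symmetric ±3uA limits; the names `I_SAT` and `linear_limiter` are hypothetical, and the circuit's actual knee is not ideal.

```python
import numpy as np

I_SAT = 3e-6  # saturation current in amperes (3uA)

def linear_limiter(i_in, i_sat=I_SAT):
    """Pass the signal current unchanged inside +/- i_sat and
    hold it at the limit outside that range."""
    return np.clip(i_in, -i_sat, i_sat)
```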
During back propagation a voltage Van is applied to 
the Y inputs of both CC3 and CC4, with the Y inputs of CC1 
and CC2 grounded, resulting in a role reversal of the CCs, with 
the Iδ'out current available at the Z output of CC1. Iδ'out 
must be further processed (multiplied by a derivative) to 
compute δout. The whole approach critically depends on the 
high accuracy of the CCs. The following section presents 
the design of CC1 through CC4. 
3.2.2 High Power Conveyor 
3.2.2.1 Design 
The high power current conveyor CC2 must be able to 
supply the total weight current in one column of the weight 
matrix. The signal swing is also determined by the two 
current conveying transistors MFN and MFP, as shown in Figure 
9. The two transistors have to be saturated for 
satisfactory performance. The total weight current in one 
column is 
IHPCC = 100*3*5 = 1500uA 
To source/sink this current, the source/sink 
transistors MIPA, MFN, MINA, and MFP must be sized 
appropriately. For a weight current of 1500uA and an output 
swing of ±3.5V (Section 3.1), 
\[ (W/L)_{MFN} = \frac{2I}{(\Delta V)^2 K_n} = \frac{2 \times 1500}{(3.5 - V_{TN})^2 \times 48} = 10 \]
\[ (W/L)_{MFP} = \frac{2I}{(\Delta V)^2 K_p} = \frac{2 \times 1500}{(2.5 - V_{TN})^2 \times 21} = 20 \]
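The sizing expression above follows from the square-law saturation current I = (K/2)(W/L)(ΔV)^2. The sketch below evaluates it for the NMOS follower only. The value VTN = 0.92V is an assumption (the text states 0.92V for VTP and uses 0.92 in later bias calculations), and the function name `w_over_l` is hypothetical.

```python
import math

def w_over_l(i_ua, delta_v, k_ua_per_v2):
    """W/L needed to carry i_ua (in uA) at an overdrive of delta_v (in V),
    for a process transconductance k_ua_per_v2 (in uA/V^2)."""
    return 2.0 * i_ua / (delta_v ** 2 * k_ua_per_v2)

V_TN_ASSUMED = 0.92  # volts; assumed, not stated for the NMOS device here

ratio_n = w_over_l(1500.0, 3.5 - V_TN_ASSUMED, 48.0)
ratio_n_rounded = math.ceil(ratio_n)
```

Under this assumption the raw ratio is a little under 10, consistent with the rounded value quoted for the follower.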
We expect an output swing (at the Z terminal) of at least 
±2V, because the weight transistor drain to source 
voltage swing is determined by the Z terminal. The dynamic 
range of the weight multipliers depends on the Z terminal 
swing. The length of the cascoding transistors (M9 and M10) 
was fixed at 2um, while the length of the mirroring 
transistors (MINA and MINB) was 6um. The long mirroring 
transistors were selected to reduce the channel length 
modulation parameter (λ) effect and to reduce local 
geometric and doping mismatches. 
Figure 9. Current Conveyor circuit 
So 
\[ (\Delta V)_{M10} = (\Delta V)_{MIPB} / \sqrt{6/2} \]
\[ (\Delta V)_{M9} = (\Delta V)_{MINB} / \sqrt{6/2} \]
\[ (\Delta V)_{MIPB} / \sqrt{6/2} + (\Delta V)_{MIPB} = 3 + V_{TP} \]
\[ (\Delta V)_{MINB} / \sqrt{6/2} + (\Delta V)_{MINB} = 3 - V_{TN} \]
where 
\[ (\Delta V)_{MINB} = \sqrt{2 I_Z / \beta_{MINB}} \quad \text{and} \quad (\Delta V)_{MIPB} = \sqrt{2 I_Z / \beta_{MIPB}} \]
with IZ = 1500uA, kn = 48, kp = 21. 
Solving these equations, we get 
(W/L)MIPB ≈ 75 








For a unity ratio current mirror, the dimensions of the 





Since we have fixed the length of the transistors at 
WMINA = WMINB = 6*37.5 = 225um 
WMIPA = WMIPB = 6*75 = 450um 
WMINA = WMINB = 2*116 =232um 
WMIPA = WMIPB = 2*240 = 480um 
The result is that the widths are very large for the 
mirror transistors. This is due to the design objective of 
maintaining at least a 2V swing at the output (Z terminal) 
while sinking a large current in the mA range. This 
large current translates into a large ΔV drop across the 
mirroring and cascoding transistors (MIPB and M10, MINB 
and M9). For the required Z output swing, we can allow 
only a 2V (ΔV) drop across the two large current carrying 
transistors. This forces the widths of these transistors to 
be large in order to reduce the ΔV across them. 
The design requirement of a Z terminal output swing 
of 2V would necessitate P-transistor widths of 900um. 
Therefore, for pragmatic reasons, the Z terminal swing was 
reduced to 1V for final fabrication. 
The revised widths for the mirrors, 
considering a 1V output swing at the Z terminal, are 
WMIPA = WMIPB = 180um 
WMINA = WMINB = 90um 
WM9 = 90um 
WM10 = 180um 
3.2.2.2 Simulations 
The D.C. and A.C. characteristics of the high power 
current conveyor are shown in Figures 10 and 11 respectively. 
The D.C. curve shows that the output current tracks the 
input current over a range of 1.5mA. The MIPA and MINA 
currents indicate the mirroring currents, while MIPB and MINB 





(D.C. plot; horizontal axis: iin; annotation: % error @ 150uA) 
Figure 11. A.C. Response of High Power CC 
(horizontal axis: Frequency) 
these two currents is found to be less than 1% (@ 1.5mA), 
which indicates that our objective of accurate current 
conveying is achieved at the required output swing. The 
output swing is determined by forcing the Z terminal to 
remain at 1V while sweeping the input current. The ω3dB 
point can be determined from the A.C. characteristics. The 
A.C. characteristic of Figure 11 was obtained by biasing the 
circuit at a D.C. input current of 1uA. For specific 
applications, the full power bandwidth is also an important 
criterion, which is computed from the slew rate. 
3.2.3 Low Power Conveyor 
The output current requirement is 150uA, which is 1/10 of 
the high power conveyor's current. We want the same swing 
at Z as in the high power conveyor, but at a reduced current. 
So the dimensions of all the transistors are those of the 
high power current conveyor scaled down by 10, as follows 
(W/L)MFN = 1 
(W/L)MFP = 2 
WMIPA = WMIPB 
WMINA = WMINB 
WM10 = 18um 




The D.C. characteristics of the low power current 
conveyor are shown in Figure 12. The D.C. output current 
follows the input current over a range of 150uA, and the 
current error due to the λ effect (excluding Δβ and ΔVT) is 
again found to be less than 1%. 
3.2.4 Opamp 
A simple self-compensating opamp is chosen for the 
required opamp function for the CCII due to its simplicity 
and the moderate offset demands of the CCII structure. This 
circuit results in a stable, self-compensated, minimum area 
opamp structure. The bandwidth of the opamp can be 
increased by increasing the bias currents at the expense of 
the power dissipation. More complex opamps will produce a 
better performance than this opamp but with increased area. 
The weakness of the opamp is the offset. However, since 
offset is adjusted as a part of backpropagation learning, 
offset requirements will not be stringent. Also, 
in practice, with an open loop gain of at least 60dB, offset 
performance will be limited more by the threshold matching 
of the input differential pair. Future refinement beyond 
the scope of this thesis will focus on bandwidth and output 





(D.C. plot; horizontal axis: iin; annotation: ~1% error @ 150uA) 
Figure 13. Opamp circuit diagram 
the dynamic range of the CC circuit. The circuit diagram of 
the opamp is shown in Figure 13. 
3.2.4.1 Analysis of Opamp 
In all the subsequent discussions, the symmetry of the 
opamp is exploited to reduce the redundancy of the equations 
i.e. M1 and M2, M3 and M4 and M6 and M7 are assumed to be 
symmetric and matched transistors. 
Output Swing 
Positive : It is obvious that to maintain M6 in 
saturation, the output voltage V0 is limited to 
(15) 
Negative : To maintain M8 in saturation, the negative 
output voltage swing is limited to 
(16) 
Input Common Mode Range 
The input common mode range VCMR- is given by 
\[ V_{CMR-} \ge V_{GM5} - V_{TNM5} + V_{TNM1} \qquad (17) \]
because VGM5 - VTNM5 is the minimum voltage needed to 
maintain M5 in saturation, and the gate voltage of M1 should 
be at least a VT above its source voltage. VCMR- will be 
less than VGM5 because VTNM1 >= VTNM5, and there is a 
threshold shift associated with M1 and not with M5. The 
threshold shift is due to the fact that the source of M1 is 
at a different potential than its substrate. 
The input common mode range VCMR+ is given by 
VCMR- ~ VGM5 - VTNM5 + VTNMl 
(18) 
The voltage VGM3 - VTPM3 is the voltage needed to maintain M3 in 
saturation, and the gate voltage of M1 should be a VT above 
its drain voltage to maintain it in saturation. 
Output Resistance 
The output resistance of the opamp is 
\[ r_o = \mu_{M6} \, (r_{dsM1} \parallel r_{dsM3}) \parallel \mu_{M8} \, r_{dsM10} \qquad (19) \]
where 
\[ \mu_{M6} = \frac{g_{mM6}}{I_{DM6} \lambda_{M6}}, \quad r_{dsM10} = \frac{1}{I_{DM6} \lambda_{M6}}, \quad g_{mM6} = \sqrt{2 \beta I_{DM6}} \qquad (20) \]
So, in order to maintain a high output impedance, the λ 
values of the transistors M8, M6 and M10 should be low. Since 
the channel length modulation parameter is inversely 
proportional to the length, the lengths of M6, M8 and M10 
are made reasonably large to ensure a high output impedance. 
Gain 
The gain of the opamp is given by 
(21) 
The value of ro is given by Equation 19. 
Gain-Bandwidth Product (GBW) 
The GBW is given by 
\[ GBW = g_{mM1} / C_{ol} \]
where Col = CL + Co 




col :; co :; CL 
Slew Rate 
The Slew rate (SR) is given by 
(23) 
where Iss is current through M5. So it is 
advantageous to increase the value of the width of M5 to 
increase input CMR and SR. 
Cut-off Frequency 
The cut-off frequency ω3dB is given by 
\[ \omega_{3dB} = \frac{1}{r_o C_o} \qquad (24) \]
where ro and Co are given previously. 
3.2.4.2 Biasing Circuit of the opamp 
The bias circuit that forms a part of the opamp circuit 
is shown in Figure 13. The transistor M3B sets up the gate 
voltage for M6. By varying the gate voltage of M6, the 
required positive output swing can be achieved. The 
transistors M6B and M10B mirror the input current to the 
transistors M8B and M9B. The reason for using two stacked 
mirroring transistors is to achieve the required matching 
with the transistors M6, M8 and M10, in addition to 
cascoding. M2B controls the gate voltage of M3 and M4. The 
gate voltage of M3 and M4 and their widths effectively control 
the output current. The transistor M7B controls the gate 
voltage of M8. By controlling the gate voltage of M8, the 
required output swing in the negative direction can be 
achieved. The lack of symmetry in the biasing voltage of M8 
is apparent. This can be corrected in future versions by 
including a transistor to match that of M10. The lack of 
symmetry produces an offset voltage, as evident from the D.C. 
characteristics in Figure 14. The transistors M8B, M4B and 
M1B are used for mirroring and matching purposes. The gate 
voltage of M5 is set by the dimensions of M10B and the current 
through it. 
The design criteria from Section 3.1 result in the 
following biasing constraints. 
Bias current 50uA 
Output swing ±3.5V 
Since we desire an output swing of +3.5V, the gate 
voltage of M6 should be at least 3.5+VTP to maintain M6 in 
saturation, i.e. the gate to source drop on the transistor 
M3B should be (3.5+VTP)V. 
So VGM3B = VGM7,M6 = (3.5 + VTP)V = 2.63V 
(24) 
Substituting the values 
IBIAS = 50uA 
VTP = 0.92V 
\[ \beta_{M3B} = \frac{2 I_{BIAS}}{(2.5 - 0.92)^2} = 39 \]
\[ (W/L)_{M3B} = 39/21 = 2 \]
Similarly, the negative output swing is -3.5V. So the 
gate voltage of M7B should be -3.5+VTN to maintain M8 in 
saturation. The gate to source drop on M7B should be 
(-3.5+VTN)V: 
\[ \sqrt{2 I_{BIAS} / \beta_{M7B}} + V_{TN} = 2.6\,\mathrm{V} \]
Solving for M7B by substituting the values, 
\[ \beta_{M7B} = \frac{2 I_{BIAS}}{(2.6 - 0.92)^2} = 41 \]
\[ (W/L)_{M7B} = 1 \qquad (25) \]
The gate to source voltage on M5, M3 and M4 should be 
at least 1.5VT to reduce the effect of VT mismatch. 
\[ V_{GSM3,M4} = V_{GSM2B} = \sqrt{2 I_{BIAS} / \beta_{M2B}} + V_{TP} = \sqrt{\frac{2 \times 57}{21 \times 9}} + 0.92 = 1.7\,\mathrm{V} > 1.5 V_T \]
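The arithmetic of this bias check can be reproduced directly. The sketch takes the numbers from the expression above (57uA bias current, kp = 21, W/L = 9, VTP = 0.92V) and assumes the same 0.92V threshold on the right-hand side of the 1.5VT comparison; the variable names are hypothetical.

```python
import math

I_BIAS_UA = 57.0   # bias current in uA
K_P = 21.0         # process transconductance in uA/V^2
W_OVER_L_M2B = 9.0
V_TP = 0.92        # threshold magnitude in volts

beta_m2b = K_P * W_OVER_L_M2B
# Gate-to-source voltage = overdrive + threshold.
v_gs = math.sqrt(2.0 * I_BIAS_UA / beta_m2b) + V_TP
meets_margin = v_gs > 1.5 * V_TP
```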
and a load capacitance of 0.5pF with a resulting slew rate 
of 50/0.5 = 100V/usec. However, after layout and simulation, 
a current of 50uA was not sufficient. Resimulations resulted 
in a current of 57uA. The 100V/usec requirement should result 
in a peak full power frequency of 
\[ f_P = \frac{SR}{2 \pi (V_{pp}/2)} = \frac{2 \times 100}{2 \pi \times 4} \approx 8\,\mathrm{MHz} \]
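The slew rate and full power frequency figures can be checked with a few lines. The sketch assumes SR = I/C for the slew rate and a 4V peak-to-peak swing (2V peak), which is what the numerical expression above implies; the variable names are hypothetical.

```python
import math

I_BIAS = 50e-6    # amperes
C_LOAD = 0.5e-12  # farads
V_PP = 4.0        # volts peak-to-peak (assumed from the numbers above)

slew_rate = I_BIAS / C_LOAD                        # volts per second
slew_rate_v_per_us = slew_rate / 1e6               # volts per microsecond
f_full_power = slew_rate / (2.0 * math.pi * (V_PP / 2.0))  # hertz
```

The result is just under 8MHz, in line with the rounded value quoted.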
3.2.4.3 Power Circuit Design 
The transistors M1 and M2 were laid out in a common-centroid 
geometry to minimize D.C. offset. To facilitate the 
common centroid design in Magic, each cell has a (W/L) ratio 
of 6/2. The only consideration in the design of transistors 
M6, M8, and M10 is that their lengths should be sufficiently 
large to achieve a high output impedance for the opamp. High 
output impedance translates into high gain of the opamp. The 
increase in capacitance with the increase in length of these 
transistors is negligible compared to the output capacitance 
of 0.5pF. The design of M5 influences the slew rate and the 
negative Power Supply Rejection Ratio (PSRR). Increasing 
the width of M5 will provide an increased CMRR, an increased 
input Common Mode Range (CMR) and an increased Slew Rate (SR), 
which have no significant impact on our design objectives. 
3.2.4.4 Simulations 
The D.C., A.C., and transient characteristics of the 
opamp are shown in Figures 14, 15 and 16. The D.C. 
characteristics show an offset output voltage of 2.26V. The 
A.C. characteristics indicate that the gain-bandwidth product 
is 20MHz and the gain is 72dB. This shows that the gain 
achieved is higher than the design objective. The transient 
characteristics indicate a negative slew rate of 16V/usec 
and a positive slew rate of 4V/usec. 
3.2.5 Squashing Function 
3.2.5.1 Design 
Figure 17 shows the linear limiting version of the 
squashing CC. 
Squashing is achieved by the addition of two transistors 
at each conveyor half. This generates a linear limiter 
function. One of the two transistors (MS2N) provides an 
additional current branch to the supply rail. The upper 
transistor in the traditional mirror branch takes on the 
classical follower role (MFN), while the lower transistor 
(MS1N) serves to limit the current that can be mirrored to 
the Z output. The saturation level is a function of the 
gate bias and geometry of the lower transistor (MS1N) of 
Fig 14. D.C. characteristics of op-amp 
(horizontal axis: vin; annotation: offset output error 
voltage = -2.26V) 
(A.C. plot traces: output voltage magnitude, output voltage 
phase; annotations: 72 db gain, GBP) 
Figure 17. Squashing Current Conveyor 
branch one. Once the current in the mirrored path 
saturates, all additional current is routed to the supply 
rails through the secondary path (MS2N). This results in a 
linear limited function with a globally programmable 
saturation limit. 
The transistor MS2N should be large enough to supply 
the current for the entire column of the weight matrix once 
the transistor M1 saturates. The saturation current was 
fixed at 1500uA. 
The opamp output swing is ±3.5V. So the source of MFN 
should be 3.5V + VTP to maintain MFN in conduction. The 
source of MS2N, the X terminal, is assumed to be at 2V 
during the forward propagation. So the gate to source 
voltage of MS2N is 3.5V + VTP - VTN. 
The current requirement is 
32*VmMAX* ft.r*VPP for a 32 column matrix 
100*VmMAX* ft.r*VPP for a 100 column matrix 
(W/L)MS2N = 2*I/((3.5-VTN-VTN)*kn) 
= 14 for a 32x32 weight matrix 
= 43 for a 100x100 weight matrix 
(W/L)MS2P = 2*I/((3.5-VTN-VTN)*kn) 
= 27 for a 32x32 weight matrix 
= 85 for a 100x100 weight matrix 
3.2.5.2 Simulations 
Figure 18 shows the D.C. characteristics of the 
squashing current conveyor. The saturation current is fixed 
to be ±3uA. 
3.2.5.3 Approximate Sigmoid function 
There are many choices for the saturation function 
in ANNs: the linear saturation function, the S-shaped sigmoid 
function, the hyperbolic tangent, etc. The sigmoid function 
generation circuit is discussed in this section. 
A MOS device has a nonlinear output I-V characteristic, 
which can be utilized as a sigmoid function. The derivative 
has to be generated as a piecewise linear function. The 
gain can be obtained from the slope of the output-input 
curve of the MOS transistor. 
In Figure 19, Vin, Vc, and Io are the input voltage, 
the gain control voltage, and the output current respectively. 
The design of all the MOS devices ensures that they operate 
in the saturation region for the entire range of input 
voltage. Applying KVL around the loop shown in Figure 19, 
\[ V_{in} = V_{GSM5} + V_{GSM8} + V_{DSM9} - V_{GSM7} - V_{GSM6} \]






Figure 12. D.C. Characteristics of Low Power CC 
Figure 19. Sigmoidal Function (Single Quadrant) 
The equation above assumes that the transistors M5, M6, 
M7 and M8 are matched. The TSOS process ensures almost 
exact cancellation of the VT's. The low γ of the TSOS process 
results in VT = VT0. In the regular Orbit process, the VT of 
transistors M5, M6 and M8 would have different values and the 
circuit may fail. 
Two quadrant operation can be achieved by adding PMOS 
transistors to the circuit in Figure 20. Depending on the 
polarity of the input voltage, either the N or the P 
transistors conduct. Symmetry in the two quadrants is 
maintained by the proper selection of device geometries in 
the respective parts. Transistors M11-M14 act as mirroring 
transistors, transferring the drain current of M9 and M10 to 
the output load. The linear and saturation regions of M9 and 
M10 approximate the sigmoid function. 
A family of curves can be generated by adjusting the 
value of the control voltage Vc, which varies the gain of the 
circuit. The transistors M16-M22 reduce the steady state 
power dissipation. 
3.2.5.3.1 Simulations The D.C. and A.C. 
characteristics of the sigmoid function are shown in Figure 
21 & 22 respectively. The symmetry of the D.C. curve is 





















(D.C. plot; horizontal axis: Vin, -1.5V to 1.5V) 
(A.C. plot annotations: G (in dB) = -88.85 uA/V; Vin = 1V A.C.) 
Figure 22. A.C. characteristics of sigmoidal circuit 
3.2.6 Dynamic cascode biasing 
The requirements for the current mirrors to be used 
with all the current conveyors are : linear current gain, 
high output impedance, wide output voltage swing, small 
input bias voltage, and a good high frequency response. The 
ability to satisfy the requirements depends on the type of 
current mirror chosen [45]. There are basically five types 
of current mirrors in CMOS technology [45]: the simple current 
mirror, the cascode or stacked current mirror, the Wilson 
current mirror, the improved Wilson current mirror, and the 
cascode current mirror with improved biasing. The tradeoff can be a high 
output impedance and good current conveying capacity for a 
reduced output swing. Simple current mirrors have poor 
mirroring accuracy and low output impedance but have large 
output voltage swing. The stacked or cascode configurations 
suffer from reduced output swing but have high output 
impedance and good accuracy. 
The current conveyor is at the heart of this building
block approach. A simple current mirror will not achieve
the required current conveying accuracy; therefore, cascode
mirrors were used in all the circuits. Cascode mirrors
have the disadvantage of reduced output swing, but since
the crucial requirement is an accurate current transfer
ratio, cascode mirrors with a dynamic biasing circuit were
selected, as shown in Figure 23.
[Figure 23: cascode current mirror with dynamic biasing circuit]





The transistor M3 is used to mirror a portion of the
current that flows through the cascoding transistor M1NB.
It is mirrored by the P-current mirrors. The λ effect
(channel length modulation) is not completely eliminated
because the P-mirrors are not 100% accurate, and there is a
λ effect at the junction of MP1 and M3. The λ effect on
the mirrors and the M3 node can be reduced by making the
lengths of all the cascode biasing transistors large.
Further improvement is possible by cascoding MP1 and MP2 as
well as M3.
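To see why longer devices reduce the residual mirroring error, the sketch below uses the standard first-order model in which the drain current is multiplied by (1 + lambda*VDS). It assumes, as a common rule of thumb, that lambda is inversely proportional to channel length; the constant `k_um_per_v` and the bias voltages are hypothetical values chosen only for illustration.

```python
def channel_length_modulation(length_um, k_um_per_v=0.1):
    """Assumed model: lambda = k / L, with k a hypothetical process constant."""
    return k_um_per_v / length_um


def mirror_ratio(lam, vds_in, vds_out):
    """Current transfer ratio of a mirror whose two devices see different VDS."""
    return (1.0 + lam * vds_out) / (1.0 + lam * vds_in)


def mirror_error(length_um, vds_in=1.0, vds_out=3.0):
    """Fractional deviation of the transfer ratio from unity."""
    lam = channel_length_modulation(length_um)
    return mirror_ratio(lam, vds_in, vds_out) - 1.0


for length in (2.0, 5.0, 10.0):
    print(f"L = {length:4.1f} um -> error = {100 * mirror_error(length):.2f} %")
```

The error vanishes when both devices see the same drain-source voltage, which is what cascoding tries to enforce, and it shrinks as the length grows.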
3.3 Derivative Circuits 
The derivative circuit of the linear saturating curve
is shown in Figure 24. The backpropagation vector consists
of the derivative of the output vectors of the previous
stage, so the derivative circuit must multiply the output
vector by the function value. The approximate sigmoid
circuit acts as a transconductance, converting the input
voltage to an output current. The derivative of that output
current can be realized as a piecewise linear derivative
function, but additional complex circuitry is then needed
to multiply the derivative current and the output voltage.
The linear derivative circuit has two impedance states,
low and high. The low impedance state indicates slow
learning. As soon as the drain voltage of the transistor
MS1N reaches Vr, it saturates. This point corresponds to
the knee of the linear saturating curve, where the
transistor MS2N takes over conduction from MS1N. The
comparator CR1 switches from the low to the high state,
which drives the output of the NOR gate to the high output
state. Since the gate of the load transistor ML is
connected to the output of the NOR gate, ML starts
conducting and the current Iin sees a low resistance (high
conductance) path. The output goes from the high impedance
state to the low impedance state. Similar actions occur in
the p-half as the transistor MS1P saturates, and the output
goes from the high impedance state to the low impedance
state. This low state drives the corresponding delta
vectors to low values. The reduced delta vectors prevent
learning, because the output vectors in that layer exceed a
certain value, i.e. are saturated. Two types of comparators
(CR1 and CR2) are used to trip at the two different
saturation points of MS1N and MS2N [56].

Figure 24. Derivative Circuit

D.C. characteristics of Derivative circuit
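The behavior of the derivative circuit can be summarized in software terms. The following is a minimal behavioral sketch, not a model of the transistor-level circuit: it assumes a unit-slope linear saturating function with a knee at `knee`, and it represents the low impedance state by a gate value `low` (taken as 0 here) that suppresses the delta once the neuron output is saturated. The function names are hypothetical.

```python
def linear_saturating(n, knee=1.0):
    """Linear saturating activation: unit slope, clipped at +/- knee."""
    return max(-knee, min(knee, n))


def derivative_gate(n, knee=1.0, low=0.0):
    """1.0 in the linear region (high impedance state), `low` once saturated."""
    return 1.0 if abs(n) < knee else low


def gated_delta(error, n, knee=1.0):
    """Delta passed back to the previous layer after gating by the derivative."""
    return error * derivative_gate(n, knee)


for net in (-1.5, -0.3, 0.0, 0.7, 2.0):
    print(net, linear_saturating(net), gated_delta(0.5, net))
```

Inside the linear region the delta passes unchanged; beyond the knee it is driven low, so weights feeding a saturated neuron stop learning.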
3.4 Weight Matrix 
Neural networks "learn" by modifying weights
(synapses). The weights must be alterable and should take
a wide range of positive and negative values. The
incremental weight changes should be small [33]. If
continuous weight values are used, these values must be
stored. This storage requirement imposes a quantization
effect, either through digital storage and A/D converters,
or through analog storage and the need to counter the
effect of noise. The effective noise in the system
determines the dynamic range of analog values that can be
stored and retrieved, i.e. the resolution. An even more
stringent requirement is the development of a high density
storage medium which is readily accessible in IC form.
Digital storage with A/D and D/A converters will not have
the required chip density. Therefore analog storage is the
solution [34].
There are a variety of methods of producing analog
storage. Capacitors and integrators allow the stored
charge to degrade too fast. In pure analog storage there is
no noise margin and hence no possibility of signal
restoration. An analog signal can only be maintained, with
memory decay, and the design objective is to maintain the
signal as long as necessary.
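Two back-of-the-envelope relations make these statements concrete. The sketch below assumes (i) that the usable resolution in bits is the base-2 logarithm of the ratio of full-scale range to effective noise, and (ii) that a storage capacitor droops linearly at a rate I_leak / C. Both are generic first-order estimates rather than results from this thesis, and the numerical values are hypothetical.

```python
import math


def resolution_bits(full_scale_v, noise_v):
    """Equivalent resolution when noise_v is the smallest distinguishable step."""
    return math.log2(full_scale_v / noise_v)


def hold_time_s(cap_f, leak_a, tolerable_droop_v):
    """Time for a leakage current to discharge the capacitor by the given droop."""
    return cap_f * tolerable_droop_v / leak_a


full_scale, noise = 2.0, 2.0 / 256          # assumed: 2 V range, ~7.8 mV noise
bits = resolution_bits(full_scale, noise)
t_hold = hold_time_s(1e-12, 1e-15, noise)   # assumed: 1 pF, 1 fA leakage
print(f"resolution ~ {bits:.1f} bits, one-step droop after {t_hold:.1f} s")
```

Under these assumed values a capacitor-stored weight loses one resolvable step within seconds, which illustrates why capacitive storage requires refreshing and why non-volatile storage is attractive.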
An analog memory element can be characterized by:
(1) location, on-chip or off-chip; (2) volatility, volatile
or nonvolatile; (3) programming/erasing method, electrical
or non-electrical; and (4) the precision in bits.
Storage of analog weights necessitates analog memories that
are (1) truly non-volatile, for long term retention of the
stored knowledge; (2) on-chip and rapidly programmable, to
expedite network learning by minimizing read and write
times; and (3) application specific yet simple, for ease of
fabrication. Discrete programming of true analog memories
results in finite resolution, usually specified in bits.
Several analog memory designs have been presented in
the literature [36]. Furman and Abidi [18] presented a
feedforward network with back error propagation. The
weights are stored as charges on capacitors on the nodes at
cryogenic temperatures. Card and Schneider have used
capacitors with positive or negative charges with periodic
refreshing using training data. Bibyk et al. [37] used
floating gate MOS transistors to store charges. Hubbard,
Schwartz and Howard [29] introduced a circuit utilizing
dynamic charge storage on MOS capacitors. Hochet et al.
[38] presented a method in which a finite number of charge
levels can be stored on a MOS capacitor. These charge
levels are preserved by sense circuitry and regular
refreshes. Additional analog designs [39-42] are also
present in the literature.
The favorable learning feature of the TRIT model is
that the weights are varied in parallel and across the whole
network. This eliminates the need for complex circuitry
to locate the weights. As the magnitudes of the weight
changes are predetermined, the weight modification is
further simplified. Floating-gate analog semiconductor
memories have been proposed by a number of researchers [43]
as a suitable analog medium for the long-term storage
of the weights. Y. Tsividis and S. Satyanarayana [44]
suggested storing analog voltages on the gate capacitance of
the MOS transistor itself. The inherent non-linearity of a
transistor can be cancelled by using complementary input
voltages through matched weighting transistors, or by
passing the same voltages through complementary weighting
transistors: the n-channel and the p-channel. Learning
takes place by addressing the proper capacitors and charging
them according to a specified learning algorithm. Once the
MOS weights have settled (RC time constant), the capacitors
are periodically accessed for reading, charging and
refreshing. This scheme suffers from relatively short
retention, resulting in decreased accuracy. As a result,
the network becomes "absent minded", forgetting information
shortly after learning.
3.4.1 Floating gate analog (FGA) memories
Floating gate analog memories are alterable and
non-volatile. They provide local on-chip weight storage on
the floating gate of a transistor. They are small, consume
little power, have slow memory decay and are compatible with
standard fabrication processes. The extra gate layers are
used to store trapped charges on a floating gate. Once
trapped, these charges produce a shift in the gate to source
voltage, which varies the current through the transistor.
This type of memory element exhibits long term retention
because no discharge path is available, since the gate is
surrounded by the dielectric material SiO2. This memory
transistor is operated in the triode region, where the
non-linearity of the transistor is fairly low. Usually
depletion devices are used to eliminate the floating bias.
The charge on the floating gate of a transistor
represents the value of the weight. As the network learns,
the strength of the synapse increases. The electrical
equivalent is dumping more charge on the floating gate,
i.e., programming, and thereby modulating the
electrical conductivity of the synapse (PMOS). Thus during
programming, the electrical conductivity of the synapse is
expected to increase. The P-sense transistor was
specifically chosen to achieve this desired operation.
While programming, the floating gate acquires electrons,
which develop a negative potential on the floating gate of
the P-MOS sense transistor. The floating gate voltage tends
to become more negative as programming proceeds. Therefore,
the drain current through the device increases, i.e.,
conductivity increases.
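The direction of this effect can be checked with a simple numerical sketch. It is an idealization with several assumptions not stated in the thesis: the floating gate voltage is taken as the stored charge divided by a single lumped capacitance (control-gate coupling is ignored), the source is the voltage reference, and the PMOS sense transistor follows the triode expression beta*[(VSG - |VT|)*VSD - VSD^2/2]. All component values are hypothetical.

```python
ELECTRON_CHARGE = 1.602e-19  # coulombs


def floating_gate_voltage(n_electrons, c_gate_f):
    """Stored electrons make the floating gate negative: V = -n*q / C."""
    return -n_electrons * ELECTRON_CHARGE / c_gate_f


def pmos_triode_current(v_fg, v_sd, beta, vtp_abs, v_source=0.0):
    """Triode-region PMOS current; zero if the device is off."""
    overdrive = (v_source - v_fg) - vtp_abs
    if overdrive <= 0.0:
        return 0.0
    return beta * (overdrive * v_sd - v_sd ** 2 / 2.0)


C_GATE, BETA, VTP, VSD = 10e-15, 50e-6, 0.8, 0.1   # assumed values
for electrons in (1.0e5, 1.5e5, 2.0e5):
    v_fg = floating_gate_voltage(electrons, C_GATE)
    i_d = pmos_triode_current(v_fg, VSD, BETA, VTP)
    print(f"{electrons:.1e} electrons -> Vfg = {v_fg:.2f} V, Id = {i_d * 1e6:.2f} uA")
```

More trapped electrons give a more negative gate, a larger source-to-gate voltage, and therefore a larger current, matching the qualitative argument above.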
Until recently, the memories discussed above required a
special fabrication process such as an ultrathin window,
nitride trap oxide, or conventional textured polysilicon.
Usually, these special processes are expensive, immature,
and simply not available in many design environments,
especially universities. To fulfill the need of analog
neural network designers for programmable memories, the
existing standard CMOS process had to be used without
modification to realize floating gate memories. Recently
several such implementations have been reported [45-46].
The interested reader is referred to the earlier work in
this field by S. Patil [47].
A number of these floating gate analog memories can be
interconnected suitably to form a weight matrix structure.
The same weight matrix can be used in both forward and back
propagation, increasing both density and yield.
Current summing is used for the common analog
computation of the inner product of the weight vector and
the input vector. Current summing offers more dynamic range,
which is of importance in signal processing applications.
The linearity arises from summing the currents of the
nonlinear transconductor elements into a virtual ground. The
common mode nonlinear terms are eliminated, while the
difference currents develop an inner product computation
with wide dynamic range. This compact method of multiplying
for the inner product uses a single transistor per cell.
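From Figure 26, the two transistor currents are

$$I_{o1} = \beta\left[(V_{GS} + V_{wt} - V_T)V_{DS} - \frac{V_{DS}^2}{2}\right], \qquad I_{o2} = \beta\left[(V_{GS} - V_T)V_{DS} - \frac{V_{DS}^2}{2}\right]$$

The output current as shown in Figure 26 is

$$I_o = I_{o1} - I_{o2} = \beta V_{wt} V_{DS} \tag{29}$$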
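The cancellation of the common nonlinear terms can be verified numerically. The sketch below assumes the triode-region expression beta*[(VGS + Vw - VT)*VDS - VDS^2/2] for the weighted device and the same expression without the weight voltage Vw for the reference device, so that their difference is beta*Vw*VDS. The function names and the numerical values are hypothetical; the inner product is modeled simply as the sum of the difference currents of several cells.

```python
import math


def cell_current(beta, vgs, vt, vds, vw=0.0):
    """Triode current of one device; vw is the stored weight voltage."""
    return beta * ((vgs + vw - vt) * vds - vds ** 2 / 2.0)


def difference_current(beta, vgs, vt, vds, vw):
    """Weighted minus reference current: the quadratic terms cancel."""
    return cell_current(beta, vgs, vt, vds, vw) - cell_current(beta, vgs, vt, vds)


def inner_product_current(beta, weights_v, inputs_v, vgs=2.0, vt=0.8):
    """Sum of cell difference currents, as collected at a summing node."""
    return sum(difference_current(beta, vgs, vt, x, w)
               for w, x in zip(weights_v, inputs_v))


BETA = 50e-6
weights = [0.3, -0.2, 0.1]     # weight voltages (assumed)
inputs = [0.1, 0.2, -0.1]      # input (drain-source) voltages (assumed)
ideal = BETA * sum(w * x for w, x in zip(weights, inputs))
print(inner_product_current(BETA, weights, inputs), ideal)
```

Each cell contributes a current proportional to the product of its weight and input voltages, so summing the currents yields the inner product directly.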
The TRIT backpropagation IC consists of this analog 
EEPROM weight matrix surrounded by current conveyors. 
3.4.2 Weight Adjustment Circuit 
The weight adjustment circuitry is shown in Figures 27 and
28. The circuit implements the TRIT algorithm based on the
values of the delta and output vectors. The comparator CH1
switches from the high to the low state if the input delta
value exceeds ε2. CH2 switches from high to low if the value
of the delta vector becomes less than ε2. The switching of
either comparator results in a high state latched in the
latch (N9 and N10). The complemented output of the latches
is fed to the transmission gate, which is clocked. The latch
states are ORed, clocked and fed to a NOR gate. The other
input of the NOR gate is the strobe (STR) control signal. A high


















Figure 28. Weight Adjustment Circuitry-2
being high. 
Similar actions take place for the output of comparator
CH2, which results in the DEC signal being high if the delta
vector is less than ε2. Similar logic can be implemented
for the output (O) vectors.
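The decision logic of the weight adjustment circuit can be sketched behaviorally. The code below is an interpretation, not the thesis's circuit: it assumes symmetric thresholds (an increment when a signal exceeds +ε and a decrement when it falls below -ε), applies ε2 to the delta and ε1 to the output, uses a fixed step size, and clips the weight symmetrically at a limit of 3 in the spirit of the clipping fragment in the appendix listing. The names `trit` and `trit_update` are hypothetical.

```python
def trit(x, eps):
    """Three-level quantizer: +1 above eps, -1 below -eps, otherwise 0."""
    if x > eps:
        return 1
    if x < -eps:
        return -1
    return 0


def trit_update(w, delta, out, eps1, eps2, step, w_max=3.0):
    """One weight update with a predetermined step and symmetric clipping."""
    w_new = w + step * trit(delta, eps2) * trit(out, eps1)
    return max(-w_max, min(w_max, w_new))


w = 0.5
for delta, out in [(0.4, 0.9), (-0.3, 0.8), (0.05, 0.9), (0.6, -0.7)]:
    w = trit_update(w, delta, out, eps1=0.1, eps2=0.1, step=0.02)
    print(f"delta = {delta:+.2f}, out = {out:+.2f} -> w = {w:.2f}")
```

Because only the sign of the thresholded signals matters, the hardware needs comparators and latches rather than an analog multiplier for the update.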
3.5 Sample and Hold (S/H) circuit 
3.5.1 Introduction 
The dynamic current copier (also called the current
self-calibrating circuit or the dynamic current mirror) is
used. The gate capacitor of the MOS device is used to store
the information for a short period of time, since the gate
of a MOS device has practically infinite input impedance.
Figure 29 shows the basic N-copier cell. To sample the input
current, switches S1 and S2 are closed. The gate capacitor
CGS of M1 will charge to the voltage VGS required by the
transistor to achieve the drain current I0. If M1 is in
saturation, the gate voltage is given by:
(30)
Switches S1 and S2 are then opened in succession. The
circuit goes into the hold phase and stores the current
information as the capacitor voltage on the gate of M1.
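The sample and hold phases can be mimicked with a few lines of code. Since the expression for the gate voltage is not legible in this copy, the sketch below assumes the textbook saturation square law I = (beta/2)*(VGS - VT)^2, which gives VGS = VT + sqrt(2*I0/beta); the thesis may use a different convention for beta. The device values are hypothetical.

```python
import math


def sample_gate_voltage(i0, beta, vt):
    """Gate voltage that M1 settles to while sampling the current i0."""
    return vt + math.sqrt(2.0 * i0 / beta)


def held_current(vgs, beta, vt):
    """Current delivered in the hold phase for a stored gate voltage."""
    overdrive = max(0.0, vgs - vt)
    return 0.5 * beta * overdrive ** 2


BETA, VT, I_IN = 50e-6, 0.8, 20e-6        # assumed values
v_store = sample_gate_voltage(I_IN, BETA, VT)
print(f"stored VGS = {v_store:.3f} V")
print(f"held current = {held_current(v_store, BETA, VT) * 1e6:.2f} uA")
# A 5 mV disturbance on the stored voltage shows up as a current error.
print(f"with +5 mV error = {held_current(v_store + 5e-3, BETA, VT) * 1e6:.2f} uA")
```

Because the same transistor samples and reproduces the current, its threshold and beta cancel in the round trip, which is why the copier cells need not be accurately matched.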
Figure 29. Basic Sample and Hold Circuit
Since the gate voltage of M1 is coupled to transistors M2
and M3, an equivalent current can sink through M2 in the
hold phase. The P-copier cell can be obtained by replacing
the NMOS with a PMOS transistor and reversing the direction
of the currents. In that case, the cell will source I0 when
connected to the load. The copier cells need not be
accurately matched. An error current (ΔI) is present due
to: (1) charge sharing between the gate capacitor CGS and
the switch capacitor CGSSW, (2) the channel length
modulation parameter, and (3) junction leakage associated
with S1, causing a steady discharge of the storage capacitor.
The minimum dimension switches, MSP and MSN, are used to
reset the gate voltage or hold capacitors. A dummy switch
can be added in series with the switching transistor to
further minimize the effect of charge sharing. The channel
length modulation error is reduced by cascoding the current
sampling and holding transistors. Dynamic biasing of these
cascode transistors gives improved cascoding and an improved
current transfer ratio. MeN, MrPA, MIPs, MRN are used for
the dynamic biasing circuitry.
3.5.2 Errors in S/H circuit 
3.5.2.1 Charge Injection Error 
The switching transistor is made conductive by mobile
carriers that are attracted into the channel by the gate
voltage when the switch closes. For charge equilibrium, the
total charge of the mobile carriers in the channel must be
equal to the total charge stored on the gate. The charge is
stored on the gate in strong inversion. In the N-copier,
when switch S1 opens, a fraction α of the channel charge q
is dumped on the capacitor CGSN, which causes an error in
the stored voltage. This voltage error (ΔV) in turn creates
a relative error in the output current of the copier. ΔV can
be decreased by making the switch gate oxide capacitance a
small percentage of CGSN, where one limit is given by the
area of CGSN. It can also be decreased by reducing the total
charge q in the channel, which in turn reduces the portion
αq that flows onto CGSN. This can be achieved by minimizing
the gate area W x L and/or by controlling the gate voltages
of the switch or increasing VGSN. Similar treatment applies
to the P-copier for determination of the error due to charge
injection. The factor α determines the amount of charge that
is dumped on the source.
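The dependence of the charge injection error on switch area and storage capacitance can be estimated as follows. This is a first-order sketch under stated assumptions: the channel charge is taken as W*L*Cox*(VG - Vnode - VT), the voltage error as alpha*q/C, and the relative current error as 2*dV/(VGS - VT) from the saturation square law. None of the numerical values come from the thesis.

```python
def channel_charge(w_m, l_m, cox_f_per_m2, vg_on, v_node, vt):
    """Mobile charge in the switch channel while it is conducting."""
    return w_m * l_m * cox_f_per_m2 * (vg_on - v_node - vt)


def injection_error_v(alpha, q, c_store_f):
    """Voltage step on the storage capacitor when the switch opens."""
    return alpha * q / c_store_f


def relative_current_error(dv, overdrive_v):
    """Square-law sensitivity: dI/I is about 2*dV / (VGS - VT)."""
    return 2.0 * dv / overdrive_v


COX = 1.7e-3                      # F/m^2, assumed (roughly 20 nm oxide)
q = channel_charge(3e-6, 2e-6, COX, vg_on=5.0, v_node=1.5, vt=0.8)
dv = injection_error_v(alpha=0.5, q=q, c_store_f=1e-12)
print(f"q = {q * 1e15:.1f} fC, dV = {dv * 1e3:.1f} mV, "
      f"dI/I = {100 * relative_current_error(dv, 0.7):.1f} %")
```

Halving the switch area halves the error, and a larger storage capacitor or a larger overdrive voltage of the storage transistor reduces the relative current error, in line with the remedies listed above.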
3.5.2.2 Switch Feedthrough Error
This contribution is due to the clock voltage that is
coupled to the gate via CGD. The coupled voltage is




where Vφ2 is the clock voltage and CGD is the gate to drain
capacitance of the transmission gate. The change in the
gate voltage multiplied by the transconductance gives the
error in the drain current. This error is reduced by
connecting a dummy transistor in series with the switching
transistor [47].
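The feedthrough expression itself is not legible in this copy. As an illustration only, the sketch below assumes the usual capacitive divider, in which the clock swing is attenuated by CGD / (CGD + C_store) onto the storage node, and then converts the voltage error to a current error through the transconductance as the text describes. The divider form and all values are assumptions.

```python
import math


def feedthrough_error_v(v_clock, c_gd_f, c_store_f):
    """Clock step coupled onto the storage node through a capacitive divider."""
    return v_clock * c_gd_f / (c_gd_f + c_store_f)


def feedthrough_error_i(dv, gm_s):
    """Drain current error: gate voltage change times transconductance."""
    return gm_s * dv


dv = feedthrough_error_v(v_clock=5.0, c_gd_f=1e-15, c_store_f=999e-15)
di = feedthrough_error_i(dv, gm_s=100e-6)
print(f"dV = {dv * 1e3:.2f} mV, dI = {di * 1e9:.1f} nA")
```

A smaller overlap capacitance or a larger storage capacitor reduces the coupled step; the dummy transistor mentioned above works by injecting an opposing step.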
3.5.2.3 Cascode Configurations
Channel length modulation produces a change in the drain
current as the drain to source voltage changes. The λ
effect is reduced by cascoding the transistors using a
regulated cascode structure as shown in Figure 30.
3.5.3 Simulations 
The transient simulation of the sample and hold circuit 
is shown in Figure 31. The output current is out of phase 
with the input current. The output current is the sampled 
and stored value of the input current. 
Figure 30. Sample and Hold Circuit
[Figure 31 plot: input current and load current versus time; time axis 0 to 300 us, current axis -60 to 60 uA]
Figure 31. Transient Characteristics of S/H circuit
CHAPTER IV 
CONCLUSIONS AND FUTURE PROSPECTS 
The design of the basic building blocks for the TRIT
algorithm in the (TSOS) process is completed. A single
weight matrix is used in both the forward and
backpropagation modes, reducing the area by a factor of two.
A Matlab program which partially simulates the transistor
mismatches in this architecture was also developed. The TRIT
program with the transistor imperfections demonstrates
faster convergence than BP and insensitivity to MOS
parameter variation.
The system level integration of the TRIT model will
require the exact specification of all the system
parameters. The optimal values of the learning rate, ε1, and
ε2 should be investigated. The forward propagation parameters
like ISAT, Im, V0 and the backpropagation parameters like
ε's, Rt and ~ should be specified at the system level.
The fabricated blocks have to be tested thoroughly to
verify their effectiveness and then refined further. The
high power current conveying transistors are very large. The
derivative circuit can be further improved.
Floating gate memories provide the best answer to
electrically programmable/erasable non-volatile
semiconductor memories. Reduction in cell size, improvement
in performance, and higher circuit density will be the
products of floating gate memory research, so future
developments in FGA memories have to be followed closely and
adopted for this design.
For effective learning, local or on-chip storage and
modification of the weights is the preferred solution. The
task of weight updates is complex since it involves issues
related to high voltage, the learning algorithm, and weight
storage. Precise control of the weights needs extensive
experimentation to mathematically model and understand the
programming and erasing behavior of floating gate memories.
The on-chip generation of high voltage poses an
additional challenge. However, the tunneling physics and
high voltage pulse generation are two separate issues;
initially they should be handled separately for conceptual
testing and understanding, and then combined. Other issues
relating to the weight matrix are cell layout, placement,
and signal routing. Cell layout will have a direct impact on
both the silicon area and the cell performance. Significant
expertise is required to arrive at the optimal design. A
suitable signal routing scheme is required since the weight
matrix is expected to be dominated by routing wires. In this
regard, high voltage concerns such as field threshold,
reverse breakdown, etc. need special attention.
Process maturity will play a very important role
in TRIT design. Also, better analog simulation tools which
represent the transistor more exactly have to be used to
further improve the design efficiency.
BIBLIOGRAPHY 
1. Rumelhart, D.E., Hinton, G.E., & Williams, R.J. (1986).
Learning internal representations by error
propagation, Parallel Distributed Processing,
Cambridge, MA, MIT Press, pp. 318-362
2. Richard P. Lippmann, "An Introduction to Computing with
Neural Nets," IEEE ASSP Magazine, April 1987,
pp. 4-22
3. Derek B.I. Feltham and Wojciech Maly, "Physically
Realistic Fault Models for Analog CMOS Neural
Networks," IEEE Transactions on Neural Networks,
1991, pp. 1223-1230
4. Bernhard E. Boser, Eduard Sackinger, Yann Le Cun,
Lawrence D. Jackel, "An Analog Neural Network
Processor with Programmable Topology," IEEE
Journal of Solid-State Circuits, December 1991,
pp. 2017-2024
5. Karl Goser, Ulrich Hilleringmann, Ulrich Ruekert,
and Klaus Schumacher (1989). VLSI Technologies
for Artificial Neural Networks, Dec 1989,
pp. 28-43
6. Shoemaker, P.A., Shimabukuro, R., and Michael J.
Carlin (1991), "Back Propagation Learning with
Trinary Quantization of Weight Updates," Neural
Networks, Vol. 4, pp. 231-241
7. K.A. Boahen, R.E. Jenkins et al., "A Heteroassociative
Memory Using Current-Mode MOS Analog VLSI
Circuits," IEEE Transactions on Circuits and
Systems, 1989, vol. 36, pp. 747-755
8. S.W. Tsay and R.W. Newcomb, "VLSI Implementation of ART1
Memories," IEEE Transactions on Neural Networks,
vol. 2, 1991, pp. 214-221
9. A.F. Murray, D. Del Corso, and L. Tarassenko,
"Pulse-stream VLSI Neural Networks Mixing Analog and
Digital Techniques," IEEE Transactions on Circuits
and Systems, vol. 36, 1989, pp. 193-204
10. B. Linares, E. Sanchez, A. Rodriguez, and J.L. Huertas,
"Modular Analog Continuous-Time VLSI Neural
Networks with On-Chip Hebbian Learning and Analog
Storage," IEEE Transactions on Neural Networks,
1992, pp. 1533-1536
11. Simon Y. Foo, Lisa R. Anderson, Yoshigasu Takefuji,
"Analog Components for the VLSI of Neural
Networks," Circuits and Systems, 1990, pp. 18-25
12. C. Toumazou, F. J. Lidgey, and D. G. Haigh, Analogue IC
Design: The Current-Mode Approach, Eds.,
Peregrinus, London, 1990
13. Christian Schneider and Howard Card, "CMOS
Implementation of Analog Hebbian Synaptic Learning
Circuits," IEEE Transactions on Neural Networks,
1991, pp. I437-I442
14. Robert C. Frye, Edward A. Rietman, and Chee C.
Wong, "Back-Propagation Learning and Nonidealities
in Analog Neural Network Hardware," IEEE
Transactions on Neural Networks, 1991, pp. 110-117
15. S. Espejo, A. Rodriguez et al., "Switched-current
Techniques for Image Processing Cellular
Neural Networks in MOS VLSI," IEEE Transactions on
Neural Networks, 1992, pp. 1537-1540
16. Jackel, L.D., Graf, H.P., and Howard, R.E., "Electronic
Neural Network Chips," Applied Optics, 1987,
pp. 5077-5080
17. Alspector, J., Allen, R.B., Hu, V., & Satyanarayana, S.
(1988), "Stochastic learning networks and their
implementation," Proceedings of the IEEE
Conference on Neural Information Processing
Systems, pp. 9-21
18. Furman, B., and Abidi, A., "CMOS Analog IC implementing
the back propagation algorithm," First Annual
Meeting, (Abstract) Neural Networks, 1988, pp. 38
19. Shoemaker, P.A., and Shimabukuro, R. (1988), "A
modifiable weight circuit for use in adaptive
neuromorphic networks," Neural Networks, 1, sup.
1, 409
20. Hu, V., Kramer, A., and Ko, P.K., "EEPROM's as analog
storage devices for neural nets," Neural
Networks, 1988, pp. 385
21. Shimabukuro, R.L., Shoemaker, P.A., and Stewart,
M., "Circuitry for artificial neural networks with
non-volatile analog memories," Proceedings of the
IEEE Symposium on Circuits and Systems, 1989,
pp. 1217-1220
22. Holler, M., Tam, S., Castro, H., and Benson, R., "An
electrically trainable artificial neural network
(ETANN) with 10240 "floating gate"
synapses," Proceedings of the IEEE Joint
Conference on Neural Networks, 1989, pp. 177-182
23. Alan F. Murray and Anthony Smith, "Asynchronous VLSI
Neural Networks using Pulse-Stream Arithmetic,"
IEEE Transactions on Neural Networks, 1988,
pp. 688-697
24. Peterson, C., & Hartman, E. (1989), "Exploration of the
mean field theory learning algorithm," Neural
Networks, 2, pp. 475-494
25. M. Marchesi, G. Orlandi et al., "Multi-layer Perceptrons
with Discrete Weights," Proc. IEEE ISCAS 1990, New
Orleans, pp. 623-629
26. Bernd Hofflinger, Stefan Neuber et al., "VLSI
Implementation of a Neural Car Collision Avoidance
Controller," IEEE Transactions on Neural Networks,
191, pp. I493-I499
27. C.A. Mead, Analog VLSI and Neural Systems, Addison
Wesley Publishing Co. Inc., 1989
28. Martin Hagan (1992), Back Propagation Class Notes
on Neural Networks (ECEN 5050.3), 1991
29. Kurosh Madani, Patrick Gadra, Eric Belhaire, and
Francis Devos, "Two Analog Counters for Neural
Network Implementation," IEEE Transactions on
Neural Networks, 1991, pp. 966-973
30. Paul Hasler and Lex Akers, "Circuit Implementation of
Trainable Neural Networks Employing both
Supervised and Unsupervised Techniques," IEEE
Transactions on Neural Networks, 1992,
pp. 1565-1568
31. W.R. Smith, "Trinary Back-Propagation Simulation with
Component Nonidealities," NOSC Newsletter
32. Randy L. Shimabukuro, Pat Shoemaker et al., "Effects
of Circuit Parameters on Convergence of Trinary
Update Back-Propagation"
33. Daniel B. Schwartz, Richard E. Howard, and Wayne E.
Hubbard, "A Programmable Analog Neural Network
Chip," IEEE Transactions on Neural Networks, 1989,
pp. 313-319
34. J. Raffel, J. Mann et al., "A generic architecture for
wafer-scale neuromorphic systems," IEEE Conference
of Neural Networks, Vol. 3, 1987, p. 501
35. P. A. Shoemaker, C. G. Hutchens, and S. B. Patil, "A
Hierarchical Clustering Network Based on a Model
of Olfactory Processing," Submitted, 1992
36. Aria Nostrinia, M. Ahmadi, M. Sridhar, G.A. Julien, "A
Hybrid Architecture for Multi-layer Neural
Networks," IEEE Transactions on Neural Networks,
1992, pp. 1541-1544
37. T.H. Borgstrom, M. Ismail, and S.B. Bibyk, "Programmable
current mode network for implementation in
analogue VLSI," IEEE Proceedings on Neural
Networks, 1990, pp. 75-84
38. B. Hochet, V. Peiris, S. Abdo et al., "Implementation of
a learning Kohonen neuron based on a new
multilevel storage technique," IEEE Journal of
Solid-State Circuits, 1989, pp. 262-267
39. P. Mueller et al., "A general purpose analog neural
computer," IEEE Second International Conference on
Neural Networks, 1988, pp. 177-182
40. H.P. Graf and L.D. Jackel, "Analog Electronic Neural
Network Circuits," IEEE Circuits and Devices
Magazine, July 1989, pp. 44-49
41. E.A. Vittoz, "Analog VLSI Implementation of Neural
Network," Proceedings of International Symposium
on Circuits and Systems, 1990, pp. 2524-2527
42. F.M.A. Choi et al., "An All-MOS Analog Feedforward
Neural Circuit with Learning," Proceedings of
International Symposium on Circuits and Systems,
1990, pp. 2508-2511
43. R. L. Shimabukuro and P. A. Shoemaker, "Circuitry for
Artificial Neural Networks with Nonvolatile Analog
Memories," Proceedings, IEEE International
Symposium on Circuits and Systems, pp. 1217-1220,
1989
44. Y. Tsividis and S. Satyanarayana, "Analog Circuits
for Variable-Synapse Electronic Neural Networks,"
Electronics Letters, Vol. 23, No. 24,
pp. 1313-1314, November 1987
45. L. R. Carley, "Trimming Analog Circuits Using
Floating-Gate Analog MOS Memory," Circuits, Vol.
24, No. 6, pp. 1569-1575, December 1989
46. B. W. Lee, B. J. Sheu, and H. Yang, "Analog
Floating-Gate Synapses for General-Purpose VLSI Neural
Network Computation," IEEE Transactions on Circuits
and Systems, Vol. 38, No. 6, June 1991
47. S.B. Patil, "VLSI Design of Olfactory Network,"
Master's Thesis, Oklahoma State University,
1992
48. C. Toumazou, J. Lidgey, and D. Haigh, "Introduction,"
Ch. 1 in Analogue IC Design: The Current-Mode
Approach, C. Toumazou, F. J. Lidgey, and D.G.
Haigh, Eds., Peregrinus, London, 1990
49. K.C. Smith and A.S. Sedra, "The current conveyor - a new
circuit building block," Proceedings of IEEE, vol.
56, pp. 1368-1369, Aug 1968
50. A.S. Sedra and K.C. Smith, "A second-generation current
conveyor and its applications," IEEE Transactions
on Circuit Theory, Vol. CT-17, pp. 132-134, Feb
1970
51. A.S. Sedra, "A new approach to active network
synthesis," Ph.D. Thesis, University of
Toronto, 1969
52. A. S. Sedra and G. W. Roberts, "Current Conveyor
Theory and Practice," Ch. 3 in Analogue IC Design:
The Current-Mode Approach, C. Toumazou, F. J.
Lidgey, and D. G. Haigh, Eds., Peregrinus, London,
1990
53. Paul R. Gray and Robert G. Meyer, Analysis and Design of
Integrated Circuits (2nd Edition), Wiley, NY, 1984
54. S. B. Patil and C. G. Hutchens, "A Novel Squashing
Function for Electronic Implementation of Neural
Networks," 5th Oklahoma Symposium of Artificial
Intelligence, 1991
55. Y. Tsividis, Operation and Modeling of MOS Transistor,
56. P. E. Allen and D. R. Holberg, "Two Stage
Comparators," Ch. 7 in CMOS Analog Circuit Design,
HRW Inc., 1987
APPENDIX A 
STARTUP PROGRAM 
% program to do initial calculations 
for i=1:63




























































































epsilon2=input('Enter the value of epsilon2:'); 






% random # of 0.02 is generated 
for n=1:64 
























% beta variation 
for n=l:l 



































































% weight & offset correction 























































































w1(n,m)=0.2;
end;
if abs(w1(n,m))>=3,





w1(n,m)=3;
end;







n1(n,m)=w1(n,k)*p(k,m)+b1(n,m);
a1(n,m)=n1(n,m);
end;









% addition of noise 
for k=1:hidden_units
n2(n,m)=w2(n,k)*a1(k,m)+b2(n,m);
a2(n,m)=n2(n,m);
end;
if n2(n,m)>1
a2(n,m)=1;
end;
if n2(n,m)<-1
a2(n,m)=-1.0;
end;

































STANDARD BP PROGRAM 
errors=[sse]; 
















% weight & offset correction 

























for n=1:hidden_units
w1(n,m)=w1(n,m)+p(m)*d1(n)*learning_rate;
end;























































ERRORS IN MULTIPLIER CIRCUIT 
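D.1 Error in multiplier
$$I_{oN} = (\beta_N + \Delta\beta)\left[\left(V_1 - (V_{TN} \pm \Delta V_T)\right)V_2 - \frac{V_2^2}{2}\right]$$
$$I_{oP} = (\beta_P + \Delta\beta)\left[\left(V_1 - (V_{TP} \pm \Delta V_T)\right)V_2 - \frac{V_2^2}{2}\right]$$
Assuming $\beta_N = \beta_P = \beta$,
$$I_o = 2\beta V_1 V_2 + 2\Delta\beta V_1 V_2 \pm 2\beta\,\Delta V_T V_2 \pm 2\Delta V_T V_2\,\Delta\beta = \text{Ideal term} + \text{ERROR}$$
where the last term can be neglected, so that
$$\text{ERROR} = 2\beta V_1 V_2\left(\frac{\Delta\beta}{\beta} \pm \frac{\Delta V_T}{V_1}\right)$$
$\Delta V_T$ error is 1%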
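D.2 Error in output [F(.)] function
Due to $\Delta V_T$ and $\Delta\beta$ errors, the output is
modified. The output current is proportional to the square
of the gate to source voltage, so the error in
transconductance due to $\Delta\beta$ and $\Delta V_T$ can be
derived as follows:
$$\Delta I = \beta (V_c - V_T)^2 - (\beta \pm \Delta\beta)\left[(V_c - V_T \pm \Delta V_T)\right]^2$$
With $\Delta V_c = V_c - V_T$,
$$\Delta I = \beta (\Delta V_c)^2 - (\beta \pm \Delta\beta)\left[(\Delta V_c \pm \Delta V_T)\right]^2$$
Simplifying to first order,
$$\Delta I = \beta\,\Delta V_c^2\left(\frac{\Delta\beta}{\beta} \pm \frac{2\,\Delta V_T}{\Delta V_c}\right)$$
If $\Delta V_c = V_T$, the $V_T$ error is 1-2%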
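The quality of the first-order simplification can be checked numerically. The sketch below assumes the square-law form I = beta*(Vc - VT)^2 used in this appendix, writes the overdrive Vc - VT as `dvc`, takes both mismatches with a positive sign, and defines the error as the mismatched current minus the ideal current, so only magnitudes should be compared with the expressions above. The numerical values are hypothetical.

```python
def exact_delta_i(beta, dvc, dbeta, dvt):
    """Exact current change for a beta error dbeta and an overdrive error dvt."""
    return (beta + dbeta) * (dvc + dvt) ** 2 - beta * dvc ** 2


def first_order_delta_i(beta, dvc, dbeta, dvt):
    """First-order estimate: beta*dvc^2 * (dbeta/beta + 2*dvt/dvc)."""
    return beta * dvc ** 2 * (dbeta / beta + 2.0 * dvt / dvc)


BETA, DVC = 50e-6, 0.8            # assumed values
DBETA, DVT = 0.01 * BETA, 0.01 * DVC   # 1 % mismatch in each parameter

exact = exact_delta_i(BETA, DVC, DBETA, DVT)
approx = first_order_delta_i(BETA, DVC, DBETA, DVT)
print(f"exact = {exact * 1e9:.1f} nA, first order = {approx * 1e9:.1f} nA")
print(f"relative current error ~ {100 * approx / (BETA * DVC ** 2):.1f} %")
```

For 1 % mismatches the two values agree to about 1 %, and a 1 % error in the threshold term contributes twice as much as a 1 % error in beta, as the factor of two in the expression indicates.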




Candidate for the Degree of 
Master of Science 
Thesis: DESIGN OF BUILDING BLOCKS FOR THE TRIT ALGORITHM 
Major Field: Electrical and Computer Engineering 
Biographical: 
Personal Data: Born in Pondicherry, India, March 20, 
1968, the son of K.N. Parthasarathy and K.P. 
Ranganayaki. 
Education: Graduated from Madras Christian College 
School, India, in May 1986; received Bachelor of 
Engineering degree in Electrical and Electronics 
Engineering from College of Engineering in May 
1990; completed requirements for the Master of 
Science degree at Oklahoma State University in 
May, 1993. 
Professional Experience: Research Assistant,
Department of Electrical Engineering, Oklahoma
State University, January, 1992, to December,
1992.
