# DESIGN OF BUILDING BLOCKS FOR TRIT ALGORITHM

By

BALAJ PARTHASARATHY

Bachelor of Engineering
College of Engineering
Guindy, India

1990

Submitted to the Faculty of the
Graduate College of the
Oklahoma State University
in partial fulfillment of
the requirements for
the Degree of
MASTER OF SCIENCE
May, 1993


## PREFACE

This thesis attempts to design the building blocks for the TRIT algorithm. PSPICE was used for simulation. The building blocks were laid out in Magic.

I would like to express my sincere gratitude to Dr. Chriswell Hutchens for his support, guidance and encouragement. I appreciate the time and effort that he spent on this project. I would also like to thank the Naval Ocean Systems Center for the facilities and funding of this project. I am thankful to Dr. Johnson and Dr. Baker for serving on my committee. I thank my research colleagues and friends in Stillwater.

I am always indebted to my good friend Gunna, and would like to thank Viswanathan, Sunil Mathews and Raja.

This work is dedicated to my parents Ranganayaki and Parthasarathy, my grandmother Rukmani, uncles Dr.S.V.Kannan and Dr.S.V.Vijayaraghavan, Mr.A.V.S.Raja and their families, my brothers and their families, and my sister Padmini Vasudevan and her family.

TABLE OF CONTENTS

Chapter

I. INTRODUCTION AND LITERATURE SURVEY
1.1 Introduction
1.2 Comparison of Analog and Digital ANN's
1.3 Comparison of Current and Voltage Mode Approaches
1.4 Learning
1.5 Review of Standard Backpropagation
1.6 Advantages
1.7 Literature Survey
1.8 Proposal for Hardware Implementation of TRIT
II. TRIT MODEL SIMULATIONS
2.1 TRIT Program Description
2.2 Standard BP Program Description
2.3 Adaptive BP Program Description
2.4 Testing Procedures
2.5 Results
2.6 Comparison of Adaptive BP and TRIT
2.7 Results
III. SYSTEM BUILDING BLOCKS
3.1 TRIT Interface Specifications
3.2 Current Conveyors
3.2.1 Introduction
3.2.2 High Power Conveyor
3.2.2.1 Design
3.2.2.2 Simulations
3.2.3 Low Power CC
3.2.3.1 Simulations
3.2.4 Opamp
3.2.4.1 Analysis of Opamp
3.2.4.2 Biasing Circuit of the Opamp
3.2.4.3 Power Circuit Design
3.2.4.4 Simulations
3.2.5 Squashing Function
3.2.5.1 Design
3.2.5.2 Simulations
3.2.5.3 Approximate Sigmoid Function
3.2.5.3.1 Simulations
3.2.6 Dynamic Cascode Biasing
3.3 Derivative Circuits
3.4 Weight Matrix
3.4.1 Floating Gate Analog Memories
3.4.2 Weight Adjustment Circuitry
3.5 Sample and Hold Circuit
3.5.1 Introduction
3.5.2 Errors in S/H Circuit
3.5.2.1 Charge Injection Error
3.5.2.2 Switch Feedthrough Error
3.5.2.3 Cascode Configurations
3.5.3 Simulations
IV. CONCLUSION AND FUTURE PROSPECTS
REFERENCES
APPENDIXES
APPENDIX A - STARTUP PROGRAM
APPENDIX B - TRINARY PROGRAM
APPENDIX C - STANDARD BP PROGRAM
APPENDIX D - ERRORS IN MULTIPLIER

## LIST OF TABLES

Table

I. Comparison of Standard BP and TRIT
II. Comparison of Adaptive BP and TRIT

## LIST OF FIGURES

Figure

1. Multi-layer Perceptrons
2. IC Signal Flow Floor Plan
3. Flowchart of the TRIT program
4. Flowchart of the TRIT program
5. Regions of updation of weight vectors
6. Current Conveyor (CC) based Weight-matrix drivers
7. Block diagram of the Current Conveyor
8. Current Conveyor symbol
9. Current Conveyor circuit
10. D.C. characteristics of High Power CC
11. A.C. response of High Power CC
12. D.C. characteristics of Low Power CC
13. Opamp circuit diagram
14. D.C. characteristics of opamp
15. A.C. characteristics of opamp
16. Transient characteristics of opamp
17. Output function F(.) circuit
18. D.C. characteristics of squashing (F(.)) function
19. Sigmoidal function (Single Quadrant)
20. Sigmoidal Nonlinearity (Two Quadrant)
21. D.C. characteristics of sigmoidal circuit
22. A.C. characteristics of sigmoidal function
23. Dynamic cascode biasing circuit
24. Derivative circuit
25. D.C. characteristics of Derivative circuit
26. Weight Cell
27. Weight adjustment circuitry-1
28. Weight adjustment circuitry-2
29. Basic Sample and Hold circuit
30. Sample and Hold (S/H) circuit
31. Transient characteristics of the S/H circuit

NOMENCLATURE

| Symbol | Description |
| :---: | :---: |
| $(W / L)_{M x}$ | Width to length ratio of subscripted MOSFET x |
| $\beta_{x}$ | Transconductance of MOSFET x |
| $\lambda$ | Channel length modulation parameter of MOSFET x |
| $A_{v x}$ | Voltage gain of subscripted operational amplifier x |
| $C_{GD}$ | Gate to drain capacitance |
| $C_{GS}$ | Gate to source capacitance |
| $C_{Gx}$ | Gate capacitance of subscripted MOSFET |
| $C_{ox}^{\prime \prime}$ | Oxide capacitance/unit area |
| $g_{ds}$ | Small signal drain to source conductance |
| $g_{mx}$ | Small signal channel transconductance of MOSFET x |
| $I_{Dx}$ | Drain current of MOSFET x |
| $M_{x}$ | Subscripted MOSFET |
| $V_{DD}$ | Positive supply voltage |
| $V_{DS}$ | Drain to source voltage of MOSFET x |
| $V_{GSx}$ | Gate to source voltage of MOSFET x |
| $V_{SS}$ | Negative supply voltage |
| $V_{Tx}$ | Threshold voltage of MOSFET x |
| $W$ | Weight matrix |
| $W^{\top}$ | Transpose of the weight matrix |

## CHAPTER I

## INTRODUCTION AND LITERATURE SURVEY

This thesis addresses the design and simulation of the building blocks for the Trinary Backpropagation (TRIT) algorithm. TRIT is a modified form of the Backpropagation (BP) [1,2] algorithm for Artificial Neural Networks (ANN). The aim of this work is an architecture for parallel on-board learning based on the TRIT algorithm. TRIT quantizes the BP algorithm by updating the weights of a Multi-Layer Perceptron (MLP) network in parallel. The weight update is not unique to each element of the weight matrix; it can take only one of three values: an increment, a decrement of the same magnitude, or zero. The reduced complexity of the weight updates results in a large saving in silicon area. In addition, the same weight matrix is used in both the forward propagation and backpropagation modes, which reduces the area by a factor of two. Because of this modification to the BP algorithm, larger layer sizes can be implemented in less area. On-board learning is the goal of this research, and it will widen the scope of ANN applications.

### 1.1 Introduction

The resurgence of interest in Artificial Neural Networks (ANN) that started in the late eighties has led to a host of new potential applications for these ANN models [3]. These Neural Network models offer great potential in the areas of speech processing, image recognition and pattern classification due to their high fault tolerance and parallel computation capability [4].

The complexity of these neural networks does not stem from the complexity of the individual components but from the multitude of ways in which a large collection of the components can interact. These network models reflect highly parallel, regular, and modular architectures that make them attractive for Very Large Scale Integrated (VLSI) systems [5]. The implementation of such models in hybrid VLSI analog/digital circuitry is one of the active current research areas [6].

The technologies used in special purpose ANN implementations are broadly classified as analog [7,8], digital or mixed analog/digital (hybrid) IC's [9], optical and electro-optical [10].

### 1.2 Comparison of analog and digital ANN's

Analog VLSI neural networks perform better than digital circuits in specific applications. Current studies [11] indicate that $10^{9}$ to $10^{11}$ interconnections can be achieved with analog circuits, a rate much higher than that of digital circuits. Analog VLSI ANN's make use of very simple building blocks which are reconfigurable and versatile. The simple building block approach shortens the design time and makes efficient use of Computer Aided Design (CAD) tools. The design of simple, well-defined analog cells that can be interconnected to achieve different linear and/or nonlinear functions is the key to the success of the analog ANN approach. This approach will bring neural net VLSI design a step closer to automation. Also, some of the traditional analog design requirements, such as accurate absolute component values, device matching and precise time constants, are often of lesser concern in neural net applications because the computational precision of individual neurons is not of paramount importance [12].

Schnieder and Card [13] have discussed the effect of low-accuracy components on the design of ANN chips. They argue that ANN's with in-situ learning, i.e. networks in which the synapses contain circuitry that performs local computation of weight updates, can adapt the weights to compensate for the component-to-component variations present in analog networks. This thesis incorporates many of the transistor imperfections in the simulations used to test the validity of our algorithm. The results are discussed in Chapter II of this thesis. Frye et al. [14] have shown that the average error becomes less than $4 \%$ in an adaptive ANN that uses hardware components with $30 \%$ variation.

Analog rather than digital VLSI has been identified as a major technology for future ANN applications. A large effort is being devoted to the ANN implementation in analog Metal Oxide Semiconductor (MOS) VLSI [5]. Efficient tools for the synthesis at both circuit and layout levels, simulation and testing of large scale analog IC's are being developed [12].

### 1.3 Comparison of Current and Voltage mode approaches

Many of the neural network functions involve current rather than voltage. The summing of many signals is readily achieved when those signals are currents. The dynamic range of the signals is greatly increased when MOS transistors are operated over the range from weak to strong inversion. This dynamic range is critical for scaled VLSI technologies, which are expected to see a reduction in supply voltages. The frequency of operation is also potentially increased due to lower impedance internal nodes and reduced full scale swing [12]. Reduced power consumption and increased speed of operation are the other inherent advantages of the analog current-mode approach [15].

For the reasons outlined above, we make full use of current-mode processing in analog VLSI for the implementation of the TRIT algorithm.

### 1.4 Learning

ANN models include Hopfield, Hamming, Single layer perceptrons, Multilayer perceptrons (MLP), Grossberg and Carpenter, Boltzmann machines, Kohonen Self-organizing maps, Bidirectional associative memories, and Neocognitrons. A discussion of all these types of networks is beyond the scope of this thesis; only BP and MLP's are reviewed here.

Backpropagation (BP) is a supervised learning scheme used in multi-layer perceptrons (feed-forward networks). The backpropagation networks are very attractive in applications such as pattern and speech recognition, waveform classification, etc.

It has been shown that multi-layer perceptrons (MLP) can approximate any function of interest to any degree of accuracy. The multilayer perceptron network is shown in Figure 1. MLP's are an important and popular subclass of ANN's. They have simple dynamics because of the absence of feedback paths. The simple dynamics ensure the stability of multi-layer perceptrons. Also, the existence of powerful learning and adaptation algorithms for these networks makes them very attractive from the engineering perspective.

Figure 1. Multi Layer Perceptrons (MLP)

The learning capability of ANN's is one of the most intriguing and challenging areas in theoretical neuroscience. Some researchers [16] have used fixed interconnection weights between processing units to implement learning in ANN's using the various algorithms discussed before. This, however, limits the application of the network. There have been several attempts [17-22] to address the problem of modifiable weight circuitry. Learning historically requires connectionist elements to have a considerable amount of circuitry and, hence, a large amount of silicon area in addition to high inaccuracy [11]. On-chip learning procedures have been reported by several authors [17,18]. Furman et al. [18] used a dynamic memory cell and circuitry for weight modification and storage to implement the BP algorithm. They attempted analog storage in digital fashion by storing graded charges on a capacitor. The value of the charge represents the weight value, which complicates the whole circuitry. Alspector [17] implemented a digital/analog weight stochastic learning network. In Alspector's work, the weights are subjected to a fixed increment or decrement at each step of the learning process.

### 1.5 Review of Standard Backpropagation [27,28]

Standard BP can be represented as

$$
\begin{align*}
O_{i} & =I_{i} & & \text { (Unit i an input unit) } \\
& =F\left(\sum_{j} W_{i j} O_{j}+b_{i}\right) & & \text { (otherwise) } \tag{1}
\end{align*}
$$

where $I_{i}$ is the external input to the unit $i, W_{i j}$ is the weight associated with the interconnection from the $j^{\text {th }}$ processing unit in the network to the $i^{\text {th }}$ unit, $b_{i}$ is the bias or the offset term, and $F$ is the sigmoidal activation function for the hidden and output units. Propagation in the forward direction can be represented by the following equations

$$
\begin{align*}
\bar{a}^{0} & =p \\
\bar{a}^{k+1} & =\bar{F}^{k+1}\left(W^{k+1} \bar{a}^{k}+\bar{b}^{k+1}\right) \quad k=0,1, \ldots, M-1 \tag{2}\\
\bar{a} & =\bar{a}^{M}
\end{align*}
$$

while the backpropagation of the error can be represented as

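$$
\begin{align*}
\bar{\delta}^{M} & =-F^{M^{\prime}}\left(\bar{n}^{M}\right)(\bar{t}-\bar{a}) \\
\bar{\delta}^{k} & =F^{k^{\prime}}\left(\bar{n}^{k}\right)\left(W^{k+1}\right)^{T} \bar{\delta}^{k+1} \quad k=M-1, M-2, \ldots, 1 \tag{3}
\end{align*}
$$

where $\bar{n}^{k}=W^{k} \bar{a}^{k-1}+\bar{b}^{k}$ is the net input to layer $k$ and $\bar{t}$ is the target vector.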

Weights and offsets are changed according to

$$
\begin{align*}
& \Delta W^{k}=-\alpha \bar{\delta}^{k}\left(\bar{a}^{k-1}\right)^{T} \quad k=1,2, \ldots, M \\
& \Delta \bar{b}^{k}=-\alpha \bar{\delta}^{k} \tag{4}
\end{align*}
$$
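The forward pass, the backward pass and the weight change above can be sketched in a few lines of Python. This is only an illustrative sketch, not the thesis program: it assumes a logistic sigmoid for $F$, stores each weight matrix as a list of rows, and uses the hypothetical names `forward` and `backprop_step`. The input to layer $k$ is taken to be the output of layer $k-1$.

```python
import math

def sigmoid(x):
    # Assumed activation: logistic sigmoid (the thesis only says "sigmoidal").
    return 1.0 / (1.0 + math.exp(-x))

def forward(weights, biases, p):
    """Return the list of layer outputs [a0, a1, ..., aM] for input p."""
    acts = [list(p)]
    for W, b in zip(weights, biases):
        a_prev = acts[-1]
        n = [sum(w * a for w, a in zip(row, a_prev)) + b_i
             for row, b_i in zip(W, b)]
        acts.append([sigmoid(v) for v in n])
    return acts

def backprop_step(weights, biases, p, t, alpha):
    """Apply one gradient-descent update in place.

    Returns the sum-squared output error measured before the update.
    """
    acts = forward(weights, biases, p)
    M = len(weights)
    out = acts[M]
    sse = sum((ti - ai) ** 2 for ti, ai in zip(t, out))
    # Output-layer delta: -F'(n)(t - a), with F'(n) = a(1 - a) for the logistic.
    delta = [-(ai * (1.0 - ai)) * (ti - ai) for ti, ai in zip(t, out)]
    for k in range(M, 0, -1):
        W, b, a_prev = weights[k - 1], biases[k - 1], acts[k - 1]
        # Propagate delta through the transpose of W before W is modified.
        back = [sum(W[i][j] * delta[i] for i in range(len(W)))
                for j in range(len(a_prev))]
        for i, row in enumerate(W):
            for j in range(len(row)):
                row[j] -= alpha * delta[i] * a_prev[j]
            b[i] -= alpha * delta[i]
        # Delta for the layer below (unused once k reaches 1).
        delta = [aj * (1.0 - aj) * s for aj, s in zip(a_prev, back)]
    return sse
```

Calling `backprop_step` repeatedly on the same pattern with a small `alpha` should give a decreasing error, which is a quick sanity check on the signs used in the update.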

The BP algorithm tends to converge very slowly. Also, the incremental changes in the delta (error propagating) and output vectors near convergence are extremely small. The weight and bias changes are proportional to the error. As the error becomes very small, so does the weight change, as the circuit approaches a stable weight configuration. Depending on the noise figure of the analog process, the incremental changes can be too small to be implemented practically in analog hardware. The lower bound on the convergence error is set by the limitations of the analog hardware. The signal to noise ratio (S/N) expected for our circuit is 60 dB. If the network fails to converge within this range of signal changes, the noise in the circuit takes over. The final circuit state then becomes dependent on the noise present in the circuit.

The potential difficulties associated with computing and imposing graded weight updates in parallel in analog hardware have led researchers to investigate better and easier methods of parallel learning in which weight changes are coarsely quantized [6]. Small modifications of learning procedures can considerably enhance the computational power of neural networks and can make practical implementation of such networks easier [23]. Peterson \& Hartman [24] examined the effect of update quantization into two states (increment or decrement) on the performance of a mean field theory learning algorithm. Alspector et al. [17] implemented a hybrid digital/analog circuit in which weights are subjected to fixed increments or decrements per step of the learning process. M. Marchesi studied the effect of restricting the weights in multi-layer perceptrons to powers-of-two or sums of powers-of-two. A learning procedure based on backpropagation was used for a neural network with these discretized weights [25].

Shoemaker [6] proposed a modified Sgn-Sgn or trinary learning algorithm, which forces the same weight update increment for all elements in the network, for efficient implementation of backpropagation in electronic perceptrons. It will henceforth be referred to as the TRIT algorithm. Several electronic neural network solutions have been offered to date, but none with parallel on-board TRIT learning. The importance of on-board learning has been demonstrated by Frye, Wong et al. [14]. They argue that the performance of the hardware when it learns by simulation is much poorer than that obtained by learning on the network itself.

In our architecture, charge is stored on an electrically isolated floating gate of a MOS device [6], and this charge represents the weight value. However, precise control of increments of charge, and hence of changes in weights, is difficult in existing technology because charge tunneling in the gate of a floating MOS device is difficult to control [6] due to extreme nonlinearities. If a unique and small weight change has to be accomplished on each element of the weight matrix, very large and complex circuitry is required even for small layers. The routing complexities related to the high voltage problem and to individual program control also necessitate a large area of silicon. Therefore it is advantageous from an implementation perspective to increment or decrement all the elements of the weight matrix in parallel across an entire network [6], or at the very least a complete row or column simultaneously.

Trinary Backpropagation (TRIT) is a simple variant of the classical BP algorithm which makes the practical implementation of on-board chip learning feasible. In this algorithm, the weight changes are assumed to be only one of the three values: an increment, a decrement of the same magnitude, or zero.

The TRIT algorithm allows a parallel implementation of learning rules with coarsely quantized parameter changes in analog integrated circuitry [6]. Conceptually, the implementation of the trinary algorithm is relatively easy and eliminates the need for complicated circuitry for weight updates [6]. However, the dynamic range of the weight updating will be limited in practice by the lack of local control at the elemental weight cell.

Trinary Backpropagation uses the same forward and backward equations as standard BP. It differs from regular BP in the weight and bias modification performed after the values of the deltas are calculated:

$$
\begin{align*}
\Delta W_{i j} & =\eta \operatorname{sgn}\left(\delta_{i} O_{j}\right) & & \left(\left|O_{j}\right| \geq \epsilon_{1} \text { and }\left|\delta_{i}\right| \geq \epsilon_{2}\right) \\
& =0 & & \left(\left|O_{j}\right|<\epsilon_{1} \text { or }\left|\delta_{i}\right|<\epsilon_{2}\right) \\
\Delta b_{i} & =\eta \operatorname{sgn}\left(\delta_{i}\right) & & \left(\left|\delta_{i}\right| \geq \epsilon_{2}\right) \tag{5}\\
& =0 & & \left(\left|\delta_{i}\right|<\epsilon_{2}\right)
\end{align*}
$$

where $\eta, \epsilon_{1}, \epsilon_{2}$ are positive constants. For constant $\eta$, the learning process corresponds to motion on a lattice in weight/bias space, in a direction of decreasing sum-square error for each training pattern pair, although not generally in the direction of steepest descent. $\epsilon_{1}$ and $\epsilon_{2}$ are current or voltage programmable constants.
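The trinary update of Equation (5) can be illustrated with a short Python sketch. It assumes that each weight step is $\eta \operatorname{sgn}(\delta_i) \operatorname{sgn}(O_j)$, in keeping with the three-valued update described in the text, and it keeps the sign convention of Equation (5) as written. The function name `trit_update` and the list-of-rows weight layout are hypothetical.

```python
def sgn(x):
    # Returns -1, 0 or +1.
    return (x > 0) - (x < 0)

def trit_update(W, b, delta, o_prev, eta, eps1, eps2):
    """Apply one trinary update in place.

    W[i][j] changes by +eta, -eta or 0; b[i] changes by +eta, -eta or 0.
    eps1 thresholds the previous-layer outputs, eps2 thresholds the deltas.
    """
    for i, d in enumerate(delta):
        if abs(d) < eps2:
            continue  # row i: neither weights nor bias are modified
        b[i] += eta * sgn(d)
        for j, o in enumerate(o_prev):
            if abs(o) >= eps1:
                W[i][j] += eta * sgn(d) * sgn(o)

# Hypothetical values chosen only to exercise each branch.
W = [[0.5, 0.5], [0.5, 0.5]]
b = [0.5, 0.5]
trit_update(W, b, delta=[0.3, 0.01], o_prev=[-0.8, 0.02],
            eta=0.1, eps1=0.05, eps2=0.1)
```

In this example only `W[0][0]` and `b[0]` move: the second delta and the second output both fall below their thresholds, so the corresponding elements are left unchanged.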

The derivative $F^{\prime}\left(s_{j}\right)$ is implemented in a piecewise fashion as follows:

$$
\begin{align*}
F^{\prime}\left(s_{j}\right) & =R_{L} & & \text { for } F\left(s_{j}\right)<V_{f} \\
& =R_{H} & & \text { for } F\left(s_{j}\right)>V_{f} \tag{6}
\end{align*}
$$

The programming circuitry will conceptually consist of a three position switch:

1) One position allows application of a programming current or voltage pulse to the weight circuit which would increment the stored charge by a discrete amount. This is equivalent to a fixed positive increment in the weight value.
2) A second position allows decrementing the stored charge by tunneling off charge. This is equivalent to a fixed discrete decrement of the weight value.
3) The third and final position leaves the circuit open and prevents any program modification of the stored charge. This is equivalent to zero change or no change in weight matrix.
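The three switch positions listed above can be modelled as an enumeration, with the position chosen from a delta value and the previous-layer output. This is a behavioral sketch only; the names `SwitchPosition` and `select_position` are hypothetical, and the thresholds `eps1` and `eps2` are assumed to be positive, as stated for Equation (5).

```python
from enum import Enum

class SwitchPosition(Enum):
    INCREMENT = 1    # apply a programming pulse: fixed positive weight step
    DECREMENT = -1   # tunnel charge off: fixed negative weight step
    HOLD = 0         # open circuit: stored charge is left unchanged

def select_position(delta_i, o_j, eps1, eps2):
    """Pick the switch position for weight W_ij."""
    if abs(delta_i) < eps2 or abs(o_j) < eps1:
        return SwitchPosition.HOLD
    if delta_i * o_j > 0:
        return SwitchPosition.INCREMENT
    return SwitchPosition.DECREMENT
```

Because the decision depends only on two threshold comparisons and a sign, it needs no graded (multi-level) control signal at the weight cell.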

### 1.6 Advantages

The primary advantage is the on-board chip learning implementation. It makes it practically feasible to train any network by building application specific hardware. It conserves area in case of VLSI implementation.

The network is inherently faster than the standard BP. So it is possible to build real time systems with this algorithm, which can be used in Natural Language Processing, Vision control etc.

This algorithm has been shown to converge faster than the standard BP and even the BP with adaptive weight modification. Therefore it is our intention to realize faster convergence through hardware learning.

From the simulations, it can be stated that component-to-component variation has a negligible effect on the convergence of a hardware implementation. Thus hardwired TRIT neural networks appear to be robust and tolerant of the imperfections of poorly matched devices. Because of the large number of processing nodes involved, damage to a few nodes or links does not significantly impair their functionality.

Massively parallel analog hardware networks, which are very fast, can be developed based on these simulations, which show that the algorithm converges for small benchmark problems.

Such hardware, which can be used to test the validity of the practical implementation of ANN's and collective systems, can be designed for various specific applications.

Software learning on von Neumann computers does not exploit the inherent parallelism of neural networks [14]. By using hardware rather than software learning, the networks' inherent parallelism is fully utilized.

TRIT implementation can be easily understood from the single layer system diagram (Figure 2), which consists of an input function, an output circuit function (F(.)), a delta input function, a delta output function and the weight matrix array.

Figure 2. IC Signal Flow Floor Plan

### 1.7 Literature Survey

Several CMOS analog implementations of ANN's have been reported in the literature. Some are inspired by biological models [23] while some are derived from artificial models. They are dedicated to signal processing, image processing or pattern recognition, with or without in-situ learning. They include digital or continuous valued analog signals. These networks use different learning algorithms such as Hopfield, Grossberg and Backpropagation. Some of these works also involve building basic cells on a chip and reporting their test results. Since it is very difficult to discuss the building block approaches for all types of learning algorithms, we restrict our discussion to fabricated VLSI implementations of backpropagation learning in ANN's.

Boser et al. [4] implemented an optical digit recognizer on a neural network chip which is trained by backpropagation. It recognizes handwritten digits from a 20x20 pixel image with $2.9 \%$ misclassifications, compared to a typical value of $2.5 \%$ for human beings. The network consists of 133000 connections and 3500 neurons arranged in 5 layers. The throughput of the chip is $130 \mathrm{MC} / \mathrm{s}$ and the operating frequency is 20 MHz.

Nijhuis et al. [26] have fabricated a collision avoidance neural network in a 2 micron double metal CMOS technology. They used a fully digital network which was laid out using a standard cell library. It has an operating frequency of 20 MHz and performs 10 M interconnects per second. The chip consists of 12 neurons, 144 synapses and 134 I/O pads.

### 1.8 Proposal for Hardware Implementation of TRIT

We propose the hardware implementation of the potentially useful TRIT model in a 2 micron Thin-Film Silicon-on-Sapphire (TSOS) process.

The most significant reasons for preferring TSOS over a bulk process are the reduced $V_{T}$ variation, due to the reduction of the bulk threshold parameter ($\gamma$), and the thin oxide film of 250 Å, which facilitates electron tunneling. The TSOS process has both depletion and enhancement mode devices.

TSOS devices are made with epitaxial silicon islands on a sapphire substrate. There is no body (substrate) contact on the device. The threshold voltage $V_{T}$ for an n-channel transistor is given by

$$
\begin{equation*}
V_{T}=V_{T 0}+\gamma\left[\sqrt{2 \Phi_{F}+V_{S B}}-\sqrt{2 \Phi_{F}}\right] \tag{7}
\end{equation*}
$$

where
$V_{T 0}$ = $V_{T}$ at zero source to body potential
$\gamma$ = bulk threshold parameter
$\Phi_{F}$ = strong inversion potential
$V_{S B}$ = source to body potential

For TSOS devices, $V_{T} \approx V_{T 0}$. The threshold shifts are minimized to a great extent. Also, source-to-body and drain-to-body capacitances are negligible, reducing parasitics and increasing bandwidth (BW).
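A short numerical sketch of Equation (7) shows why a small bulk threshold parameter gives $V_{T} \approx V_{T 0}$. The device values below are hypothetical and chosen only for illustration; they are not TSOS process data.

```python
import math

def threshold_voltage(vt0, gamma, phi_f, vsb):
    """Body-effect threshold voltage of an n-channel MOSFET, Equation (7).

    vt0   : threshold at zero source-to-body potential (V)
    gamma : bulk threshold parameter (V^0.5)
    phi_f : potential appearing as 2*phi_f under the square roots (V)
    vsb   : source-to-body potential (V)
    """
    return vt0 + gamma * (math.sqrt(2.0 * phi_f + vsb) - math.sqrt(2.0 * phi_f))

# Hypothetical bulk-like device: a noticeable shift with source-to-body bias.
vt_bulk = threshold_voltage(vt0=0.7, gamma=0.4, phi_f=0.3, vsb=2.0)

# gamma -> 0 models the negligible body effect assumed for TSOS devices.
vt_tsos = threshold_voltage(vt0=0.7, gamma=0.0, phi_f=0.3, vsb=2.0)
```

With these assumed numbers the bulk-like device shifts by roughly a third of a volt, while the zero-gamma case stays at `vt0`.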

The electronic implementation of TRIT BP involves building sub-components or blocks. The final architecture can be realized by interconnecting such blocks on a single substrate. Being parallel processing structures, backpropagation networks are both iterative and highly structured. The building blocks take advantage of that fact, simplifying design and testing at both the cell and the system levels. This thesis addresses the design, simulation, and layout of these basic building blocks. System level integration is beyond the scope of this thesis.

The proposed IC is universal in the sense that a single IC implements each layer of a multilayer perceptron. There is a one to one relationship between the weight matrix and each IC. The vertical extension is the number of neurons per layer of the chip. The horizontal extension is the number of layers used in an application. The building block is, at least theoretically, extensible horizontally to any number of layers. However, the maximum number of neurons is fixed vertically by fabrication and limited by the pin count. The power supply rails also limit the magnitude of the backpropagated term, which would limit the horizontal extension. Normally, three-layer networks with a hidden layer are sufficient to approximate any function to a reasonable degree [28].

As opposed to traditional voltage mode analog signal processing, in which inherently current signals are converted to the voltage domain before any analog signal processing takes place, a recently reintroduced current mode analog signal processing approach is taken. The current mode approach takes advantage of 1) the convenience of summing inner product and backpropagation currents and 2) the bidirectionality of triode floating gate CMOS based weight multipliers. The use of current rather than voltage as the active parameter can result in higher gain, accuracy, and wider bandwidth due to the reduced voltage excursion at dynamic nodes [15].

Simulations were performed using SPICE. Layouts were done using the CAD layout tool MAGIC. All circuits are fabricated using NRaD's fabrication facilities.

Chapter II will discuss the software implementation of this model. Chapter III will focus on the design, simulation, and performance testing of all building blocks. Chapter IV will offer conclusions based on the results and suggestions for the future work connected with investigation of this proposed hardware implementation.

## CHAPTER II

## TRIT MODEL SIMULATIONS

This chapter discusses the software program developed to test and validate the behavioral aspects of the TRIT algorithm. The purpose of the simulations is to test the convergence properties of the TRIT algorithm with variation in learning rate and $\epsilon_{2}$ under non-ideal conditions. The component non-idealities are incorporated in the TRIT simulations. The TRIT algorithm will be compared to the Standard Backpropagation in the speed of convergence and its sensitivity to device imperfections. A character mapping and a pattern fitting problem will be used to establish the base line performances. The programs were developed in Matlab.

In VLSI circuits, effects including random offsets and mismatch, system distortion, frequency response, and temperature variations perturb the system outputs [7]. The effects that dominate the error in the system depend on the system implementation. In our simulations, we have attempted to include most of the device imperfection effects that play a major role in our implementation. The variation in transconductance, the $V_{T}$ mismatch, and the channel length modulation parameter ($\lambda$) effect are the major concerns in analog circuit inaccuracies. The errors due to the above-mentioned inaccuracies in the current conveyors, weight matrix, and output function (F(.)) will be discussed in this chapter.

### 2.1 TRIT Program Description

A program which calculates the initial weight matrix and biases and performs the feed-forward calculation is run first. It is enclosed in Appendix A.

The program sets all the weight and bias elements to 0.5 so that all the circuit points start from the same initial condition. A random number corresponding to a $5 \%$ variation of the weight elements is added to all the elements of the network. This number simulates the error present in the multiplier circuit, since the multiplier and the weight update circuits are non-ideal; a detailed analysis of the errors is presented in Appendix D. The current conveyors are driven by an opamp which is non-ideal. The opamp error is due to the mismatch of the input transistors shown in Figure 13. This mismatch introduces both $V_{T}$ and $\beta$ errors. Also, the current mirroring transistors in the current conveyor (see Figure 8) are not ideal because of the channel length modulation parameter ($\lambda$) effect. Dynamic cascoding is utilized to reduce the $\lambda$ effect. The resulting effective $\lambda$ is then negligible compared to the $V_{T}$ and $\beta$ errors. The $\beta$ error is reduced by laying out the transistors M1 and M2 in a common centroid geometry and maintaining moderate geometries. These errors are represented by adding a $5 \%$ random number to the following elements of the network.
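The initialization described above can be sketched as follows. The thesis programs were written in Matlab; this Python sketch is only an illustration. It assumes the $5 \%$ variation is drawn uniformly within $\pm 5 \%$ of the nominal value of 0.5 (the distribution is not stated in the text), and the name `init_matrix` is hypothetical.

```python
import random

def init_matrix(rows, cols, nominal=0.5, variation=0.05, rng=random):
    """Build a rows x cols matrix at `nominal`, each element perturbed by
    a random relative error of at most +/- `variation`."""
    return [[nominal * (1.0 + rng.uniform(-variation, variation))
             for _ in range(cols)]
            for _ in range(rows)]

# A seeded generator keeps the example repeatable.
rng = random.Random(1993)
W1 = init_matrix(3, 2, rng=rng)   # e.g. a 2-input, 3-neuron layer
b1 = init_matrix(3, 1, rng=rng)   # biases start at the same nominal value
```

Every element then lies between 0.475 and 0.525, so all circuit points start near the same operating point while still carrying a mismatch term.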

In the main program, shown in Appendix B, the values of $\epsilon_{1}, \epsilon_{2}$ and $\eta$ are set. The value of the learning rate determines the number of iterations needed by the network to converge. Then the iteration is started as shown by the flowchart in Figures 3 and 4. If the steady state error (SSE) is less than 0.1, the program is terminated.

If $\operatorname{SSE}$ is greater than 0.1 , the following procedure is started :

The values of $\delta$ and $\delta_{2}$ are calculated. The values of the $\delta$'s determine whether the bias elements are to be adjusted. The values of the $\delta$'s and the outputs at the previous node corresponding to the weight matrix determine whether the weight matrix has to be updated. The regions where the weights are updated are shown in Figure 5. As seen from the figure, the weight element $W_{ij}$ is updated if and only if the values of $\delta$ and $O_{j}$ are both greater than their threshold values; otherwise $W_{ij}$ remains unchanged. Similarly, the bias elements are updated if and only if the value of $\delta$ is greater than $\epsilon_{2}$.

Figure 3. Flowchart of the TRIT program

Figure 4. Flowchart of the TRIT program

Figure 5. Regions of updation of weight vectors

After the modification of the weights, a noise term of 0.01 ($\sigma$) is added to each element of the weight and bias matrices. The value of 0.01 stems from the fact that the values of the weight matrix and the bias elements vary from 0.5 to 1.5, and the noise floor is assumed to be at least 60 dB down and centered around the mean value of 1.0. The input vector is multiplied by the multiplier circuit, which is explained in Chapter III. The analysis of the multiplier circuit indicates a maximum error of $1 \%$ due to the $\beta$ and threshold mismatch; Appendix D analyses the errors in the multiplier circuit. This error effect is introduced in the simulations by adding a value of 0.01.
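The thresholded update rule described above (weights change only where both $\delta$ and $O_{j}$ clear their thresholds, biases only where $\delta$ clears $\epsilon_{2}$) can be sketched as follows. This is an illustration under stated assumptions, not the program of Appendix A: the names `sign` and `trit_update` are hypothetical, the comparisons are assumed to be on magnitudes, $\epsilon_{1}$ is assumed to threshold $O_{j}$ and $\epsilon_{2}$ to threshold $\delta$, and the step is assumed to be $\eta$ times the product of the signs.

```python
def sign(x):
    """Return -1, 0 or +1 according to the sign of x."""
    return (x > 0) - (x < 0)


def trit_update(weights, biases, delta, prev_out, eta, eps1, eps2):
    """Apply one three-state update in place and return (weights, biases).

    weights[i][j] connects previous-layer output j to node i.
    Assumptions: |O_j| is compared with eps1, |delta_i| with eps2, and the
    step size is eta * sign(delta_i) * sign(O_j).
    """
    for i, d in enumerate(delta):
        if abs(d) <= eps2:
            continue  # dead zone: neither the bias nor row i changes
        biases[i] += eta * sign(d)
        for j, o in enumerate(prev_out):
            if abs(o) > eps1:
                weights[i][j] += eta * sign(d) * sign(o)
    return weights, biases


# Example: only the weight fed by the large previous-layer output moves.
w, b = trit_update([[0.5, 0.5]], [0.5], delta=[0.4], prev_out=[0.9, 0.1],
                   eta=0.1, eps1=0.33, eps2=0.02)
```

In the example, the first weight and the bias each move by one step of $\eta$, while the weight driven by the small output 0.1 is left unchanged.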

The forward computation is now completed and the output error has been determined. The output error is plotted against the number of epochs to observe the behavior of the network. Since there is always a finite range of weight values, which depends on the dynamic range of the weight multiplier circuit, the weight variation is bounded. The upper limit is set at $\pm 3$ and the lower limit at $\pm 0.2 \mathrm{~V}$. The squashing current conveyor at each layer output is also not ideal due to $V_{T}$ and $\beta$ effects, and this effect is not symmetrical. A detailed analysis of the effect is shown in Appendix D. It is accounted for by adding the noise values to the output values at each layer.
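One way to read the weight bound above is that the magnitude of each weight is kept between 0.2 and 3 while its sign is preserved. The sketch below implements that reading; the function name `clamp_weight` and the interpretation itself are assumptions rather than something stated in the simulation program.

```python
def clamp_weight(w, lo=0.2, hi=3.0):
    """Clamp the magnitude of w to [lo, hi] and keep its sign.

    Assumption: the limits of +/-3 and +/-0.2 bound the weight magnitude.
    """
    magnitude = min(max(abs(w), lo), hi)
    return magnitude if w >= 0 else -magnitude


bounded = [clamp_weight(w) for w in (5.0, 1.0, -0.05, -7.0)]
```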

### 2.2 Standard BP Program Description

A copy of the program is enclosed in Appendix B. The same initial weight matrix and the feedforward values are used to start the program.

First the value of $\eta$ is set. Then the iteration is started. The SSE is checked for a value less than 0.1. If it is less than 0.1, the program is terminated; otherwise the following procedure is continued.

The weight matrix and the bias vector elements are modified according to the standard BP formulas. The forward computation is completed and the network output is calculated. The network output is subtracted from the desired output to get the output steady state error. The error is plotted against the number of epochs to observe the behavior of the network. The next loop is then started by checking the steady state error (SSE) and calculating the values of the $\delta$'s.

### 2.3 Adaptive BP Program Description

The adaptive modification is enclosed in Appendix C. Backpropagation networks typically employ adaptive modification to eliminate local minima or to speed up the convergence of the network. The speed of convergence is increased by increasing the learning rate if the error vectors tend toward minima, and vice versa.

If the value of the steady state error decreased in successive iterations, the learning rate $\eta$ is multiplied by a factor of 1.07. If the steady state error increased in successive iterations, the learning rate is decreased by a factor of 1.02.

If the steady state error is constant, the value of $\epsilon_{2}$ is decreased. $\epsilon_{2}$ was started initially at 0.02. If the steady state error was constant in successive iterations, the value of $\epsilon_{2}$ was halved, with care taken that $\epsilon_{2}$ is not decreased by more than a factor of 1000. If the error remained constant even after $\epsilon_{2}$ had been reduced by a factor of 1000, the network was considered nonconvergent. The argument is that the variation of the parameter $\epsilon_{2}$ in an actual network is limited by the noise floor of the network. We assume an SNR of 60 dB, or a noise floor of $-60$ dB. If the value of $\epsilon_{2}$ is reduced below this noise floor, the noise in the circuit takes over and it becomes practically impossible to control the circuit.
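The adaptation rules of this section can be sketched as a single step function. The name `adapt` and the tolerance `tol` used to decide that the error is "constant" are assumptions, and "decreased by a factor of 1.02" is read here as division by 1.02.

```python
def adapt(eta, eps2, sse, prev_sse, eps2_start=0.02, tol=1e-9):
    """Return (eta, eps2, still_trainable) after comparing successive SSEs.

    Assumptions: `tol` defines a constant error, and the network is declared
    nonconvergent once eps2 falls below eps2_start / 1000.
    """
    if sse < prev_sse - tol:
        eta *= 1.07            # error fell: speed up
    elif sse > prev_sse + tol:
        eta /= 1.02            # error rose: slow down
    else:
        eps2 /= 2.0            # error stalled: narrow the dead zone
        if eps2 < eps2_start / 1000.0:
            return eta, eps2, False   # below the assumed noise floor
    return eta, eps2, True


# Example: a falling error raises the learning rate by 7%.
eta, eps2, ok = adapt(eta=0.1, eps2=0.02, sse=0.5, prev_sse=0.6)
```

Since halving ten times reduces $\epsilon_{2}$ by a factor of 1024, this rule gives up after roughly ten consecutive stalled iterations from the initial value.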

### 2.4 Testing Procedures

The initial simulations were performed to compare the convergence properties of back propagation using this three-state (trinary) quantization of weight and bias updates with those of standard back propagation when applied to the same problems. During each iteration of the learning trial, the input/desired output pattern pairs in the training set were presented to the network in a fixed sequence and the weights were updated for each pair; the network output was then tested over all pairs of the training set. This was continued until convergence was obtained.

Convergence was defined such that all errors between the desired and actual network outputs were required to be smaller in magnitude than 0.1. As mentioned earlier, the values of $W_{i j}$ and the $b_{i}$'s are varied by adding a random constant at every iteration.
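The convergence criterion above amounts to a check over every output of every training pair. A minimal sketch follows; the function name `converged` and the nested-list layout of the patterns are assumptions made for illustration.

```python
def converged(desired, actual, limit=0.1):
    """True when every |desired - actual| over all training pairs is < limit.

    `desired` and `actual` are lists of output vectors, one per training pair.
    """
    return all(abs(d - a) < limit
               for d_row, a_row in zip(desired, actual)
               for d, a in zip(d_row, a_row))


targets = [[1.0, 0.0], [0.0, 1.0]]
good = converged(targets, [[0.95, 0.04], [0.08, 0.93]])
bad = converged(targets, [[0.95, 0.04], [0.15, 0.93]])
```

A single output that misses its target by 0.1 or more, as in the second call, is enough to keep training going.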

A character-mapping and a pattern-matching problem were selected. The number of hidden units and the learning rate were varied for standard BP, whereas for the trinary scheme the value of $\epsilon_{2}$ was also varied. The value of $\epsilon_{1}$ was set at 0.33. The initial weights were set to 0.5 plus a random number ($\sigma=0.2$).

### 2.5 Results

Table 1 depicts the results of the simulations in which the number of hidden units and the learning rate were varied. The striking result is the difference in the convergence time of the two algorithms. Standard BP takes a very long time to converge. After an initial period of weight correction, the network goes through a prolonged phase in which the improvement is very low. In fact, the learning rate has to be large ($>0.2$) for standard BP to converge. As the number of hidden units is increased, the number of iterations goes down significantly. For learning rates less than 0.1, standard BP performs very poorly compared to the trinary algorithm.

In the case of trinary BP, as shown in Table 1, for learning rates less than 0.05 it is 5 to 10 times faster than standard BP, and for a learning rate in the range of 0.3 it is even 15 times faster. However, it is doubtful whether the trinary algorithm can use such a large learning rate, because the RMS weight correction per iteration may be very large. Nevertheless, this simulation gives a feel for the convergence of the trinary algorithm and its speed of convergence relative to standard BP. The rapid convergence of the trinary algorithm may be due to the scaling imposed upon the weight and bias vector updates by quantization; its investigation is beyond the scope of this thesis.

Failure to converge within the iteration limit occurred for the TRIT problem whenever the value of $\epsilon_{2}$ was set greater than 0.01, for the various numbers of hidden units and learning parameters. This occurs because the SSE limit was set at 0.1, and unless the delta terms become smaller than 0.01, convergence is not possible. If the delta terms fall below $\epsilon_{2}$, the weight corrections cease due to quantization and the network does not converge.

TABLE I

CONVERGENCE COMPARISON OF STANDARD BP AND TRIT

| Algorithm | $N_{h}=10$ | $N_{h}=20$ | $N_{h}=40$ | $N_{h}=10$ | $N_{h}=20$ | $N_{h}=40$ | $N_{h}=10$ | $N_{h}=20$ | $N_{h}=40$ |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| BP | 440 (0.3) | 260 (0.3) | 145 (0.3) | >500 (0.1) | 490 (0.1) | 290 (0.15) | >500 (0.05) | >500 (0.05) | >500 (0.05) |
| TRIT BP | 9 (0.3) | 9 (0.3) | 9 (0.3) | 45 (0.1) | 33 (0.1) | 27 (0.1) | 158 (0.02) | 53 (0.05) | 54 (0.05) |

### 2.6 Comparison of Adaptive BP and TRIT

It is clear from the simulations that BP with adaptive weight modification is slower to converge than the TRIT implementation. See the simulation results shown in Table 2.

TRIT BP is about 2 times faster than BP with adaptive weight modification. The number of iterations remains constant as the number of hidden units is varied in standard BP with adaptive weight modification, while it varies very little in TRIT BP. So, based on our limited simulation results, even BP with adaptive weight modification does not converge faster than TRIT.

### 2.7 Summary

These simulations give a feel for the potential success of a TRIT based algorithm. The TRIT algorithm is found to converge faster than the BP algorithm. The component-to-component variation appears to have a negligible effect on the convergence property of TRIT, which is encouraging because analog hardware components are inherently low-accuracy components. The various building blocks that went into the development of the TRIT hardware and their simulation results are discussed in Chapter III.

TABLE II

CONVERGENCE COMPARISON OF ADAPTIVE BP AND TRIT

| Algorithm | $N_{h}=20$ | $N_{h}=40$ | $N_{h}=60$ |
| :--- | :--- | :--- | :--- |
| BP | 74 | 74 | 74 |
| TRIT BP | 41 | 42 | 47 |

## CHAPTER III

## SYSTEM BUILDING BLOCKS

This chapter presents the design and simulation results for the basic building blocks of the TRIT algorithm. The proposed architecture takes advantage of the bidirectionality of the current conveyors and the EEPROM CMOS based weight multipliers. The architecture of the network is shown in Figure 2, with the EEPROM based weight multipliers arranged in matrix form. The input circuit consists of two current conveyors driving the weight matrix. The current conveyor functions as a processing element as well as a bidirectional voltage/current buffer (BiVI) [52]. The current conveyor based weight matrix drivers are shown in Figure 6. The current conveyors on the input side ($\mathrm{CC}_{1}$ and $\mathrm{CC}_{2}$) function as voltage buffers driving a single weight matrix column. The current conveyor $\mathrm{CC}_{3}$ is used as a current controlled current source. The current conveyor $\mathrm{CC}_{4}$ functions as an output buffer, driving a non-linear mapping (squashing) function $F(.)$. The derivative circuit, which is used to find the delta vectors, is also shown in the system architecture. A weight adjustment logic, based on the TRIT algorithm, is needed for the weight update during network learning.

Figure 6. CC Based Weight Matrix Drivers

From this discussion, it is clear that the current conveyors form the most basic element of this architectural approach and care must be taken to specify their performance. This is necessary for the input/output circuits to interface properly, because both the input and the output circuits are made up of current conveyors. The following section is a brief summary of the interface specifications.

### 3.1 TRIT Interface Specifications

```
Linear Limiting Squashing CCII+
I_in (Full Scale): 50 uA @ X = ±0.5 V
I_out (Sat): ±3 uA
I_out (Sat) = β_wt * V_wtFS * V_FS / 4
R_out > 10 MEG at I_out <= 3 uA
R_L = 4 / (β_wt * V_wtFS)

Input & Output Biasing CCII+
X: ±2.5 V
Y: I_out >= ±1500 uA @ Z = ±2 V
Z: Scale 1:1 at follower and 1:1 @ mirror
R_out >= 1 MEG @ 50 uA (Cascoded)
Current Error <= 1%

Delta CCII+
X: ±2.5 V
Y: I_out >= ±50 uA @ Z = ±0.5 V
Z: Scale 1:1 at follower and 1:1 @ mirror
R_out >= 1 MEG @ 50 uA (Cascoded)
Current Error <= 1%
```

### 3.2 Current Conveyors

### 3.2.1 Introduction

A current conveyor is a four terminal device which performs many useful analog signal processing functions when used in arrangement with other electronic elements. Current conveyors are functionally flexible and versatile. They can form an integral part of all I/O circuits [49-51]. Current conveyors offer several advantages over conventional operational amplifiers. They provide the highest gain bandwidth product of the process, which depends only on the opamp used [48]. They are used within this thesis as a practical building block for the implementation of the TRIT algorithm.

Figure 7. Block Diagram of the CCII$\pm$

The block diagram of a current conveyor (CC) is shown in Figure 7. Class-I (CCI$\pm$) and class-II (CCII$\pm$) conveyors have well defined properties [52]. A CCII$\pm$ can be expressed as:

$$
\left[\begin{array}{c}
I_{Y} \\
V_{X} \\
I_{Z}
\end{array}\right]=\left[\begin{array}{ccc}
0 & 0 & 0 \\
1 & 0 & 0 \\
0 & \pm 1 & 0
\end{array}\right]\left[\begin{array}{c}
V_{Y} \\
I_{X} \\
V_{Z}
\end{array}\right]
$$

From the above equation it is clear that no current flows into terminal Y. The voltage applied to terminal Y causes an equal voltage to appear on terminal X. Terminal Y exhibits an infinite input impedance and terminal X exhibits a zero input impedance. An input current $I_{x}$ on terminal X causes an equal current to flow into or out of the high impedance output terminal Z. The positive sign indicates that at any instant both $I_{x}$ and $I_{z}$ are in the same direction (CCII+), while the minus sign denotes opposite directions of the currents, signifying CCII-.
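The terminal relations of the ideal CCII$\pm$ can be written out directly from the matrix equation. The following sketch is only a behavioural model of the ideal element; the function name `ccii` and its argument order are hypothetical.

```python
def ccii(v_y, i_x, v_z, polarity=+1):
    """Ideal second-generation current conveyor; returns (i_y, v_x, i_z).

    polarity=+1 models a CCII+ and polarity=-1 a CCII-. The Z terminal
    voltage v_z has no effect on any output, which reflects the ideal
    (infinite) output impedance at Z.
    """
    if polarity not in (+1, -1):
        raise ValueError("polarity must be +1 or -1")
    i_y = 0.0               # no current flows into terminal Y
    v_x = v_y               # X follows the voltage applied to Y
    i_z = polarity * i_x    # the X current is conveyed to Z
    return i_y, v_x, i_z


# Example: 0.5 V on Y and 50 uA into X of a CCII+.
i_y, v_x, i_z = ccii(v_y=0.5, i_x=50e-6, v_z=1.0)
```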

The current CCII configuration allows convenient switching between the current conveyor mode and the voltage controlled voltage source (VCVS) mode. This choice supports the generation of the matrix inverse function, which is essential for implementing backpropagation.

"The CCII may be viewed as an ideal transistor" [45]. The ideal behavior of the NMOS transistor ($M_{FN}$ in Figure 8) can be achieved by using it in the negative feedback loop of the operational amplifier. In that case the current can only flow away from the X terminal. If a PMOS transistor ($M_{FP}$) is used in the feedback loop, current is restricted to flow into the X terminal. Bi-directional current flow can be achieved by using a complementary pair of MOS transistors ($M_{FN}$ and $M_{FP}$) in the opamp feedback loop. The drain currents of $M_{FN}$ and $M_{FP}$ can be mirrored to the output node Z. Thus the input current $I_{x}$ is conveyed to the output current $I_{z}$. This is a CCII+ realization, since both $I_{x}$ and $I_{z}$ simultaneously flow in the same direction.

Figure 8. Current Conveyor Symbol

The bi-directional voltage/current (BiVI) buffers, which are based on the current conveyor concept, are shown in Figure 6. These buffers provide the dual function of voltage drivers and current sources/sinks to isolate the W matrix in the forward/reciprocal mode. During the feed forward cycle, the buffers on the input side are configured as voltage controlled voltage sources, and the buffers on the output side are configured as current controlled current sources. The output side consists of a current conveyor driving the non-linear squashing function $F(.)$, which develops the output voltage to the next layer. A nearly identical structure is duplicated to achieve backpropagation. Each current conveyor accepts current in the input mode or supplies the drive voltage (and current) to the weight row matrix (math column).

During the forward propagation cycle the Y points of $\mathrm{CC}_{1}$ and $\mathrm{CC}_{2}$ are tied to $O_{i}$ while the Y points of $\mathrm{CC}_{3}$ and $\mathrm{CC}_{4}$ are grounded. The voltage controlled voltage source configured CC ensures $V_{X1}=V_{Y1}$. This results in the voltage $O_{i}$ being applied across the drain to source of both the floating gate and reference transistors in the $i^{\text{th}}$ row. The current flowing in the reference transistor is mirrored and applied to the floating gate column. Due to the presence of the weight charge on the floating gate, these currents will differ by the inner product of the applied voltage ($O_{i}$) and the stored weights. The difference or signal current $O_{j}$ is then measured by $\mathrm{CC}_{3}$. Current squashing is accomplished by the modified $\mathrm{CC}_{3}$. The current controlled current source configured CC ensures $I_{z3}=I_{x3}$. This current is the input to a linear saturating function which limits the current to 3 uA. Similarly, if all the column currents are summed at the $X_{4}$ terminal, the resulting current indicates the product of the input vectors and the weight matrix.
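The forward path described above (an inner product formed as a column current, followed by a linear saturating function with a 3 uA limit) can be modelled behaviourally as follows. The names `column_current` and `squash`, and the per-cell gains expressed in A/V, are hypothetical stand-ins for the floating gate multipliers.

```python
I_SAT = 3e-6  # saturation limit of the squashing conveyor, in amperes


def column_current(inputs, gains):
    """Signal current of one column: sum of O_i * g_i.

    `gains` are hypothetical per-cell transconductances (A/V) that stand in
    for the stored weights.
    """
    return sum(o * g for o, g in zip(inputs, gains))


def squash(current, i_sat=I_SAT):
    """Linear saturating function: pass the current, limited to +/- i_sat."""
    return max(-i_sat, min(i_sat, current))


# Example: two inputs of 0.5 V on cells of 4 uA/V give 4 uA, limited to 3 uA.
i_col = column_current([0.5, 0.5], [4e-6, 4e-6])
i_out = squash(i_col)
```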

During back propagation a voltage $V_{\text{din}}$ is applied to the Y inputs of both $\mathrm{CC}_{3}$ and $\mathrm{CC}_{4}$, with the Y inputs of $\mathrm{CC}_{1}$ and $\mathrm{CC}_{2}$ grounded. This results in a role reversal of the CC's, with the $I_{\delta^{\prime}\text{out}}$ current available at the Z output of $\mathrm{CC}_{1}$. $I_{\delta^{\prime}\text{out}}$ must be further processed (multiplied by a derivative) to compute $\delta_{\text{out}}$. The whole approach critically depends on the high accuracy of the CC's. The following section presents the design of $\mathrm{CC}_{1}$ through $\mathrm{CC}_{4}$.

### 3.2.2 High Power Conveyor

### 3.2.2.1 Design

The high power current conveyor $\mathrm{CC}_{2}$ must be able to supply the total weight current in one column of the weight matrix. The signal swing is also determined by the two current conveying transistors $M_{FN}$ and $M_{FP}$, as shown in Figure 9. The two transistors have to be saturated for satisfactory performance. The total weight current in one column is

$$
I_{\mathrm{HPCC}}=100 * 3 * 5=1500 \mathrm{uA}
$$

To source/sink this current, the source/sink transistors $M_{IPA}$, $M_{FN}$, $M_{INA}$, and $M_{FP}$ must be sized appropriately. For a weight current of 1500 uA and an output swing of $\pm 3.5 \mathrm{~V}$ (Section 3.1):

$$
\begin{aligned}
(W/L)_{MFN} &= \frac{2 I}{(\Delta V)^{2} K_{n}} = \frac{2 \cdot 1500}{\left(3.5-V_{TN}\right)^{2} \cdot 48} = 10 \\
(W/L)_{MFP} &= \frac{2 I}{(\Delta V)^{2} K_{p}} = \frac{2 \cdot 1500}{\left(2.5-V_{TN}\right)^{2} \cdot 21} = 20
\end{aligned}
$$

We expect an output swing (at the Z terminal) of at least $\pm 2 \mathrm{~V}$, because the weight transistor drain to source voltage swing is determined by this Z terminal. The dynamic range of the weight multipliers depends on the Z terminal swing. The length of the cascoding transistors (M9 and M10) was fixed at 2 um, while the length of the mirroring transistors (MINA and MINB) was 6 um. The long mirroring transistor was selected to reduce the channel length modulation parameter ($\lambda$) effect and to reduce local geometric and doping mismatches.

Figure 9. Current Conveyor circuit

So

$$
\begin{gathered}
(\Delta V)_{M10}=(\Delta V)_{MIPB} / \sqrt{6/2} \\
(\Delta V)_{M9}=(\Delta V)_{MINB} / \sqrt{6/2} \\
(\Delta V)_{MIPB} / \sqrt{6/2}+(\Delta V)_{MIPB}=3+V_{TP} \\
(\Delta V)_{MINB} / \sqrt{6/2}+(\Delta V)_{MINB}=3-V_{TN} \\
\text{where } (\Delta V)_{MINB}=\sqrt{2 I_{z} / \beta_{MINB}} \\
\text{and } (\Delta V)_{MIPB}=\sqrt{2 I_{z} / \beta_{MIPB}} \\
I_{z}=1500 \text{ uA}, \quad k_{n}=48, \quad k_{p}=21
\end{gathered}
$$

Solving these equations, we get

$$
(W / L)_{\mathrm{MIPB}} \approx 75
$$

and

$$
(W / L)_{MINB}=37.5
$$

For a unity ratio current mirror, the dimensions of the transistors MIPA and MINA are the same as those of MIPB and MINB respectively.

$$
\begin{aligned}
& (W / L)_{\mathrm{MINA}}=(\mathrm{W} / \mathrm{L})_{\mathrm{MINB}}=37.5 \\
& (\mathrm{~W} / \mathrm{L})_{\mathrm{MIPA}}=(\mathrm{W} / \mathrm{L})_{\mathrm{MIPB}}=75 .
\end{aligned}
$$
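The mirror sizing equations above can be solved numerically as a cross-check. The sketch below assumes a threshold magnitude of 0.92 V for both device types (the value quoted for $V_{TP}$ in the bias circuit section) and assumes $k_{n}$ and $k_{p}$ are in uA/V$^2$; the function name `mirror_w_over_l` is hypothetical. With these assumptions the ratios come out in the same neighbourhood as the values quoted above; the exact figures depend on the thresholds used.

```python
import math


def mirror_w_over_l(i_z_ua, k_ua_per_v2, v_t, headroom=3.0,
                    l_mirror=6.0, l_cascode=2.0):
    """Return (dV, W/L) for a mirror transistor stacked under a cascode.

    Solves dV / sqrt(l_mirror / l_cascode) + dV = headroom - |V_T| for dV,
    then applies beta = 2 * I / dV**2 and W/L = beta / k.
    Assumptions: currents in uA, k in uA/V^2, equal widths for the mirror
    and cascode devices.
    """
    ratio = math.sqrt(l_mirror / l_cascode)
    dv = (headroom - abs(v_t)) / (1.0 + 1.0 / ratio)
    beta = 2.0 * i_z_ua / dv ** 2
    return dv, beta / k_ua_per_v2


# Assumed thresholds of 0.92 V for both the N and P devices.
dv_n, wl_n = mirror_w_over_l(1500.0, 48.0, v_t=0.92)
dv_p, wl_p = mirror_w_over_l(1500.0, 21.0, v_t=0.92)
```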

Since we have fixed the length of the transistors at 6um,

$$
\begin{aligned}
& \mathrm{W}_{\mathrm{MINA}}=\mathrm{W}_{\mathrm{MINB}}=6 * 37.5=225 \mathrm{um} \\
& \mathrm{~W}_{\mathrm{MIPA}}=\mathrm{W}_{\mathrm{MIPB}}=6 * 75=450 \mathrm{um} \\
& \mathrm{~W}_{\mathrm{MINA}}=\mathrm{W}_{\mathrm{MINB}}=2 * 116=232 \mathrm{um} \\
& \mathrm{~W}_{\mathrm{MIPA}}=\mathrm{W}_{\mathrm{MIPB}}=2 * 240=480 \mathrm{um}
\end{aligned}
$$

The result is that the widths of the mirror transistors are very large. This is due to the design objective of maintaining at least a 2 V swing at the output (Z terminal) while sinking a large current in the mA range. This large current translates into a large $\Delta V$ drop across the mirroring and cascoding transistors (MIPB and M10, MINB and M9). For the required Z output swing, we can allow only a 2 V ($\Delta V$) drop across the two large current carrying transistors, so the widths of these transistors must be large to reduce the $\Delta V$ across them.

The design requirement of a 2 V output swing at the Z terminal would necessitate P-transistor widths of 900 um. Therefore, for pragmatic reasons, the Z terminal swing was reduced to 1 V for final fabrication.

The revised values of ( $W / L$ )'s for the mirrors considering a value of $1 V$ output swing in $Z$ terminal are

$$
\begin{aligned}
& \mathrm{W}_{\mathrm{MIPA}}=\mathrm{W}_{\mathrm{MIPB}}=180 \mathrm{um} \\
& \mathrm{~W}_{\mathrm{MINA}}=\mathrm{W}_{\mathrm{MINB}}=90 \mathrm{um} \\
& \mathrm{~W}_{\mathrm{M} 9}=90 \mathrm{um} \\
& \mathrm{~W}_{\mathrm{M} 10}=180 \mathrm{um}
\end{aligned}
$$

### 3.2.2.2 Simulations

The D.C. and A.C. characteristics of the high power current conveyor are shown in Figures 10 and 11 respectively. The D.C. curve shows that the output current tracks the input current over a range of 1.5 mA. The MIPA and MINA currents are the mirroring currents, while the MIPB and MINB currents are the mirrored currents. The simulated error between these two currents is found to be less than $1 \%$ (at 1.5 mA), which indicates that our objective of accurate current conveying is achieved at the required output swing. The output swing is determined by forcing the Z terminal to remain at 1 V while sweeping the input current. The $\omega_{3 \mathrm{dB}}$ point can be determined from the A.C. characteristics. The A.C. characteristic of Figure 11 was obtained by biasing the circuit at a D.C. input current of 1 uA. For specific applications, the full power bandwidth, which is computed from the slew rate, is also an important criterion.

Figure 12. D.C. Characteristics of Low Power CC

Figure 11. A.C. Response of High Power CC
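The full power bandwidth mentioned above is conventionally obtained from the slew rate through the sine-wave relation $f_{FP}=SR /\left(2 \pi V_{peak}\right)$. The sketch below applies that standard relation; the function name and the example values (100 V/usec and a 1 V peak swing) are illustrative assumptions and are not simulation results.

```python
import math


def full_power_bandwidth(slew_rate_v_per_s, v_peak):
    """Largest sine frequency (Hz) of amplitude v_peak that avoids slewing."""
    return slew_rate_v_per_s / (2.0 * math.pi * v_peak)


# Illustrative values only: 100 V/usec slew rate and a 1 V peak output.
f_fp = full_power_bandwidth(100e6, 1.0)
```

Halving the peak swing doubles the frequency at which the output begins to slew, which is why the criterion matters most for the large-swing blocks.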

### 3.2.3 Low Power Conveyor

The output current requirement is 150 uA, which is $1 / 10$ of the high power conveyor's current. We want the same swing at Z as the high power conveyor, but at the reduced current value. So the dimensions of all the transistors are those of the high power current conveyor scaled down by 10, as follows

$$
\begin{aligned}
(W / L)_{FN} & =1 \\
(W / L)_{FP} & =2 \\
W_{MIPA} & =W_{MIPB}=18 \text{ um} \\
W_{MINA} & =W_{MINB}=9 \text{ um} \\
W_{M10} & =18 \text{ um} \\
W_{M9} & =9 \text{ um}
\end{aligned}
$$

The D.C. characteristics of the low power current conveyor are shown in Figure 12. The D.C. output current follows the input current over a range of 150 uA, and the current error due to the $\lambda$ effect (excluding $\Delta \beta$ and $\Delta V_{T}$) is again found to be less than $1 \%$.

### 3.2.4 Opamp

A simple self-compensating opamp is chosen for the opamp function required by the CCII, due to its simplicity and the moderate offset demands of the CCII structure. This circuit results in a stable, self-compensated, minimum area opamp structure. The bandwidth of the opamp can be increased by increasing the bias currents, at the expense of power dissipation. More complex opamps will perform better than this opamp but require more area. The weakness of the opamp is its offset. However, since offset is adjusted as a part of backpropagation learning, the offset requirements are not stringent. Also, in practice, with an open loop gain of at least 60 dB, offset performance will be limited more by the threshold matching of the input differential pair. Future refinement, beyond the scope of this thesis, will focus on bandwidth and output swing improvement. The opamp determines the performance and the dynamic range of the CC circuit. The circuit diagram of the opamp is shown in Figure 13.

Figure 12. D.C. Characteristics of Low Power CC

Figure 13. Opamp circuit diagram

### 3.2.4.1 Analysis of Opamp

In all the subsequent discussions, the symmetry of the opamp is exploited to reduce the redundancy of the equations i.e. M1 and M2, M3 and M4 and M6 and M7 are assumed to be symmetric and matched transistors.

Output Swing

Positive: To maintain M6 in saturation, the output voltage $V_{O}$ is limited to

$$
\begin{equation*}
V_{O} \leq V_{GM6}-V_{TP} \tag{15}
\end{equation*}
$$

Negative: To maintain M8 in saturation, the negative output voltage swing is limited to

$$
\begin{equation*}
V_{O} \geq V_{GM8}-V_{TN} \tag{16}
\end{equation*}
$$

Input Common Mode Range

The negative limit of the input common mode range $V_{CMR}$ is given by

$$
\begin{equation*}
V_{CMR-} \geq V_{GM5}-V_{TNM5}+V_{TNM1} \tag{17}
\end{equation*}
$$

because $V_{GM5}-V_{TNM5}$ is the minimum voltage needed to maintain M5 in saturation and the gate voltage of M1 should be at least a $V_{T}$ above its source voltage. $V_{CMR}$ will be less than $V_{GM5}$ because $V_{TNM1} \geq V_{TNM5}$, and there is a threshold shift associated with M1 and not with M5. The threshold shift is due to the fact that the source of M1 is at a different potential than its substrate.

The positive limit of the input common mode range is given by

$$
\begin{equation*}
V_{CMR+} \leq V_{GM3}-V_{TPM3}+V_{TNM1} \tag{18}
\end{equation*}
$$

The voltage $V_{GM3}-V_{TPM3}$ is the voltage needed to maintain M3 in saturation, and the gate voltage of M1 should be no more than a $V_{T}$ above its drain voltage to maintain it in saturation.

Output Resistance

The output resistance of the opamp is

$$
\begin{align*}
r_{o} & =\left[\mu_{M6}\left(r_{dsM1} \| r_{dsM3}\right)\right] \|\left[\mu_{M8}\, r_{dsM10}\right]  \tag{19}\\
\mu_{M6} & =\frac{g_{mM6}}{I_{DM6} \lambda_{M6}}
\end{align*}
$$

where

$$
\begin{aligned}
r_{dsM10} & =1 /\left(I_{DM6} \lambda_{M6}\right) \\
g_{mM6} & =\sqrt{2 \beta I_{DM6}}
\end{aligned}
$$

So in order to maintain a high output impedance, the $\lambda$'s of the transistors M8, M6 and M10 should be low. Since the channel length modulation parameter is inversely proportional to the length, the lengths of M6, M8 and M10 are made reasonably large to ensure a high output impedance.

Gain

The gain of the opamp is given by

$$
\begin{equation*}
A_{v}=g_{m M 1} * r_{0} \tag{21}
\end{equation*}
$$

The value of $r_{o}$ is given by Equation 19.

Gain-Bandwidth Product (GBW)

The GBW is given by

$$
\begin{equation*}
GBW=g_{mM1} / C_{o1} \tag{22}
\end{equation*}
$$

where $C_{o1}=C_{L}+C_{o}$ and $C_{o}=C_{dbM8}+C_{dbM6}$. Since $C_{o}<C_{L}$, $C_{o1} \approx C_{L}$.

Slew Rate

The Slew rate (SR) is given by

$$
\begin{equation*}
SR=I_{ss} / C_{o} \tag{23}
\end{equation*}
$$

where $I_{ss}$ is the current through M5. So it is advantageous to increase the width of M5 to increase the input CMR and SR.

Cut-off Frequency

The cut-off frequency $\omega_{3 \mathrm{dB}}$ is given by

$$
\begin{equation*}
\omega_{3 \mathrm{dB}}=\frac{1}{r_{o} C_{o}} \tag{24}
\end{equation*}
$$

where $r_{o}$ and $C_{o}$ are given previously.
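The small-signal relations of this section can be collected into a short calculation. The sketch below is only a restatement of the formulas above; the helper names and all device values in the example are hypothetical, a single $\lambda$ is assumed for every transistor, and the load capacitance is used for the output capacitance. Only the 50 uA tail current and the 0.5 pF load are figures that appear in this chapter.

```python
import math


def gm(beta, i_d):
    """Transconductance: g_m = sqrt(2 * beta * I_D)."""
    return math.sqrt(2.0 * beta * i_d)


def r_ds(i_d, lam):
    """Drain-source resistance: r_ds = 1 / (I_D * lambda)."""
    return 1.0 / (i_d * lam)


def parallel(r_a, r_b):
    """Parallel combination of two resistances."""
    return r_a * r_b / (r_a + r_b)


def opamp_figures(beta_in, i_in, beta_casc, i_out, lam, c_load, i_ss):
    """Gain, GBW (rad/s), cut-off frequency (rad/s) and slew rate (V/s)."""
    gm1 = gm(beta_in, i_in)
    mu = gm(beta_casc, i_out) * r_ds(i_out, lam)   # cascode gain, M6 and M8
    r_o = parallel(mu * parallel(r_ds(i_in, lam), r_ds(i_out, lam)),
                   mu * r_ds(i_out, lam))
    return {"gain": gm1 * r_o,
            "gbw": gm1 / c_load,
            "w3db": 1.0 / (r_o * c_load),
            "slew": i_ss / c_load}


# Hypothetical device values; 50 uA into 0.5 pF gives 100 V/usec.
fig = opamp_figures(beta_in=200e-6, i_in=25e-6, beta_casc=100e-6,
                    i_out=25e-6, lam=0.02, c_load=0.5e-12, i_ss=50e-6)
```

With a single dominant pole, the gain and the cut-off frequency multiply out to the gain-bandwidth product, which is a quick consistency check on any set of values.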

### 3.2.4.2 Biasing Circuit of the opamp

The bias circuit that forms a part of the opamp circuit is shown in Figure 13. The transistor M3B sets up the gate voltage for M6. By varying the gate voltage of M6, the required positive output swing can be achieved. The transistors M6B and M10B mirror the input current to the transistors M5B and M9B. Two stacked mirroring transistors are used to achieve the required matching with the transistors M6, M8 and M10, in addition to cascoding. M2B controls the gate voltage of M3 and M4. The gate voltage of M3 and M4, together with their widths, effectively controls the output current. The transistor M7B controls the gate voltage of M8. By controlling the gate voltage of M8, the required output swing in the negative direction can be achieved. The lack of symmetry in the biasing voltage of M8 is apparent. This can be corrected in future versions by including a transistor to match M10. The lack of symmetry produces an offset voltage, as is evident from the D.C. characteristics in Figure 14. The transistors M8B, M4B and M1B are used for mirroring and matching purposes. The gate voltage of M5 is set by the dimensions of M10B and the current through it.

The design criteria from Section 3.1 result in the following biasing constraints.

Bias current: 50 uA

Output swing: $\pm 3.5 \mathrm{~V}$

Since we desire an output swing of +3.5 V, the gate voltage of M6 should be at least $3.5+V_{TP}$ to maintain M6 in saturation, i.e. the gate to source drop on the transistor M3B should be $\left(3.5+V_{TP}\right)$ V.

$$
\begin{gather*}
\text { So } V_{GSM3B}=V_{GSM6}=\left(3.5+V_{TP}\right) \mathrm{V}=2.63 \mathrm{~V} \\
\sqrt{\frac{2 I_{BIAS}}{\beta_{M3B}}}+V_{TP}=2.63 \mathrm{~V} \tag{24}
\end{gather*}
$$

Substituting the values of

$I_{BIAS}=50$ uA

$V_{TP}=0.92$ V

$\beta_{M3B}=2 I_{BIAS} /(2.5-0.92)^{2}=39$

$(W / L)_{M3B}=39 / 21=2$

Similarly, the negative output swing is -3.5 V, so the gate voltage of M7B should be $-3.5+V_{TN}$ to maintain M8 in saturation. The gate to source drop on M7B should be $\left(3.5+V_{TN}\right)$ V.

So $V_{GSM7B}=V_{GSM8}=2.6$ V

$$
\begin{equation*}
\sqrt{\frac{2 I_{BIAS}}{\beta_{M7B}}}+V_{TN}=2.6 \mathrm{~V} \tag{25}
\end{equation*}
$$

Solving for M7B by substituting the values

$\beta_{M7B}=2 I_{BIAS} /(2.6-0.92)^{2}=41$

$(W / L)_{M7B}=1$

The gate to source voltage on M5, M3 and M4 should be at least $1.5 V_{T}$ to reduce the effect of $V_{T}$ mismatch.

$$
\begin{aligned}
V_{GSM3,M4} & =V_{GSM2B}=\sqrt{2 I_{BIAS} / \beta_{M2B}}+V_{TP} \\
& =\sqrt{2 \times 57 /(21 \times 9)}+0.92=1.7 \mathrm{~V}>1.5 V_{T}
\end{aligned}
$$
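The square-law relations used for the bias transistors can be checked numerically. The following is a minimal sketch, assuming the simple saturation model $V_{GS}=\sqrt{2 I_{D}/\beta}+V_{T}$ with $\beta=k^{\prime}(W/L)$; the value $k^{\prime}=21$ uA/V$^2$ is inferred from the "39/21" and "21 * 9" factors in the text, and the function names are illustrative only.

```python
import math

def vgs_in_saturation(i_d_ua, beta_ua_per_v2, v_t):
    """Square-law gate-source voltage: V_GS = sqrt(2*I_D/beta) + V_T."""
    return math.sqrt(2.0 * i_d_ua / beta_ua_per_v2) + v_t

def beta_for_vgs(i_d_ua, v_gs, v_t):
    """Invert the square law: beta = 2*I_D / (V_GS - V_T)^2."""
    return 2.0 * i_d_ua / (v_gs - v_t) ** 2

K_PRIME = 21.0  # uA/V^2, assumed process transconductance (inferred from the text)
V_T = 0.92      # V, threshold magnitude quoted in the text

# M2B check: I_BIAS = 57 uA and W/L = 9, as in the expression above.
vgs_m2b = vgs_in_saturation(57.0, K_PRIME * 9, V_T)
print(f"V_GS(M2B) = {vgs_m2b:.2f} V, exceeds 1.5*V_T: {vgs_m2b > 1.5 * V_T}")
```

The same two functions can be reused for M3B and M7B by substituting the target gate to source voltage and dividing the resulting $\beta$ by the assumed $k^{\prime}$ to obtain W/L.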

With a bias current of 50 uA and a load capacitance of 0.5 pF, the resulting slew rate is $50 / 0.5=100$ V/usec. However, after layout and simulation, the 50 uA current was not sufficient. Resimulation resulted in a current of 57 uA. The 100 V/usec requirement should result in a peak full power frequency of

$$
f_{p}=\frac{2 \, SR}{2 \pi V_{pp}}=\frac{2 \times 100}{2 \pi(4)} \approx 8 \mathrm{MHz}
$$
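The slew rate and full power frequency figures can be reproduced with a few lines of arithmetic. This is a sketch under the assumption that the "4" in the expression above is a 4 V peak-to-peak swing, so that the peak amplitude is 2 V; the function names are illustrative.

```python
import math

def slew_rate_v_per_us(i_ss_ua, c_load_pf):
    """SR = I_SS / C_o. With uA and pF the result is directly in V/us."""
    return i_ss_ua / c_load_pf

def full_power_frequency_mhz(sr_v_per_us, v_peak):
    """f_p = SR / (2*pi*V_peak). V/us divided by V gives MHz."""
    return sr_v_per_us / (2.0 * math.pi * v_peak)

sr = slew_rate_v_per_us(50.0, 0.5)        # 50 uA into 0.5 pF
f_p = full_power_frequency_mhz(sr, 2.0)   # assumed V_peak = V_pp / 2 = 2 V
print(f"SR = {sr:.0f} V/us, f_p = {f_p:.2f} MHz")
```

Under these assumptions the result is close to the 8 MHz quoted in the text.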

### 3.2.4.3 Power Circuit Design

The transistors M1 and M2 were laid out in a common-centroid geometry to minimize DC offset. To facilitate the common-centroid design in Magic, each cell has a (W/L) ratio of $6 / 2$. The only consideration in the design of transistors M6, M8, and M10 is that their lengths should be sufficiently large to achieve a high output impedance for the opamp. High output impedance translates into high gain of the opamp. The increase in capacitance with the increase in length of these transistors is negligible compared to the output capacitance of 0.5 pF. The design of M5 influences the slew rate and the negative Power Supply Rejection Ratio (PSRR). Increasing the width of M5 will provide an increased CMRR, an increased input Common Mode Range (CMR) and an increased Slew Rate (SR), none of which has a significant impact on our design objectives.

### 3.2.4.4 Simulations

The D.C., A.C., and transient characteristics of the opamp are shown in Figures 14, 15 and 16. The D.C. characteristics show an output offset voltage of 2.26 V. The A.C. characteristics indicate a gain-bandwidth product of 20 MHz and a gain of 72 dB. This shows that the gain achieved is higher than the design objectives. The transient characteristics indicate a negative slew rate of 16 V/usec and a positive slew rate of 4 V/usec.

### 3.2.5 Squashing function

### 3.2.5.1 Design

Figure 17 shows the linear limiting version of the squashing CC.

Squashing is achieved by the addition of two transistors at each conveyor half. This generates a linear limiter function. One of the two transistors (MS2N) provides an additional current branch to the supply rail. The upper transistor in the traditional mirror branch takes on the classical follower role (MFN), while the lower transistor (MS1N) serves to limit the current that can be mirrored to the $Z$ output. The saturation level is a function of the gate bias and geometry of the lower transistor (MS1N) of branch one. Once the current in the mirrored path saturates, all additional current is routed to the supply rails through the secondary path (MS2N). This results in a linear limited function with a globally programmable saturation limit.

Fig 14. D.C. characteristics of op-amp

Fig 15. A.C. characteristics of op-amp

Fig 16. Transient response of op-amp

Figure 17. Squashing Current Conveyor

The transistor MS2N should be large enough to supply the current for the entire column of the weight matrix once the transistor M1 saturates. The saturation current was fixed at 1500uA.
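The linear limiter behaviour described above can be summarised in a small behavioural model. This is only a sketch of the ideal transfer function, not of the transistor-level circuit: it assumes a perfectly sharp knee at the saturation current and that all excess current is diverted through the secondary path. The function name and the symmetric limit are assumptions.

```python
def squashing_conveyor(i_in_ua, i_sat_ua):
    """Ideal linear limiter.

    Returns (i_z, i_rail): the current mirrored to the Z output and the
    excess current routed to the supply rail through the secondary path.
    """
    i_z = max(-i_sat_ua, min(i_sat_ua, i_in_ua))  # clip to +/- I_SAT
    i_rail = i_in_ua - i_z                        # remainder goes to the rail
    return i_z, i_rail

# Below the limit everything is mirrored; above it the excess is dumped.
for i_in in (500, 1500, 2000, -2000):
    print(i_in, squashing_conveyor(i_in, 1500))
```

In this idealisation the two returned currents always sum to the input current, which is the reason MS2N must be sized to carry the full column current once the mirrored path has saturated.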

The opamp output swing is $\pm 3.5 \mathrm{~V}$. So the source of MFN should be $3.5 \mathrm{~V}+\mathrm{V}_{\mathrm{TP}}$ to maintain MFN in conduction. The source of MS2N, the $X$ terminal, is assumed to be at 2 V during the forward propagation. So the gate to source voltage of MS2N is $3.5 \mathrm{~V}+\mathrm{V}_{\mathrm{TP}}-\mathrm{V}_{\mathrm{TN}}$.

The current requirement is

$$
32 \cdot V_{WTMAX} \cdot \beta_{NT} \cdot V_{pp} \text { for a } 32 \text { column matrix }
$$

$$
100 \cdot V_{WTMAX} \cdot \beta_{NT} \cdot V_{pp} \text { for a } 100 \text { column matrix }
$$

$$
\begin{aligned}
(W / L)_{MS2N} & =2 I /\left(\left(3.5-V_{TN}-V_{TN}\right) k_{n}\right) \\
& =14 \text { for a } 32 \times 32 \text { weight matrix } \\
& =43 \text { for a } 100 \times 100 \text { weight matrix }
\end{aligned}
$$

$$
\begin{aligned}
(W / L)_{MS2P} & =2 I /\left(\left(3.5-V_{TN}-V_{TN}\right) k_{n}\right) \\
& =27 \text { for a } 32 \times 32 \text { weight matrix } \\
& =85 \text { for a } 100 \times 100 \text { weight matrix }
\end{aligned}
$$

### 3.2.5.2 Simulations

Figure 18 shows the D.C. characteristics of the squashing current conveyor. The saturation current is fixed to be $\pm 3 u A$.

### 3.2.5.3 Approximate Sigmoid function

There are many approaches to the saturation function in ANNs: the linear saturation function, the S-shaped sigmoid function, the hyperbolic tangent, etc. The sigmoid function generation circuit is discussed in this section.

A MOS device has a nonlinear output I-V characteristic. It can be utilized as a sigmoid function. The derivative has to be generated as a piecewise linear function. The gain can be obtained from the slope of the output-input curve of the MOS transistor.

In Figure 19, $V_{in}$, $V_{C}$, and $I_{o}$ are the input voltage, the gain control voltage, and the output current respectively. The design of all the MOS devices ensures that they operate in the saturation region for the entire range of input voltage. Applying KVL around the loop shown in Figure 19,

$$
\begin{align*}
V_{in} & =V_{GSM5}+V_{GSM8}+V_{DSM9}-V_{GSM7}-V_{GSM6} \\
& =\sqrt{\frac{2 I_{D5}}{\beta_{5}}}+V_{T5}+\sqrt{\frac{2 I_{D8}}{\beta_{8}}}+V_{T8}+V_{DSM9}-\sqrt{\frac{2 I_{D7}}{\beta_{7}}}-V_{T7}-\sqrt{\frac{2 I_{D6}}{\beta_{6}}}-V_{T6}  \tag{26}\\
& =V_{DSM9}
\end{align*}
$$



Figure 12. D.C. Characteristics of Low Power CC


Figure 19. Sigmoidal Function (Single Quadrant)

The equation above assumes that the transistors M5, M6, M7 and M8 are matched. The TSOS process ensures almost exact cancellation of the $V_{T}$ terms. The low $\gamma$ of the TSOS process results in $V_{T} \approx V_{To}$. In a regular orbit process, the $V_{T}$ of transistors M5, M6 and M8 would take different values and the circuit may fail.

Two quadrant operation can be achieved by adding PMOS transistors to the circuit in Figure 20. Depending on the polarity of the input voltage, the $N$ and $P$ transistors would conduct. Symmetry in two quadrants is maintained by the proper selection of device geometries in respective parts.

Transistors M11-M14 act as mirroring transistors, transferring the drain current of M9 and M10 to the output load. The linear and saturation regions of M9 and M10 closely approximate the sigmoid function.

A family of curves can be generated by adjusting the value of the control voltage $V_{C}$, which varies the gain of the circuit. The transistors M16-M22 reduce the steady state power dissipation.

### 3.2.5.3.1 Simulations

The D.C. and A.C. characteristics of the sigmoid function are shown in Figures 21 and 22 respectively. The symmetry of the D.C. curve is maintained by matching the $n$-half and $p$-half of the squashing circuit.


Figure 20. Sigmoidal Nonlinearity (Two Quadrant)


Figure 21. D.C. characteristics of sigmoidal circuit

Figure 22. A.C. characteristics of sigmoidal circuit

### 3.2.6 Dynamic cascode biasing

The requirements for the current mirrors to be used with all the current conveyors are: linear current gain, high output impedance, wide output voltage swing, small input bias voltage, and a good high frequency response. The ability to satisfy these requirements depends on the type of current mirror chosen [45]. There are basically five types of current mirrors in CMOS technology [45]: the simple current mirror, the cascode or stacked current mirror, the Wilson current mirror, the improved Wilson current mirror and the cascode current mirror with improved biasing. The tradeoff is typically a high output impedance and good current conveying capacity in exchange for a reduced output swing. Simple current mirrors have poor mirroring accuracy and low output impedance but have a large output voltage swing. The stacked or cascode configurations suffer from reduced output swing but have high output impedance and good accuracy.

The current conveyor is at the heart of this building block approach. A simple current mirror will not achieve the required current conveying accuracy. Therefore cascode mirrors were used in all the circuits. The cascode mirrors have the disadvantage of reduced output swing. But since our crucial requirement is an accurate current transfer ratio, we selected cascode mirrors with dynamic biasing circuit as shown in Figure 23.


Figure 23. Dynamic cascode biasing circuit

The transistor M3 is used to mirror a portion of the current that flows through the cascoding transistor M1NB. It is mirrored by the P-current mirrors. The $\lambda$ effect is not completely eliminated because the P-mirrors are not $100 \%$ accurate, and there is a $\lambda$ effect at the junction of MP1 and M3. The $\lambda$ effect on the mirrors and on the M3 node can be reduced by making the lengths of all the cascode biasing transistors large. Further improvement is possible by cascoding MP1 and MP2 as well as M3.

### 3.3 Derivative Circuits

The derivative circuit of the linear saturating curve is shown in Figure 24. The backpropagation vector consists of the derivative of the output vectors of the previous stage. So the derivative circuit must accomplish multiplying the output vector with the function value. The approximate sigmoid circuit acts as a transconductance, transferring the input voltage to an output current. The derivative of that output current can be obtained as a piecewise linear derivative function. It needs additional complex circuitry to multiply the derivative current and the output voltage.

Figure 24. Derivative Circuit

D.C. characteristics of Derivative circuit

The linear derivative circuit has two impedance states, low and high. The low impedance state indicates slow learning, while the high impedance state indicates fast learning. The derivative circuit is shown in Figure 24. As soon as the drain voltage of the transistor MS1N becomes $V_{T}$, it saturates. This point corresponds to the knee of the linear saturating curve, when the transistor MS2N takes over conduction from MS1N. The comparator CR1 switches from the low to the high state, which drives the output of the NOR gate to the high state. Since the gate of the load transistor ML is connected to the output of the NOR gate, ML starts conducting and the current $I_{in}$ gets a low resistance (high conductance) path. The output goes from the high impedance state to the low impedance state. Similar actions occur in the p-half as the transistor MS1P saturates, and the output again goes from the high impedance state to the low impedance state. This low state will drive the corresponding delta vectors to low values. The reduced value of the delta vectors will prevent learning, because the output vectors in that layer exceed a certain value, or are saturated. Two types of comparators (CR1 and CR2) are used to trip at two different saturation points of MS1N and MS2N [56].
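The two-state behaviour of the derivative circuit can be captured in a simple behavioural model. This is a sketch only: it assumes an ideal comparator that trips exactly at the saturation current, and the load values (100 kOhm for the high impedance state, 1 kOhm for the low impedance state) are hypothetical numbers chosen for illustration, not values from this design.

```python
def derivative_load_kohm(i_out_ua, i_sat_ua, r_high_kohm, r_low_kohm):
    """Load seen by the back-propagated current.

    High impedance while the forward output is in the linear region,
    low impedance once the forward output has saturated.
    """
    saturated = abs(i_out_ua) >= i_sat_ua
    return r_low_kohm if saturated else r_high_kohm

def delta_voltage_mv(i_in_ua, i_out_ua, i_sat_ua, r_high_kohm, r_low_kohm):
    """Voltage developed by the error current I_in (uA * kOhm = mV)."""
    return i_in_ua * derivative_load_kohm(i_out_ua, i_sat_ua, r_high_kohm, r_low_kohm)

# Same error current, linear versus saturated forward output.
print(delta_voltage_mv(10, 100, 1500, 100, 1))    # linear region
print(delta_voltage_mv(10, 1500, 1500, 100, 1))   # saturated
```

The ratio of the two load values sets how strongly learning is suppressed for a saturated neuron in this idealisation.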

### 3.4 Weight Matrix

Neural networks "learn" by modifying weights (synapses). The weights must be alterable and should take a wide range of positive and negative values. The incremental weight changes should be small [33]. If continuous weight values are used, then there is a need to store these values. This storage requirement imposes a quantization effect, either because of digital storage and A/D converters, or because analog storage must counter the effect of noise. The effective noise in the system determines the dynamic range of analog values that can be stored and retrieved, i.e. the resolution. An even more stringent requirement is the development of a high density storage medium which is readily accessible in IC form. Digital storage with A/D and D/A converters will not have the required chip density. Therefore analog storage is the solution [34].

There are a variety of methods of producing analog storage. Capacitors and integrators allow the stored charge to degrade too fast. In pure analog storage there is no noise margin and hence no possibility of signal restoration. An analog signal can only be maintained, with memory decay, and the design objective is to maintain the signal as long as necessary.

An analog memory element can be characterized by: (1) location, on-chip or off-chip, (2) volatility, volatile or non-volatile, (3) programming/erasing method, electrical or non-electrical, and (4) the precision in bits.

Storage of analog weights requires analog memories that are (1) truly non-volatile, for long term retention of the stored knowledge, (2) programmable on-chip and rapidly, to expedite network learning by minimizing read and write times, and (3) application specific yet simple, for ease of fabrication. Discrete programming of true analog memories results in finite resolution, usually specified in bits.

Several analog memory designs have been presented in the literature [36]. Furman and Abidi [18] presented a feed forward network with back error propagation. The weights are stored as charges on capacitors on the nodes at cryogenic temperatures. Card and Schneider have used capacitors with positive or negative charges with periodic refreshing using training data. Bibyk et al. [37] used floating gate MOS transistors to store charges. Hubbard, Schwartz and Howard [29] introduced a circuit utilizing dynamic charge storage on MOS capacitors. Hoecht et al. [38] presented a method in which a finite number of charge levels can be stored on a MOS capacitor. These charge levels are preserved by sense circuitry and regular refreshes. Additional analog designs [39-42] are also present in the literature.

The favorable learning feature of the TRIT model is that the weights are varied in parallel and across the whole network. This eliminates the need for complex circuitry to locate the weights. As the magnitudes of the weight changes are predetermined, the weight modification is further simplified. Floating-gate analog semiconductor memories have been proposed by a number of researchers [43] as a suitable analog medium for the long-term storage of the weights. Y. Tsividis and S. Satyanarayana [44] suggested storing analog voltages on the gate capacitance of the MOS transistor itself. The inherent non-linearity of a transistor can be cancelled by using complementary input voltages through matched weighting transistors, or by passing the same voltages through complementary weighting transistors: the n-channel and the p-channel. Learning takes place by addressing the proper capacitors and charging them according to a specified learning algorithm. Once the MOS weights have settled (RC time constant), the capacitors are periodically accessed for reading, charging and refreshing. This scheme suffers from a relatively short retention time, resulting in decreased accuracy. As a result, the network becomes "absent minded", forgetting information shortly after learning.

### 3.4.1 Floating gate analog (FGA) memories

Floating gate analog memories are alterable and non-volatile. They provide local on-chip weight storage on the floating gate of a transistor. They are small, consume less power, have slow memory decay and are compatible with standard fabrication processes. The extra gate layers are used to store trapped charges on a floating gate. Once trapped, these charges produce a shift in the gate to source voltage, which varies the current through the transistor. This type of memory element exhibits long term retention because no discharge path is available, since the gate is surrounded by the dielectric material $\mathrm{SiO}_{2}$. This memory transistor is operated in the triode region, where the non-linearity of the transistor is fairly low. Usually depletion devices are used to eliminate the floating bias.

The charge on the floating gate of a transistor represents the value of the weight. As the network learns, the strength of the synapse increases. This is the electrical equivalent of dumping more charge on the floating gate, i.e. programming, and thereby modulating the electrical conductivity of the synapse (PMOS). Thus during programming, the electrical conductivity of the synapse is expected to increase. The P-sense transistor was specifically chosen to achieve this desired operation. While programming, the floating gate acquires electrons, which develop a negative potential on the floating gate of the PMOS sense transistor. The floating gate voltage tends to become more negative as programming proceeds. Therefore, the drain current through the device increases, i.e. the conductivity increases.

Until recently, the memories discussed above required a special fabrication process such as an ultrathin window, nitride trap oxide, or a conventional textured polysilicon. Usually, these special processes are expensive, immature, and simply not available in many design environments, especially universities. In order to fulfill the need of analog neural network designers for programmable memories, the existing standard CMOS process had to be used without modifications to realize floating gate memories. Recently several such implementations have been reported [45-46]. The interested reader is referred to the earlier work in this field by S. Patil [47].

A number of these floating gate analog memories can be interconnected suitably to form a weight matrix structure. The same weight matrix can be used in both forward and back propagation increasing both density and yield.

Current summing is used for the common analog computation of the inner product of the weight vector and the input vector. Current summing offers more dynamic range, which is of importance in signal processing applications. The linearity is due to summing the currents of the non-linear transconductor elements into a virtual ground. The common mode nonlinear terms are eliminated, while the difference currents develop an inner product computation with wide dynamic range. The compact method of multiplying for the inner product uses a single transistor per cell.

Figure 26. Principle of Weight multiplier

From Figure 26, the two drain currents are

$$
\begin{equation*}
I_{D 1}=\beta\left(\left(V_{G S}-V_{T}+V_{W T}\right) V_{D S}-V_{D S}^{2} / 2\right) \tag{27}
\end{equation*}
$$

$$
\begin{equation*}
I_{D 2}=\beta\left(\left(V_{G S}-V_{T}\right) V_{D S}-V_{D S}^{2} / 2\right) \tag{28}
\end{equation*}
$$

The output current as shown in Figure 26 is

$$
\begin{equation*}
I_{O}=I_{D 1}-I_{D 2}=\beta V_{W T} V_{D S} \tag{29}
\end{equation*}
$$
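The cancellation of the common mode terms can be verified numerically. The sketch below evaluates the triode-region expressions (27) and (28) directly and subtracts them; term by term, the $(V_{GS}-V_{T}) V_{DS}$ and $V_{DS}^{2}/2$ contributions cancel and $\beta V_{WT} V_{DS}$ remains. A factor of two would appear only if the second device were driven with $-V_{WT}$ rather than with no weight voltage. The numerical values are arbitrary and chosen only for illustration.

```python
def triode_current(beta, v_gs, v_t, v_ds, v_wt=0.0):
    """Triode-region drain current with an optional weight voltage on the gate."""
    return beta * ((v_gs - v_t + v_wt) * v_ds - v_ds ** 2 / 2.0)

def weight_product(beta, v_gs, v_t, v_ds, v_wt):
    """I_D1 - I_D2 as defined by equations (27) and (28)."""
    i_d1 = triode_current(beta, v_gs, v_t, v_ds, v_wt)
    i_d2 = triode_current(beta, v_gs, v_t, v_ds)
    return i_d1 - i_d2

# The difference depends only on beta, V_WT and V_DS, not on V_GS or V_T.
for v_gs in (2.5, 3.0, 3.5):
    print(v_gs, weight_product(20.0, v_gs, 0.9, 0.2, 0.5))
```

The output is the same for every gate bias, which is the property that lets a single transistor per cell act as a linear multiplier of the stored weight and the input.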

The TRIT backpropagation IC consists of this analog EEPROM weight matrix surrounded by current conveyors.

### 3.4.2 Weight Adjustment Circuit

Figure 27. Weight Adjustment Circuitry-1

Figure 28. Weight Adjustment Circuitry-2

The weight adjustment circuitry is shown in Figures 27 and 28. The circuit implements the TRIT algorithm based on the values of the delta and output vectors. The comparator CH1 switches from the high to the low state if the input delta value exceeds $\epsilon_{2}$. CH2 switches from high to low if the value of the delta vector becomes less than $\epsilon_{2}$. The switching of either comparator results in a high state being latched in the latch (N9 and N10). The complemented output of the latches is fed to the transmission gate, which is clocked. The latch states are ORed, clocked and fed to a NOR gate. The other input of the NOR gate is the strobe (STR) control signal. A high state at any of the latches is translated to the INC signal being high. Similar actions take place for the output of comparator CH2, which results in the DEC signal being high if the delta vector is less than $\epsilon_{2}$. Similar logic can be implemented for the output (O) vectors.
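The comparator and latch logic described above amounts to a three-level (increment, decrement, hold) decision per weight. The following behavioural sketch illustrates that decision. It rests on several assumptions that are not stated in the text: the two comparator thresholds are taken to be symmetric at $+\epsilon_{2}$ and $-\epsilon_{2}$, the strobe is modelled as an active-high enable, and the weight changes by a fixed hypothetical step. The function names are illustrative only.

```python
def weight_adjust_signals(delta, eps2, strobe=True):
    """Return (inc, dec) for one weight.

    Assumed thresholds: INC when delta > +eps2, DEC when delta < -eps2,
    neither when |delta| <= eps2. The strobe gates both outputs.
    """
    inc = strobe and delta > eps2
    dec = strobe and delta < -eps2
    return inc, dec

def apply_trinary_update(weight, delta, eps2, step):
    """Move the weight by +step, -step or 0 according to the INC/DEC signals."""
    inc, dec = weight_adjust_signals(delta, eps2)
    return weight + step * (int(inc) - int(dec))

print(weight_adjust_signals(0.5, 0.1))    # large positive delta
print(weight_adjust_signals(-0.5, 0.1))   # large negative delta
print(weight_adjust_signals(0.05, 0.1))   # inside the dead band
```

Because the step size is fixed in advance, the hardware only has to route the INC and DEC signals to the weight cell and does not need to represent the magnitude of the update.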

### 3.5 Sample and Hold (S/H) circuit

### 3.5.1 Introduction

A dynamic current copier (also called a current self-calibrating circuit or a dynamic current mirror) is used. The gate capacitor of the MOS device is used to store the information for a short period of time, since the gate of a MOS device has practically infinite input impedance. Figure 29 shows the basic N-copier cell. To sample the input current, switches $S_{1}$ and $S_{2}$ are closed. The gate capacitor $C_{G S}$ of $M_{1}$ will charge to the voltage $V_{G S}$ required by the transistor to achieve the drain current $I_{O}$. If $M_{1}$ is in saturation, the gate voltage is given by:

$$
\begin{equation*}
V_{G S}=\sqrt{\frac{2 I_{O}}{\beta_{1}}}+V_{T 1} \tag{30}
\end{equation*}
$$

The switches $S_{1}$ and $S_{2}$ are then opened successively. The circuit goes into the hold phase and stores the current information as the capacitor voltage on the gate of M1.


Figure 29. Basic Sample and Hold Circuit

Since the gate voltage of M1 is coupled to transistors M2 and M3, an equivalent current can sink through M2 during the hold phase. The P-copier cell can be obtained by replacing the NMOS with a PMOS transistor and reversing the direction of the currents. In such a case, the cell will source $I_{O}$ when connected to the load. The copier cells need not be accurately matched. An error current ($\Delta I$) is present due to: (1) charge sharing between the gate capacitor $C_{GS}$ and the switch capacitor $C_{GSSW}$, (2) the channel length modulation parameter, and (3) junction leakage associated with $S_{1}$, causing a steady discharge of the storage capacitor.

The minimum dimension switches, $M_{SP}$ and $M_{SN}$, are used to reset the gate voltage of the hold capacitors. A dummy switch can be added in series with the switching transistor to further minimize the effect of charge sharing. The channel length modulation error is reduced by cascoding the current sampling and holding transistors. Dynamic biasing of these cascode transistors gives improved cascoding and an improved current transfer ratio. $M_{CN}$, $M_{IPA}$, $M_{IPB}$ and $M_{RN}$ are used for the dynamic biasing circuitry.

### 3.5.2 Errors in S/H circuit

### 3.5.2.1 Charge Injection Error

The switching transistor is made conductive by mobile carriers that are attracted into the channel by the gate voltage during its closing. For charge equilibrium, the total charge $q$ of the mobile carriers in the channel must be equal to the total charge stored on the gate. The charge is stored on the gate in strong inversion. In the N-copier, when switch $S_{1}$ opens, a fraction $\Delta q$ of $q$ is dumped on the capacitor $C_{GSN}$, which causes an error in the stored voltage. This voltage error ($\Delta V$) in turn creates a relative error in the output current of the copier. $\Delta V$ can be decreased by making the switch gate oxide capacitance a small percentage of $C_{GSN}$, where one limit is given by the area of $C_{GSN}$. It can also be decreased by reducing the total charge $q$ in the channel, which in turn reduces the fraction $\Delta q$ that flows onto $C_{GSN}$. This can be achieved by minimizing the gate area W x L and/or by controlling the gate voltages of the switch or increasing $V_{GSN}$. Similar treatment applies to the P-copier for determination of the error due to charge injection. The factor $\Delta q$ determines the amount of charge that is dumped on the source.

### 3.5.2.2 Switch Feedthrough Error

This contribution is due to the clock voltage that is coupled to the gate via $C_{G D}$. The clock voltage is partially transferred to the gate via the capacitive network as

$$
\begin{equation*}
\Delta V=\frac{C_{G D}}{C_{G D}+C_{G S N}} V_{\phi_{2}} \tag{31}
\end{equation*}
$$

where $V_{\phi_{2}}$ is the clock voltage and $C_{GD}$ is the gate to drain capacitance of the transmission gate. The change in the gate voltage multiplied by the transconductance reflects an error in the drain current. This error is reduced by connecting a dummy transistor in series with the switching transistor [47].
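The size of the feedthrough error can be estimated from equations (30) and (31). The sketch below assumes the storage transistor is in saturation, so that its small-signal transconductance is $g_{m}=\sqrt{2 \beta I_{O}}$, and it treats the current error to first order as $g_{m} \Delta V$. The capacitance, current and $\beta$ values are hypothetical and chosen only to give round numbers.

```python
import math

def hold_vgs(i_o_ua, beta_ua_per_v2, v_t):
    """Equation (30): gate voltage stored for a sampled current I_O."""
    return math.sqrt(2.0 * i_o_ua / beta_ua_per_v2) + v_t

def feedthrough_dv(c_gd_ff, c_gs_ff, v_clk):
    """Equation (31): clock step coupled onto the gate by the capacitive divider."""
    return c_gd_ff / (c_gd_ff + c_gs_ff) * v_clk

def current_error_ua(i_o_ua, beta_ua_per_v2, dv):
    """First-order current error g_m * dV with g_m = sqrt(2*beta*I_O)."""
    g_m = math.sqrt(2.0 * beta_ua_per_v2 * i_o_ua)  # uA/V
    return g_m * dv

dv = feedthrough_dv(1.0, 99.0, 5.0)     # 1 fF switch overlap, 99 fF storage, 5 V clock
err = current_error_ua(50.0, 100.0, dv)  # 50 uA sampled current, beta = 100 uA/V^2
print(f"V_GS = {hold_vgs(50.0, 100.0, 0.9):.2f} V, dV = {dv*1e3:.0f} mV, dI = {err:.1f} uA")
```

With these illustrative numbers a 1 % capacitive ratio already produces a current error of about 10 % of the sampled value, which indicates why a dummy switch and a large storage capacitance relative to the switch are worthwhile.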

### 3.5.2.3 Cascode Configurations

Channel length modulation produces a change in the drain current as the drain to source voltage changes. The $\lambda$ effect is reduced by cascoding the transistors using a regulated cascode structure, as shown in Figure 30.

### 3.5.3 Simulations

The transient simulation of the sample and hold circuit is shown in Figure 31. The output current is out of phase with the input current. The output current is the sampled and stored value of the input current.


Figure 30. Sample and Hold Circuit


Figure 31. Transient Characteristics of $\mathrm{S} / \mathrm{H}$ circuit

## CHAPTER IV

## CONCLUSIONS AND FUTURE PROSPECTS

The design of the basic building blocks for the TRIT algorithm in the TSOS process is completed. A single weight matrix is used in both the forward and backpropagation modes, reducing the area by a factor of two. A Matlab program which partially simulates the transistor mismatches in this architecture was also developed. The TRIT program with the transistor imperfections demonstrates faster convergence than BP and insensitivity to MOS parameter variation.

The system level integration of the TRIT model will require the exact specification of all the system parameters. The optimal values of the learning rate, $\epsilon_{1}$, and $\epsilon_{2}$ should be investigated. The forward propagation parameters like $I_{SAT}$, $I_{WT}$ and $V_{0}$, and the backpropagation parameters like $\mathbf{Q}$'s, $R_{L}$ and $R_{H}$, should be specified at the system level.

The fabricated blocks have to be tested thoroughly to verify their effectiveness and then further refined. The high power current conveying transistors are very large. The derivative circuit can be further improved.

Floating gate memories provide the best answer for electrically programmable/erasable non-volatile semiconductor memories. Reduction in cell size, improvement in performance, and higher circuit density will be the products of floating gate memory research. Future developments in FGA memories therefore have to be followed closely so that they can be adopted in our design.

For effective learning, local or on-chip storage and modification of the weights is the preferred solution. The task of weight updates is complex since it involves issues related to high voltage, the learning algorithm, and weight storage. Precise control of the weights needs extensive experimentation to mathematically model and understand the programming and erasing behavior of floating gate memories.

The on-chip generation of high voltage poses an additional challenge. However, the tunneling physics and high voltage pulse generation are two separate issues and should initially be handled separately for conceptual testing and understanding, and then be combined. Other issues relating to the weight matrix are cell layout, placement, and signal routing. Cell layout will have a direct impact on both the silicon area and the cell performance. Significant expertise is required to arrive at the optimal design. A suitable signal routing scheme is required since the weight matrix is expected to be dominated by routing wires. In this regard, high voltage concerns such as field threshold, reverse breakdown, etc. need special attention.

Process maturity will play a very important role in TRIT design. Also, better analog simulation tools which represent the transistor more exactly have to be used to further improve the design efficiency.

1. Rumelhart, D.E., Hinton, G.E., \& Williams, R.J. (1986). Learning internal representations by error propagation, Parallel Distributed Processing, Cambridge, MA, MIT Press, pp. 318-362
2. Richard P. Lippman, "An Introduction to Computing with Neural Nets," IEEE ASSP Magazine, April 1987, pp. 4-22
3. Derek B.I.Feltham, and Wojciech Maly, "Physically Realistic Fault Models for Analog CMOS Neural Networks," IEEE Transactions on Neural Networks, 1991, pp. 1223-1230
4. Bernhard E. Boser, Eduard Sackinger, Yann Le Cun, Lawrence D. Jackel, "An Analog Neural Network Processor with Programmable Topology," IEEE Journal on Solid State Circuits, December 1991, pp. 2017-2024
5. Karl Goser, Ulrich Hilleringmann, Ulrich Ruekert, and Klaus Schumacher (1989). VLSI Technologies for Artificial Neural Networks, Dec 1989, pp. 28-43
6. Shoemaker, P.A., Shimabukoro, R., and Michael J. Carlin (1991), "Back Propagation Learning with Trinary Quantization of Weight Updates," Neural Networks, Vol. 4, pp. 231-241
7. K.A. Boahen, R.E. Jenkins et al., "A Heteroassociative Memory Using Current-Mode MOS Analog VLSI Circuits," IEEE Transactions on Circuits and Systems, 1989, vol. 36, pp. 747-755
8. S.W. Tsay and R.W. Newcomb, "VLSI Implementation of ART1 Memories," IEEE Transactions on Neural Networks, vol. 2, 1991, pp. 214-221
9. A.F. Murray, D. Del Corso, and L. Tarassenko, "Pulse-Stream VLSI Neural Networks Mixing Analog and Digital Techniques," IEEE Transactions on Circuits and Systems, vol. 36, 1989, pp. 193-204
10. B. Linares, E. Sanchez, A.Rodriguez, and J.L.Huertas, "Modular Analog Continuous-Time VLSI Neural Networks with on Chip Hebbian Learning and Analog Storage," IEEE Transactions on Neural Networks, 1992, pp. 1533-1536
11. Simon Y. Foo, Lisa R. Anderson, Yoshigasu Takefuji, "Analog Components for the VLSI of Neural Networks," Circuits and Systems, 1990, pp. 18-25
12. C. Toumazou, F. J. Lidgey, and D. G. Haigh, Analogue IC Design: The Current-Mode Approach, Eds., Peregrinus, London, 1990
13. Christian Schneider and Howard Card, "CMOS Implementation of Analog Hebbian Synaptic Learning Circuits," IEEE Transactions on Neural Networks, 1991, pp. I437-I442
14. Robert C. Frye, Edward A. Rietnam, and Chee C. Wong, "Back-Propagation Learning and Nonidealities in Analog Neural Network hardware," IEEE Transactions on Neural Networks, 1991, pp. 110-117
15. S. Espejo, A. Rodriguez et. al, "Switched-Current Techniques for Image Processing cellular neural networks in MOS VLSI," IEEE Transactions on Neural Networks, 1992, pp. 1537-1540
16. Jackel, L.D, Graf, H.P., and Howard, R.E.,"Electronic Neural Network Chips," Applied Optics, 1987, pp. 5077-5080
17. Alspector, J., Allen, R.B., Hu, V., & Satyanarayana, S. (1988), "Stochastic learning networks and their implementation," Proceedings of the IEEE Conference on Neural Information Processing Systems, pp. 9-21
18. Furman, B., and Abidi, A., "CMOS Analog IC implementing the back propagation algorithm," First Annual Meeting (Abstract), Neural Networks, 1988, pp. 38
19. Shoemaker, P.A., and Shimabukuro, R. (1988), "A modifiable weight circuit for use in adaptive neuromorphic networks," Neural Networks, 1, Sup. 1, 409
20. Hu, V., Kramer, A., and Ko, P.K., "EEPROM's as analog storage devices for neural nets," Neural Networks, 1988, pp. 385
21. Shimabukuro, R.L., Shoemaker, P.A., and Stewart, M., "Circuitry for artificial neural networks with non-volatile analog memories," Proceedings of the IEEE Symposium on Circuits and Systems, 1989, pp. 1217-1220
22. Holler,M.,Tam, S.Castro,H., and Benson, R., "An electrically trainable artificial Neural network (ETANN) with 10240 "floating gate" synapses," Proceedings of the IEEE Joint Conference on Neural Networks, 1989, pp. 177-182
23. Alan F. Murray, and Anthony Smith, "Asynchronous VLSI Neural Networks using Pulse-Stream Arithmetic," IEEE Transactions on Neural Networks, 1988, pp. 688-697
24. Peterson, C., & Hartman, E. (1989), "Exploration of the mean field theory learning algorithm," Neural Networks, 2, pp. 475-494
25. M.Marchesi, G.Orlandi et.al.,"Multi-layer Perceptrons with Discrete Weights", Proc. IEEE ISCAS 1990, New Orleans, pp. 623-629
26. Bernd Hofflinger, Stefan Neuber et al., "VLSI Implementation of a Neural Car Collision Avoidance Controller," IEEE Transactions on Neural Networks, 1991, pp. I493-I499
27. C.A. Mead, Analog VLSI and Neural Systems, Addison Wesley Publishing Co. Inc., 1989
28. Martin Hagan (1992), Back Propagation Class Notes on Neural Networks (ECEN 5050.3), 1991
29. Kurosh Madani, Patrick Gadra, Eric Belhaire, and Francis Devos, "Two analog Counters for Neural Network Implementation," IEEE Transactions on Neural Networks, 1991, pp. 966-973
30. Paul Hasler and Lex Akers, "Circuit Implementation of Trainable Neural Networks Employing both Supervised and Unsupervised Techniques," IEEE Transactions on Neural Networks, 1992, pp. 1565-1568
31. W.R. Smith, "Trinary Back-Propagation Simulation with Component Nonidealities," NOSC Newsletter
32. Randy L. Shimabukuro, Pat Shoemaker et al., "Effects of Circuit Parameters on Convergence of Trinary Update Back-Propagation"
33. Daniel B. Schwartz, Richard E. Howard, and Wayne E. Hubbard, "A programmable Analog Neural Network Chip," IEEE Transactions on Neural Networks, 1989, pp. 313-319
34. J.Raffel, J.Mann et. al., "A generic architecture for wafer-scale neuromorphic systems," IEEE Conference of Neural Networks, Vol. 3, 1987, p. 501
35. P. A. Shoemaker, C. G. Hutchens and, S. B. Patil, "A Hierarchical Clustering Network Based on a Model of Olfactory Processing," Submitted, 1992
36. Aria Nosratinia, M. Ahmadi, M. Sridhar, G.A. Jullien, "A Hybrid Architecture for Multi-layer Neural Networks," IEEE Transactions on Neural Networks, 1992, pp. 1541-1544
37. T.H. Borgstrom, M. Ismail, and S.B. Bibyk, "Programmable current mode network for implementation in analogue VLSI," IEEE Proceedings on Neural Networks, 1990, pp. 75-84
38. B.Hochet, V.Peiris, S.Abdo et. al., "Implementation of a learning Kohonen neuron based on a new multilevel storage technique," IEEE Journal on Solid State circuits, 1989, pp. 262-267
39. P. Mueller et. al, "A general purpose analog neural computer," IEEE Second International Conference on Neural Networks, 1988, pp. 177-182
40. H.P.Graf and L.D.Jackel, "Analog Electronic Neural Network Circuits," IEEE Circuits and Devices Magazine, July 1989, pp. 44-49
41. E.A.Vittoz, "Analog VLSI Implementation of Neural Network," Proceedings of International Symposium on Circuits and Systems, 1990, pp. 2524-2527
42. F.M.A. Choi et al., "An All-MOS Analog Feedforward Neural Circuit with Learning," Proceedings of International Symposium on Circuits and Systems, 1990, pp. 2508-2511
43. R. L. Shimabukuro, and P. A. Shoemaker, "Circuitry for Artificial Neural Networks with Nonvolatile Analog Memories," Proceedings, IEEE International Symposium on Circuits and Systems, pp. 1217-1220, 1989
44. Y. Tsividis, and S. Satyanarayana, "Analog Circuits for Variable-Synapse Electronic Neural Networks," Electronics Letters, Vol. 23, No. 24, pp. 1313-1314, November 1987
45. L. R. Carley, "Trimming Analog Circuits Using Floating-Gate Analog MOS Memory," IEEE Journal of Solid-State Circuits, Vol. 24, No. 6, pp. 1569-1575, December 1989
46. B. W. Lee, B. J. Sheu, and H. Yang, "Analog Floating-Gate Synapses for General-Purpose VLSI Neural Network Computation," IEEE Transactions on Circuits and Systems, Vol. 38, No. 6, June 1991
47. S.B. Patil, "VLSI Design of Olfactory Network," Master's Thesis, Oklahoma State University, 1992
48. C. Toumazou, J. Lidgey, and D. Haigh, "Introduction," Ch. 1 in Analogue IC Design: The Current-Mode Approach, C. Toumazou, F. J. Lidgey, and D. G. Haigh, Eds., Peregrinus, London, 1990
49. K.C. Smith and A.S. Sedra, "The current conveyor - a new circuit building block," Proceedings of IEEE, vol. 56, pp. 1368-1369, Aug 1968
50. A.S.Sedra and K.C.Smith, "A second-generation current conveyor and its applications," IEEE Transactions on Circuit Theory, Vol CT-17, pp. 132-134, Feb 1970
51. A.S.Sedra, "A new approach to active network synthesis," Ph.D Thesis, University of Toronto,1969
52. A. S. Sedra, and G. W. Roberts, "Current Conveyor Theory and Practice," Ch. 3 in Analogue IC Design: The Current-Mode Approach, C. Toumazou, F. J. Lidgey, and D. G. Haigh, Eds., Peregrinus, London, 1990
53. Paul R. Gray and Robert G. Meyer, Analysis and Design of Analog Integrated Circuits (2nd Edition), Wiley, NY, 1984
54. S. B. Patil, and C. G. Hutchens, "A Novel Squashing Function for Electronic Implementation of Neural Networks," 5th Oklahoma Symposium of Artificial Intelligence, 1991
55. Y. Tsividis, Operation and Modeling of the MOS Transistor
56. P. E. Allen, and D. R. Holberg, "Two Stage Comparators," Ch. 7 in CMOS Analog Circuit Design, HRW Inc., 1987

## APPENDIX A

## STARTUP PROGRAM

```
% program to do initial calculations
p=[];
% Note: 1:63 as printed yields 63 samples, while the loops below index 64 inputs.
for i=1:63
    p=[p sin(8*pi/63*i)];
end;
p=p';
t=p;
hidden_units=64;
% first-layer weights: 0.5 plus noise within +/-0.02
for m=1:hidden_units
    for n=1:64
        for j=1:100
            vari=rand-0.5;
            vari=vari/25;
            if abs(vari)<=0.02,
                break;
            end;
        end;
        w1(m,n)=0.5+vari;
    end;
end;
% second-layer weights
for m=1:hidden_units
    for n=1:64
        for j=1:100
            vari=rand-0.5;
            vari=vari/25;
            if abs(vari)<=0.02,
                break;
            end;
        end;
        w2(n,m)=vari+0.5;
    end;
end;
% hidden-layer offsets
for m=1:hidden_units
    for n=1:1
        for j=1:100
            vari=rand-0.5;
            vari=vari/25;
            if abs(vari)<=0.02,
                break;
            end;
        end;
        b1(m,n)=vari+0.5;
    end;
end;
% output-layer offsets
for m=1:64
    for n=1:1
        for j=1:100
            vari=rand-0.5;
            vari=vari/25;
            if abs(vari)<=0.02,
                break;
            end;
        end;
        b2(m,n)=vari+0.5;
    end;
end;
% hidden-layer outputs, limited to [-1, 1]
% Note: as printed, each pass over k overwrites n1(n,m) rather than accumulating.
for n=1:hidden_units
    for m=1:1
        for k=1:64
            n1(n,m)=w1(n,k)*p(k,m)+b1(n,m);
            a1(n,m)=n1(n,m);
            if n1(n,m)>1
                a1(n,m)=1.0;
            end;
            if n1(n,m)<-1
                a1(n,m)=-1.0;
            end;
        end;
    end;
end;
% output-layer outputs, limited to [-1, 1]
for m=1:1
    for n=1:64
        for k=1:hidden_units
            n2(n,m)=w2(n,k)*a1(k,m)+b2(n,m);
            a2(n,m)=n2(n,m);
            if n2(n,m)>1
                a2(n,m)=1;
            end;
            if n2(n,m)<-1
                a2(n,m)=-1.0;
            end;
        end;
    end;
end;
% initial sum of squared errors
e=t-a2;
sse=0;
for i=1:64
    for n=1:1
        sse=e(i,n)^2+sse;
    end;
end;
```
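The startup computation can also be sketched compactly outside MATLAB. The Python fragment below is an illustration only, not part of the thesis programs. The names `satlin`, `init_matrix`, `layer` and `startup` are hypothetical. Two assumptions are made here and are not taken from the listing: the input is sampled at 64 points (index 0 to 63), and each neuron sums its weighted inputs over all 64 inputs before the offset is added.

```python
import math
import random


def satlin(x):
    """Saturating linear activation: clip x to [-1, 1]."""
    return max(-1.0, min(1.0, x))


def init_matrix(rows, cols, rng, center=0.5, spread=0.02):
    """Entries are `center` plus uniform noise in [-spread, +spread]."""
    return [[center + rng.uniform(-spread, spread) for _ in range(cols)]
            for _ in range(rows)]


def layer(w, b, x):
    """One layer: satlin(w*x + b), accumulating over every input (assumed)."""
    return [satlin(sum(wij * xj for wij, xj in zip(row, x)) + bi)
            for row, bi in zip(w, b)]


def startup(n_in=64, n_hidden=64, n_out=64, seed=0):
    rng = random.Random(seed)
    # Assumption: 64 samples of the sine input, indexed 0..63.
    p = [math.sin(8 * math.pi / 63 * i) for i in range(n_in)]
    t = list(p)  # the target equals the input, as in the listing
    w1 = init_matrix(n_hidden, n_in, rng)
    w2 = init_matrix(n_out, n_hidden, rng)
    b1 = [row[0] for row in init_matrix(n_hidden, 1, rng)]
    b2 = [row[0] for row in init_matrix(n_out, 1, rng)]
    a1 = layer(w1, b1, p)
    a2 = layer(w2, b2, a1)
    sse = sum((ti - ai) ** 2 for ti, ai in zip(t, a2))
    return p, t, w1, w2, b1, b2, a1, a2, sse
```

Calling `startup()` returns the input, the target, the initial weights and offsets, both layer outputs, and the initial sum of squared errors, which corresponds to the value `sse` that the later programs read.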


## APPENDIX B

## TRIT PROGRAM

```
epsilon2=input('Enter the value of epsilon2:');
learning_rate=input('Enter the value of learning rate:');
epsilon1=0.33;
errors=[sse];
hidden_units=64;
examp=0;
% random noise within +/-0.02 is generated
for n=1:64
    for m=1:hidden_units
        for j=1:100
            vari=rand-0.5;
            vari=vari/25;
            if abs(vari)<=0.02,
                break;
            end;
        end;
        noisew1(m,n)=vari;
    end;
end;
for n=1:hidden_units
    for m=1:64
        for j=1:100
            vari=rand-0.5;
            vari=vari/25;
            if abs(vari)<=0.02,
                break;
            end;
        end;
        noisew2(m,n)=vari;
    end;
end;
% beta variation
for n=1:1
    for m=1:hidden_units
        for j=1:100
            vari=rand-0.5;
            vari=vari/25;
            if abs(vari)<=0.02,
                break;
            end;
        end;
        delbeta1(m,n)=vari;
    end;
end;
for n=1:1
    for m=1:64
        for j=1:100
            vari=rand-0.5;
            vari=vari/25;
            if abs(vari)<=0.02,
                break;
            end;
        end;
        delbeta2(m,n)=vari;
    end;
end;
% VT variation
for n=1:1
    for m=1:hidden_units
        for j=1:100
            vari=rand-0.5;
            vari=vari/25;
            if abs(vari)<=0.2,
                break;
            end;
        end;
        delvta1(m,n)=vari;
    end;
end;
for n=1:1
    for m=1:64
        for j=1:100
            vari=rand-0.5;
            vari=vari/25;
            if abs(vari)<=0.2,
                break;
            end;
        end;
        delvta2(m,n)=vari;
    end;
end;
% iteration starts
for i=1:250
    check=sse;
    if sse<=0.1,
        i=i-1, break;
    end;
    % calculation of deltas
    for m=1:64
        for n=1:1
            d2(m,n)=t(m,n)-a2(m,n);
            if abs(n2(m,n))>=1.0,
                d2(m,n)=0;
            end;
        end;
    end;
    d2=t-a2;
    d1=w2'*d2;
    % weight & offset correction
    for m=1:hidden_units
        for n=1:1
            if abs(n1(m,n))>=1,
                d1(m,n)=0;
            end;
        end;
    end;
    for n=1:1
        for m=1:hidden_units
            if d1(m,n)>=epsilon2,
                b1(m,n)=b1(m,n)+learning_rate;
            end;
            if d1(m,n)<=-epsilon2,
                b1(m,n)=b1(m,n)-learning_rate;
            end;
        end;
    end;
    for n=1:1
        for m=1:64
            if d2(m,n)>=epsilon2,
                b2(m,n)=b2(m,n)+learning_rate;
            end;
            if d2(m,n)<=-epsilon2,
                b2(m,n)=b2(m,n)-learning_rate;
            end;
        end;
    end;
    % second-layer weights: fixed step whose sign follows d2*a1
    for m=1:64
        for n=1:hidden_units
            if abs(d2(m))>=epsilon2,
                if abs(a1(n))>=epsilon1,
                    if d2(m)*a1(n)>=0
                        w2(m,n)=w2(m,n)+learning_rate+noisew2(m,n);
                    end;
                    if d2(m)*a1(n)<=0
                        w2(m,n)=w2(m,n)-learning_rate+noisew2(m,n);
                    end;
                end;
            end;
            % keep the weight magnitude between 0.2 and 3
            if abs(w2(m,n))<=0.2,
                if w2(m,n)<0,
                    w2(m,n)=-0.2;
                end;
                if w2(m,n)>0,
                    w2(m,n)=0.2;
                end;
            end;
            if abs(w2(m,n))>3,
                if w2(m,n)<0,
                    w2(m,n)=-3;
                end;
                if w2(m,n)>0,
                    w2(m,n)=3;
                end;
            end;
        end;
    end;
    % first-layer weights: fixed step whose sign follows d1*p
    for m=1:64
        for n=1:hidden_units
            if abs(d1(n))>=epsilon2,
                if abs(p(m))>=epsilon1,
                    if d1(n)*p(m)>=0
                        w1(n,m)=w1(n,m)+learning_rate+noisew1(n,m);
                    end;
                    if d1(n)*p(m)<=0
                        w1(n,m)=w1(n,m)-learning_rate+noisew1(n,m);
                    end;
                end;
            end;
            if abs(w1(n,m))<=0.2,
                if w1(n,m)<0,
                    w1(n,m)=-0.2;
                end;
                if w1(n,m)>0,
                    w1(n,m)=0.2;
                end;
            end;
            if abs(w1(n,m))>=3,
                if w1(n,m)<0,
                    w1(n,m)=-3;
                end;
                if w1(n,m)>0,
                    w1(n,m)=3;
                end;
            end;
        end;
    end;
    % calculation of outputs
    for n=1:64
        for m=1:1
            for k=1:64
                n1(n,m)=w1(n,k)*p(k,m)+b1(n,m);
                a1(n,m)=n1(n,m);
                if n1(n,m)>1
                    a1(n,m)=1.0;
                end;
                if n1(n,m)<-1
                    a1(n,m)=-1.0;
                end;
            end;
        end;
    end;
    for m=1:1
        for n=1:64
            for k=1:hidden_units
                n2(n,m)=w2(n,k)*a1(k,m)+b2(n,m);
                a2(n,m)=n2(n,m);
                if n2(n,m)>1
                    a2(n,m)=1;
                end;
                if n2(n,m)<-1
                    a2(n,m)=-1.0;
                end;
            end;
        end;
    end;
    % beta and VT variation applied to the hidden-layer outputs
    for m=1:hidden_units
        for n=1:1
            if a1(m,n)>-0.5,
                a1(m,n)=a1(m,n)*(1+delbeta1(m,n));
            end;
            if a1(m,n)<=-0.5,
                a1(m,n)=a1(m,n)*(1+delvta1(m,n));
            end;
        end;
    end;
    e=t-a2;
    sse=0;
    % plotting of errors
    for m=1:64
        for n=1:1
            sse=sse+e(m,n)^2;
        end;
    end;
    errors=[errors sse];
    ploterr(errors);
    hold on;
end;
```

## APPENDIX C

## STANDARD BP PROGRAM

```
errors=[sse];
% iteration starts
for i=1:250
    if sse<=0.1,
        i=i-1,
        break;
    end;
    % calculation of deltas
    for m=1:5
        for n=1:3
            d2(m,n)=t(m,n)-a2(m,n);
            if abs(n2(m,n))>=1.0,
                d2(m,n)=0;
            end;
        end;
    end;
    d1=w2'*d2;
    % weight & offset correction
    for m=1:hidden_units
        for n=1:3
            if abs(n1(m,n))>=1,
                d1(m,n)=0;
            end;
        end;
    end;
    for n=1:3
        for m=1:hidden_units
            b1(m,n)=b1(m,n)+learning_rate*d1(m,n);
        end;
    end;
    for n=1:3
        for m=1:5
            b2(m,n)=b2(m,n)+d2(m,n)*learning_rate;
        end;
    end;
    for m=1:5
        for n=1:hidden_units
            w2(m,n)=w2(m,n)+d2(m)*a1(n)*learning_rate;
        end;
    end;
    for m=1:5
        for n=1:hidden_units
            w1(n,m)=w1(n,m)+p(m)*d1(n)*learning_rate;
        end;
    end;
    % calculation of outputs
    for n=1:hidden_units
        for m=1:3
            for k=1:5
                n1(n,m)=w1(n,k)*p(k,m)+b1(n,m);
                a1(n,m)=n1(n,m);
                if n1(n,m)>1
                    a1(n,m)=1.0;
                end;
                if n1(n,m)<-1
                    a1(n,m)=-1.0;
                end;
            end;
        end;
    end;
    for m=1:3
        for n=1:5
            for k=1:hidden_units
                n2(n,m)=w2(n,k)*a1(k,m)+b2(n,m);
                a2(n,m)=n2(n,m);
                if n2(n,m)>1
                    a2(n,m)=1;
                end;
                if n2(n,m)<-1
                    a2(n,m)=-1.0;
                end;
            end;
        end;
    end;
    e=t-a2;
    sse=0;
    for m=1:5
        for n=1:3
            sse=sse+e(m,n)^2;
        end;
    end;
    %if i==1,
    %    check=sse*2;
    %end;
    % check and epsilon2 are not assigned in this listing and must already exist in the workspace
    if check==sse,
        epsilon2=0.5*epsilon2,
    end;
    %if check>sse,
    %    learning_rate=1.07*learning_rate;
    %end;
    %if check<sse,
    %    learning_rate=learning_rate/1.02;
    %end;
    errors=[errors sse];
    %plot(errors);
    %hold on;
end;
```
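The difference between the weight correction in the standard BP program and the trinary correction in the TRIT program of Appendix B can be illustrated for a single weight. The Python fragment below is a sketch only, not the thesis code. The function names are hypothetical, and the default `eps2=0.1` is an assumed value, since the TRIT program reads `epsilon2` from the user. The threshold `eps1=0.33` and the limits 0.2 and 3 are taken from the Appendix B listing.

```python
def sign(x):
    """Return -1, 0 or +1 according to the sign of x."""
    return (x > 0) - (x < 0)


def bp_update(w, delta, act, lr):
    """Standard back propagation: the step is proportional to delta * act."""
    return w + lr * delta * act


def trit_update(w, delta, act, lr, eps1=0.33, eps2=0.1, noise=0.0):
    """Trinary update: the step is +lr, -lr, or no change at all.

    The weight moves only when |delta| >= eps2 and |act| >= eps1;
    `noise` models the fixed per-weight update error of the circuit.
    """
    if abs(delta) < eps2 or abs(act) < eps1:
        return w
    return w + lr * sign(delta * act) + noise


def clamp_weight(w, lo=0.2, hi=3.0):
    """Keep the weight magnitude within [lo, hi]; a zero weight is left alone."""
    if w == 0:
        return w
    mag = min(max(abs(w), lo), hi)
    return mag if w > 0 else -mag
```

With the same `delta`, `act` and `lr`, `bp_update` moves the weight by an amount that shrinks as the error shrinks, while `trit_update` always moves it by the full `lr` or not at all, which is why only the sign of the product has to be computed in hardware.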

## APPENDIX D

## ERRORS IN MULTIPLIER CIRCUIT

### D.1 Error in multiplier

$$
\begin{aligned}
I_{DN} &= (\beta_N + \Delta\beta)\left[\left(V_1 - (V_{TN} \pm \Delta V_T)\right)V_2 - V_2^2/2\right] \\
I_{DP} &= (\beta_P + \Delta\beta)\left[\left(V_1 - (V_{TP} \pm \Delta V_T)\right)V_2 - V_2^2/2\right] \\
I_0 &= I_{DN} - I_{DP}
\end{aligned}
$$

Assuming $\beta_N = \beta_P = \beta$,

$$
\begin{aligned}
I_0 &= 2\beta V_1 V_2 + 2\Delta\beta V_1 V_2 \pm 2\beta\,\Delta V_T V_2 \pm 2\Delta V_T V_2\,\Delta\beta \\
&= \text{Ideal term} + \text{ERROR}
\end{aligned}
$$

$$
\text{ERROR} = 2\Delta\beta V_1 V_2 \pm 2\beta\,\Delta V_T V_2 \pm 2\Delta V_T V_2\,\Delta\beta
$$

where the last term, a product of two small quantities, can be neglected:

$$
\text{ERROR} = 2\beta V_1 V_2\left(\Delta\beta/\beta \pm \Delta V_T/V_1\right)
$$

So

$$
I_0 = 2\beta V_1 V_2\left(1 + \Delta\beta/\beta \pm \Delta V_T/V_1\right)
$$

$$
\Delta\beta/\beta = 1\text{-}2\%
$$

Assuming $V_1 \approx 0.5\text{-}1$ V and $\Delta V_T \approx 5\text{-}10$ mV, the $\Delta V_T$ error is 1%.

### D.2 Error in output [F(.)] function

Due to the $\Delta V_T$ and $\Delta\beta$ errors, the output is modified. The output current is proportional to the square of the gate-to-source voltage, so the error in transconductance due to $\Delta\beta$ and $\Delta V_T$ can be derived as follows:

$$
\Delta I = \beta\left(V_C - V_T\right)^2 - (\beta \pm \Delta\beta)\left[V_C - V_T \pm \Delta V_T\right]^2
$$

Writing $\Delta V_C = V_C - V_T$,

$$
\Delta I = \beta\left(\Delta V_C\right)^2 - (\beta \pm \Delta\beta)\left[\Delta V_C \pm \Delta V_T\right]^2
$$

Simplifying, and neglecting second-order terms, the magnitude of the error is

$$
\Delta I = \beta\,\Delta V_C^2\left(\Delta\beta/\beta \pm 2\Delta V_T/\Delta V_C\right)
$$

If $\Delta V_C = V_T$, the $V_T$ error is 1-2%.

The $\beta$ error can be considered as a slope error or as added noise.
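
As a check on the percentages quoted in D.1 and D.2, the relative error terms can be evaluated at the ends of the stated ranges. This is a worked illustration only. The value $V_T \approx 1$ V used for D.2 is an assumption made here; it is not stated in this appendix.

$$
\begin{aligned}
\frac{\Delta V_T}{V_1} &= \frac{5\ \text{mV}}{0.5\ \text{V}} = \frac{10\ \text{mV}}{1\ \text{V}} = 1\% \\
\frac{\Delta\beta}{\beta} + \frac{\Delta V_T}{V_1} &= 2\% + 1\% = 3\% \quad \text{(worst case for the multiplier, signs adding)} \\
\frac{2\,\Delta V_T}{V_T} &= \frac{2 \times 5\ \text{mV}}{1\ \text{V}} = 1\% \quad \text{to} \quad \frac{2 \times 10\ \text{mV}}{1\ \text{V}} = 2\%
\end{aligned}
$$

If the two ranges in D.1 are combined the other way round, the threshold term spans 0.5% (5 mV at 1 V) to 2% (10 mV at 0.5 V), so the 1% figure should be read as a typical value rather than a bound.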

VITA

> Parthasarathy Balaji
> Candidate for the Degree of
> Master of Science

Thesis: DESIGN OF BUILDING BLOCKS FOR THE TRIT ALGORITHM
Major Field: Electrical and Computer Engineering
Biographical:
Personal Data: Born in Pondicherry, India, March 20, 1968, the son of K.N. Parthasarathy and K.P. Ranganayaki.

Education: Graduated from Madras Christian College School, India, in May 1986; received Bachelor of Engineering degree in Electrical and Electronics Engineering from College of Engineering in May 1990; completed requirements for the Master of Science degree at Oklahoma State University in May, 1993.

Professional Experience: Research Assistant, Department of Electrical Engineering, Oklahoma State University, January, 1992, to December, 1992.

