Abstract: Hebbian learning in analogue CMOS synapses is obtained by using the transistor characteristics to approximate the multiplicative correlation of neural signals. In situ analogue learning is employed, which means that computations of synaptic weight changes occur continuously during the normal operation of the artificial neural network. The transistor complexity of a synapse is minimised by departing from strict adherence to classical multiplicative rules; learning remains consistent, however, with the original qualitative statement of Hebb. Simulations of circuits with three transistors per synapse in the case of unipolar weights suggest that appropriate learning and forgetting behaviour is obtained at the synaptic level by adopting these area-efficient MOS learning rules in lieu of classical analytical formulations. The theory at the systems level corresponding to these learning rules has not yet been developed.
Introduction
Artificial neural networks consist of neurons and synapses; the area of electronic implementations is normally dominated by the synapses, as there are many more of these than there are neurons [...]
Provided the artificial neurons produce both $+V_j$ and $-V_j$, an inhibitory weight may be realised by employing $-V_j$. In this way, all weights $w_{ij}$ can be positive (unipolar weights). In Fig. 2 the weight $w_{ij}$ is adjusted according to a learning algorithm; this adjustment has normally been performed externally and applied as $V_w$. In some cases the weight storage (which refreshes the capacitor voltage) is on-chip but the learning algorithm runs off-chip. Alspector [...]
'When an axon of cell j is near enough to excite a cell i and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that j's efficiency, as one of the cells firing i, is increased.'
[...] The drain current through this device, which contributes to the total current into neuron i, is given for $V_w > V_T$ by
$$I_{ij} = K\left[(V_w - V_T)\,V_j - \frac{V_j^2}{2}\right] \approx K\,(V_w - V_T)\,V_j$$
where the $V_j^2/2$ term is assumed to be negligible, $K = \mu C_{ox} W/L$ is a constant known as the conduction parameter of the transistor [13], and $V_w$ and $V_j$ represent the gate and drain voltages $V_G$ and $V_D$, respectively. For $V_w < V_T$ the transistor Q1 is in the cutoff region and the current decreases exponentially with further decreases in $V_w$ [13].
The weight $w_{ij}$ remains positive (but very small) for negative $V_w$. This method of arriving at the synaptic weight has been described earlier [14-16].
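The drain-current expression for Q1 can be checked numerically, with $V_w$ the gate voltage that stores the weight and $V_j$ the drain voltage. The sketch below evaluates the full triode-region expression and its product approximation $K(V_w - V_T)V_j$. To mimic the statement that the weight stays positive but very small below threshold, it joins the two regimes with a softplus interpolation. The interpolation, the parameter values and the names `effective_weight`, `synapse_current` and `triode_current` are our own assumptions for illustration, not the device model used in the paper.

```python
import math

# Hypothetical parameters for illustration only; not taken from the paper.
K = 50e-6            # conduction parameter mu*C_ox*W/L (A/V^2)
V_T = 0.8            # threshold voltage of Q1 (V)
N_UT = 1.5 * 0.0258  # assumed subthreshold slope factor times thermal voltage (V)


def effective_weight(v_w: float) -> float:
    """Weight w_ij (A/V) set by the gate voltage v_w of Q1.

    Softplus interpolation (an assumption): approximately K*(v_w - V_T) above
    threshold, and a small, exponentially decaying but positive value below it.
    """
    x = (v_w - V_T) / N_UT
    # Numerically stable softplus log(1 + exp(x)) for both signs of x.
    softplus = x + math.log1p(math.exp(-x)) if x > 0 else math.log1p(math.exp(x))
    return K * N_UT * softplus


def synapse_current(v_w: float, v_j: float) -> float:
    """I_ij ~ w_ij * V_j, i.e. K*(V_w - V_T)*V_j above threshold (V_j**2/2 neglected)."""
    return effective_weight(v_w) * v_j


def triode_current(v_w: float, v_j: float) -> float:
    """Full triode expression K*((V_w - V_T)*V_j - V_j**2/2), valid for v_w > V_T."""
    return K * ((v_w - V_T) * v_j - v_j ** 2 / 2)


if __name__ == "__main__":
    for v_w in (0.0, 0.8, 2.0, 3.0):
        print(f"V_w={v_w:.1f} V  w_ij={effective_weight(v_w):.3e} A/V")
    # The product approximation degrades as the drain voltage V_j grows.
    for v_j in (0.1, 0.25, 0.5):
        approx = K * (2.0 - V_T) * v_j
        exact = triode_current(2.0, v_j)
        print(f"V_j={v_j:.2f} V  relative error={(approx - exact) / exact:.1%}")
```

The relative error of the product form is $(V_j/2)/\left((V_w - V_T) - V_j/2\right)$, roughly $V_j / (2(V_w - V_T))$ for small $V_j$, which is why the multiplication is only acceptable for small drain voltages.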
Unipolar Hebbian synapses and MOS learning rules
Hebbian learning depends upon the local variables $V_i$ and $V_j$, and there are penalties to supplying these externally in VLSI: these penalties are associated with the amount of wiring and the number of pins required, which in turn limit the achievable speed and resolution of the circuit. It is therefore desirable to compute the weight changes locally at the synapse. One method is to employ an analogue circuit such as a Gilbert multiplier [17] with inputs $V_i$ and $V_j$ to provide a current $I_L$ to the capacitor, representing the first term of eqn. 1. We also use a leakage resistance R to represent the forgetting term; this leads to a modulation of $V_w$, and hence of $w_{ij}$, in accordance with the analytical form of eqn. 1. To reduce the complexity of the synaptic circuitry the multiplier may be replaced with the simpler circuit of Fig. 3, in which the leakage resistor R is approximated by the transistor Q2. The rate of synaptic weight change, i.e. the learning rate, is then given by eqn. 3. Eqn. 3 is an approximation since R is nonlinear; it is intended as a guide to understanding the simulations below. $I_L$ is the current (via Q3-Q4) charging the weight capacitor (the learning signal), which for $V_w = 0$ is approximately proportional to the product $V_i V_j$. As the capacitor [...]
The present MOS learning rules are motivated by the desire to achieve low synaptic circuit complexity in VLSI neural networks, rather than being derived from established theoretical principles. Weight saturation does, however, seem to be a reasonable feature on biological grounds.
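Eqn. 3 is not reproduced in this text. Writing $V_w$ for the weight-capacitor voltage, $C$ for its capacitance, $I_L$ for the learning current and $R$ for the leakage resistance, a plausible form (assumed here, not quoted) is $C\,dV_w/dt = I_L - V_w/R$. The forward-Euler sketch below integrates that assumed equation with a linear R, which the real transistor Q2 is not. The function name `simulate_weight` and all component values are hypothetical.

```python
from typing import Callable, List


def simulate_weight(
    i_l: Callable[[float], float],
    r: float,
    c: float,
    v0: float = 0.0,
    dt: float = 1e-6,
    steps: int = 10_000,
) -> List[float]:
    """Forward-Euler integration of the assumed eqn. 3: C dV_w/dt = I_L(t) - V_w / R.

    i_l   -- learning current I_L (A) as a function of time t (s)
    r, c  -- linearised leakage resistance (ohm) and weight capacitance (F)
    Returns the sampled weight voltage V_w (V), including the initial value.
    """
    v = v0
    trace = [v]
    for n in range(steps):
        t = n * dt
        v += dt * (i_l(t) - v / r) / c
        trace.append(v)
    return trace


# Hypothetical values: I_L = 1 nA for the first 5 ms, R = 1 Gohm, C = 1 pF (RC = 1 ms).
def learning_pulse(t: float) -> float:
    return 1e-9 if t < 5e-3 else 0.0


trace = simulate_weight(learning_pulse, r=1e9, c=1e-12)
print(f"peak V_w = {max(trace):.3f} V, final V_w = {trace[-1]:.4f} V")
```

With these values RC = 1 ms: the weight voltage climbs towards $I_L R = 1$ V while $I_L$ flows and then relaxes back towards zero, which is the learning-then-forgetting behaviour the leakage term is meant to provide.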
Circuit layout and performance of CMOS synapses
Fig. 4 presents a layout of eight synapses of the type shown in Fig. 3. In 1.2 µm CMOS the dimensions of a single synapse are 115 × 10.9 µm, so that a neural network chip of 1 cm² area could contain approximately 75 000 synapses (assuming that the synapses dominate the chip area). SPICE simulations of these layouts have been performed using level 3 transistor models. The multiplication $V_j(V_w - V_T)$ in Q1, representing the contribution $I_{ij}$ from synapse j to the total current into neuron i, is shown in Fig. 5. For $V_j < 0.5$ V this circuit provides a respectable approximation to an analogue multiplication. Fig. 6 presents the approximation to the multiplication required for Hebbian weight changes or learning.
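The synapse-count estimate is simple arithmetic on the quoted dimensions. The sketch below divides 1 cm² by the area of one 115 × 10.9 µm synapse; the variable names are our own.

```python
# Synapse dimensions quoted in the text for 1.2 um CMOS (micrometres).
SYNAPSE_W_UM = 115.0
SYNAPSE_H_UM = 10.9
CHIP_AREA_UM2 = 1e8  # 1 cm^2 = (10^4 um)^2

synapse_area_um2 = SYNAPSE_W_UM * SYNAPSE_H_UM
# Upper bound: every square micrometre of the chip is used by synapses.
max_synapses = int(CHIP_AREA_UM2 // synapse_area_um2)
print(f"synapse area: {synapse_area_um2:.1f} um^2")
print(f"upper bound on synapses per cm^2: {max_synapses}")
```

This gives a packing upper bound of just under 80 000 synapses, so the figure of approximately 75 000 presumably leaves some allowance for area outside the synapse array.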
This figure corresponds to the case $V_w = 0$ and therefore represents the maximum learning rate. Note that in this case we cannot employ the above trick of driving the learning from [...] because we require large signal voltages [...] $V_j$ are ignored by this circuit. The effects of weight saturation are clearly observed in Fig. 7. As discussed in the previous Section, saturation represents the major departure from the traditional analytical form of Hebbian learning given by eqn. 1. Weight decay has been greatly exaggerated in this circuit by employing a small R (implemented using Q2 in Fig. 3) so that it is observable in Fig. 7; actual implementations of neural network models would employ weight decay rates that would not be discernible in Fig. 7.
Finally, Fig. 8 shows an example of the dynamic behaviour of a simple neural circuit of two synapses j and k driving one neuron i. $V_j$ is negative throughout the simulation. $V_i$ and $V_k$ have both initially been high for a considerable time, and the weight voltage of the k synapse, $V_{wk}$, has thus reached its maximum value. $V_k$ is then switched to its extreme negative value, and since $V_{wj}$ began low (i.e. $V_j$ provides little contribution to turning on neuron i), the neuron output $V_i$ follows $V_k$ and also swings negative. Since $V_i$ and $V_j$ are now both at their extreme negative values, they are fully correlated, so that $V_{wj}$ begins to rise, eventually reaching its saturation value near 5 V. $V_{wk}$ began to decay when $V_k$ went negative, but as $V_i$ soon followed, this merely resulted in a minor glitch in $V_{wk}$ during the transition of $V_k$ and $V_i$.
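The qualitative behaviour described for Fig. 8 can be reproduced with a toy behavioural model. The sketch below is neither a circuit simulation nor the paper's rule. It assumes that the learning current is proportional to the positive part of $V_{pre} V_{post}$ (so anticorrelation is ignored, as in a unipolar circuit), that this current falls linearly to zero as the weight voltage approaches a saturation value, that forgetting is a slow linear leak, and that the neuron output follows tanh of its net input with a first-order lag. All names and parameter values (`V_MAX`, `ETA`, `TAU_F`, `TAU_N`, `GAIN`, `simulate`) are hypothetical.

```python
import math

# Hypothetical, normalised parameters chosen for illustration only.
V_MAX = 5.0    # weight voltage at which learning saturates (V)
V_T = 0.8      # threshold of the weight transistor (V)
ETA = 2.0      # learning rate at zero weight and full correlation (V per time unit)
TAU_F = 200.0  # forgetting time constant, much slower than learning
TAU_N = 0.5    # neuron output time constant
GAIN = 2.0     # neuron gain


def weight(v_w: float) -> float:
    # Effective weight: zero below threshold (subthreshold tail ignored here).
    return max(0.0, v_w - V_T)


def dvw_dt(v_w: float, v_pre: float, v_post: float) -> float:
    # Same-sign (correlated) signals charge the capacitor; anticorrelation is ignored.
    learn = ETA * max(0.0, v_pre * v_post) * (1.0 - v_w / V_MAX)
    return learn - v_w / TAU_F


def simulate(t_switch: float = 10.0, t_end: float = 30.0, dt: float = 0.01):
    v_j = -1.0               # V_j is negative throughout
    v_k, v_i = 1.0, 1.0      # V_k and V_i start high
    v_wj, v_wk = 0.0, V_MAX  # synapse j starts low, synapse k saturated
    history = []
    for n in range(int(round(t_end / dt))):
        t = n * dt
        if t >= t_switch:
            v_k = -1.0       # V_k switched to its extreme negative value
        net = weight(v_wj) * v_j + weight(v_wk) * v_k
        v_i += dt * (math.tanh(GAIN * net) - v_i) / TAU_N
        v_wj += dt * dvw_dt(v_wj, v_j, v_i)
        v_wk += dt * dvw_dt(v_wk, v_k, v_i)
        history.append((t, v_i, v_wj, v_wk))
    return history


if __name__ == "__main__":
    for t, v_i, v_wj, v_wk in simulate()[::500]:
        print(f"t={t:5.1f}  V_i={v_i:+.2f}  V_wj={v_wj:.2f}  V_wk={v_wk:.3f}")
```

With these settings $V_{wj}$ stays near zero while $V_j$ and $V_i$ are anticorrelated, then rises towards saturation once $V_i$ swings negative, and $V_{wk}$ dips only briefly during the output transition before recovering.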
Discussion
On the basis of the above simulations it appears that a mechanism resembling Hebbian learning is operating [...] networks whose synapses employ these MOS learning rules. Depending upon the architecture of the network (feedforward, feedforward with crosstalk, or feedback) and the learning mode (supervised or unsupervised), the objective function to be optimised by the learning process will differ. For example, in feedforward supervised nets one minimises a sum of squared errors by gradient descent [20], whereas in an unsupervised feedforward net with crosstalk one may maximise the mutual information between several neighbouring hidden units [21]. Different measures are optimised by fully connected supervised mean field networks [19] and by unsupervised linear networks [22], even though both of these latter cases employ versions of Hebbian learning rules at the synapses.
Let us now consider the issue of weight decay or forgetting rates. Forgetting rates determine the number of training cases remembered by the capacitor charge, so the desired rates depend upon the number of weights that are mutually dependent during the learning process. For unsupervised learning based on optimisation of local objective functions, the desired forgetting rates are expected to be larger than in supervised networks with global error signals during training, since in the supervised case all the weights in the network are mutually dependent. The forgetting rate will normally be orders of magnitude lower than the learning rate, so that many past training cases (on the order of the number of weights) can be remembered. To control the learning rate, one can adjust the length/width ratio L/W of the transistors Q3 and Q4 in Fig. 3. Learning rates are inversely proportional to both L/W and the capacitance C, so that by increasing L/W and C one can slow the learning process to the desired rate. One could, alternatively, introduce a series resistor in Fig. 3 to reduce the learning rate. All of these adjustments increase the silicon area per synapse. If extremely low decay rates are desired (to remember large training sets), the chip could be cooled, perhaps even to cryogenic temperatures such as that of liquid nitrogen (77 K), to reduce pn-junction leakage currents; these leakage currents control the forgetting rate when transistor Q2 in Fig. 3 is in the cutoff mode.
We have recently explored the case of bipolar weights and four-quadrant multiplication of $V_i$ and $V_j$, which provides a learning current that responds to both correlation and anticorrelation of these variables. This approach also produces more accurate approximations to a Hebbian multiplication. The penalty is an increased layout area, as more than a dozen transistors per synapse are required in the bipolar case.
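The rate-scaling statements above can be captured in a first-order model of the weight node. The sketch assumes that the maximum learning current scales as W/L, so the learning rate $dV_w/dt$ scales as $(W/L)/C$, and that with Q2 in cutoff forgetting is set by a junction leakage current $I_{leak}$ discharging C. The class `SynapseDesign` and every parameter value are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SynapseDesign:
    """Hypothetical first-order model of the weight node (not the paper's circuit)."""

    i_learn_unit: float  # learning current at full correlation for W/L = 1 (A), assumed
    l_over_w: float      # length/width ratio L/W of the learning transistors Q3, Q4
    c: float             # weight capacitance C (F)
    i_leak: float        # pn-junction leakage current with Q2 in cutoff (A), assumed

    def learning_rate(self) -> float:
        """Maximum dV_w/dt while learning (V/s); inversely proportional to L/W and C."""
        return self.i_learn_unit / (self.l_over_w * self.c)

    def forgetting_rate(self) -> float:
        """dV_w/dt caused by leakage alone (V/s)."""
        return self.i_leak / self.c

    def rate_ratio(self) -> float:
        """How many times faster the weight is written than it leaks away."""
        return self.learning_rate() / self.forgetting_rate()


base = SynapseDesign(i_learn_unit=1e-6, l_over_w=1.0, c=1e-12, i_leak=1e-15)
bigger_c = replace(base, c=4e-12)
longer = replace(base, l_over_w=4.0)
for name, d in (("base", base), ("4x C", bigger_c), ("4x L/W", longer)):
    print(f"{name:7s} learn={d.learning_rate():.2e} V/s "
          f"forget={d.forgetting_rate():.2e} V/s ratio={d.rate_ratio():.2e}")
```

Under these assumptions, enlarging C slows learning and leakage-driven forgetting by the same factor, so it mainly sets the absolute time scale; the ratio of learning to forgetting rates is changed by L/W (or a series resistor) and by reducing the leakage current, for example by cooling.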
One of the limitations of the present approach to in situ learning circuits is the inherent volatility of these circuits. Synaptic weights represent the knowledge of the network, and the network must be continuously exposed to relevant inputs or training data; irrelevant or null inputs would otherwise corrupt the weights. One cannot simply disconnect the power supply and have the network retain its present state. To circumvent this problem either a refresh mechanism could be incorporated, using the training data, or nonvolatile storage may be employed. Along with several other groups we are investigating the implementation of nonvolatile synaptic weights using EEPROM devices [6, 7, 18]. Improvements with EEPROM synapses come at the expense of a more complex processing technology.
Conclusions
The suitability of VLSI-efficient learning algorithms based on the above MOS learning rules, as compared to previous analytical forms of Hebbian learning, can only be ascertained by simulation of a range of network models employing these synapses to attempt a variety of learning tasks. These simulation studies are important in view of the potential benefits from reductions in silicon area in the realisation of synaptic circuits, since the area of these synapses determines the VLSI complexity of the neural network itself. Hebbian learning has become popular in neural network research because it has achieved success in associative tasks and because it represents a simple mathematical approximation to what is believed to underlie biological neural systems. Recent work by Brown and his colleagues has helped to confirm the presence of Hebbian synapses in the hippocampus of mammalian brains [23] . The circuits of this paper or related bipolar versions may also be useful in implementing contrastive Hebbian learning rules in mean field networks [19] . We believe it to be worthwhile to search for learning algorithms which achieve approximations to classical Hebbian rules based upon the simplest silicon implementations even though the corresponding mathematical models may be more complex. 
