We propose and analyze analog VLSI implementations of neural networks in which both the neural cells and the synapses are realized using Operational Transconductance Amplifiers (OTAs). These circuits have inherent advantages of immunity to noise, very high input/output impedances, differential architecture with automatic inversion, and density. An efficient on-chip technique for weight adaptation and for adjusting the gain of OTA-based neurons is proposed. Power and area requirements are obtained. We consider OTAs as a basic building block for efficiently constructing several types of artificial neural networks including Hopfield networks, Boltzmann machines and cellular networks. Circuit simulations using MTIME show that small Hopfield memories converge in about a µsec.
Introduction
The original implementation of the continuous Hopfield model was achieved using op-amps as the neurons and resistors as the interconnection weights [4]. This implementation requires two op-amps per neuron to provide both excitatory and inhibitory connections, since real resistors cannot have negative values. Further, as was discussed in detail in [1], the specification of the resistor values to achieve the desired interneuron weighting is non-trivial because of the parallel resistances of the interconnections and the input and output impedances of the op-amps. Resistors in silicon technology are also extremely demanding in terms of required area and cannot be easily varied once fabricated, so weight adaptation needs elaborate and indirect techniques.
The potential of using a multi-input Operational Transconductance Amplifier (OTA) circuit to build an artificial neuron was suggested by Reed [7]. He focused on the design of a single neuron rather than on network or system-level design. Since then, a few VLSI neural net architectures have used OTAs in a limited fashion [3, 5]. This paper proposes and evaluates the use of OTAs for implementing both synapses and neurons, and considers several system-level design issues. In Section 2, we introduce the basic OTA circuits used in this paper.
An on-chip tuning mechanism is proposed. Section 3 presents a case study on implementation of a Hopfield network.
In an op-amp design, currents from other neurons are summed over synaptic resistors and the resulting net voltage acts as the input to the op-amp that serves as the neuron. The problems with this design include the difficulty of implementing stable VLSI resistors that can be modified after manufacture, and a susceptibility to noise because the inputs and outputs are single-ended. The architecture proposed in this paper uses only one basic building block to fulfil both the synapse and the neuron function.
The OTA Synapse: An OTA synapse is obtained by concatenating a weight tuning mechanism with the basic OTA structure, as shown in Figure 1. An OTA is ideally a voltage controlled current source with infinite input and output impedance. The OTA gain is modified by adjusting the gate voltage of a MOSFET (via biasing with ITUN in Figure 1), thus affecting the OTA transconductance. The OTA output current is determined by the product of the differential input voltage and the transconductance of the input pair. In this application, the effective OTA transconductance is used as the synaptic weight. Although the fundamental weighting function (variable transconductance) is accomplished through varying the bias current in the input pair, a more complicated design is required to allow for a more linear voltage to current transfer function. The neuron OTA is shown in Figure 2. The transconductance function here is governed by the gate voltage VGM and the transistor is biased in the triode region of operation. This is an example of source degeneration to linearize the transconductance. OTAs are generally faster and have larger achievable impedances (input and output) than more traditional op-amp designs. This design successfully balances the need for speed with the inherent non-linearity (due to the lack of a feedback loop) of OTAs.
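The dependence of the synaptic weight on bias current can be illustrated with a minimal sketch. It assumes an ideal OTA and the textbook square-law model of a MOS differential pair, and it ignores the linearization circuitry discussed above. The parameter `beta` (mu*Cox*W/L) and all numeric values are hypothetical and are not taken from the design in Figure 1.

```python
import math

def diff_pair_gm(i_bias, beta):
    """Transconductance of a square-law MOS differential pair.

    i_bias: tail bias current in amps, split equally between the two inputs.
    beta:   hypothetical device parameter mu*Cox*W/L in A/V^2.
    Each device carries i_bias/2, so gm = sqrt(2*beta*i_bias/2) = sqrt(beta*i_bias).
    """
    return math.sqrt(beta * i_bias)

def ota_output_current(v_plus, v_minus, gm):
    """Ideal OTA: output current = transconductance x differential input voltage."""
    return gm * (v_plus - v_minus)

# Hypothetical values chosen only to show the trend.
gm_low = diff_pair_gm(i_bias=1e-6, beta=50e-6)
gm_high = diff_pair_gm(i_bias=4e-6, beta=50e-6)
print(gm_high / gm_low)                          # ratio of the two weights
print(ota_output_current(0.05, -0.05, gm_low))   # current for a 100 mV differential input
```

Under this model the weight grows with the square root of the bias current, so quadrupling the bias current roughly doubles the transconductance.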
The architecture proposed in this paper is, in principle, an extension of an op-amp design.
7th International Conference on VLSI Design - January 1994
Note that the basic OTA architecture is fully differential, just like the ETANN. This gives very good immunity to noise, and it offers another distinct advantage: inherent inversion. Both the positive and the negative output are available, so a neural network can be built with only one amplifier per neuron. Next, we can do away with the virtual ground amplifier since the OTA, used as a synapse, has a very high output impedance. The common mode of the synapse output is regulated by a common mode feedback circuit.
A key advantage of the OTA synapse is the programmability of the effective resistance by controlling the bias current. Considering OTAs as (variable gain) current sources, it is immediately clear how they can be used to combine the activity of a set of afferent synapses: simply sum the currents from the synapses directly into the single, non-critical resistor at the input of the post-synaptic neuron. This is indicated in Figure 3. Note that since the OTAs used as synapses have an infinite input impedance (i.e. CMOS input devices), they do not resistively load the output of the pre-synaptic neurons. This means the output impedance of the pre-synaptic neurons does not cause an error. In fact, we want this output impedance to be as high as possible to simulate a current source.
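The current summation described above can be sketched numerically. This is an idealized illustration, not the circuit of Figure 3: the transconductances, voltages and summing resistance are hypothetical, and a negative weight is modelled simply as a negative transconductance (in a differential design this would correspond to using the inverted output).

```python
def summing_node_voltage(gms, v_pre, r_sum):
    """Voltage developed at the input of the post-synaptic neuron.

    Each synapse OTA injects a current gm_j * v_j; the currents add at the
    shared node and flow through the single summing resistor r_sum.
    """
    i_total = sum(g * v for g, v in zip(gms, v_pre))
    return r_sum * i_total

# Hypothetical example: three afferent synapses.
gms = [2e-6, -1e-6, 3e-6]      # siemens
v_pre = [0.1, 0.1, -0.1]       # pre-synaptic differential output voltages, volts
r_sum = 100e3                  # ohms
print(summing_node_voltage(gms, v_pre, r_sum))
```

Because the resistor only scales the summed current, its exact value affects the overall gain but not the ratio between weights, which is one way to read the description of it as non-critical.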
Synapse Tuning Mechanism: The basic idea for the tuning of the synapses is shown on the left side of the broken line in Fig. 1. This function could actually be performed with one transistor, but since the current DAC used for setting the Gm via ITUN (required because the weights are stored in on-chip SRAM) was designed with P-channel current sources, it was decided to use a current mirror in the tuning cell. When an OTA synapse cell is chosen for updating by the address generator, the transmission gate (i.e. a CMOS switch) on the N-channel transistor is turned 'on'. This effectively places the transistor in a diode-connected configuration. A tuning current proportional to the synaptic strength is then forced into the drain of the transistor. This current creates a gate-to-source voltage, Vgs, on the transistor; Vgs is set by the tuning current and by Vt, the threshold voltage of the transistor. This tuning current is then mirrored up to the input pair of the OTA, and defines its quiescent bias current, which in turn determines the transconductance of the OTA. When the transmission gate is turned 'off', meaning that the tuning loop has gone on to tune another synapse, the gate voltage on the transistor is maintained by the capacitor connected from its gate to VSS. If the capacitor voltage is updated often enough, there will not be sufficient leakage to cause the weight value to change.
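As a rough illustration of the tuning cell, the sketch below uses the standard square-law relation for a diode-connected MOSFET, I = (beta/2)(Vgs - Vt)^2, to compute the stored gate voltage, and a simple I*t/C estimate for the droop between refreshes. Whether this is the exact relation intended in the text is an assumption, and `beta`, `v_t`, the leakage current and the other numbers are hypothetical.

```python
import math

def diode_connected_vgs(i_tun, beta, v_t):
    """Gate-to-source voltage of a diode-connected MOSFET carrying i_tun.

    Assumes the square-law model I = (beta / 2) * (Vgs - Vt)**2.
    """
    return v_t + math.sqrt(2.0 * i_tun / beta)

def hold_droop(i_leak, c_hold, t_refresh):
    """Voltage lost on the hold capacitor between two refreshes (dV = I * t / C)."""
    return i_leak * t_refresh / c_hold

# Hypothetical device and leakage values.
vgs = diode_connected_vgs(i_tun=2e-6, beta=100e-6, v_t=0.8)
droop = hold_droop(i_leak=1e-15, c_hold=1e-12, t_refresh=4.2e-6)
print(vgs, droop)
```

With these illustrative numbers the droop per refresh interval is many orders of magnitude smaller than the stored voltage, which is the condition the paragraph above describes qualitatively.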
The OTA Neuron: The neuron OTAs have the same basic structure as the synapse OTAs except that they need their own common mode regulation. The neuron OTA schematic is shown in Figure 2. The summing resistor is used to sense the common mode voltage at the input of the neuron. This resistive loading is not allowed at the output of the neuron since it would kill the voltage gain, and hence we would no longer have an integrator function. A special common mode sensing circuit has been devised which does not resistively load the neuron output. The average of the two outputs of each neuron is sensed and compared to VAG, which is at mid-supply. The amplifier's common mode is adjusted accordingly. The neuron's gain is adjusted in much the same way as the synapse's effective resistance, by modifying the transconductance, but in this application the control is through the gate voltage VGM (on the source degeneration transistor) as opposed to the current ITUN used in the synapse (see Figure 1).
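The integrator function mentioned above follows from an ideal OTA driving a purely capacitive load, C dv/dt = gm * v_in. The sketch below integrates this relation with forward Euler steps. It ignores output saturation and finite output impedance, and every numeric value is hypothetical.

```python
def integrate_ota(v_in_samples, gm, c_load, dt, v0=0.0):
    """Forward-Euler integration of C * dv/dt = gm * v_in.

    v_in_samples: sequence of input voltages, one per time step of length dt.
    Returns the list of output voltages after each step.
    """
    v = v0
    trace = []
    for v_in in v_in_samples:
        v += gm * v_in * dt / c_load   # charge delivered in dt, divided by C
        trace.append(v)
    return trace

# Hypothetical example: 10 mV constant input for 100 ns.
trace = integrate_ota([0.01] * 100, gm=10e-6, c_load=1e-12, dt=1e-9)
print(trace[-1])
```

A resistive load in parallel with the capacitor would add a leak term to this equation, turning the integrator into a low-pass stage with finite gain, which is why resistive loading at the neuron output is avoided.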
One of the important advantages to using the OTA as the neuron is the integrator function we achieve when we put a capacitive load at the output of an OTA. With conventional implementations of analog networks, operational amplifiers with a sigmoidal transfer function were used as the neurons. Yanai and Sawada [SI have shown that a neural network using integrators as the neurons is a more ideal implementation, and that catastrophe of memories occurs for low gain when a sigmoidal amplifier is used, yet not with the integrator neurons. In the present circuit, the loading at the output of the neurons is purely capacitive, since they drive only the inputs of the synapse OTAs. This gives the inherent integrator function without actually having to put capacitors on the chip.
The advantages of using the OTA for the synapse are many. It may seem at first glance that this structure will consume a larger area than the resistor implementation, but as discussed earlier, building resistors on a VLSI chip can be very expensive in terms of size. The fact that the OTA is a very simple amplifier structure means that it can be built in a very small area. The layout for a single OTA neuron is about 3400 µm², which suggests the potential for more than 20,000 synapses on a 330 mil x 330 mil die. In reality, some of the die space would be used in routing the interconnects and in the layout for on-chip DSP engines and address generators. A conservative estimate would be 10,000 neurons/synapses per die.
Power Requirements: One important consideration in a VLSI design is power consumption. In a digital implementation, the power consumption is a function of how fast you run the clock. In the analog circuit, we are concerned with the power consumption of the analog blocks used in the design. In the present OTA design, each OTA has a quiescent bias current of 2 microamps. Thus, in the seven neuron fully-recurrent continuous Hopfield circuit considered in the next section, we have 42 synapse OTAs and 7 neuron OTAs for a total of 49 OTAs. This gives a total current consumption of 98 microamps. This current gives a total convergence time of only a few hundred nanoseconds. The power versus speed ratio is very good for this architecture, and the speed may be increased further by increasing the bias currents in the OTAs.
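The area and current figures quoted above can be checked with a few lines of arithmetic. The sketch assumes 1 mil = 25.4 µm, treats the 3400 µm² cell as the area of every OTA, and ignores routing overhead. No supply voltage is stated in the text, so only the current, not the power in watts, is computed.

```python
MIL_IN_UM = 25.4                      # 1 mil = 0.001 inch = 25.4 micrometres

# Area budget: how many 3400 um^2 cells fit on a 330 mil x 330 mil die.
die_side_um = 330 * MIL_IN_UM
die_area_um2 = die_side_um ** 2
cell_area_um2 = 3400.0
max_cells = int(die_area_um2 // cell_area_um2)

# Current budget for the seven-neuron fully recurrent network.
n_neurons = 7
n_synapses = n_neurons * (n_neurons - 1)   # no self-connections
n_otas = n_synapses + n_neurons
bias_per_ota_ua = 2
total_current_ua = n_otas * bias_per_ota_ua

print(max_cells, n_synapses, n_otas, total_current_ua)
```

The cell count comes out just above 20,000, consistent with the estimate in the text, and the synapse count of 42 corresponds to a 7 x 7 weight matrix without its diagonal.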
3 OTA-Based Implementation of Hopfield Networks
The OTA circuit shown in Fig. 3 is used as the building block for the Hopfield network. N of these blocks are used with appropriate feedback to form an N-neuron fully recurrent network. The input to the system is supplied by either injecting a differential current into the two lines of each row of synapses, or by forcing a voltage there. For our simulations, we used current sources. Note that there is only one common mode amplifier for each row of synapses. This can be done since all the OTA outputs for a given row of synapses are the same so a common voltage can be used for this control.
As stated in Section 2, the power versus speed ratio is very good for this architecture, and the speed may be increased further by increasing the bias currents in the OTAs. It should be noted, however, that the stated convergence time refers to the case where the input does not lie on a boundary equidistant from two different minima. If the input is the same distance from two or more minima, convergence could take longer, depending on the noise level in the system. In an ideal system with no noise, the network would simply sit at the saddle point and not make a decision. However, even thermal noise is sufficient to push the system towards one of the local minima.
For simulation purposes, the updates occur at 4.2 µsec intervals, yet leakage statistics taken from the 0.8 micron process indicate that the update interval could be much longer with no significant degradation of the synapse weight voltage. A 4-bit DAC was used for simulation purposes, yet an increase in the resolution would not be very costly since only one DAC is needed for the entire network. An increase in the resolution of the DAC would be needed if the number of neurons were increased, since the required resolution is a function of the number of patterns stored, which determines the range of transconductance values needed in the synapse array. For speed, a current steering method was used which 'steers' the current to VSS when it is not being steered into the output node. The output current, IDAC, is connected to one of the synapse OTAs via the decode logic. Each of the five-input NOR gates acts as a decoder to select one of the synapses to be tuned and connects that OTA to the DAC output.
The memory used here is a static memory, but a DRAM could just as easily have been used. The weight values could be generated off chip and loaded into memory, or they could be calculated on chip using a DSP engine. The address generator is a six-bit counter with only 5 bits being used. The counter resets at a binary value of 21, and starts over. The reason only 21 addresses are needed instead of 42 (the number of synapses) is that the weight matrix is symmetric for a Hopfield network, so we can tune two OTAs at a time since we know that one synapse will have the same value as its symmetric counterpart. The weight value associated with a given address is recalled from memory and presented to the inputs of the DAC, which generates the desired current for tuning.
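The address count follows directly from the symmetry of the weight matrix. The sketch below enumerates the unordered neuron pairs for a 7-neuron network and estimates how long one full pass over all addresses takes at a given clock period. The mapping of addresses to pairs is an assumption made for illustration; the text does not specify the ordering used on chip.

```python
from itertools import combinations

def symmetric_pairs(n_neurons):
    """Unordered pairs (i, j), i < j; each pair covers synapses w[i][j] and w[j][i]."""
    return list(combinations(range(n_neurons), 2))

def refresh_interval(n_neurons, clock_period):
    """Time for the address counter to visit every pair once, one pair per clock."""
    return len(symmetric_pairs(n_neurons)) * clock_period

pairs = symmetric_pairs(7)
print(len(pairs))                     # number of addresses needed
print(refresh_interval(7, 200e-9))    # seconds, with a 5 MHz clock
print(refresh_interval(7, 1e-6))      # seconds, with a 1 MHz clock
```

With a 200 nsec clock period, 21 addresses give a pass time of about 4.2 µsec, which appears to match the update interval quoted for the simulations.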
One of the main design issues was related to the size of the capacitor in the tuning circuit of the OTA. We want the capacitor to be as small as possible so that it does not consume a large area, but if it is too small, then leakage currents will degrade the voltage on the capacitor before it can be updated again. Another issue is the clock rate at which we update the synapses. There is a finite slew time required for the IDAC to charge the storage capacitor, defined by the relationship I = C × dv/dt, where I is the current through the capacitor and dv/dt is the change in capacitor voltage for a given change in time. Since an LSB current in the DAC of 1 µA was desired for low power consumption, the final capacitor value was chosen to be 1 pF, assuming an update time (i.e. the period of the clock) of 1 µsec. The timing of the update cycle is not critical, with the exception that the sampled value in the OTA synapse must be taken before the address decoder selects the next memory cell and changes the value in the DAC. This only requires that a short delay be introduced which guarantees that the synapse that was most recently tuned is 'unselected' before the next DAC value is loaded.
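The slew relationship I = C × dv/dt can be evaluated for the values given above. This is a minimal sketch that treats the DAC output as an ideal constant current and the storage capacitor as ideal; the function names are illustrative.

```python
def slew_voltage(current, capacitance, dt):
    """Voltage change on a capacitor charged by a constant current for time dt."""
    return current * dt / capacitance

def slew_time(current, capacitance, dv):
    """Time needed for a constant current to move the capacitor voltage by dv."""
    return capacitance * dv / current

# Values from the text: 1 uA LSB current, 1 pF capacitor, 1 usec clock period.
dv_per_period = slew_voltage(current=1e-6, capacitance=1e-12, dt=1e-6)
t_for_100mv = slew_time(current=1e-6, capacitance=1e-12, dv=0.1)
print(dv_per_period)   # volts moved by one LSB of current in one clock period
print(t_for_100mv)     # seconds needed to move the stored voltage by 100 mV
```

Under these idealized assumptions a single LSB of current can move the stored voltage by about one volt within one clock period, so the chosen capacitor does not limit the update rate.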
Memory Storage and Retrieval: The circuit was simulated with an analog circuit simulator very similar to SPICE. The software, MTIME, is a Motorola proprietary CAD tool. Process cards were supplied by the Motorola process group for a 0.8 micron CMOS process. For simulation purposes, the system clock which controls the rate of synapse update was running at 5 MHz for the first simulation and 1 MHz for the second, which gives clock periods of 200 nsec and 1 µsec respectively. The reason for the two different speeds is that the first run has only one vector programmed into the network. For that run the accuracy of the program voltage stored on the capacitor in the OTA synapse did not need to be very high, so a small capacitor (0.2 pF) was used along with a faster clock rate. The smaller capacitor gave a faster slewing time for the update cycle, yet less accuracy due to the increased influence of leakage currents.
Simulation Results: Several simulation experiments have been conducted. For example, Figure 4 shows the 7 neuron outputs for the case where three vectors were stored in the network. Here, the clock is running at 1 MHz. The first 20 µsec were used for startup and initialization of the synaptic weight values, so this portion was deleted and the output plot starts at 20 µsec. The three stored vectors were as follows:
i) 1 1 1 -1 1 1 -1, ii) -1 1 1 1 -1 1 1, and iii) -1 -1 1 1 -1 -1 1.
Naturally, the inverses of these three vectors are also local minima, so we have a total of six stored vectors. All three vectors could be accurately recalled from the network along with their inverses. A simulation was run where three inputs were presented to the system. Each input was one Hamming distance away from one of the stored vectors, yet greater than one Hamming distance away from the other vectors and their inverses. The three inputs were 1 -1 1 -1 1 1 -1, -1 1 1 -1 -1 1 1, and -1 -1 1 -1 -1 -1 1. Referring back to Fig. 4, from 23 µsec to 24 µsec, the first input was presented to the network. This input was one Hamming distance away from the first stored vector. The bit that was in error, bit 2, quickly recovered, but bit 3 started to drift downward. This was the only error seen on the system. Another input was given to the system at 27 µsec; the bit that was in error after the first input might have recovered if the system had been allowed to settle for a longer period of time. From 27 µsec to 28 µsec the second input was presented to the network, and the bit in error, bit 4, corrected itself in about 500 nsec. Lastly, the third vector was input from 31 µsec to 32 µsec and again bit 4 was in error. The bit corrected itself in less than 500 nsec and the network converged to the third stored pattern.
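The Hamming-distance claims about the test inputs can be checked directly. In the sketch below the first two stored vectors and the three inputs are taken from the text; the third stored vector is an assumption, obtained by flipping bit 4 of the third input, since the text states that bit 4 was the bit in error for that input.

```python
STORED = [
    (1, 1, 1, -1, 1, 1, -1),
    (-1, 1, 1, 1, -1, 1, 1),
    (-1, -1, 1, 1, -1, -1, 1),    # assumed: third input with bit 4 flipped
]
INPUTS = [
    (1, -1, 1, -1, 1, 1, -1),
    (-1, 1, 1, -1, -1, 1, 1),
    (-1, -1, 1, -1, -1, -1, 1),
]

def hamming(a, b):
    """Number of positions in which two bipolar vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def all_minima(stored):
    """Stored vectors together with their inverses."""
    return list(stored) + [tuple(-x for x in v) for v in stored]

def sorted_distances(inp, stored):
    """Distances from one input to every stored vector and inverse, ascending."""
    return sorted(hamming(inp, m) for m in all_minima(stored))

for inp in INPUTS:
    print(sorted_distances(inp, STORED))
```

For each input the smallest distance is 1 and the next smallest is at least 2, so every input has a unique nearest stored vector, as the text describes.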
Extensions and Comparisons
The OTA-based neural network (ONN) has the capability to be a general purpose neural element which is not limited to simply implementing the Hopfield network. To confirm this we decided to investigate the circuit requirements of using the ONN in a traditional Boltzmann Machine network, which has been implemented at Bellcore using analog circuitry. For this application it was determined that storage of the weights might best be accomplished through charge storage on a capacitor. This capacitor would then control the bias current in the OTA, which would in effect control the transconductance. Updating the weights would require the addition or subtraction of a fixed charge packet on the capacitor. A small sense amp would also be required to sense the change in sign of the weight. This would result in an increase in the basic cell area, but the DAC and the memory from the previous architecture could be eliminated.
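The charge-packet update described above amounts to stepping the capacitor voltage by Q/C per packet. The sketch below is only a behavioural illustration: the packet size, capacitor value and the idea of reading the weight sign by comparison against a reference voltage `v_ref` are assumptions, since the text does not detail the sense amp.

```python
def apply_packets(v_weight, n_packets, q_packet, c_store):
    """Capacitor voltage after adding (n_packets > 0) or removing (< 0) fixed charge packets."""
    return v_weight + n_packets * q_packet / c_store

def weight_sign(v_weight, v_ref):
    """Hypothetical sense-amp decision: sign of the weight relative to a reference level."""
    return 1 if v_weight >= v_ref else -1

# Hypothetical values: 1 fC packets on a 1 pF storage capacitor (1 mV per packet).
v = apply_packets(v_weight=1.0, n_packets=3, q_packet=1e-15, c_store=1e-12)
print(v, weight_sign(v, v_ref=1.0))
v = apply_packets(v, n_packets=-5, q_packet=1e-15, c_store=1e-12)
print(v, weight_sign(v, v_ref=1.0))
```

In this picture the weight resolution is set by the ratio of packet charge to storage capacitance rather than by a DAC, which is why the DAC and weight memory can be dropped.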
The logic associated with implementing the Boltzmann machine is fairly straightforward. For this architecture, two flip flops and approximately twelve gates are required per cell to perform the entire update procedure including the sign determination. This function could also be performed with the previous architecture by calculating the new weights and updating the memory in real time. However, this would be overkill, and the hardware should be optimized for the preferred application. The noise generator needed for the Boltzmann machine could be realized by amplifying the device noise from a MOSFET with an op-amp structure and summing this noise directly into the network. At present, we have identified the fundamental circuit requirements and begun preliminary implementation.
The ONN has been favorably compared with the "Floating Gate Synapse" based ETANN architecture, with the analog network of Graf and Jackel, which stores the weights digitally and uses a digital to analog converter for each interconnection, and with the Pulse-Stream Neural Networks of Murray [SI. Details appear elsewhere.
Conclusion
In this paper we have presented an OTA-based neuron capable of implementing many neural network types, and we have contrasted this approach with other existing implementation methods. The OTA neural network has been shown to be an effective element both in speed and in programmability in the case of a Hopfield network. Modifications required for implementing other networks were also discussed. Results based on our studies indicate that the OTA-based neuron promises the advantages inherent in analog VLSI implementations while overcoming many of their traditional disadvantages.
