Abstract-We have demonstrated on-chip learning in an array of floating-gate MOS synapse transistors. The array comprises one synapse transistor at each node, and normalization circuitry at the row boundaries. The array computes the inner product of a column input vector and a stored weight matrix. The weights are stored as floating-gate charge; they are nonvolatile, but can increase when we apply a row-learn signal. The input and learn signals are digital pulses; column input pulses that are coincident with row-learn pulses cause weight increases at selected synapses. The normalization circuitry forces row synapses to compete for floating-gate charge, bounding the weight values. The array simultaneously exhibits fast computation and slow adaptation: The inner product computes in 10 µs, whereas the weight normalization takes minutes to hours.
I. INTRODUCTION
OUR goal is to develop silicon learning systems. We believe that these systems must possess the following attributes: high device density; low power consumption; fast, parallel computation; and slow, local adaptation. We build our learning systems as integrated circuits, achieving high device density by using MOS IC technology, effecting low power consumption by using subthreshold channel currents, and performing the requisite computations and adaptation by using innate features of the silicon-MOS physics.
We began our investigations by building single-transistor silicon synapses [1] - [5] modeled loosely after biological synapses [6] . Our synapse transistors are floating-gate MOSFETs; they possess nonvolatile analog weight storage, compute locally the product of their stored weight and an applied control-gate input, permit simultaneous computation and weight modification, and determine locally their own weight updates. We select source current as the synapse output, store the weights as floating-gate charge, and achieve bidirectional learning by using a combination of electron tunneling and hot-electron injection to modify the floating-gate charge.
Because our synapse transistors comprise a single device, and employ subthreshold channel currents, we can use them to build dense, low-power, silicon learning systems. Although a single transistor cannot model the complex behavior of a neural synapse completely, our synapse transistors can learn from an input signal without interrupting the ongoing computation.
Fig. 1 (caption, beginning not recovered): ... with its stored analog weight, and outputs a current to the row-output wire; the row wire sums the synapse-output currents along the row. The stored weights are nonvolatile; column inputs that are coincident with row-learn signals cause weight increases at selected synapses. The error signal constrains the time-averaged sum of the row-synapse weights to be a constant, bounding the row weights by forcing the synapses to compete for weight value.
In this paper, we demonstrate on-chip learning in an array of our four-terminal nFET synapse transistors. We show the array block diagram in Fig. 1. The input vector comprises 10-µs pulses; the array computes the inner product of this input vector and the stored analog weight matrix. The weights are nonvolatile; column input pulses that are coincident with row-learn pulses cause weight increases at selected synapses. To prevent unbounded weight values, we enforce a constraint: The time-averaged sum of the synapse weights, in each row of the array, is held constant. This constraint forces row synapses to compete for floating-gate charge, stabilizing the learning.
The array computation and synapse-weight modification occur locally and in parallel. We describe both the computation and weight modification using rules that we derive from the MOS-transistor and MOS-oxide physics. The array achieves our goals of fast computation and slow adaptation: The inner product computes in 10 µs, whereas the weight normalization takes minutes to hours.
II. THE nFET SYNAPSE TRANSISTOR
We begin by reviewing our four-terminal nFET synapse transistor. As we show in Fig. 2, this device is an n-type floating-gate MOSFET, to which we add a fourth terminal for gate-oxide tunneling. We operate the synapse from a single-polarity supply, use Fowler-Nordheim (FN) tunneling [7] to remove electrons from the floating gate, and use channel hot-electron injection (CHEI) [8] to add electrons to the floating gate. We fabricate the synapse in a 2-µm n-well CMOS process (with NPN option) available from MOSIS.
0018-9383/97$10.00 © 1997 IEEE
A. The Synapse Stores a Weight
We select source current as the synapse output. We apply signal inputs to the poly2 control gate, which, in turn, couples capacitively to the poly1 floating gate. We operate the MOSFET in the subthreshold regime [9], for three reasons. First, subthreshold channel currents ensure low power consumption. Second, because a subthreshold MOSFET's source current increases exponentially with gate voltage, only small quantities of oxide charge are required for learning. Third, the synapse output is the product of the stored weight and the control-gate input, as we derive from the subthreshold MOSFET equation

$$I_s = I_o \, e^{\kappa (Q_{fg} + C_{in} V_{in}) / (C_T U_t)} \tag{1}$$

$$\phantom{I_s} = W I_o \, e^{\kappa' V_{in} / U_t} \tag{2}$$

where $I_s$ is the source current, $I_o$ is the pre-exponential current, $\kappa$ is the coupling coefficient from the floating gate to the channel, $Q_{fg}$ is the floating-gate charge, $C_T$ is the total capacitance seen by the floating gate, $U_t$ is the thermal voltage $kT/q$, $C_{in}$ is the input (poly1 to poly2) coupling capacitance, $V_{in}$ is the control-gate voltage, $Q_T \equiv C_T U_t / \kappa$, $\kappa' \equiv \kappa C_{in} / C_T$, $W \equiv e^{Q_{fg}/Q_T}$, and, for simplicity, we have assumed the source potential to be ground ($V_s = 0$). The synapse weight $W$ is the learned quantity: Its value derives from the floating-gate charge, which can change with synapse use. The synapse output is the product of $W$ and the source current of an idealized MOSFET that has a control-gate input $V_{in}$ and a coupling coefficient $\kappa'$ from the control gate to the channel.
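The multiplicative role of the weight can be illustrated with a minimal sketch. It assumes the subthreshold form in which the source current equals the weight times an idealized MOSFET current that is exponential in the control-gate voltage. The function names and the numeric values of `i_o` and `kappa_prime` are hypothetical and chosen only for illustration; they are not measured device parameters.

```python
import math

U_T = 0.0257  # thermal voltage kT/q near room temperature, in volts


def synapse_weight(q_fg, q_t):
    """Weight W = exp(Q_fg / Q_T); both charges in coulombs."""
    return math.exp(q_fg / q_t)


def source_current(weight, v_in, i_o=1e-15, kappa_prime=0.2):
    """Source current I_s = W * I_o * exp(kappa' * V_in / U_t), source grounded.

    i_o and kappa_prime are hypothetical illustration values.
    """
    return weight * i_o * math.exp(kappa_prime * v_in / U_T)


if __name__ == "__main__":
    base = source_current(1.0, v_in=1.0)
    doubled = source_current(2.0, v_in=1.0)
    # Doubling the weight doubles the output for any fixed input.
    print(doubled / base)
```

Because the weight enters as a plain multiplier, the ratio of two outputs at the same input depends only on the ratio of the weights, which is why a small change in floating-gate charge is enough to change the output appreciably.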
B. Electron Tunneling Increases the Weight
We increase $W$ by tunneling electrons off the floating gate. In Fig. 3, we show the tunneling gate current (the oxide current) versus the reciprocal of the voltage across the tunneling oxide. We fit these data with an FN fit [7], [10]

$$I_g = I_0 \, e^{-V_f / V_{ox}} \tag{3}$$

where $I_g$ is the gate current; $V_{ox}$ is the oxide voltage; $V_f$ is a fit constant, in volts, whose value is consistent with a recent survey [11] of SiO$_2$ tunneling, given the synapse transistor's 400 Å gate oxide; and $I_0$ is a pre-exponential current.
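To make the steepness of the tunneling characteristic concrete, the following sketch evaluates a simple exponential FN form, gate current equal to a pre-exponential current times exp(-V_f / V_ox). The default values of `i_0` and `v_f` are hypothetical placeholders, not the fitted values from Fig. 3.

```python
import math


def fn_tunneling_current(v_ox, i_0=1.0, v_f=1000.0):
    """Simple FN form I_g = I_0 * exp(-V_f / V_ox).

    i_0 (arbitrary units) and v_f (volts) are hypothetical illustration values.
    """
    if v_ox <= 0:
        raise ValueError("v_ox must be positive")
    return i_0 * math.exp(-v_f / v_ox)


def log_current_vs_reciprocal(v_ox_values, **kwargs):
    """Return (-1/V_ox, ln I_g) pairs; they lie on a straight line of slope V_f."""
    return [
        (-1.0 / v, math.log(fn_tunneling_current(v, **kwargs)))
        for v in v_ox_values
    ]


if __name__ == "__main__":
    for x, y in log_current_vs_reciprocal([29.0, 32.0, 35.0]):
        print(f"-1/V_ox = {x:.5f}  ln(I_g) = {y:.3f}")
```

Plotting the logarithm of the current against -1/V_ox, as in Fig. 3, turns this expression into a straight line, which is why that axis choice is convenient for extracting the fit constant.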
The present synapse requires large tunneling voltages, because the gate-oxide thickness is 400 Å. Synapses fabricated in more modern processes with thinner oxides have much lower tunneling voltages. In addition, at lower voltages, the well implant that we use for tunneling can be replaced with a graded implant, reducing the synapse size.
Fig. 2 (caption, beginning not recovered): ... In the 2-µm Orbit process, the synapse length is 48 µm, and the width is 17 µm. All voltages in the conduction-band diagram are referenced to the source potential, and we have assumed subthreshold source currents ($I_s$ < 100 nA). Although the gate-oxide band diagram actually projects into the plane of the page, for clarity we have rotated it by 90° and have drawn it in the channel direction. When compared with a conventional nFET, the p-type substrate implant quadruples the MOS gate-to-channel capacitance. With a 50 fF interpoly capacitor as shown, the coupling coefficient between the poly2 control gate and the poly1 floating gate is only 0.2. To facilitate testing, we enlarged the interpoly capacitor to 1 pF, thereby increasing the coupling to 0.8.
C. CHEI Decreases the Weight
We decrease $W$ by injecting electrons onto the floating gate. To permit CHEI with subthreshold channel currents, we add a bulk p-type implant to the synapse transistor's channel region. This implant serves two functions. First, it increases the peak drain-to-channel electric field, thereby increasing the hot-electron population in the drain-to-channel depletion region. Second, it raises the transistor's threshold voltage from 0.8 to 6 V; this increase ensures that, for typical floating-gate and drain voltages of about 5.5 and 3 V, respectively, the drain-to-gate oxide electric field transports injected electrons to the floating gate, rather than returning them to the drain.
Fig. 3. Tunneling (gate) current $I_g$ versus $-1/V_{ox}$. We define $V_{ox}$ to be the potential difference between the n+ tunneling implant and the floating gate. We fit the data using a conventional Fowler-Nordheim expression. We normalized the data to the tunneling-junction gate-to-n+ edge length, in lineal microns, because the floating gate induces a depletion region in the lightly doped n-well, reducing the effective oxide voltage and with it the tunneling current. Because the gate cannot deplete the n+ well contact appreciably, the oxide field is higher where the self-aligned floating gate overlaps the n+. Because $I_g$ increases exponentially with $V_{ox}$, gate-oxide tunneling in the synapse transistor is primarily an edge phenomenon.
In Fig. 4, we show the CHEI efficiency (gate current $I_g$ divided by source current $I_s$) versus the drain-to-channel potential $V_{dc}$, for a typical value of gate-to-channel potential. We plot the data versus drain-to-channel potential because the hot-electron population derives from the drain-to-channel electric field. We can re-reference our results to the source potential by using the relationship between source and channel potential in a subthreshold MOSFET [12], [13].
When $V_{dc}$ is less than 2 V, the CHEI gate current is exceedingly small, and the weight $W$ remains nonvolatile.
When $V_{dc}$ is greater than 2.5 V, the CHEI gate current causes measurable changes in the synapse weight $W$. For reasons that we discuss in Section V, $V_{dc}$, in this application, typically is less than 3 V, and always is less than 3.5 V. Consequently, we approximate the data of Fig. 4 with a simple exponential

$$I_g = \beta I_s \, e^{V_{dc} / V_{inj}} \tag{4}$$

where $I_g$ is the gate current; $I_s$ is the source current; $V_{dc}$ is the drain-to-channel potential; and $\beta$, $V_{inj}$ are fit constants. As a consequence of the synapse transistor's 6 V threshold, the floating-gate voltage usually exceeds 5 V, and the drain-to-gate oxide electric field strongly favors the transport of injected electrons to the floating gate. The CHEI efficiency therefore is, to first order, independent of the gate-to-channel potential, and we model the CHEI process using only (4).
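A short sketch of the simple-exponential CHEI model follows. It assumes that the gate current is proportional to the source current and grows exponentially with the drain-to-channel potential. The names `beta` and `v_inj` and their default values are hypothetical stand-ins for the two fit constants; they are not the values fitted to Fig. 4.

```python
import math


def chei_gate_current(i_s, v_dc, beta=1e-12, v_inj=0.1):
    """CHEI gate current, modeled as beta * I_s * exp(V_dc / V_inj).

    beta (dimensionless) and v_inj (volts) are hypothetical fit constants.
    """
    return beta * i_s * math.exp(v_dc / v_inj)


def chei_efficiency(v_dc, beta=1e-12, v_inj=0.1):
    """Efficiency I_g / I_s; independent of I_s in this model."""
    return chei_gate_current(1.0, v_dc, beta=beta, v_inj=v_inj)


if __name__ == "__main__":
    for v_dc in (2.0, 2.5, 3.0, 3.5):
        print(f"V_dc = {v_dc:.1f} V  efficiency = {chei_efficiency(v_dc):.3e}")
```

In this model the efficiency changes by a fixed factor for every fixed increment in drain-to-channel potential, which is the behavior that lets a modest rise in drain voltage switch the synapse from nonvolatile storage to measurable weight decay.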
D. Synapse Weight Updates Follow a Power Law
A synapse's weight updates derive from the tunneling and CHEI oxide currents that alter the floating-gate charge. Because these oxide currents vary with the synapse's terminal voltages and source current, $W$ varies with the terminal voltages, which are imposed on the device, and with the source current, which is the synapse output. Consequently, the synapse learns: Its future output depends on both the applied input and the present output.
In Fig. 5, we show the temporal derivative of the source current versus the source current, for a synapse transistor with (part A) a set of fixed tunneling voltages, and (part B) a set of fixed drain voltages. In both experiments, we held the control-gate input fixed; consequently, these data show the synapse weight updates $\partial W / \partial t$, as can be seen by differentiating (2). In Appendix A, we show that the tunneling-induced weight increments follow a power law

$$\frac{\partial W}{\partial t} \approx \frac{1}{\tau_{tun}} W^{(1-\sigma)} \tag{5}$$

where we define $\sigma$ and $\tau_{tun}$ in (16) and (17), respectively. In Appendix B, we show that the CHEI-induced weight decrements also follow a power law

$$\frac{\partial W}{\partial t} \approx -\frac{1}{\tau_{inj}} W^{(2-\varepsilon)} \tag{6}$$

where we define $\varepsilon$ and $\tau_{inj}$ in (26) and (27), respectively.
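The two power laws can be integrated in closed form when the terminal voltages are held fixed, which gives a quick way to see how a weight evolves under tunneling alone or CHEI alone. The sketch below assumes update rules of the form dW/dt = W^(1-sigma)/tau_tun for tunneling and dW/dt = -W^(2-epsilon)/tau_inj for CHEI. The default exponents use the mean values 0.17 and 0.24 quoted with Fig. 5; the time constants passed in are hypothetical.

```python
def tunneling_weight(w0, t, tau_tun, sigma=0.17):
    """Closed-form solution of dW/dt = W**(1 - sigma) / tau_tun with W(0) = w0."""
    return (w0 ** sigma + sigma * t / tau_tun) ** (1.0 / sigma)


def chei_weight(w0, t, tau_inj, epsilon=0.24):
    """Closed-form solution of dW/dt = -W**(2 - epsilon) / tau_inj with W(0) = w0."""
    a = 1.0 - epsilon
    return (w0 ** (-a) + a * t / tau_inj) ** (-1.0 / a)


if __name__ == "__main__":
    # tau values are hypothetical; they set only the time scale.
    for t in (0.0, 5.0, 10.0, 20.0):
        up = tunneling_weight(1.0, t, tau_tun=10.0)
        down = chei_weight(1.0, t, tau_inj=10.0)
        print(f"t = {t:5.1f}  tunneling W = {up:.4f}  CHEI W = {down:.4f}")
```

The CHEI solution decays as a power of time rather than exponentially, so under this model a deselected weight keeps shrinking without saturating at a floor.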
III. THE LEARNING ARRAY
In Fig. 6, we show one row of the learning array, comprising a synapse transistor at each array node and a normalization circuit at the row boundary. The column inputs and the row-learn signals are 10 µs digital pulses. Each synapse multiplies its binary-valued input with its stored weight $W$, and outputs a source current $I_s$ whose magnitude is given by (2). The total row current $I_{sum}$ is the sum of the source currents from all the synapses in the row. Synapses ordinarily are on; low-true gate inputs turn off selected synapses, decreasing the current $I_{sum}$ transiently. This decrease in $I_{sum}$, in response to an input vector, is the row computation.
Fig. 5 (caption, beginning not recovered): ... and plotted ($\partial I_s / \partial t$) versus $I_s$. We fixed the synapse's terminal voltages; consequently, the change in $I_s$ is a result of changes in the synapse's weight $W$. In part A, we applied $V_{in}$ = 5 V, $V_s$ = 0 V, $V_{ds}$ = 2 V, and stepped $V_{tun}$ from 29 to 35 V in 1 V increments; in part B, we applied $V_{in}$ = 5 V, $V_s$ = 0 V, $V_{tun}$ = 20 V, and stepped $V_{ds}$ from 2.9 to 3.5 V in 0.1 V increments. We turned off the tunneling and CHEI at regular intervals, to measure $I_s$. Because, for a fixed $V_{in}$, the synapse's weight updates $\partial W / \partial t$ are proportional to ($\partial I_s / \partial t$) [see (2)], these data show that the weight updates follow a power law. The mean values of $\sigma$ and $\varepsilon$ are 0.17 and 0.24, respectively.
Synapse-weight increases occur only when both the row-learn and column inputs are true. To see why, we first consider the case when the row-learn signal is false ($V_{tun}$ is low). Because $V_{ox} = V_{tun} - V_{fg}$, when $V_{tun}$ is low, $V_{ox}$ is small for every synapse in the row. When $V_{ox}$ is small, the tunneling currents are small, and there is no weight increase at any row synapse. Now we consider the case when the row-learn signal is true ($V_{tun}$ is high). $V_{ox}$ increases as $V_{fg}$ decreases, and $V_{fg}$ follows $V_{in}$. If a low-true column input is true, then $V_{in}$ is low; $V_{ox}$ is large, and electron tunneling causes a weight increase at the selected synapse. If, on the other hand, the low-true column input is false, then $V_{in}$ is high; $V_{ox}$ is too small to cause appreciable tunneling, and there is little change in the synapse's weight.
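The row-column selection argument can be checked with a small numerical sketch. It assumes that the oxide voltage is the tunneling-implant voltage minus the floating-gate voltage, that the floating gate follows the control gate through a fixed coupling coefficient, and that tunneling follows a simple exponential FN form. The resting floating-gate voltage (5.5 V), the coupling (0.8), the 5 V column swing, and the 12 V row-learn step come from figures quoted elsewhere in the text; the 20 V resting tunneling voltage and the constant `v_f` are hypothetical choices for illustration.

```python
import math


def oxide_voltage(v_tun, v_in, v_fg_rest=5.5, v_in_rest=5.0, coupling=0.8):
    """V_ox = V_tun - V_fg, with V_fg tracking V_in through the input coupling."""
    v_fg = v_fg_rest + coupling * (v_in - v_in_rest)
    return v_tun - v_fg


def relative_tunneling(v_ox, v_f=1000.0):
    """Tunneling current up to a constant factor; v_f is a hypothetical value."""
    return math.exp(-v_f / v_ox)


def selection_table(v_tun_low=20.0, v_tun_high=32.0, v_in_low=0.0, v_in_high=5.0):
    """Relative tunneling current for the four row/column input combinations."""
    table = {}
    for row, v_tun in (("row_false", v_tun_low), ("row_true", v_tun_high)):
        # Column inputs are low-true: a true input pulls V_in low.
        for col, v_in in (("col_true", v_in_low), ("col_false", v_in_high)):
            table[(row, col)] = relative_tunneling(oxide_voltage(v_tun, v_in))
    return table


if __name__ == "__main__":
    for key, value in selection_table().items():
        print(key, f"{value:.3e}")
```

Only the combination with both inputs true yields a large oxide voltage, so the exponential tunneling characteristic acts as the coincidence detector without any additional selection circuitry.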
Tunneling increases the weight value of a row-column selected synapse. Because this weight update is single quadrant, tunneling allows unbounded weight increases. To constrain the array-weight values, we renormalize the weights in each row of the array. Our array affords unsupervised learning [14] , with the following constraint: The sum of the row-synapse weights, averaged over time, is a constant. The array error metric is a weight normalization; we use CHEI feedback along each row of the array to enforce the constraint.
IV. WEIGHT NORMALIZATION
The weight-normalization circuit (see Fig. 6) compares $I_{sum}$, the sum of the synapse drain currents in a row, with $I_b$, the bias current set by the bias transistor; if $I_{sum}$ exceeds $I_b$, then the circuit uses CHEI to renormalize the weights. To explain the renormalization, we begin by defining row equilibrium: A row is in equilibrium when $I_{sum} = I_b$. In equilibrium, the drain voltage $V_d$ typically causes little or no CHEI in the row synapses.
The normalization circuit constrains $I_{sum}$ as follows: Assume that the row initially is in equilibrium, and that tunneling then raises the weight values of selected synapses, increasing $I_{sum}$. The excess drain current ($I_{sum} - I_b$) is mirrored into the normalization-circuit capacitor (see Fig. 6), causing the capacitor voltage to rise; the circuit forces $V_d$ to follow the capacitor voltage. When $V_d$ rises, all the row synapses undergo CHEI, decreasing all the weights, causing $I_{sum}$ to fall. As $I_{sum}$ falls, $V_d$ also falls, and the row returns to equilibrium. The drain-current constraint requires that, over time, $I_{sum} = I_b$. The normalization circuit creates a negative resistance at the synapses' common drain node, causing $V_d$ to rise when $I_{sum}$ increases.
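We now show how the drain-current constraint renormalizes the synapse weights. We begin with the constraint

$$I_{sum} = \sum_j I_{sj} = I_b. \tag{7}$$

In Section V, we show that the renormalization time constant exceeds 10 s; this value is $10^6$ times longer than the 10-µs input pulses (where $t_{pw}$ = 10 µs). Consequently, for renormalization, we replace $V_{in}$ in (2) with its temporal average $\bar{V}_{in}$, and we assume that $\bar{V}_{in}$ both is time invariant and has the same value for all the row synapses. Substituting (2) into (7), we have

$$\sum_j W_j I_o \, e^{\kappa' \bar{V}_{in} / U_t} = I_b \tag{8}$$

so that

$$\sum_j W_j = \frac{I_b}{I_o} \, e^{-\kappa' \bar{V}_{in} / U_t} = \text{constant}. \tag{9}$$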
The drain-current and weight-value constraints are equivalent; consequently, row feedback renormalizes the synapse weights. Renormalization forces the row synapses to compete for floating-gate charge; when one synapse's weight value increases, the sum of the weight values of its row neighbors must decrease by the same amount. However, when a selected synapse tunnels, increasing its weight, renormalization forces all the row synapses to undergo CHEI, decreasing all the row-synapse weights. The selected synapse undergoes both tunneling and CHEI; because the exponent in the CHEI weight-update rule is larger than that in the tunneling rule [see (5) and (6)], renormalization constrains a synapse's weight-update rate, in addition to its weight value.
Fig. 6 (caption, beginning not recovered): ... (see Fig. 3), syn1's floating gate receives about 100 times more charge than do the other synapses' floating gates; because $W$ increases exponentially with floating-gate charge [see (2)], syn1's weight increases much more than do the other synapses' weights. The weight increase causes $I_{sum}$ to rise, which, in turn, causes the normalization circuit to raise $V_d$. Because the CHEI efficiency increases with $V_{ds}$ (see Fig. 4), a higher $V_d$ causes CHEI in all the synapses, decreasing all the weights. The array eventually settles back to equilibrium, with $I_{sum}$ equal to $I_b$, but syn1 now takes a larger share of the total row current, and the other synapses each take a smaller share. The inverting amplifier in the weight-normalization circuit enhances loop stability, for reasons that we discuss in Section V.
Tunneling and CHEI effectively redistribute a fixed quantity of floating-gate charge among the row synapse transistors. In Appendix C, we derive the array learning rule, (10) and (11), for coincident $(x, y)$ pulse inputs to a selected synapse; the constants in the rule are defined in (26) and (36). In Figs. 7 and 8, we show unsupervised learning in one row of our array; these data highlight both the synapse weight and the update-rate constraints. We fit the data by applying (10) and (11), recursively; the only inputs to the fit equations are the synapse weights at $n = 0$ and the fit constants $\tau_{tun}$, $\sigma$, and $\varepsilon$.
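The competition among row synapses can be illustrated with a first-order simulation. This is a simplified sketch, not the learning rule of (10) and (11): it assumes that each coincident pulse adds a tunneling increment proportional to W^(1-sigma) at the selected synapse, and that renormalization then removes the excess from all synapses in proportion to W^(2-epsilon), restoring the row sum exactly. The step size `dt_over_tau_tun` is hypothetical; the exponents use the mean values quoted with Fig. 5.

```python
def learn_step(weights, selected, dt_over_tau_tun=1e-3, sigma=0.17, epsilon=0.24):
    """One coincident-pulse update followed by first-order renormalization."""
    target = sum(weights)  # row constraint: the weight sum is held constant
    bumped = list(weights)
    bumped[selected] += dt_over_tau_tun * bumped[selected] ** (1.0 - sigma)
    excess = sum(bumped) - target
    # CHEI decrements scale as W**(2 - epsilon); choose the common factor g
    # that removes exactly the excess from the row.
    shape = [w ** (2.0 - epsilon) for w in bumped]
    g = excess / sum(shape)
    return [w - g * s for w, s in zip(bumped, shape)]


def train(weights, selected, pulses):
    """Apply a train of coincident pulses to one synapse in the row."""
    for _ in range(pulses):
        weights = learn_step(weights, selected)
    return weights


if __name__ == "__main__":
    row = train([1.0, 1.0, 1.0, 1.0], selected=0, pulses=1000)
    print([round(w, 4) for w in row], "sum =", round(sum(row), 6))
```

The selected synapse gains a growing share of the row total while every deselected synapse loses weight, which mirrors the qualitative behavior described for Figs. 7 and 8 without claiming to reproduce the measured curves.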
V. NORMALIZATION-CIRCUIT STABILITY
The normalization circuit creates a negative resistance at the synapses' common drain node: When $I_{sum}$ increases, $V_d$ rises. The loop output is $V_d$, and the loop feedback comprises CHEI oxide currents: When $V_d$ rises, CHEI decreases the synapse weights, causing $I_{sum}$ to fall. Because the CHEI oxide currents increase exponentially with $V_d$, the loop dynamics are highly nonlinear. We therefore describe qualitative, rather than quantitative, loop-stability criteria.
The normalization circuit employs positive feedback; to ensure stability, we must make the loop gain less than unity for all frequencies. This requirement implies that the small-signal impedance $Z_d$, looking into the synapse drain terminals, must be greater than the total impedance $Z_c$ at the normalization-circuit capacitor. To see why, we assume instead that $Z_c > Z_d$. A rising $V_d$ induces a small-signal current $i_d = v_d / Z_d$; $i_d$ is mirrored into the capacitor, causing the capacitor voltage to rise by an amount $i_d Z_c$. Because $V_d$ follows the capacitor voltage, if $Z_c > Z_d$, then $i_d Z_c > v_d$; $V_d$ will increase rapidly, rising toward the supply rail. The impedance $Z_d$ is limited by interconnect capacitances, and by synapse-transistor channel-length modulation, floating-gate-to-drain overlap capacitance, and drain-current impact ionization. We consider each of these limitations in turn.
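The stability condition can be pictured with a linear small-signal caricature of the loop. The sketch assumes that a drain-voltage perturbation produces a current equal to the perturbation divided by the drain impedance, and that this current, mirrored onto the capacitor node, produces a new perturbation equal to the current times the capacitor-node impedance. The function names, the impedance values, and the purely resistive treatment are all hypothetical simplifications; the real loop is nonlinear and frequency dependent.

```python
def is_stable(z_drain, z_cap):
    """Loop gain z_cap / z_drain must stay below unity for stability."""
    return z_cap / z_drain < 1.0


def perturbation_after(trips, z_drain, z_cap, v0=1e-3):
    """Size of a drain-voltage perturbation after a number of trips around the loop."""
    gain = z_cap / z_drain
    v = v0
    for _ in range(trips):
        # Current v / z_drain is mirrored onto the capacitor node,
        # where it develops a voltage (v / z_drain) * z_cap.
        v *= gain
    return v


if __name__ == "__main__":
    print(is_stable(1e9, 1e8), perturbation_after(10, 1e9, 1e8))
    print(is_stable(1e8, 1e9), perturbation_after(10, 1e8, 1e9))
```

When the drain impedance dominates, each trip around the loop shrinks the perturbation geometrically; when the capacitor-node impedance dominates, the same loop amplifies it, which is the runaway toward the supply rail described above.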
A. Interconnect Capacitance
Interconnect capacitance at the synapses' common drain node causes the drain impedance to decrease with frequency. We choose the normalization-circuit capacitor to be much larger than this parasitic capacitance, so the ratio of the reactive impedances favors loop stability for all frequencies.
B. Channel-Length Modulation
Channel-length modulation reduces a synapse's drain impedance, limiting the impedance looking into the common drain node. Fortunately, the synapse transistor's Early voltage exceeds 100 V, as a result of both the 10 µm channel length and the p-type channel implant; consequently, the channel-length modulation is small.
C. Floating-Gate-to-Drain Overlap Capacitance
$V_d$ couples to a synapse transistor's floating gate, by means of the floating-gate-to-drain overlap capacitance. The coupling coefficient is the ratio of this overlap capacitance to the total floating-gate capacitance. Because $I_s$ increases exponentially with $V_{fg}$, this coupling causes $I_s$ to increase exponentially with $V_d$, limiting the drain impedance. To minimize the effect, we use a large interpoly capacitor (1 pF); we also apply inverting feedback from $V_d$ to the floating gate, increasing the drain impedance (see Fig. 6). We use an off-chip amplifier to generate this inverting feedback; in future arrays, we will use instead our on-chip adaptive floating-gate amplifier [15].
Fig. 7. Array learning behavior, with fits. We initialized all synapses to the same source-current value prior to starting the experiment. We first applied a train of coincident $(x, y)$ 10-µs pulses to synapse 1, causing its weight value and source current to increase. Renormalization caused the weight values and source currents of the other synapses to decrease. Once synapse 1 had acquired 90% of the total row current, we removed the pulse-train stimulus and instead applied it to synapse 2, and then, in turn, to synapses 3 and 4. We measured the synapse source currents after every $10^3$ input pulses. In the lower half of the figure, we highlight the first 1600 data points, and fit these data by applying (10) and (11), recursively. The inputs to the fit equations are the initial synapse source-current values (at $n = 0$); the pulsewidth $t_{pw}$ = 10 µs; and the empirical constants $\tau_{tun}$, $\sigma$, and $\varepsilon$. These data show that we can address individual synapses with good selectivity, and can achieve wide separation in the weight values of selected versus deselected synapses.
D. Drain-Current Impact Ionization
Channel electrons that possess sufficient energy for CHEI also possess sufficient energy for impact ionization [16], [17]. In the synapse transistor, a drain-to-channel electric field that causes CHEI also creates additional electron-hole pairs, causing the synapse drain current to increase exponentially with $V_d$. As a result, $I_{sum}$ increases exponentially with $V_d$, limiting the drain impedance. If $V_d$ becomes greater than about 4 V, the rate of drain-current increase causes loop instability, and $V_d$ rises rapidly. As $V_d$ rises, CHEI decreases all the synapse-transistor weights; as $V_d$ saturates near the supply rail, CHEI causes $I_{sum}$ to fall below $I_b$, causing $V_d$ to fall, and the loop to return to a stable operating regime. Loop instability causes $V_d$ to undergo a single brief ( 10 s) voltage spike, and reduces all the synapse weights substantially. Fortunately, because the synapse CHEI efficiency is high, weight renormalization rarely causes $V_d$ to exceed 3.5 V; consequently, the loop is stable.
Fig. 8. Array learning behavior, with fits. We replot the lower half of Fig. 7, this time on a logarithmic, rather than on a linear, scale. This plot highlights both the synapse weight and update-rate constraints, and shows that the weight values of deselected synapses do not saturate, but instead follow a power-law decay as predicted by (6) and (10).
In Fig. 9, we show the normalization-circuit impedance versus frequency; in Fig. 10, we show the circuit's impulse response. Although the low-frequency time constant (the adaptation time constant) decreases as $V_d$ increases, it typically exceeds 10 s. The loop impulse response shows that, for short timescales, the total drain current $I_{sum}$ can exceed $I_b$, violating the normalization constraint; for long timescales, $I_{sum} = I_b$.
The parasitic coupling between a synapse's tunneling junction and its floating gate is about 5 fF. With a 1 pF interpoly capacitor, a 12 V row-learn pulse increases the floating-gate voltage of every row synapse by about 60 mV. This coupling does not affect the row computation significantly, for two reasons. First, 5 V low-true column inputs always turn off selected synapses, regardless of the row-learn signal. Second, because row-learn pulses increase the floating-gate voltage of every deselected synapse by a fixed 60 mV, we can calculate the corresponding source-current increase using (1), and can adjust accordingly.
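The 60 mV figure follows from a capacitive divider, which the sketch below reproduces. It assumes a total floating-gate capacitance of about 1 pF (dominated by the enlarged interpoly capacitor); that value, the function names, and the coupling coefficient `kappa` used for the current estimate are assumptions for illustration.

```python
import math

U_T = 0.0257  # thermal voltage kT/q near room temperature, in volts


def floating_gate_step(v_pulse, c_parasitic, c_total):
    """Voltage step coupled onto the floating gate by a capacitive divider."""
    return v_pulse * c_parasitic / c_total


def source_current_ratio(delta_v_fg, kappa):
    """Factor by which a subthreshold source current grows for a floating-gate step.

    kappa is the floating-gate-to-channel coupling coefficient (hypothetical value).
    """
    return math.exp(kappa * delta_v_fg / U_T)


if __name__ == "__main__":
    dv = floating_gate_step(12.0, 5e-15, 1e-12)  # 12 V pulse, 5 fF into ~1 pF
    print(f"floating-gate step = {dv * 1e3:.1f} mV")
    print(f"source-current factor (kappa = 0.2) = {source_current_ratio(dv, 0.2):.3f}")
```

Because the step is the same for every deselected synapse in the row, the resulting current increase is a fixed multiplicative factor that can be accounted for when reading the row output.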
VI. CONCLUSION
We have shown simultaneous computation and unsupervised learning in an array of nFET synapse transistors. The array computes the inner product of an input vector and a stored analog weight matrix. The array weights are nonvolatile; coincident row and column input pulses cause weight increases at selected synapses. We constrain the time-averaged sum of the row-synapse weights to be constant, forcing row synapses to compete for weight value. The array computation and synapse-weight modification occur locally and in parallel. The array achieves our goals of fast, single-transistor analog computation and of slow, locally computed weight adaptation. We describe the array computation and learning behavior using rules derived directly from the silicon-MOS and silicon-oxide physics.
SiO$_2$ trapping is a well-known issue in floating-gate transistor reliability [18]; in the synapse, oxide trapping decreases the weight-update rates. Fortunately, because our synapses require only small quantities of charge for their weight updates, we can safely ignore oxide trapping in the learning array.
Finally, although our array affords unsupervised learning, it uses a feedback error signal to constrain the weight values. Feedback error signals typically are used in supervised neural networks, to adjust the array weights according to the network learning rule. In future floating-gate arrays, rather than using unsupervised learning, we intend to use CHEI to adjust the synapse weights in a supervised fashion, using either pulsed, or continuously valued analog [19] , inputs and row-error signals.
APPENDIX A

The Tunneling Weight-Increment Rule
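We begin by taking the temporal derivative of the synapse weight $W$, where $W = e^{Q_{fg}/Q_T}$:

$$\frac{\partial W}{\partial t} = \frac{W}{Q_T} \frac{\partial Q_{fg}}{\partial t} = \frac{W}{Q_T} I_g. \tag{12}$$

We substitute (3) for the gate current

$$\frac{\partial W}{\partial t} = \frac{I_0}{Q_T} W e^{-V_f / V_{ox}}. \tag{13}$$

We substitute $V_{ox} = V_{tun} - V_{fg}$ (where $V_{tun}$ and $V_{fg}$ are the tunneling-implant and floating-gate voltages, respectively), assume that $V_{tun} \gg V_{fg}$, expand the exponent using $(1 - x)^{-1} \approx 1 + x$, and solve

$$\frac{\partial W}{\partial t} \approx \frac{I_0}{Q_T} W e^{-V_f / V_{tun}} e^{-V_f V_{fg} / V_{tun}^2}. \tag{14}$$

We substitute for $V_{fg}$ using (1) and (2), which give $V_{fg} = (U_t/\kappa) \ln W + (\kappa'/\kappa) V_{in}$, and solve for the tunneling weight-increment rule

$$\frac{\partial W}{\partial t} \approx \frac{1}{\tau_{tun}} W^{(1-\sigma)} \tag{15}$$

where

$$\sigma \equiv \frac{U_t V_f}{\kappa V_{tun}^2} \tag{16}$$

and

$$\tau_{tun} \equiv \frac{Q_T}{I_0} \, e^{(V_f / V_{tun}) + (\sigma \kappa' V_{in} / U_t)}. \tag{17}$$

The parameters $\sigma$ and $\tau_{tun}$ vary with the tunneling voltage $V_{tun}$.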
APPENDIX B
The CHEI Weight-Decrement Rule
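We begin by defining a synapse transistor's drain-to-channel potential, $V_{dc}$, in terms of $I_s$ and $V_{ds}$. In a subthreshold floating-gate MOSFET, the source current is related to the floating-gate and source voltages [9] by

$$I_s = I_o \, e^{(\kappa V_{fg} - V_s) / U_t} \tag{18}$$

and the channel-surface potential, $\Psi$, is related to the floating-gate voltage, $V_{fg}$ [12], [13] by

$$\Psi = \kappa V_{fg} + \Psi_o \tag{19}$$

where $\kappa$ is the coupling coefficient from the floating gate to the channel, and $\Psi_o$ derives from the MOS process parameters. Using (18) and (19), we solve for the surface potential $\Psi$ in terms of $I_s$ and $V_s$

$$\Psi = V_s + U_t \ln\!\left(\frac{I_s}{I_o}\right) + \Psi_o. \tag{20}$$

We now solve for $V_{dc}$

$$V_{dc} = V_d - \Psi = V_{ds} - U_t \ln\!\left(\frac{I_s}{I_o}\right) - \Psi_o. \tag{21}$$

The CHEI gate current is given by (4). We add a minus sign to $I_g$, because CHEI decreases the floating-gate charge, and substitute for $V_{dc}$ using (21)

$$I_g = -\beta I_s \, e^{V_{dc} / V_{inj}} = -\beta I_o^{\,\varepsilon} I_s^{(1-\varepsilon)} e^{(V_{ds} - \Psi_o) / V_{inj}}. \tag{22}$$

We substitute for $I_s$ using (2), and solve

$$I_g = -\beta I_o W^{(1-\varepsilon)} e^{(1-\varepsilon) \kappa' V_{in} / U_t} \, e^{(V_{ds} - \Psi_o) / V_{inj}}. \tag{23}$$

We substitute (23) into

$$\frac{\partial W}{\partial t} = \frac{W}{Q_T} I_g \tag{24}$$

to get the final weight-decrement rule

$$\frac{\partial W}{\partial t} \approx -\frac{1}{\tau_{inj}} W^{(2-\varepsilon)} \tag{25}$$

where

$$\varepsilon \equiv \frac{U_t}{V_{inj}} \tag{26}$$

and

$$\tau_{inj} \equiv \frac{Q_T}{\beta I_o} \, e^{-(V_{ds} - \Psi_o) / V_{inj} \, - \, (1-\varepsilon) \kappa' V_{in} / U_t}. \tag{27}$$

The low-true column-input ($x$) pulse duty cycle typically is small, so $V_{in}$ normally is high ($V_{in}$ = 5 V). We therefore assume that $V_{in}$ is a constant (5 V) in (27).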
APPENDIX C
The Array Learning Rule
We consider the row-synapse weights at discrete time intervals $t = n \Delta t$, where $n$ is the step number and $\Delta t$ is the timestep, and derive the row-learning rule for a single coincident $(x, y)$ input to a single row synapse. We begin with the equilibrium condition for the row-weight normalization (28). We assume that the normalization time constant is fixed, for the following reason: Coincident $(x, y)$ input pulses cause a weight increase at a synapse; the normalization circuit responds by establishing a drain voltage for which the total weight decay, summed over all the row synapses, balances the weight increase at the single synapse. If we assume that the mean density of the coincident input pulses is time-invariant, then the mean value of $V_d$ is constant, and therefore the low-frequency loop time constant also is constant.
We assume that . The synapse weight values can violate (28) for times , but we require that they satisfy (28) at our measurement time intervals . We permit array inputs at times , immediately after we measure the synapse weight values at . The array inputs comprise a pulsed column vector , where V V , and a pulsed row vector , where V V . Without loss of generality, we assume that at time , the circuit is in equilibrium, and that at , coincident row and column inputs, of duration , have caused synapse 's weight to increase (29) (30) where in (29) we have made the first-order approximation that is constant over , and in (30) we have substituted for using (5) . Because , at time the circuit no longer is in equilibrium (31) and the synapse weights inject down to reestablish equilibrium.
We wish to find the synapse weights at ( ), when the row again satisfies (28). Using (25) and (30), we write weight-decrement expressions for the row synapses (32) (33) where, because the row drain voltage settles during renormalization, may vary over (recall that ). For reasonable values of and , the weight increment from a single coincident ( ) input is small; consequently, we can simplify (33) using (34)
Because varies over , we now re-express in terms of quantities that we know at . We equate the weight increment at synapse [see (30) ] to the sum of the weight decrements at synapses (32) and (34)
and we solve for :
We define , substitute into (32), and use (28) to solve for the row-learning rule (10) (11) Equations (10) and (11) describe the row weight-update rule for a single coincident ( ) pulse input to synapse .
