A simple analog-signal synapse model is developed and later implemented on a standard 0.35 µm CMOS process to provide for large scale of integration, high processing speed and manufacturability of a multi-layer artificial neural network. Synapse nonlinearity with respect to synapse weight is studied. The capability of the circuit to operate in both feed-forward and learning (training) modes is demonstrated. The effect of the synapse's inherent quadratic nonlinearity on learning convergence and on the optimization of the weight vector update direction is analyzed and found to be beneficial. The suitability of the proposed implementation for very large-scale artificial neural networks is confirmed.
I. INTRODUCTION
The signal processing speed, scale of integration, low power consumption and manufacturability of present-day ANNs determine their feasibility and usage in real-life applications. Due to the conflicting requirements of lowering supply voltages and increasing clock speeds in digital circuits, many researchers consider analog implementations of neural networks as a way to carry out signal processing functions with fewer active semiconductor devices. The integration of large numbers of neurons in a single chip is beneficial since it increases the VC dimension [1] [2]. It requires that the synapse area be minimized and that a more efficient way of exchanging data between neurons be devised. In this respect, analog implementations offer certain benefits that make them good contenders for real-time applications. First, they offer high processing speed, since the analog signal processing is carried out through summation and multiplication of continuous current or voltage signals with virtually no delay. Second, analog implementations can typically reach a larger scale of integration, since they avoid a datapath organization, which often requires data multiplexing, bus sharing, and data-flow control logic and further limits the effective rate at which digital neural circuits can process input signals.

The main disadvantages of analog-based designs of ANNs are considered to be their lower accuracy and the difficulties with linearity in the computations. These two factors are challenged in this article. First, it is demonstrated that the term "absolute accuracy" is often of lower significance with respect to the ability of a neural network to function in many practical applications. Second, it is demonstrated that ideal linearity in the multiplication computations is not necessarily desirable or even required. In most cases, nonlinearity in the synapse transfer function is, in fact, beneficial [11] [12] [13].
This article is limited to the discussion of the quadratic nonlinearity in the synapse multiplication function of a specific analog implementation.
The paper is structured as follows. Section II describes the proposed analog, nonlinear, one-transistor synapse model and explains the motivation behind avoiding the use of floating-gate devices. Section III examines the inherent nonlinearity of the synapse with respect to its weight. Synapse model functional verification results follow brief extracts from our analytical research on the effects of the quadratic nonlinearity on feed-forward operation and LMS training. Results of our circuit simulations and system-level MatLab™ verification of an artificial neuron acting as a linear classifier are presented next. Summary and conclusions presented in Section IV wrap up the paper.

Summing the currents of those "partial products", we get the complete "sum of the weighted products". To express this, we consider a single synapse, k, and define:
Next, from (1), we derive a generic form of the quadratic nonlinearity of the synapse's internal activity field with respect to its weight:
where C is a constant (C = 0.5). We chose the above definitions due to practical considerations: to provide for signal values that are of the same or close order of magnitude. Nevertheless, the results in this text are more generic and can be applied to other non-linear relationships similar to expression (3), provided that the relationship can be approximated linearly within a certain operational range.

For $V_{GS} \geq 1.0$ V, $V_{DS} \leq 100$ mV and $V_{SB} = 0$ V, second-order effects, including channel-length modulation, short-channel and temperature effects, are estimated to contribute an average error of -5.1%. This error, however, is considered included in the overall nonlinearity of $i_D$ and does not change the applicability of the considerations given.

For a typical operating drain-source voltage ($v_{DS} < 100$ mV) in the non-saturated mode of operation, channel-length modulation contributes an error of no more than 0.02%, which is ignored in further consideration.
For N synapses, the overall synaptic activity is:
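As an illustration of the relationships described above, the single-synapse current and the overall activity can be written in the following form. This is a sketch under stated assumptions, not a statement of the exact expressions (1)-(4): it assumes the standard long-channel, triode-region MOSFET model, with the weight encoded in the drain-source voltage and the input in the gate overdrive. The symbols $K$, $x_k$, $w_k$ and $v_k$ are introduced here for illustration only.

```latex
% Assumed long-channel triode-region drain current of synapse k
i_{D,k} = K\left[(v_{GS,k} - V_T)\,v_{DS,k} - \tfrac{1}{2}\,v_{DS,k}^{2}\right]

% Assumed normalization: input x_k = v_{GS,k} - V_T, weight w_k = v_{DS,k}
v_k = \frac{i_{D,k}}{K} = w_k x_k - C\,w_k^{2}, \qquad C = \tfrac{1}{2}

% Overall synaptic activity for N synapses (sum of the partial products)
v = \sum_{k=1}^{N} \left( w_k x_k - C\,w_k^{2} \right)
```

Under this reading, the first term is the ideal "weighted product" and the second is the weight-dependent quadratic offset discussed in the rest of the text.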
Based on these considerations, a single MOSFET device offers a simple way of constructing a "linear combiner" in hardware. Its main advantage over single-transistor synapses implemented in analog floating-gate-capable technologies is that it does not require any special fabrication technology, and thus it is easily integrated with other standard CMOS applications to build a complete system-on-a-chip (SoC). Floating-gate technology is available in most "standard" CMOS processes; however, it is most often used for binary information storage. In order to reach a 9-bit or better analog storage resolution, a more specialized and expensive floating-gate fabrication technology is required. Additionally, analog floating-gate control circuits are complicated and small weight updates are difficult [14].
The proposed synapse model is inherently nonlinear but simple enough in its implementation to occupy a very small silicon area, making it very useful in VLSI systems. Further, we show that this nonlinearity is not detrimental to the qualities of the proposed synapse but, in fact, could be beneficial. We also include circuit simulation and system-level behavioral simulation results that support the feasibility of using such nonlinear synapses as building blocks of ANNs.
III. EFFECTS OF THE NONLINEARITY

A. Effect of synapse quadratic nonlinearity in feed-forward mode
To show the effect of the quadratic nonlinearity with respect to synapse weight, due to the described implementation, we evaluate the error defined by:

Expressed in terms of synapse transistor quantities:
From (6) we note that the linearity error does not depend on transistor transconductance parameters, i.e., on process or geometrical parameters. For a typical signal range ($v_{DS}$ = 100 mV, $v_{GS}$ = 1.0 V, $V_T$ = 0.63 V), we estimate this nonlinearity "error" to be less than 15% (14.29% worst-case). We could apply an input bias to an extra synapse (theta-synapse) to eliminate this "offset" error⁴ in feed-forward mode if needed. In feed-forward mode this bias term is a known constant, thus we could eliminate this term after network training is complete and the weights are known. Such correction, however, is not applied in the experiments shown since it is our belief that this inherent offset term is accounted for by the Back-Propagation algorithm during training and, thus, it can be treated by the adaptive process as "constant input noise".
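The independence of the linearity error from the transconductance parameter can be checked numerically. The sketch below assumes the standard long-channel triode-region MOSFET equation, with the ideal product taken as $(v_{GS} - V_T)\,v_{DS}$. The function names and the parameter `k` are hypothetical, and the exact error definition used in (5) and (6) may differ from the relative error computed here.

```python
def synapse_current(v_gs, v_ds, v_t, k=1.0):
    """Triode-region drain current (assumed long-channel model)."""
    return k * ((v_gs - v_t) * v_ds - 0.5 * v_ds ** 2)


def linearity_error(v_gs, v_ds, v_t, k=1.0):
    """Relative deviation of the synapse current from the ideal product."""
    ideal = k * (v_gs - v_t) * v_ds          # ideally linear weighted product
    actual = synapse_current(v_gs, v_ds, v_t, k)
    # k cancels in the ratio; algebraically this is v_ds / (2 * (v_gs - v_t))
    return (ideal - actual) / ideal


# Typical signal range quoted in the text
err = linearity_error(v_gs=1.0, v_ds=0.1, v_t=0.63)
# Same operating point with a very different (arbitrary) transconductance
err_other_k = linearity_error(v_gs=1.0, v_ds=0.1, v_t=0.63, k=250e-6)
print(f"relative linearity error: {100 * err:.2f} %")
```

In this model the error depends only on the ratio of the drain-source voltage to the gate overdrive, so it stays below 15% for the quoted operating point whatever value of `k` is chosen.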
B. Effect of synapse quadratic nonlinearity in least-mean-square (LMS) training
To study the effects of the "offset" term in (4), we use the instantaneous estimate of the gradient and the method of steepest descent in LMS training:

A corresponding weight-update vector diagram is shown in Figure 1. We define the difference between the update vector in the case of an ideally linear synapse output and the case of a nonlinear synapse with quadratic weight nonlinearity as a 'residual weight gradient vector':

$$\mathbf{W}_r = 2C\,\mathbf{W}_N e_N \qquad (12)$$

We then define a 'modified' instantaneous error gradient vector estimate:

$$\nabla_C \mathbf{W}\big|_N = -\mathbf{X}_N e_N + \mathbf{W}_r \qquad (13)$$

and then rewrite the weight update rule (11):

$$\mathbf{W}_{N+1} = \mathbf{W}_N + \eta\,(\mathbf{X}_N e_N - \mathbf{W}_r) \qquad (14)$$

We have analyzed the effect of the modified gradient vector in two ways: its effect on the direction of the weight-vector update, and its effect on the magnitude (norm) of the update. We have concluded that:

Additionally, by expanding the error cost function in a Taylor series around the weight vector at any given time, it has been proven that the error is minimized with every step of the iterative descent regardless of the modification due to the residual weight gradient vector, i.e., the synapse quadratic nonlinearity with respect to its weight. A comparative analysis was also conducted between the modified update (11) and the generalized 'delta rule' including the 'momentum term' as introduced by Rumelhart et al. [7]. It was concluded that, while the use of the momentum term can decrease the stable range of the learning rate parameter and lead to instability [8] [9], the effect of the residual weight vector, in contrast, does not decrease the learning rate range and is stabilizing inside the $\alpha_{critical}$-determined spatial cone.

⁴ In several applications, this nonlinearity in feed-forward mode proved not relevant to the success of the network for correct classification due to flexibility in the output space definition and, therefore, correction was not necessary.
The details of this research, however, are outside of the scope of the present article and are not included here.
More information on training ANNs with non-linear synapses can be found in [15] [16].
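A single training step with the modified gradient can be sketched as follows. The sketch assumes a neuron whose output equals its activity, $v = \sum_k (w_k x_k - C w_k^2)$, and an instantaneous cost of $e^2/2$. The function names, the learning rate `eta` and the sample values are hypothetical and are not taken from the experiments reported here.

```python
import numpy as np

C = 0.5  # quadratic coefficient of the synapse model


def neuron_output(w, x, c=C):
    """Activity of a neuron with quadratically nonlinear synapses (assumed form)."""
    return float(np.sum(w * x - c * w ** 2))


def lms_step(w, x, d, eta, c=C):
    """One steepest-descent step using the modified gradient estimate."""
    e = d - neuron_output(w, x, c)      # instantaneous error
    w_r = 2.0 * c * w * e               # residual weight gradient vector, cf. (12)
    grad = -x * e + w_r                 # modified gradient estimate, cf. (13)
    return w - eta * grad, e            # same as w + eta * (x * e - w_r), cf. (14)


# Hypothetical sample: two synapses, one training pattern
w0 = np.array([0.1, -0.2])
x0 = np.array([1.0, 0.5])
w1, e0 = lms_step(w0, x0, d=0.3, eta=0.1)
e1 = 0.3 - neuron_output(w1, x0)
print("error before step:", e0, "error after step:", e1)
```

Setting `c=0.0` makes the residual vector vanish and recovers the ordinary LMS update for an ideally linear synapse, which makes the two cases easy to compare side by side.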
C. Experimental data
To verify and support the theoretical findings, a number of circuit-level and system-level simulations were carried out. Circuit-level simulation results and plots for nonlinear synapse operation, weight charging, input-signal conversion and others are extensive and are available from the author upon request [17] [19]. System-level simulations were conducted using MatLab™ software to train and test a neuron using synapses with quadratic nonlinearity, as modeled by (3), to perform a linear classifier function. Sets of 2D linearly separable clusters of random vectors were generated, and LMS steepest descent training was then performed over the same data twice: once for a neuron having ideally linear synapses and again for the described model of a neuron with nonlinear synapses. More than 200 simulation runs over clusters of 100 vectors with varying cluster size and dispersion were evaluated. The results showed [10] that the classification success of the neuron using nonlinear synapses modeled by (4) was, generally, not lower than that of the neuron with linear synapses, and in many instances was better. Additionally, in most cases, convergence during the training of the neuron using nonlinear synapses was reached in fewer epochs than for the neuron with ideally linear synapses. The results for the original neuron with ideally 
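A system-level comparison of the kind described in this subsection can be sketched in a few lines. This is an illustrative Python/NumPy sketch, not the MatLab™ code used for the reported results. The cluster centers, spread, learning rate, epoch count and random seed are all assumptions, as are the function names, and the activity is assumed to have the form $v = \sum_k (w_k x_k - C w_k^2)$ with an extra constant input acting as the theta-synapse.

```python
import numpy as np


def make_clusters(rng, n_per_class=50, center=1.5, spread=0.5):
    """Two 2D Gaussian clusters (100 vectors in total) with targets +1 and -1."""
    pos = rng.normal(+center, spread, size=(n_per_class, 2))
    neg = rng.normal(-center, spread, size=(n_per_class, 2))
    x = np.vstack([pos, neg])
    x = np.hstack([x, np.ones((2 * n_per_class, 1))])  # constant bias (theta) input
    d = np.concatenate([np.ones(n_per_class), -np.ones(n_per_class)])
    return x, d


def activity(w, x, c):
    """Neuron activity; c = 0 gives the ideally linear synapse."""
    return x @ w - c * np.sum(w ** 2)


def train_lms(x, d, c, eta=0.01, epochs=20):
    """Sample-by-sample LMS steepest descent over the whole data set."""
    w = np.zeros(x.shape[1])
    for _ in range(epochs):
        for xi, di in zip(x, d):
            e = di - activity(w, xi, c)
            w += eta * e * (xi - 2.0 * c * w)  # modified update for c > 0
    return w


def accuracy(w, x, d, c):
    """Fraction of vectors whose activity sign matches the target."""
    return float(np.mean(np.sign(activity(w, x, c)) == d))


rng = np.random.default_rng(0)
x, d = make_clusters(rng)
acc_linear = accuracy(train_lms(x, d, c=0.0), x, d, c=0.0)
acc_nonlinear = accuracy(train_lms(x, d, c=0.5), x, d, c=0.5)
print("linear synapses:", acc_linear, "nonlinear synapses:", acc_nonlinear)
```

Repeating the run over many seeds and cluster geometries, and recording the epoch at which the error stops decreasing, would be one way to mimic the comparison of classification success and convergence speed described above.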
IV. SUMMARY AND CONCLUSION
We show that it is feasible to implement an analog synapse using only basic properties of MOSFETs in a standard CMOS fabrication process. We describe the model and investigate its operation in feed-forward and learning modes of operation. Due to the limited size of this paper, the complete design of the synapse circuit along with a neuron and an ANN using 2176 such neurons are not included but can be found 
