Four-quadrant one-transistor-synapse for high-density CNN implementations by Domínguez Castro, Rafael et al.
1998 Fifth IEEE International Workshop on Cellular Neural Networks and their Applications, London, England, 14-17 April 1998
Four-Quadrant One-Transistor-Synapse for High-Density CNN 
Implementations 
R. Domínguez-Castro, A. Rodríguez-Vázquez, S. Espejo, R. Carmona
Inst. de Microelectronica de Sevilla - Centro Nac. de Microelectronica - C.S.I.C. - Universidad de Sevilla, 
Edf. CICA, C/Tarfia s/n, 41012-Sevilla, SPAIN. 
Phone: +34 5 4239923; FAX: +34 5 4231832; email: rafael@imse.cnm.es 
ABSTRACT: This paper presents a linear, four-quadrant, electrically-programmable, one-transistor synapse
strategy applicable to the implementation of general massively-parallel analog processors in CMOS technology. It
is especially suited for translationally-invariant processing arrays with local connectivity, and results in a significant
reduction in area occupation and power dissipation of the basic processing units. This allows higher integration
densities and therefore permits the integration of larger arrays on a single chip.
1. Introduction 
The most important trend in the electronic implementation of CNNs is the maximization of the number of elementary
processing units that can be placed in a single chip. This must be combined with the achievement of an acceptable accuracy for
the system parameters. The first trend imposes a strong commitment in the design of the cell circuitry: area-efficiency. A
second but also important objective is power-economy. Unfortunately, in analog processing circuits, area and power consumption
of individual devices are directly related to accuracy [1]. Therefore, the selection of a circuit strategy as simple as possible for
the prescribed processing function is crucial.
This paper proposes a one-transistor, four-quadrant, electrically-programmable, linear synapse for the implementation of
massively-parallel analog-array processors in CMOS technologies. The proposal is easily extensible to general artificial neural
networks.
2. Analog-Array-Processors Elementary Units 
Each processing unit (or cell) in a massively-parallel analog-array processor can be characterized by an interconnection
pattern and by a specific processing function. These characteristics are often considered invariant from cell to cell, resulting
in an additional simplification of the electronic implementation. This property, commonly referred to as spatial invariance or
uniformity, will be assumed without loss of generality.
A common characteristic of massively-parallel analog processing algorithms is the local computation (within each cell $c$)
of a weighted aggregation of contributions from the cells in its neighborhood,
$$y_c = \sum_{i=1}^{N} A_i x_i \qquad (1)$$
The aggregated signal $y_c$ is then used as input to a processing-block which realizes some specific function, and generates
an output $x_c$ representative of the cell state. In turn, this output constitutes the (unscaled) contribution of the cell to its
neighbors. This general cell architecture is illustrated in Figure 1. The output $x_i$ of each neighbor is weighted by a coefficient $A_i$,
which is independent of the particular receptor cell $c$ under the assumed spatial uniformity. Except for specific-purpose
systems, it is generally required that the scaling coefficients (or weights) $A_i$ be electrically programmable, for versatility reasons.
Figure 1. General architecture of an analog-processing-array elementary processing unit (cell). The weight signals
$w_1, \ldots, w_N$ are supplied by the analog weight-control bus.
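The aggregation in (1) under spatial invariance can be sketched in a few lines of Python. This is only an illustration: the 3x3 neighborhood, the template values, and the zero contribution assumed for neighbors outside the array are hypothetical choices, not taken from the paper.

```python
def aggregate(outputs, template):
    """Compute y_c = sum_i A_i * x_i for every cell of a 2-D array.

    outputs  : list of rows holding the cell outputs x
    template : dict mapping a neighbor offset (d_row, d_col) to its weight A_i;
               the same template is used for every cell (spatial invariance)
    Neighbors that fall outside the array contribute zero (assumed boundary).
    """
    rows, cols = len(outputs), len(outputs[0])
    result = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0.0
            for (d_row, d_col), weight in template.items():
                rr, cc = r + d_row, c + d_col
                if 0 <= rr < rows and 0 <= cc < cols:
                    total += weight * outputs[rr][cc]
            result[r][c] = total
    return result


# Hypothetical template with N = 9 weights shared by all cells.
template = {(dr, dc): -1.0 for dr in (-1, 0, 1) for dc in (-1, 0, 1)}
template[(0, 0)] = 8.0

x = [[0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 0.0]]
y = aggregate(x, template)
```

Because the template is shared, only N weight values exist in the whole array, however many cells it contains.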
At a system level, the required number of signal-scaling circuit-blocks (or synapses) can be computed as N times the
number of cells. The associated area and power consumption, as well as the obvious effects of the synapse accuracy on the overall
performance of the system, render the selection of the synapse circuitry a crucial issue in the design of integrated analog array
processors.
3. Electrically-Programmable Synapses 
Electrically programmable synapses must be driven by the input signal $x_i$, and by a weight signal $w_i$ used to program the
scaling coefficient $A_i$. Under the assumed spatial uniformity, only N different weight values will coexist throughout the array.
Therefore, a reduced number (N) of global nodes (common to all cells in the network) satisfies the programming-related routing
requirements of the array, if the programming signals $w_i$ are codified as voltages. Since every cell output $x_c$ is transmitted to N
synapses located within its N neighboring cells, it is also appropriate that synapse input signals $x_i$ be codified as voltages.
Finally, because scaled signals must be added at the input of each cell's processing block, it is convenient that synapse output
signals $A_i x_i$ be given in current form, eliminating the need for a dedicated summing circuit.
0-7803-4867-2/98/$10.00 © 1998 IEEE
Although the required functionality of a programmable analog synapse may suggest the use of linear analog multipliers, 
there are some specific circumstances common to almost every analog-array processing algorithm which expand the set of 
selectable circuit blocks. 
First note that while the synapse output current is expected to be linear with the input signal $x_i$, it is not required to be
linear with the weight signal $w_i$, whose function is simply to allow weight variations in some prescribed range. Therefore,
function $A_i(w_i)$ may be nonlinear in general.
Second, in almost every analog-array processing algorithm, the weight values are invariant during processing. Therefore,
the dynamic response with respect to the weight signal is of little concern and, more importantly, after setting the weight values,
any error or deviation from the ideal behavior independent of the input signal $x_i$ (but in general dependent on $w_i$) may be
cancelled using autozeroing, before performing the processing function.
This is a good practice in general because the offset of the aggregated signal is given by the addition of the output-current 
offset of the N synapses driving each cell. Indeed, this is often the dominant error source of this class of systems. 
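A behavioral sketch of this cancellation is given below. It is not a circuit-level description: each synapse is modeled simply as a gain plus an input-independent offset, and all numerical values are hypothetical.

```python
def synapse_current(x, gain, offset):
    """Behavioral synapse: output current = gain * input + input-independent offset."""
    return gain * x + offset


def aggregated_current(inputs, gains, offsets):
    """Sum of the N synapse currents arriving at one cell."""
    return sum(synapse_current(x, g, o) for x, g, o in zip(inputs, gains, offsets))


# Hypothetical values for N = 3 synapses (gains in A/V, offsets in A).
gains = [2e-6, -1e-6, 0.5e-6]
offsets = [3e-6, -1e-6, 4e-6]

# Autozero phase: weights already set, all inputs held at their zero level.
stored_offset = aggregated_current([0.0, 0.0, 0.0], gains, offsets)

# Processing phase: the stored value is subtracted from the aggregated current.
inputs = [0.2, -0.1, 0.3]
corrected = aggregated_current(inputs, gains, offsets) - stored_offset
ideal = sum(g * x for g, x in zip(gains, inputs))
```

The stored value is the sum of the N individual offsets, which is why a single memory per cell is enough in this simplified picture.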
In recent years, synapse circuits based on MOS transistors operating in their ohmic region have been employed by several
authors [2], [3]. The choice is based on a combined estimation of several performance figures (including area occupation,
accuracy, linearity, programming weight range, signal-range, and power efficiency) which predicts important advantages as
compared to other classes of synapse circuits based on the quadratic law of MOS transistors in saturation, or the exponential
law of bipolar transistors and MOS transistors in weak inversion.
Regardless of the family, practically all synapse circuits employ differential or fully differential architectures to achieve
four-quadrant behavior and also for linearity reasons. Typical examples include those synapses based on differential pairs, like the
Gilbert multiplier [4], and also the synapses employed in [2] and [3]. In addition to the larger complexity of the synapse, this
usually forces the use of differential or fully differential architectures in the processing block as well, thus resulting in a
substantial increase in area occupation.
In the following section we propose a one-transistor, four-quadrant, electrically-programmable synapse circuit with
single-ended architecture.
4. A One-Transistor Synapse Circuit 
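The DC current of an MOS transistor operating in its strong-inversion ohmic region can be described by the following
well-known first-order approximation
$$I_D = \beta \left[ V_{GS} - V_T(V_{SB}) - \frac{V_{DS}}{2} \right] V_{DS} \qquad (2)$$
where
$$\beta = \mu C_{ox} \frac{W}{L} \qquad (3)$$
and every symbol has its well-established meaning in MOS literature.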
Eq. (2) predicts an incrementally linear relation between $I_D$ and $V_{GS}$, and an approximately linear dependence on $V_{DS}$
for $V_{DS} \ll 2[V_{GS} - V_T(V_{SB})]$. These considerations have been widely exploited for many applications, including MOS
implementations of active RC filters [5], analog multipliers for RF communication circuits [6] and also synapse circuits for
massively-parallel analog processing systems [2], [3] using differential architectures.
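Both properties can be checked numerically with a short sketch of the first-order ohmic-region model. The values of `beta` and `v_t` are assumed for illustration only, and the threshold voltage is treated as a constant here.

```python
def ohmic_current(v_gs, v_ds, v_t, beta):
    """First-order strong-inversion current in the ohmic region.

    Valid only for 0 <= v_ds <= v_gs - v_t; other bias points are rejected.
    """
    if not 0.0 <= v_ds <= v_gs - v_t:
        raise ValueError("bias point outside the ohmic region of this model")
    return beta * (v_gs - v_t - 0.5 * v_ds) * v_ds


def vds_linearity_error(v_gs, v_ds, v_t):
    """Relative deviation of the current from the purely linear term in v_ds."""
    return 0.5 * v_ds / (v_gs - v_t)


BETA = 20e-6   # assumed, A/V^2
V_T = 1.0      # assumed, V

# Equal steps in V_GS produce equal current increments (incremental linearity).
i_1 = ohmic_current(2.0, 0.2, V_T, BETA)
i_2 = ohmic_current(2.5, 0.2, V_T, BETA)
i_3 = ohmic_current(3.0, 0.2, V_T, BETA)

# Deviation from linearity in V_DS at V_GS - V_T = 2 V and V_DS = 0.2 V.
error = vds_linearity_error(3.0, 0.2, V_T)
```

The relative error grows as V_DS approaches 2(V_GS - V_T), which is the condition quoted in the text.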
The use of a single transistor to implement an electrically programmable synapse with signals represented by single-ended
voltages requires that one of the diffusion terminals be set to a fixed voltage level. The gate and the other diffusion terminal
can then be employed as input points, while the output is obtained from the current flowing out of the fixed-voltage diffusion
terminal. This is conceptually illustrated in Figure 2a, in which a nullator and a DC voltage source represent the ideally
null-impedance input terminal of the processing block in Figure 1. Such a virtual-reference level is required because the output
impedance of the synapse is low due to its operation in the ohmic region. The input impedance at the diffusion input is also
low, while that at the gate input is high. Two alternatives can then be considered: using the gate terminal for $x_i$ and the
diffusion terminal for $w_i$, or the other way around. This has implications on the output-impedance requirements for either the cell
processing-block or the voltage sources driving the analog-weight control bus, but the major decision factor is related to
linearity.
Equation (2) is valid only for $V_{DS} \ge 0$ and therefore, the use of the notation introduced in Figure 2a requires an independent
consideration of the two possible cases $V_A \ge V_L$ and $V_A \le V_L$. Still, simple analysis results in the following combined expression
for $I_N$, valid in either of the two cases,
$$I_N = \beta (V_A - V_L) V_G - \beta (V_A - V_L) \left[ V_T(V_S) + \frac{V_A + V_L}{2} \right] \qquad (5)$$
with
$$V_S = \min(V_A, V_L) \qquad (6)$$
Note that the second summand in (5) is independent of $V_G$ and that the first summand is linear with $V_G$. Since we need a
linear behavior with respect to one of the inputs ($x_i$) and we can eliminate any systematic offset in a previous step, it is
straightforward to choose $x_i = V_G$ and $w_i = V_A$, as shown in Figure 2a.
Figure 2. a) One-transistor synapse concept. b) Cell (top) and peripheral (bottom) circuitry for independent-terms subtraction.
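The combined expression for $I_N$ can be checked by working out the two cases separately from (2). The following is a sketch under two assumptions that the text does not state outright: the bulk is tied to the lowest potential (taken as 0 V), so that $V_{SB}$ equals the source voltage, and $I_N$ is counted positive when it flows from the $V_A$ terminal into the $V_L$ node.

```latex
\begin{align*}
V_A \ge V_L:\quad
  & V_{GS} = V_G - V_L,\quad V_{DS} = V_A - V_L,\quad V_{SB} = V_L \\
  & I_N = +I_D = \beta \left[ V_G - V_T(V_L) - \tfrac{V_A + V_L}{2} \right] (V_A - V_L) \\[4pt]
V_A \le V_L:\quad
  & V_{GS} = V_G - V_A,\quad V_{DS} = V_L - V_A,\quad V_{SB} = V_A \\
  & I_N = -I_D = \beta \left[ V_G - V_T(V_A) - \tfrac{V_A + V_L}{2} \right] (V_A - V_L)
\end{align*}
```

In both cases the result is $\beta (V_A - V_L) V_G$ plus a term that does not involve $V_G$, with the threshold voltage evaluated at the lower of the two diffusion voltages.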
There is still another issue related to obtaining four-quadrant behavior. While the two possibilities $V_A \ge V_L$ and
$V_A \le V_L$ provide double sign capability for the weight, $V_G$ must always be positive. Therefore we must select a sufficiently
high reference level on the gate voltage to act as the zero level for $x_i$. Denoting the total gate voltage by $V_X$, let us define:
$$V_X = v_x + V_{X0} \;\Rightarrow\; v_x = V_X - V_{X0} = x_i \qquad (7)$$
and in the same manner, referring the total weight voltage $V_W$ to $V_L$, that is, selecting $V_{W0} = V_L$, we have
$$V_W = v_w + V_L \;\Rightarrow\; v_w = V_W - V_L = w_i \qquad (8)$$
Using this new notation, equations (5) and (6) can be rewritten respectively as:
$$I_N = \beta v_w v_x + \beta v_w \left[ V_{X0} - V_T(V_S) - V_L - \frac{v_w}{2} \right] \qquad (9)$$
$$V_S = V_L + \min(v_w, 0) \qquad (10)$$
where both $v_x$ and $v_w$ can be either positive or negative and still, the first summand in (9) is linear with $v_x$ and the
second one is independent of $v_x$. We define the weight and the output offset of the synapse, respectively, as
$$G(v_w) = \beta v_w \qquad (11)$$
$$I_0(v_w) = G(v_w) \left[ V_{X0} - V_T(V_S) - V_L - \frac{v_w}{2} \right] \qquad (12)$$
This allows (9) to be written in the following form
$$I_N(v_w, v_x) = G(v_w) v_x + I_0(v_w) \qquad (13)$$
Let us now assume that we can eliminate the term $I_0(v_w)$. Then, we can define
$$I_N = I_n + I_0 \;\Rightarrow\; I_n = I_N - I_0 = A_i x_i \qquad (14)$$
and rewrite (13) as
$$I_n(v_w, v_x) = G(v_w) v_x \qquad (15)$$
which is the equation of a four-quadrant, electrically-programmable, linear analog synapse. This relies only on
the separated dependencies shown in (13), and not on the specific forms of $G(v_w)$ and $I_0(v_w)$, a fact that will
become relevant for the consideration of second order effects in a later section.
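The four-quadrant behavior can be illustrated with a small numerical sketch built directly on the first-order transistor model. Several items are assumptions made only for this example: the value of `BETA`, the body-effect expression and its parameters (`VT0`, `GAMMA`, `PHI`), the bulk at 0 V, and the sign convention for the output current. The reference levels are the ones quoted later in the paper for Figure 3.

```python
import math

BETA = 5e-6                        # assumed mu*Cox*W/L, in A/V^2
VT0, GAMMA, PHI = 0.8, 0.6, 0.7    # assumed body-effect parameters
V_L, V_X0 = 1.0, 3.0               # reference levels quoted for Figure 3, in V


def threshold(v_sb):
    """Assumed body-effect model for V_T(V_SB)."""
    return VT0 + GAMMA * (math.sqrt(PHI + v_sb) - math.sqrt(PHI))


def synapse_current(v_w, v_x):
    """Total current I_N for signed weight signal v_w and signed input signal v_x."""
    v_a, v_g = V_L + v_w, V_X0 + v_x
    v_s = min(v_a, V_L)            # the lower diffusion acts as source (bulk at 0 V)
    v_ds = abs(v_a - V_L)
    i_d = BETA * (v_g - v_s - threshold(v_s) - 0.5 * v_ds) * v_ds
    return i_d if v_a >= V_L else -i_d


def signal_current(v_w, v_x):
    """I_n: total current minus the input-independent term I_0 = I_N at v_x = 0."""
    return synapse_current(v_w, v_x) - synapse_current(v_w, 0.0)


# One sample point per quadrant of the (v_w, v_x) plane.
quadrants = {(v_w, v_x): signal_current(v_w, v_x)
             for v_w in (-0.4, 0.4) for v_x in (-0.4, 0.4)}
```

In this model the corrected current equals `BETA * v_w * v_x` for either sign of either signal, even though the threshold voltage takes a different value in each weight polarity.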
5. Cell and Control Circuitry 
In order to preserve the high area efficiency provided by the one-transistor synapses, the circuitry employed within each
cell for the elimination of the N second summands (one per synapse) should be as simple as possible. This is made easier by
the fact that the cell circuitry itself computes the sum of these summands, which can therefore be eliminated altogether.
Under the assumed spatial invariance of the weight signals, common to most analog-array processing systems, the term to
be eliminated in each cell is also spatially invariant. Therefore, we can reproduce its value in a small circuit, shared by all the
cells in the network and placed at the periphery of the cell array, and subtract it at the input nodes of each cell processing
block. This can be achieved, without additional cost in the cell circuitry, through proper weight-dependent biasing of the
processing-block input stage.
As an example, Figure 2b describes a simple technique. The top part of the figure represents the cell circuitry (identical to
Figure 1 using one-transistor synapses) which includes a class-A input stage as part of the processing-block input circuitry
(the bias current source is not shown for simplicity). The circled transistor performs the subtraction of the N second summands in
(13). The lower part of the figure describes the required biasing devices, placed at the periphery of the cell array and shared by
the whole network.
This technique relies on the use of matched current-conveyors [4] at the peripheral circuitry and at the input node of the
cell's processing-block. Their electronic implementation can be shown to be highly efficient in terms of area and power
consumption [3].
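At a behavioral level, the shared-replica idea amounts to computing the sum of the N independent terms once and subtracting it in every cell. The sketch below illustrates only that bookkeeping, not the circuit of Figure 2b. The synapse model follows the separated form of (13), with the offset written out as the second summand of the first-order analysis. The weight values, `BETA`, and the body-effect parameters are hypothetical.

```python
import math

BETA = 5e-6                        # assumed, A/V^2
VT0, GAMMA, PHI = 0.8, 0.6, 0.7    # assumed body-effect parameters
V_L, V_X0 = 1.0, 3.0               # reference levels quoted for Figure 3, in V


def threshold(v_sb):
    """Assumed body-effect model for V_T(V_SB)."""
    return VT0 + GAMMA * (math.sqrt(PHI + v_sb) - math.sqrt(PHI))


def offset_current(v_w):
    """Input-independent summand I_0(v_w) of one synapse (first-order model)."""
    v_s = V_L + min(v_w, 0.0)
    return BETA * v_w * (V_X0 - threshold(v_s) - V_L - 0.5 * v_w)


def synapse_current(v_w, v_x):
    """Total synapse current: G(v_w) * v_x + I_0(v_w)."""
    return BETA * v_w * v_x + offset_current(v_w)


# Hypothetical weight signals, N = 4, identical for every cell in the array.
WEIGHTS = [0.3, -0.2, 0.1, -0.4]

# Computed once at the periphery and distributed to all cells.
REPLICA_CURRENT = sum(offset_current(w) for w in WEIGHTS)


def cell_input_current(neighbor_signals):
    """Aggregated current seen by one processing block after replica subtraction."""
    raw = sum(synapse_current(w, x) for w, x in zip(WEIGHTS, neighbor_signals))
    return raw - REPLICA_CURRENT
```

Since the weights are global, one replica serves the whole array, so the per-cell cost of the subtraction does not grow with the number of cells.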
It might be argued that large-distance mismatch effects could result in cancellation errors of the independent terms.
However, as mentioned earlier, it is generally convenient to employ autozeroing techniques. Such autozeroing, which can be easily
implemented with area-efficient current memories [7] in the cases considered (current-output synapses), would eliminate the
cancellation error as well. Indeed, the use of autozeroing may render the proposed subtraction circuitry unnecessary, since it
could be used to eliminate the complete sum of independent terms rather than their remaining error. Although combining high
signal ranges and high absolute accuracy is often difficult, the recently proposed class of S²I current memories [8] may
provide a good solution.
6. Operation Limits and Second Order Effects 
One fundamental limitation to the operation range of the proposed synapse is imposed by the ohmic region limits of the
MOS transistor, which can be approximated by
$$V_{GS} \ge V_{DS} + V_T(V_{SB}) \qquad (16)$$
Substitution of the previously employed notation in this equation yields the following lower limit for the gate voltage
$$V_X \ge \begin{cases} V_L + V_T(V_W) & ;\; V_W \le V_L \\ V_W + V_T(V_L) & ;\; V_W \ge V_L \end{cases} \qquad (17)$$
Except for this limit, no other restrictions exist on the proposed synapse, on the basis of the first order model considered. It
can be shown that there is only one second order effect, mobility degradation, which represents a relevant deviation from the
functional dependence expressed in (13). Other second order effects affect only the precise form of $G(v_w)$ and $I_0(v_w)$,
something irrelevant for our discussion.
Mobility degradation models predict a reduction in the effective carrier mobility ($\mu$ in equation (3)) with transversal
(normal to the channel surface) electric field, something that affects our present discussion because the transversal electric field
depends on the gate voltage and thus, the first summand will not be linear with $v_x$. Although the widely accepted simple
model for mobility degradation [9],
$$\mu = \frac{\mu_0}{1 + \theta \left[ V_{GS} - V_T(V_{SB}) \right]} \qquad (18)$$
predicts a continuous reduction of the effective mobility starting just above $V_{GS} = V_T(V_{SB})$, the fact is that in most
technologies, there is an appreciable ($\approx 2.0$ V) $V_{GS}$ range above $V_T(V_{SB})$ within which mobility reduction is
negligible. Furthermore, some higher level models accounting for mobility degradation employ a specific parameter to
define a field threshold below which no mobility degradation occurs (UCRIT in SPICE level 2 [10]). Regardless of
the continuous or thresholded modelling of mobility degradation, we can always define a maximum effective gate
voltage
$$V_{GEMax} = \left[ V_{GS} - V_T(V_{SB}) \right]_{max} \qquad (19)$$
below which any reasonable linearity requirements are satisfied. The operation of the synapse must be restricted
to this range.
Performing the appropriate substitutions in (19) yields the following upper limit for the gate voltage,
$$V_X \le \begin{cases} V_W + V_T(V_W) + V_{GEMax} & ;\; V_W \le V_L \\ V_L + V_T(V_L) + V_{GEMax} & ;\; V_W \ge V_L \end{cases} \qquad (20)$$
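To give a feel for why the gate-voltage range must be bounded, the sketch below evaluates how much the synapse gain drops between two effective gate voltages when a continuous mobility-reduction factor of the form 1 / (1 + theta * V_GE) is assumed. This is the pessimistic, non-thresholded case; the values of `BETA0` and `THETA` are hypothetical and do not correspond to any particular technology.

```python
def degraded_current(v_ge, v_ds, beta0, theta):
    """Ohmic-region current with beta scaled by 1 / (1 + theta * v_ge).

    v_ge is the effective gate voltage V_GS - V_T(V_SB); theta = 0 recovers
    the first-order model.
    """
    beta = beta0 / (1.0 + theta * v_ge)
    return beta * (v_ge - 0.5 * v_ds) * v_ds


def transconductance(v_ge, v_ds, beta0, theta, dv=1e-4):
    """Central-difference estimate of dI/dV_GE at one bias point."""
    upper = degraded_current(v_ge + dv, v_ds, beta0, theta)
    lower = degraded_current(v_ge - dv, v_ds, beta0, theta)
    return (upper - lower) / (2.0 * dv)


def gain_drop(v_ge_low, v_ge_high, v_ds, beta0, theta):
    """Relative loss of synapse gain between two effective gate voltages."""
    g_low = transconductance(v_ge_low, v_ds, beta0, theta)
    g_high = transconductance(v_ge_high, v_ds, beta0, theta)
    return 1.0 - g_high / g_low


BETA0 = 5e-6    # assumed low-field beta, A/V^2
THETA = 0.05    # assumed mobility-reduction coefficient, 1/V

drop_ideal = gain_drop(1.0, 2.0, 0.4, BETA0, 0.0)
drop_degraded = gain_drop(1.0, 2.0, 0.4, BETA0, THETA)
```

With `theta = 0` the gain is the same at both bias points, while any positive `theta` makes it fall with gate voltage, which is the nonlinearity the maximum effective gate voltage is meant to keep within bounds.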
The selection of $V_L$ and $V_{X0}$ must be made based on the limits imposed by (17) and (20). In turn, this will result in an
upper limit for the allowed signal ranges of $v_x$ and $v_w$. Equation (17) can be rewritten as,
$$\begin{aligned} v_x + V_{X0} &\ge v_w + V_L + V_T(V_L) & ;\; v_w \ge 0 \\ v_x + V_{X0} &\ge V_L + V_T(v_w + V_L) & ;\; v_w \le 0 \end{aligned} \qquad (21)$$
Let us denote the signal ranges of $v_w$ and $v_x$ by $|v_w| < v_{wmax}$ and $|v_x| < v_{xmax}$, respectively. In the above equation, the
worst-case limit for $V_{X0}$ is given by the first inequality when $v_w = v_{wmax}$ and $v_x = -v_{xmax}$, which yields,
$$V_{X0} \ge V_L + V_T(V_L) + v_{wmax} + v_{xmax} \qquad (22)$$
Regarding $V_L$, its value must be sufficiently high to provide room for the minimum $v_w$ value and also for some possible
loss of voltage range due to the limited output swing of the circuits generating the analog weight control signals, which we
denote as $V_{Wmin}$. That is,
$$V_L \ge v_{wmax} + V_{Wmin} \qquad (23)$$
Because an upper limit for the voltage ranges exists due to mobility degradation, and also because we are interested in
maximizing the signal ranges and/or allowing a reduced power supply operation, we will select the minimum allowed value for
$V_L$. Substituting $V_L = v_{wmax} + V_{Wmin}$ in (22) results in a minimum value for $V_{X0}$,
$$V_{X0} \ge V_{Wmin} + V_T(v_{wmax} + V_{Wmin}) + 2 v_{wmax} + v_{xmax} \qquad (24)$$
Again, we select $V_{X0}$ as its minimum allowed value. The resulting maximum value for $V_X$ is then given by
$V_{Xmax} = V_{X0} + v_{xmax}$, that is,
$$V_{Xmax} = V_{Wmin} + V_T(v_{wmax} + V_{Wmin}) + 2 v_{wmax} + 2 v_{xmax} \qquad (25)$$
and the worst-case mobility degradation limit will be imposed by the first inequality in (20), when $v_w$ is
minimum, that is, $V_W = V_L - v_{wmax} = V_{Wmin}$, which yields,
$$V_{Xmax} \le V_{Wmin} + V_T(V_{Wmin}) + V_{GEMax} \qquad (26)$$
Using (25) in (26) yields,
$$v_{wmax} + v_{xmax} \le \frac{1}{2} V_{GEMax} - \frac{1}{2} \left[ V_T(v_{wmax} + V_{Wmin}) - V_T(V_{Wmin}) \right] \qquad (27)$$
For moderate linearity requirements, the right hand side of the above equation takes values in the range of one volt for
typical CMOS technologies, which for $v_{wmax} = v_{xmax}$ provides a peak-to-peak signal range of about one volt for both $v_x$ and $v_w$.
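Because the threshold voltage on the right hand side depends on the signal range itself, the largest symmetric range has to be found iteratively. The sketch below does this by bisection for the case of equal weight and input ranges. The body-effect model, its parameters, and the value used for the maximum effective gate voltage are assumptions chosen to be roughly compatible with the levels quoted for Figure 3; they are not data from the paper.

```python
import math

VT0, GAMMA, PHI = 0.8, 0.6, 0.7    # assumed body-effect parameters


def threshold(v_sb):
    """Assumed body-effect model for V_T(V_SB)."""
    return VT0 + GAMMA * (math.sqrt(PHI + v_sb) - math.sqrt(PHI))


def range_margin(v_max, v_ge_max, v_w_min):
    """Right side minus left side of the range inequality for v_wmax = v_xmax = v_max.

    A non-negative margin means the symmetric range v_max is allowed.
    """
    rhs = 0.5 * v_ge_max - 0.5 * (threshold(v_max + v_w_min) - threshold(v_w_min))
    return rhs - 2.0 * v_max


def max_symmetric_range(v_ge_max, v_w_min, tol=1e-9):
    """Largest v_max with a non-negative margin, found by bisection.

    The margin decreases monotonically with v_max, is positive at 0 and
    negative at v_ge_max, so the bracket [0, v_ge_max] contains the root.
    """
    low, high = 0.0, v_ge_max
    while high - low > tol:
        mid = 0.5 * (low + high)
        if range_margin(mid, v_ge_max, v_w_min) >= 0.0:
            low = mid
        else:
            high = mid
    return low


V_GE_MAX = 1.82   # assumed maximum effective gate voltage, in V
V_W_MIN = 0.6     # assumed minimum total weight voltage, in V
v_max = max_symmetric_range(V_GE_MAX, V_W_MIN)
```

For these assumed parameters the result is a little above 0.4 V per signal, which is of the same order as the ranges selected in the paper.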
Figure 3 illustrates the above discussion and shows the voltage distribution selected for a particular technology: a standard
n-well, 0.8 µm CMOS process available through EUROPRACTICE. With small changes, these values should be valid for most
typical CMOS technologies. Note that the minimum power supply level should be at least slightly above $V_{Xmax} = 3.4$ V to
prevent the possible loss of voltage range due to the limited output swing of the processing block. Still, an optimization of the
output swing of both the analog weight control drivers and the output stage of the processing block should allow the operation of
the proposed synapse with power supply levels in the range of 3.3 V, with similar signal swings for $v_x$ and $v_w$.
Figure 3. Voltage range distribution for synapse operation. (Levels legible in the figure: $V_{X0} = 3$ V, $V_L = 1$ V,
$V_{GEMax} + V_T(V_{Wmin}) \approx 2.8$ V.)
Figure 4 provides an additional insight into the selection of $V_L$ and $V_{X0}$ values and the associated signal ranges. It shows
the allowed operation region, delimited by (17) and (20) in the $V_X$-$V_W$ plane, within which a square range (under the
assumption $v_{wmax} = v_{xmax}$) for signals $v_x$ and $v_w$, centered around $(V_L, V_{X0})$, must be defined. The graphs correspond to $V_L = 1$ V and
the specific parameters of the technology being employed. Note that although an appreciable increase in signal
ranges could apparently be obtained by increasing the values of $V_L$ and $V_{X0}$, this is not true in general because the limits imposed by (17)
and (20) will also shift with $V_L$. In view of (27), the increase would be small. On the other hand, it would require a larger
power supply.
Figure 4. Signal ranges delimited by ohmic region and mobility degradation limits. (Axes: $V_X$ versus $V_W$, with $V_L$
marked on the $V_W$ axis.)
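The kind of check that Figure 4 represents graphically can also be sketched numerically: for each total weight voltage, compute the lower (ohmic region) and upper (mobility degradation) bounds on the gate voltage and test whether a square signal range fits between them. The body-effect model, its parameters, and `V_GE_MAX` are assumptions made for this example. The bounds are the two-case limits as read from the text, with the threshold evaluated at the lower diffusion voltage.

```python
import math

VT0, GAMMA, PHI = 0.8, 0.6, 0.7    # assumed body-effect parameters
V_GE_MAX = 1.82                    # assumed maximum effective gate voltage, in V
V_L, V_X0 = 1.0, 3.0               # reference levels quoted for Figure 3, in V


def threshold(v_sb):
    """Assumed body-effect model for V_T(V_SB)."""
    return VT0 + GAMMA * (math.sqrt(PHI + v_sb) - math.sqrt(PHI))


def gate_limits(v_w_total):
    """Return (lower, upper) bounds on the total gate voltage for a given V_W."""
    if v_w_total <= V_L:
        lower = V_L + threshold(v_w_total)
        upper = v_w_total + threshold(v_w_total) + V_GE_MAX
    else:
        lower = v_w_total + threshold(V_L)
        upper = V_L + threshold(V_L) + V_GE_MAX
    return lower, upper


def square_fits(half_side, steps=40):
    """Check that the square of the given half side, centered at (V_L, V_X0),
    lies inside the allowed region for a sweep of weight voltages."""
    for k in range(steps + 1):
        v_w_total = V_L - half_side + 2.0 * half_side * k / steps
        lower, upper = gate_limits(v_w_total)
        if V_X0 - half_side < lower or V_X0 + half_side > upper:
            return False
    return True


fits_small = square_fits(0.4)
fits_large = square_fits(0.6)
```

With these assumed parameters a ±0.4 V square fits, with little margin at the corner of maximum gate voltage and minimum weight voltage, whereas a ±0.6 V square violates the upper bound.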
7. Results 
In this section we illustrate the behavior of the proposed synapse in a specific n-well, 0.8 µm CMOS technology
available through EUROPRACTICE. Figure 5 contains HSPICE level 2 simulated transfer characteristics of the proposed synapse.
Transistor sizes are W = 6 µm and L = 24 µm. These geometries correspond to reasonable sizes in a practical application, in
which a low aspect ratio serves the purposes of having reasonable current levels through the analog weight-control lines, as
well as moderate power dissipation in the chip. Large channel areas (relative to technology resolution) are required for
matching considerations [1]. Still, low resolution technologies are highly convenient for matching considerations and also because
most of the cell area is usually dedicated to contacts, routing, and active region separations.
Figure 5a shows the total transistor current $I_N$ versus the total gate voltage $V_X$, for different values of the weight signal
voltage $V_W$. The value of $V_L$ is 1.0 V. Values of $V_W$, relative to $V_L$, range from -0.4 V (lower trace) to +0.4 V (upper trace) in
50 mV increments. The mobility degradation effects are clearly visible at the right side, while those related to the pinch-off
region can be observed at the left side, especially for positive values of $v_w$. The region around $V_{X0} = 3.0$ V reflects the behavior
predicted by (13). Figure 5b shows the result obtained after subtracting the independent term $I_0(v_w)$, by means of the circuitry
described in Figure 2b. Worst-case ($v_w = 0.4$ V) total harmonic distortion (THD) is below 0.05% at 1Hz and 0.7% at 1MHz.
Similar results are obtained from the p-channel version of the circuitry.
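For readers who want to estimate distortion from a transfer curve of their own, the sketch below shows one way to compute the THD of a static (memoryless) characteristic driven by a sine wave, using a direct Fourier sum. It does not reproduce the HSPICE figures above, which include dynamic effects. The example transfer function is hypothetical and serves only to validate the routine against a case with a known answer.

```python
import math


def harmonic_amplitudes(transfer, amplitude, n_harmonics=8, n_samples=256):
    """Amplitudes of harmonics 1..n_harmonics of transfer(amplitude * sin(theta)),
    computed over exactly one period with a direct Fourier sum."""
    samples = [transfer(amplitude * math.sin(2.0 * math.pi * n / n_samples))
               for n in range(n_samples)]
    amplitudes = []
    for k in range(1, n_harmonics + 1):
        re = sum(s * math.cos(2.0 * math.pi * k * n / n_samples)
                 for n, s in enumerate(samples))
        im = sum(s * math.sin(2.0 * math.pi * k * n / n_samples)
                 for n, s in enumerate(samples))
        amplitudes.append(2.0 * math.hypot(re, im) / n_samples)
    return amplitudes


def thd(transfer, amplitude, n_harmonics=8):
    """Total harmonic distortion: RMS sum of harmonics 2.. over the fundamental."""
    amplitudes = harmonic_amplitudes(transfer, amplitude, n_harmonics)
    return math.sqrt(sum(a * a for a in amplitudes[1:])) / amplitudes[0]


def example_transfer(v_x):
    """Hypothetical weakly nonlinear curve: unit gain plus a 1% quadratic term."""
    return v_x + 0.01 * v_x * v_x


example_thd = thd(example_transfer, 1.0)
```

For the hypothetical curve the quadratic term produces a second harmonic of amplitude 0.005 relative to a unit fundamental, so the routine should return a THD of 0.5%.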
Figure 5. HSPICE level-2 simulations of the proposed synapse: a) Current $I_N$ in Figure 2 versus gate voltage $V_X$ for
different values of $V_W$. b) Current $I_n = I_N - I_0$ obtained using the circuitry described in Figure 2b.
8. Conclusions 
This paper has proposed and discussed an electrically programmable, one-transistor, four-quadrant linear synapse
strategy for massively-parallel analog array processing systems, based on MOS operation in the triode region. Signal ranges for
both the input and the weight signal are in the range of 1 Vpp, and total harmonic distortion is below 0.7% at 1 MHz. Operation
from reduced power supplies of about 3.3 V seems feasible. The proposed synapse circuit results in a substantial reduction in
area and power consumption of the basic cell, as compared to traditionally employed synapses based on differential or
fully-differential architectures. This allows the realization of array processors with a larger number of units in the same chip.






References
[1] M.J.M. Pelgrom, A.C.J. Duinmaijer and A.P.G. Welbers: "Matching Properties of MOS Transistors". IEEE J. Solid-State
Circuits, Vol. 24, pp. 1433-1440, October 1989.
[2] P. Kinget and M. Steyaert: "An Analog Parallel Array Processor for Real-Time Sensor Signal Processing". 1996 Int. Solid
State Circuits Conference, paper 6.1, 1996.
[3] S. Espejo, A. Rodríguez-Vázquez, R. Carmona and R. Domínguez-Castro: "A 0.8µm CMOS Programmable Analog-
Array-Processing Vision-Chip with Local Logic and Image-Memory". 1996 European Solid State Circuits Conference,
pp. 276-279. Neuchâtel, September 1996.
[4] C. Toumazou, F.J. Lidgey, D.G. Haigh (Eds.): "Analog IC Design: the Current-Mode Approach". Peter Peregrinus, 1990.
[5] Y.P. Tsividis: "Integrated Continuous-Time Filter-Design -- An Overview". IEEE J. Solid-State Circuits, Vol. 29, pp. 166-
176, March 1994.
[6] B. Song: "CMOS RF Circuits for Data Communication Applications". IEEE J. Solid-State Circuits, Vol. 21, pp. 310-317,
April 1986.
[7] C. Toumazou, J.B. Hughes, N.C. Battersby (Eds.): "Switched-Currents an Analog Technique for Digital Technology".
Peter Peregrinus, 1993.
[8] J.B. Hughes and K.W. Moulding: "S²I: A Switched-Current Technique for High Performance". Electronics Letters,
Vol. 29, No. 16, pp. 1400-1401, August 1993.
[9] Y.P. Tsividis: "Operation and Modeling of the MOS Transistor". New York: McGraw-Hill, 1987.
[10] P. Antognetti, G. Massobrio (Eds.): "Semiconductor Device Modeling with SPICE". McGraw-Hill, 1988.
