To enable dense integration of model synapses in spiking neural network (SNN) hardware, various nanoscale devices are being considered. Such devices, besides exhibiting spike-timing dependent plasticity (STDP), need to be highly scalable, have high endurance and require low energy for transitioning between states. In this work, first, we introduce and empirically determine two new specifications for a resistive random-access memory (RRAM) based synapse: the number of conductance levels per synapse and the learning-rate. To the best of our knowledge, there are no RRAMs that meet the latter specification. As a solution, we propose the use of multiple RRAMs in parallel within a synapse. During synaptic reading, all RRAMs are read simultaneously, and for each synaptic conductance-change event, the mechanism for conductance STDP is initiated on only one RRAM, randomly picked from the set. Second, to validate our solution, we experimentally demonstrate STDP of conductance of a Pr$_{0.7}$Ca$_{0.3}$MnO$_3$ (PCMO)-RRAM and then show that, due to a large learning-rate, a single PCMO-RRAM fails to model a synapse in the training of an SNN. As anticipated, network training improved as more PCMO-RRAMs were added to the synapse. Finally, we discuss circuit requirements for implementing such a scheme and conclude that the requirements are within bounds. Thus, our work presents specifications for synaptic devices in trainable SNNs, indicates the shortcomings of state-of-the-art synaptic contenders, provides a solution to extrinsically meet the specifications, and discusses the peripheral circuitry that implements the solution.
I. INTRODUCTION
In decoding a brain's functioning, mimicking biology using models of the brain has been a significant approach taken by the neuro-scientific community. In addition, performing cognitive computing tasks while maintaining the energy efficiency of a human brain has been the grand challenge in the post-scaling and data-driven era. Executing these tasks using spiking (or, third generation) neural networks in bio-inspired hardware that is functionally and structurally similar to a brain will greatly help in achieving these goals. Such bio-mimicry has led to the adoption of a bottom-up approach that distributes and integrates processing and memory: electronic neurons serve as computing units and electronic synapses as memory units, connected as required in a network. Though various VLSI-amenable circuits have been proposed that can mimic a synapse [1]-[3], integrating as many as $10^{14}$ synapses within the volume of a human brain requires the model synapse to have dimensions on the order of the thickness of a synaptic cleft (~10 nm) [4]-[6]. This has led to appreciable research and development of a plethora of novel nano-scaled devices that can approximately mimic a biological synapse [5], [7]-[19]. Of all the novel nano-scaled devices for modeling synapses in hardware, memristors/RRAMs (Resistive Random-Access Memory) have gained remarkable research interest. These are non-volatile resistive memory elements whose state/resistance can be altered by applying sufficiently strong voltage-pulses, and they are strong candidates as weights in electronic in-situ trainable neural networks.
The spike-timing dependent plasticity (STDP) rule, a type of Hebbian "local" learning-rule, is considered to be an essential property of any synapse in an SNN, and one that synaptic models must exhibit [20]. By changing the strength of the voltage-pulses being applied as a function of time-since-last-spike, the STDP rule has been demonstrated on various RRAMs [5], [11]-[18]. But, for extensive and large-scale utilization of these devices in cross-bars as a synaptic array, each device needs to (1) be highly scalable, (2) have excellent endurance, (3) have low-energy switch-ability and (4) be compatible with complementary metal-oxide-semiconductor (CMOS) technology [5], [21], [22]. However, other important but relatively unexplored requisites of the synaptic RRAM that are discussed in this work include: (1) an analog range of conductance or an ample number of memory states/bits [22]-[24] and (2) a low value of the maximum STDP-dictated weight-change (that is, a low magnitude of the relative weight change) at each weight-change event. Requisite (1) is based on the fact that most STDP rules observed in biology are analog [20], [25] and so are the data-sets (e.g. data-sets of images in color/grey-scale, Fisher's Iris, Wisconsin's breast-cancer and chemical assays like wine [26]-[28]). Only at additional cost can either be synthetically transformed into the binary domain, thus often necessitating an analog or multi-level synapse. Requisite (2) derives from the fact that all STDP learning-rules have a point of maximum weight-change (at point(s) of highest/least time or rate correlation [20], [25]). While training, this value should be kept small for a stable weight-evolution in a network. Since the maximum weight-change is similar to the learning-rate parameter in an artificial neural network, these terms are henceforth used interchangeably.
In this paper, we first empirically show that for software-equivalent training, (1) the learning-rate (the maximum weight-change at each weight-change event) must be less than 2% and (2) a resolution of at least 256 levels (8 bits) per synapse is needed. Second, we show that STDP demonstrations on RRAMs to date depict a learning-rate of 10%-400%; thus, these devices do not meet our specifications. To tackle this problem we propose the use of multiple ($N$) parallel RRAMs in a synapse. Within such a synapse, reading requires all RRAMs, but the conductance-change (as dictated by the STDP rule) is brought about in only one RRAM, randomly picked from the set of RRAMs of the synapse. This way, the learning-rate is lowered by a factor of roughly $N$, enabling software-equivalent learning. Third, we validate our proposal using an interpolation model of STDP measurements of a standalone Pr$_{0.7}$Ca$_{0.3}$MnO$_3$ (PCMO)-RRAM (including all non-idealities) for training an SNN with multiple parallel RRAMs in all its synapses. Next, learning performance is evaluated for various $N$ to show that a smaller $N$ produces excellent peak learning-performance but with significant fluctuations from epoch to epoch, while a larger $N$ is necessary for stable and software-equivalent learning. In comparison, 256 binary synapses are needed for equivalent programming (4× improvement). Finally, architectural considerations of circuit implementation are discussed and a simple circuit to implement the random programming scheme is presented. Thus, our work presents the specification for synaptic devices for analog datasets, demonstrates the challenge this poses for synaptic candidates in the literature, presents an architectural solution to enable learning, and provides a circuit implementation.
This work is organized as follows: in Section II, we give an overview of the STDP rule, followed by the basis for our claim of a 2% learning-rate as a necessary condition for acceptable SNN training performance. In Section III, we report the procedure for the STDP demonstration on a PCMO-RRAM and the results. In continuation, using the STDP data, we show that this device fails as a synapse. In Section IV, we validate our proposal of using multiple parallel RRAMs in a synapse. Lastly, in Section V, we discuss the hardware requirements and other considerations for adopting the proposed approach.
II. IDEAL LEARNING-RATE FOR SNNS

A. STDP-Overview
Spike-timing dependent plasticity (STDP) is the most common learning-rule used in Spiking Neural Networks (SNNs). It gives a relation between $\Delta t$ (the time gap between the pre-synaptic and the post-synaptic spikes) and $\Delta w$, the weight change of the synapses, as illustrated in Fig. 1.
To illustrate it, we use a multiplicative exponential weight-update rule [29], [30] comprising an exponential-dependence term, a saturation factor and scaling factors, as given in Eq. 1.
Though several other time-dependences of weight-change are possible [20], we have chosen an exponential one due to its ubiquity in RRAM demonstrations [5], [11]-[19].
The saturation factor models the biological saturation effects [25], [29], [30] and ensures that the weight does not increase or decrease indefinitely. In Eq. 1, the scaling factor for potentiation is positive while that for depression is negative (to strengthen causality and weaken anti-causality between spikes). As an example, Fig. 2 plots (1) the weight change as a function of the spike-time difference for various weights (or conductances) and (2) the weight change as a function of the weight (or conductance) for various spike-time differences, with a time-constant of 50 ms.
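To make the structure of such a rule concrete, the following is a minimal Python sketch of one common multiplicative exponential STDP form. It is an assumption for illustration only and is not claimed to be Eq. 1: the function name, the parameter names (`alpha_plus`, `alpha_minus`, `tau_ms`, `mu`, `w_max`), their default values, and the sign convention (positive `dt_ms` means the pre-synaptic spike precedes the post-synaptic spike) are all hypothetical.

```python
import math


def stdp_delta_w(dt_ms, w, alpha_plus=0.02, alpha_minus=-0.02,
                 tau_ms=50.0, mu=1.0, w_max=1.0):
    """Weight change for one spike pair under an ASSUMED multiplicative rule.

    dt_ms > 0 is taken to mean pre-before-post (causal pair, potentiation);
    dt_ms < 0 is taken to mean post-before-pre (anti-causal pair, depression).
    """
    if dt_ms > 0:
        # Scaling factor * saturation term * exponential time dependence.
        # The saturation term vanishes as w approaches w_max.
        return alpha_plus * (w_max - w) ** mu * math.exp(-dt_ms / tau_ms)
    if dt_ms < 0:
        # alpha_minus is negative; the saturation term vanishes as w approaches 0.
        return alpha_minus * w ** mu * math.exp(dt_ms / tau_ms)
    return 0.0
```

In this sketch the magnitude of the update never exceeds `abs(alpha_plus) * w_max ** mu` (or the corresponding depression bound), which is the sense in which the scaling factors act as a learning-rate.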
B. Ideal learning-rate for training
The scaling factors in Eq. 1 (or in any STDP equation, for that matter) set the maximum conductance-change during a weight-change event (or, the learning-rate). While training an artificial neural network, the learning-rate needs to be carefully chosen [31]. Since no work exists that studies the learning-rate for spiking neural networks trained using STDP, we empirically determine the learning-rate using the SNN given in [32]. This single-layered feed-forward SNN can be trained via an exponential STDP rule to classify data-points of Fisher's Iris, Wisconsin's breast-cancer and wine data-sets. By transforming the input vector, this network gives state-of-the-art performance without adding any hidden layer. For training, the STDP rule of [32] (Eq. 3) has been used.
In Eq. 3, one set of parameters governs conductance potentiation and the other governs conductance depression: the scaling factors set the maximum conductance change, the time-constants set the STDP time-scale, the conductance is bounded between 0 and a maximum conductance, and a saturation factor controls the saturation of the conductance. To quantitatively study the effect of the scaling factors on training, we simulate the training of the network to classify the Iris dataset with various values of the pair of potentiation and depression rates and observe the evolution of classification accuracy as training proceeds. Though bigger datasets may be used, we proceed with Fisher's Iris for the sake of simplicity of the network's simulations.
From Fig. 3, which plots the mean accuracy for the last five epochs of training, it is observed that:
1. The lower the learning-rates, the better the training in terms of both maximum accuracy and stability.
2. A low depression-rate of the synaptic weight (~2%) is a necessary condition for high (more than ~90%) post-training accuracy.
3. The synaptic potentiation-rate affects the training performance, but to a lesser extent than the depression-rate.

The following observations can be made:
1. For large potentiation and depression rates (~35%), the training never completes and the classification accuracy does not settle.
2. For a large potentiation rate but a small depression rate (~2%), the classification accuracy settles, but not at the maximum, which means the training is partial.
3. For a large depression rate but a small potentiation rate, the network is unable to learn.
4. For small potentiation and depression rates, the learning is the best and the classification accuracy settles.

Further, several SNNs trained using STDP in the literature rely on learning-rates of less than 1% [33]-[37]. Though no work focuses on how the learning-rate affects the overall training, it is very likely that their performance would suffer from similar degradation as learning-rates are increased.
C. Number of bits per digital synapses
To empirically determine the number of bits needed in a digital synapse for software-equivalent training, we trained the network used above to classify Fisher's Iris and Wisconsin's breast-cancer datasets. The total conductance range was divided uniformly into a fixed number of levels. During a conductance-update event, the originally continuous change in conductance is discretized to the nearest conductance level. Fig. 5 plots the mean post-training percent classification error (CE) for 10 training experiments over 10 training epochs. It is observed that at least 8 bits (256 discrete levels) are needed to ensure the lowest CE.
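The discretization step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the simulation code used for Fig. 5: the function name, the normalized conductance range of [0, 1], and the choice to snap the updated conductance (rather than the change itself) to the nearest level are all assumptions.

```python
def quantize_update(g, delta_g, bits=8, g_min=0.0, g_max=1.0):
    """Apply a continuous conductance change, then snap to the nearest level.

    The range [g_min, g_max] is divided uniformly into 2**bits levels
    (an assumed reading of the procedure described in the text).
    """
    levels = 2 ** bits
    step = (g_max - g_min) / (levels - 1)
    # Clamp the continuous target to the allowed conductance range.
    target = min(max(g + delta_g, g_min), g_max)
    # Index of the nearest uniformly spaced level.
    index = round((target - g_min) / step)
    return g_min + index * step
```

With few levels, an update smaller than half a step is lost entirely: for example, `quantize_update(0.0, 0.4, bits=1)` leaves the conductance at its original level, which gives some intuition for why coarse synapses train poorly at low learning-rates.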
III. PCMO-RRAM AS SYNAPSE IN SNNS
A. STDP of RRAMs using neural write-pulses
An RRAM can be approximately modeled as a resistor whose resistance can be changed with voltage-pulses if the pulse-strength exceeds a writing threshold (Fig. 6a) [38], [39]. This change increases as the pulse-strength above the threshold is increased. Given this approximation, to realize STDP on an RRAM, carefully shaped write-pulses are applied at the two ends of the synaptic RRAM by the two spiking neurons in context (Fig. 6b) [22], [40]. These pulses are shaped so that when two such pulses, one from each of the neurons attached to a synapse and relatively displaced in time, are subtracted:
1. there exists a portion of the net pulse that always just reaches (or goes above) the RRAM's threshold, and
2. the height of the net pulse above the RRAM's threshold (or compactly, the overdrive) is a function of the relative displacement.

Each time a neuron spikes, such a pulse is applied immediately in response at its terminal of the synaptic RRAM. This way, the RRAM sees a net voltage equal to the difference of the displaced pulses applied by the pre- and the post-synaptic neurons (Fig. 6d). For implementing an exponential STDP rule, the write-pulse may be given the shape specified in Eq. 2.

The PCMO-RRAM device (Fig. 7), originally reported in [43], was used for demonstrating STDP. The device was initialized to its low-resistance state by applying a long-lasting, constant negative voltage-pulse of -2.5 V with the compliance set to 10 mA. For writing, the pulses in Eq. 2 were used; the values of their parameters, including the amplitudes A, are given in Table 1. The procedure followed to demonstrate the STDP of the device is:
1) the conductance was read using a small rectangular voltage pulse (0.5 V), yielding the initial conductance value;
2) a spike-time difference was randomly chosen from [-100 ms, 100 ms] and the corresponding write-pulses were applied;
3) the conductance was read again, yielding the final conductance value.

These three steps were repeated 1000 times, with the final state of each iteration serving as the initial state of the next. Of the pulses applied, the ones leading to an increase in conductance have been plotted in green/Δ in Fig. 8 and those leading to a decrease, in blue/∇. The conductance values have been normalized by dividing each value by the maximum conductance observed. For a better visual, uniformly spaced iso-initial-conductance points have been plotted in Fig. 9 and iso-time-difference points in Fig. 10. To use this data for simulations, we used its interpolation model. Iso-initial-conductance and iso-time-difference STDP curves obtained using such a model are plotted in Fig. 9b and Fig. 10b. Though the STDP demonstrated experimentally has a time-constant of 50 ms, it can be altered by scaling the time-keeping portion of the write-pulses.
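As an illustration of the two conditions on the net pulse stated in this section, the following is a minimal Python sketch under assumed values. The pulse shape (a short rectangular spike at the threshold amplitude followed by an exponentially decaying tail of opposite polarity), the threshold, the amplitudes, the width and all names are hypothetical choices for illustration; they are not the shape of Eq. 2 or the values of Table 1.

```python
import math

V_TH = 1.0       # hypothetical write threshold (V)
A_SPIKE = 1.0    # spike amplitude, chosen equal to the threshold
A_TAIL = 0.5     # amplitude of the opposite-polarity tail (V)
WIDTH_MS = 1.0   # duration of the rectangular spike
TAU_MS = 50.0    # decay constant of the tail


def write_pulse(t_ms):
    """Assumed pulse applied by a neuron, t_ms measured from its spike."""
    if t_ms < 0:
        return 0.0
    if t_ms < WIDTH_MS:
        return A_SPIKE
    return -A_TAIL * math.exp(-(t_ms - WIDTH_MS) / TAU_MS)


def signed_overdrive(dt_ms, step_ms=0.1, horizon_ms=400.0):
    """Largest excursion of the net voltage beyond the threshold.

    dt_ms > 0 means the post-synaptic spike follows the pre-synaptic one.
    The sign of the result is the polarity of the net voltage.
    """
    t_pre = max(0.0, -dt_ms)
    t_post = max(0.0, dt_ms)
    best = 0.0
    for k in range(int(horizon_ms / step_ms)):
        t = k * step_ms
        # Net voltage across the device: pre-side pulse minus post-side pulse.
        v = write_pulse(t - t_pre) - write_pulse(t - t_post)
        over = abs(v) - V_TH
        if over > abs(best):
            best = math.copysign(over, v)
    return best
```

With these assumed shapes, each pulse alone only reaches the threshold, while the overlap of one neuron's spike with the other neuron's tail exceeds it by roughly `A_TAIL * exp(-|dt| / TAU_MS)`, with a polarity that depends on the spike order.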
C. Training with Single PCMO-RRAM as a synapse
We replaced the mathematical synaptic model in the network mentioned in Sec. II with the interpolation model of the PCMO-RRAM. To do so, we modified the:
1. Read equation: Each read-instance of the mathematical conductance was replaced with the RRAM's conductance. Since the latter was normalized, an additional scaling factor was applied.
2. STDP rule equation: The STDP rule specified in Eq. 3 was replaced with the conductance change determined from the interpolation model mentioned above.
Next, the network's training was simulated. Fig. 11 plots the evolution of classification accuracy as training proceeds. Clearly, the performance is worse than that obtained with a mathematical synaptic model.

Our hypothesis, based on conclusions made from Fig. 3 and Fig. 11, is that such a marked reduction in performance is observed because of the large percentage change in conductance of the PCMO-RRAM. This is equivalent to having large learning-rates in a mathematical model. STDP has been demonstrated on several analog RRAMs [5], [11]-[19]. As observed above, the maximum conductance-change must be less than 2% of the synaptic conductance-range to get software-equivalent evolution of conductance. However, all demonstrations of analog conductance plasticity to date have a maximum conductance-change of more than 20%, which can go up to 400% in some devices (Fig. 12). Thus, all currently existing analog RRAMs will fail to produce software-equivalent post-training classification accuracy for our network.

Our second hypothesis, based on observations made from Fig. 4 and Fig. 5, is that training performance can be improved without changing the network, by reducing the maximum change in conductance (i.e., by lowering the magnitudes of the scaling factors in Eq. 3). The validity of this hypothesis could be tested by using STDP data, like that used above, from an RRAM with a lower maximum conductance-change. Since, to our knowledge, no RRAM in the literature meets this constraint, this is not possible for us at the moment. To test our hypothesis, and as a step towards better memristor-based synaptic models, we propose using a set of PCMO-RRAMs in parallel in such a way that:
1. they function in aggregate as a synapse, and
2. the conductance-change pulses are applied to only one of the RRAMs.

Next, we validate our proposal.
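The proposed read/write scheme can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the simulation model used in this work: the class and method names are hypothetical, device conductances are normalized to [0, 1], the synaptic weight is taken as the mean of the device conductances (the summed read, rescaled by the number of devices), and a write simply adds a fixed step to the chosen device instead of using the measured STDP data.

```python
import random


class ParallelSynapse:
    """Synapse built from several devices read together and written one at a time."""

    def __init__(self, n_devices, g_init=0.5, rng=None):
        self.g = [g_init] * n_devices
        self.rng = rng or random.Random()

    def read(self):
        # All devices are read simultaneously; the normalized weight is
        # the mean of their conductances (assumed normalization).
        return sum(self.g) / len(self.g)

    def write(self, delta_g):
        # Only one randomly chosen device receives the conductance change.
        i = self.rng.randrange(len(self.g))
        self.g[i] = min(max(self.g[i] + delta_g, 0.0), 1.0)
        return i
```

In this sketch, a device-level change of 32% of the device range moves a 16-device synapse by only 2% of the synaptic range, which is the intended reduction of the effective learning-rate.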
IV. PARALLEL AND MULTIPLE PCMOS AS SYNAPSE
To test the ability of multiple ($N$) PCMO-RRAMs to collectively act as a synapse, we continued with the same network and trained it to classify the Iris data-set. The mapping from a mathematical synapse to an $N$-RRAM based synapse is similar to the one described in Sec. III-C, with a slight difference, described as follows:
1. Each read-instance of the mathematical weight was replaced with the combined conductance of the $N$ RRAMs of the synapse, read simultaneously.

Note that an increase in an RRAM's conductance corresponds to an increase in the conductance of the synapse of which it is a part. We let $N$ take all values from the set {2, 4, 16, 36, 64, 100}. Fig. 13 plots the classification accuracy (CA) as training proceeds for selected values of $N$. It is observed that learning is more stable for synapses with more RRAMs. Fig. 14 plots the quantiles of the CA for all values of $N$ as training progresses, along with the number of training epochs. The number of training epochs was determined empirically. For a reference value of $N$, approximately 50 epochs were needed to stabilize learning. Since the learning-rate is roughly inversely proportional to the number of RRAMs, the number of training iterations needed is proportionally increased or decreased. For very large learning-rates (the smallest values of $N$), a lower limit of 20 epochs was set. The evolution of the conductance of each synapse for various values of $N$ has been plotted in Fig. 15. It is observed that:
1. from a certain $N$ onwards, the network's CA reaches the software-maximum of 97.3% at least once during training (Fig. 13);
2. despite being trained for more than adequate epochs, networks with synapses having a very low number of RRAMs are unable to learn stably (Fig. 13 and Fig. 14a);
3. as $N$ is increased, the CA distribution in Fig. 14a follows a trend similar to the one exhibited by Fig. 4, which shows the increase in CA as the LTP and LTD rates are decreased simultaneously: a higher mean and a lower variation;
4. conductances evolve more smoothly as $N$ increases.

V. DISCUSSION
The scheme discussed does not escalate the circuit requirements. During reading, all RRAMs in a synapse need to be read simultaneously. This is done by applying the same reading pulse-voltage to all row-bars of the synapse. The currents from all branches associated with a synapse are then summed to obtain a current proportional to the synaptic weight.
For performing the writing operation, one RRAM is randomly chosen from the $N$ RRAMs within a synapse, with a uniform probability for each. This is done by applying the pre-synaptic write-pulse to a random row among the $M_1$ row-bars and the post-synaptic write-pulse to a random column among the $M_2$ column-bars (for a synapse with $M_1$ RRAM rows and $M_2$ RRAM columns). This way, the RRAM selection is uniformly random among all RRAMs. Though there can be other schemes for uniformly selecting RRAMs, this one, in our opinion, should not require complicated and/or large-area peripheral circuits.
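That independent uniform choices of a row and a column select each of the $M_1 \times M_2$ devices with equal probability can be checked with a short Monte Carlo sketch. The function names, the seed and the trial count below are hypothetical and serve only as an illustration of the selection scheme described above.

```python
import random
from collections import Counter


def pick_device(m1, m2, rng):
    """Choose one cross-point by independent uniform row and column draws."""
    row = rng.randrange(m1)   # row receiving the pre-synaptic write-pulse
    col = rng.randrange(m2)   # column receiving the post-synaptic write-pulse
    return row, col


def selection_counts(m1, m2, trials, seed=0):
    """Count how often each (row, col) device is selected over many writes."""
    rng = random.Random(seed)
    return Counter(pick_device(m1, m2, rng) for _ in range(trials))
```

For a 4 × 4 synapse, each device is expected to be selected in about 1/16 of the write events, since the probability of a given (row, column) pair is (1/$M_1$)(1/$M_2$).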
The reading and writing phases may be multiplexed in time using a global control signal that indicates whether the array is in the reading or the writing phase. During the writing phase, to allow random selection of a row/column, a set of global one-hot selection-lines is laid out along the periphery of the array. The active line in this set is changed periodically. Whenever a neuron spikes (assumed to be random in time), the content of the selection-line set is copied onto the RRAM selection register, which drives an output selection vector (Fig. 17). This way, the same row/column of the synapse remains selected until the next spike, as the active line of the synapse's selection register remains the same. During the reading phase, the selection register is over-written using the pre-set control input so that all of its outputs are active.
Write-pulses may be applied to a single row (column) out of $M_1$ ($M_2$) via MOSFET-based switches, as shown in Fig. 18. The gate of each MOSFET is connected to one line of the one-hot selection vector. Thus, only one MOSFET out of $M_1$ ($M_2$) is conducting and will allow write-pulses to be applied to the corresponding row (column). However, during the reading phase, all MOSFETs are turned on by setting every element of the selection vector active. Thus, for all RRAMs within a synapse, the same reading pulse is applied on the pre-synaptic terminals while the post-synaptic terminals are grounded.
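The behavior of the selection lines and the selection register can be summarized with a small behavioral model. This is a sketch under assumptions, not a description of the circuit in Fig. 17 or Fig. 18: the class and method names are hypothetical, the periodic change of the active global line is modeled as a simple rotation, and timing and electrical details are ignored.

```python
class SelectionRegister:
    """Behavioral model of one-hot row (or column) selection for a synapse."""

    def __init__(self, width):
        self.width = width
        # Global one-hot selection lines; exactly one line is active.
        self.global_lines = [1] + [0] * (width - 1)
        # Local register whose outputs drive the MOSFET gates.
        self.latched = list(self.global_lines)

    def tick(self):
        # Periodic update: the active global line moves by one position.
        self.global_lines = [self.global_lines[-1]] + self.global_lines[:-1]

    def on_spike(self):
        # A spike copies the current global one-hot code into the register.
        self.latched = list(self.global_lines)

    def preset_for_read(self):
        # Reading phase: every gate is turned on so all devices are read.
        self.latched = [1] * self.width
```

Because spikes arrive at times unrelated to the rotation of the global lines, the latched line is effectively random, while the register holds its value between spikes.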
Use of multiple RRAMs within a synapse clearly requires a bigger cross-bar array. A larger array, in turn, implies a larger attenuation in the voltages applied across RRAMs at cross-points far away from the input and output sides of the array. Thus, for a constant wire-resistance, the cross-bar array size, or more fundamentally, the number of RRAMs in the synapses, cannot be arbitrarily large. For a given $N$, the arrangement of the RRAMs needs to be carefully designed to maintain a certain minimum level of read-write fidelity. For simplicity, consider an SNN with a single $N$-RRAM synapse and all RRAMs within it in the same conductance state. If the potential-drop due to wire-resistance is assumed to be a linear function of the cross-point index within a synapse, then the read current-error of a cross-point within the array can be expressed as a function of its row and column indices.
For a synapse with $M_1 \times M_2$ RRAMs, the maximum error in current occurs at the cross-point at the corner furthest from the inputs and outputs, i.e., the one with the largest row and column indices. For a fixed number of RRAMs in a synapse, say $N$, the maximum error in read current is smallest when the cross-points are arranged in the configuration closest to a square. Thus, within a synapse, the RRAMs should be arranged in a square-like configuration.
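The preference for a square-like arrangement can be illustrated with a short enumeration. The sketch assumes, purely for illustration, that the worst-case read error is proportional to the sum of the row and column counts (equal error contribution per row and per column index); the function names and the unit error per index are hypothetical.

```python
def factor_pairs(n):
    """All (rows, columns) arrangements holding exactly n devices."""
    return [(m1, n // m1) for m1 in range(1, n + 1) if n % m1 == 0]


def worst_case_error(m1, m2, error_per_index=1.0):
    # Assumed linear model: the farthest corner accumulates error from
    # m1 row positions and m2 column positions.
    return error_per_index * (m1 + m2)


def best_arrangement(n):
    """Arrangement with the smallest worst-case error (ties: fewer rows)."""
    return min(factor_pairs(n), key=lambda p: (worst_case_error(*p), p))
```

Under this assumed model, 64 devices are best arranged as 8 × 8 (worst-case error 16 units) rather than, say, 2 × 32 (34 units), since for a fixed product the sum of two factors is smallest when they are as close to equal as possible.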
VI. CONCLUSION
In this work, we introduced and empirically determined two new specifications for an SNN-based synapse: the number of conductance levels per synapse and the maximum learning-rate. To the best of our knowledge, there are no RRAMs that meet the latter specification. As a solution, we proposed the use of multiple RRAMs in parallel within a synapse. During synaptic reading, all RRAMs are read simultaneously, and for each synaptic conductance-change event, the writing pulses for STDP are applied to only one RRAM, randomly picked from the set. To validate our solution, we experimentally demonstrated STDP of conductance of a PCMO-RRAM and showed that, due to a large learning-rate, a single device fails to model a synapse in the training of an SNN. As anticipated, network training improved as more RRAMs were added to the synapse. Lastly, we discussed the circuit requirements for implementing such a scheme and concluded that the requirements are within bounds.
