Supervised Learning in Spiking Neural Networks with Phase-Change
Memory Synapses
S. R. Nandakumar,1, 2 Irem Boybat,2, 3 Manuel Le Gallo,2 Evangelos Eleftheriou,2 Abu Sebastian,2, a) and
Bipin Rajendran1, b)
1)New Jersey Institute of Technology, Newark, NJ 07102, USA
2)IBM Research – Zurich, 8803 Rüschlikon, Switzerland
3)École Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
(Dated: 29 May 2019)
Spiking neural networks (SNN) are artificial computational models that have been inspired by the brain’s ability to nat-
urally encode and process information in the time domain. The added temporal dimension is believed to render them
more computationally efficient than conventional artificial neural networks, though their full computational capabilities are yet to be explored. Recently, computational memory architectures based on non-volatile memory crossbar
arrays have shown great promise to implement parallel computations in artificial and spiking neural networks. In this
work, we experimentally demonstrate, for the first time, the feasibility of realizing high-performance event-driven in-situ supervised learning systems using nanoscale and stochastic phase-change synapses. Our SNN is trained to recognize
audio signals of alphabets encoded using spikes in the time domain and to generate spike trains at precise time instances
to represent the pixel intensities of their corresponding images. Moreover, with a statistical model capturing the exper-
imental behavior of the devices, we investigate architectural and systems-level solutions for improving the training and
inference performance of our computational memory-based system. By combining the computational potential of supervised SNNs with the parallel compute power of computational memory, this work paves the way for the next generation of efficient brain-inspired systems.
In recent years, deep learning algorithms have become successful in solving complex cognitive tasks, surpassing the performance achievable by traditional algorithmic approaches and, in some cases, even expert humans. However, conventional
computing architectures are confronted with several challenges while implementing the multi-layered artificial neural networks
(ANNs) used in these algorithms, especially when compared against the approximately 20W power budget of the human brain.
The inefficiencies in the von Neumann architecture for neural network implementation arise from the high-precision digital repre-
sentation of the network parameters, constant shuttling of large amounts of data between processor and memory, and the ensuing
limited computational parallelism and scalability. In contrast, the human brain employs billions of neurons that communicate
with each other in a parallel fashion, through dedicated, analog, and low-precision synaptic connections. The spike-based data
encoding schemes used in these biological networks render the computation and communication asynchronous, event-driven,
and sparse, contributing to the high computational efficiency of the brain.
The size and complexity of artificial neural networks are expected to continue to grow in the future, which has motivated the search for efficient and scalable hardware implementation schemes for learning systems1. Spiking neural networks (SNNs) are
excellent candidates to implement large learning networks, especially for energy and memory-constrained embedded applica-
tions, as they closely mimic some of the key computational principles of the brain. Application specific integrated circuit (ASIC)
designs such as TrueNorth from IBM2 and Loihi from Intel3, which implement SNN dynamics, have been successful in demonstrating two to three orders of magnitude gains in energy efficiency by mimicking the sparse, asynchronous, and event-driven nature
of computation in the brain. However, the area-expensive static random access memory (SRAM) circuits used for synaptic
weight storage in these chips limit the amount of memory that can be integrated on-chip and hence the scalability of these
architectures.
Crossbar arrays of analog non-volatile memory devices can perform weighted summation of their word-line voltages in parallel using the device conductances, with the results available as currents at their bit lines. This memory architecture that performs computations (computational memory) uses a combination of Ohm's law and Kirchhoff's current law to reduce matrix-vector multiplications to O(1) operations4–7. Neural networks have layers of neurons, each of which receives a weighted summation of the neuronal responses from the previous layer. The underlying matrix-vector multiplications are computationally expensive in traditional digital systems due to their large sizes and the necessity to store these matrices off chip. SNNs processing asynchronous
events in time can significantly benefit from an on-chip computational memory that could store the synaptic weights in the device
conductance values and provide dedicated connectivity patterns to process parallel events in real-time (Fig. 1a). For instance,
such computational memory based SNNs could be directly interfaced with spike-encoding sensors such as an artificial retina8 or cochlea9 to process asynchronous binary spike streams of real-world signals.
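As a minimal illustration of this in-memory multiply-accumulate principle, the following Python sketch (with hypothetical conductance and voltage values; a numpy array stands in for the analog crossbar) shows how a single parallel read realizes the matrix-vector product:

```python
import numpy as np

# Hypothetical crossbar: G[j, i] is the conductance (S) connecting word line i
# to bit line j; V[i] is the voltage applied on word line i. The sizes mirror
# the 132-input, 168-output network used later in this work.
G = np.random.uniform(0.1e-6, 8e-6, size=(168, 132))  # device conductances
V = np.random.choice([0.0, 0.3], size=132)            # 0.3 V pulses on active inputs

# Ohm's law gives the per-device current G[j, i] * V[i]; Kirchhoff's current
# law sums these currents along each bit line, so the full matrix-vector
# product is obtained in a single parallel read step.
I = G @ V  # bit-line currents, one per output neuron
```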
a)Electronic mail: ase@zurich.ibm.com
b)Electronic mail: bipin@njit.edu
FIG. 1. Spiking neural network implementation using computational memory array. a A single layer spiking neural network that
translates an input set of spike trains to an output set of spike trains (top). The network connectivity matrix could be realized using a non-
volatile computational memory array (bottom). The voltage spike trains Vi are applied along the word lines and the weighted summations are read as currents Ij from the bit lines. b The characteristic state-dependent behavior of average conductance change observed in a phase-change
memory (PCM) device. The device structure in the inset illustrates the amorphous region (amor-GST) formed inside the crystalline region
(cryst-GST). c The synaptic conductance changes, measured as changes in excitatory postsynaptic current (EPSC), as a function of the initial EPSC amplitude, from hippocampal neurons in a rat10. The state-dependent nature of conductance change in response to positive (causal)
spiking is analogous to that observed in the PCM devices.
However, achieving software-equivalent performance using computational memory realized using today’s memory devices is
challenging, due to the inherent non-idealities in the conductance modulation characteristics of these nanoscale devices. Phase-
change memory (PCM) is a mature non-volatile memory technology that has demonstrated gradual conductance modulation,
however it exhibits several non-ideal characteristics that are typical to most nanoscale memories, including limited precision,
stochasticity, non-linearity, as well as drift of the programmed conductance states with time11,12. Nonetheless, large-scale
experiments demonstrating the effective use of PCM as synapses in ANNs show significant promise11,13. PCM is an attractive
technology also for SNN implementations12,14–17 and the similarity of the state-dependent nature of the conductance update in
PCM and in a biological synapse (Fig. 1b, c) opens up the possibility of exploiting the device physics rather than merely being
limited by it. Most research efforts and experimental demonstrations using PCM in SNNs focus on unsupervised training based on a local learning rule observed in biology known as spike-timing-dependent plasticity (STDP)10. However,
unsupervised STDP based learning generally yields sub-par results in comparison to supervised training or has been limited to
problems where the desired response of the neural network is not known beforehand18. There is also a growing body of evidence
from the neuroscience literature suggesting that data encoding using precise spike times in biological neural networks19–21 has several computational advantages compared to rate-based encoding schemes22.
In this article, we focus on in-situ supervised training of SNNs that learn to generate spikes that encode data corresponding
to real-world signals using precise spike times and experimentally demonstrate their hardware implementation using more than
177,000 on-chip PCM devices. Moreover, we capture the statistical behavior of PCM devices with accurate models and use them
to evaluate the improvement in training performance as a function of the number of PCM devices used per synapse. Next, we
examine how modifications to the input encoding scheme with random jitter can improve learning, and lastly, we demonstrate an array-level compensation scheme to tackle the accuracy drop due to the temporal evolution of PCM-based synapses.
RESULTS
A. SNN learning experiment
FIG. 2. SNN training problem. The audio signal is passed through a silicon cochlea chip to generate spike streams. These spike streams are
sub-sampled and applied as input to train the single layer SNN. The desired spike response from the network, representing the images (14×12
pixels) corresponding to the characters in the audio is also shown.
The training problem and the network we used for the experiment are illustrated in Fig. 2. The learning task of the network
is to recognize and translate audio signals corresponding to spoken alphabets into corresponding images, with all information
encoded in the spike domain, as described below. An audio signal captured when a human speaker utters the characters ‘IBM’
(Eye..Bee..Em) is converted to a set of spike streams using a silicon cochlea chip9 and the resulting 132 spike streams (representing the signal components in 64 frequency bands) are subsampled to an average spike rate of 10 Hz to generate the binary
spike inputs to the network (see Methods for more details). A raster plot of the generated spikes is shown in Fig. 2. At the output
of the network, there are 168 spiking neurons, with the spikes of each neuron representing the instantaneous pixel intensity of the image corresponding to the input audio signal. The desired spike stream for each output neuron is obtained from a Poisson
random process whose arrival rate is chosen to be proportional to the corresponding pixel intensities in the images (14×12 pixels
showing the characters ‘I’, ‘B’, and ‘M’), inspired by similar statistical distributions observed in animal retina21. Each image has
an average duration of 230 ms and is mapped to the corresponding time window in the audio signal. The network hence receives
132 spike streams corresponding to the audio signals and is connected to 168 spiking neurons at the output, corresponding to
the pixels of the image. In the experiment, the synaptic strength between the input streams and the output neurons is represented
using the conductance of the PCM devices.
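As an illustration of how such desired spike streams can be produced, the sketch below draws a Bernoulli approximation of a Poisson process whose arrival rate tracks a pixel intensity; the helper name, the bin width, and the 20 Hz maximum rate are assumptions consistent with the per-character rates reported later in the text:

```python
import numpy as np

def poisson_spike_train(rate_hz, duration_ms, dt_ms=0.1, rng=None):
    """Binary spike train approximating a Poisson process of the given rate.

    A spike is drawn in each time bin with probability rate * dt, which
    approximates a Poisson process for small dt.
    """
    rng = rng or np.random.default_rng()
    n_bins = int(duration_ms / dt_ms)
    p_spike = rate_hz * dt_ms * 1e-3        # probability per bin (dt in seconds)
    return (rng.random(n_bins) < p_spike).astype(np.uint8)

# Desired stream for one output neuron: arrival rate proportional to the pixel
# intensity (normalization and maximum rate are assumed here).
pixel_intensity = 0.8                       # normalized to [0, 1]
target = poisson_spike_train(pixel_intensity * 20.0, duration_ms=230.0)
```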
An input spike, arriving at time $t_i$ on an input synapse, triggers a current flow into the output neuron. The synaptic current in response to each spike is modeled as $I_{ker}(t) = (e^{-(t-t_i)/\tau_1} - e^{-(t-t_i)/\tau_2})\,u(t-t_i)$ multiplied by the synaptic weight $W$, where $u(t)$ is the Heaviside step function (with $\tau_1 = 5\,$ms and $\tau_2 = 1.25\,$ms). The sum of all the weighted currents is integrated by leaky-integrate-and-fire (LIF) neurons to determine a voltage analogous to the membrane potential of biological neurons. When this voltage exceeds a threshold, it is reset to a resting potential and a spike is assumed to be generated. During the course of training, PCM conductance values read from hardware are used to calculate the synaptic currents, and the neuronal dynamics are implemented in software. A supervised training algorithm is used to determine the desired weight updates such that the observed spikes from the SNN occur at the desired time instances. The weight updates are implemented by modulating the corresponding PCM conductance values by applying a sequence of programming pulses. We do not verify whether the observed conductance change matches the desired update. This blind programming scheme (without expensive read-verify) is expected to be the norm for computational memory based learning systems in the future, and in this study we experimentally evaluate the potential of analog PCM conductance to precisely encode spike time information in SNNs.
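A minimal sketch of this synaptic current model is given below (Python, with time in milliseconds; the function names and vectorization are ours):

```python
import numpy as np

TAU1, TAU2 = 5.0, 1.25  # ms, kernel time constants from the text

def i_ker(t, t_i):
    """Double-exponential synaptic current kernel for spikes arriving at t_i.

    The Heaviside factor u(t - t_i) is handled by clamping dt at zero, where
    the kernel evaluates to exp(0) - exp(0) = 0.
    """
    dt = np.maximum(t - np.asarray(t_i), 0.0)
    return np.exp(-dt / TAU1) - np.exp(-dt / TAU2)

def synaptic_current(t, spikes, w):
    """Net current into one output neuron at time t (ms).

    spikes: list over input channels, each an array of spike times (ms);
    w[i]: synaptic weight of channel i, realized in hardware as a scaled
    PCM conductance difference.
    """
    return sum(w[i] * i_ker(t, s).sum() for i, s in enumerate(spikes))
```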
B. Phase-change memory synapse
For our on-chip training experiment, we used a prototype chip containing more than one million doped-Ge2Sb2Te5 (GST)
based PCM devices fabricated in the 90 nm CMOS technology node23. The GST material has a low resistivity in its polycrystalline state and a high resistivity in its amorphous phase. An amorphous region is created around the narrow bottom electrode via a melt-quench process. The device conductance can be gradually increased by a sequence of partial-SET pulses applied to the device. A threshold-switching phenomenon permits a large current to flow through the amorphous volume, increasing its temperature and initiating crystal growth. We have characterized the crystal-growth-driven conductance evolution in the PCM
array and have created statistically accurate models24. The PCM models are used to pre-validate the experiment and to evaluate
methods to improve training performance.
While the conductance increment (SET) operation in PCM can be gradual and accumulative, the melt-quench driven conduc-
tance decrement (RESET) process is non-accumulative. This leads to an asymmetric update behavior in conductance increase
and decrease, necessitating the use of the standard differential configuration for weight updates25. In this scheme, each network weight $W$ is realized as the difference of two PCM conductance values $G_p$ and $G_n$ ($W = \beta (G_p - G_n)$, where $\beta$ is a scaling factor implemented in the peripheral circuit of the computational memory array). This allows both the increment and the decrement of $W$ to be implemented as partial-SET operations on $G_p$ and $G_n$, respectively. This differential configuration improves the
symmetry of weight updates and partially compensates the conductance drift26. Further improvement in conductance change
granularity, stochasticity, and drift behavior can be achieved via a multi-PCM configuration12,27. In our training experiment, both
$G_p$ and $G_n$ are realized as the sum of four PCM device conductances. For each synaptic update desired by the training algorithm, only one of the four devices is programmed, chosen cyclically so that on average all devices receive an approximately equal number of update pulses12. The energy overhead from the multiple devices per synapse is not expected to be significant, since PCM devices can be read with low energy (1 – 100 fJ per device)28 and only one of the devices is programmed per update, as in a conventional synapse. Although we are increasing the area of each synapse, it is worth noting that the area of typical computational memory based neural network designs is dominated by the peripheral neuron circuits rather than by the synapses. Moreover, PCM
devices have been shown to scale to nanoscale dimensions29, and through technology scaling the synaptic area could be reduced significantly30. Thus, in our implementation, each synapse is realized using 8 PCM devices, making a total of 177,408 devices
to represent the weights of 22,176 synapses in the network.
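The following Python sketch illustrates one plausible reading of this multi-PCM differential synapse with round-robin device selection; the partial-SET response here is a deterministic placeholder, whereas the actual devices are stochastic and state-dependent:

```python
import numpy as np

class MultiPCMSynapse:
    """Differential multi-PCM synapse: W = beta * (sum(Gp) - sum(Gn)).

    Each update programs a single device, chosen round-robin, in the Gp arm
    (to increase W) or the Gn arm (to decrease W).
    """

    def __init__(self, n_per_arm=4, beta=1.0, g_init_uS=1.0):
        self.gp = np.full(n_per_arm, g_init_uS)   # µS, assumed initial states
        self.gn = np.full(n_per_arm, g_init_uS)
        self.beta = beta
        self.ptr_p = self.ptr_n = 0               # round-robin pointers

    def weight(self):
        return self.beta * (self.gp.sum() - self.gn.sum())

    def update(self, delta_g_uS):
        """Apply one desired conductance change; its sign selects the arm."""
        if delta_g_uS > 0:
            self.gp[self.ptr_p] += self._partial_set(delta_g_uS)
            self.ptr_p = (self.ptr_p + 1) % len(self.gp)
        else:
            self.gn[self.ptr_n] += self._partial_set(-delta_g_uS)
            self.ptr_n = (self.ptr_n + 1) % len(self.gn)

    @staticmethod
    def _partial_set(delta_g_uS):
        # Placeholder for the stochastic, state-dependent PCM response.
        return delta_g_uS
```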
C. Training algorithm
The supervised training of SNNs is a challenging task as the gradient descent based backpropagation algorithms do not
apply directly due to the non-differentiable dynamical behavior of spiking neurons (i.e., the membrane potential encounters
a discontinuity at the point of spike). One approach to circumvent this limitation is to train a continuous-valued ANN using
standard backpropagation algorithm and then convert it into an SNN31–33. However, in this method, the input data and neuron
activations in the ANN are translated to spike rates in the SNN, losing the advantage of precise time-based signal encoding, and
necessitating longer processing times leading to sub-par performance and energy efficiency34. Also, the unconstrained training
of the floating point synapses without taking into account the non-idealities of analog memory devices will lead to further loss
in accuracy when the trained weights are transferred to nanoscale synapses in hardware. Moreover, training approaches that
implement back-propagation in SNNs using approximate derivatives of the membrane potential around the time of spikes are
also aimed at minimizing cost functions, which have been described in terms of the output spike rate rather than precise spike
times35,36. Encoding events using precise spike times could be more efficient as it leads to sparse computations and low latencies
for decision making2,37–40.
Recently, several approximate spike-time-based supervised training algorithms of varying computational complexity have been proposed, which have demonstrated various degrees of success on benchmark problems in machine learning. Among these,
SpikeProp37 is designed to generate single spikes, Tempotron39 uses a non-event driven error computation, and ReSuMe41 and
NormAD42 (with relatively higher convergence rate) are designed to generate spikes at precise time instances via spike driven
weight updates. In our experiment, we use the normalized approximate descent (NormAD) algorithm which has been successful
in achieving high classification accuracy for the MNIST hand-written digit recognition problem43. According to this algorithm,
the weight updates ∆W are computed in an event-driven manner, using the relation
$$\Delta W = \eta \int_0^T e(t)\,\frac{\hat{d}(t)}{\|\hat{d}(t)\|}\,dt \qquad (1)$$
where $\eta$ is the learning rate, $T$ is the pattern duration, and $e(t)$ is the difference between the desired and observed binary spike trains. $\hat{d}(t)$ is obtained by convolving the input spike stream $S_i(t)$ with $I_{ker}(t)$ and an approximate impulse response of the LIF neuron (see Methods). The weight updates are computed only at the time instants corresponding to a spike generated by the learning network, or the instants where a spike was desired (i.e., when $e(t) \neq 0$). These are accumulated over the training pattern duration (one epoch) and used to modulate the network weights. The $\Delta W$ values were converted to desired conductance changes
using the scaling factor β . The desired conductance changes lying in the interval [0.1, 1.5] µS were mapped to amplitudes of
50 ns programming current pulses from 40 µA to 130 µA. The smaller conductance changes were neglected. The conductance
updates during the training were performed by blindly applying the programming pulses without verifying if the observed
conductance change matches the desired update.
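A sketch of this event-driven update and the pulse mapping is shown below (Python; the array layout of $\hat{d}(t)$ and the linear pulse-amplitude map are assumptions, the latter consistent with the endpoints stated above):

```python
import numpy as np

def normad_updates(e, d_hat, eta, dt_ms=0.1):
    """Event-driven accumulation of Eq. (1) over one epoch, for one output neuron.

    e:     (T,) array, desired minus observed binary spike trains (+1, -1, or 0).
    d_hat: (T, n_inputs) array, each input spike stream convolved with Iker and
           the approximate LIF impulse response (see Methods).
    """
    dw = np.zeros(d_hat.shape[1])
    for t in np.flatnonzero(e):                  # only instants where e(t) != 0
        norm = np.linalg.norm(d_hat[t])
        if norm > 0.0:
            dw += eta * e[t] * d_hat[t] / norm * dt_ms
    return dw

def pulse_amplitude_uA(delta_g_uS):
    """Map a desired conductance change (µS) to a 50 ns pulse amplitude (µA).

    Assumes a linear map of [0.1, 1.5] µS onto [40, 130] µA; changes below
    0.1 µS are neglected, as stated in the text.
    """
    if delta_g_uS < 0.1:
        return None                              # no pulse applied
    g = min(delta_g_uS, 1.5)
    return 40.0 + (g - 0.1) / (1.5 - 0.1) * 90.0
```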
D. Training performance
FIG. 3. Training experiment using PCM devices. a Simulated training accuracy as a function of the number of devices in a multi-PCM
synapse (92.5% maximum accuracy). Accuracy is defined as the fraction of the spike events in the desired pattern corresponding to which a
spike was generated from the respective output neurons within a certain time interval. The lower bound of the shaded lines corresponds to a 5 ms interval, the middle line to 10 ms, and the upper bound to 25 ms. b Accuracy as a function of training epochs from the experiment using on-chip
PCM devices. Each synapse was realized using 8 PCM devices in differential configuration. The corresponding training simulation using the
PCM model shows excellent agreement with the experimental result. The experiment, PCM model, and the reference floating point (FP64)
training achieve maximum accuracies of 85.7%, 87%, and 98.9% respectively for the 25 ms error tolerance. c The raster plot of the desired
and observed spike trains from the trained network. A visualization of the character images whose pixel intensities are generated from the
observed spike rates is also shown above the raster plot.
First, we used the PCM model to pre-validate and optimize the training scheme. Fig. 3a shows the improvement in network
training accuracy as the number of PCM devices used per synapse increases (in differential configuration). The performance of
the network is evaluated using an accuracy metric defined as the percentage of the 987 spikes in the desired pattern for which the SNN produced an observed spike within a certain time interval. In the line plot of accuracy with shaded bounds, the lower bound, middle line, and upper bound respectively correspond to spike-time tolerance intervals of 5 ms, 10 ms, and 25 ms. Note that the average output spike rate over each character duration was less than 20 Hz, corresponding to an inter-arrival time of 50 ms, and the task of the network is to create spikes each of which can be unambiguously associated
with one of the target spikes. A fixed weight range obtained from the reference high-precision training was mapped to the sum
of the conductances of 1 to 16 differential pairs, and the networks were trained for 100 epochs. Using a larger number of devices in parallel, with only one programmed at each weight update, permitted smaller weight updates to be programmed more reliably. Although
the accuracy was found to improve with more PCM devices, increasing the total number of devices beyond 16 in this problem
did not lead to corresponding improvements in accuracy. One possible explanation is that, with more devices, the observed conductance change (which has a limited dynamic range for a chosen partial-SET programming scheme) captures smaller desired weight changes but neglects the larger ones, leading to slower convergence. The maximum
accuracy observed from the simulation was 92.5% at 25 ms timing error for 16 devices per synapse.
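The following Python sketch shows one way such a tolerance-based accuracy metric can be computed; the exact matching rule (here, any observed spike within the tolerance window counts) is an assumption:

```python
import numpy as np

def spike_time_accuracy(desired, observed, tol_ms=25.0):
    """Percentage of desired spikes matched by an observed spike within tol_ms.

    desired, observed: lists (one entry per output neuron) of spike-time
    arrays in ms. Whether one observed spike may account for several desired
    spikes is left unspecified in the text; this sketch allows it.
    """
    hits = total = 0
    for d, o in zip(desired, observed):
        o = np.asarray(o)
        total += len(d)
        for t in d:
            if o.size and np.min(np.abs(o - t)) <= tol_ms:
                hits += 1
    return 100.0 * hits / total if total else 0.0
```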
We performed the training experiment with the synapses realized using eight on-chip PCM devices in differential configuration
and the SNN generated more than 85% of the spikes within the 25 ms error tolerance (Fig. 3b). The training experimental results
agree well with the observations from the PCM model based simulation. The training accuracy obtained from the corresponding
64-bit floating point (FP64) training simulation is also shown for reference. A raster plot of the spikes observed from the SNN
trained in the experiment is shown in Fig. 3c as a function of time, along with the desired spikes. The character images shown on top are created using the average spike rate over the duration of each character, and they indicate that the network was successfully trained to generate the spikes that recreate the images.
FIG. 4. Role of input correlations in network performance. a Input spike streams with spike times jittered by random amounts uniformly
distributed in [-25, 25] ms. b The cross-correlations between the jittered spike streams are shifted towards zero compared to those of the experimental input. c The simulated training accuracy is improved when training with input spike streams of reduced correlation.
While the maximum accuracy obtained by training the PCM devices is limited by the non-linearity, stochasticity, and granularity of their conductance change, we observed that the accuracy of the SNN could be further enhanced by modifying the input
encoding scheme. The ability of a neural network to classify its inputs depends on the correlation between the inputs. In
Fig. 4 we show using the PCM model simulation that the accuracy gap between those from the experiment and floating point
training simulation can be reduced by decreasing the correlation between the input spike streams. We added a random temporal
jitter uniformly distributed in the interval [-25, 25] ms to each input spike which causes the cross-correlation between the input
spike streams to decrease. The correlation coefficients between the binary spike streams were determined after smoothing them using a Gaussian kernel ($e^{-t^2/2\sigma^2}$) with $\sigma = 5\,$ms. Even though the added jitter only reduces the correlation by a very small
amount (Fig. 4b), the training performance improves substantially, suggesting that encoding schemes or network structures that
inherently separate input features will improve training performance using low-precision devices such as PCM.
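A sketch of the jitter and correlation computation described above might look as follows (Python; scipy's Gaussian filter stands in for the kernel smoothing):

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def add_jitter(spike_times_ms, max_jitter_ms=25.0, rng=None):
    """Shift each spike by a uniform random offset in [-max_jitter, +max_jitter] ms."""
    rng = rng or np.random.default_rng()
    jitter = rng.uniform(-max_jitter_ms, max_jitter_ms, size=len(spike_times_ms))
    return np.sort(np.asarray(spike_times_ms) + jitter)

def smoothed_correlation(s1, s2, dt_ms=0.1, sigma_ms=5.0):
    """Correlation coefficient between two binary spike streams after smoothing
    each with a Gaussian kernel of sigma = 5 ms, as described in the text."""
    sigma_bins = sigma_ms / dt_ms
    x1 = gaussian_filter1d(s1.astype(float), sigma_bins)
    x2 = gaussian_filter1d(s2.astype(float), sigma_bins)
    return np.corrcoef(x1, x2)[0, 1]
```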
E. On-chip inference
The ability of a PCM based SNN to retain the trained state is evaluated by reading the conductance at logarithmic intervals of
time and using it to calculate the network response. Both the spike-time accuracy (Fig. 5a) and the average spike rate (depicted as pixel intensities in Fig. 5c) drop due to conductance drift over time (Fig. 5b). The conductance decrease reduces the net current flowing into the neurons, which results in errors in spike times and a drop in the neuron spike rate. However, we
show that this can be compensated via an array level scaling method described below.
The conductance drift in PCM is modeled using the empirical relation44,45:
$$G(t) = G(t_0)\left(\frac{t - t_p}{t_0 - t_p}\right)^{-\nu} \qquad (2)$$
where $G(t)$ is the conductance of the device at time $t > t_0$, $t_p$ denotes the time when it received a programming pulse, and $t_0$ represents the time instant at which its conductance was last read after programming. Thus, each programming pulse effectively
re-initializes the conductance drift27. As a result, the devices in the array will drift by different amounts during training, based on
the instant they received the last weight update. However, once sufficient time has elapsed after training (i.e., when $t$ becomes
FIG. 5. On-chip inference and drift compensation. a Inference using the trained PCM array. Due to conductance drift, the accuracy drops over time (black line). The effect of drift can be compensated by a time-aware scaling method (red line). The percentage accuracy drop over $4 \times 10^5$ s was reduced from 70% to 13.6% at 25 ms error tolerance. b The drifted conductance distribution after $10^5$ s is compared with the trained conductance distribution. The effect of scaling on the drifted conductances is also shown. c The images generated by the SNN at the end of training for the audio input (top), the images generated after $10^5$ s (middle), and the images generated with drift compensation (bottom). The brightness of each pixel represents the spike rate over the duration of each character.
much larger than all the tp values of the devices in the array), the conductance drift can be compensated by an array level scaling.
In our study, all the measured conductances were scaled by $t_e^{0.035}$, where $t_e$ is the time elapsed since training and 0.035 is the effective drift coefficient for the conductance range of the devices in the array. Fig. 5 shows the improvement in spike-time
accuracy and spike rate obtained using this scaling method. The drop in accuracy after the compensation can be attributed
to the conductance state dependency and variability of the drift coefficient. The inference performance of SNN using PCM
synapses could be further improved by reducing the inherent conductance drift from the devices. The recently demonstrated
projected-PCM cell architecture with an order of magnitude lower drift coefficient is a promising step in this direction46,47.
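The drift model of Eq. (2) and the global compensation can be sketched as follows (Python; the example values are hypothetical):

```python
import numpy as np

def drifted_conductance(g0, t, t0, tp, nu=0.035):
    """Eq. (2): conductance at time t, given the value g0 read at time t0 after
    the last programming event at tp. nu = 0.035 is the effective drift
    coefficient quoted in the text for this conductance range."""
    return g0 * ((t - tp) / (t0 - tp)) ** (-nu)

def compensate(g_read, t_elapsed_s, nu_eff=0.035):
    """Array-level compensation: scale every read conductance by te^nu_eff.

    Valid once the time elapsed since training dwarfs the per-device
    programming times, so a single global factor suffices."""
    return np.asarray(g_read) * t_elapsed_s ** nu_eff

# Example with hypothetical values: conductances read 1e5 s after training.
G_read = np.array([2.0, 5.5, 7.1])   # µS
G_comp = compensate(G_read, 1e5)
```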
DISCUSSION
One of the key questions that we have evaluated in this work is the ability of stochastic analog memory devices to represent the
synaptic strength in SNNs that have been trained to create spikes at precise time instances. As opposed to supervised learning in
second generation ANNs whose network output is determined typically by normalization functions such as softmax, learning to
generate multiple spikes at precise time instants is a harder problem. Compared to classification problems, whose accuracy depends only on the relative magnitude of the response of one out of several output neurons, the task here is to generate close to
1000 spikes at the desired time instances over a period of 1250 ms from 168 spiking neurons, which are only excited by 132 spike
streams. Furthermore, the high correlation observed among several input spike-streams (due to the inherent correlations present
in the frequency components of the input audio signal) also makes the learning problem challenging for networks with low-
precision weights. While the spike-rate-based pixel intensity plots clearly represent the desired images, we chose to evaluate
our training performance using an accuracy metric defined in terms of spike time tolerance, since SNNs designed to process
precise spike times rather than spike rates could be expected to have higher energy efficiency and smaller response time.
At the same time, the observed conductance characteristics of biological synapses are not all that different from those exhibited by our nanoscale phase-change memory devices. The PCM device conductance changes in a stochastic manner when programmed
using partial-SET pulses, and the conductance saturates in approximately 16−20 pulses corresponding to a bit precision on the
order of 4−5 bits. Synaptic transmission in biology is also observed to be stochastic and quantized, and previous studies have
estimated that biological synapses have a precision of about 4.6 bits48.
However, a major difference between our experiments and biology is the dynamics of the spiking neurons and the learning
algorithms used for weight updates. We have implemented the highly simplified leaky-integrate-and-fire model with an artificial
refractory period to model the neuronal dynamics. Numerous studies have pointed out that neuronal integration and spiking in
biology is a highly non-linear and error-tolerant process, with the most striking behavior revealed by the experiments of Mainen
and Sejnowski showing extremely reliable spiking behavior of neocortical neurons when excited by noisy input currents49. Such
non-linear behaviors may also play a key role in allowing biological networks to create spikes with more reliability and precision.
While several algorithms have been developed from mathematical formulations of cost-functions involving spike rates and
spike times, the mechanisms employed by nature to achieve the same task are still not well-understood. Most of the neuroscience
literature focuses on local learning rules such as Hebbian plasticity, STDP, triplet-STDP, etc. It is not clear how these different
local unsupervised learning rules come together to enable biological networks to encode and process information using precise spike times. Nevertheless, the artificial algorithms being developed are achieving increasing success in showing
software-equivalent performance in several common benchmark tasks in machine learning.
In summary, we analyzed the potential of the PCM devices to realize synapses in SNNs that can learn to generate spikes
at precise time instances via large scale (approximately 180,000 PCMs) supervised training and inference experiments and
simulations. We proposed several strategies to improve the performance of these PCM based learning networks to compensate
for the device-level non-idealities. For example, improving the synapse update granularity via multi-PCM configurations can improve the training accuracy. Also, the performance drop during inference due to conductance drift could be compensated via array-level scaling based on a global factor that is a function of the time elapsed since training alone. We successfully demonstrated that, in spite of their state-dependent conductance update and drift behavior, PCM synapses can be trained to generate spikes with a few milliseconds of precision in SNNs. In conclusion, PCM based computational memory presents a promising
candidate to realize energy efficient bio-mimetic parallel architectures for processing time encoded SNNs in real time.
METHODS
Audio to spike conversion
The silicon cochlea chip has 64 band-pass filters with frequency bands logarithmically distributed from 8 Hz to 20 kHz and
generates spikes representing left and right channels. Further, due to the synchronous delta modulation scheme used to create
the spikes, there were on-spikes and off-spikes. The silicon cochlea generated spikes with a time resolution of 1 µs. The spikes
were further sub-sampled to a time resolution of 0.1 ms. The final input spike streams used for the training experiments have an
average spike rate of 10 Hz. Combining all the filter responses with non-zero spikes for left and right channels and the on and
off spikes, there are 132 input spike streams.
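One plausible reading of this sub-sampling step is sketched below (Python; collapsing multiple spikes per bin into a single binary event is our assumption):

```python
import numpy as np

def subsample_spikes(spike_times_us, dt_out_ms=0.1, duration_ms=1250.0):
    """Re-bin cochlea spike times (1 µs resolution) into 0.1 ms bins.

    Multiple spikes falling into the same bin collapse to a single binary
    event in this sketch.
    """
    n_bins = int(duration_ms / dt_out_ms)
    binned = np.zeros(n_bins, dtype=np.uint8)
    idx = (np.asarray(spike_times_us) * 1e-3 / dt_out_ms).astype(int)
    binned[idx[idx < n_bins]] = 1
    return binned
```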
Neuron model
The SNN output neurons were modeled using leaky-integrate and fire (LIF) model. Its membrane potential V (t) is given by
the differential equation
$$C_m \frac{dV(t)}{dt} = -g_L\,(V(t) - E_L) + I(t)$$
where $C_m$ is the membrane capacitance, $g_L$ is the leak conductance, $E_L$ is the leak reversal potential, and $I(t)$ is the net synaptic current flowing into the neuron. When $V(t)$ exceeds a threshold voltage $V_T$, $V(t)$ is reset to $E_L$ and a spike is assumed to be generated. Once a spike is generated, the neuron is prevented from creating another spike within a short time period called the refractory period $t_{ref}$. For the training experiment, we used $C_m = 300\,$pF, $g_L = 30\,$nS, $E_L = -70\,$mV, $V_T = 20\,$mV, and $t_{ref} = 2\,$ms. For the NormAD training algorithm, the approximate impulse response of the LIF neuron is given as $\frac{1}{C_m} e^{-t/\tau_L}\,u(t)$, where $\tau_L = 0.1 \times C_m / g_L$ and $u(t)$ is the Heaviside step function. During training, the neuron responses were simulated with a 0.1 ms time resolution.
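To make the model concrete, the following Python sketch simulates the neuron with the parameters above using a forward-Euler discretization (the integration scheme is our assumption; the text only specifies the 0.1 ms resolution):

```python
import numpy as np

# Parameters stated in the text (SI units)
CM, GL = 300e-12, 30e-9        # membrane capacitance (F), leak conductance (S)
EL, VT = -70e-3, 20e-3         # leak reversal potential and threshold (V)
T_REF, DT = 2e-3, 0.1e-3       # refractory period and time step (s)

def lif_spike_times(i_syn):
    """Spike times (s) of the LIF neuron for a synaptic current trace i_syn (A),
    sampled at DT. Forward-Euler integration of the membrane equation."""
    v, t_last, spikes = EL, -np.inf, []
    for k, i_t in enumerate(i_syn):
        t = k * DT
        if t - t_last < T_REF:             # hold at rest during refractoriness
            v = EL
            continue
        v += DT / CM * (-GL * (v - EL) + i_t)
        if v >= VT:                        # threshold crossing: emit a spike
            spikes.append(t)
            v, t_last = EL, t              # reset to the resting potential
    return np.array(spikes)
```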
PCM platform
The experimental platform is built around a prototype chip of 3 million PCM cells. The PCM devices are based on doped-
Ge2Sb2Te5 integrated in 90 nm CMOS technology50. The fabricated PCM cell area is 50 F2 (F is the feature size for the 90 nm
technology node), and each memory device is connected to two parallel 240 nm wide n-type FETs. The chip has circuitry for
cell addressing, ADC for readout, and circuits for voltage or current mode programming.
The PCM chip is interfaced with a Matlab workstation via FPGA boards and a high-performance analog front-end (AFE) board. The AFE board implements digital-to-analog converters, electronics for power supplies, and voltage and current references. An FPGA board implements the digital logic for interfacing the PCM chip with the AFE board and performs data acquisition. A second FPGA board has an embedded processor and an Ethernet unit for overall system control and data management.
Experiment
The SNN training problem was initially simulated using double-precision (FP64) synapses in the Matlab simulation environment. The SNN weights were approximately in the range [-6000, 6000]. To map the weights to PCM conductance values in a multi-PCM configuration, the conductance range contributed by each device is assumed to be [0.1 µS, 8 µS]. The conductance values are read from the hardware using a constant read voltage of 0.3 V, scaled to the network weights, and used for the matrix-vector multiplications in the software simulator. When a different number of PCM devices is used per synapse, a scaling factor is determined such that the total conductance maps to the same weight range. The weight updates determined by the training algorithm at the end of an epoch are programmed to the PCM devices using partial-SET pulses with a duration of 50 ns and amplitudes in the range [40 µA, 130 µA]. The device conductance values are read after each epoch and used to update the SNN synapse values. Since the conductance values are read and programmed serially, each training epoch took an average of 6.3 s to emulate.
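For illustration, the scaling factor $\beta$ can be chosen so that the maximum achievable conductance difference spans the FP64 weight range; this is one plausible reading of the mapping described above:

```python
# With N differential pairs per synapse and a per-device conductance range of
# [0.1, 8] uS, the largest realizable conductance difference is N * (8 - 0.1) uS.
# Choosing beta so that this difference maps to the maximum weight magnitude
# (6000) is an assumption, not a mapping stated explicitly in the text.
N = 4                                          # devices per arm in the experiment
g_max_uS, g_min_uS = 8.0, 0.1
beta = 6000.0 / (N * (g_max_uS - g_min_uS))    # weight units per microsiemens
```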
For inference, the PCM conductance values were read at logarithmic time intervals after 100 epochs of training, and the effect of the compensation scheme was evaluated in the software simulator.
1Y. Lecun, Y. Bengio, and G. Hinton, “Deep learning,” Nature 521, 436–444 (2015).
2P. A. Merolla, J. V. Arthur, R. Alvarez-Icaza, A. S. Cassidy, J. Sawada, F. Akopyan, B. L. Jackson, N. Imam, C. Guo, Y. Nakamura, et al., “A million
spiking-neuron integrated circuit with a scalable communication network and interface,” Science 345, 668–673 (2014).
3M. Davies, N. Srinivasa, T.-H. Lin, G. Chinya, Y. Cao, S. H. Choday, G. Dimou, P. Joshi, N. Imam, S. Jain, Y. Liao, C.-K. Lin, A. Lines, R. Liu, D. Mathaikutty,
S. McCoy, A. Paul, J. Tse, G. Venkataramanan, Y.-H. Weng, A. Wild, Y. Yang, and H. Wang, “Loihi: A Neuromorphic Manycore Processor with On-Chip
Learning,” IEEE Micro 38, 82–99 (2018).
4G. W. Burr et al., “Neuromorphic computing using non-volatile memory,” Advances in Physics: X 2, 89–124 (2017).
5M. Le Gallo, A. Sebastian, R. Mathis, M. Manica, H. Giefers, T. Tuma, C. Bekas, A. Curioni, and E. Eleftheriou, “Mixed-precision in-memory computing,”
Nature Electronics 1, 246–253 (2018).
6A. Sebastian, M. Le Gallo, G. W. Burr, S. Kim, M. BrightSky, and E. Eleftheriou, “Tutorial: Brain-inspired computing using phase-change memory devices,”
Journal of Applied Physics 124, 111101 (2018).
7Q. Xia and J. J. Yang, “Memristive crossbar arrays for brain-inspired computing,” Nature materials 18, 309 (2019).
8P. Lichtsteiner, C. Posch, and T. Delbruck, “A 128×128 120 dB 15 µs Latency Asynchronous Temporal Contrast Vision Sensor,” IEEE Journal of Solid-State
Circuits 43, 566–576 (2008).
9S. C. Liu, A. Van Schaik, B. A. Minch, and T. Delbruck, “Asynchronous binaural spatial audition sensor with 2×64×4 Channel output,” IEEE Transactions
on Biomedical Circuits and Systems 8, 453–464 (2014).
10G.-q. Bi and M.-m. Poo, “Synaptic modifications in cultured hippocampal neurons: dependence on spike timing, synaptic strength, and postsynaptic cell type,”
The Journal of Neuroscience 18, 10464–10472 (1998).
11G. W. Burr et al., “Experimental demonstration and tolerancing of a large-scale neural network (165 000 synapses) using phase-change memory as the synaptic
weight element,” IEEE Transactions on Electron Devices 62, 3498–3507 (2015).
12I. Boybat, M. Le Gallo, S. R. Nandakumar, T. Moraitis, T. Parnell, T. Tuma, B. Rajendran, Y. Leblebici, A. Sebastian, and E. Eleftheriou, “Neuromorphic
computing with multi-memristive synapses," Nature Communications 9, 2514 (2018), arXiv:1711.06507.
13S. Ambrogio, P. Narayanan, H. Tsai, R. M. Shelby, I. Boybat, C. di Nolfo, S. Sidler, M. Giordano, M. Bodini, N. C. P. Farinha, B. Killeen, C. Cheng, Y. Jaoudi,
and G. W. Burr, “Equivalent-accuracy accelerated neural-network training using analogue memory,” Nature 558, 60–67 (2018).
14D. Kuzum, R. G. Jeyasingh, B. Lee, and H.-S. P. Wong, “Nanoelectronic programmable synapses based on phase change materials for brain-inspired
computing,” Nano letters 12, 2179–2186 (2011).
15B. L. Jackson, B. Rajendran, G. S. Corrado, M. Breitwisch, G. W. Burr, R. Cheek, K. Gopalakrishnan, S. Raoux, C. T. Rettner, A. Padilla, et al., “Nanoscale
electronic synapses using phase change devices,” ACM Journal on Emerging Technologies in Computing Systems (JETC) 9, 12 (2013).
16T. Tuma, M. Le Gallo, A. Sebastian, and E. Eleftheriou, “Detecting correlations using phase-change neurons and synapses,” IEEE Electron Device Letters
37, 1238–1241 (2016).
17S. Sidler, A. Pantazi, S. Woźniak, Y. Leblebici, and E. Eleftheriou, "Unsupervised learning using phase-change synapses and complementary patterns," in
International Conference on Artificial Neural Networks (Springer, 2017) pp. 281–288.
18P. Diehl and M. Cook, “Unsupervised learning of digit recognition using spike-timing-dependent plasticity,” Frontiers in Computational Neuroscience 9, 99
(2015).
19Z. F. Mainen and T. J. Sejnowski, “Reliability of spike timing in neocortical neurons.” Science (New York, N.Y.) 268, 1503–6 (1995).
20D. S. Reich, J. D. Victor, B. W. Knight, T. Ozaki, and E. Kaplan, “Response variability and timing precision of neuronal spike trains in vivo,” Journal of
Neurophysiology 77, 2836–2841 (1997), pMID: 9163398, https://doi.org/10.1152/jn.1997.77.5.2836.
21V. J. Uzzell and E. J. Chichilnisky, “Precision of spike trains in primate retinal ganglion cells,” Journal of Neurophysiology 92, 780–789 (2004), pMID:
15277596, https://doi.org/10.1152/jn.01171.2003.
22W. Maass, “Noisy Spiking Neurons with Temporal Coding have more Computational Power than Sigmoidal Neurons,” Advances in Neural Information
Processing Systems 9 9, 211–217 (1997).
23G. F. Close et al., “Device, circuit and system-level analysis of noise in multi-bit phase-change memory,” in IEEE International Electron Devices Meeting
(IEDM) (IEEE, 2010) pp. 29.5.1–29.5.4.
24S. Nandakumar, M. Le Gallo, I. Boybat, B. Rajendran, A. Sebastian, and E. Eleftheriou, “A phase-change memory model for neuromorphic computing,”
Journal of Applied Physics 124, 152135 (2018).
25M. Suri, O. Bichler, D. Querlioz, O. Cueto, L. Perniola, V. Sousa, D. Vuillaume, C. Gamrat, and B. DeSalvo, “Phase change memory as synapse for ultra-
dense neuromorphic systems: Application to complex visual pattern extraction,” in Electron Devices Meeting (IEDM), 2011 IEEE International (2011) pp.
4.4.1–4.4.4.
26M. Suri, D. Garbin, O. Bichler, D. Querlioz, D. Vuillaume, C. Gamrat, and B. Desalvo, “Impact of PCM resistance-drift in neuromorphic systems and
drift-mitigation strategy,” Proceedings of the 2013 IEEE/ACM International Symposium on Nanoscale Architectures, NANOARCH 2013 , 140–145 (2013).
27I. Boybat, S. R. Nandakumar, M. L. Gallo, B. Rajendran, Y. Leblebici, A. Sebastian, and E. Eleftheriou, “Impact of conductance drift on multi-pcm synaptic
architectures,” in 2018 Non-Volatile Memory Technology Symposium (NVMTS) (2018) pp. 1–4.
28M. Le Gallo, A. Sebastian, G. Cherubini, H. Giefers, and E. Eleftheriou, “Compressed sensing recovery using computational memory,” in IEEE International
Electron Devices Meeting (IEDM) (IEEE, 2017) pp. 28–3.
29F. Xiong, A. D. Liao, D. Estrada, and E. Pop, “Low-power switching of phase-change materials with carbon nanotube electrodes,” Science 332, 568–570
(2011).
30Y. Choi, I. Song, M. Park, H. Chung, S. Chang, B. Cho, J. Kim, Y. Oh, D. Kwon, J. Sunwoo, J. Shin, Y. Rho, C. Lee, M. G. Kang, J. Lee, Y. Kwon, S. Kim,
J. Kim, Y. Lee, Q. Wang, S. Cha, S. Ahn, H. Horii, J. Lee, K. Kim, H. Joo, K. Lee, Y. Lee, J. Yoo, and G. Jeong, “A 20nm 1.8v 8gb PRAM with 40MB/s
program bandwidth,” in Proc. IEEE International Solid-State Circuits Conference (ISSCC) (2012) pp. 46–48.
31Y. Cao, Y. Chen, and D. Khosla, “Spiking Deep Convolutional Neural Networks for Energy-Efficient Object Recognition,” International Journal of Computer
Vision 113, 54–66 (2015), arXiv:1502.05777.
32P. U. Diehl, G. Zarrella, A. Cassidy, B. U. Pedroni, and E. Neftci, “Conversion of Artificial Recurrent Neural Networks to Spiking Neural Networks for
Low-power Neuromorphic Hardware,” (2016), arXiv:1601.04187.
33B. Rueckauer, I.-A. Lungu, Y. Hu, M. Pfeiffer, and S.-C. Liu, “Conversion of Continuous-Valued Deep Networks to Efficient Event-Driven Networks for
Image Classification,” Frontiers in Neuroscience 11, 1–12 (2017).
34M. Pfeiffer and T. Pfeil, “Deep learning with spiking neurons: Opportunities and challenges,” Frontiers in Neuroscience 12, 774 (2018).
35J. H. Lee, T. Delbruck, and M. Pfeiffer, “Training deep spiking neural networks using backpropagation,” Frontiers in Neuroscience 10, 508 (2016).
36S. Woźniak, A. Pantazi, and E. Eleftheriou, "Deep Networks Incorporating Spiking Neural Dynamics," 1–9 (2018), arXiv:1812.07040.
37S. M. Bohte, H. La Poutré, and J. N. Kok, "Error-Backpropagation in Temporally Encoded Networks of Spiking Neurons," Neurocomputing 48, 17–37 (2002).
38P. Crotty and W. B. Levy, “Energy-efficient interspike interval codes,” Neurocomputing 65-66, 371–378 (2005).
39R. Gütig and H. Sompolinsky, "The tempotron: a neuron that learns spike timing-based decisions." Nature Neuroscience 9, 420–8 (2006).
40B. Wang, W. Ke, J. Guang, G. Chen, L. Yin, S. Deng, Q. He, Y. Liu, T. He, R. Zheng, Y. Jiang, X. Zhang, T. Li, G. Luan, H. D. Lu, M. Zhang, X. Zhang, and
Y. Shu, “Firing Frequency Maxima of Fast-Spiking Neurons in Human, Monkey, and Mouse Neocortex,” Frontiers in Cellular Neuroscience 10, 1–13 (2016).
41F. Ponulak and A. Kasiski, “Supervised learning in spiking neural networks with resume: Sequence learning, classification, and spike shifting,” Neural
Computation 22, 467–510 (2010), pMID: 19842989, https://doi.org/10.1162/neco.2009.11-08-901.
42N. Anwani and B. Rajendran, “Normad-normalized approximate descent based supervised learning rule for spiking neurons,” in International Joint Conference
on Neural Networks (IJCNN) (IEEE, 2015) pp. 1–8.
43S. R. Kulkarni and B. Rajendran, "Spiking neural networks for handwritten digit recognition: Supervised learning and network optimization," Neural Networks
103, 118–127 (2018).
44I. V. Karpov, M. Mitra, D. Kau, G. Spadini, Y. A. Kryukov, and V. G. Karpov, “Fundamental drift of parameters in chalcogenide phase change memory,”
Journal of Applied Physics 102, 124503 (2007).
45M. Le Gallo, D. Krebs, F. Zipoli, M. Salinga, and A. Sebastian, “Collective structural relaxation in phase-change memory devices,” Advanced Electronic
Materials 4, 1700627 (2018).
46W. W. Koelmans, A. Sebastian, V. P. Jonnalagadda, D. Krebs, L. Dellmann, and E. Eleftheriou, “Projected phase-change memory devices,” Nature commu-
nications 6, 8181 (2015).
47I. Giannopoulos, A. Sebastian, M. Le Gallo, V. P. Jonnalagadda, M. Sousa, M. N. Boon, and E. Eleftheriou, “8-bit Precision In-Memory Multiplication with
Projected Phase-Change Memory."
48T. M. Bartol, C. Bromer, J. P. Kinney, M. A. Chirillo, J. N. Bourne, K. M. Harris, and T. J. Sejnowski, “Hippocampal Spine Head Sizes are Highly Precise,”
bioRxiv , 016329 (2015).
49Z. Mainen and T. Sejnowski, “Reliability of spike timing in neocortical neurons,” Science 268, 1503–1506 (1995).
50M. Breitwisch, T. Nirschl, C. Chen, Y. Zhu, M. Lee, M. Lamorey, G. Burr, E. Joseph, A. Schrott, J. Philipp, et al., “Novel lithography-independent pore phase
change memory,” in IEEE Symposium on VLSI Technology (IEEE, 2007) pp. 100–101.
ACKNOWLEDGMENTS
We would like to thank Dr. Shih-Chii Liu from the Institute of Neuroinformatics, University of Zurich, for technical assistance
with converting the audio input to spike streams using a silicon cochlea chip. A.S. acknowledges support from the European
Research Council through the European Union's Horizon 2020 Research and Innovation Program under grant number 682675. B.R. was supported partially by the National Science Foundation through grant 1710009 and by the Semiconductor Research Corporation through grant 2717.001.
AUTHOR CONTRIBUTIONS
B.R. and A.S. conceived the main ideas in the project. S.R.N. and A.S. designed the experiment, and S.R.N. performed the simulations. I.B. and S.R.N. performed the experiment. M.L.G. and I.B. provided critical insights. S.R.N. and B.R. co-wrote the manuscript with inputs from other authors. B.R., A.S., and E.E. directed the work.
