Reliability-Performance Trade-offs in Neuromorphic Computing by Titirsha, Twisha & Das, Anup
Twisha Titirsha and Anup Das
Electrical and Computer Engineering Drexel University, Philadelphia, PA, USA
Email: {tt624,anup.das}@drexel.edu
Abstract—Neuromorphic architectures built with Non-Volatile
Memory (NVM) can significantly improve the energy efficiency of
machine learning tasks designed with Spiking Neural Networks
(SNNs). A major source of voltage drop in a crossbar of these
architectures is the parasitic components on the crossbar’s
bitlines and wordlines, which are deliberately made longer to
achieve lower cost-per-bit. We observe that the parasitic voltage
drops create a significant asymmetry in programming speed and
reliability of NVM cells in a crossbar. Specifically, NVM cells that
are on shorter current paths are faster to program but have lower
endurance than those on longer current paths, and vice versa.
This asymmetry in neuromorphic architectures creates reliability-
performance trade-offs, which can be exploited efficiently using
SNN mapping techniques. In this work, we demonstrate such
trade-offs using a previously-proposed SNN mapping technique
with 10 workloads from contemporary machine learning tasks
for state-of-the-art neuromorphic hardware.
Index Terms—Neuromorphic Computing, Non-Volatile Mem-
ory (NVM), Phase-Change Memory (PCM), Endurance
I. INTRODUCTION
Spiking Neural Networks (SNNs) are emerging machine
learning models with spike-based computation and bio-inspired
learning algorithms. Event-driven neuromorphic hardware such
as TrueNorth [1], Loihi [2], and DYNAP-SE [3] implements
biological neurons and synapses to execute SNN-based machine
learning tasks in an energy-efficient manner. This makes neuromorphic
hardware suitable for energy-constrained platforms
such as embedded systems [4] and edge devices of the
Internet-of-Things (IoT) [5].
Neuromorphic hardware is implemented as a tile-based
architecture in which the tiles are interconnected using a shared
interconnect such as a Network-on-Chip (NoC) [6] or a Segmented
Bus [7]. Each tile consists of a crossbar, which can implement
a fixed number of neurons and synapses. A crossbar in
neuromorphic hardware is an n×n organization, with n bitlines
(columns) and n wordlines (rows). A silicon neuron is mapped
along each wordline of a crossbar, while a synaptic cell is
placed at the intersection of each bitline and wordline using
an access device such as a transistor or a diode [8].
Recently, Non-Volatile Memories (NVMs) such as Phase-Change
Memory (PCM), Oxide-based Resistive RAM (OxRRAM), and
Spin-Transfer Torque or Spin-Orbit-Torque Magnetic RAM
(STT- and SoT-MRAM) have been used as synaptic cells
to increase integration density and reduce energy consumption
of crossbars in neuromorphic hardware [9]–[11].
A major source of voltage drop in a crossbar is the parasitic
resistance and capacitance on its bitlines and wordlines, which
are deliberately made longer to achieve lower cost-per-bit. In
fact, for a PCM-based crossbar, each neuron is approximately
18× the size of a PCM cell [12]. To amortize this large size,
system designers implement larger crossbars, e.g., 128 × 128
for DYNAP-SE and 256 × 256 for TrueNorth. For such large
crossbar sizes, the current on the longest path in a crossbar
becomes significantly lower than the current on its shortest
path for the same spike voltage generated from a neuron and
the same conductance programmed on the enabled synaptic
cell in these paths.¹
Current asymmetry leads to a difference in performance
and reliability of NVM cells. Higher current through an NVM
cell can lead to faster programming of the cell. This means
that NVM cells on shorter current paths are faster to access
and program. However, NVMs also have limited endurance,
ranging from $10^{5}$ (for Flash) to $10^{10}$ (for OxRRAM), with PCM
somewhere in between ($\approx 10^{7}$). An NVM cell’s endurance is
strongly dependent on the programming current. We build
the case for PCM, where the conductance change is induced
by Joule heating of the chalcogenide material in the cell.
The endurance of the material depends on the self-heating
temperature, which is dependent on the programming current.
Consequently, the NVM cells on shorter current paths have higher
self-heating temperature and, therefore, lower endurance.
In recent years, many approaches have been proposed to map SNNs
to neuromorphic hardware. These include the performance-oriented
SNN mapping techniques of [13], [14], the dataflow-based
mapping techniques of [15], [16], the energy-aware
mapping techniques of [17]–[19], the circuit aging-aware
mapping techniques of [20]–[23], and the run-time SNN mapping
technique of [24]. Unfortunately, none of these approaches
exploits the reliability and performance trade-offs of NVM
cells in neuromorphic computing. In this paper, we take
one such mapping approach, SpiNeMap [18], and show the
significant variations in endurance and speed during its mapping
explorations.
The remainder of this paper is organized as follows. We
provide a background of PCM and neuromorphic architectures
in Section II. Next, we formulate the endurance-access speed
trade-offs for a single PCM cell and integrate such trade-
offs at the crossbar-level in Section III. Next, we discuss the
mapping exploration of SpiNeMap in Section IV. We present
our evaluation in Section V and conclusion in Section VI.
¹The length of a current path in a crossbar is measured in terms of the
number of parasitic components that are encountered on the path.
arXiv:2009.12672v1 [cs.NE] 26 Sep 2020
II. BACKGROUND
In this section, we discuss the background on Phase-Change
Memory (PCM) to aid the understanding of the trade-offs
in neuromorphic computing. We also provide a discussion
on machine learning approaches using SNNs, and how such
approaches can be mapped to hardware consisting of PCM
cells organized into crossbars.
A PCM cell is built with a chalcogenide alloy, e.g., Ge2Sb2Te5
(GST) [25], and is connected to a bitline and a wordline using
an access device. The GST alloy can either be in an amorphous
(high resistance) state, or in one of the partially crystallized
(low resistance) states. PCM has recently been explored as a scalable
DRAM alternative for conventional computing [26]–[31]. This
work explores PCM for neuromorphic computing. For such
computing architectures, the weight of a synaptic connection is
programmed as conductance of a PCM cell by driving current
and inducing Joule heating in the cell.
In many machine learning approaches such as online learning
using Spike-Timing Dependent Plasticity (STDP) [32], one-
shot learning [33], life-long learning [34], and reinforcement
learning [35], it is necessary to update synaptic weights based
on the input excitation. To facilitate such synaptic updates, a
PCM cell’s state must be switched by driving current through
it using the spikes generated from neurons. However, frequent
switching of a PCM cell’s state may lead to endurance issues,
where the cell fails to be programmed correctly, leading to a
degradation of machine learning performance. Furthermore, a
key requirement in such online learning use-cases is their real-
time performance, i.e., the weight updates must be completed
within a small time interval.
To illustrate the workload-dependent performance and
endurance trade-offs associated with PCM cells in a neuromorphic
architecture, Figure 1a shows a simple SNN with
two input neurons and one output neuron. Figure 1b illustrates the
mapping of this SNN to a crossbar. As seen from this figure, the
synaptic weights w1 and w2 are programmed as conductances.
The spike voltages are multiplied with the conductances to
generate current, which gets integrated along the columns.
The current strength guides the update of conductance of
the PCM cell enabled along the current path. Clearly, the
weight update frequency depends on the spikes generated
from the hardware neurons mapped along the rows of a
crossbar. The latter depends on how neurons and synapses
of a machine learning model, e.g., the SNN of Figure 1a, are mapped
to the corresponding resources on a crossbar. To elaborate
on this, Figure 1c illustrates the utilization of a different set
of PCM cells to realize the SNN of Figure 1a. If the PCM
cells in a crossbar have different performance and endurance
characteristics (which we demonstrate in this work), then the
mapping of neurons and synapses of a machine learning model
plays a critical role in system-level performance and reliability.
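The current integration described in this section can be sketched in a few lines of Python. This is a minimal illustration that assumes an ideal crossbar with no parasitic voltage drops; the function name `column_currents` and all numeric values are hypothetical and are not taken from the paper.

```python
def column_currents(spike_voltages, conductances):
    """Ideal crossbar: the current on column j is sum_i V_i * G[i][j].

    spike_voltages: one voltage per row (pre-synaptic neuron), in volts.
    conductances:   conductances[i][j] of the cell at row i, column j,
                    in siemens.
    Parasitic voltage drops are ignored in this sketch.
    """
    n_cols = len(conductances[0])
    currents = [0.0] * n_cols
    for v, row in zip(spike_voltages, conductances):
        for j, g in enumerate(row):
            currents[j] += v * g
    return currents


# Two input neurons driving one output neuron, as in Figure 1a.
# w1 and w2 are illustrative conductances, not values from the paper.
w1, w2 = 2e-6, 5e-6
print(column_currents([1.0, 1.0], [[w1], [w2]]))
```

Placing w1 and w2 on different rows or columns, as in Figure 1c, leaves this ideal result unchanged; the difference between mappings only appears once the parasitic elements on each current path are taken into account.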
III. ENDURANCE-PERFORMANCE TRADE-OFFS IN
NEUROMORPHIC HARDWARE
We formulate the endurance-performance trade-offs for a
single PCM cell. To establish the relationship, we consider the
GST material of a PCM cell to be in a crystalline state. The
Fig. 1. (a) A simple spiking neural network, (b) Mapping of the network to
a crossbar, (c) A different mapping of the network to the hardware. In (b) and (c),
the crossbar axes are labeled with pre-synaptic and post-synaptic neurons.
amorphization process, i.e., the crystalline-to-amorphous state
transition involves driving a very high current through the cell
for a short duration. This high current raises the temperature of
the GST material through Joule heating in the heater attached to
the GST, which transitions the material to its amorphous state.
The crystalline fraction (Vc) is computed using the Johnson-
Mehl-Avrami (JMA) equation [36] as
$$V_c = \exp\left[-\alpha \times \frac{(T_{SH} - T_{amb})}{T_m} \times t\right], \qquad (1)$$
where t is the time, Tm is the melting temperature of the GST
material, Tamb is the ambient temperature, TSH is the self-heating
temperature, and α is a fitting constant. The exponential decay of Vc
in Equation 1 implies that the higher the self-heating temperature
(TSH), the faster the reduction of the crystalline volume, i.e.,
the faster the amorphization process.
The self-heating temperature is related to the square of
programming current (Iprog) as
$$T_{SH} = k \cdot I_{prog}^{2}, \qquad (2)$$
where k is a constant.
From Equations 1 and 2, we can conclude that the higher the
programming current, the higher the self-heating temperature,
and hence, the faster the programming of the cell.
However, with increase in self-heating temperature, the
endurance of a PCM cell reduces. Using the phenomenological
endurance model [37], endurance of a PCM cell can be
expressed as
$$\text{Endurance} \approx \exp\left(\frac{\gamma}{T_{SH}}\right), \qquad (3)$$
where γ is a fitting parameter.
From Equations 2 and 3, we conclude that the higher the
programming current, the higher the self-heating temperature
and, therefore, the lower the endurance.
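The opposing trends of Equations 1 to 3 can be checked numerically with a short Python sketch. The constants below (k, α, γ, Tamb, Tm) are illustrative placeholders chosen only to produce readable numbers; they are assumptions of this sketch, not fitted values from the paper or from [36], [37].

```python
import math

# All constants are illustrative placeholders, not fitted values.
K = 5.0e9       # K/A^2, relates squared current to temperature (Eq. 2)
ALPHA = 1.0e8   # 1/s, fitting constant of Eq. 1
GAMMA = 1.0e4   # K, fitting parameter of Eq. 3
T_AMB = 300.0   # K, ambient temperature
T_M = 900.0     # K, assumed melting temperature of GST


def self_heating_temperature(i_prog):
    """Equation 2: T_SH = k * I_prog^2."""
    return K * i_prog ** 2


def crystalline_fraction(i_prog, t):
    """Equation 1: V_c = exp(-alpha * (T_SH - T_amb) / T_m * t)."""
    t_sh = self_heating_temperature(i_prog)
    return math.exp(-ALPHA * (t_sh - T_AMB) / T_M * t)


def endurance(i_prog):
    """Equation 3: Endurance ~ exp(gamma / T_SH)."""
    return math.exp(GAMMA / self_heating_temperature(i_prog))


# Compare a lower and a higher programming current after 10 ns.
for i_prog in (300e-6, 400e-6):
    print(i_prog, crystalline_fraction(i_prog, 10e-9), endurance(i_prog))
```

With these placeholder constants, the higher current leaves a smaller crystalline fraction after the same pulse duration (faster programming) and yields a smaller endurance value, which is the trade-off stated above.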
Figure 2 shows the current through the PCM cells in a
128x128 PCM crossbar. This current variation is due to the
difference in the length of current paths from pre-synaptic
neurons to post-synaptic neurons in the crossbar, where the
length of a current path is measured in terms of the number
of parasitic elements on the path. These current values are
obtained for a 65nm technology node and at 300K temperature
corner. As can be clearly seen from the figure, current through
PCM cells on the top-right corner of the crossbar is lower than
through PCM cells located at the bottom-left corner. Therefore,
cells at the top-right corner are slower to program and have
higher endurance, while those at the bottom-left corner are
Fig. 2. Current map in a 128x128 crossbar (x-axis: post-synaptic neurons, 0 to 128;
y-axis: pre-synaptic neurons, 0 to 128; color scale: PCM current, 200 to 320 µA).
faster to program and have lower endurance. Table I summarizes
these findings.
TABLE I
SUMMARY OF PERFORMANCE-ENDURANCE TRADE-OFFS.
Location           | Performance | Endurance
Top-right corner   | Low         | High
Bottom-left corner | High        | Low
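The dependence of cell current on path length can be approximated with a first-order series-resistance model, sketched below in Python. This is a simplification under stated assumptions: every wire segment contributes the same parasitic resistance, capacitance and sneak paths are ignored, and the cell at row 0, column 0 is taken to be the one on the shortest path. The function name and all default values are hypothetical and are not meant to reproduce the current map of Figure 2.

```python
def cell_current(row, col, v_spike=1.0, r_cell=3.0e3, r_par=2.0):
    """Current through the cell at (row, col) in a series-resistance model.

    The path from the pre-synaptic driver to the post-synaptic neuron is
    assumed to cross (row + col) parasitic wire segments of r_par ohms
    each, in series with the cell resistance r_cell. All default values
    are hypothetical and chosen only for illustration.
    """
    path_length = row + col
    return v_spike / (r_cell + path_length * r_par)


n = 128
shortest = cell_current(0, 0)          # cell on the shortest path
longest = cell_current(n - 1, n - 1)   # cell on the longest path
print(shortest, longest)
```

Under this model, the same spike voltage and cell resistance produce a lower current on the longest path than on the shortest one, which is the asymmetry that Table I summarizes.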
IV. MAPPING EXPLORATIONS
In this section, we present the mapping exploration of
SpiNeMap [18] and show the performance-endurance trade-offs
that are obtained during its design-space exploration.
SpiNeMap uses an instance of the Particle Swarm Optimiza-
tion (PSO) [38], a meta-heuristic approach to map neurons and
synapses to the hardware. To this end, SpiNeMap first partitions
a spiking neural network based application into clusters, where
each cluster can fit onto the resources of a crossbar. The
clusters are then mapped to the crossbars using the PSO. In
general, PSO finds the optimum solution to a fitness function
F . Each solution is represented as a particle in the swarm.
Each particle has a velocity with which it moves in the search
space to find the optimum solution. During the movement, a
particle updates its position and velocity according to its own
experience (closeness to the optimum) and the experience of
its neighbors. We introduce the following notation for PSO.
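$$\begin{aligned}
D &= \text{dimensions of the search space} \\
n_p &= \text{number of particles in the swarm} \\
\Theta &= \{\theta_l \in \mathbb{R}^D\}_{l=0}^{n_p-1} = \text{positions of particles in the swarm} \\
\mathbf{V} &= \{v_l \in \mathbb{R}^D\}_{l=0}^{n_p-1} = \text{velocity of particles in the swarm}
\end{aligned} \qquad (4)$$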
Position and velocity updates are performed according to
the following equations.
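$$\begin{aligned}
\Theta(t+1) &= \Theta(t) + \mathbf{V}(t+1) \\
\mathbf{V}(t+1) &= \mathbf{V}(t) + \varphi_1 \cdot \big(P_{best} - \Theta(t)\big) + \varphi_2 \cdot \big(G_{best} - \Theta(t)\big)
\end{aligned} \qquad (5)$$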
where t is the iteration number, ϕ1 and ϕ2 are constants, and Pbest
(and Gbest) is the particle’s own (and its neighbors’) experience.
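The update rule of Equation 5 can be sketched in plain Python as follows. This is a minimal illustration on a continuous toy problem, not SpiNeMap's implementation: the function name `pso_minimize`, the default parameter values, the single global neighborhood, and the velocity clamp are assumptions of this sketch, and the encoding of a discrete cluster-to-crossbar mapping as particle positions is not shown.

```python
import random


def pso_minimize(fitness, dim, n_particles=9, iterations=100,
                 phi1=0.5, phi2=0.5, seed=0):
    """Minimize `fitness` over R^dim using the updates of Equation 5.

    phi1 and phi2 are fixed constants, as in Equation 5. The velocity
    clamp to [-1, 1] is an addition of this sketch that keeps the swarm
    from diverging.
    """
    rng = random.Random(seed)
    theta = [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
             for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    p_best = [p[:] for p in theta]
    p_best_val = [fitness(p) for p in theta]
    g_idx = min(range(n_particles), key=lambda l: p_best_val[l])
    g_best, g_best_val = p_best[g_idx][:], p_best_val[g_idx]

    for _ in range(iterations):
        for l in range(n_particles):
            for d in range(dim):
                # V(t+1) = V(t) + phi1*(Pbest - Theta) + phi2*(Gbest - Theta)
                v = (vel[l][d]
                     + phi1 * (p_best[l][d] - theta[l][d])
                     + phi2 * (g_best[d] - theta[l][d]))
                vel[l][d] = max(-1.0, min(1.0, v))
                # Theta(t+1) = Theta(t) + V(t+1)
                theta[l][d] += vel[l][d]
            val = fitness(theta[l])
            if val < p_best_val[l]:
                p_best[l], p_best_val[l] = theta[l][:], val
                if val < g_best_val:
                    g_best, g_best_val = theta[l][:], val
    return g_best, g_best_val


# Toy fitness function (sum of squares), used only for illustration.
best, value = pso_minimize(lambda x: sum(c * c for c in x), dim=2)
print(best, value)
```

The best value found never increases across iterations, because the personal and global bests are only replaced by strictly better positions.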
In Figure 3, we illustrate the iterative approach to find an
optimal solution using PSO. The PSO algorithm starts with an
initial neighborhood of swarms. In this example, we illustrate
3 swarms, each with 3 particles (see Figure 3a). Each particle
jumps to a new location with a velocity determined as a function
of local best (within swarms), and global best. This continues
until the sub-swarms converge (see Figure 3b). In the third
step, the swarm regroups and the position and velocity update
steps are repeated (see Figure 3c). We continue these iterations
until a predefined convergence criterion is reached.
SpiNeMap uses PSO to minimize the number of spikes
communicated on the global interconnect, which leads to a
reduction in the energy consumption.
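A fitness function of this kind can be sketched as a count of the spikes that cross crossbar boundaries. The Python below is a simplified stand-in under stated assumptions, not SpiNeMap's actual cost function: the function name, the dictionary-based data layout, and the example numbers are hypothetical.

```python
def global_spikes(neuron_to_crossbar, spike_counts):
    """Count spikes that must travel on the global interconnect.

    neuron_to_crossbar: dict mapping each neuron id to a crossbar id.
    spike_counts: dict mapping (pre_neuron, post_neuron) to the number
                  of spikes sent over that synapse in the workload.
    Spikes between neurons placed on the same crossbar stay local.
    """
    return sum(count
               for (pre, post), count in spike_counts.items()
               if neuron_to_crossbar[pre] != neuron_to_crossbar[post])


# Three neurons, two crossbars; numbers are illustrative only.
spikes = {(0, 1): 10, (0, 2): 5, (1, 2): 7}
print(global_spikes({0: 0, 1: 0, 2: 1}, spikes))
print(global_spikes({0: 0, 1: 1, 2: 1}, spikes))
```

An optimizer such as PSO would evaluate a function like this for each candidate mapping and prefer the mapping with the smaller count.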
Fig. 3. Illustrating the iterative steps of PSO: (a) initial neighborhoods (3
swarms, 9 particles), (b) sub-swarms converge, (c) regrouping and repeat.
Figure 4 illustrates the performance-endurance trade-offs ob-
tained during the mapping exploration of SpiNeMap. The figure
plots the performance and endurance obtained for different
design solutions generated during the design-space exploration
using the PSO. The figure also shows two solutions – one
with highest endurance and one with the highest performance.
We note that the highest performance mapping is generated
by DFSynthesizer [13], which maps neurons and synapses to
hardware, minimizing the execution time of applications.
Fig. 4. Performance-endurance trade-offs using SpiNeMap (x-axis: performance, 0 to 1;
y-axis: endurance, 0 to 20; the highest-endurance and highest-performance
solutions are marked).
V. EVALUATION
A. Evaluation Framework
We evaluated 10 machine learning applications that are
representative of the three most commonly used neural network
classes: convolutional neural network (CNN), multi-layer
perceptron (MLP), and recurrent neural network (RNN). These
applications are 1) LeNet [39] based handwritten digit recognition
with 28×28 images of handwritten digits from the MNIST
dataset [40]; 2) AlexNet [41] for ImageNet classification [42];
3) VGG16 [43], also for ImageNet classification [42]; 4) ECG-based
heart-beat classification (HeartClass) [44], [45] using
electrocardiogram (ECG) data from the Physionet database [46];
5) multi-layer perceptron (MLP)-based handwritten digit recognition
(MLP-MNIST) [47] using the MNIST database; 6)
edge detection (EdgeDet) [48] on 64 × 64 images using
difference-of-Gaussian; 7) image smoothing (ImgSmooth) [48]
on 64 × 64 images; 8) heart-rate estimation (HeartEstm) [49]
using ECG data; 9) RNN-based predictive visual pursuit
(VisualPursuit) [50]; and 10) recurrent digit recognition
(R-DigitRecog) [47]. Table II summarizes the topology, the number
of neurons and synapses of these applications, and their baseline
accuracy. To demonstrate the trade-offs, we enable STDP-based
weight updates [32] in each of these applications.² However, our
approach is not limited to STDP.
TABLE II
EVALUATED APPLICATIONS.
Class | Application        | Synapses   | Neurons | Topology                             | Accuracy
CNN   | LeNet [39]         | 282,936    | 20,602  | CNN                                  | 85.1%
CNN   | AlexNet [41]       | 38,730,222 | 230,443 | CNN                                  | 90.7%
CNN   | VGG16 [43]         | 99,080,704 | 554,059 | CNN                                  | 69.8%
CNN   | HeartClass [45]    | 1,049,249  | 153,730 | CNN                                  | 63.7%
MLP   | DigitRecogMLP      | 79,400     | 884     | FeedForward (784, 100, 10)           | 91.6%
MLP   | EdgeDet [48]       | 114,057    | 6,120   | FeedForward (4096, 1024, 1024, 1024) | 100%
MLP   | ImgSmooth [48]     | 9,025      | 4,096   | FeedForward (4096, 1024)             | 100%
RNN   | HeartEstm [49]     | 66,406     | 166     | Recurrent Reservoir                  | 100%
RNN   | VisualPursuit [50] | 163,880    | 205     | Recurrent Reservoir                  | 47.3%
RNN   | R-DigitRecog [47]  | 11,442     | 567     | Recurrent Reservoir                  | 83.6%
We model the DYNAP-SE neuromorphic hardware [3] with
the following configurations.
• A tiled array of 4 tiles, each with a 128x128 crossbar.
There are 16,384 memristors per crossbar (65,536 in total).
• Spikes are digitized and communicated between cores
through a mesh routing network using the Address Event
Representation (AER) protocol.
• Each synaptic element is a PCM-based memristor.
Table III reports the hardware parameters of DYNAP-SE.
TABLE III
MAJOR SIMULATION PARAMETERS EXTRACTED FROM [3].
Neuron technology  | 65nm CMOS
Synapse technology | PCM
Supply voltage     | 1.0V
Energy per spike   | 50pJ at 30Hz spike frequency
Energy per routing | 147pJ
Switch bandwidth   | 1.8G Events/s
²Spike-Timing Dependent Plasticity (STDP) [51] is a learning mechanism in
SNNs, where the synaptic weight between a pre- and a post-synaptic neuron is
updated based on the timing of pre-synaptic inputs relative to the post-synaptic
spike.
We evaluate the following metrics.
• Performance: This is the time it takes to execute an
application on the hardware model.
• Effective lifetime: This is the minimum effective lifetime
of all PCM cells in the hardware. The effective
lifetime ($L_{i,j}$) is defined for the PCM cell connecting the
ith pre-synaptic neuron with the jth post-synaptic neuron in a
memristive crossbar as
$$L_{i,j} = E_{i,j} / a_{i,j}, \qquad (6)$$
where $a_{i,j}$ is the number of spikes propagating through
the PCM cell in a given SNN workload and $E_{i,j}$ is its
endurance.
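The effective lifetime metric can be computed as in the following Python sketch, which applies Equation 6 to every cell and takes the minimum. The function name and the example values are hypothetical, and skipping cells that receive no spikes is a choice of this sketch rather than something the paper specifies.

```python
def effective_lifetime(endurance, activations):
    """Minimum of E[i][j] / a[i][j] over all cells that receive spikes.

    endurance:   endurance[i][j] of the cell at row i, column j.
    activations: activations[i][j], the number of spikes propagating
                 through that cell in the workload.
    Cells with zero activations are skipped, since they do not wear
    out under this workload (an assumption of this sketch).
    """
    lifetimes = [e / a
                 for e_row, a_row in zip(endurance, activations)
                 for e, a in zip(e_row, a_row)
                 if a > 0]
    return min(lifetimes) if lifetimes else float("inf")


# A 2x2 crossbar with illustrative endurance and spike counts.
endurance = [[1e7, 2e7], [4e7, 8e7]]
activations = [[10, 0], [100, 20]]
print(effective_lifetime(endurance, activations))
```

Because the metric is a minimum, a single heavily used cell with low endurance determines the lifetime of the whole mapping, which is why the placement of frequently spiking synapses matters.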
B. Performance
Figure 5 compares the performance of DFSynthesizer, a
performance-oriented technique to map SNNs to neuromorphic
hardware, and SpiNeMap, which minimizes the number of
spikes on the shared interconnect. We observe that, compared to
DFSynthesizer, the performance using SpiNeMap is on average
10% lower for these applications.
Fig. 5. Performance normalized to DFSynthesizer (y-axis: normalized performance, 0 to 1;
bars: DFSynthesizer and SpiNeMap for LeNet, AlexNet, VGG16, HeartClass, MLP-MNIST,
EdgeDet, ImgSmooth, HeartEstm, VisualPursuit, R-DigitRecog, and AVERAGE).
C. Effective Lifetime
Figure 6 plots the normalized lifetime of DFSynthesizer
and SpiNeMap for the evaluated applications. Lifetime results
are normalized to the lifetime obtained using the mapping
that generates the highest effective lifetime (see Figure 4). We
observe that lifetime using the mapping of DFSynthesizer is on
average 30% lower, while that using SpiNeMap is 19% lower
than the highest lifetime.
Fig. 6. Lifetime normalized to mapping with highest lifetime (y-axis: normalized lifetime, 0 to 1;
bars: DFSynthesizer and SpiNeMap for LeNet, AlexNet, VGG16, HeartClass, MLP-MNIST,
EdgeDet, ImgSmooth, HeartEstm, VisualPursuit, R-DigitRecog, and AVERAGE).
VI. CONCLUSIONS
In this work, we show the trade-offs between performance
and lifetime of neuromorphic hardware with PCM-based
crossbars. Specifically, we show that in a PCM-based crossbar,
the PCM cells that are located on the bottom-left corner
are faster to access but have lower lifetime than PCM cells
on the top-right corner, which are slower but have higher
lifetime. Existing SNN-mapping techniques do not explore these
trade-offs in mapping neurons and synapses to hardware. The
design space exploration of these mapping techniques often
selects mappings that generate high performance or optimize for
energy consumption. Therefore, the lifetime obtained using
these techniques is significantly lower than the highest lifetime.
A possible future direction is, therefore, to explore the trade-offs
during the design-space exploration. This will enable
generating SNN mappings that are balanced in terms of lifetime,
performance, and energy consumption.
ACKNOWLEDGMENT
This work is supported by the National Science Founda-
tion Award CCF-1937419 (RTML: Small: Design of System
Software to Facilitate Real-Time Neuromorphic Computing).
REFERENCES
[1] M. V. Debole et al., “TrueNorth: Accelerating from zero to 64 million
neurons in 10 years,” Computer, 2019.
[2] M. Davies et al., “Loihi: A neuromorphic manycore processor with
on-chip learning,” IEEE Micro, 2018.
[3] S. Moradi et al., “A scalable multicore architecture with heterogeneous
memory structures for dynamic neuromorphic asynchronous processors
(DYNAPs),” TBCAS, 2017.
[4] E. A. Lee et al., Introduction to embedded systems: A cyber-physical
systems approach. MIT Press, 2016.
[5] W. Shi et al., “Edge computing: Vision and challenges,” IOTJ, 2016.
[6] L. Benini et al., “Networks on chip: A new paradigm for systems on
chip design,” in DATE, 2002.
[7] A. Balaji et al., “Exploration of segmented bus as scalable global
interconnect for neuromorphic computing,” in GLSVLSI, 2019.
[8] F. Catthoor et al., “Very large-scale neuromorphic systems for biological
signal processing,” in CMOS Circuits for Biological Sensing and
Processing, 2018.
[9] G. W. Burr et al., “Neuromorphic computing using non-volatile memory,”
Advances in Physics: X, 2017.
[10] A. Mallik et al., “Design-technology co-optimization for OxRRAM-based
synaptic processing unit,” in VLSIT, 2017.
[11] P. Wijesinghe et al., “An all-memristor deep spiking neural computing
system: A step toward realizing the low-power stochastic brain,” TETCI,
2018.
[12] G. Indiveri, “A low-power adaptive integrate-and-fire neuron circuit,” in
ISCAS, 2003.
[13] S. Song et al., “Compiling spiking neural networks to neuromorphic
hardware,” in LCTES, 2020.
[14] A. Balaji et al., “Enabling resource-aware mapping of spiking neural
networks via spatial decomposition,” Embedded Systems Letters, 2020.
[15] A. Das et al., “Dataflow-based mapping of spiking neural networks on
neuromorphic hardware,” in GLSVLSI, 2018.
[16] A. Balaji et al., “A framework for the analysis of throughput-constraints
of SNNs on neuromorphic hardware,” in ISVLSI, 2019.
[17] A. Balaji et al., “PyCARL: A PyNN interface for hardware-software
co-simulation of spiking neural network,” in IJCNN, 2020.
[18] A. Balaji et al., “Mapping spiking neural networks to neuromorphic
hardware,” TVLSI, 2020.
[19] A. Das et al., “Mapping of local and global synapses on spiking
neuromorphic hardware,” in DATE, 2018.
[20] A. Balaji et al., “A framework to explore workload-specific performance
and lifetime trade-offs in neuromorphic computing,” CAL, 2019.
[21] S. Song et al., “Improving dependability of neuromorphic computing
with non-volatile memory,” in EDCC, 2020.
[22] S. Song et al., “A case for lifetime reliability-aware neuromorphic
computing,” in MWSCAS, 2020.
[23] T. Titirsha et al., “Thermal-aware compilation of spiking neural networks
to neuromorphic hardware,” in LCPC, 2020.
[24] A. Balaji et al., “Run-time mapping of spiking neural networks to
neuromorphic hardware,” JSPS, 2020.
[25] S. Ovshinsky, “Reversible electrical switching phenomena in disordered
structures,” Physical Review Letters, 1968.
[26] B. C. Lee et al., “Architecting phase change memory as a scalable DRAM
alternative,” in ISCA, 2009.
[27] M. K. Qureshi et al., “Scalable high performance main memory system
using phase-change memory technology,” in ISCA, 2009.
[28] S. Song et al., “Enabling and exploiting partition-level parallelism (PALP)
in phase change memories,” TECS, 2019.
[29] S. Song et al., “Exploiting inter-and intra-memory asymmetries for data
mapping in hybrid tiered-memories,” in ISMM, 2020.
[30] S. Song et al., “Improving phase change memory performance with data
content aware access,” in ISMM, 2020.
[31] S. Song et al., “Aging Aware Request Scheduling for Non-Volatile Main
Memory,” in ASP-DAC, 2021.
[32] S. R. Kheradpisheh et al., “STDP-based spiking deep convolutional
neural networks for object recognition,” Neural Networks, 2018.
[33] L. Fei-Fei et al., “One-shot learning of object categories,” TPAMI, 2006.
[34] J. M. Allred et al., “Stimulating STDP to exploit locality for lifelong learning
without catastrophic forgetting,” Purdue University West Lafayette
United States, Tech. Rep., 2019.
[35] M. S. Shim et al., “Biologically inspired reinforcement learning for
mobile robot collision avoidance,” in IJCNN, 2017.
[36] M. Avrami, “Granulation, phase change, and microstructure kinetics of
phase change. III,” The Journal of Chemical Physics, 1941.
[37] D. B. Strukov, “Endurance-write-speed tradeoffs in nonvolatile memories,”
Applied Physics A: Materials Science and Processing, 2016.
[38] J. Kennedy, “Particle swarm optimization,” Encyclopedia of machine
learning, 2010.
[39] Y. LeCun et al., “LeNet-5, convolutional neural networks,” URL:
http://yann.lecun.com/exdb/lenet, 2015.
[40] L. Deng, “The MNIST database of handwritten digit images for machine
learning research,” IEEE Signal Processing Magazine, 2012.
[41] A. Krizhevsky et al., “ImageNet classification with deep convolutional
neural networks,” in Advances in neural information processing systems
(NeurIPS), 2012.
[42] J. Deng et al., “ImageNet: A large-scale hierarchical image database,” in
Conference on Computer Vision and Pattern Recognition (CVPR), 2009.
[43] K. Simonyan et al., “Very deep convolutional networks for large-scale
image recognition,” arXiv, 2014.
[44] A. Das et al., “Heartbeat classification in wearables using multi-layer
perceptron and time-frequency joint distribution of ECG,” in CHASE,
2018.
[45] A. Balaji et al., “Power-accuracy trade-offs for heartbeat classification
on neural networks hardware,” JOLPE, 2018.
[46] G. B. Moody et al., “Physionet: a web-based resource for the study
of physiologic signals,” IEEE Engineering in Medicine and Biology
Magazine, 2001.
[47] P. U. Diehl et al., “Unsupervised learning of digit recognition using spike-
timing-dependent plasticity,” Frontiers in Computational Neuroscience,
2015.
[48] T. Chou et al., “CARLsim 4: An open source library for large
scale, biologically detailed spiking neural network simulation using
heterogeneous clusters,” in International Joint Conference on Neural
Networks (IJCNN), 2018.
[49] A. Das et al., “Unsupervised heart-rate estimation in wearables with
Liquid states and a probabilistic readout,” Neural Networks, 2018.
[50] H. J. Kashyap et al., “A recurrent neural network based model of
predictive smooth pursuit eye movement in primates,” in International
Joint Conference on Neural Networks (IJCNN), 2018.
[51] Y. Dan et al., “Spike timing-dependent plasticity of neural circuits,”
Neuron, vol. 44, pp. 23–30, 2004.
