Stochastic Spiking Neural Networks Enabled by Magnetic Tunnel Junctions:
  From Nontelegraphic to Telegraphic Switching Regimes by Liyanagedera, Chamika M. et al.
Chamika M. Liyanagedera,∗ Abhronil Sengupta, Akhilesh Jaiswal, and Kaushik Roy
Purdue University, West Lafayette, IN 47906
(Dated: January 29, 2018)
arXiv:1709.09247v2 [cs.ET] 26 Jan 2018
Stochastic spiking neural networks based on nanoelectronic spin devices can be a possible pathway
to achieving “brainlike” compact and energy-efficient cognitive intelligence. The computational
models attempt to exploit the intrinsic device stochasticity of nanoelectronic synaptic or neural
components to perform learning or inference. However, there has been limited analysis on the
scaling effect of stochastic spin devices and its impact on the operation of such stochastic networks
at the system level. This work attempts to explore the design space and analyze the performance
of nanomagnet-based stochastic neuromorphic computing architectures for magnets with different
barrier heights. We illustrate how the underlying network architecture must be modified to account
for the random telegraphic switching behavior displayed by magnets with low barrier heights as they
are scaled into the superparamagnetic regime. We perform a device-to-system-level analysis on a
deep neural-network architecture for a digit-recognition problem on the MNIST data set.
I. INTRODUCTION
Emulating the computational primitives of neural-
network-based machine-learning approaches by the in-
herent device physics of nanoelectronic components has
proven to be useful in reducing the area and energy
requirements of the underlying hardware fabrics. To
that effect, several post-CMOS technologies, like phase-
change memories [1], Ag-Si devices [2], and spintronic de-
vices [3] among others, have been shown to exhibit neural
and synaptic functionalities at the intrinsic device level.
In this work, we focus on spintronic technologies in par-
ticular due to the low current and energy requirements
of such devices in comparison to traditional memristive
technologies.
While traditional neuromorphic computing models
have been based on deterministic neural and synaptic
primitives, recent effort has been directed towards adapt-
ing such computing schemes to stochastic models. This
endeavor has been driven primarily by two factors. (1)
Deterministic neural or synaptic models are character-
ized by multibit resolution. However, as device dimen-
sions of nanoelectronic neurons or synapses are scaled
down, they might lose the multibit resolution capac-
ity. In addition, such devices are expected to exhibit
increased stochasticity during the switching process. For
instance, spintronic devices exhibit stochasticity due to
thermal noise at nonzero temperatures. Consequently,
computational models that leverage the underlying de-
vice stochasticity have recently been explored. Informa-
tion encoding over time due to probabilistic synaptic or
neural updates also enables state compression of neural
and synaptic units, thereby allowing them to be imple-
mented by single-bit technologies. (2) The human brain,
the main inspiration behind such neuromorphic comput-
ing models, is characterized by stochastic neural and
∗ cliyanag@purdue.edu
synaptic units. As a matter of fact, neuroscience studies
have indicated that cortical neurons generate spikes prob-
abilistically over time [4]. Consequently, stochastic neu-
ral computing models can potentially enable “brain-like”
cognitive computing. In this work, we focus on stochastic
neural inference in deep neural networks for typical pat-
tern recognition tasks [5]. However, the analysis can be
easily extended to stochastic synaptic units [6], or even
other unconventional computing platforms that require
stochastic switching elements like Ising computing [7, 8]
and Bayesian inference, among others.
Spintronic devices have recently found wide applica-
tion in large-scale neurocomputing hardware owing to
their scalability and low power requirements. Spin-
torque memristors with magnetic domain walls have been
shown to be suitable candidates for implementing multilevel
neurosynapses [9] and integrate-and-fire spiking neurons [3].
Another study demonstrated that the inherent magnetic dynamics
of a magnetic tunnel junction (MTJ) can be used to emulate the
functionality of biologically inspired leaky integrate-and-fire
spiking neurons
[10]. In Ref. [11], spin-transfer-torque MTJs were used as
stochastic binary synapses, where the stochastic effects
of the devices are used to perform unsupervised learn-
ing. It was also demonstrated that MTJs can be used
as binary elements to implement long-term short-term
stochastic synapses to improve the learning efficiency of
a neural network [6]. A review of bioinspired neuromor-
phic computing platforms based on spintronic devices can
be found in Ref. [12].
As mentioned previously, spintronic devices display a
stochastic switching nature due to thermal noise. Given
a particular duration of write current flowing through
the device, a magnet exhibits a particular probability of
switching during that corresponding write cycle. Con-
secutive write and read cycles can be used to generate an
output pulse stream whose average value depends on the
magnitude of the input stimulus. While stochastic neural
networks based on spintronic devices have been explored
previously [5, 13], there has been limited analysis of the
scaling effects of these devices. It is generally expected
that, as the magnet dimensions scale down, the device
would exhibit increased stochasticity. Furthermore, the
operating current or voltage ranges required for operating
such devices in the probabilistic regime would be reduced.
However, as the scaling tends to the superparamagnetic
regime, the magnets undergo random telegraphic switch-
ing with a low data-retention time, making the device
practically volatile in nature. Utilizing such a device as a
biased random generator requires a rethinking of the pe-
ripherals and the underlying network architecture since
parallel read and write operations of the nanomagnets
are then required. However, the adaptation of such low-
energy superparamagnets as neural components comes at
the expense of reduced error resiliency. This is the case
mainly because the gradient or the rate of change of the
switching characteristics of such magnets in response to
the input current magnitude is extremely high. This arti-
cle attempts to address the different schemes of operation
of stochastic spiking neural networks (SNNs) for magnets
in nontelegraphic to telegraphic regimes and analyze the
associated energy-accuracy trade-offs at the system level.
II. MAGNETIC TUNNEL JUNCTION AS A
STOCHASTIC SWITCHING ELEMENT
An MTJ is a magnetoresistive device that consists of a
tunneling oxide sandwiched between two magnetic con-
tacts. One of the contacts is magnetically hardened and
is called the pinned layer, while the direction of magne-
tization of the other contact, called the free layer, can be
switched. In a spin-Hall effect based MTJ (SHE-MTJ),
the direction of the free layer is switched by passing a
charge current through an underlying heavy metal (HM),
as shown in Fig. 1. The passage of the charge current
(Icharge) through the HM layer induces a resulting spin
current (Ispin) flowing perpendicular to the planes of the
magnetic layers of the MTJ. This spin current can switch
the direction of magnetization of the free layer, making
it parallel (P) or anti-parallel (AP) to that of the pinned
layer, through the well known spin-orbit torque mecha-
nism [14, 15]. Owing to the magnetoresistance effect, the
SHE-MTJ exhibits a lower resistance (RP) when in the
P state and a higher resistance (RAP) when in the AP
state. Thus, the SHE-MTJ shown in Fig. 2 exhibits
decoupled read and write current paths. The write operation
can be achieved by a charge current flowing through the
HM layer, while the read operation can be accomplished
by sensing the resistance of the MTJ in a direction trans-
verse to the plane of the magnetic layers.
It is to be noted that the switching process of the
nanoscale free layer is influenced by thermal noise at
nonzero temperatures. Thermal noise results in a
stochastic switching behavior wherein, for a given cur-
rent flowing through the HM layer, the MTJ switches
with a certain probability. Moreover, the probability of
switching can be controlled by the magnitude of the current
flowing through the HM.

FIG. 1. (a) High resistive anti-parallel state of an MTJ, (b)
Low resistive parallel state of an MTJ, and (c) A SHE-MTJ
device structure where the MTJ is switched by passing charge
current through the underlying heavy metal. The charge current
flowing through the heavy metal leads to spin splitting,
thereby creating a perpendicular spin current, switching the
magnetization direction of the free layer.

The dynamics of the magnetization vector in the presence of the
HM-layer current is given by the stochastic
Landau-Lifshitz-Gilbert-Slonczewski (LLGS) equation and can be
written as [16]

$$\frac{\partial \hat{m}}{\partial \tau} = -\hat{m}\times\vec{H}_{eff} - \alpha\,\hat{m}\times\left(\hat{m}\times\vec{H}_{eff}\right) + \frac{1}{|\gamma|}\left(\alpha\,\hat{m}\times\overrightarrow{STT} + \overrightarrow{STT}\right) \qquad (1)$$

where $\tau = \frac{|\gamma|}{1+\alpha^{2}}\,t$.
Here, $\alpha$ is the Gilbert damping constant, $\gamma$ is the
gyromagnetic ratio, $\hat{m}$ is the unit vector in the direction
of the magnetization, $t$ is the simulation time, and $\vec{H}_{eff}$
is the effective magnetic field including the demagnetization
field and the interface anisotropy field. A detailed
description of the various fields included in $\vec{H}_{eff}$ can be
found in Ref. [16]. $\overrightarrow{STT}$ in Eq. (1) is the term
representing the torque due to the SHE (modeled as a
spin-transfer torque term) and can be written as follows
[17],
$$\overrightarrow{STT} = |\gamma|\,\beta\,\bigl(\hat{m}\times(\vartheta_{SHE}\,\hat{m}\times\hat{m}_p)\bigr), \qquad \beta = \frac{\hbar J_q}{2 e \mu_0 M_S t_{FL}} \qquad (2)$$

where $\hat{m}_p$ is the magnetization of the pinned layer
(PL), $e$ is the charge of an electron, $\mu_0$ is the permeability
of vacuum, $\hbar$ is the reduced Planck constant, $t_{FL}$ is the
thickness of the free layer (FL), and $M_S$ is the saturation
magnetization. $J_q$ is the charge current density flowing
through the heavy metal. $\vartheta_{SHE}$ is the spin-polarization
efficiency (defined as the ratio of the spin current generated
to the charge current flowing through the HM
layer) and can be written as [18],

$$\vartheta_{SHE} = \frac{I_{spin}}{I_{charge}} = \frac{\pi w}{4t}\,\theta_{she}\left(1 - \mathrm{sech}\left(\frac{t}{\lambda_{sf}}\right)\right) \qquad (3)$$

where $w$ is the width of the free layer, $t$ is the thickness of the heavy
metal, $\theta_{she}$ is the spin-Hall angle, and $\lambda_{sf}$ is the spin-flip length.

FIG. 2. Decoupled read and write current paths of the MTJ
with HM. Output of the inverter will be high if the MTJ is
in the P state, and low if the MTJ is in the AP state.

FIG. 3. The two operating states of an MTJ. The two states
are thermally stable if the barrier height of the magnet, EB,
is large enough.
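As a quick numerical illustration of Eq. (3), the following sketch evaluates the spin-polarization efficiency for two free-layer widths. The widths, heavy-metal thickness, and spin-Hall angle are taken from Table I; the spin-flip length is not listed in the text, so the value of 1.4 nm used here is purely an assumption, and the function name is hypothetical.

```python
import math

def spin_hall_efficiency(w_nm, t_hm_nm, theta_she, lambda_sf_nm):
    """Evaluate Eq. (3): (pi*w / (4*t)) * theta_she * (1 - sech(t / lambda_sf))."""
    sech = 1.0 / math.cosh(t_hm_nm / lambda_sf_nm)
    return (math.pi * w_nm / (4.0 * t_hm_nm)) * theta_she * (1.0 - sech)

# w, t and theta_she follow Table I; lambda_sf = 1.4 nm is an ASSUMED value.
LAMBDA_SF_NM = 1.4
eff_20kbt = spin_hall_efficiency(w_nm=40.0, t_hm_nm=2.0, theta_she=0.3,
                                 lambda_sf_nm=LAMBDA_SF_NM)
eff_1kbt = spin_hall_efficiency(w_nm=10.0, t_hm_nm=2.0, theta_she=0.3,
                                lambda_sf_nm=LAMBDA_SF_NM)
print(f"20 kBT magnet: {eff_20kbt:.3f}, 1 kBT magnet: {eff_1kbt:.3f}")
```

Because Eq. (3) is linear in w, the efficiency scales directly with the free-layer width for a fixed heavy-metal thickness, whatever spin-flip length is assumed.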
The random switching process due to the effect of
thermal noise can be included in the LLGS equation
through a stochastic field $\vec{H}_{thermal}$ in $\vec{H}_{eff}$ [19],

$$\vec{H}_{thermal} = \vec{\zeta}\,\sqrt{\frac{2\alpha k_B T}{|\gamma|\,dt\,M_S\,Vol}} \qquad (4)$$

where $k_B$ is the Boltzmann constant, $T$ is the temperature,
$Vol$ is the volume of the free-layer magnet, and $dt$ is the
simulation time step. The term $\vec{\zeta}$ in Eq. (4) is a Gaussian
random variable with zero mean and a standard deviation
equal to 1. The inclusion of thermal noise turns the LLGS
equation into a stochastic differential equation. We used
Heun's method to integrate the stochastic LLGS equation.
The details of applying Heun's method to the stochastic LLG
equation can be found in Refs. [19, 20]. The entire field acting
on the nanomagnet, $\vec{H}_{eff}$, is given by

$$\vec{H}_{eff} = \vec{H}_{thermal} + \vec{H}_{aniso} + \vec{H}_{external} \qquad (5)$$
where $\vec{H}_{aniso}$ is the anisotropy field that, in in-plane
magnets, is dominated by the demagnetization field arising
from the shape of the magnet and is given by

$$\vec{H}_{demag} = -M_S\,[N_{xx} m_x \hat{x},\; N_{yy} m_y \hat{y},\; N_{zz} m_z \hat{z}] \qquad (6)$$

where $N_{xx}$, $N_{yy}$, $N_{zz}$ are the demagnetization factors that
are calculated based on the analytical equations presented
in Ref. [21], and $m_x$, $m_y$, $m_z$ are the magnetization
components of the nanomagnet in the $\hat{x}$, $\hat{y}$, and
$\hat{z}$ directions. The presence of any external field can be
included through the term $\vec{H}_{external}$.
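The Heun integration of the stochastic equation can be sketched as follows. This is a minimal illustration in reduced units, not the simulation code used in this work. It assumes a uniaxial easy axis along z in place of the shape-anisotropy field of Eq. (6), omits the spin-torque term of Eq. (1), and expresses the thermal-field strength of Eq. (4) through the normalized barrier delta = EB/(kBT), assuming EB = MS·Hk·Vol/2 with fields in units of Hk. All function names and parameter values are hypothetical.

```python
import numpy as np

def llg_rhs(m, h, alpha):
    """Right-hand side of the reduced LLG equation: -m x h - alpha * m x (m x h)."""
    mxh = np.cross(m, h)
    return -mxh - alpha * np.cross(m, mxh)

def effective_field(m, h_thermal):
    """ASSUMED uniaxial easy axis along z (field in units of H_k) plus thermal field."""
    return np.array([0.0, 0.0, m[2]]) + h_thermal

def heun_step(m, dtau, alpha, delta, rng):
    """One Heun (predictor-corrector) step; delta = E_B/(k_B T), None disables noise."""
    if delta is None:
        h_th = np.zeros(3)
    else:
        # Eq. (4) rewritten in reduced units under the assumptions stated above.
        sigma = np.sqrt(alpha / (delta * (1.0 + alpha**2) * dtau))
        h_th = sigma * rng.standard_normal(3)
    # The same thermal-field sample is used in the predictor and the corrector.
    f0 = llg_rhs(m, effective_field(m, h_th), alpha)
    m_pred = m + dtau * f0
    f1 = llg_rhs(m_pred, effective_field(m_pred, h_th), alpha)
    m_new = m + 0.5 * dtau * (f0 + f1)
    return m_new / np.linalg.norm(m_new)  # keep |m| = 1

def simulate(m0, n_steps, dtau=0.02, alpha=0.1, delta=None, seed=0):
    """Integrate the reduced equation and return the trajectory of m."""
    rng = np.random.default_rng(seed)
    m = np.asarray(m0, dtype=float)
    m = m / np.linalg.norm(m)
    traj = np.empty((n_steps + 1, 3))
    traj[0] = m
    for k in range(n_steps):
        m = heun_step(m, dtau, alpha, delta, rng)
        traj[k + 1] = m
    return traj

# Noise-free relaxation toward the easy axis, and a noisy run for a 1 kBT magnet.
traj = simulate([np.sin(0.3), 0.0, np.cos(0.3)], n_steps=5000)
noisy = simulate([0.0, 0.0, 1.0], n_steps=2000, delta=1.0)
```

Without noise the magnetization relaxes onto the easy axis, while the low-barrier run wanders between the two orientations. A larger damping value than the one in Table I is used here only to shorten the demonstration.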
A. Stochasticity in the Nontelegraphic Regime
The parallel and antiparallel states of the MTJ are sta-
bilized by an energy barrier, EB , that is defined as the
product of the magnetic anisotropy and volume (Fig. 3).
The retention time for the magnetic state of a nanomag-
net is given by [22],
$$T_{retention} = \tau_0 \exp\left(\frac{E_B}{k_B T}\right) \qquad (7)$$

where τ0 is a characteristic time constant in the range
1 ps to 100 ps [22]. The retention time or the lifetime of the
magnet varies exponentially with the barrier height. The
nonvolatility of the magnet enables such devices to be
used in synchronous clocked systems where the device is
operated in successive write and read phases. During the
write cycle, a current pulse of fixed duration is passed
through the HM layer that can switch the MTJ from
one state over the barrier to the other stable state. The
switching probability of the magnet varies with the mag-
nitude of the current pulse flowing through the underly-
ing HM layer. During the read phase, a small current
is passed through the MTJ-Rref (which can be imple-
mented with another MTJ whose state is not disturbed
by the small read current) voltage-divider circuit (see Fig.
2), and the MTJ state is read at the output of the in-
verter. The read current should be sufficiently small such
that it does not disturb the state of the MTJ during the
read phase. Since the voltage difference at the voltage-divider
output for the parallel and antiparallel states is
generally small, multiple stages of inverters are required
to obtain a full swing at the output.

FIG. 4. (a) Switching characteristics of an MTJ with varying EB at T = 300 K for a write cycle duration of 0.5 ns, (b) MTJ
switching probability characteristics as a function of I − Ibias, normalized by a factor Io. The data closely resemble the sigmoid
function, (c) Variation of the bias current, Ibias, and the normalizing factor, Io, with varying EB. Both Ibias and Io decrease
with decreasing EB, (d) Failure probability during a read cycle of 1 ns (in logarithmic scale) with varying EB.

TABLE I. Device Parameters
Parameter                          Values (by barrier height EB)
                                   1 kBT      2 kBT      10 kBT     20 kBT
Free layer width, WMTJ             10 nm      17 nm      30 nm      40 nm
Free layer length, LMTJ            25 nm      42.5 nm    75 nm      100 nm
Free layer thickness                    0.8 nm                1.2 nm
Saturation magnetization, Ms            750 kA/m              1000 kA/m
Heavy metal thickness                              2 nm
Spin-Hall angle, θshe                              0.3 [15]
Gilbert damping factor, α                          0.0122 [15]
Temperature, T                                     300 K
Figure 4(a) illustrates the variation of the MTJ switch-
ing probability with the amplitude of the current pulse
being passed through the HM layer for different EB values.
The device parameters used for simulations are
listed in Table I. Note that the barrier height of the magnet
is varied by scaling the area of the magnets appropriately.
It can be shown that the probabilistic switching
characteristics of the MTJ hold a sigmoidal relationship
to the write current by describing the SHE layer cur-
rent I, with two different parameters, namely Ibias and
Io. Ibias is the dc current required to bias the switching
probability of the MTJ to 0.5, and Io is the scaling fac-
tor used to map the swing of the switching probability
around the bias current to the sigmoid curve. Figure 4(b)
depicts the variation of the switching probability of the
MTJ with I − Ibias, normalized by a factor Io. Io can be
found by fitting the switching probability characteristics
[Psw(...)] to the sigmoid function such that [refer to Fig. 4(b)]

$$\mathrm{sigmoid}\left(\frac{I - I_{bias}}{I_o}\right) \approx P_{sw}(I) \qquad (8)$$
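One possible way to extract Ibias and Io from a measured switching curve is sketched below. It relies on the fact that the logit of a sigmoid is linear in the current, so a straight-line fit is sufficient. The fitting procedure, the function names, and the synthetic data (Ibias = 60 µA, Io = 5 µA) are illustrative assumptions rather than the procedure or values used in this work.

```python
import numpy as np

def sigmoid(x):
    """Logistic function 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def fit_bias_and_scale(currents, p_switch, eps=1e-6):
    """Estimate (I_bias, I_o) so that sigmoid((I - I_bias) / I_o) ~ P_sw(I).

    logit(P_sw) = I / I_o - I_bias / I_o is linear in I, so a first-order
    polynomial fit gives the slope 1 / I_o and the intercept -I_bias / I_o.
    """
    p = np.clip(np.asarray(p_switch, dtype=float), eps, 1.0 - eps)
    logit = np.log(p / (1.0 - p))
    slope, intercept = np.polyfit(np.asarray(currents, dtype=float), logit, 1)
    i_o = 1.0 / slope
    i_bias = -intercept * i_o
    return i_bias, i_o

# Synthetic, noise-free switching curve with HYPOTHETICAL parameters (microamps).
currents = np.linspace(40.0, 80.0, 21)
p_sw = sigmoid((currents - 60.0) / 5.0)
i_bias, i_o = fit_bias_and_scale(currents, p_sw)
print(f"I_bias = {i_bias:.2f} uA, I_o = {i_o:.2f} uA")
```

With probabilities estimated from a finite number of switching trials, points very close to 0 or 1 carry large logit errors, so a weighted or nonlinear least-squares fit may be preferable in practice.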
As shown in Fig. 4(a), when EB, and hence the device
dimensions, are scaled down, the current range required
for stochastic switching decreases, thereby reducing the
write current requirements of the device. Fig. 4(c) in-
dicates that both the components, Ibias and Io, are re-
duced with a reduction in barrier height. A reduction in
Io implies that the current range that can be utilized for
stochastic MTJ switching is reduced, thereby increasing
the rate of change of the switching probability with a
varying input current. Consequently, the computing sys-
tem becomes more prone to variations in the MTJ input
current and exhibits less error resiliency with the reduction
of Io. These considerations are highlighted in the
next section.
Note that, if EB is not sufficiently large, the state of the
magnet can switch during the read operation because of the very
small Tretention value. The retention failure probability,
PF,retention, of an MTJ within a given read access time is
given by

$$P_{F,retention} = 1 - \exp\left(-\frac{t_{read}}{\exp(\Delta)}\right) \qquad (9)$$

where PF,retention is the retention failure probability
of the MTJ during a read time of tread in nanoseconds,
and ∆ is the EB of the MTJ in units of kBT. In order to find the
necessary tread for correct read operation, SPICE simulations
(with a Verilog-A model for the MTJ [23]) are
performed for the IBM 45 nm technology node. Simulation
results show that the required read time is around 0.2ns
for the nominal corner and 1ns for the worst case cor-
ner (with 2σ variations in the threshold voltage of the
CMOS transistors). Hence, for retention failure proba-
bility calculations, the required read time is taken to be
1ns to ensure that a correct read can be achieved even
for the worst corner. As illustrated in Fig. 4(d), re-
tention failure probability increases exponentially as the
MTJ is scaled down. In order to keep the retention fail-
ure probability smaller than 1%, the EB of the magnet
should be kept greater than 4.6kBT . When the MTJs are
scaled further they enter the superparamagnetic regime,
where the magnets are no longer thermally stable during
the read cycle. Hence, parallel read-write operations are
required for magnets in the superparamagnetic regime
(EB < 5kBT ) to realize stochastic switching elements.
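The 4.6 kBT threshold quoted above follows directly from inverting Eq. (9) for a 1% failure probability and a 1 ns read time, as the short sketch below shows. The function names are hypothetical. Note that Eq. (9), with tread expressed in nanoseconds, corresponds to Eq. (7) with a time constant of 1 ns.

```python
import math

def retention_failure_probability(t_read_ns, delta):
    """Eq. (9): P_F = 1 - exp(-t_read / exp(delta)), with t_read in ns
    and delta = E_B / (k_B T)."""
    return 1.0 - math.exp(-t_read_ns / math.exp(delta))

def min_barrier_for_failure(p_max, t_read_ns):
    """Invert Eq. (9): smallest delta for which P_F does not exceed p_max."""
    return math.log(-t_read_ns / math.log(1.0 - p_max))

delta_min = min_barrier_for_failure(p_max=0.01, t_read_ns=1.0)
p_fail_1kbt = retention_failure_probability(t_read_ns=1.0, delta=1.0)
print(f"minimum barrier: {delta_min:.2f} kBT")
print(f"failure probability at 1 kBT: {p_fail_1kbt:.3f}")
```

For a 1 kBT magnet the same expression gives a failure probability of roughly 31% within a 1 ns read, which illustrates why separate write and read phases are not viable in the superparamagnetic regime.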
B. Stochasticity in the Telegraphic Regime
For low barrier height nanomagnets (EB ∼ 1kBT ),
even with zero charge current flowing through the HM
layer, the MTJ exhibits random telegraphic switching
between the two equilibrium states (Fig. 5(a)) due to
thermal noise. The random switching characteristics of
such scaled devices in the superparamagnetic regime can
still be manipulated by passing a current through the HM
layer. For instance, Figs. 5(a)-(c) represent the in-plane
magnetization of the MTJ in the presence of write currents
of 0, −1.5, and 1.5 µA, respectively, flowing through the HM
layer of a 1kBT magnet. The dwell time of the MTJ in
either of the two stable states can be modulated by the
magnitude and direction of the input write current.
The volatility of these devices entails a rethinking of
the manner in which such nanomagnets can be operated
with peripherals to realize a stochastic computing ele-
ment. Because of device volatility and low retention time,
such devices cannot be operated with separate write and
read phases. Consequently, the write and read termi-
nals of the MTJ are activated simultaneously, and the
device state is read while an input bias current flows
through the underlying HM layer of the MTJ. For high-
energy-barrier MTJs, the effect of the read current on
the switching characteristics is not a design issue since
the read and write cycles are decoupled in time. How-
ever, for MTJs in the telegraphic switching regime, the
read current can bias the switching characteristics since
the read and write operations occur in parallel. Fur-
thermore, since the devices are highly scaled, the write
(for stochastic switching) and read currents fall in the
same order of magnitude (unlike high-barrier-height mag-
nets, where the write current for stochastic switching is
higher). Hence, the resistive divider of the read circuit
(Fig. 2) needs to be highly optimized such that the read
current is maintained at the minimal value. SPICE sim-
ulations reveal that the read current can be minimized to
100 nA while having a minimal effect on the MTJ switch-
ing characteristics. Figure 6(a) depicts the average out-
put of the inverter stage over a duration of 2µs with and
without the read current. The case “with read current” is
simulated by considering the additional spin-orbit torque
induced by the 100nA read current flowing through the
HM layer while the case “without read current” ignores
the effect of the additional read current. As can be ob-
served in Fig. 6, the read current has minimal impact
on the MTJ switching probability. Furthermore, device
dimension variations (or equivalently EB variations) and
read circuit variations (±1σ and ±2σ variations in the
threshold voltages of the CMOS transistors) are shown to
have minimal effect on the stochastic switching behavior
of the nanomagnets (Figs. 6(b)-(c)). Figure 6(d) repre-
sents a typical plot of the voltage output of the inverter
stage as a function of time with no input current flowing
through the underlying HM of the MTJ.
Note that the switching characteristics of superparam-
agnetic MTJs are highly sensitive to any change in the
magnitude of the write current. As depicted in Fig. 6(a),
the switching probability of the MTJ shifts from 0.5 to
0.85 for a 1µA change in the write current. Hence, the
impact of variations on the input current provided to a
network of such scaled MTJs can be significant, and it
is analyzed in more detail in the next section. We would
like to conclude this section by mentioning that the par-
allel read-write operation is not suited for magnetization
switching in the nontelegraphic regime [(10 − 20)kBT
barrier height magnets] since the telegraphic switching
would occur in time scales ranging approximately from
micro- to milliseconds, thereby resulting in an enhanced
delay for the computing process.
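The relation between dwell times and the time-averaged output can be illustrated with a simple two-state telegraph model. The sketch below assumes exponentially distributed dwell times in the P and AP states, which is a modeling assumption and not the LLGS simulation used in this work; the dwell-time values and the function name are hypothetical. In such a model the fraction of time spent in the P state approaches τP/(τP + τAP).

```python
import numpy as np

def simulate_telegraph(tau_p_ns, tau_ap_ns, total_ns, seed=0):
    """Fraction of time spent in the P state for a two-state telegraph process
    with exponentially distributed dwell times (mean tau_p_ns and tau_ap_ns)."""
    rng = np.random.default_rng(seed)
    t, time_in_p, in_p = 0.0, 0.0, True
    while t < total_ns:
        dwell = rng.exponential(tau_p_ns if in_p else tau_ap_ns)
        dwell = min(dwell, total_ns - t)  # truncate the last dwell at the window end
        if in_p:
            time_in_p += dwell
        t += dwell
        in_p = not in_p
    return time_in_p / total_ns

# HYPOTHETICAL dwell times: unbiased case and a case biased toward the P state.
frac_unbiased = simulate_telegraph(tau_p_ns=8.0, tau_ap_ns=8.0, total_ns=200_000.0)
frac_biased = simulate_telegraph(tau_p_ns=16.0, tau_ap_ns=4.0, total_ns=200_000.0)
print(f"unbiased: {frac_unbiased:.3f}, biased: {frac_biased:.3f}")
```

A write current that lengthens the dwell time in one state and shortens it in the other therefore shifts the averaged output away from 0.5, which is the behavior summarized in Fig. 6(a).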
C. Stochastic Neuromorphic Computing
A neural network is essentially a collection of lay-
ers of neurons interfaced through a network of weighted
synapses. A particular input to a neuron is first scaled
by the corresponding synaptic weight before
it is accumulated and processed by the neuron. Neurons
with sigmoid-like transfer functions have been shown
to be appealing for implementing deep spiking neural
networks [5], making SHE-MTJ structures ideal for
realizing energy-efficient neuromorphic hardware. In the
stochastic neural network considered in this work, the
MTJ neuron generates an output spike probabilistically
depending on the instantaneous magnitude of the resultant
weighted synaptic input [5]. This computing framework
can be directly translated to the resistive crossbar
architecture illustrated in Fig. 7, where the synaptic
weights are mapped into the resistive elements between
the horizontal and vertical metal lines.

FIG. 5. Switching characteristics of an MTJ with 1kBT barrier height: (a) When the current flowing through the HM is zero,
the MTJ is equally likely to be in the parallel or anti-parallel state, (b) When −1.5µA is flowing through the HM layer, the
MTJ is more likely to be in the anti-parallel state, (c) When 1.5µA is flowing through the HM layer, the MTJ is more likely
to be in the parallel state.

FIG. 6. (a) Average inverter output over a duration of 2µs with and without the impact of the read current, (b) Variation
of the inverter average output over a duration of 2µs with magnitude of the write current for different EB values, (c) Inverter
average output over a duration of 2µs for nominal corner and for the worst case conditions of ±1σ and ±2σ variations in
the threshold voltages of the transistors, (d) A typical plot of the output voltage of the inverter stage of the read circuit as a
function of time under zero external input current.
FIG. 7. Crossbar architecture connecting the inputs of one
layer to the neurons of the corresponding layer. Horizontal
bars provide the input voltage for the synapses. The summation
of weighted synaptic currents along the columns of
the crossbar array is then provided as input to the MTJ
neurons.

Note that resistive crossbar arrays based on memristive
devices like phase-change materials [1], Ag-Si devices
[2] and spintronic devices [24] have been proposed and
experimentally demonstrated [25]. Two horizontal lines
are used for each input connected to the crossbar ar-
ray to implement the functionality of positive and neg-
ative weights. An input spike provided to the network
activates the corresponding access transistors by supply-
ing a voltage to the horizontal lines V+ (positive volt-
age) and V− (negative voltage), which is translated to
a current through the vertical columns (weighted by the
conductances of the resistive elements). The current
accumulated in the vertical columns is then supplied as
the write currents to the stochastic neurons of the corre-
sponding layer.
If the weight connecting an input m to a neuron n is
negative, then the resistive element connecting the positive
horizontal line and the vertical column (Gm,n+) is programed
to a high-resistance off state, and the resistive element
connecting the vertical column and the negative horizontal
line is programed to a conductance given by
Gm,n− = |wm,n|Go, and vice versa for a positive weight. Here, wm,n
is the synaptic weight between the input m and neuron
n and Go is the mapped conductance for unity weight.
The conductances of the resistive elements are selected
by scaling the synaptic weights by a factor Go = Io/δV,
where δV is the magnitude of the supply voltage driving
the rows of the crossbar array and Io
is the current scaling factor of the stochastic MTJ men-
tioned previously. Assuming that the magnetometallic
spin devices have low input resistance compared to the
cross-point resistances of the crossbar array, the neurons
receive a weighted summation of spike inputs in a par-
ticular layer and produce output spikes probabilistically
over time that will drive the fan-out neurons of the next
layer. For magnetic neurons operating in the nontele-
graphic regime, the read circuit can be interfaced with
a latch that stores the inverter output during the read
cycle, which drives the next stage of neurons during the
following write cycle (hence the term synchronous oper-
ation).
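The weight-to-conductance mapping and the resulting column currents can be sketched as follows. The example assumes an ideal crossbar: the off-state element has zero conductance, the neuron input resistance and wire resistance are neglected, and any bias current is treated separately. The function names and the small weight matrix are hypothetical; Go = 5 µS and δV = 0.1 V are the values quoted in this work for the 1kBT design.

```python
import numpy as np

def weights_to_conductances(weights, g_o):
    """Split signed weights into (G_plus, G_minus) conductance arrays.

    The element on the unused line is treated as fully off (0 S), an idealization.
    """
    w = np.asarray(weights, dtype=float)
    g_plus = np.where(w > 0, w, 0.0) * g_o
    g_minus = np.where(w < 0, -w, 0.0) * g_o
    return g_plus, g_minus

def column_currents(spikes, g_plus, g_minus, delta_v):
    """Column currents when spiking rows drive +delta_v into G_plus
    and -delta_v into G_minus."""
    s = np.asarray(spikes, dtype=float)
    return delta_v * (s @ g_plus) - delta_v * (s @ g_minus)

def spike_probability(i_col, i_o):
    """Sigmoidal spiking probability of the MTJ neuron, current relative to I_bias."""
    return 1.0 / (1.0 + np.exp(-i_col / i_o))

g_o = 5e-6             # S, mapped conductance for unity weight
delta_v = 0.1          # V, row supply voltage
i_o = g_o * delta_v    # A, from G_o = I_o / delta_V

# HYPOTHETICAL 3-input, 2-neuron layer; rows are inputs, columns are neurons.
weights = np.array([[0.8, -0.5],
                    [-1.2, 0.3],
                    [0.4, 0.9]])
spikes = np.array([1, 0, 1])

g_plus, g_minus = weights_to_conductances(weights, g_o)
i_col = column_currents(spikes, g_plus, g_minus, delta_v)
p_spike = spike_probability(i_col, i_o)
```

Under these assumptions each column current equals Io times the weighted sum of the input spikes, so the neuron evaluates the sigmoid of that weighted sum without any further scaling.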
For magnetic neurons operating in the superparamagnetic
regime, the inverter output can directly drive the neu-
rons in the next stage (hence asynchronous operation).
Note that the high-barrier-height magnets are also driven
by a current source to bias them at a switching probability
of 0.5, unlike MTJs in the superparamagnetic regime.
Owing to the small input current and the zero bias cur-
rent of magnetic neurons operating in the superparamag-
netic regime, asynchronous architectures grant significant
power savings in the neurons and the resistive crossbar
array. However, as shown later, asynchronous implemen-
tation incurs significant power loss at the read circuit
owing to the continuous switching activity of the invert-
ers.
III. DESIGN CONSIDERATIONS:
SYNCHRONOUS AND ASYNCHRONOUS
NEUROMORPHIC SYSTEMS
A. Device to System Simulation Framework
In order to analyze the design considerations for syn-
chronous and asynchronous stochastic SNNs, a hybrid
device-circuit-system cosimulation framework is used in
this work. A stochastic LLGS simulation for MTJs with
different barrier heights is used to evaluate the proba-
bilistic switching behavior of magnets operating in the
nontelegraphic to telegraphic regime. In this work, we
use magnets with barrier heights of 10 and 20kBT for the
nontelegraphic regime and magnets with barrier heights of 1
and 2kBT for the telegraphic regime. The device parameters
used for simulations are summarized in Table I. SPICE-level
simulations based on a Verilog-A model of the MTJ
are used to evaluate the performance of the stochastic MTJ
along with associated peripherals.
In order to perform a system-level analysis, the per-
formance of the network is assessed for a large-scale
deep-learning network architecture (28×28-6c5-2s-12c5-
2s-10o) on a standard digit-recognition problem based on
the MNIST dataset [26]. The network consists of alter-
nate layers of convolutional and subsampling operations.
The dimensions of the input MNIST images are 28×28,
which are applied as input to the convolutional layer con-
sisting of six convolutional kernels with a size of 5×5.
The subsampling kernel has a size of 2×2 and is followed
by another convolutional layer comprising 12 output
maps, which, in turn, is followed by another subsam-
pling layer. The final layer consists of ten neurons, each
of which represents one of the ten digit classes. Once the
training is accomplished, the learned weights are mapped
to the synaptic conductances using a value of Go = 5µS
which is in the typical resistance range for memristive
synaptic devices. The same resistive crossbar array is
used for all of the different barrier-height neuronal devices.
The supply voltage δV was adjusted in each case
to satisfy the relationship δV = Io/Go, as explained
previously. The supply voltage δV was calculated to be
0.1, 0.11, 1.05, and 2 V for nanomagnets of barrier height
1, 2, 10, and 20kBT, respectively. The sigmoid-curve characteristics
for the magnets operating in the telegraphic
regime are obtained by averaging the output voltage of
the read inverter circuit over a period of 2µs (for 1kBT)
and 5µs (for 2kBT).

FIG. 8. Variation of classification accuracy of the proposed
network with time for (a) Synchronous, and (b) Asynchronous
implementations.
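The layer dimensions implied by the 28×28-6c5-2s-12c5-2s-10o notation can be worked out with a few lines of code. The sketch assumes unpadded ("valid") convolutions with unit stride and non-overlapping subsampling windows, which is the usual reading of this notation but is not stated explicitly in the text. The function name is hypothetical.

```python
def feature_map_sizes(input_size, layers):
    """Track (number of maps, side length) through the network.

    Layers are ("c", maps, kernel) for convolution or ("s", factor) for
    subsampling. ASSUMES valid convolution with stride 1 and
    non-overlapping subsampling windows.
    """
    maps, side = 1, input_size
    shapes = []
    for layer in layers:
        if layer[0] == "c":
            _, maps, kernel = layer
            side = side - kernel + 1
        else:
            _, factor = layer
            side = side // factor
        shapes.append((maps, side))
    return shapes

layers = [("c", 6, 5), ("s", 2), ("c", 12, 5), ("s", 2)]
shapes = feature_map_sizes(28, layers)
flattened = shapes[-1][0] * shapes[-1][1] ** 2
print(shapes, flattened)
```

Under these assumptions the feature maps have sides of 24, 12, 8, and 4 pixels, so the final subsampling layer provides 12 × 4 × 4 = 192 inputs to the ten output neurons.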
B. Performance and Energy Estimation
Figure 8 depicts the temporal evolution of the clas-
sification accuracy of the stochastic SNN for the syn-
chronous and asynchronous designs. For the 10 and the
20kBT synchronous designs the classification accuracy
reaches 98.1% and 97.6% respectively, while it saturates
at 97.5% and 97.2% for the 1 and 2kBT asynchronous
designs. Both synchronous networks surpass an accuracy
of 95% in just under 20 ns, whereas the two asynchronous
networks require 80ns (for 1kBT ) and 250ns (for 2kBT )
to reach the same accuracy. In the asynchronous imple-
mentation, the high frequency telegraphic switching of
the nano-magnets is translated into voltage spikes at a
lower frequency due to gate capacitance charge delays of
the CMOS devices, which explains the slower response of
the asynchronous networks compared to the synchronous
designs. Also, as the EB values of the nanomagnets are
increased (for the superparamagnetic regime), the retention
time of the nanomagnets increases, decreasing the
spiking frequency at the output of the inverters. Hence,
as the results show, for asynchronous designs, the time re-
quired for a network to reach a target accuracy increases
with the EB value of the nanomagnets used in the design.
For the synchronous networks, the duration of one time
step is selected to be 4 ns, which includes a write time
of 0.5 ns, a rest period of 2 ns, and a read time of 1 ns,
followed by a reset period of 0.5 ns. The duration of the
time step for the asynchronous networks is determined
by measuring the average duration of a voltage pulse at
the output of the inverter read circuit at zero write cur-
rent, and it is calculated to be 8.2 and 27.5ns for the 1
and 2kBT networks, respectively.
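The time-step durations and the times needed to reach 95% accuracy can be combined into an approximate number of time steps per design. The following is a back-of-the-envelope sketch using only the values quoted above. The step counts are derived here, not reported in the text, and the synchronous figure is an upper bound because the text says "just under 20 ns".

```python
# Durations quoted in the text, all in nanoseconds.
SYNC_PHASES_NS = {"write": 0.5, "rest": 2.0, "read": 1.0, "reset": 0.5}
ASYNC_STEP_NS = {1: 8.2, 2: 27.5}        # barrier height (kBT) -> mean output pulse duration
ASYNC_TIME_TO_95_NS = {1: 80.0, 2: 250.0}
SYNC_TIME_TO_95_NS = 20.0                # upper bound: "just under 20 ns"

# One synchronous time step is the sum of its four phases.
sync_step_ns = sum(SYNC_PHASES_NS.values())

# Approximate number of time steps needed to reach 95% accuracy.
sync_steps_to_95 = SYNC_TIME_TO_95_NS / sync_step_ns
async_steps_to_95 = {eb: ASYNC_TIME_TO_95_NS[eb] / ASYNC_STEP_NS[eb] for eb in ASYNC_STEP_NS}

print(f"synchronous step: {sync_step_ns} ns, steps to 95%: < {sync_steps_to_95:.1f}")
for eb, steps in async_steps_to_95.items():
    print(f"{eb} kBT asynchronous: about {steps:.1f} steps to 95%")
```

Read this way, the asynchronous designs need both longer time steps and roughly twice as many of them, so the slower convergence is not explained by the step duration alone.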
Figure 9 summarizes the energy consumption observed for the
different components of the network (both synchronous and
asynchronous) corresponding to a target classification
accuracy of 96%. Neuron energy [Fig. 9(a)] refers to the
energy dissipated in the MTJ neuron due to the write-reset
currents flowing through the HM layer. The neuron energy
consumption is lowest for the 1kBT asynchronous design, at
1.15 pJ per image classification, and increases with the
size of the magnets up to 37.8 pJ per image classification
for the 20kBT synchronous design. This trend can be
explained by the increasing write-current requirements of
the nanomagnets as their sizes are increased. Since the
current flowing through the HM layer is first routed through
the resistive crossbar network (the synapses), the energy
consumption in the synapses [Fig. 9(c)] shows a similar
trend, increasing with the size of the magnets. Also, the
bias current required in the synchronous designs to bias the
switching probability of the MTJs to 0.5 adds to the power
dissipation in the HM layer and the synapses. The
energy-consumption values in the synapses per image
classification are 0.27 and 0.74 nJ for the 1 and 2kBT
asynchronous designs, and 1.3 and 6.5 nJ for the 10 and
20kBT synchronous designs. The read energy consumption,
illustrated in Fig. 9(b), is the sum of the energy
dissipated in the MTJ due to the read current passing
through it and the energy dissipated in the CMOS interface
circuitry. As the results indicate, the read energy
consumption per image classification is larger for the
asynchronous implementations (3.3 nJ for the 1kBT and
8.95 nJ for the 2kBT design) than for the synchronous
implementations (2.1 nJ for the 10kBT and 2.75 nJ for the
20kBT design). The majority of the read power dissipation
in asynchronous networks occurs in the CMOS inverters,
which are required to operate continuously due to the
parallel read-write nature of the neurons. In synchronous
networks, however, the CMOS inverters are required to
operate only during the read cycle, and can be deactivated
at other times using access
FIG. 9. (a) Energy consumption of the MTJ neuron, (b) energy consumption of the read circuit, (c) energy consumption of
the synapses, and (d) total energy consumption per image classification (for an accuracy of 96%) for the asynchronous (1kBT and
2kBT) and synchronous (10kBT and 20kBT) networks. Energies are plotted in nJ against the energy barrier.
FIG. 10. Average classification accuracy (measured over 50 independent Monte Carlo simulations) with variations in the
resistive synapses (%σ variations) for the (a) synchronous and (b) asynchronous designs.
transistors to save power. For both designs, the power
dissipated in the neurons is more than an order of magnitude
smaller than the power dissipated in the synapses and
the read circuit, owing to the low resistance of the HM
layer. As depicted in Fig. 9(d), the 10kBT synchronous
network shows the minimum energy requirement per
image classification (3.4 nJ), closely followed by the
1kBT asynchronous network (3.6 nJ). The 20kBT synchronous
network and the 2kBT asynchronous network consume
considerably more, at 9.28 and 9.7 nJ per image
classification, respectively. For the synchronous networks,
the energy consumption associated with the clocking
circuitry is negligible, especially since a classification
accuracy of 96% can be achieved in under 10 clock cycles,
and hence it is not considered in this analysis.
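The quoted totals can be cross-checked against the component values given in the text. The sketch below adds the synapse and read energies for each design and compares the sum with the quoted total. Neuron energy is stated only for the 1kBT and 20kBT designs, so it is not added per design. Instead, the largest quoted neuron energy (37.8 pJ) plus a rounding allowance is used as a tolerance, which is an assumption of this sketch.

```python
# Per-image energies quoted in the text, in nJ, keyed by barrier height in kBT.
# 1 and 2 are the asynchronous designs; 10 and 20 are the synchronous designs.
SYNAPSE_NJ = {1: 0.27, 2: 0.74, 10: 1.3, 20: 6.5}
READ_NJ = {1: 3.3, 2: 8.95, 10: 2.1, 20: 2.75}
QUOTED_TOTAL_NJ = {1: 3.6, 2: 9.7, 10: 3.4, 20: 9.28}
MAX_NEURON_NJ = 37.8e-3  # largest neuron energy quoted (20 kBT design)

# Sum of the two dominant components; neuron energy is left out on purpose.
synapse_plus_read = {eb: SYNAPSE_NJ[eb] + READ_NJ[eb] for eb in SYNAPSE_NJ}

# Gap between the quoted total and the two-component sum.
residual = {eb: QUOTED_TOTAL_NJ[eb] - synapse_plus_read[eb] for eb in SYNAPSE_NJ}

# Fraction of the two-component sum spent in the read circuit.
read_share = {eb: READ_NJ[eb] / synapse_plus_read[eb] for eb in SYNAPSE_NJ}

for eb in sorted(SYNAPSE_NJ):
    print(f"{eb:>2} kBT: synapse + read = {synapse_plus_read[eb]:.2f} nJ, "
          f"quoted total = {QUOTED_TOTAL_NJ[eb]:.2f} nJ, read share = {read_share[eb]:.0%}")
```

With these numbers the read circuit accounts for over 90% of the energy in both asynchronous designs and for a much smaller share in the synchronous ones, consistent with the discussion of always-on inverters.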
C. Effect of Variations
Most of the computations of the proposed network oc-
cur in the resistive crossbar array. Hence, any variations
in the resistive elements of the crossbar array can result
in a significant degradation of the classification accuracy.
To measure the effect of such variations, separate
experiments are performed allowing variations with a
standard deviation of up to 20% in the resistive elements.
According to the results (see Fig. 10), for variations in
the synapses with a standard deviation of 20%, the accuracy
loss is only 2.8% for the synchronous designs and 5.32%
for the asynchronous designs. The slightly higher accuracy
degradation observed in the asynchronous designs in
comparison to the synchronous designs can be explained
by the increased sensitivity of the MTJ switching
probability to the write current in the superparamagnetic
regime.
FIG. 11. Average classification accuracy (measured over 50 independent Monte Carlo simulations) with variations in the
supply voltage (up to 25 mV variations) for the (a) synchronous and (b) asynchronous designs.
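The synaptic-variation experiment can be illustrated for a single crossbar column. The sketch below is a minimal stand-in, not the simulation framework used in this work: it scales each resistance by an independent Gaussian factor and measures the spread in the resulting column current over 50 trials. The toy crossbar weights, the input spike pattern, the clipping of the Gaussian factor, and all function names are hypothetical. It does not model classification accuracy.

```python
import random
import statistics

G_O = 5e-6  # unit synaptic conductance from the text (S)


def perturb(conductances, sigma, rng):
    """Return a copy whose resistances are scaled by independent N(1, sigma) factors."""
    perturbed = []
    for row in conductances:
        new_row = []
        for g in row:
            # Assumed clip so that a rare extreme draw cannot make the resistance non-positive.
            factor = max(rng.gauss(1.0, sigma), 0.05)
            new_row.append(g / factor)  # R -> R * factor, hence G -> G / factor
        perturbed.append(new_row)
    return perturbed


def column_currents(conductances, spikes, delta_v):
    """Crossbar dot product: I_j = delta_v * sum_i spikes[i] * G[i][j]."""
    n_cols = len(conductances[0])
    return [delta_v * sum(s * row[j] for s, row in zip(spikes, conductances))
            for j in range(n_cols)]


def relative_current_spread(conductances, spikes, delta_v, sigma, trials=50, seed=0):
    """Std. dev. of the first column's current over the trials, relative to its nominal value."""
    rng = random.Random(seed)
    nominal = column_currents(conductances, spikes, delta_v)[0]
    samples = [column_currents(perturb(conductances, sigma, rng), spikes, delta_v)[0]
               for _ in range(trials)]
    return statistics.pstdev(samples) / nominal


# Hypothetical 4-input, 2-output crossbar; the weights are illustrative only.
toy_crossbar = [[1.0 * G_O, 0.2 * G_O],
                [0.5 * G_O, 0.8 * G_O],
                [0.3 * G_O, 0.6 * G_O],
                [0.9 * G_O, 0.1 * G_O]]
toy_spikes = [1, 0, 1, 1]

for sigma in (0.0, 0.05, 0.20):
    spread = relative_current_spread(toy_crossbar, toy_spikes, 0.1, sigma)
    print(f"sigma = {sigma:.0%}: relative current spread = {spread:.3f}")
```

Because each column current is a sum over many synapses, independent resistance errors partly average out, which is one plausible reading of why both designs tolerate fairly large synaptic variations.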
Because of the low operating currents of the nanomagnets
used in the asynchronous design, the operating voltage
of the crossbar architecture, given by δV = Io/Go, can be
very small for low-barrier-height magnets. Hence, any
variation in the supply voltage can potentially result in a
large deviation in the write-current magnitude, influencing
the classification accuracy of the network. Figure 11
depicts the behavior of the classification accuracy of the
two designs in the presence of supply-voltage variation.
As shown in Fig. 11(a), owing to the larger supply voltages
used in the synchronous designs, the 10 and 20kBT
synchronous implementations are resilient to supply-voltage
variations of up to 25 mV. The asynchronous
implementations, on the other hand, exhibit an accuracy
degradation of up to 6.1% over the same range of
supply-voltage variation (up to 25 mV).
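The difference in sensitivity follows from simple arithmetic on the supply voltages quoted in Sec. III. For a fixed crossbar conductance the write current is proportional to δV, so a given absolute supply deviation produces the same fractional deviation in the write current. The sketch below evaluates that fraction for a 25 mV deviation. It assumes an ideal ohmic crossbar, and the names are illustrative.

```python
# Supply voltages quoted in the text (V), keyed by barrier height in kBT.
SUPPLY_VOLTAGE_V = {1: 0.1, 2: 0.11, 10: 1.05, 20: 2.0}
MAX_VARIATION_V = 25e-3  # largest supply-voltage variation considered in Fig. 11

# Fractional change in dV, and hence in write current for an ohmic crossbar.
relative_variation = {eb: MAX_VARIATION_V / v for eb, v in SUPPLY_VOLTAGE_V.items()}

for eb, frac in relative_variation.items():
    print(f"{eb:>2} kBT: 25 mV is {frac:.1%} of dV = {SUPPLY_VOLTAGE_V[eb]} V")
```

A 25 mV deviation is roughly a quarter of the supply for the asynchronous designs but only about 1 to 2% for the synchronous ones.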
As explained in Sec. II, the CMOS inverter read circuit
for the asynchronous implementation must be designed
carefully so that the average magnetization of the
nanomagnet is properly reflected in the average output of
the inverter. Any variation in the CMOS circuitry can offset
the average output of the inverters, adversely affecting
the classification accuracy of the network. As depicted
in Fig. 12, the classification accuracy of the 1kBT
asynchronous network decreases by 3% and that of the 2kBT
asynchronous network decreases by 0.7% for the worst-case
corner with 2σ variations in the CMOS read circuit. The
synchronous networks are resilient to such CMOS variations,
since the read time is selected to be adequate for a
correct read even at the worst-case corner.
D. Effect of Temperature
In this work, the switching characteristics of the MTJs
are varied between the telegraphic and nontelegraphic
regimes by adjusting the width of the FL appropriately.
However, the switching characteristics of the MTJs can
deviate significantly from design values as the operating
temperature changes. Figure 13 depicts how the
classification accuracy of the two designs varies as the
operating temperature is changed from 200 to 400 K. As the
simulation results show, the two synchronous networks
are resilient to variations in temperature and show an
accuracy degradation of less than 0.4% at 400 K. The two
asynchronous networks, on the other hand, are not as
resilient to variations in temperature. The 1kBT network
displays an accuracy degradation of 0.71% at 400 K and
0.6% at 200 K, while the 2kBT network displays an
accuracy degradation of 2.8% at 400 K and 3.2% at 200 K.
The higher temperature dependency of the 2kBT network
can be explained by the change in the switching
characteristics of the MTJs at different temperatures. As
illustrated in Fig. 14, the average inverter output of the
2kBT magnet displays a larger shift with temperature
than that of the 1kBT magnet, resulting in a higher accuracy
degradation.
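One way to build intuition for the larger shift of the 2kBT magnet is to look at how the barrier-to-thermal-energy ratio changes with temperature. The sketch below rests on several assumptions that are not taken from this work: the quoted barrier heights are treated as defined at 300 K, EB itself is treated as temperature independent, and the retention time is taken to follow the Arrhenius form τ = τ0 exp(EB/kBT) with a constant τ0. The Arrhenius form is only a rough guide for barriers as low as 1 to 2 kBT, so the numbers are qualitative.

```python
import math

DESIGN_TEMPERATURE_K = 300.0  # assumed temperature at which the barrier heights are quoted


def thermal_stability(barrier_kbt_at_design, temperature_k):
    """EB / (kB * T), assuming EB itself does not change with temperature."""
    return barrier_kbt_at_design * DESIGN_TEMPERATURE_K / temperature_k


def retention_time_ratio(barrier_kbt_at_design, temperature_k):
    """tau(T) / tau(design temperature) under the assumed Arrhenius form."""
    delta_at_t = thermal_stability(barrier_kbt_at_design, temperature_k)
    return math.exp(delta_at_t - barrier_kbt_at_design)


for barrier in (1.0, 2.0):
    for temperature in (200.0, 400.0):
        ratio = retention_time_ratio(barrier, temperature)
        print(f"{barrier:.0f} kBT magnet at {temperature:.0f} K: "
              f"EB/kBT = {thermal_stability(barrier, temperature):.2f}, "
              f"retention time x{ratio:.2f}")
```

Under these assumptions the retention time of the 2kBT magnet changes by a larger factor than that of the 1kBT magnet at both temperature extremes, which points in the same direction as the larger shift seen in Fig. 14(b).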
FIG. 12. Average classification accuracy for the worst-case
corner, with variations in the CMOS read circuit (up to ±2σ
variation), for the asynchronous design.
FIG. 13. Classification accuracy with varying operating temperature for the (a) synchronous and (b) asynchronous designs.
FIG. 14. Average inverter output voltage (V) versus current (μA) at 200, 300, and 400 K for (a) the 1kBT magnet and (b) the 2kBT magnet.
IV. SUMMARY
In this paper, we outline the design considerations
for MTJ-based stochastic SNNs with varying barrier
heights. We show that the reduced energy consumption
of low-barrier-height magnets is achieved at the
expense of reduced error and variation tolerance and
a constrained design space for the CMOS peripherals.
We further show that, contrary to the popular belief
that superparamagnetic MTJs are more energy efficient
than high-barrier-height magnets, the parallel and
always-on “read” and “write” operations in superparamagnets
cause the peripheral read-circuit energy consumption
to dominate the network energy-consumption profile.
While scaling of the peripheral CMOS technology
reduces the peripheral energy consumption, reduced
error tolerance might still be a concern for spin-based
neuromorphic hardware design. The analysis performed
in this work can be easily extended to other applications
that require probabilistic inference, for example, Bayesian
networks and Ising computing.
ACKNOWLEDGMENT
The work was supported, in part, by the Center
for Spintronic Materials, Interfaces, and Novel Archi-
tectures (C-SPIN), a MARCO- and DARPA-sponsored
StarNet center, by the Semiconductor Research Corpo-
ration (SRC), the National Science Foundation (NSF),
Intel Corporation, and the U.S. DoD Vannevar Bush Fel-
lowship.
[1] Duygu Kuzum, Rakesh GD Jeyasingh, Byoungil Lee,
and H-S Philip Wong, “Nanoelectronic programmable
synapses based on phase change materials for brain-
inspired computing,” Nano letters 12, 2179–2186 (2011).
[2] Sung Hyun Jo, Ting Chang, Idongesit Ebong, Bhav-
itavya B Bhadviya, Pinaki Mazumder, and Wei Lu,
“Nanoscale memristor device as synapse in neuromorphic
systems,” Nano letters 10, 1297–1301 (2010).
[3] Abhronil Sengupta and Kaushik Roy, “A vision for all-
spin neural networks: A device to system perspective,”
IEEE Transactions on Circuits and Systems I: Regular
Papers 63, 2267–2277 (2016).
[4] Rubén Moreno-Bote, “Poisson-like spiking in circuits
with probabilistic synapses,” PLoS Comput Biol 10,
e1003522 (2014).
[5] Abhronil Sengupta, Maryam Parsa, Bing Han, and
Kaushik Roy, “Probabilistic deep spiking neural systems
enabled by magnetic tunnel junction,” IEEE Transac-
tions on Electron Devices 63, 2963–2970 (2016).
[6] Gopalakrishnan Srinivasan, Abhronil Sengupta, and
Kaushik Roy, “Magnetic tunnel junction based long-term
short-term stochastic synapse for a spiking neural network
with on-chip STDP learning,” Scientific Reports 6,
29545 (2016).
[7] Yong Shim, Akhilesh Jaiswal, and Kaushik Roy, “Ising
spin model using spin-Hall effect (SHE) induced magnetization
reversal in magnetic-tunnel-junction,” Journal of
Applied Physics 121 (2017).
[8] Brian Sutton, Kerem Yunus Camsari, Behtash Behin-
Aein, and Supriyo Datta, “Intrinsic optimization using
stochastic nanomagnets,” Scientific Reports 7 (2017).
[9] Steven Lequeux, Joao Sampaio, Vincent Cros, Kay
Yakushiji, Akio Fukushima, Rie Matsumoto, Hitoshi
Kubota, Shinji Yuasa, and Julie Grollier, “A magnetic
synapse: multilevel spin-torque memristor with perpen-
dicular anisotropy,” Scientific reports 6, 31510 (2016).
[10] Abhronil Sengupta, Priyadarshini Panda, Parami Wi-
jesinghe, Yusung Kim, and Kaushik Roy, “Magnetic tun-
nel junction mimics stochastic cortical spiking neurons,”
Scientific Reports 6, 30039 (2016).
[11] Adrien F Vincent, Jérôme Larroque, Nicolas Locatelli,
Nesrine Ben Romdhane, Olivier Bichler, Christian Gam-
rat, Wei Sheng Zhao, Jacques-Olivier Klein, Sylvie
Galdin-Retailleau, and Damien Querlioz, “Spin-transfer
torque magnetic memory as a stochastic memristive
synapse for neuromorphic systems,” IEEE transactions
on biomedical circuits and systems 9, 166–174 (2015).
[12] Julie Grollier, Damien Querlioz, and Mark D Stiles,
“Spintronic nanodevices for bioinspired computing,” Pro-
ceedings of the IEEE 104, 2024–2039 (2016).
[13] Gopalakrishnan Srinivasan, Abhronil Sengupta, and
Kaushik Roy, “Magnetic tunnel junction enabled all-
spin stochastic spiking neural network,” in 2017 Design,
Automation & Test in Europe Conference & Exhibition
(DATE) (IEEE, 2017) pp. 530–535.
[14] Luqiao Liu, Chi-Feng Pai, Y Li, HW Tseng, DC Ralph,
and RA Buhrman, “Spin-torque switching with the giant
spin Hall effect of tantalum,” Science 336, 555–558
(2012).
[15] Chi-Feng Pai, Luqiao Liu, Y Li, HW Tseng, DC Ralph,
and RA Buhrman, “Spin transfer torque devices utilizing
the giant spin Hall effect of tungsten,” Applied Physics
Letters 101, 122404 (2012).
[16] Akhilesh Jaiswal, Xuanyao Fong, and Kaushik Roy,
“Comprehensive scaling analysis of current induced
switching in magnetic memories based on in-plane and
perpendicular anisotropies,” IEEE Journal on Emerging
and Selected Topics in Circuits and Systems 6, 120–133
(2016).
[17] Jiang Xiao, Andrew Zangwill, and Mark D Stiles, “Boltzmann
test of Slonczewski’s theory of spin-transfer torque,”
Physical Review B 70, 172405 (2004).
[18] Sasikanth Manipatruni, Dmitri E Nikonov, and Ian A
Young, “Energy-delay performance of giant spin Hall
effect switching for dense magnetic memory,” Applied
Physics Express 7, 103001 (2014).
[19] William Fuller Brown Jr, “Thermal fluctuations of
a single-domain particle,” Physical Review 130, 1677
(1963).
[20] Werner Scholz, Thomas Schrefl, and Josef Fidler, “Mi-
cromagnetic simulation of thermally activated switching
in fine particles,” Journal of Magnetism and Magnetic
Materials 233, 296–304 (2001).
[21] Amikam Aharoni, “Demagnetizing factors for rectangu-
lar ferromagnetic prisms,” Journal of applied physics 83,
3432–3434 (1998).
[22] L Lopez-Diaz, L Torres, and E Moro, “Transition from
ferromagnetism to superparamagnetism on the nanosec-
ond time scale,” Physical Review B 65, 224406 (2002).
[23] Xuanyao Fong, Sumeet K Gupta, Niladri N Mojumder,
Sri Harsha Choday, Charles Augustine, and Kaushik
Roy, “KNACK: A hybrid spin-charge mixed-mode simulator
for evaluating different genres of spin-transfer torque
MRAM bit-cells,” in Simulation of Semiconductor Processes
and Devices (SISPAD), 2011 International Conference
on (IEEE, 2011) pp. 51–54.
[24] Abhronil Sengupta, Aparajita Banerjee, and Kaushik
Roy, “Hybrid spintronic-CMOS spiking neural network
with on-chip learning: Devices, circuits, and systems,”
Physical Review Applied 6, 064003 (2016).
[25] Mirko Prezioso, Farnood Merrikh-Bayat, BD Hoskins,
GC Adam, Konstantin K Likharev, and Dmitri B
Strukov, “Training and operation of an integrated neu-
romorphic network based on metal-oxide memristors,”
Nature 521, 61–64 (2015).
[26] Rasmus Berg Palm, “Prediction as a candidate for learn-
ing deep hierarchical models of data,” Technical Univer-
sity of Denmark 5 (2012).
