SPINBIS: Spintronics based Bayesian Inference System with Stochastic
  Computing by Jia, Xiaotao et al.
Xiaotao Jia, Member, IEEE, Jianlei Yang, Member, IEEE, Pengcheng Dai, Runze Liu,
Yiran Chen, Fellow, IEEE and Weisheng Zhao, Fellow, IEEE
Abstract—Bayesian inference is an effective approach for
solving statistical learning problems, especially with uncertainty
and incompleteness. However, Bayesian inference is a computation-intensive task whose efficiency is physically limited by the bottlenecks of conventional computing platforms. In this work, a spintronics based stochastic computing approach is proposed for efficient Bayesian inference. The inherent stochastic switching behaviors of spintronic devices are exploited to build stochastic bitstream generators (SBGs) for stochastic computing with hybrid CMOS/MTJ circuit design. To improve the inference efficiency, an SBG sharing strategy is leveraged to reduce the required SBG array scale by integrating a switch network between the SBG array and the stochastic computing logic. A device-to-architecture level framework is proposed to evaluate the performance of the spintronics based Bayesian inference system (SPINBIS). Experimental results on data fusion applications show that SPINBIS improves energy efficiency by about 12× over the MTJ-based approach with 45% design area overhead, and by about 26× over the FPGA-based approach.
Index Terms—Bayesian Inference, Stochastic Computing, Spin-
tronics, Magnetic Tunnel Junction
I. INTRODUCTION
THE rise of deep learning has greatly promoted the development of artificial intelligence. However, most deep learning approaches require large-scale training data and are prone to overfitting. Moreover, they can neither represent the uncertainty and incompleteness of the world nor take advantage of well-studied human experience. To overcome these limitations, some researchers turn to Bayesian inference or combine Bayesian approaches with deep learning. Bayesian inference provides a powerful approach for information fusion, reasoning and decision making, which has established it as the key
Manuscript received in May 2018, revised in September 2018, October 2018 and December 2018, and accepted in January 2019. This work was supported in part by the National Natural Science Foundation of China (61602022, 61501013, 61571023, 61521091 and 1157040329), State Key Laboratory of Software Development Environment (SKLSDE-2018ZX-07), National Key Technology Program of China (2017ZX01032101), CCF-Tencent IAGR20180101 and the 111 Talent Program B16001. Xiaotao Jia and Jianlei Yang contributed equally to this work. Corresponding authors are Jianlei Yang and Weisheng Zhao.
X. Jia, P. Dai and W. Zhao are with Beijing Advanced Innovation Center for Big Data and Brain Computing, Fert Beijing Research Institute, School of Microelectronics, Beihang University, Beijing, 100191, China. E-mail:
weisheng.zhao@buaa.edu.cn
J. Yang and R. Liu are with Beijing Advanced Innovation Center for
Big Data and Brain Computing, Fert Beijing Research Institute, School of
Computer Science and Engineering, Beihang University, Beijing, 100191,
China. E-mail: jianlei@buaa.edu.cn
Y. Chen is with Department of Electrical and Computer Engineering, Duke
University, Durham, NC 27708, USA.
[Fig. 1 diagram: an RNG, a binary number, a clock and a comparator (Comp.) with inputs x and y, producing the output bitstream.]
Fig. 1: Traditional stochastic bitstream generator (SBG) circuits. If x > y, the comparator outputs ‘1’; otherwise it outputs ‘0’.
tool for data-efficient learning, uncertainty quantification and
robust model composition. It is widely used in artificial intelligence and expert system applications, such as multi-sensor fusion [1] and Bayesian belief networks [2]. Recently, Bayesian learning has drawn great attention in the deep learning community and has been combined with many deep neural networks [3].
The foundation of Bayesian inference is Bayes’ rule, which can be implemented by probabilistic computing. Probabilistic computing is a computation-intensive task that is inefficient in a deterministic computation mode [4]. Stochastic computing (SC) is an unconventional computing mode that has been observed to be suitable for efficient probabilistic computing [5], with high error tolerance and low-cost implementations of arithmetic operations. However, it is difficult to leverage the parallelism of stochastic computing algorithms on traditional von Neumann architectures [6]. Hence, reconfigurable approaches [7] and analog computing [8] [9] have been utilized to realize stochastic computing and improve the efficiency of Bayesian inference. Stochastic computing is usually realized by bit-wise operations on stochastic bitstreams, which are created by pseudo-random number generators (RNGs) and comparators as shown in Fig. 1. Stochastic bitstream generators (SBGs) are critical for performing stochastic computing, yet they remain expensive to implement on von Neumann architectures with CMOS technologies.
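As a minimal sketch of the conventional SBG in Fig. 1, the Python code below compares a fixed binary number with a fresh random number on every clock cycle. It assumes that x is the binary number and y is the RNG output, and it uses Python's `random.Random` as a stand-in for a hardware RNG such as an LFSR; the function name `comparator_sbg` is hypothetical.

```python
import random

def comparator_sbg(x, n_bits, length, seed=0):
    """Comparator-based SBG (illustrative sketch).

    Assumption: x is the n_bits-wide binary number and y is the RNG output,
    so each clock cycle emits '1' when x > y, giving P('1') = x / 2**n_bits.
    random.Random stands in for a hardware RNG such as an LFSR.
    """
    rng = random.Random(seed)
    stream = []
    for _ in range(length):                # one comparison per clock cycle
        y = rng.randrange(2 ** n_bits)     # uniform in [0, 2**n_bits)
        stream.append(1 if x > y else 0)   # comparator output bit
    return stream

bits = comparator_sbg(x=64, n_bits=8, length=4096)
print(sum(bits) / len(bits))  # close to 64 / 256 = 0.25
```

The quality of such a generator depends entirely on the RNG, which is why CMOS SBGs built this way carry the area and power cost discussed above.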
Recently, spintronic devices (such as the magnetic tunnel junction, MTJ) have shown promising advantages for generating random numbers because of their stochastic switching behaviors [11]. As shown in Fig. 2, an MTJ device usually switches in a nondeterministic manner depending on the applied bias voltage and duration, due to the inherent thermal fluctuation of magnetization. Such stochastic switching behavior has been exploited to generate random numbers efficiently [12]–[14]. Consequently, the inherent randomness of spintronic devices can serve as the stochastic resource for performing stochastic computing.
In this paper, a spintronic device based stochastic computing approach is proposed to build an efficient spintronics based Bayesian inference system (SPINBIS).
arXiv:1902.06886v1 [cs.ET] 19 Feb 2019
Fig. 2: Experimental measurements of the switching probabil-
ity with respect to the duration of the applied programming
pulse, for different programming voltages [10].
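[Fig. 3 diagram: SBGx1 (4/8, 10010110) and SBGx2 (6/8, 01011111) drive an AND gate whose output is 00010110; a MUX selected by SBGx4 (2/8, 10010000) chooses between this stream and SBGx3 (7/8, 11111011), giving 01111011, which a counter converts to y = 6/8. Internal signals are labeled s1 to s6.]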
Fig. 3: Stochastic circuit that realizes the arithmetic function
y = x1x2x4 + x3(1− x4).
The main contributions of this work are listed as follows:
• An efficient stochastic bitstream generator is proposed by leveraging the stochastic switching behaviors of MTJ devices. The generated bitstreams have very low correlation, which is critical for stochastic computing accuracy. A state-aware self-control strategy is adopted to improve the SBG efficiency.
• An SBG sharing strategy is leveraged to reduce the required SBG array scale by integrating a switch network between the SBG array and the stochastic computing logic. This strategy greatly reduces the power consumption of SPINBIS.
• A device-to-architecture level framework is built to evaluate the performance of SPINBIS on data fusion applications. Experimental results indicate that SPINBIS achieves significant improvements in inference efficiency in terms of power, area and speed.
The remainder of this paper is organized as follows. Section II presents preliminaries and related work. The architecture of SPINBIS, together with the SBG sharing technique, is presented in Section III. Section IV describes the SBG circuit design and the state-aware self-control strategy. A device-to-architecture evaluation framework and experimental results on typical applications are presented in Section V. Concluding remarks are given in Section VI.
II. BACKGROUND
A. Stochastic Computing
Stochastic computing was first introduced in the 1960’s by von Neumann [15]. The basic idea of stochastic computing is to represent a probability value p by the fraction of ‘1’s in a binary bitstream. The representation of p by a stochastic bitstream is therefore not unique: the value of the bitstream
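[Fig. 4 diagram: (a) MTJ stack of free layer, oxide barrier and reference layer; (b) 1T1MTJ cell with bit-line (BL), word-line (WL), source-line (SL), an NMOS access transistor, and write currents IP→AP and IAP→P.]
Fig. 4: (a) The typical structure of PMA-MTJ. (b) The circuit schematic view of 1T1MTJ structure.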
depends only on its length and the number of ‘1’s, not on their positions. There are two encoding formats for stochastic bitstreams: unipolar and bipolar. The value range of the unipolar format is [0, 1], while that of the bipolar format is [−1, 1]. If the bitstream length is n and k of its bits are ‘1’s, the probability value is p = k/n in the unipolar format or p = (2k − n)/n in the bipolar format. In this work, the unipolar format is adopted because p ∈ [0, 1] in Bayesian inference problems.
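The two decodings can be checked with a short Python sketch; the helper names are illustrative, not from the paper, and bitstreams are written as strings of ‘0’ and ‘1’. The first two streams also show that different bit patterns encode the same value.

```python
def unipolar_value(bits):
    """Unipolar decoding: p = k / n, k = number of '1's in an n-bit stream."""
    return bits.count("1") / len(bits)

def bipolar_value(bits):
    """Bipolar decoding: p = (2k - n) / n, mapping the stream onto [-1, 1]."""
    k, n = bits.count("1"), len(bits)
    return (2 * k - n) / n

# The value ignores bit positions, so the representation is not unique.
for s in ["10010110", "11110000", "01011111"]:
    print(s, unipolar_value(s), bipolar_value(s))
# 10010110 0.5 0.0
# 11110000 0.5 0.0
# 01011111 0.75 0.5
```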
Arithmetic operations in stochastic computing are realized
by using simple logic gates. For example, the multiplication
operation is achieved by an AND gate and scaled addition is
achieved by a MUX as shown in Fig. 3. Although there is a slight loss in computation accuracy, stochastic computing can significantly improve computation energy efficiency compared with conventional methods [16]–[21]. It is therefore well suited to inherently error-resilient applications, where it trades accuracy constraints against energy efficiency requirements [22]–[24].
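To make the gate-level behavior of Fig. 3 concrete, the sketch below replays the figure's bitstreams through an AND gate and a MUX. The wiring (x1 AND x2 feeding the MUX input selected when x4 = 1, with x3 on the other input) is inferred from the caption's function y = x1x2x4 + x3(1 − x4) and the bit patterns in the figure, so treat it as an assumption; the helper names are hypothetical.

```python
def AND(a, b):
    """Bit-wise AND: multiplies two independent unipolar streams."""
    return "".join("1" if p == q == "1" else "0" for p, q in zip(a, b))

def MUX(sel, a, b):
    """Per-bit 2:1 MUX: takes a where sel is '1', otherwise b (scaled addition)."""
    return "".join(p if s == "1" else q for s, p, q in zip(sel, a, b))

def value(bits):
    """Unipolar value of a bitstream."""
    return bits.count("1") / len(bits)

# Bitstreams read from Fig. 3
x1, x2, x3, x4 = "10010110", "01011111", "11111011", "10010000"

s = AND(x1, x2)        # product stream x1*x2
y = MUX(x4, s, x3)     # x1*x2 where x4 = 1, x3 elsewhere
print(s, y, value(y))  # 00010110 01111011 0.75

# Exact value of y = x1*x2*x4 + x3*(1 - x4) from the stream values
exact = value(x1) * value(x2) * value(x4) + value(x3) * (1 - value(x4))
print(exact)           # 0.75
```

In this hand-picked example the counted output equals the exact value; in general the counter only yields an estimate whose error depends on bitstream length and correlation.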
Stochastic computing is not an exact computing method, and its accuracy loss arises from several sources. First, probability values p are usually converted to stochastic bitstreams with lower quantization accuracy than fixed- or floating-point methods. Second, correlations between different bitstreams usually degrade the computation accuracy, since these bitstreams are usually obtained from pseudo-random number generators. To improve the quality of SBGs, pioneering researchers have proposed several SBG designs, such as linear feedback shift registers (LFSRs) [25]–[27] and weighted binary SNGs [28]. However, such CMOS based approaches usually suffer from bottlenecks in power consumption and chip area efficiency. Consequently, approaches based on emerging devices are investigated in this work.
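The effect of correlation can be illustrated with a deterministic Python sketch, assuming three hand-chosen 8-bit streams of value 0.5: ANDing two uncorrelated streams gives the correct product 0.25, whereas ANDing a stream with itself or with its complement does not.

```python
def AND(a, b):
    """Bit-wise AND of two bitstreams (SC multiplication)."""
    return "".join("1" if p == q == "1" else "0" for p, q in zip(a, b))

def value(bits):
    """Unipolar value of a bitstream."""
    return bits.count("1") / len(bits)

x = "10101010"  # value 0.5
y = "11001100"  # value 0.5, uncorrelated with x
z = "01010101"  # value 0.5, complement of x

print(value(AND(x, y)))  # 0.25: equals 0.5 * 0.5
print(value(AND(x, x)))  # 0.5: fully correlated, AND just returns x
print(value(AND(x, z)))  # 0.0: anti-correlated streams never overlap
```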
B. Magnetic Tunnel Junction (MTJ) Device
Fig. 4(a) shows a typical structure of the MTJ device with perpendicular magnetic anisotropy (PMA) [29]. An MTJ is a sandwich structure consisting of two ferromagnetic (FM) layers separated by a tunneling barrier layer. One FM layer is the reference layer (PL) with a fixed magnetization direction. The other FM layer is the free layer (FL), whose magnetization direction can be parallel or anti-parallel to that of the PL. The MTJ resistance is determined by the relative magnetization directions of the PL and FL: parallel (P) magnetization gives the low-resistance (RP) state (logic ‘0’), and anti-parallel (AP) magnetization gives the high-resistance (RAP) state (logic ‘1’). The tunnel magnetoresistance ratio TMR = (RAP − RP)/RP characterizes the relationship between RP and RAP. Fig. 4(b) shows the circuit schematic of a popular 1T1MTJ memory cell. The MTJ state can be flipped by injecting a spin-polarized current through the spin transfer torque (STT) mechanism. The switching current is controlled by the voltage between the bit-line (BL) and the source-line (SL), and the nMOS transistor serves as a switch controlled by the word-line (WL). As shown in Fig. 4(b), the MTJ is switched from the P state to the AP state if the injected current (IP→AP) flows through the MTJ from FL to PL. Conversely, the MTJ is switched from the AP state to the P state if IAP→P is injected. The MTJ state is flipped only if the current driven by the applied bias voltage exceeds the critical current Ic0 for a sufficient duration, as shown in Fig. 2.
The stochastic behavior of MTJ switching has been revealed in [11] and results from the unavoidable thermal fluctuations of magnetization [30]. The MTJ device switches in a stochastic manner depending on the applied voltage magnitude and duration, as shown in Fig. 2, so each switching attempt can be represented as a random event with probability p. Such stochastic behaviors have been evaluated in MRAM designs [31], neural networks [32] [33] and random number generators [34]. This inherent probabilistic switching property is therefore a very promising means of generating stochastic bitstreams for stochastic computing.
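Treating each switching attempt as a random event with probability p, an MTJ-based SBG can be sketched behaviorally as below. This is a hypothetical model, not a device model: `p_switch` abstracts the dependence on bias voltage and pulse duration shown in Fig. 2, Python's `random` module stands in for thermal fluctuation, and the reset/write/read sequence follows the three stages described in Section IV.

```python
import random

P, AP = 0, 1  # parallel (low resistance) and anti-parallel (high resistance)

def mtj_sbg(p_switch, length, seed=0):
    """Behavioral MTJ-based SBG (hypothetical sketch).

    p_switch: assumed P->AP switching probability for the chosen write pulse.
    Each bit is produced by a reset, a stochastic write and a read.
    """
    rng = random.Random(seed)
    bits = []
    for _ in range(length):
        state = P                              # reset: force the MTJ to P
        if rng.random() < p_switch:            # write: stochastic P -> AP switch
            state = AP
        bits.append(1 if state == AP else 0)   # read: AP (high R) -> '1'
    return bits

stream = mtj_sbg(p_switch=0.3, length=5000)
print(sum(stream) / len(stream))  # close to 0.3
```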
Recently, a simple MTJ based SBG was proposed in [12], but many circuit details are missing. In [35], an MTJ based analog-to-stochastic converter is proposed for stochastic computation in vision chips. In [17], MTJ based stochastic computing is integrated into artificial neural network applications. However, the energy efficiency of these SBGs is relatively low. Furthermore, they do not consider the correlation between different SBGs, which can significantly degrade the computation accuracy. Voltage-controlled MTJs (VC-MTJs) are introduced for stochastic computing in [14] to reduce power consumption, but each SBG involves too many MTJs. Bitstream correlation is also discussed in [14]; however, the proposed shuffle operation cannot fundamentally remove the correlation and may still lead to unexpected results.
III. THE SPINBIS ARCHITECTURE
A. Motivation
A typical Bayesian inference system (BIS) [36] is shown in Fig. 5. The input of the BIS is a set of bias voltages corresponding to evidence or likelihood. An SBG array generates stochastic bitstreams (BS) according to the input voltages, and the bitstreams are processed by the subsequent stochastic computing logic, which is determined by the given application. There are two major concerns in realizing Bayesian inference applications on such a system. The first concern is that a large number of SBGs is usually required, because each piece of evidence is represented by one SBG. As we have observed in many applications, especially large-scale ones, many pieces of evidence have the same probabilities and may share the same SBGs to reduce
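[Fig. 5 diagram: analog input (voltage) → SBG array → bitstreams (BS) → application-specific stochastic computing logic → results.]
Fig. 5: A typical Bayesian inference system [36].
[Fig. 6 diagram blocks: digital input, switch controller, switch matrix, SBG array, SBG sharing strategy, conflict set extraction, application-specific stochastic computing logic and results; bitstreams (BS) flow from the SBG array through the switch matrix to the stochastic computing logic.]
Fig. 6: The proposed SPINBIS architecture.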
the required SBG array scale. The second concern is that digital-to-analog converters (DACs) are usually required to convert the digital input sources into analog bias voltages [36]. Moreover, the bias voltage margin is usually very small, so high-accuracy DACs are required, and the resulting design overhead is difficult to tolerate. In this work, SPINBIS is proposed to overcome these two disadvantages.
B. Overview of SPINBIS
SPINBIS is a spin-based Bayesian inference system. As shown in Fig. 6, unlike the previous approach [36], SPINBIS exploits an SBG sharing strategy to significantly reduce the required SBG array size. The SBG sharing strategy allows inputs with the same evidence to be represented by bitstreams generated from the same SBG. However, inputs that are connected together by one or more logic gates are regarded as conflicting with each other, and conflicting inputs are not allowed to share the same SBG. As shown in Fig. 6, the conflict sets, which capture these conflicting relationships, are extracted from the stochastic computing logic. The stochastic computing logic block is determined by the specified application. The SBG array is pre-built according to the specified application, and the generated bitstreams are assigned to the stochastic computing logic by a switch matrix controlled by the digital inputs [37]. The inputs of the switch matrix are the bitstreams generated by the SBG array, and its outputs are connected to the stochastic computing logic. The switch matrix is a crossbar structure in which each cross point is realized by a transistor controlled by the switch controller. In summary, the bitstreams from the SBG array are assigned to the stochastic computing logic through the switch matrix, which is controlled by the switch controller.
[Fig. 7 diagram: (a) logic diagram with input terminals T1 to T9 and outputs R1 and R2; (b) conflict sets {T1, T2, T5}, {T3, T4, T5} and {T6, T7, T8, T9} (Sets 1 to 3). Terminals in the same set cannot share the same bitstream.]
Fig. 7: Stochastic computing logic diagram and its conflict sets. Terminals T1 ∼ T9 are assumed to have the probabilities {p1, p2, p1, p3, p1, p4, p5, p3, p3}.
C. Stochastic Computing Logic and Conflict Set
One of the most attractive advantages of stochastic computing is that the involved arithmetic operations can be efficiently realized by simple logic gates, such as AND and MUX. The stochastic computing logic is determined once the application is given. Logic with N inputs requires N bitstreams from the switch matrix. In the example of Fig. 7(a), there are two independent sub-circuits with 9 inputs (T1, T2, · · · , T9) and 2 outputs, R1 and R2. Without SBG sharing, as in the Bayesian inference system of [36], 9 bitstreams from an SBG array with 9 SBG circuits are required.
Suppose that bsi denotes the input bitstream of terminal Ti and pi is the input probability of terminal Ti, and let p(bs) denote the probability represented by bitstream bs. The stochastic computing logic in Fig. 7(a) is built to realize Eqn. (1), where ¬bs5 denotes the bit-wise complement of bs5:

$$
\begin{aligned}
p_{R1} &= p_1 p_2 p_5 + p_3 p_4 (1 - p_5) = p(bs_1 \wedge bs_2 \wedge bs_5) + p(bs_3 \wedge bs_4 \wedge \neg bs_5) \\
p_{R2} &= p_6 p_7 p_8 p_9 = p(bs_6 \wedge bs_7 \wedge bs_8 \wedge bs_9)
\end{aligned}
\tag{1}
$$
As seen from Eqn. (1), AND operations are executed among {bs1, bs2, bs5}, so these bitstreams are defined as conflicting with each other according to the stochastic computing principle, and T1, T2 and T5 form a conflict set as shown in Fig. 7(b). Similarly, T3, T4 and T5 form another conflict set, as do T6, T7, T8 and T9. Input terminals in the same conflict set are not allowed to share the same bitstream source from the SBG array even if they have the same input evidence. Otherwise, input terminals with the same input evidence are allowed to share the same bitstream.
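The sharing rule can be written as a small predicate. The sketch below hard-codes the conflict sets of Fig. 7(b) and the terminal probabilities from the caption of Fig. 7; the names `CONFLICT_SETS`, `PROB` and `can_share` are illustrative only.

```python
# Conflict sets of Fig. 7(b) and terminal probabilities from the Fig. 7 caption
CONFLICT_SETS = [{"T1", "T2", "T5"}, {"T3", "T4", "T5"}, {"T6", "T7", "T8", "T9"}]
PROB = {"T1": "p1", "T2": "p2", "T3": "p1", "T4": "p3", "T5": "p1",
        "T6": "p4", "T7": "p5", "T8": "p3", "T9": "p3"}

def in_same_conflict_set(a, b, conflict_sets=CONFLICT_SETS):
    """True if terminals a and b appear together in any conflict set."""
    return any(a in s and b in s for s in conflict_sets)

def can_share(a, b, prob=PROB, conflict_sets=CONFLICT_SETS):
    """Distinct terminals may share one SBG bitstream only if they need the
    same probability and never meet in a common conflict set."""
    return a != b and prob[a] == prob[b] and not in_same_conflict_set(a, b, conflict_sets)

print(can_share("T1", "T3"), can_share("T1", "T5"), can_share("T8", "T9"))
# True False False
```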
D. SBG Array and SBG Sharing Strategy
As shown in Fig. 2, the MTJ switching probability depends on the bias voltage and the duration time. Generally, one of them is fixed and the other is varied to obtain random switching. In the previous approach [36], bitstreams are fed directly from the SBG array into the stochastic computing logic, so many SBGs, as well as DACs, are usually required. Furthermore, the output probability of an SBG is highly sensitive to the input bias voltage, whose margin is very small as reported in [36]. Accurately mapping digital probabilities to voltages requires high-precision DACs, and the non-linear relationship between probabilities and voltages is difficult to handle. More importantly, slight noise or process variation may map a probability to an unexpected voltage. To overcome these limitations, the bitstreams in SPINBIS are provided by a pre-built SBG array and assigned to the stochastic computing logic through a switch matrix.
Different from the SBG array in [36], a pre-built SBG array based on the SBG sharing strategy is utilized in SPINBIS to improve the stability of the SBGs and reduce the required number of SBGs. The BL/SL of each SBG in the array is supplied by an internal voltage source that provides a more stable bias voltage than DACs. In this manner, the generated probability of each SBG is pre-determined, and its bitstream is multiplexed by the switch matrix. According to the SBG sharing strategy, the required number of SBGs can be much smaller than the number of input terminals of the stochastic computing logic, because non-conflicting terminals are allowed to share the same bitstreams. Since the SBG array is pre-built, it has to provide enough kinds of bitstreams to satisfy the accuracy required by the stochastic computing, which will be discussed later.
Assume that a specified application requires L kinds of probabilities, and define p1, p2, · · · , pL as the required probabilities. Each kind of probability corresponds to one SBG set, denoted SBGi,φ(i), where i = 1, 2, · · · , L is the index of the probability set and φ(i) is the number of SBGs required in the set. All SBGs in set SBGi,φ(i) generate the same probability pi, but their bitstreams are different from each other. Let M = φ(1) + φ(2) + · · · + φ(L) denote the total number of SBGs in the SBG array. The SBG array is constructed based on the conflict sets and the input probabilities. The conflict sets are pre-extracted from the stochastic computing logic of the specified application. For a particular application, the input probabilities can be evaluated in advance and usually follow a certain distribution, which is used together with the pre-extracted conflict sets to determine the probability set.
Taking the stochastic computing logic in Fig. 7 as an example, input terminals T1 ∼ T9 are assumed to have the probabilities {p1, p2, p1, p3, p1, p4, p5, p3, p3}, where T1, T3 and T5 have the same probability p1, and T4, T8 and T9 have the same probability p3. Since T1 and T3 do not belong to the same conflict set, they can share the same bitstream from the SBG array. T5, however, must use a bitstream from a different SBG because it conflicts with both T1 and T3. Similarly, T4 and T8 can share the same bitstream, but T9 cannot share it because T9 conflicts with T8. In this case, only 7 SBGs are required in SPINBIS, whereas 9 SBGs are required if no SBG sharing strategy is utilized [36]. Hence, the SBG sharing strategy can significantly reduce the required number of SBGs, especially for applications with a large number of input probabilities.
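One simple way to size each SBG set is greedy assignment within each probability class, sketched below under the assumption that terminals are processed in index order. This is our illustration, not necessarily how SPINBIS computes φ(i); the function names are hypothetical, and greedy coloring is not guaranteed to be minimal in general, although it reproduces the 7 SBGs of the Fig. 7 example.

```python
# Conflict sets and terminal probabilities of the Fig. 7 example
CONFLICT_SETS = [{"T1", "T2", "T5"}, {"T3", "T4", "T5"}, {"T6", "T7", "T8", "T9"}]
PROB = {"T1": "p1", "T2": "p2", "T3": "p1", "T4": "p3", "T5": "p1",
        "T6": "p4", "T7": "p5", "T8": "p3", "T9": "p3"}

def conflicts(a, b, conflict_sets):
    """True if terminals a and b appear together in any conflict set."""
    return any(a in s and b in s for s in conflict_sets)

def allocate_sbgs(prob, conflict_sets):
    """Greedy sharing: each terminal takes the lowest-numbered SBG of its
    probability class that no conflicting terminal already uses."""
    assignment = {}  # terminal -> (probability label, SBG index within its set)
    for t in sorted(prob, key=lambda name: int(name[1:])):
        taken = {ku for u, (pu, ku) in assignment.items()
                 if pu == prob[t] and conflicts(t, u, conflict_sets)}
        k = 0
        while k in taken:
            k += 1
        assignment[t] = (prob[t], k)
    return assignment

def sbgs_per_probability(assignment):
    """phi(i): number of distinct SBGs needed for each probability."""
    phi = {}
    for p, k in assignment.values():
        phi[p] = max(phi.get(p, 0), k + 1)
    return phi

alloc = allocate_sbgs(PROB, CONFLICT_SETS)
phi = sbgs_per_probability(alloc)
print(phi)               # {'p1': 2, 'p2': 1, 'p3': 2, 'p4': 1, 'p5': 1}
print(sum(phi.values())) # 7, versus 9 without sharing
```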
[Fig. 8 diagram: bitstreams bs1 … bsM from the SBG array enter the row terminals I1 … IM of a crossbar; the column outputs O1 … ON drive the input terminals T1 … TN of the stochastic computing logic; the cross-point selectors C1,1 … CM,N are switched ON or OFF by the switch controller.]
Fig. 8: Switch matrix block of SPINBIS.
E. Switch Matrix and Switch Controller
The SBG sharing strategy is realized by a multiplexing network between the SBG array and the stochastic computing logic. As shown in Fig. 8, the switch matrix receives the bitstreams from the SBG array and assigns them to the stochastic computing logic, with the assignment determined by the switch controller. M bitstream sources bsj are connected to the left-side terminals Ij of the switch matrix, where j = 1, 2, · · · , M, and N outputs (O1, O2, · · · , ON) of the switch matrix are connected to the input terminals Tk of the stochastic computing logic, where k = 1, 2, · · · , N. The switch matrix is built as a crossbar with an nMOS transistor at each cross-point acting as a selector. These selectors are set by the switch controller according to the digital inputs and the conflict sets. In each column of the switch matrix, exactly one selector is switched ON, because each input terminal of the stochastic computing logic accepts only one bitstream. In each row, zero, one or more selectors may be switched ON, because a bitstream from the SBG array may be shared by different input terminals of the stochastic computing logic.
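The column and row constraints can be captured by a behavioral sketch of the crossbar, assuming the control matrix C is indexed as in Alg. 1 (rows are SBG bitstreams, columns are logic terminals). The function `route` and the 2 × 3 example are hypothetical.

```python
def route(C, bitstreams):
    """Apply an M x N crossbar control matrix C to M source bitstreams.

    C[i][j] == 1 closes the selector between source bs_(i+1) (row I_(i+1))
    and output O_(j+1). Each column must have exactly one ON selector;
    a row may drive several outputs (a shared SBG) or none.
    """
    M, N = len(C), len(C[0])
    outputs = []
    for j in range(N):
        on_rows = [i for i in range(M) if C[i][j] == 1]
        if len(on_rows) != 1:
            raise ValueError(f"column O{j + 1} has {len(on_rows)} ON selectors, expected 1")
        outputs.append(bitstreams[on_rows[0]])
    return outputs

# Hypothetical 2 x 3 example: bs1 is shared by O1 and O3.
C = [[1, 0, 1],
     [0, 1, 0]]
print(route(C, ["10010110", "01011111"]))
# ['10010110', '01011111', '10010110']
```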
The switching procedure is illustrated in Alg. 1. In Lines 1-4, the vector bs[i] is initialized to the index of the first available bitstream of probability pi. Lines 5-11 generate control signals for the terminals in the conflict sets: the terminal index is obtained in Line 6, and its digital input in Line 7. Line 8 then finds the index of this probability in vector P, Line 9 generates the control signal, and Line 10 updates the first available bitstream index of the probability. In this way, no bitstream is allocated to two terminals that belong to the same conflict set.
Even though the SBG sharing strategy reduces the required scale of the SBG array, the switch matrix is still large because the stochastic computing logic usually has many input terminals. In this work, a terminal clustering strategy is further proposed to reduce the scale of the switch matrix. Input terminals of the stochastic computing logic that always have the same digital input are clustered into a single terminal if they belong to different conflict sets. As shown in Fig. 7, terminals T1 and T3 belong
Algorithm 1 Switching procedures for SBG sharing strategy.
Input: Digital inputs In[i], where i = 1, 2, · · · , N; SBG array SBGi,φ(i) with P[i] = pi and φ[i] = φ(i), where i = 1, 2, · · · , L; conflict sets cflct[i], where i = 1, 2, · · · , T.
Output: Binary control signals C[i][j], where i = 1, 2, · · · , M and j = 1, 2, · · · , N.
1: bs[1] = 1 ▷ bs[i] indicates the first available bitstream index of probability pi
2: for (i = 2; i ≤ L; i = i + 1) do
3:     bs[i] = bs[i − 1] + φ[i − 1]
4: end for
5: for (i = 1; i ≤ T; i = i + 1) do
6:     ter_idx = cflct[i]
7:     pro = In[ter_idx]
8:     pro_idx = findProIndex(pro, P)
9:     C[bs[pro_idx]][ter_idx] = 1
10:    bs[pro_idx] = bs[pro_idx] + 1
11: end for
to different conflict sets; if they always have the same input probability, they are clustered into the same input terminal.
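Terminal clustering can be sketched as a greedy grouping, assuming the application supplies groups of terminals that always receive the same digital input (here only T1 and T3, as in the example above). The helper names and the grouping order are illustrative assumptions, not part of SPINBIS as described.

```python
# Conflict sets of the Fig. 7 example
CONFLICT_SETS = [{"T1", "T2", "T5"}, {"T3", "T4", "T5"}, {"T6", "T7", "T8", "T9"}]

def conflicts(a, b, conflict_sets):
    """True if terminals a and b appear together in any conflict set."""
    return any(a in s and b in s for s in conflict_sets)

def cluster_terminals(tied_groups, conflict_sets):
    """Greedy terminal clustering (hypothetical sketch).

    tied_groups lists terminals that always receive the same digital input.
    Within a group, a terminal joins the first cluster that holds no terminal
    it conflicts with; otherwise it starts a new cluster.
    """
    clusters = []
    for group in tied_groups:
        local = []
        for t in group:
            for cluster in local:
                if not any(conflicts(t, u, conflict_sets) for u in cluster):
                    cluster.append(t)
                    break
            else:  # no compatible cluster found
                local.append([t])
        clusters.extend(local)
    return clusters

# Assume T1 and T3 always share an input; all other terminals are independent.
groups = [["T1", "T3"], ["T2"], ["T4"], ["T5"], ["T6"], ["T7"], ["T8"], ["T9"]]
clusters = cluster_terminals(groups, CONFLICT_SETS)
print(len(clusters))  # 8 switch-matrix outputs instead of 9, so alpha = 8/9
```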
F. Discussion
The switch matrix and SBG sharing strategy are proposed in SPINBIS to reduce the required number of SBGs at the cost of some design overhead. We compare the design complexity of SPINBIS with that of the work in [36]. The stochastic computing logic of SPINBIS is the same as that of [36]. The scale of the SBG array is reduced from N to M by the SBG sharing strategy, where M ≪ N. Since the SBG array accounts for a substantial part of the energy consumption in SPINBIS, the energy consumption is reduced by a fraction of (N − M)/N when the scale of the SBG array is reduced from N to M. Assuming that there are T transistors in each SBG circuit, the number of transistors in the SBG array is reduced from T ∗ N to T ∗ M. With the terminal clustering strategy, the number of switch matrix outputs is reduced from N to N′ = αN, where α ∈ (0, 1), so the switch matrix uses M ∗ αN transistors. In summary, the number of transistors in the SBG array is reduced from T ∗ N to T ∗ M, at the cost of M ∗ αN transistors in the switch matrix. Since M ≪ N, the total area of SPINBIS (T ∗ M + M ∗ αN) is mainly determined by M ∗ αN. Based on the above discussion, the advantages of SPINBIS are most pronounced for large-scale applications with regular structures and input patterns.
IV. SPINTRONIC DEVICE BASED ENERGY EFFICIENT SBG
The performance of the SBG is critical to SPINBIS in terms of inference accuracy, inference speed and power consumption. A high-quality SBG should have at least the following two properties: (1) The generated bitstream should represent the given probability as accurately as possible. If the deviation between the probability value and the bitstream is too large, the stochastic computing results will be unpredictable. (2) The correlations among different stochastic bitstreams should be as small as possible, because high correlation usually degrades the accuracy of stochastic computing significantly. In this section, an efficient SBG circuit is proposed for Bayesian inference by utilizing the inherent random behaviors of MTJ devices.

Fig. 9: State transition diagram of (a) simple SBG and (b) self-control SBG.

TABLE I. Enable signal configuration for reset, write and read operations.
Operation   Write En.   Read En.   Rst. 0   Wrt. 1
reset       High        Low        High     Low
write       High        Low        Low      High
read        Low         High       -        -
A. Schematic of SBG
The stochastic bitstreams are generated by reading the MTJ states that have been pre-written, as shown in Fig. 2. If the MTJ is read out in the high-resistance state, i.e., the ‘AP’ state, a ‘1’ is generated as one stochastic bit; otherwise, a ‘0’ is generated. Generally, each bit is generated in three stages: reset, write and read, and a bitstream is obtained by performing these three stages repeatedly. The state transition diagram of the simple SBG is illustrated in Fig. 9(a).
Both the reset and write procedures are programming operations on the MTJ device. The reset operation programs the MTJ with a bias voltage and duration large enough to achieve a switching probability close to 100%, whereas the write operation programs the MTJ according to the required switching probability p ∈ [0, 1], as shown in Fig. 2. Assuming that the initial MTJ state is unknown, the reset operation (Write 0 in Fig. 9(a)) switches it to the ‘P’ state with probability p = 100%, while the write operation (Write 1 in Fig. 9(a)) switches it to the ‘AP’ state with probability p ∈ [0, 1].
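The reset→write→read cycle of the simple SBG can be summarized by a small behavioral model. The sketch below is an abstraction, not a circuit model. It assumes the reset always succeeds (p = 100%) and represents the probabilistic P → AP write by a caller-supplied function `write_succeeds` (a hypothetical name). With scripted outcomes, it reproduces the ‘011’ sequence reported for Fig. 13.

```python
import random
from typing import Callable, List


def simple_sbg(n: int, write_succeeds: Callable[[], bool]) -> List[int]:
    """Generate n bits with the simple SBG: reset -> write -> read per bit."""
    bits = []
    for _ in range(n):
        state = "P"                  # reset (Write 0): assumed to reach P with ~100% probability
        if write_succeeds():         # write (Write 1): P -> AP with the programmed probability p
            state = "AP"
        bits.append(1 if state == "AP" else 0)  # read: high resistance (AP) -> '1'
    return bits


# Scripted outcomes: the write fails once, then succeeds twice
outcomes = iter([False, True, True])
print(simple_sbg(3, lambda: next(outcomes)))  # [0, 1, 1]

# Stochastic use with p = 0.5
rng = random.Random(0)
bits = simple_sbg(1000, lambda: rng.random() < 0.5)
print(sum(bits) / len(bits))
```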
The enable signal configuration is listed in Table I. Both the write and reset operations are performed by the write circuit shown in Fig. 10(a), while the read operation is performed by the read circuit shown in Fig. 10(b). The multiplexers MUX2 and MUX3 select whether the write current or the read current flows through the MTJ. During write and reset operations, Write En. is set high, so terminal ‘1’ of MUX2 and MUX3 is connected to the corresponding terminal ‘Y’. For the reset operation, Wrt. 1 is set low and Rst. 0 is set high, so terminal ‘0’ of MUX1 and terminal ‘1’ of MUX4 are connected to terminal ‘Y’. By applying a bias voltage between the source-line (SL) and GND, the write current flows through the MTJ from bottom to top, as the blue arrow shows. For the write operation, Wrt. 1 is set high and Rst. 0 is set low, so terminal ‘1’ of MUX1 and terminal ‘0’ of MUX4 are connected to terminal ‘Y’. By applying a bias voltage between the bit-line (BL) and GND, the current flows through the MTJ from top to bottom, as the red arrow shows.
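The signal levels of Table I and the multiplexer routing described above can be captured in a small lookup, which makes it easy to check that each operation selects a unique current path. The dictionary keys and the `mux_routing` helper are illustrative names. Here 1 and 0 stand for High and Low, and `None` marks the don't-care entries of the read row.

```python
from typing import Dict, Optional

# Enable-signal levels from Table I (1 = High, 0 = Low, None = don't care)
TABLE_I: Dict[str, Dict[str, Optional[int]]] = {
    "reset": {"write_en": 1, "read_en": 0, "rst0": 1, "wrt1": 0},
    "write": {"write_en": 1, "read_en": 0, "rst0": 0, "wrt1": 1},
    "read":  {"write_en": 0, "read_en": 1, "rst0": None, "wrt1": None},
}


def mux_routing(op: str) -> Dict[str, object]:
    """Return the MUX terminals connected to 'Y' and the MTJ current path for op."""
    s = TABLE_I[op]
    # MUX2/MUX3: terminal '1' (write path) if Write En. is high, terminal '0' (read path) otherwise
    route: Dict[str, object] = {"mux2_mux3": 1 if s["write_en"] else 0}
    if s["write_en"]:
        route["mux1"] = s["wrt1"]   # '1' for write, '0' for reset
        route["mux4"] = s["rst0"]   # '1' for reset, '0' for write
        # reset biases SL (bottom -> top, toward P); write biases BL (top -> bottom, toward AP)
        route["current"] = "bottom_to_top" if s["rst0"] else "top_to_bottom"
    return route


for op in ("reset", "write", "read"):
    print(op, mux_routing(op))
```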
During the read stage, Read En. is set high while Write En. is set low, so terminal ‘0’ of MUX2 and MUX3 is connected to terminal ‘Y’. A pre-charging sense amplifier compares the MTJ state of the data cell with that of the reference cell, as shown in Fig. 10(b). The resistance of the reference cell (Fig. 10(d)) is usually set to (R_P + R_AP)/2, i.e., between R_P and R_AP, so that both the AP state and the P state of the data cell can be identified correctly. The read circuit consists of a two-branch sensing circuit with equalizing transistors [38] and a voltage sense amplifier with a dynamic latched comparator [39] for digital output. Each branch of the read circuit is composed of a load pMOS, a read-enable nMOS and a clamped nMOS [40] [41]. The read operation is enabled by setting Read En. high so that nMOS N1 and N2 are turned on. The clamped nMOS prevents read disturbance by applying a proper bias voltage Vclamp. During the read stage, the resistance difference between the data cell and the reference cell is converted into the difference between Vdata and Vref, which is sensed by the clock-enabled dynamic latched voltage comparator. The state of the data cell is read out at each rising edge of Cclk.
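As a rough illustration of why the reference cell is set to (R_P + R_AP)/2, the sketch below derives R_AP from the zero-bias TMR of Table II, using TMR = (R_AP − R_P)/R_P, and compares a data-cell resistance against the midpoint. It is a resistance-level abstraction of the voltage comparison performed by the sense amplifier. The value of R_P is a hypothetical placeholder, and the bias dependence of TMR is ignored.

```python
def reference_resistance(r_p: float, tmr: float) -> float:
    """Midpoint reference between the P and AP resistances."""
    r_ap = r_p * (1.0 + tmr)      # TMR = (R_AP - R_P) / R_P
    return (r_p + r_ap) / 2.0


def read_bit(r_data: float, r_ref: float) -> int:
    """Idealized sense decision: AP (higher resistance than reference) -> '1'."""
    return 1 if r_data > r_ref else 0


R_P = 3000.0   # hypothetical P-state resistance in ohms (not from the paper)
TMR = 1.5      # zero-bias TMR from Table II
r_ref = reference_resistance(R_P, TMR)
print(r_ref)                                                   # 5250.0
print(read_bit(R_P * (1 + TMR), r_ref), read_bit(R_P, r_ref))  # 1 0
```

With a midpoint reference the sensing margin is symmetric, (R_AP − R_P)/2 on either side, which is 2250 Ω for these placeholder values.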
B. Energy Efficient SBG Using Self-control Strategy
Energy efficiency of neural network and Bayesian inference is a primary concern for applications on embedded computing platforms. Several works have targeted efficient implementations of MTJ-based stochastic computing. The work in [17] indicated that the energy required for switching P→AP with 99.9% probability is less than that for switching AP→P. Hence, they reset the MTJ to the AP state every time and then attempt to switch it to the P state to generate one stochastic bit. However, the energy consumed in resetting P→AP is still wasted, because no bit is generated during the reset procedure. As illustrated in Fig. 9(a), the simple SBG resets the MTJ to the ‘P’ state after each stochastic bit is generated, and the stochastic bits are obtained by reading out the MTJ state after the write procedure. In effect, each stochastic bit records whether the MTJ state was switched successfully in the write procedure. In this work, we propose an efficient SBG in which the reset procedure is also utilized to generate stochastic bits.
Fig. 10: Schematic of SBG circuit: (a) write circuit (MUX1 to MUX4 controlled by Write En., Wrt. 1 and Rst. 0, driving BL and SL), (b) read circuit (data cell and reference cell, Read En. transistors N1 and N2, Vload, Vclamp, and a sense amplifier (SA) clocked by Cclk), (c) self-control circuit (TG clocked by Tclk, DFF clocked by Dclk, current_state and last_state / Rst. 0), (d) reference cell, and (e) SBG symbol.

A self-control strategy is proposed in which the MTJ state of the previous cycle is stored in a register and compared with the state of the current cycle to determine the output stochastic bit. That is, the stochastic bit indicates whether the MTJ state has changed. The state transition diagram of the SBG with the self-control strategy is illustrated in Fig. 9(b). If the current state differs from the last state, the output bit is ‘1’; otherwise, the output bit is ‘0’. Meanwhile, the direction of the write operation is determined by the state stored in the last cycle. With the self-control strategy, the reset→write→read procedure is compressed into a write→read procedure. In the write procedure, the bias voltage between BL and SL is carefully set within a certain range to guarantee that write ‘0’ and write ‘1’ operations have the same switching probability. Theoretically, this improves the speed and energy efficiency of bitstream generation by 2×.
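The self-control strategy can be modeled in the same way. In this behavioral sketch (all names hypothetical), the latched state selects the write direction, a caller-supplied `write_succeeds` decides whether that write switches the MTJ, and the output bit is the XOR of the current and last states. It assumes, as the text requires, that both write directions are biased to the same switching probability. With scripted outcomes of success, failure and success, it yields ‘101’, matching the cycle-by-cycle walkthrough of Fig. 14.

```python
import random
from typing import Callable, List


def self_control_sbg(n: int, write_succeeds: Callable[[], bool]) -> List[int]:
    """Generate n bits with the self-control SBG: write -> read per bit."""
    state = "P"      # initialization cycle: MTJ set to P; its XOR output is discarded
    last = state     # DFF content: the state latched in the previous cycle
    bits = []
    for _ in range(n):
        # The latched state sets the direction: P -> AP if last is P, otherwise AP -> P
        if write_succeeds():
            state = "AP" if last == "P" else "P"
        current = state                      # read stage: value passed through the TG
        bits.append(int(current != last))    # XOR(current, last): '1' iff the MTJ switched
        last = current                       # DFF latches the current state
    return bits


outcomes = iter([True, False, True])
print(self_control_sbg(3, lambda: next(outcomes)))  # [1, 0, 1]

rng = random.Random(1)
bits = self_control_sbg(1000, lambda: rng.random() < 0.3)
print(sum(bits) / len(bits))
```

Because every write attempt produces a bit, no cycle is spent on a reset whose outcome carries no information.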
The self-control circuit is shown in Fig. 10(c). The transmission gate (TG) and the D flip-flop (DFF) are clocked by Tclk and Dclk, respectively. The output of the comparator in Fig. 10(b) (i.e., MTJ State) is highly sensitive to its load; hence, the transmission gate is inserted to isolate the comparator from the load. The rising edge of Tclk is slightly delayed with respect to that of Cclk to ensure that the comparator output is stable. The DFF latches the MTJ state, which is compared with the state of the next cycle to produce one stochastic bit: if the current MTJ state differs from the latched state, the SBG outputs ‘1’; otherwise, it outputs ‘0’. Meanwhile, the latched MTJ state also determines the direction of the write operation in the next cycle. If the latched MTJ state is P, the write current flows through the MTJ from top to bottom, attempting to switch the MTJ from P to AP; otherwise, the write current flows in the opposite direction. The rising edge of Dclk is likewise slightly delayed with respect to that of Tclk. When Tclk is high and Dclk is low, the TG output is the current state of the MTJ and the DFF output is its last state. During this period, the XOR of the current state and the last state is taken as the output bit. If the current state differs from the state latched in the last cycle, the MTJ has switched successfully, so the XOR output is high and a ‘1’ is generated; otherwise, a ‘0’ is generated. After the rising edge of Dclk, the current state is latched in the DFF until the next read stage. In the next write stage, the DFF output Rst. 0 controls MUX4 and Wrt. 1 controls MUX1.

Fig. 11: a) Schematic of voltage divider (resistor R and nMOS, output Vref). b) The black solid line represents the fitting curve of R (2250 to 4000 Ω) and Vref (about 1.05 to 1.25 V); the red circles represent the desired writing voltages between BL and SL of SBGs.
The aforementioned SPINBIS architecture requires each SBG to be equipped with two internal voltage sources with fixed voltage values. In this work, they are realized by a voltage divider [42] consisting of one resistor and one nMOS transistor, as shown in Fig. 11(a). Vref is used as the writing voltage of each SBG and is set by choosing the resistance R. As shown in Fig. 11(b), Vref varies smoothly with R (black solid line), so the desired writing voltages between BL and SL (red circles in Fig. 11(b)) can be achieved by adjusting the resistor value.

TABLE II. The parameters definition and default value of MTJ model.
Parameter   Description                                        Default value
α           Gilbert damping coefficient                        0.027
γ           Gyro-magnetic constant                             1.76 × 10^7 Hz/Oe
P           Electron polarization percentage                   0.52
Hk0         Out-of-plane magnetic anisotropy                   1433 Oe
tsl/tox     Height of the free layer / oxide barrier           1.3 nm / 0.85 nm
l/w         Length / width of the MTJ cross-section            45 nm / 45 nm
TMR         TMR with zero bias voltage                         1.5
RA          Resistance-area product                            5 Ω·µm²

Fig. 12: Simulation results of the proposed instance-vary MTJ model. Two MTJs (MTJ1 and MTJ2, states 0/1 over 0 to 500 ns) are simulated simultaneously with the same bias voltage and pulse width.
For convenience, the SBG with the self-control strategy is denoted as the self-control SBG, and the one described in Section IV-A is denoted as the simple SBG.
C. Evaluation of SBG Circuits
In this section, the proposed SBG circuits are evaluated to explore their performance.
1) Simulation setup: The SBG circuit is composed of hybrid CMOS/MTJ structures with 45 nm CMOS and 45 nm MTJ technologies. A behavioral MTJ model, including its stochastic switching behaviors, is described in the Verilog-A language [43]. However, the original MTJ model in [43] only provides stochastic switching behaviors for a single device. That is, the bitstreams obtained from different SBG circuits are always identical if they have the same bias voltage and duration time. Since many MTJs are utilized in the SBG array, the bitstreams generated with the original MTJ model [43] would be strongly correlated with each other, which would lead to inaccurate stochastic computing results. Hence, a new compact MTJ model for stochastic switching is proposed in which different MTJ instances exhibit different switching behaviors.
Fig. 13: Simulation results of simple SBG circuit. Waveforms over 0 to 65 ns: Write En., Wrt. 1, MTJ State, Read En., Comp. Clk and OUT. Across three reset→write→read cycles, every Reset 0 succeeds, while Write 1 fails in the first cycle and succeeds in the next two, so the read-out sequence is 0, 1, 1.

As described in [43], the MTJ switching time is obtained from the critical current and other electrical and physical parameters. The stochastic behavior is independent of the critical switching current and is implemented by random functions with uniform or normal distributions. The basic switching time dt is determined by the applied bias voltage in each cycle. Then a random number drawn from a normal distribution with mean dt and standard deviation σ, denoted N(seed, dt, σ), is generated in each cycle as the final switching time, where seed is the random seed of the random number generation function and σ is the user-specified standard deviation. The parameter seed is set to a constant value in the model published in [43], which means that two MTJs with the same dt and σ have the same switching time. To obtain different random behaviors, the MTJ model is revised by setting seed to different values for different MTJ instances; the result is denoted as the instance-vary model. Fortunately, Verilog-A of version 13.1 and above supports the syntax arandom[param]. The param argument is optional and can be set as global or instance. If param is set as instance for each MTJ's required randomness, different seed values are generated for different MTJ instances, which satisfies the instance-vary randomness requirement of the MTJ model. The parameter definitions and default values of the MTJ model used in the experiments are provided in Table II. The simulation results of MTJ switching with the instance-vary model are shown in Fig. 12. Under the same write operations, MTJ1 and MTJ2 have different switching results, which is critical for generating uncorrelated bitstreams for stochastic computing.
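The effect of the seed on correlation can be reproduced outside Verilog-A. The Python sketch below is an analogy, not the authors' model. Each hypothetical `MTJInstance` draws its switching time from a normal distribution with mean dt and standard deviation σ using its own random generator, and it switches if the sampled time does not exceed the pulse width. Giving two instances the same seed mimics the constant-seed behavior of [43] (identical switching sequences), while distinct seeds mimic the instance-vary model. The time values are hypothetical.

```python
import random
from typing import List


class MTJInstance:
    """Toy MTJ whose switching time per cycle is drawn from N(dt, sigma)."""

    def __init__(self, seed: int, sigma: float) -> None:
        self.rng = random.Random(seed)   # per-instance random stream
        self.sigma = sigma

    def switches(self, dt: float, pulse_width: float) -> bool:
        t_sw = self.rng.gauss(dt, self.sigma)   # sampled switching time for this cycle
        return t_sw <= pulse_width


def run(mtjs: List[MTJInstance], cycles: int, dt: float, pulse_width: float) -> List[List[int]]:
    """Apply the same write pulse to every MTJ for the given number of cycles."""
    return [[int(m.switches(dt, pulse_width)) for _ in range(cycles)] for m in mtjs]


# Same seed (as in the original model): both MTJs switch identically
same = run([MTJInstance(42, 0.5), MTJInstance(42, 0.5)], 100, dt=5.0, pulse_width=5.0)
# Distinct seeds (instance-vary model): the switching sequences differ
vary = run([MTJInstance(1, 0.5), MTJInstance(2, 0.5)], 100, dt=5.0, pulse_width=5.0)
print(same[0] == same[1], vary[0] == vary[1])  # True False
```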
2) SBG simulation results: The simulation results of the simple SBG circuit are illustrated in Fig. 13. The reset→write→read operations are performed iteratively for 3 cycles from 5 ns to 65 ns. The reset and write operations are enabled when Write En. is high. For the reset operation (AP → P or P → P), the bias voltage between SL and GND is about 1.8 V and the duration time is about 7 ns, which ensures a switching probability p → 100%. For the write operation (P → AP), a bias voltage of about 1.166 V between BL and GND and a duration time of about 5.4 ns give a switching probability p of about 50%. For the read operation, Read En. is set high and the MTJ state is read out, with Vload and Vclamp at about 0.8 V. In each reset→write→read cycle, the MTJ is first reset to the P state with switching probability p → 100% when Write En. is high and Wrt. 1 is low; the write current flows through the MTJ from bottom to top, as the blue arrow in Fig. 10(a) shows. The MTJ then attempts the P → AP switching with the provided switching probability when Wrt. 1 is set high; the write current flows through the MTJ from top to bottom, as the red arrow in Fig. 10(a) shows. Finally, the read operation is performed by setting Read En. high and Write En. low. In the 3 reset→write→read cycles shown in Fig. 13, writing P → AP fails in the first cycle but succeeds in the following two cycles. Consequently, the bitstream ‘011’ is generated.
Fig. 14: Simulation wave of self-control SBG circuit. Waveforms over 0 to 45 ns: Write En., Wrt. 1, MTJ State, Read En., Comp. Clk, TG Clk, DFF Clk, Current State, Last State and OUT. After the initialization cycle, Write 1 succeeds, Write 0 fails and Write 0 succeeds, giving outputs 1, 0 and 1.

The simulation waveforms of the self-control SBG circuit are shown in Fig. 14 for 4 cycles from 5 ns to 45 ns. The first cycle initializes the MTJ to the P state. The MTJ is read out as P state at 13 ns, and TG is turned on with a delay of 0.5 ns. Since the ‘last state’ is meaningless in the first cycle, the XOR result of this cycle is discarded. The current state P is then latched in the DFF from 14 ns to 24 ns. In the second cycle, Wrt. 1 is enabled for a write operation because the ‘last state’ is P (logic ‘0’). The write operation succeeds, so the MTJ is in the AP state. From 23 ns to 24 ns, the AP state of the MTJ is passed through TG as the ‘current state’. The XOR of the ‘last state’ (latched P) and the ‘current state’ (AP) generates one bit of ‘1’ for the second cycle. The ‘current state’ (AP) is then latched in the DFF and becomes the ‘last state’ for the next cycle. In the third cycle, Rst. 0 is enabled to write the MTJ from AP to P, since the latched ‘last state’ is AP. However, the AP → P write fails in the third cycle, which means that the MTJ state is unchanged from the second cycle; hence, one bit of ‘0’ is generated. In the fourth cycle, the MTJ again attempts to switch from AP to P and succeeds, so one bit of ‘1’ is generated. Finally, the bitstream ‘101’ is obtained from these 4 cycles.
To generate an n-bit bitstream, the simple SBG circuit requires 2n write operations (including reset and write) and n read operations, whereas the self-control SBG circuit requires only n + 1 write operations (including initialization and write) and n + 1 read operations. Hence, the self-control SBG circuit improves the speed and energy efficiency by about 2× compared with the simple SBG circuit.
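The operation counts above can be checked with a few lines of arithmetic. The sketch below only counts write and read operations per bitstream, using the formulas stated in the text. It does not model the duration or energy of each operation. The ratio of write operations, 2n/(n + 1), approaches 2 as n grows, which is the basis of the roughly 2× estimate.

```python
from typing import Dict, Tuple


def op_counts(n: int) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Write/read operations needed for an n-bit bitstream."""
    simple = {"write": 2 * n, "read": n}            # one reset + one write per bit
    self_control = {"write": n + 1, "read": n + 1}  # one initialization cycle + n cycles
    return simple, self_control


for n in (64, 128, 256, 1000):
    simple, self_control = op_counts(n)
    ratio = simple["write"] / self_control["write"]
    print(f"n={n}: write ratio = {ratio:.3f}")
```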
Fig. 15: MTJ switching probability with different applied BL voltage. SL voltage is set as 1.8 V. Three panels compare 1000-cycle bitstreams with 64-, 128- and 256-cycle bitstreams over BL voltages of 1.05 to 1.25 V (max/avg error: 4.0%/1.5%, 2.4%/0.7% and 2.0%/0.6%, respectively).

Fig. 16: MTJ switching probability with different applied BL/SL voltage combination. Three panels compare 1000-cycle bitstreams with 64-, 128- and 256-cycle bitstreams at the (BL, SL) voltage pairs (1.05, 1.508), (1.108, 1.562), (1.158, 1.6), (1.205, 1.634) and (1.263, 1.684) V (max/avg error: 4.0%/1.8%, 2.6%/0.9% and 1.6%/0.5%, respectively).

Fig. 17: The self-SCC measurement for probability p ∈ [0, 1], i.e., evaluating the generated bitstreams for a particular probability, where only 10%, 30%, 50%, 70% and 90% are illustrated (SCC values for bitstream lengths 64, 128, 256 and 512).

3) SBG performance: The proposed SBG circuits are evaluated in terms of the representation accuracy and correlation of the generated stochastic bitstreams. To evaluate the accuracy, stochastic bitstreams of length n = 64, 128, 256 and 1000 are generated, and the bitstream with length n = 1000 is taken as the ground truth. Fig. 15 shows the MTJ switching probability of the simple SBG for different BL voltages, with the SL voltage set to 1.8 V. Compared with the ground truth of length n = 1000, the generated bitstreams with length n = 64, 128 and 256 have average errors of 1.5%, 0.7% and 0.6%, respectively. The relationship between switching probability and different BL/SL voltage combinations is shown in Fig. 16 for the self-control SBG circuit. Compared with the ground truth of length n = 1000, the generated bitstreams with length n = 64, 128 and 256 have average errors of 1.8%, 0.9% and 0.5%, respectively.
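The accuracy evaluation compares the probability represented by a short bitstream with that of a 1000-bit reference. The sketch below mimics that procedure with an idealized SBG, i.e., independent Bernoulli bits at each target probability. This is an assumption standing in for the MTJ circuit. The target probabilities are hypothetical, so the printed errors are not expected to match Fig. 15 or Fig. 16.

```python
import random
from typing import List, Sequence


def bitstream(p: float, n: int, rng: random.Random) -> List[int]:
    """Idealized SBG: n independent bits, each '1' with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(n)]


def value(bits: Sequence[int]) -> float:
    """Probability represented by a stochastic bitstream (fraction of '1's)."""
    return sum(bits) / len(bits)


def avg_error(probs: Sequence[float], n: int, rng: random.Random, ref_len: int = 1000) -> float:
    """Mean |value(n-bit stream) - value(ref_len-bit reference)| over target probabilities."""
    errors = []
    for p in probs:
        reference = value(bitstream(p, ref_len, rng))  # 1000-bit stream taken as ground truth
        estimate = value(bitstream(p, n, rng))
        errors.append(abs(estimate - reference))
    return sum(errors) / len(errors)


rng = random.Random(7)
targets = [0.1, 0.3, 0.5, 0.7, 0.9]   # hypothetical switching probabilities
for n in (64, 128, 256):
    print(n, round(avg_error(targets, n, rng), 4))
```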
As described above, stochastic computing usually requires low correlation among different bitstreams. Many metrics of the statistical correlation between bitstreams have been proposed [44]. We adopt the stochastic computing correlation (SCC) measurement [45], which was proposed specifically for stochastic computing:
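$$
\mathrm{SCC}(X_1, X_2) =
\begin{cases}
\dfrac{ad - bc}{n \times \min(a+b,\ a+c) - (a+b)(a+c)}, & \text{if } ad > bc,\\[1ex]
\dfrac{ad - bc}{(a+b)(a+c) - n \times \max(a-d,\ 0)}, & \text{otherwise},
\end{cases}
\tag{2}
$$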
where X1 and X2 are the two stochastic bitstreams being measured, a is the number of bit positions where both X1 and X2 are ‘1’, b is the number of positions where X1 is ‘1’ and X2 is ‘0’, c is the number of positions where X1 is ‘0’ and X2 is ‘1’, and d is the number of positions where both are ‘0’. From Eqn. (2), SCC → +1 if X1 and X2 have maximum similarity, and SCC → −1 if X1 and X2 are totally different; consequently, SCC ∈ [−1, 1]. For a given probability p ∈ [0, 1], many bitstreams are generated to compute the absolute SCC value, which is regarded as the self-SCC measurement. A self-SCC measurement sample is illustrated in Fig. 17 for bit lengths n = 64, 128, 256 and 512. To measure the SCC between different probabilities, two groups of bitstreams are generated to compute the absolute SCC value, which is regarded as the cross-SCC measurement. A cross-SCC measurement sample is shown in Fig. 18 for bit lengths n = 64, 128, 256 and 512. As can be seen from Fig. 17 and Fig. 18, the SCC values are relatively small and thus satisfy the requirements of stochastic computing, and the SCC value decreases as the bitstream length increases.

Fig. 18: The cross-SCC measurement for probability combinations, i.e., evaluating the generated bitstreams between different probabilities, where only the combinations (19%, 41%), (12%, 48%), (49%, 25%), (23%, 44%) and (18%, 58%) are illustrated (SCC values for bitstream lengths 64, 128, 256 and 512).

Fig. 19: Energy consumption for generating bitstreams of probability p ∈ [0, 1] (x-axis: probability, 0 to 100%; y-axis: energy per bit, 2.45 to 2.75 pJ).
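Eqn. (2) translates directly into code. The Python sketch below counts the overlaps a, b, c and d and applies the two-branch formula. Returning 0 when the denominator vanishes (which happens when one bitstream is all ‘0’s or all ‘1’s) is our own convention, since Eqn. (2) is undefined there. Self-SCC and cross-SCC as used above are the absolute values of this quantity.

```python
from typing import Sequence


def scc(x1: Sequence[int], x2: Sequence[int]) -> float:
    """Stochastic computing correlation of two equal-length bitstreams, Eqn. (2)."""
    if len(x1) != len(x2):
        raise ValueError("bitstreams must have the same length")
    n = len(x1)
    a = sum(1 for u, v in zip(x1, x2) if u and v)          # 1 in X1, 1 in X2
    b = sum(1 for u, v in zip(x1, x2) if u and not v)      # 1 in X1, 0 in X2
    c = sum(1 for u, v in zip(x1, x2) if not u and v)      # 0 in X1, 1 in X2
    d = n - a - b - c                                      # 0 in both
    num = a * d - b * c
    if num > 0:
        den = n * min(a + b, a + c) - (a + b) * (a + c)
    else:
        den = (a + b) * (a + c) - n * max(a - d, 0)
    return num / den if den != 0 else 0.0  # undefined case mapped to 0 (assumption)


print(scc([1, 1, 0, 0], [1, 1, 0, 0]))   # 1.0  (identical)
print(scc([1, 1, 0, 0], [0, 0, 1, 1]))   # -1.0 (complementary)
print(scc([1, 1, 0, 0], [1, 0, 1, 0]))   # 0.0  (uncorrelated)
print(scc([1, 1, 0, 0], [1, 0, 0, 0]))   # 1.0  (maximal overlap)
```

The last case illustrates a property of SCC: when the ‘1’s of one stream are a subset of the other's, SCC reaches 1 even though the two streams represent different probabilities (0.5 and 0.25).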
The proposed self-control SBG circuit contains 93 CMOS transistors, one resistor and 5 MTJs, which are used to estimate the occupied chip area. The energy consumption for generating bitstreams of probability p ∈ [0, 1] is shown in Fig. 19, from which we can see that a larger probability usually requires more energy.
4) Process Variation: MTJ switching behavior is strongly affected by process variations such as MTJ geometric variation and initial magnetization angle variation [46], [47].
TABLE III. Simulation results of probability-voltage relationship under the specified process variation.
Bitstream Length   64       128      256
Max Error          0.1295   0.0949   0.0733
Avg. Error         0.0460   0.0336   0.0269
Fig. 20: Evaluation framework of SPINBIS. Device level: MOS transistor and MTJ modeling (switching model, stochastic behavior, random seed). Circuit level: Spectre simulator characterizing the switch matrix, SBG array and SC logic. Architecture level: switch controller RTL synthesis and an architectural simulator driven by the application trace, reporting SPINBIS accuracy, energy, speed and area.
Variations in the surface area (A) and the tunneling oxide thickness (t_ox) are the main causes of resistance variation in MTJs, because R_MTJ ∝ (1/A) · exp(t_ox). Assuming that A and t_ox follow Gaussian distributions with standard deviations of 5% and 2% of their mean values, respectively [48], [49], the sensitivity of the relationship between the required probability and the applied voltage is evaluated, as shown in Table III. The accuracy can be improved by increasing the bitstream length.
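The variation assumption above can be explored with a quick Monte Carlo run. The sketch samples A and t_ox with the stated 5% and 2% relative standard deviations and propagates them through R_MTJ ∝ (1/A)·exp(β·t_ox). The decay constant β (here 1 nm⁻¹, matching the proportionality exactly as written) and the nominal t_ox = 0.85 nm from Table II are assumptions of the sketch. It estimates only the resistance spread, not the probability-voltage errors of Table III.

```python
import math
import random
import statistics
from typing import List


def resistance_ratios(samples: int, sigma_a: float = 0.05, sigma_tox: float = 0.02,
                      tox_nm: float = 0.85, beta_per_nm: float = 1.0, seed: int = 0) -> List[float]:
    """Monte Carlo samples of R_MTJ / R_nominal under Gaussian A and t_ox variation."""
    rng = random.Random(seed)
    nominal = math.exp(beta_per_nm * tox_nm)          # nominal area normalized to 1
    ratios = []
    for _ in range(samples):
        area = rng.gauss(1.0, sigma_a)                 # relative area, 5% std by default
        tox = rng.gauss(tox_nm, sigma_tox * tox_nm)    # oxide thickness, 2% std by default
        ratios.append(math.exp(beta_per_nm * tox) / area / nominal)
    return ratios


r = resistance_ratios(10000)
print(f"mean = {statistics.mean(r):.4f}, relative std = {statistics.stdev(r):.4f}")
```

With β = 1 nm⁻¹ the area term dominates the spread; choosing a larger β, e.g., to reflect a steeper tunneling dependence, enlarges the t_ox contribution.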
V. APPLICATIONS
As shown in Fig. 6, the stochastic computing architecture is determined by the target application. In this section, a device-to-architecture level evaluation framework for SPINBIS is presented, and a typical application is demonstrated as a case study.
A. Evaluation Framework
SPINBIS is implemented with hybrid CMOS/MTJ technologies across three design hierarchies, namely the device, circuit and architecture levels, as shown in Fig. 20. The hybrid CMOS/MTJ circuits are simulated with the Cadence Spectre simulator, and the MTJ model is written in Verilog-A. The dynamic switching of the MTJ device is modeled in two regimes, by the Sun model [50] and the Néel-Brown model [51], and the stochastic MTJ switching behaviors are modeled following [11]. To reduce the correlation of bitstreams generated by different SBG circuits, a different random seed is configured for each MTJ instance, as described in Section IV-C. Based on the circuit simulation results, the SBG array, switch matrix and stochastic computing logics are characterized and abstracted as behavioral blocks. Meanwhile, the RTL implementation of the switch controller is synthesized by Synopsys Design Compiler with the 45 nm FreePDK library. After the switch controller is characterized, an architectural-level simulation is carried out according to the specified application trace. Finally, the evaluation results of SPINBIS are obtained in terms of inference accuracy, energy efficiency, inference speed and design area.
B. Case Study: Data Fusion for Target Locating
Data fusion aims to obtain more consistent, accurate and useful information by integrating multiple data sources rather than relying on any individual source. In this section, a simple data fusion example is demonstrated and the corresponding Bayesian inference procedure is studied.
1) Problem definition and Bayesian inference algorithm: Sensor fusion aims to determine a target location using multiple sensors [52]. Assume that there are 3 sensors on a 64 × 64 2D plane, located at (0, 0), (0, 32) and (32, 0). Each sensor has 2 data channels: distance (d) and bearing (b). The measured data (d1, b1, d2, b2, d3, b3) from the 3 sensors with 2 channels are used to calculate the target location (x*, y*). In the data fusion application, the probability that the target object is located at each position of the plane is calculated from the sensor data, and the position with the largest probability is taken as the target location.
Based on the observed data (d1, b1, d2, b2, d3, b3), the probability that the target object is located at (x, y) is denoted as p(x, y|d1, b1, d2, b2, d3, b3) and can be calculated by Bayes' theorem:
$$
p(x, y \mid d_1, b_1, d_2, b_2, d_3, b_3) \propto p(x, y) \prod_{i} p(d_i \mid x, y)\, p(b_i \mid x, y) \tag{3}
$$
where p(x, y) is the prior probability, and p(d_i|x, y) and p(b_i|x, y) are known as the evidence or likelihood. Since the target may be located at any position, the prior probability p(x, y) is uniform over all positions and is therefore ignored in the following Bayesian inference system. p(d_i|x, y) is the probability that the i-th sensor returns the distance value d_i if the target object is located at position (x, y); p(b_i|x, y) is defined similarly for the bearing. The values of p(d_i|x, y) and p(b_i|x, y) are calculated by (4).
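$$
\begin{aligned}
p(d_i \mid x, y) &= \frac{1}{\sqrt{2\pi}\,\sigma_i^d} \exp\!\left(-\frac{\left(d(x,y) - \mu_i^d\right)^2}{2\left(\sigma_i^d\right)^2}\right) \\
p(b_i \mid x, y) &= \frac{1}{\sqrt{2\pi}\,\sigma_i^b} \exp\!\left(-\frac{\left(b(x,y) - \mu_i^b\right)^2}{2\left(\sigma_i^b\right)^2}\right)
\end{aligned}
\tag{4}
$$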
where d(x, y) is the Euclidean distance between position (x, y) and the i-th sensor, μ_i^d is the distance data provided by the i-th sensor, and σ_i^d = 5 + μ_i^d/10; b(x, y) is the viewing angle from the i-th sensor to position (x, y), μ_i^b is the bearing data provided by the i-th sensor, and σ_i^b is set to 14.0626 degrees.
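To make Eqns. (3) and (4) concrete, the sketch below evaluates the exact (floating-point) posterior on the 64 × 64 grid for the three sensors at (0, 0), (0, 32) and (32, 0). Several details are assumptions rather than the paper's specification. Grid points are integer coordinates. The bearing b(x, y) is the atan2 angle in degrees measured from the positive x-axis. The sensor readings are generated noise-free from a hypothetical target at (20, 40), so the posterior peak should fall exactly on that target.

```python
import math
from typing import List, Tuple

SENSORS = [(0.0, 0.0), (0.0, 32.0), (32.0, 0.0)]
SIGMA_B = 14.0626  # bearing standard deviation in degrees


def gaussian(x: float, mu: float, sigma: float) -> float:
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)


def distance(s: Tuple[float, float], x: float, y: float) -> float:
    return math.hypot(x - s[0], y - s[1])


def bearing(s: Tuple[float, float], x: float, y: float) -> float:
    # Assumed convention; for these sensor positions no angle wrap-around occurs on the grid
    return math.degrees(math.atan2(y - s[1], x - s[0]))


def posterior(meas: List[Tuple[float, float]], size: int = 64) -> List[List[float]]:
    """Normalized p(x, y | d1, b1, d2, b2, d3, b3) with a uniform prior (Eqn. (3))."""
    post = [[0.0] * size for _ in range(size)]
    for x in range(size):
        for y in range(size):
            like = 1.0
            for s, (mu_d, mu_b) in zip(SENSORS, meas):
                like *= gaussian(distance(s, x, y), mu_d, 5 + mu_d / 10)  # p(d_i | x, y)
                like *= gaussian(bearing(s, x, y), mu_b, SIGMA_B)         # p(b_i | x, y)
            post[x][y] = like
    total = sum(map(sum, post))
    return [[v / total for v in row] for row in post]


target = (20, 40)  # hypothetical target used to synthesize noise-free readings
meas = [(distance(s, *target), bearing(s, *target)) for s in SENSORS]
post = posterior(meas)
best = max(((x, y) for x in range(64) for y in range(64)), key=lambda p: post[p[0]][p[1]])
print(best)  # (20, 40)
```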
TABLE IV. Transistor utilizations of SBG array and switch matrix for different grid size, where K_energy = M/N and K_cmos = (T ∗ M + M ∗ N′)/(T ∗ N) indicate the improvement on energy and area efficiency, respectively.
Grid Size   T    N       M     N′     K_energy   K_cmos
32 × 32     92   6144    320   2817   0.052      1.64
64 × 64     92   24576   320   5557   0.013      0.79
2) Bayesian inference system: From Eqn. (3), the Bayesian inference is computed as the product of a series of conditional probabilities, which can be realized by stochastic computing with AND gates on stochastic bitstreams. Given any 2 positions (x1, y1) and (x2, y2), p(x1, y1|d1, b1, d2, b2, d3, b3) and p(x2, y2|d1, b1, d2, b2, d3, b3) can be calculated in parallel since they are independent of each other.
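The AND-gate realization of the product in Eqn. (3) rests on a standard stochastic computing identity: for independent unipolar bitstreams, the fraction of ‘1’s in their bitwise AND estimates the product of their probabilities. The sketch below illustrates this for six conditional probabilities, as needed per grid position. The independent Python generators stand in for uncorrelated SBG outputs, which is an assumption, and the probability values are hypothetical.

```python
import random
from functools import reduce
from typing import Sequence


def sc_product(probs: Sequence[float], n: int, seed: int = 0) -> float:
    """Estimate prod(probs) by ANDing independent n-bit stochastic bitstreams."""
    rngs = [random.Random(seed + k) for k in range(len(probs))]  # one source per input
    streams = [[1 if r.random() < p else 0 for _ in range(n)] for r, p in zip(rngs, probs)]
    # Bitwise AND across all streams: a chain of len(probs) - 1 two-input AND gates
    out = [reduce(lambda u, v: u & v, bits) for bits in zip(*streams)]
    return sum(out) / n


probs = [0.9, 0.8, 0.85, 0.7, 0.95, 0.75]   # hypothetical p(d_i|x,y), p(b_i|x,y) values
exact = reduce(lambda u, v: u * v, probs)
print(f"exact = {exact:.4f}, SC (n=256) = {sc_product(probs, 256):.4f}")
```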
For the sensor fusion application, the SPINBIS architecture is configured as shown in Fig. 21. Each probability calculation requires 5 AND gates to multiply 6 conditional probabilities. The SBG sharing and terminal clustering strategies are utilized to reduce the required scale of the SBG array and the switch matrix. The 2D plane is partitioned into 64 × 64 and 32 × 32 grids for the target locating problem; a finer grid partition usually achieves more accurate locating results. Table IV shows the scale of the SBG array and the switch matrix for the different grid sizes, where the symbols T, N, M and N′ are as defined in Section III-F. For the 64 × 64 grid, the energy consumption and transistor utilization of the SBG array and switch matrix are 1.3% and 80% of those in [36], respectively. For the 32 × 32 grid, the energy consumption is 5.2% of that in [36], while the transistor utilization is 1.64× that in [36].
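The ratios in Table IV follow from the transistor-count model of Section III-F. The short calculation below recomputes K_energy = M/N and K_cmos = (T·M + M·N′)/(T·N) from the table's T, N, M and N′ columns. The reported 1.64 and 0.79 agree with the computed values of about 1.647 and 0.7995 to within truncation.

```python
def k_energy(m: int, n: int) -> float:
    """Energy ratio of the shared SBG array (M SBGs) to an unshared one (N SBGs)."""
    return m / n


def k_cmos(t: int, m: int, n: int, n_prime: int) -> float:
    """Transistor ratio: (SBG array + switch matrix) / unshared SBG array."""
    return (t * m + m * n_prime) / (t * n)


TABLE_IV = {  # grid: (T, N, M, N')
    "32x32": (92, 6144, 320, 2817),
    "64x64": (92, 24576, 320, 5557),
}
for grid, (t, n, m, n_prime) in TABLE_IV.items():
    print(f"{grid}: K_energy = {k_energy(m, n):.4f}, K_cmos = {k_cmos(t, m, n, n_prime):.4f}")
```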
3) Simulation results: The data obtained from the sensors are represented as bitstreams with lengths of 64, 128 and 256 for stochastic computing. The fusion results on the 64 × 64 grid are shown as heat maps in Fig. 22 and compared with the exact result; a longer bitstream usually yields better inference accuracy. Moreover, the Kullback-Leibler (KL) divergence is introduced to measure the difference between the stochastic inference results and the exact solutions, as shown in Fig. 23. The dashed yellow and blue lines represent the KL divergence under the specified process variations for the 64 × 64 and 32 × 32 grids, respectively, while the solid yellow and blue lines represent the KL divergence without process variations. For the same KL divergence, the bitstream length required with process variation is usually larger than that without process variation, but it is still smaller than the length required in [52]. As reported in [52], sensor fusion on the 32 × 32 grid for 10^4 cycles obtains a KL divergence of 0.029, whereas SPINBIS needs only about 128 cycles to achieve such a KL divergence, as shown in Fig. 23, even when process variation is considered. These advantages benefit from the high-accuracy, low-correlation bitstreams generated by the MTJ-based SBG array.
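For reference, the KL divergence between the exact posterior and a stochastic estimate over the grid cells can be computed as below. The direction D_KL(exact ‖ SC), the natural logarithm, the renormalization of the SC output and the small floor `eps` for cells where the SC estimate is zero are all assumptions of this sketch, since the text does not specify them. The SC counts are hypothetical.

```python
import math
from typing import List, Sequence


def normalize(values: Sequence[float]) -> List[float]:
    total = sum(values)
    return [v / total for v in values]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """D_KL(p || q) in nats; p is the exact posterior, q the (renormalized) SC estimate."""
    if len(p) != len(q):
        raise ValueError("distributions must cover the same grid cells")
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


exact = normalize([0.1, 0.6, 0.3])
sc_counts = [10, 55, 35]              # hypothetical counts of '1's from SC outputs
print(kl_divergence(exact, normalize(sc_counts)))
```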
Fig. 21: SPINBIS diagram of target locating problem (SBG array generating bitstreams bs1 to bsM, digital inputs, switch controller with example control bit patterns, and outputs O1 to ON).

Fig. 22: Sensor fusion results of SPINBIS for target locating problem on 64 × 64 grid compared with exact solutions. Heat-map panels: exact results, and SC results with bitstream (BS) lengths of 64, 128 and 256 (color scale in %).

Fig. 23: KL divergence analysis of SPINBIS for target locating problem. KL divergence versus bitstream length (up to 1000) for the 64 × 64 and 32 × 32 grids with and without process variation (pv); the annotated points are (128, 0.0253) and (128, 0.0289).

4) Performance: The performance of SPINBIS, with process variations considered, is compared with the FPGA-based [52] and MTJ-based [36] approaches. The sensor fusion problem is evaluated by these approaches on the 32 × 32 grid.
First, the stochastic computing method is compared with an
8-bit fixed-point binary implementation on the same FPGA
platform, a Xilinx Zynq 7020. The stochastic computing
method follows [52] but is re-implemented by ourselves
for fairness and clarity. Table V lists the stochastic
computing results for different bitstream lengths. As shown in
Table V, a longer bitstream obtains a lower KL divergence
(better accuracy) but consumes more energy. Once the bitstream
length exceeds about 200, the stochastic computing method
consumes more energy than the 8-bit fixed-point binary
implementation (2.06 µJ at a length of 200 versus 1.99 µJ).
On the other hand, the stochastic computing approach uses far
fewer resources than the binary implementation (9316 versus
234496 LUTs and 17608 versus 253952 FFs). In effect, stochastic
computing offers a trade-off between energy consumption and
inference accuracy, which makes it very promising for
fault-tolerant embedded applications that require high area
efficiency.
Then, the stochastic computing results of the different approaches
are compared in Table VI. All inference approaches are
required to achieve a KL divergence below 0.029. As seen
from Table VI, the MTJ-based approach [36] is significantly
more energy efficient than the FPGA approach [52] (1.17 µJ
versus 2.64 µJ in total). Furthermore, SPINBIS achieves better
energy efficiency and inference speed (0.10 µJ and 1.28 µs)
than both the MTJ [36] and FPGA [52] approaches, at the cost
of about 45% design area overhead compared with the
MTJ-based approach [36].
TABLE V. Comparison between stochastic computing method
and 8-bit fixed point binary implementation on FPGA.
Method   Bitstream Length   KL Divergence   Energy    LUT      FF
SC       64                 0.051           0.66 µJ   9316     17608
SC       128                0.037           1.32 µJ   9316     17608
SC       200                0.031           2.06 µJ   9316     17608
SC       256                0.021           2.64 µJ   9316     17608
SC       512                0.014           5.28 µJ   9316     17608
Binary   -                  -               1.99 µJ   234496   253952
TABLE VI. SPINBIS performance comparison with other
methods with the requirement of KL divergence less than
0.029 on 32 × 32 grid, where Ecyc is energy consumption
of each cycle, Tcyc is the duration time of each cycle, Ncyc
is the total cycle count, Etot is the total energy consumption
for all cycles, Ttot is the total inference time, Ncmos is the
number of utilized CMOS transistors.
Method     Ecyc (nJ)   Tcyc (ns)   Ncyc   Etot (µJ)   Ttot (µs)   Ncmos (×10^3)
FPGA       10.3        10          256    2.64        2.56        -
MTJ [36]   4.58        40          256    1.17        10.24       ≈ 830
SPINBIS    0.78        10          128    0.10        1.28        ≈ 1200
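The headline ratios can be checked directly from the table entries. This short Python sketch assumes E_tot = E_cyc × N_cyc and T_tot = T_cyc × N_cyc, which agrees with every row of Table VI. It also assumes that the FPGA per-cycle energy of 10.3 nJ is the per-bit cost of the stochastic computing method in Table V. The names `TABLE_VI`, `totals` and `summary` are introduced here for illustration only.

```python
# Per-cycle figures copied from Table VI (32 x 32 grid, KL divergence < 0.029).
# Each entry: (E_cyc in nJ, T_cyc in ns, N_cyc, N_cmos in units of 10^3 or None).
TABLE_VI = {
    "FPGA": (10.3, 10, 256, None),
    "MTJ [36]": (4.58, 40, 256, 830),
    "SPINBIS": (0.78, 10, 128, 1200),
}

BINARY_ENERGY_UJ = 1.99  # 8-bit fixed-point binary implementation, Table V


def totals(e_cyc_nj, t_cyc_ns, n_cyc):
    """Return (E_tot in uJ, T_tot in us): per-cycle cost times cycle count."""
    return e_cyc_nj * n_cyc / 1000.0, t_cyc_ns * n_cyc / 1000.0


def summary():
    """Energy gains of SPINBIS, its area overhead, and the SC/binary break-even length."""
    e = {m: totals(*v[:3])[0] for m, v in TABLE_VI.items()}
    gain_vs_mtj = e["MTJ [36]"] / e["SPINBIS"]
    gain_vs_fpga = e["FPGA"] / e["SPINBIS"]
    area_overhead = TABLE_VI["SPINBIS"][3] / TABLE_VI["MTJ [36]"][3] - 1.0
    # Bitstream length at which SC on the FPGA costs as much energy as the binary design.
    breakeven_len = BINARY_ENERGY_UJ * 1000.0 / TABLE_VI["FPGA"][0]
    return gain_vs_mtj, gain_vs_fpga, area_overhead, breakeven_len


if __name__ == "__main__":
    for method, row in TABLE_VI.items():
        e_tot, t_tot = totals(*row[:3])
        print(f"{method}: E_tot = {e_tot:.2f} uJ, T_tot = {t_tot:.2f} us")
    g_mtj, g_fpga, overhead, n_eq = summary()
    print(f"energy gain vs MTJ: {g_mtj:.1f}x, vs FPGA: {g_fpga:.1f}x")
    print(f"area overhead vs MTJ: {overhead:.0%}, SC/binary break-even: {n_eq:.0f} bits")
```

The script reproduces the Etot and Ttot columns of Table VI. It gives energy gains of about 11.7× over the MTJ-based approach and 26.4× over the FPGA approach, the "about 12×" and "about 26×" quoted in the text. The area overhead comes out at 45%, and the break-even bitstream length is near 193, matching the "about 200" threshold read from Table V.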
VI. CONCLUSION
Spintronic devices are a promising technology because of their
low power, high speed, infinite endurance and easy integration
with CMOS circuits. In this paper, the inherent stochastic
behavior of MTJs is exploited to build the stochastic bitstream
generator (SBG), which is critical for the Bayesian inference system. A
state-aware self-control strategy is proposed to improve the
energy efficiency and speed of the SBG circuit. The SBG sharing
strategy and the terminal clustering strategy are further proposed
in SPINBIS to reduce energy consumption and design
area overhead. A device-to-architecture level framework is
developed to evaluate the performance of SPINBIS, and
a typical application is presented as a case study. Experimental
results on data fusion applications demonstrate that
SPINBIS improves energy efficiency by about 12× over the
MTJ-based approach, with 45% design area overhead, and by about
26× over the FPGA-based approach.
In future work, we will pursue the following directions.
First, the relation between probability and voltage is not very
smooth, so the stability of the proposed SBG needs to be improved.
Second, the adopted switch matrix may still suffer from
congestion even though its scale is reduced from M × N
to M × N′. Further reducing the scale of SPINBIS is therefore also
a desirable research direction.
REFERENCES
[1] P. Pinheiro and P. Lima, “Bayesian sensor fusion for cooperative object
localization and world modeling,” in CIAS. Citeseer, 2004.
[2] N. Cruz-Ramírez, H. G. Acosta-Mesa, H. Carrillo-Calvet, L. A. Nava-
Fernández, and R. E. Barrientos-Martínez, “Diagnosis of breast cancer
using Bayesian networks: A case study,” Computers in Biology and
Medicine, vol. 37, no. 11, pp. 1553–1564, 2007.
[3] Y. Gal, R. Islam, and Z. Ghahramani, “Deep Bayesian active learning
with image data,” arXiv preprint arXiv:1703.02910, 2017.
[4] C. S. Thakur, S. Afshar, R. M. Wang, T. J. Hamilton, J. Tapson, and
A. Van Schaik, “Bayesian estimation and inference using stochastic
electronics,” Frontiers in Neuroscience, vol. 10, 2016.
[5] A. Alaghi and J. P. Hayes, “Survey of stochastic computing,” ACM
TECS, vol. 12, no. 2s, p. 92, 2013.
[6] J. Grollier, D. Querlioz, and M. D. Stiles, “Spintronic nanodevices for
bioinspired computing,” Proceedings of the IEEE, vol. 104, no. 10, pp.
2024–2039, 2016.
[7] M. Lin, I. Lebedev, and J. Wawrzynek, “High-throughput Bayesian
computing machine with reconfigurable hardware,” in Proc. FPGA,
2010, pp. 73–82.
[8] P. Mroszczyk and P. Dudek, “The accuracy and scalability of continuous-
time Bayesian inference in analogue CMOS circuits,” in Proc. ISCAS,
2014, pp. 1576–1579.
[9] J. S. Friedman, L. E. Calvet, P. Bessière, J. Droulez, and D. Querlioz,
“Bayesian inference with Muller C-elements,” IEEE TCAS I, vol. 63,
no. 6, pp. 895–904, 2016.
[10] A. F. Vincent, N. Locatelli, J.-O. Klein, W. S. Zhao, S. Galdin-Retailleau,
and D. Querlioz, “Analytical macrospin modeling of the stochastic
switching time of spin-transfer torque devices,” IEEE TED, vol. 62,
no. 1, pp. 164–170, 2015.
[11] T. Devolder, J. Hayakawa, K. Ito, H. Takahashi, S. Ikeda, P. Crozat,
N. Zerounian, J.-V. Kim, C. Chappert, and H. Ohno, “Single-shot
time-resolved measurements of nanosecond-scale spin-transfer induced
switching: Stochastic versus deterministic aspects,” Physical Review
Letters, vol. 100, no. 5, p. 057206, 2008.
[12] L. A. de Barros Naviner, H. Cai, Y. Wang, W. Zhao, and A. B.
Dhia, “Stochastic computation with spin torque transfer magnetic tunnel
junction,” in Proc. NEWCAS, 2015, pp. 1–4.
[13] Y. Wang, H. Cai, L. A. Naviner, J.-O. Klein, J. Yang, and W. Zhao, “A
novel circuit design of true random number generator using magnetic
tunnel junction,” in Proc. NANOARCH.
[14] S. Wang, S. Pal, T. Li, A. Pan, C. Grezes, P. Khalili-Amiri, K. L. Wang,
and P. Gupta, “Hybrid VC-MTJ/CMOS non-volatile stochastic logic for
efficient computing,” in Proc. DATE, 2017, pp. 1438–1443.
[15] J. Von Neumann, “Probabilistic logics and the synthesis of reliable
organisms from unreliable components,” Automata Studies, vol. 34, pp.
43–98, 1956.
[16] R. Venkatesan, S. Venkataramani, X. Fong, K. Roy, and A. Raghunathan,
“Spintastic: Spin-based stochastic logic for energy-efficient computing,”
in Proc. DATE, 2015, pp. 1575–1578.
[17] A. Mondal and A. Srivastava, “Power optimizations in MTJ-based neural
networks through stochastic computing,” in Proc. ISLPED, 2017, pp. 1–
6.
[18] J. S. Friedman, J. Droulez, P. Bessière, J. Lobo, and D. Querlioz,
“Approximation enhancement for stochastic Bayesian inference,” Inter-
national Journal of Approximate Reasoning, vol. 85, pp. 139–158, 2017.
[19] L. E. Calvet, J. S. Friedman, D. Querlioz, P. Bessière, and J. Droulez,
“Sleep stage classification with stochastic Bayesian inference,” in
Proc. NANOARCH, 2016, pp. 117–122.
[20] Y. Liu, Y. Wang, F. Lombardi, and J. Han, “An energy-efficient stochastic
computational deep belief network,” in Proc. DATE, 2018, pp. 1175–
1178.
[21] Y. Liu, S. Liu, Y. Wang, F. Lombardi, and J. Han, “A stochastic
computational multi-layer perceptron with backward propagation,” IEEE
Transactions on Computers, 2018.
[22] A. Ren, Z. Li, C. Ding, Q. Qiu, Y. Wang, J. Li, X. Qian, and
B. Yuan, “SC-DCNN: Highly-scalable deep convolutional neural network
using stochastic computing,” ACM SIGOPS Operating Systems Review,
vol. 51, no. 2, pp. 405–418, 2017.
[23] K. Kim, J. Kim, J. Yu, J. Seo, J. Lee, and K. Choi, “Dynamic energy-
accuracy trade-off using stochastic computing in deep neural networks,”
in Proc. DAC. IEEE, 2016, pp. 1–6.
[24] J. Li, A. Ren, Z. Li, C. Ding, B. Yuan, Q. Qiu, and Y. Wang, “Towards
acceleration of deep convolutional neural networks using stochastic
computing,” in Proc. ASPDAC, 2017, pp. 115–120.
[25] P. Jeavons, D. A. Cohen, and J. Shawe-Taylor, “Generating binary
sequences for stochastic computing,” IEEE Transactions on Information
Theory (TIT), vol. 40, no. 3, pp. 716–720, 1994.
[26] R. Cai, A. Ren, N. Liu, C. Ding, L. Wang, X. Qian, M. Pedram, and
Y. Wang, “VIBNN: Hardware acceleration of Bayesian neural networks,”
in Proc. ASPLOS. ACM, 2018, pp. 476–488.
[27] K. Kim, J. Lee, and K. Choi, “An energy-efficient random number
generator for stochastic circuits,” in Proc. ASPDAC. IEEE, 2016, pp.
256–261.
[28] P. K. Gupta and R. Kumaresan, “Binary multiplication with PN se-
quences,” Proc. ICASSP, vol. 36, no. 4, pp. 603–606, 1988.
[29] M. Wang, W. Cai, K. Cao, J. Zhou, J. Wrona, S. Peng, H. Yang, J. Wei,
W. Kang, Y. Zhang et al., “Current-induced magnetization switching in
atom-thick tungsten engineered perpendicular magnetic tunnel junctions
with large tunnel magnetoresistance,” Nature Communications, vol. 9,
no. 1, p. 671, 2018.
[30] M. Marins de Castro, R. Sousa, S. Bandiera et al., “Precessional
spin-transfer switching in a magnetic tunnel junction with a synthetic
antiferromagnetic perpendicular polarizer,” Journal of Applied Physics
(JAP), vol. 111, no. 7, p. 07C912, 2012.
[31] Y. Zhang, W. Zhao, J.-O. Klein, W. Kang, D. Querlioz, C. Chappert,
and D. Ravelosona, “Multi-level cell spin transfer torque MRAM based
on stochastic switching,” in Proc. NANO. IEEE, 2013, pp. 233–236.
[32] G. Srinivasan, A. Sengupta, and K. Roy, “Magnetic tunnel junction
enabled all-spin stochastic spiking neural network,” in Proc. DATE,
2017, pp. 530–535.
[33] S. Angizi, Z. He, Y. Bai, J. Han, M. Lin, R. F. DeMara, and D. Fan,
“Leveraging spintronic devices for efficient approximate logic and
stochastic neural networks,” 2018.
[34] X. Fong, M.-C. Chen, and K. Roy, “Generating true random numbers
using on-chip complementary polarizer spin-transfer torque magnetic
tunnel junctions,” in Device Research Conference (DRC), 2014, pp. 103–
104.
[35] N. Onizawa, D. Katagiri, W. J. Gross, and T. Hanyu, “Analog-to-
stochastic converter using magnetic tunnel junction devices for vision
chips,” IEEE TNANO, vol. 15, no. 5, pp. 705–714, 2016.
[36] X. Jia, J. Yang, Z. Wang, Y. Chen, and W. Zhao, “Spintronics
based stochastic computing for efficient Bayesian inference system,” in
Proc. ASPDAC, 2018, pp. 580–585.
[37] G. M. Masson, G. C. Gingher, and S. Nakamura, “A sampler of circuit
switching networks,” Computer, vol. 6, no. 12, pp. 32–48, 1979.
[38] J. Kim, K. Ryu, S. H. Kang, and S.-O. Jung, “A novel sensing circuit
for deep submicron spin transfer torque MRAM (STT-MRAM),” IEEE
VLSI, vol. 20, no. 1, pp. 181–186, 2012.
[39] P. M. Figueiredo and J. C. Vital, Offset reduction techniques in
high-speed analog-to-digital converters: analysis, design and tradeoffs.
Springer Science & Business Media, 2009.
[40] J. Yang, P. Wang, Y. Zhang, Y. Cheng, W. Zhao, Y. Chen, and H. H.
Li, “Radiation-induced soft error analysis of STT-MRAM: A device to
circuit approach,” IEEE TCAD, vol. 35, no. 3, pp. 380–393, 2016.
[41] J. Yang, X. Wang, Q. Zhou, Z. Wang, H. Li, Y. Chen, and W. Zhao,
“Exploiting spin-orbit torque devices as reconfigurable logic for circuit
obfuscation,” IEEE TCAD, vol. 38, no. 1, pp. 57–69, 2019.
[42] R. J. Baker, CMOS: circuit design, layout, and simulation. John Wiley
& Sons, 2008, vol. 1.
[43] Y. Wang, Y. Zhang, E. Deng, J.-O. Klein, L. A. Naviner, and W. Zhao,
“Compact model of magnetic tunnel junction with stochastic spin
transfer torque switching for reliability analyses,” Microelectronics Re-
liability, vol. 54, no. 9, pp. 1774–1778, 2014.
[44] S.-S. Choi, S.-H. Cha, and C. C. Tappert, “A survey of binary sim-
ilarity and distance measures,” Journal of Systemics, Cybernetics and
Informatics, vol. 8, no. 1, pp. 43–48, 2010.
[45] A. Alaghi and J. P. Hayes, “Exploiting correlation in stochastic circuit
design,” in Proc. ICCD, 2013, pp. 39–46.
[46] A. Nigam, C. W. Smullen IV, V. Mohan, E. Chen, S. Gurumurthi, and
M. R. Stan, “Delivering on the promise of universal memory for spin-
transfer torque RAM (STT-RAM),” in Proc. ISLPED. IEEE Press, 2011,
pp. 121–126.
[47] Y. Emre, C. Yang, K. Sutaria, Y. Cao, and C. Chakrabarti, “Enhancing
the reliability of STT-RAM through circuit and system level techniques,”
in Signal Processing Systems (SiPS), 2012 IEEE Workshop on. IEEE,
2012, pp. 125–130.
[48] J. Li, C. Augustine, S. Salahuddin, and K. Roy, “Modeling of fail-
ure probability and statistical design of spin-torque transfer magnetic
random access memory (STT-MRAM) array for yield enhancement,” in
Proc. DAC. ACM, 2008, pp. 278–283.
[49] Y. Zhang, X. Wang, and Y. Chen, “STT-RAM cell design optimization for
persistent and non-persistent error rate reduction: A statistical design
view,” in Proc. ICCAD. IEEE Press, 2011, pp. 471–477.
[50] D. Worledge, G. Hu, D. W. Abraham, J. Sun, P. Trouilloud, J. Nowak,
S. Brown, M. Gaidis, E. O’Sullivan, and R. Robertazzi, “Spin torque
switching of perpendicular Ta CoFeB MgO-based magnetic tunnel
junctions,” Applied Physics Letters, vol. 98, no. 2, p. 022501, 2011.
[51] R. Heindl, W. H. Rippard, S. E. Russek, M. R. Pufall, and A. B.
Kos, “Validity of the thermal activation model for spin-transfer torque
switching in magnetic tunnel junctions,” Journal of Applied Physics
(JAP), vol. 109, no. 7, p. 073910, 2011.
[52] A. Coninx, P. Bessière, E. Mazer, J. Droulez, R. Laurent, M. A. Aslam,
and J. Lobo, “Bayesian sensor fusion with fast and low power stochastic
circuits,” in ICRC, 2016, pp. 1–8.
Xiaotao Jia (S’13-M’17) received the B.S. degree
in mathematics from Beijing Jiao Tong University,
Beijing, China, in 2011, and the Ph.D. degree in
computer science and technology from Tsinghua
University, Beijing, China, in 2016.
He is currently a post-doctoral researcher with the
Fert Beijing Research Institute, Beihang University,
Beijing, China. His current research interests
include spintronic circuits and Bayesian learning
systems.
Jianlei Yang (S’12-M’16) received the B.S. degree
in microelectronics from Xidian University, Xi’an,
China, in 2009, and the Ph.D. degree in computer
science and technology from Tsinghua University,
Beijing, China, in 2014.
He joined Beihang University, Beijing, China,
in 2016, where he is currently an Associate Pro-
fessor with the School of Computer Science and
Engineering. From 2014 to 2016, he was a post-
doctoral researcher with the Department of Electrical
and Computer Engineering, University of Pittsburgh,
Pittsburgh, Pennsylvania, United States. From 2013 to 2014, he was a research
intern at Intel Labs China, Intel Corporation. His current research interests
include spintronics and neuromorphic computing systems.
Dr. Yang won first place in the TAU Power Grid Simulation
Contest in 2011 and second place in the TAU Power Grid Transient
Simulation Contest in 2012. He received the IEEE ICCD Best Paper
Award in 2013, the IEEE ICESS Best Paper Award in 2017, and an ACM GLSVLSI
Best Paper Nomination in 2015.
Pengcheng Dai received the B.S. degree in elec-
tronic engineering from Beihang University, Beijing,
China, in 2017. He is currently a graduate student with the
School of Electronic and Information Engineering,
Beihang University, Beijing, China. His research
interests include computing architectures for deep
learning and machine vision.
Runze Liu received the B.S. degree from the School of
Computer Science and Engineering, Beihang
University, Beijing, China, in 2018. He is currently
a graduate student at the University of Southern California,
CA, USA. His research interests include computing
architectures for deep learning and machine vision.
Yiran Chen (M’04-SM’16-F’18) received the B.S. and
M.S. degrees from Tsinghua University and the Ph.D. degree from
Purdue University in 2005. After five years in
industry, he joined the University of Pittsburgh in 2010 as an
Assistant Professor, was promoted to Associate
Professor with tenure in 2014, and held the title of Bicentennial
Alumni Faculty Fellow. He is now a tenured
Associate Professor in the Department of Electrical
and Computer Engineering at Duke University and
serves as the co-director of the Duke Center for Evolutionary
Intelligence (CEI), focusing on research
in new memory and storage systems, machine learning and neuromorphic
computing, and mobile computing systems. Dr. Chen has published one book
and more than 300 technical publications and has been granted 93 US patents.
He is an associate editor of IEEE TNNLS, IEEE TCAD, IEEE D&T, IEEE
ESL, ACM JETC and ACM TCPS, and has served on the technical and organization
committees of more than 40 international conferences. He has received 6 best
paper awards and 14 best paper nominations from international conferences.
He is a recipient of the NSF CAREER Award and the ACM SIGDA Outstanding
New Faculty Award. He is a Fellow of the IEEE.
Weisheng Zhao (M’06-SM’14-F’19) received the
Ph.D. degree in physics from University of Paris
Sud, Paris, France, in 2007.
He worked as a Research Associate at the CEA’s
embedded computing laboratory, France, from 2007
to 2009, and at the French national research center
(CNRS), France, as a tenured scientist from 2009
to 2014 where he led the spintronics integration
group. He is now a professor and the director of the Fert
Beijing Research Institute, Beihang University,
Beijing, China. He has authored or coauthored 2
books and more than 200 scientific papers in leading journals such as Nature
Communications, Advanced Materials and Proceedings of the IEEE, and he
holds 4 international patents and more than 50 Chinese patents. He is a
Fellow of the IEEE.
Prof. Zhao is an associate editor of IEEE TRANSACTIONS ON
NANOTECHNOLOGY and IET ELECTRONICS LETTERS.
