Voltage-driven Building Block for Hardware Belief Networks by Hassan, Orchi et al.
Voltage-driven Building Block for Hardware Belief Networks
Orchi Hassan, Kerem Y. Camsari, and Supriyo Datta
School of Electrical and Computer Engineering,
Purdue University, West Lafayette, IN 47907, USA
Probabilistic spin logic (PSL) based on networks of binary stochastic neurons (or p-bits) has been shown to provide a viable framework for many functionalities, including Ising computing, Bayesian inference, invertible Boolean logic, and image recognition. This paper presents a hardware building block for the PSL architecture, consisting of an embedded magnetic tunnel junction (MTJ) and a capacitive voltage adder of the type used in neuMOS. We use SPICE simulations to show how identical copies of these building blocks (or weighted p-bits) can be interconnected with wires to design and solve a small instance of the NP-complete Subset Sum Problem fully in hardware.
Keywords - Probabilistic computing, Embedded MTJ, p-bits, p-circuits, Invertible Boolean logic,
Subset Sum Problem
I. INTRODUCTION
Probabilistic spin logic (PSL) has been shown to provide a viable framework for Ising computing [1–3], Bayesian inference [2], invertible Boolean logic [4], and image recognition [5]. The PSL model is defined by two equations [4], loosely analogous to a neuron and a synapse. The former describes what we call the p-bit, whose output $m_i$ is related to its dimensionless input $I_i$ by
$$ m_i(t+\Delta t) = \mathrm{sgn}\left\{\mathrm{rand}(-1,+1) + \tanh\big(I_i(t)\big)\right\} \qquad \text{(1a)} $$
where $\mathrm{rand}(-1,+1)$ is a random number uniformly distributed between −1 and +1, and $t$ is the normalized time unit. The synapse generates the input $I_i$ from a weighted sum of the states of other p-bits according to
$$ I_i(t) = I_0\Big(h_i(t) + \sum_j J_{ij}\, m_j\Big) \qquad \text{(1b)} $$
where $h_i$ is the on-site bias, $J_{ij}$ is the weight of the coupling from the $j$th p-bit to the $i$th p-bit, and $I_0$ is a dimensionless constant. These two equations constitute the behavioral model of PSL. The objective of this paper is to present a voltage-driven hardware building block, built from present-day device technologies such as embedded MRAM [6] and floating-gate MOS transistors, such that identical copies of the same block can be interconnected with wires to implement Eqs. 1.
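The behavioral model of Eqs. 1a,b can be illustrated with a short Python sketch. This is a minimal illustration, not the authors' simulator: it assumes a symmetric [J] with zero diagonal and updates the p-bits one at a time in random order (the "random but sequential" scheme used for the PSL comparisons in Fig. 3). The function name `psl_sample` is hypothetical.

```python
import numpy as np

def psl_sample(J, h, I0, n_sweeps, rng=None):
    """Sample the PSL behavioral model, Eqs. (1a) and (1b).

    J: (N, N) coupling matrix (assumed symmetric, zero diagonal)
    h: (N,) on-site biases; I0: dimensionless scale factor.
    Returns an (n_sweeps, N) array of bipolar states m_i in {-1, +1}.
    """
    rng = np.random.default_rng() if rng is None else rng
    J = np.asarray(J, dtype=float)
    h = np.asarray(h, dtype=float)
    n = len(h)
    m = rng.choice([-1, 1], size=n)          # random initial state
    samples = np.empty((n_sweeps, n), dtype=int)
    for s in range(n_sweeps):
        for i in rng.permutation(n):          # random but sequential order
            I_i = I0 * (h[i] + J[i] @ m)      # Eq. (1b): synapse
            # Eq. (1a): p-bit; the sum is > 0 with probability (1 + tanh(I_i)) / 2
            m[i] = 1 if rng.uniform(-1, 1) + np.tanh(I_i) > 0 else -1
        samples[s] = m
    return samples

# Two p-bits with a positive coupling J12 = J21 = +1 and no bias.
J = [[0.0, 1.0], [1.0, 0.0]]
samples = psl_sample(J, [0.0, 0.0], I0=2.0, n_sweeps=5000,
                     rng=np.random.default_rng(0))
print(np.mean(samples[:, 0] * samples[:, 1]))   # approaches tanh(2) ≈ 0.96
```

For two coupled p-bits this update samples a Boltzmann-like distribution with $\langle m_1 m_2\rangle = \tanh(I_0 J_{12})$, so the printed correlation should sit near 0.96; with $J_{12} = 0$ it would fluctuate around zero.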
The paper is organized as follows. We first show a complete hardware mapping for the weighted p-bit (Wp-bit) by augmenting a recently introduced Magnetoresistive Random Access Memory (MRAM)-type stochastic unit [7] with a floating-gate MOS-based capacitive network [8]. We then show, using the example of an “invertible” Full Adder that can perform 1-bit addition and subtraction, that the results of a fully interconnected Wp-bit circuit closely approximate the ideal equations. Finally, we show how such invertible Full Adders can be interconnected to solve a simple instance of the NP-complete Subset Sum Problem.
Each example in this paper has been obtained using full SPICE models that use only transistors, capacitors, and resistors, without any additional complex circuitry or processing.
II. BUILDING BLOCK
Our building block has two components, corresponding to Eqs. 1a and 1b. Eq. 1a is implemented by the p-bit in Fig. 1a, which consists of an embedded low-barrier unstable MTJ coupled to two CMOS inverters. It provides a stochastic output whose average value is controlled by the input voltage:
$$ V_{out,i} = \frac{V_{DD}}{2}\,\mathrm{sgn}\left(\mathrm{rand}(-1,+1) + \tanh\frac{V_{in,i}}{V_0}\right) \qquad \text{(2a)} $$
where $\pm V_{DD}/2$ are the supply voltages and $V_0$ is a parameter (∼22 mV) describing the width of the sigmoidal response.
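Because rand(−1,+1) is uniform, the time average of Eq. 2a is exactly $(V_{DD}/2)\tanh(V_{in,i}/V_0)$, which is why averaged outputs such as the dots in Fig. 1e can be fitted by a smooth sigmoid. The sketch below assumes idealized samples drawn directly from Eq. 2a (not a circuit simulation), averages many samples at each input, and recovers $V_0$ with a brute-force least-squares grid search; the sweep range and sample counts are illustrative choices.

```python
import numpy as np

VDD = 0.8     # supply voltage in V, as in the simulations described here
V0 = 0.022    # sigmoid width in V (~22 mV)

def pbit_output(v_in, rng):
    """One sample of Eq. (2a) for each entry of v_in (volts)."""
    r = rng.uniform(-1.0, 1.0, size=np.shape(v_in))
    return (VDD / 2) * np.where(r + np.tanh(v_in / V0) > 0, 1.0, -1.0)

rng = np.random.default_rng(0)
v_in = np.linspace(-0.1, 0.1, 41)
# Average 2000 independent samples per bias point (an idealized "time average").
avg = np.mean([pbit_output(v_in, rng) for _ in range(2000)], axis=0)

# Fit the width: least-squares error against (VDD/2) * tanh(v_in / V) on a 0.1 mV grid.
grid = np.linspace(0.005, 0.060, 551)
err = [np.sum((avg - (VDD / 2) * np.tanh(v_in / v)) ** 2) for v in grid]
V0_fit = grid[int(np.argmin(err))]
print(f"fitted V0 = {V0_fit * 1e3:.1f} mV")
```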
The value of $V_0$ depends on the details of the 1T/1MTJ in the embedded MRAM structure [7] and on the transistor characteristics. The conductance $G_0$ of the MTJ is chosen to match the MTJ switching characteristics to the transistors in the Wp-bit, so that the overall transfer characteristic is centered at zero, as shown in Fig. 1e. To find it, an input voltage $V_i = 0$ V is applied at the input of transistors T1 and T2, turning both of them ON ($|V_{GS}| = 0.4$ V), and $G_0$ is swept while the outputs are observed. The value of $G_0$ for which $V^+_{OUT} = V^-_{OUT} = 0$ V is chosen as the MTJ conductance. For minimum-sized 14 nm HP-FinFET transistor models with $V_{DD} = 0.8$ V, this gives $1/G_0 \approx 62$ kΩ, which seems reasonable considering the resistance-area (RA) products of modern MTJs [9].
Eq. 1b is implemented by the weighted synapse portion of Fig. 1a, which is a capacitive voltage adder just like those used in neuMOS devices [8, 10]. We can write
$$ V_i = \frac{V_{bias,i}\,C_{b,i} + \sum_j V_{out,j}\,C_{ij}}{C_g + C_{z,i} + C_{b,i} + \sum_j C_{ij}} \qquad \text{(2b)} $$
arXiv:1801.09026v2 [cs.ET] 8 Feb 2019
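[Figure 1 image: (a) schematic of the Wp-bit, with the weight logic (capacitive adder of Eq. 2b with capacitors Cz,i, Cb,i, Ci,1 … Ci,n, bias Vbias,i, and inputs VOUT,1 … VOUT,n) driving the p-bit of Eq. 2a (transistors T0, T1, T2 and an LLG-modeled MTJ); (b) Wp-bit icon with inputs S1 … Sl (1C0), D1 … Dm (2C0), Q1 … Qn (4C0), bias terminals V+ and V−, and outputs VOUT+ and VOUT−; (c) to (e) transfer characteristics.]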
FIG. 1. (a) The voltage-driven building block has two components, corresponding to Eqs. 2a,b. The first is the p-bit, implemented through an embedded low-barrier unstable MTJ [4] with two inverters added to give positive and negative outputs. The low-barrier MTJ can be designed using low-barrier or circular nanomagnets. The second is the capacitive voltage adder, with an inverter structure on the left similar to the floating-gate MOS transistors used in neuMOS devices [8]. We call this combination of a p-bit and its weight logic a weighted p-bit (Wp-bit). (b) Block diagram of the Wp-bit. (c) An inverter amplifies the voltage $V_i$ produced by the capacitive network to give $V_{in,i}$ at the gate of the p-bit's NMOS transistor T0. (d) Relation of the NMOS input gate voltage ($V_{in,i}$) to the output ($V^+_{OUT}$). (e) Transfer characteristics of the Wp-bit as a whole. In each case the input is swept from −0.4 V to +0.4 V in 1 µs. The yellow dots are values time-averaged over 300 ns at each point, and the solid blue lines are numerical fits. The magnet used in the simulations is defined by the parameters in [7]: $M_s$ = 1100 emu/cc, $D$ = 22 nm, $t$ = 2 nm, $\alpha$ = 0.01. All transistors were modeled using minimum-size (nfin = 1) 14 nm HP-FinFET Predictive Technology Models with $V_{DD}$ = 0.8 V and $T$ = 300 K.
Note that the capacitive voltage divider typically attenuates the voltage $V_i$ at its output, and the inverter scales it up to $V_{in,i}$, as shown in Fig. 1c. The two are related approximately by
$$ V_{in,i} \approx \frac{V_{DD}}{2}\tanh\frac{V_i}{\nu_0} \approx \frac{V_{DD}}{2\nu_0}\,V_i \quad \text{if } |V_i| \ll \nu_0 \qquad \text{(2c)} $$
where $\nu_0$ is a parameter characteristic of the inverter.
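Eq. 2b is the charge-balance condition for the floating node: assuming the node carries no net charge and that $C_g$ and $C_{z,i}$ terminate at ground, $\sum_k C_k (V_i - V_k) = 0$ over all attached capacitors, which rearranges to Eq. 2b. The sketch below evaluates Eq. 2b and then the inverter of Eq. 2c. The function names, the value of $\nu_0$, and the example $C_g$ are hypothetical, since they are not specified here.

```python
import math

VDD = 0.8          # V
C0 = 100e-18       # unit capacitor, 100 aF (value quoted for Fig. 2)
NU0 = 0.05         # inverter parameter nu_0 in V (hypothetical value)

def adder_voltage(v_bias, c_bias, v_outs, c_ins, c_g, c_z):
    """Eq. (2b): floating-node voltage V_i of the capacitive voltage adder."""
    num = v_bias * c_bias + sum(v * c for v, c in zip(v_outs, c_ins))
    den = c_g + c_z + c_bias + sum(c_ins)
    return num / den

def inverter_output(v_i, nu0=NU0):
    """Eq. (2c): V_in,i ~ (VDD/2) tanh(V_i / nu0), with the sign convention of Eq. (2c)."""
    return (VDD / 2) * math.tanh(v_i / nu0)

# Neighbours at +VDD/2, -VDD/2, +VDD/2 coupled through 1, 2 and 4 unit capacitors,
# plus a hypothetical C_g = C0 and 10 grounded unit capacitors.
v_i = adder_voltage(0.0, 0.0, [0.4, -0.4, 0.4], [C0, 2 * C0, 4 * C0],
                    c_g=C0, c_z=10 * C0)
print(v_i)                     # = 0.4 * 3 / 18, attenuated by the grounded load
print(inverter_output(v_i))    # re-amplified toward the +/-VDD/2 range
```

The grounded capacitors dilute the weighted sum (here by a factor 3/18), and the inverter's small-signal gain $V_{DD}/2\nu_0$ restores it.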
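Eqs. 2a,b can be mapped onto the PSL Eqs. 1a,b by defining
$$ m_i = \frac{V_{out,i}}{V_{DD}/2}, \qquad I_i = \frac{V_{in,i}}{V_0} \qquad \text{(3a)} $$
$$ C_{b,i} = b_i C_0, \qquad C_{z,i} = z_i C_0 \qquad \text{(3b)} $$
$$ h_i = b_i\,\frac{V_{bias,i}}{V_{DD}/2}, \qquad J_{ij} = \frac{C_{ij}}{C_0} \qquad \text{(3c)} $$
$$ I_0 = \frac{(V_{DD}/2\nu_0)(V_{DD}/2V_0)}{(C_g/C_0) + z_i + b_i + \sum_j J_{ij}} \qquad \text{(3d)} $$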
$C_g$ is the intrinsic gate capacitance of the neuMOS inverter. The significance of $C_0$ is that we assume the input is composed of many identical capacitors $C_0$, and that the weights $J_{ij}$ have been designed to take integer values, so that $C_{ij}$ can be implemented by connecting $J_{ij}$ elementary capacitors in parallel. The other coefficients $z_i$ and $b_i$ are also integers. We adjust the number $b_i$ of bias capacitors to facilitate external biasing, and the number $z_i$ of grounded capacitors to make $z_i + b_i + \sum_j J_{ij} = K$ a constant, so that $I_0$ is independent of the index $i$:
$$ I_0 = \frac{(V_{DD}/2\nu_0)(V_{DD}/2V_0)}{(C_g/C_0) + K} \qquad \text{(4)} $$
Note that $K$ is usually a fairly large number, equal to the sum of all the weights, so to implement $I_0 \sim 1$ it is important that the factor $(V_{DD}/2\nu_0)(V_{DD}/2V_0)$ be much greater than 1. This is the reason for using an inverter between the capacitive voltage adder and the p-bit. Our model neglects any leakage resistances associated with the capacitive weights. Modern transistors with thin oxides can have gate leakage currents of ∼1 nA, with RC time constants of ∼µs to ms. This should not affect the weighting, since the examples presented here operate at sub-ns time scales. For slower neurons, it may be advisable to use thicker oxides for the capacitive weights to ensure lower leakage.
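The bookkeeping behind Eqs. 3 and 4 can be sketched as follows. The sketch assumes that the signs of $J_{ij}$ are realized by output polarity, so capacitor counts use $|J_{ij}|$; the example [J], $b_i$, $\nu_0$, and $C_g/C_0$ values are hypothetical. It pads each row with grounded unit capacitors so that every Wp-bit sees the same total $K$, which makes $I_0$ in Eq. 4 identical for all $i$.

```python
import numpy as np

def pad_to_constant_load(J, b, K):
    """Grounded unit capacitors z_i so that z_i + b_i + sum_j |J_ij| = K for every i."""
    load = np.asarray(b) + np.abs(np.asarray(J)).sum(axis=1)
    if K < load.max():
        raise ValueError("K must be at least max_i(b_i + sum_j |J_ij|)")
    return K - load

def I0_from_circuit(VDD, nu0, V0, Cg_over_C0, K):
    """Eq. (4): I0 = (VDD/2nu0)(VDD/2V0) / (Cg/C0 + K)."""
    return (VDD / (2 * nu0)) * (VDD / (2 * V0)) / (Cg_over_C0 + K)

# Hypothetical 3-p-bit example with integer weights and bias-capacitor counts.
J = np.array([[0, 2, -1],
              [2, 0, 1],
              [-1, 1, 0]])
b = np.array([1, 0, 2])
z = pad_to_constant_load(J, b, K=6)
print(z)                                   # grounded unit capacitors per Wp-bit

# VDD = 0.8 V, V0 = 22 mV, and a hypothetical nu0 = 50 mV and Cg/C0 = 1.
for K in (6, 12, 24):
    print(K, round(I0_from_circuit(0.8, 0.05, 0.022, Cg_over_C0=1.0, K=K), 2))
```

With these hypothetical numbers the gain factor is about 145, and $I_0$ falls from roughly 21 to 6 as $K$ grows from 6 to 24, illustrating why a large $K$ must be offset by a large gain.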
Fig. 1b shows the icon we use to represent our building block, which we call a weighted p-bit. It has three types of inputs, designated S, D, and Q, with capacitances $C_0$, $2C_0$, and $4C_0$. Combinations of these are used to implement different weights $J_{ij}$ and different biases $h_i$. Each block has two outputs, $V^+_{OUT}$ and $V^-_{OUT}$; the choice of output depends on the sign of the corresponding $J_{ij}$. Similarly, different signs of $h_i$ are implemented by choosing $V_{bias,i}$ to be $+V_{DD}/2$ or $-V_{DD}/2$.
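One way to turn a signed integer weight into Wp-bit connections, following the description above, is to pick the output from the sign of $J_{ij}$ and cover $|J_{ij}|$ with the Q (4C0), D (2C0), and S (C0) inputs. The helpers below are hypothetical and written for illustration only: they assume $V^+_{OUT}$ drives positive weights and use a greedy Q/D/S decomposition, which is one valid choice among several and not necessarily the one used in the paper.

```python
VDD = 0.8  # V

def wire_weight(J_ij):
    """Return (output terminal, list of input terminals) realizing an integer weight J_ij.

    Assumption: VOUT+ drives positive weights and VOUT- drives negative ones.
    Q, D and S inputs carry 4, 2 and 1 unit capacitors C0, respectively.
    """
    if J_ij == 0:
        return None, []                    # no connection
    output = "VOUT+" if J_ij > 0 else "VOUT-"
    remaining, inputs = abs(J_ij), []
    for name, units in (("Q", 4), ("D", 2), ("S", 1)):
        while remaining >= units:          # greedy: largest capacitor first
            inputs.append(name)
            remaining -= units
    return output, inputs

def wire_bias(h_i):
    """Eq. (3c) with V_bias = +/-VDD/2 gives h_i = +/-b_i, so b_i = |h_i|."""
    if h_i == 0:
        return 0.0, 0
    return (VDD / 2 if h_i > 0 else -VDD / 2), abs(h_i)

print(wire_weight(-3))   # ('VOUT-', ['D', 'S'])
print(wire_weight(6))    # ('VOUT+', ['Q', 'D'])
print(wire_bias(-2))     # (-0.4, 2)
```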
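[Figure 2 image: (a) the 5 × 5 matrix [J_FA] over (Ci, B, A, S, C0), with zero diagonal and off-diagonal entries of ±1 and ±2; (b) capacitive connections to p-bit A through 1C, 2C, and 4C capacitors (terminals S1 to S3, D1, Q1, Q2), with bias terminals V+ and V− at +VDD/2 and −VDD/2 and outputs VOUT+ and VOUT−; (c) Full Adder (FA) subcircuit with terminals Ci, B, A, S, C0 and clamping pins hCi, hB, hA, hS, hC0.]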
FIG. 2. Invertible Full Adder with Wp-bits: (a) [J] matrix for implementing a Full Adder. (b) Explicit hardware connections made to one of the input p-bits (A) from the other p-bits, where 1C, 2C, and 4C represent capacitors in units of C = C0 = 100 aF. (c) Subcircuit representation of the Full Adder with its input/output terminals: Ci, B, A input and S, C0 output read terminals, and the corresponding separate clamping terminals hCi, hB, hA, hS, hC0. We used 8C for the clamping terminals to ensure that the inputs/outputs follow the external signals.
III. INVERTIBLE FULL ADDER
In PSL, any given truth table can be implemented using Eq. 1 by choosing appropriate [J] and [h] matrices [4]. Here we show how [J] and [h] are mapped onto physical hardware using our proposed building block, with only transistors, resistors, and capacitors.
A Full Adder can be implemented in PSL using the [J] matrix shown in Fig. 2a. In this paper, we improve on the 14-p-bit implementation of the invertible Full Adder (FA) in Ref. [4] and implement the same functionality using 5 p-bits. This is achieved by first noting that the first half of the FA truth table is complementary to the second half (Fig. 3a inset): each line in the first half is the bitwise complement of a line in the second half. The first 4 lines of the truth table are turned into an orthonormal set by a Gram-Schmidt process, and a [J] matrix is obtained using Eq. 12 in Ref. [4]; this matrix is finally rounded to integer values, with the diagonal entries replaced by zeros. The resulting [J] defines the interconnection between the 5 Wp-bits of the Full Adder in hardware. Each row of the [J] matrix is realized as capacitive coupling to the gate of the associated terminal.
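The complementarity property and the Gram-Schmidt step can be checked numerically. The sketch below builds the Full Adder truth table in the bipolar encoding of $m_i$ (columns Ci, B, A, S, C0), confirms that each line of the first half is the complement of a line in the second half, and orthonormalizes the first 4 lines. It stops there: the construction of [J] from this set (Eq. 12 of Ref. [4]) and the integer rounding are not reproduced, so no claim is made that it regenerates the matrix of Fig. 2a.

```python
import numpy as np

# Full Adder truth table, columns (Ci, B, A, S, C0), lines ordered by (Ci, B, A).
lines = []
for ci in (0, 1):
    for b in (0, 1):
        for a in (0, 1):
            total = ci + b + a
            lines.append([ci, b, a, total % 2, total // 2])
T = 2 * np.array(lines) - 1                  # bipolar encoding: 0 -> -1, 1 -> +1

# Line k of the first half is the complement of line 7 - k (0-based) of the second half.
complementary = all(np.array_equal(T[k], -T[7 - k]) for k in range(4))
print(complementary)                          # True

# Classical Gram-Schmidt on the first four lines.
basis = []
for v in T[:4].astype(float):
    w = v - sum((v @ q) * q for q in basis)   # remove projections on earlier vectors
    basis.append(w / np.linalg.norm(w))
Q = np.array(basis)
print(np.allclose(Q @ Q.T, np.eye(4)))        # True: orthonormal set
```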
To ensure that a uniform $I_0$ is applied to each p-bit (Eq. 4), the same weighting factor $K$ needs to be used for all Wp-bits. To apply a given $I_0$, we first find $\max_i\big(b_i + \sum_j J_{ij}\big)$
[Figure 3 image: histograms for (a) the Directed mode, with Ci = 0, B = 1, A = 1 clamped, and (b) the Inverted mode, with S = 0, C0 = 1 clamped.]
FIG. 3. Full SPICE implementation of an invertible Full Adder (5 Wp-bits): the 5-Wp-bit invertible Full Adder circuit is simulated in (a) directed and (b) inverted modes. The clamping values are indicated. All biasing terminals that are not clamped to 1 or 0 are grounded. The histogram of [Ci B A S C0] is obtained after thresholding the voltages ((V < 0) ≡ −1, (V > 0) ≡ +1). The SPICE model is run for 1 µs and compared with the PSL equations, where each p-bit is updated in random but sequential order [4]. In this example $I_0 \approx 1$ is chosen to emphasize that the two models agree well even in the magnitudes of the minor peaks of the histogram.
for any given [J], and then ground $z_i = M - \big(b_i + \sum_j J_{ij}\big)$ unit capacitances ($z_i \ge 0$, $z_i \in \mathbb{N}$) at each terminal, where $M$ is a number that can be used to control $I_0$, a larger $M$ giving a smaller $I_0$. Fig. 2b shows the explicit connections made to one of the inputs, “A”, and Fig. 2c shows the subcircuit of the Full Adder with Ci, B, A as inputs, S, C0 as outputs, and hCi, hB, hA, hS, hC0 as the clamping pins.
Fig. 3 shows the operation of a Full Adder in the usual forward (directed) mode, with Ci, B, A clamped to (0, 1, 1), which forces S and C0 to (0, 1) according to the truth table. In the inverted mode, S and C0 are clamped to (0, 1), and the circuit stochastically searches for combinations of Ci, B, A consistent with the truth table: {Ci, B, A} = {{0, 1, 1}, {1, 0, 1}, {1, 1, 0}}. The histograms in Fig. 3 show the steady-state (t = 1 µs) Full Adder operation in directed and inverted modes, side by side with results from the PSL behavioral model.
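The states expected in the directed and inverted modes can be cross-checked against the truth table by brute force. This enumeration is only a reference for where the histogram peaks should appear, not a description of how the p-circuit finds them; the helper names are hypothetical.

```python
from itertools import product

def full_adder(ci, b, a):
    """Truth table of a 1-bit Full Adder: returns (S, C0)."""
    total = ci + b + a
    return total % 2, total // 2

def consistent_inputs(s, c0):
    """All (Ci, B, A) consistent with clamped outputs (S, C0)."""
    return [x for x in product((0, 1), repeat=3) if full_adder(*x) == (s, c0)]

print(full_adder(0, 1, 1))        # directed mode: (0, 1)
print(consistent_inputs(0, 1))    # inverted mode: [(0, 1, 1), (1, 0, 1), (1, 1, 0)]
```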
The good agreement between the ideal PSL behavioral model and the coupled SPICE simulation, which solves PTM-based transistor models together with the stochastic Landau-Lifshitz-Gilbert (LLG) equation for the magnet, validates the hardware mapping of the ideal p-bit equations onto weighted p-bits.
IV. 3SUM PROBLEM
3SUM is a decision problem in complexity theory that asks whether three elements of a given set can sum to zero. A variant of the problem requires the three numbers to add up to a given constant. This problem can be solved in polynomial time and is therefore not expected to be NP-complete. In this section, we show how the invertibility of the Full Adders can be used to design a hardware 3SUM solver, and in the next section we show how the 3SUM hardware can be modified to design a general solver for the NP-complete Subset Sum Problem.
[Figure 4 image: (a) two rows of Full Adders (FA), the first adding A and B into X0 to X4 and the second adding X and C into the SUM, with S clamped; (b) A, B, C, and A + B + C versus time (0 to 150 ns) for S = 15, with the corresponding probability histograms.]
FIG. 4. SPICE simulation of a 4-bit 3SUM problem (9 × 5 = 45 Wp-bit network): (a) The circuit is constructed by interconnecting two rows of invertible Full Adders (FA) to form a 3-number, 4-bit adder. The sum S is clamped to the desired value, and A, B, C resolve themselves into all the possible 3-number combinations of the integers 0 to $2^4 - 1$ that satisfy A + B + C = S. (b) Results when S is clamped to 15. A, B, and C become correlated to satisfy the sum with different combinations. In this example, the inputs A, B, C are unconstrained and can take any value between 0 and 15.
The invertibility of the Full Adders ensures that, given the sum, the circuit can provide the possible input combinations for that sum, as shown in Fig. 4a. Thus an n-bit, 3-number adder circuit implemented in PSL can essentially provide solution sets for the 3SUM problem when the sum is clamped to a given value.
Fig. 4a shows the circuit constructed out of Full Adders to solve a 4-bit 3SUM problem. Each Full Adder in the circuit is the 5-p-bit invertible adder shown in Fig. 3. The first row of adders adds the two 4-bit numbers A and B and feeds its output X to the next row of adders, which adds X and C to give the sum S = C + X = C + B + A. Because p-circuits are invertible, if we clamp the sum S, the circuit naturally explores all possible sets and multisets of integers from 0 to $2^4 - 1$ that add up to S. The given set for the problem could be implemented by clamping certain bits of A, B, and C, or external circuitry could be used to detect only the results that belong to the given set. Fig. 4b shows how A, B, and C fluctuate between values that satisfy the clamped sum of 15.
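The two-row arrangement and the expected set of solutions can be cross-checked in software. The sketch below is an illustrative reference, not a model of the stochastic circuit: it chains bit-level Full Adders as described (first row X = A + B, second row S = X + C, little-endian bits) and enumerates every (A, B, C) in 0 to 15 whose sum equals a clamped S. Function names are hypothetical.

```python
from itertools import product

def full_adder(ci, b, a):
    total = ci + b + a
    return total % 2, total // 2             # (sum bit, carry out)

def ripple_add(x_bits, y_bits):
    """One row of Full Adders on little-endian bit lists; returns one extra carry bit."""
    n = max(len(x_bits), len(y_bits))
    x = x_bits + [0] * (n - len(x_bits))
    y = y_bits + [0] * (n - len(y_bits))
    out, carry = [], 0
    for xi, yi in zip(x, y):
        s, carry = full_adder(carry, yi, xi)
        out.append(s)
    return out + [carry]

def to_bits(value, n):
    return [(value >> k) & 1 for k in range(n)]

def to_int(bits):
    return sum(bit << k for k, bit in enumerate(bits))

def three_sum_states(S, nbits=4):
    """All (A, B, C) consistent with the clamped 3-number adder."""
    states = []
    for A, B, C in product(range(2 ** nbits), repeat=3):
        X = ripple_add(to_bits(A, nbits), to_bits(B, nbits))   # first row: X0..X4
        total = ripple_add(X, to_bits(C, nbits))                # second row: X + C
        if to_int(total) == S:
            states.append((A, B, C))
    return states

states = three_sum_states(15)
print(len(states))          # 136
```

For S = 15 there are 136 ordered triples, since every non-negative solution of A + B + C = 15 automatically fits in 4 bits.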
V. SUBSET-SUM PROBLEM (SSP)
In this section, we show how the hardware circuit designed for the 3SUM problem can be modified to solve a small instance of the subset sum problem (SSP) [11], which is NP-complete and believed to be a fundamentally difficult problem in computer science. The SSP asks, given a finite set G of positive integers, whether there is a subset S′ ⊆ G whose elements sum to a specified target. For example, Fig. 5 shows a circuit programmed with the set G = {1, 2, 4} and a target defined by 4 bits. In the 3SUM circuit the input bits (A, B, C) were left “floating”; here, each input is constrained to a given number (1, 2, or 4) by clamping its remaining bits. For example, the inputs A1 and A0 are clamped to zero so that A is either 4 or 0. Under these conditions, clamping the output to a specified target makes the circuit search for a consistent input combination, that is, a subset that satisfies the clamped target. Fig. 5c shows three example targets for which the inputs become correlated to satisfy the clamped sum. The invertibility exploited to solve the SSP in this hardware is similar to that discussed in the context of memcomputing [12], although the physical mechanisms are completely different.
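A brute-force reference makes this SSP instance concrete. The sketch below, with hypothetical helper names, lists the values an input can take once its other bits are clamped to zero and enumerates the subsets of G that hit a target; it is a classical check of the answer the p-circuit should settle on, not a model of the circuit. Clamping the zero bits of g gives exactly {0, g} only when g is a power of two, as for G = {1, 2, 4}; for other elements every sub-mask of g would remain allowed.

```python
from itertools import product

def subset_sum_solutions(G, target):
    """Brute-force reference: every subset of G whose elements sum to target."""
    solutions = []
    for picks in product((0, 1), repeat=len(G)):
        subset = [g for g, keep in zip(G, picks) if keep]
        if sum(subset) == target:
            solutions.append(subset)
    return solutions

def clamped_input_values(g, nbits=3):
    """Values an nbits input can take when every bit that is 0 in g is clamped to 0.

    For a power of two g this is exactly {0, g}, as for A (A1 = A0 = 0).
    """
    free = [k for k in range(nbits) if (g >> k) & 1]
    return sorted({sum(bit << k for bit, k in zip(bits, free))
                   for bits in product((0, 1), repeat=len(free))})

G = [1, 2, 4]
print(clamped_input_values(4))           # [0, 4]
print(subset_sum_solutions(G, 5))        # [[1, 4]]
```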
One striking difference between the SSP design we considered and the 3SUM hardware is the direction of information flow. In 3SUM, the connections run from the first layer of Full Adders to the second, as in normal addition (Fig. 4a). In the SSP, we observed that reversing these connections, so that they run from the second layer of adders to the first, drastically improves the accuracy of the solution (Fig. 5a). A similar observation regarding the directional flow of information was made in [4] for another inverse problem using p-circuits (integer factorization).
Here we have limited the discussion to a small instance
[Figure 5 image: (a) adder computing A + B + C through intermediate bits X0 to X3 and sum bits S0 to S4, with the chosen subset S′ = {A, B, C} ⊆ G and G = {1, 2, 4}; (b) and (c) clamped targets and the resulting input states.]
FIG. 5. SPICE simulation of a 3-input, 3-bit Subset Sum Problem (7 × 5 = 35 Wp-bit network): (a) A 3-input, 3-bit binary adder that adds three numbers A, B, C. Unlike the 3SUM case, here the inputs are constrained to values specified by the set G = {1, 2, 4}. A target S is selected, and the output of the adders is clamped to the target value, as shown in (b). (c) Three different target instances for which the inputs find a consistent combination (the correct subset of G) that satisfies the target. The histograms show that the most probable state is the correct subset. An important difference from the 3SUM circuit is that the information flow is directed from the target (second layer of adders) to the first layer of adders.
of the SSP; larger instances would in general require more layers of Full Adders in both the vertical and horizontal directions to accommodate more elements in G and larger element sizes. The purpose of this example is to illustrate how invertibility can be combined with standard digital VLSI design to construct any general “cost function” for hard problems of computer science on an asynchronously running hardware platform without any external clocking.
VI. CONCLUSION
In this paper we have proposed a compact building block for Probabilistic Spin Logic (PSL) that combines a recently proposed embedded-MRAM-based p-bit with an integrated capacitive network, which can be implemented using floating-gate MOS (FGMOS) transistors similar to the neuMOS concept. We have shown by extensive SPICE simulations that the results of the hardware model for the weighted p-bit agree well with the behavioral equations of PSL. Dedicated MTJ-based hardware stochastic neurons could help minimize footprint and power consumption, as also indicated by Refs. [5, 9]. Although an FGMOS-based capacitive network seems a natural option for performing the voltage addition, we note that the device equations for any capacitance network [Cij] or conductance network [Gij] would have been essentially the same. Moreover, our discussion covered only static weights, but an FPGA-like reconfigurable weighting scheme could also be employed, using either transistor-based gates or additional multiplexing circuitry, to perform online learning or to redesign p-circuit connectivity. Finally, using the basic building block, we have shown how a hardware solver for a small instance of the NP-complete Subset Sum Problem can be designed using the unique invertibility of p-circuits.
ACKNOWLEDGMENT
This work was supported in part by the Center for Probabilistic Spin Logic for Low-Energy Boolean and Non-Boolean Computing (CAPSL), one of the Nanoelectronic Computing Research (nCORE) Centers, as task 2759.005, a Semiconductor Research Corporation (SRC) program sponsored by the NSF through ECCS 1739635.
[1] B. Sutton, K. Y. Camsari, B. Behin-Aein, and S. Datta,
Scientific Reports 7 (2017).
[2] B. Behin-Aein, V. Diep, and S. Datta, Scientific Reports 6 (2016).
[3] Y. Shim, A. Jaiswal, and K. Roy, Journal of Applied Physics 121, 193902 (2017), http://dx.doi.org/10.1063/1.4983636.
[4] K. Y. Camsari, R. Faria, B. M. Sutton, and S. Datta,
Phys. Rev. X 7, 031014 (2017).
[5] R. Zand, K. Y. Camsari, I. Ahmed, S. D. Pyle, C. H.
Kim, S. Datta, and R. F. DeMara, arXiv preprint
arXiv:1710.00249 (2017).
[6] C. Lin, S. Kang, Y. Wang, K. Lee, X. Zhu, W. Chen,
X. Li, W. Hsu, Y. Kao, M. Liu, et al., Electron Devices
Meeting (IEDM), 2009 IEEE International (IEEE, 2009)
pp. 1–4.
[7] K. Y. Camsari, S. Salahuddin, and S. Datta, IEEE Elec-
tron Device Letters 38, 1767 (2017).
[8] T. Shibata and T. Ohmi, IEEE Transactions on Electron Devices 39, 1444 (1992).
[9] A. Mizrahi, T. Hirtzlin, A. Fukushima, H. Kubota, S. Yuasa, J. Grollier, and D. Querlioz, Nature Communications 9, 1533 (2018).
[10] N. Nakamura, K. Shimada, T. Matsuda, and M. Kimura, Future of Electron Devices, Kansai (IMFEDK), 2015 IEEE International Meeting for (IEEE, 2015) pp. 90–91.
[11] T. H. Cormen, Introduction to Algorithms (MIT Press, 2009).
[12] F. L. Traversa and M. Di Ventra, Chaos: An Interdisciplinary Journal of Nonlinear Science 27, 023107 (2017).
