Probabilistic Circuits for Autonomous Learning: A simulation study by Kaiser, Jan et al.
Probabilistic Circuits for Autonomous Learning: A simulation study
Jan Kaiser,1, a) Rafatul Faria,1, b) Kerem Y. Camsari,1, c) and Supriyo Datta1, d)
Department of Electrical and Computer Engineering, Purdue University, West Lafayette, IN,
47906 USA
(Dated: 15 October 2019)
Modern machine learning is based on powerful algorithms running on digital computing platforms and there
is great interest in accelerating the learning process and making it more energy efficient. In this paper we
present a fully autonomous probabilistic circuit for fast and efficient learning that makes no use of digital
computing. Specifically we use SPICE simulations to demonstrate a clockless autonomous circuit where the
required synaptic weights are read out in the form of analog voltages. Such autonomous circuits could be
particularly of interest as standalone learning devices in the context of mobile and edge computing.
I. INTRODUCTION
Machine learning, inference and many other emerg-
ing applications1 make use of stochastic neural networks
comprising (1) a binary stochastic neuron (BSN)2,3 and
(2) a synapse that constructs the inputs Ii to the i
th BSN
from the outputs mj of all other BSNs.
The output mi of the i
th BSN fluctuates between +1
and -1 with a probability controlled by its input
mi(t+ τN ) = sgn [tanh (Ii(t))− r] (1)
where r represents a random number in the range
[−1,+1], and τN is the time it takes for a neuron to
provide a stochastic output mi in accordance with a new
input Ii.
Usually the synaptic function, Ii({m}) is linear and is
defined by a set of weights Wij such that
Ii(t+ τS) =
∑
j
Wijmj(t) (2)
where τS is the time it takes to recompute the inputs {I}
everytime the outputs {m} change. Typically Eqs.(1),(2)
are implemented in software, often with special accelera-
tors for the synaptic function using GPU/TPUs4,5.
The time constants τN and τS . are not important when
Eqs.(1),(2) are implemented on a digital computer using
a clock to ensure that neurons are updated sequentially
and the synapse is updated between any two updates.
But they play an important role in clockless operation
of autonomous hardware that makes use of the natu-
ral physics of specific systems to implement Eqs.(1),(2)
approximately. For example, Eq. (1) can be imple-
mented using stochastic magnetic tunnel junctions (s-
MTJs)6,7, while resistive or capacitive crossbars can im-
plement Eq. (2)8. It has been shown that such hard-
ware implementations can operate autonomously with-
out clocks, if the BSN operates slower than the synapse,
that is, if τN >> τS
9.
a)Electronic mail: kaiser32@purdue.edu
b)Electronic mail: rfaria@purdue.edu
c)Electronic mail: kcamsari@purdue.edu
d)Electronic mail: datta@purdue.edu
Stochastic neural networks defined by Eqs. (1),(2) can
be used for inference whereby the weights Wij are de-
signed such that the system has a very high probability
of visiting configurations defined by {m} = {v}n, where
{v}n represents a specified set of patterns. However, the
most challenging and time-consuming part of implement-
ing a neural network is not the inference function, but the
learning required to determine the correct weights Wij
for a given application. This is commonly done offline
using powerful cloud-based processors and there is great
interest in accelerating the learning process and making
it more energy efficient so that it can become a routine
part of mobile and edge computing.
In this paper we present a new approach to the prob-
lem of fast and efficient learning that makes no use of
digital computing at all. Instead it makes use of the
natural physics of a fully autonomous probabilistic cir-
cuit composed of standard electronic components like re-
sistors, capacitors and transistors along with stochastic
magnetic tunnel junctions (MTJs).
We focus on a fully visible Boltzmann machine
(FVBM), a form of stochastic recurrent neural network,
for which the most common learning algorithm is based
on the gradient ascent approach to optimizing the max-
imum likelihood function10,11. We use a slightly simpli-
fied version of this approach, whereby the weights are
changed incrementally according to
Wij(t+ ∆t) = Wij(t) + [vivj −mimj − λWij(t)]
where  is the learning parameter, λ is the regularization
parameter12 and the bipolar training vectors {v}n are cy-
cled sequentially or as an average. The term vivj is the
(average) correlation between the ith and the jth entry
of the training vector {v}n. The term mimj corresponds
to the sampled correlation taken from the model’s distri-
bution. The advantage of this network topology is that
the learning rule is local and tolerant to stochasticity.
For our autonomous operation we replace this equa-
tion with its continuous time version (τL: learning time
constant)
dWij
dt
=
vivj −mimj − λWij
τL
(3)
which we translate into an RC circuit by associating Wij
with the voltage on a capacitor C driven by a voltage
1
ar
X
iv
:1
91
0.
06
28
8v
1 
 [c
s.E
T]
  1
4 O
ct 
20
19
mj
mi mi
Rf
Rf
Rf
Rf
Rf Rf
Rf
Rf
mj
mj
mj
mj
Av
Av
C
Lorem ipsum
Rf
Vij
Vm,ij
Vv,ij
Lorem ipsum
R/2
R/2
Eq. 2Eq. 3 Eq. 1
{m}
[W]{v}n
{m}
{I} {m}
OP1
OP2
OP3
in,i
Weight update Synapse Neuron
LL
G Hnoise
m
Is
FIG. 1. Clockless learning circuit designed to emulate Eqs. (1)-(3) autonomously.
source (Vv,ij−Vm,ij) with a series resistance R (Fig.(1)):
C
dVij
dt
=
Vv,ij − Vm,ij − Vij
R
(4)
with vivj = Vv,ij/(VDD/2) and mimj = Vm,ij/(VDD/2).
From Fig. 1 and comparing Eqs. (3),(4) it is easy
to see how the weights and the learning and regular-
ization parameters are mapped into circuit elements:
Wij = AvVij/V0, λ = V0/(AvVDD/2) and τL = λRC
where Av is the voltage gain of OP3 in Fig. 1. For
proper operation the learning time scale τL has to be
much larger than the neuron time τN to be able to col-
lect enough statistics throughout the learning process.
A key element of this approach is the representation
of the weights W with voltages rather than with pro-
grammable resistances for which memristors and other
technologies are still in development13. By contrast the
charging of capacitors is a textbook phenomenon, allow-
ing us to design a learning circuit that can be built today
with established technology. The idea of using capacitor
voltages to represent weights in neural networks has been
presented by several authors for different network topolo-
gies in analog learning circuits14–17. The use of capaci-
tors has the advantage of having a high level of linearity
and symmetry for the weight updates during the training
process18.
In Section II, we will describe such a learning cir-
cuit that emulates Eqs. (1)-(3). The training images or
patterns {v}n are fed in as electrical signals into the in-
put terminals, and the synaptic weights Wij can then be
read out in the form of voltages from the output termi-
nals. Alternatively the values can be stored in a non-
volatile memory from which they can subsequently be
read and used for inference. In Section III, we will
present SPICE simulations demonstrating the operation
of this autonomous learning circuit.
II. CLOCKLESS LEARNING CIRCUIT EMULATING
EQS.(1)-(3)
The autonomous learning circuit has 3 parts where
each part represents one of the three Eqs. (1)-(3). On
the left hand side of Fig. 1, the training data is fed into
the circuit by supplying a voltage Vv,ij which is given by
the ith entry of the bipolar training vector vi multiplied
by the jth entry of the training vector vj and scaled by
the supply voltage VDD/2. The training vectors can be
fed in sequentially or as an average of all training vectors.
The weight voltage Vij across capacitor C follows Eq. (4)
where Vv,ij is compared to voltage Vm,ij which represents
correlation of the outputs of BSNs mi and mj . Voltage
Vm,ij is computed in the circuit by using an XNOR gate
that is connected to the output of BSN i and BSN j.
The synapse in the center of the circuit connects weight
voltages to neurons according to Eq. (2). Voltage Vij
has to be multiplied by 1 or -1 depending on the cur-
rent value of mj . This is accomplished by using a switch
which connects either the positive or the negative node
of Vij to the operational amplifiers OP1 and OP2. Here,
OP1 accumulates all negative contributions and OP2 all
positive contributions of the synaptic function. The dif-
ferential amplifier OP3 takes the difference between the
output voltages of OP2 and OP1 and amplifies the volt-
age by amplification factor Av. This voltage conversion
is used to control the voltage level of Vij in relation to the
input voltage of each BSN. The voltage level at the input
of the BSN is fixed by the reference voltage of the BSN
V0. However, the voltage level of Vij can be adjusted and
utilized to adjust the regularization parameter λ in the
2
learning rule (Eq. (3)). The functionality of the BSN is
described by Eq. (1) where the dimensionless input I(t)
is given by Ii(t) = Vi,in(t)/V0. This relates the voltage
Vij to the dimensionless weight by Wij = AvVij/V0. The
hardware implementation of the BSN uses a stochastic
MTJ in series to a transistor6. Due to thermal fluctua-
tions of the low-barrier magnet (LBM) of the MTJ the
output voltage of the MTJ fluctuates randomly but with
the right statistics given by Eqn. 1. The time dynam-
ics of the LBM can be obtained by solving the stochastic
Landau-Lifshitz-Gilbert (LLG) equation. Due to the fast
thermal fluctuations of the LBM in the MTJ, Eq. (1)
can be evaluated on a subnanosecond timescale leading
to fast generation of samples19,20.
Fig. 1 just shows the hardware implementation of one
weight and one BSN. The size of the whole circuit de-
pends on the size of the training vector N . For every
entry of the training vector one BSN is needed. The num-
ber of weights which is the number capacitive circuits is
given by N(N − 1)/2 where every connection between
BSNs is assumed to be reciprocal.
The learning process is captured by Eqs. (3) and (4).
The whole learning process has similarity with the soft-
ware implementation of persistent contrastive divergence
(PCD)21 since the circuit takes samples from the models
distribution (Vm,ij) and compares it to the target dis-
tribution (Vv,ij) without reinitializing the Markov Chain
after a weight update. During the learning process volt-
age Vij reaches a constant average value where
dVij
dt ≈ 0.
This voltage Vij = Vij,learned corresponds to the learned
weight.
For inference the capacitor C is replaced by a volt-
age source of voltage Vij,learned. Consequently, the au-
tonomous circuit will compute the desired functionality
given by the training vectors. In general, training and
inference has to be performed on identical hardware in
order to learn around variations. It is important to note
that in inference mode this circuit can be used for opti-
mization by performing electrical annealing. This is done
by increasing all weights voltages Vij by the same factor
over time. In this way the ground state of a Hamiltonian
like the Ising Hamiltionian can be found22,23.
III. SPICE SIMULATIONS
In this section the autonomous learning circuit in Fig.
1 is simulated in SPICE. We show how the proposed
circuit can be used for both inference and learning. As
examples we demonstrate the learning on a Full Adder
(FA) and on 5x3 digits images. For all SPICE simula-
tions the following parameters are used for the stochastic
MTJ in the BSN implementation: Saturation magne-
tization MS = 1100 emu/cc, LBM diameter D = 22
nm, LBM thickness l = 2 nm, TMR=110%, damping
coefficient α = 0.01, temperature T = 300 K and demag-
FIG. 2. Feeding of training data into the circuit a)
Weight voltage V1,5 over time for sequential and average feed-
ing in of the correlation between visible unit i and visible unit
j for training a full adder. b) Correlation v1v5 vs. time t. All
eight lines of the truth table of a full adder are cycled through
where every vector is shown for time T = 1 ns at a time. c)
Enlarged version of subfigure a). For sequential feeding in of
data, the voltage change in v1v5 directly affects V1,5.
A B Cin S Cout Dec PIdeal
-1 -1 -1 -1 -1 0 0.125
-1 -1 1 1 -1 6 0.125
-1 1 -1 1 -1 10 0.125
-1 1 1 -1 1 13 0.125
1 -1 -1 1 -1 18 0.125
1 -1 1 -1 1 21 0.125
1 1 -1 -1 1 25 0.125
1 1 1 1 1 31 0.125
TABLE I. Truth Table of Full Adder Every 0 in the bi-
nary representation of the Full Adder is replaced by a -1 in
the bipolar representation. ”Dec” represents the decimal con-
version of each line. PIdeal is the ideal probability distribution
were every line’s probability is p = 1/8 = 0.125.
netization field HD = 4piMSV with V = (D/2)
2pil. For
the transitors, 14 nm HP-FinFET PTM models were
used24 with fin number fin = 1 for the inverters and
fin = 2 for XNOR-gates. Ideal operational amplifiers
and switches are used in the synapse. For this parameter
set, the reference voltage of each BSN is given by
V0 = 50 mV
6. The characteristic time of the BSNs τN is
in the order of 100 ps20 and much larger than the time it
takes for the synaptic connections, namely the resistors
and operational amplifiers, to propagate BSN outputs
to neighboring inputs (τN  τS).
Learning addition
As first training example we use the probability distri-
bution of a full adder (FA). The truth table/probability
distribution of a full adder with bipolar variables is shown
in table I. To learn this distribution the correlation terms
3
vivj in the learning rule have to be fed into the voltage
node Vv,ij . The correlation is dependent on what train-
ing vector / truth table line is fed in. For the second
line for the truth table for example v1v2 = −1 · −1 = 1
and v1v3 = −1 · 1 = −1 with A being the first node, B
the second node and so on. In Fig. 2 b) the correlation
v1v5 is shown. For the sequential case the value of v1v5
is obtained by circling through all lines of the truth ta-
ble where each training vector is shown for 1 ns. A and
Cout in table I only differ in the fourth and fifth line for
which v1v5 = −1. For all other cases v1v5 = 1. The
average of all lines is shown as red solid line. Fig. 2 a)
shows how the weight voltage Vij with i = 1 and j = 5
for FA learning and the first 1000 ns of training. The
following learning parameters have been used for the FA:
τL = 62.5 ns where C = 1 nF and R = 5 kΩ, Av = 10
and RF = 1 MΩ. This choice of learning parameters
ensures that τL  τN . Due to the averaging effect of
the RC-circuit both sequential and average feeding of the
training vector result in similar learning behavior as long
as the RC-constant is much larger than the timescale of
sequential feeding. Fig. 2 c) shows the enlarged version
of Fig. 2 a). For the sequential feeding, voltage V1,5
changes substantially every time vivj switches to -1.
In Fig. 3 a) the probability distribution of the FA
PTrain is shown after 5000 ns of learning and compared
to the ideal distribution PIdeal. The training vector is
fed in as an average. Fig. 3 a) also shows the normalized
histogram PSPICE of the sampled BSN configurations
taken from the last 2500 ns of learning. For both PTrain
and PSPICE the eight trained configurations of table I are
the dominant peaks. To monitor the training process,
the Kullback-Leibner divergence between the trained to
the ideal probability distribution KL(PIdeal||PTrain(t))
is plotted as a function of training time t in Fig. 3 b).
During training the KL divergence decreases over time
until it reaches a constant value at about 0.33. Fig. 3
shows that the probability distribution of a FA can be
learned very fast with the proposed autonomous learning
circuit.
Learning image completion
As second example the circuit is utilized to train 10
5x3 pixel digit images shown in Fig.4 a). The network is
trained for 3000 ns and the bipolar training data is fed in
as average of the 10 vivj terms for every digit. The same
learning parameters as in the previous section are used
here. In 4 b) the KL divergence is shown as a function of
time between the trained and the ideal probability dis-
tribution where the ideal distribution has 10 peaks with
each peak being 10 % for each digit. Most of the learn-
ing happens in the first 500 ns of training, however, the
KL divergence still reduces slightly during the later parts
of learning. After 3000 ns the KL divergence reaches a
value of around 2.
a)
b)
FIG. 3. Training of a Full Adder in SPICE a) Probabil-
ity distribution of a trained full adder is compared to the ideal
distribution with inputs binary inputs A, B, Cin and outputs
S and Cout. The training is performed for 5000 ns. Blue bars
are the probability distribution extracted from SPICE simu-
lations by creating a histogram of the configurations of m over
the last 2500 ns of training. Red bars are the probability dis-
tribution obtained by using the weight voltages at the end of
training. b) KullbackLeibler divergence between the trained
and the target distribution defined as KL(PIdeal||PTrain(t)) =∑
m PIdeal(m) log(PIdeal(m)/PTrain(m, t)) . Following param-
eters have been used in the simulations: C = 1 nF, R = 5 kΩ,
RF = 1 MΩ, Av = 10, V0 = 50 mV.
For inference we replace the capacitor by a voltage
source where every voltage is given by the previously
learned voltage Vij . The circuit is run for 10 instances
where every instance has a unique clamping pattern of 6
pixels representing one of the 10 digits. The clamped in-
puts are show in Fig. 4 c). The input of a clamped BSN
is set to ±VDD/2. Each instance is run for 100 ns and
the outputs of the BSNs monitored. The BSNs fluctuate
between the configurations given by the learned proba-
bility distribution. In Fig. 4 d) the heat map of the
output of the BSNs is shown. For every digit the most
likely configuration is given by the trained digit image.
To illustrate this point, the amount of BSN fluctuations
is reduced by increasing the learned weight voltages by
factor of I0 = 2. This step is equivalent of reducing the
temperature of the Boltzmann machine by a factor of
225. The circuit is again run in inference mode for 100
ns with the same clamping patterns. In Fig. 4 e) the
heatmap is shown. The circuit locks in into the learned
digit configuration. This shows that in inference mode
4
a)
b)
c)
d)
Digits
Clamping patterns
Heatmap: I0=1
Heatmap: I0=2e)
FIG. 4. Training and Testing of 5x3 Digit images a) 5x3
digit images from 0 to 9. b) Kullback Leibner divergence dur-
ing training for 3000 ns using the autonomous circuit. c)-e)
Image completion: For inference, 6 unique pixels are clamped
for every digit (as shown in c)). Subfigure d) ande) show
the heatmap of BSN outputs during inference for running the
circuit for 100 ns for d) I0 = 1 and e) I0 = 2.
the circuit can be utilized for image completion.
IV. CONCLUSIONS
We have used full SPICE simulations to demonstrate
the feasibility of a clockless autonomous circuit running
without any digital component with the learning parame-
ters set by circuit parameters. Due to the fast BSN oper-
ation, samples are drawn at subnanosecond speeds lead-
ing ultrafast learning, as such the learning speed should
be at least multiple orders of magnitudes faster compared
to other computing platforms26–28. We believe the ap-
proach can be extended to other machine learning algo-
rithms to design autonomous circuits. Such standalone
learning devices could be particularly of interest in the
context of mobile and edge computing.
ACKNOWLEDGMENT
This work was supported in part by ASCENT, one of six centers
in JUMP, a Semiconductor Research Corporation (SRC) program
sponsored by DARPA. KYC gratefully acknowledges support from
Center for Science of Information (CSoI), an NSF Science and Tech-
nology Center, under grant CCF-0939370.
5
1Schuman, C. D. et al. A Survey of Neuromorphic Computing
and Neural Networks in Hardware. arXiv:1705.06963 [cs] (2017).
URL http://arxiv.org/abs/1705.06963. ArXiv: 1705.06963.
2Ackley, D. H., Hinton, G. E. & Sejnowski, T. J. A learning
algorithm for Boltzmann machines. In Readings in Computer
Vision, 522–533 (Elsevier, 1987).
3Neal, R. M. Connectionist learning of belief networks. Artifi-
cial Intelligence 56, 71–113 (1992). URL https://linkinghub.
elsevier.com/retrieve/pii/0004370292900656.
4Google supercharges machine learning tasks
with TPU custom chip (2016). URL
https://cloud.google.com/blog/products/gcp/
google-supercharges-machine-learning-tasks-with-custom-chip/.
5Schmidhuber, J. Deep Learning in Neural Networks: An
Overview. Neural Networks 61, 85–117 (2015). URL http:
//arxiv.org/abs/1404.7828. ArXiv: 1404.7828.
6Camsari, K. Y., Salahuddin, S. & Datta, S. Implementing p-
bits With Embedded MTJ. IEEE Electron Device Letters 38,
1767–1770 (2017).
7Camsari, K. Y., Faria, R., Sutton, B. M. & Datta, S. Stochastic
p -Bits for Invertible Logic. Physical Review X 7 (2017). URL
http://link.aps.org/doi/10.1103/PhysRevX.7.031014.
8Hassan, O., Camsari, K. Y. & Datta, S. Voltage-Driven Building
Block for Hardware Belief Networks. IEEE Design Test 36, 15–
21 (2019).
9Sutton, B., Faria, R., Ghantasala, L. A., Camsari, K. Y. &
Datta, S. Autonomous Probabilistic Coprocessing with Petaflips
per Second. arXiv:1907.09664 [cond-mat] (2019). URL http:
//arxiv.org/abs/1907.09664. ArXiv: 1907.09664.
10Koller, D. & Friedman, N. Probabilistic Graphical Models: Prin-
ciples and Techniques - Adaptive Computation and Machine
Learning (The MIT Press, 2009).
11Carreira-Perpinan, M. A. & Hinton, G. E. On contrastive diver-
gence learning. In Aistats, vol. 10, 33–40 (Citeseer, 2005).
12Ng, A. Y. Feature selection, L 1 vs. L 2 regularization, and
rotational invariance. In Twenty-first international conference
on Machine learning - ICML ’04, 78 (ACM Press, Banff, Al-
berta, Canada, 2004). URL http://portal.acm.org/citation.
cfm?doid=1015330.1015435.
13Li, Y., Wang, Z., Midya, R., Xia, Q. & Yang, J. J. Review
of memristor devices in neuromorphic computing: materials sci-
ences and device challenges. J. Phys. D: Appl. Phys. 51, 503002
(2018). URL http://stacks.iop.org/0022-3727/51/i=50/a=
503002?key=crossref.4b31e50d126d48242ac6ec568c97f1c2.
14Kim, S., Gokmen, T., Lee, H. & Haensch, W. E. Analog CMOS-
based resistive processing unit for deep neural network training.
In 2017 IEEE 60th International Midwest Symposium on Cir-
cuits and Systems (MWSCAS), 422–425 (2017).
15Schneider, C. R. & Card, H. C. Analog CMOS deterministic
Boltzmann circuits. IEEE Journal of Solid-State Circuits 28,
907–914 (1993).
16Card, H. C., Schneider, C. R. & Schneider, R. S. Learning
capacitive weights in analog CMOS neural networks. Journal
of VLSI Signal Processing 8, 209–225 (1994). URL https:
//doi.org/10.1007/BF02106447.
17Sung, C., Hwang, H. & Yoo, I. K. Perspective: A review on mem-
ristive hardware for neuromorphic computation. Journal of Ap-
plied Physics 124, 151903 (2018). URL http://aip.scitation.
org/doi/10.1063/1.5037835.
18Li, Y. et al. Capacitor-based Cross-point Array for Analog Neural
Network with Record Symmetry and Linearity. In 2018 IEEE
Symposium on VLSI Technology, 25–26 (2018).
19Kaiser, J. et al. Ultrafast Fluctuations in Low-Barrier Nano-
magnets. arXiv:1902.03312 [cond-mat] (2019). URL http:
//arxiv.org/abs/1902.03312. ArXiv: 1902.03312.
20Hassan, O., Faria, R., Camsari, K. Y., Sun, J. Z. & Datta,
S. Low-Barrier Magnet Design for Efficient Hardware Binary
Stochastic Neurons. IEEE Magnetics Letters 10, 1–5 (2019).
21Tieleman, T. Training restricted Boltzmann machines using
approximations to the likelihood gradient. In Proceedings of
the 25th international conference on Machine learning - ICML
’08, 1064–1071 (ACM Press, Helsinki, Finland, 2008). URL
http://portal.acm.org/citation.cfm?doid=1390156.1390290.
22Sutton, B., Camsari, K. Y., Behin-Aein, B. & Datta, S. Intrinsic
optimization using stochastic nanomagnets. Scientific Reports 7
(2017). URL http://www.nature.com/articles/srep44370.
23Camsari, K. Y., Chowdhury, S. & Datta, S. Scaled
Quantum Circuits Emulated with Room Temperature p-Bits.
arXiv:1810.07144 [quant-ph] (2018). URL http://arxiv.org/
abs/1810.07144. ArXiv: 1810.07144.
24Predictive Technology Model (PTM) (2012). URL http://ptm.
asu.edu/.
25Aarts, E. & Korst, J. Simulated Annealing and Boltzmann Ma-
chines: A Stochastic Approach to Combinatorial Optimization
and Neural Computing (John Wiley & Sons, Inc., New York,
NY, USA, 1989).
26Adachi, S. H. & Henderson, M. P. Application of Quantum
Annealing to Training of Deep Neural Networks (2015). URL
https://arxiv.org/abs/1510.06356v1.
27Korenkevych, D. et al. Benchmarking Quantum Hardware for
Training of Fully Visible Boltzmann Machines (2016). URL
https://arxiv.org/abs/1611.04528v1.
28Terenin, A., Dong, S. & Draper, D. GPU-accelerated Gibbs
sampling: a case study of the Horseshoe Probit model. Stat
Comput 29, 301–310 (2019). URL http://link.springer.com/
10.1007/s11222-018-9809-3.
6
