$MC^2RAM$: Markov Chain Monte Carlo Sampling in SRAM for Fast Bayesian
  Inference by Shukla, Priyesh et al.
This paper has been accepted at the IEEE International Symposium on Circuits and Systems (ISCAS) to be held in May 2020 at Seville, Spain.
Priyesh Shukla, Ahish Shylendra, Theja Tulabandhula, and Amit Ranjan Trivedi
University of Illinois at Chicago, IL, USA, email: {pshukl23, amitrt}@uic.edu
Abstract—This work discusses the implementation of Markov Chain Monte Carlo (MCMC) sampling from an arbitrary Gaussian mixture model (GMM) within SRAM. We present a novel SRAM architecture that embeds random number generators (RNGs), digital-to-analog converters (DACs), and analog-to-digital converters (ADCs) so that SRAM arrays can be used for high-performance MCMC sampling based on the Metropolis-Hastings (MH) algorithm. Most of the expensive computations are performed within the SRAM and can be parallelized for high-speed sampling. Our iterative compute flow minimizes data movement during sampling. We characterize the power-performance trade-off of our design through simulations in 45 nm CMOS technology. For a two-dimensional, two-mixture GMM, the implementation consumes ∼91 µW of power per sampling iteration and produces 500 samples in 2000 clock cycles on average at a 1 GHz clock frequency. Our study provides insights into how low-level hardware non-idealities can affect high-level sampling characteristics, and recommends ways to operate SRAM optimally within area/power constraints for high-performance sampling.
Index Terms—Inference; in-memory computing; Markov chain
Monte Carlo (MCMC) sampling.
I. INTRODUCTION
Markov chain Monte Carlo (MCMC) is a widely used statistical technique for generating samples from high-dimensional probability density functions, even when these functions cannot be defined analytically [1], [2]. Especially in recent years, as machine learning (ML) platforms proliferate for real-time decision-making, MCMC is increasingly combined with Bayesian ML models to perform efficient inference [3], [4]. Unlike classical inference, Bayesian inference can capture uncertainties in the outcomes for risk-aware decision-making [5], as shown in Fig. 1. Moreover, there is growing interest in operating ML-based prediction models at the edge itself [6], [7]. A low-power, low-area MCMC platform is therefore becoming imperative alongside low-power ML implementations.
Prior works have demonstrated low-power MCMC implementations on FPGAs that achieve accuracy similar to their software counterparts while being more energy-efficient. An FPGA-based hardware accelerator [8] was also designed for variational inference of Bayesian neural networks (BNNs).
In contrast, in this work we present MC2RAM, a customized implementation of MCMC within SRAM in which we co-locate and co-optimize functional units, control flow, and data flow to address critical bottlenecks in high-speed MCMC-based sampling. Our compute flow exploits the Markov chain property that successive chain outputs lie in close proximity, which minimizes the computing load in each iteration. While MCMC in prior works [8], [9] is limited to Gaussian functions, MC2RAM extends it to Gaussian mixture models (GMMs). Since a GMM with enough mixture components can model any density arbitrarily closely, our implementation is far more broadly applicable. We also develop insights into the interaction between low-level hardware non-idealities and high-level sampling characteristics in MC2RAM. Our detailed exploration of the design and operating-power space can lead to efficient design methodologies for SRAM-based sampling.
[Fig. 1 panels: (left) Bayesian inference in a neural network, with statistical densities over the weights; (right) an image classified by an NN with fixed weights and by an NN with densities of weights, both predicting "Automobile" for the correct label "Truck", with the latter reporting epistemic uncertainty 1.9018 and aleatoric uncertainty 0.0004.]
Fig. 1: (Left) Bayesian inference (BI) in a neural network (NN) represents weights as density functions, unlike classical inference (CI), where weights take scalar values. (Right) An ambiguous image of trucks in glaring sunlight [10] causes uncertainty in prediction and decision-making. BI accounts for prediction uncertainty, unlike classical inference.
The paper is organized as follows. Section II presents background on MCMC and an overview of MC2RAM. Section III discusses density function computation and sampling in MC2RAM. Section IV discusses the results and the effects of various non-idealities in MC2RAM on sampling. Section V concludes the paper.
II. MCMC FOR BI AND PROPOSED MC2RAM
In ML models and methods that employ Bayesian inference (BI), the expectation of the predicted outcome (or of another quantity of interest) is obtained by solving $\int M(I, w) \times P(w|D)\,dw$, where $M(I, w)$ is the model (say, a neural network) with input $I$ and parameters/weights $w$, and $P(w|D)$ is the posterior density of the weights given the training data $D$. In such computations, analytical integration is often intractable because the density function of the random variable (RV) and/or the function to be integrated, e.g., $P(w|D)$ and $M(I, w)$ in BI, are too complicated. A Monte Carlo approach therefore becomes necessary to compute these quantities numerically. The Monte Carlo approach approximates an integral over a function of an RV, $x$, as

$$\int G(x) \times F(x)\,dx \approx \frac{1}{T} \sum_{t=1}^{T} G\left(x_{F(x)}^{(t)}\right) \qquad (1)$$

Here, $G(x)$ is the function to be integrated, $F(x)$ is the density function of $x$, and $x_{F(x)}^{(t)}$ is the $t$-th independent and identically distributed (i.i.d.) sample drawn from $F(x)$. The law of large numbers guarantees asymptotic convergence of the summation to the exact integral as the number of samples $T$ increases. In BI, $F(x)$ is the posterior density $P(w|D)$, which can be evaluated numerically, up to a normalizing constant, using Bayes' formula $P(w|D) \propto P(D|w) \times P(w)$, but cannot always be defined analytically.

[Fig. 2 flowchart: for a density function $F(x)$, a random sample $R$ produces the candidate $x_t^{cand} = x_{t-1} + R$; if $F(x_t^{cand})/F(x_{t-1}) > U$, where $U$ is a random threshold, then $x_t = x_t^{cand}$.]
Fig. 2: System-level flow for in-SRAM MCMC sampling.
arXiv:2003.02629v1 [eess.SP] 28 Feb 2020

MCMC overcomes this critical problem by constructing an ergodic Markov chain whose stationary distribution is $F(x)$. Among MCMC methods, we choose Metropolis-Hastings (MH) sampling [11], which provides a middle ground between the complexity of accepting/rejecting samples and the average time needed to generate a candidate sample. The algorithmic steps of MH-based MCMC are shown in Fig. 2. A candidate sample at step $t$, $x_t^{cand}$, is determined from the previously accepted sample $x_{t-1}$ and a randomly generated sample $R$ as $x_t^{cand} = R + x_{t-1}$; here, $R$ follows the statistics of the proposal distribution $P$ (e.g., uniform or Gaussian), typically centered at zero. The statistics of $R$ control the search radius and the sample-search behavior of MCMC. The candidate sample is accepted when the ratio of the density at $x_t^{cand}$ to that at $x_{t-1}$, i.e., $F(x_t^{cand})/F(x_{t-1})$, is greater than a random threshold $U$ drawn uniformly between zero and one; otherwise, the chain keeps $x_t = x_{t-1}$. Because only this ratio is needed, the unknown normalizing constant of $F(x)$ cancels.
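The MH loop described above can be sketched in a few lines of Python. This is a minimal software sketch, not the in-SRAM implementation: it assumes a uniform proposal $R \sim U(-\mathrm{step}, \mathrm{step})$ per dimension, evaluates the acceptance test $F(x_t^{cand})/F(x_{t-1}) > U$ in the log domain, and then feeds the chain into the estimator (1). The names `mh_sample` and `monte_carlo_mean` are illustrative. Note that MCMC samples are correlated rather than i.i.d.; the ergodic average still converges to the integral, but usually more slowly than with independent samples.

```python
import math
import random


def mh_sample(log_density, x0, n_samples, step, burn_in=0, seed=0):
    """Random-walk Metropolis-Hastings with a symmetric, zero-centered proposal.

    log_density(x) returns ln F(x) up to an additive constant, so the
    normalizing constant of F never needs to be known.
    """
    rng = random.Random(seed)
    x = list(x0)
    log_fx = log_density(x)
    samples = []
    for t in range(burn_in + n_samples):
        # Candidate: x_cand = x_{t-1} + R, with R ~ Uniform(-step, step) per dimension.
        x_cand = [xi + rng.uniform(-step, step) for xi in x]
        log_fc = log_density(x_cand)
        # Threshold U in (0, 1]; using 1 - random() avoids log(0).
        u = 1.0 - rng.random()
        # Accept if F(x_cand) / F(x_{t-1}) > U, evaluated in the log domain.
        if log_fc - log_fx > math.log(u):
            x, log_fx = x_cand, log_fc
        # On rejection the chain keeps x_t = x_{t-1}.
        if t >= burn_in:
            samples.append(list(x))
    return samples


def monte_carlo_mean(g, samples):
    """Right-hand side of (1): (1/T) * sum_t G(x_t)."""
    return sum(g(s) for s in samples) / len(samples)


# Usage: standard normal target; estimate E[x^2], whose exact value is 1.
def log_std_normal(x):
    return -0.5 * x[0] ** 2


samples = mh_sample(log_std_normal, [0.0], n_samples=20000, step=1.5, burn_in=500, seed=1)
second_moment = monte_carlo_mean(lambda s: s[0] ** 2, samples)
print(f"E[x^2] estimate: {second_moment:.3f}")
```

Working in the log domain avoids numerical underflow, and because the proposal is symmetric, the acceptance test needs no Hastings correction term.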
Fig. 3 shows the overall architecture of MC2RAM. An SRAM array stores the GMM parameters of the density function of the RV, i.e., the mean, standard deviation, and mixture weights ($\mu$, $\sigma$, and $p$). Since the dimension of the sampling density can be high, the $\mu$, $\sigma$, and $p$ vectors are distributed across multiple SRAM banks as shown in the figure. The SRAM arrays are also integrated with random number generators (RNGs). Using the RNGs and the previous sample of the chain ($x_{t-1}$), a candidate sample ($x_t^{cand}$) is generated within the SRAM. For $x_t^{cand}$, the SRAM arrays partially compute the density of the candidate sample, i.e., $F(x_t^{cand})$, in parallel following single instruction multiple data (SIMD) style execution. The central processing layer receives the partial terms of the density computation from the SRAM arrays and applies the Metropolis-Hastings acceptance criterion to accept or reject $x_t^{cand}$. Using the accepted $x_t$, the chain iterates to find the next sample $x_{t+1}$.
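The split of the density computation between the SRAM banks and the central processing layer can be illustrated with the exponent of a single mixture component. In this sketch, each bank holds a subset of the dimensions and returns a partial sum of the exponent $E_j$ of (3) in Section III, and the central layer only adds the partial terms. The round-robin assignment of dimensions to banks and the function names are hypothetical; the paper does not specify how dimensions are mapped to banks.

```python
import random


def bank_partial_exponent(x, mu_j, sigma_j, dims):
    """Partial sum of E_j (Eq. (3)) over the dimensions stored in one SRAM bank."""
    return sum(((x[i] - mu_j[i]) / sigma_j[i]) ** 2 for i in dims)


def exponent_from_banks(x, mu_j, sigma_j, n_banks):
    """Each bank computes its partial term; the central layer only adds them.

    Dimensions are assigned round-robin to banks (a hypothetical mapping).
    """
    n = len(x)
    bank_dims = [range(b, n, n_banks) for b in range(n_banks)]
    partials = [bank_partial_exponent(x, mu_j, sigma_j, dims) for dims in bank_dims]
    return sum(partials), partials


# Usage: a 10-dimensional component split across 4 banks.
rng = random.Random(0)
n = 10
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
mu = [rng.gauss(0.0, 1.0) for _ in range(n)]
sigma = [rng.uniform(0.5, 2.0) for _ in range(n)]
e_total, e_partials = exponent_from_banks(x, mu, sigma, n_banks=4)
e_direct = bank_partial_exponent(x, mu, sigma, range(n))
```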
The key complexities of MCMC illustrated in Fig. 2 are (i) computing $F(x_t^{cand})$ for each candidate sample $x_t^{cand}$, since the dimension of $x_t^{cand}$ can be high and/or the density function $F(x)$ can be complex, and (ii) achieving high-throughput sampling, since many candidates $x_t^{cand}$ may end up being rejected. The details of how our framework addresses these complexities are discussed next.
[Fig. 3 diagram: SRAM banks storing $\mu$, $\sigma$, and $p$ with embedded RNGs and in-SRAM compute, plus a central processing layer with a uniform random number generator ($U$) and a log-add LUT. Steps: (1) candidate sample generation, where the in-SRAM RNG generates $x_t^{cand}$; (2) density computation, where the SRAM arrays partially compute the density using SIMD instructions; (3) sample acceptance/rejection, where the central processing layer compares $F(x_t^{cand})/F(x_{t-1})$ against $U$ and updates $x_{t-1}$.]
Fig. 3: Overview of MC2RAM for MCMC sampling within SRAM.
III. IN-SRAM DENSITY COMPUTATION AND SAMPLING
The density of a GMM, $G(x)$, at a candidate sample $x_t^{cand}$ is given by

$$G(x_t^{cand}) = \sum_{j=1}^{M} p_j \times N(x_t^{cand}; \mu_j, \sigma_j) \qquad (2)$$

Here, $M$ is the total number of mixture components, $N(x_t^{cand}; \mu_j, \sigma_j)$ is the $j$-th Gaussian mixture component, whose density depends on $\mu_j$ and $\sigma_j$, and $p_j$ is its mixture weight. When approximating a density function $F(x)$ with a GMM, mixture Gaussians with only diagonal covariance can be used (this is called the mean-field approximation [12]). The density of $N(x_t^{cand}; \mu_j, \sigma_j)$ depends on its exponent $E_j$:

$$E_j = \sum_{i=1}^{N} \left( \frac{x_i^{cand} - \mu_{ij}}{\sigma_{ij}} \right)^2 \qquad (3)$$

Here, $x^{cand}$, $\mu_j$, and $1/\sigma_j$ are each $N$-dimensional and are expanded element-wise using the subscript $i$, as in (3). The overall GMM density in the log domain can be computed from the exponents $E_j$ themselves using the identity $\ln(e^a + e^b) = a + \ln(1 + e^{b-a})$ and a look-up table (LUT) for $\ln(1 + e^x)$. $E_j$ can be further simplified by exploiting the sampling property that $x_t^{cand}$ is searched in the proximity of the previous MCMC sample $x_{t-1}$. Substituting $x_t^{cand} = x_{t-1} + R$ into (3) and expanding the square, $E_j(t)$ at $x_t^{cand}$ can be computed from $E_j(t-1)$ at $x_{t-1}$ as

$$E_j(t) = E_j(t-1) + \left( \frac{R}{\sigma_j^2} \cdot R \right) + 2 \left( \frac{R}{\sigma_j^2} \right) \cdot (x_{t-1} - \mu_j) \qquad (4)$$

Here, $R$ is a random vector drawn from the proposal distribution $P$ within the SRAM and used to search for the next MCMC sample. $R/\sigma_j^2$ is an $N$-dimensional vector obtained by dividing each element $R_i$ of $R$ by the corresponding $\sigma_{ij}^2$. We use the SRAM array to compute the scalar products $R \cdot R/\sigma_j^2$ and $(x_{t-1} - \mu_j) \cdot R/\sigma_j^2$ in memory to evaluate (4).
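The log-domain GMM evaluation and the incremental update (4) can be checked numerically. The sketch below is a software model under stated assumptions: the LUT range ($[-16, 0]$) and resolution ($1/64$) for $\ln(1+e^z)$ are hypothetical, and the per-component constants $c_j = \ln p_j - \sum_i \ln\sigma_{ij} - (N/2)\ln 2\pi$ are assumed to be precomputed, so that $\ln G(x)$ is the log-add of $c_j - E_j/2$ over the components. Ordering the operands so that $a \ge b$ keeps the LUT input non-positive. The example also confirms that (4) reproduces the exponent computed directly from (3) at $x_t^{cand} = x_{t-1} + R$.

```python
import math

# Hypothetical LUT for ln(1 + e^z) on z in [LUT_MIN, 0]; range and step are
# illustrative choices, not values from the paper.
LUT_MIN = -16.0
LUT_STEP = 1.0 / 64
LUT = [math.log1p(math.exp(LUT_MIN + k * LUT_STEP))
       for k in range(int(-LUT_MIN / LUT_STEP) + 1)]


def log1p_exp_lut(z):
    """Nearest-entry lookup of ln(1 + e^z) for z <= 0."""
    if z < LUT_MIN:
        return 0.0  # ln(1 + e^z) < 1.2e-7 here, treated as zero
    return LUT[round((z - LUT_MIN) / LUT_STEP)]


def log_add(a, b):
    """ln(e^a + e^b) = a + ln(1 + e^(b - a)), with a >= b so the LUT input is <= 0."""
    if b > a:
        a, b = b, a
    return a + log1p_exp_lut(b - a)


def exponent(x, mu_j, sigma_j):
    """E_j from Eq. (3)."""
    return sum(((xi - mi) / si) ** 2 for xi, mi, si in zip(x, mu_j, sigma_j))


def exponent_update(e_prev, r, x_prev, mu_j, sigma_j):
    """E_j(t) from E_j(t-1) via Eq. (4), using two scalar products."""
    r_scaled = [ri / si ** 2 for ri, si in zip(r, sigma_j)]               # R / sigma_j^2
    rr = sum(a * b for a, b in zip(r_scaled, r))                           # (R/sigma^2) . R
    rx = sum(a * (xi - mi) for a, xi, mi in zip(r_scaled, x_prev, mu_j))   # (R/sigma^2) . (x - mu)
    return e_prev + rr + 2.0 * rx


def gmm_log_density(exps, log_consts):
    """ln G(x) as the log-add, over components, of c_j - E_j / 2."""
    terms = [c - 0.5 * e for c, e in zip(log_consts, exps)]
    acc = terms[0]
    for term in terms[1:]:
        acc = log_add(acc, term)
    return acc


# Usage: 2-D, two-component GMM with means (1, -1) and (-1, 1) and unit sigma.
mus = [[1.0, -1.0], [-1.0, 1.0]]
sigmas = [[1.0, 1.0], [1.0, 1.0]]
weights = [0.5, 0.5]
log_consts = [math.log(p) - sum(math.log(s) for s in sg) - (len(sg) / 2) * math.log(2 * math.pi)
              for p, sg in zip(weights, sigmas)]

x_prev = [0.3, -0.2]
r = [0.1, 0.25]
x_cand = [a + b for a, b in zip(x_prev, r)]
e_prev = [exponent(x_prev, m, s) for m, s in zip(mus, sigmas)]
e_inc = [exponent_update(e, r, x_prev, m, s) for e, m, s in zip(e_prev, mus, sigmas)]
e_direct = [exponent(x_cand, m, s) for m, s in zip(mus, sigmas)]
approx = gmm_log_density(e_inc, log_consts)
exact = math.log(sum(math.exp(c - 0.5 * e) for c, e in zip(log_consts, e_direct)))
```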
[Fig. 4 schematics: (a) 8-T SRAM cell (M1, M2) with product word line WLP and product bit line BLP; (b) DAC with binary-weighted mirror widths W0, 2W0, ..., 2^(n-1)W0 selected by input bits D0 to Dn-1, reference current IREF, and calibration transistors of width WC (C0 to Ck); (c) row DACs, SRAM bit cells, column select, column multiplexer, and column ADC, with the column current proportional to V.W; (d) op-amp with hold cell producing VSAMPLE; (e) two-step flash ADC with coarse and fine flash stages, DAC, and subtractor (Σ); (f) RNG cell (MN1, MN2, MP1, MP2) clocked by CLK and enabled by RNG_EN; (g) DAC with programmable width WP for quantized σ², calibration transistors, and calibrating loop.]
Fig. 4: (a) 8-T SRAM cell with product port. (b) DAC co-located with SRAM cell. (c) Current-mode scalar product operation within SRAM. (d-e) Op-amp to stabilize the column current and interface with the ADC. (f-g) In-SRAM RNG cell.
[Fig. 5 blocks: SRAM-embedded random number generators (R1 to Rn), DAC-operand buffer, parallel DAC array, column selector/multiplexer, ADC with ADC register, shift/sign operation, output register, and read/write port; operands 1/σ1² to 1/σn², D1 to Dn, µj1 to µjn, and x_{t-1,1} to x_{t-1,n}. Steps: (1) generate random numbers and copy them to the DAC buffer; (2) compute the scalar terms of Ej; (3) generate the partial terms of Ej.]
Fig. 5: System-level flow for in-SRAM MCMC sampling.
To compute the scalar product $V \cdot W$ of two vectors $V$ and $W$, $W$ is stored in the 8-T SRAM cell array and $V$ is copied to the DAC-operand buffer shown in Fig. 5. Fig. 4(a) shows an 8-T SRAM cell with an additional scalar-product port (shown in red in the figure). With $n$-bit precision, $W$ is stored across $n$ SRAM columns, one column per bit. A digital-to-analog converter (DAC) converts the input $V$ into a corresponding analog current vector $I_V$ and applies it to the product word lines WLP of the cells. The basic approach is to use the memory cells as current-mode AND gates: if an SRAM cell $j$ stores bit '1', it allows the row DAC current $I_V$ to flow to its bit line BLP. The currents from all active cells in a column add up and follow $V \cdot W$, as shown in Fig. 4(c). The column multiplexer selects one column at a time, and the selected column current is read by an analog-to-digital converter (ADC), shown in Fig. 4(d) and (e), which converts it into digital bits representing $V \cdot W_i$, where $W_i$ is the binary vector formed by the $i$-th bit of each element of $W$. The column current is passed to an op-amp with resistive feedback, which converts it into a corresponding analog voltage; the feedback resistance $R$ is designed to match the operating range of the ADC. The op-amp in Fig. 4(d) serves two purposes: it stabilizes the potential at the tail end of the column, and it biases that tail potential to zero. The hold cell in Fig. 4(d) samples the op-amp output and retains it after the op-amp is disconnected and the bias current of the row DACs is turned off to save biasing power. The currents of all $n$ columns are converted by the ADC and combined with digital scaling to compute $V \cdot W$. We use two-step flash ADCs [13], [14], which balance area and power constraints without incurring excessive delay.
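The bit-sliced, current-mode scalar product can be modeled in software to see how column-ADC precision affects $V \cdot W$. This sketch assumes unsigned integer operands, an ideal unit current per DAC step, one SRAM column per bit of $W$ (bit-plane $W_i$ weighted by $2^i$ during digital recombination), and an ideal two-step flash ADC operating on a normalized full scale. Circuit non-idealities and the shift/sign operations for signed values are not modeled, and all function names are hypothetical.

```python
import random


def two_step_flash_adc(v, full_scale, coarse_bits, fine_bits):
    """Ideal two-step (subranging) flash ADC model returning an integer code.

    A coarse flash resolves the MSBs; the coarse level is subtracted (via a DAC)
    and a fine flash resolves the residue into the LSBs.
    """
    v = min(max(v, 0.0), full_scale)
    coarse_lsb = full_scale / 2 ** coarse_bits
    coarse = min(int(v // coarse_lsb), 2 ** coarse_bits - 1)
    residue = v - coarse * coarse_lsb
    fine_lsb = coarse_lsb / 2 ** fine_bits
    fine = min(int(residue // fine_lsb), 2 ** fine_bits - 1)
    return (coarse << fine_bits) | fine


def column_currents(v, w, n_bits, i_unit=1.0):
    """Bit-column i collects v[k] * i_unit only where bit i of w[k] is 1 (current-mode AND)."""
    return [i_unit * sum(vk for vk, wk in zip(v, w) if (wk >> i) & 1) for i in range(n_bits)]


def in_sram_dot(v, w, n_bits, full_scale, coarse_bits, fine_bits):
    """Digitize each bit-column current and recombine with binary (shift) weights."""
    lsb = full_scale / 2 ** (coarse_bits + fine_bits)
    total = 0.0
    for i, current in enumerate(column_currents(v, w, n_bits)):
        code = two_step_flash_adc(current, full_scale, coarse_bits, fine_bits)
        total += code * lsb * (1 << i)  # bit-plane W_i carries weight 2^i
    return total


# Usage: 32 rows, 5-bit DAC inputs V, 8-bit stored weights W.
rng = random.Random(0)
rows, w_bits = 32, 8
v = [rng.randrange(32) for _ in range(rows)]
w = [rng.randrange(2 ** w_bits) for _ in range(rows)]
exact = sum(a * b for a, b in zip(v, w))
full_scale = 1024.0  # exceeds the largest column current, 32 rows * 31 unit currents
dot_10bit = in_sram_dot(v, w, w_bits, full_scale, coarse_bits=5, fine_bits=5)
dot_6bit = in_sram_dot(v, w, w_bits, full_scale, coarse_bits=3, fine_bits=3)
```

With 10 ADC bits the reconstruction is exact for these operand sizes. With 6 bits, each column is truncated by less than one ADC step, and that error is multiplied by the binary weight of its bit-plane, so the most significant columns dominate the total error.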
For the proposed implementation in 45 nm CMOS, Fig. 6(a) shows HSPICE simulations of the scalar product on a 32-row SRAM cell array, which match the ideal. Current-mode processing gives the proposed design significant advantages. The SRAM cells either pass (buffer) or block the input current, so variability in the SRAM cell transistors has minimal impact on scalar-product accuracy, which was a challenge in [15]. Moreover, the VTH variability of the cell transistors does not affect the scalar product as long as the DAC reference currents are sufficiently higher than the SRAM leakage. Fig. 6(b-c) illustrate variability analyses of the SRAM operation. In Fig. 6(b), the column current follows a log-normal distribution when process variability in the SRAM transistors is considered. In Fig. 6(c), with σ(VTH) = 30 mV (black curve) and a DAC reference current of 5 nA, the standard deviation of the column current normalized to its mean is 1.43. As the DAC current increases, this normalized variability decreases.
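The argument that cell variability stops mattering once the DAC reference current dominates the leakage can be illustrated with a toy Monte Carlo model. Everything here is an assumption for illustration: cells storing '1' pass $I_{ref}$ unchanged, cells storing '0' contribute only a log-normal subthreshold leakage $I_{leak0}\,e^{-\Delta V_{TH}/(n V_T)}$ with Gaussian $\Delta V_{TH}$, and the leakage magnitude and slope factor are hypothetical. The model reproduces only the qualitative trend of Fig. 6(c), not the HSPICE values reported above.

```python
import math
import random
import statistics

# Toy model; every parameter value below is hypothetical.
I_LEAK0 = 1e-9   # nominal leakage of one cell storing '0', A
N_SLOPE = 1.5    # subthreshold slope factor
V_T = 0.026      # thermal voltage near room temperature, V


def column_current(i_ref, n_active, n_rows, sigma_vth, rng):
    """Active cells pass i_ref; inactive cells add log-normal subthreshold leakage."""
    leak = sum(I_LEAK0 * math.exp(-rng.gauss(0.0, sigma_vth) / (N_SLOPE * V_T))
               for _ in range(n_rows - n_active))
    return n_active * i_ref + leak


def normalized_spread(i_ref, sigma_vth, trials=2000, n_active=16, n_rows=32, seed=0):
    """sigma(I_column) / mu(I_column) over Monte Carlo trials."""
    rng = random.Random(seed)
    currents = [column_current(i_ref, n_active, n_rows, sigma_vth, rng) for _ in range(trials)]
    return statistics.pstdev(currents) / statistics.mean(currents)


for i_ref_na in (5, 25, 100, 250):
    print(f"I_ref = {i_ref_na} nA: sigma/mu = {normalized_spread(i_ref_na * 1e-9, 0.03):.4f}")
```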
The DAC in Fig. 4(b) exhibits two critical non-idealities that affect scalar-product accuracy: (i) channel length modulation (CLM) in the mirroring transistors, which makes the output current depend on the WLP potential, and (ii) a non-ideal mirroring ratio due to process variability. We address CLM-induced precision degradation by reducing the turn-ON voltage of the select switches in the DAC, which limits the source-to-drain voltage of the mirroring transistors and improves accuracy, as shown in Fig. 6(d). To minimize the non-ideal mirroring ratio caused by process variability, a set of calibrating transistors with a small width WC relative to the mirroring transistors is added to the DAC in Fig. 4(b). The DAC mirror current is compared against a reference, and calibrating transistors of width WC are added until the current reaches the desired level.
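The calibration procedure can be sketched as a simple search loop. The model is hypothetical: the mirrored current is assumed proportional to the total enabled width, process mismatch is treated as a fixed multiplicative error, and the main mirror is assumed to be sized slightly below nominal so that calibration only ever needs to add width $W_C$.

```python
def calibrate_dac_mirror(i_ref, w_mirror, w_c, k_max, target, mismatch):
    """Enable calibrating transistors of width w_c one at a time until the
    mirrored current meets the target (hypothetical linear-in-width model).

    Returns (number of calibration units enabled, final mirrored current).
    """
    current = 0.0
    for k in range(k_max + 1):
        current = i_ref * (w_mirror + k * w_c) * (1.0 + mismatch)
        if current >= target:
            return k, current
    return k_max, current  # target not reachable with k_max units


# Usage: nominal 1:1 mirror built undersized (0.9) so calibration only adds
# current; process variation lowered the mirroring ratio by 5%.
k, i_out = calibrate_dac_mirror(i_ref=1.0, w_mirror=0.9, w_c=0.02, k_max=15,
                                target=1.0, mismatch=-0.05)
print(k, i_out)
```

In this model the residual error after calibration is below one calibration step, which is consistent with choosing $W_C$ small relative to the mirroring transistors.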
[Fig. 6 plots: (a) normalized column current I_col = V.W vs. number of active column cells (W) for V = 1, 3, 7, 15, and 31; (b) histogram of the normalized column current over 100 Monte Carlo runs; (c) σ(I_column)/µ(I_column) vs. DAC reference current Iref (5 to 250 nA) for σ(VTH) = 10, 20, 30, 40, and 50 mV; (d) normalized I_col = V.W vs. number of active column cells for the ideal, low-CLM, and high-CLM cases; (e) power breakdown: SRAM 5%, DAC 13%, ADC 82%.]
Fig. 6: (a) V.W scalar product simulated in 45 nm CMOS for density computation. (b) Effect of VTH variability in MC2RAM transistors on the scalar product current (σ(VTH) = 30 mV). (c) Current variability in MC2RAM controlled by the DAC. (d) Effect of CLM in the DAC mirror transistors on scalar product accuracy. (e) Contribution of MC2RAM peripherals to power consumption per sampling iteration.
[Fig. 7 plots: (a) sampling within MC2RAM, simulated distribution vs. ground truth over the x and y dimensions; (b) the same with mean distance d = 5; (c) KL divergence vs. parameter value (1 to 8) for sweeps of ADC precision, DAC precision, and mean distance; (d) KL divergence vs. dimension of the GMM (2 to 6).]
Fig. 7: (a) Sampling distribution with an 8-bit precision DAC, a 6-bit precision ADC, VDD = 1 V, and mean distance = 1. (b) Sampling distribution with mean distance = 5. (c) KL divergence between the ground truth (contour) and the sampled distribution, with sweeps of ADC precision (bits), DAC precision (bits), and mean distance between GMM components. (d) KL divergence over a range of GMM dimensions.
In MC2RAM, the storage of the density function ($F(x)$) parameters and the generation of random samples $R$ are co-located within the same array by integrating RNG cells with the SRAM cells. Since many proposals $R$ are rejected in a high-dimensional weight space, co-locating these operations within the same SRAM array minimizes overhead and data movement. Fig. 4(f) shows the RNG cell, which is based on cross-coupled inverters [16]. The differential nodes Q and QB are pre-charged to VDD when CLK = 0. When CLK = 1, thermal noise resolves the metastability to generate a random bit. The random bits, stored in the DAC-operand buffer, can further be scaled by $1/\sigma^2$ within the DAC as shown in Fig. 4(g). The scaled $R/\sigma^2$ is used for the density computation based on (4).
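The path from RNG bits to the scaled operand $R/\sigma^2$ can be modeled behaviorally. In this sketch, a seeded software PRNG stands in for the thermal-noise-resolved metastability of the cross-coupled RNG cells, the cells are assumed unbiased, and the offset-binary mapping of $n$ bits to a zero-centered value (with a hypothetical step size) is an assumed encoding. The resulting proposal is symmetric about zero, which is the property the simple ratio test $F(x_t^{cand})/F(x_{t-1}) > U$ relies on.

```python
import random


def rng_cell_bits(n_bits, rng):
    """Stand-in for n metastable RNG cells: each resolves to 0 or 1 with equal
    probability (assumes unbiased cells; a seeded PRNG replaces thermal noise)."""
    return [rng.getrandbits(1) for _ in range(n_bits)]


def bits_to_proposal(bits, step):
    """Offset-binary mapping of n random bits to a zero-centered value R.

    Codes 0 .. 2^n - 1 map to (code - (2^n - 1) / 2) * step, a discrete uniform
    distribution that is symmetric about zero.
    """
    code = sum(b << i for i, b in enumerate(bits))
    return (code - (2 ** len(bits) - 1) / 2) * step


def scaled_proposal(sigmas, n_bits, step, rng):
    """Draw R (one element per dimension) and its scaled copy R / sigma^2 for Eq. (4)."""
    r = [bits_to_proposal(rng_cell_bits(n_bits, rng), step) for _ in sigmas]
    r_scaled = [ri / si ** 2 for ri, si in zip(r, sigmas)]
    return r, r_scaled


rng = random.Random(0)
r, r_scaled = scaled_proposal([0.5, 2.0], n_bits=4, step=0.1, rng=rng)
draws = [bits_to_proposal(rng_cell_bits(4, rng), 0.1) for _ in range(10000)]
```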
IV. RESULTS AND DISCUSSIONS ON SAMPLING
We compare the simulated distribution with the ground truth using scatter plots and the KL divergence [17]. The KL divergence between two discrete probability distributions $F(x)$ and $G(x)$ is

$$D_{KL}(F \| G) = \sum_x F(x) \log\left( \frac{F(x)}{G(x)} \right). \qquad (5)$$
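Equation (5) translates directly into code. In this sketch, sampled points can first be binned into a normalized 2-D histogram; the grid bounds, the bin count, and the small floor `eps` for empty bins of $G$ are assumptions, since the paper does not state how the densities were discretized.

```python
import math


def kl_divergence(f, g, eps=1e-12):
    """Eq. (5): D_KL(F || G) = sum_x F(x) log(F(x) / G(x)) for discrete distributions.

    Terms with F(x) = 0 contribute 0; eps floors empty G bins (an assumption, since
    an empty G bin where F > 0 would otherwise make the divergence infinite).
    """
    total = 0.0
    for fx, gx in zip(f, g):
        if fx > 0.0:
            total += fx * math.log(fx / max(gx, eps))
    return total


def histogram_2d(points, lo, hi, bins):
    """Normalized 2-D histogram over the square [lo, hi)^2, flattened row-major.
    Points outside the square are ignored."""
    width = (hi - lo) / bins
    counts = [0] * (bins * bins)
    for x, y in points:
        i, j = int((x - lo) // width), int((y - lo) // width)
        if 0 <= i < bins and 0 <= j < bins:
            counts[i * bins + j] += 1
    inside = sum(counts)
    return [c / inside for c in counts] if inside else [0.0] * len(counts)


print(kl_divergence([0.9, 0.1], [0.5, 0.5]), kl_divergence([0.5, 0.5], [0.9, 0.1]))
```

The two printed values differ because the KL divergence is asymmetric, so the order of the simulated and ground-truth distributions in (5) matters.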
We consider a sample GMM, $GMM_T$, with $\mu = [1, -1; -1, 1]$, $\sigma = [1, 0; 0, 1]$, and $p = [0.5, 0.5]$. We consider 500 samples sufficient to keep statistical errors in the KL divergence negligible, and we discard the first 50 samples as burn-in. For $GMM_T$, Fig. 7(a) shows the sampling trajectory and distribution when the DAC/ADC precision is 8 bits with a 1 V power supply. The contour lines in the figure represent the ground-truth GMM, and the blue dots represent the distribution of samples simulated with MC2RAM. The sample trajectory, shown in red, corresponds to 75 random walks. Fig. 7(b) illustrates sampling when the mean distance parameter $d$ in $\mu = [d, -d; -d, d]$ between the GMM components is set to 5. Fig. 7(c) shows the DAC/ADC imprecision tolerance of MC2RAM, which allows the DAC and ADC to be low power and low area. Sampling deviates from the ground truth for ADC precision below 5 bits, whereas reducing the DAC precision has no significant impact on the KL divergence. This also justifies our choice of a low-precision two-step flash ADC in MC2RAM, which has lower overhead in low- to moderate-precision designs. Moreover, for a fixed number of sampling iterations, the KL divergence is high when the GMM components are widely separated. In higher dimensions, the deviation of the samples from the ground truth increases, as shown in Fig. 7(d). For a two-dimensional, two-mixture GMM, the implementation consumes ∼91 µW of power per sampling iteration and produces 500 samples in 2000 clock cycles on average at a 1 GHz clock frequency. The pie chart in Fig. 6(e) shows that the SRAM cells and the DAC together contribute 18% of the power consumption, whereas the remaining 82% is due to the ADC. The 10 comparators in the two-step subranging flash ADC account for 60% of the ADC power; however, the flash ADC has a latency of only 2 clock cycles, which improves the sampling throughput of MC2RAM.
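The throughput and power-breakdown figures above can be cross-checked with simple arithmetic. This sketch assumes that the 2000 cycles are the total for all 500 samples and applies the Fig. 6(e) shares and the 60% comparator share to the ∼91 µW figure.

```python
# Figures quoted in the text for the 2-D, two-mixture GMM.
clock_hz = 1e9
total_cycles = 2000
n_samples = 500
power_uw = 91.0
share = {"SRAM": 0.05, "DAC": 0.13, "ADC": 0.82}   # Fig. 6(e) pie chart
comparator_share_of_adc = 0.60

elapsed_us = total_cycles / clock_hz * 1e6          # wall-clock time for 500 samples
cycles_per_sample = total_cycles / n_samples        # average cost of one sample
samples_per_second = n_samples / (total_cycles / clock_hz)
power_split_uw = {block: f * power_uw for block, f in share.items()}
comparator_uw = comparator_share_of_adc * power_split_uw["ADC"]

print(f"{elapsed_us:.1f} us, {cycles_per_sample:.0f} cycles/sample, "
      f"{samples_per_second / 1e6:.0f} Msamples/s")
print({k: round(v, 2) for k, v in power_split_uw.items()}, round(comparator_uw, 2))
```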
V. CONCLUSION
We have presented MC2RAM, a novel framework for accelerating Markov chain Monte Carlo (MCMC) sampling for Bayesian inference (BI). MC2RAM stores the parameters of the posterior density of weights in BI and co-locates random number generation with them for high-throughput, Metropolis-Hastings (MH)-based sample acceptance/rejection. The framework samples at low precision and low power with tolerance to process variation, thereby removing the latency, energy, and safety bottlenecks of the traditional von Neumann architecture, whose memory and processing elements are spatially separated.
REFERENCES
[1] C. Andrieu, N. D. Freitas, et al., “An introduction to MCMC for machine learning,” 2003.
[2] R. M. Neal, “Markov chain sampling methods for Dirichlet process mixture models,” Journal of Computational and Graphical Statistics, vol. 9, no. 2, pp. 249–265, 2000. [Online]. Available: https://amstat.tandfonline.com/doi/abs/10.1080/10618600.2000.10474879
[3] G. E. Hinton and R. M. Neal, “Bayesian learning for neural networks,” 1995.
[4] M. Ghavamzadeh, S. Mannor, J. Pineau, and A. Tamar, “Bayesian reinforcement learning: A survey,” Foundations and Trends® in Machine Learning, vol. 8, no. 5-6, pp. 359–483, 2015. [Online]. Available: http://dx.doi.org/10.1561/2200000049
[5] C. Blundell, J. Cornebise, K. Kavukcuoglu, and D. Wierstra, “Weight uncertainty in neural networks,” 2015.
[6] M. Horowitz, “1.1 Computing’s energy problem (and what we can do about it),” in 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), Feb 2014, pp. 10–14.
[7] V. Sze, Y. Chen, T. Yang, and J. S. Emer, “Efficient processing of deep neural networks: A tutorial and survey,” Proceedings of the IEEE, vol. 105, no. 12, pp. 2295–2329, Dec 2017.
[8] R. Cai, A. Ren, N. Liu, C. Ding, L. Wang, X. Qian, M. Pedram, and Y. Wang, “VIBNN: Hardware acceleration of Bayesian neural networks,” in Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, ser. ASPLOS ’18. New York, NY, USA: ACM, 2018, pp. 476–488. [Online]. Available: http://doi.acm.org/10.1145/3173162.3173212
[9] S. S. Banerjee, Z. T. Kalbarczyk, and R. K. Iyer, “AcMC²: Accelerating Markov chain Monte Carlo algorithms for probabilistic models,” in Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, ser. ASPLOS ’19. New York, NY, USA: ACM, 2019, pp. 515–528. [Online]. Available: http://doi.acm.org/10.1145/3297858.3304019
[10] [Online]. Available: https://github.com/kyle-dorman/bayesian-neural-network-blogpost
[11] S. Chib and E. Greenberg, “Understanding the Metropolis-Hastings algorithm,” The American Statistician, vol. 49, no. 4, pp. 327–335, 1995. [Online]. Available: http://www.jstor.org/stable/2684568
[12] [Online]. Available: https://www.cs.cmu.edu/~epxing/Class/10708-14/scribe_notes/scribe_note_lecture15.pdf
[13] V. Ferragina, N. Ghittori, and F. Maloberti, “Low-power 6-bit flash ADC for high-speed data converters architectures,” in 2006 IEEE International Symposium on Circuits and Systems (ISCAS), May 2006, 4 pp.
[14] F. Maloberti, Data Converters, 1st ed. Springer Publishing Company, Incorporated, 2010.
[15] J. Zhang, Z. Wang, and N. Verma, “In-memory computation of a machine-learning classifier in a standard 6T SRAM array,” IEEE Journal of Solid-State Circuits, vol. 52, pp. 915–924, 2017.
[16] S. K. Mathew, S. Srinivasan, M. A. Anders, H. Kaul, S. K. Hsu, F. Sheikh, A. Agarwal, S. Satpathy, and R. K. Krishnamurthy, “2.4 Gbps, 7 mW all-digital PVT-variation tolerant true random number generator for 45 nm CMOS high-performance microprocessors,” IEEE Journal of Solid-State Circuits, vol. 47, no. 11, pp. 2807–2821, Nov 2012.
[17] J. R. Hershey and P. A. Olsen, “Approximating the Kullback Leibler divergence between Gaussian mixture models,” in 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP ’07, vol. 4, April 2007, pp. IV–317–IV–320.
