Representable Matrices: Enabling High Accuracy Analog Computation for
  Inference of DNNs using Memristors by Zhang, Baogang et al.
Baogang Zhang∗, Necati Uysal∗ , Deliang Fan† and Rickard Ewetz∗
∗University of Central Florida, Orlando, FL, 32816, USA
†Arizona State University, Tempe, AZ, 85281, USA
baogang.zhang@knights.ucf.edu, necati@knights.ucf.edu, dfan@asu.edu, rickard.ewetz@ucf.edu
Abstract—Analog computing based on memristor technology
is a promising solution to accelerating the inference phase of
deep neural networks (DNNs). A fundamental problem is to
map an arbitrary matrix to a memristor crossbar array (MCA)
while maximizing the resulting computational accuracy. The
state-of-the-art mapping technique is based on a heuristic that
is only guaranteed to produce the correct output for two input
vectors. In this paper, a technique that aims to produce the
correct output for every input vector is proposed, which involves
specifying the memristor conductance values and a scaling factor
realized by the peripheral circuitry. The key insight of the
paper is that the conductance matrix realized by an MCA
is only required to be proportional to the target matrix. The
selection of the scaling factor between the two regulates the
utilization of the programmable memristor conductance range
and the representability of the target matrix. Consequently, the
scaling factor is set to balance precision and value range errors.
Moreover, a technique of converting conductance values into state
variables and vice versa is proposed to handle memristors with
non-ideal device characteristics. Compared with the state-of-the-art
technique, the proposed mapping results in 4X-9X smaller
errors. These improvements raise the classification
accuracy of a seven-layer convolutional neural network (CNN)
on CIFAR-10 from 20.5% to 71.8%.
I. INTRODUCTION
Deep neural networks (DNNs) have in recent years achieved
remarkable results in image, audio, and video recognition [6].
An emerging solution for deploying computationally heavy DNNs
on edge devices in the Internet of Things is to leverage
memristor-based technology.
Memristor crossbar arrays (MCAs) can perform matrix-vector
multiplication in the analog domain with orders of magnitude
smaller power and latency than in the digital domain [5], [4].
Moreover, the use of MCAs allows matrices to be stored in place,
which reduces the data fetching and communication costs
that fundamentally bound the performance of any computing
system that processes large amounts of data [8], [2], [9].
Matrix-vector multiplication is performed using an MCA
with access transistors by first programming the conductance
values of the memristors to realize a conductance matrix G,
which is illustrated in Figure 1(a) [3]. Next, an input vector of
voltages (vin) is applied to the vertical columns and a vector
of output voltages (vout) is measured from the horizontal
rows. The inputs are provided to the MCA using digital-to-analog
converters (DACs) and the outputs are converted
into digital values using transimpedance amplifiers (TIAs) and
analog-to-digital converters (ADCs). The output voltages vout
are equal to Rs·G·vin, where Rs is the feedback resistance of
the TIAs [3], [5]. Next, the output voltages are scaled into
digital values. However, if a weight matrix is mapped to an
MCA without considering effects such as IR-drop, programming
errors, and non-ideal device characteristics, the computational
accuracy will degrade to the level of noise [5]. In particular, the
accuracy is degraded by the IR-drop across the non-zero input,
output, and wire resistances in the MCA.

Fig. 1: (a) An MCA used for matrix-vector multiplication. DACs
drive the input x (vector of length M) onto the columns of 1T1M
cells that store G (N x M matrix); TIAs with feedback resistance Rs
and ADCs read the output y (vector of length N), so that
vout = Rs·G·vin realizes y = Ax. (b)
Classification accuracy in software and MCA based hardware
on MNIST and CIFAR-10 using the mapping in [5].

This paper was accepted at Asia and South Pacific Design Automation
Conference 2020.
This research was supported in part by NSF awards CCF-1755825 and
CNS-1908471.
Techniques to map an arbitrary matrix W to an MCA have
been studied in [3], [12], [7], [5]. The conductance matrix
G realized by a set of memristor conductance values g can
be determined analytically using Modified Nodal Analysis
(MNA) [7]. Next, the effective matrix realized by an MCA
(W^r) is obtained by scaling G with a factor (1/α), which
is realized by the peripheral circuitry. In [7], the conductance
values g were determined by minimizing the square of the
Frobenius norm of (W − W^r) using steepest gradient descent.
However, the method is unable to consistently converge for
arbitrary matrices. In [5], a technique of tuning conductance
values (or state variables of non-ideal memristor devices)
to minimize ||(W − W^r) · vcal||2 using Newton’s method
was proposed, where vcal is a calibration vector. The main
limitation of these works is that the scaling factor α was
not explicitly optimized. Nevertheless, the technique in [5]
enabled a five-layer feed-forward neural network to be mapped
to a memristor based platform while achieving software level
accuracy on the MNIST dataset. However, when the technique
is used to map a seven-layer convolutional neural network
(CNN) to an MCA based platform, the classification accuracy
drops from 75.2% to 20.5%, which is shown in Figure 1(b).

arXiv:1911.12352v1 [cs.ET] 27 Nov 2019
In this paper, a technique is proposed to map an arbitrary
matrix W into a scaling factor α and memristor state variables
s. The main innovations of the paper are summarized as
follows:
• The problem of specifying (s, α) is converted into a
problem of specifying (g, α). The technique is based
on replacing each series connected memristor and access
transistor with an ideal conductor. After the conductance
values g have been determined, Newton’s method is used
to obtain the equivalent state variables s.
• The memristor conductance values g are only required
to realize a conductance matrix G that
is proportional to the matrix W. The conductance
matrix is then scaled with 1/α such that W is
effectively realized. The utilization of the
memristor conductance range is also regulated by the
scaling factor α. If α is set too small, the errors will
be dominated by the limited precision of the memristors.
If α is set too large, the errors will be dominated by
value range errors introduced by IR-drop. In particular, it is
impossible to represent both small and large values in the
far end of an MCA because of IR-drop. We refer to
the set of matrices that can be realized, given the values
that are attainable at different locations in an MCA, as the space of
representable matrices. Consequently, α is specified while
balancing precision and value range errors.
• Given α, the conductance values g are specified by
minimizing ||W − W^r||^2 using steepest gradient descent,
where ||W − W^r||^2, the square of the Frobenius norm, is
called the total errors. After it is
impossible to further reduce the total errors, the matrix
W is updated into a new target matrix W^t in order to
ensure that ||(W − W^r) · vcal||2 = 0, where vcal is a
calibration vector selected from the input vector space.
• The experimental results show that the proposed mapping
technique results in matrix-vector multiplication with 4X-9X
higher computational accuracy compared with [5].
These improvements raise the classification
accuracy of a seven-layer CNN from 20.5%
to 71.8% on CIFAR-10, which is close to the software
accuracy of 75.2%.
II. PRELIMINARIES
A. Circuit model of MCAs [7], [5]
Figure 2 shows two circuit models for the MCAs in Fig-
ure 1. In Figure 2(a), the memristors and the access transistors
are modeled using non-linear equations. The current im(s, vm)
through a memristor is a non-linear function of the state
variable s and the voltage vm across the device. The current
through each access transistor it(vs, vd, vg) is a non-linear
function of the source, drain, and gate voltages. A simplified
circuit model of the MCA, in which both the memristors and
the access transistors are treated as ideal devices, is shown in
Figure 2(b). It allows each memristor and access transistor
pair to be replaced with a single
ideal memristor g (or a conductor with lower and upper
bounds).
Fig. 2: MCA with (a) non-ideal and (b) ideal devices. In (a), each
cell of the 3x3 example is described by a state variable sij; in (b),
by a conductance gij. Both models include the input resistance rin,
the output resistance rout, and the wire resistance rw between vin
and vout.
B. Matrix realized by an MCA
The matrix W^r realized by an MCA is a function of g and
α, i.e., W^r = f(g, α). For the circuit model in Figure 2(b), the
realized matrix W^r can be obtained analytically, as follows:
\[ W^r = G/\alpha, \qquad (1) \]
where G is the conductance matrix realized by the resistive
network and α is an arbitrary scaling factor realized by the
peripheral circuitry. The conductance matrix G is a non-linear
function of the memristor conductance values g. G is obtained
by formulating a system of linear equations that captures the
resistive network using MNA, as follows:
\[ Y(g) \begin{bmatrix} v \\ v_{dac} \end{bmatrix} = \begin{bmatrix} 0 \\ v_{in} \end{bmatrix}, \qquad (2) \]
where Y(g) is a matrix with dimensions (2NM + M) x (2NM + M)
that is a function of g. M and N are the
number of inputs and outputs, respectively. v and vdac are
respectively the node voltages of the MCA and the DACs; vin is
the vector of input voltages. Next, G is obtained analytically, as follows:
\[ G = S\,Y^{-1}(g)\,B, \qquad (3) \]
where B = [0, 0, I]^T and I is an M x M identity matrix. S is
an N x (2NM + M) matrix that selects the output voltages
from Y^{-1}(g)B.
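As an illustration of Eq (2) and Eq (3), the following Python sketch assembles Y(g) for a small crossbar and extracts G column by column. It is a minimal sketch under stated assumptions, not the implementation used in this paper: the node ordering, the placement of the inputs at the top of the columns and of the outputs at the right end of the rows, the treatment of each TIA input as a virtual ground, and the choice to let S return output currents (row-end voltage divided by rout) are assumptions, and the helper names `solve` and `crossbar_matrix` are hypothetical.

```python
def solve(A, b):
    """Solve A x = b with Gaussian elimination and partial pivoting."""
    n = len(A)
    aug = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            if f != 0.0:
                for k in range(c, n + 1):
                    aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - acc) / aug[r][r]
    return x


def crossbar_matrix(g, r_w, r_in, r_out):
    """Return G (N x M) for cell conductances g[i][j] (row i, column j)."""
    N, M = len(g), len(g[0])
    inner = 2 * N * M            # column-wire and row-wire nodes
    size = inner + M             # plus one DAC node per input

    def col(i, j):               # node on column wire j at row i
        return i * M + j

    def row(i, j):               # node on row wire i at column j
        return N * M + i * M + j

    Y = [[0.0] * size for _ in range(size)]

    def stamp(a, b, cond):
        # KCL contribution of a conductance between nodes a and b.
        # b is None for a connection to ground; DAC rows are set separately.
        for p, q in ((a, b), (b, a)):
            if p is None or p >= inner:
                continue
            Y[p][p] += cond
            if q is not None:
                Y[p][q] -= cond

    for j in range(M):
        stamp(col(0, j), inner + j, 1.0 / r_in)       # DAC to column top
        Y[inner + j][inner + j] = 1.0                 # enforces v_dac = v_in
        for i in range(N):
            stamp(col(i, j), row(i, j), g[i][j])      # memristor cell
            if i + 1 < N:
                stamp(col(i, j), col(i + 1, j), 1.0 / r_w)   # column wire
            if j + 1 < M:
                stamp(row(i, j), row(i, j + 1), 1.0 / r_w)   # row wire
    for i in range(N):
        stamp(row(i, M - 1), None, 1.0 / r_out)       # row end to TIA input

    G = [[0.0] * M for _ in range(N)]
    for j in range(M):                                # column j of Y^-1 B
        rhs = [0.0] * size
        rhs[inner + j] = 1.0
        v = solve(Y, rhs)
        for i in range(N):
            G[i][j] = v[row(i, M - 1)] / r_out        # S: output current
    return G


g = [[5e-4, 1e-4], [2e-4, 4e-4]]                      # siemens
G_small_r = crossbar_matrix(g, 0.1, 0.1, 0.1)         # nearly ideal wiring
G_ir_drop = crossbar_matrix(g, 1.0, 100.0, 100.0)     # 1 ohm wire, 100 ohm I/O
```

With nearly ideal wiring the realized G stays close to the programmed g, whereas with a 1Ω wire resistance and 100Ω input and output resistances every entry of G falls below its programmed value, which is the IR-drop effect that the mapping has to compensate.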
C. Problem formulation
This paper considers the problem of specifying the memristor
state variables s and the scaling factor α that maximize
the computational accuracy when performing matrix-vector
multiplication using an MCA. If the memristor devices are
ideal, the problem consists of specifying the conductance
values g and the scaling factor α. The computational accuracy
is evaluated while accounting for the fact that memristors can only be
programmed to a limited number of distinguishable states [1],
[4]. In the experimental results, the impact of the proposed
mapping technique is also evaluated in terms of classification
accuracy when DNNs trained in software are mapped to an
MCA based platform for inference.
The problem is approached by first converting the problem of
specifying state variables and a scaling factor (s, α) into a
problem of specifying memristor conductance values and a
scaling factor (g, α). This enables g to be specified while
minimizing ||W − W^r||^2 or ||(W − W^r) · vcal||. Lastly, the
conductance values g are converted back into state variables
s.
Fig. 3: (a) The value range and precision of a matrix element depend on its location in the MCA (input: vector x; output: vector y),
ranging from the worst precision with the longest value range to the best precision with the shortest value range. (b) and (c) show the
trade-off between the value ranges and the precision based on the scaling factor α. A small scaling factor (low conductance range
utilization) gives few distinguishable states between the lower and upper bounds and flexible value ranges; a large scaling factor
(high utilization) gives many distinguishable states and restrictive value ranges.
III. PREVIOUS WORK
A. Specification of conductance values g in [7]
In [7], α was fixed and the conductance values g were
determined by formulating an optimization problem, where the
square of the Frobenius norm of (W − W^r) was minimized,
as follows:
\[ \min F(g, \alpha) = \|W - W^r\|^2 = \sum_{i=1}^{N} \sum_{j=1}^{M} (w_{ij} - w^r_{ij})^2, \qquad (4) \]
where ||·||^2 is the square of the Frobenius norm. w_ij and w^r_ij
are the elements in row i and column j of the weight matrix
W and the realized matrix W^r, respectively. The function F
is minimized using steepest gradient descent, as follows:
\[ g_{k+1} = g_k - t\,\nabla F(g_k) = g_k + t \cdot \sum_{i=1}^{N} \sum_{j=1}^{M} 2 \cdot (w_{ij} - w^r_{ij}) \cdot \frac{\partial w^r_{ij}}{\partial g}, \qquad (5) \]
where ∂w^r_ij/∂g is the derivative of w^r_ij with respect to g
(the gradient of F carries a factor −2(w_ij − w^r_ij), which gives
the positive sign in the second expression). t is the
step size, which is determined using a line search. g_0 is equal
to W linearly mapped into the memristor conductance range.
Iterative tuning is performed to compensate for IR-drop in the
MCA. The main limitation is that the method is unable to
consistently converge to solutions with high accuracy because
α was fixed.
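The derivative of the realized matrix that appears in the gradient step can be written in closed form from Eq (1) and Eq (3). The derivation below is a sketch based on the standard identity for the derivative of a matrix inverse; it is not claimed to be how [7] evaluates the derivative. The index k (a single memristor conductance g_k) and the node labels a and b are notation introduced here for illustration.

```latex
% Derivative of the realized matrix with respect to one conductance g_k.
% Uses W^r = G/\alpha (Eq. 1), G = S Y^{-1}(g) B (Eq. 3), and the identity
% \partial (Y^{-1}) = -Y^{-1} (\partial Y) Y^{-1}; S and B do not depend on g.
\[
\frac{\partial W^r}{\partial g_k}
  = \frac{1}{\alpha}\, S\, \frac{\partial Y^{-1}(g)}{\partial g_k}\, B
  = -\frac{1}{\alpha}\, S\, Y^{-1}(g)\,
     \frac{\partial Y(g)}{\partial g_k}\, Y^{-1}(g)\, B .
\]
% Assumption: g_k connects nodes a and b and enters Y(g) through the usual
% nodal stamp, so \partial Y / \partial g_k is a constant matrix:
\[
\left[\frac{\partial Y}{\partial g_k}\right]_{aa}
= \left[\frac{\partial Y}{\partial g_k}\right]_{bb} = 1, \qquad
\left[\frac{\partial Y}{\partial g_k}\right]_{ab}
= \left[\frac{\partial Y}{\partial g_k}\right]_{ba} = -1 ,
\]
% with all other entries equal to zero.
```

Under this assumption, one factorization of Y(g) per iteration can be reused for all conductances, since only the sparse matrix ∂Y/∂g_k changes with k.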
B. Specification of state variables s in [5]
In [5], a technique of specifying the state variables of the
memristors s was proposed. The matrix W is first linearly
mapped into the programmable memristor conductance range
to obtain an ideal conductance matrix Gideal. The ideal
current (iideal) through each memristor device is obtained
using Gideal and an input calibration vector vcal, which also
implicitly defines the scaling factor α. Next, MNA is used to
formulate a system of (4NM+2N+2M)x(4NM+2N+2M)
equations to capture the circuit model in Figure 2(a). NM
of the equations are used to force the currents through each
memristor to be equal to iideal and the remaining equations are
used to capture the behavior of the circuit. The state variables s
are determined by solving the system of equations using Newton’s
method. If Newton’s method does not converge, iideal
is updated to ensure that the full programmable conductance
range is utilized. The limitation is that only the zero input
vector (0̄) and the calibration vector (vcal) are guaranteed to
produce the correct output, i.e., ||(W − W^r) · 0̄||2 = 0 and
||(W − W^r) · vcal||2 = 0.
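A small numerical example shows why matching the output for a single calibration vector does not constrain the individual matrix elements. The matrices below are made-up illustrative values, not data from [5] or from our experiments, and `matvec` is a hypothetical helper.

```python
def matvec(A, v):
    """Multiply matrix A (list of rows) by vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


W = [[1.0, 2.0]]        # target matrix (one output, two inputs)
W_r = [[2.0, 1.0]]      # hypothetical realized matrix with element errors
v_cal = [1.0, 1.0]      # calibration vector
v_other = [1.0, 0.0]    # some other input vector

out_cal = (matvec(W, v_cal), matvec(W_r, v_cal))        # identical outputs
out_other = (matvec(W, v_other), matvec(W_r, v_other))  # different outputs
```

Both matrices return 3 for vcal, yet they return 1 and 2 for the second input, so a zero calibration error can coexist with large element-wise errors.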
IV. SPACE OF REPRESENTABLE MATRICES
In this section, we define the space of matrices that are
representable using an MCA and analyze the impact of the
scaling factor α. The observations motivate our proposed
mapping technique.
The value range and precision for each matrix element are
dependent on the location in the MCA and the scaling factor α,
which is illustrated in Figure 3. Based on Eq (1) and Eq (3),
the scaling factor α directly regulates the utilization of the
programmable conductance range, i.e., a larger α implies a
utilization of larger conductance values. The value range for
a matrix element consists of a lower and an upper bound on the
value that can be realized. The upper bound mainly stems
from IR-drop. The lower bound stems from the fact that currents may
flow from a vertical line i to a horizontal line j even if the
memristor device connecting vertical line i to horizontal line
j is set to be non-conductive (maximum resistance), i.e., the
current flows on paths in the MCA that contain more than
one memristor. The length of the value range is the longest
in the top-right corner and the shortest in the bottom-left
corner of an MCA. Moreover, there is an equal number of
distinguishable states between every lower and upper bound.
The number of states is dependent on the accuracy of the
closed loop programming and the selected utilization of the
programmable conductance range. Consequently, the worst
(best) precision is obtained for value ranges with the longest
(shortest) length, which is illustrated in Figure 3(a). Moreover,
by reducing the utilization of the conductance range, every
value range becomes more flexible at the expense of worse
precision because the number of distinguishable states within
each value range is reduced, which is illustrated in Figure 3(b)
and (c). The explanation is that utilizing high conductance values
increases both the currents on the paths with multiple memristors
and the IR-drop.
The described observations directly explain the space of
matrices that are representable using an MCA, i.e., every
matrix can be realized using an MCA but the computational
accuracy depends on the sum of the value range errors and the
precision errors (called total errors). Value range errors occur
when a matrix element has to be realized outside
its value range. The precision errors depend strongly
on how large a portion of the programmable memristor
conductance range is utilized. Consequently, a critical problem
is to specify the scaling factor α to balance the value range
and the precision errors, which is shown in Figure 4.
Fig. 4: (a) Memristor conductance range utilization vs. scaling
factor α. (b) Total errors, value range errors, precision errors
vs. scaling factor α. The results are obtained for an MCA with
dimensions 64x64.
In Figure 4(a), it is shown that the utilization of the
programmable memristor conductance range is dependent on
α. In Figure 4(b), the trade-off between value range errors
and precision errors is shown based on α. If α is selected
too small, the value range errors will be negligible but large
precision errors will be introduced. If α is selected too large,
the precision errors will be negligible but large value range
errors will be introduced. Hence, α should be selected so that only
a few values are slightly outside their value ranges, such that the
algorithm in [7] can be used to specify the conductance values
by minimizing ||W − W^r||.
V. PROPOSED METHODOLOGY
We propose a five-step flow to map an arbitrary target matrix
W to an MCA, which is shown in Figure 5. The first step
consists of replacing each non-ideal memristor and access
transistor with an equivalent ideal memristor. As mentioned
earlier, the conversion is performed to allow W^r to be computed
using Eq (1). The details are provided in Section V-A.
The second step is to determine the scaling factor α that
minimizes the total errors ||W − W^r||^2, which is explained
in Section V-B. The third step is to specify the conductance
values g that minimize the total errors while guaranteeing
that ||(W − W^r) · vcal|| is close to zero, which is outlined
in Section V-C. The motivation is to leverage the known
properties of the input vector space. Fourth, the state variables
s of the non-ideal memristors are determined by solving a
system of non-linear equations using Newton’s method, as
explained in Section V-D. Lastly, closed loop programming
is applied to program the memristors on-chip to the desired
states s using the techniques in [1], [4].
A. Convert non-ideal devices to ideal devices
The first step is to convert the non-ideal memristors and
access transistors to ideal memristors with a conductance of
g, i.e., converting the circuit in Figure 2(a) to the circuit in
Figure 2(b). g is bounded within [gmin, gmax], where gmin and
gmax are respectively the conservatively estimated minimum
and maximum conductance of the series connection of each
memristor and access transistor. These conductance values are
different from the programmable memristor conductance range
because the estimated conductance of the access transistor is
included.

Fig. 5: Proposed flow for mapping W to an MCA. Input: W. In
software (proposed): convert non-ideal devices to ideal devices;
specification of the scaling factor α; specification of the
conductance values g; specification of the state variables s. In
hardware: closed loop programming of the memristors, followed by
the output.
B. Specification of scaling factor α
In this section, a technique of specifying the scaling factor
αopt that minimizes the total errors ||W − W^r||^2 is proposed.
The method used in this paper is based on first guessing a
scaling factor α0. Given αk (starting from α0), the technique in [7]
is utilized to specify the memristor conductance values g by minimizing
||W − W^r||. Next, αk is updated to αk+1 based on the relation
between the value range errors and the precision errors to
minimize the total errors. If the value range errors are larger
than the precision errors, αk is updated to αk+1 = αk · (1 − β).
If the precision errors are larger than the value range errors,
αk+1 = αk · (1 + β). (Experimentally we have observed that
the total errors are close to the minimum when the value
range errors are equal to the precision errors.) The process
is repeated until no further improvement in the total
errors is achieved over t iterations. The parameters β and t
are set to balance the trade-off between errors and run-time.
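The update rule for α can be sketched as a short loop. This is a minimal illustration rather than the MATLAB implementation used in the experiments: the callback `errors(alpha)`, which stands for mapping W with the technique in [7] and returning the value range and precision errors, is hypothetical, as are the names `tune_alpha`, `patience` (the parameter t in the text), `max_iter`, and the toy error model at the end.

```python
def tune_alpha(alpha0, errors, beta=0.1, patience=5, max_iter=1000):
    """Search for a scaling factor that balances the two error types.

    errors(alpha) must return (value_range_error, precision_error) obtained
    after mapping W with that alpha; it is a hypothetical callback here.
    """
    alpha = alpha0
    best_alpha, best_total = alpha0, float("inf")
    stale = 0                      # iterations without improvement
    for _ in range(max_iter):
        value_range, precision = errors(alpha)
        total = value_range + precision
        if total < best_total:
            best_alpha, best_total, stale = alpha, total, 0
        else:
            stale += 1
            if stale >= patience:
                break
        if value_range > precision:
            alpha *= 1.0 - beta    # value range errors dominate: shrink alpha
        else:
            alpha *= 1.0 + beta    # precision errors dominate: grow alpha
    return best_alpha, best_total


def toy_errors(a):
    # Toy model (an assumption, not measured data): value range errors grow
    # with alpha and precision errors shrink with alpha.
    return a ** 2, 1.0 / a ** 2


alpha_opt, total_opt = tune_alpha(4.0, toy_errors)
```

For the toy model the two error terms are equal at α = 1, and the loop settles in the neighborhood of that point, which mirrors the experimental observation that the total errors are close to the minimum when both error types are equal.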
Specifically, the total errors are obtained by first quantizing
g (based on the bit-accuracy of the closed loop programming).
Next, W^r is obtained using Eq (1) and the total errors are
computed as ||W − W^r||^2. The value range errors are obtained
as ||W − W^r||^2 without first quantizing the conductance values
g. The precision errors are set to the difference between the
total errors and the value range errors.
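The decomposition of the total errors can be sketched as follows. The sketch assumes equidistant states within [gmin, gmax] and, for brevity, uses a default `realize` function that returns the conductances unchanged (an ideal crossbar without IR-drop); in practice `realize` would be the MNA-based computation of G. All function names and the example values are hypothetical.

```python
def quantize(g, g_min, g_max, bits):
    """Snap g to the nearest of 2**bits equidistant states in [g_min, g_max]."""
    step = (g_max - g_min) / (2 ** bits - 1)
    g = min(max(g, g_min), g_max)
    return g_min + round((g - g_min) / step) * step


def sq_frobenius(A, B):
    """Squared Frobenius norm of A - B."""
    return sum((a - b) ** 2 for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def error_split(W, g, alpha, g_min, g_max, bits, realize=lambda x: x):
    """Return (total, value_range, precision) errors for conductances g."""
    def realized(cond):
        # W^r = G / alpha (Eq. 1), with G = realize(cond)
        return [[x / alpha for x in row] for row in realize(cond)]

    g_q = [[quantize(x, g_min, g_max, bits) for x in row] for row in g]
    total = sq_frobenius(W, realized(g_q))        # with quantization
    value_range = sq_frobenius(W, realized(g))    # without quantization
    return total, value_range, total - value_range


W = [[1.0, 0.5], [0.25, 2.0]]
alpha = 1e-4
g_min, g_max = 1.0 / 3e6, 1.0 / 2e3               # 3 Mohm and 2 kohm
g = [[alpha * w for w in row] for row in W]       # all values inside the range
total_2, vr_2, pr_2 = error_split(W, g, alpha, g_min, g_max, bits=2)
total_8, vr_8, pr_8 = error_split(W, g, alpha, g_min, g_max, bits=8)
```

In this example every target conductance lies inside the range, so the value range errors vanish and the total errors consist of precision errors only, which shrink when the number of bits increases from 2 to 8.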
C. Specification of conductance values g
In this section, the conductance values g are specified
given W, αopt, and a calibration vector vcal from the input
vector space. The objective is to minimize ||W − W^r||^2 while
guaranteeing that ||(W − W^r) · vcal|| = 0. The motivation for
minimizing ||(W − W^r) · vcal|| is to exploit the fact that the outputs
from neurons in DNNs are non-negative due to the activation
functions.
This step is performed by updating W to a new target
matrix W^t. Next, given W^t and αopt, the conductance values
g are specified by minimizing
||W^t − W^r||^2 using the method in [7]. Unfortunately, it is
impossible to eliminate the errors by updating elements in W that
are realized too large (or too small) because the corresponding
memristors are already tuned to the lower (or upper) bound of
the programmable memristor conductance range. Hence, the
errors are distributed to the other matrix elements in the same row
to ensure that ||(W − W^r) · vcal|| = 0.
Let R = (W − W^r) be the difference between the matrix W
and the realized matrix W^r. Next, let r be a vector
containing the sum of the elements in each row of R and
let c be a vector containing the number of memristors with a
conductance not equal to gmin in each row of the MCA. Next,
let u be equal to r element-wise divided by c. Subsequently,
W is updated to W^t by adding u(i) to each element in row i
where the conductance of the corresponding memristor is not
equal to gmin, so that a row whose realized sum is too large
receives smaller targets. For a calibration vector with equal
entries, a zero row sum of (W − W^r) is equivalent to a zero
entry in (W − W^r) · vcal. Next, W^t is mapped into conductance values
using the technique in [7].
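The row-wise redistribution can be sketched in a few lines. The sketch takes R = W − W^r, so that adding u(i) lowers the targets of a row whose realized sum is too large; this sign convention is an assumption made here so that the row sums cancel. The function name `update_target` and the example values are hypothetical.

```python
def update_target(W, W_r, g, g_min):
    """Return W^t: spread each row's summed error over the adjustable cells."""
    W_t = []
    for w_row, wr_row, g_row in zip(W, W_r, g):
        r_i = sum(w - wr for w, wr in zip(w_row, wr_row))   # row sum of R
        free = [gv != g_min for gv in g_row]                # not stuck at g_min
        c_i = sum(free)                                     # adjustable cells
        u_i = r_i / c_i if c_i else 0.0
        W_t.append([w + u_i if f else w for w, f in zip(w_row, free)])
    return W_t


g_min = 1.0 / 3e6
W = [[0.0, 2.0, 3.0]]               # target row
W_r = [[0.3, 2.0, 3.0]]             # first element realized too large
g = [[g_min, 2e-4, 3e-4]]           # its memristor is already at g_min
W_t = update_target(W, W_r, g, g_min)

# If the stuck element keeps its error of 0.3 after remapping, the realized
# row sum equals the target row sum of W.
row_sum_after = sum(W_t[0]) + 0.3
```

In the example, the error of 0.3 on the stuck element is compensated by lowering each of the two adjustable targets by 0.15.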
D. Specification of state variables s
In this section, the state variables of the memristors are de-
termined from the ideal conductance values g and a calibration
signal vcal, which is illustrated in Figure 6. First, the node
voltages vc and vr and the currents through the conductors ig
are computed using g and vcal, which is shown in Figure 6(a).
Next, the state variables s are computed using vc, vr, and ig ,
which is illustrated in Figure 6(b).
Computation of vc, vr, ig: First, the node voltages vc and
vr in the MCA are computed with respect to a calibration signal
vcal by solving Eq (1) with vin=vcal. We use vcal=vmax/2
in our implementation (vcal can also be set based on prior
knowledge of the input vectors of a specific application).
Next, the current through each conductor g is obtained using
ig = g · (vc − vr).
Fig. 6: Specification of s from g and vcal. (a) Eq (1) is used to
obtain vc, vr, and ig from g and vcal. (b) Newton’s method is used
to obtain s and vp from vc, vr, and ig.
Computation of state variables s: The state variables s
are found by solving a non-linear system of two equations, as
follows:
\[ X = \begin{bmatrix} s \\ v_p \end{bmatrix}, \quad F(X) = i_g - \begin{bmatrix} i_m(s, v_c - v_p) \\ i_t(v_p, v_r, v_g) \end{bmatrix}, \qquad (6) \]
where vp is the node voltage between the memristor and the
access transistor. Next, Newton’s method is used to solve
F(X) = 0, as follows:
\[ X_{k+1} = X_k - J^{-1} F(X_k), \qquad (7) \]
where J^{-1} is the inverse of the Jacobian of F. Note that Newton’s
method can be applied independently for each memristor
and access transistor pair.
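The per-device Newton iteration of Eq (6) and Eq (7) can be sketched as follows. The device models `toy_memristor` and `toy_transistor` are simple stand-ins chosen for illustration and are not the models from [11]; the gate voltage is treated as fixed inside the transistor model, and the finite-difference Jacobian, the initial guess, and the name `newton_pair` are likewise assumptions.

```python
def newton_pair(i_g, v_c, v_r, i_m, i_t, s0, tol=1e-12, max_iter=50):
    """Solve i_g = i_m(s, v_c - v_p) and i_g = i_t(v_p, v_r) for (s, v_p)."""
    def F(x):
        s, v_p = x
        return [i_g - i_m(s, v_c - v_p), i_g - i_t(v_p, v_r)]

    x = [s0, 0.5 * (v_c + v_r)]          # initial guess for X = [s, v_p]
    for _ in range(max_iter):
        f = F(x)
        if max(abs(f[0]), abs(f[1])) <= tol * abs(i_g):
            break
        # Forward-difference Jacobian, J[r][c] = dF_r / dx_c.
        J = [[0.0, 0.0], [0.0, 0.0]]
        for c in range(2):
            h = 1e-7 * max(abs(x[c]), 1e-9)
            xp = x[:]
            xp[c] += h
            fp = F(xp)
            for r in range(2):
                J[r][c] = (fp[r] - f[r]) / h
        det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
        # Newton update X <- X - J^{-1} F(X), with the 2x2 inverse written out.
        dx0 = (J[1][1] * f[0] - J[0][1] * f[1]) / det
        dx1 = (J[0][0] * f[1] - J[1][0] * f[0]) / det
        x = [x[0] - dx0, x[1] - dx1]
    return x


def toy_memristor(s, v_m):
    # Hypothetical i_m(s, v_m): state-scaled, mildly non-linear in voltage.
    return s * (v_m + 0.5 * v_m ** 3)


def toy_transistor(v_p, v_r):
    # Hypothetical i_t at a fixed gate voltage: linear with 10 mS conductance.
    return 1e-2 * (v_p - v_r)


g, v_c, v_r = 2e-4, 0.1, 0.0
i_g = g * (v_c - v_r)                    # current through the ideal conductor
s, v_p = newton_pair(i_g, v_c, v_r, toy_memristor, toy_transistor, s0=g)
```

Because each memristor and access transistor pair only involves two unknowns, the 2x2 system can be inverted explicitly and the pairs can be processed independently.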
VI. EXPERIMENTAL EVALUATION
The experimental results are obtained using a quad core 3.4
GHz Linux machine with 32GB of memory. The proposed
techniques are implemented in MATLAB. The default MCA
in the evaluation has dimensions 128x128, a wire resistance
rw=1Ω, and input and output resistances of 100Ω.
The programmable memristor range, expressed as a resistance, is 2kΩ to
3MΩ [5]. We use the same non-ideal device models for the
memristors and access transistors as in [5], which are available
in [11]. The bit-accuracy for the memristors is set to 8 bits. The
programming errors are modeled using quantization, where it
is assumed that the distinguishable states in the memristor
conductance range (or state space) are equidistant. The maximum
input voltage is set to 0.2V. The determined state variables
s and scaling factors α are evaluated using circuit simulation
with SPICE accuracy using the circuit model in Figure 2(a). In
Section VI-A, the proposed mapping is evaluated in terms of
matrix-vector multiplication. In Section VI-B, the proposed
mapping technique is evaluated in a DNN application. We
compare our results with the technique proposed in [5]. No
direct comparison is provided with [7], since that work
considered a subproblem of our problem formulation.
A. Evaluation of matrix-vector multiplication
In this section, we compare the proposed mapping technique
with the state-of-the-art mapping technique in [5] using full
analog simulation with the state variables s. The evaluation is
performed with respect to the maximum output error for
various input vectors and weight matrices. In Figure 7, it
is demonstrated that the proposed mapping results in 4X to
9X smaller errors, depending on the wire resistance, crossbar size,
number of memristors used per matrix element/weight, and
memristor device model. It is not surprising that significantly
smaller maximum output errors are obtained because [5] is a
heuristic and the proposed mapping technique specifies both
g and α by leveraging the insights provided by the space of
representable matrices. Since the benefits are obtained using
only parameter optimization, the power and area are expected to
be very similar to those in [5], [4], i.e., the benefits are obtained
with no power or area overhead. In general, we find that when square MCAs
of size 32/64/128 are used, 64/60/48% of the programmable
memristor conductance range is utilized. The average run-time
is 0.5/1.5/5 min per matrix with a dimension of 32/64/128,
respectively. Note that for MCAs of dimension 128x128, only
a few α values were evaluated in order to control the run-time.
Fig. 7: Comparison with [5] using different (a) wire resistance,
(b) crossbar sizes, (c) number of memristors per weight, and
(d) non-ideal device models.
B. Evaluation of DNN applications
In this section, the proposed mapping and the method
in [5] are used to map DNNs trained in software using GPUs
to MCA based platforms for inference. The networks are
trained using Keras combined with TensorFlow. In particular,
we evaluate a four-layer feed-forward network trained on
the MNIST dataset and a seven-layer CNN trained on the
CIFAR-10 dataset. The feed-forward network has dimensions
784x500x300x10 and the properties of the CNN are shown in
Figure 8.
Fig. 8: (a) Layers of the CNN: Input, Conv1, ReLU, Conv2, ReLU,
Max Pool, Conv3, ReLU, Conv4, Max Pool, FC1, ReLU, FC2, Softmax,
Output. (b) Weight matrices in convolutional (Conv) and
fully-connected (FC) layers:

Layer   Weight matrix dimensions   # times used
Conv1   27x32                      1024
Conv2   288x32                     900
Conv3   288x64                     900
Conv4   576x64                     784
FC1     2304x512                   1
FC2     512x10                     1
The feed-forward network is mapped to an MCA based
platform by partitioning each weight matrix onto a grid of
128x128 MCAs. The CNN is mapped to an MCA based
platform using the kernel mapping in [9], where each con-
volutional layer and fully-connected layer is partitioned onto
a grid of 128x128 MCAs. The default settings for the MCA
are used with one memristor per weight. The classification
accuracy is computed using one thousand randomly selected
input images and SPICE level circuit simulation.
Fig. 9: Classification accuracy for different DAC/ADC bit-
accuracies (a) MNIST (b) CIFAR-10.
In Figure 9, the classification accuracy achieved in MCA
based hardware is shown for DACs and ADCs with various
bit-accuracies on the MNIST and CIFAR-10 datasets. We
also plot the upper bound on the classification accuracy that
can be achieved using DACs and ADCs, i.e., when errors are only
introduced by the DACs and ADCs. The DACs and ADCs
use a fixed and a dynamic reference voltage, respectively. In
Figure 9(a), it can be observed that when no DAC/ADC quantization
is performed (indicated with ∞), the classification accuracy
on the MNIST dataset using the mapping in this work and
in [5] is 98.3% and 96.3%, respectively. The upper bound (the
software accuracy) is 98.4%. Moreover, the proposed mapping
technique follows the upper bound closely and outperforms
the mapping in [5] when DACs and ADCs with smaller
bit-accuracies are used. In Figure 9(b), it can be observed that the
proposed mapping achieves a classification accuracy of 71.8%
on CIFAR-10, which is close to the software classification
accuracy of 75.2%, whereas the mapping in [5] results in an
accuracy of 20.5%. Moreover, the proposed technique follows
the upper bound on CIFAR-10 closely for DACs and ADCs
with different bit-accuracies. The
proposed mapping technique outperforms the method in [4]
because each matrix-vector multiplication is performed with
6X smaller errors.
Work        Mapping time (h)
            MNIST    CIFAR-10
In [5]      0.28     0.61
This work   6.17     0.93

Fig. 10: (a) Classification accuracy with RTN. (b) Run-time
of mapping DNNs to MCAs.
When moderate to severe (up to 20%) random telegraph
noise (RTN) is included [10], the classification accuracy degrades
gracefully on CIFAR-10 and is not impacted on MNIST, which is
shown in Figure 10(a). In Figure 10(b), we show the run-time
of our framework and of the technique in [5]. To limit the
run-time of mapping the weight matrices in the CNN to state
variables, we used the technique in [5] to map FC1, which is
not sensitive to errors. The run-time reported in the table is
based on our implementation of [5], where performance was
prioritized over run-time in the implementation.
VII. SUMMARY AND FUTURE WORK
In this paper, a technique for mapping arbitrary weight
matrices to MCAs is proposed. The technique improves the
computational accuracy over the state-of-the-art by 4X to 9X
and achieves close to software level accuracy on the CIFAR-10
dataset when a CNN trained in software is mapped to an MCA
based platform for inference. We plan to reduce the run-time
of the algorithm in the future.
REFERENCES
[1] F. Alibart, L. Gao, B. D. Hoskins, and D. B. Strukov. High precision
tuning of state for memristive devices by adaptable variation-tolerant
algorithm. Nanotechnology, 23(7):075201, 2012.
[2] P. Chi et al. PRIME: a novel processing-in-memory architecture for
neural network computation in reram-based main memory. ISCA’16,
pages 27–39, 2016.
[3] M. Hu et al. Memristor crossbar-based neuromorphic computing system:
A case study. IEEE Transactions on Neural Networks and Learning
Systems, 25:1864–1878, 2014.
[4] M. Hu et al. Memristor-based analog computation and neural network
classification with a DPE. Adv. Materials, 30, 2018.
[5] M. Hu, J. P. Strachan, Z. Li, E. M. Grafals, N. Davila, C. Graves,
S. Lam, N. Ge, J. J. Yang, and R. S. Williams. Dot-product engine for
neuromorphic computing: Programming 1T1M crossbar to accelerate
matrix-vector multiplication. DAC’16, pages 1–6, 2016.
[6] Y. LeCun, Y. Bengio, and G. Hinton. Deep learning. In Nature, pages
436–444, 2015.
[7] B. Liu et al. Reduction and IR-drop compensations techniques for
reliable neuromorphic computing systems. ICCAD’2014, pages 63–70,
2014.
[8] A. Shafiee, A. Nag, N. Muralimanohar, R. Balasubramonian, J. P. Stra-
chan, M. Hu, R. S. Williams, and V. Srikumar. ISAAC: a convolutional
neural network accelerator with in-situ analog arithmetic in crossbars.
ISCA’16, pages 14–26, 2016.
[9] L. Song, X. Qian, H. Li, and Y. Chen. Pipelayer: A pipelined reram-
based accelerator for deep learning. HPCA’17, pages 541–552, 2017.
[10] J. P. Strachan. DPE: Exploring high efficiency analog multiplication
with memristor arrays. In Int. Conf. on Rebooting Computing, 2015.
[11] J. P. Strachan, A. C. Torrezan, F. Miao, M. D. Pickett, J. J. Yang, W. Yi,
G. Medeiros-Ribeiro, and R. S. Williams. State dynamics and modeling
of tantalum oxide memristors. IEEE Transactions on Electron Devices,
60(7):2194–2202, 2013.
[12] L. Xia et al. Technological exploration of RRAM crossbar array
for matrix-vector multiplication. Journal of Computer Science and
Technology, 31(1):3–19, 2016.
