TxSim: Modeling Training of Deep Neural Networks on Resistive Crossbar
  Systems by Roy, Sourjya et al.
Sourjya Roy*, Shrihari Sridharan*, Shubham Jain† and Anand Raghunathan
*(Equal contributors listed in alphabetical order)
Purdue University, West Lafayette, IN, USA
Abstract—Deep Neural Networks(DNNs) have gained tremen-
dous popularity in recent years due to their ability to achieve
superhuman accuracies in a wide variety of machine learning
tasks. However, the compute and memory requirements of
DNNs have grown rapidly, driving a need for energy efficient
hardware. Resistive crossbars have attracted significant interest
in the design of the next generation of DNN accelerators due to
their ability to natively execute massively parallel vector-matrix
multiplications within dense memory arrays. However, crossbar-
based computations face a major challenge due to device and
circuit-level non-idealities, which manifest as errors in the vector-
matrix multiplications and eventually degrade DNN accuracy. To
address this challenge, there is a need for tools that can model
the functional impact of non-idealities on DNN training and
inference. Existing efforts towards this goal are either limited
to inference, or are too slow to be used for large-scale DNN
training.
We propose TxSim, a fast and customizable modeling frame-
work to functionally evaluate DNN training on crossbar-based
hardware considering the impact of non-idealities. The key
features of TxSim that differentiate it from prior efforts are:
(i) It comprehensively models non-idealities during all training
operations (forward propagation, backward propagation, and
weight update) and (ii) it achieves computational efficiency by
mapping crossbar evaluations to well-optimized BLAS routines
and incorporates speedup techniques to further reduce simulation
time with minimal impact on accuracy. TxSim achieves orders-
of-magnitude improvement in simulation speed over prior works,
and thereby makes it feasible to evaluate training of large-scale
DNNs on crossbars. Our experiments using TxSim reveal that the
accuracy degradation in DNN training due to non-idealities can
be substantial (3%-36.4%) for large-scale DNNs, underscoring
the need for further research in mitigation techniques. We also
analyze the impact of various device and circuit-level parameters
and the associated non-idealities to provide key insights that can
guide the design of crossbar-based DNN training accelerators.
I. INTRODUCTION
Deep Neural Networks (DNNs) have greatly advanced the
state-of-the-art in a wide variety of machine learning tasks [1],
[2]. However, these benefits come at the cost of extremely
high computation and storage requirements. GPUs and digital
CMOS-based accelerators [3], [4] have enabled faster and
more energy-efficient realization of DNNs. However, the con-
tinuing growth in network complexities and volumes of data
processed have led to the quest for further improvements in
hardware. For example, training state-of-the-art DNNs requires
exa-ops of compute and can take days to weeks on a GPU [5],
while Neural Architecture Search (NAS) [6] further increases
computation requirements to zetta-ops.
†Shubham Jain is currently a research staff member at IBM T.J. Watson
Research Center, Yorktown Heights, NY (shubham.jain35@ibm.com)
Resistive Crossbars have emerged as promising building
blocks for future DNN accelerators. They are designed using
emerging non-volatile memory technologies such as PCM [7]
and ReRAM [8] that can enable high-density memory arrays,
while also realizing massively parallel vector-matrix multiplications
(the dominant compute kernel of DNNs) within
these arrays. Thus, crossbar-based architectures promise to
overcome the data transfer and memory capacity bottlenecks
that are present in current DNN hardware platforms. Conse-
quently, many efforts have explored the design of crossbar-
based accelerators [9], [10]. We specifically focus on crossbar-
based architectures for training DNNs [11]–[13], which have
attracted increasing interest in recent years.
Crossbar-based systems face a major challenge due to
numerous device and circuit-level non-idealities, viz., driver
and sensing resistances, analog-to-digital converter (ADC) and
digital-to-analog converter (DAC) non-linearity, interconnect
resistances, process variations, noise in synaptic devices, im-
perfect write and update operations, and sneak paths [12],
[14]–[16]. Unless addressed, these non-idealities can signif-
icantly degrade DNN accuracy, threatening the viability of
crossbar-based hardware [10]. To quantitatively evaluate and
address this challenge, there is a need for tools that can model
the impact of all non-idealities on each step of DNN train-
ing (forward propagation, backward propagation, and weight
update). DNN training on native hardware (e.g., GPUs) is
already very slow, and software simulation of DNN training
on crossbar-based systems (emulated hardware) will further
slow down the training process considerably. Therefore, it is
extremely important that the modeling tool maintains high
simulation speed (same order-of-magnitude as DNN training
on native hardware). The tool should also be customizable and
support a wide variety of device and circuit parameters and
DNN topologies.
In this work, we propose TxSim, a tool to functionally
evaluate DNN training on crossbar-based systems, which
meets the aforementioned requirements. TxSim utilizes a
three-stage vector-matrix multiplication model to capture the
impact of non-idealities during forward and backward prop-
agation operations with good fidelity and simulation speed.
The first stage consists of a non-linear conversion of digital
inputs to voltages considering DAC non-idealities. The second
stage models the non-idealities within the core crossbar array
(interconnect parasitics, sneak paths and process variations)
as a series of linear-algebraic transformations wherein ideal
conductance matrices are converted to non-ideal conductance
matrices. The final stage consists of the non-linear transformation
of the currents back to digital outputs considering
ADC non-idealities. Such an approach to modeling allows us
to seamlessly utilize highly-optimized BLAS routines present
in standard ML frameworks (e.g., PyTorch, TensorFlow and
Caffe). TxSim also models the weight update non-idealities
(stochastic noise and update non-linearity) using optimized
BLAS routines. In addition, TxSim utilizes speedup techniques
that further reduce simulation time with minimal impact on
modeling fidelity.
arXiv:2002.11151v2 [cs.LG] 10 Jul 2020
Prior efforts on functional modeling of crossbar-based DNN
hardware can be broadly classified into efforts that model
inference [14]–[16] and efforts that model training [12], [14].
Inference models are not sufficient for evaluating DNN train-
ing, since training includes additional backward propagation
and weight update operations. As elaborated in Section II,
TxSim’s modeling approach and speedup techniques make it
108x-2000x faster than prior efforts to model DNN training
on crossbars [12], [14]. It achieves this while also being more
comprehensive in the non-idealities modeled (e.g., sneak paths
and wiring parasitics), and being customizable to different
DNN topologies and circuit and device parameters. In sum-
mary, our key contributions are:
• We propose TxSim, a scalable and customizable modeling
framework to functionally evaluate DNN training
on crossbar-based systems. TxSim models a more comprehensive
set of non-idealities and is 108x-2000x faster
than prior training frameworks.
• We introduce speedup techniques utilizing approximate
but high fidelity models to further improve simulation
speeds.
• We analyze the impact of various device and circuit-level
parameters and the associated non-idealities on DNN
training to provide key insights to guide cross-layer opti-
mizations for crossbar-based DNN training accelerators.
The paper is organized as follows. Section II discusses
previous works that model training and inference on
crossbar-based architectures. Section III provides a brief
overview of DNN training and background on resistive
crossbar-based systems. Section IV presents the TxSim modeling
tool by enumerating its design components. Section V discusses
the evaluation methodology. Section VI quantifies the
application-level accuracy degradation of DNN training on resistive
crossbars and provides a sensitivity analysis for various circuit
and device-level parameters. Section VII concludes the
paper.
II. RELATED WORK
In this section, we discuss prior efforts on modeling
inference and training on crossbar-based systems, as well as
training algorithms/methodologies for such systems.
Inference modeling. PytorX [15] and RxNN [16] are model-
ing tools that consider the impact of non-idealities in crossbar-
based inference. Other works [17], [18] propose methods to
compensate for accuracy degradation. However, these tools
are not directly applicable to crossbar-based training, which
involves modeling the non-idealities in the backward propa-
gation and weight update phases as well.
Training accelerators. Various architectures [11]–[13] have
been proposed that perform DNN training on crossbar-based
hardware. MNSIM [19] is a tool for early design space
exploration of such architectures. The major focus of these
works has been on area, speed and energy, while ignoring, or
assuming very primitive error models for, accuracy evaluation.
Modeling crossbar-based training. Two noteworthy efforts
that model crossbar-based training are CrossSim [12] and
NeuroSim [14]. Table I compares our work with these efforts
along two important dimensions – the fidelity in modeling non-
idealities and the simulation time. CrossSim considers only the
errors due to device updates and peripheral circuits, and reports
results only on a simple 3-layer network and small data set
(MNIST). NeuroSim supports training only with MLPs (fully
connected networks). In contrast, TxSim considers all circuit
and device level non-idealities and is capable of evaluating
more complex networks. When considering simulation time,
CrossSim requires about a week to train a simple 3-layer
network on MNIST. TxSim is 108x faster than CrossSim for
the same task. NeuroSim takes around 5 minutes to perform
inference for a single image. Projected to the full CIFAR10
validation dataset, this would translate to 34 days for inference
alone (and training would take much longer). In contrast, our
framework is able to train AlexNet on CIFAR100 with only
0.0136 seconds per image, which is around 2000x faster.
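The projection above can be checked with a few lines of arithmetic. The sketch below assumes that the CIFAR10 validation set holds 10,000 images (the standard test split size, which the text does not state) and otherwise uses only the per-image times quoted in this section and in Table I. Variable names are illustrative.

```python
# Back-of-the-envelope check of the simulation-time figures quoted above.
# Assumption (not stated in the text): the CIFAR10 validation set has 10,000 images.
CIFAR10_VAL_IMAGES = 10_000

# "around 5 minutes to perform inference for a single image"
neurosim_inference_min_per_image = 5.0
neurosim_total_days = (
    neurosim_inference_min_per_image * CIFAR10_VAL_IMAGES / (60 * 24)
)

# Per-image training times listed in Table I for CIFAR10/VGG8.
neurosim_train_s_per_image = 15 * 60   # 15 minutes
txsim_train_s_per_image = 0.38
table_speedup = neurosim_train_s_per_image / txsim_train_s_per_image

print(f"NeuroSim inference over the validation set: {neurosim_total_days:.1f} days")
print(f"Table I per-image ratio (NeuroSim / TxSim): {table_speedup:.0f}x")
```

Under these assumptions the inference projection lands between 34 and 35 days, and the Table I per-image ratio is of the same order as the reported 2000x.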
TABLE I: Differentiation with other design tools for training
Characteristic | CrossSim | NeuroSim | TxSim
DAC and ADC non-linearity | ✓ | ✓ | ✓
Sneak paths | ✗ | ✗ | ✓
Wire parasitics | ✗ | Simple | Detailed
Update non-linearity | ✓ | ✓ | ✓
Update noise | ✓ | ✓ | ✓
Network topology/Dataset | MLP-MNIST | MLP-MNIST | MLP/CNN/RNN - All datasets
Customizability to other devices and architectures | Low | Medium | High
Training time | 0.26 seconds per image - MNIST/MLP | 15 minutes per image - CIFAR10/VGG8 | 0.38 seconds per image - CIFAR10/VGG8
Training algorithms. Given the errors intrinsic in crossbar-
based computing, it is important to come up with the right
kind of algorithms for training to converge to a good accuracy.
To this end, previous efforts [20], [21] propose enhanced
algorithms that help overcome non-idealities such as device
non-linearity, asymmetry and stochastic noise. Our work com-
plements these efforts by providing a generic modeling tool
that provides an accurate estimate of the degradation due
to non-idealities. We expect such tools to further enable
future development of crossbar-based training architectures
and algorithms.
Fig. 1: TxSim framework for evaluating DNN training on crossbar-based systems
III. PRELIMINARIES
In this section, we present a brief overview of DNN training,
background on resistive crossbars and the non-idealities in
the crossbar array during all phases of training (forward
propagation, backward propagation and weight update).
A. DNN Training
DNN training involves learning weights (strength of the
connection) between neurons in each layer to match the output
of the neural network to its true label. The model is typically
initialized with random weight parameters that get updated
iteratively using Stochastic Gradient Descent (SGD). Different
subsets of training data, known as minibatches, are fed to
the model in each iteration to minimize the loss between
the output and its true label. The overall training data is
fed to the network multiple times until the loss reaches an
optimum value. The performance of a DNN is measured by
the total number of correct predictions on unseen test data.
DNN training consists of three stages: Forward Propagation,
Backward Propagation and Weight Update.
Forward Propagation. Minibatch inputs are multiplied
by the layer weights in order to obtain output activations.
The activation outputs are often passed through non-linear
operations such as sigmoid and ReLU.
Backward Propagation. The final layer activations determine
the loss with respect to the true label and the gradient of
loss with respect to weights and activations. The activation
gradients are backpropagated to the previous layers.
Weight Update. The accumulated weight gradients in each
layer are multiplied by a predefined learning rate that
controls the amount by which the weights get updated.
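The three stages can be sketched for a single fully connected layer as follows. This is a minimal illustration in plain Python, assuming a squared-error loss and no activation function; the helper names (`matmul`, `transpose`, `sgd_step`) and all numeric values are hypothetical and are not part of TxSim.

```python
# One SGD step for a single fully connected layer y = x @ W.
# Assumptions: squared-error loss, no non-linearity, illustrative values only.

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def sgd_step(x, w, target, lr):
    batch, n_out = len(x), len(w[0])
    # Forward propagation: minibatch inputs times layer weights.
    y = matmul(x, w)
    # Gradient of L = 0.5 * sum((y - target)^2) / batch w.r.t. the outputs.
    dy = [[(y[i][j] - target[i][j]) / batch for j in range(n_out)]
          for i in range(batch)]
    # Backward propagation: weight gradient and gradient for the previous layer.
    dw = matmul(transpose(x), dy)
    dx = matmul(dy, transpose(w))
    # Weight update: scale the weight gradient by the learning rate.
    w_new = [[w[i][j] - lr * dw[i][j] for j in range(n_out)]
             for i in range(len(w))]
    loss = 0.5 * sum((y[i][j] - target[i][j]) ** 2
                     for i in range(batch) for j in range(n_out)) / batch
    return w_new, dx, loss

x = [[1.0, 2.0], [0.5, -1.0]]   # minibatch of 2 samples, 2 features
w = [[0.1, -0.2], [0.3, 0.4]]   # 2 inputs -> 2 outputs
t = [[1.0, 0.0], [0.0, 1.0]]    # targets
losses = []
for _ in range(50):
    w, _, loss = sgd_step(x, w, t, lr=0.1)
    losses.append(loss)
print(losses[0], losses[-1])
```

On a crossbar, the two `matmul` calls in the forward and backward passes are the operations affected by non-idealities, while the update line is where write non-idealities enter.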
Fig. 2: Resistive crossbar array with peripherals
B. Resistive Crossbars
Resistive crossbars are 2D arrays of orthogonal wires that
efficiently realize matrix-vector multiplications. At every
junction, the wordline and bitline are connected with a
non-volatile memory (NVM) device. An ideal N × M resistive
crossbar consists of N rows and M columns (shown in Figure
2). Initially, a write operation is used to program the NVM
devices serially to a desired conductance state. A digital-to-
analog converter (DAC) converts N digital inputs to analog
voltages fed to the read wordlines. All the wordlines are
activated concurrently and the current from each NVM device
is accumulated and sensed at the output of the corresponding
bitline. Finally, the M analog currents are converted to
digital outputs by passing through an analog-to-digital
converter (ADC).
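In the ideal case, the accumulation on each bitline is simply a dot product of the wordline voltages with that column's conductances. The sketch below illustrates this with hypothetical values; the function name and the numbers are assumptions made for illustration.

```python
# Ideal N x M crossbar: the current on bitline j is I_j = sum_i V_i * G[i][j]
# (Ohm's law per device, Kirchhoff's current law per bitline).

def ideal_crossbar_currents(voltages, conductances):
    n_rows, n_cols = len(conductances), len(conductances[0])
    assert len(voltages) == n_rows, "one voltage per wordline"
    return [sum(voltages[i] * conductances[i][j] for i in range(n_rows))
            for j in range(n_cols)]

# Illustrative values: 3 wordlines, 2 bitlines.
V = [0.5, 0.0, 0.25]            # volts applied to the wordlines
G = [[1e-6, 10e-6],             # siemens, one row per wordline
     [5e-6, 2e-6],
     [4e-6, 1e-6]]
I = ideal_crossbar_currents(V, G)
print(I)                        # amperes, one entry per bitline
```

All M column currents are produced in one step, which is why the crossbar realizes the full product in parallel.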
C. Non-idealities in crossbars
Peripheral circuitry. The analog computation in the crossbar
array requires digital to analog (DAC) and analog to digital
(ADC) converters. The DACs and the ADCs are non-linear and
limited in precision to keep their area and power overheads
low.
Circuit non-idealities. The wire resistances, source resistances,
sink resistances and sneak paths in the crossbar array
impact the column currents, causing errors in the vector-matrix
multiplication. The voltage drops across the parasitics and the
currents through the sneak paths make the actual current of a
column deviate significantly from its ideal output current.
Device non-idealities. The synaptic elements within the crossbar
array are inherently stochastic. Existing device technologies
can only support precisions of up to 6 bits. These devices
also exhibit non-linear and asymmetric behaviour, and suffer
from process variations, drift and limited endurance, which
can affect the overall classification accuracy.
IV. TXSIM MODELING FRAMEWORK
TxSim is a highly customizable and scalable modeling tool
that evaluates the application-level accuracy of DNNs trained
on crossbar-based hardware. Figure 1 outlines the TxSim
modeling process. TxSim takes three main inputs: (i) the
network architecture that defines the number of layers, and the
numbers and sizes of input/output channels and kernels, (ii) the
hardware architecture parameters such as weight and activation
precisions, mapping strategy, level of non-ideality modeling,
etc., and (iii) the crossbar parameters, including DAC/ADC
models, crossbar dimensions, synaptic device characteristics,
etc. The rest of the section describes TxSim by going over
each component in detail.
A. Non-ideal conductance generator
Fig. 3: Non-ideal conductance generator
A typical flow of the non-ideal conductance generator is
shown in Figure 3. The non-ideal conductance generator
analyzes the non-idealities associated with the core crossbar
array, viz., the wire resistances, sink and source resistances,
sneak paths, and process variations. It takes an ideal con-
ductance matrix as an input and converts it into a non-ideal
conductance matrix that incorporates these core array non-
idealities. First, the ideal conductance matrix (Gideal−updated)
is mapped to one or more synaptic devices based on the on-off
ratio and the precision of each device. Next, the ideal conduc-
tance matrix is partitioned into crossbar instances based on the
specified crossbar dimensions. Within each crossbar instance,
the positive and negative conductances may be further mapped
onto separate crossbars, obtaining two different currents that
are subtracted. Lastly, process variations are applied to each
synaptic element based on the specified variation profile.
The ideal conductance matrices are converted to non-ideal
conductance matrices by applying the fast crossbar model (FCM),
which was originally proposed for modeling inference [16].
The FCM conversion mechanism applies Kirchhoff’s circuit
laws to account for interconnect resistances. Although this
process is very accurate, it slows down the training
simulation because it needs to be applied after every minibatch
iteration due to weight updates. Therefore, we also propose
speedup techniques that approximate FCM while maintaining
good modeling fidelity (discussed in Section IV-D). Finally,
the crossbar instances are stitched back together to obtain the
non-ideal conductance matrix. Note that a copy of the ideal
conductance matrix is always preserved and used to obtain
Gideal-updated for the next minibatch iterations.
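The bookkeeping steps of this flow (differential mapping, partitioning, process variation and stitching) can be sketched as below. This is a simplified illustration rather than the TxSim implementation: the FCM step itself is omitted, the variation profile is assumed to be multiplicative Gaussian noise, each weight is mapped to a single device pair, and all function names are hypothetical.

```python
import random

# Assumed device range in siemens: Rmax = 1 MΩ gives G_MIN, Rmin = 100 kΩ gives G_MAX.
G_MIN, G_MAX = 1e-6, 1e-5

def weight_to_pair(w, w_max):
    """Encode a signed weight as (G_pos, G_neg); the weight is proportional to G_pos - G_neg."""
    g = G_MIN + (G_MAX - G_MIN) * min(abs(w), w_max) / w_max
    return (g, G_MIN) if w >= 0 else (G_MIN, g)

def partition(matrix, tile_rows, tile_cols):
    """Split a matrix into crossbar-sized tiles keyed by tile index (edge tiles may be smaller)."""
    tiles = {}
    for r0 in range(0, len(matrix), tile_rows):
        for c0 in range(0, len(matrix[0]), tile_cols):
            tiles[(r0 // tile_rows, c0 // tile_cols)] = [
                row[c0:c0 + tile_cols] for row in matrix[r0:r0 + tile_rows]]
    return tiles

def apply_variation(tile, sigma, rng):
    """Assumed variation profile: multiplicative Gaussian noise, clipped to the device range."""
    return [[min(G_MAX, max(G_MIN, g * (1.0 + rng.gauss(0.0, sigma)))) for g in row]
            for row in tile]

def stitch(tiles, tile_rows, tile_cols, n_rows, n_cols):
    """Reassemble the tiles into one conductance matrix."""
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for (tr, tc), tile in tiles.items():
        for i, row in enumerate(tile):
            for j, g in enumerate(row):
                out[tr * tile_rows + i][tc * tile_cols + j] = g
    return out

weights = [[0.5, -1.0, 0.0], [0.25, 1.0, -0.5], [-0.75, 0.1, 0.9]]
pairs = [[weight_to_pair(w, 1.0) for w in row] for row in weights]
g_pos = [[p[0] for p in row] for row in pairs]   # positive crossbar
g_neg = [[p[1] for p in row] for row in pairs]   # negative crossbar

rng = random.Random(0)
tiles = partition(g_pos, 2, 2)                   # 2x2 crossbars for illustration
varied = {k: apply_variation(t, 0.05, rng) for k, t in tiles.items()}
g_pos_varied = stitch(varied, 2, 2, 3, 3)
print(len(tiles), g_pos_varied[0])
```

In a full flow, the interconnect model would be applied to each tile between `apply_variation` and `stitch`, and the same steps would be repeated for `g_neg`.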
B. Three-stage vector-matrix multiplication model
Once we obtain the non-ideal conductance matrix, we utilize
a three-stage model (shown in Figure 1) to perform the forward
and backward passes. The incoming digital inputs of each
crossbar are converted to voltages depending on the user’s
choice of DAC. The voltages and the non-ideal conductance
matrix are fed to the underlying BLAS functions to obtain
column currents. Subsequently, the column currents are fed
to the ADC model and propagated to the next layer. The
maximum current through ADCs is data dependent and ob-
tained by collecting output distribution statistics over multiple
training epochs. Peripheral operations such as ReLU, sigmoid,
batchnorm, and pooling are computed in the digital domain
and are hence unimpaired by crossbar non-idealities.
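A minimal sketch of the three stages is given below. It assumes an ideal linear DAC and a uniform, saturating ADC purely for illustration; TxSim lets the user supply calibrated non-linear converter models instead. The function names, bit widths and the value of `i_max` are hypothetical.

```python
def dac(codes, bits, v_read):
    """Stage 1: digital codes -> wordline voltages (ideal linear DAC assumed here)."""
    full_scale = (1 << bits) - 1
    return [v_read * c / full_scale for c in codes]

def crossbar(voltages, g_non_ideal):
    """Stage 2: column currents from voltages and the non-ideal conductance matrix."""
    n_cols = len(g_non_ideal[0])
    return [sum(v * row[j] for v, row in zip(voltages, g_non_ideal))
            for j in range(n_cols)]

def adc(currents, bits, i_max):
    """Stage 3: currents -> digital codes, saturating at i_max, uniform quantization."""
    full_scale = (1 << bits) - 1
    return [round(min(max(i, 0.0), i_max) / i_max * full_scale) for i in currents]

# Illustrative values only.
codes = [3, 0, 1]                       # 2-bit digital inputs
G_non_ideal = [[1e-6, 10e-6],
               [5e-6, 2e-6],
               [4e-6, 1e-6]]            # siemens
voltages = dac(codes, bits=2, v_read=0.5)
currents = crossbar(voltages, G_non_ideal)
outputs = adc(currents, bits=4, i_max=8e-6)
print(outputs)
```

Because stage 2 is an ordinary matrix product once the non-ideal conductances are known, it can be dispatched to a BLAS routine in a real framework; the plain-Python loop here only keeps the example dependency-free.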
Fig. 4: Update noise modeling using TxSim
C. Update model
For efficient weight updates, various parallel update
schemes have been proposed [12], [22], wherein inputs and
errors are converted to voltages and fed to the rows and
columns of the crossbar simultaneously. The change in synap-
tic conductance is proportional to the product of the voltages,
which translates to the weight update operation. The voltages
are either converted to time and magnitude based pulses [12]
or modeled as stochastic bit streams [22] whose coincidence
yields a multiplicative effect. To convert digital gradients
to ∆Gideal, for every layer, a pre-determined scaling factor
(ScaleLayer-k) is used (shown in Figure 4). ScaleLayer-k
is determined using weight (Wmax) and gradient (∆Wmax)
statistics from native DNN training. The major non-idealities
during update operations are the stochastic noise of synaptic
devices and the asymmetric write non-linearity [12], which
are both modeled in TxSim.
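The multiplicative effect of coinciding stochastic bit streams can be illustrated with a short simulation. The sketch assumes that the input and error magnitudes are normalized to [0, 1] and that signs are handled separately; the function name and stream length are hypothetical, and this is not the update scheme of any specific cited work.

```python
import random

def stochastic_product(a, b, stream_len, rng):
    """Estimate a * b (a, b in [0, 1]) as the fraction of time steps in which
    a row pulse and a column pulse coincide."""
    hits = 0
    for _ in range(stream_len):
        pulse_row = rng.random() < a   # row pulse present with probability a
        pulse_col = rng.random() < b   # column pulse present with probability b
        if pulse_row and pulse_col:
            hits += 1
    return hits / stream_len

rng = random.Random(0)
estimate = stochastic_product(0.6, 0.5, 20000, rng)
print(estimate, 0.6 * 0.5)
```

Since the two streams are independent, the coincidence probability equals the product of the two pulse probabilities, so every device sees an expected conductance change proportional to the product of its row input and column error.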
The update model, shown in equation (1), depends on the
sign of the update. It also depends on the current conductance
state (G), the minimum conductance (Gmin), the maximum
conductance (Gmax), the ideal conductance change (∆Gideal) and
the update non-linearity factor (v). Due to the non-linear nature
of the device, the updated conductances deviate from the
original values based on v. Another source of non-ideality is
the write noise, which arises due to the stochastic nature of the
device [12]. Gnon-ideal is sampled from a Gaussian distribution
whose standard deviation is γ ∗ √(Gmax − Gmin) ∗ Gideal,
where γ is the write noise factor. The write noise is directly
proportional to the size of the gradient, and a higher write noise
factor translates to more write noise being applied to Gideal.
After obtaining ∆Gnon-ideal (see Figure 4), it is passed to the
optimizer (such as SGD or Adam). Gideal-previous, stored for
every layer, is now updated to obtain a new conductance
matrix Gideal-updated, which is subsequently passed to the
non-ideal conductance generator to obtain the next set of non-ideal
conductances. This process is repeated for each DNN layer
over multiple epochs until training converges.
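The write-noise part of this flow can be sketched as below. The sketch rests on several assumptions that the text does not confirm: the standard deviation is taken as γ · √(Gmax − Gmin) · |∆Gideal| (the expression read literally, with the ideal update as the last factor), conductances are normalized to [0, 1], the non-linearity of equation (1) is left out, and the optimizer is reduced to a plain accumulation. All function names are hypothetical.

```python
import math
import random

def noisy_update(dg_ideal, g_min, g_max, gamma, rng):
    """Sample a non-ideal update from a Gaussian centered on the ideal update.
    Assumed standard deviation: gamma * sqrt(g_max - g_min) * |dg_ideal|."""
    sigma = gamma * math.sqrt(g_max - g_min) * abs(dg_ideal)
    return rng.gauss(dg_ideal, sigma)

def apply_update(g_prev, dg, g_min, g_max):
    """Accumulate the update into the stored ideal conductance, clipped to the device range."""
    return min(g_max, max(g_min, g_prev + dg))

# Normalized conductances (assumption): G_MIN = 0, G_MAX = 1.
G_MIN, G_MAX = 0.0, 1.0
rng = random.Random(0)
dg_ideal, gamma = 0.01, 5.0
samples = [noisy_update(dg_ideal, G_MIN, G_MAX, gamma, rng) for _ in range(20000)]
mean = sum(samples) / len(samples)
std = math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))
print(mean, std)

g_updated = apply_update(0.5, samples[0], G_MIN, G_MAX)
```

With γ = 0 the sampled update equals the ideal one, which gives a simple sanity check when switching the noise model on and off.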
(1)
D. Speedup Techniques
As mentioned earlier, the generation of the non-ideal
conductance matrices is very slow and, while acceptable for inference
(where it is a one-time cost), does not scale to DNN training (where
it needs to be invoked after each minibatch, when weights
change). Therefore, we present two complementary speedup
techniques that significantly accelerate training simulation
while preserving good modeling fidelity.
Approximate analytical model (AAM). The current at the
output column in a non-ideal crossbar can be viewed as a sum
of many terms, each corresponding to a path through the cross-
bar. In the AAM model, we consider a subset of these paths
(typically the shorter paths from each row to each column),
while ignoring the longer paths (as shown in Figure 5(a)).
Fig. 5: (a) Approximate Analytical Model overview, (b)
Accuracy comparison for various crossbar array models
The current for each path is computed considering the source
(rsource), sink (rsense), and wire resistances (rrow and rcol).
The AAM model allows us to seamlessly trade-off efficiency
for accuracy by simply considering more or fewer paths.
We plot the modeling error of AAM with respect to FCM
for 64x64 crossbars with different Rmin-Rmax ranges in
Figure 6. As shown, AAM is not suitable for case (c), with a
low Rmin-Rmax range, as it results in considerable errors. However,
for a higher Rmin-Rmax range [case (a)], modeling errors are
negligible, and in case (b), modeling errors are quite small.
Therefore, AAM is used selectively, only when the synaptic
device resistance range is much higher than the wire resistances.
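The path-based view can be illustrated with the simplest possible case, in which only the single direct path per device is kept. The sketch below is an assumption-laden illustration, not the AAM used in TxSim: which paths TxSim retains and how wire segments are counted are not specified in the text, inputs are assumed to be driven from the left and sensed at the bottom, and the source and sense resistances of 100 Ω are hypothetical. The row and column wire resistances are the 1 Ω and 4.6 Ω reported in Section V.

```python
def direct_path_conductance(r_device, i, j, n_rows, r_source, r_sense, r_row, r_col):
    """Effective conductance of the direct path: source -> row wire up to column j
    -> device (i, j) -> column wire down to the sense node."""
    wire = (j + 1) * r_row + (n_rows - i) * r_col
    return 1.0 / (r_source + wire + r_device + r_sense)

def path_model_currents(voltages, resistances, r_source, r_sense, r_row, r_col):
    """Column currents when only the direct path of each device is considered."""
    n_rows, n_cols = len(resistances), len(resistances[0])
    return [sum(voltages[i] * direct_path_conductance(
                    resistances[i][j], i, j, n_rows, r_source, r_sense, r_row, r_col)
                for i in range(n_rows))
            for j in range(n_cols)]

def worst_case_deviation(r_device, n=64, r_source=100.0, r_sense=100.0,
                         r_row=1.0, r_col=4.6):
    """Relative loss in effective conductance for the device farthest from
    both the driver and the sense node in an n x n crossbar."""
    g_eff = direct_path_conductance(r_device, 0, n - 1, n, r_source, r_sense, r_row, r_col)
    return 1.0 - g_eff * r_device

dev_high = worst_case_deviation(100e3)   # device resistance far above the parasitics
dev_low = worst_case_deviation(1e3)      # device resistance close to the parasitics
print(dev_high, dev_low)
```

Under these assumptions the parasitic series resistance changes the effective conductance by well under 1% for a 100 kΩ device but by more than 30% for a 1 kΩ device, which shows how strongly the influence of wire parasitics depends on the device resistance range.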
Fig. 6: Error-map of Approximate Analytical model w.r.t
FCM for different resistance ranges
Interpolated-FCM. In this speedup technique, we perform
FCM selectively: only once every L minibatch iterations
(as opposed to each iteration). Every time FCM is performed,
the net synaptic conductance distortion due to non-idealities,
(Gideal - Gnon-ideal)/Gideal, is profiled and stored.
For the subsequent L-1 iterations, Gnon-ideal is computed
using the stored distortion profile. Figure 5(b) shows the
application-level accuracy for various models (FCM, AAM,
and Interpolated-FCM) for the LeNet-5 DNN on the MNIST
dataset. As shown, the speedup techniques can effectively
model DNN training without much loss in modeling fidelity
(note the highly magnified y-axis range).
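The caching scheme can be sketched as follows. The class and function names are hypothetical, and `toy_full_model` is only a placeholder standing in for the expensive FCM call (here it uniformly scales every conductance, which a real interconnect model would not do).

```python
def distortion_profile(g_ideal, g_non_ideal):
    """Element-wise (G_ideal - G_non_ideal) / G_ideal; conductances must be non-zero."""
    return [[(gi - gn) / gi for gi, gn in zip(ri, rn)]
            for ri, rn in zip(g_ideal, g_non_ideal)]

def apply_profile(g_ideal, profile):
    """Reuse a stored profile: G_non_ideal ~ G_ideal * (1 - distortion)."""
    return [[gi * (1.0 - d) for gi, d in zip(ri, rd)]
            for ri, rd in zip(g_ideal, profile)]

class InterpolatedModel:
    def __init__(self, full_model, period):
        self.full_model = full_model   # callable g_ideal -> g_non_ideal (stands in for FCM)
        self.period = period           # L in the text
        self.profile = None
        self.iteration = 0
        self.full_calls = 0

    def __call__(self, g_ideal):
        if self.iteration % self.period == 0:
            # Expensive path: run the full model and refresh the stored profile.
            g_non_ideal = self.full_model(g_ideal)
            self.profile = distortion_profile(g_ideal, g_non_ideal)
            self.full_calls += 1
        else:
            # Cheap path: apply the profile stored at the last full evaluation.
            g_non_ideal = apply_profile(g_ideal, self.profile)
        self.iteration += 1
        return g_non_ideal

def toy_full_model(g_ideal):
    """Placeholder for FCM: a uniform 5% conductance loss (illustrative only)."""
    return [[0.95 * g for g in row] for row in g_ideal]

model = InterpolatedModel(toy_full_model, period=5)
g = [[1e-6, 5e-6], [1e-5, 2e-6]]
for step in range(10):
    g_non_ideal = model(g)
    g = [[x * 1.01 for x in row] for row in g]   # mimic a small weight update
print(model.full_calls)
```

The cheap path costs one element-wise multiply per iteration, so the full model's cost is amortized over L iterations; the approximation is good as long as the distortion changes slowly between refreshes.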
V. EXPERIMENTAL METHODOLOGY
In this section, we briefly describe the methodology used to
evaluate TxSim. The synaptic device used is an Ag/Si ReRAM
technology [23] with Rmin = 100 kΩ, Rmax = 1 MΩ, and a read
voltage of 0.5 V. The DAC and ADC models are calibrated
with SPICE based on designs obtained from [24] and [25].
The row and column resistances are derived from circuit layout
and found to be 1 Ω and 4.6 Ω, respectively. We conservatively
assume 32-bit precision for all data structures, viz., weights,
activations and errors, in DNN training based on the scheme
proposed in [26], since it provides classification accuracy close
to floating-point training [27]. Our simulations can be realized
using 64x64 crossbar arrays with 2-bit synaptic devices and
input streaming through a 1-bit DAC.
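One common way to realize high-precision operands with low-precision devices and a 1-bit DAC is to slice weights across several devices and stream inputs one bit per cycle, combining partial sums with shift-and-add. The sketch below illustrates that idea for unsigned integers; it is an assumption about the mapping, which the text does not detail, and the function names are hypothetical. Smaller bit widths are used in the example to keep it short.

```python
def slice_weight(w, weight_bits, device_bits):
    """Split an unsigned integer weight into device_bits-wide slices, least significant first."""
    mask = (1 << device_bits) - 1
    return [(w >> s) & mask for s in range(0, weight_bits, device_bits)]

def bit_streamed_dot(inputs, weights, input_bits, weight_bits, device_bits):
    """Dot product computed one input bit and one weight slice at a time,
    with the partial sums combined by shift-and-add."""
    sliced = [slice_weight(w, weight_bits, device_bits) for w in weights]
    n_slices = len(sliced[0])
    total = 0
    for b in range(input_bits):                  # 1-bit DAC: one input bit per cycle
        bits = [(x >> b) & 1 for x in inputs]
        for s in range(n_slices):                # each slice sits in its own device column
            partial = sum(bit * sl[s] for bit, sl in zip(bits, sliced))
            total += partial << (b + s * device_bits)
    return total

inputs = [5, 200, 77]            # 8-bit unsigned inputs
weights = [1234, 65535, 42]      # 16-bit unsigned weights
result = bit_streamed_dot(inputs, weights, input_bits=8, weight_bits=16, device_bits=2)
expected = sum(x * w for x, w in zip(inputs, weights))
print(result, expected)

# A 32-bit weight needs 16 two-bit devices under this mapping.
print(len(slice_weight(0, 32, 2)))
```

Each `partial` corresponds to one crossbar column read with binary inputs, so the column current only has to distinguish a small number of levels at a time.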
VI. RESULTS
In this section, we present results from applying TxSim to
evaluate the impact of non-idealities on the accuracy of
DNNs trained using crossbar-based systems. We also analyze
DNN training sensitivity to various device and circuit-level
parameters to provide insights for future research.
A. Simulation speed
To quantify the advantage of TxSim in simulation speed, we first
compare it to prior frameworks that model training, viz.,
NeuroSim [14] and CrossSim [12]. We achieve 108x and
2000x simulation speedup compared to CrossSim [12] and
NeuroSim [14], respectively. Next, we compare the simulation
speed of TxSim to native fixed-point (FxP) training on an
NVIDIA GeForce GTX 1080 Ti GPU. For these experiments, we use
a batch size of 128 and a crossbar size of 64x64. From Figure 7,
we can observe that software simulation of DNN training on
crossbar-based systems is only 14x slower compared to FxP
training on GPU. This is reasonable considering the fact that
TxSim emulates DNN training on crossbar-based systems with
high modeling fidelity by considering all crossbar non-idealities
during the forward, backward, and update operations.
Fig. 7: Slowdown w.r.t software training per epoch
B. Application-level accuracy
To evaluate the impact of crossbar non-idealities on DNN
training, we trained benchmark image classification networks
on an ideal crossbar system (Cross-Ideal) without any
non-idealities and a non-ideal crossbar system (Cross-NI)
with all crossbar non-idealities. The test accuracy vs. training
epochs for smaller networks such as AlexNet and LeNet-5 is
reported in Figure 9. As mentioned earlier, we used a 64x64
crossbar with Rmin = 100 kΩ, Rmax = 1 MΩ, non-linearity factor
(v) = 0.01, and stochastic noise factor (γ) = 5. To verify the
modularity of TxSim, we also evaluated a 16-bit Cross-NI
system on MNIST. Next, the application-level
accuracy for larger models such as ResNet-56 on CIFAR100
is shown in Figure 8. The accuracy degradation due to crossbar
non-idealities (Cross-NI) is observed to be 3%-36.4% across
the benchmarks. Moreover, the impact of non-idealities is
found to be more prominent on the more complex ResNet-56
and VGG-16 than on the simple LeNet-5/AlexNet DNNs.
We also observe the accuracy degradation in ResNet-20 to
be higher than in ResNet-56 and VGG-16. This is because the
non-idealities are learnt better in large-scale DNNs as the
number of layers increases. Clearly, there is a need to bridge
the accuracy gap due to crossbar non-idealities to enable
adoption of crossbar-based systems for training DNNs. To
guide potential solutions to this challenge, we next perform
sensitivity analysis to provide insights into the impact of
device and circuit-level parameters on accuracy degradation.
Fig. 8: Application level accuracy for large scale networks
C. Sensitivity Analysis
Fig. 10: Sensitivity to update non-idealities
Sensitivity to update non-idealities. Figure 10 shows the
effect of update non-idealities, viz., write non-linearity and
stochastic noise, on the application-level accuracy. The crossbar
dimensions, on-off ratio and other hardware parameters are
kept constant. As the non-linearity factor (v) increases from 0.01
to 0.1, there is almost no drop in accuracy. However, when v is
increased to 0.5 and subsequently to 1, the effect of write
non-linearity is very prominent, resulting in large accuracy
degradation. Next, the stochastic noise factor (γ), which determines
the standard deviation of the Gaussian distribution from which
the write noise is sampled, is varied between 1 and 10. For γ=1
and γ=5, the drop in accuracy is almost negligible. However,
γ=10 leads to a significant (10%) drop in accuracy. From these
experiments, we conclude that the non-linearity factor should
be maintained between 0.1 and 1 and the stochastic noise factor
between 1 and 5 for DNN training on resistive crossbars.
Sensitivity to crossbar dimensions. Figure 11 shows the
application-level accuracy with increase in the crossbar
dimensions. The accuracy drops slightly for larger crossbars due to
the increasing impact of all non-idealities, including DACs, ADCs,
wire resistances and sneak paths. In our experiments, the
resistance range of the synaptic device (Rmin to Rmax) is orders
of magnitude higher than the wire resistances, highlighting an
important observation that the non-idealities due to wire
parasitics are less prominent for devices such as ReRAM and PCM.
However, the effect will be very prominent when the resistance
range of the synaptic device is closer to the wire resistances,
e.g., for spintronic devices.
Fig. 11: Sensitivity to crossbar dimensions
Sensitivity to on-off ratio. To determine the effect of on-off
ratio (Rmax/Rmin) on the application-level accuracy, we fix
the inputs to the crossbar and change Rmin and Rmax. We
performed two sets of experiments – (i) decrease Rmin and
fix Rmax and (ii) increase Rmax and fix Rmin. From both
the experiments, as indicated in Figure 12, we observe that
decreasing the on-off ratio can degrade accuracy significantly.
However, increasing the Rmax values to attain high on-off
ratio can make the sensing current low and hence can lead to
a decrease in the accuracy due to sensing errors. On the other
hand, decreasing Rmin can have an even greater effect because
of circuit non-idealities. Circuit non-idealities have a greater
impact when the synaptic device resistance range is close to
wire parasitics. For this particular configuration of the device,
when Rmax is 1MΩ, Rmin can be decreased to maintain a
ratio of 8 for the best classification accuracy. Similarly, for
the second experiment, the best accuracy is obtained when
the on-off ratio is 5.
Fig. 12: Sensitivity to Rmax/Rmin ratio
VII. CONCLUSION
Crossbar-based systems are extremely promising for efficiently
executing DNN training. In this work, we propose
TxSim, a scalable and customizable modeling tool that
evaluates DNN training on resistive crossbars considering the
impact of all computational non-idealities. TxSim models a
more comprehensive set of non-idealities than prior works.
It seamlessly utilizes well-optimized cuBLAS routines to
model non-idealities during all DNN training operations, and
achieves 108x-2000x speedup over prior frameworks. To further
improve the simulation runtime for complex datasets and
network architectures, we also propose speedup techniques,
viz., approximate analytical model (AAM) and interpolated
FCM, that show a good balance between modeling fidelity and
simulation runtime. Using TxSim, we evaluate several DNN
benchmarks and observe that the accuracy degradation can be
considerable (3%-36.4%). We also perform sensitivity analysis
to gain further insights into the impact of various circuit and
device level parameters on DNN training.
VIII. ACKNOWLEDGMENT
This work was supported by C-BRIC, one of six centers
in JUMP, a Semiconductor Research Corporation (SRC) pro-
gram, sponsored by DARPA.
Fig. 9: Test accuracy curve showing impact of crossbar non-idealities