RESPARC: A Reconfigurable and Energy-Efficient Architecture with
  Memristive Crossbars for Deep Spiking Neural Networks by Ankit, Aayush et al.
Aayush Ankit, Abhronil Sengupta, Priyadarshini Panda, Kaushik Roy
School of Electrical and Computer Engineering, Purdue University
{aankit, asengup, pandap, kaushik}@purdue.edu
arXiv:1702.06064v1 [cs.ET] 20 Feb 2017
ABSTRACT
Neuromorphic computing using post-CMOS technologies is
gaining immense popularity due to its promising abilities to
address the memory and power bottlenecks in von-Neumann
computing systems. In this paper, we propose RESPARC - a
reconfigurable and energy-efficient architecture built on
Memristive Crossbar Arrays (MCAs) for deep Spiking Neural
Networks (SNNs). Prior works were primarily focused on device
and circuit implementations of SNNs on crossbars. RESPARC
advances this by proposing a complete system for SNN accel-
eration and its subsequent analysis. RESPARC utilizes the
energy-efficiency of MCAs for inner-product computation and
realizes a hierarchical reconfigurable design to incorporate the
data-flow patterns in an SNN in a scalable fashion. We eval-
uate the proposed architecture on different SNNs ranging in
complexity from 2k–230k neurons and 1.2M–5.5M synapses.
Simulation results on these networks show that, compared to
the baseline digital CMOS architecture, RESPARC achieves
500× (15×) better energy efficiency at 300× (60×) higher
throughput for multi-layer perceptrons (deep convolutional
networks). Furthermore, RESPARC is a technology-aware
architecture that maps a given SNN topology to the most
optimized MCA size for the given crossbar technology.
CCS Concepts
•Hardware → Emerging architectures;
Keywords
Reconfigurability, Energy-Efficiency, Spiking Neural Network,
Memristive Crossbars
1. INTRODUCTION AND RELATED WORK
Deep Learning Networks (DLNs), inspired by the hierarchical
organization of neurons and synapses in the human brain,
are an important class of machine learning algorithms and
have redefined the state of the art for many cognitive
applications [1]. However, DLNs involve data-intensive computations
that lead to high power and memory bandwidth requirements
on von-Neumann machines. As a result, the power budget
they require is multiple orders of magnitude greater than that of
the human brain. For instance, AlexNet [1] that won the Im-
ageNet challenge in 2012 consisted of 650k neurons and 60M
synapses, and requires 2-4 GOPS of compute power per
classification. Such power and memory bottlenecks have inspired
research in neuromorphic computing to build efficient
architectures for accelerating neural networks by overcoming the
von-Neumann bottlenecks. To this effect, several works have
shown DLN implementations using graphic processing units,
multi-core processors and hardware accelerators [2, 3].
While DLNs are being successfully used in many recog-
nition applications, there is a growing shift in the research
community towards a more biologically plausible and energy-
efficient computing paradigm, Spiking Neural Networks (SNN)
[4]. Driven by brain-like spike-based computations, SNNs
involve event-driven data processing, making them the emerging
choice for energy-efficient recognition applications.
Additionally, recent research has shown deep SNNs to exhibit high
accuracy on various complex recognition tasks [4]. However,
CMOS implementations of neuromorphic systems to acceler-
ate SNNs suffer from power and area inefficiencies that stem
from the realization of neuron and synapse functionality using
primitives namely instructions and Boolean logic, resulting in
dozens of transistors to mimic a single neuron/synapse [5].
The limitations of CMOS can be addressed with emerging
technologies, such as memristive devices that realize synap-
tic functionalities with very high efficacy [6]. Crossbars made
up of these devices at the cross-points have been studied for
energy-efficient inner-product engines [7,8]. This has furthered
the efforts to realize in-memory processing based architectures
using Memristive Crossbar Arrays (MCA) for neuromorphic
applications. However, the feasible MCA size is a strong function
of the device technology, for example, Phase Change Memory (PCM) [9],
Ag-Si [6] or spintronic devices [10]. Large crossbars allow more
flexibility in directly mapping an SNN onto them. This can also
reduce peripheral overheads and thereby improve overall en-
ergy consumption. However, large crossbars are infeasible as
they suffer from non-idealities like sneak-paths, process vari-
ations and parasitic voltage drops [11, 12] which lead to erro-
neous computations. This necessitates the design of reconfig-
urable platforms for SNNs that can utilize the MCA energy
benefits as well as address the limitations posed by MCA size.
In this work, we introduce RESPARC - a novel reconfig-
urable neuromorphic architecture built on MCAs for efficient
implementation of SNN applications. The post-CMOS tech-
nology based MCAs provide efficient realization of synapses
[6]. Additionally, crossbars store the network weights thereby
enabling “in-memory processing”. This circumvents the prob-
lems associated with frequent and large volumes of data trans-
fer between CPU and memory for implementing DLNs on
conventional computing systems [13]. We translate the event-
driven nature of SNN computations to architectural techniques
(discussed in section 3.2) in order to achieve higher energy-
efficiency. Hence, RESPARC aims at synergistically combining
the benefits of SNNs and the design space of emerging
technology using MCAs. Organizationally, RESPARC is a
three-tiered reconfigurable platform designed to incorporate
the data-flow patterns in any neural network in a scalable
fashion. Each tier is targeted to bring in a specific variety of
reconfigurability with respect to the SNN morphology. The
three tiers are:
1. Macro Processing Engine - reconfigurable com-
pute unit to map neurons with variable fan-in.
2. NeuroCell - reconfigurable datapath to map SNNs
with varying inter and intra layer connectivities namely
Multi-Layer Perceptrons (MLPs) and Convolutional Neu-
ral Networks (CNNs).
3. RESPARC - reconfigurable core to map SNNs with
varying size (number of layers).
RESPARC is a spatially scalable architecture. It distributes
the synapses across MCAs on different macro Processing Engines
(mPEs), with the mPEs spread across different NeuroCells, thereby
using more MCAs for mapping a larger spiking neural network.
Additionally, RESPARC’s reconfigurability enables the usage of
variable MCA sizes for mapping a given SNN topology. Hence, for any
given MCA technology, a size that is permissible under the
technology constraints for proper operation can be chosen, thereby
enabling “technology-aware” mapping of SNNs on RESPARC.
Prior work on MCAs for SNN implementations has primarily
focused on device and circuit optimizations and does not
involve architecture-level analysis [10]. The benefits at the
device level need to be preserved at the system level. Our work proposes
a full-fledged MCA based reconfigurable neuromorphic archi-
tecture that can implement a wide variety of SNNs with vary-
ing complexity and topology, as required by an application. It
also helps to perform system-level analysis of MCAs, as MCAs
are not a drop-in replacement for existing computation cores
in the CMOS SNN implementations.
Post-CMOS based architectures were also explored in [8,13,
14]. While these works propose architectures for artificial neu-
ral networks, RESPARC targets SNN and utilizes its event-
drivenness for added energy benefits. Additionally, RESPARC
is distinct micro-architecturally as it explores a spatially scal-
able design based on reconfigurable hierarchies. Moreover, our
design obviates the use of energy hungry analog-digital con-
versions unlike [13,14] thereby leading to energy reductions.
There has also been prior work on SNN acceleration using
CMOS technologies. For instance, Akopyan et al. [5] proposed
TrueNorth which uses low power 28 nm CMOS technology and
asynchronous circuit designs. While our work is complementary
to the efforts of [5], we explore post-CMOS technology
for SNN acceleration. Moreover, to analyze the benefits of
RESPARC with respect to CMOS accelerators, we implement
an optimized CMOS based baseline. Additionally, other tech-
niques such as asynchronous computation will complement the
SNN acceleration on RESPARC.
In summary, the key contributions of this work are:
1. An efficient memristive crossbar based architec-
ture for spiking neural networks is designed to har-
ness the energy-efficiency from in-memory processing and
event-driven computation.
2. Different spiking network topologies (MLP, CNN)
from different recognition applications namely digit
recognition, house number recognition and object clas-
sification are mapped onto RESPARC and analyzed for
performance and energy benefits with respect to their
digital CMOS implementations.
3. Different MCA sizes for different SNN topolo-
gies based on the limitations posed by the mem-
ristive technology are explored to determine the opti-
mum crossbar size for mapping a given network.
2. BACKGROUND
2.1 Spiking Neural Network
Figure 1: (a) A 2-layer MLP; (b) A convolution layer in CNN;
(c) A Neuron
SNNs are regarded as the third generation of neural networks.
SNNs require the input to be encoded as spike trains and
involve spike-based (0/1) information transfer between neurons.
At each time-step, spikes are propagated through the
layers of the network, and each neuron accumulates its input
spikes over time until it fires (spikes). The deep SNN
topologies used in this work are MLPs and CNNs. An MLP,
shown in Fig. 1(a), is a multi-layered SNN in which all neurons
in a layer are connected to all neurons in the previous layer.
A deep CNN, shown in Fig. 1(b), is also a multi-layered SNN
composed of alternating convolution and sub-sampling layers.
As shown in Fig. 1(c), a typical spiking neuron performs an
accumulation operation followed by a thresholding operation. The
spiking neuron model used in this work is the Integrate-and-Fire
(IF) model. Note that our work focuses on the
testing/computation of the SNN and assumes that the SNN mapped
onto RESPARC has been trained offline using supervised training
algorithms [4].
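The accumulate-then-threshold behaviour of the IF neuron can be sketched in a few lines of Python. This is only an illustrative model: the function name, the unit threshold and the reset-to-zero rule after a spike are assumptions, since the text does not specify how the membrane potential is reset.

```python
def integrate_and_fire(input_currents, threshold=1.0):
    """Return the output spike train (0/1 per time-step) of one IF neuron."""
    v_mem = 0.0          # membrane potential
    spikes = []
    for current in input_currents:
        v_mem += current             # accumulation
        if v_mem >= threshold:       # thresholding
            spikes.append(1)
            v_mem = 0.0              # assumed reset-to-zero after a spike
        else:
            spikes.append(0)
    return spikes


if __name__ == "__main__":
    print(integrate_and_fire([0.5, 0.25, 0.5, 0.25, 0.75]))
```

Any other spiking neuron model could be substituted by changing the update and reset rules inside the loop.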
2.2 Memristive Crossbars
Figure 2: (a) Typical Spiking Neural Network (SNN) (b) SNN
mapped to Memristive Crossbar Array (MCA). Figure annotations:
Vi = Xi, Ii = ΣXjWi,j, Gi,j = Wi,j.
Fig. 2(a) shows a 2-layer fully connected SNN. Fig. 2(b)
shows the connectivity structure/matrix (from Fig. 2(a)) map-
ped onto an MCA. The memristive devices at its cross-points
encode the synaptic weights of the SNN. An MCA receives
voltage inputs at its rows, and the resulting current output at
any column is the weighted sum of the input voltages, with the
conductances encoded in that column acting as the weights. This is
a direct consequence of Kirchhoff’s current law, as the current
flowing into a column from any cross-point is the product of the
conductance at that cross-point and the voltage across it. Thus,
an MCA is an analog “inner-product” computation unit. The
MCA outputs are interfaced with neurons. Each neuron receives
an input current that causes its membrane potential
to accumulate over time. When the membrane potential
reaches a threshold, the neuron spikes (“1”), thus mirroring the
function of an IF neuron.
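The inner-product behaviour of an ideal crossbar can be illustrated with a short Python sketch. It assumes an idealized array with no sneak paths or parasitic drops; in this sketch rows index the inputs and columns index the neurons. The conductance values and the read voltage are hypothetical numbers chosen only for illustration.

```python
def crossbar_currents(row_voltages, conductances):
    """Column currents of an ideal crossbar.

    conductances[i][j] is the conductance (in siemens) at row i, column j.
    Each cross-point contributes V_i * G[i][j], and the contributions
    of one column add up on the column wire.
    """
    n_cols = len(conductances[0])
    return [
        sum(v * row[j] for v, row in zip(row_voltages, conductances))
        for j in range(n_cols)
    ]


if __name__ == "__main__":
    # 3 inputs (rows) x 2 neurons (columns); illustrative values only.
    G = [[1e-5, 5e-5],
         [2e-5, 1e-5],
         [5e-5, 2e-5]]
    spikes = [1, 0, 1]      # binary input spikes
    v_read = 0.5            # hypothetical read voltage in volts
    voltages = [s * v_read for s in spikes]
    print(crossbar_currents(voltages, G))
```

Because the inputs are binary spikes, a row that does not spike contributes no current, which is the property the event-driven optimizations exploit.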
Figure 3: RESPARC as a pool of NeuroCells. Figure labels:
NeuroCell-1 to NeuroCell-n, each built from mPEs and programmable
switches; Global Control Unit with control registers; SRAM (input
memory); global IO_Bus (shared by all NeuroCells).
3. RESPARC
3.1 Reconfigurable Hierarchies
RESPARC (shown in Fig. 3) is the reconfigurable core and
the topmost level among the three reconfigurable hierarchies.
As shown in Fig. 3, RESPARC is composed of a pool of
NeuroCells, which are the second level in the reconfigurable hierarchy.
Fig. 4 shows a macro Processing Engine (denoted as mPE in
Fig. 3) which is the lowest level in the hierarchy. Next, we
will discuss the organization and the logical dataflow in each
hierarchy starting from the lowest level and moving towards
the higher levels.
3.1.1 mPE - Reconfigurable Compute Unit
A macro Processing Engine (mPE) is composed of multi-
ple MCAs tied together to the Local Control Unit. The mPE
shown in Fig. 4 includes four MCAs, each of which is associated
with its neurons and a set of buffers, namely (1) Input
Buffer (iBUFF), (2) Output Buffer (oBUFF) and (3) Target
Buffer (tBUFF). The iBUFF buffers the input spike packets
received until the data required by the MCA is available.
Similarly, the oBUFF buffers the output spike packets
computed by the neurons until the data to be sent to
a target neuron is available. The tBUFF stores the address of
the target neuron(s). Although we consider IF neurons in this
work, any spiking neuron can be interfaced with the MCA.
The MCAs contain the synapses corresponding to the neu-
rons being computed in an mPE. This is realized by mapping
the connectivity matrix on the MCAs as shown in Fig. 2.
However, for memristive technology, the MCA sizes that ensure
reliable operation, for instance 64 rows and 64 columns (64×64),
are much smaller than a typical neural network’s fan-in, which is
of the order of several hundreds [11].
This necessitates partitioning the connectivity matrix to map
it across multiple MCAs. Subsequently, the neuron output
is computed by time-multiplexing the MCA outputs onto the
neuron as shown in Fig. 5. An mPE can be configured to sup-
port time-multiplexed computation of multiple degrees to map
neurons with variable fan-in. In case a neuron’s fan-in exceeds
the fan-in support an mPE provides locally, the connectivity
matrix is mapped across multiple mPEs.
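The partitioning and time-multiplexed accumulation described above can be sketched as follows. The sketch is a functional model only, under the assumption that each row block of the connectivity matrix occupies one MCA and that one block is integrated per time-step; the function name and the example weights are hypothetical.

```python
def time_multiplexed_output(inputs, weights, mca_rows):
    """Accumulate neuron inputs block by block.

    weights[i][j] connects input i to neuron j. The fan-in dimension is
    split into blocks of at most mca_rows rows (one MCA per block), and
    the partial sums are integrated into the neurons one block per step.
    Returns (accumulated sums, number of time-steps used).
    """
    n_neurons = len(weights[0])
    accumulated = [0.0] * n_neurons
    steps = 0
    for start in range(0, len(weights), mca_rows):
        block = weights[start:start + mca_rows]
        x = inputs[start:start + mca_rows]
        for j in range(n_neurons):
            accumulated[j] += sum(xi * row[j] for xi, row in zip(x, block))
        steps += 1   # degree of time-multiplexing grows with fan-in
    return accumulated, steps


if __name__ == "__main__":
    # Fan-in of 4, two neurons, 2x2 MCAs: two time-steps are needed.
    W = [[1, 2], [3, 4], [5, 6], [7, 8]]
    print(time_multiplexed_output([1, 0, 1, 1], W, mca_rows=2))
```

With a fan-in of 4 and 2-row MCAs the loop runs twice, matching the degree-2 operation of Fig. 5.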
For sparser connectivity matrices, which are typical of CNNs,
different output neurons have different inputs along with some
input sharing. Hence, a column (a column maps to an output
neuron) in an MCA will consist of synapses at only certain
sparse locations (rows) that correspond to its inputs, leading
to an incompletely utilized MCA. Further, mapping the connectivity
matrix of a CNN directly to a large MCA results in
higher non-utilization due to the large number of unused
cross-points (synapses). However, distributing the connectivity
matrix across multiple smaller MCAs facilitates enhanced
input-sharing that improves MCA utilization. Consequently, this
reduces the number of mPEs required for the mapping. This
improves overall energy consumption by reducing the peripheral
energy per MCA. Hence, the mPE’s reconfigurability enables
optimized MCA utilization for sparse connectivity.
Figure 4: Macro Processing Engine - The mPE receives input
spikes from the IO Bus (IO_In) and the Switch Network (SW_In),
which are processed by the MCAs to produce output currents - C1,
C2, C3, C4. Additionally, external MCA currents (C_ext) can also be
received by an mPE. Finally, the MCA currents get integrated
into the neurons to produce output spikes that are then sent to
the target neurons through the Switch Network (SW_Out). The
Current Control Unit (CCU) controls the transfer of MCA currents
to and fro between two mPEs.
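The effect of MCA size on utilization for sparse connectivity can be illustrated with a toy calculation. The sketch below tiles a banded connectivity matrix (a hypothetical one-dimensional convolution with 16 inputs and a kernel of 3) directly onto N×N crossbars and reports the fraction of allocated cross-points that hold a synapse. It is a simplification: it does not model the input-sharing mapping used on RESPARC, only the trend that larger crossbars leave more cross-points unused.

```python
def utilization(conn, n):
    """Fraction of cross-points used when conn is tiled onto n x n MCAs.

    Only tiles that contain at least one synapse are allocated.
    """
    rows, cols = len(conn), len(conn[0])
    used = sum(sum(row) for row in conn)
    tiles = 0
    for r0 in range(0, rows, n):
        for c0 in range(0, cols, n):
            if any(conn[i][j]
                   for i in range(r0, min(r0 + n, rows))
                   for j in range(c0, min(c0 + n, cols))):
                tiles += 1
    return used / (tiles * n * n)


# Hypothetical 1-D convolution: output j reads inputs j, j+1, j+2.
inputs, kernel = 16, 3
outputs = inputs - kernel + 1
conn = [[1 if j <= i < j + kernel else 0 for j in range(outputs)]
        for i in range(inputs)]

if __name__ == "__main__":
    for n in (4, 8, 16):
        print(n, round(utilization(conn, n), 3))
```

In this toy case the 4×4 tiling uses a larger share of its cross-points than a single 16×16 crossbar, in line with the qualitative argument above.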
3.1.2 NeuroCell - Reconfigurable Datapath
As shown in Fig. 3, a NeuroCell is composed of multiple
mPEs and programmable switches. The switch network en-
ables spike-packet transfers within the NeuroCell. A switch
connects to its four neighboring mPEs. Additionally, each
switch has a dedicated connection to the switches in the same
row and same column. This enables low-latency (one-hop)
spike-packet transfers between the connected mPEs. Essen-
tially, a NeuroCell is a pool of mPEs coupled with dense local
connections that enables high throughput digital data transfer
within it.
Each switch can be configured to serve one or multiple mPEs
it connects to, thereby realizing a reconfigurable datapath
within the NeuroCell. This allows the datapath to be optimized
for the given SNN’s connectivity. Consequently, this
reduces the load on each switch and simplifies the overall traffic
management within the NeuroCell. Fig. 6 shows the
programmable switch design. Each input and output line is
associated with data and address buffers to synchronize data
transfer between the receiver and target mPE. Further,
depending on the switch configuration, it arbitrates between the
sender mPEs.
Figure 5: (a) A feed-forward neural network with neuron fan-in
of 4 (b) Mapping the 4 fan-in neurons using 2×2 MCAs. Figure
annotations: 4x2 feed-forward network (i ∈ {1,2}) mapped onto two
2x2 MCAs; time-multiplexed operation of degree-2 (output
computation requires 2 time-steps); O1 gets integrated into N1 at
time t1 and O3 gets integrated into N1 at time t2.
Figure 6: Programmable Switch. Figure labels: Decoder &
Arbitration Logic; data and address buffers on Input Line-1 to
Input Line-n and Output Line-1 to Output Line-n (each input/output
line can be connected to either an mPE or a switch). Input Address
(iAddress) format: SW_ID, mPE_ID, MCA_ID. Output Address (oAddress)
format: mPE_ID, MCA_ID (if the switch sends data to another switch)
or MCA_ID (if the switch sends data to an mPE).
As mentioned before, a connectivity matrix can span
multiple mPEs. To compute the neuron output, the MCA
current(s) from one mPE are transmitted to another mPE
(containing the neuron), followed by their time-multiplexed
integration. Such analog signal transmission is facilitated by gated
wires connecting the neighboring mPEs (dashed lines in Fig.
3).
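The address fields named in Fig. 6 (SW_ID, mPE_ID, MCA_ID) can be illustrated with a small packing sketch. The field widths below are assumptions chosen for illustration (4 bits for up to 9 switches, 2 bits each for four mPEs per switch and four MCAs per mPE); the paper does not state the widths or the bit ordering, and the function names are hypothetical.

```python
# Hypothetical field widths; not specified in the paper.
SW_BITS, MPE_BITS, MCA_BITS = 4, 2, 2


def pack_iaddress(sw_id, mpe_id, mca_id):
    """Pack an input address as SW_ID | mPE_ID | MCA_ID."""
    return (sw_id << (MPE_BITS + MCA_BITS)) | (mpe_id << MCA_BITS) | mca_id


def unpack_iaddress(addr):
    """Split an input address back into (SW_ID, mPE_ID, MCA_ID)."""
    mca_id = addr & ((1 << MCA_BITS) - 1)
    mpe_id = (addr >> MCA_BITS) & ((1 << MPE_BITS) - 1)
    sw_id = addr >> (MPE_BITS + MCA_BITS)
    return sw_id, mpe_id, mca_id


def strip_switch_id(addr):
    """Drop SW_ID, leaving the mPE_ID | MCA_ID fields."""
    return addr & ((1 << (MPE_BITS + MCA_BITS)) - 1)


if __name__ == "__main__":
    addr = pack_iaddress(sw_id=5, mpe_id=2, mca_id=3)
    print(addr, unpack_iaddress(addr), strip_switch_id(addr))
```

A wider NeuroCell or more MCAs per mPE would only change the three width constants.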
3.1.3 RESPARC - Reconfigurable Core
RESPARC is the scalable extension of a NeuroCell (NC)
and enables mapping of an SNN (that exceeds an NC’s size)
across multiple NCs. The NCs share a global “IO BUS” that
connects to an SRAM (Input Memory). Thus, data transfers
between different NCs go through the SRAM. Each NC in the
NC-array is associated with a “tag (x, y)” which facilitates
input broadcast from the SRAM to a variable number of NCs
(that map to a given layer) within a single cycle. To monitor
the completion of an NC’s computation, the global control
unit contains an event-flag dedicated to every NC, which
is set when the NC completes.
Figure 7: (a) Logical Dataflow in NeuroCell - High Throughput
Data Transfer across switch network (multiple mPEs send
data to multiple mPEs in parallel) (b) Logical Dataflow in
RESPARC - Serial Data Transfer across shared bus. Figure labels:
(a) mPE1,1 to mPE3,3 with labels Layer1, Layer2 and Layer3;
(b) NC1,1 to NC3,3 on the Global IO_BUS (shared by all NeuroCells)
with labels Layer (1+2+3) and Layer4.
Fig. 7 illustrates the logical dataflow involved across hier-
archies for SNN computation. Within an NC, parallel data
transfer occurs between layers of the SNN through the switch
network. Data transfer occurs serially through the shared bus
between layers mapped across multiple NCs to compute the
final output.
3.2 Energy Efficiency
We leverage the energy-efficiency of MCA (weight storage
and inner-product computation) for energy savings. Addition-
ally, as mentioned in section 3.1, the reconfigurability in mPE
enables optimized mapping that reduces the peripheral en-
ergy per MCA thereby resulting in overall energy reductions.
Within a NeuroCell, event-drivenness in SNN computations
is utilized by adding “zero-check logic” in each programmable
switch to prevent data transfers resulting from insignificant
spike-packets (for instance, all bits in the spike packet be-
ing zero). Additionally, at the topmost level in the hierarchy,
RESPARC exploits SNN data statistics (event-drivenness) to
prevent unnecessary broadcasts to NeuroCells by checking the
data read from SRAM with a “zero-check logic”. Thus, the
reconfigurability and event-driven computation further com-
plement the benefits observed with MCAs for energy-efficient
SNN acceleration on RESPARC.
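The zero-check idea can be sketched functionally in Python. The sketch treats each spike packet as an integer whose bits are the spikes and simply skips packets that are all zero; the function name and the example packets are hypothetical, and the hardware realization of the check is not modelled.

```python
def forward_nonzero(packets):
    """Return only the spike packets that carry at least one spike.

    Each packet is an integer bit-vector; a value of 0 means that no
    neuron in the packet spiked, so the transfer can be skipped.
    """
    return [p for p in packets if p != 0]


if __name__ == "__main__":
    packets = [0x00000000, 0x00010000, 0x00000000, 0x80000001]
    sent = forward_nonzero(packets)
    skipped = len(packets) - len(sent)
    print(f"sent {len(sent)} packets, skipped {skipped}")
```

The same check applies at both levels described above: in a programmable switch before a transfer, and on the data read from the SRAM before a broadcast.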
4. EXPERIMENTAL METHODOLOGY
4.1 CMOS Baseline
We implemented the dataflow proposed in [15] for our CMOS
baseline and aggressively optimized it for SNNs. We aug-
mented the implementation with event-driven optimizations
to prevent unnecessary memory fetches and computations.
Additionally, we added buffers to exploit the temporal and
spatial data reuse patterns, minimizing the memory fetches
and thereby optimizing the overall energy consumption. Note
that our CMOS baseline allows us to decouple the circuit and
network-on-chip driven optimizations in other CMOS based
SNN accelerators in order to rigorously analyze the MCA-centric
memory and computation benefits in RESPARC.
4.2 Architecture Level Simulation Setup
RESPARC is composed of different technologies namely cross-
bar technology, technology of the interfaced neurons and the
CMOS peripherals. For the memristive devices, we used a re-
sistance range of “20kΩ – 200kΩ” with 16 levels (4 bits) for
weight-discretization, that is typical of memristive technolo-
gies such as PCM, Ag-Si [16]. We considered an operating
voltage of “Vdd/2” for the MCA as it is interfaced with CMOS
neurons [17]. The peripheral circuit consisting of buffers, com-
munication and control logic was implemented at the Regis-
ter Transfer Level in Verilog HDL and mapped to IBM 45nm
technology using Synopsys Design Compiler. Synopsys Power
Compiler was used to estimate the energy consumption. The
input memory (SRAM) was modelled using CACTI [18]. Fig.
8 lists the simulation parameters and the implementation met-
rics for one NeuroCell. Please note that the same methodol-
ogy was also used to estimate the energy consumption of our
CMOS baseline. Fig. 9 shows the simulation parameters and
implementation metrics for the baseline.
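The weight-discretization setting above (16 levels over a 20kΩ – 200kΩ resistance range) can be illustrated with a small mapping sketch. Two assumptions are made that the text does not state: the levels are spaced uniformly in conductance rather than in resistance, and weights are normalized to [0, 1]. The function name is hypothetical.

```python
R_MIN, R_MAX = 20e3, 200e3      # resistance range in ohms (from the text)
LEVELS = 16                     # 4-bit weight discretization (from the text)
G_MIN, G_MAX = 1.0 / R_MAX, 1.0 / R_MIN


def weight_to_conductance(w):
    """Map a normalized weight in [0, 1] to one of LEVELS conductances.

    Assumes levels that are uniformly spaced in conductance.
    """
    w = min(max(w, 0.0), 1.0)                 # clamp to the valid range
    level = round(w * (LEVELS - 1))           # nearest discrete level
    return G_MIN + level * (G_MAX - G_MIN) / (LEVELS - 1)


if __name__ == "__main__":
    for w in (0.0, 0.5, 1.0):
        g = weight_to_conductance(w)
        print(f"w={w:.2f} -> G={g:.3e} S, R={1.0 / g:.0f} ohm")
```

Signed weights would need an additional scheme, such as a pair of devices per synapse, which is outside the scope of this sketch.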
Our benchmark comprises six SNN designs from different
recognition applications, namely House Number Recognition
(SVHN dataset [19]), Digit Recognition (MNIST dataset [20])
and Object Classification (CIFAR-10 dataset [21]). We use
one MLP and one CNN from each application. The SNNs were
trained using the supervised learning algorithm proposed in [4].
Fig. 10 shows the benchmark details. As mentioned before, we
do not consider the training phase of the SNN and hence do
not consider the energy expended in programming the MCAs.
Also, in the typical use case of recognition applications, the
training process is performed once or very infrequently. On the
other hand, the testing or evaluation phase, in which the
actual classification is performed using SNNs, extends for much
longer periods of time. Hence, we evaluate RESPARC for the
more critical testing phase.

Micro-architectural Parameters   Value
Architecture                     64 bit
NC Dimension                     4×4
No. of mPE (Switches)            16 (9)
No. of MCAs per mPE              4
Metrics                          Value
Feature Size                     45nm
Area                             0.29 mm2
Power                            53.2 mW
Gate Count                       67643
Frequency                        200 MHz
Figure 8: RESPARC parameters and metrics

Micro-architectural Parameters   Value
NU count                         16
FIFO(s): Input (Weight)          16 (1)
FIFO depth                       32
Width: FIFO (NU)                 4 (4)
Metrics                          Value
Feature Size                     45nm
Area                             0.19 mm2
Power                            35.1 mW
Gate Count                       44798
Frequency                        1GHz
Figure 9: CMOS baseline parameters and metrics

Application               Dataset    Connectivity  Layers  Neurons  Synapses
House Number Recognition  SVHN       MLP           4       2778     2778000
House Number Recognition  SVHN       CNN           6       124570   2941952
Digit Recognition         MNIST      MLP           4       2378     1902400
Digit Recognition         MNIST      CNN           6       66778    1484288
Object Classification     CIFAR-10   MLP           5       3778     3778000
Object Classification     CIFAR-10   CNN           6       231066   5524480
Figure 10: SNN Benchmarks
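To give a feel for the scale of these benchmarks, the following sketch computes a lower bound on the number of MCAs, mPEs and NeuroCells needed for a given synapse count, using the parameters of Fig. 8 (4 MCAs per mPE, 16 mPEs per NeuroCell). It assumes every cross-point holds a synapse, which is optimistic, especially for CNNs; the actual mapping used in the evaluation is not reproduced here, and the function name is hypothetical.

```python
import math


def min_resources(synapses, mca_size=64, mcas_per_mpe=4, mpes_per_nc=16):
    """Lower bound on (MCAs, mPEs, NeuroCells) for a synapse count.

    Assumes fully utilized mca_size x mca_size crossbars.
    """
    mcas = math.ceil(synapses / (mca_size * mca_size))
    mpes = math.ceil(mcas / mcas_per_mpe)
    ncs = math.ceil(mpes / mpes_per_nc)
    return mcas, mpes, ncs


if __name__ == "__main__":
    # MNIST MLP from Fig. 10: 1,902,400 synapses.
    for n in (32, 64, 128):
        print(n, min_resources(1_902_400, mca_size=n))
```

Halving the crossbar dimension roughly quadruples the MCA count in this bound, which is why peripheral energy per MCA matters for small crossbars.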
5. EXPERIMENTAL RESULTS
In this section, we present the results of various experiments
that demonstrate the benefits of RESPARC and underscore
the effectiveness of the proposed architecture in exploring the
design space of post-CMOS based MCAs for SNN applica-
tions.
5.1 Comparison with CMOS baseline
Fig. 11 compares the energy savings and performance spee-
dups obtained per classification for RESPARC over the CMOS
baseline for various SNN applications with CNN and MLP
topologies. The energy consumptions are normalized to the
energy consumption of MNIST on RESPARC and the perfor-
mance speedups are normalized to CIFAR-10 on CMOS base-
line. The MCA size used is 64 i.e., 64 rows and 64 columns.
As shown in Figs. 11 (a) and (c), RESPARC provides significant
energy benefits of 10× – 15× (12× on average)
at a performance speedup of 33× – 95× (60× on average) for
the CNN benchmarks. For MLPs (shown in Figs. 11 (b) and
(d)), energy benefits on RESPARC increase to 331× – 659×
(513× on average) at a performance speedup of 360× – 415×
(382× on average). Hence, RESPARC efficiently accelerates
both CNN and MLP based SNN applications.
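The ranges and averages quoted above can be cross-checked against the improvement factors annotated on the bars of Fig. 11. The short script below only tallies those annotated values; the grouping into four lists follows the panel labels, and no assignment of individual values to datasets is assumed.

```python
from statistics import mean

# Improvement factors annotated in Fig. 11 (RESPARC over CMOS baseline).
annotated = {
    "CNN energy": [15, 10, 11],
    "CNN speedup": [95, 52, 33],
    "MLP energy": [331, 659, 549],
    "MLP speedup": [415, 371, 360],
}

if __name__ == "__main__":
    for name, values in annotated.items():
        print(f"{name}: {min(values)}x to {max(values)}x, "
              f"mean {mean(values):.0f}x")
```

The means of the annotated values agree with the averages stated in the text (12×, 60×, 513× and 382×).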
The lower efficiency (both energy and speedup) for CNNs
stems from the incomplete utilization of MCAs in RESPARC
as discussed in subsection 3.1.1. The incomplete utilization
leads to higher peripheral energy consumption per MCA, thereby
decreasing the overall energy improvement. Additionally,
incompletely utilized MCAs lead to a smaller gain in performance
speedup as fewer MCA outputs (columns) are utilized.
In contrast, MLPs have fully utilized MCAs that result
in higher throughput (number of outputs computed per unit
time) as all the columns of the MCA are being used for output
computation.
Figure 11: Energy and Performance Speedup comparison of
RESPARC vs CMOS baseline per classification. Panels (datasets
MNIST, SVHN, CIFAR-10; bars for CMOS and RESPARC): (a) CNN,
normalized energy (log2 scale), annotated 15×, 10×, 11×; (b) MLP,
normalized energy (log10 scale), annotated 331×, 659×, 549×;
(c) CNN, normalized speedup (log2 scale), annotated 95×, 52×, 33×;
(d) MLP, normalized speedup (log2 scale), annotated 415×, 371×, 360×.
Figure 12: RESPARC and CMOS baseline energy breakdowns
for different topologies. Panels (datasets MNIST, SVHN, CIFAR-10):
(a) MLP, normalized energy on RESPARC-N (N = 128 or 64 or 32), split
into Neuron, Crossbar and Peripherals (Buffer, Control,
Communication); (b) MLP, normalized energy on CMOS, split into Core
(Buffer, Compute, Control), Memory Access and Memory Leakage;
(c) CNN, same breakdown as (a); (d) CNN, same breakdown as (b).
5.2 Comparison with varying MCA sizes
The graphs in Fig. 12 show the breakdown of energy from
Fig. 11 into 3 key components for RESPARC: (i) Neuron
(ii) Crossbar (iii) Peripherals and 3 key components for the
CMOS baseline: (i) Core (ii) Memory Access (iii) Memory
Leakage. We present the energy distribution for MLP and
CNN benchmarks on different MCA (crossbar) sizes namely
(i) RESPARC-128 (ii) RESPARC-64 (iii) RESPARC-32. Fig.
12 (a) shows RESPARC energy consumption for MLPs. The
energy consumption decreases with increasing MCA size. This
is because, for larger MCAs, the synapses are
mapped across fewer mPEs, which decreases the
peripheral energy per MCA and reduces the overall energy
consumption. On the other hand, for CNNs (shown in Fig. 12(c)),
RESPARC-64 is the most energy-efficient. We observe a de-
crease in energy from RESPARC-32 to RESPARC-64 with
CNNs due to decrease in peripheral energy. However, in-
creasing MCA size from 64 to 128 increases the MCA non-
utilization (due to sparser connectivity in CNNs as discussed
in section 3.1.1) that dominates the overall energy consump-
tion. Hence, unlike MLPs, an increase in MCA size from 64
to 128 does not result in a corresponding decrease in the pe-
ripheral energy per MCA as the number of mPEs being used
does not decrease commensurately.
As shown in Fig. 12 (b), the energy consumption in MLPs
on the CMOS baseline is dominated by the memory component
(access and leakage). This implies that the energy savings
for MLPs on RESPARC result from efficient memory storage
(weight storage in MCAs). On the other hand, Fig. 12 (d)
shows that the computation core (which includes the buffers
and the computation units) dominates the energy consump-
tion in CNNs. This suggests that the energy efficiency for
CNNs on RESPARC results from the efficient inner-product
computation in the MCAs.
5.3 Effect of event-drivenness in SNNs
The graphs in Fig. 13 show the energy savings for MNIST
dataset on RESPARC due to SNN’s event-driven processing
nature. The energy benefits are highest on RESPARC with
the smallest MCA size. This is a consequence of the fact that
the probability of finding zeros with smaller run-lengths (zeros
with run length of 32 refers to a 32-bit spike-packet with all
bits being zero) is significantly higher than that with larger
run-lengths. We also obtained similar energy improvements
with event-driven optimizations on the other two datasets.
As discussed before, smaller MCAs are preferred because of
reliability but they suffer from increased peripheral energy
consumption. However, RESPARC with its event-drivenness
enables using MCAs of smaller sizes for efficient acceleration
of SNNs.
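The run-length argument above can be made concrete with a simple probability model. Assuming, purely for illustration, that each bit of a spike packet is a spike independently with probability p, the chance that an n-bit packet is entirely zero is (1 - p)^n, which falls quickly as n grows. The independence assumption and the value of p are not taken from the paper.

```python
def p_all_zero(spike_prob, packet_bits):
    """Probability that a packet of packet_bits bits contains no spike,
    assuming independent spikes with probability spike_prob per bit."""
    return (1.0 - spike_prob) ** packet_bits


if __name__ == "__main__":
    p = 0.02   # hypothetical per-bit spike probability
    for bits in (32, 64, 128):
        print(f"{bits}-bit packet: P(all zero) = {p_all_zero(p, bits):.3f}")
```

Under this model a 32-bit packet is far more likely to be skipped by the zero-check logic than a 128-bit one, consistent with the larger savings observed for the smallest MCA size.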
Figure 13: (a) Energy consumption analysis with event-drivenness
in MLPs (b) Energy consumption analysis with
event-drivenness in CNNs. Both panels plot normalized energy, split
into Neuron, Crossbar and Peripherals, for RESPARC-128, RESPARC-64
and RESPARC-32 without (w/o) and with (w/) event-drivenness.
The benefits observed with CNNs are smaller than those with MLPs.
This is because CNNs process two-dimensional
spatial windows of the input image that typically contain
foreground (white) pixels. In contrast, MLPs process
one-dimensional vectors that can easily find zero run-lengths for
background (black) pixels.
5.4 Effect of bit-discretization of MCA
Here, we analyze the effect of memristor bit-precision on the accuracy
and energy consumption of RESPARC and the CMOS baseline.
As illustrated in Fig. 14 (a), the classification accuracy in-
creases continuously with increasing weight precision (higher
bit-discretization). However, the accuracy with 4-bits is com-
parable to the accuracy with 8 bits. Hence, we used 4-bit
weight precision for our energy comparisons between RESPARC
and CMOS baseline. However, other complex applications
may necessitate the usage of higher bit-discretization for weight
storage.
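A minimal sketch of what bit-discretization does to a set of weights is given below. It assumes uniformly spaced levels between the smallest and largest weight, which is an illustrative choice and not necessarily the mapping used for the memristive devices in this work; the function names are hypothetical.

```python
def quantize(weights, bits):
    """Map each weight to the nearest of 2**bits uniformly spaced levels."""
    lo, hi = min(weights), max(weights)
    if hi == lo:
        return list(weights)
    step = (hi - lo) / (2 ** bits - 1)
    return [lo + round((w - lo) / step) * step for w in weights]

def max_abs_error(weights, bits):
    """Largest absolute difference between a weight and its quantized value."""
    return max(abs(w - q) for w, q in zip(weights, quantize(weights, bits)))

if __name__ == "__main__":
    weights = [i / 100 for i in range(-50, 51)]  # synthetic weights
    for bits in (1, 2, 4, 8):
        print(bits, max_abs_error(weights, bits))
```

The worst-case error is bounded by half a quantization step, so it falls quickly with the number of bits. How much of that error a given network tolerates is an empirical question, which is what the accuracy curves in Fig. 14 (a) report.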
[Figure 14 plots: (a) normalized accuracy vs. bit discretization (1, 2, 4, 8) for MNIST, CIFAR10, and SVHN; (b) normalized energy vs. bit discretization (1, 2, 4, 8) for CMOS and RESPARC.]
Figure 14: (a) Normalized Accuracy with respect to
bit-discretization in memristors (b) Normalized Energy with
respect to bit-discretization in memristors
A noteworthy observation here is that the energy consumption
in RESPARC (from Fig. 14 (b)) is fairly independent of the
weight precision. However, the area of the memristive device
increases with precision, which increases the MCA area and
results in an area overhead. We also observe from Fig. 14 (b)
that the energy consumption of the CMOS baseline increases with
increasing bit-discretization. This is because a higher
precision demands bigger memory, buffers, and compute units,
which increases both the core power (buffer and computation
units) and the memory power (access and leakage).
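The memory side of this trend can be made concrete with simple arithmetic. The sketch below computes raw weight storage for the synapse counts quoted in the abstract (1.2M to 5.5M), assuming "M" denotes 10^6 and ignoring any memory-organization overhead; the helper name is hypothetical.

```python
def weight_storage_bytes(num_synapses, bits_per_weight):
    """Raw weight storage in bytes, ignoring organization overhead."""
    return num_synapses * bits_per_weight / 8

if __name__ == "__main__":
    for synapses in (1_200_000, 5_500_000):
        for bits in (1, 2, 4, 8):
            print(synapses, bits, weight_storage_bytes(synapses, bits))
```

Storage grows linearly with weight precision, so moving from 4-bit to 8-bit weights doubles the memory that a digital baseline must hold and access.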
6. CONCLUSIONS
The intrinsic compatibility of post-CMOS technologies with
biological primitives provides new opportunities to develop
efficient neuromorphic systems. In this work, we proposed
RESPARC, a memristive crossbar-based architecture for
energy-efficient acceleration of deep Spiking Neural Networks (SNNs).
We developed a reconfigurable hierarchy that efficiently im-
plements SNNs of different connectivities given a memristive
crossbar size and technology. Additionally, RESPARC syner-
gically combines the energy benefits of post-CMOS technolo-
gies and the event-drivenness of bio-inspired SNNs to address
the power and memory bottlenecks in modern computing sys-
tems. Our results on a range of recognition applications sug-
gest that RESPARC is a promising architecture to implement
SNNs providing favorable tradeoffs between energy and cross-
bar size.
7. REFERENCES
[1] A. Krizhevsky et al. Imagenet classification with deep
convolutional neural networks. NIPS, 2012.
[2] S. Chetlur et al. cudnn: Efficient primitives for deep learning.
arXiv:1410.0759, 2014.
[3] T. Chen et al. Diannao: A small-footprint high-throughput
accelerator for ubiquitous machine-learning. ACM Sigplan
Notices, 2014.
[4] P. U. Diehl et al. Fast-classifying, high-accuracy spiking deep
networks through weight and threshold balancing. IJCNN, 2015.
[5] F. Akopyan et al. Truenorth: Design and tool flow of a 65 mw 1
million neuron programmable neurosynaptic chip. TCAD, 2015.
[6] S. H. Jo et al. Nanoscale memristor device as synapse in
neuromorphic systems. Nano letters, 2010.
[7] M. Prezioso et al. Training and operation of an integrated
neuromorphic network based on metal-oxide memristors. Nature,
2015.
[8] X. Liu et al. Reno: A high-efficient reconfigurable neuromorphic
computing accelerator design. DAC, 2015.
[9] B. L. Jackson et al. Nanoscale electronic synapses using phase
change devices. JETC, 2013.
[10] A. Sengupta et al. Proposal for an all-spin artificial neural
network: Emulating neural and synaptic functionalities through
domain wall motion in ferromagnets. TBioCAS, 2016.
[11] J. Liang et al. Cross-point memory array without cell
selectors-device characteristics and data storage pattern
dependencies. TED, 2010.
[12] B. Liu et al. Vortex: variation-aware training for memristor
x-bar. DAC, 2015.
[13] P. Chi et al. Prime: A novel processing-in-memory architecture
for neural network computation in reram-based main memory.
ISCA, 2016.
[14] A. Shafiee et al. Isaac: A convolutional neural network
accelerator with in-situ analog arithmetic in crossbars. ISCA,
2016.
[15] P. Panda et al. Falcon: Feature driven selective classification for
energy-efficient image recognition. arXiv:1609.03396, 2016.
[16] B. Rajendran et al. Specifications of nanoscale devices and
circuits for neuromorphic computational systems. TED, 2013.
[17] A. Joubert et al. Hardware spiking neurons design: Analog or
digital? IJCNN, 2012.
[18] N. Muralimanohar et al. Optimizing nuca organizations and
wiring alternatives for large caches with cacti 6.0. MICRO, 2007.
[19] Y. Netzer et al. Reading digits in natural images with
unsupervised feature learning. 2011.
[20] Y. LeCun et al. Gradient-based learning applied to document
recognition. IEEE, 1998.
[21] A. Krizhevsky et al. Learning multiple layers of features from
tiny images. 2009.
