Fast and Accurate Sparse Coding of Visual Stimuli with a Simple,
  Ultra-Low-Energy Spiking Architecture by Woods, Walt & Teuscher, Christof
Walt Woods and Christof Teuscher
Department of Electrical and Computer Engineering
Portland State University, Portland, OR, USA
{wwoods, teuscher}@pdx.edu
Abstract—Memristive crossbars have become a popular means
for realizing unsupervised and supervised learning techniques. In
previous neuromorphic architectures with leaky integrate-and-
fire neurons, the crossbar itself has been separated from the
neuron capacitors to preserve mathematical rigor. In this work,
we sought to design a simplified sparse coding circuit without
this restriction, resulting in a fast circuit that approximated a
sparse coding operation at a minimal loss in accuracy. We showed
that connecting the neurons directly to the crossbar resulted in a
more energy-efficient sparse coding architecture, and alleviated
the need to pre-normalize receptive fields. This work provides
derivations for the design of such a network, named the Simple
Spiking Locally Competitive Algorithm, or SSLCA, as well
as CMOS designs and results on the CIFAR and MNIST
datasets. Compared to a non-spiking, non-approximate model
which scored 33% on CIFAR-10 with a single-layer classifier,
this hardware scored 32% accuracy. When used with a state-of-
the-art deep learning classifier, the non-spiking model achieved
82% and our simplified, spiking model achieved 80%, while
compressing the input data by 92%. Compared to a previously
proposed spiking model, our proposed hardware consumed 99%
less energy to do the same work at 21× the throughput. Accuracy
held up with online learning to a write variance of 3%, suitable
for the often-reported 4-bit resolution required for neuromorphic
algorithms; with offline learning to a write variance of 27%;
and with read variance to 40%. The proposed architecture’s
excellent accuracy, throughput, and significantly lower energy
usage demonstrate the utility of our innovations.
Index Terms—sparse coding, locally competitive algorithm,
memristors, neuromorphic architecture, spiking architecture
I. INTRODUCTION
Sparse coding, accomplished through algorithms that en-
code an input stimulus in a new basis with few non-zero
elements, has been shown to improve image classification
accuracy with single-layer classifiers [1]. These algorithms
have also been shown to reduce the learning time required for
backpropagation [2]. Since there are few non-zero elements,
sparse coding also provides a means for minimizing the
bandwidth required to transfer sensor data amongst multiple
processors, or to store that data in long-term storage. In
recent years sparse coding has gained additional traction as
a result of biological evidence that the V1 visual layer in
mammalian cortices performs similar functionality [3]–[5].
The revelation that sparse coding algorithms should be part
of a neuromorphic learning system has intensified research
using these algorithms. The implementation of neuromorphic
algorithms on custom Application-Specific Integrated Circuits
(ASICs) has also been a popular area of study, largely
due to the development of programmable, variable-resistance
nanodevices, named memristors, that can be used to realize
the synapses needed in a more compact, energy-efficient form
[3], [4], [6]–[11].
Our work generally extends from the Locally Competitive
Algorithm (LCA) proposed by Rozell et al. in 2008, an optimal
solver for the sparse coding problem [12], coupled with Oja’s
rule, used to repeatedly tune the dictionary towards an optimal
solution for a set of training inputs without requiring a super-
visory signal [13]. The LCA was chosen as the most promising
sparse coding algorithm due to its use of inhibitory forces to
force an optimally sparse and stable solution. We give special
consideration to the case where the LCA is used as a front end
to a traditional supervised classifier. We investigate the efficacy
of the supervised classifier to classify an input stimulus as
either the value of a handwritten digit or the type of object in
a tiny image: MNIST and CIFAR-10, respectively [14], [15].
This approach leverages the transformation of input data from
its native space to a decorrelated space via the LCA, and then
uses traditional machine learning techniques to classify the
stimulus based on its decorrelated representation. We have
used this approach in the past [8]. In practice, there are
several benefits to this approach: improved accuracy due to the
stability of the decorrelated representation (an effect similar
to dropout), and the ability to compress the input stimulus
between the measuring device and the identification layer.
A single frame of HD video data consists of approximately
6.2 Mbit of information, which the LCA could compress down
to 1.0 Mbit, an 84 % reduction, at a Root-Mean-Square Error
(RMSE) of 5.8 %, or down to 0.6 Mbit, a 90 % reduction,
at an RMSE of 7.0 %. For video surveillance systems or au-
tonomously driving vehicles, this means that several cameras
could be wired into a single, high-speed, low-energy sparse
coding device, greatly reducing the needed communication
bandwidth for the system.
In this work we set out to provide a low-power, hardware-
friendly realization of an LCA-like algorithm using a spik-
ing framework. To maximize power savings, the model was
simplified and the drawbacks of that simplification were in-
vestigated. Spikes were utilized to save power; the efficacy of
spikes for power saving was explored. The proposed architec-
ture has been named the Simple Spiking Locally Competitive
Algorithm (SSLCA). The SSLCA was compared with the
original LCA’s ODE [12] as well as their later spiking work in
Shapero et al. [16]. Both the MNIST and CIFAR-10 datasets
were used for the comparison. Through these comparisons,
we found that our proposal demonstrated excellent power and
scaling qualities. It is our hope that this work provides the
basis for efficient, next-generation sparse-coding hardware.
II. RELATED WORK
The original work on LCAs was Rozell et al., 2008 [12].
Rozell et al. sought to improve upon prior sparse coding
algorithms by deriving an optimal expression to both minimize
the sparse coding equation (Eq. (1)) and smooth the generated
sparse representation when given time-varying input. Their
work derived an Ordinary Differential Equation (ODE) that
solved both of these problems (Eq. (2)). By providing com-
petition amongst outputs, the number of active outputs may
be minimized while simultaneously maximizing the fidelity of
the reconstruction produced. The sparse coding equation that
is minimized by the LCA, with the input stimulus denoted as
s, the sparse code of coefficients to reconstruct the input as a,
the bases that comprise the sparse code as the matrix Φ, the
reconstruction of the input as $\hat{s} = \Phi a$, and a cost function
$\lambda C(\cdot)$ that expresses the trade-off between reconstruction
quality and the sparsity of the final solution a, is written as:
$$E(t) = \frac{1}{2} \left\| s(t) - \hat{s}(t) \right\|^2 + \lambda \sum_m C(a_m(t)). \tag{1}$$
E(t) is minimized by descending an ODE derived by Rozell
et al. [12] to its natural steady state:
$$a_m(t) = T_\lambda(u_m(t)), \qquad \dot{u}_m(t) = \frac{1}{\tau} \left[ b_m(t) - u_m(t) - \sum_{n \neq m} G_{m,n} a_n(t) \right], \tag{2}$$
where $u_m$ is an underlying, non-sparse state for the mth
neuron that is thresholded with a function $T_\lambda(\cdot)$ to produce the
resulting sparse code element $a_m$, $b_m$ is the inner product of
the mth neuron’s receptive field and the input, and $G_{m,n}$ is the
inner product of the mth and nth receptive fields, minus 1 if
m = n. Approximating the integral of Eq. (2) with sufficiently
small step sizes will always end in a stable u, corresponding
to a generated sparse code a. See the original LCA paper for
more details [12].
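As an illustration of how Eq. (2) can be integrated numerically, the following is a minimal forward-Euler sketch. It is not the implementation used in [12] or in this work: the soft-threshold form of $T_\lambda$, the function names, and the values of `lam`, `tau`, `dt`, and `steps` are all assumptions chosen for the example.

```python
import numpy as np

def soft_threshold(u, lam):
    """Assumed form of T_lambda: shrink each state toward zero by lam."""
    return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)

def lca(s, Phi, lam=0.1, tau=10.0, dt=1.0, steps=500):
    """Forward-Euler integration of Eq. (2).

    s   : input stimulus, shape (n_inputs,)
    Phi : dictionary, one receptive field per column, shape (n_inputs, n_neurons)
    """
    b = Phi.T @ s                  # b_m: inner product of each RF with the input
    G = Phi.T @ Phi                # inner products between receptive fields
    np.fill_diagonal(G, 0.0)       # the sum in Eq. (2) excludes n == m
    u = np.zeros(Phi.shape[1])
    for _ in range(steps):
        a = soft_threshold(u, lam)
        u += (dt / tau) * (b - u - G @ a)
    return soft_threshold(u, lam)

# Usage: with an orthonormal dictionary there is no competition, so each
# state settles at b_m and only entries above lam survive the threshold.
code = lca(np.array([1.0, 0.0, 0.05]), np.eye(3), lam=0.1)
```

For unit-norm receptive fields, zeroing the diagonal of G is the same as subtracting 1 when m = n, as described above.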
Due to the popularity of neuromorphic algorithms, there has
been substantial prior work on ASICs for neuromorphic
sparse coding architectures, both within C. Rozell’s group
and in other research groups. The relative performance of these
architectures in terms of energy efficiency and throughput is shown in
Fig. 1. Note that the throughput axis is presented in “Ops/s”
as opposed to “Inputs/s,” so that the scale of an architecture
has no effect on the measures reported. For example, if an
architecture were built to process 64 inputs, and its perfor-
mance were measured, doubling that architecture to 128 inputs
would immediately double its throughput measurement if we
used “Inputs/s” rather than “Ops/s.” Using “Ops/s” makes the
number of inputs irrelevant, which is more appropriate for
architectures which have a fixed processing time regardless of
the number of inputs and outputs.
[Figure 1: log-log scatter plot. Horizontal axis: Ops/s, 10^4 to 10^8. Vertical axis: J/Input, 10^-12 to 10^-7. Points: Knag 2015, Kim 2015, This work, Shapero 2012, Shapero 2013.]
Fig. 1: Comparison of the SSLCA’s energy efficiency and throughput,
presented in this work, with previous state-of-the-art results. One “Op,” or
operation, is the complete generation of a sparse code from a single set of
inputs.
Throughputs presented are inference-only; throughputs
when using online training would differ according to
the time needed to write the storage medium used by each
algorithm. As an example, many memristive models
can have their resistances modified within 2 ns to 20 ns
[17]. If every algorithm execution led to a weight update of
each device in the matrix, and each update happened serially,
an example network with 784 inputs and 50 outputs (39,200
devices, at 10 ns per update) would add 392 µs to each
execution. However, this quantity may easily be amortized: either
multiple devices might be updated at once, multiple runs of
the algorithm could be integrated into each weight update,
or only a portion of the weights would need to be updated
each iteration. For all neural algorithms, disabling learning
results in significantly higher throughput. Furthermore, when
the problem at hand is sufficiently solved, there is no further
need for the learning step, an often-used motivation for offline
learning in practical applications.
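The serial update estimate above is simple arithmetic, sketched below for the quoted 2 ns to 20 ns range. The function names are illustrative, and the amortization helper assumes the simplest case in which one full update is shared evenly across several runs.

```python
def serial_update_time_s(n_inputs, n_outputs, write_time_s):
    """Time to rewrite every crossbar device once, one device at a time."""
    return n_inputs * n_outputs * write_time_s

def amortized_time_s(update_time_s, runs_per_update):
    """Update cost per run when one update is shared by several runs (assumed model)."""
    return update_time_s / runs_per_update

for write_ns in (2, 10, 20):
    t = serial_update_time_s(784, 50, write_ns * 1e-9)
    print(f"{write_ns:>2} ns/device: {t * 1e6:.1f} us per full serial update")
```

A per-device write time of 10 ns reproduces the 392 µs figure; the two ends of the range give 78.4 µs and 784 µs.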
While Eq. (2) is an effective equation for sparse coding,
prior attempts at implementing the LCA directly in hardware
have suffered from a few details which prevented an efficient
implementation. For example, the LCA’s matrix form can
be intuitively described as reproducing the input based on
a linear combination of the weight matrix columns. The
most significant term limiting the LCA’s efficiency comes
from the inhibition term in Eq. (2): each output column’s
coefficient’s ODE depends on all other output columns. For a
naive implementation, and indeed the one chosen by Rozell’s
group in their original hardware implementation using floating
gates [18], this implies O(N²) hardware scaling: doubling the
number of output elements quadruples the required hardware.
While Rozell’s group found that the actual power consumption
scaled less quickly than the amount of hardware, at O(N√N),
the amount of hardware still scaled as O(N²). Additionally,
ODE convergence in their hardware was relatively slow, oc-
curring after 240 µs [18]. Even outside of the inhibitory term,
a low-power implementation of the dot product in the ODE is
non-trivial: it had to either be implemented digitally, or using
next-generation components like memristors. While variable-
resistance nanodevices like memristors can compute a dot
product using little power themselves, to make the computa-
tion accurate requires either a virtual ground, which consumes
significant power due not only to the matching current but
also due to excessive current drained through low-resistance
devices [8], [17], [19], or requires a tuning resistance that
must change based on each column’s configuration [19]. The
additional power consumed by these solutions motivates the
exploration of techniques that do not require the calculation
of an exact dot product.
The drawbacks of power scaling and slow convergence
times were addressed to some extent later by Shapero et al.
in 2013 [16]. That work extended the original LCA to a
spiking architecture, referred to in this work as the Spiking
Locally Competitive Algorithm (SLCA). The motivation for
spiking largely seems to have stemmed from biology: all
biological systems appear to use spiking rather than constant
signals [3], [4], [20]. Spiking models have also long been
believed to consume less power, and to exhibit additional
computational power due to their stochasticity [21]–[23]. The
validity of leveraging spikes to save power is discussed further
in Section IV-A2 of this work. In their work, Shapero et al.
[16] showed that their SLCA consumed more power than
their LCA at small sizes, but that their SLCA scaled only
as the desirable O(N), and would consume less power than
the LCA at large network sizes. Additionally, they reduced
the convergence time to 25 µs, nearly 90 % shorter than the
240 µs of their LCA, for a throughput of 40 kOps/s. However,
the required hardware still scaled as O(N²).
Other spiking networks optimized for sparse coding have
been published, such as SAILnet, introduced by Zylberberg
et al. in 2011 [24]. ASICs using this architecture have been
studied, with a substantial reduction in power compared to
the approach presented by Shapero et al. [16]. Knag et al.
[25] used the SAILnet architecture to process
images using only 48 pJ/input for their inference logic with
a throughput of 0.55 MOps/s, or using 176 pJ/input with a
throughput of 4.8 MOps/s, 120× as fast as Shapero et al.
[16]. Their design was CMOS-based, and utilized a decreased
resolution for weight storage: 4 bits per excitatory or inhibitory
weight. This decision has been justified in a number of
prior works dealing with how much accuracy is needed for
sparse coding algorithms to perform well [8], [26]. This
design was later reconfigured by Kim et al. [27]. They tested
higher clock speeds, optimized the design, and generated less
detailed output, resulting in a throughput of 9.9 MOps/s at
an energy efficiency of 26.4 pJ/input [27]. However, these
numbers benefit from their usage of only 256 output neurons
to represent 1024 inputs, whereas Knag et al. used 256 output
neurons to represent only 256 inputs [25], [27]. Not only does
using fewer outputs result in a worse encoding of the input, but
due to the O(N²) scaling properties of these networks, this
also favors the power and throughput figures in [27]. As such,
subsequent results in this work will be compared with Knag
et al., as their chip performs a comparable amount of work
to ours [25]. Like the LCA, SAILnet uses a direct inhibitory
weight between each pair of output neurons, yielding a scaling
complexity of O(N²).
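The quadratic scaling of direct pairwise inhibition can be made concrete with a small count. This is only a sketch: it assumes one directed inhibitory weight per ordered pair of distinct output neurons, which is an illustrative reading rather than the exact layout of any cited chip.

```python
def pairwise_inhibition_weights(n_outputs):
    """Directed inhibitory weights between every pair of distinct outputs."""
    return n_outputs * (n_outputs - 1)

small = pairwise_inhibition_weights(256)
large = pairwise_inhibition_weights(512)
print(small, large, large / small)  # doubling the outputs roughly quadruples the count
```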
The closest family of algorithms that does not exhibit
O(N²) scaling is Spike-Timing-Dependent Plasticity (STDP).
STDP exploits what is known as “Hebbian” learning, where
input spike events that occur at the same time as an output
spike event become more likely to trigger that output spike
event. The common idiom for this behavior is, “neurons that
fire together, wire together.” In effect, each output neuron
learns to activate when a correlated set of inputs fires together.
This is very similar to what happens in sparse coding, where a
neuron responds to a specific pattern in the input. The primary
differences are that STDP makes no effort to preserve the
information found in the input and STDP does not implement
inhibition amongst neurons. Rather, the purpose of STDP is to
flag which features are present in the input and how prevalent
they are, without regard for the other features present. Sparse
coding, on the other hand, will suppress output of a feature
that is already represented by a combination of other features.
Both techniques are a form of unsupervised learning, except
sparse coding requires some inhibitory terms while STDP does
not. This gives STDP the desirable quality of O(N) scaling.
Due to its excellent scaling properties, STDP was used in
one of the earliest attempts to replicate the features found
in mammalian visual cortices [3], has been explored as an
autoencoder [28], and has been used to generate unsupervised
features for digit classification on the MNIST digit database
[6]. STDP is also one of the dominant architectures researched
using next-generation nanodevices such as memristors [4], [6],
[7], [9], [10], [20]. The downside of an STDP approach to
input encoding is that more output neurons are required due to
the lack of inhibition; with 50 neurons, prior research showed
that STDP achieved 80% accuracy on MNIST, while a sparse
coding layer using LCA achieved 85% [6], [8]. If a sparse
coding algorithm were implemented with the same efficiency
as STDP, it would be the preferred method of unsupervised
training, as it conveys more depth of information. That is what
our work set out to accomplish: to close the gap between STDP
and sparse coding algorithms such that there exists a simple
and efficient algorithm for sparse coding.
Recent work by Sheridan et al. showed that their group has
manufactured memristive crossbars and applied voltage across
the network to calculate the similarity coefficients in the LCA
equations [29], [30]. Sheridan et al. used a microprocessor
to implement the majority of the LCA, and did not include
comprehensive throughput and power information [29], [30].
However, a fundamental result of their work is that memristive
devices have sufficient resolution and accuracy to implement
LCAs on real devices [29]. Our work extends their work
by proposing a means of implementing the entirety of the
LCA on the same chip as the memristive crossbar with few
additional components. The Sheridan et al. work also advocated
using a Winner-Take-All (WTA) approach to training
the weight matrix: rather than updating the weights for all
columns participating in a reconstruction, they only updated
the largest contributor [29]. While effective, using WTA was
motivated largely by the supposition that a single neuron’s
firing would dominate the response to most stimuli; however,
with inhibition and larger, more complicated inputs, this is not
the case.
[Figure 2: block diagram with labels: Input spikes, Row Header (×3), Column Header (×3), Spikes, Output, “Is any neuron firing?”]
Fig. 2: High-level architecture for the SSLCA. During inference, input spikes
pass through a Row Header. Voltage is forwarded from the Row Headers to
a nanowire crossbar with memristors at each junction. Current is allowed to
pass through each memristive junction and is used to charge or discharge an
LIF neuron in each Column Header (Fig. 5). When any LIF neuron spikes,
an output spike is propagated and inhibitory forces are passed back through
the crossbar to the Row Headers. Only a single shared bit (“Is any neuron
firing?”) is required. The count of output spikes across any given time window
describes the sparse code for the input pattern seen during that time window.
Also noteworthy is recent work by Tang et al. which extends
the mathematical justification of spiking LCA networks [31].
Their work is largely theoretical, not tied to a specific
hardware design but rather addressing the mathematics of a spiking
network used to implement the LCA. Similar to Section III and
Shapero et al.’s SLCA [16], they model spikes as rate-based
entities, but emphasize proofs of convergence over specific
designs. In contrast, our work makes use of a statistical model
to define behavior which is hardware friendly and shown to
be empirically useful (Section IV).
III. MODEL
In light of issues with previous hardware implementations
and the potential benefits of sparse coding algorithms dis-
cussed in Sections I and II, we set out to develop the sim-
plest architecture for sparse coding that would exhibit O(N)
scaling, utilize inhibition, and emphasize low-power operation.
While any device whose resistance can be modified in-situ
would suffice, memristors from Lu et al.’s group were chosen
due to their nanoscale form factor and ability to be fabricated
in tight crossbars [29]. These devices additionally exhibit
a low on:off ratio, which has been associated with devices
that possess better long-term storage and analog qualities
[17], [29]. It is also worth noting that while the internal
state of memristive devices could change at any voltage, it
changes very slowly at the 0.7 V used during inference for
the SSLCA. Further discussion of the time-varying qualities of
memristive devices can be found in [17]; for this architecture,
we generally assume that the memristive devices used are
static during the inference step due to the low voltages used,
and are adjusted in a separate training step that implements
Oja’s rule, or in a loading process that configures the chip
with weights learned during offline training.
[Figure 3: voltage traces versus time. Horizontal axis: Time (ns), 0 to 10. Vertical axis: Voltage (V) (recoverable tick labels: 0.35, 0.10). Rows: In0, In1, In2, In3, Out0, Out1, Reconstruction, Inputs Seen.]
Fig. 3: Simulated voltage traces within our architecture, using perceptual
icons as guides. The model demonstrated is the inhibited SSLCA from
Section III-B. This network has 4 inputs and 2 output neurons; the input
vector passed is [0, 1, 0.5, 0.5], the first output neuron responds to the first
two inputs, and the second output neuron responds to the last two inputs.
Shaded regions are spikes; red shaded regions demonstrate input regions
ignored due to output spikes. Input voltages shown are the charge on the
inhibition capacitors, and the orange dashed line is the inhibition threshold.
Output voltages shown are the charge on the neuron capacitors, and the green
dashed line is the firing threshold. The “Reconstruction” row demonstrates
the input (far left), and then the reconstruction that would be generated if the
algorithm stopped at 0.5 ns, 1.5 ns, etc. The “Inputs Seen” row demonstrates
the uninhibited inputs seen between the (i−1)th output spike and the ith output
spike; that is, it is the total activity that caused an output spike. Generally,
input activity accumulates for inputs that are currently underrepresented in
the output, and the output neuron that best represents that difference between
input activity and output representation fires, charging inhibition capacitors
that prevent that region of the input from being represented again too soon.
More details can be found throughout Section III.
As an initial step, we established that the chosen architecture
should fit the form shown in Fig. 2. Assuming good accuracy
could be derived, such an architecture would be sufficient for
implementing sparse coding with the desired traits. Such an
architecture would clearly exhibit O(N) scaling. Inhibition
could be implemented with a backwards-pass through the
same crossbar used to charge the output neurons. Low-power
operation would stem from the simplicity of the architecture,
its good scaling properties, and an innovation on the way the
neurons were integrated into the architecture.
Sparse coding may be realized with this architecture as
demonstrated in Fig. 3. Input spikes were chosen due to
biological inspiration, the promise of lower power consump-
tion, and also partially because memristors exhibit vastly
different resistances at different voltages; using spikes rather
than voltage-scaled inputs helps to avoid this situation [17].
Upon reaching the Row Headers, each input spike (gray-
shaded regions) would be converted to a voltage and would
charge the state of any output neurons that encode activity
from that input. As with most spiking algorithms, our proposed
network would deal with the statistics of populations of spikes
rather than the interactions of individual spikes. Thus, input
spikes might be produced and converted in any fashion so long
as the expected voltage of the input line scales linearly with the
activity of the input. The amount of charge received by each
output neuron would be controlled by a memristive device at
nanowire junctions between the Row and Column Headers.
These memristive devices form a pattern of conductances for
each column, constituting its Receptive Field (RF). Patterns of
input spikes will produce more charge in output neurons whose
RFs align with the input pattern. When an output neuron is
charged beyond a threshold by a sufficient quantity of input
spikes, that output would produce an output spike (gray-shaded
region), representative of the inputs seen so far. As those
inputs are then represented in the pattern of output spikes,
all output states would reset when the output spike occurred.
To achieve sparse coding, while the output spike was active,
energy would flow backwards through the crossbar, charging
inhibition capacitors that inhibit well-represented input spikes
from propagating until more spikes have been seen than
represented in the output. As a result, a representative encoding
of the input would be produced by counting output spikes,
and this encoding would be sparse as a function of both the
limited number of output spikes collected and the fact that
each output represents a population of input elements. Fig. 3
is referred to throughout this section to explain details
of the SSLCA’s implementation.
Neurons in this architecture differ from previously proposed
architectures. Like prior work, the Column Headers implement
Leaky-Integrate-and-Fire (LIF) neurons [9], [16]. They consist
of a capacitor that charges (integrates) input events as they are
active, and discharges (leaks) when input events are inactive.
In contrast to those architectures, which use a separate resistor
to control the leak rate, this architecture’s LIF neurons both
accrue and dissipate charge via the memristive crossbar. In
addition to requiring fewer components, this configuration
has proven more tolerant of un-normalized receptive fields,
a phenomenon discussed in Section III-A.
The derivation of necessary parameters was broken down
into two stages: calculations without inhibition, and an exten-
sion of those calculations to incorporate inhibition. This divide
was necessary to ensure the solution was tractable, and had
the added benefit of deriving two versions of the architecture
which were used to demonstrate the benefits of inhibition.
A. Uninhibited SSLCA
To begin the derivation for the Uninhibited SSLCA, we start
with the equation for Rozell et al.’s LCA, Eq. (2), and remove
the inhibitory term. What remains is a leaky dot product, with
no O(N²) scaling problem. However, the resulting equation
also no longer possesses optimality guarantees about the
quality of its sparse code.
Oja’s rule, used in this work to train each neuron’s RF,
can be used to somewhat remedy the missing inhibitory term
by adjusting multiple RFs to work together to reconstruct
the input without inhibition [13]. Briefly, Oja’s rule is that a
neuron’s receptive field will change proportionally to its output
activity multiplied by the difference between the original
input and the LCA’s reconstruction: $\Delta w_{i,j} = \eta a_j r_i$, where
$r_i = x_i - \sum_j w_{i,j} a_j$. This is identical to gradient descent
with a loss function of $r_i^2$. When both $w_{i,j}$ and $a_j$ are finite
and bounded, as in a spiking network using memristors, this
equation naturally reduces the weight for inputs that were
over-represented, and increases the weight for inputs that were
under-represented. Furthermore, increasing a weight results in
an increase of the corresponding spike count. As a result, any
given input produces a natural equilibrium of weight and spike
count values. Either might saturate, but that is not a problem
aside from the loss of some of the input’s magnitude.
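The update described above can be sketched in a few lines. This is a hedged illustration rather than the training code used in this work: the learning rate `eta`, the weight bounds `w_min` and `w_max`, and the clipping used to model bounded device conductances are assumptions.

```python
import numpy as np

def oja_update(W, x, a, eta=0.01, w_min=0.0, w_max=1.0):
    """One weight update: delta w_ij = eta * a_j * r_i, with r = x - W a.

    W : weights, shape (n_inputs, n_neurons)
    x : input, shape (n_inputs,)
    a : output activity (e.g. spike counts), shape (n_neurons,)
    """
    r = x - W @ a                      # residual between input and reconstruction
    W_new = W + eta * np.outer(r, a)   # entry (i, j) changes by eta * r_i * a_j
    return np.clip(W_new, w_min, w_max), r

# Usage: input 0 is under-represented and input 1 is over-represented.
W = np.full((2, 1), 0.5)
W_new, r = oja_update(W, np.array([1.0, 0.0]), np.array([1.0]))
```

In this example the weight for the under-represented input rises and the weight for the over-represented input falls, matching the behavior described in the text.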
Even with this compensation, using a leaky dot product for
sparse coding would be difficult in hardware; consider the
angle property of the dot product between two arbitrary vectors
X and Y:
$$\mathbf{X} \cdot \mathbf{Y} = |\mathbf{X}| \, |\mathbf{Y}| \cos(\theta). \tag{3}$$
From Eq. (3), a larger magnitude in either vector could
be used to compensate for a larger difference in angle. In
other words, a maximally-conductive RF would generate more
current than an RF that is a better match. Prior works have
solved this issue by normalizing each RF [29]. Instead, we
decided to solve this problem by creating a negative stimulus
via inactive input channels, achieved by grounding them rather
than using high impedance. The current through these channels
would be proportional to the RF, meaning that missing activity
where the RF is conductive would lead to a higher penalty. As
a result, a maximally conductive RF is perfectly fine within
the system. It will respond to inputs that are maximally valued,
and not respond to inputs with a shape better matched by a
different RF. Coupled with Oja’s rule, a maximally conductive
RF will adapt to not be maximally conductive if it could better
represent a wider range of inputs with a more specialized
shape. This is the reason that capacitors in this network are
placed directly on the crossbar, rather than behind a diode
or equivalent. Placing them directly on the crossbar allows
accrued charge to dissipate when an input row is inactive. This
phenomenon can be witnessed in the output voltage traces in
Fig. 3 at around 4.5 ns.
Using the layout from Fig. 2, the Row Headers for the
uninhibited SSLCA are simple passthroughs (input spikes are
directly connected to the crossbar), and the Column Headers
are simply a capacitor and a Schmitt trigger that drains all
capacitors once any one neuron’s voltage exceeds Vfire volts.
The partial derivative of any neuron’s voltage is therefore:
$$C \frac{\partial V_{\mathrm{neuron}}}{\partial t} = \sum_i \left( V_i(t) - V_{\mathrm{neuron}} \right) G_i, \tag{4}$$
where $C$ is the capacitance of the capacitor, $V_{\mathrm{neuron}}$ is the
current voltage of that capacitor, $V_i(t)$ is the ith input’s voltage
at time $t$ (one of $V_{cc}$ or 0 V depending on whether it is
currently spiking or not), and $G_i$ is the conductance of the
memristive device connecting the nanowires of the ith Row
Header and the neuron in question’s Column Header.
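To illustrate why grounding inactive rows removes the need to normalize receptive fields, the sketch below compares a plain dot product with the voltage at which the right-hand side of Eq. (4) becomes zero for constant row voltages, namely $\sum_i V_i G_i / \sum_i G_i$. The conductance values `G_MAX` and `G_MIN`, the example RFs, and the function names are hypothetical; only the 0.7 V inference voltage comes from the text.

```python
import numpy as np

V_CC = 0.7     # inference voltage quoted in the text
G_MAX = 1e-5   # hypothetical maximum device conductance (S)
G_MIN = 1e-6   # hypothetical minimum device conductance (S)

def dot_product_current(active, G, v_cc=V_CC):
    """Plain dot product: only active rows contribute, with no penalty term."""
    return float(np.sum(np.where(active, v_cc, 0.0) * G))

def steady_state_voltage(active, G, v_cc=V_CC):
    """Neuron voltage where Eq. (4) is zero, with inactive rows held at 0 V."""
    return float(np.sum(np.where(active, v_cc, 0.0) * G) / np.sum(G))

active = np.array([True, True, False, False])
rf_matched = np.array([0.8 * G_MAX, 0.8 * G_MAX, G_MIN, G_MIN])  # shaped like the input
rf_full = np.full(4, G_MAX)                                      # maximally conductive

for name, rf in (("matched", rf_matched), ("full", rf_full)):
    print(name, dot_product_current(active, rf), steady_state_voltage(active, rf))
```

Under these assumed values the maximally conductive RF wins on the plain dot product, but the matched RF settles at the higher neuron voltage once the grounded rows are allowed to drain charge.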
This equation can be better reasoned about by assuming
an input row’s voltage Vi spikes to voltage Vcc with a mean
activity of Ki, meaning that at any point in time the voltage
is Vcc with probability Ki and grounded with probability
1 − Ki. For non-spiking inputs, such as when converting a
stored image to a format consumable by the SSLCA, the
desired duty cycle Ki may be produced through a simple
multiplication. First, establish a desired maximum input spike
duty cycle, Kmax, and then multiply this max cycle by the
analog input (assumed to be bounded on [0, 1]) to produce
Ki, the duty cycle of the input’s spiking representation. Any
method of generating spikes with the given duty cycle might
be used; what is important is the expected value of the input
line voltage. In Fig. 3, relative values of Ki can be seen as
the portion of each input’s area covered by spike activity (the
gray-shaded regions).
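One possible way to produce spikes with a duty cycle of $K_i$ is sketched below. Since any method with the right expected line voltage is acceptable, independent Bernoulli samples per time step are used here purely for simplicity; the function name, `k_max = 0.5`, and the step count are assumptions for the example.

```python
import numpy as np

def spike_train(x, k_max=0.5, n_steps=10_000, rng=None):
    """Boolean spike raster, shape (n_steps, n_inputs); True means the row is at Vcc.

    x is the analog input, assumed bounded on [0, 1], so K_i = k_max * x_i.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    k = k_max * np.clip(np.asarray(x, dtype=float), 0.0, 1.0)
    return rng.random((n_steps, k.size)) < k

# Usage: the measured duty cycle of each row approaches k_max * x_i.
spikes = spike_train([0.0, 1.0, 0.5, 0.5])
duty = spikes.mean(axis=0)
```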
Using an expectation to replace Vi(t) with E[Vi(t)] =
KiVcc, Equation (4) can then be reduced via the Laplace
transform to:
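$$\begin{aligned}
Q_1 &= \sum_i G_i, \\
Q_2 &= V_{cc} \sum_i K_i G_i, \\
C \frac{\partial V_{\mathrm{neuron}}}{\partial t} &= Q_2 - Q_1 V_{\mathrm{neuron}}, \\
V_{\mathrm{neuron}}(t) &= \frac{Q_2}{Q_1} \left( 1 - e^{-t Q_1 / C} \right) + V_{\mathrm{neuron},t=0} \, e^{-t Q_1 / C},
\end{aligned} \tag{5}$$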
where $V_{\mathrm{neuron},t=0}$ is the neuron’s voltage at $t = 0$. $Q_1$,
the column’s total conductance, and $Q_2$, a matching metric
between the stored RF and the input pattern, arise as intuitive
factors that affect the neuron’s state. To establish the necessary
values for $C$ and $V_{\mathrm{fire}}$, $Q_1$ and $Q_2$ need to be derived in a way
that produces good results for the network’s “average case.”
Empirically, we found that assuming both the input and stored
RF have binary elements (even for analog problems) produced
the best results: the K values are either 1 or 0, while the G
values are either the minimum or maximum conductance of
our memristive devices.
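As a sanity check on Eq. (5), the closed-form voltage can be compared against a direct forward-Euler integration of the expected-value ODE. The conductances, the 10 fF capacitance, and the step count below are hypothetical values chosen only to give a time constant of a fraction of a nanosecond; they are not the parameters derived in this work.

```python
import numpy as np

def neuron_voltage(t, G, K, C, v_cc=0.7, v0=0.0):
    """Closed form of Eq. (5)."""
    q1 = np.sum(G)                 # Q1: total column conductance
    q2 = v_cc * np.sum(K * G)      # Q2: match between RF and input duty cycles
    decay = np.exp(-t * q1 / C)
    return (q2 / q1) * (1.0 - decay) + v0 * decay

def euler_voltage(t_end, G, K, C, v_cc=0.7, v0=0.0, n_steps=20_000):
    """Forward-Euler integration of C dV/dt = Q2 - Q1 V."""
    q1 = np.sum(G)
    q2 = v_cc * np.sum(K * G)
    dt = t_end / n_steps
    v = v0
    for _ in range(n_steps):
        v += dt * (q2 - q1 * v) / C
    return v

G = np.array([1e-5, 1e-5, 1e-6, 1e-6])   # binary RF: max or min conductance (S)
K = np.array([1.0, 1.0, 0.0, 0.0])       # binary input, as assumed in the text
closed = neuron_voltage(1e-9, G, K, C=1e-14)
numeric = euler_voltage(1e-9, G, K, C=1e-14)
```

The two values agree closely, and for large t the closed form approaches $Q_2 / Q_1$.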
The resulting calculation for Q1 and Q2, required to de-
termine both the network’s trigger voltage Vfire and neuron
capacitance C, is described in Algorithm 1. Though these
calculations are based on a single sample of Q1 and Q2, our
results showed that the network still worked well outside of
these “average cases” (Section IV).
Sparse coding being the goal of this architecture, we also
make the assumption that any spike event will reset all neuron
charges to 0 V, implying that each output spike only encodes
input activity seen since the end of the previous output spike.
Any input spikes in this reset window will be ignored by the
system (shown as red-shaded regions in Fig. 3). Since the
SSLCA is designed to produce multiple output spikes across
time for a single combination of inputs, this is not a problem
as the statistics of the input spikes give equal likelihood of a
spike during the reset as during any other period of operation.
Real-time (non-episodic) uses of the SSLCA would also not
suffer from this implementation detail, assuming all changes
to the input stimulus happen at a significantly lower frequency
than the spiking frequency of the system. This phenomenon
could be compared to standard sampling theory: one spike is
a sample of the input leading up to that spike, and frequency
components of the input that exceed some fraction of the
sampling rate cannot be recovered.
The downside to this assumption is that the architecture
becomes a one-hot system: a pattern of simultaneously-firing
output spikes becomes impossible. Superficially, this is in
contrast to some other work on stochastic computation with
spiking neurons [21]–[23]. The network still encodes stochas-
tic information in a single output spike: the input pattern
represented is stochastic due to the phase of input spikes’ duty
cycles, and as such the corresponding output is stochastically
selected. By not allowing a pattern of simultaneous output
spikes, the number of representable input patterns in a single
event is reduced. However, due to this stochasticity, we have
found that collecting multiple output spikes over a period of
time results in a stochastic pattern of output activity that accu-
rately represents the input. This is functionally identical to the
trade-off of memory for time in computation: we are reducing
the memory of the momentary output of our architecture in
exchange for longer runtime. For sparse coding, where the
resulting code often needs to be stored or otherwise buffered,
this is not an issue. Practically, as in Fig. 3, the reconstruction
gets progressively better the longer the algorithm runs, with
diminishing returns.
With the above assumption, all Vneuron,t=0 = 0, and
Eq. (5) can be rearranged to calculate C based on some
Q1, Q2, Vfire, and tfire, where Vfire and tfire are the de-
sired voltage and time at which an output spiking event should
occur given the input and stored RF parameters that produce
Q1 and Q2:
$$
C = \frac{-t_{fire} Q_1}{\ln\left(1 - V_{fire}\,\dfrac{Q_1}{Q_2}\right)}. \tag{6}
$$
As tfire can be calculated from the desired hardware clock
rate and number of spikes per patch, the remaining parameters
needed to fully specify the uninhibited SSLCA are Vfire, Q1,
and Q2. Our experiments produced the lowest reconstruction
RMSEs when Vfire is calculated based on a thresholded
max voltage from Eq. (5) with a Q1 and Q2 calculated for
the desired minimum RF that the resulting sparse code can
Fig. 4: Accuracy (% correct) for the SSLCA network on CIFAR-10 across different values
of Rfleast (horizontal axis) and Rfavg (vertical axis). Generally, setting
Rfleast = (1 − e^{-1})Rfavg was found to be a safe choice.
represent, and when the Q1 and Q2 for the calculation of
C come from an average case of the data set used with
the network. The exact procedure followed to calculate these
values is described in Algorithm 2. The algorithm requires
knowledge of the expected average value of a stored receptive
field, Rfavg , as well as an idea of the minimum input intensity
that should trigger an output spike, Rfleast. After scanning
across many different combinations of Rfavg and Rfleast, we
discovered that setting Rfleast = (1 − e−1)Rfavg typically
yielded optimal results with regards to classification accuracy,
as can be seen in Fig. 4; this relation was used throughout
this work. Though Fig. 4 displays results only on CIFAR, we
found a similar result shape with MNIST when trying different
values of Rfavg and Rfleast. In the context of Section III-B,
the inhibited network, the shape was also similar.
Following Algorithm 2, and substituting the resulting values
into Eq. (6), all parameters for constructing the uninhibited
network are defined, and the network might be built. Applying
voltage spikes to the input lines of magnitude Vcc with a
maximum duty cycle of Kmax will cause the best-matching
column to spike for tspike seconds; collecting these spikes
across a window of time (e.g. 10(tfire+tspike) for an average
of 10 spikes) will produce a reasonable reconstruction of the
input based on the network’s receptive fields. Results with the
uninhibited SSLCA can be found in Section IV.
B. Adding Inhibition to the SSLCA
One of the original requirements deduced at the beginning
of Section III was the need for inhibition. Prior works have
shown the need for inhibition in an effective sparse coding
system [6], [8], and Section IV-A3 of this work demonstrates
this as well. While works such as that of Shapero et al.
implemented inhibition by using additional hardware between
each pair of neurons [16], leading to O(N^2) scaling, the
SSLCA is designed in a way that allows for O(N) scaling.
Algorithm 1: Process used to determine Q1 and Q2 given a stored RF of
average, relative conductance Rfstored and a matching input of average,
relative intensity Rfinput.
Input:
  Rfstored, the average, relative conductance of the stored RF.
    This value must be on the interval (Gmin/Gmax, 1];
  Rfinput, the average, relative intensity of all inputs;
  Gmin, the minimum conductance of a crossbar device;
  Gmax, the maximum conductance of a crossbar device;
  Kmax, the proportion of time spent at Vcc for an input signal
    spiking at its maximum rate;
  N, the number of inputs to the network.
Output: Q1, Q2
begin
  // Assumes that the stored RF consists entirely of elements at
  // Gmax or Gmin, and that the input pattern matches, but with a
  // scaled intensity of Rfinput / Rfstored. This simplification
  // helps the network perform well with high-contrast RFs.
  gmin ← Gmin / Gmax;
  Ih ← (Rfstored − gmin) / (1 − gmin);   // Portion of inputs at max-intensity.
  Il ← 1 − Ih;
  Q1 ← N Gmax Rfstored;
  Q2 ← N Vcc Gmax Kmax (Rfinput / Rfstored) (Ih + Il gmin^2).
  // Note that the gmin^2 term comes from one gmin multiplication
  // for Gmax and another for Kmax. This is the “match” term
  // between the RF and the input.
end
Algorithm 2: Recommended process for selecting Vfire, Q1, and Q2,
required for the calculation of C from Eq. (6).
Input:
  Rfavg, the desired average, relative conductance of a stored
    RF. This value must be on the interval (Gmin/Gmax, 1];
  Rfleast, the smallest average, relative input intensity that is
    expected to trigger an output spike;
  Gmin, the minimum conductance of a crossbar device;
  Gmax, the maximum conductance of a crossbar device;
  Kmax, the proportion of time spent at Vcc for an input signal
    spiking at its maximum rate;
  N, the number of inputs to the network.
Output: Q1, Q2, Vfire
begin
  Vfire ← (1 − e^{-1}) Q2 / Q1, with Q1, Q2 from Algorithm 1 applied
    to Rfstored = Rfavg, Rfinput = Rfleast, other parameters
    matching;
  Q1, Q2 ← Q1, Q2 from Algorithm 1 applied to
    Rfstored = Rfavg, Rfinput = Rfavg.
end
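Algorithms 1 and 2, together with Eq. (6), can be sketched in Python as below. The function names are illustrative, and Vcc is passed explicitly because Algorithm 1 uses it without listing it as an input. The example call uses the row 1 parameters of Table I with Kmax = 0.5 and tfire = 0.8 ns from Section III-D; Vcc = 0.7 V is an assumption based on the 0.7 V supply mentioned there.

```python
import math

def algorithm1(rf_stored, rf_input, g_min, g_max, k_max, n, vcc):
    """Q1 and Q2 for a binary RF and a matching, intensity-scaled input."""
    g_rel = g_min / g_max
    if not g_rel < rf_stored <= 1.0:
        raise ValueError("rf_stored must lie on (Gmin/Gmax, 1]")
    i_high = (rf_stored - g_rel) / (1.0 - g_rel)  # share of devices at Gmax
    i_low = 1.0 - i_high                          # share of devices at Gmin
    q1 = n * g_max * rf_stored
    q2 = (n * vcc * g_max * k_max * (rf_input / rf_stored)
          * (i_high + i_low * g_rel ** 2))
    return q1, q2

def algorithm2(rf_avg, rf_least, g_min, g_max, k_max, n, vcc):
    """Vfire from the weakest input of interest; Q1, Q2 from the average case."""
    q1, q2 = algorithm1(rf_avg, rf_least, g_min, g_max, k_max, n, vcc)
    v_fire = (1.0 - math.exp(-1.0)) * q2 / q1
    q1, q2 = algorithm1(rf_avg, rf_avg, g_min, g_max, k_max, n, vcc)
    return q1, q2, v_fire

def capacitance(t_fire, v_fire, q1, q2):
    """Neuron capacitance from Eq. (6)."""
    return -t_fire * q1 / math.log(1.0 - v_fire * q1 / q2)

RF_AVG = 0.40
RF_LEAST = (1.0 - math.exp(-1.0)) * RF_AVG
T_FIRE = 0.8e-9
q1, q2, v_fire = algorithm2(RF_AVG, RF_LEAST, g_min=4.8e-6, g_max=19e-6,
                            k_max=0.5, n=192, vcc=0.7)
c = capacitance(T_FIRE, v_fire, q1, q2)
print(f"Vfire = {v_fire * 1e3:.1f} mV, C = {c * 1e15:.0f} fF")
```

Under these assumptions the computed Vfire is close to the 87 mV listed in row 1 of Table I. Substituting the resulting C back into Eq. (5) with Vneuron,t=0 = 0 returns Vfire at t = tfire, which is a convenient self-check.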
Instead of additional hardware, a percentage of the SSLCA’s
running time is dedicated to calculating inhibitory forces.
Whenever an output spike is generated, the duration of the
output spike is used to pass current from the corresponding
column back through the SSLCA’s crossbar, charging capaci-
tors in the Row Headers. Intuitively, the charges on these ca-
pacitors indicate how well represented the corresponding input
signal is in the current reconstruction; overrepresented input
signals will be suppressed. This is implemented through the
Row and Column Headers shown in Fig. 5. The effect of this
Fig. 5: The Row and Column Headers needed to add inhibition to the SSLCA
(circuit diagram panels: Row Header, Column Header, and Inhibition Logic Module).
The Row Header’s responsibilities are to stop input spikes from reaching the
crossbar when they are inhibited, and to keep track of the current state of the
inhibitory forces. An Inhibition Logic Module is diagrammed as broken out
from the main circuit for space reasons. The CHARGE port is responsible
for sinking current from the crossbar when an output spike has occurred, and
in turn charges the capacitor in the Inhibition Logic Module which prevents
subsequent spikes from applying a voltage on the crossbar. After enough input
spikes occur, the capacitor becomes sufficiently drained to apply voltage to
the crossbar once more. The Column Header is much simpler and uses a
transmission gate to direct current to and from the neuron’s state capacitor.
When any neuron fires, the capacitor is drained, and in the same column, Vcc
is applied to the crossbar. A simple RC circuit cleaned up by several NOT
gates is responsible for the output spike.
can be seen on the voltage traces of the inhibition capacitors
of Fig. 3 during the red-shaded regions: inputs corresponding
to the currently-firing output charge their inhibition capacitors
more quickly than inputs that are poorly-represented by the
spiking output.
The Column Header for the Inhibited SSLCA is almost
identical to that of the Uninhibited SSLCA: a standard LIF
neuron setup, with the state capacitor connected directly
(through a transmission gate) to the nanowire crossbar rather
than being buffered. A crude Schmitt trigger setup ensures
that all output capacitors drain sufficiently when any neuron
fires (firingAny), resetting the spike potentials. Additionally,
if the current column is firing (firingSelf), the crossbar column
is pulled up through a transistor. As a result, during each
output spike, an inhibition current will flow back through
the memristive crossbar into the Row Headers. The current
flowing into each Row Header will be proportional to the
receptive field of the firing column.
The Row Header is more complicated, but the important
aspect is that a capacitor storing the inhibition state discharges
whenever an input spike arrives, and charges whenever an
output spike occurs. In this way, the inhibition capacitor
contrasts how much the input is represented in the output
(increasing inhibition voltage) with the actual activity of the
input (decreasing inhibition voltage). The capacitor is charged
through the crossbar junctions; the resistor for discharging
the capacitor, referred to as Rinhib, is the one in the labeled
Inhibition Logic Module. The stored inhibition state, when
above Vcc/2, prevents input spikes from reaching the crossbar.
Vcc/2 is chosen as it maximizes the linearity of the inhibitory
response, since both charging and discharging occur at the
same point on the exponential function (Eq. (7)).
Calibrating this architecture requires specifying both the
capacitor, Cinhib, and the resistor, Rinhib, in the Inhibition
Logic Module (Fig. 5). Ideally, a sparse coding algorithm
should produce a stable, one-hot response to an input that
exactly matches any of the stored RFs, and should combine
several outputs when representing inputs that do not match
a stored RF exactly. For simplicity, we focused on tuning the
inhibitory components of the network to an input that matches
the stored conductance of an RF, similarly to Algorithms 1
and 2. Additionally, we make room for inhibition in the spike
cycle by using a neuron capacitance of Ccb = f(C), where C
is from Eq. (6) and f(C) is an arbitrary function with value
0 < f(C) < C. Using Rcb as the equivalent resistance of
the memristive device used to charge the inhibitory force, and
Rinhib as the resistance in the Inhibition Logic Module, we
can write a few equations to describe the inhibition voltage
Vi for a specific input i both before an output spiking event
(Vi,pre) and after an output spiking event (Vi,post):
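$$
\begin{aligned}
A &= \frac{1}{R_{cb} C_{inhib}}, \\
B &= \frac{K_i}{R_{inhib} C_{inhib}}, \\
V_{i,pre} &= V_{i,0}\, e^{-t_{fire} B}, \\
V_{i,post} &= V_{cc} + \left(V_{i,pre} - V_{cc}\right) e^{-t_{spike} A},
\end{aligned}
\tag{7}
$$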
where tspike is the duration of an output spike, Ki is the
portion of the time that the input being tracked is active, and
Vi,0 is the voltage after an output spike. For a stable system
with a uniform firing rate, Vi,0 = Vi,post, and we are left with:
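$$
\begin{aligned}
V_{i,0} &= V_{cc} + \left(V_{i,0}\, e^{-t_{fire} B} - V_{cc}\right) e^{-t_{spike} A} \\
        &= \frac{V_{cc}\left(1 - e^{-t_{spike} A}\right)}{1 - e^{-t_{fire} B - t_{spike} A}}.
\end{aligned}
\tag{8}
$$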
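The fixed point in Eq. (8) can be checked by repeatedly applying the two steps of Eq. (7), one discharge over tfire followed by one charge over tspike, until the inhibition voltage stops changing. In this sketch the component values (Rcb, Rinhib, Cinhib, Ki) and Vcc = 0.7 V are hypothetical and chosen only so that the exponents are of order one; tfire and tspike follow Section III-D.

```python
import math

def steady_state_inhibition(vcc, a, b, t_fire, t_spike):
    """Closed-form post-spike inhibition voltage from Eq. (8)."""
    return (vcc * (1.0 - math.exp(-t_spike * a))
            / (1.0 - math.exp(-t_fire * b - t_spike * a)))

def iterate_inhibition(vcc, a, b, t_fire, t_spike, cycles=200, v0=0.0):
    """Apply Eq. (7) once per output spike: discharge, then charge."""
    v = v0
    for _ in range(cycles):
        v_pre = v * math.exp(-t_fire * b)                   # input spikes drain
        v = vcc + (v_pre - vcc) * math.exp(-t_spike * a)    # output spike charges
    return v

VCC = 0.7          # volts (assumed)
R_CB = 1e5         # ohms, hypothetical crossbar device resistance
R_INHIB = 5e4      # ohms, hypothetical discharge resistor
C_INHIB = 10e-15   # farads, hypothetical inhibition capacitor
K_I = 0.2          # hypothetical input duty cycle
T_FIRE, T_SPIKE = 0.8e-9, 0.2e-9

A = 1.0 / (R_CB * C_INHIB)
B = K_I / (R_INHIB * C_INHIB)

closed = steady_state_inhibition(VCC, A, B, T_FIRE, T_SPIKE)
iterated = iterate_inhibition(VCC, A, B, T_FIRE, T_SPIKE)
print(f"Eq. (8): {closed:.6f} V, iterated Eq. (7): {iterated:.6f} V")
```

Because each cycle shrinks the distance to the fixed point by a factor of e^{-tfireB - tspikeA}, the iteration should converge regardless of the starting voltage.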
With the inhibition voltage after a spike defined, one issue
remains: so long as Vi,0 > Vcc/2, the actual time between
output spikes will no longer match the tfire calculated without
inhibition. There is always a period of time during which input
spikes are inhibited, inflating tfire. We label this period of
time as tinhib, and rewrite tfire as tinhib + tcollect, allowing
Eq. (8) to be rewritten and a second equation for Vi,0 to be
written by integrating backwards to Vi,0 from Vcc/2. These two
equations are then combined to make a single equality, the
solution of which indicates adequate values for Rinhib and Cinhib:
TABLE I: Example Parameters for Inhibited Network
Row  N    Rfavg  Gmin (µS)  Gmax (µS)  Vfire (mV)  Ccb (fF)
1    192  0.40   4.8        19         87          1200
2    192  0.40   0.48       1.9        87          120
3    192  0.40   0.048      0.19       87          12
4    192  0.40   0.048      1.9        130         120
5    192  0.40   0.048      19         140         1200
6    48   0.40   4.8        19         87          290
7    48   0.60   4.8        19         120         430
8    48   0.80   4.8        19         130         580
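$$
\begin{aligned}
V_{i,0} &= \frac{V_{cc}\left(1 - e^{-t_{spike} A}\right)}{1 - e^{-t_{collect} B - t_{spike} A}}, \\
V_{i,0} &= \frac{V_{cc}}{2}\, e^{t_{inhib} B}, \\
\frac{V_{cc}}{2}\, e^{t_{inhib} B} &= \frac{V_{cc}\left(1 - e^{-t_{spike} A}\right)}{1 - e^{-t_{collect} B - t_{spike} A}}.
\end{aligned}
\tag{9}
$$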
Unfortunately, this formulation leaves two new variables,
tinhib and tcollect. Additionally, were we to use the original
capacitance calculated in Eq. (6), we would miss the desired
tfire due to the added time for inhibition. To solve all of
these problems, we set Ccb = f(C) = C/2. tcollect is then
solved for using the new neuron capacitance Ccb and Q1, Q2
from Algorithm 1 using Rfstored = Rfinput = Rfavg.
tinhib is then solved by subtracting tcollect from tfire. The
remaining variable, Rinhib, is solved for by taking the log
of both sides of the above equality (Eq. (9)), squaring the
difference, and minimizing the resulting function via Python’s
scipy.optimize.minimize, ensuring a near-zero result [32].
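A minimal sketch of solving Eq. (9) for Rinhib follows. It is not the procedure used in this work: instead of minimizing the squared log-difference with scipy.optimize.minimize, it runs a plain bisection on the same log-difference, which changes sign exactly once as Rinhib grows. The values of Rcb, Cinhib, Ki, and Vcc are hypothetical. With Ccb = C/2, Eq. (6) is linear in capacitance, so tcollect = tfire/2 and tinhib = tfire/2.

```python
import math

def log_residual(r_inhib, vcc, k_i, r_cb, c_inhib, t_inhib, t_collect, t_spike):
    """log(left side) - log(right side) of Eq. (9)."""
    a = 1.0 / (r_cb * c_inhib)
    b = k_i / (r_inhib * c_inhib)
    lhs = math.log(vcc / 2.0) + t_inhib * b
    rhs = (math.log(vcc * (1.0 - math.exp(-t_spike * a)))
           - math.log(1.0 - math.exp(-t_collect * b - t_spike * a)))
    return lhs - rhs

def solve_r_inhib(lo=1.0, hi=1e9, iters=200, **params):
    """Bisection in log-space; the residual is positive at lo, negative at hi."""
    if log_residual(lo, **params) <= 0 or log_residual(hi, **params) >= 0:
        raise ValueError("root is not bracketed by [lo, hi]")
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if log_residual(mid, **params) > 0:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

T_FIRE, T_SPIKE = 0.8e-9, 0.2e-9
params = dict(
    vcc=0.7,               # volts (assumed)
    k_i=0.2,               # hypothetical input duty cycle
    r_cb=1e5,              # ohms, hypothetical
    c_inhib=10e-15,        # farads, hypothetical
    t_collect=T_FIRE / 2,  # follows from Ccb = C/2
    t_inhib=T_FIRE / 2,    # tfire - tcollect
    t_spike=T_SPIKE,
)
r_inhib = solve_r_inhib(**params)
print(f"Rinhib = {r_inhib:.0f} ohm, "
      f"residual = {log_residual(r_inhib, **params):.2e}")
```

Bisection is used here only because the residual is monotonic in Rinhib, which makes the root easy to bracket; a general-purpose minimizer applied to the squared residual should arrive at the same value.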
Examples of the results from Algorithm 2 plus the inhibition
transformations (Ccb = f(C)) can be seen in Table I. Notably,
N = 192 corresponds to an input image of dimension
8×8×3, while N = 48 corresponds to an input dimension of
4×4×3. Row 1 is similar to the settings used in most of our
experiments. While Ccb = 1200 fF is significant, this number
could be greatly reduced by future memristive technologies
with greater resistance (rows 2 and 3). Rows 4 and 5 highlight
that a higher ratio of Gmax to Gmin results in a higher
Vfire, which would be helpful to overcome the comparator’s
input offset voltage and would allow the algorithm to better
represent zero weights in each neuron’s RF; both of these
would increase the algorithm’s effectiveness. A lower Gmax
may be artificially imposed on the network if the circuit
designer wants less capacitance and is willing to sacrifice some
of the accuracy that comes from a high Gmax to Gmin ratio.
Rows 6 through 8 demonstrate the effects of fewer inputs,
and of varying Rfavg , the expected average stored RF in the
network. As written, Rfavg is also treated as the average input
to the network; for very low Gmax to Gmin ratios, this might
not make sense, and the actual average input value should be
added as a separate input to Algorithm 2 and used in the final
calculation of Q1 and Q2 to correct for the difference.
To validate this network design, we investigated values
other than Rfavg for Rfstored and Rfinput when
optimizing the inhibitory response. Fig. 6 shows the
results: while different combinations require different
Fig. 6: Values of Rinhib needed to achieve the desired spike rate with different
Rfstored (horizontal axis) and Rfinput (vertical axis) values (the spike rate scales
with the ratio of Rfinput over Rfstored). Ideally, the resulting plot would be flat,
indicating a single value of Rinhib is sufficient for all cases. Since it is not flat,
areas with larger than the chosen Rinhib will spike slower than expected, and areas
with a smaller value will spike faster. In practice, we use Rfavg for both
(the blue line, Rfstored = Rfinput); receptive fields with a high stored value and a
low input will under-spike, which should not be an issue as those regions should be
better-covered by another neuron.
values of Rinhib to be completely accurate, choosing a single,
median value works well in practice.
C. Training
The networks we used were trained using the ADADELTA
algorithm in tandem with Oja’s rule across two epochs of the
training data, an identical approach to our prior work [8], [13],
[33]. When considering the reconstructions for Oja’s rule, we
used the ratio of the conductance of each memristive device
to Gmax, the maximum expected conductance of a crossbar
device. This approach limited the minimum representation of
each input element in an RF to the inverse of the conductance
on/off ratio of the memristive device. For our experiments, we
used the Yang et al. device, which featured a conductance on/off
ratio of around 4 at 0.7 V [17]. We also tried training without
this limitation (allowing the learned weight to drop all the way
to 0, even though the device’s relative conductance would be set
to 0.25), but did not find such a change to impact accuracy, although
it did affect the RMSE between the input and the reconstruction.
Since the logical minimum does not affect the programmed
conductance, this makes sense: the resulting sparse code is
unchanged. The benefit of training with a non-zero minimum
representable value is that the training could be done using
only the memristive crossbar, without supplemental memory.
Homeostasis was used during training to encourage the
network to use all available neurons, similar to prior work
by Querlioz et al. [34]. If a neuron had not produced an
output spike after several patches, Vfire was lowered for that
neuron to encourage it to spike. This behavior was disabled
for evaluating accuracy and RMSE.
For this work, all conductances were represented as analog
values. We have conducted prior research that assumed a lower
resolution of conductances would be achievable [8]. Currently
available literature has shown that memristors might be trained
within 1% of a target resistance [35], which is much better
than the 4-bit resolution needed for good performance with
neuromorphic algorithms [8], [26]. Note that a 4-bit resolution
corresponds to ±3.1% write accuracy.
D. Models Used for Power and Accuracy Comparisons
The SSLCA was simulated algorithmically based on the
above equations and algorithms. The simulator was written as
a hybrid event/time-based simulator based on the maximum
of the next predicted spiking event and a small window of
time (2 × 10^{-18} s). Traces from this simulator are shown in
Fig. 3. An identical setup was used to produce that figure as the
standard setup for the MNIST and CIFAR experiments, with
the exception that Fig. 3 only used a network with 4 inputs
and 2 outputs. Even so, the function of the network is identical
for larger networks. Unless otherwise specified, our algorithm
was configured to collect an average of 10 spikes per exposed
image, based on Fig. 7. Accuracy was computed with a Single-
Layer Perceptron (SLP) network that was trained to associate
resulting sparse codes with the category that generated them.
This setup is efficient to compute, but does not rival the
accuracy of a state-of-the-art deep learning architecture. A
deep learning classifier was investigated in Section IV-A3.
While crossbar and capacitor power were calculated through
these simulations, comparator power for the column headers
was derived by simulating the 5 GHz comparator from Xu
et al. 2011 at 4 GHz using a 0.7 V power supply, implemented
with 45 nm CMOS transistors using the Predictive Technology
Model published by Zhao et al. in 2006 [36], [37]. It was
found that, per column, this setup added 2.2 µW.
Since we used Xu et al.’s comparator at 4 GHz [36], we
configured the networks for an average spike accumulation
period (tfire) of 0.8 ns and an output spike duration (tspike)
of 0.2 ns. Thus, the 10 average output spikes occur roughly
once every tfire + tspike = 1 ns. Regardless of the actual
number of spikes, the algorithm stops after 10 ns, and moves
on to the next image. Input spikes were considered with a
firing time of 0.4 ns and a maximum active duty cycle of
Kmax = 0.5 unless otherwise noted. Assuming a data set
with an average input intensity of 0.5, which is similar to the
average of CIFAR-10, this means that each input spikes, on
average, once every 0.4/(0.5 Kmax) = 1.6 ns. Coupled with the time
represented by each output spike, tfire = 0.8 ns, we infer that
each output spike experienced spikes from only half of the
active inputs, on average. Note that the input spike generation
method is not important so long as the expectation of the input
voltage is maintained. Our model used a simple uniform-noise-
driven model to produce gaps between spikes such that the
expected voltage on the spiking line was proportional to the
corresponding image element’s intensity.
E. Example Code Availability
The simulation implementation used in this work was made
available on GitHub at https://github.com/wwoods/tlab_sslca.
IV. RESULTS
The SSLCA, SLCA, and LCA were tested with two different
data sets to demonstrate the relative performance of the
SSLCA. Reported RMSE values were generated as though
zero were representable, and accuracies were from an SLP
(discussed in Sections III-C and III-D). Experiments were run
either 12 times, or until ±10 % accuracy was achieved with
95 % confidence as per [38].
To show that our assumptions and simplifications did not
result in significantly worse accuracy than the algorithms from
which the SSLCA was derived, all results were compared with
both the LCA and the SLCA of Shapero et al. [16]. An accuracy
comparison across different numbers of output spikes can be
seen in Fig. 7. The LCA implementation is from equation
(3.1) of Rozell et al.’s paper [12]; the SLCA implementation
consisted of equations (5)-(7) in Shapero et al.’s paper [16].
Note that only the outputs of the SLCA network are spiking,
while its inputs are constant voltages. Our work dealt with
both spiking inputs and outputs.
Power numbers for the LCA come from (13) of Shapero
et al.’s 2012 work [18] and scale as O(N√N). Power num-
bers for the SLCA come from Shapero et al.’s 2013 work [16],
and as that work included no built-in Vector Matrix Multiplier
(VMM) as our algorithm does, we added the power from a
memristor-based VMM to its figures. The throughputs of each
of those architectures were several orders of magnitude lower
than the SSLCA’s (Fig. 1).
A. CIFAR-10
The first dataset, CIFAR-10, consisted of 60 000 32 × 32
RGB images, each containing one of 10 classes of objects
[15]. For faster simulation and to demonstrate the scalability
of each algorithm, these were scaled down to both 3× 3 and
8×8. As the CIFAR-10 dataset contains equal numbers of each
class, a simple accuracy was used to evaluate each algorithm’s
abilities.
1) Accuracy: Compared to an optimal, analog implementa-
tion of the LCA, the SSLCA with inhibition matched perfor-
mance on the 8×8 rescale of CIFAR-10 with a 3 % relative loss
in accuracy (33 % vs 32 %; Fig. 8). The uninhibited SSLCA
always produced a worse reconstruction than its inhibited
counterpart, although for low Rfavg (and correspondingly a
higher number of spikes per patch) its classification accuracy
was better with the simple SLP classifier. The trained network
had an average spike count of 8 even though the architecture
was configured for 10 spikes. The spike duty cycles were
Kin = 0.5 and Kout = 0.2, where Kin is the maximum
duty cycle of input spikes (and will be scaled by each input’s
intensity), and Kout is the expected duty cycle of the output
spikes. That is, each output spike lasts for Kout(tfire + tspike).
The performance seen on the 3 × 3 and 8 × 8 rescales is
compared in Fig. 9. In both instances, the performance of the
LCA is approached by the SSLCA. However, the value of
Rfavg that optimizes accuracy is not obvious based on the
problem’s statistics; using the dataset average works well for
the 3 × 3 case, but the 8 × 8 case requires a smaller Rfavg
Fig. 7: Comparison of the LCA [12], SLCA [16], and the SSLCA on
CIFAR-10 scaled to 8 × 8 (panels: RMSE vs. number of output spikes,
% correct vs. number of output spikes, and power in W per architecture);
suffixes indicate completeness (2× indicates 384
neurons, while 0.5× indicates 96 neurons). While the SLCA achieves lower
RMSE with significantly more spikes (around 100), for practical numbers of
spikes the SSLCA produced much better results. The LCA performed better
classification with fewer output neurons because it had slightly less output
activity, which with a shallow classifier is more effective. A lower RMSE is
more important for deep learning, seen in Fig. S3. While the LCA displayed
promising power statistics for this problem, its throughput was four orders of
magnitude smaller.
in order to encourage more output activity, which translates
into higher accuracy. At values of Rfavg approaching the
memristive device’s minimum, the algorithm breaks down, as
seen by the decreasing accuracy. This result can be explained
through Eq. (6) and Algorithm 2: small Rfavg results in
a lower Q1 and thus a smaller C, reducing the smoothing
of input spike activity and in turn producing less consistent
patterns of output spikes.
Another facet investigated was how different K factors
(spike duty cycles) affected the overall classification accuracy
of the system. The result is shown in Fig. S1; generally, a
higher input duty cycle Kin performed better, and a lower
Fig. 8: SSLCA accuracy targeting 10 spikes on CIFAR-10 rescaled to 8 × 8
with and without inhibition, compared to LCA (panels: % correct, RMSE, and
number of output spikes, each vs. Rfavg). Lower Rfavg tended to
produce lower RMSE due to increased activity in the resulting sparse code,
and increased spike count (since the input intensity is greater than the
target, spikes happen more frequently than calibrated). Inhibition ubiquitously
reduced the RMSE, although with an SLP, its classification accuracy was less
than the uninhibited version for darker receptive fields. See Section IV-A3 for
the impact of RMSE when using a deep classifier.
output duty cycle Kout performed better. Intuitively this makes
sense: larger duty cycles for input spikes means that more
spikes are expected to work together when forming a single
output spike; smaller duty cycles for output spikes means more
time spent collecting input spikes, and thus each output spike
represents a better average of the input spikes triggering it.
Device variability was also considered. Previous work has
demonstrated significant variance from one read to the next
[39]. To test how the SSLCA performed with imperfect hard-
ware, we implemented three types of conductance deviations:
read deviation, write deviation using offline training, and
write deviation with online training. Read deviation was re-
calculated after every output spike to better simulate the time-
varying nature of read randomization, and varied the effective
conductance of a device uniformly by ±0 % to 80 % (a
standard deviation of 0 % to 46 %). Write deviation with offline
training consisted of training the model without variance, and
then varying the conductance uniformly by ±0 % to 180 %
(not allowed to drop below 0 S; a standard deviation of 0 %
to 104 %). Write deviation with online training was applied
after each application of Oja’s rule, and modified the target
conductances uniformly by ±0 % to 30 % (a standard deviation
of 0 % to 17 %).
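The standard deviations quoted above follow from the variance of a uniform distribution: a value drawn uniformly from ±w has a standard deviation of w/√3. The short check below reproduces the three upper bounds; the helper name `uniform_std` is illustrative, and the clipping at 0 S applied in the offline case is not modeled.

```python
import math

def uniform_std(half_width):
    """Standard deviation of a uniform distribution on [-w, +w]."""
    return half_width / math.sqrt(3.0)

for label, half_width in [("read", 0.80),
                          ("write, offline", 1.80),
                          ("write, online", 0.30)]:
    print(f"{label}: +/-{half_width:.0%} uniform -> "
          f"std {uniform_std(half_width):.0%}")
```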
These results can be seen in Fig. 10. Neither read variability
nor offline-trained write variability was found to have a
significant impact. For online training, write variability could
Fig. 9: A look at the difference between CIFAR-10 scaled to 3 × 3 and 8 × 8
(panels: % correct and number of active outputs, each vs. Rfavg).
In both cases, the SSLCA approached the LCA’s accuracy. Unfortunately, the
required setting of Rfavg to maximize accuracy is not intuitive. The lower
plot shows the sparsity of the output for each configuration; like the original
LCA’s λ threshold, a combination of the Rfavg parameter of the SSLCA
and the number of spikes collected may be used to control the sparsity of the
output.
be tolerated up to 3 %. This result is satisfactory for 4-bit
learning, as described in Section III-C. Should better vari-
ability resistance be required, prior work on imperfect weight
updates has indicated that sensitivity to these deviations might
be further mitigated with a more aggressive training regimen
that deliberately changes the magnitude of weight updates for
greater effect [8].
2) Power: The inhibited SSLCA exhibited extremely low
power consumption on the CIFAR-10 task scaled to 8 × 8
with 128 neurons (2× completion); at the optimal Rfavg =
0.43, the consumption was just 1.77 pJ/input (Fig. 11) with a
throughput of 100 MOps/s. Compared with prior work such
as Knag et al. [25], whose lowest energy consumption was
48 pJ/input, this was a 96 % reduction in energy consumption
for 180× the throughput during inference. At their high-throughput
setting (310 MHz), the SSLCA exhibited a 99 % reduction
in energy consumption with a still substantially improved 21×
throughput.
Spiking architectures are often considered to produce power
savings, though the extent of these savings has been a topic
of discussion for some time [21]; we investigated that claim
in Fig. S2. Except for very large duty cycles, the spiking
architecture’s crossbar used less power than the non-spiking,
voltage-scaled crossbar. With a spiking implementation like
the SSLCA, where input spikes are suppressed during an
output spike, we would have expected the spiking to con-
sume less power so long as (1 − Kout)Kin < Rfinput.
This is a result of average power scaling with the square
Fig. 10: The effects on the SSLCA of conductance variability during each
read cycle (the period of time between two output spikes), during each
write cycle (online training), or when a weight matrix learned offline is
written to the memristive crossbar (offline training); each panel plots
normalized accuracy against % deviation. Our results showed that
unmitigated write deviations become serious for online algorithm stability
after 3%. However, using offline training or modifying the training approach
as previously reported helps significantly [8]. For CIFAR-10 3 × 3, 8 × 8,
and MNIST 14 × 14, Rfavg = 0.46, 0.425, 0.35, respectively. Accuracies
were normalized based on performance without deviations.
of voltage versus linearly with a duty cycle. The SSLCA
surpasses this expectation due to the additional input spike
suppression implemented through the inhibition mechanism.
Interestingly, the standard deviation for the SSLCA’s power
was also substantially lower, probably as a result of columns
in the SSLCA not being grounded, unlike the LCA, which
sinks all current into a virtual ground [8], [17].
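The power condition above can be illustrated with an idealized single-conductance model. The sketch below assumes a full-voltage spike train whose duty cycle is Kin times the normalized input x and which is gated off during output spikes, compared against a constant input whose voltage is scaled by x. Treating the right-hand side of the inequality as the mean normalized input x is an assumption, as are all function names and the example values of Kin and Kout. The model ignores the neuron-side voltage, so it does not capture the ungrounded columns of the SSLCA.

```python
def spiking_power(x, k_in, k_out, v=1.0, g=1.0):
    """Average power of full-voltage spikes through conductance g.

    The duty cycle is k_in * x, reduced by the fraction of time k_out
    during which output spikes suppress the inputs (idealized model).
    """
    duty = (1.0 - k_out) * k_in * x
    return duty * v * v * g


def voltage_scaled_power(x, v=1.0, g=1.0):
    """Power of a constant input whose voltage is scaled to x * v."""
    return (x * v) ** 2 * g


def spiking_is_cheaper(x, k_in, k_out):
    """True when (1 - k_out) * k_in * x < x**2, i.e. (1 - k_out) * k_in < x."""
    return spiking_power(x, k_in, k_out) < voltage_scaled_power(x)


# Hypothetical duty cycles; 0.47 and 0.13 are the average input values
# quoted in the text for CIFAR and MNIST, respectively.
print(spiking_is_cheaper(0.47, k_in=0.4, k_out=0.1))
print(spiking_is_cheaper(0.13, k_in=0.4, k_out=0.1))
```

In this simplified model, power grows linearly with the duty cycle but quadratically with the scaled voltage, which is why the comparison reduces to the inequality stated in the text.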
3) Information Retention for Deep Learning: While an
SLP might be used in practice due to the simplicity of its
implementation, it does not adequately express the depth of
the information contained in the input dataset. To determine
how much useful information was retained by both the LCA
and SSLCA encodings, we used these architectures to encode
augmented, full-size 32 × 32 CIFAR-10 input images using
convolutions of different sizes. The convolved, sparse coded
input images were then passed as input to a state-of-the-art
deep learning architecture, the DenseNet-BC, presented by
Huang et al. in 2016 [40]. This network architecture consists
of a number of dense blocks that each halve the scale of the
input data; within each dense block are many more layers,
each accepting as input all previous layers within the dense
[Figure 11 plot: power (W) versus Rfavg. Legend: SSLCA w/ Inhibition, SSLCA Uninhibited.]
Fig. 11: Power for 8×8 CIFAR-10. Inhibition produced lower, more consistent
power consumption due to its suppression of input spikes. Higher values
of Rfavg , which encourage the network to learn more conductive RFs,
consumed more power accordingly.
block. Using this setup with 3 dense blocks and parameters
L = 190, k = 40, Huang et al. achieved 96.54 % accuracy on
CIFAR-10 (with data augmentation) [40]. See Huang et al. for
further details on these parameters.
We tested our architecture by dividing the input CIFAR-10
image into non-overlapping patches of S × S, and encoding
each patch using either the LCA or the SSLCA. For example,
S = 4 implies that the 32× 32 CIFAR-10 image was broken
into 8× 8 non-overlapping regions of 4× 4; each region was
then sparse coded, and the resulting “image” consisting of
all such encodings was passed to the DenseNet. For the 2×
networks with S = 4, this means that rather than receiving
each image as a 32 × 32 × 3 spatial array, we passed in an
8 × 8 × 96 spatial array. For the 0.5× networks, the spatial
array passed would only have a depth of 24. The SSLCA was
configured with Rfavg = 0.45.
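The shape bookkeeping described above can be sketched as follows. The sketch assumes that completeness multiplies the S × S × 3 inputs of each patch to give the number of neurons, which reproduces the depths of 96 and 24 quoted in the text for S = 4; the function name and signature are illustrative only.

```python
def encoded_shape(S, completeness, image_size=32, channels=3):
    """Shape of the sparse-coded "image" handed to the classifier.

    Each non-overlapping S x S patch is replaced by one vector with
    completeness * (S * S * channels) entries, one per neuron.
    """
    if image_size % S != 0:
        raise ValueError("S must divide the image size evenly")
    patches_per_side = image_size // S
    inputs_per_patch = S * S * channels
    neurons = int(completeness * inputs_per_patch)
    return (patches_per_side, patches_per_side, neurons)


print(encoded_shape(4, 2.0))  # 2x network
print(encoded_shape(4, 0.5))  # 0.5x network
```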
In order to allow each DenseNet a similar amount of
expression for its classification, we parametrized the DenseNet
so that the final dense block would output a 4 × 4 spatial
array; the original paper’s final block output an 8 × 8 array.
To accomplish this, each DenseNet had a number of dense
blocks B = −1 + log2(32/S). To hold the number of computations
that each DenseNet performed roughly equivalent, we chose
L = 40, k = 12, and the number of filters on the initial
convolution before the first dense block was k0 = 6S2B rather
than 16. The limitation of this approach is that the number
of tunable parameters becomes significantly larger with larger
values of S, creating a greater potential for overfitting.
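To make the block count concrete, the sketch below evaluates the number of dense blocks, reading the expression as B = −1 + log2(32/S), and checks the resulting final spatial size. It assumes that the spatial scale is halved once between consecutive dense blocks (B − 1 times in total), consistent with the original three-block network reducing 32 × 32 to 8 × 8; the function names are illustrative.

```python
import math


def num_dense_blocks(S, image_size=32):
    """Number of dense blocks B = -1 + log2(image_size / S)."""
    blocks = -1 + math.log2(image_size / S)
    if blocks < 1 or blocks != int(blocks):
        raise ValueError("image_size / S must be a power of two of at least 4")
    return int(blocks)


def final_spatial_size(S, image_size=32):
    """Side length of the final dense block's output.

    Assumes one halving between consecutive dense blocks.
    """
    size = image_size // S
    for _ in range(num_dense_blocks(S, image_size) - 1):
        size //= 2
    return size


for S in (2, 4, 8):
    print(S, num_dense_blocks(S), final_spatial_size(S))
```

Under these assumptions, every tested patch size ends with a 4 × 4 spatial array, as intended in the text.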
Rather than 300 epochs with mini-batches of 64 samples, we
used 150 epochs and mini-batches of 32 samples to train these
networks. We trained using stochastic gradient descent, with
an initial learning rate of 0.1; after 75 epochs this was reduced
to 0.01, and after 112 epochs this was further reduced to 0.001.
Simulations were done with keras; the DenseNet implementa-
tion can be found at https://github.com/titu1994/DenseNet, and
keras can be found at https://github.com/fchollet/keras. Each
[Figure 12 panels: % compression versus S and % correct versus S, for S = 2, 4, 8. Legend: Average Pooling, LCA, SSLCA w/o Inhibition, SSLCA w/ Inhibition, Raw Pixels % Correct.]
Fig. 12: Comparison of each algorithm at different encoding scales S. Inhi-
bition always improved performance when using the deep classifier, without
significantly affecting compression. The LCA does not change significantly
due to a fixed λ threshold parameter; the SSLCA might achieve a similar effect
by collecting more spikes, but this would slow the algorithm’s throughput.
accuracy measurement was the result of a single trial, so some
stochasticity is embedded in the reported results. They are
nonetheless internally consistent. Results are shown in Figs. 12
and S3.
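The stepped learning rate described above can be written as a small schedule function. This is a sketch rather than the training script used for the experiments; it assumes zero-based epoch indices, so the first reduction applies once 75 epochs have completed.

```python
def learning_rate(epoch: int) -> float:
    """Step schedule: 0.1, then 0.01 after 75 epochs, then 0.001 after 112.

    `epoch` is assumed to be zero-based (0 .. 149 for a 150-epoch run).
    """
    if epoch < 75:
        return 0.1
    if epoch < 112:
        return 0.01
    return 0.001


# Learning rates at the boundaries of each phase.
print([learning_rate(e) for e in (0, 74, 75, 111, 112, 149)])
```

A function of this form could, for example, be passed to the `LearningRateScheduler` callback in keras.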
On the raw CIFAR-10 data, these conditions produced
a classification accuracy of 92 %. With the analog LCA at
S = 4, the DenseNet achieved a classification accuracy
of 82 % while compressing the data by 90 %. The SSLCA
produced an accuracy of 80 % with 92 % compression. In
the context of the deep classifier, we found a direct and
inverse relationship between the LCA’s RMSE and the
classifier’s accuracy (Fig. S3). Compression was calculated as
1 − (# active neurons × (log2(# neurons) + 4)) / (8 × W × H),
where 8 × W × H is the number of bits in the image. The numerator
represents the minimum number of bits to send a neuron index and its
4-bit spike count per active neuron.
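The compression measure defined above can be sketched as a short function. The parameter names and the example values are hypothetical; only the formula itself is taken from the text, and log2 is left unrounded, as in the stated expression.

```python
import math


def compression(active_neurons, total_neurons, width, height,
                spike_bits=4, bits_per_pixel=8):
    """Fraction of the image's bits saved by the sparse code.

    Each active neuron costs log2(total_neurons) bits for its index
    plus spike_bits bits for its spike count.
    """
    code_bits = active_neurons * (math.log2(total_neurons) + spike_bits)
    image_bits = bits_per_pixel * width * height
    return 1.0 - code_bits / image_bits


# Hypothetical example: 2 active neurons out of 128 for an 8 x 8 patch.
print(compression(active_neurons=2, total_neurons=128, width=8, height=8))
```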
For different values of S, the LCA maintained similar
compression factors, due to the threshold λ being held constant
with an increasing number of inputs, leading to more active
outputs (Fig. 12). In contrast, the SSLCA’s sparsity comes
from the number of spikes collected, which was fixed at
10 for all experiments. Thus, for larger patch sizes,
increasingly sparse representations were created, resulting in
lower accuracy but higher compression. These parameters
are all configurable and could be used to trade off between
accuracy and compression, but these values were chosen as
they produced roughly equivalent compression at S = 4. If
the input dataset has greater covariance, then the accuracy loss
would be lower for higher compression rates. Inhibition was
always beneficial with a deeper classifier, and each increase
in accuracy aligned with lower RMSE without exception.
Figure 12 also includes compression factors and accuracies for
downsampling the input image using average pooling across
groups of S × S pixels. This demonstrates a baseline image
compression technique against the sparse coding achieved by
[Figure 13 panels: % correct (color scale) over Rfavg and bias, and power (W) versus Rfavg. Legend: SSLCA w/o Bias, SSLCA w/ 0.35 Bias, LCA.]
Fig. 13: Results of MNIST scaled to 14×14. Since MNIST has a much lower
average input value than CIFAR, a bias needed to be applied to compensate for
the additional current lost from the neurons back into the crossbar (Eq. (5)).
Since a bias also increased the duty cycle of each input signal, more power
was consumed.
the LCA family. Generally, the LCA methods produced greater
accuracies at comparable, high levels of compression than the
average pooling method.
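For reference, the average pooling baseline mentioned above can be sketched in a few lines for a single-channel image. This is an illustration, not the code used for Fig. 12. The compression estimate assumes that each pooled value is stored at the same bit depth as an original pixel, which the text does not state.

```python
def average_pool(image, S):
    """Downsample a 2-D list of pixel values by averaging S x S groups."""
    height, width = len(image), len(image[0])
    if height % S or width % S:
        raise ValueError("S must divide both image dimensions")
    pooled = []
    for r in range(0, height, S):
        row = []
        for c in range(0, width, S):
            block = [image[r + i][c + j] for i in range(S) for j in range(S)]
            row.append(sum(block) / len(block))
        pooled.append(row)
    return pooled


def pooling_compression(S):
    """Fraction of values removed by S x S pooling (same bit depth assumed)."""
    return 1.0 - 1.0 / (S * S)


print(average_pool([[1, 3], [5, 7]], 2))
print(pooling_compression(4))
```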
B. MNIST
The second dataset, MNIST, consisted of 70 000 28 × 28
grayscale images, each containing a single, centered, hand-
written digit [14]. Again for faster simulations, this dataset
was scaled down to 14 × 14. The test set contains an equal
number of each digit class, so a simple accuracy was tabulated
for each algorithm.
1) Accuracy: The SSLCA as defined up to this point
performed notably worse on MNIST than the non-spiking
LCA (Fig. 13). Unlike CIFAR, which has an average input
value of 0.47, MNIST has an average input value of only
0.13. Since the SSLCA was designed deliberately to include a
leak current through the crossbar, this lower input intensity
could not sustain neuron charge reliably, leading to more
random patterns of input events being encoded in each output
spike. We found that applying a bias signal, which redefined
the duty cycle of each input signal from Kmax kinput to
Kmax(bias + (1 − bias)kinput), remedied this problem
while preserving the power gains of the SSLCA architecture.
For MNIST, we found that a bias of 0.35 boosted perfor-
mance from 77 % correct classification up to 84 %, versus a
performance of 88 % by an optimal, analog LCA. The MNIST
experiments’ responses to write deviations were not found to
be significantly different than the CIFAR experiments’, shown
in Fig. 10.
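The biased duty cycle described above can be sketched as follows. The function name, the range check, and the example value of Kmax are assumptions made for illustration; only the expression Kmax(bias + (1 − bias)kinput) comes from the text.

```python
def input_duty_cycle(k_input, k_max, bias=0.0):
    """Duty cycle of an input line for a normalized input value k_input.

    With bias = 0 this reduces to k_max * k_input; a positive bias raises
    the duty cycle of weak inputs while full-scale inputs stay at k_max.
    """
    if not 0.0 <= k_input <= 1.0:
        raise ValueError("k_input is expected to lie in [0, 1]")
    return k_max * (bias + (1.0 - bias) * k_input)


# Hypothetical k_max of 0.5; 0.13 is the average MNIST input value and
# 0.35 the bias reported in the text.
print(input_duty_cycle(0.13, k_max=0.5))
print(input_duty_cycle(0.13, k_max=0.5, bias=0.35))
```

In this form, a zero-valued input still spikes with duty cycle Kmax·bias, which is how the bias compensates for the leak current through the crossbar at low input intensities.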
2) Power: The power savings on MNIST, even with the
bias, were in line with those found for CIFAR: 0.26 pJ/input
for 100 MOps/s. Note that the increased power savings com-
pared to CIFAR (which consumed 1.77 pJ/input) were due to
the lower relative number of outputs to inputs: additional neu-
rons are more expensive than additional input lines (partially
due to the comparator, though mostly due to the crossbar).
While the cost of additional inputs is different from the cost
of additional columns, the SSLCA still demonstrates O(N)
scaling in both dimensions.
As seen in Fig. 13, a non-spiking approach might consume
less power on MNIST due to the low Rfavg of the dataset.
On the other hand, the power presented for the LCA does
not include inhibitory logic, unlike the SSLCA: it would be
difficult to include inhibition logic without closing the already-
narrow margin.
V. CONCLUSION
Our work demonstrated that memristive devices with a low
conductance ratio could be used in the design of a fast,
low-power sparse coding circuit, with in-situ learning, as
long as their conductances could be set within ±3 %. This
requirement matches the 4-bit resolution required by other
neuromorphic architectures. Our proposed circuit was both fast
and energy-efficient, improving upon a previously published
all-CMOS ASIC with 21× the throughput while using 99 %
less energy per input. The resulting sparse codes were also
shown to be of high quality. When evaluated with a state-
of-the-art deep learning network, our circuit demonstrated a
reduction in relative accuracy of only 2.4 % (to 80 % from
82 %) compared to an optimal, analog sparse coding algorithm.
Our circuit maintained this fidelity while compressing the
input data by 92 %. These figures are all affected by circuit
parameters that could be adjusted for higher accuracy and
lower compression. We showed that even datasets with low
input activity, such as MNIST, could be properly represented
through the use of a bias. The proposed SSLCA architecture
was demonstrated to be very resistant to device variations,
particularly when used with offline training. Sparse coding
algorithms such as the SSLCA could be used to greatly reduce
communication bandwidth between visual sensors and other
processing algorithms, such as deep-learning networks.
ACKNOWLEDGEMENT
The authors would like to thank Garrett Kenyon of Los
Alamos National Laboratories for helpful discussions concern-
ing the LCA. This work was supported by the National Science
Foundation under award # 1028378 and by DARPA under
award # HR0011-13-2-0015. The views expressed are those of
the author(s) and do not reflect the official policy or position of
the Department of Defense or the U.S. Government. Approved
for public release, distribution is unlimited.
REFERENCES
[1] A. Coates, A. Y. Ng, and H. Lee, “An analysis of single-layer networks
in unsupervised feature learning,” International Conference on Artificial
Intelligence and Statistics, pp. 215–223, 2011.
[2] H. B. Ammar, K. Tuyls, M. E. Taylor, K. Driessens, and G. Weiss,
“Reinforcement learning transfer via sparse coding,” Proceedings of the
11th International Conference on Autonomous Agents and Multiagent
Systems, no. Aamas, pp. 4–8, 2012.
[3] T. Masquelier and S. J. Thorpe, “Unsupervised learning of visual
features through spike timing dependent plasticity,” PLoS Computational
Biology, vol. 3, no. 2, pp. 0247–0257, 2007.
[4] C. Zamarreño-Ramos, L. A. Camuñas-Mesa, J. A. Pérez-Carrasco,
T. Masquelier, T. Serrano-Gotarredona, and B. Linares-Barranco, “On
Spike-Timing-Dependent-Plasticity, Memristive Devices, and Building
a Self-Learning Visual Cortex,” Frontiers in Neuroscience, vol. 5, no.
MAR, pp. 1–22, 2011.
[5] M. Zhu and C. J. Rozell, “Modeling Inhibitory Interneurons in Efficient
Sensory Coding Models,” PLOS Computational Biology, vol. 11, no. 7,
p. e1004353, 2015.
[6] D. Querlioz, W. S. Zhao, P. Dollfus, J.-O. Klein, O. Bichler, and
C. Gamrat, “Bioinspired networks with nanoscale memristive devices
that combine the unsupervised and supervised learning approaches,”
in Proceedings of the 2012 IEEE/ACM International Symposium on
Nanoscale Architectures - NANOARCH ’12, pp. 203–210. New York,
New York, USA: ACM Press, 2012.
[7] S. H. Jo, T. Chang, I. Ebong, B. B. Bhadviya, P. Mazumder, and
W. Lu, “Nanoscale Memristor Device as Synapse in Neuromorphic
Systems,” Nano Letters, vol. 10, no. 4, pp. 1297–1301, 2010.
[8] W. Woods, J. Bürger, and C. Teuscher, “Synaptic Weight States in a Lo-
cally Competitive Algorithm for Neuromorphic Memristive Hardware,”
IEEE Transactions on Nanotechnology, vol. 14, no. 6, pp. 945–953,
2015.
[9] M. Payvand and L. Theogarajan, “Exploiting local connectivity of
CMOL architecture for highly parallel orientation selective neuromor-
phic chips,” Proceedings of the 2015 IEEE/ACM International Sym-
posium on Nanoscale Architectures, NANOARCH 2015, pp. 187–192,
2015.
[10] C. H. Bennett, D. Chabi, T. Cabaret, B. Jousselme, V. Derycke, D. Quer-
lioz, and J. O. Klein, “Supervised learning with organic memristor
devices and prospects for neural crossbar arrays,” Proceedings of the
2015 IEEE/ACM International Symposium on Nanoscale Architectures,
NANOARCH 2015, pp. 181–186, 2015.
[11] B. A. Olshausen and C. J. Rozell, “Sparse codes from memristor
grids,” Nature Publishing Group, vol. 12, no. 8, pp. 722–723, 2017.
[12] C. J. Rozell, D. H. Johnson, R. G. Baraniuk, and B. A. Olshausen,
“Sparse coding via thresholding and local competition in neural
circuits.” Neural Computation, vol. 20, no. 10, pp. 2526–63, 2008.
[13] E. Oja, “Simplified neuron model as a principal component analyzer,”
Journal of Mathematical Biology, vol. 15, no. 3, pp. 267–273, 1982.
[14] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based
learning applied to document recognition,” Proceedings of the IEEE,
vol. 86, no. 11, pp. 2278–2324, 1998.
[15] A. Krizhevsky, “Learning Multiple Layers of Features from Tiny Im-
ages,” . . . Science Department, University of Toronto, Tech. . . . , vol. 44,
no. 8, pp. 1–60, 2009.
[16] S. Shapero, C. Rozell, and P. Hasler, “Configurable hardware integrate
and fire neurons for sparse approximation,” Neural Networks, vol. 45,
pp. 134–143, 2013.
[17] W. Woods, M. M. A. Taha, S. J. Dat Tran, J. Burger, and C. Teuscher,
“Memristor panic - A survey of different device models in crossbar
architectures,” Proceedings of the 2015 IEEE/ACM International Sym-
posium on Nanoscale Architectures, NANOARCH 2015, pp. 106–111,
2015.
[18] S. Shapero, A. S. Charles, C. J. Rozell, and P. Hasler, “Low Power
Sparse Approximation on Reconfigurable Analog Hardware,” IEEE
Journal on Emerging and Selected Topics in Circuits and Systems,
vol. 2, no. 3, pp. 530–541, 2012.
[19] W. Woods and C. Teuscher, “Approximate vector matrix multiplication
implementations for neuromorphic applications using memristive cross-
bars,” in Proceedings of the 2017 IEEE/ACM International Symposium
on Nanoscale Architectures (NANOARCH), pp. 103–108. IEEE, 2017.
[20] T. Serrano-Gotarredona, T. Masquelier, T. Prodromakis, G. Indiveri,
and B. Linares-Barranco, “STDP and STDP variations with memristors
for spiking neuromorphic learning systems.” Frontiers in neuroscience,
vol. 7, no. February, p. 2, 2013.
[21] W. Maass, “To Spike or Not to Spike: That Is the Question,”
Proceedings of the IEEE, vol. 103, no. 12, pp. 2219–2224, 2015.
[22] S. Habenschuss, Z. Jonke, and W. Maass, “Stochastic Computations
in Cortical Microcircuit Models,” PLoS Computational Biology, vol. 9,
no. 11, p. e1003311, 2013.
[23] T. J. Hamilton, S. Afshar, A. van Schaik, and J. Tapson, “Stochastic
Electronics: A Neuro-Inspired Design Paradigm for Integrated Circuits,”
Proceedings of the IEEE, vol. 102, no. 5, pp. 843–859, 2014.
[24] J. Zylberberg, J. T. Murphy, and M. R. DeWeese, “A Sparse Coding
Model with Synaptically Local Plasticity and Spiking Neurons Can
Account for the Diverse Shapes of V1 Simple Cell Receptive Fields,”
PLoS Computational Biology, vol. 7, no. 10, p. e1002250, 2011.
[25] P. Knag, J. K. Kim, T. Chen, and Z. Zhang, “A Sparse Coding Neural
Network ASIC With On-Chip Learning for Feature Extraction and
Encoding,” IEEE Journal of Solid-State Circuits, vol. 50, no. 4, pp.
1070–1079, 2015.
[26] T. Pfeil, T. C. Potjans, S. Schrader, W. Potjans, J. Schemmel,
M. Diesmann, and K. Meier, “Is a 4-Bit Synaptic Weight Resolution
Enough? Constraints on Enabling Spike-Timing Dependent Plasticity
in Neuromorphic Hardware,” Frontiers in Neuroscience, vol. 6, no.
JULY, pp. 1–19, 2012.
[27] J. K. Kim, P. Knag, T. Chen, and Z. Zhang, “A 640M pixel/s 3.65mW
Sparse Event-Driven Neuromorphic Object Recognition Processor with
On-Chip Learning,” no. 4, pp. 50–51, 2015.
[28] K. S. Burbank, “Mirrored STDP Implements Autoencoder Learning in
a Network of Spiking Neurons,” PLOS Computational Biology, vol. 11,
no. 12, p. e1004566, 2015.
[29] P. M. Sheridan, C. Du, and W. D. Lu, “Feature Extraction Using
Memristor Networks,” IEEE Transactions on Neural Networks and
Learning Systems, vol. 27, no. 11, pp. 2327–2336, 2016.
[30] P. M. Sheridan, F. Cai, C. Du, W. Ma, Z. Zhang, and W. D. Lu,
“Sparse coding with memristor networks,” Nature nanotechnology,
vol. 12, no. 8, pp. 784–790, 2017.
[31] P. T. P. Tang, T. Lin, and M. Davies, “Sparse coding by spiking
neural networks: Convergence theory and computational results,” arXiv
preprint, vol. abs/1705.05475, 2017.
[32] E. Jones, T. Oliphant, P. Peterson et al., “SciPy: Open source scientific
tools for Python,” 2001–, [Online; accessed 2016-08-17].
[33] M. D. Zeiler, “ADADELTA: An Adaptive Learning Rate Method,”
arXiv, p. 6, 2012.
[34] D. Querlioz, O. Bichler, P. Dollfus, and C. Gamrat, “Immunity to device
variations in a spiking neural network with memristive nanodevices,”
IEEE Transactions on Nanotechnology, vol. 12, no. 3, pp. 288–295,
2013.
[35] F. Alibart, L. Gao, B. D. Hoskins, and D. B. Strukov, “High precision
tuning of state for memristive devices by adaptable variation-tolerant
algorithm.” Nanotechnology, vol. 23, no. 7, p. 075201, 2012.
[36] Y. Xu, L. Belostotski, and J. W. Haslett, “Offset-corrected 5GHz
CMOS dynamic comparator using bulk voltage trimming: Design and
analysis,” in 2011 IEEE 9th International New Circuits and systems
conference, pp. 277–280. IEEE, 2011.
[37] W. Zhao and Y. Cao, “New Generation of Predictive Technology Model
for Sub-45nm Design Exploration,” in 7th International Symposium on
Quality Electronic Design (ISQED’06), pp. 585–590. IEEE, 2006.
[38] M. R. Driels and Y. S. Shin, Determining the Number of Iterations for
Monte Carlo Simulations of Weapon Effectiveness. Naval Postgraduate
School, 2004.
[39] R. Degraeve, A. Fantini, N. Raghavan, L. Goux, S. Clima,
B. Govoreanu, A. Belmonte, D. Linten, and M. Jurczak, “Causes
and consequences of the stochastic aspect of filamentary RRAM,”
Microelectronic Engineering, vol. 147, pp. 171–175, 2015.
[40] G. Huang, Z. Liu, K. Q. Weinberger, and L. van der Maaten, “Densely
Connected Convolutional Networks,” arXiv preprint, pp. 1–12, 2016.
SUPPLEMENTARY MATERIAL
[Figure S1 plot: % correct (color scale) over Kin and Kout.]
Fig. S1: CIFAR-10 spike accuracy with different spike duty cycles. High duty
cycles for input spikes and low duty cycles for output spikes are the most
accurate, but require more power (Fig. S2).
[Figure S2 panels: RMSE versus Kin, % correct versus Kin, and power (W) versus Kin. Legend: SSLCA, LCA.]
Fig. S2: Power and accuracy trade-offs on CIFAR-10 with varied Kin. LCA
power shown represents only a voltage-scaled crossbar, to directly compare
spiking and non-spiking approaches. Spiking always consumed less power
than a voltage-scaling approach, a combination of the dataset having a high
average input and the spiking algorithm utilizing inhibition of input signals
(see Fig. 11 for the effects of inhibition on power consumption).
[Figure S3 panels: RMSE and % correct for the LCA, SSLCA w/o Inhibition, and SSLCA w/ Inhibition. Legend: 0.5×, 2.0×, Raw pixels.]
Fig. S3: Comparison of different algorithms and different completenesses at
S = 4. The SSLCA is capable of matching the accuracy of the LCA when a
deeper classifier is used.
