Exploiting the Short-term to Long-term Plasticity Transition in
  Memristive Nanodevice Learning Architectures by Bennett, Christopher H. et al.
Christopher H. Bennett∗, Selina La Barbera†, Adrien F. Vincent∗, Jacques-Olivier Klein∗,
Fabien Alibart†, and Damien Querlioz∗
∗Institut d’Électronique Fondamentale, Univ. Paris-Sud, CNRS, 91405 Orsay, France
Email: christopher.bennett@u-psud.fr
†Institut d’Électronique, Microélectronique et Nanotechnologies, UMR CNRS 8520, Villeneuve d’Ascq, France.
Abstract—Memristive nanodevices offer new frontiers for
computing systems that unite arithmetic and memory operations
on-chip. Here, we explore the integration of electrochemical
metallization cell (ECM) nanodevices with tunable filamentary
switching in nanoscale learning systems. Such devices offer a
natural transition between short-term plasticity (STP) and long-
term plasticity (LTP). In this work, we show that this property
can be exploited to efficiently solve noisy classification tasks. A
single crossbar learning scheme is first introduced and evaluated.
Perfect classification is possible only for simple input patterns,
within critical timing parameters, and when device variability
is weak. To overcome these limitations, a dual-crossbar learning
system partly inspired by the extreme learning machine (ELM)
approach is then introduced. This approach outperforms a
conventional ELM-inspired system when the first layer is imprinted
before training and testing, and especially so when variability
in device timing evolution is considered: variability is therefore
transformed from an issue to a feature. In attempting to classify
the MNIST database under the same conditions, conventional
ELM obtains 84% classification, the imprinted, uniform device
system obtains 88% classification, and the imprinted, variable
device system reaches 92% classification. We discuss benefits and
drawbacks of both systems in terms of energy, complexity, area
footprint, and speed. All these results highlight that tuning and
exploiting intrinsic device timing parameters may be of central
interest to future bio-inspired approximate computing systems.
I. INTRODUCTION
Memristive nanodevices are a novel form of electronic
memory whose properties are reminiscent of biological
synapses, and which can exhibit plasticity features. In recent
years, a variety of approaches have been considered to integrate
these devices into learning circuits; typically, a crossbar that
stores analog weights is paired with a set of neurons built with
CMOS technology. Collectively, these components form
hardware systems that implement rules to perform learning
tasks such as classification. These rules are often bio-inspired,
such as spike-timing-dependent plasticity (STDP) [1]–[3];
elementary machine learning approaches such as the perceptron or
gradient descent have also been implemented successfully [4],
[5]. Small demonstrator circuits show the physical feasibility
of these rules [6], [7].
Intrinsic short-term dynamics of memristive devices are not
exploited in the aforementioned learning algorithms; rather,
it is the sequence of operations and the implied long-term
plasticity that is crucial to successful learning. However, like
synapses in the brain, some memristive nanodevices present
rich plasticity behaviors over shorter time scales [2],
[8]–[10]. In the brain, short-term plasticity (STP) refers to
a change of state (potentiation) of the synapse connecting two
neurons on the scale of seconds to minutes, while long-term
plasticity (LTP) potentiates synapses for hours, days or even a lifetime.
Although the transition from short- to long-term plasticity is
already considered in neuroscience models, such as metaplastic
learning systems [11], this transition remains an underexplored
topic in the field of memristive learning, which has focused on
the non-volatility of devices. While [8], [9] explored
memristive STP/LTP transitions and notably confirmed that
repeated rehearsal of patterns evinces a strong analogy to the
biological transition, they did not consider learning architectures
based on these mechanisms. In [12], volatile tungsten-based
memristive devices that relax relatively quickly were
considered for integration into a learning system, but only the
STP regime was exploited for classification.
Using the transition from short-term to long-term plasticity
as the core component of a nano-electronic learning system
is therefore a novel approach. Here, we explore the merit
of that approach by constructing two learning architectures
and testing them on two classification tasks. We focus on a
highly promising device, electrochemical metallization (ECM)
cells, in which rich STP and LTP dynamics have recently been
evidenced [10]. First, we introduce the time dynamics of ECM
nanodevices and an experimentally validated model of their
behavior. Second, a single-crossbar approach based on this
concept is proposed and simulated. To overcome its limitations,
we subsequently introduce a system partly inspired by Extreme
Learning Machines (ELM) that exhibits strong performance
and high resilience to device variability. Finally, we
discuss the merits and drawbacks of our approach.
arXiv:1606.08366v1 [cs.NE] 27 Jun 2016
II. NANODEVICE PLASTICITY MODEL
The devices considered are electrochemical metallization
(ECM) cells with a 60 nm switching layer, where dendritic
filaments form between a reactive top electrode (anode)
of silver and an inert bottom electrode (cathode) of platinum
[10]. The application of a positive bias above a threshold Vth
causes oxidation and drift of silver ions (Ag+) across the
Ag2S switching layer from the anode towards the cathode.
This increases conductivity and physically corresponds to
the formation and strengthening of filaments. Conversely, a
negative bias induces reduction at the Ag electrode, weakening
filaments and decreasing conductivity. The device shows a
natural relaxation towards lower conductivity as Ag+ ions
continue to diffuse and reverse oxidation-reduction occurs.
Critically, this natural relaxation may be fast or slow, depending
on the quantity and quality of the filaments. As reported
in [10], varying filamentary morphology and a possible trade-off
between filament density and diameter create complex
synaptic behavior. In particular, the transition from a relatively
short relaxation time τ (the STP regime) to a longer τ
(the LTP regime) was tunable by both the number and the
characteristics of successive pre-synaptic excitatory pulses.
Fig. 1(A) depicts the STP case, where a small number of pulses
strengthen the filament so that the ECM cell’s conductance
increases to 0.9 mS. However, this state is not stable: after
100 s, the conductance has relaxed to a low conductance state.
Conversely, Fig. 1(B) depicts the LTP case. Differently timed
spikes move the synapse to a high conductance of 3 mS, which
remains stable after 100 s.
A detailed model of STP to LTP transition in ECM cells,
reminiscent of a biological model of plasticity [13], was
validated experimentally in our previous work [10]. We now
revisit the basic equations of this model. Synaptic potentiation
increases in response to a train of pre-synaptic pulses; the
facilitating time constant τfac constantly increases as spikes
are applied and conductance increases, facilitating the STP to
LTP transition. After each programming spike:
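$$\tau_{\mathrm{fac}} = a \cdot G(t)^{b}, \tag{1}$$
and after any given delay ∆t from the last spike at time t, the
conductance is
$$G_{\mathrm{relax}} = G(t) \cdot \exp\left(-\frac{\Delta t}{\tau_{\mathrm{fac}}}\right). \tag{2}$$
Finally, at time t + ∆t, the conductance value results from the
sum of the exponential relaxation and of a programming spike,
if any is applied:
$$G(t+\Delta t) = \begin{cases} G_{\mathrm{relax}} + U\,(A - G_{\mathrm{relax}}), & \text{if spike,} \\ G_{\mathrm{relax}}, & \text{if no spike.} \end{cases} \tag{3}$$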
A corresponds to the maximum synaptic efficacy, i.e., the maximum
conductance Gmax; a typical value extracted from device
measurements is A = 4 mS. The typical synaptic efficiency is
U = 0.025. The power-law prefactor is a = 2.42 × 10^−12 s·S^−b,
and the power-law exponent is b = 4 [10].
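To make Eqs. (1)–(3) concrete, the following minimal Python sketch steps a single cell through a spike train. It is a hedged illustration, not the simulator used in this work. Assumptions: conductances are expressed in µS when evaluating Eq. (1), a reading under which a = 2.42 × 10^−12 and b = 4 give τfac ≈ 1.6 s at 0.9 mS and ≈ 196 s at 3 mS, consistent with the relaxation behavior in Fig. 1; the cell starts from zero conductance; and the names `ECMCell` and `spike_train` are hypothetical.

```python
import math

# Model parameters from Section II. ASSUMPTION: conductances are in uS when
# evaluating Eq. (1), so that tau_fac comes out in seconds.
A_US = 4000.0      # A = G_max = 4 mS, expressed in uS
U = 0.025          # synaptic efficiency
A_PREF = 2.42e-12  # power-law prefactor a
B_EXP = 4          # power-law exponent b


def tau_fac(g_us):
    """Eq. (1): facilitating time constant (s), set after each spike."""
    return A_PREF * g_us ** B_EXP


class ECMCell:
    """Hypothetical single-cell model following Eqs. (1)-(3)."""

    def __init__(self, g0_us=0.0):
        self.g_spike = g0_us       # G(t): conductance just after the last spike
        self.t_spike = 0.0         # time t of the last spike (s)
        self.tau = tau_fac(g0_us)  # tau_fac, frozen until the next spike

    def conductance(self, now):
        """Eq. (2): G_relax after a delay (now - t) since the last spike."""
        if self.g_spike <= 0.0:
            return 0.0
        return self.g_spike * math.exp(-(now - self.t_spike) / self.tau)

    def spike(self, now):
        """Eq. (3), spike case, followed by Eq. (1) to update tau_fac."""
        g_relax = self.conductance(now)
        self.g_spike = g_relax + U * (A_US - g_relax)
        self.t_spike = now
        self.tau = tau_fac(self.g_spike)
        return self.g_spike


def spike_train(n, dt, wait):
    """Apply n spikes every dt seconds, then read G (uS) after a silent wait."""
    cell = ECMCell()
    for k in range(n):
        cell.spike(k * dt)
    return cell.conductance((n - 1) * dt + wait)
```

Under these assumptions, `spike_train(30, 200e-6, 1.0)` (the n and ∆t of Fig. 3) leaves the cell in the mS range after a 1 s wait, whereas three spikes 2 ms apart relax back to nearly zero, mirroring the LTP/STP contrast of Fig. 1.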
III. A SIMPLE LEARNING TASK AND ALGORITHM
We first introduce a simple architecture that highlights the
promise and challenge associated with exploiting the STP to
LTP transition in nanodevices.
A. Crossbar Architecture and Algorithm
The architecture has three components: a software image
database or sensor with circuitry to convert pixels into voltage
spikes; an all-to-all crossbar that connects input and output
neurons electrically at each ECM cell crosspoint; and accompanying
CMOS circuitry. Learning occurs in three stages.
Fig. 1. (A) depicts a pre-synaptic spike train that keeps a device in the STP
regime; a corresponding, weaker filament that can easily relax is pictured.
(B) depicts a more powerful spike train that successfully moves the pictured
device from the STP to LTP regime. The black solid lines are the measurement
results and the red dots are the model predictions.
In the first stage, ECM cells are “imprinted”. This stage
is subdivided into J epochs, one per class. In this case,
J = 3: classes 'O', 'Z' and 'X'. Each image contains L = 36
pixels, of which 8 are active. During each epoch, n noisy examples
of the given class are successively presented to the crossbar such
that a spike represents an active (white) pixel. Each input neuron
consistently receives the same pixel, and the patterns are presented
with a delay ∆t as depicted in Fig. 2, where (A) is in epoch 'O' and
(B) in epoch 'X'. After imprinting, no voltage is applied on
the crossbar for a wait period T. This allows ECM cells in the
STP regime, which are assumed to encode noise, to return to
low-conductance states, while it does not affect those in LTP.
Fig. 3 presents the evolution of the conductance of the 36 ECM
cells connected to one particular output neuron during imprinting
(timprint = 26.4 ms) and the subsequent wait (T = 1 s). The final
conductance map for this output neuron, which in this case has
learned 'X', is visualized pixel by pixel in the inset.
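The supervised imprinting stage can be sketched for a whole crossbar, where epoch j only programs column j. This is a hedged NumPy sketch under stated assumptions, not the simulation software described below: the prototypes in the usage test are random 8-of-36 patterns rather than the actual 'O', 'Z', 'X' images; noise flips a fixed 10% of the pixels of each image; cells start at zero conductance; conductances are in µS inside Eq. (1); and the names `imprint`, `noisy`, and `relaxed` are hypothetical.

```python
import numpy as np

# Section II parameters; ASSUMPTION: conductances in uS inside Eq. (1).
A_US, U, A_PREF, B_EXP = 4000.0, 0.025, 2.42e-12, 4


def relaxed(g_spike, t_spike, now):
    """Eqs. (1)-(2) element-wise: relax each cell from its last spike."""
    tau = A_PREF * g_spike ** B_EXP
    safe_tau = np.where(tau > 0.0, tau, 1.0)      # avoid dividing by zero
    g = g_spike * np.exp(-(now - t_spike) / safe_tau)
    return np.where(g_spike > 0.0, g, 0.0)        # never-spiked cells stay at 0


def noisy(proto, rng, flip_frac=0.10):
    """Flip a fixed fraction of pixels of a boolean image (hypothetical noise model)."""
    x = proto.copy()
    idx = rng.choice(x.size, size=int(round(flip_frac * x.size)), replace=False)
    x[idx] = ~x[idx]
    return x


def imprint(protos, n=30, dt=200e-6, wait=1.0, seed=0):
    """Supervised imprinting of an L-row, J-column crossbar.

    protos: J x L boolean prototypes. Epoch j presents n noisy examples of
    class j; active pixels spike their row, and only column j is selected.
    Returns the L x J conductance map (uS) after the wait period T.
    """
    rng = np.random.default_rng(seed)
    J, L = protos.shape
    g_spike = np.zeros((L, J))  # conductance right after each cell's last spike
    t_spike = np.zeros((L, J))  # time of each cell's last spike (s)
    now = 0.0
    for j in range(J):
        for _ in range(n):
            now += dt
            rows = noisy(protos[j], rng)          # active pixels -> spikes
            g_rel = relaxed(g_spike[rows, j], t_spike[rows, j], now)
            g_spike[rows, j] = g_rel + U * (A_US - g_rel)  # Eq. (3), spike case
            t_spike[rows, j] = now
    return relaxed(g_spike, t_spike, now + wait)
```

Thresholding the returned map recovers each class prototype: pixels that spike in most examples reach LTP, while noise-driven cells stay in STP and relax during T.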
Imprinting only works if each output neuron corresponds
to a different class. Output neurons may be mapped to classes
either in a supervised manner, employing a cell selector such
as the FAST selector [14], or in an unsupervised manner, such
as a winner-takes-all (WTA) half-select scheme. Only nanodevices
at the intersection of an active pixel/row (receiving a
pre-synaptic spike of amplitude Vprog,h = 0.42 V and width
Vprog,w = 100 µs) and the selected column (green nanodevices
in Fig. 2(A),(B)) increase conductance. In the unsupervised
scheme depicted in Fig. 2(B), input spikes are set at Vprog/2;
only with a complementary spike of the 'winning' leaky
integrate-and-fire (LIF) output neuron (−Vprog/2) are synapses
in the appropriate column imprinted. Here, all LIF neurons other
than the one that spiked first are inhibited through a lateral
diffusion scheme [15]. Conductance evolution is not equivalent
between the two cases due to half-select effects, yet the
transition from STP to LTP is nevertheless possible in both
systems. All the simulation results that follow use the supervised scheme.
Fig. 2. The simple learning system, with characteristic images input to
the system, selected and non-selected nanodevices, and input and output
computing accessories required for learning. (A) supervised scheme, and (B)
unsupervised scheme. Key variables are also noted.
Second, images are presented to the network in 'read'
mode (Vread = 0.1 V), so as not to disturb conductances,
and currents are read out at all output neurons (not just the
corresponding one). As output currents are a dot product of
device conductances and active pixels for a given image, many
unique values are possible. These values are stored in a circuit
(register), where they are iteratively averaged. After N
examples, J^2 = 9 currents (signatures) are stored: Ireg. Testing
is the final phase: K unknown images are presented at Vread and
the output currents Itest are compared to the register’s values.
The predicted class j is the one that minimizes Etot:
$$E_{\mathrm{tot}}(j) = \sum_{i=1}^{J} \left| I^{\mathrm{reg}}_{j,i} - I^{\mathrm{test}}_{i} \right|, \tag{4}$$
where I^reg_{j,i} is the averaged current of output neuron i for
training images of class j. If the predicted class is the true class,
'1' is placed in a ledger; otherwise '0'. The final score is simply
the ledger sum divided by K.
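The readout and scoring of Eq. (4) can be sketched in a few lines. This is a hedged software illustration, not the register circuit: a running mean stands in for the analog or digital averaging hardware, each output current is taken as Vread times the dot product of the binary image and a conductance column, and the function names are hypothetical.

```python
import numpy as np

V_READ = 0.1  # read voltage (V), low enough not to disturb conductances


def output_currents(G, x):
    """Currents at all J output neurons: V_read * (binary image . conductances)."""
    return V_READ * (x.astype(float) @ G)


def build_register(G, images, labels, J):
    """Iteratively average the J output currents per class: a J x J register."""
    reg = np.zeros((J, J))
    counts = np.zeros(J)
    for x, y in zip(images, labels):
        counts[y] += 1
        reg[y] += (output_currents(G, x) - reg[y]) / counts[y]  # running mean
    return reg


def classify(G, reg, x):
    """Eq. (4): choose the class j whose stored signature minimizes E_tot(j)."""
    e_tot = np.abs(reg - output_currents(G, x)).sum(axis=1)
    return int(np.argmin(e_tot))


def score(G, reg, images, labels):
    """Ledger of 1/0 outcomes, summed and divided by K."""
    ledger = [int(classify(G, reg, x) == y) for x, y in zip(images, labels)]
    return sum(ledger) / len(ledger)
```

Because only the J current sums per class are stored, two classes that happen to yield similar sums remain hard to separate, which is the weakness that surfaces on MNIST.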
Computing an iterative average during training and storing it
in a register may be achieved in analog (operational amplifiers,
sample-and-hold circuits each containing a capacitor) or
digital (analog-to-digital converters and conventional digital
memory, e.g., RAM) fashion, or a combination of both. Computing
Etot during tests additionally requires an absolute value circuit.
Each output neuron (class) must have access to equivalent
circuitry. Regardless of implementation, the circuit overhead is
non-negligible and a drawback of this approach.
Fig. 3. Conductance as a function of time during the imprinting process
(n = 30, ∆t = 200 µs) and the subsequent waiting period, for the 36 ECM cells
connected to a single output neuron. Inset: conductance of the cells at t =
1.026 s, presented as a reconstructed 2-D image. Note that time is plotted on a
logarithmic axis.
All of the following reported results, as well as those
depicted in Fig. 3, were produced by a software program that
simulates a crossbar of nanodevices, each following the
mathematical model for conductance evolution introduced in Section
II, and tracks the evolution of synapses and currents over time in
response to voltage-encoded input spike trains. This simulation
software also models nanodevice-specific issues such as device
variability. For the simulation results immediately following,
N = K = 100 and the wait period is T = 1 s. Noise is added by randomly
flipping 10% of all pixels to their opposite state in images used
for imprinting, training, and testing.
B. Performance on the Simple Task
Fig. 4(A) presents classification rates as a function of
the chosen inter-pattern delay ∆t. Each series represents
a different number of patterns n presented per epoch of the
imprinting stage. The performance reaches a nearly perfect
98% over a broad timing range (0.1 ms < ∆t < 1.2 ms)
for n > 20. Below n = 20, it is not possible to reach the
plateau, as there are insufficient presynaptic pulses to move
nanodevices from the STP to the LTP regime (also visible in Fig. 1
and discussed in [10]).
Fig. 4(A) also shows sub-optimal classification below and
above the optimal range. The former occurs when patterns
are imprinted too fast relative to the nanodevice’s natural
relaxation parameter τfac, so that the conductance map is
oversaturated (too many devices enter LTP). Conversely, when ∆t
is too large, too few devices enter LTP to retain the
image. In both cases, currents no longer vary meaningfully
from neuron to neuron, and classification becomes difficult.
1) Effect of Device Variability: Nanodevices always suffer
from some device variability [3]. We consider the case where
ECM cells each respond slightly differently to equivalent
pre-synaptic spikes. Each device now receives its own values of
the model parameters U, A, and a. From Eq. (1), each device then
possesses a slightly different τfac as a changes; from Eq. (3),
each device’s conductance evolves a bit differently due to varying
synaptic efficiency (U) and Gmax (A). Random values of U, A,
and a are drawn from a normal distribution with mean µ set to the
values listed in Section II and coefficient of variation
σ/µ ∈ {0.025, 0.05, 0.1, 0.15}. For each degree of variation,
20 simulations were performed at each ∆t value and averaged.
As depicted in Fig. 4(B), increasing variability lowers the
nearly perfect classification plateau. Unlike the uniform case,
increasingly variable crossbars do not fall off a performance
'cliff', but decline more gently as dispersion increases. In the
σ/µ = 0.15 case, the recognition 'floor' common to the other
cases (about 30%) is not reached even at a very long
inter-pattern delay (4 ms).
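The variability model above amounts to drawing each device’s parameters independently. A minimal NumPy sketch follows, offered as an assumption-laden illustration rather than the simulator’s code: A is expressed in µS, as is τfac in the helper, and the names `sample_device_params` and `device_tau` are hypothetical.

```python
import numpy as np

# Nominal values from Section II. ASSUMPTION: A in uS, and a applied to G in uS.
NOMINAL = {"U": 0.025, "A": 4000.0, "a": 2.42e-12}


def sample_device_params(shape, cv, rng):
    """Draw per-device U, A, a from normal laws with mean mu and sigma = cv * mu.

    For cv <= 0.15, a negative draw lies more than six sigma below the mean
    and practically never occurs, so no clipping is applied here.
    """
    return {k: rng.normal(loc=mu, scale=cv * mu, size=shape)
            for k, mu in NOMINAL.items()}


def device_tau(params, g_us, b=4):
    """Eq. (1) per device: each cell's own prefactor a yields its own tau_fac."""
    return params["a"] * g_us ** b
```

With cv = 0 every device receives the nominal values, which recovers the uniform case.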
2) Effect of Increasing Training Samples: Few examples
are needed to build the register, because its average
becomes useful very quickly. In the uniform case,
classification reaches 70% with 5 samples and approaches
100% after 10. With variability, 15 samples are needed
to reach peak (95%) classification; however, the
highest-dispersion case (15%) needs 25 samples to reach its
peak (90% classification).
3) Effect of Increasing Noise: In the uniform case,
classification remains nearly perfect up to around 15% of pixel
flips and then deteriorates linearly down to a minimum of
75%. Low- and medium-variability cases perform well up to
10% and follow a similar deterioration trajectory thereafter.
However, the highest-variability case (15%) only does as well
as the others in the 0–5% noise range; at 20% noise-induced
flips it has already fallen to 80% correct.
C. Performance on the MNIST Task
In attempting to classify the MNIST database of handwritten
digits [16], J = 10 and L = 784, so 7840 synapses
(ECM devices) attempt to solve the problem. In this case, the
register must hold J^2 = 100 values. While MNIST provides
N = 60k training and K = 10k test images, only N = K = 1k
were used. Overall, the system’s performance on this task
is not favorable. Fig. 4(C) shows that classification peaks
at ∆t = 1.1 ms with 61% correct. The insets in Fig. 4(C)
show that at this peak, digits are relatively well reconstructed,
if sparse, while reconstructed pixel maps in the mostly
evaporated (super-optimal ∆t) and oversaturated (sub-optimal ∆t)
regions are unusable. However, even at the peak, classification
is weak because of an intrinsic algorithmic weakness. Since
the register memorizes only currents, and test images have a wide
variety of active pixels (in contrast to the small images), the
system has a hard time distinguishing between different classes
that produce similar current sums. Fig. 4(C) also suggests that
increasing the number of output neurons beyond J = 10 does not help
with the present algorithm, as stored averages for redundant
neurons will be similar. Fig. 4(D) shows that device variability
has a deleterious effect on learning ability. At low dispersion,
peak classification drops to 50% while preserving a
similar trend, while large dispersion echoes the
resilience to long pattern delays observed in Fig. 4(B).
IV. AN ELM-INSPIRED APPROACH
The inferior performance on the harder task and the complex
readout scheme led us to move beyond a one-crossbar system.
Rather than using currents from the imprinted layer to solve
a classification problem directly, we considered the case where
those currents are passed forward to a second crossbar, after
being transformed at the hidden layer via an activation function.
We obtain a hardware instantiation of the principles of a
single-hidden-layer feedforward network, which is reminiscent of the
extreme learning machine (ELM) method [17].
A. System Description
Fig. 5 shows a conceptual hardware implementation of our
ELM-inspired system built with two crossbars of memristive
devices, along with a timeline of its operation. There are four
phases: imprinting, waiting, training, and testing.
The original database is presented to the inputs of the first
crossbar as binary, noisy voltage vectors on demand. The first
crossbar uses ECM cells as described in the previous section.
It is imprinted with images taken from the training dataset,
using the same procedure as in Section III. It is therefore the
part of the system that exploits the STP to LTP transition, and it
yields a projection space Win that is used in training
and testing. This contrasts with conventional ELM, where
the weights of the first layer are random [18]. Unlike in the
system detailed in Section III, currents are not stored but fed
to the second stage of the system. Before they reach the
second crossbar, they are passed through activation functions
to increase dimensionality. In the ELM scheme, each activation
function is slightly different. The tanh(I) function was chosen
since it can be easily implemented in CMOS and engineered
for variability (offset and/or gain factor). In our case, offsets
were set randomly for each neuron and the gain factor was always
held at 10. Imprinting happens epoch by epoch, and the total number
of epochs equals the size of the hidden layer, M.
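The hidden-layer transformation can be sketched as a projection through the imprinted crossbar followed by a per-neuron tanh. This is a hedged sketch: the text fixes the gain at 10 and sets random offsets, but it does not state whether the offset is applied before or after the gain, how offsets are distributed, or how currents are scaled, so the form tanh(gain·(I + offset)), the uniform offset law, and the `i_scale` normalization below are all assumptions, as are the function names.

```python
import numpy as np

V_READ = 0.1  # read voltage (V), as in Section III


def make_offsets(M, rng, spread=0.5):
    """Random per-neuron offsets (ASSUMPTION: uniform law and spread)."""
    return rng.uniform(-spread, spread, size=M)


def hidden_activations(W_in, X, offsets, gain=10.0, i_scale=1.0):
    """Project binary images X (N x L) through the imprinted crossbar
    W_in (L x M), then apply tanh with a fixed gain and per-neuron offsets.

    i_scale is a hypothetical normalization of currents before the activation.
    Returns an N x M matrix of hidden-layer outputs in [-1, 1].
    """
    I = V_READ * (X.astype(float) @ W_in) / i_scale  # N x M hidden currents
    return np.tanh(gain * (I + offsets))             # offsets broadcast per neuron
```

Transposing the returned matrix gives the M x N matrix A used to train the second layer.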
The second layer’s weight matrix Wout acts as a regression
layer and is not imprinted but trained. It does not exploit
the STP to LTP transition of the nanodevices. Each input of
the second crossbar is connected to two rows of the second
crossbar, which allows both positive and negative weights
connecting input and output to be represented [5], [19]–[21].
A least-squares solution may be obtained by computing weights at
the end of training and importing them (batch mode), or
may be computed iteratively as successive training examples
are given (online mode). Batch solutions include a pseudo-inverse
operation, which might be complex to compute in
hardware, and closed-form ridge regression, which may be
easier to implement. A pseudo-inverse computed online can
achieve promising classification [22], while iterative on-chip
training schemes designed specifically for memristive nanodevices
also exist and can be naturally implemented in a memristive
crossbar. In this scheme, each column (class) of the second
(regressed) layer implements its own approximation of the
Widrow-Hoff algorithm, and the columns can be programmed in
parallel [5], [19]–[21]. Both options are visualized in Fig. 5.
Batch and online options will be compared extensively in a
follow-up paper. For consistency, the results presented hereafter
always used batch learning (closed-form ridge regression) to obtain
Wout, given the actual matrix A of training examples (composed of
projected current vectors from Win) and the expected matrix Y
(composed of binary vectors of presented classes):
$$W_{\mathrm{out}} = Y A^{\top} \left(A A^{\top}\right)^{-1}. \tag{5}$$
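Eq. (5) maps directly onto a few lines of NumPy. The sketch below is a hedged software stand-in for the batch computation only, not an on-chip implementation; it solves the linear system instead of forming the inverse explicitly, and the optional regularization weight `lam` is an assumption (Eq. (5) as written has none; `lam = 0` recovers it exactly). The function names are hypothetical.

```python
import numpy as np


def train_w_out(A, Y, lam=0.0):
    """Eq. (5): W_out = Y A^T (A A^T + lam I)^-1.

    A: M x N projected (hidden-layer) vectors, one column per training example.
    Y: J x N targets, one-hot columns for the presented classes.
    lam = 0 reproduces Eq. (5); lam > 0 adds a ridge penalty (ASSUMPTION).
    """
    M = A.shape[0]
    gram = A @ A.T + lam * np.eye(M)  # M x M, symmetric
    # W_out gram = Y A^T  <=>  gram W_out^T = A Y^T (gram is symmetric),
    # solved directly rather than inverting gram.
    return np.linalg.solve(gram, A @ Y.T).T


def predict(W_out, A):
    """Class index per column: the output neuron with the largest response."""
    return np.argmax(W_out @ A, axis=0)
```

Solving the M x M system keeps the cost independent of the number of training samples N once A Aᵀ is accumulated, which is also what makes small-N solutions cheap.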
In previous proposals for implementing ELM with memristive
devices, the weights of the first layer are random, exploiting
the intrinsic device variability, in particular the variance in
the OFF state of the memristor [23]. Here, we instead harness
an imprinting made possible by the STP to LTP transition.
We include a direct comparison of this new approach with the
random conductance values approach in the next subsection.
1) Effect of Device Timing and Variability: Fig. 6(A) again
shows that presenting a sufficient number of patterns per epoch
(i.e., per imprinted neuron), n > 20, is required for successful
imprinting. Conversely, oversaturation is also possible when
n > 50 patterns are applied per epoch at the faster (smaller)
time steps. In all uniform cases, performance drops when
imprinting is too slow (∆t > 1 ms). Fig. 6(B) shows that,
unlike in the simple system, increasing the number of hidden-layer
neurons increases performance. This is due to the different
random activations provided at each neuron. Whereas Fig. 4
showed a reduction in performance at increasing dispersion,
Fig. 6(C) reveals the opposite: maximum classification
slightly increases. For instance, with M = 100 and at the optimal
wait, a maximum of 78% is reached in the uniform case compared to
82% in the variable case. While the uniform case shows a classification
'cliff' after 0.8 ms, the 10% and 15% variability cases again show a
broad tolerance to slower imprinting. One explanation is that,
with high conductance evolution variance, some synapses are
always excited enough to move from STP to LTP. This result
is attractive, since nanodevice variability is transformed from
a liability into a productive asset of the computing system.
B. Performances
1) Effect of Hidden Layer: Fig. 7 shows that regardless of
device uniformity or variability, imprinting Win is demonstrably
meaningful: imprinted systems substantially outperform
the ELM control cases at every value of M. This result
can be compared with analogous priming of the first layer
of ELM systems in software artificial neural networks. Such
priming has already been reported to improve performance
over the standard case [24], [25]. Here, we show a similar
result subject to unique device timing constraints. Fig. 7
also shows that imprinted systems with synaptic evolution
variability consistently outperform the uniform case (where
every synapse behaves identically). Nanodevice variability thus
allows for greater diversity between the hidden neurons,
enhancing the dimensionality of the data provided to the
second layer beyond that contributed by the varying activation functions alone.
At M = 1450, peak classification of 91.8% is obtained for
variable imprinted systems, while uniform imprinted systems
obtain 87.8% and random-weight ELM reaches 84.4%. At
M = L = 784 (dashed vertical line in Fig. 7), random-weight
ELM performs similarly to a regression obtained by presenting
all training samples directly to the second layer (83.5%). While
the direct regression can only be made at M = L, to reach
higher performance than standard regression, ELM requires
substantially higher M values. Conversely, the rich underlying
dynamics of nanodevices allow designers to do more with
less in the imprinting cases.
Fig. 7. Classification rate as a function of hidden layer size M in different
conditions: random weights on the first layer (Random Weights ELM); imprinting
on the first layer with no variability (Imprinting Uniform ELM); or with variability
(Imprinting Var ELM) on nanodevices. In every case: n = 50 patterns are
given per epoch, T = 1 s, ∆t = 200 µs, 10% noise is present in every
imprinting, training, and test image, N = 60k, K = 10k, and each hidden neuron
has a slightly different activation function with a constant gain factor of 10. The
single purple point/dashed line at M = L represents the direct regression
solution obtained when all training images are presented directly to the second
layer without any first layer (projections).
Fig. 8. Classification rate as a function of the number of training samples N
used to compute Wout in the third stage, given M = L = 784 (the purple
line/slice in Fig. 7), depicted on a log scale. Three two-layer systems are
depicted: one where Win is set randomly with all ECM cells at low values (Random
Weights ELM), and two imprinted systems where ECM cells are uniform and
variable (5% dispersion), respectively. In the imprinting cases, n = 50 for the
uniform and n = 30 for the variable case; in both cases T = 1 s and ∆t = 200 µs.
In all cases, 10% of pixels are flipped (noise) in every image.
2) Effect of Training Set: Whether in batch or online mode,
minimizing the number N of training samples used to compose A
can save energy and time. Fig. 8 shows the classification rate as
a function of the number of training samples, for M = L = 784.
At very low sample sizes, the rate of improvement is high; a
steady state is reached around N = 5,000, and performance
is already within 1–2% of its maximum around N = 10,000.
By N = 2,500, the variable imprinted system already outperforms
the maximal result obtained for standard ELM (83.5%);
uniform imprinting surpasses it by N = 8,500. With the full
training set, uniform and variable ECM projections ultimately
reach new classification heights (87% and 90%, respectively).
V. DISCUSSION
The simple system introduced in Section III achieves
promising classification on a simple task, and does so with
minimal accessory circuitry, as patterns are remembered
naturally as a function of time and device properties. Explicit
weight changes are not needed, which removes an
impediment towards larger crossbars, which would otherwise
require large circuit overhead for this purpose. As memristor-CMOS
systems have already been demonstrated to learn images of equivalent
complexity [7], a physical implementation of our system is possible
and could reveal further trade-offs. However, the readout
involves a relatively complex procedure, and the system is
sensitive to device variability. While it is a proof of concept
for harnessing the transition from STP to LTP in nanodevices, it
has limited applicability to real nano-electronic system design.
Conversely, the imprinted ELM architecture introduced in
section IV is a promising lead for future nano-architectures.
Imprinting a first layer with training examples clearly
improves performance on the primary task. By achieving far
better classification at a far smaller hidden layer size M than
previously reported, the approach could dramatically reduce the
total size, number of nanodevices, and CMOS neurons required
to implement future ELM-inspired systems. Moreover, the fact
that variable-synapse ELM systems outperformed uniform-synapse
ELM systems is promising, as nanodevice variability
is usually a serious concern. As nanodevices are naturally
imperfect and structural synaptic diversity has been shown to
enhance information coding in biological synapses [26], this
suggests that naturally variable filamentary nanodevices, such
as our ECM cells, are excellent building blocks for future
neuromorphic systems.
While gradient-based learning systems built with memristive
nanodevices report < 1% error on MNIST [27], reaching
this accuracy requires weight updates for every device to be
computed externally to the system and programming pulses to be
applied on a device-by-device basis to both layers over many
epochs. With one-shot training/testing and programming pulses
applied only to set the weights of the smaller second layer
(assuming L×M ≫ J×M), our system might be an order
of magnitude more energy efficient and require less overhead.
Additionally, our system offers flexibility unavailable to
gradient-based systems: since the regression for Wout can be
computed iteratively, low-sample (small N) solutions for Wout
represent a trade-off between accuracy and speed/energy savings
that might be intentionally exploited by approximate computing systems.
Crossbar systems that use volatile memristive devices for
classification were first explored in [12], yet rates of
80%+ on the primary task were only possible when the
currents of several individual crossbars were combined and
when two subsequent layers (a multi-layer perceptron) provided
the solution. Our proposed system reduces area and complexity,
and improves performance, in comparison to these past schemes.
However, our system might fairly be considered slow due
to a 'speed limit' set by device relaxation. If imprinting
proceeds faster than ∆t = 100 µs, it oversaturates. Assuming
∆t = 200 µs, n = 40, J = 10, and T = 1 s (MNIST),
Ttot = 1.1 s is required to imprint Win. Although training and
testing are an order of magnitude faster, even if they were near
instantaneous the system would still be slower than competing
nano-electronic systems. Yet, the device timing parameters used
herein were academic. Device engineering, in particular device
scaling, could tune τfac to allow for faster pattern presentation,
thereby narrowing the gap between transient neuromorphic
computing systems and non-volatile ones.
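As a rough check of the quoted imprinting time, assuming that imprinting takes J·n·∆t (one epoch of n patterns per class) followed by the wait period T, the numbers above give:
$$T_{\mathrm{tot}} \approx J\,n\,\Delta t + T = 10 \times 40 \times 200\,\mu\mathrm{s} + 1\,\mathrm{s} = 0.08\,\mathrm{s} + 1\,\mathrm{s} = 1.08\,\mathrm{s} \approx 1.1\,\mathrm{s}.$$
Under this assumption, the wait period T rather than pattern presentation dominates the total for these parameters.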
VI. CONCLUSION
Two novel nanoelectronic learning systems were conceived
and simulated on classification tasks. Both systems exploit
the unique properties of an ECM filamentary nanodevice with
tunable STP to LTP transition to memorize and retain average
images from the training sets of the classification tasks while
staying immune to low levels of noise. While the simple
system does well with a simple task, it can not classify
the MNIST database well; moreover, variation is unfriendly
to this system, and readout is complicated. For this reason,
the dual-crossbar system inspired by ELM was developed to
harness variability. While imprinting of the first layer provides
a definite boost over standard (random-weight) ELM, the
combination of synaptic and hidden-layer variability helps the
proposed system exceed 90% on the MNIST task. However,
both approaches come with a fundamental timing constraint:
the relaxation speed of the nanodevice, which is implicit in the
imprinting stage. While further optimization at both the device
and architecture levels will be needed to reach state-of-the-art
classification, benefits in terms of energy efficiency, area
reduction, partial noise immunity, and anti-fragility to synaptic
variability are already apparent. These first results open the
way for new explorations of neuromorphic architectures that
harness the intrinsic timing characteristics of nanodevices.
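The central mechanism summarized above, in which consistently active pixels accumulate into LTP while sparse noisy pixels relax back to STP, and which breaks down when patterns arrive too quickly, can be illustrated with a deliberately simple toy model. This sketch is not the device model used in this work: it assumes a leaky integrator in which each pulse adds a fixed increment, the state decays exponentially with a hypothetical time constant tau between pulses, and the device latches into LTP once a hypothetical threshold is crossed. All parameter values (tau = 1 ms, threshold = 4, a noisy pixel active in one pattern out of five) are illustrative assumptions.

```python
import math


def ends_in_ltp(pulse_times, tau, threshold, increment=1.0):
    """Toy volatile synapse: True if it latches into LTP (ON), else stays STP (OFF).

    Assumptions (illustrative only): each pulse adds `increment`, the
    short-term state decays as exp(-elapsed / tau) between pulses, and
    crossing `threshold` right after a pulse latches the device ON.
    """
    state, last_t = 0.0, None
    for t in sorted(pulse_times):
        if last_t is not None:
            state *= math.exp(-(t - last_t) / tau)  # relaxation since last pulse
        state += increment                          # facilitation by this pulse
        last_t = t
        if state >= threshold:
            return True                             # latched into LTP
    return False                                    # decays back: remains STP


TAU, THRESHOLD, N = 1e-3, 4.0, 40  # hypothetical tau (s), threshold, pattern count


def pulses(dt, every=1):
    """Pulse times for a pixel active in one of every `every` patterns."""
    return [k * dt for k in range(0, N, every)]


for dt in (200e-6, 20e-6):
    frequent = ends_in_ltp(pulses(dt), TAU, THRESHOLD)
    noisy = ends_in_ltp(pulses(dt, every=5), TAU, THRESHOLD)
    print(f"dt={dt * 1e6:.0f} us: frequent LTP={frequent}, noisy LTP={noisy}")
```

With these assumed values, a 200 µs spacing lets only the frequently active pixel reach LTP, whereas a 20 µs spacing drives the sparse noisy pixel into LTP as well, qualitatively mirroring the oversaturation described in the discussion.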
ACKNOWLEDGMENT
This work was supported by the Nanodesign Paris-Saclay
Lidex. The authors would like to thank L. Calvet, D. Vodenicarevic,
A. Mizrahi, N. Locatelli, and J. S. Friedman for fruitful
discussions.
REFERENCES
[1] S. H. Jo, T. Chang, I. Ebong, B. B. Bhadviya, P. Mazumder, and W. Lu,
“Nanoscale memristor device as synapse in neuromorphic systems,”
Nano Letters, vol. 10, no. 4, pp. 1297–1301, 2010.
[2] S. Saighi, C. G. Mayr, T. Serrano-Gotarredona, H. Schmidt, G. Lecerf,
J. Tomas, J. Grollier, S. Boyn, A. F. Vincent, D. Querlioz, S. La Barbera,
F. Alibart, D. Vuillaume, O. Bichler, C. Gamrat, and B. Linares-Barranco,
“Plasticity in memristive devices for spiking neural networks,”
Frontiers in Neuroscience, vol. 9, Mar. 2015.
[3] D. Querlioz, O. Bichler, A. Vincent, and C. Gamrat, “Bioinspired
programming of memory devices for implementing an inference engine,”
Proceedings of the IEEE, vol. 103, no. 8, pp. 1398–1416, Aug. 2015.
[4] D. Soudry, D. Di Castro, A. Gal, A. Kolodny, and S. Kvatinsky,
“Memristor-based multilayer neural networks with online gradient
descent training,” IEEE Trans. Neural Netw., in press, 2015.
[5] D. Chabi, Z. Wang, W. Zhao, and J.-O. Klein, “On-chip supervised
learning rule for ultra high density neural crossbar using memristor for
synapse and neuron,” in IEEE/ACM Int. Symp. Nanoscale Architectures
(NANOARCH), July 2014, pp. 7–12.
[6] F. Alibart, E. Zamanidoost, and D. B. Strukov, “Pattern classification
by memristive crossbar circuits using ex situ and in situ training,” Nat.
Comm., vol. 4, 2013.
[7] M. Prezioso, F. Merrikh-Bayat, B. Hoskins, G. Adam, K. K. Likharev,
and D. B. Strukov, “Training and operation of an integrated
neuromorphic network based on metal-oxide memristors,” Nature,
vol. 521, pp. 61–64, 2015.
[8] T. Chang, S.-H. Jo, and W. Lu, “Short-term memory to long-term
memory transition in a nanoscale memristor,” ACS Nano, vol. 5, no. 9,
pp. 7669–7676, 2011.
[9] Z. Q. Wang, H. Y. Xu, X. H. Li, H. Yu, Y. C. Liu, and X. J. Zhu,
“Synaptic learning and memory functions achieved using oxygen ion
migration/diffusion in an amorphous InGaZnO memristor,” Adv. Funct.
Mater., vol. 22, no. 13, pp. 2759–2765, 2012.
[10] S. La Barbera, D. Vuillaume, and F. Alibart, “Filamentary switching:
Synaptic plasticity through device volatility,” ACS Nano, vol. 9, no. 1,
pp. 941–949, 2015.
[11] W. C. Abraham, “Metaplasticity: tuning synapses and networks for
plasticity,” Nature Reviews Neuroscience, vol. 9, no. 5, 2008.
[12] J. Bürger and C. Teuscher, “Volatile memristive devices as
short-term memory in a neuromorphic learning architecture,” in Proc.
of IEEE/ACM International Symposium on Nanoscale Architectures.
ACM, 2014, pp. 104–109.
[13] H. Markram, D. Pikus, A. Gupta, and M. Tsodyks, “Potential for
multiple mechanisms, phenomena and algorithms for synaptic plasticity
at single synapses,” Neuropharmacology, vol. 37, no. 4, pp. 489–500,
1998.
[14] S. H. Jo, T. Kumar, S. Narayanan, W. D. Lu, and H. Nazarian,
“3D-stackable crossbar resistive memory based on field assisted superlinear
threshold (FAST) selector,” in IEDM Tech. Dig. IEEE, 2014, pp. 6–7.
[15] A. F. Vincent, J. Larroque, W. S. Zhao, N. Ben Romdhane, O. Bichler,
C. Gamrat, J.-O. Klein, S. Galdin-Retailleau, and D. Querlioz, “Spin-
transfer torque magnetic memory as a stochastic memristive synapse,”
in IEEE International Symposium on Circuits and Systems (ISCAS).
IEEE, 2014, pp. 1074–1077.
[16] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based
learning applied to document recognition,” Proc. IEEE, vol. 86, no. 11,
pp. 2278–2324, 1998.
[17] G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, “Extreme learning machine:
a new learning scheme of feedforward neural networks,” in Proc. IEEE
Int. Joint Conference on Neural Networks, vol. 2. IEEE, 2004, pp.
985–990.
[18] G.-B. Huang, L. Chen, and C.-K. Siew, “Universal approximation using
incremental constructive feedforward networks with random hidden
nodes,” IEEE Trans. Neural Netw., vol. 17, no. 4, pp. 879–892, 2006.
[19] D. Chabi, W. Zhao, D. Querlioz, and J.-O. Klein, “On-chip universal
supervised learning methods for neuro-inspired block of memristive
nanodevices,” ACM Journal on Emerging Technologies in Computing
Systems (JETC), vol. 11, no. 4, p. 34, 2015.
[20] C. Bennett, D. Chabi, T. Cabaret, B. Jousselme, V. Derycke,
D. Querlioz, and J.-O. Klein, “Supervised learning with organic memristor
devices and prospects for neural crossbar arrays,” in IEEE/ACM
International Symposium on Nanoscale Architectures (NANOARCH). IEEE,
2015, pp. 181–186.
[21] D. Chabi, Z. Wang, C. Bennett, J.-O. Klein, and W. Zhao, “Ultrahigh
density memristor neural crossbar for on-chip supervised learning,”
IEEE Trans. Nanotechnol., vol. 14, no. 6, pp. 954–962, 2015.
[22] A. van Schaik and J. Tapson, “Online and adaptive pseudoinverse
solutions for ELM weights,” Neurocomputing, vol. 149, pp. 233–238,
2015.
[23] M. Suri and V. Parmar, “Exploiting intrinsic variability of filamentary
resistive memory for extreme learning machine architectures,” IEEE
Trans. Nanotechnol., vol. 14, no. 6, pp. 963–968, 2015.
[24] M. D. McDonnell, M. D. Tissera, T. Vladusich, A. van Schaik, and
J. Tapson, “Fast, simple and accurate handwritten digit classification
by training shallow neural network classifiers with the extreme learning
machine algorithm,” PLoS ONE, vol. 10, no. 8, p. e0134254, 2015.
[25] J. Tapson, P. de Chazal, and A. van Schaik, “Explicit computation of
input weights in extreme learning machines,” in Proceedings of ELM-
2014 Volume 1. Springer, 2015, pp. 41–49.
[26] T. M. Bartol, C. Bromer, J. P. Kinney, M. A. Chirillo, J. N. Bourne,
K. M. Harris, and T. J. Sejnowski, “Nanoconnectomic upper bound on
the variability of synaptic plasticity,” eLife, p. e10778, 2015.
[27] E. Zamanidoost, M. Klachko, D. Strukov, and I. Kataeva, “Low area
overhead in-situ training approach for memristor-based classifier,” in
IEEE/ACM Int. Symp. Nanoscale Architectures (NANOARCH), 2015.
IEEE, 2015, pp. 139–142.
Fig. 4. Top panels: Classification rate of the simple system on the simple images as a function of the time step ∆t, (A) for different numbers of patterns presented
per epoch (class) and (B) for different degrees of device variability (n = 45 patterns imprinted per epoch). Bottom panels: Classification rate on MNIST, also as
a function of ∆t. (C) For different numbers J of output neurons, with n = 50 fixed. The three insets depict reconstructed conductance maps for one of the output
neurons (Ji) imprinted at the given ∆t; white represents devices in LTP (ON), and black those in STP (OFF). (D) Effect of the degree of device variability (J = 10
output neurons, n = 50 in each case). Every image has 10% noise.
Fig. 5. Conceptual architecture diagram of the dual-crossbar system, which can handle higher-dimensional tasks by projecting the currents produced by the examples
onto a second, regression layer. One representative moment of the system, the presentation of an image Tri = 7 during the training period, is depicted. The
timeline below depicts operation through the major phases. All variables are as defined in Fig. 2.
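For readers unfamiliar with the standard (random-weight) ELM baseline that the dual-crossbar system is compared against, a minimal software sketch may help: inputs are projected through a fixed first layer into M hidden units, and only the second (regression) layer is trained. This is an illustration in NumPy, not the memristive system itself: the tanh nonlinearity, the Gaussian random input weights, and the batch pseudoinverse readout of the original ELM formulation [17] are swapped in for the on-chip imprinting and training described in this work, and the function names are hypothetical.

```python
import numpy as np


def elm_fit(X, labels, n_hidden, n_classes, rng):
    """Train a standard ELM: fixed random first layer, least-squares readout."""
    # First layer: random Gaussian weights that are never trained (the
    # standard-ELM baseline; the imprinted variant would instead derive
    # these weights from imprinted device conductances).
    W_in = rng.standard_normal((X.shape[1], n_hidden))
    H = np.tanh(X @ W_in)                # hidden activations, shape (N, M)
    targets = np.eye(n_classes)[labels]  # one-hot targets, shape (N, classes)
    # Second (regression) layer: Moore-Penrose pseudoinverse solution.
    W_out = np.linalg.pinv(H) @ targets  # shape (M, classes)
    return W_in, W_out


def elm_predict(X, W_in, W_out):
    """Predicted class = index of the largest readout output."""
    return np.argmax(np.tanh(X @ W_in) @ W_out, axis=1)


# Toy usage on random data (not MNIST): 20 samples, 5 features, 3 classes.
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5))
y = np.arange(20) % 3
W_in, W_out = elm_fit(X, y, n_hidden=50, n_classes=3, rng=rng)
print("training accuracy:", np.mean(elm_predict(X, W_in, W_out) == y))
```

Because M = 50 exceeds the 20 training samples here, the readout can fit the training set exactly; this toy run says nothing about generalization and is only meant to show where the fixed and trained layers sit.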
Fig. 6. Classification rate of the ELM-inspired system as a function of the delay between pattern presentations during the imprinting phase, (A) for different
numbers of patterns presented per epoch (M = 10), (B) for different numbers of hidden neurons M (n = 50 patterns), and (C) for different nanodevice variability
cases (M = 100, n = 50). In every case, T = 1 s, N = 60k, K = 10k, and 10% noise is applied to every image during imprinting, training, and testing.
