Committee machines -- a universal method to deal with non-idealities in
  memristor-based neural networks by Joksas, D. et al.
Committee Machines—A Universal Method to Deal with
Non-Idealities in Memristor-Based Neural Networks
D. Joksas1∗, P. Freitas2, Z. Chai2, W. H. Ng1, M. Buckwell1,
C. Li3, W. D. Zhang2, Q. Xia3, A. J. Kenyon1, and A. Mehonic1∗
1Department of Electronic and Electrical Engineering,
University College London, London (United Kingdom)
2Department of Electronics and Electrical Engineering,
Liverpool John Moores University, Liverpool (United Kingdom)
3Department of Electrical and Computer Engineering,
University of Massachusetts Amherst (United States of America)
Artificial neural networks are notoriously power- and time-consuming when implemented on conventional von Neumann computing systems. Recent years have seen the emergence of research in hardware that strives to break the von Neumann bottleneck and optimise the data flow, namely, to bring memory and computing closer together. One of the most often suggested solutions is the physical implementation of artificial neural networks in which their synaptic weights are realised with memristive devices, such as resistive random-access memory. However, various device- and system-level non-idealities usually prevent these physical implementations from achieving high inference accuracy. We suggest applying a well-known concept in computer science, the committee machine, in the context of memristor-based neural networks. Using simulations and experimental data from three different types of memristive devices, we show that committee machines employing ensemble averaging can successfully increase inference accuracy in physically implemented neural networks that suffer from faulty devices, device-to-device variability, random telegraph noise and line resistance. Importantly, we show that the accuracy can be improved even without increasing the total number of memristors.
I. INTRODUCTION
Artificial neural networks (ANNs), in all their variants, are now the main tools in machine learning tasks such as classification. The vast amounts of data being constantly produced have enabled successful training and operation of ANNs. However, to achieve high inference accuracy, neural networks usually need a large number of parameters. This makes both the training [1] and inference [2] stages time- and power-consuming, largely because data must be transferred from memory to computing units: the physical separation of memory and computing is the essence of any von Neumann system.
One of the most promising solutions to these problems is the paradigm of non-von Neumann computing and, specifically, analogue implementations of synapses (weights) in physical ANNs. Because there are many more synapses than neurons in ANNs, the matrix-vector multiplications that use the synaptic weight values are the costliest operations in these networks, both in terms of power and time. Computing directly in memory would minimise costly data transfers from off-chip memory, so the most popular approach is to use analogue memory devices as proxies for the synaptic weights of ANNs (both fully connected networks and their variants [3, 4]). A common technique is to arrange such devices in a structure called a crossbar array, in which every device (or a pair of devices) represents a single synaptic weight or, more generally, an entry in a matrix [5]. Memristive devices, such as phase-change memories (PCMs) [6, 7] or resistive random-access memories (RRAMs) [8, 9], have been considered as candidates for such tasks. Although here we focus on ex-situ training, such systems have been successfully utilised for in-situ training too [10, 11].
∗Correspondence and requests for materials should be addressed to A.M. (adnan.mehonic.09@ucl.ac.uk) or D.J. (dovydas.joksas.15@ucl.ac.uk).
In memristive implementations of ANNs, the main concern is that various non-idealities associated with these devices can prevent these systems from achieving high accuracy [21, 22]. Examples of non-idealities affecting inference accuracy include, but are not limited to, devices that fail to electroform, devices stuck in one of the resistance states after electroforming, device-to-device (D2D) variability and random telegraph noise (RTN). When training analogue systems in-situ, limited endurance and non-linear resistance modulation must also be taken into account. To mitigate the effects of these device non-idealities, it is often necessary to modify the device structure [9], to use more advanced programming schemes [17], or to use additional circuitry [13] or high-precision processing units [23] in conjunction with memristive elements. At the system level, line resistance affects the distribution of currents and thus decreases the accuracy. These line resistance effects can be partially compensated for algorithmically [18] or partially mitigated by using multiple smaller crossbar arrays [24]. Examples of past efforts at dealing with these and other non-idealities of memristive devices and their systems are listed in Table I; most of these non-idealities are still the main focus of research in the neuromorphic community.
arXiv:1909.06658v3 [cs.ET] 13 Jun 2020
First author (year) | Non-ideality | Device type | Proposed solution
C. Sung (2018) [12] | Current/voltage non-linearity | TaOx RRAM | Hot-forming step is adopted
C. Li (2018) [13] | Current/voltage non-linearity | Ta/HfO2 RRAM | 1T1R architecture is adopted
Y. Fang (2018) [14] | Device-to-device variability | HfOx RRAM | Ultra-thin ALD-TiN buffer layer is introduced
B. Govoreanu (2013) [15] | Device-to-device variability | Al2O3/TiO2 (VMCO) RRAM | Non-filamentary RRAM is adopted
A. J. Kenyon (2019) [16] | Device-to-device variability | SiOx RRAM | The roughness of bottom electrodes is increased
L. Xia (2017) [17] | Faulty devices | - | A modified mapping algorithm and redundancy schemes are used
S. Ambrogio (2018) [7] | Limited dynamic range | PCM | Two pairs of conductance of varying significance are used for every synaptic weight
M. Hu (2016) [18] | Line resistance | - | Advanced mapping algorithms are used to compensate for line resistance effects
W. Wu (2018) [19] | Programming non-linearity | HfOx RRAM | Electro-thermal modulation layer is deposited on the switching layer
J. Woo (2016) [9] | Programming non-linearity | HfO2 RRAM | Bilayer structure is adopted
S. Ambrogio (2018) [7] | Programming non-linearity | PCM | PCM devices are used together with CMOS transistors
Z. Chai (2018) [20] | Random telegraph noise | TiO2/a-Si (aVMCO) RRAM | Non-filamentary RRAM is adopted
Table I: Examples of past efforts at dealing with non-idealities of memristive devices and their systems.
We propose a simple way to mitigate the effects of all types of non-idealities during inference: combining several non-ideal memristor-based neural networks into committees to achieve better accuracy. The committee machine (CM) method we propose significantly increases the inference accuracy and does not increase the computation time, because the memristive ANNs in such committees work in parallel.
In this work, we first explain the simulation setup: what networks were trained, how they were simulated and how they were combined into CMs. The experimental part follows. We investigate three different types of memristor technology: tantalum/hafnium oxide-based (Ta/HfO2), tantalum oxide-based (Ta2O5), and amorphous vacancy modulated conductive oxide-based (aVMCO) devices. We explore their non-idealities relevant to inference (faulty devices, D2D variability, RTN, and line resistance) and use the experimental data to simulate memristive ANNs working individually and in committees.
II. RESULTS
A. Simulation setup
Fully connected ANNs were trained in software to recognise handwritten digits (using the MNIST database [25]). Architectures with one hidden layer were explored. Unless stated otherwise, the simulations used networks with 25 hidden neurons. However, networks with 50, 100 and 200 hidden neurons were additionally employed to evaluate the effectiveness of the proposed method while controlling for the total number of memristors required. Following training, the weights of the ANNs were mapped onto pairs of conductances using a proportional mapping scheme (see [33]) to simulate memristor-based ANNs. Finally, these memristive networks were disturbed using experimental data to reflect the effect of device- and system-level non-idealities.
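The weight-to-conductance step can be illustrated with a short Python sketch. The exact scheme is the one defined in [33]; the version below is an assumption made for illustration: each weight w is represented by a pair (G+, G−) whose difference is proportional to w, the device of the pair that is not needed stays at a minimum conductance g_min, and |w| is scaled linearly so that the largest |w| in the layer reaches g_max. The function names and the 0.1 mS to 1.0 mS range (the Ta/HfO2 operating range used in this work) are illustrative choices.

```python
import numpy as np


def proportional_map(weights, g_min, g_max):
    """Map weights onto differential conductance pairs (illustrative sketch).

    Assumed scheme: G_pos - G_neg = k * w, with k chosen so that the largest
    |w| in the layer uses the full range g_max - g_min. Positive weights raise
    G_pos, negative weights raise G_neg; the other device stays at g_min.
    """
    w = np.asarray(weights, dtype=float)
    k = (g_max - g_min) / np.max(np.abs(w))  # siemens per unit weight
    g_pos = g_min + k * np.clip(w, 0.0, None)
    g_neg = g_min + k * np.clip(-w, 0.0, None)
    return g_pos, g_neg, k


def recover_weights(g_pos, g_neg, k):
    """Invert the mapping: the conductance difference is proportional to w."""
    return (g_pos - g_neg) / k


weights = np.array([[0.5, -1.0], [0.25, 0.0]])
g_pos, g_neg, k = proportional_map(weights, g_min=1e-4, g_max=1e-3)  # 0.1 mS to 1.0 mS
print(g_pos * 1e3)  # positive-weight conductances in mS
print(recover_weights(g_pos, g_neg, k))
```

Because g_min appears in both devices of a pair, it cancels in the difference G+ − G−, which is what makes the differential pair convenient for signed weights.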
Figure 1: Using multiple neural networks to improve inference accuracy. A) The principle of EA: an MNIST input is fed to networks N(*1), N(*2), ..., N(*n), whose outputs y1, y2, ..., yn are averaged into y. B) Using identical digital networks when implementing committees of memristive neural networks only helps to deal with the damage to the networks caused by the non-idealities. C) Using different digital networks when implementing committees of memristive neural networks both helps to deal with the damage to the networks caused by the non-idealities and allows the knowledge of individual digital networks about the data set to be combined.
After simulating physical non-idealities, the networks were combined into CMs that employed ensemble averaging (EA) [26]. The principle of EA is shown in Figure 1A: several networks are combined in parallel and their outputs are averaged. The prediction is then made using the averaged vector; it is the label corresponding to the largest entry in that vector.
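Ensemble averaging itself takes only a few lines. The Python sketch below assumes that every network's outputs for a batch are available as one array of shape (n_networks, n_samples, n_classes); the function name and the toy numbers are hypothetical.

```python
import numpy as np


def ensemble_average_predict(outputs):
    """Ensemble averaging (EA) over a committee of networks.

    outputs: array of shape (n_networks, n_samples, n_classes) holding the
    output vectors of every network in the committee.
    Returns the predicted label for each sample.
    """
    averaged = np.mean(outputs, axis=0)  # equal weighting of every network
    return np.argmax(averaged, axis=1)   # label of the largest entry


# Three hypothetical networks, two samples, three classes.
outputs = np.array([
    [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]],
    [[0.1, 0.5, 0.4], [0.1, 0.3, 0.6]],
    [[0.5, 0.2, 0.3], [0.3, 0.3, 0.4]],
])
individual = np.argmax(outputs, axis=2)  # each network's own predictions
committee = ensemble_average_predict(outputs)
print(individual)
print(committee)
```

Here the three networks disagree on both samples, and the committee settles each disagreement through the averaged vector rather than through any single network.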
CM methods are frequently used even with conventional ANNs. Methods such as EA often produce better accuracy than the best individual network in a committee [27]. Although there are other types of CMs besides EA, they often rely on training additional gating networks or boosting networks during the training stage. Using a gating network in this scenario would produce additional problems: to avoid it acting as a performance bottleneck, it too would have to be implemented on crossbar arrays. Various non-idealities would then decrease the effectiveness of this gating network, which is responsible for making the decisions about the whole committee of ANNs. Likewise, we speculate that boosting of networks would not be feasible in ex-situ training because it requires information about where individual ANNs perform poorly, and this cannot be known precisely until they are implemented physically on crossbar arrays and the non-idealities manifest themselves. To the authors' best knowledge, the application of boosting in the context of memristive neural networks seems to have been explored only once before [28]; as expected, it requires training each memristive implementation differently because non-idealities manifest themselves differently in different crossbar arrays.
There exist modifications of the EA algorithm that could potentially perform better. One example is the generalized ensemble method (GEM) which, instead of using equal weightings for each network during averaging (as in EA), uses a different weighting for each network [26]. These weightings are determined analytically by considering the correlation of errors between different networks. However, because [26] only considered networks with a mean square error loss function (while our networks used a cross-entropy loss function), this work does not explore GEM. Instead, we investigated whether it is possible to achieve better performance by optimising the weightings numerically. This method, like GEM and the others previously mentioned, might be impractical because, firstly, these weightings could be determined only after the ANNs are physically implemented on crossbars, and, secondly, the devices could change throughout their lifetimes, thus affecting the optimal weightings.
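For reference, the GEM weighting can be sketched as follows. This follows the standard formulation of the generalized ensemble method, in which the weights are α = C⁻¹1 / (1ᵀC⁻¹1), with C_ij the mean product of the errors of networks i and j under a mean square error criterion. It was not evaluated in this work; the function names and toy data are hypothetical, and C must be invertible (it is not, for example, when two networks make identical errors).

```python
import numpy as np


def gem_weights(predictions, targets):
    """Generalized ensemble method (GEM) weights under a mean-square-error criterion.

    predictions: (n_networks, n_samples, n_outputs); targets: (n_samples, n_outputs).
    Returns weights alpha that sum to one.
    """
    n_networks = predictions.shape[0]
    misfits = (targets[np.newaxis] - predictions).reshape(n_networks, -1)
    corr = misfits @ misfits.T / misfits.shape[1]  # C_ij = mean(m_i * m_j)
    ones = np.ones(n_networks)
    c_inv_ones = np.linalg.solve(corr, ones)       # C^{-1} 1 without forming C^{-1}
    return c_inv_ones / c_inv_ones.sum()


def weighted_committee_output(predictions, alpha):
    """Weighted average of network outputs; equal alpha reduces this to EA."""
    return np.tensordot(alpha, predictions, axes=1)


# Two hypothetical networks with uncorrelated errors; the second is twice as noisy.
targets = np.zeros((4, 1))
predictions = np.array([
    [[-1.0], [1.0], [-1.0], [1.0]],
    [[-2.0], [-2.0], [2.0], [2.0]],
])
alpha = gem_weights(predictions, targets)
gem_out = weighted_committee_output(predictions, alpha)
ea_out = weighted_committee_output(predictions, np.full(2, 0.5))
print(alpha)
```

In this toy case the noisier network receives a smaller weight, and the GEM combination has a lower mean square error than the equally weighted (EA) one.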
Even under the assumption that the devices would have perfect retention, we found that optimising the weightings achieves effectively the same performance as EA. For these reasons, we focus only on EA in the main text, but present our results of optimising weightings in Supplementary Figure S3. We stress that we are open to the idea that other CM methods besides EA could be utilised successfully for ex-situ training in the context of memristive ANNs. However, in this work we focus on demonstrating that CMs can be used to improve the accuracy of memristor-based ANNs in general.
With EA, we find that even when the memristive ANNs that go into a committee all use the same digitally implemented weights mapped onto crossbar arrays (see Figure 1B), a committee of memristor-based networks can still achieve higher accuracy than a single non-ideal network. Although all networks have the same digital weights before mapping, their physical implementations (which we call "disturbances" in Figures 1B, C because they can usually be represented by the modification of individual weights) will be different. For example, in one crossbar array a certain set of devices will be faulty, while in another crossbar array it will be a different set. As a result, different physical implementations will have slightly different learned representations of the data set; to paraphrase, different networks will be "damaged" differently by the non-idealities. These committees will therefore be able to combine different representations, and thus achieve higher accuracy. However, such an approach would almost certainly not yield a committee accuracy higher than the accuracy of a single digitally implemented network, because every committee member is a degraded copy of that same network.
A better approach is to use different digital networks for the different physical implementations that go into a committee (see Figure 1C). This approach more closely resembles the conventional application of EA in computer science. In the context of memristive crossbar arrays, it would not only help to mitigate the effects of the non-idealities (as in the case of Figure 1B), but would also allow the representations of digital networks that were different even before the mapping stage to be combined. Most importantly, this method allows a committee to achieve higher accuracy, sometimes even higher than that of individual networks with digitally implemented weights. We therefore used this method in this analysis.
In this work, any given committee used only one network architecture, but each network was initialised differently before training, so the trained networks had different sets of weights. Although it was not explored in this work, combining different network architectures in a committee of memristor-based networks might be advantageous. Furthermore, in this work we focus on fully connected ANNs, but CMs could be applied to other variants of neural networks as well. Due to the simplicity of EA, it could, for example, be employed in convolutional neural networks (CNNs) [29], which are often used for image classification. This might be of interest as CNNs have recently been implemented successfully using crossbar arrays [30]. However, crossbar implementations are naturally more suited to fully connected networks, so we limit ourselves to this architecture but are open to exploring the effectiveness of EA with memristive CNNs in the future.
B. Ta/HfO2 RRAM
With array-level data available, Ta/HfO2 experiments provide the most complete picture of device- and system-level non-idealities. In this subsection, we present not only the analysis of faulty devices and D2D variability, but also a careful consideration of line resistance effects. Ta/HfO2 memristors do not exhibit apparent RTN and overall have excellent retention properties [31], which makes them well suited to inference applications.
1. Faulty devices and device-to-device variability
Figure 2: Experimental data of Ta/HfO2 RRAM crossbar array of shape 128 × 64. A) Modulation of devices' conductance (mS) versus pulse number (#) over 11 SET cycles, each consisting of 100 potentiating pulses. Violin plots of gradual conductance changes are shown for all Ta/HfO2 devices, with dots representing median conductance after a certain number of pulses. 100 points were used for Gaussian kernel density estimation. All violin plots have their maximum widths normalised. B-F) Examples of devices with their conductance (in mS) B) spanning the full range, C) spanning part of the full range, D) exhibiting cycle-to-cycle variability, E) stuck at high values, F) stuck at low values. These diagrams show the conductance of five devices from the Ta/HfO2 crossbar array over 11 SET and RESET cycles. The radial component represents the conductance, while the angular component represents the number of applied pulses. The first SET cycle starts at the top of each of the diagrams. The conductance (in blue) over 100 SET pulses is displayed in a clockwise fashion across the right half of each of the diagrams. Following that, the conductance (in orange) over 100 RESET pulses (starting at the bottom) is displayed across the left half of each of the diagrams, after which the next cycle is displayed.
The most energy-efficient way to modulate the conductance of memristors is to apply voltage pulses. In an ideal scenario, one would apply identical pulses and observe constant increases in conductance with each pulse. This is rarely the case in practice but, fortunately, this type of behaviour matters mostly for in-situ training, where it is necessary to ensure linear adjustment of the ANN's weights [32]. In ex-situ training, conductance verification schemes can be used to program the devices precisely. Because the devices would have to be programmed only once, one can spend additional resources to do so accurately by applying SET (potentiation) and RESET (depression) pulses until a desirable conductance state is achieved.
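A program-and-verify loop of the kind described above can be sketched in Python. The device model (ToyMemristor) is entirely hypothetical: SET pulses raise the conductance by a small random step, RESET pulses lower it by a larger random step to mimic the abrupt RESET behaviour, and the conductance is confined to a device-specific range. The tolerance and the pulse budget are illustrative parameters, not values from this work.

```python
import numpy as np


class ToyMemristor:
    """Hypothetical device: noisy SET steps, abrupt RESET steps, bounded range (S)."""

    def __init__(self, g, g_lo, g_hi, step, rng):
        self.g, self.g_lo, self.g_hi, self.step, self.rng = g, g_lo, g_hi, step, rng

    def read(self):
        return self.g

    def set_pulse(self):
        self.g = min(self.g + self.rng.uniform(0.0, 2.0 * self.step), self.g_hi)

    def reset_pulse(self):
        self.g = max(self.g - self.rng.uniform(0.0, 6.0 * self.step), self.g_lo)


def program_and_verify(device, target, tol, max_pulses=1000):
    """Apply SET/RESET pulses until the read conductance is within tol of target.

    Returns (final conductance, number of pulses applied, success flag).
    """
    for n_pulses in range(max_pulses):
        g = device.read()
        if abs(g - target) <= tol:
            return g, n_pulses, True
        if g < target:
            device.set_pulse()    # potentiate
        else:
            device.reset_pulse()  # depress
    g = device.read()
    return g, max_pulses, abs(g - target) <= tol


rng = np.random.default_rng(0)
good = ToyMemristor(g=1e-4, g_lo=1e-4, g_hi=1e-3, step=1e-5, rng=rng)
limited = ToyMemristor(g=1e-4, g_lo=1e-4, g_hi=8e-4, step=1e-5, rng=rng)  # Figure 2C-like range
result_good = program_and_verify(good, target=6e-4, tol=1e-5)
result_limited = program_and_verify(limited, target=9e-4, tol=1e-5)
print(result_good)
print(result_limited)
```

The second device never reaches 0.9 mS and stops at its ceiling once the pulse budget is exhausted, which is the situation the next paragraphs address.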
Even with this approach, two obstacles remain: faulty devices and D2D variability. In most memristor technologies, at least a small fraction of the devices tends to get stuck in a particular conductance state. Additionally, even if not stuck, different devices might behave differently; for example, they might have different conductance ranges. Figure 2A shows conductance changes in Ta/HfO2 RRAM devices (in a 128 × 64 crossbar array) when voltage pulses are applied to them. The median values show that, overall, the devices' conductance tends to increase as more SET pulses are applied. However, the wider bottom regions of the violin plots indicate that some devices are stuck around the high resistance state (HRS) and cannot fully set no matter how many voltage pulses are applied. There are also devices that are stuck in the low resistance state (LRS), or that simply do not span the full conductance range.
Figure 2A combines data from multiple SET cycles for each of the memristors, so it is important to understand how these devices behave individually. Figures 2B-F show the conductance of 5 (out of 8,192) devices over 11 SET and RESET cycles. In the five diagrams, the radial component represents the conductance (in mS) and the angular component represents the number of applied pulses. Figure 2B shows an example of preferable (and typical) device behaviour: the conductance changes in a continuous fashion and spans a wide range of values, from ∼0.1 mS to ∼1.0 mS. Although RESET cycles tend to feature abrupt decreases in conductance, one can always repeat a cycle and exploit the more predictable behaviour of SET cycles.
When encoding continuous numbers into crossbar devices' conductances, it is often preferable to choose a large enough conductance range. Using data from Figure 2A, one could, for example, choose the range between the first and the last median points (from ∼0.1 mS to ∼1.0 mS). The device whose behaviour is presented in Figure 2B could easily be set to any conductance within that range, as we have seen. On the other hand, the device whose behaviour is presented in Figure 2C, although operating in a predictable fashion, has a smaller conductance range: in all cycles, its conductance does not exceed 0.8 mS. This is an example of D2D variability that can make it difficult to choose an optimal operating range and set the conductance of all devices precisely.
The device whose behaviour is presented in Figure 2D shows high cycle-to-cycle variability. Although that could prove to be a problem in some applications, this specific device might still serve its purpose well in ex-situ training of ANNs: it spans the same conductance range as the device from Figure 2B, even if in an unpredictable manner. Because all states in the full range are, in theory, achievable, one can cycle the device multiple times until it is set to the required conductance level.
Lastly, there are devices whose negative effect is the most difficult to mitigate: faulty devices. Figure 2E shows the behaviour of a device stuck at high conductance values, while Figure 2F shows the behaviour of a device stuck at low conductance values. No matter how many pulses are applied to these devices or how many times they are cycled, they exhibit almost no conductance variation and thus, in most cases, cannot be used to encode information.
Because some devices behave like the ones shown in Figures 2C, E and F, it is important to minimise their negative effect. If the conductance that a device has to be set to is outside that device's range, it is sensible to set it to the closest achievable conductance. Although little can be done about fully stuck memristors, it is possible to optimise the behaviour of devices like the one in Figure 2C that simply have a smaller conductance range. For example, if such a device has to be set to 0.9 mS, one would set it to its highest achievable conductance (∼0.8 mS). In the following simulations involving faulty devices and D2D variability, the operating range between the first and the last median points was used; the devices were chosen randomly from the 128 × 64 crossbar and set to the most desirable states, as described in this paragraph.
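Setting each device to its closest achievable conductance amounts to clipping the target to the range of the physical device it is assigned to. In the sketch below, the arrays of per-device minimum and maximum conductances are hypothetical stand-ins for the measured Ta/HfO2 ranges. A stuck device is modelled as one whose minimum equals its maximum, and devices are drawn at random without replacement from the pool of measured devices.

```python
import numpy as np


def sample_devices(measured_min, measured_max, n, rng):
    """Randomly pick n distinct devices from a pool of measured conductance ranges."""
    idx = rng.choice(measured_min.size, size=n, replace=False)
    return measured_min[idx], measured_max[idx]


def set_to_nearest_achievable(targets, dev_min, dev_max):
    """Program each device to the achievable conductance closest to its target."""
    return np.clip(targets, dev_min, dev_max)


# Hypothetical ranges in mS: full range, Figure 2C-like, stuck high, stuck low.
measured_min = np.array([0.1, 0.1, 1.0, 0.1])
measured_max = np.array([1.0, 0.8, 1.0, 0.1])
targets = np.array([0.5, 0.9, 0.3, 0.6])

programmed = set_to_nearest_achievable(targets, measured_min, measured_max)
print(programmed)

rng = np.random.default_rng(0)
dev_min, dev_max = sample_devices(measured_min, measured_max, n=3, rng=rng)
print(set_to_nearest_achievable(targets[:3], dev_min, dev_max))
```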
2. Line resistance
The effect of line resistance can be extremely detrimental in many crossbar-based implementations of ANNs, especially when the crossbars are large and the resistance of the interconnects is large compared to the memristors' resistance. Because in a neural network many of the inputs are non-zero at any given time, a lot of current accumulates in the bit lines. This results in significant voltage drops across the interconnects, which affect the current distribution across the crossbar in a major way.
There are many possible ways to map synaptic weights onto crossbar arrays, and the choice can determine how strongly line resistance affects the result. Synaptic layers of ANNs are often large. However, that does not mean that the weights in those layers have to be mapped onto crossbars of equivalent shape; not only is that sometimes impossible, but it can also amplify the effect of line resistance. For example, if a synaptic layer with 785 input neurons (as is the case with the first layer of our ANNs) was mapped onto a crossbar with 785 word lines, massive amounts of current would accumulate in the bit lines.
The Ta/HfO2 crossbar has shape 128 × 64, so this shape was chosen for all the simulations involving line resistance. Even relatively small ANNs of architecture 784(+1):25(+1):10 would need 2 × (785 × 25 + 26 × 10) = 39,770 memristors to be implemented. Even if not all the inputs were used at any given time, it would not be possible to fit all the memristors onto a single crossbar of shape 128 × 64, which contains only 8,192 devices. To overcome this, we decided to simulate multiple crossbars, each of which would implement a subset of the synaptic weights but, for a given synaptic layer, would all compute in parallel. Because ⌈785/128⌉ = 7, seven crossbars were used to implement the first synaptic layer; the first six crossbars utilised all 128 word lines, while the last one used only the bottom 17 word lines because 785 − 6 × 128 = 17. The second synaptic layer was implemented using an eighth crossbar utilising its bottom 26 word lines.
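The crossbar-count arithmetic above can be checked with a small Python helper. It assumes, as in this work, one differential pair of devices per weight, and that all bit lines of a layer fit into one crossbar's columns (2 × 25 = 50 ≤ 64 for the first layer). The function name is hypothetical.

```python
import math


def tile_layer(n_inputs, n_outputs, rows=128, cols=64):
    """Split a synaptic layer across crossbars of shape rows x cols.

    n_inputs includes the bias input. Each weight uses a pair of devices, so a
    layer needs 2 * n_outputs bit lines, assumed here to fit in one crossbar.
    Returns the number of crossbars and the word lines used in each.
    """
    if 2 * n_outputs > cols:
        raise ValueError("bit lines of the layer do not fit in one crossbar")
    n_crossbars = math.ceil(n_inputs / rows)
    last = n_inputs - (n_crossbars - 1) * rows
    return n_crossbars, [rows] * (n_crossbars - 1) + [last]


layers = [(785, 25), (26, 10)]  # 784(+1):25(+1):10
for n_in, n_out in layers:
    print(tile_layer(n_in, n_out))

n_memristors = sum(2 * n_in * n_out for n_in, n_out in layers)
print(n_memristors)
```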
Figure 3A shows an example of how the first synaptic layer of a 784(+1):25(+1):10 neural network could be implemented. Specifically, it shows how the first subset of weights would be implemented using one of the crossbars. Because we use the proportional mapping scheme, positive and negative weights would be implemented in different bit lines. In Figure 3A, memristors designated to implement positive weights are coloured in blue, memristors designated to implement negative weights are coloured in orange, and unelectroformed memristors are coloured in black. Because the simulations were constrained by experimental data, the rightmost bit lines are unused and assumed to contain only unelectroformed devices. In practice, the crossbars could be manufactured to fit the geometry of the ANNs.
Figure 3: Theoretical implementation of a synaptic layer of shape 785 × 25 using crossbars of shape 128 × 64. A) Mapping the first subset of weights (inputs x1 to x128, applied as voltages V1 to V128) onto one of the seven crossbars used to implement the whole synaptic layer. Positive weights and negative weights are mapped onto memristors in different bit lines (25 bit lines each, out of the crossbar's 64). B) Heatmap of average changes in output currents (%), by output number, due to line resistance (in all seven Ta/HfO2 crossbars) without and with a scheme that maps certain inputs onto certain word lines depending on expected average intensities of those inputs. For this particular simulation, it was assumed that Ta/HfO2 devices can be programmed perfectly.
In each synaptic layer, the corresponding output currents from each of the crossbars would be added together. Additionally, output currents at the bit lines implementing negative weights would be subtracted from the output currents at the corresponding bit lines implementing positive weights. For example, in the configuration of Figure 3A, the output current at the 26th bit line would be subtracted from the output current at the 1st bit line, and so on.
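Ignoring line resistance, this summation and subtraction can be sketched in Python. The sketch assumes ideal crossbars in which each bit line outputs the dot product of its word-line voltages and device conductances, inputs encoded as voltages x × v_scale, and the bit-line layout of Figure 3A (positive weights in bit lines 1 to n_out, negative weights in the next n_out). It repeats an assumed proportional mapping inline so that it runs on its own; v_scale and the function name are hypothetical.

```python
import numpy as np


def layer_output(x, g_pos, g_neg, rows=128, v_scale=0.1):
    """Ideal multi-crossbar computation of one synaptic layer (no line resistance).

    x: inputs of length n_in (bias included); g_pos, g_neg: (n_in, n_out) conductances.
    Returns I_pos - I_neg for every output after summing over all crossbars.
    """
    n_in, n_out = g_pos.shape
    total = np.zeros(2 * n_out)
    for start in range(0, n_in, rows):  # one iteration per crossbar
        stop = min(start + rows, n_in)
        voltages = x[start:stop] * v_scale
        # Bit lines 1..n_out hold positive weights, n_out+1..2*n_out negative ones.
        tile = np.hstack([g_pos[start:stop], g_neg[start:stop]])
        total += voltages @ tile  # Kirchhoff summation along each bit line
    return total[:n_out] - total[n_out:]  # e.g. bit line 26 subtracted from bit line 1


rng = np.random.default_rng(0)
w = rng.normal(size=(785, 25))
x = rng.uniform(size=785)
g_min, g_max, v_scale = 1e-4, 1e-3, 0.1
k = (g_max - g_min) / np.abs(w).max()  # assumed proportional mapping
g_pos = g_min + k * np.clip(w, 0.0, None)
g_neg = g_min + k * np.clip(-w, 0.0, None)

currents = layer_output(x, g_pos, g_neg, v_scale=v_scale)
y = currents / (k * v_scale)  # convert currents back to the weighted sums
print(np.allclose(y, x @ w))
```

In the ideal case the multi-crossbar result equals the digital matrix-vector product; the next paragraphs discuss what line resistance does to it.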
Unfortunately, even when multiple smaller crossbars are used, the interconnects can significantly disturb the current distribution in the crossbar. The top heatmap of Figure 3B shows the average decreases in output current due to line resistance in all seven crossbars of Ta/HfO2 devices (whose resistance ranges from ∼1 kΩ to ∼11 kΩ, with an interconnect resistance of 0.3 Ω). The current decreases range from ∼15% at the outputs nearest to the applied voltages to ∼18% at the outputs in the rightmost bit lines that are used. Such large current decreases often result from large input voltages applied at the top part of the crossbar, far away from the outputs. Such inputs generate large amounts of current that flow through large portions of the bit lines and, with voltage drops across the interconnects, disturb the overall current distribution in a major way.
Figure 4: Accuracy achieved by individual networks and their committees when faulty devices, D2D variability data and line resistance of the Ta/HfO2 crossbar are taken into account. Box plots of accuracy (%) are shown for each network type: ideal ANN (software baseline), memristive ANN (without non-idealities), memristive ANN (with non-idealities), and committee machines of 2, 3, 4 and 5 memristive ANNs (with non-idealities). The maximum whisker length is set to 1.5 × IQR.
In some applications, such as supervised learning, it might be possible to strategically map certain inputs to certain word lines so that the effect of line resistance is minimised. We propose intensity-aware reordering of the ANN's inputs, in which we record the average input intensities over the training and verification sets, and then map the inputs with the highest average intensities to the word lines closest to the outputs of a crossbar. This ensures that most of the current is generated near the outputs, while the currents in the top parts of the bit lines are disturbed minimally. The bottom heatmap in Figure 3B shows the average current decreases when using such a scheme with an unseen test set; the decreases are significantly smaller. Additionally, to make the influence of positive and negative weights (which are affected very differently in the naive mapping of Figure 3A) more equal and to increase the variability between different ANNs in a committee, we suggest random reordering of inputs and outputs. Both intensity-aware and random reordering were used in all the following simulations involving line resistance. The implementation of these methods, individually and in combination with each other, is explained in more detail in the supplementary information.
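The benefit of intensity-aware reordering can be illustrated with a deliberately simplified Python model of a single bit line. This is not the simulator used in this work, and it rests on the following assumptions: word lines have no resistance, the bit line has resistance r_line between adjacent cells and between the bottom cell and the grounded output, and row 0 is the top word line, farthest from the output. Nodal analysis then gives a tridiagonal linear system for the bit-line node voltages. Input rows are reordered together with their devices, using mean intensities from a hypothetical training set; apart from the 0.3 Ω interconnect resistance, all numbers are illustrative.

```python
import numpy as np


def bit_line_current(v_in, g, r_line):
    """Output current of one bit line with interconnect resistance (simplified model).

    v_in: word-line voltages (V), g: device conductances (S), row 0 at the top.
    KCL at node i: g_i (V_i - v_i) + (v_{i-1} - v_i)/r + (v_{i+1} - v_i)/r = 0,
    where the node below the last row is the grounded output.
    """
    n = len(g)
    g_line = 1.0 / r_line
    n_links = np.where(np.arange(n) > 0, 2.0, 1.0)  # top node has no neighbour above
    a = np.diag(g + g_line * n_links)
    off = -g_line * np.ones(n - 1)
    a += np.diag(off, 1) + np.diag(off, -1)
    v_nodes = np.linalg.solve(a, g * v_in)
    return g_line * v_nodes[-1]  # current through the last segment into the output


def relative_error(v_in, g, r_line):
    """Fractional decrease of the output current relative to the ideal dot product."""
    return 1.0 - bit_line_current(v_in, g, r_line) / np.dot(g, v_in)


def intensity_aware_order(mean_intensity):
    """Row order placing the highest mean intensities nearest the output (bottom)."""
    return np.argsort(mean_intensity)


rng = np.random.default_rng(0)
n_rows, r_line = 128, 0.3                        # 0.3 ohm interconnects, as in the text
scale = rng.uniform(0.0, 1.0, size=n_rows)       # hypothetical per-input brightness
x_train = rng.uniform(size=(1000, n_rows)) * scale
order = intensity_aware_order(x_train.mean(axis=0))

g = rng.uniform(1e-4, 1e-3, size=n_rows)         # hypothetical device conductances
v_test = 0.1 * rng.uniform(size=n_rows) * scale  # an unseen input, in volts
print("original order:       ", relative_error(v_test, g, r_line))
print("intensity-aware order:", relative_error(v_test[order], g[order], r_line))
```

For equal conductances in every row, the model is linear and each row's contribution to the output grows as the row gets closer to the output. Sorting the inputs in ascending order of intensity toward the output should therefore give the smallest current loss; with unequal conductances the improvement is expected but not guaranteed.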
3. Inference accuracy
Figure 4 shows the accuracy of individual networks, as well as of their committees; memristive ANNs were simulated by taking into account the three non-idealities of the Ta/HfO2 crossbar explored earlier: faulty devices, D2D variability and line resistance. As indicated by the yellow box plot in Figure 4, individual networks implemented digitally achieve ∼95.9% median accuracy. Networks disturbed to reflect the effect of non-idealities achieve ∼90.8% median accuracy, as indicated by the vermilion box plot. Although that is a substantial drop in accuracy, the accuracy increases as more networks are added to the committee. When 5 networks are used in a committee, the median accuracy increases to ∼95.8%, as indicated by the rightmost green box plot.
Figure 5: Cumulative probability plots of RTN-induced relative current deviations for all 8 resistance states of a Ta2O5 RRAM device, shown as cumulative probability (%) versus absolute relative error of current (%, logarithmic scale); higher resistance states lie towards larger errors. Lognormal fits are shown for each resistance state.
C. Ta2O5 RRAM
To explore how effectively CMs minimise the adverse effects of RTN, we use another memristor technology, based on Ta2O5. To investigate RTN, measurements from a single device were considered. To simulate line resistance effects, the interconnect resistance from the Ta/HfO2 array was used and the same crossbar shape was assumed.
1. Random telegraph noise
Memristors often suffer from RTN, which results in a different accuracy at any given instant in time. The Ta2O5 device was characterised by measuring the current of 8 resistance states multiple times. Figure 5 shows the cumulative probability plots for those resistance states, together with lognormal fits modelling the nature of RTN. The figure reveals that higher resistance states suffer from a higher degree of RTN. Fits for every resistance state, together with occurrence rates (see Supplementary Table SII), were used to disturb the weights of ANNs in order to reproduce the effect of RTN.
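The RTN disturbance described above can be sketched in Python, the language the simulations were implemented in. This is a minimal sketch, not the authors' code. It assumes that each state's lognormal fit describes the fractional absolute relative error of the current (and hence of the conductance at a fixed read voltage), that the occurrence rate is the probability that a device is caught in an RTN event at a given read, and that the deviation is equally likely to be positive or negative. The helper `apply_rtn` and all parameter values are hypothetical placeholders, not the fitted values of Supplementary Table SII.

```python
import numpy as np

def apply_rtn(conductances, states, mu, sigma, p_rtn, rng):
    """Disturb programmed conductances with RTN-like deviations (hypothetical helper).

    conductances : programmed conductances in siemens
    states       : integer array (same shape) with each device's resistance state
    mu, sigma    : per-state parameters of a lognormal fit to the fractional
                   absolute relative error of the current
    p_rtn        : per-state occurrence rate of RTN (probability per read)
    """
    mu_s = np.asarray(mu)[states]
    sigma_s = np.asarray(sigma)[states]
    p_s = np.asarray(p_rtn)[states]
    # Magnitude of the relative deviation, drawn from each state's lognormal fit.
    magnitude = rng.lognormal(mean=mu_s, sigma=sigma_s)
    # Assumption: RTN is equally likely to raise or lower the current.
    sign = rng.choice([-1.0, 1.0], size=conductances.shape)
    # Only devices caught in an RTN event at this instant are disturbed.
    affected = rng.random(conductances.shape) < p_s
    disturbed = conductances * (1.0 + affected * sign * magnitude)
    # A physical conductance cannot be negative.
    return np.clip(disturbed, 0.0, None)

# Placeholder parameters: 8 states from 25 kOhm to 200 kOhm, with larger
# (made-up) RTN magnitudes for higher resistance states.
resistances = np.linspace(25e3, 200e3, 8)
levels = 1.0 / resistances
mu = np.log(np.linspace(0.02, 0.2, 8))
sigma = np.full(8, 0.8)
p_rtn = np.full(8, 0.3)

rng = np.random.default_rng(0)
states = rng.integers(0, 8, size=(785, 25))  # e.g. a 785 x 25 synaptic layer
G = levels[states]
G_rtn = apply_rtn(G, states, mu, sigma, p_rtn, rng)
```

Redrawing `G_rtn` for each disturbance iteration mimics reading the same crossbar at different instants.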
[Figure 6 plot: accuracy (%), axis from 90% to 97%, for each network type: ideal ANN (software baseline); memristive ANN (without non-idealities); memristive ANN (with non-idealities); committee machines of 2, 3, 4 and 5 memristive ANNs (with non-idealities).]
Figure 6: Accuracy achieved by individual networks and
their committees when RTN data of a Ta2O5 device are taken
into account. Additionally, interconnect resistance of 0.3 Ω
(from Ta/HfO2 array) was used to include line resistance ef-
fects. The maximum whisker length is set to 1.5× IQR.
2. Inference accuracy
The results combining RTN and line resistance effects for the Ta2O5 device are shown in Figure 6. The difference in median accuracy between the yellow and blue box plots shows a significant drop in accuracy due simply to the mapping of weights onto conductances. That is not surprising given that only 8 states were available for mapping. The further drop in median accuracy due to non-idealities is less severe: it falls to ∼94.2%. The RTN disturbance magnitude is below 100% in most cases, which possibly contributes to its smaller effect on accuracy. Additionally, the Ta2O5 device has a much higher resistance (ranging from 25 kΩ to 200 kΩ), so line resistance is also less of a concern. When non-ideal networks are combined into committees of 5, the median accuracy jumps to ∼96.5%, even higher than the software baseline of individual networks. This reveals an additional trend seen in all the simulations performed: the higher the accuracy of the individual non-ideal memristive networks, the higher the accuracy of the committees that they are part of.
D. aVMCO RRAM
Further, we consider a third memristor technology, one based on aVMCO materials. We test the effects of RTN by considering measurements from a single device. Line resistance effects were simulated by using the interconnect resistance and shape of the Ta/HfO2 crossbar array.
[Figure 7 plot: cumulative probability (%) versus absolute relative error of current (%, logarithmic axis from 10^0 to 10^1); data points and lognormal fits for each resistance state, with higher resistance states toward larger errors.]
Figure 7: Cumulative probability plots of RTN-induced rel-
ative current deviations for all 8 resistance states of aVMCO
RRAM device. Lognormal fits are shown for each resistance
state.
1. Random telegraph noise
Figure 7 shows the cumulative probability plots for
8 resistance states of an aVMCO device suffering from
RTN. As in Ta2O5, we observe that higher resistance states experience RTN of higher magnitude. However, compared to Ta2O5, the RTN magnitude is much more predictable. Fits for each of the 8 resistance states, together with occurrence rates (see Supplementary Table SIII), were used to simulate the effect of RTN in aVMCO-based neural networks.
2. Inference accuracy
The results combining RTN and line resistance are
shown in Figure 8. As with Ta2O5, we see a large drop due to mapping onto conductances, a consequence of the very few states available for mapping. More interestingly, the accuracy of individual memristor-based networks with and without non-idealities is almost identical. That is because the occurrence rate of RTN in the aVMCO device is small and RTN is much less likely to have a large magnitude. Additionally, the resistance of the aVMCO device is even higher than that of the Ta2O5 device: it ranges from 1 MΩ to 7.5 MΩ. Therefore, line resistance has an even smaller effect in a hypothetical array of aVMCO devices. Because the median accuracy of individual non-ideal memristor-based networks is higher (∼94.7%), the median accuracy of committees is higher too: in committees of size 5 it increases to ∼96.6%.
[Figure 8 plot: accuracy (%), axis from 92% to 98%, for each network type: ideal ANN (software baseline); memristive ANN (without non-idealities); memristive ANN (with non-idealities); committee machines of 2, 3, 4 and 5 memristive ANNs (with non-idealities).]
Figure 8: Accuracy achieved by individual networks and
their committees when RTN data of an aVMCO device are
taken into account. Additionally, interconnect resistance of
0.3 Ω (from Ta/HfO2 array) was used to include line resistance
effects. The maximum whisker length is set to 1.5× IQR.
III. DISCUSSION
The results from the previous section suggest that the
method of using committee machines to improve the
accuracy of memristive neural networks is technology-
agnostic. CMs can mitigate the effects of faulty devices,
D2D variability, RTN and line resistance in combination
with each other. Although line resistance is more difficult to deal with using committees, because it affects the crossbars of all networks in a similar way, random reordering can increase the effectiveness of ensembles of non-ideal memristive networks. In all cases, we observe that the accuracy of individual non-ideal networks largely determines the accuracy of committees. That is consequential because it means that, although committees always increase the accuracy, there is still an incentive to optimise the devices and systems that implement these networks: the higher the accuracy of individual networks, the higher the accuracy of the committees.
It is also important to consider whether larger networks, instead of committees of smaller networks, would yield the same results if the large network used the same number of synapses (or memristors) as the committee. In our previous work we found that the accuracy of networks before disturbance (which we call starting accuracy) has a large effect on their robustness to non-idealities: the higher the starting accuracy, the more robust the networks become [33]. One way to achieve higher starting accuracy is to use larger networks; e.g., if we have a network with one hidden layer, we might increase the number of neurons in that hidden layer, which would likely result in higher accuracy after training and thus higher robustness.
[Figure 9 plot: median accuracy (%), axis from 90% to 98%, versus total number of memristors (logarithmic axis, ticks at 10^5 and 10^6) for individual networks and committees of 2, 3, 4 and 5 networks.]
Figure 9: Median accuracy achieved by individual one-hidden-layer memristor-based networks and their committees, when controlled for the total number of memristors required. The networks contained 25, 50, 100 or 200 hidden neurons and were disturbed using faulty devices and D2D variability data from the Ta/HfO2 crossbar.
Figure 9 compares CMs of memristor-based networks disturbed using faulty devices and D2D variability data from the Ta/HfO2 crossbar, controlled for the total number of memristors required to implement them (line resistance was not taken into account because it takes too long to simulate in large networks). Committees of two networks, each with 25 hidden neurons (leftmost data point of the orange curve), achieve ∼0.9% higher median accuracy than individual networks with 50 hidden neurons (second data point from the left in the vermilion curve), despite both requiring almost identical total numbers of memristors. Committees of two networks, each with 100 hidden neurons (third data point from the left in the orange curve), achieve ∼1.1% higher median accuracy than individual networks with 200 hidden neurons (rightmost data point in the vermilion curve), even though both require almost the same total number of memristors. An even larger improvement is gained when committees of four networks, each with 50 hidden neurons (second data point from the left in the blue curve), are used instead: the accuracy is then improved by ∼1.5%, with almost exactly the same total number of memristors.
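The memristor counts behind these comparisons can be checked with a few lines of arithmetic. The sketch assumes a 784-h-10 network with one bias neuron feeding each of the hidden and output layers, and two memristors per weight (the conductance pair of the proportional mapping scheme described in Methods); the function name `n_memristors` is illustrative.

```python
def n_memristors(hidden, n_in=784, n_out=10, devices_per_weight=2):
    """Memristors for a one-hidden-layer network with bias neurons (illustrative)."""
    first_layer = (n_in + 1) * hidden    # inputs + bias -> hidden neurons
    second_layer = (hidden + 1) * n_out  # hidden + bias -> output neurons
    return devices_per_weight * (first_layer + second_layer)

comparisons = [(50, 2, 25), (200, 2, 100), (200, 4, 50)]
for single, members, member_hidden in comparisons:
    print(f"1 x {single} hidden: {n_memristors(single):,} memristors; "
          f"{members} x {member_hidden} hidden: {members * n_memristors(member_hidden):,}")
# 1 x 50 hidden: 79,520 memristors; 2 x 25 hidden: 79,540
# 1 x 200 hidden: 318,020 memristors; 2 x 100 hidden: 318,040
# 1 x 200 hidden: 318,020 memristors; 4 x 50 hidden: 318,080
```

In each pair the committee needs only a few tens of extra devices, coming from the additional bias-to-output weights.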
For different non-idealities and even different training schemes of the ANNs, the equivalents of Figure 9 might differ, but they share a few common characteristics. In all cases, for a given total number of memristors, there is an optimal number of networks that should be used in a committee. Additionally, we observe that the more severe a non-ideality is, the more apparent the effectiveness of committees becomes. Finally, committees (for a fixed total number of memristors) sometimes achieve lower accuracy than individual networks, but only if the networks that they replace are very small and the non-ideality is not very detrimental. If the networks being replaced with committees of smaller networks are sufficiently large, the committees will achieve higher accuracy. An example of this is shown in Supplementary Figure S5, where the aVMCO device is minimally affected by the non-idealities, so the advantage of committees becomes apparent only when replacing larger networks.
The reason why committees work in the context of non-
ideal implementations and why they work best when they
are used to replace large networks might, to some extent,
lie in their training. When it comes to training fully
connected networks, their accuracy tends to saturate as
more weights are added. Supplementary Figure S2 shows
that networks with 50 hidden neurons can be trained to
achieve significantly higher accuracy than networks with
25 hidden neurons. However, networks with 200 hidden
neurons achieve only slightly higher accuracy than net-
works with 100 hidden neurons. This also means that
networks with 200 hidden neurons will be only slightly
more robust to non-idealities than networks with 100 hid-
den neurons. When such networks are affected by non-
idealities, their accuracy drops to similar values but the
smaller network can work in a committee with one more
network, totalling almost the same number of memris-
tors as the large network, but achieving higher accuracy
overall. This is the most likely reason why the commit-
tees of smaller networks are effective at dealing with non-
idealities, especially when replacing large networks.
In addition to the accuracy improvements, committees
can provide flexibility in memristive implementations of
neural networks. Digital implementations of ANNs have
very predictable behaviour due to the precision of digi-
tal logic. Analogue implementations, on the other hand,
can vary greatly even if they use the same weights be-
fore the mapping onto conductances—that is a result of
the stochastic nature of memristors that implement these
ANNs. The parallel and modular nature of committee
machines makes memristive systems much more flexible.
For example, if the verification accuracy of one of the
ANNs in a memristor-based CM deteriorates below ac-
ceptable levels, its outputs could be disabled to ensure
higher accuracy of the rest of the committee.
Importantly, this introduced parallelism comes at al-
most no extra cost. For a fixed total number of mem-
ristors, a committee of smaller networks, compared to
a large individual network, would only require a few
additional output and bias neurons, and an averaging
functionality, which could potentially be implemented in
hardware. For example, an ANN with 50 hidden neurons
would require 846 neurons in total, while a committee of
two ANNs, each with 25 hidden neurons (and thus requir-
ing almost the same total number of memristors), would
require 857 neurons in total.
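The neuron counts quoted above can be reproduced under one inferred assumption: the members of the committee share a single input layer (784 pixels plus a bias neuron), while each member has its own hidden layer, hidden-layer bias neuron and 10 outputs. The helper names are hypothetical.

```python
def n_neurons_single(hidden, n_in=784, n_out=10):
    # input neurons + input bias + hidden neurons + hidden bias + outputs
    return n_in + 1 + hidden + 1 + n_out

def n_neurons_committee(hidden, members, n_in=784, n_out=10):
    # Assumption: one shared input layer (with its bias neuron); each member
    # adds its own hidden neurons, hidden-layer bias neuron and outputs.
    return n_in + 1 + members * (hidden + 1 + n_out)

print(n_neurons_single(50))        # 846
print(n_neurons_committee(25, 2))  # 857
```

The 11 extra neurons are one more hidden-layer bias neuron and one more set of 10 outputs.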
In summary, our simulations employing experimental
data from three different types of memristive devices
show that committee machines employing ensemble av-
eraging can be used to mitigate the effects of device-
and system-level non-idealities in memristor-based neural
networks. EA makes it possible to achieve higher inference accuracy
in physically implemented neural networks that suffer
from faulty devices, device-to-device variability, random
telegraph noise, and even line resistance. This method
is a universal way to deal with the most common non-
idealities and is straightforward to implement during the
fabrication stage. Increased modularity of these memris-
tive neural network systems will increase not only their
inference accuracy, but also their robustness and flexi-
bility, even without the need to sacrifice area. Although
some level of non-idealities in memristors is unavoidable,
the CM method allows us to deal with these at the system
level and is agnostic to a particular technology or, to
some degree, type of the non-ideality.
Methods
Experiments
The Ta/HfO2 RRAM 1T1R array consists of NMOS transistors fabricated in a commercial fab (feature size of 2 µm) and Pt/HfO2/Ta devices. The bottom electrode was deposited by evaporation of a 20 nm Pt layer on top of a 2 nm tantalum (Ta) adhesion layer; the electrode was patterned by photolithography and a lift-off process. A 5 nm HfO2 switching layer was deposited by atomic layer deposition using water and tetrakis(dimethylamido)hafnium as precursors at 250 °C. Sputter-deposited Ta of 50 nm thickness followed by 10 nm Pd was used in a lift-off process to serve as the top electrode. The filamentary-based Ta2O5 device consists of a TiN/4 nm stoichiometric Ta2O5/20 nm nonstoichiometric TaOx/10 nm TaN/TiN stack with a cross-sectional area of 75 nm × 75 nm, while the non-filamentary-based aVMCO device has a cross-sectional area of 135 nm × 135 nm and is composed of a TiN/8 nm amorphous-Si/8 nm anatase TiO2/TiN stack. Ta2O5 and aVMCO devices were fabricated by imec. The detailed fabrication process parameters can be found in references [11, 34, 35] for Ta/HfO2, Ta2O5 and aVMCO RRAMs, respectively.
The conductance of Ta/HfO2 devices was modulated by applying SET pulses (500 µs at 2.5 V, with the gate voltage increasing from 0.6 V to 1.6 V). After each of the 11 cycles, RESET pulses were applied (5 µs at 0.9 V increasing to 2.2 V, with a gate voltage of 5 V). The voltage was increased linearly over the 100 pulses. All electrical tests for Ta2O5 and aVMCO devices were done with a Keysight B1500A. The RTN data were extracted by switching the device into 8 uniformly distributed resistance levels between 25 kΩ and 200 kΩ for Ta2O5, and 8 nearly uniformly distributed resistance levels between 1 MΩ and 7.5 MΩ for aVMCO, using incremental RESET DC sweeps [36]. RTN measurement was then carried out at each resistance level with a read-out voltage of 0.1 V for Ta2O5 and 3 V for aVMCO, with a sampling time of 2 ms/point and 10,000 sampling points per resistance level, giving an RTN measurement period of 20 s.
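The measurement figures above are consistent with each other, as the short calculation below illustrates for the Ta2O5 levels and the RTN trace length. It only reproduces the stated numbers; the aVMCO levels are described as only nearly uniform, so they are not reconstructed here.

```python
import numpy as np

# Ta2O5 RTN levels: 8 uniformly distributed resistances from 25 kOhm to 200 kOhm.
ta2o5_resistances = np.linspace(25e3, 200e3, 8)
level_spacing = np.diff(ta2o5_resistances)      # 25 kOhm between neighbouring levels

# RTN trace per level: 10,000 samples taken every 2 ms.
n_samples = 10_000
sampling_time = 2e-3                            # seconds per point
measurement_period = n_samples * sampling_time  # seconds per resistance level

# Corresponding conductance window, which is what the weight mapping works with.
g_min = 1.0 / ta2o5_resistances.max()
g_max = 1.0 / ta2o5_resistances.min()
print(f"spacing {level_spacing[0] / 1e3:.0f} kOhm, period {measurement_period:.0f} s, "
      f"G from {g_min * 1e6:.0f} uS to {g_max * 1e6:.0f} uS")
# spacing 25 kOhm, period 20 s, G from 5 uS to 40 uS
```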
Simulations
In this work, feed-forward ANNs with fully connected layers and continuous weights were trained to recognise handwritten digits using the MNIST database. All 60,000 MNIST training images were used during the training stage: the training set consisted of 50,000 images and the verification set of 10,000 images. All 10,000 test images were used to evaluate the inference accuracy of ANNs. Networks used 784 input neurons representing pixel intensities of the 28 × 28 pixel MNIST images, as well as one bias neuron. Ten output neurons were used; they represented the ANNs' predictions of the 10 handwritten digits. The hidden layer used the sigmoid activation function, while the output layer used the softmax activation function. Weights were optimised by minimising the cross-entropy error function using stochastic gradient descent. A learning rate of 0.01 and a patience of 25 epochs were used. For each architecture explored, 25 networks were trained by initialising them differently. When numerically optimising the ANNs' weightings, the optimisation was performed on the verification set, while the performance was evaluated using the test set. The code was implemented in Python.
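A minimal NumPy sketch of the network and training step described above is given below: a bias neuron appended to the input and hidden layers, a sigmoid hidden layer, a softmax output layer, cross-entropy loss and a gradient-descent update with learning rate 0.01. It is not the authors' code. Random toy data stands in for MNIST, the initialisation scale is an assumption, and the early-stopping loop (patience of 25 epochs) is omitted for brevity. Each call to the hypothetical `sgd_step` processes one mini-batch; the toy example simply reuses a single batch.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def add_bias(a):
    # Append a constant-1 "bias neuron" to every example.
    return np.hstack([a, np.ones((a.shape[0], 1))])

def forward(x, w1, w2):
    h = sigmoid(add_bias(x) @ w1)  # hidden layer, sigmoid activation
    y = softmax(add_bias(h) @ w2)  # output layer, softmax activation
    return h, y

def cross_entropy(y, t):
    return -np.mean(np.sum(t * np.log(y + 1e-12), axis=1))

def sgd_step(x, t, w1, w2, lr=0.01):
    h, y = forward(x, w1, w2)
    d2 = (y - t) / x.shape[0]                # gradient w.r.t. output pre-activations
    g2 = add_bias(h).T @ d2
    d1 = (d2 @ w2[:-1].T) * h * (1.0 - h)    # back through sigmoid; bias row skipped
    g1 = add_bias(x).T @ d1
    return w1 - lr * g1, w2 - lr * g2

# Toy data standing in for MNIST: 784 pixel intensities in [0, 1], 10 classes.
rng = np.random.default_rng(0)
n_hidden = 25
x = rng.random((64, 784))
t = np.eye(10)[rng.integers(0, 10, size=64)]  # one-hot targets
w1 = rng.normal(0.0, 0.1, size=(785, n_hidden))
w2 = rng.normal(0.0, 0.1, size=(n_hidden + 1, 10))

loss_before = cross_entropy(forward(x, w1, w2)[1], t)
for _ in range(100):
    w1, w2 = sgd_step(x, t, w1, w2)
loss_after = cross_entropy(forward(x, w1, w2)[1], t)
```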
Weights were mapped onto pairs of memristors' conductances using a proportional mapping scheme: synaptic weights were made proportional to one of the conductances in the pair, while the other device was left unelectroformed. The zero weight was interpreted as given; in practice, it would be implemented by not electroforming the device, thus resulting in its negligible conductance. Although aVMCO devices do not have an electroforming stage, for consistency we assumed that additional insulating circuit elements could be used to implement the zero weight. Negative weights would be implemented by placing certain memristors in dedicated bit lines of the crossbars whose outputs would be subtracted from the outputs at the corresponding bit lines implementing positive weights. Maximum weights after mapping were optimised separately for each combination of network architecture and conductance levels; in each case this was done by excluding a certain proportion, pL, of the weights with the largest absolute values. The pL values used in each simulation are summarised in Supplementary Table SI. More details on the mapping procedure can be found in our past work [33].
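The proportional mapping can be sketched as follows. This is an illustration under stated assumptions, not the exact procedure of [33]: the largest weight kept after excluding a fraction `p_l` of the largest-magnitude weights is mapped to the highest conductance `g_on`, mapped conductances are clipped into `[g_off, g_on]` and optionally snapped to the nearest available level, and an unelectroformed device is modelled as 0 S. The helper name `map_weights` is hypothetical.

```python
import numpy as np

def map_weights(weights, g_off, g_on, p_l, levels=None):
    """Proportional mapping onto conductance pairs (hypothetical helper).

    Returns (g_pos, g_neg, k): conductances of the positive and negative
    devices of each pair, and k, the conductance per unit weight.
    """
    abs_w = np.abs(weights)
    # Maximum weight: ignore the fraction p_l of weights with the largest |w|.
    w_max = np.quantile(abs_w, 1.0 - p_l)
    k = g_on / w_max
    # Assumption: mapped values are clipped into the achievable range.
    g = np.clip(k * abs_w, g_off, g_on)
    if levels is not None:
        levels = np.sort(np.asarray(levels))
        # Assumption: snap each conductance to the nearest achievable level.
        g = levels[np.abs(g[..., None] - levels).argmin(axis=-1)]
    g = np.where(weights == 0, 0.0, g)  # zero weight: both devices unformed
    g_pos = np.where(weights > 0, g, 0.0)
    g_neg = np.where(weights < 0, g, 0.0)
    return g_pos, g_neg, k

# Example with 8 Ta2O5-like levels (25 kOhm to 200 kOhm) and a placeholder p_l.
rng = np.random.default_rng(1)
W = rng.normal(0.0, 0.1, size=(785, 25))
levels = 1.0 / np.linspace(25e3, 200e3, 8)
g_pos, g_neg, k = map_weights(W, levels.min(), levels.max(), p_l=0.01, levels=levels)
W_read = (g_pos - g_neg) / k  # weights as seen by the crossbar
```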
All non-idealities, except for line resistance, were
simulated by disturbing the individual conductances of
memristor-based ANNs. To investigate line resistance, loop analysis was employed: simultaneous linear equations were set up using Kirchhoff's current and voltage laws and solved in sparse matrix representation using the Python library SciPy.
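The line-resistance solve can be illustrated with a small crossbar model. The authors used loop analysis; the hypothetical `crossbar_currents` below instead uses nodal analysis, an equivalent way of writing Kirchhoff's laws with node voltages as unknowns, and solves the resulting sparse system with `scipy.sparse.linalg.spsolve`. It assumes a single device per cross point, ideal voltage drivers at the left end of each word line, bit lines held at 0 V at their bottom end, and the same resistance for every wire segment.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def crossbar_currents(G, v_in, r_wire):
    """Bit-line output currents of a crossbar with interconnect resistance.

    G      : (m, n) device conductances; row i is word line i, column j bit line j
    v_in   : (m,) voltages applied at the left end of each word line
    r_wire : resistance of each wire segment between adjacent cross points
    """
    m, n = G.shape
    g_w = 1.0 / r_wire
    N = m * n
    w = lambda i, j: i * n + j      # word-line node at (i, j)
    b = lambda i, j: N + i * n + j  # bit-line node at (i, j)
    A = sp.lil_matrix((2 * N, 2 * N))
    rhs = np.zeros(2 * N)

    def connect(p, q, g):
        # Stamp a conductance g between nodes p and q into the KCL matrix.
        A[p, p] += g
        A[q, q] += g
        A[p, q] -= g
        A[q, p] -= g

    for i in range(m):
        for j in range(n):
            connect(w(i, j), b(i, j), G[i, j])      # memristor
            if j + 1 < n:
                connect(w(i, j), w(i, j + 1), g_w)  # word-line segment
            if i + 1 < m:
                connect(b(i, j), b(i + 1, j), g_w)  # bit-line segment
        # First word-line segment connects node (i, 0) to the voltage source.
        A[w(i, 0), w(i, 0)] += g_w
        rhs[w(i, 0)] += g_w * v_in[i]
    for j in range(n):
        # Last bit-line segment connects node (m - 1, j) to ground.
        A[b(m - 1, j), b(m - 1, j)] += g_w
    v = spsolve(A.tocsc(), rhs)
    # Output current = current through the last bit-line segment into ground.
    return g_w * v[[b(m - 1, j) for j in range(n)]]

rng = np.random.default_rng(0)
G = rng.uniform(1e-5, 1e-4, size=(8, 4))
v = rng.uniform(0.0, 0.2, size=8)
ideal = v @ G
with_lines = crossbar_currents(G, v, r_wire=0.3)
print(np.max(np.abs(with_lines - ideal) / ideal))
```

In this toy 8 × 4 array the deviation is small; it grows with array size and device conductance.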
After simulating memristor non-idealities, committees
of different ANNs were composed. Committees used EA,
i.e. the outputs of individual networks in a committee
were averaged to produce a single output vector. In EA,
the output vectors of individual networks can simply be
added together (if the weightings of different networks
are the same, as we assume in the main text); the label
corresponding to the entry with the highest value would
be the prediction of the committee. This addition can be
performed either in software, or, if the activation function
of the last neuronal layer can be implemented physically,
it can be performed by adding corresponding currents
produced by the circuitry of this activation function.
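Ensemble averaging as described above can be sketched as follows. The helper `committee_predict` is hypothetical; passing `weights` illustrates unequal weightings, which the main text does not assume. The toy example also shows that averaging can disagree with a majority vote: two of the three networks favour class 0 for the first example, but the summed outputs favour class 1.

```python
import numpy as np

def committee_predict(outputs, weights=None):
    """Ensemble averaging over the output vectors of a committee.

    outputs : (k, n_examples, n_classes) outputs of k member networks
    weights : optional (k,) committee weightings; equal weightings if None
    """
    outputs = np.asarray(outputs)
    if weights is None:
        # With equal weightings, averaging and plain summation share the same argmax.
        combined = outputs.sum(axis=0)
    else:
        combined = np.tensordot(weights, outputs, axes=1)
    # The label of the largest combined entry is the committee's prediction.
    return combined.argmax(axis=1)

outputs = np.array([
    [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]],  # network 1
    [[0.1, 0.8, 0.1], [0.1, 0.3, 0.6]],  # network 2
    [[0.5, 0.4, 0.1], [0.2, 0.2, 0.6]],  # network 3
])
print(committee_predict(outputs))  # [1 2]
```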
In the simulations, the neural networks that go into a committee were chosen randomly. This was done to reflect the most convenient strategy when manufacturing such systems: because one does not need to selectively choose the networks, manufactured crossbars can be easily programmed without the need to replace them if they perform poorly when working individually (unless their effect is so detrimental that they have to be ignored, which this technique makes possible). Besides, devices might change over time, so these simulations, which show what happens when one does not selectively choose the networks, are valuable for investigating conditions where it is not possible to replace the networks.
In the simulations, 25 base networks (each with a different set of weights) were used for each of the architectures. All of their weights were then mapped onto pairs of conductances using HRS/LRS values extracted from experiments. Finally, to reflect the effect of each of the non-idealities, all networks were disturbed multiple times. In each disturbance iteration, multiple combinations of networks were chosen and their performance as a committee of a certain size was evaluated. In total, for each simulation (except for numerically optimised committees, which used 1,000 points), 10,000 data points were recorded for a committee of every size; these data captured the variations of base networks, their combinations and different disturbance iterations.
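The sampling procedure above can be sketched as follows, with randomly generated outputs standing in for the disturbed networks. It assumes that the recorded data points are split evenly across disturbance iterations; how the authors split them is not specified here, and the helper name is hypothetical.

```python
import numpy as np

def sample_committee_accuracies(outputs_per_iteration, labels, size, n_points, rng):
    """Record committee accuracies over random member choices (hypothetical helper).

    outputs_per_iteration : list of (n_networks, n_examples, n_classes) arrays,
                            one per disturbance iteration
    labels                : (n_examples,) true class labels
    """
    accuracies = []
    per_iteration = n_points // len(outputs_per_iteration)
    for outputs in outputs_per_iteration:
        n_networks = outputs.shape[0]
        for _ in range(per_iteration):
            # Members are drawn at random, not selected by individual accuracy.
            members = rng.choice(n_networks, size=size, replace=False)
            # Equal-weight ensemble averaging reduces to summing the outputs.
            prediction = outputs[members].sum(axis=0).argmax(axis=1)
            accuracies.append(np.mean(prediction == labels))
    return np.array(accuracies)

rng = np.random.default_rng(0)
n_networks, n_examples, n_classes = 25, 200, 10
labels = rng.integers(0, n_classes, size=n_examples)
# Toy stand-in for the outputs of 25 disturbed networks in 10 disturbance iterations.
outputs_per_iteration = [rng.random((n_networks, n_examples, n_classes)) for _ in range(10)]
acc = sample_committee_accuracies(outputs_per_iteration, labels, size=5,
                                  n_points=1000, rng=rng)
print(np.median(acc))
```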
Data Availability
All data generated or analysed during this study are
included in this published article (and its supplementary
information file).
Author Contributions
A.M. and D.J. conceived the idea and designed the
study. A.M., P.F. and Z.C. performed the experimen-
tal measurements. D.J. performed the simulations and
analysed the experimental and simulation results. C.L.
and Q.X. provided the experimental data of the program-
ming of a Ta/HfO2 1T1R RRAM array. A.M., W.D.Z.
and A.J.K. supervised the research. D.J. wrote the initial
manuscript. All authors contributed to the discussions of
the results and improved the text.
Competing Interests Statement
The authors declare that the research was conducted in
the absence of any commercial or financial relationships
that could be construed as a potential conflict of interest.
Funding
A.M. acknowledges funding from the Royal Academy
of Engineering under the Research Fellowship scheme,
A.J.K. acknowledges funding from the Engineering and
Physical Sciences Research Council (EP/P013503/1) and
the Leverhulme Trust (RPG-2016-135), W.D.Z. acknowl-
edges funding from the Engineering and Physical Sci-
ences Research Council (EP/S000259/1).
[1] E. Strubell, A. Ganesh, and A. McCallum, “Energy and
policy considerations for deep learning in NLP,” arXiv
preprint arXiv:1906.02243, 2019.
[2] S. Han, H. Mao, and W. J. Dally, “Deep compression:
Compressing deep neural networks with pruning, trained
quantization and Huffman coding,” in International Conference on Learning Representations, 2016, San Juan
(Puerto Rico), arXiv preprint arXiv:1510.00149.
[3] C. Li, Z. Wang, M. Rao, D. Belkin, W. Song, H. Jiang,
P. Yan, Y. Li, P. Lin, M. Hu, N. Ge, J. P. Strachan,
M. Barnell, Q. Wu, R. S. Williams, J. J. Yang, and
Q. Xia, “Long short-term memory networks in memris-
tor crossbar arrays,” Nature Machine Intelligence, vol. 1,
no. 1, pp. 49–57, 2019, doi: 10.1038/s42256-018-0001-4.
[4] Z. Wang, C. Li, W. Song, M. Rao, D. Belkin, Y. Li,
P. Yan, H. Jiang, P. Lin, M. Hu, J. P. Strachan, N. Ge,
M. Barnell, Q. Wu, A. G. Barto, Q. Qiu, R. S. Williams,
Q. Xia, and J. J. Yang, “Reinforcement learning with
analogue memristor arrays,” Nature Electronics, vol. 2,
no. 3, p. 115, 2019, doi: 10.1038/s41928-019-0221-6.
[5] Z. Sun, G. Pedretti, E. Ambrosi, A. Bricalli, W. Wang,
and D. Ielmini, “Solving matrix equations in one step
with cross-point resistive arrays,” Proceedings of the Na-
tional Academy of Sciences, vol. 116, no. 10, pp. 4123–
4128, 2019, doi: 10.1073/pnas.1815682116.
[6] S. R. Nandakumar, M. Le Gallo, I. Boybat, B. Rajen-
dran, A. Sebastian, and E. Eleftheriou, “A phase-change
memory model for neuromorphic computing,” Journal of
Applied Physics, vol. 124, no. 15, p. 152135, 2018, doi:
10.1063/1.5042408.
[7] S. Ambrogio, P. Narayanan, H. Tsai, R. M. Shelby,
I. Boybat, C. D. Nolfo, S. Sidler, M. Giordano, M. Bodini,
N. C. P. Farinha, B. Killeen, C. Cheng, Y. Jaoudi, and
G. W. Burr, “Equivalent-accuracy accelerated neural-
network training using analogue memory,” Nature, vol.
558, no. 7708, pp. 60–67, 2018, doi: 10.1038/s41586-018-
0180-5.
[8] S. Yu, Z. Li, P. Y. Chen, H. Wu, B. Gao, D. Wang,
W. Wu, and H. Qian, “Binary neural network with
16 Mb RRAM macro chip for classification and on-
line training,” in International Electron Devices Meet-
ing. IEEE, 2016, San Francisco (United States), doi:
10.1109/IEDM.2016.7838429.
[9] J. Woo, K. Moon, J. Song, S. Lee, M. Kwak, J. Park,
and H. Hwang, “Improved synaptic behavior under
identical pulses using AlOx/HfO2 bilayer RRAM ar-
ray for neuromorphic systems,” IEEE Electron De-
vice Letters, vol. 37, no. 8, pp. 994–997, 2016, doi:
10.1109/LED.2016.2582859.
[10] M. Prezioso, F. Merrikh-Bayat, B. D. Hoskins, G. C.
Adam, K. K. Likharev, and D. B. Strukov, “Training and
operation of an integrated neuromorphic network based
on metal-oxide memristors,” Nature, vol. 521, no. 7550,
pp. 61–64, 2015, doi: 10.1038/nature14441.
[11] C. Li, D. Belkin, Y. Li, P. Yan, M. Hu, N. Ge, H. Jiang,
E. Montgomery, P. Lin, Z. Wang, W. Song, J. P.
Strachan, M. Barnell, Q. Wu, R. S. Williams, J. J.
Yang, and Q. Xia, “Efficient and self-adaptive in-situ
learning in multilayer memristor neural networks,” Nature Communications, vol. 9, no. 1, p. 2385, 2018, doi:
10.1038/s41467-018-04484-2.
[12] C. Sung, S. Lim, H. Kim, T. Kim, K. Moon, J. Song, J.-
J. Kim, and H. Hwang, “Effect of conductance linearity
and multi-level cell characteristics of TaOx -based synapse
device on pattern recognition accuracy of neuromorphic
system,” Nanotechnology, vol. 29, no. 11, p. 115203, 2018,
doi: 10.1088/1361-6528/aaa733.
[13] C. Li, M. Hu, Y. Li, H. Jiang, N. Ge, E. Montgomery,
J. Zhang, W. Song, N. Dávila, C. E. Graves, Z. Li,
J. P. Strachan, P. Lin, Z. Wang, M. Barnell, Q. Wu,
S. Williams, J. Yang, and Q. Xia, “Analogue signal and
image processing with large memristor crossbars,” Na-
ture Electronics, vol. 1, no. 1, pp. 52–59, 2018, doi:
10.1038/s41928-017-0002-z.
[14] Y. Fang, Z. Yu, Z. Wang, T. Zhang, Y. Yang, Y. Cai,
and R. Huang, “Improvement of HfOx -based RRAM de-
vice variation by inserting ALD TiN buffer layer,” IEEE
Electron Device Letters, vol. 39, no. 6, pp. 819–822, 2018,
doi: 10.1109/LED.2018.2831698.
[15] B. Govoreanu, A. Redolfi, L. Zhang, C. Adelmann,
M. Popovici, S. Clima, H. Hody, V. Paraschiv, I. Radu,
A. Franquet, J. C. Liu, J. Swerts, O. Richard, H. Ben-
der, L. Altimime, and M. Jurczak, “Vacancy-modulated
conductive oxide resistive RAM (VMCO-RRAM): An
area-scalable switching current, self-compliant, highly
nonlinear and wide on/off-window resistive switch-
ing cell,” in International Electron Devices Meet-
ing. IEEE, 2013, Washington (United States), doi:
10.1109/IEDM.2013.6724599.
[16] A. J. Kenyon, M. S. Munde, W. H. Ng, M. Buckwell,
D. Joksas, and A. Mehonic, “The interplay between
structure and function in redox-based resistance switch-
ing,” Faraday Discussions, vol. 213, pp. 151–163, 2019,
doi: 10.1039/C8FD00118A.
[17] L. Xia, W. Huangfu, T. Tang, X. Yin, K. Chakrabarty,
Y. Xie, Y. Wang, and H. Yang, “Stuck-at fault toler-
ance in RRAM computing systems,” IEEE Journal on
Emerging and Selected Topics in Circuits and Systems,
vol. 8, no. 1, pp. 102–115, 2017, doi: 10.1109/JET-
CAS.2017.2776980.
[18] M. Hu, J. P. Strachan, Z. Li, and R. S. Williams, “Dot-product engine as computing memory to accelerate machine learning algorithms,” in 17th International Symposium on Quality Electronic Design, 2016, Santa Clara (United States), doi: 10.1109/ISQED.2016.7479230.
[19] W. Wu, H. Wu, B. Gao, P. Yao, X. Zhang, X. Peng,
S. Yu, and H. Qian, “A methodology to improve linear-
ity of analog RRAM for neuromorphic computing,” in
Symposium on VLSI Technology. IEEE, 2018, Honolulu
(United States), doi: 10.1109/VLSIT.2018.8510690.
[20] Z. Chai, P. Freitas, W. Zhang, F. Hatem, J. F. Zhang,
J. Marsland, B. Govoreanu, L. Goux, and G. S. Kar,
“Impact of RTN on pattern recognition accuracy of
RRAM-based synaptic neural network,” IEEE Electron
Device Letters, vol. 39, no. 11, pp. 1652–1655, 2018, doi:
10.1109/LED.2018.2869072.
[21] A. Chen and M. R. Lin, “Variability of resistive switch-
ing memories and its impact on crossbar array perfor-
mance,” in 2011 International Reliability Physics Sym-
posium. IEEE, 2011, Monterey (United States), doi:
10.1109/IRPS.2011.5784590.
[22] J. Kang, Z. Yu, L. Wu, Y. Fang, Z. Wang, Y. Cai, Z. Ji,
J. Zhang, R. Wang, and Y. Yang, “Time-dependent vari-
ability in RRAM-based analog neuromorphic system for
pattern recognition,” in International Electron Devices
Meeting. IEEE, 2017, San Francisco (United States),
doi: 10.1109/IEDM.2017.8268340.
[23] M. Le Gallo, A. Sebastian, R. Mathis, M. Manica,
H. Giefers, T. Tuma, C. Bekas, A. Curioni, and E. Eleft-
heriou, “Mixed-precision in-memory computing,” Na-
ture Electronics, vol. 1, no. 4, p. 246, 2018, doi:
10.1038/s41928-018-0054-8.
[24] Q. Xia and J. J. Yang, “Memristive crossbar arrays
for brain-inspired computing,” Nature Materials, vol. 18,
no. 4, p. 309, 2019, doi: 10.1038/s41563-019-0291-x.
[25] Y. LeCun, C. Cortes, and C. J. C. Burges, “The
MNIST database of handwritten digits,” 2010. [Online].
Available: http://yann.lecun.com/exdb/mnist
[26] M. P. Perrone and L. N. Cooper, “When networks dis-
agree: Ensemble methods for hybrid neural networks,” in
Artificial Neural Networks for Speech and Vision. Chap-
man and Hall, 1993, pp. 126–142.
[27] S. Hashem and B. Schmeiser, “Improving model accuracy
using optimal linear combinations of trained neural net-
works,” IEEE Transactions on Neural Networks, vol. 6,
no. 3, pp. 792–794, 1995, doi: 10.1109/72.377990.
[28] B. Li, L. Xia, P. Gu, Y. Wang, and H. Yang, “Merging the
interface: Power, area and accuracy co-optimization for
RRAM crossbar-based mixed-signal computing system,”
in Proceedings of the 52nd Annual Design Automation
Conference, 2015, San Francisco (United States), doi:
10.1145/2744769.2744870.
[29] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet
classification with deep convolutional neural networks,”
in Advances in Neural Information Processing Systems,
2012, pp. 1097–1105, Lake Tahoe (United States), doi:
10.1145/3065386.
[30] Z. Wang, C. Li, P. Lin, M. Rao, Y. Nie, W. Song,
Q. Qiu, Y. Li, P. Yan, J. P. Strachan, N. Ge, N. Mc-
Donald, Q. Wu, M. Hu, H. Wu, R. S. Williams, Q. Xia,
and J. J. Yang, “In situ training of feed-forward and re-
current convolutional memristor networks,” Nature Ma-
chine Intelligence, vol. 1, no. 9, pp. 434–442, 2019, doi:
10.1038/s42256-019-0089-1.
[31] H. Jiang, L. Han, P. Lin, Z. Wang, M. H. Jang, Q. Wu,
M. Barnell, J. J. Yang, H. L. Xin, and Q. Xia, “Sub-10 nm Ta channel responsible for superior performance of a HfO2 memristor,” Scientific Reports, vol. 6, p. 28525,
2016, doi: 10.1038/srep28525.
[32] G. W. Burr, R. M. Shelby, S. Sidler, C. Di Nolfo, J. Jang,
I. Boybat, R. S. Shenoy, P. Narayanan, K. Virwani, E. U.
Giacometti, B. N. Kurdi, and H. Hwang, “Experimen-
tal demonstration and tolerancing of a large-scale neural
network (165 000 synapses) using phase-change memory
as the synaptic weight element,” IEEE Transactions on
Electron Devices, vol. 62, no. 11, pp. 3498–3507, 2015,
doi: 10.1109/TED.2015.2439635.
[33] A. Mehonic, D. Joksas, W. H. Ng, M. Buckwell, and A. J.
Kenyon, “Simulation of inference accuracy using realistic
RRAM devices,” Frontiers in Neuroscience, vol. 13, p.
593, 2019, doi: 10.3389/fnins.2019.00593.
[34] Y. Fan, L. Zhang, D. Crotti, T. Witters, M. Jurczak,
and B. Govoreanu, “Direct evidence of the overshoot
suppression in Ta2O5-based resistive switching memory
with an integrated access resistor,” IEEE Electron De-
vice Letters, vol. 36, no. 10, pp. 1027–1029, 2015, doi:
10.1109/LED.2015.2470081.
[35] B. Govoreanu, D. Crotti, S. Subhechha, L. Zhang,
Y. Chen, S. Clima, V. Paraschiv, H. Hody, C. Adelmann,
M. Popovici, O. Richard, and M. Jurczak, “A-VMCO:
A novel forming-free, self-rectifying, analog memory
cell with low-current operation, nonfilamentary switch-
ing and excellent variability,” in Symposium on VLSI
Technology, 2015, Kyoto (Japan), doi: 10.1109/VL-
SIT.2015.7223717.
[36] Z. Chai, W. Zhang, P. Freitas, F. Hatem, J. F. Zhang,
J. Marsland, B. Govoreanu, L. Goux, G. S. Kar, S. Hall,
P. Chalker, and J. Robertson, “The over-reset phe-
nomenon in Ta2O5 RRAM device investigated by the
RTN-based defect probing technique,” IEEE Electron
Device Letters, vol. 39, no. 7, pp. 955–958, 2018, doi:
10.1109/LED.2018.2833149.
Supplementary Information for the Paper
”Committee Machines—A Universal Method to Deal with
Non-Idealities in Memristor-Based Neural Networks”
D. Joksas1∗, P. Freitas2, Z. Chai2, W. H. Ng1, M. Buckwell1,
C. Li3, W. D. Zhang2, Q. Xia3, A. J. Kenyon1, and A. Mehonic1∗
1Department of Electronic and Electrical Engineering,
University College London, London (United Kingdom)
2Department of Electronics and Electrical Engineering,
Liverpool John Moores University, Liverpool (United Kingdom)
3Department of Electrical and Computer Engineering,
University of Massachusetts Amherst (United States of America)
Reordering Schemes to Deal with Line Resistance
As discussed in the main text, high interconnect resistance can significantly reduce the accuracy of physically implemented ANNs. This is demonstrated in the top left box plots of Supplementary Figure S1, where no reordering scheme is used. When intensity-aware reordering is used, the current drops that line resistance causes in the crossbars can be significantly reduced. This also improves the accuracy of ANNs, as indicated by the top right box plots of Supplementary Figure S1. Although this method increases the accuracy on the unseen MNIST test set, it should be used with care: it is only appropriate when the training and verification sets are truly representative of the test set. The simple averaging used to estimate the expected input intensities does not account for the possibility that one class may be much more common than another in the test set when that is not the case in the verification set.
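Intensity-aware reordering boils down to ranking inputs by their mean value over the training or verification set. The following is a minimal sketch under stated assumptions, not the authors' implementation: images are flattened lists of pixel values, and position 0 of the returned order is taken to be the word line closest to the outputs. The function names are hypothetical. The mean computed here is exactly the "simple averaging" that makes the scheme sensitive to class imbalance between that set and the test set.

```python
def expected_intensities(images):
    """Mean value of each input (pixel) over a set of flattened images."""
    n = len(images)
    return [sum(img[i] for img in images) / n for i in range(len(images[0]))]


def intensity_aware_order(images):
    """Input indices sorted from highest to lowest expected intensity.

    Position 0 is assumed (for illustration) to be the word line closest to
    the outputs, where line-resistance losses are smallest.
    """
    means = expected_intensities(images)
    return sorted(range(len(means)), key=lambda i: means[i], reverse=True)


# Reordering an input vector: x_reordered = [x[i] for i in order].
# The rows of the weight matrix must be permuted in the same way.
```

For two toy four-pixel images `[[0.0, 0.9, 0.2, 0.5], [0.1, 0.7, 0.0, 0.6]]`, the order is `[1, 3, 2, 0]`, so pixel 1 goes on the word line nearest the outputs.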
Additional problems arise from line resistance when inputs and outputs are mapped in the same way in every crossbar. If all networks in a committee have their inputs and outputs mapped to the same word and bit lines of their crossbars, they might all be disturbed in a very similar way. For example, if the outputs of the last synaptic layer are mapped to the same bit lines in the crossbars of every network in a committee, certain classes of the data set might always be affected more than others. This reduces the effectiveness of committees: ensemble averaging relies on the member networks making different errors, and line resistance makes their errors more alike. Using the same output mapping in every crossbar can even affect the accuracy of individual networks. For example, if every crossbar had its positive weights mapped onto one set of bit lines and its negative weights onto another, line resistance could change the relative influence of positive and negative weights.
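The statement that correlated disturbances weaken committees can be quantified with a standard result. If each of n networks has an output error with variance σ² and every pair of errors has correlation ρ, the variance of their average is σ²(ρ + (1 − ρ)/n). The sketch below is a generic statistical illustration, not an analysis from this work. It checks the formula with a Monte Carlo model in which each error is a shared component plus an independent one. The Gaussian error model and the function names are assumptions.

```python
import math
import random


def averaged_error_variance(sigma, rho, n):
    """Variance of the mean of n errors, each with variance sigma**2 and pairwise correlation rho."""
    return sigma ** 2 * (rho + (1 - rho) / n)


def simulate_averaged_error_variance(sigma, rho, n, trials=50_000, seed=0):
    """Monte Carlo estimate of the same quantity.

    Each network's error is a shared Gaussian component (a disturbance common to
    all members, e.g. from identical line mappings) plus an independent one,
    scaled so that every error has variance sigma**2 and correlation rho.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        shared = rng.gauss(0.0, 1.0)
        errors = [
            sigma * (math.sqrt(rho) * shared + math.sqrt(1 - rho) * rng.gauss(0.0, 1.0))
            for _ in range(n)
        ]
        means.append(sum(errors) / n)
    mu = sum(means) / trials
    return sum((m - mu) ** 2 for m in means) / (trials - 1)
```

With five networks, averaging reduces the error variance to 0.2σ² when the errors are independent, but only to 0.84σ² when ρ = 0.8. This is why reducing the similarity of disturbances across committee members matters.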
∗ Correspondence and requests for materials should be addressed to A.M. (adnan.mehonic.09@ucl.ac.uk) or D.J. (dovydas.joksas.15@ucl.ac.uk).
arXiv:1909.06658v3 [cs.ET] 13 Jun 2020
The problems related to the mapping of inputs and outputs in every crossbar can be partially addressed by randomizing that mapping. If the mapping is different in each crossbar, this not only increases the variability between networks in a committee but also makes the influence of positive and negative weights more equal. The accuracy of ANNs and their committees under the influence of line resistance, but with randomly reordered inputs and outputs, is shown in the bottom left box plots of Supplementary Figure S1. It is difficult to evaluate the effectiveness of CMs in isolation from all the other factors, but one thing stands out: the accuracy of individual non-ideal memristor-based networks increases when inputs and outputs are randomly reordered. This might be partly due to the equalised influence of positive and negative weights mentioned earlier. The other reason might be pure coincidence. When MNIST images are flattened and split into 7 vectors (each received as the input of a separate crossbar), some high-intensity pixels may be mapped onto the top word lines of some crossbars, which are the farthest from the outputs. Random reordering might simply avoid this unfortunate scenario.
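Random reordering does not change the computation in an ideal crossbar. Any consistent permutation of word lines and bit lines leaves the vector-matrix product unchanged once the output permutation is undone. Only under line resistance does the physical position of an input or output matter. The following hypothetical sketch treats the rows of a weight matrix as word lines and its columns as bit lines. It draws a random mapping for one crossbar and checks that the ideal result is recovered. Each network in a committee would draw its own mapping.

```python
import random


def matvec(weights, x):
    """Ideal crossbar: output j = sum_i x[i] * weights[i][j] (rows = word lines, columns = bit lines)."""
    return [sum(x[i] * weights[i][j] for i in range(len(x))) for j in range(len(weights[0]))]


def random_mapping(weights, rng):
    """Randomly permute the word lines (inputs) and bit lines (outputs) of one crossbar."""
    n_in, n_out = len(weights), len(weights[0])
    row_perm = rng.sample(range(n_in), n_in)    # physical word line a receives input row_perm[a]
    col_perm = rng.sample(range(n_out), n_out)  # physical bit line b produces output col_perm[b]
    permuted = [[weights[r][c] for c in col_perm] for r in row_perm]
    return permuted, row_perm, col_perm


def run_mapped(permuted, row_perm, col_perm, x):
    """Feed inputs in the permuted order, then undo the output permutation."""
    y_perm = matvec(permuted, [x[r] for r in row_perm])
    y = [0.0] * len(col_perm)
    for b, c in enumerate(col_perm):
        y[c] = y_perm[b]
    return y
```

For example, `run_mapped(*random_mapping(W, random.Random(1)), x)` returns the same vector as `matvec(W, x)` for any weight matrix `W` and input `x`.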
The final point to discuss is whether intensity-aware and random reordering can be combined. The two schemes can be combined trivially by using intensity-aware reordering for the inputs of the first synaptic layer and random reordering for the inputs of the remaining synaptic layers and for all outputs. However, in this work we additionally introduced some randomness into the mapping of the inputs of the first synaptic layer. Although its effect is very small, we randomize which inputs are mapped onto which crossbars, while still ensuring that the inputs with the highest expected intensities are mapped to the word lines closest to the outputs. For example, without random reordering, the 7 inputs with the highest expected intensities would be mapped to the bottom word lines of the 7 crossbars as follows: the input with the seventh highest expected intensity onto the first crossbar, the input with the sixth highest onto the second crossbar, and so on. We introduce randomness simply by changing which particular input, within a set of inputs with similar expected intensities, is mapped to which crossbar. The accuracy of ANNs and their committees employing this combined scheme is shown in the bottom right box plots of Supplementary Figure S1. This particular reordering of inputs and outputs results in the highest overall accuracy of both individual non-ideal memristive networks and their committees.
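The combined first-layer scheme described above can be sketched as follows. Inputs are sorted by expected intensity and split into groups of as many inputs as there are crossbars. Group g is placed on word line g of every crossbar, and the order within each group is shuffled. This is a minimal illustration under assumptions, not the authors' code. Word line index 0 is taken to be the one closest to the outputs, and treating each consecutive group of 7 as "inputs with similar expected intensity" is a hypothetical choice. The shuffle replaces the fixed assignment in the example above, where the seventh highest input goes to the first crossbar.

```python
import random


def combined_mapping(expected, n_crossbars, rng):
    """Map each input index to a (crossbar, word_line) pair.

    Inputs are ranked by expected intensity (highest first) and split into
    consecutive groups of n_crossbars inputs. Group g occupies word line g
    (0 = closest to the outputs, by assumption) of every crossbar, and the
    order inside each group is shuffled to randomize which crossbar gets which input.
    """
    order = sorted(range(len(expected)), key=lambda i: expected[i], reverse=True)
    mapping = {}
    for start in range(0, len(order), n_crossbars):
        group = order[start:start + n_crossbars]
        rng.shuffle(group)
        for crossbar, inp in enumerate(group):
            mapping[inp] = (crossbar, start // n_crossbars)
    return mapping
```

With 784 MNIST inputs and 7 crossbars, this fills 112 word lines per crossbar. The 7 brightest inputs always land on the word lines nearest the outputs, but in a different crossbar order for each network.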
Figures
[Figure S1: four panels of box plots. Columns indicate whether intensity-aware reordering is used (NO, YES); rows indicate whether random reordering is used (NO, YES). Horizontal axis: network type (ideal ANN (software baseline); memristive ANN (without non-idealities); memristive ANN (with non-idealities); committee machines of 2, 3, 4 and 5 memristive ANNs (with non-idealities)). Vertical axis: accuracy (%), 93 to 98.]
Figure S1: Accuracy achieved by individual networks and their committees when disturbed using interconnect resistance from a Ta/HfO2 crossbar and when using different reordering schemes. In all box plots, the maximum whisker length is set to 1.5× IQR.
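The whisker rule used in these box plots can be stated concretely. Whiskers extend to the most extreme data points within 1.5× IQR of the lower and upper quartiles, and points beyond that are drawn as outliers. The sketch below assumes linearly interpolated quartiles (`method="inclusive"` in Python's `statistics.quantiles`), since the plotting tool and quartile convention used for these figures are not stated. The function name is hypothetical.

```python
import statistics


def box_plot_stats(values, whis=1.5):
    """Quartiles, whisker ends and outliers for a box plot with whiskers limited to whis × IQR."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],    # smallest value not beyond the lower fence
        "whisker_high": inside[-1],  # largest value not beyond the upper fence
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }
```

For the values 1, 2, 3, 4 and 100, the quartiles are 2 and 4, the whiskers end at 1 and 4, and 100 is plotted as an outlier.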
[Figure S2: box plots of accuracy (%), 95.5 to 98.0, versus number of hidden neurons (25, 50, 100, 200).]
Figure S2: Accuracy of digitally implemented networks containing one hidden layer. Accuracy is shown for different numbers of hidden neurons. In the box plot, the maximum whisker length is set to 1.5× IQR.
[Figure S3: box plots of accuracy (%), 88 to 97, by network type: ideal ANN (software baseline); memristive ANN (without non-idealities); memristive ANN (with non-idealities); committee machines of 2, 3, 4 and 5 memristive ANNs (with non-idealities).]
Figure S3: Effectiveness of committees when dealing with RTN disturbances (simulated by employing Ta2O5 data) while
using numerically optimised weightings. In the box plot, the maximum whisker length is set to 1.5× IQR.
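A committee with weightings combines the member networks' output vectors as a weighted average and predicts the class with the largest combined output. Equal weights of 1/M reduce this to plain ensemble averaging. This caption does not describe how the weightings were optimised. The sketch below is therefore illustrative only: an exhaustive search over weights that are multiples of 1/steps and sum to one, scored on a held-out set. It is an assumption, not the authors' procedure, and all names are hypothetical. The grid grows as (steps + 1)^M, so for larger committees a proper numerical optimiser would be preferable.

```python
import itertools


def committee_predict(outputs, weights):
    """outputs[m][k] is the output vector of network m for example k; returns predicted classes."""
    preds = []
    for k in range(len(outputs[0])):
        n_classes = len(outputs[0][k])
        combined = [sum(w * net[k][c] for w, net in zip(weights, outputs)) for c in range(n_classes)]
        preds.append(max(range(n_classes), key=combined.__getitem__))
    return preds


def accuracy(preds, labels):
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def grid_search_weights(outputs, labels, steps=10):
    """Try every weighting whose entries are multiples of 1/steps and sum to 1; keep the most accurate."""
    best_acc, best_w = -1.0, None
    for combo in itertools.product(range(steps + 1), repeat=len(outputs)):
        if sum(combo) != steps:
            continue
        w = [c / steps for c in combo]
        acc = accuracy(committee_predict(outputs, w), labels)
        if acc > best_acc:
            best_acc, best_w = acc, w
    return best_acc, best_w
```

In a two-network toy example where one network is correct on every held-out example and the other is mostly wrong, the search assigns the larger weight to the reliable network.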
[Figure S4: median accuracy (%), 94.0 to 98.0, versus total number of memristors (logarithmic axis, 10^5 to 10^6), for individual networks and for committees of 2, 3, 4 and 5 networks.]
Figure S4: Median accuracy achieved by individual one-hidden-layer memristor-based networks and their committees, when controlled for the total number of memristors required. The networks contained 25, 50, 100 or 200 hidden neurons and were disturbed using RTN data from a Ta2O5 device.
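Controlling for the total number of memristors requires counting the devices in each configuration. The sketch below reads the architecture 784(+1):H(+1):10 from Table SI, with "(+1)" as a bias input to the next layer. It assumes that each weight is realised by a pair of memristors, in line with the mapping of weights onto pairs of conductances described in Table SI's caption. The exact accounting behind these figures may differ, and the function name is hypothetical.

```python
def total_memristors(hidden, n_inputs=784, n_outputs=10, devices_per_weight=2, committee_size=1):
    """Memristors needed by committee_size networks of architecture n_inputs(+1):hidden(+1):n_outputs."""
    weights = (n_inputs + 1) * hidden + (hidden + 1) * n_outputs  # "+1" = bias input of each layer
    return committee_size * devices_per_weight * weights
```

Under these assumptions, a single network with 25 hidden neurons needs 39 770 memristors and one with 50 hidden neurons needs 79 520. A committee of two 25-hidden-neuron networks needs 79 540, almost exactly the same number. This is the kind of pairing the figure compares.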
[Figure S5: median accuracy (%), 94.5 to 98.0, versus total number of memristors (logarithmic axis, 10^5 to 10^6), for individual networks and for committees of 2, 3, 4 and 5 networks.]
Figure S5: Median accuracy achieved by individual one-hidden-layer memristor-based networks and their committees, when controlled for the total number of memristors required. The networks contained 25, 50, 100 or 200 hidden neurons and were disturbed using RTN data from an aVMCO device.
Tables
Figures | Device type | HRS/LRS | Number of conductance states | Spacing of states | Network architecture | pL (%)
3B, 4, 9, S1 | Ta/HfO2 | 10.48 | ∞ | - | 784(+1):25(+1):10 | 0.1
6, S3, S4 | Ta2O5 | 8 | 8 | Equally spaced resistance states | 784(+1):25(+1):10 | 0.1
8, S5 | aVMCO | 7.5 | 8 | {1.00, 1.92, 2.84, 3.76, 4.68, 5.60, 6.52, 7.50} MΩ (nearly equally spaced resistance states) | 784(+1):25(+1):10 | 0.1
9 | Ta/HfO2 | 10.48 | ∞ | - | 784(+1):50(+1):10 | 0.0
9 | Ta/HfO2 | 10.48 | ∞ | - | 784(+1):100(+1):10 | 0.1
9 | Ta/HfO2 | 10.48 | ∞ | - | 784(+1):200(+1):10 | 0.0
S4 | Ta2O5 | 8 | 8 | Equally spaced resistance states | 784(+1):50(+1):10 | 0.1
S4 | Ta2O5 | 8 | 8 | Equally spaced resistance states | 784(+1):100(+1):10 | 0.1
S4 | Ta2O5 | 8 | 8 | Equally spaced resistance states | 784(+1):200(+1):10 | 0.0
S5 | aVMCO | 7.5 | 8 | {1.00, 1.92, 2.84, 3.76, 4.68, 5.60, 6.52, 7.50} MΩ (nearly equally spaced resistance states) | 784(+1):50(+1):10 | 0.1
S5 | aVMCO | 7.5 | 8 | {1.00, 1.92, 2.84, 3.76, 4.68, 5.60, 6.52, 7.50} MΩ (nearly equally spaced resistance states) | 784(+1):100(+1):10 | 0.1
S5 | aVMCO | 7.5 | 8 | {1.00, 1.92, 2.84, 3.76, 4.68, 5.60, 6.52, 7.50} MΩ (nearly equally spaced resistance states) | 784(+1):200(+1):10 | 0.4
Table SI: Summary of parameters for each simulation in the main text and supplementary information. An infinite number of states means that, when weights are mapped onto pairs of conductances, the inability to program the devices precisely is not taken into account; only their HRS/LRS ratio is. However, programming imprecision can still be taken into account during the disturbance stage, as was done with the Ta/HfO2 memristors.
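The distinction between an infinite and a finite number of states can be illustrated with a simple differential-pair mapping. The sketch below makes hypothetical choices: the linear scaling of |w| into [g_off, g_on], keeping the unused device of each pair at g_off, and the function names are illustrative, not necessarily the scheme used in this work. With `states=None` it corresponds to the "∞" rows of Table SI, where only the HRS/LRS ratio (g_on/g_off) constrains the mapping. Passing the aVMCO resistance states from Table SI, converted to conductances, adds rounding to the nearest available state.

```python
def map_weight(w, w_max, g_off, g_on, states=None):
    """Map weight w onto a conductance pair (G_plus, G_minus).

    |w| (clipped to w_max) is scaled linearly into [g_off, g_on] on one device
    of the pair while the other device stays at g_off; the sign of w decides
    which device carries it. If `states` is given, the scaled conductance is
    rounded to the nearest available state.
    """
    g = g_off + (g_on - g_off) * min(abs(w), w_max) / w_max
    if states is not None:
        g = min(states, key=lambda s: abs(s - g))
    return (g, g_off) if w >= 0 else (g_off, g)


def effective_weight(g_plus, g_minus, w_max, g_off, g_on):
    """Weight actually implemented by a conductance pair under the same linear scaling."""
    return (g_plus - g_minus) * w_max / (g_on - g_off)


# aVMCO resistance states from Table SI (MΩ) as conductances in μS; max/min ratio = 7.5 = HRS/LRS.
AVMCO_STATES = [1 / r for r in (1.00, 1.92, 2.84, 3.76, 4.68, 5.60, 6.52, 7.50)]
```

Because the resistance states are nearly equally spaced, the corresponding conductances are not. With `w_max = 1`, a weight of 0.5 is rounded to the 1.92 MΩ state, and the implemented weight is only about 0.45.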
Resistance level | RTN occurrence rate
25 kΩ | 40.625%
50 kΩ | 43.75%
75 kΩ | 46.875%
100 kΩ | 59.375%
125 kΩ | 62.5%
150 kΩ | 65.625%
175 kΩ | 68.75%
200 kΩ | 71.875%
Table SII: Occurrence rate of RTN in the Ta2O5 device.
Resistance level | RTN occurrence rate
1.00 MΩ | 6.67%
1.92 MΩ | 8.89%
2.84 MΩ | 8.89%
3.76 MΩ | 15.6%
4.68 MΩ | 20%
5.60 MΩ | 20%
6.52 MΩ | 24.4%
7.50 MΩ | 28.9%
Table SIII: Occurrence rate of RTN in the aVMCO device.
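One plausible way for occurrence rates like these to enter a simulation is to treat each rate as the probability that a device programmed to that resistance level exhibits RTN. That reading is an assumption for illustration. The RTN amplitude and its temporal behaviour are not covered by these tables, so the sketch below only decides which devices are affected. The dictionary and function names are hypothetical.

```python
import random

# Occurrence rates from Tables SII and SIII, keyed by resistance level in ohms.
TA2O5_RTN_RATE = {25e3: 0.40625, 50e3: 0.4375, 75e3: 0.46875, 100e3: 0.59375,
                  125e3: 0.625, 150e3: 0.65625, 175e3: 0.6875, 200e3: 0.71875}
AVMCO_RTN_RATE = {1.00e6: 0.0667, 1.92e6: 0.0889, 2.84e6: 0.0889, 3.76e6: 0.156,
                  4.68e6: 0.20, 5.60e6: 0.20, 6.52e6: 0.244, 7.50e6: 0.289}


def sample_rtn_devices(levels, rates, rng):
    """One flag per device: True if that device is assumed to exhibit RTN.

    `levels` lists each device's programmed resistance (a key of `rates`);
    each device is affected independently with the rate for its level.
    """
    return [rng.random() < rates[level] for level in levels]
```

Under this assumption, roughly 72% of Ta2O5 devices programmed to 200 kΩ would be disturbed, compared with about 7% of aVMCO devices at 1.00 MΩ.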
