Committee Machines—A Universal Method to Deal with Non-Idealities in Memristor-Based Neural Networks by Joksas, D et al.
http://researchonline.ljmu.ac.uk/id/eprint/13408/
Citation (please note it is advisable to refer to the publisher’s version if you 
intend to cite from this work) 
Joksas, D, Freitas, P, Chai, Z, Ng, WH, Buckwell, M, Li, C, Zhang, WD, Xia, 
QF, Kenyon, AJ and Mehonic, A Committee Machines—A Universal Method 
to Deal with Non-Idealities in Memristor-Based Neural Networks. Nature 
Communications. ISSN 2041-1723 (Accepted) 
Committee Machines—A Universal Method to Deal with
Non-Idealities in Memristor-Based Neural Networks
D. Joksas¹, P. Freitas², Z. Chai², W. H. Ng¹, M. Buckwell¹,
C. Li³, W. D. Zhang², Q. Xia³, A. J. Kenyon¹, and A. Mehonic¹
¹Department of Electronic and Electrical Engineering,
University College London, London (United Kingdom)
²Department of Electronics and Electrical Engineering,
Liverpool John Moores University, Liverpool (United Kingdom)
³Department of Electrical and Computer Engineering,
University of Massachusetts Amherst (United States of America)
Abstract
Artificial neural networks are notoriously power- and time-consuming when implemented on
conventional von Neumann computing systems. Consequently, recent years have seen an emergence
of research in machine learning hardware that strives to bring memory and computing closer
together. A popular approach is to realise artificial neural networks in hardware by implementing
their synaptic weights using memristive devices. However, various device- and system-level
non-idealities usually prevent these physical implementations from achieving high inference accuracy.
We suggest applying a well-known concept in computer science, committee machines, in the
context of memristor-based neural networks. Using simulations and experimental data from three
different types of memristive devices, we show that committee machines employing ensemble
averaging can successfully increase inference accuracy in physically implemented neural networks that
suffer from faulty devices, device-to-device variability, random telegraph noise and line resistance.
Importantly, we demonstrate that the accuracy can be improved even without increasing the total
number of memristors.
I. INTRODUCTION
Artificial neural networks (ANNs), with all of their variants, are now the main tools in
machine learning tasks, such as classification. The vast amounts of data being constantly
produced have enabled successful training and operation of ANNs. However, to achieve
high inference accuracy, it is usually necessary for neural networks to have a large number of
parameters. This results in both training [1] and inference [2] stages being time- and
power-consuming. This is largely caused by the need to transfer data from memory to computing
units: physical separation of memory and computing is the essence of any von Neumann
system.
One of the most promising solutions to these problems is the paradigm of non-von Neumann
computing and, specifically, analogue implementations of synapses (weights) in physical
ANNs. Because there are many more synapses than there are neurons in ANNs, the
matrix-vector multiplications, in which the synaptic weight values are used, are the costliest
operations in these networks, both in terms of power and time. Computing directly in
memory would minimise data transfers from off-chip memory; thus the most popular approach
is to use analogue memory devices as proxies for synaptic weights of ANNs (both
fully connected and their variants [3, 4]). A common technique is to arrange such devices
in a structure called a crossbar array, in which every device (or a pair of devices) is used to
represent a single synaptic weight or, more generally, an entry in a matrix [5]. Memristive
devices, such as phase-change memories (PCMs) [6, 7] or resistive random-access memories
(RRAMs) [8, 9], have been considered as candidates for such tasks. Although here we focus
on ex-situ training, such systems have been successfully utilised for in-situ training too
[10, 11].
In memristive implementations of ANNs, the main concern is that various non-idealities
associated with these devices can prevent these systems from achieving high accuracy [12,
13]. Examples of non-idealities affecting inference accuracy include, but are not limited
to, devices not being able to electroform, devices stuck in one of the resistance states after
electroforming, device-to-device (D2D) variability and random telegraph noise (RTN). When
training analogue systems in-situ, limited endurance and non-linear resistance modulation
have to be taken into account too. To mitigate the effects of these device non-idealities, it is
often necessary to modify device structure [9], to use more advanced programming schemes
[14] or to use additional circuitry [15] or high-precision processing units [16] in conjunction
with memristive elements. On the system level, there is an issue of line resistance, which
affects the distribution of currents and thus decreases the accuracy. These line resistance
effects can be partially compensated for algorithmically [17] or partially mitigated by using
multiple smaller crossbar arrays [18]. Examples of past efforts at dealing with these and
other non-idealities of memristive devices and systems are listed in Table I; most of these
non-idealities are still the main focus of the research in the neuromorphic community.
We propose a simple way to mitigate the effects of all types of non-idealities during
inference. We suggest combining several non-ideal memristor-based neural networks into
committees to achieve better accuracy. The committee machine (CM) method we propose
significantly increases the inference accuracy and does not increase the computation time
because memristive ANNs in such committees work in parallel.
In this work, we first explain the simulation setup: what networks were trained,
how they were simulated and how they were combined into CMs. The experimental part
follows. We investigate three different types of memristor technology:
tantalum/hafnium oxide-based (Ta/HfO2), tantalum oxide-based (Ta2O5), and amorphous
vacancy modulated conductive oxide-based (aVMCO) devices. After exploring their
non-idealities relevant to inference (faulty devices, D2D variability, RTN, and line resistance),
we use the experimental data to simulate memristive ANNs working individually and in
committees.
II. RESULTS
A. Simulation setup
Fully connected ANNs were trained in software to recognise handwritten digits (using
the MNIST database [19]). Architectures with one hidden layer were explored. Unless stated
otherwise, the simulations used networks with 25 hidden neurons. However, networks with
50, 100 and 200 hidden neurons were additionally employed to evaluate the effectiveness of
the proposed method while controlling for the total number of memristors required.
Following training, weights of ANNs were mapped onto pairs of conductances using a proportional
mapping scheme (see [20]) to simulate memristor-based ANNs. Finally, these memristive
networks were disturbed using experimental data to reflect the effect of device- and
system-level non-idealities.
After simulating physical non-idealities, the networks were combined into CMs that
employed ensemble averaging (EA) [21]. The principle of EA is shown in Figure 1A: several
networks are combined in parallel and then their outputs are averaged. After that, the
prediction is made using the averaged vector; the prediction is the label corresponding to
the largest entry in the vector.
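The averaging and prediction steps described above can be sketched in a few lines of Python. This is only an illustration: the function names and the toy output vectors are hypothetical and are not taken from the simulations reported here.

```python
def ensemble_average(network_outputs):
    """Average the output vectors of several networks element by element."""
    n_networks = len(network_outputs)
    n_classes = len(network_outputs[0])
    return [
        sum(output[c] for output in network_outputs) / n_networks
        for c in range(n_classes)
    ]


def predict(network_outputs):
    """Return the label (index) of the largest entry in the averaged vector."""
    averaged = ensemble_average(network_outputs)
    return max(range(len(averaged)), key=lambda c: averaged[c])


# Hypothetical outputs of three networks for one input and three classes.
outputs = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],  # on its own, this network would pick label 1
    [0.5, 0.4, 0.1],
]
label = predict(outputs)
```

In this toy case the second network disagrees with the other two, but the averaged vector still has its largest entry at index 0, so the committee predicts label 0.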
CM methods are frequently used even with conventional ANNs. Methods such as EA
often produce better accuracy than that of the best individual network in a committee [22].
Although there are other types of CMs besides EA, they often rely on training additional
gating networks or boosting networks during the training stage. Using a gating network in
this scenario would produce additional problems: to avoid it acting as a performance
bottleneck, it too would have to be implemented on crossbar arrays. Various non-idealities would
decrease the effectiveness of this gating network, which is responsible for making the
decisions about the whole committee of ANNs. Likewise, we speculate that boosting of networks
would not be feasible in ex-situ training because it requires information about where
individual ANNs perform poorly; this cannot be known precisely until they are implemented
physically on crossbar arrays and the non-idealities manifest themselves. To the best of the
authors' knowledge, the application of boosting in the context of memristive neural networks
has been explored only once before [23]; as expected, it requires training each memristive
implementation differently because non-idealities manifest themselves differently in different
crossbar arrays.
There exist modifications of the EA algorithm that could potentially perform better. One
example of this is the generalized ensemble method (GEM) which, instead of using equal
weightings for each network during averaging (as in EA), uses a different one for each network [21].
These weightings are analytically determined by considering correlation of errors between
different networks. But because [21] only considered networks with a mean square error loss
function (while our networks used a cross-entropy loss function), this work does not explore
GEM. Instead, we investigated whether it is possible to achieve a better performance by
optimising the weightings numerically. This method, like GEM and others previously
mentioned, might be impractical because, firstly, these weightings could be determined only after
the ANNs are physically implemented on crossbars, and, secondly, the devices could change
throughout their lifetimes, thus affecting the optimal weightings.
Even with the assumption that the devices would have perfect retention, we found that
optimisation of weightings achieves effectively the same performance as EA. For these
reasons, we focus only on EA in the main text, but present our results of optimising weightings
in Supplementary Figure S5. We stress that we are open to the idea that other CM methods
besides EA could be utilised successfully for ex-situ training in the context of memristive
ANNs. However, in this work we focus on demonstrating that CMs can be used to improve
the accuracy of memristor-based ANNs in general.
With EA, we find that even when the memristive ANNs that go into a committee all
use the same digital weights that are mapped onto crossbar arrays (see Figure 1B), a committee
of memristor-based networks can still achieve higher accuracy than just a single non-ideal
network. Although all networks have the same digital weights before mapping, their physical
implementations (which we call “disturbances” in Figures 1B, C because they can usually
be represented by the modification of individual weights) will be different. For example, in
one crossbar array, a certain set of devices will be faulty, while in the other crossbar array, it
will be a different set. This will result in different physical implementations having slightly
different learned representations of the data set, or, to paraphrase, different networks will
be “damaged” differently by the non-idealities. This means that these committees will be
able to combine different representations, and thus achieve higher accuracy. However, by
definition, such an approach would almost certainly not yield a committee accuracy that is
higher than the accuracy of a single digitally implemented network.
A better approach is to use different digital networks for different physical
implementations that go into a committee (see Figure 1C). This approach more closely resembles the
conventional application of EA in computer science. In the context of memristive crossbar
arrays, it would not only help to mitigate the effects of the non-idealities (as in the case
of Figure 1B), but would also allow combining the representations of digital networks that
were different even before the mapping stage. Most importantly, this method allows a
committee to achieve higher accuracy, which is sometimes even higher than that of individual
networks with digitally implemented weights. We thus used this method in this analysis.
An example comparison of these two approaches is presented in Supplementary Figure S8.
In this work, any given committee used only one network architecture but each network
was initialised differently before training, thus trained networks had different sets of weights.
Although it was not explored in this work, combining different network architectures in a
committee of memristor-based networks might be advantageous. Furthermore, in this work
we focus on fully connected ANNs but CMs could be applied to other variants of neural
networks as well. Due to the simplicity of EA, it could, for example, be employed in
convolutional neural networks (CNNs) [24], which are often used for image classification. This
might be of interest as CNNs have been successfully implemented using crossbar arrays
recently [25]. However, crossbar implementations are naturally more suited to fully connected
networks, therefore we limit ourselves to this architecture but are open to exploring the
effectiveness of EA with memristive CNNs in the future.
B. Ta/HfO2 RRAM
With array-level data available, Ta/HfO2 experiments provide the most complete
picture of device- and system-level non-idealities. In this subsection, we present not only the
analysis of faulty devices and D2D variability, but also careful consideration of the line
resistance effects. Ta/HfO2 memristors do not exhibit apparent RTN and overall have excellent
retention properties [26], and thus are perfect candidates for inference applications.
1. Faulty devices and device-to-device variability
The most energy-efficient procedure to modulate the conductance of memristors is by
the application of voltage pulses. In an ideal scenario, one would apply identical pulses
and observe constant increases in conductance with each pulse. This is rarely the case
in practice, but, fortunately, this type of behaviour is more relevant for in-situ training
where it is necessary to ensure linear adjustment of the ANN's weights [27]. In ex-situ training,
conductance verification schemes can be used to program the devices precisely. Because the
devices would have to be programmed only once, one can spend additional resources to do so
accurately by applying SET (potentiation) and RESET (depression) pulses until a desirable
conductance state is achieved.
Even with this approach, there remain two obstacles: faulty devices and D2D variability.
It is observed in most memristor technologies that at least a small fraction of the devices
tends to get stuck in a particular conductance state. Additionally, even if not stuck, different
devices might behave differently; for example, they might have different conductance ranges.
Figure 2A shows conductance changes in Ta/HfO2 RRAM devices (in a 128 × 64 crossbar
array) when voltage pulses are applied to them. We can see from the median values
that overall the devices' conductance tends to increase as more SET pulses are applied.
However, the wider bottom regions of the violin plots indicate that some devices are stuck
around the high resistance state (HRS) and cannot set entirely no matter how many voltage
pulses are applied. There also exist devices that are stuck in the low resistance state (LRS), or
simply do not span the full conductance range.
Figure 2A combines data from multiple SET cycles for each of the memristors, thus it
is important to understand how these devices behave individually. Figures 2B-F show
the conductance of 5 (out of 8,192) devices over 11 SET and RESET cycles. In the five
diagrams, the radial component represents the conductance (in mS) and the angular component
represents the number of applied pulses. Figure 2B shows an example of preferable (and
typical) device behaviour: conductance changes in a continuous fashion and spans a wide
range of conductance values, from ∼0.1 mS to ∼1.0 mS. Although RESET cycles tend to
feature abrupt decreases in conductance, one can always repeat a cycle and exploit the more
predictable behaviour of SET cycles.
When encoding continuous numbers into crossbar devices' conductances, it is often
preferable to choose a large enough conductance range. Using data from Figure 2A, one could,
for example, choose the range between the first and the last median points (from ∼0.1 mS
to ∼1.0 mS). The device whose behaviour is presented in Figure 2B could be easily set to any
conductance within that range, as we have seen before. On the other hand, the device whose
behaviour is presented in Figure 2C, although operating in a predictable fashion, has a smaller
conductance range. We can see that in all cycles, its conductance does not exceed 0.8 mS.
This is an example of D2D variability that can make it difficult to choose an optimal operating
range and set the conductance of all devices precisely.
The device whose behaviour is presented in Figure 2D shows high cycle-to-cycle variability.
Although that could prove to be a problem in some applications, this specific device might
perfectly serve its purpose in ex-situ training of ANNs. We can observe that this device
spans the same conductance range as the device from Figure 2B, even if in an unpredictable
manner. Because all states in the full range are, in theory, achievable, one can cycle the
device multiple times until it is set to the required conductance level.
Lastly, we have devices whose negative effect is most difficult to mitigate: faulty devices.
Figure 2E shows the behaviour of a device stuck at high conductance values, while Figure 2F
shows the behaviour of a device stuck at low conductance values. No matter how many pulses
are applied to these devices or how many times they are cycled, they exhibit almost no
conductance variation and thus, in most cases, cannot be used to encode information.
Knowing that some devices perform like the ones whose behaviour is shown in
Figures 2C,E,F, it is important to minimise their negative effect. If the conductance that a
device has to be set to is outside that device's range, it is sensible to set it to the closest
achievable conductance. Although there is little that can be done about fully stuck
memristors, it is possible to optimise the behaviour of devices like the one in Figure 2C that simply
have a smaller conductance range. For example, if such a device has to be set to 0.9 mS, one
would set it to the highest achievable conductance (∼0.8 mS). In the following simulations
involving faulty devices and D2D variability, the operating range between the first and the last
median points was used, and the devices were chosen randomly from the 128 × 64 crossbar and
set to the most desirable states, as described in this paragraph.
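The rule of setting a device to the closest achievable conductance amounts to clamping the target value to that device's range. A minimal sketch, assuming each device can be summarised by a minimum and a maximum conductance (the function name and the range values other than the 0.9 mS and 0.8 mS example are hypothetical):

```python
def closest_achievable(target_ms, device_min_ms, device_max_ms):
    """Clamp a target conductance (in mS) to the range a device can reach."""
    return min(max(target_ms, device_min_ms), device_max_ms)


# Device with a reduced range (as in Figure 2C): target 0.9 mS, maximum 0.8 mS.
limited = closest_achievable(0.9, 0.1, 0.8)

# Device spanning the full operating range: the target is reached directly.
typical = closest_achievable(0.9, 0.1, 1.0)

# Hypothetical device stuck near high conductance: almost no freedom remains.
stuck = closest_achievable(0.5, 0.95, 1.0)
```

For a stuck device the minimum and maximum nearly coincide, so the clamped value is effectively fixed regardless of the target, which is why such devices are the hardest to mitigate.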
2. Line resistance
The effect of line resistance can be extremely detrimental in many crossbar-based
implementations of ANNs. That is especially the case if the crossbars used are large and the
resistance of the interconnects is high (compared to the memristors' resistance). Because in a
neural network many of the inputs are non-zero at any given time, a lot of current
accumulates in the bit lines, which results in significant voltage drops across the interconnects, and
thus the current distribution across the crossbar is affected in a major way.
The Ta/HfO2 crossbar has shape 128 × 64 and so this shape was chosen for all the
simulations involving line resistance. Even relatively small ANNs of architecture 784(+1):25(+1):10
would need 2 × (785 × 25 + 26 × 10) = 39,770 memristors to be implemented. Even if not
all the inputs were used at any given time, it would not be possible to fit all the memristors
onto a single crossbar of shape 128 × 64. To overcome this, we decided to simulate multiple
crossbars, each of which would implement a subset of the synaptic weights, but, for a given
synaptic layer, would all compute in parallel. Because ⌈785/128⌉ = 7, seven crossbars were
used to implement the first synaptic layer; the first crossbar utilised the bottom 113 word lines,
while the other six crossbars used the bottom 112 word lines because 113 + 6 × 112 = 785. The
second synaptic layer was implemented using an eighth crossbar utilising its bottom 26 word
lines.
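The arithmetic in this paragraph can be checked with a short script. The helper names are hypothetical, and the near-even split of word lines is an assumption that happens to reproduce the 113 + 6 × 112 partition stated above; the text does not say how that partition was chosen.

```python
import math


def memristor_count(n_inputs, n_hidden, n_outputs):
    """Two memristors per weight; each layer has one extra bias input."""
    first_layer = (n_inputs + 1) * n_hidden
    second_layer = (n_hidden + 1) * n_outputs
    return 2 * (first_layer + second_layer)


def split_word_lines(n_rows_needed, rows_per_crossbar):
    """Spread the required word lines as evenly as possible over crossbars."""
    n_crossbars = math.ceil(n_rows_needed / rows_per_crossbar)
    base, extra = divmod(n_rows_needed, n_crossbars)
    # The first `extra` crossbars take one additional word line each.
    return [base + 1] * extra + [base] * (n_crossbars - extra)


total = memristor_count(784, 25, 10)      # 784(+1):25(+1):10 architecture
first_layer_split = split_word_lines(785, 128)
```

Here `total` evaluates to 39,770 and `first_layer_split` to one crossbar with 113 word lines followed by six with 112, matching the figures in the text.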
Figure 3A shows an example of how the first synaptic layer of a 784(+1):25(+1):10 neural
network could be implemented. Specifically, it shows how the first subset of weights would
be implemented using one of the crossbars. Because we use a proportional mapping scheme,
positive and negative weights would be implemented in different bit lines. In Figure 3A,
memristors designated to implement positive weights are coloured in blue, memristors
designated to implement negative weights are coloured in orange and unelectroformed memristors
are coloured in black. Because simulations were constrained by experimental data, some of
the devices were left unused and assumed to be unelectroformed. In practice, the crossbars
could be manufactured to fit the geometry of the ANNs.
In each synaptic layer, the corresponding output currents from each of the crossbars would
be added together. Additionally, output currents at the bit lines implementing negative
weights would be subtracted from the output currents at the neighbouring bit lines (to their
left) implementing positive weights. For example, in the configuration of Figure 3A,
the output current at the 2nd bit line would be subtracted from the output current at the 1st bit
line, etc.
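The two steps above (summing corresponding currents across crossbars, then subtracting each negative bit line from the positive bit line to its left) can be sketched as follows. The function name and the current values are hypothetical, and the sketch assumes positive and negative bit lines strictly alternate, as in the configuration of Figure 3A.

```python
def combine_currents(crossbar_currents):
    """Sum bit-line currents over crossbars, then take differential pairs.

    crossbar_currents: one list of bit-line currents per crossbar, all of
    equal (even) length, with positive bit lines at even indices and the
    matching negative bit lines immediately to their right.
    """
    n_bit_lines = len(crossbar_currents[0])
    summed = [
        sum(currents[j] for currents in crossbar_currents)
        for j in range(n_bit_lines)
    ]
    # 1st minus 2nd bit line, 3rd minus 4th bit line, and so on.
    return [summed[j] - summed[j + 1] for j in range(0, n_bit_lines, 2)]


# Two hypothetical crossbars with four bit lines each (arbitrary units).
layer_outputs = combine_currents([
    [5, 2, 3, 3],
    [1, 1, 4, 0],
])
```

With these toy values the summed currents are 6, 3, 7 and 3, giving differential outputs of 3 and 4.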
Unfortunately, even when using multiple smaller crossbars, the interconnects can
significantly disturb current distribution in the crossbar. Average decreases in output current due
to line resistance in all seven crossbars of Ta/HfO2 devices (whose resistance ranges from
∼1 kΩ to ∼11 kΩ, with interconnect resistance of 0.35 Ω and 0.32 Ω in the word and bit
lines, respectively) are shown in the heatmap in Figure 3B. We can see that the current
decreases can range from ∼12% at the outputs nearest to the applied voltages to ∼16% at
the outputs in the rightmost bit lines that are used. In the supplementary information, we
provide a possible strategy of mitigating line resistance effects in supervised learning. This
scheme was not employed in the simulations described in the main text because we wanted
to find out how well the CM method would deal with noticeable line resistance effects.
3. Inference accuracy
Figure 4 shows the accuracy of individual networks, as well as of their committees;
memristive ANNs were simulated by taking into account three non-idealities of the Ta/HfO2 crossbar
explored earlier: faulty devices, D2D variability and line resistance. As indicated by the
yellow box plot in Figure 4, individual networks implemented digitally achieve ∼95.9%
median accuracy. Networks disturbed to reflect the effect of non-idealities achieve ∼91.0%
median accuracy, as indicated by the vermilion box plot. Although that is a substantial
drop in accuracy, we see that the accuracy increases as more networks are added to the
committee. When 5 networks are used in a committee, median accuracy increases
up to ∼95.7%, as indicated by the rightmost green box plot.
C. Ta2O5 RRAM
In order to explore the effectiveness of CMs in minimising adverse effects of RTN, we use another
memristor technology based on Ta2O5. To investigate RTN, measurements from a single
device were considered. To simulate line resistance effects, interconnect resistance from
the Ta/HfO2 crossbar was used and the same crossbar shape was assumed.
1. Random telegraph noise
Memristors often suffer from RTN, resulting in a different accuracy at any given instant
in time. The Ta2O5 device was characterised by measuring the current of 8 resistance states
multiple times. Figure 5 shows the cumulative probability plots for those resistance states,
together with lognormal fits modelling the nature of RTN. One of the things that the figure
reveals is that higher resistance states suffer from a higher degree of RTN. Fits for every
resistance state, together with occurrence rates (see Supplementary Table SII), were used
to disturb the weights of ANNs in order to reproduce the effect of RTN.
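One way such a disturbance could be simulated is sketched below. This is a loose illustration, not the procedure used for the reported results: it assumes that an RTN event occurs independently in each device with probability equal to the occurrence rate, that its relative magnitude is drawn from a lognormal distribution, and that the event increases resistance. The function name and all parameter values are hypothetical; in practice the fit parameters would be looked up per resistance state.

```python
import random


def apply_rtn(resistances, occurrence_rate, mu, sigma, rng):
    """Disturb resistances with randomly occurring, lognormally sized RTN.

    occurrence_rate: assumed probability that a device shows RTN.
    mu, sigma: parameters of the lognormal fit for the relative magnitude.
    rng: a random.Random instance, passed in for reproducibility.
    """
    disturbed = []
    for resistance in resistances:
        if rng.random() < occurrence_rate:
            relative_change = rng.lognormvariate(mu, sigma)
            # Assumption: the RTN event raises the resistance.
            resistance = resistance * (1.0 + relative_change)
        disturbed.append(resistance)
    return disturbed


rng = random.Random(0)
# Hypothetical resistances in kilo-ohms and hypothetical fit parameters.
noisy = apply_rtn([25.0, 100.0, 200.0], occurrence_rate=0.3,
                  mu=-2.0, sigma=0.5, rng=rng)
```

Devices that do not experience an event keep their programmed resistance, so a low occurrence rate leaves most weights untouched.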
2. Inference accuracy
The results combining RTN and line resistance effects for the Ta2O5 device are shown in
Figure 6. From the difference in median accuracy between the yellow and blue box plots, we can
notice that there is a significant drop in accuracy simply due to mapping of weights onto
conductances. That is not surprising given that only 8 states were available for mapping.
One can also observe that the further drop in median accuracy due to non-idealities is not
as severe: it drops to ∼94.1%. The RTN disturbance magnitude is limited to <100% in
most cases, which possibly contributes to its smaller effect on accuracy. Additionally, the Ta2O5
device has much higher resistance (ranging from 25 kΩ to 200 kΩ), thus line resistance is also
less of a concern. When non-ideal networks are combined into committees of 5, the median
accuracy jumps to ∼96.5%, even higher than the software baseline of individual networks.
This reveals an additional trend seen in all the simulations performed: the higher the accuracy
of the individual non-ideal memristive networks, the higher the accuracy of the committees
that they are part of.
D. aVMCO RRAM
Further, we consider a third memristor technology, one based on aVMCO materials. We
test the effects of RTN by considering measurements from a single device. Line resistance
effects were simulated by using the interconnect resistance and shape of the Ta/HfO2 crossbar array.
1. Random telegraph noise
Figure 7 shows the cumulative probability plots for 8 resistance states of an aVMCO
device suffering from RTN. As in Ta2O5, we observe that higher resistance states experience
RTN of higher magnitude. However, compared to Ta2O5, the RTN magnitude is much more
predictable. Fits for each of the 8 resistance states, together with occurrence rates (see
Supplementary Table SIII), were used to simulate the effect of RTN in aVMCO-based neural
networks.
2. Inference accuracy
The results combining RTN and line resistance are shown in Figure 8. As with Ta2O5, we
see a large drop due to mapping onto conductances, a consequence of very few states being available
for mapping. More interestingly, the accuracy of individual memristor-based networks with
and without non-idealities is almost identical. That is because the occurrence rate of RTN
in the aVMCO device is small and there is a much smaller probability of RTN having large
magnitude. Additionally, the resistance of the aVMCO device is even higher than that of the Ta2O5
device: it ranges from 1 MΩ to 7.5 MΩ. Therefore, line resistance has an even smaller effect
in a hypothetical array of aVMCO devices. Due to the median accuracy of individual non-ideal
memristor-based networks being higher (∼94.6%), the median accuracy of committees is
higher too: in committees of size 5 it increases to ∼96.7%.
III. DISCUSSION
The results from the previous section suggest that the method of using committee
machines to improve the accuracy of memristive neural networks is technology- and
non-ideality-agnostic. CMs can mitigate the effects of faulty devices, D2D variability, RTN and line
resistance in combination with each other. Although the CM method is slightly less effective
with large line resistance (see discussion in the supplementary information), in all cases, we
observe that the accuracy of individual non-ideal networks largely determines the accuracy
of committees. That is consequential because it means that although committees always
increase the accuracy, there is still an incentive to optimise the devices and systems that
implement these networks: the higher the accuracy of individual networks, the higher the
accuracy of the committees.
It is also important to consider whether using larger networks, instead of committees of
smaller networks, would yield the same results if the same number of synapses (or
memristors) was used in the large network as in the committee of smaller networks. In our
previous work we found that the accuracy of networks before disturbance (which we call
“starting accuracy”) has a huge effect on the robustness to non-idealities: the higher the
starting accuracy, the more robust the networks become [20]. One way to achieve higher
starting accuracy is to have larger networks, e.g. if we have a network with one hidden layer,
we might increase the number of neurons in that hidden layer, which would likely result in
higher accuracy after training and thus higher robustness.
Figure 9 shows a comparison of CMs of memristor-based networks disturbed using faulty devices and D2D variability data from the Ta/HfO2 crossbar, when controlled for the total number of memristors required to implement them (line resistance was not taken into account because of the long time required to simulate it in large networks). We can observe that committees of two networks, each with 25 hidden neurons (leftmost data point of the orange curve), achieve ∼0.9% higher median accuracy than individual networks with 50 hidden neurons (second data point from the left in the vermilion curve), despite both requiring an almost identical total number of memristors. Committees of two networks, each with 100 hidden neurons (third data point from the left in the orange curve), achieve ∼1.1% higher median accuracy than individual networks with 200 hidden neurons (rightmost data point in the vermilion curve), even though both require almost the same total number of memristors. An even larger improvement is gained when committees of four networks, each with 50 hidden neurons (second data point from the left in the blue curve), are used instead: the accuracy is then improved by ∼1.5%, with almost exactly the same total number of memristors used.
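The claim that these configurations need almost the same number of memristors can be checked with a short calculation. The sketch below is illustrative only: it assumes one hidden layer, 784 inputs, 10 outputs, one bias unit per layer, and two memristors per synaptic weight (one conductance pair per weight, as described in the Methods). The function names are hypothetical and are not taken from the simulation code.

```python
def synapses(n_hidden, n_inputs=784, n_outputs=10):
    """Weights in a one-hidden-layer network with one bias unit per layer."""
    return (n_inputs + 1) * n_hidden + (n_hidden + 1) * n_outputs


def memristors(n_hidden, n_networks=1, per_weight=2):
    """Devices needed if every weight is stored in `per_weight` memristors (assumed)."""
    return n_networks * per_weight * synapses(n_hidden)


# Configurations compared in the text: (hidden neurons, committee size).
for hidden, size in [(50, 1), (25, 2), (200, 1), (100, 2), (50, 4)]:
    print(f"{size} x {hidden} hidden neurons: {memristors(hidden, size)} memristors")
```

Under these assumptions, one network with 50 hidden neurons needs 79,520 memristors and a committee of two networks with 25 hidden neurons needs 79,540, which is consistent with the "almost identical" totals described above.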
For different non-idealities and even different training schemes of the ANNs, the equivalents of Figure 9 might be different, but they all share a few common characteristics. In all cases, for a given total number of memristors used, there is an optimal number of networks that should be used in a committee. Additionally, we observe that the more severe a non-ideality is, the more apparent the effectiveness of committees becomes. Finally, committees (for a fixed total number of memristors) might sometimes achieve lower accuracy than individual networks, but only if the networks that they replace are very small and the non-ideality is not very detrimental. If the networks that are being replaced with committees of smaller networks are sufficiently large, the committees will achieve higher accuracy. An example of that is shown in Supplementary Figure S7, where an aVMCO device is minimally affected by the non-idealities and so the advantage of committees becomes apparent only when replacing larger networks.
The reason why committees work in the context of non-ideal implementations, and why they work best when they are used to replace large networks, might to some extent lie in their training. When it comes to training fully connected networks, their accuracy tends to saturate as more parameters are added. Supplementary Figure S4 shows that networks with 50 hidden neurons can be trained to achieve significantly higher accuracy than networks with 25 hidden neurons. However, networks with 200 hidden neurons achieve only slightly higher accuracy than networks with 100 hidden neurons. This also means that networks with 200 hidden neurons will be only slightly more robust to non-idealities than networks with 100 hidden neurons. When such networks are affected by non-idealities, their accuracy drops to similar values, but the smaller network can work in a committee with other networks, totalling almost the same number of memristors as the large network while achieving higher accuracy overall. This is the most likely reason why committees of smaller networks are effective at dealing with non-idealities, especially when replacing large networks.
In addition to the accuracy improvements, committees can provide flexibility in memristive implementations of neural networks. Digital implementations of ANNs have very predictable behaviour due to the precision of digital logic. Analogue implementations, on the other hand, can vary greatly even if they use the same weights before the mapping onto conductances; that is a result of the stochastic nature of the memristors that implement these ANNs. The parallel and modular nature of committee machines makes memristive systems much more flexible. For example, if the verification accuracy of one of the ANNs in a memristor-based CM deteriorates below acceptable levels, its outputs could be disabled to ensure higher accuracy of the rest of the committee.
Importantly, this introduced parallelism comes at almost no extra cost. For a fixed total number of memristors, a committee of smaller networks, compared to a large individual network, would only require a few additional output and bias neurons, and an averaging functionality, which could potentially be implemented in hardware. For example, an ANN with 50 hidden neurons would require 846 neurons in total, while a committee of two ANNs, each with 25 hidden neurons (and thus requiring almost the same total number of memristors), would require 857 neurons in total.
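The neuron counts quoted above can be reproduced with a simple tally. The sketch below is one possible accounting, assuming that all networks in a committee share the 784 input neurons and the input-layer bias neuron, while each network has its own hidden neurons, hidden-layer bias neuron and 10 output neurons. This sharing rule is an assumption chosen because it reproduces the quoted totals; the function name is hypothetical.

```python
def total_neurons(n_hidden, n_networks=1, n_inputs=784, n_outputs=10):
    """Count neurons when the input layer (plus its bias) is shared (assumed)."""
    shared = n_inputs + 1                      # inputs and input-layer bias
    per_network = n_hidden + 1 + n_outputs     # hidden, hidden-layer bias, outputs
    return shared + n_networks * per_network


print(total_neurons(50))      # single network with 50 hidden neurons
print(total_neurons(25, 2))   # committee of two networks with 25 hidden neurons
```

With this accounting the committee differs from the single network by one extra bias neuron and one extra set of 10 output neurons.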
In summary, our simulations employing experimental data from three different types of memristive devices show that committee machines employing ensemble averaging can be used to mitigate the effects of device- and system-level non-idealities in memristor-based neural networks. EA makes it possible to achieve higher inference accuracy in physically implemented neural networks that suffer from faulty devices, device-to-device variability, random telegraph noise, and even line resistance. This method is a universal way to deal with the most common non-idealities and is straightforward to implement during the fabrication stage. Increased modularity of these memristive neural network systems will increase not only their inference accuracy, but also their robustness and flexibility, even without the need to sacrifice area. Although some level of non-idealities in memristors is unavoidable, the CM method allows us to deal with these on the system level and is agnostic to a particular technology or, to some degree, the type of non-ideality.
METHODS
Experiments
The Ta/HfO2 RRAM 1T1R array consists of NMOS transistors fabricated in a commercial fab (feature size of 2 µm) and Pt/HfO2/Ta devices. The bottom electrode was deposited by evaporation of a 20 nm Pt layer on top of a 2 nm tantalum (Ta) adhesive layer; the electrode was patterned by photolithography and a lift-off process. A 5 nm HfO2 switching layer was deposited by atomic layer deposition using water and tetrakis(dimethylamido)hafnium as precursors at 250 °C. Sputter-deposited Ta of 50 nm thickness followed by 10 nm Pd was used in a lift-off process to serve as the top electrode. The filamentary Ta2O5 device consists of a TiN/4 nm stoichiometric Ta2O5/20 nm nonstoichiometric TaOx/10 nm TaN/TiN stack with a cross-sectional area of 75 nm × 75 nm, while the non-filamentary aVMCO device has a cross-sectional area of 135 nm × 135 nm and is composed of a TiN/8 nm amorphous-Si/8 nm anatase TiO2/TiN stack. Ta2O5 and aVMCO devices were fabricated by imec. The detailed fabrication process parameters can be found in references [11, 28, 29] for Ta/HfO2, Ta2O5 and aVMCO RRAMs, respectively.
The conductance of Ta/HfO2 devices was modulated by applying SET pulses (500 µs @ 2.5 V and gate voltage increasing from 0.6 V to 1.6 V). After each of the 11 cycles, RESET pulses were applied (5 µs @ 0.9 V increasing to 2.2 V and gate voltage of 5 V). The voltage was increased linearly throughout the 100 pulses. All electrical tests for Ta2O5 and aVMCO devices were done with a Keysight B1500A. The RTN data were extracted by switching the device into 8 uniformly distributed resistance levels between 25 kΩ and 200 kΩ for Ta2O5, and into 8 nearly uniformly distributed resistance levels between 1 MΩ and 7.5 MΩ for aVMCO, using incremental RESET DC sweeps [30]. RTN measurement was then carried out at each resistance level at read-out voltages of 0.1 V and 3 V for Ta2O5 and aVMCO, respectively, with a sampling time of 2 ms/point and 10,000 sampling points per resistance level, giving an RTN measurement period of 20 s.
Simulations
In this work, feed-forward ANNs with fully connected layers and continuous weights were trained to recognise handwritten digits using the MNIST database. All 60,000 MNIST training images were used during the training stage; the training set consisted of 50,000 images and the verification set consisted of 10,000 images. All 10,000 test images were used to evaluate the inference accuracy of ANNs. Networks used 784 input neurons representing pixel intensities of MNIST images of 28 × 28 pixel size, as well as one bias neuron. 10 output neurons were used; they represented the ANNs’ predictions of the 10 handwritten digits. Hidden layers used the sigmoid activation function, while the output layer used the softmax activation function. Weights were optimised by minimising the cross-entropy error function using stochastic gradient descent. A learning rate of 0.01 and patience of 25 epochs were used. 25 networks were trained for each architecture explored by initialising them differently. When numerically optimising ANNs’ weightings, optimisation was performed by employing the verification set, while the performance was evaluated using the test set. The code was implemented in Python.
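The network structure described above can be summarised in a few lines of NumPy. This is a minimal sketch of the forward pass and the cross-entropy error for one hidden layer, not the training code used in this work; the function names, the weight shapes (785 × H and (H + 1) × 10) and the convention of appending a constant bias input of 1 to each layer are assumptions made for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def forward(x, w1, w2):
    """x: (samples, 784); w1: (785, hidden); w2: (hidden + 1, 10)."""
    x_b = np.hstack([x, np.ones((x.shape[0], 1))])   # append bias input
    h = sigmoid(x_b @ w1)                            # hidden layer
    h_b = np.hstack([h, np.ones((h.shape[0], 1))])   # append hidden bias
    return softmax(h_b @ w2)                         # class probabilities


def cross_entropy(probs, labels):
    """Mean negative log-probability assigned to the correct labels."""
    return -np.mean(np.log(probs[np.arange(len(labels)), labels]))
```

In a training loop, stochastic gradient descent would repeatedly update `w1` and `w2` to reduce `cross_entropy` on mini-batches of the training set, stopping once the verification error has not improved for the chosen patience.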
Weights were mapped onto pairs of memristors’ conductances using a proportional mapping scheme: synaptic weights were made proportional to one of the conductances in the pair, while the other was left unelectroformed. The zero weight was interpreted as given; in practice, it would be implemented by not electroforming the device, thus resulting in its negligible conductance. Although aVMCO devices do not have an electroforming stage, for consistency we assumed that additional insulating circuit elements could be used to implement the zero weight. Negative weights would be implemented by placing certain memristors in dedicated bit lines of the crossbars whose outputs would be subtracted from the outputs at the corresponding bit lines implementing positive weights. Maximum weights after mapping were optimised separately for each set of network architecture and conductance levels; in each case this was done by excluding a certain proportion, pL, of weights with the largest absolute values. The pL values used for each simulation are summarised in Supplementary Table SI. More details on the mapping procedure can be found in our past work [20].
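The mapping described above can be illustrated with a simplified sketch. It assumes that the maximum weight is taken as the (1 − pL) quantile of the absolute weights, that excluded weights are clipped to this maximum, and that conductances span the range from zero to a maximum conductance `g_max`; the lower bound of the programmable range is ignored. These choices and the function name `map_weights` are illustrative assumptions; the actual procedure is described in [20].

```python
import numpy as np


def map_weights(weights, g_max, p_l=0.0):
    """Map signed weights onto (positive, negative) conductance pairs.

    Assumption: the largest-magnitude fraction p_l of weights is clipped to
    the maximum weight, which is then mapped onto g_max.
    """
    w = np.asarray(weights, dtype=float)
    w_max = np.quantile(np.abs(w), 1.0 - p_l)   # maximum weight after exclusion
    w = np.clip(w, -w_max, w_max)
    k = g_max / w_max                           # proportionality constant (S per unit weight)
    g_pos = k * np.maximum(w, 0.0)              # devices in "positive" bit lines
    g_neg = k * np.maximum(-w, 0.0)             # devices in "negative" bit lines
    return g_pos, g_neg, k
```

In each pair one conductance is proportional to the weight and the other is zero, so the effective weight is recovered as `(g_pos - g_neg) / k`, mirroring the subtraction of bit-line outputs described above.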
All non-idealities, except for line resistance, were simulated by disturbing the individual conductances of memristor-based ANNs. To investigate line resistance, nodal analysis was employed: simultaneous linear equations were set up using Ohm’s law and Kirchhoff’s current law, and were solved in sparse matrix representation using the Python library SciPy.
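To make the nodal analysis concrete, the following is a minimal sketch for a single crossbar. It is not the simulation code used in this work: the circuit layout is an assumption (inputs drive the left end of each word line, each bit line is grounded at the bottom, and every wire segment between neighbouring devices has resistance `r_word` or `r_bit`), and the function name is hypothetical. Only `scipy.sparse.lil_matrix` and `scipy.sparse.linalg.spsolve` are used from SciPy.

```python
import numpy as np
from scipy.sparse import lil_matrix
from scipy.sparse.linalg import spsolve


def crossbar_currents(G, v_in, r_word, r_bit):
    """Bit-line output currents of an M x N crossbar with resistive wires."""
    G = np.asarray(G, dtype=float)
    M, N = G.shape
    g_w, g_b = 1.0 / r_word, 1.0 / r_bit
    n = M * N
    A = lil_matrix((2 * n, 2 * n))
    b = np.zeros(2 * n)

    for i in range(M):
        for j in range(N):
            w = i * N + j        # word-line node at device (i, j)
            c = n + w            # bit-line node at device (i, j)

            # Kirchhoff's current law at the word-line node.
            A[w, w] = G[i, j] + g_w + (g_w if j < N - 1 else 0.0)
            A[w, c] = -G[i, j]
            if j == 0:
                b[w] = g_w * v_in[i]     # segment to the voltage source
            else:
                A[w, w - 1] = -g_w
            if j < N - 1:
                A[w, w + 1] = -g_w

            # Kirchhoff's current law at the bit-line node.
            A[c, c] = G[i, j] + g_b + (g_b if i > 0 else 0.0)
            A[c, w] = -G[i, j]
            if i > 0:
                A[c, c - N] = -g_b
            if i < M - 1:
                A[c, c + N] = -g_b       # the last row connects to ground instead

    v = spsolve(A.tocsr(), b)
    v_word, v_bit = v[:n].reshape(M, N), v[n:].reshape(M, N)
    # The current reaching each output equals the sum of its device currents.
    return np.sum(G * (v_word - v_bit), axis=0)
```

As the wire resistances approach zero, the result approaches the ideal vector-matrix product `v_in @ G`; with finite resistances the output currents deviate from it, which is the effect the line-resistance simulations are meant to capture.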
After simulating memristor non-idealities, committees of different ANNs were composed. Committees used EA, i.e. the outputs of individual networks in a committee were averaged to produce a single output vector. In EA, the output vectors of individual networks can simply be added together (if the weightings of different networks are the same, as we assume in the main text); the label corresponding to the entry with the highest value would be the prediction of the committee. This addition can be performed either in software or, if the activation function of the last neuronal layer can be implemented physically, by adding the corresponding currents produced by the circuitry of this activation function.
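Ensemble averaging with equal weightings can be sketched in a few lines. The example below assumes the outputs of all networks are already available as an array of shape (networks, samples, classes); the function name and the optional `active` mask, which models disabling the outputs of a poorly performing network, are illustrative additions rather than part of the simulation code.

```python
import numpy as np


def committee_predict(outputs, active=None):
    """Predict labels by summing the output vectors of the active networks.

    outputs: array of shape (n_networks, n_samples, n_classes).
    active:  optional boolean mask selecting which networks contribute.
    """
    outputs = np.asarray(outputs, dtype=float)
    if active is None:
        active = np.ones(outputs.shape[0], dtype=bool)
    summed = outputs[np.asarray(active, dtype=bool)].sum(axis=0)
    # With equal weightings, the argmax of the sum equals the argmax of the mean.
    return summed.argmax(axis=1)


outputs = np.array([
    [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]],   # network 1, two samples
    [[0.2, 0.6, 0.2], [0.3, 0.3, 0.4]],   # network 2
    [[0.5, 0.2, 0.3], [0.1, 0.2, 0.7]],   # network 3
])
print(committee_predict(outputs))                               # all three networks
print(committee_predict(outputs, active=[True, True, False]))   # network 3 disabled
```

Because the sum and the mean differ only by a constant factor, no division is needed to obtain the committee prediction, which is why the addition could in principle be carried out directly on output currents.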
In the simulations, neural networks that go into a committee were chosen randomly. This was done to reflect the most convenient strategy when manufacturing such systems: because one does not need to selectively choose the networks, manufactured crossbars can be easily programmed without the need to replace them if they perform poorly when working individually (unless their effect is so detrimental that they have to be ignored, which this technique makes possible). Besides, devices might change over time; these simulations, which show what happens when one does not selectively choose the networks, are therefore valuable for investigating conditions where it is not possible to replace the networks.
In the simulations, 25 base networks were used (each having a different set of weights) for each of the architectures. All of their weights were then mapped onto pairs of conductances using HRS/LRS values extracted from experiments. Finally, to reflect the effect of each of the non-idealities, all networks were disturbed multiple times. In each disturbance iteration, multiple combinations of networks were chosen and their performance as a committee of a certain size was evaluated. In total, for most simulations, 10,000 data points were recorded for a committee of every size; these data captured the variations of base networks, their combinations and different disturbance iterations. Only simulations involving line resistance or numerical optimisation of weights had fewer data points for some committee sizes (due to long simulation times).
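The random composition of committees can be sketched as follows. The example assumes that the members of each committee are distinct networks drawn uniformly from the base networks, without regard to their individual accuracy; the function name and the numbers in the usage lines are illustrative and do not correspond to the exact counts used per disturbance iteration.

```python
import numpy as np


def sample_committees(n_base, committee_size, n_committees, rng):
    """Draw committees of distinct networks at random (no selection by accuracy)."""
    return np.array([
        rng.choice(n_base, size=committee_size, replace=False)
        for _ in range(n_committees)
    ])


rng = np.random.default_rng(seed=0)
# Five random committees of three networks drawn from 25 base networks.
committees = sample_committees(n_base=25, committee_size=3, n_committees=5, rng=rng)
print(committees)
```

Each row holds the indices of the disturbed networks whose outputs would be combined by EA; repeating this over several disturbance iterations yields the distribution of committee accuracies.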
DATA AVAILABILITY
The data that support the findings of this study are available from the corresponding author upon reasonable request.
AUTHOR CONTRIBUTIONS
A.M. and D.J. conceived the idea and designed the study. A.M., P.F. and Z.C. performed the experimental measurements. D.J. performed the simulations and analysed the experimental and simulation results. C.L. and Q.X. provided the experimental data of the programming of a Ta/HfO2 1T1R RRAM array. A.M., W.D.Z. and A.J.K. supervised the research. D.J. wrote the initial manuscript. All authors contributed to the discussions of the results and improved the text.
COMPETING INTERESTS STATEMENT
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
FUNDING
A.M. acknowledges funding from the Royal Academy of Engineering under the Research Fellowship scheme, A.J.K. acknowledges funding from the Engineering and Physical Sciences Research Council (EP/P013503/1) and the Leverhulme Trust (RPG-2016-135), W.D.Z. acknowledges funding from the Engineering and Physical Sciences Research Council (EP/S000259/1).
[1] E. Strubell, A. Ganesh, and A. McCallum, “Energy and policy considerations for deep learning in NLP,” arXiv preprint arXiv:1906.02243, 2019.
[2] S. Han, H. Mao, and W. J. Dally, “Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding,” in International Conference on Learning Representations, 2016, San Juan (Puerto Rico), arXiv preprint arXiv:1510.00149.
[3] C. Li, Z. Wang, M. Rao, D. Belkin, W. Song, H. Jiang, P. Yan, Y. Li, P. Lin, M. Hu, N. Ge, J. P. Strachan, M. Barnell, Q. Wu, R. S. Williams, J. J. Yang, and Q. Xia, “Long short-term memory networks in memristor crossbar arrays,” Nature Machine Intelligence, vol. 1, no. 1, pp. 49–57, 2019, doi: 10.1038/s42256-018-0001-4.
[4] Z. Wang, C. Li, W. Song, M. Rao, D. Belkin, Y. Li, P. Yan, H. Jiang, P. Lin, M. Hu, J. P. Strachan, N. Ge, M. Barnell, Q. Wu, A. G. Barto, Q. Qiu, R. S. Williams, Q. Xia, and J. J. Yang, “Reinforcement learning with analogue memristor arrays,” Nature Electronics, vol. 2, no. 3, p. 115, 2019, doi: 10.1038/s41928-019-0221-6.
[5] Z. Sun, G. Pedretti, E. Ambrosi, A. Bricalli, W. Wang, and D. Ielmini, “Solving matrix equations in one step with cross-point resistive arrays,” Proceedings of the National Academy of Sciences, vol. 116, no. 10, pp. 4123–4128, 2019, doi: 10.1073/pnas.1815682116.
[6] S. R. Nandakumar, M. Le Gallo, I. Boybat, B. Rajendran, A. Sebastian, and E. Eleftheriou, “A phase-change memory model for neuromorphic computing,” Journal of Applied Physics, vol. 124, no. 15, p. 152135, 2018, doi: 10.1063/1.5042408.
[7] S. Ambrogio, P. Narayanan, H. Tsai, R. M. Shelby, I. Boybat, C. D. Nolfo, S. Sidler, M. Giordano, M. Bodini, N. C. P. Farinha, B. Killeen, C. Cheng, Y. Jaoudi, and G. W. Burr, “Equivalent-accuracy accelerated neural-network training using analogue memory,” Nature, vol. 558, no. 7708, pp. 60–67, 2018, doi: 10.1038/s41586-018-0180-5.
[8] S. Yu, Z. Li, P. Y. Chen, H. Wu, B. Gao, D. Wang, W. Wu, and H. Qian, “Binary neural network with 16 Mb RRAM macro chip for classification and online training,” in International Electron Devices Meeting. IEEE, 2016, San Francisco (United States), doi: 10.1109/IEDM.2016.7838429.
[9] J. Woo, K. Moon, J. Song, S. Lee, M. Kwak, J. Park, and H. Hwang, “Improved synaptic behavior under identical pulses using AlOx/HfO2 bilayer RRAM array for neuromorphic systems,” IEEE Electron Device Letters, vol. 37, no. 8, pp. 994–997, 2016, doi: 10.1109/LED.2016.2582859.
[10] M. Prezioso, F. Merrikh-Bayat, B. D. Hoskins, G. C. Adam, K. K. Likharev, and D. B. Strukov, “Training and operation of an integrated neuromorphic network based on metal-oxide memristors,” Nature, vol. 521, no. 7550, pp. 61–64, 2015, doi: 10.1038/nature14441.
[11] C. Li, D. Belkin, Y. Li, P. Yan, M. Hu, N. Ge, H. Jiang, E. Montgomery, P. Lin, Z. Wang, W. Song, J. P. Strachan, M. Barnell, Q. Wu, R. S. Williams, J. J. Yang, and Q. Xia, “Efficient and self-adaptive in-situ learning in multilayer memristor neural networks,” Nature communications, vol. 9, no. 1, p. 2385, 2018, doi: 10.1038/s41467-018-04484-2.
[12] A. Chen and M. R. Lin, “Variability of resistive switching memories and its impact on crossbar array performance,” in 2011 International Reliability Physics Symposium. IEEE, 2011, Monterey (United States), doi: 10.1109/IRPS.2011.5784590.
[13] J. Kang, Z. Yu, L. Wu, Y. Fang, Z. Wang, Y. Cai, Z. Ji, J. Zhang, R. Wang, and Y. Yang, “Time-dependent variability in RRAM-based analog neuromorphic system for pattern recognition,” in International Electron Devices Meeting. IEEE, 2017, San Francisco (United States), doi: 10.1109/IEDM.2017.8268340.
[14] L. Xia, W. Huangfu, T. Tang, X. Yin, K. Chakrabarty, Y. Xie, Y. Wang, and H. Yang, “Stuck-at fault tolerance in RRAM computing systems,” IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 8, no. 1, pp. 102–115, 2017, doi: 10.1109/JETCAS.2017.2776980.
[15] C. Li, M. Hu, Y. Li, H. Jiang, N. Ge, E. Montgomery, J. Zhang, W. Song, N. Dávila, C. E. Graves, Z. Li, J. P. Strachan, P. Lin, Z. Wang, M. Barnell, Q. Wu, S. Williams, J. Yang, and Q. Xia, “Analogue signal and image processing with large memristor crossbars,” Nature Electronics, vol. 1, no. 1, pp. 52–59, 2018, doi: 10.1038/s41928-017-0002-z.
[16] M. Le Gallo, A. Sebastian, R. Mathis, M. Manica, H. Giefers, T. Tuma, C. Bekas, A. Curioni, and E. Eleftheriou, “Mixed-precision in-memory computing,” Nature Electronics, vol. 1, no. 4, p. 246, 2018, doi: 10.1038/s41928-018-0054-8.
[17] M. Hu, J. P. Strachan, Z. Li, and S. R. William, “Dot-product engine as computing memory to accelerate machine learning algorithms,” in 17th International Symposium on Quality Electronic Design, 2016, Santa Clara (United States), doi: 10.1109/ISQED.2016.7479230.
[18] Q. Xia and J. J. Yang, “Memristive crossbar arrays for brain-inspired computing,” Nature materials, vol. 18, no. 4, p. 309, 2019, doi: 10.1038/s41563-019-0291-x.
[19] Y. LeCun, C. Cortes, and C. J. C. Burges, “The MNIST database of handwritten digits,” 2010. [Online]. Available: http://yann.lecun.com/exdb/mnist
[20] A. Mehonic, D. Joksas, W. H. Ng, M. Buckwell, and A. J. Kenyon, “Simulation of inference accuracy using realistic RRAM devices,” Frontiers in Neuroscience, vol. 13, p. 593, 2019, doi: 10.3389/fnins.2019.00593.
[21] M. P. Perrone and L. N. Cooper, “When networks disagree: Ensemble methods for hybrid neural networks,” in Artificial Neural Networks for Speech and Vision. Chapman and Hall, 1993, pp. 126–142.
[22] S. Hashem and B. Schmeiser, “Improving model accuracy using optimal linear combinations of trained neural networks,” IEEE Transactions on Neural Networks, vol. 6, no. 3, pp. 792–794, 1995, doi: 10.1109/72.377990.
[23] B. Li, L. Xia, P. Gu, Y. Wang, and H. Yang, “Merging the interface: Power, area and accuracy co-optimization for RRAM crossbar-based mixed-signal computing system,” in Proceedings of the 52nd Annual Design Automation Conference, 2015, San Francisco (United States), doi: 10.1145/2744769.2744870.
[24] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in neural information processing systems, 2012, pp. 1097–1105, Lake Tahoe (United States), doi: 10.1145/3065386.
[25] Z. Wang, C. Li, P. Lin, M. Rao, Y. Nie, W. Song, Q. Qiu, Y. Li, P. Yan, J. P. Strachan, N. Ge, N. McDonald, Q. Wu, M. Hu, H. Wu, R. S. Williams, Q. Xia, and J. J. Yang, “In situ training of feed-forward and recurrent convolutional memristor networks,” Nature Machine Intelligence, vol. 1, no. 9, pp. 434–442, 2019, doi: 10.1038/s42256-019-0089-1.
[26] H. Jiang, L. Han, P. Lin, Z. Wang, M. H. Jang, Q. Wu, M. Barnell, J. J. Yang, H. L. Xin, and Q. Xia, “Sub-10 nm ta channel responsible for superior performance of a HfO2 memristor,” Scientific reports, vol. 6, p. 28525, 2016, doi: 10.1038/srep28525.
[27] G. W. Burr, R. M. Shelby, S. Sidler, C. Di Nolfo, J. Jang, I. Boybat, R. S. Shenoy, P. Narayanan, K. Virwani, E. U. Giacometti, B. N. Kurdi, and H. Hwang, “Experimental demonstration and tolerancing of a large-scale neural network (165 000 synapses) using phase-change memory as the synaptic weight element,” IEEE Transactions on Electron Devices, vol. 62, no. 11, pp. 3498–3507, 2015, doi: 10.1109/TED.2015.2439635.
[28] Y. Fan, L. Zhang, D. Crotti, T. Witters, M. Jurczak, and B. Govoreanu, “Direct evidence of the overshoot suppression in Ta2O5-based resistive switching memory with an integrated access resistor,” IEEE Electron Device Letters, vol. 36, no. 10, pp. 1027–1029, 2015, doi: 10.1109/LED.2015.2470081.
[29] B. Govoreanu, D. Crotti, S. Subhechha, L. Zhang, Y. Chen, S. Clima, V. Paraschiv, H. Hody, C. Adelmann, M. Popovici, O. Richard, and M. Jurczak, “A-VMCO: A novel forming-free, self-rectifying, analog memory cell with low-current operation, nonfilamentary switching and excellent variability,” in Symposium on VLSI Technology, 2015, Kyoto (Japan), doi: 10.1109/VLSIT.2015.7223717.
[30] Z. Chai, W. Zhang, P. Freitas, F. Hatem, J. F. Zhang, J. Marsland, B. Govoreanu, L. Goux, G. S. Kar, S. Hall, P. Chalker, and J. Robertson, “The over-reset phenomenon in Ta2O5 RRAM device investigated by the RTN-based defect probing technique,” IEEE Electron Device Letters, vol. 39, no. 7, pp. 955–958, 2018, doi: 10.1109/LED.2018.2833149.
[31] C. Sung, S. Lim, H. Kim, T. Kim, K. Moon, J. Song, J.-J. Kim, and H. Hwang, “Effect of conductance linearity and multi-level cell characteristics of TaOx-based synapse device on pattern recognition accuracy of neuromorphic system,” Nanotechnology, vol. 29, no. 11, p. 115203, 2018, doi: 10.1088/1361-6528/aaa733.
[32] Y. Fang, Z. Yu, Z. Wang, T. Zhang, Y. Yang, Y. Cai, and R. Huang, “Improvement of HfOx-based RRAM device variation by inserting ALD TiN buffer layer,” IEEE Electron Device Letters, vol. 39, no. 6, pp. 819–822, 2018, doi: 10.1109/LED.2018.2831698.
[33] B. Govoreanu, A. Redolfi, L. Zhang, C. Adelmann, M. Popovici, S. Clima, H. Hody, V. Paraschiv, I. Radu, A. Franquet, J. C. Liu, J. Swerts, O. Richard, H. Bender, L. Altimime, and M. Jurczak, “Vacancy-modulated conductive oxide resistive RAM (VMCO-RRAM): An area-scalable switching current, self-compliant, highly nonlinear and wide on/off-window resistive switching cell,” in International Electron Devices Meeting. IEEE, 2013, Washington (United States), doi: 10.1109/IEDM.2013.6724599.
[34] A. J. Kenyon, M. S. Munde, W. H. Ng, M. Buckwell, D. Joksas, and A. Mehonic, “The interplay between structure and function in redox-based resistance switching,” Faraday Discussions, vol. 213, pp. 151–163, 2019, doi: 10.1039/C8FD00118A.
[35] W. Wu, H. Wu, B. Gao, P. Yao, X. Zhang, X. Peng, S. Yu, and H. Qian, “A methodology to improve linearity of analog RRAM for neuromorphic computing,” in Symposium on VLSI Technology. IEEE, 2018, Honolulu (United States), doi: 10.1109/VLSIT.2018.8510690.
[36] Z. Chai, P. Freitas, W. Zhang, F. Hatem, J. F. Zhang, J. Marsland, B. Govoreanu, L. Goux, and G. S. Kar, “Impact of RTN on pattern recognition accuracy of RRAM-based synaptic neural network,” IEEE Electron Device Letters, vol. 39, no. 11, pp. 1652–1655, 2018, doi: 10.1109/LED.2018.2869072.
FIGURES
[Figure 1 diagram, panels A to C. Recoverable labels: MNIST; networks N(*1), N(*2), ..., N(*n) with outputs y1, y2, ..., yn; averaging; committee output y; disturbance; identical digital networks (N, N, N); different digital networks (N1, N2, ..., Nn); committee of non-ideal memristive networks. Functions listed: mitigating the effects of non-idealities; combining the knowledge of digital networks.]
Figure 1. Using multiple neural networks to improve inference accuracy. A) The principle of EA. B) Using identical digital networks when implementing committees of memristive neural networks only helps to deal with the damage to the networks caused by the non-idealities. C) Using different digital networks when implementing committees of memristive neural networks both helps to deal with the damage to the networks caused by the non-idealities and allows the knowledge about the data set acquired by individual digital networks to be combined.
[Figure 2 plots. Panel A: conductance (mS, 0.0 to 1.2) against pulse number (0 to 100). Panels B to F: polar diagrams with SET and RESET halves and a radial conductance scale from 0.2 to 1.2 mS.]
Figure 2. Experimental data of Ta/HfO2 RRAM crossbar array of shape 128×64. A) Modulation of devices’ conductance over 11 SET cycles, each consisting of 100 potentiating pulses. Violin plots of gradual conductance changes are shown for all Ta/HfO2 devices, with dots representing median conductance after a certain number of pulses. 100 points were used for Gaussian kernel density estimation. All violin plots have their maximum widths normalised. B-F) Examples of devices with their conductance (in mS) B) spanning the full range, C) spanning part of the full range, D) exhibiting cycle-to-cycle variability, E) stuck at high values, F) stuck at low values. These diagrams show conductance of five devices from the Ta/HfO2 crossbar array over 11 SET and RESET cycles. The radial component represents the conductance, while the angular component represents the number of applied pulses. The first SET cycle starts at the top of each of the diagrams. The conductance (in blue) over 100 SET pulses is displayed in a clockwise fashion across the right half of each of the diagrams. Following that, conductance (in orange) over 100 RESET pulses (starting at the bottom) is displayed across the left half of each of the diagrams, after which the next cycle is displayed. A Cartesian version of these plots is shown in Supplementary Figure S9.
[Figure 3 diagrams. Panel A: crossbar with input voltages V1 to V128 and output currents I1 to I64, mapping inputs x1 to x785 and outputs y1 to y25; pairs of neighbouring bit lines implement positive and negative weights; ~1/7 of weights mapped onto 1/7 crossbars. Panel B: heatmap against output number (#), colour scale: average change in current (%) from −20 to 0; smaller decreases in current near the inputs, larger decreases in current further from the inputs.]
Figure 3. Theoretical implementation of a synaptic layer of shape 785 × 25 using crossbars of shape 128 × 64. A) Mapping the first subset of weights onto one of the seven crossbars used to implement the whole synaptic layer. Positive weights and negative weights are mapped onto memristors in different bit lines. B) Heatmap of average changes in output currents due to line resistance (in all seven Ta/HfO2 crossbars). For this particular simulation, it was assumed that Ta/HfO2 devices can be programmed perfectly.
[Figure 4 box plot. Vertical axis: accuracy (%), 80 to 98. Horizontal axis: network type: ideal ANN (software baseline); memristive ANN (without non-idealities); memristive ANN (with non-idealities); committee machines of 2, 3, 4 and 5 memristive ANNs (with non-idealities).]
Figure 4. Accuracy achieved by individual networks and their committees when faulty devices, D2D variability data and line resistance of Ta/HfO2 crossbar are taken into account. The maximum whisker length is set to 1.5 × IQR.
[Figure 5 plot. Horizontal axis: absolute relative error of current (%), logarithmic. Vertical axis: cumulative probability (%). Legend: data points, lognormal fits. Annotation: higher resistance states.]
Figure 5. Cumulative probability plots of RTN-induced relative current deviations for all 8 resistance states of a Ta2O5 RRAM device. Lognormal fits are shown for each resistance state.
[Figure 6 box plot. Vertical axis: accuracy (%), 91 to 97. Horizontal axis: network type: ideal ANN (software baseline); memristive ANN (without non-idealities); memristive ANN (with non-idealities); committee machines of 2, 3, 4 and 5 memristive ANNs (with non-idealities).]
Figure 6. Accuracy achieved by individual networks and their committees when RTN data of a Ta2O5 device are taken into account. Additionally, interconnect resistance of 0.35 Ω and 0.32 Ω in the word and bit lines, respectively (from Ta/HfO2 array), was used to include line resistance effects. The maximum whisker length is set to 1.5 × IQR.
[Figure 7 plot. Horizontal axis: absolute relative error of current (%), logarithmic. Vertical axis: cumulative probability (%). Legend: data points, lognormal fits. Annotation: higher resistance states.]
Figure 7. Cumulative probability plots of RTN-induced relative current deviations for all 8 resistance states of an aVMCO RRAM device. Lognormal fits are shown for each resistance state.
[Figure 8 box plot. Vertical axis: accuracy (%), 91 to 98. Horizontal axis: network type: ideal ANN (software baseline); memristive ANN (without non-idealities); memristive ANN (with non-idealities); committee machines of 2, 3, 4 and 5 memristive ANNs (with non-idealities).]
Figure 8. Accuracy achieved by individual networks and their committees when RTN data of an aVMCO device are taken into account. Additionally, interconnect resistance of 0.35 Ω and 0.32 Ω in the word and bit lines, respectively (from Ta/HfO2 array), was used to include line resistance effects. The maximum whisker length is set to 1.5 × IQR.
[Figure 9 plot. Horizontal axis: total number of memristors, logarithmic. Vertical axis: median accuracy (%), 90 to 98. Legend: individual networks; committees of 2, 3, 4 and 5 networks.]
Figure 9. Median accuracy achieved by individual one-hidden-layer memristor-based networks and their committees, when controlled for total number of memristors required. The networks contained 25, 50, 100 or 200 hidden neurons and were disturbed using faulty devices and D2D variability data from the Ta/HfO2 crossbar.
TABLES645
First author (year) [ref] | Non-ideality | Device type | Proposed solution
C. Sung (2018) [31] | Current/voltage non-linearity | TaOx RRAM | Hot-forming step is adopted
C. Li (2018) [15] | Current/voltage non-linearity | Ta/HfO2 RRAM | 1T1R architecture is adopted
Y. Fang (2018) [32] | Device-to-device variability | HfOx RRAM | Ultra-thin ALD-TiN buffer layer is introduced
B. Govoreanu (2013) [33] | Device-to-device variability | Al2O3/TiO2 (VMCO) RRAM | Non-filamentary RRAM is adopted
A. J. Kenyon (2019) [34] | Device-to-device variability | SiOx RRAM | The roughness of bottom electrodes is increased
L. Xia (2017) [14] | Faulty devices | - | A modified mapping algorithm and redundancy schemes are used
S. Ambrogio (2018) [7] | Limited dynamic range | PCM | Two pairs of conductance of varying significance for every synaptic weight are used
M. Hu (2016) [17] | Line resistance | - | Advanced mapping algorithms are used to compensate for line resistance effects
W. Wu (2018) [35] | Programming non-linearity | HfOx RRAM | Electro-thermal modulation layer is deposited on the switching layer
J. Woo (2016) [9] | Programming non-linearity | HfO2 RRAM | Bilayer structure is adopted
S. Ambrogio (2018) [7] | Programming non-linearity | PCM | PCM devices are used together with CMOS transistors
Z. Chai (2018) [36] | Random telegraph noise | TiO2/a-Si (aVMCO) RRAM | Non-filamentary RRAM is adopted
Table I. Examples of past efforts at dealing with non-idealities of memristive devices and their
systems.
Committee Machines—A Universal Method to Deal with
Non-Idealities in Memristor-Based Neural Networks
D. Joksas1, P. Freitas2, Z. Chai2, W. H. Ng1, M. Buckwell1,
C. Li3, W. D. Zhang2, Q. Xia3, A. J. Kenyon1, and A. Mehonic1
1Department of Electronic and Electrical Engineering,
University College London, London (United Kingdom)
2Department of Electronics and Electrical Engineering,
Liverpool John Moores University, Liverpool (United Kingdom)
3Department of Electrical and Computer Engineering,
University of Massachusetts Amherst (United States of America)
Abstract
Artificial neural networks are notoriously power- and time-consuming when implemented on
conventional von Neumann computing systems. Consequently, recent years have seen an emergence
of research in machine learning hardware that strives to break the bottleneck of von Neumann
architecture and optimise the data flow, namely, to bring memory and computing closer together.
A popular approach is to realise artificial neural networks in hardware by implementing their
synaptic weights using memristive devices. However, various device- and system-level
non-idealities usually prevent these physical implementations from achieving high inference
accuracy. We suggest applying a well-known concept in computer science, committee machines, in
the context of memristor-based neural networks. Using simulations and experimental data from
three different types of memristive devices, we show that committee machines employing ensemble
averaging can successfully increase inference accuracy in physically implemented neural
networks that suffer from faulty devices, device-to-device variability, random telegraph noise
and line resistance. Importantly, we demonstrate that the accuracy can be improved even without
increasing the total number of memristors.
I. INTRODUCTION
Artificial neural networks (ANNs), with all of their variants, are now the main tools in
machine learning tasks, such as classification. The vast amounts of data being constantly
produced have enabled successful training and operation of ANNs. However, to achieve
high inference accuracy, it is usually necessary for neural networks to have a large number of
parameters. This results in both training [1] and inference [2] stages being time- and
power-consuming. This is largely caused by the need to transfer data from memory to computing
units: physical separation of memory and computing is the essence of any von Neumann
system.
One of the most promising solutions to these problems is the paradigm of non-von Neumann
computing and, specifically, analogue implementations of synapses (weights) in physical
ANNs. Because there are many more synapses than there are neurons in ANNs, the
matrix-vector multiplications, in which the synaptic weight values are used, are the costliest
operations in these networks, both in terms of power and time. Computing directly in
memory would minimise costly data transfers from off-chip memory, thus the most popular
approach is using analogue memory devices as proxies for synaptic weights of ANNs (both
fully connected and their variants [3, 4]). A common technique is to arrange such devices
in a structure called a crossbar array, in which every device (or a pair of devices) is used to
represent a single synaptic weight or, more generally, an entry in a matrix [5]. Memristive
devices, such as phase-change memories (PCMs) [6, 7] or resistive random-access memories
(RRAMs) [8, 9], have been considered as candidates for such tasks. Although here we focus
on ex-situ training, such systems have been successfully utilised for in-situ training too
[10, 11].
In memristive implementations of ANNs, the main concern is that various non-idealities
associated with these devices can prevent these systems from achieving high accuracy [12,
13]. Examples of non-idealities affecting inference accuracy include, but are not limited
to, devices not being able to electroform, devices stuck in one of the resistance states after
electroforming, device-to-device (D2D) variability and random telegraph noise (RTN). When
training analogue systems in-situ, limited endurance and non-linear resistance modulation
also have to be taken into account. To mitigate the effects of these device non-idealities, it is
often necessary to modify device structure [9], to use more advanced programming schemes
[14] or to use additional circuitry [15] or high-precision processing units [16] in conjunction
with memristive elements. On the system level, there is an issue of line resistance, which
affects the distribution of currents and thus decreases the accuracy. These line resistance
effects can be partially compensated for algorithmically [17] or partially mitigated by using
multiple smaller crossbar arrays [18]. Examples of past efforts at dealing with these and
other non-idealities of memristive devices and systems are listed in Table I; most of these
non-idealities are still the main focus of the research in the neuromorphic community.
We propose a simple way to mitigate the effects of all types of non-idealities during
inference. We suggest combining several non-ideal memristor-based neural networks into
committees to achieve better accuracy. The committee machine (CM) method we propose
significantly increases the inference accuracy and does not increase the computation time
because memristive ANNs in such committees work in parallel.
In this work, we first explain the simulation setup: what networks were trained,
how they were simulated and how they were combined into CMs. The experimental part
follows. We investigate three different types of memristor technology:
tantalum/hafnium oxide-based (Ta/HfO2), tantalum oxide-based (Ta2O5), and amorphous
vacancy modulated conductive oxide-based (aVMCO) devices. By exploring their
non-idealities relevant to inference (faulty devices, D2D variability, RTN, and line
resistance), we use the experimental data to simulate memristive ANNs working individually
and in committees.
II. RESULTS
A. Simulation setup
Fully connected ANNs were trained in software to recognise handwritten digits (using the
MNIST database [19]). Architectures with one hidden layer were explored. Unless stated
otherwise, the simulations used networks with 25 hidden neurons. However, networks with
50, 100 and 200 hidden neurons were additionally employed to evaluate the effectiveness of
the proposed method while controlling for the total number of memristors required. Following
training, weights of ANNs were mapped onto pairs of conductances using a proportional
mapping scheme (see [20]) to simulate memristor-based ANNs. Finally, these memristive
networks were disturbed using experimental data to reflect the effect of device- and
system-level non-idealities.
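As an illustration of the mapping step, the sketch below converts a signed weight into a pair of
conductances whose difference is proportional to the weight. The scheme actually used is the one
given in [20]; the function names, the conductance bounds `g_min` and `g_max`, and the choice to
hold the unused device of each pair at `g_min` are assumptions made here for illustration only.

```python
def map_weight_to_pair(w, w_max, g_min=0.1e-3, g_max=1.0e-3):
    """Map a signed weight onto (g_pos, g_neg), both in siemens.

    Assumption: the device matching the weight's sign carries the magnitude,
    scaled linearly into [g_min, g_max]; the other device is held at g_min.
    """
    if w_max <= 0:
        raise ValueError("w_max must be positive")
    scale = (g_max - g_min) / w_max           # siemens per unit weight
    magnitude = min(abs(w), w_max) * scale    # weights beyond w_max are clipped
    if w >= 0:
        return g_min + magnitude, g_min
    return g_min, g_min + magnitude


def pair_to_weight(g_pos, g_neg, w_max, g_min=0.1e-3, g_max=1.0e-3):
    """Invert the mapping: the weight is proportional to g_pos - g_neg."""
    return (g_pos - g_neg) * w_max / (g_max - g_min)
```

Because only the difference of the two conductances encodes the weight, a disturbance of either
device in the pair shifts the effective weight.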
After simulating physical non-idealities, the networks were combined into CMs that
employed ensemble averaging (EA) [21]. The principle of EA is shown in Figure 1A: several
networks are combined in parallel and then their outputs are averaged. After that, the
prediction is made using the averaged vector; the prediction is the label corresponding to
the largest entry in the vector.
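The averaging and prediction steps can be sketched in a few lines. This is a minimal
illustration rather than the simulation code used in this work; the function names and the
example output vectors are hypothetical.

```python
def ensemble_average(outputs):
    """Average the output vectors of several networks element-wise."""
    n = len(outputs)
    return [sum(column) / n for column in zip(*outputs)]


def predict(outputs):
    """Return the label (index) of the largest entry of the averaged vector."""
    averaged = ensemble_average(outputs)
    return max(range(len(averaged)), key=averaged.__getitem__)


# Three hypothetical committee members scoring three classes for one input.
member_outputs = [
    [0.50, 0.40, 0.10],
    [0.20, 0.70, 0.10],
    [0.45, 0.35, 0.20],
]
label = predict(member_outputs)
```

In this example two of the three members individually favour class 0, yet the averaged vector
favours class 1, which shows that EA is not the same as majority voting.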
CM methods are frequently used even with conventional ANNs. Methods such as EA
often produce better accuracy than that of the best individual network in a committee [22].
Although there are other types of CMs besides EA, they often rely on training additional
gating networks or boosting networks during the training stage. Using a gating network in
this scenario would produce additional problems: to avoid it acting as a performance
bottleneck, it too would have to be implemented on crossbar arrays. Various non-idealities
would decrease the effectiveness of this gating network, which is responsible for making the
decisions about the whole committee of ANNs. Likewise, we speculate that boosting of networks
would not be feasible in ex-situ training because it requires information about where
individual ANNs perform poorly; this cannot be known precisely until they are implemented
physically on crossbar arrays and the non-idealities manifest themselves. To the best of the
authors' knowledge, the application of boosting in the context of memristive neural networks
has been explored only once before [23]; as expected, it requires training each memristive
implementation differently because non-idealities manifest themselves differently in different
crossbar arrays.
There exist modifications of the EA algorithm that could potentially perform better. One
example is the generalized ensemble method (GEM) which, instead of using equal weightings
for each network during averaging (as in EA), uses a different one for each network [21].
These weightings are analytically determined by considering the correlation of errors between
different networks. But because [21] only considered networks with a mean square error loss
function (while our networks used a cross-entropy loss function), this work does not explore
GEM. Instead, we investigated whether it is possible to achieve a better performance by
optimising the weightings numerically. This method, like GEM and others previously
mentioned, might be impractical because, firstly, these weightings could be determined only
after the ANNs are physically implemented on crossbars, and, secondly, the devices could
change throughout their lifetimes, thus affecting the optimal weightings.
Even with the assumption that the devices would have perfect retention, we found that
optimisation of weightings achieves effectively the same performance as EA. For these
reasons, we focus only on EA in the main text, but present our results of optimising weightings
in Supplementary Figure S5. We stress that we are open to the idea that other CM methods
besides EA could be utilised successfully for ex-situ training in the context of memristive
ANNs. However, in this work we focus on demonstrating that CMs can be used to improve
the accuracy of memristor-based ANNs in general.
With EA, we find that even when the memristive ANNs that go into a committee all
use the same digital weights that are mapped onto crossbar arrays
(see Figure 1B), a committee of memristor-based networks can still achieve higher accuracy
than just a single non-ideal network. Although all networks have the same digital weights
before mapping, their physical implementations (which we call "disturbances" in Figures 1B,
C because they can usually be represented by the modification of individual weights) will
be different. For example, in one crossbar array, a certain set of devices will be faulty, while
in the other crossbar array, it will be a different set. This will result in different physical
implementations having slightly different learned representations of the data set, or, to
paraphrase, different networks will be "damaged" differently by the non-idealities. This
means that these committees will be able to combine different representations, and thus
achieve higher accuracy. However, by definition, such an approach would almost certainly not
yield a committee accuracy that is higher than the accuracy of a single digitally implemented
network.
A better approach is to use different digital networks for different physical implementations
that go into a committee (see Figure 1C). This approach more closely resembles the
conventional application of EA in computer science. In the context of memristive crossbar
arrays, it would not only help to mitigate the effects of the non-idealities (as in the case
of Figure 1B), but would also make it possible to combine the representations of digital
networks that were different even before the mapping stage. Most importantly, this method
allows a committee to achieve an accuracy that is sometimes even higher than that of individual
networks with digitally implemented weights. We thus used this method in this analysis.
An example comparison of these two approaches is presented in Supplementary Figure S8.
In this work, any given committee used only one network architecture but each network
was initialised differently before training, thus trained networks had different sets of weights.
Although it was not explored in this work, combining different network architectures in a
committee of memristor-based networks might be advantageous. Furthermore, in this work
we focus on fully connected ANNs but CMs could be applied to other variants of neural
networks as well. Due to the simplicity of EA, it could, for example, be employed in
convolutional neural networks (CNNs) [24], which are often used for image classification. This
might be of interest as CNNs have been successfully implemented using crossbar arrays
recently [25]. However, crossbar implementations are naturally more suited to fully connected
networks, therefore we limit ourselves to this architecture but are open to exploring the
effectiveness of EA with memristive CNNs in the future.
B. Ta/HfO2 RRAM
With array-level data available, Ta/HfO2 experiments provide the most complete picture
of device- and system-level non-idealities. In this subsection, we present not only the
analysis of faulty devices and D2D variability, but also careful consideration of the line
resistance effects. Ta/HfO2 memristors do not exhibit apparent RTN and overall have excellent
retention properties [26], and thus are perfect candidates for inference application.
1. Faulty devices and device-to-device variability
The most energy-efficient procedure to modulate the conductance of memristors is by
the application of voltage pulses. In an ideal scenario, one would apply identical pulses
and observe constant increases in conductance with each pulse. This is rarely the case
in practice, but, fortunately, this type of behaviour is more relevant for in-situ training
where it is necessary to ensure linear adjustment of an ANN's weights [27]. In ex-situ training,
conductance verification schemes can be used to program the devices precisely. Because the
devices would have to be programmed only once, one can spend additional resources to do so
accurately by applying SET (potentiation) and RESET (depression) pulses until a desirable
conductance state is achieved.
Even with this approach, there remain two obstacles: faulty devices and D2D variability.
It is observed in most memristor technologies that at least a small fraction of the devices
tends to get stuck in a particular conductance state. Additionally, even if not stuck, different
devices might behave differently; for example, they might have different conductance ranges.
Figure 2A shows conductance changes in Ta/HfO2 RRAM devices (in a 128 × 64 crossbar
array) when voltage pulses are applied to them. We can see from the median values
that overall the devices' conductance tends to increase as more SET pulses are applied.
However, the wider bottom regions of the violin plots indicate that some devices are stuck
around the high resistance state (HRS) and cannot set entirely no matter how many voltage
pulses are applied. There also exist devices that are stuck in the low resistance state (LRS), or
simply do not span the full conductance range.
Figure 2A combines data from multiple SET cycles for each of the memristors, thus it
is important to understand how these devices behave individually. Figures 2B-F show the
conductance of 5 (out of 8,192) devices over 11 SET and RESET cycles. In the five diagrams,
the radial component represents the conductance (in mS) and the angular component
represents the number of applied pulses. Figure 2B shows an example of preferable (and
typical) device behaviour: conductance changes in a continuous fashion and spans a wide
range of conductance values, from ∼0.1 mS to ∼1.0 mS. Although RESET cycles tend to
feature abrupt decreases in conductance, one can always repeat a cycle and exploit the more
predictable behaviour of SET cycles.
When encoding continuous numbers into crossbar devices' conductances, it is often preferable
to choose a large enough conductance range. Using data from Figure 2A, one could,
for example, choose the range between the first and the last median points (from ∼0.1 mS
to ∼1.0 mS). The device whose behaviour is presented in Figure 2B could be easily set to any
conductance within that range, as we have seen before. On the other hand, the device whose
behaviour is presented in Figure 2C, although operating in a predictable fashion, has a smaller
conductance range. We can see that in all cycles, its conductance does not exceed 0.8 mS.
This is an example of D2D variability that can make it difficult to choose an optimal operating
range and set the conductance of all devices precisely.
The device whose behaviour is presented in Figure 2D shows high cycle-to-cycle variability.
Although that could prove to be a problem in some applications, this specific device might
perfectly serve its purpose in ex-situ training of ANNs. We can observe that this device
spans the same conductance range as the device from Figure 2B, even if in an unpredictable
manner. Because all states in the full range are, in theory, achievable, one can cycle the
device multiple times until it is set to the required conductance level.
Lastly, we have devices whose negative effect is most difficult to mitigate: faulty devices.
Figure 2E shows the behaviour of a device stuck at high conductance values, while Figure 2F
shows the behaviour of a device stuck at low conductance values. No matter how many pulses
are applied to these devices or how many times they are cycled, they exhibit almost no
conductance variation and thus, in most cases, cannot be used to encode information.
Knowing that some devices perform like the ones whose behaviour is shown in
Figures 2C,E,F, it is important to minimise their negative effect. If the conductance that a
device has to be set to is outside that device's range, it is sensible to set it to the closest
achievable conductance. Although there is little that can be done about fully stuck memristors,
it is possible to optimise the behaviour of devices like the one in Figure 2C that simply
have a smaller conductance range. For example, if such a device has to be set to 0.9 mS, one
would set it to the highest achievable conductance (∼0.8 mS). In the following simulations
involving faulty devices and D2D variability, the operating range between the first and the last
median points was used, and the devices were chosen randomly from the 128 × 64 crossbar and
set to the most desirable states, as described in this paragraph.
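The rule of setting a device to the closest achievable conductance amounts to clamping the
target to the device's own range. The sketch below illustrates it; the function name is
hypothetical, the 0.8 mS ceiling follows the Figure 2C example in the text, and the
stuck-device values are assumed for illustration.

```python
def closest_achievable(target, device_min, device_max):
    """Clamp a target conductance to the range a given device can reach."""
    return max(device_min, min(target, device_max))


# All values in mS.
healthy = closest_achievable(0.5, 0.1, 1.0)     # target is reachable, so unchanged
limited = closest_achievable(0.9, 0.1, 0.8)     # reduced range, as in Figure 2C
stuck = closest_achievable(0.3, 0.95, 0.95)     # hypothetical device stuck at 0.95 mS
```

A stuck device is the limiting case in which the minimum and maximum coincide, so every target
collapses onto the same value and the encoded weight is lost.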
2. Line resistance
The effect of line resistance can be extremely detrimental in many crossbar-based
implementations of ANNs. That is especially the case if the crossbars are large and
the resistance of the interconnects is high (compared to the memristors' resistance).
Because in a neural network many of the inputs are non-zero at any given time, a lot of
current accumulates in the bit lines, which results in significant voltage drops across the
interconnects, and thus the current distribution across the crossbar is affected in a major
way.
Although there are many possible options for how to map synaptic weights onto crossbar
arrays, the choice can determine the role of line resistance. It is often the case that synaptic
layers of ANNs are large in size. However, that does not mean that the weights in those
layers have to be mapped onto crossbars of equivalent shape; not only is that sometimes
impossible, but it can also amplify the effect of line resistance. For example, if a synaptic
layer with 785 input neurons (as is the case with the first layer of our ANNs) was mapped
onto a crossbar with 785 word lines, massive amounts of current would accumulate in the
bit lines.
The Ta/HfO2 crossbar has shape 128 × 64 and so this shape was chosen for all the
simulations involving line resistance. Even relatively small ANNs of architecture
784(+1):25(+1):10 would need 2 × (785 × 25 + 26 × 10) = 39,770 memristors to be implemented.
Even if not all the inputs were used at any given time, it would not be possible to fit all the
memristors onto a single crossbar of shape 128 × 64. To overcome this, we decided to simulate
multiple crossbars, each of which would implement a subset of the synaptic weights, but, for a
given synaptic layer, would all compute in parallel. Because ⌈785/128⌉ = 7, seven crossbars were
used to implement the first synaptic layer; the first crossbar utilised its bottom 113 word
lines, while the other six crossbars used their bottom 112 word lines because
113 + 6 × 112 = 785. The second synaptic layer was implemented using an eighth crossbar
utilising its bottom 26 word lines.
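The arithmetic in this paragraph can be checked with a short script. The variable names are
illustrative, and the near-even split via `divmod` is an assumption that happens to reproduce
the 113 + 6 × 112 partition quoted above.

```python
import math

n_inputs, n_hidden, n_outputs = 784, 25, 10
rows_per_crossbar = 128

# Two memristors per weight; the "+1" terms account for the bias inputs.
total_memristors = 2 * ((n_inputs + 1) * n_hidden + (n_hidden + 1) * n_outputs)

# Crossbars needed for the first layer, and a near-even split of its 785 word lines.
n_crossbars = math.ceil((n_inputs + 1) / rows_per_crossbar)
base, extra = divmod(n_inputs + 1, n_crossbars)
rows_used = [base + 1 if i < extra else base for i in range(n_crossbars)]

# Bit lines needed per first-layer crossbar: one pair per hidden neuron.
bit_lines_used = 2 * n_hidden
```

With 25 hidden neurons, each first-layer crossbar uses 50 of its 64 bit lines.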
Figure 3A shows an example of how the first synaptic layer of a 784(+1):25(+1):10 neural
network could be implemented. Specifically, it shows how the first subset of weights would
be implemented using one of the crossbars. Because we use a proportional mapping scheme,
positive and negative weights would be implemented in different bit lines. In Figure 3A,
memristors designated to implement positive weights are coloured in blue, memristors
designated to implement negative weights are coloured in orange and unelectroformed
memristors are coloured in black. Because simulations were constrained by experimental data,
some of the devices were left unused and assumed to be unelectroformed. In practice, the
crossbars could be manufactured to fit the geometry of the ANNs.
In each synaptic layer, the corresponding output currents from each of the crossbars
would be added together. Additionally, output currents at the bit lines implementing
negative weights would be subtracted from the output currents at the neighbouring bit lines
(to their left) implementing positive weights. For example, in the configuration of
Figure 3A, output current at the 2nd bit line would be subtracted
from the output current at the 1st bit line, etc.
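The following sketch illustrates how partial currents from several crossbars are added and how
neighbouring bit lines are then subtracted. It assumes ideal crossbars with no line resistance
and uses small made-up conductance and voltage values, so it shows only the bookkeeping, not the
simulations reported here.

```python
def crossbar_currents(voltages, conductances):
    """Ideal crossbar: current in bit line j is sum_i V[i] * G[i][j]."""
    n_cols = len(conductances[0])
    return [sum(v * row[j] for v, row in zip(voltages, conductances))
            for j in range(n_cols)]


def layer_outputs(voltages, conductances, rows_per_crossbar):
    """Split word lines across crossbars, add their currents, then subtract pairs."""
    n_cols = len(conductances[0])
    total = [0.0] * n_cols
    for start in range(0, len(voltages), rows_per_crossbar):
        stop = start + rows_per_crossbar
        part = crossbar_currents(voltages[start:stop], conductances[start:stop])
        total = [t + p for t, p in zip(total, part)]
    # Bit lines 1, 3, ... hold positive weights; their right-hand neighbours hold negative ones.
    return [total[j] - total[j + 1] for j in range(0, n_cols, 2)]


# Toy layer: 5 word lines, 2 weight columns (4 bit lines), arbitrary units.
G = [[3, 1, 0, 2], [1, 1, 4, 0], [0, 2, 1, 1], [2, 0, 0, 3], [1, 1, 2, 2]]
V = [1.0, 0.5, 0.0, 2.0, 1.0]
split = layer_outputs(V, G, rows_per_crossbar=2)        # three small crossbars
whole = layer_outputs(V, G, rows_per_crossbar=len(V))   # one large crossbar
```

In the ideal case the split and unsplit layouts give identical outputs; the motivation for
splitting is that, once line resistance is included, less current accumulates in each bit line.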
Unfortunately, even when using multiple smaller crossbars, the interconnects can
significantly disturb current distribution in the crossbar. Average output current decreases
due to line resistance in all seven crossbars of Ta/HfO2 devices (whose resistance ranges from
∼1 kΩ to ∼11 kΩ, and whose interconnect resistance is 0.35 Ω and 0.32 Ω in the word
and bit lines, respectively) are shown in the heatmap in Figure 3B. We can
see that the current decreases can range from ∼12% at the outputs nearest to the applied
voltages to ∼16% at the outputs in the rightmost bit lines that are used. Such large
current decreases often result from large input voltages that are applied at the top part of
the crossbar, far away from the outputs. Such inputs generate large amounts of current that
flow through large portions of the bit lines and, with voltage drops across interconnects,
disturb the overall current distribution in a major way.
In the supplementary information, we provide a possible strategy of mitigating line
resistance effects in supervised learning. This scheme was not employed in the simulations
described in the main text because we wanted to find out how well the CM method would deal
with noticeable line resistance effects.
3. Inference accuracy
Figure 4 shows the accuracy of individual networks, as well as of their committees;
memristive ANNs were simulated by taking into account three non-idealities of the Ta/HfO2
crossbar explored earlier: faulty devices, D2D variability and line resistance. As indicated by
the yellow box plot in Figure 4, individual networks implemented digitally achieve ∼95.9%
median accuracy. Networks disturbed to reflect the effect of non-idealities achieve ∼91.0%
median accuracy, as indicated by the vermilion box plot. Although that is a substantial
drop in accuracy, we see that the accuracy increases as more networks are added to the
committee. When 5 networks are used in a committee, median accuracy increases
up to ∼95.7%, as indicated by the rightmost green box plot.
C. Ta2O5 RRAM
In order to explore the effectiveness of minimising adverse effects of RTN, we use another
memristor technology based on Ta2O5. To investigate RTN, measurements from a single
device were considered. To simulate line resistance effects, interconnect resistance from
Ta/HfO2 was used and the same crossbar shape was assumed.
1. Random telegraph noise
Memristors often suffer from RTN, resulting in a different accuracy at any given instant
in time. The Ta2O5 device was characterised by measuring the current of 8 resistance states
multiple times. Figure 5 shows the cumulative probability plots for those resistance states,
together with lognormal fits modelling the nature of RTN. The figure reveals, among other
things, that higher resistance states suffer from a higher degree of RTN. Fits for every
resistance state, together with occurrence rates (see Supplementary Table SII), were used
to disturb the weights of ANNs in order to reproduce the effect of RTN.
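One way such a disturbance could be applied to a single device is sketched below. This is not
the procedure used in this work: it assumes that an RTN event occurs with probability equal to
the occurrence rate, that the event increases the resistance by a relative amount drawn from a
lognormal distribution, and it uses made-up parameter values rather than those of
Supplementary Table SII.

```python
import math
import random


def apply_rtn(conductance, occurrence_rate, mu, sigma, rng):
    """Return a conductance disturbed by RTN, or unchanged if no event occurs.

    Assumptions: an event happens with probability `occurrence_rate`, and its
    relative resistance increase is lognormal with parameters (mu, sigma) of
    the underlying normal distribution.
    """
    if rng.random() >= occurrence_rate:
        return conductance
    relative_increase = rng.lognormvariate(mu, sigma)
    resistance = (1.0 / conductance) * (1.0 + relative_increase)
    return 1.0 / resistance


rng = random.Random(0)
# Hypothetical state: 10 microsiemens, 30% occurrence rate, median increase of 10%.
disturbed = [apply_rtn(1e-5, 0.3, math.log(0.1), 0.5, rng) for _ in range(1000)]
```

In a fuller simulation each resistance state would carry its own fitted parameters and
occurrence rate, and the disturbance would be drawn independently for every device.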
2. Inference accuracy
The results combining RTN and line resistance effects for the Ta2O5 device are shown in
Figure 6. From the difference in median accuracy between the yellow and blue box plots, we can
notice that there is a significant drop in accuracy simply due to mapping of weights onto
conductances. That is not surprising given that only 8 states were available for mapping.
One can also observe that the further drop in median accuracy due to non-idealities is not as
severe: it drops to ∼94.1%. The RTN disturbance magnitude is limited to <100% in
most cases, which possibly contributes to its smaller effect on accuracy. Additionally, the Ta2O5
device has much higher resistance (ranging from 25 kΩ to 200 kΩ), thus line resistance is also
less of a concern. When non-ideal networks are combined into committees of 5, the median
accuracy jumps to ∼96.5%, even higher than the software baseline of individual networks.
This reveals an additional trend seen in all the simulations performed: the higher the accuracy
of the individual non-ideal memristive networks, the higher the accuracy of the committees
that they are part of.
D. aVMCO RRAM
Further, we consider a third memristor technology, one based on aVMCO materials. We
test the effects of RTN by considering measurements from a single device. Line resistance
effects were simulated by using the interconnect resistance and shape of the Ta/HfO2 crossbar array.
1. Random telegraph noise
Figure 7 shows the cumulative probability plots for 8 resistance states of an aVMCO
device suffering from RTN. As in Ta2O5, we observe that higher resistance states experience
RTN of higher magnitude. However, compared to Ta2O5, the RTN magnitude is much more
predictable. Fits for each of the 8 resistance states, together with occurrence rates (see
Supplementary Table SIII), were used to simulate the effect of RTN in aVMCO-based neural
networks.
2. Inference accuracy
The results combining RTN and line resistance are shown in Figure 8. As with Ta2O5, we
see a large drop due to mapping onto conductances, a consequence of very few states being
available for mapping. More interestingly, the accuracy of individual memristor-based networks
with and without non-idealities is almost identical. That is because the occurrence rate of RTN
in the aVMCO device is small and there is a much smaller probability of RTN having large
magnitude. Additionally, the resistance of the aVMCO device is even higher than that of the Ta2O5
device: it ranges from 1 MΩ to 7.5 MΩ. Therefore, line resistance has an even smaller effect
in a hypothetical array of aVMCO devices. Due to the median accuracy of individual non-ideal
memristor-based networks being higher (∼94.6%), the median accuracy of committees
is higher too: in committees of size 5 it increases to ∼96.7%.
The results from the previous section suggest that the method of using committee ma-361
chines to improve the accuracy of memristive neural networks is technology-agnostic
::::::::::::
technology-362
::::
and
::::::::::::::::::::::
non-ideality-agnostic. CMs can mitigate the effects of faulty devices, D2D variability,363
RTN and line resistance in combination with each other. Although line resistance is more364
difficult to deal with using committees due to the similar way in which all crossbars of365
different networks get affected, using random reordering can increase the effectiveness of366
ensembles of non-ideal memristive networks. In
::::
CM
::::::::
method
:::
is
::::::::
slightly
:::::
less
:::::::::
effective
::::::
with367
:::::
large
:::::
line
:::::::::::
resistance
:::::
(see
:::::::::::
discussion
::::
in
::::
the
::::::::::::::::
supplementary
:::::::::::::::
information),
:::
in
:
all cases, we368
observe that the accuracy of individual non-ideal networks largely determines the accuracy369
of committees. That is consequential because it means that although committees always370
increase the accuracy, there is still an incentive to optimise the devices and systems that371
implement these networks—the higher the accuracy of individual networks, the higher the372
accuracy of the committees.373
It is also important to consider whether using larger networks, instead of committees
of smaller networks, would yield the same results if the same number of synapses (or
memristors) was used in the large network as in the committee of smaller networks. In
our previous work we found that the accuracy of networks before disturbance (which we call
“starting accuracy”) has a huge effect on the robustness to non-idealities: the larger the
starting accuracy, the more robust the networks become [20]. One way to achieve higher
starting accuracy is to have larger networks, e.g. if we have a network with one hidden layer,
we might increase the number of neurons in that hidden layer, which would likely result in
higher accuracy after training and thus higher robustness.
Figure 9 shows a comparison of CMs of memristor-based networks disturbed using faulty
devices and D2D variability data from the Ta/HfO2 crossbar, when controlled for the total
number of memristors required to implement them (line resistance was not taken
into account due to the long time required to simulate it in large networks). We can observe
that committees of two networks, each with 25 hidden neurons (leftmost data point of
the orange curve), achieve ∼0.9% higher median accuracy than individual networks with
50 hidden neurons (second data point from the left in the vermilion curve), despite both
requiring an almost identical total number of memristors. Committees of two networks, each
with 100 hidden neurons (third data point from the left in the orange curve), achieve ∼1.1%
higher median accuracy than individual networks with 200 hidden neurons (rightmost data
point in the vermilion curve), even though both require almost the same total number of
memristors. An even larger improvement is gained when committees of four networks, each with
50 hidden neurons (second data point from the left in the blue curve), are used instead:
then the accuracy is improved by ∼1.5%, with almost exactly the same total number of memristors
used.
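The claim that these configurations need almost the same number of memristors can be checked with a short calculation. The sketch below assumes the architecture described in Methods (784 inputs, one hidden layer, 10 outputs), a bias neuron feeding each synaptic layer, and two memristors per weight; the function names are ours and purely illustrative.

```python
def num_synapses(hidden, inputs=784, outputs=10):
    """Weights in a one-hidden-layer network with a bias neuron feeding each synaptic layer."""
    return (inputs + 1) * hidden + (hidden + 1) * outputs


def num_memristors(hidden, committee_size=1, per_weight=2):
    """Memristors in a committee, assuming every weight maps onto a pair of devices."""
    return committee_size * per_weight * num_synapses(hidden)


# (hidden neurons per network, number of networks in the committee)
for hidden, size in [(50, 1), (25, 2), (200, 1), (100, 2), (50, 4)]:
    print(f"{size} x {hidden} hidden neurons: {num_memristors(hidden, size)} memristors")
```

Under these assumptions, one network with 50 hidden neurons needs 79,520 memristors and a committee of two networks with 25 hidden neurons needs 79,540; one network with 200 hidden neurons needs 318,020, against 318,040 for two networks with 100 hidden neurons and 318,080 for four networks with 50 hidden neurons.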
For different non-idealities and even different training schemes of the ANNs, the equivalents
of Figure 9 might be different, but there are a few common characteristics in all of
them. In all cases, for a given total number of memristors used, there is an optimal number
of networks that should be used in a committee. Additionally, we observe that the more
severe a non-ideality is, the more apparent the effectiveness of committees becomes. Finally,
sometimes the committees (for a fixed total number of memristors) might achieve lower
accuracy than individual networks but only if the networks that they replace are very small
and the non-ideality is not very detrimental. If the networks that are being replaced with
committees of smaller networks are sufficiently large, the committees will achieve higher
accuracy. An example of that is shown in Supplementary Figure S7, where the aVMCO device
is minimally affected by the non-idealities and so the advantage of committees becomes
apparent only when replacing larger networks.
The reason why committees work in the context of non-ideal implementations and why
they work best when they are used to replace large networks might, to some extent, lie in
their training. When it comes to training fully connected networks, their accuracy tends to
saturate as more parameters are added. Supplementary Figure S4 shows that
networks with 50 hidden neurons can be trained to achieve significantly higher accuracy
than networks with 25 hidden neurons. However, networks with 200 hidden neurons achieve
only slightly higher accuracy than networks with 100 hidden neurons. This also means that
networks with 200 hidden neurons will be only slightly more robust to non-idealities than
networks with 100 hidden neurons. When such networks are affected by non-idealities, their
accuracy drops to similar values but the smaller network can work in a committee with
other networks, totalling almost the same number of memristors as the
large network, but achieving higher accuracy overall. This is the most likely reason why the
committees of smaller networks are effective at dealing with non-idealities, especially when
replacing large networks.
In addition to the accuracy improvements, committees can provide flexibility in memristive
implementations of neural networks. Digital implementations of ANNs have very
predictable behaviour due to the precision of digital logic. Analogue implementations, on
the other hand, can vary greatly even if they use the same weights before the mapping
onto conductances; that is a result of the stochastic nature of the memristors that implement
these ANNs. The parallel and modular nature of committee machines makes memristive
systems much more flexible. For example, if the verification accuracy of one of the ANNs in
a memristor-based CM deteriorates below acceptable levels, its outputs could be disabled
to ensure higher accuracy of the rest of the committee.
Importantly, this introduced parallelism comes at almost no extra cost. For a fixed total
number of memristors, a committee of smaller networks, compared to a large individual
network, would only require a few additional output and bias neurons, and an averaging
functionality, which could potentially be implemented in hardware. For example, an ANN
with 50 hidden neurons would require 846 neurons in total, while a committee of two ANNs,
each with 25 hidden neurons (and thus requiring almost the same total number of memristors),
would require 857 neurons in total.
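The two totals quoted above are reproduced by the count below. It rests on an assumption of ours, inferred from the numbers rather than stated in the text: the networks in a committee share the input neurons and the input bias neuron, while each network has its own hidden neurons, hidden-layer bias neuron and output neurons. The function name is illustrative.

```python
def num_neurons(hidden, committee_size=1, inputs=784, outputs=10):
    """Neuron count, assuming committee members share the input layer and its bias neuron."""
    shared = inputs + 1                    # input neurons plus one bias neuron
    per_network = hidden + 1 + outputs     # hidden neurons, hidden-layer bias, output neurons
    return shared + committee_size * per_network


print(num_neurons(50))       # single network with 50 hidden neurons: 846
print(num_neurons(25, 2))    # committee of two networks with 25 hidden neurons: 857
```

The difference of 11 neurons corresponds to one extra bias neuron and one extra set of 10 output neurons.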
In summary, our simulations employing experimental data from three different types of
memristive devices show that committee machines employing ensemble averaging can be used
to mitigate the effects of device- and system-level non-idealities in memristor-based neural
networks. EA makes it possible to achieve higher inference accuracy in physically implemented neural
networks that suffer from faulty devices, device-to-device variability, random telegraph noise,
and even line resistance. This method is a universal way to deal with the most common
non-idealities and is straightforward to implement during the fabrication stage. Increased
modularity of these memristive neural network systems will increase not only their inference
accuracy, but also their robustness and flexibility, even without the need to sacrifice area.
Although some level of non-idealities in memristors is unavoidable, the CM method allows us
to deal with these on the system level and is agnostic to a particular technology or, to some
degree, type of the non-ideality.
METHODS
Experiments
The Ta/HfO2 RRAM 1T1R array consists of NMOS transistors fabricated in a commercial
fab (feature size of 2 µm) and Pt/HfO2/Ta devices. The bottom electrode was deposited by
evaporation of a 20 nm Pt layer on top of a 2 nm tantalum (Ta) adhesive layer; the electrode
was patterned by photolithography and a lift-off process. A 5 nm HfO2 switching layer was
deposited by atomic layer deposition using water and tetrakis(dimethylamido)hafnium as
precursors at 250 °C. Sputter-deposited Ta of 50 nm thickness followed by 10 nm Pd was
used in a lift-off process to serve as the top electrode. The filamentary Ta2O5 device
consists of a TiN/4 nm stoichiometric Ta2O5/20 nm nonstoichiometric TaOx/10 nm TaN/TiN
stack with a cross-sectional area of 75 nm × 75 nm, while the non-filamentary aVMCO device
has a cross-sectional area of 135 nm × 135 nm and is composed of a TiN/8 nm amorphous-Si/8 nm
anatase TiO2/TiN stack. Ta2O5 and aVMCO devices were fabricated by imec. The
detailed fabrication process parameters can be found in references [11, 28, 29] for Ta/HfO2,
Ta2O5 and aVMCO RRAMs, respectively.
The conductance of Ta/HfO2 devices was modulated by applying SET pulses (500 µs @
2.5 V and gate voltage increasing from 0.6 V to 1.6 V). After each of the 11 SET cycles, RESET
pulses were applied (5 µs @ 0.9 V increasing to 2.2 V and gate voltage of 5 V). The voltage
was increased linearly throughout the 100 pulses. All electrical tests for Ta2O5 and
aVMCO devices were done with a Keysight B1500A. The RTN data were extracted by switching
the device into 8 uniformly distributed resistance levels between 25 kΩ and 200 kΩ for Ta2O5,
and 8 nearly uniformly distributed resistance levels between 1 MΩ and 7.5 MΩ for aVMCO, with
incremental RESET DC sweeps [30]. RTN measurement was then
carried out at each resistance level with a read-out voltage of 0.1 V for Ta2O5 and 3 V for aVMCO,
with a sampling time of 2 ms/point and 10,000 sampling points per resistance
level for an RTN measurement period of 20 s.
Simulations
In this work, feed-forward ANNs with fully connected layers and continuous weights were
trained to recognise handwritten digits using the MNIST database. All 60,000 MNIST
training images were used during the training stage; the training set consisted of 50,000 images
and the verification set consisted of 10,000 images. All 10,000 test images were used to evaluate
the inference accuracy of ANNs. Networks used 784 input neurons representing pixel intensities
of MNIST images of 28 × 28 pixel size, as well as one bias neuron. 10 output neurons
were used; they represented the ANNs’ predictions of 10 handwritten digits. Hidden
layers used the sigmoid activation function, while the output layer used the softmax activation
function. Weights were optimised by minimising the cross-entropy error function using stochastic
gradient descent. A learning rate of 0.01 and a patience of 25 epochs were used. For each
architecture explored, 25 networks were trained, each initialised differently. When numerically
optimising ANNs’ weightings, optimisation was performed on the verification
set, while the performance was evaluated using the test set. The code was implemented in
Python.
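The architecture described above can be sketched in a few lines of NumPy. This is an illustration only, not the code used in this work: random numbers stand in for MNIST images, the weights are untrained, the bias neurons are modelled by appending a constant 1 to the inputs of each synaptic layer, and all names are ours.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # subtract the row maximum for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def forward(x, w_hidden, w_output):
    """x: (batch, 784); w_hidden: (785, hidden); w_output: (hidden + 1, 10)."""
    ones = np.ones((x.shape[0], 1))                  # output of the bias neurons
    h = sigmoid(np.hstack([x, ones]) @ w_hidden)     # hidden layer with sigmoid activation
    return softmax(np.hstack([h, ones]) @ w_output)  # output layer with softmax activation


def cross_entropy(probs, labels):
    """Mean cross-entropy error for integer class labels."""
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))


rng = np.random.default_rng(0)
x = rng.random((4, 784))                             # placeholder for four 28 x 28 images
w_hidden = rng.normal(scale=0.05, size=(785, 25))    # 25 hidden neurons, untrained
w_output = rng.normal(scale=0.05, size=(26, 10))
probs = forward(x, w_hidden, w_output)
loss = cross_entropy(probs, np.array([0, 1, 2, 3]))
```

Training would then minimise `cross_entropy` with stochastic gradient descent; that step is omitted here.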
Weights were mapped onto pairs of memristors’ conductances using a proportional mapping
scheme: synaptic weights were made proportional to one of the conductances in the
pair, while the other was left unelectroformed. The zero weight was interpreted as given;
in practice, it would be implemented by not electroforming the device, thus resulting in its
negligible conductance. Although aVMCO devices do not have an electroforming stage, for consistency
we assumed that additional insulating circuit elements could be used to implement
the zero weight. Negative weights would be implemented by placing certain memristors in
dedicated bit lines of the crossbars whose outputs would be subtracted from the outputs at
the corresponding bit lines implementing positive weights. Maximum weights after mapping
were optimised separately for each set of network architecture and conductance levels; in
each case this was done by excluding a certain proportion, pL, of weights with the largest absolute
values. The pL values used for each simulation are summarised in Supplementary
Table SI. More details on the mapping procedure can be found in our past work [20].
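A minimal sketch of such a mapping is given below. It is an illustration under our own assumptions, not the procedure of [20]: the maximum weight is taken as the (1 − pL) quantile of the absolute weights, larger weights are clipped to it, each target conductance is rounded to the nearest available level (with zero standing for an unelectroformed device), and the conductance levels in the example are hypothetical.

```python
import numpy as np


def map_weights(weights, g_levels, p_l=0.0):
    """Map weights onto pairs (g_pos, g_neg) of conductances, G proportional to |w|.

    weights:  array of trained weights.
    g_levels: available non-zero conductance levels in siemens (hypothetical values).
    p_l:      proportion of weights with the largest |w| excluded when choosing w_max.
    """
    g_levels = np.sort(np.asarray(g_levels, dtype=float))
    abs_w = np.abs(weights)
    w_max = np.quantile(abs_w, 1.0 - p_l)        # maximum weight after exclusion
    k = g_levels[-1] / w_max                     # proportionality constant, G = k * |w|
    target = np.clip(abs_w, 0.0, w_max) * k
    # Assumption: round to the nearest available level; zero means "not electroformed".
    allowed = np.concatenate([[0.0], g_levels])
    nearest = allowed[np.abs(target[..., None] - allowed).argmin(axis=-1)]
    g_pos = np.where(weights > 0, nearest, 0.0)  # devices in the positive bit lines
    g_neg = np.where(weights < 0, nearest, 0.0)  # devices in the negative bit lines
    return g_pos, g_neg, k


w = np.array([[1.0, -0.5], [0.0, 0.26]])
g_pos, g_neg, k = map_weights(w, g_levels=[0.25e-3, 0.5e-3, 1.0e-3])
effective_w = (g_pos - g_neg) / k                # weights actually implemented
```

In each pair at most one conductance is non-zero, which mirrors the statement that the other device is left unelectroformed.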
All non-idealities, except for line resistance, were simulated by disturbing the individual
conductances of memristor-based ANNs. To investigate line resistance, nodal analysis
was employed. Simultaneous linear equations were set up using Ohm’s law and Kirchhoff’s
current law and solved in sparse matrix representation using the Python library scipy.
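The following is a minimal sketch of nodal analysis of a crossbar with scipy, not the code used in this work. The circuit topology is our assumption: every crosspoint has one node on the word line and one on the bit line, neighbouring nodes are joined by segments of equal resistance, inputs are applied at the left end of the word lines and bit lines are grounded at the bottom. All names are illustrative.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.linalg import spsolve


def crossbar_currents(G, v_in, r_line):
    """Output currents of an m x n crossbar with interconnect resistance r_line (ohms)."""
    m, n = G.shape
    g_line = 1.0 / r_line
    word = np.arange(m * n).reshape(m, n)      # indices of word-line nodes
    bit = word + m * n                         # indices of bit-line nodes
    rows, cols, vals = [], [], []

    def connect(a, b, g):
        """Stamp conductances g between node arrays a and b (element-wise)."""
        a, b = np.ravel(a), np.ravel(b)
        g = np.broadcast_to(np.ravel(g), a.shape)
        rows.extend([a, b, a, b])
        cols.extend([a, b, b, a])
        vals.extend([g, g, -g, -g])

    def connect_fixed(a, g):
        """Stamp conductance g between nodes a and a terminal of fixed potential."""
        a = np.ravel(a)
        rows.append(a)
        cols.append(a)
        vals.append(np.full(a.shape, g))

    connect(word, bit, G)                          # memristors at the crosspoints
    connect(word[:, :-1], word[:, 1:], g_line)     # segments along the word lines
    connect(bit[:-1, :], bit[1:, :], g_line)       # segments along the bit lines
    connect_fixed(word[:, 0], g_line)              # inputs drive the left end of word lines
    connect_fixed(bit[-1, :], g_line)              # bit lines are grounded at the bottom

    size = 2 * m * n
    A = coo_matrix((np.concatenate(vals), (np.concatenate(rows), np.concatenate(cols))),
                   shape=(size, size)).tocsc()     # duplicate entries are summed
    rhs = np.zeros(size)
    rhs[word[:, 0]] = g_line * np.asarray(v_in, dtype=float)  # Kirchhoff's current law
    v = spsolve(A, rhs)
    return v[bit[-1, :]] * g_line                  # Ohm's law in the grounded segments


rng = np.random.default_rng(0)
G = rng.uniform(1e-4, 1e-3, size=(4, 3))           # hypothetical conductances (siemens)
currents = crossbar_currents(G, np.ones(4), r_line=0.5)
```

For a single device the result reduces to the series circuit V / (2r + 1/G), which serves as a quick check of the stamping.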
After simulating memristor non-idealities, committees of different ANNs were composed.
Committees used EA, i.e. the outputs of individual networks in a committee were averaged
to produce a single output vector. In EA, the output vectors of individual networks can
simply be added together (if the weightings of different networks are the same, as we assume
in the main text); the label corresponding to the entry with the highest value would be
the prediction of the committee. This addition can be performed either in software, or, if
the activation function of the last neuronal layer can be implemented physically, it can be
performed by adding corresponding currents produced by the circuitry of this activation
function.
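In software, EA with equal weightings amounts to a sum followed by an argmax, as in the sketch below. The function name and the toy output vectors are ours, chosen only to illustrate the rule.

```python
import numpy as np


def committee_predict(outputs):
    """outputs: (n_networks, n_samples, n_classes) array of network output vectors."""
    combined = np.sum(outputs, axis=0)     # equal weightings: the sum ranks like the mean
    return np.argmax(combined, axis=1)     # label of the entry with the highest value


# Three networks, two samples, two classes (toy values).
outputs = np.array([
    [[0.60, 0.40], [0.90, 0.10]],
    [[0.10, 0.90], [0.40, 0.60]],
    [[0.45, 0.55], [0.45, 0.55]],
])
labels = committee_predict(outputs)
```

For the second sample, two of the three networks individually favour class 1, yet the committee predicts class 0 because the first network is much more confident; EA therefore differs from a simple majority vote.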
In the simulations, neural networks that go into a committee were chosen randomly.
This was done to reflect the most convenient strategy when manufacturing such systems:
because one does not need to selectively choose the networks, manufactured crossbars can be
easily programmed without the need to replace them if they perform poorly when working
individually (unless their effect is so detrimental that they have to be ignored, which can
be made possible with this technique). Besides, devices might change over time, thus these
simulations, which show what happens when one does not selectively choose the networks,
are valuable to investigate conditions where it is not possible to replace the networks.
In the simulations, 25 base networks were used (each having a different set of weights) for
each of the architectures. Then all of their weights were mapped onto pairs of conductances
using HRS/LRS values extracted from experiments. Finally, to reflect the effect of each of
the non-idealities, all networks were disturbed multiple times. In each disturbance iteration,
multiple combinations of networks were chosen and their performance as a committee of
certain size was evaluated. In total, for most simulations, 10,000 data points were recorded
for a committee of every size; these data captured the variations of base networks, their
combinations and different disturbance iterations. Only simulations involving line resistance
or numerical optimisation of weights had fewer data points for some committee sizes (due
to long simulation times).
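The random composition of committees within one disturbance iteration can be sketched as follows. This is an illustration with names of our choosing; it assumes the outputs of the disturbed networks on the test set have already been computed and stored in one array, and the toy data in the example are not simulation results.

```python
import numpy as np


def committee_accuracies(outputs, labels, size, n_draws, rng):
    """Accuracies of randomly composed committees of a given size.

    outputs: (n_networks, n_samples, n_classes) outputs of the disturbed networks.
    labels:  (n_samples,) true labels.
    """
    n_networks = outputs.shape[0]
    accuracies = np.empty(n_draws)
    for d in range(n_draws):
        # Networks are picked at random, without selecting them by individual performance.
        members = rng.choice(n_networks, size=size, replace=False)
        predictions = outputs[members].sum(axis=0).argmax(axis=1)
        accuracies[d] = np.mean(predictions == labels)
    return accuracies


# Toy example: 5 networks that all output the correct one-hot vector.
rng = np.random.default_rng(0)
labels = np.array([0, 2, 1, 1])
outputs = np.tile(np.eye(3)[labels], (5, 1, 1))
acc = committee_accuracies(outputs, labels, size=3, n_draws=10, rng=rng)
```

Repeating this for every disturbance iteration and pooling the results gives the set of data points from which a median accuracy per committee size can be computed.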
DATA AVAILABILITY
The data that support the findings of this study are available from
the corresponding author upon reasonable request.
AUTHOR CONTRIBUTIONS
A.M. and D.J. conceived the idea and designed the study. A.M., P.F. and Z.C. performed
the experimental measurements. D.J. performed the simulations and analysed the
experimental and simulation results. C.L. and Q.X. provided the experimental data of the
programming of a Ta/HfO2 1T1R RRAM array. A.M., W.D.Z. and A.J.K. supervised the
research. D.J. wrote the initial manuscript. All authors contributed to the discussions of
the results and improved the text.
COMPETING INTERESTS STATEMENT
The authors declare that the research was conducted in the absence of any commercial
or financial relationships that could be construed as a potential conflict of interest.
FUNDING
A.M. acknowledges funding from the Royal Academy of Engineering under the Research
Fellowship scheme, A.J.K. acknowledges funding from the Engineering and Physical
Sciences Research Council (EP/P013503/1) and the Leverhulme Trust (RPG-2016-135),
W.D.Z. acknowledges funding from the Engineering and Physical Sciences Research Council
(EP/S000259/1).
[1] E. Strubell, A. Ganesh, and A. McCallum, “Energy and policy considerations for deep learning
in NLP,” arXiv preprint arXiv:1906.02243, 2019.
[2] S. Han, H. Mao, and W. J. Dally, “Deep compression: Compressing deep neural networks with
pruning, trained quantization and Huffman coding,” in International Conference on Learning
Representations, 2016, San Juan (Puerto Rico), arXiv preprint arXiv:1510.00149.
[3] C. Li, Z. Wang, M. Rao, D. Belkin, W. Song, H. Jiang, P. Yan, Y. Li, P. Lin, M. Hu, N. Ge,
J. P. Strachan, M. Barnell, Q. Wu, R. S. Williams, J. J. Yang, and Q. Xia, “Long short-term
memory networks in memristor crossbar arrays,” Nature Machine Intelligence, vol. 1, no. 1,
pp. 49–57, 2019, doi: 10.1038/s42256-018-0001-4.
[4] Z. Wang, C. Li, W. Song, M. Rao, D. Belkin, Y. Li, P. Yan, H. Jiang, P. Lin, M. Hu, J. P.
Strachan, N. Ge, M. Barnell, Q. Wu, A. G. Barto, Q. Qiu, R. S. Williams, Q. Xia, and J. J.
Yang, “Reinforcement learning with analogue memristor arrays,” Nature Electronics, vol. 2,
no. 3, p. 115, 2019, doi: 10.1038/s41928-019-0221-6.
[5] Z. Sun, G. Pedretti, E. Ambrosi, A. Bricalli, W. Wang, and D. Ielmini, “Solving matrix
equations in one step with cross-point resistive arrays,” Proceedings of the National Academy
of Sciences, vol. 116, no. 10, pp. 4123–4128, 2019, doi: 10.1073/pnas.1815682116.
[6] S. R. Nandakumar, M. Le Gallo, I. Boybat, B. Rajendran, A. Sebastian, and E. Eleftheriou,
“A phase-change memory model for neuromorphic computing,” Journal of Applied Physics,
vol. 124, no. 15, p. 152135, 2018, doi: 10.1063/1.5042408.
[7] S. Ambrogio, P. Narayanan, H. Tsai, R. M. Shelby, I. Boybat, C. D. Nolfo, S. Sidler, M. Giordano,
M. Bodini, N. C. P. Farinha, B. Killeen, C. Cheng, Y. Jaoudi, and G. W. Burr,
“Equivalent-accuracy accelerated neural-network training using analogue memory,” Nature,
vol. 558, no. 7708, pp. 60–67, 2018, doi: 10.1038/s41586-018-0180-5.
[8] S. Yu, Z. Li, P. Y. Chen, H. Wu, B. Gao, D. Wang, W. Wu, and H. Qian, “Binary neural
network with 16 Mb RRAM macro chip for classification and online training,” in International
Electron Devices Meeting. IEEE, 2016, San Francisco (United States), doi:
10.1109/IEDM.2016.7838429.
[9] J. Woo, K. Moon, J. Song, S. Lee, M. Kwak, J. Park, and H. Hwang, “Improved synaptic
behavior under identical pulses using AlOx/HfO2 bilayer RRAM array for neuromorphic
systems,” IEEE Electron Device Letters, vol. 37, no. 8, pp. 994–997, 2016, doi:
10.1109/LED.2016.2582859.
[10] M. Prezioso, F. Merrikh-Bayat, B. D. Hoskins, G. C. Adam, K. K. Likharev, and D. B.
Strukov, “Training and operation of an integrated neuromorphic network based on metal-oxide
memristors,” Nature, vol. 521, no. 7550, pp. 61–64, 2015, doi: 10.1038/nature14441.
[11] C. Li, D. Belkin, Y. Li, P. Yan, M. Hu, N. Ge, H. Jiang, E. Montgomery, P. Lin, Z. Wang,
W. Song, J. P. Strachan, M. Barnell, Q. Wu, R. S. Williams, J. J. Yang, and Q. Xia, “Efficient
and self-adaptive in-situ learning in multilayer memristor neural networks,” Nature
Communications, vol. 9, no. 1, p. 2385, 2018, doi: 10.1038/s41467-018-04484-2.
[12] A. Chen and M. R. Lin, “Variability of resistive switching memories and its impact on crossbar
array performance,” in 2011 International Reliability Physics Symposium. IEEE, 2011,
Monterey (United States), doi: 10.1109/IRPS.2011.5784590.
[13] J. Kang, Z. Yu, L. Wu, Y. Fang, Z. Wang, Y. Cai, Z. Ji, J. Zhang, R. Wang, and Y. Yang,
“Time-dependent variability in RRAM-based analog neuromorphic system for pattern recognition,”
in International Electron Devices Meeting. IEEE, 2017, San Francisco (United States),
doi: 10.1109/IEDM.2017.8268340.
[14] L. Xia, W. Huangfu, T. Tang, X. Yin, K. Chakrabarty, Y. Xie, Y. Wang, and H. Yang,
“Stuck-at fault tolerance in RRAM computing systems,” IEEE Journal on Emerging and
Selected Topics in Circuits and Systems, vol. 8, no. 1, pp. 102–115, 2017, doi:
10.1109/JETCAS.2017.2776980.
[15] C. Li, M. Hu, Y. Li, H. Jiang, N. Ge, E. Montgomery, J. Zhang, W. Song, N. Dávila, C. E.
Graves, Z. Li, J. P. Strachan, P. Lin, Z. Wang, M. Barnell, Q. Wu, S. Williams, J. Yang,
and Q. Xia, “Analogue signal and image processing with large memristor crossbars,” Nature
Electronics, vol. 1, no. 1, pp. 52–59, 2018, doi: 10.1038/s41928-017-0002-z.
[16] M. Le Gallo, A. Sebastian, R. Mathis, M. Manica, H. Giefers, T. Tuma, C. Bekas, A. Curioni,
and E. Eleftheriou, “Mixed-precision in-memory computing,” Nature Electronics, vol. 1, no. 4,
p. 246, 2018, doi: 10.1038/s41928-018-0054-8.
[17] M. Hu, J. P. Strachan, Z. Li, and S. R. William, “Dot-product engine as computing memory
to accelerate machine learning algorithms,” in 17th International Symposium on Quality
Electronic Design, 2016, Santa Clara (United States), doi: 10.1109/ISQED.2016.7479230.
[18] Q. Xia and J. J. Yang, “Memristive crossbar arrays for brain-inspired computing,” Nature
Materials, vol. 18, no. 4, p. 309, 2019, doi: 10.1038/s41563-019-0291-x.
[19] Y. LeCun, C. Cortes, and C. J. C. Burges, “The MNIST database of handwritten digits,”
2010. [Online]. Available: http://yann.lecun.com/exdb/mnist
[20] A. Mehonic, D. Joksas, W. H. Ng, M. Buckwell, and A. J. Kenyon, “Simulation of inference
accuracy using realistic RRAM devices,” Frontiers in Neuroscience, vol. 13, p. 593, 2019, doi:
10.3389/fnins.2019.00593.
[21] M. P. Perrone and L. N. Cooper, “When networks disagree: Ensemble methods for hybrid
neural networks,” in Artificial Neural Networks for Speech and Vision. Chapman and Hall,
1993, pp. 126–142.
[22] S. Hashem and B. Schmeiser, “Improving model accuracy using optimal linear combinations of
trained neural networks,” IEEE Transactions on Neural Networks, vol. 6, no. 3, pp. 792–794,
1995, doi: 10.1109/72.377990.
[23] B. Li, L. Xia, P. Gu, Y. Wang, and H. Yang, “Merging the interface: Power, area and accuracy
co-optimization for RRAM crossbar-based mixed-signal computing system,” in Proceedings of
the 52nd Annual Design Automation Conference, 2015, San Francisco (United States), doi:
10.1145/2744769.2744870.
[24] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional
neural networks,” in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105,
Lake Tahoe (United States), doi: 10.1145/3065386.
[25] Z. Wang, C. Li, P. Lin, M. Rao, Y. Nie, W. Song, Q. Qiu, Y. Li, P. Yan, J. P. Strachan,
N. Ge, N. McDonald, Q. Wu, M. Hu, H. Wu, R. S. Williams, Q. Xia, and J. J. Yang, “In situ
training of feed-forward and recurrent convolutional memristor networks,” Nature Machine
Intelligence, vol. 1, no. 9, pp. 434–442, 2019, doi: 10.1038/s42256-019-0089-1.
[26] H. Jiang, L. Han, P. Lin, Z. Wang, M. H. Jang, Q. Wu, M. Barnell, J. J. Yang, H. L. Xin, and
Q. Xia, “Sub-10 nm Ta channel responsible for superior performance of a HfO2 memristor,”
Scientific Reports, vol. 6, p. 28525, 2016, doi: 10.1038/srep28525.
[27] G. W. Burr, R. M. Shelby, S. Sidler, C. Di Nolfo, J. Jang, I. Boybat, R. S. Shenoy,
P. Narayanan, K. Virwani, E. U. Giacometti, B. N. Kurdi, and H. Hwang, “Experimental
demonstration and tolerancing of a large-scale neural network (165 000 synapses) using
phase-change memory as the synaptic weight element,” IEEE Transactions on Electron Devices,
vol. 62, no. 11, pp. 3498–3507, 2015, doi: 10.1109/TED.2015.2439635.
[28] Y. Fan, L. Zhang, D. Crotti, T. Witters, M. Jurczak, and B. Govoreanu, “Direct evidence
of the overshoot suppression in Ta2O5-based resistive switching memory with an integrated
access resistor,” IEEE Electron Device Letters, vol. 36, no. 10, pp. 1027–1029, 2015, doi:
10.1109/LED.2015.2470081.
[29] B. Govoreanu, D. Crotti, S. Subhechha, L. Zhang, Y. Chen, S. Clima, V. Paraschiv, H. Hody,
C. Adelmann, M. Popovici, O. Richard, and M. Jurczak, “A-VMCO: A novel forming-free,
self-rectifying, analog memory cell with low-current operation, nonfilamentary switching and
excellent variability,” in Symposium on VLSI Technology, 2015, Kyoto (Japan), doi:
10.1109/VLSIT.2015.7223717.
[30] Z. Chai, W. Zhang, P. Freitas, F. Hatem, J. F. Zhang, J. Marsland, B. Govoreanu, L. Goux,
G. S. Kar, S. Hall, P. Chalker, and J. Robertson, “The over-reset phenomenon in Ta2O5
RRAM device investigated by the RTN-based defect probing technique,” IEEE Electron Device
Letters, vol. 39, no. 7, pp. 955–958, 2018, doi: 10.1109/LED.2018.2833149.
[31] C. Sung, S. Lim, H. Kim, T. Kim, K. Moon, J. Song, J.-J. Kim, and H. Hwang, “Effect
of conductance linearity and multi-level cell characteristics of TaOx-based synapse device on
pattern recognition accuracy of neuromorphic system,” Nanotechnology, vol. 29, no. 11, p.
115203, 2018, doi: 10.1088/1361-6528/aaa733.
[32] Y. Fang, Z. Yu, Z. Wang, T. Zhang, Y. Yang, Y. Cai, and R. Huang, “Improvement of HfOx-based
RRAM device variation by inserting ALD TiN buffer layer,” IEEE Electron Device
Letters, vol. 39, no. 6, pp. 819–822, 2018, doi: 10.1109/LED.2018.2831698.
[33] B. Govoreanu, A. Redolfi, L. Zhang, C. Adelmann, M. Popovici, S. Clima, H. Hody,
V. Paraschiv, I. Radu, A. Franquet, J. C. Liu, J. Swerts, O. Richard, H. Bender, L. Altimime,
and M. Jurczak, “Vacancy-modulated conductive oxide resistive RAM (VMCO-RRAM): An
area-scalable switching current, self-compliant, highly nonlinear and wide on/off-window
resistive switching cell,” in International Electron Devices Meeting. IEEE, 2013, Washington
(United States), doi: 10.1109/IEDM.2013.6724599.
[34] A. J. Kenyon, M. S. Munde, W. H. Ng, M. Buckwell, D. Joksas, and A. Mehonic, “The
interplay between structure and function in redox-based resistance switching,” Faraday
Discussions, vol. 213, pp. 151–163, 2019, doi: 10.1039/C8FD00118A.
[35] W. Wu, H. Wu, B. Gao, P. Yao, X. Zhang, X. Peng, S. Yu, and H. Qian, “A methodology
to improve linearity of analog RRAM for neuromorphic computing,” in Symposium on VLSI
Technology. IEEE, 2018, Honolulu (United States), doi: 10.1109/VLSIT.2018.8510690.
[36] Z. Chai, P. Freitas, W. Zhang, F. Hatem, J. F. Zhang, J. Marsland, B. Govoreanu, L. Goux,
and G. S. Kar, “Impact of RTN on pattern recognition accuracy of RRAM-based synaptic
neural network,” IEEE Electron Device Letters, vol. 39, no. 11, pp. 1652–1655, 2018, doi:
10.1109/LED.2018.2869072.
FIGURES
[Figure 1 image: schematic with panels A, B and C. A) Outputs y1, y2, ..., yn of n networks
presented with MNIST inputs are combined by AVERAGING into a single output y. B) Identical
digital networks N are disturbed into a committee of non-ideal memristive networks; function:
mitigating the effects of non-idealities. C) Different digital networks N1, N2, ..., Nn are
disturbed into a committee of non-ideal memristive networks; functions: mitigating the effects
of non-idealities, combining the knowledge of digital networks.]
Figure 1. Using multiple neural networks to improve inference accuracy. A) The principle of EA.
B) Using identical digital networks when implementing committees of memristive neural networks
only helps to deal with the damage to the networks caused by the non-idealities. C) Using different
digital networks when implementing committees of memristive neural networks both helps to deal
with the damage to the networks caused by the non-idealities and makes it possible to combine the
knowledge acquired by individual digital networks.
[Figure 2 image: panel A, violin plots; x-axis: Pulse number (#), 0 to 100; y-axis:
Conductance (mS), 0.0 to 1.2. Panels B to F, polar diagrams with SET and RESET halves;
radial axis: conductance, 0.2 to 1.2 mS.]
Figure 2. Experimental data of Ta/HfO2 RRAM crossbar array of shape 128×64. A) Modulation
of devices’ conductance over 11 SET cycles, each consisting of 100 potentiating pulses. Violin
plots of gradual conductance changes are shown for all Ta/HfO2 devices, with dots representing
median conductance after a certain number of pulses. 100 points were used for Gaussian kernel
density estimation. All violin plots have their maximum widths normalised. B-F) Examples of
devices with their conductance (in mS) B) spanning the full range, C) spanning part of the full
range, D) exhibiting cycle-to-cycle variability, E) stuck at high values, F) stuck at low values.
These diagrams show conductance of five devices from Ta/HfO2 crossbar array over 11 SET and
RESET cycles. The radial component represents the conductance, while the angular component
represents the number of applied pulses. The first SET cycle starts at the top of each of the
diagrams. The conductance (in blue) over 100 SET pulses is displayed in a clockwise fashion
across the right half of each of the diagrams. Following that, conductance (in orange) over 100
RESET pulses (starting at the bottom) is displayed across the left half of each of the diagrams,
after which the next cycle is displayed. A Cartesian version of these plots is shown in
Supplementary Figure S9.
V16
I1
+ − + − + − + −
I2 I3 I4 I47 I48 I49 I50 I64
V1
V17
V128
pairs of neighbouring bit lines implement
positive and negative weights
x1
y1
y2
y24
y25
x2
x113
x114
x782
x783
x784
x785
A
B
  ~1
/7 of
 weigh
ts mapped onto 1/7 crossbars
−20
−15
−10
−5
0
Average change in current (%
)
Output number (#)
5 10 15 20 25 30 40 4535 50
smaller decreases in current
near the inputs
larger decreases in current
further from the inputs
Figure 3. Theoretical implementation of a synaptic layer of shape 785×25 using crossbars of shape
128×64. A) Mapping the first subset of weights onto one of the seven crossbars used to implement
the whole synaptic layer. Positive weights and negative weights are mapped onto memristors in
different bit lines. B) Heatmap of average changes in output currents due to line resistance (in all
seven Ta/HfO2 crossbars) without and with a scheme that maps certain inputs onto certain word
lines depending on expected average intensities of those inputs. For this particular simulation, it
was assumed that Ta/HfO2 devices can be programmed perfectly.
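The tiling arithmetic behind this mapping can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names `crossbars_needed` and `to_differential_pair` and the conductance bounds `g_min` and `g_max` are hypothetical, and the linear weight-to-conductance rule is one common choice that is assumed here, not taken from the caption.

```python
import math


def crossbars_needed(n_inputs, n_outputs, rows=128, cols=64):
    """Number of crossbars needed when each weight uses two bit lines."""
    bit_lines = 2 * n_outputs               # one + and one - column per output
    row_tiles = math.ceil(n_inputs / rows)  # inputs are split across word lines
    col_tiles = math.ceil(bit_lines / cols)
    return row_tiles * col_tiles


def to_differential_pair(w, w_max, g_min, g_max):
    """Map a weight w in [-w_max, w_max] to conductances (G_plus, G_minus).

    Assumed scheme: the magnitude is encoded on one device while the
    other stays at g_min, so G_plus - G_minus is proportional to w.
    """
    g = g_min + (g_max - g_min) * abs(w) / w_max
    return (g, g_min) if w >= 0 else (g_min, g)


# A 785x25 synaptic layer on 128x64 crossbars: 7 row tiles, 1 column tile.
print(crossbars_needed(785, 25))
print(to_differential_pair(-1.0, w_max=1.0, g_min=1.0, g_max=3.0))
```

With 25 outputs, the 50 bit lines required fit within the 64 available columns, so the number of crossbars is set entirely by the 785 inputs.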
[Figure 4 graphic omitted. Box plots of accuracy (%), axis from 80 to 98, by network type: ideal ANN (software baseline); memristive ANN (without non-idealities); memristive ANN (with non-idealities); committee machines of 2, 3, 4 and 5 memristive ANNs (with non-idealities).]
Figure 4. Accuracy achieved by individual networks and their committees when faulty devices,
D2D variability data and line resistance of the Ta/HfO2 crossbar are taken into account. The maximum
whisker length is set to 1.5× IQR.
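How a committee turns several disturbed networks into a single prediction can be sketched as below. The combination rule is not stated in this caption; the sketch assumes that the output vectors of the member networks are averaged and the largest averaged output is taken as the predicted class. The function name `committee_predict` and the numbers used are hypothetical.

```python
def committee_predict(member_outputs):
    """Return the class index chosen by a committee.

    member_outputs: list of output vectors, one per member network,
    all of the same length. Assumption: members are combined by
    averaging their outputs element-wise.
    """
    n = len(member_outputs)
    averaged = [sum(values) / n for values in zip(*member_outputs)]
    return max(range(len(averaged)), key=averaged.__getitem__)


# Three members; the first one alone would pick class 0, but the
# committee as a whole picks class 1.
outputs = [
    [0.6, 0.4, 0.0],
    [0.1, 0.5, 0.4],
    [0.2, 0.5, 0.3],
]
print(committee_predict(outputs))
```

Averaging lets errors that are uncorrelated between members partly cancel, which is the intuition behind comparing committees of 2 to 5 networks against individual networks.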
[Figure 5 graphic omitted. Cumulative probability (%), from 2 to 98, against absolute relative error of current (%), logarithmic axis from 10^0 to 10^2; legend: data points, lognormal fits; an annotation marks the direction of higher resistance states.]
Figure 5. Cumulative probability plots of RTN-induced relative current deviations for all 8
resistance states of a Ta2O5 RRAM device. Lognormal fits are shown for each resistance state.
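One way such lognormal fits could be used in a simulation is sketched below: fit the parameters of the lognormal distribution to measured absolute relative errors (in %), then draw a random deviation for each simulated current. This is an illustration under stated assumptions, not the authors' procedure. The names `fit_lognormal` and `disturb_current` are hypothetical, the fit uses the mean and standard deviation of the log-values, and the sign of each deviation is assumed to be equally likely positive or negative because the plotted errors are absolute values.

```python
import math
import random
import statistics


def fit_lognormal(deviations_percent):
    """Estimate (mu, sigma) of a lognormal from positive samples."""
    logs = [math.log(d) for d in deviations_percent]
    return statistics.fmean(logs), statistics.stdev(logs)


def disturb_current(current, mu, sigma, rng):
    """Apply one random RTN-like relative deviation to a current.

    Assumption: the sign of the deviation is chosen at random.
    """
    rel_error = rng.lognormvariate(mu, sigma) / 100.0  # percent to fraction
    sign = rng.choice((-1.0, 1.0))
    return current * (1.0 + sign * rel_error)


# Hypothetical deviations (in %) for one resistance state.
mu, sigma = fit_lognormal([0.8, 1.5, 2.1, 3.0, 4.4])
rng = random.Random(0)  # seeded for repeatability
print(disturb_current(1.0e-6, mu, sigma, rng))
```

In practice a separate (mu, sigma) pair would be kept for each of the 8 resistance states, since the fits differ between states.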
[Figure 6 graphic omitted. Box plots of accuracy (%), axis from 91 to 97, by network type: ideal ANN (software baseline); memristive ANN (without non-idealities); memristive ANN (with non-idealities); committee machines of 2, 3, 4 and 5 memristive ANNs (with non-idealities).]
Figure 6. Accuracy achieved by individual networks and their committees when RTN data of
a Ta2O5 device are taken into account. Additionally, interconnect resistance of 0.35 Ω and 0.32 Ω in the word and bit lines, respectively (from the Ta/HfO2 array), was used to include line
resistance effects. The maximum whisker length is set to 1.5× IQR.
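The 1.5× IQR whisker rule used in these box plots can be made concrete with a short sketch. It assumes the common convention that each whisker extends to the most extreme data point lying within 1.5× IQR of the nearest quartile; quartile definitions vary between plotting tools, and the "inclusive" method used here is only one of them. The function name `whisker_limits` and the sample accuracies are hypothetical.

```python
import statistics


def whisker_limits(values):
    """Return (lower, upper) whisker ends under the 1.5 x IQR rule."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme points inside the fences;
    # anything beyond them would be drawn as an outlier.
    lower = min(v for v in values if v >= low_fence)
    upper = max(v for v in values if v <= high_fence)
    return lower, upper


# Hypothetical accuracies (%): 90 falls below the lower fence.
print(whisker_limits([90, 94, 95, 96, 97]))
```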
[Figure 7 graphic omitted. Cumulative probability (%), from 2 to 98, against absolute relative error of current (%), logarithmic axis from 10^0 to 10^1; legend: data points, lognormal fits; an annotation marks the direction of higher resistance states.]
Figure 7. Cumulative probability plots of RTN-induced relative current deviations for all 8
resistance states of an aVMCO RRAM device. Lognormal fits are shown for each resistance state.
[Figure 8 graphic omitted. Box plots of accuracy (%), axis from 91 to 98, by network type: ideal ANN (software baseline); memristive ANN (without non-idealities); memristive ANN (with non-idealities); committee machines of 2, 3, 4 and 5 memristive ANNs (with non-idealities).]
Figure 8. Accuracy achieved by individual networks and their committees when RTN data of
an aVMCO device are taken into account. Additionally, interconnect resistance of 0.35 Ω and 0.32 Ω in the word and bit lines, respectively (from the Ta/HfO2 array), was used to include line
resistance effects. The maximum whisker length is set to 1.5× IQR.
[Figure 9 graphic omitted. Median accuracy (%), axis from 90 to 98, against total number of memristors, logarithmic axis with ticks at 10^5 and 10^6; legend: individual networks, committees of 2 networks, committees of 3 networks, committees of 4 networks, committees of 5 networks.]
Figure 9. Median accuracy achieved by individual one-hidden-layer memristor-based networks
and their committees, when controlled for total number of memristors required. The networks
contained 25, 50, 100 or 200 hidden neurons and were disturbed using faulty devices and D2D
variability data from the Ta/HfO2 crossbar.
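The quantity on the horizontal axis can be estimated with simple arithmetic. The sketch below is an illustration only: it assumes 784 inputs plus a bias unit (consistent with the 785×25 layer in Figure 3), a bias unit in the hidden layer, 10 output classes, and two memristors per weight (one for the positive and one for the negative bit line). The function name `memristor_count` is hypothetical, and the number of outputs is an assumption not stated in this caption.

```python
def memristor_count(hidden, members=1, inputs=784, outputs=10):
    """Total memristors for a committee of one-hidden-layer networks.

    Assumptions: one bias unit per layer and two memristors per weight.
    """
    weights = (inputs + 1) * hidden + (hidden + 1) * outputs
    return 2 * weights * members


# Compare a single large network with committees of small ones.
for hidden in (25, 50, 100, 200):
    print(hidden, memristor_count(hidden))
print("5 x 25 hidden:", memristor_count(25, members=5))
```

Under these assumptions a committee of five 25-neuron networks uses fewer memristors than a single 200-neuron network, which is the kind of comparison this figure controls for.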
TABLES
First author (year) | Non-ideality | Device type | Proposed solution
C. Sung (2018) [31] | Current/voltage non-linearity | TaOx RRAM | Hot-forming step is adopted
C. Li (2018) [15] | Current/voltage non-linearity | Ta/HfO2 RRAM | 1T1R architecture is adopted
Y. Fang (2018) [32] | Device-to-device variability | HfOx RRAM | Ultra-thin ALD-TiN buffer layer is introduced
B. Govoreanu (2013) [33] | Device-to-device variability | Al2O3/TiO2 (VMCO) RRAM | Non-filamentary RRAM is adopted
A. J. Kenyon (2019) [34] | Device-to-device variability | SiOx RRAM | The roughness of bottom electrodes is increased
L. Xia (2017) [14] | Faulty devices | - | A modified mapping algorithm and redundancy schemes are used
S. Ambrogio (2018) [7] | Limited dynamic range | PCM | Two pairs of conductance of varying significance for every synaptic weight are used
M. Hu (2016) [17] | Line resistance | - | Advanced mapping algorithms are used to compensate for line resistance effects
W. Wu (2018) [35] | Programming non-linearity | HfOx RRAM | Electro-thermal modulation layer is deposited on the switching layer
J. Woo (2016) [9] | Programming non-linearity | HfO2 RRAM | Bilayer structure is adopted
S. Ambrogio (2018) [7] | Programming non-linearity | PCM | PCM devices are used together with CMOS transistors
Z. Chai (2018) [36] | Random telegraph noise | TiO2/a-Si (aVMCO) RRAM | Non-filamentary RRAM is adopted
Table I. Examples of past efforts at dealing with non-idealities of memristive devices and their
systems.
