Enhancing Fault Tolerance of Neural Networks for Security-Critical
  Applications by Alam, Manaar et al.
Manaar Alam¹, Arnab Bag¹, Debapriya Basu Roy¹, Dirmanto Jap², Jakub Breier³, Shivam Bhasin², and Debdeep Mukhopadhyay¹
¹ Indian Institute of Technology Kharagpur, India
{alam.manaar, amiarnabbolchi, dbroy24, debdeep.mukhopadhyay}@gmail.com
² Nanyang Technological University, Singapore
{djap, sbhasin}@ntu.edu.sg
³ Underwriters Laboratories, Singapore
jbreier@jbreier.com
Abstract. Neural Networks (NNs) have recently emerged as the backbone of several sensitive applications in domains such as automotive systems, medical imaging, and security. NNs inherently offer Partial Fault Tolerance (PFT) in their architecture; however, the biased PFT of NNs can lead to severe consequences in cryptographic and other security-critical scenarios. In this paper, we propose a revised implementation that significantly enhances the PFT property of NNs, supported by a detailed mathematical analysis. We evaluate the revised NN in both software and FPGA implementations of a cryptographic primitive, the AES SBox. The results show that the PFT of NNs can be significantly increased with the proposed methodology.
Keywords: Fault Tolerance · Neural Network · FPGA Implementation.
arXiv:1902.04560v1 [cs.LG] 5 Feb 2019
1 Introduction
Research on Neural Networks (NNs) has surged in both industry and academia in recent years because of their compelling performance in a wide variety of domains, from image recognition and speech processing to sensitive applications such as medical diagnosis and security. Beyond competitive performance, one of the most intrinsic features of NNs is that they exhibit “some degree” of robustness: they can continue to function correctly even when some of their parameters are faulty, which promotes their popularity in diverse applications. Fault tolerance of NNs has been studied extensively [10,3,6,8,4,5,7,11,12,9]. It has been demonstrated that a typical NN architecture, in which each neuron computes a weighted sum of inputs from other neurons, cannot achieve complete fault tolerance [7,12]; rather, it achieves Partial Fault Tolerance (PFT). The fault tolerance of NNs can be increased significantly by injecting noise during training [4,5], by replication and majority voting during prediction [8,7], or by increasing the size of the network [10,6,11]. Fault tolerance of cryptographic algorithms has emerged as an integral property with the advancement of fault attacks. As an example, the Advanced Encryption Standard (AES), which is secure against known theoretical cryptanalysis, can be broken by a single fault injection [1].
[Figure 1: bar chart of % of Cases against Bits Flipped (1 to 8, or correct output). ANN Sbox: 12.18% with one bit flipped, 0% for two to eight bits flipped, 87.82% correct. Table Look-Up Sbox: 2.6%, 8.5%, 18.5%, 22.2%, 17.8%, 8.6%, 2.5%, and 0.2% for one to eight bits flipped, 19.1% correct.]
Fig. 1: Comparing fault tolerance of the AES SBox implemented as a table look-up against an NN-based SBox
While general applications are prone to unintentional faults, cryptographic applications suffer from targeted faults injected by a powerful adversary. These faults can be injected intentionally by creating a hazardous environment around the system (laser injection, clock and voltage glitches, high-intensity electromagnetic wave injection, etc.). A fault-tolerant architecture for a cryptographic application will thwart such attacks. In our case study, we concentrate on AES, as it is the de facto standard block cipher worldwide. One of the primary components of AES encryption, where the secret key is directly involved in the computation, is the AddRoundKey operation. The non-linearity in the calculation is introduced by the SubBytes operation, using the SBox mapping. To give a snapshot of the problem at hand, we performed a practical fault injection on f = SBox(x ⊕ k), where x is an input byte and k is one byte of the secret key, running on a commercial off-the-shelf microcontroller. Fig. 1 compares the fault tolerance of f implemented using a look-up table and boolean logic against the same function implemented with an NN (8 neurons in the input layer, 2 hidden layers with 400 and 200 neurons, and an output layer with 8 neurons) that maps the 8-bit input to the 8-bit output. Faults were induced by laser fault injection, aiming at control-flow disturbance. The injection time varied randomly, covering the entire computation of f, while the other laser parameters were fixed to guarantee faults. The NN parameters were chosen to achieve 100% accuracy, with no constraints on fault tolerance. It can be observed that even a blindly chosen network resulted in a fault tolerance of 87%.
Motivated by the above result, in this paper we propose to exploit the PFT property of NNs for cryptographic applications. As a proof of concept, we implement the AES SBox preceded by an XOR with one byte of the secret key (f = SBox(x ⊕ k)), hereafter referred to as the f operation, using NNs, which exhibit a significantly higher degree of fault tolerance than the standard designs. We also propose a method to further improve the PFT of the implementation by incorporating several constraints on the NN parameters during training. The implementation overhead of the resulting highly fault-tolerant NN architecture for f is also reported. The implementation results are reported on a Xilinx Artix-7 FPGA (Basys 3 board), which gives us enough flexibility to test settings ranging from low-overhead, less fault-tolerant designs to high-overhead, highly fault-tolerant designs, against a typical microcontroller.
Contribution
The primary contributions of this work are:
1. We present an analytical way to develop a highly fault-tolerant cryptographic primitive, the AES SBox, using an NN with a much higher degree of fault tolerance than the standard implementation. To the best of our knowledge, this is the first work to achieve a highly fault-tolerant AES SBox, which has only a $1.21 \times 10^{-5}\%$ chance of producing a faulty output even after faults are induced in the network parameters.
2. We have implemented the fault-tolerant NN architecture on an FPGA platform with tailored implementation strategies. Because the modified implementation enforces the fault-tolerance property, it incurs extra implementation overhead in both software and hardware.
2 Preliminaries on Neural Network
In this section, we discuss the elementary Neural Network (NN) terminology used throughout this paper. We consider a basic NN architecture with three layers: the Input layer, the Hidden layer, and the Output layer, which contain l, m, and n neurons, respectively. The activation functions for the hidden layer and the output layer are ReLU and Softmax, respectively. The network is trained using the standard Gradient Descent Backpropagation algorithm. The symbols related to the NN used in this paper are defined in Table 1.
The value of the kth neuron at the output layer is given by:
$$y_k = \mathrm{Softmax}\left(\sum_{j=1}^{m} h_j w_{jk}^{(2)} + b_k^{(2)}\right) \tag{1}$$
The value of the jth neuron at the hidden layer is given by:
$$h_j = \mathrm{ReLU}\left(\sum_{i=1}^{l} x_i w_{ij}^{(1)} + b_j^{(1)}\right) \tag{2}$$
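Equations (1) and (2) can be illustrated with a minimal pure-Python sketch. It is not the implementation used in the experiments: the list-of-lists layout (`w1[i][j]`, `w2[j][k]`), the function names, and the toy parameter values are assumptions made only for illustration.

```python
import math

def relu(v):
    """ReLU from Table 1: 0 for non-positive inputs, identity otherwise."""
    return v if v > 0 else 0.0

def softmax(z):
    """Softmax from Table 1; subtracting max(z) only improves numerical stability."""
    top = max(z)
    exps = [math.exp(v - top) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def forward(x, w1, b1, w2, b2):
    """Evaluate Equation (2) for the hidden layer, then Equation (1) for the output layer."""
    m, n = len(b1), len(b2)
    h = [relu(sum(x[i] * w1[i][j] for i in range(len(x))) + b1[j]) for j in range(m)]
    z = [sum(h[j] * w2[j][k] for j in range(m)) + b2[k] for k in range(n)]
    return h, softmax(z)

# Hypothetical toy network with l = 2, m = 3, n = 2 (values chosen for illustration).
x = [1.0, 0.0]
w1 = [[1.0, -1.0, 0.5], [0.0, 2.0, 1.0]]
b1 = [0.0, 0.5, -1.0]
w2 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b2 = [0.0, 0.0]
h, y = forward(x, w1, b1, w2, b2)
print(h, y)
```

In this toy case only the first hidden neuron is active, so the first output receives the larger pre-activation and therefore the larger Softmax value.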
Table 1: Definition of symbols used throughout this paper
| Symbol | Definition |
|---|---|
| $x_i$ | Input to the $i$th neuron in the Input layer |
| $h_j$ | Output of the $j$th neuron in the Hidden layer |
| $y_k$ | Output of the $k$th neuron in the Output layer |
| $w_{ij}^{(1)}$ | Weight of the link connecting the $i$th neuron in the Input layer with the $j$th neuron in the Hidden layer |
| $w_{jk}^{(2)}$ | Weight of the link connecting the $j$th neuron in the Hidden layer with the $k$th neuron in the Output layer |
| $b_j^{(1)}$ | Bias of the $j$th neuron in the Hidden layer |
| $b_k^{(2)}$ | Bias of the $k$th neuron in the Output layer |
| ReLU | Activation function, given by $\mathrm{ReLU}(h_j) = \begin{cases} 0, & \text{if } h_j \le 0 \\ h_j, & \text{otherwise} \end{cases}$ |
| Softmax | Activation function, given by $\mathrm{Softmax}(y_k) = \dfrac{e^{y_k}}{\sum_{k'=1}^{n} e^{y_{k'}}}$ |
| Linear | Activation function, given by $\mathrm{Linear}(y_k) = y_k$ |
3 Fault Tolerant AES S-Box Design
In this section, we use the NN architecture discussed in Section 2 to model the f operation. We consider only those networks that achieve 100% classification accuracy on the training dataset, since a wrong classification results in an incorrect ciphertext. The following subsections investigate the PFT of the modelled NN, with attention to the hardware implementation of such a design.
3.1 Dataset and Network Topology
The NN in our experiment learns the f operation. Hence, the input to the network $NN_k$ for a fixed secret key byte k is a plaintext byte x, and the output is y = SBox(x ⊕ k). We use binary bit patterns for the input x and one-hot encoding for the output y in order to enhance the prediction accuracy of the model. Hence, the input layer contains 8 neurons and the output layer contains 256 neurons. We have experimented with different numbers of hidden layer neurons; the effect of the number of neurons on PFT is discussed in Section 3.3. The NN model trained using the gradient descent algorithm produces floating point values for the weight and bias parameters. To ease the hardware implementation, we convert all learned parameters to integers after training, retaining different bit precisions after the decimal point. The effect of the selected bit precision on PFT is also discussed in Section 3.3. The Softmax activation function in the output layer requires exponentiation, as shown in Table 1. To reduce the implementation complexity, we use the Linear activation function (also shown in Table 1) in the output layer instead of Softmax when implementing the learned model in hardware¹. Hence, during classification in the hardware implementation, Equation (1) becomes
$$y_k = \sum_{j=1}^{m} h_j w_{jk}^{(2)} + b_k^{(2)} \tag{3}$$
The decision on the correct class is taken by applying the ArgMax function to the output layer neurons, which finds the neuron with the maximum value.
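The encoding described in this subsection can be sketched as follows. The S-box is computed from its standard definition (multiplicative inverse in GF(2^8) followed by the affine transform) rather than copied from a table. The MSB-first bit order and all function names are assumptions; the paper does not specify them.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def gf_inv(a):
    """Multiplicative inverse as a^254, with 0 mapped to 0 as in AES."""
    r = 1
    for _ in range(254):
        r = gf_mul(r, a)
    return r if a else 0

def rotl8(v, s):
    return ((v << s) | (v >> (8 - s))) & 0xFF

def sbox(a):
    """AES S-box: field inverse followed by the affine transformation."""
    inv = gf_inv(a)
    return inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63

SBOX = [sbox(a) for a in range(256)]

def to_bits(x):
    """8 binary inputs for byte x (MSB first; the bit order is an assumption)."""
    return [(x >> (7 - i)) & 1 for i in range(8)]

def one_hot(v, n=256):
    return [1 if i == v else 0 for i in range(n)]

def argmax(v):
    return max(range(len(v)), key=v.__getitem__)

def make_dataset(k):
    """All 256 (input bits, one-hot target) pairs of f = SBox(x XOR k) for key byte k."""
    return [(to_bits(x), one_hot(SBOX[x ^ k])) for x in range(256)]

dataset = make_dataset(0x25)
bits, target = dataset[0x25]
print(bits, hex(argmax(target)))
```

Because the whole input space has only 256 points, the training set is also the complete truth table of f, which is why 100% accuracy on it guarantees a correct f operation in the fault-free case.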
3.2 Fault Model
Appropriate selection of a fault model for the adversary is crucial in order to evaluate the fault tolerance of the NN and the practicality of the attack. In all our experiments, we make the following assumptions:
1. We assume that the learning phase is fault-free. Faults can only be injected during the classification phase, not during the training phase.
2. We consider a single-location fault model², i.e., an adversary can induce a fault in only one weight or bias parameter in an individual execution.
3. An adversary can employ any of the possible fault injection methods, such as a single-bit flip, multiple-bit flips, or zero/random values.
We have analyzed the fault tolerance with all possible faulty values for all the weight and bias parameters in our experiments. In the following subsection, we present the degree of PFT of the NN topology with the dataset described in Section 3.1. We also present, with experimental results, the rationale behind the selection of the bit precision after the decimal point used to convert the floating point weight and bias values into integers, and behind the number of neurons in the hidden layer.
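The fault values covered by this model can be enumerated with a short sketch. It assumes each parameter is stored as a signed two's-complement integer of a known width `bits`; the storage format and the helper names are assumptions, not details given in the paper.

```python
def to_signed(u, bits):
    """Interpret an unsigned bit pattern of the given width as two's complement."""
    return u - (1 << bits) if u & (1 << (bits - 1)) else u

def single_bit_flips(value, bits):
    """Values reachable by flipping exactly one bit of the stored parameter."""
    u = value & ((1 << bits) - 1)
    return [to_signed(u ^ (1 << b), bits) for b in range(bits)]

def all_faulty_values(value, bits):
    """Every stored value other than the correct one; this covers single-bit
    flips, multiple-bit flips, zeroing, and random faults at once."""
    u = value & ((1 << bits) - 1)
    return [to_signed(v, bits) for v in range(1 << bits) if v != u]

print(single_bit_flips(3, 4))
print(len(all_faulty_values(3, 4)))
```

Enumerating every other bit pattern is what makes an exhaustive analysis independent of the particular injection method: any fault at a single location must produce one of these values.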
3.3 Fault Tolerance of AES SBox learned with Standard Neural
Network
We have used the Keras library to implement the NN described in Section 3.1. We trained the model to achieve 100% classification accuracy, and varied both the precision used to represent the integer parameters and the number of hidden layer neurons in order to obtain a final model with satisfactory PFT. The % Faults in our experiments is calculated as:
$$\%\,\text{Faults} = \frac{\#\text{Faulty Outputs}}{\#\text{Parameters} \times \#\text{All Possible Faulty Values of Each Parameter} \times \#\text{Inputs}} \times 100$$
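The % Faults metric can be illustrated with an exhaustive single-location fault campaign on a toy integer network. This is a sketch under stated assumptions, not the experimental setup: the 2-2-2 network, its weights, and the 4-bit range of faulty values are hypothetical, and the Linear output with ArgMax follows Equation (3).

```python
def argmax(v):
    return max(range(len(v)), key=v.__getitem__)

def predict(x, w1, b1, w2, b2):
    """Integer forward pass: ReLU hidden layer, Linear output layer, ArgMax decision."""
    h = [max(0, sum(xi * w1[i][j] for i, xi in enumerate(x)) + b1[j])
         for j in range(len(b1))]
    y = [sum(hj * w2[j][k] for j, hj in enumerate(h)) + b2[k]
         for k in range(len(b2))]
    return argmax(y)

def locations(params):
    """Yield (container, position) for every scalar weight or bias parameter."""
    for p in params.values():
        rows = p if isinstance(p[0], list) else [p]
        for row in rows:
            for pos in range(len(row)):
                yield row, pos

def percent_faults(params, inputs, faulty_values):
    """Inject every faulty value at every single location and count wrong outputs."""
    golden = [predict(x, **params) for x in inputs]
    faulty = total = 0
    for row, pos in locations(params):
        original = row[pos]
        for v in faulty_values:
            if v == original:
                continue  # not a fault
            row[pos] = v
            for x, g in zip(inputs, golden):
                total += 1
                faulty += predict(x, **params) != g
        row[pos] = original  # restore the fault-free parameter
    return 100.0 * faulty / total

# Hypothetical toy network and 4-bit signed fault range.
params = {"w1": [[2, -1], [-1, 2]], "b1": [0, 0],
          "w2": [[3, -3], [-3, 3]], "b2": [0, 0]}
inputs = [[0, 0], [0, 1], [1, 0], [1, 1]]
pf = percent_faults(params, inputs, range(-8, 8))
print(f"% Faults = {pf:.2f}")
```

The denominator counts one trial per (parameter, faulty value, input) triple, matching the product in the formula above.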
¹ The motivation for using the linear activation function is that softmax preserves the ordering of the linear outputs, so ArgMax selects the same class in both cases, while the linear function can be implemented with integer parameters and fewer computational resources.
² A fault at a single location is a realistic fault model, as most differential fault attacks work only when a single byte or nibble gets corrupted. Injecting faults at multiple locations often generates a faulty output that cannot be exploited.
[Figure 2: three panels: (a) First Layer Weight, (b) Second Layer Weight, (c) Hidden Layer Bias]
Fig. 2: Effect of Selecting Precision on Fault Tolerance
Fig. 2 presents the % Faults for five different NNs trained with five different secret key bytes when faults are injected in $w_{ij}^{(1)}$ for all i, j (First Layer Weights), $w_{jk}^{(2)}$ for all j, k (Second Layer Weights), and $b_j^{(1)}$ for all j (Hidden Layer Bias)³, considering integer parameters with different precisions after the decimal point of the floating point values. With increasing precision, the % Faults for the First Layer Weights decreases, while for the Second Layer Weights and the Hidden Layer Bias it increases. We use 1-bit precision after the decimal point when converting the floating point values to integers for the rest of the implementation, as it not only provides the lowest overall % Faults of the NN but also simplifies the implementation.
Fig. 3 presents the % Faults for the same five NNs when faults are injected in the same parameters, considering different numbers of hidden layer neurons. The figure shows that the % Faults for all the parameters decreases as the number of neurons increases, which supports the need for a large number of neurons in the hidden layer. We have selected 128 neurons in the hidden layer for all further experiments, as this configuration exhibits significantly high fault tolerance.
³ We do not provide plots for $b_k^{(2)}$ (Output Layer Bias), as we found these parameters to be totally fault tolerant.
[Figure 3: three panels: (a) First Layer Weight, (b) Second Layer Weight, (c) Hidden Layer Bias]
Fig. 3: Effect of Number of Neurons in Hidden Layer on Fault Tolerance
Fig. 4 presents the % Faults value for each parameter in the First Layer and Second Layer Weights of the NN trained with secret key byte 0x25. Most of the parameters in the First Layer Weights are entirely fault resistant, whereas faults in some of the weights can produce faulty outputs. However, for almost all of the parameters in the Second Layer Weights, we can observe faulty outputs with properly chosen faults.
The total number of faulty outputs over all possible combinations of faulty parameters and inputs for the NN with 128 hidden layer neurons is 6978 (147 due to First Layer Weights, 6810 due to Second Layer Weights, and 21 due to Hidden Layer Bias parameters) out of 264,011,776 possibilities. Hence, the PFT of this NN is $2.64 \times 10^{-3}\%$ faults. However, a cryptographic algorithm such as the f operation, as discussed previously, requires more fault tolerance from its implementation. To make the computation more secure, the following subsection presents an analytical way to enforce constraints on the NN parameters that increase the PFT of the model.
[Figure 4: two plots of % Faults per weight parameter. (a) First Layer Weight: Weight Parameters between Input and Hidden Layer, axes Input Layer Neurons (0 to 7) and Hidden Layer Neurons (0 to 120), % Faults scale 0.0 to 0.5. (b) Second Layer Weight: Weight Parameters between Hidden and Output Layer, axes Hidden Layer Neurons (0 to 120) and Output Layer Neurons (0 to 250), % Faults scale 0.00 to 0.25.]
Fig. 4: Effect of each weight parameter on overall faults for the Standard Neural Network Architecture
3.4 Conditions for Implementing Fully Fault-Tolerant Architecture
An adversary can induce faults at any of the learned parameters of the trained
NN model as discussed in Section 3.2. In this subsection, we consider each of the
parameters individually, as shown in Fig. 5, and evaluate the PFT of the NN
architecture with respect to the induced fault.
Case 1: Fault injection in Output Layer Bias
Let an adversary inject a fault that modifies $b_{f_2}^{(2)}$ by an amount $\delta$. Hence, the faulty value is $\bar{b}_{f_2}^{(2)} = b_{f_2}^{(2)} \pm \delta$. As a result of this fault, only the value of neuron $f_2$ at the output layer is affected and all other neurons are unaltered, as shown in Fig. 5a. Let the modified value of $y_{f_2}$ be $\bar{y}_{f_2}$. Hence, using Equation (3),
$$\bar{y}_{f_2} = \sum_{j=1}^{m} h_j w_{jf_2}^{(2)} + \bar{b}_{f_2}^{(2)} = \sum_{j=1}^{m} h_j w_{jf_2}^{(2)} + b_{f_2}^{(2)} \pm \delta = y_{f_2} \pm \delta$$
For a particular input $x^c = (x_1^c, x_2^c, \ldots, x_l^c)$, let the correct class correspond to the output node $c$, i.e., the value of $y_c$ is the maximum selected by the ArgMax function. Since the adversary does not induce any fault in any $w_{ij}^{(1)}$ or $b_j^{(1)}$, the values of all neurons in the hidden layer remain unchanged. We consider the following two scenarios for analyzing the fault tolerance of the network.
(a) $c \neq f_2$, i.e., the fault affects a node other than the correct node: Misclassification happens when $\bar{y}_{f_2} > y_c$, as the ArgMax function then classifies input $x^c$ to class $f_2$ instead of the correct class $c$. Hence, the fault has to increase the value of $y_{f_2}$, i.e., $y_{f_2} + \delta > y_c$. Faults that decrease the value of $y_{f_2}$ have no impact on the NN decision. Hence, the NN is fault tolerant if $y_{f_2} + \delta < y_c \implies \delta < y_c - y_{f_2}$.
(b) $c = f_2$, i.e., the fault affects the correct node: Misclassification in this case happens when $\bar{y}_{f_2} < y_r$ for some neuron $r$ in the output layer. Hence, the fault has to decrease the value of $y_{f_2}$, i.e., $y_{f_2} - \delta < y_r$ for some $r$. Faults that increase the value of $y_{f_2}$ have no impact on the NN decision. Hence, the NN is fault tolerant if $y_{f_2} - \delta > y_r \implies \delta < y_{f_2} - y_r$.
Therefore, the condition for a fault-tolerant NN when the fault is injected in the Output Layer Bias is
$$\delta < \begin{cases} y_c - y_{f_2}, & \text{if } c \neq f_2 \\ y_{f_2} - y_r, & \text{otherwise, for all } r \end{cases} \tag{4}$$
[Figure 5: four copies of the network diagram (inputs $x_1, \ldots, x_i, \ldots, x_l$; ReLU hidden neurons $h_1, \ldots, h_j, \ldots, h_m$; Linear output neurons $y_1, \ldots, y_k, \ldots, y_n$), each marking one faulty parameter: (a) $b_k^{(2)}$, (b) $w_{jk}^{(2)}$, (c) $b_j^{(1)}$, (d) $w_{ij}^{(1)}$.]
Fig. 5: Effect of Fault at Different Locations: a) Output Layer Bias, b) Weight Connecting Hidden and Output Layer, c) Hidden Layer Bias, and d) Weight Connecting Input and Hidden Layer. The red coloured neurons signify the neurons affected by the fault induction
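Condition (4) can be checked numerically against a direct re-evaluation of the ArgMax decision. In this sketch, `delta` is a signed change added to the faulty output bias (so the ± of the text is carried by its sign), the output vector is a hypothetical example, and the correct class is assumed to be a strict maximum.

```python
def tolerant_by_eq4(y, c, f2, delta):
    """Equation (4) with a signed delta added to the bias of output neuron f2."""
    if f2 != c:
        # Only an increase of y[f2] can overtake the correct class.
        return delta < y[c] - y[f2]
    # Only a decrease of y[c] can let some other neuron r overtake it.
    margin = min(y[c] - y[r] for r in range(len(y)) if r != c)
    return -delta < margin

def tolerant_by_simulation(y, c, f2, delta):
    """Apply the fault to the output vector and re-check the ArgMax decision."""
    faulty = list(y)
    faulty[f2] += delta
    return all(faulty[c] > faulty[r] for r in range(len(y)) if r != c)

# Hypothetical fault-free outputs; class 1 is correct.
y, c = [5, 9, 2], 1
for f2 in range(len(y)):
    for delta in range(-10, 11):
        assert tolerant_by_eq4(y, c, f2, delta) == tolerant_by_simulation(y, c, f2, delta)
print("Equation (4) agrees with simulation on the toy outputs")
```

A tie between the faulty neuron and the correct one is counted as a failure here, which matches the strict inequality in Equation (4).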
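Case 2: Fault injection in Weight between Hidden-Output Layer
Let an adversary inject a fault that modifies $w_{f_1 f_2}^{(2)}$ by an amount $\delta$. Hence, the faulty value is $\bar{w}_{f_1 f_2}^{(2)} = w_{f_1 f_2}^{(2)} \pm \delta$. The situation is shown in Fig. 5b. By Equation (3), the fault changes only $y_{f_2}$, by $\pm \delta h_{f_1}$; if $h_{f_1} = 0$, the output is unaffected. Proceeding as in Case 1, the condition for a fault-tolerant NN in this case is
$$\delta < \begin{cases} \dfrac{y_c - y_{f_2}}{h_{f_1}}, & \text{if } c \neq f_2 \\ \dfrac{y_{f_2} - y_r}{h_{f_1}}, & \text{otherwise, for all } r \end{cases} \tag{5}$$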
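The bound in Equation (5) can be computed exactly for integer parameters and compared with a direct simulation of the fault. The sketch below treats `delta` as a non-negative magnitude applied in the harmful direction; the output and hidden vectors are hypothetical, and returning infinity for an inactive hidden neuron is a convention chosen here, not one stated in the paper.

```python
import math
from fractions import Fraction

def delta_bound_eq5(y, h, c, f1, f2):
    """Right-hand side of Equation (5); infinite when h[f1] == 0 (fault has no effect)."""
    if h[f1] == 0:
        return math.inf
    if f2 != c:
        margin = y[c] - y[f2]
    else:
        margin = min(y[c] - y[r] for r in range(len(y)) if r != c)
    return Fraction(margin, h[f1])

def survives(y, h, c, f1, f2, delta):
    """Shift y[f2] by delta * h[f1] in the harmful direction and re-check ArgMax."""
    faulty = list(y)
    faulty[f2] += (delta if f2 != c else -delta) * h[f1]
    return all(faulty[c] > faulty[r] for r in range(len(y)) if r != c)

# Hypothetical fault-free outputs and hidden activations; class 1 is correct.
y, h, c = [5, 9, 2], [2, 0, 3], 1
for f1 in range(len(h)):
    for f2 in range(len(y)):
        bound = delta_bound_eq5(y, h, c, f1, f2)
        for delta in range(0, 11):
            assert survives(y, h, c, f1, f2, delta) == (delta < bound)
print(delta_bound_eq5(y, h, c, 0, 0), delta_bound_eq5(y, h, c, 2, 1))
```

Larger hidden activations shrink the tolerated perturbation, which is consistent with the Second Layer Weights being the most fault-sensitive parameters in Section 3.3.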
Case 3: Fault injection in Hidden Layer Bias
Let an adversary inject a fault that modifies $b_{f_1}^{(1)}$ by an amount $\delta$. Hence, the faulty value is $\bar{b}_{f_1}^{(1)} = b_{f_1}^{(1)} \pm \delta$. As a result of this fault, the value of neuron $f_1$ at the hidden layer is affected, and so are all the neurons in the output layer, since the value $h_{f_1}$ is propagated to every neuron in the output layer, as shown in Fig. 5c.
Let the modified value of $h_{f_1}$ after the fault injection be $\bar{h}_{f_1}$. Treating neuron $f_1$ as active, i.e., in the linear region of the ReLU in Equation (2),
$$\bar{h}_{f_1} = \sum_{i=1}^{l} x_i w_{if_1}^{(1)} + \bar{b}_{f_1}^{(1)} = \sum_{i=1}^{l} x_i w_{if_1}^{(1)} + b_{f_1}^{(1)} \pm \delta = h_{f_1} \pm \delta$$
For a particular input $x^c = (x_1^c, x_2^c, \ldots, x_l^c)$, let the correct class correspond to the output node $c$, i.e., the value of $y_c$ is the maximum selected by the ArgMax function. The modified value of $y_c$ is given by
$$\begin{aligned} \bar{y}_c &= \sum_{\substack{j=1 \\ j \neq f_1}}^{m} h_j w_{jc}^{(2)} + \bar{h}_{f_1} w_{f_1 c}^{(2)} + b_c^{(2)} = \sum_{\substack{j=1 \\ j \neq f_1}}^{m} h_j w_{jc}^{(2)} + (h_{f_1} \pm \delta)\, w_{f_1 c}^{(2)} + b_c^{(2)} \\ &= \sum_{\substack{j=1 \\ j \neq f_1}}^{m} h_j w_{jc}^{(2)} + h_{f_1} w_{f_1 c}^{(2)} + b_c^{(2)} \pm \delta w_{f_1 c}^{(2)} = y_c \pm \delta w_{f_1 c}^{(2)} \end{aligned}$$
The modified value for any other neuron $r$ can be derived in the same way and is given by $\bar{y}_r = y_r \pm \delta w_{f_1 r}^{(2)}$. The NN is fault tolerant if $\bar{y}_c > \bar{y}_r$ for all $r$, i.e., $y_c \pm \delta w_{f_1 c}^{(2)} > y_r \pm \delta w_{f_1 r}^{(2)}$. Hence, the condition for fault tolerance in this case is
$$\delta < \pm \frac{y_c - y_r}{w_{f_1 r}^{(2)} - w_{f_1 c}^{(2)}} \tag{6}$$
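The inequality behind Equation (6) can be compared with a full forward pass on a toy integer network. The sketch uses the signed form $\delta\,(w_{f_1 r}^{(2)} - w_{f_1 c}^{(2)}) < y_c - y_r$, which avoids dividing by a possibly negative weight difference, and restricts `delta` to values that keep the faulty hidden neuron active. The 2-2-2 network and its parameters are hypothetical.

```python
def forward(x, w1, b1, w2, b2):
    """Integer forward pass with ReLU hidden layer and Linear output (Equations (2), (3))."""
    h = [max(0, sum(xi * w1[i][j] for i, xi in enumerate(x)) + b1[j])
         for j in range(len(b1))]
    y = [sum(hj * w2[j][k] for j, hj in enumerate(h)) + b2[k]
         for k in range(len(b2))]
    return h, y

def tolerant_by_eq6(y, w2_row, c, delta):
    """Signed form of Equation (6); w2_row holds the weights leaving hidden neuron f1."""
    return all(delta * (w2_row[r] - w2_row[c]) < y[c] - y[r]
               for r in range(len(y)) if r != c)

def tolerant_by_simulation(x, w1, b1, w2, b2, c, f1, delta):
    """Add delta to the bias of hidden neuron f1 and re-check the ArgMax decision."""
    faulty_b1 = list(b1)
    faulty_b1[f1] += delta
    _, y = forward(x, w1, faulty_b1, w2, b2)
    return all(y[c] > y[r] for r in range(len(y)) if r != c)

# Hypothetical network; hidden neuron 0 has pre-activation 5, so it stays
# active for every delta > -5.
x, w1, b1 = [1, 1], [[1, 2], [1, 1]], [3, 0]
w2, b2 = [[1, 3], [1, 1]], [4, 0]
h, y = forward(x, w1, b1, w2, b2)
c, f1 = 1, 0
for delta in range(-4, 6):
    assert tolerant_by_eq6(y, w2[f1], c, delta) == \
        tolerant_by_simulation(x, w1, b1, w2, b2, c, f1, delta)
print(h, y)
```

On this toy network the decision survives every delta greater than -3 and fails for delta equal to -3 or -4, even though the hidden neuron is still active at those values.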
Case 4: Fault injection in Weight between Input-Hidden Layer
Let an adversary inject a fault that modifies $w_{f_0 f_1}^{(1)}$ by an amount $\delta$. Hence, the faulty value is $\bar{w}_{f_0 f_1}^{(1)} = w_{f_0 f_1}^{(1)} \pm \delta$. The situation is shown in Fig. 5d. Proceeding as in Case 3, the condition for a fault-tolerant NN in this case is
$$\delta < \pm \frac{y_c - y_r}{w_{f_0 f_1}^{(1)} w_{f_1 r}^{(2)} - w_{f_0 f_1}^{(1)} w_{f_1 c}^{(2)}} \tag{7}$$
In the following subsection, we present an analysis of learning the AES SBox using the NN described earlier with the four constraints presented in this section.
[Figure 6: two plots of % Faults per weight parameter. (a) First Layer Weight: Weight Parameters between Input and Hidden Layer, axes Input Layer Neurons (0 to 7) and Hidden Layer Neurons (0 to 120). (b) Second Layer Weight: Weight Parameters between Hidden and Output Layer, axes Hidden Layer Neurons (0 to 120) and Output Layer Neurons (0 to 250), % Faults scale 0.000 to 0.025.]
Fig. 6: Effect of each weight parameter on overall faults for the Modified Neural Network Architecture
3.5 Fault Tolerance of AES SBox learned with Modified Neural
Network
We use the same NN architecture as described in Section 3.1, i.e., 8 input layer neurons, 128 hidden layer neurons, and 256 output layer neurons with integer weights, but with additional constraints on the weight and bias parameters. The constraints on the parameters are imposed by L2-Regularization, which adds a penalty to the cost function of the NN during training and thereby keeps the parameters within a fixed boundary. We tried to obtain models for which the conditions in Equation (4), Equation (5), Equation (6), and Equation (7) hold. We found a model with significantly higher PFT than the standard implementation, which has no constraints on the parameters.
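The effect of an L2 penalty on a gradient descent update can be sketched in a few lines. This is a generic illustration of the technique, not the training code behind the reported results: the penalty weight `lam`, the learning rate `lr`, and the flat parameter list are all assumptions.

```python
def l2_penalty(params, lam):
    """Penalty added to the cost function: lam times the sum of squared parameters."""
    return lam * sum(p * p for p in params)

def sgd_step(params, grads, lr, lam):
    """One gradient descent step on (data loss + L2 penalty).
    The penalty contributes 2 * lam * p to the gradient of each parameter p."""
    return [p - lr * (g + 2.0 * lam * p) for p, g in zip(params, grads)]

# With a zero data-loss gradient, the penalty alone shrinks every parameter.
params = [4.0, -2.0]
shrunk = sgd_step(params, [0.0, 0.0], lr=0.1, lam=0.5)
print(l2_penalty(params, 0.5), shrunk)
```

Because the penalty grows quadratically with each parameter, large weights are pulled towards zero at every step, which is what keeps the trained parameters in a bounded range.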
Fig. 6 presents the % Faults value for each parameter in the First Layer Weights and Second Layer Weights of such an NN trained with the secret key byte 0x25. The parameters in the First Layer Weights are completely fault tolerant, while there are very few faulty outputs due to fault injection in the Second Layer Weights. In the modified implementation, all the bias parameters are completely fault tolerant, which effectively prevents the Single Bias Attack described in [2]. In addition, the number of sensitive weight parameters is significantly reduced in the modified implementation. As a result, the Gradient Descent Attack [2] becomes more difficult to mount, as the complexity of searching for sensitive weight parameters increases significantly. Moreover, the designer already knows which weights are sensitive and can protect those specifically, rather than all weights exhaustively, making the attack more difficult.
The total number of faulty outputs over all possible combinations of faulty parameters and inputs, in this case, is 32 (due to Second Layer Weight parameters only) out of 264,011,776 possibilities. Hence, the PFT of this NN is $1.21 \times 10^{-5}\%$ faults. The PFT of the modified model is thus 218× better than that of the previous model, which had no constraints on the parameters.
[Figure 7: block diagram. The input I7:I0 feeds the LH block (Arithmetic Unit, Buffer, Weights (ROM), Control), followed by the ReLU block, the LO block (Arithmetic Unit, Buffer, Weights (ROM), Control), and the ArgMax block (LM), which produces Out; a Top-Level Control block oversees all blocks.]
Fig. 7: Top Level Architecture of Hidden and Output Layer
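The reported percentages and the improvement factor follow directly from the fault counts given in Sections 3.3 and 3.5, as the short calculation below shows. Only the variable names are introduced here; all numbers are taken from the text.

```python
TOTAL_TRIALS = 264_011_776          # all (parameter, faulty value, input) combinations

# Standard network (Section 3.3): first-layer, second-layer, hidden-bias faults.
standard_faults = 147 + 6810 + 21
# Modified network (Section 3.5): second-layer weight faults only.
modified_faults = 32

pct_standard = 100.0 * standard_faults / TOTAL_TRIALS
pct_modified = 100.0 * modified_faults / TOTAL_TRIALS
improvement = standard_faults / modified_faults

print(f"standard: {pct_standard:.2e} %")
print(f"modified: {pct_modified:.2e} %")
print(f"improvement: {improvement:.0f}x")
```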
4 Implementation on an FPGA
In this section, we focus on the overhead of different NN architectures for executing the f operation. To the best of our knowledge, the implementation overhead of achieving a higher degree of fault tolerance has not been studied in the literature in the context of NNs.
From Equation (2) and Equation (3), we can observe that the computation of a neuron can be expressed with simple multiplication and addition operations. These can be executed efficiently by the Digital Signal Processor (DSP) blocks of modern FPGAs, which support fast multiplication and addition of integers. This motivates our choice of integer-valued weights over floating point. The architecture of a single NN-based f operation can be realised in either an iterative or a parallel fashion, depending on area and time constraints. In our proposed implementation, we opted for an iterative architecture to obtain a compact implementation, which is validated on an Artix-7 FPGA (7a35t-cpg236).
A top-level description of the complete architecture corresponding to the 8-128-256 NN is provided in Fig. 7. The hidden layer computation is performed by the LH and ReLU blocks. The output of the ReLU block is forwarded to the LO block for the output layer computation. The LO block operates in the same way as the LH block, but the data widths are different. The output of the LO block is forwarded to the ArgMax block, which finds the index of the maximum value and produces the final output of the design. The top-level control block, designed as a Finite State Machine, controls the operation of these blocks.
[Figure 8: block diagram of the LH block. Input bits I7 to I0 and weights W7 to W0 from the Weights ROM feed eight AND gates; the selected weights are summed by two DSP adders and a final ADD stage, and the result enters a SHIFT buffer that drives OUT, under a Control block.]
Fig. 8: Top Level Architecture of the Hidden Layer
Fig. 8 provides an overview of the architecture of the LH block. The inputs to this layer (the hidden layer) are the eight bits of the input to the f operation. Since each input bit is either 0 or 1, no multiplication of the weight values with the input is required. The weight values corresponding to input bits equal to 1 are added, and the bias is added last to obtain the output. The weight values are selected using multiple AND gates, as depicted in Fig. 8. Block RAMs are used to store the weight and bias values. The weight values are added using two DSP adders with appropriate zero padding. The final addition uses a normal LUT-based adder, as the inputs are narrow (9 bits). The iterative architecture requires a total of 128 iterations to compute the outputs of all 128 neurons. After each iteration, the new value is inserted into the buffer, which is shifted right. The output is taken directly from the buffer register when all 128 iterations are complete.
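The multiplier-free hidden layer computation can be modelled in software as follows. This is a behavioural sketch only, not the hardware description: the `weights[j][i]` layout, the mapping of input bit i to bit position i of the byte, and the use of a Python list for the shift buffer are assumptions.

```python
def lh_block(input_byte, weights, biases):
    """One iteration per hidden neuron: select the weights whose input bit is 1
    (the role of the AND gates), add them, then add the bias."""
    buffer = []
    for row, bias in zip(weights, biases):
        selected = [w if (input_byte >> i) & 1 else 0 for i, w in enumerate(row)]
        buffer.append(sum(selected) + bias)   # new value enters the buffer
    return buffer

def relu_block(values):
    """ReLU block applied to the buffered hidden layer values."""
    return [v if v > 0 else 0 for v in values]

# Hypothetical 3-neuron example (the design in the paper iterates over 128 neurons).
weights = [[1, 2, 3, 4, 5, 6, 7, 8],
           [-1, -2, -3, -4, -5, -6, -7, -8],
           [0, 1, 0, 1, 0, 1, 0, 1]]
biases = [1, 2, -3]
pre_activation = lh_block(0b00000101, weights, biases)
hidden = relu_block(pre_activation)
print(pre_activation, hidden)
```

Selecting weights with the input bits gives the same result as the multiply-accumulate in Equation (2) whenever every $x_i$ is 0 or 1, which is why the LH block needs adders but no multipliers.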
Fig. 9 shows the architecture of the output layer along with the ArgMax block. The output layer has a structure similar to that of the LH block and also operates iteratively. However, this layer has a larger number of inputs than the hidden layer and only a limited number of multipliers (to keep resource usage low). Hence, multiplexers are used to select the inputs for each execution of the computation unit. Each multiplier is constructed from multiple DSP blocks and is capable of multiplying 20-bit values. DSP and LUT-based adders are also used in the multiplier. Eight parallel instantiations are used for the computation, and hence the computation of one neuron takes 128/8 = 16 executions of the arithmetic unit. A total of 256 iterations are required for the 256 neurons. The output of each iteration is forwarded to the ArgMax block, which tracks the index of the maximum value. Finally, the index of the maximum value is stored in a register of the ArgMax block (LM), which is passed to the output of the design after the LO block computation is finished.
[Figure 9: block diagram. Computation in Neuron: sixteen MUX blocks select hidden layer values ($l_{127}$ to $l_0$) and weights ($w_{127}$ to $w_0$) from the Weights ROM for eight MUL units, whose products are summed by a tree of ADD blocks under CONTROL. ArgMax Computation: a COMPARATOR and a COUNTER maintain MAXVAL and MAXIDX, which drives OUT.]
Fig. 9: Architecture of the Output Layer with ArgMax Function
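The iterative schedule of the output layer and the running ArgMax can be modelled behaviourally as below. The sketch assumes a `w2[k][j]` layout and synthetic integer data; it counts executions of the eight-lane arithmetic unit but does not model clock cycles, data widths, or the multiplexer wiring.

```python
LANES = 8   # parallel multipliers in the arithmetic unit

def lo_argmax(h, w2, b2):
    """Compute each output neuron in chunks of LANES products and keep a
    running maximum, as the ArgMax block does with MAXVAL and MAXIDX."""
    max_val, max_idx, executions = None, None, 0
    for k, (row, bias) in enumerate(zip(w2, b2)):      # one iteration per output neuron
        acc = bias
        for start in range(0, len(h), LANES):          # one execution per chunk
            acc += sum(hj * wj for hj, wj in
                       zip(h[start:start + LANES], row[start:start + LANES]))
            executions += 1
        if max_val is None or acc > max_val:           # comparator update
            max_val, max_idx = acc, k
    return max_idx, executions

# Synthetic data with the 128-256 shape of the output layer (values are hypothetical).
h = [j % 5 for j in range(128)]
w2 = [[((k * 7 + j * 3) % 11) - 5 for j in range(128)] for k in range(256)]
b2 = [k % 3 for k in range(256)]
idx, executions = lo_argmax(h, w2, b2)
print(idx, executions)
```

With 128 hidden values and 8 lanes, each of the 256 neurons needs 16 executions, so the arithmetic unit runs 4096 times per classification in this model.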
Table 2 provides a comparative summary of different NN architectures performing the f operation on the Artix-7 FPGA. The complete design (8-128-256), which exhibits the highest PFT, operates at a maximum frequency of 128.66 MHz (critical path of 7.77 ns). We also report the resource usage of a standard AES SBox in the last row of Table 2. The table shows that, as the number of neurons in the hidden layer increases, the resource and timing requirements of the designs also increase. This is the overhead a designer needs to pay to achieve the significantly high PFT shown in Section 3.5. The resource usage of the LUT-based AES SBox is extremely low, and its maximum operating frequency is also higher. However, this implementation has zero fault tolerance, as the entire architecture is deterministic.
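The Delay column of Table 2 is the clock cycle count divided by the operating frequency, which the following sketch recomputes from the reported values. The dictionary layout and the function name are introduced here for illustration; the small differences from the reported delays are presumably due to rounding of the reported frequencies.

```python
# (frequency in MHz, clock cycles, reported delay in microseconds) from Table 2.
designs = {
    "8-8-256":   (151.95, 1350, 8.88),
    "8-32-256":  (149.43, 25576, 171.15),
    "8-64-256":  (141.40, 49352, 349.01),
    "8-128-256": (128.66, 96910, 753.18),
    "LUT-based": (259.80, 1, 3.85e-3),
}

def delay_us(freq_mhz, cycles):
    """Cycles divided by frequency in MHz gives the latency in microseconds."""
    return cycles / freq_mhz

for name, (freq, cycles, reported) in designs.items():
    print(f"{name:10s} computed {delay_us(freq, cycles):10.4f} us, reported {reported} us")
```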
We have performed on-chip validation of the proposed modified learning strategy by injecting faults into the 8-128-256 architecture. We used the clock glitch method to inject faults at a single location in the design at different time instances during its operation. The resulting fault occurrences are consistent with our simulation-based experiments. Fig. 10 provides an overview of the on-chip experimental validation of our idea, focusing on the trade-off between PFT and implementation overhead. The figure shows that the LUT-based AES SBox, though it has the lowest area-delay product, produces more faulty outputs than any other implementation. The 8-8-256 architecture without any constraints provides a better PFT, but with higher resource utilization. However, when we modify the learning strategy with the constraints and increase the number of neurons in the hidden layer, the PFT increases significantly. The PFT is maximum for the 8-128-256 architecture, though it comes with a higher overhead. The results validate our idea that the proposed learning strategy can enhance the fault tolerance of NNs.
Table 2: Post Place&Route Resource Utilisation for Artix-7 FPGA for Different Implementations
| Design | #Slice (%) | #LUT (%) | #Register (%) | #DSP (%) | #BRAM (%) | Freq. (MHz) | #Clock Cycle | Delay (µs) | % Faults |
|---|---|---|---|---|---|---|---|---|---|
| 8-8-256⁴ | 127 (1.55) | 324 (1.56) | 199 (0.47) | 33 (36.67) | 5 (10) | 151.95 | 1350 | 8.88 | 0.16 |
| 8-32-256 | 341 (4.18) | 934 (4.49) | 951 (2.29) | 33 (36.67) | 4 (8) | 149.43 | 25576 | 171.15 | 0.04 |
| 8-64-256⁵ | 427 (5.24) | 978 (4.70) | 1007 (2.43) | 33 (36.67) | 6 (12) | 141.40 | 49352 | 349.01 | $2.7 \times 10^{-3}$ |
| 8-128-256 | 601 (7.37) | 1442 (6.93) | 1657 (3.98) | 33 (36.67) | 9 (18) | 128.66 | 96910 | 753.18 | $1.2 \times 10^{-5}$ |
| LUT-based | 24 (0.29) | 80 (0.40) | 17 (0.40) | 0 (00) | 0 (00) | 259.80 | 1 | $3.85 \times 10^{-3}$ | 100 |
⁴ Without any constraints on the parameters.
⁵ The % Faults value for this network is equivalent to that of the 8-128-256 architecture without any constraints, as mentioned in Section 3.3, which signifies that a smaller network with the proposed constraints on the parameters can achieve the PFT of a larger network without any constraints.
[Figure 10: for the designs LUT, 8-8-256, 8-32-256, 8-64-256, and 8-128-256, the plot shows $\log_{10}(\text{Fault}\,(\%))$ with labelled values 100%, 0.16%, 0.04%, $2.7 \times 10^{-3}\%$, and $1.2 \times 10^{-5}\%$, and $\log_{10}(\text{Area} \times \text{Delay})$ with labelled values $1.4 \times 10^{-5}$, 79, 29,294, 121,813, and 567,286.]
Fig. 10: Fault Tolerance vs. Resource Overhead trade-off
5 Conclusion
In this paper, we propose the design of cryptographic primitives using NNs for high fault tolerance. As a case study, we showed a practical implementation of the AES SBox (with key addition). We then propose a technique to further boost the fault tolerance of the designed primitive with analytically derived constraints on the network parameters, which increases the complexity of fault injection based attacks on NNs. A tailored implementation strategy for the NN using integer weights is validated on an FPGA. We show that fault tolerance can be scaled up, albeit with higher area/delay overhead.
References
1. Ali, S., Mukhopadhyay, D.: An improved differential fault analysis on AES-256. In: AFRICACRYPT’11. pp. 332–347. Springer (2011)
2. Liu, Y., Wei, L., Luo, B., Xu, Q.: Fault injection attack on deep neural network. In: Proceedings of the 36th International Conference on Computer-Aided Design. pp. 131–138. IEEE Press (2017)
3. Minnix, J.I.: Fault tolerance of the backpropagation neural network trained on noisy inputs. In: Neural Networks, 1992. IJCNN., International Joint Conference on. vol. 1, pp. 847–852. IEEE (1992)
4. Murray, A.F., Edwards, P.J.: Synaptic weight noise during multilayer perceptron training: fault tolerance and training improvements. IEEE Transactions on Neural Networks 4(4), 722–725 (1993)
5. Murray, A.F., Edwards, P.J.: Enhanced MLP performance and fault tolerance resulting from synaptic weight noise during training. IEEE Transactions on Neural Networks 5(5), 792–802 (1994)
6. Neti, C., Schneider, M.H., Young, E.D.: Maximally fault tolerant neural networks. IEEE Transactions on Neural Networks 3(1), 14–23 (1992)
7. Phatak, D.S., Koren, I.: Complete and partial fault tolerance of feedforward neural nets. IEEE Transactions on Neural Networks 6(2), 446–456 (1995)
8. Protzel, P.W., Palumbo, D.L., Arras, M.K.: Performance and fault-tolerance of neural networks for optimization. IEEE Transactions on Neural Networks 4(4), 600–614 (1993)
9. dos Santos, F.F., Pimenta, P.F., Lunardi, C., Draghetti, L., Carro, L., Kaeli, D., Rech, P.: Analyzing and increasing the reliability of convolutional neural networks on GPUs. IEEE Transactions on Reliability (2018)
10. Segee, B.E., Carter, M.J.: Fault tolerance of pruned multilayer networks. In: Neural Networks, 1991., IJCNN-91-Seattle International Joint Conference on. vol. 2, pp. 447–452. IEEE (1991)
11. Tchernev, E.B., Mulvaney, R.G., Phatak, D.S.: Investigating the fault tolerance of neural networks. Neural Computation 17(7), 1646–1664 (2005)
12. Tchernev, E.B., Mulvaney, R.G., Phatak, D.S.: Perfect fault tolerance of the nkn network. Neural Computation 17(9), 1911–1920 (2005)
