Entropy-Based Modeling for Estimating Soft Errors Impact on Binarized
  Neural Network Inference by Khoshavi, Navid et al.
Navid Khoshavi1,2, Saman Sargolzaei1, Arman Roohi3, Connor Broyles2, Yu Bi4
1Department of Computer Science, Florida Polytechnic University
2Department of Electrical and Computer Engineering, Florida Polytechnic University
3Department of Electrical and Computer Engineering, University of Texas, Austin
4Department of Electrical and Computer Engineering, University of Rhode Island
Abstract—Over past years, the easy accessibility to the large
scale datasets has significantly shifted the paradigm for developing
highly accurate prediction models derived from Neural Networks (NNs).
These models can be affected by radiation-induced transient faults,
which might gradually degrade an NN inference accelerator that is
expected to run for long periods. The crucial observation from our
rigorous vulnerability assessment of the NN inference accelerator is that
the weights and activation functions are unevenly susceptible to both
single-event upsets (SEUs) and multi-bit upsets (MBUs), especially in
the first five layers of our selected convolutional neural network. In
this paper, we present relatively accurate statistical models that
delineate the impact of both SEUs and MBUs across layers and within
each layer of the selected NN. These models can be used to evaluate the
error resiliency of an NN topology before it is adopted in
safety-critical applications.
Index Terms—Fault Injection, Deep Neural Network Accelera-
tor, Machine Learning, Soft Error, Statistical Model
I. INTRODUCTION
Over the past few decades, researchers have focused on speeding up
computation in the traditional computing-centric model, which is based
on moving large volumes of data between memory storage and processing
nodes for execution and storage. To maximize the efficacy of
computing-centric models, many fine- and coarse-grained parallelism
paradigms have been utilized, from software approaches, i.e.,
instruction-, data-, thread-, and transaction-level parallelism, to the
micro-architecture domain, i.e., pipelining [1], superscalar [2],
VLIW, Single Instruction Multiple Data (SIMD) instructions [3], vector
processors [4], Graphics Processing Units (GPUs), and
chip-multiprocessors (CMPs).
While the aforementioned innovations have significantly reduced the
energy per Floating-Point Operation (FLOP), they have only marginally
reduced the energy cost of data movement. This has encouraged computer
architects to explore shifting the paradigm of computation from
computing-centric models towards fine-tuned application-specific
architectures that perform a specific task with the highest performance
while incurring the minimum energy cost per operation. Among the
applications that have received considerable attention for
architectural customization, hardware accelerators for Machine Learning
(ML) and Deep Learning (DL) algorithms are the most well-studied
designs. To optimize the architecture of ML/DL accelerators, a broad
spectrum of methodologies has been proposed at both the software and
hardware levels.
[Figure 1: bit-width (1, 8, 16, 24, 32) versus year (2012 to 2020), with
representative works grouped as 32-bit fixed-point, 16-bit fixed-point,
16-bit fixed-point + 5-bit floating-point, 8-bit fixed-point, 8-bit
floating-point, 2-bit fixed-point, and 1-bit fixed-point.]
Fig. 1. Roadmap of reduced bit-width representation in neural networks.
In the algorithm-based approaches, data-layout compression [5], [6],
[7], network pruning, encoding, batch-size reduction [8], subset
selection [9], and reduced-precision representation [10] have been
explored extensively in both academia and industry. In particular,
employing weights and activations with low bit-width decreases the
model size and the computing complexity. For instance, in [11], the
floating-point network parameters were quantized to 32-bit fixed-point,
which reduces both the data movement overhead and the computation cost.
A further improvement was made in [12], where the authors leveraged a
16-bit fixed-point representation for weights. In [13], hybrid
representations, including both fixed- and floating-point, were
utilized, where some layers were quantized while the rest remained
unchanged. This trend of precision reduction has continued
aggressively, compressing the representation to 8-bit fixed-point
formats [14], [15], 2-bit fixed-point precision [16], [17], [18], [19],
[20], [21], [22], [14], and 1-bit fixed-point representation, i.e., the
Binarized Neural Network (BNN) [23], [24], [25], [26], [27], [28]. It
is anticipated that many industrial companies will follow the scaling
roadmap shown in Fig. 1 and expeditiously adopt reduced-precision
representations in Deep NN (DNN) training and inference.
arXiv:2004.05089v2 [cs.LG] 21 Apr 2020
However, the precision reduction in the data representation has
exacerbated the impact of soft errors in the quantized neural network
(QNN). In particular, compressing the information of 32-bit data into a
few bits increases the risk of losing a significant portion of the
information in the tensors, which are constructed from
reduced-precision parameters. In addition, the current trend of
aggressive transistor scaling has intensified this reliability
challenge. Soft errors generally originate from high-energy particles
and might strike the sequential or combinational logic in ML/DL
accelerators. DL accelerators systematically perform the inference
operation on a pre-trained network. Thus, the neural network (NN)
inference accelerator is expected to maintain its functionality for an
extended period. However, accumulated radiation-induced transient
faults can impact individual parameters in the NN topology and, if not
mitigated immediately, can gradually degrade the functionality of a
long-running NN inference accelerator through outlier contamination
[29], leading to drastic accuracy loss.
Based on the empirical observations obtained from our rigorous
assessment of NN vulnerability to soft errors, we found that the
weights and activation functions are unevenly susceptible to both
single-event upset (SEU) and multi-bit upset (MBU) scenarios. We also
noticed that not only do soft errors influence the parameters
inconsistently, but the magnitude of their influence on the accuracy
loss also depends on how early in the sequence of layers the
parameters are exposed to soft errors.
In this paper, we propose an entropy-based model to estimate the impact
of soft errors on the NN architecture. Our proposed model delineates
the effects of both SEUs and MBUs across layers and within each layer
of the selected NN. The proposed model can be utilized by NN
architecture developers to evaluate the error resiliency of the NN
before employing it in safety-critical applications. In summary, the
main contributions of this paper are as follows:
1) A comprehensive fault injection methodology is used to delineate
the fault-skeleton map, which reveals the sensitivity of each NN
parameter in each layer of the NN to soft errors.
2) Based on the results collected in the previous step, we propose an
entropy-based model to estimate the impact of soft errors on the NN
architecture. This model emphasizes how the appearance of faults in
different sequences of layers and parameters can lead to drastic
misclassification.
3) Our findings show that assigning equal importance to the parameters
across the layers is inaccurate; this assumption does not hold
according to our preliminary assessment of layer importance. We
therefore suggest revising the traditional assumption by incorporating
our preliminary findings on layer importance.
The remainder of the paper is organized as follows. Section II presents
the background on convolutional NNs. Section III describes the
empirical observations on fault injection in the NN accelerator.
Section IV explains our entropy-based model for estimating soft error
impact in the NN topology. In Section V, the experimental results are
presented. Finally, Section VI concludes the paper.
[Figure 2: a 32x32x3 RGB input (3x8 bit per pixel) followed by layers
conv1 to conv6, max pool stages, and FC layers (256x512, 512x512,
512x64) producing 10 outputs; feature-map sizes shown in order are
30x30x64, 28x28x64, 14x14x64, 12x12x128, 10x10x128, 5x5x128, and
3x3x256; the legend distinguishes threshold, convolution, and max
pooling.]
Fig. 2. A schematic view of the cnvW2A2 architecture designed in FINN [36].
II. BACKGROUND
The Convolutional Neural Network (CNN) is generally used to extract
features from the input through a hierarchy of layers. CNNs are widely
applied in a variety of applications such as image processing [30],
sentence classification [31], semantic parsing [32], and speech
recognition [33]. As illustrated in Fig. 2, a CNN is constructed from
different types of layers, including convolutional, fully connected,
and pooling layers. The input layer provides a set of information based
on the application of the CNN. For instance, in order to classify an
image, the input layer retrieves raw pixel values from the image [34].
Furthermore, the CNN layers are explicitly described as the following
four sub-layers:
1) convolution sub-layer: A dot product is computed between the weights
of the regionally-connected neurons and their input sets to produce the
output,
2) non-linear sub-layer: It employs an activation function to map the
weighted sum of regionally-connected neurons to
max(0, weighted sum of regionally-connected neurons),
3) normalization sub-layer: It scales the feature values [35],
4) pool sub-layer: It reduces the spatial size to decrease the number
of parameters and computations within the network [34].
Lastly, the Fully Connected (FC) layer takes the output of the previous
CNN layers and transforms it into a vector that represents the set of
feature values.
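The four sub-layers described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the FINN implementation: the function names, the per-channel normalization, the 2x2 pooling window, and the random input are all assumptions chosen for the example.

```python
import numpy as np

def conv2d_valid(x, w):
    """Convolution sub-layer: dot product of each k x k patch with the kernels."""
    h, wd, _ = x.shape
    k, _, _, c_out = w.shape
    out = np.zeros((h - k + 1, wd - k + 1, c_out))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = x[i:i + k, j:j + k, :]
            out[i, j, :] = np.tensordot(patch, w, axes=([0, 1, 2], [0, 1, 2]))
    return out

def relu(x):
    """Non-linear sub-layer: max(0, weighted sum)."""
    return np.maximum(0.0, x)

def normalize(x, eps=1e-5):
    """Normalization sub-layer: per-channel scaling (illustrative choice)."""
    mean = x.mean(axis=(0, 1), keepdims=True)
    std = x.std(axis=(0, 1), keepdims=True)
    return (x - mean) / (std + eps)

def max_pool2x2(x):
    """Pool sub-layer: halve the spatial size with a 2x2 maximum."""
    h, w, c = x.shape
    h2, w2 = h // 2, w // 2
    return x[:h2 * 2, :w2 * 2, :].reshape(h2, 2, w2, 2, c).max(axis=(1, 3))

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))              # stand-in for a 32x32 RGB input
kernel = rng.standard_normal((3, 3, 3, 64))  # hypothetical 3x3 kernels, 64 outputs
features = max_pool2x2(normalize(relu(conv2d_valid(image, kernel))))
vector = features.reshape(-1)                # flattened input to an FC layer
```

With a 3x3 unpadded kernel, the 32x32 input shrinks to 30x30 after the convolution, which agrees with the first feature-map size listed for Fig. 2.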
III. EMPIRICAL OBSERVATIONS ON FAULT INJECTION ON
NN ACCELERATOR
A. Fault Injection Category
Regardless of how dynamic soft errors are, the functionality of both
combinational and sequential logic circuits can be either partially or
entirely disrupted when radiation-induced transient faults strike. The
effect of soft errors is generally modeled as a uniform distribution
across space and time. This assumption is in line with the study
presented in [37]. To precisely examine the impact of soft errors on
the NN accelerator, both Single-Event Upsets (SEUs) and Multi-Bit
Upsets (MBUs) are studied in our NN evaluation. Even though SEUs are
considered a significant source of transient faults, the study in [37]
shows that the fraction of systems impacted by MBUs has increased over
the past years. The primary driver of this trend is aggressive
transistor scaling, which enables the integration of ever smaller
transistors at very high density. This dense integration allows
radiation-induced transient faults with less energy to upset the
critical charge of adjacent transistors. To mimic the behavior of an
MBU, we followed the recommendation in [38] and altered a burst of
8 bits. For the sake of simplicity, we disregard the faults that might
occur in the CPU, main memory, memory bus, and combinational logic.
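The two fault modes can be illustrated on a plain byte buffer. The sketch below is an assumption-laden model rather than the injector used in the experiments: `inject_seu` and `inject_mbu` are hypothetical helper names, and the MBU is modeled as 8 adjacent bits starting at an arbitrary bit position.

```python
import numpy as np

def inject_seu(memory, bit_index):
    """Flip a single bit of a uint8 buffer in place (single-event upset)."""
    byte, offset = divmod(bit_index, 8)
    memory[byte] ^= np.uint8(1 << offset)

def inject_mbu(memory, start_bit, burst=8):
    """Flip `burst` adjacent bits starting at start_bit (multi-bit upset)."""
    for bit in range(start_bit, start_bit + burst):
        inject_seu(memory, bit)

memory = np.zeros(4, dtype=np.uint8)   # stand-in for a parameter memory
inject_seu(memory, 10)                 # byte 1, bit 2
inject_mbu(memory, 20)                 # bits 20..27, spanning bytes 2 and 3
```

Because a flip is an XOR, applying the same injection twice restores the original content, which is convenient when a campaign needs to reset the memory between runs.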
B. Fault Injection on NN Parameters
The parameters that are used for studying the SEU/MBU
impact are described as follows:
• Weights: The weights in the NN accelerator remain
untouched during the inference operation. However, if
a weight tensor is contaminated by an error that stems
from soft errors, it has the potential to contaminate the
consecutive layers in NN.
• Activations: The activation tensor of each convolution
layer is used as the input for the convolution operation
in the next layer. If a soft error strikes regionally-
connected activation nodes, the accelerator could
experience a certain degree of accuracy loss.
• Layers: The convolutional network topology is generally
composed of multiple convolutional layers, max pool
layers, and fully connected layers. Since each category of
layers has distinct parameters, they can contribute non-
uniformly to the accuracy loss in the NN accelerator.
C. Fault Injection Distribution
Uniform fault injection across layers: The SEU/MBU can
be uniformly injected to the NN architecture across the space
and time of memory dimension. Besides, depending on the
points of fault injection, soft errors can be inserted into the
given memory location where the particular parameters are
stored and executed for the NN topology.
Targeted in-layer fault injection: Since the convolutional layers are
usually customized with various tensor dimensions, in-layer faults can
affect the classification accuracy. Such an in-layer fault can manifest
itself as either a dramatic accuracy drop or a moderate accuracy loss
or gain. Our study presents the impact of soft errors on targeted
parameters within a layer.
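The difference between the two distributions comes down to how fault sites are drawn. The following sketch assumes a flat bit address space per layer; the layer names and bit counts in `layer_bits` are hypothetical placeholders, not the sizes of the networks studied here.

```python
import numpy as np

# Hypothetical number of susceptible bits per layer (illustration only).
layer_bits = {"conv1": 1_728, "conv2": 36_864, "fc1": 131_072}

def sample_uniform_sites(layer_bits, n_faults, rng):
    """Uniform injection: every bit in the whole network is equally likely."""
    names = list(layer_bits)
    sizes = np.array([layer_bits[name] for name in names])
    offsets = np.concatenate(([0], np.cumsum(sizes)))
    flat = rng.integers(0, offsets[-1], size=n_faults)
    layer_idx = np.searchsorted(offsets, flat, side="right") - 1
    return [(names[i], int(f - offsets[i])) for i, f in zip(layer_idx, flat)]

def sample_targeted_sites(layer_bits, layer, n_faults, rng):
    """Targeted injection: all faults land inside one chosen layer."""
    bits = rng.integers(0, layer_bits[layer], size=n_faults)
    return [(layer, int(b)) for b in bits]

rng = np.random.default_rng(0)
uniform_sites = sample_uniform_sites(layer_bits, 100, rng)
targeted_sites = sample_targeted_sites(layer_bits, "conv2", 100, rng)
```

Under uniform sampling, larger layers receive proportionally more faults, whereas targeted sampling fixes the fault count per layer regardless of its size.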
D. Experimental Results Analysis
The sensitivity of the NN accelerator to both SEUs and MBUs is
evaluated across the stack of layers in the NN and within each
individual layer of the NN. Based on the experimental results shown in
Fig. 3 and Fig. 4, we made the following observations:
• Reducing the number of bits for storing the network infor-
mation increases the vulnerability of the NN accelerator
to soft errors (SEU/MBU).
• The activation layers are considerably vulnerable to both
SEU and MBU, as shown in Fig. 4.
[Figure 3: distribution of classification accuracy versus number of
faults (1, 2, 5, 10, 20, 50, 100) for weight layers and activation
layers; panels: SEUs impact on weight/activation layers and MBUs impact
on weight/activation layers, for cnvW1A1 and cnvW2A2, each with a
baseline marker.]
Fig. 3. The impact of uniform SEUs/MBUs injection on two individual groups
of layers: (i) the stack of weight layers, and (ii) the stack of activation layers,
across the entire network in both cnvW1A1 and cnvW2A2 networks.
[Figure 4: legend entries 5 faults, 10 faults, 50 faults, 100 faults.]
Fig. 4. The impact of targeted in-layer SEUs/MBUs injection on both
unprotected cnvW1A1 and unprotected cnvW2A2 networks.
• The classification accuracy drops gradually as the number
of accumulated SEUs/MBUs increases.
• The effect of MBUs on the NN accelerator is relatively
larger than that of SEUs.
• The NN accelerator can still suffer drastic misclassification
in worst-case scenarios despite the negligible average
degradation due to soft errors. For instance, the accuracy
of the image classifier can drop by as much as 19.25% in
cnvW1A1 when 100 MBU events are injected during the
workload operation, as illustrated in Fig. 3.
IV. BUILDING MODEL INSPIRED BY EMPIRICAL
OBSERVATIONS
We observed worst-case scenarios that cause drastic degradation of the
network inference accuracy. To understand these scenarios and design
mitigation techniques, and to delineate the impact of both SEUs and
MBUs across layers and within each layer of the selected NN, we
performed a statistical analysis of our empirical database, as
explained here. To stand on a firm basis, let us first review the
basics of our database and model development.
Vulnerability, in our context, is the degree to which a bit under
fault injection may perturb the inference accuracy. Accuracy is, in
turn, characterized by the distance between the network's inferred
output and the actual output. This distance, also referred to as the
loss, is formulated as
$$\mathcal{L}\big(f(x; \{w_l\}_{l=1}^{L}),\, s\big) \tag{1}$$
where x and s are the input and target output for the trained binarized
network with L convolutional layers, and w_l denotes the weights of
layer l. Here f(.) represents the network inference computation, and
the loss L(.) is computed between the network output and the target
output. It is worth mentioning that weights are often stored in their
two's complement form after being binarized. The quantization to the
two levels +1 and −1 is achieved through either a deterministic
approach,
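$$w = \begin{cases} +1 & \text{if } w_r \ge 0 \\ -1 & \text{otherwise} \end{cases} \tag{2}$$
or a stochastic approach,
$$w = \begin{cases} +1 & \text{with probability } \sigma(w_r) \\ -1 & \text{with probability } 1 - \sigma(w_r) \end{cases} \tag{3}$$
$$\sigma(w_r) = \max\left(0, \min\left(1, \frac{w_r + 1}{2}\right)\right) \tag{4}$$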
where w_r is the real-valued weight before quantization. The stochastic
approach is more appealing, although the choice depends on the
computational constraints. During the training phase, the objective is
to minimize the loss. Without loss of generality, the training process
can readily be expressed as an optimization problem,
$$\min_{w_l} \mathcal{L}\big(f(x; \{w_l\}_{l=1}^{L}),\, s\big) \tag{5}$$
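The deterministic rule in (2) and the stochastic rule in (3) and (4) can be sketched as follows. The function names and the sample weights are illustrative assumptions; only the formulas come from the text.

```python
import numpy as np

def hard_sigmoid(w_r):
    """Eq. (4): clip (w_r + 1) / 2 to the interval [0, 1]."""
    return np.clip((w_r + 1.0) / 2.0, 0.0, 1.0)

def binarize_deterministic(w_r):
    """Eq. (2): +1 where w_r >= 0, otherwise -1."""
    return np.where(w_r >= 0, 1, -1)

def binarize_stochastic(w_r, rng):
    """Eq. (3): +1 with probability sigma(w_r), otherwise -1."""
    p = hard_sigmoid(w_r)
    return np.where(rng.random(w_r.shape) < p, 1, -1)

w_r = np.array([-1.5, -0.2, 0.0, 0.6, 2.0])   # example real-valued weights
rng = np.random.default_rng(0)
w_det = binarize_deterministic(w_r)
w_sto = binarize_stochastic(w_r, rng)
```

Weights at or beyond −1 and +1 are binarized identically by both rules, since the hard sigmoid saturates at 0 and 1; the two rules differ only for weights strictly inside that interval.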
From our modeling perspective, the optimization problem is viewed in
reverse: the objective is to inject faults in the form of bit-flips
such that the resulting network operation is maximally perturbed,
$$\max_{\tilde{w}} \; \mathcal{L}\big(f(x; \{\tilde{w}_l\}_{l=1}^{L}),\, s\big) - \mathcal{L}\big(f(x; \{w_l\}_{l=1}^{L}),\, s\big) \tag{6}$$
where w̃ represents the weights perturbed by the fault injection. It is
worth emphasizing that in the bit-flip attack (BFA) model [39], [40],
the distance between the pre-fault and post-fault weight tensors must
remain below a preset threshold. A targeted BFA implementation is,
therefore, cast as a bit search algorithm that solves the above
optimization problem under the constraint of finding an optimal
combination of vulnerable bits to perturb.
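As a toy illustration of the search in (6), the sketch below greedily picks sign flips that most increase the loss of a single binarized linear layer, with a flip budget standing in for the preset distance bound. It is not the BFA algorithm of [39], [40]: the greedy strategy, the squared-error loss, and the names `loss` and `greedy_bit_search` are assumptions made for this example.

```python
import numpy as np

def loss(w, x, s):
    """Mean squared error between the linear output x @ w and the target s."""
    return float(np.mean((x @ w - s) ** 2))

def greedy_bit_search(w, x, s, budget):
    """Greedily flip up to `budget` weight signs that most increase the loss."""
    w_tilde = w.copy()
    flipped = []
    for _ in range(budget):
        base = loss(w_tilde, x, s)
        gains = []
        for i in range(w_tilde.size):
            if i in flipped:
                gains.append(-np.inf)      # never flip the same bit twice
                continue
            w_tilde[i] = -w_tilde[i]       # trial flip
            gains.append(loss(w_tilde, x, s) - base)
            w_tilde[i] = -w_tilde[i]       # undo trial flip
        best = int(np.argmax(gains))
        if gains[best] <= 0:
            break                          # no remaining flip increases the loss
        w_tilde[best] = -w_tilde[best]
        flipped.append(best)
    return w_tilde, flipped

rng = np.random.default_rng(0)
w = rng.choice([-1, 1], size=16)           # binarized weights
x = rng.standard_normal((64, 16))          # synthetic inputs
s = x @ w                                  # targets: the fault-free loss is 0
w_tilde, flipped = greedy_bit_search(w, x, s, budget=3)
```

The budget bounds the Hamming distance between `w` and `w_tilde`, which plays the role of the preset distance threshold in this simplified setting.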
A. Modeling the Impact of Uniform fault injection across
layers
We analyzed the acquired results not only to determine the NN
parameters most vulnerable to soft errors but also to build the
corresponding statistical model that describes the weighted impact of
fault-contaminated parameters on the inference accuracy of the NN. The
designed experiment included the injection of SEU- and MBU-based
uniform faults into the cnvW1A1 and cnvW2A2 network architectures. The
experiment aimed to survey the impact of four input variables, namely
(1) the quantization level of the network architecture under study
(X1 ∈ {1, 2}), (2) the fault mode (X2 ∈ {SEU, MBU}), (3) an
identification variable for the fault domain
(X3 ∈ {weight, activation}), and (4) the number of faults
(X4 ∈ {1, 2, 5, 10, 20, 50, 100}), on the inferred accuracy, as
hypothesized by,
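$$\begin{aligned}
E[Y_t \mid X] = {} & \beta_0 + \beta_1 X_{t1} + \beta_2 X_{t2} + \beta_3 X_{t3} + \beta_4 X_{t4} \\
& + \beta_{12} X_{t1} X_{t2} + \beta_{13} X_{t1} X_{t3} + \beta_{14} X_{t1} X_{t4} + \beta_{23} X_{t2} X_{t3} + \beta_{24} X_{t2} X_{t4} + \beta_{34} X_{t3} X_{t4} \\
& + \beta_{123} X_{t1} X_{t2} X_{t3} + \beta_{124} X_{t1} X_{t2} X_{t4} + \beta_{134} X_{t1} X_{t3} X_{t4} + \beta_{234} X_{t2} X_{t3} X_{t4} \\
& + \beta_{1234} X_{t1} X_{t2} X_{t3} X_{t4} + \epsilon
\end{aligned} \tag{7}$$
where β0, β1, β2, β3, β4, β12, β13, β14, β23, β24, β34, β123, β124,
β134, β234, and β1234 define the set of coefficient parameters (B) for
the variables defined above, as listed in Table I. The model
coefficient set B was then estimated by,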
$$\hat{B} \equiv \arg\min_{B} \sum_{t} (Y_t - \hat{Y}_t)^2 \tag{8}$$
Table I summarizes the calculated set of coefficients (B),
along with their corresponding standard error and statistical
significance level, for the derived model. Our derived model
resulted in an adjusted R2 of 0.91 (p < 0.01). We found a
significant interaction among all independent variables for the
inferred accuracy prediction.
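The least-squares fit in (8) for a four-variable model with all interaction terms can be sketched with NumPy. The data below are synthetic, and the 0/1 coding of the fault mode and fault domain is an assumption, since the coding used for Table I is not stated; the helper `design_matrix` is a hypothetical name.

```python
import itertools
import numpy as np

def design_matrix(X):
    """Intercept plus all main effects and interaction products of X's columns."""
    n, k = X.shape
    cols, names = [np.ones(n)], ["b0"]
    for order in range(1, k + 1):
        for idx in itertools.combinations(range(k), order):
            cols.append(np.prod(X[:, list(idx)], axis=1))
            names.append("b" + "".join(str(i + 1) for i in idx))
    return np.column_stack(cols), names

# Full factorial grid: X1 in {1, 2}, X2 and X3 coded 0/1 (assumed), X4 = fault count.
grid = np.array(list(itertools.product(
    [1, 2], [0, 1], [0, 1], [1, 2, 5, 10, 20, 50, 100])), dtype=float)
A, names = design_matrix(grid)

rng = np.random.default_rng(0)
true_beta = rng.normal(size=A.shape[1])             # synthetic ground truth
y = A @ true_beta + 0.01 * rng.normal(size=A.shape[0])

beta_hat, *_ = np.linalg.lstsq(A, y, rcond=None)    # Eq. (8)
resid = y - A @ beta_hat
r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
n, p = A.shape
adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p)       # p counts the intercept
```

The generated coefficient names follow the same order as the rows of Table I (b0, b1, ..., b34, b123, ..., b1234), so a fitted vector can be compared row by row against a table of this form.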
TABLE I
ESTIMATES OF FAULT MODEL COEFFICIENTS FOR SEU- AND MBU-BASED FAULTS.
COEFFICIENTS WITH SIGNIFICANCE LEVELS OF < .05 AND < .01 ARE MARKED
WITH * AND **, RESPECTIVELY.
parameter   estimate   std. error   t-value
β0          74.76**    0.68         110.48
β1          5.37**     0.43         12.49
β2          0.45       0.42         1.07
β3          0.65       0.41         1.58
β4          0.06**     0.01         4.18
β12         -0.19      0.27         -0.7
β13         -0.33      0.27         -0.7
β14         -0.02*     0.01         -2.27
β23         -0.28      0.26         1.11
β24         0.04**     0.01         4.68
β34         -0.06**    0.01         -6.8
β123        0.12       0.16         0.74
β124        -0.02**    0.00         -4.00
β134        0.02**     0.00         4.09
β234        -0.04**    0.00         -7.13
β1234       0.02**     0.00         5.93
Fig. 5. The impact of uniform SEUs/MBUs injection on two individual
groups of layers: (i) the stack of weight layers, and (ii) the stack of
activation layers, across the entire network in both unprotected cnvW1A1 and
unprotected cnvW2A2 networks. The measured impacts are overlaid with the
model predictions.
B. Modeling the Impact of Targeted In-layer fault injection
To evaluate the importance of the layer number in fault modeling, we
conducted a separate experiment (injection of SEU- and MBU-based
uniform faults into the cnvW1A1 and cnvW2A2 network architectures) with
the following independent variables: (1) the quantization level of the
network architecture under study (X1 ∈ {1, 2}), (2) the fault mode
(X2 ∈ {SEU, MBU}), (3) the layer number in which the fault was
injected (X3 ∈ {1, 2, ..., 9}), and (4) the number of faults
(X4 ∈ {5, 10, 50, 100}). Their effect on the inferred accuracy is
hypothesized as,
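$$\begin{aligned}
E[Y_t \mid X] = {} & \beta_0 + \beta_1 X_{t1} + \beta_2 X_{t2} + \beta_3 X_{t3} + \beta_4 X_{t4} \\
& + \beta_{12} X_{t1} X_{t2} + \beta_{13} X_{t1} X_{t3} + \beta_{14} X_{t1} X_{t4} + \beta_{23} X_{t2} X_{t3} + \beta_{24} X_{t2} X_{t4} + \beta_{34} X_{t3} X_{t4} \\
& + \beta_{123} X_{t1} X_{t2} X_{t3} + \beta_{124} X_{t1} X_{t2} X_{t4} + \beta_{134} X_{t1} X_{t3} X_{t4} + \beta_{234} X_{t2} X_{t3} X_{t4} \\
& + \beta_{1234} X_{t1} X_{t2} X_{t3} X_{t4} + \epsilon
\end{aligned} \tag{9}$$
where β0, β1, β2, β3, β4, β12, β13, β14, β23, β24, β34, β123, β124,
β134, β234, and β1234 define the set of coefficient parameters (B) for
the variables defined above, as listed in Table II. The model
coefficient set B was then estimated by,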
$$\hat{B} \equiv \arg\min_{B} \sum_{t} (Y_t - \hat{Y}_t)^2 \tag{10}$$
Table II summarizes the estimated coefficients. The adjusted R2 was
found to be 0.69 (p < 0.01). We hypothesize that the reduction in the
value of R2 is due to the absence of a model variable for the fault
domain in this model. Considering the observed non-uniform behavior
across layers, we examined an entropy-based measure to estimate the
likelihood of a layer being fault-tolerant. The heat-maps in Fig. 6
display the probability distributions of the estimated likelihood of
each layer being fault-tolerant (in 100 trials) for the different
architectures and fault modes. We observe that the first layers are
more sensitive than the last layers. This finding highlights that
assigning equal importance to the parameters across the layers is
inaccurate, an assumption that does not hold according to our
preliminary assessment of layer importance. We therefore suggest
revising the traditional assumption by incorporating our preliminary
findings on layer importance. We refer to this approach as the
entropy-based modeling technique, in which our a priori information
about spatial importance is used to derive a proper design of the
computational networks.
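The text does not give a formula for the entropy-based measure, so the following is only one plausible reading: estimate, for each layer, the probability of an accuracy drop over repeated trials, and summarize the uncertainty of that outcome with the binary Shannon entropy. The trial outcomes below are made up for illustration, and the function names are hypothetical.

```python
import numpy as np

def drop_probability(drops):
    """Per-layer fraction of trials with an accuracy drop (rows = layers)."""
    return np.asarray(drops, dtype=float).mean(axis=1)

def binary_entropy(p):
    """Shannon entropy in bits of a Bernoulli(p) outcome; 0 at p = 0 or p = 1."""
    p = np.asarray(p, dtype=float)
    q = 1.0 - p
    with np.errstate(divide="ignore", invalid="ignore"):
        h = -(np.where(p > 0, p * np.log2(p), 0.0)
              + np.where(q > 0, q * np.log2(q), 0.0))
    return h

# Synthetic outcomes for three layers over 100 trials (1 = accuracy dropped).
drops = np.zeros((3, 100), dtype=int)
drops[0, :90] = 1      # an early layer that is hit in 90 of 100 trials
drops[1, :50] = 1      # a layer whose outcome is maximally uncertain
p_drop = drop_probability(drops)
entropy = binary_entropy(p_drop)
p_tolerant = 1.0 - p_drop   # estimated likelihood of being fault-tolerant
```

Under this reading, a layer with a drop probability near 0 or 1 has low entropy and a predictable response to faults, while a probability near 0.5 gives the highest entropy.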
V. EXPERIMENTAL SETUP
Dataset: In our experiments, we consider the convolutional network
topology (CNV) inspired by BinaryNet [40] and VGG-16 [41]. It consists
of six convolutional layers, three max pool layers, and three
fully-connected layers. The CNN classifies images from the CIFAR-10
image set. Two CNNs were tested: one using 1-bit weights and
activations, and one using 2-bit weights and activations. There are
around 1.6 million and 3.2 million bits susceptible to soft errors in
the 1-bit and 2-bit CNVs, respectively.
Experiment Setup: We extensively modified the BNN-PYNQ project to
perform 2000 fault injections for each scenario in order to collect a
sufficient pool of tests. This pool provides a reasonably comprehensive
sample of the fault behavior under each scenario. The faults are
injected into the targeted parameters either across the entire NN or
within a specific layer. We limit our experiments to classifying 1000
images from CIFAR-10 to shorten the relatively long fault injection
process while still delivering an approximately sufficient
representation of the original dataset. The weights and activations are
heavily reused in the course of the inference operation. They are
stored in on-chip buffers to significantly reduce the memory access
time. We evaluated the sensitivity of the NN accelerator to a range of
fault counts that might occur in the operational lifetime of the
device. The faults are injected at uniformly distributed times either
across the entire NN or within each individual layer during the image
classification. A host program running on the ARM processor initializes
the weights and activations of the network on the FPGA, then prepares
the image set and launches the classification process. During the image
classification, faults are injected by reading from the FPGA memory
used for the CNN parameters, flipping a bit or a word, and then writing
back the result. The outputs of the image classification are then
compared against the correct labels to calculate the accuracy.
TABLE II
ESTIMATES OF FAULT MODEL COEFFICIENTS FOR SEU- AND MBU-BASED FAULTS.
COEFFICIENTS WITH SIGNIFICANCE LEVELS OF < .05 AND < .01 ARE MARKED
WITH * AND **, RESPECTIVELY.
parameter   estimate   std. error   t-value
β0          75.26**    0.65         116.95
β1          5.10**     0.41         12.33
β2          0.79       0.41         1.95
β3          0.08       0.14         0.5
β4          0.06**     0.01         5.56
β12         -0.17      0.26         -0.67
β13         -0.04      0.09         -0.42
β14         -0.01*     0.01         -2.23
β23         -0.12      0.09         -1.38
β24         -0.11**    0.01         -17.33
β34         -0.01**    0.00         -4.07
β123        0.03       0.06         0.44
β124        0.03**     0.00         8.29
β134        0.00       0.00         1.64
β234        0.01**     0.00         12.75
β1234       0.00**     0.00         -5.90
[Figure 6: heat-maps with layer number (1 to 9) on the horizontal axis
and number of faults (5, 10, 50, 100) on the vertical axis; panels
cnvW1A1-SEU, cnvW1A1-MBU, cnvW2A2-SEU, and cnvW2A2-MBU; color scale
from 0.1 to 0.9.]
Fig. 6. The probability of accuracy drop is written in each square based on
the number of accumulated SEU/MBU injections per layer in (a)
unprotected cnvW1A1 and (b) unprotected cnvW2A2.
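The injection procedure described above (uniformly distributed injection times, read-flip-write on the parameter memory, accuracy computed against the labels) can be mimicked in software. The sketch below is a simulation under strong assumptions and does not use the BNN-PYNQ code or any FPGA interface: a toy binarized linear classifier replaces the CNV network, a sign flip models a 1-bit weight upset, and `run_campaign` is a hypothetical name.

```python
import numpy as np

def run_campaign(weights, images, labels, n_faults, rng):
    """Classify images in order while flipping weight signs at random times."""
    w = weights.copy()                      # leave the golden weights intact
    # Uniformly distributed injection times (image indices) and fault sites.
    times = np.sort(rng.integers(0, len(images), size=n_faults))
    sites = rng.integers(0, w.size, size=n_faults)
    correct, next_fault = 0, 0
    for t, (img, label) in enumerate(zip(images, labels)):
        while next_fault < n_faults and times[next_fault] == t:
            # Read, flip, write back: a sign flip models a 1-bit weight upset.
            w.flat[sites[next_fault]] *= -1
            next_fault += 1
        correct += int(np.argmax(img @ w) == label)
    return correct / len(images)

rng = np.random.default_rng(0)
weights = rng.choice([-1, 1], size=(32, 10))        # toy binarized classifier
images = rng.standard_normal((1000, 32))            # 1000 synthetic inputs
# Labels are the fault-free predictions, so the baseline accuracy is 100%.
labels = np.array([np.argmax(img @ weights) for img in images])

baseline = run_campaign(weights, images, labels, 0, rng)
faulty = run_campaign(weights, images, labels, 100, rng)
```

Because faults accumulate in the working copy of the weights, images classified late in the run see more corrupted parameters than early ones, which mirrors the accumulated-fault scenario studied here.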
VI. CONCLUSIONS
Soft errors can change the content of each individual parameter in the
NN topology and, if not mitigated immediately, the accumulated faults
can gradually degrade the functionality of a long-running NN inference
accelerator. This might lead to drastic image misclassification, which
is a serious reliability concern in safety-critical applications. In
this paper, we proposed an entropy-based model to estimate the impact
of soft errors on the NN architecture. Our proposed model delineates
the impact of both SEUs and MBUs across layers and within each layer of
the selected NN. It can be utilized by NN architecture developers to
evaluate the error resiliency of the NN before employing it in
safety-critical applications.
REFERENCES
[1] K. Murakami, N. Irie, and S. Tomita, “Simp (single instruction
stream/multiple instruction pipelining): a novel high-speed single-
processor architecture,” in ACM SIGARCH Computer Architecture News,
vol. 17, no. 3. ACM, 1989, pp. 78–85.
[2] S. Palacharla, N. P. Jouppi, and J. E. Smith, Complexity-effective
superscalar processors. ACM, 1997, vol. 25, no. 2.
[3] K. Asanovic, J. Beck, B. E. Kingsbury, P. Kohn, N. Morgan, and
J. Wawrzynek, “Spert: A vliw/simd microprocessor for artificial neural
network computations,” in [1992] Proceedings of the International
Conference on Application Specific Array Processors. IEEE, 1992,
pp. 178–190.
[4] S. Ciricescu, R. Essick, B. Lucas, P. May, K. Moat, J. Norris,
M. Schuette, and A. Saidi, “The reconfigurable streaming vector pro-
cessor (rsvptm),” in Proceedings of the 36th annual IEEE/ACM Inter-
national Symposium on Microarchitecture. IEEE Computer Society,
2003, p. 141.
[5] M. Sedghi, M. Georgiopoulos, and G. C. Anagnostopoulos, “Sparse
inductive embedding: An explorative data visualization technique,”
in 29th IEEE International Conference on Tools with Artificial
Intelligence, ICTAI 2017, Boston, MA, USA, November 6-8, 2017.
IEEE Computer Society, 2017, pp. 618–622. [Online]. Available:
https://doi.org/10.1109/ICTAI.2017.00099
[6] M. Sedghi, “Learning kernel-based approximate isometries,” 2017.
[7] C.-Y. Lin and B.-C. Lai, “Supporting compressed-sparse activations
and weights on simd-like accelerator for sparse convolutional neural
networks,” in Design Automation Conference (ASP-DAC), 2018 23rd
Asia and South Pacific. IEEE, 2018, pp. 105–110.
[8] A. Parashar, M. Rhu, A. Mukkara, A. Puglielli, R. Venkatesan,
B. Khailany, J. Emer, S. W. Keckler, and W. J. Dally, “Scnn: An
accelerator for compressed-sparse convolutional neural networks,” in
ACM SIGARCH Computer Architecture News, vol. 45, no. 2, 2017, pp.
27–40.
[9] M. Sedghi, G. K. Atia, and M. Georgiopoulos, “A multi-criteria
approach for fast and outlier-aware representative selection from
manifolds,” CoRR, vol. abs/2003.05989, 2020. [Online]. Available:
https://arxiv.org/abs/2003.05989
[10] C. De Sa, M. Feldman, C. Ré, and K. Olukotun, “Understanding and
optimizing asynchronous low-precision stochastic gradient descent,” in
ACM SIGARCH Computer Architecture News, vol. 45, no. 2. ACM,
2017, pp. 561–574.
[11] Y. Chen, T. Luo, S. Liu, S. Zhang, L. He, J. Wang, L. Li, T. Chen,
Z. Xu, N. Sun et al., “Dadiannao: A machine-learning supercomputer,”
in Proceedings of the 47th Annual IEEE/ACM International Symposium
on Microarchitecture. IEEE Computer Society, 2014, pp. 609–622.
[12] S. Gupta, A. Agrawal, K. Gopalakrishnan, and P. Narayanan, “Deep
learning with limited numerical precision,” in International Conference
on Machine Learning, 2015, pp. 1737–1746.
[13] U. Köster, T. Webb, X. Wang, M. Nassar, A. K. Bansal, W. Constable,
O. Elibol, S. Gray, S. Hall, L. Hornof et al., “Flexpoint: An adaptive
numerical format for efficient training of deep neural networks,” in
Advances in neural information processing systems, 2017, pp. 1742–
1752.
[14] S. Wu, G. Li, F. Chen, and L. Shi, “Training and inference with integers
in deep neural networks,” arXiv preprint arXiv:1802.04680, 2018.
[15] N. Wang, J. Choi, D. Brand, C.-Y. Chen, and K. Gopalakrishnan,
“Training deep neural networks with 8-bit floating point numbers,” in
Advances in neural information processing systems, 2018, pp. 7675–
7684.
[16] F. Li, B. Zhang, and B. Liu, “Ternary weight networks,” arXiv preprint
arXiv:1605.04711, 2016.
[17] E. Park, J. Ahn, and S. Yoo, “Weighted-entropy-based quantization
for deep neural networks,” in Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition, 2017, pp. 5456–5464.
[18] C. Zhu, S. Han, H. Mao, and W. J. Dally, “Trained ternary quantization,”
arXiv preprint arXiv:1612.01064, 2016.
[19] S. Jung, C. Son, S. Lee, J. Son, Y. Kwak, J.-J. Han, and C. Choi,
“Joint training of low-precision neural network with quantization interval
parameters,” arXiv preprint arXiv:1808.05779, 2018.
[20] Y. Choi, M. El-Khamy, and J. Lee, “Learning low precision deep neu-
ral networks through regularization,” arXiv preprint arXiv:1809.00095,
2018.
[21] A. Mishra, E. Nurvitadhi, J. J. Cook, and D. Marr, “Wrpn: wide reduced-
precision networks,” arXiv preprint arXiv:1709.01134, 2017.
[22] D. Zhang, J. Yang, D. Ye, and G. Hua, “Lq-nets: Learned quantization
for highly accurate and compact deep neural networks,” in Proceedings
of the European Conference on Computer Vision (ECCV), 2018, pp.
365–382.
[23] I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio,
“Binarized neural networks,” in Advances in neural information processing
systems, 2016, pp. 4107–4115.
[24] A. Roohi et al., “Apgan: Approximate gan for robust low energy learning
from imprecise components,” IEEE Transactions on Computers, pp. 1–1,
2019.
[25] M. Kim and P. Smaragdis, “Bitwise neural networks,” arXiv preprint
arXiv:1601.06071, 2016.
[26] M. Rastegari, V. Ordonez, J. Redmon, and A. Farhadi, “Xnor-net:
Imagenet classification using binary convolutional neural networks,” in
European Conference on Computer Vision. Springer, 2016, pp. 525–
542.
[27] T. Geng, T. Wang, C. Wu, C. Yang, S. L. Song, A. Li, and M. Herbordt,
“Lp-bnn: Ultra-low-latency bnn inference with layer parallelism,” in
2019 IEEE 30th International Conference on Application-specific Systems,
Architectures and Processors (ASAP), vol. 2160. IEEE, 2019,
pp. 9–16.
[28] A. Roohi et al., “Processing-in-memory acceleration of convolutional
neural networks for energy-efficiency, and power-intermittency
resilience,” in ISQED, March 2019, pp. 8–13.
[29] M. Sedghi, G. K. Atia, and M. Georgiopoulos, “Low-dimensional
decomposition of manifolds in presence of outliers,” in 29th IEEE
International Workshop on Machine Learning for Signal Processing,
MLSP 2019, Pittsburgh, PA, USA, October 13-16, 2019. IEEE,
2019, pp. 1–6. [Online]. Available:
https://doi.org/10.1109/MLSP.2019.8918806
[30] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille,
“Deeplab: Semantic image segmentation with deep convolutional nets,
atrous convolution, and fully connected crfs,” IEEE transactions on
pattern analysis and machine intelligence, vol. 40, no. 4, pp. 834–848,
2017.
[31] Y. Kim, “Convolutional neural networks for sentence classification,”
arXiv preprint arXiv:1408.5882, 2014.
[32] S. W.-t. Yih, M.-W. Chang, X. He, and J. Gao, “Semantic parsing
via staged query graph generation: Question answering with knowledge
base,” 2015.
[33] O. Abdel-Hamid, A.-r. Mohamed, H. Jiang, L. Deng, G. Penn,
and D. Yu, “Convolutional neural networks for speech recognition,”
IEEE/ACM Transactions on audio, speech, and language processing,
vol. 22, no. 10, pp. 1533–1545, 2014.
[34] J. Johnson and A. Karpathy, “Stanford cs class cs231n:
Convolutional neural networks for visual recognition,”
http://cs231n.github.io/convolutional-networks/.
[35] “Why do we need to normalize the images before we put
them into cnn?”
https://stats.stackexchange.com/questions/185853/why-do-we-need-to-normalize-the-images-before-we-put-them-into-cnn.
[36] Y. Umuroglu, N. J. Fraser, G. Gambardella, M. Blott, P. Leong, M. Jahre,
and K. Vissers, “Finn: A framework for fast, scalable binarized neural
network inference,” in Proceedings of the 2017 ACM/SIGDA International
Symposium on Field-Programmable Gate Arrays. ACM, 2017,
pp. 65–74.
[37] A. Dixit and A. Wood, “The impact of new technology on soft error
rates,” in 2011 International Reliability Physics Symposium. IEEE,
2011, pp. 5B–4.
[38] H. B. Schirmeier, “Efficient fault-injection-based assessment of
software-implemented hardware fault tolerance,” Ph.D. dissertation,
Technical University Dortmund, Germany, 2016.
[39] A. S. Rakin, Z. He, and D. Fan, “Bit-flip attack: Crushing neural network
with progressive bit search,” arXiv preprint arXiv:1903.12269, 2019.
[40] M. Courbariaux, I. Hubara, D. Soudry, R. El-Yaniv, and Y. Bengio,
“Binarized neural networks: Training deep neural networks with
weights and activations constrained to +1 or -1,” arXiv preprint
arXiv:1602.02830, 2016.
[41] K. Simonyan and A. Zisserman, “Very deep convolutional networks for
large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
