A Survey on Impact of Transient Faults on BNN Inference Accelerators by Khoshavi, Navid et al.
Navid Khoshavi1,2, Connor Broyles2, Yu Bi3
1Department of Computer Science, Florida Polytechnic University
2Department of Electrical and Computer Engineering, Florida Polytechnic University
3Department of Electrical and Computer Engineering, University of Rhode Island
Abstract
In recent years, the philosophy behind designing artificial intelligence algorithms has shifted significantly towards automatically
extracting composable systems from massive data volumes. This paradigm shift has been expedited by the big data boom,
which makes it easy to access and analyze very large data sets. The best-known class of big data analysis
techniques is deep learning. Deep learning models require significant computation power and an extremely high number of memory accesses,
which necessitates novel approaches that reduce memory accesses and improve power efficiency, together with
domain-specific hardware accelerators that can support current and future data sizes and model structures.
Current trends in designing application-specific integrated circuits barely consider the essential requirement of keeping
complex neural network computation resilient in the presence of soft errors. Soft errors might strike either the memory
storage or the combinational logic of the hardware accelerator and affect its architectural behavior such that the precision of the
results falls below the minimum allowable correctness. In this study, we demonstrate that soft errors in a customized
deep learning algorithm called the Binarized Neural Network (BNN) might cause drastic image misclassification. Our experimental results
show that, in the worst-case fault injection scenarios, the accuracy of the image classifier can drop by 76.70% and 19.25% in the
lfcW1A1 and cnvW1A1 networks, which classify the MNIST and CIFAR-10 datasets, respectively.
Index Terms
Fault Injection, Deep Neural Network Accelerator, Machine Learning, Soft Error
I. INTRODUCTION
Deep learning has had a long and rich history. Ever-growing datasets have spurred the rapid development of machine
learning methods. Deep Neural Networks (DNNs), which mimic the activity of the human brain, have emerged as an
effective and efficient method for solving a variety of big data problems. Computer vision research, e.g. on images and
videos, arguably has the largest impact on today’s DNN studies, given that humans rely heavily on sight for
information [1]. Self-driving research is one popular application that particularly leverages advances in computer
vision. DNNs have also opened avenues in other fields of interest, such as natural language processing, speech
recognition, robotics, biomedical applications, and many more.
The superior accuracy of DNNs, however, comes at the cost of high computational complexity and considerable off-chip
memory accesses. Because DNNs demand thousands of parallel computations, Graphics Processing Units (GPUs) have
mainly been adopted as the high-performance hardware platform, while Field-Programmable Gate Arrays (FPGAs) are employed in
less compute-intensive platforms such as in-car computers. In addition, as DNN algorithms have progressed, researchers have
increasingly worked on specialized hardware designs that accelerate neural network inference.
To be specific, building and using a DNN model involves two primary phases: training and inference. Training
a DNN model requires a very large dataset and a great deal of computation power, which makes this process slow.
Nevertheless, once training is complete, the model can quickly perform predictions on a given input
as long as re-training is not required. For instance, Google has deployed its in-house hardware accelerator, the Tensor
Processing Unit (TPU), in its datacenters to speed up DNN applications.
It is worth noting that a parallel approach is to reduce the data size while maintaining performance. The work in [2] is a preeminent
study of representative selection along this line. So as not to compromise training power, the authors
propose capturing the global structure of huge datasets through non-linear manifolds, hence offering strong generalization
capabilities. The method has shown improved performance over state-of-the-art subset selection methods while bringing
about a substantial speed-up.
Another effort in the category of inference accelerators is the Binarized Neural Network (BNN) inference accelerator
[3]. A BNN compresses the network into a reduced memory footprint by representing all input activations, weights,
and output activations with 1 or 2 bits. Although this data representation significantly reduces costly off-chip
memory accesses, it also increases the impact of transient faults on the results. In particular, condensing the 32-bit floating-point
numbers originally used to represent weights and activations into a few bits carries the risk of losing the
whole tensor's information if the representative BNN bits flip due to soft errors. Soft errors are triggered by high-energy particles
striking transistors; they can cause outlier contamination and malfunctions such as flipped bit values in sequential logic and glitches in
combinational logic [4]. Machine learning models have been shown to be drastically distorted in the presence of outliers [5],
arXiv:2004.05915v1 [cs.LG] 10 Apr 2020
[Figure 1 diagram labels: INPUT; Data Processing; CNN; Classifier; Learned Network Parameters; DDR Memory; BIT File; FPGA; Domain-specific HW Accelerator with processing elements (PE = Processing Element); OUTPUT: "The object is a camel." and "Abrupt brake!!!"]
Fig. 1. Soft error impacts on different locations in a DNN accelerator might result in image misclassification. This might cause the self-driving car to
brake hard if images are used to define the driving actions (adapted from [9]).
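[Figure 2 diagram: inputs $X_0, X_1, X_2$ from the input layer are multiplied by the synapse weights $W_0, W_1, W_2$, and the neuron outputs $f\left(\sum_i X_i W_i + b\right)$, where $f$ is the activation function.]
Fig. 2. Human brain-inspired neuron model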
[6]. Moreover, a BNN inference accelerator is expected to remain functional for a long period, so
accumulated soft errors can gradually degrade its output accuracy. In particular, if a soft error corrupts
data that is reused later in the dataflow of the BNN accelerator, the contaminated data will pollute all remaining
steps of the computation. For instance, as illustrated in Fig. 1, soft errors can strike different locations in a BNN accelerator
deployed in a self-driving car. Such an incident might result in image misclassification during a safety-critical mission
and might have potentially dangerous consequences. In this paper, we present a comprehensive study of the
impact of soft errors on BNN accelerators. Specifically, our contributions in this study are as follows:
• We investigate the behavior of two well-known categories of soft errors, Single-Event Upset (SEU) and Multi-Bit Upset
(MBU), across time and space on the combinational logic and memory storage of a BNN accelerator.
• We examine the effect of soft errors on the different network topologies, on the data types used to represent the weights
and activations, and on the different layers. This part of our study identifies the parameters most vulnerable to soft errors in a
BNN inference accelerator.
• We propose a fault injection scenario on a modified version of the FINN framework [7]. Because it targets the architectural
BNN accelerator, our approach is significantly more accurate than existing software-level fault injection such as [8].
We inject the faults uniformly across time and space on a Xilinx Zynq-7000 ARM/FPGA SoC while the board
is executing the classification workload.
• We demonstrate that, in the worst-case scenarios, the classification accuracy of the BNN accelerator can drop by 76.70% and 19.25% in
the lfcW1A1 and cnvW1A1 networks, respectively, in the presence of soft errors.
The remainder of the paper is organized as follows. Sec. II presents the preliminaries of this work. Sec. III presents the experimental
setup and results. Sec. IV reviews related work. Finally, Sec. V concludes the paper.
II. PRELIMINARIES
A. Neural/Deep Neural/Convolutional Networks
A Neural Network (NN) is a computing system inspired by the human brain that is composed of three layers: an input layer,
a hidden layer, and an output layer. Each of these layers is composed of neurons, while the connections between the layers
are considered synapses. A neuron in an NN, as illustrated in Fig. 2, is effectively a function that takes the outputs of all
the neurons in the previous layer and produces an activation; neurons in the input layer simply provide their input values.
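The neuron model of Fig. 2 can be sketched in a few lines of Python. This is only an illustration of the weighted-sum-plus-activation idea; the function names and the example values are assumptions made here, not part of the original study.

```python
import math

def neuron(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(weighted_sum)

def relu(z):
    """Rectified linear unit: max(0, z)."""
    return max(0.0, z)

def sigmoid(z):
    """Logistic activation, shown as an alternative choice of f."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical example values for X_i, W_i and b.
x = [1.0, 0.5, -2.0]
w = [0.2, -0.4, 0.1]
b = 0.3
activation_value = neuron(x, w, b, relu)
```

Swapping `relu` for `sigmoid` changes only the activation function `f`; the weighted sum stays the same.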
A Deep Neural Network (DNN) is simply an NN that has more than one hidden layer and is thus distinguished by its depth
(i.e. the number of layers). Each layer of neurons in a DNN learns a distinct set of features
based on the previous layer’s output. A Convolutional Neural Network (CNN) is a type of DNN that processes
information through a hierarchy of layers and is prominent in a variety of applications such as image processing,
sentence classification, semantic parsing, and speech recognition [1]. A CNN contains several components
that support the convolution operations. As illustrated in Fig. 3, these include different types of layers such as convolutional,
fully connected, and pooling layers. The content of the input layer depends on the application. For example, if a
CNN is used for image classification, the input layer holds the raw pixel values of the image [10]. The input layer feeds the
CONV layer, which consists of four sub-layers: 1) the convolution sub-layer performs a dot product between the weights of
the regionally-connected neurons and their input sets to compute the output; 2) the non-linear sub-layer uses a ReLU activation
function to map the weighted sum of the regionally-connected neurons to max(0, weighted sum); 3) the
normalization sub-layer scales the range of the feature value distributions to prevent the learning process from over-compensating
a correction in one weight dimension while under-compensating in another [11]; and 4) the pool sub-layer reduces the spatial
size to decrease the number of parameters and computations in the network [10]. The Fully Connected
(FC) layer flattens the output of the previous CONV layer into a vector that represents the list of feature values. Next, this vector
is passed through a stack of fully connected layers to produce a set of votes. The majority of votes determines the class
scores.
Fig. 3. ConvNet architecture for image classification applications (adapted from [12]).
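As a rough illustration of the CONV sub-layers described above, the following pure-Python sketch chains a convolution (a sliding dot product), a ReLU, and a 2x2 max pool on a tiny feature map. The normalization sub-layer is omitted for brevity, and the image, the kernel, and all function names are assumptions chosen for the example.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image and take a dot product at each position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def relu_map(fmap):
    """Apply max(0, v) to every element of the feature map."""
    return [[max(0, v) for v in row] for row in fmap]

def max_pool2x2(fmap):
    """Keep the maximum of each non-overlapping 2x2 window."""
    return [
        [max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
         for c in range(0, len(fmap[0]) - 1, 2)]
        for r in range(0, len(fmap) - 1, 2)
    ]

# Hypothetical 5x5 single-channel image and 2x2 kernel.
image = [
    [1, 2, 0, 1, 3],
    [0, 1, 2, 3, 1],
    [1, 0, 1, 2, 0],
    [2, 1, 0, 1, 1],
    [0, 3, 1, 0, 2],
]
kernel = [[1, -1], [-1, 1]]

conv_out = conv2d_valid(image, kernel)      # 4x4 feature map
pooled = max_pool2x2(relu_map(conv_out))    # 2x2 feature map
```

The pooling step shrinks the 4x4 map to 2x2, which is the reduction in spatial size that the pool sub-layer is meant to provide.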
B. Binarized Neural Network (BNN)
In BNNs, all weights and activations are quantized to one or two bits at the cost of a small sacrifice in classification
accuracy [3]. As a result, the parameters of a BNN occupy a smaller memory footprint that can fit
in the on-chip memory of the BNN accelerator.
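To make the 1-bit representation concrete, the sketch below binarizes real values to +1/-1 with a sign function, packs them into a machine word, and computes a dot product with XNOR and a population count, the formulation commonly used for binarized networks such as [3]. The helper names and example values are assumptions; the last lines show how a single flipped weight bit changes the result, which is why such compact encodings are sensitive to soft errors.

```python
def binarize(values):
    """Deterministic sign binarization: +1 for v >= 0, otherwise -1."""
    return [1 if v >= 0 else -1 for v in values]

def pack_bits(signs):
    """Pack +1/-1 values into an integer; bit i is 1 when signs[i] is +1."""
    word = 0
    for i, s in enumerate(signs):
        if s == 1:
            word |= 1 << i
    return word

def binary_dot(a_word, w_word, n):
    """Dot product of two packed +1/-1 vectors of length n via XNOR and popcount."""
    xnor = ~(a_word ^ w_word) & ((1 << n) - 1)
    matches = bin(xnor).count("1")
    return 2 * matches - n   # matches contribute +1, mismatches -1

# Hypothetical full-precision weights and activations.
weights = [0.7, -1.2, 0.05, -0.3]
activations = [0.4, 0.9, -0.6, -0.1]

w_word = pack_bits(binarize(weights))
a_word = pack_bits(binarize(activations))
clean = binary_dot(a_word, w_word, len(weights))

# A single-bit upset in the packed weights shifts the dot product by 2.
faulty = binary_dot(a_word, w_word ^ 0b0001, len(weights))
```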
C. DNN Accelerator
Advances in deep neural networks have stimulated the study of novel hardware architectures to meet the high demand for
computation power and memory bandwidth. For instance, a modern CNN requires billions of floating-point operations
to classify a single image [7]. This not only requires a customized computation-centric architecture but also
necessitates the removal of off-chip memory bottlenecks by keeping the DNN parameters on-chip. Furthermore,
DNN inference must be executed in a fraction of a second in safety-critical applications such as autonomous
vehicles [13]. This urges researchers to explore novel hardware optimizations to meet latency constraints.
Conventional DNN accelerators are equipped with arrays of processing elements and multiple on-chip buffers. The processing
elements enable concurrent execution of sparsely dependent multiply-accumulate (MAC) operations. The on-chip buffers store
the input feature maps, weights, partial sums, and output feature maps. Although large feature maps and
weights might deprive the DNN of the low access latency of on-chip storage, the temporal and spatial locality
observed in the feature maps and weights makes on-chip caching effective. In addition, modern DNN accelerators
embed a large on-chip storage to avoid expensive off-chip memory traffic and to keep the feature
maps and weights near the processing elements [13]. Beyond these, various techniques such as compression [14], pruning [15],
and reduced precision [16] have been devised to improve the performance of DNN accelerators and to amortize their
energy consumption overhead. Nevertheless, current trends in designing DNN accelerators barely consider the essential
requirement of keeping the sophisticated processing elements and the on-chip buffers resilient in the presence of
soft errors.
In this study, we target a well-known category of DNN accelerators, the BNN inference accelerator. A BNN stores the
weights and activations in 1-bit and 2-bit data types to significantly reduce the memory required for storing the network.
This makes it possible to dedicate on-chip memory exclusively to the BNN information, which
significantly reduces off-chip memory accesses.
III. EVALUATION
A. Experimental Setup
In our experiments, we consider two different BNN topologies for evaluating fault injection:
• The convolutional network topology (cnv) is inspired by BinaryNet [3] and VGG-16 [17] and
consists of 6 convolutional layers, 3 max pool layers, and 3 fully connected layers. This topology is used to classify
the CIFAR-10 dataset and comes in two variants that differ in the data representation of weights and activations:
cnvW1A1 uses 1 bit to store each of these parameters, while cnvW2A2 uses a 2-bit data type. There
are around 1.6 million and 3.2 million bits susceptible to soft errors in the W1A1 and W2A2 topologies,
respectively.
• The LFC topology (lfc) consists of four fully connected layers with 1024 neurons per layer. This network classifies the
MNIST dataset. For this topology, there are around 3 million bits susceptible to soft errors in both the lfcW1A1 and
lfcW1A2 networks.
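The bit counts above can be sanity-checked with simple arithmetic. The sketch below assumes, purely for illustration, that the lfc network maps 784 MNIST pixels through three 1024-neuron layers to 10 outputs and that one bit is stored per weight; these layer sizes are an assumption and are not stated in the text. It also compares the footprint of roughly 1.6 million values stored as 1-bit versus 32-bit quantities.

```python
def fc_weight_count(layer_sizes):
    """Number of weights in a stack of fully connected layers (biases ignored)."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))

def footprint_bytes(n_values, bits_per_value):
    """Memory needed to store n_values at the given bit width."""
    return n_values * bits_per_value / 8

# Assumed layer sizes: 28x28 MNIST input, three hidden layers, 10 classes.
ASSUMED_LFC_SIZES = [784, 1024, 1024, 1024, 10]
lfc_weight_bits = fc_weight_count(ASSUMED_LFC_SIZES) * 1   # 1 bit per weight

# Approximate count of susceptible bits quoted for cnvW1A1.
CNV_W1A1_VALUES = 1_600_000
binary_footprint = footprint_bytes(CNV_W1A1_VALUES, 1)     # bytes at 1 bit each
float_footprint = footprint_bytes(CNV_W1A1_VALUES, 32)     # bytes at 32 bits each
```

Under these assumptions the lfc weight count comes to about 2.9 million bits, which is consistent with the "around 3 million" figure, and the 1-bit encoding is 32 times smaller than a 32-bit encoding of the same number of values.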
Note that binarization is not applied to the inputs fed to the first layer or to the outputs
extracted from the last layer. Furthermore, since collecting meaningful data through fault injection is a relatively long process,
we limited our experiments to classifying 1000 images from CIFAR-10. This reduces the time window
required for classification. In each fault injection test, we distribute the faults uniformly across the memory/logic space and time
to represent a realistic scenario of the effect of soft errors on a BNN inference accelerator.
We run our experiments on an FPGA-processor co-design, the Xilinx Zynq-7000 ARM/FPGA SoC, which combines a dual-core ARM
Cortex-A9 processor with Xilinx 7-series FPGA logic. The FINN framework presented in [7] is used for BNN
inference acceleration, classifying a group of images. This process is executed on the Cortex-A9 cores. FINN reads the
images into the shared DRAM and launches the accelerator for classification. We used Vivado HLS and Vivado
to synthesize the corresponding bitfile.
B. Fault Injection Scenarios on BNN Inference Accelerator
Soft errors might strike either the memory storage or the combinational logic of the hardware accelerator and cause the precision
of the results to fall below the minimum allowable correctness. For instance, a classifier misprediction due to a soft error
in a CNN running on a hardware accelerator, as illustrated in Fig. 1, is an intolerable incident in mission-critical
applications.
In this study, we assume that the effect of soft errors is uniformly distributed across space and time, which is in line with
the studies presented in [18]. Thus, we selected a fault space with a uniform distribution over the period of workload execution
and over random locations in the targeted units. We investigated not only the impact of Single-Event Upsets (SEUs)
but also how Multi-Bit Upsets (MBUs) affect the output of the accelerator. Our motivation for examining
both SEUs and MBUs is the study in [18], which shows that even though SEUs still dominate transient faults,
the ratio of MBUs has increased significantly in recent years. The reason behind this relative shift is technology scaling,
which delivers transistors with reduced dimensions. Hence, radiation-induced transient faults with less energy are able
to upset the critical charge of adjacent transistors, which inverts the stored bits of adjacent cells and
causes glitches in the combinational logic [4]. To mimic the behavior of an MBU, we flip a burst of
8 bits, in line with the study in [19]. We disregard faults that might occur in the
combinational logic and the control logic units since they are less sensitive to soft errors [20]. For the sake of simplicity, we
assume that the CPU, main memory, and the memory bus are resilient to soft errors.
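The two fault models can be illustrated with a minimal software sketch that flips bits in a byte buffer standing in for the on-chip memory. It assumes that an MBU flips 8 adjacent bits starting at a uniformly chosen position; the function names and the buffer are hypothetical and do not reproduce the in-house injection script used in this study.

```python
import random

def inject_seu(memory, rng):
    """Single-Event Upset: flip one uniformly chosen bit; returns its index."""
    bit = rng.randrange(len(memory) * 8)
    memory[bit // 8] ^= 1 << (bit % 8)
    return [bit]

def inject_mbu(memory, rng, burst=8):
    """Multi-Bit Upset: flip a burst of adjacent bits (assumed contiguous)."""
    total_bits = len(memory) * 8
    start = rng.randrange(total_bits - burst + 1)
    flipped = list(range(start, start + burst))
    for bit in flipped:
        memory[bit // 8] ^= 1 << (bit % 8)
    return flipped

def pick_injection_time(rng, duration):
    """Uniformly distributed injection time within the workload duration."""
    return rng.uniform(0.0, duration)

# Hypothetical 64-byte buffer representing a region of on-chip storage.
rng = random.Random(42)
buffer_seu = bytearray(64)
buffer_mbu = bytearray(64)
seu_bits = inject_seu(buffer_seu, rng)
mbu_bits = inject_mbu(buffer_mbu, rng)
```

Because an MBU burst may straddle a byte or word boundary, it can corrupt values that belong to neighboring parameters as well as the targeted one.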
We examine the effect of soft errors on the network topology and on the data type used to represent the weights and activations,
similar to the work presented in [8]. The weights and activations are stored in on-chip buffers and are heavily reused over
the period of BNN execution. To be specific, we targeted the following parameters in the network topology:
• Weights: The BNN accelerator exclusively performs inference operations on a large volume of images for classification.
Since the weights of the BNN are set during the training session, any change in the weights might lead to fluctuations in
the classification accuracy for the rest of the BNN execution on the FPGA.
• Activations: As mentioned before, the BNN that we target in our study has already been trained. Thus, a soft error
in the activations of a pre-trained BNN manifests itself as a variation in the classification accuracy. Since the corrupted
activations will be reused in the rest of the workload execution, we expect the classification accuracy to drop.
• Layers: The convolutional network topology is inspired by BinaryNet [3] and VGG-16 [17] and
consists of 6 convolutional layers, 3 max pool layers, and 3 fully connected layers. Since each category of layers has
different parameters, we examine the impact of soft errors on each category separately. We did not run these experiments
on the lfc network due to its simple network topology.
We performed 2000 fault injections for each scenario shown in Table I to collect a sufficient pool of samples for determining the
vulnerability of the different network topologies to soft errors in the BNN inference accelerator. To be specific, the faults are injected
into the targeted parameters, which are located at known memory addresses that can be accessed through our in-house fault injection
script. The injection is performed on the fly, meaning that the faults are injected while FINN is running on the Zynq-7000
SoC board. We evaluated the sensitivity of the accelerator to a range of fault counts that might occur during the operational
lifetime of the device. As listed in Table I, we assessed the impact of injecting 1, 2, 5, 10, 20, 50, and 100 faults to highlight
the impact of accumulated soft errors, which is fairly realistic in DNN inference accelerators. We re-initialized the network to its
default state after each fault injection test.
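The structure of such a campaign can be sketched as follows. This is a toy software model and not the in-house script: a byte buffer stands in for the accelerator memory, a sample counts as correctly classified when the byte it reads still holds its fault-free value, faults arrive at uniformly random times during the workload, and the memory is reset before every test. All names and the toy dataset are assumptions.

```python
import random

FAULT_COUNTS = (1, 2, 5, 10, 20, 50, 100)   # fault counts listed in the text

def flip_random_bit(memory, rng):
    """Flip one uniformly chosen bit of the memory buffer."""
    bit = rng.randrange(len(memory) * 8)
    memory[bit // 8] ^= 1 << (bit % 8)

def run_one_test(golden, dataset, n_faults, rng):
    """Classify the dataset once while n_faults bit flips arrive on the fly."""
    memory = bytearray(golden)               # reset to the fault-free state
    fault_times = sorted(rng.randrange(len(dataset)) for _ in range(n_faults))
    next_fault = 0
    correct = 0
    for t, (address, label) in enumerate(dataset):
        # Apply every fault scheduled at or before this point in the workload.
        while next_fault < n_faults and fault_times[next_fault] <= t:
            flip_random_bit(memory, rng)
            next_fault += 1
        if memory[address] == label:         # toy stand-in for a classification
            correct += 1
    return correct / len(dataset)

def run_campaign(golden, dataset, fault_counts, tests_per_scenario, seed=0):
    """Return the baseline accuracy and the accuracy drops per fault count."""
    rng = random.Random(seed)
    baseline = run_one_test(golden, dataset, 0, rng)
    drops = {}
    for n_faults in fault_counts:
        drops[n_faults] = [
            baseline - run_one_test(golden, dataset, n_faults, rng)
            for _ in range(tests_per_scenario)
        ]
    return baseline, drops

# Toy setup: 16 bytes of "parameters" and 1000 samples that read them in turn.
golden = bytearray(range(16))
dataset = [(i % 16, i % 16) for i in range(1000)]
baseline, drops = run_campaign(golden, dataset, FAULT_COUNTS, tests_per_scenario=20)
```

The study repeats each scenario 2000 times; the example uses 20 repetitions only to keep the run short.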
C. Results
We evaluated the vulnerability of the BNN accelerator to soft errors by injecting 1, 2, 5, 10, 20, 50, and 100
accumulated faults during the operational lifetime of the accelerator at random times and random locations in cnvW1A1, cnvW2A2,
TABLE I
FAULT INJECTION IMPACT ON BNN CLASSIFICATION ACCURACY*
(Acc. Red. = Accuracy Reduction; Eff. = Effective Faults (%))
Network Topology | # of injected faults | Weight SEU Acc. Red. | Weight SEU Eff. | Weight MBU Acc. Red. | Weight MBU Eff. | Activation SEU Acc. Red. | Activation SEU Eff. | Activation MBU Acc. Red. | Activation MBU Eff.
cnvW1A1 | 1 Fault | -0.00143 (-0.18%) | 28.0 | -0.00132 (-0.16%) | 38.0 | 0.0001 (0.01%) | 62.0 | 0.00056 (0.07%) | 66.0
cnvW1A1 | 2 Faults | 0.00021 (0.03%) | 39.0 | -0.00131 (-0.16%) | 59.0 | -0.00034 (-0.04%) | 77.0 | 0.00096 (0.12%) | 75.0
cnvW1A1 | 5 Faults | -0.00036 (-0.04%) | 76.0 | -0.00093 (-0.12%) | 84.0 | 0.00159 (0.20%) | 87.0 | 0.00224 (0.28%) | 88.0
cnvW1A1 | 10 Faults | -0.00083 (-0.10%) | 83.0 | -0.00141 (-0.18%) | 88.0 | 0.00345 (0.43%) | 94.0 | 0.00482 (0.60%) | 97.0
cnvW1A1 | 20 Faults | 0.00097 (-0.12%) | 86.0 | -0.00297 (-0.37%) | 96.0 | 0.00762 (0.95%) | 98.0 | 0.00883 (1.10%) | 95.0
cnvW1A1 | 50 Faults | -0.00219 (-0.27%) | 94.0 | -0.00346 (-0.43%) | 98.0 | 0.01963 (2.44%) | 99.0 | 0.03740 (4.65%) | 100.0
cnvW1A1 | 100 Faults | -0.00244 (-0.30%) | 94.0 | -0.00361 (-0.45%) | 95.0 | 0.05270 (6.54%) | 100.0 | 0.06787 (8.43%) | 100.0
cnvW2A2 | 1 Fault | -0.00032 (-0.04%) | 22.0 | -0.00162 (-0.19%) | 21.0 | -0.00148 (-0.17%) | 29.0 | -0.00058 (-0.07%) | 38.0
cnvW2A2 | 2 Faults | -0.00192 (-0.23%) | 38.0 | -0.00155 (-0.18%) | 56.0 | -0.00118 (-0.14%) | 65.0 | -0.00123 (-0.14%) | 65.0
cnvW2A2 | 5 Faults | -0.00193 (-0.23%) | 74.0 | -0.00207 (-0.24%) | 85.0 | -0.00216 (0.25%) | 90.0 | -0.00190 (-0.22%) | 86.0
cnvW2A2 | 10 Faults | -0.00266 (-0.31%) | 85.0 | -0.00315 (-0.37%) | 88.0 | -0.00085 (-0.10%) | 91.0 | -0.00064 (-0.08%) | 90.0
cnvW2A2 | 20 Faults | -0.00395 (-0.46%) | 97.0 | -0.00373 (-0.44%) | 96.0 | 0.00054 (0.06%) | 95.0 | -0.00058 (0.07%) | 91.0
cnvW2A2 | 50 Faults | -0.00403 (-0.47%) | 96.0 | -0.00404 (-0.47%) | 97.0 | 0.00193 (0.23%) | 94.0 | 0.00232 (0.27%) | 97.0
cnvW2A2 | 100 Faults | -0.00489 (-0.57%) | 97.0 | -0.00378 (-0.44%) | 96.0 | 0.00711 (0.83%) | 97.0 | 0.00601 (0.70%) | 92.0
lfcW1A1 | 1 Fault | 0.00005 (0.006%) | 18.0 | 0.00003 (0.003%) | 16.0 | 0.00002 (0.002%) | 46.0 | 0.00001 (0.001%) | 51.0
lfcW1A1 | 10 Faults | 0.00003 (0.004%) | 69.0 | 0.00004 (0.004%) | 72.0 | 0.00061 (0.062%) | 77.0 | 0.01199 (1.22%) | 78.0
lfcW1A1 | 100 Faults | 0.00040 (0.04%) | 93.0 | 0.00030 (0.03%) | 89.0 | 0.05058 (5.14%) | 94.0 | 0.06466 (6.57%) | 96.0
lfcW1A2 | 1 Fault | -0.00014 (-0.01%) | 20.0 | -0.00012 (-0.01%) | 18.0 | -0.00013 (-0.01%) | 10.0 | -0.00014 (-0.01%) | 17.0
lfcW1A2 | 10 Faults | -0.00013 (-0.01%) | 79.0 | -0.00012 (-0.01%) | 69.0 | -0.00013 (-0.01%) | 82.0 | 0.00037 (0.04%) | 68.0
lfcW1A2 | 100 Faults | 0.00450 (0.46%) | 88.0 | 0.00078 (0.08%) | 86.0 | 0.00031 (0.03%) | 87.0 | 0.00055 (0.06%) | 88.0
*These results were collected by performing 2000 fault injection tests for each scenario. In particular, we assessed the impact of injecting 1, 2, 5, 10, 20, 50, and 100 faults during the operational lifetime of the
accelerator in cnvW1A1, cnvW2A2, lfcW1A1, and lfcW1A2 to highlight the impact of accumulated soft errors. For lfcW1A1 and lfcW1A2, we report only the effect of injecting 1, 10, and 100 faults. For instance,
the last row of the table indicates that 100 faults are injected at random times and random locations while the classification algorithm is using the lfcW1A2 network to classify the images. The Effective Faults (%) column
indicates the percentage of test runs in which the injected faults caused the overall accuracy to differ from the baseline.
Fig. 4. The distribution of classification accuracy degradation in the presence of various faults in a cnvW1A1 network. The baseline classifies 1000 images
with 80.5% accuracy. The plot is sectioned by the number of injected faults. In the leftmost plot, the bars that show the min/max
classification accuracy due to the effect of SEUs on weights and activations share the same color within each section. A different color is used for
plotting the effects of MBUs on weights and activations in each section.
Fig. 5. The distribution of classification accuracy degradation in the presence of various faults in an lfcW1A1 network. The baseline classifies 1000 images
with 98.4% accuracy. The outliers and the accuracy distribution due to the effect of faults are illustrated with color-coded dots.
lfcW1A1, and lfcW1A2 under two well-known categories of soft errors, SEU and MBU, as listed in Table I. Each fault injection
scenario was repeated for 2000 rounds, and the classification accuracy degradation was averaged across the fault injection tests
of each scenario. Based on our experimental results, we made the following observations:
• Reducing the number of bits used to store the network information increases the vulnerability of the accelerator to soft
errors. This effect has been highlighted with three different colors in the Activation column of Table I. Even though
representing weights and activations with one bit significantly reduces the memory size, it also drastically increases
the negative effect of soft errors. As an example, the classification accuracy dropped by 8.43% after injecting 100
MBUs in the activation layers of cnvW1A1 during the lifetime of the accelerator, whereas the same experiment
[Figure 6 plots: four panels titled "cnvW1A1 Accuracy - 5 SEUs", "cnvW1A1 Accuracy - 10 SEUs", "cnvW1A1 Accuracy - 50 SEUs", and "cnvW1A1 Accuracy - 100 SEUs"; x-axis: Layer (1 to 9); y-axis: Classification Accuracy (63% to 83%).]
Fig. 6. The effect of different number of accumulated SEUs on different layers of cnvW1A1.
on cnvW2A2 caused less than 1% accuracy degradation.
• The activation layer functions are significantly vulnerable to both SEUs and MBUs. As shown in the Activation column
of Table I, faults that directly flipped bits of the activation layers resulted in a significant
drop in classification accuracy. This is most noticeable when more than 20 accumulated faults
are injected. We have color-coded six cells in the Activation column of Table I to emphasize the drastic
variations in classification accuracy for these cases.
• As the number of accumulated faults in a fault injection test increases, the classification accuracy
decreases. For instance, the classification accuracy of a cnvW1A1 network can drop from 80.5% to 72.0% on
average in the scenario where 100 accumulated faults are injected during the lifetime of the BNN accelerator.
• The effect of MBUs is relatively higher than that of SEUs. An MBU causes a set of bits to be flipped. This not only
changes the targeted parameter but also corrupts adjacent bits in other tensors, which escalates the
impact of the soft error.
• Even though the average degradation of classification accuracy might not seem significant in all scenarios, we observed that
the accelerator can suffer from drastic misclassification in the worst-case scenarios. Fig. 4 and Fig. 5
illustrate the variance of the changes in classification accuracy under different scenarios. Based on Fig. 5 (lfcW1A1 -
Activation SEU), the lfcW1A1 network experiences a very wide range of misclassifications. For instance, the accuracy
of the image classifier can drop by 76.7% (from 98.4% to 22.92%) in lfcW1A1. This is the worst-case scenario,
in which 100 SEUs are injected during the workload operation.
• We conducted the fault injection tests on different layers of cnvW1A1 and cnvW2A2 with various numbers of accumulated faults. The
results, shown in Fig. 6, indicate that the vulnerability of a layer appears to be directly related to how early it appears in the network, with
the first layer being by far the most vulnerable.
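The quantities reported in Table I and in the observations above can be computed as sketched below. The definition of the Effective Faults percentage follows the table footnote. Reading the parenthesized percentages as the absolute accuracy drop divided by the baseline accuracy is an inference made here: it reproduces several table entries and the 76.7% worst case to within rounding, but the text does not state it explicitly. The function names and the sample run accuracies are hypothetical.

```python
def mean_accuracy_reduction(baseline, accuracies):
    """Average absolute accuracy drop across the fault injection tests."""
    return sum(baseline - a for a in accuracies) / len(accuracies)

def effective_fault_percentage(baseline, accuracies):
    """Percentage of tests whose accuracy differs from the baseline."""
    differing = sum(1 for a in accuracies if a != baseline)
    return 100.0 * differing / len(accuracies)

def relative_reduction_percent(absolute_drop, baseline):
    """Accuracy drop expressed as a percentage of the baseline accuracy."""
    return 100.0 * absolute_drop / baseline

# Hypothetical accuracies from four test runs against an 80.5% baseline.
runs = [0.805, 0.805, 0.800, 0.750]
avg_drop = mean_accuracy_reduction(0.805, runs)
effective = effective_fault_percentage(0.805, runs)

# Values from Table I and the captions of Fig. 4 and Fig. 5.
cnv_mbu_100 = relative_reduction_percent(0.06787, 0.805)         # cnvW1A1, 100 MBUs
lfc_mbu_100 = relative_reduction_percent(0.06466, 0.984)         # lfcW1A1, 100 MBUs
lfc_worst = relative_reduction_percent(0.984 - 0.2292, 0.984)    # lfcW1A1 worst case
```

Under this reading, the 76.7% worst-case figure corresponds to a drop of about 75.5 percentage points, from 98.4% to 22.92%.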
IV. RELATED WORKS
The authors of [21] explored the potential of fault injection attacks that cause a DNN to misclassify a given input. Their work assumes that
faults are systematically effective and can modify parameters at the algorithm level, which leads to changes in the
bias and the targeted layers of the DNN. However, this approach lacks a realistic fault injection scenario. Transient faults, or
soft errors, are uniformly distributed across space and time, which makes their effect on the hardware unpredictable.
Furthermore, the precise memory fault injection attacks that are assumed, such as laser beam fault injection [22] and the row hammer
attack [23], might not reproduce the same results in each attack because the parameters of the underlying device change
over the period of the test [24], [25]. The authors of [8] used a DNN simulator to run fault injections. They modified
a framework written in C++ to model the DNN hardware accelerator and to study the impact of faults on the underlying
microarchitecture. Again, this approach does not provide a realistic framework for exploring the impact of faults on a DNN
accelerator. On the other hand, the authors of [13] proposed a circuit-level fault injector to mimic the effect of particle
strikes on one or multiple nodes. Even though this approach can accurately simulate the effect of faults on the components of the
circuit, its high latency and the complexity of simulating an architectural DNN accelerator make it undesirable for researchers
who study the resiliency of DNN accelerators as an overarching system.
V. CONCLUSIONS
In this paper, we showed that using compression techniques to reduce the memory size of a DNN significantly
increases the vulnerability of some network parameters to soft errors. Our realistic FPGA-based fault injection
method showed that the activation layer functions are significantly more vulnerable to both SEUs and MBUs than the weight
layers. We also demonstrated that MBUs have a relatively higher impact on the accelerator. Furthermore, soft errors have
a greater effect on the layers that appear earlier in the network. If the accumulated soft errors in the accelerator are
not decontaminated, they might result in drastic accuracy degradation in the output of the workloads.
REFERENCES
[1] O. Abdel-Hamid, A.-r. Mohamed, H. Jiang, L. Deng, G. Penn, and D. Yu, “Convolutional neural networks for speech recognition,” IEEE/ACM Transactions
on audio, speech, and language processing, vol. 22, no. 10, pp. 1533–1545, 2014.
[2] M. Sedghi, G. K. Atia, and M. Georgiopoulos, “A multi-criteria approach for fast and outlier-aware representative selection from manifolds,” CoRR,
vol. abs/2003.05989, 2020. [Online]. Available: https://arxiv.org/abs/2003.05989
[3] M. Courbariaux, I. Hubara, D. Soudry, R. El-Yaniv, and Y. Bengio, “Binarized neural networks: Training deep neural networks with weights and
activations constrained to +1 or -1,” arXiv preprint arXiv:1602.02830, 2016.
[4] N. Khoshavi, X. Chen, J. Wang, and R. F. DeMara, “Bit-upset vulnerability factor for edram last level cache immunity analysis,” in 2016 17th International
Symposium on Quality Electronic Design (ISQED). IEEE, 2016, pp. 6–11.
[5] M. Sedghi, G. K. Atia, and M. Georgiopoulos, “Robust manifold learning via conformity pursuit,” IEEE Signal Process. Lett., vol. 26, no. 3, pp.
425–429, 2019. [Online]. Available: https://doi.org/10.1109/LSP.2019.2893064
[6] ——, “Kernel coherence pursuit: A manifold learning-based outlier detection technique,” in 52nd Asilomar Conference on Signals, Systems, and
Computers, ACSSC 2018, Pacific Grove, CA, USA, October 28-31, 2018, M. B. Matthews, Ed. IEEE, 2018, pp. 2017–2021. [Online]. Available:
https://doi.org/10.1109/ACSSC.2018.8645334
[7] Y. Umuroglu, N. J. Fraser, G. Gambardella, M. Blott, P. Leong, M. Jahre, and K. Vissers, “Finn: A framework for fast, scalable binarized neural network
inference,” in Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. ACM, 2017, pp. 65–74.
[8] G. Li, S. K. S. Hari, M. Sullivan, T. Tsai, K. Pattabiraman, J. Emer, and S. W. Keckler, “Understanding error propagation in deep learning neural
network (dnn) accelerators and applications,” in Proceedings of the International Conference for High Performance Computing, Networking, Storage
and Analysis. ACM, 2017, p. 8.
[9] G. Lacey, G. W. Taylor, and S. Areibi, “Deep learning on fpgas: Past, present, and future,” arXiv preprint arXiv:1602.04283, 2016.
[10] J. Johnson and A. Karpathy, “Stanford cs class cs231n: Convolutional neural networks for visual recognition,”
http://cs231n.github.io/convolutional-networks/.
[11] “Why do we need to normalize the images before we put them into cnn?”
https://stats.stackexchange.com/questions/185853/why-do-we-need-to-normalize-the-images-before-we-put-them-into-cnn.
[12] V. Sze, Y.-H. Chen, T.-J. Yang, and J. S. Emer, “Efficient processing of deep neural networks: A tutorial and survey,” Proceedings of the IEEE, vol.
105, no. 12, pp. 2295–2329, 2017.
[13] A. Azizimazreah, Y. Gu, X. Gu, and L. Chen, “Tolerating soft errors in deep learning accelerators with reliable on-chip memory designs,” in 2018 IEEE
International Conference on Networking, Architecture and Storage (NAS). IEEE, 2018, pp. 1–10.
[14] C.-Y. Lin and B.-C. Lai, “Supporting compressed-sparse activations and weights on simd-like accelerator for sparse convolutional neural networks,” in
Design Automation Conference (ASP-DAC), 2018 23rd Asia and South Pacific. IEEE, 2018, pp. 105–110.
[15] A. Parashar, M. Rhu, A. Mukkara, A. Puglielli, R. Venkatesan, B. Khailany, J. Emer, S. W. Keckler, and W. J. Dally, “Scnn: An accelerator for
compressed-sparse convolutional neural networks,” in ACM SIGARCH Computer Architecture News, vol. 45, no. 2, 2017, pp. 27–40.
[16] C. De Sa, M. Feldman, C. Ré, and K. Olukotun, “Understanding and optimizing asynchronous low-precision stochastic gradient descent,” in ACM
SIGARCH Computer Architecture News, vol. 45, no. 2. ACM, 2017, pp. 561–574.
[17] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” in arXiv preprint arXiv:1409.1556, 2014.
[18] A. Dixit and A. Wood, “The impact of new technology on soft error rates,” in 2011 International Reliability Physics Symposium. IEEE, 2011, pp.
5B–4.
[19] H. B. Schirmeier, “Efficient fault-injection-based assessment of software-implemented hardware fault tolerance,” Ph.D. dissertation, Technical University
Dortmund, Germany, 2016.
[20] N. Seifert, B. Gill, S. Jahinuzzaman, J. Basile, V. Ambrose, Q. Shi, R. Allmon, and A. Bramnik, “Soft error susceptibilities of 22 nm tri-gate devices,”
IEEE Transactions on Nuclear Science, vol. 59, no. 6, pp. 2666–2673, 2012.
[21] Y. Liu, L. Wei, B. Luo, and Q. Xu, “Fault injection attack on deep neural network,” in 2017 IEEE/ACM International Conference on Computer-Aided
Design (ICCAD), Nov 2017, pp. 131–138.
[22] A. Barenghi, L. Breveglieri, I. Koren, and D. Naccache, “Fault injection attacks on cryptographic devices: Theory, practice, and countermeasures,”
Proceedings of the IEEE, vol. 100, no. 11, pp. 3056–3076, 2012.
[23] Y. Kim, R. Daly, J. Kim, C. Fallin, J. H. Lee, D. Lee, C. Wilkerson, K. Lai, and O. Mutlu, “Flipping bits in memory without accessing them: An
experimental study of dram disturbance errors,” in ACM SIGARCH Computer Architecture News, vol. 42, no. 3. IEEE Press, 2014, pp. 361–372.
[24] N. Khoshavi, R. A. Ashraf, R. F. DeMara, S. Kiamehr, F. Oboril, and M. B. Tahoori, “Contemporary cmos aging mitigation techniques: Survey, taxonomy,
and methods,” Integration, vol. 59, pp. 10 – 22, 2017. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0167926017301876
[25] N. Khoshavi, R. A. Ashraf, and R. F. DeMara, “Applicability of power-gating strategies for aging mitigation of cmos logic paths,” in 2014 IEEE 57th
International Midwest Symposium on Circuits and Systems (MWSCAS). IEEE, 2014, pp. 929–932.
