The Study of Transient Faults Propagation in Multithread Applications by Khoshavi, Navid & Samiei, Armin
The Study of Transient Faults Propagation
in Multithread Applications
Navid Khoshavi
∗
University of Central Florida
nkhoshavi@eecs.ucf.edu
Armin Samiei
†
University of Central Florida
samiei2@gmail.com
ABSTRACT
Because contemporary Error Correcting Code (ECC) designs occupy a
significant fraction of the total die area in chip multiprocessors
(CMPs), alternative approaches are sought to deal with the growing
vulnerability of CMP architectures to Single Event Upsets (SEUs) and
Multi-Bit Upsets (MBUs). In this paper, we focus on the reliability
assessment of multithreaded applications running on CMPs in order to
propose an adaptive, application-relevant architecture design that
accommodates the impact of both SEUs and MBUs across the entire CMP
architecture.
This work concentrates on leveraging the intrinsic soft-error immunity
of Spin-Transfer Torque RAM (STT-RAM) as an alternative to SRAM-based
storage and operation components. We target a specific portion of the
working set for reallocation to improve the reliability level of the
CMP architecture design. The instructions of a multithreaded program
that are referenced at a high rate and modify memory the least are
ideal candidates to be stored and executed in STT-RAM based
components. We argue why STT-RAM cannot be used for all storage and
operation components and describe the resiliency obtained compared to
the baseline setup. In addition, a detailed study of the impact of
SEUs and MBUs on multithreaded programs is presented in the Appendix.
Keywords
multi-threaded applications; on-line testing; soft error; single
event upset; multiple bit upset; fault-tolerant systems; memory
structure; emerging technology
∗Navid Khoshavi is a Ph.D. student in the department of electronic
engineering and computer science at University of Central Florida. He
received his M.S. from Amirkabir University of Technology (AUT)
(Tehran Polytechnic), Iran, in 2012.
†Armin Samiei is an M.S. student in the department of Computer
Science at University of Central Florida. He received his B.S. from
University of Shiraz, Iran, in 2012.
ACM ISBN 978-1-4503-2138-9.
DOI: 10.1145/1235
1. INTRODUCTION
To keep performance improving within a given power budget, the ITRS
technology roadmap recommends moving toward many-core processors,
which offer reduced power consumption and execution time. To maximize
the benefit of many-core processors, thread-level parallelism has been
introduced as an essential counterpart in multicore programming [5].
Meanwhile, advances in CMOS technology have reduced transistor sizes
and voltage levels, which significantly increases the occurrence of
transient faults in microprocessors. In particular, given that roughly
50% of the chip is occupied by memory structures, the memory modules
become highly susceptible to soft errors [13].
Soft errors, also referred to as Single Event Upsets (SEUs), are
induced by energetic particles that penetrate the silicon substrate
and generate electron-hole pairs along their tracks. If the charge
collected at a cell junction is larger than the critical charge (QC),
it can flip the cell state [7], [9]. Smaller device sizes and reduced
supply voltages have severely increased the impact of SEUs in
deep-submicron technologies, as they aggressively reduce the critical
charge of memory cells [8], [6]. Thus, memory cells have become more
sensitive to atmospheric neutrons as well as to alpha particles, which
are emitted by unstable isotopes in the materials of a chip.
Furthermore, technology scaling and high-precision manufacturing
techniques have decreased the distance between adjacent memory cells,
so a single particle strike passing through adjacent cells in a row
can flip more than one cell. This phenomenon is called a Multiple-Bit
Upset (MBU) [4]. The main issue in handling MBUs is that existing
Error Correcting Codes (ECC) cannot cope with them, because the effect
of a soft error that flips more than one memory cell is unpredictable.
To maintain acceptable reliability levels, state-of-the-art MBU
protection techniques have recently received significant attention as
a way to protect modern multicore architectures from the potential
write and read failures introduced by soft errors.
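The limitation of single-error-correcting codes under MBUs can be illustrated with a small sketch. The code below is not the ECC of any particular CMP; it is a textbook Hamming(7,4) code chosen here only for brevity, and the function names are hypothetical. A single flipped bit is located by the syndrome and corrected, whereas two flipped bits produce a syndrome that points at a third, healthy bit, so the decoder silently returns wrong data.

```c
#include <assert.h>
#include <stdint.h>

/* Encode 4 data bits into a 7-bit Hamming codeword.
 * Bit i of the result (0-based) is codeword position i+1;
 * positions 1, 2 and 4 hold parity, positions 3, 5, 6 and 7 hold data. */
uint8_t hamming74_encode(uint8_t data)
{
    assert(data < 16);
    uint8_t d0 = data & 1, d1 = (data >> 1) & 1;
    uint8_t d2 = (data >> 2) & 1, d3 = (data >> 3) & 1;
    uint8_t p1 = d0 ^ d1 ^ d3;  /* checks positions 1, 3, 5, 7 */
    uint8_t p2 = d0 ^ d2 ^ d3;  /* checks positions 2, 3, 6, 7 */
    uint8_t p4 = d1 ^ d2 ^ d3;  /* checks positions 4, 5, 6, 7 */
    return (uint8_t)(p1 | (p2 << 1) | (d0 << 2) | (p4 << 3) |
                     (d1 << 4) | (d2 << 5) | (d3 << 6));
}

/* Decode a 7-bit codeword, correcting at most one flipped bit.
 * The syndrome is the XOR of the positions of all set bits; it is 0
 * for a valid codeword and equals the position of a single flipped bit. */
uint8_t hamming74_decode(uint8_t cw)
{
    assert(cw < 128);
    uint8_t syndrome = 0;
    for (uint8_t pos = 1; pos <= 7; pos++)
        if ((cw >> (pos - 1)) & 1)
            syndrome ^= pos;
    if (syndrome != 0)
        cw ^= (uint8_t)(1u << (syndrome - 1));
    return (uint8_t)(((cw >> 2) & 1) | (((cw >> 4) & 1) << 1) |
                     (((cw >> 5) & 1) << 2) | (((cw >> 6) & 1) << 3));
}
```

For example, the data value 11 encodes to 0x55. Flipping one bit (0x55 ^ 0x04) still decodes to 11, while flipping two adjacent bits (0x55 ^ 0x0C) decodes to a different value without any error indication.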
One intuitive solution is to replace the traditional, vulnerable
SRAM-based memory technology with soft-error-immune memory modules
such as Spin-Transfer Torque Magnetic Random Access Memory (STT-RAM).
STT-RAM offers high density, low standby power, nonvolatility and soft
error resiliency [14][21]. Recent research shows that, owing to its
intrinsic immunity, STT-RAM experiences a soft error rate several
orders of magnitude lower than that of SRAM [17][15].
arXiv:1607.08523v1 [cs.AR] 28 Jul 2016
The authors of [15] proposed an architectural solution that makes the
L1 cache resilient to radiation-induced soft errors by using STT-RAM
as an alternative to traditional SRAM. However, that work concentrates
only on the impact of soft errors in the L1 cache, where the lifetime
of a cache line is extremely short, which means that an error caused
by a particle strike may not have enough time to be either consumed by
the CPU or propagated to the lower levels of the memory hierarchy. On
the other hand, the authors of [14] studied the impact of using 3D
stacked STT-RAM caches on the reliability of the whole cache
hierarchy. They proposed a set of configurations for the cache
hierarchy and compared them with respect to performance, power
consumption and reliability. Their results show that replacing
memories with an STT-RAM alternative can significantly mitigate soft
errors while offering a slight performance improvement.
In this work, we show that not every memory component in the processor
core needs to be replaced by STT-RAM, for the following reasons:
• The call frequency is not the same for every instruction of the
program. For example, 95% of a program's execution time may be spent
in a single loop, which implies that this portion of the program is
more susceptible to soft errors if a generated error is consumed by
the CPU before it is masked by a write operation.
• The instructions running on the CPU show varying sensitivity to soft
errors. For example, if a bit cell in the register file used by a
sensitive instruction is flipped by a particle strike, the error is
immediately propagated into the multithreaded program, which may
result in a rapid crash of the program.
• STT-RAM suffers from long write latency and high write energy, which
impose extra overhead on write-intensive workloads in terms of reduced
performance and high dynamic energy consumption.
The ideal instructions to benefit from soft-error-immune STT-RAM
memory components are those that are referenced at a high rate and
modify memory the least. Thus, a comprehensive study is required to
identify both the instructions that are highly sensitive to soft
errors, whether SEUs or MBUs, and the frequently called instructions
with the lowest memory modification in the system. To protect this set
of instructions, we propose to use STT-RAM memory modules to keep the
reliability level of the system within an acceptable error margin.
Accordingly, the trade-off among reliability, performance, and power
consumption of the proposed technique compared to traditional
methodologies will be explored.
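The selection criterion described above (high reference rate, low memory modification) can be sketched as a simple filter over profile counters. This is only an illustration under stated assumptions: the struct, the function names and both thresholds are hypothetical and are not taken from the paper, which does not specify how the counters are obtained or which cut-off values are used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical profile counters for one instruction or function. */
struct access_profile {
    uint64_t references;  /* how often the item is referenced */
    uint64_t writes;      /* how often it modifies memory */
};

/* Returns true when the item is referenced at least min_refs times and
 * its write share (writes / references) is at most max_write_ratio.
 * Both thresholds are assumptions chosen by the caller. */
bool is_sttram_candidate(const struct access_profile *p,
                         uint64_t min_refs, double max_write_ratio)
{
    assert(p != NULL);
    assert(max_write_ratio >= 0.0);
    if (p->references == 0 || p->references < min_refs)
        return false;
    return (double)p->writes / (double)p->references <= max_write_ratio;
}

/* Counts how many items in the array satisfy the criterion. */
size_t count_sttram_candidates(const struct access_profile *items, size_t n,
                               uint64_t min_refs, double max_write_ratio)
{
    assert(items != NULL || n == 0);
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (is_sttram_candidate(&items[i], min_refs, max_write_ratio))
            count++;
    return count;
}
```

With min_refs = 100 and max_write_ratio = 0.05, an item with 1000 references and 10 writes qualifies, while an item with 1000 references and 600 writes is rejected because of the STT-RAM write penalty, and an item with only 5 references is rejected as too cold to matter.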
The outline of the rest of the paper is as follows: we ex-
amine the runtime behavior of multithreaded programs in
Section 2. The proposed execution model assumed in this
paper will be presented in Section 3. The experimental re-
sults will be discussed in Section 4. Finally, we conclude the
paper in Section 5.
2. THE RUNTIME PROGRAM BEHAVIOR
The runtime behavior of a program must be investigated to determine
which functions or instructions are ideal candidates to be mapped
onto soft-error-resilient STT-RAM components.
2.1 The Frequency Call of Instructions
To investigate the instruction references in multithreaded programs,
we used the Visual Studio 2013 Profiler in instrumentation mode with
no special optimizations on an Intel i7 with 8 GB of RAM. The report
contains the following values:
• "Elapsed Inclusive Time %": the percentage of time spent executing a
function and its child functions.
• "Elapsed Exclusive Time %": the percentage of time spent executing a
function excluding its child functions.
• "Avg. Elapsed Inclusive Time": the average time spent executing a
function and its child functions.
• "Avg. Elapsed Exclusive Time": the average time spent executing a
function excluding its child functions.
Table 1 shows the most used functions in each benchmark and the
results gathered from instrumentation.
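The relation between the inclusive and exclusive metrics can be made concrete with a short sketch. The helper names below are hypothetical and this is not the profiler's implementation; it only restates the definitions: exclusive time is inclusive time minus the time spent in child functions, and a percentage is the share of the total run time.

```c
#include <assert.h>
#include <stddef.h>

/* Exclusive time of a function: its inclusive time minus the inclusive
 * time of each of its direct child calls. */
double exclusive_time(double inclusive, const double *child_inclusive,
                      size_t n_children)
{
    assert(child_inclusive != NULL || n_children == 0);
    double t = inclusive;
    for (size_t i = 0; i < n_children; i++)
        t -= child_inclusive[i];
    return t;
}

/* Share of the total run time, in percent. */
double percent_of_total(double time, double total)
{
    assert(total > 0.0);
    return 100.0 * time / total;
}
```

As a consistency check against Table 1, the blackscholes entry for mainCRTStartup reports an average exclusive time of 56.05 s out of 162.89 s inclusive, and percent_of_total(56.05, 162.89) evaluates to about 34.41, which matches the reported "Elapsed Exclusive Time %".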
2.2 The Sensitivity Analysis of Instructions to
Soft Errors
We define the resilience of an application to soft errors as its
ability to tolerate hardware faults, if they occur, without producing
an incorrect output. Incorrect outputs are also known as Silent Data
Corruptions (SDCs). For this group of failures, there is no dedicated
indication that the application has malfunctioned (unlike a crash or
a hang, where either an exception is raised or a timeout occurs).
Since we are primarily interested in evaluating the soft error
immunity of applications, we only inject faults into the program's
data or instructions that are visible at the assembly code or higher
levels, rather than into the micro-architectural structures.
Accordingly, we classify the outcome of activated faults based on the
program's behavior into the following categories:
• Crash: the program is terminated by the OS due to an exception.
• Silent Data Corruption (SDC): the output is incorrect, and no
mechanism reports the impact of the fault propagation.
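The classification above can be sketched as a comparison against a fault-free ("golden") run. The enum and function names are hypothetical, and the third category (a run that finishes with the expected output) is an assumption added here for completeness; the paper itself only names the crash and SDC categories.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum fault_outcome {
    OUTCOME_CRASH,   /* run was terminated by the OS due to an exception */
    OUTCOME_SDC,     /* run finished, but its output differs from the golden run */
    OUTCOME_BENIGN   /* assumption: run finished with the expected output */
};

/* Classify one fault-injection run. `output` is the text produced by the
 * faulty run and may be NULL when the run crashed; `golden_output` is the
 * text produced by a fault-free run of the same program. */
enum fault_outcome classify_outcome(bool terminated_by_os,
                                    const char *output,
                                    const char *golden_output)
{
    assert(golden_output != NULL);
    if (terminated_by_os)
        return OUTCOME_CRASH;
    assert(output != NULL);
    if (strcmp(output, golden_output) != 0)
        return OUTCOME_SDC;
    return OUTCOME_BENIGN;
}
```

The crash check comes first because a crashed run may also have produced partial output; counting it as an SDC would mix the two categories.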
Figure 1 shows the impact of SEUs on the benchmark suite, which uses
the POSIX Pthreads standard [1] for creating and handling threads.
Most SEUs in the specrand and factorial benchmarks result in SDCs in
our system, while the other benchmarks often crash when a soft error
occurs. Furthermore, Figure 2 shows that the rate of crashes or SDCs
increases significantly as the number of flipped bits increases. These
results confirm our previous statement that MBUs are a major
reliability issue in current multi-core systems, which demand a
comprehensive solution for mitigating them.
To determine the instructions most sensitive to soft errors, we target
various types of instructions. We noticed that pthread-related code
fragments show high vulnerability to soft errors. This means that
pthread-related code fragments should be mapped onto soft-error-immune
storage and operation components. For further information, please
refer to Appendix A.
3. THE PROPOSED EXECUTION MODEL
To fully benefit from the intrinsic soft error resiliency of STT-RAM,
we also need to thoroughly explore the other aspects of using STT-RAM
instead of SRAM, to make sure that the performance requirements and
energy constraints are met. This section first describes the structure
of STT-RAM and the steps associated with each read or write operation.
Then, the hardware/software model assumed in our system and the
particular instruction protection strategies applied to our programs
are presented.
Table 1: The function reference computation using Visual Studio 2013 profiler instrumentation
Benchmark | Function Name | Number of Calls | Elapsed Inclusive Time % | Elapsed Exclusive Time % | Avg. Elapsed Inclusive Time (sec) | Avg. Elapsed Exclusive Time (sec)
blackscholes | mainCRTStartup | 1 | 100 | 34.41 | 162.89 | 56.05
blackscholes | bs_thread | 1 | 34 | 34 | 55.38 | 55.38
specrand | printf | 1,002 | 99.9 | 99.9 | 0.15 | 0.15
specrand | mainCRTStartup | 1 | 99.98 | 0.05 | 152.25 | 0.08
mm | printf | 8 | 51.93 | 51.93 | 0.06 | 0.06
mm | pthread_join | 2 | 21.39 | 21.39 | 0.1 | 0.1
mm | pthread_create | 2 | 14.76 | 14.76 | 0.07 | 0.07
qs | printf | 5 | 43.91 | 43.91 | 0.1 | 0.1
qs | pthread_join | 2 | 24.74 | 24.74 | 0.14 | 0.14
qs | pthread_create | 2 | 17.12 | 17.12 | 0.1 | 0.1
factorial | printf | 1 | 78.09 | 78.09 | 0.23 | 0.23
factorial | CxxSetUnhandledExceptionFilter | 1 | 9.86 | 9.86 | 0.03 | 0.03
circular buffer | pthread_join | 2 | 47.17 | 47.17 | 0.12 | 0.12
circular buffer | pthread_create | 2 | 35.41 | 35.41 | 0.09 | 0.09
stack | pthread_join | 2 | 55.09 | 55.09 | 0.19 | 0.19
stack | pthread_create | 2 | 32.52 | 32.52 | 0.11 | 0.11
[Figure 1 plot: y-axis: # of propagated faults (0 to 1000); legend: crashed, SDC; panel title: single fault injection into all instructions]
Figure 1: Aggregated single fault injection results with LLFI for all operation instructions
[Figure 2 plot: y-axis: # of propagated faults (0 to 1000); legend: crashed, SDC; panel title: multiple fault injection into all instructions]
Figure 2: Aggregated multiple fault injection results with LLFI for all operation instructions
3.1 The Spin-Transfer Torque Magnetic Random Access Memory (STT-RAM)
As illustrated in Figure 3, STT-RAM stores data in magnetic elements
called magnetic tunneling junctions (MTJs), in which a thin insulating
oxide layer, e.g. MgO, is sandwiched between two ferromagnetic layers.
The upper ferromagnetic layer is usually called the free layer; the
polarity of its magnetic field can be flipped during a write event.
The lower ferromagnetic layer, usually called the pinned layer, is
designed to have its magnetization pinned. The MTJ has low (high)
resistance if the magnetizations of the free layer and the pinned
layer are aligned (anti-aligned). Accordingly, data is stored in the
MTJ as a low (high) resistance state, instead of as the traditional
electronic charge or current flow.
For a read operation, a small current is driven from the bit line to
the source line. Unlike a read operation, a successful write operation
requires a current driven either from the bit line to the source line
or vice versa, depending on the differential voltage between these two
lines. Although STT-RAM does not suffer from limited write endurance,
its long write latency and high energy consumption degrade its
performance. Figure 4 shows the write latency for various cache module
configurations among three well-known memory technologies, eDRAM,
STT-RAM, and SRAM, obtained from NVSim [3] for the 45nm technology
node. This comparison shows that STT-RAM is not a good candidate for
small memory modules due to its long write latency compared to the
other technologies. This means that the proposed architecture should
use STT-RAM carefully for small storage elements such as the register
arrays in the processor's pipeline.
[Figure 3 diagram labels: Bit Line, Source Line, Word Line, MTJ (free layer, pinned layer), Transistor, Sense Amp., Ref., Write Pulse/Read Bias Generator]
Figure 3: An illustration of a 1T1J STT-RAM cell.
[Figure 4 plot: x-axis: Capacity; y-axis: ln(Write Latency), ns; series: eDRAM, STT-RAM, SRAM; data labels shown in the plot: 4ns, 0.5ns, 148ns, 13ns, 36ns, 5ns, 5.5ns]
Figure 4: The write latency of STT-RAM compared to other memory technologies.
3.2 Soft Error Resilient Architecture Configuration
We want to make both the highly referenced storage elements and the
operations in the proposed architecture immune to soft error strikes.
The proposed hardware and software interface builds on previous works
that enable programmers to map nonvolatile data onto nonvolatile main
memory [2] [18]. These techniques consist of language, compiler, and
runtime system support to manage nonvolatile data. We extended the
previous frameworks to allow programmers to use special keywords and
library calls to handle data that requires soft-error-immune storage
components. In particular, we used the sensitivity analysis of
instructions to soft errors and the rate of instruction referencing to
determine which set of instructions needs to be mapped onto
soft-error-resilient components.
As illustrated in Figure 5, the proposed storage is offered in the
form of both reliable and unreliable storage components. The reliable
storage components are made of STT-RAM, offering high resiliency to
soft errors, while traditional SRAM cells are used for the unreliable
storage elements. Specifically, reliable and unreliable registers are
distinguished by register number. Reliable data stored in memory is
distinguished from unreliable data by region of the physical memory
address space. This can be done with the approach proposed in [10],
where reliable data is linked to a reserved virtual address space
(reliable space). When a reliable address is accessed, the data is
stored in the reliable portions of the data cache. Note that hybrid
SRAM and STT-RAM cache configuration techniques [16] [19] have been
proposed in the past and can be used to allocate data across the
hybrid cache hierarchy.
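The two distinguishing rules above (register number for registers, address region for memory) can be sketched as two predicates. All constants below are hypothetical placeholders chosen for illustration; the paper does not give the base address or size of the reliable space, nor the number of reliable registers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout; none of these values come from the paper. */
#define RELIABLE_BASE      UINT64_C(0x0000700000000000)  /* start of reliable space */
#define RELIABLE_SIZE      UINT64_C(0x0000000040000000)  /* 1 GiB reserved region */
#define NUM_REGISTERS      32u
#define FIRST_RELIABLE_REG 24u  /* registers 24..31 are STT-RAM backed */

/* True when the address lies inside the reserved reliable region. */
bool is_reliable_address(uint64_t addr)
{
    return addr >= RELIABLE_BASE && addr - RELIABLE_BASE < RELIABLE_SIZE;
}

/* True when the register number selects a reliable register. */
bool is_reliable_register(unsigned reg)
{
    assert(reg < NUM_REGISTERS);
    return reg >= FIRST_RELIABLE_REG;
}
```

Writing the upper bound as `addr - RELIABLE_BASE < RELIABLE_SIZE` avoids the overflow that `addr < RELIABLE_BASE + RELIABLE_SIZE` could cause for regions placed near the top of the address space.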
For reliable operations, specific keywords and library calls such as
"nv_pthread_t" (a nonvolatile version of pthread_t) are available to
programmers for reliable integer ALU operations as well as reliable
floating point operations. Reliable instructions can use special
functional units, which are made of STT-RAM and offer soft error
resiliency. Note that even if a field that should be mapped to
soft-error-immune storage ends up stored in unreliable memory, it will
still be loaded into reliable registers and be subject to reliable
operations.
4. EXPERIMENTAL RESULTS
To evaluate the proposed technique, an LLVM-based fault injection tool
called LLFI [20] was used to inject transient faults into the
multithreaded programs on a multi-core system (Intel(R) Core(TM)
i7-4770 CPU @ 3.40GHz, RAM=6GB, OS=Linux Ubuntu 14.04.3 LTS). LLFI
works at the level of the LLVM compiler's intermediate representation
(IR) and allows fault injections to be performed at specific program
points and into particular instructions. LLFI supports various fault
injection customizations and enables tracing the propagation of a
fault among the instructions of the program.
The steps required to inject faults using LLFI are as follows:
• In Step 1, LLFI takes the program IR as input and applies custom
fault injection instruction and operand selectors to determine which
instructions/operands are fault injection candidates.
• In Step 2, LLFI instruments the selected instructions/operands with
calls to fault injection functions. The fault injection functions are
designed to perturb the specific instruction operand according to the
specified fault type at runtime (e.g. flip one bit of the operand for
bit-flip faults).
• In Step 3, the compiled program is executed, and LLFI randomly
selects one runtime instance of the instrumented instructions to
trigger the fault injection function and inject the fault into the
selected instruction operand value.
Because hardware faults occur randomly at runtime, LLFI
picks a random instruction from the set of all dynamically
executed instructions at runtime to inject into.
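The runtime part of this procedure (Steps 2 and 3) can be sketched as follows. This is not LLFI's actual API; the struct and function names are hypothetical and only mirror the idea that every instrumented operand passes through an injection function, which perturbs exactly one pre-selected dynamic instance. The caller is assumed to draw target_instance at random from the range of dynamically executed instances.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flip a single bit of a 32-bit operand (single-bit fault). */
uint32_t flip_bit(uint32_t value, unsigned bit)
{
    assert(bit < 32);
    return value ^ (UINT32_C(1) << bit);
}

/* Flip `count` adjacent bits starting at `first_bit` (multi-bit fault). */
uint32_t flip_adjacent_bits(uint32_t value, unsigned first_bit, unsigned count)
{
    assert(count >= 1 && first_bit < 32 && count <= 32 - first_bit);
    uint32_t mask = (count == 32)
        ? UINT32_MAX
        : ((UINT32_C(1) << count) - 1) << first_bit;
    return value ^ mask;
}

/* Hypothetical injector state for one program run. */
struct injector {
    uint64_t target_instance;  /* dynamic instance selected for injection */
    uint64_t seen;             /* instrumented instances executed so far */
    unsigned bit;              /* bit position to flip */
};

/* Called for every instrumented operand; returns the operand unchanged
 * except for the one selected dynamic instance, which gets a bit flip. */
uint32_t maybe_inject(struct injector *inj, uint32_t operand)
{
    assert(inj != NULL);
    uint64_t index = inj->seen++;
    return index == inj->target_instance ? flip_bit(operand, inj->bit)
                                         : operand;
}
```

Keeping the random choice outside the injector makes a run reproducible: re-running with the same target_instance and bit replays the same fault, which is useful when tracing how it propagates.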
The benchmark suite used in our experiments is as follows:
• blackscholes: The blackscholes application is an Intel RMS
benchmark. It calculates the prices for a portfolio of European
options analytically with the Black-Scholes partial differential
equation (PDE).
• specrand: The benchmark simply generates a sequence of pseudorandom
numbers starting with a known seed.
• Matrix Multiplication (mm): A simple matrix multiplication program
in which the main thread makes slave threads responsible for computing
each element of the product separately and concurrently.
[Figure 5, left panel: Source Code]
    int main(void)
    {
        int x;
        nv_int y;
        int A[100];
        nv_int B[100];
        nv_pthread_t id1, id2;
        …
        nv_pthread_mutex_init(&m, 0);
        nv_pthread_create(&id1, NULL, t1, &queue);
        …
        return 0;
    }
[Figure 5, right panel: Hardware; labels: Registers (Int, FP), CPU, Functional Units (Int, FP), Cache Module with L1 D (STT-RAM, SRAM), L1 I (STT-RAM, SRAM) and L2 (STT-RAM, SRAM), Memory (STT-RAM, SRAM)]
Figure 5: HW/SW model assumed in our system. The green areas are made of soft-error-resilient memory
module.
• Quick Sort (qs): A simple quick sort program in which the main
thread first partitions the 100-element array of integers into two
parts by performing one round of the quick sort algorithm, then
assigns each sub-array to a slave thread in order to sort the parts
separately and simultaneously.
• factorial: It computes the product of all positive integers less
than or equal to n.
• circular buffer: It simulates a buffer using shared variables to
synchronize receive and send operations.
• stack: A program simulating a stack data structure.
Since we are interested in studying the impact of MBUs on existing CMP
architectures, we injected 1000 multiple-bit faults into the hardware
model assumed in our system. The results show that the proposed
architecture design is on average 30% more resilient to soft errors,
as shown in Figures 6, 7 and 8. These results demonstrate the
efficiency of the proposed method in handling soft errors across the
entire CMP architecture. For most of the benchmarks, replacing a
portion of the memory modules reduces the period during which the data
in the storage components is exposed to particle strikes. In the cache
hierarchy, using STT-RAM to hold frequently read cache lines in the
high-density low-level caches significantly increases the reliability
level of the system. The main reason for the high vulnerability of
traditional low-level caches to soft errors is that a data block can
reside in the last-level cache for millions of cycles between two
consecutive accesses [11].
In the pipeline stages, replacing a portion of the unreliable
registers and functional units with reliable elements eliminates a
large portion of the faults in operation components. However, soft
errors in logic components can still propagate in our proposed
approach. Nonetheless, this portion of faults is relatively small
compared to the faults that exclusively target storage and operation
components.
5. CONCLUSIONS
In this paper, we focused on leveraging the intrinsic
soft-error-immunity characteristic of STT-RAM as an alternative to
SRAM-based storage and operation components. We showed that a specific
portion of the working set in multithreaded programs is an ideal
candidate to be stored and executed in STT-RAM based components. By
doing so, the proposed CMP architecture is on average 30% more
resilient to soft errors.
[Figure 6 plot: y-axis: # of propagated faults (0 to 1000); legend: crashed, SDC; panel title: multiple fault injection into all instructions]
Figure 6: Aggregated results for multiple fault injection into all operation instructions of the proposed HW model.
[Figure 7 plot: y-axis: # of propagated faults (0 to 1000); legend: crashed, SDC; panel title: multiple fault injection into arithmetic instructions]
Figure 7: Aggregated results for multiple fault injection into arithmetic operation instructions of the proposed HW model.
[Figure 8 plot: y-axis: # of propagated faults (0 to 1000); legend: crashed, SDC; panel title: multiple fault injection into load/store instructions]
Figure 8: Aggregated results for multiple fault injection into load/store operation instructions of the proposed HW model.
6. FUTURE WORK
We are still looking for a methodology to determine the instructions
that are referenced at a high rate and modify memory the least. To
this end, we have started to look into Intel VTune [12], which
provides rich insight into CPU performance, threading performance and
scalability, bandwidth, caching and much more. We expect to better
determine the candidates for STT-RAM elements by obtaining the rate of
memory modification caused by executing each instruction. In addition,
another section needs to be added to the experimental results to show
the energy consumption and performance benefit of the proposed method
compared to the traditional approach.
7. REFERENCES
[1] D. R. Butenhof. Programming with POSIX threads.
Addison-Wesley Professional, 1997.
[2] J. Coburn, A. M. Caulfield, A. Akel, L. M. Grupp,
R. K. Gupta, R. Jhala, and S. Swanson. Nv-heaps:
making persistent objects fast and safe with
next-generation, non-volatile memories. In ACM
SIGARCH Computer Architecture News, volume 39,
pages 105–118. ACM, 2011.
[3] X. Dong, C. Xu, Y. Xie, and N. P. Jouppi. Nvsim: A
circuit-level performance, energy, and area model for
emerging nonvolatile memory. Computer-Aided Design
of Integrated Circuits and Systems, IEEE Transactions
on, 31(7):994–1007, 2012.
[4] B. Gill, M. Nicolaidis, and C. Papachristou. Radiation
induced single-word multiple-bit upsets correction in
sram. In On-Line Testing Symposium, 2005. IOLTS
2005. 11th IEEE International, pages 266–271, July
2005.
[5] D. Gizopoulos, M. Psarakis, S. Adve,
P. Ramachandran, S. Hari, D. Sorin, A. Meixner,
A. Biswas, and X. Vera. Architectures for online error
detection and recovery in multicore processors. In
Design, Automation Test in Europe Conference
Exhibition (DATE), 2011, pages 1–6, March 2011.
[6] N. Khoshavi, H. R. Zarandi, and M. Maghsoudloo.
Control-flow error detection using combining basic and
program-level checking in commodity multi-core
architectures. In 2011 6th IEEE International
Symposium on Industrial and Embedded Systems,
pages 103–106. IEEE, 2011.
[7] N. Khoshavi, H. R. Zarandi, and M. Maghsoudloo.
Control-flow error recovery using commodity
multi-core architecture features. In 2011 IEEE 17th
International On-Line Testing Symposium, pages
190–191. IEEE, 2011.
[8] N. Khoshavi, H. R. Zarandi, and M. Maghsoudloo.
Two control-flow error recovery methods for
multithreaded programs running on multi-core
processors. In Microelectronics (MIEL), 2012 28th
international conference on, pages 371–374. IEEE,
2012.
[9] N. Khoshavi, H. R. Zarandi, and M. Maghsoudloo.
Two control-flow error recovery methods for
multithreaded programs running on multi-core
processors. In FACTA UNIVERSITATIS Series:
Electronics and Energetics, pages 309–323, 2015.
[10] R.-S. Liu, D.-Y. Shen, C.-L. Yang, S.-C. Yu, and
C.-Y. M. Wang. Nvm duet: Unified working memory
and persistent store architecture. In ACM SIGPLAN
Notices, volume 49, pages 455–470. ACM, 2014.
[11] G. H. Loh and M. D. Hill. Efficiently enabling
conventional block sizes for very large die-stacked
dram caches. In Proceedings of the 44th Annual
IEEE/ACM International Symposium on
Microarchitecture, MICRO-44, pages 454–464, New
York, NY, USA, 2011. ACM.
[12] T. Moseley, D. A. Connors, D. Grunwald, and R. Peri.
Identifying potential parallelism via loop-centric
profiling. In Proceedings of the 4th international
conference on Computing frontiers, pages 143–152.
ACM, 2007.
[13] J. Suh, M. Manoochehri, M. Annavaram, and
M. Dubois. Soft error benchmarking of l2 caches with
parma. volume 39, pages 85–96, 2011.
[14] G. Sun, E. Kursun, J. A. Rivers, and Y. Xie.
Exploring the vulnerability of cmps to soft errors with
3d stacked nonvolatile memory. J. Emerg. Technol.
Comput. Syst., 9(3):22:1–22:22, Oct. 2013.
[15] H. Sun, C. Liu, W. Xu, J. Zhao, N. Zheng, and
T. Zhang. Using magnetic ram to build low-power and
soft error-resilient l1 cache. Very Large Scale
Integration (VLSI) Systems, IEEE Transactions on,
20(1):19–28, 2012.
[16] Z. Sun, X. Bi, H. H. Li, W.-F. Wong, Z.-L. Ong,
X. Zhu, and W. Wu. Multi retention level stt-ram
cache designs with a dynamic refresh scheme. In
proceedings of the 44th annual IEEE/ACM
international symposium on microarchitecture, pages
329–338. ACM, 2011.
[17] S. Tehrani and N. M. Seminar. Status and prospect
for mram technology. Everspin Technologies, Inc.,
USA, 116(4), 2010.
[18] H. Volos, A. J. Tack, and M. M. Swift. Mnemosyne:
Lightweight persistent memory. ACM SIGPLAN
Notices, 46(3):91–104, 2011.
[19] J. Wang, Y. Tim, W.-F. Wong, Z.-L. Ong, Z. Sun, and
H. Li. A coherent hybrid sram and stt-ram l1 cache
architecture for shared memory multicores. In Design
Automation Conference (ASP-DAC), 2014 19th Asia
and South Pacific, pages 610–615. IEEE, 2014.
[20] J. Wei, A. Thomas, G. Li, and K. Pattabiraman.
Quantifying the accuracy of high-level fault injection
techniques for hardware faults. In Dependable Systems
and Networks (DSN), 2014 44th Annual IEEE/IFIP
International Conference on, pages 375–382. IEEE,
2014.
[21] J. Yang, P. Wang, Y. Zhang, Y. Cheng, W. Zhao,
Y. Chen, and H. Li. Radiation-induced soft error
analysis of stt-mram: A device to circuit approach.
Computer-Aided Design of Integrated Circuits and
Systems, IEEE Transactions on, PP(99):1–1, 2015.
[Figure 9 plots: y-axis: # of propagated faults (0 to 1000); legend: crashed, SDC; panel (a): single fault injection into arithmetic instructions; panel (b): single fault injection into load/store instructions]
Figure 9: Aggregated single fault injection results with LLFI for all operation instructions (a) Arithmetic operation instructions, (b) Load/store instructions
8. APPENDIX A
A detailed study of the impact of soft errors on multithreaded
programs is shown in Figures 9 and 10. Soft errors cause the most
SDCs in the specrand and factorial benchmarks for both the arithmetic
and the load/store instruction types, while the other benchmarks only
crash, depending on the sensitivity of their instructions to soft
errors. For example, the mm, qs and stack benchmarks show high
sensitivity to soft errors, while the circular buffer workload is not
readily affected by soft errors.
[Figure 10 plots: y-axis: # of propagated faults (0 to 1000); legend: crashed, SDC; panel (a): multiple fault injection into arithmetic instructions; panel (b): multiple fault injection into load/store instructions]
Figure 10: Aggregated multiple fault injection results with LLFI for (a) Arithmetic operation instructions, (b) Load/store instructions
