MeRLiN: Exploiting dynamic instruction behavior for fast and accurate microarchitecture level reliability assessment by Kaliorakis, Manolis et al.
MeRLiN: Exploiting Dynamic Instruction Behavior for Fast and 
Accurate Microarchitecture Level Reliability Assessment
Manolis Kaliorakis, Dimitris Gizopoulos, Ramon Canal, Antonio Gonzalez
Department of Informatics & Telecommunications 
University of Athens, Greece 
{manoliskal, dgizop}@di.uoa.gr 
Computer Architecture Department 
Universitat Politecnica de Catalunya, Spain 
{rcanal, antonio}@ac.upc.edu
ABSTRACT
Early reliability assessment of hardware structures using 
microarchitecture level simulators can effectively guide 
major error protection decisions in microprocessor design. 
Statistical fault injection on microarchitectural structures 
modeled in performance simulators is an accurate method 
to measure their Architectural Vulnerability Factor (AVF) 
but requires excessively long campaigns to obtain high sta-
tistical significance.  
We propose MeRLiN¹, a methodology to boost microarchitecture
level injection-based reliability assessment by
several orders of magnitude and keep the accuracy of the 
assessment unaffected even for large injection campaigns 
with very high statistical significance. The core of MeRLiN 
is the grouping of the faults of an initial list into equivalence
classes. All faults in the same group target equivalent vulnerable
intervals of program execution that end at the same
static instruction, which reads the faulty entries. Faults in the
same group occur at different times and in different entries of a
structure, and it is extremely likely that they all have the same
effect on program execution; thus, fault injection is performed
only on a few representatives from each group.
We evaluate MeRLiN for different sizes of the physical 
register file, the store queue and the first level data cache of 
a contemporary microarchitecture running MiBench and 
SPEC CPU2006 benchmarks. For all our experiments, 
MeRLiN is 2 to 3 orders of magnitude faster than an
injection campaign of extremely high statistical significance,
reporting the same reliability measurements with negligible
loss of accuracy. Finally, we theoretically analyze MeRLiN’s
statistical behavior to further justify its accuracy.
CCS CONCEPTS 
• Computer systems organization → Reliability
KEYWORDS 
Microarchitecture level reliability estimation, architectural 
vulnerability factor, fault injection, transient faults 
ACM Reference format: 
M. Kaliorakis, D. Gizopoulos, R. Canal, and A. Gonzalez.
2017. MeRLiN: Exploiting Dynamic Instruction
Behavior for Fast and Accurate Microarchitecture Level
Reliability Assessment. In Proceedings of ISCA ’17,
Toronto, ON, Canada, June 24-28, 2017, 14 pages.
https://dx.doi.org/10.1145/3079856.3080225
1. INTRODUCTION
Continuous miniaturization of transistors allows computer 
architects to build more complex and efficient circuits in 
terms of functionality and performance. However, these 
chips become more and more susceptible to transient, in-
termittent and permanent faults due to external factors 
(such as particle strikes), manufacturing defects or wear-out 
phenomena [1, 2, 3, 4, 5]. 
Unavoidably, designers devote significant resources (ef-
fort, budget, circuit area) to ensure sufficient reliability lev-
els of the computing system before it is released to market. 
Design decisions for detection, diagnosis, recovery and re-
pair of faults are always translated to performance, area and 
power overheads. If such design decisions are guided by 
inaccurate reliability assessments, they can lead to unneces-
sary and excessive costs for error protection [6]. Early but 
also accurate reliability assessment is vital for optimal se-
lection among the available protection mechanisms.  
The four most popular approaches to estimate the reliability
of hardware components are: RTL injection [7, 8, 9],
microarchitecture level injection [10, 11, 12, 13, 14], ACE 
(Architecturally Correct Execution) analysis [15, 16, 17, 
18] and probabilistic models [19, 20, 21, 22].
RTL injection allows very accurate studies of the fault
effects in all hardware structures but these studies are per-
formed too late in the design cycle to facilitate effective 
decision-making for error protection. Moreover, RTL injection
requires excessively long simulation time, which prevents
detailed reliability evaluation of components with a
statistically significant number of injections and large
workloads. Microarchitecture level injection, on the other
hand, is less detailed than RTL injection and is used for
accurate full-system studies of fault effects in early design
stages; it is orders of magnitude faster than RTL injection.
¹ MeRLiN = Microarchitectural evaluation of Reliability using statisticaL
fault iNjection.
ACE analysis and probabilistic models are significant-
ly faster than the two injection methods because they re-
quire a single or few fault-free runs to report reliability es-
timations. They provide a very useful but conservative low-
er bound of the reliability (upper bound of the vulnerability) 
of hardware components [9, 23, 24]. In particular, [23]
reports a 7X and [9] a 3X AVF over-estimation by ACE
analysis compared to fault injection. For example, [25] reports
about 30% AVF for the physical integer register file
of the out-of-order Alpha 21264 microprocessor with 80
registers using ACE analysis²; however, our comprehensive
injection campaign of 60,000 transient faults³ targeting the
same structure for the same benchmarks on the out-of-order
x86-64 microarchitecture in Gem5 measures only 2.56%,
4.81%, and 8.92% AVF for 256, 128 and 64 registers,
respectively.⁴ Moreover, ACE analysis is not suitable for
evaluating fault tolerance mechanisms that are based on soft error
symptoms, in contrast to microarchitecture level injection
[9, 27]. Despite its disadvantages, the merit of ACE analysis in
early reliability assessments is indisputable because it makes
it possible to estimate the upper bound of vulnerability
for different design options (component sizes, policies, etc.)
in a very short time.
Figure 1 reflects the motivation of our MeRLiN meth-
odology compared to the four aforementioned state-of-the-
art methods in terms of speed and measurement accuracy. 
An ideal method at the top-right corner of the figure would 
provide the highest speed (equal to that of the ACE analysis 
and probabilistic models) and the highest accuracy (equal 
to that of the injection methods with high statistical significance).
MeRLiN approaches the ideal method by boosting
microarchitecture level injection-based reliability assessment
while keeping its measurement accuracy unaffected. The
backbone of MeRLiN is built on two major observations:
• A large number of faults in a statistical fault injection
campaign are over-written before being read or are injected
in dead or invalid entries of the hardware structure
[14]. These faults can be easily identified and
pruned from the initial fault list in a single run. We call
this first part of our method ACE-like.
• The faults that are injected in the same or different entries
of a structure during the same or different vulnerable
intervals are very likely to have the same effect on
program execution if these intervals end at the same
static instruction and the same micro-operation (uop)
that reads the faulty entry. MeRLiN groups these faults
together and performs fault injection on a small number
of representatives. While it preserves the accuracy of the
reliability measurements, this grouping drastically reduces
the number of required injections because instruction
repetition is an inherent and pervasive property of all
programs [28, 29, 30, 31].
² Our ACE-like analysis corroborates this and reports about 25% AVF for a
register file of 80 registers for the same benchmarks.
³ This population of faults corresponds to an extremely low error margin
(0.63%) and an extremely high confidence level (99.8%); see [26].
⁴ For more details see Section 4. For 80 registers the injection-based AVF
measurement is about 6%.
[Figure 1: plot of speed (vertical axis) versus accuracy (horizontal axis) positioning RTL injection, microarchitecture level injection, MeRLiN, ACE analysis, probabilistic models and the ideal method.]
Figure 1: Reliability estimation methods: speed and accuracy.
Microarchitecture level, full-system simulators have 
been used for early assessment of the soft error vulnerability 
of hardware structures (register files, buffers, queues, cach-
es etc.) that occupy the majority of the chip's area [10, 13, 
32, 33, 34]. We implement and evaluate MeRLiN on a
state-of-the-art microarchitecture level fault injector [12, 13]
built on Gem5 [35]. MeRLiN's contributions are:
• It accelerates statistical microarchitecture level fault
injection by 2 to 3 orders of magnitude. Our experiments
with full runs of 10 MiBench benchmarks show
93X, 225X and 68X speedup on average for different
sizes of the register file, the store queue and the first
level data cache, respectively. When applied to 10
SPEC CPU2006 benchmarks, MeRLiN achieves larger
average speedups of 1644X, 2018X and 171X for the
register file, the store queue and the first level data
cache, respectively.
• It reports virtually the same reliability estimations as
conventional microarchitectural fault injection with extremely
high statistical significance.
• It delivers fine-grained insights into the fault effects (Silent
Data Corruptions-SDC, Detected Unrecoverable
Errors-DUE, crashes, locks), unlike ACE analysis, which
only reports a gross AVF estimate. This can be used to
evaluate different protection schemes or to identify
benchmarks more prone to SDCs [27, 36].
2. RELATED WORK
Lifetime analysis has been previously used in several relia-
bility-related studies. The method of [37] uses execution 
intervals sampling for reliability evaluation of caches. In 
[38, 39], the authors separate the Hardware Vulnerability 
Factor (HVF) from the Program Vulnerability Factor 
(PVF), while [40] focuses on on-line vulnerability estima-
tion and [25] aims to develop stressmarks to measure the 
maximum vulnerability of hardware structures to soft er-
rors. The methods in [41, 42, 43, 44] use lifetime analysis 
to support decision-making for error protection. 
Relyzer [45] aims to evaluate the effectiveness of soft-
ware symptom-based error detection techniques and to 
identify all the SDCs [46, 47]. Relyzer injects faults only at 
the software level (architectural registers and output of 
load/store address generation units), without considering 
microarchitecture level masking and its features (flushes, 
store forwarding, dead instructions, cache write backs etc.) 
which our method fully supports. Relyzer comprehensively 
measures the application resiliency or, equivalently, reports 
the PVF portion [38] of the AVF. On the other hand, MeR-
LiN considers both the microarchitecture and the software 
masking and injects faults in the actual bits of any hard-
ware structure at any cycle of the program execution; thus 
it reports the complete AVF including both the HVF and 
the PVF dimensions. Unlike Relyzer, MeRLiN:
• Reports the vulnerability of all microarchitectural structures
modeled in performance simulators (physical register
file, ROB, LSQ, predictors, caches, TLBs, etc.) and
the vulnerability of the entire CPU. Relyzer focuses only
on software resilience to faults.
• Reports the vulnerability of instruction related structures
(L1 Instruction cache, fetch queue, trace cache,
etc.). Relyzer only studies faults that reach data fields of
the software.
• Can be used in early design stages to guide reliability
design decisions concerning several microarchitectural
features (component sizes, policies, etc.) or the use of
several hardware and software protection mechanisms;
Relyzer is limited to software symptom-based detectors.
GangES [48] is a follow-up study of [45] that acceler-
ates injections at the software layer monitoring the inter-
mediate execution state of each run. Finally, [12] is orthog-
onal to MeRLiN and can be combined with it, as it acceler-
ates the individual microarchitectural injection runs at 
runtime without pruning the initial fault list. 
3. MeRLiN METHODOLOGY
Our methodology consists of three phases: Preprocessing, 
Fault List Reduction and Fault Injection Campaign as 
shown in Figure 2. We describe the three phases in the fol-
lowing subsections. 
3.1 Preprocessing 
This first phase includes two tasks. First, MeRLiN records 
all vulnerable intervals of all entries of a hardware structure 
during the entire benchmark execution; this is the ACE-like 
analysis step. Then, MeRLiN creates the initial fault list 
repository that consists of a large number of faults for a 
statistically significant sampling: very low error margin 
and very high confidence level [26]. 
3.1.1 ACE-like analysis 
During this first task, the benchmark runs once to comple-
tion to profile the vulnerable intervals (in which a bit flip 
may lead to corruption) of each entry of the target hardware 
structure (e.g. the registers in a physical register file). For 
our analysis, a vulnerable interval of an entry either:
• starts with a write operation and ends with a committed
read of the same entry; or
• starts with a committed read and ends with another
committed read of the same entry.
[Figure 2: flowchart of the three phases. Preprocessing: the benchmark and the configuration parameters (number of entries, execution time, error margin, confidence level) feed the ACE-like analysis, which produces the vulnerable intervals, and the creation of the initial fault list. Fault List Reduction: a 1st step groups faults according to RIP and uPC (group 1 ... group N) and a 2nd step groups them according to byte position (group 1 ... group M), producing the reduced fault list. Fault Injection Campaign: fault injection and parsing produce the reliability estimation.]
Figure 2: Flowchart of MeRLiN.
This definition differs from the typical definition of 
ACE intervals [15, 17] (where intermediate reads do not 
define the end of an interval) but the overall vulnerable 
time (sum of vulnerable intervals) is the same. Note that,
as in the original ACE analysis, wrong-path
instructions are not considered part of the vulnerable
intervals of MeRLiN. We highlight this difference between
the two methods by an example in Figure 3, which repre-
sents the lifetime of an entry during the execution of a 
benchmark. The arrows directed upwards and downwards 
represent read and write operations, respectively. The read 
operations at t2, t5 and t6 are finally squashed. MeRLiN 
divides the interval between t7 and t9 in two individual 
vulnerable intervals, while ACE analysis considers them as 
a single interval. 
This difference between MeRLiN’s first step and classic 
ACE analysis is very important for the second phase of 
MeRLiN, where the faults are grouped with respect to the 
instruction pointer (RIP) and the micro program counter 
(uPC) of the committed read that accesses the entry at the 
end of the vulnerable interval. Our analysis requires both 
the RIP and the uPC to cover cases where an x86-64 in-
struction consists of different micro-instructions that access 
the same or different entries of the hardware structure in 
the same or different cycles. These accesses can lead to 
different fault effects and are classified separately.  
Our ACE-like analysis is significantly lighter in terms 
of storage overhead (10-100MB in our experiments) and 
more easily implemented than the complete ACE, because 
it does not trace the transitively dynamically dead (TDD) 
instructions [15]. The execution time of the ACE-like sin-
gle-run step was less than 5 hours for all our experiments. 
At the end of this step, the following information is 
stored in the vulnerable intervals repository for every ACE-
like vulnerable interval of each entry: (i) start and end of 
the interval (cycle numbers), (ii) the instruction pointer 
(RIP) of the static x86-64 instruction that reads an entry at 
the end of the interval, and (iii) the micro program counter 
(uPC) of the micro-operation which is part of the x86 in-
struction and reads an entry at the end of the interval. 
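The interval definition and the stored record (start, end, RIP, uPC) can be illustrated with a minimal Python sketch. The access-trace format, the `Access` and `Interval` types and the example addresses are hypothetical and are not taken from GeFIN or Gem5; the sketch also assumes that every write in the trace really updates the entry.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Access:
    cycle: int
    kind: str                   # "write" or "read"
    committed: bool = True      # False for a squashed (wrong-path) read
    rip: Optional[int] = None   # static instruction pointer of a read
    upc: Optional[int] = None   # micro program counter of a read


@dataclass(frozen=True)
class Interval:
    start: int  # cycle of the write or committed read that opens the interval
    end: int    # cycle of the committed read that closes it
    rip: int
    upc: int


def ace_like_intervals(accesses: List[Access]) -> List[Interval]:
    """Split the lifetime of one entry into ACE-like vulnerable intervals."""
    intervals = []
    start = None  # cycle of the most recent write or committed read
    for a in sorted(accesses, key=lambda a: a.cycle):
        if a.kind == "write":
            start = a.cycle  # a write opens a new candidate interval
        elif a.kind == "read" and a.committed:
            if start is not None:
                intervals.append(Interval(start, a.cycle, a.rip, a.upc))
            start = a.cycle  # every committed read also opens the next interval
        # squashed reads neither open nor close an interval
    return intervals


# Hypothetical trace of a single entry.
trace = [
    Access(10, "write"),
    Access(20, "read", committed=False, rip=0x400A, upc=0),
    Access(30, "read", rip=0x400A, upc=0),
    Access(45, "read", rip=0x400B, upc=1),
    Access(60, "write"),
    Access(80, "read", rip=0x400A, upc=0),
]
for iv in ace_like_intervals(trace):
    print(iv)
```

In this trace the span between the read at cycle 45 and the write at cycle 60 is not vulnerable, because the value is overwritten before it is read again.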
[Figure 3: timeline (t1 to t9) of one entry, marking the squashed reads, two ACE intervals and three MeRLiN ACE-like intervals.]
Figure 3: ACE and ACE-like intervals definition example.
3.1.2 Initial Fault List Creation 
In the second task of the first phase, MeRLiN creates the 
initial fault list repository according to the statistical sam-
pling described in [26]. The initial faults population is de-
fined by: (1) the size (in bits) of the hardware structure, (2) 
the total execution time (in cycles) of the benchmark, (3) 
the statistical confidence level and (4) the statistical error 
margin. To achieve high statistical significance, the initial 
fault list should consist of tens or hundreds of thousands of 
faults. For instance, an injection campaign targeting a 256-
entry integer register file of 64-bit registers with error mar-
gin 2.88%, confidence level 99% and 100M cycles of pro-
gram execution time, requires 2000 fault injection runs 
[26]. If a higher statistical significance is needed (i.e. 
0.63% error margin and 99.8% confidence level), the total 
number of injection runs explodes to 60,000 (an unaccepta-
bly large number of injections even for relatively short 
benchmarks). We use this number of 60K faults to define 
the baseline injection campaign for each single component, 
size and benchmark configuration, ensuring the same or 
even slightly higher statistical significance for all our struc-
tures. According to [26], for estimations of high statistical 
significance the confidence level and the error margin dom-
inate in the calculation of the initial fault list population. 
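The fault counts quoted above can be approximated with the finite-population sample-size formula commonly used for statistical fault injection, n = N / (1 + e^2 (N - 1) / (t^2 p (1 - p))). Treating this as the formula of [26] is an assumption of this sketch, as are the worst-case choice p = 0.5 and the function name `sample_size`.

```python
import math
from statistics import NormalDist


def sample_size(population, error_margin, confidence, p=0.5):
    """Number of faults to sample from `population` possible (bit, cycle) faults.

    p = 0.5 is the conservative choice when the true failure probability is
    unknown (an assumption of this sketch).
    """
    # Two-sided critical value of the standard normal distribution.
    t = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    n = population / (1.0 + error_margin ** 2 * (population - 1) / (t ** 2 * p * (1.0 - p)))
    return math.ceil(n)


# 256 registers x 64 bits x 100M cycles, as in the example in the text.
population = 256 * 64 * 100_000_000
print(sample_size(population, 0.0288, 0.99))    # close to 2000
print(sample_size(population, 0.0063, 0.998))   # close to 60,000
```

Because the population is many orders of magnitude larger than the sample, the result depends almost entirely on the error margin and the confidence level, which matches the remark above about these two parameters dominating the calculation.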
The outputs of the first phase of MeRLiN are the vul-
nerable intervals repository and the initial fault list that 
feed MeRLiN’s second phase. 
3.2 Fault List Reduction 
This phase of MeRLiN classifies the faults in groups run-
ning a two-step grouping algorithm, and creates the re-
duced fault list that is used for the actual injections. 
3.2.1 1st step of group creation algorithm 
During the execution of the first step of the algorithm, all 
faults of the initial fault list are examined. All faults that 
target a non-vulnerable interval are directly classified as 
Masked as no injection is needed for them. The remaining 
faults that hit ACE-like vulnerable intervals are stored in 
different subdirectories (see Figure 2) according to the RIP 
and the uPC of the instruction that reads the entry at the 
end of the interval. Each of the created groups consists of 
transient faults on the same or different entries of the 
hardware structure being analyzed, during the same or dif-
ferent ACE-like vulnerable intervals that are read by an 
instruction with the same RIP and the same uPC. 
Figure 4 shows an informative example of this first step 
for three entries of a hardware component during the exe-
cution of the same benchmark. When this step finishes, 
four groups are created containing faults that hit different 
hardware entries at different time intervals. The faults with 
the same color belong to the same group. The faults belonging
to non-vulnerable intervals (gray color) are characterized
as Masked. For instance, the faults in intervals t4-t6,
t10-t13 and t7-t11 are grouped together (red color), because
these intervals end at micro-instructions with the same
rip (C) and uPC (3).
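The first grouping step can be sketched in a few lines of Python. The `Fault` and `Interval` records, the in-memory dictionary of intervals and the half-open boundary convention (`start <= cycle < end`) are assumptions made for illustration; the paper stores the groups in subdirectories rather than in memory.

```python
from collections import defaultdict
from typing import NamedTuple


class Interval(NamedTuple):
    start: int  # cycle that opens the vulnerable interval
    end: int    # cycle of the committed read that closes it
    rip: int    # instruction pointer of that read
    upc: int    # micro program counter of that read


class Fault(NamedTuple):
    entry: int  # index of the hardware entry hit by the fault
    bit: int    # bit position inside the entry
    cycle: int  # injection cycle


def first_step(faults, intervals_by_entry):
    """Return (groups keyed by (rip, upc), list of faults classified as Masked)."""
    groups = defaultdict(list)
    masked = []
    for fault in faults:
        hit = None
        for iv in intervals_by_entry.get(fault.entry, ()):
            if iv.start <= fault.cycle < iv.end:  # assumed boundary convention
                hit = iv
                break
        if hit is None:
            masked.append(fault)  # outside every vulnerable interval
        else:
            groups[(hit.rip, hit.upc)].append(fault)
    return dict(groups), masked


# Hypothetical intervals for two entries and four sampled faults.
intervals = {
    0: [Interval(10, 30, 0xC, 3)],
    1: [Interval(50, 90, 0xC, 3), Interval(90, 120, 0xA, 0)],
}
faults = [Fault(0, 5, 12), Fault(1, 40, 70), Fault(1, 3, 100), Fault(0, 7, 35)]
groups, masked = first_step(faults, intervals)
print(groups)
print(masked)
```

The first two faults hit different entries at different times but share a group because both intervals end at the same (rip, uPC) pair.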
[Figure 4: timelines of entries A, B and C (t1 to t13) with their vulnerable intervals assigned to groups 1 to 4 according to the rip and uPC of the reading micro-instruction: (rip A, uPC 0), (rip B, uPC 1), (rip C, uPC 3) and (rip D, uPC 3).]
Figure 4: 1st step example of the grouping algorithm.
3.2.2 2nd step of group creation algorithm 
Due to logical masking, the bits of a given faulty entry may
not all have the same effect when read by an instruction. To
maximize MeRLiN’s accuracy, especially for groups with
hundreds of faults, we select more than one fault for the
actual fault injection runs when the faults of a group hit different
bytes of the entry. Moreover, faults in different bytes are
selected from different dynamic instances of the same static 
instruction to increase time diversity. This can be further 
extended to separate faults hitting different nibbles or bits, 
but our experiments verify that this is not necessary. 
MeRLiN ensures that for static instructions that are associated
with a large population of faults, several representatives
are selected from different dynamic instances of the
same instruction, covering all possible byte positions of
different entries. This per byte selection leads to smaller
final groups ensuring the statistical significance of MeRLiN 
(see the theoretical analysis in Section 4.4.5), while it leads 
to groups of faults that are extremely likely to have the 
same effect. Figure 5 shows an example of the second step 
of the algorithm for three different hardware entries (K, L, 
M) during the execution of a benchmark. Note that all these
faults were classified in the same group (same rip=F and 
uPC=4) from the first step of the grouping algorithm. The 
number next to each fault corresponds to the group in 
which the fault is finally classified at the end of the second 
step; the faults in circles are stored in the reduced fault list 
repository and are the only ones that will be injected. The
execution time of MeRLiN’s entire single-run group
creation algorithm was less than 50 minutes for all our
experiments.
At the end of this phase, the reduced fault list repository 
contains all the selected faults. Only these faults are inject-
ed using the microarchitecture level fault injector. 
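A possible reading of the second step is sketched below in Python: the faults of one first-step group are split by byte position, and one representative per byte is chosen, preferring a dynamic instance that no other byte has used yet. The exact selection policy is not spelled out in the text, so the tie-breaking rule, the `GroupedFault` record and the use of the closing read's cycle to identify a dynamic instance are all assumptions.

```python
from collections import defaultdict
from typing import NamedTuple


class GroupedFault(NamedTuple):
    entry: int       # hardware entry hit by the fault
    bit: int         # bit position inside the entry
    cycle: int       # injection cycle
    read_cycle: int  # cycle of the committed read closing the interval;
                     # used here to identify the dynamic instance


def second_step(group):
    """Split one (rip, uPC) group by byte and pick one representative per byte."""
    by_byte = defaultdict(list)
    for fault in group:
        by_byte[fault.bit // 8].append(fault)

    used_instances = set()
    representatives = {}
    for byte in sorted(by_byte):
        candidates = by_byte[byte]
        # Prefer a dynamic instance not already chosen for another byte,
        # falling back to the first candidate when none is available.
        pick = next(
            (f for f in candidates if f.read_cycle not in used_instances),
            candidates[0],
        )
        used_instances.add(pick.read_cycle)
        representatives[byte] = pick
    return dict(by_byte), representatives


# Hypothetical faults that all belong to the same first-step group.
f1 = GroupedFault(entry=0, bit=3, cycle=12, read_cycle=100)
f2 = GroupedFault(entry=1, bit=5, cycle=150, read_cycle=200)
f3 = GroupedFault(entry=0, bit=9, cycle=20, read_cycle=100)
f4 = GroupedFault(entry=2, bit=12, cycle=250, read_cycle=300)
subgroups, reps = second_step([f1, f2, f3, f4])
print(reps)
```

Here byte 0 is represented by `f1` and byte 1 by `f4`, so the two injected faults come from different dynamic instances of the same static instruction.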
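[Figure 5: faults of one first-step group (rip F, uPC 4) plotted by byte position (b0 to b7) over time for entries K, L and M; each fault is labeled with its final group number (1 to 6), and the circled faults are the selected representatives.]
Figure 5: 2nd step example of the grouping algorithm.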
3.3 Fault Injection Campaign 
In the last phase of MeRLiN, the fault injection campaign 
is launched using all faults of the reduced fault list reposi-
tory. During the parsing step, the outputs of all the injec-
tion runs per reduced group are compared to that of the 
golden run to identify the fault effect and calculate the final 
reliability estimation of the structure. 
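One way to assemble the final estimate is sketched below in Python. It assumes that the effect observed for each injected representative is attributed to every fault of its group, that faults pruned by the ACE-like step count as Masked, and that the AVF is taken as the non-masked share of the initial fault list. These weighting rules and the function name are assumptions of the sketch, not a description of the authors' parsing scripts.

```python
from collections import Counter


def estimate_reliability(masked_by_ace_like, groups):
    """Extrapolate representative outcomes to the whole initial fault list.

    masked_by_ace_like: number of faults pruned as Masked without injection.
    groups: iterable of (group_size, effect_of_representative) pairs.
    """
    totals = Counter({"Masked": masked_by_ace_like})
    for size, effect in groups:
        totals[effect] += size  # every fault in the group inherits the effect
    n = sum(totals.values())
    shares = {effect: count / n for effect, count in totals.items()}
    avf = 1.0 - shares.get("Masked", 0.0)
    return shares, avf


# Hypothetical campaign: 1000 initial faults, 900 pruned, 3 final groups.
shares, avf = estimate_reliability(900, [(60, "SDC"), (30, "Masked"), (10, "Crash")])
print(shares)
print(round(avf, 4))
```

Only three injections are needed in this toy campaign, yet the per-class shares still refer to all 1000 faults of the initial list.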
4. MeRLiN EVALUATION
4.1 Microarchitecture level fault injector – GeFIN 
We employ GeFIN [13], a Gem5-based [35] microarchitectural
injector, and extend it to implement and evaluate
MeRLiN on three structures of an x86-64 out-of-order processor:
• The physical integer Register File (RF) for three sizes:
256, 128, 64 registers.
• The data field of the Store Queue (SQ) of the
Load/Store Queue for three sizes: 64 load and 64 store,
32 load and 32 store, and 16 load and 16 store entries.
Gem5 does not implement data fields in the Load Queue.
• The data field of the L1 data cache (L1D) for three sizes:
64KB, 32KB and 16KB.
MeRLiN can be used for: (i) all hardware structures of
the CPU (caches, buffers, queues, registers, etc.), (ii) differ-
ent input sets and benchmarks, (iii) different architectures 
and ISAs. 
4.1.1 Configuration 
Table 1 shows the baseline microprocessor configuration of 
our experiments. For all the experiments, we used ma-
chines with Intel Core i7-4771 at 3.5GHz, 16GBytes of 
RAM at 1600MHz and 1TByte hard disk. 
Table 1: Baseline microprocessor configuration.
Parameter               x86 microprocessor model configuration
Pipeline                OoO
Physical register file  256/128/64 int; 192 FP
Issue Queue entries     32
Load/Store Queue        64/32/16 load & 64/32/16 store entries
ROB entries             100
Functional units        6 int ALUs; 2 complex int ALUs; 4 FP ALUs; 2 FP mul/div; 4 SIMD
L1 Instruction Cache    32KB, 64B line, 128 sets, 4-way, write back
L1 Data Cache           16KB/32KB/64KB, 64B line, 64/128/256 sets, 4-way, write back
L2 Cache                1MB, 64B line, 1024 sets, 16-way, write back
Branch Predictor        Tournament predictor
Branch Target Buffer    conditional and unconditional branches BTB (direct-mapped, 4K entries)
4.1.2 Fault effect classification 
For each injection run, we classify the fault effect in one of 
the six categories shown in Table 2. 
Table 2: Fault effect classification.
Category  Effect
Masked    Output and x86 exceptions were identical to the golden run
SDC       The output is corrupted, but there was no abnormal behavior of the simulation process or the x86 exceptions
DUE       Simulation process and output are not corrupted, but there were indications of extra x86 exceptions
Timeout   Includes program flow Deadlocks (not committing further instructions) and Livelocks (redirected but continuing to commit instructions) that exceed execution time of benchmarks by three times
Crash     Includes process (abnormal termination of simulated program), system (full-system is unable to recover) and simulator (simulator process terminated abnormally) crashes
Assert    Simulator stopped due to assert instruction
4.2 Fault Sampling 
An exhaustive fault list at the microarchitecture level con-
sists of all flips for every bit of a hardware structure and for 
every program execution cycle. At the software level, the
corresponding list consists of bit flips in the operands of the
assembly instructions; these faults are not correlated to the execution
time of the program and the actual bits of the hardware.
Table 3 presents a high-level quantitative comparison of 
Relyzer [45] and MeRLiN using as starting point the ex-
haustive fault list of the corresponding level of abstraction 
(first column). The second column shows the faults of the 
exhaustive list that remain for injection after the applica-
tion of each method, and the third column presents the 
gains (speedup) in terms of fault list reduction achieved by 
each method over the corresponding exhaustive list. The 
last two columns show the time needed to inject the exhaus-
tive list and the remaining faults in both methods, respec-
tively. Assume that we run one benchmark of 1 billion cy-
cles and we inject faults in the L1D (32KB), the SQ (16 
entries) and the RF (64 registers). The throughput of Gem5 
for full-system cycle-accurate simulation is 10^5 cycles/sec
while for software emulation it is 10^6 cycles/sec [35]. MeRLiN
delivers 5 orders of magnitude higher gains than Relyzer
with the exhaustive list as the starting point, while it
reports the reliability of the exhaustive list 10 orders of
magnitude faster than injecting that list.
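The evaluation times of Table 3 follow from simple arithmetic, reproduced here as a short Python check. The sketch assumes that every injection run simulates the whole 1-billion-cycle benchmark, that runs execute one after another on a single machine, and that the throughputs are 10^5 cycles/sec (cycle-accurate) and 10^6 cycles/sec (software emulation); the fault counts are the orders of magnitude listed in the table.

```python
SECONDS_PER_DAY = 24 * 3600
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY


def campaign_seconds(num_faults, benchmark_cycles, cycles_per_sec):
    """Serial campaign time: one full benchmark run per injected fault."""
    return num_faults * (benchmark_cycles / cycles_per_sec)


benchmark_cycles = 10 ** 9

# MeRLiN: microarchitecture level (cycle-accurate) simulation.
merlin_exhaustive_years = campaign_seconds(10 ** 13, benchmark_cycles, 10 ** 5) / SECONDS_PER_YEAR
merlin_remaining_days = campaign_seconds(10 ** 3, benchmark_cycles, 10 ** 5) / SECONDS_PER_DAY

# Relyzer: software level emulation.
relyzer_exhaustive_years = campaign_seconds(10 ** 11, benchmark_cycles, 10 ** 6) / SECONDS_PER_YEAR
relyzer_remaining_years = campaign_seconds(10 ** 6, benchmark_cycles, 10 ** 6) / SECONDS_PER_YEAR

print(f"MeRLiN exhaustive:  {merlin_exhaustive_years:.2e} years")
print(f"MeRLiN remaining:   {merlin_remaining_days:.0f} days")
print(f"Relyzer exhaustive: {relyzer_exhaustive_years:.2e} years")
print(f"Relyzer remaining:  {relyzer_remaining_years:.1f} years")
```

Under these assumptions the MeRLiN campaign on the remaining faults takes roughly 116 days, which corresponds to the "4 months" entry, and the Relyzer campaign roughly 32 years.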
Table 3: MeRLiN vs. Relyzer using exhaustive fault list.
              Exhaustive fault list  Remaining faults  Gain   Evaluation time using exhaustive fault list  Evaluation time using remaining faults
MeRLiN        10^13                  10^3              10^10  ~3×10^9 years                                4 months
Relyzer [45]  10^11                  10^6              10^5   ~3×10^6 years                                32 years
Statistical fault sampling is unavoidable due to the huge 
number of faults in the exhaustive fault list. Thus, the ini-
tial fault list for each campaign of this paper was generated 
using statistical fault sampling [26] (Section 3.1.2) and 
consists of 60,000 faults (99.8% confidence level and 
0.63% error margin). To study the scalability of MeRLiN 
(Section 4.4.2.4), we used an initial fault list of 600,000 
faults (99.8% confidence level and 0.19% error margin). 
4.3 Benchmarks 
We employ 10 benchmarks from the MiBench suite [49] 
and 10 from the SPEC CPU2006 suite. We ran the 
MiBench benchmarks to completion to evaluate both MeRLiN’s
accuracy and speedup. Their execution time ranges
from 1 to 55 million cycles, and they are very similar to the
SPEC benchmarks in instruction mix and throughput. Thus, they
have been used extensively in many reliability studies [13,
14, 23, 25, 50]. In the case of the SPEC benchmarks, we evaluate
MeRLiN running the Simpoint samples of 100M committed
instructions with the largest weight [51]. MeRLiN’s purpose
is not to propose a new approach for sampling benchmark intervals
for reliability evaluation; any existing approach
can be used (e.g. [37] for large caches or Simpoints,
which were used in many reliability studies [15, 17, 19]).
We chose to evaluate MeRLiN’s accuracy by executing
MiBench benchmarks to completion instead of running entire
SPEC benchmarks, because the execution time of each
baseline comprehensive injection campaign (60,000 faults
for each entire SPEC program, component and configuration)
would make the evaluation infeasible. We also evaluated
the accuracy of MeRLiN at the end of the Simpoint
intervals of two selected SPEC CPU2006 benchmarks
(bzip2 and gcc) (Section 4.4.3.4).
4.4 Results and Analysis 
We evaluate MeRLiN in terms of reliability estimation ac-
curacy and speedup against the comprehensive campaign 
and the ACE-like analysis. Then, we discuss Relyzer’s heu-
ristics if employed in MeRLiN’s concept. Finally, we ana-
lyze the statistical properties of MeRLiN. 
4.4.1 Homogeneity of fault effects 
First, to measure the effectiveness of our grouping algo-
rithm we define the homogeneity metric. In equation (1), N 
is the number of the groups that MeRLiN generates and 
#faults is the number of faults of a group. The dominant 
class of a group is defined as the category among those of 
Table 2 that contains the largest number of faults in the 
group. Thus, dominant_class% is the percentage of faults of 
the group that are classified in the dominant class. When 
dominant_class% equals 100%, all the faults
in that group have the same fault effect. Finally, #total_faults
is the total population of faults that hit vulnerable
intervals. Values of homogeneity close to 1.0 denote
that the vast majority of faults across all groups lead to the
same effect, and thus that the accuracy of the algorithm is high.
\[
\text{homogeneity} = \frac{\sum_{\text{group}=1}^{N} \#\text{faults}_{\text{group}} \times \text{dominant\_class\%}_{\text{group}}}{\#\text{total\_faults} \times 100\%} \tag{1}
\]
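Equation (1) reduces to counting, for every group, the faults that fall in its dominant class and dividing the sum by the total number of faults. A minimal Python sketch follows; representing each group as a list of effect labels is an assumption made for illustration.

```python
from collections import Counter


def homogeneity(groups):
    """groups: list of groups, each a list with one effect label per fault."""
    total_faults = sum(len(group) for group in groups)
    # #faults x dominant_class% / 100% is the number of faults in the
    # dominant class of the group.
    dominant_faults = sum(
        Counter(group).most_common(1)[0][1] for group in groups if group
    )
    return dominant_faults / total_faults


# Hypothetical outcome of injecting every fault of three groups.
example_groups = [
    ["Masked"] * 9 + ["SDC"],            # dominant class Masked, 90%
    ["SDC"] * 5,                         # perfectly homogeneous group
    ["Crash", "Crash", "Masked", "DUE"], # dominant class Crash, 50%
]
print(round(homogeneity(example_groups), 3))
```

In the example 16 of the 19 faults agree with the dominant class of their group, so the metric is 16/19.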
Figure 6 shows the homogeneity for all our experiments 
running the 10 MiBench benchmarks. On average, the highest
homogeneity for the RF is 0.940, for the SQ 0.982 and for
the L1D 0.920. In general, the homogeneity values are very
high for this fine-grained classification (the 6 classes). If 
homogeneity is calculated in coarser granularity (masked 
vs. not-masked faults) and all classes that lead to non-
masking are combined together, then homogeneity is even 
larger; see the values at the top of each bar in Figure 7. In 
Figure 7 the value at the bottom of each bar represents the 
percentage of groups (average for all our experiments with 
MiBench) that consist of faults with exactly the same effect 
(masked, non-masked) meaning that they have a perfect 
homogeneity value of 1.0. Finally, homogeneity climbs to
0.99 if we count the faults excluded by the ACE-like analysis, but
here we focus only on MeRLiN’s grouping part. All these
results indicate the extremely high accuracy of MeRLiN.
[Plot of Figure 6: bars of fine-grained homogeneity for each of the 10 MiBench benchmarks (susan_c, susan_s, susan_e, stringsearch, djpeg, sha, fft, qsort, cjpeg, caes) and their average, grouped by RF (256, 128, 64 registers), SQ (64, 32, 16 entries) and L1 data cache (64KB, 32KB, 16KB); y-axis: Homogeneity (0.80 to 1.00).]
Homogeneity using only masked and non-masked categories (y-axis: 0.90 to 1.00)
Structure  Configuration  Homogeneity (top of bar)  Groups with perfect homogeneity (bottom of bar)
RF         256regs        0.952                     90.8%
RF         128regs        0.953                     90.5%
RF         64regs         0.961                     90.3%
SQ         64entries      0.983                     92.0%
SQ         32entries      0.977                     90.7%
SQ         16entries      0.973                     91.1%
L1D        64KB           0.944                     88.4%
L1D        32KB           0.942                     88.3%
L1D        16KB           0.931                     89.1%
Figure 7: Coarse-grained homogeneity (top of bars) and per-
centage of groups with perfect homogeneity (equal to 1.0) 
(bottom of bars); average for 10 MiBench. 
4.4.2 Speedup 
We evaluate the speedup of MeRLiN against the compre-
hensive baseline fault injection campaigns (60,000 faults). 
4.4.2.1 MiBench benchmarks 
Figure 8 presents the speedup of the method for 256, 128
and 64 physical registers for the 10 MiBench benchmarks.
The lower (blue) segment and the value on top of it indicate
the speedup compared to the comprehensive baseline injection
method (60,000 faults) after the first ACE-like pass.
The upper (red) segment of each bar indicates the speedup
achieved by the grouping algorithm on top of the first ACE-like
step. The value on top of the red segment represents the
final speedup achieved by MeRLiN. For example, for 64
registers and the qsort benchmark the ACE-like step reduces
the initial fault list by 4.1X (60,000/14,757). The remaining
14,757 faults are further reduced by the grouping
algorithm to 1,126 faults that actually have to be injected;
in total, this corresponds to a 53.3X (60,000/1,126) reduction
of the initial fault list. The average speedups are 93.1X,
62.1X and 43.7X for 256, 128 and 64 registers, respectively.
Similarly, Figure 9 and Figure 10 present the speedup
for the store queue and the data cache, respectively. The
average speedups for the store queue are 224.9X, 186.7X
and 146.9X for 64, 32 and 16 entries, respectively, while for
the data cache they are 67.9X, 61.6X and 59.0X for 64KB,
32KB and 16KB, respectively.
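The qsort example above can be checked with a few lines of arithmetic. The following is only an illustrative sketch: the function name `speedup` is hypothetical, and it assumes, as the text does, that speedup is measured as the reduction factor of the number of injected faults.

```python
def speedup(initial_faults: int, injected_faults: int) -> float:
    """Speedup expressed as the reduction factor of the fault list."""
    return initial_faults / injected_faults


INITIAL = 60_000         # comprehensive baseline fault list
AFTER_ACE_LIKE = 14_757  # faults left after the ACE-like step (qsort, 64 registers)
AFTER_GROUPING = 1_126   # faults actually injected after the grouping step

ace_like = speedup(INITIAL, AFTER_ACE_LIKE)
grouping_only = speedup(AFTER_ACE_LIKE, AFTER_GROUPING)
final = speedup(INITIAL, AFTER_GROUPING)

print(f"ACE-like step:               {ace_like:.1f}X")
print(f"Grouping on top of ACE-like: {grouping_only:.1f}X")
print(f"Final:                       {final:.1f}X")
```

The final speedup is the product of the two partial reductions, which is why the red segment in the speedup figures is stacked on top of the blue one.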
4.4.2.2 Actual Estimation Time running MiBench 
Figure 11 depicts the actual time required for the fault
injection campaigns in the three structures with the
comprehensive fault injection method (60,000 faults per campaign;
blue bars) and the MeRLiN method (red bars) for all MiBench
benchmarks and all component configurations. We assume
that all injections run sequentially on the same machine.
Estimation time in months (bar chart; y-axis: Months, 0 to 200)
Structure              Comprehensive fault injection (60,000 faults)  MeRLiN
Register File          40.68                                          0.65
Store Queue            77.07                                          0.49
L1 data cache          82.09                                          1.28
Final Estimation Time  199.84                                         2.42
Figure 11: Actual reliability estimation times of the comprehensive
baseline injection vs. MeRLiN for all structure configurations
of this study running 10 MiBench benchmarks.
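As a rough cross-check of the values plotted in Figure 11, the sketch below sums the per-structure times and derives the wall-clock ratio between the two methods. The dictionary names are hypothetical, and the ratios are simply computed from the plotted numbers; they are not figures reported by the authors.

```python
# Campaign times in months, as read from Figure 11.
baseline_months = {"Register File": 40.68, "Store Queue": 77.07, "L1 data cache": 82.09}
merlin_months = {"Register File": 0.65, "Store Queue": 0.49, "L1 data cache": 1.28}

total_baseline = sum(baseline_months.values())
total_merlin = sum(merlin_months.values())

for structure in baseline_months:
    ratio = baseline_months[structure] / merlin_months[structure]
    print(f"{structure}: {ratio:.1f}X less time")

print(f"Total: {total_baseline:.2f} vs. {total_merlin:.2f} months "
      f"({total_baseline / total_merlin:.1f}X less time)")
```

The per-structure sums reproduce the "Final Estimation Time" bars (199.84 and 2.42 months), which suggests that the last pair of bars is the total over the three structures.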
4.4.2.3 SPEC CPU2006 benchmarks 
To evaluate the efficiency of MeRLiN in terms of speedup
on larger benchmarks, we ran the Simpoint samples of 100M
committed instructions with the highest weight from 10
selected integer benchmarks of the SPEC CPU2006 suite,
assuming an initial fault list of 60,000 faults. We used the
configuration of Table 1, with 128 physical integer registers,
16 store and 16 load queue entries and a 32KB L1 data
cache. The speedups that MeRLiN delivers are
reported in Figure 12. MeRLiN leads to very high final
speedups of 1644X, 2018X and 171X on average for the
RF, the SQ and the L1D cache, respectively. These are
higher than the speedups obtained for the MiBench programs
since the Simpoint samples we used for the SPEC benchmarks
correspond to the most representative part of their execution.
Figure 6: Fine-grained homogeneity of fault effects in the RF, SQ and L1D for 10 MiBench benchmarks; using 6 classes of Table 2. 
Figure 8: MeRLiN speedup for the three sizes of the Physical Integer Register File running 10 MiBench benchmarks.
[Plot: stacked bars for each benchmark (susan_c, susan_s, susan_e, stringsearch, djpeg, sha, fft, qsort, cjpeg, caes) and their average, for 256, 128 and 64 registers; y-axis: Speedup (0 to 300); segments: Speedup from ACE-like, Speedup from Grouping.]
Figure 9: MeRLiN speedup for the three sizes of the Store Queue running 10 MiBench benchmarks.
[Plot: stacked bars for the same benchmarks and their average, for 64, 32 and 16 entries; y-axis: Speedup (0 to 800); segments: Speedup from ACE-like, Speedup from Grouping.]
Figure 10: MeRLiN speedup for the three sizes of the L1 data cache running 10 MiBench benchmarks.
[Plot: stacked bars for the same benchmarks and their average, for 64KB, 32KB and 16KB; y-axis: Speedup (0 to 140); segments: Speedup from ACE-like, Speedup from Grouping.]
Figure 12: MeRLiN speedup for the Register File (RF), Store Queue (SQ), and L1 data cache (L1D) running 10 SPEC CPU2006.
[Plot: stacked bars for the RF, SQ and L1D of each benchmark (bzip2, gcc, mcf, gobmk, hmmer, sjeng, libquantum, h264ref, omnetpp, astar) and their average; y-axis: Speedup (0 to 2500); segments: Speedup from ACE-like, Speedup from Grouping.]
4.4.2.4 Scaling of the MeRLiN method 
The higher the statistical significance of the initial fault list,
the larger the speedup that MeRLiN offers. In our initial set
of campaigns, we ran all MiBench benchmarks using
60,000 faults per campaign (99.8% confidence level and
0.63% error margin). To stress MeRLiN even further, we
repeated all these campaigns using a 10 times larger
initial list of 600,000 faults (99.8% confidence level and
0.19% error margin; see footnote 5). Figure 13 presents the
average speedup achieved for these two sets of campaigns by the
ACE-like step (lower purple segment of each bar) and the
grouping step (upper white segment) of MeRLiN, as well as
the final speedup achieved (value on top of each bar) for
each configuration. The final speedup scaled up by 3.46
times on average; in practice, this means that for a 10 times
larger initial fault list, MeRLiN injects only
2.89 times more faults.
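The relation between fault-list size, error margin and the scaling figures above can be sketched as follows. This is a simplified illustration, not the authors' procedure: it assumes the usual normal-approximation formula for a proportion with worst-case p = 0.5, a two-sided confidence interval and an effectively infinite fault population, and the function names are hypothetical. Under these assumptions 60,000 faults give a margin of about 0.63%, and 600,000 faults give a margin just under 0.20%; the exact margins quoted in the text may rely on a finite-population formula that is not reproduced here.

```python
from math import sqrt
from statistics import NormalDist


def error_margin(num_faults: int, confidence: float = 0.998, p: float = 0.5) -> float:
    """Worst-case error margin of an injection campaign (infinite population assumed)."""
    # Two-sided cut-off point of the standard normal distribution.
    t = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return t * sqrt(p * (1 - p) / num_faults)


def injection_growth(fault_list_growth: float, speedup_growth: float) -> float:
    """How many times more faults are injected when the list and the speedup both grow."""
    return fault_list_growth / speedup_growth


print(f"Margin for  60,000 faults: {error_margin(60_000):.3%}")
print(f"Margin for 600,000 faults: {error_margin(600_000):.3%}")
print(f"Injected faults grow by:   {injection_growth(10, 3.46):.2f}X")
```

The last line reproduces the 2.89 figure of the text: a list that is 10 times larger, divided by a speedup that is 3.46 times larger, leaves 10/3.46 times as many injections.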
Final speedup (value on top of each bar; y-axis: Speedup, 0 to 1200; segments: Speedup from ACE-like, Speedup from Grouping)
Structure  Configuration  Error margin 0.63%  Error margin 0.19%
L1D        64KB           69.2                348.5
L1D        32KB           70.1                303.8
L1D        16KB           69.5                292.6
SQ         64entries      298.0               929.5
SQ         32entries      252.8               686.5
SQ         16entries      200.5               547.3
RF         256regs        130.2               367.1
RF         128regs        81.3                259.6
RF         64regs         60.9                183.7
Figure 13: MeRLiN speedup scaling for 0.63% (60K faults) 
and 0.19% error margin (600K faults); 10 MiBench average. 
4.4.3 Reliability Estimation Accuracy 
We measure the accuracy of the reliability estimations of
MeRLiN for the three components running 10 MiBench
benchmarks to the end. We compare MeRLiN's accuracy
against the injection in: (i) the remaining fault list after the
exclusion of the faults that target non-vulnerable intervals
(identified by the ACE-like step of the method), and (ii) the
comprehensive baseline fault list (60,000 faults). Finally,
we evaluate MeRLiN’s accuracy for the RF with 60K faults
using Simpoints from the bzip2 and gcc benchmarks.
4.4.3.1 Accuracy in the remaining fault list after ACE-like 
The estimation accuracy of MeRLiN for the three structures
of this study against the injection using the remaining fault
list after the ACE-like step is shown in Figure 14. Each
graph shows the average fault effect classification across
the 10 MiBench benchmarks used in our study for the three
configurations of each structure. The first bar (blue) in each
class corresponds to the results of the fault injection in the
remaining fault list after the ACE-like analysis, while the
second bar (red) illustrates the results on the same fault list
after applying MeRLiN’s grouping algorithm and injecting
only the selected faults. The values on top of each bar
represent the measurement per fault effect category. Similar
behavior is observed across all benchmarks. For all component
configurations, MeRLiN reports negligible differences
compared to the injection using all the faults that hit only
vulnerable intervals.
5  In all our experiments we round up the number of injections (60,000 and
600,000) instead of rounding the error margins.
Fault Effect Classification after ACE-like of Register File (average for 10 MiBench benchmarks)
Configuration  Method                        Masked  SDC    DUE    Timeout  Crash   Assert
256regs        Injection after ACE-like      60.61%  8.01%  0.10%  2.20%    28.98%  0.10%
256regs        MeRLiN                        61.34%  8.47%  0.11%  1.80%    28.22%  0.06%
128regs        Injection after ACE-like      61.02%  7.67%  0.12%  2.16%    28.97%  0.06%
128regs        MeRLiN                        61.08%  7.57%  0.11%  1.99%    29.21%  0.04%
64regs         Injection after ACE-like      63.15%  6.23%  0.16%  3.13%    27.29%  0.04%
64regs         MeRLiN                        65.26%  5.06%  0.15%  3.37%    26.10%  0.06%

Fault Effect Classification after ACE-like of Store Queue (average for 10 MiBench benchmarks)
Configuration  Method                        Masked  SDC     DUE    Timeout  Crash   Assert
64 entries     Injection after ACE-like      69.05%  24.44%  0.31%  0.35%    5.81%   0.04%
64 entries     MeRLiN                        69.40%  24.38%  0.31%  0.38%    5.49%   0.04%
32 entries     Injection after ACE-like      67.15%  24.11%  0.41%  0.34%    7.93%   0.06%
32 entries     MeRLiN                        67.47%  23.79%  0.30%  0.38%    8.00%   0.06%
16 entries     Injection after ACE-like      68.72%  18.91%  0.46%  0.50%    11.33%  0.08%
16 entries     MeRLiN                        69.64%  17.83%  0.47%  0.48%    11.50%  0.08%

Fault Effect Classification after ACE-like of L1 data cache (average for 10 MiBench benchmarks)
Configuration  Method                        Masked  SDC     DUE    Timeout  Crash  Assert
64 KB          Injection after ACE-like      43.41%  43.52%  2.16%  1.52%    9.17%  0.22%
64 KB          MeRLiN                        45.22%  40.80%  2.67%  1.99%    9.09%  0.23%
32 KB          Injection after ACE-like      40.85%  48.48%  1.46%  1.67%    7.41%  0.13%
32 KB          MeRLiN                        40.00%  49.53%  1.40%  1.60%    7.33%  0.14%
16 KB          Injection after ACE-like      44.23%  46.68%  0.76%  1.62%    6.63%  0.08%
16 KB          MeRLiN                        42.09%  49.84%  0.27%  1.42%    6.31%  0.07%
Figure 14: Classification of MeRLiN against injection with 
the remaining faults after ACE-like step for the RF, SQ, L1D. 
4.4.3.2 Accuracy in the comprehensive list of 60K faults 
Figure 15 shows the bigger picture for MeRLiN’s accuracy,
in which the final fault effect classification of the
comprehensive baseline fault injection of 60,000 faults (blue bar) is
compared to the final classification of MeRLiN (red bar).
Each bar represents the average values across the 10
MiBench benchmarks. Similar behavior is observed across
all benchmarks. In all cases MeRLiN is extremely accurate
and delivers virtually the same results as the comprehensive
injection, but orders of magnitude faster.
Final Fault Effect Classification of Register File (average for 10 MiBench benchmarks)
Configuration  Method                                    Masked  SDC    DUE    Timeout  Crash  Assert
256regs        Comprehensive baseline (60,000 faults)    97.44%  0.52%  0.03%  0.14%    1.86%  0.01%
256regs        MeRLiN                                    97.48%  0.54%  0.02%  0.12%    1.83%  0.01%
128regs        Comprehensive baseline (60,000 faults)    95.19%  0.96%  0.04%  0.28%    3.52%  0.01%
128regs        MeRLiN                                    95.18%  0.90%  0.03%  0.26%    3.63%  0.00%
64regs         Comprehensive baseline (60,000 faults)    91.08%  1.53%  0.05%  0.84%    6.48%  0.02%
64regs         MeRLiN                                    91.56%  1.26%  0.05%  1.01%    6.10%  0.02%

Final Fault Effect Classification of Store Queue (average for 10 MiBench benchmarks)
Configuration  Method                                    Masked  SDC    DUE    Timeout  Crash  Assert
64 entries     Comprehensive baseline (60,000 faults)    97.82%  1.69%  0.04%  0.02%    0.43%  0.00%
64 entries     MeRLiN                                    97.88%  1.64%  0.04%  0.03%    0.41%  0.00%
32 entries     Comprehensive baseline (60,000 faults)    97.33%  1.87%  0.05%  0.03%    0.72%  0.00%
32 entries     MeRLiN                                    97.37%  1.82%  0.04%  0.04%    0.73%  0.00%
16 entries     Comprehensive baseline (60,000 faults)    97.34%  1.39%  0.06%  0.06%    1.14%  0.01%
16 entries     MeRLiN                                    97.44%  1.31%  0.06%  0.06%    1.12%  0.01%

Final Fault Effect Classification of L1 data cache (average for 10 MiBench benchmarks)
Configuration  Method                                    Masked  SDC     DUE    Timeout  Crash  Assert
64 KB          Comprehensive baseline (60,000 faults)    80.98%  15.89%  0.49%  0.33%    2.27%  0.04%
64 KB          MeRLiN                                    82.14%  14.51%  0.61%  0.51%    2.19%  0.04%
32 KB          Comprehensive baseline (60,000 faults)    76.58%  19.88%  0.44%  0.54%    2.53%  0.03%
32 KB          MeRLiN                                    76.29%  20.24%  0.32%  0.53%    2.59%  0.03%
16 KB          Comprehensive baseline (60,000 faults)    77.85%  18.38%  0.23%  0.60%    2.91%  0.03%
16 KB          MeRLiN                                    76.90%  19.76%  0.09%  0.52%    2.70%  0.03%
Figure 15: Final classification of MeRLiN against comprehen-
sive baseline injection (60,000 faults) for the RF, SQ, L1D. 
FIT rates per method (two y-axes labeled FIT: 0 to 20 and 0 to 2400)
Structure      Configuration  Baseline  MeRLiN  ACE-like
Register File  256 regs       4.196     4.125   12.262
Register File  128 regs       3.941     3.947   12.313
Register File  64 regs        3.653     3.459   12.058
Store Queue    64 entries     0.892     0.867   4.407
Store Queue    32 entries     0.549     0.539   2.566
Store Queue    16 entries     0.272     0.262   1.456
L1 data cache  64KB           997       937     2459
L1 data cache  32KB           614       622     1120
L1 data cache  16KB           290       303     636
Figure 16: Final reliability assessment (FIT) for RF, SQ, and 
L1D (average for 10 MiBench benchmarks). 
4.4.3.3 Final Reliability Assessment (FIT) 
Figure 16 shows the final reliability estimation in
Failures-in-Time (FIT) rates for the comprehensive baseline
campaign (60,000 faults), the MeRLiN method and the
ACE-like method running the 10 MiBench benchmarks to
the end. The reported FIT rates are the product of the AVF,
the raw FIT rate per bit and the number of bits of the structure.
The AVF of the injection-based methods is the ratio of the
non-masked injections over the total injections, while the AVF
of the ACE-like method is measured as in [15]. Any raw FIT rate
can be used; we use 0.01 FIT per bit.
MeRLiN reports negligible differences compared to the
comprehensive baseline injection, while the ACE-like method
delivers a pessimistic lower bound of the structures' reliability.
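The FIT computation described above is a plain product, which the following sketch illustrates. The function names are hypothetical, and the example numbers are assumptions chosen for illustration: a register file of 256 registers that are each assumed to be 64 bits wide, and 1,536 non-masked injections out of 60,000 (an AVF of 2.56%). Only the raw rate of 0.01 FIT per bit comes from the text.

```python
def avf_from_injections(non_masked: int, total: int) -> float:
    """AVF of an injection campaign: non-masked injections over total injections."""
    return non_masked / total


def fit_rate(avf: float, raw_fit_per_bit: float, num_bits: int) -> float:
    """FIT of a structure: AVF x raw FIT rate per bit x number of bits."""
    return avf * raw_fit_per_bit * num_bits


RAW_FIT_PER_BIT = 0.01  # value used in the text
NUM_BITS = 256 * 64     # assumption: 256 registers, 64 bits each

avf = avf_from_injections(non_masked=1_536, total=60_000)
fit = fit_rate(avf, RAW_FIT_PER_BIT, NUM_BITS)
print(f"AVF = {avf:.2%}, FIT = {fit:.3f}")
```

Because the raw FIT rate and the bit count are constants for a given structure, any difference between the FIT reported by two methods comes entirely from the difference in their AVF estimates.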
4.4.3.4 Accuracy using SPEC CPU2006 benchmarks 
The evaluation of MeRLiN’s accuracy for SPEC CPU2006 
benchmarks executed until the end in detailed microarchi-
tectural simulation mode is infeasible as was discussed in 
Section 4.3. To overcome this difficulty and in order to 
evaluate the accuracy that MeRLiN provides for SPEC 
CPU2006 benchmarks, we applied MeRLiN injecting faults 
in the physical register file for the gcc and bzip2 bench-
marks and terminating the fault injection runs at the end of 
the Simpoint interval. The configuration for these experi-
ments is the one of Table 1, with 128 physical registers, 16 
store and 16 load queue entries and a 32KB L1 data cache. 
As we do not execute the fault injection runs to the end, 
we are not able to identify SDCs, timeouts or any other ab-
normal behavior after the end of the Simpoint interval. 
Thus, only for these experiments we used a different fault 
effect classification than the classification presented in Ta-
ble 2. The classification consists of the following catego-
ries: (i) Masked; indicates a fault that was not over-written 
or hit a non-vulnerable interval without affecting program 
execution, (ii) DUE (as in Table 2), (iii) Crash (as in Table 
2), (iv) Assert (as in Table 2), and (v) Unknown; indicates 
a fault that still exists but at the end of the Simpoint inter-
val it is not known if it will eventually be classified in one 
of the previous classes or if it will lead to an abnormal be-
havior. 
Table 4 summarizes our measurements per fault effect
category using MeRLiN and the comprehensive baseline
fault list of 60K faults for the two benchmarks. In both cases,
MeRLiN delivers very accurate results per fault effect
category compared to the comprehensive baseline method;
the maximum inaccuracy that was observed is only
1.11 percentage points for the Unknown category of the
bzip2 benchmark.
Table 4: MeRLiN’s accuracy for gcc and bzip2 benchmarks. 
Category  gcc (MeRLiN)  gcc (baseline 60K faults)  bzip2 (MeRLiN)  bzip2 (baseline 60K faults)
Masked    85.08%        85.08%                     84.98%          84.98%
DUE       0.06%         0.07%                      0.29%           0.81%
Crash     3.67%         3.13%                      3.50%           4.10%
Assert    0.01%         0.01%                      0.03%           0.02%
Unknown   11.18%        11.71%                     11.20%          10.09%
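The maximum inaccuracy quoted for Table 4 can be recomputed directly from the table entries. The sketch below is illustrative only; the data structure and function names are hypothetical, while the percentages are copied from Table 4.

```python
# (MeRLiN %, baseline %) per fault effect category, copied from Table 4.
TABLE4 = {
    "gcc": {
        "Masked": (85.08, 85.08), "DUE": (0.06, 0.07), "Crash": (3.67, 3.13),
        "Assert": (0.01, 0.01), "Unknown": (11.18, 11.71),
    },
    "bzip2": {
        "Masked": (84.98, 84.98), "DUE": (0.29, 0.81), "Crash": (3.50, 4.10),
        "Assert": (0.03, 0.02), "Unknown": (11.20, 10.09),
    },
}


def abs_differences(rows):
    """Absolute difference (in percentage points) between MeRLiN and the baseline."""
    return {category: abs(merlin - baseline) for category, (merlin, baseline) in rows.items()}


# Find the (benchmark, category) pair with the largest absolute difference.
worst = max(
    ((bench, category, diff)
     for bench, rows in TABLE4.items()
     for category, diff in abs_differences(rows).items()),
    key=lambda item: item[2],
)
print(f"Largest difference: {worst[2]:.2f} percentage points ({worst[1]}, {worst[0]})")
```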
4.4.4 Analysis of Relyzer’s heuristics 
Both MeRLiN and Relyzer prune faults of the initial fault
list, but they inject faults at different levels of the system stack.
Thus, in this section we analyze the applicability of Relyzer's
heuristics to microarchitecture level injection.
Bounding addresses: It prunes faults in the address
field of store and load instructions if the valid address space
is violated. This heuristic requires an unaffordable amount
of memory to track the addresses in data related structures
(e.g. caches). Also, MeRLiN provides a finer grained effect
classification for non-masking categories (Table 2) and is
not limited to symptom-based techniques.
Def-use: It prunes faults in the destination architectural
register of an instruction followed by another instruction
that consumes this value, as these faults will have the same
effect. Store-equivalence is similar to def-use for store
and load instructions. These two heuristics cannot be applied
at the microarchitecture level of our work: the destination
register of an instruction and the source register of a
subsequent instruction correspond to the same physical entity [47].
Control-equivalence: Software analysis using basic
blocks tracks the control flow paths of all the dynamic
instances of all the static instructions to separate Masked
from SDC faults [46]. For each path Relyzer randomly
chooses only one pilot. To evaluate this heuristic, we ran
the 10 MiBench benchmarks to the end with 128 registers, 16 SQ
entries and a 32KB L1D. Exhaustive fault injection is infeasible;
thus, we used the remaining faults (from 60,000 initial
faults) after the pruning by our ACE-like step. We used a
control flow path depth of 5, exactly as Relyzer does [45].
In terms of speedup, MeRLiN slightly prevails on average
in the RF (62.1X compared to 60.5X) and the L1D
(60.1X compared to 59.1X), while for the SQ, MeRLiN
provides 146.9X speedup compared to 150.6X of Relyzer's
heuristic. Figure 17 illustrates the results of the comparison
in terms of inaccuracy, in percentage points, compared to
the injection using the same fault list.
A source of Relyzer’s inaccuracy is the static instructions
with a large population of faults that are represented by
only one randomly selected pilot. In [45], 52% on average
of all static instructions have only 1 pilot. We measured
that Relyzer leaves 9% of the groups correlated to a static
instruction with a large population of faults (more than 100
faults) with only 1 pilot, while MeRLiN leaves less than
2%. The heuristic of Relyzer, if applied to our statistical
concept, selects only one pilot for code loops with a large
number of iterations. Assume a for-loop with 1000 iterations
that consists of only one static instruction with only
two control flow paths with 995 and 5 instances, respectively.
Due to statistical sampling all faults may come only
from the first path. In this case, Relyzer chooses only one
pilot for this loop. On the contrary, MeRLiN, due to the
homogeneous distribution of faults, chooses more than one
pilot from different bytes and dynamic instances. These large
loops exist in most program execution phases, including
the initialization and output phases that are not examined by
[45]. Despite Relyzer’s indisputable merit in software
resilience, this heuristic of Relyzer is not efficient enough to be
employed in our concept.
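The for-loop example above can be quantified with a short calculation. The sketch below is an illustration under an assumption that is not stated in the text: each sampled fault is taken to hit one of the 1000 dynamic instances uniformly and independently of the others. The function name is hypothetical.

```python
def prob_all_from_majority_path(majority: int, minority: int, sampled_faults: int) -> float:
    """Probability that every sampled fault lands on an instance of the majority path."""
    total = majority + minority
    return (majority / total) ** sampled_faults


# Loop with 1000 iterations split into two control flow paths (995 and 5 instances).
for k in (1, 10, 50, 100):
    p = prob_all_from_majority_path(995, 5, k)
    print(f"{k:>3} sampled faults: P(all in the 995-instance path) = {p:.3f}")
```

Under this assumption the minority path is missed with high probability even when a fairly large number of faults is sampled in the loop, in which case a one-pilot-per-path heuristic ends up with a single pilot for all of these faults.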
[Plot: paired bars (Relyzer, MeRLiN) for each fault effect class (Masked, SDC, DUE, Timeout, Crash, Assert) of the RF, SQ and L1D; y-axis: inaccuracy in percentile units (0.0 to 4.5).]
Figure 17: Inaccuracy of MeRLiN and Relyzer vs. injection
with the remaining faults after ACE-like; avg. for 10 MiBench.
4.4.5 Theoretical analysis of MeRLiN 
In this section we analyze the statistical behavior of MeRLiN,
comparing the mean and the variance of the AVF
measurements it reports to the corresponding mean and
variance of the comprehensive fault injection campaign.
We assume that soft errors affecting the microprocessor bits
follow a normal distribution [26]. A fault injection campaign
can be described as a binomial experiment of F individual
injections, each of which has a probability of success
(program is affected) or failure (program is not affected;
fault is masked). Thus, the AVF measurement k (0 ≤ k ≤ 1)
in our case means that k·F faults are Not-Masked.
MeRLiN’s first phase prunes a fraction m (0 ≤ m ≤ 1) of
the F faults that are guaranteed masked: m·F. The remaining
(1–m)·F faults (which now contain all k·F Not-Masked
faults of the initial list of F faults) are forwarded to the second
phase of MeRLiN (grouping). This second phase produces
n groups of faults with sizes s_i (i = 1, 2, … , n). The
sum of the group sizes is equal to the number of faults
passed to the second phase: s_1 + s_2 + … + s_n = (1–m)·F.
When the comprehensive injection campaign (without
MeRLiN) is applied, all F faults are injected and the outcome
r of each run is observed (Not-Masked=1 or
Masked=0). In this case, the AVF (k) is (see footnote 6):

$$k = \frac{\sum_{i=1}^{n} \sum_{j=1}^{s_i} r_i^j}{F}$$

where r_i^j denotes the outcome of the run for the j-th fault of group i.
6  We could consider as group 0, with size s_0 = m·F, the group of faults from
MeRLiN’s pre-processing step, but since all faults of this group are masked,
i.e. r=0, this group is not needed in the calculations.

We assume that the probability of Non-Masking within
a group i is p_i. Within a group i, all faults have the same
probability p_i because of MeRLiN’s grouping criterion:
faults in a group hit the same byte of the entries during a
vulnerable interval that ends with the same instruction that
reads the entry. The results of Figure 7 show the validity of
this assumption; they indicate that the vast majority of
groups have homogeneity close to 1.0 (considering only the
masked and non-masked categories) and that the percentage
of groups with perfect homogeneity is very large in all
cases. Across groups, the probabilities p_i are different since the
groups correspond to faults eventually read by different
instructions. The mean (expected value; E) of the AVF
measurement k in the comprehensive campaign is (see footnote 7):
$$E(k) = E\left(\frac{\sum_{i=1}^{n} \sum_{j=1}^{s_i} r_i^j}{F}\right) = \frac{\sum_{i=1}^{n} \sum_{j=1}^{s_i} E(r_i^j)}{F} = \frac{\sum_{i=1}^{n} \sum_{j=1}^{s_i} p_i}{F} = \frac{\sum_{i=1}^{n} s_i p_i}{F}$$
When MeRLiN is employed it delivers a new AVF
measurement k_MeRLiN. For each run r of the selected fault
from a group i, all faults of the group are assumed to have the
same result (1=Not-Masked, 0=Masked). So, the true measurement
in this case is s_i·r_i for each group i and the new AVF k_MeRLiN
is:

$$k_{MeRLiN} = \frac{\sum_{i=1}^{n} s_i r_i}{F}$$

which has a mean

$$E(k_{MeRLiN}) = E\left(\frac{\sum_{i=1}^{n} s_i r_i}{F}\right) = \frac{\sum_{i=1}^{n} E(s_i r_i)}{F} = \frac{\sum_{i=1}^{n} s_i E(r_i)}{F} = \frac{\sum_{i=1}^{n} s_i p_i}{F} = E(k)$$

therefore, MeRLiN reports AVF with the same mean value
as the original comprehensive set of F fault injections. The
variance of the AVF measurements k and k_MeRLiN is shown
in the following equations (see footnote 8):
$$\sigma^2(k) = \sigma^2\left(\frac{\sum_{i=1}^{n} \sum_{j=1}^{s_i} r_i^j}{F}\right) = \frac{\sum_{i=1}^{n} \sum_{j=1}^{s_i} \sigma^2(r_i^j)}{F^2} = \frac{\sum_{i=1}^{n} \sum_{j=1}^{s_i} p_i (1 - p_i)}{F^2} \;\Rightarrow\; \sigma^2(k) = \frac{\sum_{i=1}^{n} s_i p_i (1 - p_i)}{F^2}$$

$$\sigma^2(k_{MeRLiN}) = \sigma^2\left(\frac{\sum_{i=1}^{n} s_i r_i}{F}\right) = \frac{\sum_{i=1}^{n} s_i^2 \sigma^2(r_i)}{F^2} = \frac{\sum_{i=1}^{n} s_i^2 p_i (1 - p_i)}{F^2}$$
The values of both σ²(k) and σ²(k_MeRLiN) are very small
(several orders of magnitude smaller than the means of k
and k_MeRLiN, respectively) for two reasons: (a) the groups
generated by MeRLiN are very homogeneous (thus either p_i
or (1–p_i) is zero or is very small) as shown in Section 4.4.1
and (b) the sizes of the groups (s_i values) are very small
compared to F. In our experiments, the average size of a
MeRLiN group is always less than 100 and typically ranges
between 5 and 40. Thus, with simple calculations on the
above equations, the variance of the initial AVF value when
F consists of 60K faults is about 8 to 10 orders of magnitude
smaller than the mean. Therefore, the multiplication
with the s_i values in the variance of MeRLiN’s AVF
measurements σ²(k_MeRLiN) keeps this variance from 6 to 8 orders
of magnitude smaller than the mean (assuming s_i values up
to 100): still a very small variance, only slightly increased
compared to the initial one.
7  We use the linearity property of the means of independent variables which
holds for binomial distribution. The mean of a binomially distributed
variable is E(X) = n·p with n experiments and p success probability.
8  We use the relation σ²(aX + bY) = a²·σ²(X) + b²·σ²(Y) for the
variances of independent variables. Group 0 has zero variance.
Overall our analysis shows that the AVF measurement 
of MeRLiN has the same mean as the comprehensive ex-
periment of F injections, while both have a very small vari-
ance. These two statistical properties make them almost 
statistically equivalent although MeRLiN reports AVF in 2 
to 3 orders of magnitude shorter time. 
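The statistical argument of this section can be illustrated numerically. The sketch below is a simplified model and not the authors' tooling: the group sizes and probabilities are made-up values, each group is represented by a single injected fault (MeRLiN may inject a few representatives per group), and all function and variable names are hypothetical. It evaluates the closed-form mean and variances for the comprehensive campaign and for the grouped campaign, and compares the means with a small seeded Monte Carlo simulation.

```python
import random


def analytic_moments(sizes, probs, total_faults):
    """Closed-form mean and variances of the AVF for both campaigns."""
    mean = sum(s * p for s, p in zip(sizes, probs)) / total_faults
    var_full = sum(s * p * (1 - p) for s, p in zip(sizes, probs)) / total_faults**2
    var_grouped = sum(s * s * p * (1 - p) for s, p in zip(sizes, probs)) / total_faults**2
    return mean, var_full, var_grouped


def simulate_means(sizes, probs, total_faults, trials, rng):
    """Average AVF over repeated campaigns: all faults vs. one representative per group."""
    sum_full = 0.0
    sum_grouped = 0.0
    for _ in range(trials):
        # Comprehensive campaign: every fault of every group is injected.
        not_masked = sum(
            sum(rng.random() < p for _ in range(s)) for s, p in zip(sizes, probs)
        )
        sum_full += not_masked / total_faults
        # Grouped campaign: one run per group, its outcome weighted by the group size.
        weighted = sum(s * (rng.random() < p) for s, p in zip(sizes, probs))
        sum_grouped += weighted / total_faults
    return sum_full / trials, sum_grouped / trials


# Made-up example: F = 100 faults, 40 pruned as masked, 60 split into three groups.
sizes = [10, 20, 30]
probs = [0.0, 1.0, 0.5]  # two perfectly homogeneous groups and one mixed group
F = 100

mean, var_full, var_grouped = analytic_moments(sizes, probs, F)
sim_full, sim_grouped = simulate_means(sizes, probs, F, trials=5000, rng=random.Random(0))

print(f"analytic mean: {mean:.4f}")
print(f"simulated means: full={sim_full:.4f}, grouped={sim_grouped:.4f}")
print(f"analytic variances: full={var_full:.6f}, grouped={var_grouped:.6f}")
```

In this toy setting only the mixed group contributes to either variance, which mirrors the argument of the text: the more homogeneous the groups, the closer both variances are to zero, while the two means coincide regardless of homogeneity.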
5. CONCLUSIONS
We presented MeRLiN, a methodology to accelerate com-
prehensive, statistically significant microarchitecture level 
fault injection campaigns on hardware structures modeled 
in performance simulators. MeRLiN's effectiveness is based 
on the combination of the principle of dynamic instruction 
repetition and the identification of the non-vulnerable in-
tervals for the entries of the hardware structures. We 
demonstrated its efficiency using microarchitecture level 
fault injection on a Gem5 model of a contemporary micro-
processor. We reported results for the method's speedup, 
accuracy, and scaling for different sizes of the physical reg-
ister file, store queue and first level data cache. 
MeRLiN achieves several orders of magnitude speedup 
(reduction of the number of injections) while it virtually 
delivers the same reliability measurements compared to 
exhaustive (but computationally infeasible) fault injection 
campaigns. Our experimental results and theoretical analy-
sis validate MeRLiN’s accuracy. 
ACKNOWLEDGMENT 
This work has been funded by the European Union through 
the CLERECO FP7 Project (Grant Agreement 611404) and 
the UniServer H2020 Project (Grant Agreement 688540). 
6. REFERENCES
[1] Robert Baumann. 2005. Soft errors in advanced computer systems. In 
IEEE Design & Test of Computers, vol. 22, no. 3, pp. 258-266, May-
June. DOI:http://doi.org/10.1109/MDT.2005.69 
[2] Zeshan Chishti, Alaa R.Alameldeen, Chris Wilkerson, Wei Wu, and 
Shih-Lien Lu. 2009. Improving cache lifetime reliability  
at ultra-low voltages. In Proceedings of the IEEE/ACM  
12
International Symposium on Microarchitecture (MICRO). 
DOI:http://dx.doi.org/10.1145/1669112.1669126 
[3] Cristian Constantinescu. 2003. Trends and challenges in VLSI circuit  
reliability. In IEEE Micro, vol. 23, pp. 14-19, July. 
DOI:http://dx.doi.org/10.1109/MM.2003.1225959 
[4] Lin Huang, and Qiang Xu. 2010. AgeSim: A simulation framework for 
evaluating the lifetime reliability of processor-based SoCs. In Proceed-
ings of Design, Automation and Test in Europe (DATE). ISBN:978-3-
9810801-6-2
[5] Sani R.Nassif, Nikil Mehta, and Yu Cao. 2010. A resilience roadmap. 
In Proceedings of Design, Automation and Test in Europe (DATE).  
ISBN:978-3-9810801-6-2 
[6] Yixin Luo, Sriram Govindan, Bikash Sharma, Mark Santaniello, Justin 
Meza, Aman Kansal, Jie Liu, Badriddine Khessib, Kushagra Vaid, and 
Onur Mutlu. 2014. Characterizing application memory error vulnerabil-
ity to optimize datacenter cost via heterogeneous reliability  
memory. In Proceedings of IEEE/IFIP International Conference  
on Dependable Systems and Networks (DSN).  
DOI:http://dx.doi.org/10.1109/DSN.2014.50 
[7] Hyungmin Cho, Shahrzad Mirkhani, Chen-Yong Cher, Jacob 
A.Abraham, and Subhasish Mitra. 2013. Quantitative evaluation of soft 
error injection techniques for robust system design. In Proceedings of 
ACM/EDAC/IEEE Design and Automation Conference (DAC). 
ISBN:978-1-4503-2071-9 
[8] Michail Maniatakos, Naghmeh Karimi, Chandra Tirumurti, Abhijit Jas, 
and Yiorgos Makris. 2011. Instruction-level impact analysis of low-
level faults in a modern microprocessor controller. In  
IEEE Transactions on Computers, vol. 60, no. 9, pp. 1260-1273. 
DOI:http://dx.doi.org/10.1109/TC.2010.60 
[9] Nicholas J.Wang, Aqeel Mahesri, and Sanjay J.Patel. 2007. Examining 
ACE analysis reliability estimates using fault-injection. In Proceedings 
of IEEE/ACM International Symposium on Computer Architecture 
(ISCA). DOI:http://dx.doi.org/10.1145/1250662.1250719 
[10] Gulay Yalcin, Osman S.Unsal, Adrian Cristal, and Mateo  
Valero. 2011. FIMSIM: A fault injection infrastructure for  
microarchitectural simulators. In Proceedings of IEEE  
International Conference on Computer Design (ICCD).  
DOI:http://dx.doi.org/10.1109/ICCD.2011.6081435 
[11] Nikos Foutris, Dimitris Gizopoulos, John Kalamatianos, and Vilas 
Sridharan. 2013. Assessing the impact of hard faults in performance 
components of modern microprocessors. In Proceedings of IEEE  
International Conference on Computer Design (ICCD). 
DOI:http://dx.doi.org/10.1109/ICCD.2013.6657044 
[12] Athanasios Chatzidimitriou, and Dimitris Gizopoulos. 2016. Anatomy 
of microarchitecture-level reliability assessment: Throughput and accu-
racy. In Proceedings of IEEE International Symposium on  
Performance Analysis of Systems and Software (ISPASS). 
DOI:http://dx.doi.org/10.1109/ISPASS.2016.7482075 
[13] Manolis Kaliorakis, Sotiris Tselonis, Athanasios Chatzidimitriou,  
and Dimitris Gizopoulos. 2015. Differential fault injection  
on microarchitectural simulators. In Proceedings of IEEE  
International Symposium on Workload Characterization (IISWC). 
DOI:http://dx.doi.org/10.1109/IISWC.2015.28 
[14] Manolis Kaliorakis, Sotiris Tselonis, Athanasios Chatzidimitriou, and 
Dimitris Gizopoulos. 2015. Accelerated microarchitectural  
fault injection-based reliability assessment. In Proceedings of  
IEEE International Symposium on Defect and Fault  
Tolerance in VLSI and Nanotechnology Systems (DFTS). 
DOI:http://dx.doi.org/10.1109/DFT.2015.7315134 
[15] Shubhendu S.Mukherjee, Christopher Weaver, Joel Emer, Steven 
K.Reinhardt, and Todd Austin. 2004. A systematic methodology to 
compute the architectural vulnerability factors for a high-performance 
microprocessor. In Proceedings of IEEE/ACM International Sympo-
sium on Microarchitecture (MICRO). ISBN:0-7695-2043-X 
[16] Arun Nair, Stijn Eyerman, Lieven Eeckhout, and Lizy K.John. 2012. A 
first-order mechanistic model for architectural vulnerability factor. In 
Proceedings of IEEE/ACM International Symposium on Computer 
Architecture (ISCA). DOI:http://dx.doi.org/10.1145/2366231.2337191 
[17] Arijit Biswas, Paul Racunas, Romulus Cheveresan, Joel Emer,  
Shubhendu S.Mukherjee, and Ram Rangan. 2005. Computing  
architectural vulnerability factors for address-based structures. In Pro-
ceedings of IEEE/ACM International Symposium on Computer Archi-
tecture (ISCA). DOI:http://dx.doi.org/10.1109/ISCA.2005.18 
[18] Hossein Asadi, Vilas Sridharan, Mehdi Tahoori, and David Kaeli. 2005. 
Balancing performance and reliability in the memory hierarchy. In Pro-
ceedings of IEEE International Symposium on Performance  
Analysis of Systems and Software (ISPASS). 
DOI:http://dx.doi.org/10.1109/ISPASS.2005.1430581 
[19] Xiaodong Li, Sarita V.Adve, Pradip Bose, and Jude A.Rivers. 2005. 
SoftArch: An architecture-level tool for modeling and  
analyzing soft errors. In Proceedings of IEEE/IFIP International  
Conference on Dependable Systems and Networks (DSN). 
DOI:http://dx.doi.org/10.1109/DSN.2005.88 
[20] Jinho Suh, Murali Annavaram, and Michel Dubois. 2012. MACAU: A 
markov model for reliability evaluations of caches under single-bit  
and multi-bit upsets. In Proceedings of IEEE International  
Symposium on High Performance Computer Architecture (HPCA). 
DOI:http://dx.doi.org/10.1109/HPCA.2012.6168940 
[21] Jinho Suh, Mehrtash Manoochehri, Murali Annavaram, and  
Michel Dubois. 2011. Soft error benchmarking of L2 caches  
with PARMA. In Proceedings of ACM SIGMETRICS. 
DOI:http://dx.doi.org/10.1145/2007116.2007127 
[22] Shuguang Feng, Shantanu Gupta, Amin Ansari, and Scott Mahlke. 
2010. Shoestring: probabilistic soft error reliability on the cheap. In 
Proceedings of IEEE/ACM International Conference on Architectural 
Support for Programming Languages and Operating Systems 
(ASPLOS). DOI:http://dx.doi.org/10.1145/1736020.1736063 
[23] Nishant J.George, Carl R.Elks, Barry W.Johnson, and John Lach. 2010. 
Transient fault models and AVF estimation revisited. In Proceedings of 
IEEE/IFIP International Conference on Dependable Systems and 
Networks (DSN). DOI:http://dx.doi.org/10.1109/DSN.2010.5544276 
[24] Xiaodong Li, Sarita V.Adve, Pradip Bose, and Jude A.Rivers. 2007. 
Architecture-level soft error analysis: Examining the limits of common 
assumptions. In Proceedings of IEEE/IFIP International Conference 
on Dependable Systems and Networks (DSN). 
DOI:http://dx.doi.org/10.1109/DSN.2007.15 
[25] Arun A.Nair, Lizy K.John, and Lieven Eeckhout. 2010. AVF Stress-
mark: Towards an automated methodology for bounding the worst-case 
vulnerability to soft errors. In Proceedings of IEEE/ACM International  
Symposium on Microarchitecture (MICRO). 
DOI:http://dx.doi.org/10.1109/MICRO.2010.34 
[26] Regis Leveugle, A.Calvez, Paolo Maistri, and Pierre Vanhauwaert. 
2009. Statistical fault injection: Quantified error and confidence. In 
Proceedings of Design, Automation and Test in Europe (DATE). 
DOI:http://dx.doi.org/10.1109/DATE.2009.5090716 
[27] Arijit Biswas, Paul Racunas, Joel Emer, and Shubhendu S.Mukherjee. 
2008. Computing accurate AVFs using ACE analysis on performance 
models: a rebuttal. In IEEE Computer Architecture Letters, vol. 7, no. 
1, January-June. DOI:http://dx.doi.org/10.1109/L-CA.2007.19 
[28] Aashish Phansalkar, Ajay Joshi, and Lizy K.John. 2007. Analysis of 
redundancy and application balance in the SPEC CPU2006  
benchmark suite. In Proceedings of IEEE/ACM International Sympo-
sium on Computer Architecture (ISCA). 
DOI:http://dx.doi.org/10.1145/1273440.1250713 
[29] Avinash Sodani, and Gurindar S.Sohi. 1997. Dynamic instruction reuse. 
In Proceedings of IEEE/ACM International Symposium on Computer 
Architecture (ISCA). DOI:http://dx.doi.org/10.1145/384286.264200 
[30] Avinash Sodani, and Gurindar S.Sohi. 1998. An empirical  
analysis of instruction repetition. In Proceedings of IEEE/ACM  
International Conference on Architectural Support for  
Programming Languages and Operating Systems (ASPLOS). 
DOI:http://dx.doi.org/10.1145/384265.291016 
[31] Saisanthosh Balakrishnan, and Gurindar S.Sohi. 2003. Exploiting value 
locality in physical register files. In Proceedings of IEEE/ACM Interna-
tional Symposium on Microarchitecture (MICRO). 
ISBN:0-7695-2043-X 
[32] Xin Fu, Tao Li, and Jose Fortes. 2006. Sim-SODA: A unified frame-
work for architectural level software reliability analysis. In Workshop 
on Modeling, Benchmarking and Simulation. 
[33] Lide Duan, Bin Li, and Lu Peng. 2009. Versatile prediction and fast 
estimation of architectural vulnerability factor from processor perfor-
mance metrics. In Proceedings of IEEE International Symposium  
on High Performance Computer Architecture (HPCA). 
DOI:http://dx.doi.org/10.1109/HPCA.2009.4798244 
[34] Konstantinos Parasyris, Georgios Tziantzoulis, Christos D. Antonopou-
los, and Nikolaos Bellas. 2014. GemFI: A fault injection tool for study-
ing the behavior of applications on unreliable substrates. In Proceed-
ings of IEEE/IFIP International Conference on Dependable Systems 
and Networks (DSN). DOI:http://dx.doi.org/10.1109/DSN.2014.96 
[35] Nathan Binkert, Bradford Beckmann, Gabriel Black, Steven 
K.Reinhardt, Ali Saidi, Arkaprava Basu, Joel Hestness, Derek 
R.Hower, Tushar Krishna, Somayeh Sardashti, Rathijit Sen, Korey 
Sewell, Muhammad Shoaib, Nilay Vaish, Mark D.Hill,  
and David A.Wood. 2011. The Gem5 simulator. In ACM  
SIGARCH Computer Architecture News, vol. 39, no. 2, May. 
DOI:http://dx.doi.org/10.1145/2024716.2024718 
[36] Man-Lap Li, Pradeep Ramachandran, Swarup K.Sahoo, Sarita V.Adve, 
Vikram S.Adve, and Yuanyuan Zhou. 2008. Understanding the propa-
gation of hard errors to software and implications for resilient system 
design. In Proceedings of IEEE/ACM International Conference on Ar-
chitectural Support for Programming Languages and Operating Sys-
tems (ASPLOS). DOI:http://dx.doi.org/10.1145/1353534.1346315 
[37] Jinho Suh, Murali Annavaram, and Michel Dubois. 2013.  
PHYS: Profiled-HYbrid Sampling for soft error reliability  
benchmarking. In Proceedings of IEEE/IFIP International  
Conference on Dependable Systems and Networks (DSN). 
DOI:http://dx.doi.org/10.1109/DSN.2013.6575352 
[38] Vilas Sridharan, and David R.Kaeli. 2009. Eliminating  
microarchitectural dependency from architectural vulnerability.  
In Proceedings of IEEE International Symposium on 
High Performance Computer Architecture (HPCA). 
DOI:http://dx.doi.org/10.1109/HPCA.2009.4798243 
[39] Vilas Sridharan, and David R.Kaeli. 2010. Using hardware vulnerabil-
ity factors to enhance AVF analysis. In Proceedings of IEEE/ACM In-
ternational Symposium on Computer Architecture (ISCA). 
DOI:http://dx.doi.org/10.1145/1816038.1816023 
[40] Arijit Biswas, Niranjan Soundararajan, Shubhendu S.Mukherjee, 
and Sudhanva Gurumurthi. 2009. Quantized AVF: a means of capturing 
vulnerability variations over small windows of time. In International 
Workshop on Silicon Errors in Logic-System Effects (SELSE). 
[41] Pablo Montesinos, Wei Liu, and Josep Torrellas. 2007. Using  
register lifetime predictions to protect register files  
against soft errors. In Proceedings of IEEE/IFIP International  
Conference on Dependable Systems and Networks (DSN). 
DOI:http://dx.doi.org/10.1109/DSN.2007.99 
[42] Xin Xu, and Man-Lap Li. 2012. Understanding soft error propagation 
using efficient vulnerability-driven fault injection. In Proceedings of 
IEEE/IFIP International Conference on Dependable Systems and 
Networks (DSN). ISBN: 978-1-4673-1624-8 
[43] Vimal Reddy, and Eric Rotenberg. 2007. Inherent Time  
Redundancy (ITR): Using program repetition for low-overhead  
fault tolerance. In Proceedings of IEEE/IFIP International  
Conference on Dependable Systems and Networks (DSN). 
DOI:http://dx.doi.org/10.1109/DSN.2007.59 
[44] Mohamed A.Gomaa, and T.N.Vijaykumar. 2005. Opportunistic transi-
ent-fault detection. In Proceedings of IEEE/ACM International  
Symposium on Computer Architecture (ISCA). 
DOI:http://dx.doi.org/10.1109/ISCA.2005.38 
[45] Siva K.S.Hari, Sarita V.Adve, Helia Naeimi, and Pradeep Ramachan-
dran. 2012. Relyzer: Exploiting application-level fault equivalence to 
analyze application resiliency to transient faults. In Proceedings of 
IEEE/ACM International Conference on Architectural Support for 
Programming Languages and Operating Systems (ASPLOS). 
DOI:http://dx.doi.org/10.1145/2150976.2150990 
[46] Guanpeng Li, Qining Lu, and Karthik Pattabiraman. 2015.  
Fine-grained characterization of faults causing long latency  
crashes in programs. In Proceedings of IEEE/IFIP International  
Conference on Dependable Systems and Networks (DSN). 
DOI:http://dx.doi.org/10.1109/DSN.2015.36 
[47] Horst Schirmeier, Christoph Borchert, and Olaf Spinczyk. 2015. Avoid-
ing pitfalls in fault-injection based comparison of program susceptibility 
to soft errors. In Proceedings of IEEE/IFIP International Conference 
on Dependable Systems and Networks (DSN). 
DOI:http://dx.doi.org/10.1109/DSN.2015.44 
[48] Siva K.S.Hari, Radha Venkatagiri, Sarita V.Adve, and Helia 
Naeimi. 2014. GangES: Gang error simulation for hardware  
resilience evaluation. In Proceedings of IEEE/ACM  
International Symposium on Computer Architecture (ISCA). 
DOI:http://dx.doi.org/10.1145/2678373.2665685 
[49] Matthew R.Guthaus, Jeff S.Ringenberg, Damien Ernst, Todd M.Austin, 
Trevor Mudge, and Richard B.Brown. 2001. MiBench: A free, com-
mercially representative embedded benchmark suite. In Proceedings of 
IEEE International Workshop on Workload Characterization (WWC). 
DOI:http://dx.doi.org/10.1109/WWC.2001.990739 
[50] Daya S.Khudia, and Scott Mahlke. 2014. Harnessing soft computations 
for low budget fault tolerance. In Proceedings of IEEE/ACM  
International Symposium on Microarchitecture (MICRO). 
DOI:http://dx.doi.org/10.1109/MICRO.2014.33 
[51] Timothy Sherwood, Erez Perelman, Greg Hamerly, and Brad Calder. 
2002. Automatically characterizing large scale program behavior. In 
Proceedings of IEEE/ACM International Conference on Architectural 
Support for Programming Languages and Operating Systems 
(ASPLOS). DOI:http://dx.doi.org/10.1145/635506.605403 
