Architectural-Space Exploration of Heterogeneous Reliability and
  Checkpointing Modes for Out-of-Order Superscalar Processors by Prabakaran, Bharath Srinivas et al.
Date of publication xxxx 00, 0000, date of current version xxxx 00, 0000.
Digital Object Identifier 10.1109/ACCESS.2017.DOI
BHARATH SRINIVAS PRABAKARAN1, (Student Member, IEEE), MIHIKA DAVE2, FLORIAN
KRIEBEL1, SEMEEN REHMAN1, AND MUHAMMAD SHAFIQUE1, (Senior Member, IEEE)
1Technische Universität Wien (TU Wien), Vienna, Austria
2University of Illinois Urbana-Champaign, Champaign, Illinois, USA
Corresponding author: Bharath Srinivas Prabakaran (bharath.prabakaran@tuwien.ac.at).
This work is supported in parts by the German Research Foundation (DFG) as part of the priority program "Dependable Embedded
Systems" (SPP 1500 - http://spp1500.itec.kit.edu)
ABSTRACT Reliability for multi-core processors has emerged as an important design constraint. A key
research challenge is to detect and/or mitigate transient faults, such as soft errors, that can abruptly terminate
an executing application or generate incorrect output, both leading to undesirable effects that can potentially
be catastrophic in safety-critical systems. State-of-the-art reliability techniques and mechanisms deploy
full-scale redundancy, like double or triple modular redundancy (DMR, TMR), on different layers of the
computing stack to detect and/or correct such transient faults. However, the techniques relying on full-
scale redundancy incur significant area, performance, and/or power overheads, which might not always be
feasible/practical due to system constraints such as deadlines and available power budget for the full chip (or
a processor core). Moreover, depending on the inherent resilience of an application, not every application
requires full-scale redundancy, which would otherwise result in resource/energy wastage. Hence, techniques
relying on selective redundancy have recently been investigated by researchers.
In this work, we propose a novel design methodology to generate and explore the architectural-space of
heterogeneous reliability modes for out-of-order superscalar multi-core processors. These heterogeneous
modes are iso-ISA (i.e., implement the same Instruction Set Architecture), but differ in terms of the micro-
architectural implementation, i.e., different components are hardened with different reliability techniques.
Hence, these heterogeneous modes enable varying reliability and power/area trade-offs, from which an
optimal configuration can be chosen at run time to meet the reliability requirements of a given system, while
reducing the corresponding power overheads (or alternatively solving the inverse problem, i.e., maximizing
the reliability under a given power constraint). We implemented different reliability modes for the ALPHA
21264 out-of-order superscalar microprocessor, and integrated different cores with heterogeneous reliability
modes in a multi-core configuration. Our experimental results show that a pareto-optimal heterogeneous
reliability mode reduces the core vulnerability by 87%, on average, across multiple application workloads,
with area and power overheads of 10% and 43%, respectively.
To further enhance the design space of heterogeneous reliability modes, we investigate the effectiveness of
combining different processor state compression techniques like Distributed Multi-threaded Checkpointing
(DMTCP), Hash-based Incremental Checkpointing (HBICT) and GNU zip, such that the correct processor
state can be recovered once a fault is detected. These state compression techniques aim at reducing the
storage requirements of the processors’ correct state, which is backed-up at an application checkpoint during
its execution to ensure successful recovery. We reduced the checkpoint sizes by a factor of ~6× using a
unique combination of different state compression techniques. To validate our concepts, we significantly
enhanced the open-source cycle-accurate simulator gem5 (which is widely adopted by the relevant research
communities) with the state compression techniques and the heterogeneous reliability modes.
INDEX TERMS Reliability, Multi-Cores, Heterogeneity, Fault-Tolerance, AVF, Hardening, Microproces-
sors, Superscalar, Resilience, Design Space Exploration, Checkpointing, Out-of-Order, Architecture.
VOLUME 4, 2016 1
arXiv:1811.07612v3 [cs.AR] 12 Jul 2019
Prabakaran et al.: Heterogeneous Reliability and Checkpointing Modes for Out-of-Order Superscalar Processors
I. INTRODUCTION
AGGRESSIVE transistor scaling has led to an increased susceptibility towards several reliability problems, such
as soft errors, at the hardware layer [1]. Soft errors are
transient faults in the hardware that cause bit-flips in the
micro-architecture, which may propagate to the application
output and corrupt its state, or may terminate the applica-
tion’s execution [2] [3]. The rate of occurrences of these soft
errors is expected to increase with each new generation of mi-
croprocessors being released, due to aggressive shrinking of
transistors’ feature sizes and imperfection in the fabrication
process [4] [5] (see Section II).
Plenty of research works focusing on techniques like full-
scale redundancy and checkpointing have been proposed
towards prevention, detection, and/or mitigation of soft errors
across the computing stack, i.e., the hardware and software
layers [6] [7]. Reliability at the hardware layer is ensured
through redundancy of execution paths and/or hardening of
pipeline components, i.e., full-scale Double or Triple Mod-
ular Redundancy (DMR, TMR). Software-layer techniques
realize full-scale spatial/temporal redundancy by executing
multiple redundant instructions or threads of an application,
thereby ensuring a reliable output [8] [9] [10]. However,
these full-scale redundancy techniques incur significant per-
formance and energy overheads (e.g., in case of temporal
redundancy), and area/power/energy overhead (e.g., in case
of spatial redundancy).
Therefore, we propose to investigate the individual prop-
erties and requirements of an application workload to deter-
mine the component-level vulnerabilities of an out-of-order
superscalar processor at design-time, i.e., enabling reliability
provisions at a much finer granularity. Based on this analysis,
we develop a wide range of heterogeneous reliability and
checkpointing modes that enable efficient control over the
achieved reliability and the incurred overhead, especially
when considering the diverse resilience properties of differ-
ent executing applications at different run time instances.
Our previous work [11] provides an initial proof-of-concept
of this work and preliminary results for the feasibility of
reliability-heterogeneous cores. In this work, we significantly
extend this concept, and provide a systematic methodology
to integrate such reliability heterogeneous modes on a chip
along with other different types of reliability mechanisms like
checkpointing and state compression to expand the space of
design trade-offs for reliability vs. overhead.
In a nutshell, we make the following novel contributions:
(1) Vulnerability Analysis: A comprehensive vulnerability
analysis of an out-of-order superscalar core to compute
the Architectural Vulnerability Factor (AVF) of different
pipeline components.
(2) Methodology for Architectural-Space Generation and
Exploration: A novel architectural-space generation and
exploration methodology that:
(a) analyzes the processor-level vulnerabilities of out-of-
order superscalar cores to develop a wide range of
heterogeneous reliability modes at design-time;
(b) provides a multi-core processor with multiple hetero-
geneous reliability modes, such that each core deploys
distinct reliability measures in different components;
(c) enables the reliability-power trade-offs of the pro-
posed heterogeneous reliability modes under diverse
application workloads, which can be leveraged at run-
time to either optimize reliability under the given
power constraint, or decrease power consumption un-
der the reliability requirement of the system;
(3) Run-Time System: To evaluate the run-time benefits of
reliability-heterogeneity, we execute multiple different
application workload mixes on our heterogeneous 10-
core processor with all the designed heterogeneous re-
liability modes. We propose and evaluate two differ-
ent mapping policies, namely, Vulnerability-Constrained
Power Minimization and Power-Constrained Vulnerabil-
ity Minimization. These two policies differ in terms of
the constraints imposed and their minimization objec-
tives, as provided by the system designer. We illustrate
the decrease in power overhead and the full-processor
vulnerability factor using these two task-mapping poli-
cies.
(4) State Compression: To further enhance the processor
reliability and to increase the design space, we analyze
and investigate efficient state compression techniques.
These strategies can be used to decrease the storage
size of the checkpointing data, for example, in case of
transient faults, and to enable an efficient rollback of the
processor to its last known safe-state.
(5) Evaluation and Discussion: We evaluate the effective-
ness of the heterogeneous reliability modes designed
using our proposed methodology under diverse applica-
tion workloads by enhancing the widely adopted cycle-
accurate simulator gem5 to offer the required functional-
ity of heterogeneous-reliability and state compression.
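The two mapping policies named in contribution (3) can be read as two constrained selections over the available reliability modes. The following Python sketch is only an illustration of that reading, not the run-time system of this work: the function names are hypothetical, and the power and FPVF values attached to the modes are made-up placeholders.

```python
from typing import NamedTuple, Optional, Sequence


class Mode(NamedTuple):
    """One reliability mode as seen by the mapper (all values are illustrative)."""
    name: str
    power_w: float  # power drawn by the task on a core in this mode
    fpvf: float     # full-processor vulnerability factor, in percent


def min_power_under_vulnerability(modes: Sequence[Mode], max_fpvf: float) -> Optional[Mode]:
    """Vulnerability-constrained power minimization: cheapest mode that is reliable enough."""
    feasible = [m for m in modes if m.fpvf <= max_fpvf]
    return min(feasible, key=lambda m: m.power_w) if feasible else None


def min_vulnerability_under_power(modes: Sequence[Mode], max_power_w: float) -> Optional[Mode]:
    """Power-constrained vulnerability minimization: most reliable mode within the budget."""
    feasible = [m for m in modes if m.power_w <= max_power_w]
    return min(feasible, key=lambda m: m.fpvf) if feasible else None


# Placeholder numbers, chosen only so that the two policies pick different modes.
MODES = [
    Mode("U", 1.00, 40.0),
    Mode("RM1", 1.20, 25.0),
    Mode("RM4", 1.45, 8.0),
    Mode("RM9", 2.10, 1.0),
]
```

Both functions return `None` when no mode satisfies the constraint, which leaves the fallback decision (e.g., relaxing the constraint) to the caller.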
Fig. 1 illustrates an overview of our contributions in a design-
flow for developing heterogeneous multi-core processors.
FIGURE 1: An Overview of our Contributions (Highlighted Boxes) in the Processor Design Flow.
Paper Organization: Section II presents the preliminaries
and background information required to understand our pro-
posed contributions. We discuss the system models in Sec-
tion III. Section IV presents our methodology for generation
and exploration of the architectural-space of heterogeneous
reliability modes, including results that illustrate the benefits
of the proposed approaches. Section V presents the related
work on state-of-the-art reliability techniques and hetero-
geneous reliability approaches, followed by the conclusion
presented in Section VI.
II. PRELIMINARIES AND BACKGROUND
Soft Errors: In the era of nanometer technology nodes, reli-
ability threats like manufacturing-induced process variation,
device aging, and transient faults are increasingly challeng-
ing the functional correctness and safety-critical aspects of
the systems where these electronic devices are deployed [1].
An example of transient faults is soft errors, which have
emerged as a serious threat to the reliability of a digital
system. These soft errors are generated at the hardware layer,
due to four key factors, namely,
(1) Alpha Particles, which are positively charged composite
particles emitted during radioactive decay. These par-
ticles travel through the semiconductor device thereby
disturbing the electron distribution of the transistor [12].
(2) Cosmic Rays, which are a flux of energetic neutrons that
are constantly emitted by the solar system [13] [14].
(3) Thermal Neutrons, which are neutrons that have at-
tained thermal equilibrium after dissipating all kinetic
energy [15].
(4) Internal Factors such as random noise, signal integrity
issues, cross-talks, and electromagnetic interference [3].
Soft errors cause temporary bit-flips either in the control or
data path of a micro-architecture, or in the on-chip memory
cells, which may propagate to the application output or may
crash, hang, or terminate the application execution [2]. Fig. 2
illustrates the soft-error phenomenon, which can be broken
into three phases.
(1) First, in the ion-track formation phase (phase-I), a high
energy particle (such as the cosmic rays discussed ear-
lier) strikes the transistor to generate multiple electron-
hole pairs, which in turn increase the concentration of
carriers along the ion’s path.
(2) In phase-II (current pulse generation), the ions collected
at the depletion region form a “temporary” channel that
funnels the current from source to drain, which could
toggle the transistor state for tens of picoseconds. This
can result in a bit flip in (i) a memory cell, which remains
latched to the incorrect value until it is
overwritten by another value; or (ii) a logic gate, whose
erroneous value can potentially propagate to, and thereby corrupt,
the final output of the circuit.
(3) In the ion diffusion phase (phase-III), over a period of
tens or hundreds of picoseconds, the charges diffuse into
the depletion layer, thereby disintegrating the temporary
channel.
FIGURE 2: The Three Phases of the Soft Error Phenomenon (adapted from [2]). [Plot of current versus time (10^-13 to 10^-9 seconds) following a high-energy particle strike (protons or neutrons), annotated with Phase-I, Phase-II, and Phase-III.]
Increasing Soft Error Rates: In earlier-generation technology nodes, the transistor dimensions were large
enough that a temporary channel could not funnel the current
from source to drain. As transistor dimensions shrink, however,
the rate of soft-error occurrences increases with each new
generation of processors released into the market, since these
are fabricated using continuously smaller technology nodes [4] [5] (see Fig. 3).
This is a major threat for the current world infrastructure,
which heavily relies on electronics for all activities, such as
work, communication, transportation, socializing, internet,
etc. Even the day-to-day devices and services that people
use, e.g., wearable devices such as smart-watches and fitness
trackers, mobile computing platforms such as mobile phones
and laptops, and on-demand cloud services offered by large-
scale data centers, heavily rely on the reliability of electronic
devices. This becomes even more crucial for safety-critical
application domains like aerospace, automotive, healthcare,
industry 4.0, smart grids, smart homes, etc.
FIGURE 3: Increase in Soft-error Rate of a Chip for Multiple Technology Nodes (adapted from [5]). [Plot of the soft-error failure-in-time rate of a chip versus technology node (180 nm down to 16 nm); annotations mark a ~100% and a ~100× increase in failure rate, with the Intel Pentium IV, Intel Core i7, and Kirin 650 SoC as example chips.]
Processor Hardening: Reliability at the hardware layer is
typically ensured by the use of full-scale redundancy, which
involves realizing multiple instantiations of the hardware unit
with the same set of inputs, to generate outputs that can
be compared with each other to detect errors (in DMR) or
correct errors using a voter circuit (in case of TMR), which
we refer to as hardware hardening. An overview of these
hardware-level redundancy techniques is presented in Fig. 4.
DMR and TMR incur significant area and power overheads
caused by the redundant hardware units and the additional
circuitry used to detect or correct errors. On the other hand, since
the additional hardware components execute in parallel, the
throughput of the system is not affected, apart from a minimal gate-level
increase in delay caused by the voter circuit. Typically,
to ensure very high reliability, the entire processor pipeline
(full-scale) is hardened, i.e., all the pipeline components are
instantiated thrice with the same set of inputs and a voter
circuit to elect the majority output, as illustrated by Gaisler’s
completely hardened LEON3-FT micro-processor that de-
ploys redundancy in the register file and cache memory [16].
Fig. 4 also illustrates the gate-level implementation of the
voter circuit, and how, in the case of soft errors, the majority
output is elected and generated as the final output. Note, this
leads to the possibility of the voter circuit becoming a single
point of failure, which is mitigated by triplicating the voter
circuit as well, and has been deployed, for example, in the
Saturn Launch Vehicle Digital Computer [17] [18]. In this
work, without loss of generality, we advocate enabling
fine-grained reliability at the component level, which
can facilitate realization of different hardening modes for
different processor cores, thereby providing a wide range of
reliability-power trade-offs. As a proof of concept, we will
showcase an example of using component-level TMR with
a single majority voter circuit. However, any other reliability
mechanism can be deployed as a knob at the component level.
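The error detection (DMR) and majority election (TMR) described above reduce to a comparison and a three-input bitwise majority function, respectively. The sketch below is illustrative Python, not the gate-level voter of Fig. 4; the function names and the 8-bit example value are assumptions made for this example.

```python
def dmr_check(a: int, b: int) -> bool:
    """DMR: return True when the two redundant outputs disagree (error detected)."""
    return a != b


def tmr_vote(a: int, b: int, c: int) -> int:
    """TMR: bitwise majority of three redundant outputs."""
    return (a & b) | (b & c) | (a & c)


golden = 0b1011_0010          # fault-free output of the hardware unit (example value)
faulty = golden ^ (1 << 3)    # the same output with a single bit-flip
```

With a single faulty replica, `tmr_vote(golden, faulty, golden)` returns the fault-free value, whereas `dmr_check(golden, faulty)` can only flag the mismatch. If two replicas carry the same flipped bit, the majority is wrong, which is the limit of single-voter TMR.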
FIGURE 4: An Overview of the Redundancy Techniques at the Hardware Layer. [Panels: Double Modular Redundancy (~100% area/power overhead), Triple Modular Redundancy (~200% area/power overhead), and Voter Triplication.]
Out-of-Order Superscalar Processors: Besides transis-
tor scaling, architectural innovations such as deep pipelining,
instruction-level parallelism, out-of-order execution, specu-
lative execution, branch prediction, etc. have tremendously
increased the computing capabilities of microprocessors.
Almost all current generation microprocessors are realized
with such functionalities to ensure high system performance.
For example, superscalar processors exploit an application’s
instruction-level parallelism to execute multiple instructions
in parallel during the same clock-cycle on multiple differ-
ent execution units [20]. Out-of-Order processors execute
instructions out-of-order, as opposed to the typical sequential
execution, by exploiting the interdependency, or the lack
thereof, of program instructions and the data processed by
them [21]. This allows for executing “independent” instructions
in clock-cycles that would be otherwise lost in pipeline
stalls caused by control- or data-flow dependencies. Fig. 5 illustrates
the control- and data-path of the ALPHA 21264 out-of-order
superscalar microprocessor [19], which is widely
used in the architecture research community.
FIGURE 5: ALPHA 21264 Out-of-Order Superscalar Processor Architecture (adapted from [19]). [Pipeline stages shown: Fetch, Slot, Rename, Issue, Register Read, Execute, Memory.]
Alpha 21264, or Alpha 7, is a four-issue, seven pipeline
stage superscalar processor architecture that is capable of
executing up to six (four integer and two floating-point) in-
structions per cycle (IPCs) while sustaining four instructions
simultaneously. During a program’s execution, the processor
can accommodate up to 80 instructions in the pipeline, which
are tracked using the processor’s re-order buffer (ROB).
The Alpha 7 processor also includes two cache levels, i.e., the
primary and secondary caches. The processor uses a modi-
fied Harvard architecture that implements separate primary
instruction (I-cache) and data caches (D-cache), typically
of size 64KB each. The D-cache is dual-ported in order
to allow simultaneous read and write on both rising and
falling edge of the clock. This feature allows for reducing
the area and power overheads associated with duplicating
the cache, as in the Alpha 21164 microprocessor. The sec-
ondary cache, or B-cache, is usually a direct-mapped cache
that is located off-chip and shared by all processor cores.
Typically, L2-cache has a maximum capacity of 16MB and is
constructed using synchronous static random access memory
(SSRAM), which is accessed using a dedicated 128-bit high-
bandwidth bus [22]. Branch prediction in this microprocessor
is implemented using a hybrid two-level branch prediction
algorithm called tournament prediction, with a minimum
branch misprediction penalty of 7 clock-cycles [22]. The
processor was built using 15.2 million transistors, roughly
40% of which was occupied by the core processing unit and
the rest of which was consumed by the caches and branch
history tables [23].
III. SYSTEM MODEL
Architecture Model: To cater for different application workloads
with varying reliability requirements, we realize a
reliability-heterogeneous multi-core processor (HMC):
$$HMC = \{PC_1, PC_2, \ldots, PC_M\}$$
where $PC_j$ denotes the $j$th processor core, such that $j \in
\{1, 2, \ldots, M\}$, with a total of $M$ processor cores in the
HMC. Each processor core has $L$ different architectural
components, denoted as:
$$PC_j = \{C_{(j,1)}, C_{(j,2)}, \ldots, C_{(j,L)}\}$$
where $C_{(j,k)}$ denotes the $k$th component in the $j$th processor
core. Each architectural component $C_{(j,k)}$ (like the re-order buffer,
register file, instruction queue, etc.) in each processor core
can be hardened by using mechanisms like TMR,
DMR, Checkpointing and Rollback, Error Correcting Codes,
or Razor latches. We denote the $i$th reliability technique of
the component $C_{(j,k)}$ as:
$$RT(C_{(j,k)}) = i$$
Without loss of generality, in this work, we explore the applicability
of TMR for designing the heterogeneous reliability
modes. This leads to $i \in \{0, 1\}$, where $RT(C_{(j,k)}) = 0$
denotes an unprotected component without any type of
hardening and $RT(C_{(j,k)}) = 1$ denotes a component that
has been hardened by triple modular redundancy, thereby
enabling heterogeneous hardening.
The area of each processor core is denoted as $A(PC_j)$,
which is the sum of the areas of all the processor components,
including the overhead of hardening certain components.
Note that only a selected subset of the different heterogeneous
reliability modes can be activated at run-time due
to the total power constraint of a system, while considering
the application’s reliability requirement. An overview of the
symbols used in this work and their denotations is
presented in Table 1.
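The architecture model maps naturally onto a small data structure. The following Python sketch is illustrative only: the class names, component names, and base areas are hypothetical, and it assumes that TMR triples a component's area (voter area ignored), in line with the roughly 200% overhead indicated in Fig. 4.

```python
from dataclasses import dataclass
from typing import List

TMR_AREA_FACTOR = 3.0  # assumption: three copies of the component, voter ignored


@dataclass
class Component:
    """One architectural component C_(j,k)."""
    name: str
    base_area: float  # unprotected area, arbitrary units (placeholder values)
    rt: int = 0       # RT(C_(j,k)): 0 = unprotected, 1 = hardened with TMR

    def area(self) -> float:
        return self.base_area * (TMR_AREA_FACTOR if self.rt == 1 else 1.0)


@dataclass
class Core:
    """One processor core PC_j, i.e., a list of L components."""
    components: List[Component]

    def area(self) -> float:
        """A(PC_j): sum of component areas, including hardening overhead."""
        return sum(c.area() for c in self.components)


unprotected = Core([Component("ROB", 4.0), Component("IQ", 2.0), Component("RF", 3.0)])
rob_hardened = Core([Component("ROB", 4.0, rt=1), Component("IQ", 2.0), Component("RF", 3.0)])
hmc = [unprotected, rob_hardened]  # HMC = {PC_1, PC_2}
```

Two cores that differ only in their `rt` assignments are iso-ISA but have different areas, which is the sense in which the modes are heterogeneous.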
Application Model: The applications are modeled as a
set of task graphs $\{T, E\}$ containing task and dependency
information for all application workloads. $T$ is denoted as
$T = \{T_1, T_2, \ldots, T_Z\}$ for a set of $Z$ tasks. $E$ is defined as
$E = \{E_{xy} \mid T_x, T_y \in T\}$ for the set of task dependencies.
For the given processor core ($PC_j$), each task $T_q$ has the
following execution properties:
• $P(T_q, PC_j)$, which denotes the peak power consumption,
• $L(T_q, PC_j)$, which denotes the average performance in
terms of execution time, and
• $FPVF(T_q, PC_j)$, which denotes the full-processor vulnerability
factor.
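As a concrete reading of the application model, the sketch below stores a task graph and the per-(task, core) execution properties in plain Python containers. The task and core names, the direction of the edges, and all numbers are assumptions made for illustration; `ready_tasks` is a hypothetical helper, not part of this work.

```python
from typing import Dict, Iterable, List, Set, Tuple

# T: set of tasks. E: directed edges (Tx, Ty), read here as "Ty depends on Tx".
TASKS: List[str] = ["T1", "T2", "T3"]
EDGES: Set[Tuple[str, str]] = {("T1", "T2"), ("T1", "T3")}

# Execution properties per (task, core): P (power), L (execution time), FPVF (percent).
# All values are made up for illustration.
PROPS: Dict[Tuple[str, str], Dict[str, float]] = {
    ("T1", "PC1"): {"P": 1.2, "L": 3.0, "FPVF": 30.0},
    ("T1", "PC2"): {"P": 1.6, "L": 3.0, "FPVF": 5.0},
}


def ready_tasks(tasks: Iterable[str], edges: Set[Tuple[str, str]], done: Set[str]) -> List[str]:
    """Tasks that have not run yet and whose predecessors have all completed."""
    return [t for t in tasks
            if t not in done and all(src in done for src, dst in edges if dst == t)]
```

A mapper would call `ready_tasks` after each task completes and then consult `PROPS` to decide on which core, and hence in which reliability mode, each ready task should run.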
TABLE 1: Symbols and Denotations

| Symbol | Denotation |
|---|---|
| $HMC$ | Heterogeneous Multi-core Processor |
| $M$ | Total Number of Cores in HMC |
| $PC_j$ | $j$th Processor Core |
| $C_{(j,k)}$ | $k$th Architectural Component in $j$th Processor Core |
| $L$ | Total Number of Architectural Components in $PC_j$ |
| $RT(C_{(j,k)})$ | The Reliability Technique used to Harden Component $C_{(j,k)}$ |
| $A(PC_j)$ | Area of the Processor Core $PC_j$ |
| $T$ | Set of Tasks |
| $Z$ | Total Number of Tasks |
| $E$ | Set of Task Dependencies |
| $P(T_q, PC_j)$ | Power Consumption for the $q$th Task when executing on $j$th Processor Core |
| $L(T_q, PC_j)$ | Average Execution Time for the $q$th Task when executing on $j$th Processor Core |
| $AVF_{C_{(j,k)}}$ | Architectural Vulnerability Factor of Component $C_{(j,k)}$ |
| $FPVF(T_q, PC_j)$ | Full-Processor Vulnerability Factor of Processor Core $PC_j$ for Task $T_q$ |
| $N$ | Number of Clock Cycles |
| $VulnerableBits$ | Total Number of Vulnerable Bits |
| $VulnerableTime$ | Time Duration of Vulnerable Bits |
| $TotalBits$ | Total Number of Output Bits |
| $TotalTime$ | Total Duration of Application Execution |
| $P_{error}$ | Error Rate of a Transient Fault |
| $P_{flip}$ | Probability of High-Energy Particle Strike leading to a Soft Error |
| $N_{error}$ | Number of Program Failures |
| $N_{FI}$ | Spatial Vulnerability of Component |
| $ORM$ | Optimal Reliability Modes |

Reliability Model: The Architectural Vulnerability Factor
(AVF) of a hardware component is defined as the probability
of a fault to propagate to the final output resulting
in an execution error [24]. We compute the AVF of a
component $C_{(j,k)}$ as the ratio of the bits vulnerable in each
cycle (VulnerableBits) to the total number of output bits
(TotalBits) generated by component $C_{(j,k)}$ over a duration of
$N$ cycles. The AVF of a component $C_{(j,k)}$ is ‘0’ if the component
is hardened, or produces no architecturally incorrect bits [24].
Note, all bits of a branch predictor are always architecturally
correct, therefore a branch predictor’s AVF is always ‘0’.
Similarly, all bits of the program counter (PC) are always
vulnerable, therefore the AVF of a PC is always ‘100’ [24].
AVF is estimated using the following equation:
$$AVF_{C_{(j,k)}} = \frac{\sum_{n=0}^{N} VulnerableBits(C_{(j,k)})}{TotalBits \times N} \times 100$$
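To make the AVF computation concrete, the following Python sketch evaluates it for a per-cycle trace of vulnerable-bit counts. It is a minimal illustration under stated assumptions: the function name and the trace format are hypothetical, and the trace is treated as containing exactly $N$ samples, one per cycle.

```python
from typing import Sequence


def avf(vulnerable_bits_per_cycle: Sequence[int], total_bits: int) -> float:
    """AVF in percent: vulnerable bits summed over all cycles / (TotalBits * N) * 100."""
    n_cycles = len(vulnerable_bits_per_cycle)
    if n_cycles == 0 or total_bits <= 0:
        raise ValueError("need at least one cycle and a positive bit count")
    return 100.0 * sum(vulnerable_bits_per_cycle) / (total_bits * n_cycles)


# A 64-bit structure observed for four cycles (illustrative numbers).
example = avf([32, 16, 0, 16], total_bits=64)  # 64 vulnerable bit-cycles out of 256
```

A hardened component contributes a trace of zeros and therefore an AVF of 0, while a structure whose bits are all vulnerable in every cycle (such as the program counter) yields 100.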
To study the impact of component hardening on the full-processor,
we extend the AVF to define the Full-Processor
Vulnerability Factor (FPVF) for a given application workload.
We define FPVF as the ratio of the total number
of vulnerable bits (VulnerableBits) in the processor pipeline
for the duration they are vulnerable (VulnerableTime) to the
total number of bits in the processor pipeline (TotalBits) for
the total duration of application execution (TotalTime). It is
computed using the following equation:
$$FPVF(T_q, PC_j) = \frac{\sum_{\forall C_{(j,k)}} VulnerableBits(C_{(j,k)}) \times VulnerableTime(C_{(j,k)})}{\sum_{\forall C_{(j,k)}} TotalBits(C_{(j,k)}) \times TotalTime(C_{(j,k)})} \times 100$$
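The FPVF aggregates bit-time products over all components of a core. The Python sketch below illustrates the computation; the record type, field names, and the numbers in the example are assumptions made for this illustration, and time is measured in arbitrary but consistent units (e.g., cycles).

```python
from typing import Iterable, NamedTuple


class ComponentTrace(NamedTuple):
    """Per-component inputs to the FPVF (hypothetical record layout)."""
    vulnerable_bits: float
    vulnerable_time: float
    total_bits: float
    total_time: float


def fpvf(traces: Iterable[ComponentTrace]) -> float:
    """FPVF in percent: sum of vulnerable bit-time over sum of total bit-time, times 100."""
    traces = list(traces)
    numerator = sum(t.vulnerable_bits * t.vulnerable_time for t in traces)
    denominator = sum(t.total_bits * t.total_time for t in traces)
    if denominator <= 0:
        raise ValueError("total bit-time must be positive")
    return 100.0 * numerator / denominator


# Illustrative numbers: a re-order buffer and an issue queue observed for 100 cycles.
rob = ComponentTrace(vulnerable_bits=40, vulnerable_time=50, total_bits=80, total_time=100)
iq = ComponentTrace(vulnerable_bits=10, vulnerable_time=20, total_bits=20, total_time=100)
baseline = fpvf([rob, iq])

# Hardening the ROB is modeled here by setting its vulnerable bits to zero.
rob_hardened = rob._replace(vulnerable_bits=0)
hardened = fpvf([rob_hardened, iq])
```

In this toy example, hardening only the larger and more vulnerable structure removes most of the vulnerable bit-time, which mirrors the motivation for selective, component-level hardening.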
IV. HETEROGENEOUS RELIABILITY MODES OF
OUT-OF-ORDER SUPERSCALAR CORES
A. METHODOLOGY OVERVIEW
Fig. 6 presents an overview of our methodology for designing
and exploring heterogeneous reliability modes for out-of-order
superscalar multi-core processors. Our methodology
targets two approaches for designing heterogeneous reliability
modes: (1) Redundancy, and (2) Checkpointing. To
ensure reliable execution at the hardware layer, we propose
hardening the processor’s highly vulnerable pipeline components.
These pipeline components are selected based on the
initial fault-injection experiments, or on the AVF values that
are estimated based on the number of vulnerable bits and
vulnerable time of each component (see model description
in Section III). Furthermore, we ensure reliability by investigating
state compression techniques that can reduce the size
of checkpoint data. Before moving on to our fault-injection
and vulnerability analyses, we will present our experimental
setup for better understanding.
FIGURE 6: Overview of Our Architecture-Space Generation and Exploration Methodology for Hardening Out-of-Order Superscalar Heterogeneous Multi-Core Processors. [Panels: Vulnerability Models (Section III), Vulnerability and Hardware Analysis (Section IV-C), Fault Injection Engine (Section IV-D), Heterogeneous Reliability Modes (Section IV-E), and Checkpointing and Rollback (Section IV-F).]
B. EXPERIMENTAL SETUP
To evaluate the vulnerability, power, and area requirements of
the proposed heterogeneous reliability modes, we have modified
well-established open-source tools, namely the cycle-accurate
system simulator gem5 [25] and HP’s power and
area estimation tool McPAT [26]. Our extensions to these
tool chains provide the following functionality: (1) estimate
the vulnerability of all pipeline components by determining
their AVFs [24], (2) support for heterogeneous reliability
modes by hardening key pipeline components using com-
ponent-level redundancy [11], but not full-scale pipeline
triplication all the time, and (3) checkpoint processor states
using mechanisms like Distributed Multi-Threaded Check-
pointing (DMTCP) [27] [28] and Hash-Based Incremental
Checkpointing Tool (HBICT) [29] [30]. Due to its high
customization capability, we use the Alpha 21264 four-issue
out-of-order superscalar core [31] as our target platform.
Furthermore, we extend the concept of AVF towards the
FPVF metric (see Section III) to evaluate the impact of
component hardening on the reliability mode, for a given
application workload. To account for a wide range of ap-
plications, we evaluate the proposed heterogeneous reliabil-
ity modes using the MiBench application benchmark suite.
Fig. 7 presents an overview of our experimental setup.
FIGURE 7: Overview of our Experimental Setup. [Blocks shown: MiBench Applications, ALPHA 21264 Core Configuration Files, gem5, McPAT, AVF/FPVF Models, Checkpointing Techniques, Compression Algorithms, and the resulting Area, Power, and FPVF Reports.]
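The idea of combining incremental, hash-based checkpointing with a general-purpose compressor can be sketched in a few lines of Python. This is not the DMTCP or HBICT implementation and not the gem5 extension of this work: the block size, the delta format, and the function names are assumptions, the state image is assumed to keep a fixed size, and Python's `zlib` (DEFLATE, the algorithm underlying gzip) stands in for GNU zip.

```python
import hashlib
import struct
import zlib
from typing import List, Tuple

BLOCK = 64  # block size in bytes (arbitrary choice for this sketch)


def block_hashes(state: bytes) -> List[bytes]:
    """Hash every fixed-size block of the state image."""
    return [hashlib.sha256(state[i:i + BLOCK]).digest()
            for i in range(0, len(state), BLOCK)]


def take_checkpoint(prev_hashes: List[bytes], state: bytes) -> Tuple[bytes, List[bytes]]:
    """Keep only the blocks whose hash changed, then compress the resulting delta."""
    hashes = block_hashes(state)
    delta = bytearray()
    for idx, digest in enumerate(hashes):
        if idx >= len(prev_hashes) or prev_hashes[idx] != digest:
            block = state[idx * BLOCK:(idx + 1) * BLOCK]
            delta += struct.pack("<IH", idx, len(block)) + block  # 6-byte record header
    return zlib.compress(bytes(delta)), hashes


def restore(base: bytes, checkpoint: bytes) -> bytes:
    """Rebuild a state image by applying one compressed delta to a base image."""
    state = bytearray(base)
    delta = zlib.decompress(checkpoint)
    pos = 0
    while pos < len(delta):
        idx, length = struct.unpack_from("<IH", delta, pos)
        pos += 6
        state[idx * BLOCK:idx * BLOCK + length] = delta[pos:pos + length]
        pos += length
    return bytes(state)


# Usage: one full checkpoint, then an incremental one after a small change.
base_state = bytes(range(256)) * 4
full_ckpt, h0 = take_checkpoint([], base_state)
modified = bytearray(base_state)
modified[100] ^= 0xFF
incr_ckpt, h1 = take_checkpoint(h0, bytes(modified))
```

Only the block containing byte 100 ends up in the incremental checkpoint; `restore(base_state, incr_ckpt)` then reproduces the modified image. How much space this saves in practice depends on how much of the state changes between checkpoints and on how compressible it is.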
C. VULNERABILITY ANALYSIS
We evaluate the vulnerability of the components of an out-of-order (O3)
superscalar Alpha 21264 core [31] for the Bit-counts, SHA,
Dijkstra, and Patricia application workloads [32].
We analyze the vulnerability of the following key pipeline
components:
• Re-order Buffer (ROB),
• Issue Queue (IQ),
• Load Queue (LQ),
• Store Queue (SQ),
• Integer and Floating-Point Register Files (RF),
• Rename Map (RM),
• Integer ALU (Int. ALU),
• Floating Point ALU (FP ALU),
• Integer Multiply/Divide (Int. MD), and
• Floating Point Multiply/Divide (FP MD).
The results of our vulnerability analyses are presented in
Figs. 8 and 9.
FIGURE 8: Differences in AVF of Alpha 7 Pipeline Components under SHA and Bit-counts Workloads. [Bar chart; x-axis: Architectural Vulnerability Factor (AVF), 0 to 100; y-axis: pipeline component (ROB, Inst. Queue, Load Queue, Store Queue, Int. Reg. File, Rename Map, Int. ALU, Int. Mult/Div, FP ALU).]
FIGURE 9: AVF Distribution of Key Pipeline Components in Single- and Multi-Core Alpha 7 Processors. [Panels: (a) Architectural Vulnerability Factor of Single-Core Alpha Processor Pipeline Components (SHA, Patricia, Dijkstra, Bit-counts); (b) Architectural Vulnerability Factor of Quad-Core Alpha Processor Pipeline Components (Blackscholes, Canneal).]
From the results obtained, we make the following key
observations:
• The AVFs of the different pipeline components vary for
different application workloads.
• We have identified three key pipeline components (Inte-
ger ALU, Store Queue, and Re-order Buffer) that are
more vulnerable during the execution of SHA, when
compared to Bit-counts.
• Similarly, the re-order buffer is 27% and 46% less vulnerable to soft errors during the execution of Patricia and Bit-counts, respectively, when compared to workloads like SHA and Dijkstra.
• Similar differences in component-AVFs can be observed
when varying multi-threaded application workloads,
from the PARSEC benchmark suite, are executed on a
multi-core processor, as shown in Fig. 9.
These components have different AVFs because of the type
of instructions being executed and their application-specific
properties (compute or memory-intensive, instruction-level
parallelism, cache hit/miss rate, etc.). For example, compo-
nents like the Re-order Buffer and the Store Queue are more
vulnerable in SHA because of higher levels of instruction-
level parallelism and more store instructions.
Based on this analysis, we can infer that hardening certain
components of the pipeline increases the reliability of a core
more than hardening the other components. Therefore, we
generate a wide range of reliability-heterogeneous Alpha
cores, and explore this architectural-space in terms of relia-
bility, power, and area, to select a configuration that increases
the reliability of application executions while decreasing the
area/power overhead.
D. FAULT INJECTION
Fault injection techniques are typically used to study, analyze, and evaluate the behaviour of a system susceptible to faults [33]–[35]. The fault model for the Alpha core components is based on single- and multi-bit transient faults. The soft error rate of each component is defined as the product of the error rate and the component's AVF. The soft error rates of the processor's pipeline components have been derived from the works presented in [36], [37]. To account for a component's spatial vulnerability, the number of faults injected in a pipeline component (NFI) is proportional to its on-chip area. We define Pflip as the probability that a high-energy particle strike leads to a change in the logic state of a pipeline component. Furthermore, to facilitate fast simulation, the faults are injected only in the region of interest, i.e., the components, registers, and cache lines used by the application. The application output is classified into three major categories, namely, (1) correct output, (2) incorrect output, and (3) program failures (Nerror), which comprise multiple scenarios such as unaligned instruction, unmapped address, and segmentation fault. The error rate (Perror) of a transient fault in the component leading to an error in the application execution is defined as follows:
\[ P_{error} = P_{flip} \times \frac{N_{error}}{N_{FI}} \]
FIGURE 10: Overview of the Fault Injection Methodology for Analyzing Processor Component Vulnerabilities. (Flow: the vulnerability and fault models and the target processor's pipeline components feed a configurable fault generator; the resulting fault files and a set of applications drive the fault injection engine and the cycle-accurate simulation; the fault analysis compares golden and erroneous outputs. Other labels in the figure: error distribution, AVF, IVI, CVI, FVI, TRF, area, power.)
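As a worked instance of the two definitions above, the following Python sketch evaluates a component's soft error rate and the error rate Perror. The function names and all numeric inputs are hypothetical placeholders chosen for illustration; they are not measured values, and the AVF is assumed to be given as a fraction rather than a percentage.

```python
def component_soft_error_rate(raw_error_rate: float, avf: float) -> float:
    """Soft error rate of a component: product of the error rate and its AVF.

    Assumption: avf is a fraction in [0, 1], not a percentage.
    """
    if not 0.0 <= avf <= 1.0:
        raise ValueError("AVF is expected as a fraction in [0, 1]")
    return raw_error_rate * avf


def error_rate(p_flip: float, n_error: int, n_fi: int) -> float:
    """P_error = P_flip * N_error / N_FI."""
    if not 0.0 <= p_flip <= 1.0:
        raise ValueError("p_flip is a probability in [0, 1]")
    if n_fi <= 0:
        raise ValueError("n_fi (number of injected faults) must be positive")
    if not 0 <= n_error <= n_fi:
        raise ValueError("n_error must lie between 0 and n_fi")
    return p_flip * n_error / n_fi


# Hypothetical campaign: 1000 injected faults, 120 of which end in a failure,
# and a 50% chance that a particle strike flips the stored logic state.
print(error_rate(p_flip=0.5, n_error=120, n_fi=1000))
print(component_soft_error_rate(raw_error_rate=1e-3, avf=0.4))
```

Because NFI scales with a component's on-chip area, larger components receive more injections, so the ratio Nerror/NFI stays comparable across components of different sizes.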
An overview of the methodology used to inject and analyze faults in various pipeline components is presented in Fig. 10. Based on the vulnerability and fault models presented in Section III and the configuration of the target processor, including its pipeline components, we generate a list of fault files, which is provided as an input to the fault injection engine. The engine uses these files to insert faults (bit-flips) into the target processor platform during the application's execution on a cycle-accurate simulator, i.e., gem5. The architectural parameters of the Alpha processor and the parameters of the fault injection experiments are listed in Tables 2 and 3, respectively. We study the outputs obtained from these simulations, which comprise both correct and erroneous outputs. These outputs are compared against the golden execution to estimate the type of each error and the frequency of its occurrence for the various pipeline components. A subset of the results obtained from this experiment is illustrated in Fig. 11.
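The comparison against the golden execution can be sketched as follows. This is a minimal illustration, assuming that each faulty run is summarized by its output and an optional crash reason; the data layout and the function names are hypothetical and do not correspond to the gem5 interface.

```python
from collections import Counter


def classify_run(golden_output, observed_output, crash_reason=None):
    """Return the outcome category of one fault-injection run."""
    if crash_reason is not None:
        # Program failure, e.g. "segmentation fault" or "unmapped address".
        return crash_reason
    if observed_output == golden_output:
        return "correct output"
    return "incorrect output"


def error_distribution(golden_output, runs):
    """Fraction of runs per outcome category.

    runs is an iterable of (observed_output, crash_reason) pairs.
    """
    counts = Counter(
        classify_run(golden_output, output, crash) for output, crash in runs
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: n / total for category, n in counts.items()}


# Hypothetical results of four faulty runs of one application.
runs = [("42", None), ("42", None), ("41", None), (None, "segmentation fault")]
print(error_distribution("42", runs))
```

Repeating this per component and per application yields an error distribution of the kind plotted for each pipeline component.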
TABLE 2: Processor Parameters for Vulnerability Analyses Experiments
Parameter | Value
Core | Alpha 21264
Frequency | 2 GHz
Simulation Mode | Syscall Simulation
L1 Cache (I-$ and D-$) | 32kB, 2-way, 64B, 2 cycles
L2 Cache | 256kB, 2-way, 64B, 20 cycles
Cache Policy | Snooping Coherence, LRU
TLB | Data: 64, Instruction: 48
Re-order Buffer | 192 Entries
Instruction Queue | 64 Entries
Load-Store Queues | 32 Entries
Register File | Integer: 256, Floating-Pt.: 256
TABLE 3: Parameters for Fault Injection Experiments
Parameter | Description | Properties/Values
Distribution | Distribution models for fault generation | Random
Bit Flips | Min/Max number of bits flipped | 1/1, 1/2, ...
Fault Probability | Probability that strike becomes a fault [2] | 10%-100%
Fault Location | List of target processor components | Register file, PC, IQ, etc.
Processor Layout/Area | Size of the complete target device | Gate-equivalents or mm² [38]
Component Area | Area of different processor components given as percentage of processor area | 0%-100%
Place and Altitude | City and altitude at which the device is used to determine the flux rate (NFlux) | Oslo, 1-20km
Frequency | Operating frequency of the processor | 50, 100 MHz
The results in Fig. 11 depict the error rate of three components, namely, the Level-II (L2) cache, the integer arithmetic logic unit (ALU), and the instruction queue. Faults injected in the L2 cache lead either to a correct output or to one of four major error types: (1) incorrect output, (2) unaligned instruction, (3) unknown instruction, and (4) out of memory. All remaining error types are grouped into the “others” category. Label A marks the applications with a higher percentage of correct outputs than the others. On average, the Bit-counts and SHA applications produce a correct output more than 80% of the time, whereas Dijkstra and Patricia produce a correct output less than 70% and 60% of the time, respectively. This is due to the lower number of load and store instructions in the two applications marked by label A, when compared to the other two. Therefore, the probability of a soft error in the L2 cache leading to an error during execution is higher for an application with a relatively higher number of load and store instructions. Similarly, label B marks the percentage of fault injection experiments that lead to an unmapped address. As in the previous example, because of the higher number of load and store instructions in Dijkstra and Patricia, the large number of unmapped addresses can be attributed to the corruption of bits during address generation. Likewise, due to their compute-intensive nature, applications like Bit-counts and SHA generate a higher number of incorrect outputs when faults are injected in an ALU. Faults injected in the instruction queue cause three major types of error, namely, (1) unknown instruction, (2) invalid instruction, and (3) segmentation fault.
FIGURE 11: Error Rate of Three Pipeline Components (L2 Cache, ALU, Instruction Queue) in the Alpha 7 Processor. (Panels: fault injection analysis of the L2 cache, the ALU, and the instruction queue under one-bit, two-bit, and four-bit faults for Bit-counts, Dijkstra, Patricia, and SHA; labels A and B mark the regions discussed in the text.)
FIGURE 12: Heterogeneous Reliability Modes for ALPHA 7 Out-of-Order Superscalar Processor. (Modes shown: Reliability Mode U, unprotected core; Reliability Mode RM1, integer and floating-point register file hardening, 20% area overhead, 70% power overhead, Pareto-optimal for SHA; Reliability Mode RM2, RM and IQ hardening, 80% power overhead, 30% area overhead; further modes RM3, RM4, ..., RMK, e.g., I-$ hardening with X% power overhead and Y% area overhead.)
E. HETEROGENEOUS RELIABILITY MODES FOR
ALPHA CORES
As discussed in Section IV-C, the AVF of the pipeline components varies across application workloads. Hence, we propose to harden a combination of the key pipeline components in out-of-order superscalar processors, instead of employing full-scale TMR across the complete pipeline, to increase core reliability while reducing the area and power overheads of full-scale TMR. This generates a design space of multiple heterogeneous reliability modes (RM), nine of which are illustrated in this work, in addition to the unprotected core. Table 4 lists the nine proposed heterogeneous reliability modes and the components that are hardened in each mode using TMR. A hardened component has three instances with the same inputs and a voter circuit at the output to determine the majority. An overview of the proposed heterogeneous reliability modes for the Alpha 7 processor is presented in Fig. 12.
We evaluate the vulnerability of our heterogeneous reliability modes by executing applications from the MiBench benchmark suite to estimate the FPVF for each scenario. We also evaluate the area and power overheads incurred by each reliability mode. These results are illustrated in Fig. 13.
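The behaviour of a TMR-hardened component can be illustrated with a bitwise majority voter. The sketch below is a software model only, with hypothetical names and values; it is not the voter circuit used in our designs.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority of three redundant copies of the same value."""
    return (a & b) | (a & c) | (b & c)


def flip_bit(value: int, bit: int) -> int:
    """Model a single-bit transient fault by inverting one bit."""
    return value ^ (1 << bit)


golden = 0b1011_0010
faulty = flip_bit(golden, 3)

# A fault in a single instance is masked by the other two instances.
print(majority_vote(golden, faulty, golden) == golden)

# The same bit flipped in two instances defeats the voter.
print(majority_vote(faulty, faulty, golden) == golden)
```

The voter masks any fault confined to one instance, which is why tripling a component reduces its contribution to the processor vulnerability at the cost of additional area and power.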
FIGURE 13: Full-Processor Vulnerability Factor (FPVF) and Power/Area Trade-off of Our Heterogeneous Reliability Modes for Different MiBench Applications. (Modes on the x-axis: U, unprotected; RM1, reg file; RM2, iq+map; RM3, iq+lsq; RM4, iq+lsq+map+rob; RM5, reg file+iq+lsq; RM6, reg file+map; RM7, reg file+map+rob; RM8, map+rob; RM9, reg file+iq+lsq+map. Series: FPVF for Bit-counts, Dijkstra, SHA, and Patricia; power and area overheads, with annotations at 0% and 185% overhead.)
FIGURE 14: Design Space Exploration of Our Heterogeneous Reliability Modes for MiBench Applications. (Panels: (a) Bit-counts, (b) Dijkstra, (c) Patricia, (d) SHA, (e) All Applications; markers distinguish non-Pareto-optimal from Pareto-optimal reliability modes.)
From the results obtained, we make the following key
observations:
• Different heterogeneous reliability modes can reduce
the full-processor vulnerability to different extents de-
pending upon the properties of the executing applica-
tion. For example, reliability modes like RM2, RM6,
and RM9 reduce the processor vulnerability of SHA by
more than 50%, but not of Dijkstra, even though
they have similar vulnerabilities in all other reliability
modes.
• Hardening specific components in the pipeline can significantly reduce the overall processor vulnerability. For example, hardening key components like the Rename Map (RM) and the Re-order Buffer (ROB) effectively reduces the FPVF for all applications, as shown by the heterogeneous reliability modes RM4, RM7, and RM8. However, utilizing these hardening modes incurs significant area and power overheads.
• Certain heterogeneous reliability modes are very ef-
fective in reducing the FPVF by a large margin for
very small area/power overhead. For example, RM2 and
RM6 reduce the FPVF by more than 50% for <75% area
and power overheads when executing SHA.
• Hardening many pipeline components without hardening the most vulnerable component of the system introduces very high overheads without significantly reducing the vulnerability of the system. This is illustrated by the reliability mode RM9, in which the ROB is not hardened. This reliability mode has area and power overheads close to 200% with insignificant reductions in FPVF, whereas RM4 significantly reduces the FPVF for comparatively lower overheads.
TABLE 4: Proposed Heterogeneous Reliability Modes
Reliability Mode | Components Hardened
U | Unprotected
RM1 | RF
RM2 | IQ, RM
RM3 | IQ, LQ, SQ
RM4 | IQ, LQ, SQ, RM, ROB
RM5 | RF, IQ, LQ, SQ
RM6 | RF, RM
RM7 | RF, RM, ROB
RM8 | RM, ROB
RM9 | RF, IQ, LQ, SQ, RM
TABLE 5: Pareto-Optimal Reliability Modes for MiBench Applications
Application | Pareto-Optimal Reliability Modes
Bit-counts | U, RM4, RM7
Dijkstra | U, RM4, RM7, RM8
Patricia | U, RM4, RM7
SHA | U, RM2, RM6, RM7, RM8
All | U, RM4, RM7, RM8
Using the data gathered from the simulation of our designs, we perform a design space exploration that trades off FPVF, area, and power overheads to extract the Pareto-optimal designs that best suit the target application. The pseudo-code of the Pareto-frontier extraction algorithm is presented in Algorithm 1. The corresponding results are illustrated in Fig. 14. The x-axis denotes the FPVF, whereas the y- and z-axes denote the power and area overheads, respectively.
Algorithm 1 Pareto-Frontier Extraction
Input: {FPVF, A, P} ∀ RM_i, i ∈ [1, K]
Output: OptimalReliabilityModes (ORM)
TempSignal = 0;
TempArray(3, K) = 0;
TempArray2(3, K) = 0;
B = [FPVF, Area, Power];
for k ← 1 to 3 do
    j = 0;
    temp = B(k, :);
    for i ← 1 to 3 do
        if i ≠ k then
            j = j + 1;
            TempArray2(j, :) = temp − B(i, :);
        end if
    end for
    if TempArray2(1 : j, :) < 0 then
        TempSignal = TempSignal + 1;
        TempArray(TempSignal, :) = temp;
    end if
end for
if TempSignal >= 1 then
    ORM = TempArray(1 : TempSignal, :);
end if
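For readers who prefer executable code, a Pareto front over (FPVF, area overhead, power overhead) can also be extracted with a standard dominance check, as in the Python sketch below. This is a generic formulation and not a line-by-line translation of Algorithm 1; the mode names and numbers are hypothetical, and all three metrics are assumed to be minimized.

```python
def dominates(p, q):
    """True if point p is no worse than q in every metric and better in one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def pareto_front(modes):
    """Names of the modes that are not dominated by any other mode.

    modes maps a mode name to a tuple (fpvf, area_overhead, power_overhead).
    """
    front = []
    for name, point in modes.items():
        dominated = any(
            dominates(other, point)
            for other_name, other in modes.items()
            if other_name != name
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical design points: (FPVF, area overhead %, power overhead %).
modes = {
    "U": (30.0, 0.0, 0.0),
    "RM_a": (10.0, 40.0, 60.0),
    "RM_b": (12.0, 50.0, 70.0),   # worse than RM_a in every metric
    "RM_c": (5.0, 150.0, 180.0),
}
print(pareto_front(modes))
```

An unprotected design with zero overhead can never be dominated as long as every hardened design has a non-zero overhead, which matches the observation that U always lies on the Pareto front.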
The design labeled U in all applications is the unprotected
core that is highly vulnerable to soft errors. As it does not
deploy any redundancy measures, it has zero area and power
overhead, and hence lies on the pareto-front. The pareto-
optimal reliability modes for the applications are presented
in Table 5. RM4 is pareto-optimal for all applications except
SHA. The register file is highly vulnerable to soft errors
during the execution of SHA and needs to be hardened to
reduce its vulnerability. The reliability mode RM7 is pareto-
optimal for all four applications and reduces the FPVF on
average by 87% with average area and power overheads of
10% and 43%, respectively.
A super-set of the Pareto-optimal reliability modes for all these applications can be selected to design a heterogeneous multi-core processor. We can build the chip by selecting the reliability modes from this super-set such that the form-factor and cost constraints are adhered to. At run-time, the required reliability modes can be switched on or off depending upon the power constraints of the system.
F. RUN-TIME SYSTEM
Although this work focuses mostly on the design-time aspects of achieving heterogeneous reliability in out-of-order superscalar processors, in this sub-section we present a brief overview of a run-time system analysis using our proposed heterogeneous multi-core (HMC) processor. Our HMC is a 10-core processor composed of all 10 heterogeneous reliability modes discussed in sub-section IV-E, i.e., the unprotected core and RM1 to RM9. We illustrate the benefits of our reliability modes by executing five application workload mixes, the compositions of which are presented in Table 6, on the 10-core heterogeneous processor to evaluate the power overheads and FPVF of the HMC for each workload mix. The task-to-core mapping can be done using one of the following heuristics:
(1) Vulnerability-Constrained Power Minimization: In this technique (see Fig. 15), we impose a vulnerability constraint on each task in the mix, i.e., tasks are mapped sequentially, and each task is mapped only to a core that can execute it under the imposed vulnerability constraint. If a suitable core (one that satisfies the vulnerability constraint) is not available, the task is not scheduled immediately. The goal of this approach is to minimize the power overhead of the complete processor.
(2) Power-Constrained Vulnerability Minimization: This approach (see Fig. 16) imposes a constraint on the maximum power overhead of the whole processor, which is set to an overhead of 100% for each task in the mix, i.e., the task-to-core mapping is stalled when the power constraint is exceeded. The goal of this task mapping policy is to minimize the FPVF.
FIGURE 15: Flowchart Illustrating the Vulnerability-Constrained Power Minimization Task-to-Core Mapping Policy. (Inputs: workload, vulnerability constraints, {FPVF, Power}, HMC. Steps: sort the HMC cores in ascending order of power; starting from the first task (A) and the first core (B), map task A to core B and remove B from the HMC if FPVF < vulnerability constraint, otherwise move to the next core; repeat until the end of the workload or of the core list.)
FIGURE 16: Flowchart Illustrating the Power-Constrained Vulnerability Minimization Task-to-Core Mapping Policy. (Inputs: workload, power budget, {FPVF, Power}, HMC. Steps: sort the HMC cores in ascending order of FPVF; starting from the first task (A) and the first core (B), map task A to core B and remove B from the HMC if overhead < power budget, otherwise move to the next core; repeat until the end of the workload or of the core list.)
FIGURE 17: Run-Time Task Mapping Analysis of HMC. (Y-axis: Full-Processor Vulnerability Factor; series: Un-protected Processor, Full-Protected Processor, Vul.-Const. Power Min., Power-Const. Vul. Min.)
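The two mapping heuristics described above can be sketched in Python as follows. The sketch follows our reading of the flowcharts: cores are tried in ascending order of the minimized metric and the first core that satisfies the constraint is taken. The data layout, the per-core interpretation of the power budget, and all numbers are assumptions made for illustration only.

```python
def map_vulnerability_constrained(tasks, core_power, fpvf, vul_constraint):
    """Try cores in ascending power order; take the first whose FPVF fits."""
    free = sorted(core_power, key=lambda c: (core_power[c], c))
    mapping = {}
    for idx, app in enumerate(tasks):
        for core in free:
            if fpvf[(app, core)] < vul_constraint:
                mapping[idx] = core
                free.remove(core)
                break  # safe: we stop iterating right after the removal
    return mapping  # tasks missing from the mapping were not scheduled


def map_power_constrained(tasks, core_power, fpvf, power_budget):
    """Try cores in ascending FPVF order; take the first within the budget."""
    free = list(core_power)
    mapping = {}
    for idx, app in enumerate(tasks):
        for core in sorted(free, key=lambda c: (fpvf[(app, c)], c)):
            if core_power[core] < power_budget:
                mapping[idx] = core
                free.remove(core)
                break
    return mapping


# Hypothetical 3-core HMC: power overhead (%) per core, FPVF per (app, core).
core_power = {"U": 0.0, "RM_x": 60.0, "RM_y": 150.0}
fpvf = {
    ("SHA", "U"): 30.0, ("SHA", "RM_x"): 12.0, ("SHA", "RM_y"): 4.0,
    ("Dijkstra", "U"): 28.0, ("Dijkstra", "RM_x"): 20.0, ("Dijkstra", "RM_y"): 5.0,
}
tasks = ["SHA", "Dijkstra"]
print(map_vulnerability_constrained(tasks, core_power, fpvf, vul_constraint=15.0))
print(map_power_constrained(tasks, core_power, fpvf, power_budget=100.0))
```

Both functions are greedy and process tasks in workload order, so a task can be left unmapped even when a different assignment would have accommodated it; this is one reason why 100% task mapping is not guaranteed.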
The results of this evaluation are presented in Fig. 17, from which we make the following key observations:
• The proposed reliability modes can be deployed in a heterogeneous multi-core processor to reduce the power overheads of executing the application workloads, based on the requirements of each workload.
• The proposed reliability modes can be used to minimize either the power overhead or the full-processor vulnerability factor, as illustrated by the two task mapping policies.
Although 100% task mapping is not achieved, unlike in the unprotected or fully protected case, this can be resolved by carefully selecting the reliability modes to be deployed in the HMC considering the potential application workloads and/or by using a task mapping algorithm that can efficiently schedule the tasks to processor cores.
TABLE 6: Workload Mixes and their Application Compositions
Application Mix | Composition
MIX-1 | [Bit-counts, Dijkstra, SHA, Patricia, Bit-counts, Dijkstra, SHA, Patricia]
MIX-2 | [Bit-counts, Bit-counts, Bit-counts, Bit-counts, Dijkstra, Dijkstra, Dijkstra, Dijkstra]
MIX-3 | [Bit-counts, SHA, Patricia, Bit-counts, SHA, Patricia, Bit-counts, Patricia]
MIX-4 | [SHA, Patricia, SHA, Patricia, SHA, Patricia]
MIX-5 | [SHA, SHA, SHA, SHA, SHA, Dijkstra]
G. STATE COMPRESSION TECHNIQUES
Checkpointing and rollback is an effective way of guaranteeing reliability at the software layer by providing both spatial and temporal redundancy. A checkpoint is a snapshot of the processor state at a given instant in time. Checkpoints allow the system to roll back to a previous safe state and re-execute instructions in case a failure is detected. Fig. 18 presents an overview of the methodology that we use for checkpointing and state compression. Checkpoints are typically inserted intermittently into the target application for periodic state retention and, if required, rollback to an earlier processor state, i.e., in case of faulty execution. Typically, the collected processor state information is stored in the main memory or off-chip non-volatile memory, which can still be used for rollback in case of power-off. In our case, to reduce the size of the checkpointing data, we introduce an additional state compression stage that utilizes state-of-the-art compression techniques to generate a wide range of compressed checkpoint variants. The optimal compressed variant can be selected based on the system's resource constraints and the available on-/off-chip memory. In case a fault is detected in the current processor state during the application execution, the previous safe state is decompressed and the processor is rolled back to it to ensure the correct execution of the application.
FIGURE 18: Overview of the Methodology for Checkpointing and State Compression. (Stages: checkpointing and state compression with a choice of compression algorithms and variant selection; compressed variants of the states S1 to S5 are held in main memory; safe-state selection, state decompression, and rollback.)
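The checkpoint, compress, and rollback flow described above can be sketched at a very small scale in Python. The sketch models the processor state as a plain dictionary and uses the standard pickle and gzip modules; the class and field names are hypothetical, and the code illustrates the flow only, not the gem5 or DMTCP mechanisms.

```python
import gzip
import pickle


class CheckpointStore:
    """Keeps gzip-compressed snapshots and restores the most recent one."""

    def __init__(self):
        self._snapshots = []

    def save(self, state) -> int:
        """Compress and store a snapshot; return its compressed size in bytes."""
        blob = gzip.compress(pickle.dumps(state))
        self._snapshots.append(blob)
        return len(blob)

    def rollback(self):
        """Decompress and return a copy of the previous safe state."""
        if not self._snapshots:
            raise RuntimeError("no checkpoint available")
        return pickle.loads(gzip.decompress(self._snapshots[-1]))


store = CheckpointStore()
state = {"pc": 100, "regs": [0] * 32}   # hypothetical architectural state
store.save(state)

state["regs"][3] = 0xDEAD               # a detected fault corrupts the state
state = store.rollback()                # resume from the last safe state
print(state["regs"][3])
```

Storing only compressed snapshots trades extra compression and decompression time for a smaller memory footprint, which is the trade-off explored by the compressed checkpoint variants.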
The checkpointing mechanism deployed by gem5 comes with certain caveats. It does not preserve the cache and pipeline states in a checkpoint; as a result, frequent restoration from such checkpoints would result in performance loss if deployed in real-world systems. Therefore, we explore techniques like DMTCP [27], [28], which checkpoints the Linux process. The back-end checkpointing mechanism of DMTCP is accessible to the programmer via numerous APIs. These APIs can be used in conjunction with the front-end gem5 pseudo-instructions for checkpoint creation/recovery. Since these software-based checkpoints are often large, the checkpoint is compressed using gzip and HBICT to save memory. HBICT [29], [30] provides DMTCP support for delta-compression (relative to the previous compression), the output of which is further compressed using gzip (a combination of lossless data compression algorithms like LZ77 and Huffman coding).
We investigate the effectiveness of these techniques in all
possible combinations, by applying them one after the other,
on applications from the MiBench application benchmark
suite by simulating them on the ALPHA core using gem5.
The results of this experiment are presented in Fig. 19. From
these results, we make the following key observations:
• The combination of DMTCP and gzip is highly successful, reducing the checkpoint size by ~6×.
• The combination of DMTCP, HBICT, and gzip reduces the checkpoint size by ~5.7×.
HBICT, which utilizes delta-compression, requires all pre-
vious checkpoints for efficient rollback. Since the base file
size of HBICT+DMTCP is 1.03× larger than the file size of
DMTCP, the effectiveness of the combined state compression
technique (DMTCP+HBICT+gzip), with respect to DMTCP,
is reduced.
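The dependence of delta-compression on earlier checkpoints can be illustrated with the Python sketch below. It uses a byte-wise XOR against the previous checkpoint as a stand-in delta encoding, followed by gzip; this is an assumption made for illustration and is not the algorithm implemented by HBICT. The checkpoint contents are synthetic.

```python
import gzip


def xor_delta(previous: bytes, current: bytes) -> bytes:
    """Byte-wise XOR of two equally sized checkpoints (stand-in delta)."""
    if len(previous) != len(current):
        raise ValueError("this sketch only handles equally sized checkpoints")
    return bytes(a ^ b for a, b in zip(previous, current))


def compress_full(checkpoint: bytes) -> bytes:
    """gzip-compress a complete checkpoint."""
    return gzip.compress(checkpoint)


def compress_delta(previous: bytes, current: bytes) -> bytes:
    """gzip-compress only the difference to the previous checkpoint."""
    return gzip.compress(xor_delta(previous, current))


def restore_from_delta(previous: bytes, blob: bytes) -> bytes:
    """Rebuild a checkpoint; requires the previous checkpoint to be intact."""
    return xor_delta(previous, gzip.decompress(blob))


# Two synthetic 16 KiB checkpoints that differ in a single byte.
first = bytes(range(256)) * 64
second = bytearray(first)
second[10] ^= 0xFF
second = bytes(second)

blob = compress_delta(first, second)
print(len(second), len(compress_full(second)), len(blob))
print(restore_from_delta(first, blob) == second)
```

Restoring from a delta requires every earlier checkpoint in the chain, so the storage saved per checkpoint has to be weighed against the extra base data and the longer restore path.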
FIGURE 19: Effectiveness of State Compression Techniques in Reducing State Size. (Y-axis: checkpoint data size [MB], 0 to 120; applications: Bit-counts, SHA, Patricia, Dijkstra; series: gem5, DMTCP, DMTCP + HBICT, DMTCP + gzip, DMTCP + gzip + HBICT.)
V. RELATED WORK
Reliability is a major research challenge that is being tackled by the community at large via global initiatives like the NSF's Variability Expedition (http://www.variability.org/) and DFG's SPP 1500 Priority Program (http://spp1500.itec.kit.edu). Research works from academia and industry alike have addressed the challenges associated with technology scaling across the layers of the computing stack.
Mitigation Strategies: The work in [39] presents the Razor approach, which can be used to dynamically detect and correct timing errors by monitoring the error rate at run-time to tune the circuit's supply voltage. The adaptive approach presented in [40] enables per-core dual modular redundancy (DMR) by means of DVFS to offer a stable soft error rate (SER). An OS-level dynamic reliability management system for heterogeneous architectures that achieves an optimal trade-off between reliability (lifetime) and power/performance efficiency is presented in [41]. A software-level technique is presented in [42], which detects errors by duplicating instructions at compile time, using different variables and registers for the new instructions. A software-controlled fault-tolerance scheme is proposed in [43] that allows programmers and designers to trade off performance and reliability based on the system's requirements. Luo et al. [44] quantify the tolerance of applications to memory errors and propose several new hardware/software heterogeneous-reliability memory systems to reduce their vulnerabilities and data-center costs.
Reliability Modeling: The work in [45] demonstrates the
concept of Program Vulnerability Factor, which captures the
architecture-level fault masking properties of the underlying
program while exhibiting workload-driven changes in the
AVF for all architectural components. Li et al. [46] analyze
the correlation between soft-error rate and the energy con-
sumption behaviour of on-chip data caches. This involves
analyzing (1) the leakage energy optimizations on soft errors,
and (2) the energy overheads of protecting on-chip memories
against soft errors. A software-level technique proposed in
[47] introduces transient fault tolerance in a multi-core sys-
tem by exploiting process-level redundancy (PLR) to create
multiple application threads and compare them to ensure cor-
rect execution of the application. A software-level approach
to enable self-adaptive reliability for multi-/many-core sys-
tems is proposed in [48] by activating redundancy measures
based on the application’s dependability requirements. A
simultaneous and redundantly threaded (SRT) processor is
presented in [49], which provides transient fault tolerance
with significantly higher performance. Redundant copies of
the program threads are executed simultaneously on the SRT
to ensure accurate application execution. Kriebel et al. [50]
analyze and present the reliability issues of on-chip memory
systems to propose a reliability-aware reconfigurable last-
level cache architecture that adapts the cache parameters
to concurrently execute multi-threaded workloads at run-
time in order to minimize their vulnerabilities. A soft error-
aware cache architectural space-exploration methodology is
presented in [51] for varying the application workloads and
cache parameters for the complete cache hierarchy. An adap-
tive soft-error resilience (ASER) approach is presented in
[52] by proposing and managing reliability-heterogeneous
dark silicon many-core processors (darkRHPs). The proposed darkRHPs deploy redundancy at the architecture level, i.e., by hardening the full pipeline of an in-order LEON3 processor and/or its caches. The work in [53]
presents an approach that exploits the on-chip dark-silicon to
synergistically mitigate reliability and variability challenges
associated with transistor technology scaling. An overview
of different heterogeneous fault-tolerance schemes for both
hardware and software layers is presented in [11], which also
provides an initial proof-of-concept of this work.
This work, on the other hand, focuses on generating and
exploring a wide-range of heterogeneous reliability modes
using two key approaches, i.e., (1) Redundancy, by hardening
different combinations of the pipeline components for an
out-of-order superscalar processor, and (2) Checkpointing,
by reducing the size of the checkpoint data using efficient
compression techniques.
VI. CONCLUSION
In this work, we presented a novel design space genera-
tion and exploration methodology that is used to develop
a wide range of heterogeneous reliability modes for out-of-
order superscalar processors. By analyzing the architectural
vulnerability of key pipeline components, we propose to
harden them in multiple different combinations with varying
levels of reliability to cater to the application’s requirement
while minimizing the power overhead. The pareto-optimal
reliability mode RM7 is successful in reducing the processor
vulnerability by 87% on average, with area and power over-
heads of 10% and 43%, respectively. To further enhance our
design space for heterogeneous reliability, we also investigate
effective state-compression techniques to reduce the data size
of a checkpoint by ~6×. Our studies illustrate that, in power-constrained scenarios, enabling reliability at a fine granularity and deploying reliability-heterogeneous out-of-order superscalar processors bear significant potential for real-world systems, especially when considering the diverse vulnerability profiles of different applications, which can further vary depending upon their input workloads.
REFERENCES
[1] J. Henkel, L. Bauer, N. D. Dutt, P. Gupta, S. R. Nassif, M. Shafique,
M. B. Tahoori, and N. Wehn, “Reliable on-chip systems in the
nano-era: lessons learnt and future trends,” in The 50th Annual
Design Automation Conference 2013, DAC ’13, Austin, TX, USA,
May 29 - June 07, 2013, 2013, pp. 99:1–99:10. [Online]. Available:
https://doi.org/10.1145/2463209.2488857
[2] R. C. Baumann, “Radiation-induced soft errors in advanced semiconductor
technologies,” IEEE Transactions on Device and materials reliability,
vol. 5, no. 3, pp. 305–316, 2005.
[3] G. P. Saggese, N. J. Wang, Z. Kalbarczyk, S. J. Patel, and R. K.
Iyer, “An experimental study of soft errors in microprocessors,”
IEEE Micro, vol. 25, no. 6, pp. 30–39, 2005. [Online]. Available:
https://doi.org/10.1109/MM.2005.104
[4] S. Feng, S. Gupta, A. Ansari, and S. A. Mahlke, “Shoestring: probabilistic
soft error reliability on the cheap,” in Proceedings of the 15th International
Conference on Architectural Support for Programming Languages
and Operating Systems, ASPLOS 2010, Pittsburgh, Pennsylvania,
USA, March 13-17, 2010, 2010, pp. 385–396. [Online]. Available:
https://doi.org/10.1145/1736020.1736063
[5] S. Y. Borkar, “Designing reliable systems from unreliable components:
The challenges of transistor variability and degradation,” IEEE Micro,
vol. 25, no. 6, pp. 10–16, 2005. [Online]. Available: https://doi.org/10.1109/MM.2005.110
[6] T. Li, R. G. Ragel, and S. Parameswaran, “Reli: Hardware/software
checkpoint and recovery scheme for embedded processors,” in 2012
Design, Automation & Test in Europe Conference & Exhibition, DATE
2012, Dresden, Germany, March 12-16, 2012, 2012, pp. 875–880.
[Online]. Available: https://doi.org/10.1109/DATE.2012.6176621
[7] C. J. Li and W. K. Fuchs, “Catch-compiler-assisted techniques for
checkpointing,” in Proceedings of the 20th International Symposium
on Fault-Tolerant Computing, FTCS 1990, Newcastle Upon Tyne,
UK, 26-28 June, 1990, 1990, pp. 74–81. [Online]. Available: https://doi.org/10.1109/FTCS.1990.89337
[8] S. S. Mukherjee, M. Kontz, and S. K. Reinhardt, “Detailed design and
evaluation of redundant multithreading alternatives,” in 29th International
Symposium on Computer Architecture (ISCA 2002), 25-29 May
2002, Anchorage, AK, USA, 2002, pp. 99–110. [Online]. Available:
https://doi.org/10.1109/ISCA.2002.1003566
[9] N. Oh, P. P. Shirvani, and E. J. McCluskey, “Error detection
by duplicated instructions in super-scalar processors,” IEEE Trans.
Reliability, vol. 51, no. 1, pp. 63–75, 2002. [Online]. Available:
https://doi.org/10.1109/24.994913
[10] G. A. Reis, J. Chang, N. Vachharajani, R. Rangan, and D. I. August,
“SWIFT: software implemented fault tolerance,” in 3rd IEEE/ACM
International Symposium on Code Generation and Optimization (CGO
2005), 20-23 March 2005, San Jose, CA, USA, 2005, pp. 243–254.
[Online]. Available: https://doi.org/10.1109/CGO.2005.34
[11] S. Rehman, F. Kriebel, B. S. Prabakaran, F. Khalid, and M. Shafique,
“Hardware and software techniques for heterogeneous fault-tolerance,”
in 24th IEEE International Symposium on On-Line Testing And Robust
System Design, IOLTS 2018, Platja D’Aro, Spain, July 2-4, 2018, 2018,
pp. 115–118. [Online]. Available:
https://doi.org/10.1109/IOLTS.2018.8474219
[12] T. C. May and M. H. Woods, “Alpha-particle-induced soft errors in
dynamic memories,” IEEE Transactions on Electron Devices, vol. 26,
no. 1, pp. 2–9, Jan 1979.
[13] G. R. Srinivasan, P. C. Murley, and H. K. Tang, “Accurate, predictive
modeling of soft error rate due to cosmic rays and chip alpha radiation,” in
Proceedings of 1994 IEEE International Reliability Physics Symposium,
April 1994, pp. 12–16.
[14] T. J. O’Gorman, “The effect of cosmic rays on the soft error rate of a DRAM
at ground level,” IEEE Transactions on Electron Devices, vol. 41, no. 4,
pp. 553–557, April 1994.
[15] P. Hazucha and C. Svensson, “Impact of CMOS technology scaling on
the atmospheric neutron soft error rate,” IEEE Transactions on Nuclear
Science, vol. 47, no. 6, pp. 2586–2594, Dec 2000.
[16] Gaisler. LEON3FT fault-tolerant processor. [Online]. Available:
https://www.gaisler.com/index.php/products/processors/leon3ft
[17] R. E. Lyons and W. Vanderkulk, “The use of triple-modular redundancy
to improve computer reliability,” IBM Journal of Research and
Development, vol. 6, no. 2, pp. 200–209, 1962. [Online]. Available:
https://doi.org/10.1147/rd.62.0200
[18] M. M. Dickinson, J. B. Jackson, and G. C. Randa, “Saturn V launch
vehicle digital computer and data adapter,” in Proceedings of the October
27-29, 1964, Fall Joint Computer Conference, Part I, ser. AFIPS ’64
(Fall, part I). New York, NY, USA: ACM, 1964, pp. 501–516. [Online].
Available: http://doi.acm.org/10.1145/1464052.1464099
[19] R. E. Kessler, “The Alpha 21264 microprocessor,” IEEE Micro, vol. 19,
no. 2, pp. 24–36, 1999. [Online]. Available:
https://doi.org/10.1109/40.755465
[20] M. Johnson, Superscalar microprocessor design, ser. Prentice Hall series
in innovative technology. Prentice Hall, 1991.
[21] W. Hwu and Y. N. Patt, “HPSm, a high performance restricted data
flow architecture having minimal functionality,” in Proceedings of the
13th Annual International Symposium on Computer Architecture, ser.
ISCA ’86. Los Alamitos, CA, USA: IEEE Computer Society Press,
1986, pp. 297–306. [Online]. Available:
http://dl.acm.org/citation.cfm?id=17407.17391
[22] Compaq Computer Corporation, “Alpha 21264 microprocessor hardware reference
manual,” July 1999, Alpha 21264 Manual.
[23] P. E. Gronowski, W. J. Bowhill, R. P. Preston, M. K. Gowan, and R. L.
Allmon, “High-performance microprocessor design,” IEEE Journal of
Solid-State Circuits, vol. 33, no. 5, pp. 676–686, May 1998.
[24] S. S. Mukherjee, C. T. Weaver, J. S. Emer, S. K. Reinhardt,
and T. M. Austin, “A systematic methodology to compute the
architectural vulnerability factors for a high-performance microprocessor,”
in Proceedings of the 36th Annual International Symposium on
Microarchitecture, San Diego, CA, USA, December 3-5, 2003, 2003,
pp. 29–42. [Online]. Available:
https://doi.org/10.1109/MICRO.2003.1253181
[25] N. L. Binkert, B. M. Beckmann, G. Black, S. K. Reinhardt, A. G.
Saidi, A. Basu, J. Hestness, D. Hower, T. Krishna, S. Sardashti,
R. Sen, K. Sewell, M. S. B. Altaf, N. Vaish, M. D. Hill, and
D. A. Wood, “The gem5 simulator,” SIGARCH Computer Architecture
News, vol. 39, no. 2, pp. 1–7, 2011. [Online]. Available:
https://doi.org/10.1145/2024716.2024718
[26] S. Li, J. H. Ahn, R. D. Strong, J. B. Brockman, D. M. Tullsen, and
N. P. Jouppi, “McPAT: an integrated power, area, and timing modeling
framework for multicore and manycore architectures,” in 42nd Annual
IEEE/ACM International Symposium on Microarchitecture (MICRO-42
2009), December 12-16, 2009, New York, New York, USA, 2009, pp.
469–480. [Online]. Available: https://doi.org/10.1145/1669112.1669172
[27] J. Ansel, K. Arya, and G. Cooperman, “DMTCP: transparent
checkpointing for cluster computations and the desktop,” in 23rd
IEEE International Symposium on Parallel and Distributed Processing,
IPDPS 2009, Rome, Italy, May 23-29, 2009, 2009, pp. 1–12. [Online].
Available: https://doi.org/10.1109/IPDPS.2009.5161063
[28] [Online]. Available: http://dmtcp.sourceforge.net/
[29] S. Agarwal, R. Garg, M. S. Gupta, and J. E. Moreira, “Adaptive
incremental checkpointing for massively parallel systems,” in Proceedings
of the 18th Annual International Conference on Supercomputing, ICS
2004, Saint Malo, France, June 26 - July 01, 2004, 2004, pp. 277–286.
[Online]. Available: https://doi.org/10.1145/1006209.1006248
[30] [Online]. Available: http://hbict.sourceforge.net/index.html
[31] R. E. Kessler, “The Alpha 21264 microprocessor,” IEEE Micro, vol. 19,
no. 2, pp. 24–36, 1999. [Online]. Available:
https://doi.org/10.1109/40.755465
[32] M. R. Guthaus, J. S. Ringenberg, D. Ernst, T. M. Austin, T. Mudge, and
R. B. Brown, “MiBench: A free, commercially representative embedded
benchmark suite,” in Proceedings of the fourth annual IEEE international
workshop on workload characterization. WWC-4 (Cat. No. 01EX538).
IEEE, 2001, pp. 3–14.
[33] S. K. S. Hari, T. Tsai, M. Stephenson, S. W. Keckler, and J. S.
Emer, “SASSIFI: an architecture-level fault injection tool for GPU
application resilience evaluation,” in 2017 IEEE International Symposium
on Performance Analysis of Systems and Software, ISPASS 2017, Santa
Rosa, CA, USA, April 24-25, 2017, 2017, pp. 249–258. [Online].
Available: https://doi.org/10.1109/ISPASS.2017.7975296
[34] H. Ziade, R. A. Ayoubi, and R. Velazco, “A survey on fault injection
techniques,” Int. Arab J. Inf. Technol., vol. 1, no. 2, pp. 171–186, 2004.
[Online]. Available: http://www.iajit.org/ABSTRACTS-2.htm#04
[35] S. Rehman, M. Shafique, F. Kriebel, and J. Henkel, “Reliable
software for unreliable hardware: embedded code generation aiming
at reliability,” in Proceedings of the 9th International Conference
on Hardware/Software Codesign and System Synthesis, CODES+ISSS
2011, part of ESWeek ’11 Seventh Embedded Systems Week, Taipei,
Taiwan, 9-14 October, 2011, 2011, pp. 237–246. [Online]. Available:
https://doi.org/10.1145/2039370.2039408
[36] S. S. Mukherjee, J. S. Emer, and S. K. Reinhardt, “The soft error
problem: An architectural perspective,” in 11th International Conference
on High-Performance Computer Architecture (HPCA-11 2005), 12-16
February 2005, San Francisco, CA, USA, 2005, pp. 243–247. [Online].
Available: https://doi.org/10.1109/HPCA.2005.37
[37] A. Dixit and A. Wood, “The impact of new technology on soft error
rates,” in 2011 International Reliability Physics Symposium, April 2011,
pp. 5B.4.1–5B.4.7.
[38] J. Gaisler, “A portable and fault-tolerant microprocessor based on
the SPARC V8 architecture,” in 2002 International Conference on
Dependable Systems and Networks (DSN 2002), 23-26 June 2002,
Bethesda, MD, USA, Proceedings, 2002, pp. 409–415. [Online].
Available: https://doi.org/10.1109/DSN.2002.1028926
[39] D. Ernst, S. Das, S. Lee, D. T. Blaauw, T. M. Austin, T. N. Mudge, N. S.
Kim, and K. Flautner, “Razor: Circuit-level correction of timing errors
for low-power operation,” IEEE Micro, vol. 24, no. 6, pp. 10–20, 2004.
[Online]. Available: https://doi.org/10.1109/MM.2004.85
[40] R. Vadlamani, J. Zhao, W. P. Burleson, and R. Tessier, “Multicore
soft error rate stabilization using adaptive dual modular redundancy,”
in Design, Automation and Test in Europe, DATE 2010, Dresden,
Germany, March 8-12, 2010, 2010, pp. 27–32. [Online]. Available:
https://doi.org/10.1109/DATE.2010.5457242
[41] A. Baldassari, C. Bolchini, and A. Miele, “A dynamic reliability
management framework for heterogeneous multicore systems,” in
IEEE International Symposium on Defect and Fault Tolerance in
VLSI and Nanotechnology Systems, DFT 2017, Cambridge, United
Kingdom, October 23-25, 2017, 2017, pp. 1–6. [Online]. Available:
https://doi.org/10.1109/DFT.2017.8244440
[42] N. Oh, P. P. Shirvani, and E. J. McCluskey, “Error detection
by duplicated instructions in super-scalar processors,” IEEE Trans.
Reliability, vol. 51, no. 1, pp. 63–75, 2002. [Online]. Available:
https://doi.org/10.1109/24.994913
[43] G. A. Reis, J. Chang, N. Vachharajani, R. Rangan, D. I. August, and S. S.
Mukherjee, “Software-controlled fault tolerance,” TACO, vol. 2, no. 4,
pp. 366–396, 2005. [Online]. Available:
https://doi.org/10.1145/1113841.1113843
[44] Y. Luo, S. Govindan, B. Sharma, M. Santaniello, J. Meza, A. Kansal,
J. Liu, B. Khessib, K. Vaid, and O. Mutlu, “Characterizing application
memory error vulnerability to optimize datacenter cost via heterogeneous-
reliability memory,” in 44th Annual IEEE/IFIP International Conference
on Dependable Systems and Networks, DSN 2014, Atlanta, GA,
USA, June 23-26, 2014, 2014, pp. 467–478. [Online]. Available:
https://doi.org/10.1109/DSN.2014.50
[45] V. Sridharan and D. R. Kaeli, “Eliminating microarchitectural dependency
from architectural vulnerability,” in 15th International Conference on
High-Performance Computer Architecture (HPCA-15 2009), 14-18
February 2009, Raleigh, North Carolina, USA, 2009, pp. 117–128.
[Online]. Available: https://doi.org/10.1109/HPCA.2009.4798243
[46] L. Li, V. Degalahal, N. Vijaykrishnan, M. T. Kandemir, and M. J.
Irwin, “Soft error and energy consumption interactions: a data cache
perspective,” in Proceedings of the 2004 International Symposium on
Low Power Electronics and Design, 2004, Newport Beach, California,
USA, August 9-11, 2004, 2004, pp. 132–137. [Online]. Available:
https://doi.org/10.1145/1013235.1013273
[47] A. Shye, T. Moseley, V. J. Reddi, J. Blomstedt, and D. A. Connors,
“Using process-level redundancy to exploit multiple cores for transient
fault tolerance,” in The 37th Annual IEEE/IFIP International Conference
on Dependable Systems and Networks, DSN 2007, 25-28 June 2007,
Edinburgh, UK, Proceedings, 2007, pp. 297–306. [Online]. Available:
https://doi.org/10.1109/DSN.2007.98
[48] C. Bolchini, M. Carminati, and A. Miele, “Self-adaptive fault
tolerance in multi-/many-core systems,” J. Electronic Testing,
vol. 29, no. 2, pp. 159–175, 2013. [Online]. Available:
https://doi.org/10.1007/s10836-013-5367-y
[49] S. K. Reinhardt and S. S. Mukherjee, “Transient fault detection
via simultaneous multithreading,” in 27th International Symposium
on Computer Architecture (ISCA 2000), June 10-14, 2000,
Vancouver, BC, Canada, 2000, pp. 25–36. [Online]. Available:
http://doi.ieeecomputersociety.org/10.1109/ISCA.2000.854375
[50] F. Kriebel, S. Rehman, A. Subramaniyan, S. J. B. Ahandagbe, M. Shafique,
and J. Henkel, “Reliability-aware adaptations for shared last-level caches
in multi-cores,” ACM Trans. Embedded Comput. Syst., vol. 15, no. 4, pp.
67:1–67:26, 2016. [Online]. Available: https://doi.org/10.1145/2961059
[51] A. Subramaniyan, S. Rehman, M. Shafique, A. Kumar, and J. Henkel,
“Soft error-aware architectural exploration for designing reliability
adaptive cache hierarchies in multi-cores,” in Design, Automation &
Test in Europe Conference & Exhibition, DATE 2017, Lausanne,
Switzerland, March 27-31, 2017, 2017, pp. 37–42. [Online]. Available:
https://doi.org/10.23919/DATE.2017.7926955
[52] F. Kriebel, S. Rehman, D. Sun, M. Shafique, and J. Henkel, “ASER:
adaptive soft error resilience for reliability-heterogeneous processors in
the dark silicon era,” in The 51st Annual Design Automation Conference
2014, DAC ’14, San Francisco, CA, USA, June 1-5, 2014, 2014, pp.
12:1–12:6. [Online]. Available: https://doi.org/10.1145/2593069.2593094
[53] F. Kriebel, M. Shafique, S. Rehman, J. Henkel, and S. Garg,
“Variability and reliability awareness in the age of dark silicon,” IEEE
Design & Test, vol. 33, no. 2, pp. 59–67, 2016. [Online]. Available:
https://doi.org/10.1109/MDAT.2015.2439640
BHARATH SRINIVAS PRABAKARAN (S’19)
is a PhD Student at the Computer Architecture and
Robust Energy-Efficient Technologies (CARE-
Tech.) research group, Institute of Computer En-
gineering, TU Wien, Austria. He graduated with
a Bachelor of Engineering in Electrical and Elec-
tronics and a Master of Science in Biological
Sciences from the Birla Institute of Technology
and Science (BITS), Pilani in 2017. He was a
visiting researcher at TU Dresden for a span
of 1 year from 2016 to 2017, where he completed his master's thesis
focused on “Approximate Computing”. His research interests include
fault-tolerant computing, wearable architectures, healthcare systems,
energy-efficient technologies, and embedded machine learning.
MIHIKA DAVE is currently working as a software
engineer at Facebook, Inc. She graduated with a
Master of Science in Computer Science from the
University of Illinois at Urbana-Champaign with
a specialization in natural language processing in
2018. In 2016, she graduated with a Bachelor of
Engineering from BITS-Pilani, India where she
secured the 1st rank in the Department of Elec-
trical and Electronics Engineering and received a
Bronze Medal in the entire batch of students across
all the Science and Engineering Departments. Her main research interests
are heterogeneous fault-tolerance, machine learning, and natural language
processing. She is the recipient of several scholarships and awards, such
as the DAAD-WISE Scholarship, Michal S. Hughes Award in Software
Engineering, and BITS-Pilani Merit Scholarship.
FLORIAN KRIEBEL is a university assistant at
the Computer Architecture and Robust Energy-
Efficient Technologies (CARE-Tech.) research
group, Institute of Computer Engineering, TU
Wien, Austria. He received the M.Sc. degree
in computer science from Karlsruhe Institute of
Technology (KIT), Germany, in 2013. His current
research interests include dependable computing,
cross-layer reliability modeling, and optimization.
Mr. Kriebel has received the CODES+ISSS 2011
and 2015 Best Paper Awards.
SEMEEN REHMAN is currently on a Laufbahnstelle
(Tenure-Track Assistant Professor) position
at the Institute of Computer Technology (ICT),
Faculty of Electrical Engineering and Information
Technology, Technische Universität Wien (TU
Wien). Before that, she was a post-doctoral
researcher at the Technische Universität Dresden
(TU Dresden) and Karlsruhe Institute of Technology
(KIT), Germany since 2015. She received her
Ph.D. in computer science on July 15, 2015 from
KIT, Germany. Her main research interests are dependable systems,
cross-layer design for error resiliency with a focus on run-time adaptations,
emerging computing paradigms like approximate computing, hardware
security, energy-efficient computing, embedded systems, MPSoCs, IoT and
CPS. Dr. Rehman has contributed key ideas that have led to various DFG
projects such as GetSURE and GetSURE-II at the KIT, which focused on
enabling reliability across multiple software and hardware layers. At the
Chair for Processor Design at Technische Universität Dresden, Germany,
she initiated the research on Reconfigurable Approximate Computing. Dr.
Rehman received the CODES+ISSS 2011 and 2015 Best Paper Awards,
DATE 2017 Best Paper Award Nomination, several HiPEAC Paper Awards,
Richard Newton Young Student Fellow Award at DAC 2015, and Research
Student Award at KIT in 2012. She has served on the TPC of multiple
premier conferences on design automation and embedded systems (like
DATE and CASES), and has (co-)chaired sessions at the DATE 2017, 2018,
and 2019 conferences. She co-authored 1 book, multiple book chapters, and
30+ publications in premier journals and conferences.
MUHAMMAD SHAFIQUE (M’11 - SM’16) is a
full professor of Computer Architecture and Ro-
bust Energy-Efficient Technologies (CARE-Tech.)
at the Institute of Computer Engineering, TU
Wien, Austria since Nov. 2016. He received his
Ph.D. in Computer Science from Karlsruhe Institute
of Technology (KIT), Germany, in Jan. 2011.
Before that, he was with Streaming Networks Pvt. Ltd.
where he was involved in research and development
of video coding systems for several years.
His research interests are in computer architecture, power-/energy-efficient
systems, robust computing, hardware security, Brain-Inspired computing
trends like Neuromorphic and Approximate Computing, hardware and
system-level design for Machine Learning and AI, emerging technologies
& nanosystems, FPGAs, MPSoCs, and embedded systems. His research
has a special focus on cross-layer modeling, design, and optimization of
computing and memory systems, as well as their deployment in use cases
from Internet-of-Things (IoT), Cyber-Physical Systems (CPS), and ICT for
Development (ICT4D) domains.
Dr. Shafique has given several Keynotes, Invited Talks, and Tutorials. He
has also organized many special sessions at premier venues and served as the
Guest Editor for IEEE Design and Test Magazine and IEEE Transactions on
Sustainable Computing. He has served on the TPC of numerous prestigious
IEEE/ACM conferences. Dr. Shafique received the 2015 ACM/SIGDA
Outstanding New Faculty Award, six gold medals in his educational career,
and several best paper awards and nominations at prestigious conferences
like CODES+ISSS, DATE, DAC and ICCAD, Best Master Thesis Award,
DAC’14 Designer Track Best Poster Award, IEEE Transactions on
Computers "Feature Paper of the Month" Awards, and Best Lecturer Award. Dr.
Shafique holds one US patent and has (co-)authored 6 Books, 10+ Book
Chapters, and over 200 papers in premier journals and conferences. He is a
senior member of the IEEE and IEEE Signal Processing Society (SPS), and
a member of the ACM, SIGARCH, SIGDA, SIGBED, and HiPEAC.
