Error isolation in distributed systems by Behrens, Diogo
Faculty of Computer Science Institute of Systems Architecture, Professorship of Systems Engineering
ERROR ISOLATION IN DISTRIBUTED SYSTEMS
Diogo Behrens
Born on: 7th July 1981 in Porto Alegre, Brazil
DISSERTATION
to achieve the academic degree
DOKTORINGENIEUR (DR.-ING.)
First referee
Prof. Christof Fetzer, Ph.D.
Second referee
Flavio P. Junqueira, Ph.D.
Submitted on: 26th January 2015
Defended on: 14th January 2016

PREFACE
In distributed systems, if a hardware fault corrupts the state of a process, this error might
propagate as a corrupt message and contaminate other processes in the system, causing
severe outages. Recently, state corruptions of this nature have been observed surprisingly
often in large computer populations, e.g., in large-scale data centers. Moreover, since the
resilience of processors is expected to decline in the near future, the likelihood of state
corruptions will increase even further.
In this work, we argue that preventing the propagation of state corruption should be a
first-class requirement for large-scale fault-tolerant distributed systems. In particular, we propose
that developers target error isolation, the property that each correct process ignores any
corrupt message it receives.
Typically, a process cannot decide whether a received message is corrupt or not. Therefore,
we introduce hardening as a class of principled approaches to implement error isolation in
distributed systems. Hardening techniques are (semi-)automatic transformations that ensure
that each process appends evidence of good behavior, in the form of error codes, to all
messages it sends. The techniques “virtualize” state corruptions into more benign failures
such as crashes and message omissions: if a faulty process fails to detect its state corruption
and abort, then hardening guarantees that any corrupt message the process sends has invalid
error codes. Correct processes can then inspect received messages and drop them in case
they are corrupt.
With this dissertation, we contribute theoretically and practically to the state of the art in
fault-tolerant distributed systems. To show that hardening is possible, we design, formalize,
and prove correct different hardening techniques that enable existing crash-tolerant designs
to handle state corruption with minimal developer intervention. To show that hardening is
practical, we implement and evaluate these techniques, analyzing their effect on the system
performance and their ability to detect state corruptions in practice.

ACKNOWLEDGMENTS
First of all, I’m very grateful to my supervisor Christof Fetzer, who provided me with numerous
insights, endless suggestions, and an environment in which I was free to pursue almost any
topic and try almost any idea; thank you! I’d also like to warmly thank Flavio Junqueira and
Marco Serafini for showing me what research is about and for mentoring me for so long.
This dissertation compiles a few publications together. Without Christof, Flavio, Marco,
and my other coauthors Dmitrii Kuvaiskii, Sergei Arnautov, and Stefan Weigert, none of these
publications would have been possible. In particular, I’d like to thank Dmitrii for developing
HardPaxos with me (Chapter 5); Marco for co-designing SEI and helping me with several of
the proofs in Chapters 2 and 4; and Sergei for hardening Deadwood (Chapter 4). Moreover,
Dmitrii, Sergei, Stefan as well as Franz Eichhorn and Jons-Tobias Wamhoff gave me feedback
on the text of this dissertation. Thank you guys.
During my doctoral studies at the TU Dresden, I had the great pleasure to sit in the best
office of our group! To my office mates Martin, Stefan, Jons, Sebastian, and Torvald, thanks
for all the help, discussions, coffee breaks, and “Feierabende in der Bierstube”.
Finally, I’d like to thank my wife Madlen for keeping me sane during this long journey.
Without your unconditional support I cannot imagine finishing this work.

CONTENTS
1 Introduction
1.1 Hardware errors in the wild
1.2 Existing approaches and challenges
1.2.1 Fault-tolerant hardware
1.2.2 Byzantine fault tolerance
1.2.3 Application-specific detection
1.2.4 Software-based error detection
1.3 Contributions and road map
2 Error isolation with hardening
2.1 Background
2.1.1 System model
2.1.2 Faults, errors, and failures
2.1.3 Fault models
2.2 Problem definition
2.2.1 Error isolation
2.2.2 Benign-fault simulation with error isolation
2.2.3 Hardening
2.3 Modeling process faults
2.3.1 Process model
2.3.2 Fault model
2.3.3 Fault assumptions
2.4 Related work
2.5 Conclusion
3 Encoded processes
3.1 Background
3.2 Perfect AN-encoding
3.2.1 Model refinements
3.2.2 PAN-encoding rules
3.2.3 PAN-encoding correctness
3.3 Real-world AN-encoding
3.3.1 Finite domains and modular arithmetic
3.3.2 Assumption coverage under uniform random faults
3.3.3 Assumption coverage under non-uniform random faults
3.4 A framework for building hardened distributed systems
3.4.1 Process interfaces
3.4.2 Encoding processes
3.4.3 Error isolation in practice
3.4.4 Limitations and optimizations
3.5 Experimental evaluation
3.5.1 Algorithms and methodology
3.5.2 Paxos
3.5.3 Strong leader election
3.5.4 Fault injection
3.5.5 Discussion
3.6 Related work
3.7 Conclusion
4 Scalable Error Isolation
4.1 Rationale
4.1.1 Memory scalability
4.1.2 Computation scalability
4.1.3 Development effort
4.2 Single-threaded SEI-hardening
4.2.1 Model refinements
4.2.2 SEI-hardening specification
4.2.3 Correctness with block-confined faults
4.2.4 Correctness with gates
4.3 Multithreaded SEI-hardening
4.3.1 Model refinements
4.3.2 Algorithm extensions for multiple threads
4.3.3 Correctness with multiple threads
4.3.4 Discussion
4.4 SEI-hardening implementation
4.4.1 Development effort
4.4.2 Library internals
4.4.3 Hardening real-world code bases
4.4.4 Further instrumentation challenges
4.5 Fault coverage evaluation
4.5.1 Software fault injection
4.5.2 Hardware fault injection
4.6 Performance evaluation
4.6.1 Setup and methodology
4.6.2 Computation scalability
4.6.3 Single-thread scenarios
4.7 Related work
4.8 Conclusion
5 Hardened state-machine replication
5.1 Rationale
5.1.1 System model
5.1.2 Failure model and assumption coverage
5.1.3 HardPaxos overview
5.1.4 HardCore, invariants, and certificates
5.2 The HardPaxos algorithm
5.2.1 Normal operation
5.2.2 New epoch start
5.2.3 Recovery
5.2.4 Correctness
5.2.5 Garbage collection
5.3 Enforcing trust in HardPaxos
5.3.1 AN-encoding and error model
5.3.2 Hardening HardCore
5.4 Fault coverage evaluation
5.4.1 Methodology
5.4.2 Results
5.4.3 Undetected errors
5.5 Performance evaluation
5.5.1 Implementation and baselines
5.5.2 Setup
5.5.3 HardCore’s performance
5.5.4 Response time and throughput
5.6 Related work
5.7 Conclusion
6 Conclusion
6.1 Practical contributions
6.2 Theoretical contributions
6.3 Future work
Bibliography
Lists
Figures
Tables
Algorithms
Transformation rules

1 INTRODUCTION
Large-scale distributed systems are at the core of every successful online service on the Web.
Operating such systems at scale comes with a number of challenges related to fault tolerance.
Since machine and process crashes are common failures in production, many systems critical
to Web-scale services employ techniques such as state-machine replication [Sch90] to guar-
antee availability and integrity despite crash failures. Real-world examples of such systems
are Chubby [Bur06], ZooKeeper [Hun+10], Megastore [Bak+11], and Spanner [Cor+12a].
Although occurring less often than crash failures, hardware errors also occur in production
[SPW09; HSS12] and are expected to happen more frequently in future hardware generations
[Bor05; BC11]. A non-negligible number of these errors are uncorrectable by hardware
techniques such as error-correcting codes (ECC) [SPW09] and manifest to the application as
state corruptions. If a process externalizes corrupt state, e.g., by sending a corrupt message
out, the process commits a so-called value failure.
In spite of crash-tolerant systems being successfully deployed and extensively used in prac-
tice, their underlying crash-failure model does not include value failures. Consequently, one
may expect value failures to propagate across the system, bringing the system to inconsistent
states and even causing outages. Recent events have shown large distributed services being
disrupted by failures of this nature, resulting in long periods of unavailability, e.g., the Amazon
S3 case in 2008 [Ama08a].
In this work, we argue that preventing the propagation of state corruption should be a
first-class requirement for large-scale fault-tolerant distributed systems. We design, formal-
ize, prove correct, implement, and evaluate hardening techniques that enable existing crash-
tolerant protocols to handle state corruption with minimal developer intervention. In particular,
we define hardening as any transformation of distributed system processes that allows each
process to “virtualize” state corruptions into benign failures such as process crashes and
message omissions.
We start this introductory chapter by discussing the causes and effects of hardware errors in
distributed systems (Section 1.1). Next, we present an overview of existing approaches to protect
applications against hardware errors and derive several challenges to be addressed in this
work (Section 1.2). We conclude the chapter by relating our contributions to the addressed
challenges and by delineating the road map for the rest of this dissertation (Section 1.3).
1.1 HARDWARE ERRORS IN THE WILD
Hardware errors can have several causes such as cosmic rays, aging, heat and production
failures [SPW09] as well as electromagnetic coupling [Kim+14], hardware/software incompat-
ibility [Ker07], and even incorrect use of dynamic voltage/frequency scaling (see example in
Section 4.5.2). The cause of an error is called a fault.¹ Hardware faults can be transient or
permanent, depending on whether they are bounded or continuous in time [Avi+04]; and they
can affect different hardware components.
In this work, we focus on hardware faults affecting sequential-logic circuits used in memory
elements as well as combinational-logic circuits used in computation units. Such hardware
faults manifest at the application level as state-corruption errors, which in turn may propagate
as crash failures or, in the worst case, as value failures.² Unfortunately, the consequences of
state corruption in distributed systems are rather unpredictable due to the inherent complexity
of communication patterns among processes. As an illustration, consider the following anecdotal
examples.
• Chandra et al. have observed hardware faults corrupting the state of processes in some
instances of Google’s Chubby locking service [CGR07], resulting in at least one complete
failure of the service; the duration of the outage was, however, not reported.
• In July 2008, a single flipped bit in a machine propagated via messages to other processes,
contaminating them and finally causing an 8-hour outage of the Amazon S3 service
in the USA and Europe [Ama08a]; a severe service disruption to most S3 clients.
• When under high load, faulty hardware has corrupted the payload of several user messages
in Amazon S3 during 2008 and 2011 [Ama08b; Ama08c; Ama11]; however, only
a few clients were affected.
• Ma.gnolia, a social bookmarking website, had a major data corruption episode in January
2009 [Wir09], losing all data of all its users; an event that finally led to the company’s
shutdown at the end of 2010.
¹ To minimize the number of digressions, we postpone detailed definitions of faults, errors, failures, models, etc.
to Chapter 2. We believe these terms can be intuitively understood in this introduction.
² In contexts other than distributed systems, value failures are often called silent data corruptions (SDCs) [Rei+05a;
FSS09; DHE12; Hof+14].
Hardware faults might not only disrupt the availability and integrity of systems; they have
also been successfully used to compromise the security of systems. Govindavajhala and Appel
have shown how to gain control over a Java Virtual Machine using a heat lamp pointed
at the memory modules of a computer [GA03]. The heat of the lamp induces bit flips in
memory, which can bypass the access authorization of otherwise well-protected data structures.
Bar-El et al. have described different approaches to successfully perform “fault attacks”
on cryptographic algorithms such as RSA and DES, for example, by inducing clock or power
glitches [BE+06].
In some cases, attackers do not even have to tamper with the hardware; instead, they
can simply wait for the fault to happen. Recently, Dinaburg has proposed a new form of
attack called bitsquatting [Din11]. The attacker registers DNS entries pointing to addresses
with potentially dangerous content, as in typosquatting attacks [ME10]. The registered names
differ from the names of frequently accessed websites by a single flipped bit rather than by a typo;
for example, mic2osoft.com results from microsoft.com by flipping a single bit of its fourth
character.³ Dinaburg has collected evidence that such domain names receive a non-negligible
number of hits per day and suggested that the cause is hardware faults during the
resolution of domain names in one of the many resolvers or even in the local caches at the
clients. Nikiforakis et al. have recently shown that the number of such domains registered has
been increasing continuously [Nik+13], corroborating the claim that such attacks are feasible
and successful.
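The single-bit relation between the two domain names can be checked directly. The following Python sketch is only an illustration: the helper name flip_bit is hypothetical, and the sketch assumes ASCII characters with bit 0 as the least significant bit.

```python
def flip_bit(name: str, char_index: int, bit: int) -> str:
    """Return `name` with one bit of one ASCII character inverted."""
    code = ord(name[char_index]) ^ (1 << bit)  # XOR toggles exactly one bit
    return name[:char_index] + chr(code) + name[char_index + 1:]


original = "microsoft.com"
# The fourth character sits at index 3. Toggling bit 6 (value 64) turns
# 'r' (binary 1110010) into '2' (binary 0110010).
squatted = flip_bit(original, 3, 6)
print(squatted)  # mic2osoft.com
```

Applying the same flip a second time restores the original name, which is why a single transient fault in a resolver or cache is enough to redirect a lookup.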
While undetected state corruptions are hard to track down beyond single episodes like the
ones mentioned above, because doing so requires application-specific information, some work has
studied the frequency of detected errors in large populations of computers. Hwang et al.
recently analyzed hardware error detection logs coming from 300 TB/year worth of DRAM
memory from Google’s servers and other large server systems [HSS12]. They showed a high
rate of detected state corruptions in memory of up to 167,066 FIT.⁴ Nightingale et al. analyzed
Windows Error Reporting logs collected over 8 months from nearly one million machines; they
observed that the likelihood of a first detected CPU subsystem error occurring in a machine
is nearly an order of magnitude higher than of the first detected memory error [NDO11].
These and other studies – e.g., Bairavasundaram et al. [Bai+08], Schroeder et al. [SPW09], and
Dinaburg [Din11] – confirm the intuition that faults that would be very unlikely in a small cluster
become much more likely at scale.
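To give a feeling for what such a FIT rate means at scale, the following Python sketch converts the reported upper bound into expected failure counts. It is a back-of-the-envelope illustration only: the function names, the fleet size of 10,000 devices, and the assumption of a constant failure rate are ours and not taken from the cited studies.

```python
DEVICE_HOURS_PER_FIT = 10**9  # FIT counts failures per 10^9 device-hours


def expected_failures(fit: float, devices: int, hours: float) -> float:
    """Expected number of failures for `devices` units running `hours` each."""
    return fit * devices * hours / DEVICE_HOURS_PER_FIT


def mean_hours_between_failures(fit: float) -> float:
    """Mean time between failures of one device, assuming a constant rate."""
    return DEVICE_HOURS_PER_FIT / fit


fit = 167_066  # upper bound reported by Hwang et al. [HSS12]

# Roughly 249 days between detected errors on a single device ...
print(mean_hours_between_failures(fit) / 24)

# ... but about 40 detected errors per day in a hypothetical fleet of 10,000.
print(expected_failures(fit, devices=10_000, hours=24))
```

The same rate that looks tolerable for one device therefore produces a steady stream of errors once the population is large.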
Given that the hardware error rate is expected to increase in the future [Shi+02; Bor05;
Fen+10; BC11], state corruption might become more frequent at a small scale as well. According
to researchers from Intel, the reliability per bit tends to decrease by 8% in every generation
of processors [Bor04; Bor05]. New processor generations have traditionally achieved higher
performance through higher circuit density and lower energy consumption, but this trend is
reaching physical limits where reliability starts to be negatively affected [Con03; Bor05; BC11].
Researchers estimate that the rate of soft errors for processors might increase up to one
user-visible failure per day per chip with 16 nm or smaller technologies [Shi+02; Fen+10].
³ In ASCII code, the binary value 1110010 represents the letter r, while 0110010 represents the digit 2.
⁴ Failures in time (FIT) is the number of failures expected in 10^9 device-hours of operation.
1.2 EXISTING APPROACHES AND CHALLENGES
In this section, we discuss existing approaches to harden distributed systems against state
corruption. From this discussion we draw challenges, which are then used to drive our work.
1.2.1 FAULT-TOLERANT HARDWARE
Each individual component in a computer is a potential source of errors. Controllers, processors,
main memory, and disks are all prone to errors. DRAM, commonly used for the main memory
of commodity servers, has been identified in various studies as an important source of
errors [SPW09; Li+10; HSS12]. Hardware manufacturers have proposed a number of mech-
anisms to protect systems against state corruption. DRAM often includes error-correction
codes (ECC) like SEC-DED codes or codes in the Chipkill family [Del97]. Some modern pro-
cessors such as the Intel Xeon even protect the whole memory hierarchy (i.e., main memory
and CPU caches) and the communication buses with error codes [Int]. The extent of these
techniques is, however, limited to memory elements.
Hardware faults can also affect the outcome of combinational logic, causing the CPU to com-
pute values incorrectly. Mainframes like the HP NonStop [WJB06] use redundant processors
running in (loose) lockstep to detect state corruption in the processors before errors propagate.
IBM S/390 [Sle+99] and IBM zSeries [BS04] also rely on modular redundancy of components
and built-in checks to detect malfunction of hardware as fast as possible. Such systems pro-
vide a high level of reliability. Nevertheless, Web-scale services are typically deployed on data
centers built with commodity hardware [BH09]. Hardware fault-tolerant computers are expen-
sive and usually an order of magnitude slower than commodity hardware [Bar05]. Therefore,
we focus on commodity hardware in this work.
Challenge 1 (Comprehensive fault model) When devising hardening techniques for com-
modity hardware, the underlying fault model has to consider state corruption caused by faults
in memory elements, e.g., in DRAM modules, as well as by faults in the computation, e.g.,
in arithmetic and logic units (ALUs).
1.2.2 BYZANTINE FAULT TOLERANCE
In the 1970s, NASA funded the design of a fault-tolerant computer for aircraft control using
redundancy [Wen+78]. One of the major concerns in the project was dealing with value failures
caused by hardware faults of any nature. This concern led to the definition of the now well-
known Byzantine-failure model [LSP82]. The Byzantine failure model assumes the existence
of a powerful adversary controlling faulty processes and performing arbitrary and possibly
malicious fault transitions. A system that tolerates Byzantine failures can also withstand
state corruption caused by hardware faults, since these are milder, non-malicious, random
fault transitions. Byzantine fault tolerance has been successfully applied in the safety-critical
industry, e.g., as part of the Boeing 777 primary computing infrastructure [Yeh96].
Byzantine-fault-tolerant (BFT) algorithms also incorporate features orthogonal to state corrup-
tion resilience, namely, tolerance against intrusions and bugs [CL99]. Despite this broad range
of tolerated faults and the extensive body of recent work in the area [Kot+07; Ami+08; Cle+09;
Ver+13; Kap+12a], BFT systems are seldom employed outside of the safety-critical domain
because they incur costs that are often hard to justify [SJR07]. We identify the following costs
associated with the development and deployment of BFT systems:
• BFT algorithms are harder to understand and to prove correct than their crash-tolerant
counterparts [Coa88; NT88], which increases both the cost of implementing these algorithms
in practice and the reluctance to do so.
• BFT algorithms often require more hardware resources, incurring higher ownership and
maintenance costs. For instance, a replicated service requires at least 2f + 1 replicas
to tolerate f crash failures, but at least 3f + 1 replicas to tolerate f Byzantine process
failures [CL99].
• A higher number of replicas leads to higher failure frequencies, which can in turn lead
to lower availability than in crash-tolerant systems [Pow92].
• Alternatively, trusted hardware components can be introduced in the machines to reduce
the number of required replicas [Chu+07; Lev+09; Ver+13; Kap+12a]. The additional
custom hardware components incur, however, additional costs. Moreover, their trust is
conditioned on the correct behavior of the hardware – which in turn is assumed to hold
in the aforementioned references.
• To indeed tolerate malicious attacks and software bugs, BFT replicas have to be suf-
ficiently diverse, so that a successful attack on one replica does not imply successful
attacks on more than f replicas. Achieving diversity is, however, costly. Diverse replicas
have to be generated by N-version programming [KL86; CA95] or by automated software
diversity, which is an active area of research [Hom+13]. Moreover, the diverse replicas
have to run on top of a careful selection of diverse software stacks [Gar+11]. Finally,
these diverse replicas running on diverse stacks have to be maintained.
• In practice, tolerance against malicious attacks is often redundant or overkill [Bha+10],
for instance, in data centers protected by firewalls. Tolerance to bugs is also an orthog-
onal concern and an active area of research [Süß10].
• Finally, a large class of systems, e.g., caches such as memcached and DNS resolvers,
requires only integrity (safety) in the presence of state corruption; availability is not
strictly necessary. In the Byzantine model, providing fault tolerance or just safety
comes at a very similar cost; see, for example, the Nysiad system, which achieves safety
through replication [Ho+08] over multiple physical machines. The Thema system also
shows the additional complexity of using a fault-tolerance-only approach in Byzantine-
tolerant three-tiered web systems [Mer+05].
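The replica bounds quoted in the list above (2f + 1 for crash failures, 3f + 1 for Byzantine failures [CL99]) can be tabulated to make the hardware overhead concrete. The following Python sketch is only an illustration; the function names are hypothetical.

```python
def min_replicas_crash(f: int) -> int:
    """Minimum replicas of a replicated service tolerating f crash failures."""
    return 2 * f + 1


def min_replicas_byzantine(f: int) -> int:
    """Minimum replicas of a replicated service tolerating f Byzantine failures."""
    return 3 * f + 1


for f in range(1, 4):
    crash = min_replicas_crash(f)
    byzantine = min_replicas_byzantine(f)
    # The gap between the two models grows by one machine per tolerated failure.
    print(f"f={f}: crash={crash}, Byzantine={byzantine}, extra={byzantine - crash}")
```

Without trusted components, the Byzantine model thus costs f additional replicas, each of which has to be purchased, operated, and kept diverse.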
The hardening techniques proposed in this work handle neither malicious attacks nor software
bugs; other techniques have to handle them. Given the difficulties in implementing and
deploying Byzantine-fault-tolerant systems, we identify the following challenges to be tackled.
Challenge 2 (Local transformation) A hardening technique should be local, i.e., it should not
require replica machines as Nysiad does [Ho+08]. Moreover, a hardening technique should ideally
not modify existing distributed algorithms in substantial ways or introduce complexity to them.
Challenge 3 (No dependency on trusted components) A hardening technique should not
rely on trusted components such as TrInc [Lev+09], i.e., it should not assume the existence of
trusted hardware components. Still, a hardening technique might leverage hardware compo-
nents that are typical in commodity hardware – e.g., commodity servers are often equipped
with ECC memory.
1.2.3 APPLICATION-SPECIFIC DETECTION
Recent work on Mesa, the data warehousing system from Google, reports that state corrup-
tion in the computation, e.g., in the CPUs, is common at scale [Gup+14]. Therefore, Mesa
uses dedicated application-level techniques to detect state corruption such as checksumming,
assertion checks, as well as offline cross-replica consistency checks. Accurately assessing
the application-specific impact of state corruptions is a laborious process – see Li et al. [Li+10]
for example – that most production systems do not undergo. Using application-specific solutions
like the ones used in Mesa places the burden of guaranteeing state integrity on application
developers. Developers often place error detection checks in their code, but the actual coverage
of these checks is unclear. In this work, we argue that preventing the propagation of state
corruption, including corruption caused in the computation, should be a first-class require-
ment for large-scale fault-tolerant distributed systems. Instead of using ad hoc, application-
specific checks to detect hardware errors, we advocate hardening approaches that enable
crash-tolerant protocols to handle state corruption with minimal developer intervention.
Challenge 4 (Systematic approach) Without a principled approach, it is possible to overlook
relevant corruption scenarios, leaving systems prone to outages due to state corruptions.
Approaches that automatically or semi-automatically protect the system are less error-prone
and reduce the burden on the developer.
1.2.4 SOFTWARE-BASED ERROR DETECTION
Research on software-based error detection is vast – see Reis et al. [Rei+05a] and Schif-
fel [Sch11] for extensive surveys. A straightforward approach, called redundant multithreading
(RMT), is to use replicated threads or processes to detect errors in a similar way as high-end
mainframes do with lockstep processors [WJB06]. In a nutshell, the RMT approach executes
the same program in multiple threads or processes and compares their outputs, aborting in
case of a disagreement. Reinhardt and Mukherjee [RM00] first proposed RMT as a hard-
ware technique. Shye et al. [Shy+07] showed how to implement RMT in software as a set
of redundant processes. Their approach, called process-level redundancy (PLR), relies on bi-
nary instrumentation to dynamically create redundant processes and intercept system calls.
Wang et al. [Wan+07] implemented a similar idea with a compiler transformation, which deter-
ministically replicates the thread execution. More recently, Döbel et al. [DHE12] implemented
the approach directly in a microkernel, freeing the programmer from the burden of using bi-
nary instrumentation or compilers altogether – with the drawback of having to use the specific
microkernel.
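The compare-and-abort principle behind RMT can be illustrated with a minimal sketch. It is not the implementation of any of the cited systems; the names run_redundantly, make_faulty, and checksum are hypothetical, and the transient fault is merely simulated by corrupting the output of one invocation.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor


class ReplicaDisagreement(Exception):
    """Raised when redundant executions produce different outputs."""


def run_redundantly(task, arg, replicas=2):
    """Run task(arg) in several threads and compare the outputs
    before externalizing any of them."""
    with ThreadPoolExecutor(max_workers=replicas) as pool:
        futures = [pool.submit(task, arg) for _ in range(replicas)]
        outputs = [future.result() for future in futures]
    if any(output != outputs[0] for output in outputs[1:]):
        raise ReplicaDisagreement(outputs)  # abort instead of emitting output
    return outputs[0]


def checksum(data):
    """Hypothetical deterministic computation to be protected."""
    return sum(data) % 251


def make_faulty(task):
    """Simulate a transient fault: corrupt the output of the second call only."""
    calls = itertools.count()

    def wrapper(arg):
        result = task(arg)
        return result ^ 1 if next(calls) == 1 else result

    return wrapper
```

Calling run_redundantly(make_faulty(checksum), [1, 2, 3]) raises ReplicaDisagreement, because exactly one of the two replicas returns a corrupted value.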
Some software-level approaches employ self-checks instead of redundant threads. Special
compilers transform the original source code or binary adding state redundancy and consis-
tency checks. These approaches can be divided into instruction duplication [OSM02b; Rei+05a;
RCA07; Fen+10] and AN codes [RCA07; FSS09] (a class of arithmetic error codes [Avi71]). The
former duplicates the number of registers and/or memory locations and performs the same
computation twice in each of the copies, comparing the results before externalizing them. The
latter transforms the program operations and state in such a way that the values in registers
and (possibly) in memory locations are kept encoded while being operated; a program detects
state corruption when it identifies incorrectly encoded values.
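As a minimal sketch of the AN-code idea, assume that every integer x is stored as the code word A·x for a fixed constant A, so that valid code words are exactly the multiples of A. The constant below is an arbitrary odd number chosen for illustration, and the helper names are hypothetical; real AN-encoding compilers transform many more operations than addition.

```python
A = 58659  # illustrative odd constant (assumption); valid code words are multiples of A


def encode(x):
    """Encode a plain integer as a code word."""
    return x * A


def check(code_word):
    """Detect state corruption: a valid code word is divisible by A."""
    if code_word % A != 0:
        raise ValueError("incorrectly encoded value detected")


def decode(code_word):
    """Check the code word and return the plain integer."""
    check(code_word)
    return code_word // A


def add(x_c, y_c):
    """Addition works directly on code words: A*x + A*y = A*(x + y)."""
    return x_c + y_c


total = add(encode(3), encode(4))
corrupted = total ^ (1 << 5)  # simulated single bit flip in the encoded value
```

Here decode(total) yields 7, whereas decode(corrupted) raises an error: a single bit flip changes the value by a power of two, which is never a multiple of an odd A greater than 1.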
LACK OF END-TO-END GUARANTEES
Irrespective of whether they employ self-checks or redundant threads, most error-detection
approaches fail to provide end-to-end guarantees. Since they focus on single-machine systems,
these approaches strive to provide fail-stop behavior, i.e., a faulty process can fail only
by silently crashing. As we show in our experiments (Section 4.5), faulty processes do not
necessarily fail silently, even when hardened. The “sphere of replication” [RM00] of these
approaches includes only the processor [Rei+05a; RCA07; Fen+10] or the processor and the
main memory [OSM02b], but does not include the network. Consequently, when a message
is sent it does not carry any certification that the process correctly generated the message.
If a fault occurs after the process generates a message, but before the message is sent,
that corrupt message cannot be identified as such by the receiving process and may disrupt
the distributed system. One might mitigate the problem by appending a checksum or cyclic
redundancy check (CRC) [War12] to the message, but that can again fail if the fault occurs
between the generation of the message and its CRC calculation.
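The window of vulnerability can be made concrete with a small sketch that uses CRC-32 from Python's zlib module. The message format and the helper names seal, is_valid, and flip_bit are assumptions made for illustration only.

```python
import zlib


def seal(payload: bytes) -> bytes:
    """Append a CRC-32 over the payload (4 bytes, big-endian)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def is_valid(message: bytes) -> bool:
    """Receiver-side check: recompute the CRC and compare."""
    payload, crc = message[:-4], message[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == crc


def flip_bit(data: bytes, bit: int) -> bytes:
    """Simulate a fault that flips one bit of the data."""
    corrupted = bytearray(data)
    corrupted[bit // 8] ^= 1 << (bit % 8)
    return bytes(corrupted)


payload = b"commit txn=42"

# Fault after the CRC calculation: the receiver detects the corruption.
late_fault = flip_bit(seal(payload), 3)

# Fault between message generation and CRC calculation: the CRC is
# computed over the already corrupt payload, so the check passes.
early_fault = seal(flip_bit(payload, 3))
```

In this sketch is_valid(late_fault) is false, whereas is_valid(early_fault) is true although the payload differs from the one the process generated.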
Challenge 5 (End-to-end hardening) Hardening has to protect the system against state cor-
ruption in an end-to-end fashion, allowing the error detection to be performed remotely. Hard-
ening has to detect state corruptions due to faults in the network or due to faults in the
processes; occurring before, while, or after the calculation of any error-detection code sent
along the messages.
Only a few hardening approaches provide end-to-end guarantees – we call them end-to-end
hardening approaches. For example, prior work on AN codes [FSS09; Ulb+12] advertises en
passant that if messages are kept encoded while being sent in the network, the error detection
is end-to-end. PASC [Cor+12b] guarantees end-to-end error detection by design with a scheme
similar to instruction duplication as long as replica messages are sent along with the original
messages. Our work is motivated and inspired by these approaches.
IMPRECISE FAULT MODELS
Another limitation of most existing techniques is that they are built on top of imprecise fault
models, which fail to define either the frequency with which faults occur, or the extent of the
faults, or both. Consider the following definition of a single event upset (SEU) fault model by
Shye et al. [Shy+07]:
“We assume the single event upset (SEU) fault model in which a single transient
fault occurs at a time.”
The statement is imprecise because it might mean that a single fault occurs during a program
execution, or during the processing of a message, or in a 10 ms time window, etc. Moreover, a
“single fault” could be a fault corrupting a single bit in a memory element, or corrupting a single
machine word, or a single program variable, etc. Likewise, the AN-encoding literature [FSS09;
Ulb+12; Wam+13] builds upon the rather imprecise error model by Forin [For89].
RESTRICTIVE FAULT MODELS
Reis et al. define their SEU fault model as the model in which “at most one bit can be flipped
once during a program’s execution” [RCA07]. This version of the SEU fault model is indeed
more precise, and several works are based on it [Rei+05b; RCA07; YGS09]. As we now argue,
however, this fault model is very restrictive in frequency and extent.
Frequency. The assumption of a single fault per execution is common in the literature [RM00;
Rei+05a; Sag+05; RCA07; Shy+07; Per+07; Wan+07; DHE12]. The reasoning behind the
assumption is that the probability of concurrent particle strikes is small [Sag+05]. Notwith-
standing, recent studies report that hard errors, i.e., errors caused by permanent faults, tend
to dominate the observations of memory errors in large installations [HSS12]. Hard errors
can be seen as several faults of the same bits or words in one execution. Furthermore, dis-
tributed systems, which are the context of this work, tend to run for long time periods – days,
months, or longer – without being restarted. The assumption of a single fault per execution
can, therefore, result in low coverage solutions.
Extent. Although evidence supports single-bit errors as a realistic representation of faults
occurring in memory elements [Li+07; Li+10], faults in combinational logic do not necessarily
result in single-bit errors. Saggese et al. performed gate-level fault injection in DLX and Alpha
microprocessors and measured a non-negligible number of multibit errors in the sequential
elements caused by faults in the combinational logic [Sag+05]. We illustrate this problem
with a simple example: Registers rs and rt are to be multiplied, and the result is to be stored
in register rd. Register rs holds the integer 3, whereas rt holds 2. Before reaching the hardware
multiplier in the ALU, the least significant bit of register rt is inverted by a fault; hence, the
multiplier interprets the value in rt as the integer 3. The result in register rd is 9 instead of 6.
This glitch in the input of the ALU corrupts four bits of the register rd – since 9 is 1001 in binary,
and 6 is 0110. The likelihood of multibit errors also increases as the dimensions and operating
voltage levels of the logic circuits shrink [MZM10]. Therefore, single-bit errors could, in
practice, turn out to be a poor model for state corruptions caused by faults in the computation.
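The arithmetic of the multiplier example can be replayed in a few lines. The variable names mirror the registers in the text; the fault is simulated by an XOR on the multiplier input.

```python
rs, rt = 3, 2

# A fault inverts the least significant bit of rt before it reaches
# the multiplier: 0b10 becomes 0b11.
faulty_rt = rt ^ 0b01

correct_rd = rs * rt        # 6 = 0b0110
faulty_rd = rs * faulty_rt  # 9 = 0b1001

# Number of bit positions in which the two results differ.
corrupted_bits = bin(correct_rd ^ faulty_rd).count("1")
```

A single-bit error in the input thus surfaces as a four-bit error in the output register, which a single-bit fault model for rd would not cover.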
A fault-tolerant system might fail unexpectedly if the assumed fault model does not cor-
respond to the reality, possibly resulting in catastrophic consequences. Consider another
example. RMT only works if threads/processes have an independent probability of failing.
Since threads run in a single server, they obviously share hardware resources; and when shar-
ing resources, they also share errors. Therefore, hardware faults might lead to common-mode
failures. If one or more faults can corrupt the state of multiple replica processes, the corrupt
processes might produce the same incorrect outputs. That can occur even if the faults corrupt
the state of the replicas in different ways. For instance, consider a conditional branch that is
taken only if a variable x is different from zero. If x is corrupted to a value different from zero
in multiple replicas, then all replicas take the branch even if the corrupt values of x are differ-
ent among the replicas. Note that a similar argument can be made for instruction-duplication
approaches.
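The branch example can be sketched as follows. The handler and the corrupt values are hypothetical; the point is only that output comparison cannot distinguish replicas whose states were corrupted differently but lead to the same control flow.

```python
def handler(x):
    """The branch is taken only if x is different from zero."""
    return "branch taken" if x != 0 else "branch skipped"


correct_x = 0
corrupt_values = [7, 1024]  # each replica holds a different corrupt x

outputs = [handler(x) for x in corrupt_values]

# Output comparison, as done by RMT, sees no disagreement ...
replicas_agree = len(set(outputs)) == 1
# ... although the common output deviates from the correct one.
output_is_wrong = outputs[0] != handler(correct_x)
```

Both flags evaluate to true, i.e., the replicas agree on an incorrect output and the error goes undetected.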
Since it is difficult to precisely determine what part of a process state can be corrupted by
a hardware malfunction, a weaker model is easier to generalize over different applications and
platforms. A model is weak when it makes weak assumptions about faults, allowing more
behaviors of the system, and with that covering more scenarios. One example of such a
model is the ASC model, first proposed by Correia et al. [Cor+12b].
Challenge 6 (Precise and general fault model) The fault model has to be precise in defining
the extent and the frequency of faults. The fault model itself should not be too restrictive to
avoid low coverage solutions.
INFORMAL ALGORITHMS
Imprecise and too restrictive fault models are only part of the problem. Perry et al. [Per+07]
point out that techniques to detect hardware faults are usually presented without any rigorous
proof that they indeed guarantee some particular reliability property at all. We believe the lack
of formalism is a severe limitation of existing work, and should be tackled if hardening is to
form a real alternative to Byzantine fault tolerance.
Challenge 7 (Formal guarantees) To understand which scenarios hardening techniques can
capture and which they cannot, a general and precise fault model has to be devised. Hardening
techniques have to be proven correct on top of this model. The “power” of a hardening
technique is then expressed by the coverage of its fault assumptions in practice.
Two exceptions are the aforementioned work by Perry et al. [Per+07] and the work by
Correia et al. [Cor+12b].
Perry et al. define a fault-tolerant assembly language called TALFT. If a program is written
in this language, then the program can provably detect state corruptions within their fault
model. The fault detection is guaranteed with instruction duplication similar to SWIFT [Rei+05a].
The TALFT approach is limited to the SEU fault model and requires trusted hardware components
to check replicated values before writing them into memory.
Correia et al. set out to prove the correctness of another hardening technique called
PASC. The work of Correia et al. tackles several of the challenges identified here. Their work
includes a precise fault model, a hardening algorithm, and an implementation. The ASC fault
model is comprehensive: it captures the notion of intermittent errors and that whenever an
error occurs across a computer system, the whole state of a process can transition to an
arbitrary value. The PASC algorithm is local (only redundancy inside a single process is used)
and untrusted (it assumes that faults may occur while executing the hardening code). The
PASC implementation also mitigates the burden of hardening systems by automating it when
applications follow an assumed structure. We base part of our work on the ASC fault model
and PASC algorithm, but we improve them in several dimensions, as discussed in Section 1.3.
OVERHEAD
Hardening approaches do not require additional machines as BFT systems do, but they do
require redundancy in one form or another. In general, this redundancy is overhead in time or
space, i.e., in CPU cycles or memory. We now compare the overhead of approaches that do
not provide end-to-end guarantees against those that provide these guarantees.
PLR [Shy+07] duplicates (or triplicates) the number of processes in a program. If the ma-
chine has multiple processors, the average performance overhead (in execution time) is 16.9%
for the SPEC2000 benchmark. Nevertheless, the combined number of CPU cycles from the
duplicated processes is at least twice as much as in the unprotected program – which is not
duplicated. The wasted cycles cannot be harvested in multithreaded applications, limiting
the possible throughput. Due to instruction-level parallelism in modern CPUs, instruction-
duplication approaches such as EDDI [OSM02b] and SWIFT [Rei+05a] incur less than 100%
overhead even though they do not offload the duplicated instructions to another processor.
Reis et al. [Rei+05a] report, on average, an overhead of 62% for EDDI and 41% for SWIFT.
EDDI incurs a memory overhead of 100% since it duplicates all memory locations
used in the program. SWIFT, in contrast, leverages ECC protection, mostly eliminating
the memory overhead. The overhead of instruction-duplication approaches can be further re-
duced using compiler techniques to identify vital checks [YGS09] or by relying on hardware
extensions [WP05].
In contrast, existing end-to-end hardening approaches still incur substantial memory overhead.
AN-encoding extends the types of variables, e.g., from 32 bits to 64 bits, introducing
at least 100% overhead [FSS09]. Similarly, PASC [Cor+12b] requires the state of the process
to be duplicated in the main memory.
The execution overhead of end-to-end hardening is also substantial. AN-encoding often
makes operations heavyweight when encoded, causing non-trivial applications
to slow down by a factor of more than 60 or even 250 [Sch+10a]. The execution
overhead is also inherent to PASC. PASC executes all instructions twice, as EDDI and SWIFT do,
but in a different manner. The duplication occurs at a higher level; the hardened program
first executes all instructions of an event handler on one copy of the state and then executes
all instructions again on another copy. The resulting overhead is at least 2 times within
event handlers. Unfortunately, PASC does not support multithreaded applications, which is an
important performance limitation. Note that the execution overhead of end-to-end hardening
techniques is mitigated once the delays intrinsic to networking are considered.
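The handler-level duplication described above can be illustrated with a rough sketch. It is not the PASC algorithm, which includes further checks and protects the hardening code itself; the function names and the dictionary-based state are assumptions made for illustration.

```python
import copy


def run_handler_twice(handler, state, replica, message):
    """Execute the whole event handler on one copy of the state, then
    again on the other copy, and compare outputs and resulting states."""
    outputs_first = handler(state, message)
    outputs_second = handler(replica, message)
    if outputs_first != outputs_second or state != replica:
        raise RuntimeError("state corruption detected")
    return outputs_first


def count_handler(state, message):
    """Hypothetical event handler: accumulate a counter and acknowledge."""
    state["count"] += message
    return [("ack", state["count"])]


state = {"count": 0}
replica = copy.deepcopy(state)
first_outputs = run_handler_twice(count_handler, state, replica, 3)
```

Since every handler runs twice, the execution time within handlers at least doubles, which matches the factor stated above. If one copy is corrupted between two events, e.g., by replica["count"] ^= 4, the next call to run_handler_twice raises an error instead of returning output messages.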
The following argument could justify the overhead of end-to-end hardening: In contrast to
crash failures, which are already handled in practice by fault-tolerant systems, state corruptions
can have catastrophic consequences in distributed systems. In the set of techniques that can
handle state corruptions, BFT is the only alternative to end-to-end hardening, but the costs
associated with BFT are prohibitively high. The computer memory price has been steadily
decreasing over the last 50 years⁵, mitigating the memory overhead problem of end-to-end
hardening. Moreover, servers in data centers are often underutilized and have spare CPU
cycles [BH07]. The spare CPU cycles could be traded to achieve higher fault coverage using
end-to-end hardening combined with existing crash fault tolerance.
It is a fact, however, that state corruptions are observed less often than crash failures and
message omissions. Therefore, despite the above argument, we believe that a practical solution
to tolerate state corruption should incur as little overhead as possible on top of the already
existing overhead to tolerate benign failures.
Challenge 8 (Acceptable overhead) A hardening technique that handles a large number of
faults has little value if it hampers performance below acceptable margins or introduces a
prohibitive memory overhead.
1.3 CONTRIBUTIONS AND ROAD MAP
With this dissertation, we contribute theoretically and practically to the problem of hardening
distributed systems against state corruption:
Theoretical perspective: We show how to comprehensively and precisely model faults; and
how to prove local, untrusted, end-to-end hardening techniques correct.
Practical perspective: We show that it is possible to harden distributed systems against
state corruption with an acceptable performance and resource penalty and with little
development effort.
We now relate the collected challenges to our contributions and give the road map for the
rest of this work.
Chapter 2: We define our distributed system model and formalize the central property of
this work: error isolation. Error isolation states that corrupt messages do not propagate
errors between processes of the system. The property tackles the end-to-end challenge
(Challenge 5). We then show that if error isolation holds, crash-tolerant distributed systems
can also tolerate arbitrary faults.
The theoretical goal of this dissertation is to create a mathematical framework where fault
assumptions can be precisely stated, and algorithms can be modeled and proved correct. We
formalize error isolation in terms of local properties (Challenge 2). Next, we define the ASC
fault model, a comprehensive and precise fault model (Challenges 1 and 6).
The ASC fault model defined in this work is a generalization of the ASC fault model by Cor-
reia et al. [Cor+12b]. The new ASC fault model can represent a wide range of arbitrary process
faults at a high level, being general enough to represent any Byzantine failure. Local end-to-end
hardening is impossible when a system is subject to Byzantine failures; hence, we restrict the
ASC faults with a property called fault diversity, which embodies the intuition that hardware
faults are non-malicious. We then propose three different techniques to harden distributed
systems against state corruption, which are local (Challenge 2), end-to-end (Challenge 5) and
untrusted (Challenge 3). The hardening techniques form the three following chapters.
⁵ See http://hblok.net/blog/storage for a list of memory prices over the years.
Chapter 3: We employ an existing error-detection technique, namely, AN-encoding to achieve
error isolation. We formalize and prove AN-encoding correct on top of our model (Challenge 7).
We are not aware of similar proofs for AN-encoding techniques. Next, we implement a frame-
work to help the development of distributed systems. With our framework, a developer
can first implement benign-fault-tolerant distributed systems and then automatically harden
them with AN-encoding (Challenge 4). We evaluate the performance and coverage of our
implementation with two use cases: the Highly-available Leader Election Service by Fetzer
and Cristian [FC99a]; and the Paxos algorithm by Lamport [Lam98].
Chapter 4: We design and prove correct a new algorithm to harden distributed systems, the
Scalable Error Isolation (SEI) algorithm (Challenge 7). Although SEI is inspired by PASC, it offers
several advantages. First, it can leverage hardware-level memory integrity checks (e.g., ECC) or
software-level error detection to reduce or even eliminate memory overhead – PASC introduces
100% memory overhead (Challenge 8). Second, SEI supports multithreaded applications –
PASC can only execute single-threaded applications (Challenge 8). Third, SEI requires minimal
effort to harden existing systems by employing compiler-based markers (Challenge 4). To
illustrate that, we harden two existing code-bases: a DNS resolver and memcached.
Chapter 5: We trade generality for performance (Challenge 8) in the design of HardPaxos, an
atomic broadcast protocol based on Paxos [Lam98] that has a part of the algorithm factored out
into a “trusted module” called HardCore. As long as HardCore does not fail arbitrarily, we show
that any replicated service built on top of the atomic broadcast library can tolerate state
corruptions in addition to crashes – the basic Paxos only provides crash-tolerance. We then
again employ AN-encoding to make HardCore trustworthy. The solution allows any service to
tolerate crashes and state corruptions, while incurring minimal overhead.
Chapter 6: We conclude this work with a discussion of the results as well as suggestions
for future work.
2 ERROR ISOLATION WITH
HARDENING∗
∗This chapter expands and builds upon the content of “Scalable error isolation for distributed systems”, presented
at NSDI ’15 [Beh+15a], and the companion technical report [Beh+15b]. Parts of Sections 2.1 and 2.2 first
appeared at LADC ’13 [BWF13].
Error isolation is a central concept of this dissertation. In Section 2.1, we define error isolation
as the property in which corrupt messages do not propagate errors between processes of the
system, i.e., corrupt messages might be sent in the system, but they are always discarded
by correct receiver processes. Error isolation can be achieved with traditional techniques such
as Byzantine-fault-tolerant algorithms. Nevertheless, we are interested in techniques that
achieve error isolation locally, i.e., exchanging no additional messages among processes. In
Section 2.2.3, we define hardening as a transformation of a program p into a program p_h, such
that if all processes of the system run such a program p_h, then error isolation is guaranteed
locally. We then show that if error isolation holds, crash-tolerant distributed systems can also
tolerate arbitrary faults in environments free of malicious attacks and bugs.
To design hardening techniques, we have to first model the effect and extent of arbitrary
faults inside processes. In Section 2.3, we introduce a new fault model upon which the
hardening techniques of the following chapters are constructed: the arbitrary state corruption
(ASC) fault model. Correia et al. first proposed the ASC fault model as the underlying fault
model for the PASC algorithm [Cor+12b]. In contrast to their ASC fault model, our version
of the model is (1) more precise by using a cleaner notation and structure; (2) more general
by having one main assumption, namely, fault diversity, and (3) decoupled from any specific
hardening algorithm such as PASC. To show the expressiveness of our ASC fault model, we
sketch an ASC-based formalization of the error model used in the AN-encoding literature.
2.1 BACKGROUND
We start by introducing the system model and defining several terms that are used throughout
this dissertation. This section motivates and sets the basis for the formalism we present
starting in Section 2.3.
2.1.1 SYSTEM MODEL
A distributed system is a set of processes Π = {π_1, ..., π_n}, with n > 1, that communicate
via message-passing over a network. The processes and the network are the components of
the distributed system. In the literature, the network is often modeled as a set of channel
components [CGR11]. In this work, such a fine granularity is not necessary, so we model the
network as a single component. In fact, except for the fault model and fault assumptions, our
system model is essentially the same as in the FLP paper [FLP85]: we assume no bound on
the transmission delay or process relative speed. We use in some of our definitions the time
of a wall clock not accessible by the system.
Each component of the system is modeled as a state machine, which consists of state
variables and state transitions. A state is an assignment of values to the state variables. The
system state – also called configuration in the literature [FLP85; Gär99] – encompasses the
state of all system components, i.e., processes and network. An execution – also called be-
havior [LM94; Lam02] or trace [War10] – is any infinite sequence of system states. A system
execution is an execution given by interleaved state transitions of the system components,
starting from some initial system state. Finally, a property is defined as a set of execu-
tions [War10].
There are two types of components in a system. The network component is simply a
multiset into which and from which processes send and receive messages. The processes
are precisely defined in Section 2.3.1. In a nutshell, a process π executes the following
state transitions in a loop. First, π receives a message from the network, removing it from
the network multiset and storing it in its input buffer. Next, π handles the input message by
performing some computation based on its state and the message, modifying its internal state,
and (possibly) generating one or more output messages, which are placed in π’s output buffer.
Finally, π sends any output message to the network, by removing the message from the output
buffer and writing it into the network multiset. Note that the details of message receive and
send are not relevant for our work; hence, we model them as single state transitions. The
handling of a message is, however, represented as a series of state transitions.
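The receive, handle, and send loop can be sketched with the network modeled as a multiset, here a collections.Counter. The message format (destination, value), the class name Process, and the summing behavior are assumptions made for illustration. For brevity the handler is a single call, whereas the model above represents message handling as a series of state transitions.

```python
from collections import Counter, deque


class Process:
    """Hypothetical process: sums received integers and forwards the total."""

    def __init__(self, pid, peer):
        self.pid, self.peer = pid, peer
        self.state = 0
        self.input_buffer = deque()
        self.output_buffer = deque()

    def receive(self, network):
        """Remove one message addressed to this process from the multiset
        and store it in the input buffer."""
        for message in list(network):
            if message[0] == self.pid:
                network[message] -= 1
                if network[message] == 0:
                    del network[message]
                self.input_buffer.append(message)
                return True
        return False

    def handle(self):
        """Update the internal state and generate an output message."""
        _, value = self.input_buffer.popleft()
        self.state += value
        self.output_buffer.append((self.peer, self.state))

    def send(self, network):
        """Move all output messages into the network multiset."""
        while self.output_buffer:
            network[self.output_buffer.popleft()] += 1


network = Counter({("p1", 5): 2})  # the same message is in flight twice
p1 = Process("p1", peer="p2")
while p1.receive(network):
    p1.handle()
    p1.send(network)
```

After the loop, the state of p1 is 10, and the network multiset holds the two messages ("p2", 5) and ("p2", 10) addressed to the peer.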
2.1.2 FAULTS, ERRORS, AND FAILURES
Intuitively, faults are defects or adversarial conditions that when activated can cause errors.
Errors may propagate and become externally visible as failures.
In more detail, faults, also called fault transitions or fault steps, are special state transitions
added to the components of the system [Cri85; LM94]. When a fault transition is taken in
an execution, an error occurs. Errors are abnormal system states. A component commits a
failure if an error becomes visible to other components of the system or to an external user.
In other words, a failure is a deviation from the component’s expected external behavior.
A component failure can manifest in different ways. A commission [KMMS03] or value
failure [Pow92] is an unexpected message or a message with unexpected content sent out
by a component. An omission failure is the absence of a message expected to be sent by
a component. More precisely, based on the current system state, an omniscient observer
would expect a message being sent by the component, but none is sent. In system models
that assume some form of synchrony, failures can also manifest in the time domain, being
called timing failures [Pow92] or performance failures [CF99]. Since we make no synchrony
assumptions, we do not consider timing/performance failures.
A system failure is a specification deviation at the system level, observed, for example, by
an external client. A system failure can be the result of one or more component failures.
FROM FAULTS TO FAILURES
Failures, faults and errors are often confused. As pointed out by Cristian [Cri91], “what one
person calls a failure, a second person calls a fault, and a third person might call an error”. One
reason for this confusion is that a failure of one component can be a fault to another compo-
nent. Moreover, a failure from a lower level of abstraction can be a fault to a higher level of
abstraction. Let us clarify these interactions by means of examples:
• A failure of a transistor, e.g., incorrect output voltage, is a fault to a gate (at a higher
level of abstraction).
• A failure of a gate, e.g., incorrect Boolean output, is a fault to a memory element (at the
same level of abstraction).
• A failure of a memory element, e.g., a bit flip, is a state corruption in the process (at a
higher level of abstraction); state corruption is an error in the process state.
• A failure of a process, e.g., corrupt message sent out, is a fault to another process (at
the same level of abstraction).
• A failure of a process, e.g., crash, is a fault to the distributed system (at a higher level).
If not stated otherwise, faults and failures always refer to the component level. Again,
components are for us the network and processes.
FAULT MODEL, FAULT ASSUMPTIONS, AND CORRECTNESS
A fault model is a set of fault transitions added to the components of the system. Since no
distributed algorithm can tolerate an unbounded number of component failures, one typically
assumes a limit to the fault extent or frequency, for example, at most f processes might crash
in an execution or at most one bit is corrupt in an execution. Under such a fault assumption, an
algorithm is correct if and only if every execution of the system satisfies the system expected
behavior, i.e., no execution results in a system failure.
ASSUMPTION COVERAGE
In practice, an implemented system might still fail even if its fault-tolerant algorithm is correct.
The coverage of an assumption is the probability of the assumption holding in practice given
that a failure occurred [Pow92]; for example, if the model assumes anything “bad” can happen,
then it has assumption coverage 1, i.e., 100%. In contrast, a model assuming the only
possible faults are crashes has a low coverage when the system it models is deployed in an
environment where radiation is high, e.g., space, because the system is likely to suffer from
state corruptions.
2.1.3 FAULT MODELS
To conclude this section, we introduce two fault models: the benign fault model and the
arbitrary fault model. They serve as baselines for our new fault model in Section 2.3.
BENIGN FAULT MODEL
We consider benign failures¹ to be those failures that are already covered by practical
fault-tolerant systems such as the target systems mentioned in the introduction. In our benign
fault model, sometimes called crash-stop model [Cor+12b], two sets of failures can occur.
First, processes might crash. Second, the network might lose, duplicate, reorder, or mis-
route messages. The most prevalent of these failures is the process crash. A process π
commits a crash failure by performing a crash fault transition into a halt state from which
no further transition of π is possible. From the halt state, process π can only commit omission
failures, sending no further messages out. The remaining failures are due to the network
component. Message omission, duplication, and reordering are well-known problems in
point-to-point communication, and are typically solved by retrying and piggybacking sequence
numbers to messages [CGR11]. Misrouting can also be easily detected if the sender appends
the process identifier of the destination process to each message it sends; and the receiver
checks whether it is indeed the right destination process of every message it receives.
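A receiver-side sketch of these checks is shown below. The message layout (sender, destination, sequence number, payload) and the class name Receiver are assumptions made for illustration. Message omissions are not shown; they would be handled by the sender retrying until the message is acknowledged.

```python
class Receiver:
    """Hypothetical receiver that masks misrouting, duplication, and reordering."""

    def __init__(self, pid):
        self.pid = pid
        self.next_seq = {}  # sender -> next expected sequence number
        self.pending = {}   # sender -> {sequence number: payload}

    def deliver(self, message):
        """Return the payloads that become deliverable, in sending order."""
        sender, destination, seq, payload = message
        if destination != self.pid:
            return []  # misrouted: this process is not the destination
        expected = self.next_seq.get(sender, 0)
        if seq < expected:
            return []  # duplicate of an already delivered message
        buffered = self.pending.setdefault(sender, {})
        buffered[seq] = payload  # hold out-of-order messages back
        delivered = []
        while expected in buffered:
            delivered.append(buffered.pop(expected))
            expected += 1
        self.next_seq[sender] = expected
        return delivered


receiver = Receiver("p2")
reordered = receiver.deliver(("p1", "p2", 1, "b"))  # arrives too early
misrouted = receiver.deliver(("p1", "p3", 0, "x"))  # wrong destination
in_order = receiver.deliver(("p1", "p2", 0, "a"))   # releases "a" and "b"
duplicate = receiver.deliver(("p1", "p2", 0, "a"))  # retransmission
```

Only the third call delivers anything, namely the payloads "a" and "b" in the order in which they were sent.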
In general, algorithms designed to tolerate benign failures are easier to reason about [Coa88;
NT88] and in many cases preferable to arbitrary-fault-tolerant algorithms [Cri91; Pow92]. In this
work, we assume all distributed algorithms in consideration can tolerate benign failures.
ARBITRARY FAULT MODEL
At the other extreme of the spectrum is the arbitrary fault model. The arbitrary fault model
extends components with arbitrary faults, which are fault transitions to any representable
state of a component, e.g., corruption of messages in the network, corruption of a variable in
a process, substitution of the entire operating system image, etc. Besides failing benignly, a
component might fail arbitrarily by committing a value failure, i.e., by sending corrupt messages
out. We define a corrupt message as follows.
A generation history for a message m is a sequence of messages that could be received
by a correct process in order to generate m.
¹ The term benign qualifies a failure (or fault) as one “of a mild character”, without attributing intention to it.
Definition 2.1 (Message precedence) Let π be a process. Let m_out and m′_out be two output
messages. Let m′_in be the input message π handled in order to produce m′_out. We say that
m_out precedes m′_out if and only if π sends m_out before it receives m′_in.
A process π receives the input message m when it starts the handler that takes m as input.
Definition 2.2 (Generation history of a message) Let π be a faulty process. Let h be a
subsequence of the sequence of correct messages received by π. Let m be a message sent
by π. The subsequence h is a generation history for m if there exists an execution in which
π is correct and π outputs m after receiving each message in h.
Note that a message can have multiple generation histories, corresponding to multiple exe-
cutions where the message might have been produced.
A correct message m is one that has a correct generation history h, that is, a generation
history guaranteeing that previous history is not lost. The inductive definition is as follows.
Definition 2.3 (Correct generation history of a message) Let π be a faulty single-threaded
process. Let m be a message sent by π.
• If π has sent no message m′ before m such that m′ has a correct generation history,
then all generation histories of m are correct for m.
• Else, for each output message m′ preceding m, let H be the set of correct generation
histories of m′. A generation history of m is correct if and only if it extends some
generation history in H.
This definition refers to a single-threaded process where the precedence relationship among
messages is a total order. We will consider the multithreaded case in Section 4.3.
Definition 2.4 (Correct message) Let π be a faulty process. A message m currently sent by
π is correct if and only if it has a correct generation history.
Definition 2.5 (Corrupt message) A message m is corrupt if and only if it is not correct.
This definition of correct message forces every correct output message to take into con-
sideration the effect of all correct output messages that have been processed previously and
became externally visible. Messages received together can be processed in any order, but
they must result in consistent histories.
Messages are always part of the state of some component. Hence, a corrupt message
can be equivalently defined by referring to the state representing the message inside the
component. If that state is corrupt, the message is corrupt; and vice versa. Such a definition
of corrupt message, however, requires a precise notion of state corruption, which we first
introduce in Section 2.3.
Beyond corrupt messages, arbitrary faults can manifest as omission failures [KMMS03] as
well, but we attribute omission failures to the network.
Remarks on the Byzantine fault model. From a failure perspective, the arbitrary fault model
and the Byzantine fault model [LSP82] are equivalent. Nevertheless, Byzantine faults often only
model the specification deviation, i.e., the value or omission failure, ignoring the actual internal
state corruption – see, for example, the definition by Castro and Liskov [CL99]. That is not a
problem, but a feature of the Byzantine fault model. One can abstract away the causes of a
Byzantine failure and simply model the Byzantine failures as fault transitions.
In this work, however, we are interested precisely in the arbitrary faults that become
Byzantine failures (i.e., arbitrary failures). We model an arbitrary-fault transition as an
arbitrary change of the internal state of a system component. (Normal) state transitions might
then propagate the caused error to finally become a Byzantine failure. This level of modeling is
particularly important to reason about the correctness of our hardening techniques because, as
we will define in the next section, the challenge in devising a hardening technique is in
detecting internal state corruptions before they become arbitrary failures.
Classification   Failure           Fault
correct          no                –
crashed          crash             crash transition
faulty           no                arbitrary transition
contaminated     no                corrupt message received and processed
failed           corrupt message   from faulty or contaminated
Table 2.1: Process classification according to failure and fault
2.2 PROBLEM DEFINITION
In an execution subject to arbitrary faults, an omniscient observer can classify a process π as
correct, crashed, faulty, contaminated, or failed. Table 2.1 relates this classification with the
faults and failures process π suffers and commits. In this work, it is sufficient to classify only
the processes; we do not classify the network component as faulty, crashed, etc.
Process π is correct if it neither commits a failure nor suffers a fault. Process π becomes
crashed if it fails by performing a crash transition. If process π fails by sending a corrupt
message, then π is classified as failed, independently of the fault that caused the failure. If
process π suffers a fault other than a crash transition, π does not have to immediately fail;
in fact, it may never fail. While process π is in an erroneous state, but not yet failed, π is
classified as faulty or contaminated depending on which type of fault π has suffered first. A
process π becomes faulty if π performs an arbitrary fault transition. A process π becomes
contaminated if π receives, processes, and modifies its state according to a corrupt message
sent by some other component – the other component being the network or another process.
2.2.1 ERROR ISOLATION
The central objective of this work is coping with failure propagation due to arbitrary faults in
distributed systems. We call failure propagation the contamination of a process as a result
of an arbitrary failure of another component. Unlike faulty processes, contaminated
processes are induced to transition into an erroneous state by an external fault source. One
cannot restrict by assumption how many processes might get contaminated in this way because
a contaminated process can in turn contaminate other correct processes, eventually disrupting
the whole system, as in the Amazon S3 case [Ama08a].
Processes of a fault-tolerant system do not propagate failures as long as the system guar-
antees error isolation, which is defined as follows.
Property 2.1 (Error isolation) A correct process π discards a received message m without
modifying its state according to m if m is corrupt.
Naming Property 2.1 as error isolation is a rather subjective choice. In contrast to terms such
as failure containment or fault isolation, error isolation implicitly indicates that approaches im-
plementing the property might work from inside the components, stopping errors before they
become failures. Moreover, error isolation has been used in the recent literature [Cor+12b].
[Figure 2.1 diagram: the set Ea (executions with arbitrary faults) contains the set Ei
(executions with arbitrary faults and error isolation, by assumption); a mapping µ relates Ei to
the set Eb (executions with benign faults).]
Figure 2.1: Benign-fault simulation with error isolation: If error isolation holds, executions
subject to arbitrary faults can be mapped into executions subject to benign faults.
A corollary of Property 2.1 is that, if error isolation holds, no correct process is ever con-
taminated.
Error isolation could be easily achieved by crashing correct processes. To rule out such
solutions, we also need this additional property.
Property 2.2 (Accuracy) A correct process π is never induced to crash or discard a correct
message.
2.2.2 BENIGN-FAULT SIMULATION WITH ERROR ISOLATION
We now argue that benign-fault-tolerant algorithms can be used without modifications to
tolerate arbitrary faults if error isolation holds. Technically, benign faults are simulated on
top of arbitrary faults [AW04]. A distinctive characteristic of our simulation is the assumption
of error isolation, i.e., we assume there exists some technique applied to the system that
guarantees the error isolation property always holds. Note that one goal of this work is to
design such techniques. In this section, we argue about a positive consequence of using
them: the possibility of benign-fault-tolerant algorithms also tolerating arbitrary faults.
Figure 2.1 shows three sets of executions, i.e., properties, of a given distributed system.
The circle on the left-hand side represents the set Ea of all possible executions of the system
under arbitrary faults (including fault-free executions). The circle on the right-hand side repre-
sents the set Eb of all possible executions under benign faults (including fault-free executions).
The property Ei is the subset of Ea for which error isolation holds. We show that a
straightforward mapping between Ei and Eb exists. Consequently, a system tolerant to benign faults
can also tolerate arbitrary faults if error isolation holds.
Theorem 2.1 Consider a distributed system that tolerates benign faults. The same system
tolerates arbitrary faults if error isolation holds.
Proof sketch: We informally show by construction that a function µ can map each execution
ea ∈ Ei into some execution eb ∈ Eb. The system consists of a set of processes Π and a
network component ξ:
• Network: If any message m in ξ is corrupt in ea, then µ removes m from ξ in eb.
• Processes: No correct process becomes contaminated by Property 2.1. Hence, pro-
cesses can only be correct, crashed, failed or faulty. If a correct process π ∈Π receives
a corrupt message m in ea, π does not change its state according to m since error iso-
lation holds. Therefore, such a receive step of π in ea is a stuttering step in eb – i.e.,
a step that does not change the state of π. If a process π ∈Π crashes, µ maps π as
crashed. If a process π ∈Π becomes faulty in execution ea, then µ maps π’s state as
if it were executing correctly in eb.
The mapping µ together with stuttering steps transforms corrupt messages in execution ea into
omission failures in execution eb. Moreover, duplication, reordering and misrouting failures
occurring in ea are also allowed in eb. □
Note that any fault assumption restricting the number of crashed processes in a benign-
fault-tolerant algorithm has to be reformulated to restrict the number of faulty and crashed
processes instead.
Theorem 2.1 has one important practical corollary. Remember that state-machine replication
is widely used in real-world systems to tolerate crash failures (see Chapter 1). By Theorem 2.1,
state-machine replication can tolerate arbitrary faults with only 2f + 1 processes if error
isolation holds – where f is the assumed bound on the number of faulty and crashed processes.
In contrast, the more general Byzantine-fault-tolerant solutions require at least 3f + 1 replicas
to tolerate arbitrary faults, assuming no trusted hardware components.
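The difference between the two bounds can be made concrete with a small calculation. The following Python sketch only evaluates the two formulas quoted above for a few values of f; the function names are illustrative and do not appear in the original text.

```python
def replicas_with_error_isolation(f: int) -> int:
    """Processes needed by crash-tolerant state-machine replication: 2f + 1."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return 2 * f + 1


def replicas_byzantine(f: int) -> int:
    """Replicas needed by Byzantine fault tolerance without trusted hardware: 3f + 1."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return 3 * f + 1


# Compare both bounds for small fault thresholds.
for f in (1, 2, 3):
    saved = replicas_byzantine(f) - replicas_with_error_isolation(f)
    print(f"f={f}: 2f+1={replicas_with_error_isolation(f)}, "
          f"3f+1={replicas_byzantine(f)}, difference={saved}")
```

The difference between the two bounds is exactly f replicas, so the saving grows linearly with the assumed number of faulty and crashed processes.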
Remarks on liveness and safety. Intuitively, a liveness property is a set of executions that
eventually satisfy some condition, e.g., a group of processes eventually take a common de-
cision on some action. A safety property is a set of executions that always guarantee some
invariant, e.g., Property 2.1.
Our work is mainly concerned with safety properties. As in traditional asynchronous models,
liveness properties require further assumptions such as a global stabilization time [DLS88].
Error isolation does not directly imply that liveness properties are guaranteed.
As in the Byzantine fault model, a faulty process could cause a Denial-of-Service and violate
liveness since it could (involuntarily) flood the network with nonsense messages. Assuming an
eventual bound on the communication delays between processes indirectly rules such scenar-
ios out. A more elegant assumption would be, however, that failed processes eventually crash
– as in the Mortal Byzantine model [Wid+07]. Such an assumption is justifiable if processes
abort once errors are detected and, especially, if they perform periodic rejuvenation [CL02].
Note that if safety properties of the algorithm rely on a clock, one has to additionally assume
that the clock is correct, i.e., it has a maximum drift rate. Techniques to build
Byzantine-fault-tolerant clocks [FC99b] can be used to enforce such an assumption.
2.2.3 HARDENING
Under the benign-failure model, error isolation is trivially guaranteed. A process follows its
specification until it crashes, sending no corrupt messages. In the arbitrary fault model, error
isolation is challenging to guarantee. Byzantine fault tolerance is one way to achieve error
isolation at the cost of additional hardware components [CL99; Chu+07; Lev+09]. In this
work, we exclusively focus on faults caused by hardware errors as opposed to malicious
adversaries or bugs. We argue that hardening can make the probability of failure propagation
negligible.²
To define hardening precisely, we first introduce the concept of message validity. Intu-
itively, an invalid message is a message that is corrupt (Definition 2.5), but this fact is triv-
ially detected by inspecting the message itself – Powell similarly defines noncode value er-
rors [Pow92]. This definition of message validity is based on the observation that often error-
detection codes, e.g., cyclic redundancy check (CRC), are used to protect messages during
transmission [War12]. The error-detection codes define a syntax of messages that can be
accepted as correct.
² Note that hardening cannot cope with bugs. Tolerance to bugs is also an orthogonal concern and
an active area of research [Süß10].
Definition 2.6 (Message validity) Let M be the set of all possible messages sent in the
system. Let CV ⊂ M define the set of messages that pass the acceptance test of a given
error-detection code. A message m is valid if and only if m ∈ CV; otherwise, m is invalid.
Note that invalid messages and corrupt messages are not the same. Valid and invalid
messages are statically defined for all executions with the set CV ⊂ M – any valid message
m is contained in CV, whereas any invalid message m is contained in M \ CV . Whether a
message is correct or corrupt, however, depends on the sequence of messages received by a
process (see Definition 2.5). If no fault occurs, a correctly-implemented error-detection code
never classifies correct messages as invalid.
Any corrupt but valid message is a potential threat to the system if the system is not de-
signed to cope with arbitrary failures. The goal of hardening is to make corrupt but valid
messages impossible in any execution of the system. In a hardened system, a correct pro-
cess can determine if a message is correct by directly inspecting the error-detection code
piggybacked in the message.
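The distinction between invalid and corrupt messages can be illustrated with a minimal sketch. It assumes CRC-32 (from Python's standard zlib module) as the error-detection code and a message layout of payload followed by a 4-byte code; the helper names encode and is_valid and the example payloads are hypothetical and serve only this illustration.

```python
import zlib

CODE_LEN = 4  # bytes occupied by the CRC-32 error-detection code (assumed layout)


def encode(payload: bytes) -> bytes:
    """Append the CRC-32 of the payload, producing a message in CV."""
    return payload + zlib.crc32(payload).to_bytes(CODE_LEN, "big")


def is_valid(message: bytes) -> bool:
    """Acceptance test of the error-detection code: is the message in CV?"""
    if len(message) < CODE_LEN:
        return False
    payload, code = message[:-CODE_LEN], message[-CODE_LEN:]
    return zlib.crc32(payload).to_bytes(CODE_LEN, "big") == code


# A message produced without faults passes the acceptance test.
correct = encode(b"balance=100")

# A bit flip after the code was computed (e.g., during transmission)
# is caught by the code: the message is invalid.
flipped = bytes([correct[0] ^ 0x01]) + correct[1:]

# A corruption that happens before the code is computed is not caught:
# the code matches the wrong payload, so the message is corrupt but valid.
corrupt_but_valid = encode(b"balance=900")

print(is_valid(correct), is_valid(flipped), is_valid(corrupt_but_valid))
```

The third message is the case hardening has to rule out: a plain checksum computed by a process whose state is already corrupt says nothing about the behavior of that process.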
HARDWARE ERRORS IN THE NETWORK
Using the concept of message validity, the following assumption rules out the arbitrary be-
havior of the network component that is not caused by hardware errors.
Assumption 2.1 (No spurious messages) If a valid message m is received by a process πr
at a time tr , then another process πs sent m at time ts ≤ tr .
In other words, Assumption 2.1 asserts that the network never creates valid messages as
if it were a process, except for duplicated messages. If the network component does create
a message, then the message is invalid.
Assumption 2.1 has a low assumption coverage in systems subject to malicious adversaries.
A hacker could break the error-detection code and insert corrupt but valid messages in the
compromised network component. However, if the system is never subject to attacks, which
is our focus in this work, arbitrary faults have a negligible probability of producing a valid
message. A message encompasses header fields, checksum, payload, etc. Creating a corrupt
but valid message spontaneously would require one or more highly improbable fault transitions
in the network component. Note that valid messages might still be duplicated, but duplicates
are tolerated since they are benign failures.
HARDWARE ERRORS IN THE PROCESSES
In the hardening problem, message validity should be more than a checksum protecting the
message against corruption by the network; it should represent evidence of good behavior
of the process sending the message. Intuitively, hardening is any software-only, process-
local technique that translates “good” and “bad” behavior of a process into the validity of
the messages. By software-only we mean a technique that requires no additional hardware
components or processes to work – although it might use additional hardware to minimize its
overhead. By process-local we mean a technique that requires no additional communication
between processes, i.e., the process alone decides whether it is behaving correctly or not
and translates this information as message validity.
In an environment with no spurious messages (Assumption 2.1), a hardening technique
should essentially achieve the following two properties in addition to accuracy (Property 2.2)
to guarantee error isolation.
Property 2.3 (Local error exposure) For any output message m of a faulty process π, if m
is corrupt, then m is invalid, i.e., m /∈ CV .
Property 2.4 (Local error filtering) For any message m received by a correct process π, if
m /∈ CV , then π discards m without changing its state.
Together these properties define the hardening problem.
Definition 2.7 (Hardening) A hardening technique transforms a native program p into a hard-
ened program ph such that a process π executing program ph guarantees local error exposure
(Property 2.3), local error filtering (Property 2.4), and accuracy (Property 2.2).
A hardened process is a process executing a hardened program. Definition 2.7 asserts that
a correct hardened process never changes its state according to any invalid message received.
More importantly, a faulty hardened process never sends out any corrupt but valid message.
Note that Property 2.3 constrains the behavior of faulty processes. That is, however, impos-
sible under arbitrary faults. Independent of how the hardening is implemented, the information
representing the detection of an error has to be present in some state variable or a combination
of state variables. One or multiple arbitrary faults could always erase this information, leaving
no trace behind. Any hardening technique, consequently, has to make further assumptions
about the format and/or frequency of the arbitrary faults. In later sections, we formulate these
additional assumptions and also experimentally evaluate their assumption coverage accord-
ing to the specific technique. For the sake of argument, we assume here such a hardening
technique (cf. Definition 2.7) exists.
The following theorem directly results from Assumption 2.1 and Definition 2.7.
Theorem 2.2 Let p be the native program running in a process π ∈Π of a given distributed
system. If every process π ∈Π executes a hardened program ph, then error isolation (Prop-
erty 2.1) holds.
Proof:
1. Every corrupt message is invalid.
1.1. Any corrupt message sent out by the network is invalid by Assumption 2.1.
1.2. Any corrupt message sent out by a hardened process π ∈Π is invalid by Property 2.3.
2. Error isolation (Property 2.1) holds.
2.1. Correct processes discard any invalid messages by Property 2.4.
2.2. Correct processes discard any corrupt messages by Steps 1 and 2.1.
□
2.3 MODELING PROCESS FAULTS
In this section, we precisely model arbitrary process faults and build the framework upon
which the hardening techniques presented in Chapters 3 and 4 are constructed and their fault
assumptions can be stated. Since our main goal is to prevent error propagation with techniques
local to processes, we have to model the faults occurring inside components, in particular,
inside processes – arbitrary faults occurring in the network are ruled out by Assumption 2.1.
We start by modeling the process execution, assuming processes work in a strict receive-
handle-send loop (Section 2.3.1). Next, we present our arbitrary state corruption (ASC) fault
model (Section 2.3.2). Since the fault model depends on the definition of the process model,
we often refer to both models together simply as the ASC model. Finally, we present the
fault-diversity assumption, a fault assumption common to all techniques presented in this work
(Section 2.3.3).
2.3.1 PROCESS MODEL
A process π is a deterministic state machine composed of state variables and a set of state
transitions.
PROCESS STATE
We model the state s of a process as an assignment of values from a domain D to the set of
variables V ; the variables V represent memory locations, registers and the program counter.
Note that the domain D is the same for all variables.
Definition 2.8 (State) Given a set of variables V used by a process π and a values domain
D, a state s of process π is a surjective function s : V → D.
We use the notation v to indicate a variable and s[v ] to indicate the value of the variable
at state s – we loosely follow the TLA+ language notation [Lam02], in which functions are
denoted with square brackets. Unless noted otherwise, s[v ] represents the current value of
variable v , whereas s′[v ] represents the value of v after the next state transition is performed.
For simplicity, we occasionally indicate v as a value when the state including v is clear from
the context. In particular, we indicate pc as the value of the program counter (i.e., s[pc]), and
pc’ the next value of the program counter (i.e., s′[pc]).
Program variables: A program p running on a process π does not have to use all variables in
V (see Definition 2.12 for the definition of program p). We define the program variables Vp ⊂ V
to be the set of all variables potentially used by any execution of process π running a program
p. We assume that Vp is a strict subset of all variables V , i.e., there are always variables that
are not used by p. This restriction facilitates the design of hardening mechanisms because it
allows us to reason about all variables of a process, including the hardening-specific variables,
before actually introducing the hardening technique.
Message buffer variables: Messages are represented with sets of variables in the state of
processes. The set of variables Vi ⊂ Vp represents the next input message to be handled,
whereas the set of variables Vo ⊂ Vp represents the next output messages to be sent. For
convenience, we assume that Vi ∩ Vo = { }, i.e., a process possesses an input buffer and an
output buffer.
HANDLER PROGRAMS AND HANDLE STEPS
The execution of a process π is a loop divided into three phases: message receipt, message
processing, and message sending. These phases are three types of steps the process can
take – note that a process may also take stuttering steps. To model the message processing
phase, we need the concepts of instructions, operations and programs.
Definition 2.9 (Instruction) An instruction i is a tuple ⟨operation, operands⟩.
Definition 2.10 (Operation and operands) An operation is a machine operation as under-
stood from ordinary computers, e.g., an addition, a subtraction, a conditional jump. The
operands of an operation are specific variables or constant values used in the operation.
Although there is a difference between instruction and operation, i.e., an instruction is a
concrete assignment of operands to an operation, we often use these terms interchangeably.
Operations can be branching and non-branching. The non-branching operations, e.g., arith-
metic operations, always increment the pc by 1, letting the program counter point to the
following instruction. The branching operations set the pc according to their operands, e.g.,
an unconditional branch Jmp(v ) sets the pc to the address in variable v .
Definition 2.11 (Source and target operands) The operands of an instruction are divided into
two groups: source operands and target operands. An operation might use the value of the source
operands, perform a computation and write the result in the target operands.
Although the program counter pc is modified by all operations, pc is not part of the target
operands of any instruction except when explicitly used as an operand. Therefore, branching
instructions have no target operand since they only modify the pc.
We now define what programs are and how they are executed.
Definition 2.12 (Handler program) A handler program p is an indexable sequence of instruc-
tions, denoted as ⟨i1, . . . , iN⟩, i.e., a function p : {1, . . . , N} → I, mapping indices from 1 to N
into the set I of all instructions.
Definition 2.13 (Handle step) Given a handler program p for process π, a handle step Handle(s)
applies the instruction p[pc] to the current state s resulting in a new state s′.
We allow only well-formed handler programs. Any well-formed handler program eventually
terminates.
Definition 2.14 (Well-formed handler) A handler program p is well-formed if and only if, for
any correct state s with pc = 1, by successively applying handle steps, eventually an instruction
of p sets pc = N + 1.
By setting pc to N + 1, we model the termination of the handler and the return of control
to the caller.
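The definitions of instruction, handler program, and handle step can be sketched as a tiny interpreter. The following Python sketch is only an illustration under stated assumptions: the three operations (add, const, jmp_if_zero), the variable names, and the step limit max_steps are hypothetical and not part of the model, which leaves the instruction set open.

```python
from typing import Dict, List, Tuple

State = Dict[str, int]            # assignment of values to variables, including pc
Instruction = Tuple[str, Tuple]   # <operation, operands>


def handle(program: List[Instruction], s: State) -> State:
    """One handle step: apply instruction p[pc] to state s and return s'."""
    op, operands = program[s["pc"] - 1]   # the model indexes instructions from 1
    t = dict(s)                           # s stays untouched, t plays the role of s'
    if op == "add":                       # non-branching: target, source, source
        target, a, b = operands
        t[target] = s[a] + s[b]
        t["pc"] = s["pc"] + 1
    elif op == "const":                   # non-branching: target, constant value
        target, value = operands
        t[target] = value
        t["pc"] = s["pc"] + 1
    elif op == "jmp_if_zero":             # branching: modifies only pc
        var, destination = operands
        t["pc"] = destination if s[var] == 0 else s["pc"] + 1
    else:
        raise ValueError(f"unknown operation {op!r}")
    return t


def run_handler(program: List[Instruction], s: State, max_steps: int = 1000) -> State:
    """Apply handle steps while 1 <= pc <= N; stop once pc = N + 1."""
    n = len(program)
    for _ in range(max_steps):
        if not 1 <= s["pc"] <= n:
            return s
        s = handle(program, s)
    raise RuntimeError("handler did not terminate within max_steps")


# Hypothetical handler with N = 3: out = 2 * in + 1, or nothing if in == 0.
program = [
    ("jmp_if_zero", ("in", 4)),        # 1: skip to pc = N + 1 if in == 0
    ("add", ("out", "in", "in")),      # 2: out = in + in
    ("add", ("out", "out", "one")),    # 3: out = out + one
]
final = run_handler(program, {"pc": 1, "in": 5, "one": 1, "out": 0})
print(final)
```

For this program, every start state with pc = 1 reaches pc = N + 1 = 4 after at most three handle steps, which is what well-formedness requires. The step limit only guards the sketch against ill-formed programs.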
In our model, the instructions of a program p are not part of the variables V of the pro-
cess. This modeling decision does not cause major limitations to our model, but simplifies its
presentation. We discuss this point further in Section 2.3.2.
PROCESS COMMUNICATION
Since process π is part of a distributed system, we also need to model communication. In
addition to handle steps, a process can be modeled with a message receipt and a message
sending step. A receive step reads a message m from the network and writes m into the
predefined set of variables Vi . If there are messages to be sent out, a send step writes into
the network the messages formed by the predefined set of variables Vo. Otherwise, the send
step does nothing.
A process state transition can be either the receipt of a message if pc = 0, a program step
Handle(s) if 0 < pc ≤ N, or the sending of output messages if pc = N + 1. Formally, the
following predicate holds for every state s of process π running a program p:
Next ≜ ∨ pc = 0 ⇒ Receive ∧ pc′ = pc + 1
       ∨ pc > 0 ∧ pc ≤ N ⇒ s′ = Handle(s)
       ∨ pc = N + 1 ⇒ Send ∧ pc′ = 0
Such a modeling, nevertheless, requires a precise definition of the receive and send steps.
In turn, that further requires us to consider the whole distributed system to determine which
messages can be sent in every step. Instead, we opt for a simpler, but equivalent, modeling
using the concept of a traversal.
TRAVERSALS
A traversal is the part of an execution of a process π from the time when an input message
is stored into the variables Vi – i.e., the traversal’s initial state – until the time the output
messages in variables Vo are ready to be sent into the network – i.e., the traversal’s final
state. Stated differently, a traversal of a process π running a program p is an execution of the
handle steps while 1 ≤ pc ≤ N. The execution of a process is consequently a sequence of
traversals interleaved with receive and send steps (see Figure 2.2 for an example execution
with two traversals). With this modeling, we can reason about the correctness of traversals,
instead of complete process executions.
[Figure 2.2 diagram: states s0, s1, s2, . . . , si, si+1, si+2, si+3, si+4, . . . , sj, sj+1, sj+2
connected by the steps Receive, Next, . . . , Next, Send (Traversal 1) and Receive, Next, . . . ,
Next, Send (Traversal 2), marking the execution’s initial state and the initial states of
traversal 1 and traversal 2.]
Figure 2.2: Example of two traversals in some execution E|π.
A process state transition is defined as follows.
Definition 2.15 (Process state transition) Given a program p for process π, a state transition
Next is either a program step Handle(s) if 1 ≤ pc ≤ N or stuttering steps once pc = N + 1.
Formally, the following predicate holds for every state s:
Next ≜ ∨ pc ≥ 1 ∧ pc ≤ N ⇒ s′ = Handle(s)
       ∨ pc = N + 1 ⇒ pc′ = N + 1
The communication is modeled by many possible traversal initial states, each of them with
a (potentially different) message stored in the variables Vi . Let I be such a set of traversal
initial states. We now define a traversal.
Definition 2.16 (Traversal) A traversal of process π running a program p is an execution
starting from a traversal initial state s∈ I followed by states satisfying the process state
transition Next or a stuttering step. Formally,
Traversal ≜ (s ∈ I) ∧ □(Next ∨ s′ = s)
The □ operator indicates that, for every pair of consecutive states ⟨s, s′⟩ in any traversal,
the Next formula holds or the process stutters. The stuttering steps s′ = s are steps in which
the process does nothing.
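The Traversal formula can be read as a check over a finite sequence of states. The sketch below is one possible reading, not a definitive one: it treats each clause of Next as a guard that must hold together with its effect, and it treats the clause for pc = N + 1 as leaving all variables unchanged. The handler toy_handle, the set initial_states, and the example trace are hypothetical.

```python
from typing import Callable, Dict, List

State = Dict[str, int]


def next_holds(s: State, t: State, handle: Callable[[State], State], n: int) -> bool:
    """Next on the pair <s, s'>, reading each clause as guard plus effect."""
    pc = s["pc"]
    if 1 <= pc <= n:
        return t == handle(s)   # program step: s' = Handle(s)
    if pc == n + 1:
        return t == s           # pc stays at N + 1 and nothing else changes
    return False                # Next is not enabled for any other pc value


def is_traversal(states: List[State], initial_states: List[State],
                 handle: Callable[[State], State], n: int) -> bool:
    """First state is in I, and every consecutive pair is a Next or a stuttering step."""
    if not states or states[0] not in initial_states:
        return False
    return all(next_holds(s, t, handle, n) or t == s
               for s, t in zip(states, states[1:]))


def toy_handle(s: State) -> State:
    """Hypothetical handler with N = 2: out = 2 * in, then out = out + 1."""
    t = dict(s)
    if s["pc"] == 1:
        t["out"] = 2 * s["in"]
    else:
        t["out"] = s["out"] + 1
    t["pc"] = s["pc"] + 1
    return t


N = 2
initial_states = [{"pc": 1, "in": 4, "out": 0}]
s1 = initial_states[0]
s2 = toy_handle(s1)
s3 = toy_handle(s2)

good_trace = [s1, s2, s2, s3, s3]        # includes two stuttering steps
bad_trace = [s1, {**s2, "out": 99}]      # second state is not Handle(s1)
print(is_traversal(good_trace, initial_states, toy_handle, N),
      is_traversal(bad_trace, initial_states, toy_handle, N))
```

The second trace is rejected because its only step is neither a handle step nor a stuttering step. Such a step is not permitted by a fault-free process model.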
The difficulty of modeling traversals lies exactly in defining the set of possible initial states I.
For that, we use the set of all possible executions (of the complete system), and then select
those states of π immediately after a message has been received by π, i.e., immediately
after a message has been written into the variables Vi and pc = 1. Figure 2.2 shows an
example of two traversals in some execution of π. The traversal initial states are the first
states within the traversals, i.e., {s1, si+3} ⊆ I. Remember that an execution e = ⟨S1, S2, . . .⟩ is
a sequence of states Si = ⟨sπ1, . . . , sπN, snetwork⟩ comprising the states of the processes in Π
and of the network component.
Definition 2.17 (Traversal initial states I) Let E be all possible executions of the system.
Let E|π be the set of executions E restricted to the states of process π. Finally, let A be the
set of all states in all executions of π, i.e., A = {s : ∃e ∈ E|π : s ∈ e}. The set of correct
traversal initial states I is the subset of A such that s[pc] = 1 for every state s ∈ I.
[Figure 2.3 diagram: execution E1|π passes through the states s0, s1, s2, . . . , sj+2 with the
steps Receive, Next, . . . , Next, Send (traversal 1 of E1|π) and Receive, Next, . . . , Next, Send
(traversal 2 of E1|π). A Fault step at s2 leads to the states s^2_3, s^2_4, . . . , s^2_{k+4} of
execution E2|π, which continues traversal 1 and then performs traversal 2 of E2|π.]
Figure 2.3: Example of a fault affecting a traversal of execution E1|π. The fault forks E1|π into
another execution E2|π.
2.3.2 FAULT MODEL
So far our process model does not contain faults. We now introduce the arbitrary state
corruption (ASC) fault model, which is essentially the arbitrary fault model (Section 2.1.3)
recast for single-process traversals. In the ASC fault model, faults are state transitions [Cri91]
that form a disjunction with the process state transitions [LM94]. A traversal starts from some
initial state and performs a sequence of transitions. Each transition of the traversal is either
a Next or a Fault. Formally, □(Next ∨ Fault ∨ s′ = s) holds for every pair of consecutive states
⟨s, s′⟩ in any traversal of π.
Figure 2.3 shows an example of an execution E1|π forking into execution E2|π at state s2
with a Fault step. The first traversal of execution E1|π is the sequence of states s1, s2, . . . , si+1,
whereas the first traversal of E2|π is the sequence of states s1, s2, s^2_3, . . . , s^2_{k+1}.³
In the ASC fault model, a Fault step can be a crash transition or an arbitrary process fault, i.e.,
the corruption of one or more variables in V.
CRASH FAULTS
We now model the process faults in the benign fault model, i.e., the crash of a process π running
a program p. We assume the existence of a program counter value “halt” different from
0, . . . , N + 1. The process π performs no further state transitions if pc = “halt”.
Definition 2.18 (Crash fault) A crash fault is a crash transition that is only enabled if the
process is not halted yet. A crash transition sets the program counter to the “halt” value.
Formally,
Crash ≜ pc′ = “halt”
Halted ≜ pc = “halt”
Fault ≜ ¬Halted ∧ Crash
When pc = “halt”, the process state transition Next is never enabled – i.e., it is always false
– because every clause in the disjunction asserts that pc is some value in the set {0, . . . , N +1}
(see Definition 2.15). Since Next and Fault are disabled if Halted is true, the only possible
state transitions of process π after a crash are stuttering steps s′ = s.
Note that Definition 2.18 is concerned with process faults. Benign network faults – i.e.,
omission, reordering, duplication, and misrouting – are all captured by the initial state I with
faults, as defined in Section 2.3.3. Also, note that our notation does not comply with the full
TLA+ language [Lam02] when defining state transitions such as Next or Fault. To simplify the
presentation, we often only mention the variables that are modified in the state transition.
For example, in Fault of Definition 2.18, pc′ is set to “halt”, while all other variables in V
are kept unmodified. With this simplification, we avoid adding a disjunction to Fault with
∀v ∈ V \ {pc} : s′[v] = s[v] or some equivalent construction.
³ We use a superscript to indicate to which execution a state belongs, e.g., s^2_3 (superscript 2,
subscript 3) belongs to execution E2|π.
STATE CORRUPTION FAULTS
We now define the state corruption faults of a process π running a program p. State corruption
faults can potentially modify the whole state of process π, and any arbitrary (Byzantine) failure
of process π can be modeled as a state corruption fault.
Definition 2.19 (Variable corruption) A variable corruption changes the value of a variable
v ∈ V to an arbitrary value x ∈ D such that the new value x is different from the current value
s[v ]. Formally, the formula Corruption defines a variable corruption:
Corruption(v) ≜ ∃ x ∈ D : s[v] ≠ x ∧ s′[v] = x
In this work, we define a single type of corruption fault: arbitrary state corruption (ASC). An
ASC fault corrupts a set of variables W ⊆ V (following Definition 2.19), while leaving all other
variables with their previous values.
Definition 2.20 (Arbitrary state corruption) An Arbitrary State Corruption (ASC) fault is the
corruption of any non-empty subset W ⊆ V of variables. Formally,
ASCFault ≜ ∃ W ⊆ V : W ≠ {} ∧ ∀ w ∈ W : Corruption(w)
One could imagine other types of state corruptions such as the corruption of a single variable,
or the corruption of a single bit in a variable. ASC faults result in weaker (i.e., more general)
properties, which include more restrictive forms of state corruptions (e.g., Single-Event Upset).
Note that we are not interested in the causes of ASC faults, only in their effect, i.e., the
errors caused by them. In our process model, all conditions required by a process to perform
a Next step are present in its state, so it is natural that faults only corrupt the process state.
For example, we do not have to model a fault incorrectly computing an addition because that
is equivalent to correctly performing the addition and then corrupting the resulting value in
the state with an ASC fault. In this way, ASC faults capture hardware errors in a unified way
regardless of whether a hardware error affects memory elements or affects combinational
logic circuits that eventually write into memory elements. Also, note that any variable v ∈ V
can be corrupted, not only the program variables Vp. Since hardening algorithms can keep their
bookkeeping variables only in V, these variables are also subject to corruption.
We now define an arbitrary process fault.
Definition 2.21 (Arbitrary process fault) An arbitrary process fault is either a crash transition
or an ASC fault. An arbitrary process fault is only enabled if the process is not halted. Formally,
Fault ∆= ¬Halted ∧ (Crash ∨ ASCFault)
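Definitions 2.19 to 2.21 can be read as predicates over a pair of states (s, s′). The following Python sketch illustrates this reading under several assumptions that are not part of the original model: states are dictionaries, the domain D is a small integer range that does not contain "halt", and Halted and Crash are modeled in the simplest way consistent with the text (their exact definitions appear elsewhere in the chapter).

```python
HALT = "halt"
D = range(256)  # assumed finite domain of variable values


def halted(s):
    # Assumed meaning of Halted: the program counter holds "halt".
    return s["pc"] == HALT


def crash(s, t):
    # Assumed crash transition: pc' = "halt", all other variables unchanged.
    return t["pc"] == HALT and all(t[v] == s[v] for v in s if v != "pc")


def corruption(s, t, v):
    # Corruption(v): v takes some value of D that differs from its current value.
    return t[v] in D and t[v] != s[v]


def asc_fault(s, t):
    # ASCFault: the set W of changed variables is non-empty and every w in W
    # satisfies Corruption(w); all other variables keep their values.
    w = [v for v in s if t[v] != s[v]]
    return len(w) > 0 and all(corruption(s, t, v) for v in w)


def fault(s, t):
    # Fault = not Halted and (Crash or ASCFault)
    return not halted(s) and (crash(s, t) or asc_fault(s, t))


s1 = {"pc": 1, "x": 2, "y": 1, "z": 0}
print(fault(s1, {**s1, "z": 6}))            # a single corrupted variable
print(fault(s1, {**s1, "x": 9, "pc": 2}))   # several corrupted variables
print(fault(s1, {**s1, "pc": HALT}))        # crash transition
print(fault(s1, dict(s1)))                  # nothing changed, so no fault
```

The first three calls print True and the last prints False. A halted process admits no further fault, which mirrors the ¬Halted guard.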
Transient, intermittent, and permanent faults. In our model, ASC faults are transient: They
corrupt state variables once they occur, but later writes into the corrupt variables can correct
their values. Intermittent and permanent faults are, nevertheless, possible in our fault model.
The ASC fault model per se does not restrict the frequency with which ASC faults may occur.
A permanent or intermittent fault can be modeled as several occurrences of the same type
of ASC fault. Consider a variable v being corrupted by an ASC fault and later being reset
by an instruction i – i.e., being written with a correct value by instruction i. If another ASC
fault occurs immediately after the instruction i, then no further instruction can read the value
corrected by the instruction i.
Faults in the program instructions. In our model, the state of π might be corrupt, but not
the program p since p is not modeled as part of the process state. A transient corruption of
the text segment of a program, i.e., its instructions, typically results in a crash or in an ASC
fault, for example, if the corrupt instructions incorrectly modify variables. Evidence from both
our experiments and related work [Cor+12b] suggests that text corruption is mostly harmless,
since it quickly leads to the crash of the faulty process. Notwithstanding, as long as the
hardening technique does not assume a limit on the frequency of ASC faults, repeated ASC
faults can also model faults in the program instructions, following the same argument as for
permanent faults.
MAPPING SYMPTOMS INTO ASC FAULTS
To illustrate the expressiveness of our fault model, we take a series of software-level symp-
toms caused by hardware errors and show that each of these symptoms can be modeled
as one arbitrary state corruption fault interleaved with program operations, i.e., with process
state transitions.
Consider a process π with variables V = {x, y, z} executing the simple program p in Fig-
ure 2.4. Assume that in the initial state the values of variables x, y , and z are 2, 1, and 0,
respectively. The only correct execution of p is shown in Figure 2.5.
pc   Instruction
1    z = x + y
2    y = x * z
Figure 2.4: Simple program p

pc   x   y   z
1    2   1   0
2    2   1   3
3    2   6   3
Figure 2.5: Correct execution of p
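The correct execution of p can be reproduced with a few lines of Python. This is only an illustrative sketch: the dictionary-based state and the names PROGRAM, next_step and run are assumptions made for the example, not part of the formal model.

```python
# Program p: pc -> (target variable, function computing the new value)
PROGRAM = {
    1: ("z", lambda s: s["x"] + s["y"]),  # z = x + y
    2: ("y", lambda s: s["x"] * s["z"]),  # y = x * z
}


def next_step(s):
    """Execute the instruction at s['pc'] and advance the program counter."""
    target, op = PROGRAM[s["pc"]]
    t = dict(s)
    t[target] = op(s)
    t["pc"] = s["pc"] + 1
    return t


def run(s):
    """Return the list of states visited until pc leaves the program."""
    trace = [s]
    while s["pc"] in PROGRAM:
        s = next_step(s)
        trace.append(s)
    return trace


for state in run({"pc": 1, "x": 2, "y": 1, "z": 0}):
    print(state["pc"], state["x"], state["y"], state["z"])
```

The printed rows are 1 2 1 0, then 2 2 1 3, then 3 2 6 3, matching Figure 2.5.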
Data-flow symptoms. We start with the six data-flow symptoms defined in the rather in-
formal error model of Forin [For89]. These high-level faults are relevant since they are often
used in the AN-encoding literature [Sch+10a; Ulb+12; Wam+13; Hof+14].
A faulty operation is the malfunction of an operator producing an incorrect result despite
correct input operands. A faulty operation can be modeled by letting the operation execute
correctly, and then performing an ASC fault on the target operand. Consider the example
in Figure 2.6. The state s1 before the faulty operation is performed contains x = 2, y = 1
and z = 0. The operation is performed normally, setting z = x + y = 3. Afterwards, a single
fault changes the value of variable z in state s2 into some arbitrary value in state s3, where
Df = D\{s[z]}. In the figure, we mark the corrupt variables with bold fonts.
s1: x = 2, y = 1, z = 0, pc = 1
    step: s′[z] = s[x] + s[y], pc′ = pc + 1
s2: x = 2, y = 1, z = 3, pc = 2
    ASC fault: s′[z] ∈ Df
s3: x = 2, y = 1, z = 6, pc = 2
Figure 2.6: Faulty operation as ASC fault
A modified operand is defined as one or multiple bit flips of one operand read in a given
operation. An ASC fault can represent such a high-level fault if it takes place before the
operation. Figure 2.7 shows such an example where first variable y is corrupted by the fault, where
Df = D\{s[y]}, and subsequently the operation is performed, propagating the error from y to z.
s1: x = 2, y = 1, z = 0, pc = 1
    ASC fault: s′[y] ∈ Df
s2: x = 2, y = 7, z = 0, pc = 1
    step: s′[z] = s[x] + s[y], pc′ = pc + 1
s3: x = 2, y = 7, z = 9, pc = 2
Figure 2.7: Modified operand as ASC fault
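The difference between a faulty operation and a modified operand is only the order in which the ASC fault and the correct step are interleaved. A minimal Python sketch, assuming dictionary states and the hypothetical helpers add_step and asc_fault, makes this explicit:

```python
def add_step(s):
    # Instruction 1 of program p: z = x + y, then the program counter advances.
    return {**s, "z": s["x"] + s["y"], "pc": s["pc"] + 1}


def asc_fault(s, var, value):
    # Corrupt a single variable; the new value must differ from the current one.
    assert value != s[var]
    return {**s, var: value}


s1 = {"pc": 1, "x": 2, "y": 1, "z": 0}

# Faulty operation: correct step first, then a fault on the target operand z.
faulty_operation = asc_fault(add_step(s1), "z", 6)

# Modified operand: fault on the source operand y first, then the correct step.
modified_operand = add_step(asc_fault(s1, "y", 7))

print(faulty_operation)
print(modified_operand)
```

The first result has z = 6 with y untouched, while the second has y = 7 and the propagated error z = 9.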
An exchanged operand is defined as an operation that uses a wrong operand to perform its
computation. The value of the operand itself is not necessarily incorrect. Figure 2.8 shows
how such a fault can be seen as a restricted ASC fault taking place after the correct operation.
The variable z is modified by the fault with the value of y + y , representing the exchange of
operand x with operand y .
s1: x = 2, y = 1, z = 0, pc = 1
    step: s′[z] = s[x] + s[y], pc′ = pc + 1
s2: x = 2, y = 1, z = 3, pc = 2
    ASC fault: s′[z] ∈ Df
s3: x = 2, y = 1, z = 2, pc = 2
Figure 2.8: Exchanged operand as ASC fault
In general, an exchanged operand is a restricted ASC fault in which the value of the corrupt
variable is any value in Df ⊆ D. For this particular operation (i.e., addition of x and y ), the set
Df is defined as the union of two sets: (1) the set of values when adding all possible variables
with x except y and z, and (2) the set of values when adding all possible variables with y
except x and z:
Df = {x + v : v ∈ V \ {y, z}} ∪ {v + y : v ∈ V \ {x, z}}
Similarly, the set Df can be defined for each instruction in program p.
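For the running example, the set Df of instruction 1 can be enumerated directly. The sketch below is an illustration only; it assumes that V = {x, y, z} as in the example (the program counter is left out) and uses a hypothetical helper name.

```python
def exchanged_operand_values(s, left="x", right="y", target="z"):
    """Values an exchanged-operand fault may write for `target = left + right`."""
    variables = [v for v in s if v != "pc"]  # V = {x, y, z} in the example
    # The right operand is exchanged: left + v for v in V \ {right, target}
    swap_right = {s[left] + s[v] for v in variables if v not in (right, target)}
    # The left operand is exchanged: v + right for v in V \ {left, target}
    swap_left = {s[v] + s[right] for v in variables if v not in (left, target)}
    return swap_right | swap_left


s1 = {"pc": 1, "x": 2, "y": 1, "z": 0}
print(sorted(exchanged_operand_values(s1)))  # [2, 4]
```

The value 2 corresponds to y + y, the fault shown in Figure 2.8, and the value 4 corresponds to x + x.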
An exchanged operator is defined as an assignment to the target operands using an operation
different from the one defined in the current instruction. The source operands are,
however, the same as those defined in the instruction. Figure 2.9 shows how such a fault can be seen
as an ASC fault taking place after the correct operation. The example shows z being modified
with the result of x − y, representing the addition operator + of instruction 1 being replaced
by a subtraction operator −.
s1: x = 2, y = 1, z = 0, pc = 1
    step: s′[z] = s[x] + s[y], pc′ = pc + 1
s2: x = 2, y = 1, z = 3, pc = 2
    ASC fault: s′[z] ∈ Df
s3: x = 2, y = 1, z = 1, pc = 2
Figure 2.9: Exchanged operator as ASC fault
Similarly to an exchanged operand, an exchanged operator is modeled with a subset of values
Df ⊆ D, for example, Df = {x − y, x * y, x / y, x % y, x & y, x | y, ...}. The set Df is defined for each
instruction of p and depends on the possible operations of the specific hardware.
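As an illustration, Df for instruction 1 can be computed for one assumed instruction set. The operator list below is hypothetical and stands in for "the possible operations of the specific hardware". The sketch also drops any result that equals the correct sum, since a variable corruption must change the value (Definition 2.19).

```python
import operator

# Hypothetical instruction set; integer division stands in for "/".
ASSUMED_OPERATORS = {
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.floordiv,
    "%": operator.mod,
    "&": operator.and_,
    "|": operator.or_,
}


def exchanged_operator_values(x, y, correct=operator.add):
    """Values an exchanged-operator fault may write instead of correct(x, y)."""
    values = set()
    for op in ASSUMED_OPERATORS.values():
        try:
            values.add(op(x, y))
        except ZeroDivisionError:
            continue  # this operator yields no value for these operands
    values.discard(correct(x, y))  # equal to the correct result: no corruption
    return values


print(sorted(exchanged_operator_values(2, 1)))  # [0, 1, 2]
```

For x = 2 and y = 1, the result x | y = 3 coincides with the correct sum and is therefore excluded, while x − y = 1 is the fault shown in Figure 2.9.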
Finally, a lost store is a fault in which a variable z is assigned a new value by an
instruction of p, but a later instruction reads the old value of z. Consider the example: In
instruction 1, the variable z is set to x + y, but a lost store resets the value of z to 0 using the
u function. The following instruction 2 multiplies x * z, propagating the error and resulting in
an unexpected value of y.
Figure 2.10 shows how a lost store can be seen as an ASC fault taking place between
correct operations. We introduce another specification-only function u: The surjective function
u : V → D keeps track of the last value of each variable before an assignment by any instruction
of the program. Each Next step updates u such that ∀v ∈ V : s′[v ] ̸= s[v ] =⇒ u′[v ] = s[v ]. A
lost store is an ASC fault that replaces the value of a variable v in the next state s′ with its
old value u[v ].
s1: x = 2, y = 1, z = 0, pc = 1
    step: s′[z] = s[x] + s[y], pc′ = pc + 1
s2: x = 2, y = 1, z = 3, pc = 2
    ASC fault: s′[z] = u[z]
s3: x = 2, y = 1, z = 0, pc = 2
    step: s′[y] = s[x] * s[z], pc′ = pc + 1
s4: x = 2, y = 0, z = 0, pc = 3
Figure 2.10: Lost store as restricted ASC fault
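The bookkeeping performed by the specification-only function u can be sketched in Python as follows. The sketch assumes dictionary states, initializes u with the initial values (an assumption, since the text does not state the initial content of u), and uses hypothetical helper names.

```python
PROGRAM = {
    1: ("z", lambda s: s["x"] + s["y"]),  # z = x + y
    2: ("y", lambda s: s["x"] * s["z"]),  # y = x * z
}


def next_step(s, u):
    """Next step that also updates u with the old value of each changed variable."""
    target, op = PROGRAM[s["pc"]]
    t = {**s, target: op(s), "pc": s["pc"] + 1}
    changed = {v: s[v] for v in s if t[v] != s[v]}  # s'[v] != s[v] => u'[v] = s[v]
    return t, {**u, **changed}


def lost_store(s, u, var):
    """Restricted ASC fault: var falls back to its previous value u[var]."""
    return {**s, var: u[var]}


s1 = {"pc": 1, "x": 2, "y": 1, "z": 0}
u = dict(s1)                  # assumed initial content of u
s2, u = next_step(s1, u)      # z = 3
s3 = lost_store(s2, u, "z")   # z is reset to its old value 0
s4, u = next_step(s3, u)      # y = x * z = 0
print(s4)
```

The final state has y = 0 and z = 0, as in state s4 of Figure 2.10, whereas the correct execution ends with y = 6 and z = 3.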
Relation between data-flow symptoms. Once the symptoms are precisely modeled, rela-
tions between them can be derived. Modified operands and faulty operations are equivalent,
being simply ASC faults with no additional restrictions. All other symptoms are ASC faults
restricted to some subset of values Df ⊆ D. Exchanged operand faults are program and state
dependent: the values in Df depend on the next instruction and on the current values of all
variables in the process. Exchanged operator faults depend heavily
on the instruction set that the underlying hardware offers. If the instruction set contains only
addition and subtraction operations, then Df should not contain the value x % y. If the
instruction set contains a root operation, then x^(1/y) should be in Df. Finally, a lost store is a very
restricted ASC fault, where only one specific value in D can be used: the value given by u.
Control-flow symptoms. Besides data-flow symptoms, another commonly considered class of
errors is control-flow errors [OSM02a; Bor+06]. Control-flow errors can be divided into branch
errors and non-branch errors. The former are errors in branching instructions, e.g., the instruc-
tion jumps to an incorrect address despite the fact that the operands were correct. The latter
are errors in any other instruction of the program. They simply modify the instruction pointer
to some arbitrary value.
Figure 2.11 shows how an ASC fault taking place at any point in the execution can model
a control-flow error. The example starts with a fault changing the program counter from 1 to
2, effectively skipping the first instruction of Figure 2.4. The second instruction executes, and
the resulting values in y and z diverge from the expected values in Figure 2.5.
s1: x = 2, y = 1, z = 0, pc = 1
    ASC fault: pc′ ∈ D
s2: x = 2, y = 1, z = 0, pc = 2
    step: s′[y] = s[x] * s[z], pc′ = pc + 1
s3: x = 2, y = 0, z = 0, pc = 3
Figure 2.11: Control-flow fault as ASC fault and error propagation
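The control-flow example can be replayed with a short Python sketch. As before, the dictionary-based state and the helper next_step are illustrative assumptions rather than part of the model.

```python
PROGRAM = {
    1: ("z", lambda s: s["x"] + s["y"]),  # z = x + y
    2: ("y", lambda s: s["x"] * s["z"]),  # y = x * z
}


def next_step(s):
    """Execute the instruction at s['pc'] and advance the program counter."""
    target, op = PROGRAM[s["pc"]]
    return {**s, target: op(s), "pc": s["pc"] + 1}


s1 = {"pc": 1, "x": 2, "y": 1, "z": 0}
s2 = {**s1, "pc": 2}   # ASC fault on pc: instruction 1 is skipped
s3 = next_step(s2)     # instruction 2 reads the stale z = 0

correct_end = next_step(next_step(s1))
diverging = sorted(v for v in s3 if s3[v] != correct_end[v])
print(s3, diverging)
```

Although the fault touched only pc, the variables y and z differ from the correct execution at the end, which is the error propagation shown in Figure 2.11.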
We model branching and non-branching errors in the same way. The only difference is that
branching errors occur immediately after a branching instruction. Note that errors affecting the
operands of a branch instruction are in fact data-flow errors, e.g., a branch instruction jumps
to an arbitrary position in the program due to an arbitrary value in its operand.
[Figure 2.12 shows three executions of process π and their traversals:
E1|π: states s_0, s_1, s_2, ..., s_i, s_{i+1}, s_{i+2}, s_{i+3}, s_{i+4}, ..., s_j, s_{j+1}, s_{j+2}; traversal 1 and traversal 2, each consisting of Receive, Next, ..., Next, Send.
E2|π: forked from E1|π by a Fault step; states s^2_3, s^2_4, ..., s^2_k, s^2_{k+1}, s^2_{k+2}, s^2_{k+3}, s^2_{k+4}; continuing traversal 1 (Next, ..., Next, Send), followed by traversal 2 (Receive, Next, ...).
E3|π: forked from E1|π by a Fault step; states s^3_{i+2}, s^3_{i+3}, s^3_{i+4}, s^3_{i+5}, s^3_{i+6}, s^3_{i+7}; a Send step, followed by traversal 2 (Receive, Next, Next, Send).
The executions have different traversal initial states.]
Figure 2.12: Example of a fault affecting execution E1|π inside and outside a traversal. The
faults fork E1|π into other executions.
2.3.3 FAULT ASSUMPTIONS
The goal of our hardening techniques presented in Chapters 3 and 4 is to guarantee Properties
2.3, 2.4 and 2.2 in the face of ASC faults. In particular, guaranteeing Property 2.3 is the major
challenge since it asserts what a faulty process is allowed to do.
Local error exposure (Property 2.3) asserts that if a faulty process sends a corrupt message
out, this message is invalid. Consider now the example in Figure 2.12. A hardening technique
should guarantee that this property holds whenever a traversal terminates, e.g., at states
s_{i+1}, s^2_{k+1}, s^3_{i+6}, etc. Without any further assumption, however, ASC faults can change the
variables in any way and arbitrarily often, inducing a process π to commit Byzantine failures,
i.e., sending corrupt but valid messages.
Hardening techniques have to cope with two general fault cases:
Case 1: A fault directly corrupts variables causing an arbitrary failure. For example, before the
Send step is taken after state s_{i+1}, a fault corrupts the variables in Vo and the corrupt
data is sent out at state s^3_{i+2}.
Case 2: A fault causes a failure indirectly. For example, a fault corrupts variables of the
program at state s_2 and the error propagates to variables in Vo via Next steps. The
process then sends corrupt data out at state s^2_{k+1}.
Note that Case 1 can also occur during a traversal, instead of occurring after a traversal.
Imagine a variable v ∈ Vo is only modified by a Next step from state s_1 to s_2. In a fault-free
execution, v is correct when sent out in the Send step at state s_{i+1}. However, if a Fault step
corrupts v at state s_2 and no Next step modifies v until s^2_{k+1}, then v is corrupt when sent out
in the Send step from s^2_{k+1} to s^2_{k+2}.
Also note that Case 2 might become a failure only in some further traversal, for example, at
the end of traversal 2 of execution E2|π. Consequently, the traversal initial states of traversal
2 in execution E1|π and in execution E2|π are different. To model all possible traversals, we
therefore not only add Fault steps to the set of state transitions of the process, but we also
add faults in the traversal initial states, so that at least one of them contains the variable
corruption of traversal 1 in E1|π.
Fault steps and faults in the initial state have to be restricted by fault assumptions, otherwise
the traversal might lead to or even already start at a state in which local error exposure can
be violated. Although most fault assumptions are specific to the hardening techniques, one
central assumption is common to all techniques presented here: fault diversity. We start by
defining fault diversity and discussing how it inhibits the possibility of Case 1 faults becoming
arbitrary failures. Next, we precisely define the concept of corrupt variables. We conclude
our modeling by refining the definitions of traversals and initial states now taking into account
faults and fault assumptions.
FAULT DIVERSITY
Each hardening technique employs some form of space redundancy to protect a subset of
variables Vd ⊂ V . Space redundancy relies on an assumption that we call fault diversity,
following the terminology of Correia et al. [Cor+12b].
Assumption 2.2 (Fault diversity) A Fault step at some state s satisfies fault diversity if and
only if it is a crash transition or every variable v ∈ Vd modified during the fault is invalid at state
s′. Formally, the ASCFault formula is composed via conjunction with the following formula:
FaultDiversity ∆= ∀v ∈ Vd : s′[v ] ̸= s[v ] =⇒ ¬Valid ′(v )
Fault diversity asserts that every variable v ∈ Vd modified by a Fault step at state s is invalid
at state s′. The definition of the set Vd ⊆ V as well as of the formula Valid(v )4 depends on
the hardening technique. As we will present in the following chapters, AN-encoding adds
redundancy in the form of codes to each variable in V except the program counter pc. For
AN-encoding the validity of a variable depends on whether its value and its code match. In
contrast, SEI-hardening reserves a replica variable in V for each variable in Vd , defining the
validity as the value of a variable and its replica being the same.
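The FaultDiversity formula can be illustrated with the replica-based notion of validity just described. The following Python sketch is a toy instance under assumptions: Vd = {x, y, z}, each replica is stored under the hypothetical name v + "_r", and states are dictionaries. It is not the SEI-hardening algorithm itself.

```python
PROTECTED = ["x", "y", "z"]  # Vd in this toy example


def replica(v):
    return v + "_r"  # hypothetical naming scheme for replica variables


def valid(s, v):
    # Assumed validity: a variable is valid if it equals its replica.
    return s[v] == s[replica(v)]


def fault_diversity(s, t):
    # For all v in Vd: s'[v] != s[v] implies not Valid'(v)
    return all(t[v] == s[v] or not valid(t, v) for v in PROTECTED)


s = {"x": 2, "x_r": 2, "y": 1, "y_r": 1, "z": 0, "z_r": 0}

print(fault_diversity(s, {**s, "z": 6}))            # True: z no longer matches z_r
print(fault_diversity(s, {**s, "z": 6, "z_r": 6}))  # False: both copies corrupted alike
```

The second fault is exactly what the assumption rules out: a single fault that corrupts a variable and its redundancy in a consistent way.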
In spite of differences, all techniques consider sets Vi and Vo as part of Vd . The concept
of validity (Definition 2.6) states that messages are sent out with enough information to be
classified as valid or invalid by a correct receiver. It is natural to use space redundancy in
the process state to also protect messages end-to-end since messages are stored in the
state as variables in Vi and Vo. Therefore, fault diversity guarantees that, if a message is
corrupted in the network, then some variables representing the message are invalid upon
receipt. Moreover, if a fault directly corrupts some variables v ∈ Vo at some state s, then v is
invalid at state s′, implying that the message m ∉ CV. In our example, a Case 1 fault corrupting
a v ∈ Vo at state s_{i+1} results in an invalid variable v at state s^3_{i+2}. If the message represented
by Vo is sent out at state s^3_{i+2} anyhow, then a correct process receiving the message can
discard it. Therefore, fault diversity allows the hardening techniques to focus on Case 2 faults
since, under the fault diversity assumption, Case 1 faults cannot lead to corrupt but valid messages.
CORRUPTED VARIABLES AND TRAVERSALS
To guarantee that Property 2.3 holds, a hardening technique has to assure that, at the end of
every traversal, e.g., at state s_{i+1} in our example of Figure 2.12, the variables in Vo are either
all correct or some of them are invalid. Intuitively, a variable is correct if it is not corrupt.
To reason about the correctness of the hardening techniques, we have to precisely define
what corrupt and correct variables are. We introduce a specification-only reference state r
(following Definition 2.8). The state r is not accessible by the system and is only used to
specify faults and reason about correctness of the algorithms. A variable is said to be corrupt
while its value in s is different from its value in r .
4The primed formula Valid ′(v ) is the formula Valid(v ) with state s′. It means that the variable v is valid at s′.
Definition 2.22 (Corrupt and correct variable) A variable v ∈ V is corrupt if and only if the
value of v in s is different from the value of v in r; otherwise v is correct. Formally,
Corrupt(v ) ∆= s[v ] ̸= r [v ]
Correct(v ) ∆= ¬Corrupt(v )
Intuitively, the reference state r of process π takes the same process state transitions and
the same stuttering steps as state s. Nevertheless, if an arbitrary fault affects the state s, it
does not affect the reference state r . Consequently, an ASC fault might make the value of
variables in s and r diverge; the error might also be propagated to other variables via Next
steps. Moreover, an ASC fault might also “fix” a corrupt variable, i.e., the fault might make
the value of some variable in s match again the value in r . We now define reference state
transitions, reference fault transitions and reference initial states.
Definition 2.23 (Reference state transition) The reference state transition RNext is given by
Definition 2.15 using state r instead of s.
Definition 2.24 (Reference fault transition) The reference fault state transition RFault is de-
fined as follows:
RFault ∆=
∨ s′[pc] = “halt” =⇒ r ′[pc] = “halt”
∨ s′[pc] ̸= “halt” =⇒ r ′ = r
RFault is a disjunction with two cases. In the first case, if the process halts/crashes, then
the next value of pc in state r is also set to “halt”. In the second case, if a fault different from
a crash occurs in s, RFault stutters state r .
Definition 2.25 (Reference initial states Ir ) The same as Definition 2.17, but where E is the
set of all possible executions of the system under benign failures.
The benign faults in Definition 2.25 refer not only to benign process faults, but also to
benign network faults. Although the reference initial states have no corrupt variables, they
might contain misrouted or reordered messages in Vi due to these benign network faults –
remember that we assume the system is benign-fault tolerant.
We now reformulate the definition of a traversal (Definition 2.16) to take faults and the
reference state into account. We join the process state transitions with the reference state
transitions, i.e., (Next ∧ RNext). Since faults do not corrupt the reference by definition, we
join the fault transitions with reference fault transitions, i.e., (Fault ∧RFault). State transitions
always satisfy the following formula:
□((Next ∧ RNext) ∨ (Fault ∧ RFault) ∨ (s′ = s ∧ r′ = r))
Finally, we redefine the traversal initial state since the previous definition (Definition 2.17)
does not model faults. Consider again the example of Figure 2.3. State s^2_{k+3} differs in some
arbitrary way from state s_{i+3}. State s^2_{k+3} may contain corrupt variables from the previous traversal,
which were either corrupted directly by the Fault step or indirectly via error propagation when
performing Next steps. Moreover, state s^2_{k+3} may contain an input message in Vi that arrived
corrupt from the network. Allowing, however, any number or form of ASC faults to affect
the initial state of a traversal can break any hardening technique. As we already discussed,
fault diversity has to hold at the traversal initial states, rendering a corrupt message in Vi as
invalid. Hence, we redefine the set of traversal initial states as follows.
Definition 2.26 (Traversal initial states Is with faults) A traversal initial state s is any state
that is either correct, i.e., equal to the reference initial state r ∈ Ir of the traversal, or some
variable v is such that s[v] ≠ r[v]. The set of initial values Is boils down to the set D^{|V|} of
all possible states for a set of variables V – remember that D is the domain of values of all
variables.
The set of all possible traversal initial states Is ⊆ D^{|V|} is formally defined as follows.
Is ∆= Ir ∪ {s ∈ D^{|V|} : ∀v ∈ Vd : s[v] = r[v] ∨ ¬Valid(v)}
Note that only variables v ∈ Vd have to be valid; the definitions of set Vd and of Valid(v ) are
technique-specific.
With reference state transitions RNext, the set Ir of reference initial states, and the set Is
of traversal initial states, we can redefine traversals by coupling the transitions on r and on s.
Definition 2.27 (Traversal with fault and reference state) A traversal with faults and refer-
ence state of process π running a program p is an execution starting from a potentially corrupt
initial state followed by states satisfying the process state transitions Next and the reference
state transitions RNext, or the fault state transitions Fault and the reference fault transitions
RFault, or a stuttering step. Formally,
Traversal ∆=
  ∧ r ∈ Ir ∧ s ∈ Is
  ∧ □((Next ∧ RNext) ∨ (Fault ∧ RFault) ∨ (s′ = s ∧ r′ = r))
In summary, the initial reference state r is one of the possible reference initial states of
Definition 2.25. The initial state s is one of the possible states in Is – being either equal
to r or corrupt. If a process state transition (Next step) takes place, then a reference state
transition (RNext step) takes place as well. If a fault transition (Fault step) takes place, then
a reference fault transition (RFault) takes place. Finally, if the state s stutters, the reference
state r stutters.
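The coupling of s and r summarized above can be sketched as a small simulation over the simple program p of Figure 2.4. The event encoding, the function names, and the choice to start from a correct initial state (s = r) are assumptions made for this illustration; RFault follows the two cases described after Definition 2.24.

```python
PROGRAM = {
    1: ("z", lambda s: s["x"] + s["y"]),  # z = x + y
    2: ("y", lambda s: s["x"] * s["z"]),  # y = x * z
}


def next_step(s):
    """Next (or RNext): run the instruction at pc; do nothing if there is none."""
    if s["pc"] not in PROGRAM:
        return dict(s)
    target, op = PROGRAM[s["pc"]]
    return {**s, target: op(s), "pc": s["pc"] + 1}


def traverse(initial, events):
    """Apply events to s and to the reference state r in lock-step."""
    s, r = dict(initial), dict(initial)
    for event in events:
        if event == "next":        # Next together with RNext
            s, r = next_step(s), next_step(r)
        elif event == "stutter":   # s' = s and r' = r
            continue
        else:                      # ("fault", var, value): Fault together with RFault
            _, var, value = event
            s = {**s, var: value}
            if s["pc"] == "halt":  # a crash is mirrored in r
                r = {**r, "pc": "halt"}
            # otherwise r stutters
    return s, r


def corrupt_variables(s, r):
    """Variables v with Corrupt(v), i.e., s[v] != r[v]."""
    return sorted(v for v in s if s[v] != r[v])


init = {"pc": 1, "x": 2, "y": 1, "z": 0}
s_end, r_end = traverse(init, ["next", ("fault", "z", 6), "next"])
print(corrupt_variables(s_end, r_end))  # ['y', 'z']
```

Here the fault corrupts z directly and y indirectly through the second Next step. If a second fault writes the value 3 back into z before the last step, the list of corrupt variables is empty, which corresponds to a fault "fixing" a corrupt variable.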
Remark on stuttering traversals. If an ASC fault causes the program counter pc to point to
N + 1 before any variable is modified, this traversal is said to stutter due to a fault. A traversal
that stutters due to a fault simply results in message omissions – i.e., an input message not
being processed – and/or message duplication – i.e., an output message being sent again.
Such failures can be mapped as network faults in the distributed system model. Hence, we
do not consider traversals that stutter due to faults in our process fault model.
RELATION BETWEEN CORRUPT VARIABLES AND CORRUPT MESSAGES
We conclude this section by showing that if a process π sends a corrupt message (cf. Def-
inition 2.5) then there is at least one variable v ∈ Vo that is corrupt (cf. Definition 2.22).
Lemma 2.1 allows us to reason exclusively on corrupt variables in the design of SEI in the
next sections.
Lemma 2.1 (Corrupt message implies corrupt Vo) Let E be an execution of the distributed
system and σ = ⟨T_1, T_2, ...⟩ the sequence of traversals executed by process π in E that
modified a variable in the state, i.e., we ignore stuttering traversals. Let m^k_o be an output
message sent by process π in a traversal T_k ∈ σ. If message m^k_o is corrupt, then there exists
v ∈ Vo such that Corrupt(v) holds at the end of T_k.
Proof:
1. Let s_e be the state at the end of traversal T_k such that s_e[pc] = N.
2. Let r_e be the reference state at the end of traversal T_k such that r_e[pc] = N.
3. By transposition, it is sufficient to show that if ∀v ∈ Vo : ¬Corrupt(v) at state s_e, then m^k_o
is a correct message according to Definition 2.4, i.e., m^k_o has a correct generation history.
4. Assume ∀v ∈ Vo : s_e[v] = r_e[v] according to Step 3 and the definition of Corrupt(v)
(Definition 2.22).
5. Let r^j_i be the reference initial state of a traversal T_j.
6. Let m^j_i be the input message represented by the values of Vi in state r^j_i.
7. h_k = ⟨m^1_i, m^2_i, ..., m^k_i⟩ is a correct generation history for message m^k_o.
7.1. h_k is a generation history for message m^k_o, by Definitions 2.2 and 2.27 since we only
consider non-stuttering traversals.
7.2. The precedence relationship of Definition 2.1 is a total order, since π is single-threaded
and deterministic.
7.3. CASE: k = 1
7.3.1. π sends no output message before m^1_o, therefore h_k is correct.
7.4. CASE: k > 1
7.4.1. Let message m^j_o be the correct message sent by π in a traversal T_j such that there
is no m_o that precedes m^k_o but does not precede m^j_o.
7.4.2. By induction, h_j = ⟨m^1_i, m^2_i, ..., m^j_i⟩ is a correct generation history for m^j_o.
7.4.3. h_k = ⟨m^1_i, m^2_i, ..., m^j_i, m^k_i⟩ extends h_j, therefore it is correct. □
2.4 RELATED WORK
In this section, we relate the content of this chapter to prior work. We divide the section into
two parts: We first discuss related work in the area of transformational approaches to improve
the fault tolerance of systems. Next, we consider fault models that represent hardware errors
inside processes.
TRANSFORMATION FOR FAULT TOLERANCE
Automatic transformations for fault tolerance were already studied in the 80s by Coan [Coa88]
and Neiger and Toueg [NT88], among others. In contrast to Coan’s approach, hardening is
neither restricted to algorithms working in a “particular standard form”, i.e., round-based algo-
rithms, nor restricted to only one class of distributed algorithms, i.e., consensus algorithms
and related algorithms. The only restriction we make is that algorithms have three phases:
message receipt, message handling and message sending (see Section 2.3.1). This template
is, nevertheless, the general template of distributed applications. In contrast to Neiger and
Toueg’s approach, our approach is not restricted to synchronous systems.
The transformation of crash-tolerant systems into systems that tolerate Byzantine faults is
also called the simulation of crash failures on top of Byzantine failures and is presented in
detail in the text book by Attiya and Welch [AW04]. Despite focusing mainly on synchronous
systems, the simulation by Attiya and Welch is interesting because it is modular and general.
It first guarantees Byzantine failures are identical to all correct processes; then it guarantees
that identical Byzantine failures are mapped into omission failures; and finally it maps omission
failures into crash failures. Error isolation (Property 2.1) is related to their integrity property
of the omission failure model. In contrast to our approach, their approach has several issues
affecting its practicality. First, it increases the message complexity to guarantee identical
Byzantine failures. Second, it increases message sizes by a factor O(|Π|) to implement omis-
sion semantics. The message-size increase occurs because a process πs sending a message
m appends to m a support set, i.e., the set of all messages that caused πs to generate
m. Third, a receiver process πr keeps a local copy of each process in Π. Upon receiving a
message m, πr locally simulates the sender πs using the support set received along with m,
and validates that m should indeed have been sent. Our approach neither simulates other
processes, nor increases message complexity, nor increases message sizes by more than a
constant factor.
More recently, Ho et al. [Ho+08] proposed Nysiad, a system that transforms arbitrary distributed
algorithms by assigning guard hosts to each process; it requires at least 3f + 1 guards
per process to tolerate f faulty guards. Guards constrain the order in which processes handle
their input messages. Nysiad also increases the message complexity and requires that each
guard runs a copy of the processes it monitors. Hardening neither requires extra messages,
nor simulates other processes. Nysiad shows that, under the Byzantine fault model, algo-
rithms have to replicate processes over at least 3f +1 physical machines even when only error
isolation is needed. In contrast, hardening does not imply replication. Therefore, hardening is
better suited for arbitrary distributed systems since some distributed systems do not handle
crash failures with replication, e.g., memcached recovers by restarting and repopulating the
cache with queries from an external database. Requiring such systems to replicate to cope
with hardware errors can be prohibitively expensive.
Clement et al. [Cle+12] propose a transformation that does not introduce replicas, but as-
sumes each process is equipped with a trusted hardware component such as a trusted in-
crementer [Lev+09] or an attested append-only memory [Chu+07]. The presence of trusted
hardware components makes Byzantine faults identical to all correct processes. Each process
then employs “an expensive mechanism of replaying the execution of almost the entire sys-
tem” to validate input messages received by correct processes. This validation scheme again
renders the work as a mostly theoretical contribution.
Also related to the aforementioned approaches is the body of work on Byzantine-fault
detection (BFD) [DGG02; KMMS03; HKD07]. BFD provides an abstraction that promises to
simplify the construction of algorithms that tolerate Byzantine faults. Byzantine-fault detectors
are, however, inadequate to implement error isolation since they detect Byzantine
faults only eventually. Property 2.4, in particular, requires messages to be identified as valid or invalid upon
receipt; an eventual time instant might be simply too late to stop the failure propagation.
Haeberlen and Kuznetsov [HK09] propose a model in which BFD transformations can be de-
scribed. One could consider reusing their model for describing hardening transformations,
but their model focuses on the process failures, which are faults to the entire distributed
system, and abstracts from the actual process faults, which cause the process failures in the
first place. Consequently, the model does not detail the internal state of processes and their
processing steps. Notwithstanding, models of arbitrary faults affecting the state of processes
do exist. We discuss some of these models next.
MODELING EFFECTS OF HARDWARE ERRORS
Forin [For89] describes an error model that supposedly captures the effect of all hardware errors
at the program level, with the exception of control-flow errors. This model is, however, described
concisely and informally, without specifying a computation model (i.e., a process model). We
do not have confidence that this model properly captures the effect of all hardware errors in a
process. In fact, we think that the informality of Forin’s model can lead to the development of
mechanisms with unclear guarantees, e.g., ANB- and ANBD-encoding (we discuss this issue
further in Section 3.3). As we have shown in Section 2.3.2, ASC faults are more general than
Forin’s errors since his errors can always be represented as ASC faults.
Perry et al. [Per+07] formally define a fault-tolerant assembly language called TALFT. A pro-
gram implemented in TALFT can provably detect hardware errors within their fault model.
Although the approach proposed by Perry et al. is precise, it is not general because fault de-
tection is guaranteed with a specific detection mechanism, namely, instruction duplication
similar to SWIFT [Rei+05a]. TALFT has two further limitations. First, it assumes the single-
bitflip fault model. As we argued in Section 1.2, this model might be too restrictive in practice.
Second, it requires trusted hardware components to check replicated values before writing
them into memory. In this work, we develop untrusted solutions, which do not require addi-
tional (potentially expensive) hardware components.
Pattabiraman et al. [Pat+08] introduce SymPLFIED, a framework to model and verify the
effect of hardware errors on software. SymPLFIED takes as input, among other items, an
error model, a detector model and the target program source code. It then applies model
checking, injecting symbolic errors to the variables of the program, and verifying when the
monitored properties are violated. The approach inherits the limitations of model checking
such as state explosion for large programs. More importantly, SymPLFIED can only show that
a given program has the desired properties, but it cannot prove that a hardening technique
works for any program, which is our goal.
Correia et al. first proposed the ASC fault model as the underlying fault model for the PASC
algorithm [Cor+12b]. ASC faults capture hardware errors in memory elements and combina-
tional logic circuits (see Section 2.3.2), allowing not only single bit flips, but also the corruption
of several bits, in several variables, multiple times. In contrast to their ASC fault model, the
model defined in this chapter is (1) more precise by using a cleaner notation and structure;
(2) more general by having one main assumption, namely, fault diversity; and (3) decoupled
from any specific hardening algorithm such as PASC. As a consequence of (2), our ASC model
seamlessly describes permanent and intermittent faults by allowing the re-occurrence of the
same faults throughout an execution. As a consequence of (3), our ASC model can be used to
design other algorithms seemingly disconnected from PASC such as AN-encoding (Chapter 3).
2.5 CONCLUSION
The main contributions of this chapter are the hardening problem definition and the new ASC
fault model. Together they form a framework on which new hardening techniques can be
designed and proved correct (Chapter 4). Moreover, the formal framework sheds light on
existing techniques such as AN-encoding (Chapter 3) by facilitating their formalization and the
precise specification of their fault assumptions.
Hardening is an interesting problem because solutions – i.e., hardening techniques – allow
already existing benign-fault-tolerant distributed systems to tolerate not only benign faults, but
also the more severe ASC faults.
Road map. In Chapters 3 and 4, we present two different hardening techniques on top of
the ASC model. Each technique refines our process and fault models adding further technique-
specific fault assumptions. Moreover, each hardening technique defines the set of protected
variables Vd and the predicate Valid used in the fault-diversity assumption.
Future work. We see several open problems in this chapter to be tackled in future work.
For example, the proof of Theorem 2.1 is rather informal. A better formalization of the
benign-fault model and a better proof can deepen our understanding of the relation between the
simulation discussed in Section 2.2 and the existing body of theoretical work in the area.
Moreover, we believe that we could easily formulate a proof for the claims that liveness is
not guaranteed by our simulation (Section 2.2.2), and that our ASC model can generate any
Byzantine behavior (Section 2.3.2). Finally, a further contribution would be proving that fault
diversity is a necessary property to locally achieve error isolation.
3 ENCODED PROCESSES∗
∗The contents of Sections 3.4 and 3.5 in this chapter first appeared at LADC ’13 [BWF13].
One of the known problems with AN-encoding and other error-detection techniques is the lack
of a formalism [Per+07]. In this chapter, we take the first steps in modeling and proving
AN-encoding correct in a formal framework for distributed systems. In Section 3.2, we first
define a simplified AN-encoding technique that only supports a small instruction set. We call
the technique Perfect AN-encoding, or PAN-encoding. We then refine the ASC fault model
with assumptions specific for PAN-encoding, and prove the technique correct. In Section 3.3,
we estimate the coverage of our assumptions, pinpoint the formalization limitations and diffi-
culties, and discuss existing approaches to increase the assumption coverage.
Despite requiring some strong assumptions, AN-encoding turns out to be very effective
in practice. In Section 3.4, we design and implement a real framework to build distributed
systems capable of automatically encoding applications developed on top. Our framework
extends the scope of AN-encoding to distributed systems in a systematic way. Moreover,
a practical issue not considered in previous work is the excessive network traffic overhead
caused by sending encoded messages out. We propose a simple bandwidth optimization that
reduces the network overhead to a constant 8 bytes per message.
In Section 3.5, we implement two well-known algorithms in our framework: the Highly-
available Leader Election Service by Fetzer and Cristian [FC99a]; and the multi-instance Paxos
by Lamport [Lam98], which is widely used in replicated systems [Bol+11; Bur06]. We then
evaluate them experimentally. Our results illustrate the possible trade-off between fault cov-
erage and CPU utilization. Encoded Paxos variants show virtually no network overhead: they
reach the same request throughput as the native variant with 5 acceptors at the cost of extra
CPU cycles and higher response time. Under a load between 1 k and 5 k req/s, for example,
encoded variants of Paxos provide response times varying from 20 to 105 ms; an overhead of
at least 4 times in relation to the native variant. The encoded variants of the leader election
provide excellent election times, while consuming at most 11% more CPU cycles. Finally, our
fault injection results suggest that AN-encoding can successfully guarantee error isolation: an
encoded Paxos proposer has its probability of undetected errors (given that a failure occurred)
reduced by nearly two orders of magnitude, from 16% to about 0.34%.
In Section 3.6, we discuss additional related work relevant to the topics in this chapter. We
conclude this chapter with an outlook and future work suggestions in Section 3.7.
3.1 BACKGROUND
Arithmetic error codes [Avi71] are a family of error codes often used to detect hardware errors.
Codes of this family have an attractive property over ordinary error-detection codes such as
CRC: Arithmetic operations preserve arithmetic error codes; in other words, arithmetic error
codes allow for the computation with the encoded data. Moreover, these codes do not
assume an error rate and can even detect permanent faults.
Arithmetic error codes were initially implemented in fault-tolerant hardware for safety-critical
applications; for example, the STAR computer [Avi+71] was designed for spacecraft and
implemented residue codes; the Vital Coded Processors [For89], used in automatic transportation
systems, employed AN codes. Software-based implementations of arithmetic error codes
have also been proposed and evaluated [RCA07; FSS09; Ulb+12].
All software-based implementations we are aware of employ AN codes exclusively; hence,
we restrict our focus to these codes. In this context, AN-encoding is the procedure of trans-
forming a native program p into an AN-encoded program pe by using, for example, a compiler.
The transformation encodes the value of each state variable x of program p by multiplying its
original functional value xf with a constant A defined at compile time. The resulting encoded
value of x is denoted xc. The domain of an encoded variable x is divided into a few code
values (multiples of A) and many noncode values (not multiples of A).
One can easily check whether xc is a code value or not by applying the modulo operation on
xc with A. If the result is zero, then the value is code, otherwise the value is noncode. The
multiplication by A works as a “randomization” of the code values. Ideally, if a fault flips bits
of xc, its value is no longer a multiple of A with a high probability, approximately 1 − 1/A [For89].
[Figure 3.1: Encoded operations and errors. Legend: domain of values; valid code value;
fault-free operation; faulty operation; read/write code value; read/write noncode value.]
The encoding transformation also replaces each operation of the native program p with an
encoded operation in program pe. Encoded operations take encoded values as arguments
and produce encoded values as output. Ideally, encoded operations have the three properties
illustrated in Figure 3.1:
1. if operands are code values and no error occurs, then the operation result is a code
value;
2. if at least one operand is noncode, then the operation result is a noncode value;
3. if the operator is faulty or exchanged, then the operation result is a noncode value.
As an example of how the code is preserved, consider an encoded addition. If A = 7, then
instead of adding the two values 2 and 3 the encoded program adds 14 and 21. The result
35 is again a multiple of A because xc + yc = xf · A + yf · A = (xf + yf ) · A. If the addition is faulty
or one of the arguments is noncode, the result is noncode with high probability.
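The addition example can be replayed in a few lines of Python. This is only an illustrative sketch: the helper names encode, is_code, and decode are assumptions made for this example and do not come from any encoding compiler.

```python
A = 7  # encoding constant, chosen to match the example in the text


def encode(x_f: int) -> int:
    """Return the code value x_c = x_f * A of a functional value x_f."""
    return x_f * A


def is_code(x_c: int) -> bool:
    """A value is a code value if and only if it is a multiple of A."""
    return x_c % A == 0


def decode(x_c: int) -> int:
    """Recover the functional value; refuse to decode noncode values."""
    if not is_code(x_c):
        raise ValueError("noncode value")
    return x_c // A


x_c, y_c = encode(2), encode(3)   # 14 and 21
z_c = x_c + y_c                   # 35, computed on encoded data only

# Simulate a fault that flips one bit of x_c before the addition.
corrupted_x_c = x_c ^ 0b100       # 14 becomes 10, which is noncode
faulty_sum = corrupted_x_c + y_c  # 31, which is not a multiple of 7
```

Here is_code(z_c) holds and decode(z_c) yields 5, whereas the sum computed from the corrupted operand fails the check. For other bit patterns and other values of A the corrupted value may still be a multiple of A, which is why detection is only probabilistic.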
AN codes cannot detect all symptoms described by Forin (see symptoms in Section 2.3.2,
Page 29). Therefore, Forin and others propose several heuristics to improve the coverage of
AN codes (which we discuss in Section 3.3). In the next section, we formalize a simplified,
pure AN-encoding in the ASC model and precisely define the fault assumptions required to
guarantee error isolation.
3.2 PERFECT AN-ENCODING
To prove that a hardening algorithm is correct, we have to show that error isolation holds for all
possible executions. The challenge of showing AN-encoding is correct is in finding a reason-
able set of fault assumptions that make AN-encoding behave deterministically with respect to
all possible faults. In this section, we propose Perfect AN-encoding: an AN-encoding transfor-
mation for a simple yet relevant instruction set. Perfect AN-encoding, or simply PAN-encoding,
encompasses refinements to the ASC model, fault assumptions, and encoding transformation
rules. Under the fault assumptions asserted here, PAN-encoding never fails. In Section 3.3,
we discuss the probability of the assumptions failing and, consequently, of error isolation
being violated.
3.2.1 MODEL REFINEMENTS
PROCESS MODEL
We start refining the process model defined in Section 2.3.1 by specifying the instructions
available in the underlying hardware. We partially borrow the instruction set from the “Fault-
tolerant Typed Assembly Language” (TALFT) [Per+07].
The set of variables used in the original program is called Vp and the set of additional variables
introduced in the encoded program is called Ve, such that Vp ∩ Ve = { } and Vp ∪ Ve ⊆ V . The
variables in Vp are determined by the programmer, but for our needs we always use four
variables, following the convention in TALFT: rz for conditional branches, rd for addresses
in branches and for the result of operations, and rs and rt for other source operands. The
variables in Ve are va, vb, vc, vd, ve, vf, vg, v0, v1, v2, v3, v4, v5, and v6. The domain D of all
variables in V is the set of natural numbers.
We now define the PAN instruction set.
Definition 3.1 (PAN instruction set, non-branching operations) A program p can contain
any number of non-branching instructions using the following arithmetic and move operations.
• The addition and multiplication operations increment the program counter pc and add or
multiply the values of the operands rs and rt writing the resulting value into rd:
Add(rd, rs, rt) ∆= pc′ = pc + 1 ∧ s′[rd] = s[rs] + s[rt]
Mul(rd, rs, rt) ∆= pc′ = pc + 1 ∧ s′[rd] = s[rs] * s[rt]
• The modulo operation increments the program counter pc and calculates the modulo of
rs with rt, writing the remainder value into rd:
Mod(rd, rs, rt) ∆= pc′ = pc + 1 ∧ s′[rd] = s[rs] % s[rt]
• The subtraction operation increments the program counter pc and subtracts rt from rs,
writing the result into rd:
Sub(rd, rs, rt) ∆=
∨ s[rs] ≥ s[rt] ∧ pc′ = pc + 1 ∧ s′[rd] = s[rs] − s[rt]
∨ s[rs] < s[rt] ∧ pc′ = pc + 1 ∧ s′[rd] = 0
If the value of rt is greater than the value of rs, the result is 0 (there is no wrap-around).
• The division operation either increments the program counter pc and divides rs by rt,
writing the result into rd, if the value of rt is different from 0; or aborts the process if
the value of rt is 0 by setting pc to “halt”:
Div (rd, rs, rt) ∆=
∨ s[rt] ≠ 0 ∧ pc′ = pc + 1 ∧ s′[rd] = s[rs] / s[rt]
∨ s[rt] = 0 ∧ pc′ = “halt”
• The move operation increments the program counter pc and writes a constant operand C
into rd:
Mov (rd, C) ∆= pc′ = pc + 1 ∧ s′[rd] = C
• The load operation Ld(rd, rs) loads the value of the variable pointed by s[rs] into the
variable rd; whereas the store operation St(rd, rs) stores the value of rs into the variable
pointed by s[rd].
Ld(rd, rs) ∆= pc′ = pc + 1 ∧ ∃ v ∈ V : v = s[rs] ∧ s′[rd] = s[v ]
St(rd, rs) ∆= pc′ = pc + 1 ∧ ∃ v ∈ V : v = s[rd] ∧ s′[v ] = s[rs]
Definition 3.2 (PAN instruction set, branching operations) A program p can contain any
number of branching instructions using the following branching operations.
• The unconditional branch Jmp unconditionally moves the value of rd into the program
counter pc:
Jmp(rd) ∆= pc′ = s[rd]
• The conditional branch Bz moves the value of rd into the program counter pc if the value
of rz is 0, otherwise it increments pc:
Bz(rz, rd) ∆=
∨ s[rz] = 0 ∧ pc′ = s[rd]
∨ s[rz] ≠ 0 ∧ pc′ = pc + 1
• The abort branch sets the pc to “halt”, forcing the process to halt:
Abort ∆= pc′ = “halt”
The process model presented here has two rather unusual definitions. First, the domain
D of the variables in V is the set of natural numbers. Second, our subtraction operation
implements saturation arithmetic, i.e., x − y = 0 if x ≤ y . These two assumptions simplify
our presentation and allow us to initially ignore the consequences of the modular arithmetic
of real processors, specifically, the “wrap-around” effect on overflows. In Section 3.3, we
review these assumptions.
FAULT MODEL
PAN-encoding requires two refinements of the ASC fault model: (1) a definition of the Valid(v )
predicate for fault diversity (Assumption 2.2); and (2) additional assumptions necessary to
prove PAN-encoding correct.
Recall that the set of variables Vd is used in the definition of Assumption 2.2. In PAN-
encoding, Vd = Vp ∪ Ve. Also, recall that the set of input and output variables, Vi and Vo, are
subsets of Vp. The fault diversity assumption is then instantiated with the following Valid(v )
predicate.
Definition 3.3 (PAN Valid(v ) predicate) A variable v ∈ Vd is valid at state s in the execution
of a PAN-encoded program pe run by process π if and only if the value of v modulo constant
A is 0 at state s. We define
Valid(v ) ∆= s[v ]%A = 0.
To describe whether variable v is valid in the state after a state transition is taken, i.e., in
the next state s′, we also define
Valid ′(v ) ∆= s′[v ]%A = 0.
Assumption 2.2 and Definition 3.3 are the first step in making encoding deterministic with
respect to faults. As long as the assumption holds, no ASC fault can corrupt a variable in Vp
or Ve such that the resulting value is code. Since “codeness” can be expressed as a Valid(v )
predicate, we interchangeably use the terms code and valid as well as noncode and invalid
throughout this chapter.
AN-encoding is a mechanism to detect errors in the data-flow only; hence, it cannot cope
with most faults affecting the control flow. Therefore, we restrict ASC faults such that they
do not modify the program counter in arbitrary ways.
Assumption 3.1 (No program-counter corruption) ASC faults do not corrupt the program
counter pc.
NPC ∆= pc′ = pc
Assumptions 2.2 and 3.1 specifically restrict the Fault steps of the model. The former
constrains the format of ASC faults – only those Fault steps that result in noncode values
are allowed – and the latter excludes any possible direct corruption of the program counter
altogether.
In addition to restricting the ASC faults, we have to restrict the possibility of error propagation
in the Next steps. In the following assumptions, let i = p[pc] be the next instruction to be
performed by the program; let SO be the set of source operands and TO be the set of
target operands of instruction i, respectively; and let Opcode be the operator identifier of the
instruction i.
Assumption 3.2 (Noncodeness propagation) If instruction i is neither a branching instruc-
tion nor load or store, and some source operand of i is noncode, then all target operands are
noncode. Formally,
NCP ∆=
Opcode ∉ {“Jmp”, “Bz”, “Ld”, “St”} ∧ ∃ v ∈ SO : ¬Valid(v ) =⇒ ∀w ∈ TO : ¬Valid ′(w )
Assumption 3.2 guarantees that instructions propagate the “noncodeness” to the target
operands, i.e., operations in our instruction set guarantee that if operands are noncode (i.e.,
invalid), the result is noncode. Consider the example of an instruction Add(rd, rs, rt). The
source and target operands are, respectively, SO = {rs, rt} and TO = {rd}. Now assume the
encoding constant A = 11 and one source operand to be s[rs] = 22. Moreover, assume
the other source operand to be invalid, i.e., ¬Valid(rt). Once the operation has been executed,
the result in rd is invalid by basic arithmetic rules and the definition of Valid(v ):
s′[rd] % A = (22 + s[rt]) % A = (2 * A + s[rt]) % A = s[rt] % A ≠ 0
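For addition, this argument can also be checked exhaustively over a small range of operand values. The following sketch only illustrates the Add case under the values of the example (A = 11 and a code operand of 22); the function name and the tested range are arbitrary choices, and the sketch says nothing about the other operations covered by Assumption 3.2.

```python
A = 11  # encoding constant from the example


def valid(value: int) -> bool:
    """Valid(v): the value is a multiple of A."""
    return value % A == 0


def add_propagates_noncodeness(code_operand: int, limit: int) -> bool:
    """Check that code_operand + rt is noncode for every noncode rt < limit."""
    assert valid(code_operand)
    return all(
        not valid(code_operand + rt)
        for rt in range(limit)
        if not valid(rt)
    )


result = add_propagates_noncodeness(22, 1000)
```

The check succeeds because adding a multiple of A never changes the remainder modulo A.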
As already mentioned, AN-encoding cannot cope with control-flow faults. Control-flow faults
not only directly modify the pc; they might also corrupt variables used in branching instruc-
tions. The following assumption guarantees that branching operations crash the process if
the destination variable of the branching instruction is corrupt. The reasoning behind the as-
sumption is that, since faults are random, there is a very high likelihood that the address in
the corrupt variable points to some region of memory unmapped by the operating system,
resulting in a segmentation fault.
Assumption 3.3 (Benign branch corruption) If instruction i is a Jmp(v ) instruction, and v is
corrupt at s, then the process crashes. If instruction i is a Bz(z, v ) instruction, variable z has
value 0, and v is corrupt at s, then the process crashes. Formally,
BBC ∆=
∧Opcode = “Jmp” ∧ Corrupt(v ) =⇒ pc′ = “halt”
∧Opcode = “Bz” ∧ Corrupt(v ) ∧ s[z] = 0 =⇒ pc′ = “halt”
Note that Assumption 3.3 assumes a Next step with a Bz instruction crashes the process
only if the destination variable v is corrupt and the condition variable z is zero. Otherwise, the
program counter pc is only incremented regardless of whether v is corrupt or not.
The next assumption restricts the error propagation of load or store instructions using corrupt
pointers. The pointers, i.e., variables holding the index of other variables, are likely to point
to unmapped regions of memory if corrupted by a fault. Hence, if a load or store instruction
executes using corrupt pointers, then the process is likely to crash.
Assumption 3.4 (Benign pointer corruption) If instruction i is a Ld(vd, v) or St(v, vs)
instruction and the variable v that points to the actual variable to be loaded or stored is corrupt at s,
then the process crashes at s′. Formally,
BPC ∆= Opcode ∈ {“Ld”, “St”} ∧ Corrupt(v ) =⇒ pc′ = “halt”
We discuss the coverage of these assumptions in Section 3.3.
3.2.2 PAN-ENCODING RULES
We now define the encoding rules to be applied on a native program p. The rules substitute
and add new instructions to p creating a new program pe. We implicitly adjust all constants
pointing to instruction indices such that they consider the additional instructions added to
p. All constants are encoded by multiplying their value with A. We present the rules in
increasing order of complexity. For simplicity, we require modulo instructions to be replaced
with a software implementation using the other operations¹; hence, we do not have to provide
any transformation rule for modulo instructions.
Rule 3.1 (PAN-encoded addition, subtraction, and move) Addition, subtraction and move
instructions in p are kept unmodified.
Rule 3.2 (PAN-encoded multiplication) Every multiplication Mul(rd, rs, rt) in p is substituted
by three instructions:
+0 Mul(va, rs, rt)
+1 Mov (vb, A)
+2 Div (rd, va, vb)
The values on the left margin of the rule (i.e., +0, +1, +2) represent how the indices of the
native program p have to be modified. The first instruction is a substitution, hence it adds 0 to
the indices.
Initially, the values of rs and rt are multiplied and stored in the temporary variable va. Let
x and y be the functional values of rs and rt, i.e., s[rs] = x * A and s[rt] = y * A. Since
(x * A) * (y * A) is different from the expected value (x * y ) * A,
one needs to correct the result by dividing it by A. Hence, the second and third instructions
respectively move the constant A into a temporary register vb and divide va by A, writing the
final result into rd.
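The effect of Rule 3.2 on the values can be sketched as follows. The function is an illustration of the three-instruction sequence on plain Python integers; it is not the output of any encoding compiler, and the choice A = 11 is arbitrary.

```python
A = 11  # arbitrary encoding constant for this sketch


def encoded_mul(rs_c: int, rt_c: int) -> int:
    """Multiply two encoded values and return an encoded product."""
    va = rs_c * rt_c   # +0 Mul(va, rs, rt): equals (x * y) * A * A
    vb = A             # +1 Mov(vb, A)
    return va // vb    # +2 Div(rd, va, vb): equals (x * y) * A


product_c = encoded_mul(3 * A, 4 * A)
```

For functional operands 3 and 4 the intermediate value va is 12 · A · A, and the final division by A leaves the encoded product 12 · A.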
The rule for transforming unconditional branches is quite simple. The constant A is used to
divide the encoded address in variable rd, storing the result into variable vb as follows.
Rule 3.3 (PAN-encoded unconditional branch) Every unconditional branch Jmp(rd) in p is
substituted by 3 instructions:
+0 Mov (va, A)
+1 Div (vb, rd, va)
+2 Jmp(vb)
¹ The modulo N of a value x is the remainder of the division of x by N, calculated as
x − N · ⌊x/N⌋. The floor function is directly implemented by integer division.
Note that vb contains an unencoded value at Line +2. In contrast to program variables
in Vp, which are always kept encoded, auxiliary variables in Ve may contain unencoded (and
consequently invalid) values even though the value is correct with respect to a reference
execution. In general, that only occurs for (conditional and unconditional) branch operations
and load/store operations. Branch and load/store operations can only perform a correct branch
or access the correct memory location using unencoded values since the processor is not
aware of encoding. Therefore, these operations require the address operand to be decoded
in an auxiliary variable before being used. Note that decoding a variable as in Line +1 does
not violate fault diversity because fault diversity only restricts Fault steps.
The PAN-encoded load and store operations use rules similar to that of the unconditional
branch, as follows.
Rule 3.4 (PAN-encoded load operation) Every load Ld(rd, rs) in p is substituted by
3 instructions:
+0 Mov (va, A)
+1 Div (vb, rs, va)
+2 Ld(rd, vb)
Rule 3.5 (PAN-encoded store operation) Every store St(rd, rs) in p is substituted by
3 instructions:
+0 Mov (va, A)
+1 Div (vb, rd, va)
+2 St(vb, rs)
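Rules 3.4 and 3.5 can be illustrated with a dictionary standing in for the process state. In this sketch, which is an assumption of the example rather than part of the model, register names are string keys and addressable variables are natural-number keys; pointers are stored encoded, while the stored data stays encoded throughout.

```python
A = 11  # arbitrary encoding constant for this sketch


def encoded_load(s: dict, rd: str, rs: str) -> None:
    """Rule 3.4: decode the pointer in rs, then load the pointed variable."""
    va = A               # +0 Mov(va, A)
    vb = s[rs] // va     # +1 Div(vb, rs, va): unencoded address
    s[rd] = s[vb]        # +2 Ld(rd, vb)


def encoded_store(s: dict, rd: str, rs: str) -> None:
    """Rule 3.5: decode the pointer in rd, then store rs at that address."""
    va = A               # +0 Mov(va, A)
    vb = s[rd] // va     # +1 Div(vb, rd, va): unencoded address
    s[vb] = s[rs]        # +2 St(vb, rs)


state = {"ptr": 5 * A, "val": 0, 5: 3 * A}
encoded_load(state, "val", "ptr")          # val receives the encoded 3
state["ptr"] = 6 * A
encoded_store(state, "ptr", "val")         # variable 6 receives the encoded 3
```

Note that the integer division decodes a noncode pointer to some address without complaint; the sketch does not model the crash that Assumption 3.4 postulates for corrupt pointers.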
Slightly more sophisticated is the rule for transforming conditional branches. Before we
proceed to conditional branches, we first define an important auxiliary sequence of operations:
the validity check.
Rule 3.6 (PAN-encoded validity check) The validity check Check(v ), with v ∈ Vd , aborts the
process if the value of variable v is noncode, otherwise it continues the execution normally.
This subrule can be used by other rules when needed. To avoid variable clashes, ve, vf , and
vg are not used by any other rule.
+1 Mov (ve, A)
+2 Mod(vf , v, ve)
+3 Mov (vg, L)
+4 Bz(vf , vg)
+5 Abort
L: +6                  (reached if s[vf ] = 0, i.e., s[v ] % A = 0)
The validity check is not a substitution, therefore, it only adds instructions starting the count
with +1. The check performs a modulo operation of the value of v with the constant A and
aborts if the result is not 0. The label L at Line +6 represents the value of program counter
in the instruction just after the validity check.
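In a high-level language, the validity check amounts to a single guarded abort. The sketch below models the Abort instruction with a Python exception; the exception name ProcessAborted is an assumption of this example.

```python
class ProcessAborted(Exception):
    """Stands in for setting pc to "halt" in this sketch."""


def check(value: int, A: int) -> None:
    """Abort if value is noncode, otherwise continue normally."""
    vf = value % A           # +1 Mov(ve, A); +2 Mod(vf, v, ve)
    if vf != 0:              # +3 Mov(vg, L); +4 Bz(vf, vg) not taken
        raise ProcessAborted(f"noncode value {value}")  # +5 Abort
    # L: +6, execution continues after the check


def passes_check(value: int, A: int) -> bool:
    """Convenience wrapper reporting whether check() lets the value through."""
    try:
        check(value, A)
    except ProcessAborted:
        return False
    return True
```

For example, with A = 11 the value 22 passes the check, while 23 aborts the process.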
Rule 3.7 (PAN-encoded conditional branch) Every conditional branch Bz(rz, rd) in p is
substituted by 8 instructions:
+0 Mov (va, A)
+1 Mov (v1, L1)
+2 Mov (v2, L2)
+3 Bz(rz, v1)
+4 Check(rz)
+5 Jmp(v2)
L1: +6 Div (vb, rd, va)        (s[rz] = 0)
+7 Jmp(vb)                     (jump to s[rd] / A)
L2: +8                         (s[rz] ≠ 0 and rz is valid)
function div(x, y)
    z ← 0
    if y = 0 then
        raise division-by-zero exception
    while x ≥ y do
        z ← z + 1
        x ← x − y
    return z
Algorithm 3.1: Software-implemented unsigned integer division
A conditional branch Bz(rz, rd) jumps to the address in rd if rz is 0. The encoded conditional
branch has both operands encoded. The encoded value of 0 is also 0, hence, the encoded
conditional branch first checks whether rz is 0 and in the positive case sets the pc to the
trampoline at L1 using variable v1. This trampoline (Line +6) is necessary since the destination
address rd is encoded – and the processor is not aware of AN codes. The trampoline decodes
the address in rd using the constant A stored in va, and then branches using an unconditional
branch into the target address.
There are two reasons why rz can be different from 0 at Line +3. First, rz can be correct, so the
execution should continue to the next instruction at L2. Second, rz can be incorrect (and
consequently invalid). We have to check rz (Line +4) and abort the process if rz is invalid.
Otherwise, if the correct value of rz is 0 and a fault corrupts rz before Line +3, the execution
would incorrectly proceed to the next instruction at L2. Recall that all invalid values are
non-zero because zero is a multiple of A.
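The control flow of Rule 3.7 can be summarized by a function that returns the next program counter. This is a sketch under assumptions: the state is a dictionary, fall_through stands for the index of label L2, and an aborted process is modeled by the hypothetical exception ProcessAborted.

```python
class ProcessAborted(Exception):
    """Stands in for setting pc to "halt" in this sketch."""


def encoded_bz(s: dict, rz: str, rd: str, fall_through: int, A: int) -> int:
    """Return the next pc of an encoded Bz(rz, rd)."""
    if s[rz] == 0:                 # +3 Bz(rz, v1): encoded 0 is also 0
        return s[rd] // A          # L1: +6 Div(vb, rd, va); +7 Jmp(vb)
    if s[rz] % A != 0:             # +4 Check(rz)
        raise ProcessAborted("condition variable is noncode")
    return fall_through            # +5 Jmp(v2), continue at L2


A = 11
taken = encoded_bz({"rz": 0, "rd": 40 * A}, "rz", "rd", fall_through=9, A=A)
not_taken = encoded_bz({"rz": 2 * A, "rd": 40 * A}, "rz", "rd", fall_through=9, A=A)
```

With a condition value of 0 the branch goes to the decoded address 40; with the valid non-zero value 2 · A it falls through to index 9; a noncode condition value such as 2 · A + 1 aborts instead of silently falling through.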
Division is a more elaborate encoded operation. Like multiplication, division requires
some correction work. Simply dividing xc by yc results in an unencoded value because
xc / yc = (xf * A) / (yf * A) = xf / yf . However, the desired result of a division is
(xf / yf ) * A. Obviously, multiplying the result
of the division with the constant A yields the desired result, but a fault occurring between
both operations may undetectably corrupt the unencoded value xf / yf . A better solution is to first
multiply the dividend with A and then perform the division, because that opens no window
of vulnerability. Unfortunately, this approach results in a non-remainder-free calculation, which
has to be corrected. Remember that the Div operation implements an integer division, e.g.,
⌊3/2⌋ = 1. If the division 3/2 were encoded with A = 5 using the approach above, it would first
multiply the dividend 15 with A and apply the divisor 10, but the result would be different
from the expected 5:
⌊((3 * 5) * 5) / (2 * 5)⌋ = ⌊15/2⌋ = 7 ≠ 1 * 5
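The remainder problem of this example can be reproduced with ordinary integer arithmetic. The function name naive_encoded_div is an assumption of this sketch and denotes the "multiply the dividend first" approach described above.

```python
def naive_encoded_div(x_c: int, y_c: int, A: int) -> int:
    """Multiply the encoded dividend by A, then apply integer division."""
    return (x_c * A) // y_c


A = 5
x_f, y_f = 3, 2
x_c, y_c = x_f * A, y_f * A              # 15 and 10

naive = naive_encoded_div(x_c, y_c, A)   # floor(75 / 10)
expected = (x_f // y_f) * A              # floor(3 / 2) * 5
```

The naive result is 7 instead of the expected 5, and 7 is not even a multiple of A, so a correct execution would produce a noncode value.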
Schiffel proposes two solutions for the remainder problem of divisions. The first solution is
by transforming divisions into software-based divisions, i.e., by performing a division without
using the Div operation. Algorithm 3.1 depicts a software division in pseudo-code. Since
emulating divisions in software is slow, Schiffel proposes a second, faster approach using
the Div operation. Nevertheless, this second approach cannot be proved correct under the
assumptions defined in this work. Specifically, the fast division can violate Assumption 3.2
with a fault corrupting the remainder correction variable (see Schiffel’s dissertation [Sch11],
Listing 4.9, Page 65).
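For reference, a direct Python transcription of Algorithm 3.1 might look as follows. It operates on unencoded natural numbers and is meant only as a readable counterpart to the pseudo-code; the function name software_div is chosen for this sketch.

```python
def software_div(x: int, y: int) -> int:
    """Unsigned integer division by repeated subtraction (Algorithm 3.1)."""
    if y == 0:
        raise ZeroDivisionError("division by zero")
    z = 0
    while x >= y:
        z += 1      # one more multiple of y fits into x
        x -= y
    return z
```

The running time is linear in the quotient, which illustrates why emulating divisions in software is slow compared to a hardware Div instruction.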
To avoid making our assumptions even stronger, we opt for a division rule based on
the slow but safe software-based division. The next rule is the encoded implementation of
Algorithm 3.1 using the instruction set defined in Section 3.2.1. Note that the compiler used
in Section 3.4 implements the fast but unsafe division instead.
Rule 3.8 (PAN-encoded division) Every division Div (rd, rs, rt) in p is substituted by 25
instructions.
+0 Mov (va, A)
+1 Mov (v0, 0)
+2 Mov (v1, L1)
+3 Mov (v2, L2)
+4 Mov (v3, L3)
+5 Mov (v4, L4)
+6 Mov (v5, L5)
+7 Mov (v6, L6)
+8 Bz(rt, v1)
+9 Check(rt)
+10 Jmp(v2)
L1: +11 Abort                  (if s[rt] = 0)
L2: +12 Mov (vb, 0)            (if s[rt] ≠ 0)
+13 Add(vc, rs, v0)
L3: +14 Sub(vd, vc, rt)
+15 Bz(vd, v4)
+16 Check(vd)
+17 Add(vb, vb, va)
+18 Add(vc, vd, v0)
+19 Jmp(v3)                    (loop while s[vd] ≠ 0)
L4: +20 Sub(vd, rt, vc)        (if s[vd] = 0, leave loop)
+21 Bz(vd, v5)
+22 Check(vd)
+23 Jmp(v6)
L5: +24 Add(vb, vb, va)        (if s[vc] = s[rt], adjust result)
L6: +25 Add(rd, vb, v0)        (if s[vc] ≠ s[rt], no adjustment)
The first eight instructions just initialize auxiliary variables with constants: va with constant
A; v0 with constant 0; and v1 to v6 with the addresses of the respective labeled instructions.
These variables are not modified by any other instructions in the encoded division.
Following the specification of the native Div operation (Definition 3.1), the first branch aborts
the process if the divisor rt is 0. Otherwise, the process jumps to the instruction with label L2.
The following two instructions initialize the division loop. vb is the counter holding the result
of division (the quotient) and is initialized with 0 (i.e., z in the pseudo-code). vc initially holds
a copy of rs to be modified in each iteration of the loop (i.e., x in the pseudo-code). We use
the addition with 0 to copy the value from rs into vc. The subtraction (Line +14) subtracts the
divisor rt from vc, and stores the result into vd . We use two variables, vc and vd , to represent
x because we need the last value of vc once we leave the loop, as explained below.
In our simple instruction set we do not have the comparison operation less-than, but we
can implement the while loop since our subtraction truncates at 0. If s[rt] is less than s[vc]
at Line +15, then vd is different from 0 and the loop is taken by incrementing pc.
In the body of the loop (Lines +16 to +19), vd ’s validity is checked, otherwise a fault could
force another iteration of the loop, and no trace of the error would be detected if eventually
the value of vd is less than rt – this check is similar to the check at Line +4 of Rule 3.7. If
the check succeeds, then the quotient vb is incremented with A at Line +17. Remember that
A represents the number 1 encoded. Finally, the value of vd is copied into vc, and the loop
starts again at label L3.
Once s[rt] is greater than or equal to s[vc], the value in s[vd ] is 0 at Line +15. The following
instruction (Bz) sets the pc to L4, leaving the loop. Because the loop is left even when s[rt] is
equal to s[vc], we might have to adjust the quotient by 1. There are three cases at Line +20:
1. s[vc] ≠ 0 and s[vc] < s[rt]: the dividend is not a multiple of the divisor; division is
complete.
2. s[vc] ≠ 0 and s[vc] = s[rt]: the dividend is a multiple of the divisor; we should add 1 to
the quotient.
3. s[vc] = 0: the dividend rs was zero and the division is complete with quotient 0.
Note that the case s[vc] ≠ 0 and s[vc] > s[rt] does not exist. Such a case would imply that s[vd ]
is different from 0, which is a contradiction because the loop was left.
To cope with the second case, we have to test whether s[vc] = s[rt] and add 1 to the
quotient. Finally, the quotient is copied into the result variable rd.
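The structure of Rule 3.8 can be followed more easily in a structured language. The sketch below mirrors the rule on plain Python integers under stated assumptions: sat_sub models the saturating Sub of Definition 3.1, check models Rule 3.6, and the exception ProcessAborted stands for an aborted process. It is an illustration of the rule, not a verified implementation.

```python
class ProcessAborted(Exception):
    """Stands in for setting pc to "halt" in this sketch."""


def sat_sub(x: int, y: int) -> int:
    """Saturating subtraction on natural numbers."""
    return x - y if x >= y else 0


def check(value: int, A: int) -> None:
    """Validity check: abort on noncode values."""
    if value % A != 0:
        raise ProcessAborted(f"noncode value {value}")


def encoded_div(rs: int, rt: int, A: int) -> int:
    """Divide encoded rs by encoded rt and return the encoded quotient."""
    if rt == 0:                    # +8 Bz(rt, v1)
        raise ProcessAborted("division by zero")  # L1: +11 Abort
    check(rt, A)                   # +9
    vb = 0                         # L2: +12 quotient, encoded
    vc = rs                        # +13 running dividend
    while True:
        vd = sat_sub(vc, rt)       # L3: +14
        if vd == 0:                # +15 leave loop when vc <= rt
            break
        check(vd, A)               # +16
        vb += A                    # +17 add the encoded 1
        vc = vd                    # +18, +19 next iteration
    vd = sat_sub(rt, vc)           # L4: +20
    if vd == 0:                    # +21 vc equals rt
        vb += A                    # L5: +24 adjust quotient
    else:
        check(vd, A)               # +22, +23
    return vb                      # L6: +25
```

For example, with A = 5 the call encoded_div(3 * 5, 2 * 5, 5) returns 5, the encoded quotient 1, which is the value the naive approach fails to produce.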
To complete the PAN-encoding, the encoded program pe has also to check that the input
message is indeed formed with code values.
Rule 3.9 (PAN-encoding input message check) A PAN-encoded program pe has as first
instructions a loop over every variable v ∈ Vi such that pc′ = N + 1 if any variable v is
noncode/invalid. For each variable v ∈ Vi , the following sequence of instructions is added to the
beginning of pe.
+1 Mov (va, A)
+2 Mov (vb, L)
+3 Mov (vc, N + 1)
+4 Mod(vd, v, va)
+5 Bz(vd, vb)
+6 Jmp(vc)             (jump to N + 1, i.e., skip message)
L: +7                  (reached if s[vd ] = 0, i.e., s[v ] % A = 0)
Note that setting the pc to N + 1 in order to skip a message is only a modeling artifact. If a
real implementation did so, then an old but valid message or an invalid message would
be sent. An alternative would be to set pc to 0, but then the traversal would never terminate
from a modeling perspective.
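At the level of a message handler, Rule 3.9 corresponds to filtering incoming messages before any processing takes place. The sketch below assumes that a message is a list of encoded integers and that handler is some application callback; both are illustrative assumptions, and skipping is modeled by returning None.

```python
def input_is_valid(message: list, A: int) -> bool:
    """Return True only if every input variable holds a code value."""
    for value in message:
        if value % A != 0:     # +4 Mod(vd, v, va); +5 Bz not taken
            return False       # +6 Jmp(vc): skip the message
    return True


def process_message(message: list, A: int, handler):
    """Drop messages containing noncode values, otherwise run the handler."""
    if not input_is_valid(message, A):
        return None
    return handler(message)


A = 11
accepted = process_message([2 * A, 3 * A], A, sum)
dropped = process_message([2 * A, 3 * A + 1], A, sum)
```

In this sketch the first message is handled and yields the encoded sum 5 · A, whereas the second message is dropped because one of its variables is noncode.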
3.2.3 PAN-ENCODING CORRECTNESS
We are now ready to prove that a process π running an encoded program pe guarantees error
isolation. Our proof is based on the inductive invariant defined below. The core of our proof
shows that for any state s of some traversal of π running pe in which the inductive invariant
holds, any Next or Fault step taken maintains the inductive invariant at state s′.
The possible Next and Fault transitions depend on the sequence of instructions of program
pe. Since a program pe is composed only of instruction sequences given by Rules 3.1
to 3.9 by definition, we partition our proof into several lemmas, each of them showing that the
inductive invariant is maintained by the sequence of instructions of a transformation rule.
We then show that the inductive invariant holds at any traversal initial state and, hence, the
inductive invariant inductively holds for any traversal of π. Consequently, the inductive invariant
also holds for any execution of π running pe since executions are sequences of traversals.
We conclude the proof (Page 59) showing that the inductive invariant implies error isolation.
More precisely, we show the inductive invariant implies Properties 2.2, 2.3, and 2.4, which in
turn imply error isolation (Property 2.1).
INDUCTIVE INVARIANT
In order to define the inductive invariant, we make use of three auxiliary sets referring to
instructions of program pe:
IA contains the indices of all Abort instructions in pe, i.e., all instances of Line +5 of Rule 3.6,
and Line +11 of Rule 3.8.
IS contains the indices of all message-skip instructions in pe, i.e., all instances of Line +6
of Rule 3.9 (Jmp to N + 1 instructions).
IC contains the index of the first instruction of every validity check in pe, i.e., all instances
of Line +4 of Rule 3.7 and Lines +9, +16, and +22 of Rule 3.8.
These indices are important because once the process executes the instructions pointed to by
them, the traversal (potentially) terminates. By Definition 2.27, once pc is set to “halt” (e.g.,
when executing an Abort instruction), the traversal terminates by stuttering forever. Similarly,
once pc is set to N+1 and a message is sent, the traversal stutters forever. Finally, if pc
points to the first instruction of a validity check, then the traversal may also terminate (i.e.,
the process may abort) if the variable being checked is invalid.
In the following, if the program counter has a value of one of these sets and the traversal
is about to stutter forever, we say that the process π is about to halt. If pc is “halt” or N+1,
we say that the process π is halted.
We define the inductive invariant as follows.
Property 3.1 (PAN Inductive invariant) For a given state s, every variable v ∈ Vp \ {pc} is
either invalid or correct at s. If the program counter pc is correct, then every variable v ∈ Ve
is also either invalid or correct at s. If the program counter pc is incorrect, then the process
is halted or is about to halt and the variable vd ∈ Ve is either invalid or correct at s. Formally,
InductiveInvariant ≜
  ∧ ∀ v ∈ Vp \ {pc} : ¬Valid(v) ∨ Correct(v)
  ∧ Correct(pc) ⟹ ∀ v ∈ Ve : ¬Valid(v) ∨ Correct(v)
  ∧ ¬Correct(pc) ⟹ ∧ pc ∈ {“halt”, N + 1} ∪ IA ∪ IS ∪ IC
                    ∧ ¬Valid(vd) ∨ Correct(vd)
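The invariant can also be read as an executable predicate. The sketch below is only an illustration under stated assumptions: states are modeled as dictionaries, Valid(v) is taken to mean that s[v] is a multiple of A, Correct(v) is taken to mean that s[v] equals the value in a reference state r, and halting_pcs stands for the set {“halt”, N + 1} ∪ IA ∪ IS ∪ IC. All function and parameter names are hypothetical.

```python
def valid(s, v, a):
    """Assumed reading of Valid: the value of v is a multiple of A."""
    return s[v] % a == 0


def correct(s, r, v):
    """Assumed reading of Correct: v has the same value as in reference r."""
    return s[v] == r[v]


def inductive_invariant(s, r, vp, ve, halting_pcs, a):
    """Evaluate the three conjuncts of the inductive invariant at state s."""
    def invalid_or_correct(v):
        return (not valid(s, v, a)) or correct(s, r, v)

    # Conjunct 1: every program variable except pc is invalid or correct.
    if not all(invalid_or_correct(v) for v in vp if v != "pc"):
        return False
    # Conjunct 2: with a correct pc, every auxiliary variable is too.
    if correct(s, r, "pc"):
        return all(invalid_or_correct(v) for v in ve)
    # Conjunct 3: with an incorrect pc, the process is halted or about
    # to halt, and vd is invalid or correct.
    return s["pc"] in halting_pcs and invalid_or_correct("vd")
```

The predicate rejects exactly the dangerous situation: a variable that is valid (looks like a code value) but differs from the reference.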
We should make four observations at this point. First, variables in Ve are sometimes invalid
even in fault-free traversals, e.g., when performing an unconditional branch, the target address
has to be decoded and the variable holding this target address may not be a multiple of A,
i.e., invalid. The unencoded variables Ve are a requirement to define the PAN-encoding rules.
Second, the sets IA, IS, and IC are necessary for the definition of the inductive invariant
because we need to allow the program counter to deviate from the reference state r if an
error is detected. Although Assumption 3.1 disallows any faults directly corrupting the program
counter, the program counter in s may still deviate from the reference program counter in r
once an error is detected with Rule 3.6, Line +4, since the result of the check is non-zero
upon an error (in s) and zero otherwise (in r ). Similarly, the conditional branch Bz(z, d) may
incorrectly miss a branch to d if the operand z is corrupt (and non-zero). That is however
not an issue for a program transformed with our rules, because we check the validity of z
immediately after conditional branches (see Bz instructions in Rules 3.7 and 3.8).
Third, the last line of the inductive invariant treats vd differently from other variables in Ve.
Because a Bz(z, d) instruction may miss a branch in Rules 3.7 and 3.8 if z is invalid, we have
to check z’s validity. In most of such validity checks, z is some variable in Vp (e.g., rt or rz).
However, in the rule for encoded division we also have to use an auxiliary variable – vd . If vd
is invalid in a conditional branch, pc deviates from the reference. The state following Bz has
pc ∈ IC and vd has to be invalid (a fault cannot make vd correct again).
Fourth, in the cases that Bz misses a branch as described above, the program counter
assumes a value in IC. IC only contains the first instruction of each validity check because if
the program counter is equal to the first instruction of a validity check and the variable being
checked is invalid, then the process eventually crashes irrespective of how many faults occur
henceforth. We start by showing that.
Lemma 3.1 Assume that, at some state si , pc points to the first instruction of Check(v ) with
v ∈ Vp ∪ Ve and v is invalid. The process eventually crashes at state sf and for all w ∈ Vp, for
all s such that si ⪯ s ⪯ sf , s[w ] = si [w ] or w is invalid at s.
Proof: Faults may interleave with the execution of Check(v). Faults do not corrupt pc by
Assumption 3.1. If a fault occurs at some state s and corrupts a variable v ∈ Vp \ {pc}, then v
is invalid at s′. Let pc0 be the index of the instruction immediately before the first instruction
of Check(v ).
• pc0 + 1 (s = si): Mov (ve, A) writes A into ve, where ve ∉ Vp. pc′ = pc0 + 2 by definition
of Mov. v is invalid at s′ since v is invalid at s by assumption. For all w ∈ Vp \ {pc},
s′[w] = s[w] since ve ≠ w by definition of Rule 3.6 (disjoint set of variables).
• pc0 + 2: Mod(vf, v, ve) reads from ve and v and writes into vf, where vf ∉ Vp. pc′ = pc0 + 3
by definition of Mod. v is invalid at s by Step pc0 + 1 or by fault diversity (Assumption 2.2)
if a fault corrupts v between pc0 + 1 and pc0 + 2. vf is invalid at s′ by Assumption 3.2.
For all w ∈ Vp \ {pc}, s′[w] = s[w] since vf ≠ w by definition of Rule 3.6 (disjoint set of
variables).
• pc0 + 3: Mov (vg, L) writes pc0 + 6 into vg, where vg ∉ Vp. pc′ = pc0 + 4 by definition
of Mov. vf is invalid at s by Step pc0 + 2 or by fault diversity (Assumption 2.2) if a
fault corrupts vf between pc0 + 2 and pc0 + 3. vf is invalid at s′ since vf ≠ vg. For
all w ∈ Vp \ {pc}, s′[w] = s[w] since vg ≠ w by definition of Rule 3.6 (disjoint set of
variables).
• pc0 + 4: Bz(vf, vg) reads from vf and vg and writes into pc. vf is invalid at s by
Step pc0 + 3 or by fault diversity (Assumption 2.2) if a fault corrupts vf between pc0 + 3
and pc0 + 4. s[vf] ≠ 0 since vf is invalid at s and by definition of Valid. Hence, pc′ is invariably
set to pc0 + 5 by definition of Bz. For all w ∈ Vp \ {pc}, s′[w] = s[w] by definition of Bz.
• pc0 + 5 (s′ = sf): pc′ = “halt” by Assumption 3.3 and definition of Abort. For all
w ∈ Vp \ {pc}, s′[w] = s[w] by definition of Abort.
By Assumption 3.1, a fault does not modify the pc. Hence, irrespective of how many faults
happen, eventually pc = pc0 + 5 and the process crashes if enough Next steps are taken.
Moreover, ve, vf , and vg are the only variables modified by Next steps in Check(v). Therefore,
for all w ∈ Vp \ {pc}, s[w ] = si [w ] since {ve, vf , vg} ∩ Vp = { } or w is invalid at s by fault diversity
(Assumption 2.2) for any state s such that si ⪯ s ⪯ sf . □
Since a process is doomed to abort if the precondition of Lemma 3.1 holds, we can consider
in such cases the validity check to be a single instruction.
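To make the control flow of Check(v) concrete, the following Python sketch steps through the five instructions used in the proof of Lemma 3.1 in a fault-free setting. It is a simplified model, not the formal semantics: faults are not simulated, pc values are offsets relative to pc0, and the constant A = 11 and the name run_check are assumptions of this example.

```python
A = 11  # encoding constant, chosen here only for illustration


def run_check(value: int, a: int = A) -> str:
    """Step through Check(v); return 'halt' on abort, 'continue' otherwise."""
    s = {"v": value, "ve": 0, "vf": 0, "vg": 0}
    label_l = 6  # offset of label L, the instruction after Abort
    pc = 1       # offset relative to pc0
    while True:
        if pc == 1:                        # Mov(ve, A)
            s["ve"] = a
            pc = 2
        elif pc == 2:                      # Mod(vf, v, ve)
            s["vf"] = s["v"] % s["ve"]
            pc = 3
        elif pc == 3:                      # Mov(vg, L)
            s["vg"] = label_l
            pc = 4
        elif pc == 4:                      # Bz(vf, vg)
            pc = s["vg"] if s["vf"] == 0 else 5
        elif pc == 5:                      # Abort
            return "halt"
        else:                              # label L: the check passed
            return "continue"
```

With an invalid input such as 23, the remainder is non-zero, the branch is not taken, and the sketch reaches Abort; with a code value such as 22, it branches over the Abort to L.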
FAULT STEPS AND TRIVIAL CASES
We now show that Fault steps maintain the inductive invariant.
Lemma 3.2 Assume the inductive invariant (Property 3.1) holds at state s. If a Fault step is
taken, the inductive invariant holds at s′.
Proof: If pc = “halt”, then the Fault step is not enabled. Therefore, the inductive invariant
trivially holds since s′ = s. If the Fault step is enabled, there are two cases. If the process
crashes, the invariant again holds trivially. Otherwise, let W ⊆ Vp ∪ Ve be the subset of
variables modified by the fault. For all v ∈ W, v is invalid at state s′ by fault diversity
(Assumption 2.2). For every variable v ∈ (Vp ∪ Ve) \ W, s′[v] = s[v] by definition of ASC fault.
Furthermore, pc′ = pc by Assumption 3.1. Hence, the inductive invariant holds at s′. □
Besides assuming as a precondition that the inductive invariant holds, we often assume
that the pc is correct at s to simplify proving the lemmas of the encoding rules. Most other
cases (where pc is not correct at s) boil down to the following trivial cases.
Lemma 3.3 Let the inductive invariant hold at state s, ¬Correct(pc) = TRUE and pc ∉ IC. If a
Next step is taken, then the inductive invariant holds at state s′.
Proof: There are four cases for which the inductive invariant holds, pc is not correct and
pc ∉ IC:
1. If pc = “halt”, then Next steps are disabled by Definition 2.27 and the inductive invariant
trivially holds at s′ since s′ = s.
2. If pc = N + 1, then Next steps stutter by Definition 2.27 and the inductive invariant
trivially holds at s′ since s′ = s.
3. If pc ∈ IA and a Next step is taken, then pc’ = “halt”, vd is not modified, and no variable
in Vp is modified by definition of Abort; hence, the inductive invariant holds at s′ since
pc’ = “halt”, for all v ∈ Vp \ {pc}, s′[v ] = s[v ], and s′[vd ] = s[vd ].
4. If pc ∈ IS and a Next step is taken, then pc’ = N + 1, vd is not modified, and no variable
in Vp is modified by definition of Jmp; hence, the inductive invariant holds at s′ since
pc’ = N+1, for all v ∈ Vp \ {pc}, s′[v ] = s[v ], and s′[vd ] = s[vd ]. □
The only case left where the pc is incorrect is when pc ∈ IC. We will deal with this case
together with the cases where pc is correct.
PAN-ENCODING RULES
Let s be the state when the pc points to the first instruction of a sequence of instructions
given by Rules 3.1-3.9. We now prove that for each sequence of instructions, if the inductive
invariant holds at s then the inductive invariant holds at s′. We start with the encoded addition
and multiplication rules (Rules 3.1 and 3.2).
Lemma 3.4 Assume the inductive invariant holds at state s. Assume pc points to the first
instruction of an encoded addition, subtraction, or move (Rule 3.1). If a Next step is taken,
then the inductive invariant holds at state s′.
Proof: Encoded addition, subtraction, and move are essentially the same as the native
operations. Since pc ∉ {“halt”, N + 1} ∪ IA ∪ IS ∪ IC, pc is correct at s by the inductive invariant.
• Add(rd, rs, rt) reads from rs and rt and writes into rd. If either rs or rt is invalid at s,
then rd is invalid at s′ by Assumption 3.2. By the inductive invariant, rs and rt can only
be invalid or correct. pc′ = pc + 1 by definition of Add. Hence, the inductive invariant
holds at s′.
• Sub(rd, rs, rt) : Similar to addition.
• Mov (rd, C): Since the constant C is part of the instruction and the program cannot be
corrupt by definition, rd is correct in s′. pc′ = pc + 1 by definition of Mov . Hence, the
inductive invariant holds at s′. □
Lemma 3.5 Assume the inductive invariant holds at state s. Assume pc points to the first
instruction of an encoded multiplication (Rule 3.2). For every Next step taken, the inductive
invariant holds at the next state s′.
Proof: The encoded multiplication is composed of three instructions, which might be
interleaved with faults. Let pc0 be the value pc has at the first instruction of encoded Mul(rd, rs, rt).
Since pc0 ∉ {“halt”, N + 1} ∪ IA ∪ IS ∪ IC, pc is correct at s by the inductive invariant.
• pc0 +0: Mul(va, rs, rt) reads from rs and rt and writes into va. rs and rt are either correct
or invalid at s by the inductive invariant. If rs or rt is invalid, the value of va in s′ is invalid
by Assumption 3.2. pc′ = pc + 1 by definition of Mul. Hence, the inductive invariant
holds at s′.
• pc0 + 1: Mov (vb, A) writes A into vb. Since the constant A is part of the instruction and
the program cannot be corrupt by definition, vb is correct in s′. pc′ = pc + 1 by definition
of Mov . Hence, the inductive invariant holds at s′.
• pc0 + 2: Div (rd, va, vb) reads from va and vb and writes into rd. va and vb are either
correct or invalid at s by the inductive invariant. If va or vb is invalid at s, then rd is
invalid at s′, otherwise it is correct by definition of Div. pc′ = pc + 1 by definition of Div.
Hence, the inductive invariant holds at s′. □
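The three steps of the encoded multiplication can be sketched in Python as follows. This is an illustration on unbounded integers only, with A = 11 and the helper names encode, is_valid, and encoded_mul chosen for the example; it shows that the product of two code values carries a factor A twice, which the final division by A removes.

```python
A = 11  # encoding constant, chosen here only for illustration


def encode(x: int) -> int:
    """Encode a native value as a multiple of A."""
    return A * x


def is_valid(v: int) -> bool:
    """A value is valid if it is a multiple of A."""
    return v % A == 0


def encoded_mul(rs: int, rt: int) -> int:
    """Mirror of the three steps: Mul(va, rs, rt); Mov(vb, A); Div(rd, va, vb)."""
    va = rs * rt     # for code inputs A*x and A*y this is A*A*x*y
    vb = A
    return va // vb  # removes one factor A, leaving A*(x*y)
```

For code inputs, encoded_mul(encode(3), encode(4)) equals encode(12). For an invalid operand the outcome depends on the concrete values; the guarantee that the result is then invalid comes from Assumption 3.2, not from this sketch.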
Lemma 3.6 Assume the inductive invariant holds at state s. Assume pc points to the first
instruction of an encoded unconditional branch (Rule 3.3). For every Next step taken, the
inductive invariant holds at the next state s′.
Proof: The encoded unconditional branch is composed of three instructions, which might be
interleaved with faults. Let pc0 be the value pc has at the first instruction of encoded Jmp(rd).
Since pc0 ∉ {“halt”, N + 1} ∪ IA ∪ IS ∪ IC, pc is correct at s by the inductive invariant.
• pc0 + 0: Mov (va, A) writes A into va. The inductive invariant holds at s′ by an argument
for Mov similar to Step pc0 + 1 of Lemma 3.5.
• pc0 + 1: Div (vb, rd, va) reads from rd and va and writes into vb. The inductive invariant
holds at s′ by an argument for Div similar to Step pc0 + 2 of Lemma 3.5. Note that vb
is likely to be invalid even in fault-free executions, since the division decodes the value
s[vb].
• pc0 + 2: Jmp(vb) reads from vb and writes into pc. If vb is corrupt, then pc’ = “halt” by
Assumption 3.3. Otherwise, pc′ = s[vb] and is correct. Hence, the inductive invariant
holds at s′. □
We now show that encoded load and store instructions also preserve the inductive invariant.
Note that the encoded load and store rely on Assumption 3.4.
Lemma 3.7 Assume the inductive invariant holds at state s. Assume pc points to the first
instruction of an encoded load (Rule 3.4). For every Next step taken, the inductive invariant
holds and either pc’ is correct or pc’ = “halt” at the next state s′.
Proof: The encoded load is composed of three instructions, which might be interleaved
with faults. Let pc0 be the value pc has at the first instruction of encoded Ld(rd, rs). Since
pc0 ∉ {“halt”, N + 1} ∪ IA ∪ IS ∪ IC, pc is correct at s by the inductive invariant.
• pc0 +0: The inductive invariant holds at s′ by an argument for Mov similar to Step pc0 +1
of Lemma 3.5.
• pc0 + 1: The inductive invariant holds at s′ by an argument for Div similar to Step pc0 + 2
of Lemma 3.5.
• pc0 + 2: Ld(rd, vb) reads from vb and from v = s[vb], and writes into rd. If vb is corrupt, then pc′
= “halt” by Assumption 3.4. If vb is correct, pc′ = pc + 1 and is correct. Moreover, if vb
is correct, then v ∈ Vp since the native program only uses variables in Vp by definition.
Since v is either invalid or correct at s by the inductive invariant, rd is either invalid or
correct at s′. Hence, the inductive invariant holds at s′. □
Lemma 3.8 Assume the inductive invariant holds at state s. Assume pc points to the first
instruction of an encoded store (Rule 3.5). For every Next step taken, the inductive invariant
holds and either pc’ is correct or pc’ = “halt” at the next state s′.
Proof: The encoded store is composed of three instructions, which might be interleaved
with faults. Let pc0 be the value pc has at the first instruction of encoded St(rd, rs). Since
pc0 ∉ {“halt”, N + 1} ∪ IA ∪ IS ∪ IC, pc is correct at s by the inductive invariant.
• pc0 +0: The inductive invariant holds at s′ by an argument for Mov similar to Step pc0 +1
of Lemma 3.5.
• pc0 + 1: The inductive invariant holds at s′ by an argument for Div similar to Step pc0 + 2
of Lemma 3.5.
• pc0 + 2: St(vb, rs) reads from rs and vb and writes into v = s[vb]. If vb is corrupt, then
pc’ = “halt” by Assumption 3.4. If vb is correct, pc′ = pc + 1 and is correct. Moreover,
if vb is correct, v ∈ Vp since the native program only uses variables in Vp by definition.
Since rs is invalid or correct at s by the inductive invariant, v is invalid or correct at s′.
Hence, the inductive invariant holds at s′. □
Lemma 3.9 Assume the inductive invariant holds and pc is correct at state s. Assume pc
points to the first instruction of Check(v ) (Rule 3.6). Assume v is either correct or invalid. For
every Next step taken, the inductive invariant holds at the next state s′.
Proof: Let pc0 be the value pc has at the instruction immediately before the first instruction
of Check(v ) for some variable v ∈ V . pc is correct at s by assumption.
• pc0 + 1: Mov (ve, A) writes A into ve. The inductive invariant holds at s′ by an argument
for Mov similar to Step pc0 + 1 of Lemma 3.5.
• pc0 + 2: Mod(vf, v, ve) reads from ve and v and writes into vf. v and ve are either correct
or invalid at s by the inductive invariant. By Assumption 3.2, vf is invalid at s′ if v or ve
is invalid at s; otherwise, vf is correct at s′. pc′ = pc + 1 by definition of Mod. Hence,
the inductive invariant holds at s′. Moreover, s′[vf] is 0 if and only if vf is correct at s′ by
the definition of Mod and of Valid.
• pc0 + 3: Mov (vg, L) writes into vg. The inductive invariant holds at s′ by an argument for
Mov similar to Step pc0 + 1 of Lemma 3.5. Note that even in a fault-free traversal, vg
might be invalid (but correct) at s′ since L is a constant address value.
• pc0 +4: Bz(vf , vg) reads from vf and vg and writes into pc, no further variable is modified
by definition of Bz. There are two cases:
1. vf is correct at s: s[vf ] = 0 by Step pc0 + 2; hence, pc′ = s[vg] by definition of Bz.
There are two subcases:
a) If vg is corrupt at s, then pc’ = “halt” by Assumption 3.3. Hence, the inductive
invariant holds at s′.
b) If vg is correct at s, then pc′ = pc0 + 6 and pc’ is correct. Hence, the inductive
invariant holds at s′.
2. vf is invalid at s: pc′ = pc0 + 5 by definition of Bz and since s[vf] ≠ 0. Hence, the
inductive invariant holds at s′ because pc′ ∈ IA.
• pc0 + 5: The inductive invariant holds at s′ by Lemma 3.3. □
Note that in Lemma 3.9 pc starts correct at s by assumption. The case when pc is incorrect
and the variable being checked is invalid is dealt with by Lemma 3.1. Both lemmas are essential
for the following lemma of encoded conditional branches.
Lemma 3.10 Assume the inductive invariant holds at state s. Assume pc points to the first
instruction in an encoded conditional branch (Rule 3.7). For every Next step taken, the inductive
invariant holds at the next state s′.
Proof: The encoded conditional branch is composed of eight instructions, which might be
interleaved with faults. Let pc0 be the value pc has at the first instruction of encoded Bz(rz, rs).
Since pc0 ∉ {“halt”, N + 1} ∪ IA ∪ IS ∪ IC, pc is correct at s by the inductive invariant.
• pc0 + 0 to pc0 + 2: The inductive invariant holds at s′ by an argument for Mov similar to
Step pc0 + 1 of Lemma 3.5.
• pc0 + 3: Bz(rz, v1) reads v1 and rz and writes into the pc, no further variable is modified
by definition of Bz. v1 and rz are either correct or invalid at s by the inductive invariant.
There are four cases:
1. s[rz] = 0 and v1 is corrupt at s: pc’ = “halt” by Assumption 3.3. Hence, the
inductive invariant holds at s′.
2. s[rz] = 0 and v1 is correct at s: pc’ = pc0 + 6 by definition of Bz. Since rz has to be
correct at s (rz is valid), pc’ is correct. Hence, the inductive invariant holds at s′.
3. s[rz] ≠ 0 and rz is correct at s: pc′ = pc0 + 4 by definition of Bz and pc′ is correct
since rz is correct at s. Hence, the inductive invariant holds at s′.
4. s[rz] ≠ 0 and rz is invalid at s: pc′ = pc0 + 4 by definition of Bz and pc′ is incorrect
but pc′ ∈ IC. rz is invalid at s′ since rz is invalid at s and Bz does not write into rz.
Hence, the inductive invariant holds at s′.
• pc0 + 4: There are two cases for Check(rz):
1. pc is correct at s: The inductive invariant holds at s′ by Lemma 3.9.
2. pc is incorrect at s: rz is invalid at s by Step pc0 + 3. pc’ = “halt”, for all v ∈ Vp :
s′[v ] = s[v ], and s′[vd ] = s[vd ] by Lemma 3.1. Hence, the inductive invariant holds
at s′.
• pc0 +5: The inductive invariant holds at s′ by an argument for Jmp similar to Step pc0 +2
of Lemma 3.6.
• pc0 + 6: Div (vb, rd, va) reads from rd and va and writes into vb. The inductive invariant
holds at s′ by an argument for Div similar to Step pc0 + 2 of Lemma 3.5. Note that vb
is likely to be invalid even in fault-free executions, since the division decodes the value
s[vb].
• pc0 +7: The inductive invariant holds at s′ by an argument for Jmp similar to Step pc0 +2
of Lemma 3.6. □
Lemma 3.11 Assume the inductive invariant holds at state s. Assume pc points to the first
instruction of an encoded division (Rule 3.8). For every Next step taken, the inductive invariant
holds at the next state s′.
Proof: The encoded division (Rule 3.8) is composed of 26 steps, which might be interleaved
with faults. Let pc0 be the value pc has at the first instruction of encoded Div (rd, rs, rt). Since
pc0 ∉ {“halt”, N + 1} ∪ IA ∪ IS ∪ IC, pc is correct at s by the inductive invariant.
• pc0 + 0 to pc0 + 7: The inductive invariant holds at s′ by an argument for Mov similar to
Step pc0 + 1 of Lemma 3.5.
• pc0 + 8: Bz(rt, v1) reads v1 and rt and writes into the pc, no further variable is modified
by definition of Bz. v1 and rt are either correct or invalid at s by the inductive invariant.
There are four cases:
1. s[rt] = 0 and v1 is corrupt at s: pc’ = “halt” by Assumption 3.3. Hence, the inductive
invariant holds at s′.
2. s[rt] = 0 and v1 is correct at s: pc’ = pc0 + 11 by definition of Bz. Since rt has to
be correct at s (rt is valid), pc’ is correct. Hence, the inductive invariant holds at s′.
3. s[rt] ≠ 0 and rt is correct at s: pc′ = pc0 + 9 by definition of Bz and pc′ is correct
since rt is correct at s. Hence, the inductive invariant holds at s′.
4. s[rt] ≠ 0 and rt is invalid at s: pc′ = pc0 + 9 by definition of Bz and pc′ is incorrect
but pc′ ∈ IC. rt is invalid at s′ since rt is invalid at s and Bz does not write into rt.
Hence, the inductive invariant holds at s′.
• pc0 + 9: There are two cases for Check(rt):
1. pc is correct at s: The inductive invariant holds at s′ by Lemma 3.9.
2. pc is incorrect at s: rt is invalid at s by Step pc0 + 8. pc’ = “halt”, for all v ∈ Vp :
s′[v ] = s[v ], and s′[vd ] = s[vd ] by Lemma 3.1. Hence, the inductive invariant holds
at s′.
• pc0+10: The inductive invariant holds at s′ by an argument for Jmp similar to Step pc0+2
of Lemma 3.6.
• pc0 + 11: The inductive invariant holds at s′ by Lemma 3.3.
• pc0+12: The inductive invariant holds at s′ by an argument for Mov similar to Step pc0+1
of Lemma 3.5.
• pc0 + 13: Add(vc, rs, v0) reads from rs and v0 and writes into vc. v0 and rs are either
invalid or correct by the inductive invariant. If rs or v0 are invalid at s, then vc is invalid at
s′ by Assumption 3.2. Otherwise, vc is correct at s′. pc’ = pc+ 1 by definition of Add.
Hence, the inductive invariant holds at s′.
• pc0 + 14: Sub(vd , vc, rt) reads from vc and rt and writes into vd . vc and rt are either
invalid or correct by the inductive invariant. If vc or rt are invalid at s, then vd is invalid at
s′ by Assumption 3.2. Otherwise, vd is correct at s′. pc’ = pc+ 1 by definition of Sub.
Hence, the inductive invariant holds at s′.
• pc0 +15: Bz(vd , v4) reads v4 and vd and writes into the pc, no further variable is modified
by definition of Bz. v4 and vd are either correct or invalid at s by the inductive invariant.
There are four cases:
1. s[vd ] = 0 and v4 is corrupt at s: pc’ = “halt” by Assumption 3.3. Hence, the
inductive invariant holds at s′.
2. s[vd ] = 0 and v4 is correct at s: pc’ = pc0 + 20 by definition of Bz. Since vd has to
be correct at s (vd is valid), pc’ is correct. Hence, the inductive invariant holds at s′.
3. s[vd] ≠ 0 and vd is correct at s: pc′ = pc0 + 16 by definition of Bz and pc′ is correct
since vd is correct at s. Hence, the inductive invariant holds at s′.
4. s[vd] ≠ 0 and vd is invalid at s: pc′ = pc0 + 16 by definition of Bz and pc′ is incorrect
but pc′ ∈ IC. vd is invalid at s′ since vd is invalid at s and Bz does not write into
vd. Hence, the inductive invariant holds at s′.
• pc0 + 16: There are two cases for Check(vd ):
1. pc is correct at s: The inductive invariant holds at s′ by Lemma 3.9.
2. pc is incorrect at s: vd is invalid at s by Step pc0 + 15. pc’ = “halt”, for all v ∈ Vp :
s′[v ] = s[v ], and s′[vd ] = s[vd ] by Lemma 3.1. Hence, the inductive invariant holds
at s′.
• pc0 + 17 and pc0 + 18: The inductive invariant holds at s′ by an argument for Add similar
to Step pc0 + 13 of this lemma.
• pc0+19: The inductive invariant holds at s′ by an argument for Jmp similar to Step pc0+2
of Lemma 3.6.
• pc0+20: The inductive invariant holds at s′ by an argument for Sub similar to Step pc0+14
of this lemma.
• pc0 +21: Bz(vd , v5 ) reads v5 and vd and writes into the pc, no further variable is modified
by definition of Bz. v5 and vd are either correct or invalid at s by the inductive invariant.
There are four cases:
1. s[vd ] = 0 and v5 is corrupt at s: pc’ = “halt” by Assumption 3.3. Hence, the
inductive invariant holds at s′.
2. s[vd ] = 0 and v5 is correct at s: pc’ = pc0 + 24 by definition of Bz. Since vd has to
be correct at s (vd is valid), pc’ is correct. Hence, the inductive invariant holds at s′.
3. s[vd] ≠ 0 and vd is correct at s: pc′ = pc0 + 22 by definition of Bz and pc′ is correct
since vd is correct at s. Hence, the inductive invariant holds at s′.
4. s[vd] ≠ 0 and vd is invalid at s: pc′ = pc0 + 22 by definition of Bz and pc′ is incorrect
but pc′ ∈ IC. vd is invalid at s′ since vd is invalid at s and Bz does not write into
vd. Hence, the inductive invariant holds at s′.
• pc0 + 22: There are two cases for Check(vd ):
1. pc is correct at s: The inductive invariant holds at s′ by Lemma 3.9.
2. pc is incorrect at s: vd is invalid at s by Step pc0 + 21. pc’ = “halt”, for all v ∈ Vp :
s′[v ] = s[v ], and s′[vd ] = s[vd ] by Lemma 3.1. Hence, the inductive invariant holds
at s′.
• pc0+23: The inductive invariant holds at s′ by an argument for Jmp similar to Step pc0+2
of Lemma 3.6.
• pc0 + 24 and pc0 + 25: The inductive invariant holds at s′ by an argument for Add similar
to Step pc0 + 13 of this lemma.
□
Lemma 3.12 Assume the inductive invariant holds at state s. Assume pc points to the first
instruction of the input message check (Rule 3.9) of one variable v ∈ Vi. For every Next step
taken, the inductive invariant holds and either pc′ is correct or pc′ = N + 1 at the next state s′.
Proof: Rule 3.9 is a set of instructions checking variables in Vi. Let pc0 be the value pc has at
the instruction immediately before the first instruction of the input message check (Rule 3.9)
for v. Since pc0 ∉ {“halt”, N + 1} ∪ IA ∪ IS ∪ IC, pc is correct at s by the inductive invariant.
• pc0 + 1 to pc0 + 3: The inductive invariant holds at s′ by an argument for Mov similar to
Step pc0 + 1 of Lemma 3.5.
• pc0 + 4: Mod(vd, v, va) reads from v and va and writes into vd. v and va are either correct
or invalid at s by the inductive invariant. By Assumption 3.2, vd is invalid at s′ if v or va
is invalid at s; otherwise, vd is correct at s′. pc′ = pc + 1 by definition of Mod. Hence,
the inductive invariant holds at s′. Moreover, s′[vd] is 0 if and only if vd is correct at s′ by
the definition of Mod and of Valid.
• pc0 +5: Bz(vd , vb) reads from vd and vb and writes into pc, no further variable is modified
by definition of Bz. There are two cases:
1. vd is correct at s: s[vd ] = 0 by Step pc0 + 4; hence, pc′ = s[vb] by definition of Bz.
There are two subcases:
a) If vb is corrupt at s, then pc’ = “halt” by Assumption 3.3. Hence, the inductive
invariant holds at s′.
b) If vb is correct at s, then pc′ = pc0 + 7 and pc’ is correct. Hence, the inductive
invariant holds at s′.
2. vd is invalid at s: pc′ = pc0 + 6 by definition of Bz and since s[vd] ≠ 0. Hence, the
inductive invariant holds at s′ because pc′ ∈ IS.
• pc0 + 6: Jmp(vc) reads vc and writes into pc. If vc is corrupt at s, then pc′ = “halt”
by Assumption 3.3, otherwise pc′ = s[vc] = N + 1 and is correct. Hence, the inductive
invariant holds at s′. □
INDUCTIVE INVARIANT AND ERROR ISOLATION
We now prove that for any traversal initial state in which error isolation holds, the inductive
invariant holds at the state after the check of the input message has finished.
Lemma 3.13 If the inductive invariant holds for the initial state s0 of a traversal, then the
inductive invariant holds for every state s of the traversal.
Proof: There are two types of state transitions: Next and Fault steps. Assume the inductive
invariant holds at state s of the traversal. If a Fault step is taken, then the inductive invariant
holds at s′ by Lemma 3.2. If a Next step is taken, there are two cases:
• pc is incorrect and pc /∈ IC: The process is halted or is about to halt by the inductive
invariant. The inductive invariant holds at s′ by Lemma 3.3.
• pc is correct or pc ∈ IC: The inductive invariant holds at s′ by Lemmas 3.4, 3.5, 3.6,
3.7, 3.8, 3.10, 3.11, and 3.12, and by the fact that an encoded program pe contains only
instructions of the Rules 3.1 to 3.9.
By induction, the inductive invariant holds for every state s of the traversal. □
Lemma 3.14 Any traversal initial state implies the inductive invariant.
Proof: By definition of traversal initial state (Definition 2.26), every variable v ∈ Vp ∪ Ve
is either invalid or correct. Moreover, pc = 1 for any s ∈ Is and r ∈ Ir since s is the state
immediately after a Receive step by definition. Therefore, pc is correct. □
Theorem 3.1 PAN-encoding guarantees local error exposure, local error filtering and accuracy.
Proof:
• If any v ∈ Vp is corrupt at s, then v is invalid at s by Lemmas 3.13 and 3.14. Vo ⊂ Vp by
definition, hence, local error exposure (Property 2.3) holds, i.e., every corrupt message
sent out is invalid.
• Local error filtering (Property 2.4) holds since a correct process discards any invalid
message by Rule 3.9.
• Accuracy (Property 2.2) holds since a correct process does not discard any valid message
by Rule 3.9. □
Corollary 3.1 PAN-encoding guarantees error isolation under Assumptions 2.2, 3.1, 3.2, 3.3,
and 3.4.
Proof: Since PAN-encoding guarantees local error exposure and local error filtering by
Theorem 3.1, it guarantees error isolation by Theorem 2.2. □
3.3 REAL-WORLD AN-ENCODING
PAN-encoding is based on a fictitious instruction set and requires strong assumptions on fault
and state transitions to work. In this section, we pinpoint the most fragile points of PAN-
encoding. We briefly describe how AN-encoding implementations heuristically solve these
issues. We also point out directions for future work on the formalization of AN-encoding.
3.3.1 FINITE DOMAINS AND MODULAR ARITHMETIC
In PAN-encoding, we assume the value domain D to be the set of natural numbers. In practice,
D is a finite subset of natural numbers {0, . . . , M − 1}, where M − 1 is the greatest representable
value of the variable type; for example, a 32-bit machine has M = 2^32. Since D is finite, integer
arithmetic operations might overflow. With no extra care, encoding might result in incorrect
executions if the program relies on the wrap-around property of overflows.
As an example, consider a program operating on 2-bit variables, i.e., D = {0, 1, 2, 3}. Assume
A = 11. An encoding compiler enlarges the type of the variables to a size large enough that
the whole original domain can still be represented after encoding. The 2-bit domain has to be
expanded to 6-bit variables since 5 bits cannot represent the value 33 = 3 * A.
Figure 3.2 depicts the domain D with code values marked with gray background. Depending
on the choice of the constant A, more values than in the original domain are code. In the
example, values 44 and 55 are both code, but their unencoded counterparts (4 and 5) do not
exist in D, which only contains values up to 3.
0 1 2 3 4 5 6 7
8 9 10 11 12 13 14 15
16 17 18 19 20 21 22 23
24 25 26 27 28 29 30 31
32 33 34 35 36 37 38 39
40 41 42 43 44 45 46 47
48 49 50 51 52 53 54 55
56 57 58 59 60 61 62 63
Figure 3.2: Domain D = {0, . . . , 2^6 − 1} and encoding constant A = 11
Consider the example of a native program adding values 2 and 3. The expected result is
1 since 1 ≡ 5 mod | D |. Hence, the expected result in the encoded program is 11 since
11 = A * 1. Now, the encoded program might fail since 22 + 33 = 55, i.e., the addition does
not wrap around with the same values in the native and encoded programs.
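The mismatch can be reproduced in a few lines. The following is a minimal sketch, not part of the encoding rules: the function names are hypothetical, and reducing the encoded sum modulo A * |D| is an assumption of this sketch, chosen because it makes the encoded addition wrap at the same point as the native one.

```python
A = 11        # encoding constant of the running example
D_SIZE = 4    # |D| for 2-bit variables, D = {0, 1, 2, 3}
M = 64        # size of the enlarged 6-bit domain

def native_add(x, y):
    """Addition in the native program: wraps around at |D|."""
    return (x + y) % D_SIZE

def encoded_add_naive(xc, yc):
    """Plain addition of encoded values in the 6-bit domain."""
    return (xc + yc) % M

def encoded_add_wrapped(xc, yc):
    """Encoded addition reduced modulo A * |D| (assumption of this sketch)."""
    return (xc + yc) % (A * D_SIZE)

expected = A * native_add(2, 3)               # encoded form of the native result
naive = encoded_add_naive(A * 2, A * 3)       # what the uncorrected program computes
wrapped = encoded_add_wrapped(A * 2, A * 3)   # corrected result
print(expected, naive, wrapped)
```

The identity used by `encoded_add_wrapped` is that (A * x + A * y) mod (A * |D|) equals A * ((x + y) mod |D|) for all natural numbers x and y.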
Using our instruction set, additions can cope with overflows if the rule to encode an addition
replaces the instruction Add(rd, rs, rt) with the following sequence of instructions:
+0 Mov (v0, 0)
+1 Mov (v1, L)
+2 Mov (va, MAX VAL)
+3 Add(vb, rs, rt)
+4 Sub(vc, vb, va)
+5 Bz(vc, v1)
+6 Add(vb, vc, v0)
L: +7
if s[va] ≤MAX VAL
Add(rd, vb, v0)
The constant MAX VAL represents the maximal value in D encoded with A and is moved
into va. The result of the addition is stored in a temporary variable vb and compared with
va. If vb is less than or equal to va, then the value in vc is 0 by the definition of truncating
subtraction. In that case, the addition is complete, and the value of vb is moved into the
result variable rd. Otherwise, if vb is greater than va, then the value in vc already contains the
wrapped-around value of the addition and is moved into vb and, subsequently, into the target
operand rd.
In general, coping with overflows depends on which behavior the program expects and on what
the language leaves as undefined behavior. We refer to Schiffel [Sch11] for the overflow corrections for
multiplication and other operations. In the following discussion, we assume that all encoded
values are useful values, i.e., we ignore that some encoded values might be overflow values.
To achieve that, either a correction mechanism has to be implemented or the constant A has to
be selected such that encoded values do not overflow.
Another subtle assumption in the process model of PAN-encoding is the saturating subtraction.
In a domain D = {0, 1, 2, 3}, using our subtraction definition, 2 − 3 results in 0. However,
standard CPUs typically implement non-saturating subtraction. If the minuend is less than the
subtrahend, the difference wraps around, e.g., 2 − 3 results in 3 in domain D = {0, 1, 2, 3}.
The assumption of saturating subtraction is not a technical one. Saturating subtraction sim-
plifies our presentation because it allows us to perform conditional branches with inequalities
such as va ≤ vb using the same conditional-branch operation Bz that branches if the result is
0. With a saturating subtraction, x − y is zero if x = y or x < y . In contrast, with a traditional
modular-arithmetic subtraction, x − y is not zero if x < y .
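The difference between the two subtractions can be sketched as follows. The function names are hypothetical and only illustrate why a single branch-if-zero operation suffices to test x ≤ y when the subtraction saturates.

```python
def sat_sub(x, y):
    """Saturating subtraction on natural numbers: the result never drops below 0."""
    return max(x - y, 0)

def mod_sub(x, y, m):
    """Wrap-around (modular) subtraction in a domain of size m."""
    return (x - y) % m

def leq_via_bz(x, y):
    """x <= y holds exactly when the saturating difference x - y is zero."""
    return sat_sub(x, y) == 0

print(sat_sub(2, 3), mod_sub(2, 3, 4))
```

With the modular variant, a zero result only signals equality, so an inequality test needs a different conditional branch.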
Standard x86 CPUs implement saturating arithmetic in the MMX/SSE2 instruction set ar-
chitecture extensions [Int14]. Nonetheless, if the target instruction set does not implement
such saturating subtractions or if the programmer relies on the wrap-around effect of the
subtraction, then the process model has to be enriched with other conditional-branches be-
sides Bz. The encoding rules presented in Section 3.2.2 can be easily adapted to use other
conditional-branch operations.
3.3.2 ASSUMPTION COVERAGE UNDER UNIFORM RANDOM FAULTS
PAN-encoding can tolerate any number of ASC faults under Assumptions 2.2, 3.1, 3.2, 3.3
and 3.4; in other words, PAN-encoding does not fail in any execution of the model. We
now discuss the coverage of PAN-encoding assumptions. For that, instead of asserting the
assumptions hold in every execution, we assume faults are random and uniformly distributed:
If variable v is corrupted by a fault, then the new value of v is any value of domain D with
the same probability. We then calculate, under the assumption of uniform random faults, the
probability of the assumptions failing.
FAULT DIVERSITY
The fault diversity assumption (Assumption 2.2) captures the idea that a random fault has a
low probability of resulting in a code value. In a finite domain, the number of code values is
much smaller than the number of noncode values (see Figure 3.2 for an example). If a variable
v contains a code value and a fault corrupts v by writing into v any value of domain D with
uniformly distributed probability, then the probability that v still contains a code value after the
fault occurs is approximately 1/A, as we now show.
If C values in domain D are multiples of A, i.e., code values, then the probability of v
containing a code value after the fault occurs is C/M since M is the total number of elements
in domain D. The code values in domain D are 0, A, 2 * A, . . . , k * A. Hence, the number of
code values C is simply k + 1. In the example of Figure 3.2, k = 5 because 55 is the greatest
multiple of A in the domain; the number of code values in the example is C = k + 1 = 5 + 1 = 6.
We know that k * A is the greatest multiple of A that is less than M, and that (k + 1) * A is
greater than or equal to M:
$$k \cdot A < M \tag{3.1}$$
$$(k + 1) \cdot A \ge M \tag{3.2}$$
From Equation 3.1, we have that k < M/A and, consequently, the following holds:
$$k + 1 < \frac{M}{A} + 1. \tag{3.3}$$
From Equation 3.2, we have that k + 1 ≥ M/A. Therefore, from Equation 3.3, the following
holds:
$$\frac{M}{A} \le k + 1 < \frac{M}{A} + 1 \tag{3.4}$$
From Equation 3.4 and the definition of ceiling function, we finally have that
$$C = k + 1 = \left\lceil \frac{M}{A} \right\rceil. \tag{3.5}$$
As an example, consider D and A from Figure 3.2. The value of k is then 5 and
$$C = \left\lceil \frac{M}{A} \right\rceil = \left\lceil \frac{64}{11} \right\rceil = \lceil 5.81 \rceil = 6 = 5 + 1 = k + 1.$$
The probability of randomly picking a multiple of A out of D is given by the number
of code values divided by the total number of values in D, i.e.,
$$P\{v \equiv 0 \bmod A\} = \frac{C}{M} \overset{\text{Eq 3.5}}{=} \left\lceil \frac{M}{A} \right\rceil \cdot \frac{1}{M} \approx \frac{1}{A} \tag{3.6}$$
Note that by simplifying the numerator M inside the ceiling function with the denominator
M in Equation 3.6, we only approximate the actual probability. In our example, 1/A ≈ 0.0909 and
C/M = 0.09375. Note that in our implementation (Section 3.4), the default value for A used by
the compiler is 65521, yielding a probability of 1/65521 ≈ 0.000015262.
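The counting argument can be checked by enumeration. The sketch below uses hypothetical helper names; it compares a brute-force count of the multiples of A in D with the closed form of Equation 3.5 and evaluates C/M. The 32-bit domain used in the last line is an assumption of this sketch, not a statement about the implementation.

```python
def count_code_values(M, A):
    """Count the multiples of A in D = {0, ..., M - 1} by enumeration."""
    return sum(1 for v in range(M) if v % A == 0)

def num_code_values(M, A):
    """Closed form of Equation 3.5: C = ceil(M / A), in integer arithmetic."""
    return (M + A - 1) // A

def code_probability(M, A):
    """Probability C / M that a uniformly random value of D is a code value."""
    return num_code_values(M, A) / M

print(count_code_values(64, 11), num_code_values(64, 11), code_probability(64, 11))
print(code_probability(2 ** 32, 65521), 1 / 65521)
```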
RANDOM ADDRESSES IMPLY CRASH
We have three assumptions that rely on the following observation: especially in 64-bit architec-
tures, a uniform random value has a negligible probability of representing an address allocated
in the program’s address space.
The x64 architecture defines a limit of 48 bits for addressable bytes: 2^48 bytes = 256 TiB can be
addressed [AMD12]. The use of addresses outside this limit results in a hardware exception.
Assuming a machine possesses 256 TiB of RAM² and every byte of the memory is allocated,
the probability of a uniform random value being a valid address is 2^48 / 2^64 = 1/65536. Operating
systems typically abort a process with a “segmentation fault” if it tries to fetch instructions
or access data of non-allocated addresses.
Assumption 3.1 asserts that a fault never directly modifies the program counter. The as-
sumption has a high coverage since, if a fault modified the program counter with a uniform
random value, the fault would be equivalent to a crash transition with a high probability, i.e.,
1 − 1/65536 ≈ 0.99998. Similarly, Assumption 3.3 asserts that if the destination operand of Jmp
and Bz operations is corrupt, the process crashes (with a high probability). Finally, Assump-
tion 3.4 asserts that if Ld or St operations load from or store into the address in a corrupt
variable, the process crashes (with a high probability).
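The arithmetic behind this coverage estimate is small enough to check exactly. The sketch below assumes, as the text does, a fully allocated 48-bit address space and a uniformly random 64-bit value; the variable names are hypothetical.

```python
from fractions import Fraction

ADDRESS_BITS = 48   # addressable bits on x64, as stated above
VALUE_BITS = 64     # width of a uniformly random value

p_valid_address = Fraction(2 ** ADDRESS_BITS, 2 ** VALUE_BITS)
p_crash = 1 - p_valid_address
print(p_valid_address, float(p_crash))
```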
NONCODENESS PROPAGATION
The arithmetic operations in our model are assumed to propagate the “noncodeness” of
source operands into target operands (Assumption 3.2). Operations implemented in real pro-
cessors, however, do not always guarantee that. Here, we analyze the addition operation only.
A precise coverage estimation of this assumption requires a careful analysis of all arithmetic
operations available in the real processor, and it might depend on the program itself. The
thorough analysis of the error propagation probability is a topic of research on its own³ and
outside of the scope of this work.
We now estimate the assumption coverage of the addition operation Add(z, x, y ) as an
example (see Definition 3.1). Two scenarios can violate Assumption 3.2, resulting in
variable z being corrupt but valid: one source operand is corrupt or both source operands are
corrupt. The target operand z can be corrupt or not since the addition operation overwrites
it anyhow. Note that after the addition operation is executed, a fault could corrupt z directly.
Above, we have already studied the probability of direct corruption cases: 1/A.
One corrupt source operand. We show that, if one source operand is corrupt, the probability
of error propagation is also 1/A. To show that, we have to be aware of one phenomenon related
to modular arithmetic: two code values added together are not necessarily a code value
anymore; likewise, one code value added to a noncode value can result in a code value. To
² As of 2014, each machine in our cluster has one DIMM with 8 GiB.
³ See, for example, the recent dissertation by Morozov [Mor12].
understand how this happens, consider the example of Figure 3.2 with constant A = 11.
Adding the code value 11 to the code value 55 results in the noncode value 2 because
11 + 55 = 66 and 66 ≡ 2 mod 64. Adding the code value 11 to the noncode value 53 results
in the code value 0 because 11+53 = 64 and 64 ≡ 0 mod 64. Therefore, we have to consider
two subcases: one when the corrupt operand is valid, and one when it is invalid. Without
loss of generality, let x be the correct source operand and y be the corrupt source operand.
In other words, this analysis employs x as a constant value, y as a random variable, and z as
a function of y .
In the first subcase, y is corrupt but valid. For a given value of x, we count how many code
values y may take such that the resulting value in z is still a code value. We use the running
example of Figure 3.2 to illustrate our reasoning. Assume x = 55. If x is added to y = 0, then
the result is again a code value, but any other code value in domain D added to x yields a
noncode result. Now, assume x = 44. If y ∈ {0, 11}, then the result is a code value. Similarly,
with x = 33 and y ∈ {0, 11, 22}, the result is a code value. In general, for a code value k * A in
the operand x, the corrupt operand y can take (C − k) code values that added to k * A result in
a code value. Remember that C is the number of code values in domain D (see Equation 3.5).
In the second subcase, y is corrupt but invalid. As above, for a given value of x, we count
how many noncode values y may take such that the result is still a code value. Again, we
refer to our running example. If x = 0, no noncode value added to x results in a code value.
If x = 11, then y = 53 results in a code value; if x = 22, then y ∈ {53, 42}; if x = 33, then
y ∈ {53, 42, 31}; and so on. In general, for a code value k * A in the operand x, the corrupt
operand y can take k noncode values that added to x result in a code value in the operand z.
Combining both cases, for a code value k * A in the operand x, the corrupt operand y can
take (C − k) code values and k noncode values that added to x result in a code value. Given
that y is corrupt and x is correct, the probability that y takes such a value is the sum of both
cases divided by the total number of values y can take, which is M.
$$P\{\mathit{Valid}'(z)\} = P\{(k \cdot A + y) \equiv 0 \bmod A\} = \frac{(C - k) + k}{M} = \frac{C}{M} \overset{\text{Eq 3.5}}{\approx} \frac{1}{A} \tag{3.7}$$
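For the running example, the two subcases can be enumerated exhaustively. The sketch below (hypothetical names, M = 64 and A = 11 as in Figure 3.2) lists, for each correct code value x = k * A, how many code and noncode values of y yield a code value in z.

```python
M, A = 64, 11
CODE = [v for v in range(M) if v % A == 0]   # 0, 11, 22, 33, 44, 55
C = len(CODE)

def dangerous_y(x):
    """Split the values y for which (x + y) mod M is code into code and noncode values."""
    hits = [y for y in range(M) if ((x + y) % M) % A == 0]
    code_hits = [y for y in hits if y % A == 0]
    noncode_hits = [y for y in hits if y % A != 0]
    return code_hits, noncode_hits

for k, x in enumerate(CODE):
    code_hits, noncode_hits = dangerous_y(x)
    print(x, len(code_hits), len(noncode_hits))
```

For every k, the two counts are C − k and k, so their sum is C regardless of the value of x.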
Two corrupt source operands. We now show the probability of error propagation to be
again 1/A even if both source operands are corrupt. This analysis employs x and y as random
variables and z as a function of x and y . There are three cases to consider: (1) both operands
are valid; (2) one operand is valid and the other is invalid; and (3) both operands are invalid.
We first consider x and y to be corrupt but valid. Let x contain some code value k * A. As
in the case discussed above, y can take (C − k) code values that, added to x’s code value,
again result in a code value. The total number of such dangerous code-value pairs is the sum
of the (C − k) cases of each value k * A operand x can take:
$$\sum_{k=0}^{C-1} (C - k) = \frac{C^2 + C}{2}$$
The total number of code-value pairs the corrupt operands may take is C². The probability
that the corrupt operands contain a dangerous code-value pair is the ratio of the number
of dangerous cases to the total number of code-value pairs. The probability that x and y are
valid in the first place is C/M * C/M. Hence, the probability that x and y are valid and that z is valid
after the addition is given by the following equation.
$$P\{\mathit{Valid}'(z) \land \mathit{Valid}(x) \land \mathit{Valid}(y)\} = \frac{(C^2 + C)/2}{C^2} \cdot \frac{C^2}{M^2} = \frac{C^2 + C}{2 \cdot M^2} \tag{3.8}$$
We now consider both source operands to be corrupt, one to be valid, and the other one
to be invalid. Without loss of generality, let x be the valid source operand and y be the
invalid source operand. For each value k * A operand x may hold, y can take k noncode
values that, added to k * A, again result in a code value. The total number of dangerous
code/noncode-value pairs is the sum of the k cases of each value k * A operand x can take:
$$\sum_{k=0}^{C-1} k = \frac{C^2 - C}{2}$$
The total number of code/noncode-value pairs the corrupt operands may take is C * (M − C)
since C is the number of code values and M − C is the number of noncode values. The
probability that the corrupt operands contain a dangerous code/noncode-value pair is the ratio
of the number of dangerous cases to the total number of code/noncode-value pairs.
Moreover, the probability that x is valid and y is invalid is C/M * (1 − C/M) = C/M * (M − C)/M. Therefore,
the probability that x is valid, y invalid and z valid after the addition is given by the following
equation.
$$P\{\mathit{Valid}'(z) \land \mathit{Valid}(x) \land \lnot \mathit{Valid}(y)\} = \frac{(C^2 - C)/2}{C \cdot (M - C)} \cdot \frac{C}{M} \cdot \frac{M - C}{M} = \frac{C^2 - C}{2 \cdot M^2} \tag{3.9}$$
Finally, we consider the case that both source operands are corrupt and invalid. Let x = 1
and constant A > 1. For any y ∈ {A − 1, 2 * A − 1, . . .}, x and y contain noncode values, but
their sum is code, i.e., (x + y ) ≡ 0 mod A, because 1 + (A − 1) = A, 1 + (2 * A − 1) = 2 * A,
etc. Since A < M by design, if x contains a noncode value e and y contains a noncode value
(k * A − e), then their sum is the code value k * A, i.e.,
e + (k * A − e) ≡ k * A mod M.
For each of the M − C possible noncode values e operand x may take, there are C noncode
values (k * A − e) operand y may take such that x + y is code; hence, the total number of
dangerous noncode-value pairs is simply (M − C) * C.
Now, the total number of possible noncode-value pairs the corrupt operands may take is
(M − C)². Therefore, the probability that the corrupt operands contain a dangerous noncode-value
pair is the ratio of the number of dangerous cases to the total number of noncode-
value pairs. The probability that x and y are invalid and that z is valid after the addition is
the following.
$$P\{\mathit{Valid}'(z) \land \lnot \mathit{Valid}(x) \land \lnot \mathit{Valid}(y)\} = \frac{(M - C) \cdot C}{(M - C)^2} \cdot \frac{(M - C)^2}{M^2} = \frac{(M - C) \cdot C}{M^2} \tag{3.10}$$
We now combine the three cases given by Equations 3.8, 3.9 and 3.10 and apply basic
algebra to derive the probability of z being valid when both operands are corrupt.
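$$P\{\mathit{Valid}'(z)\} = P\{(x + y) \equiv 0 \bmod A\} = \frac{C^2 + C}{2 \cdot M^2} + \frac{C^2 - C}{2 \cdot M^2} + \frac{M \cdot C - C^2}{M^2} = \frac{C^2}{M^2} + \frac{M \cdot C - C^2}{M^2} = \frac{C}{M} \overset{\text{Eq 3.5}}{\approx} \frac{1}{A} \tag{3.11}$$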
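The overall result C/M can be confirmed by brute force for small domains. The sketch below uses a hypothetical function name; it draws both operands from all of D, counts the pairs whose wrapped sum is a code value, and returns the exact fraction.

```python
from fractions import Fraction

def valid_sum_probability(M, A):
    """Exact probability that (x + y) mod M is a multiple of A for uniform x, y in D."""
    hits = sum(1 for x in range(M) for y in range(M) if ((x + y) % M) % A == 0)
    return Fraction(hits, M * M)

print(valid_sum_probability(64, 11))   # equals C / M = 6 / 64 for the running example
```

The count works out to M * C because, for any fixed x, the map from y to (x + y) mod M is a bijection on D, so exactly C values of y produce a code value.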
DISCUSSION
Equations 3.6, 3.7 and 3.11 have a fundamental consequence. The probability of undetectably
propagating an error to a variable (Equations 3.7 and 3.11) is the same as the probability of a
fault directly corrupting the variable and resulting in a code value (Equation 3.6). Therefore, for
any execution of a program (only composed of additions), if a variable v gets corrupted (either
directly by a fault or indirectly via error propagation), then the probability of v being valid is
always 1/A. The steps of the program do not increase the probability of a single variable being
corrupt but valid given any number of ASC faults.
This result corroborates our claim that AN-encoding does not require an assumption on the
rate of faults. In practice, however, our analysis has two caveats. First, the analysis of error
propagation presented here considered only the addition operation. Without an analysis of all
other operations in the instruction set available in the real system, one cannot conclude that
the error propagation does not increase the probability of some variables being undetectably
corrupt. Second, the larger the memory footprint, i.e., the number of variables used by the
program, the higher is the probability that at least one variable is corrupt and valid given that
at least one fault has occurred.
In real-world implementations of AN-encoding, the practical counter-measure against “un-
knowns” is being conservatively careful. A real implementation introduces checks (Rule 3.6)
for the target operands, for example, after every arithmetic operation, heuristically controlling
the increase of the probability of undetected errors.
To reduce the overhead of checking, an alternative is to use a special accumulator variable.
After each operation, every target operand to be checked is added to the accumulator. Since
addition preserves the code, in this way, several operands can be checked at once. The
accumulator can then be checked at cost-effective positions in the program, trading
coverage for lower overhead.
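The accumulator idea can be sketched as follows. This is an illustrative sketch with hypothetical names, not the compiler's implementation; it uses unbounded Python integers, so overflow of the accumulator is ignored, and it does not cover the case in which several corruptions cancel each other out in the sum.

```python
A = 65521  # default encoding constant mentioned for the implementation

def encode(value):
    """AN-encode a functional value."""
    return A * value

def check(value):
    """A value is a code value if it is a multiple of A."""
    return value % A == 0

def accumulate(operands):
    """Add the target operands of several operations into a single accumulator."""
    acc = 0
    for v in operands:
        acc += v
    return acc

results = [encode(v) for v in (3, 14, 15, 92)]
corrupted = list(results)
corrupted[2] ^= 1 << 7   # single bit flip in one of the operands

print(check(accumulate(results)), check(accumulate(corrupted)))
```

A single check of the accumulator stands in for four individual checks; the price is that the error is noticed only when the accumulator is inspected.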
3.3.3 ASSUMPTION COVERAGE UNDER NON-UNIFORM RANDOM FAULTS
The assumption that random faults are uniformly distributed can itself have a low coverage
in real-world scenarios. A random fault is not uniformly distributed if some values in the
domain D have a higher probability of being picked to corrupt a variable v . We now discuss
the reasons, consequences and several heuristics that mitigate non-uniformity.
SINGLE-BIT CORRUPTIONS
To randomly corrupt a variable in a uniform way, a fault has to potentially change all bits of
the variable. Transient faults are, however, often assumed to corrupt single bits with a higher
probability than multiple bits [Bas+03; RCA07; Ulb+12; Hof+14]. In fact, evidence supports
the claim that random faults in the main memory do not follow a uniform distribution [Li+07;
Li+10]. Nevertheless, a corresponding fault model for combinational logic does not seem
to exist yet [Sch11]. Current models do characterize the error probabilities of combinational
circuit components [Shi+02; Fen+10], but not their effect on the state of the program and,
consequently, the correctness of the executions.
Due to the non-uniform distribution of random faults, some constants A invariably result
in less robust codes than others. For the sake of example, assume faults corrupt single bits
with a higher probability than multiple bits. If a program is encoded with a constant A that is
a power of two, then fault diversity has an assumption coverage as low as 50%. Any functional
value, e.g., 1, multiplied by a power-of-two constant, e.g., 16, results in a simple shift of the
functional value by a number of bits to the left, e.g., 1 * 16 = 16 = 0x10 = 1 << 4. Hence,
a single bit flip in any of the higher-order bits of the variable cannot be detected, e.g., a flip
of the 6th bit would result in 0x30 = 48 and 48 ≡ 0 mod 16. If the unencoded value has
n bits and the encoded value has n + k bits, then the probability of an undetected corrupt value
is n/(n + k) for a power-of-two encoding constant A. Since k is never larger than n in practice –
otherwise the redundant information would use more bits than the functional value itself –
the probability of an undetected corruption can be as low as 50% if n = k.
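The effect of a power-of-two constant on single-bit flips can be enumerated directly. The sketch below uses a hypothetical helper and an 8-bit encoded word (n = k = 4) as an assumed configuration; it lists the bit positions whose flip leaves the encoded value a multiple of A.

```python
def undetected_single_bit_flips(value, A, width):
    """Bit positions whose flip turns the encoded value into another code value."""
    encoded = value * A
    return [bit for bit in range(width) if (encoded ^ (1 << bit)) % A == 0]

print(undetected_single_bit_flips(1, 16, 8))   # power-of-two constant
print(undetected_single_bit_flips(1, 11, 8))   # odd constant of the running example
```

With A = 16, every flip in the upper four bits goes unnoticed, i.e., n/(n + k) = 4/8 of all single-bit flips. With an odd constant such as 11, a single-bit flip changes the value by a power of two, which is never a multiple of A, so every such flip is detected.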
                      Arithmetic code
Symptom               AN     ANB    ANBD
Faulty operation      X      X      X
Modified operand      X      X      X
Exchanged operand     -      X      X
Exchanged operator    -      X      X
Lost store            -      -      X
Control-flow error    -      O      O
Table 3.1: High-level symptoms and AN-code variants. Variants detect symptoms marked with
X and enable other mechanisms to detect symptoms marked with O.
Schiffel [Sch11] as well as Hoffmann et al. [Hof+14] discuss principles to select good con-
stants A, e.g., never pick a power-of-two constant; pick a constant that yields a high Hamming
distance between code values and noncode values. In general, a robust code makes faults
look uniformly distributed.
FAULT SYMPTOMS
In real-world scenarios, the non-uniformity of random faults also manifests in more subtle ways
than in preferring single-bit to multiple-bit corruptions. Table 3.1 summarizes the symptoms
that a fault might manifest at the program level. As discussed in Section 2.3.2, while the first
two symptoms can be modeled as unrestricted ASC faults, all other symptoms are modeled
as ASC faults that pick the corrupt value out of a subset of the domain D. Therefore, these
latter ASC faults are random but not uniformly distributed over the values of D since some
values are never picked.
Forin [For89] claims that AN codes can cope with some, but not all of these symptoms.
The reason why PAN-encoding can be proved correct despite Forin’s claim is that As-
sumptions 2.2, 3.1, 3.2, 3.3 and 3.4 exclude the possibility of exchanged operand, exchanged
operator, lost store, and control-flow error symptoms altogether. If these symptoms, however,
have a non-negligible probability of occurring in reality, then the coverage of the assumptions
has to be reconsidered. Forin proposes heuristics to neutralize each of the symptoms – with
the exception of control-flow errors. We now present these heuristics as employed by software-
only implementations of AN-encoding [Sch+10a; Hof+14] and point out some difficulties in
formalizing them.
EXCHANGED OPERANDS
Symptom. If a pointer variable is corrupt and still points to some mapped address, then
loading the value from that address does not cause the process to crash, but may violate
Assumption 3.4 instead. If the mistakenly loaded value is a code value as well, the process
experiences an exchanged operand symptom, which in turn can violate the inductive invariant
used to prove PAN-encoding correct. Consider the simple program that loads into variable y
the value from the address in variable v with Ld(y, v ), and subsequently adds x and y storing
the result into variable z with Add(z, x, y ). If v is corrupt before Ld is executed, then y is
likely to be corrupt – unless the loaded value is exactly the same as the expected value. If y
contains a code value, then the error cannot be detected by Check(y ) or Check(z) since the
value of y modulo A is still 0. Note that a similar example can be imagined if the fault corrupts
the opcode of an instruction changing its source operands.
Heuristic. In ANB-encoding, the encoding compiler assigns a signature Bx to each variable
x ∈ Vp. A variable x is encoded by multiplying its functional value xf with the constant A and
adding a signature Bx , where 0 < Bx < A. To check whether variable x is a code value, one
performs the modulo operation with A and compares the result with Bx (instead of zero). As
in AN-encoding, the addition of two code values preserves the code:
zc = xc + yc = A * xf + Bx + A * yf + By = A * (xf + yf ) + Bx + By
The resulting value zc modulo A gives a remainder Bz = Bx + By as signature. Bz is com-
puted during compile time and used in Check(z) during runtime. Intuitively, if each variable
v has a unique signature Bv , then the probability of randomly picking a variable that has the
same signature as v is zero since the signatures are unique. Therefore, if a fault “exchanges
the operands”, then ANB-encoding guarantees the result of the computation contains an un-
expected signature, i.e., is invalid.
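A minimal sketch of the signature check follows. The signature values, variable names, and helper functions are hypothetical and chosen only for illustration; the sketch assumes unique signatures whose sums stay below A.

```python
A = 65521

# Hypothetical per-variable signatures, each satisfying 0 < B < A.
SIGNATURES = {"x": 17, "y": 42, "w": 99}

def encode(name, functional_value):
    """ANB-encode a functional value with the signature of its variable."""
    return A * functional_value + SIGNATURES[name]

def check(encoded_value, expected_signature):
    """A value is valid if its remainder modulo A equals the expected signature."""
    return encoded_value % A == expected_signature

x_c = encode("x", 5)
y_c = encode("y", 7)
w_c = encode("w", 7)   # a different variable holding the same functional value

B_z = SIGNATURES["x"] + SIGNATURES["y"]   # known at compile time

z_ok = x_c + y_c
z_exchanged = x_c + w_c   # fault: w is used in place of y

print(check(z_ok, B_z), check(z_exchanged, B_z))
```

Although y and w hold the same functional value, the exchanged operand changes the remainder modulo A, so the check against the precomputed B_z fails.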
Issues. A formalization of ANB-encoding poses several challenges. First, the validity defini-
tion has to be changed to accommodate the signatures. One might try to define Valid(v) ≜
(s[v] % A) = B[v], where function B : V → D maps variables to signatures. The fault diversity
assumption then asserts that a fault does not corrupt a variable in such a way that it contains
the signature of the corrupt variable. Such a formalization, however, only holds as long as no
load operations are performed. Values loaded into other variables contain signatures from
their original variables, easily breaking a fault diversity assumption with the above definition
of Valid(v). An alternative, more restrictive, definition would be Valid(v) ≜ (s[v] % A) ∈ Bs,
where Bs is the set of all signatures of the program. More precise definitions of validity seem
to be cumbersome, requiring the indirections of Ld and St to be taken into account in the
fault assumption definition.
Second, for all Bx and By , Bx + By < A must hold if the variable x is to be added to variable
y at any point in the program. Otherwise, the computation will result in invalid signatures
although no error has occurred. Similar assumptions have to be made for other operators too.
Therefore, the constant A and the use of the variables in the program will actually define the
maximum number of unique signatures that are available.
Third, since the validity now depends on the expected signature B, a receiver process in
the distributed system has to predict which signature to expect for each variable of a given
message; remember that messages are simply arrays of variables. In our implementation
(Section 3.4.2), we simply remove the B signatures before transmitting a message.
Finally, the ANB-encoding rules are much more intricate than those presented for PAN-
encoding. We failed to devise an adequate formalization for ANB-encoding. Unfortunately,
without a formalization, ANB-encoding cannot be proved correct.
EXCHANGED OPERATORS
Symptom. While the processor decodes an instruction fetched from memory, a fault in
the processor can result in an exchanged operator, i.e., an operation different from the one
specified in the instruction is executed on the specified operands. Consider the example of
Add(z, x, y ) being exchanged by a subtraction Sub(z, x, y ). With AN-encoding, Check(z) does
not detect an error in variable z because its value is a multiple of A:
xc − yc = xf * A − yf * A = (xf − yf ) * A
Heuristic. Forin [For89], Schiffel et al. [Sch+10a], and Hoffmann et al. [Hof+14] claim that ANB-
encoding can detect exchanged operators. Indeed, the following examples show that if an
addition is exchanged by a subtraction or by a multiplication, the resulting signature differs:
(xc + yc) % A = ((x + y) * A + (Bx + By)) % A = Bx + By
(xc − yc) % A = ((x − y) * A + (Bx − By)) % A = Bx − By
(xc * yc) % A = ((x * y) * A² + (x * By + y * Bx) * A + (Bx * By)) % A = Bx * By
If Bx > 0 and By > 0 and Bx ≠ By, the signatures obviously differ: Bx + By ≠ Bx − By ≠ Bx * By.
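The three remainders can be computed for concrete values. The constants and functional values below are hypothetical and chosen so that all three signatures stay below A.

```python
A = 65521
B_X, B_Y = 5, 3   # hypothetical signatures with B_X != B_Y

def enc(value, signature):
    """ANB-encode a functional value."""
    return value * A + signature

x_c, y_c = enc(20, B_X), enc(4, B_Y)

sig_add = (x_c + y_c) % A   # B_X + B_Y
sig_sub = (x_c - y_c) % A   # B_X - B_Y
sig_mul = (x_c * y_c) % A   # B_X * B_Y
print(sig_add, sig_sub, sig_mul)
```

A check against the compile-time signature of the addition therefore rejects the results of the subtraction and of the multiplication in this instance.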
Issues. The main challenge in formalizing exchanged operators is the large number of cases
to be considered. Every operation used to compute on encoded values has to be substituted
by every other operation available in the hardware, and the resulting value has to present a
different signature than the original operation. By starting with the addition, one can quickly
find counter-examples that refute the claim of ANB-encoding detecting exchanged operators.
We found counter-examples when an addition is exchanged with a modulo operator or with a
division operator. We present here one counter-example for addition exchanged with a modulo
operator. Let the constants be A = 7, Bx = 2, By = 3, and let the program consist of a single
instruction: Add(z, x, y ). The signature Bz = Bx + By = 5 is calculated during compilation time.
Recall that the variable z is valid if and only if zc ≡ Bz mod A. Finally, assume x = 9 and y = 1
for some execution. The addition of x and y is executed as follows and the result is valid.
zc = xc + yc = (x * A + Bx ) + (y * A + By )
= (9 * 7 + 2) + (1 * 7 + 3)
= (9 + 1) * 7 + (2 + 3)
= 10 * 7 + 5
Now let the addition operation Add(z, x, y ) be exchanged by the modulo operation Mod(z, x, y ).
The addition exchanged by a modulo operator is executed as follows.
zc = xc % yc = (x * A + Bx ) % (y * A + By )
= (63 + 2) % (7 + 3)
= 65 % 10
= 5
To check whether the result of an operation is noncode or code, one takes the modulo A of
the result and compares that with the expected signature Bz computed offline. The addition
and the modulo operations on x and y calculate, however, exactly the same signature 5.
(xc + yc) % A = (10 * 7 + 5) % 7 = 5 = (5) % 7 = (xc % yc) % A
This example is not an isolated case: for any pair Bx and By and for any constant A, there are
pairs of values of x and y such that if they are intended to be added but an exchanged operator
performs a modulo or division operation, then the result contains the same signature as the
addition. In other words, some value pairs are unprotected against exchanged operator faults
with probability 1.
The probability of such an “aliasing” of signatures depends on the value of the program
variables and not only on the constants of the encoding. Typically, one does not know what the
program variables contain as values. More importantly, one cannot restrict what the variables
are allowed to contain as values. We see as an important next step the calculation of how
many pairs are unprotected for given sets of constants. Unfortunately, we could not come up
with an analytical result to precisely determine this number.
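Although no analytical result is available, unprotected pairs can at least be listed by brute force for small ranges. The sketch below uses the constants of the counter-example (A = 7, Bx = 2, By = 3); the function names and the search range are arbitrary choices of this sketch, and the enumeration says nothing about realistic constants or value ranges.

```python
A, B_X, B_Y = 7, 2, 3
B_Z = B_X + B_Y   # signature expected for z = x + y

def enc(value, signature):
    """ANB-encode a functional value."""
    return value * A + signature

def aliases_with_modulo(x, y):
    """True if Mod(z, x, y) executed in place of Add(z, x, y) still passes Check(z)."""
    return (enc(x, B_X) % enc(y, B_Y)) % A == B_Z

def unprotected_pairs(limit):
    """All functional-value pairs below limit for which the signatures alias."""
    return [(x, y) for x in range(limit) for y in range(limit)
            if aliases_with_modulo(x, y)]

pairs = unprotected_pairs(16)
print(len(pairs), (9, 1) in pairs)
```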
The issues with exchanged operands complicate the modeling and formal proof of ANB-
encoding further. At the moment, we see ANB-encoding as a practical protection against
exchanged operators, but with no provable guarantees, no clear fault assumption and, conse-
quently, unknown fault coverage.
LOST STORES
Symptom. Like load operations, store operations might also propagate errors without crashing
the process when the target address is corrupt. The symptoms observed are not exchanged
operands but lost stores, i.e., the value read by a load instruction is an old value because the
intended value was stored somewhere else.
Heuristic. ANBD-encoding adds a further value Dx to each variable x.⁴ Dx is a timestamp
that counts the version of the variable x so that the first time x is written Dx is 1, the second
time 2, and so on. The number of times x was written – effectively, the value of Dx – is added
to the variable itself and also kept in a version table.
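The interplay between the timestamp in the variable and the version table can be sketched with a toy memory. The class, its fields, and the way a lost store is simulated (the version table is updated but the cell is not) are assumptions of this sketch and not the implementation referenced in this work.

```python
A = 65521

class VersionedMemory:
    """Toy memory with an ANBD-style version table (hypothetical structure)."""

    def __init__(self, signatures):
        self.signatures = dict(signatures)                 # variable -> B
        self.versions = {name: 0 for name in signatures}   # version table: variable -> D
        # Every cell starts as the encoded value 0 with version 0.
        self.cells = {name: b for name, b in self.signatures.items()}

    def store(self, name, functional_value, lost=False):
        """Write a new value; with lost=True the write never reaches the cell."""
        self.versions[name] += 1
        if not lost:
            self.cells[name] = (A * functional_value
                                + self.signatures[name]
                                + self.versions[name])

    def check(self, name):
        """Valid if removing the expected version leaves the expected signature."""
        return (self.cells[name] - self.versions[name]) % A == self.signatures[name]

memory = VersionedMemory({"x": 17})
memory.store("x", 5)
fresh = memory.check("x")
memory.store("x", 6, lost=True)   # the new value ends up somewhere else
stale = memory.check("x")
print(fresh, stale)
```

After the lost store, the cell still carries the old timestamp while the table expects the new one, so the check fails.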
Issues. ANBD brings additional problems into the formalization of encoding. One should
assume that no ASC fault consistently corrupts any variable x and its version Dx in the table.
In this way, the entry in the table protects Dx in the variable and vice versa. Moreover, the
extent of ASC faults has to be further limited by assumption. Consider the following example.
A variable x is going to be written with a store operation. The address of x is in variable vx
and the address of x’s entry in the version table is in variable ve. If an ASC fault corrupts
both pointers, then the lost store cannot be detected. Technically, there are two lost stores:
one for vx and one for ve. If we restrict ASC faults to only corrupt one variable, then two
ASC faults can corrupt both pointers. Hence, we also need to restrict the fault frequency by
assumption. We see as an open question how to precisely model the timestamps and the
effect of ASC faults with ANBD-encoding.
Finally, the correct timestamp D has to be “predicted” by a receiver, otherwise the re-
ceiver cannot operate with the encoded value. Since we could not devise a way to let
the receiver know the timestamp D, our implementation (Section 3.4.2) simply removes the
timestamp D from each variable of a message before sending the message out; the receiver
then adds its own D to the variables of the message. Estimating the coverage impact of this
window of vulnerability is not trivial.
CONTROL-FLOW ERRORS
Symptom. Since random addresses do not necessarily crash the process, we have to consider
control-flow errors. AN-encoding does not provide any means to detect control-flow errors
unless the divergent control flow is reflected in the code values, i.e., AN-encoding might
detect control-flow errors only by accident.
A fault may trigger two types of control-flow errors. First, a fault can corrupt the program
counter, taking the flow of execution to an arbitrary instruction of the program. Second, a
fault can corrupt an operand used by a branch instruction – either the conditional value or the
target address.
Heuristic. Although ANB- and ANBD-encoding cannot directly detect control-flow errors, the
signatures of the variables used in a program can be combined into “block signatures”. A basic
block is a sequence of instructions without branches that ends with a branching instruction. At
compile time, the compiler calculates the sum of the signatures of all values used in each basic
block and, from these basic-block signatures, computes a list of expected signatures.
At runtime, the block signature is calculated again, but this time from the current values
of the variables: after each instruction, the operands are added to an accumulator. At the
end of the block, the accumulator modulo A has to contain the sum of the signatures of all
operands (and resulting operands) of the block. The calculated signature can be compared
against the expected signatures.
4 Dx is not to be confused with the domain D.
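The accumulator check can be sketched as follows. This is only an illustration under assumptions: the constant A, the signature values, and all function names are hypothetical, and the sketch covers a single basic block rather than the compiler's list of expected signatures. Since an ANB-encoded value is A·x + B, its remainder modulo A is its signature, so summing the operands modulo A reproduces the compile-time sum of signatures.

```c
#include <assert.h>
#include <stdint.h>

static const int64_t A = 58659;  /* illustrative encoding constant (assumption) */

/* ANB-encode x with a static signature b, 0 < b < A. */
static int64_t anb_encode(int32_t x, int64_t b) {
    assert(b > 0 && b < A);
    return A * (int64_t)x + b;
}

/* Runtime accumulator of one basic block, kept reduced modulo A. */
typedef struct { int64_t acc; } block_acc;

/* Called after each instruction for every operand and result it uses. */
static void block_add(block_acc *ba, int64_t operand_c) {
    int64_t r = operand_c % A;  /* the signature, if the value is intact */
    if (r < 0) r += A;          /* C's % may yield a negative remainder */
    ba->acc = (ba->acc + r) % A;
}

/* Called at the end of the block: compare with the compile-time sum. */
static int block_check(const block_acc *ba, int64_t expected_sum) {
    return ba->acc == expected_sum % A;
}
```

For an encoded addition zc = xc + yc, the result carries the signature Bx + By, so the expected sum of the block is Bx + By + (Bx + By). Executing the block with an operand that carries a different signature changes the accumulated remainder and the check fails.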
Issues. Control-flow errors introduce another level of complexity to the correctness reason-
ing of AN-encoding. Nevertheless, in our fault injection experiments, control-flow errors have
a less prominent effect on the executions than other types of faults (see Sections 4.5 and 5.4).
Therefore, although this heuristic is important, we give its modeling a lower priority than
that of the previous heuristics. In Section 4.2.2, we propose, as part of SEI-hardening, a
restricted control-flow error detection algorithm that is provably correct. Its restriction is
the assumption that a single control-flow error occurs during a traversal.
3.4 A FRAMEWORK FOR BUILDING HARDENED DISTRIBUTED SYSTEMS
In this section, we present a framework for building distributed algorithms that automati-
cally hardens processes using AN-encoding. To perform the transformation automatically, we
employ the encoding compiler by Schiffel et al. [Sch+10a]. To enable communication and end-
to-end protection of encoding in the distributed algorithms, we develop an encoding wrapper
module for our framework. We start by describing the framework and the encoding transfor-
mation. Next, we explain how error isolation is achieved in practice. Finally, we discuss costs
and optimizations of our approach. Section 3.5 evaluates the performance and fault coverage
of two algorithms implemented on top of our framework.
3.4.1 PROCESS INTERFACES
Each process of a distributed algorithm is written as a module in the C language using the
framework’s interface to communicate with the external world. Processes are identified with
an integer id. State variables are global variables inside the process module.
Figure 3.3 shows the interface provided by the framework. The function ucast() stands
for unicast; it sends a message m, represented by pointer data and size s, to a destination
process dst. The sender id can be added to the message payload if required by the receiver.
Many distributed algorithms make use of alarms, i.e., timeouts. Alarms are implemented as
messages to oneself with a given delay. The function alarm() schedules an alarm identified
by aid, which is triggered after local time t. The framework provides two other functions: the
clock() function writes the value of the local clock into variable t, and the abort() function
terminates the process. Alarm scheduling, message transmission, and process termination are
all handled in the framework’s event loop. Note that the interface provided by our framework
can be extended to any number of functions as long as the encoding wrapper is adapted
accordingly, as described in Section 3.4.2.
void ucast(int32_t dst, void* data, size_t s);
void alarm(int32_t aid, time_t* t);
void clock(time_t* t);
void abort(void);
Figure 3.3: Interface provided to processes
void init(int32_t id);
void recv(void* data, size_t s);
void trig(int32_t aid);
Figure 3.4: Interface expected from processes
Processes are expected to implement the interface in Figure 3.4. Once the program is
started by the operating system, the framework’s event loop initializes the process via the
init() function with an integer representing its process id. When the event loop receives
a message m from the network, it calls recv() with the corresponding pointer and size. The
function trig() is called with the alarm aid once time t is reached if the alarm() function
was previously called with the same aid. Besides these functions, the framework provides
a few other functions to read configurations and terminate the process. These functions are
however not relevant for our discussion.
Figure 3.5: Encoded process and wrapper: message sending (diagram labels: sendc(dstc, mc)
and recvc(mc) between the encoded process and the wrapper; send(dstf, mc) and recv(mc)
between the wrapper and the event loop; mc exchanged with the network via src IP and dst IP)
Figure 3.6: Encoded process and wrapper: alarms and triggers (diagram labels: alarmc(aidc, tc)
and trigc(aidc) between the encoded process and the wrapper; alarm(aidc, tf) and trig(aidc)
between the wrapper and the event loop; the trigger occurs after time tf)
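A process module written against these two interfaces might look like the following sketch. It is a hypothetical example, not one of the algorithms evaluated in this chapter. The framework downcalls are replaced by recording stubs with an fw_ prefix, because alarm(), clock(), and abort() would clash with libc when the sketch is compiled on its own; the time type, the alarm id, and the period are assumptions as well.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ---- Stand-ins for the downcalls of Figure 3.3 (hypothetical stubs). ---- */
typedef int64_t fw_time_t;          /* assumed time representation */
static fw_time_t fw_now;            /* simulated local clock */
static int32_t   fw_last_dst = -1;  /* destination of the last ucast */
static int32_t   fw_last_aid = -1;  /* id of the last scheduled alarm */
static int32_t   fw_last_payload;
static fw_time_t fw_last_deadline;

static void fw_ucast(int32_t dst, void *data, size_t s) {
    assert(s == sizeof fw_last_payload);
    fw_last_dst = dst;
    memcpy(&fw_last_payload, data, s);
}
static void fw_alarm(int32_t aid, fw_time_t *t) {
    fw_last_aid = aid;
    fw_last_deadline = *t;
}
static void fw_clock(fw_time_t *t) { *t = fw_now; }

/* ---- Process module: the upcalls of Figure 3.4. ---- */
enum { N_PROCS = 3, AID_TICK = 1, PERIOD = 100 };
static int32_t my_id;    /* state variables are globals of the module */
static int32_t counter;

static void schedule_tick(void) {
    fw_time_t t;
    fw_clock(&t);
    t += PERIOD;
    fw_alarm(AID_TICK, &t);
}

void init(int32_t id) {
    my_id = id;
    counter = 0;
    schedule_tick();
}

void recv(void *data, size_t s) {
    int32_t v;
    if (s != sizeof v) return;     /* ignore messages of unexpected size */
    memcpy(&v, data, sizeof v);
    if (v > counter) counter = v;  /* adopt the largest counter seen */
}

void trig(int32_t aid) {
    if (aid != AID_TICK) return;
    counter += 1;
    fw_ucast((my_id + 1) % N_PROCS, &counter, sizeof counter);
    schedule_tick();
}
```

The module keeps its state in globals and reacts only to upcalls, which is the shape the framework expects; in the real framework the fw_ stubs would be the functions of Figure 3.3.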
A process is started with a static list of IP addresses and ids of all processes participating
in the distributed algorithm. The translation between ids and IP addresses is performed in the
framework’s event loop.
Although our model assumes unreliable datagram communication, our implementation
transmits messages between processes via TCP connections since the TCP protocol can
cope with most network faults in a practical way. The application running on top should, how-
ever, not rely on the delivery guarantees of TCP. Also, note that algorithms relying on alarms
and clock time should make additional assumptions: the clock variable should never be corrupt
and eventually communication and processing should be timely (see also Section 2.2.2).
3.4.2 ENCODING PROCESSES
With no source-code modification, processes can be hardened by compiling the process mod-
ule with the encoding compiler, and subsequently linking it with the framework’s library and a
wrapper module. We use the encoding compiler by Schiffel et al. [Sch+10a], which can trans-
form programs with AN-, ANB-, and ANBD-code. By selecting different arithmetic error codes,
our framework allows the developer to trade fault coverage for CPU cycles depending on the
system requirements. In Section 3.5, we evaluate error detection and performance overhead
of benign-fault-tolerant algorithms running with encoded processes.
WRAPPER
The wrapper is an algorithm-independent module, which allows the encoded process to access
the functions provided by the framework; it has a non-encoded part and an encoded part.
Originally, the encoding compiler was only capable of compiling complete programs. We have
modified the encoding compiler such that only the process module and part of the wrapper
are encoded; together, these two parts are called the encoded process.
Figure 3.5 depicts an encoded process receiving and sending messages in our framework
– the subscript c represents an encoded function or value. State and functions inside the
shaded area are all encoded. The wrapper decodes arguments when necessary. Some (but
not all) arguments have to be decoded when leaving the encoded area because the event loop
does not “speak the code”. For instance, the encoded destination id of a message dstc has
to be decoded to its functional value dstf in order to be translated into an IP address dst IP by
the event loop. In contrast, the payload of a message is kept encoded during the message
life-time unless the bandwidth optimization (Section 3.4.4) is enabled.
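The decoding of the destination id can be sketched as below. This is an illustration under assumptions, not the wrapper's actual code: the constant A and the function names are hypothetical, and the sketch returns an error code where the wrapper would presumably abort the process.

```c
#include <assert.h>
#include <stdint.h>

static const int64_t A = 58659;  /* illustrative encoding constant (assumption) */

/* AN-encode a 32-bit functional value into a 64-bit code word. */
static int64_t an_encode(int32_t xf) { return A * (int64_t)xf; }

/* A word is a code word iff it is a multiple of A within the 32-bit range. */
static int an_is_code(int64_t xc) {
    return xc % A == 0 && xc / A >= INT32_MIN && xc / A <= INT32_MAX;
}

/* Wrapper side of sendc(dst_c, m_c): only the destination id is decoded, so
 * that the event loop can map it to an IP address; the payload m_c would be
 * passed on untouched. Returns 0 and writes dst_f, or -1 if dst_c is noncode. */
static int wrapper_decode_dst(int64_t dst_c, int32_t *dst_f) {
    assert(dst_f != 0);
    if (!an_is_code(dst_c)) return -1;
    *dst_f = (int32_t)(dst_c / A);
    return 0;
}
```

Because A is odd and larger than one, flipping any single bit of a code word yields a value that is no longer a multiple of A, so such a corruption of dstc is caught before the id is decoded.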
Figure 3.6 depicts the alarm scheduling and triggering of an encoded process. Again, the
wrapper must decode the time tc, otherwise the framework cannot schedule the alarm. In
contrast, the encoded alarm id, aidc, is kept encoded. The event loop sees the aidc as a 64-bit
integer. Once the alarm is triggered, the event loop calls trig() with the aidc as argument.
MESSAGE VALIDITY IMPLEMENTATION
The multiple bytes in a message are all values of the encoded process. A message is invalid if
any of its bytes is noncode. To guarantee that the message is complete, the encoded part of
our wrapper piggybacks the message size onto the message, more precisely, the encoded
representation of the message size. Therefore, a valid message is a message whose bytes are
correctly encoded, including its piggybacked size.
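A validity check along these lines could look like the following sketch. The message layout is an assumption made for illustration: n payload words followed by one word holding the AN-encoded payload length in words. The constant A and the function names are hypothetical as well.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static const int64_t A = 58659;  /* illustrative encoding constant (assumption) */

/* Assumed layout: n_payload encoded words, then the encoded payload length. */
static void msg_seal(int64_t *m, size_t n_payload) {
    assert(m != NULL);
    m[n_payload] = A * (int64_t)n_payload;
}

/* A message of n_words words is valid if every word is a code word and the
 * piggybacked length matches the number of payload words received. */
static int msg_is_valid(const int64_t *m, size_t n_words) {
    assert(m != NULL);
    if (n_words == 0) return 0;
    size_t n_payload = n_words - 1;
    for (size_t i = 0; i < n_words; i++)
        if (m[i] % A != 0) return 0;  /* noncode word */
    return m[n_payload] == A * (int64_t)n_payload;
}
```

A truncated message fails the length comparison even if all remaining words are valid code words, which is the reason for piggybacking the size.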
ANB- AND ANBD-ENCODED MESSAGES
A message with ANB-code or ANBD-code has to be slightly adapted before being transmitted.
A receiver process cannot know what B and D signatures to expect in the variables of a
message since processes do not execute in lockstep. Therefore, the B and D signatures have
to be subtracted from the message such that the message becomes AN-encoded only. The
wrapper of the receiver process adds its local signatures before passing the message into
the encoded area. The protection provided by the AN-encoding is, nevertheless, preserved: a
wrong subtraction or addition of a signature yields a value that fails the check with the
modulo operation.
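The signature handling can be sketched as follows, with one combined signature per message word standing in for B (and D). The constant A, the signature values, and the function names are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static const int64_t A = 58659;  /* illustrative encoding constant (assumption) */

/* Sender side: subtract the sender's signatures so that each word becomes
 * AN-encoded only. */
static void msg_strip(int64_t *m, const int64_t *sender_sig, size_t n) {
    for (size_t i = 0; i < n; i++) m[i] -= sender_sig[i];
}

/* Receiver side: add the receiver's local signatures before the message
 * enters the encoded area. */
static void msg_adopt(int64_t *m, const int64_t *receiver_sig, size_t n) {
    for (size_t i = 0; i < n; i++) m[i] += receiver_sig[i];
}

/* AN check used while the message is in transit. */
static int an_check(int64_t v) { return v % A == 0; }

/* ANB check used inside an encoded process for signature b. */
static int anb_check(int64_t v, int64_t b) {
    assert(b > 0 && b < A);
    return (v - b) % A == 0;
}
```

Subtracting the wrong signature, for example 9 instead of 5 from the word A·3 + 5, leaves a remainder modulo A, so the resulting word fails an_check() at the receiver.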
3.4.3 ERROR ISOLATION IN PRACTICE
We now explain how errors caused by ASC faults are “virtualized” into benign failures in our
framework. Error isolation is achieved if correct processes do not modify their state based on
messages corrupted by the network or by the sender process. Remember that we consider
only distributed algorithms that tolerate benign failures: crash failures, message misrouting,
message duplication, and performance/omission failures.
MESSAGES
A process aborts upon detection of noncode values in one of its variables, transforming a
corruption into a crash failure. Nevertheless, if a corrupt message is sent out – for example,
if the check of the noncode value has also failed – then the message is invalid with a high
probability. Messages always carry evidence of good behavior of the process that created
them: since messages are nothing but variables in the encoded process, a valid message
is a correctly encoded array of values. Hence, we expect ASC faults to invalidate output
messages with high probability due to the properties of AN-encoding.
If the wrapper of a process receives an invalid message, it discards it, transforming the
corrupt message into an omission failure. The wrapper might fail to detect an invalid message
if it suffers an arbitrary fault itself. The “noncodeness” of an invalid input message is again
propagated to any modified variable by the encoded process with high probability.
The event loop of a process, the network stack, the operating system, etc., are all essentially
part of the network component (Section 2.1.1). Any ASC fault in the unencoded part of a
process that affects the content of an encoded message makes the message invalid with high
probability. Because the wrapper decodes the destination dstc of a message (Figure 3.5), if
dstf is modified by an ASC fault, the message is simply misrouted, which is a benign failure.
ALARMS
Alarms are local messages. The wrapper aborts the process if the process calls alarm() with
an invalid alarm id. If the wrapper fails to detect an invalid alarm id when trig() is called,
then the error propagates to variables modified via the properties in Figure 3.1.
Since the alarm identifier aidc is kept encoded in the event loop as a 64-bit integer, if an
ASC fault modifies the alarm identifier, the error is detected by the wrapper and the process
is aborted. The wrapper decodes, however, the encoded time value tc and sets the alarm in
the event loop with its functional value tf . If the scheduled time tf of an alarm is modified
by a fault, the alarm is triggered either too early or too late. Performance failures are benign
failures (Section 2.2.2). Note that the encoded part of the wrapper can detect out-of-order
alarms by bookkeeping the id of the next alarm to be triggered.
3.4.4 LIMITATIONS AND OPTIMIZATIONS
Our framework restricts in a few aspects how the process module can be programmed. These
limitations are imposed by the current implementation of the encoding compiler. First, the
process implementation cannot use function pointers. Nevertheless, the algorithms we
have implemented could easily be designed to use switch-case constructs instead. Second,
the process implementation cannot use 64-bit variables. Therefore, for example, time has to
be represented by a structure with two 32-bit integers and manipulated using macros. Last,
the source code of all libraries used within the process implementation has to be available,
so that the encoding compiler can encode them. In our experience, the second restriction
was the most limiting.
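The workaround for the second restriction might look like the sketch below. The structure layout and the macro names are hypothetical; the source only states that time is held in two 32-bit integers and manipulated with macros. The sketch deliberately uses 32-bit arithmetic only.

```c
#include <assert.h>
#include <stdint.h>

/* Time as two 32-bit halves; field and macro names are hypothetical. */
typedef struct { uint32_t hi; uint32_t lo; } time64;

/* t += delta (delta is a 32-bit value), carrying into the high half. */
#define TIME_ADD(t, delta) do {                 \
        uint32_t old_lo_ = (t).lo;              \
        (t).lo += (uint32_t)(delta);            \
        if ((t).lo < old_lo_) (t).hi += 1u;     \
    } while (0)

/* a < b, compared lexicographically on (hi, lo). */
#define TIME_LT(a, b) \
    ((a).hi < (b).hi || ((a).hi == (b).hi && (a).lo < (b).lo))

/* Example use: has the deadline been reached at time `now`? */
static int time_expired(time64 now, time64 deadline) {
    return !TIME_LT(now, deadline);
}

/* Small self-check of the carry handling. */
static void time_selfcheck(void) {
    time64 t = {0u, UINT32_MAX};
    TIME_ADD(t, 1u);
    assert(t.hi == 1u && t.lo == 0u);
}
```

The carry is detected through unsigned wrap-around of the low half, which is well defined in C and avoids any 64-bit intermediate value.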
Encoding is known to be computationally expensive [Sch+10a; Sch+10c]. In our framework,
only the distributed algorithm’s processes and part of the wrapper are encoded; the remaining
framework code, libraries, network stack, and operating system are left untouched. Other
optimizations to hide the cost of encoding are also possible. For example, the wrapper has to
perform some expensive operations when receiving a message: (1) it checks if the message
is valid; (2) it allocates a region of memory inside the encoded process; (3) it copies the
received message over this region of memory; and (4), if the process is compiled with ANB- or
ANBDmem-code, it additionally adds signatures and versions to the message copy. Only after
these operations does the wrapper call recv() with the pointer to the memory region allocated
in (2). If a machine has multiple cores/processors, a simple optimization could run operations
(1)-(4) in a separate thread, overlapping the receiving and processing of messages in the
encoded process. Another approach to exploit multiprocessing is to use ParExC [Süß+09], which
runs checks on helper threads. Although we use one extra thread to receive messages in the
event loop, we decided to perform all the work above the event loop with a single thread. In
this way, we can precisely measure the performance overhead.
BANDWIDTH OPTIMIZATION
Encoding incurs a higher network bandwidth utilization. The compiler transforms each 32-bit
word into a 64-bit word, blowing up message sizes by at least a factor of two. We propose a
bandwidth optimization that reduces the per-message overhead to a constant size by using
checksums, e.g., a 32-bit CRC, to protect messages in transit.
Our optimization works as follows:
1. When the encoded process calls ucastc() with a message mc, the wrapper calculates
the checksum s of mc.
2. The wrapper decodes mc and calls ucast(m · s), forwarding the message to the
framework’s event loop.
3. The message m · s is sent to the destination via the network.
4. When the message m · s is received from the network, the event loop calls recv(),
which is implemented by the wrapper.
5. The wrapper encodes m again and calculates the checksum s′ of the resulting mc.
6. The (encoded part of the) wrapper checks whether s = s′, and calls recvc(mc) in the
positive case; otherwise, drops mc.
Figure 3.7: Paxos implementation in encoding framework. Client sends request payload to all
acceptors. Proposer sends phase 2A to and receives phase 2B from majority of
acceptors. (Diagram: client, proposer, acceptor 1 to acceptor n; messages: request,
payload, phase 2A, phase 2B, ack.)
A fault that corrupts the message is detected by the checksum. The checksum protects
the message end to end because it is calculated on the encoded data. Because the
comparison between checksums is itself encoded, the probability that a failure of the
comparison goes undetected is negligible.
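The sender and receiver sides of the bandwidth optimization can be sketched as below. This is a simplified illustration, not the wrapper's code: the constant A and the function names are assumptions, the CRC is a plain bitwise CRC-32, the checksum is computed over the in-memory representation of the encoded words (so sender and receiver are assumed to share byte order), and the final comparison is written in plain C although the text performs it in encoded form.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static const int64_t A = 58659;  /* illustrative encoding constant (assumption) */

/* Bitwise CRC-32 with the reflected polynomial 0xEDB88320. */
static uint32_t crc32_bytes(const void *data, size_t len) {
    const uint8_t *p = (const uint8_t *)data;
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= p[i];
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Steps 1-2 (sender): checksum the encoded words, then decode them.
 * Returns -1 if a word is noncode. */
static int wrapper_pack(const int64_t *mc, size_t n, int32_t *m, uint32_t *s) {
    assert(mc && m && s);
    *s = crc32_bytes(mc, n * sizeof mc[0]);
    for (size_t i = 0; i < n; i++) {
        if (mc[i] % A != 0) return -1;
        m[i] = (int32_t)(mc[i] / A);
    }
    return 0;
}

/* Steps 5-6 (receiver): encode again, recompute the checksum, and compare.
 * Returns 0 if the message may be passed to recvc(), -1 if it must be dropped. */
static int wrapper_unpack(const int32_t *m, uint32_t s, size_t n, int64_t *mc) {
    assert(m && mc);
    for (size_t i = 0; i < n; i++) mc[i] = A * (int64_t)m[i];
    return crc32_bytes(mc, n * sizeof mc[0]) == s ? 0 : -1;
}
```

Only the decoded words and one 32-bit checksum travel over the network, so the per-message overhead is constant, while the checksum still refers to the encoded representation on both ends.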
3.5 EXPERIMENTAL EVALUATION
In this section, we experimentally evaluate our framework focusing on two questions: (1)
How costly is the encoding for distributed algorithms? (2) What is the probability that
error isolation is violated, i.e., what is the probability of corrupt but valid messages being sent
by a process? To answer these questions, we first experimentally compare the performance of
different variants of two important distributed algorithms: Paxos and strong leader election.
Next, we inject faults in a process with the EIS fault injector [Sch+10b] and measure the
probability of corrupt but valid messages being sent.
3.5.1 ALGORITHMS AND METHODOLOGY
PAXOS
We implemented the multiple-instance variant of the Paxos algorithm [Lam98], which achieves
consensus on a sequence of values and is widely used as a building block for replicated
systems [Bol+11; Bur06]. Our implementation follows the structure in Figure 3.7. In our
implementation, processes are either clients, proposers, or acceptors. A proposer orders requests
from clients, and sends them to acceptors with a phase 2A message. Request messages
are in fact request identifiers; the request’s payload is directly sent to the acceptors to avoid
a network bottleneck on the proposer.
Once a request and its payload are both received, the acceptor logs them and replies to the
proposer with a phase 2B message. Acceptors only do in-memory logging. If the proposer
receives a reply, i.e., a phase 2B message, from a majority of acceptors, it sends an ack
message to the client containing the identifier and ordering of the chosen request.
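The majority rule at the proposer can be sketched as follows. The data structure and names are hypothetical and the sketch handles a single instance only; duplicates are ignored through a bitmask, since message duplication is one of the benign failures the algorithm has to tolerate.

```c
#include <assert.h>
#include <stdint.h>

/* Bookkeeping for one Paxos instance at the proposer (hypothetical layout). */
typedef struct {
    int32_t  n_acceptors;  /* at most 32 in this sketch */
    uint32_t replied;      /* bit i is set once acceptor i sent phase 2B */
    int      chosen;       /* set once a majority has replied */
} instance;

static int count_bits(uint32_t x) {
    int c = 0;
    while (x) { x &= x - 1; c++; }
    return c;
}

/* Record a phase 2B reply. Returns 1 exactly once, when this reply completes
 * a majority and the ack can be sent to the client; returns 0 otherwise. */
static int on_phase2b(instance *in, int32_t acceptor) {
    assert(in->n_acceptors > 0 && in->n_acceptors <= 32);
    if (acceptor < 0 || acceptor >= in->n_acceptors) return 0;
    in->replied |= (uint32_t)1 << acceptor;
    if (in->chosen || count_bits(in->replied) <= in->n_acceptors / 2) return 0;
    in->chosen = 1;
    return 1;
}
```

With 3 acceptors the second distinct reply completes the majority, and with 5 acceptors the third one does.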
In classical Paxos [Lam98], proposers also perform a phase 1 and determine a distinguished
proposer to guarantee progress. Nevertheless, in practical systems phase 1 is
not executed as long as the distinguished proposer does not change between requests [CGR07].
Since we consider only the common-case performance, we have not implemented phase 1.
Acceptors run on dedicated machines. We experiment with 3 and with 5 acceptors, i.e.,
the maximum number of faulty processes is f = 1 and f = 2, respectively. These numbers of
acceptors represent common choices in practical systems [Hun+10; Bur06]. We run a single
client and a single proposer collocated on the same machine. The client generates requests
at a fixed interval of time given by a load parameter. We allow the client to send at most
200 concurrent requests. The client batches requests up to 1 MB if the limit of concurrent
requests is reached. To avoid exhausting the acceptors’ memory, our client sporadically sends
prune messages to the proposer, which propagates the prune request to all acceptors.
STRONG LEADER ELECTION
We implemented the strong leader election algorithm by Fetzer and Cristian [FC99a]. “Strong”
refers here to the safety property satisfied by the algorithm: there are never two leaders at
the same instant of time. Our implementation follows closely the algorithm described in their
paper. The support number is set to a majority, i.e., a process only becomes leader if it receives
timely support from a majority of processes. Our experiments run with three processes, each
of them on a different machine. Once a process becomes leader, the remainder processes
are called slaves. We vary how many heart-beats per second are sent via the election period
(EP) parameter. The expires parameter, which determines the crash detection time, is set to
4 * EP.
ENVIRONMENT SETUP AND VARIANTS
All of our experiments were performed on 6 machines with 2 quad-core 2.0 GHz Xeon
processors, 8 GB of RAM, and a Gigabit Ethernet interface. The measured maximal bandwidth of a
machine in our cluster is 944 Mbit/s. Our experiments run under ideal conditions, i.e.,
processes do not crash, links are up and timely, and there are no other jobs running on the
machines. Our CPU utilization measurements focus on the thread performing the upcalls from
the event loop to the algorithm’s process. We use the getrusage() function and consider only
the user time.
We experiment with the following variants: NATIVE is the variant without any encoding; AN,
ANB and ANBD are compiled with AN-, ANB- and ANBD-code and bandwidth optimization (see
Section 3.4.4); and finally, AN-NAIVE, ANB-NAIVE and ANBD-NAIVE are the same variants without
bandwidth optimization. ANBD-encoded Paxos could not be correctly compiled; we were
unable to determine the causes of the miscompilation.
3.5.2 PAXOS
NETWORK BANDWIDTH
We show that the encoded variants of Paxos can reach the network bound. Figure 3.8 depicts
the maximum goodput each variant can achieve with 3 and 5 acceptors and a request payload
size of 1 KiB – a typical value, used for example in [Hun+10]. Goodput is the throughput at
the application level, i.e., 1000 requests per second result in a goodput of about 1 MiB/s.
Since multicast is performed in software – as in systems such as ZooKeeper [Hun+10] – the
proposer has one third of its bandwidth with three acceptors, and a fifth with five acceptors.
The results in Figure 3.8 are consistent with this observation: the maximal goodput NATIVE
achieves is about 35 MiB/s and 20 MiB/s with 3 and 5 acceptors respectively.
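The bound implied by software multicast can be checked with a small calculation. The sketch assumes that 1 Mbit/s means 10^6 bit/s and that each payload crosses the sender's link once per acceptor; the function names are hypothetical.

```c
#include <assert.h>

/* Convert a link rate in Mbit/s (10^6 bit/s, assumed) to MiB/s (2^20 byte/s). */
static double mbit_to_mib(double mbit_per_s) {
    assert(mbit_per_s >= 0.0);
    return mbit_per_s * 1e6 / 8.0 / (1024.0 * 1024.0);
}

/* Upper bound on goodput when every payload is sent once per acceptor. */
static double goodput_bound_mib(double link_mbit_per_s, int n_acceptors) {
    assert(n_acceptors > 0);
    return mbit_to_mib(link_mbit_per_s) / n_acceptors;
}
```

For the measured 944 Mbit/s, the bound is roughly 37.5 MiB/s with 3 acceptors and 22.5 MiB/s with 5 acceptors, slightly above the goodput of about 35 MiB/s and 20 MiB/s that NATIVE achieves.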
Our AN and ANB variants use as much bandwidth as NATIVE except for 4 extra bytes of
CRC per message. The results show that AN reaches the maximum goodput for both 3 and
5 acceptors. ANB also achieves the maximal goodput with 5 acceptors and reaches about
23 MiB/s with 3 acceptors. By increasing batch sizes and the concurrent request limit, ANB
would possibly reach the network limit. AN-NAIVE and ANB-NAIVE are network bound, but with
half of NATIVE’s goodput since they consume twice as much bandwidth with encoded
messages.
Figure 3.8: Goodput versus response time for 3 and 5 acceptors with different encoded
variants (two plots, one for AN-NAIVE, ANB-NAIVE and NATIVE and one for AN, ANB and NATIVE,
each with a panel for 3 acceptors and a panel for 5 acceptors; axes: Response time (ms) and
Goodput (MiB/s))
Figure 3.9: Response time and CPU utilization at nominal load for 3 and 5 acceptors with
different encoded variants (variants AN, ANB and NATIVE; one plot of Response time (ms)
versus Load (k.req/s) and one plot of CPU utilization (%) versus Load (k.req/s) with panels
for client and proposer)
PERFORMANCE UNDER NOMINAL LOAD
In practice replicated systems work below the system’s limit if responsiveness is critical. We
now configure the client with target loads that range from 1000 up to 5000 req/s with 5
acceptors with payload size of 1 KiB. Figure 3.9 shows the response time and CPU utilization
of our Paxos variants. Under a load of 1000 req/s the mean response time and its standard
deviation is 4.38± 0.18 ms for NATIVE, 21.74± 3.04 ms for AN, and 39.87± 5.10 ms for ANB.
Now observe (Figure 3.9, left side) that for AN the response time increases more slowly from
4000 req/s on and for ANB from 2000 req/s on. At these points the client starts batching its
requests. For this load range, the response time of AN is 15 ms to 50 ms higher than that of
the NATIVE variant, while the response time of ANB is 35 ms to 100 ms higher.
Encoding incurs a high CPU utilization as expected, as depicted in Figure 3.9 (right side).
Note that, even though the CPU utilization is close to 100%, the high goodput presented
above is achieved because the proposer and acceptors do not process but rather only store
the batched payloads. The execution of the requests is outside the scope of the Paxos
algorithm.
3.5.3 STRONG LEADER ELECTION
We evaluate the performance of our leader election variants. The results show that encoding
incurs no costs on election time and an acceptable cost on CPU utilization. Table 3.2 depicts
the mean election time (µ) in milliseconds and its standard deviation (σ) for different EP values
and different variants of the algorithm. Each mean is the aggregation of 5 runs. The results
show an insignificant difference in election time for the selected EP parameters, with variations
between 1 and 2 ms for all EP values.
EP (ms)   NATIVE           AN               ANB              ANBD
          µ (ms)   σ       µ (ms)   σ       µ (ms)   σ       µ (ms)   σ
50         64.57   1.56     62.79   0.99     63.45   1.68     64.23   1.11
200       204.71   1.59    204.30   1.65    205.81   0.72    204.00   1.01
300       303.73   0.80    305.49   1.36    305.12   0.97    304.26   0.73
400       404.49   0.74    406.52   1.94    405.65   0.52    404.20   0.44
Table 3.2: Mean election time µ with 3 processes
Table 3.3 shows the mean CPU utilization for the same experiments. We separate the
results between the leader and one slave process. The CPU utilization grows as the election
period shrinks, i.e., as more heart-beats are sent per second. In general, the encoded variants
incur a higher CPU utilization than NATIVE, which remains close to 0% for all parameter values.
For example, for EP = 50, the mean CPU utilization of the leader is 0.3% for NATIVE, 1.8% for
AN, 3.3% for ANB, and 11% for ANBD. The robust ANBD variant presents measurements from
1.24% up to 10.88% in the leader and from 0.41% up to 3.08% in the slave, which are
acceptable CPU utilization values for many applications.
          Leader’s CPU utilization (%)      Slave’s CPU utilization (%)
EP (ms)   NATIVE   AN     ANB    ANBD       NATIVE   AN     ANB    ANBD
50        0.28     1.82   3.29   10.88      0.12     0.48   0.81   3.08
200       0.12     0.62   1.13    3.68      0.04     0.16   0.27   1.06
300       0.06     0.32   0.56    1.84      0.04     0.09   0.15   0.57
400       0.04     0.22   0.38    1.24      0.02     0.07   0.11   0.41
Table 3.3: Mean CPU utilization of the leader and of a slave with 3 processes
3.5.4 FAULT INJECTION
We conclude our evaluation with a fault injection experiment. We inject a total of 30,000
different faults in a proposer process of Paxos. To that end, we modified the wrapper so that
it first stores on disk a 30 s trace of the proposer’s interaction with the framework interface,
i.e., its downcalls and upcalls. For each variant we save such a trace and replay it 10,000
times, injecting one fault in a random place per replay. Whenever calls from the process to
the framework diverge from the logged trace, the wrapper aborts the process.
Table 3.4 shows, for different fault types, the percentage of arbitrary failures in relation to
the total number of failures and, in parentheses, the absolute numbers for the 2000 injections
of each fault type. The fault types used in these experiments represent the software-level
symptoms caused by hardware errors as described by Forin. Arbitrary failures are corrupt but
valid messages sent by the proposer that were not sent in the saved trace. Note that, while
only the correctly encoded messages are valid for encoded variants, all messages are valid for
the NATIVE variant. The total number of failures encompasses all failures: crashes, omissions,
corrupt but valid messages, and incorrectly encoded (invalid) messages.
                      Arbitrary failures / Total failures
Symptom               NATIVE               AN                  ANB
Faulty operation       5.82% (86/1477)     2.10% (25/1193)     0.26% (4/1566)
Modified operand       5.66% (79/1396)     1.27% (14/1101)     0.19% (2/1033)
Exchanged operator    10.85% (46/424)      1.58% (9/571)       0.62% (6/963)
Exchanged operand     49.25% (295/599)     3.32% (17/512)      0.27% (1/374)
Lost store            30.01% (277/923)     2.79% (16/573)      0.35% (5/1409)
Total                 16.35% (783/4789)    2.05% (81/3950)     0.34% (18/5345)
Table 3.4: Results of fault injection in proposer. Results give the percentage of proposer’s
arbitrary failures given that a failure occurred due to an arbitrary fault.
These results give the probability of error isolation being violated given that a failure occurred
due to an arbitrary fault. AN processes violate error isolation in 1.27 up to 3.32% of the failures;
ANB processes violate error isolation in about 0.3% of the failures; and finally, native processes
violate error isolation in 5 up to 50% of the failures.
3.5.5 DISCUSSION
In this chapter, our fault injection experiments focused on a proposer of the Paxos algorithm,
which is by far the most intricate process we have implemented in our framework. Our fault
injection results are consistent with previous work [Sch+10a], indicating that these results can
also be extrapolated to other algorithms.
The fault injection results corroborate the hypothesis that arithmetic error codes can enforce
error isolation. One can use a benign-fault-tolerant algorithm, automatically transform it in our
framework, reach the same bandwidth utilization, and reduce the risk of violating error isolation
from 16% down to 0.34%. In turn, the transformed system incurs higher CPU utilization and
response time. Notwithstanding, one can run a highly fault-tolerant ANBD variant with CPU
utilization from 1 up to 11% and achieve the same election times as with a NATIVE variant.
AN and ANB variants utilize even less CPU: up to 1.8% and up to 3.3%, respectively.
Our measurements have shown high response times even with nominal load, varying from
20 to 60 ms for AN, and from 40 to 105 ms for ANB. One overhead source is the rather
costly movement of data from the network buffer into a preallocated memory region used
by the encoded process. The encoding runtime copies and checks the codeness of the
data in an unoptimized loop, which was rather designed for the initialization of the process.
Two strategies could help to mitigate this source of overhead: (1) the memory buffer in the
encoded process could be directly used to read data from the network; or (2) the encoding
runtime could first copy the data with a fast memcpy() call and then perform necessary checks.
Nevertheless, we believe such an improvement would not render the overhead negligible.
For example, the evaluation in Section 5.5 shows the number of hardware cycles can increase
about 39 times for a small piece of code protected with AN-encoding (Table 5.5). In Chapter 5,
we propose a solution to the overhead that is tailored to replicated systems employing Paxos.
3.6 RELATED WORK
We now briefly discuss some further prior work related to the contents of this chapter.
Our framework uses the encoding compiler by Schiffel et al. [Sch+10a] to implement error
isolation. Their work focuses on hardware errors in safety-critical systems, while our work
focuses on distributed systems running on commodity hardware. We not only integrated their
compiler in our framework, but also extended it with wrappers to perform communication
between processes, to trigger alarms, and to virtualize arbitrary failures. Our fault injection
experiments were conducted using a fault injector also by Schiffel et al. [Sch+10b]. The results
of our fault injection campaign are compatible with their results [Sch+10a; Sch+10c].
Wamhoff et al. [Wam+13] implement error correction in applications by combining the error
detection capabilities of AN-encoding with the rollback-retry support of software transactional
memory. The solution proposed by Wamhoff et al. approximates fail-stop processes [SS83] by
trying to mask all hardware errors before they are sent out into the network. Our approach
does not directly correct errors. Instead, it transforms distributed algorithms to perform error
isolation, i.e., faulty processes expose errors in such a way that correct processes can detect
and ignore them. The distributed algorithms are then expected to tolerate benign failures such
as crashes and message omissions. Like the previous work by Schiffel et al., Wamhoff et al.
do not provide any formal guarantees or correctness arguments for their approach. In this
chapter, we have formalized (a simplified version of) AN-encoding and its fault assumptions,
and analyzed its assumption coverage.
In the context of distributed systems, Bhatotia et al. [Bha+10] also focus on arbitrary faults
in non-malicious environments. Their approach is, however, restricted to a specific set of
distributed applications, namely, MapReduce jobs programmed in the Pig framework. More
recently, Correia et al. have presented the PASC library [Cor+12b], which locally duplicates
each process and executes each message handler on both replicas. Like our approach, theirs
allows faulty processes to send invalid messages. Nevertheless, while PASC assumes a
single transient fault during the processing of a message, our approach detects multiple faults,
including intermittent and permanent faults, which are reported to occur in practice [SPW09;
NDO11].
3.7 CONCLUSION
The main contributions of this chapter to the state of the art are twofold. First, we formalize
AN-encoding and its assumptions, including assumption coverage estimations.
Second, we extend the context of AN-encoding to distributed systems in a systematic
and practical fashion by developing a framework on which distributed systems can be built
and their processes can be automatically encoded.
Road map. In Chapter 4, we propose a new hardening technique that can leverage hardware
protection given by ECC memory. The technique virtually eliminates memory overhead and
can provide an acceptable performance overhead in terms of throughput and response time.
On the downside, the technique cannot tolerate permanent and intermittent faults since it as-
sumes a maximum frequency of faults per traversal. In Chapter 5, we return to AN-encoding
and propose a solution tailored to replicated systems employing Paxos. Our approach drasti-
cally reduces the space and performance overhead incurred by AN-encoding by encoding only
part of the Paxos algorithm.
Future work. Several open problems are left in our formalization effort. First, our coverage
estimation of AN-encoding under uniform random faults is limited. In particular, we only
analyzed the coverage of Assumption 3.2 for the addition operation. To completely estimate
the coverage of AN-encoding’s assumptions, one has to further analyze multiplication, divi-
sion and other operations in the instruction set of the target system. Second, we have not
estimated the coverage under non-uniform random faults because we could not determine
fault assumptions for the different heuristics proposed in the literature. Future work should
focus on modeling these heuristics and shedding light on the assumptions these heuristics
actually make in order to work. Only once these assumptions are defined can the coverage of
AN-encoding be estimated under non-uniform faults. Finally, our model of AN-encoding
does not consider multiple threads. We believe that the addition of multiple threads to
AN-encoding would not require overwhelming changes to the model. As a basis for future work,
we explore the modeling of multiple threads in the context of our new hardening technique
in the next chapter.
From a practical perspective, our framework (Section 3.4) offers a good prototyping infras-
tructure for hardening distributed algorithms. The framework can, however, be improved in
several ways. Our implementation does not yet leverage hardware-implemented CRC instruc-
tions available in modern CPUs [Int07]. Moreover, the transfer of data from the network into
the encoded process can be revised and optimized as discussed above. Finally, the addition of
multiple threads in the wrappers to encode, decode and calculate CRCs can further improve
the responsiveness of processes.

4 SCALABLE ERROR ISOLATION∗
∗This chapter expands and builds upon the content of “Scalable error isolation for distributed systems”, presented
at NSDI ’15 [Beh+15a], and the companion technical report [Beh+15b]. Parts of this chapter were also presented
at HotDep ’13 [Beh+13].

In this chapter, we introduce Scalable Error Isolation (SEI), a new hardening technique that
wraps a process of a distributed system and executes redundant checks to prevent the pro-
cess from propagating local errors to other processes. SEI is scalable in three dimensions. For
memory, SEI can leverage hardware-based error-detection mechanisms such as ECC to virtu-
ally eliminate the memory footprint of redundant information. For computation, SEI supports
multithreading and, thus, covers complex error propagation patterns among threads sharing
the same state. For development effort, SEI enables hardening real-world applications with
minor developer involvement. Moreover, SEI is designed to formally guarantee error isolation
in the presence of ASC faults under two main assumptions: fault diversity (Assumption 2.2) and
fault frequency (Assumption 4.1). Fault diversity asserts that an ASC fault cannot corrupt a
variable v such that v is valid immediately after the fault; in this chapter, a variable v is valid as
long as its value is equal to the value of a replica variable v̄. Fault frequency asserts that at
most one ASC fault may occur per traversal.
We present a short overview of SEI in Section 4.1. In Sections 4.2.1 and 4.2.2, we refine
the ASC model from Section 2.3 and present a formal specification of SEI assuming
single-threaded applications. In Sections 4.2.3 and 4.2.4, we then prove SEI correct. We extend the
model, algorithms and proofs to support multithreaded applications in Section 4.3.
Next, we present an implementation of our hardening technique called libsei (Section 4.4).
Hardening processes with libsei is a semi-automated task, requiring the developer to manually
mark the regions of the program that represent event handlers¹ and the variables that
hold messages. Nevertheless, libsei allows for hardening existing systems with a small
amount of effort. We demonstrate that by hardening two real-world applications: memcached
and a recursive DNS resolver. In Section 4.4.3, we discuss our experience hardening these
applications.
We conducted extensive fault injection experiments, both software- and hardware-based,
and evaluated the performance of our hardened applications. SEI presents an excellent trade-off
between fault coverage and performance. It makes the likelihood of error propagation negligi-
ble: down by two orders of magnitude from 44% to only 0.15% of the errors using targeted
fault injection, and no undetected errors when reducing the CPU voltage. For multithreaded
memcached, the throughput overhead is almost zero with value sizes of 128 bytes or larger,
while the memory overhead is about 30 KiB (the number of bytes modified during event han-
dling). Section 4.5 presents the results of our fault injection experiments, while Section 4.6
presents performance evaluation results of our hardened applications.
In Section 4.7, we compare SEI-hardening with other existing approaches, including the
technique presented in Chapter 3. We conclude this chapter in Section 4.8 with a discussion
of the results and possible future work.
4.1 RATIONALE
In a nutshell, SEI detects state corruption in the computation, i.e., errors in the arithmetic
and logic units of the processors, by processing each message twice and comparing the
results locally. SEI detects state corruption in memory by using error-detection codes. These
codes can be implemented in software, but SEI can also leverage hardware-level mechanisms
such as ECC to virtually eliminate memory overhead. In addition, delegating the verification
of memory data to the hardware makes this part of hardening very efficient. A caveat of
using hardware detection is that SEI is not always able to handle pointer corruptions. Our
fault-injection evaluation, however, indicates that only a negligible fraction of the injected
faults may propagate (Section 4.5). Our experiments also show that software-level state
replication completely prevents propagation, but it trades off memory and CPU resources for
just a marginal increase in fault coverage.
¹ In this chapter, we use the terms event handler and handler program interchangeably.
[Figure: flow diagram of the phases listed below, with a side label "traversal"]
Receive phase: ⟨m, m̄⟩ ← Network
Init phase: Compare m = m̄; drop ⟨m, m̄⟩ if mismatch
Exec1 phase: For every v read, compare v = v̄; record state updates; produce output message ni; Abort if mismatch
Exec2 phase: For every v read, compare v = v̄; record state updates; produce output replica n̄i; Abort if mismatch
Validate phase: Compare state update records; compare ni = n̄i; Abort if mismatch
Send phase: For every pair ni, n̄i: Network ← ⟨ni, n̄i⟩
Figure 4.1: High-level phases of a hardened handler
Hardening with SEI modifies the three phases of the receive-handle-send loop (see Sec-
tion 2.3.1). Hardening message-receipt and message-sending phases only entails attaching a
message replica to every message, e.g., in the form of CRCs. The core challenge of hardening
is protecting the message-handling phase. SEI-hardening² replaces the original handler
program p with a hardened handler program ph, whose high-level (simplified) phases³ are shown
in Figure 4.1.
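The hardening of the message-receipt and message-sending phases can be illustrated with a minimal Python sketch. It assumes that the message replica is a CRC-32 computed with the standard zlib module; the function names attach_replica and receive are hypothetical and are not part of libsei.

```python
import zlib
from typing import Optional, Tuple


def attach_replica(payload: bytes) -> Tuple[bytes, int]:
    """Send side: pair the message with a CRC-32 that serves as its replica."""
    return payload, zlib.crc32(payload)


def receive(payload: bytes, crc: int) -> Optional[bytes]:
    """Receive side: return the payload, or None if the message must be dropped."""
    if zlib.crc32(payload) != crc:
        return None  # mismatch: treat the corrupt message as a message omission
    return payload
```

A receiver that gets None simply behaves as if the message had never arrived, which is a failure the original algorithm is already expected to tolerate.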
SEI makes sure that a process executing a handler does not read values that have been
directly corrupted by a fault before the handler execution. To this end, when a process reads
a variable, it compares its value with its replica – i.e., with its error code – using a validity
check and aborting if the check fails. In the context of SEI, fault diversity (Assumption 2.2) is
instantiated as the property in which a fault cannot corrupt a variable and its replica with the
same (or an equivalent) value.
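The validity check on reads can be sketched as follows. This is only an illustration under simplifying assumptions: replicas are full copies kept in a second dictionary (rather than ECC data or checksums), and the names ReplicatedState and Abort are hypothetical.

```python
class Abort(Exception):
    """Models the process aborting after a failed validity check."""


class ReplicatedState:
    def __init__(self):
        self.v = {}      # program variables
        self.v_bar = {}  # replica variables (full copies in this sketch)

    def write(self, name, value):
        # Every update is applied to the variable and to its replica.
        self.v[name] = value
        self.v_bar[name] = value

    def read(self, name):
        # Validity check: the variable must equal its replica.
        value = self.v[name]
        if value != self.v_bar[name]:
            raise Abort(f"validity check failed for {name!r}")
        return value
```

Under fault diversity, a fault that changes v["x"] cannot also change v_bar["x"] to the same value, so the next read of x raises Abort.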
If a fault occurs while a process executes a handler, a variable may be read and corrupted
after the validity checks are executed. Therefore, hardening must guarantee that state updates
produced by the handler are correct even if they are computed based on incorrect inputs. SEI
detects incorrect state updates by executing the handler twice (Exec1 and Exec2 phases
of Figure 4.1), recording all state updates, and comparing them at the end of the traversal
(Validate phase).
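The double execution can be pictured with the following Python sketch. It is a strong simplification and not the SEI algorithm itself: the handler is assumed to be a function that returns its state updates and output messages, both executions run on deep copies of the state, and faults that hit the comparison itself are ignored. The names hardened_handle and Abort are hypothetical.

```python
import copy


class Abort(Exception):
    """Models the process aborting when the two executions disagree."""


def hardened_handle(state: dict, message, handler):
    """Run `handler` twice and commit its updates only if both runs agree."""
    # Exec1: run on a private copy so that the original state stays intact.
    updates1, out1 = handler(copy.deepcopy(state), message)
    # Exec2: run again, starting from the same original state.
    updates2, out2 = handler(copy.deepcopy(state), message)
    # Validate: compare the recorded state updates and the output messages.
    if updates1 != updates2 or out1 != out2:
        raise Abort("Exec1 and Exec2 disagree")
    state.update(updates1)
    return out1
```

If a single fault perturbs one of the two executions, the recorded updates or outputs differ and the state is left untouched.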
² For short, we often refer to the technique as SEI-hardening. Note that we also call the SEI-hardening technique
an algorithm although it comprises not one, but several small algorithms.
³ Technically, these phases can be composed of multiple state transitions. In Section 4.2.1, we prefer, instead,
the term block for phases within traversals.
To understand better how SEI detects corrupt state, let us consider a simple example
where the state of a single-threaded process is initially correct and a fault occurs during the
execution of the hardened handler. In general, faults might corrupt any variable (and any
number of variables) of the hardened program, including the data structures of the hardening
algorithm itself. Moreover, the high-level phases of Figure 4.1 may be executed multiple times
or skipped due to faults affecting the control flow. SEI tolerates such faults; for the sake of
example, however, assume that faults neither corrupt the basic execution flow of Figure 4.1
nor the data structures used by SEI. In this case, it is easy to argue why state updates
performed in the handler are correct.
We only need to consider the occurrence of at most one fault during the execution of the
phases of Figure 4.1 – since SEI assumes a maximal fault frequency of at most one ASC fault
occurring per traversal. Hence, a fault occurs either in Init, Exec1, Exec2, or Validate phase.
If Init, Exec1 and Validate phases are executed without faults, we have a correct reference
value to compare our results against should a fault occur during Exec2. Similarly, if Init, Exec2
and Validate phases are executed without faults, we can check if a fault occurred during Exec1.
A fault during Exec1 may still corrupt the variables read during Exec2. However, such faults
can be detected by the validity checks of Exec2, which are executed correctly (since only one
fault may occur) and will make the process crash.
If Init, Exec1 and Exec2 are executed without faults, then every output message is correct
before starting Validate. Now, if a fault occurs in Validate, then the output messages are
either (1) correct if the fault did not corrupt the messages; or (2) invalid by fault diversity. If
the process completes the Validate phase without crashing and sends messages and their
replicas into the network, a receiver can use the message replica (e.g., the CRC) to guarantee
end-to-end detection of corruption and to verify that the sender executed correctly (in Init).
A fault occurring in the Init phase may cause a correct message to be mistakenly dropped.
Such a fault maps back to a message omission – a benign fault assumed to be handled by
the original algorithm (see Section 2.1.3). In this simplified algorithm, a fault could also cause
a corrupt message to be mistakenly considered as correct, e.g., by skipping the comparison
in the Init phase. Our complete algorithm takes care of such cases by executing Init twice.
Note that, although SEI assumes a fault frequency of at most one fault per traversal, it does
not limit the number of faults in an execution and, in particular, does not limit the number
of faults between traversals. Any state corruption occurring between two traversals must
be caused by a fault and is, therefore, detected via validity check during the next hardened
handler execution.
4.1.1 MEMORY SCALABILITY
In addition to the program variables V, SEI keeps a set of replica variables Vr, which are in an
encoded form, e.g., as hardware-level ECC data or a software-level checksum. SEI does not
assume arithmetic error codes as the encoding form of variables in Vr since these are typically
not implemented in commodity hardware; for example, ECC is not an arithmetic error code.
Therefore, we cannot compute with the values of the encoded variables in Vr . Consequently,
Exec1 and Exec2 have to execute on the same variables in V .
To execute the handler twice, the state updates of Exec1 can be stored in a copy-on-write
buffer, which we call changelog, so that the original values in V and Vr remain intact and
ready to be used in Exec2. However, a changelog requires intercepting all read accesses to
variables, checking if variables have been modified, and possibly redirecting reads to the new
values in the changelog. This results in a significant overhead.
Instead, SEI uses an approach we call snapshot buffers⁴. In Exec1, instead of recording
state updates, SEI takes snapshots of the current state of variables before they are modified
for the first time and stores them in a snapshot buffer O. The new value of an updated
variable v ∈ V is directly written into v and its replica variable v̄ ∈ Vr. Directly writing into the
variables makes reads cheaper (since they are not redirected) without making writes more
expensive (since they are redirected in both approaches). Our evaluation in Section 4.6 shows
that snapshot buffers result in a significant performance improvement over changelogs.
⁴ The idea of changelogs and snapshot buffers is similar to the two writing policies of CPU caches: write-back
and write-through. Moreover, the write-sets of transactional memory algorithms are data structures similar to
The major implication of using snapshot buffers is that we must reset the updated variables
to their original values in O after Exec1 and before Exec2 – call this procedure the Reset
phase. Reset introduces complex inter-dependencies between faults occurring in the Exec1
and Exec2 phases. For example, a process might fail to take the snapshot of a variable v
during Exec1 due to a fault. If the original value of v is not reset correctly, we cannot assume
that Exec2 produces correct results which can be used to validate the state updates of Exec1.
SEI introduces additional checks to handle faults occurring during Exec1 or Reset.
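A snapshot buffer and the Reset phase can be sketched as follows. The sketch assumes that all variables exist before the handler runs, and it omits replicas and the additional checks mentioned above; the class name SnapshotState is hypothetical.

```python
class SnapshotState:
    def __init__(self, initial):
        self.v = dict(initial)  # program variables, updated in place
        self.snapshots = {}     # snapshot buffer O: original values

    def write(self, name, value):
        # Take a snapshot only before the first modification of a variable.
        if name not in self.snapshots:
            self.snapshots[name] = self.v[name]
        self.v[name] = value

    def read(self, name):
        return self.v[name]     # reads go straight to the variable

    def reset(self):
        """Reset phase: restore the originals and return the updates made so far."""
        updates = {name: self.v[name] for name in self.snapshots}
        self.v.update(self.snapshots)
        self.snapshots = {}
        return updates
```

After reset(), the state is the one Exec1 started from, so Exec2 can run on the same variables, and the returned updates can be compared with those of Exec2 in the Validate phase.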
4.1.2 COMPUTATION SCALABILITY
We have so far considered single-threaded processes. Supporting multiple threads opens
new possibilities for error propagation that must be taken into account. For example, a faulty
thread τi may write an incorrect value into a shared variable, and another thread τj may read
this incorrect value and output a corrupt message before τi executes Validate and detects the
incorrect update.
SEI can harden multithreaded processes where shared variables are protected using locks.
A hardened thread holds any lock it acquires during Exec1 until the end of Validate. During
Exec1, the thread stores the information on the sequences of locks acquired and the result
of the lock operations in a queue Q. Whenever the thread acquires a lock in Exec2, it checks
Q to make sure that locks are taken consistently, and makes the process crash otherwise.
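The role of the queue Q can be illustrated with the following sketch. It assumes that each lock operation is identified by a lock identifier and a boolean result; the final check for leftover entries is our own addition for the example, and the names LockLog and Abort are hypothetical.

```python
from collections import deque


class Abort(Exception):
    """Models the process crashing on an inconsistent lock sequence."""


class LockLog:
    def __init__(self):
        self.q = deque()  # queue Q of (lock identifier, result) pairs

    def record(self, lock_id, acquired):
        """Exec1: remember which lock was requested and the result."""
        self.q.append((lock_id, acquired))

    def check(self, lock_id):
        """Exec2: the same lock must be requested in the same order."""
        if not self.q or self.q[0][0] != lock_id:
            raise Abort(f"unexpected lock operation on {lock_id!r}")
        return self.q.popleft()[1]  # replay the result seen in Exec1

    def finish(self):
        """End of Exec2: every lock taken in Exec1 must have been replayed."""
        if self.q:
            raise Abort("Exec2 performed fewer lock operations than Exec1")
```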
In order to address situations when faulty threads enter a critical section without properly
acquiring the required locks, SEI uses a barrier to force each thread to wait for the completion
of all concurrent traversals of other threads. Each thread is associated with a concurrency
counter, initially equal to zero. Before executing Exec1, a thread increments its own counter.
After comparing all state updates in Validate, the thread increments its counter again.
Therefore, an odd value of a counter indicates that a thread is executing a hardened handler. Before
releasing the locks, a thread waits until all other threads either have an even counter or
increment it again. This guarantees that all concurrent threads have correctly validated their state
updates, otherwise the process would have crashed. Even if some thread τ experiences a
fault that prevents it from acquiring a lock during Exec1, the fault frequency assumption implies
that the concurrency counters have been set without faults, so all other threads will wait until
τ completes its traversal.
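The waiting condition of the barrier can be written as a small predicate. The sketch reflects our reading of the description above: a thread first observes the counters of all threads and may release its locks once every other thread either was idle (even counter) when observed or has incremented its counter since then. The function name may_release and its parameters are hypothetical.

```python
from typing import List


def may_release(observed: List[int], current: List[int], me: int) -> bool:
    """Return True once thread `me` may release its locks.

    observed: counters as seen when `me` reached the barrier
    current:  counters as seen now
    """
    return all(
        observed[t] % 2 == 0 or current[t] != observed[t]
        for t in range(len(observed))
        if t != me
    )
```

In a real implementation, the thread would re-read the counters in a loop until this predicate holds.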
4.1.3 DEVELOPMENT EFFORT
Besides scaling with the application’s state size, SEI also scales with the size and complexity of
the code base. We have implemented a C library called libsei to harden distributed systems
using the SEI algorithm (Section 4.4). Hardening with libsei is semi-automated; a developer
must annotate the code of her distributed-system processes using libsei’s marker functions.
More specifically, she must indicate the parts of the program handling messages, as well as
the buffers representing input and output messages. An out-of-the-box compiler pass can
then automatically harden the annotated code by instrumenting it with calls to libsei’s API.
To show that hardening existing systems with a small amount of effort is possible, we
have hardened two real-world applications: memcached, a popular in-memory distributed cache
system, and Deadwood, a recursive DNS resolver. In our experience, hardening required a
good understanding of the code base. However, only minimal modifications were required. We
were able to harden memcached, for example, by adding only 60 lines of code to mark 8 event
handlers and to append CRCs to output messages.
⁴ (continued) our changelogs and snapshot buffers; write-sets cannot, however, withstand state corruptions.
We chose to harden these applications because they represent a large class of systems,
including other distributed in-memory caches, load balancers, web servers, and application
servers. For such systems, integrity is critical, but liveness in the presence of faults is not
strictly necessary. If a cache is not available, then the end result is slower responses, since
such a cache typically optimizes the access time of some persistent backend store. Similarly,
if a web server is not available, clients can be redirected to another one. If faults, how-
ever, corrupt state, then the end result is an incorrect (but possibly fast) response. For this
class of systems, hardening is much more convenient than masking faults with state-machine
replication, since hardening does not require replication across servers.
4.2 SINGLE-THREADED SEI-HARDENING
In this section, we present the SEI-hardening of single-threaded programs. The extension to
multiple threads is presented in Section 4.3. The single-threaded SEI-hardening achieves error
isolation and can leverage encoded replica state to save space. We first refine the process and
fault models we have defined in Section 2.3, follow it with a presentation of SEI-hardening,
and prove its correctness.
4.2.1 MODEL REFINEMENTS
SEI-hardening differentiates between global and local variables. Intuitively, global variables are
variables in the main memory that might be allocated throughout multiple traversals, whereas
local variables are registers holding values temporarily. We differentiate between local and
global variables in this context to represent the flow of values when performing a computation.
When incrementing the value of a global variable v , for example, the value of v is copied to
a register (some local variable vl ), which is then incremented.
A more precise definition of global and local variables is as follows:
Definition 4.1 (Global variables) The set Vg ⊂ V is the set of global variables of process
π. Global variables are long-lived variables, whose values are used in multiple traversals of
process π. The global variables are initialized with default values in the initial state of the first
traversal of π, and their values depend on the sequence of traversals.
Definition 4.2 (Local variables) The set Vl ⊂ V is the set of local variables of π, where
Vl∩Vg = { }. Local variables are short-lived variables, e.g., registers, whose values are discarded
once a traversal has finished. Local variables are initialized with default values before being
used in each traversal.
Note that some languages assign default initial values to variables (Java does so for fields
and array elements and rejects reads of unassigned local variables at compile time); for other
languages, such as C, compilers can warn the programmer about uninitialized local variables.
Following a RISC-like architecture, we assume all computations are performed on local
variables, and global variables are only accessed via load and store operations.
Definition 4.3 (Load and store operations) Global variables can only be accessed via Ld and
St operations:
• Ld(vd, vs) loads the value of the global variable v pointed to by a local variable vs, i.e., v = s[vs],
into a local variable vd.
• St(vd, vs) stores the value of a local variable vs into the global variable v pointed to by a local
variable vd, i.e., v = s[vd].
Moreover, only global variables can be accessed via Ld and St operations, i.e., a local variable
v1 ∈ Vl cannot point to another local variable v2 ∈ Vl.
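The load and store operations can be modeled in a few lines of Python. In this sketch, global variables are entries of a dictionary indexed by address, and local variables are entries of a second dictionary indexed by register name; the class name Process and the register names used in the usage note are hypothetical.

```python
class Process:
    def __init__(self, global_vars):
        self.s = dict(global_vars)  # global variables, indexed by address
        self.local = {}             # local variables (registers)

    def ld(self, vd, vs):
        """Ld(vd, vs): load the global variable that vs points to into vd."""
        self.local[vd] = self.s[self.local[vs]]

    def st(self, vd, vs):
        """St(vd, vs): store the value of vs into the global variable vd points to."""
        self.s[self.local[vd]] = self.local[vs]
```

Incrementing a global variable then follows the pattern described earlier: the value is loaded into a register, the register is incremented, and the result is stored back, e.g., pi.local["p"] = 0x10; pi.ld("r", "p"); pi.local["r"] += 1; pi.st("p", "r").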
The restriction of Ld and St operations being only used to access global variables helps
us to precisely define the transformation rules in the next section. In practice, however,
memory locations can also be used as local variables, e.g., the program stack. In fact, our
compiler-based implementation of SEI considers variables in the stack – which are stored in
the main memory – as a set of local variables, and differentiates at runtime whether the Ld
or St operation is accessing a global variable in the main memory, or a local variable in the
main memory.
In our algorithms, we make use of the Abort operation defined as follows.
Definition 4.4 (Abort operation) The Abort operation sets the pc to “halt” (see Page 27),
forcing the process to stop performing further Next steps.
FAULT DIVERSITY
SEI-hardening employs space redundancy to detect faults. In contrast to Encoded Processing,
SEI-hardening uses the concept of replica state for that.
Definition 4.5 (Replica variables Vr ) The set of variables Vr ⊂ V is such that Vr ∩ Vg = { },
Vr ∩ Vl = { }, and |Vr | ≥ |Vg|.
Definition 4.6 (Replica mapping µr ) There is an injective function µr : Vg → Vr that maps
each variable in Vg to a distinct variable in Vr . If a variable v ∈ Vg and a variable v̄ ∈ Vr are
such that µr [v ] = v̄ , then v and v̄ are called replica variables.
Figure 4.2 depicts the main sets of variables of a process π. By definition, global variables
and only global variables have replicas. The set of replica variables Vr is neither global nor local.
In practice, replica variables are most likely to be hosted in the main memory just as global
variables. Nonetheless, the replica of a variable does not need to be a complete copy: ECC
in main memory and CRC for messages are examples of redundant information, in hardware
or software, that can be used to implement replicas in reduced space.
Recall that the set of all variables of program p running on process π is Vp ⊂ V (see
Section 2.3.1). We define parts of Vg and, indirectly, of Vr to be exclusive for the program p
or for bookkeeping of SEI.
Definition 4.7 (Program state) The set Vs = Vp ∩ Vg is the set of all global variables used
in the program p running on process π. The part of the state formed by the variables Vs is
referred to as the program (global) state.
Definition 4.8 (Hardening state) The set Vh ⊂ Vg is the set of all global variables used by
SEI-hardening. The set Vh is such that Vh ∩ Vs = { }. The part of the state given by variables
Vh is referred to as the hardening state.
We model the variables representing input and output messages (i.e., Vi and Vo, respectively)
as subsets of Vs since, in practice, messages are part of the program variables and are
typically stored in main memory. As discussed in Section 2.2.3, we assume that an
error-detection code is transmitted along with each message sent over the network. Since the
sets Vi and Vo, being part of the global variables, have counterpart replica variables in Vr, one
could be inclined to directly use those replica variables as error-detection code for the
messages. In practice, however, if the replica variables are implemented with ECC memory, then
a SEI-hardening implementation cannot retrieve the values of these replica variables when
sending a message since the hardware does not allow direct access to the ECC memory.
Moreover, a SEI-hardening implementation cannot directly write the replica variables (i.e., the
error-detection code or CRC) received from network into the ECC memory. Therefore, we
model a second set of replica variables exclusive for variables in Vi ∪ Vo. This second set of
variables can be a part of Vg or Vl depending on the implementation.
[Figure: diagram of the variable sets Vp, Vl, Vg and Vr, with the mapping µr from Vg to Vr]
Figure 4.2: Variables of process π. The set of all variables is V = Vl ∪ Vg ∪ Vr. The set of all
variables of program p running on process π is Vp. The set of all global variables
of program p running on process π is Vs = Vp ∩ Vg.
Definition 4.9 (Replicas of Vi and Vo) The set of variables Vi ⊂ V represents the input mes-
sage, with Vi ⊂ Vs. The set of variables Vo ⊂ V represents the output messages, with
Vo ⊂ Vs. Each variable v in Vi ∪ Vo has two replica variables: v̄ ∈ Vr and v̈ ∈ V . The set V̄i
and V̄o represent the first set of replica variables of Vi and Vo, respectively. The set V̈i and V̈o
represent the second set of replica variables of Vi and Vo, respectively.
Notation of replica variables. We use a bar on top of a variable v to represent its replica
in Vr: the bar can be thought of as the complement of v. We use a double dot on top of a
variable v to represent v’s second replica (if v is an input or output message): the double dot
serves as a reminder that v̈ is the second replica of v. In Table 4.1, we list all variable sets used in
our model.
The main fault assumption of SEI-hardening is fault diversity (Assumption 2.2). Fault diversity
asserts that if a global variable is corrupted by a fault, its replica does not have the same value
at some state s, i.e., Valid(v) does not hold at s. Defining Valid(v) for variables in Vg \ (Vi ∪ Vo)
is straightforward: Valid(v) ≜ s[v] = s[v̄]. If a fault corrupts a variable v ∈ Vg \ (Vi ∪ Vo) at state
s, then at state s′ it holds that s′[v] ≠ s′[v̄] by fault diversity. Remember that s′ is the state
immediately following s in the process execution.
In contrast, defining Valid(v) for input and output variables requires more care since such
variables have two replicas. Fault diversity guarantees that Valid(v) does not hold after a fault.
One may try to define Valid(v) ≜ s[v] = s[v̄] ∧ s[v] = s[v̈]. Such a definition does not reflect
the intuitive notion of fault diversity because after a fault corrupting some variable v ∈ Vo,
it could hold that s[v] ≠ s[v̄] and s[v] = s[v̈]. Unfortunately, such a state violates local error
exposure (Property 2.3) since a Send step would send v and v̈ out. Even if SEI-hardening checks
whether s[v] = s[v̄] before sending out a message containing v, this check could be skipped
by a control-flow fault (SEI-hardening allows faults to affect the control flow, as we will see
below). Therefore, we define the validity of variables in Vi and Vo as
Valid(v) ≜ s[v] = s[v̄] ∨ s[v] = s[v̈]
because the negation of Valid(v), i.e., s[v] ≠ s[v̄] ∧ s[v] ≠ s[v̈], gives the fault diversity
property we intuitively search for.
Definition 4.10 (Valid(v) predicate) Valid(v) is defined for each subset of Vg as follows.
Valid(v) ≜ s[v] = s[v̄]                  for all v ∈ Vg \ (Vi ∪ Vo)
Valid(v) ≜ s[v] = s[v̄] ∨ s[v] = s[v̈]    for all v ∈ Vi ∪ Vo
Valid(v̄) ≜ Valid(v)                     for all v̄ ∈ Vr
Valid(v̈) ≜ Valid(v)                     for all v̈ ∈ V̈i ∪ V̈o
For every global variable v that does not belong to Vi or Vo, v is valid at a state s if and only
if Valid(v ) holds, i.e., v equals v̄ . In case v is an input or output variable, Valid(v ) also holds
if v equals the second replica v̈ . If a variable v ∈ Vi ∪ Vo is equal to only one replica, e.g., v̄ ,
we say that v is valid with respect to v̄ and invalid with respect to v̈ . We also define Valid(v )
for replica variables (v̄ and v̈ ), although we do not use the predicate directly in our proofs.
This definition is important because an ASC fault could corrupt only the replica variables of a
variable v , without modifying v itself.
An ASC fault can never make a variable valid again, i.e., ¬Valid(v) holds from
immediately after a fault corrupts a variable v until v is assigned a new value by some
instruction of program p or by the Receive step.
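The difference between the two candidate definitions of validity for message variables can be checked on a small example. In the sketch below, a state is a dictionary in which the replicas of a variable n are stored under the hypothetical keys n_bar and n_ddot; the state s models the problematic case in which a fault has corrupted n so that it differs from n̄ but equals n̈.

```python
def valid_conj(s, v):
    """Rejected candidate: v must equal both of its replicas."""
    return s[v] == s[v + "_bar"] and s[v] == s[v + "_ddot"]


def valid_disj(s, v):
    """Definition 4.10 for message variables: v equals at least one replica."""
    return s[v] == s[v + "_bar"] or s[v] == s[v + "_ddot"]


# Hypothetical state after a fault: n no longer matches n_bar but matches n_ddot.
s = {"n": 9, "n_bar": 5, "n_ddot": 9}
```

With the conjunctive definition, s counts as invalid, so fault diversity would permit this state even though a receiver comparing n with n̈ would accept the corrupt message. With the disjunctive definition, s counts as valid, so fault diversity rules it out as the result of a fault.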
FAULT FREQUENCY
Besides assuming fault diversity, SEI-hardening assumes an upper bound on the frequency
of fault occurrences within a traversal. An execution of the system – i.e., a sequence of
traversals – can observe an unbounded number of faults. In particular, we do not assume any
bound on the number of faults between traversals.
Assumption 4.1 (Fault frequency) At most one ASC fault might occur in a traversal A of a
process π.
Fault frequency is a system-specific assumption because the execution time of handler
programs and even the execution time of single instructions changes from system to system.
Nevertheless, we target distributed systems which handle events in less than milliseconds,
which is a reasonable fault frequency bound considering the error rates published in the liter-
ature [Cor+12b]. The fault model consequently assumes that no two faults occur within such
a short time window. The frequency of uncorrectable hardware-level data corruption reported
by studies “in the wild” indicates that this assumption holds with very high probability [NDO11;
SPW09; HSS12].
A consequence of assuming fault frequency is that SEI cannot tolerate permanent hardware
faults (see discussion on Page 28 of Section 2.3.2). The fault frequency assumption also
renders SEI vulnerable to faults affecting the text segment, i.e., the program instructions,
because such faults behave as permanent faults. Evidence suggests, however, that faults
corrupting the text segment quickly lead to a process crash [Gu+03]. Moreover, such faults
are innocuous if hardware error detection is employed (over the whole memory hierarchy)
because every loaded instruction is compared to its replica when the processor fetches the
instruction.
POINTER CORRUPTIONS
In SEI-hardening, we assume that the use of corrupt pointers invariably leads to a process
crash. A similar assumption was made in AN-encoding (Assumption 3.4). If a load or store
operation is executed using a corrupt local variable as a pointer, i.e., the corrupt local variable
points to some arbitrary value in the memory picked at random by a fault, then the process
crashes.
Assumption 4.2 (Benign pointer corruption) If a corrupt local variable vl ∈ Vl is used in a
Ld or St operation, then π immediately crashes.
Assumption 4.2 asserts that ASC faults corrupting pointers result in crashes with such a
high probability that we can consider negligible the probability of π not crashing when using a
corrupt pointer in a load or store operation (see Section 3.3.2 for a discussion on the coverage
of such assumptions). This assumption is extremely important in SEI-hardening. If pointer
corruptions were allowed, we would have to reason about the effects of error propagation to
any variable in the program. Assumption 4.2 restricts the error propagation to those variables
intended to be written or read in the algorithms.
Since pointers cannot be corrupt, we can define high-level read and write operations to
simplify the presentation of our algorithms. These operations refer directly to a global variable
instead of referring to a local variable that points to the global variable. In an implementation
of SEI, however, there is such a local variable pointing to the global variable.
Definition 4.11 (v → vl) The read operation v → vl loads the value of the global variable
v ∈ Vg into the local variable vl ∈ Vl .
Definition 4.12 (v ← vl) The write operation v ← vl stores the value of the local variable
vl ∈ Vl into the global variable v ∈ Vg.
Note that Correia et al. achieve a similar simplification in PASC with their access fault as-
sumption [Cor+11].
FURTHER ASSUMPTIONS
Finally, we introduce the last two assumptions of SEI-hardening, which are also used in PASC.
Assumption 4.3 (Immutable input messages) No variable v ∈ Vi is ever modified within a
traversal.
Assumption 4.4 (Initially correct state) The first state of the first traversal of an execution
of a process π is correct, i.e., s = r .
4.2.2 SEI-HARDENING SPECIFICATION
We now present the specification of SEI-hardening. SEI-hardening transforms the program p
running on a process π of a benign-fault-tolerant distributed system into a hardened program
ph. By hardening the programs running on all processes of the system, the system can
tolerate arbitrary faults in non-malicious environments (see Section 2.2.3). The SEI-hardening
transformation mainly duplicates the execution of p and inserts several checks to guarantee
error isolation.
SEI does not define how the transformation is to be implemented. In particular, our spec-
ification does not dictate whether the implementation realizes replicas using software-based
error-detection codes or leveraging existing hardware error detection mechanisms.
BLOCKS AND GATES
In SEI, a program is divided into blocks and gates. Blocks are sequences of instructions implementing
either the original functionality in program p or part of the hardening.
Gates are also sequences of instructions, but they serve to check the control flow of the
traversal only.
The first transformation rule of SEI defines the structure of a hardened program, indepen-
dently of the program p itself.
program ph
1 Filter
3 FirstGate
9 Prepare1
13 Gate(cfp, cfg)
18 Prepare2
22 Gate(cf1, cfp)
*27 Exec1
28 Gate(cfr , cf1)
33 Reset
48 Gate(cf2, cfr )
*53 Exec2
54 Gate(cfc, cf2)
59 Validate
73 LastGate
84
Algorithm 4.1: Hardened program ph
block Filter
1 if ¬CheckMessage(Vi ) then
2 goto 84
block Prepare1
9 foreach v ∈ Vs do
10 O̊(v ) ←vv̄ FALSE
11 N̊(v ) ←vv̄ FALSE
12 U(v ) ←vv̄ FALSE
block Prepare2
18 foreach v ∈ Vs do
19 O̊(v ) ←vv̄ FALSE
20 N̊(v ) ←vv̄ FALSE
21 U(v ) ←vv̄ FALSE
Algorithm 4.2: Init blocks: Filter, Prepare1, and Prepare2
Rule 4.1 (Hardened program structure) A hardened program ph is a sequence of gates and
blocks as shown in Algorithm 4.1.
A traversal of a SEI-hardened program ph is an interleaved execution of blocks and gates
starting with the block Filter . Line 84 represents the end of the handler program, i.e., once the
execution reaches Line 84, messages might be sent out (see process model in Section 2.3.1).
The gates guarantee that the high-level control flow executes correctly, i.e., that the blocks
Filter , Prepare1, Prepare2, Exec1, Reset, Exec2, and Validate execute in the order given by
Algorithm 4.1. As we will describe below, FirstGate, Gate(. . .) and LastGate are the procedures
given by Algorithm 4.8. We first focus on the presentation of the blocks assuming blocks
execute in order, and ignoring the existence of the gates. We describe the gate algorithms
later in this section.
Algorithm 4.2 describes the three Init blocks. The Filter block checks whether the input
message is valid, discarding it in case it is invalid by jumping to the first instruction after the
program ph (Line 84). The Prepare1 block initializes the bookkeeping data structures used by
SEI. The Prepare2 block reinitializes all data structures. The repetition of the operations
guarantees that if the Prepare1 block is skipped by a fault, the data structures are still correctly
initialized. Note that if Filter is skipped by a fault, a validity check in Validate aborts process π.
Once the input message is checked, and the data structures are initialized, the Exec1 block
executes program p for the first time (Algorithm 4.1). Subsequently, the Exec2 block executes
program p for the second time. Blocks Exec1 and Exec2 are transformed versions of the
original program p using Rules 4.2-4.6. Between Exec1 and Exec2 blocks, the Reset block
rolls back all changes done in Exec1 to variables in Vs, so that Exec2 can repeat the same
computation. Finally, the Validate block compares the computations of Exec1 and Exec2
performing a series of checks.
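The overall flow of a traversal (execute, roll back, execute again, compare) can be sketched in Python as follows. This is a deliberately coarse illustration under stated assumptions, not the SEI algorithm itself: it snapshots the whole state instead of bookkeeping individual variables in O, N, and U, it omits replicas and gates, and the names hardened_traversal, handler, and Abort are hypothetical.

```python
import copy


class Abort(Exception):
    """Models the process crashing itself after a failed check."""


def hardened_traversal(handler, state, message):
    """Run handler twice on state; keep the result only if both runs agree.

    Assumption: handler(state, message) mutates the dict `state` in place
    and returns the output message.
    """
    old = copy.deepcopy(state)         # coarse stand-in for O (old values)
    out1 = handler(state, message)     # Exec1
    new = copy.deepcopy(state)         # coarse stand-in for N (new values)
    state.clear()
    state.update(old)                  # Reset: roll back the changes of Exec1
    out2 = handler(state, message)     # Exec2 repeats the same computation
    if state != new or out1 != out2:   # Validate: compare both executions
        raise Abort("Exec1 and Exec2 disagree")
    return out2
```

A deterministic handler passes the comparison, whereas any divergence between the two executions ends in Abort instead of an outgoing message.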
We do not restrict how blocks and gates are implemented – as, for example, procedures,
functions, or macros. To ease the presentation, blocks and gates are all inlined to form a
single line count over all blocks (the numbers on the left margin of the algorithms). Some
algorithms are used multiple times, not having an absolute line number. In such algorithms,
we prefix the line numbers with a “+” sign. Blocks Exec1 and Exec2 count for a single line –
marked with a “*” sign in Algorithm 4.1 – since their length depends on the specific program
p being used. Finally, lines of the algorithms may represent several instructions, e.g., if-then
structures. “The execution of a line” means that the process takes enough Next steps so
that all instructions of the line are executed. We do not consider all instructions of a line to
be executed atomically with respect to Fault steps.
function CheckMessage(Vc) do
+0 foreach v ∈ Vc do
+1 if v ≠ v̄ or v ≠ v̈ then
+2 return FALSE
+3 return TRUE
Algorithm 4.3: Validity check of input variables (Vc = Vi ) or output variables (Vc = Vo)
function Check(Vc) do
+0 foreach v ∈ Vc do
+1 if v ≠ v̄ then
+2 return FALSE
+3 return TRUE
Algorithm 4.4: Validity check of a set of variables Vc ⊆ Vg
TRANSFERRING TRUST FROM AND TO MESSAGE REPLICAS
As discussed in Section 4.2.1, with each message m sent over the network, a replica of m
(possibly in the form of an error-detection code such as CRC) is sent along to detect data
corruption. Upon receiving a message m, process π writes m into the variables of Vi and
V̄i . Moreover, π writes the replica of message m – i.e., the error-detection code – into the
second replica variables V̈i . Intuitively, the double check in Filter (Algorithm 4.2, Line 1) and
Validate (Algorithm 4.5, Line 69) “transfers” the validity from the replica variables V̈i to the
replica variables V̄i . Lemma 4.1 formalizes this intuition.
Algorithm 4.3 represents the validity check for messages, whereas Algorithm 4.4 represents
the validity check for any other variable in Vg. The ability to check the validity of variables quickly
is essential in SEI-hardening because these checks are performed very often. To support
different implementations, we do not specify how exactly the validity check is implemented.
If the system leverages existing hardware error-detection codes, then the check v ̸= v̄ is
automatically performed by the hardware with any ordinary load operation of v . Once the
validity of the input message has been “transferred”, the variables in Vi can simply be checked
against the variables in V̄i as normal variables, which has the benefit of fast checks if hardware
error-detection codes are used in the implementation.
On the one hand, Algorithm 4.3 checks that s[v ] = s[v̄ ]∧s[v ] = s[v̈ ] for every v of a message.
On the other hand, fault diversity guarantees that after a fault corrupting some variable v of a
message, s[v ] ̸= s[v̄ ] ∧ s[v ] ̸= s[v̈ ] (by Definition 4.10). This complementary use of validity is
necessary to transfer the trust from the internal replicas V̄o to the replicas V̈o being sent to
the network (see Lemma 4.10).
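The two validity checks can be sketched in Python as follows. This is only an illustration under assumptions: variables and their replicas are held in separate dictionaries, the second replica is modeled as a plain copy rather than an error-detection code such as a CRC, and the function names are hypothetical.

```python
def check(value, replica, variables):
    """In the spirit of Algorithm 4.4: each variable equals its first replica."""
    return all(value[v] == replica[v] for v in variables)


def check_message(value, replica, second_replica, variables):
    """In the spirit of Algorithm 4.3: each variable equals both replicas."""
    return all(
        value[v] == replica[v] and value[v] == second_replica[v]
        for v in variables
    )
```

A message whose second replica disagrees with its content still passes check, but fails check_message, which is why the message checks in Filter and Validate use the stricter test.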
DATA STRUCTURES
SEI uses three data structures to bookkeep changes to variables in Vs:
• O is a set data structure representing the old values of variables in Vs modified during
Exec1.
• N is a set data structure representing the new values of variables in Vs modified during
Exec1.
• U is a map data structure marking the variables in Vs modified (updated) during Exec2.
O, N and U are instances of two simple data structure types: maps and sets. We model
maps and sets as follows.
Definition 4.13 (Map data structure) A map data structure D is a set of global variables
VD ⊂ Vg\Vs and a bijective function µD : Vs → VD.
At each state s, a map data structure implements an in-memory function that maps each
variable v in Vs to some value (possibly different from s[v ]). For any v ∈ Vs, we use the
shorthand notation “D(v ) at state s” meaning the value of µD[v ] at state s, i.e., s[µD[v ]].
Definition 4.14 (Set data structure) A set data structure D is composed of a map data struc-
ture D and an auxiliary map data structure D̊. For all v ∈ Vs, the variable µD[v ] stores a value
for variable v. For all v ∈ Vs, if the value of the variable µD̊[v ] is TRUE, the set data structure
is said to contain v, otherwise v is not in the set data structure.
When s[µD̊[v ]] = TRUE, the set contains a value for variable v , i.e., the value s[µD[v ]]. When
s[µD̊[v ]] = FALSE, the set does not contain a value for variable v , and the value s[µD[v ]] is
undefined. For any v ∈ Vs, we use the shorthand notation:
• “v ∈ D at state s” meaning that D̊(v ) at state s is TRUE, i.e., s[µD̊[v ]] = TRUE; and
• “v /∈ D at state s” meaning that D̊(v ) at state s is FALSE, i.e., s[µD̊[v ]] = FALSE.
Data structures are initialized in blocks Prepare1 and Prepare2. They are initialized twice
because a partial initialization can compromise the correctness of the hardening. Clearly,
these definitions of data structures are rather space inefficient. In a real implementation, these
data structures would not contain a mapping for all variables in Vp, but instead they would
dynamically adapt to the currently used variables. The definitions above, however, simplify
our formalization. They are general enough to represent many real implementations of set
and map data structures because they require multiple instructions to introduce elements in
the set or map. These instructions are not atomic with respect to Fault steps.
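A minimal Python sketch of the set data structure, assuming a fixed universe of variables and omitting replicas; the class and method names are hypothetical. The point it illustrates is that insertion takes two separate steps, so a fault between them can leave the value stored while the variable is not yet contained.

```python
class SetDS:
    """Set data structure D over a fixed universe Vs (Definition 4.14 in spirit)."""

    def __init__(self, variables):
        self.value = {v: None for v in variables}     # map D
        self.present = {v: False for v in variables}  # auxiliary map D̊

    def add(self, v, x):
        self.value[v] = x        # first step: value stored, v not yet contained
        self.present[v] = True   # second step: from here on, v ∈ D

    def __contains__(self, v):
        return self.present[v]

    def lookup(self, v):
        if not self.present[v]:
            raise KeyError(v)    # the stored value is undefined while v ∉ D
        return self.value[v]
```

As in the formal model, this sketch allocates a slot for every variable up front; a practical implementation would likely grow on demand.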
HIGH-LEVEL ASSIGNMENTS
We now define two high-level assignment operations used in the algorithms of blocks and
gates. These operations write a value to or copy the value between variables in Vg. The
operations writing any variable v ∈ Vg also write in its replica variable v̄ ∈ Vr .
Definition 4.15 (v ←vv̄ vl) Given a global variable v ∈ Vg and a local variable vl ∈ Vl , the high-
level instruction v ←vv̄ vl writes the value of vl into the variable v and its replica v̄.
+0 v ← vl
+1 v̄ ← vl
Since vl is a local variable, vl has no replica. A corruption of vl can cause v and its replica v̄ to
be corrupt and valid, i.e., equal. In our algorithms, we also use v ←vv̄ TRUE or v ←vv̄ FALSE,
where TRUE and FALSE are constants. In such cases, a fault cannot make both replicas have
the same value since constants are part of the instruction and, by definition, we assume no faults
occur in the text segment (see discussion in Section 2.3, Page 29).
Definition 4.16 (v ←Cpy w) Given two global variables v, w ∈ Vg, the high-level instruction
v ←Cpy w copies the value of w into v and its replica v̄. We store the value of w temporarily
in the local variable vl ∈ Vl . The high-level assignment v ←Cpy w is the following sequence
of instructions:
+0 w → vl
+1 v ←vv̄ vl
Note that, in a real computer equipped with ECC in its memory modules, copying the value
of a memory location w to another location v does not automatically copy w ’s error code
into v ’s error code. Our modeling of copying values between global variables captures this
window of vulnerability by first copying the value of w into a local variable vl . Since vl is a local
variable, a corruption of vl can then propagate to v and v̄ . SEI can detect such corruptions, as
proved below.
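The window of vulnerability in the copy operation can be made concrete with a small Python sketch. It assumes variables and replicas live in two dictionaries; the fault parameter is a hypothetical hook used only to inject a corruption of the local variable.

```python
def assign(value, replica, v, local):
    """v ←vv̄ vl: write the local value into v and into its replica."""
    value[v] = local
    replica[v] = local


def copy_var(value, replica, v, w, fault=None):
    """v ←Cpy w: load w into a local, then write the local to v and v̄.

    `fault` is a hypothetical hook that may corrupt the local in between.
    """
    local = value[w]              # w → vl
    if fault is not None:
        local = fault(local)      # corruption of the unreplicated local
    assign(value, replica, v, local)
```

If the local is corrupted after the load, v and v̄ end up equal to each other but different from w, i.e., corrupt and valid. This is the case the additional checks of the hardening have to catch.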
TRANSFORMATION OF PROGRAM p INTO BLOCKS Exec1 AND Exec2
Block Exec1 is the original program p with Rules 4.2, 4.3, and 4.5 applied, whereas block
Exec2 is program p with Rules 4.2, 4.4 and 4.6 applied. These rules substitute read (v → vl )
and write instructions (v ← vl ) in program p with sequences of instructions that additionally
check the validity and manipulate the bookkeeping data structures.
Rule 4.2 (Reading variable v ∈ Vs in blocks Exec1 and Exec2) A read instruction v → vl ,
with global variable v ∈ Vs and local variable vl ∈ Vl , is substituted by the following sequence
of instructions.
replace v → vl with
+0 if ¬Check(v ) then
+1 Abort
+2 v → vl
The validity check of variable v ∈ Vs ∪ Vi is performed only the first time v is read in Exec1
and the first time v is read in Exec2.
Before a variable is read, its validity is verified by comparing it with its replica. This verifica-
tion is cheap if we use hardware memory protection. If software-level memory protection is
used instead, it is sufficient to do the check only the first time a variable is read in the block.
In fact, in this case the cost of verifying if the variable has been already read can be lower
than the cost of executing the comparison.
Rule 4.3 (Writing variable v ∈ Vs in block Exec1) A write instruction v ← vl , with global
variable v ∈ Vs and local variable vl ∈ Vl , is substituted by the following sequence of instructions.
replace v ← vl with
+0 if v /∈O then
+1 if ¬Check(v ) then
+2 Abort
+3 O(v ) ←Cpy v
+4 if ¬Check(O̊(v )) or v ∈O then
+5 Abort
+6 O̊(v ) ←vv̄ TRUE
+7 if ¬Check(v ) or v ̸= O(v ) then
+8 Abort
+9 v ←vv̄ vl
+10 if ¬Check(O̊(v )) or v /∈O then
+11 Abort
The sequence of instructions from Line +3 to +6 adds the value of v to the set O. In particular,
the value of v in the data structure O is up-to-date after Line +3, but v is only contained by O,
i.e., v ∈O, from the state after the execution of Line +6. The check at Line +5 guarantees no
control-flow error can force a second execution of Line +3. The check at Line +8 guarantees
that the validity of v is propagated to O(v ).
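Rule 4.3 can be sketched in Python as follows. The sketch rests on assumptions that the flattened listing does not settle: Lines +1 to +8 are taken to be nested under the test at Line +0, replicas are modeled by a hypothetical Replicated helper, and Abort is modeled as an exception.

```python
class Abort(Exception):
    """Models the process crashing itself."""


class Replicated:
    """A family of variables, each stored together with a replica."""

    def __init__(self, names, init):
        self.val = {n: init for n in names}
        self.rep = {n: init for n in names}

    def valid(self, n):
        return self.val[n] == self.rep[n]

    def write(self, n, x):
        self.val[n] = x
        self.rep[n] = x


def exec1_write(s, o_val, o_flag, v, local):
    """Hardened write of `local` into v during Exec1 (Rule 4.3 in spirit)."""
    if not o_flag.val[v]:                               # +0: v ∉ O
        if not s.valid(v):                              # +1
            raise Abort                                 # +2
        o_val.write(v, s.val[v])                        # +3: O(v) ←Cpy v
        if not o_flag.valid(v) or o_flag.val[v]:        # +4
            raise Abort                                 # +5
        o_flag.write(v, True)                           # +6: now v ∈ O
        if not s.valid(v) or s.val[v] != o_val.val[v]:  # +7
            raise Abort                                 # +8
    s.write(v, local)                                   # +9: v ←vv̄ vl
    if not o_flag.valid(v) or not o_flag.val[v]:        # +10
        raise Abort                                     # +11
```

Only the first write to v saves the old value, so repeated writes in Exec1 leave O(v) at the value v had when the traversal started.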
Write assignments during the second execution are simpler than in the first execution:
Rule 4.4 (Writing variable v ∈ Vs in block Exec2) A write instruction v ← vl , with global
variable v ∈ Vs and local variable vl ∈ Vl , is substituted by the following sequence of instruc-
tions.
replace v ← vl with
+0 U(v ) ←vv̄ TRUE
+1 v ←vv̄ vl
+2 if ¬Check(U(v )) or v /∈ U then
+3 Abort
First, v is added to U. Next, the original assignment to v is executed along with a write to
v̄ (Line +1). Finally, the validity and containment in U are checked.
The variables belonging to output variables have to additionally update their second replicas.
We define two rules which are applied before Rules 4.3 and 4.4.
Rule 4.5 (Writing variable v ∈ Vo in blocks Exec1) The following instruction is inserted be-
fore a write instruction v ← vl in block Exec1, with local variable vl ∈ Vl and output global
variable v ∈ Vo.
before v ← vl insert
+1 v̈ ← ∼ vl
Rule 4.5 writes the one's complement of vl ’s value into the replica variable v̈ before the write
operation in Exec1 stores the value of vl into v . That guarantees that the message
m represented by the variables in Vo is invalid with respect to the variables in V̈o. In case a
fault induces a jump out of the traversal directly to Line 84, the message in Vo can be safely
sent since it is invalid.
Rule 4.6 (Writing variable v ∈ Vo in blocks Exec2) The following instruction is inserted be-
fore the last write instruction v ← vl in block Exec2, with local variable vl ∈ Vl and output
global variable v ∈ Vo.
before v ← vl insert
+1 v̈ ← vl
Rule 4.6 writes again into v̈ , but this time it writes the value of vl . If a fault induces a jump
out of the traversal at this point, the value of vl is correct by the fault-frequency assumption.
Hence, if the message in Vo is valid, then it is the correct message. Note that if v is written
multiple times in the block, only the last write to v is accompanied by a write to v̈ .
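The interplay of Rules 4.5 and 4.6 can be illustrated with a short Python sketch. It assumes 32-bit words (the MASK constant), models the second replica as a plain value rather than an error-detection code, and uses hypothetical function names.

```python
MASK = 0xFFFFFFFF  # assumption: 32-bit machine words


def exec1_output_write(out, out2, v, local):
    """Rule 4.5 in spirit: v̈ gets the complement, then v gets the value."""
    out2[v] = ~local & MASK   # v̈ ← ∼vl, so v and v̈ cannot be equal
    out[v] = local            # v ← vl


def exec2_output_write(out, out2, v, local):
    """Rule 4.6 in spirit: v̈ gets the value itself, then v is written."""
    out2[v] = local           # v̈ ← vl
    out[v] = local            # v ← vl


def message_valid(out, out2):
    """The outgoing message is valid if every variable matches its v̈."""
    return all(out[v] == out2[v] for v in out)
```

After Exec1 the message is invalid by construction, so a jump out of the traversal at that point cannot release a message that receivers would accept. It only becomes valid once Exec2 has rewritten v̈.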
block Reset
33 foreach v ∈O do
34 if v /∈ N then
35 if ¬Check(v ) then
36 Abort
37 N(v ) ←Cpy v
38 if ¬Check(N̊(v )) or v ∈ N then
39 Abort
40 N̊(v ) ←vv̄ TRUE
41 if ¬Check(v ) or v ≠ N(v ) then
42 Abort
43 v ←Cpy O(v )
44 if ¬Check(N̊(v )) or v /∈ N then
45 Abort
46 if ¬Check(O) then
47 Abort
block Validate
59 if ¬Check(U) or ¬Check(N) then
60 Abort
61 foreach v ∈ N do
62 if ¬Check(v ) or v ̸= N(v ) then
63 Abort
64 if v /∈ U then
65 Abort
66 foreach v ∈ U do
67 if v /∈ N then
68 Abort
69 if ¬CheckMessage(Vi ) then
70 Abort
71 if ¬CheckMessage(Vo) then
72 Abort
Algorithm 4.5: Reset and Validate blocks
RESETTING AND VALIDATION
The final two blocks introduced into ph are Reset and Validate (Algorithm 4.5). Closely resem-
bling Rule 4.3, the Reset block loops over all variables in O and restores their old values, while
saving their new values in N. Moreover, Reset checks the validity of O before finishing.
The Validate block first checks the validity of data structures U and N. Next, it checks that
the final value of each variable v ∈ Vs is the same as the value of N(v ). Both foreach loops
check that every variable in U is also in N and vice versa. Finally, the Validate block checks
the message validity of the input and output messages. The check at Line 69 in conjunction
with the message validity check in the Filter block guarantees that the input variables Vi are
valid not only with respect to V̄i but also with respect to V̈i . The check at Line 71 guarantees
that the output variables Vo are valid with respect to V̈o if the output message is valid with
respect to V̄o.
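The cross-check that Validate performs between N and U can be sketched in Python as follows. The sketch is simplified under assumptions: replica checks and the message checks of Lines 69 to 72 are omitted, N is modeled as a dictionary from variables to their new values, U as a Python set, and all names are hypothetical.

```python
class Abort(Exception):
    """Models the process crashing itself."""


def validate(value, n, u):
    """Compare the outcome of Exec2 (value, u) with that of Exec1 (n)."""
    for v, expected in n.items():
        if value[v] != expected:   # final value differs from the Exec1 value
            raise Abort(f"{v}: value differs between Exec1 and Exec2")
        if v not in u:             # written in Exec1 but not in Exec2
            raise Abort(f"{v}: modified only in Exec1")
    for v in u:
        if v not in n:             # written in Exec2 but not in Exec1
            raise Abort(f"{v}: modified only in Exec2")
```

The two loops together establish that both executions modified exactly the same variables and left them with the same values.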
CONTROL-FLOW GATES
SEI-hardening is straightforward to prove correct as long as faults do not affect the control
flow of the program. Control-flow faults complicate the reasoning because they can change the
order in which the instructions are executed in several different ways. To simplify the design
of SEI, we introduce control-flow gates, a mechanism that “confines” the control-flow faults
within the blocks.
Intuitively, control-flow gates enforce that a fault cannot make the process leave the block
where the fault occurs without crashing the process. The scheme mainly provides two guar-
antees: First, a fault in the current block cannot jump back into a previous block. Second, a
fault in the current block cannot jump into the next block.
To understand how gates achieve these guarantees, consider the example program shown
in Algorithm 4.6 containing two blocks, B1 and B2, and a variable cf initialized with the value
FALSE (cf stands for control-flow flag). We assume that in fault-free traversals, no instruction
in B1 jumps into B2, nor any instruction in B2 jumps into B1. Blocks can be seen as strictly
separate “phases” of a fault-free traversal. In this example, we also assume that cf is not
corrupted by a fault, only the control flow, i.e., the program counter.
e1 B1
e2 if cf = TRUE then
e3 Abort
e4 cf ←vv̄ TRUE
e5 B2
e6 if cf = FALSE then
e7 Abort
Algorithm 4.6: Example: single control-flow gate
e1 B1 (* Block 1 *)
e2 if cf1 = TRUE then (* Gate 1 *)
e3 Abort
e4 cf1 ←vv̄ TRUE
e5 B2 (* Block 2 *)
e6 if cf2 = TRUE then (* Gate 2 *)
e7 Abort
e8 cf2 ←vv̄ TRUE
e9 if cf1 = FALSE then
e10 Abort
e11 B3 (* Block 3 *)
e12 if cf2 = FALSE then (* Gate 3 *)
e13 Abort
Algorithm 4.7: Example: sequence of control-flow gates
Assume a control-flow fault occurs in B2. The fault cannot jump into B1 without crashing
the process. We show why: We know that Line e4 is executed correctly since the fault occurs
in B2, which is after Line e4. A jump from B2 into B1 would execute instructions in B1 and
eventually execute Line e2. Since we assume at most one fault per traversal, the second time
Line e2 is executed, the process crashes because cf is TRUE. The process does not crash the
first time Line e2 is executed because cf is initialized with FALSE (outside the pseudo-code).
Now assume a control-flow fault occurs in B1. The fault cannot jump into B2 without
crashing the process. A jump from B1 into B2 would execute instructions in B2 and eventually
execute Line e6. We know that Line e6 is executed correctly since the fault occurs in B1,
which is before Line e6. The process crashes because Line e4 is skipped by the fault and cf
is initially FALSE. In fault-free traversals, the process does not crash since cf is set to TRUE at
Line e4.
So far, the scheme partially confines faults within blocks: a fault in B1 can either jump
back inside B1 or jump exactly at cf = TRUE (Line e4); likewise, a fault in B2 can either jump
back inside B2 or jump exactly at cf = TRUE. A fault in B1 or B2 can still leave the traversal
completely by jumping to some line after Line e7, however. We deal with faults leaving the
traversal in Section 4.2.4.
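The argument for the single gate can be replayed with a small Python simulation of Algorithm 4.6. It is a sketch under assumptions: blocks are treated as single steps, a fault is modeled as a jump taken right after the step at jump_from executes, at most one fault is injected, and the function and parameter names are hypothetical.

```python
class Abort(Exception):
    """Models the process crashing itself."""


def run_gate_example(jump_from=None, jump_to=None):
    """Execute Algorithm 4.6, optionally injecting one control-flow fault."""
    cf = False                                  # initialized outside the listing
    trace, pc, fault_used = [], "e1", False
    successor = {"e1": "e2", "e2": "e4", "e4": "e5", "e5": "e6", "e6": "end"}
    while pc != "end":
        if pc in ("e1", "e5"):
            trace.append("B1" if pc == "e1" else "B2")
        elif pc == "e2" and cf:
            raise Abort("e3: B1 re-entered after the gate")
        elif pc == "e4":
            cf = True
        elif pc == "e6" and not cf:
            raise Abort("e7: assignment at e4 was skipped")
        if pc == jump_from and not fault_used:
            fault_used, pc = True, jump_to      # the single control-flow fault
        else:
            pc = successor[pc]
    return trace
```

A jump from B1 into B2 crashes at Line e6 and a jump from B2 back into B1 crashes at Line e2, whereas a jump from B1 exactly onto Line e4 is indistinguishable from a fault-free run.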
The scheme described above can be combined to confine faults in several blocks. Algo-
rithm 4.7 shows an example with 3 blocks. Gates 1 and 2 guarantee that no control-flow
fault can jump from B1 into B2 and vice versa; whereas Gates 2 and 3 guarantee that no
control-flow fault can jump from B2 into B3 and vice versa. Additionally, Gates 1, 2 and 3
together guarantee that no fault can jump from B1 into B3 because the gates are chained:
the check of and the assignment to cf2 (Lines e6 and e8) take place before the last check of cf1
(Line e9). If a fault in B1 jumps into B3, then cf2 is FALSE and the process crashes. Recall
that only one fault can occur per traversal by assumption, hence, no second fault occurs to
jump over Line e12. If a fault in B1 jumps exactly at Line e8, then cf2 is set to TRUE, but the
process crashes at Line e9 anyhow because cf1 is FALSE.
The generalization of this scheme is used in SEI-hardening (Algorithm 4.1) and is described
in Algorithm 4.8. The following control-flow flags are used: cfg is the control-flow flag of
Prepare1, cfp of Prepare2, cf1 of Exec1, cfr of Reset, cf2 of Exec2, and cfc of Validate.
The control-flow flag cfg additionally protects the whole traversal when a fault jumps out of
the traversal by forcing a subsequent traversal to crash the process. Moreover, the hardening
of the blocks guarantees that Vo is either invalid with respect to V̈ or it is correct if such a
fault occurs.
procedure FirstGate
3 if ¬Check(cfg) or cfg = TRUE then
4 Abort
5 cfp, cf1, cfr , cf2, cfc ←vv̄ FALSE
6 if ¬Check(cfg) or cfg = TRUE then
7 Abort
8 cfg ←vv̄ TRUE
procedure Gate(cfnext, cfprev)
+0 if ¬Check(cfnext) or cfnext = TRUE then
+1 Abort
+2 cfnext ←vv̄ TRUE
+3 if ¬Check(cfprev) or cfprev = FALSE then
+4 Abort
procedure LastGate
73 cfg ←vv̄ FALSE
74 if ¬Check(cfp) or cfp = FALSE then
75 Abort
76 if ¬Check(cf1) or cf1 = FALSE then
77 Abort
78 if ¬Check(cfr ) or cfr = FALSE then
79 Abort
80 if ¬Check(cf2) or cf2 = FALSE then
81 Abort
82 if ¬Check(cfc) or cfc = FALSE then
83 Abort
Algorithm 4.8: Control-flow gates
Definition 4.17 (First initialization of the control-flow flags) In the first state of the first
traversal, all control-flow flags except cfg are initialized with TRUE; cfg is initialized with FALSE.
Remember that the initial state of the first traversal of an execution is correct by assumption
(Assumption 4.4). In each Gate(. . .), the flags have to be initially FALSE, otherwise the process
crashes at Line +0 of the gate. As shown in Algorithm 4.8, FirstGate initializes all flags with
FALSE if and only if cfg is FALSE.
If one fault jumps out of the current traversal A, cfg is not set to FALSE since Line 73 is
skipped. In a subsequent traversal B, a fault might occur since we assume at most one fault
per traversal. If no fault occurs, process π crashes in the check of Line 3 since cfg is TRUE.
A fault could, nevertheless, occur and jump over this line. The double check for cfg = TRUE
at Line 6 guarantees that no fault can skip both checks and still reset the other control-flow
flags to FALSE. A fault in traversal B that skips both checks ends up also skipping the reset of
the control-flow flags. If process π does not skip the reset of the flags, then it must execute
one of the checks of cfg, which causes it to crash.
If some flag other than cfg is initially TRUE in traversal B and not reset at Line 5, then π will
crash in the respective gate at Line +0. The only way to set the cfg to FALSE is by executing
the LastGate. However, LastGate crashes the process if any of the other control-flow flags is
FALSE. As we will show in the correctness proof of Section 4.2.4, the only possible scenario is
a fault in the subsequent traversal B jumping over the FirstGate into the same block K where
the previous traversal A was left by the first fault. We show that such a sequence of faults is
equivalent to a single fault confined to block K .
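The role of cfg across traversals can be illustrated with a Python sketch of the gates of Algorithm 4.8. It is simplified under assumptions: the replica checks Check(cf) are omitted, the blocks between gates are empty, a fault leaving the traversal is modeled by the hypothetical leave_after parameter, and the helper names are not taken from the source.

```python
class Abort(Exception):
    """Models the process crashing itself."""


FLAGS = ["cfp", "cf1", "cfr", "cf2", "cfc"]
CHAIN = [("cfp", "cfg"), ("cf1", "cfp"), ("cfr", "cf1"),
         ("cf2", "cfr"), ("cfc", "cf2")]        # gate order of Algorithm 4.1


def initial_flags():
    """Definition 4.17: all flags TRUE except cfg, which is FALSE."""
    return {"cfg": False, **{name: True for name in FLAGS}}


def first_gate(f):
    if f["cfg"]:                # Line 3
        raise Abort("previous traversal did not reach LastGate")
    for name in FLAGS:          # Line 5
        f[name] = False
    if f["cfg"]:                # Line 6
        raise Abort("previous traversal did not reach LastGate")
    f["cfg"] = True             # Line 8


def gate(f, nxt, prev):
    if f[nxt]:                  # +0: this gate was already passed
        raise Abort(nxt)
    f[nxt] = True               # +2
    if not f[prev]:             # +3: the previous gate was skipped
        raise Abort(prev)


def last_gate(f):
    f["cfg"] = False            # Line 73
    for name in FLAGS:          # Lines 74 to 83
        if not f[name]:
            raise Abort(name)


def traversal(f, leave_after=None):
    """Run all gates; optionally leave the traversal right after one gate."""
    first_gate(f)
    for nxt, prev in CHAIN:
        gate(f, nxt, prev)
        if nxt == leave_after:
            return False        # a fault jumps out, LastGate is skipped
    last_gate(f)
    return True
```

When a traversal is left early, cfg stays TRUE, so a fault-free subsequent traversal crashes at the first check in FirstGate.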
Set | Description | Relation
V | all variables of π |
Vg | global variables of π | Vg ⊂ V
Vl | local variables of π | Vl ⊂ V
Vr | replica variables of π | Vr ⊂ V
Vp | variables used by program p | Vp ⊂ V
Vs | global variables used by program p | Vs = Vp ∩ Vg
Vh | global variables used by hardening | Vh ⊂ Vg and Vh ∩ Vs = { }
Vi | input message variables | Vi ⊂ Vs
V̄i | input message’s first replica | V̄i ⊂ Vr
V̈i | input message’s second replica | V̈i ⊂ Vg or V̈i ⊂ Vl
Vo | output message variables | Vo ⊂ Vs
V̄o | output message’s first replica | V̄o ⊂ Vr
V̈o | output message’s second replica | V̈o ⊂ Vg or V̈o ⊂ Vl
O | variables with old values of variables in Vs | O ⊂ Vh
O̊ | variables marking modified variables in Vs | O̊ ⊂ Vh
N | variables with new values of variables in Vs | N ⊂ Vh
N̊ | variables marking modified variables in Vs | N̊ ⊂ Vh
U | variables marking modified variables in Vs (Exec2) | U ⊂ Vh
Flags | control-flow variables of gates | Flags ⊂ Vh
Table 4.1: Summary of all variable sets used in SEI-hardening
4.2.3 CORRECTNESS WITH BLOCK-CONFINED FAULTS
Our proof is divided into two parts. In this section, we only consider the blocks of SEI-hardening
(Algorithm 4.1) and assume faults cannot escape them. In Section 4.2.4, we prove that the
gates can guarantee this assumption. We define block-confined faults as follows.
Assumption 4.5 (Block-confined faults) A fault corrupting the control flow – i.e., the program
counter – does not set the pc to an instruction outside the block where the fault
occurs.
Given Assumption 4.5, we show that if the following invariant holds at the beginning of a
traversal, then it holds at the end of the traversal. That is technically not an inductive invariant
over the steps of the system, but rather over the complete traversals, hence, we call this
invariant a traversal invariant. The traversal invariant asserts that if a variable v is corrupt at
state s (i.e., different from its reference value), then v is invalid at s. In this section, the
traversal invariant is assumed to hold in the initial state of the traversal sb. We prove that the
traversal invariant holds in the final state of the traversal se.
Property 4.1 (Traversal invariant) Given a state s of a traversal,
(a) for all v ∈ Vg\(Vi ∪ Vo), if s[v ] ̸= r [v ], then s[v ] ̸= s[v̄ ];
(b) for all v ∈ Vi , if s[v ] ̸= r [v ], then s[v ] ̸= s[v̄ ] ∨ s[v ] ̸= s[v̈ ]; and
(c) for all v ∈ Vo, if s[v ] ̸= r [v ], then s[v ] ̸= s[v̈ ].
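Property 4.1 can be read as a predicate over a state and its reference state. The following Python sketch is one way to write it down, assuming states and replicas are dictionaries keyed by variable name; the function and parameter names are hypothetical.

```python
def traversal_invariant(s, r, rep1, rep2, v_g, v_i, v_o):
    """Return True if every corrupt variable in v_g is also invalid.

    s: current state, r: reference state,
    rep1: first replicas (v̄), rep2: second replicas (v̈),
    v_g: global variables, v_i and v_o: input and output subsets of v_g.
    """
    for v in v_g:
        if s[v] == r[v]:
            continue                                    # v is not corrupt
        if v in v_i:                                    # case (b)
            invalid = s[v] != rep1[v] or s[v] != rep2[v]
        elif v in v_o:                                  # case (c)
            invalid = s[v] != rep2[v]
        else:                                           # case (a)
            invalid = s[v] != rep1[v]
        if not invalid:
            return False                                # corrupt but valid
    return True
```

The predicate fails exactly when some variable is corrupt and yet passes the validity test that applies to its class.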
Particularly important is the distinction between the cases of general global variables and
input/output variables. Global variables other than Vi and Vo are protected by their replica
state between traversals and by the SEI-hardening algorithms during traversals. If some fault
occurs until the beginning of the traversal and corrupts a variable v , then sb[v ] ≠ sb[v̄ ] by
fault diversity. We will show that if some variable v is corrupted during the traversal, then
se[v ] ≠ se[v̄ ].
State | Description | Line
sb | the state at the first line of program ph (Filter ) | 1
sg | the state at the first line of Prepare1 | 9
sp | the state at the first line of Prepare2 | 18
sx1 | the state at the first line of Exec1 | 27
sr | the state at the first line of Reset | 33
sx2 | the state at the first line of Exec2 | 53
sc | the state at the first line of Validate | 59
se | the state immediately after the last line of program ph | 84
rb | the reference state at the first line of program ph | 1
re | the reference state immediately after the last line of program ph | 84
Table 4.2: States used in the correctness proofs
In contrast, variables in Vi might arrive already corrupt from the network. If no fault occurs
until the beginning of the traversal, then it holds that sb[v ] = sb[v̄ ] for all v ∈ Vi and sb[v ] ̸= sb[v̈ ]
for some v ∈ Vi . In other words, the input message is valid with respect to V̄i , but invalid with
respect to V̈ . Moreover, if some fault occurs until the beginning of the traversal corrupting a
variable v ∈ Vi , then sb[v ] ̸= sb[v̄ ] by fault diversity.
Finally, the output variables have a stronger requirement: if some variable v ∈ Vo is corrupted
during the traversal, then it should hold that se[v ] ̸= se[v̈ ]. By definition, the values of variables
v and v̈ are sent over the network, but not of variable v̄ – remember that an implementation
using hardware error-detection codes cannot access the ECC memory to retrieve the value of
v̄ . Since any check after the traversal but before the sending could be skipped by a fault we
cannot rely on v̄ for guaranteeing error isolation – remember that we do not make any fault
frequency assumption outside the traversals. The traversal invariant has to make an assertion
on the variables that are effectively sent over the network, hence, it must hold that v and v̈
have the same value at se.
CONVENTIONS
Since we model here high-level operations of the algorithms, each line of the algorithms and
of Rules 4.1-4.6 is in fact a sequence of instructions, i.e., a sequence of Next steps. Remember
that we do not consider all instructions of a line to be executed atomically with respect to
Fault steps. We use the following convention to refer to a state:
The state s at Line X is the state s before the execution of the first instruction of the se-
quence of instructions represented by Line X.
The state s immediately after Line X is the state s at Line X+1.
A fault occurs or takes place at state s: s is the state before the fault transition is taken and
s′ is the state after the fault has taken place.
Since in this section a fault is always confined to the block where it occurs, the state at the
first line of every block exists. Moreover, the state after the last line (Line 84) exists. We use
these states to guide our proofs; they are listed in Table 4.2. Note that if a traversal never
completes, it can be equated to a crash so we do not need to consider it.
Figure 4.3: Fault assumption for Lemmas 4.1 and 4.2. (Diagram: the traversal blocks Filter, Prepare1, Prepare2, Exec1, Reset, Exec2 and Validate, delimited by the states sb, sg, sp, sx1, sr, sx2, sc and se, with the blocks that might fail marked.)
In a traversal, states are totally ordered starting from state sb until state se. We use
the ≺ relation to indicate that a state precedes another.
Faults are one way to corrupt variables; the other way is via error propagation occurring in
a Next state transition, i.e., the execution of instructions over corrupt source operands might
result in corrupt target operands. We use the expressions modified by a fault or modified by
an assignment to mean a variable was modified at a state s by a Fault transition or by a Next
transition, respectively. Sometimes we have to differentiate these two cases referring to the
last modification of a variable up to the current state s. For that we define what it means for
a variable to be determined by a fault and determined by an assignment as follows.
Definition 4.18 (Determined by a fault at state s) A variable v ∈ V is determined by a fault
at state s if and only if the last fault before s occurs at a state sf ≺ s, takes place at state s′f ,
and v is not assigned by an instruction in any state sa with s′f ≺ sa ≺ s.
Definition 4.19 (Determined by an assignment at state s) A variable v ∈ V is determined
by an assignment i at state s if and only if v is not determined by a fault at s, i is the last
assignment to v performed at state sa ≺ s, i transforms state sa into s′a, and s′a[v ] = s[v ].
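The two definitions can be read as a classification over the sequence of steps that leads to state s. The sketch below is one simplified reading, not part of the formal model: it tracks only the order of steps (so the value condition s′a[v] = s[v] of Definition 4.19 is not modelled), it assumes that step k turns state k into state k + 1, and it counts the step immediately following the fault as lying after the fault. The function and type names are hypothetical.

```python
from typing import List, Tuple

# A step is ("fault", None) or ("assign", variable_name); step k turns
# state k into state k + 1 (this indexing is an assumption of the sketch).
Step = Tuple[str, object]


def determined_by_fault(trace: List[Step], v: str, s: int) -> bool:
    """True if a fault precedes state s and v is not assigned after it."""
    faults = [k for k in range(min(s, len(trace))) if trace[k][0] == "fault"]
    if not faults:
        return False
    last_fault = faults[-1]
    later = trace[last_fault + 1:s]
    return all(step != ("assign", v) for step in later)


def determined_by_assignment(trace: List[Step], v: str, s: int) -> bool:
    """True if v was assigned before s and no fault determines v at s."""
    assigned = any(step == ("assign", v) for step in trace[:s])
    return assigned and not determined_by_fault(trace, v, s)
```

For the trace assign x, fault, assign y, the variable x is determined by a fault at the final state, whereas y is determined by an assignment.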
In our proofs, remember that we implement the containment of a variable v in a set data
structure – e.g., v ∈O – with an auxiliary map data structure (see Definition 4.14). When
a variable v is contained by a set data structure D, then its value in the auxiliary map data
structure D̊ is TRUE – e.g., s[O̊(v )] = TRUE.
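As a small illustration of this convention, a set data structure D with its auxiliary map D̊ could be sketched as follows. The class name MapBackedSet and its methods are hypothetical and only mirror the notation used in the proofs; they are not taken from the algorithms.

```python
class MapBackedSet:
    """Set D whose membership is recorded in an auxiliary map D̊."""

    def __init__(self, variables):
        # D̊(v) starts as FALSE for every known variable.
        self.aux = {v: False for v in variables}
        self.values = {}

    def add(self, v, value):
        self.values[v] = value   # D(v)
        self.aux[v] = True       # D̊(v) = TRUE means v ∈ D

    def __contains__(self, v):
        return self.aux[v]
```

With this representation, the statement s[O̊(v)] = TRUE corresponds to the membership test `v in O` evaluating to true.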
FAULTS IN THE Init BLOCKS
We start by showing that, if a fault occurs in one of the Init blocks and process π does
not crash, then at the beginning of Exec1, all the variables in Vi can be checked against V̄i .
Moreover, all data structures are either correctly initialized or invalid. See Figure 4.3 for our
fault assumption in the following lemmas.
Lemma 4.1 asserts that if the input message has arrived invalid from the network, then it is
either discarded or the trust from the V̈i is transferred to V̄i – even if a fault occurs in one of
the Init blocks. In other words, the variables of the input message are invalid with respect to
V̄i before Exec1 starts if they are invalid with respect to V̈i when the traversal starts. Again,
the trust transfer allows Exec1 and the following blocks to treat variables in Vi as ordinary
program-state variables – leveraging cheap hardware checks if these are available. Moreover,
the lemma asserts that the bookkeeping data structures O, N and U are all correctly initialized
even if a fault occurs in the Init blocks.
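The trust transfer described above can be pictured with the following sketch. It is a loose model under stated assumptions, not the Filter block itself: the function name filter_input is hypothetical, each input variable arrives as a pair (v, v̈), and the local check v̄ is modelled as a software copy initialised from the received pair, whereas an implementation may rely on hardware error detection instead.

```python
from typing import Dict, Optional, Tuple


def filter_input(
    msg: Dict[str, Tuple[int, int]]
) -> Optional[Dict[str, Tuple[int, int]]]:
    """Return {name: (v, v̄)} for a valid message, or None to discard it."""
    local = {}
    for name, (v, v_sent_check) in msg.items():
        if v != v_sent_check:
            # Invalid with respect to v̈: terminate the traversal early.
            return None
        # Trust transfer: from here on v is paired with a local check v̄.
        local[name] = (v, v_sent_check)
    return local
```

After this step, the later blocks no longer need v̈ for the input variables; they can treat them like any other state variable paired with its local check.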
Lemma 4.1 Assume the traversal invariant holds at sb, a fault occurs in Filter, Prepare1, or
Prepare2, and π does not crash. For all v ∈ Vi , if sb[v ] ≠ sb[v̈ ], then v is invalid at sx1 . Moreover,
for all v ∈ Vs,
• sx1 [O̊(v )] = FALSE or O̊(v ) is invalid at sx1 ,
• sx1 [N̊(v )] = FALSE or N̊(v ) is invalid at sx1 ,
• and sx1 [U(v )] = FALSE or U(v ) is invalid at sx1 .
Proof:
1. CASE: A fault occurs in Filter
1.1. Prepare1, Prepare2, and Validate execute correctly since faults are block-confined by
Assumption 4.5.
1.2. No variable in Vi is modified by an instruction between sb and se by Assumption 4.3.
1.3. For all v ∈ Vi , sx1 [v ] = sx1 [v̈ ] and sx1 [v ] = sx1 [v̄ ], otherwise π crashes in Validate (Line 69)
by Steps 1.1 and 1.2.
1.4. For all v ∈ Vi , sx1 [v ] = sb[v ] and sx1 [v̈ ] = sb[v̈ ] by Steps 1.2 and 1.3 and by fault diversity.
1.5. For all v ∈ Vi , sb[v ] = sb[v̈ ] by Step 1.4.
1.6. For all v ∈ Vs, sx1 [O̊(v )] = FALSE, sx1 [N̊(v )] = FALSE, and sx1 [U(v )] = FALSE since Prepare2
executes correctly by Step 1.1.
2. CASE: A fault occurs in Prepare1
2.1. Filter, Prepare2, and Validate execute correctly since faults are block-confined by
Assumption 4.5.
2.2. No variable in Vi is modified by an instruction between sb and se by Assumption 4.3.
2.3. For all v ∈ Vi , sx1 [v ] = sx1 [v̈ ] and sx1 [v ] = sx1 [v̄ ], otherwise sx1 does not exist since
Filter terminates the traversal at Line 2 or π crashes in Validate (Line 69) by Steps 2.1
and 2.2.
2.4. For all v ∈ Vi , sx1 [v ] = sb[v ] and sx1 [v̈ ] = sb[v̈ ] by Steps 2.2 and 2.3 and by fault diversity.
2.5. For all v ∈ Vi , sb[v ] = sb[v̈ ] by Step 2.4.
2.6. For all v ∈ Vs, sx1 [O̊(v )] = FALSE, sx1 [N̊(v )] = FALSE, and sx1 [U(v )] = FALSE since Prepare2
executes correctly by Step 2.1.
3. CASE: A fault occurs in Prepare2
3.1. Filter and Prepare1 execute correctly since faults are block-confined by Assumption 4.5.
3.2. No variable in Vi is modified by an instruction between sb and sp by Assumption 4.3.
3.3. For all v ∈ Vi , sp[v ] = sp[v̈ ] and sp[v ] = sp[v̄ ], otherwise sp does not exist since Filter
terminates the traversal at Line 2 by Steps 3.1 and 3.2.
3.4. For all v ∈ Vi , sp[v ] = sb[v ] and sp[v̈ ] = sb[v̈ ] by Steps 3.2 and 3.3 and by fault diversity.
3.5. For all v ∈ Vi , sb[v ] = sb[v̈ ] by Step 3.4.
3.6. For all v ∈ Vs, sp[O̊(v )] = FALSE, sp[N̊(v )] = FALSE, and sp[U(v )] = FALSE by correct
execution of Prepare1.
3.7. For all v ∈ Vi , if sx1 [v ] ̸= sp[v ], then v is invalid at sx1 by fault diversity.
3.8. For all v ∈ Vs, if sx1 [O̊(v )] ̸= FALSE, then O̊(v ) is invalid at sx1 by fault diversity.
3.9. For all v ∈ Vs, if sx1 [N̊(v )] ̸= FALSE, then N̊(v ) is invalid at sx1 by fault diversity.
3.10. For all v ∈ Vs, if sx1 [U(v )] ̸= FALSE, then U(v ) is invalid at sx1 by fault diversity. □
We now show that if Exec1, Reset and Exec2 correctly execute on correct variables, then
the result of the traversal is correct (or invalid).
Lemma 4.2 Assume the traversal invariant holds at sb, no fault occurs in Exec1, Reset and
Exec2, and π does not crash. For every variable v ∈ Vs, if se[v ] ≠ re[v ], then v is invalid at se.
Proof:
1. Filter , Prepare1 and Prepare2 do not write into any v ∈ Vs by definition (Algorithm 4.2).
2. For all v ∈ Vs, either sx1 [v ] = sb[v ] by Step 1, or v is invalid at sx1 by fault diversity if a fault
modified v .
3. For all v ∈ Vs read in Exec1 for the first time at some state s, v is valid at s since π does
not crash in the check of Line +1, Rule 4.2.
Figure 4.4: Fault assumption for Lemmas 4.3 and 4.4. (Diagram: the traversal blocks Filter, Prepare1, Prepare2, Exec1, Reset, Exec2 and Validate, delimited by the states sb, sg, sp, sx1, sr, sx2, sc and se, with the blocks that might fail marked.)
4. For all v ∈ Vs read in Exec1 for the first time at some state s, s[v ] = sx1 [v ] = sb[v ] = rb[v ]
by Steps 1, 2 and 3 and traversal invariant.
5. For all v ∈ Vs, v ∉ O, v ∉ N and v ∉ U at state sx1 by Lemma 4.1, and O̊(v ), N̊(v ) and U(v )
are valid at sx1 since no fault occurs after Prepare2 and since π does not crash at the checks
of Lines 46 and 59.
6. Exec1, Reset, and Exec2 execute correctly (by assumption) on correct variables by Steps 3,
4 and 5 and traversal invariant.
7. For every variable v ∈ Vs, if se[v ] ̸= re[v ], then v is invalid at se by Step 6, fault diversity
and traversal invariant. □
FAULTS IN THE Exec1 BLOCK
Following the sequence of blocks, we now assume a fault occurs in Exec1 (see Figure 4.4).
We first show that if some variable v is determined by an assignment in block Exec1, then
all instructions in Rule 4.3 are executed. In other words, if v is modified by an assignment in
Exec1, O(v ) contains v ’s old value at state sr . Moreover, v is valid at the initial state sb. This
ensures that Reset can correctly roll back state updates performed in Exec1.
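The mechanism this lemma reasons about can be sketched as an undo log. The code below is a simplified model and not Rule 4.3 or the Reset block: the function names are hypothetical, validity checks are reduced to a single assertion, and plain dictionaries stand in for O and N.

```python
def exec1_write(state, O, v, new_value):
    # Record v's old value in O once, before the first modification.
    if v not in O:
        O[v] = state[v]
    state[v] = new_value
    # Mirrors the spirit of the post-write check: v must now be in O.
    assert v in O, "crash: missing undo entry"


def reset(state, O, N):
    for v, old in O.items():
        N[v] = state[v]   # keep Exec1's result for later comparison
        state[v] = old    # roll v back to its value before Exec1
```

Writing x twice in Exec1 leaves a single entry O(x) holding the value x had at sb, which is exactly what Reset needs in order to restore the initial state before Exec2 runs.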
Lemma 4.3 Assume no fault occurs in the Init blocks or in the Reset block, and π does not crash.
For any v ∈ Vs modified by an assignment in block Exec1, it holds that sr [O(v )] = sb[v ],
sr [O̊(v )] = TRUE, O(v ) and O̊(v ) are valid at sr , and v is valid at sb.
Proof:
1. For all v ∈ Vs, v ∉ O, v ∉ N, and v ∉ U at state sx1 by assumption.
2. Let some v ∈ Vs be determined by assignment for the first time in Exec1 at state s1
immediately after Line +9, Rule 4.3.
3. s1 ≺ sr by Assumption 4.5.
4. O(v ) and O̊(v ) are valid at sr , otherwise π would crash in Reset, Line 46.
5. s1[O̊(v )] = TRUE, i.e., v ∈O at s1.
5.1. If no fault occurs before s1, the value of v is added to O (Line +6, Rule 4.3) before the
value v is modified.
5.2. If no fault occurs after s1, the absence of an entry for v would result in a crash right
after s1 due to the check v ∈O (Line +11, Rule 4.3); a contradiction.
6. There is a state s2 ≺ s1 immediately after Line +6, Rule 4.3, when O̊(v ) is assigned for the
first time in Exec1.
6.1. Assume by contradiction that s1 ≺ s2.
6.2. v ∉ O at state s1 since Init blocks execute correctly by assumption.
6.3. A fault must have skipped Lines +3 to +6 before Line +9 is executed, otherwise
s2 ≺ s1.
6.4. π crashes at Line +11 by Step 6.2 and since no fault occurs after s1 by Step 6.3; a
contradiction.
7. s2[O(v )] = sb[v ] and v is valid at sb if no fault occurs before s2.
7.1. s2[O(v )] = sb[v ] since no faults occur before s2.
7.2. v is not assigned before s2 by definition of s1.
7.3. π does not crash in the check of Line +1, Rule 4.3, by assumption.
7.4. v is valid at sb by Steps 7.2 and 7.3.
8. s2[O(v )] = sb[v ] and v is valid at sb if no fault occurs after s2.
8.1. s2[O(v )] = s2[v ] and v is valid at s2, otherwise π crashes after s2 at Line +8, Rule 4.3.
8.2. v is not modified by a fault before s2 since v is valid at s2 by Step 8.1.
8.3. v is not modified by an assignment before s2 by definition of s1.
8.4. s2[v ] = sb[v ] by Steps 8.2 and 8.3.
8.5. s2[O(v )] = sb[v ] and v is valid at sb by Steps 8.1 and 8.4.
9. sr [O(v )] = s2[O(v )], sr [O̊(v )] = TRUE, O(v ) is valid at sr and O̊(v ) is valid at sr .
9.1. No instruction writes FALSE into O̊(v ) after s2 by definition.
9.2. sr [O̊(v )] = TRUE and O̊(v ) is valid at sr by Steps 6, 9.1, 4 and 5.
9.3. O(v ) is not modified by assignment after s1 if O̊(v ) is not modified by a fault after s1.
9.3.1. Assume there is a state s3 at Line +3 of Rule 4.3 and s3 ≻ s1.
9.3.2. s3[O̊(v )] = TRUE since O̊(v ) is not modified by a fault by assumption, since s2 ≺
s1 ≺ s3 and by Steps 9.1 and 5.
9.3.3. π crashes at Line +5 by Step 9.3.2; a contradiction.
9.4. O(v ) is not modified by assignment after s1 if O̊(v ) is modified by a fault after s1.
9.4.1. Assume there is a state s3 at Line +3 of Rule 4.3 and s3 ≻ s1.
9.4.2. O̊(v ) is invalid by fault diversity and since s2 ≺ s1 ≺ s3.
9.4.3. π crashes at Line +5 by Step 9.4.2; a contradiction.
9.5. sr [O(v )] = s2[O(v )] by Steps 9.3 and 9.4, and sr [O̊(v )] = TRUE by Step 9.2.
10. sr [O(v )] = sb[v ], sr [O̊(v )] = TRUE and v is valid at sb by Steps 4, 7, 8 and 9. □
Using Lemma 4.3, we show next that for all v ∈ Vs, v either has the expected result at se,
or v is invalid.
Lemma 4.4 Assume the traversal invariant holds at sb, a fault occurs in Exec1, and π does
not crash. For all v ∈ Vs, if se[v ] ≠ re[v ], then v is invalid at se.
Proof:
1. If v is determined by a fault before se, we are done by fault diversity.
2. Init, Reset, Exec2, and Validate execute correctly by Assumption 4.5.
3. For all v ∈ Vs, sx2 [v ] = rb[v ] or v is invalid at sx2 .
3.1. For all v ∈ Vs assigned in Exec1, v is valid at sb by Lemma 4.3.
3.2. For all v ∈ Vs assigned in Exec1, sr [O(v )] = sb[v ] by Lemma 4.3.
3.3. For all v ∈ Vs assigned in Exec1, sb[v ] = rb[v ] by Step 3.1 and by traversal invariant.
3.4. For all v ∈ Vs assigned in Exec1, sx2 [v ] = sr [O(v )] = rb[v ] by Steps 3.2, 3.3 and 2.
3.5. For all v ∈ Vs, sx2 [v ] = rb[v ] or v is invalid at sx2 by Step 3.4, traversal invariant, fault
diversity, and since no other block writes to v ∈ Vs before Reset except Exec1 by
definition.
4. For all v ∈ Vs read in Exec2, v is valid at sx2 , otherwise π would crash at Line +1, Rule 4.2,
by Step 2.
5. For all v ∈ Vs modified in Exec2, sc[v ] = re[v ], since Exec2 correctly executes on correct
inputs by Steps 2, 3, and 4.
6. For all v ∈ Vs modified in Exec2, se[v ] = sc[v ] = re[v ] by Steps 2 and 5, and since no other block
writes to v ∈ Vs after Exec2 by definition.
7. For all v ∈ Vs, se[v ] = re[v ] or v is invalid at se by Step 5, traversal invariant, fault diversity, and
since no other block writes to v ∈ Vs after Exec2 by definition. □
Figure 4.5: Fault assumption for Lemmas 4.5, 4.6 and 4.7. (Diagram: the traversal blocks Filter, Prepare1, Prepare2, Exec1, Reset, Exec2 and Validate, delimited by the states sb, sg, sp, sx1, sr, sx2, sc and se, with the blocks that might fail marked.)
FAULTS IN THE Reset BLOCK
Our next three lemmas assume that a fault occurs in the Reset block (see Figure 4.5). We
start by showing that the state at the beginning of Reset is the expected final state, and any
variable modified in Exec1 has its old value correctly stored in O.
Lemma 4.5 Assume the traversal invariant holds at sb, no fault occurs in the Init blocks or in
Exec1, and π does not crash. For all v ∈ Vs, it holds that sr [v ] = re[v ] or v is invalid at sr .
Proof:
1. For all v ∈ Vs read for the first time at some state s during the execution of Exec1, it holds
that s[v ] = rb[v ].
1.1. v is valid since π does not crash by assumption, every read is preceded by a validity
check by Rule 4.2, and no fault occurs between sb and sr .
1.2. s[v ] = rb[v ] by Step 1.1, traversal invariant and Lemma 4.1.
2. For all v ∈ Vs read or modified in Exec1, sr [v ] = re[v ] since no fault occurs between sb and
sr by assumption and π reads correct inputs by Step 1.
3. For all v ∈ Vs, sr [v ] = re[v ] or v is invalid at sr by Step 2 and traversal invariant. □
We now show that if some v ∈ N at the beginning of Exec2, then N(v ) holds the reference
final value of v .
Lemma 4.6 Assume the traversal invariant holds at sb, a fault occurs in Reset, and π does
not crash. For all v ∈ Vs, if v ∈ N at sx2 , then sx2 [N(v )] = re[v ] and N(v ) and N̊(v ) are valid at
sx2 .
Proof:
1. Init, Exec1, Exec2, and Validate execute correctly by assumption.
2. N(v ) and N̊(v ) are valid at sx2 , otherwise π crashes in Validate.
3. There is a state s2 at Line 37 when N̊(v ) is assigned TRUE for the first time in Reset.
3.1. N̊(v ) is set to FALSE in Init blocks, which execute correctly by assumption.
3.2. sx2 [N̊(v )] = TRUE, i.e., v ∈ N at sx2 , by assumption.
3.3. Only Reset block writes TRUE to N̊(v ) by definition.
4. s2[N(v )] = sr [v ] and v is valid at sr if no fault occurs before s2.
4.1. s2[N(v )] = sr [v ] since no faults occur before s2.
4.2. v is not modified by assignment after sr and before s2 by definition of s2.
4.3. π does not crash in the check of Line 35 by assumption.
4.4. v is valid at sr by Steps 4.2 and 4.3.
5. s2[N(v )] = sr [v ] and v is valid at sr if no fault occurs after s2.
5.1. s2[N(v )] = s2[v ] and v is valid at s2, otherwise π crashes at Line 41.
5.2. v is not determined by a fault before s2 since v is valid at s2 by Step 5.1.
5.3. v is not modified by an assignment after sr and before s2.
5.3.1. Assume there is a state s1 such that sr ≺ s1 ≺ s2 and s1 is the state after Line 43.
5.3.2. If no fault occurs before s1, then s1[N̊(v )] = TRUE by correct execution up to s1 and
Step 1.
5.3.3. If no fault occurs before s1, there is a state s3 ≺ s1 at Line 40 by Step 5.3.2; a
contradiction with definition of s2 (first time v is assigned in Reset).
5.3.4. If no fault occurs after s1, π crashes in the check of Line 44.
5.4. s2[v ] = sr [v ] by Steps 5.2 and 5.3.
5.5. s2[N(v )] = sr [v ] and v is valid at sr by Steps 5.1 and 5.4.
6. sx2 [N(v )] = s2[N(v )] and sx2 [N̊(v )] = TRUE.
6.1. No instruction writes FALSE into N̊(v ) after s2 by definition (Algorithm 4.1) and block
assumption.
6.2. sx2 [N̊(v )] = TRUE by Steps 6.1, 1 and 2 and definition of s2.
6.3. N(v ) is not modified by assignment after s2 if N̊(v ) is not modified by a fault after s2.
6.3.1. Assume there is a state s3 at Line 37 and s3 ≻ s2.
6.3.2. s3[N̊(v )] = TRUE since N̊(v ) is not modified by a fault by assumption, and since
s2 ≺ s3 and by Steps 6.1 and 2.
6.3.3. π crashes at Line 38 by Step 6.3.2; a contradiction.
6.4. N(v ) is not modified by assignment after s2 if N̊(v ) is modified by a fault after s2.
6.4.1. Assume there is a state s3 at Line 37 and s3 ≻ s2.
6.4.2. N̊(v ) is invalid at s3 by fault diversity and since s2 ≺ s3.
6.4.3. π crashes at Line 38 by Step 6.4.2; a contradiction.
6.5. sx2 [N(v )] = s2[N(v )] by Steps 6.3 and 6.4, and sx2 [N̊(v )] = TRUE by Step 6.2.
7. sx2 [N(v )] = sr [v ], sx2 [N̊(v )] = TRUE and v is valid at sr by Steps 4, 5 and 6.
8. sx2 [N(v )] = re[v ] by Lemma 4.5. □
Finally, we show that if a fault occurs in Reset, the state at the end of the traversal is the
expected state or invalid. Two cases can happen. First, if a variable v is modified in Exec2,
then it must contain the expected final value, otherwise π crashes when comparing U with N
(Line 59) or the current value of v with N(v ) (Line 62). Second, if a variable v is not modified
in Exec2, then it contains either the initial value if it was not modified in Exec1, or it contains
the expected final value if it was modified in Exec1. A third case, where v is only modified in
Reset, is not possible since π would crash when comparing U with N (Line 59).
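The comparisons mentioned in this paragraph can be sketched as follows. This is a reduced model of the kind of check Validate performs, not the block itself: the names Crash and validate are hypothetical, N is a dictionary holding the values recorded by Reset, U is the set of variables written in Exec2, and validity checks on the bookkeeping structures are left out.

```python
class Crash(Exception):
    """Models π aborting instead of sending a corrupt message."""


def validate(state, N, U):
    # U: variables written in Exec2; N: values recorded by Reset.
    if set(U) != set(N):
        raise Crash("U and N disagree on which variables were modified")
    for v, expected in N.items():
        if state[v] != expected:
            raise Crash(f"value of {v} differs between the two executions")
```

In this model, a variable that appears in only one of U and N, or whose value after Exec2 differs from the value recorded in N, leads to a crash rather than to a silently wrong state.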
Lemma 4.7 Assume the traversal invariant holds at sb, a fault occurs in Reset, and π does
not crash. For all v ∈ Vs, if se[v ] ≠ re[v ], then v is invalid at se.
Proof:
1. Init, Exec1, Exec2 and Validate execute correctly by assumption.
2. For all v ∈ Vs determined by a fault at state se, we are done by fault diversity.
3. For all v ∈ Vs not modified by assignment in Exec1, Reset, or Exec2, we are done by
traversal invariant.
4. It suffices to show that for all v ∈ Vs determined by an assignment in Exec1, Reset, or
Exec2, if sc[v ] ̸= re[v ], then v is invalid at sc.
4.1. No v ∈ Vs is assigned in Validate by definition.
4.2. For all v ∈ Vs determined by assignment in Exec1, Reset, or Exec2, se[v ] = sc[v ] or v
is invalid at se by Step 4.1 and since no fault occurs in Validate.
5. Let v ∈ Vs be determined by assignment in Exec1, Reset, or Exec2. We show that if
se[v ] ̸= re[v ], then v is invalid at se.
6. CASE: N̊(v ) is invalid at sx2 .
6.1. π crashes in the Validate block, Line 59.
7. CASE: sc[N̊(v )] = TRUE and N̊(v ) is valid at sc.
7.1. v ∈ U at sc, otherwise π crashes in check of Line 64.
7.2. sc[v ] = sx2 [N(v )], otherwise π crashes in check of Line 62.
Figure 4.6: Fault assumption for Lemmas 4.8 and 4.9. (Diagram: the traversal blocks Filter, Prepare1, Prepare2, Exec1, Reset, Exec2 and Validate, delimited by the states sb, sg, sp, sx1, sr, sx2, sc and se, with the blocks that might fail marked.)
7.3. sc[v ] = sx2 [N(v )] = sr [v ] = re[v ] by Lemma 4.6 and Step 1.
7.4. v is valid at sc, otherwise π crashes in check of Line 62.
8. CASE: sc[N̊(v )] = FALSE and N̊(v ) is valid at sc.
8.1. v ∉ U at sc, otherwise π crashes in check of Line 67.
8.2. v is not assigned in Exec2 by Steps 1 and 8.1.
8.3. v is not assigned in Reset, otherwise v ∈ N at sc by Lemma 4.6.
8.4. v is determined by assignment in Exec1 by Steps 5, 8.2 and 8.3.
8.5. sr [v ] = re[v ] and v is valid at sr by Step 8.4 and Lemma 4.5.
8.6. sc[v ] = sr [v ] and v is valid at sc by Steps 8.2 and 8.3.
8.7. sc[v ] = re[v ] and v is valid at sc by Steps 8.5 and 8.6.
9. For all v ∈ Vs, if se[v ] ̸= re[v ], then v is invalid at se by Steps 2, 3, 4, 6, 7 and 8. □
FAULTS IN THE Exec2 AND Validate BLOCKS
We now assume a fault occurs either in the Exec2 or in the Validate block (see Figure 4.6).
Since most of the blocks execute correctly, proving the correctness of Exec2, as well as of
Validate below, is much simpler than the previous blocks.
Lemma 4.8 Assume the traversal invariant holds at sb, a fault occurs in Exec2, and π does
not crash. For all v ∈ Vs, if se[v ] ≠ re[v ], then v is invalid at se.
Proof:
1. For all v ∈ Vs, if v is determined by a fault, we are done by fault diversity.
2. Init, Exec1, Reset, and Validate execute correctly by assumption.
3. For all v ∈ Vs, sx2 [v ] = rb[v ] or v is invalid at sx2 by Step 2.
4. Let v ∈ Vs be such that se[v ] ̸= re[v ] and v is valid at se. We show that if se[v ] ̸= re[v ], then
v is invalid at se.
5. CASE: v was assigned in Exec1.
5.1. v ∈ N at sc by Step 2.
5.2. N(v ) and N̊(v ) are valid at sc, otherwise π crashes in the check of Line 59.
5.3. v ∈ U at sc, otherwise π crashes in the check of Line 64.
5.4. sc[v ] = sc[N(v )], otherwise π crashes in the check of Line 62.
5.5. sc[N(v )] = re[v ] by Steps 2 and 5.2.
5.6. se[v ] = sc[v ] = re[v ] by Steps 5.4, 5.5 and 2; a contradiction with Step 4.
6. CASE: v was not assigned in Exec1 and not assigned in Exec2.
6.1. v ∉ O at sr by Step 2.
6.2. v is not assigned in Reset by Step 6.1.
6.3. se[v ] = re[v ] if v is not determined by a fault since rb[v ] = re[v ].
6.4. se[v ] = re[v ] or v is invalid at se by Steps 6.3 and 1; a contradiction with Step 4.
7. CASE: v was not assigned in Exec1, but was assigned in Exec2.
7.1. v ∉ O at sr by Step 2.
7.2. v is not assigned in Reset by Step 7.1.
7.3. sr [v ] = rb[v ] or invalid by Step 7.2 and traversal invariant.
7.4. Let s be the state immediately after v is assigned in Exec2 (Line +1, Rule 4.4).
7.5. If no fault occurs before s, then v ∈ U at sc by correct execution before s and U(v ) is
valid at sc, otherwise π crashes in the check of Line 59.
7.6. If no fault occurs after s, then v ∈ U at sc and U(v ) is valid at sc, otherwise π crashes
in the check of Line +2, Rule 4.4.
7.7. π crashes at Line 67; a contradiction.
8. For all v ∈ Vs, if se[v ] ̸= re[v ], then v is invalid at se by Steps 1, 5, 6 and 7. □
Finally, we assume that a fault occurs in the Validate block and show that the traversal
invariant holds after the traversal has finished.
Lemma 4.9 Assume the traversal invariant holds at sb, a fault occurs in Validate, and π does
not crash. For all v ∈ Vs, if se[v ] ≠ re[v ], then v is invalid at se.
Proof:
1. For all v ∈ Vs, if v is determined by a fault, we are done by fault diversity.
2. Validate does not modify any v ∈ Vs by definition.
3. Init, Exec1, Reset, and Exec2 execute correctly by assumption.
4. For all v ∈ Vs, if se[v ] ̸= re[v ], then v is invalid at se by Steps 1, 2 and 3 and by traversal
invariant. □
OUTPUT MESSAGE’S VALIDITY
We finally show that, if faults are block-confined and some variable v ∈ Vo is corrupt at se,
then v is invalid with respect to v̈ at se.
Lemma 4.10 Assume the traversal invariant holds at sb and π does not crash. If there exists
a variable v ∈ Vo such that se[v ] ≠ re[v ], then se[v ] ≠ se[v̈ ].
Proof:
1. If v is determined by a fault, then se[v ] ̸= se[v̈ ] by fault diversity.
2. v is determined by an assignment by Step 1.
3. CASE: Fault occurs in Validate
3.1. Init, Exec1, Reset, and Exec2 blocks execute correctly by fault-frequency assumption.
3.2. Validate block does not assign any variable v ∈ Vo by definition.
3.3. For all variables v ∈ Vo, sc[v ] = re[v ] by Steps 3.1 and 3.2.
3.4. se[v ] = sc[v ] = re[v ] by Steps 3.3 and 2.
4. CASE: Fault occurs before Validate
4.1. Validate block executes correctly by fault-frequency assumption.
4.2. Validate block does not assign any variable v ∈ Vo by definition.
4.3. For all v ∈ Vo, v is valid at sc, otherwise π crashes in the check of Line 71 by Step 4.1.
4.4. For all v ∈ Vo, se[v ] = sc[v ] by Step 4.2.
4.5. For all v ∈ Vo, se[v ] = re[v ].
4.5.1. Assume there is a v ∈ Vo such that se[v ] ̸= re[v ].
4.5.2. v is invalid at se by Step 4.5.1 and Lemmas 4.2, 4.4, 4.7 and 4.8.
4.5.3. A contradiction of Steps 4.5.2, 4.3 and 4.4.
5. For all v ∈ Vo, if se[v ] ̸= re[v ], then se[v ] ̸= se[v̈ ] by Steps 1, 3 and 4. □
Lemma 4.11 If the traversal invariant holds at state sb and π does not crash, then the traversal
invariant holds at state se.
Proof: By Lemmas 4.2, 4.4, 4.7, 4.8, 4.9 and 4.10. □
4.2.4 CORRECTNESS WITH GATES
We now show that the complete SEI-hardening guarantees error isolation. In contrast to the
previous section, here we do not assume block-confined faults. In particular, in this section,
faults either jump from within a block into another block or leave the traversal altogether by
jumping out of the traversal.
Definition 4.20 (A fault jumps to another block) A fault jumps to another block if and only
if the fault occurs at some state sf after execution of some instruction of a block B1 and it
corrupts the program counter pc to the value of some instruction in another block B2.
Definition 4.21 (A fault jumps out of the traversal) A fault jumps out of the traversal if and
only if the fault corrupts the program counter pc to the value of some instruction at or imme-
diately after Line 84.
Remember that Line 84 represents the exit point of the program, i.e., messages may be
externalized after this line.
The road map of our proof consists of the following steps:
1. We start by showing that if a fault occurs in a traversal A and the fault does not jump out
of the traversal, then the fault is block-confined; hence, the results from block-confined
faults apply.
2. Next, we show that even if a fault jumps out of the traversal A, local error exposure is
preserved (although the traversal invariant might be violated).
3. Finally, we show that if a fault jumps out of traversal A, then any subsequent traversal
B that modifies the state and does not crash π leads to a state in which the traversal
invariant holds again.
STEP 1: COMPLETE TRAVERSALS
Let Flags = {cfg, cfp, cf1, cfr , cf2, cfc} be the set of control-flow flags in SEI-hardening. To
simplify the presentation, we incorporate the instructions of the gates as part of the blocks of
Algorithm 4.1 such that:
• All instructions up to and including Line 8 belong to block Filter ;
• Prepare1 starts at the state sg immediately after the execution of Line 8;
• Prepare2 starts at the state sp immediately after the execution of Line +2 of Gate(cfp, cfg);
• Exec1 starts at the state sx1 immediately after the execution of Line +2 of Gate(cf1, cfp);
• Reset starts at the state sr immediately after the execution of Line +2 of Gate(cfr , cf1);
• Exec2 starts at the state sx2 immediately after the execution of Line +2 of Gate(cf2, cfr );
• Validate starts at the state sc immediately after the execution of Line +2 of Gate(cfc, cf2).
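The way the proofs below refer to a gate (a check at Line +0, an assignment completed at Line +2, and a check at Line +3) suggests the following reading, sketched here under that assumption. The gate algorithm itself is not reproduced in this section, so the function gate and the exception Crash are hypothetical names, and validity checks on the flags are omitted.

```python
class Crash(Exception):
    """Models π aborting when the control flow is inconsistent."""


def gate(flags, cur, prev):
    if flags[cur]:           # Line +0: cur must not have been passed yet
        raise Crash(f"{cur} already set")
    flags[cur] = True        # Line +2: mark this gate as passed
    if not flags[prev]:      # Line +3: the preceding gate must be passed
        raise Crash(f"{prev} not set")
```

Under this reading, jumping backwards re-enters a gate whose flag is already TRUE, and jumping forwards reaches a gate whose predecessor flag is still FALSE; both lead to a crash.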
When the system starts, the first state of the first traversal is correct by Assumption 4.4.
Therefore, we know that in the first traversal sb[cfg] = FALSE and for every other flag cf ∈ Flags,
sb[cf ] = TRUE (Definition 4.17). We define the following predicate to express this condition at
some state s.
    Complete ≜ ∧ s[cfg] = FALSE ∨ ¬Valid(cfg)
               ∧ ∀ cf ∈ Flags \ {cfg} : s[cf] = TRUE ∨ ¬Valid(cf)
Following directly from Algorithms 4.1 and 4.8, if Complete holds at sb, then Complete
holds at se as long as no fault occurs. As we show next, if Complete also holds at se when
a fault occurs, then the fault cannot jump out of the traversal. Moreover, if a fault does not
jump out of the traversal, then it can be equated to a block-confined fault.
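The Complete predicate can be transcribed almost literally. In the sketch below, the representation of validity is an assumption made only for illustration: each flag is stored together with a redundant copy under the hypothetical key cf + "_copy", and a flag counts as valid when both agree.

```python
FLAGS = ("cfg", "cfp", "cf1", "cfr", "cf2", "cfc")


def valid(state, cf):
    # Assumption: a flag is valid when it equals its redundant copy.
    return state[cf] == state[cf + "_copy"]


def complete(state):
    first = state["cfg"] is False or not valid(state, "cfg")
    rest = all(state[cf] is True or not valid(state, cf)
               for cf in FLAGS if cf != "cfg")
    return first and rest
```

Note that the predicate tolerates an invalid flag: a flag that holds the wrong value satisfies Complete as long as the corruption is detectable.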
Lemma 4.12 Assume Complete holds at sb, π does not crash, and a fault occurs before the
first assignment of Line 5 and the fault jumps to another block. For all v ∈ Vg, se[v ] = sb[v ] or
v is invalid at se.
Proof:
1. Let sf be the state immediately after the fault.
2. For all cf ∈ Flags\{cfg}, sf [cf ] = TRUE or cf is invalid since Complete holds at sb and by fault
diversity.
3. Only one fault occurs by fault-frequency assumption.
4. No instruction sets any cf ∈ Flags\{cfg} to FALSE by Step 3 and since the fault jumps to
another block by assumption and valid at sa by assumption.
5. The fault jumps to some instruction after Line +0, otherwise π crashes by Steps 2 and 4.
6. No variable is modified by an instruction between sb and se by Step 5.
7. For all v ∈ Vg, se[v ] = sb[v ] or v is invalid at se. □
By Lemma 4.12, we can disregard faults occurring before the first assignment of Line 5
since the state stutters.
We now show that if the traversal invariant and, consequently, the Complete predicate, hold
at sb, and a fault does not jump out of the traversal, then the traversal invariant holds at se.
Lemma 4.13 Assume Complete holds at sb and π does not crash. If a fault occurs after the
first assignment of Line 5 and the fault does not jump out of the traversal, then the first state
of each block exists (Table 4.2) and they appear in the traversal in the order of Algorithm 4.1.
Proof:
1. Let sa be the state immediately after the first assignment of Line 5.
2. sa[cfc] = FALSE and cfc is valid at sa by assumption.
3. The fault occurs after or at sa by assumption.
4. Only one fault occurs by fault-frequency assumption.
5. State sc exists and sc ≺ se.
5.1. Line 82 is executed at state s since the fault does not jump out.
5.2. cfc is assigned to TRUE in Gate(cfc, cf2), otherwise π crashes after s by Steps 2, 3 and 5.1.
5.3. Let sc be the state immediately after Line +2 in Gate(cfc, cf2).
5.4. sc ≺ se since se is the last state of the traversal by definition.
6. State sx2 exists and sx2 ≺ sc.
6.1. If no fault occurs before sc, then cf2 is assigned to TRUE in Gate(cf2, cfr ) at some state
s ≺ sc.
6.2. If no fault occurs after sc, then sc[cf2] = TRUE and valid at sc, otherwise π crashes in
the check after sc (Line +3 in Gate(cfc, cf2)).
6.3. If no fault occurs after sc, cf2 is assigned to TRUE in Gate(cf2, cfr ) at some state s ≺ sc
by Step 6.2.
6.4. There exists a state sx2 = s such that sx2 ≺ sc by Steps 6.1 and 6.3.
7. State sr exists and sr ≺ sx2 .
7.1. If no fault occurs before sx2 , then cfr is assigned to TRUE in Gate(cfr , cf1), at some
state s ≺ sx2 .
7.2. If no fault occurs after sx2 , then sx2 [cfr ] = TRUE and valid at sx2 , otherwise π crashes
in the check after sx2 (Line +3 in Gate(cf2, cfr )).
7.3. If no fault occurs after sx2 , cfr is assigned to TRUE in Gate(cfr , cf1) at some state s ≺ sx2
by Step 7.2.
7.4. There exists a state sr = s such that sr ≺ sx2 by Steps 7.1 and 7.3.
8. State sx1 exists and sx1 ≺ sr .
8.1. If no fault occurs before sr , then cf1 is assigned to TRUE in Gate(cf1, cfp) at some state
s ≺ sr .
8.2. If no fault occurs after sr , then sr [cf1] = TRUE and valid at sr , otherwise π crashes in
the check after sr (Line +3 in Gate(cfr , cf1)).
8.3. If no fault occurs after sr , cf1 is assigned to TRUE in Gate(cf1, cfp) at some state s ≺ sr
by Step 8.2.
8.4. There exists a state sx1 = s such that sx1 ≺ sr by Steps 8.1 and 8.3.
9. State sp exists and sp ≺ sx1 .
9.1. If no fault occurs before sx1 , then cfp is assigned to TRUE in Gate(cfp, cfg), at some state
s ≺ sx1 .
9.2. If no fault occurs after sx1 , then sx1 [cfp] = TRUE and valid at sx1 , otherwise π crashes
in the check after sx1 (Line +3 in Gate(cf1, cfp)).
9.3. If no fault occurs after sx1 , cfp is assigned to TRUE in Gate(cfp, cfg) at some state s ≺ sx1
by Step 9.2.
9.4. There exists a state sp = s such that sp ≺ sx1 by Steps 9.1 and 9.3.
10. State sg exists and sg ≺ sp.
10.1. If no fault occurs before sp, then cfg is assigned to TRUE in FirstGate, at some state
s ≺ sp.
10.2. If no fault occurs after sp, then sp[cfg] = TRUE and valid at sp, otherwise π crashes in
the check after sp (Line +3 in Gate(cfp, cfg)).
10.3. If no fault occurs after sp, cfg is assigned to TRUE in FirstGate at some state s ≺ sp
by Step 10.2.
10.4. There exists a state sg = s such that sg ≺ sp by Steps 10.1 and 10.3.
11. State sb exists and sb ≺ sg by definition of sb (first state of a traversal).
12. sb, sg, sp, sx1 , sr , sx2 , sc, and se exist and sb ≺ sg ≺ sp ≺ sx1 ≺ sr ≺ sx2 ≺ sc ≺ se. □
Lemma 4.14 Assume Complete holds at sb, π does not crash, and a fault occurs after the
first assignment of Line 5 and the fault does not jump out of the traversal. Given the first
state s of any block of Algorithm 4.1, no instruction of a preceding block is executed after s.
Proof:
1. The first state of every block exists by Lemma 4.13.
2. No instruction preceding Gate(cfc, cf2) is executed after sc.
2.1. Assume some instruction preceding Gate(cfc, cf2) is executed at s after sc.
2.2. sc[cfc] = TRUE and cfc is valid at sc by Step 1.
2.3. π crashes at Line +0 of Gate(cfc, cf2) after s either by Step 2.2 or by fault diversity.
3. No instruction preceding Gate(cf2, cfr ) is executed after sx2 .
3.1. Assume some instruction preceding Gate(cf2, cfr ) is executed at s after sx2 .
3.2. sx2 [cf2] = TRUE and cf2 is valid at sx2 by Step 1.
3.3. π crashes at Line +0 of Gate(cf2, cfr ) after s either by Step 3.2 or by fault diversity.
4. No instruction preceding Gate(cfr , cf1) is executed after sr .
4.1. Assume some instruction preceding Gate(cfr , cf1) is executed at s after sr .
4.2. sr [cfr ] = TRUE and cfr is valid at sr by Step 1.
4.3. π crashes at Line +0 of Gate(cfr , cf1) after s either by Step 4.2 or by fault diversity.
5. No instruction preceding Gate(cf1, cfp) is executed after sx1 .
5.1. Assume some instruction preceding Gate(cf1, cfp) is executed at s after sx1 .
5.2. sx1 [cf1] = TRUE and cf1 is valid at sx1 by Step 1.
5.3. π crashes at Line +0 of Gate(cf1, cfp) after s either by Step 5.2 or by fault diversity.
6. No instruction preceding Gate(cfp, cfg) is executed after sp.
6.1. Assume some instruction preceding Gate(cfp, cfg) is executed at s after sp.
6.2. sp[cfp] = TRUE and cfp is valid at sp by Step 1.
6.3. π crashes at Line +0 of Gate(cfp, cfg) after s either by Step 6.2 or by fault diversity.
7. No instruction preceding FirstGate is executed after sg.
7.1. Assume some instruction preceding Line 8 (FirstGate) is executed at s after sg.
7.2. sg[cfg] = TRUE and cfg is valid at sg by Step 1.
7.3. π crashes at Line 6 after s either by Step 7.2 or by fault diversity.
8. No instruction of a preceding block is executed after the first instruction of the following
block by Steps 2, 3, 4, 5, 6 and 7. □
Lemma 4.14 does not prevent blocks from being executed twice, as long as the fault does not
cross the border of some gate. For example, Exec1 could be executed again after Gate(cf1, cfp)
and before Gate(cfr , cf1). In fact, if a fault does not jump out of a traversal, then the fault is
block-confined (as in Assumption 4.5). Therefore, the results of Section 4.2.3 hold here as
well.
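The gate argument can be made concrete with a small Python sketch. It assumes, as the proof steps above suggest, that Gate(cf_new, cf_prev) aborts at Line +0 if cf_new is already TRUE, sets cf_new at Line +2, and aborts at Line +3 unless cf_prev is TRUE. The names `Abort` and `gate` and the dictionary of flags are hypothetical, and the validity (encoding) checks on flags are omitted; this is an illustration, not the definition of the gates in Algorithm 4.8.

```python
class Abort(Exception):
    """Models a crash of process pi after a failed control-flow check."""


def gate(flags, cf_new, cf_prev):
    # Line +0 (assumed): the flag of this gate must not be set yet.
    if flags[cf_new] is not False:
        raise Abort(f"{cf_new} already set")
    # Line +2 (assumed): record that this gate has been passed.
    flags[cf_new] = True
    # Line +3 (assumed): the preceding gate must have been passed.
    if flags[cf_prev] is not True:
        raise Abort(f"{cf_prev} not set")


def reentering_gate_crashes():
    # All gates up to Gate(cf1, cfp) have already been passed.
    flags = {"cfg": True, "cfp": True, "cf1": True, "cfr": False}
    try:
        # A fault jumps back, so Gate(cf1, cfp) is entered a second time.
        gate(flags, "cf1", "cfp")
    except Abort:
        return True
    return False
```

Under these assumptions, re-executing Exec1 between the two gates leaves no trace in the flags, whereas crossing Gate(cf1, cfp) again aborts at Line +0, which is the situation excluded by Lemma 4.14.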
Lemma 4.15 If the traversal invariant holds at sb, a fault occurs and does not jump out of the
traversal, and π does not crash, then the traversal invariant holds at se.
Proof:
1. Complete holds at sb by the traversal invariant.
2. The first state of every block exists, and these states occur in the order of Algorithm 4.1,
by Step 1 and by Lemma 4.13.
3. Faults are block-confined by Step 1 and by Lemma 4.14.
4. The traversal invariant holds at se by Steps 2 and 3 and Lemma 4.11. □
STEP 2: PARTIAL TRAVERSALS
There are four cases to consider when a fault jumps out of a traversal:
Skipped traversal: A fault occurs at some state before the execution of Line 8, i.e., before sg:
only variables in Flags\{cfg} might be assigned before such fault occurs, and no variable
is assigned after the fault occurs. If cfg is valid at se, then se[cfg] = FALSE.
Initialized traversal: A fault occurs after sg and before the execution of Line +2 of Gate(cfp, cfg),
i.e., before sp: The bookkeeping data structures and the control-flow flags of SEI-hardening
are reset, but no variable of the actual program state is modified.
Partial traversal: A fault occurs after sp and before the execution of Line +2 of Gate(cfc, cf2),
i.e., before sc: The blocks modifying state (i.e., Exec1, Reset and Exec2) are executed
partially. The fault jumps out before entering the Validate block.
Terminal traversal: A fault occurs after sc: All blocks are executed correctly except Validate.
We now show that for each of these cases local error exposure (Property 2.3) is preserved.
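As an illustration of the case distinction, the following Python sketch classifies a fault that jumps out of a traversal by the gate states reached before the fault. The function name, the string labels, and the representation of reached states as a set are hypothetical; the mapping to lemmas follows the assumptions of Lemmas 4.16 to 4.19.

```python
# Lemma that covers each case (taken from the lemma statements).
LEMMA_FOR_CASE = {
    "skipped": "Lemma 4.16",      # fault before Line 8, i.e., before sg
    "initialized": "Lemma 4.17",  # fault after sg and before sp
    "partial": "Lemma 4.19",      # fault after sp and before sc
    "terminal": "Lemma 4.18",     # fault after sc
}


def classify_jump_out(reached):
    """Classify a jump-out fault given the subset of {"sg", "sp", "sc"}
    that the traversal reached before the fault occurred."""
    if "sg" not in reached:
        return "skipped"
    if "sp" not in reached:
        return "initialized"
    if "sc" not in reached:
        return "partial"
    return "terminal"
```

For example, a fault in Exec1 occurs after sg and sp but before sc, so it yields a partial traversal.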
Lemma 4.16 Assume the traversal invariant holds at sb, π does not crash, and a fault occurs
before the execution of Line 8. If the fault jumps out of the traversal, then for all v ∈ Vs,
se[v ] = sb[v ] or v is invalid at se.
Proof:
1. Let s be the state immediately after the fault occurs.
2. The fault occurs during or after the execution of Line 5.
2.1. Otherwise, the traversal simply stutters by Lemma 4.12 and can be ignored.
3. For some cf ∈ Flags\{cfg} : s[cf ] = FALSE or cf is invalid at s by fault diversity and since
some instructions of Line 5 are executed by Step 2.
4. The fault jumps to the last line of ph, otherwise π crashes at Line 82 by Step 3.
5. No variable v ∈ Vs is modified by an instruction between sb and se by Step 4 and by
definition of ph.
6. For all v ∈ Vs, se[v ] = sb[v ] or v is invalid at se by Step 5 and by fault diversity. □
Note that by Lemma 4.16, no variable in Vs is modified, but some variable in Flags\{cfg}
might be reset. Since no variable in Vs is modified, local error exposure is not violated: it
depends only on Vo and Vo ⊂ Vs.
Lemma 4.17 Assume the traversal invariant holds at sb, π does not crash, and a fault occurs
after sg and before the execution of Line +2 of Gate(cfp, cfg). If the fault jumps out of the
traversal, then for all v ∈ Vs, se[v ] = rb[v ] or v is invalid at se.
Proof:
1. The fault occurs at state s before Line +2 in Gate(cfp, cfg) and after FirstGate, otherwise
Initialized does not hold at se.
2. Only Filter , FirstGate and Prepare1 are executed by Step 1.
3. For all v ∈ Vi , s[v ] = rb[v ], s[v ] = s[v̄ ], s[v ] = s[v̈ ] by Step 2, traversal invariant, and definition
of Filter .
4. For all v ∈ Vs, s[v ] = rb[v ] or v is invalid at s by Steps 2 and 3 and by traversal invariant.
5. For all v ∈ Vs, se[v ] = rb[v ] by Step 4 or v is invalid at se by fault diversity. □
Lemma 4.18 Assume the traversal invariant holds at sb, π does not crash, and a fault occurs
after sc. If the fault jumps out of the traversal, then for all v ∈ Vs, se[v ] = re[v ] or v is invalid
at se.
Proof:
1. The fault occurs at state s after Line +2 in Gate(cfc, cf2).
2. All blocks except Validate execute correctly by Step 1.
3. For all v ∈ Vs, s[v ] = re[v ] by Step 2 and traversal invariant.
4. For all v ∈ Vs, se[v ] = re[v ] or v is invalid at se by Step 3 or by fault diversity. □
Lemma 4.19 Assume the traversal invariant holds at sb, π does not crash, and a fault occurs
after sp and before the execution of Line +2 of Gate(cfc, cf2). If the fault jumps out of the
traversal and ∃v ∈ Vo : se[v ] ≠ re[v ], then se[v ] ≠ se[v̈ ].
Proof:
1. Assume by contradiction that some v ∈ Vo is such that se[v ] ≠ re[v ] and se[v ] = se[v̈ ].
2. If the fault corrupts v , then se[v ] ≠ se[v̈ ] by fault diversity and since the fault jumps out of
the traversal.
3. v is assigned at some state s before the fault by Steps 1 and 2.
4. The fault does not modify v after s by Step 2.
5. No fault occurs before s′ by fault-frequency assumption and by Step 3.
6. CASE: Fault occurs in Exec1.
6.1. s[v̈ ] ≠ s[v ] by definition of assignment to v ∈ Vo (Rule 4.5).
6.2. s = se since fault jumps out and by Step 4.
6.3. A contradiction by Steps 1, 6.1 and 6.2.
7. CASE: Fault occurs in Reset.
7.1. Exec1 executes correctly by case assumption.
7.2. If v is not assigned in Reset, se[v̈ ] ≠ se[v ] by definition of assignment to v ∈ Vo and
by Step 7.1; a contradiction.
7.3. v is assigned in Reset at state s by Steps 3, 7.1 and 7.2.
7.4. s′[v ] = sb[v ] by definition of Reset and Step 7.3.
7.5. s′[v̈ ] ≠ sb[v ] by definition of assignment to v ∈ Vo.
7.6. s′[v̈ ] ≠ s′[v ] by Steps 7.4 and 7.5.
7.7. s′ = se since fault jumps out and by Step 4.
7.8. A contradiction by Steps 7.1, 7.2, 7.6 and 7.7.
8. CASE: Fault occurs in Exec2.
8.1. Exec1 and Reset execute correctly by case assumption.
8.2. If v is not assigned in Exec2, se[v̈ ] ≠ se[v ] by definition of assignment to v ∈ Vo in
Exec1 and by Step 8.1; a contradiction.
8.3. v is assigned in Exec2 at state s by Steps 3, 8.1 and 8.2.
8.4. s′[v ] = re[v ] by Step 8.3.
8.5. s′ = se since fault jumps out and by Step 4.
8.6. A contradiction by Steps 8.2, 8.4 and 8.5.
9. If ∃v ∈ Vo : se[v ] ≠ re[v ], then se[v ] ≠ se[v̈ ] by Steps 6, 7 and 8. □
Lemmas 4.16, 4.17, 4.18 and 4.19 show that if a fault jumps out of the traversal, the variables
in Vo do not represent a valid output message, i.e., local error exposure (Property 2.3) holds
at state se. Nevertheless, the state se is potentially corrupt and the traversal invariant might
not hold at se.
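The conclusion of Lemma 4.19 can be read as a predicate over a final state. The Python sketch below checks it on concrete dictionaries: every output variable whose value deviates from the reference final state must disagree with its second copy. The function name and the `copy_of` mapping (from v to the name standing for v̈) are hypothetical, and validity of the encoded values is not modeled.

```python
def exposes_corrupt_outputs(s_e, r_e, outputs, copy_of):
    """Return True if every output variable v with s_e[v] != r_e[v]
    also satisfies s_e[v] != s_e[copy_of[v]]."""
    return all(
        s_e[v] != s_e[copy_of[v]]
        for v in outputs
        if s_e[v] != r_e[v]
    )


# Hypothetical example: one output variable "msg" with copy "msg_copy".
COPY_OF = {"msg": "msg_copy"}
REFERENCE = {"msg": 7, "msg_copy": 7}
PARTIAL = {"msg": 3, "msg_copy": 0}   # traversal left before Validate
MASKED = {"msg": 3, "msg_copy": 3}    # the situation the lemma rules out
```

With these values the predicate holds for `PARTIAL` and fails for `MASKED`; it holds vacuously when the final state equals the reference state.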
STEP 3: THE LIMITED EFFECT OF MULTIPLE FAULTS
We now turn to the cases in which a fault jumps out of a traversal A, and a subsequent
traversal B executes. We have seen that local error exposure holds at the final state of
traversal A. Our next goal is to show that if π does not crash during a subsequent traversal
B, then the traversal invariant holds at the final state of traversal B. We use the following
notation to differentiate the state of both traversals: Non-primed states (e.g., sb, rb, sg, sp,
sx1 ) are states in traversal A; primed states (e.g., sb’, rb’, sg’, sp’, sx1 ’) are states in traversal B.
The key insight of this section is that multiple ASC faults have a limited effect. By As-
sumption 4.1, at most one fault occurs per traversal, but between traversals several faults
might occur. We will show that a sequence of faults formed by one fault that jumps out of
traversal A followed by any number of faults between traversals and, finally, followed by zero
or one fault in traversal B behaves similarly to a single block-confined fault. Nevertheless,
such fault sequences are not equivalent to single block-confined faults: they can cause input
messages to be omitted and output messages to be duplicated. Such failures are, however,
within the crash-stop fault model and are assumed to be tolerated by any algorithm being
hardened (see Section 2.1.3).
We start with the case of a fault jumping out of traversal A before the execution of Line 8.
The control-flow flags might be partially initialized. In particular, cfg is still FALSE at state se
(the state immediately after a fault that jumps out of a traversal is the final state of that
traversal), since cfg is only set to TRUE at Line 8 and the fault jumps out of the traversal. If no
fault occurs in traversal B, partially initialized flags are fully initialized and the traversal
executes normally. Note that the input message of traversal A is not processed. Therefore, the
reference initial state rb’ of the subsequent traversal B is the same as the reference initial
state rb of traversal A, except that the input message in Vi is different since messages are
received between traversals. The next lemma asserts that if a fault does occur in traversal B,
then the traversal invariant holds at se’.
[Figure 4.7 (diagram): timeline of traversal A with state labels sb, sg, sp, sx1 , sr , re, where a
fault jumps out to se; followed by traversal B starting at sb’, which crashes at FirstGate if no
fault jumps in, and otherwise continues after a fault jumps in, with state labels sx1 ’, sr ’,
sx2 ’, sc’, se’.]
Figure 4.7: Jump-out jump-in fault sequence: If a fault jumps out of traversal A, then in the
subsequent traversal B: (1) another fault has to occur; and (2) this fault has to
jump into the same block in which traversal A was left (see Lemma 4.21).
Lemma 4.20 Assume traversal invariant holds at state sb of traversal A, a fault occurs at some
state sf before the execution of Line 8 and jumps out of traversal A, and π does not crash. If
a fault occurs in traversal B, then for all v ∈ Vg, s′e[v ] = r ′e[v ] or v is invalid at s′e.
Proof:
1. Let s′f be the state immediately after a fault occurs in traversal B.
2. ∀v ∈ Vg : se[v ] = sf [v ] or v is invalid at se by definition of sf and by fault diversity.
3. No instruction modifies any v ∈ Vg\Vi until s′f by definition.
4. ∀v ∈ Vg\Vi : s′f [v ] = sf [v ] or v is invalid at s′f by Steps 2 and 3 and by fault diversity,
irrespective of how many faults occurred between traversals A and B.
5. CASE: The fault occurs in traversal B before the execution of Line 8.
5.1. If s′f is in the Filter block up to Line 8, then the sequence of faults from traversal A to
traversal B is equivalent to a single block-confined fault by Step 4; the result follows
from Lemma 4.11.
5.2. If s′f is after Line 8, then the state s′g of traversal B exists by Lemma 4.13; a contradiction
with the fact that the state at which the fault occurs precedes Line 8 by case assumption
and s′f succeeds Line 8 by step assumption.
6. CASE: A fault occurs in traversal B after the execution of Line 8.
6.1. Filter and FirstGate execute correctly in traversal B by case assumption since Line 8
is the last line of FirstGate.
6.2. For all v ∈ Vg\Vi : s′g[v ] = sg[v ] or v is invalid by Step 4 and by case assumption.
6.3. Let rg be the reference state after the execution of Line 8 of traversal A.
6.4. For all v ∈ Vg\Vi : s′g[v ] = rg[v ] or v is invalid by Step 6.2.
6.5. sg’ is the same state as sg, but with a potentially different input message in Vi by Step
6.4 and since between two traversals a new message might be received.
6.6. For all v ∈ Vg, s′e[v ] = r ′e[v ] or v is invalid at s′e by Step 6.5.
7. For all v ∈ Vg, s′e[v ] = r ′e[v ] or v is invalid at s′e by Steps 5 and 6. □
We now show that also for the case of a fault jumping out of traversal A after Line 8, the
traversal invariant holds at se’. The control-flow flag cfg is TRUE once the fault occurs in
traversal A since it is set at Line 8. It is easy to see that if no fault occurs in the subsequent
traversal B, π crashes at the checks of FirstGate because either s′b[cfg] = TRUE or cfg is invalid
at sb’. In fact, if any instruction of blocks Exec1, Reset or Exec2 is to be executed in traversal
B, a fault has to occur before Line 3 and jump completely over the checks of FirstGate. For
this purpose, we define a fault that “jumps in” a traversal as follows.
Definition 4.22 (A fault jumps in) A fault jumps in the traversal if and only if a fault occurs
at some line before Line 3 corrupting the program counter pc to the value of some instruction
after Line 6, but before Line 84.
The next lemma (Lemma 4.21) asserts not only that a fault has to occur in traversal B,
but also that the fault has to jump into exactly the same block in which traversal A was left.
Figure 4.7 exemplifies a fault occurring in Exec1 and jumping out of traversal A. A fault has to
jump into traversal B, otherwise π crashes in the checks of FirstGate since s′b[cfg] = TRUE or
invalid at sb’. Following the example of the figure, if the fault jumps into B before Gate(cf1, cfp),
i.e., before sx1 ’, then π crashes at Line +0 of Gate(cf1, cfp) because s′b[cf1] = TRUE or invalid
at sb’ since the fault in traversal A jumped out after Gate(cf1, cfp), i.e., after sx1 . If the fault
jumps into B after Gate(cfr , cf1), i.e., after sr ’, then π crashes at Line +3 of Gate(cf2, cfr ) because
s′b[cfr ] = FALSE or invalid at sb’ since the fault in traversal A jumped out before Gate(cfr , cf1),
i.e., before sr . Hence, the only block into which the fault can jump is Exec1, i.e., after sx1 ’ and
before sr ’.
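The example generalizes to every block, which can be checked with a small Python simulation. The sketch rests on several assumptions that are not taken from Algorithm 4.8: each Gate(cf_new, cf_prev) aborts if cf_new is already TRUE (Line +0), sets cf_new (Line +2), and aborts unless cf_prev is TRUE (Line +3); a final check after Validate requires all flags to be TRUE; and flags are plain Booleans, so invalid (corrupted) flags and fault diversity are not modeled. All names are hypothetical.

```python
BLOCKS = ["Prepare1", "Prepare2", "Exec1", "Reset", "Exec2", "Validate"]
# GATES[k] separates BLOCKS[k] from BLOCKS[k + 1]: (flag set, flag checked).
GATES = [("cfp", "cfg"), ("cf1", "cfp"), ("cfr", "cf1"),
         ("cf2", "cfr"), ("cfc", "cf2")]


def flags_after_jump_out(block):
    """Flags left behind when traversal A is left inside `block`."""
    k = BLOCKS.index(block)
    flags = {"cfg": True}  # set at Line 8, before Prepare1
    for i, (cf_new, _) in enumerate(GATES):
        flags[cf_new] = i < k  # only gates preceding `block` were passed
    return flags


def survives_jump_in(flags, block):
    """Run the remaining gates of traversal B after jumping into `block`."""
    flags = dict(flags)
    for cf_new, cf_prev in GATES[BLOCKS.index(block):]:
        if flags[cf_new]:        # assumed Line +0: flag must not be set yet
            return False
        flags[cf_new] = True     # assumed Line +2
        if not flags[cf_prev]:   # assumed Line +3: previous gate was passed
            return False
    return all(flags.values())   # assumed final check after Validate


def surviving_entries(exit_block):
    flags = flags_after_jump_out(exit_block)
    return [b for b in BLOCKS if survives_jump_in(flags, b)]
```

Under these assumptions, `surviving_entries(b)` contains only `b` itself for every block, matching the statement of Lemma 4.21 that the fault must jump into the block in which traversal A was left.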
The consequence of Lemma 4.21 is important. The faults of two consecutive traversals
behave similarly to a single block-confined fault. Moreover, since multiple faults might occur
between traversals, the sequence of faults formed by a fault jumping out of traversal A
followed by zero or more faults between traversals A and B and, finally, followed by a fault
jumping into traversal B behaves as a single block-confined fault.
Lemma 4.21 Assume the traversal invariant holds at state sb of traversal A, a fault occurs at some
state sf after the execution of Line 8 and jumps out of traversal A, and π does not crash. If
a fault jumps into traversal B at state s′f , then s′f is in the same block as sf .
Proof:
1. s′e[cfg] = FALSE and cfg is valid at se’.
1.1. sf [cfg] = TRUE and cfg is valid at sf since the fault in traversal A occurs after Line 8.
1.2. se[cfg] = TRUE or cfg is invalid at se by Step 1.1 and since the fault jumps out of
traversal A and by fault diversity.
1.3. s′b[cfg] = TRUE or cfg is invalid at sb’ by Step 1.2 and by fault diversity, irrespective of
how many faults occurred between traversals A and B.
1.4. A fault has to occur in traversal B at some state s before the execution of Line 3 and
jump into traversal B after Line 6, otherwise π crashes by Step 1.3.
1.5. s′f [cfg] = TRUE or cfg is invalid at s′f by Steps 1.3 and 1.4 and by fault diversity.
1.6. If s′f is some state after Line 73, then no variable in Vg is modified in B; we can ignore
such cases since B is a series of stuttering steps.
1.7. If s′f is some state before Line 73, then s′e[cfg] = FALSE and cfg is valid at se’ by
definition of Line 73 and by fault-frequency assumption.
2. CASE: sf in Validate before Line 73.
2.1. ∀cf ∈ Flags : sf [cf ] = TRUE and cf is valid at sf since no fault occurs before sf in A
by fault-frequency assumption, and every cf ∈ Flags is assigned TRUE by an instruction
before Line 73 by Algorithm 4.8.
2.2. No instruction modifies any cf ∈ Flags from sf until s′f by Definitions 4.21 and 4.22.
2.3. ∀cf ∈ Flags : s′f [cf ] = TRUE or cf is invalid at s′f by Steps 2.1 and 2.2 and by fault
diversity, irrespective of how many faults occurred between traversals A and B.
2.4. s′f must be after Line +0 of Gate(cfc, cf2), otherwise π crashes by Step 2.3.
2.5. s′f must be before Line 73, otherwise s′e[cfg] = TRUE or cfg is invalid at se’ by Step 2.3;
a contradiction with Step 1.
2.6. s′f is in Validate by Steps 2.4 and 2.5.
3. CASE: sf in Exec2.
3.1. ∀cf ∈ Flags\{cfc} : sf [cf ] = TRUE and cf is valid at sf since no fault occurs before sf
in A by fault-frequency assumption, and every cf ∈ Flags is assigned by an instruction
before Exec2.
3.2. sf [cfc] = FALSE and cfc is valid at sf since no fault occurs before sf in A by fault
frequency assumption, and every cf ∈ Flags is assigned by an instruction before Exec2.
3.3. No instruction modifies any cf ∈ Flags from sf until s′f by Definitions 4.21 and 4.22.
3.4. ∀cf ∈ Flags\{cfc} : s′f [cf ] = TRUE or cf is invalid at s′f by Steps 3.1 and 3.3 and by fault
diversity, irrespective of how many faults occurred between traversals A and B.
3.5. s′f [cfc] = FALSE or cfc is invalid at s′f by Steps 3.2 and 3.3 and by fault diversity,
irrespective of how many faults occurred between traversals A and B.
3.6. s′f must be after Line +0 of Gate(cf2, cfr ), otherwise π crashes by Step 3.4.
3.7. s′f must be before Line 73, otherwise s′e[cfg] = TRUE or cfg is invalid at se’ by Step 3.3;
a contradiction with Step 1.
3.8. s′f must be before or at Line +2 of Gate(cfc, cf2), otherwise π crashes after Line 73 by
Steps 3.5 and 3.7.
3.9. s′f is in Exec2 by Step 3.8.
4. CASE: sf in Reset.
4.1. ∀cf ∈ Flags\{cf2, cfc} : sf [cf ] = TRUE and cf is valid at sf since no fault occurs before sf
in A by fault frequency assumption, and every cf ∈ Flags is assigned by an instruction
before Reset.
4.2. ∀cf ∈ {cf2, cfc} : sf [cf ] = FALSE and cf is valid at sf since no fault occurs before sf in
A by fault frequency assumption, and every cf ∈ Flags is assigned by an instruction
before Reset.
4.3. No instruction modifies any cf ∈ Flags from sf until s′f by Definitions 4.21 and 4.22.
4.4. ∀cf ∈ Flags\{cf2, cfc} : s′f [cf ] = TRUE or cf is invalid at s′f by Steps 4.1 and 4.3 and by
fault diversity, irrespective of how many faults occurred between traversals A and B.
4.5. ∀cf ∈ {cf2, cfc} : s′f [cf ] = FALSE or cf is invalid at s′f by Steps 4.2 and 4.3 and by fault
diversity, irrespective of how many faults occurred between traversals A and B.
4.6. s′f must be after Line +0 of Gate(cfr , cf1), otherwise π crashes by Step 4.4.
4.7. s′f must be before Line 73, otherwise s′e[cfg] = TRUE or cfg is invalid at se’ by Step 4.3;
a contradiction with Step 1.
4.8. s′f must be before or at Line +2 of Gate(cf2, cfr ), otherwise π crashes after Line 73 by
Steps 4.5 and 4.7.
4.9. s′f is in Reset by Step 4.8.
5. CASE: sf in Exec1.
5.1. ∀cf ∈ Flags\{cfr , cf2, cfc} : sf [cf ] = TRUE and cf is valid at sf since no fault occurs
before sf in A by fault frequency assumption, and every cf ∈ Flags is assigned by an
instruction before Exec1.
5.2. ∀cf ∈ {cfr , cf2, cfc} : sf [cf ] = FALSE and cf is valid at sf since no fault occurs before sf
in A by fault frequency assumption, and every cf ∈ Flags is assigned by an instruction
before Exec1.
5.3. No instruction modifies any cf ∈ Flags from sf until s′f by Definitions 4.21 and 4.22.
5.4. ∀cf ∈ Flags\{cfr , cf2, cfc} : s′f [cf ] = TRUE or cf is invalid at s′f by Steps 5.1 and 5.3 and
by fault diversity, irrespective of how many faults occurred between traversals A and B.
5.5. ∀cf ∈ {cfr , cf2, cfc} : s′f [cf ] = FALSE or cf is invalid at s′f by Steps 5.2 and 5.3 and by fault
diversity, irrespective of how many faults occurred between traversals A and B.
5.6. s′f must be after Line +0 of Gate(cf1, cfp), otherwise π crashes by Step 5.4.
5.7. s′f must be before Line 73, otherwise s′e[cfg] = TRUE or cfg is invalid at se’ by Step 5.3;
a contradiction with Step 1.
5.8. s′f must be before or at Line +2 of Gate(cfr , cf1), otherwise π crashes after Line 73 by
Steps 5.5 and 5.7.
5.9. s′f is in Exec1 by Step 5.8.
6. CASE: sf in Prepare2.
6.1. ∀cf ∈ Flags\{cf1, cfr , cf2, cfc} : sf [cf ] = TRUE and cf is valid at sf since no fault occurs
before sf in A by fault frequency assumption, and every cf ∈ Flags is assigned by an
instruction before Prepare2.
6.2. ∀cf ∈ {cf1, cfr , cf2, cfc} : sf [cf ] = FALSE and cf is valid at sf since no fault occurs before
sf in A by fault frequency assumption, and every cf ∈ Flags is assigned by an instruction
before Prepare2.
6.3. No instruction modifies any cf ∈ Flags from sf until s′f by Definitions 4.21 and 4.22.
6.4. ∀cf ∈ Flags\{cf1, cfr , cf2, cfc} : s′f [cf ] = TRUE or cf is invalid at s′f by Steps 6.1 and 6.3
and by fault diversity, irrespective of how many faults occurred between traversals A
and B.
6.5. ∀cf ∈ {cf1, cfr , cf2, cfc} : s′f [cf ] = FALSE or cf is invalid at s′f by Steps 6.2 and 6.3 and by
fault diversity, irrespective of how many faults occurred between traversals A and B.
6.6. s′f must be after Line +0 of Gate(cfp, cfg), otherwise π crashes by Step 6.4.
6.7. s′f must be before Line 73, otherwise s′e[cfg] = TRUE or cfg is invalid at se’ by Step 6.3;
a contradiction with Step 1.
6.8. s′f must be before or at Line +2 of Gate(cf1, cfp), otherwise π crashes after Line 73 by
Steps 6.5 and 6.7.
6.9. s′f is in Prepare2 by Step 6.8.
7. CASE: sf in Prepare1.
7.1. ∀cf ∈ Flags\{cfp, cf1, cfr , cf2, cfc} : sf [cf ] = TRUE and cf is valid at sf since no fault
occurs before sf in A by fault frequency assumption, and every cf ∈ Flags is assigned
by an instruction before Prepare1.
7.2. ∀cf ∈ {cfp, cf1, cfr , cf2, cfc} : sf [cf ] = FALSE and cf is valid at sf since no fault occurs
before sf in A by fault frequency assumption, and every cf ∈ Flags is assigned by an
instruction before Prepare1.
7.3. No instruction modifies any cf ∈ Flags from sf until s′f by Definitions 4.21 and 4.22.
7.4. ∀cf ∈ Flags\{cfp, cf1, cfr , cf2, cfc} : s′f [cf ] = TRUE or cf is invalid at s′f by Steps 7.1 and
7.3 and by fault diversity, irrespective of how many faults occurred between traversals
A and B.
7.5. ∀cf ∈ {cfp, cf1, cfr , cf2, cfc} : s′f [cf ] = FALSE or cf is invalid at s′f by Steps 7.2 and 7.3 and
by fault diversity, irrespective of how many faults occurred between traversals A and
B.
7.6. s′f must be after Line 6, otherwise π crashes by Step 7.4.
7.7. s′f must be before Line 73, otherwise s′e[cfg] = TRUE or cfg is invalid at se’ by Step 7.3;
a contradiction with Step 1.
7.8. s′f must be before or at Line +2 of Gate(cfp, cfg), otherwise π crashes after Line 73 by
Steps 7.5 and 7.7.
7.9. s′f is in Prepare1 by Step 7.8. □
Note that the case of a fault in traversal A occurring after Line 73 can be ignored because
in this case all blocks execute correctly. Also, note that if sf is in Filter , then the fault occurs
before Line 8, and the result follows from Lemma 4.20.
Lemma 4.22 Assume the traversal invariant holds at state sb of traversal A, a fault occurs at
some state sf after Line 8 and jumps out of traversal A, and π does not crash. If a fault jumps
into traversal B, then for all v ∈ Vg\Vi , s′f [v ] = sf [v ] or v is invalid at s′f .
Proof:
1. No variable in Vg\Vi is modified by assignment between sf and se of traversal A by Defini-
tion 4.21.
2. No variable in Vg\Vi is modified by assignment between s′b and s′f of traversal B by
Definition 4.22.
3. No variable in Vg\Vi is modified by assignment between se and sb’ since variables in Vg\Vi
are only modified inside traversals.
4. For all v ∈ Vg\Vi , s′f [v ] = sf [v ] or v is invalid at s′f by Steps 1, 2 and 3 and fault diversity. □
Finally, we show that if the traversal invariant holds at state sb of a traversal A, a fault jumps
out of traversal A, and another fault jumps into traversal B, then the traversal invariant holds
at se’. In particular, it holds that s′e[v ] = re[v ] for every variable v ∈ Vg that is valid at se’; or
s′e[v ] = r ′e[v ] for every variable v ∈ Vg that is valid at se’.
Lemma 4.23 Assume the traversal invariant holds at state sb of traversal A, a fault occurs at
some state sf after Line 8 and jumps out of traversal A. If a fault jumps into traversal B at
state s′f and π does not crash, then ∀v ∈ Vg\Vi : s′e[v ] = re[v ] ∨ s′e[v ] = r ′e[v ] or v is invalid at
se’.
Proof:
1. No fault occurs in traversal A before sf by fault-frequency assumption.
2. No fault occurs in traversal B after s′f by fault-frequency assumption.
3. For all v ∈ Vg\Vi , s′f [v ] = sf [v ] or v is invalid at s′f by Lemma 4.22.
4. N, O and U are valid at s′e since no fault occurs after s′f and s′f occurs before LastGate.
5. CASE: sf in Validate before Line 73.
5.1. State s′f is in block Validate by Lemma 4.21.
5.2. No assignment to Vg\Vi occurs after sc until s′f by definition of Validate and by Step 3.
5.3. No assignment to Vg occurs after s′f by definition of Validate.
5.4. ∀v ∈ Vg\Vi : se[v ] = sf [v ] = re[v ] by Step 1 and by case assumption.
5.5. ∀v ∈ Vg\Vi : s′e[v ] = re[v ] by Steps 5.2, 5.3 and 5.4.
6. CASE: sf in Exec2.
6.1. State s′f is in block Exec2 by Lemma 4.21.
6.2. CASE: v ∈ N and v ∈ U at s′f .
6.2.1. s′e[v ] = s′e[N(v )], otherwise π crashes in Validate.
6.2.2. s′e[N(v )] = s′f [N(v )] = sf [N(v )] by Step 3.
6.2.3. sf [N(v )] = re[v ] by Steps 1 and 4.
6.2.4. s′e[v ] = re[v ] by Steps 6.2.1, 6.2.2, and 6.2.3.
6.3. CASE: v ∈ N and v ∉ U at s′f ; or v ∉ N and v ∈ U at s′f .
6.3.1. π crashes at Validate; a contradiction.
6.4. CASE: v ∉ N and v ∉ U at s′f .
6.4.1. v is assigned in Exec2 at some state s′, otherwise s′e[v ] = se[v ] = sb[v ].
6.4.2. s′f ⪯ s′ by Definition 4.22 and fault-frequency assumption.
6.4.3. No fault occurs after s′ by Step 6.4.2.
6.4.4. π crashes at the check of Line +2 of Rule 4.4; a contradiction.
7. CASE: sf in Reset.
7.1. State s′f is in block Reset by Lemma 4.21.
7.2. Reset does not read or write any v ∈ Vi by definition of Reset.
7.3. ∀v ∈ Vg : s′e[v ] = re[v ] by Lemmas 4.21 and 4.7 and Steps 7.1 and 7.2.
8. CASE: sf in Exec1.
8.1. State s′f is in block Exec1 by Lemma 4.21.
8.2. ∀v ∈ Vg : s′e[v ] = r ′e[v ] by Lemmas 4.21 and 4.4 and Steps 8.1 and 3.
9. CASE: sf in Prepare2 or sf in Prepare1.
9.1. ∀v ∈ Vg : s′e[v ] = r ′e[v ] by Lemmas 4.21 and 4.2 and Step 3. □
We now conclude our proof that SEI-hardening guarantees error isolation.
Theorem 4.1 SEI-hardening guarantees error isolation under the Assumptions 2.2, 4.1, 4.2,
4.3, and 4.4.
Proof:
1. Local error exposure (Property 2.3)
If a fault jumps out of the traversal, the result holds by Lemmas 2.1, 4.16, 4.20 and 4.23. If
a fault does not jump out of the traversal, the result holds by Lemmas 2.1 and 4.15.
2. Local error filtering (Property 2.4)
If an invalid message is received a correct process discards the message in Filter before
any variable v ∈ Vg is modified by definition of Algorithms 4.1 and 4.2.
3. Accuracy (Property 2.2)
A message is only discarded by a correct process if it is invalid by definition of Algorithms 4.1
and 4.2. Process π only aborts if an error is detected by Rules 4.1-4.6 and Algorithms 4.1,
4.2, 4.5 and 4.8.
4. Under Assumptions 2.2, 4.1, 4.2, 4.3, and 4.4, SEI-hardening guarantees error isolation by
Steps 1, 2 and 3 and Theorem 2.2. □
4.3 MULTITHREADED SEI-HARDENING
In this section, we introduce multithreading support into SEI. Our approach uses two-phase
locking (2PL) [WV01] to isolate the state used by a traversal from other concurrent traversals.
In particular, locks are acquired and held until both executions and the final checks of a traversal
finish. Additionally, since threads can read corrupt values from faulty threads that skip lock
acquisition, we add a barrier after the Validate block, which enforces that a traversal only
terminates after all other concurrently executed traversals have finished their Validate blocks.
We start by discussing further model refinements necessary to support multithreading; next,
we prove the approach correct; and, finally, we discuss the limitations of our multithreading
support and a possible alternative solution.
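The locking discipline described above can be sketched in Python as follows. The sketch assumes that a release requested during the two executions is only checked and deferred, and that all locks collected during the traversal are released after the final checks. The class `TraversalLocks` and its method names are hypothetical; integrity checks on lock state and the barrier are omitted.

```python
import threading


class TraversalLocks:
    """Holds every lock acquired during a traversal until it finishes."""

    def __init__(self):
        self.held = []  # plays the role of a per-traversal set of locks

    def acquire(self, lock):
        # Growing phase: acquire the lock and remember it.
        lock.acquire()
        self.held.append(lock)

    def release(self, lock):
        # Intercepted release: the lock stays held until finish().
        if lock not in self.held:
            raise RuntimeError("release of a lock that is not held")

    def finish(self):
        # Shrinking phase: release everything after the final checks.
        while self.held:
            self.held.pop().release()
```

A traversal would call `acquire` and `release` from within the event handler and `finish` once validation has completed, so that no other thread observes state that has not been checked yet.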
4.3.1 MODEL REFINEMENTS
Definition 4.23 (Multithreaded process) A process π consists of multiple threads {τ0, . . . , τt}.
Each thread τi executes the Next state transitions on the variable set Vi ⊂ V . Threads share
a subset Vshr of their variables. No input or output variable is shared among threads.
Assumption 4.6 (Mutual exclusion and lock hierarchies) Threads access variables in Vshr
only in critical sections. Mutual exclusion is obtained using locks and locks are acquired in a
consistent order, e.g., via lock hierarchies [Ham99].
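A consistent acquisition order can be obtained, for instance, by assigning each lock a fixed rank and always acquiring in ascending rank. The Python sketch below illustrates this; the lock names, the `RANK` table, and the helper functions are hypothetical and only illustrate the idea of a lock hierarchy.

```python
import threading

# Hypothetical lock hierarchy: a lower rank must be acquired first.
RANK = {"accounts": 0, "index": 1, "log": 2}
LOCKS = {name: threading.Lock() for name in RANK}


def acquire_all(names):
    """Acquire the named locks in ascending rank; return the order used."""
    ordered = sorted(names, key=RANK.__getitem__)
    for name in ordered:
        LOCKS[name].acquire()
    return ordered


def release_all(names):
    """Release the named locks (the release order does not matter here)."""
    for name in names:
        LOCKS[name].release()
```

Two threads that request `["log", "accounts"]` and `["accounts", "log"]` both acquire `accounts` first, so neither can hold `log` while waiting for `accounts`.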
Given the way SEI handles locks, the behavior of multithreaded processes is equivalent to
a single-threaded process. Therefore, we can adopt the notion of correct message for single-
threaded processes (Definition 2.3). However, for completeness, we extend the definition of
correct message to a multithreaded process.
Definition 4.24 (Refinement of a generation history of a message) Let π be a multithreaded
process, m be a message sent by π, and h the generation history for m. A refinement ĥ of
h is the sequence of local steps π executes in a run where π is correct and receives each
message in h until m is sent.
Definition 4.25 (Correct multithreaded generation history of a message) Let π be a faulty
multithreaded process. Let m be a message sent by π.
• If π has sent no message m′ before m such that m′ has a correct generation history,
then all generation histories of m are correct for m.
• Else, for each output message m′ preceding m, let H be the set of correct generation
histories of m′. A generation history of m is correct if and only if it extends some
generation history in H.
add at the end of the Init blocks
+1 if ¬Check(i) ∨ ¬Check(c[i]) then Abort
+2 c[i] ← c[i] + 1
after executing Validate
+1 foreach l ∈ Q do
+2     Release(l)
+3 c[i] ← c[i] + 1
+4 csnap ← c
+5 if ¬Check(csnap) then Abort
+6 foreach j ≠ i such that csnap[j] is even do
+7     wait until c[j] > csnap[j] ∨ ¬Check(c[j])
Algorithm 4.9: Hardening rules for thread τi

before Acquire(l) in Exec1
+1 add l to set Q
around Acquire(l) in Exec2
+1 do nothing
around Release(l) during Exec1 or Exec2
+1 if ¬Check(l) ∨ ¬Holding(l) then Abort
Algorithm 4.10: Intercepted operations
In addition, given two output messages m1 and m2 having two sets of correct generation
histories Hm1 and Hm2 respectively, it must hold that for each refinement ĥ1 of h1 ∈ Hm1 (resp.
ĥ2 of h2 ∈ Hm2 ) there must exist a refinement ĥ2 of h2 ∈ Hm2 (resp. ĥ1 of h1 ∈ Hm1 ), such that
either ĥ1 extends ĥ2 or ĥ2 extends ĥ1.
The first two conditions are the same as in Definition 2.3. The third is specific to multithreaded
processes.
In the presence of multiple threads, there can be multiple concurrent traversals, at most one
per thread. The fault frequency assumption for multiple traversals becomes the following.
Assumption 4.7 (Fault frequency with multiple threads) Given a set of concurrent traver-
sals E = {E1, . . . , En} executed by different threads, at most one fault occurs between the
earliest beginning and the latest end of a traversal in E.
The state of a lock l indicates the thread t currently holding l. We consider three lock
primitives for a lock l: Acquire(l), Release(l), Holding(l). The first primitive acquires l if it is
available. It returns a Boolean indicating success. Locks can be acquired multiple times. The
second primitive releases the lock. It returns false if it is invoked by a thread that is not
currently holding l and true otherwise. The last primitive returns a Boolean indicating whether
the invoking thread holds l.
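The semantics of the three primitives can be sketched in C as follows. This is only a single-threaded model with explicit thread ids, not a usable mutex: the type sketch_lock and all sketch_* functions are hypothetical names, and "locks can be acquired multiple times" is interpreted here as re-entrant acquisition by the holder.

```c
#include <stdbool.h>

#define SKETCH_NO_OWNER (-1)

/* Hypothetical lock model: records its holder and the nesting depth. */
typedef struct {
    int owner;   /* id of the holding thread, SKETCH_NO_OWNER if free */
    int depth;   /* number of nested acquisitions by the owner */
} sketch_lock;

static void sketch_lock_init(sketch_lock *l) {
    l->owner = SKETCH_NO_OWNER;
    l->depth = 0;
}

/* Holding(l): does thread tid currently hold l? */
static bool sketch_holding(const sketch_lock *l, int tid) {
    return l->owner == tid && l->depth > 0;
}

/* Acquire(l): succeeds if l is free or already held by tid. */
static bool sketch_acquire(sketch_lock *l, int tid) {
    if (l->owner != SKETCH_NO_OWNER && l->owner != tid) return false;
    l->owner = tid;
    l->depth++;
    return true;
}

/* Release(l): returns false when invoked by a thread that does not hold l. */
static bool sketch_release(sketch_lock *l, int tid) {
    if (!sketch_holding(l, tid)) return false;
    if (--l->depth == 0) l->owner = SKETCH_NO_OWNER;
    return true;
}
```

The Boolean result of sketch_release is what a hardened release can test: a thread that reached a critical section without acquiring the lock gets false back.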
A critical section C is a sequence of instructions that must be executed in mutual exclusion.
In a correct execution, C is only executed by threads holding a set of locks LC .
4.3.2 ALGORITHM EXTENSIONS FOR MULTIPLE THREADS
To support multiple threads, SEI employs an array c of counters, one per thread (Algorithm 4.9);
this array of counters is called the barrier. Before a thread starts executing the Exec1 block,
it increments its counter c[i], and increments it again after completing the Validate block.
Therefore, an odd counter indicates that a thread is executing event handling. Immediately
after completing its Validate block, each thread reads c and makes sure that all other threads
that executed traversals concurrently have finished checking their modifications to state s.
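A minimal sketch of the barrier logic described above is given below. It follows the prose convention (an odd counter means a traversal is in progress) and assumes that counters start at zero; with a different initial value the parity test would flip. All sketch_* names are hypothetical, the integrity checks (Check) are omitted, and plain variables are used where a real multithreaded implementation would need atomic accesses.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_THREADS 4

/* One counter per thread; assumed to start at 0, so odd = traversal running. */
static unsigned long sketch_c[SKETCH_THREADS];

/* Called by thread i before Exec1 and again after Validate. */
static void sketch_barrier_tick(int i) {
    sketch_c[i]++;
}

/* Copy the whole counter array right after Validate. */
static void sketch_barrier_snapshot(unsigned long snap[SKETCH_THREADS]) {
    for (size_t j = 0; j < SKETCH_THREADS; j++) snap[j] = sketch_c[j];
}

/* True once every other thread that was inside a traversal at snapshot
 * time has advanced its counter, i.e., has finished its Validate block. */
static bool sketch_barrier_passed(int i, const unsigned long snap[SKETCH_THREADS]) {
    for (int j = 0; j < SKETCH_THREADS; j++) {
        if (j == i) continue;
        bool was_running = (snap[j] % 2) == 1;
        if (was_running && sketch_c[j] <= snap[j]) return false;
    }
    return true;
}
```

A thread would poll sketch_barrier_passed with its own snapshot and hold back its output messages until the function returns true.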
As we show in Algorithm 4.10, SEI intercepts lock operations to make sure that a thread
keeps its locks throughout the first and second execution. Each thread adds its locks to a set
Q and releases them only at the end of the check procedure.
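The interception of lock operations can be sketched as follows. The sketch mirrors the three rules of Algorithm 4.10 under simplifying assumptions: sketch_mutex is a hypothetical stand-in for a real mutex, Holding(l) is reduced to a flag, Check(l) is omitted, and Abort is modeled by setting a flag instead of crashing the process.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_LOCKS 16

typedef struct { bool held; } sketch_mutex;   /* stand-in for a real mutex */

typedef struct {
    sketch_mutex *q[SKETCH_MAX_LOCKS];  /* the set Q of locks taken in Exec1 */
    size_t n;
    bool aborted;                       /* models Abort */
} sketch_traversal;

/* Acquire in Exec1: remember the lock in Q, then really acquire it. */
static void sketch_acquire_exec1(sketch_traversal *t, sketch_mutex *l) {
    if (t->n == SKETCH_MAX_LOCKS) { t->aborted = true; return; }
    t->q[t->n++] = l;
    l->held = true;
}

/* Acquire in Exec2: the lock is still held from Exec1, so do nothing. */
static void sketch_acquire_exec2(sketch_traversal *t, sketch_mutex *l) {
    (void)t;
    (void)l;
}

/* Release inside Exec1 or Exec2: only verify that the lock is held. */
static void sketch_release_in_handler(sketch_traversal *t, sketch_mutex *l) {
    if (!l->held) t->aborted = true;
}

/* After Validate: actually release every lock in Q. */
static void sketch_release_all(sketch_traversal *t) {
    for (size_t k = 0; k < t->n; k++) t->q[k]->held = false;
    t->n = 0;
}
```

Because the in-handler release never unlocks anything, data written in the first execution stays protected until the second execution has been validated.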
4.3.3 CORRECTNESS WITH MULTIPLE THREADS
We now show that Theorem 4.1 holds even in the presence of multiple threads. This boils down
to showing that there is no error propagation among threads through shared variables.
Lemma 4.24 If a set of threads τ1, . . . , τn of a process π, with n > 1, have violated mutual
exclusion and are in the same critical section C while running traversal E1, . . . , En, respectively,
then π crashes before all threads exit C.
Proof:
1. Let LC be the set of locks required to enter C.
2. Let s be the state when the last thread in τ1, . . . , τn executes its first operation of C.
3. No fault occurs during traversal E1, . . . , En after state s.
3.1. Mutual exclusion is violated at state s.
3.2. A fault has occurred before s in some traversal E1, . . . , En.
3.3. Step 3 follows by fault frequency.
4. Each thread in τ1, . . . , τn executes Release(l) for each l ∈ LC before exiting C by definition
of LC and Step 3.
5. If some l ∈ LC is invalid, some thread crashes the process before exiting C, upon releasing
l and checking that l is invalid by Steps 3 and 4.
6. Else, if all l ∈ LC are valid
6.1. At most one thread holds all locks in LC .
6.2. Some thread τi crashes the process before exiting C, upon releasing some l ∈ LC and
checking that τi does not hold l, by Steps 3 and 4. □
Lemma 4.25 Let two threads τi and τj execute block Exec1 or Exec2 concurrently in traversals
Ei and Ej, respectively. Let a fault occur after the beginning of Ei or Ej. Assume block-confined
faults (Assumption 4.5). If the process does not crash, then threads τi and τj only end their
respective traversals after the other thread has finished executing the Exec1 or Exec2 block
and has verified all its state modifications in the Validate block.
Proof:
1. No fault occurs during Ei or Ej in blocks different from Exec1 or Exec2 by fault frequency
and block-confinement.
2. The entry of τi and τj in the counter vector c has been increased at state si and sj during
a fault-free period by Step 1.
3. c[i] or c[j] are valid at si and sj because else the process crashes before the end of Ei or
Ej by Step 2, contradiction.
4. c[i] or c[j] are set to an even value at si and sj by Steps 2 and 3.
5. The Validate block of τi completes only in the following cases (same argument for τj ):
5.1. CASE: csnap[j] is odd, or it is even and csnap[j] < c[j]
5.1.1. τj has increased csnap[j] at some state s ≻ sj , by Step 4.
5.1.2. State s can only occur after τj finishes executing the Exec1 or Exec2 block
and checks all its state modifications in the Validate block by Step 1.
5.2. CASE: csnap[j] is even and ¬Valid(c[j])
5.2.1. Variable c[j] is corrupted after the snapshot of c is taken, by Step 1 and since the
validity of the variables in c is checked in Line +3 of Algorithm 4.9.
5.2.2. The fault corrupting c[j] must have occurred after Ei and Ej are terminated, by fault
frequency. □
Lemma 4.26 Theorem 4.1 holds in processes with multiple threads.
Proof:
1. Two types of error propagation in multithreaded applications are not covered by Theo-
rem 4.1.
2. CASE: Mutual exclusion is violated.
2.1. This is ruled out by Lemmas 4.24 and 4.25.
3. CASE: A thread τi reads a corrupt variable v determined by an assignment from a different
thread τj .
3.1. Let τi be the first thread that reads a variable corrupted in this manner in a state s.
3.2. Let Ej be the traversal in which τj last writes v before s.
3.3. If τi has read v then τj has released the lock on v by Lemmas 4.24 and 4.25.
3.4. Locks are only released after the Validate block.
3.5. τj does not modify v after releasing the locks protecting v by definition of s.
3.6. The value read by τi is the final value of v in Ej .
3.7. Theorem 4.1 ensures that τi can detect the correctness of v . □
A corollary of Lemma 4.26 is the following.
Corollary 4.1 Error isolation holds in presence of multiple threads.
4.3.4 DISCUSSION
We now discuss some limitations of our multithreading support in SEI and one possible
alternative solution.
LIMITATIONS
Fault coverage. Our barrier mechanism assumes block-confined faults (see Lemma 4.25), an
assumption that is not needed in the single-threaded case. The reason for that assumption can
be illustrated with an example. An ASC fault can bring a thread τi to jump from pc = 0 directly
inside a critical section CS. Assume τi is preempted after modifying some state variables
inside CS. Next, another thread τj gets into CS and reads the corrupt variables. Thread τj
can successfully acquire the locks protecting CS because τi skipped their acquisition due to
the ASC fault. Moreover, τj does not block on the barrier at the end of the traversal because
τi skipped the Init blocks altogether. Therefore, if τi is not scheduled before τj finishes its
traversal, then τj may send corrupt messages and violate error isolation.
Block-confined faults rule out this scenario because a fault skipping the Init block cannot
jump into an execution block. This assumption might be, however, unnecessarily strong. It
would be sufficient to argue that a fault will not skip the Init block and jump inside an execution
block and inside a critical section inside that block. Despite the slightly strong assumption,
our experiments in Section 4.5.1 do not reveal a major difference in the number of undetected
errors between single-threaded and multithreaded executions.
Concurrency. Our algorithm implements a two-phase locking (2PL) scheme since locks are
not released until the Validate block terminates. This locking protocol is known to limit the
available parallelism under contended workloads. Two further issues with SEI can aggravate
this problem. First, our algorithm lengthens critical sections by executing them twice.
In our experiments with memcached, however, we have not experienced a significant reduction
in the system’s throughput due to the locking scheme when enough parallelism is available
(i.e., when threads mostly access different keys). Second, the barrier approach can make fast
threads wait for slow threads. That can be mitigated by “preempting” the traversal and letting
the thread execute other work meanwhile – see Section 4.4.3 for details.
Deadlocks. SEI targets applications where locks are acquired in a consistent order (see
Assumption 4.6) because our 2PL approach might cause deadlocks otherwise. An alternative
solution that does not require hierarchical locks is to roll back the state modifications (as Reset
does) upon detecting a deadlock and then wait for a random period of time
before retrying. Note that the deadlock detection mechanism does not have to be hardened
because a misdetection only affects liveness but not safety.
THE MINI-TRAVERSALS APPROACH
Given the limitations of the 2PL scheme, one might consider the following alternative solution.
We split a traversal E into mini-traversals by considering every unlock event as an external-
ization event. Indeed, once a thread τi releases a lock l, the data protected by l can be read
by another thread τj , which in turn can use the data to create and send a message; hence,
unlocking can be seen as state externalization.
The mini-traversal approach can be realized by wrapping lock release functions. Whenever
a release function is called in a traversal E, before the lock is actually released, the traversal is
reset, re-executed, and checked (Reset, Exec2, and Validate blocks). After that, the traversal
sends a local message, effectively splitting the traversal E into two mini-traversals E1 and E2.
The local message represents the local state (including acquired locks) being transferred from
mini-traversal E1, which has just finished, to mini-traversal E2, which is about to start.
We have implemented this solution, but practical limitations made it very inefficient. In
particular, since traversals are split, a mini-traversal Ei might start at a deep level in the call stack
and end at a higher level in the call stack. If that occurs, all stack frames between both stack
levels have to be reset before the traversal Ei can start its Exec2 block. The need to copy
large portions of the stack after every lock release renders the mini-traversals approach hard
to employ in practice; for example, in memcached more than 10 lock releases might occur per
traversal, each copying as much as 800 bytes of stack data. In this chapter, we only consider
the 2PL scheme presented above (see also our future work suggestions in Section 4.8).
4.4 SEI-HARDENING IMPLEMENTATION
We now present libsei, a library designed to semi-automatically harden benign-fault-tolerant
distributed systems. libsei does not require redeveloping the system from scratch, enabling
existing code to be hardened with little effort, as we discuss in this section and in
Section 4.4.3. We start by presenting the interface and usage of libsei. Next, we get into the
details of the implementation.
4.4.1 DEVELOPMENT EFFORT
Hardening an event handler using libsei only requires:
1. marking the beginning and end of the event handler using the macro functions __begin()
and __end();
2. calling __output_append(var, var_len) to indicate that a buffer pointed to by variable var
with size var_len is added to the current output message;
3. calling __output_done() to indicate that the output message is complete and its CRC
can be finalized and added to the output buffer;
4. appending CRCs to output messages after retrieving them by calling __crc_pop(); and
5. starting the compiler as described below.
The developer must include all operations modifying the state of the process as part of the
event handler enclosed by __begin() and __end(). Message handling and message sending
are external to libsei and do not require interaction with the library. In contrast to SEI’s
specification, message replicas are represented as CRCs in libsei. That does not, however,
affect the correctness of the SEI algorithm.
char* omsg; size_t olen;              // output message
char imsg[MAX_LEN]; size_t ilen;      // input message buffer
uint32_t crc;
while (1) {
    ilen = recv_msg_and_crc(imsg, &crc);
    // hardened event handler
    if (__begin(imsg, ilen, crc)) {
        do_something_here(imsg);
        omsg = create_a_message_here(&olen);
        __output_append(omsg, olen);
        __output_done();              // finalize CRC
        __end();
    } else continue;                  // discard invalid input
    send_msg_and_crc(omsg, olen, __crc_pop());
}
Figure 4.8: Example of an event loop and a hardened handler
Figure 4.8 shows the pseudo-code of a typical event-based process. The functions provided
by libsei are prefixed with “__”; all remaining code is part of the pre-existing code base that
needs to be hardened. Apart from adding some annotations and including CRCs in messages,
which is good practice anyway, there is not much a developer needs to do for hardening.
When the hardened system runs with multiple threads, the function __barrier() (not
depicted in the example) returns false if the thread should wait for another thread to complete
the execution of its handler. The developer is responsible for calling __barrier() and blocking
message sending while it returns false. In Section 4.4.3, we discuss how to mitigate this
blocking overhead.
AUTOMATIC INSTRUMENTATION
Scaling to large or existing code bases requires minimizing the development effort of us-
ing libsei. A major challenge in hardening an application with SEI is capturing state up-
dates. A tedious and error-prone approach would be to manually modify all operations with
calls to the library interface; for example, a developer manually translating the increment
(*p)++ of a counter variable in the heap would first have to read the value pointed to by p,
then increment it, and finally store it in p again. The translated operation would resemble
__write(p, __read(p)+1), where __write() and __read() are internal functions of libsei.
Such an approach can only work for small examples, but not for existing code bases. Our approach
is to automatically intercept such memory operations using a compiler transformation that is
available out of the box. In particular, we use the transactional memory (TM) support of gcc,6 which
is available from version 4.7. The TM compiler option redirects all memory operations within
the __begin() and __end() markers to a standardized application binary interface (ABI) [Int09].
Note that libsei provides the ABI, but it neither implements nor relies on a TM algorithm. It
merely executes procedures that store snapshots and state updates and performs validation,
as described in Section 4.2.2. The macros marking the beginning and the end of event handlers
translate into compiler keywords (specifically, __transaction_atomic { code }), which
instruct the compiler to instrument the event handler code. To execute the event handler
twice, we implement a mechanism similar to setjmp()/longjmp(), which is also used to
abort transactions in transactional memory implementations.
6 http://gcc.gnu.org/wiki/TransactionalMemory
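How a setjmp()/longjmp()-style mechanism can run a handler twice from the same entry point can be sketched as follows. This is an illustration with standard C library calls only, not the mechanism implemented in libsei: the handler, the single int of state, and all sketch_* names are assumptions. Values that change between setjmp() and longjmp() are kept in static variables, since modified non-volatile locals would be indeterminate after the jump.

```c
#include <setjmp.h>
#include <stdbool.h>

static jmp_buf sketch_entry;     /* context saved at handler entry */
static int sketch_execution;     /* 1 during the first run, 2 during the second */
static int sketch_first_result;  /* state produced by the first run */

/* Hypothetical event handler: its only state update is an increment. */
static void sketch_handler(int *state) {
    (*state)++;
}

/* Runs the handler twice from the same entry point; returns true if
 * both executions leave the state with the same value. */
static bool sketch_run_twice(int *state) {
    const int snapshot = *state;      /* value to restore before the second run */
    sketch_execution = 0;
    (void)setjmp(sketch_entry);       /* both executions resume from here */
    sketch_execution++;
    sketch_handler(state);
    if (sketch_execution == 1) {
        sketch_first_result = *state; /* remember the new value */
        *state = snapshot;            /* reset the memory state */
        longjmp(sketch_entry, 1);     /* jump back for the second execution */
    }
    return *state == sketch_first_result;
}
```

The comparison at the end plays the role of validation: a transient fault that affects only one of the two executions makes the results differ.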
An out-of-the-box compiler transformation has one advantage over designing a custom compiler
pass: it minimizes the cost of adopting the approach. In the case of libsei, only a library
needs to be installed in addition to a standard compiler. Moreover, since the transactional
memory ABI is in the process of standardization, other compilers implementing the same
interface can be used to harden processes. A drawback of using the TM ABI for a purpose
other than the one it was designed for is that the compiler sometimes cannot optimize the transformed
code in the way a custom transformation could (see issues in Section 4.4.3).
Remarks on external functions. External functions called within hardened event handlers
have to be treated with care. If an external function neither has side effects nor modifies
its arguments, e.g., strlen(), then the developer can annotate it with transaction_pure to
instruct the compiler not to instrument the function. If the external function modifies state,
the compiler requires the source code of the function to redirect writes to the TM’s ABI. In
our prototype, we integrate into libsei a series of functions from libc, especially functions in
string.h since these are commonly used in C programs.7 Finally, if the external function
performs an external action, e.g., sends a message or prints on the display, the developer has
to decide either (1) to allow the function to be called twice (once per handler execution), or (2)
to wrap the function manually. For example, since fprintf() is only used to write in log files
in our example applications (see Section 4.4.3), we do not instrument calls to it. In contrast,
functions such as send() have to be wrapped as described in Section 4.4.2.
MANUAL INSTRUMENTATION
Although the instrumentation of all write accesses within event handlers is done automati-
cally, other parts of the instrumentation have to be manually performed, in particular, marking
event handler boundaries, marking message buffers and introducing the CRCs into messages.
Nevertheless, manual instrumentation is only a minor part of the instrumentation and makes
the hardening flexible.
Local handlers. libsei supports local handlers, which help the developer call event handlers
without explicitly receiving a message. The developer marks the event handler with
__begin_nm(), which takes no message as an argument. Local handlers are important for hardening
existing code bases, as we discuss below, because applications do not always follow the
receive-handle-send pattern required by SEI. The caveat of using __begin_nm() is that duplicated
or invalid handling of events cannot be detected since no message is passed as an
argument. In our fault injection experiments, however, such problems have not manifested.
Performance tuning. The manual instrumentation can also help to improve the performance
since the developer can choose what instructions are part of event handlers. For example, if a
piece of code only manipulates a performance statistics variable, the developer might decide
to keep the code outside any event handler since the variable does not contain critical data
for the application safety.
4.4.2 LIBRARY INTERNALS
Figure 4.9 depicts the main data structures of our prototype library libsei: the input and
output message buffers ibuf and obuf, and the state buffer sbuf. The input buffer contains
a pointer to the input message, its length, and its CRC. The output buffer contains CRCs
of the original and replica messages, which are used for validation (Validate block) and for
end-to-end message corruption detection. The state buffer holds two data structures. First, a
set containing the original values of the variables modified in the Exec1 block as well as the new
values after Exec1, working as the snapshot buffer O and the new-value buffer N. Second,
sbuf contains the update buffer U with pointers to the variables modified during Exec2,
used in Validate. sbuf also keeps a snapshot of the CPU registers to be reset for the second
execution. During the Reset block, in addition to resetting the modified variables in memory,
the registers are also set back to their original values and the instruction pointer is reset to
the first instruction of the handler, marking the start of the second execution. Note that the
registers are not compared across the two executions since they are local variables.
Figure 4.9: libsei data structures and event handling (ibuf with the input message, len, and crc; obuf with columns Vo and V̈o, each listing crc1, crc2, crc3, for the output messages; sbuf with buffers O/N holding a1,v1 and a2,v2, buffer U holding a1 and a2, and the saved registers regs used for the second execution)
7 The functions are taken from OpenBSD’s libc implementation: http://www.openbsd.org
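The interplay of the buffers O, N, and U can be sketched as follows. This is a simplified model under stated assumptions, not the libsei implementation: state consists of int variables only, the buffers are small fixed-size arrays, registers are not modeled, and all sketch_* names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_WRITES 32

typedef struct {
    int *addr;     /* modified variable */
    int  old_val;  /* snapshot buffer O: value before Exec1 */
    int  new_val;  /* new-value buffer N: value after Exec1 */
} sketch_on_entry;

typedef struct {
    sketch_on_entry on[SKETCH_MAX_WRITES];
    size_t n_on;
    int   *u[SKETCH_MAX_WRITES];   /* update buffer U: addresses written in Exec2 */
    size_t n_u;
} sketch_sbuf;

/* Write hook for the first execution: fill O and N, then write. */
static bool sketch_write_exec1(sketch_sbuf *b, int *addr, int val) {
    size_t k = 0;
    while (k < b->n_on && b->on[k].addr != addr) k++;
    if (k == b->n_on) {                       /* first write to this address */
        if (k == SKETCH_MAX_WRITES) return false;
        b->on[k].addr = addr;
        b->on[k].old_val = *addr;
        b->n_on++;
    }
    b->on[k].new_val = val;
    *addr = val;
    return true;
}

/* Reset: restore every modified variable to its original value. */
static void sketch_reset(sketch_sbuf *b) {
    for (size_t k = 0; k < b->n_on; k++) *b->on[k].addr = b->on[k].old_val;
}

/* Write hook for the second execution: remember the address, then write. */
static bool sketch_write_exec2(sketch_sbuf *b, int *addr, int val) {
    size_t k = 0;
    while (k < b->n_u && b->u[k] != addr) k++;
    if (k == b->n_u) {
        if (k == SKETCH_MAX_WRITES) return false;
        b->u[b->n_u++] = addr;
    }
    *addr = val;
    return true;
}

/* Validate: both executions wrote the same addresses and left the same values. */
static bool sketch_validate(const sketch_sbuf *b) {
    if (b->n_u != b->n_on) return false;
    for (size_t k = 0; k < b->n_u; k++) {
        bool match = false;
        for (size_t m = 0; m < b->n_on; m++)
            if (b->on[m].addr == b->u[k] && *b->u[k] == b->on[m].new_val) match = true;
        if (!match) return false;
    }
    return true;
}
```

A handler is then run as Exec1 with sketch_write_exec1, followed by sketch_reset, Exec2 with sketch_write_exec2, and sketch_validate; a false result corresponds to aborting the process.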
libsei also tracks lock acquisitions (wrapping the pthread mutex interface) with a queue Q
and memory management by wrapping malloc() and free(). After calling these functions in
Exec1, libsei saves the arguments and return values of the calls in a queue (see Section 4.3).
In the Exec2 block, after checking that the arguments are the same as in Exec1, the calls to these
functions return the values in the queue. The actual deallocations, similarly to lock releases,
are postponed to the end of Validate. In general, any function performing an external action
– e.g., sending a message – called inside an event handler has to be wrapped since it will
be executed twice; the compiler terminates with an error otherwise. Among others, libsei
currently wraps sendto() and sendmsg(), postponing their calls until the end of the second
execution. No wrapper is necessary for external actions performed outside the event handler.
For example, all calls sending messages in memcached follow this pattern.
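The record-and-replay treatment of malloc() and the postponed deallocation can be sketched as follows. The log layout and all sketch_* names are hypothetical. For brevity, the sketch queues a free only once (in the first execution) and omits the argument check for free in the second execution.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_MAX_CALLS 16

/* Hypothetical per-traversal log of malloc() calls and deferred frees. */
typedef struct {
    size_t size[SKETCH_MAX_CALLS];      /* argument seen in the first execution */
    void  *ptr[SKETCH_MAX_CALLS];       /* value returned in the first execution */
    size_t recorded;                    /* calls logged during the first execution */
    size_t replayed;                    /* calls answered during the second execution */
    void  *deferred[SKETCH_MAX_CALLS];  /* pointers whose free() is postponed */
    size_t n_deferred;
} sketch_alloc_log;

/* First execution: really allocate and remember argument and result. */
static void *sketch_malloc_exec1(sketch_alloc_log *log, size_t size) {
    if (log->recorded == SKETCH_MAX_CALLS) return NULL;
    void *p = malloc(size);
    log->size[log->recorded] = size;
    log->ptr[log->recorded] = p;
    log->recorded++;
    return p;
}

/* Second execution: no allocation takes place. Returns false if the call
 * diverges from the first execution (extra call or different argument). */
static bool sketch_malloc_exec2(sketch_alloc_log *log, size_t size, void **out) {
    if (log->replayed >= log->recorded) return false;
    if (log->size[log->replayed] != size) return false;
    *out = log->ptr[log->replayed++];
    return true;
}

/* free() inside a handler only queues the pointer (first execution only). */
static bool sketch_free_deferred(sketch_alloc_log *log, void *p) {
    if (log->n_deferred == SKETCH_MAX_CALLS) return false;
    log->deferred[log->n_deferred++] = p;
    return true;
}

/* After validation: perform the postponed deallocations. */
static void sketch_flush_frees(sketch_alloc_log *log) {
    for (size_t k = 0; k < log->n_deferred; k++) free(log->deferred[k]);
    log->n_deferred = 0;
}
```

Returning the recorded pointer in the second execution keeps both executions working on the same addresses, which is what allows their state updates to be compared.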
Our implementation has some simplifications in relation to the complete SEI algorithm
(Section 4.2.2). First, libsei relies on memory error detection codes to keep variable replicas
and execute validity checks. This allows us to nearly eliminate the memory overhead of state
replication and the CPU overhead of validation – calls to the Check function (Algorithm 4.4)
are automatically done by the hardware. More specifically, each thread in libsei requires
about 30 KiB of memory for hardening-related data structures (e.g., obuf, sbuf, . . . ). At the
moment, libsei does not execute validity checks on state stored on disk. Second, our control-
flow checking implementation uses exactly the same gates as PASC [Cor+12b], reducing the
number of bookkeeping flags and the complexity of the implementation. Third, several checks
in SEI are simplified to only execute the Check function: Lines +4, +7, +10 of Rule 4.3;
Line +2 of Rule 4.4; and Lines 38, 41, 44 of Reset block. Finally, libsei data structures are
initialized only once in Init. In Section 4.5, we evaluate the fault coverage of libsei given
these simplifications.
As already mentioned, replica messages are transmitted and stored with CRCs in our implementation.
Since libsei calculates 32-bit CRCs for every input and output message,
efficient CRC computation is critical to obtain reasonable performance. libsei calculates
CRCs using the SSE4.2 hardware extensions, which are commonly found even on consumer
laptops [Int07], and falls back to the efficient slicing-by-8 algorithm [KB05] in the absence of the
extensions. This greatly reduces the cost of calculating CRCs, although there is still a linear
overhead associated with the message size.
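For reference, a 32-bit CRC can be computed incrementally in plain C as sketched below. The sketch uses the simple bit-by-bit method rather than slicing-by-8 or the hardware instruction, and it assumes the CRC-32C (Castagnoli) polynomial, which is the one the SSE4.2 crc32 instruction implements; the initial value, the final XOR, and all sketch_* names are assumptions and not taken from libsei.

```c
#include <stddef.h>
#include <stdint.h>

/* Reflected CRC-32C (Castagnoli) polynomial. */
#define SKETCH_CRC32C_POLY 0x82F63B78u

/* Start value of a running CRC. */
static uint32_t sketch_crc_init(void) {
    return 0xFFFFFFFFu;
}

/* Feed len bytes into a running CRC, one bit at a time. */
static uint32_t sketch_crc_update(uint32_t crc, const void *buf, size_t len) {
    const unsigned char *p = buf;
    while (len--) {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (SKETCH_CRC32C_POLY & (0u - (crc & 1u)));
    }
    return crc;
}

/* Finish the CRC once the message is complete. */
static uint32_t sketch_crc_final(uint32_t crc) {
    return crc ^ 0xFFFFFFFFu;
}
```

The update function can be called once per appended buffer, so a message assembled from several parts yields the same CRC as the concatenated bytes. Its cost is linear in the message size, as noted above.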
4.4.3 HARDENING REAL-WORLD CODE BASES
To evaluate the effort of applying SEI, we have hardened two applications implemented in C:
memcached and Deadwood. memcached is a popular multithreaded in-memory key-value cache8,
highly optimized for performance, that exposes a get/set request interface to remote clients.
memcached is essentially a large hashtable with an LRU eviction logic with linked lists to evict
items. Deadwood is the single-threaded recursive DNS resolver of MaraDNS9. We have used
memcached 1.4.15 and Deadwood version 3.2.05. Deadwood has a modular code base, and is
designed to be immune to several spoofing attacks common to DNS servers.
There are three main steps to harden an existing code base.
STEP 1: EVENT HANDLERS ANNOTATION
The initial challenge when hardening an existing code base is in choosing the right place for
the event handler markers. A good understanding of the code base is necessary to determine
what state is persistent across the processing of multiple requests. In memcached, we have
marked 8 event handlers and added 7 lines related to CRC of messages. More than 120
functions were automatically instrumented. In Deadwood, we have marked 2 handlers and
added 8 lines related to CRC. More than 170 functions were automatically instrumented.
STEP 2: CODE BASE ADAPTATION
Hardening required modifying and adding about 60 code lines to memcached. Applications
do not always cleanly follow the pattern “message receiving, message handling, message
sending”. After an event handler of a get request retrieves an item, the content is sent back
to the client in the message-sending phase. Afterward, memcached needs to decrement the
reference counter of the item, which, being part of the state, should also be modified in
hardened handlers. For such cases we have used local event handlers.
Thread synchronization and nondeterminism between two executions of the same event
handler have also to be considered. SEI currently only supports lock-based synchronization
(see Section 4.3). The slab allocator of memcached, for example, uses ad hoc synchronization,
so we disabled it for the hardened version. We left it enabled for the original version, however.
In our experiments, the slab allocator plays a minor role since our workloads are mostly
get requests, which do change state but do not allocate new memory objects. Moreover,
memcached’s main thread periodically reads the clock and updates a global variable called
current_time. Worker threads access this variable without any synchronization. Therefore,
two executions of the same handler can see different clock readings. We solve this issue
with 3 lines of code. First, the current_time variable is declared as thread-local; second, the
main thread uses an auxiliary variable to read time; third, each worker thread updates its own
copy of current_time between event handlers when going to the main event loop.
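The three-step fix can be illustrated with the following C11 sketch. It is not the memcached patch itself: the variable and function names are hypothetical, the clock value is passed in explicitly to keep the example deterministic, and an atomic variable is used for the auxiliary clock as one possible way to share it between threads.

```c
#include <stdatomic.h>

/* Auxiliary variable: written only by the main thread. */
static _Atomic long sketch_clock_aux;

/* Thread-local copy of the time: the only clock that handlers read. */
static _Thread_local long sketch_current_time;

/* Main thread: publish a new clock reading. */
static void sketch_main_thread_tick(long now) {
    atomic_store(&sketch_clock_aux, now);
}

/* Worker thread: refresh the local copy between two event handlers. */
static void sketch_refresh_time(void) {
    sketch_current_time = atomic_load(&sketch_clock_aux);
}

/* Inside a handler: both executions observe the same value. */
static long sketch_handler_time(void) {
    return sketch_current_time;
}
```

Since the local copy only changes in sketch_refresh_time, a clock update that arrives between the first and the second execution of a handler cannot make the two executions diverge.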
Deadwood required only a few changes. We have modified 2 code lines of Deadwood
to move a buffer from the stack to the heap in order to be able to reset updates. Unlike
memcached, Deadwood calls sendto() within event handlers, e.g., if a query is received
for an entry not present in the resolver, it forwards the request upstream and replies to the client
afterwards. Hence, we added to libsei wrapper functions that delay such message-sending
functions until both executions of the handler have finished.
8http://memcached.org
9http://maradns.samiam.org/deadwood
STEP 3: PERFORMANCE TUNING
In some cases, the TM compiler might “over-protect” the code from the SEI’s point of view.
For example in Deadwood, dozens of strings are allocated and freed in the scope of a single
handler; although these strings are in the heap memory, they are local variables of the handler
and do not have to be protected. The developer can instruct libsei to ignore writes into a
region of memory, e.g., into a string, by calling __ignore_addr(addr, size). Moreover, if
a complete function only modifies local variables, the instrumentation of the function can be
disabled by declaring it with the transaction_pure attribute.
To mitigate the effect of the barrier (see Section 4.3) on the system scalability, the developer
can adapt the system to handle other requests while a thread is waiting for other threads to
terminate executing concurrent event handlers. In particular, memcached always serves another
connection if sending a message would block the thread. Therefore, when a thread has to
wait for the barrier, we simply fake a “would-block” error. The caveat of this solution is the
further 40 lines of code added to memcached. An alternative solution is configuring libsei to
disable the barrier altogether, allowing threads to terminate without waiting for other threads.
This solution requires no additional code change, but assumes locks cannot be skipped by an
ASC fault affecting the control flow. We have implemented and evaluated both approaches
and report the results next.
4.4.4 FURTHER INSTRUMENTATION CHALLENGES
We close this section with a discussion of the difficulties when using the transactional memory
compiler for hardening. An open question is whether some of these issues could be solved
by developing a custom SEI compiler pass to perform the event handler instrumentation.
ANNOTATIONS
More than 100 different functions are called inside the event handlers of memcached and Dead-
wood and have to be hardened. To allow the compiler to generate the instrumented clones,
these functions have either to be compiled in the same module where the transactions are
declared or their prototypes have to be annotated with the transaction_safe keyword. We
opted for the former approach for memcached and the latter approach for Deadwood. The
approaches do not differ on the generated binaries, only on the development process. The
former breaks module encapsulation and requires the developer to solve name clashes de-
pending on the source code10. The latter requires the tiresome annotation of every function
used in the event handlers – Vyas, Liu and Spear report on a similar process when transaction-
alyzing memcached [VLS13]. A helpful compiler option here would be to instruct the compiler
to instrument all functions of a given module without having to annotate them.
CODE INLINING AND READ OPERATIONS
libsei could not be inlined with the memcached code even using gcc’s link-time optimization
(-flto flag). The reason seems to be the transactional memory pass. Consequently, every
read and write operation inside the hardened handlers has to perform a function call, incurring
extra overhead. That is especially unfortunate for read operations, since the SEI algorithm with
hardware error detection codes does not require them to be intercepted. The consequence of
read memory operations being redirected to functions is twofold. First, the compiler cannot
carry out some optimizations that execute operations out of order. Second, at run
time an extra function call is performed, introducing additional costs to push variables on the
stack, jump to the read function code, execute the read, pop variables from the stack, and return.
10 It might, however, allow for better optimizations by the compiler, since all the code is located within a single
compilation unit.
COMPILER OVER-INSTRUMENTATION
The compiler sometimes cannot decide whether an address is local (on the stack) or not.
Consider the following two functions:
void foo(int* p) { (*p)++; }
void bar() { int p = 0; foo(&p); }
The variable p resides on the stack of bar(), but the compiler might not be able to know
this information in foo(). In such cases, the compiler conservatively instruments the access
to p and leaves to libsei the responsibility of checking whether the address is within the
stack boundaries. Consequently, libsei has to perform the additional check for every written
address, which adds overhead.
4.5 FAULT COVERAGE EVALUATION
The evaluation of SEI comprises two parts: fault coverage and performance. In this section,
we report our fault coverage results; the performance results appear in Section 4.6. Since our
results for memcached and Deadwood are consistent, we briefly mention Deadwood experi-
ments only to reinforce our results with memcached.
To assess fault coverage, we performed two groups of fault injection experiments. The first
group consists of an extensive software fault injection campaign. Our goal is to determine
(1) whether SEI effectively guarantees error isolation; and (2) how memory and computation
scalability affect fault coverage. The second group of experiments consists of hardware fault
injection using the dynamic voltage scaling of a processor. Our goal here is to collect evidence
that our approach can indeed detect and isolate real, physically-induced faults.
4.5.1 SOFTWARE FAULT INJECTION
SETUP AND METHODOLOGY
In our software fault injection experiments, we follow the approach of Basile et al. [Bas+03]
and Correia et al. [Cor+12b] injecting single bit flips. To inject faults during runtime, we have
developed a bit-flip injector (BFI) employing Intel’s Pin dynamic binary instrumentation frame-
work [Luk+05]. BFI can inject the three groups of faults described in Table 4.3.
Fault types. A control-flow (CF) fault flips a bit of the instruction pointer. A fault in the
data-flow (DF) group affects the computation: WREG and WVAL represent incorrectly computed
values that are written into a register or a memory location, respectively, e.g., an addition
that results in a wrong value and is stored in a register; WADDR and RADDR represent errors
while calculating an indexed address for writing to or reading from memory, respectively.
Finally, a fault in the RD group (RD stands for read) either directly corrupts a register
before it is used in an operation (RREG) or directly corrupts a memory location before it is
read into a register (RVAL).
Field studies show that most memory faults are detected by ECC [HSS12; SPW09]. Injected
RVAL faults, however, automatically overwrite both the value and its ECC. Hence, RVAL faults
represent worst-case scenarios in which the ECC memory is not able to detect data corruption
as assumed by fault diversity (see Section 4.2.1).
Group  Fault  Description
CF     CF     IP register changes (control-flow fault)
DF     WREG   register value changes after it is written
DF     WVAL   memory value changes after it is written
DF     WADDR  calculated address changes before write
DF     RADDR  calculated address changes before read
RD     RREG   register value changes before it is read
RD     RVAL   memory value changes before it is read
Table 4.3: Fault types supported by BFI
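All fault types in Table 4.3 reduce to flipping a single bit of some value (instruction
pointer, register, memory word, or calculated address). As a minimal sketch of that primitive,
independent of Pin and of the actual BFI code, consider the hypothetical helper flip_bit():

```c
#include <assert.h>
#include <stdint.h>

/* Flip bit `bit` (0 = least significant) of a 64-bit value.
 * Hypothetical helper; it only illustrates the single-bit-flip fault model. */
static uint64_t flip_bit(uint64_t value, unsigned bit)
{
    assert(bit < 64);
    return value ^ (UINT64_C(1) << bit);
}
```

Applying the same flip twice restores the original value, which is why a single flip always
changes the affected value.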
Variants. Our experiments employ the following application variants: MC, the unhardened
memcached; MC-SEI, the SEI-hardened memcached; MC-SEIL, a SEI-hardened memcached that has
the barrier disabled and assumes locks are not skipped; MC-SEI-DUP, a SEI-hardened memcached
that duplicates the state in memory instead of relying on ECC; DW, the unhardened Deadwood;
and DW-SEI, the SEI-hardened Deadwood.
Experiment groups. We perform two sets of experiments. The first set studies the fault
coverage of SEI and the effects of leveraging hardware error detection codes in the
implementation. We run single-threaded experiments with MC, MC-SEI, MC-SEI-DUP, DW, and
DW-SEI. We perform 8,000 executions for each fault type and each single-threaded variant of
memcached, with a subtotal of 96,000 executions for the DF group, 48,000 for the RD group,
and a total of 168,000 executions (see Table 4.4). For each Deadwood variant, we perform
4,000 executions for each fault type, with a total of 56,000 executions. The second set of
experiments investigates whether the computational scalability aspect of our implementation
affects the fault coverage. In this set, we run MC-SEI and MC-SEIL with 4 threads. For these
multithreaded experiments, we perform a total of 80,000 executions (see Table 4.6).
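The execution counts above follow from multiplying the number of variants, the number of fault
types, and the number of executions per combination. The following sketch only restates this
arithmetic; the function name executions() is ours.

```c
#include <assert.h>

/* Number of fault injection executions for a set of experiments. */
static long executions(long variants, long fault_types, long runs_per_combination)
{
    assert(variants >= 0 && fault_types >= 0 && runs_per_combination >= 0);
    return variants * fault_types * runs_per_combination;
}
```

For example, three single-threaded memcached variants, seven fault types, and 8,000 runs
yield 168,000 executions; two multithreaded variants with the five CF and DF fault types
yield 80,000.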
Fault injection. Each fault injection execution consists of three phases. The warmup phase
populates the state of the application, but does not inject faults. The injection phase issues
commands of a synthetic workload and injects one fault at a randomly selected instruction.
Finally, the propagation phase retrieves again all entries in the application without injecting
further faults. In memcached, the warmup phase issues set commands, the propagation phase
issues get commands, and the injection phase issues both types of commands. In Deadwood,
all phases issue DNS queries to resolve names. During warmup, Deadwood requests the entry
from the upstream if the entry is not in the resolver’s cache.
In each execution, one fault is injected at a randomly selected instruction inside or outside
the event handler, including shared libraries; Pin cannot, however, instrument instructions
inside syscalls. Note that some instructions are not susceptible to every fault; for example,
an instruction that does not write to memory cannot suffer a WADDR fault. In such cases,
we inject the fault in the first susceptible instruction after the selected one. Moreover, if
multiple register or address operands are susceptible to the fault, then the operand is
selected randomly.
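The selection rule can be sketched as a forward scan over the dynamic instruction stream. The
array-based representation and the name first_susceptible() are assumptions made for
illustration; BFI itself works on Pin's instrumentation callbacks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* susceptible[i] tells whether instruction i can suffer the chosen fault type.
 * Returns the index of the first susceptible instruction at or after
 * `selected`, or n if no such instruction exists. */
static size_t first_susceptible(const bool *susceptible, size_t n, size_t selected)
{
    assert(selected <= n);
    for (size_t i = selected; i < n; i++) {
        if (susceptible[i])
            return i;
    }
    return n;
}
```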
To speed up our fault injection experiments and make the results reproducible, we have
modified memcached and Deadwood to read commands from an input-trace file and write
responses into an output-trace file by wrapping the functions that read from and write to
sockets. To compare the output trace, we first create an output-trace file for a golden run,
i.e., an execution without faults. The output trace of each execution is compared with that
of the golden run. A fault that causes a trace deviation, e.g., an unexpected message or a
shorter trace, produces a manifested error. All errors we report are manifested;
consequently, we often refer to them simply as errors.
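The comparison against the golden run can be sketched as follows. We assume, for
illustration only, that a trace is an array of message strings; the names trace_result_t and
compare_traces() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum {
    TRACE_MATCH,    /* identical to the golden run: no manifested error */
    TRACE_SHORTER,  /* a proper prefix of the golden run */
    TRACE_DEVIATES  /* unexpected, modified, or additional message */
} trace_result_t;

/* Compares an observed output trace with the golden-run trace. */
static trace_result_t compare_traces(const char *const *golden, size_t n_golden,
                                     const char *const *observed, size_t n_observed)
{
    assert(golden != NULL && observed != NULL);
    size_t common = n_golden < n_observed ? n_golden : n_observed;
    for (size_t i = 0; i < common; i++) {
        if (strcmp(golden[i], observed[i]) != 0)
            return TRACE_DEVIATES;
    }
    if (n_observed < n_golden)
        return TRACE_SHORTER;
    if (n_observed > n_golden)
        return TRACE_DEVIATES;
    return TRACE_MATCH;
}
```

Both TRACE_SHORTER and TRACE_DEVIATES correspond to manifested errors in the sense used here.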
FAULT COVERAGE AND MEMORY SCALABILITY
We initially experimented with a single thread to observe the effects of injected faults
without the effects of concurrent access. Table 4.4 summarizes the results of our fault
injection experiments with memcached. The right-most column shows, for each
fault-type/variant combination, the total number of manifested errors out of 8,000
injections. Manifested errors are classified as detected or undetected and are shown as a
percentage of the total number of manifested errors. Undetected errors are corrupt output
messages that cannot be detected by the client. They correspond to error propagation
scenarios in which the error isolation property is violated. Detected errors are further
divided into SEI-detected errors, i.e., errors detected and isolated by libsei, for example,
crashes initiated by the library or invalid messages detectable at the client; and
Crash/other errors, i.e., errors detected or isolated by other mechanisms, for example,
crashes due to segmentation faults or assertions, infinite loops, and also error messages or
partial messages detectable at the client.
The most important result of our fault injection experiments is the drastic decrease of
undetected errors when hardening applications with libsei. Aggregating all fault types, MC
shows 33.43% undetected errors, while MC-SEI shows only 0.25%. Undetected errors in the MC
variant range from 9.66% up to 69.92% of the errors, depending on the fault type. In contrast,
the MC-SEI variant shows at most 0.83% undetected errors. The SEI-detected column in
Table 4.4 shows that libsei effectively detects and isolates from 14% up to 82% of the errors
(47.05% aggregated over all fault types).
Aggregating the results per fault group, the MC-SEI variant shows 0.15% undetected errors for
DF faults and 0.52% for RD faults. RD faults are typically detected by error detection code
mechanisms in hardware. Therefore, our results indicate that SEI is also resilient to
unexpected faults, e.g., an error that the ECC memory cannot detect. Like PASC [Cor+12b],
MC-SEI-DUP uses software-duplicated state and detects all injected faults, including RD
faults. As this chapter focuses on the use of hardware error detection, we henceforth analyze
errors manifested specifically in the variants without duplication.
The results above are not restricted to memcached. Table 4.5 presents results for Deadwood
experiments under equivalent conditions. Hardening Deadwood reduces the undetected er-
rors from at least 8.71% down to at most 0.53%. Aggregating the results for faults in the DF
group only, DW shows 32.38% undetected errors, whereas DW-SEI only 0.12%.
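The aggregated rows of Tables 4.4 and 4.5 are consistent with averaging the per-fault-type
percentages weighted by the total number of manifested errors. The following sketch restates
this computation; the function name aggregate_pct() is ours.

```c
#include <assert.h>
#include <stddef.h>

/* Weighted average of per-fault-type percentages, where the weight of each
 * fault type is its total number of manifested errors. */
static double aggregate_pct(const double *pct, const long *errors, size_t n)
{
    double weighted = 0.0;
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        assert(errors[i] >= 0);
        weighted += pct[i] * (double)errors[i];
        total += errors[i];
    }
    return total > 0 ? weighted / (double)total : 0.0;
}
```

With the undetected percentages of MC for WREG, WVAL, WADDR, and RADDR (34.50, 69.92, 45.51,
32.25) and the corresponding totals (4194, 3304, 3564, 4118), the function yields
approximately 44.18%, the value reported for the DF group.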
Silent and non-silent errors. We now study in more detail how errors manifest specifically
in the MC-SEI variant. We first classify the errors presented above in an orthogonal way,
dividing them into silent and non-silent errors. Silent errors are errors that are perceived
by the client as a crash, i.e., the TCP socket closes before any further bytes are received
(see footnote 11). Note that all undetected errors are non-silent.
Figure 4.10 (left) shows that the majority of silent errors are either crashes of the process
due to a segmentation fault or an assertion, or detections of inconsistent pointers by
libsei, meaning that different pointers were used in the two handler executions. Both
manifestations together account for more than 75% of all errors.
Another large share of the errors detected by the hardening are inconsistent values,
constituting from 4% for CF up to 19% for WVAL. Hangs and cf/other errors detected constitute
no more than 1.3% of the errors. The latter represents control-flow violations libsei detects
Footnote 11: In fact, we recognize a silent error as an output-trace file shorter than
expected with no partial or unexpected messages in the file.
Fault             Variant      Undetected  SEI-detected  Crash/other  Total errors
CF                MC           9.66%       -             90.34%       6690
CF                MC-SEI       0.06%       14.72%        85.21%       6520
CF                MC-SEI-DUP   0.00%       9.87%         90.13%       6594
WREG              MC           34.50%      -             65.50%       4194
WREG              MC-SEI       0.13%       52.93%        46.94%       4493
WREG              MC-SEI-DUP   0.00%       40.41%        59.59%       4180
WVAL              MC           69.92%      -             30.08%       3304
WVAL              MC-SEI       0.16%       82.26%        17.58%       5063
WVAL              MC-SEI-DUP   0.00%       79.80%        20.20%       2510
WADDR             MC           45.51%      -             54.49%       3564
WADDR             MC-SEI       0.11%       61.20%        38.69%       5413
WADDR             MC-SEI-DUP   0.00%       46.88%        53.12%       4394
RADDR             MC           32.25%      -             67.75%       4118
RADDR             MC-SEI       0.21%       34.21%        65.58%       5302
RADDR             MC-SEI-DUP   0.00%       32.06%        67.94%       4907
DF (aggregated)   MC           44.18%      -             55.82%       15180
DF (aggregated)   MC-SEI       0.15%       57.57%        42.28%       20271
DF (aggregated)   MC-SEI-DUP   0.00%       45.81%        54.19%       15991
RREG              MC           25.55%      -             74.45%       5678
RREG              MC-SEI       0.21%       39.51%        60.28%       5708
RREG              MC-SEI-DUP   0.00%       34.09%        65.91%       5453
RVAL              MC           41.65%      -             58.35%       4936
RVAL              MC-SEI       0.83%       54.03%        45.14%       5813
RVAL              MC-SEI-DUP   0.00%       62.83%        37.17%       5989
RD (aggregated)   MC           33.04%      -             66.96%       10614
RD (aggregated)   MC-SEI       0.52%       46.84%        52.64%       11521
RD (aggregated)   MC-SEI-DUP   0.00%       49.13%        50.87%       11442
All (aggregated)  MC           33.43%      -             66.57%       32484
All (aggregated)  MC-SEI       0.25%       47.05%        52.70%       38312
All (aggregated)  MC-SEI-DUP   0.00%       39.96%        60.04%       34027
Table 4.4: Results of fault injection in memcached. Errors are classified as undetected,
detected with SEI, and detected with other mechanisms. Total errors are out of 8,000
executions of memcached for each fault-type/variant combination. Results are also aggregated
over all fault types as well as for faults in the DF group (WREG, WVAL, WADDR, RADDR) and in
the RD group (RREG, RVAL).
Fault             Variant  Undetected  SEI-detected  Crash/other  Total errors
CF                DW       8.71%       -             91.29%       3251
CF                DW-SEI   0.12%       9.33%         90.55%       3334
WREG              DW       26.41%      -             73.59%       1946
WREG              DW-SEI   0.09%       36.45%        63.46%       2140
WVAL              DW       35.95%      -             64.05%       1235
WVAL              DW-SEI   0.06%       56.54%        43.40%       1567
WADDR             DW       27.50%      -             72.50%       1818
WADDR             DW-SEI   0.28%       39.31%        60.40%       2139
RADDR             DW       40.14%      -             59.86%       2070
RADDR             DW-SEI   0.04%       41.42%        58.54%       2501
DF (aggregated)   DW       32.38%      -             67.62%       7069
DF (aggregated)   DW-SEI   0.12%       42.45%        57.43%       8347
RREG              DW       23.01%      -             76.99%       2686
RREG              DW-SEI   0.07%       24.88%        75.05%       3014
RVAL              DW       44.37%      -             55.63%       2565
RVAL              DW-SEI   0.53%       41.84%        57.63%       2818
RD (aggregated)   DW       33.44%      -             66.56%       5251
RD (aggregated)   DW-SEI   0.29%       33.08%        66.63%       5832
All (aggregated)  DW       27.80%      -             72.20%       15571
All (aggregated)  DW-SEI   0.18%       33.02%        66.80%       17513
Table 4.5: Results of fault injection in Deadwood. Errors are classified as undetected,
detected with SEI, and detected with other mechanisms. Total errors are out of 4,000
executions of Deadwood for each fault-type/variant combination. Results are also aggregated
over all fault types as well as for faults in the DF group (WREG, WVAL, WADDR, RADDR) and in
the RD group (RREG, RVAL).
Figure 4.10: Error manifestations in MC-SEI (left panel: silent errors, y-axis from 0 to 100%
of the manifested errors, categories hang, cf/other, values, pointers, crash; right panel:
non-silent errors, y-axis from 0 to 5% of the manifested errors, categories invalid,
omission, errormsg, undetected; x-axis in both panels: fault types CF, WREG, WADDR, WVAL,
RREG, RADDR, RVAL)
and other assertions such as differences in the number of written addresses in both execu-
tions. Errors due to control-flow faults are mostly crashes because some invalid instruction
is executed or due to segmentation faults, typically caused by inconsistent use of the stack.
Moreover, most detected errors due to control-flow faults are pointer or value errors, indicating
that control-flow faults often affect the state of the process.
Since hardening cannot guarantee fail-stop behavior, some errors are non-silent: clients
perceive them as unexpected messages. Figure 4.10 (right) shows these remaining errors.
Note that the undetected fraction is at most 0.21% for all fault types except RVAL, where it
reaches 0.83% (the numbers for MC-SEI also appear in the Undetected column of Table 4.4).
The detected non-silent errors are divided into three categories. A message is invalid if the
message CRC does not match the message payload. From 0.7% up to 3.4% of the errors in
MC-SEI are detected invalid messages, representing the majority of non-silent errors. The
other two error classes are not detected through hardening. Some messages arrive truncated
at the client, e.g., when memcached crashes before writing out the complete message.
Interestingly, memcached itself also produces error messages, for example, when a fault makes
memcached think it is out of memory. Truncated messages and error messages are classified
here as errormsg and constitute up to 1.2% of the errors. Finally, omission errors occur when
the client misses one reply despite an otherwise correct output trace; they constitute at
most 0.13% of the errors.
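The client-side validity check can be sketched as follows. The sketch assumes a message that
carries a 32-bit CRC over its payload and uses a bitwise software CRC-32C, the polynomial
computed by the SSE4.2 crc32 instruction; the exact CRC variant and message layout used by
the hardened applications are assumptions here, and the names crc32c() and
message_is_valid() are ours.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78. */
static uint32_t crc32c(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t crc = 0xFFFFFFFFu;
    assert(data != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        crc ^= p[i];
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* A received message is accepted only if the CRC sent along with it
 * matches the CRC recomputed over the received payload. */
static bool message_is_valid(const void *payload, size_t len, uint32_t received_crc)
{
    return crc32c(payload, len) == received_crc;
}
```

A client would drop every message for which message_is_valid() returns false, which turns a
corrupt message into an omission from the client's point of view.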
Analysis of the undetected errors. We analyzed the log files generated in our fault injection
experiments and identified pointer corruption as the major source of undetected errors in
MC-SEI. Leveraging hardware error detection, as SEI does, has the side effect that variables
and their replicas (the ECC data) are stored in the same memory location and accessed together
by the processor. A fault corrupting a pointer to a variable in an undetectable manner is
equivalent to a consistent corruption of the variable and its replica ECC data. Such a
scenario violates the fault diversity assumption and is consequently not covered by our fault
model.
About half of the undetected errors in MC-SEI could not be traced back to any code because
they occurred in system libraries. The other half of undetected errors occurred mostly outside
handlers as pointer corruptions. To understand a typical scenario of such error propagation,
consider the following instructions, which are executed upon completion of sending a reply.
// item *it = *(c->icurr);
mov (%rax),%rax
mov %rax,-0xe0(%rbp)
After replying to a get request, memcached decrements the previously incremented reference
counter of the retrieved item. The object c is the connection, and *(c->icurr) is the address
of the retrieved item, which is kept in the hashtable. The first instruction loads the address
of the current item, *(c->icurr), into register rax. The second instruction stores the address
on the stack, i.e., at the target address -0xe0(%rbp). In our logs, a WADDR fault flipped a
bit of the calculated address, making the mov operation write the pointer just after the
stack. The execution proceeded to decrement the reference counter, which is done in a
hardened local event handler. The pointer used, however, was the wrong pointer because the
address -0xe0(%rbp) still pointed to an old item in the hashtable. The old item had its
reference counter decremented and was freed since its reference counter reached zero. The
memory location was later reused for another entry of the hashtable, resulting in two items
(keys) pointing to the same item object in memory.
Since these pointer corruptions occurred between traversals, once the next traversal started,
the value was already undetectably corrupt (value and replica were the same). In such cases,
SEI can at most guarantee that the computation over the incorrect values is performed cor-
rectly, but may produce an incorrect output nevertheless.
A few pointer corruptions also occurred inside handlers, i.e., within a traversal. Consider
the following simplified C code which restores the old value of an address stored in an entry
e in sbuf during the reset.
#define RESET_ENTRY(e, type) \
type* taddr = (type*) e->addr; \
type val = READ_OLD(e, type, e->addr); \
WRITE_NEW(e, type, e->addr, *taddr); \
*taddr = val;
taddr contains some address whose content was modified in the traversal. The old value
is read from the entry, the new value in the heap is written into the entry, and then the old
value is written back into the heap. If a fault in the last line changes the address where the
old value should be restored (for example, a WADDR fault while calculating *taddr), then
the new value remains in *taddr, whereas the old value is stored in some other memory
location X. Most such faults are detected as segmentation faults or are detected by the
second execution or the validation. However, if the new and the old value are the same,
the only effect of the fault is the corruption of the memory location X and its ECC replica.
The corruption becomes problematic (in some subsequent traversal) when X is used but SEI
cannot detect it as corrupt. We observed this scenario twice in our fault injection
campaigns.
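The swap performed by the reset step can be illustrated with a simplified, int-only sketch.
It is not the libsei implementation: the type snap_entry_t and the function reset_entry() are
hypothetical stand-ins for the entry in sbuf and for the RESET_ENTRY macro.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical snapshot-buffer entry for an int-sized location. */
typedef struct {
    int *addr;   /* heap location modified during the traversal */
    int value;   /* before reset: old value; after reset: new value */
} snap_entry_t;

/* Restores the old value at e->addr and keeps the new value in the entry,
 * so that the second execution starts from the original state. */
static void reset_entry(snap_entry_t *e)
{
    assert(e != NULL && e->addr != NULL);
    int old = e->value;     /* old value saved during the first execution */
    e->value = *e->addr;    /* keep the new value for the later comparison */
    *e->addr = old;         /* write the old value back to the heap */
}
```

If the final write went to a wrong address while old and new value were equal, the contents
of e->addr and of the entry would look exactly as after a correct reset, which matches the
undetected scenario described above.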
Given the low overhead of libsei (see Section 4.6), the results presented above are
encouraging: they show a decrease of the ratio of undetected errors from 33.43% to only
0.25%, or from 44.18% down to only 0.15% when aggregating DF faults only. Employing
software-based replica variables eliminates the residual undetected errors because we use two
separate pointers for the value and its replica. Employing software-based replica variables
constitutes a trade-off between memory footprint and fault coverage (see MC-SEI-DUP in
Table 4.4).
Fault  Variant  Undetected  SEI-detected  Crash/other  Total errors
CF     MC-SEI   0.02%       13.78%        86.20%       6366
CF     MC-SEIL  0.05%       12.84%        87.11%       6330
DF     MC-SEI   0.16%       58.58%        41.26%       19484
DF     MC-SEIL  0.28%       58.61%        41.11%       19088
Table 4.6: Results of fault injection in memcached with 4 threads. Errors for 4-threaded
executions are classified as undetected, detected with SEI, and detected with other
mechanisms.
COMPUTATION SCALABILITY
We conclude this section presenting the results of our multithreaded experiments with MC-SEI
and MC-SEIL. Table 4.6 shows values aggregated for control-flow faults (CF) and for data-flow
faults (DF), which consist of WREG, WVAL, WADDR, and RADDR faults. The results indicate
that (1) multithreaded executions do not present more undetected errors than single-threaded
executions; and (2) although MC-SEIL assumes locks cannot be skipped, it does not show
substantially more undetected errors than MC-SEI. In particular, CF faults, which can potentially
jump over locks, resulted in less than 0.1% of undetected errors in both variants.
4.5.2 HARDWARE FAULT INJECTION
Software fault injection can reproduce fault cases very precisely, making it easier to analyze
and understand failures. However, there is the risk of introducing a bias, and consequently, we
have also used hardware fault injection to reproduce realistic and unbiased failure scenarios.
We perform hardware fault injection by using the dynamic voltage and frequency scaling
(DVFS) support of an AMD FX based multicore CPU (Bulldozer). DVFS can be used to undervolt
cores, reducing the voltage below the predefined value while keeping the frequency constant.
The scenario of our experiments could be the effect of misconfiguration of power-saving
options or of a power supply failure. Note that future microprocessors are also expected to
run at lower voltage, thus increasing the likelihood of data corruption [BC11].
We experimented with variants MC and MC-SEI running a single thread. After launching the
application, we lowered the voltage by 100 to 150 mV below the nominal CPU voltage
(1.225 V). Table 4.7 shows the outcome of 936 observations. The application often crashed
within 40 seconds of execution. In addition to application crashes, the machine itself froze
very often, which explains the reduced number of experiments performed.
The vast majority of errors were crashes caused by segmentation faults, invalid instruction
errors, and other errors detected by the operating system. In MC-SEI, 2.35% of the errors
(11 cases) were detected by libsei, and we observed no undetected errors. In MC, 0.85%
of the errors (4 cases) were undetected. Although not conclusive, the experiment indicates
that (1) undetected errors, i.e., corrupt messages, can happen due to hardware faults; and (2)
some of these faults manifest as ASC faults and are successfully detected by libsei.
Errors        MC   MC-SEI
Crash/other   464  457
SEI-detected  -    11
Undetected    4    0
Total         468  468
Table 4.7: Errors in memcached when undervolting CPU
4.6 PERFORMANCE EVALUATION
We present in this section a performance evaluation of our SEI-hardened applications. We
focus on the following questions: (1) How much overhead does SEI add to the applications?
(2) Does memcached hardened with SEI scale with the number of threads? (3) Does the
overhead depend on memcached parameters such as value size and number of keys? (4) What
is the overhead difference between SEI with snapshot buffers and with changelog buffers?
(5) Does SEI’s barrier affect the performance of multithreaded executions?
4.6.1 SETUP AND METHODOLOGY
We run memcached with a hashtable of 2 GB on a 12-core 2.66 GHz Intel Xeon X5650 machine
(Linux 3.8 kernel). The machine supports the SSE4.2 extensions; hence, our CRC calculation
uses the respective hardware instructions. We use 8 client machines with a similar
configuration (8-core 2 GHz Xeon) connected via Gigabit Ethernet. Each client machine runs one
instance of Facebook’s mcblaster workload generator (see footnote 12). Each mcblaster
instance measures the average throughput and response time for 60 seconds.
The workload can be configured using the following parameters. A request can either
get or set a value from or into the selected key. Clients randomly select (using a uniform
distribution) the next key to be issued from the integer set {1, ..., K}, where K is called
the key range. The value size defines the size of the values. The threads value defines the
number of threads in memcached; one client machine with 64 connections is started for each
memcached thread. Finally, the load is the aggregated number of requests per second issued by
all clients. Clients mainly issue get requests since they represent the vast majority of
operations in typical workloads [Ati+12; Nis+13; Ven+12].
We consider the following memcached variants: MC, MC-SEI, MC-CLOG, and MC-SEIL. Like
PASC, MC-CLOG uses changelog buffers instead of snapshot buffers and does not execute
the Reset block between the two executions of the original event handler (see Section 4.1.1).
Stock memcached has an important bottleneck due to a global lock protecting the LRU eviction
list, i.e., cache_lock, that is known to affect scalability [Ati+12]. We have mitigated this
bottleneck by having all our memcached variants acquire cache_lock with trylock() and update
the list only if there are no concurrent updates. Even with this improvement, MC still does
not scale above 8 threads. Finally, to avoid modifying the workload generator, the hardened
memcached variants compute 32-bit CRCs as prescribed by the algorithm, but do not send them
along with messages. The expected performance impact of adding 4 bytes of CRC to each message
is negligible.
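The trylock-based update can be sketched with POSIX threads as follows. This is only an
illustration of the idea, not the modified memcached code: the counter lru_updates stands in
for the actual list manipulation, and try_update_lru() is a hypothetical name.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long lru_updates;  /* stand-in for the LRU list update */

/* Updates the LRU bookkeeping only if the lock is free; otherwise the
 * update is skipped instead of blocking the calling thread. */
static bool try_update_lru(void)
{
    if (pthread_mutex_trylock(&cache_lock) != 0)
        return false;              /* another update is in progress: skip */
    lru_updates++;
    int rc = pthread_mutex_unlock(&cache_lock);
    assert(rc == 0);
    (void)rc;
    return true;
}
```

Skipping an update makes the eviction order slightly less precise, but it removes the global
lock from the critical path of requests.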
Experiments with Deadwood follow a similar setup, but with up to 20 client machines running
nsping, a DNS querying tool available in BSD ports. Clients send requests to resolve
a domain name randomly selected from the list of the 100 most visited websites (see
footnote 13). Deadwood is a single-threaded server and is only evaluated in the
“single-threaded scenario” section below. In the warm-up phase, the resolver’s cache is empty;
hence, the first request for every domain name is forwarded to an upstream DNS server. Once
the upstream server replies, the response is cached and sent to the client. Further requests
for the same domain name are then served directly by Deadwood. The presented results do not
include warm-up phase measurements. Since cached data does not change for long periods of time
(depending on the TTL value), we only evaluate get requests. CRCs for the output messages are
calculated, but not sent out. Each experiment runs for 60 seconds.
Footnote 12: http://github.com/fbmarc
Footnote 13: List taken from http://www.alexa.com.
Figure 4.11: Throughput of get requests varying threads (with key range of 1000) and key
range length (with 8 threads) for different value sizes (8, 128 and 256 bytes). Left panels:
x-axis threads (1 to 12); right panels: x-axis key range, log(key) (10 to 1M); y-axis in all
panels: throughput with response < 1 ms (k.req/s); series: MC-SEI, MC-SEIL, MC.
4.6.2 COMPUTATION SCALABILITY
We start by investigating the performance of MC, MC-SEI, and MC-SEIL varying number of
threads and key range. libsei shows excellent scalability in both dimensions.
Figure 4.11 (left) represents the scalability limit for memcached when varying the number of
threads from 1 to 12. The y-axis depicts the maximal throughput that can be achieved while
keeping the average response time across all requests below 1 ms. We define memcached’s
capacity at a response time of 1 ms because response times above 1 ms are not desired in such
systems, which are mainly used for speeding up database queries. We also vary the value size
from extremely small values (8 B) to medium values (256 B). With larger value sizes, fewer
threads are necessary to achieve the maximal throughput with any variant; all variants achieve
their maximal throughput with 8 threads. With value sizes larger than or equal to 128 B and
8 threads, MC-SEI and MC-SEIL show negligible throughput overhead. With a value size of 8 B
and 8 threads, the overhead is 25% for MC-SEI and 20% for MC-SEIL. MC-SEIL shows a lower
overhead than MC-SEI because SEI’s barrier is disabled.
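The capacity definition used above can be sketched as a selection over measured
(throughput, response time) pairs. The data layout and the names sample_t and
capacity_below() are assumptions made for illustration; the measurement scripts themselves
are not part of this text.

```c
#include <assert.h>
#include <stddef.h>

/* One measurement point of a load sweep. */
typedef struct {
    double throughput_kreq_s;  /* achieved throughput in k.req/s */
    double avg_response_ms;    /* average response time in ms */
} sample_t;

/* Returns the highest throughput among all samples whose average response
 * time stays below limit_ms, or 0.0 if no sample qualifies. */
static double capacity_below(const sample_t *s, size_t n, double limit_ms)
{
    assert(limit_ms > 0.0);
    double best = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (s[i].avg_response_ms < limit_ms && s[i].throughput_kreq_s > best)
            best = s[i].throughput_kreq_s;
    }
    return best;
}
```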
All variants decrease their throughput with small message sizes and more than 8 threads.
The large number of threads and short message handling increase the contention on the
few acquired locks, resulting in higher response times and, consequently, lower throughput
of requests with response time below 1 ms. The throughput of MC-SEI decreases more
sharply than of other variants due to MC-SEI’s barrier, which blocks traversals until all concurrent
traversals have finished.
Figure 4.11 (right) depicts the maximal throughput achieved when varying the key range
and the value size with 8 threads. Few keys introduce contention between the threads,
since they access the same buckets, acquiring at least 2 locks per request. Critical sections
become longer due to hardening. Consequently, the scalability with queries spanning very
few keys, e.g., 10 keys, is limited. Such a scenario could also represent a workload with a
few hot keys. We expect, however, a memcached instance to host and serve many thousands of
different keys. As we distribute the workload across more keys, there is less contention and
consequently more opportunity for concurrent execution. The overhead with 1 M keys and 8 B
value sizes, for example, is about 25% for both MC-SEI and MC-SEIL. The overhead becomes
negligible with larger value sizes and more than 100 keys.
Figure 4.12: Response time versus throughput varying value size from 512 B up to 4 KiB. At
the 1 ms cut MC-SEI achieves 80% of the throughput of MC with 1 KiB values. (Panels: 512 B,
1 KiB, 4 KiB; x-axis: throughput (k.req/s); y-axis: response time (ms); series: MC-SEI,
MC-CLOG, MC.)
Figure 4.13: Response time versus throughput with 8 B value size. At the 1 ms cut MC-SEI
achieves 70% of the throughput of MC. (x-axis: throughput (k.req/s); y-axis: response time
(ms); series: MC-SEI, MC-CLOG, MC.)
Figure 4.14: CPU utilization versus throughput with 8 B value size. (x-axis: throughput
(k.req/s); y-axis: CPU utilization (%); series: MC-SEI, MC-CLOG, MC.)
4.6.3 SINGLE-THREAD SCENARIOS
PERFORMANCE OF MEMCACHED
SEI is designed to amortize its performance overhead with the number of threads when
little contention exists. However, even in a single-threaded configuration, it has 1.6x higher
throughput than our PASC adaptation, MC-CLOG, and a viable overhead compared to MC. The
value sizes range from extremely small messages (8 B) to fairly large messages (4 KiB).
Figure 4.12 shows the response time versus throughput for memcached using value sizes
from 512 B up to 4 KiB and varying the load of get requests. Note that the vast majority of
the requests in typical workloads are gets [Ati+12; Nis+13; Ven+12]. For 1 KiB and 4 KiB
values, the response time elbow of MC rises at the limit of the network, indicating that MC
is network bound. For smaller value sizes, e.g., 512 B, MC is CPU bound. With 4 KiB
values, all variants saturate the network, while being CPU bound with 1 KiB or smaller values.
Figures 4.13 and 4.14 show results for 8 B value sizes. A value size as small as 8 B can
still be useful since it can represent, for example, a 64-bit counter. As with 512 B, all variants
are CPU bound (Figure 4.14). The variants have their response time elbow once the CPU
utilization reaches 100%, i.e., at about 55, 80, and 105 k.req/s for MC-CLOG, MC-SEI, and MC,
respectively. Note that the throughput increases even though the CPU has reached its limit
due to batching at the socket level.
In summary, MC-SEI has 70% of MC’s throughput with 8 B values and about 80%
with 1 KiB values, representing 30% and 20% overhead, respectively. The overhead of
MC-CLOG is about two times larger, i.e., about 58% with 8 B values and 48% with 1 KiB values.
Since MC-CLOG presents a substantially higher overhead than MC-SEI, we do not evaluate it any
further. Note that these results show the benefit of using snapshot buffers instead of
changelogs, even though the transactional compiler does not completely remove the
instrumentation of operations reading from memory (see discussion on Page 132).
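The overhead figures above are throughput losses relative to the unhardened variant. As a
small sketch of this arithmetic (the function name overhead_pct() is ours):

```c
#include <assert.h>

/* Throughput overhead of a hardened variant relative to the baseline,
 * in percent of the baseline throughput. */
static double overhead_pct(double baseline, double hardened)
{
    assert(baseline > 0.0);
    return (baseline - hardened) * 100.0 / baseline;
}
```

A variant that reaches 70% of the baseline throughput thus has 30% overhead, and one that
reaches 80% has 20% overhead.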
Value size. We now investigate in more detail the influence of the value size on the
overhead of get and set requests. Figure 4.15 depicts the response time and CPU utilization
of MC-SEI and MC when varying the value size from 8 B to 8 KiB with a key range of 1000 keys
and a load of 10 k.req/s. The value size increases the message size and, consequently, the
response time. The response time difference varies from 2% to 7% (small labels on top of the
response time measurements). Larger value sizes also affect the CPU utilization
because larger messages have to be copied from/to the socket. Furthermore, MC-SEI’s CPU
utilization overhead increases with the value size irrespective of whether get or set requests
are issued. For example, for set requests, the difference in CPU utilization between MC-SEI
and MC increases from 7.8% up to 17%. The reason can be traced back to the CRC
calculations: the dashed lines show the average CPU utilization of MC-SEI with CRC calculation
disabled. The difference between the dashed line and MC is roughly constant at around 9%.
Large requests, however, saturate the network before the CPU (see 4 KiB, Figure 4.12), so the
additional CRC computation does not cause any significant performance penalty. Also, many
practical workloads consist of small requests. For such workloads, the additional CRC
computation is not an issue.
Key range. So far, we have used a key range of only 1000 keys. With a range of 1000
keys, most of the accessed memcached state fits in the lowest CPU cache. Therefore, we now
investigate the influence of the key range on MC-SEI and MC, varying the key range from 1000
to 100,000 keys with 1 KiB values and a load of 10 k.req/s (Figure 4.16). Note that the unit
of the x-axis is 1000 keys. A small label on top of the response time measurements indicates
the overhead of MC-SEI. No trend can be identified in the response time or CPU utilization
with the increasing number of keys. The response time overhead varies between 3% and 6%.
The CPU utilization difference is about 4% and 11% for get and set requests, respectively.
The additional overhead for set is expected due to the longer code path in such requests.
PERFORMANCE OF DEADWOOD
Similar to MC, we investigate the performance of the hardened version of Deadwood (DW-SEI)
compared to native Deadwood (DW). We measure response time, throughput and CPU
utilization while varying the number of clients.
[Figure 4.15 plot: panels get and set; x-axis: value size (bytes), from 8 to 8Ki; y-axes: response
time (ms) and CPU utilization (%); series: MC-SEI and MC; response-time overhead labels:
+5%, +3%, +2%, +3%, +6%, +6%, +7%, +6%.]
Figure 4.15: Response time and CPU utilization varying value size for gets and sets with
10 k.req/s load
[Figure 4.16 plot: panels get and set; x-axis: key range (k.key), from 1 to 100; y-axes: response
time (ms) and CPU utilization (%); series: MC-SEI and MC; response-time overhead labels:
+3%, +4%, +3%, +3%, +3% and +5%, +6%, +4%, +5%, +4%.]
Figure 4.16: Response time and CPU utilization varying number of keys with 10 k.req/s load
(value size 1KiB)
[Figure 4.17 plot: x-axis: throughput (k.req/s); y-axis: response time (ms); series: DW-SEI and DW.]
Figure 4.17: Deadwood’s response time versus throughput varying the number of clients
[Figure 4.18 plot: x-axis: throughput (k.req/s); y-axis: CPU utilization (%); series: DW-SEI and DW.]
Figure 4.18: Deadwood’s CPU utilization versus throughput varying the number of clients
[Figure 4.19 plot: x-axis: throughput (k.req/s); y-axis: response time (µs); series: DW-SEI and DW;
overhead labels: +6%, +6%, +16%, +11%, +20%.]
Figure 4.19: Deadwood’s response time (in microseconds) for throughput up to 30 k.req/s
Due to the rather small message sizes – the average size of a request message is 28 B and of a
response 76 B – and due to the nature of the application itself, both Deadwood variants are CPU
bound. Figure 4.18 shows that the server eventually reaches 100% CPU utilization when increasing
the load; Figure 4.17 shows that the response time quickly rises once the CPU utilization is 100%.
DW-SEI has a maximum capacity of approximately 33 k.req/s, whereas DW reaches 53 k.req/s.
DW-SEI reaches its CPU bound faster due to the overhead of executing the event handler twice,
intensified by the over-instrumentation issues reported in Section 4.4.3. That boils down to a
throughput overhead of 38% under high load.
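The 38% figure follows directly from the two peak capacities quoted above. The short Python
sketch below only redoes that arithmetic; the function name throughput_overhead is ours and
is not part of the evaluation tooling.

```python
# Peak capacities reported in the text, in k.req/s.
DW_PEAK = 53.0
DW_SEI_PEAK = 33.0


def throughput_overhead(native: float, hardened: float) -> float:
    """Relative throughput loss of the hardened variant, as a fraction of native."""
    return (native - hardened) / native


overhead = throughput_overhead(DW_PEAK, DW_SEI_PEAK)
print(f"{overhead:.1%}")  # roughly 38% when rounded to whole percent
```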
Figure 4.19 shows the response time before reaching 100% CPU consumption. In particular,
we zoom into the range between 1 k.req/s and 30 k.req/s. Under such moderate loads, the
response-time overhead varies between 6% and 20%.
DISCUSSION
Overall, even in single-threaded scenarios we observed no more than 50% overhead. The
overhead is relatively low because SEI hardens only the application event handlers and not the
underlying software components, such as the operating system. SEI expects faults in these
components to manifest as ASC faults, corrupting the application state or its messages.
According to our fault injection experiments, SEI is sufficient and a “duplicate everything”
strategy is not strictly necessary. We expect long event-handling phases, however, to induce
higher overheads.
4.7 RELATED WORK
SEI builds on the work of Correia et al. [Cor+11; Cor+12b], who proposed a systematic tech-
nique to harden distributed systems against ASC faults. Their ASC-hardening algorithm, called
PASC, has three attractive properties common to SEI: it is local, so it does not require replica
processes; it is generic, since it provides formal error isolation guarantees without knowledge
of the application being hardened14; it is untrusted, since faults can occur even during the ex-
ecution of the hardening algorithm itself. However, unlike SEI, PASC is not scalable in several
dimensions:
Memory footprint: PASC essentially doubles the memory footprint of the hardened applica-
tion because it keeps two copies of the state. For systems that store data primarily in
memory such as ZooKeeper or memcached, this is a major limitation.
Multithreading: PASC does not allow multiple threads accessing a shared state, which in a
multicore world limits the opportunities for concurrent processing.
Development effort: PASC requires developers to build their application from scratch on top
of the library, and uses a single state object to encapsulate the whole process state,
which is impractical in large systems.
The approach presented in Chapter 3 suffers from similar limitations: AN-encoding increases
the application state by a factor of two; developers have to implement their applications on
top of our framework; and the encoding compiler used in our framework does not support
multithreaded applications.
SWIFT [Rei+05a] is a well-known compiler-based technique to detect transient hardware
errors. It also relies on hardware mechanisms to detect memory errors – i.e., it assumes
machines are equipped with hardware-based error-correcting codes – and concentrates on
errors occurring in the CPU. The main idea of the approach is to repeat the execution of every
instruction computing new values, and to subsequently compare these values. In contrast
to SEI, SWIFT only detects single bit flips and cannot detect the whole range of control-
flow errors detected by SEI. Moreover, SWIFT does not focus on end-to-end protection in
distributed systems; it treats any operation that writes into or reads from memory as an I/O
operation. In contrast, SEI guarantees error isolation for distributed systems: either the faulty
process itself detects the error and crashes, or the recipient detects a faulty message and
drops it.
SEI is proven correct by assuming that the dereference of corrupt pointers invariably leads
to crashes (Assumption 4.2). The pointer problem identified in our evaluation of libsei (Sec-
tion 4.5.1) decreases the coverage of Assumption 4.2. Interestingly, the same problem is also
present in SWIFT. If a corrupt pointer is dereferenced in some part of the code that is not
protected by SWIFT, then the state of the application might be compromised because a write
operation into that pointer can corrupt variables that are otherwise protected by SWIFT. By du-
plicating the state (without relying on ECC), the coverage of Assumption 4.2 is higher because
non-hardened code is unlikely to consistently corrupt both replicas of a variable. This trade-off
between space and coverage has to be considered when implementing and deploying a hard-
ening technique. Note that SEI supports both cases: replica variables in hardware with ECC
as well as replica variables implemented in software. See our paper at NSDI’15 [Beh+15a] for
a more detailed comparison between SEI, SWIFT, and PASC.
Recently, Martens et al. [Mar+14] proposed Crosscheck, an approach to protect services
against ASC faults that supports multithreading. In a nutshell, Crosscheck employs
aspect-oriented programming to enrich objects with checksums, which are then compared among
replicas. Consequently, Crosscheck requires replication and deterministic multithreading,
while SEI does not. As discussed in Chapter 1, many applications can tolerate crashes without
replication, e.g., memcached. Moreover, since Crosscheck uses aspect-oriented programming,
it might require existing applications to be reimplemented; for example, Martens et al.
reimplemented memcached in C++. Therefore, SEI can be applied to a broader set of distributed
systems than Crosscheck. Note that in Chapter 5, we also present an approach to tolerate
ASC faults leveraging the redundancy already available in replicated systems.
14Generic refers here only to the formal guarantees the method provides. As in SEI, the developer needs to
understand the application to apply the technique.
Vyas et al. [VLS13] reported on the “transactionalization” of memcached. Their work also
employs the TM compiler extension used by libsei. Nevertheless, Vyas et al. focused on
concurrency, substituting critical sections with transactions and employing a real transactional
memory implementation. Our work focuses on ASC-hardening. libsei does not “transactionalize”
critical sections, but complete event handlers, potentially containing multiple critical
sections. Moreover, libsei does not substitute the concurrency control mechanism with a
transactional memory; instead, it still relies on the original locks of the program. Due to these
differences, we could not reuse most of the insights provided by their work.
4.8 CONCLUSION
In this chapter, we have proposed SEI, a novel hardening technique that can leverage
mechanisms provided in hardware, such as error correction codes in memory modules, to obtain
an efficient hardened implementation. The exercise of hardening an existing system like
memcached surfaced a number of challenges, mostly related to deviations from the structure our
algorithm expects. Yet, we were able to harden it with a reasonable amount of effort,
keeping its core functionality.
We have evaluated the performance of our hardened memcached, using hardware-level error
detection in the memory module. Hardening memcached with SEI only introduces a negligible
throughput overhead with value sizes of 128 B or larger, and at most 25% with small messages
with value size of 8 B. Moreover, SEI adds a constant memory overhead of about 30 KiB per
thread to the application. Even in single-threaded scenarios with value sizes as small as 8 B,
SEI provides an acceptable overhead; for example, SEI-hardened memcached shows only 30%
lower throughput than stock memcached. Moreover, SEI only increases response time between
2 and 7% in memcached and between 9 and 13% in Deadwood for moderate loads.
In our fault injection experiments, SEI effectively prevents error propagation, reducing the
undetected errors from about 33% down to only 0.25% of the errors. Moreover, when
inducing faults by undervolting the CPU, we have not observed any error propagation in the
hardened memcached, whereas without hardening 0.85% of the errors were propagated. We
conclude that SEI is effective at coping with arbitrary state corruption.
Road map. In Chapter 5, we return to AN-encoding (Chapter 3) and propose a solution
tailored to replicated systems employing Paxos. The approach is less generic than the one
presented in this chapter, but can drastically reduce the performance overhead incurred by
hardening since only a small part of the Paxos algorithm is transformed.
Future work. Several problems are left open in this chapter. In our opinion, the most impor-
tant of them are the following.
• The multithreading support currently employed by SEI is based on a 2PL scheme, which
can decrease the available parallelism of applications due to the long critical sections.
An alternative approach was discussed in Section 4.3.4. An open problem is whether
and how the stack-copy costs of the mini-traversals approach could be mitigated.
• Another issue with the multithreading support of SEI is the limitation to lock-based race-
free algorithms. We are not sure, however, whether a solution exists that can support
ad hoc synchronization.
• Our implementation, libsei, relies on the transactional memory compiler pass of gcc to
instrument the program. We believe that a custom compiler pass could solve or reduce
the impact of several of the limitations discussed in Sections 4.4.3 and 4.4.4.
• Hardware fault injection experiments were not extensively performed; a more elaborate
setup should be devised to draw more insightful conclusions.
• Finally, our software fault injection experiments neither inject multibit faults nor corrupt
multiple variables, although SEI should be able to cope with such faults in theory.
Expanding these experiments can help to evaluate how resilient our implementation is in
practice.
5 HARDENED STATE-MACHINE REPLICATION∗
∗The contents of this chapter first appeared at SRDS ’14 [BKF14].
State-Machine Replication (SMR) is a well-known technique to make services fault-tolerant and
is often employed by the industry in systems such as Chubby [Bur06], ZooKeeper [Hun+10],
and Megastore [Bak+11]. Although SMR systems tolerate process crashes, practical imple-
mentations cannot withstand ASC faults.
As argued in Chapter 1, Byzantine-fault-tolerant (BFT) algorithms can tolerate state corrup-
tions, but typically incur unjustifiable ownership and maintenance costs; so they are not used
in practice to implement SMR systems. Encoded processes (Chapter 3) and SEI (Chapter 4)
can harden crash-tolerant systems against ASC faults without requiring the systems to be
replicated. Unfortunately, encoding and SEI have a space and performance cost associated
with their generality.
In this chapter, we show that if an application already employs replication, one can opti-
mize these hardening approaches to incur a smaller overhead. In particular, we show that
crash-tolerant SMR systems and the service running on top can also tolerate ASC faults by
hardening only part of the underlying atomic broadcast algorithm. Our approach incurs little
state, throughput and response time overhead.
In Section 5.1, we propose HardPaxos, a variant of the Paxos [Lam98] algorithm. Paxos
is a crash-tolerant algorithm widely used to implement the atomic broadcast in SMR sys-
tems [Bur06; Bak+11; KA08]. In our variant of this algorithm, critical functions and state
variables are factored out and kept inside a trusted module called HardCore. In Section 5.2,
we describe how to satisfy the Paxos invariants given HardCore is trustworthy.
If HardCore is an ordinary software module, it may also be subject to ASC faults. In Sec-
tion 5.3, HardCore is made trustworthy with AN-encoding (Chapter 3) and duplicated execution
(a largely simplified version of SEI’s approach). If a hardware fault affects the state or computa-
tion of HardCore, the whole process is aborted with a high probability. Moreover, all messages
exchanged in HardPaxos are augmented with end-to-end error detection codes that allow the
receiver’s HardCore to detect the incorrect execution of a non-silent faulty process.
In Section 5.5, we perform extensive fault injection experiments to evaluate the fault cov-
erage of our hardening. The experiments show a decrease of undetected errors from about
15% without hardening down to 0.9% with AN-encoding and further down to 0.02% with
AN-encoding plus duplicated execution. We also show that the hardening does not hamper
performance. Even with the strongest hardening, the throughput loss is at most 4% for small
messages (64 bytes) and close to zero for messages of 1 KiB or larger; the increase in the
single response time does not exceed 5%.
Section 5.2.4 discusses HardPaxos’s correctness, Section 5.2.5 presents the garbage col-
lection support in HardPaxos, and Section 5.6 presents further related work. We conclude the
chapter in Section 5.7.
5.1 RATIONALE
In this section, we describe our assumptions and give an overview of our approach.
5.1.1 SYSTEM MODEL
We assume an asynchronous system model with a set of processes Π = {π1, . . . ,πn} (with
n ≥ 3) that communicate via message-passing. Each process represents a replica of a
service, for example, a database (see Figure 5.1). The service runs on top of an SMR library.
Clients access the service by issuing requests to one of the processes using a client library.
The execution of a replica is given by a sequence of slots; each slot represents one client
request. The SMR server library guarantees that correct processes agree on the requests
to be executed in each slot. To achieve agreement, the server library employs HardPaxos.
[Figure 5.1 diagram; components: Service, SMR Server Lib (HardPaxos, HardCore), Client Lib, Client;
arrows: requests, replies.]
Figure 5.1: HardPaxos-based SMR architecture
HardPaxos contains a trusted module called HardCore, which encapsulates a set of variables
and functions described below.
5.1.2 FAILURE MODEL AND ASSUMPTION COVERAGE
In this chapter, our failure model is basically the same as defined in Section 2.1: messages
might be lost, duplicated or reordered, and processes might crash or fail arbitrarily. We restrict,
however, failures with the following assumptions.
Assumption 5.1 There is a bound f = ⌊(n − 1)/ 2⌋ on the number of faulty processes; failures
occur independently.
Assumption 5.2 Hash functions are collision-resistant and error-detection codes cannot be
forged.
Assumption 5.3 The trusted module only fails by crashing its process.
Note that the assumption that error-detection codes cannot be forged (Assumption 5.2) is
nothing else than our fault diversity assumption (Assumption 2.2).
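As a small illustration of Assumption 5.1, the following Python sketch computes the fault bound
f and the f + 1 threshold used throughout this chapter. The helper names max_faulty and
quorum_size are hypothetical; for odd n, f + 1 is exactly the smallest majority of processes.

```python
def max_faulty(n: int) -> int:
    """Fault bound f = floor((n - 1) / 2) from Assumption 5.1."""
    if n < 3:
        raise ValueError("the system model assumes n >= 3")
    return (n - 1) // 2


def quorum_size(n: int) -> int:
    """Threshold f + 1; for odd n this is the smallest majority."""
    return max_faulty(n) + 1


for n in (3, 5, 7):
    print(n, max_faulty(n), quorum_size(n))
```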
For the sake of reasoning about the algorithm, one asserts that these assumptions always
hold. In practice, however, assumptions hold with a probability that depends on the system
implementation and deployment environment – this probability is called assumption cover-
age [Pow92]. For example, Assumption 5.2 only holds with a high probability if the system
is not subject to malicious attacks; this work focuses on such non-malicious deployments.
Assumption 5.3 only holds with a high probability if either the system is deployed on very
reliable hardware – an expensive solution – or if we harden the trusted module against ASC
faults (Section 5.3). Note that Assumption 5.3 does not restrict the failure modes of the
untrusted part of the processes, including the service running on top. Such differentiation in
the failure model is called a wormhole in the literature [Ver06]. Unreliable failure detectors
[CT96] are another example of wormholes.
Furthermore, we can only provide liveness properties under the following additional assump-
tions:
Assumption 5.4 Message transmission is eventually timely, i.e., there is a global stabilization
time [DLS88].
Assumption 5.5 Faulty replicas eventually become silent, e.g., by crashing, or are rejuve-
nated.
In practice, after a period of arbitrary behavior, hardware faults often lead to a kernel panic
or a machine shutdown – as in the Mortal Byzantine model [Wid+07]. Moreover, processes
can be periodically rejuvenated [CL99]. Rejuvenation is, however, an orthogonal concern out
of the scope of this dissertation.
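[Figure 5.2 diagram: message flow among the client, the leader, and processes p1, p2, p3; phases:
REQUEST, PROPOSE, ACCEPT, COMMIT, REPLY; events: request delivered, reply delivered.]
Figure 5.2: Normal-case execution
[Figure 5.3 diagram: message flow among processes p1, p2, p3, one of which is the candidate;
phases: PREPARE, PROMISE, REPROPOSE.]
Figure 5.3: Election and recovery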
5.1.3 HARDPAXOS OVERVIEW
HardPaxos is one possible variant of the Paxos algorithm [Lam98]. Paxos is typically described
with proposer, acceptor and learner roles [Lam01]. In HardPaxos, processes play all three
roles. Hence, we prefer the terms leaders and followers.
HardPaxos satisfies the following properties:
Property 5.1 (Agreement) If two correct processes πi and πj deliver requests ri and rj in slot
s, then ri = rj .
Property 5.2 (Validity) Only a request that has been issued by a client may be delivered by
correct processes.
Property 5.3 (Termination) If a client issues a request r, all correct processes eventually
deliver r.
During normal execution (see Figure 5.2), a distinguished process, called leader, proposes
requests issued by clients to all other processes, called followers. The goal of the leader is
to get client requests accepted by a majority of processes: itself and half of the followers.
A client request comprises the command to be executed by the service, the request’s hash
(which identifies and protects the request), a client identification number and a client sequence
number (which have to be used by the service to execute requests in client order and make
them idempotent). Upon receiving requests from the client, the leader proposes them in
slots. Followers accept proposals from the leader by storing them into a log and sending
accept messages to the leader (the leader accepts its own proposal immediately). Once the
leader receives accept messages from a majority of processes, it sends a commit message
to the followers and delivers the request to the service. Upon receipt of a commit message,
each follower also delivers the request.
The service executes the request and sends a reply to the client via the SMR library. The
client library delivers the reply upon receiving f + 1 equivalent reply messages from different
processes. This voting mechanism [Sch90] is common for BFT algorithms [CL99] and enables
HardPaxos to tolerate ASC faults in the atomic broadcast and in the service.
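The voting step of the client library can be sketched as follows. This is a minimal illustration
in Python, not the actual client library: the class ReplyVoter and its method on_reply are
hypothetical names, replies are compared byte-wise, and each process contributes at most one
vote per distinct reply.

```python
from collections import defaultdict
from typing import DefaultDict, Optional, Set


class ReplyVoter:
    """Collects replies for one request until f + 1 processes agree."""

    def __init__(self, f: int) -> None:
        self.f = f
        # Maps each distinct reply to the set of process ids that sent it.
        self.votes: DefaultDict[bytes, Set[int]] = defaultdict(set)
        self.done = False

    def on_reply(self, process_id: int, reply: bytes) -> Optional[bytes]:
        """Return the reply once f + 1 distinct processes sent it, else None."""
        if self.done:
            return None  # already delivered; ignore late replies
        self.votes[reply].add(process_id)
        if len(self.votes[reply]) >= self.f + 1:
            self.done = True
            return reply
        return None
```

With f = 1, a single corrupt reply can never be delivered on its own: it takes two distinct
processes sending the same bytes before on_reply returns a value.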
HardPaxos runs in epochs, each of them having at most one leader. An epoch starts when a
candidate demotes the current leader sending a prepare message with a new epoch number
to all processes (see Figure 5.3). Processes reply to the prepare message with a promise
message, promising not to accept any message from older epochs (effectively adopting the
proposed epoch); each process sends a log of accepted requests along with the promise.
The candidate becomes leader once it receives promises for its epoch from a majority of
processes; the candidate immediately receives a promise from itself. The promise messages
also allow the new leader to determine which requests were not yet committed. The new
leader recovers the system by reproposing these requests. As long as the system is stable,
no other process tries to demote the current leader.
Certificate   Fields
PREPARE       π, e
PROMISE       π, e, a, c
PROPOSE       π, e, s, h
ACCEPT        π, e, ea, s, h, m
COMMIT        π, e, s, h
Table 5.1: Certificate types

Field   Description
a       max. accepted slot
c       max. committed slot
e       current epoch
ea      accepted-at epoch
h       request hash
m       committed mark
π       process id
s       slot number
Table 5.2: Certificate fields
We assume messages arrive in FIFO order and are not lost at the level of HardPaxos.
That is implemented by the server library using TCP sockets. Additionally, we assume the
client library retries a request if no response arrives. Similar assumptions are typical in the
literature [Hun+10; Ver+13].
5.1.4 HARDCORE, INVARIANTS, AND CERTIFICATES
A naive approach to guarantee safety in face of state corruption would be to move the whole
algorithm inside the trusted module. In contrast, our approach only adds enough functionality
into HardCore so that the invariants guaranteed by the classical Paxos still hold under our
failure model. We keep most of the processing – including communication via sockets, event
looping, data structure manipulations, iterations, copying and storing requests, and service on
top – outside HardCore to minimize the performance impact incurred by hardening, which can
be prohibitive (see Section 3.5).
We consider the two central invariants of Paxos, adapted from “Paxos Made Simple” [Lam01].
These invariants enable HardPaxos to satisfy the properties in Section 5.1.3.
Invariant 5.1 A process accepts a proposal in epoch e if and only if it has not sent a promise
in an epoch e′ with e′ > e.
Invariant 5.2 For each slot s, for any request r and epoch e, if a proposal with request r is
issued in epoch e, then there is a set S consisting of a majority of processes such that either
(a) no process in S has accepted any proposal for slot s in any epoch e′ with e′ < e; or
(b) request r is the request of the proposal m among all proposals accepted by the processes
in S such that m has been issued in the highest epoch e′ with e′ < e.
We illustrate our approach with the example of how Invariant 5.2(a) is guaranteed: A leader
may propose a request r in a slot s only if no process in a majority has accepted any other
request r′ in s; otherwise, replicas could diverge. Since processes can fail in unpredictable
ways, we cannot guarantee that a faulty leader never sends different proposals for the same slot
s. Nevertheless, we can guarantee that only one of the proposals is legal by requiring
proposals to be confirmed by HardCore with a certificate. The leader’s HardCore only confirms a
proposal – i.e., creates a certificate – given certificates from a majority of processes declaring
that no other request has been accepted in s. This approach is pervasive in the design of
HardPaxos: each critical action of the protocol must be confirmed by HardCore, which verifies
input certificates, changes its state, and produces output certificates.
Certificates are enclosed in every message along with the message’s payload, e.g., a
proposal consists of a propose certificate and the client request. Certificates are sealed1 with
error-detection codes, such that they cannot be forged in the untrusted part of the algorithm
or on the network. Upon receipt of a message, correct processes verify the message’s
certificate; if the certificate is illegal (e.g., corrupt and invalid) the message is discarded. HardCore
itself produces illegal certificates if illegal input certificates are given (Section 5.2); or if an
internal error is detected (Section 5.3). Tables 5.1 and 5.2 describe the certificates of HardPaxos.
As an example, a proposec certificate consists of the process id π, the current epoch e, the
proposed slot s, and the hash h of the request r being proposed. The symbol ⊥ represents
illegal certificates.
1We use the term sealing instead of signing because we do not use a cryptographically strong algorithm.
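To make sealing concrete, the sketch below seals a propose certificate with a CRC-32 and rejects
it on the receiving side if the code does not match. It is only an illustration under assumptions:
the field widths, the wire layout, the integer epoch, and the choice of CRC-32 are ours, and the
names ProposeCert, seal, and unseal are hypothetical. An illegal certificate (⊥) is modeled as None.

```python
import struct
import zlib
from typing import NamedTuple, Optional


class ProposeCert(NamedTuple):
    pid: int           # process id (pi)
    epoch: int         # current epoch e (simplified to an integer here)
    slot: int          # proposed slot s
    request_hash: int  # hash h of the request, assumed to fit in 64 bits


_BODY = "!IIIQ"   # network byte order: three 32-bit fields, one 64-bit field
_CODE = "!I"      # 32-bit error-detection code appended to the body


def seal(cert: ProposeCert) -> bytes:
    """Serialize the certificate and append a CRC-32 over its fields."""
    body = struct.pack(_BODY, *cert)
    return body + struct.pack(_CODE, zlib.crc32(body))


def unseal(blob: bytes) -> Optional[ProposeCert]:
    """Return the certificate if its code matches, otherwise None."""
    code_len = struct.calcsize(_CODE)
    if len(blob) != struct.calcsize(_BODY) + code_len:
        return None
    body, code = blob[:-code_len], blob[-code_len:]
    if struct.unpack(_CODE, code)[0] != zlib.crc32(body):
        return None  # corrupt in transit or in the untrusted part: drop it
    return ProposeCert(*struct.unpack(_BODY, body))
```

A receiver would call unseal on every incoming certificate and discard the whole message
whenever the result is None.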
5.2 THE HARDPAXOS ALGORITHM
In this section, we present HardPaxos, an atomic broadcast algorithm based on Paxos, which
contains a trusted subset of functions and variables called HardCore. We split HardPaxos into
four parts. Algorithms 5.1 and 5.2 represent normal operation. Algorithms 5.3 and 5.4
represent the leader election/recovery. Algorithms 5.1 and 5.3 are the untrusted part of HardPaxos,
whereas Algorithms 5.2 and 5.4 are the HardCore.
The state of HardCore mainly consists of three monotonically increasing counters: E counts
epochs of the algorithm – called ballot number in Paxos; A counts slots, i.e., the highest slot
accepted; and C counts the number of commits, i.e., the highest slot known to have a value
accepted by a majority2. The state in the untrusted part of HardPaxos mainly consists of the
variable Log holding the accepted and committed requests and their respective certificates.
Other variables are introduced below as needed.
5.2.1 NORMAL OPERATION
We start by explaining the untrusted part of HardPaxos. In particular, we assume that a
leader is already defined and the system is operating normally.
UNTRUSTED PART OF HARDPAXOS
Algorithm 5.1 depicts the algorithm executed by processes under normal operation. Upon
receiving a request r from a client (Line 5), the leader retrieves a certificate from its HardCore
by calling Propose().
If HardCore returns legal certificates for r, the leader proposes r along with the proposec
certificate to all followers (Line 11). Moreover, the leader immediately accepts the request by
saving it in its Log (Line 9). The leader does not need to send a proposal to itself: when
Propose() is called, an acceptc certificate is also returned. Finally, the leader adds acceptc into
Accepts (Line 10), which is used to record which processes accepted each slot.
Upon receiving a proposal from the leader, each follower lets its HardCore instance verify
the proposec certificate of the proposal. If HardCore returns a legal acceptc certificate, the
follower accepts the request in the proposal by saving the request and the acceptc certificate
into Log. Afterward, the follower sends back to the leader an accept message (Line 15).
Among other fields, an acceptc certificate contains the slot s and the request’s hash
value h. Upon receiving an accept message, the leader saves the acceptc certificate in
Accepts (Line 19). If the leader has collected certificates from a majority of processes for
slot s, the leader commits the request by calling Commit() on the HardCore. If a legal commitc
certificate is returned, the leader sends it to the followers in a commit message and delivers
the request r.
Upon receiving the commit message from the leader (Line 25), each follower commits the
request by calling Committed() in its HardCore. Committed() verifies the commitc certificate
and returns a Boolean allowing the delivery or denying it. In the positive case, the follower
delivers the request r.
2In Paxos parlance, a value is said to be chosen once a majority accepts it.

 1  upon initialization do
 2      Log[ ] ← {}
 3      Promises ← {}
 4      Accepts[ ] ← {}
 5  upon receiving ⟨REQUEST, r⟩ from client do
 6      proposec, acceptc ← Propose(r.h, Promises)
 7      if ¬legal(proposec) or ¬legal(acceptc) then
 8          return
 9      Log[acceptc.s] ← (acceptc, r)
10      Accepts[acceptc.s] ← {acceptc}
11      send ⟨proposec, r⟩ to all π in Π − {πi}
12  upon receiving ⟨proposec, r⟩ from πj do
13      acceptc ← Accept(proposec)
14      if ¬legal(acceptc) then return
15      Log[s] ← (acceptc, r)
16      send ⟨acceptc⟩ to πj
17  upon receiving ⟨acceptc⟩ on leader do
18      s ← acceptc.s
19      Accepts[s] ← Accepts[s] ∪ {acceptc}
20      if |Accepts[s]| ≥ f + 1 then
21          commitc ← Commit(acceptc.h, Accepts[s])
22          if legal(commitc) then
23              deliver Log[s].r
24              send ⟨commitc⟩ to all π in Π − {πi}
25  upon receiving ⟨commitc⟩ on follower do
26      if Committed(commitc) then
27          deliver Log[commitc.s].r
Algorithm 5.1: Normal operation for process πi (untrusted part of HardPaxos)
Request delivery lies outside the concerns of HardPaxos. The SMR server library
delivers the request to the service, which in turn creates a reply to the client. Upon receiving
f + 1 equal replies, the SMR client library delivers the reply to the client.
HARDCORE: PROPOSING THE RIGHT REQUESTS
We now describe the same execution from the perspective of the HardCore modules; the
algorithm is depicted in Algorithm 5.2. Invariant 5.2(a) states the leader can only propose a
request r with hash h for a slot s in epoch e if no other request r ′ has been accepted for the
same slot s by a majority in any epoch less than e. The Propose() function (Line 3) returns a
legal proposec certificate under two conditions. First, the provided promisec certificates are
from a majority of processes and match the epoch E counter. Second, the certificates have
the highest accepted slot a less than s, indicating that no process has accepted any other
request in s (Line 8). promisec certificates mark the highest slot accepted by each process
and are retrieved during leader election. Note that some functions such as Propose() check
whether the number of outstanding accepted requests is within a maximum K . The constant
K is essential for the garbage collection mechanism presented in Section 5.2.5.
To ensure Invariant 5.2(a), Propose() also guarantees that (1) a process can create only one
legal proposec certificate for a slot s in an epoch e since the A counter is always incremented
in one epoch; (2) only one process can create legal proposec certificates in an epoch e since
leaderOf(e) = π is true for a single process. An epoch is a tuple (i, p) of an integer i and a
process identifier p. Epochs are ordered by the integers using the process identifier as tie
breaker. No two leaders ever choose the same epoch: a process calculates the next epoch by
incrementing the integer i and substituting the process identifier p with its own.
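The epoch ordering described above maps naturally onto lexicographically ordered tuples. The
following Python sketch assumes that leaderOf(e) simply returns the process identifier stored
in the epoch; the names Epoch, next_epoch, and leader_of are hypothetical.

```python
from typing import NamedTuple


class Epoch(NamedTuple):
    """Epoch (i, p): tuples compare by i first and use p as tie breaker."""
    i: int
    p: int


def next_epoch(current: Epoch, own_id: int) -> Epoch:
    """Increment the integer and substitute the identifier with our own."""
    return Epoch(current.i + 1, own_id)


def leader_of(epoch: Epoch) -> int:
    """Assumed mapping: the only possible leader is the process named in the epoch."""
    return epoch.p


e = Epoch(1, 3)
# Two candidates starting from the same epoch end up in distinct, ordered epochs.
print(next_epoch(e, 1) < next_epoch(e, 2))
```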
The Accept() function in the HardCore checks whether the proposal of the leader is legal,
whether it matches the follower’s epoch E , and whether the request is intended to be executed
in the next slot A+1. If the test succeeds, HardCore increments the acceptance counter A
and returns a legal acceptance certificate.
The Commit() and Committed() functions in HardCore check if certificates are legal and
increment the commit counter C. Commit() returns a legal certificate for each slot at most
once since the counter C is only monotonically incremented in HardCore, i.e., C never rolls
back. Similarly, Committed() returns TRUE at most once. This implies that correct processes
deliver each slot at most once.
 1 upon initialization of πi do
 2     E, A, C ← 0
 3 function Propose(h, Promises)
 4     if ¬leaderOf(E) = πi or Ah > A or A − C ≥ K then
 5         return ⊥, ⊥
 6     s ← A + 1
 7     V ← legalSet(Promises)
 8     if |{v ∈ V : v.e = E ∧ v.a < s}| ≤ f then
 9         return ⊥, ⊥
10     A ← s
11     return seal(⟨PROPOSE, πi, E, A, h⟩), seal(⟨ACCEPT, πi, E, E, A, h, FALSE⟩)
12 function Accept(⟨PROPOSE, π, e, s, h⟩)
13     if ¬legal(⟨PROPOSE, π, e, s, h⟩) or e ≠ E or s ≠ A + 1 or A − C ≥ K then
14         return ⊥
15     A ← A + 1
16     return seal(⟨ACCEPT, πi, E, E, A, h, FALSE⟩)
17 function Commit(h, Accepts)
18     V ← legalSet(Accepts)
19     if ¬leaderOf(E) = πi or |{x ∈ V : x.e = E ∧ x.s = C + 1 ∧ x.h = h}| ≤ f then
20         return ⊥
21     C ← C + 1
22     return seal(⟨COMMIT, πi, E, C, h⟩)
23 function Committed(⟨COMMIT, π, e, s, h⟩)
24     if ¬legal(⟨COMMIT, π, e, s, h⟩)
25         or e ≠ E or s ≠ C + 1 then
26         return FALSE
27     C ← C + 1
28     return TRUE
Algorithm 5.2: Normal case for πi (HardCore)
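The at-most-once behavior of the counters in Algorithm 5.2 can be sketched as follows. This is a toy model under stated assumptions, not the HardCore implementation: the class name HardCoreCounters is hypothetical, epochs are plain integers, certificates are plain tuples, and the seal() and legal() checks are omitted.

```python
class HardCoreCounters:
    """Toy model of the E, A and C counters; certificates are plain tuples."""

    def __init__(self, epoch=0, k=20):
        self.E = epoch  # current epoch (an integer here for brevity)
        self.A = 0      # highest accepted slot
        self.C = 0      # highest committed slot
        self.K = k      # maximum number of outstanding accepted requests

    def accept(self, e, s, h):
        """Accept a proposal only for the current epoch and the next slot."""
        if e != self.E or s != self.A + 1 or self.A - self.C >= self.K:
            return None  # corresponds to returning ⊥
        self.A += 1
        return ("ACCEPT", self.E, self.E, self.A, h, False)

    def committed(self, e, s, h):
        """Return True at most once per slot because C never rolls back."""
        if e != self.E or s != self.C + 1:
            return False
        self.C += 1
        return True
```

Calling accept() or committed() a second time for the same slot fails the `s` comparison, which mirrors why a slot is accepted once per epoch and delivered at most once.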
5.2.2 NEW EPOCH START
Before discussing Invariants 5.1 and 5.2(b), we explain how leader election and recovery work
in HardPaxos. Again, we start with the untrusted part and then present HardCore’s counterpart.
UNTRUSTED PART OF HARDPAXOS
In Algorithm 5.3, when a candidate πj suspects the current leader πi is faulty – e.g., using
heartbeats – it calls Prepare() in its HardCore, which in turn increments the epoch counter E
and returns a preparec and a promisec certificate. After updating the Log – as explained below
– πj initializes the Recovery set with its accepted requests and saves its promisec certificate
in the Promises set, which is used by the new candidate to collect promises from processes.
The candidate then sends a prepare message to all other processes.
Upon receiving a prepare message from candidate πj , a follower πi calls Promise() in its
HardCore, which returns a promisec certificate. If the promise certificate is legal, the follower
then updates its Log – again, as explained below – and sends the certificate and its Log of
accepted requests to the candidate.
Upon receiving promise messages, the candidate πj collects the Logs and the promisec
certificates contained in the messages (Line 23). Candidate πj becomes the leader once it
receives legal promisec certificates from a majority of processes and, consequently, StartEpoch()
returns TRUE. In this case, the new leader also calls the Recover() function.
 1 upon initialization do
 2     Recovery ← {}
 3 upon suspecting current leader do
 4     preparec, promisec ← Prepare()
 5     if ¬legal(preparec) or ¬legal(promisec) then
 6         return
 7     L[] ← {}
 8     foreach (acceptc, r) ∈ Log do
 9         L ← L ∪ {(Update(acceptc), r)}
10     Log ← L
11     Recovery ← {Log}
12     Promises ← {promisec}
13     send ⟨preparec⟩ to all πj in Π − {πi}
14 upon receiving ⟨preparec⟩ from πj do
15     promisec ← Promise(preparec)
16     if ¬legal(promisec) then
17         return
18     L[] ← {}
19     foreach (acceptc, r) ∈ Log do
20         L ← L ∪ {(Update(acceptc), r)}
21     Log ← L
22     send ⟨promisec, Log⟩ to πj
23 upon receiving ⟨promisec, L⟩ do
24     Recovery ← Recovery ∪ {L}
25     Promises ← Promises ∪ {promisec}
26     if StartEpoch(Promises) then
27         Recover()
28 function Recover()
29     minc ← min({x.c : ∀x ∈ Promises}) + 1
30     maxa ← max({x.a : ∀x ∈ Promises})
31     for s ← minc to maxa do
32         X ← {x[s].acceptc : ∀x ∈ Recovery}
33         proposec, acceptc ← Repropose(s, X, Promises)
34         if ¬legal(proposec) or ¬legal(acceptc) then
35             return
36         Y ← {x[s].r : ∀x ∈ Recovery}
37         Let r be s.t. r ∈ Y ∧ r.h = acceptc.h
38         Log[s] ← (acceptc, r)
39         Accepts[s] ← {acceptc}
40         send ⟨proposec, r⟩ to all π in Π − {πi}
41         Z ← {x ∈ X : x.m = TRUE}
42         foreach acceptc ∈ Z do
43             send ⟨acceptc⟩ to πi
Algorithm 5.3: Leader Election for πi (untrusted part of HardPaxos)
HARDCORE: ELECTING A PROCESS
When a candidate suspects the current leader is faulty, it calls Prepare() (Algorithm 5.4).
Prepare() increments the epoch counter E – more precisely, it chooses the next epoch in
lexicographical order corresponding to πj – and returns a preparec and a promisec certificate. We
ignore for the moment the updates of A, Ap and Ah.
Promise() checks whether the given preparec certificate is legal and the epoch in the certifi-
cate is greater than the current epoch. In the positive case, HardCore saves the epoch counter
E in the previous epoch variable Ep, and adopts the new epoch e by overwriting its E counter.
From this time on, the follower’s HardCore ignores any proposals from older epochs because
the Accept() function only accepts a proposal in epoch e if e = E (Algorithm 5.2, Line 12). This
scheme guarantees Invariant 5.1.
HARDCORE: ACCEPTING THE RIGHT PROPOSALS
During recovery, HardCore has to reset the A counter (Algorithm 5.4, Lines 7 and 15);
otherwise, slots that have already accepted a request cannot be modified. Note that in different
epochs multiple requests might have to be accepted for the same slot before a request is
finally chosen, i.e., before agreement on the slot is reached among the processes.
A naive approach would be to reset A to 0, but that would make garbage collection impossi-
ble (see Section 5.2.5) by requiring all committed requests to be reconsidered in the recovery.
Slots that have already reached agreement are eventually committed during normal operation,
and the highest committed slot is marked by C. So we reset the acceptance counter A to C,
and memorize the highest accepted slot in Ah.
A promisec certificate consists of the new epoch E , the highest accepted slot Ah, and the
highest committed slot C. The candidate πj uses the received promisec certificates to start
its leadership and to determine from which slot to start the normal execution.
 1 upon initialization of πi do
 2     Ep, Ah, Ap ← 0
 3 function Prepare()
 4     Ep ← E ; E ← nextEpoch(E)
 5     if A > Ah then
 6         Ah ← A
 7     Ap ← A; A ← C
 8     return seal(⟨PREPARE, πi, E⟩), seal(⟨PROMISE, πi, E, Ah, C⟩)
 9 function Promise(⟨PREPARE, π, e⟩)
10     if ¬legal(⟨PREPARE, π, e⟩) or e ≤ E then
11         return ⊥
12     Ep ← E ; E ← e
13     if A > Ah then
14         Ah ← A
15     Ap ← A; A ← C
16     return seal(⟨PROMISE, πi, E, Ah, C⟩)
17 function Update(⟨ACCEPT, π, e, ea, s, h, m⟩)
18     if ¬legal(⟨ACCEPT, π, e, ea, s, h, m⟩) or e ≠ Ep or (¬m and s ≤ Ap and ea ≠ Ep) then
19         return ⊥
20     else
21         return seal(⟨ACCEPT, πi, E, ea, s, h, s ≤ C⟩)
22 function StartEpoch(Promises)
23     V ← legal(Promises)
24     if |{v ∈ V : v.e = E}| ≤ f then
25         return FALSE
26     else
27         return TRUE
28 function Repropose(s, Accepts, Promises)
29     if ¬leaderOf(E) = πi or s ≠ A + 1 or A − C ≥ K then
30         return ⊥, ⊥
31     Vp ← legalSet(Promises)
32     if |Vp| ≤ f then
33         return ⊥, ⊥
34     if ∃v ∈ Vp : v.e ≠ E or ∀v ∈ Vp : v.a < s then
35         return ⊥, ⊥
36     Va ← legalSet(Accepts)
37     if ∃v ∈ Vp : v.a ≥ s ∧ ∄w ∈ Va : w.p = v.p ∧ w.s = s ∧ w.ea = E then
38         return ⊥, ⊥
39     h ← chooseHash(Va)
40     A ← A + 1
41     return seal(⟨PROPOSE, πi, E, A, h⟩), seal(⟨ACCEPT, πi, E, E, A, h, s ≤ C⟩)
Algorithm 5.4: Leader Election for πi (HardCore)
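The counter handling in Prepare() and Promise() can be sketched as follows. The sketch rests on several assumptions: the class name ElectionCounters is hypothetical, epochs are plain integers (so the next epoch is simply E + 1), certificates are plain tuples, and the seal() and legal() checks are omitted.

```python
class ElectionCounters:
    """Toy model of the epoch change in Prepare() and Promise()."""

    def __init__(self):
        self.E = 0   # current epoch
        self.Ep = 0  # previous epoch
        self.A = 0   # acceptance counter
        self.Ah = 0  # highest slot ever accepted
        self.Ap = 0  # value of A when the previous epoch ended
        self.C = 0   # highest committed slot

    def _enter_epoch(self, new_epoch):
        self.Ep, self.E = self.E, new_epoch
        self.Ah = max(self.Ah, self.A)  # memorize the highest accepted slot
        self.Ap, self.A = self.A, self.C  # reset A to C, not to 0
        return ("PROMISE", self.E, self.Ah, self.C)

    def prepare(self):
        """Candidate side: move to the next epoch and promise to itself."""
        promise = self._enter_epoch(self.E + 1)
        return ("PREPARE", self.E), promise

    def promise(self, e):
        """Follower side: adopt epoch e only if it is newer than the current one."""
        if e <= self.E:
            return None  # corresponds to returning ⊥
        return self._enter_epoch(e)
```

Resetting A to C rather than to 0 keeps already committed slots out of the recovery, while Ah still tells the new leader how far acceptances had progressed.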
5.2.3 RECOVERY
PROMISING THE RIGHT ACCEPTED REQUESTS
Invariant 5.2 only holds if each process (1) only accepts one request per epoch in a slot s and
(2) does not forget about any acceptance. (1) is easily guaranteed because the acceptance
counter A is incremented in Accept() (Algorithm 5.2, Lines 6 and 15), disabling the acceptance
of other requests for the same slot in the same epoch. To illustrate the need for (2), consider
the following fault scenario in which a process “forgets” an acceptance: Process πi accepts
request r for some slot in epoch e. A new leader election occurs, and πi lies to the leader
during recovery, returning a request r′ accepted for that slot in a previous epoch e′.
The Update() function described in Algorithm 5.4, Line 17, guarantees that forgetting is
detectable as follows. Whenever a process πi enters a new epoch (Lines 4 and 12), πi
binds each acceptc certificate in Log with the current epoch E using Update() (called in
Algorithm 5.3, Lines 8 and 19). If the acceptc certificates are not updated, they will be ignored by
the HardCore of the leader.
Update() performs two important checks before updating the epoch number of a certificate.
First, it checks whether the certificate is bound with the previous epoch Ep. The check implies
that the unhardened part of HardPaxos has scrupulously updated all certificates for every epoch
in which the process has participated. Second, if the slot s has not yet been committed, i.e.,
mark m is FALSE, then Update() checks whether the slot s has been accepted (or reaccepted)
by the process in the previous epoch by comparing it to Ap. In the positive case, ea must
be equal to the previous epoch Ep. In the negative case, the certificate is the latest accepted
for the slot since the certificate is bound to the previous epoch Ep, and the same check has
been performed before binding it to Ep. Update() guarantees that only the acceptc certificates
of latest accepted requests for each slot can be updated, i.e., bound to the current epoch E ,
while older acceptc certificates expire and are ignored.
If the tests in Update() pass, HardCore returns a new accept certificate. The new certificate
contains, among other fields, the current epoch E , the epoch in which the request was ac-
cepted last ea (i.e., the previous epoch in which the process participated) and field m marking
whether the request has been committed.
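The two checks performed by Update() can be sketched in Python as follows. This is an illustration under stated assumptions: the names Accept and update are hypothetical, epochs are plain integers, the legality check is omitted, and the HardCore state (E, Ep, Ap, C) is passed in explicitly.

```python
from typing import NamedTuple, Optional


class Accept(NamedTuple):
    e: int    # epoch the certificate is bound to
    ea: int   # epoch in which the request was accepted last
    s: int    # slot
    h: str    # request hash
    m: bool   # mark: TRUE if the slot was already committed


def update(cert: Accept, E: int, Ep: int, Ap: int, C: int) -> Optional[Accept]:
    """Rebind cert to the current epoch E, or return None (⊥) if it is stale."""
    if cert.e != Ep:
        return None  # not bound to the previous epoch
    if not cert.m and cert.s <= Ap and cert.ea != Ep:
        return None  # slot was (re)accepted in Ep, but cert is older
    return Accept(E, cert.ea, cert.s, cert.h, cert.s <= C)
```

A process that tries to present an older acceptance for a slot it reaccepted in the previous epoch fails the second check, so the forgotten acceptance becomes detectable.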
REPROPOSING THE RIGHT REQUESTS
We now describe how the new leader recovers the followers. As mentioned above, upon
receiving promise messages, the candidate πj collects the Logs and the promisec certificates
contained in the messages (Algorithm 5.3, Line 23). Process πj becomes the leader once it
receives legal promisec certificates from a majority of processes and StartEpoch() returns TRUE
(Algorithm 5.4, Line 22). StartEpoch() only returns TRUE if more than f legal promises match
the current epoch E .
Once StartEpoch() returns TRUE, the leader calls Recover() to repropose previous requests
not committed by all processes (Algorithm 5.3, Line 28). The recovery calculates the first and
last slots to be reproposed. The first is the minimum committed slot among all processes
plus one. The last is the highest slot accepted by any process in a majority. For each slot
s, the leader creates a set X with the accept certificates contained in the Recovery set.
These certificates were collected with the promises. The leader then calls the Repropose()
function, which is implemented in the HardCore. Repropose() selects the correct request to
be reproposed at slot s by inspecting the given accept certificates and promise certificates.
In case the HardCore returns legal certificates, the leader now collects a set Y of requests
from the Recovery set. Next, it picks out of Y the request matching the hash chosen by the
Repropose() function. The leader accepts the request with hash h in slot s, possibly overwriting
an old accepted request and certificate, and also resets the Accepts map with its own acceptc
certificate (Algorithm 5.3, Line 39). Next, the leader sends a new proposal for slot s to all
followers, which accept it as in normal operation. If some process has
committed slot s already, it will ignore the proposal. To simplify the algorithm, the leader
sends to itself all committed acceptc certificates it has from other processes (Line 42). A
simple optimization would be possible here: the leader could repropose requests only to the
followers that have not committed s.
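The majority test of StartEpoch() and the slot range computed by Recover() can be sketched as follows. The names Promise, start_epoch and recovery_range are hypothetical, epochs are plain integers, and all promises are assumed to be legal already.

```python
from typing import List, NamedTuple


class Promise(NamedTuple):
    e: int  # epoch the promise is bound to
    a: int  # highest accepted slot (Ah) of the promising process
    c: int  # highest committed slot (C) of the promising process


def start_epoch(promises: List[Promise], E: int, f: int) -> bool:
    """TRUE only if more than f promises match the current epoch E."""
    return sum(1 for p in promises if p.e == E) > f


def recovery_range(promises: List[Promise]) -> range:
    """Slots to repropose: minimum committed slot plus one up to the highest accepted slot."""
    min_c = min(p.c for p in promises) + 1
    max_a = max(p.a for p in promises)
    return range(min_c, max_a + 1)
```

If every process in the majority has committed everything it accepted, the range is empty and the leader can continue with new proposals immediately.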
We now come back to Invariant 5.2(b), i.e., a request r can only be reproposed if r is the
request of the proposal m among all proposals accepted by the processes in a majority such
that m has been issued in the highest epoch e′ with e′ < e. To guarantee this invariant, the
reproposed hash h is chosen from the acceptc certificates inside HardCore. Algorithm 5.4,
Line 28, describes the Repropose() function. It starts by checking whether the process is
indeed the leader for the epoch and whether the slot being reproposed is the next slot. It
then checks whether there is a majority of legal promisec certificates bound to the current epoch.
HardCore also has to make sure that at least one of the promisec certificates shows a slot
greater than or equal to slot s since s cannot be reproposed if it has never been proposed before
– such a scenario could occur if a faulty process calls Repropose() when it actually should not.
Next, HardCore selects the legal acceptc certificates out of the Accepts set – the Accepts set is
the set X in the untrusted part of the algorithm. Line 37 checks that for every process that
accepted no less than s requests, there is an acceptc certificate. Processes that do not provide
an acceptc certificate must instead provide a promisec certificate with field a less than s; this
way they prove they have never seen any request in s. Line 39 selects the hash h from the acceptc
certificate with the highest accepted epoch number ea, just as in Paxos. Finally, Repropose()
returns a proposec and an acceptc certificate. Note that once the leader exits the Recover()
function, it is not able to propose new requests while Ah > A; in other words, the Recover()
function has to call Repropose() for all slots between the last committed slot and the highest
accepted slot. If a faulty leader does not call Repropose() correctly, e.g., with no acceptc
certificate for a slot s, HardCore returns illegal certificates without incrementing A.
Consequently, no further Repropose() with a higher s succeeds and the leader gets stuck. In such
a case, some other process becomes a candidate (and eventually leader) after a timeout.
As a result, HardCore ensures Invariant 5.2(b) holds because (1) processes cannot lie about
what they accepted in a slot; (2) only the latest accepted requests are updated from epoch
to epoch; (3) there is only one process reproposing in an epoch; (4) the leader can repropose
only one request per epoch per slot; and (5) the leader can only repropose the request of the
proposal issued in the highest epoch, and this request is selected inside HardCore.
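The selection rule behind chooseHash() (Algorithm 5.4, Line 39) can be sketched as follows. The names AcceptCert and choose_hash are hypothetical and epochs are plain integers; the text does not specify how ties are handled, so the sketch simply relies on all certificates with the same ea carrying the same hash.

```python
from typing import List, NamedTuple


class AcceptCert(NamedTuple):
    p: int    # process that issued the certificate
    ea: int   # epoch in which the request was accepted last
    h: str    # request hash


def choose_hash(accepts: List[AcceptCert]) -> str:
    """Pick the hash of the certificate accepted in the highest epoch ea."""
    return max(accepts, key=lambda cert: cert.ea).h
```

For example, given certificates accepted in epochs 1, 3 and 2, the hash accepted in epoch 3 is the one reproposed.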
5.2.4 CORRECTNESS
All messages in HardPaxos are sent along with a certificate generated by the HardCore.
Correct processes discard any messages with illegal certificates. Hence, the correctness
of HardPaxos mainly depends on showing that legal certificates are correctly created even in
face of arbitrary behavior in the untrusted part of the algorithm. Remember that we assume
the HardCore can only fail by crashing (Assumption 5.3).
AGREEMENT
The agreement property states that any two correct processes can only deliver the same request
r with hash h in some slot s, i.e., no two commitc certificates for the same slot can be created
with different request hashes. From Invariants 5.1 and 5.2 it follows that, once a majority of
processes accepted a request r with hash h in a slot s, all subsequent proposals for s are the
request with hash h, implying that only the request r with hash h can be delivered in s. To
understand why this holds, remember that acceptc certificates are created only when proposec
certificates are received or created (Algorithm 5.2, Lines 3 and 12). proposec certificates are
created under two circumstances: when a leader reproposes slots during recovery, or when
a leader proposes a client request during normal operation.
A faulty follower could try to fool a new leader during recovery by sending an old acceptc
certificate in a promise message. Nevertheless, the new leader only accepts certificates
bound to its epoch E . Therefore, the faulty follower would have to update the old acceptc
certificate calling Update(). Update(), however, only updates the latest acceptc certificates
of the previous epoch Ep, i.e., epoch e in the certificate is equal to Ep. Moreover, Update()
checks whether the certificate was (re)accepted in the previous epoch. If the request was
(re)accepted in the previous epoch, i.e., acceptc.s is less than or equal to Ap, then the accept
epoch ea in acceptc should be equal to Ep. If the request was committed in the previous
epoch, then the mark m has not been set yet and ea should be equal to Ep as well. If the
request was committed in an epoch e′ preceding Ep, then the mark m should have been set
in some epoch between e′ and the previous epoch Ep and, hence, the latest acceptc has been
selected in that epoch.
A faulty leader cannot fool other processes either. The leader can only propose a new
request if a majority of processes certify they have not accepted any request in s for any
epoch less than E (Algorithm 5.2, Line 8). When reproposing slot s, if different requests were
accepted in s by different processes, i.e., provided acceptc certificates containing different
request hashes, then the leader’s HardCore chooses for reproposal the request accepted
in the greatest epoch. Finally, from Invariants 5.1 and 5.2 it follows that, once a majority of
processes agreed on a request r in a slot s, all subsequent leaders propose r for s. When
the current leader gathers acceptc certificates from a majority, HardCore generates a commitc
certificate. Correct processes only deliver requests upon receiving such a commitc certificate.
Hence, no two correct processes ever deliver different requests in the same slot. A faulty
process can still deliver a different request, but the reply will be filtered by the client voting
mechanism explained in Section 5.1.3.
VALIDITY
Clients protect their requests with a hash (Section 5.1.3), which is checked by the service
before executing. HardPaxos satisfies validity since error correcting codes cannot be forged.
In particular, a leader cannot create a request out of the blue and calculate its hash correctly
only due to hardware errors. Assumption 5.2 allows the implementation to use a simple CRC
function to protect the client requests.
TERMINATION
Termination is guaranteed if eventually all links are stable and all faulty processes become silent
(Assumptions 5.4 and 5.5). Without assuming that faulty processes eventually become silent,
a “babbling idiot” could hamper progress by, for example, constantly sending ever-increasing
PREPARE messages, or simply flooding the network switches. Note that if Assumptions 5.4
and 5.5 do not hold in practice, HardPaxos still guarantees the safety properties.
5.2.5 GARBAGE COLLECTION
The Log variable, which keeps accepted and committed requests, cannot grow without limits.
We now explain the garbage collection used to prune the log in HardPaxos.
SNAPSHOTS AND PRUNING
Three mechanisms are introduced to enable garbage collection. First, the SMR server library
asks the service to take a snapshot of its state every Z commits (a similar mechanism is used
by PBFT [CL99]). Z is called snapshot period.
Second, we tie together acceptance and commit counters by allowing the acceptance
counter A to be incremented only if C ≥ A − K , where K is a constant called commit distance.
Consider the example where K = 20 and Z = 100. The K limit guarantees that if A = 121 on
process π, then π did a snapshot at slot 100 since C has to be at least 100 to increment A
from 120 to 121.
Third, we introduce snapshot slots, slots for which A is such that (A − K ) modulo Z is 1.
After proposing 120 slots, the leader constructs the next PROPOSE with A = 121 in a special
way: the proposal contains no payload, and its h value is assigned the digest of its latest
snapshot (at slot 100). When a follower receives this proposal message, it compares the
received digest with the digest of its own snapshot at slot 100, and replies with an accept
message only if the digests are the same. Note that this mechanism utilizes the HardCore’s
normal-operation functions and messages.
After the leader receives accept messages for slot 121 from a majority of processes, it com-
mits, making the snapshot stable. The leader then sends a commit message to the followers
with C = 121. This commit message is called a snapshot message, and the corresponding
certificate a snapshotc certificate. Once a process receives a snapshotc, it can prune its log
up to slot s = 100.
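The arithmetic of the example with K = 20 and Z = 100 can be checked with a small Python sketch. All function names are hypothetical. The guard follows the prose condition C ≥ A − K evaluated on the current value of A, and the formula for the slot of the snapshot referenced by a snapshot slot (A − K − 1) is an assumption derived from the example, not a definition from the text.

```python
def may_increment_acceptance(a: int, c: int, k: int) -> bool:
    """Prose condition: A may be incremented only if C >= A - K."""
    return c >= a - k


def is_snapshot_slot(a: int, k: int, z: int) -> bool:
    """Snapshot slots are those for which (A - K) modulo Z is 1."""
    return (a - k) % z == 1


def referenced_snapshot(a: int, k: int) -> int:
    """Assumption: the snapshot slot A carries the digest of the snapshot at A - K - 1."""
    return a - k - 1
```

With these definitions, slot 121 is a snapshot slot that refers to the snapshot taken at slot 100, and A cannot move from 120 to 121 unless C has reached 100.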
The snapshotc certificate guarantees that f + 1 processes agreed on the snapshot, i.e., at
least one correct process made the snapshot. Since this certificate contains not only the slot,
but also a snapshot digest, the certificate alone is sufficient to fetch a correct snapshot from
another process.
CATCH UP
When some process π starts to lag behind, π performs a catch up to bring itself up-to-date.
Process π notices that it needs to catch up when it receives a proposal with slot higher
than its A+K . To catch up, the process has to fetch the latest snapshot as well as the
log of subsequent proposals. In the presence of arbitrary failures, it is only safe to get this
information from at least a majority of other processes³. That is why a catch up is done via
leader election. The recovering process proposes a new epoch. The followers reply to it with
promise messages, which contain not only the log of requests, but also the latest snapshotc
certificate. The leader chooses the actual latest snapshotc certificate and searches for the
corresponding snapshot. If the leader itself does not possess the latest snapshot, it asks
other replicas for it. After the snapshot is received, it is applied on the leader’s state. Besides
that, the leader has to call a special CatchUp() function in HardCore with the corresponding
snapshotc certificate. CatchUp() updates the C and A counters to the slot of the snapshot.
After installing the snapshot, the leader is up-to-date and can start reproposing requests as
described above.
5.3 ENFORCING TRUST IN HARDPAXOS
HardPaxos only satisfies Agreement if the HardCore never fails arbitrarily (Assumption 5.3), i.e.,
faults in the computation and memory affecting the integrity of the variables inside HardCore
can only result in detectable benign failures such as illegal certificates or process crashes. We
enforce Assumption 5.3 with the AN-encoding technique presented in Chapter 3. Moreover,
we combine AN-encoding with duplicated execution to improve fault coverage. The duplicated
execution employed here is inspired by, but much simpler than, the approach of Chapter 4. We
briefly review AN-encoding and its error model and then describe how we harden HardCore.
5.3.1 AN-ENCODING AND ERROR MODEL
AN-encoding extends the representation of each variable x of a program module from n to n+k
bits such that only 2^n of the 2^(n+k) values are code values, while the remaining are noncode,
i.e., invalid. Ideally, by randomly choosing the code values, the likelihood of an error resulting
in another code value is 2^(−k). In practice, AN-encoding transforms, for example, 32-bit variables
into 64-bit variables and “randomizes” the code values by multiplying the functional value xf
of each variable x with a constant A: xc = xf ·A. The domain of the encoded x is divided into a
few code values (multiples of A) and many noncode values (not multiples). The integrity of x
is verified applying the modulo operation with the constant A. If the result is zero, x is code,
otherwise noncode. Moreover, operations are also transformed, so that correctly executed
encoded operations preserve the code, and incorrectly executed encoded operations break
the code with high probability (see discussion in Section 3.3).
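The basic mechanics of AN-encoding can be illustrated with a minimal Python sketch. The constant A below is an arbitrary odd value chosen for illustration only, and the helper names are hypothetical; a real encoding compiler transforms every operation of the module, which is not shown here.

```python
A = 58659  # illustrative odd constant (an assumption, not taken from the text)


def encode(xf: int) -> int:
    """Encode a functional value: xc = xf * A."""
    return xf * A


def is_code(xc: int) -> bool:
    """A value is code if it is a multiple of A."""
    return xc % A == 0


def decode(xc: int) -> int:
    """Return the functional value, refusing noncode values."""
    if not is_code(xc):
        raise ValueError("noncode value detected")
    return xc // A


def encoded_add(xc: int, yc: int) -> int:
    """Addition preserves the code: xf*A + yf*A = (xf + yf)*A."""
    return xc + yc


def flip_bit(xc: int, bit: int) -> int:
    """Model a single bit flip in an encoded value."""
    return xc ^ (1 << bit)
```

A single bit flip changes the encoded value by plus or minus a power of two, which is never a multiple of an odd A greater than 1, so the corrupted value is always noncode in this model.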
Forin [For89] models hardware faults as symptoms perceived at the program level (see exam-
ples in Section 2.3.2). In this model, AN-encoding can detect faulty operations, i.e., incorrectly
executed operations, and modified operands, e.g., variables corrupted by bit flips or stuck-at
bits. AN-encoding, however, cannot detect some error classes: exchanged operator, e.g., an
addition is exchanged by a subtraction; exchanged operand, e.g., a variable argument is
exchanged with another variable due to a corrupt address; and lost stores, e.g., a value is stored
in the wrong address and the correct address keeps its old (possibly code) value. Moreover,
control-flow errors are only detected if variables become noncode.
³ snapshotc and commitc certificates from one process are actually safe to apply; however, proposec certificates
shall be fetched from a majority.
5.3.2 HARDENING HARDCORE
We employ an AN-encoding compiler⁴, which transforms C code into encoded C code, which
in turn is compiled with gcc version 4.6. The encoding compiler transforms the program code
and variables in the target module, including pointers, into encoded variables. During runtime,
if a noncode value is detected, the process is aborted, transforming a state corruption into a
crash failure.
OVERHEAD
AN-encoding is known to incur high execution and state overhead. Nevertheless, relative
to the whole process, which includes SMR library and service, AN-encoding has a limited
performance impact (as we show in Section 5.5). Similarly, the state overhead is rather small,
doubling the HardCore’s state size from 28 to 56 bytes and certificates’ size, for example
acceptc, from 24 bytes to 48 bytes.
END-TO-END PROTECTION
Certificates are always kept encoded, i.e., during computation inside HardCore and also during
storage and transmission outside HardCore. Since the certificates are always kept encoded,
the function seal() in Section 5.2 simply returns the certificate – no additional error-detection
codes have to be used other than AN code. The function legal() in Section 5.2 returns TRUE
if every encoded value in the certificate is a code value.
IMPROVED FAULT COVERAGE
As we discussed above and in Chapter 3, AN-encoding cannot capture all error classes. To
improve the probability of detecting control-flow, exchanged operator and lost store errors, we
also duplicate the HardCore module.
The implementation is fairly simple. We allocate the HardCore object twice and add a wrapper
around both objects with the same interface. Every call to HardCore is captured by the
wrapper, which in turn calls both copies of the module and compares the certificates on return.
If an error is not detected by AN-encoding, a discrepancy on the returned certificates should
be seen as long as the fault affects only one of the copies. To provide end-to-end protection,
the final certificates returned by the wrapper are always a concatenation of the certificates
from both HardCore copies. In Section 5.5, we evaluate both approaches: AN-encoded and
duplicated-AN-encoded.
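The duplication wrapper can be sketched as follows. The class names ToyCore and Duplicated are hypothetical, the toy module only models a commit counter, and raising an exception on a discrepancy is an assumption: the text states that the certificates are compared, but not how the wrapper reacts.

```python
class ToyCore:
    """Stand-in for one HardCore copy; only the commit counter is modeled."""

    def __init__(self):
        self.C = 0

    def commit(self, h):
        self.C += 1
        return ("COMMIT", self.C, h)


class Duplicated:
    """Wrapper that calls both copies and compares the returned certificates."""

    def __init__(self, factory):
        self.first = factory()
        self.second = factory()

    def call(self, name, *args):
        cert1 = getattr(self.first, name)(*args)
        cert2 = getattr(self.second, name)(*args)
        if cert1 != cert2:
            raise RuntimeError("copies disagree")  # assumed reaction: abort
        return cert1 + cert2  # concatenation of both certificates
```

Returning the concatenation rather than one of the two certificates keeps the comparison checkable by the receiver, which is the end-to-end aspect mentioned above.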
5.4 FAULT COVERAGE EVALUATION
We now evaluate the fault coverage of HardCore by performing software fault injection. We
consider the following questions: (1) If no hardening is used, is the number of arbitrary failures
caused by injected faults non-negligible? (2) Can HardCore with AN-encoding detect as many
errors as HardCore with AN-encoding and duplicated execution? (3) What errors can cause
the hardened HardCore to fail arbitrarily?
⁴ In contrast to Chapter 3, we use here a commercial AN-encoding compiler from SIListra Systems GmbH:
http://silistra-systems.com
Outcome       Failure         Native (%)   AN (%)   2AN (%)
AN-detected   AN-crash            –         35.2     39.6
              Illegal cert        –          0.4      0.4
              Illegal state       –          0.4      1.0
Crash/other   Crash             76.3        55.8     55.8
              Hang               0.3         1.3      1.4
              Return ⊥           8.8         6.0      1.7
Undetected    Unexp. cert        9.0         0.5      0.02
              Unexp. state       5.6         0.4      0.0
Table 5.3: Results of fault injection in HardCore. Columns show the percentage of errors in
each category.
5.4.1 METHODOLOGY
To study the fault coverage of HardCore, we wrote a simple test case in three variants:
native, AN and 2AN. The test case calls the most used functions of HardCore, i.e., Propose(),
Accept(), Commit() and Committed(). The functions are called with pre-defined arguments,
and the HardCore’s state and certificates are checked on return. For example, after a call to
Propose(), the test checks whether: (1) the returned proposec and acceptc certificates are
legal, (2) the proposec and acceptc certificates have expected values, and (3) the HardCore’s
state has expected values.
We inject one fault in one function for each test execution. The outcome of an execution
is classified as: Crash, if it crashes because of a segmentation fault or an assertion; Hang, if
it does not return after 10 seconds; Return ⊥, if the function returns an error code; AN-crash, if
the execution crashes due to a hardening detection; Illegal certificate, if the function returns
a noncode certificate (i.e., detectable with AN checks); Illegal state, if the HardCore’s state is
noncode; Unexpected certificate or unexpected state, if a certificate or the state is code but
unexpected (i.e., not detectable). The latter two failures are severe: they might propagate to
other components, breaking safety properties and bringing replicas to an inconsistent state.
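The outcome classification can be sketched as a simple decision function. The names Observation, classify and is_undetected are hypothetical, and the order in which the conditions are tested is an assumption; the text only defines the categories themselves.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    an_abort: bool = False         # hardening check aborted the process
    crashed: bool = False          # segmentation fault or assertion
    timed_out: bool = False        # no return after 10 seconds
    returned_bottom: bool = False  # function returned an error code
    cert_is_code: bool = True
    state_is_code: bool = True
    cert_expected: bool = True
    state_expected: bool = True


def classify(o: Observation) -> str:
    if o.an_abort:
        return "AN-crash"
    if o.crashed:
        return "Crash"
    if o.timed_out:
        return "Hang"
    if o.returned_bottom:
        return "Return ⊥"
    if not o.cert_is_code:
        return "Illegal cert"
    if not o.state_is_code:
        return "Illegal state"
    if not o.cert_expected:
        return "Unexp. cert"
    if not o.state_expected:
        return "Unexp. state"
    return "No failure"


def is_undetected(outcome: str) -> bool:
    """Only code-but-unexpected results can propagate silently."""
    return outcome in {"Unexp. cert", "Unexp. state"}
```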
To inject faults, we use the same fault injection tool developed for Section 4.5.1. The tool
randomly selects an instruction of the target function to inject one fault⁵. For each of the
four functions, the test runs as many times as needed to gather 1,000 executions with a detected
outcome. For each injection, the tool randomly selects one of the following low-level faults:
CF (corrupts instruction pointer), WVAL (corrupts memory location after being written), RVAL
(corrupts memory location before being read), WADDR (corrupts address before writing to it),
RADDR (corrupts address before reading from it), WREG (corrupts register after being written),
or RREG (corrupts register before being read). A corruption randomly flips one bit of a register
or an address.
5.4.2 RESULTS
Table 5.3 summarizes the executions that resulted in some form of failure. The native variant
failed in 4688 out of 9466 executions; AN failed in 4036 out of 6216 executions; and 2AN failed
in 4001 out of 6143 executions. For all variants, most of the failures are detections caused by
segmentation faults or assertions. Almost 15% of the failures of the native (unhardened)
HardCore were undetected, i.e., could compromise the safety of the system. With AN-encoding,
0.9% of the failures were undetected, while with AN-encoding and duplicated execution only
0.02%. Although most of the hardening detections stopped the process (AN-crash), 0.4% of
them were detectable non-silent failures. Note that illegal state failures result in AN-crashes
once the state is accessed again since the state is noncode.
⁵ We have not injected faults in the code segment because they often result in invalid instruction streams, quickly
crashing the system [Cor+12b].
Hardening   WVAL   RVAL   WADDR   RADDR   WREG   RREG   CF   Total
AN             5     10      10       1      6      2    2      36
2AN            1      0       0       0      0      0    0       1
Table 5.4: Number of undetected errors per fault type
5.4.3 UNDETECTED ERRORS
To understand the causes of undetected errors in the hardened variants, Table 5.4 shows the
absolute number of undetected errors that caused failures for each fault type. When possible,
we relate the injected faults with Forin’s high-level error model (Section 5.3).
Control-flow errors (CF) typically bring the process to a crash by executing some invalid
instruction. In contrast, because they modify addresses before writing into them, WADDR
faults are likely to result in lost stores, which are undetectable by AN-encoding. We analyzed
the log files and the disassembled binary to find out the symptoms of the remaining fault
types. To our surprise, most undetected errors were caused by faults that lead to lost stores.
For example, in one CF case the program jumped from the preamble of a store function into
the preamble of a function that does an overflow check. Once the function returned, the caller
continued execution without noticing the missed store. Moreover, 8 out of 10 RVAL cases
were lost stores: some pointer (on the stack) was modified before being loaded into a register
and then used to store a value.
Other than lost stores, we found one case of an exchanged operand (not detectable by AN-
encoding): The single RADDR case was an addition which had the address of one of its
operands modified to point to another code word in memory. One of the RVAL cases
was a pointer corruption that modified the HardCore’s state. When calling Propose(), the
implementation passes two pre-allocated pointers which hold the return certificates. Before
being encoded, one of them was corrupted so that it pointed to an offset inside HardCore’s state. When
Propose() wrote into the certificate, it overwrote the HardCore’s state.
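Why an exchanged operand escapes AN-encoding can be seen in a minimal C sketch. It assumes the textbook form of an AN code, in which a value x is encoded as A·x and a word passes the acceptance test if it is divisible by A. The constant AN_A and all helper names are illustrative assumptions and are not taken from the HardCore implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative AN constant (assumption); not the one used by HardCore. */
#define AN_A UINT64_C(58659)

/* Encode: a 32-bit value becomes a multiple of AN_A (fits in 64 bits). */
uint64_t an_encode(uint32_t x) { return (uint64_t)x * AN_A; }

/* Acceptance test: code words are exactly the multiples of AN_A. */
bool an_is_code(uint64_t xc) { return xc % AN_A == 0; }

/* Decode a word that has already passed the acceptance test. */
uint32_t an_decode(uint64_t xc) {
    assert(an_is_code(xc));
    return (uint32_t)(xc / AN_A);
}

/* Encoded addition: the sum of two code words is again a code word,
 * no matter which two code words were actually read from memory. */
uint64_t an_add(uint64_t ac, uint64_t bc) { return ac + bc; }
```

If the address of the second operand is corrupted so that an unrelated code word such as an_encode(100) is read instead of an_encode(5), the sum with an_encode(7) still passes an_is_code() but decodes to 107 instead of 12. The acceptance test alone therefore cannot flag this case.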
The 0.02% undetected errors of HardCore-2AN turn out to be a single case. In particular,
this single case seems to be a false positive: the pointer of the return certificate passed to
Propose() was corrupted on the stack of Propose() before being encoded. Both executions
of the HardCore wrote into the wrong memory. Once Propose() returned, the pointer in the
test case was still correct, but its content contained an old certificate, which was code but
unexpected. Old certificates are promptly rejected by the receiver’s HardCore; hence, they
are benign faults.
LOST STORE EXAMPLE
Since lost stores were the most common cause of undetected errors, we show one example
of a lost store caused by a WREG fault in the Commit() function. Consider for example the
snippet in Figure 5.4.
An encoded value xc kept in the stack -0x8(%rbp) has to be written into an encoded
address ac kept in register rax. First, ac has to be decoded by calling decode_an64() with
the encoded value in register rdi. The unencoded address a is returned in the 32-bit register
eax. memory64_write() takes as arguments the address in edi and the value in rsi. If a fault
modifies eax or edi before memory64_write() is called, then the value xc is written in some
other memory address, resulting in a lost store.
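// encoded value in -0x8(%rbp) and encoded address in %rax
mov %rax,%rdi                  // argument: encoded address ac
callq 400775 <decode_an64>     // decoded address in %eax
mov -0x8(%rbp),%rdx            // load encoded value xc from the stack
mov %rdx,%rsi                  // argument: encoded value xc
mov %eax,%edi                  // argument: decoded address a (unprotected)
callq 4008e0 <memory64_write>  // store xc at address a
Figure 5.4: Example of lost store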
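The effect shown in Figure 5.4 can be reproduced with a toy model in C. This is only a sketch under stated assumptions: memory is a small array of encoded words, AN_A is an illustrative constant, and toy_store() is a hypothetical stand-in for the decode-then-write sequence, not the actual decode_an64() or memory64_write() code. The fault_mask parameter models a bit flip in the register that holds the decoded address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define AN_A UINT64_C(58659)   /* illustrative AN constant (assumption) */
#define MEM_WORDS 8u           /* size of the toy memory (assumption)   */

uint64_t mem[MEM_WORDS];       /* toy memory holding encoded words only */

uint64_t an_encode(uint32_t x) { return (uint64_t)x * AN_A; }
bool an_is_code(uint64_t xc) { return xc % AN_A == 0; }

/* Fill the memory with an old but valid encoded value. */
void mem_init(void) {
    for (size_t i = 0; i < MEM_WORDS; i++) mem[i] = an_encode(7);
}

/* Decode the encoded address ac, then store the encoded value xc there.
 * fault_mask is XORed into the decoded address, which is no longer
 * protected by the code once it has been decoded. */
void toy_store(uint64_t ac, uint64_t xc, uint32_t fault_mask) {
    assert(an_is_code(ac) && an_is_code(xc));
    uint32_t a = (uint32_t)(ac / AN_A);  /* decoded, unprotected address */
    a ^= fault_mask;                     /* injected register fault      */
    mem[a % MEM_WORDS] = xc;             /* store lands at address a     */
}

/* Check every word in memory against the acceptance test. */
bool mem_all_code(void) {
    for (size_t i = 0; i < MEM_WORDS; i++)
        if (!an_is_code(mem[i])) return false;
    return true;
}
```

After mem_init(), calling toy_store(an_encode(2), an_encode(42), 0x1) writes the new value into word 3 instead of word 2. Every word still passes the acceptance test, and word 2 keeps its old, valid content, so a later read of word 2 has no way to notice the missed store.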
5.5 PERFORMANCE EVALUATION
We focus on the following questions: (1) What is the hardening overhead of HardCore? (2)
What is the response-time overhead perceived by the clients? (3) Can HardPaxos
reach the same throughput as a crash-tolerant SMR library?
5.5.1 IMPLEMENTATION AND BASELINES
We have implemented an SMR library in C using the same framework developed in Chapter 3.
In the following, we give the library the same name as the algorithm – HardPaxos. HardPaxos
uses TCP sockets to communicate with clients and between replica processes. HardPaxos
employs adaptive batching similar to PBFT’s: once a maximum number of in-flight proposals is
reached, subsequent requests are batched into a single proposal. The batch size is limited to
30 requests.
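The batching rule can be sketched as a small state machine in C. This is only an illustration under stated assumptions: MAX_INFLIGHT is a hypothetical window size (the text does not give one), requests that arrive while the window is full are merely counted rather than stored, and a queued batch is sent when an earlier proposal is decided. Only the limit of 30 requests per batch comes from the text; the type and function names are invented for this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_INFLIGHT 4u   /* hypothetical window of in-flight proposals */
#define MAX_BATCH   30u   /* batch size limit stated in the text        */

typedef struct {
    size_t inflight;      /* proposals sent but not yet decided */
    size_t pending;       /* requests waiting to be batched     */
} batcher_t;

/* A client request arrives. Returns the size of the proposal sent
 * immediately (1), or 0 if the request was queued for batching. */
size_t batcher_on_request(batcher_t *b) {
    assert(b != NULL);
    if (b->inflight < MAX_INFLIGHT && b->pending == 0) {
        b->inflight++;
        return 1;
    }
    b->pending++;
    return 0;
}

/* A proposal was decided. Returns the size of the batched proposal
 * sent in its place, or 0 if nothing was waiting. */
size_t batcher_on_decided(batcher_t *b) {
    assert(b != NULL);
    if (b->inflight > 0) b->inflight--;
    if (b->pending == 0) return 0;
    size_t n = b->pending < MAX_BATCH ? b->pending : MAX_BATCH;
    b->pending -= n;
    b->inflight++;
    return n;
}
```

Under light load every request is proposed on its own, so batching adds no delay; batches only form once the window is full, which is what makes the scheme adaptive.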
HardPaxos comes in two flavors: AN and 2AN. Whereas AN has HardCore hardened with
AN-encoding, 2AN has HardCore additionally hardened with duplicated execution. We compare
HardPaxos with a crash-tolerant and a Byzantine fault-tolerant baseline: Paxos determines
our performance upper bound, whereas PBFT⁶ determines our lower bound.
In Paxos, and consequently in HardPaxos, the leader can quickly become a network bottleneck:
for each request r, the leader proposes r’s payload to the followers (Prepare in Figure 5.2)
and would normally also send the reply to the client after r has been executed (Reply in Figure 5.2). To offload
the leader’s channels, the leader in our implementation does not send a reply to the client; only the followers provide
reply messages. The client library waits for at least f + 1 replies with the same payload, and
in case some follower fails to send a reply, the client retries.
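The client-side rule of waiting for f + 1 matching replies can be sketched in C as follows. The reply_t type and the function names are hypothetical and do not reflect the actual client library; retry logic and networking are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical view of a reply: a payload and its length. */
typedef struct {
    const unsigned char *payload;
    size_t len;
} reply_t;

/* Two replies match if their payloads are byte-wise identical. */
bool same_payload(const reply_t *a, const reply_t *b) {
    return a->len == b->len &&
           (a->len == 0 || memcmp(a->payload, b->payload, a->len) == 0);
}

/* True once at least f + 1 of the n collected replies carry the same
 * payload; otherwise the client keeps waiting or eventually retries. */
bool have_matching_replies(const reply_t *replies, size_t n, size_t f) {
    assert(replies != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        size_t matches = 0;
        for (size_t j = 0; j < n; j++)
            if (same_payload(&replies[i], &replies[j])) matches++;
        if (matches >= f + 1) return true;
    }
    return false;
}
```

With f = 1 and two followers, both followers have to answer with the same payload before the client accepts the result.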
5.5.2 SETUP
Experiments were conducted on a cluster of 30 machines, each with two Intel Xeon E5405 2.0 GHz
processors, 8 GiB of RAM, and a Gigabit Ethernet interface. The measured maximal TCP bandwidth
in the cluster is 944 Mbit/s or 118 MiB/s.
We assume ideal conditions: links are up and timely, and there are no other jobs running
on the machines. All protocols are configured to tolerate 1 fault, i.e., we use 4 replicas
for PBFT and 3 replicas for Paxos, HardPaxos-AN, and HardPaxos-2AN. Replicas run on dedicated
machines, and clients are distributed uniformly among the other machines of the cluster.
Response times are computed for each client request, and an average is calculated every
second. Throughput is controlled by increasing the number of clients. Multicast is disabled
in PBFT.
⁶ http://www.pmg.csail.mit.edu/bft
Variant  Leader  Follower
Native      284       132
AN       11,018     3,500
2AN      21,639     6,743
Table 5.5: HardCore’s cycles per commit

Variant  64 B  1 KiB  4 KiB
PBFT      340    415    662
Paxos     535    675    722
AN        551    704    751
2AN       562    724    753
Table 5.6: Response time of single requests (µs)
5.5.3 HARDCORE’S PERFORMANCE
We start by measuring the overhead incurred by hardening in our HardCore module. For that
we create a simple micro-benchmark that mimics the behavior of a leader and a follower in
a loop. Table 5.5 shows the number of hardware cycles per commit for native HardCore,
i.e., unhardened, and HardCore hardened with AN-encoding and duplicated execution. 2AN
takes roughly twice as many cycles as AN since 2AN essentially runs the same code twice
using two state copies. As expected, the overhead in comparison to native is high: with AN, about 39
times for the leader and 27 times for the follower; with 2AN, about 76 and 51 times, respectively. Hence, encoding would incur a prohibitive overhead
if applied to the whole process. Yet, as we show next, this overhead is not prohibitive in the
context of HardPaxos.
5.5.4 RESPONSE TIME AND THROUGHPUT
To evaluate the hardening overhead in HardPaxos, we use an echo service benchmark. The
echo service receives a client request with a dummy payload and sends back a reply with
the same payload. We experiment with payloads of 64 B, 1 KiB, and 4 KiB. We focus on this
benchmark because it exclusively shows the overhead of hardening in the atomic broadcast
protocol. Remember that the service is not hardened, only the HardCore. Our experiments
measure throughput and response time in graceful runs of PBFT, Paxos and the two versions
of HardPaxos.
Table 5.6 shows the mean response time for single requests of one client and payload sizes
of 64 B, 1 KiB, and 4 KiB. PBFT is slightly faster than Paxos for all cases. HardPaxos-AN and
HardPaxos-2AN present about 4% higher response time than Paxos, which demonstrates the
low overhead incurred by hardening under low loads.
Figure 5.5 shows that when many clients send small messages (64 B), the throughput is
limited by the leader’s CPU. Its utilization reaches 100% for all 4 protocols at high loads.
At all loads, HardPaxos performs almost as well as Paxos; the average difference is about
3-4%. For example, at 10 ms response time, both HardPaxos-AN and HardPaxos-2AN reach
96% of Paxos’ throughput. Compared to PBFT, HardPaxos shows at least 2.1 times higher
throughput. Hence, hardening only part of HardPaxos incurs an acceptable overhead even for
workloads with small messages.
One can see that the response time of PBFT (up to 38 k.op/s) is lower than that of our Paxos
and HardPaxos implementations. There are two reasons: (a) PBFT uses UDP
for all communication whereas Paxos and HardPaxos use TCP; (b) to minimize implementation
effort, our Paxos and HardPaxos libraries use memcpy() and queues in several
places in the leader’s critical path, introducing a higher overhead than PBFT’s optimized
implementation. Note that the smaller the message, the higher the response time of Paxos and HardPaxos.
That is caused by our batching implementation, which has a lower impact with larger
messages.
Figure 5.5 also shows experiments with payloads of 1 KiB, the typical payload size to evaluate
systems such as ZooKeeper [Hun+10]. Already with 1 KiB payloads, HardPaxos variants
achieve the network limit before reaching the CPU limit. In the worst case, HardPaxos-AN
presents 620 µs higher response time than Paxos, and HardPaxos-2AN 1,230 µs. This is due
to higher CPU utilization, which in its turn is induced by the additional CRC computation and
hardening of HardCore. At high loads, however, both achieve the limit of 50 k.op/s; here HardPaxos
and Paxos show the same throughput and response time due to aggressive batching.
The maximal throughput achieved by HardPaxos is about 2.3 times higher than PBFT.

[Figure 5.5 plot omitted. Panels: 64 B payload, 1 KiB payload, 4 KiB payload. x-axis: Throughput (k.op/s); y-axis: Response time (ms). Series: HardPaxos-AN, HardPaxos-2AN, Paxos, PBFT.]
Figure 5.5: Response time versus throughput for 64 B, 1 KiB, 4 KiB request and reply payload
Finally, Figure 5.5 shows that for large messages (4 KiB), HardPaxos achieves the same
throughput as Paxos regardless of the hardening scheme. The network limit is at about
13 k.op/s. These results show that Paxos and HardPaxos saturate the network with about
50 MiB/s of payload. That is consistent with the observation that, since the leader has to
send all client requests to two followers, only half of the network limit is available.
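The consistency argument can be checked with a short calculation, sketched here in C. The function names are hypothetical, and the conversion assumes 1 Mbit = 10^6 bits and 1 MiB = 2^20 bytes; protocol headers and the CRC are ignored.

```c
#include <assert.h>

/* Payload rate in MiB/s for a given request rate and payload size. */
double payload_mib_per_s(double ops_per_s, double payload_bytes) {
    return ops_per_s * payload_bytes / (1024.0 * 1024.0);
}

/* Share of the leader's outgoing link available per follower, in MiB/s,
 * when every request has to be forwarded to each follower.
 * Assumes 1 Mbit = 10^6 bits. */
double leader_share_mib_per_s(double link_mbit_per_s, unsigned followers) {
    assert(followers > 0);
    double bytes_per_s = link_mbit_per_s * 1e6 / 8.0;
    return bytes_per_s / followers / (1024.0 * 1024.0);
}
```

With the figures from the text, payload_mib_per_s(13000, 4096) is about 50.8 and payload_mib_per_s(50000, 1024) is about 48.8. Both lie just below leader_share_mib_per_s(944, 2), which is about 56.3 under the stated unit assumption, so the observed throughput limits are in line with a leader link shared between two followers.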
5.6 RELATED WORK
State-machine replication (SMR) was thoroughly examined more than two decades ago [Sch90].
Companies like Google, Yahoo! and Microsoft successfully employ the SMR approach in their
systems [Bur06; Hun+10; Bak+11], basing their implementations on a crash fault model.
Recently, several resource-efficient BFT algorithms have been published such as A2M-PBFT-
EA [Chu+07], MinBFT [Ver+13] and CheapBFT [Kap+12a]. Although all these protocols rely on
trusted modules, HardPaxos differs from them in several ways. First, these protocols are
based on PBFT by Castro and Liskov [CL99], which tolerates Byzantine failures, whereas our
work is based on Paxos [Lam98], which is crash-tolerant. HardPaxos sends fewer messages
than PBFT-based algorithms because it does not broadcast accept and commit messages,
and has a simpler leader election and recovery. Second, they require tamper-proof trusted
modules since they consider malicious adversaries. A2M’s trusted module is implemented
either in a trusted VM or as a hardware component; MinBFT’s trusted module is implemented
using a Trusted Platform Module (TPM); and CheapBFT’s trusted module is built on an FPGA.
Third, these works assume the trusted module does not fail arbitrarily. In our work, we
harden the HardCore so that it detects ASC faults with high probability. Our trusted module is
more complex than MinBFT’s and CheapBFT’s, but simpler than A2M since it only contains
counters. Notwithstanding, our approach could be used to harden a software-only version of
their trusted modules.
In Chapter 3, we also used AN-encoding to achieve error isolation; we also implemented
Paxos on top of our framework (see Section 3.5). That approach, however, suffers higher
execution and state overhead since the whole process has to be encoded. AN-encoding the
whole process incurs at least 4 times higher response time. PASC [Cor+12b] also incurs
non-negligible overhead. An SMR system hardened with the PASC approach has a 17% to
30% lower throughput than its crash-tolerant counterpart. Furthermore, both approaches
introduce 100% state overhead in the atomic broadcast algorithm, which can hold a large
log of requests. In contrast, we have encoded only a small part of HardPaxos, allowing the
overhead to be mitigated by the rest of the implementation.
As mentioned in Section 4.7, Martens et al. [Mar+14] recently proposed Crosscheck, an ap-
proach to protect replicated services against ASC faults that supports multithreading. The
high-level idea of Crosscheck and HardPaxos is similar: By exploiting the redundancy already
present in replicated services, one can mitigate the costs of hardening. The approach em-
ployed by Crosscheck, however, drastically differs from ours. Crosscheck requires the service
to be programmed in an object-oriented language that supports aspects such as C++. Using
aspects, Crosscheck enriches objects with checksum fields. These checksums are then used
by the replicas to cross-check their modifications to the state. To support multithreading,
Crosscheck also requires deterministic multithreading. In contrast, HardPaxos makes no assumptions
about the language in which the service is implemented, nor does it require a specific
structure from the service. HardPaxos mainly requires that the service uses the HardPaxos
library to perform the communication between processes. In fact, the service could
be implemented in any language that offers C-bindings – most major languages offer this
feature. Moreover, to enable multithreaded services, HardPaxos can be used to implement,
for example, execute-verify replication [Kap+12b].
5.7 CONCLUSION
In this chapter, we have presented HardPaxos, a hardened variant of the industry-standard
Paxos algorithm, and the corresponding state-machine replication library. By hardening only
a small portion of HardPaxos, we enable crash-tolerant SMR systems (including the service
running on top) to tolerate ASC faults with an excellent coverage at small performance costs.
Our fault coverage evaluation has shown a decrease of undetected errors from 15% down
to 0.02%. Moreover, our hardening schemes do not impair performance. The throughput
decrease of HardPaxos in relation to Paxos is at most 4% for 64 B sized messages and
virtually zero for 1 KiB sized or larger messages. The increase in the response time is also
small, no more than 5%.
In the context of SMR systems, we believe HardPaxos to be a more practical defense against
ASC faults than existing approaches in the literature such as Byzantine-fault-tolerant replication
[CL99], Crosscheck replication [Mar+14], or whole-process hardening with AN-encoding
(Chapter 3), PASC [Cor+12b], or SEI (Chapter 4).
Future work. AN-encoding is one approach to enforce the trust of HardCore. As future work,
one could apply SEI or PASC to harden the HardCore, reducing the overhead even further. These
approaches would not, however, withstand permanent faults.
In this chapter, we have not proven the HardPaxos algorithm correct. In particular, we do
not formally show that HardPaxos guarantees Paxos’ invariants. We see a more formal
evaluation of HardPaxos as a good starting point for future work. In this context, another open
topic is the modeling of the HardCore module on top of our formal framework introduced in
Chapter 2. Since the HardCore algorithm is rather simple, it might be possible to precisely
model the unhardened HardCore and a hardened HardCore variant (with one of the hardening
techniques presented in this dissertation). One could then mechanically verify the correctness
of the hardened HardCore under ASC faults with an automatic theorem prover or a model
checker.
6 CONCLUSION
State corruptions are a threat to distributed systems that can lead to catastrophic conse-
quences. In this dissertation, we propose hardening (Section 2.2.3) as a class of general
techniques for protecting distributed systems against state corruption caused by hardware
faults in an end-to-end fashion (Challenge 5, Chapter 1), while requiring changes strictly local
to each process of the system (Challenge 2).
Hardening techniques guarantee error isolation, the property in which no faulty process
propagates internal errors and contaminates correct processes. With hardening in place, de-
velopers can employ existing benign-fault-tolerant algorithms to implement systems tolerant
not only to crashes and message loss, but also to a wide range of hardware faults. Our hard-
ening techniques do not require any trusted software or hardware components (Challenge 3)
and are systematic (Challenge 4): They are fully- or semi-automatic, requiring minor effort from
the developer to employ them.
We now review our contributions and discuss open issues in light of the presented models
and hardening approaches.
6.1 PRACTICAL CONTRIBUTIONS
From a practical perspective, this dissertation contributes three techniques to harden dis-
tributed systems: encoded processes, SEI, and HardPaxos. In the encoded processes ap-
proach (Chapter 3), a developer builds her benign-fault-tolerant system on top of our frame-
work, which automatically hardens the processes with AN codes using a compiler transfor-
mation. Our software fault injection experiments show that encoded processes achieve a
high fault coverage: AN-encoding reduced the number of undetected errors from 16% down
to 0.34%. Nevertheless, for algorithms such as Paxos, which require a moderate to high
throughput, the approach presents a close-to-prohibitive performance and space overhead: at
least a fourfold increase in the response time (even for low loads) and a state increase of 100%.
Because hardware errors occur less often than process crashes, a solution to tolerate such
errors should incur as little overhead as possible (Challenge 8). In Chapter 4, we propose a new
and efficient hardening technique called SEI, which incurs an acceptable overhead (leveraging
hardware error-detection codes), supports multithreaded services, and can harden existing
code bases. As use cases, we harden the Deadwood DNS resolver and the well-known
memcached distributed caching system. Our fault injection campaigns again show a high fault
coverage, reducing the number of undetected errors from about 33% down to 0.25%. Our
performance evaluation shows that for workloads that are not CPU bound, e.g., memcached
running with multiple threads, SEI can exploit the unused CPU cycles to protect the service
against state corruption without impairing performance. However, the performance penalty
associated with hardening becomes more salient in CPU-bound services: For example,
Deadwood shows a non-negligible overhead of at least 40%, whereas memcached running
with a single thread and very small value sizes (8 bytes) shows an overhead of at most 30%.
Although a part of SEI’s and AN-encoding’s overhead might be mitigated by better compiler
transformations, a large part of the overhead incurred by these techniques is not amenable
to optimizations because it is rather intrinsic to the approach of protecting the whole service.
Even though our approaches do not harden the whole stack – i.e., TCP stack and operating
system are not hardened – they do harden the complete algorithm providing the desired
service. The larger the algorithm, the higher the CPU overhead the approaches introduce.
In Chapter 5, we give up some of the generality achieved with the previous two approaches,
and show how to protect replicated systems against state corruption with very low overhead
in space and performance. Our approach hardens only the core of the algorithm performing
the replication – i.e., Paxos – but does not harden the service itself. We call HardCore the
hardened core of Paxos, and we call HardPaxos the resulting atomic broadcast algorithm.
Although a developer has to implement the communication of her system using the HardPaxos
state-machine replication library, we make few assumptions on how the service itself
is implemented: it might even be implemented in languages other than C, and no source code
is required. In our experiments, HardPaxos incurs a performance overhead of no more than
5% of the throughput. Interestingly, the fault coverage of any service implemented on top
of HardPaxos only depends on the fault coverage of HardCore. We harden HardCore with
AN-encoding (also used in Chapter 3) and duplication, achieving an excellent fault coverage:
Only 0.02% of the errors are undetected.
6.2 THEORETICAL CONTRIBUTIONS
From a theoretical perspective, this dissertation also contributes to the state of the art in
several ways. In Chapter 2, we refine the error isolation property into a set of implementable
properties. Out of these properties, local error exposure (Property 2.3) is the hardest to
guarantee, since it restricts the behavior of faulty processes: The property asserts that if
a faulty process sends a corrupt message, this message has to carry enough information
such that a correct receiver process can recognize that the message is corrupt by simple
inspection of that message. We envision this additional information to be implemented as
some error-detection code such as CRC; in this way, the code also protects the message
against faults in the network.
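A receiver-side check of this kind can be sketched in C as follows. The message layout and the functions msg_seal() and msg_accept() are hypothetical names introduced only for illustration; the CRC-32 routine is the common bitwise form with the reflected polynomial 0xEDB88320. The sketch covers only the inspection step at the receiver. How a hardened sender ensures that a corrupted payload cannot be paired with a matching code is what the hardening techniques themselves provide and is not modeled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320). */
uint32_t crc32_ieee(const uint8_t *data, size_t len) {
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Hypothetical message: a payload plus the error-detection code. */
typedef struct {
    uint8_t  payload[64];
    size_t   len;
    uint32_t crc;
} message_t;

/* Attach the code to a message (stands in for the sender side). */
void msg_seal(message_t *m) {
    assert(m->len <= sizeof m->payload);
    m->crc = crc32_ieee(m->payload, m->len);
}

/* Receiver: accept a message only if the code matches the payload;
 * anything else is dropped, i.e., treated as a message omission. */
bool msg_accept(const message_t *m) {
    return m->len <= sizeof m->payload &&
           crc32_ieee(m->payload, m->len) == m->crc;
}
```

A message whose payload is altered after sealing, whether in the sender's buffers or on the network, fails msg_accept() and is ignored by the receiver.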
To formally express assumptions and design hardening algorithms (Challenge 7), we model
hardware faults as arbitrary state corruption (ASC) faults. Our ASC fault model is comprehen-
sive and precise (Challenges 1 and 6): An ASC fault is allowed to modify all state variables of
a process without any frequency limitations. State variables can represent here any piece of
hardware that holds volatile information such as CPU registers (including the program counter)
and memory locations in the CPU cache hierarchy as well as in the main memory.
Our ASC model can emulate any Byzantine behavior; hence, a local hardening technique
directly built on top of the model is impossible. We argue that a fundamental assumption to
enable hardening techniques to work is fault diversity (Assumption 2.2). Fault diversity asserts
that a validity predicate over some state variables holds even in the face of faults modifying these
variables during the execution. The intuition behind fault diversity is at the core of any error-
detection code: If a fault modifies a value protected by an error code, the probability of the
code and the value still passing the code’s acceptance test is negligible.
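For an AN code, this intuition can be made concrete with a small C sketch. It assumes the textbook acceptance test (a word is valid if it is a multiple of A) and an illustrative odd constant AN_A that is not taken from the dissertation; the function names are likewise hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define AN_A UINT64_C(58659)   /* illustrative odd AN constant (assumption) */

uint64_t an_encode(uint32_t x) { return (uint64_t)x * AN_A; }

/* Acceptance test of the code: valid words are multiples of AN_A. */
bool an_is_code(uint64_t xc) { return xc % AN_A == 0; }

/* Count how many of the 64 possible single-bit flips of the code word
 * xc would still pass the acceptance test. */
unsigned undetected_single_bit_flips(uint64_t xc) {
    assert(an_is_code(xc));
    unsigned undetected = 0;
    for (unsigned k = 0; k < 64; k++)
        if (an_is_code(xc ^ (UINT64_C(1) << k))) undetected++;
    return undetected;
}
```

Flipping bit k changes the word by plus or minus 2^k, which is never a multiple of an odd A greater than 1, so the count is zero for every code word. A corruption only passes the test if it happens to shift the word by a multiple of A, as in an_encode(5) + AN_A; fault diversity is the assumption that faults hit such values only with negligible probability.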
In Chapters 3 and 4, we define the AN-encoding and SEI techniques on top of the ASC
model, respectively. We prove these techniques correct given a set of assumptions to restrict
ASC faults beyond fault diversity, each technique having its own assumptions. In particular,
AN-encoding does not restrict the frequency with which ASC faults may occur, whereas SEI
requires that at most one ASC fault occurs per traversal. The practical consequence of this
difference is that AN-encoding can withstand permanent faults and faults in the text segment
of programs, while SEI cannot, since such faults are nothing other than several ASC faults in
sequence.
6.3 FUTURE WORK
We see several open issues that can and should be addressed in the future. Most of these
issues were already pointed out in the conclusion section of each of the previous chapters.
Here we highlight some further issues.
The majority of our software fault injection experiments assumed single-bit faults per execution
– with the exception of the experiments in Section 3.5.4, where we injected hardware-error
symptoms. We have decided to use single-bit faults because they are commonly used in fault
injection experiments in the literature. Notwithstanding, the ASC fault model is more general
than this model, and the hardening techniques presented in this work can cope in theory with
faults corrupting several bits, several variables, multiple times per execution. In the future,
one could make our claims stronger by extending our fault injection experiments with more
severe fault scenarios, e.g., multiple faults, permanent faults, and intermittent faults.
Note that our hardware fault injections (Section 4.5.2) were uncontrolled, possibly resulting
in several corruptions in multiple traversals. These experiments were, however, of a limited
scale. In the future, they can be extended and systematized to increase the statistical relevance
of the results. To achieve that, one should automate the machinery used to perform
these experiments with a remote-controlled power switch. Currently, our setup often requires
human intervention to restart the computer under test, which makes collecting a large number
of samples a laborious task.
Another direction for future work is extending existing techniques to provide end-to-end
guarantees. We believe that SWIFT can exploit instruction-level parallelism better than SEI.
SWIFT sequentially executes two copies of each instruction on different data (registers are
duplicated), allowing the processor to better exploit out-of-order execution. Extending SWIFT
with a more comprehensive control-flow error detection and end-to-end guarantees would
have a great practical impact. We think, however, that extending and proving SWIFT correct
in our framework would not be a trivial task.
Finally, the assumption coverage of AN-encoding was studied in some detail in Chapter 3,
where we estimated it assuming faults are random and uniformly distributed. In the future,
besides extending this study, which encompassed a single operation, we should perform
similar studies for other fault assumptions, in particular those of SEI.

BIBLIOGRAPHY
[Ama08a] Amazon. Amazon S3 Availability Event: July 20, 2008.
http://status.aws.amazon.com/s3-20080720.html. July 2008.
[Ama08b] Amazon. New defective S3 load balancer corrupts relayed messages.
https://forums.aws.amazon.com/thread.jspa?threadID=22709. June 2008.
[Ama08c] Amazon. “Odd data corruption during download”.
https://forums.aws.amazon.com/thread.jspa?messageID=86214. Apr. 2008.
[Ama11] Amazon. “Single-bit corruption of a small percentage of S3 data”.
https://forums.aws.amazon.com/thread.jspa?messageID=262676. July 2011.
[AMD12] AMD. AMD64 Architecture Programmer’s Manual. Volume 2: System Program-
ming. Sept. 2012.
[Ami+08] Y. Amir, B. Coan, J. Kirsch, and J. Lane. “Byzantine replication under attack”. In:
38th IEEE/IFIP International Conference on Dependable Systems and Networks
(DSN). June 2008, pp. 197–206. DOI: 10.1109/DSN.2008.4630088.
[Ati+12] Berk Atikoglu, Yuehai Xu, Eitan Frachtenberg, Song Jiang, and Mike Paleczny.
“Workload analysis of a large-scale key-value store”. In: Proceedings of the 12th
ACM SIGMETRICS/PERFORMANCE joint international conference on Measure-
ment and Modeling of Computer Systems. SIGMETRICS ’12. 2012, pp. 53–64.
DOI: 10.1145/2254756.2254766.
[Avi+04] Algirdas Avizienis, Jean-Claude Laprie, Brian Randell, and Carl E. Landwehr. “Basic
Concepts and Taxonomy of Dependable and Secure Computing”. In: IEEE Trans-
actions on Dependable and Secure Computing Vol. 1 (Issue 1 2004), pp. 11–33.
DOI: 10.1109/TDSC.2004.2.
[Avi+71] A. Avizienis, G.C. Gilley, Francis P. Mathur, D.A. Rennels, J.A. Rohr, and D.K.
Rubin. “The STAR (Self-Testing And Repairing) Computer: An Investigation of the
Theory and Practice of Fault-Tolerant Computer Design”. In: IEEE Transactions on
Computers Vol. C-20.11 (Nov. 1971), pp. 1312–1321. DOI: 10.1109/T-C.1971.
223133.
[Avi71] A. Avizienis. “Arithmetic Error Codes: Cost and Effectiveness Studies for Appli-
cation in Digital System Design”. In: IEEE Transactions on Computers Vol. 20
(Issue 11 Nov. 1971). DOI: 10.1109/T-C.1971.223134.
[AW04] Hagit Attiya and Jennifer Welch. Distributed computing: fundamentals, simula-
tions, and advanced topics. John Wiley & Sons, 2004.
[Bai+08] Lakshmi N. Bairavasundaram, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau,
Garth R. Goodson, and Bianca Schroeder. “An analysis of data corruption in the
storage stack”. In: ACM Transactions on Storage Vol. 4.3 (2008), pp. 1–28.
[Bak+11] Jason Baker, Chris Bond, James C. Corbett, JJ Furman, Andrey Khorlin, James
Larson, Jean-Michel Leon, Yawei Li, Alexander Lloyd, and Vadim Yushprakh. “Mega-
store: Providing Scalable, Highly Available Storage for Interactive Services”. In:
Proceedings of the Conference on Innovative Data system Research (CIDR).
2011, pp. 223–234.
[Bar05] H.J. Barnaby. “Will radiation-hardening-by-design (RHBD) work”. In: Nuclear and
Plasma Sciences, Society News (2005).
[Bas+03] C. Basile, Long Wang, Z. Kalbarczyk, and R. Iyer. “Group communication protocols
under errors”. In: 22nd International Symposium on Reliable Distributed Systems
(SRDS). IEEE, 2003, pp. 35–44. DOI: 10.1109/RELDIS.2003.1238053.
[BC11] Shekhar Borkar and Andrew A. Chien. “The future of microprocessors”. In: Com-
munications of the ACM Vol. 54.5 (May 2011), pp. 67–77. DOI: 10.1145/1941487.
1941507.
[BE+06] H. Bar-El, H. Choukri, D. Naccache, Michael Tunstall, and C. Whelan. “The Sor-
cerer’s Apprentice Guide to Fault Attacks”. In: Proceedings of the IEEE Vol. 94.2
(Feb. 2006), pp. 370–382. DOI: 10.1109/JPROC.2005.862424.
[Beh+13] Diogo Behrens, Christof Fetzer, Flavio P. Junqueira, and Marco Serafini. “To-
wards Transparent Hardening of Distributed Systems”. In: Proceedings of the
9th Workshop on Hot Topics in Dependable Systems. HotDep ’13. ACM, 2013.
DOI: 10.1145/2524224.2524230.
[Beh+15a] Diogo Behrens, Marco Serafini, Flavio P. Junqueira, Sergei Arnautov, and Christof
Fetzer. “Scalable Error Isolation for Distributed Systems”. In: Proceedings of the
12th USENIX Symposium on Networked Systems Design and Implementation.
NSDI ’15. USENIX Association, 2015.
[Beh+15b] Diogo Behrens, Marco Serafini, Sergei Arnautov, Flavio Junqueira, and Christof
Fetzer. Scalable error isolation for distributed systems: modeling, correctness
proofs, and additional experiments. Tech. rep. TUD-FI15-01-Februar 2015, ISSN
1430-211X. Technische Universität Dresden, Fakultät Informatik, Feb. 2015.
[BH07] Luiz André Barroso and Urs Hölzle. “The Case for Energy-Proportional Comput-
ing”. In: IEEE Computer Vol. 40.12 (Dec. 2007). DOI: 10.1109/MC.2007.443.
[BH09] L.A. Barroso and U. Hölzle. “The datacenter as a computer: An introduction to
the design of warehouse-scale machines”. In: Synthesis Lectures on Computer
Architecture Vol. 4.1 (2009).
[Bha+10] Pramod Bhatotia, Alexander Wieder, Rodrigo Rodrigues, Flavio Junqueira, and
Benjamin Reed. “Reliable data-center scale computations”. In: Proceedings of
the 4th International Workshop on Large Scale Distributed Systems and Middle-
ware. LADIS ’10. ACM, 2010, pp. 1–6. DOI: 10.1145/1859184.1859186.
[BKF14] Diogo Behrens, Dmitrii Kuvaiskii, and Christof Fetzer. “HardPaxos: Replication
Hardened against Hardware Errors”. In: IEEE 33rd International Symposium on
Reliable Distributed Systems (SRDS). 2014, pp. 232–241. DOI: 10.1109/SRDS.
2014.13.
[Bol+11] William J. Bolosky, Dexter Bradshaw, Randolph B. Haagens, Norbert P. Kusters,
and Peng Li. “Paxos replicated state machines as the basis of a high-performance
data store”. In: Proceedings of the 8th Conference on Networked Systems De-
sign and Implementation. NSDI ’11. USENIX Association, 2011.
[Bor+06] E. Borin, Cheng Wang, Y. Wu, and G. Araujo. “Software-based transparent and
comprehensive control-flow error detection”. In: ACM/IEEE International Sympo-
sium on Code Generation and Optimization (CGO). 2006. DOI: 10.1109/CGO.
2006.33.
[Bor04] Shekhar Borkar. “Microarchitecture and Design Challenges for Gigascale Integra-
tion”. In: Proceedings of the 37th Annual IEEE/ACM International Symposium
on Microarchitecture. MICRO-37. IEEE Computer Society, 2004. DOI: 10.1109/
MICRO.2004.24.
[Bor05] Shekhar Borkar. “Designing reliable systems from unreliable components: the
challenges of transistor variability and degradation”. In: IEEE Micro Vol. 25.6
(2005). DOI: 10.1109/MM.2005.110.
[BS04] Wendy Bartlett and Lisa Spainhower. “Commercial Fault Tolerance: A Tale of Two
Systems”. In: IEEE Transactions on Dependable and Secure Computing Vol. 1
(Issue 1 Jan. 2004), pp. 87–96. DOI: 10.1109/TDSC.2004.4.
[Bur06] Mike Burrows. “The Chubby lock service for loosely-coupled distributed sys-
tems”. In: Proceedings of the 7th symposium on Operating Systems Design
and Implementation. OSDI ’06. USENIX Association, 2006, pp. 335–350.
[BWF13] Diogo Behrens, Stefan Weigert, and Christof Fetzer. “Automatically Tolerating
Arbitrary Faults in Non-malicious Settings”. In: Proceedings of the 6th Latin-
American Symposium on Dependable Computing (LADC). IEEE Computer So-
ciety, 2013, pp. 114–123. DOI: 10.1109/LADC.2013.26.
[CA95] Liming Chen and Algirdas Avizienis. “N-version programming: a fault-tolerance
approach to reliability of software operation”. In: 25th International Symposium on
Fault-Tolerant Computing. IEEE, June 1995, pp. 113–119. DOI: 10.1109/FTCSH.
1995.532621.
[CF99] Flaviu Cristian and Christof Fetzer. “The Timed Asynchronous Distributed System
Model”. In: IEEE Transactions on Parallel and Distributed Systems Vol. 10 (Issue
6 June 1999), pp. 642–657. DOI: 10.1109/71.774912.
[CGR07] Tushar D. Chandra, Robert Griesemer, and Joshua Redstone. “Paxos made live:
an engineering perspective”. In: Proceedings of the 26th annual ACM Symposium
on Principles of Distributed Computing. PODC ’07. 2007, pp. 398–407. DOI: 10.
1145/1281100.1281103.
[CGR11] Christian Cachin, Rachid Guerraoui, and Luís Rodrigues. Introduction to reliable
and secure distributed programming. Springer, 2011.
[Chu+07] Byung-Gon Chun, Petros Maniatis, Scott Shenker, and John Kubiatowicz. “At-
tested append-only memory: making adversaries stick to their word”. In: Proceed-
ings of 21st ACM SIGOPS Symposium on Operating Systems Principles. SOSP
’07 Vol. 41 (Issue 6 Oct. 2007), pp. 189–204. DOI: 10.1145/1323293.1294280.
[CL02] Miguel Castro and Barbara Liskov. “Practical Byzantine Fault Tolerance and Proac-
tive Recovery”. In: ACM Transactions on Computer Systems (TOCS) Vol. 20.4
(Nov. 2002), pp. 398–461. DOI: 10.1145/571637.571640.
[CL99] Miguel Castro and Barbara Liskov. “Practical Byzantine fault tolerance”. In: Pro-
ceedings of the 3rd Symposium on Operating Systems Design and Implementa-
tion. OSDI ’99. USENIX Association, 1999, pp. 173–186.
[Cle+09] Allen Clement, Edmund Wong, Lorenzo Alvisi, Mike Dahlin, and Mirco Marchetti.
“Making Byzantine fault tolerant systems tolerate Byzantine faults”. In: Proceed-
ings of the 6th USENIX Symposium on Networked Systems Design and Imple-
mentation. NSDI ’09. USENIX Association, 2009, pp. 153–168.
[Cle+12] Allen Clement, Flavio Junqueira, Aniket Kate, and Rodrigo Rodrigues. “On the
(Limited) Power of Non-equivocation”. In: Proceedings of the 2012 ACM Sym-
posium on Principles of Distributed Computing. PODC ’12. 2012, pp. 301–308.
DOI: 10.1145/2332432.2332490.
[Coa88] B.A. Coan. “A compiler that increases the fault tolerance of asynchronous pro-
tocols”. In: IEEE Transactions on Computers Vol. 37.12 (Dec. 1988), pp. 1541–
1553. DOI: 10.1109/12.9732.
[Con03] Cristian Constantinescu. “Trends and challenges in VLSI circuit reliability”. In:
IEEE Micro Vol. 23.4 (2003), pp. 14–19. DOI: 10.1109/MM.2003.1225959.
[Cor+11] Miguel Correia, Daniel Gómez Ferro, Flavio P. Junqueira, and Marco Serafini. Mod-
els and Algorithms for ASC Hardening and a Correctness Proof. YL-2011-003.
Yahoo! Labs, 2011.
[Cor+12a] James C. Corbett, Jeffrey Dean, Michael Epstein, Andrew Fikes, Christopher
Frost, J. J. Furman, Sanjay Ghemawat, Andrey Gubarev, Christopher Heiser, Peter
Hochschild, Wilson Hsieh, Sebastian Kanthak, Eugene Kogan, Hongyi Li, Alexan-
der Lloyd, Sergey Melnik, David Mwaura, David Nagle, Sean Quinlan, Rajesh Rao,
Lindsay Rolig, Yasushi Saito, Michal Szymaniak, Christopher Taylor, Ruth Wang,
and Dale Woodford. “Spanner: Google’s Globally Distributed Database”. In: Pro-
ceedings of the 10th USENIX Conference on Operating Systems Design and
Implementation. OSDI’12. USENIX Association, 2012, pp. 251–264.
[Cor+12b] Miguel Correia, Daniel Gómez Ferro, Flavio P. Junqueira, and Marco Serafini.
“Practical Hardening of Crash-tolerant Systems”. In: Proceedings of the 2012
USENIX Annual Technical Conference. ATC’12. USENIX Association, 2012.
[Cri85] Flaviu Cristian. “A Rigorous Approach to Fault-Tolerant Programming”. In: IEEE
Transactions on Software Engineering Vol. SE-11.1 (Jan. 1985), pp. 23–31. DOI:
10.1109/TSE.1985.231534.
[Cri91] Flaviu Cristian. “Understanding fault-tolerant distributed systems”. In: Communications
of the ACM Vol. 34.2 (Feb. 1991), pp. 56–78. DOI: 10.1145/102792.102801.
[CT96] Tushar D. Chandra and Sam Toueg. “Unreliable failure detectors for reliable dis-
tributed systems”. In: Journal of the ACM Vol. 43 (Issue 2 Mar. 1996), pp. 225–
267. DOI: 10.1145/226643.226647.
[Del97] Timothy J. Dell. A White Paper on the Benefits of Chipkill-Correct ECC for PC
Server Main Memory. Tech. rep. IBM Microelectronics Division, 1997.
[DGG02] Assia Doudou, Benoît Garbinato, and Rachid Guerraoui. “Encapsulating Failure
Detection: From Crash to Byzantine Failures”. In: Reliable Software Technologies
– Ada-Europe 2002. Ed. by Johann Blieberger and Alfred Strohmeier. Vol. 2361.
Lecture Notes in Computer Science. Springer Berlin / Heidelberg, 2002, pp. 24–50.
DOI: 10.1007/3-540-48046-3_3.
[DHE12] Björn Döbel, Hermann Härtig, and Michael Engel. “Operating System Support
for Redundant Multithreading”. In: Proceedings of the 10th ACM International
Conference on Embedded Software. EMSOFT ’12. 2012, pp. 83–92. DOI: 10.
1145/2380356.2380375.
[Din11] Artem Dinaburg. “Bitsquatting: DNS Hijacking without Exploitation”. In: BlackHat
Security, Defcon 19. 2011.
[DLS88] Cynthia Dwork, Nancy Lynch, and Larry Stockmeyer. “Consensus in the presence
of partial synchrony”. In: Journal of the ACM Vol. 35 (Issue 2 Apr. 1988), pp. 288–
323. DOI: 10.1145/42282.42283.
[FC99a] Christof Fetzer and Flaviu Cristian. “A Highly Available Local Leader Election Ser-
vice”. In: IEEE Transactions on Software Engineering Vol. 25 (Issue 5 Sept. 1999),
pp. 603–618. DOI: 10.1109/32.815321.
[FC99b] Christof Fetzer and Flaviu Cristian. “Building Fault-Tolerant Hardware Clocks from
COTS Components”. In: Proceedings of the 7th IFIP International Working Con-
ference on Dependable Computing for Critical Applications (DCFTS 1999). IEEE
Computer Society, Nov. 1999, pp. 67–68. DOI: 10.1109/DCFTS.1999.814290.
[Fen+10] Shuguang Feng, Shantanu Gupta, Amin Ansari, and Scott Mahlke. “Shoestring:
probabilistic soft error reliability on the cheap”. In: ACM SIGARCH Computer Ar-
chitecture News. Vol. 38. 1. 2010, pp. 385–396. DOI: 10.1145/1736020.1736063.
[FLP85] Michael J. Fischer, Nancy A. Lynch, and Michael S. Paterson. “Impossibility of
distributed consensus with one faulty process”. In: Journal of the ACM Vol. 32
(Issue 2 Apr. 1985), pp. 374–382. DOI: 10.1145/3149.214121.
[For89] P. Forin. “Vital coded microprocessor principles and application for various transit
systems.” In: IFAC/IFIP/IFORS Symposium. 1989.
[FSS09] Christof Fetzer, Ute Schiffel, and Martin Süßkraut. “AN-Encoding Compiler: Build-
ing Safety-Critical Systems with Commodity Hardware”. In: Computer Safety, Re-
liability, and Security. Ed. by Bettina Buth, Gerd Rabe, and Till Seyfarth. Vol. 5775.
Lecture Notes in Computer Science. Springer Berlin / Heidelberg, 2009, pp. 283–296.
DOI: 10.1007/978-3-642-04468-7_23.
[GA03] S. Govindavajhala and A.W. Appel. “Using memory errors to attack a virtual ma-
chine”. In: Symposium on Security and Privacy. IEEE, May 2003, pp. 154–165.
DOI: 10.1109/SECPRI.2003.1199334.
[Gar+11] Miguel Garcia, Alysson Bessani, Ilir Gashi, Nuno Neves, and Rafael Obelheiro.
“OS Diversity for Intrusion Tolerance: Myth or Reality?” In: 41st IEEE/IFIP In-
ternational Conference on Dependable Systems and Networks. DSN ’11. IEEE
Computer Society, 2011, pp. 383–394. DOI: 10.1109/DSN.2011.5958251.
[Gu+03] Weining Gu, Z. Kalbarczyk, Ravishankar K. Iyer, and Zhenyu Yang. “Characterization
of Linux kernel behavior under errors”. In: IEEE International Conference
on Dependable Systems and Networks. DSN ’03. IEEE, 2003, pp. 459–468. DOI:
10.1109/DSN.2003.1209956.
[Gup+14] Ashish Gupta, Fan Yang, Jason Govig, Adam Kirsch, Kelvin Chan, Kevin Lai, Shuo
Wu, Sandeep Dhoot, Abhilash Kumar, Ankur Agiwal, Sanjay Bhansali, Mingsheng
Hong, Jamie Cameron, Masood Siddiqi, David Jones, Jeff Shute, Andrey Gubarev,
Shivakumar Venkataraman, and Divyakant Agrawal. “Mesa: Geo-Replicated, Near
Real-Time, Scalable Data Warehousing”. In: VLDB. 2014.
[Gär99] Felix C. Gärtner. “Fundamentals of fault-tolerant distributed computing in asyn-
chronous environments”. In: ACM Computing Surveys (CSUR) Vol. 31.1 (Mar.
1999), pp. 1–26. DOI: 10.1145/311531.311532.
[Ham99] Marc Hamilton. Software development: building reliable systems. Prentice Hall
Professional, 1999.
[HK09] Andreas Haeberlen and Petr Kuznetsov. “The Fault Detection Problem”. In: Proceedings
of the 13th International Conference on Principles of Distributed Systems.
OPODIS ’09. Springer-Verlag, 2009, pp. 99–114. DOI: 10.1007/978-3-642-10877-8_10.
[HKD07] Andreas Haeberlen, Petr Kuznetsov, and Peter Druschel. “PeerReview: Practi-
cal Accountability for Distributed Systems”. In: Proceedings of the 21st ACM
Symposium on Operating Systems Principles. SOSP ’07. ACM, Oct. 2007. DOI:
10.1145/1294261.1294279.
[Ho+08] Chi Ho, Robbert van Renesse, Mark Bickford, and Danny Dolev. “Nysiad: practical
protocol transformation to tolerate Byzantine failures”. In: Proceedings of the
5th USENIX Symposium on Networked Systems Design and Implementation.
NSDI’08. USENIX Association, 2008, pp. 175–188.
[Hof+14] Martin Hoffmann, Peter Ulbrich, Christian Dietrich, Horst Schirmeier, Daniel Lohmann,
and Wolfgang Schröder-Preikschat. “A Practitioner’s Guide to Software-Based
Soft-Error Mitigation Using AN-Codes”. In: IEEE 15th International Symposium
on High-Assurance Systems Engineering (HASE) (2014), pp. 33–40. DOI: 10.1109/HASE.2014.14.
[Hom+13] A. Homescu, S. Neisius, P. Larsen, S. Brunthaler, and M. Franz. “Profile-guided
automated software diversity”. In: IEEE/ACM International Symposium on Code
Generation and Optimization (CGO). Feb. 2013, pp. 1–11. DOI: 10.1109/CGO.
2013.6494997.
[HSS12] Andy A. Hwang, Ioan A. Stefanovici, and Bianca Schroeder. “Cosmic Rays Don’t
Strike Twice: Understanding the Nature of DRAM Errors and the Implications for
System Design”. In: Proceedings of the 17th International Conference on Archi-
tectural Support for Programming Languages and Operating Systems. ASPLOS
XVII. ACM, 2012, pp. 111–122. DOI: 10.1145/2150976.2150989.
[Hun+10] Patrick Hunt, Mahadev Konar, Flavio P. Junqueira, and Benjamin Reed. “ZooKeeper:
wait-free coordination for internet-scale systems”. In: Proceedings of the 2010
USENIX Annual Technical Conference. ATC’10. USENIX Association, 2010, pp. 11–
11.
[Int] Intel. Intel Xeon Processor E7 Family: Reliability, Availability, and Serviceability.
http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/xeon-e7-family-ras-server-paper.pdf.
[Int07] Intel. Intel SSE4 Programming Reference. 2007.
[Int09] Intel. Intel Transactional Memory Compiler and Runtime Application Binary Inter-
face. 2009.
[Int14] Intel. Intel® 64 and IA-32 Architectures Software Developer’s Manual. 2014.
[KA08] Jonathan Kirsch and Yair Amir. “Paxos for System Builders: An Overview”. In:
Proceedings of the 2nd Workshop on Large-Scale Distributed Systems and Mid-
dleware. LADIS ’08. ACM, 2008. DOI: 10.1145/1529974.1529979.
[Kap+12a] Rüdiger Kapitza, Johannes Behl, Christian Cachin, Tobias Distler, Simon Kuhnle,
Seyed Vahid Mohammadi, Wolfgang Schröder-Preikschat, and Klaus Stengel. “Cheap-
BFT: resource-efficient byzantine fault tolerance”. In: Proceedings of the 7th ACM
European Conference on Computer Systems. EuroSys ’12. ACM, 2012, pp. 295–
308. DOI: 10.1145/2168836.2168866.
[Kap+12b] Manos Kapritsos, Yang Wang, Vivien Quema, Allen Clement, Lorenzo Alvisi, and
Mike Dahlin. “All About Eve: Execute-verify Replication for Multi-core Servers”.
In: Proceedings of the 10th USENIX Conference on Operating Systems Design
and Implementation. OSDI’12. USENIX Association, 2012, pp. 237–250.
[KB05] Michael E. Kounavis and Frank L. Berry. “A Systematic Approach to Building
High Performance Software-Based CRC Generators”. In: 10th IEEE Symposium
on Computers and Communications (ISCC) (June 2005), pp. 855–862. DOI: 10.
1109/ISCC.2005.18.
[Ker07] Kernel. Data corruption with Opteron CPUs and Nvidia chipsets.
https://bugzilla.kernel.org/show_bug.cgi?id=7768. Jan. 2007.
[Kim+14] Yoongu Kim, R. Daly, J. Kim, C. Fallin, Ji Hye Lee, Donghyuk Lee, C. Wilker-
son, K. Lai, and O. Mutlu. “Flipping bits in memory without accessing them:
An experimental study of DRAM disturbance errors”. In: ACM/IEEE 41st Inter-
national Symposium on Computer Architecture (ISCA). 2014, pp. 361–372. DOI:
10.1109/ISCA.2014.6853210.
[KL86] John C. Knight and Nancy G. Leveson. “An experimental evaluation of the as-
sumption of independence in multiversion programming”. In: IEEE Transactions
on Software Engineering Vol. 12.1 (1986), pp. 96–109. DOI: 10.1109/TSE.1986.
6312924.
[KMMS03] Kim Potter Kihlstrom, Louise E. Moser, and P. M. Melliar-Smith. “Byzantine Fault
Detectors for Solving Consensus”. In: The Computer Journal Vol. 46.1 (2003),
pp. 16–35. DOI: 10.1093/comjnl/46.1.16.
[Kot+07] Ramakrishna Kotla, Lorenzo Alvisi, Mike Dahlin, Allen Clement, and Edmund
Wong. “Zyzzyva: speculative byzantine fault tolerance”. In: Proceedings of 21st
ACM SIGOPS Symposium on Operating Systems Principles. SOSP ’07. ACM,
2007, pp. 45–58. DOI: 10.1145/1294261.1294267.
[Lam01] Leslie Lamport. “Paxos made simple”. In: ACM SIGACT News Vol. 32.4 (2001),
pp. 18–25.
[Lam02] Leslie Lamport. Specifying systems. Addison-Wesley Reading, 2002.
[Lam98] Leslie Lamport. “The part-time parliament”. In: ACM Transactions on Computer
Systems (TOCS) Vol. 16 (Issue 2 May 1998), pp. 133–169. DOI: 10.1145/279227.
279229.
[Lev+09] Dave Levin, John R. Douceur, Jacob R. Lorch, and Thomas Moscibroda. “TrInc:
small trusted hardware for large distributed systems”. In: Proceedings of the
6th USENIX Symposium on Networked Systems Design and Implementation.
NSDI’09. USENIX Association, 2009, pp. 1–14.
[Li+07] Xin Li, Kai Shen, Michael C. Huang, and Lingkun Chu. “A Memory Soft Error
Measurement on Production Systems”. In: Proceedings of the 2007 USENIX
Annual Technical Conference. ATC’07. USENIX Association, 2007.
[Li+10] Xin Li, Michael C. Huang, Kai Shen, and Lingkun Chu. “A Realistic Evaluation of
Memory Hardware Errors and Software System Susceptibility”. In: Proceedings
of the 2010 USENIX Annual Technical Conference. ATC’10. USENIX Association,
2010.
[LM94] Leslie Lamport and Stephan Merz. “Specifying and Verifying Fault-Tolerant Sys-
tems”. In: Proceedings of the 3rd International Symposium Organized Jointly with
the Working Group Provably Correct Systems on Formal Techniques in Real-Time
and Fault-Tolerant Systems. ProCoS. Springer-Verlag, 1994, pp. 41–76.
[LSP82] Leslie Lamport, Robert Shostak, and Marshall Pease. “The Byzantine Gener-
als Problem”. In: ACM Transactions on Programming Languages and Systems
(TOPLAS) Vol. 4 (Issue 3 July 1982), pp. 382–401. DOI: 10.1145/357172.357176.
[Luk+05] Chi-Keung Luk, Robert Cohn, Robert Muth, Harish Patil, Artur Klauser, Geoff
Lowney, Steven Wallace, Vijay Janapa Reddi, and Kim Hazelwood. “Pin: building
customized program analysis tools with dynamic instrumentation”. In: Proceed-
ings of the 2005 ACM SIGPLAN Conference on Programming Language Design
and Implementation. PLDI ’05. ACM, 2005, pp. 190–200. DOI: 10.1145/1065010.
1065034.
[Mar+14] Arthur Martens, Christoph Borchert, Tobias Oliver Geißler, Daniel Lohmann, Olaf
Spinczyk, and Rüdiger Kapitza. “Crosscheck: Hardening replicated multithreaded
services”. In: Workshop on Dependability of Clouds, Data Centers and Virtual
Machine Technology (DCDV 2014). 2014.
[ME10] Tyler Moore and Benjamin Edelman. “Measuring the Perpetrators and Funders of
Typosquatting”. In: Proceedings of the 14th International Conference on Financial
Cryptography and Data Security. FC’10. Springer-Verlag, 2010, pp. 175–191. DOI:
10.1007/978-3-642-14577-3_15.
[Mer+05] Michael G. Merideth, Arun Iyengar, Thomas Mikalsen, Stefan Tai, Isabelle Rou-
vellou, and Priya Narasimhan. “Thema: Byzantine-Fault-Tolerant Middleware for
Web-Service Applications”. In: IEEE 24th Symposium on Reliable Distributed Sys-
tems (SRDS). IEEE Computer Society, 2005, pp. 131–142. DOI: 10.1109/RELDIS.
2005.28.
[Mor12] Andrey Morozov. “Dual-graph Model for Error Propagation Analysis of Mecha-
tronic Systems”. PhD thesis. 01062 Dresden, Germany: Technische Universität
Dresden, 2012.
[MZM10] Natasa Miskov-Zivanov and Diana Marculescu. “Multiple Transient Faults in Com-
binational and Sequential Circuits: A Systematic Approach”. In: IEEE Transactions
on Computer-Aided Design of Integrated Circuits and Systems Vol. 29.10 (Oct.
2010), pp. 1614–1627. DOI: 10.1109/TCAD.2010.2061131.
[NDO11] Edmund B. Nightingale, John R. Douceur, and Vince Orgovan. “Cycles, cells and
platters: an empirical analysis of hardware failures on a million consumer PCs”.
In: Proceedings of the 6th ACM European Conference on Computer Systems.
EuroSys ’11. ACM, 2011, pp. 343–356. DOI: 10.1145/1966445.1966477.
[Nik+13] Nick Nikiforakis, Steven Van Acker, Wannes Meert, Lieven Desmet, Frank Piessens,
and Wouter Joosen. “Bitsquatting: Exploiting Bit-flips for Fun, or Profit?” In: Pro-
ceedings of the 22nd International Conference on World Wide Web. WWW ’13.
ACM, 2013, pp. 989–998. DOI: 10.1145/2488388.2488474.
[Nis+13] Rajesh Nishtala, Hans Fugal, Steven Grimm, Marc Kwiatkowski, Herman Lee,
Harry C. Li, Ryan McElroy, Mike Paleczny, Daniel Peek, Paul Saab, David Stafford,
Tony Tung, and Venkateshwaran Venkataramani. “Scaling Memcache at Face-
book”. In: Proceedings of the 10th USENIX Conference on Networked Systems
Design and Implementation. NSDI ’13. USENIX Association, 2013, pp. 385–398.
[NT88] Gil Neiger and Sam Toueg. “Automatically increasing the fault-tolerance of dis-
tributed systems”. In: Proceedings of the 7th annual ACM Symposium on Prin-
ciples of Distributed Computing. PODC ’88. ACM, 1988, pp. 248–262. DOI: 10.
1145/62546.62588.
[OSM02a] N. Oh, P.P. Shirvani, and E.J. McCluskey. “Control-flow checking by software
signatures”. In: IEEE Transactions on Reliability Vol. 51.1 (Mar. 2002), pp. 111–
122. DOI: 10.1109/24.994926.
[OSM02b] N. Oh, P.P. Shirvani, and E.J. McCluskey. “Error detection by duplicated instruc-
tions in super-scalar processors”. In: IEEE Transactions on Reliability Vol. 51.1
(Mar. 2002), pp. 63–75. DOI: 10.1109/24.994913.
[Pat+08] K. Pattabiraman, N. Nakka, Z. Kalbarczyk, and R. Iyer. “SymPLFIED: Symbolic
program-level fault injection and error detection framework”. In: 38th IEEE/IFIP
International Conference on Dependable Systems and Networks (DSN). June
2008, pp. 472–481. DOI: 10.1109/DSN.2008.4630118.
[Per+07] Frances Perry, Lester Mackey, George A. Reis, Jay Ligatti, David I. August, and
David Walker. “Fault-tolerant Typed Assembly Language”. In: Proceedings of the
2007 ACM SIGPLAN Conference on Programming Language Design and Imple-
mentation. PLDI ’07. ACM, 2007, pp. 42–53. DOI: 10.1145/1250734.1250741.
[Pow92] D. Powell. “Failure mode assumptions and assumption coverage”. In: 22nd In-
ternational Symposium on Fault-Tolerant Computing. FTCS-22. IEEE, July 1992,
pp. 386–395. DOI: 10.1109/FTCS.1992.243562.
[RCA07] G.A. Reis, J. Chang, and D.I. August. “Automatic Instruction-Level Software-Only
Recovery”. In: IEEE Micro Vol. 27.1 (Jan. 2007), pp. 36–47. DOI: 10.1109/MM.
2007.4.
[Rei+05a] G.A. Reis, J. Chang, N. Vachharajani, R. Rangan, and D.I. August. “SWIFT: soft-
ware implemented fault tolerance”. In: ACM/IEEE International Symposium on
Code Generation and Optimization (CGO). Mar. 2005, pp. 243–254. DOI: 10.
1109/CGO.2005.34.
[Rei+05b] George A. Reis, Jonathan Chang, Neil Vachharajani, Ram Rangan, David I. August,
and Shubhendu S. Mukherjee. “Design and Evaluation of Hybrid Fault-Detection
Systems”. In: ACM/IEEE 32nd International Symposium on Computer Architec-
ture (ISCA) (2005), pp. 148–159. DOI: 10.1109/ISCA.2005.21.
[RM00] Steven K. Reinhardt and Shubhendu S. Mukherjee. “Transient Fault Detection via
Simultaneous Multithreading”. In: Proceedings of the 27th Annual International
Symposium on Computer Architecture (ISCA). ACM, 2000, pp. 25–36. DOI: 10.
1145/339647.339652.
[Sag+05] Giacinto P. Saggese, Nicholas J. Wang, Zbigniew T. Kalbarczyk, Sanjay J. Patel, and
Ravishankar K. Iyer. “An Experimental Study of Soft Errors in Microprocessors”.
In: IEEE Micro Vol. 25.6 (Nov. 2005), pp. 30–39. DOI: 10.1109/MM.2005.104.
[Sch+10a] Ute Schiffel, André Schmitt, Martin Süßkraut, and Christof Fetzer. “ANB- and
ANBDmem-Encoding: Detecting Hardware Errors in Software”. In: Computer
Safety, Reliability, and Security. Ed. by Erwin Schoitsch. Vol. 6351. Lecture Notes
in Computer Science. Springer Berlin / Heidelberg, 2010, pp. 169–182.
DOI: 10.1007/978-3-642-15651-9_13.
[Sch+10b] Ute Schiffel, André Schmitt, Martin Süßkraut, and Christof Fetzer. “Slice Your Bug:
Debugging Error Detection Mechanisms using Error Injection Slicing”. In: 8th
European Dependable Computing Conference (EDCC). IEEE Computer Society,
2010. DOI: 10.1109/EDCC.2010.12.
[Sch+10c] Ute Schiffel, André Schmitt, Martin Süßkraut, and Christof Fetzer. “Software-
Implemented Hardware Error Detection: Costs and Gains”. In: Proceedings of
the 2010 3rd International Conference on Dependability. DEPEND ’10. IEEE Com-
puter Society, 2010, pp. 51–57. DOI: 10.1109/DEPEND.2010.16.
[Sch11] Ute Schiffel. “Hardware Error Detection Using AN-Codes”. PhD thesis. 01062
Dresden, Germany: Technische Universität Dresden, 2011.
[Sch90] Fred B. Schneider. “Implementing fault-tolerant services using the state machine
approach: a tutorial”. In: ACM Computing Surveys (CSUR) Vol. 22 (Issue 4 Dec.
1990), pp. 299–319. DOI: 10.1145/98163.98167.
[Shi+02] Premkishore Shivakumar, Michael Kistler, Stephen W. Keckler, Doug Burger, and
Lorenzo Alvisi. “Modeling the Effect of Technology Trends on the Soft Error Rate
of Combinational Logic”. In: IEEE International Conference on Dependable Sys-
tems and Networks. DSN ’02. IEEE Computer Society, 2002, pp. 389–398.
[Shy+07] A. Shye, T. Moseley, V.J. Reddi, J. Blomstedt, and D.A. Connors. “Using Process-
Level Redundancy to Exploit Multiple Cores for Transient Fault Tolerance”. In:
37th IEEE/IFIP International Conference on Dependable Systems and Networks.
DSN ’07. 2007, pp. 297–306. DOI: 10.1109/DSN.2007.98.
[SJR07] Yee Jiun Song, Flavio Junqueira, and Benjamin Reed. “BFT for Skeptics”. In: SOSP
07 WIP (2007).
[Sle+99] T.J. Slegel, R.M. Averill III, M.A. Check, B.C. Giamei, B.W. Krumm, C.A. Krygowski,
W.H. Li, J.S. Liptay, J.D. MacDougall, T.J. McPherson, J.A. Navarro, E.M.
Schwarz, K. Shum, and C.F. Webb. “IBM’s S/390 G5 microprocessor design”. In:
IEEE Micro Vol. 19.2 (Mar. 1999), pp. 12–23. DOI: 10.1109/40.755464.
[SPW09] Bianca Schroeder, Eduardo Pinheiro, and Wolf-Dietrich Weber. “DRAM errors in
the wild: a large-scale field study”. In: Proceedings of the 11th international joint
conference on Measurement and modeling of computer systems. SIGMETRICS
’09. ACM, 2009, pp. 193–204. DOI: 10.1145/1555349.1555372.
[SS83] Richard D. Schlichting and Fred B. Schneider. “Fail-stop processors: an approach
to designing fault-tolerant computing systems”. In: ACM Transactions on Com-
puter Systems (TOCS) Vol. 1 (Issue 3 Aug. 1983), pp. 222–238. DOI: 10.1145/
357369.357371.
[Süß+09] Martin Süßkraut, Stefan Weigert, Ute Schiffel, Thomas Knauth, Martin Nowack,
Diogo Becker de Brum, and Christof Fetzer. “Speculation for Parallelizing Runtime
Checks”. In: Proceedings of the 11th International Symposium on Stabilization,
Safety, and Security of Distributed Systems (SSS). Springer-Verlag, 2009, pp. 698–710.
DOI: 10.1007/978-3-642-05118-0_48.
[Süß10] Martin Süßkraut. “Automatic Hardening against Dependability and Security Soft-
ware Bugs”. PhD thesis. 01062 Dresden, Germany: Technische Universität Dres-
den, 2010.
[Ulb+12] P. Ulbrich, M. Hoffmann, R. Kapitza, D. Lohmann, W. Schröder-Preikschat, and R.
Schmid. “Eliminating Single Points of Failure in Software-Based Redundancy”.
In: 9th European Dependable Computing Conference (EDCC). IEEE Computer
Society, 2012, pp. 49–60. DOI: 10.1109/EDCC.2012.21.
[Ven+12] Venkateshwaran Venkataramani, Zach Amsden, Nathan Bronson, George Cabrera
III, Prasad Chakka, Peter Dimov, Hui Ding, Jack Ferris, Anthony Giardullo, Jeremy
Hoon, Sachin Kulkarni, Nathan Lawrence, Mark Marchukov, Dmitri Petrov, and
Lovro Puzar. “TAO: how facebook serves the social graph”. In: Proceedings of the
2012 ACM SIGMOD International Conference on Management of Data. SIGMOD
’12. ACM, 2012, pp. 791–792. DOI: 10.1145/2213836.2213957.
[Ver+13] G. Veronese, M. Correia, A. Bessani, L. Lung, and P. Verissimo. “Efficient Byzan-
tine Fault Tolerance”. In: IEEE Transactions on Computers Vol. 62.1 (Jan. 2013),
pp. 16–30. DOI: 10.1109/TC.2011.221.
[Ver06] Paulo Verissimo. “Travelling through Wormholes: a New Look at Distributed Sys-
tems Models”. In: ACM SIGACT News Vol. 37.1 (2006), pp. 66–81.
[VLS13] Trilok Vyas, Yujie Liu, and Michael Spear. “Transactionalizing Legacy Code: An
Experience Report Using GCC and Memcached”. In: TRANSACT 13. ACM, Mar.
2013.
[Wam+13] Jons-Tobias Wamhoff, Mario Schwalbe, Rasha Faqeh, Christof Fetzer, and Pas-
cal Felber. “Transactional Encoding for Tolerating Transient Hardware Errors”. In:
Proceedings of the 15th International Symposium on Stabilization, Safety, and Se-
curity of Distributed Systems (SSS). Ed. by Teruo Higashino, Yoshiaki Katayama,
Toshimitsu Masuzawa, Maria Potop-Butucaru, and Masafumi Yamashita. Vol. 8255.
Lecture Notes in Computer Science. Springer International Publishing, Nov. 2013,
pp. 1–16. DOI: 10.1007/978-3-319-03089-0_1.
[Wan+07] Cheng Wang, Ho-seop Kim, Youfeng Wu, and Victor Ying. “Compiler-Managed
Software-based Redundant Multi-Threading for Transient Fault Detection”. In:
IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
2007, pp. 244–258. DOI: 10.1109/CGO.2007.7.
[War10] Timo Warns. Structural Failure Models for Fault-Tolerant Distributed Computing.
Springer, 2010.
[War12] Henry S. Warren. Hacker’s Delight. 2nd. Addison-Wesley Professional, 2012.
[Wen+78] J.H. Wensley, L. Lamport, J. Goldberg, M.W. Green, K.N. Levitt, P.M. Melliar-
Smith, R.E. Shostak, and C.B. Weinstock. “SIFT: Design and analysis of a fault-
tolerant computer for aircraft control”. In: Proceedings of the IEEE Vol. 66.10
(Oct. 1978), pp. 1240–1255. DOI: 10.1109/PROC.1978.11114.
[Wid+07] J. Widder, G. Gridling, B. Weiss, and J.-P. Blanquart. “Synchronous Consensus
with Mortal Byzantines”. In: 37th IEEE/IFIP International Conference on Depend-
able Systems and Networks. DSN ’07. June 2007, pp. 102–112. DOI: 10.1109/
DSN.2007.91.
[Wir09] Wired. Ma.gnolia Suffers Major Data Loss, Site Taken Offline.
http://www.wired.com/2009/01/magnolia-suffer/. Jan. 2009.
[WJB06] Alan Wood, Robert Jardine, and Wendy Bartlett. “Data integrity in HP NonStop
servers”. In: Workshop on SELSE. 2006.
[WP05] N.J. Wang and S.J. Patel. “ReStore: symptom based soft error detection in micro-
processors”. In: 35th IEEE/IFIP International Conference on Dependable Systems
and Networks. DSN ’05. June 2005, pp. 30–39. DOI: 10.1109/DSN.2005.82.
[WV01] Gerhard Weikum and Gottfried Vossen. Transactional Information Systems: The-
ory, Algorithms, and the Practice of Concurrency Control and Recovery. Morgan
Kaufmann Publishers Inc., 2001.
[Yeh96] Y.C. Yeh. “Triple-triple redundant 777 primary flight computer”. In: IEEE Aerospace
Applications Conference. Vol. 1. Feb. 1996, pp. 293–307. DOI: 10.1109/AERO.
1996.495891.
[YGS09] Jing Yu, Maria Jesus Garzaran, and Marc Snir. “ESoftCheck: Removal of Non-vital
Checks for Fault Tolerance”. In: IEEE/ACM International Symposium on Code
Generation and Optimization (CGO). 2009, pp. 35–46. DOI: 10.1109/CGO.2009.
14.

LISTS

FIGURES
2.1 Benign-fault simulation with error isolation
2.2 Example of two traversals in some execution
2.3 Example of fault forking an execution
2.4 Simple program p
2.5 Correct execution of p
2.6 Faulty operation as ASC fault
2.7 Modified operand as ASC fault
2.8 Exchanged operand as ASC fault
2.9 Exchanged operator as ASC fault
2.10 Lost store as restricted ASC fault
2.11 Control-flow fault as ASC fault and error propagation
2.12 Example of faults inside and outside traversals
3.1 Encoded operations and errors
3.2 Domain D = {0, ..., 2^6 − 1} and encoding constant A = 11
3.3 Interface provided to processes
3.4 Interface expected from processes
3.5 Encoded process and wrapper: message sending
3.6 Encoded process and wrapper: alarms and triggers
3.7 Paxos implementation in encoding framework
3.8 Goodput versus response time for 3 and 5 acceptors
3.9 Response time and CPU utilization for 3 and 5 acceptors
4.1 High-level phases of a hardened handler
4.2 Variables of process π
4.3 Fault assumption for Lemmas 4.1 and 4.2
4.4 Fault assumption for Lemmas 4.3 and 4.4
4.5 Fault assumption for Lemmas 4.5, 4.6 and 4.7
4.6 Fault assumption for Lemmas 4.8 and 4.9
4.7 Jump-out jump-in fault sequence
4.8 Example of an event loop and a hardened handler
4.9 libsei data structures and event handling
4.10 Error manifestations in MC-SEI
4.11 Throughput of memcached variants varying threads and key range length
4.12 Response time versus throughput varying value size from 512 B up to 4 KiB
4.13 Response time versus throughput with 8 B value size
4.14 CPU utilization versus throughput with 8 B value size
4.15 Response time and CPU utilization varying value size with 10 k.req/s load
4.16 Response time and CPU utilization varying number of keys with 10 k.req/s load
4.17 Deadwood’s response time versus throughput varying the number of clients
4.18 Deadwood’s CPU utilization versus throughput varying the number of clients
4.19 Deadwood’s response time with throughput up to 30 k.req/s
5.1 HardPaxos-based SMR architecture
5.2 Normal-case execution
5.3 Election and recovery
5.4 Example of lost store
5.5 Response time versus throughput of HardPaxos, Paxos, and PBFT

TABLES
2.1 Process classification according to failure and fault
3.1 High-level symptoms and AN-code variants
3.2 Mean election time µ with 3 processes
3.3 Mean CPU utilization of the leader with 3 processes
3.4 Results of fault injection in proposer
4.1 Summary of all variable sets used in SEI-hardening
4.2 States used in the correctness proofs
4.3 Fault types supported by BFI
4.4 Results of fault injection in memcached
4.5 Results of fault injection in Deadwood
4.6 Results of fault injection in memcached with 4 threads
4.7 Errors in memcached when undervolting CPU
5.1 Certificate types
5.2 Certificate fields
5.3 Results of fault injection in HardCore
5.4 Number of undetected errors per fault type
5.5 HardCore’s cycles per commit
5.6 Response time of single requests (µs)

Algorithms
3.1 Software-implemented unsigned integer division . . . . . . . . . . . . . . . . . . 48
4.1 Hardened program ph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
4.2 Init blocks: Filter, Prepare1, and Prepare2 . . . . . . . . . . . . . . . . . . . . . . 94
4.3 Validity check of input variables (Vc = Vi ) or output variables (Vc = Vo) . . . . . . 95
4.4 Validity check of a set of variables Vc ⊆ Vg . . . . . . . . . . . . . . . . . . . . . . 95
4.5 Reset and Validate blocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
4.6 Example: single control-flow gate . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
4.7 Example: sequence of control-flow gates . . . . . . . . . . . . . . . . . . . . . . 100
4.8 Control-flow gates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
4.9 Hardening rules for thread τi . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
4.10 Intercepted operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
5.1 Normal operation for process πi (untrusted part of HardPaxos) . . . . . . . . . . . 158
5.2 Normal case for πi (HardCore) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
5.3 Leader Election for πi (untrusted part of HardPaxos) . . . . . . . . . . . . . . . . 160
5.4 Leader Election for πi (HardCore) . . . . . . . . . . . . . . . . . . . . . . . . . . . 161

Transformation rules
3.1 PAN-encoded addition, subtraction, and move . . . . . . . . . . . . . . . . . . . 46
3.2 PAN-encoded multiplication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.3 PAN-encoded unconditional branch . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.4 PAN-encoded load operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.5 PAN-encoded store operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.6 PAN-encoded validity check . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.7 PAN-encoded conditional branch . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.8 PAN-encoded division . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.9 PAN-encoding input message check . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.1 Hardened program structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
4.2 Reading variable v ∈ Vs in blocks Exec1 and Exec2 . . . . . . . . . . . . . . . . 97
4.3 Writing variable v ∈ Vs in blocks Exec1 . . . . . . . . . . . . . . . . . . . . . . . 97
4.4 Writing variable v ∈ Vs in blocks Exec2 . . . . . . . . . . . . . . . . . . . . . . . 98
4.5 Writing variable v ∈ Vo in blocks Exec1 . . . . . . . . . . . . . . . . . . . . . . . 98
4.6 Writing variable v ∈ Vo in blocks Exec2 . . . . . . . . . . . . . . . . . . . . . . . 98

