Distributed real-time fault tolerance in a virtualized separation kernel by Missimer, Eric
Boston University
OpenBU http://open.bu.edu
Theses & Dissertations Boston University Theses & Dissertations
2017
https://hdl.handle.net/2144/27371
BOSTON UNIVERSITY
GRADUATE SCHOOL OF ARTS AND SCIENCES

Dissertation

DISTRIBUTED REAL-TIME FAULT TOLERANCE IN A
VIRTUALIZED SEPARATION KERNEL

by

ERIC SCOTT MISSIMER
B.A., Boston University, 2010

Submitted in partial fulfillment of the
requirements for the degree of
Doctor of Philosophy
2017
© Copyright by
ERIC SCOTT MISSIMER
2017
Approved by
First Reader
Richard West, PhD
Professor of Computer Science
Second Reader
Neil Audsley, PhD
Professor of Computer Science
Third Reader
Martin Herbordt, PhD
Professor of Electrical and Computer Engineering
Acknowledgments
I would like to thank my advisor Rich for his support of my Ph.D. research, his guidance
and for imparting his wisdom and expertise. The insights I gained about operating and
real-time systems will be forever useful in my future endeavors.
I would also like to thank Kate for being there for me through this process. Whether it
was helping me run experiments, editing papers or pushing me to not lose sight of the goal
of finishing, you were always there for me and I will be forever grateful for all of your
support. I am glad I will be there to help you the same way you helped me.
To my parents, sister and grandfather, thank you for all the support and encouragement,
mostly in the form of asking when I was going to be done.
To my late grandmother, thank you for always supporting and believing in me.
Last but not least, to my two crazy pups Bitsy and Dash, thanks for making sure I
started every day early so I could work on this dissertation.
This work is supported in part by the National Science Foundation (NSF) under Grant
# 1527050. Any opinions, findings, and conclusions or recommendations expressed in this
material are those of the author(s) and do not necessarily reflect the views of the NSF.
DISTRIBUTED REAL-TIME FAULT TOLERANCE IN A
VIRTUALIZED SEPARATION KERNEL
(Order No. )
ERIC SCOTT MISSIMER
Boston University, Graduate School of Arts and Sciences, 2017
Major Professor: Richard West, Professor of Computer Science
ABSTRACT
Computers are increasingly being placed in scenarios where a computer error could result
in the loss of human life or significant financial loss. Fault tolerant techniques must be employed
to prevent an error from resulting in a fault causing such losses. Two types of errors
that are common in real-time and embedded systems are soft errors, i.e. data bit corruption,
and timing errors, such as missed deadlines. Purely software based techniques to address
these types of errors have the advantage of not requiring specialized hardware and are able
to use more readily available commercial off-the-shelf hardware. Timing errors are addressed
using Adaptive Mixed-Criticality, a scheduling technique where higher criticality
tasks are given precedence over those of lower criticality when it is impossible to guarantee
the schedulability of all tasks. While mixed-criticality scheduling has gained attention
in recent years, most approaches assume a periodic task model and that the system has a
single criticality level which dictates the available budget to all tasks. In practice these
assumptions do not hold: different types of tasks are better served by different scheduling
approaches and only a subset of high-criticality tasks might require additional capacity to
meet deadlines. The latter case occurs when a process has experienced a fault and
requires additional capacity to perform the recovery.
In this thesis, soft errors are addressed using a novel real-time fault tolerance method
based on a virtualized separation kernel. Instead of executing redundant copies of an
application on separate machines, the applications are consolidated onto one multi-core
processor and use hardware virtualization extensions to partition the applications. This
allows new recovery schemes to be explored. In addition, the maximum recovery time
is sufficiently bounded to ensure recovery occurs in a timely manner without affecting
the normal execution of the application. A virtualized separation kernel in combination
with Adaptive Mixed-Criticality techniques creates a fault tolerant system that predictably
detects and recovers from timing and soft errors.
Contents
1 Introduction
1.1 Motivation
1.2 Fault Classification
1.3 Virtualized Fault Tolerance
1.4 I/O Adaptive Mixed Criticality
1.5 Recovery Adaptive Mixed Criticality
1.6 Thesis Statement
1.7 Thesis Organization
2 Background and Related Work
2.1 Background
2.1.1 Soft Errors
2.1.2 VCPU Scheduler
2.1.3 Virtualized Separation Kernel
2.1.4 Mixed-Criticality
2.2 Related Work
2.2.1 Hypervisor Based Fault Tolerance (HBFT)
2.2.2 Process N-Modular Redundancy
3 I/O Adaptive Mixed-Criticality
3.1 Response Time Analysis for SS and PIBS
3.2 System Model and Analysis
3.2.1 I/O Adaptive Mixed-Criticality Model
3.2.2 IO-AMC-rtb
3.3 Implementation
3.4 Evaluation
3.4.1 Simulation Experiments
3.4.2 Quest Experiments
4 Recovery Adaptive Mixed-Criticality
4.1 Model
4.2 RAMC Analysis
4.2.1 RAMC HI-criticality steady state
4.2.2 RAMC Mode Change: RAMC-rtb
4.3 IO-RAMC Analysis
4.3.1 IO-RAMC HI-criticality steady state
4.3.2 IO-RAMC Mode Change: IO-RAMC-rtb
4.4 Experimental Results
5 Virtualized Fault Tolerance
5.1 Introduction
5.2 N-Modular Redundancy
5.2.1 Arbitrator
5.2.2 Redundant Application
5.2.3 Execution
5.3 Recovery
5.3.1 Recovery-Copy-on-Write
5.3.2 Live Migration Recovery
5.4 Real-Time Guarantees
5.4.1 NMR Cost
5.4.2 Recovery
5.4.3 Recovery Adaptive Mixed-Criticality
5.5 Experimental Results
5.5.1 Hashing a Subset of Memory
5.5.2 Insufficient HI-Criticality Capacity
6 Conclusion
7 Future Work
7.1 AMC Model Where Tasks Periods Change
7.2 Arbitrator, Kernel and Hypervisor Soft Error Protection
7.3 Sporadic Server and PIBS Dependency Scheduling
7.4 Fault Recovery Scheduling to Reduce Recovery Time
Bibliography
Curriculum Vitae
List of Tables
3.1 Parameters Used to Generate Task Sets
3.2 Quest Task Set Parameters for Scheduling Overhead
3.3 Quest Task Set Parameters for I/O Device Mode Change
4.1 IO-RAMC Experiment Task Parameters
5.1 Task Completion Times with Varying HI-Criticality Capacities for Pull Recovery-Copy-on-Write
5.2 Task Completion Times with Varying HI-Criticality Capacities for Live Migration Recovery
List of Figures
1.1 Overview of our proposed fault classification scheme.
2.1 PIBS Server Utilization
2.2 Sporadic Server and PIBS Interaction
2.3 Example Task and I/O Scheduling using Sporadic Servers
2.4 Example Task and I/O Scheduling using Sporadic Servers & PIBS
3.1 Schedulability of SS+PIBS vs SS-Only
3.2 Schedulability of IO-AMC vs AMC
3.3 Weighted Schedulability vs % of HI-criticality Tasks
3.4 Weighted Schedulability vs Criticality Factor
3.5 Weighted Schedulability vs Number of Tasks
3.6 Job Completion Times for SS+PIBS vs SS-Only
3.7 Scheduling Overheads for SS-Only vs SS+PIBS
3.8 Data From HI- and LO-criticality USB Cameras
3.9 Total Data Processed Over Time
4.1 RAMC: Weighted Schedulability vs Number of Triggered Tasks
4.2 IO-RAMC: Weighted Schedulability vs Number of Triggered Tasks
4.3 Job completion times for four tasks. The red line denotes the time the fault was injected.
5.1 N-Modular Redundant scenario
5.2 Flowchart depicting interaction between sandboxes in the fault tolerance scenario. Arrows between sandboxes depict messages being sent. CP represents the checkpoint system call.
5.3 Flowchart depicting the Recovery-Copy-on-Write (RCOW) recovery mechanism. Arrows between sandboxes depict messages being sent.
5.4 Flowchart depicting the Recovery-Live-Migration (LMR) recovery mechanism. Arrows between sandboxes depict messages being sent.
5.5 CPU 0 Correctness Error Timeline Live Migration Recovery
5.6 CPU 0 Correctness Error Timeline Recovery CoW
5.7 CPU 0 Timing Error Timeline Live Migration Recovery
5.8 CPU 0 Timing Error Timeline Recovery CoW
5.9 CPU 0 Timeline Hashing Subset of Memory
5.10 CPU 0 Effects of Decreasing HI-Criticality Capacity
List of Abbreviations
AMC      Adaptive Mixed-Criticality
CAN      Controller Area Network
COTS     Commercial Off-The-Shelf
CP       Checkpoint
EPT      Extended Page Table
HBFT     Hypervisor Based Fault Tolerance
IO-AMC   I/O Adaptive Mixed-Criticality
KVM      Kernel-based Virtual Machine
LMR      Live Migration Recovery
PIBS     Priority Inheritance Bandwidth-Preserving Server
PID      Proportional-Integral-Derivative
RAMC     Recovery Adaptive Mixed-Criticality
RCOW     Recovery Copy-on-Write
TLB      Translation Lookaside Buffer
VFT      Virtualized Fault Tolerance
VMM      Virtual Machine Monitor
Chapter 1
Introduction
1.1 Motivation
Computers have been entrusted with the safety of human lives. Airplanes equipped with
flight management systems fly approximately one million passengers every day in the
United States alone [Uni14]. Driverless cars are also beginning to appear as a safe, more
efficient mode of transportation. Failures in such systems can cause the loss of human life.
In less extreme cases, computer failure could cause significant financial loss. For example,
in 1996 a floating point to integer conversion error caused the Ariane 5 rocket to explode
shortly after takeoff [LLJLF+96]. These are merely a few examples of how computers are
trusted with lives and money. In such systems, there is a need for high confidence that the
computer will not fail.
High confidence computer systems are vulnerable to a variety of faults and failures.
For example, DRAM and SRAM can experience a random change of value, a phenomenon
known as a soft error, typically caused by radiation [Bau05, BHMK95]. Software may be
incorrectly written such that an error always occurs given the same input, as was the case
with the Ariane 5 rocket. Software can also be incorrectly written so that an error only occurs
given certain asynchronous events, e.g. race conditions. Compute nodes can experience
power failures, causing them to shut down. Also, in high confidence real-time systems,
deadlines can be missed, causing no action or an incorrect action to be taken in
the physical world.
Embedded and real-time systems are particularly vulnerable to soft errors due to the
harsher environments they are exposed to. A soft error occurs when a data bit is corrupted
and remains so until it is written to again [Bau05]. For example, in aircraft and spacecraft soft errors are
caused predominantly by radiation, but occur with a much higher frequency than that experienced
at sea level [WDS+91, OBF+93, SSAW92, NPK+85]. Implanted medical devices such
as implantable cardioverter defibrillators have also experienced temporary errors such as
device resets and incorrect data during cancer radiation treatment [EKBSS13]. Experiments
conducted by Wilkinson et al. also found soft errors in the SRAM of electronics
operating near a linear accelerator (linac), a device commonly used for cancer radiation
treatments [WBB+05]. In such cases, a failure of an electronic device could result in the
loss of human life.
Furthermore, real-time systems are also vulnerable to timing errors related to missed
deadlines. In real-time systems, the lack of a response in a timely manner can be just
as dangerous as an incorrect response as the system does not know how to proceed. For
example, consider a semi-autonomous vehicle using camera and LIDAR sensor data. In
this simplified example, there is one task responsible for processing this data which sends
a binary signal to the brakes if there is a pedestrian in front of the vehicle. If said task did
not finish at the appropriate time the vehicle could strike a pedestrian. It would also not
be appropriate to merely have the vehicle brake if the task was delayed as this could cause
an accident in itself. Therefore, techniques that ensure tasks finish in a timely manner
are just as important as techniques that protect against soft errors.
Fault tolerance is the ability of a system to continue operating properly after a
fault or failure has occurred. Software-based approaches rely on little or no specialized
hardware to detect both software and hardware faults and recover appropriately. The advantage
of software fault tolerance is the ability to use commercial off-the-shelf (COTS)
hardware instead of specially designed hardware.
This thesis introduces Virtualized Fault Tolerance (VFT), a software technique
used to detect and recover from soft errors. It is based on N-Modular Redundancy
(NMR) and takes advantage of the Quest-V separation kernel, discussed in Section 2.1.3.
This thesis also extends Adaptive Mixed-Criticality (AMC), a common mixed-criticality
solution, to work with the non-periodic bandwidth-preserving servers used in the
Quest operating system. Finally, a variant of AMC is used to address the issue of
predictable recovery in the VFT recovery scheme.
1.2 Fault Classification
Various failure classification schemes have been created to aid in the development of fault
tolerance methods. By classifying different types of failures we are able to determine
what fault tolerance methods are applicable to a specific application. We are also able to
quantify the effectiveness of fault tolerance methods.
The following is a list of failures from Cristian [Cri91] and Hadzilacos and Toueg [HT94]:
Crash Failure:
The compute node has halted and remains so until it reboots if it reboots at all.
Depending on how the node is rebooted it could either reboot to an initial predefined
state (amnesia crash), retain some information (partial amnesia crash) or retain all
state (pause crash). Partial amnesia crash and pause crash involve check-pointing
state to a non-volatile medium. If the node never reboots it is a halting crash.
Omission Failure:
The compute node fails to respond to an input. Omission failures are further divided
into two groups: send and receive omission failures. For send omission failures the
node fails to send the output message. For receive omission failures the node fails
to properly receive a message placed into its input buffer.
Arbitrary/Response Failure:
A compute node produces an incorrect result or enters an incorrect state. This type
of failure is also referred to as a Byzantine failure. Soft errors, previously mentioned,
fall into this category.
Timing Failure:
The compute node either produces a result too early or fails to produce a result
in time. The latter case is also referred to as a performance failure.
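The failure classes listed above can be made concrete with a small sketch of how an observer might label a single reply from a compute node. This is only an illustration under stated assumptions: the function name `classify`, its parameters, and the idea that the observer knows the expected value are all hypothetical and are not taken from the cited classifications. From one observation a crash and an omission look the same, so they share a label here.

```python
from enum import Enum
from typing import Optional


class Failure(Enum):
    NONE = "no failure observed"
    CRASH_OR_OMISSION = "crash or omission"
    ARBITRARY = "arbitrary/response"
    TIMING = "timing"


def classify(value: Optional[int], arrival: Optional[float],
             expected: int, earliest: float, deadline: float) -> Failure:
    """Label one observed reply (hypothetical observer, for illustration).

    value/arrival are None when nothing was received at all.
    """
    if value is None or arrival is None:
        # No reply: the node crashed or omitted the message.
        return Failure.CRASH_OR_OMISSION
    if arrival < earliest or arrival > deadline:
        # Reply produced too early or too late.
        return Failure.TIMING
    if value != expected:
        # Reply arrived on time but carries an incorrect result.
        return Failure.ARBITRARY
    return Failure.NONE


if __name__ == "__main__":
    print(classify(41, 2.0, expected=42, earliest=1.0, deadline=5.0))
```

Timing is checked before correctness in this sketch, so a late and incorrect reply is reported as a timing failure; the opposite order would be an equally reasonable choice.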
Different fault tolerance methods aim at solving different types of failures. For example,
hypervisor-based fault tolerance described in Section 2.2.1 is designed for crash
failures and Adaptive Mixed-Criticality discussed in Section 2.1.4 handles timing failures.
Certain fault tolerance models assume that only specific failures occur. For example, the fail-stop
model assumes that arbitrary failures do not occur, i.e. a compute node will stop
instead of producing arbitrary outputs.
Another classification scheme, applicable in conjunction with the previous one,
considers whether the fault is permanent, intermittent or transient [BCDGG00]. Permanent faults
lead to errors whenever the component is activated. Intermittent and transient faults are
both characterized by being temporary, i.e. the error will not always occur. Intermittent
faults are internal to the system while transient faults are external, e.g. radiation causing a
state transition in an integrated circuit.
Faults can also be classified by whether they are physical or logical [Avi78]. Physical faults
are problems with the physical hardware, typically due to either wear and tear or external
factors affecting the system. Logical faults are typically referred to as “bugs” and are a
result of either design or interaction faults. Design faults refer to errors introduced during
the implementation of the system. Interaction faults are introduced to the system at a man-machine
interface during operation or maintenance. Physical faults are typically referred
to as hardware faults and logical faults as software faults. This distinction is slightly
misleading as the incorrect implementation of a hardware component is a logical fault
caused by the persons who designed the hardware.
Finally, faults can be classified as either Byzantine or common-mode failure faults [LH94].
Byzantine faults are arbitrary faults that affect a single computational instance. Common-mode
failure faults occur when multiple copies in a redundant system suffer faults nearly
simultaneously, typically due to a single cause. This classification scheme parallels the
previously discussed physical/logical scheme. Logical faults are typically common-mode
failure faults. Physical faults, when they do not bring down the entire system, typically do
not affect all the nodes in a distributed system in the same way and are therefore Byzantine faults.
We propose a new fault classification scheme that attempts to unify some of the ideas
in the other schemes (Figure 1.1). Specifically, we use the logical vs. physical distinction but
further subdivide logical faults into synchronous and asynchronous faults and physical faults
into permanent and transient/intermittent faults. Synchronous faults are logical faults that consistently
occur. For example, a division by zero error given some input value would be a synchronous
fault. Asynchronous faults do not consistently occur. An example of an asynchronous
fault would be a race condition; it only occurs during a certain interleaving of
thread execution. Faults can also be classified by how the fault was observed, either as an
incorrect result or as a result at the incorrect time (typically late or never). We include crash
failures as timing errors as a crash of a system is typically observed by a heartbeat message
not being received before some threshold. The benefits of the proposed scheme are
the following:
1. While not true in all instances, a system typically does not know what caused the
fault, only that a value/message was not received in time or that the value/message was
incorrect. By examining these two classes of faults we are able to create different
recovery schemes that are applicable to systems, as the correct recovery scheme is
activated when the fault is observed.
2. By further subdividing logical and physical faults we see a parallel between the subcategories.
Specifically, both asynchronous logical faults and transient/intermittent
physical faults could benefit from a rollback-replay mechanism to recover from the
fault. Such a mechanism would not be useful for synchronous logical faults, such as
unexpected input, as the same error would occur a second time.
Figure 1.1 will be used as a reference for the remaining sections in this chapter to
highlight the types of faults each technique addresses.
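As an informal illustration of the proposed scheme, the two axes (cause and observation) can be written down as enumerations, with the example causes from the text placed on the cause axis. The names `Cause`, `Observed`, `EXAMPLES` and `rollback_may_help` are hypothetical, and treating permanent physical faults as not helped by rollback is an assumption of this sketch, since the text only rules out synchronous logical faults explicitly.

```python
from enum import Enum


class Cause(Enum):
    LOGICAL_SYNCHRONOUS = "logical, synchronous"
    LOGICAL_ASYNCHRONOUS = "logical, asynchronous"
    PHYSICAL_PERMANENT = "physical, permanent"
    PHYSICAL_TRANSIENT = "physical, transient/intermittent"


class Observed(Enum):
    INCORRECT_VALUE = "incorrect result"
    INCORRECT_TIME = "result at the incorrect time (late or never)"


# Example causes named in the text, placed on the cause axis.
EXAMPLES = {
    "division by zero on a given input": Cause.LOGICAL_SYNCHRONOUS,
    "race condition": Cause.LOGICAL_ASYNCHRONOUS,
    "soft error caused by radiation": Cause.PHYSICAL_TRANSIENT,
}


def rollback_may_help(cause: Cause) -> bool:
    """True when re-executing from an earlier state could avoid the fault.

    Asynchronous logical and transient/intermittent physical faults do not
    recur consistently, so a second attempt may succeed.  Permanent physical
    faults are assumed (in this sketch) to recur, like synchronous ones.
    """
    return cause in (Cause.LOGICAL_ASYNCHRONOUS, Cause.PHYSICAL_TRANSIENT)


if __name__ == "__main__":
    for name, cause in EXAMPLES.items():
        print(f"{name}: {cause.value}, rollback may help: {rollback_may_help(cause)}")
```

Keeping the observation axis separate from the cause axis mirrors the first benefit listed above: a recovery scheme can be selected from what was observed without knowing the cause.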
1.3 Virtualized Fault Tolerance
The first software-based fault tolerance approach takes advantage of the unique virtualized
separation kernel design of the Quest-V real-time operating system. Quest-V was
developed at Boston University from scratch for high confidence real-time applications. It
utilizes the hardware virtualization extensions commonly found in new processors such as
Intel processors with VT-x and ARM v7. As opposed to a traditional virtual machine monitor,
which multiplexes and emulates hardware resources, the monitor in Quest-V statically
partitions hardware resources such as CPU, memory and devices. The monitor is therefore
not needed during normal operation, resulting in low virtualization overheads. Each partition,
referred to as a sandbox, has both a kernel and user-space applications and is responsible for managing its own
resources. As a result, the sandboxes are isolated from one another, so accidental or
malicious errors in one sandbox cannot propagate to other sandboxes. The virtualization
aspects of Quest-V will be discussed in more detail in Section 2.1.3.
Figure 1.1: Overview of our proposed fault classification scheme. Examples of causes include:
unexpected input, race conditions and faults from soft errors. *Note that common-mode
failures, while typically logical synchronous faults, could also be physical faults if
the physical state change occurred to all redundant compute nodes. **Similarly, we focus
mainly on physical faults that affect only a single or small subset of all the compute nodes;
therefore such faults are Byzantine faults as defined by Lala and Harper [LH94].
The fault tolerance approach consists of two parts: a fault detection scheme based on
N-Modular Redundancy and a recovery scheme inspired by virtual machine live migration.
Both parts take advantage of the virtualized separation kernel design of Quest-V. By
using hardware virtualization extensions a distributed system-on-a-chip is created where
shared memory is used for communication. This results in higher bandwidth and lower
latency than traditional communication mediums. This also allows for simpler solutions
to distributed problems such as clock synchronization [LWCM14]. The N-Modular Redundancy
fault detection mechanism consists of running multiple copies of an application
across multiple Quest-V sandboxes. Each sandbox is isolated via the hardware virtualization
extensions. The results of each application are sent to an arbitrator sandbox which
acts as the voter. If an error is detected the recovery mechanism is invoked. The recovery
mechanism takes advantage of the fact that the redundant copies of the application coexist
on the same processor. There are two variants of the recovery mechanism but both involve
copying a correct instance of the application to replace the faulty one.
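The voting step described above can be sketched as a plain majority vote. This is a minimal illustration, not the arbitrator implemented in Quest-V: the function name `vote`, the sandbox names, and the use of raw byte strings as results are assumptions made for the example.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def vote(results: Dict[str, bytes]) -> Tuple[Optional[bytes], List[str]]:
    """Majority vote over the results reported by redundant copies.

    Returns (agreed value, names of the copies that disagree with it).
    If no value is held by a strict majority, the agreed value is None and
    every copy is reported, since the voter cannot tell which are correct.
    """
    if not results:
        raise ValueError("at least one result is required")
    counts = Counter(results.values())
    value, votes = counts.most_common(1)[0]
    if votes * 2 <= len(results):
        return None, sorted(results)
    faulty = sorted(name for name, r in results.items() if r != value)
    return value, faulty


if __name__ == "__main__":
    # Hypothetical results from three redundant sandboxes; one is corrupted.
    reported = {"sandbox0": b"\x2a", "sandbox1": b"\x2a", "sandbox2": b"\x2b"}
    agreed, faulty = vote(reported)
    print(agreed, faulty)
```

In this sketch a non-empty list of disagreeing copies is the point at which a recovery mechanism would be invoked for each listed copy, using one of the agreeing copies as the correct instance.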
This fault tolerance scheme will be referred to as Virtualized Fault Tolerance (VFT).
It is a fault tolerance scheme that 1) is suitable for real-time systems, 2) utilizes COTS
hardware and 3) places little burden on the application developer. The fault tolerance
scheme is suitable for real-time systems as all costs associated with the scheme are predictable
and can therefore be incorporated into the real-time analysis of the taskset. This is
discussed in detail in Section 5.4. VFT is a software-based scheme that only requires
hardware virtualization extensions such as those found in modern Intel x86, AMD x86
and ARM v8 processors. This permits the use of commercial off-the-shelf processors as opposed
to processors specifically designed for harsh environments. This is advantageous as such
processors, for example radiation hardened processors, are slower, more expensive and
heavier than their COTS counterparts [LGM+96]. Finally, VFT aims to ease the burden
placed on application developers. Both fault detection and recovery are generic and do
not rely on application specific information. This is in contrast to acceptance tests used to
detect faults [Hec76, Ise05] and application specific recovery mechanisms [DS10]. VFT
was developed to address soft errors as shown in Figure 1.1.
There exist fault tolerance techniques similar to VFT that do not provide real-time
guarantees. For example, Dobel et al. [DH13] and Mushtaq et al. [MAAB13] both introduce
process level fault tolerance similar to VFT. Neither technique, however, is capable
of making real-time guarantees with regard to the overhead of fault detection and recovery.
Furthermore, these techniques do not take advantage of the hardware virtualization extensions
found in recent COTS processors. Fault tolerance techniques in real-time systems
often rely on physically separate compute nodes [WLG+78, KWFT88, KBRJ12] and on
low-bandwidth communication protocols such as Controller Area Network (CAN)
bus or similar custom protocols. Even when a higher bandwidth network medium such as
Ethernet is used, as is the case for systems such as SAFER [KBRJ12], the bandwidth
and latency still differ drastically from those of memory. Since redundant
instances of an application coexist on a single multi-core processor, sending data from one
sandbox to another merely involves copying data from one region of memory to another.
Shared memory channels have higher bandwidth and lower latency than traditional embedded
communication mechanisms such as CAN and outperform even high performance
communication mediums such as Infiniband and Ethernet. Specifically, a single channel of
DDR3-1600 DRAM can obtain a peak transfer rate of 204.8 Gb/s [Dre07, JED08], which is
approximately 3.7 times the bandwidth of Mellanox 56 Gb/s Infiniband [Mel11] and
over five times the bandwidth of 40 Gb/s Ethernet [Cis16]. In terms of latency,
Infiniband, Ethernet or any other communication medium adds an additional
cost on top of memory latency, considering the data being transferred initially resides in memory.
Therefore, while external communication mediums could theoretically have a similar
bandwidth to memory, their latency is always going to be greater. By using a more efficient
communication medium, new fault recovery techniques that previously would have been
too costly can be explored.
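The bandwidth comparison above is simple arithmetic and can be checked with a short sketch. The three rates are the figures quoted in the text; the 1 MiB transfer size is a hypothetical value chosen only to show how the ratios translate into a bandwidth-only lower bound on copy time, ignoring latency and protocol overhead.

```python
# Peak rates in Gb/s, as quoted in the text.
MEMORY_GBPS = 204.8
INFINIBAND_GBPS = 56.0
ETHERNET_GBPS = 40.0


def ratio(a_gbps: float, b_gbps: float) -> float:
    """How many times the bandwidth of b is provided by a."""
    return a_gbps / b_gbps


def transfer_time_us(size_bytes: int, gbps: float) -> float:
    """Bandwidth-only transfer time in microseconds (no latency modeled)."""
    return size_bytes * 8 / (gbps * 1e9) * 1e6


if __name__ == "__main__":
    size = 1 << 20  # hypothetical 1 MiB of application state
    print(f"memory vs Infiniband: {ratio(MEMORY_GBPS, INFINIBAND_GBPS):.2f}x")
    print(f"memory vs Ethernet:   {ratio(MEMORY_GBPS, ETHERNET_GBPS):.2f}x")
    for name, rate in [("memory", MEMORY_GBPS),
                       ("Infiniband", INFINIBAND_GBPS),
                       ("Ethernet", ETHERNET_GBPS)]:
        print(f"{name}: {transfer_time_us(size, rate):.2f} us")
```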
1.4 I/O Adaptive Mixed Criticality
Mixed-criticality scheduling orders the execution of tasks of different criticality levels.
Criticality levels are based on the consequences of a task violating its timing requirements
or failing to function as specified. For example, DO-178B is a software certification standard used
in avionics, which specifies several assurance levels in the face of software failures. These
assurance levels range from catastrophic (e.g., a failure could cause a plane crash) to non-critical,
where a failure has little or no impact on aircraft safety or overall operation.
Mixed-criticality scheduling was first introduced by Vestal (2007) [Ves07]. Later,
Baruah, Burns and Davis (2011) [BBD11] introduced Adaptive Mixed-Criticality (AMC)
scheduling. The work presented in this thesis builds upon AMC to extend it for use in
systems where tasks make I/O requests. The second main contribution of this thesis is
the IO-AMC model and analysis. IO-AMC is a mixed-criticality analysis where threads
are scheduled using either Sporadic Servers or Priority Inheritance Bandwidth-preserving
Servers (PIBS). It is shown that while a system of Sporadic Servers and PIBS has a slightly
lower schedulability than a system of only Sporadic Servers from a theoretical point of
view, in practice a real implementation of both scheduling policies results in Sporadic
Servers and PIBS outperforming a system of only Sporadic Servers.
Previous mixed-criticality analysis assumes that all jobs in the system are scheduled
under the same policy, typically as periodic tasks. However, as previously shown by Danish,
Li and West [DLW11], using the same scheduling policy for both task threads and
bottom half interrupt handlers¹ results in lower I/O performance and larger overheads.
Specifically, the authors compared using the Sporadic Server (SS) [Spr90] model for both main
threads and bottom half interrupt handlers to using Sporadic Servers for main threads and
Priority Inheritance Bandwidth-preserving Servers (PIBS) for bottom half threads. The
results showed that by using PIBS for interrupt bottom half threads, the scheduling overheads
are reduced and I/O performance is increased. The details of PIBS will be discussed
in Section 2.1.2. IO-AMC builds upon this model to consider Adaptive Mixed-Criticality.
1.5 Recovery Adaptive Mixed Criticality
The final contribution of this thesis is Recovery Adaptive Mixed Criticality (RAMC).
RAMC is an AMC variant suitable for use during the recovery scheme for VFT or other
process based recovery mechanisms. Traditional AMC divides the tasks into two groups,
LO- and HI-criticality and provides the mechanism to give additional budget to the HI-
criticality tasks at the expense of the LO-criticality tasks. Unlike traditional AMC or IO-
AMC, Recovery AMC allows individual tasks to change their active criticality level, dy-
namically increasing the available capacity for a subset of the HI-criticality tasks. This is
suitable for process recovery techniques such as VFT, where only a single task might need
additional capacity as opposed to the set of all HI-criticality tasks.
The RAMC technique will be used in conjunction with VFT to show that it is possible
to provide real-time guarantees during the recovery of processes. It will be combined with
the IO-AMC technique to work in not just a system of periodic tasks but to also work in a
system of Sporadic Servers and PIBS.
1The Linux terminology is used, where the top half is the non-deferrable work that runs in interrupt
context, and the bottom half is the deferrable work executed in a thread context after the top half.
1.6 Thesis Statement
Thesis: This thesis uses hardware virtualization to replicate services in a chip-level dis-
tributed system. The combination of Adaptive Mixed-Criticality techniques and service
replication provides the basis for a real-time fault tolerant system, which detects and re-
covers from timing and soft errors, and outperforms traditional distributed system ap-
proaches.
1.7 Thesis Organization
The rest of this thesis is organized as follows. Chapter 2 provides the necessary
background and related work. Chapter 3 covers I/O Adaptive Mixed-Criticality, an Adaptive
Mixed-Criticality approach that takes into account the scheduling of I/O bottom half
threads. Chapter 4 introduces Recovery Adaptive Mixed-Criticality (RAMC), an Adaptive
Mixed-Criticality variant used during fault recovery mechanisms that require additional
execution time during the recovery routine. Virtualized Fault Tolerance (VFT) is introduced
in Chapter 5. VFT takes advantage of the unique Quest-V separation kernel design
to execute redundant tasks in an efficient manner. Finally, Chapter 6 concludes the thesis
and Chapter 7 covers future work.
Chapter 2
Background and Related Work
2.1 Background
This section will introduce the necessary background information for understanding the
work introduced in this thesis.
2.1.1 Soft Errors
A soft error is the corruption of a data bit in a device. In the terrestrial domain, soft errors
are predominantly caused by radiation [Bau05].
Trace amounts of uranium and thorium in packaging materials were the dominant cause of
soft errors in DRAM in the 1970s. More recently, the dominant causes of soft errors are
high-energy cosmic radiation [Nor96] and the interaction of low-energy cosmic radiation
with the isotope boron-10 [BHMK95]. While DRAM used to be the component most susceptible
to soft errors in the terrestrial domain, it is now one of the more robust components.
SRAM and, to a lesser degree, the logical components of an integrated circuit are now the
most vulnerable components.
Similarly, in the non-terrestrial domain, specifically for aircraft and spacecraft, soft errors
are caused predominantly by radiation but with a much higher frequency [WDS+91,
OBF+93, SSAW92, NPK+85]. Specifically, for space environments there are three primary
sources of radiation that cause soft errors: 1) galactic cosmic rays with a source
outside of our solar system, 2) solar particles accelerated by solar flares and 3) energetic
particles trapped by the Earth’s magnetosphere [WDS+91]. Soft errors have also been
reported at a higher frequency during flight than at sea level. For example,
Olsen et al. [OBF+93] reported a high number of soft errors in flights at an altitude of
10 km. These soft errors were caused by neutrons, which are not present in cosmic rays due
to their short lifespan. Neutrons are, however, produced by the interaction of cosmic rays
with nitrogen and oxygen. Neutrons cause soft errors by either transferring energy to a silicon
nucleus or by causing a proton or alpha particle to escape from a nucleus. Both result in
electron-hole pairs being created, which can directly lead to a soft error.
Finally, heat has also been shown to generate soft errors in COTS systems [GA03].
Specifically, Govindavajhala and Appel used an incandescent light bulb to heat DRAM
to approximately 100 degrees Celsius. At this temperature, they observed
random bit flips in memory, which they then exploited to take control of a Java virtual
machine.
2.1.2 VCPU Scheduler
Sporadic Servers (SS) [Spr90] and Priority Inheritance Bandwidth-preserving Servers (PIBS)
[DLW11] are the two scheduling models used in the Quest real-time operating system [Que].
VCPUs in Quest are kernel-level resource containers and are unrelated to the Quest-V variant of Quest.
Sporadic Servers are specified using a budget capacity, C, and period, T. By default,
the Sporadic Server with the smallest period is given highest priority, which follows the
rate-monotonic policy [LL73]. The main tasks in Quest run on Sporadic Servers, thereby
guaranteeing them a minimum share of CPU time every real-time period. Replenishment
lists are used to track the consumption of CPU time and when it is eligible to be re-applied
to the corresponding server.
PIBS uses a much simpler scheduling method which is more appropriate for the short
execution times associated with interrupt bottom half threads. A PIBS is specified by
a utilization, U, instead of a capacity and period. A PIBS always runs on behalf of a
Sporadic Server and inherits both the priority and period of the Sporadic Server. For
example, the PIBS running in response to a device interrupt would run on behalf of the
Sporadic Server that requested the I/O action to be performed. The capacity of a PIBS is
calculated as C=U×T , where T is the period of the Sporadic Server.
As with Sporadic Servers, PIBS uses replenishments but instead of a list, there is only
a single replenishment. When a PIBS has executed for C_actual, its next replenishment is set
to t + C_actual/U, where t is the time the PIBS started its most recent execution. A PIBS
cannot execute again until the next replenishment time regardless of whether it has utilized
its entire budget or not. Since a PIBS uses only one replenishment value rather than a list,
it is beneficial for scheduling short-lived interrupt service routines that would otherwise
fragment a Sporadic Server’s budget into many small replenishments. The replenishment
method of a PIBS limits its maximum utilization within any sliding window of size T to
(2 − U)U. This occurs when the PIBS first runs for C1 = U(T − UT) and then again for
C2 = UT. This is demonstrated in Figure 2.1:
\[
\frac{C_1 + C_2}{T}
= \frac{(T' \cdot U) + C_2}{T}
= \frac{(T - C_2) \cdot U + C_2}{T}
= \frac{(C_2/U - C_2) \cdot U + C_2}{C_2/U}
= (2 - U)U
\]
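As a quick numerical check of this bound, the following Python sketch evaluates the two consecutive executions C1 and C2 for a given U and T and compares the resulting window utilization with (2 − U)U. The function name and the sample values of U and T are illustrative assumptions and are not taken from the Quest implementation.

```python
import math


def pibs_window_utilization(u, t):
    """Utilization over a window of size t for the worst-case pattern above.

    The PIBS first runs for C1 = U(T - UT) and then again for C2 = UT.
    """
    c2 = u * t
    c1 = u * (t - c2)
    return (c1 + c2) / t


# Hypothetical parameters: U = 0.25 with an inherited period of T = 16.
u, t = 0.25, 16
measured = pibs_window_utilization(u, t)
bound = (2 - u) * u
print(measured, bound, math.isclose(measured, bound))
```

Because the period cancels out of the final expression, the same check should hold for any T > 0 and 0 < U ≤ 1.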
Figure 2.1: PIBS Server Utilization
The interaction between Sporadic Servers and PIBS is depicted in Figure 2.2. First, the
Sporadic Server initiates an I/O related system call (Step 1). The system call invokes the
associated device driver, which programs the device (Step 2). The device will eventually
initiate an interrupt which is handled by the top half interrupt handler (Step 3). The top half
interrupt handler will acknowledge the interrupt and wake up one of the PIBS to handle
the majority of the work associated with the interrupt (Step 4). Note that although the
figure shows PIBS run at kernel-level, they could just as well be associated with user-space
threads if the system granted such privileges. Finally, after a PIBS finishes executing it will
wake up the corresponding Sporadic Server, assuming the Sporadic Server was blocked
on an I/O request (Step 5). A more detailed description of the scheduling of PIBS and
Sporadic Servers can be found in Danish, Li and West [DLW11].
The advantages of using PIBS for bottom half interrupt handling include lower schedul-
ing overhead and no delayed execution due to replenishment list fragmentation. The short
execution time of bottom half interrupt handlers will cause a Sporadic Server to complete
execution before exhausting its available capacity. This leads to a fragmented replenish-
ment list. In practice, the lists are finite in length because of memory constraints and to
limit scheduling overhead. When a replenishment list is full, items are merged to make
space for new replenishments. This causes the available budget to be deferred [SBWH10],
and the effective utilization of the Sporadic Server will drop below its specified value.
This, in turn, results in deadlines being missed. In contrast, PIBS have only a single
replenishment list item and a different policy for how the replenishment is posted, which
prevents a drop in their effective utilization and lowers scheduling overhead.
Figure 2.2: Sporadic Server and PIBS Interaction
Figure 2.3 shows an example of replenishment list fragmentation. The first task, τ1,
begins execution at time t=0 and continues execution for eight time units. τ1 utilized its
entire capacity at t=8 so a single replenishment item is posted for 8 time units of capacity
at time t=16. The replenishment is posted at t=16 because τ1 started execution at time t=0
and has a period of 16. Right at the completion of its execution τ1 initiates an I/O related
event, e.g. a read. Suppose this I/O event causes four interrupts to occur. Each interrupt
initiates a bottom half thread that takes one time unit of computation to complete. τ1 will
require all four bottom half interrupt handlers to complete execution before it continues
execution, i.e. it is blocking on the read. τ2 is the task responsible for handling these
bottom half interrupt handlers. The first interrupt occurs at time t=9 and is immediately
handled by τ2. Note that at time t=9 the time of the head replenishment list item is updated
from zero to nine. This is to ensure that when a replenishment item is posted after the task
blocks or depletes its budget the replenishment item is posted at the correct time. Once τ2
completes execution of the bottom half interrupt handler it blocks as it waits for another
I/O interrupt to occur.
When τ2 blocks it posts a replenishment item for the capacity that it used. Since it
used 1 time unit of capacity and started executing at time t=9 a replenishment item of
1 time unit is posted at time t=25. At time t = 11 another interrupt occurs, waking
up τ2 for another time unit. The time of the first replenishment list item is updated to
11 to reflect that the Sporadic Server started execution at time t=11. After handling the
bottom half interrupt handler, another replenishment item for one time unit is posted, this
time at time t=27. When the third interrupt occurs τ2 again executes for 1 time unit.
However, when τ2 attempts to post a replenishment item for the one unit of capacity used
it cannot since its replenishment list is full.1 In order to ensure that τ2 does not adversely
affect other running tasks, its remaining capacity of one time unit is merged with the
next replenishment list item, which in this example is at t=25. This results in the available
capacity for τ2 being zero, leaving it unable to immediately handle the interrupt that occurs
at time t=15. Instead, the execution of the interrupt is delayed and completes only at time
t=26. Meanwhile, τ1, which had the capacity to execute at time t=16, is blocking waiting
for completion of the fourth interrupt handler. τ1 begins execution at time t=26 but that
leaves only six time units until the deadline at time t=32, instead of the eight required to
complete execution.
1For the sake of this example the replenishment list size is three. In practice, a larger replenishment list
size would be chosen but, regardless, fragmentation and capacity postponement can occur [DLW11].
Figure 2.3: Example Task and I/O Scheduling using Sporadic Servers
Figure 2.4 shows a similar scheduling scenario. However, this time the interrupt bot-
tom halves are handled by a PIBS. As with the previous scenario, τ1 initiates an I/O related
event at time t=8 and blocks until the completion of the event. The first interrupt occurs
at time t=9 and is immediately handled by PIBS. As with the Sporadic Server, the time
in the replenishment list item is updated to reflect when the PIBS started execution. Once
the event is handled, the PIBS posts a single replenishment item at time t=13. This is
because τ2 is running on behalf of τ1 so it inherits both the priority and period of τ1. Since
τ2 executed for only 25% of its available four time units of capacity the replenishment
is posted 25% of its period from when it started execution. The second interrupt occurs
at time t=11 but its execution is deferred until τ2 has available capacity. At time t=13
the third interrupt arrives and τ2 has the capacity to handle both it and the previous inter-
rupt. Finally, the fourth interrupt arrives at time t=15, which can also be handled by τ2.
Since τ2 has executed for 75% of its available capacity a replenishment is posted twelve
time units after it started execution, at t=25. This permits τ1 to continue execution at time
t=16. The pattern then repeats itself. This simple example demonstrates the advantages of
PIBS for bottom half threads compared to Sporadic Servers. Finally, note that even if the
replenishment list in the first example had been long enough to avoid the delayed budget,
the Sporadic Server would have experienced twice as much context switching overhead
compared to the equivalent PIBS.
Figure 2.4: Example Task and I/O Scheduling using Sporadic Servers & PIBS
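The two replenishment times in the PIBS example above follow directly from the rule t + C_actual/U. The short Python sketch below reproduces them, assuming a utilization of U = 4/16 (four time units of capacity over the inherited period of 16); the function name is hypothetical.

```python
def pibs_next_replenishment(start, executed, utilization):
    """Next replenishment time of a PIBS: t + C_actual / U."""
    return start + executed / utilization


U = 4 / 16  # four time units of capacity over the period inherited from τ1

# First interrupt: one time unit of execution starting at t=9.
first = pibs_next_replenishment(start=9, executed=1, utilization=U)

# Remaining three interrupts: three time units of execution starting at t=13.
second = pibs_next_replenishment(start=13, executed=3, utilization=U)

print(first, second)  # 13.0 25.0
```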
Note that in the first example, if a different policy for handling a full Sporadic Server
replenishment list had been chosen, τ2 might have completed in time for τ1 to finish before
its deadline. For example, if the later replenishment items were merged instead of the head
replenishment item, τ2 would have had one remaining time unit of capacity to handle the
last bottom half interrupt handler. However, as more interrupts occur, this temporary fix
will no longer work as more capacity is delayed further in time.
2.1.3 Virtualized Separation Kernel
Quest-V is a virtualized separation kernel that utilizes hardware virtualization features to
run multiple separate instances of the Quest real-time operating system. These instances
are isolated from each other using hardware virtualization extensions available on modern
processors. As opposed to a typical monitor, which multiplexes and emulates hardware
resources, the Quest-V monitor partitions and isolates resources. A subset of CPUs, mem-
ory and devices are statically assigned to each Quest-V sandbox. Therefore, there is no
virtual machine scheduling or device multiplexing that can introduce overheads or unpre-
dictability. During normal operation, a Quest-V sandbox does not perform VM-exits2 and
therefore minimizes the cost of using the hardware virtualization extensions. Currently,
excluding forced exits, such as the CPUID instruction, the monitor is only entered to
dynamically set up shared communication channels and to migrate a process from one
sandbox to another. The Quest-V monitor also differs from a typical hypervisor as a typical
hypervisor runs guest virtual machines that are unrelated to each other and the hypervi-
sor. A Quest-V sandbox, while capable of running completely independent of the other
sandboxes, does not have to. The sandboxes are capable of explicit inter-sandbox com-
munication via shared memory channels and are capable of process migration between
sandboxes. A more detailed description of Quest-V can be found in West et al.
[WLMD16] and Li et al. [LWCM14].
2A VM-exit is the transition from a guest virtual machine to the virtual machine monitor
One of the unique design principles of Quest-V is that the ring of protection responsible
for protecting and partitioning resources is different than the ring responsible for managing
resources. Specifically, the monitor partitions CPUs, memory and devices, and the kernel
running in each sandbox is responsible for managing the resources. This differs from
typical hypervisors such as VMware’s ESX, Linux/KVM and Xen. In these systems, the
hypervisor takes control of all resources and temporarily grants or emulates resources to
guests. The Quest-V design is in some ways similar to the micro-kernel [Lie95] design.
However, in a micro-kernel, the most trusted privilege domain is in the critical path, whereas
the monitor in the Quest-V design is not entered during normal operations.
2.1.4 Mixed-Criticality
Mixed-criticality is a fault tolerance technique that addresses performance timing fail-
ures. The idea was introduced by Vestal in 2007 [Ves07]. Tasks are divided into different
groups of criticality, where the highest level of criticality is for mission critical tasks that
must complete by their deadline while the lowest level criticality tasks can miss deadlines
or even be halted. The key concept behind mixed-criticality is to analyze the system at
different criticality levels and ensure that all tasks of that criticality level will meet all
deadlines. Vestal proposed two methods for handling mixed-criticality: period transformation
[SLR86], which increases the priority of high criticality tasks, and a modified form of
Audsley’s priority assignment algorithm [Aud91], which selects higher criticality tasks first
in the selection process.
Another form of mixed-criticality is Adaptive Mixed-Criticality (AMC) [BBD11]. In
AMC, a task τi is defined by its period, deadline, a vector of computation times and a
criticality level, $(T_i, D_i, \vec{C}_i, L_i)$. In the simplest case, Li ∈ {LO, HI}, i.e. there are two
criticality levels LO and HI where HI > LO. For tasks for which L = LO, C(HI) is not
defined as there are no HI-criticality versions of these tasks to execute. For HI-criticality
tasks C(HI) ≥ C(LO). The system also has a criticality level and it initially starts in the
LO-criticality mode. While running in the LO-criticality mode, both LO- and HI-criticality
tasks execute, and while running in HI-criticality mode, only HI-criticality tasks execute.
If a high criticality task exhausts its C (LO) before finishing its current job, the system
switches into the HI-criticality mode and suspends all LO-criticality tasks. This requires a
signaling mechanism available to tasks to signal that they have completed execution of a
specific job instance.
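The mode-change rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Task` and `AmcSystem` classes and the `on_budget_check` hook are hypothetical names standing in for a scheduler's budget monitor and job-completion signal, and do not describe the Quest implementation.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Level(IntEnum):
    LO = 0
    HI = 1


@dataclass
class Task:
    name: str
    level: Level
    c_lo: int                    # C(LO)
    c_hi: Optional[int] = None   # C(HI); undefined for LO-criticality tasks


class AmcSystem:
    """Hypothetical budget monitor illustrating the AMC mode change."""

    def __init__(self, tasks):
        self.tasks = list(tasks)
        self.mode = Level.LO     # the system initially starts in LO-criticality mode

    def runnable(self):
        # LO mode: LO- and HI-criticality tasks execute.
        # HI mode: only HI-criticality tasks execute.
        return [t for t in self.tasks if t.level >= self.mode]

    def on_budget_check(self, task, executed, job_completed):
        """Called with the time `task` has executed in its current job."""
        if (self.mode is Level.LO and task.level is Level.HI
                and executed >= task.c_lo and not job_completed):
            # A HI-criticality task exhausted C(LO) without signaling completion.
            self.mode = Level.HI


system = AmcSystem([Task("control", Level.HI, 2, 4), Task("logging", Level.LO, 1)])
system.on_budget_check(system.tasks[0], executed=2, job_completed=False)
print(system.mode.name, [t.name for t in system.runnable()])  # HI ['control']
```

The `job_completed` argument plays the role of the signaling mechanism mentioned above: without it, the monitor could not distinguish a job that finished exactly at C(LO) from one that still needs more execution time.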
The schedulability test for AMC consists of three parts: 1) the schedulability of the
tasks when the system is in the LO-criticality state, 2) the schedulability of the tasks when
the system is in the HI-criticality state and 3) the schedulability of the tasks during the
mode change from LO-criticality to HI-criticality. The first two are simple and are handled
with the traditional response time analysis, taking into account the appropriate set of tasks
and worst case execution times. Specifically, the response time analysis for each task τi
when the system is in the LO-criticality state is:
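\[
R_i^{LO} = C_i(LO) + \sum_{\tau_j \in hp(i)} \left\lceil \frac{R_i^{LO}}{T_j} \right\rceil C_j(LO)
\]
and the response time analysis for the HI-criticality state is:
\[
R_i^{HI} = C_i(HI) + \sum_{\tau_j \in hpH(i)} \left\lceil \frac{R_i^{HI}}{T_j} \right\rceil C_j(HI)
\]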
where hpH (i) is the set of all high-criticality tasks with a priority higher than or equal to
that of task τi.
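Both recurrences are typically solved by fixed-point iteration, starting from the task's own execution time. The Python sketch below shows this for a small, made-up task set (the parameters are assumptions chosen for illustration, with deadlines equal to periods); it is not taken from the thesis experiments.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive integers."""
    return -(-a // b)


def response_time(c_i, interferers, limit):
    """Iterate R = c_i + sum(ceil(R / T_j) * C_j) until it stops changing.

    interferers is a list of (T_j, C_j) pairs. Returns None if R exceeds limit,
    i.e. the task cannot be shown to meet its deadline.
    """
    r = c_i
    while r <= limit:
        nxt = c_i + sum(ceil_div(r, t_j) * c_j for t_j, c_j in interferers)
        if nxt == r:
            return r
        r = nxt
    return None


# Hypothetical task set, highest priority first: (level, T, C(LO), C(HI)).
tau1 = ("HI", 5, 1, 2)
tau2 = ("LO", 10, 2, None)
tau3 = ("HI", 20, 3, 5)   # task under analysis, deadline 20
hp = [tau1, tau2]

# LO-criticality state: every higher priority task interferes with C(LO).
r_lo = response_time(tau3[2], [(t, c_lo) for _, t, c_lo, _ in hp], 20)

# HI-criticality state: only tasks in hpH(i) interfere, with C(HI).
r_hi = response_time(tau3[3], [(t, c_hi) for lvl, t, _, c_hi in hp if lvl == "HI"], 20)

print(r_lo, r_hi)  # 7 9
```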
What remains is whether all HI-criticality tasks will meet their deadlines during the
mode change from LO-criticality to HI-criticality. Baruah, Burns and Davis provided two
sufficient but not necessary schedulability tests for the criticality mode change, i.e. the tests will
not admit task sets that are not schedulable but may reject task sets that are schedulable.
The first is AMC-rtb (response time bound) which derives a new response time analysis
equation for the mode change. The second is AMC-max which derives an expression
for the maximum interference a HI-criticality task experiences during the mode change.
AMC-max iterates over all possible points in time where the interference could increase,
taking the maximum of these points. AMC-max is more computationally expensive than
AMC-rtb but dominates AMC-rtb by permitting certain task sets that AMC-rtb rejects, and
accepting any task set that AMC-rtb accepts. Both tests use Audsley’s priority assignment
algorithm [Aud01], as priorities that are inversely related to period are not optimal for
AMC [Ves07, BBD11].
In this thesis, we focus on the use of AMC-rtb for response time analysis of a system
with Sporadic Servers and PIBS. This is because of the added expense incurred by AMC-
max, which must iterate over all time points when LO-criticality tasks are released.
The AMC-rtb analysis starts with a modified form of the traditional periodic response
time analysis:
\[
R_i^{*} = C_i + \sum_{\tau_j \in hp(i)} \left\lceil \frac{R_i^{*}}{T_j} \right\rceil C_j(\min(L_i, L_j)) \tag{2.1}
\]
where min(Li, Lj) returns the lowest criticality level passed to it, e.g. in the case of a
dual-criticality level system, HI is only returned if both arguments are HI. The use of min
implies that we only consider criticality levels equal to or less than the criticality level of
τi. If we divide the higher priority tasks by criticality level, we obtain the following:
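\[
R_i^{*} = C_i + \sum_{\tau_j \in hpH(i)} \left\lceil \frac{R_i^{*}}{T_j} \right\rceil C_j(\min(L_i, L_j)) + \sum_{\tau_j \in hpL(i)} \left\lceil \frac{R_i^{*}}{T_j} \right\rceil C_j(LO) \tag{2.2}
\]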
Where hpL (i) is the set of all LO-criticality tasks with a priority higher than or equal to
the priority of task τi. The min in the third term is replaced with LO as we know Lj=LO.
Since we are only concerned with HI-criticality tasks after the mode change, i.e. Li=HI,
Equation 2.2 becomes:
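\[
R_i^{*} = C_i(HI) + \sum_{\tau_j \in hpH(i)} \left\lceil \frac{R_i^{*}}{T_j} \right\rceil C_j(HI) + \sum_{\tau_j \in hpL(i)} \left\lceil \frac{R_i^{*}}{T_j} \right\rceil C_j(LO) \tag{2.3}
\]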
Finally, the response time bound is tightened even further by recognizing that LO-criticality
tasks only interfere with HI-criticality tasks before the mode change occurred. With this
observation, the final AMC response time bound equation is:
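\[
R_i^{*} = C_i(HI) + \sum_{\tau_j \in hpH(i)} \left\lceil \frac{R_i^{*}}{T_j} \right\rceil C_j(HI) + \sum_{\tau_j \in hpL(i)} \left\lceil \frac{R_i^{LO}}{T_j} \right\rceil C_j(LO) \tag{2.4}
\]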
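To make the AMC-rtb bound concrete, the following Python sketch evaluates Equation 2.4 by fixed-point iteration for a HI-criticality task. The `Task` class, the helper names and the three-task example are assumptions made for illustration; the sketch treats the deadline as the iteration limit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    level: str   # "LO" or "HI"
    period: int  # T_j
    c_lo: int    # C_j(LO)
    c_hi: int    # C_j(HI); unused here for LO-criticality tasks


def ceil_div(a, b):
    return -(-a // b)


def fixed_point(step, start, limit):
    """Iterate r = step(r) from start; None if r grows beyond limit."""
    r = start
    while r <= limit:
        nxt = step(r)
        if nxt == r:
            return r
        r = nxt
    return None


def r_lo(task, hp, limit):
    """LO-criticality response time: all of hp(i) interferes with C(LO)."""
    return fixed_point(
        lambda r: task.c_lo + sum(ceil_div(r, j.period) * j.c_lo for j in hp),
        task.c_lo, limit)


def amc_rtb(task, hp, limit):
    """Equation 2.4 for a HI-criticality task; hp lists higher priority tasks."""
    rlo = r_lo(task, hp, limit)
    if rlo is None:
        return None
    hp_h = [j for j in hp if j.level == "HI"]
    hp_l = [j for j in hp if j.level == "LO"]
    # LO-criticality tasks only interfere before the mode change, so their
    # contribution is fixed by the number of releases within R_i^LO.
    lo_part = sum(ceil_div(rlo, j.period) * j.c_lo for j in hp_l)
    return fixed_point(
        lambda r: task.c_hi
        + sum(ceil_div(r, j.period) * j.c_hi for j in hp_h)
        + lo_part,
        task.c_hi, limit)


# Hypothetical example: two higher priority tasks and a HI task with deadline 20.
hp = [Task("HI", 5, 1, 2), Task("LO", 10, 2, 0)]
tau3 = Task("HI", 20, 3, 5)
print(r_lo(tau3, hp, 20), amc_rtb(tau3, hp, 20))  # 7 13
```

For the same example, iterating Equation 2.3 instead (with the LO-criticality term depending on R* rather than on the LO-criticality response time) converges to 15, which illustrates the tightening obtained in Equation 2.4.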
In addition to the previously described model, Burns and Baruah [BB13] provide an
extension to AMC that permits lower criticality tasks to continue execution in the HI-
criticality state. This extension is used in our AMC model with support for I/O, and is
briefly summarized as follows:
If LO-criticality tasks are allowed to continue execution in the HI-criticality mode at a
lower capacity, the following is the response time for a HI-criticality task τi:
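\[
\begin{aligned}
R_i^{*} = C_i &+ \sum_{\tau_j \in hpH(i)} \left\lceil \frac{R_i^{*}}{T_j} \right\rceil C_j(HI) + \sum_{\tau_j \in hpL(i)} \left\lceil \frac{R_i^{LO}}{T_j} \right\rceil C_j(LO) \\
&+ \sum_{\tau_j \in hpL(i)} \left( \left\lceil \frac{R_i^{*}}{T_j} \right\rceil - \left\lceil \frac{R_i^{LO}}{T_j} \right\rceil \right) C_j(HI)
\end{aligned}
\tag{2.5}
\]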
The final term in Equation 2.5 expresses the maximum number of times the LO-criticality
task will be released multiplied by its smaller HI-criticality execution time (for LO-criticality
tasks that execute in HI-criticality mode, C(LO) > C(HI), whereas for HI-criticality
tasks C(HI) ≥ C(LO)). While Equation 2.5 also applies to LO-criticality tasks that continue
running after the mode change, a tighter bound is possible. Specifically, if a LO-criticality
task has already run for C(HI) before the mode change then it has met its HI-criticality
requirement. Therefore, $R_i^{LO}$ is replaced with a smaller value for LO-criticality tasks.
To this end $R_i^{LO*}$ is defined as the following:
\[
R_i^{LO*} = \min(C_i(LO), C_i(HI)) + \sum_{\tau_j \in hp(i)} \left\lceil \frac{R_i^{LO*}}{T_j} \right\rceil C_j(LO) \tag{2.6}
\]
Note that $R_i^{LO*} = R_i^{LO}$ if Li=HI and $R_i^{LO*} \le R_i^{LO}$ if Li=LO, as LO-criticality tasks will have a
smaller capacity in the HI-criticality mode. Therefore, Equation 2.5 is replaced with the
following more general equation that is tighter for LO-criticality tasks:
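\[
\begin{aligned}
R_i^{*} = C_i &+ \sum_{\tau_j \in hpH(i)} \left\lceil \frac{R_i^{*}}{T_j} \right\rceil C_j(HI) + \sum_{\tau_j \in hpL(i)} \left\lceil \frac{R_i^{LO*}}{T_j} \right\rceil C_j(LO) \\
&+ \sum_{\tau_j \in hpL(i)} \left( \left\lceil \frac{R_i^{*}}{T_j} \right\rceil - \left\lceil \frac{R_i^{LO*}}{T_j} \right\rceil \right) C_j(HI)
\end{aligned}
\tag{2.7}
\]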
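A minimal Python sketch of Equations 2.6 and 2.7 follows. It is restricted to a HI-criticality task under analysis and assumes that C_i in Equation 2.7 is then C_i(HI); the `Task` class, the helper names and the example parameters are likewise assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    level: str   # "LO" or "HI"
    period: int  # T_j
    c_lo: int    # C_j(LO)
    c_hi: int    # C_j(HI); for LO tasks, the reduced budget kept in HI mode


def ceil_div(a, b):
    return -(-a // b)


def fixed_point(step, start, limit):
    r = start
    while r <= limit:
        nxt = step(r)
        if nxt == r:
            return r
        r = nxt
    return None


def r_lo_star(task, hp, limit):
    """Equation 2.6: starts from min(C_i(LO), C_i(HI))."""
    c = min(task.c_lo, task.c_hi)
    return fixed_point(
        lambda r: c + sum(ceil_div(r, j.period) * j.c_lo for j in hp), c, limit)


def amc_rtb_degraded(task, hp, limit):
    """Equation 2.7 for a HI-criticality task (assumes C_i = C_i(HI))."""
    rls = r_lo_star(task, hp, limit)
    if rls is None:
        return None
    hp_h = [j for j in hp if j.level == "HI"]
    hp_l = [j for j in hp if j.level == "LO"]

    def step(r):
        hi = sum(ceil_div(r, j.period) * j.c_hi for j in hp_h)
        # Releases of LO tasks before the mode change run with C(LO) ...
        before = sum(ceil_div(rls, j.period) * j.c_lo for j in hp_l)
        # ... and the remaining releases run with the reduced C(HI).
        after = sum((ceil_div(r, j.period) - ceil_div(rls, j.period)) * j.c_hi
                    for j in hp_l)
        return task.c_hi + hi + before + after

    return fixed_point(step, task.c_hi, limit)


# Hypothetical example: the LO task keeps a budget of 1 in HI-criticality mode.
hp = [Task("HI", 5, 1, 2), Task("LO", 10, 2, 1)]
tau3 = Task("HI", 20, 3, 5)
print(r_lo_star(tau3, hp, 20), amc_rtb_degraded(tau3, hp, 20))  # 7 14
```

Setting the HI-mode budget of the LO-criticality task to zero makes the final term vanish, so the sketch then returns the same value as Equation 2.4 would for this task set.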
In Chapter 3 an analysis similar to the AMC-rtb analysis previously described will be
introduced for a system of Sporadic Servers and Priority Inheritance Bandwidth-preserving
Servers.
2.2 Related Work
This section will discuss work that is tangential but not required background information
for the contributions of this thesis.
2.2.1 Hypervisor Based Fault Tolerance (HBFT)
Hypervisor-based fault tolerance (HBFT) is an example of a primary-backup fault
tolerance system that uses a hypervisor to encapsulate a guest virtual machine so the state
of the entire system is easy to manipulate and save. This allows the primary and backup,
which run on different machines, to be easily synchronized without relying on any
information from or modification of the operating system or application. Such an approach has
the advantage that it is applicable to commodity and/or closed-source operating systems
and applications.
HBFT was first introduced by Bressoud and Schneider [BS95]. The authors proposed
three reasons why hypervisor-based fault tolerance was appealing: 1) hardware replica-
tion incurs a design cost for each new realization of the architecture, 2) adding replica
coordination to an existing operating system will be difficult as mature operating systems
are invariably complicated and 3) when replica coordination is left to the application de-
veloper the same problems are repeatedly solved by application developers in different
applications. The authors argue that using a hypervisor addresses all three concerns: no
changes to hardware need to be made, assuming the architecture supports a virtual machine
environment; all operating systems for the target architecture will work on a fully
functioning hypervisor; and, similarly, unmodified applications will work as well.
Bressoud and Schneider [BS95] developed a hypervisor for HP’s PA-RISC architecture that
took advantage of the architecture’s recovery register. The recovery register is
decremented each time an instruction is retired and causes an interrupt to be raised when it
becomes negative. This is used to divide the execution of the guest into epochs. At the
beginning of each epoch all external events, such as I/O interrupts, are delivered to the
guest virtual machine. This ensures the execution of both virtual machines is identical
and the backup is ready to take over whenever the primary fails.
Cully et al. [CLM+08] developed a similar approach with Xen [BDF+03] by performing
mini-VM migrations between the primary and backup instead of forcing the primary
and backup to run in lockstep. The primary system runs as normal and then at each epoch its
state is checkpointed and the changed state is sent asynchronously to the backup, allowing
the primary to continue execution. The backup is passive and does not execute the guest
virtual machine until the primary has failed. Output external to the system, e.g. network
traffic, is not made visible until the replicated state has been committed. This approach
experimentally proved to be more efficient than that of Bressoud and Schneider [BS95] and
also has the advantage that only a single backup machine is required for multiple primary
systems since the backup is inactive until a failure.
2.2.2 Process N-Modular Redundancy
HBFT captures the state of the entire virtual machine. Other approaches work at a smaller
scale, specifically protecting the state of a process. This section will describe various
techniques to perform process based N-Modular Redundancy (NMR).
Mushtaq, Al-Ars and Bertels created a user-level scheme for process protection
[MAAB13]. In their scheme a process is duplicated and each copy runs separately
from the others. The execution of the process is divided into epochs, where at each epoch
a hash of modified memory is compared to that of the other instances. If the hashes are the
same, one instance, the leader, creates a snapshot of itself and stores that snapshot for
potential recovery. The previous snapshot, if one exists, is discarded so only one snapshot ever
exists. If the hashes do not match, all the redundant copies are destroyed and new copies
are created using the previous snapshot. Care is taken to ensure that non-deterministic
functions such as gettimeofday will return the same result across all instances.
Furthermore, the order of execution for synchronization primitives is ensured to be identical
by adding a counter to locking mechanisms, e.g. mutexes. Whenever the leader process
acquires a lock it increments the counter for the mutex. The follower threads are only
permitted to acquire a mutex if the counter is set to the correct value, ensuring the acquisition
of locks is the same across processes. As their approach is user-level only, modified memory
is tracked by having the process itself mark memory as read-only at the beginning of each
epoch. A signal handler is registered to keep track of when read-only memory
is written to; it also changes the protection to be writable to allow execution to continue.
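The epoch comparison and rollback steps of such a scheme can be sketched in Python as follows. This is a simplified illustration, not the implementation of [MAAB13]: replicas are modeled as plain dictionaries of modified pages, SHA-256 is an arbitrary choice of hash, and the function names are hypothetical.

```python
import copy
import hashlib


def epoch_digest(dirty_pages):
    """Hash the pages modified during an epoch (page number -> bytes)."""
    h = hashlib.sha256()
    for page_no in sorted(dirty_pages):
        h.update(page_no.to_bytes(8, "little"))
        h.update(dirty_pages[page_no])
    return h.digest()


def end_of_epoch(replicas, snapshot):
    """Compare replicas at an epoch boundary; return (replicas, snapshot).

    Assumes a previous snapshot exists whenever a mismatch is detected.
    """
    digests = {epoch_digest(r["dirty"]) for r in replicas}
    if len(digests) == 1:
        # All copies agree: the leader (replica 0) becomes the only snapshot.
        return replicas, copy.deepcopy(replicas[0])
    # Mismatch: discard every copy and recreate them from the last snapshot.
    return [copy.deepcopy(snapshot) for _ in replicas], snapshot


replicas = [{"dirty": {0: b"abc"}}, {"dirty": {0: b"abc"}}]
replicas, snap = end_of_epoch(replicas, None)   # hashes match, snapshot taken

replicas[1]["dirty"][0] = b"abd"                # simulate a bit flip in one copy
replicas, snap = end_of_epoch(replicas, snap)   # mismatch, roll back
print(replicas[1]["dirty"][0])                  # b'abc'
```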
Munk et al. [MAL+15] proposed a similar scheme for multi-core systems. Virtualized
Fault Tolerance (VFT) presented in this thesis will use a similar process based NMR ap-
proach but will take advantage of the Quest-V separation kernel design for isolation and
use recovery techniques that do not require a checkpoint to be created and rolled back to
when an error occurs [MAL+15].
Chapter 3
I/O Adaptive Mixed-Criticality
Traditional Adaptive Mixed-Criticality analysis assumes tasks follow a periodic model.
However, in practice tasks have different characteristics that lend themselves to being scheduled
using different task models. Specifically, as previously discussed in Chapter 2, it has been
shown that a system of Sporadic Servers for main tasks and Priority Inheritance Band-
width Preserving Servers (PIBS) for interrupt bottom halves outperforms a system of just
Sporadic Servers [DLW11]. This chapter will introduce I/O Adaptive Mixed-Criticality
(IO-AMC), an Adaptive Mixed-Criticality (AMC) variant using both Sporadic Servers
and PIBS. As with the previous work on Sporadic Server and PIBS this work is focused
on the scheduling of tasks within a kernel which may or may not be in a Quest-V sandbox.
Additionally, this work assumes tasks are independent of each other and therefore share
no resources.
3.1 Response Time Analysis for SS and PIBS
In order to perform an Adaptive Mixed-Criticality analysis for a combined Sporadic Server
and PIBS system, the response time analysis equation of the system must be derived.
First, under the assumption that a Sporadic Server can be treated as an equivalent periodic
task [Spr90], the response time equation for task τi in a system of only Sporadic Servers
is the following:
\[
R_i = C_i + \sum_{\tau_j \in hp(i)} \left\lceil \frac{R_i}{T_j} \right\rceil C_j
\]
where hp (i) is the set of tasks of equal or higher priority than task τi. Second, due to the
worst-case phasing of a combined system of PIBS and Sporadic Servers, a PIBS utilization
bound of (2− U)U cannot repeatedly occur. The worst case phasing results in at most
an additional capacity (i.e., execution time) of (Tq−TqUk)Uk for PIBS τk assigned to the
Sporadic Server τq. This is only possible if PIBS blocks waiting on I/O before consuming
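its full budget capacity. Therefore, a tighter upper bound on the maximum interference a
PIBS causes is:
\[
\begin{aligned}
I_k^q(t) &= (T_q - T_q U_k) U_k + \left\lceil \frac{t}{T_q} \right\rceil T_q U_k \\
&= (1 - U_k) T_q U_k + \left\lceil \frac{t}{T_q} \right\rceil T_q U_k \\
&= \left( 1 + \left\lceil \frac{t}{T_q} \right\rceil - U_k \right) T_q U_k
\end{aligned}
\]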
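The interference bound is simple to evaluate directly. The Python sketch below computes both the first and the last line of the derivation so that the algebra can be checked numerically; the function names and sample values are assumptions chosen for illustration.

```python
import math


def pibs_interference(t, t_q, u_k):
    """Closed form: (1 + ceil(t / T_q) - U_k) * T_q * U_k."""
    return (1 + math.ceil(t / t_q) - u_k) * t_q * u_k


def pibs_interference_expanded(t, t_q, u_k):
    """First line of the derivation: worst-case phasing term plus one
    full budget T_q * U_k per started period of the Sporadic Server."""
    return (t_q - t_q * u_k) * u_k + math.ceil(t / t_q) * t_q * u_k


# Hypothetical PIBS with U_k = 0.25 running for a Sporadic Server with T_q = 16.
for t in (10, 16, 40):
    a = pibs_interference(t, 16, 0.25)
    b = pibs_interference_expanded(t, 16, 0.25)
    print(t, a, math.isclose(a, b))
```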
This is incorporated into the response time analysis of Sporadic Server τi, in a system
consisting of both Sporadic Servers and PIBS, in the following way:
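\[
R_i = C_i + \sum_{\tau_j \in hp(i)} \left\{ \left\lceil \frac{R_i}{T_j} \right\rceil C_j \right\} + \sum_{\tau_k \in ps} \max_{\tau_q \in hip(i)} \left\{ I_k^q(R_i) \right\} \tag{3.1}
\]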
where ps is the set of all PIBS and hip (i)=hp (i) ∪ {τi}, i.e. the set containing τi and
all tasks with equal or higher priority than task τi. This is necessary as the PIBS can be
running on behalf of task τi. In general, there is no a priori knowledge about which PIBS
runs for which Sporadic Server. Therefore, the Sporadic Server τq that maximizes the
interference caused by the PIBS must be considered. If such a priori knowledge existed,
it could be used to reduce the possible set of Sporadic Servers on behalf of which a PIBS
could be executing. However, without such knowledge all possible Sporadic Server tasks
of equal or higher priority must be considered.
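Equation 3.1 can be solved with the usual fixed-point iteration used for response time analysis. The following Python sketch is illustrative only: the function names, the `(C, T)` tuple layout and the assumption that servers are listed from highest to lowest priority with distinct priorities are not part of the original analysis.

```python
import math


def pibs_interference(t, T_q, U_k):
    """I_k^q(t) = (1 + ceil(t / T_q) - U_k) * T_q * U_k."""
    return (1 + math.ceil(t / T_q) - U_k) * T_q * U_k


def ss_response_time(i, servers, pibs_utils):
    """Iterate Equation 3.1 for Sporadic Server i.

    servers: list of (C, T) tuples, highest priority first (assumed layout).
    pibs_utils: list of PIBS utilizations U_k.
    Returns the response time, or None once the iteration exceeds T_i.
    """
    C_i, T_i = servers[i]
    hp = servers[:i]        # strictly higher-priority servers
    hip = servers[:i + 1]   # hp(i) plus server i itself
    R = C_i
    while True:
        ss_part = sum(math.ceil(R / T_j) * C_j for C_j, T_j in hp)
        # Each PIBS is charged to the server in hip(i) that maximizes its interference.
        pibs_part = sum(
            max(pibs_interference(R, T_q, U_k) for _, T_q in hip)
            for U_k in pibs_utils
        )
        R_next = C_i + ss_part + pibs_part
        if R_next > T_i:
            return None
        if math.isclose(R_next, R, abs_tol=1e-9):
            return R_next
        R = R_next


# Hypothetical task set: two Sporadic Servers and one PIBS with U = 0.1.
example_servers = [(1, 4), (2, 10)]
example_pibs = [0.1]
```

With these hypothetical values the lower-priority server is charged the PIBS interference of the server with the longer period, because that server maximizes the interference term over the response time window.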
The response time analysis for a PIBS is therefore dependent on the associated Spo-
radic Server. The response time analysis for PIBS τp when assigned to Sporadic Server τs
is:
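$${}^{s}R_p = (2 - U_p)\,U_p T_s + \sum_{\tau_j \in hip(s)} \left\{ \left\lceil \frac{{}^{s}R_p}{T_j} \right\rceil C_j \right\} + \sum_{\tau_k \in ps \setminus \{\tau_p\}} \max_{\tau_q \in hip(s)} \left\{ I_k^q({}^{s}R_p) \right\} \tag{3.2}$$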
Note that $(2 - U_p)U_pT_s$ is the maximum execution time of the PIBS over a time window of
$T_s$, i.e. $I_p^s(T_s) = (2 - U_p)U_pT_s$. Besides the first terms differing, Equation 3.2 differs from
Equation 3.1 in that hip (s) is used instead of hp (s) for the set of Sporadic Servers. This
is because Sporadic Server τs must be included as it has an equal priority to PIBS τp when
τp is running on behalf of τs. Also, the summation over all PIBS does not include PIBS
τp when determining its response time. If ${}^{s}R_p \le T_s$ for each and every Sporadic Server τs
that τp can be assigned to, then τp will never miss a deadline.
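The PIBS test of Equation 3.2 can be sketched in the same way. As before, this is a minimal illustration: the `(C, T)` layout, the priority ordering of the list and all function names are assumptions, and the example values are hypothetical.

```python
import math


def pibs_interference(t, T_q, U_k):
    """I_k^q(t) = (1 + ceil(t / T_q) - U_k) * T_q * U_k."""
    return (1 + math.ceil(t / T_q) - U_k) * T_q * U_k


def pibs_response_time(p, s, servers, pibs_utils):
    """Iterate Equation 3.2 for PIBS p when it runs on behalf of server s.

    servers: list of (C, T) tuples, highest priority first (assumed layout).
    Returns the response time, or None once the iteration exceeds T_s.
    """
    _, T_s = servers[s]
    hip = servers[:s + 1]                      # includes server s itself
    U_p = pibs_utils[p]
    own = (2 - U_p) * U_p * T_s                # first term of Equation 3.2
    others = [u for k, u in enumerate(pibs_utils) if k != p]
    R = own
    while True:
        ss_part = sum(math.ceil(R / T_j) * C_j for C_j, T_j in hip)
        pibs_part = sum(
            max(pibs_interference(R, T_q, U_k) for _, T_q in hip)
            for U_k in others
        )
        R_next = own + ss_part + pibs_part
        if R_next > T_s:
            return None
        if math.isclose(R_next, R, abs_tol=1e-9):
            return R_next
        R = R_next


def pibs_never_misses(p, servers, pibs_utils):
    """True if the bound holds for every server the PIBS could be assigned to."""
    return all(
        pibs_response_time(p, s, servers, pibs_utils) is not None
        for s in range(len(servers))
    )
```

Because nothing is known about which server a PIBS will serve, `pibs_never_misses` simply repeats the test for every candidate server, mirroring the "each and every Sporadic Server" condition.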
3.2 System Model and Analysis
This section describes the system model for I/O Adaptive Mixed-Criticality (IO-AMC),
comprising both Sporadic Servers and Priority Inheritance Bandwidth-Preserving Servers
(PIBS). IO-AMC focuses on the scheduling of I/O events and application threads in a
mixed-criticality setting. Based on the IO-AMC model, we will derive a response time
bound, IO-AMC-rtb, for Sporadic Servers and PIBS.
3.2.1 I/O Adaptive Mixed-Criticality Model
Sporadic Servers follow a similar model to the original AMC model. A Sporadic Server
task τi is assigned a criticality level Li∈{LO, HI}, a period Ti and a vector of capacities
~Ci. The deadline is assumed to be equal to the period. If Li=LO, τi only runs while the
system is in the LO-criticality mode and therefore only C (LO) is defined. For HI-criticality
tasks both C (LO) and C (HI) are defined and C (HI) ≥ C (LO).
For PIBS, an I/O task τk is again assigned to either the LO or HI criticality level:
Lk∈{LO, HI}. As previously discussed, PIBS are only defined by a utilization Uk. The period,
deadline and priority for a PIBS are inherited from the Sporadic Server for which it is
performing a task. For IO-AMC, this definition is extended and each PIBS is defined by a
vector of utilizations ~Uk. If τk is a LO-criticality PIBS, i.e. Lk=LO, then Uk (LO)>Uk (HI)
and if Lk=HI then Uk (LO)≤Uk (HI). This definition allows LO-criticality PIBS to con-
tinue execution after the switch to HI-criticality. This model allows users to assign criti-
cality levels to I/O devices indirectly by assigning criticality levels to the PIBS that execute
in response to the I/O device.
With the typical AMC model now augmented to consider PIBS, it is possible to derive
a new admissions test for IO-AMC. First, the PIBS interference equation introduced in
Section 2.1.2 is modified to incorporate criticality levels:
$$I_k^q(t, L) = \left( 1 + \left\lceil \frac{t}{T_q} \right\rceil - U_k(L) \right) T_q U_k(L)$$
As before, there are three conditions that must be considered: (1) the LO-criticality
steady state, (2) the HI-criticality steady state, and (3) the change from LO-criticality to HI-
criticality. The steady states are again simple and are merely extensions of the non-mixed-
criticality response time bounds. For Sporadic Server tasks the steady state equations are:
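$$R_i^{LO} = C_i(LO) + \sum_{\tau_j \in hp(i)} \left\{ \left\lceil \frac{R_i^{LO}}{T_j} \right\rceil C_j(LO) \right\} + \sum_{\tau_k \in ps} \max_{\tau_q \in hip(i)} \left\{ I_k^q\left(R_i^{LO}, LO\right) \right\} \tag{3.3}$$

$$R_i^{HI} = C_i(HI) + \sum_{\tau_j \in hpH(i)} \left\{ \left\lceil \frac{R_i^{HI}}{T_j} \right\rceil C_j(HI) \right\} + \sum_{\tau_k \in ps} \max_{\tau_q \in hipH(i)} \left\{ I_k^q\left(R_i^{HI}, HI\right) \right\} \tag{3.4}$$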
where hipH (i)=hpH (i)∪{τi}, i.e. it is the set of all HI-criticality tasks of higher or
equal priority than task τi, plus task τi itself. For PIBS task τp, running on behalf of
Sporadic Server task τs, the steady state equations are:
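$${}^{s}R_p^{LO} = (2 - U_p(LO))\,U_p(LO)\,T_s + \sum_{\tau_j \in hip(s)} \left\{ \left\lceil \frac{{}^{s}R_p^{LO}}{T_j} \right\rceil C_j(LO) \right\} + \sum_{\tau_k \in ps \setminus \{\tau_p\}} \max_{\tau_q \in hip(s)} \left\{ I_k^q\left({}^{s}R_p^{LO}, LO\right) \right\} \tag{3.5}$$

$${}^{s}R_p^{HI} = (2 - U_p(HI))\,U_p(HI)\,T_s + \sum_{\tau_j \in hipH(s)} \left\{ \left\lceil \frac{{}^{s}R_p^{HI}}{T_j} \right\rceil C_j(HI) \right\} + \sum_{\tau_k \in ps \setminus \{\tau_p\}} \max_{\tau_q \in hipH(s)} \left\{ I_k^q\left({}^{s}R_p^{HI}, HI\right) \right\} \tag{3.6}$$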
As with the traditional response time analysis of PIBS, its deadline is the same as that of
its corresponding Sporadic Server τs. Therefore, the above analysis must be applied to all
Sporadic Servers associated with a PIBS.
3.2.2 IO-AMC-rtb
The techniques described in Section 2.1.4 are used for the IO-AMC-rtb analysis. Specifi-
cally, LO-criticality PIBS are allowed to continue execution in the HI-criticality mode. For
a Sporadic Server task the IO-AMC-rtb equation is:
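$$
\begin{aligned}
R_i^{*} = C_i &+ \sum_{\tau_j \in hpH(i)} \left\lceil \frac{R_i^{*}}{T_j} \right\rceil C_j(HI) + \sum_{\tau_j \in hpL(i)} \left\lceil \frac{R_i^{LO*}}{T_j} \right\rceil C_j(LO) \\
&+ \sum_{\tau_k \in psH} \left\{ \max_{\tau_q \in hip(i)} I_k^q\left(R_i^{*}, HI\right) \right\} \\
&+ \sum_{\tau_k \in psL} \left\{ \max_{\tau_q \in hip(i)} I_k^q\left(R_i^{LO*}, LO\right) + \max_{\tau_{q'} \in hipH(i)} I_k^{q'}\left(R_i^{*} - R_i^{LO*}, HI\right) \right\}
\end{aligned}
\tag{3.7}
$$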
where psH and psL are the sets of HI- and LO-criticality PIBS respectively. The last
summation in Equation 3.7 represents the maximum interference a LO-criticality PIBS causes.
Specifically, $I_k^q(R_i^{LO*}, LO)$ represents the maximum interference the PIBS causes before
the mode change and $I_k^{q'}(R_i^{*} - R_i^{LO*}, HI)$ represents the total interference the PIBS causes
after the mode change. Again, the Sporadic Server that maximizes the interference is
chosen for each PIBS.
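To make the structure of Equation 3.7 concrete, the sketch below evaluates it for a HI-criticality Sporadic Server. It rests on several assumptions that are not stated in this section: the leading term is taken to be C_i(HI), R_i^{LO*} is taken to be the LO-criticality steady-state response time of Equation 3.3, servers are listed from highest to lowest priority, and all class and function names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Server:          # Sporadic Server (assumed representation)
    crit: str          # "LO" or "HI"
    T: float
    C: dict            # {"LO": C(LO), "HI": C(HI)}


@dataclass
class Pibs:
    crit: str          # "LO" or "HI"
    U: dict            # {"LO": U(LO), "HI": U(HI)}


def interference(t, T_q, U):
    """I_k^q(t, L) with U = U_k(L)."""
    return (1 + math.ceil(t / T_q) - U) * T_q * U


def fixed_point(f, start, limit):
    """Iterate R <- f(R) from `start`; None once R exceeds `limit`."""
    R = start
    while R <= limit:
        nxt = f(R)
        if math.isclose(nxt, R, abs_tol=1e-9):
            return nxt
        R = nxt
    return None


def r_lo(i, servers, pibs):
    """LO-criticality steady state, Equation 3.3."""
    me, hp, hip = servers[i], servers[:i], servers[:i + 1]

    def f(R):
        return (me.C["LO"]
                + sum(math.ceil(R / j.T) * j.C["LO"] for j in hp)
                + sum(max(interference(R, q.T, k.U["LO"]) for q in hip) for k in pibs))

    return fixed_point(f, me.C["LO"], me.T)


def r_star(i, servers, pibs):
    """Mode-change bound of Equation 3.7 for a HI-criticality server i."""
    me, hp, hip = servers[i], servers[:i], servers[:i + 1]
    hip_hi = [q for q in hip if q.crit == "HI"]
    lo = r_lo(i, servers, pibs)
    if lo is None:
        return None

    def f(R):
        total = me.C["HI"]  # assumption: C_i in Equation 3.7 is C_i(HI)
        total += sum(math.ceil(R / j.T) * j.C["HI"] for j in hp if j.crit == "HI")
        total += sum(math.ceil(lo / j.T) * j.C["LO"] for j in hp if j.crit == "LO")
        for k in pibs:
            if k.crit == "HI":
                total += max(interference(R, q.T, k.U["HI"]) for q in hip)
            else:
                # Before the mode change, then after it (HI servers only).
                total += max(interference(lo, q.T, k.U["LO"]) for q in hip)
                total += max(interference(R - lo, q.T, k.U["HI"]) for q in hip_hi)
        return total

    return fixed_point(f, lo, me.T)
```

The iteration for R* is started at the LO-criticality response time so that the window R* − R^{LO*} passed to the post-mode-change interference term is never negative.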
The IO-AMC-rtb equation for a PIBS τp when assigned to Sporadic Server τs is:
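$$
\begin{aligned}
{}^{s}R_p^{*} = (2 - U_p(HI))\,T_s U_p(HI) &+ \sum_{\tau_j \in hipH(s)} \left\lceil \frac{{}^{s}R_p^{*}}{T_j} \right\rceil C_j(HI) + \sum_{\tau_j \in hipL(s)} \left\lceil \frac{{}^{s}R_p^{LO*}}{T_j} \right\rceil C_j(LO) \\
&+ \sum_{\tau_k \in (psH \setminus \{\tau_p\})} \left\{ \max_{\tau_q \in hip(s)} I_k^q\left({}^{s}R_p^{*}, HI\right) \right\} \\
&+ \sum_{\tau_k \in (psL \setminus \{\tau_p\})} \left\{ \max_{\tau_q \in hip(s)} I_k^q\left({}^{s}R_p^{LO*}, LO\right) + \max_{\tau_{q'} \in hipH(s)} I_k^{q'}\left({}^{s}R_p^{*} - {}^{s}R_p^{LO*}, HI\right) \right\}
\end{aligned}
\tag{3.8}
$$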
Equation 3.8 differs from Equation 3.7 in the first term, and by the exclusion of τp from
the set of PIBS. Similar to Equation 3.2, the response time analysis requires iterating over
all HI-criticality Sporadic Servers that could be associated with the PIBS. This is because
only the HI-criticality Sporadic Servers are of interest after the mode change.
As mentioned in Section 2.1.4, recent related work by Burns and Baruah [BB13] has
extended the original AMC model to allow LO-criticality tasks to continue running after
the mode change to the HI-criticality mode. The derivation this work provided was used to allow
LO-criticality PIBS to continue running after the mode change. This work is also applicable
to Sporadic Servers and allows LO-criticality Sporadic Servers to continue running in the
HI-criticality mode. This derivation is similar to the one provided by Burns and Baruah
but there are subtle differences due to the inclusion of PIBS.
First for Sporadic Servers during the mode change the new IO-AMC-rtb equation is:
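$$
\begin{aligned}
R_i^{*} = C_i &+ \sum_{\tau_j \in hpH(i)} \left\lceil \frac{R_i^{*}}{T_j} \right\rceil C_j(HI) + \sum_{\tau_j \in hpL(i)} \left\lceil \frac{R_i^{LO*}}{T_j} \right\rceil C_j(LO) \\
&+ \sum_{\tau_j \in hpL(i)} \left( \left\lceil \frac{R_i^{*}}{T_j} \right\rceil - \left\lceil \frac{R_i^{LO*}}{T_j} \right\rceil \right) C_j(HI) + \sum_{\tau_k \in psH} \left\{ \max_{\tau_q \in hip(i)} I_k^q\left(R_i^{*}, HI\right) \right\} \\
&+ \sum_{\tau_k \in psL} \left\{ \max_{\tau_q \in hip(i)} I_k^q\left(R_i^{LO*}, LO\right) + \max_{\tau_{q'} \in hip(i)} I_k^{q'}\left(R_i^{*} - R_i^{LO*}, HI\right) \right\}
\end{aligned}
\tag{3.9}
$$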
In addition to the third term which is taken from Equation 2.7, Equation 3.9 differs
from Equation 3.7 (where LO-criticality Sporadic Servers do not run in the HI-criticality
mode) in that all Sporadic Servers of higher or equal priority must be considered when ac-
counting for the interference from LO-criticality PIBS after the mode change. Specifically,
the hipH in the final term has changed to hip to reflect the fact that Sporadic Servers of
all criticality levels run in the HI-criticality mode.
Equation 3.10 is the IO-AMC-rtb equation for PIBS when LO-criticality Sporadic
Servers are allowed to run in the HI-criticality mode. Again the only differences are
the inclusion of the interference caused by LO-criticality tasks after the mode change and
changing the hipH to hip in the final term to account for the fact that all Sporadic Servers
are capable of executing after the mode change.
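$$
\begin{aligned}
{}^{s}R_p^{*} = (2 - U_p(HI))\,T_s U_p(HI) &+ \sum_{\tau_j \in hipH(s)} \left\lceil \frac{{}^{s}R_p^{*}}{T_j} \right\rceil C_j(HI) + \sum_{\tau_j \in hipL(s)} \left\lceil \frac{{}^{s}R_p^{LO*}}{T_j} \right\rceil C_j(LO) \\
&+ \sum_{\tau_j \in hipL(s)} \left( \left\lceil \frac{{}^{s}R_p^{*}}{T_j} \right\rceil - \left\lceil \frac{{}^{s}R_p^{LO*}}{T_j} \right\rceil \right) C_j(HI) \\
&+ \sum_{\tau_k \in (psH \setminus \{\tau_p\})} \left\{ \max_{\tau_q \in hip(s)} I_k^q\left({}^{s}R_p^{*}, HI\right) \right\} \\
&+ \sum_{\tau_k \in (psL \setminus \{\tau_p\})} \left\{ \max_{\tau_q \in hip(s)} I_k^q\left({}^{s}R_p^{LO*}, LO\right) + \max_{\tau_{q'} \in hip(s)} I_k^{q'}\left({}^{s}R_p^{*} - {}^{s}R_p^{LO*}, HI\right) \right\}
\end{aligned}
\tag{3.10}
$$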
3.3 Implementation
Changes to the existing Quest scheduler to support IO-AMC scheduling include:
• Mapping Quest processes to a task and job model;
• Detecting when to change to HI-criticality mode;
• Adjusting Sporadic Server and PIBS replenishments.
Quest tasks are assigned to Sporadic Servers and are similar to UNIX processes in
that they run until an exit system call is invoked. In comparison, real-time tasks in
the IO-AMC model release a job at a specific rate and the job runs until completion. To
accommodate the differences, a new sync system call was introduced. The sync system
call indicates the start and end of real-time jobs. A typical IO-AMC task has a setup phase,
followed by a loop which repeatedly calls sync within the loop.
A typical IO-AMC application will look similar to the procedure outlined in Algo-
rithm 1.
Algorithm 1 IO-AMC User Space Process
procedure MAIN
    // Setup Code
    while TRUE do
        sync ()
        // Job Code
    end while
end procedure

An alternative approach would be to have one process repeatedly fork another process,
with the newly forked process being the job for that period. This approach would not
require the application developer to invoke the sync system call as the exit system call
could be used to inform the scheduler that the job is completed. This approach was not
chosen however due to the extra overhead involved in the creation of a new process.
The first time sync is called, a mixed-criticality task will sleep until its Sporadic
Server is replenished to full capacity. The deadline will be set T time units after the
process wakes up. Subsequent calls to sync will have the process sleep up to the deadline,
emulating the job being completed and waiting for the next job.
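The timing behaviour of sync described above can be illustrated with a small simulation. This is not the Quest implementation: the class, its fields and the explicit `now` argument are hypothetical, and the handling of a job that calls sync after its deadline has already passed is an assumption.

```python
class SimulatedTask:
    """Models only the wake-up and deadline bookkeeping of the sync call."""

    def __init__(self, period, replenish_time):
        self.period = period                  # server period T
        self.replenish_time = replenish_time  # when the server is back at full capacity
        self.deadline = None                  # unset until the first sync

    def sync(self, now):
        """Return the time at which the task wakes up after calling sync at `now`."""
        if self.deadline is None:
            # First call: sleep until the Sporadic Server is fully replenished.
            wake = max(now, self.replenish_time)
        else:
            # Later calls: sleep up to the current deadline (end of the job).
            wake = max(now, self.deadline)
        # The next deadline is T time units after the task wakes up.
        self.deadline = wake + self.period
        return wake
```

For example, a task with a period of 100 whose server is fully replenished at time 30 wakes at 30 with a deadline of 130; if its job then calls sync at time 50, it sleeps until 130.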
Quest bottom half threads are assigned to Priority Inheritance Bandwidth Preserving
Servers and follow the real-time task and job model. Bottom half threads are only woken
up by the top halves, and at the end of their execution they notify the scheduler of their
completion. Therefore, no changes needed to be made to the PIBS and bottom half threads.
The conditions for a mode change depend on whether a task is running a job or bottom
half on a Sporadic Server or PIBS. A HI-criticality Sporadic Server initiates a mode change
when it has depleted all replenishment items that are before the deadline. If this happens
the Sporadic Server will not be able to run until after the deadline and therefore a mode
change must occur. A HI-criticality PIBS causes a mode change when it has depleted its
budget before completing the bottom half thread. In this case, the single replenishment for
a PIBS will be at the deadline and therefore the PIBS will not be able to run until after its
deadline unless a mode change occurred.
Finally, when the mode change occurs, the Sporadic Server and PIBS replenishment
items must be adjusted to take into account the new or removed budget. For HI-criticality
Sporadic Servers, the difference between C (HI) and C (LO) is added to the beginning of
the first replenishment list item, if the item’s replenishment time is equal to or less than the
current time, or if the replenishment list is full. Otherwise, a new replenishment list item is
inserted at the beginning with a replenishment time equal to the current time. A replenish-
ment item R has two properties, R.amt, which is the amount of budget replenished, and
R.time, which is when the replenishment occurs. This is demonstrated in Algorithm 2.
Algorithm 2 HI-Criticality Sporadic Server Adjustment
    ▷ S is the Sporadic Server being modified
    now ← current time
    additional_cap ← C (HI) − C (LO)
    if (S.Q.head.time ≤ now) OR (MAX_LENGTH = S.Q.length) then
        S.Q.head.amt ← S.Q.head.amt + additional_cap
    else
        ▷ R is a new replenishment item
        R.time ← now
        R.amt ← additional_cap
        S.Q.add (R)
    end if
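Algorithm 2 translates almost directly into Python. The sketch below is illustrative rather than the Quest code: the `Replenishment` class, the use of a `deque` for the replenishment list, the `MAX_LENGTH` value and the handling of an empty list are all assumptions.

```python
from collections import deque
from dataclasses import dataclass

MAX_LENGTH = 8  # hypothetical bound on the replenishment list length


@dataclass
class Replenishment:
    time: float   # when the budget becomes available
    amt: float    # how much budget is replenished


def adjust_hi_server(queue, c_lo, c_hi, now, max_length=MAX_LENGTH):
    """Grant a HI-criticality Sporadic Server its extra budget at a mode change."""
    additional_cap = c_hi - c_lo
    head_is_due = bool(queue) and queue[0].time <= now
    queue_is_full = bool(queue) and len(queue) == max_length
    if head_is_due or queue_is_full:
        # Fold the extra budget into the first replenishment item.
        queue[0].amt += additional_cap
    else:
        # Otherwise post a new item at the front, available immediately.
        queue.appendleft(Replenishment(time=now, amt=additional_cap))
```

With C (LO) = 23 and C (HI) = 40, a server whose head item is already due gains 17 units in that item, while a server whose head item lies in the future receives a new 17-unit item dated at the current time.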
For LO-criticality Sporadic Servers, the adjustment algorithm is more complicated. If
C (LO)−C (HI) is less than or equal to the remaining budget for this period, i.e. before the
deadline, then budget is removed from the replenishment list by moving backwards in time
from the replenishment item right before the deadline. The head replenishment item must
be treated differently if the Sporadic Server has used some of that budget at the time of a
mode change. S.usage tracks how much has been used from the head replenishment item.
If S.usage is greater than zero, a new replenishment item might be posted at the end of the
queue with an amount equal to S.usage. If C (LO)−C (HI) is greater than the remaining
budget for this period then the Sporadic Server has run for more than its C (HI) for this
period. In this case all the available budget in the replenishment list is removed for this
period and the difference is removed from the end of the replenishment list. Algorithm 3
contains the pseudocode for adjusting the budget of a LO-criticality Sporadic Server.
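The LO-criticality adjustment can be sketched in Python as follows. This is a minimal illustration of the procedure described above, not the Quest implementation: the `Replenishment` and `LoServer` classes, the time-ordered Python list used for the replenishment queue and the guard against an empty queue in the final loop are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Replenishment:
    time: float
    amt: float


@dataclass
class LoServer:
    period: float
    deadline: float
    usage: float = 0.0                          # budget already used from the head item
    queue: list = field(default_factory=list)   # replenishments, earliest first


def adjust_lo_server(s, c_lo, c_hi):
    """Remove C(LO) - C(HI) units of budget from a LO-criticality server."""
    reduced_cap = c_lo - c_hi
    # Index of the last item replenished before the deadline, or -1 if none.
    idx = max((n for n, r in enumerate(s.queue) if r.time < s.deadline), default=-1)
    while reduced_cap > 0 and idx >= 0:
        rd = s.queue[idx]
        if idx == 0 and s.usage > 0:
            # Partially consumed head item: only its unused part can be removed.
            if rd.amt - s.usage > reduced_cap:
                rd.amt -= reduced_cap
                reduced_cap = 0
            else:
                reduced_cap -= rd.amt - s.usage
                s.queue.pop(0)
                # The used budget is replenished one period later, at the end of the queue.
                s.queue.append(Replenishment(rd.time + s.period, s.usage))
                s.usage = 0
            idx = -1
        elif rd.amt <= reduced_cap:
            reduced_cap -= rd.amt
            s.queue.pop(idx)
            idx -= 1                             # move backwards in time
        else:
            rd.amt -= reduced_cap
            reduced_cap = 0
    # Anything still owed is taken from the end of the replenishment list.
    while reduced_cap > 0 and s.queue:
        last = s.queue[-1]
        if last.amt <= reduced_cap:
            reduced_cap -= last.amt
            s.queue.pop()
        else:
            last.amt -= reduced_cap
            reduced_cap = 0
```

In the second case exercised by the tests below, the server has already used 2 units of its head item when the mode change occurs, so the remaining reduction spills over into the replenishment posted for the next period.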
PIBS have a much simpler mode change algorithm because there is only one replenish-
ment item to consider for each invocation of the server. Also, due to the aperiodic nature
of I/O events, a deadline for a PIBS is calculated from when an I/O event is initiated. This
results in pessimistic analysis in Equations 3.7 and 3.8. We have to assume that regardless
of whether a PIBS is LO or HI-criticality, it causes the maximum interference possible in
both the LO and HI modes. Therefore, both LO and HI PIBS simply replenish their full
budget at the time of a mode change.
Algorithm 3 LO-Criticality Sporadic Server Adjustment
    ▷ S is the Sporadic Server being modified
    Rd ← replenishment item in S.Q right before the deadline of S
    ▷ Rd is NULL if no such replenishment item exists
    now ← current time
    reduced_cap ← C (LO) − C (HI)
    while (reduced_cap > 0) AND (Rd ≠ NULL) do
        if (Rd = S.Q.head) AND (S.usage > 0) then
            if Rd.amt − S.usage > reduced_cap then
                Rd.amt ← Rd.amt − reduced_cap
                reduced_cap ← 0
            else
                reduced_cap ← reduced_cap − (Rd.amt − S.usage)
                S.Q.remove (Rd)
                Rd.amt ← S.usage
                Rd.time ← Rd.time + S.period
                S.Q.add (Rd)
                S.usage ← 0
            end if
            Rd ← NULL
        else
            if Rd.amt ≤ reduced_cap then
                reduced_cap ← reduced_cap − Rd.amt
                Rtmp ← Rd.prev
                S.Q.remove (Rd)
                Rd ← Rtmp
            else
                Rd.amt ← Rd.amt − reduced_cap
                reduced_cap ← 0
            end if
        end if
    end while
    while reduced_cap > 0 do
        if S.Q.end.amt ≤ reduced_cap then
            reduced_cap ← reduced_cap − S.Q.end.amt
            S.Q.remove (S.Q.end)
        else
            S.Q.end.amt ← S.Q.end.amt − reduced_cap
            reduced_cap ← 0
        end if
    end while

3.4 Evaluation
The experimental evaluation consists of two sections: 1) simulation based schedulability
tests and 2) experiments conducted using the IO-AMC implementation in the Quest
operating system. The simulation based schedulability tests show that a system of Sporadic
Servers and PIBS has a similar but slightly lower schedulability than a system of just
Sporadic Servers. This is due to the extra interference caused by PIBS compared to
Sporadic Servers. The Quest experiments show the benefits of PIBS compared to Sporadic
Servers and how mixed-criticality is used to control the bandwidth from I/O devices with
different criticalities.
3.4.1 Simulation Experiments
In order to compare the proposed scheduling approaches, random task sets were generated
with varying total utilizations. 500 task sets were generated for each utilization value
ranging from 0.20 to 0.95 with 0.05 increments. Each task set was tested to see if it
was schedulable under the different policies. Each PIBS was randomly assigned to a
single Sporadic Server of the same criticality level. For systems comprising only Sporadic
Servers, the PIBS were converted to Sporadic Servers of equivalent utilization and period.¹
The parameters used to generate the task sets are outlined in Table 3.1.
Parameter                Value
Number of Tasks          20 (15 Main, 5 I/O)
Criticality Factor       2
Probability Li = HI      0.5
Period Range             1 – 100
I/O Total Utilization    0.05
Table 3.1: Parameters Used to Generate Task Sets
The UUniFast algorithm [BB05] was used to generate the task sets. Task periods
were generated with a log-uniform distribution. For the mixed-criticality experiments,
Ci(LO)=Ui×Ti. If Li=HI, Ci(HI)=CF×Ci(LO), where CF is the criticality factor. For our
experiments, if Li=LO, Ci(HI)=0.
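The generation procedure can be sketched as follows. This is an illustrative reconstruction, not the generator used for the experiments: the function names, the dictionary layout and the seeding are assumptions, and C(LO) is computed as utilization times period, following the standard definition U = C/T.

```python
import math
import random


def uunifast(n, total_util, rng):
    """UUniFast: draw n utilizations that sum to total_util."""
    utils = []
    remaining = total_util
    for i in range(1, n):
        next_remaining = remaining * rng.random() ** (1.0 / (n - i))
        utils.append(remaining - next_remaining)
        remaining = next_remaining
    utils.append(remaining)
    return utils


def log_uniform(lo, hi, rng):
    """Sample a value whose logarithm is uniformly distributed on [log lo, log hi]."""
    return math.exp(rng.uniform(math.log(lo), math.log(hi)))


def make_task_set(n, total_util, cf=2.0, p_hi=0.5, period_range=(1.0, 100.0), seed=0):
    """Generate one mixed-criticality task set (hypothetical helper)."""
    rng = random.Random(seed)
    tasks = []
    for u in uunifast(n, total_util, rng):
        period = log_uniform(period_range[0], period_range[1], rng)
        is_hi = rng.random() < p_hi
        c_lo = u * period                       # C(LO) = U * T
        c_hi = cf * c_lo if is_hi else 0.0      # LO tasks get C(HI) = 0
        tasks.append({"T": period, "L": "HI" if is_hi else "LO",
                      "C_LO": c_lo, "C_HI": c_hi})
    return tasks
```

The default arguments mirror the criticality factor, HI-criticality probability and period range listed in Table 3.1.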
The following are the different types of schedulability tests that were used in the eval-
uation. This includes schedulability tests for mixed-criticality and traditional systems.
• SS-rta – Sporadic Server response time analysis. Due to the nature of Sporadic
Servers, this is the same as a periodic response time analysis.
• SS+PIBS-rta – Sporadic Server and PIBS response time analysis introduced in this
work. See Section 2.1.2.
• AMC-rtb – Adaptive Mixed-Criticality response time bound developed by Baruah
et al. [BBD11]. See Section 2.1.4.
• IO-AMC-rtb – I/O Adaptive Mixed-Criticality response time bound developed in
this work. See Section 3.2.2.
• AMC UB – This is not a schedulability test but instead an upper bound for AMC. It
consists of both the LO- and HI-criticality level steady-state tests. See Section 2.1.4
for details.
• IO-AMC UB – This is not a schedulability test but instead an upper bound for
IO-AMC. It consists of both the LO- and HI-criticality level steady-state tests. See
Section 3.2.2 for details.

¹ The PIBS period was set equal to that of its corresponding Sporadic Server.
3.4.1.1 SS+PIBS vs. SS-Only Simulations
Figure 3.1 contains the results of the response time analysis and event simulator for a
system of Sporadic Servers and PIBS (SS+PIBS) compared to a system of only Sporadic
Servers (SS-Only). As expected, a higher number of the Sporadic Server only task sets are
schedulable using the response time analysis equations compared to the SS+PIBS response
time analysis. This is due to the extra interference a PIBS causes compared to a Sporadic
Server of equivalent utilization and period.
3.4.1.2 IO-AMC vs. AMC Simulations
In this section, IO-AMC is compared to an AMC system containing only Sporadic Servers
under different mixed-criticality scenarios.
Figure 3.2 shows the response time analysis and simulation results when LO-criticality
tasks do not run in the HI-criticality mode. Similar to Figure 3.1, AMC-rtb outperforms
IO-AMC-rtb. This is due to the fact that AMC-rtb is an extension of the traditional re-
sponse time analysis and does not experience the extra interference caused by PIBS.
[Figure 3.1: Schedulability of SS+PIBS vs SS-Only. Plot of schedulable task sets (0% to 100%) against utilization (0 to 1) for SS-rta and SS+PIBS-rta.]
[Figure 3.2: Schedulability of IO-AMC vs AMC. Plot of schedulable task sets (0% to 100%) against utilization (0 to 1) for AMC UB, IO-AMC UB, AMC-rtb and IO-AMC-rtb.]
We also varied task set parameters to identify their effects on schedulability. For each
set of parameters p in a given test y, we measured the weighted schedulability [BBA10],
which is defined as follows:
$$W_y(p) = \sum_{\forall \tau} \left( u(\tau) \times S_y(\tau, p) \right) \Big/ \sum_{\forall \tau} u(\tau)$$
where Sy (τ, p) is the binary result (0 or 1) of the schedulability test y on task set τ , and
u(τ) is the total utilization. The weighted schedulability compresses a three-dimensional
plot to two dimensions and places higher value on task sets with higher utilization.
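The weighted schedulability measure is straightforward to compute. In the sketch below, the function name and the representation of each task set as a `(utilization, task_set)` pair are assumptions made for illustration.

```python
def weighted_schedulability(task_sets, test):
    """Compute W_y(p) for one parameter setting.

    task_sets: iterable of (utilization, task_set) pairs.
    test: schedulability test y, returning True or False for a task set.
    """
    task_sets = list(task_sets)
    weighted = sum(u * (1 if test(ts) else 0) for u, ts in task_sets)
    total = sum(u for u, _ in task_sets)
    return weighted / total
```

For instance, if task sets with utilizations 0.2 and 0.5 pass a test and one with utilization 0.9 fails, the weighted schedulability is 0.7/1.6 ≈ 0.44, lower than the unweighted 2/3 because the failing task set carries the largest weight.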
Figures 3.3, 3.4, and 3.5 show the results of varying the probability of a HI-criticality
task, the criticality factor and the number of tasks respectively. In all scenarios, LO-
criticality tasks do not run in the HI-criticality mode. As expected, the percentage of
schedulable tasks for IO-AMC is slightly lower than the percentage for traditional AMC.
This is again due to the slightly larger interference caused by a PIBS.
[Figure 3.3: Weighted Schedulability vs % of HI-criticality Tasks. Plot of weighted schedulability (0% to 100%) against the percentage of tasks with high criticality (0 to 1) for AMC UB, IO-AMC UB, AMC-rtb and IO-AMC-rtb.]
[Figure 3.4: Weighted Schedulability vs Criticality Factor. Plot of weighted schedulability (0% to 100%) against the criticality factor (1 to 5) for AMC UB, IO-AMC UB, AMC-rtb and IO-AMC-rtb.]
[Figure 3.5: Weighted Schedulability vs Number of Tasks. Plot of weighted schedulability (0% to 100%) against the number of tasks (0 to 100) for AMC UB, IO-AMC UB, AMC-rtb and IO-AMC-rtb.]
3.4.2 Quest Experiments
The above simulation results do not capture the practical costs of a system of servers for
tasks and interrupt bottom halves. This section investigates the performance of our
IO-AMC policy in the Quest real-time system. We also study the effects of mode changes
on I/O throughput for an application that collects streaming camera data. All experiments
were run on a 3.10 GHz Intel® Core i3-2100 CPU.
3.4.2.1 Scheduling Overhead
We studied the scheduling overheads for two different system implementations in Quest.
In the first system, Sporadic Servers were used for both tasks and bottom halves (SS-Only).
In the second system, Sporadic Servers were used for tasks, and PIBS were used to handle
interrupt bottom halves (SS+PIBS). In both cases, a task set consisted of two application
threads of different criticality levels assigned to two different Sporadic Servers, and one
bottom half handler for interrupts from a USB camera. The first application thread read
all the data available from the camera in a non-blocking manner and then busy-waited
for a set time to simulate processing the data. The busy-wait time was independent of the
amount of data read as PIBS have been shown to result in higher data bandwidth compared
to Sporadic Servers when used for bottom half interrupt handling [DLW11]. In order to
measure scheduling overhead independent of the different data bandwidth of Sporadic
Server and PIBS the execution time of the tasks was kept constant between experiments.
The second application thread simply busy-waited for its entire budget to simulate a CPU-
bound task without any I/O requests. Both application threads consisted of a sequence of
jobs. Each job was released once every server period or immediately after the completion
of the previous job, depending on which was later. The experimental parameters are shown
in Table 3.2.
Task                             C (LO) or U (LO)   C (HI) or U (HI)   T
Application 1 (HI-criticality)   23 ms              40 ms              100 ms
Application 2 (LO-criticality)   10 ms              1 ms               100 ms
Bottom Half (PIBS)               U (LO) = 1%        U (HI) = 2%        100 ms
Bottom Half (SS)                 1 ms               2 ms               100 ms
Table 3.2: Quest Task Set Parameters for Scheduling Overhead
The processor’s timestamp counter was recorded when each application finished its
current job. Results are shown in Figure 3.6. For SS+PIBS, each application completed
its jobs at regular intervals. However, for SS-Only, the HI-criticality server for interrupts
from the USB camera caused interference with the application tasks. This led to the HI-
criticality task depleting its budget before finishing its job. This is due to the extra overhead
added by a Sporadic Server handling the interrupt bottom half thread. Therefore, the sys-
tem had to switch into the HI-criticality mode to ensure the HI-criticality task completed
its job, sacrificing the performance of the LO-criticality task. This is depicted by the larger
time between completed jobs in Figure 3.6. The SS+PIBS task set did not suffer from this
problem due to the lower scheduling overhead caused by PIBS.
[Figure 3.6: Job Completion Times for SS+PIBS vs SS-Only. Job completions per server type against time (0 to 5 seconds) for the HI-criticality task (SS+PIBS), LO-criticality task (SS+PIBS), HI-criticality task (SS-Only) and LO-criticality task (SS-Only).]
Figure 3.7 shows the additional overhead caused when Sporadic Servers are used for
bottom half threads as opposed to PIBS. This higher scheduling overhead is the cause
for the mode change in the previous experiment. Figure 3.7 depicts two different sys-
tem configurations, one involving only a single camera and another involving two cam-
eras. For each configuration, the scheduling overhead for both SS-Only and SS+PIBS was
measured. For the single camera configuration, there is one HI-criticality task, one LO-
criticality task, and one HI-criticality server (either PIBS or Sporadic Server) for the USB
camera interrupt bottom half thread. The scheduling overhead for SS-Only is more erratic
and higher than the system of Sporadic Servers and PIBS. The variability in the overhead
when Sporadic Servers are used for bottom half interrupt handling is due to the list of bud-
gets in Sporadic Servers that must be handled by the scheduler as opposed to the single
replenishment item for a PIBS. The second configuration adds a LO-criticality camera with
a 2% utilization in the LO-criticality mode, a 1% utilization in the HI-criticality mode, and
a period of 100 microseconds when utilizing a Sporadic Server. Figure 3.7 shows that the
scheduling overhead for an SS-Only system more than doubled, going from an average of
0.21% to 0.49%, while an SS+PIBS system experienced only a small increase of 0.03%.
[Figure 3.7: Scheduling Overheads for SS-Only vs SS+PIBS. Plot of CPU % (0 to 0.8) against time (0 to 180 seconds) for SS-Only (2 Camera), SS-Only (1 Camera), SS+PIBS (2 Camera) and SS+PIBS (1 Camera).]
3.4.2.2 Mode Change for I/O Device
As mentioned in Section 3.2.2, assigning criticality levels to bottom half interrupt handlers
is akin to assigning criticality levels to the device associated with the bottom half. To test
this assertion, two USB cameras were assigned different criticality levels and a mode
change was caused during the execution of the task set. The task set consisted of two
Sporadic Servers and two PIBS, as shown in Table 3.3.
Task                               C (LO) or U (LO)   C (HI) or U (HI)   T
Application 1 (HI-criticality)     25 ms              40 ms              100 ms
Application 2 (LO-criticality)     25 ms              24 ms              100 ms
Camera 1 – PIBS (HI-criticality)   U (LO) = 0.1%      U (HI) = 1%        100 ms
Camera 2 – PIBS (LO-criticality)   U (LO) = 1%        U (HI) = 0.1%      100 ms
Table 3.3: Quest Task Set Parameters for I/O Device Mode Change
Figure 3.8 shows the camera data available at each data point. At approximately 30
seconds, a mode change occurs that causes Camera 1 to change from a utilization of 0.1%
to 1%, thereby increasing the amount of data received. Also at the time of the mode
change, Camera 2’s utilization switches from 1% to 0.1%, causing a drop in received data.
The variance for Camera 1 after the mode change is due to extra processing of the delayed
data that is performed by the bottom half interrupt handler. Finally, Figure 3.9 shows the
total data processed from each camera over time.
[Figure 3.8: Data From HI- and LO-criticality USB Cameras. Plot of bytes read (0 to 40000) against time (0 to 60 seconds) for Camera 1 and Camera 2.]
[Figure 3.9: Total Data Processed Over Time. Plot of total bytes read (0 to 7×10^6) against time (0 to 60 seconds) for Camera 1 and Camera 2.]
Chapter 4
Recovery Adaptive Mixed-Criticality
This chapter will cover both the Recovery Adaptive Mixed Criticality (RAMC) model and
scheduling analysis. RAMC was designed to address the issue of performing a recovery in
Virtualized Fault Tolerance (Chapter 5). Specifically, during recovery extra computational
time is required for both the recovering process and the correct process being used for
recovery. While AMC or IO-AMC could be used for this, both would result in the unnec-
essary reduction in progress for LO-criticality tasks. In order for AMC or IO-AMC to be
used for recovery, all processes being protected via VFT would have to be marked as HI-
criticality. When the recovery procedure is initiated, a mode change would occur, granting
the HI-criticality processes additional execution time at the cost of the LO-criticality tasks.
However, typically only one process in the faulty sandbox and one process in a correct
sandbox will actually need the additional capacity. Therefore, a mixed-criticality scheme
with a finer grain of control of which HI-criticality tasks perform the mode change would
allow the LO-criticality tasks to make further progress in the HI-criticality mode. RAMC
was designed specifically to accomplish this goal.
4.1 Model
Similar to traditional AMC a task τi is defined by its period, deadline, a vector of
computation times and a criticality level, $(T_i, D_i, \vec{C}_i, L_i)$. We will focus on the case of two
criticality levels, LO and HI in RAMC. Therefore, HI-criticality tasks are assigned two
capacities, C (LO) and C (HI), with C (HI)≥C (LO). RAMC will use the extension pro-
vided by Burns and Baruah [BB13] to permit LO criticality tasks to continue executing
when a HI-criticality task uses its HI-criticality budget. Therefore, LO-criticality tasks will
also be defined by both a C (LO) and C (HI) with C (LO)>C (HI). All tasks are assigned
a period T and a deadline D. We will focus specifically on the case where D=T . As
previously mentioned, unlike traditional AMC, in RAMC the mode change is a per task
event as opposed to a system-wide event. Therefore, a maximum number of HI-criticality
tasks that run using their HI-criticality capacity at any given time, h, is also specified.
The HI-criticality tasks that are using their HI-criticality capacity will be referred to as
triggered tasks. If h is equal to the number of HI-criticality tasks then RAMC becomes
equivalent to AMC. Finally, if any of the HI-criticality tasks are in the HI-criticality state
all of the LO-criticality tasks will also be in the HI-criticality state, and therefore be using
their smaller capacity.
Recovery AMC is developed for systems where some or all tasks are being protected
from transient faults. When a task experiences a transient fault a recovery mechanism will
be invoked that requires additional computational time. This is best illustrated with an
example. Consider the set of three tasks:
1. τ1 = {T = 5, L = HI, C (LO) = 2, C (HI) = 3}
2. τ2 = {T = 10, L = HI, C (LO) = 3, C (HI) = 5}
3. τ3 = {T = 10, L = LO, C (LO) = 3, C (HI) = 0}
τ1 and τ2 are HI-criticality tasks that are being protected from transient faults. τ3
is a LO-criticality task and therefore is not protected. Normally τ1 and τ2 take 2 and
3 time units of execution respectively. However, when they experience a fault they re-
quire an additional 1 and 2 time units to recover. Under a normal scheduling scheme,
without mixed-criticality, both protected tasks would have to be scheduled using their HI-
criticality capacities and therefore the task set of just the protected tasks is not schedulable:
(3/5) + (5/10) = 1.1 > 1. If Adaptive Mixed-Criticality is used the task set is still not
schedulable for the same reason, the utilization of the HI-criticality tasks exceeds one.
However, under RAMC, if at most one task experiences a fault and therefore needs to
initiate the recovery routine, i.e. h = 1, then the task set is schedulable, specifically by
assigning τ1 the highest priority, τ2 the second highest and τ3 the lowest priority.
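The utilization arithmetic in this example can be checked with a few lines of Python. The sketch only reproduces the utilization sums discussed above, which are a necessary condition and not a full schedulability test; the dictionary layout and function names are assumptions.

```python
from fractions import Fraction
from itertools import combinations

# The three example tasks, highest priority first.
TASKS = [
    {"T": 5,  "L": "HI", "C_LO": 2, "C_HI": 3},
    {"T": 10, "L": "HI", "C_LO": 3, "C_HI": 5},
    {"T": 10, "L": "LO", "C_LO": 3, "C_HI": 0},
]


def utilization(tasks, triggered):
    """Utilization when the HI tasks whose indices are in `triggered` use C(HI).

    Untriggered HI tasks use C(LO); LO tasks use their reduced C(HI).
    """
    total = Fraction(0)
    for idx, task in enumerate(tasks):
        untriggered_hi = task["L"] == "HI" and idx not in triggered
        budget = task["C_LO"] if untriggered_hi else task["C_HI"]
        total += Fraction(budget, task["T"])
    return total


def worst_case_utilization(tasks, h):
    """Largest utilization over all choices of h triggered HI-criticality tasks."""
    hi = [idx for idx, task in enumerate(tasks) if task["L"] == "HI"]
    return max(utilization(tasks, set(chosen)) for chosen in combinations(hi, h))
```

With h = 2, which corresponds to a system-wide mode change, the demand is 3/5 + 5/10 = 11/10 and exceeds one. With h = 1 either choice of triggered task gives 9/10, leaving room for the response time analysis in the next section to succeed.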
4.2 RAMC Analysis
Similar to AMC, the response time bound for RAMC consists of three parts, the LO-
criticality steady state, the HI-criticality steady state and the mode change. The RAMC
LO-criticality steady state is identical to the AMC LO-criticality steady state and therefore
will not be discussed here.
4.2.1 RAMC HI-criticality steady state
The RAMC HI-criticality steady state differs from the AMC steady state in that only a
subset of HI-criticality tasks are actually using their HI-criticality budgets. Therefore, the
HI-criticality steady state response time bound for a given set of triggered tasks, λ, is
expressed as the following:
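\[
\begin{aligned}
R_i^{HI}(\lambda) ={}& C_i^{*}
 + \sum_{\tau_j \in hpHT(i)} \left\lceil \frac{R_i^{HI}(\lambda)}{T_j} \right\rceil C_j(HI)
 + \sum_{\tau_j \in hpHU(i)} \left\lceil \frac{R_i^{HI}(\lambda)}{T_j} \right\rceil C_j(LO) \\
 &+ \sum_{\tau_j \in hpL(i)} \left\lceil \frac{R_i^{HI}(\lambda)}{T_j} \right\rceil C_j(HI)
\end{aligned}
\tag{4.1}
\]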
$C_i^{*}$ is $C_i(HI)$ if τi is either a LO-criticality task or a triggered HI-criticality task. If τi
is an untriggered HI-criticality task then $C_i^{*} = C_i(LO)$. hpHT(i) is the set of triggered
HI-criticality tasks with a priority greater than or equal to τi and hpHU(i) is the set of
untriggered HI-criticality tasks with a priority greater than or equal to τi. Each term is
explained as follows:
1. $C_i^{*}$ – the capacity of τi depending on its criticality level and whether it is triggered
2. $\sum_{\tau_j \in hpHT(i)} \left\lceil R_i^{HI}(\lambda) / T_j \right\rceil C_j(HI)$ – the interference caused by triggered HI-criticality
tasks
3. $\sum_{\tau_j \in hpHU(i)} \left\lceil R_i^{HI}(\lambda) / T_j \right\rceil C_j(LO)$ – the interference caused by untriggered HI-criticality
tasks
4. $\sum_{\tau_j \in hpL(i)} \left\lceil R_i^{HI}(\lambda) / T_j \right\rceil C_j(HI)$ – the interference caused by LO-criticality tasks.
Let Λ be the set of all possible sets of triggered tasks. Therefore $R_i^{HI}$ is defined as:
\[
R_i^{HI} = \max_{\lambda \in \Lambda} \left( R_i^{HI}(\lambda) \right)
\tag{4.2}
\]
Initially, this appears to be computationally expensive as the size of Λ is $\binom{n}{h}$, where n
is the number of tasks. However, by rewriting Equation 4.1, it is shown that not all sets of
triggered tasks must be enumerated. The modified Equation 4.1 is the following:
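\[
\begin{aligned}
R_i^{HI}(\lambda) ={}& C_i(LO) + \gamma(i, \lambda) \left( C_i(HI) - C_i(LO) \right)
 + \sum_{\tau_j \in hpHT(i)} \left\lceil \frac{R_i^{HI}(\lambda)}{T_j} \right\rceil \left( C_j(HI) - C_j(LO) \right) \\
 &+ \sum_{\tau_j \in hpH(i)} \left\lceil \frac{R_i^{HI}(\lambda)}{T_j} \right\rceil C_j(LO)
 + \sum_{\tau_j \in hpL(i)} \left\lceil \frac{R_i^{HI}(\lambda)}{T_j} \right\rceil C_j(HI)
\end{aligned}
\tag{4.3}
\]
where hpH(i) = hpHT(i) ∪ hpHU(i) is the set of all HI-criticality tasks with a priority greater than
or equal to τi, and γ(i, λ) is defined as:
\[
\gamma(i, \lambda) =
\begin{cases}
1 & \text{if } \tau_i \in \lambda \\
0 & \text{otherwise}
\end{cases}
\tag{4.4}
\]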
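Equation 4.3 makes it clear that selecting the set of triggered tasks that maximizes $R_i^{HI}$
is the same as maximizing:
\[
\gamma(i, \lambda) \left( C_i(HI) - C_i(LO) \right)
 + \sum_{\tau_j \in hpHT(i)} \left\lceil \frac{R_i^{HI}(\lambda)}{T_j} \right\rceil \left( C_j(HI) - C_j(LO) \right)
\tag{4.5}
\]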
Since each task increases $R_i^{HI}$ by a specific amount, and that amount is independent of
whether the other tasks are triggered, finding the subset of tasks that maximizes $R_i^{HI}$
reduces to finding the h tasks that each give the largest value for Equation 4.5. Note that
since the value of Equation 4.5 depends on $R_i^{HI}(\lambda)$, the set of
triggered tasks must be recalculated for each iteration of the recurrence relation.
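The greedy selection described above can be sketched as a fixed-point iteration. This is a minimal illustration, not the analysis tool used for the experiments: the function name, the dictionary layout, the assumption that the task list is ordered by decreasing priority, and the assumption that each deadline equals the period are all introduced here for the example.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for non-negative a and positive b."""
    return -(-a // b)


def ramc_hi_steady_state(i, tasks, h):
    """Greedy evaluation of Equations 4.2, 4.3 and 4.5 for tasks[i].

    tasks is ordered by decreasing priority (assumption); each entry has
    keys T, level, C_LO and C_HI. Returns the response time bound, or the
    first value that exceeds the period if the iteration diverges past it.
    """
    ti, hp = tasks[i], tasks[:i]
    if ti["level"] == "LO":
        own_base, own_gain = ti["C_HI"], None      # LO task: C* = C(HI)
    else:
        own_base = ti["C_LO"]
        own_gain = ti["C_HI"] - ti["C_LO"]         # gamma term of Eq. 4.5
    r = own_base
    while True:
        base = own_base
        gains = [] if own_gain is None else [own_gain]
        for tj in hp:
            n = ceil_div(r, tj["T"])
            if tj["level"] == "HI":
                base += n * tj["C_LO"]             # hpH term of Eq. 4.3
                gains.append(n * (tj["C_HI"] - tj["C_LO"]))
            else:
                base += n * tj["C_HI"]             # hpL term of Eq. 4.3
        # Re-select the h most expensive triggered tasks for this iteration.
        nxt = base + sum(sorted(gains, reverse=True)[:h])
        if nxt == r or nxt > ti["T"]:
            return nxt
        r = nxt


# The three-task example from Section 4.1, highest priority first.
example = [
    {"T": 5, "level": "HI", "C_LO": 2, "C_HI": 3},
    {"T": 10, "level": "HI", "C_LO": 3, "C_HI": 5},
    {"T": 10, "level": "LO", "C_LO": 3, "C_HI": 0},
]
bounds_h1 = [ramc_hi_steady_state(i, example, 1) for i in range(3)]
bound_tau2_h2 = ramc_hi_steady_state(1, example, 2)
```

For h = 1 the sketch gives bounds of 3, 9 and 0 time units, all within the periods. With h = 2 the bound for τ2 reaches 11, which exceeds its period of 10 and matches the earlier observation that the set is not schedulable when both protected tasks use their HI-criticality capacities.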
4.2.2 RAMC Mode Change: RAMC-rtb
The mode change analysis for RAMC follows a similar analysis to the original AMC-rtb
along with using the same technique described for the RAMC HI-criticality steady state.
Specifically, the response time bound analysis for the mode change given a specific set of
triggered tasks λ is:
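\[
\begin{aligned}
R_i^{*}(\lambda) ={}& C_i^{*}
 + \sum_{\tau_j \in hpHT(i)} \left\lceil \frac{R_i^{*}(\lambda)}{T_j} \right\rceil C_j(HI)
 + \sum_{\tau_j \in hpHU(i)} \left\lceil \frac{R_i^{*}(\lambda)}{T_j} \right\rceil C_j(LO) \\
 &+ \sum_{\tau_j \in hpL(i)} \left\lceil \frac{R_i^{LO*}}{T_j} \right\rceil C_j(LO)
 + \sum_{\tau_j \in hpL(i)} \left( \left\lceil \frac{R_i^{*}(\lambda)}{T_j} \right\rceil - \left\lceil \frac{R_i^{LO*}}{T_j} \right\rceil \right) C_j(HI)
\end{aligned}
\tag{4.6}
\]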
Equation 4.6 borrows from Equation 2.7 to permit lower tasks to continue running
after the mode change. Furthermore, it uses the same pattern as Equation 4.1 to separate
triggered HI-criticality tasks and untriggered HI-criticality tasks. As with the HI-criticality
steady state, the actual response time analysis involves choosing the set of triggered tasks
that maximizes $R_i^{*}(\lambda)$:
\[
R_i^{*} = \max_{\lambda \in \Lambda} \left( R_i^{*}(\lambda) \right)
\tag{4.7}
\]
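As with the HI-criticality steady state it is not necessary to iterate over all possible
combinations of triggered tasks. Instead, by rewriting Equation 4.6 it is again shown that not all
sets of triggered tasks need to be enumerated to determine the set that maximizes $R_i^{*}(\lambda)$.
The modified version of Equation 4.6 is the following:
\[
\begin{aligned}
R_i^{*}(\lambda) ={}& C_i(LO) + \gamma(i, \lambda) \left( C_i(HI) - C_i(LO) \right) \\
 &+ \sum_{\tau_j \in hpHT(i)} \left\lceil \frac{R_i^{*}(\lambda)}{T_j} \right\rceil \left( C_j(HI) - C_j(LO) \right)
 + \sum_{\tau_j \in hpH(i)} \left\lceil \frac{R_i^{*}(\lambda)}{T_j} \right\rceil C_j(LO) \\
 &+ \sum_{\tau_j \in hpL(i)} \left\lceil \frac{R_i^{LO*}}{T_j} \right\rceil C_j(LO)
 + \sum_{\tau_j \in hpL(i)} \left( \left\lceil \frac{R_i^{*}(\lambda)}{T_j} \right\rceil - \left\lceil \frac{R_i^{LO*}}{T_j} \right\rceil \right) C_j(HI)
\end{aligned}
\tag{4.8}
\]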
Therefore maximizing $R_i^{*}$ is choosing the set of tasks of size h that maximizes:
\[
\gamma(i, \lambda) \left( C_i(HI) - C_i(LO) \right)
 + \sum_{\tau_j \in hpHT(i)} \left\lceil \frac{R_i^{*}(\lambda)}{T_j} \right\rceil \left( C_j(HI) - C_j(LO) \right)
\tag{4.9}
\]
which is almost identical to the HI-criticality steady state counterpart, Equation 4.5. As
with the HI-criticality steady state the set of triggered tasks must be calculated for each
iteration of the recurrence relationship as $R_i^{*}(\lambda)$ increases.
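The mode change bound can be sketched in the same style as the steady state. The sketch below makes several assumptions that are not stated in this section: $R_i^{LO*}$ is taken to be the LO-criticality steady state response time computed by standard response time analysis with C(LO) budgets, the task list is ordered by decreasing priority, deadlines equal periods, and the task under analysis is a HI-criticality task. The task set is hypothetical and chosen only to exercise the hpL terms of Equation 4.8.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for non-negative a and positive b."""
    return -(-a // b)


def lo_response_time(i, tasks):
    """LO steady state bound: every task uses C(LO) (assumed form of R^LO*)."""
    ti, hp = tasks[i], tasks[:i]
    r = ti["C_LO"]
    while True:
        nxt = ti["C_LO"] + sum(ceil_div(r, tj["T"]) * tj["C_LO"] for tj in hp)
        if nxt == r or nxt > ti["T"]:
            return nxt
        r = nxt


def ramc_rtb(i, tasks, h):
    """Greedy evaluation of Equations 4.7 to 4.9 for the HI task tasks[i]."""
    ti, hp = tasks[i], tasks[:i]
    r_lo = lo_response_time(i, tasks)
    r = r_lo                       # the mode change bound is never below R^LO*
    while True:
        base = ti["C_LO"]
        gains = [ti["C_HI"] - ti["C_LO"]]          # gamma term of Eq. 4.9
        for tj in hp:
            n = ceil_div(r, tj["T"])
            if tj["level"] == "HI":
                base += n * tj["C_LO"]
                gains.append(n * (tj["C_HI"] - tj["C_LO"]))
            else:
                # LO task: C(LO) for releases before the mode change,
                # C(HI) for the releases that follow it.
                n_lo = ceil_div(r_lo, tj["T"])
                base += n_lo * tj["C_LO"] + (n - n_lo) * tj["C_HI"]
        nxt = base + sum(sorted(gains, reverse=True)[:h])
        if nxt == r or nxt > ti["T"]:
            return nxt
        r = nxt


# Hypothetical task set, highest priority first.
hypothetical = [
    {"T": 5, "level": "LO", "C_LO": 2, "C_HI": 1},
    {"T": 10, "level": "HI", "C_LO": 2, "C_HI": 4},
    {"T": 20, "level": "HI", "C_LO": 3, "C_HI": 6},
]
r_lo_c = lo_response_time(2, hypothetical)
rtb_by_h = {h: ramc_rtb(2, hypothetical, h) for h in (0, 1, 2)}
```

For the lowest priority task the LO bound is 9, and the mode change bound grows from 9 (h = 0) to 17 (h = 1) and 20 (h = 2), which illustrates how limiting the number of triggered tasks tightens the bound.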
4.3 IO-RAMC Analysis
The Recovery Adaptive Mixed Criticality variant was combined with the IO-AMC variant
to produce IO-RAMC. The response time bound analysis for IO-RAMC must take into
account the interference of PIBS in addition to Sporadic Servers/Periodic Tasks. Similar
to the RAMC, the LO-criticality steady state is identical to the non-recovery version and
therefore only the HI-criticality steady state and the mode change will be analyzed.
4.3.1 IO-RAMC HI-criticality steady state
As with RAMC, the response time analysis for the HI-criticality steady state differs from
IO-AMC in that only a subset of HI-criticality tasks transition into the HI-criticality mode.
Similar to the IO-AMC derivation there are two separate response time equations, one for
Sporadic Servers and another for PIBS. To account for the two types of scheduling entities
there are two sets of triggered tasks, λs and λp, with maximum sizes of hs and
hp respectively. The HI-criticality Sporadic Server steady state response time bound for
given sets of triggered tasks, λs and λp, for IO-RAMC is:
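\[
\begin{aligned}
R_i^{HI}(\lambda_s, \lambda_p) ={}& C_i^{*}
 + \sum_{\tau_j \in hpHT(i)} \left\lceil \frac{R_i^{HI}(\lambda_s, \lambda_p)}{T_j} \right\rceil C_j(HI) \\
 &+ \sum_{\tau_j \in hpHU(i)} \left\lceil \frac{R_i^{HI}(\lambda_s, \lambda_p)}{T_j} \right\rceil C_j(LO) \\
 &+ \sum_{\tau_j \in hpL(i)} \left\lceil \frac{R_i^{HI}(\lambda_s, \lambda_p)}{T_j} \right\rceil C_j(HI) \\
 &+ \sum_{\tau_k \in psHT} \max_{\tau_q \in hip(i)} \left\{ I_k^q\left( R_i^{HI}(\lambda_s, \lambda_p), HI \right) \right\} \\
 &+ \sum_{\tau_k \in psHU} \max_{\tau_q \in hip(i)} \left\{ I_k^q\left( R_i^{HI}(\lambda_s, \lambda_p), LO \right) \right\} \\
 &+ \sum_{\tau_k \in psL} \max_{\tau_q \in hip(i)} \left\{ I_k^q\left( R_i^{HI}(\lambda_s, \lambda_p), HI \right) \right\}
\end{aligned}
\tag{4.10}
\]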
This equation is similar to Equation 4.1 except for the three additional terms
representing the interference of the triggered HI-criticality PIBS, the untriggered HI-criticality
PIBS and the LO-criticality PIBS respectively. As with RAMC, the worst case response
time analysis is the set of tasks (λs ∈ Λs, λp ∈ Λp) that maximizes $R_i^{HI}(\lambda_s, \lambda_p)$. Since
the interference from the Sporadic Servers is not affected by which PIBS are triggered
and vice versa, the maximum interference between the two groups is calculated separately.
Therefore the maximum response time bound for Sporadic Server τi is:
\[
R_i^{HI} = \max_{\lambda_s \in \Lambda_s} \left( \max_{\lambda_p \in \Lambda_p} \left( R_i^{HI}(\lambda_s, \lambda_p) \right) \right)
\tag{4.11}
\]
Maximizing for λs, the set of Sporadic Servers that causes the most interference, is
the same as Equation 4.5. Maximizing for λp involves maximizing the last three terms
of Equation 4.10. Similar to the Sporadic Server/Periodic Task case, the last three terms
representing the interference caused by PIBS are rewritten to simplify the problem of
finding the set of hp tasks that cause the largest interference. Specifically, the interference
caused by PIBS is written as:
\[
\begin{aligned}
 &\sum_{\tau_k \in psHT} \max_{\tau_q \in hip(i)} \left\{ I_k^q\left( R_i^{HI}(\lambda_s, \lambda_p), HI \right) - I_k^q\left( R_i^{HI}(\lambda_s, \lambda_p), LO \right) \right\} \\
 &+ \sum_{\tau_k \in psH} \max_{\tau_q \in hip(i)} \left\{ I_k^q\left( R_i^{HI}(\lambda_s, \lambda_p), LO \right) \right\} \\
 &+ \sum_{\tau_k \in psL} \max_{\tau_q \in hip(i)} \left\{ I_k^q\left( R_i^{HI}(\lambda_s, \lambda_p), HI \right) \right\}
\end{aligned}
\tag{4.12}
\]
which makes it clear that maximizing the interference caused by PIBS is selecting the hp
tasks that maximize:
\[
\sum_{\tau_k \in psHT} \max_{\tau_q \in hip(i)} \left\{ I_k^q\left( R_i^{HI}(\lambda_s, \lambda_p), HI \right) - I_k^q\left( R_i^{HI}(\lambda_s, \lambda_p), LO \right) \right\}
\tag{4.13}
\]
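Because the two groups are maximized separately, the nested maximum of Equation 4.11 never requires enumerating pairs of sets. The sketch below checks this on hypothetical numbers: `server_gains` stands for the per-task values of Equation 4.5 and `pibs_gains` for the per-PIBS values of Equation 4.13, both evaluated at one fixed iteration of the recurrence. The numbers and helper names are illustrative assumptions only.

```python
from itertools import combinations


def top_h(gains, h):
    """Sum of the h largest non-negative gains (greedy selection)."""
    return sum(sorted(gains, reverse=True)[:h])


# Hypothetical extra interference contributed by each candidate when triggered.
server_gains = [4, 1, 3, 2]   # Sporadic Servers, candidates for lambda_s
pibs_gains = [5, 2, 2]        # PIBS, candidates for lambda_p
h_s, h_p = 2, 1

# Greedy: pick the best h_s servers and the best h_p PIBS independently.
greedy = top_h(server_gains, h_s) + top_h(pibs_gains, h_p)

# Exhaustive: enumerate every pair of triggered sets, as Equation 4.11 reads.
exhaustive = max(
    sum(s) + sum(p)
    for s in combinations(server_gains, h_s)
    for p in combinations(pibs_gains, h_p)
)
```

Both approaches give the same total here, while the greedy form only needs a sort per group at each iteration.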
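For PIBS τp assigned to Sporadic Server τs in an IO-RAMC system the HI-criticality
steady state response time bound is:
\[
\begin{aligned}
{}^{s}R_p^{HI}(\lambda_s, \lambda_p) ={}& (2 - U_p^{*}) U_p^{*} T_s
 + \sum_{\tau_j \in hipHT(s)} \left\lceil \frac{{}^{s}R_p^{HI}(\lambda_s, \lambda_p)}{T_j} \right\rceil C_j(HI) \\
 &+ \sum_{\tau_j \in hipHU(s)} \left\lceil \frac{{}^{s}R_p^{HI}(\lambda_s, \lambda_p)}{T_j} \right\rceil C_j(LO) \\
 &+ \sum_{\tau_j \in hipL(s)} \left\lceil \frac{{}^{s}R_p^{HI}(\lambda_s, \lambda_p)}{T_j} \right\rceil C_j(HI) \\
 &+ \sum_{\tau_k \in psHT \setminus \{\tau_p\}} \max_{\tau_q \in hip(s)} \left\{ I_k^q\left( {}^{s}R_p^{HI}(\lambda_s, \lambda_p), HI \right) \right\} \\
 &+ \sum_{\tau_k \in psHU \setminus \{\tau_p\}} \max_{\tau_q \in hip(s)} \left\{ I_k^q\left( {}^{s}R_p^{HI}(\lambda_s, \lambda_p), LO \right) \right\} \\
 &+ \sum_{\tau_k \in psL \setminus \{\tau_p\}} \max_{\tau_q \in hip(s)} \left\{ I_k^q\left( {}^{s}R_p^{HI}(\lambda_s, \lambda_p), HI \right) \right\}
\end{aligned}
\tag{4.14}
\]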
$U_p^{*}$ is $U_p(HI)$ if τp is either a LO-criticality task or a triggered HI-criticality task. If τp is an
untriggered HI-criticality task then $U_p^{*} = U_p(LO)$. Excluding the first term, Equation 4.14
is nearly identical to Equation 4.10 except for the inclusion of τs in the interference
represented by the first three terms and the exclusion of τp in the interference represented by
the last three terms.
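As with Sporadic Servers in IO-RAMC, the final response time bound for a PIBS is
given by whichever sets of triggered tasks maximize ${}^{s}R_p^{HI}(\lambda_s, \lambda_p)$. Specifically:
\[
{}^{s}R_p^{HI} = \max_{\lambda_s \in \Lambda_s} \left( \max_{\lambda_p \in \Lambda_p} \left( {}^{s}R_p^{HI}(\lambda_s, \lambda_p) \right) \right)
\tag{4.15}
\]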
As with the Sporadic Server case, finding the set of triggered tasks that maximizes
${}^{s}R_p^{HI}(\lambda_s, \lambda_p)$ is done in two separate parts, as the set of triggered Sporadic Servers does not
affect the set of triggered PIBS and vice versa. Specifically, to maximize with respect to
the set of triggered Sporadic Servers is to maximize the following:
\[
\gamma(p, \lambda_s) \left( (2 - U_p^{diff}) U_p^{diff} T_s \right)
 + \sum_{\tau_j \in hipHT(s)} \left\lceil \frac{{}^{s}R_p^{HI}(\lambda_s, \lambda_p)}{T_j} \right\rceil \left( C_j(HI) - C_j(LO) \right)
\tag{4.16}
\]
where $U_p^{diff} = U_p(HI) - U_p(LO)$. Equation 4.16 is similar to Equation 4.5 except that the first
term represents the added computation time when the PIBS τp is among the triggered tasks,
instead of the Sporadic Server/Periodic Task, and the second term includes τs in the set of
possibly triggered tasks.
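Maximizing the interference caused by PIBS for Equation 4.14 is identical to
Equation 4.13 except that τp is excluded from consideration:
\[
\sum_{\tau_k \in psHT \setminus \{\tau_p\}} \max_{\tau_q \in hip(s)} \left\{ I_k^q\left( {}^{s}R_p^{HI}(\lambda_s, \lambda_p), HI \right) - I_k^q\left( {}^{s}R_p^{HI}(\lambda_s, \lambda_p), LO \right) \right\}
\tag{4.17}
\]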
As with IO-AMC, the response time analysis for a PIBS must be conducted for each
possible Sporadic Server τs that the PIBS performs work on behalf of, and therefore
${}^{s}R_p^{HI} < T_s$ for all τs unless a priori information is known to limit the set of Sporadic Servers
the PIBS τp performs work on behalf of.
4.3.2 IO-RAMC Mode Change: IO-RAMC-rtb
What remains for the IO-RAMC analysis is the mode change analysis for when the system
switches to the HI-criticality mode from the LO-criticality mode. Similar to the IO-RAMC
HI-criticality steady state the analysis for the mode change will consist of two portions,
one for Sporadic Servers and another for PIBS. The Sporadic Server IO-RAMC response
time bound is the following:
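\[
\begin{aligned}
R_i^{*}(\lambda_s, \lambda_p) ={}& C_i^{*}
 + \sum_{\tau_j \in hpHT(i)} \left\lceil \frac{R_i^{*}(\lambda_s, \lambda_p)}{T_j} \right\rceil C_j(HI)
 + \sum_{\tau_j \in hpHU(i)} \left\lceil \frac{R_i^{*}(\lambda_s, \lambda_p)}{T_j} \right\rceil C_j(LO) \\
 &+ \sum_{\tau_j \in hpL(i)} \left\lceil \frac{R_i^{LO*}}{T_j} \right\rceil C_j(LO)
 + \sum_{\tau_j \in hpL(i)} \left( \left\lceil \frac{R_i^{*}(\lambda_s, \lambda_p)}{T_j} \right\rceil - \left\lceil \frac{R_i^{LO*}}{T_j} \right\rceil \right) C_j(HI) \\
 &+ \sum_{\tau_k \in psHT} \max_{\tau_q \in hip(i)} \left\{ I_k^q\left( R_i^{HI}(\lambda_s, \lambda_p), HI \right) \right\} \\
 &+ \sum_{\tau_k \in psHU} \max_{\tau_q \in hip(i)} \left\{ I_k^q\left( R_i^{HI}(\lambda_s, \lambda_p), LO \right) \right\} \\
 &+ \sum_{\tau_k \in psL} \max_{\tau_q \in hip(i)} \left\{ I_k^q\left( R_i^{LO*}(\lambda_s, \lambda_p), LO \right) \right\} \\
 &+ \sum_{\tau_k \in psL} \max_{\tau_q \in hip(i)} \left\{ I_k^q\left( R_i^{HI}(\lambda_s, \lambda_p) - R_i^{LO*}, HI \right) \right\}
\end{aligned}
\tag{4.18}
\]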
Equation 4.18 is similar to the RAMC response time bound analysis (Equation 4.6) except
that it includes the interference caused by PIBS. The interference caused by PIBS is broken
down into 4 categories, with each category corresponding to a term in Equation 4.18:
1. $\sum_{\tau_k \in psHT} \max_{\tau_q \in hip(i)} \left\{ I_k^q\left( R_i^{HI}(\lambda_s, \lambda_p), HI \right) \right\}$ – the interference caused by triggered
HI-criticality PIBS
2. $\sum_{\tau_k \in psHU} \max_{\tau_q \in hip(i)} \left\{ I_k^q\left( R_i^{HI}(\lambda_s, \lambda_p), LO \right) \right\}$ – the interference caused by untriggered
HI-criticality PIBS
3. $\sum_{\tau_k \in psL} \max_{\tau_q \in hip(i)} \left\{ I_k^q\left( R_i^{LO*}(\lambda_s, \lambda_p), LO \right) \right\}$ – the interference caused by LO-criticality
PIBS before the mode change
4. $\sum_{\tau_k \in psL} \max_{\tau_q \in hip(i)} \left\{ I_k^q\left( R_i^{HI}(\lambda_s, \lambda_p) - R_i^{LO*}, HI \right) \right\}$ – the interference caused by
LO-criticality PIBS after the mode change
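Equation 4.18 is rewritten to show how to maximize over the sets of triggered tasks:
\[
\begin{aligned}
R_i^{*}(\lambda_s, \lambda_p) ={}& C_i(LO) + \gamma(i, \lambda_s, \lambda_p) \left( C_i(HI) - C_i(LO) \right) \\
 &+ \sum_{\tau_j \in hpHT(i)} \left\lceil \frac{R_i^{*}(\lambda_s, \lambda_p)}{T_j} \right\rceil \left( C_j(HI) - C_j(LO) \right)
 + \sum_{\tau_j \in hpH(i)} \left\lceil \frac{R_i^{*}(\lambda_s, \lambda_p)}{T_j} \right\rceil C_j(LO) \\
 &+ \sum_{\tau_j \in hpL(i)} \left\lceil \frac{R_i^{LO*}}{T_j} \right\rceil C_j(LO)
 + \sum_{\tau_j \in hpL(i)} \left( \left\lceil \frac{R_i^{*}(\lambda_s, \lambda_p)}{T_j} \right\rceil - \left\lceil \frac{R_i^{LO*}}{T_j} \right\rceil \right) C_j(HI) \\
 &+ \sum_{\tau_k \in psHT} \max_{\tau_q \in hip(i)} \left\{ I_k^q\left( R_i^{HI}(\lambda_s, \lambda_p), HI \right) - I_k^q\left( R_i^{HI}(\lambda_s, \lambda_p), LO \right) \right\} \\
 &+ \sum_{\tau_k \in psH} \max_{\tau_q \in hip(i)} \left\{ I_k^q\left( R_i^{HI}(\lambda_s, \lambda_p), LO \right) \right\} \\
 &+ \sum_{\tau_k \in psL} \max_{\tau_q \in hip(i)} \left\{ I_k^q\left( R_i^{LO*}(\lambda_s, \lambda_p), LO \right) \right\} \\
 &+ \sum_{\tau_k \in psL} \max_{\tau_q \in hip(i)} \left\{ I_k^q\left( R_i^{HI}(\lambda_s, \lambda_p) - R_i^{LO*}, HI \right) \right\}
\end{aligned}
\tag{4.19}
\]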
Using Equation 4.19 it is easy to see that the tasks that maximize the interference
during the mode change are also the same that maximize the interference during the HI-
criticality steady state.
Finally, the response time analysis for a PIBS during the mode change is shown in
Equation 4.20 below.
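\[
\begin{aligned}
{}^{s}R_p^{*}(\lambda_s, \lambda_p) ={}& \left( 2 - U_p^{*} \right) U_p^{*} T_s
 + \sum_{\tau_j \in hipHT(s)} \left\lceil \frac{{}^{s}R_p^{*}(\lambda_s, \lambda_p)}{T_j} \right\rceil C_j(HI) \\
 &+ \sum_{\tau_j \in hipHU(s)} \left\lceil \frac{{}^{s}R_p^{*}(\lambda_s, \lambda_p)}{T_j} \right\rceil C_j(LO)
 + \sum_{\tau_j \in hipL(s)} \left\lceil \frac{{}^{s}R_p^{LO*}}{T_j} \right\rceil C_j(LO) \\
 &+ \sum_{\tau_j \in hipL(s)} \left( \left\lceil \frac{{}^{s}R_p^{*}(\lambda_s, \lambda_p)}{T_j} \right\rceil - \left\lceil \frac{{}^{s}R_p^{LO*}}{T_j} \right\rceil \right) C_j(HI) \\
 &+ \sum_{\tau_k \in psHT \setminus \{\tau_p\}} \max_{\tau_q \in hip(s)} \left\{ I_k^q\left( {}^{s}R_p^{*}(\lambda_s, \lambda_p), HI \right) \right\} \\
 &+ \sum_{\tau_k \in psHU \setminus \{\tau_p\}} \max_{\tau_q \in hip(s)} \left\{ I_k^q\left( {}^{s}R_p^{*}(\lambda_s, \lambda_p), LO \right) \right\} \\
 &+ \sum_{\tau_k \in psL \setminus \{\tau_p\}} \max_{\tau_q \in hip(s)} \left\{ I_k^q\left( {}^{s}R_p^{LO*}(\lambda_s, \lambda_p), LO \right) \right\} \\
 &+ \sum_{\tau_k \in psL \setminus \{\tau_p\}} \max_{\tau_q \in hip(s)} \left\{ I_k^q\left( {}^{s}R_p^{*}(\lambda_s, \lambda_p) - {}^{s}R_p^{LO*}, HI \right) \right\}
\end{aligned}
\tag{4.20}
\]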
Equation 4.20 is similar to Equation 4.18 except for the differences already seen be-
tween the Sporadic Server and PIBS response time equations. As with Equation 4.18, the
true response time bound involves maximizing over the sets of triggered tasks, which is
the same equation for the mode change as the PIBS HI-criticality steady state. Also as
with the other PIBS analyses, all Sporadic Servers τs that the PIBS τp can be assigned to
must be tested against.
4.4 Experimental Results
Similar to IO-AMC, RAMC was analyzed using both a scheduling simulator and in hard-
ware experiments using the Quest operating system. The task parameters used for the sim-
ulated experiments are the same as those used in the IO-AMC simulation experiments (See
Section 3.4.1).
Figure 4.1 shows the effect of varying the number of triggered tasks for a system of
periodic tasks (RAMC). When the number of triggered tasks is zero, the schedulability
is identical to a system without Adaptive Mixed Criticality. As the number of maximum
triggered tasks increases the schedulability decreases, approaching the traditional Adaptive
Mixed Criticality response time bound schedulability. Figure 4.2 shows a similar result for
a system of Sporadic Servers and PIBS (IO-RAMC). Similar to Figure 4.1 as the number
of triggered tasks increases the schedulability approaches that of the non-recovery IO-
AMC. When the number of triggered tasks is zero the schedulability is slightly lower than
a system with AMC due to PIBS resetting themselves after the theoretical mode change.
[Figure 4.1 (plot): schedulable task sets (0 % to 100 %) versus utilization (0 to 1) for
Periodic (No AMC), RAMC with h = 0, 1, 2, 4 and 8, and AMC.]
Figure 4.1: RAMC: Weighted Schedulability vs Number of Triggered Tasks
Both figures demonstrate that RAMC has a higher schedulability than traditional AMC
assuming the number of triggered tasks is smaller than the number of HI-criticality tasks.
An experiment was also conducted in Quest to investigate IO-RAMC on a physical
machine. A scenario consisting of 2 HI-criticality tasks and 2 LO-criticality tasks was used.
The task parameters are in Table 4.1. During execution a simulated fault was injected into
the first HI-criticality task which caused a recovery routine to be initiated. The exact
mechanism of the recovery routine is covered in Chapter 5. The only part of the recovery
scheme relevant to IO-RAMC is that in order to perform the recovery additional capacity
is required by the HI-criticality task. This causes the first task to enter the HI-criticality
mode along with the LO-criticality tasks while the remaining HI-criticality task remains in
the LO-criticality mode. This is depicted in Figure 4.3. The two HI-criticality tasks did not
miss any deadlines as the additional capacity provided to the faulty task was sufficient to
perform the recovery in a timely manner. The LO-criticality tasks temporarily missed their
deadlines while the faulty task was in the HI-criticality state. After the recovery, all the
tasks switched back into their LO-criticality state and the system returned to normal.

[Figure 4.2 (plot): schedulable task sets (0 % to 100 %) versus utilization (0 to 1) for
Sporadic Server and PIBS (No AMC), IO-RAMC with h = 0, 1, 2, 4 and 8, and IO-AMC.]
Figure 4.2: IO-RAMC: Weighted Schedulability vs Number of Triggered Tasks

Task     Criticality   Faulty   C(LO)   C(HI)   T
Task 1   HI            Yes      3ms     15ms    100ms
Task 2   HI            No       10ms    40ms    100ms
Task 3   LO            No       40ms    25ms    100ms
Task 4   LO            No       40ms    25ms    100ms
Table 4.1: IO-RAMC Experiment Task Parameters
[Figure 4.3 (plot): job completions over time (in periods, 0 to 10) for LO-Criticality Task 2,
LO-Criticality Task 1, the HI-Criticality Non-Faulty Task and the HI-Criticality Faulty Task.]
Figure 4.3: Job completion times for four tasks. The red line denotes the time the fault
was injected.
Chapter 5
Virtualized Fault Tolerance
5.1 Introduction
Virtualized Fault Tolerance (VFT) is a fault tolerance scheme that takes advantage of the
Quest-V Virtualized Separation Kernel design. Since the sandboxes in Quest-V are isolated
from one another using hardware virtualization extensions, they are independent
components similar to separate compute nodes in a traditional redundancy scenario. There
are architectural limitations such as shared caches, a shared memory bus, and a shared
power source, so the level of isolation is not as strong as that of separate compute nodes.
However, it is still strong in the sense that a malicious or faulty software component, at the
kernel level or above, or a hardware fault that is isolated to a single sandbox’s memory
or core(s) will not affect another sandbox. Furthermore, having redundant components
on the same multi-core CPU enables new recovery techniques that take advantage of the fact
that communication between sandboxes occurs over shared memory channels.
Specifically, live migration recovery techniques that directly copy memory
from one sandbox to another are investigated.
VFT is focused specifically on protecting a user-space process. Neither the kernel nor
the monitor is protected with VFT. VFT was designed to take advantage of multi-core
systems and to have a low run-time overhead by spreading the execution of multiple instances
across separate cores. This is appropriate for user-space applications that have identical
executions given the same input. VFT is also appropriate for applications where a
majority of the memory related to the process is in user-space as opposed to kernel space.
Since only user-space is protected, if the process has a large unprotected kernel footprint
there is a higher probability of a soft error occurring in the kernel. Protecting the kernel
and monitor can be accomplished with techniques such as arithmetic encoding [FSS09],
which is more suitable to the kernel and monitor as they are not identical across all sandbox
instances. This will be discussed in more detail in Future Works (Chapter 7).
The remainder of this chapter will discuss the architecture and implementation details
of VFT along with the costs associated with VFT in order to make real-time guarantees.
5.2 N-Modular Redundancy
This section will describe the N-modular redundancy (NMR) setup in Quest-V. There are
two main software components in the system: the arbitrator and the redundant application
(RA).
5.2.1 Arbitrator
The arbitrator takes the role of the voter in a typical NMR system. It is responsible for
comparing the results of the sandboxes and notifying a sandbox if it experiences an error and
needs to initiate the recovery procedure. The arbitrator, while not required to be, will typically
be located in a different sandbox than the RAs. Along with collecting the results of the
RAs, the arbitrator is also responsible for providing the input for and acting on the output
of the RAs. In a typical real-time/embedded scenario the arbitrator will read input from
a sensor, e.g. a camera, and then send the newly read value to the RAs. When the
RAs send their result to the arbitrator, it will be in the form of a command or value to
be sent to another I/O device, e.g. a servo or motor controller. In this way, the arbitrator
acts as a buffer to the I/O devices, leaving the RAs to perform the computational and
memory intensive portion of the system. Figure 5.1 shows a typical NMR scenario using
four sandboxes. In this case, there are three redundant copies of the application residing
in Sandboxes 1 to 3 and Sandbox 4 contains the arbitrator. In Figure 5.1, the arbitrator
reads the camera data from the device and sends it to the RAs. The RAs perform the
necessary computation and send back the commands to be sent to the motor. Note that
the sandboxes are not limited to executing only the RA or arbitrator. Quest is capable of
running multiple independent tasks within a single sandbox, so other tasks are permitted to
run alongside the depicted applications. The real-time scheduler in Quest ensures
that each RA and the arbitrator receive an appropriate amount of CPU time to ensure the
correct operation of the system [DLW11].
5.2.2 Redundant Application
The redundant application (RA) exists in three or more different sandboxes. In order to
properly detect errors, the redundant application instances must be identical. This means
the only input to the RAs is from the arbitrator and the RA must perform a completely
deterministic computation. For example, if the RAs read from the timestamp counter most
likely each RA would read a different value. This different value would be treated as an
error by the arbitrator. Therefore things such as timestamp values must be included in the
input from the arbitrator. This limitation is not too restrictive as the point of the redundant
application is to perform the same computation multiple times to protect against phenom-
ena such as soft faults. Therefore, the redundant application should be deterministic given
a specific input, otherwise, the output might be different even when there are no faults.
Figure 5.1: N-Modular Redundant scenario
5.2.3 Execution
What remains is to describe the interaction between the RAs and the arbitrator. During
the execution of the redundant applications, the applications will make a special system
call at predefined points. This system call will hash either all of the application memory
or a subset. This hash is sent to the voter and is what is compared to ensure the proper
execution of the redundant computation. This information is passed to the voter via shared
memory channels that are established at the beginning of the RA’s execution. Once the
result has been passed to the voter the RA continues execution. It does not have to wait for
the voter unless, for some extremely rare reason, the voter has been delayed and the buffer
the application passes the hash result into is full. As a result, most of the overhead of
the fault tolerance technique is the time spent hashing memory.
Ideally, the process memory would be hashed at each sync point to ensure that any
deviation would be detected as soon as possible. However, if the cost of performing a
full memory hash is too expensive a partial memory hash is performed on a rotating basis.
Each iteration a user-specified number of pages is hashed and the next page to hash is
noted. During the next iteration, the hashing process starts where the previous hash left
off. Eventually, the entire process address space is hashed. This approach reduces the
hashing time but leaves the system vulnerable for a longer time. Specifically, if there are
three redundant copies of an application and a second fault occurs in a different sandbox
before the first has been recovered, then a majority of the redundant copies are erroneous.
In this case, recovery is no longer possible. Note that two or more errors in the same sandbox
are still recoverable; it is only when errors occur in a majority of sandboxes that the system
is no longer recoverable.
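The rotating partial hash can be sketched as follows. This is a user-level illustration only: the class name, the 4 KiB page size and the use of SHA-256 are assumptions made for the example, since the text does not specify the hash function or how the checkpoint system call tracks the next page.

```python
import hashlib

PAGE_SIZE = 4096  # assumed page size for the illustration


class RotatingHasher:
    """Hashes a fixed number of pages per sync point, resuming where it left off."""

    def __init__(self, pages_per_sync):
        self.pages_per_sync = pages_per_sync
        self.next_page = 0  # first page to hash at the next sync point

    def checkpoint(self, memory):
        n_pages = (len(memory) + PAGE_SIZE - 1) // PAGE_SIZE
        digest = hashlib.sha256()
        if n_pages == 0:
            return digest.digest()
        count = min(self.pages_per_sync, n_pages)
        for k in range(count):
            page = (self.next_page + k) % n_pages   # wrap around the address space
            digest.update(memory[page * PAGE_SIZE:(page + 1) * PAGE_SIZE])
        self.next_page = (self.next_page + count) % n_pages
        return digest.digest()


# Two replicas with four pages each; the second has a bit flip in page 3.
good = bytes(4 * PAGE_SIZE)
corrupted = bytearray(good)
corrupted[3 * PAGE_SIZE] ^= 0x01
corrupted = bytes(corrupted)

replica_a, replica_b = RotatingHasher(2), RotatingHasher(2)
first_sync_matches = replica_a.checkpoint(good) == replica_b.checkpoint(corrupted)
second_sync_matches = replica_a.checkpoint(good) == replica_b.checkpoint(corrupted)
```

The first sync point covers pages 0 and 1 and therefore misses the corruption, while the second covers pages 2 and 3 and exposes it. This is the longer window of vulnerability that the partial hash trades for a lower per-iteration cost.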
The result of the application’s computation is also sent to the voter. Results are sent
through an inter-sandbox mailbox mechanism. What is unique about this mailbox mech-
anism is that all entries are tagged with a specific incrementing number. This unique
number is necessary as the recovery mechanism can cause both repeated execution and
execution phases to be skipped (more on this later). The identification number is used to
handle cases where this occurs. Currently, if the message to the voter experiences a soft
error, the voter is not able to determine whether the error was in the message or
in the redundant computation itself. In either case the voter treats the
redundant computation as faulty.
Once the voter receives the hashes from the redundant applications it compares the
hashes. Assuming the hashes are correct, it updates a variable shared with each redundant
execution letting them know that they passed the test. This indicates that the buffer slot
the hashes were placed in is free, as the old hash is no longer needed. Each redundant
application has its own copy of this variable, so if there are three redundant applications
three variables are updated. This variable is located in an extended page table (EPT) shared
memory segment mapped to both sandboxes.
If an error is detected in a redundant computation, via an incorrect hash value, the
voter informs the redundant computation by setting a shared variable to the appropriate
value. The identification number of a correct instance of the redundant computation
is also passed to the faulty application. The identification number of the correct sandbox is
used for recovery. The shared variable that indicates an error is checked by the redundant
application whenever it makes the fault tolerance checkpoint system call and right before
the scheduler switches context to the redundant computation. If said variable indicates an
error it will initiate the fault recovery mechanism. The recovery mechanisms are discussed
in detail in Section 5.3.
Figure 5.2 depicts the interactions between sandboxes in the fault tolerance scenario.
At time t0 the three sandboxes with the redundant application all execute one iteration of
the application. They then make the fault tolerance checkpoint (CP) system call at time
t1. During this system call, the hash for the application address space is created and sent
to the arbitrator. They then continue execution. Note that the execution of the redundant
application is depicted as occurring across all sandboxes at exactly the same time. This is for
the simplicity of the diagram; in reality, the redundant executions and checkpoint
system calls can occur at different times. At time t2 the arbitrator checks the hashes,
verifies that all the hashes are correct and therefore does not need to notify any application
to initiate recovery. At this point, the arbitrator increments a per-application counter so the
redundant sandboxes are aware of which slots in the circular buffer the hashes are placed
into are free.
At time t3 Sandbox 1 experiences a fault that causes its execution to produce an in-
correct result. When the sandboxes enter the checkpoint system call the hash produced
by Sandbox 1 will be inconsistent with the hashes produced by Sandboxes 2 and 3. Even
though the state of Sandbox 1 is incorrect it will continue execution after the checkpoint
system call until notified by the arbitrator that it is incorrect. The arbitrator notifies
Sandbox 1 that it has experienced an error and Sandbox 1 initiates the recovery procedure. The
recovery procedures are discussed in detail in Section 5.3.
5.3 Recovery
There are two variants of the fault recovery mechanism. However, before discussing the
differences the similarities will be discussed first. The main idea of both mechanisms is to
copy a correct instance of the redundant application to the sandbox containing the faulty
one, replacing the faulty redundant application, while keeping the correct instance in its
original sandbox. In a sense, it is performing a migration of the application and keeping the
original in place. The recovery mechanism occurs without the need to perform VM-exits
into the virtual machine monitor. This is accomplished by having the monitor map the
necessary pages in the remote sandboxes as readable. This permits a faster recovery with
little sacrifice to isolation or security as the remote pages that are readable are intended to
be identical to pages that are local to the sandbox.
The majority of the recovery occurs in the context of the faulty application and
is budgeted towards the faulty application. This ensures the performance of other
applications in the same sandbox as the faulty application is not affected during the recovery
procedure in an unpredictable manner. The performance of the correct
instance will be affected to some degree, but this is minimized as much as possible
and, most importantly, the performance degradation is predictable. In both recovery
schemes, a sandbox is selected by the arbitrator to be the correct sandbox used as a copy
for the faulty sandbox. The only necessary criterion for the correct sandbox is that it was
one of the sandboxes with a majority vote. However, given that the performance of the
correct sandbox will be affected, it would be ideal to select the sandbox with the smallest
utilization among the majority sandboxes.
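The arbitrator's comparison and its choice of the sandbox to recover from can be sketched as a majority vote followed by a minimum-utilization selection. The function name, the dictionary arguments and the exception are hypothetical; the actual arbitrator communicates through shared memory variables rather than return values.

```python
from collections import Counter


def vote(hashes, utilization):
    """Return (faulty sandbox ids, donor sandbox id or None).

    hashes maps sandbox id to the hash it reported for one checkpoint;
    utilization maps sandbox id to its CPU utilization (assumed known).
    """
    value, count = Counter(hashes.values()).most_common(1)[0]
    if 2 * count <= len(hashes):
        # No strict majority agrees, so no instance can be trusted as correct.
        raise RuntimeError("no majority: recovery is not possible")
    majority = [s for s, digest in hashes.items() if digest == value]
    faulty = [s for s, digest in hashes.items() if digest != value]
    # Prefer the least utilized majority sandbox as the copy source.
    donor = min(majority, key=lambda s: utilization[s]) if faulty else None
    return faulty, donor


reported = {1: b"bad", 2: b"good", 3: b"good"}
load = {1: 0.5, 2: 0.7, 3: 0.4}
faulty, donor = vote(reported, load)
```

Here Sandbox 1 is flagged as faulty and Sandbox 3 is chosen as the source because it has the lowest utilization among the agreeing sandboxes.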
VFT assumes that the recovery can be performed before another fault
occurs. If a fault occurs in another sandbox, specifically the one being used for recovery,
the fault will be propagated to another sandbox, resulting in two faulty sandboxes. The
amount of time to perform the recovery is analyzed in Section 5.4, and the experimental
results in Section 5.5 show how this window can be controlled by the application
scheduling parameters.
5.3.1 Recovery-Copy-on-Write
The first of the two recovery mechanisms will be referred to as recovery-copy-on-write
(RCOW). In this mechanism, the faulty application signals the correct application’s sand-
box during the beginning of the recovery procedure. When the correct application’s sand-
box receives this signal it will create a new page directory structure and new page tables,
setting all the entries in the page table to their original physical frames except they will
be marked as read-only. At this point, the correct application continues execution but will
page fault whenever it attempts to write to memory. The page fault handler will then per-
form a copy on write and use the new page for the correct application instance, leaving
the old copy in the old page directory/table data structures. This is akin to copy-on-write
that occurs after a fork system call. It is important that the correct application instance
uses the new copy of the frame, leaving the original copy for the faulty instance. This is
because the faulty instance is not notified that the copy has occurred. This ensures that the
faulty application copies the correct application sandbox using the old page directory/table
data structures, knowing that the frames will not be modified by the running application.
When the recovery procedure is complete the faulty application instance notifies the
correct sandbox that it is done so that it can free all resources related to the old page
directory/table data structures. This approach impacts the performance of the correct
application as it now has to perform copy-on-write.
Figure 5.3 is a flowchart depicting the steps in the RCOW recovery mechanism. For
the sake of simplicity, only two sandboxes are shown, the faulty sandbox (Sandbox 1)
and the sandbox being used as the master copy (Sandbox 2). The diagram starts with
Sandbox 1 having just received the message from the arbitrator that it needs to recover.
Sandbox 1 then notifies Sandbox 2 to mark all pages as read only. This is to ensure that
as Sandbox 1 copies the application image from Sandbox 2 it copies a consistent image
of the application. Sandbox 2 notifies Sandbox 1 to begin copying pages and Sandbox 2
continues execution of the application. While running the redundant application, when-
ever Sandbox 2 writes to a new page it will experience a page fault. The page fault handler
copies the page using the new page for the application and leaving the old page for Sand-
box 1. This may occur multiple times during the recovery process. When Sandbox 1 is
finished copying the program pages it notifies Sandbox 2 to release any pages that were
kept for Sandbox 1 to recover. Sandbox 1 then continues execution using its newly
recovered application image.
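The RCOW interaction can be illustrated with a small simulation. The following is a
minimal sketch in Python and not the Quest-V implementation: the class CorrectSandbox,
its method names, and the use of a Python list as a stand-in for page frames are all
assumptions made for illustration.

```python
class CorrectSandbox:
    """Toy model of the sandbox holding the master copy during RCOW (hypothetical)."""

    def __init__(self, pages):
        self.frames = list(pages)               # frames seen by the running application
        self.read_only = [False] * len(pages)   # per-page write protection
        self.snapshot = None                    # old mapping kept for the faulty sandbox
        self.cow_faults = 0                     # number of copy-on-write page faults

    def begin_recovery(self):
        # Keep the old mapping for the faulty sandbox and mark every page read-only.
        self.snapshot = list(self.frames)
        self.read_only = [True] * len(self.frames)

    def write(self, page, value):
        if self.read_only[page]:
            # Simulated page fault: the running application gets a new frame,
            # while the old frame stays untouched in the snapshot.
            self.cow_faults += 1
            self.read_only[page] = False
        self.frames[page] = value

    def end_recovery(self):
        # Release everything that was kept for the recovering sandbox.
        self.snapshot = None
        self.read_only = [False] * len(self.frames)


def copy_master_image(correct):
    """What the faulty sandbox does: copy the frozen, consistent image."""
    return list(correct.snapshot)


sandbox2 = CorrectSandbox([b"a", b"b", b"c"])
sandbox2.begin_recovery()
sandbox2.write(1, b"B")                 # first write to page 1 faults and copies
sandbox2.write(1, b"X")                 # second write to the same page does not fault
recovered = copy_master_image(sandbox2)
sandbox2.end_recovery()
```

In this sketch the recovering side always sees the image as it was when recovery began,
regardless of how many writes the running application performs in the meantime.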
5.3.2 Live Migration Recovery
The second recovery mechanism performs a live-migration recovery allowing the correct
application to continue execution while the faulty sandbox copies pages. This approach
will be referred to as live-migration-recovery (LMR). In this approach, the faulty instance
first signals the sandbox of the correct instance to clear the dirty bit of the page table entries
for all user space pages of the correct application instance. The faulty instance then copies
all of the user space memory of the correct instance. Once this has occurred, the faulty
application again signals the correct sandbox, asking which pages have been updated
since the last signal. If none have been updated, the recovery process is complete. If some
pages have been updated the correct sandbox notifies the faulty application of which subset
of pages needs to be copied. The faulty application copies these pages and then repeats the
process asking which pages, if any, have been updated. This process repeats until either
no pages have changed or the system has determined that repeated rounds do not decrease
the number of pages. At this point a scaled down version of RCOW is initiated where only
the pages that have changed since the last copy are again copied. Once this is completed
the recovery procedure is finished.
Figure 5.4 is a flowchart depicting the steps in the LMR mechanism. As with RCOW,
once notified that it needs to initiate a recovery Sandbox 1 first notifies Sandbox 2 that it
will be using it as the master copy. Sandbox 2 then clears all the dirty bit entries in
the page tables for the master copy application. Finally, it will flush its translation looka-
side buffer (TLB) so any writes to application pages will have the memory management
unit (MMU) update the corresponding dirty bit entry in the page table. Once this is com-
plete, it will notify Sandbox 1 which will then copy all the pages from the master copy
in Sandbox 2. While Sandbox 1 copies the pages, Sandbox 2 continues execution as nor-
mal, including the execution of the master copy application. Once Sandbox 1 is finished
copying the master copy instance it will notify Sandbox 2. Sandbox 2 will again clear
all the dirty bit entries in the master copy application’s page table entries and this time
note which ones were marked as dirty. Sandbox 2 then sends Sandbox 1 which page table
entries were modified. Sandbox 1 copies just these pages and again notifies Sandbox 2
that it finished. This process could repeat multiple times. Eventually, or perhaps even in
the first iteration, the number of pages that have changed since the last time Sandbox 2
notified Sandbox 1 will either be zero or be greater than or equal to the number in the
previous iteration. If the number of pages that changed was zero, then the program no
longer needs to copy any more pages. If the number of pages was the same or more, then
Sandbox 2 decides that it is no longer worth doing the incremental copies. Sandbox 2 will
then initiate the RCOW recovery mechanism, but just for the subset of pages that have
changed since the last copy. Sandbox 1 then copies this subset of pages and notifies
Sandbox 2 to release any resources related to the recovery procedure. Both sandboxes then
continue execution of the redundant application as normal.
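The iterative copy loop described above can be sketched as a short simulation. This is
only an illustration under stated assumptions: the function name, the representation of
the master copy as a Python list, and the writes_per_round input (the writes the master
copy performs while each copy round is in progress) are hypothetical. The sketch also
assumes that the first round compares the dirty count against the total number of pages,
and it models the final scaled-down RCOW step as a plain copy of the remaining dirty
pages.

```python
def live_migration_recover(master, writes_per_round):
    """Simulate LMR. `master` is mutated to model the running master copy.

    Returns (recovered_pages, rounds, used_rcow_fallback).
    """
    recovered = [None] * len(master)
    pending = set(range(len(master)))   # first round: copy every page
    rounds = 0
    while True:
        # The faulty sandbox copies the pending pages.
        for page in pending:
            recovered[page] = master[page]
        # Meanwhile the master copy keeps running; written pages become dirty.
        writes = writes_per_round[rounds] if rounds < len(writes_per_round) else []
        dirty = set()
        for page, value in writes:
            master[page] = value
            dirty.add(page)
        rounds += 1
        if not dirty:
            return recovered, rounds, False      # nothing changed: recovery complete
        if len(dirty) >= len(pending):
            # No progress: fall back to a scaled-down RCOW over the dirty pages only.
            for page in dirty:
                recovered[page] = master[page]
            return recovered, rounds, True
        pending = dirty                          # next round copies only dirty pages


# Converging case: the dirty set shrinks from 2 pages to 1 page to none.
master_a = ["a", "b", "c", "d"]
result_a = live_migration_recover(master_a, [[(0, "A"), (1, "B")], [(0, "AA")], []])

# Non-converging case: every page is dirtied during the first full copy.
master_b = ["a", "b"]
result_b = live_migration_recover(master_b, [[(0, "x"), (1, "y")]])
```

In both cases the recovered image ends up identical to the master copy; the two runs
differ only in whether the fallback was needed.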
5.4 Real-Time Guarantees
The fault tolerance approach introduced in this thesis is intended for real-time applications
where traditional approaches might be either too expensive or cannot provide the neces-
sary guarantees. Therefore, in order to incorporate VFT into a real-time system, the costs
associated with both the N-Modular Redundancy and recovery must be understood. For
the rest of this section, it is assumed tasks follow a Sporadic Server model where they are
specified by both a capacity, C, and period T such that each task is guaranteed C time
units of execution every T time units. We will assume the implicit deadline model so the
deadline of a task is equal to its period. The Sporadic Server model is mathematically sim-
ilar to the periodic task model with the advantage that it is a bandwidth preserving model.
More information on Sporadic Servers can be found in [Spr90, DLW11, SBWH10].
5.4.1 NMR Cost
The cost of performing the N-Modular Redundancy depends on how often the checkpoint
system call is made and the number of pages that are included in the hash. Typically, the
checkpoint system call will be made once per period at the end of the C time units
of execution. The time for a single checkpoint is completely dependent on the number
of pages, N, and the time to hash a single page, H. Therefore, to ensure that the
checkpoint system call does not interfere with the predictable execution of the task, the
capacity assigned to a task should be:
$$C = C' + N \times H \tag{5.1}$$
where C′ is the original capacity.
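As a small worked example of Equation 5.1, the sketch below computes the inflated
capacity and the resulting utilization C/T. The function names and the numeric values
are illustrative assumptions only; they are not measurements from Quest-V.

```python
def capacity_with_checkpoint(original_capacity, num_pages, hash_time_per_page):
    """Equation 5.1: C = C' + N * H (all times in the same unit)."""
    return original_capacity + num_pages * hash_time_per_page


def utilization(capacity, period):
    """Fraction of each period reserved for the task."""
    return capacity / period


# Hypothetical values: C' = 10 us, N = 4 pages, H = 5 us per page, T = 1000 us.
C = capacity_with_checkpoint(10, 4, 5)
U = utilization(C, 1000)
```

The point of the calculation is that the hashing cost grows linearly with the number of
pages included in the hash, which is why hashing fewer pages lowers the needed capacity.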
5.4.2 Recovery
The cost to perform a recovery is more complicated than the NMR cost. This is partly
due to the fact that the recovery affects both the recovery task and the task being used as
the correct copy. Since each recovery method uses a different mechanism the cost of
recovery differs for each method. RCOW will be covered first as it is the simpler method.
5.4.2.1 RCOW Costs
For the program performing the recovery-copy-on-write, the cost is summed up by each
action depicted in Figure 5.3. From the point of view of the recovering sandbox the actions
are:
    send_start_message_to_correct_sandbox()
    wait()
    copy_pages()
    send_finish_message_to_correct_sandbox()
    send_finish_message_to_arbitrator()
Once a message has been sent to the correct sandbox, the recovering process must wait
for the master copy to both mark all pages as read-only and populate the recovery data. It
then proceeds to copy the correct process image. Finally, it sends two messages, one to the
correct instance and another to the arbitrator, to indicate that the recovery has finished.
In the current implementation of VFT, messages between sandboxes are implemented
as inter-processor interrupts (IPIs). While the time to send an IPI varies, the worst-case
time does not depend on the size of the program being protected, so for the sake of this
discussion it is treated as constant.
The added computation for the recovering process, R, is represented as the following:
$$R = 3I + S_1 + N \times P \tag{5.2}$$
where P is the time it takes to copy a page and I is the time to send an IPI. S1 is the time
spent waiting for the correct sandbox to perform its first stage. For RCOW the first stage
of the correct sandbox consists of marking the pages as read-only, populating the recovery
data and sending a message to the faulty sandbox:
$$S_1 = \mathit{RO} \times N + \mathit{PRD} + I + \mathit{FTLB} \tag{5.3}$$
where RO is the time to mark a single page as read-only, PRD is the time to populate the
recovery data and FTLB is the explicit cost of flushing the TLB.
The correct sandbox has two additional stages which cause interference due to the
RCOW procedure. The second stage occurs while the recovery procedure is still occurring.
There are two sources of interference, the extra TLB misses that occur because the TLB
must be flushed after the pages have been marked as read-only and the cost associated with
the copy-on-write. This interference is represented as:
$$S_2 = \mathit{WP} \times (\mathit{PF} + P + 2T) + \mathit{ROP} \times T \tag{5.4}$$
where WP represents the number of pages that are read from and written to during the
recovery, PF is the cost of a page fault, P is again the time to copy a page, T is the cost
of a TLB miss and finally, ROP is the number of pages only read from. The pages read
and written to experience 2 TLB misses because the TLB must be flushed after the copy
on write has occurred.
The final stage for the correct sandbox is invoked when the faulty instance sends a
message indicating it is finished with the recovery. The correct sandbox must then mark
any pages still marked as read-only as read and writable along with freeing the pages that
had been kept for the recovering process to copy. The cost of this stage is:
$$S_3 = \mathit{MW} \times N + \mathit{WP} \times \mathit{FP} \tag{5.5}$$
where MW is the time to mark a page as writable, FP is the time to free a frame, and as
before N is the number of pages and WP is the number of pages written to. Therefore the
total additional computation the correct sandbox needs is:
$$S = S_1 + S_2 + S_3 \tag{5.6}$$
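Equations 5.2 through 5.6 can be evaluated mechanically once the per-operation costs are
known. The following is a minimal sketch under stated assumptions: the RcowParams
container and the rcow_costs function are hypothetical names, and the numbers in the
usage lines are arbitrary unit costs chosen to make the arithmetic easy to follow, not
measured values.

```python
from dataclasses import dataclass


@dataclass
class RcowParams:
    I: float     # time to send an IPI
    P: float     # time to copy one page
    RO: float    # time to mark one page read-only
    PRD: float   # time to populate the recovery data
    FTLB: float  # explicit cost of flushing the TLB
    PF: float    # cost of a page fault
    T: float     # cost of a TLB miss
    MW: float    # time to mark one page writable
    FP: float    # time to free one frame


def rcow_costs(p, N, WP, ROP):
    """Return (R, S): added cost for the recovering and the correct sandbox."""
    s1 = p.RO * N + p.PRD + p.I + p.FTLB            # Eq. 5.3
    s2 = WP * (p.PF + p.P + 2 * p.T) + ROP * p.T    # Eq. 5.4
    s3 = p.MW * N + WP * p.FP                       # Eq. 5.5
    recovering = 3 * p.I + s1 + N * p.P             # Eq. 5.2
    correct = s1 + s2 + s3                          # Eq. 5.6
    return recovering, correct


# Arbitrary unit costs for illustration only.
params = RcowParams(I=1, P=2, RO=1, PRD=3, FTLB=4, PF=5, T=1, MW=1, FP=2)
R, S = rcow_costs(params, N=10, WP=3, ROP=4)
```

With these inputs S1 is 18, S2 is 31 and S3 is 16, which gives R = 41 and S = 65.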
5.4.2.2 LMR Costs
The Live Migration Recovery (LMR) approach tries to minimize the effects on the correct
sandbox by avoiding the need to perform a page fault and copy on write. The cost equation
for LMR is more complicated than RCOW as LMR will repeatedly loop as it copies pages.
The steps for the faulty sandbox performing LMR are the following:
Algorithm 4 Live Migration Recovery pseudo-code for the faulty sandbox
    send_message_to_correct_sandbox()
    wait()
    copy_all_pages()
    repeat
        send_message_to_correct_sandbox()
        num_dirty_pages, last_copy ← wait()
        if num_dirty_pages > 0 then
            copy_some_pages(num_dirty_pages)
        end if
    until last_copy
    if num_dirty_pages != 0 then
        send_message_to_correct_sandbox()
    end if
    send_message_to_arbitrator()
Using Algorithm 4 the cost of recovery is expressed as the following:
$$R = 3I + S_1 + N \times P + \sum_{i=1}^{n-1} \left\{ I + S_1 + \mathit{DP}(i) \times P \right\} + I + S_2 + \mathit{DP}(n) \times P \tag{5.7}$$
where I is the cost to send an IPI, S1, and S2 are the costs for the correct sandbox’s first
and second stage, N is the number of pages of the program and P is the cost to copy a
page. DP (i) is the number of dirty pages for the ith iteration of the loop, starting with 1.
Note that the last iteration of the loop is handled separately. This is because the correct
sandbox may initiate an RCOW subroutine which takes a longer amount of time. n is the
number of iterations of the loop that occur.
What remains is to determine the number of iterations and the number of dirty pages
for each iteration. The number of dirty pages depends on the behavior of the application
and the time since the previous message was sent to the correct sandbox, which in turn
depends on the number of dirty pages in the previous iteration. Specifically, as the
number of dirty pages decreases in each round, it takes less time for the faulty sandbox
to copy the necessary pages. Because the copy takes less time, the correct sandbox has
less time to modify pages that would have to be updated by the faulty sandbox in the next
iteration. Since the algorithm stops and performs an RCOW if the number of pages
modified stays the same or increases, the number of iterations is bounded. Therefore, the
number of dirty pages for iteration i is:
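$$\mathit{DP}(i) = \begin{cases} \lceil (N \times P + I) \times W \rceil & \text{if } i = 0 \\ \lceil (\mathit{DP}(i-1) \times P + I) \times W \rceil & \text{if } i > 0 \end{cases} \tag{5.8}$$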
with W being the maximum rate that the correct application instance modifies pages.
Using Equation 5.8 it is easy to determine the maximum number of iterations the live
migration recovery will take. It is either the lowest value for i such that DP(i) = 0 or
the lowest value for i such that DP(i) ≥ DP(i−1). In the first case, the live migration
recovery completes without the need to perform an RCOW subroutine; in the latter case
it does not.
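The recurrence in Equation 5.8 and its two stopping conditions can be iterated directly.
The following is a minimal sketch, assuming exact rational arithmetic to avoid rounding
surprises under the ceiling; the function name and the input values are illustrative and
not taken from the experiments.

```python
import math
from fractions import Fraction


def lmr_dirty_page_bound(N, P, I, W):
    """Iterate Eq. 5.8 until DP(i) == 0 or DP(i) >= DP(i-1).

    Returns (sequence DP(0), DP(1), ..., needs_rcow_subroutine).
    """
    dp = [math.ceil((N * P + I) * W)]              # DP(0): after the full copy
    while dp[-1] > 0:
        nxt = math.ceil((dp[-1] * P + I) * W)      # DP(i) for i > 0
        dp.append(nxt)
        if nxt >= dp[-2]:
            return dp, True                        # no further progress: RCOW needed
    return dp, False                               # reached zero dirty pages


# Illustrative inputs: 100 pages, unit copy and IPI times, W = 1/10 pages per time unit.
sequence, needs_rcow = lmr_dirty_page_bound(100, 1, 1, Fraction(1, 10))

# An application that never writes (W = 0) finishes after the initial copy.
idle_sequence, idle_needs_rcow = lmr_dirty_page_bound(100, 1, 1, 0)
```

The loop terminates because the sequence of dirty-page counts must strictly decrease to
continue. Note that, because of the ceiling, any positive I and W make every term at
least 1, so in this sketch the zero case only arises when W is 0.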
The costs of the correct sandbox's stages are the following:
• S1 – cost to clear the dirty bit in all pages and send a message
• S2 – cost to populate the recovery data and initiate the RCOW subroutine if neces-
sary and send a message
• S3 – cost associated with page faults and copy-on-writes
• S4 – cost to mark all pages as readable and writable and to free the pages kept
for the recovery process if the RCOW subroutine was initiated, along with sending a
message
The total cost to the correct sandbox depends on the number of iterations that occur
and is expressed as the following:
$$S = S_1 + (n-1) S_2 + S_3 + S_4 \tag{5.9}$$
S1 is a function of the number of pages:
$$S_1 = N \times R + I \tag{5.10}$$
with N being the number of pages and R being the time to mark a page as read-only.
S2 is the cost to both populate the recovery data and initiate a limited RCOW. It is
limited in the sense that only the pages that need to be copied are marked as read-only.
S2 is defined as:
$$S_2 = \mathit{RO} \times \mathit{DP}(n) + \mathit{PRD} + I \tag{5.11}$$
Note that if the RCOW subroutine does not need to be initiated DP (n)=0 and the only
cost for S2 is the time to populate the recovery data (PRD) and send a message (I).
S3 is the cost incurred due to the page faults and copy-on-writes that may occur if the
RCOW subroutine is initiated. It is similar to S2 for the RCOW recovery mechanism.
Specifically, it is defined as:
$$S_3 = \lambda(n) \times \Big( \min(\mathit{WP}, \mathit{DP}(n)) \times (\mathit{PF} + P + 2T) + (\mathit{ROP} + \mathit{UWP}) \times T \Big) \tag{5.12}$$
where UWP is the number of unprotected pages written and λ (n) denotes whether the
RCOW subroutine is initiated and is defined as:
$$\lambda(n) = \begin{cases} 0 & \text{if } \mathit{DP}(n) = 0 \\ 1 & \text{if } \mathit{DP}(n) > 0 \end{cases} \tag{5.13}$$
The key differences between Equation 5.12 and its RCOW counterpart (Equation 5.4)
are the following:
• If the RCOW subroutine is not necessary i.e., λ (n)=0, then S3=0 as the correct
sandbox will not experience any added interference from an RCOW subroutine that
is not initiated.
• Instead of multiplying the cost of the page fault, copying a page and the extra TLB
miss by WP, it is now multiplied by min(WP, DP(n)). This is due to the fact that
only a subset of the pages that are written to will have a copy on write performed,
specifically the subset that needs to be copied.
• Since some pages that are written to are not protected (represented by UWP ) writ-
ing to these pages only incurs a TLB miss.
Finally, S4 is the cost associated with freeing the data used by the RCOW subroutine.
Since there is nothing to free if the RCOW subroutine is not used S4 will also make use of
λ.
$$S_4 = \lambda(n) \big( \mathit{MW} \times N + \min(\mathit{WP}, \mathit{DP}(n)) \times \mathit{FP} \big) \tag{5.14}$$
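The role of λ(n) in Equations 5.12 through 5.14 is easiest to see by evaluating S3 and S4
for a run that needs the RCOW subroutine and one that does not. The sketch below is
illustrative only: the function names are hypothetical and the inputs are arbitrary unit
costs rather than measured values.

```python
def rcow_indicator(dp_n):
    """Eq. 5.13: 1 if the RCOW subroutine is initiated, otherwise 0."""
    return 0 if dp_n == 0 else 1


def lmr_s3(dp_n, WP, ROP, UWP, PF, P, T):
    """Eq. 5.12: interference from page faults, copies and TLB misses."""
    protected_writes = min(WP, dp_n)
    return rcow_indicator(dp_n) * (protected_writes * (PF + P + 2 * T) + (ROP + UWP) * T)


def lmr_s4(dp_n, N, WP, MW, FP):
    """Eq. 5.14: cost to restore write access and free the kept frames."""
    return rcow_indicator(dp_n) * (MW * N + min(WP, dp_n) * FP)


# Arbitrary unit costs: PF = 5, P = 2, T = 1, MW = 1, FP = 2, with N = 10 pages.
s3_with_rcow = lmr_s3(dp_n=2, WP=5, ROP=3, UWP=1, PF=5, P=2, T=1)
s4_with_rcow = lmr_s4(dp_n=2, N=10, WP=5, MW=1, FP=2)
s3_without = lmr_s3(dp_n=0, WP=5, ROP=3, UWP=1, PF=5, P=2, T=1)
s4_without = lmr_s4(dp_n=0, N=10, WP=5, MW=1, FP=2)
```

When DP(n) is zero both terms vanish, which matches the observation that the correct
sandbox sees no copy-on-write interference if the subroutine is never started.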
5.4.3 Recovery Adaptive Mixed-Criticality
As previously mentioned in Chapter 4 Recovery Adaptive Mixed-Criticality (RAMC) is
used to increase the capacity of both the faulty task and the task being copied during recov-
ery. With the above equations for the cost of recovery, the necessary additional capacity is
calculated. This permits recovery to occur in a timely manner without affecting any tasks
beyond the reduction in available capacity for LO-criticality tasks. RAMC combined with
the VFT fault detection and recovery will be demonstrated in the experimental results in
the next section.
5.5 Experimental Results
Virtualized Fault Tolerance was evaluated by using a quadcopter control loop as the main
test program. The control loop was taken from the Crazyflie quadcopter [cra] and modified
so that the replicated programs performed the main Proportional-Integral-Derivative (PID)
control computation while the arbitrator passed input data to the replicated programs and
read the output of the PID control computation, specifically the desired motor speeds. On
a real quadcopter, the PID input would be read from various sensors. For this experiment,
the sensor input values are stored in the arbitrator instead of reading from the sensors. This
allows easy reproducibility between experiment runs.
The system was evaluated under two different fault scenarios. The first involved sim-
ulating a random fault that affects the results of the PID control loop. Specifically, before
the hash was taken, bits in the memory corresponding to the motor control values were
randomly flipped. This will be referred to as a correctness error. The second scenario involved
delaying the execution of one of the redundant programs to the point that it missed a dead-
line. This will be referred to as a timing error. For the purposes of the correctness error
scenario, the detection of timing errors was disabled. This was done so that the effects of
a capacity too low to perform the hashing or recovery could be observed. For both
scenarios, the time at which the task completed each job was recorded.
Figure 5.5 shows a timeline for Core 0 for different fault tolerance scenarios. Each
block tick represents when Core 0 finished its task and the cross tick corresponds to when
Core 0 initiated the fault recovery procedure. For all runs, each task is assigned a period of
1 millisecond. The first row labeled “No Fault Detection, (C=10µs)” represents the base
case of no fault detection and no faults. A capacity of 10 microseconds is required for the
PID control loop to run in the base case. The next row, labeled “Fault Detection Enabled
(C=10µs)”, shows that 10 microseconds is not a sufficient capacity to accommodate
both the PID control loop and the fault detection hashing. An additional 20 microseconds,
for a total of 30 microseconds, is necessary to accommodate both, as shown in row three.
Row four is similar to row three except that a correctness fault is injected during the fourth
period. When the fault is detected by the arbitrator, a message is sent to Core 0 to start
the recovery routine. The recovery routine takes significantly longer than the PID control
loop which results in multiple deadlines being missed. In order to perform the recovery
in enough time to avoid missing a deadline, the capacity needs to be increased to 90µs.
This is illustrated in the fifth row, where no deadlines are missed. Finally, the sixth row,
labeled “RAMC With Fault (LO=30µs, HI=90µs)”, shows that Recovery AMC is used to
allow the task to have a lower capacity during normal operation while still permitting the
recovery to occur in a timely manner by switching the recovering task into HI mode, giving
it the necessary capacity. After recovery, the task switches back into LO-criticality mode,
permitting any LO-criticality tasks to restore to their LO-criticality capacity. Figure 5.6
shows similar results for the Copy-on-Write (CoW) recovery except that a HI-criticality
capacity of 150µs is required to perform the recovery in a timely manner.
A similar set of experiments was performed where a timing error was injected instead
of a correctness error. Figures 5.7 and 5.8 are analogous to Figures 5.5 and 5.6. The
timeliness fault was injected during the fifth period by having the task enter an infinite
loop. The error is not detected until the end of the fifth period at which point the arbitrator
signals the faulty sandbox to initiate the recovery routine for the task. In the cases where
there is enough capacity to perform the recovery in a timely manner, the recovery occurs
after the redundant tasks have finished the sixth job. This results in the recovering task
beginning normal execution at the start of the seventh period.
5.5.1 Hashing a Subset of Memory
Additional experiments were conducted that demonstrated the overhead of fault detection
is reduced by hashing a subset of memory. This feature allows the application developer
to make a trade off in the needed capacity for the redundant applications, and therefore the
schedulability of the system, compared to the resilience of the system to soft errors. See
Section 5.2.3 for more details about hashing a subset of memory.
This resulted in a capacity of 20µs being required, as opposed to 30µs when all
memory was hashed. The recovery still required either 90µs for the Pull Dirty Recovery
or 150µs for the Copy-on-Write Recovery, as the entire address space still had to be
copied from another sandbox. Figure 5.9 shows the results of hashing a subset of memory
which are similar to the previous results with respect to recovery occurring in a timely
manner.
5.5.2 Insufficient HI-Criticality Capacity
Experiments were also conducted demonstrating the effects of decreasing the HI-criticality
capacity to a point where recovery could not occur in a timely manner. This results in
the recovering sandbox leaving the recovery process but not synchronized with the other
sandboxes. However, the additional capacity provided by the HI-criticality mode change
for the task is used to perform more computation per period and therefore catch up with the
other redundant copies. During this time the arbitrator would only be able to use results
from the instances that are not delayed, which increases the window in which a second
fault could prevent a majority vote. This is another trade-off an application developer
could make given the system requirements. Figure 5.10 demonstrates the recovery taking
multiple periods due to a lower HI-criticality capacity. As the HI-criticality capacity
decreases, it takes more time for the faulty process to come back in sync. Tables 5.1 and
5.2 show the time stamps in terms of periods and the iteration number for RCOW and LMR
respectively.

Iteration #   No Faults   Pull CoW Recovery   Pull CoW Recovery   Pull CoW Recovery
                          HI = 150µs          HI = 80µs           HI = 50µs
0             0.0212      0.0222              0.0203              0.0229
1             1.0186      1.0229              1.0213              1.0179
2             2.0254      2.0197              2.0229              2.0219
3             3.0231      3.0174              3.0226              3.0183
4             4.0210      4.0248              4.0239              4.0244
5             5.0175      5.6262              6.6294              8.5685
6             6.0245      6.0229              7.0226              8.5810
7             7.0216      7.0170              7.6281              8.5934
8             8.0233      8.0246              8.0255              9.0306
9             9.0194      9.0191              9.0223              9.5744
10            10.0174     10.0263             10.0243             10.0222
11            11.0251     11.0234             11.0260             11.0266
12            12.0223     12.0218             12.0195             12.0206
13            13.0198     13.0198             13.0206             13.0242
Table 5.1: Task Completion Times with Varying HI-Criticality Capacities for Pull
Recovery-Copy-on-Write

Iteration #   No Faults   Live Migration Recovery   Live Migration Recovery   Live Migration Recovery
                          HI = 90µs                 HI = 60µs                 HI = 40µs
0             0.0212      0.0250                    0.0256                    0.0238
1             1.0186      1.0231                    1.0231                    1.0210
2             2.0254      2.0196                    2.0210                    2.0230
3             3.0231      3.0172                    3.0239                    3.0190
4             4.0210      4.0245                    4.0215                    4.0179
5             5.0175      5.5655                    --                        --
6             6.0245      6.0239                    6.6097                    --
7             7.0216      7.0197                    7.0265                    8.0320
8             8.0233      8.0250                    8.0211                    8.5944
9             9.0194      9.0220                    9.0160                    9.0273
10            10.0174     10.0256                   10.0233                   10.0243
11            11.0251     11.0201                   11.0273                   11.0255
12            12.0223     12.0279                   12.0274                   12.0265
13            13.0198     13.0279                   13.0237                   13.0252
Table 5.2: Task Completion Times with Varying HI-Criticality Capacities for Live
Migration Recovery
Figure 5.2: Flowchart depicting interaction between sandboxes in the fault tolerance sce-
nario. Arrows between sandboxes depict messages being sent. CP represents the check-
point system call.
Figure 5.3: Flowchart depicting the Recovery-Copy-on-Write (RCOW) recovery mecha-
nism. Arrows between sandboxes depict messages being sent.
Figure 5.4:  Flowchart depicting the Recovery-Live-Migration (LMR) recovery mech-
anism.  Arrows between sandboxes depict messages being sent. 
[Timeline plot. x-axis: Time (Period), 0 to 15. y-axis: Test Scenario (P=1ms for all runs).
Rows: No Fault Detection (C=10µs); Fault Detection Enabled (C=10µs); Fault Detection
Enabled (C=30µs); With Fault (C=30µs); With Fault (C=90µs); IO-RAMC With Fault
(LO=30µs, HI=90µs).]
Figure 5.5: CPU 0 Correctness Error Timeline Live Migration Recovery
[Timeline plot. x-axis: Time (Period), 0 to 15. y-axis: Test Scenario (P=1ms for all runs).
Rows: No Fault Detection (C=10µs); Fault Detection Enabled (C=10µs); Fault Detection
Enabled (C=30µs); With Fault (C=30µs); With Fault (C=150µs); IO-RAMC With Fault
(LO=30µs, HI=150µs).]
Figure 5.6: CPU 0 Correctness Error Timeline Recovery CoW
[Timeline plot. x-axis: Time (Period), 0 to 15. y-axis: Test Scenario (P=1ms for all runs).
Rows: No Fault Detection (C=10µs); Fault Detection Enabled (C=10µs); Fault Detection
Enabled (C=30µs); With Fault (C=30µs); With Fault (C=90µs); IO-RAMC With Fault
(LO=30µs, HI=90µs).]
Figure 5.7: CPU 0 Timing Error Timeline Live Migration Recovery
[Timeline plot. x-axis: Time (Period), 0 to 15. y-axis: Test Scenario (P=1ms for all runs).
Rows: No Fault Detection (C=10µs); Fault Detection Enabled (C=10µs); Fault Detection
Enabled (C=30µs); With Fault (C=30µs); With Fault (C=150µs); IO-RAMC With Fault
(LO=30µs, HI=150µs).]
Figure 5.8: CPU 0 Timing Error Timeline Recovery CoW
[Timeline plot. x-axis: Time (Period), 0 to 15. y-axis: Test Scenario (P=1ms for all runs).
Rows: Fault Detection Enabled (C=20µs); RAMC With Correctness Fault RCoW (LO=20µs,
HI=150µs); RAMC With Correctness Fault LMR (LO=20µs, HI=90µs); RAMC With Timing
Fault RCoW (LO=20µs, HI=150µs); RAMC With Timing Fault LMR (LO=20µs, HI=90µs).]
Figure 5.9: CPU 0 Timeline Hashing Subset of Memory
[Timeline plot. x-axis: Time (Period), 0 to 15. y-axis: Test Scenario (P=1ms for all runs).
Rows: RAMC With Correctness Fault RCoW (LO=30µs, HI=50µs); RAMC With Correctness
Fault RCoW (LO=30µs, HI=80µs); RAMC With Correctness Fault RCoW (LO=30µs,
HI=150µs); RAMC With Correctness Fault LMR (LO=30µs, HI=40µs); RAMC With
Correctness Fault LMR (LO=30µs, HI=60µs); RAMC With Correctness Fault LMR
(LO=30µs, HI=90µs).]
Figure 5.10: CPU 0 Effects of Decreasing HI-Criticality Capacity
Chapter 6
Conclusion
This thesis demonstrates that software techniques provide efficient and predictable re-
silience to soft errors and missed deadlines for high confidence, real-time systems. Soft
errors can occur more often in environments that real-time systems are exposed to, e.g.
aerospace. Additionally, missed deadlines can be just as serious as incorrect results be-
cause of the real-world interaction real-time systems typically have. For example, an au-
tonomous vehicle missing a deadline could result in a collision with another vehicle or a
pedestrian.
Two main approaches to solve the problem of soft errors and missed deadlines were
introduced in this thesis. The first builds on the scheduling framework in the Quest real-
time operating system, comprising a collection of Sporadic Servers for tasks and Priority
Inheritance Bandwidth-Preserving Servers (PIBS) for interrupt handlers. A response time
analysis for a collection of Sporadic Servers and PIBS in a system without mixed criticality
levels was introduced. The analysis was then extended to support an I/O Adaptive
Mixed-Criticality (IO-AMC) model in a system comprising tasks and interrupt handlers. The
IO-AMC response time bound considers a mode change to high-criticality when insuffi-
cient resources exist for either high-criticality tasks or interrupt handlers in low-criticality
mode. The analysis considers the interference from low-criticality tasks and interrupt han-
dlers before the mode change. Recovery Adaptive Mixed-Criticality (RAMC) was then
introduced to address the problem of performing a mode change for a specific subset of
tasks instead of the entire system. The two approaches were combined resulting in IO-
RAMC which permitted a subset of Sporadic Server and PIBS tasks to change criticality
modes.
Simulation results show that a system of only Sporadic Servers for both tasks and in-
terrupt handlers has a higher theoretical number of schedulable task sets. However, in
practice, using PIBS to handle interrupts is shown to be superior because of lower system
overheads. This thesis also shows experimental results in the Quest real-time operating
system, where criticality levels are assigned to devices. This enables high criticality de-
vices to gain more computational time when insufficient resources exist to service both
high and low criticality tasks and interrupt bottom halves. In turn, this enables high crit-
icality tasks that issue I/O requests to be granted more CPU time to meet their deadlines.
Simulation results also show that RAMC and IO-RAMC resulted in higher schedulability
compared to AMC and IO-AMC. Experimental results in Quest showed the usefulness
of IO-RAMC. Specifically, a HI-criticality faulty task was able to recover using its
HI-criticality capacity while permitting the other HI-criticality tasks to stay in a
LO-criticality state. This results in more capacity being available to the LO-criticality tasks.
The second main approach introduced in this thesis is an N-Modular Redundancy technique
based upon the Quest-V Virtualized Separation Kernel called Virtualized Fault Tol-
erance (VFT). VFT uses the isolation provided by the Quest-V hypervisor to create sep-
arate redundant process images that exist in different sandboxes. Using shared memory
as the communication medium between sandboxes novel fault recovery techniques were
introduced, specifically Recovery-Copy-on-Write (RCOW) and Live Migration Recovery
(LMR). Both techniques take advantage of the fact that a faulty process instance state is
restored using a correct state that exists on the same multicore CPU. Both techniques also
provide guarantees in terms of the cost of recovery so recovery is accurately accounted
for in the scheduling of the system. Each technique is more suited for certain types of
program behavior. For example, RCOW is more suited for scenarios where multiple it-
erations of LMR would occur because a program is modifying a large number of pages
every iteration. Finally, VFT was combined with RAMC to show that RAMC is an appro-
priate scheduling technique to handle the recovery of tasks in VFT and it provides higher
schedulability than traditional AMC.
Chapter 7
Future Work
This chapter describes future work for Quest-V related to Adaptive Mixed-Criticality and
fault tolerance.
7.1 AMC Model Where Task Periods Change
Adaptive Mixed-Criticality scheduling approaches typically focus on increasing the bud-
get available to a task to permit it to complete a job when its budget has been exhausted
before job completion. However, there exist scenarios where instead of directly increasing
the budget it would be more beneficial to decrease the period of a task when moving from
low to high criticality. For example, if wind conditions began to destabilize the quadcopter,
it would benefit from being able to read sensor values and perform motor adjustments at a
higher frequency. While the budget of the task could be increased, AMC, just like the
traditional periodic task model, makes no guarantee about when the task will execute
within its period. The additional budget could therefore be available back to back with
the original, which would effectively result in the same frequency of updates, just with
two or more updates happening back to back.
One naive approach to this problem would be to represent each HI-criticality task as
two tasks, one that only ran in the LO-criticality mode and one that only ran in the HI-
criticality mode with a smaller period. This is not optimal because if such a task set was
applied to an AMC mode change model the LO-criticality task would interfere with its HI-
criticality counterpart. An analysis that did not decouple the two tasks would most likely
result in a higher schedulability than one that did.
7.2 Arbitrator, Kernel and Hypervisor Soft Error Protection
The fault tolerance detection and recovery mechanism presented in this thesis was de-
signed to protect a process from a soft error. This is because the redundancy occurs at
the process layer and therefore TMR is applicable at the process level. Enforcing com-
plete redundancy to occur at the kernel layer would require each sandbox to run identical
processes, which would lower the total utilization of the system if not all processes had
to be fault tolerant. However, there would be some redundancy in terms of hypervisor
and kernel code sections as the hypervisor and kernel are duplicated across the sandboxes.
Alternative fault tolerance techniques could be applied to the non-redundant kernel and
hypervisor portions. For example, arithmetic encoding [FSS09] could be used to protect
kernel and hypervisor data structures. In arithmetic encoding, redundancy is added to data
resulting in a larger domain of values where only a subset is valid. One such example is
the AN-code, where a value is multiplied by a constant A so that the only valid encoded
values are multiples of A. If an encoded value is not a multiple of A, then the system has
experienced an error. Hoffmann, Dietrich and Lohmann applied arithmetic encoding to a
smaller specialized operating system for automobiles [HDL13], but this approach has not
been explored for more general operating systems such as Quest. Such techniques could also
be appropriate
for the arbitrator which has only one process instance. Future work would investigate in-
corporating a similar approach to the arbitrator process and Quest kernel and hypervisor
in combination with the VFT approach discussed in this thesis to provide an even more
robust fault tolerant system.
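The AN-code check described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the constant A and the helper names (an_encode, an_is_valid, an_decode, flip_bit) are hypothetical and are not taken from Quest, [FSS09] or [HDL13].

```python
# Illustrative AN-code sketch. A is a hypothetical constant chosen for this
# example; it is not taken from the thesis or from [FSS09].
A = 58659  # odd and greater than 1, so A never divides a power of two


def an_encode(value: int) -> int:
    """Encode a value as a multiple of A."""
    return value * A


def an_is_valid(word: int) -> bool:
    """Only multiples of A are valid code words."""
    return word % A == 0


def an_decode(word: int) -> int:
    """Check the code word, then recover the original value."""
    if not an_is_valid(word):
        raise ValueError("invalid code word: possible soft error")
    return word // A


def flip_bit(word: int, bit: int) -> int:
    """Model a single-bit soft error in a stored code word."""
    return word ^ (1 << bit)


# Addition can be performed directly on encoded values, since the sum of
# two multiples of A is again a multiple of A.
total = an_encode(3) + an_encode(4)
corrupted = flip_bit(total, 5)
```

Here an_decode(total) recovers 7, while an_is_valid(corrupted) is false. A single bit flip changes a code word by a power of two, and an odd A greater than 1 never divides a power of two, so every single-bit flip in this sketch produces an invalid word.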
7.3 Sporadic Server and PIBS Dependency Scheduling
The scheduling analyses in this thesis assume that tasks and I/O bottom half interrupt
handlers are executed on separate servers that are independent of one another. In practice,
a task may be blocked from execution until a pending I/O request is completed. Future
work will consider more complex task models where I/O requests lead to blocking delays
that impact the execution of tasks. This would allow a full end-to-end analysis of tasks
that require I/O, e.g. sensor data, before performing a task.
Also, resources can be shared between LO- and HI-criticality tasks. During a mode
change, if a resource were held by a LO-criticality task, this would adversely affect a
HI-criticality task that needed it. To ensure HI-criticality tasks meet all deadlines, a
scheduling framework that incorporates resource sharing as well as I/O would need to be
considered.
7.4 Fault Recovery Scheduling to Reduce Recovery Time
The recovery mechanisms described in Chapter 5 would be made more efficient if the
process being copied were not executing during the recovery phase. For Recovery
Copy-on-Write (RCOW), if the process being copied is not executing, the copy-on-write
mechanism does not need to be invoked. For Live Migration Recovery (LMR), only one
iteration of the copying has to be performed if the process being copied is not executing.
If the scheduler could guarantee that the process being copied is not executing, the
necessary capacity for recovery would be reduced, which would increase total
schedulability. Future work would investigate scheduling techniques that could allow such
guarantees to be made with regard to which tasks are running during recovery.
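The effect of a non-executing process on iterative copying can be illustrated with a toy model. The sketch below is not the LMR implementation from Chapter 5; it assumes a generic iterative copy in which pages written during one round must be copied again in the next, and the function name precopy_rounds and the parameter dirty_ratio are hypothetical.

```python
def precopy_rounds(total_pages: int, dirty_ratio: float, max_rounds: int = 32) -> list:
    """Return the number of pages copied in each round of an iterative copy.

    dirty_ratio is a hypothetical parameter: pages dirtied by the running
    process per page copied. A process that is not executing has ratio 0.
    """
    rounds = []
    to_copy = total_pages
    while to_copy > 0 and len(rounds) < max_rounds:
        rounds.append(to_copy)
        # Pages written during this round must be copied again next round.
        to_copy = min(total_pages, int(to_copy * dirty_ratio))
    return rounds


idle = precopy_rounds(1024, 0.0)     # process not executing during recovery
running = precopy_rounds(1024, 0.5)  # process executing during recovery
```

Under these assumptions, the idle case needs a single round of 1024 pages, while the running case needs 11 rounds and copies 2047 pages in total. The difference is the recovery capacity that a scheduling guarantee of this kind could save.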
Bibliography
[Aud91] Neil C Audsley. Optimal priority assignment and feasibility of static priority
tasks with arbitrary start times. Technical Report YCS 164, Department of
Computer Science, University of York, November 1991.
[Aud01] Neil C Audsley. On priority assignment in fixed priority scheduling. Infor-
mation Processing Letters, 79(1):39–44, 2001.
[Avi78] Algirdas Avižienis. Fault-tolerance: The survival attribute of digital systems.
Proceedings of the IEEE, 66(10):1109–1125, 1978.
[Bau05] Robert Baumann. Soft errors in advanced computer systems. Design & Test
of Computers, IEEE, 22(3):258–266, 2005.
[BB05] Enrico Bini and Giorgio C. Buttazzo. Measuring the performance of schedu-
lability tests. Journal of Real-Time Systems, 30(1–2):129–154, 2005.
[BB13] Alan Burns and SK Baruah. Towards a more practical model for mixed
criticality systems. In 1st International Workshop on Mixed Criticality Sys-
tems (WMC) at the 34th IEEE Real-Time Systems Symposium, Vancouver,
Canada, 2013.
[BBA10] Andrea Bastoni, Björn Brandenburg, and James Anderson. Cache-related
preemption and migration delays: Empirical approximation and impact on
schedulability. Proceedings of OSPERT, pages 33–44, 2010.
[BBD11] Sanjoy K Baruah, Alan Burns, and Robert I Davis. Response-time analy-
sis for mixed criticality systems. In Real-Time Systems Symposium (RTSS),
pages 34–43. IEEE, 2011.
[BCDGG00] Andrea Bondavalli, Silvano Chiaradonna, Felicita Di Giandomenico, and
Fabrizio Grandoni. Threshold-based mechanisms to discriminate transient
from intermittent faults. Computers, IEEE Transactions on, 49(3):230–245,
2000.
[BDF+03] Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex
Ho, Rolf Neugebauer, Ian Pratt, and Andrew Warfield. Xen and the art of
virtualization. ACM SIGOPS Operating Systems Review, 37(5):164–177,
2003.
[BHMK95] Robert Baumann, Tim Hossain, Shinya Murata, and Hideki Kitagawa.
Boron compounds as a dominant source of alpha particles in semiconductor
devices. In Reliability Physics Symposium, 1995. 33rd Annual Proceedings.,
IEEE International, pages 297–302. IEEE, 1995.
[BS95] Thomas C Bressoud and Fred B Schneider. Hypervisor-based fault toler-
ance. ACM SIGOPS Operating Systems Review, 29(5):1–11, 1995.
[Cis16] Cisco. The future is 40 gigabit ethernet. Technical report, 2016.
[CLM+08] Brendan Cully, Geoffrey Lefebvre, Dutch Meyer, Mike Feeley, Norm
Hutchinson, and Andrew Warfield. Remus: High availability via asyn-
chronous virtual machine replication. In Proceedings of the 5th USENIX
Symposium on Networked Systems Design and Implementation, pages 161–
174. San Francisco, 2008.
[cra] Crazyflie. https://www.bitcraze.io/crazyflie-2/.
[Cri91] Flaviu Cristian. Understanding fault-tolerant distributed systems.
Communications of the ACM, 34(2):56–78, 1991.
[DH13] Björn Döbel and Hermann Härtig. Where have all the cycles gone?
investigating runtime overheads of os-assisted replication. In Workshop on
Software-Based Methods for Robust Embedded Systems, SOBRES, vol-
ume 13, 2013.
[DLW11] Matthew Danish, Ye Li, and Richard West. Virtual-CPU scheduling in the
Quest operating system. In Real-Time and Embedded Technology and Appli-
cations Symposium (RTAS), 2011 17th IEEE, pages 169–179. IEEE, 2011.
[Dre07] Ulrich Drepper. What every programmer should know about memory. 2007.
[DS10] Alex Depoutovitch and Michael Stumm. Otherworld: giving applications
a chance to survive os kernel crashes. In Proceedings of the 5th European
conference on Computer systems, pages 181–194. ACM, 2010.
[EKBSS13] Jan Elders, Martina Kunze-Busch, Robert Jan Smeenk, and Joep LRM
Smeets. High incidence of implantable cardioverter defibrillator malfunc-
tions during radiation therapy: neutrons as a probable cause of soft errors.
Europace, 15(1):60–65, 2013.
[FSS09] Christof Fetzer, Ute Schiffel, and Martin Süßkraut. An-encoding compiler:
Building safety-critical systems with commodity hardware. In Computer
Safety, Reliability, and Security, pages 283–296. Springer, 2009.
[GA03] Sudhakar Govindavajhala and Andrew W Appel. Using memory errors to
attack a virtual machine. In Security and Privacy, 2003. Proceedings. 2003
Symposium on, pages 154–165. IEEE, 2003.
[HDL13] Martin Hoffmann, Christian Dietrich, and Daniel Lohmann. dOSEK: A
dependable RTOS for automotive applications. In Proceedings of the
19th IEEE Pacific Rim International Symposium on Dependable Computing
(PRDC ’13), Vancouver, British Columbia, Canada, December 2013. IEEE
Computer Society Press.
[Hec76] Herbert Hecht. Fault-tolerant software for real-time applications. ACM
Computing Surveys (CSUR), 8(4):391–407, 1976.
[HT94] Vassos Hadzilacos and Sam Toueg. A modular approach to fault-tolerant
broadcasts and related problems. 1994.
[Ise05] Rolf Isermann. Model-based fault-detection and diagnosis–status and appli-
cations. Annual Reviews in control, 29(1):71–85, 2005.
[JED08] JEDEC. DDR3 SDRAM, 11 2008.
[KBRJ12] Junsung Kim, Gaurav Bhatia, Ragunathan Raj Rajkumar, and Markus
Jochim. SAFER: System-level architecture for failure evasion in real-time
applications. In Real-Time Systems Symposium (RTSS), pages 227–236.
IEEE, 2012.
[KWFT88] RM Kieckhafer, Chris J. Walter, Alan M. Finn, and Philip M. Thambidurai.
The MAFT architecture for distributed fault tolerance. Computers, IEEE
Transactions on, 37(4):398–404, 1988.
[LGM+96] Kenneth A LaBel, Michele M Gates, Amy K Moran, Paul W Marshall, Janet
Barth, EG Stassinopoulos, Christina M Seidleck, and Cheryl J Dale. Com-
mercial microelectronics technologies for applications in the satellite radi-
ation environment. Proc. IEEE Aerospace Applications, pages 375–390,
1996.
[LH94] Jaynarayan H Lala and Richard E Harper. Architectural principles for safety-
critical real-time applications. Proceedings of the IEEE, 82(1):25–40, 1994.
[Lie95] J. Liedtke. On micro-kernel construction. In Proceedings of the Fifteenth
ACM Symposium on Operating Systems Principles, SOSP ’95, pages 237–
250, New York, NY, USA, 1995. ACM.
[LL73] Chung Laung Liu and James W Layland. Scheduling algorithms for multi-
programming in a hard-real-time environment. Journal of the ACM (JACM),
20(1):46–61, 1973.
[LLJLF+96] Jacques-Louis Lions, Lennart Lübeck, Gilles Kahn, Jean-Luc Fauquembergue,
Wolfgang Kubbat, Stefan Levedag, Didier Merle, Leonardo Mazzini,
and Colin O’Halloran. Ariane 5 flight 501 failure, July 1996.
[LWCM14] Ye Li, Richard West, Zhuoqun Cheng, and Eric Missimer. Predictable com-
munication and migration in the Quest-V separation kernel. In Real-Time
Systems Symposium (RTSS). IEEE, 2014.
[MAAB13] Hamid Mushtaq, Zaid Al-Ars, and Koen Bertels. Efficient software-based
fault tolerance approach on multicore platforms. In Proceedings of the Con-
ference on Design, Automation and Test in Europe, pages 921–926. EDA
Consortium, 2013.
[MAL+15] Peter Munk, Mohammad Shadi Alhakeem, Raphael Lisicki, Helge Parzy-
jegla, Jan Richling, and Hans-Ulrich Heiß. Toward a fault-tolerance frame-
work for cots many-core systems. In Dependable Computing Conference
(EDCC), 2015 Eleventh European, pages 167–177. IEEE, 2015.
[Mel11] Mellanox. Mellanox infiniband fdr 56gb/s for server and storage intercon-
nect solutions. Technical report, 2011.
[Nor96] Eugene Normand. Single event upset at ground level. Nuclear Science, IEEE
Transactions on, 43(6):2742–2750, 1996.
[NPK+85] Donald K Nichols, William E Price, WA Kolasinski, R Koga, James C
Pickel, James T Blandford, and AE Waskiewicz. Trends in parts suscep-
tibility to single event upset from heavy ions. Nuclear Science, IEEE Trans-
actions on, 32(6):4189–4194, 1985.
[OBF+93] J Olsen, PE Becher, PB Fynbo, P Raaby, and J Schultz. Neutron-induced
single event upsets in static rams observed at 10 km flight altitude. Nuclear
Science, IEEE Transactions on, 40(2):74–77, 1993.
[Que] Quest. http://www.QuestOS.org.
[SBWH10] Mark Stanovich, Theodore P Baker, An-I Wang, and Michael González
Harbour. Defects of the POSIX sporadic server and how to correct them. In
Real-Time and Embedded Technology and Applications Symposium (RTAS),
2010 16th IEEE, pages 35–45. IEEE, 2010.
[SLR86] Lui Sha, John P Lehoczky, and Ragunathan Rajkumar. Solutions for some
practical problems in prioritized preemptive scheduling. In Real-Time Sys-
tems Symposium (RTSS), volume 86, pages 181–191, 1986.
[Spr90] Brinkley Sprunt. Aperiodic task scheduling for real-time systems. PhD the-
sis, Carnegie Mellon University, 1990.
[SSAW92] MS Shea, DF Smart, JH Allen, and DC Wilkinson. Spacecraft problems
in association with episodes of intense solar activity and related terrestrial
phenomena during march 1991. Nuclear Science, IEEE Transactions on,
39(6):1754–1760, 1992.
[Uni14] United States Department of Transportation, Bureau of Transportation
Statistics. Air carriers: T-100 domestic market (U.S. carriers). http:
//www.transtats.bts.gov/, Dec 2014.
[Ves07] Steve Vestal. Preemptive scheduling of multi-criticality systems with vary-
ing degrees of execution time assurance. In Real-Time Systems Symposium
(RTSS), pages 239–243. IEEE, 2007.
[WBB+05] Jeffrey D Wilkinson, Chad Bounds, Timothy Brown, Bruce J Gerbi, and
Joel Peltier. Cancer-radiotherapy equipment as a cause of soft errors in elec-
tronic equipment. Device and Materials Reliability, IEEE Transactions on,
5(3):449–451, 2005.
[WDS+91] Daniel C Wilkinson, Stuart C Daughtridge, John L Stone, Herbert H Sauer,
and Phil Darling. TDRS-1 single event upsets and the effect of the space
environment. Nuclear Science, IEEE Transactions on, 38(6):1708–1712,
1991.
[WLG+78] John H Wensley, Leslie Lamport, Jack Goldberg, Milton W Green, Karl N
Levitt, PM Melliar-Smith, Robert E Shostak, and Charles B Weinstock. Sift:
Design and analysis of a fault-tolerant computer for aircraft control. Pro-
ceedings of the IEEE, 66(10):1240–1255, 1978.
[WLMD16] Richard West, Ye Li, Eric Missimer, and Matthew Danish. A virtualized
separation kernel for mixed-criticality systems. ACM Transactions on Com-
puter Systems (TOCS), 34(3):8, 2016.
Curriculum Vitae
