Resource-Aware Replication on Heterogeneous Multicores: Challenges and
  Opportunities by Döbel, Björn et al.
Björn Döbel, Robert Muschner, Hermann Härtig
Technische Universität Dresden
Dresden, Germany
{doebel,robemusc,haertig}@tudos.org
Abstract
Decreasing hardware feature sizes and increasing het-
erogeneity in multicore hardware require software that
can adapt to these platforms’ properties. We imple-
mented ROMAIN, an OS service providing redundant
multithreading on top of the FIASCO.OC microkernel to
address the increasing unreliability of hardware. In this
paper we review challenges and opportunities for RO-
MAIN to adapt to such multicore platforms in order to
decrease execution overhead, resource requirements, and
vulnerability against faults.
1 Introduction
Commercial-off-the-shelf (COTS) hardware components
are becoming more powerful and complex with every
hardware generation. Decreasing transistor sizes are an
enabler for these developments, because vendors can
now add ever more functional units with lower energy
consumption. The downside of this development is that
processors become more vulnerable to permanent and
transient hardware errors [4]. This trend is expected to
increase and poses a serious threat to future hardware
generations [10].
While hardware fault-tolerance mechanisms to miti-
gate these reliability issues exist [3, 8], COTS vendors
try to avoid such extensions because they make hard-
ware more expensive to produce. Researchers addressed
this problem by proposing software-level fault tolerance
methods that run on COTS hardware. These methods
come in the form of compiler extensions [15] and exten-
sions to the runtime environment [18].
The increasing availability of varying processor units
leads to a second trend: today’s hardware platforms are
becoming more heterogeneous. Non-uniform memory
architectures and cache hierarchies require new forms
of scheduling threads and data [22]. The availability of
specialized compute units and general-purpose GPUs in-
creases the complexity of deciding when and where to
run an application workload [13].
We designed ASTEROID, a fault-tolerant operating
system architecture as shown in Figure 1. This archi-
tecture combines the FIASCO.OC microkernel, the cor-
responding L4Re user-level runtime environment, and
ROMAIN [6], an operating system service that pro-
vides replicated execution for binary applications using
software-implemented redundant multithreading [14].
Figure 1: ASTEROID Resilient OS Architecture. [Components: Replicated Driver and Replicated Application on top of the L4 Runtime Environment and Romain, which run on the Fiasco.OC microkernel.]
In this paper we survey ROMAIN in Section 2. There-
after we discuss three challenges with respect to adapting
to heterogeneous hardware and evaluate how ROMAIN
may aid or benefit from these scenarios. In Section 3 we
show how the underlying cache hierarchy interacts with
replication performance. Thereafter we present a method
to adapt the number of ROMAIN replicas to environmen-
tal vulnerability conditions in Section 4. Last, we show
how the ROMAIN software architecture can be mapped to
CPU cores with different resilience levels in order to bet-
ter protect ROMAIN’s Reliable Computing Base (RCB)
in Section 5.
2 Replication as an OS Service
Compiler extensions that provide software-implemented
fault tolerance require the protected software’s source
code to be available for recompilation. This is often im-
possible as many vendors distribute software as binaries.
Replication-based fault tolerance schemes do not share
this requirement and can therefore protect binary
applications.
Copyright is held by the author/owner(s).
1st Workshop on Resource Awareness and Adaptivity in Multi-Core
Computing (Racing 2014), May 29–30, 2014, Paderborn, Germany.
We implemented ROMAIN, an extension to FI-
ASCO.OC’s L4 Runtime Environment (L4Re).1 RO-
MAIN provides a software implementation of redundant
multithreading [14]. As shown in Figure 2, multiple
replicas of a protected application run in isolated address
spaces on different CPU cores in a multicore system. A
master process controls these replicas by managing their
resources and intercepting all their communication to the
outside world. The master thereby makes sure that repli-
cas always obtain identical inputs and produce the same
outputs unless they are affected by a hardware fault.
Figure 2: ROMAIN Replication Architecture. [Three replicas on CPU 0, CPU 1, and CPU 2 are controlled by the Romain Master, which contains a System Call Proxy, a Memory Manager, and an output comparison step (shown as =).]
ROMAIN executes replicas using FIASCO.OC’s vir-
tual CPU (vCPU) mechanism [11]. This OS feature al-
lows the master process to intercept any externalization
event (e.g., system calls, page faults, CPU exceptions)
that is generated by a replica. Replicas therefore never
directly interact with applications outside their sphere of
replication [14].
When encountering an externalization event, the mas-
ter process first compares the replicas’ states to validate
that they still match. After successful validation, the
replicas’ externalization event is handled. For OS sys-
tem calls, handling means that the master performs the
respective system call on behalf of the replicas, while the
replicas remain blocked. Once the system call returns,
the master adjusts the replicas’ states as if they had is-
sued the system call themselves. Thereafter, the replicas
resume independent execution until they reach their next
externalization event.
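The validate-then-handle sequence described above can be illustrated with a minimal Python model. It is a sketch under stated assumptions, not ROMAIN's implementation: replica state is reduced to an instruction pointer and a register dictionary (memory contents are ignored), `ReplicaState`, `handle_externalization`, and the `perform_syscall` callback are hypothetical names, repairing a minority replica by copying the majority state is one possible recovery policy, and the result is written to `rax` following the x86-64 return-register convention.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ReplicaState:
    """Hypothetical snapshot of a replica at an externalization event."""
    ip: int                                   # instruction pointer
    regs: dict = field(default_factory=dict)  # register name -> value


def _fingerprint(state):
    # Hashable summary used to compare replica states.
    return (state.ip, tuple(sorted(state.regs.items())))


def handle_externalization(replicas, perform_syscall):
    """Validate replica states, then perform the system call once for all.

    Returns the system call result. Raises RuntimeError if no strict
    majority of replicas agrees (e.g., any mismatch under DMR).
    """
    votes = Counter(_fingerprint(r) for r in replicas)
    winner, count = votes.most_common(1)[0]
    if count * 2 <= len(replicas):
        raise RuntimeError("replica states diverge and no majority exists")
    reference = next(r for r in replicas if _fingerprint(r) == winner)
    # Assumed recovery policy: overwrite minority replicas with the majority state.
    for r in replicas:
        if _fingerprint(r) != winner:
            r.ip, r.regs = reference.ip, dict(reference.regs)
    # The master performs the call on the replicas' behalf while they stay blocked...
    result = perform_syscall(dict(reference.regs))
    # ...then adjusts each replica as if it had issued the call itself.
    for r in replicas:
        r.regs["rax"] = result  # assumption: x86-64 return register
    return result
```

With three replicas, a single diverging replica is outvoted and overwritten; with two replicas (DMR), a mismatch can only be detected, so the sketch raises an error instead of guessing which replica is correct.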
Microkernels such as FIASCO.OC implement memory management
policies in user-space applications [2]. ROMAIN leverages this
property to maintain full control over the resources owned by
each replica. Any replica request to create a new kernel
object is intercepted and emulated by the master process.
Furthermore, the master acts as the replicas’ memory manager.
It keeps track of which regions in virtual memory are attached
to the replicas and services any page faults that arise while
replicas execute.
1 http://l4re.org
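A toy model of the master acting as the replicas’ memory manager might look as follows. It assumes 4 KiB pages and page-aligned regions, and `Region`, `MemoryManager`, `attach`, and `handle_page_fault` are hypothetical names; a real pager would install a mapping through the kernel rather than return a Python `memoryview`. Giving each replica its own backing copy mirrors the full memory replication discussed in Section 4.

```python
import bisect
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumption: 4 KiB pages


@dataclass
class Region:
    start: int    # virtual start address (page aligned)
    size: int     # length in bytes
    copies: list  # one private bytearray per replica


class MemoryManager:
    """Toy model of the master as the replicas' pager (hypothetical API)."""

    def __init__(self, num_replicas):
        self.num_replicas = num_replicas
        self.starts = []   # sorted region start addresses
        self.regions = {}  # start address -> Region

    def attach(self, start, size):
        # Each replica gets its own copy of the region's backing memory.
        copies = [bytearray(size) for _ in range(self.num_replicas)]
        bisect.insort(self.starts, start)
        self.regions[start] = Region(start, size, copies)

    def handle_page_fault(self, replica_id, addr):
        """Return (page_base, backing page) to map for this replica."""
        i = bisect.bisect_right(self.starts, addr) - 1
        if i < 0:
            raise LookupError(f"no region attached at {addr:#x}")
        region = self.regions[self.starts[i]]
        if addr >= region.start + region.size:
            raise LookupError(f"no region attached at {addr:#x}")
        page_base = addr - addr % PAGE_SIZE
        offset = page_base - region.start
        page = memoryview(region.copies[replica_id])[offset:offset + PAGE_SIZE]
        return page_base, page
```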
ROMAIN distributes replicas across the available CPU
cores in a multicore system. This has two advantages:
first, this approach adds another layer of redundancy to
the system. If one of the CPU cores encounters a perma-
nent hardware fault, only a single replica is affected by
this problem. ROMAIN can detect the resulting error and
correct it, for instance, by reassigning the faulty replica to
another CPU core. Second, using multiple CPUs allows
replicas to execute concurrently as long as they only per-
form internal computations. This approach therefore also
minimizes the runtime overhead for replicated execution.
Our initial implementation of replica assignment dis-
tributed replica threads sequentially across the available
CPUs, starting at CPU 0. We will see in the follow-
ing section that this strategy – which does not incorpo-
rate any knowledge about the underlying platform and
its properties – is far from ideal.
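For contrast with the platform-aware strategies discussed next, the initial sequential assignment and the recovery step of moving a replica off a permanently faulty core (described earlier in this section) can be sketched as follows. CPUs are plain integer IDs, and both function names are hypothetical.

```python
def sequential_assignment(num_replicas, num_cpus):
    """Initial, platform-unaware strategy: replica i runs on CPU i."""
    if num_replicas > num_cpus:
        raise ValueError("need one distinct CPU per replica")
    return {replica: replica for replica in range(num_replicas)}


def reassign_faulty(assignment, faulty_cpu, num_cpus):
    """Move the replica on a permanently faulty CPU to an unused CPU.

    Hypothetical recovery step; returns a new replica -> CPU mapping.
    The faulty CPU is never picked because it is still marked as used.
    """
    used = set(assignment.values())
    spare = next((cpu for cpu in range(num_cpus) if cpu not in used), None)
    if spare is None:
        raise RuntimeError("no spare CPU left")
    return {r: (spare if cpu == faulty_cpu else cpu) for r, cpu in assignment.items()}
```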
3 Adapting to Resource Requirements
In modern multicore platforms, CPUs are often distributed
across multiple sockets. CPUs on a socket share caches and
local memory. Inter-processor interrupts (IPIs) between CPUs
on the same socket are delivered faster than IPIs to CPUs on
a different socket. To optimize runtime overheads, ROMAIN
therefore needs to be aware of the cache and communication
hierarchy when placing replicas on CPUs.
We performed an experiment on a multicore system with
two sockets, each holding a six-core Intel Xeon X5650
processor clocked at 2.66 GHz. We analyzed the sources
of replication overhead for ROMAIN and found that the
IPIs sent between replicas and the master for every
externalization event are major contributors to overhead.
For the platform in question, an IPI requires on average
5,900 CPU cycles for intra-socket communication and
14,300 CPU cycles for inter-socket communication.
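Converted to wall-clock time at the 2.66 GHz clock rate, these measurements correspond to roughly 2.22 µs per intra-socket IPI and 5.38 µs per inter-socket IPI, so an inter-socket IPI costs about 2.4 times as much. The short Python sketch below performs this conversion and estimates total IPI time for a run; the number of IPIs per externalization event (`ipis_per_event`) and the event count are hypothetical inputs, not measurements from this paper.

```python
CLOCK_HZ = 2.66e9          # clock rate of the Xeon X5650 test machine
IPI_INTRA_CYCLES = 5_900   # measured average, same socket
IPI_INTER_CYCLES = 14_300  # measured average, across sockets


def cycles_to_us(cycles, hz=CLOCK_HZ):
    """Convert a cycle count to microseconds at the given clock rate."""
    return cycles / hz * 1e6


def ipi_time_s(events, ipis_per_event, cycles_per_ipi, hz=CLOCK_HZ):
    """Total IPI time in seconds; events and ipis_per_event are hypothetical."""
    return events * ipis_per_event * cycles_per_ipi / hz


print(f"intra-socket: {cycles_to_us(IPI_INTRA_CYCLES):.2f} us")  # 2.22 us
print(f"inter-socket: {cycles_to_us(IPI_INTER_CYCLES):.2f} us")  # 5.38 us
```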
Based on this observation we implemented a core assignment
algorithm in ROMAIN that tries to place replicas on CPUs of
the same socket in order to save IPI overhead. With these
optimizations in place, we executed 11 of the 12 SPEC INT
2006 benchmarks in ROMAIN.2 We show three classes of results
in Figure 3 and compare them to native execution (i.e.,
without ROMAIN). First, we executed a single instance of each
benchmark using ROMAIN. This setup does not provide any fault
tolerance, but solely measures the runtime overhead induced by
intercepting system calls and proxying them through the
ROMAIN master. The second class of experiments shows the
overhead for running two replicas (DMR), which provides error
detection. Finally, a third set of experiments shows ROMAIN
running in triple-modular-redundant (TMR) mode. This last
setup shows the overhead required to tolerate a single faulty
replica.
2 We left out 483.xalancbmk because it uses deprecated C++ STL
features that are not supported by L4Re’s standard C++ library.
Figure 3: Overhead for replicating the SPEC INT 2006 benchmarks with one, two, and three replicas compared to native execution. Geometric mean overheads: GM(DMR) = 0.66%, GM(TMR) = 2.51%. [Benchmarks: 400.perl, 401.bzip2, 403.gcc, 429.mcf, 445.gobmk, 456.hmmer, 458.sjeng, 462.libquantum, 464.h264ref, 471.omnet++, 473.astar; y-axis: runtime normalized vs. native execution (axis range 1 to 1.3; the values 1.45 and 1.95 also appear as annotations); series: Single, DMR, TMR.]
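The geometric-mean figures reported in Figure 3 summarize per-benchmark ratios. Assuming the common convention of taking the geometric mean of runtimes normalized to native execution and subtracting one (the text does not spell out the exact formula), the computation can be sketched as:

```python
import math


def geometric_mean_overhead(normalized_runtimes):
    """Geometric-mean overhead in percent from runtimes normalized to native.

    A normalized runtime of 1.05 means 5% slower than native execution.
    """
    if not normalized_runtimes or min(normalized_runtimes) <= 0:
        raise ValueError("need positive normalized runtimes")
    log_mean = sum(math.log(x) for x in normalized_runtimes) / len(normalized_runtimes)
    return (math.exp(log_mean) - 1.0) * 100.0
```

For example, normalized runtimes of 1.0 and 1.21 yield a geometric-mean overhead of 10%; these inputs are illustrative, not the measured SPEC values.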
Replication with ROMAIN is acceptably cheap in most cases:
the geometric mean overhead for TMR execution is 2.51%.
However, the results show that four of the benchmarks induce
considerably higher overheads, so we investigated these
benchmarks more thoroughly. The high overheads for the
403.gcc benchmark can be attributed to its unusual memory
access patterns, which cause high memory management overhead
and also trigger errors in the current ROMAIN prototype’s
memory management.3
We inspected the 429.mcf, 462.libquantum, and
471.omnet++ benchmarks more closely using hardware
performance counters. We found that these benchmarks
cause a large number of last-level cache misses. All
replicas share a single L3 cache when placed on the same
CPU socket and this cache then becomes a replication
bottleneck. We adapted ROMAIN’s CPU placement algo-
rithm to distribute replicas across our two CPU sockets,
thereby giving replicas access to two separate L3 caches.
We then re-ran the benchmarks and show the improved
benchmark results in Figure 4. The experiment confirms
that our initial CPU assignment strategy hurts cache-
bound applications and that these applications should be
distributed across CPU sockets.
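The two placement policies compared in this section, packing replicas onto one socket versus distributing them across sockets, can be sketched as a single function. The socket topology is passed in as lists of CPU IDs; the example numbering (CPUs 0 to 5 on the first socket, 6 to 11 on the second) is an assumption, since real systems may number cores differently, and the placement of the master itself is ignored. The function name is hypothetical.

```python
def place_replicas(num_replicas, sockets, spread):
    """Pick one CPU per replica.

    sockets: list of CPU-ID lists, one list per socket.
    spread=False packs replicas onto as few sockets as possible
    (cheaper intra-socket IPIs); spread=True deals them round-robin
    across sockets so that they use separate last-level caches.
    """
    if spread:
        order = []
        depth = max(len(cpus) for cpus in sockets)
        for i in range(depth):
            for cpus in sockets:
                if i < len(cpus):
                    order.append(cpus[i])
    else:
        order = [cpu for cpus in sockets for cpu in cpus]
    if num_replicas > len(order):
        raise ValueError("not enough CPUs")
    return order[:num_replicas]


sockets = [list(range(0, 6)), list(range(6, 12))]
print(place_replicas(3, sockets, spread=False))  # [0, 1, 2]
print(place_replicas(3, sockets, spread=True))   # [0, 6, 1]
```

With two sockets and TMR, spreading still leaves two replicas sharing one L3 cache, but each cache now serves at most two replicas instead of three.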
Adaptation Challenge: Assigning replicas to CPUs
in a heterogeneous platform can significantly influence
replication overhead. Our current strategy assigns repli-
cas to CPUs statically. Hence, ROMAIN needs to rely on
user knowledge to configure the proper assignment. Our
vision for future versions of ROMAIN is to monitor cache
miss rates of replicas at runtime. The master process can
thereby distinguish between communication-bound and
cache-bound applications. The former benefit from being
placed on a single socket, whereas the latter benefit from
being distributed over all sockets to increase cache
availability.
3 We nevertheless show these numbers for completeness.
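The envisioned runtime classification could, for instance, compare a replica’s last-level cache misses per thousand instructions (MPKI) against a threshold. The sketch below is hypothetical in every detail: the function name, the MPKI metric, and the threshold of 5 MPKI are illustrative choices rather than values from this paper, and the counter samples would have to come from the hardware performance counters mentioned above.

```python
def choose_spread(llc_misses, instructions, threshold_mpki=5.0):
    """Decide replica placement from counter samples of one replica.

    Returns True (distribute across sockets) for cache-bound code, i.e.
    when last-level cache misses per kilo-instruction exceed the
    threshold, and False (keep on one socket) otherwise.
    """
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    mpki = llc_misses * 1000 / instructions
    return mpki > threshold_mpki
```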
Reliability adds an orthogonal perspective: Replicas
running on the same socket may be affected by faults
that hit the whole socket. Therefore distributing repli-
cas across sockets may increase tolerance against these
faults. On the other hand, if the communication con-
nection between two sockets fails, distributed replicas
may no longer synchronize whereas replicas on the same
socket still function correctly. Such interactions between
the hardware fault model and replica placement need to
be further investigated.
Figure 4: Overhead for replicating cache-bound SPEC INT 2006 benchmarks when distributing replicas across different CPU sockets. [Benchmarks: 429.mcf, 462.libquantum, 471.omnet++; y-axis: runtime normalized vs. native execution (1 to 1.2); series: Single, DMR, TMR.]
4 Adapting Resource Consumption
While the mechanism introduced in the previous sec-
tion allows us to optimize replication overhead, it does
not solve a more general problem of replicated execu-
tion: executing N replicas requires roughly N times the
amount of resources of single-instance execution. Even
though modern multicore platforms provide plenty of
processor and memory resources, we would like to re-
duce resource consumption for replication in order to
save energy.
Statically replicated applications need to configure the
number of replicas ($N := 2 \times \mathit{num\_faults} + 1$ [17]) for the
worst possible case. However, in many situations the
protected system will not suffer from worst-case condi-
tions all the time. An embedded device might for in-
stance be designed to work in high-temperature or high-
radiation conditions, but live in non-hazardous environ-
ments for most of its execution time. In this case we
would like to switch on replication only during periods
of hazardous operation and save replication resources at
all other times.
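As a worked example of the replica-count formula, tolerating one faulty replica requires N = 3 replicas and tolerating two requires N = 5. The following sketch combines this with a hazard-driven policy; `target_replicas` and its single-replica fallback for non-hazardous periods are assumptions illustrating the idea, not ROMAIN’s interface.

```python
def replicas_for(num_faults):
    """Replicas needed to mask num_faults faulty replicas by majority vote."""
    if num_faults < 0:
        raise ValueError("num_faults must be non-negative")
    return 2 * num_faults + 1


def target_replicas(hazardous, worst_case_faults=1):
    """Hypothetical policy: full replication only during hazardous periods."""
    return replicas_for(worst_case_faults) if hazardous else 1
```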
From the perspective of a software developer, different
parts of a program may have different criticality and vul-
nerability levels. It may therefore be useful to adapt the
number of running replicas depending on the criticality
of code that is executed at the moment. Program annota-
tions or program vulnerability analysis [20] may give the
replication system hints about when and how to adapt.
Based on these observations we assume that an exter-
nal observer (e.g., a sensor determining hazardous situ-
ations) or the application itself (using specific compiler-
inserted vulnerability hints) is able to notify the ROMAIN
master process about the need to adapt replica counts.
We added mechanisms to ROMAIN that interpret this in-
formation and adapt replica execution accordingly.
In order to decrease the number of replicas, ROMAIN
waits for the next replica externalization event. At this
point, the master has validated that all replicas are in an
identical state; otherwise ROMAIN would trigger error recovery. This
means that the replicas have the same instruction pointer,
register and memory content. Furthermore, no hardware
I/O is in flight.
ROMAIN then puts one replica to sleep, releases all
memory consumed by this replica, and resumes execution
with N − 1 replicas. To increase the replica count, ROMAIN
wakes up a sleeping replica when it receives a corresponding
hint. The newly woken replica’s CPU state is set to
the state of all other replicas. New memory regions are
allocated and their content copied over from a running
replica. Then, ROMAIN resumes execution using N + 1
replicas.
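The decrease and increase steps can be modeled as operations on a set of replicas. In this sketch, `Replica`, `shrink`, and `grow` are hypothetical names, memory is a dictionary of named regions, and `copy.deepcopy` stands in for allocating new regions and copying their contents from a running replica; the caller is assumed to invoke these functions only at a validated externalization event.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Replica:
    regs: dict = field(default_factory=dict)    # CPU state
    memory: dict = field(default_factory=dict)  # region name -> bytearray
    running: bool = True


def shrink(replicas):
    """Put one running replica to sleep and release its memory (N -> N-1)."""
    running = [r for r in replicas if r.running]
    if len(running) < 2:
        raise RuntimeError("cannot drop below one replica")
    victim = running[-1]
    victim.running = False
    victim.memory.clear()  # stands in for unmapping and freeing its regions


def grow(replicas):
    """Wake a sleeping replica as a copy of a running one (N -> N+1)."""
    source = next(r for r in replicas if r.running)
    sleeper = next((r for r in replicas if not r.running), None)
    if sleeper is None:
        raise RuntimeError("no sleeping replica to wake")
    sleeper.regs = dict(source.regs)               # copy CPU state
    sleeper.memory = copy.deepcopy(source.memory)  # allocate and copy memory
    sleeper.running = True
```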
ROMAIN’s mechanism for dynamically adjusting
replicas allows us to adapt resource and energy consump-
tion to the actual reliability needs of a replicated appli-
cation. However, we found that releasing and reallocat-
ing memory regions may cause significant overhead be-
cause of the additional memory management operations
required for every replica modification. For the case of
decreasing replicas, we hide the latency for freeing mem-
ory by performing these duties in a background worker
thread that runs concurrently with the other replicas.
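Moving the release of memory off the critical path can be sketched with a worker thread fed by a queue: the master hands over the regions of the replica that was put to sleep and resumes the remaining replicas immediately. `BackgroundReclaimer` and its `release` callback are hypothetical names; the callback would perform the actual unmapping and freeing.

```python
import queue
import threading


class BackgroundReclaimer:
    """Frees released regions off the critical path (illustrative model)."""

    def __init__(self, release):
        self._release = release      # callback that frees one region
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            region = self._queue.get()
            if region is None:       # shutdown sentinel
                return
            self._release(region)

    def submit(self, regions):
        # Returns without waiting; the worker frees the regions concurrently.
        for region in regions:
            self._queue.put(region)

    def close(self):
        # Regions submitted earlier are processed before the sentinel (FIFO).
        self._queue.put(None)
        self._worker.join()
```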
For the case of increasing the number of replicas we
added a copy-on-write (COW) mechanism to ROMAIN.
Instead of allocating new memory regions during replica
startup, we make already existing regions available to
the respective replica and mark them read-only. Only
if a replica writes to this region do we perform an actual
reallocation and copy the data. This mechanism reduces
replica startup overhead, but has implications for
ROMAIN’s hardware requirements.
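The copy-on-write idea can be illustrated for a single region. The model below tracks only the newly woken replica: reads are served from the page it shares with a running replica, and the first write to a page allocates a private copy. `CowRegion` is a hypothetical class, 4 KiB pages are assumed, and a real implementation would also have to write-protect the source replica’s mapping, which this sketch does not model.

```python
class CowRegion:
    """Copy-on-write view of a region for a newly woken replica (model)."""

    PAGE_SIZE = 4096  # assumption: 4 KiB pages

    def __init__(self, shared):
        self.shared = shared  # bytearray owned by a running replica
        self.private = {}     # page index -> private bytearray copy

    def read(self, offset):
        page, off = divmod(offset, self.PAGE_SIZE)
        if page in self.private:
            return self.private[page][off]
        return self.shared[offset]

    def write(self, offset, value):
        page, off = divmod(offset, self.PAGE_SIZE)
        if page not in self.private:
            # Emulates the write fault on the read-only mapping:
            # allocate a private page and copy the shared contents.
            start = page * self.PAGE_SIZE
            self.private[page] = bytearray(self.shared[start:start + self.PAGE_SIZE])
        self.private[page][off] = value
```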
Our initial implementation of ROMAIN replicates all
memory resources allocated by a replicated application.
This approach allows replicas to execute without the
need for synchronizing upon every memory access and
therefore decreases replication overhead. From a re-
liability perspective, we can furthermore refrain from
requiring costly ECC-protected memory, because repli-
cated memory resources allow detection and correction
of memory errors as well. In the case of COW memory
regions, this guarantee no longer holds. The number of
replicated memory regions may be smaller than the num-
ber of available replicas. A memory fault in a COW re-
gion may be seen by multiple replicas and can therefore
lead to a situation where the majority of replicas make
their decisions based on this erroneous value.
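The following sketch illustrates why COW weakens majority voting. Each of three replicas outputs a value read from its backing page, and a single bit flip corrupts one physical page. The page layout and function names are hypothetical; the point is only that a fault in a page shared by two replicas is seen by a majority.

```python
from collections import Counter


def vote(outputs):
    """Return the strict-majority output, or None if there is none."""
    value, count = Counter(outputs).most_common(1)[0]
    return value if count * 2 > len(outputs) else None


def run_tmr(page_of_replica, pages, flip_page):
    """Return the voted output after one bit flip in flip_page.

    page_of_replica maps replica ID -> physical page ID; every replica
    mapped to the flipped page reads the corrupted value.
    """
    corrupted = dict(pages)
    corrupted[flip_page] ^= 0x1  # single-bit memory fault
    outputs = [corrupted[page_of_replica[r]] for r in sorted(page_of_replica)]
    return vote(outputs)


# Fully replicated memory: one physical page per replica.
print(run_tmr({0: "A", 1: "B", 2: "C"}, {"A": 42, "B": 42, "C": 42}, "A"))  # 42
# Replicas 1 and 2 share page B through COW.
print(run_tmr({0: "A", 1: "B", 2: "B"}, {"A": 42, "B": 42}, "B"))  # 43
```

With one page per replica the flipped value is outvoted; when two replicas share the corrupted page, the wrong value wins the vote.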
Adaptation Challenge: ROMAIN provides mecha-
nisms to dynamically adapt the number of replicas at run-
time. Fast memory management using a COW mecha-
nism only works if ROMAIN runs on a hardware platform
providing ECC-protected memory. Otherwise we have
to fall back to allocating and copying memory immedi-
ately during replica wakeup. System designers therefore
need to address another challenge: Are they willing to
get dynamic replication at the cost of increased overhead
for replica creation? Or would they prefer to trade this
overhead for increased production cost and energy con-
sumption when using memory ECC?
5 Adapting to Hardware Vulnerability
The ASTEROID architecture uses replication to protect
user applications against the effects of hardware faults.
However, a subset of the system, consisting of the OS
kernel and the ROMAIN replication service, is not
covered by these protection mechanisms. This is
problematic for the whole system’s reliability: after all, these
components are required to execute correctly in order to
properly implement replication. We call these compo-
nents the Reliable Computing Base (RCB) [7].
We believe that heterogeneous hardware can aid in
protecting the RCB. Today’s multicore platforms use het-
erogeneity to provide dedicated compute cores (such as
general-purpose GPUs [16]) or energy-efficient proces-
sor alternatives (such as ARM’s big.LITTLE [1]). The
Error-Resilient System Architecture (ERSA) furthermore
proposed multicore chips that comprise CPUs with different
reliability levels [12]. We therefore assume that future
heterogeneous platforms will provide resilient (ResCore)
and non-resilient (NonResCore) CPUs on the same chip.
Figure 5: Notification variants: 1) Replica migration 2) Asynchronous Notifications 3) Shared-memory polling. [Each variant shows a replica (Rep) on a non-resilient core (NonRes) and a resilient core (Res): in 1) the replica migrates to the Res core, where its event is handled; in 2) a helper thread (HT) on the Res core receives a message (msg()) and handles the event; in 3) the replica writes to shared memory (Memwrite) and the HT polls that memory (Mempoll) before handling the event.]
We propose to split the ASTEROID software into a
non-resilient layer and the RCB layer [5]. The non-resilient
layer is protected using the replication mechanisms de-
scribed in the previous sections. The RCB layer is pro-
tected by running the respective software components on
specific ResCores. The idea of running OS services on
dedicated cores is nothing new and has previously been
introduced by FlexSC [19] and fos [21]. Our contribu-
tion here is to use this split for protecting the RCB.
In Section 2 we explained that ROMAIN runs replicas
concurrently on different physical CPU cores. To protect
the RCB we have to distinguish between replica execu-
tion and master execution in ROMAIN. Replica code is
scheduled by the ROMAIN master to run on any Non-
ResCore. ROMAIN master code however needs to run on
a ResCore. To implement replication, we now need an
additional mechanism to transfer replica state between
the resilient and non-resilient worlds.
We implemented three alternative mechanisms for
state transfer as shown in Figure 5. Our first alterna-
tive uses FIASCO.OC’s mechanism to migrate threads
between CPU cores to migrate a replica to a ResCore
for handling externalization events. The second alternative
avoids migration cost and instead uses an RCB helper
thread for every replica. Then, instead of migrating the
replica, ROMAIN simply sends the replica state to the
ResCore using a synchronous inter-processor message.
Our third alternative avoids the cost of sending inter-
processor messages by having the helper thread poll a
shared-memory region for new data (e.g., using x86’s
MONITOR/MWAIT instructions).
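The shared-memory polling variant can be sketched with a mailbox and sequence numbers: the replica side publishes its state and bumps a request counter, and the RCB helper thread polls that counter instead of waiting for an inter-processor message. This Python model is illustrative only: `Mailbox`, `helper_thread`, and `replica_event` are hypothetical names, the two threads stand in for code running on a NonResCore and a ResCore, `time.sleep(0)` merely yields where real code would use MONITOR/MWAIT or a pause loop, and Python’s global interpreter lock hides the memory-ordering barriers that real shared-memory code would need.

```python
import threading
import time


class Mailbox:
    """Shared-memory slot between a replica and its RCB helper thread."""

    def __init__(self):
        self.payload = None   # replica state written by the replica side
        self.request_seq = 0  # bumped after payload is stored
        self.reply = None     # handler result written by the helper
        self.reply_seq = 0    # bumped after reply is stored


def helper_thread(box, handle, num_events):
    """Poll the mailbox instead of waiting for an inter-processor message."""
    seen = 0
    while seen < num_events:
        if box.request_seq != seen:  # new replica state arrived
            seen = box.request_seq
            box.reply = handle(box.payload)
            box.reply_seq = seen
        else:
            time.sleep(0)            # stands in for MONITOR/MWAIT


def replica_event(box, state):
    """Publish replica state and spin until the helper has replied."""
    box.payload = state
    box.request_seq += 1
    expected = box.request_seq
    while box.reply_seq != expected:
        time.sleep(0)
    return box.reply
```

A helper started with `threading.Thread(target=helper_thread, args=(box, handler, n))` serves `n` consecutive `replica_event` calls and then exits.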
Figure 6: Runtime overhead for two high-overhead benchmarks from the MiBench benchmark suite when run with different inter-replica notification methods. [Benchmarks: susan and CRC32, each in DMR and TMR configurations; y-axis: runtime normalized vs. native execution (0 to 80); series: Migration, Sync, SharedMem.]
Using these three mechanisms we were able to adapt
ROMAIN to run RCB code and replica code on cores
with different resilience levels. We evaluated the alter-
natives using benchmarks from the MiBench embedded
benchmark suite [9]. We ran the benchmarks once using
ROMAIN and then picked the two benchmarks with the
highest replication overhead (susan, CRC32). For these
two benchmarks we performed an experiment on a mul-
ticore machine assuming that CPU0 of this machine was
a ResCore and all other cores were NonResCores. We
then ran these benchmarks in ROMAIN using DMR and
TMR setups. Figure 6 compares the overheads induced
by the different notification mechanisms.
While our results show that shared-memory polling is
the most efficient notification mechanism, this approach
relies on correctly functioning shared memory between
the non-resilient and resilient worlds. This may be an-
other reliability issue because unreliable software might
use such shared memory regions to overwrite data that RCB
components rely on. Hence, practical implementations
may forbid shared memory to isolate protected data from
unprotected software. In this case we have to resort to
migration or synchronous messaging.
Adaptation Challenge: ROMAIN allows running RCB
code on resilient cores while replica code runs on non-
resilient CPUs. Heterogeneous platforms may provide
these hardware features. System designers then need
to decide at which ratio to provide ResCores and Non-
ResCores. Furthermore, hardware reliability mecha-
nisms may be necessary to protect transmission of replica
data to RCB software components.
6 Conclusion
In this paper we reviewed the ROMAIN OS replication
service with respect to its adaptation capabilities for fu-
ture heterogeneous multicore platforms. We showed that
the heterogeneity introduced by such platforms leads
to new optimization criteria when assigning replicas to
physical CPUs. We identified challenges that system
developers face when applying replication on these plat-
forms and showed opportunities that arise for improv-
ing overall system reliability by explicitly leveraging the
platform’s reliability properties.
Acknowledgments
This work was supported by the German Research
Foundation (DFG) as part of the priority program “Dependable
Embedded Systems” (SPP 1500 - spp1500.itec.kit.edu).
References
[1] ARM LTD. Big.LITTLE processing with ARM Cortex-A15.
Whitepaper, 2011.
[2] ARON, M., DELLER, L., ELPHINSTONE, K., JAEGER, T.,
LIEDTKE, J., AND PARK, Y. The SawMill framework for virtual
memory diversity. In Proceedings of the 8th Asia-Pacific Com-
puter Systems Architecture Conference (Bond University, Gold
Coast, QLD, Australia, Jan. 29–Feb. 2 2001).
[3] AUSTIN, T. DIVA: A reliable substrate for deep submicron mi-
croarchitecture design. In Microarchitecture, 1999. MICRO-32.
Proceedings. 32nd Annual International Symposium on (1999),
pp. 196–207.
[4] BORKAR, S. Designing reliable systems from unreliable com-
ponents: the challenges of transistor variability and degradation.
IEEE Micro 25, 6 (Nov.-Dec. 2005), 10–16.
[5] DÖBEL, B., AND HÄRTIG, H. Who watches the watchmen? –
protecting operating system reliability mechanisms. In Interna-
tional Workshop on Hot Topics in System Dependability (2012),
HotDep’12.
[6] DÖBEL, B., HÄRTIG, H., AND ENGEL, M. Operating system
support for redundant multithreading. In 12th International Con-
ference on Embedded Software (EMSOFT) (Tampere, Finland,
October 2012).
[7] ENGEL, M., AND DÖBEL, B. The Reliable Computing Base:
A Paradigm for Software-Based Reliability. In Workshop on
Software-Based Methods for Robust Embedded Systems (2012).
[8] ERNST, D., KIM, N. S., DAS, S., PANT, S., RAO, R., PHAM,
T., ZIESLER, C., BLAAUW, D., AUSTIN, T., FLAUTNER, K.,
AND MUDGE, T. Razor: a low-power pipeline based on circuit-
level timing speculation. In Microarchitecture, 2003. MICRO-36.
Proceedings. 36th Annual IEEE/ACM International Symposium
on (Dec. 2003), pp. 7–18.
[9] GUTHAUS, M. R., RINGENBERG, J. S., ERNST, D., AUSTIN,
T. M., MUDGE, T., AND BROWN, R. B. MiBench: A free,
commercially representative embedded benchmark suite. In Pro-
ceedings of the Workload Characterization, 2001. WWC-4. 2001
IEEE International Workshop (Washington, DC, USA, 2001),
IEEE Computer Society, pp. 3–14.
[10] ITRS. International Technology Roadmap for Semiconductors.
http://www.itrs.net/Links/2011ITRS/Home2011.htm, 2011.
[11] LACKORZYNSKI, A., WARG, A., AND PETER, M. Generic Vir-
tualization with Virtual Processors. In Real-Time Linux Workshop
(Nairobi, Kenya, October 2010).
[12] LEEM, L., CHO, H., BAU, J., JACOBSON, Q., AND MITRA,
S. ERSA: Error Resilient System Architecture for Probabilistic
Applications. In Design, Automation & Test in Europe Conference
& Exhibition (DATE) (March 2010), pp. 1560–1565.
[13] PIENAAR, J. A., CHAKRADHAR, S., AND RAGHUNATHAN, A.
Automatic generation of software pipelines for heterogeneous
parallel systems. In Proceedings of the 2012 International Con-
ference for High Performance Computing, Networking, Storage
and Analysis (Washington, DC, USA, 2012), SC ’12, IEEE Com-
puter Society, pp. 1–12.
[14] REINHARDT, S. K., AND MUKHERJEE, S. S. Transient fault
detection via simultaneous multithreading. SIGARCH Comput.
Archit. News 28 (May 2000), 25–36.
[15] REIS, G. A., CHANG, J., VACHHARAJANI, N., RANGAN, R.,
AND AUGUST, D. I. SWIFT: Software implemented fault tol-
erance. In Proceedings of the International Symposium on Code
Generation and Optimization (2005), IEEE Computer Society,
pp. 243–254.
[16] SANDERS, J., AND KANDROT, E. CUDA by Example: An
Introduction to General-Purpose GPU Programming, 1st ed.
Addison-Wesley Professional, 2010.
[17] SCHNEIDER, F. B. Implementing Fault-Tolerant Services Us-
ing the State Machine Approach: A Tutorial. ACM Computing
Surveys 22, 4 (Dec. 1990), 299–319.
[18] SHYE, A., MOSELEY, T., REDDI, V. J., BLOMSTEDT, J., AND
CONNORS, D. A. Using process-level redundancy to exploit
multiple cores for transient fault tolerance. In Proceedings of the
37th Annual IEEE/IFIP International Conference on Dependable
Systems and Networks (Washington, DC, USA, 2007), DSN ’07,
IEEE Computer Society, pp. 297–306.
[19] SOARES, L., AND STUMM, M. FlexSC: flexible system call
scheduling with exception-less system calls. In Proceedings of
the 9th USENIX conference on Operating systems design and
implementation (Berkeley, CA, USA, 2010), OSDI’10, USENIX
Association, pp. 1–8.
[20] SRIDHARAN, V., AND KAELI, D. R. Quantifying Software Vul-
nerability. In Workshop on Radiation effects and fault tolerance
in nanometer technologies (New York, NY, USA, 2008), WREFT
’08, ACM, pp. 323–328.
[21] WENTZLAFF, D., AND AGARWAL, A. Factored operating sys-
tems (fos): The case for a scalable operating system for multi-
cores. SIGOPS Oper. Syst. Rev. 43, 2 (Apr. 2009), 76–85.
[22] ZHURAVLEV, S., BLAGODUROV, S., AND FEDOROVA, A. Ad-
dressing shared resource contention in multicore processors via
scheduling. In Proceedings of the Fifteenth Edition of ASPLOS
on Architectural Support for Programming Languages and Oper-
ating Systems (New York, NY, USA, 2010), ASPLOS XV, ACM,
pp. 129–142.
