Do Your Cores Play Nicely? A Portable Framework for Multi-core
  Interference Tuning and Analysis by Iorga, Dan et al.
Dan Iorga
Imperial College London
d.iorga17@imperial.ac.uk
Tyler Sorensen
Princeton University
ts20@cs.princeton.edu
Alastair F. Donaldson
Imperial College London
alastair.donaldson@imperial.ac.uk
Abstract
Multi-core architectures can be leveraged to allow inde-
pendent processes to run in parallel. However, due to re-
sources shared across cores, such as caches, distinct processes
may interfere with one another, e.g. affecting execution time.
Analysing the extent of this interference is difficult due to:
(1) the diversity of modern architectures, which may contain
different implementations of shared resources, and (2) the com-
plex nature of modern processors, in which interference might
arise due to subtle interactions. To address this, we propose a
black-box auto-tuning approach that searches for processes
that are effective at causing slowdowns for a program when
executed in parallel. Such slowdowns provide lower bounds
on worst-case execution time, an important metric in systems
with real-time constraints.
Our approach considers a set of parameterised “enemy”
processes and “victim” programs, each targeting a shared re-
source. The autotuner searches for enemy process parameters
that are effective at causing slowdowns in the victim programs.
The idea is that victim programs behave as a proxy for shared
resource usage of arbitrary programs. We evaluate our ap-
proach on: 5 different chips; 3 resources (cache, memory bus,
and main memory); and consider several search strategies
and slowdown metrics. Using enemy processes tuned per chip,
we evaluate the slowdowns on the autobench and coremark
benchmark suites and show that our method is able to achieve
slowdowns in 98% of benchmark/chip combinations and pro-
vide similar results to manually written enemy processes.
1. Introduction
Multi-core processors have seen widespread adoption, with
nearly every consumer device containing more than one inde-
pendent processing unit. However, due to shared resources and
their corresponding arbitration mechanisms, e.g. cache hierar-
chies and protocols, reasoning about program behaviours on
multi-core processors can be significantly more complex than
on their single-core predecessors. Interference can affect non-
functional properties of otherwise entirely separate processes,
e.g. the execution time of a program on a multi-core processor
can vary greatly depending on shared resource contention. Be-
cause of these issues, the Worst Case Execution Time (WCET)
of an application on a multi-core chip is difficult to derive, and
as a result, multi-core processors remain challenging to deploy
in systems with hard or soft real-time constraints.
Because of this, prior work has identified interference paths,
where the contention and arbitration of shared resources might
impact program execution time [15, 20, 21, 7, 8]. Compo-
nents of interference paths include caches, memory buses, and
main memory systems. Although not widely adopted, various
schemes have been proposed to limit this interference. Such
schemes require either hardware support, e.g. cache partition-
ing [18], or invasive software modifications, e.g. bank-aware
memory allocation and bandwidth reservation [27, 26]. How-
ever, even with these schemes, interference can still be sub-
stantial [24]. As a result, there is immediate and pragmatic
interest in detecting and quantifying interference effects rather
than aiming to mitigate them entirely.
In this vein, various techniques have been investigated to
quantify the effects of interference on real-time properties,
and evaluated for specific multi-core architectures [22, 19, 13].
Typically, work in this domain consists of: manually devel-
oping small programs designed exclusively to stress an inter-
ference path (called enemy processes in this work), executing
a set of enemy processes on all but one core of a multi-core
system (called a hostile environment in this work), and evalu-
ating the execution time of a sequential Software Under Test
(SUT) on the remaining core. The slowdown observed in the
SUT from the hostile environment is a useful quantification of
interference effects.
This prior work requires the manual design of hostile envi-
ronments and corresponding enemy processes, presenting two
immediate limitations. First, manual enemy process design
is not portable across architectures, as different architectures
may have different implementations, or their shared resources
may be configured differently. Thus, manual effort is required
for each new target architecture. Second, hand-tuned enemy
processes may not be as effective as possible, due to subtle
interactions that are difficult to derive from available documen-
tation, and thus unlikely to be considered by human designers.
In this work, we aim to address both limitations. The heart
of our contribution is an auto-tuning method that can tune
enemy process parameters to be effective at slowing down
an SUT. For each interference path considered, our approach
takes a parameterised enemy process (called an enemy tem-
plate) and a corresponding victim program, which is designed
to be especially vulnerable to the particular interference path.
We then employ auto-tuning to search for enemy process pa-
rameters that are effective at causing a slowdown in the cor-
responding victim program. After obtaining tuned enemy
arXiv:1809.05197v1 [cs.DC] 13 Sep 2018
    void vec_add(float *A, float *B, float *C, int SIZE) {
        for (int i = 0; i < SIZE; i++)
            C[i] = A[i] + B[i];
    }
(a) A vector addition program that computes C ← A + B
    void cache_enemy(byte* scratch) {
        while (1)
            for (int i = 0; i < BUFFER_SIZE; i += STRIDE)
                ACCESS(&(scratch[i]));
    }
(b) An enemy template that targets the cache by accessing a region of memory
in cache line sized strides; values in CAPS are parameters
Figure 1: Example of an SUT (a) and an enemy process (b)
Table 1: Parameters for the enemy template of Figure 1b for the
Pi3 and 570X boards: BUFFER_SIZE is given in KB; STRIDE is
given in bytes; and ACCESS is given by a sequence of store (S)
and load (L) operations
              Pi3                       570X
              hand-tuned   auto-tuned   hand-tuned   auto-tuned
BUFFER_SIZE   512          20480        2048         40960
STRIDE        64           262          64           40960
ACCESS        SL           SLSSL        SL           SS
processes, we employ a second level of tuning over all vic-
tim programs to obtain a combination of enemy processes,
which can be deployed as a hostile environment. The tuning
approach can be run on many chips using the same enemy
templates and victim programs to produce chip-specialised
hostile environments.
We illustrate the problems with manually-designed enemy
processes, and explain at a high level how our auto-tuning
approach overcomes these limitations, using an example.
Example of Portability Limitation Figure 1a shows a sim-
ple vector addition program. The SIZE parameter can be set
to a value (in this case we use 16K) such that the vector data
is not able to fit entirely in a core-local cache; thus, memory
accesses must go through shared caches at some point.
Suppose we want to assess the potential interference between
this program (the SUT) running on one core and a set of in-
dependent programs running on other cores. The program of
[Figure 2 plot: y-axis, slowdown; bars HT(Pi 3), HT(570X), AT.
Pi 3 slowdowns: 1.14, 1.14, 10.08. 570X slowdowns: 1.06, 1.36, 1.66.]
Figure 2: Slowdowns caused by different hostile environ-
ments on the Pi3 and 570X: HT denotes enemy processes
hand-tuned for the chip given in (), and AT denotes auto-tuned
enemy processes for the target chip
Figure 1b is an example of an enemy process: the sole task of
the program is to exercise an interference path. This particu-
lar enemy process is designed to cause cache contention by
looping over a memory region the size of the shared cache and
accessing memory at a stride of the cache line size. To explore
potential interference, the execution time of the SUT execut-
ing on one core can be measured while multiple instances of
the cache enemy processes are running on other cores in the
system.
As the enemy process code of Figure 1b shows, there are
three parameters that need to be instantiated: (1) size of the
shared cache; (2) cache line size; and (3) memory instructions,
i.e. loads or stores, used to access memory. To provide suitable
values, the target processor must be known. For example, on a
4-core Raspberry Pi 3 B (abbreviated to Pi3), the shared cache
size is 512KB, the cache line is 64 bytes and we use a single
store followed by a load as the instructions. The execution
time of the SUT shows a 1.14× slowdown when executed
in parallel with this enemy process running on the three
additional cores.
Now, if we consider a different processor, say a 4-core
Intel Joule 570X (abbreviated to 570X), we might try the
same experiment using the same parameters from the Pi3.
In this case, we observe a slowdown of 1.06×. We then
might try a different enemy process, tuned to the architectural
details of the 570X. This changes the size of the shared cache
to 2048KB but keeps the same cache line size. With this
new enemy process, a much greater slowdown of 1.36× is
observed. However, when the 570X enemy process is used on
the Pi3, the slowdown observed is 1.14×, exactly the same as
the original Pi3 enemy process. Since the Pi3 and the 570X
have the same cache line size and the same associativity
but different cache sizes, the enemy processes will access the
cache in a similar manner but the Pi3 enemy process will not
access the entire 570X cache.
These results are intuitive: enemy processes tuned to a par-
ticular architecture will be less effective when running on a
different architecture. Prior work in this area has largely fo-
cused on such enemy processes, i.e. hand-tuned for a particular
architecture. In the case where enemy processes are written
at the assembly level, the situation is even worse: enemy pro-
cesses designed with respect to one ISA are inapplicable to a
processor with a different ISA.
Example of Effectiveness Limitation Recall that the hand-
tuned enemy processes showed an SUT slowdown of 1.14×
and 1.36× for the Pi3 and 570X, respectively. These slow-
downs were achieved by making reasonable human judge-
ments for enemy process parameters. If instead, using the
methodology described in the remainder of this paper, the en-
emy template is auto-tuned for two hours, different parameters
can be found (summarised in Table 1).
The values found by auto-tuning do not correspond to any
architectural features that we are aware of, and it seems un-
likely that a human designer with detailed knowledge of these
processors would guess them. However, the slowdowns of
the SUT using this auto-tuned hostile environment increase to
10.08×¹ and 1.66× for the Pi3 and 570X, respectively. The
slowdown results are summarised in Figure 2. Thus, an auto-
tuning methodology for enemy processes (1) requires less
architectural knowledge and manual effort than hand-tuned
approaches of previous work, and (2) can provide greater
slowdowns than reasonable hand-tuned enemy processes.
Contributions We present an auto-tuning methodology for
hostile environments and the corresponding enemy processes.
This approach can be employed for different chips, removing
the need for the detailed architectural expertise required by
prior works in this area. Additionally, because of the many
configurations explored by the tuning approach, our methods
may discover non-intuitive parameters that cause slowdowns
beyond what might be developed through hand-optimisation,
as illustrated in the above example.
We illustrate our approach by creating a hostile environment
on five different chips for three interference paths: cache,
memory bus, and the main memory system. This requires three
(enemy template, victim program) pairs, one per path. For tuning, we explore
three search strategies (random search, simulated annealing
and Bayesian optimisation), and report on the effectiveness of
each.
Finally, we assess the effectiveness of our approach at caus-
ing slowdowns in SUTs by running benchmarks from the
coremark and autobench application suites [3, 2] in the hostile
environments produced by our tuning methodology. We show
that we can achieve statistically significant slowdowns for 98%
of benchmarks. We compare the slowdowns caused by our
auto-tuned hostile environments with hand-tuned hostile envi-
ronments from prior work and show that the slowdowns are
comparable and that in some cases we are even able to achieve
higher slowdowns.
In summary, our contributions, in order of presentation, are:
1. An auto-tuning methodology for hostile environments with
the aim to cause slowdowns in an SUT; this methodology
is portable and can be used to automatically tune hostile
environments for different chips (Section 2).
2. An illustration of our methodology on five different chips
for three interference paths: cache, memory bus, and main
memory systems; we evaluate several natural search strate-
gies and slowdown metrics (Section 3).
3. An assessment of the extent to which our tuned hostile environments
slow down the coremark and autobench real-time application
suites; we show that in many cases our auto-tuned
hostile environments are as effective as hand-optimised
assembly environments of prior work [22], and sometimes
better (Section 4).
The source code for our framework can be found online.²
¹ This slowdown is alarming, but we have rigorously validated this result
and see similar values for the Pi3 throughout this work.
² https://github.com/mc-imperial/multicore-test-harness
Table 2: The available parameters of the tuning framework
Parameter                 Discussed in this work
Victim program resource   cache; bus; main memory
Enemy template resource   cache; bus; main memory
Search strategy           random; sim. ann.; Bayesian opt.
Metric                    median; max; quantile
2. Creating a Hostile Environment
We now describe in detail our methodology for creating a
hostile environment, which aims to be effective at causing
slowdowns in an SUT through shared resource interference.
After a high-level overview of our approach (Section 2.1),
we detail the interference paths (shared resources) targeted
in this work along with the associated per-resource victim
programs (Section 2.2). We then explain how we use auto-
tuning to search for enemy template parameters per resource
(Section 2.3) and how hostile environments are constructed
from tuned enemy processes (Section 2.4). Because different
search strategies can be used in the tuning phase, we outline
several natural choices (Section 2.5). We conclude the section
by describing the care we have taken to ensure validity of
the measurements that form the basis of the tuning process
(Section 2.6).
2.1. Overview of our approach
The first step in our method consists of identifying possible
interference paths, i.e. shared resources for which multi-core
contention might cause slowdowns. For each one of these
paths, we create: (1) a parameterised enemy template that
will run in an infinite loop and stress the resource and (2) a
victim program that performs a fixed amount of synthetic work,
making heavy use of the resource. We tune the parameters
of the enemy template using the slowdown it can cause on
its corresponding victim program as the objective function.
Because the victim program is vulnerable to interference on
the target resource, the degree to which it is slowed down
serves as a proxy for measuring interference on the associated
interference path.
Each enemy process is tuned to provoke interference to
a specific resource. However, we want to develop a hostile
environment that is effective at slowing down an arbitrary
SUT. A black-box SUT is likely to use multiple resources in
complex ways. Thus, we aim to find a combination of tuned
enemy processes to be effective across all victim programs.
The second step in our methodology involves searching for
a combination of tuned enemy processes, with one enemy
process running on each non-SUT core of the processor, that
is effective in causing interference with respect to multiple
resources.
Table 2 summarises the parameters that can be used to
configure our auto-tuning framework. In the tuning phase of
our methodology, different search strategies can be employed.
We evaluate three natural choices: random search, simulated
annealing and Bayesian optimisation (see Section 2.5 for more
detailed descriptions of each), and compare their proficiency
in finding effective parameters. Since measurements in this
domain are noisy, we time multiple runs and analyse the results
to best approximate the actual interference. These metrics,
along with the steps we have taken to ensure measurement
validity, are explained in more detail in Section 2.6.
2.2. Shared Resources and Victim Programs
We now discuss several shared resources that can lead to inter-
ference between independent applications running in parallel
on separate cores. For each type of interference, we first
outline the ANSI-C victim program we have designed to be
vulnerable to this interference, and then describe a parameterised
enemy template that aims to provoke
interference through this resource.
Bus Buses are used to transfer data between memory and
processing elements. To reduce the area of a processor, the bus
is often shared between multiple processing cores, requiring
some form of arbitration mechanism. There are three main
classes of resource arbitration mechanisms: (1) time-driven
arbitration uses a predefined bus schedule that assigns time
slots to contending components; (2) event-driven arbitration
resolves contention at runtime, e.g. via a round-robin or first-in-
first-out strategy; and (3) a hybrid approach that uses different
runtime policies for each time slot [5].
Victim Program The bus victim program reads a series of
numbers from a main memory data buffer and increments their
value. The increment operation forces the victim process to
bring the numbers from main memory to the CPU. Afterwards,
the process writes the numbers back to a second buffer in
main memory. The buffers are allocated using malloc and
are sufficiently large not to fit in the cache. This entire process
ensures that the bus will be kept busy with transfers between
main memory and registers.
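The read, increment, write-back loop described above can be sketched as follows. This is a minimal illustration, not the framework's actual source: the function name bus_victim_pass and the int32_t element type are assumptions, and a real victim would allocate both buffers with malloc at a size larger than the last-level cache.

```c
#include <stddef.h>
#include <stdint.h>

/* One pass of a bus-style victim (hypothetical helper): read each value
 * from `src`, increment it, and write the result to a second buffer `dst`.
 * The volatile qualifiers keep the compiler from eliding the accesses.
 * Returns the number of elements processed. */
size_t bus_victim_pass(const volatile int32_t *src, volatile int32_t *dst,
                       size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int32_t v = src[i];   /* load: memory to register */
        dst[i] = v + 1;       /* increment, then store to the second buffer */
    }
    return n;
}
```

Timing a fixed number of such passes gives the fixed amount of synthetic work whose duration is compared with and without enemies.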
Enemy Template We have designed the enemy with the aim of
hindering data transfers between the CPU and main memory.
It achieves this by performing the same operation as the victim
process and thus competes for the same bus resource. Each
enemy template has distinct buffers to simplify the design and
avoid using any synchronisation mechanism.
The configurable parameters are:
• Size of main memory data buffers.
• Integer data type used for buffers: int8_t, int16_t
• Which buffer is used for reads and which for writes.
Intuitively, the larger the size of the data being transferred,
the more contention it will cause. However, there might be pat-
terns where fast transfers through the bus followed by pauses
will trigger some timing issues as shown in [12].
Cache Caches are used to reduce latency between the pro-
cessor and main memory. A cache is much smaller than the
main memory and, as a result, only a small subset of main
memory can be stored in the cache at one time. There are
multiple cache levels, and usually the last level cache is shared
between processor cores. As an example, the 570X has two
levels of cache, where the level 1 data and instruction caches
are core-local, while the level 2 cache is shared between all
cores. Once a cache becomes full, stale data is evicted and
replaced with newer data using a variety of policies, e.g., first
in first out, least recently used, and random replacement [14].
Because these caching policies have a direct effect on mem-
ory latency, multi-core timing analysis must take into account
potential interference from contending cores.
Victim Program The victim program is configured based on
the size of the shared cache, its associativity and its line size.
The program allocates an array of the size of the last level
cache and then reads and writes to the array in a pattern given
by the associativity and the line size. The processor optimises
access by storing the array in the last level cache.
Enemy Template The cache enemy template works by striding
over a data buffer and performing a sequence of memory
operations (reads or writes). The configurable parameters will
force it to access the cache in a chaotic manner, thus working
against data locality for which caches are designed to optimise.
The cache enemy template is configured by the following
parameters:
• Data buffer size
• Stride value
• Series of operations: reads or writes (up to five)
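To make the three parameters concrete, the sketch below performs one pass of a cache enemy instance. It is an illustration under assumptions: the function name cache_enemy_pass and the encoding of the operation series as a string of 'S' (store) and 'L' (load) characters are hypothetical, chosen to mirror the notation of Table 1.

```c
#include <stddef.h>

/* One pass of a cache-enemy instance over `scratch` (hypothetical helper).
 * `access` is a string of 'S' (store) characters and load characters; any
 * character other than 'S' is treated as a load. The whole series is
 * applied at every visited location.
 * Returns the number of memory operations issued. */
size_t cache_enemy_pass(volatile unsigned char *scratch, size_t buffer_size,
                        size_t stride, const char *access)
{
    size_t ops = 0;
    unsigned char sink = 0;

    if (stride == 0)
        return 0;   /* guard: a zero stride would never terminate */

    for (size_t i = 0; i < buffer_size; i += stride) {
        for (const char *op = access; *op != '\0'; op++) {
            if (*op == 'S')
                scratch[i] = (unsigned char)(sink + 1);   /* store */
            else
                sink = scratch[i];                        /* load */
            ops++;
        }
    }
    return ops;
}
```

A deployed enemy would call this in an infinite loop. With the hand-tuned Pi3 values of Table 1, the call would use a 512 KB buffer, a stride of 64 and the series "SL".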
Main Memory Access to main memory is granted by a
shared controller. For single-channel memories, only one bank of
memory can be accessed in a single read or write operation, so
simultaneous requests can be delayed [26]. The interference issues
are similar to those of shared caches; however, the runtime impact
of contended memory accesses is higher.
Victim Program The victim program allocates a data buffer in
main memory and writes random values to it. We ensure that
the values are actually written in main memory and not just in
the cache by having a sufficiently large buffer that does not fit
in the cache for any processor.
Enemy Template The goal of the enemy template is to touch
as many memory banks as possible. It allocates a large data
buffer and repeatedly selects a random, contiguous sub-region
of this buffer, of a given fixed size. For each selected
sub-region, a random byte is chosen and memset is used to write
this byte across the sub-region. The following parameters
configure the enemy process:
• The size of the data buffer
• The size of the sub-region
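One iteration of the memory enemy described above might look like the sketch below. The function name memory_enemy_step is an assumption, and rand() is used only for brevity; it limits the reachable offsets to RAND_MAX, so a real implementation targeting very large buffers would need a wider random source.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* One iteration of a memory-enemy instance (hypothetical helper): pick a
 * random contiguous sub-region of `sub_size` bytes inside `buf` and fill
 * it with a single random byte using memset.
 * Returns the offset of the chosen sub-region, or (size_t)-1 if the
 * sizes are invalid. */
size_t memory_enemy_step(unsigned char *buf, size_t buf_size, size_t sub_size)
{
    if (sub_size == 0 || sub_size > buf_size)
        return (size_t)-1;

    size_t offset = (size_t)rand() % (buf_size - sub_size + 1);
    int byte = rand() & 0xFF;   /* chosen at runtime, not a compile-time constant */

    memset(buf + offset, byte, sub_size);
    return offset;
}
```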
Overlap and Extensibility There will be a certain level
of overlap between enemy processes. The hardware cache
policies can make the memory enemy processes first write
data to the cache and therefore also act as a cache enemy.
Also, we expect that the memory enemy process will stress the
system bus by bringing data from the processor to the memory.
However, we assume that each enemy process will concentrate
its attack on one resource, e.g. the bus enemy, in contrast to the
memory enemy, will only read data from the same locations
in main memory and therefore will demand less work from
the memory controller. This way the enemy processes will
complement each other.
It is easy to extend the framework and stress different shared
resources that we have not included and that might be unique
to specific platforms. This extension can be done by adding
a pair consisting of a victim program and an enemy template,
including the tunable parameters.
2.3. Tuning enemy processes per Resource
For each processor, we tune the parameters of the enemy
templates to cause the highest slowdown in their corresponding
victim program. In this section, we describe how this process
is performed.
In what follows, let $R$ denote the set of resources to be
targeted. In our current work, $R = \{\text{bus}, \text{cache}, \text{RAM}\}$. For
a resource $r \in R$, let $V_r$ denote the victim program associated
with $r$, and $T_r$ the enemy template associated with $r$. For
example, $V_{\text{bus}}$ denotes the victim program associated with the
bus resource, and $T_{\text{cache}}$ the enemy template associated with
the cache resource.
A template $T_r$ takes a set of parameters drawn from a parameter
set $P_r$. For a given parameter valuation $p \in P_r$, let $T_r(p)$
denote the concrete enemy process obtained by instantiating
$T_r$ with parameters $p$.
For a victim program $V_r$, template $T_r$ and parameter setting
$p \in P_r$, let $\mathrm{slowdown}(V_r, T_r(p))$ denote the slowdown associated
with (1) executing $V_r$ in isolation on core 0 (with all other
cores unoccupied), compared with (2) executing $V_r$ on core 0,
in parallel with an instance of $T_r(p)$ on every other available
core.
Our aim is to compute:
$$\arg\max_{p \in P_r} \; \mathrm{slowdown}(V_r, T_r(p))$$
Because $P_r$ is too large to search exhaustively, we use a
search strategy to approximate the maximum. The candidate
strategies are discussed in Section 2.5.
Let $p_r^{\text{tuned}}$ denote the best parameter setting that was found
via search using the chosen strategy. We refer to the set
$\mathrm{ResourceTunedEnemies} = \{T_r(p_r^{\text{tuned}}) \mid r \in R\}$ as the set of
resource-tuned enemies.
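As a small illustration of the objective being maximised, the sketch below computes a slowdown from two timed runs. It assumes that slowdown is the ratio of the contended execution time to the isolated execution time, which matches the "1.14×" style of reporting used in this paper; the helper names are hypothetical.

```c
#include <time.h>

/* Seconds elapsed between two timestamps, e.g. taken with
 * clock_gettime(CLOCK_MONOTONIC, ...) before and after the victim runs. */
double elapsed_seconds(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Slowdown of the victim (assumed definition): execution time with enemies
 * on the other cores, divided by execution time in isolation. */
double slowdown(double t_isolated, double t_contended)
{
    return t_contended / t_isolated;
}
```

A value of 1.0 means the enemies had no measurable effect; the tuner prefers parameter valuations that push this value higher.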
2.4. Tuning a Hostile Environment
The tuning process described in Section 2.3 considers the
same enemy process running on every available core other
than that occupied by the SUT. We aim to devise a deployment
of enemy processes that is effective at inducing interference
across all resource types since we do not know the resource
usage profile of the SUT a priori. We determine the best
configuration of enemy processes by using a strategy similar
to the one described in [23].
We refer to a configuration of multiple possibly distinct
enemy processes running on the non-SUT cores as a hostile
environment. More formally, for an $n$-core processor where
the SUT runs on core 0, a hostile environment is a mapping
$\{1, \dots, n-1\} \to \mathrm{ResourceTunedEnemies}$.
We now describe our strategy for choosing a suitable hostile
environment from the set of $|R|^{n-1}$ possibilities. First, for each
resource $r \in R$, we rank every possible hostile environment
according to the extent to which it slows down $V_r$, with the
environment that induces the largest slowdown ranked first.
Let $\mathrm{RankedEnvironments}(r)$ denote this ranking. The set of
environments is much smaller than the tuning search space, so we can
run these experiments exhaustively. The most common processor in our
case, a 4-core processor with 3 shared resources, has $3^3 = 27$
possible environments; evaluating each against the 3 victim programs
requires only 81 evaluations.
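The exhaustive enumeration can be sketched by treating each environment as a number written in base |R|, with one digit per enemy core. The function names below are hypothetical and the encoding is one convenient choice, not necessarily the one used by the framework.

```c
#include <stddef.h>

/* Number of hostile environments: num_resources ^ num_enemy_cores. */
size_t count_environments(size_t num_resources, size_t num_enemy_cores)
{
    size_t total = 1;
    for (size_t c = 0; c < num_enemy_cores; c++)
        total *= num_resources;
    return total;
}

/* Decode environment number `index` into one resource id per enemy core
 * (the digits of `index` in base num_resources). assignment[c] selects
 * which resource-tuned enemy runs on enemy core c. */
void decode_environment(size_t index, size_t num_resources,
                        size_t num_enemy_cores, size_t *assignment)
{
    for (size_t c = 0; c < num_enemy_cores; c++) {
        assignment[c] = index % num_resources;
        index /= num_resources;
    }
}
```

For a 4-core chip with 3 resources, count_environments(3, 3) yields 27 environments, and running each against the 3 victim programs gives the 81 evaluations mentioned above.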
We then select a Pareto optimal hostile environment. This
is an environment $e$ such that there does not exist an environment
$e' \neq e$ such that for all $r \in R$, $e'$ is ranked more highly
than $e$ in $\mathrm{RankedEnvironments}(r)$. A Pareto optimal $e$
may not be unique, and in this case we use a tie-breaking
mechanism, which consists of selecting
the environment that is ranked better in all but one of the
$\mathrm{RankedEnvironments}(r)$.
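The Pareto condition above can be checked directly from the per-resource rankings. The sketch below assumes the rankings are stored in a flat array where rank[e * num_resources + r] is the position of environment e in the ranking for resource r (0 is best); this layout and the function name are illustrative assumptions, and the tie-breaking step is not shown.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if environment `e` is dominated, i.e. some other environment
 * is ranked strictly better (has a smaller rank) for every resource.
 * An environment that is not dominated is Pareto optimal. */
bool is_dominated(const size_t *rank, size_t num_envs, size_t num_resources,
                  size_t e)
{
    for (size_t other = 0; other < num_envs; other++) {
        if (other == e)
            continue;

        bool better_everywhere = true;
        for (size_t r = 0; r < num_resources; r++) {
            if (rank[other * num_resources + r] >= rank[e * num_resources + r]) {
                better_everywhere = false;
                break;
            }
        }
        if (better_everywhere)
            return true;
    }
    return false;
}
```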
2.5. Search Strategies
To estimate the maximum interference caused to the victim
program, we need to find effective parameters for the enemy
templates given in Section 2.2. We intuitively expect the
search space of enemy process configurations to be discontin-
uous with respect to interference, e.g. due to caches having
fixed parameters that are typically powers of two, memory
being organised in banks, etc. Therefore we utilise search
strategies that do not make any explicit assumption about the
convexity of the cost function and do not rely on gradient
information. To do so, we evaluate the following candidates:
Random search (RAN) RAN samples different configura-
tions and remembers the best values. This approach has the
advantage of being lightweight and providing a baseline for
the more complicated techniques.
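A random-search baseline of this kind can be sketched in a few lines. Everything below is illustrative: the cache_params struct, the sampling ranges and the toy objective are assumptions standing in for a real parameter set and a real slowdown measurement.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical parameter valuation for a cache enemy template. */
struct cache_params {
    size_t buffer_size_kb;
    size_t stride_bytes;
};

/* Objective: returns the (measured) slowdown for one valuation. */
typedef double (*objective_fn)(const struct cache_params *p);

/* Toy stand-in for a real measurement, used only to exercise the search. */
double toy_objective(const struct cache_params *p)
{
    return 1.0 + (double)p->stride_bytes / 512.0;
}

/* Uniform-ish integer in [lo, hi]; rand() is used for brevity. */
static size_t rand_between(size_t lo, size_t hi)
{
    return lo + (size_t)rand() % (hi - lo + 1);
}

/* Random search: sample `budget` valuations and remember the best one.
 * The sampling ranges (1..4096 KB, 1..512 bytes) are arbitrary examples. */
struct cache_params random_search(objective_fn objective, size_t budget,
                                  double *best_slowdown)
{
    struct cache_params best = {0, 0};
    *best_slowdown = 0.0;

    for (size_t i = 0; i < budget; i++) {
        struct cache_params cand = {
            .buffer_size_kb = rand_between(1, 4096),
            .stride_bytes   = rand_between(1, 512),
        };
        double s = objective(&cand);
        if (s > *best_slowdown) {
            *best_slowdown = s;
            best = cand;
        }
    }
    return best;
}
```

In a real tuning run the objective would launch the enemy processes, time the victim program and return the resulting slowdown, so each call is expensive and the budget is set by the available tuning time.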
Simulated Annealing (SA) SA is a metaheuristic to approx-
imate global optimisation in a large search space. It is often
used when the search space is discrete (e.g., all tours that visit
a given set of cities). For problems where finding an approxi-
mate global optimum is more important than finding a precise
local optimum in a fixed amount of time, simulated annealing
may be preferable.
Bayesian Optimisation (BO) Having an unknown objective
function, the Bayesian strategy is to treat it as a random function
and place a prior over it. The prior captures our beliefs
about the behaviour of the function. After gathering the function
evaluations, which are treated as data, the prior is updated
to form the posterior distribution over the objective function.
The posterior distribution, in turn, is used to construct an
acquisition function that determines what the next query point
should be.
[Figure 3 plot: x-axis, Temperature [°C] (78 to 83); y-axis, Frequency [MHz] (600 to 1400)]
Figure 3: Frequency variation due to temperature on the Pi3.
There are advantages and disadvantages to each one of
these strategies. Random search and simulated annealing can
quickly determine the next query point. Bayesian optimisation
needs time to remodel the objective function and the acquisi-
tion function after each new query is made. On the other hand,
it is expected that Bayesian optimisation will only sample
points that will increase our knowledge of the problem. In
general, one would expect to prefer the first two strategies in cases
where the cost function is cheap to evaluate and BO in cases
where the cost function is expensive to evaluate. We evaluate
the effectiveness of these approaches in Section 3.2.
2.6. Measurement validity
A threat to the validity of our approach is that measurement er-
rors and performance fluctuations due to external factors may
cause us to wrongly conclude that our test harness is responsi-
ble for slowing down an SUT. Similar to other approaches that
make use of enemy processes [11], we deal with factors re-
lated to the hardware, operating system, and compiler. We also
make use of a statistical metric, more specifically quantiles, to
refine our results.
Hardware Hardware mechanisms are generally designed to
be transparent to the user but can be unpredictable. We took
into account the following factors in our design:
1. Frequency throttling due to increased temperature. When
the temperature of a processor increases beyond a limit,
frequency throttling can kick in. Figure 3 shows how fre-
quency is affected by temperature on the Pi3. The data was
gathered using a tool called vcgencmd. We want to guard
against the risk of attributing an SUT slow-down to inter-
ference caused by an enemy process when the slow-down
is actually due to the raised temperature of the processor.
To mitigate this risk, we measure the temperature at the end
of each experiment and discard the result if the temperature
has risen above 80 °C. We empirically found that
this temperature threshold also works well on the other
devices used.
2. Alternating between hot and cold caches. We flush the
cache at the beginning of each experiment as data left from
the previous experiment might affect the execution time of
the next one.
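The temperature check in item 1 can be sketched as below. It assumes the temperature is reported as a line of the form "temp=48.3'C", which is how the vcgencmd measure_temp subcommand is commonly seen to print it; the exact command and format used by the framework are not stated in the text, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>

/* Parse a line such as "temp=48.3'C" (assumed output format).
 * Returns true and stores the value in degrees Celsius on success. */
bool parse_temperature(const char *line, double *celsius)
{
    return sscanf(line, "temp=%lf", celsius) == 1;
}

/* A measurement is discarded when the temperature could not be read
 * or has risen above the threshold. */
bool should_discard(const char *line, double threshold_celsius)
{
    double t;
    if (!parse_temperature(line, &t))
        return true;   /* conservative: without a reading, reject the run */
    return t > threshold_celsius;
}
```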
Operating system Modern operating systems are multi-
threaded and include a range of elements, aimed at efficient
execution of a large number of threads. However, this can
make thread execution more unpredictable. We use the follow-
ing techniques to mitigate the effects that the operating system
could have on our measurements.
1. Thread migration. The operating system might decide to
migrate the thread to a different core for various reasons.
We avoid this by pinning the SUT and the enemy processes
to specific cores using the Linux taskset command.
2. SUT preemption. To prevent the kernel from stochastically
preempting the SUT, which would add the cost of context switching
to our measured time, we run the application at the
maximum possible priority.
3. Ensure parallel execution. The operating system might
decide to postpone the startup of any of the enemy pro-
cesses after the SUT has started, rendering the experiment
practically useless. To evaluate the maximum startup la-
tency in different platforms that we considered, we used
the latency evaluation framework [4] and discovered that
the maximum startup latency is generally under 1 ms. To
ensure that all the enemy processes are running before the
SUT starts executing, we wait 10 ms after all the enemy
processes have started, which should be a conservative
margin for any evaluation.
4. Remove unnecessary software. The interaction between different
software components can be difficult to predict. To reduce the
chance of such interactions, we removed all software that is
not strictly required for our experiments, such as any graphical
capabilities, logging software and network managers.
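Items 2 and 3 of the list above could be approximated from C on Linux as sketched below. This is an assumption-laden illustration: the text does not say which scheduling policy is used to obtain the maximum priority, so SCHED_FIFO is a guess, and the helper names are hypothetical. Core pinning is left to taskset, as in the text.

```c
#include <sched.h>
#include <time.h>

/* Request the highest priority of the SCHED_FIFO real-time policy for the
 * calling process (assumed policy). This typically requires root
 * privileges. Returns 0 on success, -1 on failure. */
int raise_priority(void)
{
    struct sched_param sp = {
        .sched_priority = sched_get_priority_max(SCHED_FIFO)
    };
    return sched_setscheduler(0, SCHED_FIFO, &sp);
}

/* Sleep for `ms` milliseconds, e.g. the 10 ms margin that lets all enemy
 * processes start before the SUT. Returns 0 on success. */
int wait_ms(long ms)
{
    struct timespec ts = {
        .tv_sec  = ms / 1000,
        .tv_nsec = (ms % 1000) * 1000000L
    };
    return nanosleep(&ts, NULL);
}
```

A harness following this sketch would start the enemy processes, call wait_ms(10), and only then launch and time the SUT.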
Compiler The compiler might optimise away part of the
code in the enemy process to reduce execution time, decreas-
ing the stress it is intended to put on specific resources. To
avoid this, we generate random numbers at runtime for certain
elements, e.g. the number written by memset. Furthermore,
we run the compiler with the -O0 flag.
Statistical analysis We can never truly get rid of all non-
deterministic elements of our environment that can interfere
with our measurements. For this reason, we need to measure
multiple runs and apply a statistical analysis.
We need a metric that can reliably estimate the worst-case
execution time and ignore the unreliable outliers. De Oliveira et
al. [9] show how quantiles can be used to compare the latency
and end-to-end times of two different Linux schedulers. We
follow a similar approach: we run the same experiment multiple
times and calculate the quantile. Since we are interested in
the worst-case behaviour, we would naturally want to select
a quantile that is as close to the 100th as possible. However,
choosing too high a quantile will not properly exclude
outliers.
Figure 4 shows the variance of different quantiles for our
development platforms. For each board, we ran the coremark
benchmark alongside some manually tuned enemy processes
and measured the execution time 1000 times. We divided these
execution times into 40 sets of 250 data points each and
computed every quantile from the 75th to the 100th for each set. Afterwards
we calculated the variance of each quantile across the 40 sets for each development
board and plotted the results. The figure shows how selecting
too high a quantile results in noisy data. For that reason, we
have chosen the 90th quantile as our slowdown metric.
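The variance computation behind Figure 4 can be sketched as follows. This is a minimal illustration rather than our analysis script: the nearest-rank quantile definition, the function names and the tiny data set are assumptions chosen to keep the example self-contained.

```python
import math
import statistics


def quantile(samples, q):
    """Nearest-rank q-quantile of samples, for 0 < q <= 1."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def quantile_variance(times, set_size, q):
    """Variance of the q-quantile across consecutive sets of set_size samples."""
    sets = [times[i:i + set_size]
            for i in range(0, len(times) - set_size + 1, set_size)]
    return statistics.variance([quantile(s, q) for s in sets])


# Illustrative timings only: four sets of ten values, one containing an outlier.
base = [float(v) for v in range(1, 11)]
times = base * 3 + base[:-1] + [100.0]
for q in (0.9, 1.0):
    print(q, quantile_variance(times, set_size=10, q=q))
```

In this toy data the 0.9 quantile is identical in every set, whereas the 1.0 quantile (the maximum) is dominated by the single outlier, which is the effect that motivates choosing a quantile below the 100th.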
Another issue that we take into consideration is the number
of measurements required for obtaining a reliable value of the
quantile. For this reason, we measure 20 times with the same
configuration and calculate the 90% confidence interval. If
the interval is too wide, we repeat
the process and add more measurements until it decreases to
a desired threshold. We do this by calculating the difference
between the quantile and the interval endpoints and checking
that it is not higher than 5%. However, it often happens that
the measurements never converge to the desired threshold. For
this reason, we limit the number of measurements to 200.
3. Experimental setup
We now evaluate the effectiveness of our approach by running
it on a collection of embedded development boards, and eval-
uating its effect on a series of industry benchmarks common
in time-constrained domains. In Section 3.1 we present the
utilised hardware and afterwards in Section 3.2 we compare
the considered search strategies. We conclude this part with
Section 3.3 where we describe how we developed a hostile
environment for each platform.
3.1. Hardware and Benchmarks
Benchmarks The synthetic victim programs we created are
designed to be especially vulnerable to shared resource inter-
ference. While these synthetic applications show how we can
achieve extreme interference for a specific resource, we are
also interested in observing the effects on industry standard
benchmarks. These benchmarks are summarised in Table 3.
Figure 4: Variation of the quantile. (Plot of variance [e-6] against quantile, from 0.75 to 1, with one series for each of Pi3, 410c, 570X, T3 and M3.)
Table 3: Benchmarks used to evaluate our approach, along
with a short alias to use in figures.
Suite      Benchmark name                   Alias
Coremark   coremark                         a
Autobench  bitmnp-rspeed-puwmod-4K          b
           bitmnp-rspeed-puwmod-4M          c
           matrix-tblook-4K                 d
           matrix-tblook-4M                 e
           puwmod-rspeed-4K                 f
           puwmod-rspeed-4M                 g
           rspeed-idctrn-canrdr-4K          h
           rspeed-idctrn-canrdr-4M          i
           rspeed-idctrn-iirflt-4K          j
           rspeed-idctrn-iirflt-4M          k
           ttsprk-a2time-matrix-4K          l
           ttsprk-a2time-matrix-4M          m
           ttsprk-a2time-pntrch-4K          n
           ttsprk-a2time-pntrch-4M          o
           ttsprk-a2time-pntrch-aifirf-4K   p
           ttsprk-a2time-pntrch-aifirf-4M   q
           ttsprk-a2time-pntrch-idctrn-4K   r
           ttsprk-a2time-pntrch-idctrn-4M   s
           ttsprk-a2time-pntrch-tblook-4K   t
           ttsprk-a2time-pntrch-tblook-4M   u
Table 4: Development boards used to evaluate our approach.
Name              Short name  SoC        Arch     Cores
Raspberry Pi 3 B  Pi3         BCM2837    ARM A53  4
DragonBoard 410c  410c        Adreno306  ARM A53  4
Intel Joule 570X  570X        Atom       x86      4
Nano-PC T3        T3          S5P6818    ARM A53  8
BananaPi M3       M3          A837       A7       8
EEMBC Coremark [3] is a standardised benchmark used for
evaluating processors. It is composed of implementations of
the following algorithms: list processing (find and sort), matrix
manipulation (common matrix operations), state machine
(determine if an input stream contains valid numbers), and
CRC (cyclic redundancy check).
EEMBC Autobench2 [2] consists of automotive workloads,
including: road speed calculation and finite impulse response
filters. This benchmark suite is of interest for the real-time
industry and has been used in the evaluation of other works in
this domain, e.g. [13, 12].
Hardware We have chosen a range of development boards,
containing both ARM and x86 CPUs to evaluate the portability
of our approach. Table 4 shows the SoC, the architecture and
the number of cores for each of them.
The operating system can have an impact on the effective-
ness of our approach. This impact is minimised to create a fair
comparison between the different development boards. We
used Debian Linux, as it was available across all platforms.
3.2. Comparing Search Strategies
We now compare the search strategies described in Section 2.5
and determine which one is the most proficient at finding
effective parameters of the enemy templates. We tune the
enemy templates using their corresponding victim program as
described in Section 2.3 with all three search strategies tuning
for 2 hours. Since all search strategies have a certain degree of
randomness and can sometimes get lucky or unlucky (even BO
starts by randomly sampling its starting points) we perform
three runs of each search method for each shared resource.
We use the Wilcoxon rank-sum method to test if values from
one set are stochastically more likely to be greater than values
from another set. This method is non-parametric, i.e. it does
not assume any distribution of values, and returns a p-value
indicating the confidence of the result.
Table 5: Comparing search strategies when tuning templates
against litmus tests. The search strategies are placed in order
of effectiveness, with the ordering symbols described in
Section 3.2.
Board  Cache       Memory      Bus
Pi3    SA<BO<RAN   SA<RAN≈BO   SA<BO<RAN
410c   SA<RAN<BO   SA<RAN<BO   SA≈BO<RAN
570X   SA≈RAN<BO   SA<RAN<BO   SA<BO≈RAN
T3     SA<RAN≈BO   SA<RAN<BO   SA≈BO≈RAN
M3     SA<BO≈RAN   SA<RAN<BO   SA≈BO≈RAN
The results of this experiment can be found in Table 5 where
we constructed an order of the effectiveness of each search
method. However, we are more confident in some orderings than in others,
i.e. in those with a low enough p-value. When the p-value is
low (below 0.5) we have higher confidence in the ordering,
denoted by the < symbol. However, when the p-value is high
(above 0.5) we are not as confident in the ordering, denoted by
the ≈ symbol. In all cases SA seems to perform the worst. BO
performs well for the memory enemy process and RAN for the
bus enemy process. However, the difference is not clear for the
cache enemy process, with RAN and BO randomly obtaining
the best result.
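The rank-sum comparison and the resulting ordering symbols can be sketched as follows. This is an illustrative, exact one-sided version of the test written for very small samples; the function names and the example slowdowns are assumptions, and the threshold is left as a parameter so that the value quoted above can be supplied by the caller.

```python
from itertools import combinations


def rank_sum_p_value(worse, better):
    """Exact one-sided p-value that values in `better` tend to be larger."""
    pooled = sorted(worse + better)
    # Midranks handle ties: a value gets the mean of the ranks it occupies.
    rank_of = {}
    for value in set(pooled):
        positions = [i + 1 for i, v in enumerate(pooled) if v == value]
        rank_of[value] = sum(positions) / len(positions)
    ranks = [rank_of[v] for v in pooled]
    observed = sum(rank_of[v] for v in better)
    # Under the null hypothesis every choice of len(better) ranks is equally likely.
    sums = [sum(chosen) for chosen in combinations(ranks, len(better))]
    return sum(1 for s in sums if s >= observed) / len(sums)


def ordering_symbol(worse, better, threshold):
    """Return '<' when the ordering is confident, otherwise '≈'."""
    return "<" if rank_sum_p_value(worse, better) < threshold else "≈"


# Hypothetical best slowdowns from three runs of two strategies.
sa_runs = [1.1, 1.2, 1.3]
bo_runs = [1.5, 1.6, 1.7]
print("SA" + ordering_symbol(sa_runs, bo_runs, threshold=0.5) + "BO")
```

Enumerating all rank assignments is only practical for small groups. With three values per group there are 20 possible assignments, so the smallest attainable one-sided p-value is 1/20.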
RAN performs well due to the highly irregular search space
that the parameters of our enemy templates have. BO can
intelligently sample points of interest quickly and has reduced
chances of getting stuck in a local minimum. It is surprising
that SA ranks last in our comparison. Most likely the search
space is highly irregular, and the algorithm often gets stuck in
a local minimum. Of course, SA can be configured to focus
more on exploration, but then there would be no reason to use
it in place of RAN.
3.3. Creating a Hostile Environment
After determining the appropriate search strategy for each of
the enemy processes on each board, we search for the most
aggressive parameters that maximise interference. We tune
each of the enemy templates and its corresponding victim pro-
gram with the winning strategies for a more extended period
(8 hours).
Table 6 presents the maximum slowdown obtained along-
side the search strategy used. The most significant slowdowns
were obtained for the cache or main memory resources. The
bus appears less vulnerable to interference than the other two
resources.
Table 6: Maximum slowdown obtained using the best search
strategy found on the corresponding victim program.
Board            Cache   Memory  Bus
Pi3    Slowdown  16.53   6.71    1.38
       Method    RAN     BO      RAN
410c   Slowdown  1.81    2.65    1.07
       Method    RAN     BO      RAN
570X   Slowdown  1.96    2.65    1.07
       Method    BO      BO      RAN
T3     Slowdown  5.29    1.27    1.17
       Method    BO      BO      RAN
M3     Slowdown  7.50    49.47   2.18
       Method    RAN     BO      RAN
Table 7: Snippet of rank scores for 570X. The environment r is
described by a sequence of 3 letters describing what resource
is stressed by each core. For example: CMB indicates that the
first core stresses the cache, the second stresses the memory
while the third stresses the bus.
Cache               Main Memory         Bus
rank  r    score    rank  r    score    rank  r    score
1     MMM  1.51     1     BBB  1.41     1     MBM  1.16
2     MMB  1.48     2     MBM  1.34     ...   ...  ...
3     MBM  1.47     ...   ...  ...      ...   ...  ...
...   ...  ...      ...   ...  ...      ...   ...  ...
26    BCC  1.19     26    BMC  1.04     26    CBC  1.02
From Table 4 we see that the Pi3 and the 410c have the
same architecture, but implemented in different SoCs. The Pi3
is especially vulnerable to cache interference while the 410c
is much less prone to the same type of interference. It is likely
that this can be explained by microarchitectural differences
between the two boards; however, we are not aware of
the exact mechanism that causes this difference as low-level
details are generally not available for most SoCs on the market
today.
We now determine the optimal hostile environment for each
of the boards using the methodology described in Section 2.4.
An example of this approach can be seen in Table 7, where
we have the ranked list of each of the possible environments
for the 570X. For this platform, the MBM configuration is
Pareto optimal, where MBM denotes the hostile environment
where the first core stresses the Main memory, the second
core stresses the Bus and the last one also stresses the Main
memory.
4. Results
Now we evaluate the effectiveness of the hostile environment
on the benchmarks of Section 3.1. In Section 4.1 we measure
how the benchmark runtimes are influenced by our hostile
environments and then compare our method with previous
approaches in Section 4.2.
Figure 5: Slowdowns observed on the available boards. Figures (a)-(e) show the slowdowns obtained for each benchmark on
each of the considered development boards. Figure (f) shows the geometric mean of all benchmarks for each board.
Panels: (a) Pi3, (b) 410c, (c) 570X, (d) T3, (e) M3, each plotting slowdown for benchmarks a to u; (f) geometric mean of
slowdown per board, with bar labels Pi3 4.01, 410c 1.03, 570X 1.11, T3 1.07, M3 1.13.
4.1. Evaluating the Hostile Environment
With the Pareto optimal hostile environment determined for
each SoC, we evaluate its effectiveness on the benchmarks
described in Section 3.1. Figure 5 shows the results of the
hostile environment on the benchmarks for each one of the
boards. To determine if our slowdowns are statistically sig-
nificant, we calculated the 90% confidence interval for the
benchmarks running in isolation compared to running in the
hostile environment. We then proceed to determine if there
is any overlap between the two. There are only two cases
when they overlap, that is for benchmark a on the 410c and
for benchmark f on the M3. In summary, our approach produces a
statistically significant slowdown in 98% of the benchmark/board
combinations that we consider.
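The overlap check and the resulting percentage can be reproduced with a short sketch. The interval values below are placeholders rather than measured data, and the function names are assumptions; only the number of boards and benchmarks and the two overlapping cases are taken from the text.

```python
def intervals_overlap(first, second):
    """True if two closed intervals (low, high) share at least one point."""
    return first[0] <= second[1] and second[0] <= first[1]


def effectiveness(results):
    """Fraction of combinations whose isolated and hostile intervals are disjoint."""
    disjoint = sum(1 for isolated, hostile in results.values()
                   if not intervals_overlap(isolated, hostile))
    return disjoint / len(results)


boards = ["Pi3", "410c", "570X", "T3", "M3"]
benchmarks = "abcdefghijklmnopqrstu"
overlapping = {("410c", "a"), ("M3", "f")}  # the two cases reported in the text

# Placeholder intervals, not measured data.
results = {}
for board in boards:
    for bench in benchmarks:
        hostile = (1.01, 1.04) if (board, bench) in overlapping else (1.05, 1.10)
        results[(board, bench)] = ((1.00, 1.02), hostile)

print(f"{effectiveness(results):.1%}")
```

With 5 boards and 21 benchmarks there are 105 combinations, of which 103 have disjoint intervals, giving 103/105, i.e. approximately 98%.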
Figure 5f shows the geometric mean of all benchmark slow-
downs for each of the platforms. The Pi3 is the most vulnera-
ble development board in our experiments, while the 410c is
the most resilient one. This score does not provide a hard guar-
antee of the timing predictability of any of the tested boards,
but it does offer a means by which we can quickly eliminate
unreliable platforms. This experiment is in line with the re-
sults from Table 6 where only the victim processes were used,
and each resource was stressed individually. More specifically,
the large slowdowns in the victim programs can be directly
correlated to the large slowdowns in the benchmarks.
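For reference, the per-board score in Figure 5f is a geometric mean of slowdown ratios, which can be computed as in the sketch below. The function name and the sample values are illustrative assumptions, not measurements from our boards.

```python
import math


def geometric_mean(slowdowns):
    """Geometric mean of positive ratios, computed in log space."""
    return math.exp(sum(math.log(s) for s in slowdowns) / len(slowdowns))


# Illustrative slowdowns for three benchmarks on one hypothetical board.
print(geometric_mean([1.0, 2.0, 4.0]))
```

The geometric mean is the conventional choice for averaging ratios such as slowdowns, because a single benchmark with a very large slowdown influences it less than it would an arithmetic mean.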
4.2. Comparing with Hand-Written Assembly
Previous approaches have relied on hand-crafting assembly en-
emy processes to assess the maximum slowdown that a given
platform can experience. One example of such an approach in-
volves implementing a pointer chasing scenario [22], in which
the enemy process creates a large array of addresses in main
memory where each address points to a different location in
the same array. The enemy process then starts to navigate
the array using assembly code. By utilising assembly code,
the compiler is prevented from performing any optimisations.
The irregular access pattern contravenes the locality principle
on which the cache relies to store information efficiently.
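The structure of such a pointer chasing enemy can be illustrated with the sketch below. It is not the assembly code of [22]: it uses array indices in place of raw addresses, and the choice of a single random cycle (built with Sattolo's algorithm) and all function names are assumptions of this example. A Python list will not stress the memory hierarchy in the way the assembly version does; the sketch only shows the access pattern.

```python
import random


def build_chain(length, rng):
    """Random permutation forming one single cycle (Sattolo's algorithm)."""
    chain = list(range(length))
    for i in range(length - 1, 0, -1):
        j = rng.randrange(i)  # j < i guarantees a single cycle
        chain[i], chain[j] = chain[j], chain[i]
    return chain


def chase(chain, steps, start=0):
    """Follow the chain for `steps` hops and return the final position."""
    position = start
    for _ in range(steps):
        position = chain[position]  # next location depends on the previous load
    return position


chain = build_chain(1024, random.Random(0))
print(chase(chain, steps=len(chain)))  # a full cycle returns to the start
```

Because every entry lies on the same cycle, a traversal touches each location once before repeating, so consecutive accesses have no spatial regularity for the cache to exploit.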
Using the code provided by the authors of [22], we evalu-
ated our hostile environment against the previous approach.
Since this code is written in x86 assembly language, we were only
able to execute it on the 570X, which is the only x86 development
board at our disposal. We ran the benchmarks alongside
the hand-crafted assembly enemy processes and measured the
slowdown. Figure 6 shows the slowdowns observed using
the hostile environment and the hand-crafted assembly enemy
processes with two different array sizes (4KB and 4MB). We
calculated the 90% confidence interval for the obtained results
to evaluate if the differences between the two approaches are
of statistical significance. The confidence intervals proved
to be relatively large, and there was an overlap between the
two methods. In 14 cases out of the total 21 benchmarks, our
method achieves a higher slowdown. However, out of those,
only 11 of them are statistically significant. In the other 7
cases, the assembly approach can reach a higher slowdown,
but the confidence intervals always overlap.
Figure 6: Comparison with two hand-written assembly enemy processes on the 570X. The first assembly enemy process uses an
array of size 4KB to store addresses, while the second one uses an array of size 4MB. (Bar chart of slowdown for
benchmarks a to u, with three series: Tuned, Assembly 4K and Assembly 4M.)
Our experiments show a statistically significantly higher slowdown
in 52% of the applications; in the remaining 48% of the
cases, there was no statistically significant difference. While
our method does not always outperform the hand-written tests,
it has the advantage of being portable, i.e. it does
not require crafting assembly code for each specific platform.
5. Related work
Applications with similar functionality as the enemy processes
have been used before in the literature. They have been re-
ferred to as resource stressing benchmarks [22], resource
stressing kernels [12] or synthetic contenders [1]. Radojković
et al. [22] were the first to utilise such techniques by deploy-
ing assembly code to measure multi-core interference on real
application workloads. They propose a framework for quanti-
fying the maximum slowdown obtained during simultaneous
execution by stressing a single shared resource at a time. Their
work examines several Intel processors, exploring the extent
to which the interference from resource stressing benchmarks can
slow down real-time software. Nowotsch et al. [19] perform
a similar experiment on a multi-core PowerPC-based proces-
sor platform and focus specifically on the memory system.
The platform allows for different memory configurations and
provides several methods for interference mitigation. Regard-
less of configuration, SUT slowdowns are still observed when
executing the resource stressing kernels on distinct cores. Fer-
nandez et al. [13] evaluate a multi-core LEON-based processor
and run experiments with both a Linux and real-time operat-
ing system. Unsurprisingly, the slowdown is mitigated on the
real-time operating system, but not eliminated.
Fernandez et al. argue that most resource stressing bench-
marks may fail at producing safe bounds [12]. Under heavy
contention, arbitration policies of shared resources such as
round robin and first in first out produce a so-called "syn-
chrony effect" that causes the SUT to suffer a delay that is not
as severe as the potential worst-case. They propose a method
to improve the bound by varying the injection time between
requests to the shared hardware resources. Approaches such
as [16, 10] rely on randomisation of the source code to pro-
duce different memory mapping and therefore gather a large
set of possible execution times. They utilise a statistical ap-
proach called "extreme value theory" and can provide multiple
worst-case execution times alongside a confidence factor.
Tuning strategies have been used to optimise different com-
putational aspects, with Ansel et al. [6] showing how such an
approach can be used for a variety of domain-specific issues.
Wegener et al. [25] use genetic algorithms to find the inputs that
cause the longest or shortest execution time. To do so, they
formulate the search for such inputs as an optimisation prob-
lem. Law et al. [17] use simulated annealing on a single core
processor to maximise code coverage and therefore obtain an
estimate of the WCET.
Griffin et al. [1] take a different approach and train a deep
linear neural network to learn the relationship between interference
and its effect on the SUT execution time. This approach
is used to calculate an interference multiplier that can be ap-
plied to a previously calculated WCET without interference.
Previous approaches are limited by the need for developers
to tune each resource stressing benchmark for each specific
SoC, and they cannot always detect hidden interference patterns
that are specific to the underlying microarchitecture of
the system. In contrast, our approach assumes no knowledge
of the architecture and microarchitecture of the system and
can detect hidden interference patterns automatically.
6. Conclusions
We have devised a portable auto-tuning method for deter-
mining interference across a wide range of platforms. Our
approach is based on configurable enemy processes and does
not rely on advanced knowledge of the microarchitectural de-
tails of a given platform. For determining the more effective
parameters, we compared three different search strategies and
determined the better candidates.
We evaluated this method across a wide range of processors,
consisting of both ARM and x86 processors using industry
standard benchmarks. Our approach is capable of causing
interference in 98% of the cases. We compared the slowdowns
caused by our auto-tuned hostile environments with hand-
tuned hostile environments from prior work and showed that
the slowdowns are comparable and, in some cases, our method even
achieves statistically significantly higher slowdowns.
References
[1] Exploring and understanding multicore interference from observable
factors, 2017.
[2] EEMBC autobench 2, 2018. Available online
https://www.eembc.org/autobench2//index.php.
[3] EEMBC coremark, 2018. Available online
https://www.eembc.org/coremark//index.php.
[4] Latency evaluation framework, 2018. Available online
https://git.kernel.org/pub/scm/linux/kernel/git/clrkwllms/rt-tests.git/.
[5] Andreas Abel, Florian Benz, Johannes Doerfert, Barbara Dörr, Sebas-
tian Hahn, Florian Haupenthal, Michael Jacobs, Amir H. Moin, Jan
Reineke, Bernhard Schommer, and Reinhard Wilhelm. Impact of re-
source sharing on performance and performance prediction: A survey.
In Pedro R. D’Argenio and Hernán Melgratti, editors, CONCUR 2013 –
Concurrency Theory, pages 25–43, Berlin, Heidelberg, 2013. Springer
Berlin Heidelberg.
[6] Jason Ansel, Shoaib Kamil, Kalyan Veeramachaneni, Jonathan Ragan-
Kelley, Jeffrey Bosboom, Una-May O’Reilly, and Saman Amarasinghe.
Opentuner: An extensible framework for program autotuning. In
International Conference on Parallel Architectures and Compilation
Techniques, Edmonton, Canada, August 2014.
[7] Guy Berthon, Marc Fumey, Xavier Jean, Helene Misson, Laurence
Mutuel, and Didier Regis. White Paper on Issues Associated with
Interference Applied to Multicore Processors. Technical report, Thales
Avionics, 2733 South Crystal Drive, Suite 1200, Arlington, VA 22202,
2016.
[8] Vincent Brindejonc and Roger Anthony. Avoidance of Dysfunctional
Behavior of Complex COTS Used in an Aeronautical Context. In
Lambda-Mu RAMS Conference, Dijon, France, 2014.
[9] Augusto Born de Oliveira, Sebastian Fischmeister, Amer Diwan,
Matthias Hauswirth, and Peter F. Sweeney. Why you should care
about quantile regression. SIGARCH Comput. Archit. News, 41(1):207–
218, March 2013.
[10] Enrique Díaz, Mikel Fernández, Leonidas Kosmidis, Enrico Mezzetti,
Carles Hernandez, Jaume Abella, and Francisco J. Cazorla. Mc2:
Multicore and cache analysis via deterministic and probabilistic jitter
bounding. In Johann Blieberger and Markus Bader, editors, Reliable
Software Technologies – Ada-Europe 2017, pages 102–118, Cham,
2017. Springer International Publishing.
[11] Zhenman Fang, Sanyam Mehta, Pen-Chung Yew, Antonia Zhai, James
Greensky, Gautham Beeraka, and Binyu Zang. Measuring microar-
chitectural details of multi- and many-core memory systems through
microbenchmarking. ACM Trans. Archit. Code Optim., 11(4):55:1–
55:26, January 2015.
[12] G. Fernandez, J. Jalle, J. Abella, E. Quiñones, T. Vardanega, and F. J.
Cazorla. Computing safe contention bounds for multicore resources
with round-robin and fifo arbitration. IEEE Transactions on Computers,
66(4):586–600, April 2017.
[13] Mikel Fernández, Roberto Gioiosa, Eduardo Quiñones, Luca Fossati,
Marco Zulianello, and Francisco J. Cazorla. Assessing the suitability
of the ngmp multi-core processor in the space domain. In Proceedings
of the Tenth ACM International Conference on Embedded Software,
EMSOFT ’12, pages 175–184, New York, NY, USA, 2012. ACM.
[14] John L. Hennessy and David A. Patterson. Computer Architecture:
A Quantitative Approach. Morgan Kaufmann, Amsterdam, 5 edition,
2012.
[15] L. M. Kinnan. Use of multicore processors in avionics systems and
its potential impact on implementation and certification. In 2009
IEEE/AIAA 28th Digital Avionics Systems Conference, pages 1.E.4–1–
1.E.4–6, Oct 2009.
[16] L. Kosmidis, R. Vargas, D. Morales, E. Quiñones, J. Abella, and F. J.
Cazorla. Tasa: Toolchain-agnostic static software randomisation for
critical real-time systems. In 2016 IEEE/ACM International Confer-
ence on Computer-Aided Design (ICCAD), pages 1–8, Nov 2016.
[17] Stephen Law and Iain Bate. Achieving Appropriate Test Coverage
for Reliable Measurement-Based Timing Analysis. Proceedings -
Euromicro Conference on Real-Time Systems, 2016-Augus:189–199,
2016.
[18] J. Lin, Q. Lu, X. Ding, Z. Zhang, X. Zhang, and P. Sadayappan. En-
abling software management for multicore caches with a lightweight
hardware support. In Proceedings of the Conference on High Perfor-
mance Computing Networking, Storage and Analysis, pages 1–12, Nov
2009.
[19] J. Nowotsch and M. Paulitsch. Leveraging multi-core computing archi-
tectures in avionics. In 2012 Ninth European Dependable Computing
Conference, pages 132–143, May 2012.
[20] P. Parkinson. The challenges of developing embedded real-time
aerospace applications on next generation multi-core processors. In
Aviation Electronics Europe, Munich, Germany, April 2016.
[21] P. Parkinson. Update on using multicore processors with a commercial
arinc 653 implementation. In Aviation Electronics Europe, Munich,
Germany, April 2017.
[22] Petar Radojković, Sylvain Girbal, Arnaud Grasset, Eduardo Quiñones,
Sami Yehia, and Francisco J. Cazorla. On the evaluation of the impact
of shared resources in multithreaded cots processors in time-critical
environments. ACM Trans. Archit. Code Optim., 8(4):34:1–34:25,
January 2012.
[23] Tyler Sorensen and Alastair F. Donaldson. Exposing errors related to
weak memory in gpu applications. SIGPLAN Not., 51(6):100–113,
June 2016.
[24] P. K. Valsan, H. Yun, and F. Farshchi. Taming non-blocking caches
to improve isolation in multicore real-time systems. In 2016 IEEE
Real-Time and Embedded Technology and Applications Symposium
(RTAS), pages 1–12, April 2016.
[25] Joachim Wegener, Harmen Sthamer, Bryan F. Jones, and David E.
Eyres. Testing real-time systems using genetic algorithms. Software
Quality Journal, 6(2):127–135, Jun 1997.
[26] H. Yun, R. Mancuso, Z. P. Wu, and R. Pellizzoni. Palloc: Dram
bank-aware memory allocator for performance isolation on multicore
platforms. In 2014 IEEE 19th Real-Time and Embedded Technology
and Applications Symposium (RTAS), pages 155–166, April 2014.
[27] H. Yun, G. Yao, R. Pellizzoni, M. Caccamo, and L. Sha. Memguard:
Memory bandwidth reservation system for efficient performance iso-
lation in multi-core platforms. In 2013 IEEE 19th Real-Time and
Embedded Technology and Applications Symposium (RTAS), pages
55–64, April 2013.
