A Compiler Assisted Scheduler for Detecting and Mitigating Cache-Based
  Side Channel Attacks by Khan, Sharjeel et al.
Sharjeel Khan
Georgia Institute of Technology
smkhan@gatech.edu
Girish Mururu
Georgia Institute of Technology
girishmururu@gatech.edu
Santosh Pande
Georgia Institute of Technology
santosh@cc.gatech.edu
Abstract
Detection and mitigation of side channel attacks is a very
important problem. Although this is an active area of research,
the solutions proposed in the literature have limitations in
that they do not work in a real world multi-tenancy setting
on servers, have false positives or high overheads limiting
their applicability. In this work, we demonstrate a compiler
guided scheduler, Biscuit, which detects the cache based side
channel attacks for processes scheduled on multi-tenancy
server farms with very high accuracy. At the heart of the
scheme is the generation and use of a cache miss model
which is inserted by the compiler at the entrances of loop
nests to predict the underlying cache misses. Such inserted
beacons convey the cache miss information to the scheduler
at run time which uses it to co-schedule processes such that
their combined cache footprint does not exceed the maximum
capacity of the last level cache. The scheduled processes are
then monitored for actual vs predicted cache misses and when
an anomaly is detected, the scheduler performs a binary search
to isolate the attacker. We show that Biscuit is able to detect
and mitigate Prime+Probe, Flush+Reload, and Flush+Flush
attacks on OpenSSL cryptography algorithms with an F-score
of 1 and also detect and mitigate degradation of service on
a vision application suite with an F-score of 0.92. When no
attack is under way, the scheme imposes low overheads (up to a
maximum of 6%). We believe that, due to its ability to deal
with multi-tenancy, its attack detection precision, its ability to
mitigate, and its low overheads, such a scheme is practicable.
1 Introduction
Modern servers are multi-core machines that run multiple
processes simultaneously. These processes are protected from
one another by process isolation through virtual address
spaces; in addition, many servers adopt virtual machines
for multi-tenancy to achieve complete software-stack isolation.
The purpose of isolation is to make the memory contents
or private data of one process inaccessible to other
processes. In spite of such mechanisms, there have been
attempts to gain access to private data. Private data has
traditionally been leaked or attacked by exploiting memory
corruption or by diverting the control flow of a process through
input strings [4, 30], for which strong defense mechanisms
[1, 3, 6, 7, 12, 21, 23, 42] have been used. While these
mechanisms safeguard against faults in the program itself,
side-channel attacks [9, 15, 16, 18, 20, 41] are becoming ubiquitous.
Side-channel attacks are a class of attacks that extract the
secret key of a cryptography algorithm by recording changes in
the physical properties of the machine, which act as a
side channel. The secret key is obtained by analyzing the
differential information exhibited by that physical property.
Attacks have used different physical properties of a system,
such as time [9, 20, 41], power
consumption [22], memory consumption [13], sound [8], or
electromagnetic emissions [29], to leak data.
Among the side-channel attacks that leverage different
physical properties, time-based attacks that utilize caches are
the most prevalent and are attractive to attackers for the
following reasons:
• Caches are a major component of the data access pipeline
used to reduce memory latency in all computer systems.
Since caches are omnipresent, such attacks can
be staged on a wide variety of systems.
• Cache-based side-channel attacks are easier to monitor
because no external equipment is required.
• The attack can be carried out remotely, without the physical
access to the machine that several other side-channel
attacks require.
• The attack also slows down the victim considerably, thus
hindering the process from servicing its users.
Currently, the literature describes three well-known cache-
based side-channel attacks which are: Flush+Flush [9],
Prime+Probe [20], and Flush+Reload [41]. These attacks
arXiv:2003.03850v2 [cs.CR] 10 Mar 2020
force the victim’s data out of the cache by occupying the entire
cache space and then record the pattern in which the cache
sets were filled in by the victim. The adversary repeats this
step and then analyzes the cache access pattern to obtain the
victim’s secret key, which has been demonstrated successfully
in [24, 31, 40, 41]. Note that even if the attacker does not
succeed in analyzing the cache access pattern
and retrieving the secret key, she still manages to
slow down the victim substantially, thus degrading the service
of the victim. Thus, a degradation of service (DS) attack is
often one of the side effects of a key-stealing attack.
1.1 Cache-based side-channel Attacks
Several cache-based side-channel attacks have been studied.
Here, we elaborate on the three well-known and recent cache-
based side-channel attacks.
1. Flush and Reload: This technique relies on identical
pages being shared between the attacker and victim processes;
in particular, the victim shares the pages containing
cryptography code or data with the attacker. The
adversary also assumes that a selected set of cache lines can be
flushed by invoking certain instructions, for
example the clflush instruction. The adversary
conducts the attack in three steps. In the first step, the
attacker flushes a memory line from the cache (e.g., using
the clflush instruction). In the second step, the attacker waits
for a fixed interval during which the victim may access the memory
line, bringing it into the cache. In the third step,
the attacker accesses the same memory line. If the victim
accessed the memory line, the attacker’s access takes
a short time. On the other hand, if the victim never
accessed the line during the wait period, the attacker’s access
requires the memory line to be fetched from memory,
which takes longer. By continuously repeating the
above steps, the attacker records the pattern of memory
accesses by the victim, which is later analyzed to deduce
the secret key.
2. Flush and Flush: This technique is similar to the above
Flush and Reload attack, has the same requirements, and
consists of the same first two steps. The attack takes into
account that the clflush instruction aborts early when the
memory line is not in the cache. Exploiting this behavior,
the attacker does not access the memory line in the third
step. Rather, she flushes the memory line again, as in
the first step. If the victim accessed the memory line
in the intervening period between the first and second
flush, the attacker’s second flush takes longer
because it needs to evict the line from all the caches.
On the other hand, if the victim never accessed the line
during the wait period, the attacker’s second flush aborts
early and takes less time. The attacker records this trace,
analyzes the differential behavior, and uses the results
to crack the secret key as in Flush and Reload.
3. Prime and Probe: This technique does not have the
prior set-up requirements of the previous two attacks
and hence can be much more pervasive. Before the attack
is initiated, the attacker creates an eviction set for a cache
set, that is, a set of memory lines that all map to that cache
set. In the first (prime) step, the attacker fills the entire
cache set with the memory lines from the eviction set.
In the second step, similar to Flush and Reload, the attacker
waits for a fixed interval of time during which the victim
may access a memory line that is brought into the same
cache set. In the final (probe) step, the attacker accesses
the eviction set again, checking whether any memory line in
the eviction set has been removed from the cache by the
victim. If the victim never evicted a cache line from the
same cache set, then the access to each memory line
in the eviction set will be short; otherwise, at least one
access will be long. By recording
the victim’s access pattern, the attacker can analyze the
differential behavior.
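The Prime and Probe steps above can be illustrated with a toy model. The following is a minimal sketch, assuming a single 4-way cache set with LRU replacement; the structure `cache_set`, the helper names, and the victim line number are hypothetical and only simulate the effect, they do not measure real timings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define WAYS 4  /* associativity of the toy cache set (assumed) */

/* One cache set with LRU replacement; tag[0] is the most recently used. */
typedef struct {
    int tag[WAYS];
    size_t used;
} cache_set;

/* Access a line; returns true on a hit, false on a miss. */
static bool access_line(cache_set *s, int line)
{
    assert(s != NULL);
    size_t pos = s->used;
    for (size_t i = 0; i < s->used; ++i) {
        if (s->tag[i] == line) { pos = i; break; }
    }
    bool hit = pos < s->used;
    if (!hit) {
        if (s->used < WAYS)
            pos = s->used++;   /* fill an empty way */
        else
            pos = WAYS - 1;    /* evict the least recently used line */
    }
    /* Move the accessed line to the most-recently-used position. */
    for (size_t i = pos; i > 0; --i)
        s->tag[i] = s->tag[i - 1];
    s->tag[0] = line;
    return hit;
}

/* Prime the set, let the victim optionally touch one line, then probe.
 * Returns the number of probe misses seen by the attacker. */
static int prime_probe_round(bool victim_accesses)
{
    cache_set s = { {0}, 0 };
    const int victim_line = 100;  /* hypothetical line mapping to this set */
    int misses = 0;

    for (int l = 0; l < WAYS; ++l)       /* prime: attacker lines 0..WAYS-1 */
        (void)access_line(&s, l);
    if (victim_accesses)                 /* wait: the victim may run */
        (void)access_line(&s, victim_line);
    for (int l = 0; l < WAYS; ++l)       /* probe: re-access the eviction set */
        if (!access_line(&s, l))
            ++misses;
    return misses;
}
```

In this model a quiet victim leaves every probe a hit, while a single victim access turns the probes into misses, which is the differential signal the attacker records. Under these assumptions it is also the source of the extra cache misses that a detector can look for.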
Note that, regardless of the exact attack technique
above, each attack results in a significant number of cache misses,
which can serve as the basis for detecting an attack. The key
questions to be answered, however, are: what is the expected
number of cache misses under normal (no-attack) execution at a given
program point during the application’s dynamic execution, and
how can one carefully control the expected cache behavior
so that departures from it can be reliably
declared as attacks? Through a combination of compiler
analyses that generate cache models and careful control of
scheduling decisions, this work constructs such
a solution. Before we delve into the details of our scheme,
we first provide a detailed survey of existing solutions, citing
their pros and cons.
1.2 Defense Mechanisms
The cache-based side channel attacks shown so far directly
target the secret key in the cryptography algorithms which
calls for a serious look into possible defense mechanisms for
these attacks.
1.2.1 Prevention
Several cache-based side-channel attack prevention mechanisms
focus on changing the cache design, such as changing
the cache replacement policy [39], encrypting the cache
address [26, 27], and locking cache lines [37]. These solutions
require hardware modifications and do not apply to
existing systems; in some cases they also degrade
application performance. Hard isolation in software, which
prevents sharing of the resources that house sensitive data, has
also been studied.
Cachebar [45] is a memory management subsystem that
provides two main mechanisms against side-channel attacks.
The first mechanism copies pages to prevent them from being
shared among different processes, so that a victim using a
shared library page containing a cryptography function cannot
be traced by an adversary conducting the Flush and Reload
attack. The second mechanism limits the number of cache
lines that a process can access, thus preventing a Prime
and Probe attacker from exercising the entire eviction
set and leaving the attacker unsure of the victim’s accesses. However,
these mechanisms are closely tied to the workings of the Flush and
Reload and Prime and Probe attacks. Copying pages
places an extra burden on the system and forfeits the benefits of shared
libraries, and limiting cache access adversely impacts the
performance of genuine processes, with overheads of up to 25%.
StealthMem [14] allocates to each process isolated pages, called
stealth pages, that map to different cache sets, and
assumes that the victim’s confidential calculations and data are
placed within these stealth pages. To adhere to this constraint,
the victim’s source code must be changed. StealthMem also
disallows each process from accessing a piece of the cache, which
grows with the number of cores, and incurs overheads
of up to 5.9% during normal execution without attacks.
While Cachebar and StealthMem are hard isolation approaches,
a soft isolation software solution [33] takes a scheduler-based
approach that disrupts the tracing of the victim’s cache accesses
by pre-empting all the processes. The scheduler analyzes the
minimum run time guarantee, schedules the process with
the corresponding time slice, and also performs CPU state
cleansing between pre-emptions in order to create a soft
isolation between processes. This technique, however, increases
the latency of each process, and in server farm environments
that over-provision cores [2], de-scheduling the processes and
idling the machines further decreases machine utilization.
1.2.2 Detection and Mitigation
Several researchers have studied the detection of these attacks.
Most of the detection mechanisms are software-based
solutions [35, 36] that perform program analysis on binaries
to model the secret-key-dependent memory accesses and
control flow. The model is passed to an SMT solver to detect
leakage areas that can be exploited by side-channel attacks.
Timing channels can be disrupted through program analysis
and transformations that attempt to ensure that the CPU
cycles and the cache misses and hits are independent of the
secret data [38]. This technique leads to longer response
times and throughput degradation, with an average
overhead of 50% and a worst case of 225%.
Several other techniques involve runtime mechanisms [5,
17, 28, 43] that use performance counters to check for anoma-
lies in programs. Due to underlying false positives, these
runtime detection mechanisms do not mitigate the attack and
leave its resolution to the system administrator. In addition,
these techniques are closely tied to the cryptography
algorithms on which they detect the attack. For example,
SpyDetector [17] is a semi-supervised anomaly detection
mechanism that detects side-channel attacks at runtime. The attack
detection mechanism builds a clustering model that learns on
cache misses, cache accesses, and number of processes of the
execution windows. The predicted workload level is passed
to the clustering model, which raises an alarm for a possible
attack if a window is not within the cluster. The clustering
model is closely tied to the cryptography algorithms thus
weakening the detection for algorithms with modifications
and for different workloads. Moreover, the mechanism must
figure out the granularity of the window that captures appli-
cation phases because different applications require different
window sizes thus reducing the generality of the mechanism.
In addition to detection-only schemes, many software-based
schemes that perform mitigation after detection have also
been studied. CloudRadar [43] generates a cache-access profile
of the cryptography applications pertinent to the specific
attacks and, during program execution, looks for behaviors
that match the profile to flag an attack. It then migrates
one of the processes, or a known victim process, to mitigate the
attack. CloudRadar relies on mechanisms that can be prone
to noise, such as the execution profiles of cryptography
applications and the cache-hit or cache-miss profiles of
known attacks. A new, unknown attack with a slightly
different profile, or tricky modifications in the implementation
of a known attack, decreases the strength of this mechanism.
Also, the behavior of these applications can change in the
presence of other co-executing applications, which can lead to a
profile mismatch and allow an attack to escape detection. In a real
execution environment, co-executing applications as well as
variants of (known) attacks are very likely, and this points to
the need for a new technique.
In summary, most of the above mechanisms are either
hardware-based and do not apply to existing machines, or,
in the case of software-based solutions, they reduce the efficiency
of caches or increase latency, or they are closely tied to
specific cryptography and cache-attack algorithms and do not
apply when multiple applications co-execute. These
mechanisms also fail to thwart the degradation of service
attack caused by these cache-based side-channel attacks. On the other
hand, current hardware-counter-based runtime detection
mechanisms such as SpyDetector suffer from relatively high false
positives and also false negatives, with an F-score of 0.83 on
Prime and Probe and Flush and Flush attacks. CloudRadar
suffers from the burden of matching the execution windows of
victim and attacker and, at a higher window granularity,
incurs a false-positive rate of up to 30%.
In this paper, we propose Biscuit, a compiler-assisted scheduler
that, in a multi-tenant environment, detects any cache-based
side-channel attack on any program and then mitigates
the attack by de-scheduling the potential culprit, thereby avoiding
any degradation of service (DS) attack. The potential culprit
is scheduled back and is allowed to run to completion when
all other processes have finished their execution. As noted
earlier, Biscuit relies on the fact that the victims of a cache-based
side-channel attack incur a significantly larger number
of cache misses compared to normal execution, as shown in
Figure 1. The number of cache misses is at least five times
that of normal execution. In order to detect the cache
behavior anomaly, Biscuit must first determine, with
significant accuracy, the number of cache misses expected at a
given program point during the
application’s normal (no attack) execution.
For this purpose, Biscuit first generates a cache-miss
model for every loop using compiler analysis, which is then
inserted in the application. During execution, this model
transmits the evaluated cache-miss values to the scheduler
for each executing process. The scheduler then leverages the
predicted cache-miss information to make scheduling
decisions, because the predicted cache misses are mainly
cold misses incurred by accessing unique memory. This
ensures that the predicted cache footprints of the scheduled
processes always fit within the last-level cache (LLC) and its
capacity is not exceeded. Such a scheduling decision therefore
ensures that deviations from the predicted cache misses
are not caused by the cache capacity being exceeded. Since
such scheduling decisions keep the predicted cache misses
close to the actual ones under multi-tenancy,
departures from the predicted behavior must be
attributable to other causes, viz. attacks. During
execution, the scheduler monitors the cache misses to check
whether the cache-miss prediction is violated for a given process.
Upon encountering such a scenario, the scheduler, through
a careful binary search, isolates the culprit responsible for
the cache misses in the victim.
We evaluated Biscuit on all three of the above cache-based
side-channel attacks on OpenSSL’s implementations of the AES,
RSA, and ECDSA cryptography algorithms in a multi-tenant
environment. Biscuit was able to catch all attacks on the
cryptography algorithms with no false positives. We also checked the
usefulness of these techniques on the San Diego Computer Vision
benchmarks to see if we can catch degradation of service
(DS) attacks on them. On these benchmarks as well, all the
attacks were caught; however, there were a few false positives.
We summarize our contributions as follows:
• A predictive cache-miss model for each loop based on
loop bounds.
• A scheduler that leverages the expected cache-miss
information to schedule processes without degrading
run-time cache behavior, and that detects cache-based
side-channel attacks by pinpointing and de-scheduling the
attacker.
• An evaluation of our scheduler using three well-known
side-channel attacks on OpenSSL’s cryptography algorithms
and on computer vision applications, for secret-key
and DS attacks, respectively.
• A demonstration that we are able to catch these attacks in
a multi-tenant environment with attack-agnostic models
and overheads of less than 6% under no attack scenario.
Figure 1: Normalized Cache Misses of Attacks
Caches have been used in multiple ways to leak the secret key
of cryptography algorithms. For example, in HIDE [46], the
attacker used the cache to induce differential address
information on the address bus to leak the secret key. This attack
requires physical access to the machine to snoop on the
address bus. We focus on time-based cache attacks in this
work; however, the attacks shown in HIDE on unmodified
hardware can also be caught by our technique, since they result
in higher cache misses to begin with. Caches have also been
used as a side channel to leak data in well-known attacks
such as Spectre and Meltdown [10, 16, 18, 32], but these attacks
exploit other architectural features, such as speculative
execution, and only use the cache as a micro-architectural
covert channel to transmit the data. In particular, they do not
target the victim’s cache, and hence tackling them is outside
the scope of this work.
The remainder of the paper is structured as follows. In
Section 2, we provide an overview of the entire framework.
In Section 3, we explain the cache-miss model, followed by the
beacon framework that interacts with the scheduler in Section
4. We explain the Biscuit scheduler in Section 5 and provide
a detailed account of the evaluation in Section 6. In Section 7,
we present some prior mechanisms against cache-based
side-channel attacks, and we conclude in Section 8.
2 Overview
To catch cache-based side-channel attacks, Biscuit relies
on the fact that the victim experiences a significantly
larger number of cache misses while being attacked than during
normal execution. To establish the expected number of
cache misses during normal execution, Biscuit first builds a
cache-miss model for every loop with the help of the compiler,
because the phases of execution during which an application
experiences cache misses are well characterized by the
loops in the program. Every loop in the application is then
prefixed with a cache-miss model, which learns the cache
misses with respect to the loop properties through training
over the input data sets. Generation and insertion of the cache
model before the loop nests forms the compilation phase of
Biscuit. All applications except the attacker must go through
the compilation phase for effective scheduling, such that
dynamic cache contention is minimal under normal (no attack)
execution. Cache misses can also be calculated through
analytical models [11]. However, these models can handle only
affine accesses and model only simple caches.
In the runtime phase, the cache-miss model infers the
cache misses of each loop before the loop executes. This
predicted information is passed on to the scheduler, which first
schedules the processes such that there is no cache contention
between the co-executing processes. The scheduler then
monitors the cache misses while the loops in the applications
execute. If a loop experiences more cache misses than
predicted, the scheduler performs a binary
search over all the executing processes to catch the plausible
culprit causing the cache misses. The culprit is then
de-scheduled until all the executing processes finish, after which
it is scheduled back.
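The binary search over co-executing processes can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes exactly one co-runner causes the anomaly, and the callback type `anomaly_fn` and the stand-in `fake_monitor` are hypothetical. A real scheduler would instead keep only part of the candidates running and re-read the victim's performance counters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback: run the victim alongside candidates [lo, hi) only and
 * report whether its observed cache misses still exceed the predicted bound. */
typedef bool (*anomaly_fn)(const int *pids, size_t lo, size_t hi, void *ctx);

/* Returns the index of the suspected attacker in pids[0..n), assuming exactly
 * one co-runner causes the anomaly. Needs about log2(n) monitoring rounds. */
static size_t isolate_culprit(const int *pids, size_t n,
                              anomaly_fn still_anomalous, void *ctx)
{
    assert(pids != NULL && n > 0);
    size_t lo = 0, hi = n;               /* the culprit lies in [lo, hi) */
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (still_anomalous(pids, lo, mid, ctx))
            hi = mid;                    /* culprit is in the half kept running */
        else
            lo = mid;                    /* culprit was in the suspended half */
    }
    return lo;
}

/* Stand-in for real monitoring: ctx points at the attacker's pid. */
static bool fake_monitor(const int *pids, size_t lo, size_t hi, void *ctx)
{
    int attacker = *(const int *)ctx;
    for (size_t i = lo; i < hi; ++i)
        if (pids[i] == attacker)
            return true;
    return false;
}
```

With five candidate processes, the search settles on the culprit after at most three monitoring rounds; the process at the returned index would then be de-scheduled until the others finish.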
2.1 Threat Models and Goals
Within a multi-tenant environment, cache-based side-channel
attacks are harder to identify, because cache misses
can result from co-executing processes rather than from an attack,
and the proposed solution mechanisms are either hardware-based,
induce runtime overhead, or are hard to apply in such an
environment with low false negatives and false positives. We
assume a multi-tenant environment with a mix of various
applications that do not overload the system; any
process that needs more than the available resources is
rescheduled onto a different node. Server farms utilize less than 50%
of the machines due to over-provisioning [2], and the cluster
scheduler can be invoked to reschedule the process onto another
node in the cluster. We assume that the plausible victim
processes are compiled with the Biscuit compiler, which inserts
a cache-miss model on top of every loop. Other processes in
the system are also compiled with the Biscuit compiler for
effective scheduling with no contention for the cache. The
attacker may or may not be compiled with the Biscuit compiler.
The Biscuit scheduler uses the compiler-inserted cache-miss
information to efficiently schedule the co-executing processes
onto the available cores such that the processes’ combined
memory footprint fits in the cache. We assume that a trusted
compiler inserts a valid cache-miss model and that the process does not
falsify the information provided to the scheduler. A secure
handshake based on a secret-key mechanism is commonly
used in such scenarios. The goal of the scheduler is to detect
any cache-based side-channel attack on any victim process,
and to find the plausible culprit process and de-schedule it, with a
high F-score.
3 Cache Miss Model
We develop a linear-regression model to predict the cache
misses for a loop. For a given loop, given enough training
inputs covering different iteration counts, the pattern of
cache misses is amplified as the number of
iterations increases. Thus, for a given loop, the number of cache misses is directly
proportional to the number of loop iterations, such that
$$CM \propto N \implies CM = \alpha \cdot N \tag{1}$$
where $CM$ is the number of cache misses for a loop that takes $N$ loop
iterations and $\alpha$ is a constant. In the case of nested loops,
the cache misses depend on the number of iterations of each
nested loop. The cache misses for the entire loop nest are a function of
the loop iterations of the nested loops, such that
$$CM = f(N_1, N_2, \ldots, N_n) \tag{2}$$
where $CM$ is the number of cache misses for the loop nest, with $N_j$ being
the number of loop iterations of nested loop $j$.
We normalize the loops by running the LLVM compiler’s
loop-simplify pass. The pass transforms each loop to
start from a lower bound of zero and increment by one
until the loop bound. Loop simplification normalizes for and
while loops that do not have data-dependent terminating
conditions. For example, the loop in Code 1 is normalized to
Code 2.
    for (i = 10; i < N; i += 2) {
        a = b + i;
    }
Code 1: Loop
    for (i = 0; i < N / 2 - 5; i++) {
        a = b + (i + 5) * 2;
    }
Code 2: Normalized Loop
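The relation between Code 1 and Code 2 can be checked with a small sketch. The helper names below are hypothetical. Note one assumption that differs slightly from the listing: with C integer division, the bound N / 2 - 5 matches the original trip count only for even N, so the sketch computes the trip count with a ceiling division, which covers odd N as well.

```c
#include <assert.h>

/* Trip count of `for (i = start; i < bound; i += step)` with step > 0.
 * This equals the upper bound of the normalized loop that counts from 0 by 1. */
static long trip_count(long start, long bound, long step)
{
    assert(step > 0);
    if (bound <= start)
        return 0;
    return (bound - start + step - 1) / step;  /* ceil((bound - start) / step) */
}

/* Original form of Code 1: accumulates the values assigned to `a`. */
static long run_original(long n, long b)
{
    long sum = 0;
    for (long i = 10; i < n; i += 2)
        sum += b + i;
    return sum;
}

/* Normalized form: the index starts at 0 and increments by 1, and the
 * original index is recovered as (i + 5) * 2, as in Code 2. */
static long run_normalized(long n, long b)
{
    long sum = 0;
    long iterations = trip_count(10, n, 2);
    for (long i = 0; i < iterations; ++i)
        sum += b + (i + 5) * 2;
    return sum;
}
```

For N = 20 both forms run five iterations, so the normalized upper bound can be read directly as the iteration count that feeds the cache-miss model.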
In the normalized loops, the (upper) loop bound is equal to the
number of loop iterations. Thus, the number of cache misses
can be written as a function of the loop bounds of each nested
loop. Equation 2 is equivalent to
$$CM = f(lb_1, lb_2, \ldots, lb_n) \tag{3}$$
where $lb_j = N_j$ and $lb_j$ is the loop bound of the $j$-th loop inside
the loop nest. Each nested loop individually adds to the
cache misses, so we can transform Equation 2 into
$$CM = f_1(lb_1) + f_2(lb_1, lb_2) + \cdots + f_n(lb_1, lb_2, \ldots, lb_n), \tag{4}$$
which is equivalent to
$$CM = \alpha_1 \cdot lb_1 + \alpha_2 \cdot lb_1 \cdot lb_2 + \cdots + \alpha_n \cdot lb_1 \cdot lb_2 \cdots lb_n. \tag{5}$$
Through linear regression, we obtain each coefficient $\alpha_k$ plus a
constant term $\alpha_0$ representing the y-intercept.
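As an illustration, Equation 5 together with the intercept can be evaluated with a running product of loop bounds. This is a minimal sketch; the function name `predict_cache_misses` and the array layout (outermost loop first) are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Evaluate Equation 5 plus the intercept for a loop nest of depth n:
 *   CM = alpha0 + sum over k of alpha[k] * lb[0] * ... * lb[k]
 * alpha holds alpha_1..alpha_n and lb holds the bounds, outermost first. */
static double predict_cache_misses(double alpha0, const double *alpha,
                                   const double *lb, size_t n)
{
    assert(alpha != NULL && lb != NULL);
    double cm = alpha0;
    double product = 1.0;           /* running product lb_1 * ... * lb_k */
    for (size_t k = 0; k < n; ++k) {
        product *= lb[k];
        cm += alpha[k] * product;
    }
    return cm;
}
```

For a two-level nest with bounds 10 and 20, coefficients 2.0 and 0.5, and intercept 100, the prediction is 100 + 2.0 * 10 + 0.5 * 200 = 220 cache misses.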
We record the cache misses for a loop by reading the
performance counters during the execution of the loop. To record
this data, our LLVM pass inserts perf instrumentation that starts
monitoring cache misses before the loop and stops monitoring after
the loop. For loops whose bounds cannot be extracted
statically, we profile the loop iterations during the training
runs that collect the cache-miss data. The cache misses and
the number of loop iterations are fed to scikit-learn’s linear
regression [25] to learn a linear cache-miss regression model
as in Equation 5 for each loop.
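The paper trains its models with scikit-learn. As a stand-in for readers who want to see the arithmetic, the sketch below fits the single-loop case (one coefficient plus an intercept) with closed-form ordinary least squares. The type `linear_fit` and the function name are hypothetical; a nested loop would need a multi-variable fit over the product terms of Equation 5, which this sketch does not cover.

```c
#include <assert.h>
#include <stddef.h>

/* Fitted single-loop model: misses ~= alpha1 * iterations + alpha0. */
typedef struct {
    double alpha1;
    double alpha0;
} linear_fit;

/* Ordinary least squares over n training runs of one loop. */
static linear_fit fit_cache_miss_model(const double *iters,
                                       const double *misses, size_t n)
{
    assert(iters != NULL && misses != NULL && n >= 2);
    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
    for (size_t i = 0; i < n; ++i) {
        sx += iters[i];
        sy += misses[i];
        sxx += iters[i] * iters[i];
        sxy += iters[i] * misses[i];
    }
    double denom = (double)n * sxx - sx * sx;
    assert(denom != 0.0);  /* needs at least two distinct iteration counts */

    linear_fit fit;
    fit.alpha1 = ((double)n * sxy - sx * sy) / denom;
    fit.alpha0 = (sy - fit.alpha1 * sx) / (double)n;
    return fit;
}
```

Given training runs where the misses follow 3 * N + 50 exactly, the fit recovers a slope of 3 and an intercept of 50; real counter readings are noisy, which is why the paper adds an upper bound on top of the prediction.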
The linear model predicts only a single value for the
cache misses, but because of non-determinism the cache misses
vary even for the same loop with the same number of iterations
when it is executed several times. The variance differs between small and
large loop bounds. To accommodate the deviation due to
non-determinism, we calculate the standard deviation over
multiple runs for each loop and then determine the maximum
ratio of the standard deviation to the average cache misses, i.e.,
$$k = \max\left(\frac{\sigma_l}{cm_l}\right), \quad \forall \text{ loop } l \tag{6}$$
where $\sigma_l$ is the standard deviation for each loop $l$ and $cm_l$
is the average number of cache misses of the corresponding loop. This
ratio is then applied to the predicted value in Equation 5 to
account for non-determinism as
$$CM(U) = CM + k \cdot CM \tag{7}$$
where $CM(U)$ is the upper bound on the cache misses. Note
that we do not need a lower bound on the cache misses, because
for an attack to be flagged the cache misses must exceed
$CM(U)$, and any number of misses within the upper bound can be
considered safe execution. The linear model in Equation 5, along with
the upper bound in Equation 7, is instrumented before the loop header
in the LLVM intermediate representation (IR). During runtime, these
equations are evaluated and $CM(U)$ is passed on as a beacon
to the scheduler.
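Equations 6 and 7 and the resulting detection rule can be sketched as below. The function names are hypothetical, and the per-loop standard deviations and means are assumed to have been computed from the training runs beforehand.

```c
#include <assert.h>
#include <stddef.h>

/* Equation 6: k is the largest ratio of standard deviation to mean cache
 * misses over all profiled loops. */
static double variation_factor(const double *sigma, const double *mean,
                               size_t loops)
{
    assert(sigma != NULL && mean != NULL);
    double k = 0.0;
    for (size_t l = 0; l < loops; ++l) {
        assert(mean[l] > 0.0);
        double ratio = sigma[l] / mean[l];
        if (ratio > k)
            k = ratio;
    }
    return k;
}

/* Equation 7: upper bound on the predicted cache misses. */
static double upper_bound(double cm, double k)
{
    return cm + k * cm;
}

/* Detection rule: only misses above the upper bound are flagged. */
static int exceeds_prediction(double observed, double cm, double k)
{
    return observed > upper_bound(cm, k);
}
```

With a prediction of 1000 misses and k = 0.25, the bound is 1250: an observation of 1200 misses counts as normal execution, while 5000 misses (the kind of multiple reported for attacked victims) is flagged.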
The cache-miss model serves two purposes for the Biscuit
scheduler. First, it is used to check whether a process is experiencing
more cache misses than it should. Second, it is used to schedule
processes with minimal cache contention, for which the
scheduler uses the cache-miss information as the cache/memory
footprint of the process. When a process is profiled
with no other co-executing processes, its unique
memory accesses result in cache misses; misses due to
conflicts are avoided because efficient hash mechanisms
utilize the entire cache; and any capacity misses mean that
the cache is too small for the process, so the process
must be run alone or de-scheduled as per the scheduling logic
explained in Section 5. Hence, the predicted cache misses can
be used as the memory footprint for all purposes of scheduling.
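One way to picture the footprint-based scheduling is a greedy admission check against the LLC capacity. The sketch below is an assumption-laden illustration rather than the policy of Section 5: it converts predicted misses to bytes using an assumed 64-byte cache line, and the function name `admit_processes` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE_BYTES 64ul  /* assumed cache line size */

/* Greedily admit processes while the combined predicted footprint fits in
 * the LLC. predicted_misses[i] is the beacon value for the loop that process
 * i is about to run; admitted[i] is set to 1 if it may run now, else 0.
 * Returns the number of admitted processes. */
static size_t admit_processes(const unsigned long *predicted_misses, size_t n,
                              unsigned long llc_bytes, int *admitted)
{
    assert(predicted_misses != NULL && admitted != NULL);
    unsigned long used = 0;
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        unsigned long footprint = predicted_misses[i] * CACHE_LINE_BYTES;
        if (footprint <= llc_bytes - used) {
            used += footprint;      /* fits next to the already admitted set */
            admitted[i] = 1;
            ++count;
        } else {
            admitted[i] = 0;        /* would overflow the LLC; must wait */
        }
    }
    return count;
}
```

For an 8 MiB LLC and predicted footprints of 4, 3, 2, and 1 MiB, the first, second, and fourth processes are admitted and the third waits, so the co-running set never exceeds the cache capacity.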
4 Beacons
The cache miss equations are inserted in the pre-header of
the loop. These equations are resolved using the run time
loop bound values to predict the entire loop’s cache misses.
The predicted cache misses value is passed to the scheduler
through function calls to a library called beacons. The library
interfaces with the scheduler through shared memory in our
implementation but other secure communication channels can
be employed.
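A beacon message might look like the sketch below. The paper does not give the library's interface, so every name here (`beacon_msg`, `beacon_channel`, `beacon_send`) is hypothetical, and the fixed-size array merely stands in for the shared-memory region; a real implementation would also need synchronization between the processes and the scheduler.

```c
#include <assert.h>
#include <stddef.h>

#define BEACON_SLOTS 16  /* capacity of the stand-in channel (assumed) */

typedef enum { BEACON_LOOP_ENTRY, BEACON_LOOP_EXIT } beacon_event;

/* One message from an instrumented process to the scheduler. */
typedef struct {
    int pid;
    int loop_id;
    beacon_event event;
    double predicted_misses;  /* upper bound CM(U); unused on loop exit */
} beacon_msg;

/* Stand-in for the shared-memory region read by the scheduler. */
typedef struct {
    beacon_msg slots[BEACON_SLOTS];
    size_t count;
} beacon_channel;

static void beacon_channel_init(beacon_channel *ch)
{
    assert(ch != NULL);
    ch->count = 0;
}

/* Append a beacon; returns 0 on success or -1 if the channel is full. */
static int beacon_send(beacon_channel *ch, int pid, int loop_id,
                       beacon_event event, double predicted_misses)
{
    assert(ch != NULL);
    if (ch->count >= BEACON_SLOTS)
        return -1;
    beacon_msg *m = &ch->slots[ch->count++];
    m->pid = pid;
    m->loop_id = loop_id;
    m->event = event;
    m->predicted_misses = predicted_misses;
    return 0;
}
```

An instrumented loop would send one entry message carrying the predicted misses from its pre-header and one exit message when the loop finishes, which is enough for the scheduler to know which region each process is currently executing.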
These beacons are classified based on the precision of the
loop bound information for the loop. The imprecision of the
data stems from the type of loop. Many loops iterate for a
fixed number of times determined by loop bounds that
may be unknown at compile time but are still loop invariant. However,
some loops terminate based on a data-dependent
condition, and the number of iterations of such loops
cannot be resolved at the pre-header of the loop. It is also
difficult to calculate the loop iterations for loops with non-affine
control variables (loop bounds or loop index). Based on the
precision of the loop bound information, and in turn of the
cache-miss information passed on to the scheduler, the beacons are
classified as Precise or Expected Beacons.
1. Precise Beacon: The loop bounds are known
before the execution of the loop begins and are loop
invariant. One such loop nest, in which the bound of
each loop within the nest is a loop-nest invariant, is
the rectangular loop nest shown in Code 3.
    for (int i = 0; i <= N; ++i)
    {
        a[i] = i + 1;
        for (int j = 0; j < M; ++j)
            a[j] = j + 1;
    }
Code 3: Rectangular Loop
The cache misses based on Equation 5 for this loop nest are
$$CM = \alpha_1 \cdot M \cdot N + \alpha_2 \cdot N + \alpha_0. \tag{8}$$
Similarly, in triangular loops, as shown in Code 4,
the inner loop bound depends on the
outer loop index variable, so the number of inner loop
iterations grows with the outer index up to the upper bound,
for a total of approximately $N^2/2$ iterations.
    for (int i = 0; i <= N; ++i)
    {
        a[i] = i + 1;
        for (int j = 0; j < i; ++j)
            a[j] = a[j] + (j - i);
    }
Code 4: Triangular Loop
The cache misses based on Equation 5 for this triangular
loop are
$$CM = \alpha_1 \cdot \frac{N^2}{2} + \alpha_2 \cdot N + \alpha_0. \tag{9}$$
2. Expected Beacon
The Expected beacon type is of loops with loop bounds
that are either data dependent or loop control variables
that are non-affine. For these loops, the loop bound is
either not known before the execution of the loop as
in data dependent loop or cannot be calculated because
of the limitations of the compiler tools as in non-affine
loops. For example, the below loop bound Code 5 is
data dependent and the number of loop iterations is only
known after the loop terminates.
1 whi le ( a [ i ] == i )
2 {
3 i += 1 ;
4 }
Code 5: Data Dependent Loop
For such loops, we calculate the expected cache misses
using the average loop iteration count collected during the
training phase of the cache-miss model. Plugging this
expected value into Equation 5, the cache misses for this
loop are given by
\[ CM = \alpha_1 \cdot E + \alpha_0, \tag{10} \]
where E is the expected loop bound. If a loop nest
contains both precise and expected loops, the nest is
classified as precise or expected according to the type of
its outermost loop.
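As a minimal sketch, the three predictions of Equations 8 to 10 can be written as plain C functions. The struct `miss_model` and the function names are hypothetical and only serve the illustration; the coefficients would come from the trained cache-miss model, and the values used in the usage note are made up.

```c
#include <assert.h>

/* Regression coefficients of the cache-miss model (Equation 5).
 * Hypothetical container: the field names are not from the paper. */
typedef struct {
    double alpha0; /* constant term */
    double alpha1; /* weight of the innermost iteration count */
    double alpha2; /* weight of the outer iteration count */
} miss_model;

/* Equation 8: rectangular nest with outer bound n and inner bound m. */
double cm_rectangular(const miss_model *c, double n, double m)
{
    assert(n >= 0.0 && m >= 0.0);
    return c->alpha1 * m * n + c->alpha2 * n + c->alpha0;
}

/* Equation 9: triangular nest, inner body runs about n*n/2 times. */
double cm_triangular(const miss_model *c, double n)
{
    assert(n >= 0.0);
    return c->alpha1 * (n * n) / 2.0 + c->alpha2 * n + c->alpha0;
}

/* Equation 10: loop with unknown bound, using the expected trip
 * count gathered during training. */
double cm_expected(const miss_model *c, double expected_iters)
{
    assert(expected_iters >= 0.0);
    return c->alpha1 * expected_iters + c->alpha0;
}
```

For instance, with the made-up coefficients alpha0 = 10, alpha1 = 0.5 and alpha2 = 2, `cm_rectangular` with N = 100 and M = 8 evaluates to 610 predicted misses, while `cm_expected` with E = 40 evaluates to 30.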
Both precise and expected beacons are first inserted in the
loop nest pre-header. However, the parent function of this
loop nest may itself be called inside another function's loop
nest, which can result in multiple beacon calls to the
scheduler. In such cases, we interprocedurally hoist the beacon
calls outside the caller function's outermost loop. However, due to such
hoisting, the beacon loop bound variable information required
for precise beacons may not be available at the external loop
nest pre-headers (since it may be interprocedurally defined
inside the callee function). In some cases, a backward inter-
procedural slice can be used to determine the inner loop bound
but in other cases, estimates must be used. For the latter case,
the precise beacons are converted to expected beacons and
the expected loop iterations are used instead at the external
loop nest pre-header. The LLVM beacon pass also inserts a
completion beacon at each exit of the loop nest. This comple-
tion beacon only tells the scheduler that the loop has ended.
In other words, beacons communicate dynamic regions of
executions (loops) to the scheduler along with expected cache
misses using the hoisted cache model.
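The kind of information a beacon carries can be sketched as follows. This is an illustration under assumptions, not the interface used by the LLVM pass: the `beacon` struct, its fields, and both constructor functions are hypothetical names, and the model is reduced to the single-loop form of Equation 10. The sketch shows the fallback described above, where a beacon whose loop bound is not available at the hoisted insertion point is emitted as an expected beacon using the trained average trip count.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { BEACON_PRECISE, BEACON_EXPECTED, BEACON_COMPLETION } beacon_kind;

/* Hypothetical message sent from the instrumented process to the scheduler. */
typedef struct {
    beacon_kind kind;
    int loop_id;             /* identifies the loop nest (region) */
    double predicted_misses; /* unused for completion beacons */
} beacon;

/* Entry beacon for a loop modeled as alpha1 * iters + alpha0.
 * If the trip count is known at the insertion point the beacon is
 * precise; otherwise the trained average trip count is used. */
beacon make_entry_beacon(int loop_id, double alpha1, double alpha0,
                         bool bound_known, double actual_iters,
                         double trained_avg_iters)
{
    beacon b;
    double iters = bound_known ? actual_iters : trained_avg_iters;
    assert(iters >= 0.0);
    b.kind = bound_known ? BEACON_PRECISE : BEACON_EXPECTED;
    b.loop_id = loop_id;
    b.predicted_misses = alpha1 * iters + alpha0;
    return b;
}

/* Completion beacon: only tells the scheduler that the loop has ended. */
beacon make_completion_beacon(int loop_id)
{
    beacon b = { BEACON_COMPLETION, loop_id, 0.0 };
    return b;
}
```

An instrumented loop nest would fire the entry beacon in its pre-header and the completion beacon at each loop exit.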
5 Biscuit Scheduler
The Biscuit scheduler uses the information sent by the beacons
for two purposes. First, the scheduler uses the predicted cache
misses to schedule the processes on the available cores such
that cache contention among the simultaneously executing
processes is minimal. When cache misses are profiled with no
other co-executing process, the total cache misses are roughly
equal to the memory footprint of the executing process or, in
this case, of the executing loop region. The Biscuit scheduler
must therefore ensure that the total expected cache footprint
of all co-executing applications does not exceed the LLC (last
level cache) capacity. This ensures that the predicted cache
misses roughly equal the dynamic cache misses of each scheduled
application under normal (no attack) conditions. Second, the
scheduler uses the predicted cache misses to detect a plausible
attack when the monitored cache misses of an application exceed
the predicted cache misses. Because the scheduler avoids cache
contention, such an excess is most likely caused by an attacker
conducting a cache-based side-channel attack. These two core
duties of the Biscuit scheduler are described below.
5.1 Scheduling
The scheduler starts by scheduling one process on to each
core of every socket in the machine thus scheduling every
new process on a free core. As the processes execute, the
beacons are fired. These beacons are processed by the sched-
uler. The scheduler knows the number of sockets, number of
cores per socket and the cache size of each socket. On receiv-
ing a beacon, the scheduler checks if the memory footprint
(predicted cache-misses) fits the available cache size. For the
very first beacon, the available cache size is the cache size of
the socket the process is executing on. The available cache
size is updated by subtracting the cache requirements of the
subsequently executing beacon of a scheduled process.
On subsequent beacons, the process is allowed to continue on
the socket if its memory footprint (predicted cache misses) is
less than or equal to the available cache. If the available
cache in the socket does not meet the requirements of the
beacon, the scheduler first looks for a free core on another
socket that has enough free cache. If no such core exists, the
scheduler greedily tries to swap this process with a process
from another socket such that the cache requirements of both
processes are satisfied. If the scheduler cannot relocate the
process, the process is de-scheduled owing to a lack of
adequate resources. At this point, we assume the process can
be scheduled on another node in the cluster if needed.
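The placement decision taken on each beacon can be sketched as below. This is a simplified illustration under stated assumptions: `socket_state`, `placement` and `place_on_beacon` are hypothetical names, the footprint is expressed in the same abstract unit as the free cache, the greedy swap step is omitted, and the release of cache claimed by an earlier beacon is not modeled.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    double cache_free; /* LLC capacity not yet claimed by active beacons */
    int    free_cores; /* idle cores on this socket */
} socket_state;

typedef enum { PLACE_STAY, PLACE_MIGRATE, PLACE_DESCHEDULE } placement;

/* Decide where a process that just fired a beacon should run.
 * cur is the index of its current socket and need its predicted
 * footprint. On STAY or MIGRATE the chosen socket's free cache is
 * reduced and *target receives its index. */
placement place_on_beacon(socket_state *s, size_t nsockets, size_t cur,
                          double need, size_t *target)
{
    assert(cur < nsockets && need >= 0.0);
    *target = cur;
    if (need <= s[cur].cache_free) {           /* fits where it is */
        s[cur].cache_free -= need;
        return PLACE_STAY;
    }
    for (size_t i = 0; i < nsockets; ++i) {    /* free core elsewhere? */
        if (i != cur && s[i].free_cores > 0 && need <= s[i].cache_free) {
            s[i].cache_free -= need;
            s[i].free_cores -= 1;
            s[cur].free_cores += 1;
            *target = i;
            return PLACE_MIGRATE;
        }
    }
    s[cur].free_cores += 1;                    /* give up the core */
    return PLACE_DESCHEDULE;
}
```

A fuller version would try the swap with a process on another socket before returning `PLACE_DESCHEDULE`.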
5.2 Detection and Mitigation of Side-Channel Attacks
On receiving a beacon, the scheduler first schedules the
process appropriately and simultaneously starts monitoring the
cache misses incurred by the process. For each beacon process,
the scheduler regularly checks whether the cache misses remain
below the predicted cache misses. On observing a process
exceeding its predicted cache misses, the scheduler is alerted
to a plausible attack on the process, which is now a plausible
victim. Since the scheduler has scheduled the process such that
cache contention due to co-execution is minimal, it expects the
actual cache misses experienced by the process to stay within
the predicted upper bound CM(U) of Equation 7. To catch the
plausible attacker and mitigate the attack, the scheduler
conducts a binary search on the executing processes as in
Algorithm 1.
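The periodic check can be sketched as follows. The form of the upper bound is an assumption made for illustration: the prediction is widened by a relative margin k, in line with the k∗CM term mentioned in the evaluation, and the exact Equation 7 may differ. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed form of the upper bound CM(U): the predicted misses
 * widened by a relative margin k. Not necessarily Equation 7. */
double cm_upper(double predicted, double k)
{
    assert(predicted >= 0.0 && k >= 0.0);
    return predicted * (1.0 + k);
}

/* Check run regularly while a beacon region is active: the process
 * becomes a plausible victim once its observed misses exceed the
 * upper bound. */
bool plausible_attack(unsigned long long observed_misses,
                      double predicted, double k)
{
    return (double)observed_misses > cm_upper(predicted, k);
}
```

With a prediction of 1000 misses and k = 0.25, an observation of 1100 misses passes the check, whereas 1300 misses would trigger the search for the attacker.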
The last level cache is shared among the processes within
the same socket, so the plausible attacker must be executing
on the same socket as the victim. The scheduler conducts a
binary search using misses per thousand instructions (MPKI)
calculated over a period of 10 ms to catch the attacker. After
recording the current MPKI, the scheduler de-schedules half
of the executing processes other than the victim (which was
detected because its dynamic cache misses were higher than
the expected value). The scheduler then checks whether the new
MPKI is less than the previously recorded MPKI. If so, the
plausible attacker must be in the de-scheduled process set. In
that case, the scheduler swaps the currently scheduled
processes with the de-scheduled ones, keeping the victim in
place, and repeats the check on the resumed set. If instead
the MPKI does not decrease, the plausible attacker is still in
the scheduled set, and the scheduler splits that set again. In
short, each step halves the number of processes while keeping
the victim and the attacker scheduled. The scheduler repeats
this step until only two processes remain in the scheduled
set: a plausible attacker and the victim. Two outcomes are
possible at this stage. If the victim's MPKI decreases on
de-scheduling the other process, then the de-scheduled process
is the plausible attacker. However, if the MPKI does not
decrease, then the presumed victim was a beacon process that
was itself an attacker masquerading as a victim. In this case,
the process fired a beacon with predicted cache misses, but
either the information was simply false or the process was
hijacked through a control-data or non-control-data attack and
used to carry out the attack. The scheduler is able to catch
such attackers as well. Algorithm 1 checks these cases and
gives the necessary details.
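The halving search can be restated iteratively as the sketch below. It is not the scheduler's implementation: the measurement hook `mpki_probe`, the function `isolate_attacker`, and the stand-in `toy_probe` are hypothetical, and real measurements would come from hardware performance counters sampled over the 10 ms window. The toy probe simply reports a high MPKI whenever the co-runner designated as attacker is scheduled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hook: victim MPKI while only the co-runners with
 * active[i] == true are scheduled next to it. */
typedef double (*mpki_probe)(const bool *active, size_t n, void *ctx);

/* Misses per thousand instructions over one sampling window. */
double mpki(unsigned long long misses, unsigned long long instructions)
{
    assert(instructions > 0);
    return 1000.0 * (double)misses / (double)instructions;
}

/* Toy stand-in for counter readings: ctx points to the index of the
 * attacking co-runner; MPKI is high only while it is scheduled. */
double toy_probe(const bool *active, size_t n, void *ctx)
{
    size_t attacker = *(const size_t *)ctx;
    return (attacker < n && active[attacker]) ? 50.0 : 5.0;
}

/* Returns the index of the suspected co-runner, or n if pausing the
 * last suspect does not lower the victim's MPKI (the presumed victim
 * is then itself the suspect). */
size_t isolate_attacker(size_t n, mpki_probe probe, void *ctx, bool *active)
{
    size_t lo = 0, hi = n;                  /* suspects are [lo, hi) */
    if (n == 0) return n;
    for (size_t i = 0; i < n; ++i) active[i] = true;
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        double before = probe(active, n, ctx);
        for (size_t i = lo; i < mid; ++i) active[i] = false;
        double after = probe(active, n, ctx);
        if (after < before) {               /* attacker was paused */
            for (size_t i = lo; i < mid; ++i) active[i] = true;
            for (size_t i = mid; i < hi; ++i) active[i] = false;
            hi = mid;
        } else {                            /* attacker still running */
            lo = mid;
        }
    }
    double before = probe(active, n, ctx);
    active[lo] = false;
    double after = probe(active, n, ctx);
    return after < before ? lo : n;
}
```

With eight co-runners the search needs three halving steps plus the final two-process check, so the number of MPKI samples grows logarithmically with the number of co-scheduled processes.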
Some other details of the scheduling scheme are as follows.
Once the loop completion beacon is fired after a scheduled
process exits a loop, the available cache size is updated
accordingly, and performance counter monitoring is paused and
reset. Monitoring resumes at the next beacon. When all the
scheduled processes finish executing, the plausible attacker is
re-scheduled, because the process may have been wrongly
classified as an attacker (which can happen due to
inaccuracies in cache miss prediction) and should therefore be
allowed to complete. Since no other processes are scheduled,
such a process should progress and complete. If the process
does not complete and other processes are waiting in the queue,
the plausible attacker is de-scheduled again. At this point,
further forensic examination can be conducted on the plausible
attacker. Our results show that our attack detection technique
is 100% accurate on the cryptography benchmarks and produces
very few false positives for the degradation of service (DS)
attacks on the San Diego Vision benchmarks.
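The bookkeeping around entry and completion beacons can be sketched as below. The `region_monitor` struct and both handlers are hypothetical names introduced for illustration, and the cache is tracked in the same abstract unit as the predicted footprint.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-process monitoring state kept by the scheduler. */
typedef struct {
    double reserved;           /* cache claimed by the active region */
    unsigned long long misses; /* misses counted since region entry */
    bool monitoring;           /* whether counters are being sampled */
} region_monitor;

/* Entry beacon: claim cache on the socket and start monitoring. */
void on_entry_beacon(region_monitor *m, double *socket_cache_free,
                     double predicted)
{
    assert(predicted >= 0.0 && predicted <= *socket_cache_free);
    *socket_cache_free -= predicted;
    m->reserved = predicted;
    m->misses = 0;
    m->monitoring = true;
}

/* Completion beacon: release the cache, then pause and reset the
 * counters until the next entry beacon. */
void on_completion_beacon(region_monitor *m, double *socket_cache_free)
{
    *socket_cache_free += m->reserved;
    m->reserved = 0.0;
    m->misses = 0;
    m->monitoring = false;
}
```

Pairing the two handlers keeps the available cache size of a socket equal to its capacity minus the footprints of the loop regions that are currently executing.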
Algorithm 1 Mitigation Algorithm
procedure MITIGATE_ATTACKER(victim)
    while True do
        if #processes == 1 then
            DESCHEDULE_PROCESS(victim)
            return
        if #processes == 2 then
            prevMPKI ← GETMPKI(victim)
            DESCHEDULE_PROCESS(other_proc)
            newMPKI ← GETMPKI(victim)
            if newMPKI < prevMPKI then
                attacker ← other_proc
            else
                DESCHEDULE_PROCESS(victim)
                RESUME_PROCESS(other_proc)
                attacker ← victim
            return
        prevMPKI ← GETMPKI(victim)
        DESCHEDULE_PROCESS(1 .. #processes/2)
        newMPKI ← GETMPKI(victim)
        if newMPKI < prevMPKI then
            RESUME_PROCESS(1 .. #processes/2)
            DESCHEDULE_PROCESS(#processes/2 + 1 .. #processes)
            MITIGATE_ATTACKER(victim)
        else
            MITIGATE_ATTACKER(victim)
6 Evaluation
The experiments were conducted on a Dell PowerEdge R440
server equipped with an Intel Xeon Gold 5117 processor clocked
at 2.00 GHz. The server has two sockets, each with 14 cores
and an 11-way associative 19 MB LLC. We ran our experiments
with up to 18 jobs on 18 cores of the machine, a little over
50% utilization, in accordance with our threat model (note
that server farms have a utilization of less than 50% in order
to give SLA (service level agreement) guarantees). Other daemon
processes executing on the machine should not disturb the
experiments, since free cores remain available. The server OS
is Ubuntu 18.04 with the 4.15 Linux kernel, and our baseline
scheduler is the Completely Fair Scheduler (CFS), the default
Linux scheduler. We use scikit-learn to build the cache-miss
model. We write the compiler passes in LLVM 3.8 to collect
the training data for the cache-miss model and then to insert
and hoist the model.
Figure 2: Cache Model Accuracy
Benchmarks: The cache-based side-channel attacks have
been successful in extracting the secret keys of cryptography
algorithms. We use OpenSSL’s implementation of AES, RSA,
and ECDSA to demonstrate the working of Biscuit in catching
the cache-based side-channel attacks. Each cryptography al-
gorithm encrypts and then decrypts random strings of random
lengths with a random secret key. To show that Biscuit can
successfully detect and mitigate the cache-based side-channel
attacks in a multi-tenant environment, we run several image
analysis and processing applications from the San Diego Vision
Benchmark Suite [34] alongside the cryptography applications
to simulate a real-world job mix on server farms. We show that
Biscuit catches cache-based side-channel attacks on the
cryptography algorithms under three different attacks, namely
Prime and Probe, Flush and Reload, and Flush and Flush. In the
case of non-crypto applications, the attacks degrade the
service of the victim by inducing a large number of cache
misses in the victim. Biscuit can detect such degradation of
service attacks as well: we show its effectiveness against
degradation of service using Flush and Reload on SDVBS.
Attack Setup: Flush and Reload and Flush and Flush attacks
require sharing of pages between the attacker and the victim.
The attacker maps the shared library into its own virtual
address space. Although the Prime and Probe attack does not
require sharing pages with the victim, page sharing makes it
easier to create the eviction sets. We share the libcrypto
library of OpenSSL for the side-channel attacks. For the
degradation of service attack on SDVBS, we create a
libsdcommon library from the common folder, which contains
general functions used across SDVBS. We also profile the
libraries to find the library code address ranges used by the
benchmarks. These ranges make it easier for the attacker to
know the exact pages to attack. As its first step, Prime and
Probe must build an eviction set, a set of addresses that map
to the same cache set, from physical addresses. The attacker
reads its pagemap to translate virtual addresses to physical
addresses. It then chooses 11 physical addresses (because the
cache is 11-way associative) to form the eviction set, which
it uses to prime the cache and probe it later.
Evaluation Method: We create different configurations
consisting of 3 to 18 applications to test the Biscuit
scheduler against CFS. We first test scheduler efficiency in
terms of the performance of the scheduled applications without
an attacker. Note that even without an attacker, Biscuit
monitors the cache misses and regularly checks for attacks.
Then we test the ability of the scheduler to catch an attack
by adding the attack to the configuration. When demonstrating
an attack on OpenSSL, we use applications from SDVBS as
co-executing applications, and vice versa. For example, to test
an attack on AES in a configuration of three processes, we use
SVM from SDVBS and the attack as the co-executing processes.
Figure 3: Timing between CFS and Biscuit for OpenSSL Algorithms alongside no attacks. Panels: (a) AES Time, (b) RSA Time, (c) ECDSA Time.
Figure 4: Timing between CFS and Biscuit for SDVBS Algorithms alongside no attacks. Panels: (a) Disparity Time, (b) Multi-NCut Time, (c) SVM Time, (d) Localization Time, (e) Mser Time, (f) Sift Time, (g) Stitch Time, (h) Texture Synthesis Time, (i) Tracking Time.
Figure 5: AES Misses. Panels: (a) Flush+Reload Misses, (b) Flush+Flush Misses, (c) Prime+Probe Misses.
Figure 6: RSA Misses. Panels: (a) Flush+Reload Misses, (b) Flush+Flush Misses, (c) Prime+Probe Misses.
Figure 7: ECDSA Misses. Panels: (a) Flush+Reload Misses, (b) Flush+Flush Misses, (c) Prime+Probe Misses.
Figure 8: SDVBS Cache Misses. Panels: (a) Disparity Misses, (b) SVM Misses, (c) Multi-NCut Misses, (d) Localization Misses, (e) Mser Misses, (f) Sift Misses, (g) Stitch Misses, (h) Texture Synthesis Misses, (i) Tracking Misses.
Benchmarks    F-Score
OpenSSL       1
SDVBS         0.9230769231
Table 1: F-Score
6.1 Cache Model Accuracy
The cache-model accuracy depends on the training data and
the types of loops. For OpenSSL, we use random strings of
different lengths and random secret keys for training, and the
randomness persists in testing. For SDVBS, we use the input
data sets provided with the suite for training and testing. The
model's prediction is further compromised in the case of
expected beacons, for which the loop bounds used in the model
are the expected loop bounds obtained during training and not
the actual iteration counts. Since we use the standard
deviation ratio as in Equation 7 to predict cache misses, we
account for the k∗CM term when computing the prediction error
on test data. The average accuracy is 95%, as shown in Figure 2.
In each OpenSSL application, random strings of various
lengths are encrypted and decrypted multiple times with a
random secret key. The number of runs differs between training
and testing. For example, one application was run 5000 times
during training versus 7500 times during testing. While the
accuracy is 100% for AES, it is 96% for RSA and ECDSA. During
the runs with the Biscuit scheduler, the attacks on OpenSSL are
always caught because the attacker causes cache misses even
while the cache is still being warmed up for the loop, leading
to very high cache misses. Once the attacker is caught, the
cache is already warm and execution continues normally.
Therefore, all cache-based side-channel attacks are caught by
the Biscuit scheduler, and the attacker is isolated with 100%
accuracy using Algorithm 1, without wrongly implicating a
normal process. In other words, all attacks are caught with no
false positives or false negatives, leading to an F-Score
of 1.
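For reference, the F-Score in Table 1 is the usual harmonic mean of precision and recall, which can be computed directly from detection counts as in the sketch below. The function name is hypothetical, and the paper does not report the underlying counts, so the numbers in the usage note are illustrative only.

```c
#include <assert.h>

/* F-score from detection counts: 2*TP / (2*TP + FP + FN), where TP
 * are true positives, FP false positives and FN false negatives. */
double f_score(unsigned tp, unsigned fp, unsigned fn)
{
    assert(tp + fp + fn > 0);
    return (2.0 * tp) / (2.0 * tp + fp + fn);
}
```

With no false positives and no false negatives the score is exactly 1. As a purely illustrative case, 36 correct detections together with 6 false positives and no missed attacks give 72/78, about 0.923; the count of 36 is an assumption chosen only to show how a handful of false positives lowers the score.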
SDVBS provides several image sizes: test, sim, sqcif, qcif,
cif, vga, and fullhd. We mostly train on cif and vga and test
on fullhd, because these sizes are large enough to cause at
least some cache misses in each loop, whereas the first four
execute without cache misses due to their small loop bounds.
However, three applications (localization, svm, and
multi_ncut) do not have fullhd or vga data. Hence,
localization is trained on qcif and cif and tested on vga,
while svm and multi_ncut are trained on sqcif and qcif and
tested on cif. Because the image sizes are small, svm finishes
too quickly for the attacker to attack the process, although
the cache model has high accuracy. In contrast, multi_ncut
takes minutes on a cif image, so this benchmark is easily
attacked. These issues cause fluctuations in the accuracy of
cache miss prediction, which also lead to a few false positives
for two benchmarks when co-executing with more than 12
processes, leading to an F-score of 0.92. There are six
configurations that each produce one false positive. However,
even in the case of false positives, the attacker is still
caught first because, in spite of the lower accuracy of the
cache miss model, the scheduler picks out the attacker based on
dynamic MPKI.
Figure 9: Detection Efficiency for OpenSSL
6.2 Detection Efficiency
The scheduler's efficiency in catching these side-channel
attacks is determined by how early the attacker is caught: the
earlier an attack is caught the better, since this limits the
attacker's ability to extract more information about the secret
key. With this motivation, we define the detection efficiency
as
\[ D = 1 - \frac{\text{Attack active time}}{\text{Total time of victim}} \tag{11} \]
where Attack active time is the duration from when the attack
started to when the attack was caught. Since all processes are
started simultaneously, the above equation gives the fraction
of its execution time that the victim ran free of the attacker,
in other words the time the victim was not under attack after
the adversary started executing. A detection efficiency of 100%
means the attacker was detected before any attack happened,
and 0% means the attacker was not detected at all during the
attack.
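Equation 11 translates directly into a small helper, sketched here with a hypothetical function name. Both arguments must use the same time unit.

```c
#include <assert.h>

/* Equation 11: D = 1 - attack_active_time / total_victim_time. */
double detection_efficiency(double attack_active_time,
                            double total_victim_time)
{
    assert(total_victim_time > 0.0);
    assert(attack_active_time >= 0.0 &&
           attack_active_time <= total_victim_time);
    return 1.0 - attack_active_time / total_victim_time;
}
```

For example, a victim that runs for 10 s and is under attack for 2.5 s before the attacker is caught has a detection efficiency of 0.75, that is, 75%.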
In the case of the cache-based side-channel attacks on OpenSSL
shown in Figure 9, the attack was caught early in all cases
except Prime+Probe, because Prime+Probe causes cache misses
in the victim much more slowly than Flush+Reload or
Flush+Flush. In addition, Prime+Probe here utilizes only one
memory address, because more memory addresses mean more
eviction sets, which take memory space and time to initialize.
Overall, on average the scheduler detects the cache-based
side-channel attack before the attacker can leak more than 50%
of the secret key.
In the case of the degradation of service attack on SDVBS
shown in Figure 10, the detection efficiency is low for two
reasons. First, the attacked memory lines of the shared library
might not be used much by the victim. Our beacon is hoisted to
the outermost loop and the misses take time to accumulate, so
the attack is detected late. Second, the program has a short
execution time, so the attack is detected very late relative to
the run because misses need to accumulate first. In the few
cases in which the cache model is less accurate, the attacker
can still be detected because the predicted misses are exceeded
and the scheduler starts mitigation using MPKI. However, in a
few cases the program executes too quickly for any detection
and mitigation of the attacker. In two particular cases, the
svm and mser benchmarks finished executing even before the
attacker caused cache misses. Overall, the average detection
efficiency is less than 25%, but the scheduler still saves
these applications from the attacker.
Figure 10: Detection Efficiency for SDVBS
6.3 Performance
The timing differences between Biscuit and CFS for OpenSSL
and a few SDVBS benchmarks are shown in Figure 3 and Figure 4.
The Biscuit scheduler is almost as efficient as CFS. This is
easily seen for short programs such as SVM, whose figure shows
the two curves to be nearly interchangeable. For longer
programs such as Multi-NCut in SDVBS, CFS is better than
Biscuit by at most 6%. This overhead is due to Biscuit's
constant monitoring of cache misses.
As mentioned before, Biscuit catches the cache-based attacks
in all the benchmarks. Since MSER and SVM execute for a very
short time, these benchmarks could not be attacked. Figures 5
to 8 compare the cache misses under the Biscuit scheduler,
which detected and mitigated the attack, against CFS, which did
not mitigate the attack. In the case of OpenSSL, the cache
misses decrease by 5x, except for the Prime and Probe attack,
which had its shortcomings as explained before. In SDVBS, the
longer programs also show a large decrease in cache misses.
For MSER and SVM, the curves look identical to CFS under
Flush+Reload because no attack takes place before they
complete. Two other observations are lines that appear to
overlap and the Biscuit scheduler sometimes incurring more
misses than CFS under attack in SDVBS. The apparent overlap,
as in Figure 6 for Flush+Reload, is not an actual overlap: the
two curves have different values, but CFS under attack incurs
so many misses that the scale of the y-axis makes the other two
lines appear merged. The Biscuit scheduler has more misses in
the few cases where a process was falsely flagged as an
attacker and de-scheduled. Once such processes are
re-scheduled, they have to rewarm their cache, creating more
misses. The other reason is the non-determinism of the
program's cache misses, which caused Biscuit to have more
misses in cases such as Prime+Probe for AES. Overall, Biscuit
defends against cache-based side-channel and degradation of
service attacks with an overhead of at most 6% during normal
(no attack) execution.
7 Related Work
There has been a lot of research in cache-based side channel
attacks. The research can be categorized based on software or
hardware approaches and can be further divided based on the
methods such as prevention, detection, or mitigation.
Cache mapping partitions the cache to make it harder for
attackers to figure out which addresses belong to the same set.
Some methods, such as [26,27], encrypt the memory addresses
using secret keys that are stored in isolated memory. This
requires additional memory, and the key must be changed
regularly. Other methods, such as RPCache [37] and NewCache
[19], use randomness in the cache replacement policy to prevent
the attacker from mapping the cache sets. However, these do not
scale to the last-level cache.
In addition to cache mapping, line-locking schemes such as
PLcache [37] lock cache lines so that the attacker cannot evict
them. This reduces the cache space available to processes.
SHARP [39] only evicts cache lines that are not present in the
L1 or L2 caches of any process. This method requires hardware
changes to allow more communication between the L1, L2, and LLC
caches. In addition, SHARP raises an alarm that needs to be
caught by the OS and can be a false positive.
Some methods disrupt timing signals so that the attacker
cannot use timing-based methods. Some approaches take programs
and disrupt the cycle counts by buffering extra instructions
[38]. The original and transformed programs are identical in
terms of functionality, but the extra instructions mean slower
execution. Other methods frequently add noise to programs [44].
This method works for L1 caches but not for last-level caches
because the attacker bases their assumptions on the virtual
addresses.
These mechanisms, and the ones compared against in the
introduction, either require hardware changes, impose hard
isolation that reserves cache for the defense mechanism, or
incur many false positives and negatives. Biscuit addresses
these problems with a cache-miss model tailored to the
granularity of loops and a scheduler that first schedules
efficiently and then detects and mitigates cache-based
side-channel attacks and degradation of service.
8 Conclusion
In this work, we demonstrate a compiler-guided scheduler,
Biscuit, which detects cache-based side-channel attacks on
processes in multi-tenancy server farms with very high
accuracy. We show that Biscuit is able to detect and mitigate
Prime+Probe, Flush+Reload, and Flush+Flush attacks on OpenSSL
cryptography algorithms with an F-score of 1, and also to
detect and mitigate degradation of service on a vision
application suite with an F-score of 0.92. Under the no-attack
scenario, the scheme imposes low overheads (up to a maximum
of 6%). At the heart of the scheme is the generation and use
of a cache miss model, which is inserted by the compiler at
the entrances of loop nests to predict the underlying cache
misses. The beacons convey this information to the scheduler,
which uses it to co-schedule processes such that their
combined cache footprint does not exceed the capacity of the
last level cache. The scheduled processes are monitored for
cache misses, and when an anomaly is detected, the scheduler
performs a binary search to isolate the attacker. We believe
that, due to its ability to deal with multi-tenancy, its
precision, and its low overheads, such a scheme is practicable.
References
[1] Martín Abadi, Mihai Budiu, Úlfar Erlingsson, and Jay
Ligatti. Control-flow integrity. In Proceedings of the
12th ACM Conference on Computer and Communica-
tions Security, CCS 2005, Alexandria, VA, USA, Novem-
ber 7-11, 2005, pages 340–353, 2005.
[2] L. A. Barroso and U. Hölzle. The case for energy-
proportional computing. Computer, 40(12):33–37, Dec
2007.
[3] Tyler K. Bletsch, Xuxian Jiang, and Vincent W. Freeh.
Mitigating code-reuse attacks with control-flow locking.
In Twenty-Seventh Annual Computer Security Applica-
tions Conference, ACSAC 2011, Orlando, FL, USA, 5-9
December 2011, pages 353–362, 2011.
[4] Tyler K. Bletsch, Xuxian Jiang, Vincent W. Freeh, and
Zhenkai Liang. Jump-oriented programming: a new
class of code-reuse attack. In Proceedings of the
6th ACM Symposium on Information, Computer and
Communications Security, ASIACCS 2011, Hong Kong,
China, March 22-24, 2011, pages 30–40, 2011.
[5] Marco Chiappetta, Erkay Savas, and Cemal Yilmaz.
Real time detection of cache-based side-channel attacks
using hardware performance counters. Appl. Soft Com-
put., 49(C):1162–1174, December 2016.
[6] John Criswell, Nathan Dautenhahn, and Vikram S. Adve.
Kcofi: Complete control-flow integrity for commodity
operating system kernels. In 2014 IEEE Symposium on
Security and Privacy, SP 2014, Berkeley, CA, USA, May
18-21, 2014, pages 292–307, 2014.
[7] Xinyang Ge, Weidong Cui, and Trent Jaeger. GRIFFIN:
guarding control flows using intel processor trace. In
Proceedings of the Twenty-Second International Con-
ference on Architectural Support for Programming Lan-
guages and Operating Systems, ASPLOS 2017, Xi’an,
China, April 8-12, 2017, pages 585–598, 2017.
[8] Daniel Genkin, Adi Shamir, and Eran Tromer. Rsa key
extraction via low-bandwidth acoustic cryptanalysis. In
Juan A. Garay and Rosario Gennaro, editors, Advances
in Cryptology – CRYPTO 2014, pages 444–461, Berlin,
Heidelberg, 2014. Springer Berlin Heidelberg.
[9] Daniel Gruss, Clémentine Maurice, Klaus Wagner, and
Stefan Mangard. Flush+ flush: a fast and stealthy cache
attack. In International Conference on Detection of
Intrusions and Malware, and Vulnerability Assessment,
pages 279–299. Springer, 2016.
[10] Daniel Gruss, Raphael Spreitzer, and Stefan Mangard.
Cache template attacks: Automating attacks on inclu-
sive last-level caches. In 24th USENIX Security Sympo-
sium (USENIX Security 15), pages 897–912, Washing-
ton, D.C., August 2015. USENIX Association.
[11] Tobias Gysi, Tobias Grosser, Laurin Brandner, and
Torsten Hoefler. A fast analytical model of fully associa-
tive caches. In Proceedings of the 40th ACM SIGPLAN
Conference on Programming Language Design and Im-
plementation, PLDI 2019, page 816–829, New York,
NY, USA, 2019. Association for Computing Machinery.
[12] Hong Hu, Chenxiong Qian, Carter Yagemann, Simon
Pak Ho Chung, William R. Harris, Taesoo Kim, and
Wenke Lee. Enforcing unique code target property for
control-flow integrity. In Proceedings of the 2018 ACM
SIGSAC Conference on Computer and Communications
Security, CCS ’18, pages 1470–1486, New York, NY,
USA, 2018. ACM.
[13] S. Jana and V. Shmatikov. Memento: Learning secrets
from process footprints. In 2012 IEEE Symposium on
Security and Privacy, pages 143–157, May 2012.
[14] Taesoo Kim, Marcus Peinado, and Gloria Mainar-Ruiz.
Stealthmem: System-level protection against cache-
based side channel attacks in the cloud. In Proceedings
of the 21st USENIX Conference on Security Symposium,
Security’12, page 11, USA, 2012. USENIX Association.
[15] Yoongu Kim, Ross Daly, Jeremie Kim, Chris Fallin,
Ji Hye Lee, Donghyuk Lee, Chris Wilkerson, Konrad
Lai, and Onur Mutlu. Flipping bits in memory with-
out accessing them: An experimental study of dram
disturbance errors. In Proceeding of the 41st Annual
International Symposium on Computer Architecuture,
ISCA ’14, pages 361–372, Piscataway, NJ, USA, 2014.
IEEE Press.
[16] Paul Kocher, Daniel Genkin, Daniel Gruss, Werner Haas,
Mike Hamburg, Moritz Lipp, Stefan Mangard, Thomas
Prescher, Michael Schwarz, and Yuval Yarom. Spectre
attacks: Exploiting speculative execution. arXiv preprint
arXiv:1801.01203, 2018.
[17] Yusuf Kulah, Berkay Dincer, Cemal Yilmaz, and Erkay
Savas. Spydetector: An approach for detecting side-
channel attacks at runtime. International Journal of
Information Security, 18(4):393–422, Aug 2019.
[18] Moritz Lipp, Michael Schwarz, Daniel Gruss, Thomas
Prescher, Werner Haas, Anders Fogh, Jann Horn, Stefan
Mangard, Paul Kocher, Daniel Genkin, Yuval Yarom,
and Mike Hamburg. Meltdown: Reading kernel memory
from user space. In 27th USENIX Security Symposium
(USENIX Security 18), pages 973–990, Baltimore, MD,
August 2018. USENIX Association.
[19] F. Liu, H. Wu, K. Mai, and R. B. Lee. Newcache: Secure
cache architecture thwarting cache side-channel attacks.
IEEE Micro, 36(5):8–16, Sep. 2016.
[20] F. Liu, Y. Yarom, Q. Ge, G. Heiser, and R. B. Lee. Last-
level cache side-channel attacks are practical. In 2015
IEEE Symposium on Security and Privacy, pages 605–
622, May 2015.
[21] Y. Liu, P. Shi, X. Wang, H. Chen, B. Zang, and H. Guan.
Transparent and efficient cfi enforcement with intel pro-
cessor trace. In 2017 IEEE International Symposium
on High Performance Computer Architecture (HPCA),
pages 529–540, Feb 2017.
[22] C. Luo, Y. Fei, P. Luo, S. Mukherjee, and D. Kaeli. Side-
channel power analysis of a gpu aes implementation. In
2015 33rd IEEE International Conference on Computer
Design (ICCD), pages 281–288, Oct 2015.
[23] Vishwath Mohan, Per Larsen, Stefan Brunthaler,
Kevin W. Hamlen, and Michael Franz. Opaque control-
flow integrity. In 22nd Annual Network and Distributed
System Security Symposium, NDSS 2015, San Diego,
California, USA, February 8-11, 2015, 2015.
[24] Dag Arne Osvik, Adi Shamir, and Eran Tromer. Cache
attacks and countermeasures: the case of aes. In Cryp-
tographers’ track at the RSA conference, pages 1–20.
Springer, 2006.
[25] Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort,
Vincent Michel, Bertrand Thirion, Olivier Grisel,
Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent
Dubourg, et al. Scikit-learn: Machine learning
in Python. J. Mach. Learn. Res., 12:2825–2830,
November 2011.
[26] Moinuddin K Qureshi. Ceaser: Mitigating conflict-
based cache attacks via encrypted-address and remap-
ping. In 2018 51st Annual IEEE/ACM International
Symposium on Microarchitecture (MICRO), pages 775–
787. IEEE, 2018.
[27] Moinuddin K. Qureshi. New attacks and defense for
encrypted-address cache. In Proceedings of the 46th In-
ternational Symposium on Computer Architecture, ISCA
’19, pages 360–371, New York, NY, USA, 2019. ACM.
[28] M. Sabbagh, Y. Fei, T. Wahl, and A. A. Ding. Scadet:
A side-channel attack detection tool for tracking prime-
probe. In 2018 IEEE/ACM International Conference
on Computer-Aided Design (ICCAD), pages 1–8, Nov
2018.
[29] Asanka Sayakkara, Nhien-An Le-Khac, and Mark Scan-
lon. A survey of electromagnetic side-channel attacks
and discussion on their case-progressing potential for
digital forensics. Digital Investigation, 29:43–54, Jun
2019.
[30] Hovav Shacham. The geometry of innocent flesh on
the bone: return-into-libc without function calls (on the
x86). In Proceedings of the 2007 ACM Conference
on Computer and Communications Security, CCS 2007,
Alexandria, Virginia, USA, October 28-31, 2007, pages
552–561, 2007.
[31] Eran Tromer, Dag Arne Osvik, and Adi Shamir. Efficient
cache attacks on AES, and countermeasures. Journal of
Cryptology, 23(1):37–71, 2010.
[32] Jo Van Bulck, Marina Minkin, Ofir Weisse, Daniel
Genkin, Baris Kasikci, Frank Piessens, Mark Silberstein,
Thomas F Wenisch, Yuval Yarom, and Raoul Strackx.
Foreshadow: Extracting the keys to the Intel SGX
kingdom with transient out-of-order execution. In 27th
USENIX Security Symposium (USENIX Security
18), pages 991–1008, 2018.
[33] Venkatanathan Varadarajan, Thomas Ristenpart, and
Michael Swift. Scheduler-based defenses against
cross-vm side-channels. In Proceedings of the 23rd
USENIX Conference on Security Symposium, SEC’14,
pages 687–702, USA, 2014. USENIX Association.
[34] Sravanthi Kota Venkata, Ikkjin Ahn, Donghwan Jeon,
Anshuman Gupta, Christopher Louie, Saturnino Garcia,
Serge Belongie, and Michael Bedford Taylor. SD-VBS:
The San Diego Vision Benchmark Suite. In 2009 IEEE
International Symposium on Workload Characterization
(IISWC), pages 55–64. IEEE, 2009.
[35] Shuai Wang, Yuyan Bao, Xiao Liu, Pei Wang, Danfeng
Zhang, and Dinghao Wu. Identifying cache-based side
channels through secret-augmented abstract interpreta-
tion. In 28th USENIX Security Symposium (USENIX
Security 19), pages 657–674, Santa Clara, CA, August
2019. USENIX Association.
[36] Shuai Wang, Pei Wang, Xiao Liu, Danfeng Zhang, and
Dinghao Wu. CacheD: Identifying cache-based timing
channels in production software. In 26th USENIX Secu-
rity Symposium (USENIX Security 17), pages 235–252,
Vancouver, BC, August 2017. USENIX Association.
[37] Zhenghong Wang and Ruby B Lee. New cache de-
signs for thwarting software cache-based side channel
attacks. ACM SIGARCH Computer Architecture News,
35(2):494–505, 2007.
[38] Meng Wu, Shengjian Guo, Patrick Schaumont, and Chao
Wang. Eliminating timing side-channel leaks using pro-
gram repair. In Proceedings of the 27th ACM SIGSOFT
International Symposium on Software Testing and Anal-
ysis, pages 15–26. ACM, 2018.
[39] Mengjia Yan, Bhargava Gopireddy, Thomas Shull, and
Josep Torrellas. Secure hierarchy-aware cache replacement
policy (SHARP): Defending against cache-based
side channel attacks. SIGARCH Comput. Archit. News,
45(2):347–360, June 2017.
[40] Yuval Yarom and Naomi Benger. Recovering OpenSSL
ECDSA nonces using the Flush+Reload cache side-channel
attack. 2014.
[41] Yuval Yarom and Katrina Falkner. Flush+Reload: A
high resolution, low noise, L3 cache side-channel attack.
In 23rd USENIX Security Symposium (USENIX Secu-
rity 14), pages 719–732, San Diego, CA, August 2014.
USENIX Association.
[42] Chao Zhang, Tao Wei, Zhaofeng Chen, Lei Duan, Laszlo
Szekeres, Stephen McCamant, Dawn Song, and Wei
Zou. Practical control flow integrity and randomization
for binary executables. In 2013 IEEE Symposium on
Security and Privacy, SP 2013, Berkeley, CA, USA, May
19-22, 2013, pages 559–573, 2013.
[43] Tianwei Zhang, Yinqian Zhang, and Ruby B Lee.
CloudRadar: A real-time side-channel attack detection
system in clouds. In International Symposium on Re-
search in Attacks, Intrusions, and Defenses, pages 118–
140. Springer, 2016.
[44] Yinqian Zhang and Michael K Reiter. Düppel:
retrofitting commodity operating systems to mitigate
cache side channels in the cloud. In Proceedings of the
2013 ACM SIGSAC conference on Computer & commu-
nications security, pages 827–838. ACM, 2013.
[45] Ziqiao Zhou, Michael K. Reiter, and Yinqian Zhang. A
software approach to defeating side channels in last-
level caches. In Proceedings of the 2016 ACM SIGSAC
Conference on Computer and Communications Security,
CCS ’16, pages 871–882, New York, NY, USA, 2016.
Association for Computing Machinery.
[46] Xiaotong Zhuang, Tao Zhang, and Santosh Pande. HIDE:
an infrastructure for efficiently protecting information
leakage on the address bus. ACM SIGOPS Operating
Systems Review, 38(5):72–84, 2004.
