CACHE SNIPER : Accurate timing control of cache evictions by Briongos, Samira et al.
arXiv:2008.12188v1 [cs.CR] 27 Aug 2020
CACHESNIPER: Accurate timing control of cache evictions
Samira Briongos
NEC Laboratories Europe
Ida Bruhns
Universität zu Lübeck
Pedro Malagón
ETSIT-LSI-DIE Universidad Politécnica de Madrid
Thomas Eisenbarth
Universität zu Lübeck
José M. Moya
ETSIT-LSI-DIE Universidad Politécnica de Madrid
Abstract
Microarchitectural side channel attacks have been very
prominent in security research over the last few years.
Caches have been an outstanding covert channel, as they
provide high resolution and generic cross-core leakage even
with simple user-mode code execution privileges. To prevent
these generic cross-core attacks, all major cryptographic li-
braries now provide countermeasures to hinder key extrac-
tion via cross-core cache attacks, for instance avoiding secret
dependent access patterns and prefetching data. In this paper,
we show that implementations protected by ‘good-enough’
countermeasures aimed at preventing simple cache attacks
are still vulnerable. We present a novel attack that uses a spe-
cial timing technique to determine when an encryption has
started and then evict the data precisely at the desired instant.
This new attack does not require special privileges nor ex-
plicit synchronization between the attacker and the victim.
One key improvement of our attack is a method to evict data
from the cache with a single memory access and in the absence
of shared memory by leveraging the transient capabilities of
TSX and relying on the recently reverse-engineered L3 re-
placement policy. We demonstrate the efficiency by perform-
ing an asynchronous last level cache attack to extract an RSA
key from the latest wolfSSL library, which has been espe-
cially adapted to avoid leaky access patterns, and by extract-
ing an AES key from the S-Box implementation included in
OpenSSL bypassing the per round prefetch intended as a pro-
tection against cache attacks.
1 Introduction
Modern CPUs are highly optimized to achieve the maxi-
mum performance and efficiency for existing manufacturing
technologies while ensuring logical correctness and isola-
tion between processes. But since any running software in-
teracts with the microarchitectural elements of the processor
and changes their state, shared hardware resources are influ-
enced by all processes running on a machine. An attacker can
leverage these shared resources to recover secret information
from a victim software by exploiting the microarchitectural
state [6, 15, 39, 44, 46, 57, 58, 61].
Among the microarchitectural elements utilized in these
attacks, the cache memory has probably been the most ex-
ploited hardware resource. Traditionally, cache side chan-
nel attacks have focused on cryptographic implementa-
tions. For instance, cache attacks have successfully retrieved
ECDSA [68], RSA [22,33,70] and AES [4,9,36] keys, even
breaking the isolation between virtual machines (VMs) [34,
36,56]. As these attacks do not require any special privileges
to succeed, side channel attacks pose a great threat and keep
gaining attention.
As a direct response to the threat imposed by microarchi-
tectural attacks, many different countermeasures were put in
place. As described by Ge et al. [23], preemptive counter-
measures try to help in the hard task of designing leakage
free code [12, 21, 37, 66]; hardware countermeasures either
design or take advantage of hardware features to avoid the
leakage; and detection-based countermeasures try to determine
whether there is an attack going on [8, 11, 16, 31, 71].
In this paper, we will focus on the preemptive countermea-
sures of side channel resistant code and prefetching.
Prefetching is used both as a strategy to improve performance
and to prevent attackers from observing the cache state
in cache attacks [4]. For example, a prefetching strategy was
implemented in OpenSSL 1.0.0a and beyond: The S-Box
implementation prefetches the S-Box before executing each
round [13,55]. If the attacker flushes a line holding data from
the S-Box before or during the execution of any intermediate
round, and waits till the encryption has been performed, she
will only observe cache hits. Consequently, she would not be
able to distinguish whether those accesses are due to the actual
utilization of the line or to the prefetch stage [13].
Several recent works have shown that in scenarios where
the attacker controls the OS, even the smallest leakages can
be exploited. Many recent prominent works have either ex-
ploited the intra-core resource sharing of simultaneous mul-
tithreading (SMT) [7, 26, 59, 61] or the ability of interrupted
execution of SGX [14,51,52,60]. These attacks are so power-
ful that several cryptographic libraries, including OpenSSL,
now ignore them: the cost of protecting against them is exorbitant
and arguably not justified in scenarios where attacker
control of the OS is out of scope (or simply worse than the
attacks). Thus, they make do with the imperfect but allegedly
sufficient countermeasures of prefetching and leakage mini-
mization described above.
In this work, we show that even a classic user-level cache
adversary can overcome these countermeasures. At a very
high level, the attacker has an offline stakeout phase where
she determines the time elapsed between the start of the tar-
get algorithm and the use of the target function or data. In the
online phase, she then detects the start of the target algorithm
by the victim, waits for the time previously determined, and
evicts the target data from the cache. The attacker can then
collect valuable data to extract the key. Due to the high preci-
sion gained from the exact timing of the eviction, very small
windows of opportunity can be leveraged. We demonstrate
the attack on an OpenSSL AES S-Box implementation and
a square and multiply implementation from wolfSSL. Both
are considered protected from side channel attacks, but both
have vulnerabilities that can be exploited by CACHESNIPER.
Contributions
In summary, this paper presents the following contributions.
- We present CACHESNIPER, a methodology to evict data
from the cache at the desired instant, even in the absence
of shared memory, taking advantage of the replacement
policy.
- We reduce the number of samples required to get a cryp-
tographic key by precisely selecting the instant of time
at which we desire to interrupt a victim process.
- We demonstrate that prefetching data to the cache is not
effective against cache attacks, if there is a window of
time during the execution of a sensitive part of code, at
which private data is not accessed uniformly.
- We show in realistic experimental setups the feasibil-
ity of side channel last-level cache attacks against the
AES S-Box implementation that includes prefetching
as countermeasure, and against an RSA implementation
protected by a side channel resistant implementation.
Disclosure and Code Publication
We demonstrate our method by attacking two real world cryptographic
implementations, an AES S-Box implementation
by OpenSSL (version 1.0.2k) and an RSA square-and-multiply
implementation by wolfSSL (version 4.4.0). We responsibly
disclosed our findings to both libraries on June 22nd and 23rd.
The communication with the OpenSSL team is still ongoing.
No CVE was issued since CACHESNIPER falls outside their
threat model. Additionally, the S-Box implementation was removed
entirely from the latest version of OpenSSL. wolfSSL
immediately issued CVE-2020-15309 and proposed a fix for
the vulnerability, which we checked and acknowledged.
2 Background
This section introduces some basic concepts on cache mem-
ory, cache attacks and transactional execution that are of key
importance in the efficiency of the proposed cache attack.
2.1 Cache architecture
Caches are small memories located between the processor
and the main memory, specially designed to reduce the gap
between processor and memory throughput. Modern proces-
sors include cache memories that are hierarchically orga-
nized; low level caches (L1 and L2) are core private, smaller
and closer to the processor (with reduced latency), whereas
the last level cache (LLC or L3) is bigger and shared among
all the cores. Intel processors have traditionally included L3
inclusive caches, in order to simplify the implementation of
cache coherence: all the data which is present in the private
low-level caches has to be in the shared L3 cache.
Most modern processors include w-way set-associative
caches; a trade-off between directly mapped caches, usually
with high cache miss rates, and fully associative caches, with
a very complex logic. The cache is organized into multiple
sets (S), each of them containing w lines of usually 64 bytes
of data. The set in which each line is placed is derived from
its address (directly mapped). The address bits are divided
into offset (lowest-order bits used to locate data within a line),
index (log2(S) consecutive bits starting from the offset bits
that address the set) and tag (remaining bits which identify if
the data is cached). Many caches additionally contain slices.
The entries are mapped to the slices by a function f , which
usually depends on some fixed bits of the data. Slice selec-
tion mechanisms are usually not public, and effort has gone
into reverse engineering them. They do not, however, have an
impact on CACHESNIPER.
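As a rough illustration of the address split described above, the following Python sketch decomposes an address into tag, index and offset. The 64-byte line size comes from the text; the number of sets (1024) and the function name split_address are assumptions chosen only for this example, and slice selection is ignored.

```python
LINE_SIZE = 64  # bytes per cache line, as stated in the text


def split_address(addr: int, num_sets: int, line_size: int = LINE_SIZE):
    """Return (tag, index, offset) for addr in a cache with num_sets sets.

    Assumes num_sets and line_size are powers of two; slices are ignored.
    """
    offset_bits = line_size.bit_length() - 1   # log2(line_size)
    index_bits = num_sets.bit_length() - 1     # log2(S)
    offset = addr & (line_size - 1)
    index = (addr >> offset_bits) & (num_sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


# Two addresses that differ only in their tag map to the same set.
a = 0x12345
b = a + 1024 * LINE_SIZE  # hypothetical cache with 1024 sets
print(split_address(a, 1024))
print(split_address(b, 1024))
```

Addresses such as a and b, which share the index bits but differ in the tag, compete for the same w lines of a set.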
2.2 Replacement policies
At some point during the use of a set-associative cache, data
will need to be loaded into the cache after a cache miss, but
the corresponding cache set is full. At this point, an algorithm
called the replacement policy decides which of the currently
cached lines is evicted and replaced with the line accessed.
As it is crucial for maximizing the hit ratio and achieving
good performance, manufacturers do not publish the implementation
details of the policy. Each cache level has its own
replacement policy. Based on the observations of the evictions
in the LLC, most modern processors implement pseudo-LRU
(Least Recently Used) policies. Regarding Intel, the official
name of the LLC replacement policy is “The Quad-Age
LRU” [41].
There have been several efforts to reverse engineer the re-
placement policy of Intel processors in order to estimate/mea-
sure the number of misses, without explaining which con-
crete elements in the cache would be evicted in the event of
a miss [1, 2, 67]. Later studies [28, 63] have focused on eviction
strategies that maximize the number of evictions in
order to improve memory attacks. Recently, the replacement
policies of all the cache levels of modern Intel processors have
been reverse engineered and published [3, 10, 62].
2.3 Transactional memory and Intel TSX
Intel TSX is an instruction set extension for x86 that sup-
ports Transactional memory. Transactional memory enables
optimistic execution of the transactional code regions speci-
fied by the programmer. The processor executes the specified
sections assuming that there is no conflict with other threads
or CPU cores, which might access or modify the same data.
Transactional memory reduces the need of mutual exclusion
mechanisms, using a local version of data and registering a
hardware-based callback mechanism in case a conflict with
other threads is detected. If the execution ends successfully,
the processor commits all the changes as if they had occurred
instantaneously, becoming visible to the remaining processes.
Otherwise, the transaction is cancelled, all memory changes
are discarded and a callback function is called. This process
is known as an Abort, and the callback is known as an Abort
handler. There are various reasons why a transaction may
abort in Intel TSX, but we particularly focus on the cache
related ones. Namely, a transaction aborts if data from its
“write set” is evicted from the L1 cache or if data from its
“read set” is evicted from the L3 cache [19, 20]. AMD provides
similar transaction mechanisms and ARM is introducing
them.
2.4 Related work
Cache memory was first mentioned as a covert channel in
1992 [32]. Since then, many different techniques have been
developed: Kelsey entertained the idea of attacks based on
cache hit ratios, Osvik et al. proposed the widely known
Evict+Time and Prime+Probe attacks, revealing the cache
sets accessed by the victim, and Gullasch et al. and Yarom and
Falkner both developed a powerful attack that exploits shared
memory, which was later named Flush+Reload [30, 42, 53,
69]. Of these attacks, Flush+Reload and Prime+Probe
are widely used due to their high resolution. This work fo-
cuses on Flush+Reload and Prime+Abort, a derivative of
Prime+Probe that leverages transactional aborts to detect
cache evictions caused by the victim process [20].
The Flush+Reload technique requires shared memory,
which means the victim and attacker use the same data dur-
ing their respective execution. This can be met by them using
the same library, which is often the case for libraries shipped
with the operating system. In addition, a clflush instruction
is needed on the target processor, which is not available
in many scenarios, e.g. when attacking from JavaScript [24].
The attacker uses this instruction to flush the desired lines
from the cache, making sure the victim process needs to load
them from memory to use them. She then reloads the data,
measuring the time this takes. If the victim process used the
data, the reload time observed by the attacker will be short.
This attack is easy to implement and provides precise infor-
mation about the data the victim process uses at cache-line
granularity.
The Flush+Reload attack has been used in many ways
since it was introduced. Gullasch et al. retrieved an AES
key and Yarom et al. demonstrated that one trace is enough
to retrieve an RSA key, and attacked ECDSA [30, 68, 69].
Flush+Reload has also been used to launch cross-VM attacks,
attack AES, retrieve keystrokes and profile the victim
[4, 29, 38, 56].
It is vital for Flush+Reload that the attacker and the victim
share memory. The Prime+Probe attack does not have this
requirement and still works in virtual environments [22,33].
An attacker can target the LLC, particularly one set of the LLC,
and she will still be able to
extract sensitive information. Since Prime+Probe does not
require special OS features, it can be applied on virtually
any system. As a preparation step for a Prime+Probe attack,
the attacker needs to construct an eviction set (a group of
w different addresses that map to one specific set in w-way
set-associative caches). Constructing eviction sets and deal-
ing with missing address information as well as slice selec-
tion mechanisms has been discussed extensively in the liter-
ature [22, 34, 36, 63].
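The role of the eviction set can be illustrated with a small model. The sketch below is a simplification under stated assumptions: it models a single 4-way set with a true-LRU policy (real LLCs use pseudo-LRU variants such as the Quad-Age LRU mentioned above) and it reports hits and misses directly instead of measuring access times. The class and function names are hypothetical.

```python
from collections import OrderedDict


class LRUSet:
    """One cache set with `ways` lines and a true-LRU policy (a simplification)."""

    def __init__(self, ways: int):
        self.ways = ways
        self.lines = OrderedDict()  # least recently used entry first

    def access(self, tag) -> bool:
        """Access a line; return True on a hit, False on a miss."""
        if tag in self.lines:
            self.lines.move_to_end(tag)
            return True
        if len(self.lines) >= self.ways:
            self.lines.popitem(last=False)  # evict the least recently used line
        self.lines[tag] = None
        return False


def prime_probe(victim_accesses_set: bool, ways: int = 4) -> int:
    """Prime a set, optionally let the victim touch it, and count probe misses."""
    cache_set = LRUSet(ways)
    eviction_set = [f"attacker-{i}" for i in range(ways)]
    for tag in eviction_set:            # prime: fill the set with attacker lines
        cache_set.access(tag)
    if victim_accesses_set:
        cache_set.access("victim")      # the victim evicts one attacker line
    return sum(not cache_set.access(tag) for tag in eviction_set)  # probe


print(prime_probe(victim_accesses_set=False))
print(prime_probe(victim_accesses_set=True))
```

In this model the probe sees no misses when the victim stays idle and at least one miss when the victim touches the monitored set, which is the signal a Prime+Probe attacker looks for.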
One of the drawbacks of both the Flush+Reload and
Prime+Probe techniques is the necessity for precise timers to
detect whether the victim accessed the memory. The timer-
less attack Prime+Abort exploits Intel’s implementation of
Hardware Transactional Memory TSX: it starts a transaction
to fill the targeted cache set (prime), then waits to see whether it
receives an abort because the victim has accessed this set [20].
However, it does obviously require Intel TSX, which is not
available on all machines.
The OpenSSL AES implementation protected by the
prefetch was attacked by Ashokkumar et al. in 2018. They
used a chosen plaintext approach targeting the first and sec-
ond round of the AES in OpenSSL [13]. Cohney et al. at-
tack the last round of the AES, but in a very different setting.
They target a deeper layer implementation of T-Table AES
used only to encrypt seeds for the pseudo random number
generator in AES [17]. While failing to protect these inner
implementations the same way as the public ones is a notable
oversight, it is also one of the main differences to our
target: We attack an implementation that is explicitly protected
from cache side channel attacks. The second difference
is that, like many other side channel attacks, this one uses
SGX enclaves in step mode to enhance its timing granularity.
Moghimi et al. [50] also target SGX assuming a malicious
or compromised operating system. The scenario allows
them to use the L1 cache as source of information. This way
they retrieve various samples per round and manage to dis-
tinguish the prefetching stages from the normal operations
of the round for both T-Table and S-Box implementations
of AES.
CACHESNIPER does not require SGX, nor the frequent in-
terrupts of the victim given by the powerful adversarial sce-
nario controlling the OS. It works across cores and does not
need special privileges. It only requires either shared memory
with the victim (e.g. via libraries) or TSX. Thus, CACHESNIPER
is a general attack technique that can be used in many
settings. We chose to demonstrate it on an AES and an RSA
implementation, but it can be applied in many scenarios to
circumvent cache attack countermeasures.
3 Target algorithms
The different cryptographic algorithms that have been shown
to be vulnerable to cache attacks have in common that they
perform secret-dependent accesses to memory. An attacker
can observe these accesses through the cache and then re-
trieve the aforementioned secrets.
For this reason a common countermeasure with less im-
pact on performance than flushing the data from the caches is
to prefetch the data in the cache to prevent the attacker from
distinguishing data that have been used for the cryptographic
operation from data that have not.
We focus on two common algorithms (AES and RSA) that
have been targeted multiple times; in particular, we focus on
two implementations that have been theoretically protected
against these attacks. In the following sections we describe
their implementations and explain the countermeasures they
implement.
3.1 AES S-Box implementation in OpenSSL
AES [18] is a commonly used symmetric block cipher that
operates on data in blocks of 16 bytes. It consists of different
operations (AddRoundKey, SubBytes, ShiftRows and MixColumns)
that are repeated each round. The S-Box is the table
that holds the data for the SubBytes operation; concretely,
it holds 256 byte values.
The S-Box software implementation of AES replaced the
previously used T-Table implementation, which used four ta-
bles (T-Table) with pre-computed values of the SubBytes,
ShiftRows and MixColumns operations. That is, it trans-
formed the aforementioned operations into look up opera-
tions in order to improve the performance of the encryption
and decryption processes. The accesses to the tables are
key-dependent and not all of them are used during the encryp-
tion process. This fact has been exploited multiple times to
recover the secret keys [4, 9, 36].
As opposed to the T-Table implementation, the S-Box im-
plementation does not merge different operations into one.
The table is used 16 times each round, once per round in-
put byte, during the SubBytes operation. Considering a cache
line size of 64 bytes, such table uses 4 cache lines (256 bytes).
If we compute the probability of not accessing one of these
cache lines and assume a key size of 128 bits, and as a result
10 rounds, such probability is equal to 0, as shown in equa-
tion 1. As a consequence, an attacker observing the cache
before and after the encryption process will not gain any in-
formation from that observation.
\[ \Pr[\text{no access S-Box in encryption}] = \left(1 - \frac{64}{256}\right)^{10 \cdot 16} \approx 0 \tag{1} \]
\[ \Pr[\text{no access S-Box in round}] = \left(1 - \frac{64}{256}\right)^{16} \approx 0.01 \tag{2} \]
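The two probabilities in equations 1 and 2 can be checked numerically. The short Python sketch below assumes, as the equations do, that the 16 S-Box lookups of each round are independent and uniformly distributed over the 256 entries; the function name is hypothetical.

```python
LINE_SIZE = 64          # bytes per cache line
SBOX_SIZE = 256         # bytes in the S-Box (4 cache lines)
LOOKUPS_PER_ROUND = 16  # one lookup per state byte
ROUNDS = 10             # AES with a 128-bit key


def prob_line_untouched(lookups: int) -> float:
    """Probability that one given S-Box line is never hit in `lookups` uniform lookups."""
    return (1 - LINE_SIZE / SBOX_SIZE) ** lookups


print(f"whole encryption: {prob_line_untouched(ROUNDS * LOOKUPS_PER_ROUND):.2e}")
print(f"single round:     {prob_line_untouched(LOOKUPS_PER_ROUND):.4f}")
```

The first value is on the order of 10^-20, which is why observing the cache only before and after a full encryption yields nothing, while the per-round value is roughly 1%.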
Equation 2 shows that observing each round individually
would give a 0.01 (1%) chance of a given line not being accessed.
Instead of relying on the challenging task to stop the pro-
cess after each round, the countermeasure of prefetching was
applied: The OpenSSL S-Box implementation includes a
prefetch stage before each of the rounds. Since the 256 bytes
of the S-Box table map to 4 different cache lines, the encryp-
tion process only has to read 4 values to ensure the whole
S-Box table is loaded into the cache memory. Even when
the S-Box implementation performs key-dependent memory
accesses, the data will always be in the cache. As a result, traditional
side channel attacks cannot be used to extract information
from this implementation. Irazoqui et al. analyzed
the OpenSSL implementation in 2017 with a tool for leak-
age detection and came to the same conclusion, declaring it
leakage free [35]. Indeed, if we compare the time it takes
to perform an encryption with the time it takes to perform
a single Probe [22] in one set, it is clear that the resolution
of a Prime+Probe attack (see footnote 1) is not high enough to retrieve
fine-grained information from this implementation: As we can
see in Figure 1, a probe without any cache misses takes
almost as long as the entire encryption. Even if the attacker
primes the cache before the encryption starts and is able to
tell exactly when the encryption starts, a single cache miss
will increase the time required for probing to be greater than
the total encryption time.
Footnote 1: The Probe includes sequential accesses to the elements of the eviction
set and no cache misses that would increase the Probe times

[Figure 1 plot. x-axis: Time (cycles); series: Encryption time, Probe time]
Figure 1: Histograms of the Probe times and the AES encryption
times of the S-Box implementation measured in a machine
shipped with an Intel Core i5-7600K including 1000
time samples.

The S-Box implementation is not the default one in
OpenSSL when an AES encryption operation is triggered
from the command line. However, when using the C API of
the library, a call to the function AES_encrypt() will use the
S-Box. A developer wishing to use the default AES-NI instructions
(as in the command line) has to explicitly indicate
it. While analyzing the shared library included in Ubuntu
16.04 or CentOS 7.6 (OpenSSL 1.0.2g), we observed an ad-
ditional protection. The OpenSSL implementation of AES
has four different S-Boxes. If there are, for example, two pro-
cesses using the library at the same time, each of them will
use a different table.
We discovered this implementation is still vulnerable,
since there are tiny time windows during the encryption pro-
cess from which the attacker can gain valuable information.
As we will show in section 5, our proposal allows an attacker
to observe information referring to just the last round, even in
the absence of shared memory. We show how to bypass the
prefetch, and perform a cross-core cache attack that recovers
the secret key of this theoretically protected implementation.
3.2 WolfSSL RSA exponentiation
RSA is the most widely used public key cryptographic algorithm.
It considers a public key $(n, e)$ where $n$ is the product
of two prime numbers $p$ and $q$ that remain secret, and
a private key $(p, q, d)$ where $d \equiv e^{-1} \pmod{(p-1)(q-1)}$.
In order to understand the attacks only the encryption and
decryption operations are relevant. For a message $m$, the ciphertext
$c$ is obtained as $c = m^e \pmod{n}$ and it is recovered
with an analogous operation $m = c^d \pmod{n}$. In particular
the decryption, which is the exponentiation operation using
the secret key, is the target of the attack.
There are multiple ways of implementing this exponenti-
ation [25, 45]. We will focus particularly on the square-and-
multiply exponentiation, since the wolfSSL implementation
is based on this. The square-and-multiply approach scans the
bits of the secret exponent, performing a square operation in-
dependently of the value of the scanned bit, and a multiplica-
tion if such bit is equal to 1. Thus, an attacker monitoring the
square and multiply operations can retrieve the sequence of
bits of the exponent.
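To make the leakage of the unprotected approach concrete, the following Python sketch implements a textbook left-to-right square-and-multiply and records the sequence of operations. It is an illustration only and not wolfSSL code; the toy RSA parameters (p = 61, q = 53, e = 17) and all function names are assumptions picked for the example and are far too small for real use.

```python
def square_and_multiply(base: int, exponent: int, modulus: int):
    """Left-to-right square-and-multiply; also returns the operation trace."""
    result = 1
    trace = []
    for bit in bin(exponent)[2:]:
        result = (result * result) % modulus   # square for every bit
        trace.append("S")
        if bit == "1":
            result = (result * base) % modulus  # multiply only for 1 bits
            trace.append("M")
    return result, trace


def bits_from_trace(trace):
    """Recover the exponent bits: 'S' followed by 'M' is a 1, a lone 'S' is a 0."""
    bits = []
    for op in trace:
        if op == "S":
            bits.append("0")
        else:  # an "M" upgrades the bit opened by the preceding "S"
            bits[-1] = "1"
    return "".join(bits)


# Toy RSA parameters, for illustration only.
p, q, e = 61, 53, 17
n = p * q
d = pow(e, -1, (p - 1) * (q - 1))  # modular inverse; requires Python 3.8+

c, _ = square_and_multiply(65, e, n)      # "encrypt" the message 65
m, trace = square_and_multiply(c, d, n)   # "decrypt" and record the operations
print(m, int(bits_from_trace(trace), 2) == d)
```

An attacker who can tell squares from multiplications in the decryption trace reads the private exponent directly, which is what the countermeasures discussed next try to prevent.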
The countermeasures wolfSSL has deployed to protect
this implementation are to always perform the square and the
multiply operations and to load the two possible values of the
bit to keep them in the cache so they prevent an attacker from
distinguishing which one (0 or 1) was used. These are clearly
put in place to prevent cache attacks, which can be seen in
both the source code comments and the release notes [5, 54].
The resulting procedure is summarized in algorithm 1.
Algorithm 1 wolfSSL exponentiation implementation
Input: base b, modulo m, exponent e = (e_{n-1} ... e_0)_2
Output: b^e (mod m)
init(R);
for i from n-1 downto 0 do
    mul(R[0], R[1], R[e_i]);
    red(R[e_i]);
    sqr(R[2], R[2]);    ⊲ temp variable that avoids the leakage of R[e_i]
    red(R[2]);
end for
return R;
In order to analyze the wolfSSL implementation of the ex-
ponentiation, we compiled the latest version at the time of
writing this paper (version 4.4.0) with the --enable-debug
--enable-keygen flags in order to be able to keep the symbols
after the installation and to generate RSA keys. Later, we ran
the tests included in the library itself to analyze their RSA
implementation and found that the exponentiation is still vul-
nerable despite the steps taken to remove side channel vul-
nerabilities.
As in the case of AES, this approach still leaves a window
long enough for observations of the bit values. Indeed there
are two possible windows to retrieve the secret information.
Firstly, at the end of the multiplication operation in line 3
only the result referring to the actually used bit is stored. In
that copy process, one of the two possible values is loaded
while the other one remains untouched. This leaks the key bit.
The second window is even bigger, because the reduce oper-
ation in line 4 only uses the information of the actual key bit
value, not taking the precaution of loading both values. This
means that this function could even be vulnerable to a traditional
cache attack, although the synchronization between
the attacker and the victim process would be a challenge.
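The kind of residual leakage described above can be sketched with a textbook Montgomery ladder, used here as a stand-in and not as a transcription of algorithm 1 or of the wolfSSL source. Both the multiplication and the squaring run for every bit, but the register that receives the product still depends on the key bit. The register layout, the function name and the `touched` log are assumptions made for the illustration.

```python
def ladder_exponentiation(base: int, exponent: int, modulus: int, n_bits: int):
    """Montgomery ladder: one multiply and one square per bit, regardless of its value.

    Returns the result and, for each bit, the index of the register that
    the multiplication wrote to (what a cache observer might learn).
    """
    r = [1, base % modulus]
    touched = []
    for i in range(n_bits - 1, -1, -1):
        bit = (exponent >> i) & 1
        r[bit ^ 1] = (r[0] * r[1]) % modulus   # product goes to a bit-dependent register
        r[bit] = (r[bit] * r[bit]) % modulus   # the other register is squared
        touched.append(bit ^ 1)
    return r[0], touched


result, touched = ladder_exponentiation(5, 0b101101, 1009, 6)
recovered = int("".join(str(t ^ 1) for t in touched), 2)
print(result == pow(5, 0b101101, 1009), recovered == 0b101101)
```

The sequence of operations is now constant, yet knowing which register was written in each iteration is enough to rebuild the exponent. This is why an eviction placed in the right window can still recover the key bits.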
4 Attack scenario and overview
One of the goals of this work is to demonstrate an attack
mechanism that gives an attacker more insights about what
the victim is doing and further control of the cache evictions
so she can obtain the desired information. We first describe
the considered scenario, in which the attacker does not inter-
act with the victim directly. Next, we describe the steps of
the CACHESNIPER approach.
In the considered scenario, which is depicted in Figure 2,
we can find the following agents:
[Figure 2 diagram. Labels: HOST MACHINE, P1, P2, ATTACKER, SERVER APP, CLIENT (x3), Covert Channel]
Figure 2: Diagram of the considered scenario for the attack
against the S-Box implementation of AES. The attacker and
the victim share the hardware
Server: Encrypts/decrypts a block of data whenever it gets a
    request from any client. While this is a simplified version
    of a real server process, it is enough to create a realistic
    scenario for the attacks.
Client: Sends requests at random times, each around 500µs
    plus a random time. This process tries to emulate the
    normal behavior of a real network.
Attacker: Monitors the cache to detect the exact times when
    the target process (an encryption process) is running so
    it can launch a precise attack.
Our main assumption is that both the attacker and the vic-
tim are using the same machine. Since we assume no syn-
chronization between the victim and the attacker, the attacker
does not know when the victim is running an encryption.
Thus, the first challenge for the attacker is to detect when
the victim is performing an encryption, which she does by
spying on the cache.
The second challenge for the attacker is to design a technique
that allows her to evict the target data from the cache
at the desired instant. Note that this eviction has to be accu-
rate since the attacker has a very limited time window to ob-
serve the victim behavior. To retrieve necessary information
up front, the attacker needs to perform an offline stakeout
phase. Then, CACHESNIPER is conducted in three steps:
Aim: Detect the execution of the victim process.
Wait: Wait for a pre-determined time.
Shoot: Cause an eviction of the desired data.
We will explain the stakeout and the aim and shoot steps in
the following sections. The wait step is straightforward after
completing the stakeout. Note that although we use AES for
exemplification, the presented approaches are general, and
can be applied to virtually any other target. The pseudocode
of the attack is displayed in listing 2, and all line numbers in
the following refer to this code.
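Why the timing of the eviction matters can be seen in a small timeline model of the aim, wait and shoot steps. The sketch below is a simulation under assumed numbers, not the attack code: the round and prefetch durations are hypothetical, every round before the last is assumed to touch all four S-Box lines, and the cache is reduced to the rule that a reload hits only if the victim touched the line after the eviction.

```python
ROUNDS = 10
ROUND_CYCLES = 40     # hypothetical duration of one round
PREFETCH_CYCLES = 4   # hypothetical duration of the prefetch stage


def victim_schedule(last_round_lines):
    """Return (time, line) pairs for every victim access to the 4 S-Box lines."""
    events = []
    for rnd in range(ROUNDS):
        start = rnd * ROUND_CYCLES
        events += [(start + line, line) for line in range(4)]  # prefetch stage
        used = last_round_lines if rnd == ROUNDS - 1 else range(4)
        events += [(start + PREFETCH_CYCLES + 1 + line, line) for line in used]
    return events


def reload_is_hit(events, evict_time, line):
    """The line is cached at the end iff the victim touched it after the eviction."""
    return any(t > evict_time and touched == line for t, touched in events)


def sniper(events, detect_time, wait_time, line):
    """Aim (detect_time), wait (wait_time), shoot (evict), then observe the reload."""
    evict_time = detect_time + wait_time
    return reload_is_hit(events, evict_time, line)


events = victim_schedule(last_round_lines={0, 1, 2})  # line 3 unused in the last round
print(sniper(events, detect_time=0, wait_time=364, line=3))  # shot after the last prefetch
print(sniper(events, detect_time=0, wait_time=300, line=3))  # shot too early
```

With the well-timed shot the reload of line 3 misses, revealing that the last round did not use it. With the early shot a later prefetch reloads the line and the observation carries no information.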
4.1 Stakeout: Preparing for attack
The goals of the stakeout phase are to find an appropriate
cache region to monitor and to determine the waiting time
for the second attack step. This can be performed offline and
is completely independent of any victim executions.
For AES, we would like to monitor the S-Box. As stated in
section 3.1, we have to determine which of the four S-Boxes
is used by the server. In our case it is the third table, and it
seems to stay the same through executions. To retrieve this
information, an attacker can just perform an unmodified ver-
sion of the Flush+Reload or even Prime+Probe attacks and
profile the process.
The waiting time is the time between the start of the en-
cryption and the relevant observation window. The constant
WAIT_TIME represents that value in listing 2. One possi-
ble way to obtain it is to profile the victim application on a
machine similar to the target machine. The attacker measures
the time it takes the process to execute the code between the
point that serves for detection and the point in which the at-
tacker can gain information from the evicted data, for exam-
ple, the last round of the AES encryption process in our sce-
nario. Note that the time it takes to execute the clflush has
to be considered but it is not significant if no fence instruc-
tions are used to measure its execution time. As a result, we
can evict data that is accessed as soon as 60 cycles after the
execution of the line used for detection. This time has to be
considered when selecting the gadget function.
A second approach assumes that an attacker does not know
the waiting time in advance, but she knows some characteristics
of the process, such as the probability of observing a cache hit
or miss in the target. For example, ideally around 1% of cache misses
will be observed if we hit exactly the last round of
the AES encryption, or around 50% of the bits are expected
to be 1 in an RSA secret key. In this case, the value for
WAIT_TIME can be retrieved automatically by analyzing
the number of hits and misses observed in the recovery step
(lines 10 - 15) and modifying its value accordingly. Even further,
the attacker can define an initial value for WAIT_TIME
and update or adapt it dynamically based on the comparison
between the actual and the expected observations.
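One simple way to realize this second approach is a scan over candidate waiting times that keeps the one whose observed miss rate is closest to the expected value. The text does not specify the adaptation rule, so the sketch below is only one possible reading. The callback measure_miss_rate is hypothetical (it would run the attack repeatedly with a given waiting time and return the fraction of misses), and the synthetic miss-rate profile in the example is made up for illustration.

```python
def calibrate_wait_time(measure_miss_rate, candidates, expected_miss_rate):
    """Pick the candidate whose observed miss rate is closest to the expected one.

    measure_miss_rate(wait_time) is a hypothetical callback that runs the
    attack several times with that waiting time and returns the fraction
    of reloads that were cache misses.
    """
    return min(candidates,
               key=lambda wait: abs(measure_miss_rate(wait) - expected_miss_rate))


def synthetic_miss_rate(wait_time):
    """Made-up profile: too early gives only hits, too late gives only misses."""
    if wait_time < 360:
        return 0.0    # a later prefetch reloads the line
    if wait_time < 370:
        return 0.01   # eviction lands in the last-round window
    return 1.0        # eviction after the encryption has finished


WAIT_TIME = calibrate_wait_time(synthetic_miss_rate, range(300, 400, 5), 0.01)
print(WAIT_TIME)
```

In practice the difference between a 0% and a 1% miss rate is small, so each candidate would need many samples before the comparison becomes reliable.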
4.2 Aim: Detection of the victim’s execution
In order to spy on a reduced time window during the execu-
tion of the target algorithms we must be able to identify such
window during the runtime. For this reason, we study differ-
ent detection mechanisms to cover a variety of attack scenar-
ios. In the most favorable scenario, the host machine has Intel
TSX, which we aim to use for detection. In the second sce-
nario, the attacker relies on the existence of shared memory
and on the knowledge of some characteristics of the target
algorithm. Indeed, as we will show in later sections, if the
victim prefetches the target data, we can use that prefetch for
detection. Finally, although we do not explore further in
that direction, we briefly describe other approaches that could
be used for detection.
To evaluate which of the different approaches works better,
we have modified the server, so it gives us information about
the precise time instants at which the target process (an AES
encryption) begins and ends. We then use this information
to evaluate how many of the encryptions have been detected
and if so, how many of them are detected before the encryp-
tion has ended to ensure we still have time for making an
observation.
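The evaluation described above can be sketched as a small scoring function over two lists of timestamps. The matching rule is an assumption made for this example: an encryption counts as detected if the attacker raises a detection between its start and the start of the next encryption, and as detected in time if that detection is not later than the end of the encryption. The function name and the sample timestamps are hypothetical.

```python
from bisect import bisect_left


def detection_stats(encryptions, detections):
    """Return (fraction detected, fraction detected before the encryption ended).

    encryptions: list of (start, end) timestamps reported by the modified server.
    detections:  list of timestamps at which the attacker signalled a detection.
    """
    if not encryptions:
        return 0.0, 0.0
    encryptions = sorted(encryptions)
    detections = sorted(detections)
    detected = in_time = 0
    for i, (start, end) in enumerate(encryptions):
        next_start = encryptions[i + 1][0] if i + 1 < len(encryptions) else float("inf")
        j = bisect_left(detections, start)  # first detection at or after the start
        if j < len(detections) and detections[j] < next_start:
            detected += 1
            if detections[j] <= end:
                in_time += 1
    total = len(encryptions)
    return detected / total, in_time / total


# Made-up timestamps: one late detection and one missed encryption.
encs = [(0, 10), (100, 110), (200, 210), (300, 310)]
dets = [3, 115, 305]
print(detection_stats(encs, dets))
```

Only detections that arrive before the end of the encryption leave the attacker time to wait and shoot, so the second value is the one that bounds the usable samples.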
In particular, we monitor one of the lines of the S-Box.
Since the prefetching stage of the targeted OpenSSL
AES implementation involves the S-Box being loaded into
the cache at the beginning of the encryption process, this re-
liably informs the attacker when the encryption has started:
whenever we detect the presence of the S-Box data in the
cache, we assume the encryption has started.
4.2.1 Leveraging TSX for detection
As we have already stated, the process running inside a trans-
action can either be completed and commit the results or suf-
fer an abort and rollback the computations.
The Prime+Abort attack [20] deliberately causes conflicts
in the L3 cache with the victim process to determine whether
it has or has not used certain data. We use such conflicts in
the L3 cache to determine the exact instants when the pro-
cess executes the target instructions or uses certain data we
want to monitor. The specific targets here are the data in the
S-Box in the attack against AES and one instruction in the
vulnerable exponentiation function in the case of the attack
against RSA.
First of all we build eviction sets mapping to the same sets
as our targets. We enabled hugepages to construct eviction
sets as Liu et al. did in [22], although it is possible to use the
reverse-engineered mapping function [49] or a different ap-
proach that does not require the use of huge pages [40, 63].
We read the data in one eviction set inside the transaction
and wait for the victim to execute. We observed some spon-
taneous aborts as also noticed in previous works [20,27], but
we were able to properly detect 97% out of 10000 encryp-
tion processes executed during this experiment. We believe
that during the “undetected” executions we were loading the
data into the transactional region concurrently or just after
the execution of the victim process, or an unrelated process
accidentally evicted our data.
Moreover, we have used the abort handler to read the sys-
tem timestamp and compared this value with the timestamp
we have collected in the server just before executing the
encryption process. The difference between them is almost
constant, and the mean value of this difference is 380 cycles
in our system (Intel Core i5-7600K). Based on these
measurements, we state that this technique accurately informs about
the execution of the victim and that the attacker gets the abort
as soon as the conflict happens. Note that the measured times
include the execution of some instruction before the prefetch
of the S-Box and the time it takes the CPU to retrieve the
S-Box data from the main memory. That is, the transaction
aborts once the data has been effectively loaded into the
cache and the attacker’s data is removed. While a heavy sys-
tem load leads to additional aborts that are not related to the
encryption, it does not influence the detection time.
4.2.2 Shared Memory and detection
The Flush+Reload [69] technique has revealed itself as one
of the most reliable sources of information in cache attacks,
especially when compared to the Prime+Probe technique
[4, 9, 22, 36]. The main reason is that, in the former case,
observations are made on a shared memory block, whereas
in the latter case, any other process running in the machine
could force an eviction and the attacker would not be able
to distinguish the origin of the eviction. The Flush+Reload
technique requires the victim and the attacker to share the li-
brary. This is often not a problem since, for example, we have
observed that both Ubuntu 16.04 and CentOS 7.6 are shipped
with a compiled version of libcrypto (OpenSSL 1.0.2g) and
newer versions of Ubuntu such as Ubuntu 20.04 also come
with the latest version of the shared library (OpenSSL 1.1.1f).
Similarly, wolfSSL, when compiled and installed, generates
a shared library.
Back to the S-Box example, the attacker flushes the data of
the S-Box line monitored in the previous scenario and then
waits for an arbitrary time and reloads that data. We have ob-
served that, if we do not include this waiting time, we are not
able to observe any cache access. We have similarly consid-
ered two different ways of ensuring that the process waits for
a fixed time: increasing the value of a variable up to a limit
or actively polling the system counter through the rdtsc
instruction and constantly checking if we have waited for the
desired time. Each approach yields different results, but
they are similar in the sense that they lead to similar
detection rates.
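The two waiting strategies can be sketched as follows. This is an
illustration only: Python cannot issue rdtsc directly, so
time.perf_counter_ns is used here as a stand-in for the timestamp counter
(an assumption), and the function names are hypothetical. The second
function mirrors the deadline loop of lines 6 and 7 of Algorithm 2.

```python
import time


def wait_by_counting(limit):
    """Busy-wait by increasing a variable from 0 up to `limit`."""
    counter = 0
    while counter < limit:
        counter += 1
    return counter


def wait_by_polling(duration_ns, clock=time.perf_counter_ns):
    """Busy-wait by polling a clock until `duration_ns` has elapsed.

    Returns by how many nanoseconds the deadline was overshot.
    perf_counter_ns stands in for rdtsc (assumption).
    """
    deadline = clock() + duration_ns
    while clock() <= deadline:
        pass
    return clock() - deadline
```

In the counting variant the waiting time is expressed as a loop limit (the
unit used on the x-axis of Figure 3), whereas in the polling variant it is
expressed in clock units.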
Figure 3 shows the results in terms of correctly detected
encryptions and valid encryptions (the encryption was de-
tected before the victim had actually finished it) as a function
of the waiting time. The waiting time refers to the limit up
to which a variable is increased: from 0 to the limit. As we
did in the previous case, we collect information referring to
10000 encryptions for each considered limit. As can be
derived from the figure, limits around 20 obtain the best results
in terms of valid encryptions. As a consequence, an attacker
should select that limit to maximize the chances of success
when trying to evict data later during execution of the target
window. In any case, these results show that it is also possi-
ble to use the Flush+Reload technique for detection in case
the processor does not include TSX, although the results will
not be as accurate as the ones obtained using TSX.
Figure 3: Percentage of correctly detected encryptions and
valid encryptions for an attack as a function of the waiting
time measured as the limit of a count. [Plot omitted. x-axis:
wait time (0 to 140); y-axis: percentage (0 to 100); series:
detected, valid.]
4.2.3 Other scenarios
Another option to detect the execution of the victim process
is a Prime+Probe based approach. If we look back to
figure 1 and compare the Probe times with the execution of the
whole encryption process it seems clear that performing a
complete Probe of the data in the eviction set is not a real op-
tion to obtain valid information for the attack from a single
round. However, considering the replacement policy of the
L3 cache and that the attacker knows the insertion order of
the elements in the eviction set, she can sequentially access
such elements from the one that was first inserted to the one
that was last inserted. This way, she can be reasonably sure
that she is accessing the element that the target will replace
in case of conflict.
The main drawback of this approach is that even if the de-
tection is successful, the attacker does not have any control
of the state of the cache at the time the victim executes. As a
result, she cannot accurately predict the number of accesses
that will be required to evict the recently inserted data. Since
the capability of the attacker to achieve this eviction during
the execution of the victim process is determined by this state
of the cache and the number of required accesses, the attacker
capabilities are rather limited in this scenario.
4.3 Shoot: Accurate eviction of the data
Due to the prefetching or to the “always execute” and “always
load into the cache” strategies, the ability of the attacker to
retrieve the secret information is determined by her ability to
evict the data from the cache during the vulnerable windows.
Namely, when attacking the S-Box implementation of AES,
our goal is to achieve evictions during the last round and
after the last prefetch; when attacking the RSA exponentiation,
our targets are certain windows during the execution of some
operations in the multiply or the reduce functions.
Figure 4: Flow diagram of the possible scenarios that enable
the proposed attack. [Diagram omitted. Decision nodes: “Is TSX
available?” followed, on each branch, by “Is there shared
memory?”. Outcomes: with TSX and shared memory, AIM: TSX
(abort), WAIT(WAIT_TIME), SHOOT: Flush; with TSX and no shared
memory, AIM: TSX (abort), WAIT(WAIT_TIME), SHOOT: single
access; without TSX but with shared memory, AIM: Flush+Reload,
WAIT(WAIT_TIME), SHOOT: Flush.]
As we did when explaining the detection scenarios, we
also consider different possibilities in this case. The consid-
ered scenarios are depicted in figure 4. Note that since we ob-
tained the best detection results for the machines with TSX,
this is the feature we first look for, although it will not be used
during the eviction process. In figure 4 we have highlighted
the scenarios we are going to describe more deeply in this
section for the sake of simplicity and brevity. In all cases
the idea is the same: once we know the victim is running, we
wait for the exact time determined in stakeout, which will
allow us to force the eviction at the desired instant.
In the following, we refer to the technique that allows us
to evict data from the cache once the execution of the vic-
tim has been detected using the TSX capabilities as method
1 when there is shared memory between the victim and the
attacker, and as method 2 when there is not. Although not
explicitly mentioned in the figure, the attacker has to recover
the information leaked thanks to the accurate eviction of the
data. For method 1, reloading the shared data is enough, in
the second case, another access is required.
In order to help us with the explanation of both methods,
we include the pseudocode of the CACHESNIPER attack
process (aim, wait, shoot, then recover information)
particularized to the S-Box attack that we have been using as an
example in this section. Note that the procedure summarized in
Algorithm 2 is general: the attacker just has to
replace the S-Box with her target, for example the RSA
multiply function in our other example, and adapt the waiting
times and the eviction function correspondingly.
As we will show, method 1 differs from method 2 in the
way the cache set is filled during the transaction, in the value
of the WAIT_TIME even if the target is the same, in the way
this target is evicted from the cache and finally in the way
the information about the actual access to the data is inferred.
Next, we explain the particularities of each approach.
Algorithm 2 Attack against the OpenSSL AES S-Box
implementation for the TSX-based detection scenario
Input: Address(S-Box(0)), Eviction_set ⊲ Address of the S-Box
Output: X0, S_{Nr+1} ⊲ Information about the access and ciphertext
1: for t = 0 to number_of_encryptions do
2:   Start_Transaction();
3:   if Successfully Started then
4:     fill_cache_set();
5:   else ⊲ Aim: Abort handler detects the access
6:     time_interrupt = timestamp() + WAIT_TIME;
7:     while (timestamp() ≤ time_interrupt) {};
8:     Evict from cache (S-Box(0));
9:     Wait until encryption ends;
10:     Infer victim access to (S-Box(0));
11:     if hasAccessed(S-Box(0)) then
12:       X0[t] = 1; ⊲ Data used
13:     else
14:       X0[t] = 0;
15:     end if
16:   end if
17: end for
18: return X0, S_{Nr+1};
4.3.1 Method 1
As we have already stated, this method relies on the existence
of shared memory. That allows the attacker to flush from the
cache the desired data when the desired instant comes. Once
the transaction is started, the attacker has to fill the cache set
(line 4 in the algorithm 2). It is not required for the attacker to
place the data in any specific order but she must fill the whole
set to get the abort signal that will inform her that the victim
is accessing the target. One can see this detection mechanism
as a “break point” that the attacker places into the code without
the victim being notified.
Once the abort is triggered by the victim’s access, we use
the handler to carry out the attack. This process mainly includes
the definition of the time the attacker has to wait (line 6), the
procedure to evict the data (line 8) and finally a way to
retrieve and store the desired information (lines 10 - 15). Since
there is shared memory, data is evicted using the clflush
instruction and retrieved by measuring the time it takes to read
such data (reload). This is a traditional Flush+Reload attack
carried out by an abort handler, which allows for very precise
timing of the flush instruction.
4.3.2 Method 2
If there is no shared memory or the data is going to be re-
trieved from a non shared variable (e.g. in the attack against
RSA), it is still possible to achieve the desired eviction accu-
rately by just accessing one memory location. In order to be
able to do this, we have to manipulate the cache state prior to
the detection of the encryption process. Indeed, the required
state of the cache depends on the replacement policy of the
last level cache, and it can be changed by accessing data
located in the cache. Note that accesses to data located in
lower-level caches do not change the state of the last level cache.
Interestingly, the state of the cache remains unchanged when the
transaction aborts. In other words, the abort does not revert
the microarchitectural state of the cache.
One of the simplest ways to check this assumption is
flushing some data out of the cache before executing the
transactional code. Then, in the transactional region, we reload
that data back into the cache. Next, we wait in an endless loop
doing nothing within the transaction, until an abort happens.
The abort handler checks if the data is in the cache or not.
We have repeated this test 10000 times, and the conclusion
is that the data that has been loaded into the cache during a
transactional operation remains in the cache after such
operation has finished. That is, even if the process never
gets to see the result of the code executed inside the
transaction, the microarchitectural state of the processor retains
some information.
This is similar to transient execution attacks [44, 46],
which exploit the fact that microarchitectural changes of tran-
sient executions that have never been committed are still ob-
servable from the architectural state.
Although the manipulation of the data in the cache is
performed during the preparation of the code for the detection,
in the fill_cache_set() function of Algorithm 2 (line 4), it
is directly related to the eviction, and this is the reason why
we are giving more details in this section. This initial setup
is also one of the differences between method 1 and method
2. Also, since this approach evicts the target by accessing
another block instead of using the clflush instruction (line 8),
this has an effect on the WAIT_TIME value. Namely, the
target data is not evicted from the cache until the
replacement block has been retrieved from the main memory, and
this introduces a delay of around 200 cycles since detection.
This means that around 250 cycles must elapse after
detection, otherwise it is not possible to evict the data. The
profiling of the victim for the selection of the WAIT_TIME or the
automation of the process to obtain it is still necessary. The
last difference to method 1 is how data is retrieved: in this
case another access is required (line 10).
In the following, we give more information about why it is
possible to use TSX to support our attack, how this approach
works and some considerations related to the L1 cache
replacement policy.
Replacement policy and accurate eviction of data
The replacement policy described in [10] agrees with the
ones described in [3, 62] in the selection of the eviction
candidate most of the time. Indeed, our attack can be explained
considering any of them as the correct one. Mainly, once the
data is inserted into the cache set, all of them distinguish two
different ages above the age 0, and they all consider that the
data is evicted from the cache with age 3. They differ in the
way the ages are updated. Since our attack was originally
designed using the approach described in [10], we will use that
policy for the explanation.
Figure 5: General diagram of the process required to evict
the data from the cache with just one access. [Diagram
omitted. It shows the content and age of each line of the set
in four stages: (1) set filled, A to H all with age 2;
(2) after the access to B to H, A with age 2 and B to H with
age 1; (3) once the encryption process starts, the S-Box with
age 2 in the place of A and B to H with age 2; (4) after the
access to A, A with age 2 in the place of the S-Box and B to H
with age 3.]
Besides the replacement policy, there are two things we
need to bear in mind to understand this approach. The first
one is that only accesses to the last level cache update the
values of the ages of the elements in the LLC. The second one is
that the LLC in our machine is inclusive. This means that if
the victim loads the S-Box into the cache after a cache miss,
it will be loaded in all the cache levels, and in case this data is
used multiple times during the encryption, it will be retrieved
from the low level caches, leaving the age of the cache line
holding the S-Box unchanged. As a result, if it is the oldest
element in the set when inserted, it will be evicted in case of
conflict.
The general idea of the attack is depicted in Figure 5,
particularized for the S-Box attack. Unlike in the previous
approach, the attacker needs to set the ages of the
elements that have just been inserted into the set (stage 1) to
the ones shown in the second stage of the figure. This means
that the fill_cache_set() function in Algorithm 2
involves two steps. Note that all the elements in the eviction set
are accessed, so the transaction still aborts when the victim
accesses the monitored data.
Once the encryption begins, the S-Box data will be placed
into the cache. At that moment, the oldest element (A), which
we have not accessed in the second step so that it is older than
the rest of the elements, will be evicted. At the moment the S-Box
data is inserted, and considering all the updates in the ages of the
blocks, all the blocks (the ones originally placed in the cache
by the attacker and the S-Box block) will have the same age.
Since we perform the accesses in such a way that the oldest
element before inserting the S-Box data (A) is located in the
first position in the set, the S-Box automatically becomes the
eviction candidate just after being inserted.
Then the procedure is similar to the previous case.
We wait for the desired time and then, by accessing A (which
is out of the cache because it was replaced with the S-Box
data), we ensure that the data of the S-Box is removed from
the cache. We wait until the encryption ends, and finally
retrieve the information about the actual access to the S-Box
by accessing B. Since B is the first element with age 3, in
case of conflict it will be replaced with the S-Box data. Note
that after the recovery step we need to change the ages again,
in a similar way to the Reload+Refresh attack [10].
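The sequence of Figure 5 can be replayed with a toy model of one LLC set.
The sketch below is an illustration under assumptions and not a faithful
model of the hardware: it assumes that a line is inserted with age 2, that
a hit at LLC level decreases the age by one, and that on a miss in a full
set all ages are increased until some line reaches age 3, after which the
first such line in set order is replaced. These rules are chosen so that
they reproduce the ages drawn in the figure; the class and function names
are hypothetical, and the set has 8 ways as in the figure rather than the
12 ways of our LLC.

```python
EVICT_AGE = 3   # lines are evicted with age 3 (from the text)
INSERT_AGE = 2  # age right after insertion, as drawn in Figure 5


class AgedSet:
    """Toy model of one LLC set: an ordered list of [tag, age] pairs."""

    def __init__(self, ways):
        self.ways = ways
        self.lines = []

    def access(self, tag):
        """Access `tag` at LLC level; return the evicted tag or None."""
        for line in self.lines:
            if line[0] == tag:               # hit: the line gets younger
                line[1] = max(0, line[1] - 1)
                return None
        if len(self.lines) < self.ways:      # free way: plain insertion
            self.lines.append([tag, INSERT_AGE])
            return None
        # Miss on a full set: age every line until one reaches age 3.
        while all(age < EVICT_AGE for _, age in self.lines):
            for line in self.lines:
                line[1] += 1
        # Replace the first line with age 3, keeping its position.
        for line in self.lines:
            if line[1] == EVICT_AGE:
                evicted = line[0]
                line[0], line[1] = tag, INSERT_AGE
                return evicted


def single_access_eviction(ways=8):
    """Replay stages 1 to 4 of Figure 5 plus the recovery conflict."""
    cache_set = AgedSet(ways)
    names = [chr(ord("A") + i) for i in range(ways)]
    for name in names:                   # stage 1: fill, all ages 2
        cache_set.access(name)
    for name in names[1:]:               # stage 2: B to H get age 1
        cache_set.access(name)
    first = cache_set.access("S-Box")    # victim prefetch evicts A
    second = cache_set.access("A")       # shoot: one access evicts S-Box
    third = cache_set.access("S-Box")    # victim reuse evicts B
    return first, second, third
```

Under these assumptions, single_access_eviction() returns
("A", "S-Box", "B"): the victim's prefetch replaces A, a single attacker
access to A removes the S-Box, and a later victim access replaces B, which
is why timing an access to B reveals whether the S-Box was used.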
Influence of the replacement policy of the L1 caches
When trying to force all the elements in the set to get the
ages depicted in the stage 2 of Figure 5, we accessed the cor-
responding blocks B to H as a linked list (to avoid pipeline
effects as much as possible). We thought that this way all
the data would be retrieved from the LLC and as a result, the
ages of all the elements would be updated. To test our hypoth-
esis that the data was retrieved from the LLC, we measured
the times it took us to read each of the blocks. Surprisingly,
we found out that some of them were retrieved from the L1
cache. The replacement policy of L1 and L2 caches was re-
sponsible for this effect, so we performed some experiments
to understand how it works.
In fact, we only observed this behaviour in our processor,
whose LLC is 12-way associative, whereas the L1 and
the L2 caches are 8-way and 4-way associative respectively.
However, we performed the same test in a different processor
(Intel i7-6700K) with a 16-way associative LLC, and all the
data was retrieved from the LLC. In the first case, just after
the 12 elements of the set have been placed in the cache, 4
of them are only present in the LLC, and the L1 cache will
have suffered 4 misses but still keep 4 of these blocks. The
replacement policy defines which elements are replaced and
which are kept. On the contrary, if the number of ways of
the LLC is 16, the L1 cache will have suffered 8 misses. This
means it is likely that the 8 elements that were first accessed
only reside in the LLC. We observed that in the first case the
block we call B is in the L1 cache whereas it is not in the
second case. For these reasons we conducted some experiments
to determine the replacement policy in the low level caches.
Low level caches are assumed to implement a Pseudo-LRU
replacement policy, simpler than the policy
implemented in the LLC. We conducted some experiments aimed
at retrieving that policy. Based on our observations and some
intuition we already had, we were able to explain
our results by assuming a replacement policy and checking
access times. It turns out that Intel implements a tree-based
Pseudo-LRU replacement policy in the low level caches.
Independently and concurrently with the design of this attack, some
other researchers arrived at the same conclusion [3, 62].
To sum up, the tree-based replacement policy is
represented in Figure 6. Starting from the root node, it selects each
of the branches depending on the intermediate values of the
nodes. In the example the eviction candidate is marked in red.
Since the root node contains a 1, it selects the right branch.
The value of the pointed node is a 0, so it selects the left
branch. Finally, the last node has a 1, so it points to the
element at the right, F in the example. Note that the blocks of
memory in the cache set (A to H) are ordered. According
to our experiments, when the cache set is completely empty,
the elements are inserted linearly in the first free block they
find regardless of the actual values of the nodes. Once the
set gets completely filled with data, the apparent value of all
the nodes seems to be 0. If an element in the cache is either
accessed or replaced, the values of the nodes that pointed to
it are switched.
Figure 6: Tree structure that controls the Pseudo-LRU
replacement policy of L1 and L2 caches. The eviction
candidate is highlighted. [Diagram omitted. Leaves, in order:
A B C D E F G H; node values above each pair of leaves:
0 1 1 1; next level: 1 0; root: 1.]
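The walk through the tree can be written down in a few lines of Python.
This is a minimal sketch under assumptions, not a description of Intel's
implementation: the class name TreePLRU and the heap-ordered storage of the
node values are hypothetical, and the update rule is our reading of “the
nodes that pointed to it are switched”, applied to every node on the path
from the root to the accessed element.

```python
class TreePLRU:
    """Tree Pseudo-LRU for a set with a power-of-two number of ways.

    bits is a heap-ordered list: bits[0] is the root and the children of
    node i are 2*i+1 (left) and 2*i+2 (right). A 0 selects the left
    branch and a 1 the right branch, as in the Figure 6 walk-through.
    """

    def __init__(self, ways=8, bits=None):
        self.ways = ways
        self.bits = list(bits) if bits is not None else [0] * (ways - 1)

    def candidate(self):
        """Follow the node values from the root down to a way index."""
        node = 0
        while node < self.ways - 1:
            node = 2 * node + 1 + self.bits[node]
        return node - (self.ways - 1)

    def touch(self, way):
        """Access or replace `way`: switch every node that points to it."""
        node = way + self.ways - 1           # leaf position in the heap
        while node > 0:
            parent = (node - 1) // 2
            came_from_right = node == 2 * parent + 2
            if self.bits[parent] == int(came_from_right):
                self.bits[parent] ^= 1       # it pointed to us: flip it
            node = parent


# Node values of Figure 6: root 1; second level 1 0; third level 0 1 1 1.
tree = TreePLRU(8, [1, 1, 0, 0, 1, 1, 1])
print("ABCDEFGH"[tree.candidate()])  # F, as in the figure
tree.touch(5)                        # F is accessed or replaced
print("ABCDEFGH"[tree.candidate()])  # D under this update rule
```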
This replacement policy explains why B is in the L1 cache
after reading the whole LLC eviction set (12 elements). For
this reason, B cannot be the first element to be accessed
because its age would not change. Based on this replacement
policy, we have prepared a linked list of addresses to access
11 elements of the eviction set out of 12 possible in our
12-way cache that ensures all of them are in the LLC. This way
we are able to reduce their age and indeed, we have checked
that the designed access pattern retrieves all the elements in
that list from the LLC.
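The presence of B in the L1 cache can be reproduced with a toy model of an
8-way L1 set driven by the tree policy described above. The sketch is an
illustration under assumptions: blocks are read once each, in order, with
no other activity in the set; free ways are filled linearly without
touching the nodes; and on a replacement every node on the path to the
victim is switched. The function name l1_contents_after is hypothetical,
and blocks are numbered from 0, so block 0 plays the role of A and block 1
the role of B.

```python
def l1_contents_after(n_blocks, ways=8):
    """Return the L1 set contents after reading n_blocks conflicting
    blocks (numbered 0, 1, 2, ...) once each, in order."""
    bits = [0] * (ways - 1)        # heap-ordered tree, all nodes at 0
    slots = [None] * ways
    for block in range(n_blocks):
        if None in slots:
            # Linear fill of the first free way, nodes left untouched.
            slots[slots.index(None)] = block
            continue
        node = 0
        while node < ways - 1:     # 0 selects left, 1 selects right
            direction = bits[node]
            bits[node] ^= 1        # this node pointed to the victim
            node = 2 * node + 1 + direction
        slots[node - (ways - 1)] = block
    return slots


print(l1_contents_after(12))  # [8, 1, 10, 3, 9, 5, 11, 7]
print(l1_contents_after(16))  # [8, 12, 10, 14, 9, 13, 11, 15]
```

In this model block 1 (B) survives in L1 after 12 reads, while after 16
reads none of the first 8 blocks remains, which is consistent with the
behaviour observed on the 12-way and 16-way machines.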
5 Results
In this section we explain how we recover the AES and RSA
secret keys, demonstrating that the countermeasures to pre-
vent cross-core last level cache attacks can be circumvented.
In the cases where some adaptations of the proposed ap-
proach are possible, we mention it. All the experiments were
performed in the machine described in table 1. The version
of OpenSSL used is the one included in our CentOS sys-
tem (OpenSSL 1.0.2k). Note that the replacement policy on
which this attack relies is implemented in the Intel Core pro-
cessors starting from the 6th generation.
5.1 Attack against the AES
This targeted S-Box implementation was also attacked
in [50], albeit for a much more powerful adversary with full
OS control targeting SGX. That work monitored the entire
L1 data cache, observing various samples per round,
allowing them to distinguish the prefetching stages from the
normal operations of the round. Our approach, on the contrary,
does not need to frequently interrupt the victim, works across
cores and does not have any special requirement, just
user-level privileges as commonly assumed for cache attacks.
Table 1: Experimental platform details.
Processor     Intel Core i5-7600K
Cores         4
Frequency     3.8 GHz
LLC slices    8
LLC size      6 MB
LLC ways      12
L1 size       32 KB
L1 ways       8
As we have already stated in the previous sections of this
work, the target of our attack against the S-Box is the last
round of the encryption. In this round, the contents of the
S-Box are XORed with the corresponding round key to get the
ciphertext. In order to retrieve the secret key in all
scenarios (method 1 and 2) we use the information referring to the
accesses to the S-Box retrieved during the attack phase
(Algorithm 2) and assume the ciphertext to be known by the
attacker.
We use the non-access approach described in [9]. In a
nutshell, that approach uses information from cache misses, that
is, from the cases in which the victim did not load the data into
the cache. Note that in this particular implementation, the S-Box
is accessed 16 times during the last round, and even if we are
able to accurately get the information referring to the last round
exclusively, we would not know which of the 16 operations was
responsible for an observed access. On the contrary, if we determine
that an element has not been accessed, it means none of the
operations in the last round has used it. As a consequence,
we XOR each byte of the ciphertext with the 64 values of the
S-Box held in the cache line (ki = Ci ⊕ S-Box[0 to 63]).
None of these values can be the corresponding byte of the secret key.
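The elimination step can be sketched as follows. This is a noise-free toy
simulation under assumptions, not the code used for the attack: a random
permutation stands in for the AES S-Box, the inputs of the last round are
drawn uniformly at random, the ShiftRows byte reordering is ignored, and
the names eliminate and simulate are hypothetical.

```python
import random

LINE = range(64)  # indices of the S-Box entries held in the monitored line


def eliminate(candidates, ciphertext, sbox):
    """For every byte position i, discard the key values Ci ^ S-Box[j]
    for all j in the monitored line. Call only for non-access samples."""
    for i, c in enumerate(ciphertext):
        candidates[i] -= {c ^ sbox[j] for j in LINE}


def simulate(n_samples, seed=0):
    """Return (key, candidates) after n_samples simulated encryptions."""
    rng = random.Random(seed)
    sbox = list(range(256))
    rng.shuffle(sbox)              # stand-in permutation, NOT the AES S-Box
    key = [rng.randrange(256) for _ in range(16)]
    candidates = [set(range(256)) for _ in range(16)]
    for _ in range(n_samples):
        state = [rng.randrange(256) for _ in range(16)]  # last-round inputs
        ciphertext = [sbox[x] ^ k for x, k in zip(state, key)]
        line_accessed = any(x in LINE for x in state)
        if not line_accessed:      # idealized "cache miss" observation
            eliminate(candidates, ciphertext, sbox)
    return key, candidates
```

In this idealized setting the true key byte is never discarded, because a
non-access sample implies that every S-Box index used lies outside the
monitored line. With real measurements, false positives can discard the
correct value, so counting how often each candidate is excluded is more
robust than removing it at the first observation.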
In our test system the last round takes around 50-60 cycles
to execute. However, the encryption time is not constant, and
has a variance of around 15 cycles. Thus, even if we interrupt
the victim at the exactly chosen time, which is the estimated
time when the last round starts to perform its operations, we
may not hit our target. Even a slight variation in the
interruption instant leads to a different number of observed accesses.
If we interrupt before the last prefetch, we will only observe
cache hits. In contrast, if we evict the S-Box data after the
encryption has ended, we will only see cache misses. Indeed,
assuming that the 16 operations that use the S-Box in the last
round of the encryption are executed sequentially from 0 to
15, the expected cache miss probability depends on the exact
operation at which the data is evicted from the cache. This
probability is depicted in Figure 7.
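A simple model for this probability is sketched below. It is an
illustration under assumptions rather than the exact curve of the figure:
it assumes that each lookup of the last round falls independently and
uniformly on one of the four cache lines of the S-Box, and the function
name miss_probability is hypothetical.

```python
def miss_probability(executed, accesses=16, lines=4):
    """Probability that none of the remaining last-round lookups touches
    the monitored line, if the line is evicted after `executed` of the
    `accesses` lookups have already run."""
    remaining = accesses - executed
    return (1 - 1 / lines) ** remaining


for executed in (0, 8, 15, 16):
    print(executed, round(100 * miss_probability(executed), 1))
```

With no lookup executed yet the model gives (3/4)^16, about 1%, which
matches the ideal miss rate expected when the eviction hits exactly the
start of the last round; the probability then grows with every lookup that
has already been executed and reaches 100% once the round is over.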
To deal with the variation in the execution time of the
algorithm and to maximize our success chances, when
estimating the WAIT_TIME in Algorithm 2, instead of using
the probability of 1% as the expected one, we allow up to 7%
of cache misses. This way we try to ensure that the observed
cache misses actually happen in the round. This value was
determined empirically in our machine, by selecting
different probability values and carrying out multiple experiments
with the value of WAIT_TIME adapted dynamically according
to that probability and window sizes of 10000 observations.
Figure 7: Probability of not accessing a line during the last
round of the encryption depending on the number of
instructions of the last round that have been actually executed.
[Plot omitted. x-axis: operation at which the data is evicted
from the cache (0 to 16); y-axis: probability (ticks from 0 to
80).]
Both approaches, the one that assumes shared memory
(method 1) and the one that does not (method 2), are able
to successfully retrieve the secret AES key. Even with the
false positives (cache misses that do not refer to the key byte)
introduced by all the variances, we get enough information
from all the rounds. Indeed, as our results show, it is more
likely to evict the data in the middle of the execution of the
last round than just at the beginning. As a result, some bytes
are recovered faster than others, as can be observed in
Figure 8. The number of samples required to retrieve the secret
key slightly varies between executions and depends on the
threshold selected, even if we use the adaptive approach. In
both cases, the minimum number of samples required to re-
trieve the whole key is about 500000.
Figure 8: Key candidates for each of the bytes of the key of
the S-Box AES implementation of OpenSSL retrieved using
the TSX-based detection and method 2. [Plot omitted. x-axis:
number of samples (0 to 5·10^5); y-axis: key value
possibilities (ticks from 0 to 200); one series per key byte,
byte1 to byte16.]
As can be observed in Figure 8, half of the key has been
completely leaked after 100000 samples (the initial four
bytes are obtained with 10000 samples, although it cannot
be clearly observed in the figure). Around 12 of the 16 bytes
are already known with 200000 samples. Retrieving the last
4 bytes of the key is the hardest part: it takes 300000 more
samples, which gives an idea about the difficulty of evicting
the data in between the execution of the prefetch and the
subsequent access in the last round.
Figure 9: Key candidate search space for the 128 bits of the
key of the S-Box AES implementation of OpenSSL retrieved
using the TSX-based detection and method 2. [Plot omitted.
x-axis: number of samples (1 to 5·10^5); y-axis: key search
space (ticks from 0 to 100).]
Figure 9 is complementary to Figure 8. It represents the
number of guesses that an attacker would require to retrieve
the key using brute force and the information about the
actual access of the victim. Note that both plots represent the
same experiment. Different experiments have yielded
similar results but, as we have stated before, they depend on the
actual threshold defined by WAIT_TIME. Besides, we have
noticed that the location of the S-Box in the cache is also
important, because there are some sets that are noisier than
others.
5.2 Attack against RSA
The modular exponentiation executed for the RSA decryp-
tion operations in the wolfSSL implementation is a variation
of the well-known square-and-multiply algorithm. Indeed, it
is a cache attack protected version of the square-and-multiply
implementation that we have described in section 3.2.
Based on the code they provide for the tests, we generated
different secret keys of 2048 bits, and embedded them in our
server application in such a way that it decrypts the received
data by calling the wolfSSL RSA decrypt operation. Note
that in this case we can observe the leakage by monitoring
accesses to one of the two arrays R[0] or R[1], because the
accesses to each of them depend on the key bit value (0 or 1).
During the execution of the multiply function, they are both
loaded into the memory, but at the end of the function a
copy operation is performed which only accesses the required
value. That is, an attacker can for example remove R[0] from
the cache before the execution of the copy operation and
check it afterwards. This operation takes around 70-80 cycles
in our system, which should be enough for the observation.
However, attacking this implementation is eased by the
reduce operation executed after the multiply operation. This
function only loads the correct value of R[y], where y is the
secret key bit value. The execution of the reduce operation lasts
about 2300 cycles in our test system. Note that this time even
allows the execution of a complete probe cycle, so we do not
need to be so precise evicting the data when targeting this
function.
Unlike in the attack against AES, where we
targeted the S-Box for both detection and eviction, it is not
possible to use the same content for detection and eviction in
this case. Namely, we have used both the multiply and the
reduce operations for the detection and later evict R[0]. Note
that while the functions are shared, R[0] and R[1] are not, so
the approach explained in method 1 is not possible. Also, the
attacker will have to profile the application to determine
the WAIT_TIME and the cache set in which
R[0] is loaded. The task of profiling is eased with the help
of the detection of the multiply function. In our subsequent
experiments we assume that the attacker already knows the set
where R[0] maps.
There are also some other differences with method 2. Since
we do not require such an accurate eviction, and loading the
data of a whole eviction set conflicting with R[0] in the
transactional region leads to false positives in detection,
loading the whole set is not required. However, some of its
elements can be loaded to reduce the
time it takes to evict R[0], considering that all of them can be
accessed when retrieving the actual information about the
access (equivalent to the inference step in Algorithm 2, line 10).
In this case, the number of key bits correctly guessed depends
on both the accuracy of the detection and the ability of the
attacker to remove the data from the cache during the execution
of the leaky parts of the code. The mean time between the
execution of two multiply operations is about 24000 cycles.
That time seems to be “constant” and it is enough for carrying
out the detection, the eviction and the retrieval of the data.
We collected information for the execution of 100 RSA
decryptions. Our attack correctly detected 96.8% of the
multiply operations, with 1.3% of false positives. For those
correctly detected operations, the information referring to
the access to R[0] featured a true positive rate of 91% and a
false negative rate of 87.2%, namely a precision of 87.6%.
Note that no further processing of the results was done.
With such processing, it would be possible to determine which
executions of the multiply operation were not detected. Since
we get quite exact timestamps from the aborts, trace alignment
becomes easier. The decision about the correct value of the
secret bits of the exponent can be made based on the
information retrieved from various traces.
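As an illustration of how information from various traces could be
combined, the following C sketch applies a simple per-operation
majority vote. This is an assumption made for illustration, not the
post-processing used in our experiments (none was applied). The
function name majority_vote, the encoding of the observations (1 for
an observed access to R[0], 0 for no access, -1 for an undetected
multiply operation) and the assumption that the traces are already
aligned are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical encoding of one observation per multiply operation. */
enum { OBS_MISSING = -1, OBS_NO_ACCESS = 0, OBS_ACCESS = 1 };

/*
 * obs holds n_traces rows of n_ops observations each (row-major),
 * assumed to be already aligned so that column `op` refers to the
 * same multiply operation in every trace.
 * out[op] receives 1 or 0 according to the majority of the available
 * observations, or OBS_MISSING when there is a tie or no data.
 */
static void majority_vote(const signed char *obs, size_t n_traces,
                          size_t n_ops, signed char *out)
{
    assert(obs != NULL && out != NULL && n_traces > 0);

    for (size_t op = 0; op < n_ops; op++) {
        size_t ones = 0, zeros = 0;

        for (size_t t = 0; t < n_traces; t++) {
            signed char o = obs[t * n_ops + op];
            if (o == OBS_ACCESS)
                ones++;
            else if (o == OBS_NO_ACCESS)
                zeros++;
            /* OBS_MISSING entries do not vote. */
        }

        if (ones > zeros)
            out[op] = OBS_ACCESS;
        else if (zeros > ones)
            out[op] = OBS_NO_ACCESS;
        else
            out[op] = OBS_MISSING;
    }
}
```

Operations left undecided by the vote would need additional traces or
a different decision rule.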
6 Countermeasures
The presented attack is feasible because the interval between
an access to a piece of data or a function that could leak
data and the following access gives the attacker the
possibility to observe such accesses and, as a consequence,
retrieve the secret information. In order to prevent this
leakage, these susceptible windows must be removed from the
source code, which means that the code should be redesigned.
To help developers find leakages in their code, there are tools
that detect them [65, 66].
In particular, the AES S-Box implementation for x86 could
make this attack harder just by including one last prefetch
after the encryption ends. Alternatively, each S-Box lookup
could access all four cache lines, eliminating cache line
leakage. Considering the wolfSSL RSA implementation, it should
at least load into the cache the leaky data in the two
vulnerable functions. Note, however, that the proposed
countermeasures are intended to prevent exploitation through
the LLC; an attacker in the more powerful SGX scenario may
still be able to retrieve some information.
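The second option for the S-Box can be sketched in C as follows. This
is a minimal illustration and not the OpenSSL code: the function name
sbox_lookup_all_lines is hypothetical, and the sketch assumes 64-byte
cache lines and a 256-byte table aligned to a cache line boundary.
Every lookup reads one byte from each of the four lines and selects
the wanted one with a mask, so the set of cache lines touched does
not depend on the index. The offset inside the line still depends on
the index, so only leakage at cache line granularity is addressed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64u   /* assumed cache line size in bytes */
#define SBOX_SIZE  256u  /* one byte per S-Box entry */

/*
 * Hypothetical helper: returns sbox[idx] while touching all
 * SBOX_SIZE / CACHE_LINE = 4 cache lines of the table.
 */
static uint8_t sbox_lookup_all_lines(const uint8_t *sbox, uint8_t idx)
{
    assert(sbox != NULL);

    /* volatile forces the compiler to perform all four reads. */
    const volatile uint8_t *p = sbox;
    const uint32_t line = idx / CACHE_LINE;  /* line holding the entry */
    const uint32_t off  = idx % CACHE_LINE;  /* offset inside the line */
    uint8_t result = 0;

    for (uint32_t l = 0; l < SBOX_SIZE / CACHE_LINE; l++) {
        /* diff is 0 only for the wanted line; (0 - 1) >> 8 truncated
         * to 8 bits is 0xFF, while (1..3 - 1) >> 8 is 0x00.
         * No secret dependent branch is taken. */
        uint32_t diff = l ^ line;
        uint8_t mask = (uint8_t)((diff - 1u) >> 8);

        result |= (uint8_t)(p[l * CACHE_LINE + off] & mask);
    }
    return result;
}
```

The price of this design is four loads per lookup instead of one.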
Countermeasures that prevent cache attacks by means of new
cache designs [43] or hardware modifications [48, 64] can be
effective against the presented attack. However, they are not
available yet. Similarly, techniques that allocate the victim
and the attacker data in different and mutually exclusive
cache sets [47] would prevent this attack.
Finally, detection based countermeasures that monitor the
execution of the algorithms they aim to protect and collect
information about execution times or from performance counters
(e.g. cache misses or accesses) to detect changes in the
execution trace could detect the attack [8, 11, 16]. Note that
CacheSniper was not designed to be stealthy and generates
cache misses on the victim algorithm. However, CacheSniper
can also improve the efficiency of existing attacks. Indeed,
we tested it against the T-table implementation, retrieving the
secret key in about 4-5 ms from just 300 encryptions. These
short times seriously limit the capability of the
aforementioned countermeasures to trigger the alarm in time.
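The timing argument can be made concrete with a small C sketch. It
is not taken from the cited detectors [8, 11, 16]; it assumes a
hypothetical detector that samples a cache miss counter at a fixed
period and raises the alarm after a given number of consecutive
samples above a threshold. The names first_alarm and alarm_in_time,
the threshold and the sampling period are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Returns the index of the sample at which the alarm fires, i.e. the
 * first sample that completes a run of `needed` consecutive samples
 * above `threshold`, or -1 if the alarm never fires.
 */
static long first_alarm(const uint32_t *misses, size_t n,
                        uint32_t threshold, size_t needed)
{
    assert(misses != NULL && needed > 0);

    size_t run = 0;
    for (size_t i = 0; i < n; i++) {
        run = (misses[i] > threshold) ? run + 1 : 0;
        if (run == needed)
            return (long)i;
    }
    return -1;
}

/*
 * The alarm is only useful if it fires before the attack finishes.
 * Sample i becomes available (i + 1) * period_ms after monitoring
 * starts; the attack is assumed to start at the same instant.
 */
static int alarm_in_time(long alarm_idx, double period_ms,
                         double attack_ms)
{
    if (alarm_idx < 0)
        return 0;
    return (double)(alarm_idx + 1) * period_ms <= attack_ms;
}
```

With such a detector, an attack that completes in 4-5 ms is only
caught if the sampling period and the number of required samples are
small enough, which in turn tends to increase overhead and false
alarms.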
7 Conclusions
We present CACHESNIPER, a new side channel attack consisting
of the steps aim, wait, and shoot. The aim step detects the
exact start of the victim operation. The attacker then waits
for a predetermined time. Last, in the shoot step, the target
element is evicted from the cache at precisely the right
moment. We show that CACHESNIPER can be used to launch last
level cache side channel attacks on implementations that are
generally regarded as protected, by leveraging tiny windows of
secret dependent accesses in the implementation.
Achieving such accurate evictions is possible thanks to the
ability of TSX to synchronize with the execution of the target
algorithms, the knowledge of the replacement policies imple-
mented in Intel processors and to the fact that the modifica-
tion of the state of the cache inside the transactional region
remains even if the transaction aborts and never commits.
We demonstrate this by retrieving the secret key from the
OpenSSL AES S-Box implementation and the secret bits of
the modular exponentiation implemented as part of the
wolfSSL RSA decryption algorithm, both of which were
regarded as secure against cache side channel attacks since
traditional cache attacks would not be able to retrieve such
information. CACHESNIPER can, however, be applied to a much
wider range of targets.
References
[1] A. Abel and J. Reineke. Measurement-based modeling
of the cache replacement policy. In 2013 IEEE 19th
Real-Time and Embedded Technology and Applications
Symposium (RTAS), pages 65–74, April 2013.
[2] A. Abel and J. Reineke. Reverse engineering of cache
replacement policies in intel microprocessors and their
evaluation. In 2014 IEEE International Symposium
on Performance Analysis of Systems and Software (IS-
PASS), pages 141–142, March 2014.
[3] Andreas Abel and Jan Reineke. uops.info: Charac-
terizing latency, throughput, and port usage of instruc-
tions on intel microarchitectures. In Proceedings of the
Twenty-Fourth International Conference on Architec-
tural Support for Programming Languages and Operat-
ing Systems, ASPLOS ’19, pages 673–686, New York,
NY, USA, 2019. ACM.
[4] Gorka Irazoqui Apecechea, Mehmet Sinan Inci,
Thomas Eisenbarth, and Berk Sunar. Wait a minute!
A fast, cross-vm attack on AES. In Research in Attacks,
Intrusions and Defenses - 17th International Sympo-
sium, RAID 2014, Gothenburg, Sweden, September 17-
19, 2014. Proceedings, pages 299–319, 2014.
[5] Jacob Barthelmeh. wolfssl (formerly cyassl) release
3.10.0. https://github.com/wolfSSL/wolfssl/
releases/tag/v3.10.0-stable, 2016.
[6] Atri Bhattacharyya, Alexandra Sandulescu, Matthias
Neugschwandtner, Alessandro Sorniotti, Babak Falsafi,
Mathias Payer, and Anil Kurmus. Smotherspectre:
Exploiting speculative execution through port contention.
In Proceedings of the 2019 ACM SIGSAC Conference
on Computer and Communications Security, CCS ’19,
pages 785–800, New York, NY, USA, 2019. Association
for Computing Machinery.
[7] Atri Bhattacharyya, Alexandra Sandulescu, Matthias
Neugschwandtner, Alessandro Sorniotti, Babak Falsafi,
Mathias Payer, and Anil Kurmus. Smotherspectre: ex-
ploiting speculative execution through port contention.
In Proceedings of the 2019 ACM SIGSAC Conference
on Computer and Communications Security, pages
785–800, 2019.
[8] Samira Briongos, Gorka Irazoqui, Pedro Malagón, and
Thomas Eisenbarth. Cacheshield: Detecting cache
attacks through self-observation. In Proceedings of the
Eighth ACM Conference on Data and Application Security
and Privacy, CODASPY ’18, pages 224–235, New
York, NY, USA, 2018. ACM.
[9] Samira Briongos, Pedro Malagón, Juan-Mariano
de Goyeneche, and Jose M. Moya. Cache misses and
the recovery of the full aes 256 key. Applied Sciences,
9(5), 2019.
[10] Samira Briongos, Pedro Malagon, Jose M. Moya, and
Thomas Eisenbarth. Reload+refresh: Abusing cache re-
placement policies to perform stealthy cache attacks. In
29th USENIX Security Symposium (USENIX Security
20), Boston, MA, August 2020. USENIX Association.
[11] Samira Briongos, Pedro Malagón, José L. Risco-
Martín, and José M. Moya. Modeling side-channel
cache attacks on aes. In Proceedings of the Summer
Computer Simulation Conference, SCSC ’16, pages
37:1–37:8, San Diego, CA, USA, 2016. Society for
Computer Simulation International.
[12] R L Brotzman, Shen Liu, Danfeng Zhang, Gang Tan,
and Mahmut T. Kandemir. Casym: Cache aware sym-
bolic execution for side channel detection and mitiga-
tion. 2019 IEEE Symposium on Security and Privacy
(SP), pages 505–521, 2018.
[13] Ashokkumar C., Bholanath Roy, Bhargav
Sri Venkatesh Mandarapu, and Bernard Menezes.
“s-box” implementation of aes is not side channel
resistant. Journal of Hardware and Systems Security,
4, 12 2019.
[14] Alejandro Cabrera Aldaya and Billy Bob Brumley.
When one vulnerable primitive turns viral: Novel
single-trace attacks on ecdsa and rsa. TCHES, 2020,
Mar. 2020.
[15] Claudio Canella, Daniel Genkin, Lukas Giner, Daniel
Gruss, Moritz Lipp, Marina Minkin, Daniel Moghimi,
Frank Piessens, Michael Schwarz, Berk Sunar,
Jo Van Bulck, and Yuval Yarom. Fallout: Leaking
data on meltdown-resistant cpus. In Proceedings
of the ACM SIGSAC Conference on Computer and
Communications Security (CCS). ACM, 2019.
[16] Marco Chiappetta, Erkay Savas, and Cemal Yilmaz.
Real time detection of cache-based side-channel attacks
using hardware performance counters. Applied Soft
Computing, 49:1162 – 1174, 2016.
[17] Shaanan Cohney, Andrew Kwong, Shachar Paz, Daniel
Genkin, Nadia Heninger, Eyal Ronen, and Yuval Yarom.
Pseudorandom black swans: Cache attacks on ctr_drbg.
Cryptology ePrint Archive, Report 2019/996, 2019.
https://eprint.iacr.org/2019/996.
[18] Joan Daemen and Vincent Rijmen. The Design of Rijn-
dael: AES - The Advanced Encryption Standard. Infor-
mation Security and Cryptography. Springer, 2002.
[19] Dave Dice, Tim Harris, Alex Kogan, and Yossi Lev.
The influence of malloc placement on TSX hardware
transactional memory. CoRR, abs/1504.04640, 2015.
[20] Craig Disselkoen, David Kohlbrenner, Leo Porter, and
Dean Tullsen. Prime+abort: A timer-free high-
precision l3 cache attack using intel TSX. In 26th
USENIX Security Symposium (USENIX Security 17),
pages 51–67, Vancouver, BC, 2017. USENIX Associ-
ation.
[21] Goran Doychev, Dominik Feld, Boris Kopf, Laurent
Mauborgne, and Jan Reineke. Cacheaudit: A tool for
the static analysis of cache side channels. In Pre-
sented as part of the 22nd USENIX Security Sympo-
sium (USENIX Security 13), pages 431–446, Washing-
ton, D.C., 2013. USENIX.
[22] Fangfei Liu, Yuval Yarom, Qian Ge, Gernot Heiser,
and Ruby B. Lee. Last level Cache Side
Channel Attacks are Practical. In Proceedings of the
2015 IEEE Symposium on Security and Privacy, SP ’15,
pages 605–622, Washington, DC, USA, 2015. IEEE
Computer Society.
[23] Qian Ge, Yuval Yarom, David Cock, and Gernot Heiser.
A survey of microarchitectural timing attacks and coun-
termeasures on contemporary hardware. Journal of
Cryptographic Engineering, 8(1):1–27, Apr 2018.
[24] Daniel Genkin, Lev Pachmanov, Eran Tromer, and Yu-
val Yarom. Drive-by key-extraction cache attacks from
portable code. In International Conference on Ap-
plied Cryptography and Network Security, pages 83–
102. Springer, 2018.
[25] Daniel M. Gordon. A survey of fast exponentiation
methods. J. Algorithms, 27(1):129–146, April 1998.
[26] Ben Gras, Cristiano Giuffrida, Michael Kurth, Herbert
Bos, and Kaveh Razavi. Absynthe: Automatic black-
box side-channel synthesis on commodity microarchi-
tectures. In Network and Distributed Systems Security
(NDSS) Symposium, 2020.
[27] Daniel Gruss, Julian Lettner, Felix Schuster, Olya
Ohrimenko, Istvan Haller, and Manuel Costa. Strong and
efficient cache side-channel protection using hardware
transactional memory. In 26th USENIX Security
Symposium (USENIX Security 17), pages 217–233,
Vancouver, BC, August 2017. USENIX Association.
[28] Daniel Gruss, Clémentine Maurice, and Stefan Man-
gard. Rowhammer.js: A remote software-induced fault
attack in javascript. In Detection of Intrusions and Mal-
ware, and Vulnerability Assessment, pages 300–321.
Springer, 2016.
[29] Daniel Gruss, Raphael Spreitzer, and Stefan Mangard.
Cache template attacks: Automating attacks on inclu-
sive last-level caches. In 24th USENIX Security Sympo-
sium (USENIX Security 15), pages 897–912, Washing-
ton, D.C., 2015. USENIX Association.
[30] David Gullasch, Endre Bangerter, and Stephan Krenn.
Cache Games – Bringing Access-Based Cache Attacks
on AES to Practice. In Proceedings of the 2011 IEEE
Symposium on Security and Privacy, SP ’11, pages
490–505, Washington, DC, USA, 2011. IEEE Com-
puter Society.
[31] Berk Gulmezoglu, Ahmad Moghimi, Thomas Eisen-
barth, and Berk Sunar. Fortuneteller: Predicting mi-
croarchitectural attacks via unsupervised deep learning,
2019.
[32] Wei-Ming Hu. Lattice scheduling and covert channels.
In IEEE Symposium on Research in Security and Pri-
vacy, pages 52–61. IEEE Computer Society, 1992.
[33] Mehmet Sinan Inci, Berk Gulmezoglu, Gorka Irazo-
qui, Thomas Eisenbarth, and Berk Sunar. Seriously,
get off my cloud! cross-vm rsa key recovery in a pub-
lic cloud. Technical report, IACR Cryptology ePrint
Archive, 2015.
[34] Mehmet Sinan İnci, Berk Gulmezoglu, Gorka Irazoqui,
Thomas Eisenbarth, and Berk Sunar. Cache Attacks
Enable Bulk Key Recovery on the Cloud. In Benedikt
Gierlichs and Axel Y. Poschmann, editors,
Cryptographic Hardware and Embedded Systems – CHES
2016: 18th International Conference, Santa Barbara,
CA, USA, August 17-19, 2016, Proceedings, 2016.
[35] Gorka Irazoqui, Kai Cong, Xiaofei Guo, Hareesh Khat-
tri, Arun K. Kanuparthi, Thomas Eisenbarth, and Berk
Sunar. Did we learn from LLC side channel attacks? A
cache leakage detection tool for crypto libraries. CoRR,
abs/1709.01552, 2017.
[36] Gorka Irazoqui, Thomas Eisenbarth, and Berk Sunar.
S$A: A Shared Cache Attack that Works Across Cores
and Defies VM Sandboxing and its Application to AES.
In 36th IEEE Symposium on Security and Privacy (S&P
2015), pages 591–604, 2015.
[37] Gorka Irazoqui, Thomas Eisenbarth, and Berk Sunar.
Mascat: Preventing microarchitectural attacks before
distribution. In Proceedings of the Eighth ACM Con-
ference on Data and Application Security and Privacy,
CODASPY ’18, pages 377–388, New York, NY, USA,
2018. ACM.
[38] Gorka Irazoqui, Mehmet Sinan Inci, Thomas Eisen-
barth, and Berk Sunar. Know thy neighbor: Crypto
library detection in cloud. PoPETs, 2015(1):25–40,
2015.
[39] Saad Islam, Ahmad Moghimi, Ida Bruhns, Moritz
Krebbel, Berk Gulmezoglu, Thomas Eisenbarth, and
Berk Sunar. SPOILER: Speculative load hazards boost
rowhammer and cache attacks. In 28th USENIX Secu-
rity Symposium (USENIX Security 19), pages 621–637,
Santa Clara, CA, August 2019. USENIX Association.
[40] Saad Islam, Ahmad Moghimi, Ida Bruhns, Moritz
Krebbel, Berk Gulmezoglu, Thomas Eisenbarth, and
Berk Sunar. SPOILER: Speculative load hazards
boost rowhammer and cache attacks. In 28th
USENIX Security Symposium (USENIX Security
19), pages 621–637, 2019.
[41] S. Jahagirdar, V. George, I. Sodhi, and R. Wells. Power
management of the third generation intel core micro
architecture formerly codenamed ivy bridge. In 2012
IEEE Hot Chips 24 Symposium (HCS), pages 1–49,
Aug 2012.
[42] John Kelsey, Bruce Schneier, David Wagner, and Chris
Hall. Side channel cryptanalysis of product ciphers.
Journal of Computer Security, 8(2/3):141–158, 2000.
[43] Taesoo Kim, Marcus Peinado, and Gloria Mainar-Ruiz.
Stealthmem: System-level protection against cache-
based side channel attacks in the cloud. In Pre-
sented as part of the 21st USENIX Security Symposium
(USENIX Security 12), pages 189–204, Bellevue, WA,
2012. USENIX.
[44] Paul Kocher, Jann Horn, Anders Fogh, Daniel Genkin,
Daniel Gruss, Werner Haas, Mike Hamburg, Moritz
Lipp, Stefan Mangard, Thomas Prescher, Michael
Schwarz, and Yuval Yarom. Spectre attacks: Exploit-
ing speculative execution. In 40th IEEE Symposium on
Security and Privacy (S&P’19), 2019.
[45] C.K. Koç. Analysis of sliding window techniques for
exponentiation. Computers & Mathematics with
Applications, 30(10):17–24, 1995.
[46] Moritz Lipp, Michael Schwarz, Daniel Gruss, Thomas
Prescher, Werner Haas, Anders Fogh, Jann Horn, Stefan
Mangard, Paul Kocher, Daniel Genkin, Yuval Yarom,
and Mike Hamburg. Meltdown: Reading kernel memory
from user space. In 27th USENIX Security Symposium
(USENIX Security 18), pages 973–990, Baltimore,
MD, 2018. USENIX Association.
[47] F. Liu, Q. Ge, Y. Yarom, F. Mckeen, C. Rozas, G. Heiser,
and R. B. Lee. Catalyst: Defeating last-level cache
side channel attacks in cloud computing. In 2016
IEEE International Symposium on High Performance
Computer Architecture (HPCA), pages 406–418, March
2016.
[48] Fangfei Liu and Ruby B. Lee. Random fill cache
architecture. In Proceedings of the 47th Annual
IEEE/ACM International Symposium on Microarchi-
tecture, MICRO-47, pages 203–215, Washington, DC,
USA, 2014. IEEE Computer Society.
[49] Clémentine Maurice, Nicolas Scouarnec, Christoph
Neumann, Olivier Heen, and Aurélien Francillon. Re-
verse engineering intel last-level cache complex ad-
dressing using performance counters. In Proceed-
ings of the 18th International Symposium on Research
in Attacks, Intrusions, and Defenses - Volume 9404,
RAID 2015, pages 48–65, New York, NY, USA, 2015.
Springer-Verlag New York, Inc.
[50] Ahmad Moghimi, Gorka Irazoqui, and Thomas Eisen-
barth. Cachezoom: How sgx amplifies the power
of cache attacks. In Wieland Fischer and Naofumi
Homma, editors, Cryptographic Hardware and Embed-
ded Systems – CHES 2017, pages 69–90, Cham, 2017.
Springer International Publishing.
[51] Daniel Moghimi, Moritz Lipp, Berk Sunar, and Michael
Schwarz. Medusa: Microarchitectural data leakage via
automated attack synthesis. In 29th USENIX Security
Symposium (USENIX Security 20), 2020.
[52] Daniel Moghimi, Jo Van Bulck, Nadia Heninger, Frank
Piessens, and Berk Sunar. Copycat: Controlled
instruction-level attacks on enclaves for maximal key
extraction. arXiv preprint arXiv:2002.08437, 2020.
[53] Dag Arne Osvik, Adi Shamir, and Eran Tromer. Cache
Attacks and Countermeasures: The Case of AES. In
Topics in Cryptology – CT-RSA 2006: The Cryptogra-
phers’ Track at the RSA Conference 2006, San Jose, CA,
USA, February 13-17, 2005. Proceedings, pages 1–20,
Berlin, Heidelberg, 2006. Springer Berlin Heidelberg.
[54] Todd Ouska. Commit message: switch tim-
ing resistant exptmod to use temp for square
instead of leaking key bit to cache monitor.
https://github.com/wolfSSL/wolfssl/commit/
6ef9e79ff5ccd2b96fdfed404ada872fd29514be,
2016.
[55] Andy Polyakov. Commit message: Agres-
sively prefetch s-box in sse codepatch, . . . .
https://github.com/openssl/openssl/commit/
fc92414273bc30deee51bf1c99abe4b5802f55fb#
diff-c0dcd6713547b63fc56ce9716bf52bd9 , 2006.
[56] Thomas Ristenpart, Eran Tromer, Hovav Shacham, and
Stefan Savage. Hey, you, get off of my cloud: exploring
information leakage in third-party compute clouds.
In ACM Conference on Computer and Communications
Security, CCS 2009, Chicago, Illinois, USA, November
9-13, 2009, pages 199–212, 2009.
[57] Michael Schwarz, Moritz Lipp, Daniel Moghimi,
Jo Van Bulck, Julian Stecklina, Thomas Prescher, and
Daniel Gruss. ZombieLoad: Cross-privilege-boundary
data sampling. In CCS, 2019.
[58] Jo Van Bulck, Marina Minkin, Ofir Weisse, Daniel
Genkin, Baris Kasikci, Frank Piessens, Mark Silberstein,
Thomas F. Wenisch, Yuval Yarom, and Raoul
Strackx. Foreshadow: Extracting the keys to the Intel
SGX kingdom with transient out-of-order execution. In
Proceedings of the 27th USENIX Security Symposium.
USENIX Association, August 2018.
[59] Jo Van Bulck, Daniel Moghimi, Michael Schwarz,
Moritz Lipp, Marina Minkin, Daniel Genkin, Yuval
Yarom, Berk Sunar, Daniel Gruss, and Frank Piessens.
LVI: Hijacking Transient Execution through
Microarchitectural Load Value Injection. In 41st IEEE
Symposium on Security and Privacy (S&P’20), 2020.
[60] Jo Van Bulck, Frank Piessens, and Raoul Strackx. Sgx-
step: A practical attack framework for precise enclave
execution control. In Proceedings of the 2nd Workshop
on System Software for Trusted Execution, pages 1–6,
2017.
[61] S. van Schaik, A. Milburn, S. Osterlund, P. Frigo,
G. Maisuradze, K. Razavi, H. Bos, and C. Giuffrida.
Ridl: Rogue in-flight data load. In 2019 IEEE
Symposium on Security and Privacy (SP), pages
88–105, Los Alamitos, CA, USA, May 2019. IEEE
Computer Society.
[62] Pepe Vila, Pierre Ganty, Marco Guarnieri, and Boris
Köpf. Cachequery: Learning replacement policies from
hardware caches. arXiv preprint arXiv:1912.09770,
2019.
[63] Pepe Vila, Boris Köpf, and José F. Morales. Theory and
practice of finding eviction sets. In 2019 IEEE Sympo-
sium on Security and Privacy, SP 2019, San Francisco,
CA, USA, May 19-23, 2019, pages 39–54. IEEE, 2019.
[64] Zhenghong Wang and Ruby B. Lee. New cache de-
signs for thwarting software cache-based side channel
attacks. In Proceedings of the 34th Annual Interna-
tional Symposium on Computer Architecture, ISCA ’07,
pages 494–505, New York, NY, USA, 2007. ACM.
[65] Samuel Weiser, Andreas Zankl, Raphael Spreitzer,
Katja Miller, Stefan Mangard, and Georg Sigl. DATA
– differential address trace analysis: Finding address-
based side-channels in binaries. In 27th USENIX Secu-
rity Symposium (USENIX Security 18), pages 603–620,
Baltimore, MD, August 2018. USENIX Association.
[66] Jan Wichelmann, Ahmad Moghimi, Thomas Eisen-
barth, and Berk Sunar. Microwalk: A framework for
finding side channels in binaries. In Proceedings of
the 34th Annual Computer Security Applications Con-
ference, ACSAC ’18, pages 161–173, New York, NY,
USA, 2018. ACM.
[67] Henry Wong. Intel Ivy Bridge cache replacement pol-
icy, jan 2013.
[68] Yuval Yarom and Naomi Benger. Recovering openssl
ecdsa nonces using the flush+reload cache side-channel
attack. IACR Cryptology ePrint Archive, 2014:140,
2014.
[69] Yuval Yarom and Katrina Falkner. FLUSH+RELOAD:
A High Resolution, Low Noise, L3 Cache Side-
Channel Attack. In 23rd USENIX Security Symposium
(USENIX Security 14), pages 719–732, 2014.
[70] Yuval Yarom, Daniel Genkin, and Nadia Heninger.
Cachebleed: a timing attack on openssl constant-time
rsa. Journal of Cryptographic Engineering, 7(2):99–
112, Jun 2017.
[71] Tianwei Zhang, Yinqian Zhang, and Ruby B. Lee.
CloudRadar: A Real-Time Side-Channel Attack Detec-
tion System in Clouds, pages 118–140. Springer Inter-
national Publishing, Cham, 2016.
