CacheShield: Protecting Legacy Processes Against Cache Attacks by Briongos, Samira et al.
CacheShield: Protecting Legacy Processes Against Cache Aacks
Samira Briongos
Universidad Politécnica de Madrid
samirabriongos@die.upm.es
Gorka Irazoqui
Worcester Polytechnic Institute
girazoki@wpi.edu
Pedro Malagón
Universidad Politécnica de Madrid
malagon@die.upm.es
Thomas Eisenbarth
Worcester Polytechnic Institute
teisenbarth@wpi.edu
ABSTRACT
Cache attacks pose a threat to any code whose execution flow or
memory accesses depend on sensitive information. Especially in
public clouds, where caches are shared across several tenants, cache
attacks remain an unsolved problem. Cache attacks rely on evictions
by the spy process, which alter the execution behavior of the victim
process. We show that hardware performance events of crypto-
graphic routines reveal the presence of cache attacks. Based on this
observation, we propose CacheShield, a tool to protect legacy code
by monitoring its execution and detecting the presence of cache at-
tacks, thus providing the opportunity to take preventative measures.
CacheShield can be run by users and does not require alteration of
the OS or hypervisor, while previously proposed software-based
countermeasures require cooperation from the hypervisor. Unlike
methods that try to detect malicious processes, our approach is
lean, as only a fraction of the system needs to be monitored. It also
integrates well into today’s cloud infrastructure, as concerned users
can opt to use CacheShield without support from the cloud service
provider. Our results show that CacheShield detects cache attacks
fast, with high reliability, and with few false positives, even in the
presence of strong noise.
KEYWORDS
Cache attacks, Hardware performance counters, Change point detection
1 INTRODUCTION
Modern computing technologies like cloud computing build on shared hardware resources opaquely accessible by independent tenants, ensuring protection through sandboxing techniques. However, although this isolation is solid at the logical level, ensuring tenants cannot access each other's memory, hypervisors cannot properly prevent information leakage stemming from shared hardware resources such as caches. Compared to other resources like Branch Prediction Units or the DRAM, caches can be exploited to recover fine-grained information from co-resident tenants in shared environments.
Cache attacks extract private information by setting up the cache memory, executing the victim process and observing effects related to sensitive data. Time-based cache attacks measure the effect on the victim process's execution time [5], while access-based cache attacks measure the effect on the attacker's own memory accesses [32]. Practical access-based cache attacks have been published for cloud environments in different variants: Prime+Probe [9, 17, 34, 47, 48], Flush+Reload [2, 13, 14, 43], and Flush+Flush [11]. All of them have been demonstrated to recover cryptographic keys, break security protocols or infer privacy-related information. They have shown that attacks can succeed in contemporary public cloud systems, with severe consequences for the sensitive data of cloud customers.
To deter cache attacks, several detection and/or mitigation techniques have been proposed. Most of the proposed mitigation techniques succeed in stopping cache-based attacks, but are not being adopted by cloud service providers. Proposed hardware countermeasures require modifications to the hardware that not only induce severe performance penalties but also take years to integrate and deploy into the infrastructure. Cloud hypervisors, on the other hand, could implement any of the proposed hypervisor-based countermeasures [19, 21, 39] by just making small modifications to the kernel configuration. Despite the immediate fix that these countermeasures would provide, they are not being adopted by cloud providers, mainly due to the constant performance overhead that they add to their systems. Other feasible mitigation proposals consider periodic VM migration to avoid long-term co-location. VM migration, however, also introduces extra overhead whether there is an attack or not. Other proposals suggest that, just as the attacker uses a side channel to obtain information, the VM can defend itself by using a side channel to detect co-resident tenants with possibly malicious intentions [46].
The current situation leaves tenants with little help from hardware and hypervisor designers or cloud service providers to protect themselves against cache attacks. Thus, tenants who voluntarily want to protect themselves against cache attacks need tools to do so. So far, all known cache attacks have in common that they cause cache misses in the victim process. Thus, detecting an anomaly in the number of cache misses in the victim can indicate an ongoing cache attack and trigger VM migration or other actions to mitigate the attack.
Cache misses can be obtained by reading the hardware performance counters found in all modern processors. These hardware event counters track hardware events such as cache misses, and were originally intended to enable the detection of bottlenecks in executed software. Optimization is not the only application of these counters: it has also been demonstrated that hardware performance counters are useful to detect malware and security breaches [3, 8, 36, 40]. Libraries such as PAPI (Performance Application Programming Interface) facilitate the task of configuring and reading those hardware counters.
There have been several attempts to detect cache attacks using
the hardware counters [7, 31, 45], but they have strong drawbacks.
arXiv:1709.01795v1 [cs.CR] 6 Sep 2017
Some works require the hypervisor to periodically monitor all existing processes, which introduces a large CPU overhead, and their effectiveness depends on how efficiently an attacker can hide from the monitoring tool [31, 45]. Other works offer solutions applicable to multi-process environments, but not feasible in cloud environments [7, 11].
We propose to use a monitoring service inside the VM that detects anomalies in the cache miss hardware performance counter only on the victim side. The monitoring service can be activated on demand inside the VM. For this to be feasible, the performance counters must be exposed to the VM. By simply changing the configuration of the hypervisor, it is possible to enable performance counter access inside the VM. This access can be enabled only in the VMs that request the service, and as the hypervisor is responsible for the virtualization of the counters that can be read inside a VM, they refer uniquely to this VM. That is, one VM cannot read counters referring to another VM, even when they share the hardware. Currently, most cloud service providers only expose them if the customer is renting the whole machine, probably due to fears that they could be used as a side channel on hardware shared by several tenants. However, we believe there is little cause for concern, as current attack techniques exploiting shared hardware expose much more information than hardware counters would.
Our work demonstrates for the rst time, that performance
counter access for tenant VMs can indeed be utilized to improve
security of the tenants. We oer tenant VMs a new monitoring
service, CacheShield, to detect cache attacks. CacheShield can be
activated before running sensitive processes. CacheShield detects
attacks quickly and with high reliability and low CPU overhead,
due to the use of Page’s cumulative sum method [30]. The CUSUM
method is an unsupervised anomaly detection method, ensuring
that even new attack techniques can be detected with high con-
dence. CacheShield automatically turns o if the monitored process
is idle by detecting the lack of activity, resulting in a signicant
reduction in CPU processing overheads. In summary, our work
• presents a performance counter based monitoring service that
users can voluntarily activate to detect when they are under
attack.
• only monitors the victim process when it is active, i.e., the cloud service provider does not waste cycles continuously monitoring all processes.
• only requires the hypervisor to enable VM access to the performance counters, a feature supported by all major hypervisor systems, including KVM, VMware and Xen. No other help from the underlying system is needed.
• implements an efficient algorithm that maximizes fast and reliable attack detection, while minimizing false positives and keeping the performance overhead minimal and restricted to the victim VM.
• succeeds in detecting all existing cache attacks, including stealthy attacks that are missed by other solutions, e.g., Flush+Flush, since our detection uses attack characteristics that are independent of attack and victim behavior.
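To make the CUSUM idea concrete, the following minimal sketch applies the textbook one-sided (upper) form of Page's test to a stream of per-interval LLC-miss counts. It is an illustration under stated assumptions, not CacheShield's actual formulation: the reference level `mu0`, the slack `k`, the threshold `h` and the function name `cusum_first_alarm` are hypothetical choices that would have to be calibrated for a real deployment.

```python
from typing import Iterable, Optional


def cusum_first_alarm(samples: Iterable[float], mu0: float,
                      k: float, h: float) -> Optional[int]:
    """One-sided (upper) Page CUSUM over per-interval LLC-miss counts.

    mu0: expected miss count per interval without an attack (hypothetical,
         to be calibrated; after warm-up it is close to zero).
    k:   slack added to mu0 so that small fluctuations are absorbed.
    h:   alarm threshold on the cumulative statistic.
    Returns the index of the first sample whose statistic exceeds h,
    or None if no alarm is raised.
    """
    s = 0.0
    for t, x in enumerate(samples):
        # Accumulate only the excess over mu0 + k; never drop below zero.
        s = max(0.0, s + x - (mu0 + k))
        if s > h:
            return t
    return None


# Usage with made-up traces (not measured data):
attacked = [0, 0, 1, 0, 0, 0, 4, 3, 5, 4]
quiet = [0, 1, 0, 0, 2, 0, 0, 1, 0, 0]
print(cusum_first_alarm(attacked, mu0=0.0, k=0.5, h=5.0))  # -> 7
print(cusum_first_alarm(quiet, mu0=0.0, k=0.5, h=5.0))     # -> None
```

Because the statistic resets to zero whenever misses stay below `mu0 + k`, isolated misses in a quiet trace never accumulate, while a sustained stream of forced misses crosses `h` after a few intervals.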
The rest of the paper is organized as follows. After discussing background and related work in Section 2, we show in Section 3 that monitoring performance events of a victim process is sufficient for reliable attack detection. CacheShield is developed in Section 4. Section 5 presents the performance evaluation in several relevant scenarios, and in Section 6 we suggest different countermeasures. Finally, Section 7 discusses the conclusions of our work.
2 BACKGROUND AND RELATED WORK
2.1 Cache attacks
In recent years, cache attacks have been shown to pose a serious threat to systems in which the underlying hardware architecture is shared with a potential attacker. Cache attacks monitor the utilization of the cache to retrieve information about a co-resident victim. Indeed, if the utilization of such a hardware component is directly correlated with a security-critical piece of information (e.g., a cryptographic key), the consequences of the attack can be as devastating as an impersonation of the victim.
Two main cache attack designs stand out: the Flush+Reload and the Prime+Probe attacks. The former was introduced in [13], and was later extended to target the LLC to retrieve cryptographic keys, TLS protocol session messages or keystrokes across VMs [12, 18, 43]. Further, Zhang et al. [48] showed that Flush+Reload is applicable in several commercial PaaS clouds. Despite its popularity and resistance to microarchitectural noise, Flush+Reload has a major drawback: it can only be applied in systems in which memory deduplication mechanisms are in place, and, further, it can only recover information from statically allocated data.
The Prime+Probe attack design, unlike Flush+Reload, is agnostic to special OS features in the system; therefore, it can be applied in virtually every system, and it can additionally recover information from dynamically allocated data. This attack was first proposed for the L1 data cache in [29], and later expanded to the L1 instruction cache [1]. More recently, it has been shown to overcome several difficulties in targeting the LLC and to recover cryptographic keys or typed keystrokes [9, 17, 22]. Even further, the Prime+Probe attack was used to retrieve an RSA key in the Amazon EC2 cloud [16].
Variations of both attacks have also been proposed to bypass specific difficulties found in some systems (e.g., the lack of a flush instruction in the Instruction Set Architecture). Perhaps the one that most directly influences this work is the Flush+Flush attack, as it was designed to be stealthy and bypass attack monitoring systems [11]. This attack retrieves information by measuring the execution time of the flush instruction, thus avoiding direct cache accesses. As we will see, although this design might be effective against some of the proposed detection systems, ours correctly identifies when such an attack is being executed.
2.2 Performance counters
Performance counters are special-purpose hardware registers that count a broad spectrum of low-level hardware events related to code execution. The selection of observable events is usually larger than the number of actual counters; hence, counters must be configured in advance. The events assigned to the counters are then recorded in parallel. As the performance monitoring unit (PMU) provides detailed insight into the state of the processor in real time, it is a valuable tool for debugging applications and their performance. The list of available events consequently focuses on waiting periods (e.g., clock cycles during which the processor is stalled), memory or bus accesses (e.g., cache misses or DRAM requests), and other performance-critical metrics such as branch prediction or TLB events.
All main microprocessor architectures, i.e., Intel, AMD and ARM, include a larger or smaller number of these configurable registers. However, while monitoring these hardware events on Intel and AMD processors is usually possible from user mode (for an application that itself runs in user mode), ARM devices require root privileges to enable them. Following the approach taken on ARM devices, cloud providers might disable the use of performance counters from guest VMs. Indeed, we find two main reasons why they would do this:
• Performance counters might be utilized for malicious purposes, similarly to the way the thermal sensor was used in [25], to retrieve information about the hardware utilization of co-resident users [6], which in theory should not be possible, as the hypervisor only gives each VM information about itself.
• As performance counters are hardware dependent, granting guest VMs benign access to performance counters might be problematic if guest VMs are migrated across different architectures, as customers would have to design code for different hardware architectures.
We do not believe that these facts should make cloud providers disable the usage of performance counters from guest VMs, especially when they can be used as a protection mechanism, as we will see later in this work. In fact, attackers have already found alternative ways to retrieve the same information that performance counters provide. For instance, attackers can read the cycle counter or use a counting thread (a thread that continuously increments a shared variable) to know when TLB or cache misses occur. Thus, disabling the counters does not entirely prevent the leakage of hardware event information. As for the second concern, a possible solution could be to create clusters with the same hardware configuration and migrate VMs within these clusters. Thus, we do not believe the above concerns are strong enough arguments against guest VM performance counter usage. In this paper, we will further show that such usage can indeed offer more protection to cloud infrastructure customers.
2.3 Detection, mitigation and other countermeasures
Hardware performance counters (HPCs) have been used to detect generic malware [3, 35, 36] as well as microarchitectural attacks [7, 11, 31, 45]. Their success mostly depends on the ability to correctly identify cache (and other resource) attack patterns by monitoring the associated events in the HPCs. This approach is usually implemented at the OS or hypervisor level, which has enough permissions to monitor what is running in the system. However, we observe two main problems with these detection-based approaches:
• Most of these detection approaches incur severe performance overheads that hypervisors or OSs do not seem willing to pay, as, to the best of our knowledge, no OS implements such a mechanism. This leaves the user of the system with few means of knowing whether her code will be executed in a safe environment.
• As these detection countermeasures base their success on monitoring both the victim and the attacker processes, the attacker can cleverly vary its patterns to try to bypass the detection mechanisms.
These problems can be observed, for instance, in [7, 31, 45]. All three works incur significant overheads on all applications. CloudRadar, for example, requires three dedicated cores for its detection [45]. In addition, they usually assume the ability to monitor the attacking process [7, 11], which is not possible across VM boundaries (except for the hypervisor), and usually not even possible for user-level processes.
Detection-based countermeasures are not the only approach shown to prevent cache timing attacks. Preemptive approaches can be taken at the hardware, software and application levels. The first usually requires changes in the hardware such that collisions in the cache cannot happen, or, if they do, they do not carry information [41]. The second involves the use of specific software features (e.g., page allocation) to prevent two processes from colliding in the cache [19]. Finally, the third is achieved by using specific tools to ensure that a security-sensitive binary does not leak information, even if it is under attack [44].
3 VICTIM-BASED ATTACK DETECTION
Our objective is to build an attack detection tool that detects any abuse of the LLC without any modifications to the hypervisor, OS, or CPU hardware. Unlike previous approaches, we show that monitoring the behavior of a victim application is sufficient for the detection of cache attacks. To that end, we first analyze the behavior of these victim applications by monitoring several critical hardware performance counters. The behavior of these critical applications is analyzed in the presence and absence of various cache attacks, and further analysis is performed to determine how well each counter serves as an indicator of ongoing attacks.
For the sake of simplicity, we base our analysis on cryptographic algorithms, which are the most popular targets of cache attacks. Our approach can also detect attacks on other security-critical pieces of code like SSL/TLS protocol stacks. There are different types of cryptographic algorithms in use, which have traditionally been classified as symmetric cryptography and public-key cryptography.
Symmetric cryptosystems are sometimes also called private key algorithms, and include algorithms for encryption, authentication as well as hashing. Encryption and authentication schemes use a single key for both encryption/authentication and decryption/verification. Popular algorithms include AES and DES for encryption, SHA-2 and SHA-3 for hashing, and HMAC or GCM for authentication or authenticated encryption. Symmetric primitives are usually heavily optimized for performance and feature constant execution flows. However, some implementations make use of table look-ups, which often result in exploitable cache leakage. One example is AES, the most widely used encryption algorithm. For AES, table look-ups are difficult to avoid unless hardware support such as AES-NI is available.
Public key cryptosystems use a public key for encryption or verification and a private key for decryption or signing. While public key cryptography can be used in more flexible ways, its primitives are much more costly than those of symmetric cryptography. As a result, public key cryptography is mainly used for authentication and key exchange to establish a communication session, where payloads are protected using symmetric cryptography. Certificates are another important service; they require digital signatures for their generation and verification. RSA, ECC and ElGamal are currently the prevailing schemes for public key cryptography.
As explained before, we only collect information about the victim processes, i.e., the processes operating on sensitive data. This approach avoids the need to monitor other processes or VMs running on the same host. It also avoids relying on information gathered from an attacker, who might try to hide changes in its behavior to avoid triggering an alarm. Considering that each kind of algorithm presents different characteristics, we gather and analyze data from the execution of different algorithms in an initial scenario. Next, we show that the main results obtained in this scenario can be extended to others.
3.1 Analyzing Hardware Performance Events
Modern server CPUs make a large number of hardware performance counters available, but only a limited number, typically 4 to 8, can be monitored in parallel. We use the Performance API (PAPI) [27] to access the performance counters. PAPI provides sufficient resolution to detect attacks while also simplifying the task of collecting performance data. In this preliminary step, we collect data from 30 accessible hardware event counters on our test platform, for sample public and private key algorithms, in the presence and absence of cache attacks. The PAPI interface provides functions that allow us to read the counters for our process before and after each cryptographic operation; that is, we get detailed information about the variation of the counters for a single encryption or decryption. Since the number of counters that can be read at the same time is limited, we collect the data for different groups of counters at different times. We then join the data and compute the statistics. Once we have all this data, we carry out a further study to determine and quantify which counters provide meaningful information to detect the attacks.
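The per-operation statistics described above reduce to simple arithmetic on counter readings. The sketch below assumes that the before/after values of a single counter (e.g., obtained from PAPI reads around each call) have already been collected as pairs; the helper names `per_call_deltas` and `mean_shift` are hypothetical, and the toy readings are invented. The three returned values correspond to the µn, µa and µa − µn columns of Table 1.

```python
from statistics import mean
from typing import List, Sequence, Tuple

Reading = Tuple[int, int]  # (counter value before call, counter value after call)


def per_call_deltas(readings: Sequence[Reading]) -> List[int]:
    """Counter increment attributable to each individual call."""
    return [after - before for before, after in readings]


def mean_shift(normal: Sequence[Reading],
               attacked: Sequence[Reading]) -> Tuple[float, float, float]:
    """Return (mu_n, mu_a, mu_a - mu_n) for one counter."""
    mu_n = mean(per_call_deltas(normal))
    mu_a = mean(per_call_deltas(attacked))
    return mu_n, mu_a, mu_a - mu_n


# Toy readings (not measured data): a cumulative counter sampled around 4 calls.
normal = [(10, 10), (10, 11), (11, 11), (11, 11)]
attacked = [(0, 3), (3, 7), (7, 10), (10, 14)]
print(mean_shift(normal, attacked))  # -> (0.25, 3.5, 3.25)
```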
As sample victim algorithms for this analysis, we chose the software AES T-table implementation and the RSA sliding window implementation (with flag RSA_FLAG_NO_CONSTTIME set) of OpenSSL 1.0.1f, which give representative results for symmetric key and public key cryptography, respectively. As the sample attack, we use Flush+Reload against both implementations. Flush+Reload tries to gain information from the execution of certain instructions or from accesses to certain data that depend on the key. For the version of RSA used, attacks target the instructions (depending on the implementation, RSA can also be attacked through its data accesses), while the AES attack is an example of a cache attack focused on data.
Our experiments are performed on an Intel Core i7-4790 CPU at 3.60 GHz with 8 MB of L3 cache and 8 GB of RAM, running CentOS 7. For each counter, we collect samples for 1 million encryption or decryption operations. One noteworthy observation is that, whereas for AES the values of the counters do not seem to depend on the key, for the analyzed vulnerable RSA implementation some of the parameters do depend on the value of the key. This behavior can be noticed, for example, in the number of instructions executed and in the decryption times. In fact, the number of operations performed depends on the distribution of zeros and ones in the key. However, while the number of instructions is not affected by the attacks, the decryption times are, as they include the extra time spent on cache misses.
We can select up to 5 or 6 counters that are representative of the attacks, as this is the maximum number of counters readable in parallel on our platform. The number of counters that can be read at the same time also varies depending on which counters are used and how they are combined. In order to decide which counters carry more information about the attacks, we use the WEKA tool [15]. This tool was designed to give researchers easy access to state-of-the-art machine learning techniques. WEKA implements several algorithms to perform attribute selection. As inputs for the tool, we select a subset of all the samples (otherwise, the time it takes to perform the selection increases exponentially). We randomly select 50000 instances from each of the groups, that is, for AES attack and non-attack and for RSA attack and non-attack, obtaining 200000 samples with information about 30 counters, each labeled with '1' for attacks and '0' for non-attacks.
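The balanced sampling step can be sketched as follows, assuming each group (e.g., AES attack, AES non-attack, RSA attack, RSA non-attack) is held as a list of counter vectors with a fixed label. The function name `balanced_dataset` and the fixed seed are illustrative choices, not the authors' code.

```python
import random
from typing import Dict, List, Sequence, Tuple

Sample = Sequence[float]  # one vector of counter values


def balanced_dataset(groups: Dict[str, Tuple[Sequence[Sample], int]],
                     per_group: int, seed: int = 0) -> List[Tuple[Sample, int]]:
    """Draw per_group samples (without replacement) from every group.

    groups maps a group name to (samples, label), where label is 1 for
    attack and 0 for non-attack. The combined result is shuffled.
    """
    rng = random.Random(seed)  # fixed seed only for reproducibility
    data: List[Tuple[Sample, int]] = []
    for name in sorted(groups):  # sorted for a deterministic draw order
        samples, label = groups[name]
        for row in rng.sample(list(samples), per_group):
            data.append((row, label))
    rng.shuffle(data)
    return data
```

With `per_group=50000` and the four groups listed above, this yields 200000 labeled rows, half of them labeled as attacks.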
We rst use the infoGain function, which evaluates the worth
of an attribute by measuring the information gain with respect to
the class according with “InfoGain(Class,Attribute) = H(Class) -
H(Class | Attribute)”, where H is the entropy. Note that our experi-
ments are balanced between attack and non-attack "classes”, that is
H(Class)= 1, thus an ideal attribute would gain 1 bit. Values around
0.5 may indicate the attribute carries meaningful information, but
only for one of the algorithms or one of the attacks. Thus, L3 cache
misses are not only the most meaningful predictions, but also work
across the considered scenarios.
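The InfoGain formula above can be checked with a small sketch. It assumes the attribute has already been discretized (e.g., binned counter readings), since the formula is stated over discrete values; it is a plain implementation of the definition, not WEKA's code, and the function names are our own.

```python
from collections import Counter
from math import log2
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy H in bits of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def info_gain(labels: Sequence[Hashable], attribute: Sequence[Hashable]) -> float:
    """InfoGain(Class, Attribute) = H(Class) - H(Class | Attribute).

    attribute must hold discrete values (e.g., binned counter readings).
    """
    n = len(labels)
    h_cond = 0.0
    for value, count in Counter(attribute).items():
        subset = [y for y, a in zip(labels, attribute) if a == value]
        h_cond += (count / n) * entropy(subset)
    return entropy(labels) - h_cond


labels = [0, 0, 1, 1]  # balanced classes, so H(Class) = 1 bit
print(info_gain(labels, ["low", "low", "high", "high"]))  # -> 1.0 (ideal)
print(info_gain(labels, ["a", "b", "a", "b"]))            # -> 0.0 (useless)
```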
We have also evaluated the relief algorithm [20] for feature selection. Unlike InfoGain, which only evaluates the information gained from each attribute individually, the relief algorithm outputs a score of the predictive value of an attribute relative to other attributes. More positive weights indicate more predictive attributes. To calculate the weight of an attribute, it iteratively first identifies the nearest neighbors from the same and different classes. Then, the weight increases if a change in the attribute leads to a change in the class and decreases when a change in the attribute value has no effect on the class.
Table 1 presents a summary of the counters that give the most relevant information for detection according to the selection algorithms, together with their mean values for the considered scenarios and the differences between the attack and normal cases. Both tests indicate that L3 cache misses are the most meaningful. In fact, the relief algorithm gives all other attributes very low scores, implying only little additional gain from using them.
3.2 Concurrent Signal Assessment
Tracking hardware performance events for each cryptographic operation showed that victim-based attack detection is feasible and helped identify relevant counters. However, achieving fast detection with this approach would require adding instructions in the middle of the code we want to protect. Hence, it requires alteration of the target code, which adds an unnecessary burden on the user and diminishes practicality. Also, for more effective attack detection, it is preferable to read performance counters concurrently with the execution of the sensitive process. This way, even attacks that succeed during the execution of a single call to the sensitive function, e.g., the attacks presented in [42, 43], can be detected and prevented in time.

Table 1: Overview of the most relevant hardware performance counters in the presence and absence of attacks, over 1 million calls to RSA and AES, as well as their rankings according to the InfoGain and relief metrics. Level 3 cache misses, PAPI_L3_TCM, clearly carry the strongest information about cache attacks. µn: mean counter value per call in normal execution; µa1 and µa2: mean under attack with 1 and 4 lines flushed, respectively (AES); µa: mean under attack (RSA). The last two columns are the joint evaluation over both algorithms.

Counter | AES µn | AES µa1 (1 line) | µa1 − µn | AES µa2 (4 lines) | µa2 − µn | RSA µn | RSA µa | µa − µn | InfoGain | Relief
PAPI_L3_TCM | 0.0002 | 0.92 | 0.9189 | 3.56 | 3.5598 | 1.12 | 2601.4 | 2600.28 | 0.885 | 0.245
Cycles (rdtsc) | 612.33 | 828.60 | 216.27 | 1151.71 | 539.38 | 8.840e+07 | 8.956e+07 | 1.151e+06 | 0.714 | 0.014
PAPI_REF_CYC | 61.93 | 71.01 | 9.08 | 79.93 | 18 | 2.453e+06 | 2.484e+06 | 3.1e+04 | 0.683 | 0.005
PAPI_CA_SNP | 21.95 | 28.77 | 6.82 | 30.35 | 8.4 | 727.87 | 3417.5 | 2689.63 | 0.531 | 0.034
PAPI_CA_INV | 21.99 | 28.83 | 6.86 | 30.45 | 8.46 | 727.88 | 3417.6 | 2689.72 | 0.530 | 0.033
PAPI_L3_TCR | 24.65 | 24.73 | 0.08 | 25.90 | 1.25 | 490.47 | 3253.6 | 2763.13 | 0.528 | 0.029
PAPI_L2_TCM | 28.31 | 28.51 | 0.2 | 28.42 | 0.09 | 559.27 | 3325.6 | 2766.33 | 0.513 | 0.028
PAPI_L2_ICM | 15.51 | 9.16 | -6.35 | 11.86 | -3.65 | 381.12 | 3149.2 | 2768.08 | 0.510 | 0.056

[Figure 1 plot: mean LLC misses (y-axis, 0 to 40) versus time (x-axis); series: aes no attack, aes 1 flush, aes 2 flush, aes 3 flush, aes 4 flush, rsa no attack, rsa attack.]
Figure 1: Mean LLC miss traces over time for AES and RSA executions in the presence and absence of cache attacks. The numbers next to flush indicate the number of lines flushed at a time. After the start-up peaks, the misses go to zero in the absence of a cache attack, while under attack they remain high.
During our initial experiments, we observed that all implementations feature a start-up behavior, in which data is loaded into the cache for the first time and the frequency of the CPU might be adjusted. Subsequent executions feature a more constant behavior. Regarding the analyzed counters, for AES these start-up executions show indistinguishable behavior with and without an attack. For RSA, they can be distinguished, but it would be necessary to know exactly whether the current sample belongs to the start-up group of normal executions. However, if we switch to continuous monitoring, the differences between algorithms disappear and the start-up behavior is restricted to a short time at the beginning of the processes. Figure 1 represents the mean value of the L3 miss counter in our initial scenario setup, for Flush+Reload attacks as well as for normal execution of the mentioned encryption processes. The average is computed over 1000 encryptions and counters are read every 100 µs. It can be clearly observed that after the initial transient state, the number of misses goes to zero in the absence of attacks (aes no attack and rsa no attack) for both crypto primitives. It can also be observed that the mean number of misses in the case of an attack varies with the number of lines flushed each time (aes 1 flush, aes 2 flush, ...). Thus, with concurrent monitoring, both algorithms behave similarly for normal executions.
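The averaging behind Figure 1 and the observation that misses drop to zero after the transient can be sketched as follows. The traces are invented for illustration, and the warm-up length is an assumed parameter: the text identifies the transient only qualitatively, so a real tool would have to choose it.

```python
from statistics import mean
from typing import List, Sequence


def mean_trace(traces: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of equally long miss traces from repeated runs."""
    return [mean(column) for column in zip(*traces)]


def steady_state_mean(trace: Sequence[float], warmup: int) -> float:
    """Mean of a trace after discarding the first `warmup` samples."""
    return mean(trace[warmup:])


# Made-up traces of two runs each; the first two samples are the start-up peak.
normal_runs = [[30, 10, 0, 0], [34, 6, 0, 0]]
attack_runs = [[30, 12, 9, 11], [34, 8, 11, 9]]
print(mean_trace(normal_runs))                        # -> [32, 8, 0, 0]
print(steady_state_mean(mean_trace(normal_runs), 2))  # -> 0
print(steady_state_mean(mean_trace(attack_runs), 2))  # -> 10
```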
Switching to continuous monitoring of the counters implies that information on total encryption times or reference cycles is no longer available or useful. To ensure that the information from the other counters mentioned in Table 1 is still optimal for attack detection, we performed a new analysis considering each sample collected at a period of 1 ms as an independent input to the attribute selection algorithms. The results show that for the LLC miss counter the InfoGain increases up to 0.92, while the values for the other counters decrease. Additionally, the relief algorithm still gives the best score to L3 cache misses (0.18), and in this scenario this value is still 5 times larger than the weight of the next counter, indicating that PAPI_L3_TCM is still the counter of choice for cache attack detection.
We performed additional experiments to determine how well a clustering algorithm would distinguish between attacks and non-attacks with the periodically sampled data from several counters at once. WEKA also includes clustering algorithms. We tested EM and Self-Organizing Maps, setting the number of clusters to two. The most interesting result of these experiments is that while these algorithms were able to place 84% and 91% of the attack samples, respectively, in the same cluster when using only the LLC miss counter, this number decreases to around 50-60% when adding other counters. These results indicate that cache attacks can be detected, regardless of the algorithm the victim process runs, using only the information gathered from the L3 cache miss counter. The algorithms feature zero misses after the initial warm-up, unless an attacker is forcing misses. Additionally, as all known cache attacks, including Flush+Flush, cause cache misses in the victim process to obtain information, the results obtained here for the Flush+Reload attack are applicable to other attacks. Thus, we decided to only use this one attribute, as it provides the most information and also allows us to keep the detection tool simple.
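Clustering on the single LLC-miss attribute can be illustrated with a deliberately simpler technique than the EM and Self-Organizing Maps used above: plain two-means (Lloyd's algorithm with k = 2) on scalar samples. This is a swapped-in illustration, not a reproduction of the WEKA experiments; the function name and the min/max initialization are our own choices.

```python
from typing import List, Sequence, Tuple


def two_means_1d(values: Sequence[float],
                 max_iter: int = 100) -> Tuple[List[int], Tuple[float, float]]:
    """Lloyd's algorithm with k=2 on scalar data.

    Centroids start at the minimum and maximum value, so label 1 marks the
    higher-miss cluster (the candidate attack samples).
    """
    c0, c1 = float(min(values)), float(max(values))
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: nearest centroid (ties go to the low cluster).
        labels = [0 if abs(v - c0) <= abs(v - c1) else 1 for v in values]
        group0 = [v for v, lab in zip(values, labels) if lab == 0]
        group1 = [v for v, lab in zip(values, labels) if lab == 1]
        # Update step: move each centroid to the mean of its members.
        new_c0 = sum(group0) / len(group0) if group0 else c0
        new_c1 = sum(group1) / len(group1) if group1 else c1
        if (new_c0, new_c1) == (c0, c1):
            break  # converged
        c0, c1 = new_c0, new_c1
    return labels, (c0, c1)


# Made-up per-interval miss counts: four quiet samples, four under attack.
print(two_means_1d([0, 0, 1, 0, 5, 6, 4, 7]))
# -> ([0, 0, 0, 0, 1, 1, 1, 1], (0.25, 5.5))
```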
Figure 2: Overview of CacheShield.
4 CACHE SHIELD
So far, techniques proposed to detect cache attacks imply monitoring the victim VM, the attacker VM, and any other VM running on the same host [7, 45]. Monitoring all VMs at sampling periods ranging from 1 µs to 5 ms results in large overheads, which grow with each new virtual machine allocated on the same host.
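To make this scaling concrete, consider a hypothetical host with N monitored VMs, each sampled once per period T. This simplified model counts only counter reads and ignores the cost of processing each read, but it already shows that the load grows linearly with N and inversely with T:

```latex
R = \frac{N}{T}, \qquad
T = 1\,\mu\text{s} \;\Rightarrow\; R = 10^{6}\,N \ \text{reads/s}, \qquad
T = 5\,\text{ms} \;\Rightarrow\; R = 200\,N \ \text{reads/s}.
```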
As a consequence, cloud providers may not want to implement such a tool, as it increases overall system cost, while protection against cache attacks might be a benefit only a few customers are willing to pay for. Yet only the hypervisor, and thus the cloud service provider (CSP), has the ability to monitor all VMs on a system. Indeed, as of now, we are not aware of any CSP employing VM monitoring for microarchitectural attacks.
Unlike previous approaches, our goal is to design CacheShield so that it avoids monitoring all the other processes or VMs running on the same host, i.e., it only focuses on our own process. We assume that we have access to the performance counters within the VMs. Although most cloud providers currently do not allow access to the performance counters, hypervisors such as VMware and KVM can easily be configured to permit reading the counters inside the VM. Moreover, it is possible to decide, upon request, which of the VMs allocated on a host have access to the counters for their processes. Even though our approach could be implemented at the hypervisor, we believe it would be easier for cloud providers simply to enable the counters for the VMs that require them, leaving the responsibility to the tenants, than to take care of these attacks themselves.
By leaving the decision of which processes should be monitored, and when, in the hands of the user, the performance impact of such monitoring is reduced to a minimum, as we only watch a possible victim while it is executing the protected task. From the cloud provider's point of view, this approach to detection also avoids waste, as it only affects the VM involved and only when necessary. Additionally, as the user decides when a process needs protection, we avoid the need to detect when a sensitive process is executed. As a consequence, we also reduce the risk of failing to detect the execution of this sensitive process and, with it, the probability of missing an attack.
Figure 2 presents a diagram of our proposed solution. Whenever users want protection, they inform the CacheShield module, which uses the information gathered from the performance counters to decide whether the user is being attacked. If CacheShield detects an attack, an appropriate response mechanism to prevent the information leakage is put in place. Although we mainly focus on the detection phase, in Section 6 we discuss some of the countermeasures that can be implemented to effectively prevent the attacker from retrieving information, such as the use of a fake key or the addition of noise patterns in the cache.
4.1 Detection Algorithm
One of our goals is to obtain an attack detection technique that works no matter which algorithm is being attacked. Additionally, we want to detect all types of cache attacks, even unknown attacks that the tool has not been trained to detect.
Supervised learning algorithms, such as neural networks, have already been used to detect certain cache attacks. Like any supervised algorithm, they have to be "trained" to detect the attack. That is, they require a labeled data set including data from the different attacks we want to detect, so that they can build models of them and identify their characteristic features. The drawback of supervised learning is precisely that the algorithm needs to be trained for each situation: for each victim algorithm and each attack. As a consequence, new attacks, or attacks with different patterns, would not be detected.
The alternative is to use unsupervised techniques. An unsupervised algorithm does not receive labeled data; instead, it tries by itself to cluster the received data into different groups, or to find relationships between different inputs, in order to assign any new sample to the appropriate cluster. We briefly explored clustering techniques in the previous section to select the counters that can identify an attack.
Another kind of unsupervised technique is anomaly-based detection, which in theory could detect "zero-day" attacks. To the best of our knowledge, no successful cache attack detection fully based on anomaly-detection techniques has yet been demonstrated.
Change-point detection methods are designed to deal with the problem of detecting abrupt changes in distributions. Under the assumption that cache attacks have an effect on the performance of the protected algorithms, change-point detection algorithms are strong candidates for detecting LLC attacks. We propose an algorithm based on change-point detection techniques that is self-learning, so it adapts itself to detect different attack patterns; that allows us to fix the attack detection delay; and that is computationally simple, so it respects the constraint of minimal performance impact and can be implemented online.
4.2 CacheShield Design
CacheShield monitors the counters for LLC misses and for total cycles. The former gives information about the LLC usage of the protected process, while the latter indicates when it is running or when it has finished. As shown in Figure 2, the CacheShield module needs the PID of the process we want to protect, and the protected process also needs to know the PID of the CacheShield process. The reason is that both processes need to communicate with each other (the protected process informs CacheShield when to watch, and CacheShield informs the protected process when an attack is going on), and that the counters can be attached to gather the data from a single process given its PID.
On Unix systems, the easiest way to use CacheShield is to use the fork operation, and then the exec system call to run the module and give it the PID of the parent process. The parent process can then execute the desired operation while being monitored. If the parent process stops or waits for something, CacheShield automatically stops after noticing that the parent has not been running for a while. This means that when the parent runs again, it needs to send a "SIGCONT" signal to the CacheShield tool. Similarly, if the tool detects an attack, it can send a signal to inform the parent. On Windows systems the mechanisms for inter-process communication are slightly different, but the tool can also be adapted.
Change Point Detection: To perform the detection task effectively, we make use of change point detection (CPD) theory [4]. This theory can be used to construct what are commonly known as quick detection algorithms, which have been successfully applied to quality control, signal processing, and anomaly or intrusion detection, among other problems [26, 28, 37, 38]. The assumption in these scenarios is that the parameters describing the monitored system do not change, or change very slowly, under normal conditions. The parameters can, however, change at unknown time instants (including at startup) into anomalous conditions. Thus, CPD algorithms are used to determine, quickly and with high confidence, whether there has been a significant change in the characteristic parameters of the monitored system.
The theory of change point detection has led to the development of efficient algorithms with certain optimality properties, in the sense that for a given false alarm rate (FAR) they minimize the average time it takes to detect the change in the descriptive features of the system [28]. CPD algorithms can be easily implemented and do not require much memory; as a consequence, they do not incur significant computation overhead. These methods belong to the "anomaly detection" class and are unsupervised techniques. Hence, they are well suited to detecting new attacks. All these properties make them very attractive for the attack-detection objective. In the following, we describe the parameters of the algorithm and how we address key issues, such as the choice of models and the use of prior information.
We denote the sequence of observations of the $N$ variables monitored in parallel as $X(t) = (X_1(t), \dots, X_N(t))$, $t \ge 1$. Before a change occurs, the joint probability density function (pdf) of the random variables $X_1, \dots, X_N$, also known as the pre-change distribution, is denoted $p_0(X_1, \dots, X_N)$. If a change occurs at an unknown time instant $\lambda$, the observations follow a different distribution $p_1(X_1, \dots, X_N)$, called the post-change distribution. That is, for $t < \lambda$ the observations $X(t)$ have conditional pdf $p_0(X(t) \mid X(1), \dots, X(t-1))$, and for $t \ge \lambda$ they have conditional pdf $p_1(X(t) \mid X(1), \dots, X(t-1))$.
Under the hypothesis that a change has occurred, the stopping time $\tau$ at which the alarm is triggered gives a measure of the detection time. It is typically defined as the first time the change-sensitive statistic watching the system exceeds a threshold. Denoting by $E_0$ and $E_\lambda$ the expectations for the sequence of observations before the change and after the change at time $\lambda$, the average detection delay (ADD) is defined as:

$$ADD_\lambda(\tau) = E_\lambda(\tau - \lambda \mid \tau \ge \lambda)$$
On the other hand, when no change occurs, the mean time between false alarms is given by $E_0\tau$. As a consequence of this definition, the average frequency of false alarms, or false alarm rate (FAR), is defined as:

$$FAR(\tau) = \frac{1}{E_0\tau}$$
A good detection procedure should achieve a low FAR together with a small expected detection delay. The design of CPD algorithms often involves a trade-off between these two parameters.
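Both quantities can be estimated empirically from repeated runs of a detector. The helper names below are illustrative: given the stopping times of runs in which the change happens at $\lambda$, and of runs in which it never happens, they apply the ADD and FAR definitions directly.

```python
from statistics import mean


def average_detection_delay(stopping_times, change_time):
    """Empirical ADD: mean of (tau - lambda) over runs where tau >= lambda."""
    delays = [tau - change_time for tau in stopping_times if tau >= change_time]
    if not delays:
        raise ValueError("no run raised its alarm at or after the change")
    return mean(delays)


def false_alarm_rate(stopping_times_without_change):
    """Empirical FAR: 1 / E0[tau], from runs in which no change ever occurs."""
    return 1.0 / mean(stopping_times_without_change)


# The run that stopped at 9 raised a false alarm before the change at 10,
# so it is excluded from the conditional expectation.
print(average_detection_delay([12, 15, 9, 11], change_time=10))
print(false_alarm_rate([100, 300, 200]))
```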
Page's cumulative sum (CUSUM) detection algorithm [30] is one of the most popular CPD algorithms: with full knowledge of the pre-change and post-change distributions, it provides an optimal scheme that minimizes the worst-case detection delay for a given FAR. Page's CUSUM algorithm uses the log-likelihood ratio (LLR) to test the hypothesis that a change has occurred; the LLR is defined as:
$$s(t) = \ln \frac{p_1(X(t) \mid X(1), \dots, X(t-1))}{p_0(X(t) \mid X(1), \dots, X(t-1))}$$
The key property of this ratio is that a change in the parameter under study also causes a change in the sign of the expected log-likelihood ratio. In other words, $s(t)$ has a negative mean before the change and a positive mean after it, so its cumulative sum drifts downwards before the change and upwards afterwards. The information relevant to the detection task therefore lies in the difference between the cumulative sum of $s$ and its running minimum. The decision rule is based on a comparison with a threshold $h$:
$$g_k = S_k - m_k \ge h$$

where

$$S_k = \sum_{t=1}^{k} s(t), \qquad m_k = \min_{1 \le j \le k} S_j$$
This decision rule can be replaced by the following recursive form, with initial value $g_0 = 0$:
$$g_k = \max\left\{0,\; g_{k-1} + \ln\frac{p_1(X(k))}{p_0(X(k))}\right\} \ge h$$

Then the detection time for the given threshold is

$$\tau(h) = \min\{k \ge 1 : g_k \ge h\}$$
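For a concrete case in which both distributions are known, assume (for illustration only) i.i.d. Gaussian observations whose mean shifts from $\mu_0$ to $\mu_1$ with a common variance $\sigma^2$. The LLR then has the closed form $\frac{\mu_1 - \mu_0}{\sigma^2}\left(x - \frac{\mu_0 + \mu_1}{2}\right)$, and the recursion above takes only a few lines:

```python
def cusum_gaussian(xs, mu0, mu1, sigma, h):
    """Page's CUSUM for a known shift N(mu0, sigma^2) -> N(mu1, sigma^2).

    Returns the 1-based index k of the first sample with g_k >= h, or None."""
    g = 0.0  # g_0 = 0
    for k, x in enumerate(xs, start=1):
        # ln(p1(x) / p0(x)) for two Gaussians with the same variance.
        llr = (mu1 - mu0) / sigma**2 * (x - (mu0 + mu1) / 2)
        g = max(0.0, g + llr)  # g_k = max{0, g_{k-1} + LLR}
        if g >= h:
            return k  # tau(h)
    return None


# Five pre-change samples at 0, then the mean jumps to 2 at lambda = 6.
print(cusum_gaussian([0.0] * 5 + [2.0] * 5, mu0=0.0, mu1=2.0, sigma=1.0, h=5.0))  # 8
```

Before the change each LLR is negative and $g_k$ is clamped at 0; after it, each sample adds 2, so $g_k$ first reaches $h = 5$ at $k = 8$.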
Although this first approach assumes that both distributions are known, this assumption usually does not hold, and as a consequence the scheme has to be adapted to each situation. We may know one distribution in advance, or none, so it may be necessary to estimate the parameters of the algorithm at runtime. As long as the estimated distributions and the real observations meet certain convergence conditions, we can still fix, for example, the desired detection delay or the FAR.
Change Point Detection in CacheShield: When it comes to cache attack detection, the attack may start at the very beginning, or it may start after a few "normal" transactions. Both situations are efficiently handled by the proposed CUSUM algorithm. We assume that each new sample can be classified into one of two different groups or clusters, namely "attack" and "non-attack". The "non-attack" cluster represents how we expect the protected process to behave under normal conditions. Based on the information we can gain from the counters, we assume that, after a few samples corresponding to the initialization of the protected process, the number of L3 cache misses will be around 0; hence $\mu_{na} = 0$. On the other hand, when there is an attack, we have observed that the mean number of misses is $\mu_a$, so each new sample belonging to the "attack" cluster will be around $\mu_a$. The value of $\mu_a$ is unknown and depends on the attack, so it needs to be estimated and updated with each new sample.
If we denote by $miss_i$ each new sample that the CacheShield module obtains for the protected process, we need to decide which cluster it belongs to. To do so, we compute the "probability" that $miss_i$ belongs to each cluster by means of a distance metric. We define the distance from $miss_i$ to $\mu_{na}$ as:

$$d_{na}(i) = miss_i - \mu_{na} = miss_i$$

Then, the distance to the "attack" cluster is:

$$d_a(i) = |miss_i - \mu_a|$$
As stated before, the value of $\mu_a$ is unknown when we start to monitor the process. We select an arbitrary initial value and, whenever a new sample $miss_i > 0$ is obtained, we update the value of $\mu_a$ as follows:

$$\mu_a = (1 - \beta)\,\mu_a + \beta\, miss_i$$

This method is known as the exponentially weighted moving average, in which the weight of older data decreases exponentially. Estimating the mean of the "attack" cluster in this way makes the choice of the arbitrary initial value irrelevant once a few samples with misses have been collected. If the initial value is chosen too low, we may trigger false positives. We recommend choosing an initial value higher than 10, in order to keep the false positive rate low while still being able to detect the attack in a reasonable time. We further discuss the noise tolerance of the proposed detection algorithm in the next section. In our experiments we set $\beta = 0.05$ and the initial value to 12.5.
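The following sketch shows the EWMA update of $\mu_a$ with $\beta = 0.05$, applied only to samples with misses, as in Algorithm 1 (`if misses > 0`). The helper names and the constant trace of 20 misses are illustrative. Each non-zero sample shrinks the influence of the initial value by a factor $(1 - \beta)$, so estimators started at 12.5 and at 50 end up close together.

```python
def update_mu_a(mu_a, miss, beta=0.05):
    """EWMA update of the attack-cluster mean; samples without misses are ignored."""
    if miss > 0:
        mu_a = (1 - beta) * mu_a + beta * miss
    return mu_a


def estimate_after(initial, samples, beta=0.05):
    """Feed a sequence of miss counts through the EWMA, starting from `initial`."""
    mu_a = initial
    for miss in samples:
        mu_a = update_mu_a(mu_a, miss, beta)
    return mu_a


trace = [20] * 100  # hypothetical attack producing ~20 misses per interval
print(estimate_after(12.5, trace), estimate_after(50.0, trace))
```

After 100 non-zero samples, the initial error is scaled by $0.95^{100} \approx 0.006$, which is why the choice of the initial value matters mostly for the first samples (and thus for early false positives).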
We can now define the "probability" of belonging to each cluster:

$$p_{na}(miss_i) = \frac{d_a(i) + 1}{|d_{na}(i)| + |d_a(i)|}, \qquad p_a(miss_i) = \frac{d_{na}(i) + 1}{|d_{na}(i)| + |d_a(i)|}$$

The value 1 has been added to avoid divisions by 0 in the LLR calculation that is performed as part of the detection algorithm. As a result, for every sample $k \ge 1$ we can express the detection rule as follows:

$$g_k = \max\left\{0,\; g_{k-1} + \log\frac{d_{na}(k) + 1}{d_a(k) + 1}\right\} \ge h$$
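A minimal sketch of one step of this rule, assuming $\mu_{na} = 0$ and the natural logarithm (the base of "log" is not specified above, so that is an assumption; the helper names are hypothetical). It shows the sign behavior described next: samples near 0 give a negative increment, samples near $\mu_a$ a positive one.

```python
import math


def llr_increment(miss, mu_a, mu_na=0.0):
    """Per-sample increment log((d_na + 1) / (d_a + 1)) of the CUSUM statistic."""
    d_na = miss - mu_na     # distance to the "non-attack" cluster (mu_na = 0)
    d_a = abs(miss - mu_a)  # distance to the "attack" cluster
    return math.log((d_na + 1) / (d_a + 1))


def next_g(g_prev, miss, mu_a):
    """One step of g_k = max{0, g_{k-1} + increment}."""
    return max(0.0, g_prev + llr_increment(miss, mu_a))


print(llr_increment(0, 12.5))   # negative: quiet sample pulls g_k towards 0
print(llr_increment(12, 12.5))  # positive: sample close to mu_a pushes g_k up
```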
As can easily be derived from the previous equation and the properties of the LLR, when the number of misses is 0 or close to 0, the distance $d_{na}(k)$ between the sample and the "non-attack" cluster is lower than the distance $d_a(k)$ to the "attack" cluster, so the value of the metric $g_k$ decreases or stays at zero. On the other hand, readings from the LLC misses counter that approach the attack cluster increase the value of $g_k$. The properties of this approach let us choose the threshold based on the minimum detection time we want to achieve. Note that when the error $\epsilon$ in the estimation of the mean approaches zero, $d_a(k) = \epsilon$ also tends to zero, and even then the increase in the value of $g_k$ is limited:

$$\log\frac{d_{na}(k) + 1}{d_a(k) + 1} \le \log(\mu_a + 1)$$

As a consequence, the minimum expected detection time for the given threshold $h$ is:

$$\tau_e(h) \ge \frac{h}{\log(\mu_a + 1)}$$

or, reformulating this equation, the threshold $h$ for a minimum expected delay $\tau_e$ is:

$$h_\tau \le \tau_e \cdot \log(\mu_a + 1)$$
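The bound on the per-sample increment can be checked for any sample, not only when $\epsilon \to 0$. Assume (as a simplification) that $\mu_a \ge 0$ is held fixed over the detection window, and write $x = miss_k \ge 0$:

$$R = \frac{d_{na}(k) + 1}{d_a(k) + 1} = \frac{x + 1}{|x - \mu_a| + 1}$$

$$x \ge \mu_a:\quad R \le \mu_a + 1 \iff x + 1 \le (\mu_a + 1)(x - \mu_a + 1) = \mu_a x - \mu_a^2 + x + 1 \iff 0 \le \mu_a (x - \mu_a)$$

$$x < \mu_a:\quad x + 1 < \mu_a + 1 \ \text{and}\ |x - \mu_a| + 1 > 1 \ \Rightarrow\ R < \mu_a + 1$$

Equality holds only at $x = \mu_a$. Since $g_0 = 0$, each step adds at most $\log(\mu_a + 1)$, and the $\max\{0, \cdot\}$ can only lower $g_k$, at least $h / \log(\mu_a + 1)$ samples are needed before $g_k \ge h$, which is the bound on $\tau_e(h)$ stated above.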
Figure 3: Relevant parameters for the detection task (misses trace, $g_k$, estimated $\mu_a$ and threshold, per sample), Prime+Probe attack on AES.
The unit of $\tau_e(h)$ is the number of samples. Given that the most effective cache attacks can potentially extract most of the key with just one execution of the victim, the interval between samples must be shorter than the execution time of the victim. As the execution time of these algorithms is in the order of a few milliseconds, a sampling rate of 100 µs seems sufficient to provide evidence of the attack. This frequency can be increased at the cost of additional system load. So, for an expected detection delay of 1 ms with a sampling rate of 100 µs, we can define the threshold as $h = 10 \cdot \log(\mu_a + 1)$. As a result of this choice of $h$, whenever $\mu_a$ is recalculated, the threshold should be recalculated too. The choice of the threshold $h$ also determines the tolerance to noisy frames and, as a consequence, the false positive rate. In practice, the false positive rate cannot be estimated analytically and has to be measured.
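The threshold arithmetic can be sketched as a small helper (a hypothetical name, with the natural logarithm assumed): the expected delay is converted into a number of samples and multiplied by $\log(\mu_a + 1)$. Times are given in microseconds so that 1 ms / 100 µs is exactly 10 samples.

```python
import math


def threshold(delay_us, period_us, mu_a):
    """h = tau_e * log(mu_a + 1), with tau_e = expected delay in samples."""
    tau_e = delay_us / period_us  # e.g. 1000 us / 100 us = 10 samples
    return tau_e * math.log(mu_a + 1)


print(threshold(1000, 100, 12.5))  # 10 * ln(13.5), roughly 26.03
print(threshold(1000, 100, 20.0))  # larger mu_a, so h must be recomputed
```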
Algorithm 1 summarizes the CacheShield implementation, and Figure 3 gives an example of the values of the parameters considered in the detection process.
Algorithm 1 CacheShield detection algorithm
Input: Process PID
Output: Attack detected
read_counters(misses, cpu_cycles);
wait;
while victim_is_running do
    read_counters(misses, cpu_cycles);
    if misses > 0 then
        update µ_a;
        update h;
    end if
    calculate g_k;
    if g_k > h then
        trigger_alarm;
    end if
    wait;
end while
return detected;
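Algorithm 1 can be exercised offline. The sketch below replays a recorded per-interval LLC-miss trace instead of reading live counters; this is a hypothetical simplification in which `read_counters`, the PID and the cycles-based `victim_is_running` check are not modeled. It uses $\beta = 0.05$, an initial $\mu_a = 12.5$, an expected delay of 10 samples and the natural logarithm (the base is an assumption).

```python
import math


def cacheshield_detect(miss_trace, mu_a=12.5, beta=0.05, delay_samples=10):
    """Offline sketch of Algorithm 1 over a recorded per-interval LLC-miss trace.

    Returns the 1-based index of the sample that raises the alarm, or None."""
    h = delay_samples * math.log(mu_a + 1)  # threshold for the expected delay
    g = 0.0                                 # g_0 = 0
    for k, miss in enumerate(miss_trace, start=1):
        if miss > 0:                        # update mu_a and, with it, h
            mu_a = (1 - beta) * mu_a + beta * miss
            h = delay_samples * math.log(mu_a + 1)
        d_na, d_a = miss, abs(miss - mu_a)  # distances to both clusters
        g = max(0.0, g + math.log((d_na + 1) / (d_a + 1)))
        if g > h:
            return k                        # trigger_alarm
    return None                             # trace ended (victim stopped)


quiet = [0] * 50                  # warmed-up victim, no spy
attacked = [0] * 20 + [13] * 30   # spy starts forcing ~13 misses per interval
print(cacheshield_detect(quiet), cacheshield_detect(attacked))
```

With this configuration, the attacked trace raises the alarm roughly a dozen samples after the spy starts, consistent with the bound $\tau_e(h) \ge h / \log(\mu_a + 1) = 10$ samples.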
5 EVALUATION OF CACHESHIELD
Having defined the relevant parameters of the detection algorithm and described it in detail, we now evaluate its performance. To this end, we ran several experiments in different environments and on different machines.
Native Environment The experiments for non-virtualized environments were performed on an Intel Core i7-4790 CPU 3.60 GHz machine with 8 MB of L3 cache and 8 GB of RAM, running CentOS 7.
KVM-based Hypervisor These experiments used the same hardware as above, but this time within a VM, also running CentOS 7, hosted on KVM as the hypervisor.
VMware-based Cloud Server We have also executed experiments on a host managed with VMware. This machine is equipped with an Intel Xeon E5-2670 v2 processor, 25 MB of L3 cache and 32 GB of RAM. The OS in these VMs was Ubuntu 12.04.
When users execute a crypto algorithm on their own machine, they can get information about the utilization of that machine or about other tasks running concurrently. However, when executing crypto algorithms in cloud environments, they cannot get any information about what their neighbors are doing. In such scenarios, it is essential to study how the execution of different applications running in parallel with the protected process affects the behavior of CacheShield. Note that, as we use the "total cycles" counter (to determine whether the victim is executing) and the LLC misses counter to decide whether an attack is going on, applications consuming large amounts of memory resources are the most likely to cause the LLC misses indicator to rise and, as a consequence, to trigger false positives. For this reason, we selected several worst-case applications with high memory activity to run in parallel with the victim and CacheShield:
Yahoo Cloud Serving Benchmark This benchmark was originally designed as a tool that provides a common evaluation framework and a set of common workloads to test the performance of different serving stores, such as Elasticsearch, Cassandra and MongoDB, among others [33]. It allows different workload configurations and provides a set of example workload scenarios, together with a workload generator that generates the load to test storage systems. In our experiments, we use this benchmark with the Apache Cassandra database and the example workload named workloada.
Video Streaming Another kind of application that can generate cache misses is web browsing or video streaming. The video streaming VM continuously streams and plays back YouTube videos in the Firefox browser.
Randmem Benchmark This benchmark was originally intended to test the impact of burst reads and writes [24]. Depending on the configuration, the benchmark accesses data stored in an array either sequentially or in random order. The tool also allows configuring the amount of memory it uses; by default, it tries to use as much as possible, up to 2 GB. In our experiments, we launch each randmem instance with no memory limitation, which means that each instance uses 2 GB of RAM.
To show the applicability of CacheShield to a broad range of implementations that require protection against cache attacks, we chose from a range of crypto primitives and implementations, focusing on vulnerable ones, since such legacy implementations actually require protection. The three crypto algorithms considered as victims are:
AES as the most common symmetric encryption algorithm. We consider the T-table implementation of AES from OpenSSL 1.0.1f, which is fast but also leaky.
RSA as probably the most widely used signature and public key encryption algorithm. We analyzed the RSA implementation from OpenSSL 1.0.1f with a 2048-bit key and the RSA_FLAG_NO_CONSTTIME flag set.
ElGamal for which we chose the ElGamal implementation of libgcrypt 1.5.0 with a 4096-bit key. Unlike AES and RSA, ElGamal was not considered during the design of CacheShield, and hence shows how CacheShield can be expected to perform for other types of algorithms.
These algorithms differ quite significantly in their particular implementation and cache usage. Many other potentially leaky codes might require protection, and we are confident that CacheShield will perform well for them.
To evaluate the effectiveness of CacheShield across different types of cache attacks, we implemented and performed three popular attacks, namely Flush+Reload, Flush+Flush and Prime+Probe. We collected data for the above-mentioned algorithms under attack as well as from normal executions, as baseline behavior. Under each configuration, we collected data for more than 1000 executions of the crypto primitives, and in the case of the AES attack we also considered different attack rates (number of lines flushed at a time), as the attacker may try to gain different amounts of information from the T-tables per execution [10]. As stated in previous sections, the main characteristics defining the detection algorithm are the mean detection time and the false positive rate. Table 2 presents the mean detection time under different configurations for the different attacks and algorithms, and Table 3 shows the results related to false positives in noisy environments.
Note that the attack requirements for Flush+Flush/Flush+Reload and Prime+Probe differ significantly. While Flush+X attacks are faster and more precise, they require shared data, i.e., deduplication between attacker and victim. All the attacks performed in virtualized scenarios were across VMs, so we enabled deduplication features (KSM and TPS for KVM and VMware, respectively) to perform Flush+X attacks. Prime+Probe attacks work across VMs even without deduplication, so for them we disabled deduplication and enabled huge pages. Prior to the information extraction, Prime+Probe attacks require a profiling of the cache [9, 16, 17]. The profiling stage reveals the sets that the victim process is accessing and that carry the information needed to succeed in the attack. In this situation, the detection tool triggers an alarm whenever the set being tested by the attacker was actually used by the victim. Fig. 4 visualizes the output of the detection algorithm for the cache profiling stage of an 8 MB L3 cache when the target is the T-table implementation of AES. The x-axis represents each set of the cache being evicted during the Prime+Probe profiling step; a 1 on the y-axis indicates that an alarm has been triggered. Thus, alarms are only triggered when the cache attack affects the target.
For all evaluated attacks, the detection rates are 100%. Note that the sampling rate is 100 µs and that we want to detect the attack before the end of each decryption (for public key cryptography). If we wished to detect attacks against algorithms whose duration is below 5 or 6 ms, we would need to increase the sampling rate, since the mean number of samples required to detect the attack cannot be lowered arbitrarily without increasing the FAR too much. The duration of the decryption/encryption depends on the frequency of the processor and, as a consequence, on the machine. For example, ElGamal encryption takes around 11 ms when being attacked on the i7 machine, while this time increases to up to 24 ms on the Xeon machine. Thus, in the worst case, we are able to detect attacks against ElGamal when less than 37% of the encryption has been performed on the i7 machine, and 30% on the second machine. Regarding RSA, the mean execution times are around 18 ms on the i7 machine and around 37 ms on the other. Then, in the worst case, we detect the attacks on average with less than 50% of the decryption performed in the first case and with about 37% of the decryption performed in the second one.

Table 2: Mean detection time (ms) per attack and scenario for the evaluated crypto algorithms. Note that in all cases CacheShield has the same configuration, and that detection times are much lower than the ones required for the attack to succeed. (F+R: Flush+Reload; F+F: Flush+Flush; P+P: Prime+Probe; for AES, the number in parentheses is the attack rate, i.e., the number of lines flushed at a time.)

| Scenario | AES F+R (1) | AES F+R (4) | AES F+F (1) | AES P+P | RSA F+R | RSA F+F | RSA P+P | ElGamal F+R | ElGamal F+F | ElGamal P+P |
|---|---|---|---|---|---|---|---|---|---|---|
| Native | 3.98 | 4.48 | 3.38 | 5.08 | 3.70 | 3.54 | 5.16 | 2.97 | 3.47 | 3.68 |
| KVM | 7.38 | 7.05 | 6.64 | 9.53 | 4.08 | 3.92 | 4.93 | 3.76 | 3.45 | 3.98 |
| VMware | 8.75 | 5.98 | 10.74 | 13.42 | 4.43 | 3.87 | 4.51 | 4.83 | 5.06 | 7.08 |

Table 3: False positive rate for different scenarios and algorithms. (Noise instances: Y - Yahoo Cloud Serving, V - Video Streaming, R - Randmem.)

| Scenario | Y | V | R | AES | RSA | ElGamal |
|---|---|---|---|---|---|---|
| KVM | 1 | 0 | 0 | 1.2% | 4.5% | 2.8% |
| KVM | 0 | 1 | 0 | 1.1% | 3.4% | 0.6% |
| KVM | 0 | 0 | 1 | 12.2% | 21.4% | 15.4% |
| Native | 1 | 0 | 0 | 1.2% | 4.1% | 2.4% |
| Native | 0 | 1 | 0 | 0.5% | 1.3% | 0.3% |
| Native | 0 | 0 | 1 | 11.1% | 19.2% | 13.8% |
| VMware | 1 | 2 | 10 | 0.1% | 5.9% | 4.1% |

Figure 4: Output of CacheShield when the cache is profiled by accessing each set (x-axis: set profiled; y-axis: "1" indicates a positive attack detection).
The differences between the false positive rates of AES and of the public-key algorithms are easy to explain. Between AES encryptions there are periods in which the processor does nothing, whereas the public-key algorithms execute uninterruptedly. This increases the probability of other processes accessing the cache during the same interval. For example, on the VMware machine with no attack, within a 100 µs period the AES process is only active for around 7000 cycles, while the RSA process is active for about 30000 cycles. Fig. 5 depicts the LLC misses for one noisy RSA execution: besides the initialization steps, a high number of cache misses can be observed throughout the decryption. Similarly, Fig. 6 corresponds to a process performing AES encryptions.

Figure 5: LLC misses for a noisy RSA execution under the randmem benchmark (x-axis: samples; y-axis: misses).

Figure 6: Sample of a noisy execution of AES under the randmem benchmark (x-axis: sample; y-axis: misses).
The results also show that the noise tolerance of the detection algorithm depends more on the hardware than on the virtualization technology: while the results for the native and KVM scenarios, which run on the same hardware, are similar, the results are significantly better on the Xeon machine. The Xeon machine did not trigger any false positives while one or two VMs were generating "noise" concurrently, until we launched several more instances. As this machine is more similar to the kind of machine cloud providers use, these results show that the tool is practical in these environments.
One approach to reducing false positives in noisy environments could be to take into account the variance of the samples collected in the proposed CUSUM algorithm, as attacks present low variance compared with noise. However, we could then fail to detect attacks that masquerade as memory activity by generating a different number of misses each time. Another consideration regarding memory utilization, and the false positives triggered when it is high, is that Prime+Probe attacks need low memory activity to accurately locate the sets and to perform the attack; otherwise, the attack becomes much more difficult. On the other hand, Flush+Flush attacks are more tolerant to noise, but memory activity degrades their performance. It is therefore unlikely that an attacker will perform the attack in a situation where memory is highly utilized. Additionally, the level of utilization of public clouds is low [23], so the assumption of high memory utilization in the considered cloud scenarios may not be realistic. As for using the tool on our own controlled physical machine, we can find out the level of memory utilization and then decide whether it is worth changing the parameters of the detection algorithm.

Figure 7: CPU utilization of the i7 machine in the native environment for different sampling rates (x-axis: sampling rate in µs, logarithmic scale; y-axis: CPU utilization). The 100 µs rate used in our experiments is highlighted.

Figure 8: CPU utilization of the Xeon machine in the virtualized environment for different sampling rates (x-axis: sampling rate in µs, logarithmic scale; y-axis: CPU utilization). The 100 µs rate used in our experiments is highlighted.
One last consideration about our tool is the amount of CPU it uses to monitor the victim and compute the detection algorithm. Figs. 7 and 8 show the mean CPU utilization of CacheShield for different sampling rates and for different situations, namely when the victim is attacked and when it is not (because the number of operations CacheShield has to perform changes), for both architectures. To obtain the CPU utilization, we measured the time it takes to read the counters and perform the calculations, as well as the total time elapsed; the utilization is then given by their ratio. Note that sampling rates of 10 µs are not always achievable, as sometimes (around 10% of the time) reading the counters and performing the calculations takes longer than that. Note also that in both cases the total CPU utilization of our tool is below 5%.
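The utilization measurement described above (time spent reading the counters and computing, divided by total elapsed time) can be sketched as follows. Here `work` is a hypothetical stand-in for one CacheShield iteration, and scheduling against absolute deadlines is our own choice so that one late wake-up does not shift all later samples.

```python
import time


def measure_utilization(work, period_s, n_samples):
    """Fraction of wall-clock time spent in `work` when it runs every period_s."""
    busy = 0.0
    start = time.perf_counter()
    next_deadline = start
    for _ in range(n_samples):
        t0 = time.perf_counter()
        work()                           # e.g. read counters + update g_k
        busy += time.perf_counter() - t0
        next_deadline += period_s
        remaining = next_deadline - time.perf_counter()
        if remaining > 0:
            time.sleep(remaining)        # idle until the next sample
        # otherwise the iteration overran its period, as can happen at 10 us
    total = time.perf_counter() - start
    return busy / total


def spin_half_ms():
    # Hypothetical busy work of about 0.5 ms per sample.
    t = time.perf_counter()
    while time.perf_counter() - t < 0.0005:
        pass


print(measure_utilization(spin_half_ms, period_s=0.001, n_samples=20))  # roughly 0.5
```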
6 CACHE ATTACK COUNTERMEASURES
Once an attack has been detected, CacheShield needs to react in some way. One option is simply to interrupt the monitored process and purge the keys in use. While this approach ensures high security, it decreases usability, as any false positive will result in a total cryptosystem shutdown. An alternative is to continue execution, but to apply preventive measures that reduce or prevent the exploitability of the cache.
Adding Noise A simple method to hinder cache attacks is to make the channel noisier, e.g., by frequently flushing cache lines used by the protected process, or by performing additional reads on data. This approach works particularly well if the critical data is known, e.g., the tables of an AES implementation.
Dummy Operations An alternative approach is to perform dummy operations on meaningless secrets. In practice, this can mean running the protected process with a newly generated secret. The original process can either be paused or continue in parallel to the dummy process. Parallel processing obfuscates the true leakage; however, depending on the attack type, an attacker might still succeed with an increased number of observations. Pausing has the advantage that the attacker might actually extract the dummy key and discontinue the attack. The monitor can then restart the original process in the absence of the attack. Either way, the performance degradation is not negligible, but it is only incurred in the presence of an attack.
Protected Implementations The main reason why leakage is still observed in security solutions is the performance overhead that purely constant-time implementations incur. One way of avoiding such a scenario is to use protected implementations only when CacheShield detects attack behavior. When no attack is detected, faster (less secure) implementations can be used.
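The "Protected Implementations" option can be sketched as a dispatcher that switches implementations once an alarm has been raised. Everything here is hypothetical: the `attack_detected` event stands in for however CacheShield's signal is delivered, and the masked table scan only illustrates the access pattern of a protected lookup (pure Python offers no real constant-time guarantees).

```python
import threading

attack_detected = threading.Event()  # set by the (hypothetical) alarm handler


def fast_lookup(table, index):
    # Secret-dependent access: fast, but leaks `index` through the cache.
    return table[index]


def protected_lookup(table, index):
    # Touches every entry and selects the wanted one with a mask, so the
    # sequence of memory accesses does not depend on `index`.
    result = 0
    for i, value in enumerate(table):
        mask = -(i == index)  # -1 (all ones) when i == index, else 0
        result |= value & mask
    return result


def lookup(table, index):
    impl = protected_lookup if attack_detected.is_set() else fast_lookup
    return impl(table, index)


table = [5, 9, 3, 7]
print(lookup(table, 2))  # 3, via the fast path
attack_detected.set()
print(lookup(table, 1))  # 9, via the protected path
```

The design keeps the fast path in normal operation and pays the cost of the protected path only after an alarm, mirroring the trade-off described above.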
Other more sophisticated solutions are also possible, but might not
be as universally applicable. Since our focus is on the lightweight
detectability of cache attacks at the user level, we do not explore
these additional avenues of countermeasures.
7 CONCLUSION
In this work we have introduced CacheShield, a tool that is able
to detect all known types of cache attacks targeting cryptographic
applications. The analysis of various hardware performance coun-
ters revealed that the LLC miss counter by itself carries enough
information to detect cache attacks. We take advantage of change
point detection algorithms and adapt them to our objective of cache
attack detection. CacheShield was designed to detect attacks based
on the characteristics of two particular algorithms, AES and RSA.
The evaluation revealed that CacheShield can also be used for other
algorithms (as shown for ElGamal) without further modification. It
is also effective against “unknown” attacks, as all known attacks
force cache misses on the victim. This behavior is easy to de-
tect, since the number of L3 cache misses of crypto algorithms
approaches zero after a brief initial warm-up. In addition, we have
shown that CacheShield tolerates a considerably high amount of
noise, triggering only a few false positives on machines similar to
the ones cloud providers use.
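The detection principle summarized above can be illustrated with Page's one-sided CUSUM [30]. After warm-up, the victim's per-window LLC miss counts stay close to a small baseline, and a sustained upward shift (a spy evicting the victim's lines) makes the cumulative sum grow until it crosses a threshold. The sketch below is the textbook CUSUM and not necessarily the adapted change-point detector used by CacheShield. The parameter names `baseline`, `drift`, `threshold`, and `warmup` and all sample values are hypothetical.

```python
def cusum_alarms(samples, baseline, drift, threshold, warmup=0):
    """One-sided (upper) CUSUM in the style of Page (1954).

    samples   -- LLC miss counts of the monitored process, one per sampling window
    baseline  -- expected misses per window in the attack-free steady state
    drift     -- slack subtracted each step so small fluctuations do not accumulate
    threshold -- alarm level for the cumulative statistic
    warmup    -- number of initial windows to ignore (cold-cache misses)
    Returns the indices of the windows at which an alarm is raised.
    """
    s = 0.0
    alarms = []
    for t, x in enumerate(samples):
        if t < warmup:
            continue  # misses right after start-up are expected, not an attack
        s = max(0.0, s + (x - baseline - drift))
        if s > threshold:
            alarms.append(t)
            s = 0.0  # restart the statistic after reporting a change point
    return alarms


# Hypothetical traces: two cold-start windows, then a quiet steady state;
# in `attacked`, a spy starts evicting the victim's lines at window 8.
quiet = [50, 30, 0, 3, 5, 0, 4, 0]
attacked = [50, 30, 0, 1, 0, 2, 0, 1, 40, 45, 38, 50]
print(cusum_alarms(quiet, baseline=1, drift=2, threshold=20, warmup=2))     # []
print(cusum_alarms(attacked, baseline=1, drift=2, threshold=20, warmup=2))  # [8, 9, 10, 11]
```

With `warmup=0`, the same `attacked` trace yields `[0, 1, 8, 9, 10, 11]`, because the cold-cache windows also exceed the threshold. This is why the initial warm-up phase has to be excluded. Resetting the statistic after each alarm is one common choice; letting it keep running would report only the onset of the attack.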
Previously proposed cache attack detection tools work at the hy-
pervisor level and need to continuously monitor all untrusted,
concurrently running processes or VMs. This results in large per-
formance overheads, and these tools often have questionable detection
rates for novel attacks such as Flush+Flush. CacheShield only needs ac-
cess to the protected victim process, and only during its execution,
greatly reducing the overhead. All major hypervisor systems support
transparent access to hardware performance counters for guest
VMs while ensuring proper isolation between VMs. We urge cloud
service providers to enable these features in their systems and thus
finally give their tenants the means to protect themselves against
cache attacks with tools such as CacheShield.
8 ACKNOWLEDGMENTS
The visit of Samira Briongos to the Vernam group at Worcester Polytechnic
Institute was supported by a collaboration fellowship of
the European Network of Excellence on High Performance and
Embedded Architecture and Compilation (HiPEAC). This work
was in part supported by the National Science Foundation under
Grant No. CNS-1618837 and by the Spanish Ministry of Economy
and Competitiveness under contracts TIN-2015-65277-R, AYA2015-
65973-C3-3-R and RTC-2016-5434-8.
REFERENCES
[1] Onur Acıiçmez and Werner Schindler. 2008. A Vulnerability in RSA Implementa-
tions Due to Instruction Cache Analysis and its Demonstration on OpenSSL. In
Topics in Cryptology–CT-RSA 2008. Springer, 256–273.
[2] Gorka Irazoqui Apecechea, Mehmet Sinan Inci, Thomas Eisenbarth, and Berk
Sunar. 2014. Wait a Minute! A fast, Cross-VM Attack on AES. In Research
in Attacks, Intrusions and Defenses - 17th International Symposium, RAID 2014,
Gothenburg, Sweden, September 17-19, 2014. Proceedings. 299–319.
DOI:https://doi.org/10.1007/978-3-319-11379-1_15
[3] M. B. Bahador, M. Abadi, and A. Tajoddin. 2014. HPCMalHunter: Behavioral
malware detection using hardware performance counters and singular value
decomposition. In 2014 4th International Conference on Computer and Knowledge
Engineering (ICCKE). 703–708.
[4] Michèle Basseville and Igor V. Nikiforov. 1993. Detection of Abrupt Changes:
Theory and Application. Prentice-Hall, Inc., Upper Saddle River, NJ, USA.
[5] Daniel J. Bernstein. 2005. Cache-timing attacks on AES. Technical Report.
[6] Sarani Bhattacharya and Debdeep Mukhopadhyay. 2015. Who Watches the
Watchmen?: Utilizing Performance Monitors for Compromising Keys of RSA on
Intel Platforms. Springer Berlin Heidelberg, Berlin, Heidelberg, 248–266. DOI:
https://doi.org/10.1007/978-3-662-48324-4_13
[7] Marco Chiappetta, Erkay Savas, and Cemal Yilmaz. 2016. Real time detection of
cache-based side-channel attacks using hardware performance counters. Applied
Soft Computing 49 (2016), 1162 – 1174.
[8] John Demme, Matthew Maycock, Jared Schmitz, Adrian Tang, Adam Waksman,
Simha Sethumadhavan, and Salvatore Stolfo. 2013. On the Feasibility of Online
Malware Detection with Performance Counters. In Proceedings of the 40th Annual
International Symposium on Computer Architecture (ISCA ’13). ACM, New York,
NY, USA, 559–570.
[9] Fangfei Liu, Yuval Yarom, Qian Ge, Gernot Heiser, and Ruby B. Lee.
2015. Last-Level Cache Side-Channel Attacks are Practical. In Proceedings of the
2015 IEEE Symposium on Security and Privacy (SP ’15). IEEE Computer Society,
Washington, DC, USA, 605–622. DOI:https://doi.org/10.1109/SP.2015.43
[10] Marc Green, Leandro Rodrigues Lima, Andreas Zankl, Gorka Irazoqui, Johann
Heyszl, and Thomas Eisenbarth. 2017. AutoLock: Why Cache Attacks on ARM
Are Harder Than You Think. CoRR abs/1703.09763 (2017). http://arxiv.org/abs/
1703.09763
[11] Daniel Gruss, Clémentine Maurice, Klaus Wagner, and Stefan Mangard. 2016.
Flush+Flush: A Fast and Stealthy Cache Attack. In 13th Conference on Detection
of Intrusions and Malware & Vulnerability Assessment (DIMVA).
[12] Daniel Gruss, Raphael Spreitzer, and Stefan Mangard. 2015. Cache Tem-
plate Attacks: Automating Attacks on Inclusive Last-Level Caches. In 24th
USENIX Security Symposium (USENIX Security 15). USENIX Association, Wash-
ington, D.C., 897–912. https://www.usenix.org/conference/usenixsecurity15/
technical-sessions/presentation/gruss
[13] David Gullasch, Endre Bangerter, and Stephan Krenn. 2011. Cache Games –
Bringing Access-Based Cache Attacks on AES to Practice. In Proceedings of the
2011 IEEE Symposium on Security and Privacy (SP ’11). IEEE Computer Society,
Washington, DC, USA, 490–505. DOI:https://doi.org/10.1109/SP.2011.22
[14] Berk Gülmezoglu, Mehmet Sinan Inci, Gorka Irazoqui Apecechea, Thomas Eisen-
barth, and Berk Sunar. 2015. A Faster and More Realistic Flush+Reload Attack on
AES. In Constructive Side-Channel Analysis and Secure Design - 6th International
Workshop, COSADE 2015, Berlin, Germany, April 13-14, 2015. Revised Selected
Papers. 111–126. DOI:https://doi.org/10.1007/978-3-319-21476-4_8
[15] Mark Hall, Eibe Frank, Georey Holmes, Bernhard Pfahringer, Peter Reutemann,
and Ian H. Witten. 2009. The WEKA Data Mining Software: An Update. SIGKDD
Explor. Newsl. 11, 1 (Nov. 2009), 10–18. DOI:https://doi.org/10.1145/1656274.
1656278
[16] Mehmet Sinan İnci, Berk Gulmezoglu, Gorka Irazoqui, Thomas Eisenbarth, and
Berk Sunar. 2016. Cache Attacks Enable Bulk Key Recovery on the Cloud. In
Cryptographic Hardware and Embedded Systems – CHES 2016: 18th International
Conference, Santa Barbara, CA, USA, August 17-19, 2016, Proceedings, Benedikt
Gierlichs and Axel Y. Poschmann (Eds.).
[17] Gorka Irazoqui, Thomas Eisenbarth, and Berk Sunar. 2015. S$A: A Shared Cache
Attack that Works Across Cores and Dees VM Sandboxing and its Application
to AES. In 36th IEEE Symposium on Security and Privacy (S&P 2015). 591–604.
[18] Gorka Irazoqui, Mehmet Sinan Inci, Thomas Eisenbarth, and Berk Sunar. 2015.
Lucky 13 Strikes Back. In Proceedings of the 10th ACM Symposium on Information,
Computer and Communications Security (ASIA CCS ’15). ACM, New York, NY,
USA, 85–96. DOI:https://doi.org/10.1145/2714576.2714625
[19] Taesoo Kim, Marcus Peinado, and Gloria Mainar-Ruiz. 2012. STEALTHMEM:
System-Level Protection Against Cache-Based Side Channel Attacks in the
Cloud. In Presented as part of the 21st USENIX Security Symposium (USENIX Secu-
rity 12). USENIX, Bellevue, WA, 189–204. https://www.usenix.org/conference/
usenixsecurity12/technical-sessions/presentation/kim
[20] Kenji Kira and Larry A Rendell. 1992. The feature selection problem: Traditional
methods and a new algorithm. In AAAI, Vol. 2. 129–134.
[21] Peng Li, Debin Gao, and Michael K Reiter. 2014. Stopwatch: a cloud architecture
for timing channel mitigation. ACM Transactions on Information and System
Security (TISSEC) 17, 2 (2014), 8.
[22] Moritz Lipp, Daniel Gruss, Raphael Spreitzer, Clémentine Maurice, and Stefan
Mangard. 2016. ARMageddon: Cache Attacks on Mobile Devices. In 25th USENIX
Security Symposium, USENIX Security 16, Austin, TX, USA, August 10-12, 2016. 549–
564. https://www.usenix.org/conference/usenixsecurity16/technical-sessions/
presentation/lipp
[23] H. Liu. 2011. A Measurement Study of Server Utilization in Public Clouds. In
2011 IEEE Ninth International Conference on Dependable, Autonomic and Secure
Computing. 435–442. DOI:https://doi.org/10.1109/DASC.2011.87
[24] Roy Longbottom. 2016. RandMem Benchmark. http://www.roylongbottom.org.
uk/. (2016). [Online; accessed 19-May-2017].
[25] Ramya Jayaram Masti, Devendra Rai, Aanjhan Ranganathan, Christian Müller,
Lothar Thiele, and Srdjan Capkun. 2015. Thermal Covert Channels on Multi-core
Platforms. In 24th USENIX Security Symposium (USENIX Security 15). USENIX
Association, Washington, D.C., 865–880. https://www.usenix.org/conference/
usenixsecurity15/technical-sessions/presentation/masti
[26] David McDonald. 1990. A cusum procedure based on sequential ranks. Naval
Research Logistics (NRL) 37, 5 (1990), 627–646. DOI:https://doi.org/10.1002/
1520-6750(199010)37:5<627::AID-NAV3220370504>3.0.CO;2-F
[27] Philip J. Mucci, Shirley Browne, Christine Deane, and George Ho. 1999. PAPI: A
Portable Interface to Hardware Performance Counters. In Proceedings of the
Department of Defense HPCMP Users Group Conference. 7–10.
[28] Veronica Montes De Oca, Daniel R. Jeske, Qi Zhang, Carlos Rendon, and Mazda
Marvasti. 2010. A cusum change-point detection algorithm for non-stationary
sequences with application to data network surveillance. Journal of Systems and
Software 83, 7 (2010), 1288 – 1297. DOI:https://doi.org/10.1016/j.jss.2010.02.006
[29] Dag Arne Osvik, Adi Shamir, and Eran Tromer. 2006. Cache Attacks and Coun-
termeasures: The Case of AES. In Topics in Cryptology – CT-RSA 2006: The
Cryptographers’ Track at the RSA Conference 2006, San Jose, CA, USA, February
13-17, 2006. Proceedings. Springer Berlin Heidelberg, Berlin, Heidelberg, 1–20.
DOI:https://doi.org/10.1007/11605805_1
[30] ES Page. 1954. Continuous inspection schemes. Biometrika 41, 1/2 (1954), 100–
115.
[31] Mathias Payer. 2016. HexPADS: A Platform to Detect “Stealth” Attacks. In
Engineering Secure Software and Systems: 8th International Symposium, ESSoS
2016, London, UK, April 6–8, 2016. Proceedings, Juan Caballero, Eric Bodden, and
Elias Athanasopoulos (Eds.). Springer International Publishing, Cham, 138–154.
[32] Colin Percival. 2005. Cache missing for fun and prot. In Proc. of BSDCan 2005.
[33] Yahoo research. 2010. Yahoo! Cloud System Benchmark (YCSB). https://github.
com/brianfrankcooper/YCSB. (2010). [Online; accessed 19-May-2017].
[34] Thomas Ristenpart, Eran Tromer, Hovav Shacham, and Stefan Savage. 2009. Hey,
you, get off of my cloud: exploring information leakage in third-party compute
clouds. In ACM Conference on Computer and Communications Security, CCS 2009,
Chicago, Illinois, USA, November 9-13, 2009. 199–212. DOI:https://doi.org/10.1145/
1653662.1653687
[35] Baljit Singh, Dmitry Evtyushkin, Jesse Elwell, Ryan Riley, and Iliano Cervesato.
2017. On the Detection of Kernel-Level Rootkits Using Hardware Performance
Counters. In Proceedings of the 2017 ACM on Asia Conference on Computer and
Communications Security. ACM, 483–493.
[36] Adrian Tang, Simha Sethumadhavan, and Salvatore J. Stolfo. 2014. Unsupervised
Anomaly-Based Malware Detection Using Hardware Features. In Research
in Attacks, Intrusions and Defenses, Angelos Stavrou, Herbert Bos, and Geor-
gios Portokalidis (Eds.). Lecture Notes in Computer Science, Vol. 8688. Springer
International Publishing, 109–129.
[37] A. G. Tartakovsky, B. L. Rozovskii, R. B. Blazek, and Hongjoong Kim. 2006.
A novel approach to detection of intrusions in computer networks via adap-
tive sequential and batch-sequential change-point detection methods. IEEE
Transactions on Signal Processing 54, 9 (Sept 2006), 3372–3382.
DOI:https://doi.org/10.1109/TSP.2006.879308
[38] Alexander G. Tartakovsky, Boris L. Rozovskii, Rudolf B. Blažek, and Hongjoong
Kim. 2006. Detection of intrusions in information systems by sequential change-
point methods. Statistical Methodology 3, 3 (2006), 252 – 293. DOI:https://doi.
org/10.1016/j.stamet.2005.05.003
[39] Venkatanathan Varadarajan, Thomas Ristenpart, and Michael Swift. 2014.
Scheduler-based Defenses against Cross-VM Side-channels. In 23rd USENIX Secu-
rity Symposium (USENIX Security 14). USENIX Association, San Diego, CA, 687–
702. https://www.usenix.org/conference/usenixsecurity14/technical-sessions/
presentation/varadarajan
[40] X. Wang and R. Karri. 2013. NumChecker: Detecting kernel control-ow
modifying rootkits by using Hardware Performance Counters. In 2013 50th
ACM/EDAC/IEEE Design Automation Conference (DAC). 1–7.
[41] Zhenghong Wang and Ruby B. Lee. 2007. New cache designs for thwarting
software cache-based side channel attacks. In 34th International Symposium on
Computer Architecture (ISCA 2007), June 9-13, 2007, San Diego, California, USA.
494–505. DOI:https://doi.org/10.1145/1250662.1250723
[42] Yuval Yarom and Naomi Benger. 2014. Recovering OpenSSL ECDSA Nonces
Using the FLUSH+RELOAD Cache Side-channel Attack. IACR Cryptology ePrint
Archive 2014 (2014), 140.
[43] Yuval Yarom and Katrina Falkner. 2014. FLUSH+RELOAD: A High Resolu-
tion, Low Noise, L3 Cache Side-Channel Attack. In 23rd USENIX Security Sym-
posium (USENIX Security 14). 719–732. https://www.usenix.org/conference/
usenixsecurity14/technical-sessions/presentation/yarom
[44] Andreas Zankl, Johann Heyszl, and Georg Sigl. 2017. Automated Detection of
Instruction Cache Leaks in Modular Exponentiation Software. In Smart Card
Research and Advanced Applications: 15th International Conference, CARDIS 2016,
Cannes, France, November 7–9, 2016, Revised Selected Papers, Kerstin Lemke-Rust
and Michael Tunstall (Eds.). Springer International Publishing, Cham, 228–244.
DOI:https://doi.org/10.1007/978-3-319-54669-8_14
[45] Tianwei Zhang, Yinqian Zhang, and Ruby B. Lee. 2016. CloudRadar: A Real-
Time Side-Channel Attack Detection System in Clouds. Springer International
Publishing, Cham, 118–140. DOI:https://doi.org/10.1007/978-3-319-45719-2_6
[46] Y. Zhang, A. Juels, A. Oprea, and M. K. Reiter. 2011. HomeAlone: Co-residency
Detection in the Cloud via Side-Channel Analysis. In 2011 IEEE Symposium on
Security and Privacy. 313–328. DOI:https://doi.org/10.1109/SP.2011.31
[47] Yinqian Zhang, Ari Juels, Michael K. Reiter, and Thomas Ristenpart. 2012. Cross-
VM side channels and their use to extract private keys. In ACM Conference on
Computer and Communications Security, CCS’12, Raleigh, NC, USA, October 16-18,
2012. 305–316. DOI:https://doi.org/10.1145/2382196.2382230
[48] Yinqian Zhang, Ari Juels, Michael K. Reiter, and Thomas Ristenpart. 2014. Cross-
Tenant Side-Channel Attacks in PaaS Clouds. In Proceedings of the 2014 ACM
SIGSAC Conference on Computer and Communications Security (CCS ’14). ACM,
New York, NY, USA, 990–1003. DOI:https://doi.org/10.1145/2660267.2660356