New Attacks and Defenses for Randomized Caches
KARTIK RAMKRISHNAN, ANTONIA ZHAI, STEPHEN MCCAMANT, and PEN CHUNG YEW, University of Minnesota, Twin Cities
The last level cache is vulnerable to timing based side channel attacks because it is shared by the attacker and the victim processes
even if they are located on different cores. These timing attacks evict the victim cache lines using small conflict groups (SCG), and
monitor the cache to observe when the victim uses these cache lines again. A conflict group is a collection of cache lines which will
evict the target cache line.
To defeat these attacks, defenses randomize the address-to-set mappings in hardware, using cryptographic hash functions. Furthermore,
these defenses also change the encryption key periodically, to make attacks even harder. The re-randomization rate is
slow, because cache lines need to be moved in order to re-randomize, and moving too many cache lines creates extra evictions and
affects performance. We show that CEASER needs a substantially higher refresh rate to defend against attacks, resulting in a significant
performance hit (20% on average, up to 50%). CEASER-S adds another level of randomization to the cache by dividing the cache into
smaller banks, known as skews, and using a different encryption key for each skew. Thus, each cache line has a different set mapping
in each skew. We introduce new attacks on CEASER-S that can learn SCGs in O(N) time. We show that the default CEASER-S
configuration needs a high refresh rate to defend against these attacks, resulting in a significant performance hit (15%). We also propose
to increase the cache associativity and the number of skews for greater security.
We identify two key issues with the previous strategy, CEASER-S, namely the high cost of scaling up the skew count and the high
cost of increasing the refresh rate. We then propose a new randomization strategy using an indirection table (iTable), which mitigates these two
issues. Addresses of cache lines are encrypted and used to look up the indirection table entry. Each indirection table entry stores a
mapping to a randomly chosen cache set. The cache line is placed into this randomly chosen set. First, the set mappings are re-randomized
across all sets, whose number is much greater than the number of skews (100x more than the default configuration), and this requires only one
or two extra iTable lookups compared to the baseline, so the cost of randomization stays low. Second, the encryption key
changes up to 50x faster than CEASER’s default rate, by using evictions to trigger the re-randomization. Instead of moving cache lines,
this mechanism re-randomizes one iTable entry at a time, whenever the cache lines corresponding to the iTable entry are naturally
evicted. Thus, the miss rate is not much worse than the baseline.
We quantitatively show that our scheme performs almost as well as a fully associative cache in defending against these attacks. We also
demonstrate new attacks that target the indirection table by oversubscribing its entries, and quantitatively show that our scheme is
resilient against these new attacks for trillions of years. Using CACTI 7.0, we estimate low area and power overhead compared to a baseline
inclusive last-level cache. Lastly, we evaluate its performance overhead using the SPECrate 2017 and PARSEC 3.0 benchmarks, and
show that the impact on performance is very low (<4%).
1 INTRODUCTION
A side-channel is any means of observation that can indirectly infer secret information about a system or application.
Commonly exploited side-channels include acoustic [45], electromagnetic [13, 35], and thermal [19] side-channels.
Side-channel attacks use different components of the processor’s microarchitecture to infer secret information, such
as speculative execution, branch prediction, TLBs, virtual addressing, and caches [10, 12, 14, 21, 26–28, 34, 48, 52].
Last-level cache (LLC) attacks [5, 21, 34] are particularly dangerous because they do not require the victim’s and
attacker’s processes to be co-located on the same core. Cache side-channel attacks can be classified into many different
types [17, 36]. We discuss different kinds of cache side-channel attacks in detail in §2. In particular, conflict-based
Authors’ address: Kartik Ramkrishnan, ramkr004@umn.edu; Antonia Zhai, zhai@umn.edu; Stephen Mccamant, mccamant@umn.edu; Pen Chung Yew,
yew@umn.edu, University of Minnesota, Twin Cities.
Manuscript submitted to ACM 1
arXiv:1909.12302v1 [cs.CR] 26 Sep 2019
attacks, such as PRIME + PROBE [34] and EVICT + RELOAD [16], use interference between the victim and attacker
processes to steal secret information. This is achieved using small conflict groups (SCGs), which are small groups of
cache lines that evict the target cache line. We empirically set the maximum size of a small conflict group to 1000
cache lines, because attacks usually require 2000-5000 cycles between measurements [15, 57], and a conflict group
of size 1000 takes about 5000 cycles to load, considering the latest LLC implementations [3], which pipeline the LLC
accesses to happen with 5 cycles between accesses. For a set associative cache, these conflict groups map to the same
cache set and are therefore known as conflict sets; a conflict set only needs to be larger than the cache associativity. In previous
work [7, 33, 37, 51], the cache line addresses have been randomized to defeat such attacks. This randomization makes
it difficult for the attacker to discover SCGs. These randomization strategies use both hardware [33, 40, 41, 51] and
software based approaches [8, 47]. Software based approaches have a large performance overhead (up to 15x) due to the
introduction of new memory accesses and instructions to randomize the addresses. Hardware-based randomization
strategies randomize the mappings between the cache line addresses and the cache sets, and can have much lower
overhead (1%-5%). Table based (TB) [33, 51] randomization has been proposed for the private cache. TB randomization
has a two-level structure. The addresses are mapped to a randomization table in the first level and then the table entries
are mapped to the cache sets in the second level. There are two kinds. The first kind, which we call TB-1 [33], maps
addresses to a (conceptually) fully associative randomization table(s) in the first level, and the second level has a static
mapping to the cache sets. These fully associative tables are very expensive in terms of energy and area. They can be
implemented efficiently for the private caches, but are impractical for the LLC, because it is many times larger. The
second kind, which we call TB-2 [51], uses multiple randomization tables, one for each process/security domain. In
each table, the first level mapping is static, and the second level mapping uses dynamic random placements (DRP), i.e.,
each table entry stores a set mapping, which is randomly changed using a random number generator (RNG) each time
the cache line(s) mapped to that entry are evicted. Since the randomization table effectively stores pointers to the cache
sets, we call it an indirection table, or iTable. TB-2 is not suitable for the LLC, where several threads/processes may be
active simultaneously, causing the storage overhead to become too high, due to the large number of tables required.
Dynamic encryption (DE) schemes, such as CEASER [40], encrypt the cache line address and map the encrypted address
to the cache sets. The execution is divided into epochs, each of which spans many memory accesses. There is a current
key and a target key. At the start of an epoch, all cache sets are mapped using the current key, but they are gradually
refreshed so that at the end of the epoch they all use the target key. In order to refresh a cache set, all of its cache lines
need to be moved to a new location determined by the target key. Therefore, the refresh rate needs to be low (e.g., one
refresh every 100 accesses), to keep the cache line movement overhead low. Dynamic encryption increases security,
but is vulnerable to faster attacks, which may find a conflict set in much less time than the epoch, as we show in §4.
A general approach to achieving greater security is to add additional levels of defense. This is similar to
multi-factor authentication (MFA), wherein using a password and answering a security question to log in to an online
banking account is significantly more secure than only using a password, but does not take much more login time. The
cost the user pays for this extra protection is additive, but the security benefits are multiplicative. CEASER-S [41] uses
this approach, i.e., it uses random skew select (RSS) in addition to dynamic encryption to improve the security of CEASER. It uses a randomized version
of the skew associative cache, in which the cache is divided into several smaller caches of lower associativity (called
skews) and cache lines may randomly be placed into any of the skews. The second randomization uses CEASER-style
dynamic encryption on each skew. However, lower skew counts of CEASER-S are vulnerable to new attacks (see
§4.5.1), which can discover SCGs in O(N) time. CEASER-S requires a large number of skews (>128) to defend against
these new attacks, or a much higher refresh rate (epoch length N accesses).
Instead of using skews, we propose to use an indirection table (iTable), shared by all processes/threads, to address two
key issues, namely, increasing the amount of randomization and increasing the refresh rate, at a low cost. The cache line addresses
are encrypted and the iTable lookup occurs using the encrypted address. Each iTable entry stores a mapping to a cache
set, determined by a random number generator. The cache line is placed in this cache set. Re-randomization occurs one
iTable entry at a time (each entry usually maps one or two cache line addresses). Each re-randomization changes the set
mapping to a random value. Thus the re-randomization happens across all cache sets, whose number is 100x greater than the
default number of skews of CEASER-S. To increase the re-randomization rate at a lower cost, the iTable entry is transitioned
to the target key whenever the cache lines mapped by it are naturally evicted, due to cache misses. An epoch length of
2·N accesses (N is the cache size) is sufficient to transition most of the iTable entries, which is up to 50x faster than the
randomization rate proposed by CEASER. A cleaner mechanism transitions any iTable entries which did not naturally
transition by the end of the epoch.
The above scheme using the indirection table uses both dynamic encryption (DE) in the first level and dynamic random
placements (DRP) in the second level, and is therefore called DE+DRP. DE+DRP significantly increases the SCG size,
making the cache as secure as a fully associative cache (which has truly random set mappings) against the new attacks.
We also introduce a new attack on DE+DRP that targets the indirection table by oversubscribing one of the entries, i.e.,
creating an SCG that maps to a single iTable entry. Using a simple analysis, we show in §7 that DE+DRP prevents the
creation of SCGs for trillions of years.
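As a rough illustration of the iTable lookup path described above, the following Python sketch models the two levels: an encrypted address selects an iTable entry, and the entry points at a randomly chosen cache set. It is a toy model under stated assumptions, not the hardware design: the keyed BLAKE2 hash merely stands in for the address encryption, the class and method names (`ITable`, `on_fill`, `on_evict`) are hypothetical, and the key transition between epochs and the cleaner mechanism are omitted.

```python
import hashlib
import random


class ITable:
    """Toy indirection table: encrypted address -> entry -> random cache set."""

    def __init__(self, num_entries, num_sets, seed=0):
        self.rng = random.Random(seed)
        self.num_sets = num_sets
        self.key = self.rng.getrandbits(64).to_bytes(8, "little")
        # Each entry points at a randomly chosen cache set.
        self.entries = [self.rng.randrange(num_sets) for _ in range(num_entries)]
        # Number of currently cached lines that use each entry.
        self.resident = [0] * num_entries

    def entry_of(self, line_addr):
        # Keyed hash as a stand-in for the address encryption (assumption).
        digest = hashlib.blake2b(line_addr.to_bytes(8, "little"),
                                 key=self.key, digest_size=8).digest()
        return int.from_bytes(digest, "little") % len(self.entries)

    def on_fill(self, line_addr):
        """A line enters the cache; return the set it should be placed in."""
        entry = self.entry_of(line_addr)
        self.resident[entry] += 1
        return self.entries[entry]

    def on_evict(self, line_addr):
        """A line leaves the cache; re-randomize its entry once it is unused."""
        entry = self.entry_of(line_addr)
        self.resident[entry] -= 1
        if self.resident[entry] == 0:
            self.entries[entry] = self.rng.randrange(self.num_sets)
```

In this sketch, re-randomization costs no cache line movement: an entry only receives a new set pointer once no cached line depends on the old one.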
In summary, the major contributions of this work are listed below:
• We discuss new attacks that can defeat existing approaches (such as CEASER and CEASER-S) by recovering
SCGs in O(N ) time.
• We propose a novel two level approach using an indirection table to defend against conflict-based attacks, called
DE+DRP. We show analytically that it is resilient against attacks for trillions of years.
• We evaluate our scheme for performance, area and energy overhead and find that they are low. The performance
evaluation is done on the SPECrate 2017 and PARSEC 3.0 benchmarks, using ZSim [43]. The area and energy overheads
are evaluated using CACTI 7.0 [4].
The rest of the paper is organized as follows. §2 provides background information about existing conflict-based
attacks. §3 discusses existing table based defenses, and why they are not scalable to the LLC. §4 discusses existing
defenses for the LLC and new attacks to which they are vulnerable. §5 discusses a simple two level randomization
with static mappings. §6 discusses the DE+DRP randomization in detail. §7 quantitatively shows that DE+DRP is robust
against even new attacks for trillions of years. §8 discusses miscellaneous design issues. §9 presents the performance
results. §10 discusses other related work regarding cache randomization. §11 concludes the paper.
2 ATTACK BACKGROUND
Cache side-channel attacks come in many different flavours [17]. The majority of cache side-channel attacks use timing
information about the victim process, such as total execution time, or the time to access certain addresses in the cache.
Three types of cache side-channel attacks have been identified, namely, conflict-based, flush-based, and collision-based.
Conflict-based attacks leverage conflict misses in the cache. Whenever a victim interferes with the attacker’s data, or
vice-versa, it leaks information about the victim’s activities to the attacker. Flush-based attacks use clflush or related
instructions to flush specific cache lines and monitor their future activities. Lastly, cache collision attacks observe the
effect of a victim’s cache access patterns on its total execution time to deduce its secret. The key difference between
conflict attacks and cache collision attacks is that the attacker does not need to probe the cache directly in the latter,
and only needs to measure the execution time of the victim.
2.1 Overview of Cache Side-Channel Attacks
A PRIME + PROBE attack [5] monopolizes the target set(s) with attacker’s data, and waits for a fixed interval. After the
interval has elapsed, the attacker measures the time required to access its data. If any cache line was evicted due to a
victim access, the attacker would observe greater latency to fetch the cache line, due to a cache miss. Thus, the attacker
detects victim accesses to the target set(s). An EVICT + TIME [5] attack evicts victim data from a target set, then executes
the victim process and measures the total time of execution. If that set was accessed by the victim, the execution time
will be higher due to cache miss(es). This is used by the attacker to infer secret information used by the victim. EVICT
+ RELOAD [16] is very similar to PRIME + PROBE, except that the attacker probes for a shared cache line, instead
of a cache miss on its data. Lastly, PRIME + ABORT [10] uses a hardware transaction to prime the target set. If the
transaction aborts, it means the attacker data was evicted from the set due to a victim access. The time of access is
recorded during each abort, thus revealing the victim access patterns.
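The PRIME + PROBE sequence above can be illustrated with a toy model of a single cache set. This is only a sketch under assumptions: the set uses LRU replacement, a "miss" stands in for the longer latency the attacker would actually time, and the names `LruSet` and `prime_probe` are hypothetical.

```python
from collections import OrderedDict


class LruSet:
    """Toy model of one cache set with LRU replacement (an assumption)."""

    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()   # least recently used entry first

    def access(self, line):
        """Touch a line; return True on a hit, False on a miss."""
        if line in self.lines:
            self.lines.move_to_end(line)
            return True
        if len(self.lines) >= self.ways:
            self.lines.popitem(last=False)   # evict the least recently used
        self.lines[line] = None
        return False


def prime_probe(cache_set, attacker_lines, victim_line=None):
    """Return the number of attacker misses observed in the probe step."""
    for line in attacker_lines:              # PRIME: fill the set
        cache_set.access(line)
    if victim_line is not None:              # victim runs during the wait
        cache_set.access(victim_line)
    # PROBE in reverse order so the probe does not evict its own lines.
    return sum(not cache_set.access(line) for line in reversed(attacker_lines))
```

A probe miss count of zero suggests the victim did not touch the set during the interval; a non-zero count suggests that it did.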
FLUSH + RELOAD [57] flushes a cache line using clflush, waits for a fixed interval, and measures the time required to
reload the flushed cache line. A higher access latency due to a cache miss indicates that no victim access took place
during the previous interval, and vice versa. FLUSH + FLUSH [15] flushes a cache line using clflush, waits for a fixed
interval, and then flushes the same cache line again. A shorter flush time indicates that no victim access took place
during the previous interval, and vice versa. Both these attacks are useful to spy on the victim’s access patterns, if the
victim and attacker share the same cache line. clflush is also used in speculative execution attacks, such as Spectre [26],
Meltdown [28], and Foreshadow [48, 52] to transmit the result of speculative execution to the attacker.
There are two phases in cache collision attacks. The first phase is a cleaning phase, during which the attacker evicts
all the victim data from the cache. In the second phase, the attacker executes the victim process and measures the
execution time. Depending on the number of cache hits/misses, the execution time will be different for different inputs.
This can be used to deduce the secret information [6].
2.2 Conflict vs Non-Conflict Attacks
Flush-based attacks, a type of non-conflict based attacks, can be mitigated by disabling data sharing between the victim
and the attacker. On the other hand, to defend against conflict-based attacks, the application should be cache oblivious,
i.e. its cache access pattern should be independent of the secret information it tries to protect. In this work, we focus on
defense against conflict-based attacks, which are the only mode of attack when the attacker and victim do not share
data. We do not defend against flush based attacks, where the attacker is monitoring a cache line shared with the victim.
We also do not prevent some covert channel attacks, such as priming the entire cache instead of a cache set, detecting
cache contention, etc. We do prevent shared data attacks which require higher timing resolution, empirically 2000-5000
cycles long [57]. However, we do not prevent shared data attacks that might work even without any high resolution
timing measurements [49], and leave those to future work.
3 TABLE BASED RANDOMIZATION
Table based randomization is used to prevent conflict based attacks in the L1 cache. Table based randomization’s
strength is that it re-randomizes the set mapping of a cache line, whenever the cache line is evicted. This high rate of
re-randomization makes it difficult to mount attacks. There are two kinds of randomization that are possible for the
first level cache, which we name TB-1 and TB-2. In the TB-1 approach, the cache line addresses are mapped to a fully
associative randomization table. Thus, each cache line address may be mapped to any entry of the table. The mapping
from the table to the cache sets is deterministic. Therefore, in order to access the cache, the entire table needs to be
looked up (all entries). This is very expensive for larger caches, like the last level cache. Whenever a cache line is evicted
from the cache set, then the next time it is fetched, it is mapped to a random location in the randomization table. Thus
the set location of the cache line is also random. This makes attacks that use conflict sets ineffective, because each time
the target cache line is evicted, it is simply remapped to a different set. In the TB-2 approach, the cache line addresses
are mapped to the randomization table in a deterministic way, using some index bits of the cache line addresses. The
randomization table has a pointer in each entry to the correct cache set, where the cache line will be placed. Therefore,
we call it the indirection table, or iTable. The random number generator determines a cache set which is stored in the
iTable entry. There are multiple such iTables, and it is expected that the attacker and victim processes use different
iTables. Thus, TB-2 is also not suitable for the LLC because dozens of iTables may be required depending on the core
count, making the storage overhead impractical.
4 EXISTING RANDOMIZATIONS AND NEW ATTACKS
In this section, we discuss existing defenses, and new attacks which can defeat the randomization defenses for the LLC.
The attacks use small conflict groups (SCGs) (< 1000 cache lines in size) to evict a target cache line. A special case of
SCG is a conflict set, in which all cache lines of the SCG always map to the same set as the target, and is guaranteed to
evict the target. The conflict set needs to be at least as large as the cache associativity to evict the target. We use the
following variables: N is the number of cache lines and w is the cache associativity.
4.1 The Set Associative Cache
The memory address has 64 bits. The first few bits (from the right) are the block offset bits, which are not used for determining
the set mapping. The next few bits are the set bits, which determine the set mapping. The remaining bits are the tag
bits. For a 64-byte cache line, there are 6 block offset bits ($2^6 = 64$). For a 2 MB LLC cache bank, there are 2048 cache
sets, thus there are 11 set bits ($2^{11} = 2048$). We use this configuration for the rest of this work.
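The bit layout above can be made concrete with a short Python sketch that splits an address into tag, set index, and block offset for this configuration. The function name `split_address` is hypothetical.

```python
LINE_BYTES = 64      # 64-byte cache line -> 6 block offset bits
NUM_SETS = 2048      # 2 MB bank with 2048 sets -> 11 set bits

OFFSET_BITS = LINE_BYTES.bit_length() - 1   # log2(64)
SET_BITS = NUM_SETS.bit_length() - 1        # log2(2048)


def split_address(addr):
    """Return (tag, set_index, block_offset) for a 64-bit address."""
    offset = addr & (LINE_BYTES - 1)                    # lowest 6 bits
    set_index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)  # next 11 bits
    tag = addr >> (OFFSET_BITS + SET_BITS)              # remaining bits
    return tag, set_index, offset
```

Two addresses conflict in this unrandomized cache exactly when their `set_index` values are equal, which is what makes conflict sets easy to construct without randomization.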
4.1.1 Simple Attack On Set Associative Cache. The simplest attack has two steps. First, we choose a random collection
of L cache lines, such that it contains a conflict set (an SCG whose cache lines all map to the same set as the target, with size at
least equal to the cache associativity). The cache has N cache lines. The conflict group is of size L = k·N, where k is a
fraction less than 1. The next step is to reduce the size of the conflict group in a systematic way so that the attacker
is left with a conflict set in the end. The simple approach reduces the size one cache line at a time, and checks to see
whether the conflict set is still in the reduced conflict group (i.e. there is at least one cache miss when the entire group
and the target cache line are loaded). Thus, the total number of accesses to perform the attack is the arithmetic series
\[ L + (L - 1) + (L - 2) + \dots = O(L^2) = O(N^2) \]
Since the refresh time for the DE strategy is O(N), which is much smaller than $O(N^2)$, DE is secure for arbitrarily sized caches
against such an attack strategy (see [40]). This simple attack was proposed in a previous work [34] to find the conflict
sets in the last level cache.
4.2 Static Encrypt (SE)
One approach to protect the cache is by encrypting the cache line addresses, using a cryptographic hash function. We
consider the example of a 64-bit address space, and 64-byte cache blocks. The 58 address bits other than the first 6 address
bits (from the right) can be randomized by encrypting them. As a result, cache lines are randomly mapped to different
sets, because the set bits change to random values. However, the simple attack, which we discuss in §4.1.1, can still
work to create conflict sets.
4.3 Dynamic Encrypt (DE)
Dynamic encryption (DE) uses a dynamically changing cryptographic hash function to encrypt the addresses. These
encrypted addresses are then used to map to the cache sets. This approach was used in CEASER [40]. The addresses
are encrypted using a low latency block cipher, and mapped to the cache sets. Two keys are in use at a time, and the
mappings transition gradually between the current key and the target key, during each epoch (a fixed number of cache
accesses), by slowly refreshing the cache sets to only use the target key. At the end of the epoch, the current key and
the target key are swapped, and the new target key is randomized to a different value. As the refresh routine scans
through the cache sets over the course of the epoch, it moves the cache lines in the current set (which uses the current
key) to other sets, based on the target hash function. The refresh time linearly depends on the cache size, N, and needs
to be at least an order of magnitude more than N (100·N accesses by default) to ensure that performance is not affected.
Unsophisticated cache attacks require $O(N^2)$ time to find a conflict set. However, more sophisticated attacks reduce
the attack time to only O(N), and are able to break the defense for arbitrarily sized caches. In order to defend
against these more advanced attacks, it is required to substantially increase the refresh rate, so that there is a large
gap between the attacker’s rate of finding the conflict sets and the rate at which the sets are refreshed. Unfortunately,
increasing the refresh rate to a safer value (at least an order of magnitude less than N) results in very poor performance,
due to a large increase in cache evictions caused by moving cache lines. Also, the hardware cost of refreshing becomes
high because, at higher refresh rates, there needs to be hardware support to move multiple cache lines at a time. In the
following, we discuss the simple attack, which DE is able to defend against, and more advanced attacks, which can break the
defense.
4.3.1 The Binary Search Attack. The simple attack that we discussed in §4.1.1 requires $O(N^2)$ accesses in order to
find a conflict set. However, the encryption key of DE gets changed in O(N) time. Therefore, for larger caches, by the
time the attacker finds a conflict set, the encryption key would have changed, making the attack ineffective. Instead of
reducing the conflict group size one cache line at a time, we propose to reduce it by a small fraction f of its current size.
This creates a high probability that the conflict set is retained in the reduced conflict group. The total number of cache
accesses is reduced to a geometric series
\[ L + f \cdot L + f^2 \cdot L + \dots \le \frac{L}{1 - f} = O(N) \]
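To make the gap between the two series concrete, the following Python sketch counts accesses for the one-line-at-a-time reduction of §4.1.1 and for the geometric reduction, and compares both with the default epoch of 100·N accesses. The parameter values are illustrative assumptions: the starting group is half the cache (k = 0.5), and f = 0.9 is interpreted, as in the series, as the ratio between successive group sizes. Accesses to the target line itself are ignored.

```python
def simple_attack_accesses(group_size):
    """Accesses when the group shrinks by one line per step: L + (L-1) + ... + 1."""
    return group_size * (group_size + 1) // 2


def reduction_attack_accesses(group_size, ratio, ways):
    """Accesses when each group is `ratio` times the size of the previous one."""
    total, size = 0, group_size
    while size > ways:            # stop once only a conflict set could remain
        total += size             # one pass over the current group
        size = int(size * ratio)
    return total


N = 32768                 # cache lines in a 2 MB bank of 64-byte lines
W = 16                    # associativity (32768 lines / 2048 sets)
L = N // 2                # hypothetical starting group, k = 0.5
F = 0.9                   # hypothetical ratio f between successive groups
EPOCH = 100 * N           # default epoch length from §4.3

simple = simple_attack_accesses(L)
reduced = reduction_attack_accesses(L, F, W)
print("simple attack fits in an epoch:   ", simple <= EPOCH)
print("reduction attack fits in an epoch:", reduced <= EPOCH)
print("geometric bound L / (1 - f) holds:", reduced <= L / (1 - F))
```

Under these assumed parameters the simple attack needs far more accesses than one epoch allows, while the geometric reduction finishes well within it.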
4.3.2 The Advanced Builder Attack. Instead of using group reduction, we can build a conflict set one cache line at a
time. With each access, the target cache line is also monitored, to check if it has been evicted. If so, the cache line is
added to the conflict set. This builder attack takes about N·w accesses, which is O(N), to discover the conflict set.
We can speed up the builder attack in the following manner. First a conflict group is created by randomly selecting L
cache lines. The conflict group contains at least one conflict set. When all the cache lines of the conflict group and the
target cache line are accessed, there will be at least one cache miss, due to the conflict set inside the conflict group. Next,
when the conflict group and the target cache line are loaded again, there will be another cache miss, this time on a different
cache line of the conflict set, depending on how the replacement state of the cache set got modified due to our accesses.
We do this a few times, with a different cache line being revealed due to each cache miss. Thus, the total attack time is
close to w·L, which is O(N).
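The basic one-line-at-a-time builder attack can be sketched on a toy cache as follows. This is an illustrative model under assumptions, not the evaluated setup: a keyed BLAKE2 hash stands in for the encrypted set index, replacement is random, a simulated miss stands in for a timing measurement, and the names `ToyCache` and `builder_attack` are hypothetical.

```python
import hashlib
import random


class ToyCache:
    """Set-associative cache with a keyed index and random replacement."""

    def __init__(self, num_sets, ways, seed=0):
        self.num_sets, self.ways = num_sets, ways
        self.rng = random.Random(seed)
        self.key = self.rng.getrandbits(64).to_bytes(8, "little")
        self.sets = [[] for _ in range(num_sets)]

    def set_of(self, line):
        # Keyed hash as a stand-in for the encrypted address (assumption).
        digest = hashlib.blake2b(line.to_bytes(8, "little"),
                                 key=self.key, digest_size=8).digest()
        return int.from_bytes(digest, "little") % self.num_sets

    def access(self, line):
        """Return True on a hit; on a miss, fill and evict a random way."""
        ways = self.sets[self.set_of(line)]
        if line in ways:
            return True
        if len(ways) == self.ways:
            ways.pop(self.rng.randrange(self.ways))
        ways.append(line)
        return False


def builder_attack(cache, target, members_wanted):
    """Collect lines that evicted `target`; return (members, accesses)."""
    members, accesses, candidate = [], 0, target + 1
    cache.access(target)
    while len(members) < members_wanted:
        cache.access(candidate)           # try one new line
        if not cache.access(target):      # target missed: candidate evicted it
            members.append(candidate)
        accesses += 2
        candidate += 1
    return members, accesses
```

Because only the candidate is accessed between two checks of the target, every recorded line necessarily maps to the target's set, so the collected lines form a conflict set.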
Impact of Replacement Policy: There can be a few variations of the builder attack based on the replacement policy.
For a cache using the random replacement policy, the cache line evicted from the set is randomly selected. In the builder
attack, each time a cache line is evicted, a new cache line may be evicted from the cache, revealing a new member of the
conflict set, or the same cache line may be evicted again, thus revealing no new member of the conflict set. Therefore,
more evictions may be required to find out all members of the conflict set, due to repeat evictions of some of the cache
lines. Modern caches typically do not use random replacement, which is a stateless replacement policy. Instead, they
typically use a replacement state, which is often stored using counters for each cache line. These counters are updated
whenever the cache set is accessed. An attacker with knowledge of the replacement policy could manipulate these
counters to implement a builder style attack faster than for a cache with a random replacement policy. For the rest of
the paper, we assume a random replacement policy is used, for maximum protection against attack.
4.4 Static and Dynamic Random Skew
The traditional skew associative cache, which we call the static random skew (SRS) cache, is vulnerable to attack due to its static
skew mappings. The simple attack is effective in learning a conflict set for this cache. A more secure variant of the
cache uses a random number generator instead of using a deterministic hash function to select the cache skews, which
we call dynamic random skew (DRS). This cache is resilient against the usual attacks, such as the builder attack and the
conflict group reduction attack. However, it is still possible to develop an SCG using the builder attack.
4.4.1 Defense Against Basic Attack and Reduction Attack. In the attack, we create a large conflict group (1000s of cache
lines) that contains a conflict set (of size w + 1) on one of the skews. This will cause a cache miss when the entire
conflict group is loaded. When the conflict group is loaded again, however, there is only a 1/s chance (s being the number of skews) that the evicted
cache line from the conflict set maps to the same skew. If it gets loaded into a different cache skew, then the attack has
failed, because no matter how many times the conflict group is loaded, there aren’t any further cache misses, hence we
don’t learn anything about the conflict set for the target cache line. Similarly, the attacks that try to reduce a larger part
of the conflict group at a time also fail due to the same reason.
4.4.2 Builder Attack on DRS cache. A cache line is selected, which is the target cache line. Next, random cache lines are
selected, all the while monitoring the target to see whether it is evicted or not. In case it is evicted, we record the
evictor into the SCG. Thus, we build up an SCG in this manner. However, there is a key difference regarding the size of
the SCG required to make this attack work. We noted that the minimum size of the SCG was just w for the previous
cache designs. However, it can be much larger for the DRS cache depending on the number of skews. Using the analysis
below, we find that the minimum SCG size to have a significant chance of evicting the target cache line is about s·w.
We verified using simulations that this SCG size has a significant chance of evicting the target cache line (> 50%), for
s and w values up to 128. The attack time is roughly N·s·w, thus linearly increasing it by a factor of s compared to the
previous attack.
Analysis: We perform a simple analysis to gain an intuition for the SCG size required to do an attack. Let us build a
conflict group of size g. During the process of discovering the members of the SCG, we discover g/s cache lines from
each skew, because they are randomly discovered among the skews. Once the members of the SCG are discovered we
proceed to use it in an attack. Let the target cache line reside in skew x, set y. When we load the SCG, then out of the
g/s cache lines which map to skew x and set y, only $g/s^2$ cache lines will map to skew x, set y again, due to random
skew selection. We equate this to the set associativity of each skew (w/s), which we consider to be enough to evict the
target cache line. This yields an SCG size g = w·s, which is sufficient to evict the target cache line.
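Written out step by step, the estimate above reads as follows. The numerical instance uses the associativity of 16 and the skew counts of 2 and 4 mentioned elsewhere in the paper, purely as an illustration.

```latex
\[
\underbrace{\frac{g}{s}}_{\text{SCG lines found in skew } x}
\times
\underbrace{\frac{1}{s}}_{\text{placed in skew } x \text{ again}}
= \frac{g}{s^{2}},
\qquad
\frac{g}{s^{2}} = \frac{w}{s}
\;\Longrightarrow\;
g = w \cdot s .
\]
\[
w = 16:\quad s = 2 \Rightarrow g = 32, \qquad s = 4 \Rightarrow g = 64 .
\]
```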
Measurement Count: This attack is quite a bit weaker than the conflict set attacks. In order to make a measurement,
the SCG needs to be loaded into the cache. For another measurement, a different SCG needs to be used. Thus, the
number of high resolution measurements depends on the number of SCGs that the attacker is able to find. However,
although it is a weaker attack, it is still significant that the attacker is able to make any measurements at all, hence we
consider it to be a security risk that needs to be mitigated. On the other hand, the conflict set attacks are more powerful
because the same SCG can be re-used again and again to evict the target cache line. Thus, the attacker can make a large
number of measurements (before the epoch runs out).
4.5 Dynamic Random Skew + Dynamic Encrypt (DRS+DE)
A recent work [41] combined the DRS scheme and CEASER style dynamic encryption to make the cache more secure.
This defense is known as the CEASER-S approach. CEASER-S raises the bar for attackers, because there are now two
randomizations instead of one. CEASER-S uses a variant of skew associative cache for the first randomization.
4.5.1 Builder Attack on DRS+DE. The builder attack can be used on this cache. A target cache line is selected, following
which other random cache lines are used to evict it. Each cache line which evicts it is recorded by the attacker and
added to the conflict group. Similar to the builder attack on DRS, g = s·w is the minimum size of the SCG for a > 50%
chance to evict the target cache line. It takes roughly N accesses to discover each member of the SCG, so the attack
takes N·s·w time to create the SCG.
Analysis : The g = s · w cache lines are discovered in N · s · w accesses on average. For the attack to be successful, the attacker needs
to discover these g cache lines in less time than the epoch. Let E = k · N be the epoch of the cache, where k is an
integer > 1. The break-even point is N · w · s = k · N, which yields k = w · s. For the smaller skew counts (2-4) recommended in CEASER-S,
and an associativity of 16, k is between 32 and 64. Thus, for four skews the epoch must be no longer than 64 · N accesses for the attack to fail. We recommend
a significant gap between the attack rate and the refresh rate (at least an order of magnitude), which
yields an epoch of 6.4 · N accesses for the four-skew case. Either the epoch length needs to be
reduced drastically, or the skew count needs to be increased substantially, in order to have a good security guarantee.
For the default skew count of 2, the epoch must be at most 32 · N accesses to defend against our attack. However, with
a smaller SCG (one built in fewer than w · s · N accesses), probabilistic attacks may still be possible. We leave such attacks to future work.
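The bound above is simple arithmetic, and the following minimal sketch reproduces it. It assumes only the relation N · w · s = k · N from the analysis; the function names and the `margin` parameter are hypothetical and chosen for illustration.

```python
def builder_attack_epochs(w: int, s: int) -> int:
    """Builder attack time in units of N accesses: k = w * s (from N*w*s = k*N)."""
    return w * s


def recommended_epoch(w: int, s: int, margin: float = 10.0) -> float:
    """Epoch length (in units of N) kept `margin` times shorter than the attack."""
    return builder_attack_epochs(w, s) / margin


for skews in (2, 4):
    k = builder_attack_epochs(w=16, s=skews)
    print(f"s={skews}: attack needs about {k}N accesses, "
          f"epoch with a 10x margin is about {recommended_epoch(16, skews)}N")
```

For a 16-way cache this gives k = 32 for two skews and k = 64 for four skews, and a 6.4 · N epoch for the four-skew case once the order-of-magnitude margin is applied.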
Proposed Improvements : We observe that two improvements are required. First, it would be much
more secure to have a randomization which increases the SCG size by one or two orders of magnitude. If we randomize
across all the sets instead of the skews, then the group size increases to S · w (where S is the number of sets, much
greater than the number of skews). Second, a higher rate of change of the encryption key (and consequently a shorter epoch)
is also beneficial for security. Using an indirection table (iTable), we can achieve both these goals with the following
two strategies. First, the iTable randomizes across all the cache sets. Second, the iTable re-randomizes its
entries whenever there is an eviction. The overhead of the iTable is mainly due to storage. The access overhead is
minimal, requiring only an extra access or two.
5 USING AN INDIRECTION TABLE FOR GREATER SECURITY
We observe from the previous section that there are two key issues that we need to resolve, namely, (1) we need the
randomization to happen across all the sets to create a larger SCG and (2) we need the encryption key to change
faster, at a low cost. There can be numerous ways to achieve these goals, so we present one possible solution, using an
iTable. Other solutions could include combinations of skews and indirection tables, or using tables in a different way to add
randomization, such as using a random offset. Our design rests on the following key intuitions. First, in
order for the randomization to occur across all the cache sets, we use an indirection table to decide the set mappings.
Therefore, the encrypted address does not map to the cache lines directly, and instead maps to the indirection table.
Second, in order to reduce the cost of lookup, each cache line should only look up a small number of iTable entries to find
its set mapping. Lastly, the re-randomization rate of the encryption key needs to increase significantly to mitigate the
attacks; we suggest re-randomizing whenever there are cache evictions to achieve this increase. Another advantage is
that the re-randomization rate stays low when the cache is not under attack, i.e. when the cache miss rate is low.
Using the iTable for Static Mappings : For ease of explanation, we first describe a simplified use of the indirection table with static mappings.
We consider the first variant of this two level static randomization (TLSR), which we call
TLSR-SE, short for TLSR static encrypt. In TLSR-SE, the first level of the mapping does not use the set bits directly;
instead, the cache line address is encrypted first and then mapped to the cache sets. This iTable lookup is called the
select operation. This makes it harder for the attacker to create the conflict sets, because the attacker does not know the
set bits beforehand. The second variant of TLSR is TLSR-SRP (static random placement), in which the cache line addresses are mapped to the iTable
entries using the least significant bits of the address as index bits. The iTable entries contain mappings to the cache sets, which
are randomly generated when the cache is initialized. Combining the two randomizations yields TLSR-SE+SRP,
or SE+SRP for short. In the first level, the cache line
address is encrypted to map to an entry of the iTable, and in the second level, the iTable entry is mapped to the cache
set. This determines the location of the cache line. Both the TLSR-SE and TLSR-SRP randomizations are vulnerable to attack.
The key vulnerability is that the mappings are static, hence we can use the basic attack that we discussed in §4.1.1 to reduce a conflict group to an
SCG.
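The two static levels can be illustrated with a short sketch. This is not the hardware design: the keyed BLAKE2 hash is only a software stand-in for the cipher (an assumption), and the names `encrypt_index` and `StaticTwoLevelMap` are hypothetical.

```python
import hashlib
import random


def encrypt_index(addr: int, key: bytes, table_size: int) -> int:
    """Stand-in for the first-level cipher: a keyed hash of the line address (assumption)."""
    digest = hashlib.blake2b(addr.to_bytes(8, "little"), key=key, digest_size=8).digest()
    return int.from_bytes(digest, "little") % table_size


class StaticTwoLevelMap:
    """SE+SRP: static encryption selects an iTable entry, which holds a fixed random set."""

    def __init__(self, num_sets: int, table_size: int, key: bytes, seed: int = 0):
        rng = random.Random(seed)
        self.key = key
        # Second level: set mappings drawn once at initialization and never changed.
        self.itable = [rng.randrange(num_sets) for _ in range(table_size)]

    def lookup(self, addr: int) -> int:
        entry = encrypt_index(addr, self.key, len(self.itable))  # select operation
        return self.itable[entry]


mapping = StaticTwoLevelMap(num_sets=2048, table_size=32768, key=b"demo-key")
# The mapping is static, so repeated lookups of one address always agree.
assert mapping.lookup(0x1234) == mapping.lookup(0x1234)
```

Because neither level ever changes, an attacker who observes that two addresses conflict once can rely on that conflict indefinitely, which is the weakness the static variants share.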
The iTable Size : The iTable size determines how many cache lines can be kept in the cache. In the extreme case, a single
entry means that only one set can be used. A larger number of entries can hold a larger number of cache lines. A minimum
of S entries is required to populate the entire cache.
6 INDIRECTION TABLE BASED TWO LEVEL DYNAMIC RANDOMIZATION (TLDR)
We observed in §5 that the TLSR scheme is not secure due to its static mappings, because existing attacks can create conflict
sets. We first consider two simpler variants, which use dynamic encryption (DE) or dynamic random placement (DRP)
to improve the security. Following this, we explain the more complex version, i.e. DE+DRP. For simplicity of
explanation, we consider single level dynamic randomization first and then consider multi-level randomization.
6.1 TLDR-DE
We add dynamic encryption to SE+SRP, making it DE+SRP. There are two key goals, discussed earlier in §4.5.1,
that need to be incorporated into the scheme. Using DE, we address the second goal, i.e. we would like to increase
the re-randomization rate of the encryption key significantly. There can be many strategies to achieve these goals, so
we present just one possible solution, which can be further tuned or may even be totally different, depending on other
design goals of the designer. In our solution, we re-randomize one iTable entry at a time, whenever there is an eviction
due to a cache miss.
6.1.1 CEASER-like DE Mechanism Overview. We use a CEASER-like mechanism to change the key by gradually
transitioning the iTable entries. The execution is divided into epochs, where each epoch is a pre-determined number of
cache misses, which we denote as E. Two encryption keys are in use during an epoch, the current key ki and the
target key kj; both are used to perform a select operation on the iTable entries, instead of the single key
used by SE encryption. Thus, each address can select up to two iTable entries, i and j, one using the current
key ki and one using the target key kj. The iTable entries may be in a transitioned state or a non-transitioned state.
Entries which have not transitioned can be selected using either key, ki or kj. Entries which have transitioned
can only be selected using the target key, kj. Thus, we may need to check up to two sets, Si and Sj, to access the cache.
To remove this overhead, we optimize the key select using precedence logic, which is discussed in §6.1.2.
We slowly transition the iTable entries, so that by the end of the epoch, they are all in a transitioned state. We use
a transition bit per iTable entry to store this information. At this point, we switch the current and target keys, and
assign a new randomly chosen target key. The iTable entries are changed to the transitioned state, whenever there is an
eviction in the cache. This achieves our second design goal of the system. We discuss the details of the transition in the
following section. Some optimizations to the replacement policy are also suggested, discussed in §6.1.3, to facilitate
a faster transition rate of the entries, with lower overhead. In this strategy, instead of moving the cache lines when
transitioning an iTable entry, we use natural evictions to facilitate the transitions. However, since natural evictions may
happen randomly to different iTable entries, it is also necessary to have a cleaner mechanism to ensure that all entries
are transitioned.
6.1.2 Key Select Operation With Precedence Logic. Ordered key select is an optimization that needs to look up only
one cache set instead of two, thus reducing the access latency. The two iTable entries, i and j, are selected by the current key
and the target key respectively, and i has greater precedence than j. If i has not yet transitioned to the target state, then it is selected,
and we use the set mapping of that entry, Si, without needing j. If i has transitioned to the target state, then
it is rejected, and we look up the set mapping Sj to access the cache line. Thus, we only need to access one of the cache
sets, Si or Sj, which is much more efficient than accessing both.
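The precedence rule can be written down as a minimal sketch. The `ITableEntry` class and the `key_select` function are hypothetical names; the sketch assumes the two iTable indices have already been computed with the current and target keys.

```python
from dataclasses import dataclass


@dataclass
class ITableEntry:
    set_mapping: int
    transitioned: bool = False  # the per-entry transition bit


def key_select(itable, idx_current: int, idx_target: int) -> int:
    """Return the single cache set to probe.

    idx_current is the entry chosen with the current key (higher precedence),
    idx_target is the entry chosen with the target key.
    """
    entry_i = itable[idx_current]
    if not entry_i.transitioned:
        return entry_i.set_mapping          # entry i is still valid under the current key
    return itable[idx_target].set_mapping   # entry i is rejected, fall back to entry j


itable = [ITableEntry(set_mapping=3), ITableEntry(set_mapping=7)]
print(key_select(itable, 0, 1))  # entry 0 has not transitioned: prints 3
itable[0].transitioned = True
print(key_select(itable, 0, 1))  # entry 0 has transitioned: prints 7
```

Only one set mapping is returned in either case, so a single tag lookup suffices.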
6.1.3 Transitioning the iTable entries. Whenever all the cache lines which are mapped by a particular iTable entry, i,
are evicted, the iTable entry enters the transitioned state. If it had already transitioned earlier, no further action is
necessary. Sometimes, there may be multiple cache lines which are mapped using the same iTable entry. In this case,
we cannot transition until all these cache lines are evicted. To transition faster using evictions, we
suggest changes to the replacement policy, so that all these lines get evicted during a cache miss, not just one cache
line. Furthermore, since the transitions happen in a random manner depending on the evictions, it is also necessary to
have a cleaner mechanism to transition iTable entries which may have been missed.
Replacement Policy For Faster Transitions : The cache set has many cache lines. Per cache line, we suggest keeping
track of the iTable entry corresponding to the cache line address (to reduce encryption overhead). When there is a
cache miss on cache line L with corresponding iTable entry k, the replacement policy
classifies the cache lines in the set according to their respective iTable entries, selects a random iTable entry among these, and
evicts all the corresponding cache lines.
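A minimal sketch of this victim selection follows. It assumes each cached line is stored together with its iTable index, as suggested above; the function name `choose_victims` and the tuple layout are hypothetical, and a non-empty set is assumed.

```python
import random
from collections import defaultdict


def choose_victims(cache_set, rng=random):
    """cache_set: list of (line_addr, itable_idx) pairs currently held in the set.

    Groups the lines by iTable entry, picks one entry at random, and returns
    (itable_idx, [addresses of all lines mapped by that entry]).
    """
    groups = defaultdict(list)
    for line_addr, itable_idx in cache_set:
        groups[itable_idx].append(line_addr)
    victim_idx = rng.choice(sorted(groups))  # random iTable entry present in the set
    return victim_idx, groups[victim_idx]


ways = [(0xA0, 5), (0xB0, 9), (0xC0, 5), (0xD0, 2)]
idx, victims = choose_victims(ways, random.Random(1))
print(idx, [hex(a) for a in victims])
```

Evicting every line of the chosen entry at once is what lets that entry transition immediately, at the cost of occasionally evicting more than one line per miss.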
Cleaner Mechanism : The cleaner mechanism transitions the remaining iTable entries so that all entries have
transitioned to the target encryption key by the end of the epoch. Its main goal is to ensure
correctness, because any cache lines which did not transition may result in an incorrect lookup. The cleaner mechanism
activates in the second half of the epoch and scans through the iTable entries. Whenever it detects an iTable entry that
has not transitioned, it initiates an eviction in order to move the entry to the transitioned state. In most
cases, the iTable entry would already have transitioned, so there is not much performance impact.
The iTable Size and the Epoch Length : Since we may evict multiple cache lines per cache miss, we
recommend a low iTable load, so that there is usually only 1 cache line mapped by a single iTable entry. Thus
an iTable size of N (average load of 1) or greater is recommended. The epoch length depends on the iTable size. Since
we would like most of the iTable entries to have naturally transitioned by the end of the epoch (for lower cleaner
mechanism overhead), we use an epoch of length 2 ∗ N.
Total Storage Overhead : We trade storage overhead for security, i.e. we gain security but pay
for it with storage. The iTable contains N entries, and each entry has log2(S) bits for the set mapping and 1
bit to indicate the transition state. The other overhead is the per cache line iTable entry information in the cache sets,
which is log2(N) bits per cache line, and the increased tag size (since the tag now includes all bits except the set bits). The
overhead is about 2 ∗ log2(S) + 1 + log2(N) bits per cache line, which is about 38 bits for S = 2^11 and N = 2^15. Since
each cache line has about 550 bits, this is about 7% overhead.
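As a check on the arithmetic, the expression 2 ∗ log2(S) + 1 + log2(N) can be evaluated directly for S = 2^11 and N = 2^15. The snippet below is only a worked calculation; the 550-bit line size is the approximate figure given in the text and the variable names are illustrative.

```python
from math import log2

S = 2 ** 11          # number of cache sets
N = 2 ** 15          # number of cache lines (and iTable entries)
LINE_BITS = 550      # approximate bits per cache line, as stated in the text

extra_bits = 2 * log2(S) + 1 + log2(N)
overhead = extra_bits / LINE_BITS
print(f"{extra_bits:.0f} extra bits per line, {overhead:.1%} overhead")
# prints: 38 extra bits per line, 6.9% overhead
```

The result, 38 bits or roughly 7% of a 550-bit line, matches the overhead percentage quoted above.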
6.1.4 Attack on TLDR-DE. The attack on TLDR-DE needs to find a conflict set before the DE randomization
changes. The attack can be carried out using the advanced attack strategies that we discussed in §4.3.2, where the conflict
set can be reduced very quickly, even before the encryption function can change. We consider one particular example
for convenience. In this example, we create a conflict group. Each time there is a cache miss, we learn a new member of
the conflict set inside the conflict group. Thus in k · N cache accesses, it is possible to learn all the cache lines of the
conflict set. The epoch length is about 2 ∗ N cache accesses, which is not short enough to prevent this attack: the attack
completes in L ∗ w evictions, which can be less than 2 ∗ N. For example, we verified using simulation that we recovered a
conflict set for N = 32768 and w = 16 (typical LLC configuration) in 0.4 ∗ N accesses.
6.2 TLDR-DRP
We consider a second variant of TLDR, which we call SE+DRP. In this variant we re-randomize the second level
mappings between the iTable entries and the cache sets, so that the set mapping of a cache line changes randomly after
each eviction. This achieves the second goal that we discussed in §4.5.1.
6.2.1 The DRP Mechanism. The basic structure of the TLDR-DRP contains an iTable which creates two levels of
mappings, one between the cache line addresses and the iTable entries, and the other layer between the iTable entries
and the cache sets. The iTable entry contains the set mapping. Whenever the cache lines corresponding to the iTable
entry are evicted, then the re-randomization occurs in the iTable entry. Whenever the iTable entry transitions, then we
change the set mapping stored in the iTable entry.
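The re-randomization rule can be sketched as follows. The per-entry `resident` counter is an assumption introduced here to detect when all lines of an entry have been evicted; the text does not specify how this is tracked, and the class and method names are hypothetical.

```python
import random


class DrpITable:
    """Second-level mapping that is re-drawn once all lines of an entry are evicted."""

    def __init__(self, num_entries: int, num_sets: int, seed: int = 0):
        self.rng = random.Random(seed)
        self.num_sets = num_sets
        self.set_mapping = [self.rng.randrange(num_sets) for _ in range(num_entries)]
        self.resident = [0] * num_entries  # cached lines per entry (assumed bookkeeping)

    def on_fill(self, entry: int) -> int:
        """Record a fill through this entry and return the set to fill into."""
        self.resident[entry] += 1
        return self.set_mapping[entry]

    def on_evict(self, entry: int) -> None:
        """Record an eviction; re-randomize the mapping when no line depends on it."""
        self.resident[entry] -= 1
        if self.resident[entry] == 0:
            self.set_mapping[entry] = self.rng.randrange(self.num_sets)


table = DrpITable(num_entries=4, num_sets=2048)
old_set = table.on_fill(2)
table.on_evict(2)  # last resident line gone, so entry 2 gets a fresh random set
print(old_set, table.set_mapping[2])
```

The mapping is only changed when no cached line still depends on it, so no cache lines need to be moved when an entry is re-randomized.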
Defending Against Oversubscription Attacks : An extra component we add is a victim buffer, which holds the cache
lines which may be used in an oversubscription attack on the cache. Whenever an entry of the iTable is oversubscribed,
then the extra cache lines are sent into the victim buffer. This helps increase the attack time until the buffer has become
full. The larger the buffer size, the greater the time to do the attacks. We discuss these oversubscription attacks in more
detail in the following.
/* Cache and iTable Storage. N is the number of cache lines, S is the number of cache sets,
and w is set associativity */
1 iTable[N]{
2 refresh; // refresh bit
3 set_mapping; // set mapping
4 }
5 cache[S][w];
/* Pseudocode for the DE part */
6 check_buffer(lAddr, buf_hit, data);
7 if buf_hit then
8 return data;
9 end
/* Key Select Operation uses current key DE1() and target key DE2() */
10 iIdx1 = DE1(lAddr); // get iTable entry iIdx1
11 iIdx2 = DE2(lAddr); // get iTable entry iIdx2
12 r1 = iTable[iIdx1].refresh; // get refresh bit r1
13 S1 = iTable[iIdx1].set_mapping; // get set mapping S1
14 r2 = iTable[iIdx2].refresh; // get refresh bit r2
15 S2 = iTable[iIdx2].set_mapping; // get set mapping S2
/* Check cache hit in set indicated by iTable */
16 if r1 then
17 check_tags(cache, S1, tag_hit, data) ; // access S1
18 else
19 check_tags(cache, S2, tag_hit, data) ; // access S2
20 end
/* Pseudocode for the DRP part */
21 if tag_hit then
22 return data ; // tag hit, we are done
23 else
24 data = access_mem( lAddr ) ; // tag miss, get data from memory
25 repl(cache, S, repl_ways, oversubscribe) ; // invoke replacement policy calculation
26 if oversubscribe then
27 use_buffer(data, lAddr) ; // use buffer to store oversubscribed cache line
28 else
29 refr_idx = evict(cache, repl_ways) ; // evict the cache line(s) selected by repl(); returns the corresponding iTable entry refr_idx
30 fill_set(cache, data, S, lAddr) ; // Put the data into the cache set
31 refresh(iTable, refr_idx) ; // refresh the iTable entry
32 end
33 end
Algorithm 1: Pseudocode for DE+DRP
6.2.2 Attack on TLDR-DRP. It is possible to attack the TLDR-DRP scheme because the first level randomization is
static. We call this attack the oversubscription attack. The key idea is to load a conflict group in which all cache lines map
to the same iTable entry as the target cache line. This means that these cache lines will always map to the same cache
set as the target. We can gradually build up the SCG. We load the target cache line L and a candidate cache line C. Then, we
load random cache lines, monitoring whether both the target L and the candidate C are evicted simultaneously. If so, then C is
added to the SCG. We repeat until the SCG grows to size w.
6.3 TLDR-DE+DRP
We use both the DE and DRP mechanisms discussed previously in order to achieve both the goals of randomization that we
stated in §4.5.1. Since there are many components now, we use pseudocode to explain this more complex scheme.
6.3.1 PseudoCode. We present the pseudocode for DE+DRP in Algorithm 1, which contains all the mechanisms we
have discussed so far. When the cache is accessed, the first step is to access the victim buffer (lines 6-9). This
victim buffer is required to prevent the oversubscription attacks that we discussed in §6.2.2. If the cache line is not in the victim
buffer, then the next step is to check in the cache. In order to access the cache, the first step is the optimized key select
mechanism that we discussed in §6.1.2 (lines 10-15). The key select decides which iTable entry needs to be accessed.
Once we determine this, we access the cache set mapped by the iTable entry and check the tags in
that set for a tag hit (lines 16-20). If the cache set contains the tag, then we are done. Otherwise, it is a tag miss,
hence the cache line is fetched from main memory (line 24). Next, the replacement policy
that we discussed in §6.1.3 decides which cache lines need to be evicted (line 25). Once the eviction is done, the
cache line is filled either into the cache set or into the victim buffer, based on whether the cache set was oversubscribed
or not (lines 26-31). Lastly, the corresponding iTable entry is refreshed (line 31). We have omitted the pseudocode for the
cleaner mechanism to keep our description short. There can be many ways to implement it; our implementation issues
an extra access by the cleaner after each normal access during the second half of the epoch.
7 SECURITY ANALYSIS
The criterion for an attack to succeed is to form an SCG that evicts the target cache line. We try all the attacks on
the previous schemes and show that they do not work on DE+DRP. The first kind of attack is the basic attack (see
§4.1.1), which tries to reduce the conflict group to an SCG. Each time there is a cache eviction, the target cache
line is re-randomized to a different cache set, with only a 1/S chance to map back to the same cache set. It is virtually
impossible to reduce the conflict group beyond a few cache lines.
The builder attack (§4.4.2) will learn one cache line at a time, and add it to the SCG. However, after each eviction, the
target cache line will map to a random cache set. Thus, the conflict group needs to become comparatively larger in
order to attack the cache. We can analyze how many cache lines are needed to evict the target cache line. Intuitively,
each member of the SCG will map to a random set, therefore, it will require many cache lines to evict the target cache
line. Therefore, our scheme significantly increases the security of the cache compared to the baseline.
Using simulation, we recorded the total time to evict the targets using SCGs of different sizes. We found that we
need about 1000 cache lines for a 2% chance to evict the target cache lines, and for a large probability like 50%, it will
require an even greater SCG size, closer to N cache lines. Thus, our strategy increases the size of the conflict group by
an order of magnitude compared to the existing strategy, which only requires about 100 cache lines for a 70% chance to
evict the target in the default configuration (2 skews), and requires a larger number of skews (about 64) to get it down
to 2%. This is much larger than the size required to mount the attack.
Analysis : To simplify the analysis, we model the conflict group as a random set of cache lines, of size g. The cache
configuration is also randomly chosen. Assuming that the target cache line is in a random cache location, each SCG member has
approximately a 1/N chance to evict it. Thus, the chance that the target cache line survives each access is
about (1 − 1/N). As we load the entire SCG, the chance that the target cache line survives decreases exponentially with
the SCG size, to (1 − 1/N)^g. Substituting a group size of about N yields about a 1/e chance that the target survives, i.e. about a 63% chance to evict it. If
we cut down the SCG size to about 1000 cache lines, then the probability to evict the target falls to about 3%. These numbers
are in agreement with our simulation results. The analysis we have performed is exactly the same as what we may do
for a fully associative cache, which has truly random set mappings.
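The survival model can be evaluated numerically. The sketch below assumes N = 32768 cache lines, the LLC size used elsewhere in the paper, and computes the eviction probability as one minus the survival probability (1 − 1/N)^g; the function name is hypothetical.

```python
def evict_probability(g: int, n: int) -> float:
    """Chance that at least one of g random lines displaces the target in an n-line cache."""
    return 1.0 - (1.0 - 1.0 / n) ** g


N = 32768
for g in (1000, N):
    print(f"g={g}: P(evict) = {evict_probability(g, N):.3f}")
```

With g = 1000 the eviction probability is about 0.03, and with g = N it is about 0.63, which corresponds to a survival probability of roughly 1/e.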
7.1 Oversubscription Attack and Defense for the iTable
Instead of directly attacking the cache, the attacker can target the iTable entries instead, by oversubscribing the iTable
entries. This is an attack similar to the one described in §6.2.2, where we tried to oversubscribe one of the iTable entries
in order to attack the cache. In order to carry out this attack, the attacker tries to learn iTable entries that map to the
same set as the target cache line. First, the attacker loads a conflict group, which contains an SCG oversubscribing one
of the iTable entries. Next the attacker tries to reduce the size of the conflict group one cache line at a time in order to
learn the SCG. Therefore, in order to prevent this type of attack, our epoch needs to complete fast enough, so that
the SCG gets scattered before the attacker can discover it. We found using simulation that an epoch of length 2 ∗ N is
short enough to prevent the iTable entries from being oversubscribed completely.
Analysis : We estimate the probability that a particular iTable entry is oversubscribed given a particular epoch length.
The accesses to each entry of the iTable are modelled as a Poisson distribution in which cache lines are evenly spread
over all the iTable entries. The Poisson distribution is given by:
$$P(k) = \frac{e^{-\lambda}\,\lambda^{k}}{k!}$$
where λ is the load and k is the number of cache lines mapped to an entry. The load on the iTable entries refers to the number of cache lines which are mapped by the iTable
entries. The total iTable load is quite low, because only a small number of cache lines are mapped to each iTable entry.
The epoch length is 2 ∗ N cache lines, therefore, we consider the average load on an entry using 2 ∗ N cache lines (i.e. 2)
for our calculations. We set λ = 2 and k = 9, in order to oversubscribe the iTable entry (for cache set associativity
w = 8), and multiply the resultant probability by the number of iTable entries, to find out how many iTable entries
might have 9 mapped cache lines, which is about 16. Since all these cache lines are stored inside the victim buffer,
it is impossible to oversubscribe a single entry of the iTable. Using simulations, we observed at most 11 cache lines
oversubscribed per epoch for the benchmarks. Therefore, we use a victim buffer size of 32 (2x) to guarantee that we
always have room for cache lines which oversubscribe the iTable entry.
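The Poisson estimate can be reproduced with a few helper functions. This is a minimal sketch: the function names are hypothetical, and the entry count is left as a parameter because the text does not spell out exactly which count the probability is multiplied by, or whether the exact value k = 9 or the tail k ≥ 9 is intended.

```python
from math import exp, factorial


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson distribution with mean lam."""
    return exp(-lam) * lam ** k / factorial(k)


def poisson_tail(k: int, lam: float, terms: int = 50) -> float:
    """P(X >= k), summed over a truncated range of the pmf."""
    return sum(poisson_pmf(j, lam) for j in range(k, k + terms))


def expected_oversubscribed(num_entries: int, lam: float, k: int) -> float:
    """Expected number of iTable entries holding at least k cache lines."""
    return num_entries * poisson_tail(k, lam)


print(f"P(X = 9 | lambda = 2)  = {poisson_pmf(9, 2.0):.3e}")
print(f"P(X >= 9 | lambda = 2) = {poisson_tail(9, 2.0):.3e}")
```

Multiplying either probability by the chosen number of entries, via `expected_oversubscribed`, gives the expected number of oversubscribed entries, which is the quantity used to size the victim buffer.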
8 MISCELLANEOUS DISCUSSION
We discuss some miscellaneous issues for a real implementation of the scheme, which we organize into three categories,
the LLC pipeline, the replacement policy implementation and other system level aspects.
Pipelining : Modern caches are pipelined for high performance. The LLC pipeline needs an extra stage in order to
integrate the iTable into it. Thus, the total access latency of the cache increases by one pipeline stage. Using modern
processors as a reference [2, 3], this requires about 3-7 cycles of extra access time. In the common case, the iTable stage
rarely causes pipeline hazards. Thus we conservatively model the delay using a 10 cycle additional latency, which
would allow 5 cycles for the hash function and 5 cycles for the iTable pipeline stage. An even shorter delay is possible
using a lower latency hash function or a more optimized design.
Replacement Policy : The replacement policy needs to check the iTable index of each cache line (stored per cache
line) and select one of them for eviction. This is similar to other high performance replacement schemes [20, 53], except
that they check a smaller number of bits. It is possible to efficiently implement this, by only checking the last few LSB
bits for example and checking the remaining bits if necessary. The storage overhead of the iTable index bits required
for replacement decision is discussed in §9.4.
Other System Level Aspects : Our design is independent of the rest of the system and does not require many changes.
There is no change in the coherence protocol, including multi-socket coherence, or the way memory consistency is
handled by the cache. Context switches and TLB shootdowns do not require participation from the cache because
it is the LLC, which is physically tagged and indexed. There is no change required in SMT or core design. Two
level randomization can even be implemented for directories, which have a similar set based structure as caches.
Hardware/Software prefetchers require no modification.
9 EVALUATION
We simulated the SPECrate 2017 and PARSEC 3.0 benchmarks using five configurations: (1) the baseline LLC, which is
an inclusive set associative cache, (2) DE+DRP, (3) DE+DRP with prefetching enabled (DE+DRP+pref), (4) CEASER, and
(5) CEASER-S. The results for the latter four configurations are normalized against the baseline. The LLC bank has
the same configuration as the example we used throughout the paper, i.e. an iTable size of 2^15 and an epoch size of
2^16 evictions. To account for the additional latency of the iTable access and the encryption, we added 10 cycles to the
baseline LLC access time to model the randomized LLC access time (see §8). The two key results we study are the
instructions per cycle (IPC) and misses per kilo instructions (MPKI). We study total execution time instead of IPC for the PARSEC
benchmarks, because they have a variable number of instructions. The IPC is normalized by dividing by the baseline IPC.
The ∆ MPKI is obtained by subtracting the MPKI of the baseline from the MPKI of the evaluated configuration.
9.1 Single Core Performance Results
Each SPECrate benchmark is simulated for a representative interval [46] of 1 billion instructions using the ‘ref’ inputs,
whereas the PARSEC benchmarks are simulated in their entirety using the ‘simmedium’ inputs. We simulate 21/23
SPECrate benchmarks and 11/13 PARSEC 3.0 benchmarks. We had runtime errors with roms and bwaves, and do not have
results for these benchmarks. Each simulation was performed five times (due to the random nature of the access
patterns), and the average result is presented. The simulated configuration uses an out-of-order processor with a 32KB L1
cache, a 256KB L2 cache and 2MB L3 cache banks per core. The prefetcher is a strided prefetcher that trains on L2 accesses
and prefetches into L2 and L3.
9.1.1 CEASER and CEASER-S. We simulated the performance for CEASER using an epoch length of 0.1 ∗ N accesses. This
was required in order to mitigate the fastest attack, which found a conflict set in less than N accesses, using the strategy
in §4.3.1. The performance fell by over 20% on average for the SPECrate 2017 benchmarks. The maximum decrease in
performance was for gcc, mcf, wrf and xalan, due to the large increase in the MPKI. Some benchmarks had a significant
drop in performance even without a large absolute increase in the MPKI; this may be because the relative increase in
MPKI is very large. For example, the MPKI increase for imagick is only
about 4, but the absolute MPKI for the baseline was only about 0.017, thus the relative MPKI increase was close to 200x.
CEASER-S : A key configuration is CEASER-S on the baseline 16-way associative cache, where the cache
is divided into two skews. We set the epoch length to N accesses, which is a ten times slower randomization rate than
CEASER, thus we expect the performance to be better. The performance penalty was still about 15% for the SPEC
benchmarks. The maximum penalty we observed was for xalan and wrf, due to a large increase in the MPKI. The
[Figure 1 panels, plot content omitted: normalized IPC and ∆ MPKI for SPECrate 2017; normalized cycles and ∆ MPKI for PARSEC 3.0; normalized IPC and ∆ MPKI across the workload mixes. Series: DE+DRP, DE+DRP+Prefetch, CEASER, CEASER-S.]
Fig. 1. From Top To Bottom, SPECRate 2017 IPC, PARSEC 3.0, MultiProgramming IPC and MPKI
performance was mostly in line with CEASER, but the effect on MPKI was less severe, thus the performance was not as
bad.
9.1.2 DE+DRP. The performance results are normalized against the baseline inclusive cache configuration and shown in
Figure 1. There was a 4% IPC penalty for the randomized cache on average, and a marginal increase in MPKI (< 1%)
for the SPECrate 2017 benchmark suite on a single-core processor. We observe that the benchmarks mcf, wrf and fotonik
benefit significantly from prefetching, due to a large decrease in MPKI (5, 3 and 8 respectively). The other benchmarks
have only a small increase in the MPKI. The primary reason for lower performance is the increased latency of
accessing the randomized LLC. xalan shows the maximum degradation in performance. However, xalan does not show
a significant increase in MPKI, only about 0.24. The significant performance drop is due to a high rate of LLC accesses;
about 10% of all loads are served by the LLC. Thus, the higher access latency has a significant impact on the performance
regardless of the MPKI, so it is worthwhile to design the system with lower LLC latency.
DE+DRP+Prefetch : This configuration showed a small increase in performance on average compared to the baseline,
due to fewer conflict misses. With prefetching enabled, the performance became 2% better than the baseline
on average. The maximum benefit was observed for fotonik (> 10% performance benefit).
9.2 SPECrate 2017 Multiprogramming Performance
We classified the benchmarks into high MPKI (mcf, lbm, parest, cam4, bwaves), medium MPKI (xz, deepsjeng, cactus,
x264, gcc, omnetpp, namd) and low MPKI (pov, exchange, blender, leela, wrf, imagick, fotonik) groups. There are
four groups of 20 workloads each, using benchmarks with different MPKI. MIX-1 has only high MPKI benchmarks.
MIX-2 has both high MPKI and low MPKI benchmarks, whereas MIX-3 has only low MPKI benchmarks. We simulated
the baseline and the DE+DRP configuration for 80 different workload mixes. The bottom subfigure of Figure 1 shows
the IPC and MPKI of the workload mixes relative to the baseline inclusive cache, for a four-core configuration,
ordered by increasing performance. We fast-forwarded 10 billion cycles, then simulated until all benchmarks had
reached 1 billion cycles. We used the weighted IPC metric to measure performance. Most of the workloads show a
small decrease in IPC; the performance reduction was about 3% on average. The increase in MPKI is minimal for
almost all the workloads, which explains the modest decrease in IPC; the average MPKI increase is less than 0.1. Some
of the workloads (such as workload 20) have a very small increase in MPKI, yet a larger decrease in performance. This
may be because a greater proportion of their loads is serviced by the LLC, which incurs the extra access latency.
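This section does not restate how the weighted IPC metric is computed. The sketch below assumes the common weighted-speedup definition (the sum, over co-running benchmarks, of shared-run IPC divided by standalone IPC) and then normalizes a configuration against the baseline. The function names and all IPC values are hypothetical and serve only to illustrate the arithmetic.

```python
def weighted_ipc(ipc_shared, ipc_alone):
    """Sum of per-benchmark IPC ratios (co-running IPC / standalone IPC)."""
    if len(ipc_shared) != len(ipc_alone):
        raise ValueError("need one standalone IPC per co-running benchmark")
    return sum(shared / alone for shared, alone in zip(ipc_shared, ipc_alone))


def normalized_performance(ipc_config, ipc_baseline, ipc_alone):
    """Weighted IPC of a configuration relative to the baseline cache."""
    return weighted_ipc(ipc_config, ipc_alone) / weighted_ipc(ipc_baseline, ipc_alone)


# Hypothetical four-benchmark mix (values are made up for illustration).
alone = [1.2, 0.8, 2.0, 1.5]         # each benchmark running by itself
baseline = [1.0, 0.6, 1.8, 1.2]      # co-running, baseline inclusive cache
randomized = [0.97, 0.58, 1.75, 1.17]  # co-running, randomized cache

print(f"normalized weighted IPC: {normalized_performance(randomized, baseline, alone):.3f}")
```

A value below 1.0 indicates a slowdown relative to the baseline for that mix; sorting the 80 mixes by this value would give an ordering like the one described for Figure 1.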
We also simulated 8-benchmark workload mixes on 8-core processors, where the workloads were created using a
similar strategy as for the 4-core simulation. We observe about 3% performance loss on average, similar to the four-core
simulation result. This is because the random nature of LLC access patterns does not create extra conflict misses when
a larger number of processes run concurrently. For the same reason, we expect the randomization scheme to scale well
to a larger number of cores.
9.3 PARSEC 3.0 Multithreaded Performance Results
We simulated 11 of the 13 PARSEC benchmarks for all four configurations. Due to runtime errors, we do not have
results for swaptions and x264. We simulated a four-core configuration and ran each benchmark to completion using
the simmedium input.
CEASER and CEASER-S : For CEASER, the average performance hit was about 20%, and up to 60% in the case of
streamcluster. This was due to a significant increase in the MPKI of all the benchmarks. For CEASER-S, the performance
hit was a bit lower, about 15% on average, due to the lower rate of re-randomization compared to CEASER. The greatest
performance impact was also for streamcluster, with a 55% performance hit. The main reason was the large increase
in MPKI for streamcluster. Some benchmarks, such as raytrace, also had a significant performance hit even though the
∆ MPKI was small in absolute terms. However, the relative increase in MPKI was quite high for raytrace, about 60x.
DE+DRP : The effect of randomization on performance is again minimal for most benchmarks. On average, we did
not observe a significant difference in execution time or total cache misses. The largest performance degradation was
about 8% for canneal and 7% for streamcluster. This is because both of these benchmarks had a high percentage of
loads serviced by the LLC (about 6% and 5%, respectively), so the extra LLC access latency affected the performance.
DE+DRP+Prefetch : On average, there was a small performance benefit due to prefetching into the LLC. facesim showed
a significant reduction in runtime with prefetching enabled because there was a 3x reduction in LLC load misses. Since
the other benchmarks had a much lower MPKI, with almost all of them being less than 1, there was little change in
the overall MPKI, and LLC accesses were a less important factor for performance than for some of the
SPEC benchmarks, such as mcf and lbm, which have much higher MPKI. 8-core and 16-core configurations also showed
performance similar to the baseline. PARSEC benchmarks also showed about 15% decrease in performance, mainly due to a
significant increase in the MPKI.
9.4 Area and Storage Overhead
We used CACTI 7.0 for modelling the area increase in the cache, and also calculated the storage increase for each
scheme. CEASER and CEASER-S did not have much storage overhead. We also did not model the area overhead of the
encryption circuits, which are only thousands of gates in size, and require a negligible amount of extra area.
CEASER : CEASER requires a significantly higher refresh rate, with an epoch length of 0.1N. Therefore, we suggest
using 10 extra read-write ports to handle the extra traffic. This has a significant area impact due to the extra bitlines
required to read the storage cells. Using CACTI 7.0, we estimate that the area increases by 30x, which is not practical.
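One way to read the port count is as a ratio of relocation traffic to demand traffic. The sketch below assumes that N denotes the number of cache lines and that every line must be relocated once per epoch; neither definition is restated in this section, so both should be treated as assumptions. Under them, an epoch of 0.1N accesses implies N / (0.1N) relocations per demand access, independent of N.

```python
def relocations_per_access(epoch_length_factor: float) -> float:
    """Average line relocations per demand access.

    Assumes all N lines are relocated once per epoch and the epoch lasts
    epoch_length_factor * N accesses, so N cancels out of the ratio.
    """
    if epoch_length_factor <= 0:
        raise ValueError("epoch length factor must be positive")
    return 1.0 / epoch_length_factor


# Epoch length of 0.1 * N accesses, as stated for CEASER in the text.
print(f"relocations per access: {relocations_per_access(0.1):.1f}")
```

Under these assumptions the relocation traffic is roughly ten times the demand traffic, which matches the order of magnitude of the suggested 10 extra read-write ports.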
CEASER-S : There are two key changes in the hardware. Firstly, there are two skews in the cache, which requires
some duplication of the set lookup logic. We expect the resulting area increase to be small and did not model it.
Secondly, we add one extra read-write port to handle the extra cache line movement, which results in an area
increase of about 2x.
DE+DRP : Compared to the baseline cache, the key sources of storage overhead are the iTable (64 KB), the extra tag
bits in the tag array (12 extra tag bits per cache line), and the iTable index bits used by the replacement policy (16 bits
per cache line). The tag size originally was 48 bits, and the cache lines are 64 bytes each. The total storage overhead is
about 7%. The area of the iTable, when implemented as a direct-mapped cache, is very small compared to the tag and
data arrays. The tag array sizes increase significantly compared to the baseline, due to the extra index bits and larger
tag size. However, it is still very small compared to the data array’s size. Finally, the data array size is significantly
smaller when the associativity is decreased. The randomized cache array’s size is about 10% smaller than the baseline
due to lower associativity. Using CACTI 7.0, we estimated that the total area of the tag and data arrays reduced by 2%
for an 8-way associative randomized cache, compared to a 16-way associative baseline inclusive cache. This is because
the increase in area due to the three sources of overhead was offset by the reduction in associativity.
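The storage figures above can be combined in a short calculation. This is a sketch under stated assumptions: the baseline is taken to be data plus tag bits only (replacement and state bits are ignored), and the cache capacity is a hypothetical parameter because this section does not give the LLC size used for the 7% figure. The per-line values (64-byte line, 48-bit tag, 12 extra tag bits, 16 iTable index bits) and the 64 KB iTable are taken from the text.

```python
def storage_overhead(cache_bytes: int,
                     line_bytes: int = 64,
                     base_tag_bits: int = 48,
                     extra_tag_bits: int = 12,
                     itable_index_bits: int = 16,
                     itable_bytes: int = 64 * 1024) -> float:
    """Extra storage as a fraction of baseline data + tag storage."""
    lines = cache_bytes // line_bytes
    baseline_bits = lines * (line_bytes * 8 + base_tag_bits)
    extra_bits = lines * (extra_tag_bits + itable_index_bits) + itable_bytes * 8
    return extra_bits / baseline_bits


# Capacities below are hypothetical; only the per-line and iTable sizes come from the text.
for mib in (2, 4, 8):
    overhead = storage_overhead(mib * 1024 * 1024)
    print(f"{mib} MiB cache: {overhead:.1%} extra storage")
```

The per-line part is (12 + 16) / (512 + 48) = 5% regardless of capacity, while the iTable's share shrinks as the cache grows, so the total depends on the cache size that is assumed.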
9.5 Energy Consumption
We used CACTI 7.0 to estimate the energy cost of each cache access.
CEASER and CEASER-S : The energy cost of CEASER increases due to the extra encryption and decryption. CEASER-S
has some additional energy overhead due to the duplicated set lookup logic for each skew; however, we expect this
to be quite small compared to the tag and data lookup. For both these schemes, the increase in energy is due to the
larger number of cache accesses (caused by cache misses) and the longer runtime, not the cost of each cache access.
For CEASER, we expect about 10x dynamic energy cost due to the higher refresh rate, and for CEASER-S about 2x,
due to its lower refresh rate compared to CEASER.
DE+DRP : There are two key sources of dynamic energy overhead. Firstly, there are two iTable accesses per cache
hit, and an additional iTable access per cache miss. Secondly, each access requires reading a larger tag compared to
the set-associative cache. Fortunately, these overheads are more than offset by the reduced associativity of the
randomized cache compared to the baseline. The randomized cache is 8-way associative, whereas the baseline is 16-way
associative. Thus, the former has lower dynamic energy consumption when accessing the cache and tag arrays.
On average, for the single-core SPECrate benchmarks, we estimate approximately 4% dynamic energy reduction. For
the four-core PARSEC 3.0 benchmarks, we estimate a 6% reduction in dynamic energy. The leakage energy consumption
increases modestly (< 4%) due to the extra tag bits and iTable storage. For CEASER and CEASER-S, there can be a
significant increase in dynamic energy due to the performance loss, which results in a significantly longer execution
time. In particular, there are a larger number of cache accesses, which results in more energy consumption from the
cache as well. Leakage energy is not affected, because neither scheme requires additional storage. This is a key
advantage of these schemes from the perspective of energy consumption.
9.6 Other Factors that Affect Performance of DE+DRP
Hash Function Latency : In our simulations, we imposed 10 cycles of additional latency for each LLC access (5 cycles
for the iTable lookup and 5 cycles for encryption), but it is possible to reduce this latency substantially by removing
the encryption from the critical access path, or by using a low-latency cipher. There was negligible degradation in
performance compared to the baseline for zero latency, and a much smaller performance penalty (< 1%) for a latency
of 5 cycles or less.
Associativity : For associativity 4 or more, we did not observe significant degradation in performance. The 8-way
associative cache, which we use in C1, hits a sweet spot in terms of hardware overhead, security and performance.
Cleaner Routine Overhead : The cleaner routine is responsible for carrying out cache line evictions over the course of
the last N accesses of an epoch. The performance effect of the cleaner routine was to increase the contention due to
extra accesses to the cache and to create some extra evictions. This did not have a significant impact on performance,
because in the common case, the cleaner routine access does not cause an eviction in the cache, and only has an extra
iTable access.
Victim Buffer : The victim buffer filled up to a maximum of 11 entries, which is well below its 32-entry capacity. Thus,
the victim buffer works as expected. Using CACTI 7.0, we estimate a negligible increase in power (< 1%) due to the small
buffer size and infrequent data accesses.
10 RELATED WORK
Many software and hardware approaches to defend against cache side-channel attacks exist. The most relevant to our
work are randomization-based defenses.
CEASER [40] uses an OLDR-DE approach to defend the cache. CEASER-S [41] uses a skewed-associative cache in
addition to OLDR-DE to improve the security over CEASER.
NewCache [33] uses a CAM address decoder to implement a fully associative randomization table, which is very
expensive in terms of dynamic power (10x), leakage power (50% extra), and area (6x) for a 2MB LLC bank. It gets worse
for larger cache sizes. Our approach is more scalable, regardless of the number of concurrent processes operating at
the time. Multi-table solutions using the OLDR-DRP approach [51] require different randomization tables for different
security domains, which does not scale to the LLC due to the large storage required (many MBs per bank). The single-table
version of [51] is not secure, as we discussed in §4, due to the oversubscription attack. In contrast, DE+DRP has the
advantage that it is both scalable and secure against attacks.
Path ORAM [47] creates a large set of memory accesses corresponding to each memory access to hide the true access
from the attacker, and has been implemented in hardware [37]. Random-eviction caches randomly evict data from cache
sets to add noise to the attacker’s measurements [22], whereas random-fill caches [32] de-correlate the demand fetches
from the addresses fetched into the cache. However, they do not provide any security guarantee against conflict-based
attacks.
10.0.1 Software Randomization Techniques. Software-based randomization [7] changes the addresses randomly and
probabilistically during runtime. ASLR randomizes the virtual addresses used by the kernel (KASLR) and user space [1].
SGX Shield implements ASLR for secure SGX enclaves [44]. However, ASLR only randomizes the code locations once
when they are loaded, so access patterns may still be learned via the cache side-channel over the course of execution.
DR.SGX [7] probabilistically randomizes the addresses using a software ‘permutation’ cache. Memory trace oblivious
execution algorithms [30] ensure that the same memory trace is generated regardless of the secret, but incur 15x or
more performance overhead, making them impractical for most applications [29, 30, 42].
10.0.2 Cache Partitioning. While cache randomization is effective at defeating cache side channels, partitioning is even
more secure because it isolates the attacker from the victim, ensuring no information leakage and
thus even preventing covert channels. However, partitioning usually does not scale, because sharing of resources is no
longer based on need. Having too many partitions drastically reduces the system performance.
SecDCP [50], DAWG [25], and PLCache [51] use way partitioning to allocate different ways of a set to different security
domains. Since there is never any sharing or interference between these ways, an attack is not possible.
A related work is SHARP [55], which uses the cache replacement policy to protect the contents of the private
caches from cross-core attacks on the last-level cache. The NoMo cache [11] ensures that an attacker cannot monopolize a
cache set and thus limits the observation of the victim’s cache accesses. CATalyst [31] leverages Intel CAT technology
to allocate ‘secure pages’ to virtual machines, which get an exclusive access to some of the ways of each cache set.
StealthMem uses page coloring [24] to partition the cache sets so that secure and insecure processes cannot interfere
with each other. CacheBar [58] uses software partitioning between security domains to disable cache line sharing in the
last level cache.
10.0.3 Miscellaneous. Hardware performance counters can be used to detect unusual cache activity, as done by
HexPADS [39]. CC-Hunter [9] looks for strange access patterns in the microarchitecture to detect attacks.
ReplayConfusion [56] and SHARP [55] can detect attacks by checking for unusual cache access patterns. Noise can be added
to the cache timing to disrupt the attacker’s measurements [18], or the granularity of the timers can be changed [38].
Some cache side-channel defenses designed specifically to defeat speculative execution attacks, such as SafeSpec [23]
and InvisiSpec [54], bypass the cache until speculative loads are safe to commit.
11 CONCLUSION
In this work, we make two key contributions. Firstly, we present new attacks that defeat existing randomization-based
defenses that protect the cache. Secondly, we present a novel two-level randomization scheme for caches, which
defeats side-channel attacks by randomizing the set mappings of the cache lines. A mathematical analysis is presented
to show that conflict-based attacks become impractical to carry out. We also propose a novel design for the
last-level cache that implements this scheme. We simulate real-world PRIME+PROBE attacks on AES and RSA ciphers,
and show that our randomization is effective. We used ZSim, a modern multicore simulator, with the SPEC 2017 and
PARSEC 3.0 benchmarks to evaluate the performance overhead. We used CACTI 7.0 to estimate the area overhead of
our design. Our implementation substantially increases the security of the cache against side-channel attacks, with
very little area and performance overhead.
REFERENCES
[1] [n.d.]. https://en.wikipedia.org/wiki/Address_space_layout_randomization .
[2] [n.d.]. https://www.7-cpu.com/cpu/Broadwell.html.
[3] [n.d.]. https://www.7-cpu.com/cpu/Skylake.html.
[4] Rajeev Balasubramonian, Andrew B Kahng, Naveen Muralimanohar, Ali Shafiee, and Vaishnav Srinivas. 2017. CACTI 7: New tools for interconnect
exploration in innovative off-chip memories. ACM Transactions on Architecture and Code Optimization (TACO) 14, 2 (2017), 14.
[5] Daniel J Bernstein. 2005. Cache-timing attacks on AES. (2005).
[6] Joseph Bonneau and Ilya Mironov. 2006. Cache-collision timing attacks against AES. In International Workshop on Cryptographic Hardware and
Embedded Systems. Springer, 201–215.
[7] Ferdinand Brasser, Srdjan Capkun, Alexandra Dmitrienko, Tommaso Frassetto, Kari Kostiainen, Urs Müller, and Ahmad-Reza Sadeghi. 2017. DR.
SGX: Hardening SGX Enclaves against Cache Attacks with Data Location Randomization. arXiv preprint arXiv:1709.09917 (2017).
[8] Ferdinand Brasser, Urs Müller, Alexandra Dmitrienko, Kari Kostiainen, Srdjan Capkun, and Ahmad-Reza Sadeghi. [n.d.]. Software grand exposure:
SGX cache attacks are practical. ([n. d.]).
[9] Jie Chen and Guru Venkataramani. 2014. CC-Hunter: Uncovering covert timing channels on shared processor hardware. In Microarchitecture (MICRO),
2014 47th Annual IEEE/ACM International Symposium on. IEEE, 216–228.
[10] Craig Disselkoen, David Kohlbrenner, Leo Porter, and Dean Tullsen. 2017. Prime+ abort: A timer-free high-precision l3 cache attack using intel TSX.
In 26th USENIX Security Symposium (USENIX Security 17),(Vancouver, BC). 51–67.
[11] Leonid Domnitser, Aamer Jaleel, Jason Loew, Nael Abu-Ghazaleh, and Dmitry Ponomarev. 2012. Non-monopolizable caches: Low-complexity
mitigation of cache side channel attacks. ACM Transactions on Architecture and Code Optimization (TACO) 8, 4 (2012), 35.
[12] Dmitry Evtyushkin, Ryan Riley, Nael Abu-Ghazaleh, Dmitry Ponomarev, et al. 2018. BranchScope: A New Side-Channel Attack on Directional
Branch Predictor. In Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating
Systems. ACM, 693–707.
[13] Karine Gandolfi, Christophe Mourtel, and Francis Olivier. 2001. Electromagnetic analysis: Concrete results. In International Workshop on Cryptographic
Hardware and Embedded Systems. Springer, 251–261.
[14] Ben Gras, Kaveh Razavi, Herbert Bos, and Cristiano Giuffrida. [n.d.]. Translation Leak-aside Buffer: Defeating Cache Side-channel Protections with
TLB Attacks.
[15] Daniel Gruss, Clémentine Maurice, Klaus Wagner, and Stefan Mangard. 2016. Flush+ Flush: a fast and stealthy cache attack. In International
Conference on Detection of Intrusions and Malware, and Vulnerability Assessment. Springer, 279–299.
[16] Daniel Gruss, Raphael Spreitzer, and Stefan Mangard. 2015. Cache Template Attacks: Automating Attacks on Inclusive Last-Level Caches.. In
USENIX Security Symposium. 897–912.
[17] Zecheng He and Ruby B Lee. 2017. How secure is your cache against side-channel attacks?. In Proceedings of the 50th Annual IEEE/ACM International
Symposium on Microarchitecture. ACM, 341–353.
[18] Wei-Ming Hu. 1992. Reducing timing channels with fuzzy time. Journal of computer security 1, 3-4 (1992), 233–254.
[19] Michael Hutter and Jörn-Marc Schmidt. 2013. The temperature side channel and heating fault attacks. In International Conference on Smart Card
Research and Advanced Applications. Springer, 219–235.
[20] Aamer Jaleel, Kevin B Theobald, Simon C Steely Jr, and Joel Emer. 2010. High performance cache replacement using re-reference interval prediction
(RRIP). In ACM SIGARCH Computer Architecture News, Vol. 38. ACM, 60–71.
[21] Mehmet Kayaalp, Nael Abu-Ghazaleh, Dmitry Ponomarev, and Aamer Jaleel. 2016. A high-resolution side-channel attack on last-level cache. In
Proceedings of the 53rd Annual Design Automation Conference. ACM, 72.
[22] Georgios Keramidas, Alexandros Antonopoulos, Dimitrios N Serpanos, and Stefanos Kaxiras. 2008. Non deterministic caches: A simple and effective
defense against side channel attacks. Design Automation for Embedded Systems 12, 3 (2008), 221–230.
[23] Khaled N Khasawneh, Esmaeil Mohammadian Koruyeh, Chengyu Song, Dmitry Evtyushkin, Dmitry Ponomarev, and Nael Abu-Ghazaleh. 2018.
SafeSpec: Banishing the Spectre of a Meltdown with Leakage-Free Speculation. arXiv preprint arXiv:1806.05179 (2018).
[24] Taesoo Kim, Marcus Peinado, and Gloria Mainar-Ruiz. 2012. STEALTHMEM: System-Level Protection Against Cache-Based Side Channel Attacks
in the Cloud.. In USENIX Security symposium. 189–204.
[25] Vladimir Kiriansky, Ilia Lebedev, Saman Amarasinghe, Srinivas Devadas, and Joel Emer. [n.d.]. DAWG: A Defense Against Cache Timing Attacks in
Speculative Execution Processors. ([n. d.]).
[26] Paul Kocher, Daniel Genkin, Daniel Gruss, Werner Haas, Mike Hamburg, Moritz Lipp, Stefan Mangard, Thomas Prescher, Michael Schwarz, and
Yuval Yarom. 2018. Spectre Attacks: Exploiting Speculative Execution. arXiv preprint arXiv:1801.01203 (2018).
[27] Esmaeil Mohammadian Koruyeh, Khaled Khasawneh, Chengyu Song, and Nael Abu-Ghazaleh. 2018. Spectre Returns! Speculation Attacks using the
Return Stack Buffer. In 12th USENIX Workshop on Offensive Technologies (WOOT 18). USENIX Association.
[28] Moritz Lipp, Michael Schwarz, Daniel Gruss, Thomas Prescher, Werner Haas, Stefan Mangard, Paul Kocher, Daniel Genkin, Yuval Yarom, and Mike
Hamburg. 2018. Meltdown. arXiv preprint arXiv:1801.01207 (2018).
[29] Chang Liu, Austin Harris, Martin Maas, Michael Hicks, Mohit Tiwari, and Elaine Shi. [n.d.]. GhostRider: A Hardware-Software System for Memory
Trace Oblivious Computation. ([n. d.]).
[30] Chang Liu, Michael Hicks, and Elaine Shi. 2013. Memory trace oblivious program execution. In Computer Security Foundations Symposium (CSF),
2013 IEEE 26th. IEEE, 51–65.
[31] Fangfei Liu, Qian Ge, Yuval Yarom, Frank Mckeen, Carlos Rozas, Gernot Heiser, and Ruby B Lee. 2016. Catalyst: Defeating last-level cache side
channel attacks in cloud computing. In High Performance Computer Architecture (HPCA), 2016 IEEE International Symposium on. IEEE, 406–418.
[32] Fangfei Liu and Ruby B Lee. 2014. Random fill cache architecture. In Microarchitecture (MICRO), 2014 47th Annual IEEE/ACM International Symposium
on. IEEE, 203–215.
[33] Fangfei Liu, Hao Wu, Kenneth Mai, and Ruby B Lee. 2016. Newcache: Secure cache architecture thwarting cache side-channel attacks. IEEE Micro
36, 5 (2016), 8–16.
[34] Fangfei Liu, Yuval Yarom, Qian Ge, Gernot Heiser, and Ruby B Lee. 2015. Last-level cache side-channel attacks are practical. In Security and Privacy
(SP), 2015 IEEE Symposium on. IEEE, 605–622.
[35] Jake Longo, Elke De Mulder, Dan Page, and Michael Tunstall. 2015. SoC it to EM: electromagnetic side-channel attacks on a complex system-on-chip.
In International Workshop on Cryptographic Hardware and Embedded Systems. Springer, 620–640.
[36] Yangdi Lyu and Prabhat Mishra. 2018. A Survey of Side-Channel Attacks on Caches and Countermeasures. Journal of Hardware and Systems Security
2, 1 (2018), 33–50.
[37] Martin Maas, Eric Love, Emil Stefanov, Mohit Tiwari, Elaine Shi, Krste Asanovic, John Kubiatowicz, and Dawn Song. 2013. Phantom: Practical
oblivious computation in a secure processor. In Proceedings of the 2013 ACM SIGSAC conference on Computer & communications security. ACM,
311–324.
[38] Robert Martin, John Demme, and Simha Sethumadhavan. 2012. TimeWarp: rethinking timekeeping and performance monitoring mechanisms to
mitigate side-channel attacks. ACM SIGARCH Computer Architecture News 40, 3 (2012), 118–129.
[39] Mathias Payer. 2016. HexPADS: a platform to detect “stealth” attacks. In International Symposium on Engineering Secure Software and Systems.
Springer, 138–154.
[40] Moinuddin K Qureshi. 2018. CEASER: Mitigating Conflict-Based Cache Attacks via Encrypted-Address and Remapping. In 2018 51st Annual
IEEE/ACM International Symposium on Microarchitecture (MICRO). IEEE, 775–787.
[41] Moinuddin K Qureshi. 2019. New attacks and defense for encrypted-address cache. In Proceedings of the 46th International Symposium on Computer
Architecture. ACM, 360–371.
[42] Ashay Rane, Calvin Lin, and Mohit Tiwari. [n.d.]. Raccoon: Closing Digital Side-Channels through Obfuscated Execution.
[43] Daniel Sanchez and Christos Kozyrakis. 2013. ZSim: Fast and accurate microarchitectural simulation of thousand-core systems. In ACM SIGARCH
Computer architecture news, Vol. 41. ACM, 475–486.
[44] Jaebaek Seo, Byoungyoung Lee, Seong Min Kim, Ming-Wei Shih, Insik Shin, Dongsu Han, and Taesoo Kim. 2017. SGX-Shield: Enabling Address
Space Layout Randomization for SGX Programs.
[45] Adi Shamir and Eran Tromer. 2004. Acoustic cryptanalysis. Presentation available from http://www.wisdom.weizmann.ac.il/~tromer (2004).
[46] Timothy Sherwood, Erez Perelman, and Brad Calder. 2001. Basic block distribution analysis to find periodic behavior and simulation points in
applications. In Parallel Architectures and Compilation Techniques, 2001. Proceedings. 2001 International Conference on. IEEE, 3–14.
[47] Emil Stefanov, Marten Van Dijk, Elaine Shi, Christopher Fletcher, Ling Ren, Xiangyao Yu, and Srinivas Devadas. 2013. Path ORAM: an extremely
simple oblivious RAM protocol. In Proceedings of the 2013 ACM SIGSAC conference on Computer & communications security. ACM, 299–310.
[48] Jo Van Bulck, Marina Minkin, Ofir Weisse, Daniel Genkin, Baris Kasikci, Frank Piessens, Mark Silberstein, Thomas F. Wenisch, Yuval Yarom, and
Raoul Strackx. 2018. Foreshadow: Extracting the Keys to the Intel SGX Kingdom with Transient Out-of-Order Execution. In Proceedings of the 27th
USENIX Security Symposium. USENIX Association. See also technical report Foreshadow-NG [52] (https://foreshadowattack.eu).
[49] Jo Van Bulck, Nico Weichbrodt, Rüdiger Kapitza, Frank Piessens, and Raoul Strackx. 2017. Telling your secrets without page faults: Stealthy page
table-based attacks on enclaved execution. In 26th USENIX Security Symposium (USENIX Security 17). 1041–1056.
[50] Yao Wang, Andrew Ferraiuolo, Danfeng Zhang, Andrew C Myers, and G Edward Suh. 2016. SecDCP: secure dynamic cache partitioning for efficient
timing channel protection. In Design Automation Conference (DAC), 2016 53nd ACM/EDAC/IEEE. IEEE, 1–6.
[51] Zhenghong Wang and Ruby B Lee. 2007. New cache designs for thwarting software cache-based side channel attacks. In ACM SIGARCH Computer
Architecture News, Vol. 35. ACM, 494–505.
[52] Ofir Weisse, Jo Van Bulck, Marina Minkin, Daniel Genkin, Baris Kasikci, Frank Piessens, Mark Silberstein, Raoul Strackx, Thomas F. Wenisch, and
Yuval Yarom. 2018. Foreshadow-NG: Breaking the Virtual Memory Abstraction with Transient Out-of-Order Execution. Technical report (2018). See
also USENIX Security paper Foreshadow [48] (https://foreshadowattack.eu).
[53] Carole-Jean Wu, Aamer Jaleel, Will Hasenplaugh, Margaret Martonosi, Simon C Steely Jr, and Joel Emer. 2011. SHiP: Signature-based hit predictor
for high performance caching. In Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture. ACM, 430–441.
[54] Mengjia Yan, Jiho Choi, Dimitrios Skarlatos, Adam Morrison, Christopher W Fletcher, and Josep Torrellas. 2018. InvisiSpec: Making Speculative
Execution Invisible in the Cache Hierarchy. In Proceedings of the 51th International Symposium on Microarchitecture (MICRO’18).
[55] Mengjia Yan, Bhargava Gopireddy, Thomas Shull, and Josep Torrellas. 2017. Secure Hierarchy-Aware Cache Replacement Policy (SHARP): Defending
Against Cache-Based Side Channel Attacks. In Proceedings of the 44th Annual International Symposium on Computer Architecture. ACM, 347–360.
[56] Mengjia Yan, Yasser Shalabi, and Josep Torrellas. 2016. ReplayConfusion: detecting cache-based covert channel attacks using record and replay. In
The 49th Annual IEEE/ACM International Symposium on Microarchitecture. IEEE Press, 39.
[57] Yuval Yarom and Katrina Falkner. 2014. FLUSH+ RELOAD: A High Resolution, Low Noise, L3 Cache Side-Channel Attack.. In USENIX Security
Symposium. 719–732.
[58] Ziqiao Zhou, Michael K Reiter, and Yinqian Zhang. 2016. A software approach to defeating side channels in last-level caches. In Proceedings of the
2016 ACM SIGSAC Conference on Computer and Communications Security. ACM, 871–882.
