Advanced profiling for probabilistic Prime+Probe attacks
and covert channels in ScatterCache
Antoon Purnal and Ingrid Verbauwhede
imec-COSIC, KU Leuven, Belgium
{firstname.lastname}@esat.kuleuven.be
Abstract. Timing channels in cache hierarchies are an important enabler in many microar-
chitectural attacks. ScatterCache (USENIX 2019) is a protected cache architecture that
randomizes the address-to-index mapping with a keyed cryptographic function, aiming to
thwart the usage of cache-based timing channels in microarchitectural attacks. In this note,
we advance the understanding of the security of ScatterCache by outlining two attacks
in the noise-free case, i.e. matching the assumptions in the original analysis. As a first con-
tribution, we present more efficient eviction set profiling, reducing the required number of
observable victim accesses (and hence profiling runtime) by several orders of magnitude. For
instance, to construct a reliable eviction set in an 8-way set associative cache with 11 index
bits, we relax victim access requirements from approximately $2^{25}$ to less than $2^{10}$. As a second
contribution, we demonstrate covert channel profiling and transmission in probabilistic
caches like ScatterCache. By exploiting arbitrary collisions instead of targeted ones, our
approach significantly outperforms known covert channels (e.g. full-cache eviction).
1 Introduction
As an essential part of modern-day computing, caches hide the ever-growing latency gap
between the CPU and main memory technology by exploiting locality of memory accesses.
Inherent to the operation of caches, some accesses are fast and some are slow. Resulting
from this fundamental timing side-channel, caches have been used as important building
blocks in many micro-architectural attacks, ranging from attacks on cryptographic im-
plementations [1, 2] to transient execution attacks like Spectre [3] and Meltdown [4].
Often, the term side channel is used when an attacker uses the channel to spy on a non-
cooperating victim (as in attacks on AES implementations), whereas a covert channel
denotes a channel that is used deliberately by communicating parties (as in Meltdown).
Cache attacks. Existing cache attacks can largely be classified in two categories. Removal-
based cache attacks, like Flush+Reload [5] and derivatives [6], can infer memory access
patterns at the granularity of cache lines but require shared memory between attacker and
victim. Contention-based cache attacks like Prime+Probe [2, 7] and Evict+Time [2] are
more coarse-grained, but only rely on the shared nature of the cache. Because this enabler
is readily attained in practice, contention-based attacks constitute a powerful threat.
Protected cache architectures. To mitigate cache attacks, the cache hardware archi-
tecture can be augmented, giving rise to protected cache architectures. As a promising line
of work, randomized cache architectures [8, 9, 10, 11] focus on randomizing the otherwise
predictable mapping of memory addresses to cache sets, raising the effort for cache attacks
that rely on eviction sets (like Prime+Probe). In this context, ScatterCache [11] is a
recent contribution that achieves this randomization with a key-dependent cryptographic
mapping. Moreover, its mapping function depends on the security domain, and separately
indexes the different cache ways to increase the perceived number of cache sets.
arXiv:1908.03383v1 [cs.CR] 9 Aug 2019
Contributions. In this note, we challenge and complement the existing security analysis
of ScatterCache with two main observations.
1) More efficient Prime+Probe profiling. The total profiling runtime is largely de-
termined by the required number of observable victim accesses. We generalize the
approach for profiling eviction sets in ScatterCache, significantly reducing the required
victim accesses. For instance, to construct a reliable eviction set in a noise-free
8-way set associative cache with 11 index bits, our attack accomplishes a reduction
from approximately $2^{25}$ to less than $2^{10}$ victim accesses.
2) Covert channels. While ScatterCache has the potential to thwart covert-channel
cache attacks by significantly lowering the channel capacity, the designers do not explic-
itly or quantitatively consider this extended attacker model. We demonstrate a suitable
approach for constructing and exploiting covert channels, yielding a covert channel that
interpolates between full cache eviction [12] and Prime+Probe on traditional
caches.
2 ScatterCache preliminaries
Proposed at USENIX 2019, ScatterCache [11] is a promising contribution in the con-
text of protected cache architectures. This section concisely introduces its key principles. It
is not intended to present a comprehensive overview of protected cache architectures, nor
does it aspire to be a complete description of ScatterCache. Its purpose lies in delin-
eating the scope of this document and providing the necessary preliminaries to understand
the assumptions and attacks in the sections that follow.
As a randomized cache architecture, ScatterCache replaces the predictable mapping
from memory addresses to cache set indices by a pseudorandom mapping. The indexing
into cache sets is performed by the Index Derivation Function (IDF). This function is
instantiated with a keyed cryptographic primitive, where the key is randomly generated
at system boot.
Figure 1 presents the context in which the IDF is used. The IDF additionally considers
a Security Domain Identifier (SDID) input to the mapping function, differentiating the
mapping for processes belonging to other security domains. Furthermore, by separately
indexing the different cache ways, ScatterCache dynamically composes cache sets based
on the indices in individual cache ways. As a result, the number of perceived (logical)
cache sets is much larger than the physically available amount. The designers propose
two constructions for the IDF: (1) The hashing variant pseudorandomly generates output
indices based on all IDF inputs; (2) The permutation variant instantiates an individual
cryptographic permutation for each of the cache ways, where the permutation is selected
based on the cache tag and way index. As in the original analysis, this document focuses
on the hashing variant as it is likely more secure [11] and straightforwardly translates to
existing cryptographic primitives like (tweakable) block ciphers of conventional sizes.
In the remainder of this document, the number of ways and index bits of the cache are
denoted by $n_{ways}$ and $b_{indices}$, respectively.
Fig. 1: Cryptographic index derivation function (IDF) in ScatterCache [11].
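To make the hashing variant concrete, the following is a minimal sketch of an index derivation function, not the construction from [11]. It assumes a keyed BLAKE2b hash from the Python standard library as a stand-in for the keyed primitive, and a hypothetical input encoding (2-byte SDID followed by an 8-byte line address). The name `idf` and its parameters are illustrative only.

```python
import hashlib

def idf(key: bytes, sdid: int, line_addr: int,
        n_ways: int = 8, b_indices: int = 11) -> list:
    """Hashing-variant sketch: derive one set index per cache way."""
    if n_ways * b_indices > 512:
        raise ValueError("not enough digest bits for this cache geometry")
    # Hypothetical input encoding: 2-byte SDID followed by 8-byte line address.
    msg = sdid.to_bytes(2, "big") + line_addr.to_bytes(8, "big")
    # Keyed BLAKE2b stands in for the keyed primitive (assumption of this sketch).
    digest = hashlib.blake2b(msg, key=key, digest_size=64).digest()
    bits = int.from_bytes(digest, "big")
    mask = (1 << b_indices) - 1
    # Slice the pseudorandom output into n_ways separate b_indices-bit indices.
    return [(bits >> (way * b_indices)) & mask for way in range(n_ways)]

key = bytes(range(16))  # placeholder; a real key would be random per boot
print(idf(key, sdid=0, line_addr=0x1234))
print(idf(key, sdid=1, line_addr=0x1234))  # same address, other security domain
```

Because the SDID is part of the hashed input, the same address maps to unrelated indices in different security domains, and each way receives its own index from a disjoint slice of the output.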
3 Assumptions and simulation model
In the analysis that follows, we make identical assumptions to the original security
analysis of ScatterCache, which we now make explicit. Where applicable, we also
describe how these assumptions are implemented in the simulator that we use to verify
the attacks.
Cryptographic unit (IDF). As mentioned earlier, the analysis pertains to the hashing
variant of ScatterCache. The model assumes that the mapping from memory address
and SDID to output indices is perfectly pseudorandom and hence considers the crypto-
graphic unit as a black box. Inherently, this assumption implies that there are no crypt-
analytic attacks on the IDF. As a result, we assume that processes cannot, with any
advantage over exhaustive search: (1) determine the memory address corresponding to a
specific index; (2) determine other cache way output indices from one given output index;
(3) find inputs to the cryptographic unit that produce output collisions; (4) recover the
IDF key.
Cache properties. Faithful to how caches behave in the real world, processes cannot
monitor the entries in the cache directly, nor can they infer to which cache way a certain
memory address is allocated. The only interface available to them is access latency, ob-
servable by reading specific addresses; the latency is low in case of a cache hit and high
in case of a miss. Since ScatterCache flexibly adapts to a generic cache with $n_{ways}$ ways and
$2^{b_{indices}}$ indices, the analysis is also general in these parameters. Finally, the cache has a
random replacement policy.
Noise-free model. Given that we match the assumptions in the security analysis of
ScatterCache, we consider a completely noise-free model. In particular, we assume that
there are no contributions of random noise (e.g. other processes on the same machine)
and systematic noise (e.g. the code and memory of these processes do not influence the
cache state). Section 6 revisits this assumption and paves the way for future research.
Simulator. The analysis described in the remainder of this report has been verified on
a Python model of ScatterCache that satisfies these assumptions. In the simulator,
we instantiate the cryptographic unit with AES-128. Note that this does not incur a
loss of generality as the attacks do not exploit any IDF internals. Moreover, in practice
ScatterCache would likely use a less-established cipher due to the stringent latency
requirements of the CPU pipeline.
4 Faster Prime+Probe profiling
Existing analysis. A Prime+Probe attack generally consists of two phases: a profiling
phase and an exploitation phase, both of which are made harder by ScatterCache. In
their paper, the ScatterCache authors derive the Prime+Probe profiling effort for the
attacker in terms of number of accesses to the cache. In particular, to find t addresses that
collide with the victim address in at least one cache way, the expected number of victim
accesses is determined as $n_{ways}^2 \cdot 2^{b_{indices}} \cdot t$. This number is prohibitively
large for practical attacks. However, we show that the underlying attack strategy is suboptimal.
Indeed, given that the profiling runtime is largely determined by the number of observable
victim accesses, the profiling phase should strive to reduce these to a minimum.
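As a worked instance of the expression from the original analysis, take the 8-way cache with 11 index bits and an eviction set of size $t = 275$ (the value used for the exploitation phase later in this note):

$$n_{ways}^2 \cdot 2^{b_{indices}} \cdot t = 8^2 \cdot 2^{11} \cdot 275 = 36\,044\,800 \approx 2^{25.1}$$

This is the "approximately $2^{25}$" victim accesses that the improved profiling aims to reduce.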
Profiling phase. A more effective profiling approach looks as follows.
(i) The attacker generates k different addresses and reads them from memory, thereby
loading them into the cache.
The number of addresses k is an attack parameter. Depending on k, there can be
collisions within the set of attacker addresses. As a key step to eliminate false pos-
itives later on, the attacker prunes this address set by accessing these k addresses
again, removing all addresses that result in high access latency (this means that
they had been evicted by some other attacker address). The attacker repeats
this pruning until no more addresses get evicted. Let $m_{pr}$ denote the
number of pruning iterations.
The attacker now has a set of $k' \le k$ addresses, all of which reside in the cache
simultaneously, each at a different location.
(ii) The attacker triggers the victim to perform the access of interest. Specifically,
the victim loads the target address, thereby evicting one of the attacker addresses with
probability $p = k'/(n_{ways} \cdot 2^{b_{indices}})$.
The expected cache coverage, as given by the coupon collector problem, yields
an estimate of $p$, although in practice $p$ is lower due to the pruning.
(iii) The attacker now accesses the set of k′ addresses again, storing an address in
case its access latency is high - it must have been evicted by the victim.
(iv) The attacker repeats this until a set of t addresses is obtained, taking on average
$t/p$ iterations. Every iteration requires one observable victim access and fewer than
$(m_{pr} + 2)k$ attacker accesses.
Note that after the first iteration, the victim access of interest should be consid-
ered to be already in the cache. To cope with this, the attacker has two options:
(1) flushing the cache in between iterations by accessing many different addresses;
(2) proceeding normally, noting that the expected number of iterations (and hence
victim accesses) increases by a factor $c \le \min(n_{ways}, 1/p)$. Unless mentioned otherwise,
we assume that the attacker adopts the flushing approach to explicitly minimize
the number of victim accesses.
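The profiling steps above can be sketched on a toy cache model. This is an illustrative reconstruction under the noise-free assumptions of Section 3, not the authors' simulator: `ToyScatterCache`, `prune` and `profile` are hypothetical names, the per-way indices are drawn from a seeded random generator instead of a keyed cipher, and `flush()` is a shortcut for accessing many different addresses. The probe stops at the first slow access, since reloading that address could itself evict another attacker address.

```python
import random

class ToyScatterCache:
    """Noise-free toy model: per-way random indexing, random replacement."""
    def __init__(self, n_ways, b_indices, seed=0):
        self.n_ways, self.n_sets = n_ways, 1 << b_indices
        self.rng = random.Random(seed)
        self.idx = {}    # (sdid, addr) -> per-way indices (stands in for the IDF)
        self.lines = {}  # (way, index) -> (sdid, addr) cached in that slot
        self.where = {}  # (sdid, addr) -> (way, index)

    def access(self, sdid, addr):
        """Return True for a hit (fast), False for a miss (slow, then fill)."""
        key = (sdid, addr)
        if key in self.where:
            return True
        if key not in self.idx:
            self.idx[key] = [self.rng.randrange(self.n_sets)
                             for _ in range(self.n_ways)]
        way = self.rng.randrange(self.n_ways)      # random replacement
        slot = (way, self.idx[key][way])
        if slot in self.lines:
            del self.where[self.lines[slot]]       # evict the previous occupant
        self.lines[slot] = key
        self.where[key] = slot
        return False

    def flush(self):
        """Shortcut for evicting everything between profiling iterations."""
        self.lines.clear()
        self.where.clear()

def prune(cache, sdid, addrs):
    """Step (i): load the addresses, then drop slow ones until none is evicted."""
    for a in addrs:
        cache.access(sdid, a)
    while True:
        kept = [a for a in addrs if cache.access(sdid, a)]
        if len(kept) == len(addrs):
            return kept
        addrs = kept

def profile(cache, victim_addr, k, t, attacker=1, victim=0):
    """Steps (i)-(iv): collect t attacker addresses that collide with the victim."""
    found, victim_accesses, base = [], 0, 0
    while len(found) < t:
        cache.flush()
        kept = prune(cache, attacker, list(range(base, base + k)))
        base += k                                  # fresh candidates each round
        cache.access(victim, victim_addr)          # step (ii)
        victim_accesses += 1
        for a in kept:                             # step (iii)
            if not cache.access(attacker, a):
                found.append(a)                    # evicted by the victim
                break                              # later misses may be false
    return found, victim_accesses

cache = ToyScatterCache(n_ways=4, b_indices=6, seed=1)
found, victim_accesses = profile(cache, victim_addr=0xABC, k=100, t=5)
print(found, victim_accesses)
```

Each returned address shares its index with the victim address in at least one way, which can be checked against `cache.idx` in the toy model (a real attacker has no such view).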
Exploitation phase. Resulting from the profiling phase, the attacker now has t addresses
that collide with the victim access of interest in at least one cache way. Proceeding with
the attack, the Prime+Probe exploitation phase is probabilistic, implying that t should
be chosen as a function of the desired success probability. We refer the reader to [11] for a detailed
exploration of the exploitation phase. For example, choosing t = 275 in an 8-way set-associative
cache with 11 index bits results in an eviction probability of 99%.
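One simple model that reproduces this figure is sketched below. It is an assumption made here for illustration, not a restatement of the derivation in [11]: suppose the $t$ addresses are spread evenly over the ways, so that about $t/n_{ways}$ of them collide in the way where the victim line currently resides, and each of those lands in the colliding way with probability $1/n_{ways}$ under random replacement. Then

$$P_{evict} \approx 1 - \left(1 - \frac{1}{n_{ways}}\right)^{t/n_{ways}} = 1 - \left(\frac{7}{8}\right)^{275/8} \approx 0.99$$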
Discussion. We have experimentally validated the profiling procedure with the simulator
described in Section 3. Profiling Prime+Probe is much easier with this approach, notably
in terms of victim accesses of interest, which take the most effort for the attacker to obtain.
Depending on the attack parameter k (and hence k′), our procedure reduces the expected
number of iterations and victim accesses $A_v$ from the original $A_v = n_{ways}^2 \cdot 2^{b_{indices}} \cdot t$ to
$$A_v = \frac{t}{p} = \frac{n_{ways} \cdot 2^{b_{indices}} \cdot t}{k'}$$
(potentially multiplied by $c$). The expected number of attacker accesses is upper-bounded
by $A_a = (m_{pr} + 2)kt/p$ (excluding any cache flushes). Note that our profiling
approach generalizes the ScatterCache security analysis. The highest values for victim
accesses $A_v$ are obtained with $k = 1$, corresponding exactly to the original analysis†.
To provide tangible results and illustrate the scalability of the proposed approach, Table 1
presents, for several $(n_{ways}, b_{indices}, k)$-tuples, parameter values and adversarial effort
to obtain one colliding address. For t colliding addresses, the runtime increases linearly.
Applying our approach to the AES T-tables example from the ScatterCache paper [11]
with the same assumptions ($n_{ways} = 8$; $b_{indices} = 11$; $t = 275$; cache hit 9.5 ns; cache miss
50 ns; a flush between iterations that takes 3.6 ms; a victim computation of
0.5 ms), the total runtime for profiling the eviction set reduces from an estimated 38
hours [11] to less than 5 seconds ($k = 8000$).
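The runtime figure can be approximated with a simple cost model. The sketch below assumes (this decomposition is not spelled out in the text) that each iteration costs one flush, one victim computation, and the attacker accesses at a hit/miss-weighted mean latency. The function name and parameters are hypothetical; the numeric inputs are taken from the assumptions above and from the $n_{ways} = 8$, $b_{indices} = 11$ rows of Table 1.

```python
def profiling_runtime_s(t, av_per_addr, aa_per_av, miss_rate,
                        hit_ns=9.5, miss_ns=50.0,
                        flush_ms=3.6, victim_ms=0.5):
    """Estimated profiling runtime in seconds under the assumed cost model."""
    # Mean latency of one attacker access, weighted by the attacker miss rate.
    mean_ns = miss_rate * miss_ns + (1.0 - miss_rate) * hit_ns
    # Time spent on attacker accesses within one iteration (ns -> ms).
    attacker_ms = aa_per_av * mean_ns * 1e-6
    per_iteration_ms = flush_ms + victim_ms + attacker_ms
    # t addresses, each needing av_per_addr iterations on average (ms -> s).
    return t * av_per_addr * per_iteration_ms * 1e-3

# k = 8000 row: A_v = 3, A_a/A_v = 46e3, a_miss = 0.23
print(profiling_runtime_s(t=275, av_per_addr=3, aa_per_av=46e3, miss_rate=0.23))
# k = 1 row, one colliding address: A_v = 16584, A_a/A_v = 2, a_miss ~ 0
print(profiling_runtime_s(t=1, av_per_addr=16584, aa_per_av=2, miss_rate=0.0))
```

Under this model the first call evaluates to roughly 4 seconds for the full eviction set, and the second to roughly 68 seconds for a single colliding address, in line with the reported values.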
Table 1: As a function of the attack parameter k: victim accesses ($A_v$), attacker
accesses per victim access ($A_a/A_v$) and profiling runtime for one colliding address;
averaged over $10^7$ (k = 1), $10^5$ (k = 200) and $10^4$ (k ≥ 2000) simulator runs, respectively.
To compute the runtime, we include the attacker miss rate $a_{miss}$.

n_ways  b_indices  k     m_pr  k'    p             A_v    A_a/A_v    a_miss  time
4       10         1     0     1     2.44 · 10^-4  4098   2          ≈ 0     17 s
4       10         200   2.07  194   0.047         21     800        0.25    86 ms
4       10         2000  4.63  1306  0.333         3      10 · 10^3  0.27    13 ms
4       11         1     0     1     1.20 · 10^-4  8354   2          ≈ 0     34 s
4       11         200   1.94  197   0.024         42     780        0.26    173 ms
4       11         4000  4.83  2610  0.317         3      21 · 10^3  0.26    14 ms
8       10         1     0     1     1.23 · 10^-4  8130   2          ≈ 0     33 s
8       10         200   1.93  197   0.024         42     780        0.26    172 ms
8       10         4000  5.12  2653  0.33          3      22 · 10^3  0.25    14 ms
8       11         1     0     1     6.03 · 10^-5  16584  2          ≈ 0     68 s
8       11         200   1.71  199   0.012         83     740        0.27    341 ms
8       11         8000  5.52  5305  0.33          3      46 · 10^3  0.23    15 ms
Towards noisy environments. Obviously, the described profiling becomes more involved
when noise is present in the system. For instance, the attacker should take care that the
essential pruning step effectively terminates. While investigating the attacks in the noisy
case is an interesting and necessary avenue of future work, we reiterate that we simply
match the assumptions in the ScatterCache security analysis.
†For the $k = 1$ case, it holds that $k' = 1$ and $c = n_{ways}$, which yields the expression from [11].
5 Covert channels
Among other microarchitectural attacks, Meltdown-type attacks use the cache as a
covert channel. In covert channels, transmitter and receiver processes actively collaborate;
they can already do so in the Prime+Probe profiling phase. This allows for faster profiling
than for the attack in Section 4, as any collisions are now sufficient (cf. the well-known
birthday problem in statistics). The correspondence with the cryptographic assumptions
on the IDF is apparent: collaboration in the profiling phase reduces second-preimage search
to collision search. While optimizations are possible, a baseline attack has the following
steps.
Profiling phase.
(i) The receiver process generates a large number of addresses, loads them into
the cache and prunes this address set (similar to the attacker process in Section 4).
(ii) The transmitter process generates a large number of addresses and loads them into
the cache.
(iii) The receiver process now loads its original set of addresses again, storing the addresses
with high access latency as collision addresses. These must have been evicted
by the transmitter, indicating a collision in at least one cache line.
(iv) The transmitter does the same: it loads its original set of addresses again. Slow accesses
correspond to addresses that collide with the receiver in at least one cache line.
(v) This process can be repeated until both transmitter and receiver obtain a desired
number of colliding addresses. This number is an attack parameter; it can, for example, be
set as a fraction f of the total number of cache lines (e.g. f = 0.05).
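One round of the baseline profiling steps (i) to (iv) can be sketched on a toy cache model as follows. This is an illustrative sketch under the noise-free assumptions of Section 3, not the authors' simulator: `ToyScatterCache`, `prune` and `covert_profile` are hypothetical names, and per-way indices come from a seeded random generator instead of a keyed cipher. Since the baseline does not prune the transmitter set, some slow transmitter accesses in step (iv) may stem from evictions among the transmitter's own addresses.

```python
import random

class ToyScatterCache:
    """Noise-free toy model: per-way random indexing, random replacement."""
    def __init__(self, n_ways, b_indices, seed=0):
        self.n_ways, self.n_sets = n_ways, 1 << b_indices
        self.rng = random.Random(seed)
        self.idx = {}    # (sdid, addr) -> per-way indices (stands in for the IDF)
        self.lines = {}  # (way, index) -> (sdid, addr) cached in that slot
        self.where = {}  # (sdid, addr) -> (way, index)

    def access(self, sdid, addr):
        """Return True for a hit (fast), False for a miss (slow, then fill)."""
        key = (sdid, addr)
        if key in self.where:
            return True
        if key not in self.idx:
            self.idx[key] = [self.rng.randrange(self.n_sets)
                             for _ in range(self.n_ways)]
        way = self.rng.randrange(self.n_ways)      # random replacement
        slot = (way, self.idx[key][way])
        if slot in self.lines:
            del self.where[self.lines[slot]]       # evict the previous occupant
        self.lines[slot] = key
        self.where[key] = slot
        return False

def prune(cache, sdid, addrs):
    """Load the addresses, then drop slow ones until no address is evicted."""
    for a in addrs:
        cache.access(sdid, a)
    while True:
        kept = [a for a in addrs if cache.access(sdid, a)]
        if len(kept) == len(addrs):
            return kept
        addrs = kept

def covert_profile(cache, n_recv, n_send, receiver=1, transmitter=2):
    """One profiling round; returns (receiver, transmitter) collision addresses."""
    recv = prune(cache, receiver, list(range(n_recv)))       # step (i)
    send = list(range(n_send))
    for a in send:                                           # step (ii)
        cache.access(transmitter, a)
    # Step (iii): slow receiver accesses mark receiver collision addresses.
    recv_coll = [a for a in recv if not cache.access(receiver, a)]
    # Step (iv): slow transmitter accesses mark transmitter collision addresses.
    send_coll = [a for a in send if not cache.access(transmitter, a)]
    return recv_coll, send_coll

cache = ToyScatterCache(n_ways=4, b_indices=8, seed=2)
t_R, t_T = covert_profile(cache, n_recv=300, n_send=300)
print(len(t_R), len(t_T))
```

A single round already yields many candidate addresses on both sides, because any transmitter access that lands on any receiver line counts, in contrast to the targeted search of Section 4.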
Transmission phase. The transmitter and receiver process now each have a set of addresses
(resp. $t_T$ and $t_R$), such that each transmitter address collides with at least
one receiver address in at least one cache way. As a result, these sets constitute a number
of probabilistic covert channels. Because each collision is assumed to occur only in one
cache way, each covert channel has a $1/n_{ways}$ success probability. The transmitter collision
addresses $t_T$ are partitioned into $s$ disjoint and equally sized bins $t_{T,i}$ (say $s = 64$).
Transmission occurs in sequences of s bits at a time. To increase the reliability of the
channel, sequences are separated by a cache flush, performed by accessing many different
addresses (excluding the $t_T$ and $t_R$ addresses), either by the transmitter or receiver process.
It is not required that the full cache be evicted, but the flush should remove the majority of
transmitter addresses from the cache. Within a sequence, the transmission occurs one
bit at a time:
Transmitting bit i of the sequence.
(i) The receiver loads the $t_R$ addresses into the cache.
The receiver thus listens on every channel.
(ii) Depending on the value of bit i, the transmitter either does (bit value 1) or does not
(bit value 0) access the addresses in the bin $t_{T,i}$.
The transmitter thus sends on $t_T/s$ channels, but sends the same bit of information on
every channel. This redundancy overcomes the probabilistic nature of the
attack. The separation into s bins ensures that not all transmitter addresses are stuck
in the cache after sending only one bit. Moreover, it increases the bandwidth of the
attack.
(iii) The receiver now loads the $t_R$ addresses again, counting the number of slow accesses
(i.e. evictions). If this number is larger than a predetermined threshold d, bit i is
determined to be 1, otherwise it is 0.
This threshold d is determined ahead of time. Its optimal value can be greater than
zero due to the non-zero probability of false positives in the channel.
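The bookkeeping of the transmission phase (splitting the transmitter addresses into s bins, and the receiver's threshold decision) can be sketched as two small helpers. The names `partition` and `decode_bit` are hypothetical, as is the 30 ns cutoff, which is merely chosen to lie between the 9.5 ns hit and 50 ns miss latencies assumed in Section 4.

```python
def partition(addrs, s):
    """Split the transmitter addresses into s disjoint, equally sized bins."""
    size = len(addrs) // s          # leftover addresses are simply dropped
    return [addrs[i * size:(i + 1) * size] for i in range(s)]

def decode_bit(latencies_ns, d, slow_ns=30.0):
    """Receiver decision: count slow probes and compare with the threshold d."""
    slow = sum(1 for lat in latencies_ns if lat > slow_ns)
    return 1 if slow > d else 0

bins = partition(list(range(10)), s=3)
print(bins)                                        # three bins of three addresses
print(decode_bit([9.5, 50.0, 50.0, 9.5], d=1))     # two slow probes exceed d = 1
print(decode_bit([9.5, 50.0, 9.5], d=1))           # one slow probe does not
```

Raising d suppresses false positives at the cost of missing bits whose bin caused only a few evictions, which is one way the bit error rate can be tuned.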
Discussion. We have successfully simulated both the profiling and transmission phase of
the described covert channel attack, using the setup described in Section 3. The covert
channel bit error rate (BER) and bandwidth depend on the cache parameters $n_{ways}$ and
$b_{indices}$, and can be traded off by modifying the attack parameters f and s. The resulting
channel bandwidth interpolates between that of a regular Prime+Probe covert channel and that of a covert
channel based on full-cache eviction.
While the described covert channel profiling considers memory addresses that collide in
one cache way only, finding a full cache-set collision (or anything in between) also requires
less effort if the processes collaborate. In general, the complexity of the profiling phase can
be traded off with the quality of the colliding channels in the transmission phase.
6 Conclusion and future work
Protected cache architectures constitute a promising line of work to thwart cache-based
timing channels in microarchitectural attacks. In this note, we further the understanding
of the residual attack surface, both for profiling side-channels and constructing covert
channels. For conflict-based side-channel profiling, we generalize the existing analysis,
revealing that the requirement of observable victim accesses can be lowered by several
orders of magnitude. Acknowledging that completely closing covert channels is extremely
difficult, we outline an approach that has the potential to significantly outperform covert
channels based on full cache evictions.
Future work. To strengthen the confidence in protected cache architectures and partic-
ular instances thereof, we identify interesting directions of future work:
– Explore the effectiveness of these attacks in the presence of noise, both by exploring
noise-reduction techniques and carefully selecting the attack parameters;
– Investigate formal approaches to (provably) provide lower bounds on attacker effort
and upper bounds on covert channel capacities;
– Consider performance-preserving countermeasures (contrary to high key agility), to
aid in this formal analysis and/or to limit the adversarial exposure window.
Acknowledgements
This work is supported in part by the Horizon 2020 research and innovation programme
under grant agreement Cathedral ERC Advanced Grant 695305, and by a gift from Intel
Corporation.
References
[1] Bernstein, D.J.: Cache-timing attacks on AES. Preprint available at http://cr.yp.to/papers.html#cachetiming (2005)
[2] Osvik, D.A., Shamir, A., Tromer, E.: Cache attacks and countermeasures: the case of AES. In:
Cryptographers’ track at the RSA conference. pp. 1–20. Springer (2006)
[3] Kocher, P., Genkin, D., Gruss, D., Haas, W., Hamburg, M., Lipp, M., Mangard, S., Prescher, T.,
Schwarz, M., Yarom, Y.: Spectre attacks: Exploiting speculative execution. In: 2019 IEEE Symposium
on Security and Privacy (2019)
[4] Lipp, M., Schwarz, M., Gruss, D., Prescher, T., Haas, W., Fogh, A., Horn, J., Mangard, S., Kocher,
P., Genkin, D., Yarom, Y., Hamburg, M.: Meltdown: Reading kernel memory from user space. In:
27th USENIX Security Symposium. pp. 973–990 (2018)
[5] Yarom, Y., Falkner, K.: Flush+Reload: a high resolution, low noise, L3 cache side-channel attack. In:
23rd USENIX Security Symposium. pp. 719–732 (2014)
[6] Gruss, D., Maurice, C., Wagner, K., Mangard, S.: Flush+Flush: a fast and stealthy cache attack. In:
International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment. pp.
279–299. Springer (2016)
[7] Liu, F., Yarom, Y., Ge, Q., Heiser, G., Lee, R.B.: Last-level cache side-channel attacks are practical.
In: 2015 IEEE Symposium on Security and Privacy. pp. 605–622. IEEE (2015)
[8] Wang, Z., Lee, R.B.: New cache designs for thwarting software cache-based side channel attacks.
ACM SIGARCH Computer Architecture News 35(2), 494–505 (2007)
[9] Wang, Z., Lee, R.B.: A novel cache architecture with enhanced performance and security. In: Pro-
ceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture. pp. 83–93.
IEEE Computer Society (2008)
[10] Qureshi, M.K.: CEASER: Mitigating conflict-based cache attacks via encrypted-address and remapping.
In: 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). pp. 775–
787. IEEE (2018)
[11] Werner, M., Unterluggauer, T., Giner, L., Schwarz, M., Gruss, D., Mangard, S.: SCATTERCACHE:
Thwarting Cache Attacks via Cache Set Randomization. In: 28th USENIX Security Symposium
(2019)
[12] Maurice, C., Neumann, C., Heen, O., Francillon, A.: C5: cross-cores cache covert channel. In: In-
ternational Conference on Detection of Intrusions and Malware, and Vulnerability Assessment. pp.
46–64. Springer (2015)
