Speculative Interference Attacks: Breaking Invisible Speculation Schemes by Behnia, Mohammad et al.
Mohammad Behnia, Prateek Sahu◦, Riccardo Paccagnella, Jiyong Yu, Zirui Zhao, Xiang Zou‡,
Thomas Unterluggauer‡, Josep Torrellas, Carlos Rozas‡, Adam Morrison†, Frank Mckeen‡,
Fangfei Liu‡, Ron Gabor, Christopher W. Fletcher, Abhishek Basak‡, Alaa Alameldeen‡
University of Illinois at Urbana-Champaign, ◦University of Texas at Austin, ‡Intel Corporation,
†Tel Aviv University, Toga Networks
{mbehnia2,rp8,jiyongy2,ziruiz6,torrella,cwfletch}@illinois.edu,
prateeks@utexas.edu,
{xiang.chris.zou,thomas.unterluggauer,carlos.v.rozas,frank.mckeen,fangfei.liu,abhishek.basak,
alaa.r.alameldeen}@intel.com,
mad@cs.tau.ac.il,
ron.gabor@toganetworks.com
ABSTRACT
Recent security vulnerabilities that target speculative execu-
tion (e.g., Spectre) present a significant challenge for proces-
sor design. The highly publicized vulnerability uses specula-
tive execution to learn victim secrets by changing the cache
state. As a result, recent computer architecture research has
focused on invisible speculation mechanisms that attempt to
block changes in cache state due to speculative execution.
Prior work has shown significant success in preventing Spec-
tre and other vulnerabilities at modest performance costs.
In this paper, we introduce speculative interference attacks,
which show that prior invisible speculation mechanisms do
not fully block speculation-based attacks that use cache state.
We make two key observations. First, mis-speculated younger
instructions can change the timing of older, bound-to-retire
instructions, including memory operations. Second, chang-
ing the timing of a memory operation can change the order of
that memory operation relative to other memory operations,
resulting in persistent changes to the cache state. Using both
of these observations, we demonstrate (among other attack
variants) that secret information accessed by mis-speculated
instructions can change the order of bound-to-retire loads.
Load timing changes can therefore leave secret-dependent
changes in the cache, even in the presence of invisible specu-
lation mechanisms.
We show that this problem is not easy to fix: Speculative
interference converts timing changes to persistent cache-state
changes, and timing is typically ignored by many cache-based
defenses. We develop a framework to understand the attack
and demonstrate concrete proof-of-concept attacks against
invisible speculation mechanisms. To address this problem,
we provide security definitions sufficient to block speculative
interference attacks; describe a simple defense mechanism
with a high performance cost; and discuss how future research
can improve its performance.
1. INTRODUCTION
Speculative execution attacks such as Spectre [29] and
follow-on work [8, 12, 22, 28, 30, 34, 43, 56] have opened a
new chapter in processor security. In these attacks, adversary-
controlled transient instructions—i.e., speculative instruc-
tions bound to squash—access and then transmit potentially
sensitive program data over microarchitectural covert channels
(e.g., the cache [59], port contention [8]). For example, in
Spectre variant 1—if (i < N) { j = A[i]; B[j]; }—
speculative execution bypasses a bounds check due to a
branch misprediction, accesses an out-of-bounds value (j =
A[i]) and transmits that value through a cache-based covert
channel (B[j]), i.e., by forcing a cache fill to occur in a set
that depends on j. In this paper, we consider the illegally
accessed value j to be the secret. Here, the attacker controls
the value of i, thus j can be any value in program memory
and the covert channel can reveal arbitrary program data.
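To make the access and transmit steps concrete, the following is a minimal
Python sketch of the Spectre variant 1 snippet above. It is a toy model, not
an exploit: the cache is modeled as a set of filled line indices, and the
names victim, receiver, memory, and the mispredict flag are illustrative
assumptions that do not appear in the original code.

```python
# Toy model of `if (i < N) { j = A[i]; B[j]; }` with a cache covert channel.
# All names and sizes here are hypothetical and chosen for illustration only.

def victim(i, A, N, cache, mispredict):
    """Run the bounds-checked access, optionally under a branch misprediction."""
    if i < N or mispredict:   # a misprediction bypasses the bounds check
        j = A[i]              # access: may read an out-of-bounds value
        cache.add(j)          # transmit: the access B[j] fills a line chosen by j
    # A squash would roll back registers, but the cache fill persists.

def receiver(cache, candidates=range(256)):
    """Probe every candidate line of B; lines that are present reveal j."""
    return [j for j in candidates if j in cache]

memory = [1, 2, 3, 4] + [42]  # A has N = 4 entries; 42 is the adjacent secret

cache = set()
victim(4, memory, 4, cache, mispredict=True)
leaked = receiver(cache)      # the secret shows up as a filled line

clean_cache = set()
victim(4, memory, 4, clean_cache, mispredict=False)
not_leaked = receiver(clean_cache)
```

In this sketch the out-of-bounds value only reaches the cache when the branch
is mispredicted, which mirrors why the fill is attributed to transient
execution.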
While a variety of covert channels can be used to leak
secret values under mis-speculation, cache-based covert chan-
nels [29, 35, 57, 59, 60] make the fewest assumptions on the
attacker and have therefore received the most attention. This
is for two reasons. First, secret-dependent cache fills leave a
persistent footprint in the cache which is observable long after
speculation squashes. Second, certain levels of modern cache
hierarchies are globally shared by all cores in the system,
enabling attackers to observe said persistent state changes
from other physical cores [33, 57]. By contrast, many other
covert channels (e.g., arithmetic port contention [6, 8]) leave
only intermittent side effects that must be monitored before
the squash, and/or require that the attacker share hardware
resources on the same physical core (e.g., branch predictor
channels [4, 13])—both of which can be easily blocked (e.g.,
disabling SMT).
The above view of the covert channel landscape has led to
a surge of architecture-level “invisible speculation” proposals
to block cache-based covert channels due to mis-speculations
(e.g., InvisiSpec [56], SafeSpec [26], Delay-on-Miss [38],
Conditional Speculation [31], MuonTrap [5]). Invisible spec-
ulation schemes add hardware to prevent mis-speculated
loads from making persistent state changes to the memory
subsystem. To maintain the performance benefits of caching,
only non-speculative loads that are bound to retire are allowed
to modify the cache state. To maintain the performance benefits
of out-of-order execution, loads are allowed to “invisibly”
execute (i.e., bring data directly to the core without filling the
cache) and forward their results to dependent instructions.
arXiv:2007.11818v1 [cs.AR] 23 Jul 2020
1.1 This Paper
In this paper we introduce speculative interference attacks,
which show that invisible speculation schemes do not fully
block speculation-based attacks that use the cache state. Our
attacks are based on two key observations. First, that mis-
speculated instructions can influence the timing of older,
bound-to-retire operations. Second, if changing the timing
of a memory operation changes the order of that memory
operation with respect to other memory operations, the re-
sulting re-ordering can cause persistent cache-state changes.
Putting these together, we show (among other attack vari-
ants) how secret information accessed in a mis-speculated
window influences the order of bound-to-retire loads, leaving
secret-dependent state changes in the cache—even if invisible
speculation is enabled.
To explain these ideas in more detail, consider a simple
but representative invisible speculation scheme,
Delay-on-Miss (DoM) [38]. DoM issues a speculative load and (a)
on an L1 cache hit, forwards the load result to dependent
instructions, or (b) on an L1 cache miss, delays servicing the
miss and re-issues the load when it becomes non-speculative.
In case (a), DoM does not update any replacement state (e.g.,
replacement bits) in the L1 cache until the load becomes
non-speculative. For simplicity, we explain ideas assuming
only branch instructions cast speculative shadows [38], i.e.,
a load is considered non-speculative/safe iff it is older than
the oldest unresolved branch. We discuss attacks on more
conservative DoM variants in § 3.
DoM’s (and other invisible speculation schemes’) stated
security goal is to block only cache state changes due to
mis-speculations, while leaving other covert channels
unblocked. This is reflected in DoM’s design. On the
one hand, DoM prevents mis-speculated loads from directly
changing the cache state. On the other hand, DoM allows
mis-speculated loads to forward their results to dependent
instructions, which can clearly form covert channels through
intermittent state changes. For example, both whether a
mis-speculated load hits or misses in the L1 cache, and the
mis-speculated load’s return value, determine whether and
how dependent mis-speculated instructions execute. This
is exactly the basis for forming, e.g., arithmetic unit port
contention covert channels [6, 8].
interference_target;
if (i < N) { // mispredict
  j = A[i]; // M1
  k = B[j]; // M2
  interference_gadget(k); }
[Diagram: Frontend, execution units EXE1 and EXE2, and the CDB,
with markers ① to ④.]
Figure 1: Speculative interference example. Assume this code
snippet is run on a processor protected by invisible speculation
such as DoM. (Left) Instructions making up the interference gadget
issue at a time that depends on whether load M2 hits or misses in
the cache, which depends on the secret j returned by load M1.
(Left and Right) The interference gadget’s execution causes timing
changes to older non-speculative instructions, the interference
target, because the processor issues the gadget and the target to
parallel execution units EXE1 and EXE2, respectively (④ on Left
and Right). CDB denotes the common data bus.
This paper demonstrates how instructions that cause intermittent
state changes can be leveraged to create persistent
state changes in the cache. Consider the example in Figure 1,
modeled after Spectre variant 1. Suppose this code is run on
a processor using DoM. In Figure 1 (Left), a mis-speculated
load M1 forwards secret data j to a second load M2 (①). A
normal Spectre attack would monitor the cache state change
left by M2 to deduce j. To prevent this leak, DoM prevents
M2 from changing the cache state, specifically by
allowing it to access and return data from the L1 if there is
an L1 hit and delaying its execution otherwise. While this
blocks the cache state change due to M2, M2 is allowed to
forward its result when it completes (②), meaning that dependent
instructions, dubbed the interference gadget, execute at
a time that depends on j (③). This has the potential to create
a traditional non-cache-based covert channel, e.g., through
execution unit port usage, which DoM ignores.
Our key observation is that secret-dependent timing effects
caused by the interference gadget can be monitored
indirectly through how the gadget interacts with the execution
of older instructions, for example, the instruction(s) before the
mis-speculated branch, dubbed the interference target (④).
Although the interference target appears before the interference
gadget in program order, out-of-order execution could have
both of them executing concurrently in separate execution
units. For example, Figure 1 (Right) shows the interference
gadget, mapped to execution unit EXE1, influencing the timing
of the interference target, mapped to EXE2 (④). This can
happen, for example, if both EXE1 and EXE2 contend for the
common data bus in the same cycle. We call this speculative
interference.
Next, we show how speculative interference can be used to
bootstrap a change in the cache state. Specifically, suppose
the interference target is made up of two independent loads
A and B. Since these loads are older than the mispredicted
branch in program order, they are not protected by DoM.
We show how, depending on the timing changes caused by
the interference gadget, the order in which A is issued with
respect to B can change. That is, depending on a secret, the
processor issues either A followed by B or B followed by
A. To finish the attack, we show how changing the order of
memory operations can be used to create persistent changes
in the cache state, the intuition being that state in the cache
(e.g., replacement bits) depends on not just what requests are
made, but also their order.
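The intuition that replacement state depends on request order can be
illustrated with a minimal Python sketch of a single 2-way cache set under
true LRU replacement. The LRU policy, the 2-way geometry, the line names, and
the later fill of line C are assumptions made for illustration; they are not
the specific receiver described in this paper.

```python
from collections import OrderedDict

def run_accesses(initial, accesses, ways=2):
    """Replay accesses on one LRU cache set; return resident lines, LRU first."""
    lru = OrderedDict((line, None) for line in initial)
    for line in accesses:
        if line in lru:
            lru.move_to_end(line)        # hit: line becomes most recently used
        else:
            if len(lru) >= ways:
                lru.popitem(last=False)  # miss: evict the least recently used line
            lru[line] = None             # fill the missing line
    return list(lru)

# Both victim lines A and B start resident. Only the issue order differs.
# A later fill of line C (hypothetical) then evicts whichever line is LRU.
order_ab = run_accesses(["A", "B"], ["A", "B", "C"])
order_ba = run_accesses(["A", "B"], ["B", "A", "C"])
```

Under these assumptions, issuing A before B causes A to be evicted by the
later fill, while issuing B before A causes B to be evicted, so a receiver
that later times an access to A could tell the two orders apart.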
This issue is not easy to fix. The crux of the problem is that
timing changes can be converted to persistent state changes.
These timing changes can arise due to interference through
a large number of microarchitectural structures, through dif-
ferent instructions, etc. Further, while our example reorders
two loads that originate from the same thread, there are many
other memory address streams through which to interleave
operations, e.g., interleaving instruction and data cache ac-
cesses, accesses made across threads and security domains,
etc.—which further widens the attack surface. The rest of
this paper proposes a framework for reasoning about the new
attack surface, details additional attack variants with concrete
proof-of-concepts running on real hardware, proposes a secu-
rity definition for soundly defeating the attacks, and takes a
first step towards designing defenses.
To summarize, we make the following contributions.
• We introduce and provide a framework to reason about
speculative interference attacks, whereby subtle secret-
dependent microarchitectural interference influences the
behavior of older non-speculative instructions. We show
how this can be used to create cache-based covert channels,
even in the presence of invisible speculation schemes.
• We provide proof-of-concept exploits for two such attacks
on a real machine, exploiting interference in the processor’s
out-of-order issue logic and frontend queues. Both attacks
are cache-based and work across physical cores.
• We provide a sufficiently strong security definition to block
the attacks, and also provide a starting point defense and
discussion to set a research agenda for more efficiently and
comprehensively addressing the problem.
2. BACKGROUND
2.1 Threat Model
We adopt the standard threat model used by invisible
speculation schemes [5, 26, 31, 38, 56]. Such schemes aim to
prevent observable “persistent” side effects of mis-speculated
loads, for example, which lines are in the cache and the
replacement and coherence state of each line.
Invisible speculation schemes further distinguish attackers by
where they monitor the covert channel (i.e.,
where the receiver [27] runs). In particular, one of the first
invisible speculation schemes, by Yan et al. [56], specifies
several such attacker models:
SameThread model: Here we consider untrusted code
interleaved with trusted code, as in the case of a sandbox.
CrossCore model: Here, the idea is that the system pre-
vents untrusted code from running on a sibling hyper-thread.
However, it may run on another core and monitor a cache-
based covert channel through a shared cache.
We will show attacks against these models. Yan et al. [56]
also specify an SMT model where the receiver runs in an
adjacent hyper-thread. Because that model gives the attacker
more power, we focus on the former two, weaker models.
2.2 Invisible Speculation Schemes
Invisible speculation aims to block covert channels from
forming through the cache due to mis-speculated loads [5,26,
31, 38, 56]. While there have been multiple invisible specu-
lation proposals, they all share several common traits. For
security, they prevent state changes to the cache due to mis-
speculated loads, until those loads are deemed safe/become
non-speculative. Performance hinges on several optimiza-
tions. First, mis-speculated loads should—subject to the
constraint of not updating cache state—be allowed to execute
and forward their results to dependent (mis-speculated) in-
structions. This allows these schemes to maintain efficiency
in the common case that speculation turns out to be correct.
Second, loads should be able to update cache state when they
become non-speculative (safe). Together, these performance
optimizations enable such schemes to reap the benefits of
out-of-order execution and caching.
Different invisible speculation schemes differ in their exact
policies for allowing speculative loads to execute. We
describe the Delay-on-Miss (DoM) [38] scheme here, as
it is simple and illustrates the main ideas.
DoM allows loads that hit in the L1 cache
to execute and forward their results to dependent instructions
(which are themselves allowed to execute). Any cache state
change that would have been made as a by-product of the
L1 cache hit (e.g., modifying replacement bits) is deferred
until the load becomes non-speculative. If the load misses
the L1 cache, it is delayed and re-executed when it becomes
non-speculative.
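The DoM policy just described can be sketched as a small Python model. This
is a toy under stated assumptions: the L1 is fully associative with LRU
replacement, and the class name DomL1 and the returned strings are
hypothetical. The deferred replacement update is modeled only by re-issuing
the load as non-speculative.

```python
from collections import OrderedDict

class DomL1:
    """Toy fully associative L1 with LRU replacement and a Delay-on-Miss policy."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()          # least recently used entry first

    def load(self, addr, speculative):
        if speculative:
            if addr in self.lines:
                return "hit-forwarded"      # data forwarded, LRU update deferred
            return "delayed"                # miss is not serviced while speculative
        # Non-speculative (safe) load: ordinary hit and miss handling.
        if addr in self.lines:
            self.lines.move_to_end(addr)    # update replacement state
            return "hit"
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)  # evict the LRU line
        self.lines[addr] = True             # fill the cache
        return "miss-filled"

    def state(self):
        return list(self.lines)

l1 = DomL1(capacity=2)
l1.load(0x100, speculative=False)
l1.load(0x200, speculative=False)
before = l1.state()
r_hit = l1.load(0x100, speculative=True)    # speculative hit
r_miss = l1.load(0x300, speculative=True)   # speculative miss
after = l1.state()
r_safe = l1.load(0x300, speculative=False)  # re-issued once non-speculative
```

In this sketch neither speculative load changes the cache contents or the
replacement order, while the speculative hit still returns data early, which
is the timing difference that later sections build on.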
2.3 OoO Processor Pipeline
Dynamically scheduled processors execute instructions
out-of-order (OoO) to improve performance [20, 48]. An
instruction is fetched by the processor frontend, dispatched to
reservation stations (RS) for scheduling, issued to execution
units (EUs) in the processor backend, and finally retired
when it updates the machine’s architected state. Instructions
proceed through the frontend, backend and retirement stages
in order, possibly out of order and in order, respectively. In-
order retirement is implemented by queueing instructions in
a reorder buffer (ROB) [25] in order and retiring a completed
instruction when it reaches the ROB head.
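In-order retirement from the ROB can be illustrated with a short Python
sketch. It assumes unlimited retirement bandwidth per cycle, and the function
name retire_cycles and the example completion cycles are hypothetical.

```python
def retire_cycles(completion_cycles):
    """Map per-instruction completion cycles (program order) to retire cycles.

    An instruction retires once it has completed and every older instruction
    has retired, i.e., once it is a completed instruction at the ROB head.
    """
    retired = []
    head_ready = 0  # earliest cycle at which the ROB head can have advanced
    for done in completion_cycles:
        head_ready = max(head_ready, done)  # wait for older instructions too
        retired.append(head_ready)
    return retired

# The second and third instructions complete early (out of order),
# but they cannot retire before the slow first instruction does.
example = retire_cycles([10, 3, 5, 12])
```

The example shows younger instructions finishing execution while an older
instruction is still pending, which is the window in which the two can
compete for execution resources.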
3. BREAKING INVISIBLE SPECULATION
WITH SPECULATIVE INTERFERENCE
ATTACKS
Invisible speculation designs allow secrets to be brought
into the pipeline by mis-speculated loads, but they prevent
such loads from making secret-dependent observable changes
to the cache state, in order to block cache-based covert chan-
nels. These schemes do not otherwise restrict how secrets
flow through the pipeline. This allows secret-dependent us-
age of pipeline resources by speculative instructions, which
itself forms a covert channel if it can be observed by an at-
tacker. The invisible speculation paradigm considers such
covert channels out of scope, arguing that it is straightforward
to block direct observations of how speculative instructions
utilize microarchitectural resources (§ 3.1).
Here, we show that the invisible speculation approach
remains vulnerable to subtle cache-based covert channels.
We introduce a speculative interference attack framework,
which transforms secret-dependent resource usage patterns
of speculative instructions into cache-based covert channels.
A speculative interference attack exploits such intermittent
resource contention to influence the timing of non-speculative
parts of the pipeline (§ 3.2). These timing effects are used to
make the non-speculative cache access pattern depend on the
secret, creating a cache covert channel (§ 3.3).
3.1 Traditional Contention Attacks
Invisible speculation schemes target cache-based covert
channels. Cache channels have a persistence property: any
secret-dependent cache state changes made by mis-speculated
instructions, such as line fills and evictions, are not rolled
back and remain observable after the mis-speculated execu-
tion is squashed. Cache state changes can also be observed
from other cores by monitoring the LLC [33, 58].
In contrast, a speculative contention-based channel leaks
the secret using intermittent side-effects of mis-speculated
instructions, which leave no trace after a squash. For ex-
ample, a mis-speculated instruction’s execution time and
how/when it uses microarchitectural structures such as exe-
cution units [19], ports [8, 18], etc. Unlike cache channels, a
contention channel requires the attacker to directly monitor a
mis-speculating victim instruction’s behavior, as the instruc-
tion executes. The attacker must therefore physically contend
with the victim on microarchitectural resources during the
mis-speculating execution, typically by running on a sibling
SMT context [8, 18].
Invisible speculation designs justify defending against only
cache-based channels based on the above differences between
channel types (see footnote 1). The argument is that for all but a narrow set
of attacks (e.g., uncore bus contention), contention attacks
can be blocked with complementary mechanisms to prevent
untrusted parties from concurrently sharing non-cache re-
sources. Typically, this means not scheduling untrusting pro-
grams together on sibling SMT contexts or disabling SMT.
Caches are excepted because disabling cache sharing would
disastrously degrade the system’s efficiency.
3.2 Making Mis-Speculation Influence the
Timing of Non-Speculative Actions
The first insight underlying speculative interference attacks
is that a speculative instruction can delay non-speculative
pipeline operations—such as execution of bound-to-retire
instructions or frontend actions—in a variable, secret-
dependent manner. A speculative interference attack exploits
mis-speculated instructions to delay an unprotected (non-
speculative) memory operation, making the time at which it
accesses the cache depend on the secret. This ability can be
used to construct a cache-based covert channel (§ 3.3).
In contrast to traditional contention attacks, our attacks do
not leak the secret through direct observations of the mis-
speculated instruction’s behavior, which is intermittent and
not observable after the squash. Instead, the mis-speculated
instruction’s behavior is indirectly observed from how it in-
fluenced the non-speculative actions, which are not squashed.
The root cause of the problematic behavior is that in
current OoO pipeline designs, microarchitectural resources
(such as reservation stations and execution units) are allocated
to, or deallocated from, instructions in the ROB using
policies that maximize performance, without factoring in
whether an instruction is speculative (and so potentially
mis-speculated) or known to be retirement-bound. For
example:
• Some microarchitectural resources are allocated to instruc-
tions based on their readiness, so that they execute as soon
as possible. This means that the cycle in which a specu-
lative instruction starts occupying a resource depends on
when its operands become ready, which can depend on the
speculative instructions producing the operands. While the
resource is allocated to the speculative instruction, an older,
retirement-bound instruction that becomes ready can be
blocked from using that resource, thereby slowing down
its execution.
• In some cases, how long an instruction uses a resource
depends on the instruction’s operands. This means that the
cycle in which a speculative instruction releases a resource,
unblocking any older, retirement-bound instructions that
require the resource, depends on its operands.
Footnote 1: For example, MuonTrap [5] excludes non-cache
speculative covert channels because “these attacks do not involve
hiding state to be picked up later... and so can be prevented by
preventing soft state changes before speculation is completed.”
Similarly, intermittent resource contention with speculative
instructions, whose duration is operand-dependent, can pre-
vent dispatching of new instructions into the ROB, throttling
the frontend and thereby delaying frontend actions (§ 3.2.2).
Overall, the microarchitectural resource usage pattern of
speculative instructions depends on their operands and can
therefore delay retirement-bound instructions (or frontend
actions) in an operand-dependent way. Invisible speculation
schemes do not prevent these effects. These schemes leave
the basic OoO pipeline design unchanged, since to block
just speculative loads from changing cache state requires
modifying only load-related logic (e.g., load unit) and/or the
cache subsystems.
3.2.1 Attack Framework
We present a framework for leaking a secret bit by making
the secret determine the time at which an unprotected victim
memory operation accesses the cache, modifying its state.
In § 3.3, we construct a cache-based covert channel from
this ability. Thus, the full attack transforms secret-dependent
timing variations into secret-dependent cache state changes.
The attack forwards a secret to a sequence of mis-
speculated instructions, called an interference gadget, which
create secret-dependent pressure on some microarchitectural
resource (§ 3.2). This pressure influences the timing of ac-
tions performed by a non-speculative part of the pipeline,
called the interference target, making these timings encode
information about the secret. The interference target is chosen
so that changing its timing creates a pipeline “ripple effect”
that ultimately delays the unprotected victim memory access.
The combination of a gadget and a target defines a sender,
which is the sending side of the cache covert channel.
Interference gadgets can be classified as one of two types,
related to the type of information flow [36, 47] from their
interference: explicit and implicit.
An explicit gadget consists of instructions with operand-
dependent (i.e., variable) execution time and resource usage
pattern, which reveal information about the operand. We refer
to such instructions as transmitters [63]. In this paper, we
use loads for the transmitters, but the ideas generalize to other
classes of transmitters (e.g., data-dependent arithmetic [19]).
In an explicit gadget, it is the secret-dependent resource usage
of the gadget—i.e., how the gadget executes—that creates
the interference encoding the secret.
An implicit gadget consists of (1) non-transmitter instruc-
tions (with an operand-independent resource usage pattern)
which are data-dependent on a transmitter, and (2) that trans-
mitter, which receives the secret as an operand. In an implicit
gadget, the transmitter’s secret-dependent execution time de-
termines when (not how) the gadget creates interference.
Figure 2 depicts the attack framework and timeline.
① A mis-speculated access load reads a secret into the
pipeline, and forwards it to the interference gadget, whose
instructions are data-dependent on the access load. As a con-
crete example, we show these instructions in the shadow of
a mispredicted array bounds check, which is slow to resolve
due to a cache miss on variable N.
interference_target;
if (i < N) { // mispredict
  secret = A[i]; // access
  interference_gadget(secret);
}
[Timeline (right): for each gadget type (explicit gadget, implicit
gadget) and each secret value (secret == 0, secret == 1), the
positions of the access, the gadget, the target, and A along the
Time axis, over steps ① to ③. A = unprotected memory access.]
Figure 2: Speculative interference attack framework (left) and timeline (right) for both types of interference gadgets.
z = ... // takes Z cycles
A = f(z) // takes F cycles
y = load(A)
if (i < N): // mispred. taken (miss on N)
  // access and Interference Gadget
  secret = load(&TargetArray[i])
  x = load(&S[secret * 64])
  f’(x)
(a)
[(b) Timeline for secret == 1 (load S[64] hits) and secret == 0
(load S[0] misses). Events shown: branch condition mis-speculates;
wait on z; load S[64] or S[0] issued; S[64] or S[0] returns;
z returns; f(z) stalled; f’1, f’2, ... run; f1, f2, f3, ... run;
f(z) returns; load(A) issued; branch resolves and speculation
squashes.]
Figure 3: Delaying a load using contention on a non-pipelined EU (GDNPEU). Instruction sequences f and f’ use the same non-pipelined EU.
② Depending on the secret, the gadget either interferes or
does not interfere with the target. In both types of gadgets, the
secret-dependent behavior of the gadget’s transmitter(s) kick-
starts the attack. For explicit gadgets, it determines whether
the gadget contends for resources with the target or not. For
implicit gadgets, it determines whether the gadget executes
concurrently with the target (interfering with it) or after it.
This step exploits the fact that invisible speculation does
not “protect” a transmitter’s execution time or resource usage
pattern, which remains operand-dependent. This means that
the secret operand gets encoded into the transmitter’s behav-
ior. This behavior cannot be directly observed by the attacker,
but it can be indirectly observed through its influence on the
interference target, which is non-speculative.
③ How quickly the target executes determines when the
unprotected victim’s memory operation is issued and accesses
the cache, thereby modifying its state. The target thus typi-
cally consists of older, retirement-bound instruction(s), which
delay a victim retirement-bound load’s D-Cache access. How-
ever, in some invisible speculation schemes, the target can
also be the core’s frontend, delaying a victim I-Cache access
(§ 3.2.2). Overall, whether or not the gadget interferes with
the target determines whether the unprotected victim memory
operation gets delayed.
3.2.2 Interference Gadgets & Targets
Here, we design several interference gadget/targets, which
illustrate explicit and implicit gadgets that delay D- and I-
Cache accesses. § 4 shows that these gadget designs lead
to observable interference in practice. Our goal here is not
to exhaustively enumerate all possible gadgets/targets, but
to illustrate the problem. Exploring if (and which) other
microarchitectural resources can be used to build interference
gadgets is an interesting topic for future work.
We first present two gadgets that delay an unprotected
D-Cache access. We exploit the fact that in all invisible
speculation designs, a load that executes only when it reaches
the head of the ROB performs its D-Cache access unprotected.
Our interference targets therefore arrange for the victim load’s
address, or target address, to become available just as it
reaches the head of the ROB. Our interference gadgets either
delay the victim from reaching the ROB head or delay its
execution (cache access) after it has reached the ROB head.
Next, we present a gadget that delays an unprotected I-
Cache access. Such accesses are performed by InvisiSpec and
DoM. We acknowledge that unprotected I-Cache access can
form a cache-based covert channel in and of itself but describe
this gadget due to its interesting property of interfering with
the frontend and not with some instruction’s execution.
GDNPEU: Delay data access using non-pipelined EU. This
is an implicit gadget that creates a secret-dependent delay of
the victim’s target address generation. Figure 3(a) shows the
gadget and target. The target consists of a retirement-bound
load whose operand, A, is generated by a dependent chain of
instructions, denoted f. The gadget consists of a load (trans-
mitter) and a sequence of independent instructions, denoted
f’, that depend on the load. (Here and elsewhere, the gad-
get executes in the shadow of a slow-to-resolve mispredicted
branch, due to a cache miss on N.) Each instruction in f’ uses
the same EU as the target. This must be a non-pipelined EU,
so that a gadget instruction being issued to the EU blocks a
target instruction from issuing.
Figure 3(b) shows the attack timeline. The value z, on
which the target address A depends, takes Z cycles to
compute. Z is such that before z gets computed,
there is enough time for the attack’s access load to read
the secret, forward it to the interference gadget, and for the
gadget’s transmitter load to return (if it is a cache hit).
① The transmitter load accesses a secret-dependent cache
line. This load executes under invisible speculation protec-
tion, but it still retrieves the data from some level of the
memory hierarchy. The attacker can therefore orchestrate for
its execution time to be secret-dependent, by appropriately
priming the cache prior to the attack.
2 If secret = 1, the transmitter load returns quickly,
just before the value z is produced. This makes the instruc-
tion sequence f’ in the gadget ready, and its first instruction
f ′1 is issued to the EU before f1, the first instruction in f.
Thus, when f1 becomes ready, it is blocked from using the
EU. When f ′1 completes, f1 is issued (due to age-ordered
5
1 A = f(z) // takes Z cycles
2 y = load(A)
3 if ( i < N): // mispred. taken (miss on N)
4 // Interference Gadget
5 secret = load(&TargetArray[i ])
6 x0 = load(&S[secret ∗ 64 ∗ 0])
7 x1 = load(&S[secret ∗ 64 ∗ 1])
8 ...
9 xM−1 = load(&S[secret ∗ 64 ∗ (M−1)])
Branch resolves,
Speculation squashes
- - -
f(z)
issued
load(A)
stalled
Load
Ssecret == 1
(x0..xM-1 diff)
secret == 0
(x0..xM-1 same)
Branch condition,
miss speculation
(a)
(b)
f(z)
issued
load(A)
stalled
Load
S+64
MSHR
Full
f(z)
returns
MSHR
Available
load(A)
returns
Load
S
Load
S
f(z)
returns
load(A)
returns
- - -
Load
S+(M-1)*64
Figure 4: Delaying a load using miss status holding registers (GDMSHR). M is the number of L1 D-Cache MSHRs.
scheduling). However, once f1 completes, f2—which de-
pends on f1—does not immediately become ready, due to
f1’s writeback delay. In contrast, f ′2, which depends only on
the transmitter, is already ready and so is issued to the EU.
Overall, this creates a cascading effect in which each instruc-
tion fi gets delayed, delaying the target address’ computation,
until the mis-speculation is squashed.
3 If secret = 0, the transmitter load does not return
before z is produced. (In delay-based invisible speculation
designs, the load is never executed. In other designs, the
load simply takes a long time to return compared to the hit
case.) As a result, the target address is computed before the
gadget’s interfering instructions execute (if they execute), and
the victim load can issue.
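The cascading delay described in steps 2 and 3 can be illustrated with a toy scheduler model. This is only a sketch under simplifying assumptions that are not from the paper: a single non-pipelined EU, a dependent target chain f1..fn whose next instruction becomes ready a fixed writeback delay after the previous one completes, gadget instructions that all become ready at one time (when the transmitter returns), and an age-ordered scheduler that always prefers a ready target instruction. The function name and all cycle counts are hypothetical.

```python
def target_finish_cycle(n_target, eu_latency, wb_delay, z_ready,
                        gadget_ready=None, n_gadget=0):
    """Cycle at which the last target instruction leaves the EU (toy model)."""
    t = 0
    eu_free = 0                      # first cycle at which the EU is free again
    next_target_ready = z_ready      # f1 becomes ready when z is produced
    targets_done = 0
    gadget_left = n_gadget if gadget_ready is not None else 0
    while targets_done < n_target:
        t = max(t, eu_free)          # the EU is non-pipelined: wait until free
        if next_target_ready <= t:
            # Age-ordered scheduling: a ready (older) target instruction wins.
            eu_free = t + eu_latency
            targets_done += 1
            next_target_ready = eu_free + wb_delay
        elif gadget_left > 0 and gadget_ready <= t:
            # Otherwise a ready gadget instruction grabs the idle EU.
            eu_free = t + eu_latency
            gadget_left -= 1
        else:
            t += 1                   # nothing is ready: idle for one cycle
    return eu_free


# Hypothetical parameters: 4 target instructions, 10-cycle EU, 2-cycle writeback.
baseline = target_finish_cycle(4, 10, 2, z_ready=5)
hit = target_finish_cycle(4, 10, 2, z_ready=5, gadget_ready=4, n_gadget=8)
miss = target_finish_cycle(4, 10, 2, z_ready=5, gadget_ready=200, n_gadget=8)
print(baseline, hit, miss)  # 51 84 51
```

In this model, a transmitter hit (gadget ready just before z) lets a gadget instruction slip into the EU during every writeback gap, so each target instruction waits for a full gadget instruction; a transmitter miss leaves the target chain at its baseline latency.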
GDMSHR: Delay data access with MSHR contention. This
is an explicit gadget that delays the execution time of a load
at the head of the ROB by a secret-dependent amount of time.
Figure 4(a) shows the gadget and target. The target consists
of the victim load, whose address operand takes Z cycles to
generate. Z is large enough that the gadget’s instructions
can issue while the target address is being generated. The
gadget consists of M independent loads, where M is the num-
ber of L1 D-Cache miss status handling registers (MSHRs),
each of which holds information on all the outstanding misses
for some cache line. The gadget’s goal is to create secret-
dependent MSHR pressure, to delay when the victim load
obtains its data after reaching the head of the ROB and is-
suing. This gadget targets invisible speculation designs that
issue speculative L1 D-Cache misses, i.e., InvisiSpec, Safe-
Spec, and MuonTrap. None of these designs specify changes
to the MSHR allocation policy, so we assume they use the
standard policy of allocating an MSHR to a missing load
based on issue order. Figure 4(b) shows the attack timeline.
1 If secret = 1, each gadget load accesses a different
cache line. The attacker primes the cache so that each of these
accesses is an L1 D-Cache miss. The result is that each load
allocates a distinct MSHR, exhausting the available MSHRs.
Thus, the victim load (assumed to be an L1 D-Cache miss)
cannot issue and it is delayed until one of the gadget loads
completes or the mis-speculation is squashed.
2 If secret = 0, all the gadget loads access the same
cache line. They therefore use the same MSHR, which leaves
MSHRs available for the victim load once it reaches the head
of the ROB. The victim load can issue and is not delayed.
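The secret-dependent MSHR pressure can be seen by counting the distinct cache lines touched by the gadget loads of Figure 4(a). The sketch below assumes 64-byte lines and one MSHR per outstanding missing line; the MSHR count of 10 in the usage lines is a hypothetical value, and the function names are illustrative.

```python
LINE_BYTES = 64  # assumed cache line size


def gadget_cache_lines(secret, num_loads):
    """Cache line indices (relative to S) touched by x0..x(M-1) in Figure 4(a)."""
    return {(secret * LINE_BYTES * k) // LINE_BYTES for k in range(num_loads)}


def victim_load_delayed(secret, num_mshrs):
    """True if the gadget's misses occupy every MSHR, stalling the victim miss."""
    # Misses to the same line share one MSHR; misses to distinct lines do not.
    mshrs_in_use = len(gadget_cache_lines(secret, num_mshrs))
    return mshrs_in_use >= num_mshrs


print(victim_load_delayed(secret=1, num_mshrs=10))  # True: 10 distinct lines
print(victim_load_delayed(secret=0, num_mshrs=10))  # False: all loads hit S[0]
```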
GIRS: Delay instruction fetch with RS contention. This
is an implicit gadget that stalls dispatch of instructions into
the ROB, which throttles the frontend, causing it to stop I-
Cache accesses for fetching instructions. Figure 5(a) shows
the gadget; the target is the frontend, so it does not appear in
the code. The gadget consists of a load (transmitter) and a
long sequence of dependent arithmetic (ADD) instructions that
depend on the load. Figure 5(b) shows the attack timeline.
1 The transmitter load accesses a secret-dependent cache
line, which is set up to hit or miss in the cache hierarchy,
depending on the secret.
2 If secret = 1, the transmitter load is a miss in the D-
Cache. The dependent ADD instructions are fetched and fill
up the RS slots, but do not issue, so the RS fills up. This
creates back-pressure on the Fetch Unit, and the frontend
stalls. Hence, the target instruction is not fetched. Once the
branch resolves, execution continues and the target instruction
gets fetched.
3 If secret = 0, the transmitter load hits in the D-Cache.
Its output is then available to the dependent instructions
quickly. The dependent ADD instructions issue as soon as
the resources are available, hence freeing up the RS slots.
Since the RS does not fill up, the frontend does not stop fetch-
ing. Consequently, the cache line holding target instr is
fetched into the I-Cache.
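The back-throttling effect can be sketched with a toy model of the RS. The assumptions here are ours, not the paper's: the frontend dispatches at most one instruction per cycle, dependent ADDs issue one per cycle once the transmitter's data is available, fetch and dispatch are collapsed into one stage, and the target counts as fetched once it can enter the RS before the squash. All latencies and sizes are hypothetical.

```python
def target_fetched(load_latency, rs_slots, n_adds, squash_cycle):
    """True if the frontend reaches the target before the squash (toy model)."""
    occupied = 0      # RS slots held by ADDs that have not issued yet
    dispatched = 0    # ADDs dispatched so far
    for cycle in range(squash_cycle):
        # Once the transmitter's data is back, one waiting ADD issues per cycle.
        if cycle >= load_latency and occupied > 0:
            occupied -= 1
        if dispatched < n_adds:
            if occupied < rs_slots:      # dispatch stalls while the RS is full
                occupied += 1
                dispatched += 1
        elif occupied < rs_slots:
            return True                  # room for the target: it gets fetched
    return False                         # squash arrives with the frontend stalled


print(target_fetched(load_latency=4, rs_slots=8, n_adds=8, squash_cycle=100))    # True
print(target_fetched(load_latency=300, rs_slots=8, n_adds=8, squash_cycle=100))  # False
```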
3.3 From Timing to Cache State Changes
We now show how to transform the basic attack primitive
(§ 3.2), which creates a secret-dependent delay for an unpro-
tected victim memory access, into a cache covert channel.
The insight here is that we can transform a secret-dependent
timing change—the delay in the victim’s access—into a
secret-dependent cache state change by using the delay to re-
order the victim access with another (unprotected) reference
memory access, which occurs at a fixed, secret-independent
time. Conceptually, the reference access acts as a kind of
“clock,” helping the attacker to observe whether the victim
load issues before or after some point in time.
Crucially, the only property required from the reference
memory access is that its issue time does not depend on the
secret. In particular, the reference access can be issued by the
victim or by the attacker, depending on the specific attack.
We use this property as follows. The basic attack primitive
determines whether an unprotected victim memory access A
is delayed or not, depending on the secret. It thus determines
whether A accesses the cache after or before the unprotected
reference memory access B, respectively. The cache state,
σ, is determined by the sequence of memory accesses to
the cache, α. We assume that σ is not commutative, i.e.,
σ(αAB) ≠ σ(αBA). Formally, therefore, making the order
in which A and B access the cache secret-dependent makes the
cache state secret-dependent, creating a cache covert channel.
Non-commutativity of the cache state holds for most cache
Figure 5(a), gadget and target:
    1 if (i < N):                          // mispredict taken (miss on N)
    2     secret = load(&TargetArray[i])   // access
    3     // Interference Gadget
    4     x = load(&S[secret * 64])
    5     // Congest RS
    6     sum += x;
    7     ...
    8     sum += x;                        // many times
    9 target instr.                        // Target instruction
[Figure 5(b), attack timeline for secret == 1 (Load S[64] miss) and secret == 0 (Load S[0] hit). Events shown: branch condition miss-speculation, Load S[64] issues, Load S[0] issues, S[0] returns, ADDs dispatch, RS full and frontend stops fetch, fetch target inst, I-Cache fill, branch resolves and speculation squashes.]
Figure 5: Back-throttling the Fetch Unit by contending for the RS with the GIRS gadget. sum += x repeats N times, where N is the number of RS slots.
Accesses With Secret-Dependent Order

Gadget    VD-VD/VI                                               VD-AD                            VI-AD
GDNPEU    InvisiSpec (Spectre), DoM (non-TSO), SafeSpec (WFB)    All                              All
GDMSHR    InvisiSpec (Spectre), SafeSpec (WFB)                   InvisiSpec, SafeSpec, MuonTrap   InvisiSpec, SafeSpec, MuonTrap
GIRS      –                                                      –                                InvisiSpec, DoM

Table 1: Invisible speculation designs vulnerability matrix.
architectures, particularly after sufficiently many past ac-
cesses, as long as both memory accesses target different cache
line addresses that map to the same cache set. We denote
these lines by A and B, according to the memory access that
targets each of them. The memory access order impacts the
set’s replacement state (e.g., LRU bits), and can be observed
by inducing evictions and monitoring which lines get evicted
(by timing memory accesses).
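As a concrete instance of non-commutativity, consider a textbook LRU set. The sketch below is illustrative only (true LRU with a hypothetical 4-way set and made-up line names; real LLC policies are more complex, as § 4.2.2 discusses). It shows that σ(αAB) and σ(αBA) differ, and that the difference becomes observable once the attacker induces evictions.

```python
from collections import OrderedDict


def lru_state(accesses, ways):
    """Contents of one true-LRU set after the accesses, ordered LRU to MRU."""
    cache_set = OrderedDict()
    for line in accesses:
        if line in cache_set:
            cache_set.move_to_end(line)        # hit: line becomes MRU
        else:
            if len(cache_set) == ways:
                cache_set.popitem(last=False)  # miss on a full set: evict LRU
            cache_set[line] = True
    return list(cache_set)


alpha = ["w", "x", "y", "z"]   # earlier accesses that fill the 4-way set
probe = ["p", "q", "r"]        # attacker accesses that induce evictions

print(lru_state(alpha + ["A", "B"], 4))          # ['y', 'z', 'A', 'B']
print(lru_state(alpha + ["B", "A"], 4))          # ['y', 'z', 'B', 'A']
print(lru_state(alpha + ["A", "B"] + probe, 4))  # ['B', 'p', 'q', 'r']
print(lru_state(alpha + ["B", "A"] + probe, 4))  # ['A', 'p', 'q', 'r']
```

After the probe, exactly one of A and B is still resident, so a timed access to each line reveals the order.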
Blocking replacement state-related leakage is explicitly in
the scope of invisible speculation (e.g., [31, 38, 56]). How-
ever, we are not aware of such attacks being demonstrated in
practice.2 In § 4, we demonstrate a covert channel based on
the ordering of two LLC accesses on a commercial CPU with
a sophisticated replacement policy. Thus, for the following
discussion, we assume that achieving such secret-dependent
ordering is equivalent to forming a covert channel.
3.3.1 Completing the Attacks
In this section, we combine the speculative interference
gadgets (§ 3.2.2) with various types of reference memory
accesses to obtain several complete attacks on different points
in the invisible speculation design space. Each attack creates
a cache covert channel by making the secret determine the
order of two unprotected LLC accesses, which may be a
victim data access (VD), victim instruction fetch (VI), or
an attacker data access (AD). Table 1 summarizes which
defenses are vulnerable to which attack combinations.
VD-VD ordering. This attack targets invisible speculation
designs that may have multiple unprotected loads executing
concurrently. For example, InvisiSpec and SafeSpec have
modes that only defend against control-flow mis-speculation.
In these modes, any load that becomes ready to execute when
there are no unresolved branches older than it in the ROB,
performs an unprotected access [26, 56]. A similar case
exists with DoM on architectures with a non-TSO memory
Footnote 2: Recent work [55] shows information leakage through cache LRU
states, but its channels rely on more than the ordering of two accesses.
    z = ...                         // takes Z cycles
    A = f(z)                        // takes F cycles
    y = load(A)
    B = g(z)                        // takes G > F cycles
    z = load(B)
    if (i < N):                     // mispredict taken (miss on N)
        secret = load(&TargetArray[i])
        // Interference Gadget
        x = load(&S[secret * 64])   // secret = 1 -> hit, secret = 0 -> miss
        f′(x)
Figure 6: Reordering victim loads by exploiting contention on a
non-pipelined EU. Instruction sequences f and f′ use the same non-
pipelined EU. Instruction sequence g uses a different EU.
consistency model. In this case, any load can execute without
protection if all older branches have resolved and all older
stores and loads have their addresses resolved [38].
We show how to base the attack on the GDNPEU or GDMSHR
gadgets, by modifying the gadget’s interference target so
that the victim load A is followed (in program order) by a
retirement-bound reference load B, whose issue time is not
affected by the gadget. Due to space constraints, we fully
describe the attack based on the GDNPEU gadget; the
GDMSHR-based attack is similar. Figure 6 shows the modified target and
the original gadget. Both A and B’s address generation depend
on z. If secret = 0 (i.e., no speculative interference), load
A accesses the D-Cache before the reference load B, since the
sequence of instructions that generates B, g(z), takes longer
to complete than f(z). However, if secret = 1, there is
speculative interference, so A’s generation is delayed while
B’s is not, and load B accesses the D-Cache first.
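The reordering condition reduces to simple arithmetic on the address-generation latencies of Figure 6. In the sketch below, F and G are the latencies of f(z) and g(z), and the interference delay is whatever the gadget adds to f(z); the function name and the cycle counts are hypothetical, chosen only to illustrate the comparison.

```python
def load_order(f_cycles, g_cycles, interference_delay=0):
    """Order in which loads A and B reach the D-Cache once z is available."""
    a_issue = f_cycles + interference_delay   # only f(z) shares the EU with f'
    b_issue = g_cycles                        # g(z) runs on a different EU
    return "A-B" if a_issue < b_issue else "B-A"


print(load_order(f_cycles=30, g_cycles=40))                         # A-B (secret = 0)
print(load_order(f_cycles=30, g_cycles=40, interference_delay=16))  # B-A (secret = 1)
```

The attack needs G - F to be smaller than the delay the gadget can induce, yet large enough that A reliably issues first when there is no interference.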
VD-VI ordering. Modifying the target in the VD-VD attack
so that the branch condition i depends on load A makes the
delay of load A also delay the branch’s resolution time, i.e.,
when the squash occurs. This can change the order of a
post-squash instruction fetch (which is unprotected, as it is on the
correct execution path) with respect to load B.
VD-AD ordering. Many invisible speculation designs unpro-
tect a load only when it becomes the oldest load or the oldest
instruction in the ROB. This is the case in InvisiSpec’s Fu-
turistic mode [56], SafeSpec’s wait-for-commit mode [26],
Conditional Speculation [31], and MuonTrap [5]. These de-
signs make it impossible to reorder unprotected victim loads,
as no two such loads can execute concurrently. As noted
above, however, the same effect—secret-dependent order—
can be achieved if the attacker performs the reference access.
For this, the attacker simply needs to issue an LLC access to
the same set accessed by the VD load from another core, at a
fixed time after inducing the mis-speculation. This attack can
be based on either the GDNPEU or the GDMSHR gadget.
VI-AD ordering. As in the VD-VI case, the GDNPEU and
GDMSHR gadgets can be used to target the branch condition, de-
laying a post-squash instruction fetch on the correct execution
path. This can be measured using the attacker’s LLC access
as a reference clock. In contrast, the GIRS gadget only im-
pacts the timing of instruction fetches in the mis-speculated
path. Hence, the delay it introduces for instruction fetches
can only be observed if I-Cache accesses are not protected by
the invisible speculation scheme, as in InvisiSpec and DoM.
Attack landscape summary. Every invisible speculation design
we have evaluated is vulnerable to at least one of the
attacks described above. Table 1 summarizes which designs
are vulnerable to which attack combinations. The differences
in security manifest in whether an attacker can reorder unpro-
tected victim accesses, or must rely on its own access as a
“reference clock.”
3.4 Existence of Senders
In § 4, we show that speculative interference attacks are
practical. Hence, existing work in the invisible speculation
paradigm remains vulnerable to subtle cache-based covert
channels. Invisible speculation thus only provides a con-
ditional security guarantee, because it protects only those
programs that do not contain code fragments embodying
senders such as those designed in this section. Unfortunately,
users and/or developers cannot know if their programs satisfy
this condition without performing program analysis to verify
the non-existence of senders. Relying on such program anal-
ysis to guarantee security undermines the efficacy of invisible
speculation as a hardware defense.
On top of this conceptual “security uncertainty” issue,
there are several real-world speculative execution attack set-
tings [23] in which the attacker has some control over the
instruction stream and can craft interference gadgets/targets.
These settings include (1) the in-domain setting, where a
software sandbox executes attacker-controlled code, as in the
case of in-browser JavaScript code or user-supplied Linux
eBPF kernel extensions [39]; and (2) the domain-bypass set-
ting, where the attacker runs its own program, attempting to
use its mis-speculated execution to steal secrets from another
hardware protection domain, e.g., Meltdown [32].
4. ATTACK DEMONSTRATIONS
In this section, we demonstrate concrete proof-of-concept
(PoC) speculative interference attacks based on the ideas from
§ 3 on a commercial machine. Although invisible speculation
schemes are not implemented today, we can emulate their
behavior by arranging for loads that would be made ‘invisible’
to return data in secret-dependent amounts of time. At the
same time, by evaluating on real hardware, we must address
many details in real machines that are simplified in simulators
(e.g., LLC replacement policies, RS limits).
We evaluate multiple D-Cache PoCs described in § 3.2.2,
namely GDNPEU and GDMSHR. Both PoCs were successfully
implemented, and both attacks leak secret bits
to the attacker. Due to space constraints, we only show the
GDNPEU attack. Of independent interest, our implementation of
GDNPEU requires constructing a novel receiver able to read changes in
replacement state for the QLRU_H11_M1_R0_U0 replacement
policy (§ 4.2.2). We further present a variation on
GIRS that shows how speculative interference can cause
back-throttling through the I-Cache. All three attacks change
cache state, with a receiver (attacker) that monitors execution
from another physical core (CrossCore; § 2.1). In what follows,
we refer to our GDNPEU PoC as the D-Cache PoC (§ 4.2)
and our variant of GIRS as the I-Cache PoC (§ 4.3).
4.1 Methodology
Processor details. We evaluate on an Intel Core i7-7700
Kaby Lake CPU with 4 physical cores running at a base frequency
of 3.6GHz, with hyper-threading enabled. Each core
has a unified reservation station that is shared across execution
units and stores up to 97 micro-ops, and has 8 execution
unit ports (numbered 0 through 7). Each physical core has
two levels of private cache (a 32KB L1 instruction cache, a 32KB
L1 data cache, and 256KB of combined L2), and all cores share
an 8MB L3 (LLC) cache [1, 2].
Tools borrowed from prior work. In both attacks, we
trigger branch mispredictions by training the target branch in
a given direction (similar to [29]). Likewise, we delay branch
resolution by having the branch predicate be the result of a
pointer chase. Both attacks use a Flush+Reload-style [59]
receiver. This decision for the receiver is not fundamental for
the I-Cache PoC (and alternatives could have been used, e.g.,
Prime+Probe [33]), but is important for our current D-Cache
PoC (discussed below).
Finally, the D-Cache PoC uses standard techniques to con-
struct eviction sets in the LLC [33], which are sets of cache
lines that map to the same LLC set in the same LLC slice. By
accessing lines in an eviction set, the attacker can efficiently
evict other lines whose set and slice is known.
4.2 D-Cache PoC
Recall from § 3.2.2 that the key principle in the GDNPEU attack
is for the attacker to observe the reordering of two bound-to-
retire loads. Our PoC enables the attacker to measure this
ordering by mapping the two loads to the same LLC set and
measuring changes in replacement state.
To deploy the attack there are two ingredients that need
to be developed. First (§ 4.2.1), an implementation of the
GDNPEU sender, i.e., to re-order older bound-to-retire loads.
Second (§ 4.2.2), a novel receiver capable of measuring dif-
ferences in LLC replacement state. We consider both of these
to be of independent interest, i.e., to re-order older non-load
instructions to perform different speculative interference at-
tack variants or to be used in entirely different attack settings
(in the case of the replacement state-based receiver).
4.2.1 Sender (Load Re-ordering)
To reorder the loads to two addresses A and B, we follow
the structure from Figure 6. Namely, there are two sequences
of instructions, f(z) and g(z), that generate addresses A
and B respectively. An interference gadget only affects f(z).
In the presence of the gadget, load A is delayed so that it issues after
load B, whereas normally it would issue before load B.
First consider the address generation of A and the inter-
ference gadget in isolation. We implement f(z) and f’(x)
(Figure 6) as repeated sequences of the same instruction, called
the target instruction and gadget instruction, respectively.
We pick suitable instructions (i.e., that maximize the inter-
ference of the gadget on the target) as follows. We identify
high latency, low-throughput instructions that use the same
execution port. Low-throughput allows for an issued instruc-
tion in the interference gadget to block the execution port
of ready-to-schedule instructions in the interference target;
high latency maximizes the time it blocks instructions in the
interference target. Finally, the gadget instruction should
be composed of only a few micro-ops. This allows more
instructions in the interference gadget to occupy the RS simultaneously,
which increases the likelihood of them being issued
concurrently with the target instructions.
[Figure 7 plot: "Interference Gadget Contention Histogram"; x-axis: Cycles (120 to 180); y-axis: Frequency; series: interference, baseline.]
Figure 7: The average time (measured with a clock thread [41, 44]) to
execute the interference target changes by ∼16 clock ticks (80 rdtsc
cycles) based on the presence or absence of the interference gadget.
Based on the above process, our PoC uses the VSQRTPD in-
struction for both gadget and target. VSQRTPD consists of only
1 micro-op executed on the core’s execution port 0 and has
observed latencies of 15–16 cycles and reciprocal throughput
of 9–12 cycles [14]. We also verified that the attack is func-
tional with VDIVPD. Figure 7 shows the time from the issue
of the first instruction of f(z) to the completion of the load
A in the presence (interference) and absence (baseline) of the
interference gadget’s execution. The takeaway is there is a
clear timing difference in the interference target’s execution
depending on presence/absence of the interference gadget’s
execution. This is the secret-dependent delay imposed by the
gadget on the victim load.
4.2.2 Receiver (Monitoring Replacement State)
With the capability to reorder two loads, the next ingre-
dient for the attack is to translate a reordering of loads into
a persistent cache state change. We achieve this using the
cache replacement state.3 For the rest of the section, we use
the notation A-B to indicate the order in time in which the
loads are issued, i.e., A-B means A issues first and vice-versa.
We also assume access to eviction sets (EV; § 4.1).
Our attack targets the replacement state because we are
only changing the order of loads. Changing the order of
loads is different than changing which loads are issued as in
a normal cache-based attack. For example, a standard LLC
Prime+Probe attack, without a very fine probe granularity,
would observe both A and B in the cache, regardless of their
order and be unable to distinguish A-B from the B-A case.
Footnote 3: RELOAD+REFRESH [10] also uses replacement-state manipulation
principles to execute a cache-based attack. The distinction in
this work is that we try to identify the victim’s load issue order, whereas
they try to identify the presence of a victim’s access to a target
address.
Translating load issue order into a persistent replacement
state change is not difficult in textbook replacement policies,
such as LRU, as the ordering directly influences replacement
priority ranking. However, replacement policies in modern
machines, such as our target processor, are more complex.
The new technical challenge for the attacker is that fresh
insertions of A and B are ranked equally.
We show how this new challenge can be overcome by pro-
viding a technique to extract replacement state data from the
replacement policy on the Kaby Lake machine. To identify
the replacement policy on our machine, we used the CacheAnalyzer
tool from nanoBench [3]. The inferred replacement policy
is approximately QLRU_H11_M1_R0_U0 (“Quad-age
LRU”) on specific cache sets [51].4 QLRU is a Static-RRIP
replacement policy variant with a 2-bit field used for the age
of a cache line [3, 24], summarized here:
• M1: Insertion policy. Inserts cache lines with age 1.
• H11: Hit promotion policy. Promotes a line of age 3 to age
1, age 2 to age 1, and age 1/0 to age 0 upon hit.
• R0: Eviction policy. Insert to leftmost location if cache
set is not full; otherwise, evict block corresponding to the
leftmost physical tag with age 3.
• U0: Age update policy. Increments age fields of all cache
lines until there is a candidate ready for eviction (age = 3).
Attacker Receiver Protocol. We now describe how the
attacker decodes from the replacement state whether A-B
or B-A occurred. At a high level, similar to a traditional
cache attack, the attacker thread first primes the LLC set,
waits for the victim to issue its secret-dependent ordering,
and finally probes the LLC set to determine which ordering
the victim issued. Due to the nature of QLRU, however,
the details are different from conventional attacks. Specifically,
the attacker first constructs two eviction sets, each with
LLC_ASSOCIATIVITY-1 elements, called EVS1 and
EVS2, which map to the same LLC set and slice as A and
B. The attacker then uses the following access sequences to
prime and probe the cache set:
• Prime Sequence: Access EVS1 many times + Access A
• Probe Sequence: Access EVS2
The attacker accesses EVS1 many times in order to saturate
their age at 0, leaving A with an age of 3. To be able to access
address A, our current PoC requires that the receiver share
memory with the victim (hence the use of Flush+Reload).
For our machine, the targeted cache sets have an associa-
tivity of 16. We will refer to elements in EVS1 as EV0-
EV14, and elements in EVS2 as EV15-EV29. The resulting
cache states for prime and probe with the A-B sequence are
displayed in Figure 8. The main principle is that only one of A
and B is still resident in the LLC by the end of the entire
prime+victim_accesses+probe sequence.
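The prime, victim, and probe sequences can be replayed on a small software model of the policy described above. This is a sketch, not the authors' tooling: it assumes the U0 age update runs after every access and raises all ages together until some line has age 3, it starts from a set filled with unrelated lines (named X0..X15 here), and it uses 4 rounds of EVS1 accesses to stand in for "many times". The class and function names are hypothetical.

```python
class QLRUSet:
    """One cache set under a simplified QLRU_H11_M1_R0_U0 model."""

    def __init__(self, ways=16):
        self.tags = [None] * ways
        self.ages = [3] * ways

    def _update_ages(self):
        # U0 (assumed to run after every access): raise all ages together
        # until at least one valid line has age 3.
        valid = [i for i, tag in enumerate(self.tags) if tag is not None]
        bump = 3 - max(self.ages[i] for i in valid)
        for i in valid:
            self.ages[i] += bump

    def access(self, tag):
        """Access one line; returns True on a hit and False on a miss."""
        hit = tag in self.tags
        if hit:
            i = self.tags.index(tag)
            # H11: age 3 -> 1, age 2 -> 1, age 1 or 0 -> 0.
            self.ages[i] = {3: 1, 2: 1, 1: 0, 0: 0}[self.ages[i]]
        else:
            if None in self.tags:
                i = self.tags.index(None)   # R0: leftmost empty location
            else:
                i = self.ages.index(3)      # R0: leftmost line with age 3
            self.tags[i] = tag
            self.ages[i] = 1                # M1: insert with age 1
        self._update_ages()
        return hit


def run_round(victim_order, ways=16, prime_rounds=4):
    """Replay prime + victim accesses + probe; returns (A resident, B resident)."""
    cache_set = QLRUSet(ways)
    evs1 = [f"EV{i}" for i in range(ways - 1)]
    evs2 = [f"EV{i}" for i in range(ways - 1, 2 * (ways - 1))]
    for i in range(ways):                   # assumed starting state: unrelated lines
        cache_set.access(f"X{i}")
    for _ in range(prime_rounds):           # prime: access EVS1 many times ...
        for tag in evs1:
            cache_set.access(tag)
    cache_set.access("A")                   # ... then access A
    for tag in victim_order:                # victim: A-B or B-A
        cache_set.access(tag)
    for tag in evs2:                        # probe: access EVS2
        cache_set.access(tag)
    return "A" in cache_set.tags, "B" in cache_set.tags


print(run_round(["A", "B"]))  # (False, True): A-B leaves only B resident
print(run_round(["B", "A"]))  # (True, False): B-A leaves only A resident
```

Under these assumptions the model reproduces the main principle stated above: exactly one of A and B survives the probe, and which one survives depends only on the victim's access order.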
4.2.3 End-to-End Attack
In this section, we present the overall D-Cache PoC. The
attack steps shown in Figure 9 are explained in detail below:
1 Attacker initializes eviction sets based on addresses A,B.
Footnote 4: It is likely the case that the LLC cache sets do not strictly abide
by this replacement policy and have an adaptive replacement policy.
However, for the purposes of this PoC, the attack strategy that creates
observable replacement state changes on QLRU_H11_M1_R0_U0
also creates observable replacement state changes on our machine.
(a) After Prime Sequence:
    line:  EV0   EV1   EV2   EV3...EV11    EV12  EV13  EV14  A
    age:   2     2     2     2             2     2     2     3
(b) After Victim Access A-B:
    line:  B     EV1   EV2   EV3...EV11    EV12  EV13  EV14  A
    age:   1     3     3     3             3     3     3     2
(c) After Probe with EV15-EV29:
    line:  B     EV15  EV16  EV17...EV25   EV26  EV27  EV28  EV29
    age:   3     3     3     3             3     3     3     2
Figure 8: QLRU state for the targeted cache set. EVN, A, and B represent
addresses, and numbers represent the age of each cache line. (a)
shows the cache state after the attacker primes the cache. (b) and (c) represent
the cache states after the victim runs (with pattern A-B) and after
the attacker completes the probe. A victim access pattern of B-A has
analogous state changes.
Attacker (Receiver), Core 2:
    (1) find_eviction_set(A, B)
    (2) train_branch_predictor()
        prime_llc_set()
Victim (Sender), Core 1:
    (3) A = contention_target()
        y = load(A)
        B = fixed_latency()
        z = load(B)
        N = pointer_chase()
        if (i < N):                     // mis-spec
            secret = load(tgt[i])
            x = load(&S[secret*64])
            // miss (secret=0) => A-B
            // hit (secret=1) => B-A
            contention_gadget()
Attacker (Receiver), Core 2:
    (4) probe_llc_set()
        load(A), load(B)
    (5) if (A cache_hit & B cache_miss)
            secret = 1
        if (A cache_miss & B cache_hit)
            secret = 0
Figure 9: An end-to-end visualization of the D-Cache attack.
2 Attacker primes the LLC set replacement state (§ 4.2.2)
and mis-trains the victim’s branch predictor.
3 Victim issues loads to A and B, where order depends on
the secret (§ 3.2.2). If secret = 0, A-B is issued, and if secret
= 1, B-A is issued.
4 Attacker probes the LLC set replacement state (§ 4.2.2),
and observes the residency of A or B in the LLC set. The
residency is determined by issuing timed accesses to A and B
and comparing each latency with an LLC miss threshold.
5 Attacker attempts to identify the secret bit. If the victim
issues the load sequence A-B (secret = 0), the expectation is
for load(A) to be a cache miss and load(B) to be a cache hit.
If the victim issues the load sequence B-A (secret = 1), the
expectation is for load(A) to be a cache hit and load(B) to
be a cache miss. Cases where both accesses are cache misses
can happen due to noise and are ignored.
6 Attacker repeats steps 2-5 as needed to increase confi-
dence in results.
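Steps 4 to 6 amount to thresholding two timed accesses and voting across repetitions. The sketch below is one possible decoding rule, not the PoC's actual code: the latencies and the miss threshold are hypothetical values, rounds where both accesses hit are discarded along with rounds where both miss (an assumption), and a simple majority vote stands in for "repeats as needed".

```python
from collections import Counter


def decode_round(lat_a, lat_b, miss_threshold):
    """Decode one probe: 1 for B-A, 0 for A-B, None if the round is unusable."""
    a_hit = lat_a < miss_threshold
    b_hit = lat_b < miss_threshold
    if a_hit and not b_hit:
        return 1          # A resident, B evicted: victim issued B-A
    if b_hit and not a_hit:
        return 0          # B resident, A evicted: victim issued A-B
    return None           # both miss (noise) or both hit: ignore the round


def decode_bit(rounds, miss_threshold):
    """Majority vote over repeated (lat_a, lat_b) measurements; None on a tie."""
    votes = Counter()
    for lat_a, lat_b in rounds:
        bit = decode_round(lat_a, lat_b, miss_threshold)
        if bit is not None:
            votes[bit] += 1
    if votes[0] == votes[1]:
        return None
    return 1 if votes[1] > votes[0] else 0


# Hypothetical latencies in cycles, with a hypothetical 200-cycle miss threshold.
samples = [(60, 300), (70, 310), (320, 65), (300, 305)]
print(decode_bit(samples, miss_threshold=200))  # 1
```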
Note, while our PoC observes load-reordering through
replacement state, other receivers not based on replacement
state might be possible.
4.3 I-Cache PoC
As discussed in § 3.2.2, the I-Cache attack works by cre-
ating contention on available reservation stations to create
a ripple effect that eventually stops the frontend from fetch-
ing more instructions, causing changes to the I-Cache access
pattern. We now describe this attack in detail.
4.3.1 Experiment Setup
As with the D-Cache PoC, we show the I-Cache PoC given
an attacker that monitors state from another physical core
through the LLC. The pseudo code for the victim is described
in Figure 5. Without loss of generality, the target instruction
used at line 9 in Figure 5 is a shared library function call.
For simplicity, our attack slightly differs from that explained
in Figure 5. Specifically, we move the target instruction
into the mis-speculated path (before the branch join point).
Thus, in a correct execution, the target instruction will not be
fetched (as opposed to fetched later). The receiver (attacker)
on the adjacent physical core issues a load to a shared library
function to perform the reload step.
[Figure 10 diagram: the to-be-squashed instruction sequence LD A; ADD A, x0; ADD A, x1; ...; ADD A, xN; CALL f740 (target func at 0xf740) passes through Inst. Fetch, Decode, and Dispatch into the Reservation Station. Labels: Load A, Fetch 0xf740, Back pressure, Cache Hierarchy; step markers 2, 3, 4a, 4b.]
Figure 10: Steps involved in performing an I-Cache attack. Dotted
lines are conditional events depending on Block A’s presence in cache.
4.3.2 Attack Details
Here we refer to the pseudo-code in Figure 5 and describe
the attack sequence. Item numbers refer to steps in Figure 10.
1 Attacker first primes the cache hierarchy by flushing out
the target address (shared function pointer) from the I-Cache.
2 When the victim runs next, it mis-speculates on a branch
(line 1 in Figure 5), prompting the frontend to fetch transient
(bound-to-squash) instructions: a secret-dependent
load instruction followed by a large number of
load-dependent arithmetic instructions.
3 The next steps proceed very similarly to what we describe in
the GIRS paragraph of § 3.2.2. The victim dispatches LD A (line 4 in
Figure 5), which is set up to hit or miss in the cache hierarchy
based on the secret index.
4a Miss: LD A misses, creating a frontend stall due to de-
pendent ADDs. Hence, as per our implementation, the target
instruction (call 0xf740 in Figure 10) is not fetched. When
the branch resolves, execution continues but because the tar-
get instruction was on the mis-speculated path, the target
address was never fetched into the cache.
4b Hit: LD A returns quickly and no RS congestion occurs.
The target instruction is executed and hence the target address
cache line is fetched into the I-Cache. Since the branch
resolves after the target instruction is executed, the fetched
line leaves a persistent change in cache state after the mis-
speculated instructions are squashed.
5 After waiting for the victim to run, the attacker performs
a standard probe step, re-loading the shared function pointer,
either via a function call (I-Cache) or by loading the contents
of the function pointer (D-Cache).
4.4 Attack evaluation
We run the PoCs in a cross-core setting and evaluate the
end-to-end covert channel error rate vs. throughput. See
Figure 11 (a) and (b) for the D-Cache and I-Cache PoC, re-
spectively. Throughput is defined as the number of secret bits
transmitted per unit time. It is represented as bits per second
(bps) and evaluated by measuring the CPU cycles required to
[Figure 11 plots: bit error probability (y-axis, 0.0 to 0.4) vs. bit rate in bps (x-axis); panel (a) D-Cache, 0 to 200 bps; panel (b) I-Cache, 0 to 1000 bps.]
Figure 11: Attack PoC channel error vs. bit rate. (a) DCU PoC
(Section 4.2), (b) ICU PoC (Section 4.3).
leak 1 bit. Error rate is defined as the number of incorrectly
inferred bits over the total number of bits transmitted. We can
trade off error rate against bit rate by changing PoC parameters,
e.g., the number of times the PoC is run to leak each bit or the
amount of time spent trying to mistrain the branch predictor.
As a representative result: in the I-Cache PoC, choosing a rate
of 465 bps (0.2 error-rate), an AES-128 key can be leaked in
under 0.3s with 80% accuracy.
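The representative result follows directly from the reported operating point. The short calculation below only restates that arithmetic; the expected number of wrong bits is our own derived figure, assuming the 0.2 error rate applies independently to each of the 128 key bits.

```python
AES_KEY_BITS = 128
bit_rate_bps = 465        # operating point reported for the I-Cache PoC
bit_error_rate = 0.2      # error rate at that operating point

leak_time_s = AES_KEY_BITS / bit_rate_bps             # about 0.275 s, under 0.3 s
per_bit_accuracy = 1 - bit_error_rate                 # 0.8, i.e., 80%
expected_wrong_bits = AES_KEY_BITS * bit_error_rate   # about 25.6 bits

print(f"{leak_time_s:.3f} s, {per_bit_accuracy:.0%} per-bit accuracy, "
      f"~{expected_wrong_bits:.1f} wrong bits expected")
```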
5. DEFENSES
We discuss various approaches for invisible speculation
designs to block speculative interference attacks. To this end,
we first propose a formal definition of what it means to block
all cache covert channels (§ 5.1). We describe two designs
that achieve this goal. The first design (§ 5.2) is straightfor-
ward, but imposes significant performance overhead, unlike
current designs [5,38]. We thus propose a high-level approach
for a more efficient solution (§ 5.2), whose exploration we
leave to future work.
5.1 Ideal Invisible Speculation
We define an ideal invisible speculation security prop-
erty, which formally models the security goal of eliminating
all speculative execution cache covert channels. Informally,
ideal invisible speculation requires that the system’s cache
state is invariant of speculative execution.
More formally: We assume a multi-core system with pri-
vate L1 I- and D-Caches and a shared L2 cache (or L2). In
an invisible speculation design, the L2 can receive visible
and invisible accesses. A visible access corresponds to a
standard cache fill or writeback, causing changes in both the
L1 and L2. An invisible request is a request type added by
the defense; it does not cause state changes in the L2 and its
response does not change state in the L1. We assume that the
attacker sees the sequence (without timing information) of
visible L2 accesses. We call this the L2 access pattern.
We formulate the security goal of the L2 access pattern
being invariant of speculation. To define this guarantee, we in-
troduce the following definitions. Given an execution E of the
microarchitecture, define C(E) as the L2 access pattern in E.
Define NoSpec(E) as the execution that would have occurred
if E had no mis-speculations. Then ideal invisible specula-
tion is the following property, akin to non-interference [17]:
For any execution E: C(E) = C(NoSpec(E)).
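The property can be phrased as a check on two traces. In the sketch below, an execution is modeled as a list of (address, visible) L2 requests in issue order; this representation and the function names are our own illustration, and NoSpec(E) must be supplied by the caller because it is a different execution, not a filter over E.

```python
def l2_access_pattern(execution):
    """C(E): the sequence of visible L2 accesses, without timing information."""
    return [addr for addr, visible in execution if visible]


def is_ideal_invisible(execution, nospec_execution):
    """Check C(E) == C(NoSpec(E)) for one execution and its NoSpec counterpart."""
    return l2_access_pattern(execution) == l2_access_pattern(nospec_execution)


nospec = [("A", True), ("B", True)]

# An invisible speculative load that does not disturb older instructions.
harmless = [("S", False), ("A", True), ("B", True)]
# Speculative interference: the invisible load's gadget delays A past B.
interfered = [("S+64", False), ("B", True), ("A", True)]

print(is_ideal_invisible(harmless, nospec))    # True
print(is_ideal_invisible(interfered, nospec))  # False
```

The second case mirrors the VD-VD attack: every speculative access is invisible, yet the visible access pattern changes from A, B to B, A, so the property is violated.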
5.2 Basic Defense Design
Here, we present a simple solution that can provide ideal
invisible speculation. The idea is that, when instructions that
might cause a mis-speculation are inserted in the ROB, the
hardware automatically inserts a special type of fence. The
fence allows subsequent instructions to be inserted into the
ROB, but prevents them from being issued until the instruction
before the fence becomes non-speculative.
Figure 12: Performance of basic defense on SPEC2017 benchmarks.
To achieve ideal invisible speculation, fences must be in-
serted after any instruction that may cause a squash. This
threat model (considering all forms of speculation) is some-
times referred to as the Futuristic model [56]. The design
can be tuned to consider only control-flow speculation (the
Spectre model [56]) by placing fences only after branches.
This requires adjusting the security property to consider only
control-flow speculation.
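As a rough illustration of the fence rule, the following toy model computes the earliest cycle at which each ROB entry may issue. Everything in it is a simplifying assumption rather than the evaluated hardware: the `Inst` record, the three instruction kinds, the choice that only branches and loads can squash under the Futuristic model, and the approximation that an instruction becomes non-speculative `resolve_latency` cycles after it issues. Data dependences and resource limits are ignored.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Inst:
    name: str
    kind: str             # "branch", "load", or "alu" (toy classification)
    resolve_latency: int  # cycles from issue until non-speculative (assumed)


def needs_fence(inst: Inst, model: str) -> bool:
    """Decide whether the hardware inserts a fence after this instruction."""
    if model == "unsafe":
        return False
    if model == "spectre":
        return inst.kind == "branch"
    if model == "futuristic":
        # Toy assumption: branches and loads may squash, ALU ops may not.
        return inst.kind in ("branch", "load")
    raise ValueError(f"unknown threat model: {model}")


def earliest_issue_cycles(rob: List[Inst], model: str) -> List[int]:
    """Walk the ROB in program order and apply the fence rule."""
    cycles = []
    barrier = 0  # cycle at which every fenced older instruction has resolved
    for inst in rob:
        issue = barrier            # younger instructions wait for the fence
        cycles.append(issue)
        if needs_fence(inst, model):
            barrier = max(barrier, issue + inst.resolve_latency)
    return cycles


rob = [Inst("ld r1", "load", 40), Inst("br", "branch", 12),
       Inst("ld r2", "load", 30), Inst("add", "alu", 1)]

print(earliest_issue_cycles(rob, "unsafe"))      # [0, 0, 0, 0]
print(earliest_issue_cycles(rob, "spectre"))     # [0, 0, 12, 12]
print(earliest_issue_cycles(rob, "futuristic"))  # [0, 40, 52, 82]
```

In this toy trace the Futuristic setting serializes every instruction behind each load and branch, which gives some intuition for why fencing after every potentially squashing instruction costs far more than fencing only after branches.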
5.3 Evaluation of the Basic Defense
We evaluate the performance of the basic defense design
on the Gem5 [9] simulator. We model a high performance
multi-core system (8-issue OoO 2 GHz cores with 32 KB L1
D-Cache and 2 MB per-core shared L2 banks). This configu-
ration is similar to systems used in previous invisible specula-
tion work (e.g., [5, 38, 56]). We use the SPEC CPU2017 [21]
benchmarks with reference input size and Simpoints [45]
to identify around 10 representative execution regions per
benchmark and run 10 million instructions per simpoint.
Figure 12 shows the performance overhead of the basic
defense design over the unsafe, unmodified processor under
both the Spectre and Futuristic threat models. With the basic
defense, execution time is on average 1.58× that of the unsafe
baseline under the Spectre threat model and 5.38× under the
Futuristic threat model. In comparison, prior works report
overheads of < 5% and < 20% in the Spectre and Futuristic
models, respectively [5, 38, 56]. Therefore, while the simple
solution can achieve ideal invisible speculation, it does so at
a dramatic performance cost.
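To put the normalized execution times on the same scale as the percentage overheads reported by prior work, the conversion is overhead = (normalized time − 1) × 100. The short sketch below applies it to the two averages quoted above; the function name is illustrative only.

```python
def overhead_percent(normalized_time: float) -> float:
    """Convert execution time normalized to the baseline into % overhead."""
    return (normalized_time - 1.0) * 100.0


for model, t in {"Spectre": 1.58, "Futuristic": 5.38}.items():
    print(f"{model}: {overhead_percent(t):.0f}% overhead")
```

This gives roughly 58% and 438% overhead for the basic defense, against the < 5% and < 20% figures cited for prior schemes.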
5.4 Discussion: Potential Advanced Defense
Here, we outline at a high level a more advanced solution
to speculative interference attacks. The approach relies on
designing hardware that follows two rules: 1) no instruction
releases its hardware resources while the instruction is specu-
lative, and 2) no instruction ever delays an older instruction.
Not Releasing Resources Early. This rule requires that a
speculative instruction releases the hardware resources it uses
only when it becomes non-speculative or gets squashed. Ex-
amples of such resources are reservation stations and exe-
cution units. This rule makes the duration of time that the
instruction occupies any resource independent of the instruction’s
operands. The trade-off is that the instruction may hold
on to the resource for longer than in the baseline design.
Not Delaying Older Instructions. This rule is implemented
by assigning a priority tag to each instruction, based on its
age in the ROB. The general strategy is to follow three steps.
First, when two instructions with priorities i and j are about
to use any shared resource, the hardware gives precedence
to the instruction with higher priority. Second, to prevent
counter wrap-around problems, we maintain two priority tags
(head and tail) and use logic akin to FIFO full/empty to check
priority tag values on instructions. Third, when a branch is
squashed, the priority tag is reset to the correct value.
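The three steps can be sketched as follows. This is one possible reading, not the paper's design: the tag width, the function names, the assumption that at most TAG_SPACE − 1 instructions are in flight (so head == tail means empty), and the interpretation of the squash step as rewinding the tail tag are all assumptions made for illustration.

```python
from typing import List

TAG_BITS = 3                # hypothetical tag width
TAG_SPACE = 1 << TAG_BITS   # tags are allocated modulo TAG_SPACE


def age(tag: int, head: int) -> int:
    """Distance from the head tag; 0 is the oldest in-flight instruction."""
    return (tag - head) % TAG_SPACE


def in_flight(tag: int, head: int, tail: int) -> bool:
    """FIFO-style check: the tag lies in [head, tail) modulo TAG_SPACE."""
    return age(tag, head) < age(tail, head)


def arbitrate(requests: List[int], head: int) -> int:
    """Step 1: grant the shared resource to the oldest requester."""
    return min(requests, key=lambda tag: age(tag, head))


def tail_after_squash(branch_tag: int) -> int:
    """Step 3 (assumed reading): the next tag to allocate follows the branch."""
    return (branch_tag + 1) % TAG_SPACE


# Tags 6, 7, 0, 1 are in flight: the counter has wrapped around.
head, tail = 6, 2
winner = arbitrate([0, 7], head)
print(winner)  # 7, even though 0 < 7 numerically

# The branch with tag 7 mispredicts: younger tags 0 and 1 are discarded.
tail = tail_after_squash(7)
print(in_flight(7, head, tail), in_flight(0, head, tail))  # True False
```

Comparing ages relative to the head tag, rather than comparing raw tag values, is what keeps the ordering correct across wrap-around.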
This design is straightforward when every single resource
is perfectly pipelined. This is currently the case for pipelined
EUs, cache ports and banks, and write back links. How-
ever, for resources that are not perfectly pipelined (e.g., non-
pipelined EUs), three different choices are available.
One approach is to make them fully pipelined (at the ex-
pense of longer latency). A second approach is to make the
corresponding resource scheduler smarter, so that it “looks
ahead in time” and anticipates if, by assigning the resource
to a low-priority instruction now, one may have to stall a
higher-priority instruction later. In this case, the low priority
instruction is stalled until the higher priority one uses the
resource. This strategy may not be possible all the time (or
be quite expensive/slow). The third approach is to design the
EU to be “squashable”. This means that it can be freed-up on
demand if a higher-priority instruction requests the EU. This
complicates the design, as it requires that the instruction cur-
rently using the resource be “re-issuable” (i.e., the hardware
holds its state and can reuse it to relaunch the operation).
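The third option can be illustrated with a small behavioral model of a non-pipelined EU that an older instruction can preempt. The class, field names, and latencies below are hypothetical; the sketch only shows the intended property that an older operation's completion time does not depend on a younger operation already occupying the unit.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Op:
    name: str
    age: int      # position relative to the ROB head; 0 is the oldest
    latency: int  # cycles the op occupies the non-pipelined unit


class SquashableEU:
    """Toy non-pipelined execution unit that older ops can preempt."""

    def __init__(self) -> None:
        self.current: Optional[Op] = None
        self.remaining = 0
        self.reissue_queue: List[Op] = []  # preempted ops awaiting re-issue

    def _start(self, op: Op) -> None:
        self.current = op
        self.remaining = op.latency

    def request(self, op: Op) -> bool:
        """Return True if op is granted the unit in this cycle."""
        if self.current is None:
            self._start(op)
            return True
        if op.age < self.current.age:
            # Older op wins: squash the occupant and keep it for re-issue.
            self.reissue_queue.append(self.current)
            self._start(op)
            return True
        return False  # a younger op never displaces an older one

    def tick(self) -> Optional[Op]:
        """Advance one cycle; return the op that completes, if any."""
        if self.current is None:
            return None
        self.remaining -= 1
        if self.remaining == 0:
            done, self.current = self.current, None
            return done
        return None


eu = SquashableEU()
young = Op("young_div", age=5, latency=10)
old = Op("old_div", age=1, latency=10)

eu.request(young)            # the speculative, younger op starts first
for _ in range(3):
    eu.tick()
granted = eu.request(old)    # the older op arrives and preempts it
print(granted, [op.name for op in eu.reissue_queue])
```

The cost noted in the text shows up here as the `reissue_queue`: the preempted operation loses its progress and must be relaunched from saved state.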
Takeaway. Extending the invisible speculation approach
to block speculative interference attacks appears to involve
significant complexity and efficiency costs. Whether defenses
focusing on only cache attacks—but fully blocking them—
can be simpler or more efficient than defenses with more
comprehensive threat models [7, 53, 62, 63] is an interesting
question for future work.
6. RELATED WORK
Most speculative execution attacks that have been pre-
sented to date build on cache-based covert channels to leak
data, being inspired by either Spectre [12, 22, 29, 30, 34, 52]
or Meltdown [11, 32, 40, 49, 50, 54]. To our knowledge, only
SMoTherSpectre [8] and NetSpectre [43] suggest the use
of alternative covert channels, such as port contention, for
speculative execution attacks.
This paper proposes attacks on invisible speculation
schemes [5, 26, 31, 38, 56] that are designed to block
cache-based covert channels; we provide extensive background
on such schemes in § 2. CleanupSpec [37] targets
the invisible speculation goal of blocking only cache covert
channels, but uses a unique approach of (1) undoing cache
occupancy changes upon a squash and (2) using randomized
replacement to block replacement-related leakage. CleanupSpec
does not block speculative interference but makes its exploitation
more challenging. For example, on a W-way associative
cache, we could use a sender that reorders W+1 unprotected
accesses to make cache occupancy secret-dependent.
We leave this as future work.
Beyond invisible speculation, there are several other hard-
ware mechanisms designed to block speculative execution at-
tacks [7,15,27,42,46,53,61,62,63]. Data-oblivious ISA exten-
sions [61], SpectreGuard [15], ConTExT [42], DAWG [27]
and CSF [46] require some degree of software support (e.g.,
setting up cache partitions, users annotating what data is
secret) which severely constrains adoption. STT [62, 63],
NDA [53] and SpecShield [7] are software-transparent hard-
ware mechanisms that propose selective speculation, allowing
certain instructions to execute speculatively while delaying
(or executing in a data-oblivious fashion [62]) others. None
of these schemes can comprehensively defeat all speculative
interference attacks. For example, while STT soundly blocks
speculative interference attacks that leak transiently accessed
data, it offers no protection against speculative interference
attacks that leak non-transiently accessed (bound-to-retire)
data.
Concurrent to this work, Fustos et al. [16] proposed Spec-
treRewind, which also makes the observation that younger
speculative instructions can influence the timing of older
bound-to-retire instructions. Yet, their work does not show
how this can be used to create a cache-based covert channel.
Thus, while they do point out that their attacks break
InvisiSpec [56] and SafeSpec [26], these are traditional contention
attacks (§ 1) and explicitly outside of the scope of invisible
speculation schemes. Our attacks create cache-based covert
channels, and thus tolerate significantly weaker attackers.
7. CONCLUSION
This paper presented speculative interference attacks,
which show that invisible speculation schemes are not im-
mune to cache attacks. The broader implication of our work is
to demonstrate the security pitfalls of a well-studied approach
to building secure processors, namely to ignore “bandwidth”
or “contention” or “intermittent” covert channels and solely
focus on cache-based channels. Specifically, we show how
an attacker can convert timing changes into persistent state
changes. Long term, we hope our work helps set a research
agenda towards more comprehensive security definitions and
more secure, efficient invisible speculation mechanisms.
REFERENCES
[1] “8th and 9th generation Intel® Core™ processor families datasheet,
volume 1 of 2,” https://www.intel.com/content/dam/www/public/us/en/
documents/datasheets/8th-gen-core-family-datasheet-vol-1.pdf.
[2] “Kaby lake - microarchitectures - intel - wikichip,”
https://en.wikichip.org/wiki/intel/microarchitectures/kaby_lake.
[3] A. Abel and J. Reineke, “nanobench: A low-overhead tool for running
microbenchmarks on x86 systems,” arXiv preprint arXiv:1911.03282,
2019.
[4] O. Aciicmez, J.-P. Seifert, and C. K. Koc, “Predicting secret keys via
branch prediction,” IACR’06, 2006.
[5] S. Ainsworth and T. M. Jones, “Muontrap: Preventing cross-domain
spectre-like attacks by capturing speculative state,” in ISCA’20, 2020.
[6] A. C. Aldaya, B. B. Brumley, S. ul Hassan, C. P. García, and
N. Tuveri, “Port contention for fun and profit,” IACR’18, 2018.
[7] K. Barber, A. Bacha, L. Zhou, Y. Zhang, and R. Teodorescu,
“SpecShield: Shielding Speculative Data from Microarchitectural
Covert Channels,” in PACT’19, 2019.
[8] A. Bhattacharyya, A. Sandulescu, M. Neugschwandtner, A. Sorniotti,
B. Falsafi, M. Payer, and A. Kurmus, “SMoTherSpectre: Exploiting
Speculative Execution through Port Contention,” in CCS’19, 2019.
[9] N. Binkert, B. Beckmann, G. Black, S. K. Reinhardt, A. Saidi,
A. Basu, J. Hestness, D. R. Hower, T. Krishna, S. Sardashti, R. Sen,
K. Sewell, M. Shoaib, N. Vaish, M. D. Hill, and D. A. Wood, “The
Gem5 Simulator,” ACM SIGARCH Computer Architecture News,
no. 2, pp. 1–7, 2011.
[10] S. Briongos, P. Malagón, J. M. Moya, and T. Eisenbarth, “Reload+
refresh: Abusing cache replacement policies to perform stealthy cache
attacks,” in 29th {USENIX} Security Symposium ({USENIX} Security
20), 2020.
[11] C. Canella, D. Genkin, L. Giner, D. Gruss, M. Lipp, M. Minkin,
D. Moghimi, F. Piessens, M. Schwarz, B. Sunar, J. Van Bulck, and
Y. Yarom, “Fallout: Leaking data on meltdown-resistant cpus,” in
Proceedings of the ACM SIGSAC Conference on Computer and
Communications Security, 2019.
[12] G. Chen, S. Chen, Y. Xiao, Y. Zhang, Z. Lin, and T. H. Lai,
“SgxPectre attacks: Leaking enclave secrets via speculative execution,”
CoRR, vol. abs/1802.09085, 2018. [Online]. Available:
http://arxiv.org/abs/1802.09085
[13] D. Evtyushkin, R. Riley, N. Abu-Ghazaleh, and D. Ponomarev,
“Branchscope: A new side-channel attack on directional branch
predictor,” in ASPLOS’18.
[14] A. Fog et al., “Instruction tables: Lists of instruction latencies,
throughputs and micro-operation breakdowns for intel, amd and via
cpus,” Copenhagen University College of Engineering, vol. 93, p. 110,
2011.
[15] J. Fustos, F. Farshchi, and H. Yun, “Spectreguard: An efficient
data-centric defense mechanism against spectre attacks,” 2019 56th
ACM/IEEE Design Automation Conference (DAC), pp. 1–6, 2019.
[16] J. Fustos and H. Yun, “Spectrerewind: A framework for leaking
secrets to past instructions,” 2020.
[17] J. A. Goguen and J. Meseguer, “Security policies and security models,”
in 1982 IEEE Symposium on Security and Privacy, 1982.
[18] B. Gras, C. Giuffrida, M. Kurth, H. Bos, and K. Razavi, “Absynthe:
Automatic blackbox side-channel synthesis on commodity
microarchitectures,” 2020.
[19] J. Großschädl, E. Oswald, D. Page, and M. Tunstall, “Side-channel
analysis of cryptographic software via early-terminating
multiplications,” 2009.
[20] J. L. Hennessy and D. A. Patterson, Computer Architecture, Sixth
Edition: A Quantitative Approach, 6th ed. Morgan Kaufmann
Publishers Inc., 2017.
[21] J. L. Henning, “SPEC CPU2006 Benchmark Descriptions,” ACM
SIGARCH Computer Architecture News, no. 4, pp. 1–17, 2006.
[22] J. Horn, “Speculative execution, variant 4: speculative store bypass,”
https://bugs.chromium.org/p/project-zero/issues/detail?id=1528, 2018.
[23] Intel, “Refined Speculative Execution Terminology,”
https://software.intel.com/security-software-
guidance/insights/refined-speculative-execution-terminology, 2020.
[24] A. Jaleel, K. B. Theobald, S. C. Steely Jr, and J. Emer, “High
performance cache replacement using re-reference interval prediction
(rrip),” ACM SIGARCH Computer Architecture News, vol. 38, no. 3,
pp. 60–71, 2010.
[25] M. Johnson, Superscalar Microprocessor Design. Prentice Hall
Englewood Cliffs, New Jersey, 1991.
[26] K. N. Khasawneh, E. M. Koruyeh, C. Song, D. Evtyushkin,
D. Ponomarev, and N. B. Abu-Ghazaleh, “Safespec: Banishing the
spectre of a meltdown with leakage-free speculation,” in DAC’19,
2019.
[27] V. Kiriansky, I. A. Lebedev, S. P. Amarasinghe, S. Devadas, and
J. Emer, “Dawg: A defense against cache timing attacks in speculative
execution processors,” in MICRO’18, 2018.
[28] V. Kiriansky and C. Waldspurger, “Speculative Buffer Overflows:
Attacks and Defenses,” Jul 2018.
[29] P. Kocher, D. Genkin, D. Gruss, W. Haas, M. Hamburg, M. Lipp,
S. Mangard, T. Prescher, M. Schwarz, and Y. Yarom, “Spectre attacks:
Exploiting speculative execution,” in S&P’19, 2019.
[30] E. M. Koruyeh, K. N. Khasawneh, C. Song, and N. Abu-Ghazaleh,
“Spectre returns! speculation attacks using the return stack buffer,” in
WOOT’18, 2018.
[31] P. Li, L. Zhao, R. Hou, L. Zhang, and D. Meng, “Conditional
speculation: An effective approach to safeguard out-of-order execution
against spectre attacks,” in HPCA’19, 2019.
[32] M. Lipp, M. Schwarz, D. Gruss, T. Prescher, W. Haas, S. Mangard,
P. Kocher, D. Genkin, Y. Yarom, and M. Hamburg, “Meltdown:
Reading kernel memory from user space,” in USENIX Security’18,
2018.
[33] F. Liu, Y. Yarom, Q. Ge, G. Heiser, and R. B. Lee, “Last-level cache
side-channel attacks are practical,” in 2015 IEEE Symposium on
Security and Privacy, May 2015, pp. 605–622.
[34] G. Maisuradze and C. Rossow, “Ret2spec: Speculative execution
using return stack buffers,” in CCS’18, 2018.
[35] D. A. Osvik, A. Shamir, and E. Tromer, “Cache attacks and
countermeasures: The case of aes,” in CT-RSA’06, 2006.
[36] A. Sabelfeld and A. C. Myers, “Language-based information-flow
security,” IEEE Journal on Selected Areas in Communications, vol. 21,
no. 1, pp. 5–19, Jan 2003.
[37] G. Saileshwar and M. K. Qureshi, “Cleanupspec: An "undo" approach
to safe speculation,” in MICRO’19.
[38] C. Sakalis, S. Kaxiras, A. Ros, A. Jimborean, and M. Själander,
“Efficient Invisible Speculative Execution Through Selective Delay and
Value Prediction,” in ISCA’19, 2019.
[39] J. Schulist, D. Borkmann, and A. Starovoitov, “Linux Socket Filtering
aka Berkeley Packet Filter (BPF),”
https://www.kernel.org/doc/Documentation/networking/filter.txt, 2018.
[40] M. Schwarz, M. Lipp, D. Moghimi, J. Van Bulck, J. Stecklina,
T. Prescher, and D. Gruss, “ZombieLoad: Cross-privilege-boundary
data sampling,” in CCS, 2019.
[41] M. Schwarz, C. Maurice, D. Gruss, and S. Mangard, “Fantastic timers
and where to find them: high-resolution microarchitectural attacks in
javascript,” in International Conference on Financial Cryptography
and Data Security. Springer, 2017, pp. 247–267.
[42] M. Schwarz, R. Schilling, F. Kargl, M. Lipp, C. Canella, and D. Gruss,
“ConTExT: Leakage-Free Transient Execution,” arXiv e-prints, May
2019.
[43] M. Schwarz, M. Schwarzl, M. Lipp, and D. Gruss, “Netspectre: Read
arbitrary memory over network,” in ESORICS’19, 2019.
[44] M. Schwarz, S. Weiser, D. Gruss, C. Maurice, and S. Mangard,
“Malware guard extension: Using sgx to conceal cache attacks,” in
International Conference on Detection of Intrusions and Malware, and
Vulnerability Assessment. Springer, 2017, pp. 3–24.
[45] T. Sherwood, E. Perelman, G. Hamerly, and B. Calder, “Automatically
characterizing large scale program behavior,” in ASPLOS’02, 2002.
[46] M. Taram, A. Venkat, and D. Tullsen, “Context-sensitive fencing :
Securing speculative execution via microcode customization,” in
ASPLOS’19, 2019.
[47] M. Tiwari, H. M. Wassel, B. Mazloom, S. Mysore, F. T. Chong, and
T. Sherwood, “Complete information flow tracking from the gates up,”
in ASPLOS’09, 2009.
[48] R. M. Tomasulo, “An efficient algorithm for exploiting multiple
arithmetic units,” IBM Journal of Research and Development, vol. 11,
no. 1, pp. 25–33, 1967.
[49] J. Van Bulck, M. Minkin, O. Weisse, D. Genkin, B. Kasikci,
F. Piessens, M. Silberstein, T. F. Wenisch, Y. Yarom, and R. Strackx,
“Foreshadow: Extracting the keys to the Intel SGX kingdom with
transient out-of-order execution,” in USENIX Security’18, 2018.
[50] S. van Schaik, A. Milburn, S. Österlund, P. Frigo, G. Maisuradze,
K. Razavi, H. Bos, and C. Giuffrida, “RIDL: Rogue in-flight data
load,” in S&P, May 2019.
[51] P. Vila, P. Ganty, M. Guarnieri, and B. Köpf, “Cachequery: Learning
replacement policies from hardware caches,” arXiv preprint
arXiv:1912.09770, 2019.
[52] J. Wampler, I. Martiny, and E. Wustrow, “Exspectre: Hiding malware
in speculative execution.” in Proceedings of the Symposium on
Network and Distributed System Security (NDSS), 2019.
[53] O. Weisse, I. Neal, K. Loughlin, T. Wenisch, and B. Kasikci, “NDA:
Preventing Speculative Execution Attacks at Their Source,” in
MICRO’19, 2019.
[54] O. Weisse, J. Van Bulck, M. Minkin, D. Genkin, B. Kasikci,
F. Piessens, M. Silberstein, R. Strackx, T. F. Wenisch, and Y. Yarom,
“Foreshadow-NG: Breaking the virtual memory abstraction with
transient out-of-order execution,” Technical report, 2018.
[55] W. Xiong and J. Szefer, “Leaking Information Through Cache LRU
States,” in HPCA’20, 2020.
[56] M. Yan, J. Choi, D. Skarlatos, A. Morrison, C. W. Fletcher, and
J. Torrellas, “InvisiSpec: Making Speculative Execution Invisible in
the Cache Hierarchy,” in MICRO’18, 2018.
[57] M. Yan, R. Sprabery, B. Gopireddy, C. Fletcher, R. Campbell, and
J. Torrellas, “Attack Directories, Not Caches: Side Channel Attacks in
a Non-Inclusive World,” in IEEE S&P, 2019.
[58] F. Yao, M. Doroslovacki, and G. Venkataramani, “Are Coherence
Protocol States Vulnerable to Information Leakage?” in HPCA’18,
2018.
[59] Y. Yarom and K. Falkner, “Flush+Reload: A high resolution, low
noise, L3 cache side-channel attack,” in USENIX Security’14, 2014.
[60] Y. Yarom, D. Genkin, and N. Heninger, “CacheBleed: A Timing
Attack on OpenSSL Constant Time RSA,” IACR’16, 2016.
[61] J. Yu, L. Hsiung, M. E. Hajj, and C. W. Fletcher, “Data oblivious isa
extensions for side channel-resistant and high performance computing,”
in NDSS’19, https://eprint.iacr.org/2018/808.
[62] J. Yu, N. Mantri, J. Torrellas, A. Morrison, and C. W. Fletcher,
“Speculative Data-Oblivious Execution (SDO): Mobilizing Safe
Prediction For Safe and Efficient Speculative Execution,” in ISCA’20.
[63] J. Yu, M. Yan, A. Khyzha, A. Morrison, J. Torrellas, and C. W.
Fletcher, “Speculative Taint Tracking (STT): A Comprehensive
Protection for Speculatively Accessed Data,” in MICRO’19.
