Survey of Transient Execution Attacks by Xiong, Wenjie & Szefer, Jakub
Survey of Transient Execution Attacks
Wenjie Xiong
Dept. of Electrical Engineering
Yale University
wenjie.xiong@yale.edu
Jakub Szefer
Dept. of Electrical Engineering
Yale University
jakub.szefer@yale.edu
ABSTRACT
Transient execution attacks, also called speculative execution at-
tacks, have drawn much interest as they exploit the transient exe-
cution of instructions, e.g., during branch prediction, to leak data.
Transient execution is fundamental to modern computer architec-
tures, yet poses a security risk as has been demonstrated. Since the
first disclosure of Spectre and Meltdown attacks in January 2018, a
number of new attack types or variants of the attacks have been
presented. These attacks have motivated computer architects to re-
think the design of processors and propose hardware defenses. This
paper summarizes the components and the phases of the transient
execution attacks. Each of the components is further discussed and
categorized. A set of metrics is proposed for each component to
evaluate the feasibility of an attack. Moreover, the data that can be
leaked in the attacks are summarized. Further, the existing attacks
are compared, and the limitations of these attacks are discussed
based on the proposed metrics. In the end, existing mitigations at
the micro-architecture level from literature are discussed.
KEYWORDS
Transient Execution, Speculative Execution, Timing Channels, Covert
Channels, Secure Processor Architectures
1 INTRODUCTION
In the past decades, computer architects have been working hard
to improve the performance and power efficiency of computing
systems. Different optimizations have been introduced in the var-
ious processor micro-architectures to improve the performance,
including pipelining, out-of-order execution, and branch predic-
tion [42]. Some of the optimizations require aggressive speculation
of the executed instructions. For example, while waiting for a con-
ditional branch to be resolved, branch prediction is used to predict
the direction of the branch and the processor begins to execute code
down the predicted control path before the outcome of the branch is
known. Such speculative execution of instructions causes the micro-
architectural state of the processor to be modified. The execution of
the instructions down the speculated path is commonly called the
transient execution – since the instructions execute transiently and
should disappear with no side-effects if there was mis-speculation.
When a mis-speculation is detected, the architectural and
micro-architectural side effects should be cleaned up – but today’s
processors do not fully clean up the micro-architectural effects,
leading to a number of recently publicized transient execution
attacks [16, 53, 60, 70, 83, 95, 96, 102].
Besides focusing on pure performance optimization, many pro-
cessors are designed to share hardware units in order to reduce
area and improve power efficiency. For example, hyper-threading
allows different programs to execute concurrently on the same
processor pipeline by sharing the execution and other functional
units among the hardware threads in the pipeline. Further, because
supply voltage does not scale with the size of the transistors [72],
modern processors use multi-core designs. In multi-core systems,
caches, memory-related logic, and peripherals are shared among
the different processor cores. Sharing of the resources has led to
numerous timing-based side and covert channels [41, 64, 91, 117]
– the channels can occur independent of transient execution, or
together with transient execution, which is the focus of this survey.
The covert channels can be used to exfiltrate data from the transient
states.
Today’s processor designs aim to ensure the execution of a program
results in architectural states as if each instruction is executed
in the program order. At the instruction set architecture
(ISA) level, today’s processors behave correctly. Unfortunately, the
complicated underlying micro-architectural states, due to different
optimizations, are modified during the transient execution, and
the various transient execution attacks have shown how to make
the modification visible so data can be leaked. For example, timing
channels can lead to information leaks that can reveal some of
the micro-architectural states which are not available at the ISA
level [41, 64, 91, 117]. In particular, today’s ISAs do not capture
the micro-architectural states of a processor, e.g., the states
introduced by pipelining, out-of-order execution, speculative
execution, the operation of the caches, sharing of units in
hyper-threading, and other sharing of resources in multi-core designs.
Transient execution combined with covert channels results in
transient execution attacks which can compromise the confiden-
tiality of the system, e.g., Spectre [1, 10, 20, 53, 54, 65, 86, 93],
Meltdown [5, 16, 52, 60], Foreshadow [95, 102], LazyFP [89], or
Micro-architectural Data Sampling (MDS) [70, 83, 96]. These tran-
sient execution attacks have been shown to break the security
boundaries, e.g., privilege levels, SGX enclaves, and sandboxes. These
attacks have attracted much interest, and motivated computer
architects to rethink the design of processors and propose a number of
hardware defenses [8, 32, 50, 51, 85, 108] – this survey summarizes
the attacks and hardware defenses, while software-based defenses
are summarized in existing work [16].
This paper provides a survey of existing transient execution at-
tacks. We start by analyzing each component of the transient ex-
ecution attacks to show the root cause of these attacks. Then, we
discuss the phases of the attacks and the data that can be leaked in
these attacks. In the end, we discuss the mitigation strategies for
each attack component and explain several existing hardware-based
mitigations. The contributions are the following:
• We provide a taxonomy of the existing transient execution
attacks by analyzing each component of the attack. We pro-
pose metrics to compare the attacks on different dimensions.
• We summarize and categorize the existing and potential
timing-based covert channels in micro-architectures that
can be used for transient execution attacks. We also propose
metrics to compare the covert channels quantitatively.
• We discuss the limitations of the existing attacks based on
the metrics we proposed.
• We list and compare the different mitigation strategies in
micro-architectural designs.
arXiv:2005.13435v1 [cs.CR] 27 May 2020
2 COMPONENTS AND PHASES OF
TRANSIENT EXECUTION ATTACKS
Transient execution covert channel attacks are attacks that leak
data from transient execution through a covert channel, where the
secret or sensitive data is only available to the attacker when their
code is executing transiently.
2.1 Components of the Attacks
The transient execution attacks contain two components: transient
execution and a covert channel, as shown in Figure 1. In particular,
the secret or sensitive data is only available to the attacker under
transient execution – this differentiates the transient execution
attacks from conventional covert channel attacks, where the data is
also available outside of transient execution (see footnote 1). After the secret data
is accessed in transient execution and encoded into a covert channel,
the secret data is extracted via the covert channel. Following the
terminology in [94], we use the following terms:
• Speculation Primitive: The piece of code that causes tran-
sient execution to happen, e.g., prediction or an exception.
• Disclosure Gadget: The code to be executed transiently,
which encodes the secret information into a covert channel.
• Disclosure Primitive: The covert channel used to make
the secret observable by an attacker.
2.2 Attack Phases
As shown in Figure 1, there are three phases in the attacks:
Phase 1 – setup. The processor executes a set of instructions
which modify the micro-architectural states such that they will later
cause the transient execution of the disclosure gadget to occur in a
manner predictable to the attacker. An example is performing many
branches at a specific address to “train” the branch predictor. The
setup can be done by the attacker running some code or the attacker
causing the victim to run in a predictable (to the attacker) manner
so that the micro-architectural state is set up as the attacker expects.
Phase 2 – transient execution and encoding to the covert
channel. The transient execution is actually triggered. The cause
of transient execution is also known as the speculation primitive. The
disclosure gadget is executed transiently, to encode the secret into
a covert channel. The instructions are eventually squashed, and
the architectural states of the transient instructions are rolled back.
Phase 2 can be either in the victim’s code or in the attacker’s code.
Phase 3 – decoding from the covert channel. The attacker
recovers the data via the covert channel, also called the disclosure
primitive.
Footnote 1: There are also attacks using the timing difference in
transient execution, e.g., [29–31, 46]. These attacks are still
conventional covert channel attacks, where the timing difference comes
from the prediction units. Thus, these attacks are not in the
scope of this paper.
During an attack, Phases 1 and 2 cause the transient execution
of the disclosure gadget with the speculation primitive. Then, Phases 2
and 3 leverage the covert channel (a.k.a. the disclosure primitive).
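The three phases can be illustrated with a toy model. This is only a sketch under stated assumptions: the covert channel is modeled as an array of flags standing in for "which probe line is cached", and the names `channel_t`, `setup`, `transient_encode`, and `decode` are hypothetical. A real attack would infer the flags from timing measurements rather than read them directly.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NUM_LINES 256   /* assumed: one probe line per possible byte value */

/* Toy stand-in for shared micro-architectural state
 * (e.g., which probe lines are currently cached). */
typedef struct { bool cached[NUM_LINES]; } channel_t;

/* Phase 1 (setup): bring the channel into a known state. */
static void setup(channel_t *ch) {
    memset(ch->cached, 0, sizeof ch->cached);
}

/* Phase 2 (transient execution): the disclosure gadget touches the line
 * selected by the secret. Architectural results are squashed, so nothing
 * is returned; only the channel state changes. */
static void transient_encode(channel_t *ch, uint8_t secret) {
    ch->cached[secret] = true;
}

/* Phase 3 (decode): probe every line and return the index of the touched
 * one, or -1 if no line was touched. */
static int decode(const channel_t *ch) {
    for (int i = 0; i < NUM_LINES; i++)
        if (ch->cached[i]) return i;
    return -1;
}
```

Calling `setup`, then `transient_encode` with a secret byte, then `decode` recovers that byte in this model, even though Phase 2 produced no architectural result.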
2.3 Where is Transient Execution Occurring?
Each phase can be performed by the attacker code or by the victim
code, resulting in eight attack scenarios in Figure 2. When a phase
is performed by the victim, the attacker should have the ability to
trigger the victim to execute the code. We categorize the attacks
based on who executes transiently and encodes the secret to the
covert channel in Phase 2.
2.3.1 Victim is Executing Transiently: If the victim is the one who
executes transiently (Figure 2 (a-d)), the victim is caused to execute
instructions that encode some secret into the covert channel during
transient execution, and the attacker obtains the secret by decoding
the signals from the covert channel. In this scenario, the attacker is
assumed to be able to control or trigger the victim’s execution. The
attacker can do this by calling some victim functions with certain
parameters, e.g., in SGXpectre [20], the attacker can launch the
target enclave program.
Different from the conventional side and covert channels, here,
the victim encoding phase is executed transiently, and thus, the
attack cannot be detected by simply analyzing the software seman-
tics of the victim code. This attack vector leverages the difference
between the expected semantics of software execution and the
execution in hardware and is fundamental in current computer
architectures.
To prepare for the transient execution (i.e., the setup phase), there
are also two options. First, the attacker and the victim can share the
hardware components that cause transient execution, e.g., the
prediction unit, as shown in Figure 2 (c,d), so that the setup code is
the attacker’s own and under her control. Second, the attacker triggers
some of the setup code in the victim domain, as shown in Figure 2 (a,b).
For the first option, the attacker needs to prepare some code to set up
the hardware to lure the victim into the desired transient execution.
For the second option, the attacker needs to understand the victim’s
code, and be able to trigger the code execution with a controlled
input, e.g., call a function of the victim code.
To decode data from the covert channel, there are also two options:
(1) the attacker uses her own code to observe the covert channel,
as shown in Figure 2 (b,d); (2) the attacker uses the victim code to
observe the channel, as shown in Figure 2 (a,c). For the second case,
the attacker may directly query some victim code, and the result of
the victim code reveals information on the channel, or the attacker
may trigger the execution of some piece of the victim code and
measure the time or other side effect of the execution.
The attacker can use the victim code to complete both setup and
decoding steps, as shown in Figure 2 (a), such as in [86]. In this
case, the attacker can even launch the attack remotely. However, most
demonstrations of transient execution attacks leverage scenarios
(b,d) in Figure 2, because they rely on less code in the victim
codebase and are easier for the attacker.
2.3.2 Attacker is Executing Transiently. As shown in Figure 2 (e-h),
the attacker can directly obtain the secret in transient execution.
The attacker will then encode the data into a covert channel and
decode it to obtain the secret in the architectural state, such as in her
memory. The attacker can also launch different software threads
for the setup or the decoding phases. The attacker’s code shown in
Figure 2 (e-h) might be in different threads, even on different cores.
[Figure 1 shows a timeline. Phase 1: setup. Phase 2: the speculation
primitive (the instruction triggering transient execution) opens a
speculation window in which the disclosure gadget (1) loads the secret
and (2) encodes it to the covert channel. Phase 3: decode from the
covert channel (the disclosure primitive).]
Figure 1: Phases of transient execution attacks.
[Figure 2 shows eight panels, a) to h). In each panel, the setup, the
transient execution that encodes to the covert channel, and the decoding
from the covert channel are each assigned to either the victim (V) or
the attacker (A); the attacker triggers the phases run by the victim.]
Figure 2: Possible scenarios of transient execution attacks: a-d) The attacker triggers part of the victim code to execute transiently to leak a secret
through the covert channel, or e-h) The attacker executes transiently to access data that she does not have permission to access and encodes it
into the covert channel.
During the attack, the attacker directly obtains the secret in
transient execution, and thus, the attacker should be able to know
the location of the victim data. There might be only the attacker
code running, or the attacker and the victim running in parallel.
When there is only the attacker code running, the victim’s protected
data should be addressable to the attacker or the data is in some
register in the hardware, i.e., the attacker should have a way to point
to the data. In Meltdown [60], the attacker code first loads protected
data by its virtual address into a register and then transfers the data through
a covert channel. When the attacker and the victim are running
concurrently, the attacker should be able to partially control the
victim’s execution or synchronize with the victim execution. In
MDS attacks [70, 83, 96], the attacker needs to synchronize with the
victim execution to extract useful information from the in-flight
data of the victim.
The setup phases and decoding phases can also be done by the
victim, resulting in four attack scenarios in Figure 2 (e-h). However,
all the known attacks where the attacker executes transiently, i.e.,
Meltdown [60] and MDS [70, 83, 96], use an exception (raised by the attacker)
as the speculation primitive, and there is no need to train any
predictor. Moreover, it is more practical for the attacker to decode
from the covert channel rather than triggering the victim to decode.
Thus, scenario (h) in Figure 2 is usually leveraged by the attacker
in attacks where the attacker is executing transiently.
In micro-architectural implementations, transient execution allows
the attacker to access more data than is allowed at the architecture
level. Thus, this type of attack is implementation dependent and does
not work on all CPUs, e.g., Meltdown [60], Foreshadow [95, 102], and
Micro-architectural Data Sampling (MDS) [70, 83, 96] are reported to
work on Intel processors.
3 TRANSIENT EXECUTION
In this section, we focus on how to set up and trigger transient
execution in Phase 1 and 2, respectively. Transient execution happens
when the pipeline is squashed following a mis-speculation or detection
of an exception and all the architectural states are rolled
back, but not all the micro-architectural side effects are cleaned up.
We first discuss all possible causes of transient execution, discuss
the features of transient execution that are required for an attack,
and analyze under what condition the transient execution can be
leveraged for attacks.
3.1 Causes of Transient Execution
The following is a list of possible causes of squashing the pipeline,
which in turn are all the possible causes of transient execution.
Mis-prediction: The first possible cause for having to squash a
pipeline is mis-prediction. Modern computer architectures make
predictions to make full use of the pipeline to gain performance.
When the prediction is correct, the execution continues and the
results of the predicted execution will be used. In this way, pre-
dictions boost the performance by executing instructions earlier.
If the prediction is wrong, the pipeline will be squashed, and the
architecture states are rolled back as if the prediction never hap-
pened. There are three types of predictions: control flow prediction,
address speculation, and value prediction.
(1) Control Flow Prediction: Control flow prediction predicts
the execution path that a program will follow. The branch prediction
unit (BPU) stores the history of past branch directions and targets
and leverages the locality in the program control flow to make
predictions for future branches. The BPU predicts whether the branch
is to be taken or not (i.e., branch direction) by using the pattern
history table (PHT), and what the target address is (i.e., branch or
indirect jump target) by using the branch target buffer (BTB) or the
return stack buffer (RSB).
(2) Address Speculation: Address speculation is a prediction
on whether two addresses are the same when the addresses are not
fully available yet. It is used to improve performance in the memory
system, e.g., store-to-load (STL) forwarding in the load-store queue,
line-fill buffer (LFB) in the cache.
(3) Value Prediction: To further improve the performance,
while the pipeline is waiting for the data to be loaded from memory
hierarchy, value prediction units have been designed to predict the
data value and to continue the execution based on the prediction.
While this is not known to be implemented in commercial architectures,
value prediction has been proposed in the literature [58, 59].
Exceptions: The second possible cause for having to squash a
pipeline is exceptions. If an instruction causes an exception, the
following instructions in the pipeline will be squashed when the
instruction is about to retire, and the OS will then handle the
exception. The exception types and permission bit violations that
can lead to exceptions are listed in [16].
There are also methods to suppress an exception. For example,
using transactional memory (Intel TSX [2]), if a problem occurs
during the transaction, all the architectural states in the transaction
will be rolled back by a transaction abort, suppressing the fault [83,
96]. Another way is to put the load that would cause exception
in a mis-predicted branch. In this survey, even if the exception is
suppressed later, we categorize the attack to be due to exceptions.
Interrupts: The third possible cause for having to squash a
pipeline is interrupts. If a peripheral device or a different core causes
an interrupt, the CPU will handle the interrupt first, and meanwhile,
squash all the instructions in the pipeline. After the interrupt is
handled, the current program will continue the execution, i.e., the
instructions will be fetched into the pipeline.
Load-to-load Reordering (Multi-Core): The fourth possible
cause for having to squash a pipeline is load-to-load reordering.
x86 architectures use the total store order (TSO) memory model [87].
In TSO, no observable reordering of loads and stores is allowed,
except store-to-load reordering, where a load bypasses an older
store to a different address. To prevent load-to-load reordering,
if a load has performed but not yet retired and the core receives a
cache invalidation for the line read by the load, the pipeline will be
squashed.
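The squash condition for load-to-load reordering can be written as a small predicate. This is a sketch only: the structure `load_entry_t` and the function `must_squash` are hypothetical names for illustration and do not describe any specific load queue implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of one in-flight load in the load queue. */
typedef struct {
    uint64_t line_addr;  /* cache line read by the load */
    bool performed;      /* the load has already obtained its value */
    bool retired;        /* the load has committed */
} load_entry_t;

/* A load that has performed but not yet retired must be squashed if the
 * core receives an invalidation for the cache line that the load read. */
static bool must_squash(const load_entry_t *ld, uint64_t invalidated_line) {
    return ld->performed && !ld->retired &&
           ld->line_addr == invalidated_line;
}
```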
3.1.1 Causes of Transient Execution in Existing Attacks. Not all
transient execution can cause an attack, and Table 1 shows the spec-
ulation primitives leveraged in existing attacks. (Mis-)prediction is
leveraged in Spectre-type attacks, e.g., [53]. Address speculation
is leveraged in MDS attacks, e.g., [70, 83, 96]. Exceptions of loads
or stores are leveraged in Meltdown-type attacks, e.g., [60, 95, 102].
Other types of exceptions, interrupts, and load-to-load reordering
are not currently considered to be exploitable, because the
instructions that get squashed are on the correct execution path: the
execution will be resumed later on, and no extra data is accessible to
the attacker during the transient execution.
Sample code for the different variants is shown in Figure 3. The
victim code should allow a potential mis-prediction or exception
to happen. In Spectre V1, to leverage the PHT, a conditional branch
should exist in the victim code followed by the gadget. Similarly, in
Spectre V2 and V5, the victim code should have an indirect jump (or
a return from a function) that uses the BTB (or the RSB) for prediction
of the execution path. In Spectre V4, to use STL, the victim code should
have a store followed by a load with potential address speculation.
In Meltdown, the attacker code should make an illegal load to cause
an exception.
3.2 Metrics for Speculation Primitives
If the attacker wants to use a speculation primitive to launch a
transient execution attack, the attacker should be able to cause
transient execution of the disclosure gadget in a controlled manner.
We use the following metrics to evaluate speculation primitives:
• Required Control of Victim Execution: This metric evaluates
whether the attacker needs to control the execution of victim
code – details will be discussed in Section 3.3.
• Level of Sharing: This metric evaluates how closely the attacker
should be co-located with the victim and whether the attacker should
share memory space with the victim to trigger the transient execution
in a controlled manner – details will be discussed in Section 3.4.
• Speculative Window Size: This metric indicates how many
instructions can be executed transiently – the speculative window
size will be discussed in more detail in Section 3.5.
• Exploitable Data: This metric indicates what secret can be
accessed during transient execution – this will be discussed as the
consequence of the attack in Section 3.6.

Table 1: Causes of Transient Execution and Existing Attack Types.
Cause of Transient Execution            Attack Type
Prediction: Control Flow Prediction     Spectre (except V4)
Prediction: Address Speculation         Spectre V4, MDS
Prediction: Value Prediction            Not implemented in commercial architectures today
Exception                               Meltdown
Interrupts                              no existing attacks today
Load-to-load reordering                 no existing attacks today

Spectre V1: The attacker trains the PHT to execute the disclosure gadget.
    struct array *arr1 = ...;
    struct array *arr2 = ...;
    unsigned long offset = ...;
    if (offset < arr1_len) {
        sec = arr1[offset];
        value2 = arr2[sec*c];
    }

Spectre V2: The attacker trains the BTB to jump to the disclosure gadget.
    ...
    jmp LEGITIMATE_TRGT
    ...
    mov r8, QWORD PTR[r15]
    lea rdi, [r8]
    ...

Spectre V5: The attacker pollutes the RSB, to return to the disclosure
gadget after Fun1.
    main:
        call Fun1
        ...
    Fun1:
        ...
        ret
        ...
        mov r8, QWORD PTR[r15]
        lea rdi, [r8]
        ...

Spectre V4: The attacker delays the address calculation, causing
speculation.
    char *ptr = sec;
    char **slow_ptr = &ptr;
    clflush(slow_ptr);
    *slow_ptr = pub;
    value2 = arr2[(*ptr) * c];

Meltdown: The attacker accesses the address in rcx to cause an
exception.
    ; rcx = address that leads to an exception
    retry:
    mov al, byte [rcx]
    shl rax, 0xc
    jz retry
    mov rbx, qword [rbx + rax]

Figure 3: Example code of transient execution attacks. Code highlighted in orange triggers transient execution. Code highlighted in yellow
with dashed frame is the disclosure gadget.
3.3 Required Control of Victim Execution
The speculation primitive usually contains two parts: the code that
needs prediction, e.g., conditional branch, direct or indirect jump,
and the code that mis-trains the prediction units. In Phase 1, the
prediction unit is mis-trained to direct future prediction to execute
the disclosure gadget. In Phase 2, the transient execution of the
disclosure gadget is triggered.
For the case of using mis-prediction as the speculation primitive, the
(mis-)training can be part of the victim code, which is triggered by
the attacker. In the example of Spectre V1, the attacker can first
provide legal inputs to train the PHT to execute the gadget branch. In
this case, the training code always uses the same prediction unit as
the real attack, but the attacker should be able to control the
execution of the victim code. The (mis-)training code can also be a
part of the attacker’s code and run in parallel with the victim code,
e.g., in Spectre V2. Then, the attacker’s training thread and the
victim’s thread must be co-located to share the same prediction unit.
Further, to share the same entry of the prediction unit, if the
prediction unit is indexed by physical address, the attacker and the
victim should also share the same memory space. The required control of
victim execution is summarized in Table 2.
For the speculation primitives that leverage exceptions, the in-
structions that follow the exception will be executed transiently,
and thus, no mis-training (phase 1) is required, but the attacker
needs to make sure the disclosure gadget is located in the code such
that it is executed after the exception-causing instruction.
3.4 Level of Sharing in Mis-Training
The attacker can mis-train the prediction unit when running in parallel
with the victim code. Then, the attacker’s training thread must share
the same prediction unit with the victim. The
following discusses different prediction mechanisms.
3.4.1 Control Flow Prediction: To predict the branch direction,
modern branch predictors use a hybrid mechanism [28, 47, 67, 69,
88]. One major component of the branch predictor is the PHT.
Typically, a PHT entry is indexed based on some bits of the branch
address, so a branch at a certain virtual address will always use
the same entry in the PHT. In each entry of the PHT, a saturating
counter stores the history of the prior branch results, which in turn
is used to make future predictions.
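The indexing and saturating-counter behavior described above can be sketched as follows. This is a simplified model under stated assumptions: the table size (`PHT_BITS`), the 2-bit counter width, and the index function (low bits of the virtual address) are illustrative choices, and the names `pht_t`, `pht_index`, `pht_predict`, `pht_update`, and `pht_train` are hypothetical. Real predictors are hybrid and considerably more complex.

```c
#include <stdbool.h>
#include <stdint.h>

#define PHT_BITS 10                 /* assumed: 1024-entry table */
#define PHT_SIZE (1u << PHT_BITS)

/* One 2-bit saturating counter per entry:
 * 0 and 1 predict "not taken", 2 and 3 predict "taken". */
typedef struct { uint8_t counter[PHT_SIZE]; } pht_t;

/* Assumed index function: low PHT_BITS bits of the branch virtual address. */
static unsigned pht_index(uint64_t branch_vaddr) {
    return (unsigned)(branch_vaddr & (PHT_SIZE - 1));
}

static bool pht_predict(const pht_t *pht, uint64_t branch_vaddr) {
    return pht->counter[pht_index(branch_vaddr)] >= 2;
}

/* Move the counter toward the observed outcome, saturating at 0 and 3. */
static void pht_update(pht_t *pht, uint64_t branch_vaddr, bool taken) {
    uint8_t *c = &pht->counter[pht_index(branch_vaddr)];
    if (taken) { if (*c < 3) (*c)++; }
    else       { if (*c > 0) (*c)--; }
}

/* Phase 1 of an attack: execute a branch n times with a chosen outcome. */
static void pht_train(pht_t *pht, uint64_t branch_vaddr, bool taken, int n) {
    for (int i = 0; i < n; i++) pht_update(pht, branch_vaddr, taken);
}
```

In this model, two branches whose addresses agree in the low `PHT_BITS` bits share an entry, so training one of them changes the prediction for the other.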
To predict the branch targets, a BTB stores the previous target
address of branches and jumps. Further, a return instruction is a
special indirect branch that always jumps to the return address stored
at the top of the stack. The BTB does not give a good prediction rate
on return instructions, and thus, the RSB has been introduced in
commercial processors. The RSB stores the N most recent return addresses.
In Intel processors, the PHT and BTB (see footnote 2) are shared for
all the processes running on the same physical core (same or different
logical core in SMT). The RSB is dedicated to each logical core
in the case of hyper-threading [65]. Table 3 shows whether the
prediction unit can be trained when the training code and the
victim are running in parallel in different settings. The results are
implementation-dependent and Table 3 shows the result from Intel
processors.

Footnote 2: In [30], the authors did not observe BTB collision between
logical cores. However, it is demonstrated in [53] that the attacker
can mis-train the indirect jump of a victim when they are two
hyper-threads sharing the same physical core. Thus, we think the
BTB is shared across hyper-threads in some of the processors.

Table 2: Required Control of Victim Execution in Different Attack Scenarios
Scenarios in Figure 2   Phase 1    Phase 2    Required Control of Victim Execution
a,b                     Victim     Victim     trigger desired victim execution
c,d                     Attacker   Victim     share prediction unit and address space to train
e,f                     Victim     Attacker   trigger desired victim execution
g,h                     Attacker   Attacker   not required, all done by the attacker

Table 3: Level of Sharing and (Mis-)training the Prediction Unit on Intel processors.
Prediction Unit         same thread            same core, SMT        same chip, different core   same motherboard
Branch
  PHT [31, 52]          f(virtual addr)        f(virtual addr)       –                           –
  BTB [30, 53]          f(virtual addr)        f(virtual addr) (a)   –                           –
  RSB [65]              not by address (b)     –                     –                           –
Address
  STL [46, 70]          f(physical addr) (c)   –                     –                           –
  LFB [83, 96]          not by address         not by address        –                           –
  Other (d)
“–” indicates that the prediction unit cannot be trained under the corresponding sharing setting; otherwise, the
prediction unit can be trained, and “f(virtual addr)” indicates the prediction unit is indexed by a function of the virtual
address, “f(physical addr)” indicates the prediction unit is indexed by a function of the physical address, and “not by
address” indicates the prediction unit is not indexed by addresses. (a) Conflicting results are presented in different
publications [30, 53]. (b) Most OSes overwrite RSBs on context switches. (c) STL is possible after context switch, but
not on SGX enclave exit. (d) In [83], it is indicated that there could be other structures which forward data speculatively.
The prediction units sometimes have many entries, and the at-
tacker and the victim should use the same entry for mis-training.
The attacker and the victim will use the same entry only if they
are using the same index. When the prediction unit is indexed by
virtual address, such as the PHT and the BTB, the attacker can
train the prediction unit from another address space using the same
virtual address as the victim code, as shown in Table 3. If only part
of the virtual address is used as the index, the attacker can train
with an aliased virtual address, which maps to the same entry of
the prediction unit as the victim address. The RSB is not indexed
by address; it overflows when there are more than N nested calls,
which causes mis-predictions.
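The RSB overflow can be illustrated with a circular-buffer model. This is a minimal sketch: the size `RSB_ENTRIES` (the N above) is an assumed value, the names `rsb_t`, `rsb_push`, `rsb_pop`, and `rsb_mispredictions` are hypothetical, and the wrap-around behavior is one possible design, since the handling of overflow and underflow is implementation-specific.

```c
#include <stdint.h>

#define RSB_ENTRIES 16   /* assumed value of N */

typedef struct {
    uint64_t entry[RSB_ENTRIES];
    unsigned top;        /* pushes minus pops; used modulo RSB_ENTRIES */
} rsb_t;

/* A call pushes its return address, overwriting the oldest entry when
 * more than RSB_ENTRIES calls are nested. */
static void rsb_push(rsb_t *rsb, uint64_t return_addr) {
    rsb->entry[rsb->top % RSB_ENTRIES] = return_addr;
    rsb->top++;
}

/* A return pops the predicted return address. */
static uint64_t rsb_pop(rsb_t *rsb) {
    rsb->top--;
    return rsb->entry[rsb->top % RSB_ENTRIES];
}

/* Count mispredicted returns for `depth` nested calls followed by `depth`
 * returns. Call i is given the made-up return address 0x1000 + i. */
static int rsb_mispredictions(int depth) {
    rsb_t rsb = {{0}, 0};
    int wrong = 0;
    for (int i = 0; i < depth; i++)
        rsb_push(&rsb, 0x1000u + (uint64_t)i);
    for (int i = depth - 1; i >= 0; i--)
        if (rsb_pop(&rsb) != 0x1000u + (uint64_t)i) wrong++;
    return wrong;
}
```

In this model, up to `RSB_ENTRIES` nested calls are predicted correctly, and each additional level of nesting yields one stale prediction when the outermost returns are reached.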
3.4.2 Address Speculation: One of the uses of address speculation
is in memory disambiguation to resolve read-after-write hazards,
which are data dependencies between instructions in out-of-order
execution. In Intel processors, there are two known uses of
address speculation. First, loads are assumed not to conflict with
earlier stores with unknown addresses, so speculatively no
store-to-load (STL) forwarding happens. When the address of a store is
later resolved, the addresses of younger loads are checked. If
store-to-load forwarding should have happened and a data dependence
has been violated, the loads are flushed, and the new data
is reloaded from the store, as shown in the attacks [1, 81]. Second,
for performance, when the address of a load partially matches the
address of a preceding store, the store buffer forwards the data
of the store to the load speculatively, even though the full addresses
of the two may not match [70]. In the end, if there is a mis-prediction,
the load is marked as faulty, flushed, and re-executed.
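The second use, forwarding on a partial address match, can be sketched as below. The number of compared bits (`PARTIAL_BITS`, here the 12 page-offset bits) is an assumption made for illustration and is not stated in the text above; the names `partial_match`, `stl_forward`, and `stl_outcome_t` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define PARTIAL_BITS 12   /* assumed: only the low 12 address bits are compared */
#define PARTIAL_MASK ((1ull << PARTIAL_BITS) - 1)

typedef enum {
    NO_FORWARD,            /* partial addresses differ: no forwarding */
    FORWARD_CORRECT,       /* full addresses match: forwarding was right */
    FORWARD_MISSPECULATED  /* only partial match: load is flushed and re-executed */
} stl_outcome_t;

/* Early check made before the full addresses are known. */
static bool partial_match(uint64_t load_addr, uint64_t store_addr) {
    return (load_addr & PARTIAL_MASK) == (store_addr & PARTIAL_MASK);
}

/* Outcome once the full addresses have been resolved. */
static stl_outcome_t stl_forward(uint64_t load_addr, uint64_t store_addr) {
    if (!partial_match(load_addr, store_addr)) return NO_FORWARD;
    return (load_addr == store_addr) ? FORWARD_CORRECT
                                     : FORWARD_MISSPECULATED;
}
```

The `FORWARD_MISSPECULATED` case is the one of interest for attacks: the load transiently consumes data from a store to a different address before the mis-prediction is detected.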
Another use of address speculation is in conjunction with the
line-fill buffer (LFB), which is the buffer storing cache-lines to be
filled to L1 cache. LFB may forward data speculatively without
addresses [83, 96]. Address speculation may also be used in other
hardware structures in Intel processors, as indicated in [83].
To trigger address speculation, the availability of the address
should be delayed to force the hardware to predict the address. One
way is to make the address calculation depend on some uncached
data, as in Spectre V4 [1]. Another way is to use a newly mapped
page, so that the physical address is available only after the OS
handles the page-in event, as in [96]. In an extreme case, the
speculation can even be caused by a NULL pointer or an invalid
address, with the resulting error suppressed in the attacker code,
as in attack [83].
In STL, the entries are indexed by a function of physical addresses.
In this case, the training code needs to share memory space with
the victim to achieve an attack.
3.5 Speculative Window Size
To let an attack happen, there should be a large enough speculative
window for the disclosure gadget to finish executing transiently.
The speculative window size is the window from the time the tran-
sient execution starts (instruction fetch) to the time the pipeline
is squashed. In attacks leveraging predictions, the speculative
window depends on the time at which the prediction is resolved. For
a conditional branch, this is the time needed to resolve the branch
condition; for an indirect jump, the time needed to obtain the
target address; and for address speculation, the time needed to
obtain the virtual and then the physical address. In
attacks leveraging exceptions, the speculative window depends on
the implementation of exceptions. To make the speculative window
large enough for the disclosure gadget, the attacker can delay the
result of the branch condition or the availability of the addresses
by leveraging uncached loads from main memory, chains of dependent
instructions, etc.
3.6 Exploitable Data
Table 4 lists the information that can be leaked by the current
transient execution attacks, assuming the speculative window is
large enough for the attack to happen.
In Table 4, we categorize the Spectre-type attacks by the type of
prediction. We assume the victim is executing transiently and the
disclosure gadget can only read from memory that the victim could
access architecturally, as is assumed in [53]. In branch prediction,
the disclosure gadget in the victim code executes transiently, and
thus, any data that the victim can access legally can be leaked to
the attacker. In address speculation, specifically STL, stale data
or data that depends on the stale data (e.g., data pointed to by the
stale data) can be accessed by the attacker. In addition, a
Spectre-type attack can also transiently access data that the
attacker is otherwise not permitted to access; in this case, the
attacker herself executes transiently to access the secret. For
example, in the Spectre V1 attack using the SWAPGS instruction [12],
kernel data can be accessed by an attacker running at user level.
Meltdown-type attacks allow the attacker to access illegal data
directly in speculation. In some processor implementations, even
if a load causes an exception due to permission violation, the data
might still be propagated to the following instructions. For example,
in Meltdown [60], privileged data is accessible transiently to an
unprivileged user even if the privileged bit in the page table is
set. In L1 terminal fault (L1TF) [102], secret data in L1 cache is
accessible transiently even if the present bit in the page table is
not set. In Table 4, Meltdown-type attacks are categorized by the
cause of the exception and the related permission bit. A systematic
review of Meltdown-type attacks is provided in [16], which
categorizes them by the cause of the exception, the related
permission bit, and the source of data leakage.
MDS-type attacks also allow the attacker to access data that she is
not permitted to access, in this case due to address speculation.
Data present in the micro-architectural buffers can be accessed
speculatively by the attacker. Different from Spectre-type and
Meltdown-type attacks, the address speculation here is usually due
to the address not being available yet, and thus, the data obtained
is not related to the address used by the attacker, but could be any
data in the buffer at the moment of the attack [70, 83, 96].
4 COVERT CHANNELS
Transient execution enables the attacker to access the secret data
transiently, and for the attacker to eventually obtain the secret
data in architectural states, a covert channel³ [91] is required. The
disclosure gadget is the piece of code that encodes the data into the
covert channel when executing transiently. There is a distinction
between conventional channels, where the encoding happens in the
software execution path, and transient execution channels, where the
encoding phase is executed transiently. Here, we focus on covert
channels that can be used in transient attacks – these can also be
used as conventional channels.
³ The channel is considered a covert channel, not a side channel [53],
because the attacker has control over the disclosure gadget, which
encodes the secret.
There are two parties in a covert channel: the sender and the
receiver. In a covert channel, the sender’s execution changes
some micro-architectural state, and the receiver observes the
change to extract information.
4.1 Assumptions on Covert Channels as the Disclosure Primitive
This work focuses on covert channels that do not require physical
presence and which only require attacker’s software (or software
under the attacker’s control) to be executing on the same system
as the victim. Thus, we do not consider physical channels, such as
power [33], EM field [66], acoustic signals [6, 34], etc., because the
attacker needs to have physical access to the device and special
sensors to get observations, which is hardly practical in remote
software attacks. There are certain physical channels that can be ac-
cessed from software, such as temperature [105]. However, thermal
conduction is slow and the bandwidth is limited.
Any sharing of hardware resources between users could lead to
a covert channel between a sender and a receiver [98]. The receiver
can observe the status of the hardware system with some metadata
from the covert channel, such as the execution time, values of
hardware performance counters (HPC), system behavior, etc.
The most commonly used observation by the receiver of the
covert channels is the timing of execution. In today’s processors,
components are designed to achieve better performance, and thus,
the execution time contains information about whether a certain
hardware unit is available during execution (e.g., a port), whether
the micro-architectural states are optimal for the code (e.g., cache
hits or misses), etc. To observe the hardware states via timing, a
timer is needed. In x86, the rdtscp instruction can be used to read a
high-resolution time stamp counter of the CPU, and thus, can be
used to measure the latency of a chosen piece of code. When
rdtscp is not available, a counting thread can be used as a timer [84].
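A minimal sketch of such a timer in C is given below. It assumes a GCC- or Clang-compatible compiler, where the `__rdtscp` intrinsic is provided by `<x86intrin.h>` on x86-64; on other targets the sketch falls back to `clock_gettime`, which is an assumption of this example and is much coarser than a cycle counter. The function names read_timer and time_load are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime in the fallback path */
#include <assert.h>
#include <stdint.h>
#include <time.h>
#if defined(__x86_64__)
#include <x86intrin.h>
#endif

/* Read a timestamp: TSC ticks on x86-64, nanoseconds otherwise. */
uint64_t read_timer(void) {
#if defined(__x86_64__)
    unsigned int aux;  /* receives the IA32_TSC_AUX value; unused here */
    return __rdtscp(&aux);
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
#endif
}

/* Measure the latency of a single load from `addr`. */
uint64_t time_load(const volatile uint8_t *addr) {
    uint64_t start = read_timer();
    (void)*addr;  /* the timed memory access */
    uint64_t end = read_timer();
    return end - start;
}
```

A practical measurement would typically add serializing or fence instructions around the timed access and repeat the measurement many times, since a single sample is noisy; those refinements are omitted here.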
The receiver can also gain information from HPCs. HPCs have
information about branch prediction, cache, TLB, etc, and are used
in covert channel attacks [31]. However, HPCs must be configured
in kernel mode [23], and thus, are not suitable for unprivileged
attackers.
The receiver can further observe the state of the hardware through
some system behaviors, e.g., transaction aborts. In the Prime+Abort
attack [26], by exploiting TSX, an attacker receives an abort
(call-back) if the victim process accesses a critical address.
In other cases, several covert channels are used in series. Here,
for transient execution attacks, we only consider channels where
the receiver can decode data architecturally. For example, in the
Fetch+Bounce covert channel [81], first, the secret is encoded into
the TLB states, which affect the STL forwarding, and then a cache
Flush+Reload covert channel is used to observe the STL forwarding
results. The first channel can only be observed by instructions
in transient execution and the states will be removed when the
instruction retires. We only consider the second covert channel to
be critical for transient execution attack because the last channel
allows the attacker to observe the secret architecturally.
Table 4: Data Leaked by the Transient Execution Attacks.

Type           Attack                               hypervisor  across VM  kernel data  across user app.  SGX  sandbox  stale data
Spectre-type   Branch [10, 20, 53, 54, 65, 86, 93]  ⊠           ⊠          ⊠            ⊠                 ⊠    ⊠        □
               STL [1]                              □           □          □            □                 □    □        ⊠
Meltdown-type  PF-US (V3) [60, 93]                  □           □          ⊠            □                 □    □        □
               PF-P (L1TF) [95, 102]                ⊠           ⊠          ⊠            ⊠                 ⊠    □        □
               PF-RW (V1.2) [52]                    □           □          □            □                 □    ⊠        □
               NM (LazyFP) [89]                     □           □          □            □                 □    □        ⊠
               GP (V3a) [5]                         □           □          ⊠            □                 □    □        □
MDS-type       LFB [83, 96]                         ⊠           ⊠          ⊠            ⊠                 ⊠    ⊠        □
               STL [70]                             ?           ?          ⊠            ?                 ?    ?        □

⊠ indicates that the attack is possible to leak the protected data; □ indicates that the
attack cannot leak the data; ? indicates that no attack has been shown yet.
4.2 Categorization of Covert Channels
We categorize the covert channels into volatile channels and
persistent channels. In volatile channels, the sender and the receiver
share the resource on the fly and no state is changed, e.g., sharing a
port or some logic concurrently. The sender and the receiver have
contention when communicating using this type of channel. In
persistent channels, the sender changes the micro-architectural
states, and the receiver can observe the state changes later, e.g.,
change of cache state. Although the states may be changed later,
we call them persistent channels to differentiate from the volatile
channels.
4.2.1 Volatile Covert Channels. In a volatile covert channel, there
is contention for hardware between the sender and receiver on the
fly, and thus, the two should run concurrently, for example, as two
hyper-threads in SMT processors or running concurrently on two
different cores. As shown in Figure 4, the receiver first measures
the baseline execution time when the sender is not using the shared
resource. Then, the sender causes contention on the shared resource
or not depending on the message to be sent, while the receiver
continues to measure the execution time. If the execution time
increases, the receiver knows the sender is using the shared resource
at the moment.
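The receiver side of this two-step protocol can be sketched in C as follows. The sketch works on already-collected timing samples, so that it stays independent of any particular shared resource; the averaging of the baseline, the fixed decision margin, and the function names are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Step 1: baseline execution time, averaged over samples taken while the
 * sender is not using the shared resource. */
uint64_t baseline_time(const uint64_t *samples, size_t n) {
    assert(n > 0);
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += samples[i];
    return sum / n;
}

/* Step 2: one measurement per transmitted bit. A measurement that exceeds
 * the baseline by more than `margin` is decoded as 1 (contention),
 * otherwise as 0 (no contention). */
void decode_bits(const uint64_t *measured, size_t n,
                 uint64_t baseline, uint64_t margin, uint8_t *bits_out) {
    for (size_t i = 0; i < n; i++)
        bits_out[i] = (measured[i] > baseline + margin) ? 1 : 0;
}
```

For instance, with a baseline of about 50 cycles and contended measurements of about 80 cycles, a margin of 15 cycles separates the two cases; in a real system the margin would need to be calibrated against noise.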
Execution units, ports, and buses are shared between the
hyper-threads running concurrently on the same physical core, and can
be used for covert channels [3, 10]. L1 cache ports are also shared
among hyper-threads. In Intel processors, the L1 cache is divided into
banks, and each cache bank can only handle a single request (or a
limited number of requests) at a time. CacheBleed [113] leverages
contention on the L1 cache banks to build a covert channel. Later,
Intel resolved the cache bank conflict issue with the Haswell
generation. However, the MemJam attack [71] demonstrates that there is
still a false dependency between memory read-after-write requests when
the addresses have the same L1 cache set and offset on newer
generations of Intel processors. This false dependency can be used
for a covert channel. As shown in Table 5, execution ports and L1
cache ports can lead to covert channels between hyper-threads in an
SMT setting.
The memory bus serves memory requests from all the cores using the
main memory. In [104], it is shown that the memory bus can act
as a high-bandwidth covert channel medium, and covert channel
attacks on various virtualized x86 systems are demonstrated.
4.2.2 Persistent Covert Channels. In persistent channels, the sender
and the receiver share the same micro-architectural states, e.g.,
registers, caches, etc. Different from volatile covert channels, the
state will be memorized in the system for a while. And the sender
and the receiver do not have to execute concurrently. Depending
on whether the state has an ownership (i.e., can only be used by
one party) or can be directly accessed by anyone, we further divide
the persistent channels into occupancy-based and encode-based, as
shown in Figure 5.
(1) Occupancy-based Persistent Covert channels:
• Contention-based Persistent Channels:
In this channel, the sender and the receiver will compete to
occupy some states to store their data or metadata to (de-)accelerate
their execution. One example of the contention-based channel is
the Prime+Probe attack [38, 74, 75, 110]. The receiver first occupies
a cache set (i.e., primes). Then, the sender may use the state for her
data or not, depending on the message to be sent. And in the end,
the receiver reads (i.e., probes) her data that were used to occupy
the cache set in the first step to see whether those data are still
in the cache by measuring the timing, as shown in the first row
of Figure 5. Other examples of the contention-based channel are
the cache Evict+Time attack [9, 74] and the covert channel in the
DRAM row buffer [76].
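The three Prime+Probe steps can be illustrated with a toy model of a single cache set in C. The associativity (WAYS), the strict LRU replacement policy, the tag values, and the function names are assumptions chosen for illustration; real caches use more complex replacement policies, and a real receiver observes misses through timing rather than a return value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define WAYS         4u    /* hypothetical associativity of the cache set */
#define RECEIVER_TAG 100u  /* receiver lines use tags 100..100+WAYS-1 */
#define SENDER_TAG   200u  /* sender line mapping to the same set */

struct cache_set {
    uint64_t tag[WAYS];  /* tag[0] is the most recently used way */
    bool valid[WAYS];
};

/* Access one line in the set with LRU replacement; returns true on a hit. */
bool access_line(struct cache_set *s, uint64_t tag) {
    unsigned pos = WAYS - 1;  /* on a miss, the LRU way is replaced */
    bool hit = false;
    for (unsigned i = 0; i < WAYS; i++) {
        if (s->valid[i] && s->tag[i] == tag) {
            pos = i;
            hit = true;
            break;
        }
    }
    for (unsigned i = pos; i > 0; i--) {  /* move the line to the MRU slot */
        s->tag[i] = s->tag[i - 1];
        s->valid[i] = s->valid[i - 1];
    }
    s->tag[0] = tag;
    s->valid[0] = true;
    return hit;
}

/* Run one round of the protocol; returns the number of probe misses. */
unsigned prime_probe(bool sender_bit) {
    struct cache_set s = { {0}, {false} };
    unsigned misses = 0;
    for (unsigned i = 0; i < WAYS; i++)   /* Step 1: receiver primes the set */
        access_line(&s, RECEIVER_TAG + i);
    if (sender_bit)                       /* Step 2: sender accesses or not */
        access_line(&s, SENDER_TAG);
    for (unsigned i = WAYS; i-- > 0; )    /* Step 3: probe in reverse order */
        if (!access_line(&s, RECEIVER_TAG + i))
            misses++;
    return misses;
}
```

Probing in reverse order keeps the receiver from evicting its own remaining lines under LRU, so in this model a transmitted 1 shows up as exactly one miss and a transmitted 0 as none.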
Another possible contention is that the sender needs to use the
same piece of data (e.g., need exclusive access to the data for write),
and thus, the receiver’s copy of data can be invalidated. Some state
is used for tracking the relationship of data in different components,
which can cause the data in one component to be invalidated and
cause contention. For example, cache coherency policy can change
the cache state of a cache line in a remote cache, and thus, it results
in a covert channel between threads on different cores on the same
processor chip [93, 111]. Cache directory keeps the tags and cache
coherence state of cache lines in the lower levels of cache in a
non-inclusive cache hierarchy and can cause eviction of a cache line
in the lower cache level (a remote cache relative to the sender) to
build a covert channel [110].

[Figure 4 diagram, summarized. Step 1: the receiver measures the
baseline execution time on the shared resource (e.g., port or logic).
Step 2 (sending 0): no contention, and the receiver’s request gets
processed. Step 2 (sending 1): contention on the resource, and the
receiver’s request gets delayed.]
Figure 4: Steps for the sender and the receiver to transfer information through volatile covert channels. The yellow box shows the shared
resource. The solid (dashed) arrow shows the shared resource is (not) requested or used by the corresponding party.

[Figure 5 diagram, summarized by the content of the shared states in each step:
Channel type                 Step 1           Step 2 (sending 0)  Step 2 (sending 1)        Step 3
Occupancy-based, contention  receiver’s data  receiver’s data     sender’s data or invalid  The receiver measures if her data is still in the shared states.
Occupancy-based, reuse       empty            empty               shared data               The receiver measures if the shared data is in the shared states.
Encode-based                 State 0          State 0             State 1                   The receiver measures the state to decode.]
Figure 5: Steps for the sender and the receiver to transfer information through different types of persistent covert channels.
• Reuse-based Persistent Channels:
In this channel, the sender and the receiver will share some data
or metadata, and if the data is stored in the shared state, it could
(de-)accelerate both of their execution. The cache Flush+Reload
attack [37, 112] transfers information by reusing the same data in
the cache. The receiver first cleans the cache state. Then, the sender
loads the shared data or not. And in the end, the receiver measures
the execution time of loading the shared data, as in Figure 5. If the
sender loads the shared data in the second step, the receiver will
observe faster timing compared to the case when the sender does
not load the shared data. There are other reuse-based attacks, such
as Cache Collision attack [13] and the cache Flush+Flush attack [36].
BTB can also be used as such a covert channel, as shown in [101].
The sender and the receiver use the same indirect jump source,
ensuring the same BTB entry is used. If the receiver has the same
destination address as the sender, the BTB will make a correct
prediction resulting in a faster jump.
(2) Encode-based Persistent Channels:
Different from the contention-based and the reuse-based covert
channels, where the user needs to occupy the states (e.g., registers,
cache, or some entries) or data to change the execution, in this type
of channel, the sender and the receiver can both directly change
and probe the shared state. One example of such a channel is the
AVX channel [86]. There are two AVX2 unit states: power-off and
power-on. To save power, the CPU can power down the upper half
of the AVX2 unit by default. In step 2, if the sender uses the
AVX2 unit, the unit will be powered on for at least 1 ms. In step
3, the receiver can determine whether the AVX2 unit is powered on
by measuring the time of using the AVX2 unit. In this way, the sender
encodes the message into the state of the AVX2 unit, as shown in
Figure 5. Another example is the covert channel using cache LRU
states [15, 51, 106].
4.3 Metrics for Covert Channels
This section lists metrics to compare different covert channels:
• Level of Sharing: This metric indicates how the sender and
the receiver should co-locate. As shown in Table 5, some of the
covert channels only exist when the sender and the receiver share
the same physical core. Other channels exist when the sender and
the receiver share the same chip or even the same motherboard.
• Bandwidth: This metric measures how fast the channel is.
The faster the channel, the faster the attacker can transfer the
secret. Table 5 compares the bandwidth of different covert channels.
Usually, the bandwidth is measured in a real system considering
the noise from activities by other software and the system.
• Time Resolution of the Receiver: As shown in Figures 4
and 5, the receiver needs to measure and differentiate different
states. For a timing channel, the time resolution of the receiver’s
clock decides whether the receiver can observe the difference
between the sender sending 0 or 1. The last column of Table 5 shows
the timing difference between states. Some channels, such as cache
L1, require a very high-resolution clock to differentiate 5 cycles
from 15 cycles, while the LLC covert channel only needs to
differentiate 500 cycles from 800 cycles, and the receiver only needs
a coarse-grained clock.

Table 5: Known Covert Channels in Micro-architecture.
(The four columns from "same thread" to "same motherboard" give the Level of Sharing.)

Type        Covert Channel                    same thread  same core, SMT  same chip, different core  same motherboard  Bandwidth               Required time resolution of the receiver (CPU cycles)
Volatile    Execution Ports [3, 10, 98]       □            ⊠               □                          □                 not given               50 vs. 80
Volatile    L1 Cache Ports [71, 113]          □            ⊠               □                          □                 not given               36 vs. 48
Volatile    Memory Bus [104]                  □            ⊠               ⊠                          ⊠                 ∼700 B/s                2500 vs. 8000
Persistent  AVX2 unit [86]                    ⊠            ⊠               □                          □                 >0.02B/s                200 vs. 550
Persistent  BTB [101]                         ⊠            ⊠               □                          □                 not given               34 vs. 50 (a)
Persistent  TLB [35, 44, 81]                  ⊠            ⊠               □                          □                 ∼5kB/s per set          105 vs. 130 (b)
Persistent  L1, L2 (tag, LRU) [51, 106, 107]  ⊠            ⊠               □                          □                 ∼1MB/s per cache entry  5 vs. 15 (c)
Persistent  LLC (tag, LRU) [15, 64]           □            □               ⊠                          □                 ∼0.7MB/s per set        500 vs. 800
Persistent  Cache Coherence [93, 111]         □            □               ⊠                          ⊠                 ∼1MB/s per cache entry  100 vs. 250 (d)
Persistent  Cache Directory [110]             □            □               ⊠                          □                 ∼0.2MB/s per slice      40 vs. 400
Persistent  DRAM row buffer [76]              □            □               ⊠                          ⊠                 ∼2MB/s per bank         300 vs. 350

⊠ indicates that the covert channel exists in the given sharing setting; □ indicates that it does not.
(a) Simulation results in GEM5. (b) Depending on the level of TLB used, the required time resolution varies. The biggest
one is shown. (c) Shows the time resolution for a covert channel using the L1 cache. (d) Depending on the setup, the
required time resolution varies. The biggest one is shown.
• Retention Time: This metric measures how long the chan-
nel can keep the secret. In some of the covert channels (volatile
channels in Section 4.2.1), no state is changed, e.g., the channel
leveraging port contention [3]. The retention time of such channels
is zero, and the receiver must measure the channel concurrently
when the sender is sending information. Other covert channels
(persistent channels in Section 4.2.2) leverage state changes in the
micro-architecture, and the retention time depends on how long the
state will stay. For example, the AVX2 unit will be powered off
after about 1 ms.
If the receiver does not measure the state in time, she will obtain
no information. For other states, such as register, cache, etc., the
retention time depends on the usage of the unit and when the unit
will be used by another user.
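The time-resolution metric can be made concrete with a small C sketch. It uses the simplifying assumption that the receiver can separate two states only if one tick of its clock is shorter than the gap between their latencies; this criterion, the struct, and the function name are illustrative, while the cycle counts in the usage note are the L1 and LLC values from Table 5.

```c
#include <assert.h>

/* Latencies (in CPU cycles) the receiver observes for the two channel states. */
struct timing_channel {
    const char *name;
    unsigned fast_cycles;
    unsigned slow_cycles;
};

/* Simplified criterion (an assumption of this sketch): the two states are
 * distinguishable if one clock tick is shorter than the latency gap. */
int distinguishable(const struct timing_channel *c, unsigned tick_cycles) {
    assert(c->slow_cycles >= c->fast_cycles);
    return tick_cycles < c->slow_cycles - c->fast_cycles;
}
```

With the values from Table 5, a clock that ticks every 100 cycles is sufficient for the LLC channel (500 vs. 800 cycles) but not for the L1 channel (5 vs. 15 cycles).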
4.4 Comparison of Covert Channels
Table 5 lists whether a covert channel exists in different sharing
settings. Whether a covert channel exists depends on whether the
unit is shared in that setting. For example, AVX2 units, TLB, and the
L1/L2 caches are shared among programs using the same physical
core. Therefore, a covert channel can be built among hyper-threads
and threads sharing a logical core in a time-sliced setting. The LLC,
cache coherence states, and DRAM are shared among different cores
on the chip, and therefore, a covert channel can be built between
different cores.
Some covert channels may use more than one component listed
in Table 5. For example, in the cache hierarchy, there could be
multiple levels of caches shared among the sender and the receiver.
In Flush+Reload cache covert channel, the receiver can use the
clflush instruction to flush a cache line from all the caches, and the
sender may load the cache line into L1/L2 of that core or the shared
LLC. If the sender and the receiver are in the same core, then the
receiver will reload the data from L1. If the sender and the receiver
are in different cores and only sharing the LLC, the receiver will
reload the data from LLC. Therefore, even with the same covert
channel code, the location of the covert channel depends on the
actual setting of the sender and the receiver. On the other hand,
if the same covert channel protocol can establish a covert channel
in different hardware components, the covert channel will exist in
the union of the sharing settings of all those components shown
in Table 5. For example, with Flush+Reload, the LLC leads to a
covert channel among threads on different cores, and L1/L2 leads to a
covert channel among threads sharing the same physical core. The Flush+Reload
attack builds a covert channel when the sender and the receiver are
on the same chip either on the same physical core (using L1/L2) or
not (using LLC), i.e., the union of the sharing settings of L1/L2 and
LLC in Table 5.
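The union of sharing settings can be written down directly as a bitmask computation in C. The flag names and the function name are hypothetical; the per-component settings for L1/L2 and the LLC follow the corresponding rows of Table 5.

```c
#include <assert.h>
#include <stddef.h>

/* One bit per level of sharing (the four columns of Table 5). */
#define SAME_THREAD      (1u << 0)
#define SAME_CORE_SMT    (1u << 1)
#define SAME_CHIP        (1u << 2)  /* same chip, different core */
#define SAME_MOTHERBOARD (1u << 3)

/* Sharing settings of two components, taken from Table 5. */
#define L1_L2_SHARING (SAME_THREAD | SAME_CORE_SMT)
#define LLC_SHARING   (SAME_CHIP)

/* A protocol that works through any of the given components exists in the
 * union of their sharing settings. */
unsigned protocol_sharing(const unsigned *components, size_t n) {
    unsigned settings = 0;
    for (size_t i = 0; i < n; i++)
        settings |= components[i];
    return settings;
}
```

For Flush+Reload, passing the L1/L2 and LLC settings yields the same-thread, same-core, and same-chip settings, but not the same-motherboard setting.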
As shown in Table 5, the channels in caches have relatively
high bandwidth (∼1MBits/s), which allows the attacker to launch
efficient attacks. Covert channels in AVX and TLB are slower but
enough for practical attacks.
4.5 Disclosure Gadget
The covert channel is used in the disclosure gadget to make the
secret accessible to the attacker architecturally. A disclosure
gadget usually contains two steps: (1) load the secret into a register;
(2) encode the secret into a covert channel. As shown in Figure 6,
the disclosure gadget code depends on the covert channel used. For
covert channels in the memory hierarchy (e.g., cache side channel),
it will consist of a memory access whose address depends on the
secret value. For AVX-based covert channels, the disclosure gadget
encodes the secret by using an AVX instruction.

Cache covert channel:
    struct array *arr1 = ...;
    struct array *arr2 = ...;
    unsigned long offset = ...;
    if (offset < arr1_len) {
        sec = arr1[offset];        /* 1. Load secret */
        value2 = arr2[sec*c];      /* 2. Encode */
    }

AVX-based covert channel:
    struct array *arr1 = ...;
    struct array *arr2 = ...;
    unsigned long offset = ...;
    if (offset < arr1_len) {
        if (arr1[offset])          /* 1. Load secret */
            _mm256_instruction();  /* 2. Encode */
    }

Figure 6: Example disclosure gadgets for different covert channels.
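How the cache-based gadget hands a byte to the receiver can be illustrated with a toy model in C. The sketch replaces the real cache by a table of flags, one per probe-array line, so it deliberately leaves out transient execution, the bounds check, and timing; the struct and function names are hypothetical, and the arrays only mirror the roles of arr1 and arr2 in Figure 6.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CANDIDATES 256  /* one probe-array line per possible byte value */

/* Toy stand-in for the cache state of the probe array (arr2 in Figure 6). */
struct probe_state {
    bool cached[CANDIDATES];
};

/* Receiver, step 1: flush every probe line. */
void flush_all(struct probe_state *p) {
    memset(p->cached, 0, sizeof p->cached);
}

/* Gadget, step 2: load the secret, then encode it through a memory access
 * whose address depends on the secret. */
void disclosure_gadget(struct probe_state *p, const uint8_t *arr1, size_t offset) {
    uint8_t sec = arr1[offset];  /* 1. load secret */
    p->cached[sec] = true;       /* 2. encode: models the load of arr2[sec*c] */
}

/* Receiver, step 3: reload each candidate line; the one that is cached
 * reveals the byte. Returns -1 if no line is cached. */
int reload_and_decode(const struct probe_state *p) {
    for (int v = 0; v < CANDIDATES; v++)
        if (p->cached[v])
            return v;
    return -1;
}
```

In a real attack, the check in step 3 is a timed reload of each candidate line, and the gadget in step 2 only runs transiently.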
5 EXISTING TRANSIENT EXECUTION ATTACKS
The transient execution attacks contain two parts: triggering tran-
sient execution to obtain data that is otherwise not accessible (dis-
cussed in Section 3) and transferring the data via a covert channel
(discussed in Section 4).
Compared to conventional covert channel attacks, the transient
execution attacks allow the attacker to access more secret data.
In Spectre-type attacks, the victim will encode the secret into the
channel, and the behavior cannot be analyzed from the software
semantics without a hardware model of prediction. In Meltdown-type
and MDS-type attacks, the micro-architecture propagates data
that is not allowed to propagate at the ISA level (the propagation is
not visible at the ISA level, but can be reconstructed through covert
channels that observe the changes in the micro-architecture). To
formally model and detect the behavior, a new micro-architectural
model, including the transient behavior, should be used [19, 39, 68].
5.1 Existing Transient Execution Attacks Types
To launch an attack, the attacker needs a way to cause transient
execution of the victim or herself and a covert channel. Table 6
shows the attacks that are demonstrated in the publications. For
demonstrating different speculation primitives, researchers usually
use the covert channel in caches (row L1, L2 in Table 6). This is
because the cache Flush+Reload covert channel is simple and effi-
cient. For demonstrating different covert channels used in transient
execution attacks, researchers usually use PHT (Spectre V1). This is
because Spectre V1 is easy to demonstrate. Note that every entry in
the table can become an attack. For mitigation, every entry of the
table must be covered, either by mitigating all the covert channels or
by preventing access to the secret data in transient execution.
5.2 Limitations of Existing Attacks
5.2.1 Limited Controllability of the Speculative Primitive. Spectre-type
attacks require the attacker to mis-train the prediction unit in
the setup phase to let the victim execute the gadget speculatively. To
be able to mis-train, the attacker either needs to control part of the
victim’s execution to generate the desired history for prediction
or needs to co-locate with the victim on the same core. MDS-type
attacks also require the attacker and the victim to share the same
address speculation unit. As shown in Table 3, the prediction units
are shared only within a physical core, and some units are not even
shared between hyper-threads. In practice, it is not trivial to
co-locate on the same core.
5.2.2 Limited Exploitable Data of the Speculative Primitive. Meltdown-type
and MDS-type attacks both rely on the propagation of secret
data during transient execution. Therefore, the attacks are
implementation-dependent and are not applicable to all processors [16].
Furthermore, there is a limit on the source of the data. For
example, L1TF attacks [95, 102] can only pass data that is in the L1
cache. MDS-type attacks can only pass data that is in the LFB [96]
and STL [70]. If the critical data is not in a vulnerable structure,
or if the structure is isolated, the attack is mitigated. For example,
to mitigate attacks in the time-sliced sharing setting, data can be
flushed from the above-mentioned structures during a context switch.
5.2.3 Limitation of the Disclosure Primitive. For a covert channel
to exist, the sharing of hardware is needed, which requires the co-
location of the attacker and the victim. Furthermore, for a certain
attack implementation, only one disclosure primitive is used, and
the attack can be mitigated by blocking the covert channel.
6 MITIGATIONS OF SPECTRE-TYPE ATTACKS IN MICRO-ARCHITECTURE DESIGN
In this section, we focus on mitigations of Spectre-type attacks
in micro-architecture designs. Spectre-type attacks are more
fundamental in modern computer architectures. Meltdown-type and
MDS-type attacks are implementation-dependent, and we consider
them to be implementation bugs. They can be fixed, although the
performance penalty is currently unknown. We focus on possible future
micro-architecture designs that are safe against Spectre. Thus,
software mitigation schemes in current commercial computers, such
as [17, 18, 73], are not discussed in detail.
6.1 Mitigating Transient Execution
The simplest mitigation is to stop any transient execution. However,
this comes with a huge performance overhead, e.g., adding a fence
after each branch to stop branch prediction causes an 88% performance
loss [108].
To mitigate Spectre-type attacks, one solution is to limit the
attacker’s ability to mis-train the prediction units, to prevent the
disclosure gadget from being executed transiently (the first metric in
Section 3.2). The prediction units (e.g., PHT, BTB, RSB, STL) should
not be shared among different users. This can be achieved by
statically partitioning the units among concurrent users and flushing
the state during context switches. For example, there are ISA
extensions for controlling and stopping indirect branch predictions
[4, 45]. In [92], a decode-level branch predictor isolation technique
is proposed. However, if the attacker can train the prediction unit by
executing victim code with certain input (e.g., always providing valid
input in Spectre V1), isolation is not enough.
Table 6: Transient Execution Attacks Types.

                   Cause of Transient Execution
Covert Channel     PHT       BTB   RSB       STL      LFB       Exception
Execution Ports    [10]      □     □         □        □         □
L1 Cache Ports     □         □     □         □        □         □
Memory Bus         □         □     □         □        □         □
AVX2 unit          [86]      □     □         □        □         □
TLB                □         □     □         □        □         □
L1, L2 (tag, LRU)  [20, 53]  [53]  [54, 65]  [1, 70]  [83, 96]  [5, 52, 60, 89, 95, 102]
LLC (tag, LRU)     □         □     □         □        □         □
Cache Coherence    [93]      □     □         □        □         [93]
Cache Directory    □         □     □         □        □         □
DRAM row buffer    □         □     □         □        □         □

□ shows attacks that are possible but not demonstrated yet.

There is also a software mitigation that stops speculation by making
the potential secret data depend on the result of the branch
condition through an introduced data dependency, e.g., masking the
data with the branch condition [17, 73]; this works because current
processors do not speculate on data. However, this solution requires
identifying all control flow dependencies and all disclosure gadgets,
figuring out all possible control flows that could lead to the
execution of the disclosure gadgets, and patching each of them. It is
a challenge to identify all (current and future) disclosure gadgets,
because disclosure gadgets may vary due to the encoding to different
covert channels, and formal methods are required [39].
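The idea of masking the data with the branch condition can be sketched in C as below. This is an illustrative variant written for this survey's Spectre V1 example, not the exact code of [17, 73]; the function names are hypothetical. The sketch assumes that the array length does not exceed LONG_MAX and that the compiler implements conversion to long by wrap-around and right shift of negative values arithmetically, as common compilers do.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Returns all ones if index < size and zero otherwise, computed with
 * arithmetic only, so that the result is a data dependency on the bounds
 * check rather than a predicted branch.
 * Assumes size <= LONG_MAX and an arithmetic right shift for negative values. */
unsigned long index_mask(unsigned long index, unsigned long size) {
    return (unsigned long)(~(long)(index | (size - 1UL - index))
                           >> (sizeof(long) * CHAR_BIT - 1));
}

/* Bounds-checked read in which the index is additionally masked. If the
 * branch is mis-predicted for an out-of-bounds offset, the mask is zero and
 * the transient load reads arr1[0] instead of attacker-chosen memory. */
uint8_t masked_read(const uint8_t *arr1, unsigned long arr1_len,
                    unsigned long offset) {
    if (offset < arr1_len) {
        offset &= index_mask(offset, arr1_len);
        return arr1[offset];
    }
    return 0;
}
```

In practice, such code also needs a compiler barrier or an equivalent construct, because an optimizing compiler may otherwise deduce that the mask is always all ones inside the branch and remove it.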
The windowing gadget creates a large enough speculative window
to let the disclosure gadget execute transiently. The
micro-architecture may be able to limit the time of speculation to
prevent the encoding to the covert channel (the third metric in
Section 3.2). However, the disclosure gadget can be so small that it
contains only two loads from L1 [106], which take only about 20 cycles
in total. Detecting a malicious windowing gadget accurately can be
challenging.
To mitigate the leakage of secrets during transient execution attacks,
one way is to prevent the transient execution of the disclosure gadget,
i.e., to stop the loading of secrets in transient execution or stop
propagating the secret to younger instructions in the disclosure gadget
transiently. For Meltdown-type and MDS-type attacks, this means
stopping the propagation of secret data to the younger instructions.
For Spectre-type attacks, however, the logic may not know which data
is secret. To mitigate these attacks, secret data should be tagged with
metadata as in secure architecture designs, which will be discussed
in Section 6.1.1.
Another, conservative solution is that no data can be propagated
speculatively, which can potentially prevent transient execution
attacks with any covert channel. Specifically, in NDA [101], a
set of propagation policies is designed for defending against attacks
leveraging different types of transient execution (for example,
transient execution due to branch prediction or all transient
execution), showing the trade-off between security and performance.
Similarly, in SpecShield [7, 8], different propagation policies are
designed and evaluated. In Conditional Speculation [56], the authors
target covert channels in the memory system and propose an
architecture where data cannot be transiently propagated to
instructions that lead to changes in the memory system, showing a 13%
performance overhead. They further optimize the design for
Flush+Reload cache side channels, resulting in a performance overhead
of 7%. Furthermore, in STT [114], all possible covert channels are
analyzed and a micro-architecture based on dynamic information flow
tracking is proposed to defend against all covert channels, which
improves performance by waking up instructions as early as possible.
The overhead of defending against Spectre-like attacks is moderate,
e.g., 21% reported in SpecShield [7], 20 ∼ 51% (113% for defending
against all transient execution attacks) reported in NDA [101], and
8.5% in STT [114].
6.1.1 Mitigations in Secure Architectures. Secure architectures are
designed to protect the confidentiality (or integrity) of certain data
or code. Thus, secure architectures usually come with ISA exten-
sions to identify the data or code to be protected, e.g., secret data
region, and micro-architecture designs to isolate the data and code
to be protected [21, 57, 90].
With knowledge of the data to be protected, hardware can further stop
propagating secret data speculatively. The hardware can identify data
that depends on a secret with taint checking, as proposed in
[32, 53, 85, 92], and either forbid tainted data from having
micro-architectural side effects or flush all the states on exits to
defend against permanent covert channels, and disable SMT to defend
against transient covert channels. The overhead of such mitigation
depends on the size of the secret data to be protected. For example,
as reported in ConTExT [85], the overhead is 71.14% for
security-critical applications, and less than 1% for real-world
workloads. Similar overhead is reported in SpectreGuard [32]. Intel
also proposed a new memory type, named speculative-access protected
memory (SAPM) [48]. Any access to an SAPM region causes
instruction-level serialization, and speculative execution beyond the
SAPM-accessing instruction is stopped until the retirement of that
instruction.
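The taint checking described above can be illustrated with a small
software model. This is only a sketch under simplifying assumptions,
not the mechanism of any cited design: the names Tagged, add,
shift_left and may_issue_load are hypothetical, and real hardware
tracks the metadata bit per register or cache line rather than per
Python object.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tagged:
    value: int
    secret: bool = False  # metadata bit: True if derived from protected data


def add(a: Tagged, b: Tagged) -> Tagged:
    # Taint propagates: the result is secret if any operand is secret.
    return Tagged(a.value + b.value, a.secret or b.secret)


def shift_left(a: Tagged, amount: int) -> Tagged:
    # A shift by a public constant keeps the operand's taint.
    return Tagged(a.value << amount, a.secret)


def may_issue_load(address: Tagged, speculative: bool) -> bool:
    # A load whose address depends on a secret would encode the secret in
    # cache state, so it is held back while it is still speculative.
    return not (speculative and address.secret)


# A typical disclosure-gadget address: base + (secret << 6).
base = Tagged(0x1000)
secret_byte = Tagged(42, secret=True)
probe_address = add(base, shift_left(secret_byte, 6))
```

In this model the probe load is blocked while speculative and allowed
once it is no longer speculative, whereas loads with untainted
addresses are never delayed. This mirrors why the overhead depends on
how much data is tagged as secret.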
6.2 Mitigating Covert Channels
To limit the disclosure primitive, one way is to isolate all the
hardware between the sender and the receiver of the channel, so that
the change is not observable by the receiver. However, this is not
always possible, e.g., in some attacks, the attacker is both the
sender and the receiver of the channel.
Another mitigation is to eliminate the sender of the covert channel in
transient execution. For volatile covert channels, the mitigation is
challenging. For permanent covert channels, there should be no
speculative change to any micro-architectural state, or any
micro-architectural state changes should be rolled back when the
pipeline is squashed. Covert channels in memory systems, such as
caches and TLBs, are most commonly used. Hence, most of the existing
mitigations focus on cache and TLB side channels.
Table 7: Comparison of Different Mitigation Schemes in Micro-architecture.
Mitigation Scheme                                   Performance Overhead
Fence after each branch                             88% [108]
Stop propagating all data                           30 ∼ 55% [8]; 21% [7]; 20 ∼ 51% [101]; 8.5% [114]
Stop propagating all data to cache changes          13% [56]
Stop propagating all data to Flush+Reload channel   7% [56]
Stop propagating all tagged secret data             71% for security-critical applications, < 1% for real-world workloads [32, 85]
Partitioned cache                                   1 ∼ 15% [51]
Stop (Undo) speculative change in caches            22% [108]; 11% [80]; 5.1% [79]
InvisiSpec [108] proposed the concept of the “visibility point” of a
load, which indicates the time when it is safe for the load to cause
micro-architecture state changes that are visible to attackers. Before
the visibility point, a load may be squashed, and should not cause any
micro-architecture state changes visible to attackers. To reduce
performance overhead, a “speculative buffer” is used to temporarily
hold the loaded data, without modifying the local cache. After the
visibility point, the data is fetched into the cache. For cache
coherence, a new coherence policy is designed such that the data is
validated when stale data may have been fetched. The gem5 [11]
simulation results show a 22% performance loss for the SPEC 2006
benchmarks [43]. Similarly, SafeSpec [50] proposed to add shadow
buffers to caches and TLBs, so that transient changes in the caches
and TLBs do not happen.
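The speculative buffer idea can be sketched as a toy software model.
It is an illustration under strong simplifications (a single cache
level, dictionaries instead of hardware structures), and the class and
method names are hypothetical rather than taken from InvisiSpec or
SafeSpec.

```python
class SpeculativeLoadUnit:
    """Toy model: speculative loads fill a side buffer, not the cache."""

    def __init__(self, memory):
        self.memory = memory    # address -> value (backing store)
        self.cache = {}         # cache contents an attacker could probe
        self.spec_buffer = {}   # load id -> (address, value), not observable

    def speculative_load(self, load_id, address):
        # Read the data without installing anything in the cache.
        value = self.cache.get(address, self.memory[address])
        self.spec_buffer[load_id] = (address, value)
        return value

    def reach_visibility_point(self, load_id):
        # The load can no longer be squashed: make its effect visible.
        address, buffered = self.spec_buffer.pop(load_id)
        current = self.memory[address]  # validate: the buffered copy may be stale
        self.cache[address] = current
        return buffered == current      # False means the load must be replayed

    def squash(self, load_id):
        # Mis-speculation: drop the buffered data; the cache was never touched.
        self.spec_buffer.pop(load_id, None)
```

A squashed load leaves the cache dictionary unchanged, so a later
probe of the cache by the receiver observes nothing. The validation
step stands in for the coherence check that the text describes.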
CleanupSpec [79] proposed to combine undoing the speculative changes
with secure cache designs. When mis-speculation is detected and the
pipeline is squashed, the changes to the L1 cache are rolled back.
Tracking the speculative changes in caches introduces 1 Kbyte of
storage overhead. To prevent cross-core or multi-thread covert
channels, a partitioned L1 with a random replacement policy and a
randomized L2/LLC are used. Because only a small portion of transient
executions results in mis-speculation, the method shows an average
slowdown of 5.1%.
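The undo approach can be sketched with a toy direct-mapped cache that
logs each speculative fill. This is a minimal illustration, assuming
one address per set and no replacement state; UndoCache and its
methods are hypothetical names, not CleanupSpec's actual structures.

```python
class UndoCache:
    """Toy direct-mapped cache that logs speculative fills for rollback."""

    def __init__(self, num_sets):
        self.num_sets = num_sets
        self.lines = [None] * num_sets  # one line address (or None) per set
        self.undo_log = []              # (set index, previous occupant)

    def speculative_fill(self, address):
        index = address % self.num_sets
        if self.lines[index] != address:
            # Remember what was evicted so it can be restored on a squash.
            self.undo_log.append((index, self.lines[index]))
            self.lines[index] = address

    def commit(self):
        # Speculation was correct: keep the changes and forget the log.
        self.undo_log.clear()

    def squash(self):
        # Mis-speculation: restore evicted lines in reverse order.
        while self.undo_log:
            index, previous = self.undo_log.pop()
            self.lines[index] = previous
```

The common case, correct speculation, only clears the log, which is
consistent with the low average slowdown reported above: the rollback
cost is paid only on mis-speculation.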
Moreover, speculative loads that hit in the L1 cache do not cause side
effects (except LRU state updates) in the memory system. Therefore,
allowing only speculative L1 hits can mitigate transient execution
attacks that use covert channels (other than LRU) in the memory
system. In Selective Delay [80], to improve performance, value
prediction is used for a speculative load that misses in L1. The load
fetches from deeper levels of the memory hierarchy only once it is no
longer speculative. Their solution shows an 11% performance overhead.
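The delay-on-miss policy with value prediction can be sketched as two
small functions. This is an assumption-laden illustration: the
function names are hypothetical, the predictor is passed in as a plain
callable, and the L1 cache and memory are modeled as dictionaries.

```python
def speculative_load(address, l1_cache, predict_value):
    """Return (value, deferred). L1 hits proceed; misses use a prediction."""
    if address in l1_cache:
        return l1_cache[address], False  # hit: nothing is fetched below L1
    return predict_value(address), True  # miss: the real access is deferred


def resolve_deferred(address, predicted, l1_cache, memory):
    """Run once the load is non-speculative. Returns (value, replay_needed)."""
    actual = memory[address]
    l1_cache[address] = actual           # now safe to change cache state
    return actual, actual != predicted   # wrong prediction: replay dependents
```

While the load is speculative, a miss leaves the L1 dictionary
untouched, so no new line appears for the receiver to detect. The
deferred access and the replay on a wrong prediction are where the
performance cost arises.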
Meanwhile, many secure cache architectures have been proposed that use
randomization to mitigate cache covert channels in general (not only
in transient execution attacks). For example, Random Fill cache [62]
decouples the load from the data that is filled into the cache, and
thus the cache state no longer reflects the sender’s memory access
pattern. Random Permutation (RP) cache [99], Newcache [63, 100],
CEASER cache [78] and ScatterCache [103] randomize the
memory-to-cache-set mapping to mitigate contention-based,
occupancy-based covert channels in caches. Non Deterministic
cache [49] randomizes cache access delay and decouples the relation
between cache block access and cache access timing. Secure TLBs [25]
have also been proposed to mitigate covert channels in TLBs. But
again, all possible covert channels need to be mitigated to fully
mitigate transient execution attacks. Further, Cyclone [40] proposed a
micro-architecture to detect cache information leaks across security
domains.
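The randomized memory-to-cache-set mapping mentioned above can be
illustrated with a keyed index function. This is only a sketch: it
uses a keyed BLAKE2b hash from the Python standard library as a
stand-in for the low-latency ciphers or permutation tables that
hardware designs would use, and set_index and the example keys are
hypothetical.

```python
import hashlib


def set_index(line_address: int, key: bytes, num_sets: int) -> int:
    """Map a cache-line address to a set using a secret key."""
    digest = hashlib.blake2b(
        line_address.to_bytes(8, "little"), key=key, digest_size=8
    ).digest()
    return int.from_bytes(digest, "little") % num_sets


# Changing the key changes which addresses collide in a set, so an
# eviction set built under one key is not valid under another.
old_mapping = [set_index(a, b"epoch-0-key", 64) for a in range(256)]
new_mapping = [set_index(a, b"epoch-1-key", 64) for a in range(256)]
```

Without the key, the receiver cannot compute which of its own
addresses contend with the sender's lines, which is the property the
randomized designs rely on.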
Another mitigation is to degrade the quality of the channel or
even make the channel unusable for a practical attack. For exam-
ple, many timing covert channels require the receiver to have a
fine-grained clock to observe the channel (the second metric in
Section 4.3). Limiting the receiver’s observation will reduce the
bandwidth or even mitigate the covert channel [77, 82]. Noise can
also be added to the channel to reduce the bandwidth (the third
metric in Section 4.3).
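The effect of limiting the receiver's clock can be shown with a short
worked example. The latencies below (40 ns for a cache hit, 200 ns for
a miss) and the 1 µs resolution are assumed values for illustration
only, and the function names are hypothetical.

```python
def coarsen(timestamp_ns: int, resolution_ns: int) -> int:
    """Round a timestamp down to a multiple of the allowed resolution."""
    return timestamp_ns - (timestamp_ns % resolution_ns)


def measured_latency(start_ns: int, end_ns: int, resolution_ns: int) -> int:
    """Latency seen by a receiver that can only read the coarsened clock."""
    return coarsen(end_ns, resolution_ns) - coarsen(start_ns, resolution_ns)


# Assumed latencies for illustration: 40 ns (hit) and 200 ns (miss).
start = 5_000
hit_end, miss_end = start + 40, start + 200
```

With a 1 ns clock the two cases are trivially distinguishable, while
with a 1 µs clock both measurements in this example read as zero. A
measurement that happens to straddle a clock edge still leaks some
information, which is why this mitigation is better seen as reducing
bandwidth than as removing the channel in every case.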
However, the above mitigations only cover covert channels in memory
systems. To mitigate other covert channels, the following challenges
remain: 1. Identify all possible covert channels in the
micro-architecture, including future covert channels. Formal methods
are required in this process. For example, information flow tracking,
such as the methods in [24, 115, 116], can be used to identify the
hardware components to which the data of transient execution could
flow. Each of these components is then analyzed to determine whether
it could result in a permanent or transient covert channel.
2. Mitigate each of the possible covert channels.
6.2.1 Mitigations in Secure Architectures. With clearly defined
security domains, isolation can be designed to mitigate not only
transient covert channels but also conventional covert channels. For
example, to defend against cache covert channels, a number of caches
partitioned across different security domains have been proposed,
either statically [14, 22, 41, 51, 55, 61, 99, 109, 115, 116] or
dynamically [27, 97]. With partitioning, shared resources no longer
exist between the sender and the receiver, and the receiver cannot
observe secret-dependent behavior to decode the secret.
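Static partitioning can be sketched for a single cache set whose ways
are divided between domains. This is a simplified model with
hypothetical names; it assumes a per-domain LRU policy and does not
reproduce the design of any particular cited cache.

```python
class WayPartitionedSet:
    """One cache set whose ways are statically split between domains."""

    def __init__(self, ways_per_domain):
        # Example: {"victim": 2, "attacker": 2}; each domain has its own LRU list.
        self.capacity = dict(ways_per_domain)
        self.ways = {domain: [] for domain in ways_per_domain}

    def access(self, domain, tag):
        """Return True on a hit. Fills and evictions stay inside the domain."""
        lines = self.ways[domain]
        hit = tag in lines
        if hit:
            lines.remove(tag)
        elif len(lines) == self.capacity[domain]:
            lines.pop(0)      # evict this domain's least recently used line
        lines.append(tag)     # most recently used line sits at the end
        return hit
```

However many lines the victim domain touches, the attacker domain's
lines are never evicted in this model, so a Prime+Probe style receiver
in the attacker domain observes the same hits regardless of the
victim's accesses.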
The above proposals assume the hardware is isolated for each security
domain. However, there is also the scenario where software outside the
security domain may use the same hardware after a context switch. In
the MI6 processor [14], cache and port partitioning is used to isolate
software on different cores. Further, when there is a context switch,
a security monitor flushes the architectural and micro-architectural
states, which hold the information of in-flight speculation from the
previously executing program. To protect the security monitor,
speculation is not used in the execution of the security monitor.
7 CONCLUSION
This paper provides a survey of transient execution attacks. It first
presents the two components of the attacks, transient execution and
covert channel, and the three phases of an attack. It then analyzes
each component by proposing a set of metrics and using the metrics to
compare the primitives used in existing attacks. In particular, the
paper enumerates all possible causes of transient execution and
categorizes the covert channels that can be used in an attack.
Combining the two primitives, different types of attacks are compared,
and the data exploitable in the attacks are discussed. Finally,
possible mitigation schemes in hardware are discussed and compared.
REFERENCES
[1] 2018. speculative execution, variant 4: speculative store bypass. https://bugs.
chromium.org/p/project-zero/issues/detail?id=1528 accessed May. 2019.
[2] 2019. Intel Transactional Synchronization Extensions (Intel TSX) Overview.
https://software.intel.com/en-us/cpp-compiler-developer-guide-and-
reference-intel-transactional-synchronization-extensions-intel-tsx-overview
accessed May. 2019.
[3] Alejandro Cabrera Aldaya, Billy Bob Brumley, Sohaib ul Hassan, Cesar Pereida
García, and Nicola Tuveri. [n. d.]. Port contention for fun and profit. In Port
Contention for Fun and Profit. IEEE, 0.
[4] AMD. 2018. Software Techniques for Managing Speculation on AMD Proces-
sors. https://developer.amd.com/wp-content/resources/Managing-Speculation-
on-AMD-Processors.pdf accessed May. 2019.
[5] ARM. 2019. Vulnerability of Speculative Processors to Cache Timing Side-
Channel Mechanism. https://developer.arm.com/support/arm-security-
updates/speculative-processor-vulnerability accessed May. 2019.
[6] Michael Backes, Markus Dürmuth, Sebastian Gerling, Manfred Pinkal, and
Caroline Sporleder. 2010. Acoustic Side-Channel Attacks on Printers.. In USENIX
Security symposium. 307–322.
[7] Kristin Barber, Anys Bacha, Li Zhou, Yinqian Zhang, and Radu Teodorescu. 2019.
Specshield: Shielding speculative data from microarchitectural covert channels.
In 2019 28th International Conference on Parallel Architectures and Compilation
Techniques (PACT). ACM.
[8] Kristin Barber, Li Zhou, Anys Bacha, Yinqian Zhang, and Radu Teodorescu.
2019. Isolating Speculative Data to Prevent Transient Execution Attacks. IEEE
Computer Architecture Letters (2019).
[9] Daniel J Bernstein. 2005. Cache-timing attacks on AES. (2005).
[10] Atri Bhattacharyya, Alexandra Sandulescu, Matthias Neugschwandtner,
Alessandro Sorniotti, Babak Falsafi, Mathias Payer, and Anil Kurmus. 2019.
SMoTherSpectre: exploiting speculative execution through port contention.
arXiv preprint arXiv:1903.01843 (2019).
[11] Nathan Binkert, Bradford Beckmann, Gabriel Black, Steven K Reinhardt, Ali
Saidi, Arkaprava Basu, Joel Hestness, Derek R Hower, Tushar Krishna, Somayeh
Sardashti, et al. 2011. The gem5 simulator. ACM SIGARCH Computer Architecture
News 39, 2 (2011), 1–7.
[12] Bitdefender. 2019. Bypassing KPTI Using the Speculative Behavior of the
SWAPGS Instruction. https://www.bitdefender.co.th/wp-content/uploads/
gz/Bitdefender-WhitePaper-SWAPGS.pdf accessed May. 2019.
[13] Joseph Bonneau and Ilya Mironov. 2006. Cache-collision timing attacks against
AES. In International Workshop on Cryptographic Hardware and Embedded Sys-
tems. Springer, 201–215.
[14] Thomas Bourgeat, Ilia Lebedev, Andrew Wright, Sizhuo Zhang, Srinivas De-
vadas, et al. 2019. Mi6: Secure enclaves in a speculative out-of-order processor.
In Proceedings of the 52nd Annual IEEE/ACM International Symposium on Mi-
croarchitecture. ACM, 42–56.
[15] Samira Briongos, Pedro Malagón, José M Moya, and Thomas Eisenbarth. 2019.
RELOAD+ REFRESH: Abusing Cache Replacement Policies to Perform Stealthy
Cache Attacks. arXiv preprint arXiv:1904.06278 (2019).
[16] Claudio Canella, Jo Van Bulck, Michael Schwarz, Moritz Lipp, Benjamin
Von Berg, Philipp Ortner, Frank Piessens, Dmitry Evtyushkin, and Daniel Gruss.
2019. A systematic evaluation of transient execution attacks and defenses. In
28th USENIX Security Symposium (USENIX Security 19). 249–266.
[17] Chandler Carruth. 2018. Speculative Load Hardening (a Spectre variant #1
mitigation). https://lists.llvm.org/pipermail/llvm-dev/2018-March/122085.html
accessed May. 2019.
[18] Microsoft Security Response Center. 2019. Retpoline: a software construct for
preventing branch-target-injection. https://support.google.com/faqs/answer/
7625886 accessed Oct. 2019.
[19] Kevin Cheang, Cameron Rasmussen, Sanjit Seshia, and Pramod Subramanyan.
2019. A formal approach to secure speculation. In 2019 IEEE 32nd Computer
Security Foundations Symposium (CSF). IEEE, 288–28815.
[20] Guoxing Chen, Sanchuan Chen, Yuan Xiao, Yinqian Zhang, Zhiqiang Lin, and
Ten H Lai. 2019. SgxPectre: Stealing Intel Secrets from SGX Enclaves Via
Speculative Execution. (2019), 142–157.
[21] Victor Costan and Srinivas Devadas. 2016. Intel SGX Explained. IACR Cryptology
ePrint Archive 2016, 086 (2016), 1–118.
[22] Victor Costan, Ilia Lebedev, and Srinivas Devadas. 2016. Sanctum: Minimal
hardware extensions for strong software isolation. In 25th USENIX Security
Symposium (USENIX Security 16). 857–874.
[23] Sanjeev Das, Jan Werner, Manos Antonakakis, Michalis Polychronakis, and
Fabian Monrose. 2019. SoK: The challenges, pitfalls, and perils of using hardware
performance counters for security. In Proceedings of 40th IEEE Symposium on
Security and Privacy (S&P 19).
[24] Shuwen Deng, Doğuhan Gümüşoğlu, Wenjie Xiong, Y. Serhan Gener, Onur
Demir, and Jakub Szefer. 2019. SecChisel Framework for Security Verification
of Secure Processor Architectures. In Proceedings of the Workshop on Hardware
and Architectural Support for Security and Privacy (HASP).
[25] Shuwen Deng, Wenjie Xiong, and Jakub Szefer. 2019. Secure TLBs. In Proceedings
of the International Symposium on Computer Architecture (ISCA).
[26] Craig Disselkoen, David Kohlbrenner, Leo Porter, and Dean Tullsen. 2017.
Prime+Abort: A Timer-Free High-Precision L3 Cache Attack using Intel TSX. In 26th
USENIX Security Symposium (USENIX Security 17). 51–67.
[27] Leonid Domnitser, Aamer Jaleel, Jason Loew, Nael Abu-Ghazaleh, and Dmitry
Ponomarev. 2012. Non-monopolizable caches: Low-complexity mitigation of
cache side channel attacks. ACM Transactions on Architecture and Code Opti-
mization (TACO) 8, 4 (2012), 35.
[28] Marius Evers, Po-Yung Chang, and Yale N Patt. 1996. Using hybrid branch
predictors to improve branch prediction accuracy in the presence of context
switches. In ACM SIGARCH Computer Architecture News, Vol. 24. ACM, 3–11.
[29] Dmitry Evtyushkin, Dmitry Ponomarev, and Nael Abu-Ghazaleh. 2015. Covert
channels through branch predictors: a feasibility study. In Proceedings of the
Fourth Workshop on Hardware and Architectural Support for Security and Privacy.
ACM, 5.
[30] Dmitry Evtyushkin, Dmitry Ponomarev, and Nael Abu-Ghazaleh. 2016. Jump
over ASLR: Attacking branch predictors to bypass ASLR. In The 49th Annual
IEEE/ACM International Symposium on Microarchitecture. IEEE Press, 40.
[31] Dmitry Evtyushkin, Ryan Riley, Nael CSE Abu-Ghazaleh, ECE, and Dmitry
Ponomarev. 2018. BranchScope: A New Side-Channel Attack on Directional
Branch Predictor. In Proceedings of the Twenty-Third International Conference
on Architectural Support for Programming Languages and Operating Systems
(ASPLOS ’18). ACM, New York, NY, USA, 693–707. https://doi.org/10.1145/
3173162.3173204
[32] Jacob Fustos, Farzad Farshchi, and Heechul Yun. 2019. SpectreGuard: An Effi-
cient Data-centric Defense Mechanism against Spectre Attacks.. In DAC. 61–1.
[33] Daniel Genkin, Itamar Pipman, and Eran Tromer. 2015. Get your hands off
my laptop: Physical side-channel key-extraction attacks on PCs. Journal of
Cryptographic Engineering 5, 2 (2015), 95–112.
[34] Daniel Genkin, Adi Shamir, and Eran Tromer. 2014. RSA key extraction via low-
bandwidth acoustic cryptanalysis. In Annual Cryptology Conference. Springer,
444–461.
[35] Ben Gras, Kaveh Razavi, Herbert Bos, and Cristiano Giuffrida. 2018. Translation
Leak-aside Buffer: Defeating Cache Side-channel Protections with TLB Attacks.
In USENIX Security Symposium. USENIX, 955–972.
[36] Daniel Gruss, Clémentine Maurice, Klaus Wagner, and Stefan Mangard. 2016.
Flush+ Flush: a fast and stealthy cache attack. In International Conference on
Detection of Intrusions and Malware, and Vulnerability Assessment. Springer,
279–299.
[37] Daniel Gruss, Raphael Spreitzer, and Stefan Mangard. 2015. Cache Template
Attacks: Automating Attacks on Inclusive Last-Level Caches. In USENIX Security
Symposium. 897–912.
[38] Roberto Guanciale, Hamed Nemati, Christoph Baumann, and Mads Dam. 2016.
Cache storage channels: Alias-driven attacks and verified countermeasures. In
Security and Privacy (SP), 2016 IEEE Symposium on. IEEE, 38–55.
[39] Marco Guarnieri, Boris Köpf, José F Morales, Jan Reineke, and Andrés Sánchez.
2018. SPECTECTOR: Principled Detection of Speculative Information Flows.
arXiv preprint arXiv:1812.08639 (2018).
[40] Austin Harris, Shijia Wei, Prateek Sahu, Pranav Kumar, Todd Austin, and Mo-
hit Tiwari. 2019. Cyclone: Detecting Contention-Based Cache Information
Leaks Through Cyclic Interference. In Proceedings of the 52nd Annual IEEE/ACM
International Symposium on Microarchitecture. ACM, 57–72.
[41] Zecheng He and Ruby B Lee. 2017. How secure is your cache against side-
channel attacks?. In Proceedings of the 50th Annual IEEE/ACM International
Symposium on Microarchitecture. ACM, 341–353.
[42] John L Hennessy and David A Patterson. 2011. Computer architecture: a quanti-
tative approach. Elsevier.
[43] John L Henning. 2006. SPEC CPU2006 benchmark descriptions. ACM SIGARCH
Computer Architecture News 34, 4 (2006), 1–17.
[44] Ralf Hund, Carsten Willems, and Thorsten Holz. 2013. Practical Timing Side
Channel Attacks Against Kernel Space ASLR. In IEEE Symposium on Security
and Privacy. IEEE, 191–205.
[45] Intel. 2018. Speculative Execution Side Channel Mitigations. https://software.
intel.com/security-software-guidance/api-app/sites/default/files/336996-
Speculative-Execution-Side-Channel-Mitigations.pdf accessed May. 2019.
[46] Saad Islam, Ahmad Moghimi, Ida Bruhns, Moritz Krebbel, Berk Gulmezoglu,
Thomas Eisenbarth, and Berk Sunar. 2019. SPOILER: Speculative Load Hazards
Boost Rowhammer and Cache Attacks. In 28th USENIX Security Symposium
(USENIX Security 19). USENIX Association, Santa Clara, CA, 621–637. https:
//www.usenix.org/conference/usenixsecurity19/presentation/islam
[47] Daniel A Jiménez and Calvin Lin. 2001. Dynamic branch prediction with
perceptrons. In Proceedings HPCA Seventh International Symposium on High-
Performance Computer Architecture. IEEE, 197–206.
[48] Kekai Hu Ke Sun, Rodrigo Branco. 2019. A New Memory Type against Specula-
tive Side Channel Attacks. https://blogs.technet.microsoft.com/srd/2018/03/15/
mitigating-speculative-execution-side-channel-hardware-vulnerabilities/ ac-
cessed May. 2019.
[49] Georgios Keramidas, Alexandros Antonopoulos, Dimitrios N Serpanos, and
Stefanos Kaxiras. 2008. Non deterministic caches: A simple and effective defense
against side channel attacks. Design Automation for Embedded Systems 12, 3
(2008), 221–230.
[50] Khaled N Khasawneh, Esmaeil Mohammadian Koruyeh, Chengyu Song, Dmitry
Evtyushkin, Dmitry Ponomarev, and Nael Abu-Ghazaleh. 2019. SafeSpec: Ban-
ishing the Spectre of a Meltdown with Leakage-Free Speculation. In Proceedings
of the 56th Annual Design Automation Conference 2019. ACM, 60.
[51] Vladimir Kiriansky, Ilia Lebedev, Saman Amarasinghe, Srinivas Devadas, and
Joel Emer. 2018. DAWG: A defense against cache timing attacks in speculative
execution processors. In 2018 51st Annual IEEE/ACM International Symposium
on Microarchitecture (MICRO). IEEE, 974–987.
[52] Vladimir Kiriansky and Carl Waldspurger. 2018. Speculative buffer overflows:
Attacks and defenses. arXiv preprint arXiv:1807.03757 (2018).
[53] Paul Kocher, Jann Horn, Anders Fogh, Daniel Genkin, Daniel Gruss, Werner
Haas, Mike Hamburg, Moritz Lipp, Stefan Mangard, Thomas Prescher, Michael
Schwarz, and Yuval Yarom. 2019. Spectre Attacks: Exploiting Speculative Exe-
cution. In 40th IEEE Symposium on Security and Privacy (S&P’19).
[54] Esmaeil Mohammadian Koruyeh, Khaled N Khasawneh, Chengyu Song, and
Nael Abu-Ghazaleh. 2018. Spectre returns! speculation attacks using the return
stack buffer. In 12th USENIX Workshop on Offensive Technologies (WOOT 18).
[55] Ruby B Lee, Peter Kwan, John P McGregor, Jeffrey Dwoskin, and Zhenghong
Wang. 2005. Architecture for protecting critical secrets in microprocessors. In
ACM SIGARCH Computer Architecture News, Vol. 33. IEEE Computer Society,
2–13.
[56] Peinan Li, Lutan Zhao, Rui Hou, Lixin Zhang, and Dan Meng. 2019. Condi-
tional Speculation: An Effective Approach to Safeguard Out-of-Order Execution
Against Spectre Attacks. In 2019 IEEE International Symposium on High Perfor-
mance Computer Architecture (HPCA). IEEE, 264–276.
[57] David Lie, Chandramohan Thekkath, Mark Mitchell, Patrick Lincoln, Dan Boneh,
John Mitchell, and Mark Horowitz. 2000. Architectural support for copy and
tamper resistant software. Acm Sigplan Notices 35, 11 (2000), 168–177.
[58] Mikko H Lipasti and John Paul Shen. 1996. Exceeding the dataflow limit via value
prediction. In Proceedings of the 29th annual ACM/IEEE international symposium
on Microarchitecture. IEEE Computer Society, 226–237.
[59] Mikko H Lipasti, Christopher B Wilkerson, and John Paul Shen. 1996. Value
locality and load value prediction. ACM SIGPLAN Notices 31, 9 (1996), 138–147.
[60] Moritz Lipp, Michael Schwarz, Daniel Gruss, Thomas Prescher, Werner Haas,
Anders Fogh, Jann Horn, Stefan Mangard, Paul Kocher, Daniel Genkin, Yuval
Yarom, and Mike Hamburg. 2018. Meltdown: Reading Kernel Memory from
User Space. In 27th USENIX Security Symposium (USENIX Security 18).
[61] Fangfei Liu, Qian Ge, Yuval Yarom, Frank Mckeen, Carlos Rozas, Gernot Heiser,
and Ruby B Lee. 2016. Catalyst: Defeating last-level cache side channel attacks
in cloud computing. In High Performance Computer Architecture (HPCA), 2016
IEEE International Symposium on. IEEE, 406–418.
[62] Fangfei Liu and Ruby B Lee. 2014. Random fill cache architecture. In Microar-
chitecture (MICRO), 2014 47th Annual IEEE/ACM International Symposium on.
IEEE, 203–215.
[63] Fangfei Liu, Hao Wu, Kenneth Mai, and Ruby B Lee. 2016. Newcache: Secure
cache architecture thwarting cache side-channel attacks. IEEE Micro 36, 5 (2016),
8–16.
[64] Fangfei Liu, Yuval Yarom, Qian Ge, Gernot Heiser, and Ruby B Lee. 2015. Last-
level cache side-channel attacks are practical. In Security and Privacy (SP), 2015
IEEE Symposium on. IEEE, 605–622.
[65] Giorgi Maisuradze and Christian Rossow. 2018. ret2spec: Speculative execution
using return stack buffers. In Proceedings of the 2018 ACM SIGSAC Conference
on Computer and Communications Security. ACM, 2109–2122.
[66] Nikolay Matyunin, Jakub Szefer, Sebastian Biedermann, and Stefan Katzen-
beisser. 2016. Covert channels using mobile device’s magnetic field sensors. In
2016 21st Asia and South Pacific Design Automation Conference (ASP-DAC). IEEE,
525–532.
[67] Scott McFarling. 1993. Combining branch predictors. Technical Report. Technical
Report TN-36, Digital Western Research Laboratory.
[68] Ross Mcilroy, Jaroslav Sevcik, Tobias Tebbi, Ben L Titzer, and Toon Verwaest.
2019. Spectre is here to stay: An analysis of side-channels and speculative
execution. arXiv preprint arXiv:1902.05178 (2019).
[69] Pierre Michaud, André Seznec, and Richard Uhlig. 1997. Trading conflict and
capacity aliasing in conditional branch predictors. In ACM SIGARCH Computer
Architecture News, Vol. 25. ACM, 292–303.
[70] Marina Minkin, Daniel Moghimi, Moritz Lipp, Michael Schwarz, Jo Van Bulck,
Daniel Genkin, Daniel Gruss, Berk Sunar, Frank Piessens, and Yuval Yarom.
2019. Fallout: Reading Kernel Writes From User Space. (2019).
[71] Ahmad Moghimi, Thomas Eisenbarth, and Berk Sunar. 2018. MemJam: A false
dependency attack against constant-time crypto implementations in SGX. In
Cryptographers’ Track at the RSA Conference. Springer, 21–44.
[72] Donald A Neamen. 2012. Semiconductor physics and devices: basic principles.
New York, NY: McGraw-Hill.
[73] Oleksii Oleksenko, Bohdan Trach, Tobias Reiher, Mark Silberstein, and Christof
Fetzer. 2018. You shall not bypass: Employing data dependencies to prevent
bounds check bypass. arXiv preprint arXiv:1805.08506 (2018).
[74] Dag Arne Osvik, Adi Shamir, and Eran Tromer. 2006. Cache attacks and counter-
measures: the case of AES. In Cryptographers’ Track at the RSA Conference.
Springer, 1–20.
[75] Colin Percival. 2005. Cache missing for fun and profit.
[76] Peter Pessl, Daniel Gruss, Clémentine Maurice, Michael Schwarz, and Stefan
Mangard. 2016. DRAMA: Exploiting DRAM Addressing for Cross-CPU
Attacks. In 25th USENIX Security Symposium (USENIX Security 16). 565–581.
[77] Filip Pizlo. 2018. What Spectre and Meltdown Mean For WebKit. https://webkit.
org/blog/8048/what-spectre-and-meltdown-mean-for-webkit/ accessed May.
2019.
[78] Moinuddin K Qureshi. 2018. CEASER: Mitigating Conflict-Based Cache At-
tacks via Encrypted-Address and Remapping. In 2018 51st Annual IEEE/ACM
International Symposium on Microarchitecture (MICRO). IEEE, 775–787.
[79] Gururaj Saileshwar and Moinuddin K Qureshi. 2019. CleanupSpec: An Undo
Approach to Safe Speculation. In Proceedings of the 52nd Annual IEEE/ACM
International Symposium on Microarchitecture. ACM, 73–86.
[80] Christos Sakalis, Stefanos Kaxiras, Alberto Ros, Alexandra Jimborean, and Mag-
nus Själander. 2019. Efficient Invisible Speculative Execution Through Selective
Delay and Value Prediction. In Proceedings of the 46th International Symposium
on Computer Architecture. ACM, 723–735.
[81] Michael Schwarz, Claudio Canella, Lukas Giner, and Daniel Gruss. 2019. Store-
to-Leak Forwarding: Leaking Data on Meltdown-resistant CPUs. arXiv preprint
arXiv:1905.05725 (2019).
[82] Michael Schwarz, Moritz Lipp, and Daniel Gruss. 2018. JavaScript Zero: real
JavaScript and zero side-channel attacks. NDSS 18 (2018).
[83] Michael Schwarz, Moritz Lipp, Daniel Moghimi, Jo Van Bulck, Julian Steck-
lina, Thomas Prescher, and Daniel Gruss. 2019. ZombieLoad: Cross-Privilege-
Boundary Data Sampling. arXiv preprint arXiv:1905.05726 (2019).
[84] Michael Schwarz, Clémentine Maurice, Daniel Gruss, and Stefan Mangard. 2017.
Fantastic timers and where to find them: high-resolution microarchitectural
attacks in JavaScript. In International Conference on Financial Cryptography and
Data Security. Springer, 247–267.
[85] Michael Schwarz, Robert Schilling, Florian Kargl, Moritz Lipp, Claudio Canella,
and Daniel Gruss. 2019. ConTExT: Leakage-Free Transient Execution. arXiv
preprint arXiv:1905.09100 (2019).
[86] Michael Schwarz, Martin Schwarzl, Moritz Lipp, Jon Masters, and Daniel Gruss.
2019. Netspectre: Read arbitrary memory over network. In European Symposium
on Research in Computer Security. Springer, 279–299.
[87] Peter Sewell, Susmit Sarkar, Scott Owens, Francesco Zappa Nardelli, and Mag-
nus O Myreen. 2010. x86-TSO: a rigorous and usable programmer’s model for
x86 multiprocessors. Commun. ACM 53, 7 (2010), 89–97.
[88] Eric Sprangle, Robert S Chappell, Mitch Alsup, and Yale N Patt. 1997. The agree
predictor: A mechanism for reducing negative branch history interference. In
ACM SIGARCH Computer Architecture News, Vol. 25. ACM, 284–291.
[89] Julian Stecklina and Thomas Prescher. 2018. LazyFP: Leaking FPU register state
using microarchitectural side-channels. arXiv preprint arXiv:1806.07480 (2018).
[90] G Edward Suh, Dwaine Clarke, Blaise Gassend, Marten Van Dijk, and Srinivas
Devadas. 2014. AEGIS: architecture for tamper-evident and tamper-resistant
processing. In ACM International Conference on Supercomputing 25th Anniversary
Volume. ACM, 357–368.
[91] Jakub Szefer. 2018. Survey of Microarchitectural Side and Covert Channels,
Attacks, and Defenses. Journal of Hardware and Systems Security (13 September
2018). https://doi.org/10.1007/s41635-018-0046-1
[92] Mohammadkazem Taram, Ashish Venkat, and Dean Tullsen. 2019. Context-
sensitive fencing: Securing speculative execution via microcode customization.
In Proceedings of the Twenty-Fourth International Conference on Architectural
Support for Programming Languages and Operating Systems. ACM, 395–410.
[93] Caroline Trippel, Daniel Lustig, and Margaret Martonosi. 2018. MeltdownPrime
and SpectrePrime: Automatically-Synthesized Attacks Exploiting Invalidation-
Based Coherence Protocols. arXiv preprint arXiv:1802.03802 (2018).
[94] Paul Turner. 2018. Mitigating speculative execution side channel hardware
vulnerabilities. https://github.com/intelstormteam/Papers accessed Oct. 2019.
[95] Jo Van Bulck, Marina Minkin, Ofir Weisse, Daniel Genkin, Baris Kasikci, Frank
Piessens, Mark Silberstein, Thomas F Wenisch, Yuval Yarom, and Raoul Strackx.
2018. Foreshadow: Extracting the keys to the intel SGX kingdom with transient
out-of-order execution. In 27th USENIX Security Symposium (USENIX Security
18). 991–1008.
[96] Stephan van Schaik, Alyssa Milburn, Sebastian Österlund, Pietro Frigo, Giorgi
Maisuradze, Kaveh Razavi, Herbert Bos, and Cristiano Giuffrida. 2019. RIDL:
Rogue In-flight Data Load. In S&P.
[97] Yao Wang, Andrew Ferraiuolo, Danfeng Zhang, Andrew C Myers, and G Ed-
ward Suh. 2016. SecDCP: secure dynamic cache partitioning for efficient tim-
ing channel protection. In Design Automation Conference (DAC), 2016 53nd
ACM/EDAC/IEEE. IEEE, 1–6.
[98] Zhenghong Wang and Ruby B Lee. 2006. Covert and side channels due to
processor architecture. In Computer Security Applications Conference, 2006. ACSAC’06.
22nd Annual. IEEE, 473–482.
[99] Zhenghong Wang and Ruby B Lee. 2007. New cache designs for thwarting soft-
ware cache-based side channel attacks. In ACM SIGARCH Computer Architecture
News, Vol. 35. ACM, 494–505.
[100] Zhenghong Wang and Ruby B Lee. 2008. A novel cache architecture with
enhanced performance and security. In Microarchitecture, 2008. MICRO-41. 2008
41st IEEE/ACM International Symposium on. IEEE, 83–93.
[101] Ofir Weisse, Ian Neal, Kevin Loughlin, Thomas F Wenisch, and Baris Kasikci.
2019. NDA: Preventing Speculative Execution Attacks at Their Source. In
Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microar-
chitecture. ACM, 572–586.
[102] Ofir Weisse, Jo Van Bulck, Marina Minkin, Daniel Genkin, Baris Kasikci, Frank
Piessens, Mark Silberstein, Raoul Strackx, Thomas F Wenisch, and Yuval Yarom.
2018. Foreshadow-NG: Breaking the virtual memory abstraction with transient
out-of-order execution. Technical Report. Technical report.
[103] Mario Werner, Thomas Unterluggauer, Lukas Giner, Michael Schwarz, Daniel
Gruss, and Stefan Mangard. 2019. Scattercache: Thwarting cache attacks via
cache set randomization. In 28th USENIX Security Symposium (USENIX Security
19). 675–692.
[104] Zhenyu Wu, Zhang Xu, and Haining Wang. 2014. Whispers in the hyper-space:
high-bandwidth and reliable covert channel attacks inside the cloud. IEEE/ACM
Transactions on Networking 23, 2 (2014), 603–615.
[105] Wenjie Xiong, Nikolaos Athanasios Anagnostopoulos, André Schaller, Stefan
Katzenbeisser, and Jakub Szefer. 2019. Spying on Temperature using DRAM. In
Proceedings of the Design, Automation, and Test in Europe (DATE).
[106] Wenjie Xiong and Jakub Szefer. 2019. Leaking Information Through Cache LRU
States. arXiv preprint arXiv:1905.08348 (2019).
[107] Yunjing Xu, Michael Bailey, Farnam Jahanian, Kaustubh Joshi, Matti Hiltunen,
and Richard Schlichting. 2011. An exploration of L2 cache covert channels in
virtualized environments. In Proceedings of the 3rd ACM workshop on Cloud
computing security workshop. ACM, 29–40.
[108] Mengjia Yan, Jiho Choi, Dimitrios Skarlatos, Adam Morrison, Christopher
Fletcher, and Josep Torrellas. 2018. InvisiSpec: Making Speculative Execution
Invisible in the Cache Hierarchy. In 2018 51st Annual IEEE/ACM International
Symposium on Microarchitecture (MICRO). IEEE, 428–441.
[109] Mengjia Yan, Bhargava Gopireddy, Thomas Shull, and Josep Torrellas. 2017. Se-
cure Hierarchy-Aware Cache Replacement Policy (SHARP): Defending Against
Cache-Based Side Channel Attacks. In Proceedings of the 44th Annual Interna-
tional Symposium on Computer Architecture. ACM, 347–360.
[110] Mengjia Yan, Read Sprabery, Bhargava Gopireddy, Christopher Fletcher, Roy
Campbell, and Josep Torrellas. 2019. Attack directories, not caches: Side channel
attacks in a non-inclusive world. In 2019 IEEE Symposium on Security and
Privacy (SP). IEEE.
[111] Fan Yao, Milos Doroslovacki, and Guru Venkataramani. 2018. Are Coherence Protocol
States Vulnerable to Information Leakage?. In High Performance Computer
Architecture (HPCA), 2018 IEEE International Symposium on. IEEE, 168–179.
[112] Yuval Yarom and Katrina Falkner. 2014. FLUSH+RELOAD: A High Resolution,
Low Noise, L3 Cache Side-Channel Attack. In USENIX Security Symposium,
Vol. 1. 22–25.
[113] Yuval Yarom, Daniel Genkin, and Nadia Heninger. 2017. CacheBleed: a timing
attack on OpenSSL constant-time RSA. Journal of Cryptographic Engineering 7,
2 (2017), 99–112.
[114] Jiyong Yu, Mengjia Yan, Artem Khyzha, Adam Morrison, Josep Torrellas, and
Christopher W Fletcher. 2019. Speculative Taint Tracking (STT): A Compre-
hensive Protection for Speculatively Accessed Data. In Proceedings of the 52nd
Annual IEEE/ACM International Symposium on Microarchitecture. ACM, 954–968.
[115] Danfeng Zhang, Aslan Askarov, and Andrew C Myers. 2012. Language-based
control and mitigation of timing channels. ACM SIGPLAN Notices 47, 6 (2012),
99–110.
[116] Danfeng Zhang, Yao Wang, G Edward Suh, and Andrew C Myers. 2015. A
hardware design language for timing-sensitive information-flow security. In
ACM SIGARCH Computer Architecture News, Vol. 43. ACM, 503–516.
[117] Yinqian Zhang, Ari Juels, Michael K Reiter, and Thomas Ristenpart. 2014. Cross-
tenant side-channel attacks in PaaS clouds. In Proceedings of the 2014 ACM
SIGSAC Conference on Computer and Communications Security. ACM, 990–1003.
