SPECCFI: Mitigating Spectre Attacks using CFI Informed Speculation by Koruyeh, Esmaeil Mohammadian et al.
Esmaeil Mohammadian Koruyeh, Shirin Haji Amin Shirazi, Khaled N. Khasawneh,
Chengyu Song and Nael Abu-Ghazaleh
Computer Science and Engineering Department
University of California, Riverside
{emoha004,shaji007,kkhas001,csong,naelag}@ucr.edu
Abstract—Spectre attacks and their many subsequent variants
are a new vulnerability class for modern CPUs. The attacks rely
on the ability to misguide/hijack speculative execution, generally
by exploiting the branch prediction structures, to execute a
vulnerable code sequence speculatively. In this paper, we propose
to use Control-Flow Integrity (CFI), a security technique used
to stop control-flow hijacking attacks, on the committed path to
prevent speculative control-flow from being hijacked to launch
the most dangerous variants of the Spectre attacks (Spectre-BTB
and Spectre-RSB). Specifically, CFI attempts to constrain the
target of an indirect branch to a set of legal targets defined
by a pre-calculated control-flow graph (CFG). As CFI is being
adopted by commodity software (e.g., Windows and Android)
and commodity hardware (e.g., Intel’s CET and ARM’s BTI),
the CFI information could be readily available through the hard-
ware CFI extensions. With the CFI information, we apply CFI
principles to also constrain illegal control-flow during speculative
execution. SPECCFI ensures that control flow instructions target
legal destinations to constrain dangerous speculation on forward
control-flow paths (indirect calls and branches). We complement
our solution with a precise speculation-aware hardware stack to
constrain speculation on backward control-flow edges (returns).
We combine this solution with existing solutions against branch
direction predictor attacks (Spectre-PHT) to close all non-vendor-specific
Spectre vulnerabilities. We show that SPECCFI results
in small overheads both in terms of performance and additional
hardware complexity.
I. INTRODUCTION
The recent Spectre [41] attacks have demonstrated how
speculative execution can be exploited to enable disclosure of
secret data across isolation boundaries. Specifically, attackers
can misguide the processor to speculatively execute a read
instruction with an address under their control. Although the
speculatively read values are not visible through the archi-
tectural state, since the misspeculation effects are eventually
undone, they can be communicated out using a side-channel.
Since their introduction, a large number of attacks following
the same pattern (temporary read of sensitive data through
speculation, followed by disclosure of this data through a
covert channel) have been discovered, enabling the bypassing of
different permissions using a number of different speculation
triggers [10], [13], [26], [29], [40], [43], [45], [57], [62],
[66], [72]; it is clear that this is a general class of weakness
that requires deep rethinking of processor architecture.
Since speculation has a great impact on performance, to
mitigate this threat without throttling speculation, some so-
lutions such as InvisiSpec [73] and SafeSpec [38] propose
separating speculative data from committed data. Such an
approach, rather than attempting to limit speculation, would
isolate possible leakage. However, the principle has to be
applied to every micro-architectural structure (e.g., cache,
TLB, DRAM row buffer), and it is unclear if this approach
could prevent leakage through contention, for example, using
the functional unit port side-channel [7], [13].
Another direction to mitigate this threat is to abort spec-
ulation if a potentially dangerous gadget can be executed
speculatively. For example, Intel and AMD suggest insert-
ing serialization instructions like lfence to prevent loading
potentially secret data [6], [34]. Because blindly inserting
serialization instructions will have the same effect as disabling
speculation, thus severely reducing performance [32], a better
solution is to conditionally insert barriers. The MSVC C
compiler [47], oo7 [70], and Respectre [31] use static analysis
to identify dangerous gadgets and only insert lfence before
the identified gadgets. Context-Sensitive Fencing [63] dynam-
ically inserts serialization instructions when a load instruction
operates on untrusted data (address), but only for Spectre-PHT.
Our observation is that Spectre-like attacks rely on ma-
nipulating the prediction structures (see Section II-A for de-
tails) to coerce speculation to an attacker-chosen code gadget
with attacker-controlled input. Therefore, these attacks can
be defeated more efficiently by identifying and preventing
wrong speculation (prediction). As the first step towards this
direction, we propose SPECCFI, a lightweight solution to
prevent the two most dangerous Spectre variants: Spectre-
BTB (v2) and Spectre-RSB (v5) attacks. SPECCFI prevents
these attacks by using control-flow integrity (CFI) principles
to identify when to constrain speculation.
In contrast to traditional CFI, even hardware supported pro-
posals, whose purpose is to prevent illegal control flow within
the primary architecturally visible control flow of a program,
SPECCFI pushes CFI to the speculation level, where it can
be used to determine whether a speculative execution path
should be allowed or limited. Compared to existing solutions
against Spectre-BTB and Spectre-RSB attacks, such as the
recent microcode update from Intel [34] and retpoline [65],
SPECCFI introduces less performance degradation as it still
allows correct speculation to proceed, while existing solutions
blindly “disable” all indirect branch prediction.
arXiv:1906.01345v1 [cs.CR] 4 Jun 2019
We also argue that defenses against Spectre-BTB
and Spectre-RSB attacks serve as the foundation for defense
against Spectre-PHT (v1) attacks. The reason is that serial-
ization instructions can be viewed as a special type of inline
reference monitor. Therefore, it is crucial to make sure that
these inserted barriers are never bypassed. However, without
protections against Spectre-BTB (forward indirect branches)
and Spectre-RSB (returns), attackers can easily bypass the
barriers to carry out the attacks [13]. Furthermore, as demon-
strated in return-oriented programming [61], by jumping to the
middle of an x86 instruction, attackers can use unintended gadgets
to launch attacks. For this reason, we envision SPECCFI
being combined with existing solutions against v1 attacks [19],
[51], [63] to provide comprehensive protection against Spectre
attacks.
The SPECCFI principle can be applied to any CFI im-
plementation (e.g., coarse-grained such as Intel’s CET [36],
or fine-grained such as HAFIX [21]), with small differences
in implementation, leading to enforcement of the
respective version of CFI. We present our baseline design
in Section IV and Section V. We investigate two versions
of SPECCFI: SPECCFI-base that implements CFI only for
speculation, and SPECCFI-full that also supports CFI for
the committed control flow (i.e., the conventional goal of CFI).
Section VII evaluates performance and complexity of both
SPECCFI-base and SPECCFI-full. We show that SPECCFI-
base eliminates dangerous misspeculations (where the pre-
dicted target label does not match the destination), without
deteriorating performance.
SPECCFI-full incurs an additional small overhead, on par
with other hardware CFI implementations [20]–[22]. We also
analyze the implementation complexity resulting from the
hardware shadow stack, and the extensions to the BTB, and
find that the overhead is modest.
Although some software and hardware solutions have
started to appear to defend against this class of attacks, we
believe that our solution is elegant and has a number
of interesting properties. We believe that, by limiting the
opportunities for harmful speculation, it also combines
well with other proposed defenses, such as SafeSpec [38]
and InvisiSpec [73], which limit the speculative side effects
once misspeculation occurs. Section VIII compares SPECCFI to these
and other works.
In summary, the contributions of the paper include:
• We present a new defense against Spectre variants that
rely on polluting the BTB and RSB, by embedding CFI
principles into the branch prediction decisions.
• We analyze the security of the proposed designs, showing
that they protect against all variants of Spectre-BTB (v2)
and Spectre-RSB (v5) attacks. Combined with solutions
such as context-sensitive fencing, we believe that we can
completely secure the system against Spectre attacks.
• We analyze the performance and complexity of SPECCFI,
showing that it leads to little overhead. Compared to a
defense that prevents speculation around indirect jumps,
indirect calls and returns, SPECCFI provides equivalent
security yet still avoids the large performance overheads.
The hardware complexity is also negligible.
II. BACKGROUND
This section overviews some background: branch predictor
structures in modern processors, Spectre attacks, and CFI.
A. Branch prediction and Spectre attacks
Branch prediction is a critical component of modern proces-
sors that support speculative out-of-order execution. When a
control flow instruction (branch, call or return) is encountered,
the result of the instruction (whether or not a conditional
branch will be taken; what is the target of an indirect branch or
call or a return) is generally not known at the front end of the
pipeline. As a result, branch prediction is used to continue to fill
the pipeline and utilize the available resources of the
processor.
[Figure 1 diagram. Labeled elements: Branch Address and CFI Label inputs; PHT (T/N counters, Predicted Direction); BTB (tag, target, label); RSB/SCS (return addresses, LCP); Mux selected by is_return; Predicted Target.]
Fig. 1: Branch Predictor Unit consists of three different pre-
dictors: (1) PHT for conditional branch direction; (2) BTB for
indirect branch addresses; and (3) RSB for return addresses.
Modern processors employ sophisticated predictors (shown
in Figure 1), which consist of three components:
• Direction predictor: is responsible for predicting the di-
rection of a conditional branch. Although a number of
implementations have been studied, modern predictors
typically implement a two-level context sensitive predic-
tor [26]. The first level is a simple predictor that hashes
each branch address to a direction predictor (typically
a 2-bit saturating counter). This predictor is used either
when a branch is not being successfully predicted or
when the predictor has not been trained yet. When the
predictor is trained, it typically uses a variant of a gshare
predictor [75], which uses the global history of a branch
in addition to its address to hash to a direction predictor
as before. The advantage is that the same branch can have
different predictions based on the control flow path used
to reach it.
• Target predictor: is used by indirect jump and indirect call
instructions which jump to an address held in a register or
a memory location. In this case, the target of the branch is
not known at the front-end of the pipeline. This predictor
typically uses the hash of the branch address to index a
cache holding the branch targets called the branch target
buffer (BTB). Modern processors often have 2-way
or 4-way set associative BTBs. BTBs are shared across
TABLE I: Spectre attack variants and their targeted branch prediction components
Spectre variant                 Element exploited
Spectre-PHT (v1) [41]           Pattern History Table (PHT)
Spectre-PHT (v1.1) [40]         Pattern History Table (PHT)
Spectre-BTB (v2) [41]           Branch Target Buffer (BTB)
Spectre-RSB (v5) [43], [45]     Return Stack Buffer (RSB)
threads on a virtual core: one value used by a process
could be used by another process whose branch has a
matching address in the BTB [27].
• The return address stack: Since returns are not well pre-
dicted using the BTB, and often follow strict call-return
semantics, their target is predicted using a return address
stack of fixed size. When a call instruction executes,
the return address is pushed on this hardware stack; if
overflow happens, previous entries are overwritten [43].
When a return is encountered the top of the stack is
popped and used as the return target.
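As an illustration of the direction predictor described in the list above, the following is a minimal Python sketch of a gshare-style predictor with 2-bit saturating counters. The class name, table size, and initial counter value are assumptions made for illustration; real predictors are considerably more elaborate.

```python
class GsharePredictor:
    """Toy gshare direction predictor (illustrative sketch only)."""

    def __init__(self, index_bits=10):
        self.mask = (1 << index_bits) - 1
        self.history = 0  # global branch history register
        # 2-bit saturating counters: 0..1 predict not-taken, 2..3 predict taken.
        # Starting at 1 (weakly not-taken) is an arbitrary choice.
        self.counters = [1] * (1 << index_bits)

    def _index(self, pc):
        # gshare hashes the branch address with the global history (XOR).
        return (pc ^ self.history) & self.mask

    def predict(self, pc):
        """Return True if the branch at pc is predicted taken."""
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        """Train the counter with the resolved outcome, then shift the history."""
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        self.history = ((self.history << 1) | int(taken)) & self.mask
```

Because the history is part of the index, the same branch address can map to different counters depending on the path used to reach it, which is the context sensitivity mentioned above.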
Spectre Attacks. Spectre attacks exploit the branch and
aliasing predictors, fooling the processor into accessing unauthorized data
speculatively [13], [15], [29], [40], [41], [43], [45]. The main
properties that the attack exploits in speculative execution are:
(1) Speculative instructions have unintended side-effects on
micro-architectural states even if they do not get committed;
and (2) attackers can deliberately mislead execution into
attacker-intended gadgets by mistraining branch predictors,
and use the previous property to leak sensitive information.
Specifically, an attacker selected gadget is executed specu-
latively to perform unauthorized access and leak the value
through a side-channel [10], [32], [41]. Based on the prediction
structure being attacked, variants of the Spectre attacks that are
addressed in this work are shown in Table I. Mitigations for
other variants of the Spectre attacks as well as variants of the
Meltdown attack have been discussed thoroughly by works
such as Canella et al. [15].
B. Control-flow Integrity
Control-flow integrity (CFI) [4], [53] is the state-of-the-
art solution to mitigate control-flow hijacking attacks. In
such attacks, attackers corrupt/overwrite control data (i.e., data
that controls indirect control transfers, such as function pointers and
return addresses) to divert the victim program’s
execution to carry out attacker-controlled logic, such as running
malware or opening a backdoor. CFI prevents such attacks by
enforcing a basic safety property: software execution must
follow a path of a control-flow graph (CFG) determined ahead
of time [4]. Hence, a CFI mechanism always consists of two
components: one that computes the CFG and one that regulates
the control transfer.
Constructing CFG. The security guarantee of a CFI mech-
anism directly depends on the accuracy of the CFG, which
can be constructed through static or dynamic analysis. Coarse-
grain CFI mechanisms [77], [78] generate the CFG using
simple static analysis: any address-taken function can be a
legitimate target for any indirect call; any address-taken basic
block can be a legitimate target for any indirect jump; and the
address of the next instruction after any call can be a legitimate
target for a return. Although coarse-grained CFI can eliminate
most illegal control transfer targets, follow-up research has
shown that the CFG used is so permissive/inaccurate that
it still allows attacks [17], [28]. Fine-grained CFI solutions
improve the accuracy of the CFG by incorporating type
information [49], [54], [64], [68], [71]. Unfortunately, the
CFG may still allow illegal control transfers [16], [25]. More
recently, researchers have proposed utilizing runtime informa-
tion to further improve the precision of the CFG [24], [50],
[67], which can even achieve perfect accuracy [30] (i.e., one
possible target per indirect control transfer site).
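To make the difference in CFG precision concrete, the following Python sketch contrasts the coarse-grained rule (any address-taken function is a legal indirect-call target) with a type-based refinement. The `Function` record, its fields, and the example signatures are hypothetical and serve only as an illustration; real analyses operate on compiler IR or binaries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Function:
    name: str
    signature: str       # hypothetical textual type signature
    address_taken: bool  # True if the function's address escapes


def coarse_targets(functions):
    """Coarse-grained CFG: every address-taken function is a legal target."""
    return {f.name for f in functions if f.address_taken}


def typed_targets(functions, callsite_signature):
    """Type-based CFG: additionally require a matching signature."""
    return {
        f.name
        for f in functions
        if f.address_taken and f.signature == callsite_signature
    }


FUNCTIONS = [
    Function("parse_int", "int(char*)", True),
    Function("parse_hex", "int(char*)", True),
    Function("log_msg", "void(char*)", True),
    Function("helper", "int(char*)", False),
]
```

In this toy program, the coarse-grained rule admits three targets for every indirect call, while the type-based rule admits only two for a call site of type `int(char*)`.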
Regulating control-flow. Once the CFG is calculated, legit-
imate control transfers can be grouped into equivalence sets.
Within the same set, control-flow can be transferred from any
source location (e.g., a callsite or return site) to any target
location (e.g., target function or callsite). By assigning each
equivalence set a unique ID/label, runtime control-flow can
be regulated with a simple check—source label must match
destination label. Such checks can be implemented using either
software or hardware. Some hardware extensions only support
a single label [11], [36], [37] and thus can only enforce
coarse-grained CFI. Others support multiple labels [20], [22] and can
enforce fine-grained CFI. Some hardware extensions also include a shadow
stack to enforce a unique return target [20], [21], [36], [37].
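The label check described above can be sketched in a few lines of Python. The site names and label values below are hypothetical; the point is only that a transfer is allowed when the source label equals the destination label, and that an unlabeled destination is never a legal target.

```python
# Hypothetical equivalence-set labels assigned by the CFG analysis.
SOURCE_LABEL = {"call_site_a": 1, "call_site_b": 2}
TARGET_LABEL = {"handler_x": 1, "handler_y": 1, "handler_z": 2}


def transfer_allowed(source, target):
    """CFI check: the source label must match the destination label."""
    if target not in TARGET_LABEL:
        return False  # unlabeled locations are never legal targets
    return SOURCE_LABEL[source] == TARGET_LABEL[target]
```

Note that `call_site_a` may reach both `handler_x` and `handler_y`, since they belong to the same equivalence set; the check cannot distinguish targets within a set.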
Adoption. Because of its effectiveness against control-flow
hijacking attacks, CFI has been adopted by both commodity
software and hardware. Tice et al. [64] introduced forward-
edge CFI to LLVM and GCC in 2014. Android adopted this
implementation in 8.1 to protect its media stack and extended
the protection in Android 9 to more components and the
OS kernel. Microsoft introduced its own CFI implementation,
Control Flow Guard, in Visual Studio 2015 and has been utilizing
it to protect important OS components, including the
web browser. In Windows 10 (V1703), Microsoft extended
the protection to the OS kernel and hypervisor (Hyper-V). On
the hardware side, Intel introduced Control-flow Enforcement
Technology (CET) [36] and ARM introduced a similar mechanism,
Branch Target Identification (BTI), in ARMv8.5-A [11].
III. SPECCFI SYSTEM MODEL
This section first overviews the threat model we assume
to clarify the assumptions made in the system. We next
describe the extensions to the Instruction Set Architecture
(ISA) to support SPECCFI. Finally, we describe the compiler
modifications to prepare the binary for execution to benefit
from the SPECCFI defense.
A. Threat Model
The main goal of SPECCFI is to prevent attackers from
launching branch target injection attacks (i.e., Spectre-BTB
and Spectre-RSB) to leak memory content of the victim
program that is otherwise not accessible to the attackers.
We assume a strong local adversary model. In particular, we
assume a shared BTB across different hardware threads (i.e.,
hyperthread) and protection domains (address space, privilege
level, and SGX enclaves). We assume RSB is not shared
between hardware threads but it is shared between different
protection domains.
In our model, adversaries can inject arbitrary branch targets
into BTB by executing arbitrary indirect branches within their
own protection domain. Their goal is to control the predicted
branch target in the victim protection domain.
We note that Meltdown-style attacks [44], [48], [57], [58],
[66], [69] are outside the threat model since they occur due
to speculation on the value to be used within the execution of
the same instruction; the leaked values come from privileged
kernel memory [44], L1 cache contents [66], the fill buffer [58],
in-flight data in modern CPUs (for example, the Re-Order Buffer
and Line Fill Buffers) [69], and the store buffer [48], [57].
Moreover, misspeculation through the direction predictor (which
leads to Spectre-PHT) does not result in a control flow violation,
since it simply results in the incorrect choice between two legal
control flow directions.
Fortunately, existing works have already developed protections
against Spectre-PHT, primarily by limiting speculation around
conditional branches that can lead to dangerous misspecu-
lation [19], [31], [47], [63], [70]. Similarly, Spectre-STL is
out-of-scope but can be completely mitigated by disabling
speculative store bypass [5], [9], [34]. To the best of our
knowledge, SPECCFI is the first hardware design that targets
the more dangerous Spectre-BTB and Spectre-RSB attacks
even when they use different side-channels (e.g., contention-based
side-channel in SMT processors [13]).
We further assume that target software is protected with
hardware-enforced CFI, which marks valid indirect control
transfer targets (e.g., ENDBRANCH in CET). Although the target
software may contain memory vulnerabilities (e.g., buffer
overflows) that could be exploited to achieve arbitrary read and
write (i.e., the traditional threat model for CFI), such attacks
are out-of-scope of this work.
B. Instruction Set Architecture (ISA) Extension
Most hardware CFI extensions [11], [20]–[22], [36] use
target labeling to enforce forward-edge CFI and a shadow
stack to enforce backward-edge CFI. Without loss of
generality, we assume two modifications to the ISA to inform
the hardware of the labels from the CFG analysis:
• Extending the indirect jmp and call instructions to
include CFI labels. For coarse-grained CFI enforcement
(e.g., Intel CET [36] and ARM BTI [11]), the label at
jump and callsites can be omitted.
• Adding a new instruction to mark legitimate indirect
branch targets with corresponding labels. For coarse-
grained CFI enforcement, the label can be omitted (e.g.,
the case of Intel CET) or collapsed to two labels: one for
jump targets and the other for call targets (e.g., the case
of ARM BTI).
The shadow stack is generally transparent to the program
and will not be directly manipulated. However, certain language
features, such as exception handling and setjmp/longjmp,
require manipulation of the shadow stack. To support these
features, additional instructions are needed, but since they do
not interact with SPECCFI, we omit their details. The Intel
CET specification [36] can be referred to as an example of
what instructions are necessary and how they interact with the
ISA. Table II summarizes required ISA changes.
TABLE II: ISA Extensions to support CFI.
Instruction          Description
call [dest],label    Target class-aware call
jmp [dest],label     Target class-aware jump
cfi_lbl              Verify CFI integrity
C. Compiler Modification
SPECCFI relies on the compiler to mark valid indirect
control transfer targets with labels.
Fortunately, because these required modifications are the
same as CFI, they are already available as part of commod-
ity compilers. For example, both LLVM and GCC include
support for (1) software-enforced fine-grained forward-edge
CFI [64], (2) Intel CET, and (3) ARM BTI. Therefore,
SPECCFI requires few (potentially no) modifications to the
compilers. SPECCFI is compatible with any label-based CFI
implementation.
IV. FORWARD-EDGE DEFENSE
In this section, we describe the component of SPECCFI
responsible for preventing both misspeculation as well as
control-flow that breaks CFI on the forward-edge (i.e., on
indirect calls and indirect jumps). This defense is responsible
for preventing Spectre-BTB (v2) both within the same address
space and across different address spaces. It is also responsible
for maintaining CFI integrity on committed instructions (the
traditional use of CFI).
A. Preventing Spectre-BTB (within the same address space)
In this attack, the attacker pollutes the target BTB entry
by repeatedly executing another indirect branch in its address
space that hashes into the same entry. This can be achieved
through script engines like the JavaScript engine in browsers
and the BPF JIT engine in the kernel. When the victim branch
is executed speculatively, the polluted entry will direct the
victim to a malicious gadget. Our goal is to prevent the victim
from jumping speculatively to the malicious gadget.
Our first design considers augmenting the BTB to hold
a CFI label for the target. This design would extend the
execution of indirect call/jmp instructions to update the
BTB to add not only the target (which is the traditional
implementation) but also the CFI label of the branch. Later in
the speculation path, all indirect calls and jumps are indexed
to the BTB to predict their target as before, but with an
additional check against the CFI label. This defense prevents
attacker-controlled misspeculation: the CFI label of the current
indirect branch instruction is compared with the label of the
BTB entry. If the labels do not match, we prevent fetching
and executing instructions speculatively from this target. For
benign programs, such misspeculation is likely to occur only
when the BTB is cold (has not been initialized yet), or when
branch aliasing causes collisions in the BTB structure. While
these cases should be rare, in both cases the value in the BTB
is not the correct target. Limiting such erroneous speculation
might result in performance improvement since we do not
waste any cycles on fetching instructions from what is likely
to be the wrong path.
Since only committed indirect branches update the BTB,
possible targets that may be used by attackers are limited to
gadgets starting with a cfi_lbl instruction with an identical
label to that of the call/jmp instruction. Note that a
label may be shared by multiple locations in the code in CFI,
and misspeculation among these locations is still possible (i.e.,
control flow bending [17]); as known from CFI solutions, this
set is much smaller than the potential targets set without CFI.
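The first design can be sketched as a BTB whose entries carry a CFI label alongside the target. The following Python model is illustrative only: the class name, the direct-mapped organization, and the modulo index function are assumptions, not the hardware organization evaluated in this paper.

```python
class LabeledBTB:
    """Toy direct-mapped BTB whose entries hold (target, CFI label)."""

    def __init__(self, entries=16):
        self.entries = entries
        self.table = {}  # index -> (target, label)

    def _index(self, pc):
        # Assumed index function; real BTBs hash the branch address.
        return pc % self.entries

    def update(self, pc, target, label):
        """Record target and label when an indirect call/jmp commits."""
        self.table[self._index(pc)] = (target, label)

    def predict(self, pc, label):
        """Return the predicted target, or None if speculation is suppressed."""
        entry = self.table.get(self._index(pc))
        if entry is None:
            return None  # cold BTB: nothing to speculate on
        target, entry_label = entry
        # Speculate only when the branch's label matches the entry's label.
        return target if entry_label == label else None
```

An aliasing branch with a different label gets no prediction from the polluted entry, whereas an aliasing branch with the same label still does, which corresponds to the residual misspeculation within an equivalence set noted above.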
B. Preventing Spectre-BTB (cross-address-spaces)
(Attacker)
0x09: load rax, 0x25
0x10: call *rax, L1
...
0x25: cfi_lbl L1
0x26: add rbx, 1

(Victim)
0x09: load rax, 0x50
0x10: call *rax, L1
...
0x25: load rbx, [secret]
0x50: cfi_lbl L1
Fig. 2: Example attack across address spaces
The first design described above (storing the CFI labels in
BTB entries) can sufficiently mitigate attacks within the same
address space, but cannot in general prevent attacks across
address spaces, where attackers pollute the globally shared
BTB from a program controlled by them. In this case, if
attackers know the label used by the victim program (e.g.,
through offline analysis), they can potentially craft a valid
entry in BTB with the same label as victims and bypass the
protection. Consider the example in Figure 2. The attacker
inserts L1 and 0x25 in the 0x10 index of BTB, by carefully
selecting the location of a branch and its label. This entry
is valid since the target is annotated with cfi_lbl L1, the
same label as the call. When the CPU context switches to
the victim space, the victim call at location 0x10 is indexed
to BTB and uses the BTB entry, inserted by the attacker to
predict its target. As a result, the CPU continues speculative
execution of the malicious gadget from 0x25. In this case,
despite the fact that the victim’s security check for labels is
passed, the attacker is able to redirect the control flow and
execute the malicious gadget to reveal the secret.
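The scenario of Figure 2 can be replayed with a small Python model. The dictionaries below transcribe the two code layouts from the figure; the BTB is modeled as a dictionary shared by both address spaces and indexed directly by branch address, which is a simplifying assumption, and the opcode names are shorthand chosen for this sketch.

```python
# Code layouts transcribed from Figure 2: address -> (opcode, label).
ATTACKER_CODE = {0x10: ("call", "L1"), 0x25: ("cfi_lbl", "L1"), 0x26: ("add", None)}
VICTIM_CODE = {0x10: ("call", "L1"), 0x25: ("load_secret", None), 0x50: ("cfi_lbl", "L1")}

# BTB shared across address spaces: branch address -> (target, label).
shared_btb = {}


def commit_indirect_call(btb, pc, target, label):
    """A committed indirect call installs its target and label in the BTB."""
    btb[pc] = (target, label)


def predict_with_btb_label(btb, pc, label):
    """Label-in-BTB check: predict only if the stored label matches."""
    entry = btb.get(pc)
    if entry is not None and entry[1] == label:
        return entry[0]
    return None


# The attacker's call at 0x10 legitimately targets its own cfi_lbl L1 at 0x25.
commit_indirect_call(shared_btb, 0x10, 0x25, ATTACKER_CODE[0x10][1])

# After the context switch, the victim's call at 0x10 hits the same entry.
victim_prediction = predict_with_btb_label(shared_btb, 0x10, VICTIM_CODE[0x10][1])
gadget_opcode = VICTIM_CODE[victim_prediction][0]
```

The label comparison succeeds because both programs use `L1`, so the victim speculatively executes from 0x25, which in its own address space holds the secret-loading gadget rather than a `cfi_lbl` instruction.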
To prevent cross-address-space attacks, one possibility is to
randomize the mapping of addresses to the BTB (e.g., similar
to the CEASER solution for caches [55]) to make it difficult
for attackers to guess the label or the location associated with
the target branch. However, as this approach only provides
probabilistic guarantees against attacks, we decided to use
[Figure 3 diagram. States: initial, waiting, check label, insert fences. Transition labels: any instruction except indirect call/jmp; indirect call/jmp; cfi_lbl; any instruction except cfi_lbl; matching labels; not matching labels.]
Fig. 3: State machine for design protecting against attacks
across address spaces
an alternative implementation that avoids using labels in the
BTB. Specifically, we enforce the CFI check by ensuring that
the first speculatively executed instruction after an indirect
branch is a legal CFI label instruction with a matching label.
We note that this is the standard implementation of hardware
acceleration of CFI. However, since we are using CFI to
constrain speculation (not just the committed instructions),
this approach requires pushing the check a little earlier in
the execution (to the decode stage of the first instruction on
the speculative path). As our experimental analysis
shows, this results in a negligible impact on performance
since only the detection of misspeculation is delayed, but legal
speculation is not.
With respect to performance, the two implementations op-
erate differently, but are likely to perform similarly. The first
implementation requires modifications to the critical BTB
structure and can potentially slow down the execution pipeline,
favoring the second, target label-checking, implementation.
A small disadvantage of the second implementation is that
the target instructions have to be speculatively fetched (if
not cached) to be able to check the label, which could be
avoided if the label mismatch is detected by the BTB in
the first implementation. However, this should lead to no loss
of performance since these instructions cannot be executed
anyway until they are fetched.
The state machine implementing the check in the decode
stage of the pipeline is shown in Figure 3. Unlike the initial
design modifying the BTB, we do not reference the BTB for
checking the labels. Starting at the initial state, any indirect
call/jmp instruction in the decode stage sets the CFI_REG
register with its own CFI label and causes the CPU to wait
for a cfi_lbl instruction. The decode stage makes sure that
the next instruction is a cfi_lbl instruction. This restricts
potential gadgets to those starting with a cfi_lbl instruction.
Moreover, the CPU will confirm that the CFI_REG value and
the label of the cfi_lbl instruction are equal. In this way,
potential gadgets are further restricted to those with a matching
label. When the instruction following the call/jmp is not
a cfi_lbl instruction or when the label of the cfi_lbl
instruction does not match the label of the call/jmp, an
lfence micro-op is inserted into the pipeline to guarantee
that execution from the wrong speculative path is prevented.
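The decode-stage check can be summarized by the following Python sketch of the state machine. It folds the "check label" step into the transition out of the waiting state and assumes the machine returns to the initial state after a fence is inserted; both are simplifications made here, and the opcode names `icall` and `ijmp` are shorthand for indirect call and jump.

```python
from enum import Enum, auto


class State(Enum):
    INITIAL = auto()
    WAITING = auto()  # an indirect call/jmp was decoded; expect cfi_lbl next


class DecodeCfiChecker:
    """Toy model of the decode-stage CFI check on the speculative path."""

    def __init__(self):
        self.state = State.INITIAL
        self.cfi_reg = None  # models CFI_REG: label of the pending branch

    def decode(self, opcode, label=None):
        """Return 'fence' if an lfence micro-op must be inserted, else 'ok'."""
        if self.state is State.WAITING:
            matched = opcode == "cfi_lbl" and label == self.cfi_reg
            self.state = State.INITIAL  # assumption: reset after the check
            self.cfi_reg = None
            return "ok" if matched else "fence"
        if opcode in ("icall", "ijmp"):
            self.cfi_reg = label
            self.state = State.WAITING
        return "ok"
```

For example, decoding `icall` with label `L1` followed by `cfi_lbl` with label `L1` yields no fence, while following the same branch with `cfi_lbl` labeled `L2`, or with any other instruction, yields a fence.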
C. Enforcing CFI for Committed Instructions
The design presented thus far prevents all variants of
Spectre-BTB attacks. SPECCFI is essentially hardware-
supported CFI, but with CFI enforcement during speculation.
Thus, given the similarity in the hardware support to traditional
CFI, we also extend the design to support standard CFI to
enforce the CFI rules on committed instructions and defend
against control flow hijacking attacks. This support is achieved
by enforcing the CFI check during the commit stage of the
pipeline: if an indirect call/jmp instruction is not followed
by a cfi_lbl instruction with a matching label, the CPU
raises a CFI violation exception.
V. BACKWARD-EDGE DEFENSE
The backward-edge defense component of SPECCFI protects
against misspeculation on return instructions. Return instructions
typically obtain their predicted addresses from a hardware
stack called the Return Stack Buffer (RSB). The RSB has
been shown to be vulnerable to a range of Spectre attacks [43],
[45]. To provide protection for the backward-edge, hardware
CFI proposals use a Shadow Call Stack (SCS), which is
protected from normal memory reads and writes, and can
only be manipulated through special instructions [36]. Similar
to RSB, the SCS is used to retain the return addresses of
previously executed calls. The differences are: (1) SCS is in
memory, so it is saved and restored across context switches,
while RSB is a special cache in the CPU and its content is
shared across different contexts. (2) SCS is only used for CFI
enforcement and its size is configurable, while RSB is only
used for speculation; since misspeculation was thought to
be only a performance problem, RSB is a best-effort structure
that is not maintained precisely and has a limited size.
A. Combined Speculation-consistent RSB/SCS: Overview
To provide defenses against Spectre-RSB attacks, we combine
the traditional RSB and SCS into a unified structure, RSB/SCS,
acting as both RSB and call stack. Conceptually,
RSB in our design can be viewed as the in-processor cache
for the in-memory SCS. We note that this is different from
other SCS implementations that retain the RSB separately. By
getting speculation targets from the precisely maintained SCS,
consistent with the philosophy of SPECCFI, we move the CFI
guarantees to the speculation stage, closing the Spectre-RSB
vulnerability.
The overall design of RSB/SCS has additional requirements
beyond the design of a conventional SCS. Specifically, since we
have to be able to use it to obtain speculation targets, it
must track additional speculative state without affecting the
committed state of the SCS. We describe the overall design in
the remainder of this section.
When a context switch occurs, the committed RSB/SCS
entries must be saved such that they can be restored when
the program runs again. To be able to keep the state of
this structure consistent, we extend the reorder buffer (ROB,
which is the structure in the CPU used to track speculative
instructions and their register values before they commit) to
track this state. Specifically, we add a logical register OLD_RS
which (is subject to renaming and) holds the return address
that is pushed to the RSB/SCS by a call instruction, or popped
by a return instruction from the RSB/SCS. In addition, we
keep track of a pointer to the last committed entry (LCP) of
the RSB/SCS so as to save and restore the state of committed
entries in this structure in the case of a context switch or a spill
to memory on overflow. At the decode stage, if the instruction
is a call, the next address is “speculatively” pushed to the
RSB/SCS structure. When this instruction commits, the LCP is
updated to point to the last committed entry. If the instruction
is decoded as a return, it “speculatively” pops a return value
from the RSB/SCS structure into OLD_RS (without changing
LCP) and sets the program counter to this address for next
instruction fetch. To support conventional CFI, when the return
instruction reaches the commit stage, the value of the OLD_RS
register is compared with the top of the traditional software
stack. If these two values do not match, a CFI violation
exception is raised. Otherwise, this return instruction gets
committed, and the LCP is decreased by 1 to point to the
next committed entry in the RSB/SCS.
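The decode-time and commit-time actions described above can be sketched as a small software model. This is only an illustrative sketch, not the hardware design: the class and method names are hypothetical, only LCP and OLD_RS come from the text, and LCP is modeled as a count of committed entries rather than a hardware pointer.

```python
class CFIViolation(Exception):
    """Raised when a committing return disagrees with the software stack."""


class RsbScs:
    """Toy model of the combined RSB/SCS; only LCP and OLD_RS are names from the text."""

    def __init__(self):
        self.entries = []  # return addresses; the top of the stack is the last element
        self.lcp = 0       # number of entries pushed by committed calls

    def decode_call(self, return_addr):
        # Speculative push at decode. LCP is not changed yet.
        self.entries.append(return_addr)
        return return_addr  # value recorded in OLD_RS for this call

    def commit_call(self):
        # The call retires, so its entry becomes the last committed entry.
        self.lcp += 1

    def decode_ret(self):
        # Speculative pop at decode. The result is kept in OLD_RS and
        # is used as the next fetch address. LCP is not changed yet.
        return self.entries.pop()

    def commit_ret(self, old_rs, software_stack_top):
        # Conventional CFI check when the return reaches commit.
        if old_rs != software_stack_top:
            raise CFIViolation(
                f"expected {software_stack_top:#x}, predicted {old_rs:#x}"
            )
        self.lcp -= 1
```

In this model a return only changes LCP at commit, so a squashed return leaves the committed state untouched.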
We considered the need to provision the stack with ad-
ditional ports since it is used not only to serve committed
instructions, but also to handle speculative calls and returns.
However, we found that additional ports do not result in
performance benefits because the speculative SCS state is held
primarily in the port-rich reorder buffer. When the in-processor
cache (RSB) overflows or the current thread is about to be
swapped out, we spill it over to the hardware-protected in-
memory SCS. When the RSB underflows or a new thread is
swapped in, we load entries from the SCS. Currently, we did
not explore optimization to prefetch values from the SCS when
RSB is close to empty, or to push some values proactively to
memory when RSB gets close to full.
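The spill and fill behavior can be illustrated with the following sketch. It ignores speculation entirely and assumes a hypothetical chunk size of 4 entries per spill or fill; the class name, method names, and the chunk size are assumptions made for illustration only.

```python
from collections import deque


class SpillingRsb:
    """Toy model: a bounded in-processor RSB backed by an in-memory SCS."""

    def __init__(self, capacity=16, chunk=4):
        self.rsb = deque()        # in-processor entries, oldest on the left
        self.scs = []             # in-memory shadow call stack, oldest first
        self.capacity = capacity
        self.chunk = chunk        # assumed number of entries moved at once

    def push(self, return_addr):
        if len(self.rsb) == self.capacity:
            # Overflow: move the oldest entries to the in-memory SCS.
            for _ in range(self.chunk):
                self.scs.append(self.rsb.popleft())
        self.rsb.append(return_addr)

    def pop(self):
        if not self.rsb:
            # Underflow: reload the most recently spilled entries.
            n = min(self.chunk, len(self.scs))
            refill = self.scs[len(self.scs) - n:]
            del self.scs[len(self.scs) - n:]
            self.rsb.extend(refill)
        return self.rsb.pop()     # raises IndexError if both levels are empty

    def swap_out(self):
        # Context switch: spill every in-processor entry to memory.
        self.scs.extend(self.rsb)
        self.rsb.clear()
```

Pushing more than `capacity` addresses and popping them all back returns them in last-in, first-out order, regardless of how many spills occurred in between.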
B. Misprediction Recovery
Every ret instruction uses the RSB/SCS to predict
its jump target. Since the state of RSB/SCS is modified by
speculative call and ret instructions, in case of misspeculation,
the CPU has to recover the correct state of the structure.
When misspeculation is detected, we need to flush all
the speculated instructions from the pipeline. As a part of
this process, we have to annul all the corresponding entries
from the ROB. During annulment, for every call or return
instruction, we not only remove the ROB entry but also update
the RSB/SCS to preserve the consistent state of the structure.
If the instruction is a call, the top of the RSB/SCS will be
popped. In the case of a ret instruction, the value of OLD_RS
will be pushed back to the RSB/SCS.
C. RSB/SCS Work Flow
To clarify how this structure works, we step through the
example code sample presented in Figure 5.
Fig. 4: Example of the operation of the combined RSB/SCS (panels 1-6 show the ROB entries with their instruction, Spec bit, and OLD_RS fields next to the RSB contents and the LCP position)
0x09: call Function1;
0x10:
0x24: Function1:
call Function2;
0x25: call Function3;
0x26: call Function4;
0x27:
0x36: Function2:
ret;
0x74: Function3:
jz 0x86;
0x86: ret;
Fig. 5: Code sample to illustrate the operation of RSB/SCS
Let’s assume both calls to Function1 and Function2 have
pushed their return addresses to the RSB/SCS. By committing
these instructions at ①, the LCP is updated to point to the
last committed value and then the corresponding entries are
evicted from the ROB. In the second step ②, the first return
instruction (the ret in Function2, which pops 0x25) is
executed speculatively, saving the return address in the ROB,
and eventually gets committed.
The following speculative call to Function3 at ③ pushes
its return address to the RSB/SCS. At stage ④, the execution
of the return instruction and the following call to Function4
change the RSB/SCS state. Assume that a misspeculation on
the jz instruction has been detected at ⑤ and every instruction
executed after the branch has to be flushed. Therefore, the
recovery process starts annulling instructions from the last
entry in the ROB until the misspeculated instruction has been
reached. Annulling the last call in the ROB at ⑤, the value at
the top of the RSB/SCS is popped, and at ⑥, annulling the return,
the OLD_RS value of the instruction saved in the ROB is pushed
back to the RSB/SCS to restore the state that existed
before the misspeculation.
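The recovery steps of this walk-through can be replayed with a short sketch. The function name and the tuple encoding of ROB entries are hypothetical; the addresses are the ones used in Figures 4 and 5.

```python
def recover(rob, rsb, mispredicted_index):
    """Annul every ROB entry younger than rob[mispredicted_index], youngest first.

    rob: list of (kind, old_rs) tuples, oldest first.
    rsb: list of return addresses, top of the stack last.
    """
    while len(rob) - 1 > mispredicted_index:
        kind, old_rs = rob.pop()
        if kind == "call":
            rsb.pop()           # undo the speculative push
        elif kind == "ret":
            rsb.append(old_rs)  # undo the speculative pop using OLD_RS
    return rsb


# State at step 4: call Function3, jz, ret, call Function4 are all speculative.
rob = [("call", 0x26), ("jz", None), ("ret", 0x26), ("call", 0x27)]
rsb = [0x10, 0x27]
recover(rob, rsb, mispredicted_index=1)  # misspeculation detected on jz
```

After the call, `rsb` holds 0x10 and 0x26 again, which is the state the structure had when the jz was fetched.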
D. Preventing RSB Poisoning
Since the RSB/SCS is not shared between different threads
and is preserved across context switches, the attacker is not
able to poison this structure. Although we allow special
instructions to manipulate the SCS to take care of cases such
as setjmp/longjmp, we assume these instructions are only
available to code within the trusted computing base to prevent
them from being abused to arbitrarily manipulate the RSB/SCS
(which is not a Spectre vulnerability).
VI. SECURITY ANALYSIS
In this section, we analyze whether SPECCFI can achieve
our design goal: preventing attackers from launching branch
target injection attacks to leak memory content of the victim
program that is otherwise not accessible to the attackers.
A. Guarantees against Branch Target Injection
Branch target injection attacks target two prediction com-
ponents: branch target buffer (BTB) and return stack buffer
(RSB). Similar to CFI, where the defense does not prevent
attackers from modifying control data (e.g., function pointers
and return addresses) but aims to prevent attackers from
arbitrarily altering the control flow to execute the code gadgets
they want, SPECCFI does not prevent BTB injection: attackers
can still insert arbitrary prediction targets into the BTB by
executing branches inside their own protection domain [26].
What SPECCFI guarantees is that if the injected target is not
a valid indirect control transfer target in the victim protection
domain, then the injected prediction target will not be executed
speculatively, i.e., they cannot speculatively execute arbitrary
code gadgets. For the RSB, SPECCFI essentially converts it into a
cache for the shadow call stack (SCS) that is flushed/restored
during context switches, so both in-address-space injection and
cross-address-space injection are no longer feasible.
Impact of Imprecise CFG: One weakness of static CFG
construction is imprecision, which may still allow attackers
to launch attacks using permitted function-level gadgets [14],
[16], [25], [56]. Since SPECCFI also relies on the CFI analysis
to provide valid targets for forward-edge indirect control
transfer, it also suffers from the same weakness: mis-prediction
is still possible. However, we want to point out that because of
some unique characteristics of Spectre attacks, this weakness
does not pose significant security threats. We will discuss the
details in the next subsection.
B. Incorporating Defense against Spectre-PHT
SPECCFI alone can only mitigate Spectre-BTB and Spectre-
RSB attacks. In this subsection, we discuss how SPECCFI
can be and should be combined with Spectre-PHT defenses to
complete the defense against all Spectre variants. In particular,
to defend against Spectre-PHT attacks, researchers have pro-
posed code analysis techniques [31], [47], [70] to (1) identify
dangerous code gadgets that can be used to leak information
and (2) conditionally insert serialization instructions (e.g.,
lfence) to prevent these dangerous code gadgets from being
executed speculatively. One tricky part of such analysis is
that, although on the committed path, direct control transfer
is always correct; during speculation, even direct control
transfer can be wrong. For a simple example, consider a
direct call behind a conditional branch: if the prediction on
the conditional branch is wrong, then the following direct call
is also wrong. For this reason, when analyzing the code to
identify potential dangerous gadgets for Spectre-like attacks,
one must perform inter-procedural analysis (for both direct and
indirect calls) to account for gadgets that may span between
function calls. The unique opportunity here is that, if the static
analysis to identify and eliminate Spectre gadgets uses the
same CFG as CFI enforcement, then malicious gadgets at
the beginning of functions should already be eliminated. As a
result, when combined with such defenses, even if SPECCFI
allows misspeculation due to imprecise CFG, the wrong target
cannot be used to launch attacks, because the gadgets have
already been eliminated.
At the same time, defenses against Spectre-PHT attacks
also rely on a SPECCFI-like technique. The reason is the same
as why inline reference monitors like SFI [46], [74] have to
enforce a certain degree of control-flow regulation: if attackers
can hijack the control flow to arbitrary locations, then they can
easily bypass the inserted checks and defeat the protection.
This is especially dangerous for variable-length ISAs like x86,
where attackers can jump to the middle of an instruction
and start executing new logic. Similarly, SPECCFI provides
the same runtime guarantee to Spectre-PHT defenses: by
enforcing that even speculative control-flow cannot deviate
from the CFG used in static analysis, the code being analyzed
and instrumented will be the same as executed.
C. Comparison to Intel CET
A few days before the submission of this paper, Intel
published a new specification of its CET [36] extensions. The
new specification includes a paragraph (section 3.8) indicating
their plans to include a check that an indirect branch executed
speculatively targets a legal Branch_end target. Intel sug-
gested this solution, which is essentially the configuration of
SPECCFI using CET as the CFI implementation, concurrently
with our work.
We believe that Intel’s interest in this solution validates its
practicality as a defense against transient speculation attacks.
While the updated CET specifications document describes
only the general idea, our work contributes a reference imple-
mentation and assessment of both the performance and secu-
rity of the solution. In addition, SPECCFI provides substantial
security advantages over the new CET, including:
• Backward edge protection using the speculation aware
shadow stack. While Intel CET uses a shadow stack to
protect the backward edge for committed instructions,
the specifications describe no plans to use it for limiting
speculation. It is not trivial to extend the shadow stack to
track the speculative state, as we describe in Section V.
• Generalized CFI protection and limiting control flow
bending. CET only enforces that control flow (whether
committed or, in the new specifications, speculative)
happens to the start of a legal basic block. As a result,
it allows arbitrary control flow bending [16], which
does not meaningfully restrict the attack opportunities.
In contrast, SPECCFI admits any CFI implementation,
which can substantially shrink the control bending attack
possibilities. Specifically, from a given indirect control
flow instruction, only the gadgets with matching CFI
label are reachable. State-of-the-art CFI systems such as
PathArmor/Context Sensitive CFI can be supported [67]
substantially limiting the control flow opportunities. In
particular, we intend to explore supporting uCFI [30]
in our future work, leaving no control flow bending
opportunities available.
VII. EVALUATION
In this section, we evaluate SPECCFI in terms of per-
formance and hardware complexity. All experiments were
conducted using the MARSSx86 (Micro Architectural and
System Simulator for x86) [52], a widely used cycle accurate
simulator. MARSSx86 is built using PTLsim [76] and does
a full system simulation (including the OS) on top of the
QEMU [12] emulator. First, we configured MARSSx86 to
simulate an Intel Skylake processor; configurations are shown
in Table III. We then integrated SPECCFI into the simulator
to model all new operations realistically and in full detail, in
order to retain hardware-faithful, cycle-accurate modeling of
the extended processor pipeline.
A. Performance Evaluation
We use the SPEC2017 benchmarks [2] for evaluation,
which is a standard benchmark suite used to evaluate the
impact of processor modification on a range of representative
applications that exhibit a range of different behaviors. All
benchmarks were compiled using an LLVM compiler that is
modified to mark valid indirect control transfer targets with
labels. Unfortunately, since there is no official LLVM front-end
for FORTRAN [3], 8 out of the 23 SPEC2017 benchmarks
were not compiled as they contain FORTRAN code.
TABLE III: Configuration of the simulated CPU
Parameter  Configuration
CPU        Skylake
Issue      6-way issue
IQ         96-entry Issue Queue
Commit     Up to 6 Micro-Ops/cycle
ROB        224-entry Reorder Buffer
iTLB       64-entry instruction Translation Lookaside Buffer
dTLB       64-entry data Translation Lookaside Buffer
LDQ        72-entry Load Queue
STQ        56-entry Store Queue
RSB        16-entry Return Stack Buffer
I-Cache    32 KB, 8-way, 64B line, 4 cycle hit
D-Cache    32 KB, 8-way, 64B line, 4 cycle hit
One option to prevent Spectre attacks is to insert fences
to stop speculation around indirect control flow instructions.
In order to evaluate SPECCFI performance, we compare it
against the following design points:
• Baseline: this is the case of an unmodified unprotected
machine. Specifically, we compile and run the SPEC2017
benchmarks using unmodified version of LLVM compiler
and MARSSx86 simulator. In all of our experiments, we
use the Instructions committed Per Cycle (IPC), a com-
mon metric for evaluating the performance of processors,
to report performance. The IPC values of the defenses
are normalized to this baseline implementation without
defenses; thus, a higher normalized value than 1 indicates
better than baseline performance.
• Retpoline-style software fencing: we implement a system
adding fences to indirect branches using software. The
compiler is modified to substitute all the indirect branches
and return instructions with a sequence of instructions
which ensures that the target of the branch is resolved
before any following instruction that might touch the
cache (i.e., a load) is issued. To protect the forward
edges (i.e., indirect calls and jumps), each indirect call is
converted to the following three instructions:
① a load preparing the value of the target
register/memory, ② an lfence making sure that no
future load is issued before the branch is resolved, and
③ the actual call to the address specified in the target
register. Taking the same approach for securing backward
edges (i.e., returns), we substitute every ret instruction
with a sequence of ① a pop from the top of the software
stack to the target register, ② an lfence making sure
to stop speculation before the actual target of the ret
is resolved, and ③ a jmp to transfer control to the target.
Conceptually, this solution is similar to the Retpoline
defense [65], which essentially replaces speculation on
indirect branches with an empty stall gadget. Different
from Retpoline, we also insert the fences for returns
(Retpoline does not protect returns, and leaves the code
vulnerable to Spectre-RSB attacks).
Fig. 6: Performance Impact (normalized IPC per SPEC2017 benchmark and on average, for All Target Fencing-Full, Retpoline-style-Full, and SpecCFI-Full)
This software approach has the advantage of not modi-
fying the underlying hardware but imposes a noticeable
overhead in the number of instructions and code size.
• All Target Fencing: In this approach, we show one
implementation with an lfence, inserted in hardware,
at the target of each indirect branch and return (the all target
fencing), since such a defense is possible without CFI.
This is done by detecting every indirect call, jump, or
return in the decode stage of the pipeline and inserting
an lfence at its target to make sure that the branch
is resolved before issuing further instructions.
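The instruction substitution used by the Retpoline-style software fencing design point can be sketched as a rewriting pass over a toy instruction list. The tuple encoding, the function name, and the choice of r11 as the scratch target register are assumptions made for illustration; the actual compiler modification is not shown in the text.

```python
SCRATCH = "r11"  # assumed scratch register holding the branch target


def rewrite(instrs):
    """Replace indirect calls and returns with load/pop + lfence + call/jmp.

    instrs: list of tuples, either ("call_indirect", operand), ("ret",),
    or ("raw", text) for any instruction that is left unchanged.
    """
    out = []
    for ins in instrs:
        if ins[0] == "call_indirect":
            # (1) load the target, (2) fence, (3) call through the register
            out += [f"mov {SCRATCH}, {ins[1]}", "lfence", f"call {SCRATCH}"]
        elif ins[0] == "ret":
            # (1) pop the return address, (2) fence, (3) jump to it
            out += [f"pop {SCRATCH}", "lfence", f"jmp {SCRATCH}"]
        else:
            out.append(ins[1])
    return out
```

For example, `rewrite([("ret",)])` yields the three-instruction sequence `pop r11`, `lfence`, `jmp r11`, so no ret instruction remains in the rewritten code.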
The implementations discussed above prevent speculation
by inserting lfence into the pipeline. SPECCFI offers a
more intelligent and targeted way of using fences for securing
forward edges (as discussed in Section IV), as well as a
new method for making backward edges secure (as explained
in Section V). To study the effect of different serializing
instructions, we use two different types of lfence instructions
in our experiments:
• Strict lfences are highly restrictive and prevent any
instruction from passing through them until the fence
retires [63]. This type of fence imposes high overhead on the
system. All the x86 serialization instructions, including
the lfence we use in our experiments, are categorized as strict
fences.
• Relaxed lfences only stop certain types of instructions
until the fence gets retired [63], while letting the others
through. For example, LSQ-LFENCE [63] prevents any
subsequent load instruction from being issued specula-
tively out of the load/store queue but allows any other
instruction to pass it. LSQ-LFENCEs are secure against
Spectre because they prevent the speculative loads, and
have the advantage of letting speculation on other types
of instructions proceed, substantially reducing the perfor-
mance impact.
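The difference between the two fence types can be summarized by the following sketch of an issue rule for instructions younger than an in-flight fence. The function name and the string encodings are hypothetical, and the rule is a simplification of the behavior described above.

```python
def may_issue(instr_kind, fence_kind, fence_retired):
    """Decide whether an instruction younger than a fence may issue.

    instr_kind: e.g. "load", "store", "alu", "branch".
    fence_kind: "strict" or "lsq" (the relaxed LSQ-LFENCE).
    """
    if fence_retired:
        return True                  # nothing is held back once the fence retires
    if fence_kind == "strict":
        return False                 # a strict fence blocks every younger instruction
    if fence_kind == "lsq":
        return instr_kind != "load"  # only younger loads are held back
    raise ValueError(f"unknown fence kind: {fence_kind}")
```

Under this rule, an ALU instruction behind an unretired LSQ-LFENCE can still issue, whereas a load behind either fence type has to wait.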
Figure 6 shows the performance overhead of SPECCFI-full
(securing both forward and backward edges) in comparison
to All Target Fencing and Retpoline-style software fencing
approaches. We note that in general, inserting serializing
instructions (e.g., lfence) at the target of every indirect
branch is expensive, imposing a performance overhead of 39%
and 48% on average for All Target Fencing and Retpoline-style
respectively. Using SPECCFI, by inserting lfence only when
the CFI check fails, the number of inserted lfence drops
Fig. 7: Number of lfences inserted by different defenses (lfences per million instructions per SPEC2017 benchmark and on average, for SpecCFI and All Target Fencing)
Fig. 8: Overhead breakdown for forward and backward edge (normalized IPC per SPEC2017 benchmark and on average, for All Target Fencing-Forward, All Target Fencing-Backward, SpecCFI-Forward, and SpecCFI-Backward)
significantly thus reducing the performance overhead to less
than 1.9% on average.
To illustrate the reason behind the performance reduction
in the different approaches, we study the number of lfence
inserted in each approach in Figure 7. Note that benchmarks
such as mcf and omnetpp are C++ benchmarks which use a
large number of indirect branches due to the common use of
virtual function calls and function pointers. As a result, this
leads to a large number of lfence being inserted into the
pipeline, and to a substantial performance impact compared to
the baseline implementation. The only exception to this trend
is povray, which has the highest overhead in all the defenses
but does not have a huge number of lfence compared to the
other benchmarks. Looking more closely at this benchmark,
we found out that it is a memory intensive benchmark with
the highest number of load and store micro-ops among all the
benchmarks. Intel manuals [35] indicate that an lfence is
committed only when there is no preceding outstanding store.
Thus, for this benchmark, each lfence instruction remains
active for a longer period of time until it gets committed
which explains the high performance impact. It is also worth
mentioning that unlike All Target Fencing and Retpoline-style,
which insert an lfence for each indirect branch, the
reported number of lfence for SPECCFI is due to misprediction
of the BTB. This means that the higher the rate
of misprediction is, the more lfence are inserted.
Since SPECCFI does not protect the backward edges using
lfence, in Figure 8, we study the effect of securing the for-
ward and backward edges separately. Note that in Retpoline-
style, all return instructions are converted to a sequence of
instructions terminating with a jmp, meaning that there is no
remaining ret instruction (i.e. backward-edge) in the code
compiled in this setting. Therefore, the overhead measured
as the overhead of Retpoline-style-full is equivalent to only
Retpoline-style-forward overhead and the overhead on the
Fig. 9: Performance using relaxed fences (normalized IPC per SPEC2017 benchmark and on average, for All Target Fencing-Full, Retpoline-Style-Full, and SpecCFI-Full)
backward-edge is zero. The results from the breakdown show
that as expected, the overhead in general increases with the
number of indirect branches in All Target Fencing. As for
SPECCFI, the overhead caused by the forward-edge defense
is low since it is only due to CFI mismatches, which
indicate misprediction of the branches. Therefore, the major
part of the whole SPECCFI overhead is the overhead of
SPECCFI-full on the backward edge, which is associated with
maintaining the RSB/SCS hardware structure. It is important to
consider that this maintenance effort also includes procedures
to make sure the committed path is secured and therefore only
a portion of this overhead is associated with defense against
Spectre attacks.
As mentioned previously, since strict lfence imposes a
higher overhead on the system and relaxed lfence provides
the same security guarantee with lower overhead, we imple-
mented all discussed defenses with relaxed lfence as well
to study the differences in overhead. Figure 9 examines the
effect of relaxed lfence. The results show that the overhead
caused by strict lfence is much higher than that of relaxed
lfence. Also as expected, using strict instead of relaxed
causes far more performance degradation when the benchmark
is memory intensive (i.e., has a lot of stores in this case). Our
results show that just by changing the type of the lfence
from strict to relaxed, the average overhead drops down from
48.9% to 22.6% for Retpoline-style and from 39.9% to 18.82%
for All Target Fencing. However, they are still substantially
higher than SPECCFI.
B. Hardware Implementation Overhead
To estimate the hardware overheads of SPECCFI, we im-
plemented the basic structures and integrated them within an
open core to estimate the area and timing overhead. Specif-
ically, the implementation consists of adding two CFI_REG
registers in two locations of the pipeline: (1) decode stage, to
support detecting CFI violations for speculative instructions
and (2) commit stage, to support detecting CFI violations for
committed instructions. Since CFI_REG is used to store the
CFI labels its size should be the same as the maximum CFI
label size (32-bits for our design). Furthermore, we need to add
two comparators; one in decode and one in commit stage of
the pipeline. These comparators will be used by the cfi_lbl
instruction to compare its label to the CFI_REG (to detect
violations).
TABLE IV: SPECCFI hardware implementation overhead after adding it to the AO486 open-core
          Static power  Dynamic power  Area  Cycle time
SPECCFI   0.4%          0.4%           0.1%  0.0%
Additionally, SPECCFI needs an LCP register to point to the
last entry of the RSB/SCS from a committed call, used to
distinguish between entries from speculative and committed
instructions. Since RSB/SCS has 16 in-processor cache entries,
the LCP size is 4 bits. Moreover, new entries can be added to
the RSB/SCS at two stages of the pipeline: (1) while executing
a call instruction and (2) while loading the preserved RSB/SCS
entries from memory in case of underflow. Therefore, we had to
increase the number of write ports from 1 to 2. The same
applies to the number of read ports, as we may use
RSB/SCS to fetch the next instruction while spilling over to
memory in case of RSB/SCS overflow. In addition, to preserve
the correct behaviour of RSB/SCS, we provided two LCP
update mechanisms: (1) -/+1: for regular push/pop operations
and (2) -/+4: for handling overflow and underflow of the
structure. The cost of the RSB/SCS itself did not lead to a
noticeable increase in complexity or area.
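The LCP sizing and the two update mechanisms can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: the function names and event names are hypothetical, and the sign attached to each event (a spill on overflow lowers the LCP by 4, a fill on underflow raises it by 4) is our reading of the description rather than a detail given in the text.

```python
def pointer_bits(entries):
    """Number of bits needed to index `entries` slots."""
    return max(1, (entries - 1).bit_length())


# Assumed mapping from events to LCP steps.
LCP_STEP = {
    "commit_call": +1,  # regular push becomes committed
    "commit_ret": -1,   # regular pop becomes committed
    "fill": +4,         # underflow: entries reloaded from the in-memory SCS
    "spill": -4,        # overflow: oldest entries moved to the in-memory SCS
}


def update_lcp(lcp, event):
    return lcp + LCP_STEP[event]
```

With 16 in-processor entries, `pointer_bits(16)` evaluates to 4, matching the 4-bit LCP.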
To measure the impact of SPECCFI implementation on
power, area, and cycle time, we modified the open source pro-
cessor (AO486) [8] to include SPECCFI design using Verilog.
To synthesize the processor with SPECCFI integrated
for a DE2-115 FPGA board [1], we used the Quartus
2 17.1 software. The results shown in Table IV indicate that
SPECCFI indeed has very low implementation complexity. In
terms of power, there is a 0.4% increase in core dynamic
and static power. Although it is difficult to measure power
accurately, we applied the power analysis tool provided by
Quartus to measure power after synthesis to get more accurate
results. In terms of area, there is a 0.1% increase in total
logic elements. Moreover, since SPECCFI design is simple,
it fits within the optimized frequency of the core. Thus, it
has no effect on cycle time. The AO486 processor is an
implementation of the 80486 ISA using a 32-bit in-order
pipeline. Thus, these results are relative to the small pipelined
core; the overheads will be much smaller if compared to a
modern out-of-order superscalar core.
C. Empirical Security Evaluation
1) Against real exploits: To verify our analysis, we evaluated
the effectiveness of SPECCFI against real-world exploits.
We ran previously disclosed Spectre-BTB [41], Spectre-RSB [43],
and SMoTherSpectre [13] PoCs inside the emulator.
Table V summarizes the results, using the same classification
scheme proposed in [15]. The experiment results show that
SPECCFI was able to prevent all information leaks.
2) Impact of CFG precision: To study the difference between
coarse-grained CFI (e.g., Intel CET [36]) and fine-grained
CFI (e.g., SPECCFI) against BTB injection attacks,
we used SMoTherSpectre [13] for a demonstration. In this
attack, the attacker has to find a BTI gadget in the victim process
which loads a secret in a register and terminates with an indirect
TABLE V: Empirical security evaluation of SPECCFI.
                                       in-place  out-of-place
Spectre-BTB     Cross-address-space    ✓         ✓
                Same-address-space     ✓         ✓
Spectre-RSB     Cross-address-space    ✓         ✓
                Same-address-space     ✓         ✓
SMoTherSpectre  Cross-address-space    ✓         ✓
                Same-address-space     ✓         ✓
TABLE VI: Available SMoTher Gadgets in Standard Libraries
Standard Libraries  CFI Implementation
                    Coarse-grained  Fine-grained
glibc-2.29          314             1
libssl-1.1          21              1
libcrypto-1.1       98              4
ld-2.29             64              0
libstdc++           47              0
branch to be able to perform BTB injection. By poisoning the
BTB, the attacker transfers control to a SMoTher gadget to
leak the secret. The SMoTher gadget starts with a comparison
based on the target register followed by a conditional jump,
which enables SMoTherSpectre to leak the secret through a
port contention side channel. Figure 10 compares the required
SMoTher gadgets and feasibility of the attack under coarse-
grained and fine-grained CFI.
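The difference between the two enforcement policies in Figure 10 can be expressed as a small check on the injected target. The sketch below uses the victim addresses and labels from Figure 10(b); the function name and the dictionary encoding are hypothetical.

```python
# Victim landing pads from Figure 10(b): address -> CFI label.
LANDING_PADS = {0x10: "L1", 0x20: "L2"}  # baz (SMoTher free), bar (SMoTher gadget)


def speculation_allowed(target, policy, call_label=None):
    """Return True if speculation to `target` passes the CFI check."""
    if target not in LANDING_PADS:
        return False                 # no label instruction at the target
    if policy == "coarse":
        return True                  # any landing pad is accepted (endbr64 style)
    if policy == "fine":
        return LANDING_PADS[target] == call_label  # labels must match
    raise ValueError(f"unknown policy: {policy}")
```

The victim's indirect call carries label L1, so an injected target of 0x20 (bar) passes the coarse-grained check but fails the fine-grained one, while the legitimate target 0x10 (baz) passes both.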
Table VI shows the number of available SMoTher gadgets
from several standard libraries. Using the constraints for the
SMoTher gadget in [13], we scanned for valid SMoTher
gadgets in the first 70 instructions after label instructions
(endbr64 and cfi_lbl). For SPECCFI, we used a
function-signature-based approach for generating labels [49], [50].
As we can see, although fine-grained CFI still permits some
gadgets, the number is much smaller.
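A much-simplified version of such a scan is sketched below. It only looks for a cmp on the secret-holding register that is immediately followed by a conditional jump, which is a stand-in for the full constraints of [13]; the instruction encoding, the function name, and the list of jump mnemonics are assumptions.

```python
LABEL_INSTRS = {"endbr64", "cfi_lbl"}
COND_JUMPS = {"je", "jne", "jz", "jnz", "jl", "jg", "jle", "jge", "ja", "jb"}


def count_gadgets(instrs, secret_reg, window=70):
    """Count label instructions followed, within `window` instructions,
    by a cmp on `secret_reg` and then a conditional jump.

    instrs: list of (mnemonic, operands) tuples in program order.
    """
    count = 0
    for i, (mnemonic, _) in enumerate(instrs):
        if mnemonic not in LABEL_INSTRS:
            continue
        body = instrs[i + 1 : i + 1 + window]
        for (m1, ops1), (m2, _) in zip(body, body[1:]):
            if m1 == "cmp" and secret_reg in ops1 and m2 in COND_JUMPS:
                count += 1
                break  # at most one gadget per label instruction
    return count


# The bar and baz functions of Figure 10(a), in the toy encoding.
bar = [("endbr64", ()), ("cmp", ("$0", "rdx")), ("je", ("<>",))]
baz = [("endbr64", ()), ("nop", ())]
```

On these two functions the scan reports one gadget (in bar) when `secret_reg` is `"rdx"`.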
VIII. RELATED WORK
Since the initial announcement of Spectre and Meltdown in
January of 2018, several Spectre variants have appeared [26],
[40], [41], [43], [45]. Spectre attacks are characterized by
manipulating the prediction mechanisms to trigger speculation
to an attacker chosen gadget. They differ in what they exploit
to trigger speculation: branch direction predictor (variant 1,
variant 1.1) [26], [40], [41], branch target predictor (or branch
target buffer) for variant 2 [41], return stack buffer for Spectre-
RSB (also called variant 5) [43], [45], or load-store aliasing
predictor for variant 4 [29]. To mitigate these attacks, several
software and hardware defenses ranging from programming
guidelines for cryptographic software developers [18] to ar-
chitectural changes [38], [73] have been proposed. In this
section, we will overview these defenses categorized into the
Spectre attack variants that they defend against. Table VII
shows the Spectre attacks defenses and which attacks they
mitigate and Table VIII shows the Spectre attacks defenses and
their impact on hardware complexity, software modifications,
and performance. SPECCFI is the only defense that provides
complete protection against all Spectre attacks with little
impact on performance and implementation overhead. Note
that we are not considering Meltdown style attacks [44], [48],
Attacker:
Train_BTB:
0x1: mov rax, 0x20
0x2: call *rax
foo:
0x10: endbr64
0x11: nop
Victim:
main: // BTI gadget
0x0: mov rdx, [secret]
0x1: mov rax, 0x10
0x2: call *rax // baz()
baz: // Smother free
0x10: endbr64
...
0x14: nop
bar: // Smother Gadget
0x20: endbr64
0x24: cmp $0, rdx
0x25: je <>
(a) Coarse-grained enforcement of CFI (e.g., CET)
Attacker:
Train_BTB:
0x1: mov rax, 0x20
0x2: call *rax, L1
foo:
0x20: cfi_lbl, L1
0x21: nop
Victim:
main: // BTI gadget
0x0: mov rdx, [secret]
0x1: mov rax, 0x10
0x2: call *rax, L1 // baz()
baz: // Smother free
0x10: cfi_lbl, L1
...
0x14: nop
bar: // Smother Gadget
0x20: cfi_lbl, L2
0x21: cmp $0, rdx
0x22: je <>
(b) Fine-grained enforcement of CFI (e.g., SPECCFI)
Fig. 10: Speculative control-flow bending attack example.
TABLE VII: Spectre defenses and the attacks they mitigate. Symbols show if an attack is mitigated (●), not mitigated (○), or
partially mitigated (◐).
Defense                          Spectre-PHT  Spectre-BTB  Spectre-RSB  SMoTherSpectre
Side-channel prevention
DAWG [39]                        ◐            ◐            ◐            ○
SafeSpec [38], InvisiSpec [73]   ◐            ◐            ◐            ○
Speculation prevention
LFENCE [6], [9], [32]            ●            ○            ○            ○
IBRS, IBPB, STIBP [6], [34]      ○            ●            ○            ○
SLH [19], YSNB [51]              ●            ○            ○            ○
Retpoline [65]                   ○            ●            ○            ◐
RSB Stuffing [33]                ○            ○            ●            ◐
CSF [63]                         ●            ◐            ◐            ◐
ConTExT [59]                     ◐            ◐            ◐            ◐
SPECCFI                          ●            ●            ●            ●
TABLE VIII: Spectre defenses and their overhead in terms of hardware complexity, software modification, and performance.
Symbols show if overhead is high (↑), low (↓), or no overhead (–). The performance overhead results are based on what was
reported in the studies, which are based on real world usage or a specific benchmark (may not represent real world usage).
Defense                          Hardware  Software modification  Performance
Side-channel prevention
DAWG [39]                        ↑         –                      1 - 5 %
SafeSpec [38], InvisiSpec [73]   ↑         –                      SafeSpec: -3%, InvisiSpec: 22%
Speculation prevention
LFENCE [6], [9], [32]            –         ↑                      62 - 74.8 %
IBRS, IBPB, STIBP [6], [34]      –         –                      20 - 50 %
SLH [19], YSNB [51]              –         ↑                      SLH: 29 - 36.4 %, YSNB: 60 %
Retpoline [65]                   –         ↓                      5 - 10 %
RSB Stuffing [33]                –         ↓                      ↓
CSF [63]                         ↓         ↓                      2.7 - 15.2 %
ConTExT [59]                     ↓         ↓                      1 - 71.14 %
SPECCFI                          ↓         ↓                      1.9 %
[57], [58], [66], [69] since they rely on speculation within a
single instruction and therefore do not rely on manipulating
the branch prediction structures.
A. Spectre-PHT Defenses
Spectre-PHT exploits the PHT of the branch predictor
to perform the attack. To defend against this attack, Intel,
AMD, and ARM proposed using instructions that serialize
execution (e.g., lfence) to stop speculation around
branches (e.g., on both directions of the branch) [6], [9], [32].
Although liberal serialization (e.g., at every branch
instruction) can mitigate Spectre-PHT attacks, doing so severely
hurts performance [32]: serializing all branch instructions
eliminates the performance benefit of the branch predictor
(e.g., up to 10x slowdown [51]). To address this drawback,
multiple proposals tried to reduce the number of serializing
instructions by using static analysis to serialize execution
only around exploitable gadgets [31], [32], [47], [70].
However, these approaches miss some of the gadgets that
can be exploited [42]. Another weakness of these defenses
is that, even though they stop speculative execution around
exploitable gadgets, they do not stop speculative code fetches
and other micro-architectural behaviors that occur before
execution (e.g., instruction cache and iTLB fills), which can
still leak data [60].
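As an illustration of fence-based serialization, the following minimal C sketch places a speculation barrier on the in-bounds path of a bounds check. The helper names speculation_barrier and read_fenced are hypothetical, and the sketch assumes a GCC- or Clang-compatible compiler; on non-x86 targets it falls back to a compiler-only barrier that does not stop hardware speculation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: issue lfence on x86. On other targets this is only a
 * compiler barrier, kept so the sketch still compiles; it does NOT serialize
 * the hardware there. Assumes GCC/Clang inline-assembly syntax. */
static inline void speculation_barrier(void) {
#if defined(__x86_64__) || defined(__i386__)
    __asm__ __volatile__("lfence" ::: "memory");
#else
    __asm__ __volatile__("" ::: "memory");
#endif
}

/* Bounds-checked read. The barrier sits between the branch and the load, so
 * the load is not issued until the bounds check has resolved. */
static uint8_t read_fenced(const uint8_t *array, size_t len, size_t idx) {
    if (idx < len) {
        speculation_barrier();
        return array[idx];
    }
    return 0; /* out-of-bounds requests return a fixed value */
}
```

Placing such a barrier after every branch is what the text calls liberal serialization; the selective approaches insert it only where analysis flags a gadget.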
Speculative Load Hardening (SLH) [19] and You Shall
Not Bypass (YSNB) [51] tried to reduce the high overhead
of liberal fencing. They proposed identifying Spectre gadgets
and then injecting artificial dependencies between
branches and the identified gadgets, which reduces the
speculation window of the attack. Although this results
in a performance advantage over liberal fencing, these
techniques still incur 36%-60% performance overhead [63].
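The idea of an artificial dependency can be sketched in C as follows. This is only a source-level approximation under stated assumptions: SLH itself is applied by the compiler backend, and the names in_bounds_mask and read_masked are hypothetical. The empty inline-assembly statement (GCC/Clang syntax) is an assumption used to keep the optimizer from folding the mask away.

```c
#include <stddef.h>
#include <stdint.h>

/* All-ones when idx < len, zero otherwise, derived from the same condition
 * the branch tests. */
static inline size_t in_bounds_mask(size_t idx, size_t len) {
    return (size_t)0 - (size_t)(idx < len);
}

/* Bounds-checked read whose load address depends on the mask. If the branch
 * is mispredicted, the mask evaluates to zero and the transient load is
 * redirected to element 0 instead of an attacker-chosen address. */
static uint8_t read_masked(const uint8_t *array, size_t len, size_t idx) {
    size_t mask = in_bounds_mask(idx, len);
    /* Make the mask opaque to the optimizer (GCC/Clang extension). */
    __asm__ __volatile__("" : "+r"(mask));
    if (idx < len) {
        return array[idx & mask];
    }
    return 0;
}
```

Unlike a fence, the load may still issue speculatively, but its address is tied to the branch condition through a data dependency.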
An attractive hardware solution is Context-Sensitive Fencing
(CSF) [63]. CSF is a microcode mitigation technique in which
serialization instructions are added dynamically based on
run-time conditions that identify potential exploit execution.
Injecting serialization instructions dynamically reduces the
impact of stopping speculation on performance, which results
in low performance overhead. Moreover, CSF proposed to
defend against Spectre-BTB and Spectre-RSB using a special
fence that flushes the BTB/RSB when transferring control
to higher domains. However, flushing the BTB and RSB
hurts performance since it results in more mispredictions.
In addition, on a simultaneous multithreading (SMT) processor,
flushing the BTB/RSB after a control transfer is not enough to
protect against Spectre-BTB and Spectre-RSB, since these
structures can be polluted after the control transfer by other threads.
B. Spectre-BTB and Spectre-RSB Defenses
Spectre-BTB exploits the BTB and Spectre-RSB exploits
the RSB to perform the attack. Google proposed Return
Trampoline (retpoline) [65], a software mitigation technique
that defends against Spectre-BTB by replacing indirect
branches with a push+return instruction sequence that prevents
BTB poisoning. However, this solution has high performance
overhead since it stops speculation (similar to serialization).
In addition, it can be bypassed using ret instructions, since
they can cause misspeculation through the BTB; this is a
by-product of a feature on Intel processors starting from
Skylake that allows the processor to predict the target of a ret
instruction from the BTB when the RSB underflows. To close
this hole, RSB stuffing [33] was proposed to intentionally
fill the RSB with benign delay gadgets to avoid misspeculation
on context switches. In addition to partially mitigating
Spectre-BTB (when ret is used to trigger speculation
through the BTB), this technique can also defend against
cross-domain Spectre-RSB attacks. However, since the RSB is
refilled on every context switch, the stored entries of the
currently running process are lost by the time execution
switches back to it (i.e., performance is lost because
prediction information is lost). To address this drawback,
SPECCFI saves the committed RSB entries of each process when
it is switched out and restores them when execution returns to
the process, which improves the prediction performance
of ret instructions.
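The difference between stuffing the RSB and saving and restoring it can be illustrated with a small software model. This is a toy sketch, not the hardware design: the 16-entry depth, the circular-buffer layout, and all function names are assumptions made for illustration.

```c
#include <stdint.h>

#define RSB_ENTRIES 16 /* assumed depth, for illustration only */

typedef struct {
    uint64_t entry[RSB_ENTRIES];
    unsigned top;   /* next free slot, modulo RSB_ENTRIES */
    unsigned count; /* number of valid entries, saturating */
} rsb_t;

/* A call pushes its return address. */
static void rsb_push(rsb_t *r, uint64_t ret_addr) {
    r->entry[r->top] = ret_addr;
    r->top = (r->top + 1) % RSB_ENTRIES;
    if (r->count < RSB_ENTRIES) r->count++;
}

/* A return pops a prediction; returns 0 on underflow. */
static int rsb_pop(rsb_t *r, uint64_t *predicted) {
    if (r->count == 0) return 0;
    r->top = (r->top + RSB_ENTRIES - 1) % RSB_ENTRIES;
    *predicted = r->entry[r->top];
    r->count--;
    return 1;
}

/* RSB stuffing: overwrite every entry with a benign target. */
static void rsb_stuff(rsb_t *r, uint64_t benign) {
    for (unsigned i = 0; i < RSB_ENTRIES; i++) rsb_push(r, benign);
}

/* Per-process save and restore around a context switch. */
static void rsb_save(const rsb_t *hw, rsb_t *saved) { *saved = *hw; }
static void rsb_restore(rsb_t *hw, const rsb_t *saved) { *hw = *saved; }
```

In this model, a process that resumes after rsb_stuff sees only the benign target, whereas after rsb_restore its returns again pop the addresses it pushed before being switched out.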
Intel and AMD extended their instruction set architecture
(ISA) with new controls over indirect branch prediction to
defend against Spectre-BTB [6], [34]. The addition consists
of three controls:
• Indirect Branch Restricted Speculation (IBRS): lets the
processor enter an IBRS mode in which indirect branches
executed in a more privileged mode are not influenced by
branch predictor state created in a less privileged mode.
• Single Thread Indirect Branch Predictors (STIBP): prevents
a hyperthread running on a core from using branch
predictor entries inserted by the other thread running on
the same core.
• Indirect Branch Predictor Barrier (IBPB): allows the
processor to flush the BTB and clear its state, so that
code executed before the barrier cannot affect the branch
prediction of code executed after it.
These new ISA controls defend only against Spectre-BTB.
In addition, they have a high performance overhead: up
to 24% on Skylake and up to 53% on Haswell [23].
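On x86, these controls are exposed as bits in two model-specific registers, IA32_SPEC_CTRL and IA32_PRED_CMD. The sketch below only computes the values a kernel might write; it performs no privileged register access, and the helper names and the policy they encode are assumptions for illustration.

```c
#include <stdint.h>

#define MSR_IA32_SPEC_CTRL 0x48u       /* speculation control register */
#define MSR_IA32_PRED_CMD  0x49u       /* prediction command register  */
#define SPEC_CTRL_IBRS     (1ull << 0) /* restrict indirect branch speculation */
#define SPEC_CTRL_STIBP    (1ull << 1) /* isolate sibling-thread predictions   */
#define PRED_CMD_IBPB      (1ull << 0) /* indirect branch predictor barrier    */

/* Hypothetical policy: value to write to IA32_SPEC_CTRL when entering a more
 * privileged mode. STIBP is added only if the sibling thread is untrusted. */
static uint64_t spec_ctrl_on_entry(uint64_t current, int protect_sibling) {
    uint64_t value = current | SPEC_CTRL_IBRS;
    if (protect_sibling) value |= SPEC_CTRL_STIBP;
    return value;
}

/* Value to write to IA32_PRED_CMD to issue a barrier, for example when
 * switching between mutually distrusting processes. */
static uint64_t pred_cmd_barrier(void) {
    return PRED_CMD_IBPB;
}
```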
C. Spectre All Variants Defenses
Several mitigations were proposed to defend against all
variants of Spectre. Dynamically Allocated Way Guard
(DAWG) [39] provides isolation between protection domains
by partitioning the cache at cache-way granularity. Although
this method can prevent data leakage through a cache side
channel, it requires software to manage and enforce the
domains, it defends only the cache as a leakage source, and
it cannot protect against attacks that are performed within
the same address space or isolation domain. In addition,
since it is a cache-specific defense, other micro-architectural
structures can still be used for communication
(e.g., the branch predictor).
SafeSpec [38] and InvisiSpec [73] are hardware mitigation
techniques that, like DAWG, try to prevent side-channel
communication from speculative instructions. They mitigate
the side effects of speculative execution on the
micro-architectural state by adding shadow micro-architectural
structures for caches and Translation Lookaside Buffers (TLBs)
that hold the transient effects of speculative instructions.
These effects are committed to the caches and TLBs only when
the speculation is deemed correct; otherwise, the changes are
flushed from the shadow structures. Although these solutions
outperform software solutions, they require disruptive changes
to the processor/memory architecture and consistency models.
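The commit-or-flush behavior of a shadow structure can be sketched with a toy model of a direct-mapped cache and a small shadow buffer. The sizes, the direct-mapped organization, and the function names are assumptions for illustration and do not reflect the actual SafeSpec or InvisiSpec designs.

```c
#include <stdint.h>

#define CACHE_LINES  8 /* assumed sizes, for illustration only */
#define SHADOW_SLOTS 4

typedef struct {
    uint64_t cache_tag[CACHE_LINES];
    int      cache_valid[CACHE_LINES];
    uint64_t shadow[SHADOW_SLOTS]; /* lines fetched speculatively */
    int      shadow_used;
} mem_state_t;

/* Is the line visible in the committed cache state? */
static int cache_contains(const mem_state_t *m, uint64_t line) {
    unsigned idx = (unsigned)(line % CACHE_LINES);
    return m->cache_valid[idx] && m->cache_tag[idx] == line;
}

/* A speculative load fills only the shadow buffer. Returns 0 if it is full. */
static int speculative_fill(mem_state_t *m, uint64_t line) {
    if (m->shadow_used == SHADOW_SLOTS) return 0;
    m->shadow[m->shadow_used++] = line;
    return 1;
}

/* Speculation was correct: move shadow contents into the cache. */
static void commit_speculation(mem_state_t *m) {
    for (int i = 0; i < m->shadow_used; i++) {
        unsigned idx = (unsigned)(m->shadow[i] % CACHE_LINES);
        m->cache_tag[idx] = m->shadow[i];
        m->cache_valid[idx] = 1;
    }
    m->shadow_used = 0;
}

/* Speculation was wrong: discard the shadow contents, leaving no trace. */
static void squash_speculation(mem_state_t *m) {
    m->shadow_used = 0;
}
```

A squashed speculative fill never becomes visible through cache_contains, which is the property that removes the cache as a covert channel in this model.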
ConTExT [59] protects secret data from speculative
execution. It introduces a new memory mapping (called a
non-transient mapping) that marks data that must not be
accessed by speculative instructions. Nevertheless, this
solution requires changes to the architecture and the
operating system, requires developers to annotate the
secret data, and incurs high performance overhead for
security-critical applications.
IX. CONCLUDING REMARKS
In this paper, we presented a new design that, for the
first time, protects against misspeculation targeting the branch
target buffer (BTB) and the return stack buffer (RSB). These
attacks are arguably the most dangerous speculation attacks
because they can bypass compiler-inserted fences. Prior
defenses either excluded these attacks from their threat model
or restricted speculation so aggressively that performance
degraded dramatically. In contrast, SPECCFI provides complete
protection against these dangerous attacks, with little impact
on performance and with minimal hardware complexity.
SPECCFI introduces the idea of using CFI, explored
previously as a protection against control-flow hijacking attacks
for committed instructions (i.e., even on non-speculative
processors), as a defense against speculation attacks. In particular,
SPECCFI verifies the forward edge of CFI on the instructions
in the speculative path, allowing speculation only if the CFI
labels match, and verifies the backward edge using a unified
shadow call stack. Essentially, SPECCFI moves the CFI check
to the decode stage of the pipeline, preventing speculative
execution of instructions unless they conform to the CFI
annotations. For normal programs, this results in little
performance degradation since it only prevents speculation to
targets with mismatching CFI labels, which would have resulted
in misspeculation anyway. By stopping such misspeculation, we
also avoid the cache pollution and other wasted resources that
it causes.
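The forward-edge and backward-edge checks summarized above can be expressed as a small software model. This is a sketch only: the actual checks are performed in hardware at decode, and the label table, the NO_LABEL sentinel, and the function names here are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define NO_LABEL 0u /* assumed sentinel: address carries no CFI label */

typedef struct {
    uint64_t addr;      /* address of a legal indirect-branch target */
    uint32_t cfi_label; /* label attached to that target */
} target_info_t;

/* Look up the label at an address; NO_LABEL if it is not a legal target. */
static uint32_t label_at(const target_info_t *tbl, size_t n, uint64_t addr) {
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].addr == addr) return tbl[i].cfi_label;
    }
    return NO_LABEL;
}

/* Forward edge: speculate to the predicted target only if it carries the
 * label that the indirect branch expects. */
static int allow_forward_speculation(const target_info_t *tbl, size_t n,
                                     uint64_t predicted_target,
                                     uint32_t expected_label) {
    uint32_t found = label_at(tbl, n, predicted_target);
    return found != NO_LABEL && found == expected_label;
}

/* Backward edge: speculate past a return only if the predicted address
 * matches the top of the shadow call stack. */
static int allow_return_speculation(uint64_t shadow_stack_top,
                                    uint64_t predicted_return) {
    return shadow_stack_top == predicted_return;
}
```

In this model, a poisoned prediction that points at an unlabeled gadget, or at a target with a different label, is rejected before any of its instructions execute speculatively.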
Combined with recent proposals to mitigate Spectre-PHT,
we believe SPECCFI completely mitigates the threat from
speculation attacks. Moreover, it does so without sacrificing
the performance benefits of speculative execution and with
minimal modifications to the processor pipeline.
REFERENCES
[1] Altera de2-115 development and education board. https://www.altera.
com/solutions/partners/partner-profile/terasic-inc-/board/altera-de2-
115-development-and-education-board.html#overview, 2010.
[2] Spec cpu2017 documentation. https://www.spec.org/cpu2017/Docs,
2017.
[3] Test suite extensions. https://llvm.org/docs/Proposals/TestSuite.html,
2019.
[4] Martín Abadi, Mihai Budiu, Úlfar Erlingsson, and Jay Ligatti.
Control-flow integrity. In ACM Conference on Computer and Communications
Security (CCS), 2005.
[5] ADVANCED MICRO DEVICES, INC. Amd64 technology:
Speculative store bypass disable. https://developer.amd.com/wp-
content/resources/124441 AMD64 SpeculativeStoreBypassDisable
Whitepaper final.pdf, 2018.
[6] ADVANCED MICRO DEVICES, INC. Software tech-
niques for managing speculation on amd processors.
https://developer.amd.com/wp-content/resources/90343-B
SoftwareTechniquesforManagingSpeculation WP 7-18Update FNL.
pdf, 2018.
[7] Alejandro Cabrera Aldaya, Billy Bob Brumley, Sohaib ul Hassan,
Cesar Pereida García, and Nicola Tuveri. Port contention for fun and
profit. Technical report, 2018. Available from
https://eprint.iacr.org/2018/1060.pdf.
[8] Osman Aleksander. The ao486 project. https://github.com/alfikpl/ao486,
2014.
[9] ARM. Cache speculative side-channels. https://bugs.chromium.org/p/
project-zero/issues/detail?id=1528, 2018.
[10] ARM. Vulnerability of speculative processors to cache timing side-
channel mechanism. https://developer.arm.com/support/security-update,
2018.
[11] ARM Limited. Arm® a64 instruction set architecture (00bet9), 2018.
[12] Fabrice Bellard. Qemu, a fast and portable dynamic translator. In
USENIX Annual Technical Conference, volume 41, page 46, 2005.
[13] Atri Bhattacharyya, Alexandra Sandulescu, Matthias Neugschwandtner,
Alessandro Sorniotti, Babak Falsafi, Mathias Payer, and Anil Kur-
mus. Smotherspectre: exploiting speculative execution through port
contention. arXiv preprint arXiv:1903.01843, 2019.
[14] Nathan Burow, Scott A Carr, Stefan Brunthaler, Mathias Payer, Joseph
Nash, Per Larsen, and Michael Franz. Control-flow integrity: Precision,
security, and performance. arXiv preprint arXiv:1602.04056, 2016.
[15] Claudio Canella, Jo Van Bulck, Michael Schwarz, Moritz Lipp, Ben-
jamin von Berg, Philipp Ortner, Frank Piessens, Dmitry Evtyushkin, and
Daniel Gruss. A systematic evaluation of transient execution attacks and
defenses. In USENIX Security Symposium, 2019.
[16] Nicholas Carlini, Antonio Barresi, Mathias Payer, David Wagner, and
Thomas R Gross. Control-flow bending: On the effectiveness of control-
flow integrity. In USENIX Security, 2015.
[17] Nicholas Carlini and David Wagner. ROP is still dangerous: Breaking
modern defenses. In USENIX Security, 2014.
[18] Chandler Carruth. Mitigating speculative attacks in crypto.
https://github.com/HACS-workshop/spectre-mitigations/blob/master/
crypto guidelines.md, 2018.
[19] Chandler Carruth. Rfc: Speculative load hardening (a spectre vari-
ant 1 mitigation). https://lists.llvm.org/pipermail/llvm-dev/2018-March/
122085.html, 2018.
[20] Nick Christoulakis, George Christou, Elias Athanasopoulos, and Sotiris
Ioannidis. Hcfi: Hardware-enforced control-flow integrity. In ACM
Conference on Data and Application Security and Privacy (CODASPY),
2016.
[21] Lucas Davi, Matthias Hanreich, Debayan Paul, Ahmad-Reza Sadeghi,
Patrick Koeberl, Dean Sullivan, Orlando Arias, and Yier Jin. Hafix:
Hardware-assisted flow integrity extension. In Design Automation
Conference (DAC), 2015.
[22] Lucas Davi, Patrick Koeberl, and Ahmad-Reza Sadeghi. Hardware-
assisted fine-grained control-flow integrity: Towards efficient protection
of embedded systems against software exploitation. In Design Automa-
tion Conference (DAC), 2014.
[23] Matthew Dillon. Clarifying the spectre mitigations. http://lists.
dragonflybsd.org/pipermail/users/2018-January/335637.html, 2018.
[24] Ren Ding, Chenxiong Qian, Chengyu Song, Bill Harris, Taesoo Kim,
and Wenke Lee. Efficient protection of path-sensitive control security.
In USENIX Security, 2017.
[25] Isaac Evans, Fan Long, Ulziibayar Otgonbaatar, Howard Shrobe, Martin
Rinard, Hamed Okhravi, and Stelios Sidiroglou-Douskos. Control
jujutsu: On the weaknesses of fine-grained control flow integrity. In
ACM Conference on Computer and Communications Security (CCS),
2015.
[26] D. Evtyushkin, R. Riley, N. Abu-Ghazaleh, and D. Ponomarev. Branch-
scope: A new side-channel attack on directional branch predictor. In
ACM International Conference on Architectural Support for Program-
ming Languages and Operating Systems (ASPLOS), 2018.
[27] Dmitry Evtyushkin, Dmitry Ponomarev, and Nael Abu-Ghazaleh. Jump
over aslr: Attacking branch predictors to bypass aslr. In Proc. IEEE/ACM
International Symposium on Microarchitecture (Micro), 2016.
[28] Enes Göktas, Elias Athanasopoulos, Herbert Bos, and Georgios
Portokalidis. Out of control: Overcoming control-flow integrity. In IEEE
Symposium on Security and Privacy (Oakland), 2014.
[29] J. Horn. speculative execution, variant 4: speculative store by-pass. https:
//bugs.chromium.org/p/project-zero/issues/detail?id=1528, 2018.
[30] Hong Hu, Chenxiong Qian, Carter Yagemann, Simon Pak Ho Chung,
William R. Harris, Taesoo Kim, and Wenke Lee. Enforcing unique
code target property for control-flow integrity. In ACM Conference on
Computer and Communications Security (CCS), 2018.
[31] Open Source Security Inc. Respectre: The state of the art in spectre
defenses. https://www.grsecurity.net/respectre announce.php, 2018.
[32] Intel. Intel analysis of speculative execution side channels.
https://newsroom.intel.com/wp-content/uploads/sites/11/2018/01/Intel-
Analysis-of-Speculative-Execution-Side-Channels.pdf, 2018.
[33] Intel. Retpoline: A branch target injection mitigation.
https://software.intel.com/security-software-guidance/api-app/sites/
default/files/Retpoline-A-Branch-Target-Injection-Mitigation.pdf, 2018.
[34] Intel. Speculative execution side channel mitigations. https://software.
intel.com/security-software-guidance/api-app/sites/default/files/336996-
Speculative-Execution-Side-Channel-Mitigations.pdf, 2018.
[35] Intel Corporation. Intel 64 and ia-32 architectures optimization
reference manual. https://www.intel.com/content/dam/www/public/us/
en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf,
2016.
[36] Intel Corporation. Control-flow enforcement technology preview.
https://software.intel.com/sites/default/files/managed/4d/2a/control-
flow-enforcement-technology-preview.pdf, 2017.
[37] Mehmet Kayaalp, Meltem Ozsoy, Nael Abu-Ghazaleh, and Dmitry
Ponomarev. Branch regulation: Low-overhead protection from code
reuse attacks. In International Symposium on Computer Architecture
(ISCA), 2012.
[38] Khaled N Khasawneh, Esmaeil Mohammadian Koruyeh, Chengyu Song,
Dmitry Evtyushkin, Dmitry Ponomarev, and Nael Abu-Ghazaleh. Safe-
spec: Banishing the spectre of a meltdown with leakage-free speculation.
In Design Automation Conference (DAC), 2019.
[39] V. Kiriansky, I. Lebedev, S. Amarasinghe, S. Devadas, and J. Emer.
Dawg: A defense against cache timing attacks in speculative execution
processors. 2018.
[40] V. Kiriansky and C. Waldspurger. Speculative buffer overflows: Attacks
and defenses. arXiv preprint arXiv:1807.03757, 2018.
[41] P. Kocher, J. Horn, A. Fogh, D. Genkin, D. Gruss, W. Haas, M. Ham-
burg, M. Lipp, S. Mangard, T. Prescher, M. Schwarz, and Y. Yarom.
Spectre attacks: Exploiting speculative execution. In IEEE Symposium
on Security and Privacy (Oakland), 2019.
[42] Paul Kocher. Spectre mitigations in microsoft’s c/c++ compiler.
MicrosoftCompilerSpectreMitigation.html, 2018.
[43] E. Koruyeh, K. Khasawneh, C. Song, and N. Abu-Ghazaleh. Spectre
returns! speculation attacks using the return stack buffer. In USENIX
Workshop on Offensive Technologies (WOOT), 2018.
[44] M. Lipp, M. Schwarz, D. Gruss, T. Prescher, W. Haas, A. Fogh,
J. Horn, S. Mangard, P. Kocher, D. Genkin, Y. Yarom, and M. Hamburg.
Meltdown: Reading kernel memory from user space. In USENIX
Security Symposium (Security), 2018.
[45] G. Maisuradze and C. Rossow. ret2spec: Speculative execution using
return stack buffers. In ACM Conference on Computer and Communi-
cations Security (CCS), 2018.
[46] Stephen McCamant and Greg Morrisett. Evaluating sfi for a cisc
architecture. In USENIX Security Symposium, 2006.
[47] Microsoft. Spectre mitigations in msvc. https://blogs.msdn.microsoft.
com/vcblog/2018/01/15/spectre-mitigations-in-msvc/, 2018.
[48] Marina Minkin, Daniel Moghimi, Moritz Lipp, Michael Schwarz,
Jo Van Bulck, Daniel Genkin, Daniel Gruss, Berk Sunar, Frank Piessens,
and Yuval Yarom. Fallout: Reading kernel writes from user space. 2019.
[49] Ben Niu and Gang Tan. Modular control-flow integrity. In ACM
SIGPLAN Conference on Programming Language Design and Imple-
mentation (PLDI), 2014.
[50] Ben Niu and Gang Tan. Per-input control-flow integrity. In ACM
Conference on Computer and Communications Security (CCS), 2015.
[51] Oleksii Oleksenko, Bohdan Trach, Tobias Reiher, Mark Silberstein, and
Christof Fetzer. You shall not bypass: Employing data dependencies to
prevent bounds check bypass. arXiv preprint arXiv:1805.08506, 2018.
[52] A. Patel, F. Afram, and K. Ghose. Marss-x86: A qemu-based micro-
architectural and systems simulator for x86 multicore processors. In
Proc. of QUF, 2011.
[53] PAX team. Future of pax. https://pax.grsecurity.net/docs/pax-future.txt,
2002.
[54] PAX team. RAP: RIP ROP. https://pax.grsecurity.net/docs/PaXTeam-
H2HC15-RAP-RIP-ROP.pdf, 2015.
[55] Moinuddin K. Qureshi. Ceaser: Mitigating conflict-based cache attacks
via encrypted-address and remapping. In Proc. IEEE/ACM International
Symposium on Microarchitecture (Micro), 2018.
[56] Felix Schuster, Thomas Tendyck, Christopher Liebchen, Lucas Davi,
Ahmad-Reza Sadeghi, and Thorsten Holz. Counterfeit object-oriented
programming: On the difficulty of preventing code reuse attacks in c++
applications. In IEEE Symposium on Security and Privacy, pages 745–
762. IEEE, 2015.
[57] Michael Schwarz, Claudio Canella, Lukas Giner, and Daniel Gruss.
Store-to-leak forwarding: Leaking data on meltdown-resistant cpus.
arXiv preprint arXiv:1905.05725, 2019.
[58] Michael Schwarz, Moritz Lipp, Daniel Moghimi, Jo Van Bulck, Julian
Stecklina, Thomas Prescher, and Daniel Gruss. Zombieload: Cross-
privilege-boundary data sampling. arXiv preprint arXiv:1905.05726,
2019.
[59] Michael Schwarz, Robert Schilling, Florian Kargl, Moritz Lipp, Claudio
Canella, and Daniel Gruss. Context: Leakage-free transient execution.
arXiv preprint arXiv:1905.09100, 2019.
[60] Michael Schwarz, Martin Schwarzl, Moritz Lipp, and Daniel Gruss.
Netspectre: Read arbitrary memory over network. arXiv preprint
arXiv:1807.10535, 2018.
[61] Hovav Shacham. The geometry of innocent flesh on the bone: Return-
into-libc without function calls (on the x86). In ACM Conference on
Computer and Communications Security (CCS), 2007.
[62] J. Stecklina and T. Prescher. Lazyfp: Leaking fpu register state using
microarchitectural side-channels. arXiv preprint arXiv:1806.07480,
2018.
[63] Mohammadkazem Taram, Ashish Venkat, and Dean Tullsen. Context-
sensitive fencing: Securing speculative execution via microcode cus-
tomization. In ACM International Conference on Architectural Support
for Programming Languages and Operating Systems (ASPLOS), 2019.
[64] Caroline Tice, Tom Roeder, Peter Collingbourne, Stephen Checkoway,
Úlfar Erlingsson, Luis Lozano, and Geoff Pike. Enforcing forward-edge
control-flow integrity in gcc & llvm. In USENIX Security, 2014.
[65] P. Turner. Retpoline: a software construct for preventing branch-target-
injection. https://support.google.com/faqs/answer/7625886, 2018.
[66] Jo Van B., M. Minkin, O. Weisse, D. Genkin, B. Kasikci, F. Piessens,
M. Silberstein, T. F Wenisch, Y. Yarom, and R. Strackx. Foreshadow:
Extracting the keys to the intel sgx kingdom with transient out-of-order
execution. In USENIX Security Symposium (Security), 2018.
[67] Victor van der Veen, Dennis Andriesse, Enes Göktaş, Ben Gras, Lionel
Sambuc, Asia Slowinska, Herbert Bos, and Cristiano Giuffrida. Practical
context-sensitive CFI. In ACM Conference on Computer and
Communications Security (CCS), 2015.
[68] Victor van der Veen, Enes Göktas, Moritz Contag, Andre Pawoloski,
Xi Chen, Sanjay Rawat, Herbert Bos, Thorsten Holz, Elias
Athanasopoulos, and Cristiano Giuffrida. A tough call: Mitigating advanced
code-reuse attacks at the binary level. In IEEE Symposium on Security
and Privacy (Oakland), 2016.
[69] Stephan van Schaik, Alyssa Milburn, Sebastian Österlund, Pietro Frigo,
Giorgi Maisuradze, Kaveh Razavi, Herbert Bos, and Cristiano Giuffrida.
RIDL: Rogue in-flight data load. In IEEE Symposium on Security and
Privacy (Oakland), May 2019.
[70] Guanhua Wang, Sudipta Chattopadhyay, Ivan Gotovchits, Tulika Mitra,
and Abhik Roychoudhury. oo7: Low-overhead defense against spectre
attacks via binary analysis. arXiv preprint arXiv:1807.05843, 2018.
[71] Hua Wang, Yao Guo, and Xiangqun Chen. Fpvalidator: validating type
equivalence of function pointers on the fly. In Annual Computer Security
Applications Conference (ACSAC), 2009.
[72] O. Weisse, J. Van, M. Minkin, D. Genkin, B. Kasikci, F. Piessens,
M. Silberstein, R. Strackx, T. Wenisch, and Y. Yarom. Foreshadow-
NG: Breaking the virtual memory abstraction with transient out-of-order
execution. Technical report, 2018.
[73] M. Yan, J. Choi, D. Skarlatos, A. Morrison, C. Fletcher, and J. Torrellas.
Invisispec: Making speculative execution invisible in the cache hierarchy.
In IEEE/ACM International Symposium on Microarchitecture (MICRO),
2018.
[74] Bennet Yee, David Sehr, Gregory Dardyk, J Bradley Chen, Robert Muth,
Tavis Ormandy, Shiki Okasaka, Neha Narula, and Nicholas Fullagar.
Native client: A sandbox for portable, untrusted x86 native code. In
IEEE Symposium on Security and Privacy, pages 79–93. IEEE, 2009.
[75] Tse-Yu Yeh and Yale N Patt. Alternative implementations of two-level
adaptive branch prediction. In ACM SIGARCH Computer Architecture
News, volume 20, pages 124–134, 1992.
[76] M. T. Yourst. Ptlsim: A cycle accurate full system x86-64 microarchi-
tectural simulator. In Proc. of ISPASS, 2007.
[77] Chao Zhang, Tao Wei, Zhaofeng Chen, Lei Duan, Laszlo Szekeres,
Stephen McCamant, Dong Song, and Wei Zou. Practical control flow
integrity and randomization for binary executables. In IEEE Symposium
on Security and Privacy (Oakland), 2013.
[78] Mingwei Zhang and R Sekar. Control flow integrity for COTS binaries.
In USENIX Security, 2013.
