Abstract: Spectre attacks and their many subsequent variants are a new vulnerability class for modern CPUs. The attacks rely on the ability to misguide or hijack speculative execution, generally by exploiting the branch prediction structures, to execute a vulnerable code sequence speculatively. In this paper, we propose to extend Control-Flow Integrity (CFI), a security technique used to stop control-flow hijacking attacks on the committed path, to prevent speculative control flow from being hijacked to launch the most dangerous variants of the Spectre attacks (Spectre-BTB and Spectre-RSB). Specifically, CFI constrains the target of an indirect branch to a set of legal targets defined by a pre-computed control-flow graph (CFG). As CFI is being adopted by commodity software (e.g., Windows and Android) and commodity hardware (e.g., Intel's CET and ARM's BTI), the CFI information can be obtained directly from the hardware CFI extensions. With this information, we apply CFI principles to also constrain illegal control flow during speculative execution. SPECCFI ensures that control-flow instructions target legal destinations, constraining dangerous speculation on forward control-flow edges (indirect calls and jumps). We complement our solution with a precise, speculation-aware hardware stack to constrain speculation on backward control-flow edges (returns). We combine this solution with existing solutions against branch direction predictor attacks (Spectre-PHT) to close all non-vendor-specific Spectre vulnerabilities. We show that SPECCFI incurs small overheads in terms of both performance and additional hardware complexity.
I. INTRODUCTION
The recent Spectre [41] attacks have demonstrated how speculative execution can be exploited to disclose secret data across isolation boundaries. Specifically, attackers can misguide the processor into speculatively executing a read instruction with an address under their control. Although the speculatively read values never become architecturally visible, since the effects of misspeculation are eventually undone, they can be communicated out through a side channel. Since the introduction of Spectre, a large number of attacks following the same pattern (temporary read of sensitive data through speculation, followed by disclosure of this data through a covert channel) have been discovered, enabling attackers to bypass different permission checks using a number of different speculation triggers [10], [13], [26], [29], [40], [43], [45], [57], [62], [66], [72]; it is clear that this is a general class of weakness that requires a deep rethinking of processor architecture.
Since speculation has a great impact on performance, to mitigate this threat without throttling speculation, some solutions such as InvisiSpec [73] and SafeSpec [38] propose separating speculative data from committed data. Such an approach, rather than attempting to limit speculation, would isolate possible leakage. However, the principle has to be applied to every micro-architectural structure (e.g., cache, TLB, DRAM row buffer), and it is unclear if this approach could prevent leakage through contention, for example, using the functional unit port side-channel [7] , [13] .
Another direction to mitigate this threat is to abort speculation if a potentially dangerous gadget can be executed speculatively. For example, Intel and AMD suggest inserting serialization instructions like lfence to prevent loading potentially secret data [6] , [34] . Because blindly inserting serialization instructions will have the same effect as disabling speculation, thus severely reducing performance [32] , a better solution is to conditionally insert barriers. The MSVC C compiler [47] , oo7 [70] , and Respectre [31] use static analysis to identify dangerous gadgets and only insert lfence before the identified gadgets. Context-Sensitive Fencing [63] dynamically inserts serialization instructions when a load instruction operates on untrusted data (address), but only for Spectre-PHT.
Our observation is that Spectre-like attacks rely on manipulating the prediction structures (see Section II-A for details) to coerce speculation to an attacker-chosen code gadget with attacker-controlled input. Therefore, these attacks can be defeated more efficiently by identifying and preventing wrong speculation (prediction). As the first step towards this direction, we propose SPECCFI, a lightweight solution to prevent the two most dangerous Spectre variants: Spectre-BTB (v2) and Spectre-RSB (v5) attacks. SPECCFI prevents these attacks by using control-flow integrity (CFI) principles to identify when to constrain speculation.
In contrast to traditional CFI, including hardware-supported proposals, whose purpose is to prevent illegal control flow along the architecturally visible (committed) control flow of a program, SPECCFI pushes CFI to the speculation level, where it determines whether a speculative execution path should be allowed or limited. Compared to existing solutions against Spectre-BTB and Spectre-RSB attacks, such as Intel's recent microcode update [34] and retpoline [65], SPECCFI causes less performance degradation because it still allows correct speculation to proceed, while existing solutions blindly "disable" all indirect branch prediction.
We also argue that defenses against Spectre-BTB and Spectre-RSB attacks serve as the foundation for defenses against Spectre-PHT (v1) attacks. The reason is that serialization instructions can be viewed as a special type of inline reference monitor, so it is crucial to ensure that these inserted barriers are never bypassed. However, without protections against Spectre-BTB (forward indirect branches) and Spectre-RSB (returns), attackers can easily bypass the barriers, for example by steering speculation directly to code located after a fence, to carry out the attacks [13]. Furthermore, as demonstrated by return-oriented programming [61], attackers can jump into the middle of an x86 instruction and use unintended gadgets to launch attacks. For this reason, we envision SPECCFI being combined with existing solutions against v1 attacks [19], [51], [63] to provide comprehensive protection against Spectre attacks.
The SPECCFI principle can be applied to any CFI implementation (e.g., coarse-grained such as Intel's CET [36], or fine-grained such as HAFIX [21]), with small implementation differences and enforcement of the respective version of CFI. We present our baseline design in Section IV and Section V. We investigate two versions of SPECCFI: SPECCFI-base, which implements CFI only for speculation, and SPECCFI-full, which also enforces CFI on the committed control flow (i.e., the conventional goal of CFI). Section VII evaluates the performance and complexity of both SPECCFI-base and SPECCFI-full. We show that SPECCFI-base eliminates dangerous misspeculation (where the predicted target label does not match the destination) without degrading performance.
SPECCFI-full incurs an additional small overhead, on par with other hardware CFI implementations [20] - [22] . We also analyze the implementation complexity resulting from the hardware shadow stack, and the extensions to the BTB, and find that the overhead is modest.
Although some software and hardware solutions have started to appear to defend against this class of attacks, we believe our solution is elegant and has a number of interesting properties. We also believe it combines well with other proposed defenses, such as SafeSpec [38] and InvisiSpec [73], which limit speculative side effects once misspeculation occurs: SPECCFI complements them by limiting the opportunities for harmful speculation in the first place. Section VIII compares SPECCFI to these and other works.
In summary, the contributions of the paper include:
• We present a new defense against Spectre variants that rely on polluting the BTB and RSB, by embedding CFI principles into the branch prediction decisions.
• We analyze the security of the proposed designs, showing that they protect against all variants of Spectre-BTB (v2) and Spectre-RSB (v5) attacks. Combined with solutions such as context-sensitive fencing, we believe that the system can be completely secured against Spectre attacks.
• We analyze the performance and complexity of SPECCFI, showing that it leads to little overhead. Compared to a defense that prevents speculation around indirect jumps, indirect calls and returns, SPECCFI provides equivalent security yet still avoids the large performance overheads. The hardware complexity is also negligible.
II. BACKGROUND
This section overviews some background: branch predictor structures in modern processors, Spectre attacks, and CFI.
A. Branch prediction and Spectre attacks
Branch prediction is a critical component of modern processors that support speculative out-of-order execution. When a control-flow instruction (branch, call, or return) is encountered, its outcome (whether a conditional branch is taken, or the target of an indirect branch, call, or return) is generally not known at the front end of the pipeline. Therefore, to keep filling the pipeline and utilizing the available resources of the processor, branch prediction is used. Modern processors employ sophisticated predictors (shown in Figure 1), which consist of three components:
• Direction predictor: is responsible for predicting the direction of a conditional branch. Although a number of implementations have been studied, modern predictors typically implement a two-level context-sensitive predictor [26]. The first level is a simple predictor that hashes each branch address to a direction predictor (typically a 2-bit saturating counter). This predictor is used either when a branch is not being successfully predicted or when the predictor has not been trained yet. When the predictor is trained, it typically uses a variant of a gshare predictor [75], which combines the global history of recent branch outcomes with the branch address to index a direction predictor as before. The advantage is that the same branch can have different predictions depending on the control-flow path used to reach it.
• Target predictor: is used by indirect jump and indirect call instructions, which jump to an address held in a register or a memory location. In this case, the target of the branch is not known at the front end of the pipeline. This predictor typically uses a hash of the branch address to index a cache holding branch targets, called the branch target buffer (BTB). Modern processors often have 2-way or 4-way set-associative BTBs. BTBs are shared across hardware threads and processes running on the same physical core: a target installed by one process can be used by another process whose branch has a matching address in the BTB [27].
• The return address stack: Since returns are not well predicted by the BTB, and usually follow strict call-return semantics, their targets are predicted using a return address stack of fixed size, also called the Return Stack Buffer (RSB). When a call instruction executes, the return address is pushed onto this hardware stack; if it overflows, the oldest entries are overwritten [43]. When a return is encountered, the top of the stack is popped and used as the return target.
TABLE I: Spectre variants addressed in this work and the prediction structure each one exploits.
Spectre-PHT (v1) [41]: Pattern History Table (PHT)
Spectre-PHT (v1.1) [40]: Pattern History Table (PHT)
Spectre-BTB (v2) [41]: Branch Target Buffer (BTB)
Spectre-RSB (v5) [43], [45]: Return Stack Buffer (RSB)
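The direction and target predictors described above can be illustrated with a small Python sketch of a gshare-style direction predictor and an untagged, direct-mapped BTB. The table sizes, the initial counter value, and the absence of BTB tags are simplifying assumptions, not a description of any real processor; real BTBs are set-associative and typically keep partial tags.

```python
from typing import Dict, Optional


class GsharePredictor:
    """gshare-style direction predictor (sketch; sizes are hypothetical).

    A table of 2-bit saturating counters is indexed by the branch address
    XORed with a global history register (GHR) of recent branch outcomes.
    Counter values 0-1 predict not taken, 2-3 predict taken.
    """

    def __init__(self, index_bits: int = 10) -> None:
        self.mask = (1 << index_bits) - 1
        self.counters = [1] * (1 << index_bits)  # start "weakly not taken"
        self.ghr = 0  # one bit per recent outcome, newest in the LSB

    def _index(self, pc: int) -> int:
        return (pc ^ self.ghr) & self.mask

    def predict(self, pc: int) -> bool:
        return self.counters[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        # Shift the resolved outcome into the global history.
        self.ghr = ((self.ghr << 1) | int(taken)) & self.mask


class SimpleBTB:
    """Untagged, direct-mapped BTB indexed by the low address bits (sketch).

    Two branches whose addresses share these bits alias to the same entry.
    """

    def __init__(self, index_bits: int = 8) -> None:
        self.mask = (1 << index_bits) - 1
        self.entries: Dict[int, int] = {}

    def update(self, pc: int, target: int) -> None:
        self.entries[pc & self.mask] = target

    def predict(self, pc: int) -> Optional[int]:
        return self.entries.get(pc & self.mask)


# A branch that alternates taken/not-taken is learned thanks to the history.
gshare = GsharePredictor()
pattern = [i % 2 == 0 for i in range(200)]
for outcome in pattern[:100]:
    gshare.update(0x40, outcome)
correct = 0
for outcome in pattern[100:]:
    correct += gshare.predict(0x40) == outcome
    gshare.update(0x40, outcome)

# Two unrelated branches alias in the BTB.
btb = SimpleBTB()
btb.update(0x1234, 0xAAAA)
aliased = btb.predict(0x5634)  # same low 8 bits (0x34) as 0x1234
```

After training, the two alternating history values map to different counters, so all 100 remaining outcomes are predicted correctly; the BTB lookup for 0x5634 returns the target installed by the unrelated branch at 0x1234, which is the aliasing that BTB-based attacks rely on.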
Spectre Attacks. Spectre attacks exploit the branch and memory-aliasing predictors, fooling them into accessing unauthorized data speculatively [13], [15], [29], [40], [41], [43], [45]. The attacks exploit two main properties of speculative execution: (1) speculative instructions have unintended side effects on micro-architectural state even if they are never committed; and (2) attackers can deliberately steer execution into attacker-intended gadgets by mistraining branch predictors, and then use the first property to leak sensitive information. Specifically, an attacker-selected gadget is executed speculatively to perform an unauthorized access and leak the value through a side channel [10], [32], [41]. Table I lists the variants of the Spectre attacks addressed in this work, organized by the prediction structure being attacked. Mitigations for other Spectre variants, as well as for variants of the Meltdown attack, are discussed thoroughly in works such as Canella et al. [15].
B. Control-flow Integrity
Control-flow integrity (CFI) [4], [53] is the state-of-the-art solution for mitigating control-flow hijacking attacks. In such attacks, attackers corrupt or overwrite control data (i.e., data that controls indirect control transfers, such as function pointers and return addresses) to divert the victim program's execution to attacker-controlled logic, for example to install malware or open a backdoor. CFI prevents such attacks by enforcing a basic safety property: software execution must follow a path of a control-flow graph (CFG) determined ahead of time [4]. Hence, a CFI mechanism always consists of two components: one that computes the CFG and one that regulates control transfers.
Constructing CFG. The security guarantee of a CFI mechanism depends directly on the accuracy of the CFG, which can be constructed through static or dynamic analysis. Coarse-grained CFI mechanisms [77], [78] generate the CFG using simple static analysis: any address-taken function can be a legitimate target for any indirect call; any address-taken basic block can be a legitimate target for any indirect jump; and the address of the instruction following any call can be a legitimate target for a return. Although coarse-grained CFI can eliminate most illegal control-transfer targets, follow-up research has shown that the resulting CFG is so permissive that it still allows attacks [17], [28]. Fine-grained CFI solutions improve the accuracy of the CFG by incorporating type information [49], [54], [64], [68], [71]. Unfortunately, the CFG may still allow illegal control transfers [16], [25]. More recently, researchers have proposed using runtime information to further improve the precision of the CFG [24], [50], [67], which can even achieve perfect accuracy [30] (i.e., one possible target per indirect control-transfer site).
Regulating control-flow. Once the CFG is computed, legitimate control transfers can be grouped into equivalence sets. Within the same set, control can be transferred from any source location (e.g., a call site or return site) to any target location (e.g., a target function or call site). By assigning each equivalence set a unique ID/label, runtime control flow can be regulated with a simple check: the source label must match the destination label. Such checks can be implemented in either software or hardware. Some hardware extensions support only a single label [11], [36], [37] and thus can only enforce coarse-grained CFI. Others support multiple labels [20], [22] and hence fine-grained CFI. Some hardware extensions also include a shadow stack to enforce a unique return target [20], [21], [36], [37].
Adoption. Because of its effectiveness against control-flow hijacking attacks, CFI has been adopted by both commodity software and hardware. Tice et al. [64] introduced forward-edge CFI to LLVM and GCC in 2014. Android adopted this implementation in Android 8.1 to protect its media stack, and in Android 9 extended the protection to more components and the OS kernel. Microsoft introduced its own CFI implementation, Control Flow Guard, in Visual Studio 2015 and has been using it to protect important OS components, including the web browser. In Windows 10 (V1730), Microsoft extended the protection to the OS kernel and hypervisor (Hyper-V). On the hardware side, Intel introduced Control-flow Enforcement Technology (CET) [36], and ARM introduced a similar mechanism, Branch Target Identification (BTI), in ARMv8.5-A [11].
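The equivalence-set labeling and the source/destination label check described under "Regulating control-flow" can be sketched in a few lines of Python. This sketch assumes the CFG is given as a list of hypothetical (source, target) address pairs and uses union-find to merge sources and targets connected through shared edges; real CFI compilers may partition edges differently.

```python
from typing import Dict, List, Tuple

Node = Tuple[str, int]  # ("src", address) or ("dst", address)


def assign_labels(edges: List[Tuple[int, int]]) -> Tuple[Dict[int, int], Dict[int, int]]:
    """Group legal (source, target) transfers into equivalence sets.

    Sources and targets connected through CFG edges end up in the same set
    (connected components of the bipartite source/target graph, computed with
    union-find); each set gets one integer label.
    """
    parent: Dict[Node, Node] = {}

    def find(x: Node) -> Node:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for src, dst in edges:
        parent[find(("src", src))] = find(("dst", dst))

    roots: Dict[Node, int] = {}
    src_label: Dict[int, int] = {}
    dst_label: Dict[int, int] = {}
    for kind, addr in list(parent):
        label = roots.setdefault(find((kind, addr)), len(roots))
        (src_label if kind == "src" else dst_label)[addr] = label
    return src_label, dst_label


def cfi_check(src_label: Dict[int, int], dst_label: Dict[int, int], src: int, dst: int) -> bool:
    """Runtime check: the source label must match the destination label."""
    return src in src_label and dst in dst_label and src_label[src] == dst_label[dst]


# Hypothetical CFG: call site 0x100 may call 0x1000 or 0x2000,
# call site 0x200 may call 0x2000, call site 0x300 may call 0x3000.
cfg = [(0x100, 0x1000), (0x100, 0x2000), (0x200, 0x2000), (0x300, 0x3000)]
src_label, dst_label = assign_labels(cfg)
```

Note that cfi_check accepts 0x200 to 0x1000 even though that edge is not in the CFG: merging transfers into equivalence sets over-approximates, which is one reason label-based CFI can still permit some illegal transfers.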
III. SPECCFI SYSTEM MODEL
This section first overviews the threat model to clarify the assumptions made in the system. We next describe the extensions to the Instruction Set Architecture (ISA) needed to support SPECCFI. Finally, we describe the compiler modifications needed to prepare binaries to benefit from the SPECCFI defense.
A. Threat Model
The main goal of SPECCFI is to prevent attackers from launching branch target injection attacks (i.e., Spectre-BTB and Spectre-RSB) to leak memory contents of the victim program that are otherwise not accessible to them. We assume a strong local adversary model. In particular, we assume a BTB shared across different hardware threads (i.e., hyperthreads) and protection domains (address spaces, privilege levels, and SGX enclaves). We assume the RSB is not shared between hardware threads but is shared between different protection domains.
In our model, adversaries can inject arbitrary branch targets into BTB by executing arbitrary indirect branches within their own protection domain. Their goal is to control the predicted branch target in the victim protection domain.
We note that Meltdown-style attacks [44], [48], [57], [58], [66], [69] are outside the threat model, since they rely on speculation on a value used within the execution of the same instruction; they leak privileged kernel memory [44], L1 cache contents [66], fill buffer contents [58], in-flight data in modern CPUs (for example, in the Re-Order Buffer and Line Fill Buffers) [69], and store buffer contents [48], [57]. Moreover, misspeculation through the direction predictor (which leads to Spectre-PHT) does not result in a control-flow violation, since it simply results in an incorrect choice between two legal control-flow directions. Fortunately, existing works have already developed protections against Spectre-PHT, primarily by limiting speculation around conditional branches that can lead to dangerous misspeculation [19], [31], [47], [63], [70]. Similarly, Spectre-STL is out of scope but can be completely mitigated by disabling speculative store bypass [5], [9], [34]. To the best of our knowledge, SPECCFI is the first hardware design that targets the more dangerous Spectre-BTB and Spectre-RSB attacks even when they use different side channels (e.g., contention-based side channels in SMT processors [13]).
We further assume that target software is protected with hardware-enforced CFI, which marks valid indirect control transfer targets (e.g., ENDBRANCH in CET). Although the target software may contain memory vulnerabilities (e.g., buffer overflows) that could be exploited to achieve arbitrary read and write (i.e., the traditional threat model for CFI), such attacks are out-of-scope of this work.
B. Instruction Set Architecture (ISA) Extension
Most hardware CFI extensions [11], [20]-[22], [36] use target labeling to enforce forward-edge CFI and a shadow stack to enforce backward-edge CFI. Without loss of generality, we assume two modifications to the ISA to inform the hardware of the labels from the CFG analysis:
• Extending the indirect jmp and call instructions to include CFI labels. For coarse-grained CFI enforcement (e.g., Intel CET [36] and ARM BTI [11]), the label at jump and call sites can be omitted.
• Adding a new instruction to mark legitimate indirect branch targets with corresponding labels. For coarse-grained CFI enforcement, the label can be omitted (e.g., in the case of Intel CET) or collapsed to two labels: one for jump targets and the other for call targets (e.g., in the case of ARM BTI).
The shadow stack is generally transparent to the program and is not directly manipulated. However, certain language features, such as exception handling and setjmp/longjmp, require manipulating the shadow stack. Supporting these features needs additional instructions, but since they do not interact with SPECCFI, we omit their details; the Intel CET specification [36] is an example of which instructions are necessary and how they interact with the ISA. Fortunately, because these required modifications are the same as those for CFI, they are already available in commodity compilers. For example, both LLVM and GCC support (1) software-enforced fine-grained forward-edge CFI [64], (2) Intel CET, and (3) ARM BTI. Therefore, SPECCFI requires little (potentially no) modification to the compilers. SPECCFI is compatible with any label-based CFI implementation.
IV. FORWARD-EDGE DEFENSE
In this section, we describe the component of SPECCFI responsible for preventing both misspeculation and committed control flow that violates CFI on the forward edge (i.e., on indirect calls and indirect jumps). This defense prevents Spectre-BTB (v2) both within the same address space and across different address spaces. It is also responsible for enforcing CFI on committed instructions (the traditional use of CFI).
A. Preventing Spectre-BTB (within the same address space)
In this attack, the attacker pollutes the target BTB entry by repeatedly executing another indirect branch in its address space that hashes into the same entry. This can be achieved through script engines like the JavaScript engine in browsers and the BPF JIT engine in the kernel. When the victim branch is executed speculatively, the polluted entry will direct the victim to a malicious gadget. Our goal is to prevent the victim from jumping speculatively to the malicious gadget.
Our first design augments the BTB to hold a CFI label for each target. This design extends the execution of indirect call/jmp instructions so that they update the BTB with not only the target (as in the traditional implementation) but also the CFI label of the branch. Later, on the speculative path, all indirect calls and jumps index the BTB to predict their targets as before, but with an additional check against the CFI label. This defense prevents attacker-controlled misspeculation: the CFI label of the current indirect branch instruction is compared with the label stored in the BTB entry. If the labels do not match, we prevent instructions from being fetched and executed speculatively from this target. For benign programs, such a mismatch is likely to occur only when the BTB is cold (has not been initialized yet), or when branch aliasing causes collisions in the BTB structure. While these cases should be rare, in both of them the value in the BTB is not the correct target. Limiting such erroneous speculation may even improve performance, since no cycles are wasted fetching instructions from what is likely the wrong path.
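A minimal Python sketch of this first, label-augmented BTB design follows. The 12-bit index, the integer labels, and all addresses are hypothetical; the sketch only shows that a polluted entry installed by a branch with a different label is never returned as a speculative target.

```python
from dataclasses import dataclass
from typing import Dict, Optional

INDEX_MASK = 0xFFF  # hypothetical: BTB indexed by the low 12 address bits


@dataclass
class BTBEntry:
    target: int
    label: int  # CFI label of the committed branch that installed the entry


class LabeledBTB:
    """BTB whose entries carry the CFI label of the installing branch (sketch)."""

    def __init__(self) -> None:
        self.entries: Dict[int, BTBEntry] = {}

    def update(self, pc: int, target: int, label: int) -> None:
        # Called when an indirect call/jmp commits.
        self.entries[pc & INDEX_MASK] = BTBEntry(target, label)

    def predict(self, pc: int, label: int) -> Optional[int]:
        # None means: do not fetch speculatively from the stored target.
        entry = self.entries.get(pc & INDEX_MASK)
        if entry is None or entry.label != label:
            return None
        return entry.target


btb = LabeledBTB()
VICTIM_PC, VICTIM_LABEL = 0x401A30, 7
# The attacker trains an aliasing branch (same low 12 bits) carrying label 3.
btb.update(0x555A30, target=0x7F001000, label=3)
blocked = btb.predict(VICTIM_PC, VICTIM_LABEL)  # label 3 != 7: no prediction
# After the victim's own branch commits, prediction works as usual.
btb.update(VICTIM_PC, target=0x402000, label=VICTIM_LABEL)
allowed = btb.predict(VICTIM_PC, VICTIM_LABEL)
```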
Since only committed indirect branches update the BTB, the possible targets that attackers may use are limited to gadgets starting with a cfi_lbl instruction whose label is identical to that of the call/jmp instruction. Note that in CFI a label may be shared by multiple locations in the code, and misspeculation among these locations is still possible (i.e., control-flow bending [17]); as known from CFI solutions, however, this set is much smaller than the set of potential targets without CFI.
B. Preventing Spectre-BTB (across address spaces)
The first design described above (storing the CFI labels in BTB entries) sufficiently mitigates attacks within the same address space, but in general cannot prevent attacks across address spaces, where attackers pollute the globally shared BTB from a program they control. In this case, if attackers know the label used by the victim program (e.g., through offline analysis), they can craft a valid BTB entry with the same label as the victim's and bypass the protection. Consider the example in Figure 2. By carefully selecting the location of a branch and its label, the attacker inserts label L1 and target 0x25 into the BTB entry at index 0x10. This entry is valid since the target is annotated with cfi_lbl L1, the same label as the call. When the CPU context switches to the victim's address space, the victim call at location 0x10 indexes the BTB and uses the attacker-inserted entry to predict its target. As a result, the CPU speculatively executes the malicious gadget starting at 0x25. In this case, even though the victim's label check passes, the attacker is able to redirect the control flow and execute the malicious gadget to reveal the secret.
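The Figure 2 scenario can be replayed against the same kind of label check. This Python sketch uses only the values given in the text (index 0x10, target 0x25, label L1); modeling the shared BTB as a dictionary without address-space tags is a simplifying assumption.

```python
from typing import Dict, Optional, Tuple

# A single BTB shared by all address spaces: index -> (target, label).
# Entries are not tagged with an address-space identifier.
shared_btb: Dict[int, Tuple[int, str]] = {}


def commit_indirect(index: int, target: int, label: str) -> None:
    shared_btb[index] = (target, label)


def predict(index: int, label: str) -> Optional[int]:
    entry = shared_btb.get(index)
    if entry is None or entry[1] != label:
        return None  # label mismatch: no speculative fetch
    return entry[0]


# Attacker process: a branch placed so it maps to index 0x10, carrying label L1,
# whose committed target 0x25 is annotated with cfi_lbl L1.
commit_indirect(0x10, 0x25, "L1")

# Context switch to the victim: its call at 0x10 also carries label L1, so the
# label check passes and speculation continues at 0x25.
victim_prediction = predict(0x10, "L1")
```

Because labels are a property of the program rather than a secret, the check succeeds, which is why the design below stops relying on labels stored in the BTB.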
To prevent cross-address-space attacks, one possibility is to randomize the mapping of addresses to the BTB (e.g., similar to the CEASER solution for caches [55]) to make it difficult for attackers to guess the label or the location associated with the target branch. However, as this approach provides only probabilistic guarantees against attacks, we decided to use an alternative implementation that avoids storing labels in the BTB. Specifically, we enforce the CFI check by ensuring that the first speculatively executed instruction after an indirect branch is a cfi_lbl instruction with a matching label. This is the standard implementation of hardware-accelerated CFI; however, since we use CFI to constrain speculation (not just committed instructions), the check has to be moved earlier in the execution, to the decode stage of the first instruction on the speculative path. As our experimental analysis shows, this has a negligible impact on performance, since only the detection of misspeculation is delayed, while legal speculation is not.
With respect to performance, the two implementations operate differently but are likely to perform similarly. The first implementation requires modifications to the critical BTB structure and can potentially slow down the execution pipeline, which favors the second (target label-checking) implementation. A small disadvantage of the second implementation is that the target instructions have to be fetched speculatively (if not cached) to check the label, which could be avoided if the label mismatch were detected by the BTB as in the first implementation. However, this should lead to no loss of performance, since these instructions cannot be executed until they are fetched anyway.
The state machine implementing the check in the decode stage of the pipeline is shown in Figure 3. Unlike the initial design, which modifies the BTB, this design does not reference the BTB for checking the labels. Starting at the initial state, any indirect call/jmp instruction in the decode stage sets the CFI_REG register to its own CFI label and causes the CPU to wait for a cfi_lbl instruction. The decode stage checks that the next instruction is a cfi_lbl instruction, which restricts potential gadgets to those starting with a cfi_lbl instruction. Moreover, the CPU confirms that the CFI_REG value equals the label of the cfi_lbl instruction, further restricting potential gadgets to those with a matching label. When the instruction following the call/jmp is not a cfi_lbl instruction, or when its label does not match the label of the call/jmp, an lfence micro-op is inserted into the pipeline to guarantee that execution down the wrong speculative path is prevented.
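The Figure 3 state machine can be approximated by the Python sketch below. The opcode names call_ind and jmp_ind, the tuple encoding of instructions, and the representation of the inserted micro-op as a list element are all hypothetical; the sketch only captures the rule that the instruction after an indirect call/jmp must be a cfi_lbl whose label equals CFI_REG.

```python
from enum import Enum, auto
from typing import List, Optional, Tuple

Instr = Tuple[str, Optional[int]]  # (opcode, CFI label or None)
INDIRECT = {"call_ind", "jmp_ind"}  # hypothetical opcode names


class State(Enum):
    IDLE = auto()      # no indirect branch awaiting a landing label
    WAIT_LBL = auto()  # an indirect call/jmp was decoded; expect cfi_lbl


def decode(stream: List[Instr]) -> List[Instr]:
    """Decode-stage CFI check on the speculative instruction stream.

    Returns the stream with an ("lfence", None) micro-op inserted before the
    target instruction whenever the check fails, so that instructions on
    that path are held until the indirect branch resolves.
    """
    state, cfi_reg = State.IDLE, None
    out: List[Instr] = []
    for op, label in stream:
        if state is State.WAIT_LBL:
            if op != "cfi_lbl" or label != cfi_reg:
                out.append(("lfence", None))
            state = State.IDLE
        out.append((op, label))
        if op in INDIRECT:
            state, cfi_reg = State.WAIT_LBL, label  # CFI_REG <- branch label
    return out


ok = decode([("call_ind", 5), ("cfi_lbl", 5), ("add", None)])
bad_label = decode([("call_ind", 5), ("cfi_lbl", 3), ("load", None)])
no_label = decode([("jmp_ind", 2), ("load", None)])
```

In the matching case the stream passes through unchanged; in both failing cases the fence lands right after the branch, so only the detection of misspeculation is affected.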
C. Enforcing CFI for Committed Instructions
The design presented thus far prevents all variants of Spectre-BTB attacks. SPECCFI is essentially hardware-supported CFI, but with CFI enforcement during speculation. Given this similarity in hardware support to traditional CFI, we also extend the design to enforce standard CFI rules on committed instructions and defend against control-flow hijacking attacks. This support is achieved by performing the CFI check during the commit stage of the pipeline: if an indirect call/jmp instruction is not followed by a cfi_lbl instruction with a matching label, the CPU raises a CFI violation exception.
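The commit-stage rule can be sketched in Python as follows. The opcode names and the CFIViolation exception are hypothetical stand-ins for the hardware exception; instructions are assumed to arrive in program (commit) order.

```python
from typing import List, Optional, Tuple

INDIRECT = {"call_ind", "jmp_ind"}  # hypothetical opcode names


class CFIViolation(Exception):
    """Raised when a committed indirect branch lands on an illegal target."""


def commit(stream: List[Tuple[str, Optional[int]]]) -> None:
    """Commit-stage CFI check over instructions in program order."""
    waiting = False                # an indirect branch just committed
    expected: Optional[int] = None  # its label (the committed CFI_REG value)
    for i, (op, label) in enumerate(stream):
        if waiting:
            if op != "cfi_lbl" or label != expected:
                raise CFIViolation(f"instruction {i}: expected cfi_lbl {expected}")
            waiting = False
        if op in INDIRECT:
            waiting, expected = True, label


commit([("call_ind", 1), ("cfi_lbl", 1), ("ret", None)])  # passes silently
try:
    commit([("call_ind", 1), ("cfi_lbl", 2)])
    violation = False
except CFIViolation:
    violation = True
```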
V. BACKWARD-EDGE DEFENSE
The backward-edge defense component of SPECCFI protects against misspeculation on return instructions. Return instructions typically obtain their predicted targets from a hardware stack called the Return Stack Buffer (RSB). The RSB has been shown to be vulnerable to a range of Spectre attacks [43], [45]. To protect the backward edge, hardware CFI proposals use a Shadow Call Stack (SCS), which is protected from normal memory reads and writes and can only be manipulated through special instructions [36]. Similar to the RSB, the SCS retains the return addresses of previously executed calls. The differences are: (1) the SCS resides in memory, so it is saved and restored across context switches, whereas the RSB is a special cache in the CPU whose contents are shared across different contexts; and (2) the SCS is used only for CFI enforcement and its size is configurable, whereas the RSB is used only for speculation and, since misspeculation was thought to be only a performance problem, is a best-effort structure of limited size that is not maintained precisely.
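The practical consequence of difference (2) can be illustrated with a small Python sketch: a fixed-size, circular RSB mispredicts returns once the call depth exceeds its capacity, whereas a precisely maintained stack, as an SCS would be, does not. The 16-entry size and the wrap-around behavior are assumptions; real processors differ, for example in what they do on underflow.

```python
from typing import List, Optional


class ReturnStackBuffer:
    """Fixed-size circular RSB (sketch): overflow overwrites the oldest entry."""

    def __init__(self, size: int = 16) -> None:
        self.size = size
        self.slots: List[Optional[int]] = [None] * size
        self.top = 0  # pushes minus pops; the slot index is top % size

    def push(self, ret_addr: int) -> None:
        self.slots[self.top % self.size] = ret_addr
        self.top += 1

    def pop(self) -> Optional[int]:
        self.top -= 1
        return self.slots[self.top % self.size]


depth = 20                 # recursion deeper than the RSB
rsb = ReturnStackBuffer(16)
scs: List[int] = []        # precise stack of return addresses
for i in range(depth):
    rsb.push(0x1000 + i)
    scs.append(0x1000 + i)
mispredicted = sum(rsb.pop() != scs.pop() for _ in range(depth))
```

With a call depth of 20, the last four returns are mispredicted because their RSB slots were overwritten by the four deepest calls.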
A. Combined Speculation-consistent RSB/SCS: Overview
To defend against Spectre-RSB attacks, we combine the traditional RSB and SCS into a unified structure, RSB/SCS, that acts as both the RSB and the shadow call stack. Conceptually, the RSB in our design can be viewed as an in-processor cache for the in-memory SCS. We note that this differs from other SCS implementations, which keep the RSB separate. By obtaining speculation targets from the precisely maintained SCS, consistent with the philosophy of SPECCFI, we move the CFI guarantees to the speculation stage, closing the Spectre-RSB vulnerability.
The RSB/SCS design has requirements beyond those of a conventional SCS. Specifically, since it is used to obtain speculation targets, it must track additional speculative state without affecting the committed state of the SCS. We describe the overall design in the remainder of this section.
When a context switch occurs, the committed RSB/SCS entries must be saved so that they can be restored when the program runs again. To keep the state of this structure consistent, we extend the reorder buffer (ROB, the structure that tracks speculative instructions and their register values before they commit) to track this state. Specifically, we add a logical register OLD_RS, which is subject to renaming and holds the return address that is pushed to the RSB/SCS by a call instruction or popped from the RSB/SCS by a return instruction. In addition, we keep a pointer to the last committed entry (LCP) of the RSB/SCS, so that the committed entries can be saved and restored on a context switch or spilled to memory on overflow. At the decode stage, if the instruction is a call, the next address is "speculatively" pushed to the RSB/SCS. When this instruction commits, the LCP is updated to point to the last committed entry. If the instruction is decoded as a return, it "speculatively" pops a return address from the RSB/SCS into OLD_RS (without changing the LCP) and sets the program counter to this address for the next instruction fetch. To support conventional CFI, when the return instruction reaches the commit stage, the value of the OLD_RS register is compared with the return address on top of the traditional software stack. If these two values do not match, a CFI violation exception is raised. Otherwise, the return instruction commits, and the LCP is decremented by 1 to point to the next committed entry in the RSB/SCS.
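The decode and commit behavior described above can be sketched in Python as follows. The class and method names are hypothetical, the ROB is reduced to a queue of (op, OLD_RS) records, the LCP is modeled as a count of committed entries, and underflow, spilling, and context switches are omitted.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple


class CFIViolation(Exception):
    pass


class RsbScs:
    """Unified RSB/SCS sketch. entries[:lcp] are committed return addresses;
    anything above lcp was pushed by in-flight (speculative) calls."""

    def __init__(self) -> None:
        self.entries: List[int] = []  # top of stack is entries[-1]
        self.lcp = 0                  # number of committed entries
        # ROB records for in-flight calls/rets: (op, OLD_RS value).
        self.rob: Deque[Tuple[str, int]] = deque()

    def decode_call(self, return_addr: int) -> None:
        self.entries.append(return_addr)        # speculative push
        self.rob.append(("call", return_addr))

    def decode_ret(self) -> int:
        old_rs = self.entries.pop()             # speculative pop (no refill here)
        self.rob.append(("ret", old_rs))
        return old_rs                           # predicted next fetch address

    def commit(self, sw_stack_top: Optional[int] = None) -> None:
        """Commit the oldest in-flight call/ret; for a ret, sw_stack_top is
        the return address found on the ordinary software stack."""
        op, old_rs = self.rob.popleft()
        if op == "call":
            self.lcp += 1
        elif old_rs != sw_stack_top:
            raise CFIViolation(f"software stack {sw_stack_top!r} != OLD_RS {old_rs:#x}")
        else:
            self.lcp -= 1


s = RsbScs()
s.decode_call(0x100)
s.decode_call(0x200)
predicted = s.decode_ret()      # 0x200, taken from the precise stack
s.commit()
s.commit()                      # both calls committed: lcp == 2
s.commit(sw_stack_top=0x200)    # matching return: lcp == 1
s.decode_ret()
try:
    s.commit(sw_stack_top=0xDEAD)  # corrupted software return address
    hijack_detected = False
except CFIViolation:
    hijack_detected = True
```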
We considered whether the stack needs additional ports, since it serves not only committed instructions but also speculative calls and returns. However, we found that additional ports do not improve performance, because the speculative SCS state is held primarily in the port-rich reorder buffer. When the in-processor cache (RSB) overflows, or when the current thread is about to be swapped out, we spill its contents to the hardware-protected in-memory SCS. When the RSB underflows, or when a new thread is swapped in, we load entries from the SCS. We have not yet explored optimizations that prefetch values from the SCS when the RSB is close to empty, or that proactively push values to memory when the RSB is close to full.
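The spill and refill policy can be sketched in Python, assuming a hypothetical 4-entry on-chip RSB and a plain list standing in for the hardware-protected in-memory SCS. Refilling one entry at a time on underflow is a simplification; as stated above, entries are also loaded when a thread is swapped in, and no prefetching is modeled.

```python
from collections import deque
from typing import Deque, List


class SpillingRsb:
    """Sketch of the on-chip RSB acting as a cache of the in-memory SCS.
    Only push/pop with spill and refill is modeled (no speculation tracking)."""

    def __init__(self, capacity: int = 4) -> None:
        self.capacity = capacity          # hypothetical on-chip size
        self.rsb: Deque[int] = deque()    # on-chip entries, top at the right
        self.memory_scs: List[int] = []   # stand-in for the in-memory SCS

    def push(self, ret_addr: int) -> None:
        if len(self.rsb) == self.capacity:
            # Overflow: spill the oldest entry instead of overwriting it.
            self.memory_scs.append(self.rsb.popleft())
        self.rsb.append(ret_addr)

    def pop(self) -> int:
        if not self.rsb:
            # Underflow: refill one entry from the in-memory SCS.
            self.rsb.append(self.memory_scs.pop())
        return self.rsb.pop()

    def swap_out(self) -> None:
        # Thread is descheduled: move all on-chip entries to memory.
        self.memory_scs.extend(self.rsb)
        self.rsb.clear()


stack = SpillingRsb(capacity=4)
for i in range(10):
    stack.push(i)
spilled = len(stack.memory_scs)   # 6 entries spilled to memory
stack.swap_out()                  # e.g., a context switch
returns = [stack.pop() for _ in range(10)]
```

Unlike the circular RSB, every return address comes back in exact reverse order, regardless of call depth or intervening context switches.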
B. Misprediction Recovery
Every ret instruction uses the RSB/SCS to predict its jump target. Since the state of the RSB/SCS is modified by speculative call and ret instructions, the CPU has to recover the correct state of the structure in case of misspeculation.
When misspeculation is detected, we need to flush all the speculated instructions from the pipeline. As a part of this process, we have to annul all the corresponding entries from the ROB. During annulment, for every call or return instruction, we not only remove the ROB entry but also update the RSB/SCS to preserve the consistent state of the structure. If the instruction is a call, the top of the RSB/SCS will be popped. In the case of a ret instruction, the value of OLD_RS will be pushed back to the RSB/SCS.
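The annulment rule can be sketched as follows, assuming the ROB is a Python list ordered oldest to youngest and the RSB/SCS is a list whose last element is the top. The function name `annul_after` and the `RobEntry` layout are hypothetical; the undo actions (pop for a call, push OLD_RS back for a ret) are the ones described above. Walking from the youngest entry backwards matters, because each undo assumes every younger call/ret has already been undone.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RobEntry:
    op: str                       # "call", "ret", or any other opcode
    old_rs: Optional[int] = None  # address pushed by a call / popped by a ret


def annul_after(rob: List[RobEntry], branch_index: int, rsb: List[int]) -> None:
    """Squash every ROB entry younger than rob[branch_index], youngest first,
    undoing each call/ret's effect on the RSB/SCS (top of stack = end of list)."""
    while len(rob) > branch_index + 1:
        entry = rob.pop()
        if entry.op == "call":
            rsb.pop()                 # undo the speculative push
        elif entry.op == "ret":
            rsb.append(entry.old_rs)  # restore the speculatively popped address
```

For instance, if a mispredicted jz is followed by a speculative ret (popping 0x200) and a speculative call (pushing 0x300), annulling back to the jz first pops 0x300 and then pushes 0x200, returning the stack to its pre-branch contents.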
C. RSB/SCS Work Flow
To clarify how this structure works, we step through the code sample presented in Figure 5. Assume that both calls, to function1 and function2, have pushed their return addresses to the RSB/SCS. When these instructions commit, the LCP is updated to point to the last committed entry, and the corresponding entries are removed from the ROB. In the second step, the return instruction from the first call executes speculatively, saves the popped return address in the ROB, and eventually commits. The following speculative call to function3 pushes its return address to the RSB/SCS. Next, the execution of the return instruction and the following call to function4 change the RSB/SCS state. Now assume that a misspeculation on the jz instruction is detected and every instruction executed after the branch has to be flushed. The recovery process therefore annuls instructions starting from the last entry in the ROB until it reaches the misspeculated instruction. Annulling the last call in the ROB pops the value at the top of the RSB/SCS, and annulling the return pushes the OLD_RS value saved in its ROB entry back to the RSB/SCS, restoring the state that preceded the misspeculation.
D. Preventing RSB Poisoning
Since the RSB/SCS is not shared between threads and its state is saved and restored across context switches, the attacker is not able to poison this structure. Although we allow special instructions to manipulate the SCS to handle cases such as setjmp/longjmp, we assume these instructions are only available to code within the trusted computing base, which prevents them from being abused to arbitrarily manipulate the RSB/SCS (which is not a Spectre vulnerability).
VI. SECURITY ANALYSIS
In this section, we analyze whether SPECCFI can achieve our design goal: preventing attackers from launching branch target injection attacks to leak memory content of the victim program that is otherwise not accessible to the attackers.
A. Guarantees against Branch Target Injection
Branch target injection attacks target two prediction components: the branch target buffer (BTB) and the return stack buffer (RSB). CFI does not prevent attackers from modifying control data (e.g., function pointers and return addresses); instead, it prevents them from arbitrarily redirecting control flow to the code gadgets they want. Similarly, SPECCFI does not prevent BTB injection: attackers can still insert arbitrary prediction targets into the BTB by executing branches inside their own protection domain [26]. What SPECCFI guarantees is that if the injected target is not a valid indirect control transfer target in the victim protection domain, then the injected prediction target will not be executed speculatively, i.e., attackers cannot speculatively execute arbitrary code gadgets. For the RSB, SPECCFI essentially converts it into a cache for the shadow call stack (SCS) that is flushed and restored during context switches, so both in-address-space and cross-address-space injection are no longer feasible.
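The forward-edge guarantee can be made concrete with a sketch of the decode-stage check over a stream of decoded instructions. It assumes that a mismatch is handled by injecting an lfence, as the evaluation describes ("inserting lfence only when the CFI check fails"); the `Insn` type and the opcode names `icall` and `ijmp` are hypothetical placeholders, while `cfi_lbl` and CFI_REG are the names used in the text.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Insn:
    op: str                      # e.g. "icall", "ijmp", "cfi_lbl", "add", "load"
    label: Optional[int] = None  # CFI label carried by icall/ijmp/cfi_lbl


def decode_stream(insns: Iterable[Insn]) -> List[str]:
    """Return the micro-op stream after the speculative CFI check."""
    out: List[str] = []
    cfi_reg: Optional[int] = None  # label expected at the predicted target
    for insn in insns:
        if cfi_reg is not None:
            # First instruction fetched from the (possibly injected) predicted target.
            if not (insn.op == "cfi_lbl" and insn.label == cfi_reg):
                out.append("lfence")  # stop speculation until the branch resolves
            cfi_reg = None
        out.append(insn.op)
        if insn.op in ("icall", "ijmp"):
            cfi_reg = insn.label
    return out
```

A BTB entry injected by the attacker that points at code without the matching label therefore produces a fence instead of a speculative gadget execution, while correctly predicted targets proceed unhindered.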
Impact of Imprecise CFG: One weakness of static CFG construction is imprecision, which may still allow attackers to launch attacks using permitted function-level gadgets [14], [16], [25], [56]. Since SPECCFI also relies on CFI analysis to provide valid targets for forward-edge indirect control transfers, it suffers from the same weakness: misprediction to a wrong but CFG-permitted target is still possible. However, because of some unique characteristics of Spectre attacks, this weakness does not pose a significant security threat, as we discuss in the next subsection.
B. Incorporating Defense against Spectre-PHT
SPECCFI alone can only mitigate Spectre-BTB and Spectre-RSB attacks. In this subsection, we discuss how SPECCFI can and should be combined with Spectre-PHT defenses to defend against all Spectre variants. To defend against Spectre-PHT attacks, researchers have proposed code analysis techniques [31], [47], [70] that (1) identify dangerous code gadgets that can be used to leak information and (2) selectively insert serialization instructions (e.g., lfence) to prevent these gadgets from being executed speculatively. One tricky part of such analysis is that, although direct control transfers are always correct on the committed path, even direct control transfers can be wrong during speculation. For example, consider a direct call behind a conditional branch: if the prediction on the conditional branch is wrong, then the following direct call is also on a wrong path. For this reason, when analyzing code to identify potentially dangerous gadgets for Spectre-like attacks, one must perform inter-procedural analysis (for both direct and indirect calls) to account for gadgets that span function calls. The unique opportunity here is that, if the static analysis that identifies and eliminates Spectre gadgets uses the same CFG as CFI enforcement, then malicious gadgets at the beginning of functions should already be eliminated. As a result, when combined with such defenses, even if SPECCFI allows misspeculation due to an imprecise CFG, the wrong target cannot be used to launch attacks, because the gadgets have already been eliminated.
At the same time, defenses against Spectre-PHT attacks also rely on a SPECCFI-like technique. The reason is the same as why inline reference monitors like SFI [46], [74] have to enforce a certain degree of control-flow regulation: if attackers can hijack the control flow to arbitrary locations, then they can easily bypass the inserted checks and thus the protection. This is especially dangerous for variable-length ISAs like x86, where attackers can jump into the middle of an instruction and start executing new logic. SPECCFI provides the same runtime guarantee to Spectre-PHT defenses: by enforcing that even speculative control flow cannot deviate from the CFG used in static analysis, it ensures that the code that is analyzed and instrumented is the same as the code that is executed.
C. Comparison to Intel CET
A few days before the submission of this paper, Intel published a new specification of its CET [36] extensions. The new specification includes a paragraph (section 3.8) indicating their plans to include a check that an indirect branch executed speculatively targets a legal Branch_end target. Intel suggested this solution, which is essentially the configuration of SPECCFI using CET as the CFI implementation, concurrently with our work.
We believe that Intel's interest in this solution validates its practicality as a defense against transient speculation attacks. While the updated CET specification describes only the general idea, our work contributes a reference implementation and an assessment of both the performance and the security of the solution. In addition, SPECCFI provides substantial security advantages over the new CET, including:
• Backward edge protection using the speculation aware shadow stack. While Intel CET uses a shadow stack to protect the backward edge for committed instructions, the specifications describe no plans to use it for limiting speculation. It is not trivial to extend the shadow stack to track the speculative state, as we describe in Section V.
• Generalized CFI protection and limiting control-flow bending. CET only enforces that control flow (whether committed or, in the new specification, speculative) goes to the start of a legal basic block. As a result, it allows arbitrary control-flow bending [16], which does not meaningfully restrict attack opportunities. In contrast, SPECCFI admits any CFI implementation, which can substantially shrink the control-flow bending attack possibilities. Specifically, from a given indirect control-flow instruction, only gadgets with a matching CFI label are reachable. State-of-the-art CFI systems such as PathArmor/Context-Sensitive CFI [67] can be supported, substantially limiting control-flow bending opportunities. In particular, we intend to explore supporting μCFI [30] in future work, which would leave no control-flow bending opportunities.
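The difference between the two enforcement granularities can be sketched as a reachability question: which functions may a mispredicted indirect call reach? The sketch assumes, purely for illustration, that every address-taken function starts with an endbr64 (coarse, CET-style) and that fine-grained labels are derived from function type signatures; the function `reachable_targets` and the example function names are hypothetical.

```python
from typing import Dict, Set


def reachable_targets(signatures: Dict[str, str], call_site_sig: str,
                      fine_grained: bool) -> Set[str]:
    """Return the functions a (possibly mispredicted) indirect call may reach.

    `signatures` maps each hypothetical address-taken function to its type
    signature; using the signature as the CFI label is an assumption.
    """
    if not fine_grained:
        # CET-style coarse check: any entry point marked with endbr64 is legal.
        return set(signatures)
    # Label-based check: the call site's label must match the target's label.
    return {name for name, sig in signatures.items() if sig == call_site_sig}
```

With four functions of which two share the call site's signature, the coarse check leaves all four reachable, while the label check narrows the set to two; richer CFI policies would narrow it further.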
VII. EVALUATION
In this section, we evaluate SPECCFI in terms of performance and hardware complexity. All experiments were conducted using MARSSx86 (Micro Architectural and System Simulator for x86) [52], a widely used cycle-accurate simulator. MARSSx86 is built using PTLsim [76] and performs full-system simulation (including the OS) on top of the QEMU [12] emulator. First, we configured MARSSx86 to simulate an Intel Skylake processor; the configuration is shown in Table III. We then integrated SPECCFI into the simulator, modeling all new operations realistically and in full detail to retain hardware-faithful, cycle-accurate modeling of the extended processor pipeline.
A. Performance Evaluation
We use the SPEC2017 benchmarks [2] for evaluation; this standard benchmark suite is used to evaluate the impact of processor modifications on representative applications that exhibit a range of behaviors. All benchmarks were compiled using an LLVM compiler that is …
One option to prevent Spectre attacks is to insert fences that stop speculation around indirect control-flow instructions. To evaluate SPECCFI performance, we compare it against the following design points:
• Baseline: this is an unmodified, unprotected machine. Specifically, we compile and run the SPEC2017 benchmarks using the unmodified LLVM compiler and MARSSx86 simulator. In all of our experiments, we report performance using Instructions committed Per Cycle (IPC), a common metric for evaluating processor performance. The IPC values of the defenses are normalized to this baseline without defenses; thus, a normalized value higher than 1 indicates better-than-baseline performance.
• Retpoline-style software fencing: we implement a system that adds fences to indirect branches in software. The compiler is modified to replace all indirect branches and return instructions with instruction sequences that ensure the branch targets are resolved before any following instruction that might touch the cache (i.e., a load) is issued. To protect forward edges (i.e., indirect calls and jumps), each indirect call is converted into three instructions: a load that prepares the target register/memory value, an lfence ensuring that no later load is issued before the branch is resolved, and the actual call to the address in the target register. Taking the same approach to secure backward edges (i.e., returns), we replace each ret instruction with a pop from the top of the software stack into the target register, an lfence that stops speculation until the actual return target is resolved, and a jmp that transfers control to the target. Conceptually, this solution is similar to the Retpoline defense [65], which essentially replaces speculation on indirect branches with an empty stall gadget. Unlike Retpoline, we also insert fences for returns (Retpoline does not protect returns, leaving code vulnerable to Spectre-RSB attacks). This software approach has the advantage of not modifying the underlying hardware but imposes a noticeable overhead in instruction count and code size.
• All Target Fencing: since this defense is possible without CFI, we include an implementation that inserts an lfence in hardware at the target of each indirect branch and return. Every indirect call, jump, or return is detected in the decode stage of the pipeline, and an lfence is inserted at its target to make sure that the branch is resolved before further instructions are issued.
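The Retpoline-style compiler transformation can be sketched as a rewrite over Intel-syntax x86-64 assembly lines. This is an illustrative sketch, not the compiler pass used in the evaluation: the function `fence_indirect` is hypothetical, only a few operand forms are recognized, and the scratch register r11 is an assumption (it is caller-saved and not used for arguments in the System V x86-64 ABI, so clobbering it at a call or return boundary is safe).

```python
import re
from typing import List

SCRATCH = "r11"  # assumed scratch register, not specified in the text
MEM_INDIRECT = re.compile(r"(call|jmp)\s+(qword ptr \[.+\])")
REG_INDIRECT = re.compile(r"(call|jmp)\s+r(ax|bx|cx|dx|si|di|bp|8|9|1[0-5])")


def fence_indirect(asm: List[str]) -> List[str]:
    """Rewrite indirect calls/jumps and returns into fenced sequences."""
    out: List[str] = []
    for line in asm:
        insn = line.strip()
        mem = MEM_INDIRECT.fullmatch(insn)
        if mem:
            # Load the target, fence, then branch through the register.
            out += [f"mov {SCRATCH}, {mem.group(2)}", "lfence",
                    f"{mem.group(1)} {SCRATCH}"]
        elif REG_INDIRECT.fullmatch(insn):
            out += ["lfence", insn]  # target already in a register
        elif insn == "ret":
            # pop + lfence + jmp: no ret instruction remains after rewriting.
            out += [f"pop {SCRATCH}", "lfence", f"jmp {SCRATCH}"]
        else:
            out.append(insn)  # direct branches and other instructions unchanged
    return out
```

Replacing `ret` with `pop`/`jmp` preserves its stack semantics, which is also why the rewritten code contains no backward edges for the RSB to predict.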
The implementations discussed above prevent speculation by inserting lfences into the pipeline. SPECCFI offers a more targeted use of fences for securing forward edges (as discussed in Section IV), as well as a new method for securing backward edges (as explained in Section V). To study the effect of different serializing instructions, we use two types of lfence in our experiments:
• Strict lfences are highly restrictive and prevent any instruction from passing them until the fence retires [63]. This type of fence imposes a high overhead on the system. All x86 serialization instructions, including the lfence we use in our experiments, are categorized as strict fences.
• Relaxed lfences stop only certain types of instructions until the fence retires [63], while letting others through. For example, LSQ-LFENCE [63] prevents any subsequent load from being issued speculatively out of the load/store queue but allows all other instructions to pass it. LSQ-LFENCEs are secure against Spectre because they prevent speculative loads, and they let speculation on other instruction types proceed, substantially reducing the performance impact.
Figure 6 shows the performance overhead of SPECCFI-full (securing both forward and backward edges) in comparison to the All Target Fencing and Retpoline-style software fencing approaches. In general, inserting serializing instructions (e.g., lfence) at the target of every indirect branch is expensive, imposing an average performance overhead of 39% for All Target Fencing and 48% for Retpoline-style fencing. SPECCFI inserts an lfence only when the CFI check fails, which sharply reduces the number of inserted fences and lowers the average performance overhead to less than 1.9%.
To illustrate the reason behind the performance differences between the approaches, Figure 7 shows the number of lfences inserted by each approach. Note that benchmarks such as mcf and omnet are C++ benchmarks that use a large number of indirect branches due to the common use of virtual function calls and function pointers. As a result, many lfences are inserted into the pipeline, with a substantial performance impact compared to the baseline. The only exception to this trend is povray, which has the highest overhead under all the defenses but does not have a particularly large number of lfences compared to the other benchmarks. Looking more closely at this benchmark, we found that it is memory intensive, with the highest number of load and store micro-ops among all the benchmarks. Intel manuals [35] indicate that an lfence is committed only when there is no preceding outstanding store. Thus, for this benchmark, each lfence remains active for a longer period of time before it commits, which explains the high performance impact. It is also worth mentioning that, unlike All Target Fencing and Retpoline-style fencing, which insert an lfence for each indirect branch, the lfences reported for SPECCFI are due to BTB mispredictions. This means that the higher the misprediction rate, the more lfences are inserted.
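Why the fence counts diverge can be sketched by counting fences over a dynamic trace of control transfers. The trace format, with one tuple per transfer holding the kind, the label expected by the branch, and the label found at the predicted target, is a hypothetical simplification; the two counting rules follow the text (one fence per indirect branch or return for All Target Fencing; a fence only on a forward-edge label mismatch for SPECCFI, whose returns rely on the RSB/SCS).

```python
from typing import List, Optional, Tuple

# One dynamic control transfer: (kind, expected_label, label_at_predicted_target).
# kind is "icall", "ijmp" or "ret"; labels are None for returns.
Event = Tuple[str, Optional[int], Optional[int]]


def fences_all_target(trace: List[Event]) -> int:
    # An lfence at the target of every indirect branch and return.
    return len(trace)


def fences_speccfi(trace: List[Event]) -> int:
    # Forward edges: a fence only when the predicted target's label differs
    # from the expected label. Returns use the RSB/SCS instead of fences.
    return sum(1 for kind, expected, predicted in trace
               if kind != "ret" and predicted != expected)
```

A correctly predicted branch never fences under SPECCFI, and a misprediction to a target that happens to carry the same label does not fence either, which is the imprecise-CFG case discussed in the security analysis.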
Since SPECCFI does not protect the backward edges using lfence, in Figure 8 we study the effect of securing the forward and backward edges separately. Note that in Retpoline-style fencing, all return instructions are converted to a sequence of instructions terminating with a jmp, so no ret instruction (i.e., backward edge) remains in code compiled in this setting. Therefore, the overhead measured for Retpoline-style-full is equivalent to the Retpoline-style-forward overhead alone, and the overhead on the backward edge is zero. The breakdown shows that, as expected, the overhead of All Target Fencing generally increases with the number of indirect branches. For SPECCFI, the forward-edge overhead is low because fences are inserted only on CFI mismatches, which indicate branch mispredictions. Therefore, most of the SPECCFI-full overhead comes from the backward edge and is associated with maintaining the RSB/SCS hardware structure. Note that this maintenance also includes the procedures that secure the committed path, so only a portion of this overhead is attributable to the defense against Spectre attacks.
As mentioned previously, strict lfence imposes a higher overhead on the system, while relaxed lfence provides the same security guarantee with lower overhead; we therefore also implemented all discussed defenses with relaxed lfence to study the differences in overhead. Figure 9 examines the effect of relaxed lfence. The results show that the overhead caused by strict lfence is much higher than that of relaxed lfence. Also as expected, using strict instead of relaxed fences causes far more performance degradation when the benchmark is memory intensive (i.e., in this case, has many stores). Just by changing the lfence type from strict to relaxed, the average overhead drops from 48.9% to 22.6% for Retpoline-style fencing and from 39.9% to 18.82% for All Target Fencing. However, these overheads are still substantially higher than that of SPECCFI.
Fig. 9: Performance using relaxed fences (normalized IPC of All Target Fencing-Full, Retpoline-Style-Full, and SpecCFI-Full).
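The strict and relaxed fence semantics compared in Figure 9 can be sketched as an issue rule for instructions younger than a fence that has not yet retired. The opcode classes ("add", "load", "store", ...) and the fence-kind strings are simplified placeholders rather than real micro-op names; the rules themselves follow the descriptions of strict lfences and LSQ-LFENCE [63].

```python
from typing import List


def may_pass_fence(op: str, fence_kind: str) -> bool:
    """Can an instruction younger than a not-yet-retired fence issue?"""
    if fence_kind == "strict":
        return False          # nothing passes until the fence retires
    if fence_kind == "lsq":
        return op != "load"   # LSQ-LFENCE holds back only loads
    raise ValueError(f"unknown fence kind: {fence_kind}")


def issuable(younger_ops: List[str], fence_kind: str) -> List[str]:
    # Instructions behind the fence that may still issue speculatively.
    return [op for op in younger_ops if may_pass_fence(op, fence_kind)]
```

Under the strict rule the whole window stalls, whereas the relaxed rule keeps arithmetic and stores flowing and blocks only the loads that could bring secret-dependent data into the cache.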
B. Hardware Implementation Overhead
To estimate the hardware overheads of SPECCFI, we implemented the basic structures and integrated them within an open core to estimate the area and timing overhead. Specifically, the implementation adds two CFI_REG registers at two locations in the pipeline: (1) the decode stage, to detect CFI violations for speculative instructions, and (2) the commit stage, to detect CFI violations for committed instructions. Since CFI_REG stores CFI labels, its size must match the maximum CFI label size (32 bits in our design). Furthermore, we add two comparators, one in the decode stage and one in the commit stage; the cfi_lbl instruction uses them to compare its label with CFI_REG (to detect violations). Additionally, SPECCFI needs an LCP register that points to the last RSB/SCS entry pushed by a committed call, which distinguishes the entries of speculative instructions from those of committed ones. Since the RSB/SCS has 16 in-processor cache entries, the LCP is 4 bits wide. Moreover, new entries can be added to the RSB/SCS at two points: (1) when executing a call instruction and (2) when loading preserved RSB/SCS entries from memory on underflow. Therefore, we increased the number of write ports from 1 to 2. The same applies to the number of read ports, since the RSB/SCS may be used to fetch the next instruction while spilling over to memory on overflow. In addition, to preserve the correct behavior of the RSB/SCS, we provide two LCP update mechanisms: (1) -/+1 for regular push/pop operations and (2) -/+4 for handling overflow and underflow of the structure. The RSB/SCS itself did not noticeably increase complexity or area.
To measure the impact of the SPECCFI implementation on power, area, and cycle time, we modified the open-source AO486 processor [8] to include the SPECCFI design in Verilog. We used Quartus 2 17.1 to synthesize the resulting processor for a DE2-115 FPGA board [1]. The results in Table IV show that SPECCFI has very low implementation complexity. In terms of power, there is a 0.4% increase in core dynamic and static power; although power is difficult to measure accurately, we used the power analysis tool provided by Quartus after synthesis to obtain more accurate estimates. In terms of area, there is a 0.1% increase in total logic elements. Moreover, since the SPECCFI design is simple, it fits within the optimized frequency of the core and thus has no effect on cycle time. The AO486 processor implements the 80486 ISA using a 32-bit in-order pipeline. These results are therefore relative to a small pipelined core; we expect the relative overheads to be much smaller for a modern out-of-order superscalar core.
C. Empirical Security Evaluation
1) Against real exploits: To verify our analysis, we evaluated the effectiveness of SPECCFI against real-world exploits. We ran previously disclosed Spectre-BTB [41], Spectre-RSB [43], and SMoTherSpectre [13] PoCs inside the emulator. Table V summarizes the results, using the classification scheme proposed in [15]. The results show that SPECCFI prevented all information leaks.
2) Impact of CFG precision: To study the difference between coarse-grained CFI (e.g., Intel CET [36]) and fine-grained CFI (e.g., SPECCFI) against BTB injection attacks, we used SMoTherSpectre [13] as a demonstration. In this attack, the attacker has to find a BTI gadget in the victim process that loads a secret into a register and terminates with an indirect branch, in order to perform BTB injection. By poisoning the BTB, the attacker transfers control to a SMoTher gadget to leak the secret. A SMoTher gadget starts with a comparison based on the target register, followed by a conditional jump, which enables SMoTherSpectre to leak the secret through a port-contention side channel. Figure 10 compares the required SMoTher gadgets and the feasibility of the attack under coarse-grained and fine-grained CFI. Table VI shows the number of available SMoTher gadgets in several standard libraries. Using the constraints for SMoTher gadgets in [13], we scanned for valid SMoTher gadgets in the first 70 instructions after label instructions (endbr64 and cfi_lbl). For SPECCFI, we used a function-signature-based approach for generating labels [49], [50]. Although fine-grained CFI still permits some gadgets, the number is much smaller.
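The scanning methodology can be sketched over a list of disassembled instructions. The gadget test below is a deliberately simplified stand-in (a cmp/test followed by a conditional jump before any unconditional control transfer) and not the exact constraints of [13]; the function names and example instructions are hypothetical. The point it illustrates is that the coarse-grained count considers every label instruction a possible target, while the fine-grained count admits only labels matching the hijacked branch.

```python
import re
from typing import Iterable, List

WINDOW = 70  # instructions scanned after each label, as in the text


def has_gadget(insns: List[str], label_idx: int, window: int = WINDOW) -> bool:
    """Simplified SMoTher-like test (an assumption, not the criteria of [13])."""
    seen_cmp = False
    for insn in insns[label_idx + 1: label_idx + 1 + window]:
        op = insn.split()[0]
        if op in ("cmp", "test"):
            seen_cmp = True
        elif re.fullmatch(r"j(?!mp)[a-z]+", op):  # conditional jump (not jmp)
            if seen_cmp:
                return True
        elif op in ("jmp", "call", "ret"):
            break  # stop at the first unconditional control transfer
    return False


def count_gadgets(insns: List[str], entry_points: Iterable[str]) -> int:
    """Count label instructions accepted as targets that lead to a gadget."""
    allowed = set(entry_points)
    return sum(1 for i, insn in enumerate(insns)
               if insn in allowed and has_gadget(insns, i))
```

For example, with two cfi_lbl targets that both lead to a compare-and-branch, allowing only the label expected by the injected branch halves the count, mirroring the trend in Table VI without reproducing its numbers.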
VIII. RELATED WORK
Since the initial announcement of Spectre and Meltdown in January 2018, several Spectre variants have appeared [26], [40], [41], [43], [45]. Spectre attacks are characterized by manipulating the prediction mechanisms to trigger speculation to an attacker-chosen gadget. They differ in what they exploit to trigger speculation: the branch direction predictor (variant 1, variant 1.1) [26], [40], [41], the branch target predictor (or branch target buffer) for variant 2 [41], the return stack buffer for Spectre-RSB (also called variant 5) [43], [45], or the load-store aliasing predictor for variant 4 [29]. To mitigate these attacks, several software and hardware defenses have been proposed, ranging from programming guidelines for cryptographic software developers [18] to architectural changes [38], [73]. In this section, we overview these defenses, categorized by the Spectre variants they defend against. Table VII shows the Spectre defenses and which attacks they mitigate, and Table VIII shows their impact on hardware complexity, software modifications, and performance. SPECCFI is the only defense that provides complete protection against all Spectre attacks with little impact on performance and implementation overhead. Note that we are not considering Meltdown-style attacks [44], [48], [57], [58], [66], [69] since they rely on speculation within a single instruction and therefore do not rely on manipulating the branch prediction structures.
Tables VII and VIII (defenses compared): LFENCE [6], [9], [32]; IBRS, IBPB, STIBP [6], [34]; SLH [19]; YSNB [51]; Retpoline [65]; RSB Stuffing [33]; CSF [63]; ConTExT [59]; SPECCFI.
A. Spectre-PHT Defenses
Spectre-PHT exploits the PHT of the branch predictor to perform the attack. To defend against this attack, Intel, AMD, and ARM proposed using instructions that serialize execution (e.g., lfence) to stop speculation around branches (e.g., in both directions of the branch) [6], [9], [32]. Although liberal serialization (e.g., at every branch instruction) can mitigate Spectre-PHT attacks, it severely hurts performance [32]: serializing all branch instructions eliminates the performance benefit of the branch predictor (e.g., up to 10x slowdown [51]). To address this drawback, multiple proposals use static analysis to reduce the number of serializing instructions, serializing execution only around exploitable gadgets [31], [32], [47], [70]. However, these approaches miss some exploitable gadgets [42]. Another weakness of these defenses is that, even though they stop speculative execution around exploitable gadgets, they do not stop speculative code fetches and other micro-architectural behaviors before execution (e.g., instruction cache and iTLB fills), which can still leak data [60].
Speculative Load Hardening (SLH) [19] and You Shall Not Bypass (YSNB) [51] try to reduce the high overhead of liberal fencing. They identify Spectre gadgets and then inject artificial dependencies between branches and the identified gadgets, which reduces the speculation window of the attack. Although this yields a performance advantage over liberal fencing, these approaches still incur 36%-60% performance overhead [63]. An attractive hardware solution is Context-Sensitive Fencing (CSF) [63]. CSF is a microcode mitigation technique in which serialization instructions are added dynamically based on runtime conditions that identify potential exploit execution. Injecting serialization instructions dynamically reduces the performance impact of stopping speculation, resulting in low performance overhead. Moreover, CSF proposed defending against Spectre-BTB and Spectre-RSB using a special fence that flushes the BTB/RSB when transferring control to higher privilege domains. However, flushing the BTB and RSB hurts performance because it causes more mispredictions. In addition, on simultaneous multithreading (SMT) processors, flushing the BTB/RSB after a control transfer is not enough to protect against Spectre-BTB and Spectre-RSB, since other threads can pollute them after the control transfer.
B. Spectre-BTB and Spectre-RSB Defenses
Spectre-BTB exploits the BTB and Spectre-RSB exploits the RSB to perform the attack. Google proposed Return Trampoline (retpoline) [65] as a software mitigation against Spectre-BTB that replaces indirect branches with a push+return instruction sequence, preventing BTB poisoning. However, this solution has a high performance overhead since it stops speculation (similar to serialization). In addition, it can be bypassed using ret instructions, since they can misspeculate through the BTB: starting with Skylake, Intel processors predict the target of a ret instruction from the BTB when the RSB underflows. To address this, RSB stuffing [33] was proposed to intentionally fill the RSB with benign delay gadgets on context switches to avoid such misspeculation. Besides partially mitigating Spectre-BTB (when ret is used to trigger speculation through the BTB), this technique also defends against cross-domain Spectre-RSB attacks. However, since the RSB is refilled on context switches, the entries stored for the currently running process are lost when execution switches back to it (i.e., a performance loss due to lost prediction information). In contrast, SPECCFI saves the committed RSB entries of a process when it is switched out and restores them when execution returns to it, which improves the prediction accuracy of ret instructions.
Intel and AMD added new instructions to their instruction set architecture (ISA) that can control indirect branches to defend against Spectre-BTB [6], [34] . The addition consists of three controls:
• Indirect Branch Restricted Speculation (IBRS): allows processors to enter IBRS mode (privileged mode), in which the execution of indirect branches is not influenced by code running in less privileged modes.
• Single Thread Indirect Branch Prediction (STIBP): will not allow a hyperthread running on a core to use branch predictor entries inserted by the other thread running on the same core.
• The Indirect Branch Predictor Barrier (IBPB): allows processors to flush BTB and clear their state. This way the code executed before the barrier cannot impact branch prediction of the code executed after this instruction. These new ISA instructions defend only against Spectre-BTB. In addition, they have a high performance overhead; up to 24% on Skylake and up to 53% on Haswell [23] .
C. Spectre All Variants Defenses
Several mitigations have been proposed to defend against all variants of Spectre. Dynamically Allocated Way Guard (DAWG) [39] provides isolation between protection domains by partitioning the cache at way granularity. Although this method can prevent data leakage through a cache side channel, it requires software to manage domain enforcement, it only addresses the cache as a leakage source, and it cannot protect against attacks performed within the same address space or isolation domain. In addition, since it is a cache-specific defense, other micro-architectural structures (e.g., the branch predictor) can still be used for communication.
SafeSpec [38] and InvisiSpec [73] are hardware mitigation techniques that, like DAWG, try to prevent side-channel communication from speculative instructions. They hide the side effects of speculative execution on the micro-architectural state by adding shadow structures for caches and Translation Lookaside Buffers (TLBs) that hold the transient effects of speculative instructions. These effects are committed to the caches and TLBs only when speculation is deemed correct; otherwise, they are flushed from the shadow structures. Although these solutions outperform software solutions, they require disruptive changes to the processor/memory architecture and consistency models.
ConTExT [59] proposed protecting secret data from speculative execution. It introduces a new memory mapping (called non-transient mapping) that marks data that must not be accessed by speculative instructions. However, this solution requires changes to the architecture and the operating system, requires developers to annotate secret data, and incurs a high performance overhead for security-critical applications.
IX. CONCLUDING REMARKS
In this paper, we presented a new design that, for the first time, protects against misspeculation targeting the branch target buffer (BTB) and the return stack buffer (RSB). These attacks are arguably the most dangerous speculation attacks because they can bypass compiler-inserted fences. Prior defenses either excluded these attacks from their threat model or restricted speculation so aggressively that they dramatically degraded performance. In contrast, SPECCFI provides complete protection against these dangerous attacks, with little impact on performance and with minimal hardware complexity.
SPECCFI introduces the idea of using CFI, previously explored as a protection against control-flow hijacking attacks on committed instructions (i.e., even on non-speculative processors), as a defense against speculation attacks. In particular, SPECCFI verifies forward-edge CFI on instructions in the speculative path, allowing speculation only if the CFI labels match, and verifies the backward edge using a unified shadow call stack. Essentially, SPECCFI moves the CFI check to the decode stage of the pipeline, preventing speculative execution of instructions unless they conform to the CFI annotations. For normal programs, this results in little performance degradation, since it only prevents speculation to targets with mismatching CFI labels, which would be misspeculation anyway. By stopping such misspeculation, we also avoid the cache pollution and other resource waste it causes.
Combined with recent proposals to mitigate Spectre-PHT, we believe SPECCFI completely mitigates the threat from speculation attacks. Moreover, it does so without sacrificing the performance benefits of speculative execution and with minimal modifications to the processor pipeline.
