Intel Software Guard Extensions (SGX) is a promising hardware-based technology for protecting sensitive computation from potentially compromised system software. However, recent research has shown that SGX is vulnerable to branch-shadowing, a side channel attack that leaks the fine-grained (branch-granularity) control flow of an enclave (SGX-protected code), potentially revealing sensitive data to the attacker. The previously proposed defense mechanism, called Zigzagger, attempted to hide the control flow, but has been shown to be ineffective if the attacker can single-step through the enclave using the recent SGX-Step framework.
INTRODUCTION
Intel Software Guard Extensions (SGX) 1 is a recent hardware-based Trusted Execution Environment (TEE) providing isolated execution and guaranteeing the integrity and confidentiality of data within an enclave. The enclave is protected from all other software on the platform, including potentially malicious system software (e.g., operating system, hypervisor, and BIOS). Additionally, SGX enables hardware-based measurement and attestation of enclave code.
Although Intel has stated that side-channel attacks are beyond the scope of SGX 2, recent research has demonstrated that SGX is susceptible to several side-channel attacks, which could leak secret information. In particular, Lee et al. [10] demonstrated a branch-shadowing side channel attack that allows untrusted software to learn the precise control flow of code running inside an enclave. If this control flow depends on any secret information, this side channel would leak the secret information. This attack abuses the CPU's Branch Prediction Unit (BPU), which is used to improve performance by allowing pipelining of instructions before exact branching decisions are known, i.e., whether or not branches are taken, and the targets of indirect branches. The BPU bases its decisions on recent branch history, which is stored in the CPU's internal Branch Target Buffer (BTB). Two critical factors allow this attack to proceed: 1) BTB entries created by branches inside the enclave are not cleared when the enclave exits; and 2) BTB entries only contain the lower 31 bits of the branch instruction's address, allowing the attacker to create shadow branch instructions outside the enclave that map to the same BTB entries as the enclave's branches. The attacker executes the victim enclave, interrupts it immediately after the branch instruction, executes the shadow branch code, and checks whether the branches were correctly predicted, thus revealing whether the BTB entry had been created by the enclave.
Lee et al. [10] also proposed a software-based defense against branch-shadowing, called Zigzagger. Using compile-time instrumentation, Zigzagger converts all conditional and unconditional branches into unconditional branches targeting Zigzagger's trampolines, i.e., minimal code sections that hold intermediate jumps (bounces) to the target locations. The Zigzagger trampolines initiate a series of jumps back-and-forth to different branches. The idea is that the attacker cannot interrupt the enclave with sufficient precision to shadow the target branch in this rapid series of jumps. However, SGX-Step [15] invalidates this assumption by showing how an enclave can be interrupted with single instruction granularity, thus breaking the Zigzagger defense.
The recent Spectre [9] attacks, and their subsequent SGX-specific SGXPectre variant [4] are similar to branch-shadowing in that they exploit the BPU. However, we have confirmed experimentally that neither recent firmware patches, nor the Retpoline compiler-based mitigation affect the ability to perform branch-shadowing attacks.
To overcome this challenge, we present a new defense against branch-shadowing that remains effective even if the attacker can single-step through the enclave. Similar to Zigzagger, we use compile-time modifications to convert all branch instructions into unconditional branches targeting our in-enclave trampoline code. At run-time, we then randomize the layout of our trampoline, forcing the attacker to shadow all possible locations. The finite size of the BTB limits the number of guesses the attacker can perform, and thus we can quantify and limit the success probability of a branch-shadowing attack using the size of the trampoline as a tunable security parameter.
Our contributions are therefore:
• Experimental analysis demonstrating that the recent Spectre mitigation techniques do not affect the branch-shadowing attack (Section 3).
• A new approach for defending against branch-shadowing attacks, even in the presence of single-step enclave execution, using control flow randomization (Section 5).
• An initial LLVM implementation of our solution (Section 6) 3 and a quantitative evaluation of its performance and security guarantees (Section 7).
BACKGROUND
2.1 Branch Prediction
Intel CPUs use instruction pipelining to load and execute instructions in batches. This allows optimizations such as parallelizing and reordering of instructions. The CPU also performs speculative execution, i.e., it uses the BPU to predict which branches will be taken, and executes them before knowing if they are taken. 4 In modern microprocessors, the BPU typically consists of two main subsystems, a BTB and a directional predictor. The BTB is used to predict the targets of indirect branches. 5 Whenever a branch is taken, a new record is created in the BTB associating the branch instruction's address with the target address. Upon encountering subsequent branch instructions, the BPU checks the BTB for the branch instruction address and, if an entry exists, it predicts that the current branch instruction will behave in the same way. The exact details of the BTB lookup algorithms, hashing and size are not public, but the BTB size on Intel Skylake CPUs has been experimentally determined to be 4096 entries [10]. The directional predictor is used to predict whether or not a conditional branch will be taken [5].
Multiple processes executing on the same core share the same BPU, allowing an attacker to misuse the BPU across processes to infer the target and direction of branch instructions [5, 10] .
Branch-shadowing Attacks on SGX
In the branch-shadowing attack by Lee et al. [10] , the attacker first statically analyzes the unencrypted enclave code and enumerates all branches (i.e., conditional, unconditional, and indirect) together with their target addresses. She then creates shadow code where the branch-instructions and target addresses are aligned such that they will use the same BPU history entries. The attacker then allows the enclave to execute briefly before interrupting it. Finally, she enables the performance counter, in particular the Last Branch Record (LBR), and executes the shadow code, prompting the CPU to predict shadow-branch behavior based on prior enclave execution. The LBR contains information on branch prediction but cannot record in-enclave branches. However, the in-enclave branches can be inferred from the LBR entries for the branches executed after exiting the enclave. Unlike cache-based channels, this does not require timing because the LBR directly reports prediction status.
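The address alignment used for shadow code can be illustrated with a small sketch. It assumes, as stated in the introduction, that only the lower 31 bits of a branch address identify its BTB entry; the real lookup function is not public, and the addresses and helper names below (`btb_key`, `shadow_address`) are made up for illustration.

```python
BTB_ADDRESS_BITS = 31  # from the text: BTB entries keep the lower 31 address bits


def btb_key(address: int) -> int:
    """Return the address bits that (by assumption) select a BTB entry."""
    return address & ((1 << BTB_ADDRESS_BITS) - 1)


def shadow_address(enclave_branch: int, shadow_base: int) -> int:
    """Pick an address outside the enclave that shares the low 31 bits.

    shadow_base is a hypothetical attacker-controlled base whose low
    31 bits are zero, so the enclave branch's low bits can be copied in.
    """
    if btb_key(shadow_base) != 0:
        raise ValueError("shadow_base must be aligned to 2**31")
    return shadow_base | btb_key(enclave_branch)


enclave_branch = 0x7F12_3456_789A  # made-up in-enclave branch address
shadow = shadow_address(enclave_branch, shadow_base=0x5580_0000_0000)

# Different addresses, same assumed BTB key: the shadow branch would
# observe the prediction state left behind by the enclave branch.
assert shadow != enclave_branch
assert btb_key(shadow) == btb_key(enclave_branch)
```

Under this assumption, any two addresses that differ only above bit 30 are indistinguishable to the BTB, which is what lets the attacker place shadow branches in her own address range.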
Zigzagger and SGX-Step
Zigzagger [10] is presented as a software-based countermeasure to thwart branch-shadowing attacks. Zigzagger removes the branches from the enclave functions by obfuscating and replacing a set of branch instructions with a series of indirect jumps. Instead of each conditional branching instruction, an indirect jump and a conditional move (CMOV) are used. Zigzagger assumes that an attacker cannot precisely time the enclave interrupts, i.e., a single probe will cover over 50 instructions. It introduces a trampoline to exercise all unconditional jumps before finally jumping to the final destination. The attacker will typically always detect the same set of taken jumps (i.e., all the unconditional jumps) and cannot distinguish the final jump from the decoy-jumps.
However, Van Bulck et al. [15] presented SGX-Step, a framework consisting of a Linux kernel driver and runtime library that manipulates the processor's Advanced Programmable Interrupt Controller (APIC) timer in order to interrupt an enclave after a single instruction, i.e., to single-step the enclave's execution. They show that this makes the Zigzagger defense ineffective because the attacker can distinguish meaningful jumps from decoys.
SPECTRE MITIGATION TECHNIQUES
The recent Spectre [9] and SGXPectre [4] attacks are similar to branch-shadowing in that they abuse the BPU to exploit speculative execution. Whereas branch-shadowing aims to infer prior branching behavior, these attacks instead manipulate upcoming branch prediction, e.g., cause speculative execution to touch otherwise inaccessible memory. Although not designed to do so, we suspected the new Spectre mitigation techniques could also affect the branch-shadowing attacks. However, our testing indicates that neither the recent firmware patches from Intel 6, nor the compiler-based Retpoline 7 affect the ability to perform branch-shadowing attacks against SGX.
In particular, we confirmed that Indirect Branch Restricted Speculation (IBRS), which is designed to prevent unprivileged code from affecting speculation in privileged execution (e.g., within the enclave), has no effect on branch-shadowing. In our tests we saw no difference between an updated i7-7500U CPU and non-updated machines. We speculate that this is because IBRS is specifically designed to prevent low-privilege code from affecting high-privilege code, whereas branch-shadowing relies on high-privilege code affecting subsequent low-privilege code. The Retpoline defense replaces branch instructions with return instructions, but our tests indicate that return statements affect the BTB, not only the dedicated Return Stack Buffer (RSB). SGXPectre further demonstrated that Spectre attacks can be performed against Retpoline.
THREAT MODEL AND REQUIREMENTS
We assume that the attacker has fine-grained control of enclave execution, i.e., can interrupt the enclave with instruction-level accuracy. The attacker can thus perform a branch-shadowing attack against every branch instruction. Specifically, the attacker can determine whether or not a branch instruction has been executed and taken (i.e., whether a conditional jump fell through or not). If the branching decisions depend on sensitive enclave data, the attacker can infer this data through the branch-shadowing attack. This is a significantly stronger attacker capability than that assumed by previous work [10] because Van Bulck et al. [15] showed that single-step execution of SGX enclaves is both feasible to implement and sufficient to break existing defenses like Zigzagger [10] . We focus on branch-shadowing attacks and do not consider other side-channels, such as cache or page-fault attacks.
Given these attacker capabilities, we require a defence mechanism that prevents fine-grained branch-shadowing from revealing secret-dependent control flow. Specifically, in the instrumented code, we require that:
R.1 Any branch that can be directly observed through branch shadowing reveals no secret-dependent control flow information.
R.2 For any secret-dependent branches, the attacker's probability of success is bounded based on a security parameter k.
PROPOSED APPROACH
Our mitigation scheme uses compile-time obfuscation and run-time randomization to hide the control flow of an enclave application. While our proposed method is inspired by and uses a similar approach to Zigzagger, we assume a stronger attacker model. Specifically, our approach can defend against branch-shadowing even in the presence of an attacker with single-step capabilities. Figure 1 illustrates the high-level view of our approach. The system consists of two main components: an obfuscating compiler and a run-time randomizer. The obfuscating compiler modifies the code by converting all branching instructions to indirect branches. The indirect branch targets are then explicitly set by the instrumentation depending on the converted branch type. We use conditional moves as replacements for conditional branches, allowing us to replicate the functionality of any conditional branch without involving the BPU. The observable control flow transitions, i.e., non trampoline branches, are further organized so that they are always unconditionally executed in the same order. The key insight of our approach is that, unlike Zigzagger, the trampolines are randomized inside the enclave at run-time by the randomizer. This prevents the attacker from reliably tracking their execution. Since only the trampolines are randomized, all other code remains in execute-only memory. Taken together, these two properties fulfill requirements R.1 and R.2, as we show in our security evaluation in Section 7.
Listing 1 and Figure 2 show a single if-statement and corresponding Control Flow Graph. The corresponding obfuscated CFG is shown in Figure 3. Figure 4 shows the same obfuscated code with the branch instructions converted. The static code is produced at compile time and its layout is assumed to be known to the attacker. The trampoline is similarly produced at compile time but is then randomized at run-time within the enclave. We assume that the attacker can observe and shadow the static code whereas the trampoline is unknown. Specifically, our approach works as follows:
Branch conversion: All branching instructions are converted to indirect unconditional branches. A register (r15) is reserved and populated with the original branch targets, which are stored in a jump-table that is updated during randomization. Conditional branches are converted to conditional moves (cmov) (e.g., Block0 in Figure 4).
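The data flow of a converted conditional branch can be sketched as follows. This is only an illustration of the idea: Python is not constant-time, `cmov_select` merely mimics what a `cmov` instruction does in the instrumented machine code, and the jump-table contents and addresses are hypothetical.

```python
def cmov_select(condition: bool, if_true: int, if_false: int) -> int:
    """Select one of two values by masking instead of branching (cmov-like)."""
    mask = -int(condition)  # -1 (all bits set) when condition holds, else 0
    return (if_true & mask) | (if_false & ~mask)


# Hypothetical jump table: trampoline label -> current (randomized) address.
jump_table = {"tb1": 0x1000, "tb1S": 0x1040}


def block0(secret: int) -> int:
    """Converted form of `if (secret > 0) { Block1 }`.

    Instead of a conditional jump, the target is loaded into the reserved
    register (r15 in the text); the jump-block that follows then performs
    an unconditional indirect jump through it.
    """
    r15 = cmov_select(secret > 0, jump_table["tb1"], jump_table["tb1S"])
    return r15


assert block0(5) == jump_table["tb1"]    # condition holds: enter Block1
assert block0(0) == jump_table["tb1S"]   # condition fails: skip Block1
```

The point of the construction is that both outcomes execute the same instruction sequence; only the value held in the register differs, and that value is not recorded by the BPU until the indirect jump into the randomized trampoline.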
Jump blocks: Each block is followed by a jump-block that jumps to a trampoline indicated by r15. Execution flows that do not include a specific block still go through any intermediate jump-blocks to ensure that all indirect jumps outside the trampolines are executed. For instance, when taking the if-clause (Block1), the else-block (Block2) must not be executed but the corresponding jump-block (B2J) must be (e.g., the blue line in Figure 4). This ensures that an attacker always sees the same sequence of jumps (i.e., B0J, B1J, and B2J), regardless of actual executed code.
Trampolines: A trampoline is created for each branching target and for each fall-through block (i.e., the next block that will be executed when a conditional branch is not taken). In Figure 3, after execution of the if-block (Block1) the control flow is transferred to tb2S, which jumps to the following jump-block B2J without executing the corresponding Block2 itself.
Skip blocks: When skipping a block (e.g., the else-block after taking the if-block), we must nonetheless execute the corresponding jump-block to prevent its omission from leaking information. The jump-block target is prepared in the prior trampoline block by setting r15. For instance, after executing the if-block the corresponding trampoline (tb2S) not only jumps to the correct jump-block, but also sets the next target, tb3, into r15. To prevent timing attacks that measure the number of instructions between jump-blocks, the skipping trampolines (e.g., tb1S and tb2S) are populated with dummy-instructions to ensure that the timing between each jump-block is constant regardless of control flow. Although not shown in our example code, nested blocks are treated similarly to ensure that they execute all intermediary jumps.
Randomization: Trampolines are prepared during compilation, and are randomized at run-time inside the enclave. The randomization is implemented such that shadowing it does not reveal the randomization pattern. Randomizing the trampolines forces the attacker to shadow all possible locations in the enclave, and thus prevents her from shadowing the trampoline branches and reliably tracking the program's execution.
Re-randomization: Since an attacker could repeatedly call the same enclave functionality to gradually determine the randomization pattern, we can periodically re-randomize the trampolines. For example, the trampolines could be re-randomized on each enclave entry. As future work, we envision: a) providing code annotations for limiting the obfuscation to only developer-determined sensitive parts, and b) randomizing the trampoline code only when detecting multiple enclave entries (i.e., after a given number of potential shadowing attempts).
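The interplay of jump-blocks, trampolines, and skip trampolines for the single if/else example can be traced with a toy interpreter. The block and trampoline names follow the text (Block0 to Block3, B0J to B2J, tb1, tb1S, tb2, tb2S, tb3); the table encoding and the `run` function are an illustrative assumption, not the instrumented code itself, and dummy-instruction padding is not modeled.

```python
# Trampoline -> (where it jumps, value it preloads into r15 or None).
TRAMPOLINES = {
    "tb1":  ("Block1", None),   # enter the if-block
    "tb1S": ("B1J", "tb2"),     # skip Block1, next target is the else-block
    "tb2":  ("Block2", None),   # enter the else-block
    "tb2S": ("B2J", "tb3"),     # skip Block2, next target is Block3
    "tb3":  ("Block3", None),   # join point
}
# Real block -> (r15 value it sets, jump-block that follows it).
BLOCK_EXIT = {"Block1": ("tb2S", "B1J"), "Block2": ("tb3", "B2J")}
JUMP_BLOCKS = {"B0J", "B1J", "B2J"}


def run(condition: bool):
    """Return (observable jump-blocks, blocks that really executed)."""
    observable, executed = [], ["Block0"]
    r15 = "tb1" if condition else "tb1S"  # stands in for the cmov in Block0
    pc = "B0J"
    while pc != "Block3":
        if pc in JUMP_BLOCKS:
            observable.append(pc)            # static code: attacker can shadow it
            pc, preload = TRAMPOLINES[r15]   # indirect jump via randomized trampoline
            if preload is not None:
                r15 = preload
        else:
            executed.append(pc)
            r15, pc = BLOCK_EXIT[pc]
    executed.append("Block3")
    return observable, executed


taken, not_taken = run(True), run(False)
assert taken[0] == not_taken[0] == ["B0J", "B1J", "B2J"]
assert taken[1] == ["Block0", "Block1", "Block3"]
assert not_taken[1] == ["Block0", "Block2", "Block3"]
```

Both runs expose the same jump-block sequence, while the blocks that actually execute differ; the only place the difference shows up is in which trampolines are visited, which is what the randomization is meant to hide.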
IMPLEMENTATION DETAILS
We have implemented an open-source prototype of our approach, based on LLVM 6.0 and implemented in the X86 target backend. The instrumentation is applied by systematically traversing all functions and modifying their branching instructions, as explained in Section 5. Since the run-time randomization library cannot be randomized, it must be resistant to branch-shadowing attacks. Although the randomizer is implemented, we have not yet integrated it with our instrumentation. For efficient and fine-grained randomization we do not perform in-place randomization; instead, we move trampoline entries between two trampoline areas. Listing 2 shows an overview of our randomization algorithm. A detailed description is available in our extended technical report [8].
We have also implemented an application for shadowing in-enclave execution in a controlled manner. Our setup is similar to [10], i.e., our application 1) retrieves branch instruction addresses and sets up a corresponding shadow-jump, 2) executes the victim enclave function and returns, 3) enables performance counters and executes the shadow-code, and 4) reads performance counters to infer in-enclave execution. Our setup is such that it could be integrated into the SGX-Step framework. We have replicated the shadowing techniques shown by [10] and performed shadowing on return statements.
EVALUATION
7.1 Security Analysis
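To make the idea of moving trampoline entries between two areas concrete, the following is a minimal sketch and not the algorithm of Listing 2. It assumes that each trampoline has a known byte size, that the two areas are plain address ranges, and that the OS-backed `secrets.SystemRandom` stands in for an in-enclave random source. All names (`randomize`, `rerandomize`, `AREAS`) and addresses are hypothetical, and the sketch makes no claim about uniformity of the resulting layout or about resistance of the randomizer itself to shadowing.

```python
import secrets
from typing import Dict, Tuple

_rng = secrets.SystemRandom()  # stand-in for an in-enclave random source


def randomize(sizes: Dict[str, int], base: int, area_size: int) -> Dict[str, int]:
    """Place trampolines (name -> size in bytes) at random, non-overlapping
    addresses inside [base, base + area_size) and return the new jump table."""
    free = area_size - sum(sizes.values())
    if free < 0:
        raise ValueError("trampolines do not fit into the area")
    names = list(sizes)
    _rng.shuffle(names)                                   # random order
    cuts = sorted(_rng.randint(0, free) for _ in names)   # random gaps
    jump_table, cursor, previous_cut = {}, base, 0
    for name, cut in zip(names, cuts):
        cursor += cut - previous_cut   # leave a random amount of free space
        previous_cut = cut
        jump_table[name] = cursor
        cursor += sizes[name]          # next trampoline starts after this one
    return jump_table


# Two hypothetical trampoline areas: (base address, size in bytes).
AREAS = [(0x10000, 4096), (0x20000, 4096)]


def rerandomize(sizes: Dict[str, int], active_area: int) -> Tuple[Dict[str, int], int]:
    """Lay the trampolines out afresh in the inactive area and switch to it."""
    target = 1 - active_area
    base, area_size = AREAS[target]
    return randomize(sizes, base, area_size), target


sizes = {"tb1": 5, "tb1S": 12, "tb2": 5, "tb2S": 12, "tb3": 5}
jump_table, active = rerandomize(sizes, active_area=0)
```

Writing the new layout into the inactive area avoids overwriting entries that have not yet been moved, which is one plausible reason for preferring two areas over in-place shuffling; the jump table returned here corresponds to the table from which r15 is populated.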
As specified in Requirements (Section 4), we must prevent an attacker from inferring the secret-dependent control flow by R.1) ensuring that observable branches do not leak information, and R.2) preventing the attacker from probing other branches with a probability based on the security parameter k.
To hide any data-dependent branches (R.1), we replace all conditional branches with unconditional branches. We further set up the control flow so that each block in the static code section is executed in the same order, and on every function call. One limitation is that we do not conceal the number of loop executions, because this is typically unknown at compile time. In some cases this could be avoided by unrolling loops.
The remaining branching instructions are exclusively in the trampolines, for which the locations are randomized to defend against shadowing (R.2). Without knowing the exact trampoline layout, the attacker is forced to guess or exhaustively probe all possible locations. The probability of attack success ($P_{attack}$) is given by $P_{attack} = G / k$, where G is the number of guesses and k the number of possible trampoline locations.
The upper limit for G is the number of BTB entries, but in practice this is lowered by any intermediate code (e.g., system calls and attack setup) that pollutes the BTB. The security parameter k determines the trampoline randomization space. Because X86 allows unaligned execution, a single 4KB range gives us up to 4091 potential trampoline locations (with a trampoline size starting at 5 bytes). With a randomization area of 8KB and 4096 BTB entries, the success probability of shadowing a single branch has an upper bound of 0.5. The probability of following the full control flow drops exponentially as the number of targeted branches increases.
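The bound can be checked numerically with a short sketch. It takes G and k as given in the text and additionally assumes that the guesses for different branches succeed independently (for example, because the layout is re-randomized between attempts); this independence is an assumption of the sketch, and the function names are illustrative.

```python
def p_single(guesses: int, locations: int) -> float:
    """P_attack = G / k for one branch, capped at 1."""
    if guesses < 0 or locations <= 0:
        raise ValueError("guesses must be >= 0 and locations > 0")
    return min(1.0, guesses / locations)


def p_full_flow(guesses: int, locations: int, branches: int) -> float:
    """Probability of tracking `branches` branches in a row,
    assuming each guess succeeds independently."""
    return p_single(guesses, locations) ** branches


# 4096 BTB entries against an 8KB randomization area, treating every
# byte offset as a candidate location (k = 8192).
assert p_single(4096, 8192) == 0.5
assert p_full_flow(4096, 8192, 10) == 0.5 ** 10   # just under 0.1%
```

Because G is capped by the BTB size, increasing the randomization area (and hence k) is the tunable knob: doubling the area halves the per-branch bound, and the bound for a sequence of branches shrinks geometrically under the independence assumption.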
Performance Evaluation
We evaluated the overhead of our system in terms of CPU-utilization, memory use, and code size. All software was compiled using the SGX SDK version 2.0 and run on an SGX-enabled Intel Skylake Core processor. We used SGX-Nbench 8, which is adapted from Nbench-byte-2.2.3, to measure the CPU and memory overhead of 10 different benchmarks executed within an enclave. All benchmarks were conducted with full instrumentation, but do not include randomization or dummy-instructions. Although the randomization would introduce additional overhead, it need not be constantly repeated. Instead, it can be performed once on enclave creation and then later after a specified number of enclave re-entries.
CPU overhead: Table 1 shows the computational performance of various benchmarks in the enclave before and after obfuscation. The decrease in performance (i.e., the number of iterations per second) results from the addition of trampoline jumps and the need to exhaustively execute all jump-blocks. However, since we have obfuscated the entire program, these results represent the worst-case scenario. In real deployments, only the parts of the code that depend on secret data would be obfuscated. The performance penalty depends on how complicated the function is in terms of size and number of branches. The Assignment benchmark, for instance, has functions with many nested conditional branches, all of which require corresponding jump-blocks to be added and executed.
Memory overhead: As expected, our instrumentation does not increase heap or stack usage of the enclave.
Code size: To measure the increase in code size, we compared the size of the enclave object files before and after instrumentation. The size of the SGX-Nbench object files increased from 329.1 kB to 370.1 kB after instrumentation. Similarly to performance overhead, code size overhead will also decrease when instrumenting only the secret-dependent sections of the code.
RELATED WORK
There is a growing body of research on side channel attacks targeting Intel SGX and corresponding countermeasures. In addition to the branch-shadowing attacks [5, 10], there are other side channel attacks targeting SGX enclaves [2, 7, 14, 16].
8 https://github.com/utds3lab/sgx-nbench
Several approaches have been presented to thwart controlled-channel (page-fault) attacks. SGX-Shield [11] randomizes the memory layout, similar to Address Space Layout Randomization (ASLR), to prevent control flow hijacking and hide the enclave memory layout. This approach impedes run-time attacks that exploit memory errors or attacks that rely on a known memory layout (e.g., controlled-channel attacks). SGX-Shield uses on-load randomization, allowing repeated branch-shadowing attacks to gradually reveal the randomization pattern. Our approach solves this through run-time re-randomization. We further minimize the additional attack-surface by limiting the randomization to the trampolines.
Shinde et al. [13] propose an approach that masks page-fault patterns by making the program's memory access pattern deterministic. More precisely, they alter the program such that it accesses all its data and code pages in the same sequence, regardless of the input. This makes the enclave application demonstrate the same page-fault pattern for any secret input variables. T-SGX [12] leverages Intel Transactional Synchronization Extensions (TSX) to suppress encountered page-faults without invoking the underlying OS. Although T-SGX does not mitigate branch-shadowing attacks [10] , it could be combined with our approach to address both branch-shadowing and page-fault attacks.
DR.SGX [1] is presented to defend against cache side-channel attacks. It permutes data locations, and continuously re-randomizes enclave data in order to hamper correlation of memory accesses. This approach prevents leakages resulting from secret-dependent data accesses. Similarly, Chandra et al. [3] inject dummy data instances into the user-supplied data instances in order to add noise to memory access traces. They randomize/shuffle the dummy data with the user data to reduce the chance of extracting sensitive information from side-channels. Both approaches are similar to ours in that they employ randomization, but they are not designed to defend against branch-shadowing attacks since they randomize data memory locations rather than control flow.
CCFIR (Compact Control Flow Integrity and Randomization) [17] is a new method proposed to impede control-flow hijacking attacks (e.g., return-into-libc and ROP). CCFIR controls the indirect control transfers and limits the possible jump location to a whitelist in a Springboard. Randomizing the order of the stubs in the Springboard adds an extra layer of protection and frustrates guessing of the function pointers and return addresses. However, CCFIR has not been designed for use in SGX enclaves.
Obfuscation techniques were previously used to thwart leakages via side-channel attacks. Oblivious RAM (ORAM) [6] conceals the program's memory access pattern by shuffling and re-encrypting the accessed data. However, the state should be stored/updated at client-side, which makes it difficult to use for protecting cache since it is challenging to store the internal state of ORAM securely without hardware support, given the small size of cache lines. Moreover, this approach incurs significant performance overhead.
None of the above countermeasures focus on mitigating branch-shadowing attacks, and additionally, Lee et al. [10] have demonstrated that their branch-shadowing attack is capable of breaking the security constructs of SGX-Shield, T-SGX, and ORAM.
CONCLUSION AND FUTURE WORK
We propose a software-based mitigation scheme to defend against branch-shadowing attacks, even in the presence of attackers with the ability to single-step through SGX enclaves. Our approach combines compile-time control flow obfuscation with run-time code randomization to prevent the enclave program from leaking secret-dependent control flow. We evaluated our approach using ten benchmarks from SGX-Nbench. Although we considered the worst-case scenario (whole program instrumentation), our results show that, on average, our approach results in less than 18% performance loss and less than 1.2 times code size increase.
As future work, we will integrate the randomizing component and optimize our obfuscating compiler to reduce overhead. In addition, we plan to integrate our approach with other defences, in order to mitigate a broader range of side-channel attacks.
