Inferring Fine-grained Control Flow Inside SGX Enclaves with Branch
  Shadowing by Lee, Sangho et al.
Sangho Lee† Ming-Wei Shih† Prasun Gera† Taesoo Kim† Hyesoon Kim† Marcus Peinado∗
† Georgia Institute of Technology
∗ Microsoft Research
A revised version of this paper will be
presented at USENIX Security Symposium 2017.
Please cite this paper as
Sangho Lee, Ming-Wei Shih, Prasun Gera, Taesoo Kim, Hyesoon Kim, and Marcus Peinado,
“Inferring Fine-grained Control Flow Inside SGX Enclaves with Branch Shadowing,”
in Proceedings of the 26th USENIX Security Symposium (Security),
Vancouver, Canada, August 2017.
Abstract
Intel Software Guard Extensions (SGX) is a hardware-based
trusted execution environment (TEE) that enables secure computation
without trusting any underlying software, such as the operating
system or even hardware firmware. It provides strong
security guarantees, namely, confidentiality and integrity, to an
enclave (i.e., a program running on Intel SGX) through solid
hardware-based isolation. However, a new controlled-channel
attack (Xu et al., Oakland 2015), although it is an out-of-scope
attack according to Intel SGX’s threat model, demonstrated that
a malicious OS can infer coarse-grained control flows of an
enclave via a series of page faults, and such a side-channel can
be severe for security-sensitive applications.
In this paper, we explore a new, yet critical, side-channel at-
tack against Intel SGX, called a branch shadowing attack, which
can reveal fine-grained control flows (i.e., each branch) of an
enclave program running on real SGX hardware. The root cause
of this attack is that Intel SGX does not clear the branch his-
tory when switching from enclave mode to non-enclave mode,
leaving the fine-grained traces to the outside world through a
branch-prediction side channel. However, exploiting the chan-
nel is not so straightforward in practice because 1) measuring
branch prediction/misprediction penalties based on timing is too
inaccurate to distinguish fine-grained control-flow changes and
2) it requires sophisticated control over the enclave execution
to force its execution to the interesting code blocks. To overcome
these challenges, we developed two novel exploitation
techniques: 1) Intel PT- and LBR-based history-inferring techniques
and 2) an APIC-based technique to control the execution of
enclave programs in a fine-grained manner. As a result, we could
demonstrate our attack by breaking recent security constructs,
including ORAM schemes, Sanctum, SGX-Shield, and T-SGX.
Not limiting our work to the attack itself, we thoroughly studied
the feasibility of hardware-based solutions (e.g., branch history
clearing) and also proposed a software-based countermeasure,
called Zigzagger, to mitigate the branch shadowing attack in
practice.
1 Introduction
Establishing a trusted execution environment (TEE) is one of the
most important security requirements, as we cannot fully trust
the underlying computing platform, such as the public cloud and
possibly compromised operating system (OS). When we want to
run security-sensitive applications (e.g., processing financial or
health data) in the public cloud, we need to either fully trust the
operator, which is problematic [15], or encrypt all data before
uploading it to the cloud and perform computations directly
on the encrypted data by using fully homomorphic encryption,
which is too slow [43], or property-preserving or searchable
encryption, which is basically weak [39, 44, 16]. Even when we
use the private cloud or personal workstation, similar problems
still exist because we cannot ensure whether the underlying OS
is robust against attacks due to its huge code base and high
complexity [55, 27, 17, 36, 20, 2]. Since the OS is the trusted
computing base (TCB), by compromising it, an attacker can
have full control of any applications running on the platform.
Hardware-based TEEs, such as Trusted Platform Module
(TPM) [57], ARM TrustZone [4], and Intel Software Guard
Extensions (SGX) [23], have been actively proposed to realize
TEEs. In particular, Intel SGX is receiving a lot of attention
because of its availability and applicability. All Intel Skylake
and Kaby Lake CPUs support Intel SGX, and processes secured
by Intel SGX (i.e., processes running inside an enclave) can use
almost every unprivileged CPU instruction without restrictions.
As long as we can trust the hardware vendors (i.e., if there is no
hardware backdoor [63]), it is believed that the hardware-based
TEE is secure since compromising hardware is more difficult
than software in most cases due to physical limitations (e.g.,
desoldering CPU packaging) and verifiability.
Unfortunately, recent studies [62, 52] show that Intel SGX suf-
fers from a noise-free side-channel attack, known as a controlled-
channel attack. Intel SGX allows an OS to have full control
of the page table of an SGX program; it can map or unmap
arbitrary memory pages of the SGX program. This lets a
malicious OS learn exactly which memory pages a victim SGX
program attempts to access by monitoring page faults. Unlike
conventional side channels such as cache-timing channels, this
page-fault side channel is deterministic; namely, it does not
suffer from any measurement noise.
The controlled-channel attack has a limitation; it only reveals
coarse-grained page-level access patterns. Intel said that its
architecture (including Intel SGX) aims to provide protection
against side-channel attacks at the cache-line granularity [22].
Thus, such page-level access information would be too coarse
grained to be their main concern. Further, researchers propose
effective countermeasures against the controlled-channel attack,
which are based on balanced execution [52] and user-space page-
fault detection [51, 10, 52]. However, these countermeasures
only prevent the controlled-channel attack; hence, a fine-grained
side-channel attack, if it exists, would easily bypass them.
We thoroughly explored Intel SGX to determine whether it has a
critical side channel that reveals fine-grained information (finer
than cache-line granularity) and is robust against noise. One key
observation is that Intel SGX leaves branch history uncleared
during enclave mode switches, which can be used as a side
channel. Knowing the branch history (i.e., taken or not-taken
branches) is critical because it would reveal the fine-grained ex-
ecution trace of a process in terms of basic blocks. To avoid this
problem, Intel SGX hides the branch history information inside
an enclave from hardware performance counters, including last
branch record (LBR) and Intel Processor Trace (PT) [23]. In
other words, an OS is unable to directly monitor and manipu-
late the branch history of enclave processes. However, since
Intel SGX does not clear the branch history, the fine-grained
execution traces can be potentially inferred outside of an enclave
through a branch-prediction side channel [3, 12, 13].
The branch-prediction side channel attack aims to recognize
whether the history of a target branch instruction is stored into
a CPU internal buffer for the branch prediction, known as the
branch target buffer (BTB). To achieve this goal, the attack
measures how long it takes to execute a shadowed branch instruction,
which can be mapped into the same BTB entry as the
target branch instruction because the two addresses share the
same lowest 31 bits (§2.2) or cause a set conflict (§6.2). This
collision between two branch instructions results in a timing
difference due to branch misprediction penalty (§3). Several
researchers have tried to use this side channel to infer crypto-
graphic keys [3], create a covert channel [12], and break address
space layout randomization (ASLR) [13].
This attack, however, is difficult to realize without a compromised
OS (i.e., the threat model of SGX) and a precise measurement
strategy for the following reasons. First, an attacker
cannot easily guess the address of a target branch instruction
or manipulate its branch addresses due to ASLR. Second,
since the BTB’s capacity is limited, its entry would be easily
overwritten by other branch instructions before an attacker gets
a chance to probe it. Third, measuring the branch misprediction
penalty suffers from substantial timing noise (§3.3). In summary,
an attacker needs 1) the ability to freely manipulate the virtual
address space, 2) access to the BTB at any time before it is
overwritten, and 3) a method to recognize branch mispredictions
with negligible (or no) noise.
In this paper, we present a new branch-prediction side-channel
attack, called the branch shadowing attack, to identify fine-grained
control flows inside an enclave without noise (for conditional
and indirect branches) or with negligible noise
(for unconditional branches). A malicious OS can easily
manipulate the virtual address space of an enclave process, so
that it is easy to create shadowed branch instructions collid-
ing with target branch instructions in an enclave. To minimize
the measurement noise, we tried to use Intel PT’s timestamps
instead of RDTSC (§3.3). More importantly, we found that Sky-
lake’s LBR allows us to obtain the most accurate information
for the branch shadowing attack because it reports whether each
conditional/indirect branch instruction is correctly predicted or
mispredicted. That is, we can know exactly whether conditional
and indirect branches were predicted or mispredicted (§3.3, §3.5).
Furthermore, Skylake’s LBR reports elapsed core cycles be-
tween LBR entry updates, which are very stable according to
our measurements (§3.3). By using this information, we can
precisely infer the execution of an unconditional branch (§3.4).
Precise execution control and frequent branch history probing
are other important requirements of the branch shadowing attack.
To achieve these goals, we raised the frequency of the
local advanced programmable interrupt controller (APIC) timer
as high as possible and modified the timer interrupt code
to make it execute the branch shadowing attack. Further, we
selectively disabled the CPU cache when a more precise attack was
needed (§3.6).
We performed case studies to evaluate the effectiveness of
the branch shadowing attack (§4). First, we extracted sensitive
information from SGX applications including Linux SGX SDK
(string conversions and formatted strings), mbed TLS crypto-
graphic library (RSA private keys), LIBSVM machine-learning
library (classification models and parameters), and Apache web
server (HTTP requests). Next, we analyzed state-of-the-art stud-
ies to secure SGX including deterministic multiplexing [52],
Sanctum [10], SGX-Shield [49], and T-SGX [51], and con-
firmed that our attack bypassed all of them. Finally, we sug-
gested hardware- and software-based countermeasures against
the branch shadowing attack, by clearing branch history during
enclave mode switches and using indirect branches with multi-
ple targets (§5). Both of them had acceptable overhead (below
1.3×).
In summary, the contributions of this paper are as follows:
• Fine-grained attack. We demonstrate that the branch
shadowing attack can identify fine-grained control flow
information inside an enclave in terms of basic blocks, un-
like the state-of-the-art controlled-channel attack that only
reveals page-level accesses.
• Precise attack. We make the branch shadowing attack
very precise by 1) exploiting Intel PT and LBR to correctly
identify branch history and 2) adjusting the local APIC
timer to precisely control the execution inside an enclave.
We can deterministically know whether a target branch has
been taken or not taken without noise (conditional and indirect
branches) or with negligible noise (unconditional branches).
• Countermeasures. We design proof-of-concept hardware-
and software-based countermeasures against the branch
shadowing attack. We evaluate both approaches’ effective-
ness and performance overhead.
The remainder of this paper is organized as follows. §2 ex-
plains details about Intel SGX and other processor features our
attack relies on. §3 introduces our branch shadowing attack
in detail. §4 explains how the branch shadowing attack reveals
sensitive information from SGX applications and defeats re-
cent security proposals. §5 describes our hardware-based and
software-based mitigations against the branch shadowing attack.
§6 discusses some limitations of the branch shadowing attack
and considers possible advanced attacks. §7 introduces related
work. §8 concludes this paper.
2 Background
In this section, we first explain the basics of Intel SGX. Then, we
explain Intel CPU’s other essential features (branch prediction,
LBR, and local APIC timer) related to our attack.
2.1 Intel SGX
Intel SGX [9] is one of the existing implementations of hardware-based
TEEs and has shipped with Intel CPUs since Skylake.
SGX is designed under the assumption that the TCB is reduced
to include only the internals of the CPU package, i.e., privileged
software such as OS or hypervisor and other hardware units are
excluded. To this end, SGX allows an application to instantiate
a secure container called an enclave. Enclaves are allocated in
a dedicated physical memory region, called the enclave page
cache (EPC), that is protected by an on-chip memory encryption
engine (MEE) such that the EPC content always stays encrypted
and is only decrypted right before entering the CPU package.
SGX also enforces different CPU access controls between en-
clave code and non-enclave code to allow only the enclave to
access its own code and data, while accesses from other soft-
ware are prohibited. Note that the enclave is still allowed to
access non-enclave memory region. Enclaves can be created as
part of applications’ address space via an SGX instruction set.
Measurement of code and data is calculated during the loading
process and can serve as evidence about the enclave in remote
attestation [28, 21].
Non-enclave code and enclave code interaction. Non-enclave
code can only switch to enclave code via either the EENTER
instruction to a list of defined entry points or the ERESUME in-
struction that resumes execution where an asynchronous enclave
exit (AEX) happens due to events such as exceptions and inter-
rupts. Upon enclave exit (AEX or using the EEXIT instruction),
a series of checks and actions is performed, such as TLB flush,
to ensure the isolation of an enclave. To exchange input and
output values, enclave code and non-enclave code use untrusted
memory outside an enclave.
2.2 Branch Prediction
Branch prediction is one of the most important features of mod-
ern pipelined processors. Basically, an instruction pipeline con-
sists of four major stages: fetch, decode, execute, and write-back.
This pipeline structure makes the processor execute an instruc-
tion while fetching/decoding the next instructions and storing the
result of the previous instruction into the memory (or the cache);
namely, the processor can execute a number of instructions in
parallel. However, the pipelined processor has a problem with
a branch instruction because, before executing it, the processor
cannot know what the next instruction is. Making the instruction
pipeline stall until the processor confirms the next instruction
is bad for the overall throughput, so modern processors have a
branch prediction unit (BPU) to predict the next instruction after
a branch instruction and execute it to maintain the pipeline uti-
lization. However, a branch misprediction would bring a penalty
because the processor needs to clear the pipeline and roll back
the execution results. This is why the Intel optimization man-
ual [24] emphasizes branches and Intel provides a dedicated
hardware feature to log branch information: the LBR, which
will be explained later.
Branch and branch target prediction. There are two kinds
of branch predictions: branch prediction and branch target
prediction. Branch prediction is a procedure to predict the next
instruction of a conditional branch by guessing whether it will
be taken or not be taken. Branch target prediction is a procedure
to predict the target instruction of a conditional or unconditional
branch before executing it. For branch target prediction, modern
processors have the branch target buffer (BTB) to store the
computed target addresses of taken branch instructions and fetch
them when the corresponding branch instructions are about to
be executed.
BTB structure and partial tag hit. The BTB resembles a
cache: some address bits are used to compute the index and
some are used for the tag. However, to save space, the BTB uses
only a small number of bits for the tag, whereas a cache uses
all of the remaining bits. For example, in a 64-bit address space,
if ADDR[11:0] is used for the index, only a partial range such as
ADDR[31:12] is used for the tag instead of ADDR[63:12]. There are
three reasons for this. First, compared to a data cache, the BTB
is very small, which results in many unused bits. Second, within
a single program, the upper bits are typically almost the same.
Third, unlike a cache, which must provide architectural values,
the BTB is just a predictor. Even if a partial tag match results
in a false BTB hit, the correct target will be computed at the
execution stage and the pipeline will be rolled back if the
prediction is wrong. This correction mechanism is needed anyway
because, for indirect branches, even a true BTB hit can result in a
wrong prediction, which must be corrected at the execution
stage.
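The partial-tag aliasing described above can be sketched as follows. This is a minimal illustration that simply adopts the example split from the text (ADDR[11:0] as index, ADDR[31:12] as tag); the real BTB indexing function is not documented, and the function names and the addresses in the usage note are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout, copied from the example in the text:
 * ADDR[11:0] selects the index and ADDR[31:12] forms the partial tag.
 * The actual BTB hashing of a real CPU may differ. */
#define BTB_INDEX_BITS   12u
#define BTB_TAG_HIGH_BIT 31u

/* Extract ADDR[11:0]. */
static uint32_t btb_index(uint64_t addr)
{
    return (uint32_t)(addr & ((1ULL << BTB_INDEX_BITS) - 1));
}

/* Extract ADDR[31:12]; bits 63:32 are ignored by the partial tag. */
static uint32_t btb_partial_tag(uint64_t addr)
{
    uint64_t low = addr & ((1ULL << (BTB_TAG_HIGH_BIT + 1)) - 1);
    return (uint32_t)(low >> BTB_INDEX_BITS);
}

/* Two branch addresses alias in this model when both the index and the
 * partial tag agree, even if their upper address bits differ. */
static bool btb_aliases(uint64_t a, uint64_t b)
{
    return btb_index(a) == btb_index(b) &&
           btb_partial_tag(a) == btb_partial_tag(b);
}
```

Under these assumptions, btb_aliases(0x400530, 0x100400530) holds because the two addresses differ only above bit 31, whereas 0x400530 and 0x80400530 do not alias because bit 31 is part of the tag.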
Static and dynamic branch prediction. The static branch
prediction is a basic rule of predicting the next instruction of a
branch instruction when it has no history [24]. First, a processor
predicts a conditional branch will not be taken, which means the
next instruction will be directly fetched (i.e., a fall-through path).
Second, a processor predicts an indirect branch will not be taken.
Third, a processor predicts that an unconditional branch will be
taken (i.e., the specified target will be fetched).
When a branch instruction has a history, i.e., it has a BTB
entry, a processor predicts that the stored target address will be
the next instruction. This procedure is known as dynamic branch
prediction. Note that the changes of branch prediction behaviors
according to the branch history can be used as a side channel to
infer a victim process’s activities (§3).
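The static and dynamic rules above can be condensed into a toy model. The sketch below is only an illustration of the rules as stated in this section: the enum and function names are hypothetical, and the model ignores target mispredictions (for example, an indirect branch that hits in the BTB but jumps elsewhere).

```c
#include <stdbool.h>

enum branch_kind {
    BRANCH_CONDITIONAL,
    BRANCH_INDIRECT,
    BRANCH_UNCONDITIONAL
};

/* Returns true when the modeled predictor guesses "taken".
 * has_btb_entry models whether the branch has history in the BTB. */
static bool predict_taken(enum branch_kind kind, bool has_btb_entry)
{
    if (has_btb_entry)
        return true;                      /* dynamic: follow the stored target */
    return kind == BRANCH_UNCONDITIONAL;  /* static: only unconditional is taken */
}

/* A direction misprediction occurs when the guess and the outcome differ. */
static bool is_mispredicted(enum branch_kind kind, bool has_btb_entry,
                            bool actually_taken)
{
    return predict_taken(kind, has_btb_entry) != actually_taken;
}
```

In this model, a conditional branch that is actually taken is mispredicted exactly when it has no BTB entry, which is the history-dependent behavior that §3 exploits.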
2.3 Last Branch Record (LBR)
The LBR is a feature of Intel CPUs that logs information about
recently taken branches without any performance degradation,
as it is separated from the instruction pipeline [32, 31, 25]. In
Skylake CPUs, the LBR stores the information of up to 32 recent
branches, including the address of a branch instruction (from),
the target address (to), whether the branch or branch target was
mispredicted, and the elapsed core cycles between LBR entry
updates (also known as the timed LBR). Without filtering, the
LBR records all kinds of branches, including function calls,
function returns, indirect branches, and conditional branches.
Also, the LBR can selectively record branches taken in the user
space, kernel space, or both.
2.4 Local APIC Timer
The local advanced programmable interrupt controller (APIC) is
a component of Intel CPUs to configure and handle CPU-specific
interrupts [25, §10]. An OS can program the local APIC through
memory-mapped registers (e.g., divide configuration register)
or model-specific registers (MSRs) to adjust the frequency of
the local APIC timer, which generates high-resolution timer
interrupts, and deliver an interrupt to a CPU core (e.g., inter-
processor interrupt (IPI) and I/O interrupt from the I/O APIC).
Intel CPUs support three local APIC timer modes: periodic,
one-shot, and timestamp counter (TSC)-deadline modes. The
periodic mode lets an OS configure the initial-count register
whose value is copied into the current-count register the local
APIC timer uses. The current-count register’s value decreases
at the rate of the bus frequency, and when it becomes zero, a
timer interrupt is generated and the register is re-initialized by
using the initial-count register. The one-shot mode lets an OS
reconfigure the initial-count register value whenever a timer
interrupt is generated. The TSC-deadline mode is the most
advanced and precise timer mode that allows an OS to specify
when the next timer interrupt should occur in terms of a TSC
value. Our target Linux system (kernel version 4.4) uses the
TSC-deadline mode, so we mainly consider this mode.
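The timer modes described above can be illustrated with a small software model. This is a sketch only: it does not touch real APIC registers or MSRs, the struct and function names are hypothetical, and treating a deadline of zero as "disarmed" is an assumption of the model.

```c
#include <stdbool.h>
#include <stdint.h>

/* Software model of the periodic mode: the current count is loaded from
 * the initial count, decremented once per bus tick, and reloaded when it
 * reaches zero. */
struct apic_timer_model {
    uint32_t initial_count;
    uint32_t current_count;
};

static void apic_timer_start(struct apic_timer_model *t, uint32_t initial_count)
{
    t->initial_count = initial_count;
    t->current_count = initial_count;
}

/* Advance the model by one bus tick. Returns true when a timer interrupt
 * would be generated on this tick. A count of zero models a stopped timer. */
static bool apic_timer_tick(struct apic_timer_model *t)
{
    if (t->current_count == 0)
        return false;
    t->current_count--;
    if (t->current_count == 0) {
        t->current_count = t->initial_count;  /* periodic reload */
        return true;
    }
    return false;
}

/* TSC-deadline mode: the interrupt is due once the TSC reaches the
 * programmed deadline (model assumption: a deadline of 0 means disarmed). */
static bool tsc_deadline_expired(uint64_t tsc_now, uint64_t deadline)
{
    return deadline != 0 && tsc_now >= deadline;
}
```

With an initial count of 3, the model fires on every third tick; a smaller initial count or a nearer deadline corresponds to more frequent interrupts.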
3 Branch Shadowing Attacks
In this section, we explain our attack, the branch shadowing
attack, to obtain the fine-grained control flow information of
an enclave process. We first introduce our threat model and
depict how we can attack three types of branches: conditional,
unconditional, and indirect branches. Then, we describe our
approach to synchronize the victim and attack code in terms of
execution time and memory address space.
3.1 Threat Model
We explain our threat model, which is based on the original
threat model of Intel SGX and the controlled-channel attack [62]:
an attacker has compromised the operating system and exploits
it to attack a target enclave program.
First, the attacker knows the possible control flows of a target
enclave program (i.e., a sequence of branch instructions and their
targets) by statically or dynamically analyzing its source code or
binary. Unobservable code (e.g., self-modifying code and code
from remote servers) is outside the scope of our attack. Also,
the attacker can map the target enclave program into specific
memory addresses to designate the locations of each branch
instruction and its target address. Self-paging [19] and live re-
randomization of address-space layout [14] inside an enclave
are outside the scope of our attack.
Second, the attacker can infer which portion of code the target
enclave program runs through observable events, such as calling
functions outside an enclave and page faults. Our attack uses this
information to synchronize the execution of the target enclave
program with the branch probing code (§3.8).
Third, the attacker can interrupt the execution of the target
enclave program as frequently as possible to frequently run the
branch probing code. This can be done by manipulating a local
APIC timer and/or disabling the CPU cache (§3.6).
Fourth, the attacker can recognize the branch probing code’s
branch prediction and misprediction by monitoring hardware
performance counters (e.g., the LBR) or measuring branch mis-
prediction penalty [3, 13, 12].
3.2 Overview
The goal of the branch shadowing attack is to obtain the fine-
grained control flow of an enclave program by 1) knowing
whether a branch instruction has been taken or not taken and 2)
inferring the target address of the taken branch. To achieve the
goal, an attacker first needs to analyze the source code and/or
binary of a victim enclave program to enumerate all branches
(unconditional, conditional, and indirect branches) and their
target addresses. Next, the attacker writes shadow code for
each branch or a set of branches to probe their branch history,
which is similar to Evtyushkin et al.’s attack using the BTB [13].
Since using the BTB alone suffers from significant noise, the branch
shadowing attack uses both the BTB and the LBR, which allows the attacker
to precisely identify the states of all branch types (§3.3, §3.4, §3.5).
Due to the size limitations of the BTB and LBR, the branch shadowing
attack has to synchronize the execution of the victim code and
the shadow code in terms of execution time and memory address
space. We manipulate the local APIC timer and the CPU cache
(§3.6) to frequently interrupt an enclave process’s execution for
synchronization, and adjust virtual address space (§3.7) and run
probing code to find a function an enclave process is running
(§3.8).
1 if (a != 0) {
2 ++b;
3 ...
4 }
5 else {
6 --b;
7 ...
8 }
9 a = b;
10 ...
(a) Victim if-else statement exe-
cuted inside an enclave. Accord-
ing to the value of a, either if-
block or else-block is executed.
1 ⋆ if (c != c) {
2 nop; // never executed
3 ...
4 }
5 ⋆ else {
6 ⋆ nop; // execution
7 ⋆ ...
8 ⋆ }
9 ⋆ nop;
10 ⋆ ...
(b) Shadowed if-else statement
aligned with (a) and executed af-
ter (a). The BPU predicts which
block will be executed according
to the branch history of (a).
Figure 1: An example of a shadowing scheme (b) against a victim’s
conditional branch (a). The execution time (i.e., running [1, 5-10],
marked with ⋆ in (b)) of the shadowing instance depends on the branch-
ing result (i.e., taken or not at [1] in (a)) of the victim instance.
3.3 Conditional Branch Shadowing
We explain how an attacker can know whether a target con-
ditional branch inside an enclave has been taken or not taken
by shadowing its branch history. Unlike other branch types (un-
conditional and indirect branches) explained later, a conditional
branch is related to two kinds of prediction: branch prediction
and branch target prediction. For a conditional branch, we focus
on recognizing whether the branch prediction is correct or not
because it lets us know the result of the condition evaluation (i.e.,
a given condition of if statement or for loop). This goal dif-
fers from the previous branch timing attack against ASLR [13]
because its goal is finding a randomized target address of a
branch instruction by probing possible target addresses while
monitoring the penalty of branch target mispredictions.
Inferring through timing (RDTSC). We first explain how we
can infer branch mispredictions with RDTSC, which is based on
Evtyushkin et al.’s approach [13]. Figure 1 shows example
code with a conditional branch and its shadow for attack. The
victim code’s execution depends on the value of a: if a is not
zero, the branch will not be taken such that the if-block will
be executed; otherwise, the branch will be taken such that the
else-block will be executed. In contrast, we make the shadow
code’s branch always be taken (i.e., the else-block is always
executed). Without the branch history, this branch is always
mispredicted due to the static branch prediction rule (§2.2).
To exploit the branch history, we have to align the shadow
code’s address (both the branch instruction and its target address)
with the victim code’s address in terms of lower 31 bits, such
that the shadow code can share the same BTB entries with the
victim code.
When the victim code has been executed before the (aligned)
shadow code is executed, the branch prediction or misprediction
of the shadow code depends on the execution of the victim code.
If the conditional branch of the victim code has been taken, i.e.,
if a was zero, the BPU predicts that the shadow code will also
take the conditional branch, which is a correct prediction so that
no rollback will happen. If the conditional branch of the victim
code either has not been taken, i.e., if a was not zero, or has
not been executed, the BPU predicts that the shadow code will
not take the conditional branch. However, this is an incorrect
prediction such that a rollback will happen.
                        Correct prediction    Misprediction
                        Mean      σ           Mean      σ
RDTSCP                  94.21     13.10       120.61    806.56
Intel PT CYC packets    59.59     14.44       90.64     191.48
LBR elapsed cycle       25.69     9.72        35.04     10.52
Table 1: Measuring branch misprediction penalty with RDTSCP, Intel PT
CYC packet, and LBR elapsed cycle (10,000 times). Our machine has
an Intel Core i7 6700K CPU (4GHz). We put 120 NOP instructions at
the fall-through path. The LBR elapsed cycle is less noisy than RDTSCP
and Intel PT. σ stands for standard deviation.
Previous branch timing attacks try to measure such a rollback
penalty by using RDTSC or RDTSCP instructions (e.g., before Line
1 and after Line 5 of Figure 1b). However, according to our
experiments (Table 1), branch misprediction penalties were very
noisy such that it was difficult to set a clear boundary between
correct prediction and misprediction. This is because the num-
ber of instructions that would be mistakenly executed due to
the branch misprediction is difficult to predict given the highly
complicated internal structure of the latest Intel CPUs (e.g., out-
of-order execution). Therefore, we think that the RDTSC-based
inference is difficult to use in practice and, thus, we aim to use
the LBR to realize precise attacks, since it lets us know branch
misprediction information and its elapsed cycle feature has a
small noise (Table 1).
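To make the comparison in Table 1 concrete, the following sketch computes a simple separation score, the gap between the two means divided by the sum of the two standard deviations. This score is an illustrative choice made here, not a metric used in the paper; the struct and constant names are hypothetical, and the numbers are copied from Table 1.

```c
/* Mean and standard deviation of the measured cycles for correctly
 * predicted and mispredicted branches (values from Table 1). */
struct timing_stats {
    double mean_correct;
    double sigma_correct;
    double mean_mispredict;
    double sigma_mispredict;
};

static const struct timing_stats RDTSCP_STATS   = {94.21, 13.10, 120.61, 806.56};
static const struct timing_stats INTEL_PT_STATS = {59.59, 14.44,  90.64, 191.48};
static const struct timing_stats LBR_STATS      = {25.69,  9.72,  35.04,  10.52};

/* Illustrative score: |mean gap| / (sigma_correct + sigma_mispredict).
 * Larger values mean the two distributions are easier to tell apart. */
static double separation(struct timing_stats s)
{
    double gap = s.mean_mispredict - s.mean_correct;
    if (gap < 0)
        gap = -gap;
    return gap / (s.sigma_correct + s.sigma_mispredict);
}
```

Under this score the LBR elapsed cycle separates the two cases best, Intel PT comes second, and RDTSCP is last, which matches the qualitative ordering stated in the text.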
Inferring from execution traces (Intel PT). In addition to
RDTSC, we found that Intel PT can be used to measure a mispre-
diction penalty of a target branch, as it provides precise elapsed
cycles (known as a CYC packet) between each PT packet. However,
these CYC packets cannot be immediately used for our
purpose because Intel PT aggregates a series of conditional and
unconditional branches into a single packet as an optimization.
To avoid this problem, we intentionally insert an indirect branch
right after the target branch, making all branches properly record
their elapsed time in different CYC packets. As shown in Table 1,
Intel PT’s timing information about the branch misprediction
can significantly reduce the measurement variance of our attack
compared to RDTSCP.
Precise leakages (LBR). Figure 2 shows a detailed procedure
of conditional branch shadowing with the BTB and LBR. We
first explain the case where a conditional branch has been taken
(Case 1). ❶ A conditional branch of the victim code inside an
enclave is taken and the corresponding information (the branch
instruction’s address and the relative target address) is stored into
the BTB. Note that this taken branch occurs inside an enclave,
so the LBR does not report this information unless we
run the enclave process in debug mode. ❷ The enclave
execution is interrupted and the OS takes control. We explain
how a malicious OS can frequently interrupt an enclave process
in §3.6. ❸ The OS kernel enables the LBR and then executes the
shadow code. ❹ The BPU correctly predicts that the conditional
branch will be taken. ❺ Finally, by disabling and retrieving the
LBR, we can know the shadow code’s conditional branch has
been correctly predicted. Note that, by default, the LBR reports
all branches (including function calls) that occur in user
and kernel space. Since our shadow code has no function
calls and is executed in the kernel, we use the LBR’s filtering
mechanism to ignore every function call and all branches in
user space.
[Figure 2 (diagram): In both panels, the victim code inside the enclave executes cmp $0, rax followed by je 0xc2 at 0x400530, and the shadow code outside the enclave executes cmp rax, rax followed by je 0xc2 at 0xffff400530, which is always taken to 0xffff4005f4.
(a) Case 1: The target conditional branch has been taken. ❶ Take branch and store history (the LBR does not record branches inside the enclave). ❷ Interrupt. ❸ Enable LBR and execute shadow code. ❹ Predict branch with history (correct). ❺ Disable LBR and check the branch info. LBR entry: From 0xffff400530, To 0xffff4005f4, Predicted† Yes.
(b) Case 2: The target conditional branch has not been taken (i.e., it has either not been executed or been executed but not taken). ❶ No branch and delete history. ❷ Interrupt. ❸ Enable LBR and execute shadow code. ❹ Mispredict branch (incorrect). ❺ Take correct branch. ❻ Disable LBR and check the branch info. LBR entry: From 0xffff400530, To 0xffff4005f4, Predicted† No.]
Figure 2: Branch shadowing attack against a conditional branch (i.e., Case 1 for taken and Case 2 for non-taken branches) inside an enclave (†
LBR records the result of misprediction. For clarity, we use the result of prediction in this paper.)
Next, we explain the case where a conditional branch has not
been taken (Case 2). ❶ The conditional branch of the victim
code inside an enclave is not taken, so either no information
is stored into the BTB or the corresponding old information is
deleted (i.e., old information can be evicted if newer branches
need to be inserted in the same set). ❷ The enclave execution is
interrupted and the OS takes control. ❸ The OS kernel enables
the LBR and then executes the shadow code. ❹ The BPU
incorrectly predicts that the conditional branch will not be
taken. ❺ The execution is rolled back and the code takes the
branch. ❻ Finally, by disabling and retrieving the LBR, we
learn that the shadow code’s conditional branch has been
mispredicted, and we can observe the misprediction penalty.
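
The two cases above can be illustrated with a toy model. The following is only a sketch under stated assumptions, not the real BPU: the names `toy_btb`, `toy_btb_record`, and `toy_shadow_probe` are hypothetical, and the choice to key entries on the low 24 address bits is made purely so that the illustrative addresses 0x400530 and 0xffff400530 from Figure 2 collide.

```c
#include <stdbool.h>
#include <stdint.h>

/* ASSUMPTION: the toy BTB looks only at the low 24 address bits, so that
 * the example addresses 0x400530 (enclave) and 0xffff400530 (shadow)
 * map to the same entry. Real hardware uses a different hash. */
#define TOY_ADDR_MASK 0xffffffULL
#define TOY_BTB_SLOTS 256u

struct toy_btb {
    bool     valid[TOY_BTB_SLOTS];
    uint64_t tag[TOY_BTB_SLOTS];
};

static uint64_t toy_key(uint64_t addr)  { return addr & TOY_ADDR_MASK; }
static unsigned toy_slot(uint64_t addr) { return (unsigned)(toy_key(addr) % TOY_BTB_SLOTS); }

/* Record the outcome of a conditional branch located at addr:
 * a taken branch stores an entry, a not-taken branch deletes it. */
void toy_btb_record(struct toy_btb *btb, uint64_t addr, bool taken)
{
    unsigned s = toy_slot(addr);
    if (taken) {
        btb->valid[s] = true;
        btb->tag[s] = toy_key(addr);
    } else if (btb->valid[s] && btb->tag[s] == toy_key(addr)) {
        btb->valid[s] = false;
    }
}

/* Execute an always-taken shadow branch at shadow_addr and return what the
 * LBR would report as "predicted": true means the victim branch was taken
 * (Case 1), false means it was not taken or never executed (Case 2). */
bool toy_shadow_probe(struct toy_btb *btb, uint64_t shadow_addr)
{
    unsigned s = toy_slot(shadow_addr);
    bool predicted_taken = btb->valid[s] && btb->tag[s] == toy_key(shadow_addr);
    toy_btb_record(btb, shadow_addr, true); /* the probe itself updates the BTB */
    return predicted_taken;
}
```

Note that the probe overwrites the entry it inspects, which is one reason a real attack has to probe before the victim re-executes the branch.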
3.4 Unconditional Branch Shadowing
We explain how an attacker can know whether a target uncon-
ditional branch inside an enclave has been executed or not by
shadowing its branch history. The execution of an unconditional
branch gives us two kinds of information. First, an attacker
can infer where the instruction pointer (IP) inside an enclave
currently points. Second, an attacker can infer the result of
the condition evaluation of an if-else statement because an if
block’s last instruction is an unconditional branch to skip the
corresponding else block.
Unlike a conditional branch, an unconditional branch is al-
ways taken when it is executed; i.e., a branch prediction is
not needed. Thus, to recognize its behavior, we need to divert
its target address to observe branch target mispredictions, not
branch mispredictions. Interestingly, we found that the LBR
does not report the branch target misprediction of an unconditional
branch, unlike those of conditional and indirect branches. Thus,
we use the elapsed cycles of a branch that the LBR reports to
identify the branch target misprediction penalty; this measurement
is less noisy than RDTSC (Table 1).
Attack procedure. Figure 3 shows the procedure of unconditional
branch shadowing. Unlike conditional branch shadowing,
we make the target of the shadowed unconditional branch
differ from the target of the victim unconditional branch inside
an enclave to monitor a branch target misprediction of the shadowed
branch. We first explain the case where an unconditional
branch has been executed (Case 3). ❶ An unconditional branch
of the victim code inside an enclave is executed (i.e., taken) and
the corresponding information is stored into the BTB. ❷ The
enclave execution is interrupted and the OS takes control. ❸ The
OS kernel enables the LBR and then executes the shadow code.
❹ The BPU mispredicts the branch target of the shadowed
unconditional branch due to the mismatched branch history. ❺
The execution is rolled back and the shadow code jumps to
the correct target. ❻ The shadow code executes an additional
branch to measure the elapsed cycles of the mispredicted branch.
❼ Finally, by disabling and retrieving the LBR, we infer that
a branch target misprediction happened, according to the large
number of elapsed cycles.
Next, we explain the case where an unconditional branch has
not been executed (Case 4). ❶ The enclave does not execute an
unconditional branch of the victim code (e.g., a function containing
the code is never executed), so the BTB does not have any
information about the branch. ❷ The enclave execution is interrupted
and the OS takes control. ❸ The OS kernel enables the
LBR and then executes the shadow code. ❹ The BPU correctly
predicts the unconditional branch’s target because no branch history
exists. ❺ The shadow code executes an additional branch
to measure the elapsed cycles. ❻ By disabling and retrieving
the LBR, we infer that no branch target misprediction happened,
according to the small number of elapsed cycles.
[Figure 3 panels: (a) Case 3: The target unconditional branch has been taken. The LBR does not report the misprediction of unconditional branches, but we can infer it by using the elapsed cycles (35 cycles in the illustrated LBR). (b) Case 4: The target unconditional branch has not been taken (i.e., it has not been executed; 25 cycles in the illustrated LBR).]
Figure 3: Branch prediction attack against an unconditional branch inside an enclave.
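
The final step of Cases 3 and 4 amounts to thresholding the elapsed-cycle field of the LBR entry that follows the shadowed jump. The sketch below is illustrative only: `struct lbr_record` is a simplified, hypothetical view of an LBR entry, and the threshold is an assumption that an attacker would have to calibrate on the target machine (Figure 3 shows 35 versus 25 cycles as example values).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one LBR entry. */
struct lbr_record {
    uint64_t from;      /* address of the branch instruction */
    uint64_t to;        /* branch target */
    bool     predicted; /* always true for unconditional branches */
    uint32_t cycles;    /* cycles elapsed since the previous recorded branch */
};

/* Find the shadowed jump (by its source address) and inspect the elapsed
 * cycles of the next record, i.e., the additional measurement branch.
 * Returns 1 if the victim branch was likely executed (large penalty),
 * 0 if it was likely not executed, and -1 if the records are missing. */
int infer_uncond_executed(const struct lbr_record *recs, size_t n,
                          uint64_t shadow_from, uint32_t threshold)
{
    for (size_t i = 0; i + 1 < n; i++) {
        if (recs[i].from == shadow_from)
            return recs[i + 1].cycles > threshold ? 1 : 0;
    }
    return -1;
}
```

With the example values of Figure 3 and an assumed threshold of 30 cycles, the record carrying 35 cycles would be classified as executed and the one carrying 25 cycles as not executed.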
No misprediction of unconditional branch. We found that
the LBR always reports every taken unconditional branch
as correctly predicted, regardless of whether a branch target
misprediction has happened due to a BTB collision. Intel
does not document this behavior, but we think that this is
because the target of an unconditional branch is fixed such that
it should not be mispredicted in general. The LBR is designed
for profiling branches and letting programmers know which
branches are frequently mispredicted. With this information,
they can improve the performance of their program by reorganizing
branches to reduce mispredictions. In contrast, programmers
have no way to handle mispredicted unconditional branches,
which depend on the execution of the kernel or another process
simultaneously running on the same core due to hyperthreading;
i.e., reporting them does not help programmers improve their
programs and only reveals side-channel information. We believe these are
the reasons the LBR just treats every unconditional branch as
correctly predicted.
3.5 Indirect Branch Shadowing
We explain how we can know whether a target indirect branch
inside an enclave has been executed by shadowing its branch his-
tory. Like an unconditional branch, an indirect branch is always
taken when it is executed. However, unlike an unconditional
branch, an indirect branch has no fixed branch target. If there is
no history, the BPU predicts that the instruction immediately following
the branch will be executed; this is the same as the indirect branch not being taken.
To recognize its behavior, we make a shadowed indirect branch
target its next instruction so that we can monitor a branch target
misprediction caused by the history. The LBR reports the mispredictions of
indirect branches, so we do not need to rely on elapsed
cycles to attack indirect branches.
Attack procedure. Figure 4 shows the detailed procedure of
indirect branch shadowing. As mentioned previously, we set
the target of the shadowed indirect branch to its next
instruction to observe whether a branch target misprediction
happens due to the branch history. We first explain the
case where an indirect branch has been executed (Case
5). ❶ An indirect branch of the victim code inside an enclave
is executed (i.e., taken) and the corresponding information is
stored into the BTB. ❷ The enclave execution is interrupted and
the OS takes control. ❸ The OS kernel enables the LBR and then
executes the shadow code. ❹ The BPU mispredicts the branch
target of the shadowed indirect branch due to the mismatched
branch history. ❺ The execution is rolled back and the shadow
code jumps to the correct target. ❻ Finally, by disabling and
retrieving the LBR, we learn that the shadow code’s indirect
branch has been incorrectly predicted.
[Figure 4 panels: (a) Case 5: The target indirect branch has been taken. (b) Case 6: The target indirect branch has not been taken (i.e., it has not been executed).]
Figure 4: Branch prediction attack against an indirect branch inside an enclave.
Next, we explain the case where an indirect branch has not
been executed (Case 6). ❶ The enclave does not execute the
indirect branch of the victim code, so the BTB does not
have any information about the branch. ❷ The enclave execution is
interrupted and the OS takes control. ❸ The OS kernel enables
the LBR and then executes the shadow code. ❹ The BPU
correctly predicts the indirect branch’s target because there is
no branch history. ❺ Finally, by disabling and retrieving the
LBR, we learn that the shadow code’s indirect branch has
been correctly predicted, implying that the victim code’s indirect
branch has not been executed.
Inferring branch targets. Unlike conditional and unconditional
branches, an indirect branch can have multiple targets, so
just knowing whether it has been executed would be
insufficient to know the victim code’s execution. Since an indirect
branch is mostly used to implement a switch-case statement,
it is also associated with a number of unconditional branches
(i.e., break), as an if-else statement is. This implies that
an attacker can identify which case block has been executed by
probing the corresponding unconditional branch. Also, if an attacker
can repeatedly execute a victim enclave program with the
same input, he or she can test the same indirect branch multiple
times while changing candidate target addresses to eventually
learn the real target address by observing a correct branch target
prediction.
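
The candidate-testing strategy can be sketched as a simple search loop. Everything here is hypothetical: `probe_fn` stands for one full round of re-running the victim, interrupting it, and executing a shadowed indirect branch that jumps to the candidate; `toy_target_probe` is a stand-in oracle for testing; and the 24-bit mask is an assumption chosen only so that the example addresses of Figure 4 collide.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: only the low 24 address bits take part in the comparison. */
#define TOY_ADDR_MASK 0xffffffULL

/* Hypothetical oracle: returns true if the LBR reports the shadowed indirect
 * branch to `candidate` as correctly predicted in this round. */
typedef bool (*probe_fn)(uint64_t candidate, void *ctx);

/* Try each candidate target in turn; return the index of the first one that
 * yields a correct prediction, or -1 if none matches. */
int find_indirect_target(probe_fn probe, void *ctx,
                         const uint64_t *candidates, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (probe(candidates[i], ctx))
            return (int)i;
    }
    return -1;
}

/* Toy oracle for experimentation: ctx points to the victim's real target.
 * It ignores the no-history case, in which the fall-through address would
 * also be reported as correctly predicted. */
bool toy_target_probe(uint64_t candidate, void *ctx)
{
    uint64_t victim_target = *(const uint64_t *)ctx;
    return (candidate & TOY_ADDR_MASK) == (victim_target & TOY_ADDR_MASK);
}
```

Each call to the oracle corresponds to one repeated execution of the victim with the same input, so the cost of the search grows linearly with the number of candidate targets.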
Table 2 summarizes the branch types and states our branch
shadowing attack can infer and the necessary information.
3.6 Frequent Interrupt and Probe
The branch shadowing attack needs to consider cases that change
(or even remove) BTB entries because they make the attack miss
some branch histories. First, the size of the BTB is limited such
that a BTB entry could be overwritten by another branch instruction.
We empirically identified that Skylake’s BTB has
4,096 entries, where the number of ways is four and the number
of sets is 1,024 (§5.1). Due to its well-designed index hashing
algorithm, we observed that conflicts between two branch instructions
located at different addresses rarely happened. Nevertheless,
if more than 4,096 different branch instructions
have been taken, the BTB will very likely overflow and
we will lose some branch histories. Second, a BTB entry for a conditional
or an indirect branch can be removed or changed due
to a loop or re-execution of the same function. For example,
a conditional branch may be taken at its first run and not
taken at its second run due to changes in the given
condition, which removes the corresponding BTB entry. The target of
an indirect branch can also change according to conditions,
which changes the corresponding BTB entry. If the branch shadowing
attack cannot check a BTB entry before it is
changed, it will lose the information.
Branch    State      BTB  LBR Pred.  LBR Elapsed Cycl.  Inferred
Cond.     Taken      ✓    ✓          -                  ✓
Cond.     Not-taken  -    ✓          -                  ✓
Uncond.   Exec.      ✓    -          ✓                  ✓
Uncond.   Not-exec.  -    -          ✓                  ✓
Indirect  Exec.      ✓    ✓          -                  ✓
Indirect  Not-exec.  -    ✓          -                  ✓
Table 2: Branch types and states the branch shadowing attack can infer
by using the information of BTB and/or LBR.
To overcome this challenge, we interrupt an enclave process
as frequently as possible and check the branch history, by ma-
nipulating the local APIC timer and the CPU cache.
Manipulating local APIC timer. We manipulate the frequency
of the local APIC timer in a recent version of Linux
that uses the TSC-deadline mode timer. Figure 5 shows how
we modified the lapic_next_deadline() function, which specifies
the next TSC deadline, and the local_apic_timer_interrupt()
function, which is called whenever a timer interrupt is fired. We
created and exported two global variables and a function pointer
to manipulate the behaviors of lapic_next_deadline()
and local_apic_timer_interrupt() from a kernel module:
lapic_next_deadline_delta to change the delta;
lapic_target_cpu to specify a virtual CPU running a victim enclave
process (via CPU affinity); and timer_interrupt_hook
to specify a function to be called whenever a timer interrupt is
generated. In our evaluation environment with an Intel Core
i7 6700K CPU (4GHz), the minimum usable delta value was 1,000;
i.e., a timer interrupt fires about every 1,000
cycles. Note that, in our environment, a delta value lower than
1,000 made the entire system freeze because a timer interrupt
was generated before the previous timer interrupt had been handled by the
interrupt handler.
/* linux-4.4.23/arch/x86/kernel/apic/apic.c */
...
// manipulate the delta of TSC-deadline mode
unsigned int lapic_next_deadline_delta = 0U;
EXPORT_SYMBOL_GPL(lapic_next_deadline_delta);

// specify the virtual core under attack
int lapic_target_cpu = -1;
EXPORT_SYMBOL_GPL(lapic_target_cpu);

// a hook to launch branch shadowing attack
void (*timer_interrupt_hook)(void*) = NULL;
EXPORT_SYMBOL_GPL(timer_interrupt_hook);
...
// update the next TSC deadline
static int lapic_next_deadline(unsigned long delta,
                               struct clock_event_device *evt) {
    u64 tsc;

    tsc = rdtsc();
    if (smp_processor_id() != lapic_target_cpu) {
        wrmsrl(MSR_IA32_TSC_DEADLINE,
               tsc + (((u64) delta) * TSC_DIVISOR)); // original
    }
    else {
        wrmsrl(MSR_IA32_TSC_DEADLINE,
               tsc + lapic_next_deadline_delta); // custom deadline
    }
    return 0;
}
...
// handle a timer interrupt
static void local_apic_timer_interrupt(void) {
    int cpu = smp_processor_id();
    struct clock_event_device *evt = &per_cpu(lapic_events, cpu);

    if (cpu == lapic_target_cpu && timer_interrupt_hook) {
        timer_interrupt_hook((void*)&cpu); // call attack code
    }
    ...
}
Figure 5: Modified local APIC timer code of Linux kernel 4.4.23. We
changed lapic_next_deadline() to manipulate the next TSC deadline
and local_apic_timer_interrupt() to launch the branch shadowing
attack. We wrote a kernel module to change the exported global variables
and function pointer.
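
To put the minimum delta into perspective, the arithmetic can be written down as a small helper. This is only a sketch: the function names are hypothetical, and it assumes that the TSC ticks at the nominal 4 GHz clock of the evaluation CPU and that an interrupt fires exactly once per delta, whereas the text only states "about every 1,000 cycles".

```c
#include <stdint.h>

/* Convert a TSC-deadline delta (in cycles) into nanoseconds.
 * ASSUMPTION: tsc_hz is the constant TSC frequency and is nonzero. */
uint64_t tsc_delta_to_ns(uint64_t delta_cycles, uint64_t tsc_hz)
{
    return delta_cycles * 1000000000ULL / tsc_hz;
}

/* Upper bound on timer interrupts per second for a given delta.
 * ASSUMPTION: delta_cycles is nonzero. */
uint64_t interrupts_per_second(uint64_t delta_cycles, uint64_t tsc_hz)
{
    return tsc_hz / delta_cycles;
}
```

Under these assumptions, a delta of 1,000 cycles at 4 GHz corresponds to an interrupt roughly every 250 ns, i.e., up to about four million interrupts per second on the target core.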
We also counted how many CPU instructions can be executed
between such frequent timer interrupts by running a loop with
an ADD instruction. On average, about 48.76 ADD instructions
were executed between two timer interrupts (standard deviation:
2.75)¹. This implies that, by using this frequent timer, we can
apply the branch shadowing attack to a victim enclave process
about every 50 instructions.
¹ The number of iterations was 10,000. We disabled Hyper-Threading,
SpeedStep, TurboBoost, and C-States to reduce noise.
    /* isgx_ioctl.c */
    ...
    static long isgx_ioctl_enclave_create(struct file *filep,
                                          unsigned int cmd, unsigned long arg) {
        ...
        struct isgx_create_param *createp =
            (struct isgx_create_param *) arg;
        void *secs_la = createp->secs;
        struct isgx_secs *secs = NULL;
        // SGX Enclave Control Structure (SECS)
        long ret;
        ...
        secs = kzalloc(sizeof(*secs), GFP_KERNEL);
        ret = copy_from_user((void *)secs, secs_la, sizeof (*secs));
        ...
⋆       secs->base = vm_mmap(file, MANIPULATED_BASE_ADDR, secs->size,
⋆                            PROT_READ | PROT_WRITE | PROT_EXEC,
⋆                            MAP_SHARED, 0);
        ...
    }
Figure 6: Modified Intel SGX driver to manipulate the base address of
an enclave (modified lines are marked with ⋆).
Disabling cache. If we want to attack a very short loop having
branches (i.e., shorter than 50 instructions), the frequent timer
interrupt would not be enough. To interrupt an enclave process
more frequently, we selectively disable the L1 and L2 caches of
the CPU core running the victim enclave process by setting the
cache disable (CD) bit of the CR0 control register through a
kernel module. With the frequent timer interrupt and disabled
cache, about 4.71 ADD instructions were executed between two
timer interrupts on average (standard deviation: 1.96 with 10,000
iterations). This would be enough to attack most branches. One
limitation of cache disabling is that it significantly slows the
execution of a victim enclave process such that the process may
notice it is under attack. Therefore, an attacker needs to
carefully disable the cache only for certain cases (e.g., when
he or she recognizes a victim enclave process is executing a
function containing a very short loop).
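
The bit manipulation behind cache disabling can be sketched as pure functions on a CR0 value. The helper names are hypothetical. In an actual kernel module the value would be read and written on the target core with privileged accessors (for example, Linux's read_cr0() and write_cr0() helpers, followed by a cache flush); that privileged part is an assumption about how one might apply it and is deliberately left out here.

```c
#include <stdint.h>

#define CR0_NW (1ULL << 29) /* not write-through */
#define CR0_CD (1ULL << 30) /* cache disable */

/* Return a CR0 value with caching disabled (CD = 1, NW = 0). */
uint64_t cr0_cache_disabled(uint64_t cr0)
{
    return (cr0 | CR0_CD) & ~CR0_NW;
}

/* Return a CR0 value with normal caching restored (CD = 0, NW = 0). */
uint64_t cr0_cache_enabled(uint64_t cr0)
{
    return cr0 & ~(CR0_CD | CR0_NW);
}
```

Keeping the two transformations symmetric makes it straightforward to re-enable the cache as soon as the short loop of interest has been passed, which limits the slowdown the victim could notice.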
3.7 Virtual Address Manipulation
To perform the branch shadowing attack, an attacker has to
manipulate the virtual addresses of a victim enclave process.
Since the attacker has already compromised an OS, manipulating
the page table to change virtual addresses is an easy task. For
simplicity, we assume the attacker disables the user-space ASLR
and modifies the Intel SGX driver for Linux (vm_mmap) to change
the base address of an enclave, as shown in Figure 6. Also, the
attacker puts an arbitrary number of NOP instructions before the
shadow code to satisfy the alignment.
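
The amount of NOP padding needed for alignment follows from modular arithmetic on the two branch addresses. This is a minimal sketch with hypothetical names; the number of address bits that must agree (`bits`) is an assumption to be determined for the target CPU, and the padding is counted in single-byte NOP instructions.

```c
#include <stdint.h>

/* Number of one-byte NOPs to insert before the shadow branch so that its
 * address agrees with the victim branch address in the low `bits` bits.
 * shadow_branch_addr is where the shadow branch would sit without padding. */
uint64_t nop_padding_bytes(uint64_t shadow_branch_addr,
                           uint64_t victim_branch_addr, unsigned bits)
{
    uint64_t mask = (bits >= 64) ? ~0ULL : ((1ULL << bits) - 1);
    /* Unsigned subtraction wraps around, which gives the distance modulo 2^64;
     * masking reduces it modulo 2^bits. */
    return (victim_branch_addr - shadow_branch_addr) & mask;
}
```

For instance, if the shadow branch would otherwise start at 0xffff400000 and the victim branch is at 0x400530, then with an assumed 24 matching bits the helper yields 0x530 bytes of padding.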
3.8 Attack Synchronization
Although the branch shadowing attack probes multiple branches
in each iteration, it is insufficient when a victim enclave program
is large. One approach to overcome this limitation is to apply
the branch shadowing attack at the function level. Namely, an
attacker first infers functions a victim enclave program either has
executed or is currently executing and then probes branches be-
longing to the functions. If those functions contain entry points
that can be invoked from outside (via the EENTER instruction) or
rely on external calls, the attacker can correctly identify them
because they are controllable and observable by the OS. How-
ever, the attacker needs another strategy to infer the execution
of non-exported functions.
To find such executed functions, an attacker can create special
shadow code consisting of always reachable branches of target
functions (e.g., a conditional or unconditional branch located
at the prologue). By periodically executing this special shadow
code, the attacker can learn which functions have been executed
and then run the shadow code specific to those functions.
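
The two-stage strategy (probe the prologue branch of each function first, then only the branches of functions that ran) might be organized as follows. All names are hypothetical; `branch_probe_fn` abstracts one shadowing probe, and `toy_probe` is a stand-in oracle that simply looks up a list of executed branches.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical oracle: true if the branch at branch_addr has been executed. */
typedef bool (*branch_probe_fn)(uint64_t branch_addr, void *ctx);

struct target_function {
    uint64_t        prologue_branch; /* always-reachable branch of the function */
    const uint64_t *inner_branches;  /* branches probed only if the function ran */
    size_t          n_inner;
};

/* Stage 1: probe each prologue branch. Stage 2: for functions that ran,
 * probe their inner branches. Executed inner branches are written to hits;
 * the return value is the number of hits stored (at most max_hits). */
size_t probe_by_function(const struct target_function *fns, size_t n_fns,
                         branch_probe_fn probe, void *ctx,
                         uint64_t *hits, size_t max_hits)
{
    size_t n_hits = 0;
    for (size_t f = 0; f < n_fns; f++) {
        if (!probe(fns[f].prologue_branch, ctx))
            continue; /* function not executed: skip its branches */
        for (size_t b = 0; b < fns[f].n_inner && n_hits < max_hits; b++) {
            if (probe(fns[f].inner_branches[b], ctx))
                hits[n_hits++] = fns[f].inner_branches[b];
        }
    }
    return n_hits;
}

/* Toy oracle for experimentation: ctx lists the executed branch addresses. */
struct executed_set { const uint64_t *addrs; size_t n; };

bool toy_probe(uint64_t branch_addr, void *ctx)
{
    const struct executed_set *set = ctx;
    for (size_t i = 0; i < set->n; i++) {
        if (set->addrs[i] == branch_addr)
            return true;
    }
    return false;
}
```

The benefit of this layout is that the number of probes per interrupt scales with the number of functions plus the branches of the functions that actually ran, rather than with every branch in the enclave.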
Also, we can use the page-fault side channel [62] to synchronize
attacks at page granularity. Since this channel allows an
attacker to know a code page that is about to be executed, he or
she only needs to check functions located in the code page. However,
this approach would not work when a victim enclave process is
protected by recent defenses [52, 10, 51] against page-fault
side channels.
4 Case Studies
In this section, we explain how we can use the branch shadowing
attack against SGX applications and against recent studies that secure SGX
against the controlled-channel attack.
4.1 Attacking Enclave Applications
We explain how the branch shadowing attack infers fine-grained
control-flow information of target SGX programs. Specifically,
we focus on examples in which the controlled-channel attack
cannot extract any information, e.g., control flows within a single
page.
Linux SGX SDK. We attacked two libc functions, strtol()
and vfprintf(), supported by the Linux SGX SDK. Figure 7a is
a simplified strtol() function that converts a string into an
integer. By using the branch shadowing attack, we were able to
infer the sign of an input number by checking the branches in
Lines 10–15. Also, we could infer the length of an input number
by checking the loop branch in Lines 18–27. In addition, when
an input number was hexadecimal, we were able to use the
branch at Line 20 to know whether each digit was larger than
nine.
Figure 7b is a simplified vfprintf() function used to print
a formatted string. The branch shadowing attack was able to
infer the format string by checking the switch-case statement
in Lines 4–13 and the types of input arguments to this function
according to the switch-case statement in Lines 15–23. In contrast,
the controlled-channel attack cannot infer this information
because the functions called by vfprintf(), including ADDSARG()
and va_arg(), are inline functions, so no page fault sequence will
be observed.
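
To make the strtol() leakage concrete, the sketch below pairs a simplified stand-in for the victim, which logs which branch it follows, with an attacker-side routine that recovers the sign, the length, and the number of alphabetic digits from the branch trace alone. The event encoding and all names are hypothetical, and the stand-in omits details of the real function (such as base checks and overflow handling).

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Branch outcomes an attacker is assumed to observe (hypothetical encoding). */
enum branch_event { EV_MINUS, EV_DIGIT, EV_ALPHA, EV_BREAK };

/* Victim stand-in: follows the branch structure of Figure 7a and records
 * which branch is followed. Returns the number of recorded events. */
size_t victim_parse(const char *s, enum branch_event *trace, size_t cap)
{
    size_t n = 0;
    int c;

    do { c = (unsigned char)*s++; } while (isspace(c));
    if (c == '-') {
        if (n < cap) trace[n++] = EV_MINUS;
        c = (unsigned char)*s++;
    } else if (c == '+') {
        c = (unsigned char)*s++;
    }
    for (;; c = (unsigned char)*s++) {
        enum branch_event ev;
        if (isdigit(c)) ev = EV_DIGIT;
        else if (isalpha(c)) ev = EV_ALPHA;
        else ev = EV_BREAK;
        if (n < cap) trace[n++] = ev;
        if (ev == EV_BREAK) break; /* also stops at the terminating NUL */
    }
    return n;
}

struct strtol_leak {
    bool   negative;     /* the '-' branch was followed */
    size_t length;       /* number of loop iterations that consumed a digit */
    size_t alpha_digits; /* digits larger than nine (hexadecimal letters) */
};

/* Attacker side: reconstruct properties of the input from the trace only. */
struct strtol_leak recover_from_trace(const enum branch_event *trace, size_t n)
{
    struct strtol_leak leak = { false, 0, 0 };
    for (size_t i = 0; i < n; i++) {
        switch (trace[i]) {
        case EV_MINUS: leak.negative = true; break;
        case EV_DIGIT: leak.length++; break;
        case EV_ALPHA: leak.length++; leak.alpha_digits++; break;
        case EV_BREAK: break;
        }
    }
    return leak;
}
```

The recovery routine never sees the input string itself, which mirrors the point that branch outcomes alone are enough to learn the sign and length of the number.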
 1    /* linux-sgx/sdk/tlibc/stdlib/strtol.c */
 2    long strtol(const char *nptr, char **endptr, int base) {
 3      const char *s;
 4      long acc, cutoff;
 5      int c, neg, any;
 6
 7      s = nptr;
 8      do { c = (unsigned char) *s++; } while (isspace(c));
 9
10 ⋆    if (c == '-') {
11 ⋆      neg = 1; c = *s++;
12 ⋆    } else {
13 ⋆      neg = 0;
14 ⋆      if (c == '+') c = *s++;
15 ⋆    } // infer the sign of an input number
16
17 ⋆    for (acc = 0, any = 0;; c = (unsigned char) *s++) {
18 ⋆      if (isdigit(c)) c -= '0';
19 ⋆      else if (isalpha(c)) c -= isupper(c) ? 'A' - 10 : 'a' - 10;
20 ⋆      // infer hexadecimal
21        else break;
22
23        if (!neg) {
24          acc *= base; acc += c;
25        }
26        ...
27 ⋆    } // infer the length of an input number
28
29      return acc;
30    }
(a) Simplified strtol(). The branch shadowing attack can infer the
sign and length of an input number.
 1    /* linux-sgx/sdk/tlibc/stdio/vfprintf.c */
 2    int __vfprintf(FILE *fp, const char *fmt0, __va_list ap) {
 3      ...
 4      for (;;) {
 5        ch = *fmt++;
 6        switch (ch) {
 7        ...
 8 ⋆      case 'd': case 'i': ADDSARG(); break;
 9 ⋆      case 'p': ADDTYPE_CHECK(TP_VOID); break;
10 ⋆      case 'X': case 'x': ADDUARG(); break;
11        ...
12        }
13      } // infer input format string
14      ...
15      for (n = 1; n <= tablemax; n++) {
16        switch (typetable[n]) {
17 ⋆      case T_INT:
18 ⋆        (*argtable)[n].intarg = va_arg(ap, int); break;
19 ⋆      case T_DOUBLE:
20 ⋆        (*argtable)[n].doublearg = va_arg(ap, double); break;
21        ...
22        }
23      } // infer the types of input arguments
24      ...
25      return ret;
26    }
(b) Simplified vfprintf(). The branch shadowing attack can infer the
format string and the types of the variable arguments.
Figure 7: libc functions of Linux SGX SDK attacked by branch shadowing
(probed branches are marked with ⋆).
mbed TLS. mbed TLS is a lightweight implementation of
TLS. We ported it to Intel SGX and tried to attack its RSA
implementation, which was not supported by Intel SGX SDK.
mbed TLS’s RSA uses the Montgomery multiplication, as shown
in Figure 8, which has a dummy subtraction (Lines 24–27) to
prevent the well-known remote timing attack [8]. The branch
shadowing attack was able to detect the execution of this dummy
branch. However, the controlled-channel attack cannot know whether a
dummy subtraction has happened because both real and dummy
branches execute the same function: mpi_sub_hlp().
 1    /* bignum.c */
 2    static int mpi_montmul(mbedtls_mpi *A, const mbedtls_mpi *B,
 3                           const mbedtls_mpi *N, mbedtls_mpi_uint mm,
 4                           const mbedtls_mpi *T) {
 5      size_t i, n, m;
 6      mbedtls_mpi_uint u0, u1, *d;
 7
 8      d = T->p; n = N->n; m = (B->n < n) ? B->n : n;
 9
10      for (i = 0; i < n; i++) {
11        u0 = A->p[i];
12        u1 = (d[0] + u0 * B->p[0]) * mm;
13
14        mpi_mul_hlp(m, B->p, d, u0);
15        mpi_mul_hlp(n, N->p, d, u1);
16
17        *d++ = u0; d[n+1] = 0;
18      }
19
20 ⋆    if (mbedtls_mpi_cmp_abs(A, N) >= 0) {
21 ⋆      mpi_sub_hlp(n, N->p, A->p);
22 ⋆      i = 1;
23 ⋆    }
24 ⋆    else { // dummy subtraction to prevent timing attacks
25 ⋆      mpi_sub_hlp(n, N->p, T->p);
26 ⋆      i = 0;
27 ⋆    }
28      return 0;
29    }
Figure 8: Montgomery multiplication (mpi_montmul()) of mbed TLS.
The branch shadowing attack can infer whether a dummy subtraction
has been performed or not.
LIBSVM. LIBSVM is a popular library supporting support vector
machine (SVM) classifiers. We ported the classification logic
of LIBSVM to Intel SGX because it would be a good example
of machine learning as a service [40] while hiding the detailed
parameters. Figure 9 shows LIBSVM’s kernel function code
running inside an enclave. The branch shadowing attack can
recognize the kernel type, such as linear, polynomial, and radial
basis function (RBF), due to the switch-case statement in Lines
4–28. Also, when a victim used an RBF kernel, we were able to
infer the number of features (i.e., the length of a vector) he or
she used (Lines 11–20).
 1    /* svm.cpp */
 2    double Kernel::k_function(const svm_node *x, const svm_node *y,
 3                              const svm_parameter& param) {
 4      switch(param.kernel_type) {
 5 ⋆    case LINEAR:
 6 ⋆      return dot(x,y);
 7 ⋆    case POLY:
 8 ⋆      return powi(param.gamma*dot(x,y)+param.coef0,param.degree);
 9 ⋆    case RBF:
10        double sum = 0;
11        while (x->index != -1 && y->index != -1) {
12 ⋆        if (x->index == y->index) {
13 ⋆          double d = x->value - y->value;
14 ⋆          sum += d*d; ++x; ++y;
15 ⋆        }
16 ⋆        else {
17 ⋆          ...
18 ⋆        }
19          ...
20 ⋆      } // infer the lengths of x and y
21 ⋆      return exp(-param.gamma*sum);
22 ⋆    case SIGMOID:
23 ⋆      return tanh(param.gamma*dot(x,y)+param.coef0);
24 ⋆    case PRECOMPUTED:
25 ⋆      return x[(int)(y->value)].value;
26      default:
27        return 0;
28 ⋆    } // infer the kernel type
29    }
Figure 9: Kernel function of LIBSVM. The branch shadowing attack
can infer the kernel type.
Apache. Apache is one of the most widely used web servers. We ported
Apache by decoupling the original Apache program such that
some modules, such as the HTTP module, are secured by Intel
SGX. Figure 10 shows a lookup function of Apache to parse the
method of an HTTP request. Due to its switch-case statements,
we can easily identify the method of a target HTTP request, such
as GET, POST, DELETE, and PATCH. Since this function invokes
either no function or memcmp(), the controlled-channel attack
has no chance to identify the method.
    /* http_protocol.c */
    static int lookup_builtin_method(const char *method, apr_size_t len)
    {
      ...
      switch (len) {
⋆     case 3:
        switch (method[0]) {
⋆       case 'P': return (method[1] == 'U' && method[2] == 'T'
⋆                         ? M_PUT : UNKNOWN_METHOD);
⋆       case 'G': return (method[1] == 'E' && method[2] == 'T'
⋆                         ? M_GET : UNKNOWN_METHOD);
        default: return UNKNOWN_METHOD;
        }
      ...
⋆     case 5:
        switch (method[2]) {
⋆       case 'T': return (memcmp(method, "PATCH", 5) == 0
⋆                         ? M_PATCH : UNKNOWN_METHOD);
⋆       case 'R': return (memcmp(method, "MERGE", 5) == 0
⋆                         ? M_MERGE : UNKNOWN_METHOD);
        ...
        }
⋆     ...
      default:
        return UNKNOWN_METHOD;
      }
    }
Figure 10: HTTP method lookup function in Apache’s http module.
The branch shadowing attack can infer the type of HTTP method sent by
clients.
4.2 Attacking Side Channel Mitigations
In this section, we review recent studies [52, 10, 51] that aim to secure
Intel SGX against page-fault and/or cache-timing attacks, and
explain how the branch shadowing attack can defeat them. We
also discuss how we can use the branch shadowing attack to
break an ASLR implementation in SGX [49], though it is outside
the scope of our threat model.
Deterministic multiplexing. To prevent the page-fault side
channel, Shinde et al. [52] propose a deterministic multiplexing
technique to make all page accesses oblivious. This technique
is a weak form of the oblivious RAM (ORAM) technique
[56, 34, 37, 46], but it is much faster when developer-assisted
compiler optimization is applied (at most 1.29× overhead).
The deterministic multiplexing works as follows. First, it
makes the execution tree of each function balanced by introducing
dummy (or decoy) branches and basic blocks (Figure 11a).
This balanced execution tree is necessary to hide the behavior
and execution time of a function because they can reveal which
basic blocks of the function have been executed. Next, the deterministic
multiplexing puts all real and dummy code blocks at
the same execution level into the same code page and all data
blocks that the code blocks will access into the same data page.
This ensures that whether an enclave process is executing a real
or dummy block, a page fault will occur at the same page. Thus,
monitoring page fault sequences no longer reveals the control
flows of a victim enclave process.
However, the branch shadowing attack can easily defeat the
deterministic multiplexing technique because this attack can
know whether a victim enclave process is executing a real or
dummy block by using the branch history, not the page faults.
That is, the selective execution of real or dummy branches according
to condition evaluation cannot hide any secrets from the
branch shadowing attack. One possible way to improve the deterministic
multiplexing technique is to always execute both
real and dummy branches as Raccoon [46] does. However, it
is difficult to use in practice due to its huge performance overhead
(21.8×).
T-SGX. Shih et al. [51] propose T-SGX, which is a software-based
technique to detect a page fault in user space by using an
existing Intel CPU feature: Transactional Synchronization
Extensions (TSX). Intel TSX allows a user-level process to
know whether a memory access generates a page fault
before the fault is delivered to an OS. Therefore, with Intel TSX, an
enclave process can detect suspicious page faults and terminate
its execution, whose effects would be the same as those of proposals
demanding hardware modifications [10, 52].
T-SGX protects each basic block by individually wrapping
it with Intel TSX and makes each of them jump to each other
through a springboard page to enforce control flows (Figure 11b).
However, the branch shadowing attack can easily recognize
which blocks have been executed by probing those branch in-
structions, implying that T-SGX cannot be used to detect or
prevent the branch shadowing attack.
SGX-Shield. Seo et al. [49] develop SGX-Shield, which is an
enclave program to load the code consisting of randomization
units (RUs) while randomizing their locations in place, i.e., it
implements fine-grained ASLR (Figure 11c). Since a malicious
OS is no longer able to know the exact addresses of the target
branch instructions due to randomization, it is difficult to directly
apply the branch shadowing attack.
However, an attacker can infer the execution sequence of
RUs because of the following limitations of SGX-Shield. First,
SGX-Shield does not support live re-randomization such that
the locations of all branch instructions are not changed during
its execution. Second, the sizes of RUs are fixed (32 or 64 bytes)
and their addresses are aligned to avoid any decoding errors.
Since the last instruction of an RU is always a branch instruction
to jump into the next RU, an attacker can identify whether an RU has
been executed by testing its branch instruction. By repeating this test
against all RUs, the attacker will eventually obtain the execution
sequence.
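
Because RUs have a fixed size and aligned addresses, the addresses of their trailing branches can be enumerated. The following is a minimal sketch with hypothetical names. It assumes that the trailing branch is a 5-byte direct jump occupying the last bytes of each RU; the actual encoding used by SGX-Shield may differ, and the attacker would have to adjust the offset accordingly.

```c
#include <stdint.h>

/* ASSUMPTION: each RU ends with a 5-byte direct jump (e.g., jmp rel32). */
#define ASSUMED_TAIL_BRANCH_LEN 5u

/* Address of the trailing branch of the ru_index-th RU slot in a region
 * that starts at region_base and is divided into ru_size-byte RUs
 * (32 or 64 bytes according to the text). */
uint64_t ru_tail_branch_addr(uint64_t region_base, uint32_t ru_size,
                             uint64_t ru_index)
{
    return region_base + ru_index * ru_size
         + (ru_size - ASSUMED_TAIL_BRANCH_LEN);
}

/* Number of RU slots that fit into a region of region_size bytes. */
uint64_t ru_slot_count(uint64_t region_size, uint32_t ru_size)
{
    return region_size / ru_size;
}
```

An attacker could iterate over all slot indices, build shadow branches at the resulting addresses, and probe each one to reconstruct which RUs executed.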
Sanctum. Costan et al. [10] design a new hardware-based TEE,
called Sanctum, which is a secured version of Intel SGX built
on top of the RISC-V [60, 59] Rocket Core. Sanctum’s goals
are detecting page-fault-based attacks and preventing cache-timing
attacks. First, to detect page-fault-based attacks, Sanctum
lets an enclave process know whether a page fault is occurring
without the help of an OS (Shinde et al. [52] also mention a
similar hardware design). The enclave process then inspects
whether the page fault is legitimate and terminates its execution
when there is a security problem. Second, to prevent cache-timing
attacks, Sanctum implements a page-coloring technique
to partition the last-level cache (LLC). In Sanctum, physical
addresses are shifted before being stored in the LLC, so that an
OS cannot know the cache set storing its target memory page.
However, since the branch shadowing attack neither generates
any page faults nor relies on physical addresses, such countermeasures
are irrelevant to this attack. Further, Sanctum aims
to bring minimal modifications to the RISC-V Rocket Core,
which also supports static and dynamic branch prediction. This
implies that, by manipulating virtual addresses, we can perform
branch shadowing attacks against Sanctum’s enclaves unless
Sanctum obfuscates branch prediction behaviors.
[Figure 11 panels: (a) Deterministic multiplexing: real and dummy blocks reached through taken and not-taken branches, grouped into code pages by execution level. (b) T-SGX: basic blocks wrapped with TSX and connected through a springboard. (c) SGX-Shield: randomization units (RUs) chained by branch instructions.]
Figure 11: Software-based methods to secure SGX. An attacker knows
or can estimate the locations of branch instructions such that branch
shadowing attacks are possible.
5 Countermeasures
In this section, we introduce our hardware-based and software-
based countermeasures against the branch shadowing attack.
5.1 Hardware-based Countermeasure
The micro-architectural state of branch execution is maintained in
two important structures: the BTB and the BPU. These are not
necessarily monolithic structures, and they may be further divided
into sub-structures depending on the implementation. For instance,
the BTB may include a separate unit for indirect branches, known as
an iBTB. These structures are implemented per hardware core, and on
systems that use Simultaneous Multi-Threading (SMT), they are
generally shared between all the hardware threads of the core. On
modern Intel processors with hyperthreading, we confirmed that the
BTB state is shared between different SMT threads (hyperthreads) by
creating set conflicts in the BTB between two different hyperthreads.
To mitigate the class of security vulnerabilities described in this
paper, we need to prevent two sources of information leakage:
between hyperthreads running on the same core, and between the
kernel and user code on the same hardware thread. Preventing leakage
between hyperthreads is only possible if different hardware threads
use different structures, or if hyperthreading is disabled (§6.2).
To prevent the leakage of information on the same hardware thread,
we need to ensure that all branch-related state is flushed whenever
the context switches to or from enclave mode.
[Figure 12 plot: normalized instructions per cycle (y-axis, 0 to 1.2)
for each SPEC benchmark and their geometric mean, GMEAN (x-axis);
series: no flushes and one flush per 100, 1k, 10k, 100k, 1M, and 10M
cycles.]
Figure 12: Instructions per cycle of SPEC benchmark in terms of frequency of BTB + BPU flushing.
[Figure 13 plot: normalized BTB hit/miss rate (y-axis, 0 to 100)
versus flushing frequency in cycles (x-axis: n/a, 100, 1k, 10k, 100k,
1M, 10M); series: Avg BTB Hit Rate, Avg BTB Miss Rate.]
Figure 13: Average BTB hit/miss rate for SPEC06 w.r.t. frequency of
BTB + BPU flushing.
[Figure 14 plot: normalized BTB stats (y-axis, 0 to 100) versus
flushing frequency in cycles (x-axis: n/a, 100, 1k, 10k, 100k, 1M,
10M); series: Avg BP_ON_PATH_CORRECT, Avg BP_ON_PATH_MISPREDICT,
Avg BP_ON_PATH_MISFETCH.]
Figure 14: Average BTB stats for SPEC06 w.r.t. frequency of BTB +
BPU flushing.
Parameter   Value
CPU         4 GHz out-of-order core, 4 issue width, 256-entry ROB
L1 cache    8-way 32 KB I-cache + 8-way 32 KB D-cache
L2 cache    8-way 128 KB
L3 cache    32-way 8 MB
BTB         4-way, 1,024 sets
BPU         gshare, branch history length 16
Table 3: MacSim simulation parameters
Whenever an enclave context switch (via the EENTER, EEXIT, or
ERESUME instruction or AEX) happens, we need to flush the BTB
and BPU state. Since the BTB and BPU benefit from local and
global branch execution history, there would be a performance
penalty if these structures are flushed too frequently.
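To make the effect of such a flush concrete, the following is a
minimal sketch of a toy BTB that is cleared on an enclave mode switch.
It is not a model of any real processor: the class ToyBTB, its modulo
set indexing, full-address tags, and LRU replacement are simplifying
assumptions; only the 4-way, 1,024-set geometry is taken from Table 3.
The probe also reuses the victim's address value directly, whereas a
real shadow branch lives in attacker-controlled code.

```python
from collections import OrderedDict


class ToyBTB:
    """Set-associative branch target buffer with LRU replacement (toy model)."""

    def __init__(self, num_sets=1024, ways=4):
        self.num_sets = num_sets
        self.ways = ways
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def _set_for(self, branch_addr):
        # Assumption: the low address bits select the set.
        return self.sets[branch_addr % self.num_sets]

    def record_taken(self, branch_addr, target):
        entries = self._set_for(branch_addr)
        entries.pop(branch_addr, None)      # refresh the LRU position
        entries[branch_addr] = target
        if len(entries) > self.ways:
            entries.popitem(last=False)     # evict the least recently used

    def predict(self, branch_addr):
        """Return the predicted target, or None on a BTB miss."""
        return self._set_for(branch_addr).get(branch_addr)

    def flush(self):
        for entries in self.sets:
            entries.clear()


def probe_after_exit(flush_on_exit):
    btb = ToyBTB()
    # Inside the enclave: a secret-dependent branch is taken.
    btb.record_taken(0x401234, 0x401300)
    # Enclave exit (EEXIT or AEX): the proposed hardware clears the state.
    if flush_on_exit:
        btb.flush()
    # Outside the enclave: the attacker probes the matching entry.
    return btb.predict(0x401234)


print("no flush:  ", probe_after_exit(False))  # target is still visible
print("with flush:", probe_after_exit(True))   # None, the history is gone
```

In this toy model the flush removes exactly the state that the probe
relies on; what it costs in prediction accuracy is a separate
question.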
We aim to determine the performance impact of flushing these
structures at different frequencies in a cycle-level out-of-order
microarchitecture simulator, MacSim [30]. The details of our
simulation parameters are listed in Table 3. The BTB is modeled
after the BTB in Intel Skylake processors. We used a method similar
to [58, 1] to reverse engineer the BTB parameters. From our
experiments, we found that the BTB is organized as a 4-way
set-associative structure with a total of 4,096 entries. We model a
simple branch predictor, gshare [38], for the simulation. Current
Intel processors use more advanced predictors, but the specifics are
not very important for these experiments. We use traces of 200
million instructions from the SPEC06 benchmark suite for simulation
and flush the BTB and BPU periodically at varying frequencies.
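As a rough intuition for why the flush frequency matters, the sketch
below runs a gshare-style predictor (2-bit counters indexed by the
branch address XORed with a 16-bit global history) over a synthetic
loop branch and clears it at a fixed interval. It is not MacSim and
does not reproduce our measurements: the trace, the per-branch (rather
than per-cycle) flush interval, the initial counter value, the branch
address, and all names are assumptions made for illustration.

```python
HISTORY_BITS = 16
MASK = (1 << HISTORY_BITS) - 1


def gshare_accuracy(outcomes, pc=0x4005D0, flush_every=None):
    """Fraction of correct direction predictions for one branch.

    outcomes: iterable of booleans (True = taken).
    flush_every: clear counters and history after this many branches,
                 or None to never flush.
    """
    counters = {}          # index -> 2-bit saturating counter, default 1
    history = 0
    correct = 0
    total = 0
    for taken in outcomes:
        if flush_every and total and total % flush_every == 0:
            counters.clear()
            history = 0
        index = (pc ^ history) & MASK
        counter = counters.get(index, 1)      # 1 = weakly not-taken
        predicted_taken = counter >= 2
        correct += predicted_taken == taken
        # Train the counter, then shift the outcome into the history.
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
        counters[index] = counter
        history = ((history << 1) | int(taken)) & MASK
        total += 1
    return correct / total


# Synthetic trace: a loop branch taken seven times, then not taken.
trace = [i % 8 != 7 for i in range(80_000)]

for interval in (None, 10_000, 1_000, 100, 16):
    label = "never" if interval is None else f"every {interval} branches"
    accuracy = gshare_accuracy(trace, flush_every=interval)
    print(f"flush {label}: accuracy {accuracy:.4f}")
```

On this trace the predictor needs a short warm-up after every flush,
so accuracy falls as flushes become more frequent and is nearly
unaffected when they are rare.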
Figure 12 shows the normalized instructions per cycle (IPC) for
different flush frequencies. We found that flushing has little
impact on performance when the interval between flushes is 100K
cycles or longer: at one flush per 100K cycles, the performance
impact is lower than 2%, and at one flush per 1 million cycles, it
is negligible. Figure 13 shows the BTB hit rate, whereas Figure 14
shows the BPU correct, incorrect (direction prediction is wrong),
and misfetch (target prediction is wrong) percentages. The BTB and
BPU statistics are also virtually indistinguishable from the
no-flush case once the interval between flushes reaches 100K cycles.
On a 4 GHz CPU, an interval of 100K cycles between interrupts (or
AEXs) corresponds to 40,000 interrupts per second. According to our
measurements, about 250 and 1,000 timer interrupts are generated per
second in Linux (version 4.4) and Windows 10, respectively. Thus,
unless an I/O device generates a very large number of interrupts, a
flush interval of 100K cycles would be reasonable.
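The arithmetic behind this comparison is simple enough to check
directly: the interrupt rate at which interrupts arrive exactly 100K
cycles apart is the clock rate divided by the interval. The sketch
below uses only the figures quoted in the text; the variable names are
ours.

```python
CLOCK_HZ = 4_000_000_000          # 4 GHz core, as in Table 3
FLUSH_INTERVAL_CYCLES = 100_000   # one flush per 100K cycles

# Interrupt rate at which interrupts arrive exactly 100K cycles apart.
threshold_per_second = CLOCK_HZ // FLUSH_INTERVAL_CYCLES

# Timer interrupt rates reported in the text.
timer_rates = {"Linux 4.4": 250, "Windows 10": 1_000}
cycles_between = {name: CLOCK_HZ // rate for name, rate in timer_rates.items()}

print(f"threshold: {threshold_per_second:,} interrupts/s")
for name, cycles in cycles_between.items():
    print(f"{name}: one timer interrupt every {cycles:,} cycles")
```

Both timer rates leave millions of cycles between interrupts, far
above the 100K-cycle interval at which flushing starts to cost
performance.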
5.2 Software-based Countermeasure
The hardware-based countermeasure can effectively prevent
the branch shadowing attack, but we cannot be sure when and
whether such hardware changes can be realized. Especially, if
such changes cannot be done with micro code updates, Intel
CPUs already deployed in the markets would have no counter-
measure against the branch shadowing attack.
Possible software-based countermeasures against the branch
shadowing attack are to remove branches [40] or to use the
state-of-the-art ORAM technique, Raccoon [46]. The data-oblivious
machine learning algorithms of Ohrimenko et al. [40] try to eliminate
all branches by using a conditional move instruction, CMOV. However,
their approach is algorithm-specific, i.e., we cannot apply it to
general applications. Raccoon [46] always executes both paths of a
conditional branch, such that it can hide from a branch shadowing
attack whether the branch has really been taken. However, its
performance overhead is high (21.8×).
    if (a != 0) {
      <code1>
    }
    else if (b != 0) {
      <code2>
    }
    else {
      <code3>
    }
    <code4>
    block0: cmp $0, $a
            je block2
    block1: <code1>
            jmp block5
    block2: cmp $0, $b
            je block4
    block3: <code2>
            jmp block5
    block4: <code3>
    block5: <code4>
(a) An example code snippet. It selectively executes a branch block
according to a and b variables.
    block0:   mov $block1, r15
              cmp $0, $a
              cmov $block2, r15
    block0.j: jmp zz1
    block1:   <code1>
              mov $block5, r15
    block1.j: jmp zz2
    block2:   mov $block3, r15
              cmp $0, $b
              cmov $block4, r15
    block2.j: jmp zz3
    block3:   <code2>
              mov $block5, r15
    block3.j: jmp zz4
    block4:   <code3>
    block5:   <code4>
    Zigzagger's trampoline
    zz1:      jmp block1.j
    zz2:      jmp block2.j
    zz3:      jmp block3.j
    zz4:      jmpq *r15
(b) The protected code snippet by Zigzagger. All branch instructions
are executed regardless of a and b variables. An indirect branch in
the trampoline and CMOV instructions in the translated code are used to
obfuscate the final target address. Note that r15 is reserved in Zigzagger
to store the target address.
Figure 15: Securing an example code snippet with Zigzagger.
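The idea behind CMOV-based branch removal can be illustrated with a
mask-based select: the result is computed from both candidate values,
and the condition only decides which bits survive. The sketch below is
an illustration of the data flow, not the code of [40] or [46]; the
helper names are hypothetical, values are assumed to be non-negative
64-bit integers, and Python itself gives no constant-time guarantees.

```python
WORD_MASK = (1 << 64) - 1


def oblivious_select(cond, if_true, if_false):
    """Return if_true when cond holds, else if_false, without an if/else.

    mask is all ones when cond is true and zero otherwise, which mirrors
    what a CMOV achieves on 64-bit registers.
    """
    mask = (-int(bool(cond))) & WORD_MASK
    return (if_true & mask) | (if_false & ~mask & WORD_MASK)


def pick(a, b, r1, r2, r3):
    """Select r1 if a != 0, else r2 if b != 0, else r3, with no branching."""
    return oblivious_select(a != 0, r1, oblivious_select(b != 0, r2, r3))


print(pick(1, 0, 10, 20, 30))
print(pick(0, 1, 10, 20, 30))
print(pick(0, 0, 10, 20, 30))
```

All three candidate results must already be computed before the
select, which is why this style becomes expensive or impractical for
general code with side effects.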
Zigzagger. We propose a practical, compiler-based mitigation scheme
against the branch shadowing attack, called Zigzagger. The basic idea
of Zigzagger is to obfuscate a set of branch instructions into a
single indirect jump. However, it is not straightforward to compute
the target block of each branch without relying on conditional jumps
because conditional expressions would become very complex when we
need to handle nested branches. In Zigzagger, we solved this problem
by utilizing a CMOV instruction, which performs a conditional MOV
operation, and introducing a sequence of non-conditional jump
instructions in lieu of each branch. Zigzagger’s approach has several
benefits: 1) in terms of security, it provides a first line of
protection for each branch block and greatly increases the number of
potential control flows in an enclave program; 2) in terms of
performance, the unconditional jumps are much more favorable to
instruction pipelining; 3) in terms of practicality, Zigzagger’s
transformation does not require complex analysis of code semantics
(i.e., it is possible to implement it as a compiler pass).
Furthermore, Zigzagger’s execution pattern (back-and-forth jumps
between the converted branch set and Zigzagger’s trampoline)
practically raises the bar for de-obfuscating the fine-grained
control flow of the protected enclave program. It is worth noting
that this countermeasure is specific neither to Intel SGX nor to the
branch shadowing attack proposed in this paper; we can use this
approach to mitigate other types of branch-based timing attacks.
Figure 15 shows how Zigzagger transforms an example code snippet
having if, else-if, and else blocks. It converts all conditional and
unconditional branches into unconditional branches targeting
Zigzagger’s trampoline, which jumps back and forth with the converted
branches and finally jumps to the real target address, which is
stored in the reserved register r15 before control enters the
trampoline. Zigzagger reserves the register for performance reasons;
for programs that need to use more registers, it could store the
target address in main memory instead, but reserving r15 in SGX has
negligible performance overhead [51]. To emulate conditional
execution without using conditional jumps, we use CMOV instructions:
e.g., the CMOV instructions in Figure 15b update r15 only when a or b
is zero. Otherwise, these instructions behave like NOP instructions.
Since all of the unconditional branches are executed in quick
succession, an attacker has difficulty recognizing the current
instruction pointer; our APIC timer trick is not fine-grained enough
to distinguish each branch in practice (§3.6). Finally, the indirect
branch in Zigzagger’s trampoline now has five different target
addresses, obfuscating potential target addresses.
Benchmark          Baseline     Zigzagger overhead by #Branches
                   (iter/s)     2       3       4       5       All
numeric sort       967.25       1.05×   1.11×   1.12×   1.13×   1.15×
string sort        682.31       1.08×   1.15×   1.18×   1.15×   1.27×
bitfield           4.5E+08      1.03×   1.10×   1.14×   1.18×   1.31×
fp emulation       96.204       1.10×   1.21×   1.15×   1.27×   1.35×
fourier            54982        0.99×   0.99×   1.01×   1.01×   1.01×
assignment         35.73        1.36×   1.56×   1.50×   1.55×   1.90×
idea               10,378       2.16×   2.16×   2.18×   2.19×   2.19×
huffman            2478.1       1.59×   1.46×   1.61×   1.63×   1.81×
neural net         16.554       0.75×   0.77×   0.85×   0.86×   0.89×
lu decomposition   1,130        1.04×   1.09×   1.08×   1.11×   1.17×
GEOMEAN                         1.17×   1.22×   1.24×   1.26×   1.34×
Table 4: Overhead of the Zigzagger approach according to the number
of branches belonging to each Zigzagger trampoline.
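The control flow of the transformed snippet in Figure 15b can be
traced with a small interpreter. The sketch below is our own toy
model, not part of the Zigzagger implementation: the tuple encoding,
the "cmovz" and "halt" pseudo-operations, and the function name are
assumptions, and the Python if that models CMOV is of course a branch
in the interpreter, not in the modeled code.

```python
def run_zigzagger(a, b):
    """Interpret the Figure 15b layout; return (executed code, jump sites)."""
    # (label, operation, argument); unlabeled rows fall through from above.
    program = [
        ("block0",   "mov",   "block1"),
        (None,       "cmovz", ("a", "block2")),   # r15 = block2 if a == 0
        ("block0.j", "jmp",   "zz1"),
        ("block1",   "code",  "code1"),
        (None,       "mov",   "block5"),
        ("block1.j", "jmp",   "zz2"),
        ("block2",   "mov",   "block3"),
        (None,       "cmovz", ("b", "block4")),   # r15 = block4 if b == 0
        ("block2.j", "jmp",   "zz3"),
        ("block3",   "code",  "code2"),
        (None,       "mov",   "block5"),
        ("block3.j", "jmp",   "zz4"),
        ("block4",   "code",  "code3"),
        ("block5",   "code",  "code4"),
        (None,       "halt",  None),              # end of the modeled snippet
        # Zigzagger's trampoline
        ("zz1",      "jmp",   "block1.j"),
        ("zz2",      "jmp",   "block2.j"),
        ("zz3",      "jmp",   "block3.j"),
        ("zz4",      "jmpr",  None),              # jmpq *r15
    ]
    labels = {label: i for i, (label, _, _) in enumerate(program) if label}
    variables = {"a": a, "b": b}
    r15, pc = None, 0
    executed, jump_sites = [], []
    while True:
        label, op, arg = program[pc]
        pc += 1
        if op == "mov":
            r15 = arg
        elif op == "cmovz":
            name, target = arg
            if variables[name] == 0:   # models the data-dependent CMOV
                r15 = target
        elif op == "code":
            executed.append(arg)
        elif op == "jmp":
            jump_sites.append(label)
            pc = labels[arg]
        elif op == "jmpr":
            jump_sites.append(label)
            pc = labels[r15]
        else:                          # halt
            return executed, jump_sites


for a, b in [(1, 0), (0, 1), (0, 0)]:
    executed, sites = run_zigzagger(a, b)
    print(f"a={a} b={b}: ran {executed}, jump sites {sorted(set(sites))}")
```

In this model every jump site is executed for every input, and the
first pass through the trampoline is identical, while the executed
code blocks differ. The number of times each jump executes can still
differ between inputs, and the sketch says nothing about timing.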
Implementation. We implemented Zigzagger in LLVM 4.0
as an LLVM pass that converts branches in each function and
constructs the required trampoline. We also modified the LLVM
backend to reserve the r15 register. We observed that when
a function has many branches, making them share a single
trampoline in Zigzagger introduces non-negligible performance
overhead due to frequent jumps. To avoid this problem, our
implementation provides a knob to configure the number of
branches that each trampoline manages and randomly assigns
branches to each trampoline. Note that this grouping trades
security for performance, but we believe it makes Zigzagger more
useful in practice (e.g., by applying it selectively to
security-sensitive routines).
Our proof-of-concept implementation of Zigzagger, which provides
full protection, imposes a 1.34× performance overhead when evaluated
with the nbench benchmark suite (Table 4). With optimization (i.e.,
merging ≤ 3 branches into a single trampoline), the average overhead
drops to at most 1.22×. Note that reserving a register in our
microbenchmark results in a 4%–50% performance improvement.
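For readers who want to relate the per-benchmark entries of Table 4
to its GEOMEAN row, the sketch below recomputes the geometric mean of
each column. It works from the rounded values printed in the table,
so a recomputed mean can differ from the reported one in the second
decimal place; the dictionary layout and names are ours.

```python
import math

# Overheads copied from Table 4 (columns: 2, 3, 4, 5, All branches).
COLUMNS = ("2", "3", "4", "5", "All")
TABLE4 = {
    "numeric sort":     (1.05, 1.11, 1.12, 1.13, 1.15),
    "string sort":      (1.08, 1.15, 1.18, 1.15, 1.27),
    "bitfield":         (1.03, 1.10, 1.14, 1.18, 1.31),
    "fp emulation":     (1.10, 1.21, 1.15, 1.27, 1.35),
    "fourier":          (0.99, 0.99, 1.01, 1.01, 1.01),
    "assignment":       (1.36, 1.56, 1.50, 1.55, 1.90),
    "idea":             (2.16, 2.16, 2.18, 2.19, 2.19),
    "huffman":          (1.59, 1.46, 1.61, 1.63, 1.81),
    "neural net":       (0.75, 0.77, 0.85, 0.86, 0.89),
    "lu decomposition": (1.04, 1.09, 1.08, 1.11, 1.17),
}
REPORTED = dict(zip(COLUMNS, (1.17, 1.22, 1.24, 1.26, 1.34)))


def geomean(values):
    """Geometric mean via the mean of logarithms."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


recomputed = {
    column: geomean(row[i] for row in TABLE4.values())
    for i, column in enumerate(COLUMNS)
}

for column in COLUMNS:
    print(f"#branches={column}: recomputed {recomputed[column]:.3f}, "
          f"reported {REPORTED[column]:.2f}")
```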
6 Discussion
In this section, we explain some limitations of the branch shad-
owing attack and discuss possible advanced attacks.
6.1 Limitations
An important limitation of the branch shadowing attack is that it
cannot distinguish a not-taken conditional branch from a
not-executed conditional branch because, in both cases, the BTB has
no information about the branch; the static branch prediction rule
is applied. Also, the branch shadowing attack cannot distinguish an
indirect branch to the immediately following instruction from a
not-executed indirect branch because their predicted branch targets
are the same. Therefore, an attacker has to probe a number of
correlated branches (e.g., unconditional branches in else-if or case
blocks) to overcome this limitation.
6.2 Advanced Attacks
We introduce two advanced attacks based on the branch shad-
owing attack: hyperthreaded branch shadowing attack and blind
branch shadowing attack.
Hyperthreaded branch shadowing. Since two hyperthreads running
simultaneously on the same physical core share the same BTB, a
malicious hyperthread can attack a victim enclave hyperthread by
using BTB entry conflicts, provided that a malicious OS gives it the
address information of the victim. We found that branch instructions
whose addresses share the same low 16 bits are mapped into the same
BTB set. Thus, a malicious hyperthread can monitor a BTB set for
evictions by filling the set with four branch instructions. The BTB
clearing (§5.1) cannot prevent this attack because no enclave mode
switch happens. However, this attack cannot yet precisely identify
the higher-order bits of the victim branch’s address because they
are not used in the set index calculation. We plan to reverse
engineer the BTB’s characteristics in more detail to determine
whether we can obtain the exact address of taken branches.
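Choosing the four conflicting branch addresses is a matter of address
arithmetic. The sketch below assumes only what is stated above (the
low 16 bits decide the BTB set, and four branches fill a set); the
function name, the attacker code base address, and the victim address
are hypothetical.

```python
LOW_BITS = 16
LOW_MASK = (1 << LOW_BITS) - 1
WAYS = 4


def same_set_addresses(victim_branch, attacker_base, count=WAYS):
    """Return `count` attacker addresses at or above attacker_base whose
    low 16 bits match those of victim_branch."""
    first = (attacker_base & ~LOW_MASK) | (victim_branch & LOW_MASK)
    if first < attacker_base:
        first += 1 << LOW_BITS      # move up to the next 64 KB window
    return [first + i * (1 << LOW_BITS) for i in range(count)]


victim = 0x7F12_3456_ABCD           # hypothetical victim branch address
probes = same_set_addresses(victim, attacker_base=0x0060_F000)
for address in probes:
    print(hex(address))
```

Placing a branch instruction at each returned address would fill the
set in this model; an eviction observed afterwards then reveals only
the low 16 bits of the victim branch, which matches the limitation
noted above.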
Blind branch shadowing. A blind branch shadowing attack is
an attempt to probe the entire or selected memory region of a
victim enclave process to detect any unknown branch instruc-
tions. This attack would be necessary if a victim enclave process
has self-modifying code or uses remote code loading, though
it is outside the scope of our threat model (§3.1). In the case
of unconditional branches, blind probing is easy and effective
because it does not need to infer target addresses. However,
in the case of conditional and indirect branches, blind probing
needs to consider branch instructions and their targets
simultaneously, so the search space would be huge. We plan to
investigate effective methods for reducing the search space to
determine whether this attack is practical.
7 Related work
This section introduces studies related to our work, including
studies on SGX and microarchitectural side channels.
Intel SGX. The strong security guarantee provided by SGX
has drawn significant attention from the research community.
Several security applications of SGX have been proposed, in-
cluding secure data analysis [48, 40], secure distributed com-
puting [11, 7], and secure networking service [50, 42]. Also,
researchers implemented library OSes for SGX [6, 5] to run ex-
isting applications inside an enclave without any modifications.
The security properties of SGX itself are also being intensively
studied. For example, Sinha et al. [54, 53] develop tools to
verify the confidentiality of enclave programs.
However, the authors of the above-mentioned projects do
not consider the potential security attacks against SGX. Xu et
al. [62] and Shinde et al. [52] demonstrate the first side-channel
attack on SGX by leveraging the fact that SGX relies on OS
for memory resource management. The attack is done by inten-
tionally manipulating the page table to trigger a page fault and
using a page fault sequence to infer the secret inside an enclave.
Weichbrodt et al. [61] also show how a synchronization bug can be
exploited to attack SGX applications.
To address the page-fault-based side-channel attack, Shinde et
al. [52] have proposed an ORAM-like scheme that can effec-
tively obfuscate the memory access pattern of the enclave pro-
gram, but it suffers from significant performance overhead.
Shih et al. [51] have proposed a compiler-based solution using
Intel TSX to detect any suspicious page faults inside an enclave.
Also, Costan et al. [10] have proposed a new enclave design to
prevent both page-fault and cache-timing side channels. Finally,
Seo et al. [49] have enforced fine-grained ASLR on enclave pro-
grams, which can raise the bar of exploiting any vulnerabilities
and inferring control flow with page-fault sequences. However,
we demonstrated that none of these solutions can mitigate the
branch shadowing attack.
Microarchitectural side channel. Numerous researchers have
considered the security problems of microarchitectural side chan-
nels. The most popular and well-studied microarchitectural
side channel is a CPU cache timing channel first developed
by [33, 29, 41] to break cryptosystems. This attack is further
extended to be conducted in the public cloud setting to rec-
ognize co-residency of virtual machines [47, 64]. Recently,
several researchers further improved this attack to exploit the
last level cache [26, 35] and realize a low-noise cache storage
channel [18]. The CPU cache is not the sole source of
microarchitectural side channels. For example, Hund et al. [20]
exploit the translation lookaside buffer (TLB) timing channel to
break kernel ASLR. Researchers have also improved this attack by
exploiting other side channels in Intel TSX [27], the PREFETCH
instruction [17], and the BTB [13]. Ge et al. [45] provide a
comprehensive survey of microarchitectural side channels.
8 Conclusion
A hardware-based TEE is a promising technology to realize the
truly secure public cloud, but, without serious security analysis,
no one is willing to trust and use the TEE. Especially, a lack of
thorough analysis of side channels is problematic because it is
difficult to ensure that a TEE is completely free from any side
channels. In this paper, we explored a new side-channel attack
against Intel SGX, called a branch shadowing attack, which can
precisely identify fine-grained (basic-block-level) control flows
executed inside an enclave. We proposed a hardware-based
countermeasure that clears the branch history during enclave
mode switch and a software-based mitigation that makes branch
executions oblivious.
References
[1] “The BTB in contemporary Intel chips—Matt Godbolt’s
blog,” http://xania.org/201602/bpu-part-three, (Accessed
on 11/10/2016).
[2] “Kernel self protection project - linux kernel security sub-
system,” https://kernsec.org/wiki/index.php/Kernel_Self_
Protection_Project.
[3] O. Aciicmez, K. Koc, and J. Seifert, “On the power of
simple branch prediction analysis,” in Proceedings of the
2nd ACM Symposium on Information, Computer and Com-
munications Security (ASIACCS), 2007.
[4] ARM, “ARM TrustZone,” https://www.arm.com/products/
security-on-arm/trustzone.
[5] S. Arnautov, B. Trach, F. Gregor, T. Knauth, A. Martin,
C. Priebe, J. Lind, D. Muthukumaran, D. O’Keeffe, M. L.
Stillwell, D. Goltzsche, D. Eyers, R. Kapitza, P. Pietzuch,
and C. Fetzer, “SCONE: Secure Linux containers with In-
tel SGX,” in Proceedings of the 12th USENIX Symposium
on Operating Systems Design and Implementation (OSDI),
Savannah, GA, Nov. 2016.
[6] A. Baumann, M. Peinado, and G. Hunt, “Shielding applica-
tions from an untrusted cloud with Haven,” in Proceedings
of the 11th USENIX Symposium on Operating Systems De-
sign and Implementation (OSDI), Broomfield, Colorado,
Oct. 2014.
[7] S. Brenner, C. Wulf, M. Lorenz, N. Weichbrodt,
D. Goltzsche, C. Fetzer, P. Pietzuch, and R. Kapitza, “Se-
cureKeeper: Confidential ZooKeeper using Intel SGX,” in
Proceedings of the 16th Annual Middleware Conference
(Middleware), 2016.
[8] D. Brumley and D. Boneh, “Remote timing attacks are
practical,” in Proceedings of the 12th USENIX Security
Symposium (Security), Washington, DC, Aug. 2003.
[9] V. Costan and S. Devadas, “Intel SGX explained,” Cryptol-
ogy ePrint Archive, Report 2016/086, 2016, http://eprint.
iacr.org/2016/086.pdf.
[10] V. Costan, I. Lebedev, and S. Devadas, “Sanctum: Min-
imal hardware extensions for strong software isolation,”
in Proceedings of the 25th USENIX Security Symposium
(Security), Austin, TX, Aug. 2016.
[11] T. T. A. Dinh, P. Saxena, E.-C. Chang, B. C. Ooi, and
C. Zhang, “M2R: Enabling stronger privacy in MapReduce
computation,” in Proceedings of the 24th USENIX Security
Symposium (Security), Washington, DC, Aug. 2015.
[12] D. Evtyushkin, D. Ponomarev, and N. Abu-Ghazaleh,
“Covert channels through branch predictors: A feasibil-
ity study,” in Proceedings of the 4th Workshop on Hard-
ware and Architectural Support for Security and Privacy
(HASP), 2015.
[13] ——, “Jump over ASLR: Attacking branch predictors
to bypass ASLR,” in Proceedings of the 49th Annual
IEEE/ACM International Symposium on Microarchitec-
ture (MICRO), Taipei, Taiwan, Oct. 2016.
[14] C. Giuffrida, A. Kuijsten, and A. S. Tanenbaum, “En-
hanced operating system security through efficient and
fine-grained address space randomization,” in Proceed-
ings of the 21st USENIX Security Symposium (Security),
Bellevue, WA, Aug. 2012.
[15] T. Grance and W. Jansen, “Guidelines on security and
privacy in public cloud computing,” https://www.nist.gov/
node/591971.
[16] P. Grubbs, R. McPherson, M. Naveed, T. Ristenpart, and
V. Shmatikov, “Breaking web applications built on top of
encrypted data,” in Proceedings of the 23rd ACM Confer-
ence on Computer and Communications Security (CCS),
Vienna, Austria, Oct. 2016.
[17] D. Gruss, C. Maurice, A. Fogh, M. Lipp, and S. Mangard,
“Prefetch side-channel attacks: Bypassing SMAP and kernel
ASLR,” in Proceedings of the 23rd ACM Conference on
Computer and Communications Security (CCS), Vienna,
Austria, Oct. 2016.
[18] R. Guanciale, H. Nemati, C. Baumann, and M. Dam,
“Cache storage channels: Alias-driven attacks and veri-
fied countermeasures,” in Proceedings of the 37th IEEE
Symposium on Security and Privacy (Oakland), San Jose,
CA, May 2016.
[19] S. M. Hand, “Self-paging in the Nemesis operating sys-
tem,” in Proceedings of the 3rd USENIX Symposium on
Operating Systems Design and Implementation (OSDI),
New Orleans, LA, Feb. 1999.
[20] R. Hund, C. Willems, and T. Holz, “Practical timing side
channel attacks against kernel space ASLR,” in Proceed-
ings of the 34th IEEE Symposium on Security and Privacy
(Oakland), San Francisco, CA, May 2013.
[21] Intel, “Intel software guard extensions: Intel attestation
service API,” https://software.intel.com/sites/default/files/
managed/3d/c8/IAS_1_0_API_spec_1_1_Final.pdf.
[22] ——, “Protection from side-channel attacks,” https://
software.intel.com/en-us/node/696952.
[23] ——, “Intel software guard extensions programming refer-
ence (rev2),” Oct. 2014, 329298-002US.
[24] ——, “Intel 64 and IA-32 architectures optimization refer-
ence manual,” Jun. 2016.
[25] ——, “Intel 64 and ia-32 architectures software devel-
oper’s manual combined volumes: 1, 2a, 2b, 2c, 2d, 3a, 3b,
3c and 3d,” Sep. 2016.
[26] G. Irazoqui, T. Eisenbarth, and B. Sunar, “S$A: A shared
cache attack that works across cores and defies VM
sandboxing—and its application to AES,” in Proceedings
of the 36th IEEE Symposium on Security and Privacy (Oak-
land), San Jose, CA, May 2015.
[27] Y. Jang, S. Lee, and T. Kim, “Breaking kernel address
space layout randomization with Intel TSX,” in Proceed-
ings of the 23rd ACM Conference on Computer and Com-
munications Security (CCS), Vienna, Austria, Oct. 2016.
[28] S. Johnson, V. Scarlata, C. Rozas, E. Brick-
ell, and F. Mckeen, “Intel software guard exten-
sions: EPID provisioning and attestation services,”
https://software.intel.com/en-us/blogs/2016/03/09/intel-
sgx-epid-provisioning-and-attestation-services.
[29] J. Kelsey, B. Schneier, D. Wagner, and C. Hall, “Side
channel cryptanalysis of product ciphers,” in Proceedings
of the 5th European Symposium on Research in Computer
Security (ESORICS), Belgium, Sep. 1998.
[30] H. Kim, J. Lee, N. B. Lakshminarayana, J. Sim, J. Lim, and
T. Pho, “MacSim: A CPU-GPU heterogeneous simulation
framework.”
[31] A. Kleen, “Advanced usage of last branch records,” 2016,
https://lwn.net/Articles/680996/.
[32] ——, “An introduction to last branch records,” 2016, https:
//lwn.net/Articles/680985/.
[33] P. C. Kocher, “Timing attacks on implementations of
Diffie-Hellman, RSA, DSS, and other systems,” in Annual
International Cryptology Conference, 1996.
[34] C. Liu, A. Harris, M. Maas, M. Hicks, M. Tiwari, and
E. Shi, “GhostRider: A hardware-software system for
memory trace oblivious computation,” in Proceedings of
the 20th ACM International Conference on Architectural
Support for Programming Languages and Operating Sys-
tems (ASPLOS), Istanbul, Turkey, Mar. 2015.
[35] F. Liu, Y. Yarom, Q. Ge, G. Heiser, and R. B. Lee, “Last-
level cache side-channel attacks are practical,” in Proceed-
ings of the 36th IEEE Symposium on Security and Privacy
(Oakland), San Jose, CA, May 2015.
[36] K. Lu, C. Song, T. Kim, and W. Lee, “UniSan: Proactive
kernel memory initialization to eliminate data leakages,”
in Proceedings of the 23rd ACM Conference on Computer
and Communications Security (CCS), Vienna, Austria, Oct.
2016.
[37] M. Maas, E. Love, E. Stefanov, M. Tiwari, E. Shi,
K. Asanovic´, J. Kubiatowicz, and D. Song, “PHANTOM:
Practical oblivious computation in a secure processor,” in
Proceedings of the 20th ACM Conference on Computer
and Communications Security (CCS), Berlin, Germany,
Oct. 2013.
[38] S. McFarling, “Combining branch predictors,” 1993.
[39] M. Naveed, S. Kamara, and C. V. Wright, “Inference at-
tacks on property-preserving encrypted databases,” in Pro-
ceedings of the 22nd ACM Conference on Computer and
Communications Security (CCS), Denver, Colorado, Oct.
2015.
[40] O. Ohrimenko, C. F. Manuel Costa, S. Nowozin, A. Mehta,
F. Schuster, and K. Vaswani, “SGX-enabled oblivious ma-
chine learning,” in Proceedings of the 25th USENIX Secu-
rity Symposium (Security), Austin, TX, Aug. 2016.
[41] D. Page, “Theoretical use of cache memory as a crypt-
analytic side-channel.” IACR Cryptology ePrint Archive,
2002.
[42] R. Pires, M. Pasin, P. Felber, and C. Fetzer, “Secure
content-based routing using Intel Software Guard Exten-
sions,” in Proceedings of the 16th Annual Middleware
Conference (Middleware), 2016.
[43] R. A. Popa, “Building practical systems that compute on
encrypted data,” Ph.D. dissertation, Massachusetts Institute
of Technology, 2014.
[44] D. Pouliot and C. V. Wright, “The shadow nemesis: Infer-
ence attacks on efficiently deployable, efficiently search-
able encryption,” in Proceedings of the 23rd ACM Confer-
ence on Computer and Communications Security (CCS),
Vienna, Austria, Oct. 2016.
[45] Q. Ge, Y. Yarom, D. Cock, and G. Heiser, “A survey of
microarchitectural timing attacks and countermeasures on
contemporary hardware,” Cryptology ePrint Archive, Re-
port 2016/613, 2016, http://eprint.iacr.org/2016/613.pdf.
[46] A. Rane, C. Lin, and M. Tiwari, “Raccoon: Closing digital
side-channels through obfuscated execution,” in Proceed-
ings of the 24th USENIX Security Symposium (Security),
Washington, DC, Aug. 2015.
[47] T. Ristenpart, E. Tromer, H. Shacham, and S. Savage, “Hey,
you, get off of my cloud: exploring information leakage
in third-party compute clouds,” in Proceedings of the 16th
ACM Conference on Computer and Communications Secu-
rity (CCS), Chicago, IL, Nov. 2009.
[48] F. Schuster, M. Costa, C. Fournet, C. Gkantsidis,
M. Peinado, G. Mainar-Ruiz, and M. Russinovich, “VC3:
Trustworthy data analytics in the cloud using SGX,” in
Proceedings of the 36th IEEE Symposium on Security and
Privacy (Oakland), San Jose, CA, May 2015.
[49] J. Seo, B. Lee, S. Kim, M.-W. Shih, I. Shin, D. Han, and
T. Kim, “SGX-Shield: Enabling address space layout ran-
domization for SGX programs (to appear),” in Proceedings
of the 2017 Annual Network and Distributed System Secu-
rity Symposium (NDSS), San Diego, CA, Feb.–Mar. 2017.
[50] M.-W. Shih, M. Kumar, T. Kim, and A. Gavrilovska, “S-
NFV: Securing NFV states by using SGX,” in Proceedings
of the 1st ACM International Workshop on Security in SDN
and NFV, New Orleans, LA, Mar. 2016.
[51] M.-W. Shih, S. Lee, T. Kim, and M. Peinado, “T-SGX:
Eradicating controlled-channel attacks against enclave pro-
grams (to appear),” in Proceedings of the 2017 Annual Net-
work and Distributed System Security Symposium (NDSS),
San Diego, CA, Feb.–Mar. 2017.
[52] S. Shinde, Z. L. Chua, V. Narayanan, and P. Saxena, “Pre-
venting your faults from telling your secrets,” in Proceed-
ings of the 11th ACM Symposium on Information, Com-
puter and Communications Security (ASIACCS), Xi’an,
China, May–Jun. 2016.
[53] R. Sinha, M. Costa, A. Lal, N. P. Lopes, S. Rajamani, S. A.
Seshia, and K. Vaswani, “A design and verification method-
ology for secure isolated regions,” in Proceedings of the
2016 ACM SIGPLAN Conference on Programming Lan-
guage Design and Implementation (PLDI), Santa Barbara,
CA, Jun. 2016.
[54] R. Sinha, S. Rajamani, S. Seshia, and K. Vaswani, “Moat:
Verifying confidentiality of enclave program,” in Proceed-
ings of the 22nd ACM Conference on Computer and Com-
munications Security (CCS), Denver, Colorado, Oct. 2015.
[55] C. Song, B. Lee, K. Lu, W. R. Harris, T. Kim, and W. Lee,
“Enforcing Kernel Security Invariants with Data Flow
Integrity,” in Proceedings of the 23rd Annual Network
and Distributed System Security Symposium (NDSS), San
Diego, CA, Feb. 2016.
[56] E. Stefanov, M. Van Dijk, E. Shi, C. Fletcher, L. Ren,
X. Yu, and S. Devadas, “Path ORAM: An extremely simple
oblivious RAM protocol,” in Proceedings of the 20th ACM
Conference on Computer and Communications Security
(CCS), Berlin, Germany, Oct. 2013.
[57] Trusted Computing Group, “Trusted platform module
(TPM) summary,” http://www.trustedcomputinggroup.org/
trusted-platform-module-tpm-summary/.
[58] V. Uzelac and A. Milenkovic, “Experiment flows and mi-
crobenchmarks for reverse engineering of branch predictor
structures,” in Performance Analysis of Systems and Soft-
ware, 2009. ISPASS 2009. IEEE International Symposium
on. IEEE, 2009, pp. 207–217.
[59] A. Waterman, Y. Lee, R. Avizienis, D. A. Patterson, and
K. Asanovic´, “The RISC-V instruction set manual volume
II: Privileged architecture version 1.9,” EECS Department,
University of California, Berkeley, Tech. Rep. UCB/EECS-
2016-129, Jul 2016. [Online]. Available: http://www2.eecs.
berkeley.edu/Pubs/TechRpts/2016/EECS-2016-129.html
[60] A. Waterman, Y. Lee, D. A. Patterson, and K. Asanovic´,
“The RISC-V instruction set manual, volume I: User-level
ISA, version 2.1,” EECS Department, University of
California, Berkeley, Tech. Rep. UCB/EECS-2016-118,
May 2016. [Online]. Available: http://www2.eecs.berkeley.
edu/Pubs/TechRpts/2016/EECS-2016-118.html
[61] N. Weichbrodt, A. Kurmus, P. Pietzuch, and R. Kapitza,
“AsyncShock: Exploiting synchronisation bugs in Intel
SGX enclaves,” in Proceedings of the 21st European
Symposium on Research in Computer Security (ESORICS),
Heraklion, Greece, Sep. 2016.
[62] Y. Xu, W. Cui, and M. Peinado, “Controlled-channel at-
tacks: Deterministic side channels for untrusted operating
systems,” in Proceedings of the 36th IEEE Symposium on
Security and Privacy (Oakland), San Jose, CA, May 2015.
[63] K. Yang, M. Hicks, Q. Dong, T. Austin, and D. Sylvester,
“A2: Analog malicious hardware,” in Proceedings of the
37th IEEE Symposium on Security and Privacy (Oakland),
San Jose, CA, May 2016.
[64] Y. Zhang, A. Juels, A. Oprea, and M. K. Reiter, “Home-
alone: Co-residency detection in the cloud via side-channel
analysis,” in Proceedings of the 32nd IEEE Symposium on
Security and Privacy (Oakland), Oakland, CA, May 2011.
