GhostKnight: Breaching Data Integrity via Speculative Execution by Zhang, Zhi et al.
GhostKnight: Breaching Data Integrity via
Speculative Execution
Zhi Zhang∗,1,2, Yueqiang Cheng∗,3, Yinqian Zhang4 and Surya Nepal2
The draft is subject to change
∗ Both authors contributed equally to this work
1 The University of New South Wales
2 Data61, CSIRO, Australia
3 Baidu Security
4 The Ohio State University, America
Abstract—Existing speculative execution attacks, the so-called spectre-type attacks, are limited to breaching the confidentiality of data beyond a privilege boundary. All of them exploit the changes that speculative execution makes to microarchitectural buffers in order to leak data. We show that speculative execution can also be abused to break data integrity. We observe that speculative execution not only leaves traces in microarchitectural buffers but also induces side effects within DRAM: it can trigger an access to an illegitimate address in DRAM. If such accesses are frequent enough, architectural changes (i.e., permanent bit flips in DRAM) occur. We term this attack GhostKnight. With GhostKnight, an attacker is essentially able to cross different privilege boundaries and write exploitable bits into other privilege domains. In future work, we will develop a GhostKnight-based exploit that crosses a trusted execution environment, defeats a 1024-bit RSA exponentiation implementation, and obtains a controllable signature.
I. INTRODUCTION
In 2018, Kocher et al. [21] uncovered a class of security vulnerabilities, the so-called "spectre", that are rooted in the speculative execution of modern processors and extract private information through a timing side channel.
For instance, consider a conditional branch whose destination depends on a value located in DRAM (Dynamic Random Access Memory). Normally the processor would have to wait for that value to arrive. To maximize performance, the processor instead makes an educated guess about the destination and speculatively executes ahead. If the guess is correct, a performance advantage is achieved. If the guess turns out to be wrong, the processor discards the architectural changes (e.g., register values) caused by the speculative execution. However, it leaves behind sensitive microarchitectural changes (e.g., data in the cache). Although these changes are invisible, they can be leaked through known attack vectors such as side channels [18], [24].
Motivated by Spectre, numerous spectre-type attacks have been disclosed (e.g., spectre V1, V2, and V4 [21], [36], [15]). As shown in Figure 1, all spectre-like attacks have one thing in common: they demonstrate how to read out sensitive bits belonging to other processes. That is, the capability of existing speculative execution attacks is limited to breaking the confidentiality of data in other domains (e.g., the kernel and the hypervisor).
[Figure 1 diagram. Left of a vertical line, "Breach Data Confidentiality": Spectre-V1, Spectre-V2, Spectre-V4, …, SpectreRSB, NetSpectre. Right of the line, "Breach Data Integrity": Our Work.]
Fig. 1: All existing speculative execution attacks, also known as spectre-type attacks, are limited to breaching data confidentiality (shown on the left side of the vertical line). In contrast, our work, GhostKnight, can breach data integrity (shown on the right side).
Our contributions: to the best of our knowledge, we are the first to exploit speculative execution to breach data integrity (i.e., flip bits in sensitive memory), as shown on the right side of Figure 1. We observe that speculative execution can be triggered so that it leaves side effects in the form of illegal DRAM accesses, which makes it possible to break data integrity.

Take the aforementioned conditional branch as an example. It checks whether an array index is within the array size. If the index passes the check, the branch is taken and a valid array access is performed. When the array size resides in DRAM and takes a few hundred clock cycles to arrive, the processor does not wait for the check to be resolved. Instead, it relies on prediction history records to predict whether the check will pass. By training the branch predictor, attackers can induce the processor to speculatively perform a memory access with an illegal, out-of-bounds index. In this case, the access searches the CPU cache first and triggers a DRAM access if a cache miss occurs. If we perform such speculative DRAM accesses as frequently as possible, we can trigger a software-induced hardware bug [20] and flip bits in inaccessible memory.
With the above key observation, we introduce a paradigm shift in speculative execution attacks, called GhostKnight. GhostKnight is able to cross privilege boundaries and flip bits in memory belonging to other domains. It thus extends the capability of speculative execution attacks to compromising data integrity. Similar to spectre-based attacks, GhostKnight abuses speculative execution to break existing privilege boundaries. These include boundaries enforced by the Memory Management Unit (MMU), MMU virtualization (e.g., Intel EPT [16], AMD NPT [1], and ARM Second-stage translation [3]), Intel MPK [16], Intel SGX [2], and AMD SME [19].
arXiv:2002.00524v1 [cs.CR] 3 Feb 2020
In order to demonstrate the general idea of GhostKnight, we need to address the following three challenges.
First, GhostKnight must mistrain the processor's prediction logic efficiently, so that speculative execution is triggered as quickly as possible. This matters because a sufficiently high DRAM-access frequency is required to induce bit flips [20]. To address this, GhostKnight uses spectre-V1 as a case study and mistrains the branch predictor with a minimal number of valid array indexes. We use one Intel Performance Counter (PMC) to determine the minimal number of array indexes that still mistrains the CPU's branch predictor effectively.
Second, GhostKnight must verify whether a triggered speculative execution indeed performs a DRAM access beyond the memory boundary. If the speculation window is too narrow, the DRAM access will not occur within it. We therefore propose Algorithm 1 to perform this verification. The algorithm is based on the timing difference between cache and DRAM accesses. The experimental results indicate that existing spectre-V1 attacks cannot reliably trigger a speculative DRAM access, so the speculation window must be extended.
Third, GhostKnight must reliably and efficiently extend the speculation window to trigger speculative-execution-based bit flips. A major reason why our speculation window is restricted is that it is nested within preceding speculation windows. To this end, we apply previous works [26], [17] to terminate the preceding windows right before triggering our speculative DRAM access. However, these techniques do not work in our experimental setting. Instead, we use an additional empty loop to reliably stop the preceding windows. The loop count is minimized to maximize the efficiency of the speculative DRAM access.
We implement GhostKnight on Ubuntu Linux. It is able to trigger the aforementioned bug and make architectural changes, i.e., permanently flip bits in DRAM. In future work, we target a 1024-bit RSA exponentiation implementation running in a trusted execution environment (TEE). The TEE is provided by Intel EPT and prevents accesses from malicious users and even from the kernel. We will leverage GhostKnight to cross the TEE, write exploitable bits into the secret exponent of the RSA algorithm, and obtain a controllable signature.
The main contributions of this paper are as follows:
• All currently known spectre-like attacks are limited
to breaching data confidentiality. In contrast to them,
GhostKnight undermines data integrity through specula-
tive execution.
• We implement GhostKnight on Ubuntu Linux and induce permanent bit flips in DRAM, making it possible to write exploitable bits into the memory of other domains.
The rest of the paper is structured as follows. Section II briefly introduces the background. Section III presents GhostKnight in detail. Section IV discusses how to use GhostKnight to cross different memory boundaries; in particular, compromising a trusted execution environment enforced by MMU virtualization is left as future work. Sections V and VI summarize related work and conclude the paper, respectively.
II. BACKGROUND
In this section, we first introduce the spectre vulnerability
and then describe modern DRAM organization as well as the
software-induced hardware bug.
A. Spectre
Spectre [21] is a hardware vulnerability that allows an unprivileged attacker to cross memory boundaries and read arbitrary target secrets. Specifically, this vulnerability resides in the Branch Prediction Unit (BPU) of most modern CPUs. The BPU enables the CPU to predict the branch target and speculatively execute a certain number of instructions on the predicted path, the so-called "speculative execution" feature.
This feature greatly improves performance, particularly when the branch target depends on a value that is not in the CPU cache but resides in DRAM. A DRAM access (a few hundred clock cycles) is much slower than a CPU cache access (several clock cycles), so the CPU does not sit idle waiting for the value to arrive from DRAM. Instead, it saves a checkpoint of its current valid execution state. It then guesses the branch target based on a history of branch executions and speculatively executes instructions along the guessed path long before the value is known. If the educated guess turns out to be wrong, the CPU must discard the architectural changes (e.g., register values) caused by the incorrectly executed instructions and revert the execution state to the saved checkpoint. Unfortunately, the microarchitectural side effects (e.g., CPU cache state) of the speculative execution are not reverted. These side effects enable a spectre-based timing side channel through which an adversary can steal a protected secret from other privileged domains (e.g., the kernel or the hypervisor).
Spectre-V1: there are several spectre variants (e.g., Bounds Check Bypass and Branch Target Injection). In this paper, we use the bounds check bypass variant as a case study to implement GhostKnight. Spectre-V1 is the bounds check bypass side channel. The bounds check consists of conditional branch instructions that check whether an array-index candidate is within a valid range. Spectre-V1 abuses speculative execution to bypass the bounds check and speculatively access invalid memory with an out-of-bounds index. The access loads invalid data into the cache. From there, the data can be leaked to attackers by existing side-channel techniques such as Flush+Reload [35].
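To make the bounds-check pattern concrete, the C fragment below sketches a spectre-V1-style victim function. It is modeled on the general shape of the public PoC. The array names, sizes, and the 512-byte probe stride are illustrative assumptions, not the exact code of [21]. Architecturally, the function only ever reads in-bounds data. The problem described above arises only when the branch is mispredicted and the two dependent loads execute transiently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sizes: 16 bytes of in-bounds data, and one probe slot per
 * possible byte value spaced 512 bytes apart so each maps to its own line. */
#define ARRAY1_LEN 16
#define PROBE_STRIDE 512

static uint8_t array1[ARRAY1_LEN] = {1, 2, 3, 4, 5, 6, 7, 8,
                                     9, 10, 11, 12, 13, 14, 15, 16};
static size_t array1_size = ARRAY1_LEN;     /* the bound being checked */
static uint8_t array2[256 * PROBE_STRIDE];  /* probe array */

/* Bounds-checked access. If the predictor has been trained to expect
 * x < array1_size, a CPU may transiently execute the body for an
 * out-of-bounds x while array1_size is still being fetched from DRAM. */
static uint8_t victim_function(size_t x) {
    uint8_t result = 0;
    if (x < array1_size) {
        /* first load: array1[x]; second, dependent load: a probe slot */
        result = array2[(size_t)array1[x] * PROBE_STRIDE];
    }
    return result;
}
```

The victim_function of Algorithm 1 differs from this leak-oriented gadget. It needs only the first load, because GhostKnight only requires the out-of-bounds address itself to reach DRAM.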
B. Dynamic Random-Access Memory
The main memory of most modern computers uses Dynamic Random-Access Memory (DRAM). Memory modules are usually produced in the form of dual inline memory modules (DIMMs), where both sides of the module have separate electrical contacts for memory chips. Each memory module is directly connected to the CPU's memory controller through one of the two channels. Logically, each memory module consists of two ranks, corresponding to its two sides, and each rank consists of multiple banks. A bank is structured as arrays of cells organized in rows and columns. Every cell stores one bit of data, whose value depends on whether the cell is electrically charged or not.
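The memory controller splits a physical address into these coordinates. The sketch below decodes an address under a deliberately simplified, hypothetical linear mapping: 8 KiB rows, 8 banks, 2 ranks, and 2 channels, taken from the low bits to the high bits. Real Intel memory controllers use undocumented XOR-based bank and channel functions. The bit positions here are therefore assumptions for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical field widths, from least to most significant bits. */
#define COL_BITS  13  /* byte offset within an 8 KiB row */
#define BANK_BITS 3   /* 8 banks per rank */
#define RANK_BITS 1   /* 2 ranks per module */
#define CHAN_BITS 1   /* 2 channels */

typedef struct {
    uint64_t row;
    unsigned channel, rank, bank, column;
} dram_addr;

/* Extracts the lowest `bits` bits of *pa and shifts them out. */
static uint64_t take_bits(uint64_t *pa, unsigned bits) {
    uint64_t field = *pa & ((UINT64_C(1) << bits) - 1);
    *pa >>= bits;
    return field;
}

static dram_addr dram_decode(uint64_t pa) {
    dram_addr d;
    d.column  = (unsigned)take_bits(&pa, COL_BITS);
    d.bank    = (unsigned)take_bits(&pa, BANK_BITS);
    d.rank    = (unsigned)take_bits(&pa, RANK_BITS);
    d.channel = (unsigned)take_bits(&pa, CHAN_BITS);
    d.row     = pa;                        /* remaining high bits */
    return d;
}

/* Inverse of dram_decode under the same assumed layout. */
static uint64_t dram_encode(dram_addr d) {
    uint64_t pa = d.row;
    pa = (pa << CHAN_BITS) | d.channel;
    pa = (pa << RANK_BITS) | d.rank;
    pa = (pa << BANK_BITS) | d.bank;
    pa = (pa << COL_BITS)  | d.column;
    return pa;
}
```

Under this assumed mapping, two addresses belong to the same bank exactly when their channel, rank, and bank fields match. This is the precondition for the row-buffer conflicts that hammering relies on.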
The charge in a DRAM cell is not persistent and drains over time due to various charge leakage mechanisms [20]. To prevent data loss, all cells must be periodically recharged, or refreshed. The DRAM specification states that the refresh interval is typically 32 or 64 ms, during which all cells within a rank are refreshed. A longer interval means less time spent refreshing and thus better performance, so 64 ms is the default.
Whenever a memory access to a bank occurs, it "opens" the specified row by transferring all data in that row to the bank's row buffer. The specified column is then accessed from the row buffer. As such, subsequent accesses to the same row are served by the row buffer, while accessing (opening) another row flushes the row buffer.
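A tiny model captures the open-row behavior. The sketch below assumes an idealized open-page policy, in which a row stays open until another row in the same bank is requested, and it counts row activations. It shows why repeatedly reading one row is served by the row buffer, while alternating between two rows re-opens a row on every access. Real controllers may also close rows after a timeout, which this model ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    int64_t open_row;      /* -1 means no row is currently open */
    uint64_t activations;  /* number of times a row had to be opened */
} bank_model;

static void bank_init(bank_model *b) {
    b->open_row = -1;
    b->activations = 0;
}

/* Returns 1 on a row-buffer miss (row activation), 0 on a row-buffer hit. */
static int bank_access(bank_model *b, int64_t row) {
    if (b->open_row == row) {
        return 0;              /* served directly from the row buffer */
    }
    b->open_row = row;         /* old row closed, new row opened */
    b->activations++;
    return 1;
}

/* Replays a sequence of row accesses to one bank and counts activations. */
static uint64_t count_activations(const int64_t *rows, size_t n) {
    bank_model b;
    bank_init(&b);
    for (size_t i = 0; i < n; i++) {
        bank_access(&b, rows[i]);
    }
    return b.activations;
}
```

Only activations disturb neighboring rows. This is why hammering needs an alternating access pattern, or some other way of closing the open row.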
C. A Software-induced Hardware Bug
Kim et al. [20] report a hardware bug in which DRAM rows suffer persistent charge leakage induced by adjacent rows. They use an FPGA to frequently open one row within the DRAM refresh interval (a technique known as rowhammer), resulting in bit flips in a neighboring row. To trigger the bug from modern processors, memory accesses initiated by the processor must also frequently reach a targeted row. As such, an adversary has to clear the CPU caches, bypass the row buffer, and know how the CPU accesses DRAM.
Firstly, modern CPUs have multiple levels of caches to effectively reduce memory access time. If data is present in the CPU cache, accessing it is served by the cache and never reaches DRAM. Therefore, the CPU cache must be flushed in order to hammer rows. This can be done explicitly with an unprivileged instruction (e.g., clflush on x86) or implicitly with eviction sets of physical pages [4], [13], [5], [24], [27].
Secondly, since the row buffer serves repeated accesses to the same row, bypassing it is also necessary for hammering. As mentioned above, alternately accessing two different rows within the same bank bypasses the row buffer. If exactly one row lies between the two rows, the technique is called double-sided rowhammer; otherwise, it is called single-sided rowhammer. Alternatively, one-location rowhammer [12] forces the memory controller to clear the row buffer and thus only needs to hammer one row.
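The distinction between the two patterns depends only on the row distance within one bank. The helper below is an illustrative sketch; its name and return codes are hypothetical. Two distinct rows of the same bank with exactly one row between them form a double-sided pair around that middle (victim) row, and any other distinct pair in the bank is single-sided. Row numbers are treated as logical DRAM rows, and any internal row remapping by the module is ignored.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { NOT_A_PAIR, SINGLE_SIDED, DOUBLE_SIDED } hammer_kind;

/* Classifies two aggressor rows; when double-sided, *victim receives the
 * row sandwiched between them. Alternating between different banks does
 * not close either bank's open row, and identical rows just hit the row
 * buffer, so neither case counts as a hammering pair. */
static hammer_kind classify_pair(unsigned bank_a, int64_t row_a,
                                 unsigned bank_b, int64_t row_b,
                                 int64_t *victim) {
    if (bank_a != bank_b || row_a == row_b) {
        return NOT_A_PAIR;
    }
    int64_t diff = row_a > row_b ? row_a - row_b : row_b - row_a;
    if (diff == 2) {
        *victim = (row_a < row_b ? row_a : row_b) + 1;
        return DOUBLE_SIDED;
    }
    return SINGLE_SIDED;
}
```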
Lastly, as all mainstream operating systems implement memory isolation, almost all programs running on the CPU access memory through virtual addresses. To map a virtual address to a DRAM address, the CPU's Memory Management Unit (MMU) translates the virtual address to a physical address, which the memory controller then maps to a DRAM address. The virtual-to-physical mapping can be obtained by reading the pagemap interface or by forcing huge-page allocation. Note that unprivileged users can access the pagemap interface only before Linux kernel 4.0 [31]. They also cannot allocate huge pages, since the superpage feature is disabled by default.
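For reference, Linux exposes one 64-bit entry per virtual page in /proc/<pid>/pagemap. The page frame number (PFN) occupies bits 0 to 54 and the present flag is bit 63. The sketch below only computes the file offset and decodes an entry that has already been read; it performs no I/O. Since Linux 4.2, the PFN field reads as zero for readers without CAP_SYS_ADMIN, which the decoder treats as "unknown". This is consistent with the threat model's assumption that the attacker cannot obtain the mapping.

```c
#include <assert.h>
#include <stdint.h>

#define PM_PFN_MASK ((UINT64_C(1) << 55) - 1)  /* bits 0-54: page frame number */
#define PM_PRESENT  (UINT64_C(1) << 63)        /* bit 63: page present in RAM */

/* Byte offset of the entry for vaddr inside /proc/<pid>/pagemap. */
static uint64_t pagemap_offset(uint64_t vaddr, uint64_t page_size) {
    return (vaddr / page_size) * sizeof(uint64_t);
}

/* Translates vaddr using its pagemap entry. Returns 0 on success, -1 if the
 * page is not present or the PFN is hidden (reported as zero). */
static int pagemap_to_phys(uint64_t entry, uint64_t vaddr,
                           uint64_t page_size, uint64_t *phys) {
    if ((entry & PM_PRESENT) == 0) {
        return -1;
    }
    uint64_t pfn = entry & PM_PFN_MASK;
    if (pfn == 0) {
        return -1;  /* e.g., an unprivileged reader on a newer kernel */
    }
    *phys = pfn * page_size + vaddr % page_size;
    return 0;
}
```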
III. OVERVIEW
Our primary goal is to present the general idea of GhostKnight and demonstrate it on a real-world system. In this section, we first present the threat model and assumptions, then identify the main challenges of GhostKnight and introduce new techniques to overcome them.
A. Threat Model and Assumptions
• The attacker controls an unprivileged user process that
has no special privileges such as accessing pagemap or
enabling superpage. That is, the attacker cannot obtain
the virtual-to-physical address mapping.
• The installed memory modules are susceptible to the
software-induced hardware bug [20]. Pessl et al. [29]
report that many mainstream DRAM manufacturers have
vulnerable DRAM modules, including both DDR3 and
DDR4 memory.
B. Main Challenges
In general, GhostKnight consists of the following three steps: (1) collect enough vulnerable memory addresses; (2) perform speculative hammering for each pair of addresses; (3) check whether any bit flip occurs, and if not, go back to step (2). In the first step, we scan the available system memory and use a double-sided rowhammer tool¹ to collect enough pairs of vulnerable memory addresses that trigger bit flips. In the second step, each speculative hammering consists of effectively and efficiently mistraining the CPU's prediction logic and then triggering one speculative DRAM access. In the following, we discuss the challenges of speculative hammering.
efficient mistraining: previous spectre-type attacks effectively mistrain the CPU's prediction logic, and how to perform effective mistraining varies among spectre variants. For instance, spectre-V2 [21] feeds the branch predictor with enough malicious destinations. GhostKnight can leverage any spectre variant; we take spectre-V1 as an example. Accordingly, GhostKnight effectively mistrains the conditional branch logic of spectre-V1 with enough legitimate array indexes.
As triggering the hardware bug requires a high frequency of DRAM accesses, the mistraining efficiency should be maximized. We empirically determine the minimum number of array indexes using a specific Intel Performance Counter (PMC). This PMC counts the mispredicted conditional branch event [17] (i.e., BR_MISP_EXEC.TAKEN_CONDITIONAL). Specifically, we first develop a kernel module that records the event count for the spectre-V1 proof-of-concept code (PoC) [21] as the baseline. We then reduce the number of valid array indexes in the PoC one by one until the event count falls below the baseline. We conduct this experiment on a Lenovo ThinkPad T420 with a 2.6 GHz Intel Core i5-2540M and 8 GB of DDR3 memory. The results show that the minimal number that ensures effective mistraining can be reduced from 5 to 4.
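The search for the minimal number of training indexes can be expressed independently of how the counter is read. In the sketch below, counts is a hypothetical array of previously measured mispredicted-branch counts: counts[n] is the count observed with n valid training indexes, for example as read from the PMC. The loop mirrors the procedure above. It records the baseline at the PoC's default and then lowers n one step at a time while the count stays at or above that baseline.

```c
#include <assert.h>
#include <stdint.h>

/* counts[n] holds the mispredicted-branch count measured with n training
 * indexes (index 0 unused); start is the PoC's default number of indexes.
 * Returns the smallest n, reached by stepping down from start, whose count
 * has not yet fallen below the baseline counts[start]. */
static unsigned minimal_training(const uint64_t *counts, unsigned start) {
    assert(start >= 1);
    const uint64_t baseline = counts[start];
    unsigned n = start;
    /* keep shrinking while one fewer index still matches the baseline */
    while (n > 1 && counts[n - 1] >= baseline) {
        n--;
    }
    return n;
}
```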
verifiable speculative DRAM access: after mistraining with a minimal number of valid array indexes, we can trigger speculative execution with an invalid array index that points to one vulnerable address. Because array_size is flushed with clflush and must be fetched from DRAM, there is a speculation window between when the conditional branch instruction is issued and when it is committed. Within this window, the processor is expected to speculatively access the vulnerable address from DRAM. To ensure that the access is served by DRAM rather than by the cache, we clflush the vulnerable address before each speculation window. Note that if the window is too narrow, the processor cannot complete the DRAM access and thus fails to trigger the hardware bug. The aforementioned Intel PMC can only indicate whether the speculation window exists; it cannot tell us whether the window is large enough. To this end, we propose Algorithm 1 to verify whether a speculative DRAM access occurs within the window.

Algorithm 1: Verify a speculative DRAM access
1  Initially: vul_addr is a vulnerable virtual address that needs speculative hammering. victim_array is a pre-allocated array of size array_size. threshold is a predefined access latency that distinguishes a cache access from a DRAM access.
2  Function victim_function(index)
3      if index < array_size then
4          access victim_array + index
5      end
6  flush both array_size and vul_addr.
7  invoke victim_function with 4 valid indexes in sequence.
8      ▷ efficient and effective mistraining.
9  invoke victim_function with an invalid index.
10     ▷ trigger speculative execution that will access vul_addr.
11 latency ← profiling access to vul_addr
12 if latency < threshold then
13     return 1 ▷ speculative DRAM access succeeds.
14 end
15 return 0 ▷ speculative DRAM access fails.

¹ https://github.com/google/rowhammer-test
Typically, modern processors use multiple levels of cache together with external physical memory (i.e., DRAM) to serve memory accesses. Compared to DRAM, caches are much smaller but faster memories that store copies of frequently accessed values. Specifically, loading a value from the caches costs no more than 100 cycles, while fetching it from DRAM often takes several hundred cycles. When a value is fetched from DRAM, a copy of that value and its neighboring values (i.e., the enclosing cache line) is placed into the cache to reduce the latency of future accesses.
Accordingly, we mistrain the branch predictor logic inside victim_function in line 7 and then trigger the speculative execution in line 9. We expect that vul_addr, pointed to by the invalid index, will be accessed from DRAM once. To check whether the DRAM access occurs, we profile the access latency to the vulnerable address in line 11. Suppose the access latency is below a predefined threshold (e.g., 100 cycles) in line 12. Then vul_addr must have been speculatively fetched from DRAM and thus cached, and the algorithm returns 1. Otherwise, the speculative DRAM access has failed and the algorithm returns 0.
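Lines 11 to 15 of Algorithm 1 reduce to a per-trial threshold test. The sketch below applies that test to a batch of latencies that have already been measured. It produces the 0/1 outcome per speculative execution of the kind plotted in Figure 2. How the latencies are timed is deliberately left out, and the 100-cycle threshold used in the test is only the example value mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 when the reload was fast, i.e., vul_addr was already cached,
 * which Algorithm 1 takes as evidence of a speculative DRAM fetch. */
static int speculative_fetch_observed(uint64_t latency, uint64_t threshold) {
    return latency < threshold;
}

/* Fills outcome[i] with 0/1 per trial and returns the number of successes. */
static size_t classify_trials(const uint64_t *latency, size_t n,
                              uint64_t threshold, int *outcome) {
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        outcome[i] = speculative_fetch_observed(latency[i], threshold);
        hits += (size_t)outcome[i];
    }
    return hits;
}
```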
We implement Algorithm 1 on the Lenovo T420 based on the above spectre-V1 PoC [21]; the results are shown as the dashed line in Figure 2. Clearly, we do not observe a single speculative DRAM access out of 1000 speculative executions when using the existing spectre-V1 attack [21].
[Figure 2 plot: Latency (binary, 0 or 1) versus Speculative Execution (#), from 0 to 1000; series: Spectre-V1 (dashed) and GhostKnight (solid).]
Fig. 2: A value of 0 on the Y-axis means that the speculative DRAM access of the corresponding execution on the X-axis did not occur. The current spectre-V1 attack [21] has a small speculation window and cannot speculatively access the target address from DRAM (dashed line). GhostKnight extends the window and reliably accesses the address from DRAM every time (solid line).
reliable and efficient speculative DRAM access: as the spectre-V1 attack cannot reliably hammer the vulnerable address, the speculation window needs to be extended. One possible reason that the window length is restricted is that the window is nested within a preceding speculation window, which might itself be nested within another one.
To obtain a speculation window long enough for one DRAM access, previous works [26], [17] use instructions such as mfence or lfence to terminate the preceding windows each time before triggering the speculative DRAM access. However, most of the 1000 speculative executions are still served from the cache. This indicates that these techniques do not work, at least on our test machine.
To address this problem, we place one time-consuming instruction (i.e., syscall) before triggering the target speculative execution, in order to exhaust the preceding window. The experimental results indicate that this placement ensures 100% speculative DRAM accesses. However, the time cost of the syscall is high, which makes it impossible to trigger the hardware bug. To address this efficiency problem, we replace the instruction with a finite, empty loop that wastes as few CPU cycles as possible, achieving reliable and efficient speculative hammering. As the solid line in Figure 2 shows, GhostKnight extends the speculation window and ensures an illegitimate DRAM access inside the window.
C. Time cost per speculative hammering
After addressing the above challenges, we profile the time cost of one speculative hammering to verify whether GhostKnight is efficient enough to trigger the hardware bug and flip bits. Double-sided hammering is the most efficient way to flip bits. We therefore use the aforementioned rowhammer tool to determine the maximum time cost per hammer that still permits a bit flip on the test machine.

[Figure 3(a) plot: Time (seconds) until first bit flip, 0 to 2000, versus Time (cycles) per hammer, 300 to 1600. Figure 3(b) plot: Time (cycles) per hammer, 1000 to 1600, versus Hammer Number (#), 0 to 100.]
(a) As the time cost per double-sided hammer increases, the time to find the first bit flip also grows. When the time cost per hammer exceeds 1500 cycles, no bit flip is observed within 2 hours.
(b) All measured time costs per speculative hammering are below 1500 cycles, and 92% of them fall within the range [1200, 1400] cycles.
Fig. 3: The time cost of one speculative hammering is clearly below the maximum cost that allows bit flips, indicating that GhostKnight is efficient enough to induce bit flips.
The clflush instruction is the most efficient (costing below 200 cycles) and effective (a 100% cache miss rate per round) way to flush all levels of CPU caches. We therefore modify the tool to embed two such instructions in each round of double-sided hammering. To increase the time cost of each round, we insert a certain number of NOP instructions before the clflush instructions in each hammering round. We then incrementally increase the number of NOPs so that the time cost per hammer grows. The time until the first bit flip is shown in Figure 3a. When the NOP number is 0, the first bit flip is observed within 10 seconds. As the NOP number grows, the time until the first bit flip also increases. When the time cost per hammer exceeds 1500 cycles, we cannot observe any bit flip within 2 hours. We therefore take 1500 cycles as the maximum cost per hammer that still allows bit flips.
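The 1500-cycle budget can be related to the refresh interval with simple arithmetic. Assume the core runs at its nominal 2.6 GHz, ignoring frequency scaling, and that the default 64 ms interval applies. One refresh window then holds 64,000 µs × 2,600 cycles/µs = 166,400,000 cycles. At 1500 cycles per hammer, at most about 110,933 hammer rounds fit in one window. The helper below computes this with integer arithmetic; its name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Upper bound on hammer rounds per refresh window, assuming a fixed core
 * frequency and a constant cost per round. Integer units avoid rounding:
 * microseconds x (cycles per microsecond, i.e., MHz) = cycles. */
static uint64_t hammers_per_refresh(uint64_t refresh_us, uint64_t freq_mhz,
                                    uint64_t cycles_per_hammer) {
    assert(cycles_per_hammer > 0);
    uint64_t window_cycles = refresh_us * freq_mhz;
    return window_cycles / cycles_per_hammer;
}
```

This is only an upper bound on the number of rounds. By itself, it says nothing about how many activations a given module needs before a bit flips.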
We then check whether the time taken per speculative hammering is below this maximum. Unfortunately, speculatively hammering one address is costly, so speculatively hammering both addresses of a pair can hardly meet the time requirement. Instead, GhostKnight applies speculative hammering to one address and direct hammering to the other. The time cost of hammering the address pair is shown in Figure 3b. All the time costs per speculative hammering are well below 1500 cycles, indicating that GhostKnight is efficient enough to induce bit flips. In our experiments, GhostKnight induces the first bit flip within 5 minutes of speculative hammering. Alternatively, GhostKnight can leverage one-location hammering to speculatively hammer only one address.
IV. DISCUSSION
In this section, we first discuss our future work, i.e., compromising a trusted execution environment (TEE) enforced by MMU virtualization. We then discuss how to cross other privilege boundaries.
A. Attacking MMU Virtualization
MMU Virtualization: in an MMU-assisted virtualization
environment, there are two levels of page tables. The first-
level page table, i.e., Guest Page Table (GPT), is managed
by the kernel in the guest space, and the other one, e.g.,
Intel’s Extended Page Table (EPT), AMD’s Nested Page Table
(NPT) or ARM’s Second-stage Page Table, is managed by the
hypervisor in the hypervisor space. The hardware checks the
access permissions at both levels for a memory access. If the
hypervisor removes the executable permission for a page Pa
in the EPT, then the page Pa can never be executed, regardless
of its access permissions in the GPT.
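The effect of the second translation stage on permissions can be modeled as a bitwise intersection. The sketch below uses hypothetical R/W/X flag constants rather than the real EPT or page-table bit layouts. An access is allowed only if both the guest page table and the EPT grant it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical permission flags (not the hardware bit positions). */
enum { PERM_R = 1u << 0, PERM_W = 1u << 1, PERM_X = 1u << 2 };

/* Effective rights are those granted by both translation stages. */
static unsigned effective_perms(unsigned gpt_perms, unsigned ept_perms) {
    return gpt_perms & ept_perms;
}

static bool access_allowed(unsigned gpt_perms, unsigned ept_perms,
                           unsigned requested) {
    return (effective_perms(gpt_perms, ept_perms) & requested) == requested;
}
```

This check governs architectural accesses only. Bit flips induced by hammering neighboring rows change memory contents without passing through it at all.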
As a result, such hypervisor-based access control removes the potentially compromised kernel from the Trusted Computing Base (TCB), and it has motivated numerous security defenses [34], [9], [14], [8], [11]. These works rely on the hardware-assisted hypervisor to provide a trusted execution environment (TEE) for the critical code and data of a target application, thus safeguarding data confidentiality and data integrity. Among these defenses, AppShield [9] is a typical example that provides integrity and confidentiality for data residing in the TEE. AppShield is a tiny hypervisor with MMU virtualization deployed. It isolates a protected application's virtual address space and blocks accesses to that address space unless they are authorized by the application. The protected application is full-fledged and runs without restrictions. It can request the untrusted kernel to (de)allocate DRAM memory and access the allocated memory at native speed, as in a bare-metal (unprotected) setting.
Essentially, GhostKnight can defeat all the above defenses that rely on MMU virtualization. As a case study, we will show how GhostKnight crosses the privilege boundary enforced by AppShield and writes exploitable bits into the TEE. Specifically, we will place a 1024-bit RSA exponentiation implementation, which uses square-and-multiply exponentiation, into the TEE provided by AppShield. We will then leverage GhostKnight to bypass AppShield's security guarantees by writing bits into the secret exponent of the algorithm, resulting in an attacker-controllable signature.
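Square-and-multiply processes the secret exponent bit by bit, so a single flipped exponent bit deterministically changes the result. The toy sketch below uses 64-bit arithmetic with a modulus below 2^32, so that products cannot overflow. It is not the 1024-bit implementation targeted above, which would require a multi-precision library.

```c
#include <assert.h>
#include <stdint.h>

/* Left-to-right square-and-multiply: for each exponent bit from the most
 * significant one, square the accumulator, then multiply by the base if the
 * bit is set. Requires modulus < 2^32 so that products fit in 64 bits. */
static uint64_t modexp(uint64_t base, uint64_t exponent, uint64_t modulus) {
    assert(modulus > 1 && modulus < (UINT64_C(1) << 32));
    uint64_t result = 1;
    base %= modulus;
    for (int bit = 63; bit >= 0; bit--) {
        result = (result * result) % modulus;        /* square */
        if ((exponent >> bit) & 1) {
            result = (result * base) % modulus;      /* multiply */
        }
    }
    return result;
}
```

For example, modexp(4, 13, 497) yields 445, whereas flipping the lowest exponent bit (exponent 12) yields 484.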
B. Attacking Other Privilege Boundaries
GhostKnight can potentially also bypass other known privilege boundaries and flip bits in other privilege domains, for instance:
• intra-process separation (i.e., sandboxed code);
• inter-process separation;
• user-kernel separation;
• hardware enclaves separated from user or kernel space;
• remote-local process separation.
When defeating MMU virtualization, GhostKnight can simply flush the cached target vulnerable address pairs with the clflush instruction, even though they are inaccessible. However, flushing target address pairs is challenging when breaking other privilege boundaries. There, clflush is either unavailable (in a sandboxed environment) or cannot be applied to a privileged vulnerable address. Intuitively, there are two approaches for GhostKnight to either flush or bypass the cache.
Eviction-based: an attacker can evict any target address by accessing enough congruent memory addresses that map to the same cache set as the target address [4], [13], [5], [24], [27], [37]. We can use this approach to evict a cached privileged address even though we lack access permission to it. To achieve a high DRAM-access frequency, the eviction must be efficient, which requires a sufficiently small eviction set congruent to the target address.
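Congruence can be illustrated for a simple, physically indexed cache. With 64-byte lines and S sets (S a power of two), the set index is given by bits 6 to 6 + log2(S) - 1 of the physical address. Addresses that differ by a multiple of S × 64 bytes therefore map to the same set. For example, a 32 KiB, 8-way cache with 64-byte lines has 64 sets. The sketch below assumes this plain modulo indexing. Intel last-level caches additionally distribute lines across slices with an undocumented hash, which the sketch ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_BITS 6   /* 64-byte cache lines */

/* Set index under plain modulo indexing; n_sets must be a power of two. */
static uint64_t cache_set(uint64_t pa, uint64_t n_sets) {
    assert(n_sets != 0 && (n_sets & (n_sets - 1)) == 0);
    return (pa >> LINE_BITS) & (n_sets - 1);
}

/* Writes k addresses congruent to target, spaced one set-stride
 * (n_sets * 64 bytes) apart above it. Under LRU-like replacement, an
 * eviction set needs roughly as many such lines as the set has ways. */
static void congruent_addresses(uint64_t target, uint64_t n_sets,
                                uint64_t *out, size_t k) {
    uint64_t stride = n_sets << LINE_BITS;
    for (size_t i = 0; i < k; i++) {
        out[i] = target + (uint64_t)(i + 1) * stride;
    }
}
```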
Uncached memory-based: because direct memory access (DMA) memory is uncached, past attacks (e.g., Throwhammer [32], Nethammer [23], and Drammer [33]) have abused DMA memory for hammering. Similarly, we might find such uncached privileged memory for speculative hammering.
V. RELATED WORK
In this section, we first decompose existing spectre-type attacks at a high level and then briefly introduce the microarchitectural causes of such attacks.
A. Attack Phases
All existing spectre-like attacks have demonstrated how to
abuse different speculation primitives (e.g., conditional branch
direction prediction) of modern processors to read arbitrary
memory. In general, these attacks can be decomposed into 4
common phases:
• Prepare microarchitectural side channel. As speculative
executions leave side effects in microarchitectural buffers
such as cache, private data can be inferred in the last phase
using existing timing vectors (e.g., Prime+Probe [24]).
To this end, microarchitectural buffer states are polluted.
• Prepare misspeculative execution. Depending on a spe-
cific speculation primitive, the CPU is tricked into exe-
cuting code with attacker-controlled arguments. The code
is usually within the context of a target privileged domain
such as kernel.
• Trigger misspeculative execution. There exists a time
window between when permission checks in the pipeline
are issued and when they are committed or retired. To
fully utilize the window, the CPU will make mispredic-
tions based on the previous phase and execute transient
instructions, resulting in permanent microarchitectural
state changes but transient architectural state changes.
The misspeculative execution encodes secrets of other
domains through microarchitectural state changes.
• Read Secrets via microarchitectural side channel. In this
phase, secrets are reconstructed by decoding the micro-
architectural state changes. This can be done by using an
existing timing vector mentioned in the first phase.
B. Microarchitectural Causes
Pattern History Table: spectre-V1 [21], NetSpectre [30], and SGXSpectre [28] poison the Pattern History Table (PHT) to cause branch direction misprediction. The PHT, a component of the branch prediction unit (BPU), is a two-dimensional table of counters, and each table entry is a 2-bit saturating counter. The table is indexed with one of two kinds of information. One is the virtual address bits of a recently executed branch instruction. The other is a combination of the branch instruction address and the outcomes of previous branches (i.e., branch history) [6], [10]. Based on the PHT, the CPU predicts whether a conditional branch will be taken.
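A single PHT entry can be sketched as a classic 2-bit saturating counter. States 0 and 1 predict not-taken, states 2 and 3 predict taken, and each resolved branch moves the state one step toward its actual outcome. The model shows why repeatedly resolving the bounds check in one direction during training biases the prediction for the next execution. Real predictors combine many such entries with history-based indexing, which is not modeled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct { uint8_t state; } sat_counter;  /* 0..3 */

static bool predict_taken(const sat_counter *c) {
    return c->state >= 2;
}

/* Moves the counter one step toward the resolved outcome, saturating at 0/3. */
static void update(sat_counter *c, bool taken) {
    if (taken && c->state < 3) {
        c->state++;
    } else if (!taken && c->state > 0) {
        c->state--;
    }
}
```

Because the counter saturates, a single contrary outcome after strong training does not yet change the prediction.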
Branch Target Buffer: spectre-V2 [21] and SGXPectre [7]
poison Branch Target Buffer (BTB) to enable branch target
misprediction. The BTB is also a component of the BPU and stores the target virtual addresses of the N most recently executed branches. By looking up the BTB, the CPU can directly
obtain the target address and speculatively fetch corresponding
instructions in the next cycle.
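A BTB can be thought of as a small cache that maps a branch address to its target. The toy sketch below is direct-mapped, tagless, and indexed by a few low address bits. This is an assumption; real BTBs use partial tags and undocumented hashes. It shows how two different branch addresses can share an entry. This kind of aliasing lets the recorded target of one branch steer the prediction for another.

```c
#include <assert.h>
#include <stdint.h>

#define BTB_ENTRIES 512   /* hypothetical size, power of two */

typedef struct { uint64_t target[BTB_ENTRIES]; } btb;

static unsigned btb_index(uint64_t branch_addr) {
    return (unsigned)(branch_addr & (BTB_ENTRIES - 1));
}

/* Records the resolved target of a branch. */
static void btb_record(btb *b, uint64_t branch_addr, uint64_t target) {
    b->target[btb_index(branch_addr)] = target;
}

/* Predicted target used for speculative fetch (0 means no prediction). */
static uint64_t btb_predict(const btb *b, uint64_t branch_addr) {
    return b->target[btb_index(branch_addr)];
}
```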
Return Stack Buffer: both Koruyeh et al. [22] and Maisuradze and Rossow [25] poison the Return Stack Buffer (RSB) to hijack the return flow during the CPU's speculative execution. The RSB stores the N most recent return virtual addresses, that is, the virtual addresses following the N most recent call instructions. To predict the destination of a ret instruction before the actual return address is read from the software stack, the CPU pops the topmost entry from the RSB.
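A mismatch between the RSB and the software stack is what sends a ret speculatively to a stale destination. The sketch below assumes a 16-entry circular RSB that silently overwrites its oldest entry on overflow (real designs differ, especially in how they handle underflow) and counts how many returns are mispredicted when a deep call chain unwinds. Function names and return addresses are hypothetical.

```python
class ReturnStackBuffer:
    """Toy RSB: a fixed-size circular stack of predicted return addresses."""

    def __init__(self, size=16):
        self.size = size
        self.slots = [None] * size
        self.top = 0  # index of the next slot to write; wraps around

    def on_call(self, return_addr):
        # A call pushes the address following it, silently overwriting
        # the oldest entry once the buffer is full.
        self.slots[self.top] = return_addr
        self.top = (self.top + 1) % self.size

    def on_ret(self):
        # A ret pops the topmost entry as its predicted destination.
        self.top = (self.top - 1) % self.size
        return self.slots[self.top]


def count_mispredictions(depth, size=16):
    """Nest `depth` calls, unwind them, and count wrong RSB predictions."""
    rsb = ReturnStackBuffer(size)
    software_stack = []
    for i in range(depth):
        ret_addr = 0x1000 + i  # hypothetical return addresses
        rsb.on_call(ret_addr)
        software_stack.append(ret_addr)
    misses = 0
    while software_stack:
        actual = software_stack.pop()  # architectural return address
        if rsb.on_ret() != actual:
            misses += 1
    return misses


print(count_mispredictions(depth=10))  # 0: the call chain fits in the RSB
print(count_mispredictions(depth=20))  # 4: the outermost returns are mispredicted
```

In this model every call beyond the RSB capacity yields one mispredicted return; an attacker who can also rewrite return addresses on the software stack creates such mismatches directly.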
Store To Load Dependency: spectre-V4 [15] poisons the Store To Load Dependency (STLD) to trick the CPU into speculatively executing a load instruction even when it is unknown whether the load overlaps with previous store instructions. The STLD requires that a load micro-op not be executed before all preceding store micro-ops to the same memory location have completed. For the sake of performance, the CPU's memory disambiguator predicts which loads do not depend on any prior store. If a load is predicted to have no such dependency, it speculatively reads data
from the L1 data cache. When the physical addresses of all
prior stores are known, the prediction is verified. If the load
conflicts with at least one previous store, the load and its
succeeding instructions are re-executed.
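The following Python sketch models the STLD scenario at the level of values rather than micro-ops: the load is predicted independent, reads possibly stale data, and is re-executed once the older store addresses resolve and a conflict is found. It assumes a single predictor decision and in-order store completion; the function name and all addresses are hypothetical. In a real attack the stale value never becomes architecturally visible; it can only leak through microarchitectural side effects of the instructions that consume it.

```python
def speculative_store_bypass(memory, older_stores, load_addr):
    """Toy model of a load predicted not to depend on older stores.

    memory       -- dict mapping address -> value (committed cache contents)
    older_stores -- (addr, value) stores that precede the load in program
                    order and whose addresses are not yet known at issue time
    Returns (transient_value, architectural_value, reexecuted).
    """
    # The disambiguator predicts "no dependency", so the load reads the
    # L1 data immediately, possibly a stale value.
    transient_value = memory.get(load_addr)
    # Later, the older stores' physical addresses resolve and they complete
    # in program order.
    for addr, value in older_stores:
        memory[addr] = value
    # Verify the prediction: on a conflict, the load and its successors are
    # squashed and re-executed with the up-to-date value.
    reexecuted = any(addr == load_addr for addr, _ in older_stores)
    architectural_value = memory.get(load_addr)
    return transient_value, architectural_value, reexecuted


# A store that overwrites a sensitive value (e.g. a pointer being sanitized)
# races with a later load of the same address.
mem = {0x100: "old-secret"}
print(speculative_store_bypass(mem, [(0x100, "sanitized")], 0x100))
# ('old-secret', 'sanitized', True)
```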
VI. CONCLUSION
In this paper, we demonstrated the first exploit that uses speculative execution to break data integrity. In future work, we will demonstrate a GhostKnight-based attack that defeats MMU virtualization and writes exploitable bits beyond the privilege boundary.
REFERENCES
[1] AMD, Inc., “Secure virtual machine architecture reference manual,”
Dec. 2005.
[2] I. Anati, S. Gueron, S. Johnson, and V. Scarlata, “Innovative technology
for CPU based attestation and sealing,” in Proceedings of the 2nd International Workshop on Hardware and Architectural Support for Security and Privacy, vol. 13. ACM New York, NY, USA, 2013.
[3] ARM, Inc., “Secure virtual machine architecture reference manual,”
Dec. 2005.
[4] Z. B. Aweke, S. F. Yitbarek, R. Qiao, R. Das, M. Hicks, Y. Oren, and
T. Austin, “Anvil: Software-based protection against next-generation
rowhammer attacks,” ACM SIGPLAN Notices, vol. 51, no. 4, pp. 743–
755, 2016.
[5] E. Bosman, K. Razavi, H. Bos, and C. Giuffrida, “Dedup est machina:
Memory deduplication as an advanced exploitation vector,” in Security
and Privacy, 2016 IEEE Symposium on. IEEE, 2016, pp. 987–1004.
[6] C. Canella, J. Van Bulck, M. Schwarz, M. Lipp, B. Von Berg, P. Ortner,
F. Piessens, D. Evtyushkin, and D. Gruss, “A systematic evaluation of
transient execution attacks and defenses,” in 28th USENIX Security Symposium (USENIX Security 19), 2019, pp. 249–266.
[7] G. Chen, S. Chen, Y. Xiao, Y. Zhang, Z. Lin, and T. H. Lai, “SgxPectre: Stealing Intel secrets from SGX enclaves via speculative execution,” in
2019 IEEE European Symposium on Security and Privacy (EuroS&P).
IEEE, 2019, pp. 142–157.
[8] X. Chen, T. Garfinkel, E. C. Lewis, P. Subrahmanyam, C. A. Waldspurger, D. Boneh, J. Dwoskin, and D. R. Ports, “Overshadow: a
virtualization-based approach to retrofitting protection in commodity
operating systems,” ACM SIGOPS Operating Systems Review, vol. 42,
no. 2, pp. 2–13, 2008.
[9] Y. Cheng, X. Ding, and R. Deng, “Appshield: Protecting applications
against untrusted operating system,” Singapore Management University
Technical Report, SMU-SIS-13, vol. 101, 2013.
[10] D. Evtyushkin, R. Riley, N. C. Abu-Ghazaleh, and D. Ponomarev, “BranchScope: A new side-channel attack on directional branch predictor,” ACM SIGPLAN Notices, vol. 53, no. 2, pp. 693–707, 2018.
[11] T. Garfinkel, B. Pfaff, J. Chow, M. Rosenblum, and D. Boneh, “Terra: A
virtual machine-based platform for trusted computing,” in ACM SIGOPS
Operating Systems Review, vol. 37, no. 5. ACM, 2003, pp. 193–206.
[12] D. Gruss, M. Lipp, M. Schwarz, D. Genkin, J. Juffinger, S. O’Connell,
W. Schoechl, and Y. Yarom, “Another flip in the wall of rowhammer
defenses,” arXiv preprint arXiv:1710.00551, 2017.
[13] D. Gruss, C. Maurice, and S. Mangard, “Program for testing for the
DRAM rowhammer problem using eviction,” https://github.com/IAIK/rowhammerjs, May 2017.
[14] O. S. Hofmann, S. Kim, A. M. Dunn, M. Z. Lee, and E. Witchel,
“Inktag: Secure applications on an untrusted operating system,” in ACM
SIGPLAN Notices, vol. 48, no. 4. ACM, 2013, pp. 265–278.
[15] J. Horn, “Speculative store bypass,” https://bugs.chromium.org/p/project-zero/issues/detail?id=1528, February 2018.
[16] Intel, Inc., “Intel 64 and IA-32 architectures software developer’s
manual combined volumes: 1, 2a, 2b, 2c, 3a, 3b and 3c,” Oct. 2011.
[17] ——, “Intel 64 and IA-32 architectures optimization reference manual,”
Sep. 2014.
[18] G. Irazoqui, T. Eisenbarth, and B. Sunar, “S$A: A shared cache attack that works across cores and defies VM sandboxing–and its application to AES,” in 2015 IEEE Symposium on Security and Privacy. IEEE,
2015, pp. 591–604.
[19] D. Kaplan, J. Powell, and T. Woller, “AMD memory encryption,” White
paper, 2016.
[20] Y. Kim, R. Daly, J. Kim, C. Fallin, J. H. Lee, D. Lee, C. Wilkerson,
K. Lai, and O. Mutlu, “Flipping bits in memory without accessing them:
An experimental study of DRAM disturbance errors,” in ACM SIGARCH
Computer Architecture News, vol. 42. IEEE Press, 2014, pp. 361–372.
[21] P. Kocher, D. Genkin, D. Gruss, W. Haas, M. Hamburg, M. Lipp,
S. Mangard, T. Prescher, M. Schwarz, and Y. Yarom, “Spectre attacks:
Exploiting speculative execution,” arXiv preprint arXiv:1801.01203,
2018.
[22] E. M. Koruyeh, K. N. Khasawneh, C. Song, and N. Abu-Ghazaleh,
“Spectre returns! Speculation attacks using the return stack buffer,” in 12th USENIX Workshop on Offensive Technologies (WOOT 18),
2018.
[23] M. Lipp, M. T. Aga, M. Schwarz, D. Gruss, C. Maurice, L. Raab, and
L. Lamster, “Nethammer: Inducing rowhammer faults through network
requests,” arXiv preprint arXiv:1805.04956, 2018.
[24] F. Liu, Y. Yarom, Q. Ge, G. Heiser, and R. B. Lee, “Last-level cache
side-channel attacks are practical,” in 2015 IEEE Symposium on Security
and Privacy. IEEE, 2015, pp. 605–622.
[25] G. Maisuradze and C. Rossow, “ret2spec: Speculative execution using
return stack buffers,” in Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, 2018, pp. 2109–2122.
[26] A. Mambretti, M. Neugschwandtner, A. Sorniotti, E. Kirda, W. Robertson, and A. Kurmus, “Speculator: a tool to analyze speculative execution
attacks and mitigations,” in Proceedings of the 35th Annual Computer
Security Applications Conference, 2019, pp. 747–761.
[27] C. Maurice, M. Weber, M. Schwarz, L. Giner, D. Gruss, C. A. Boano,
S. Mangard, and K. Römer, “Hello from the other side: SSH over robust cache covert channels in the cloud,” in NDSS, 2017, pp. 8–11.
[28] D. O’Keeffe, D. Muthukumaran, P.-L. Aublin, F. Kelbert, C. Priebe, J. Lind, H. Zhu, and P. Pietzuch, “Spectre attack against SGX enclave,”
Jan. 2018.
[29] P. Pessl, D. Gruss, C. Maurice, M. Schwarz, and S. Mangard, “DRAMA: Exploiting DRAM addressing for cross-CPU attacks,” in USENIX Security
Symposium, 2016, pp. 565–581.
[30] M. Schwarz, M. Schwarzl, M. Lipp, J. Masters, and D. Gruss, “NetSpectre: Read arbitrary memory over network,” in European Symposium
on Research in Computer Security. Springer, 2019, pp. 279–299.
[31] K. A. Shutemov, “Pagemap: Do not leak physical addresses to non-
privileged userspace,” https://lwn.net/Articles/642074/, 2015.
[32] A. Tatar, R. K. Konoth, E. Athanasopoulos, C. Giuffrida, H. Bos, and
K. Razavi, “Throwhammer: Rowhammer attacks over the network and
defenses,” in 2018 USENIX Annual Technical Conference, 2018.
[33] V. van der Veen, Y. Fratantonio, M. Lindorfer, D. Gruss, C. Maurice,
G. Vigna, H. Bos, K. Razavi, and C. Giuffrida, “Drammer: Deterministic rowhammer attacks on mobile platforms,” in Proceedings of the 2016
ACM SIGSAC Conference on Computer and Communications Security.
ACM, 2016, pp. 1675–1689.
[34] J. Yang and K. G. Shin, “Using hypervisor to provide data secrecy
for user applications on a per-page basis,” in Proceedings of the fourth
ACM SIGPLAN/SIGOPS international conference on Virtual execution
environments. ACM, 2008, pp. 71–80.
[35] Y. Yarom and K. Falkner, “Flush+Reload: A high resolution, low noise, L3 cache side-channel attack,” in 23rd USENIX Security Symposium (USENIX Security 14), 2014, pp. 719–732.
[36] Google Project Zero, “Spectre v2,” https://googleprojectzero.blogspot.ch/2018/01/reading-privileged-memory-with-side.html, 2018.
[37] Z. Zhang, Y. Cheng, D. Liu, S. Nepal, and Z. Wang, “Telehammer: A stealthy cross-boundary rowhammer technique,” arXiv preprint
arXiv:1912.03076, 2019.
