TeleHammer: A Formal Model of Implicit Rowhammer by Zhang, Zhi et al.
Zhi Zhang∗,1,2, Yueqiang Cheng∗,3, Dongxi Liu2, Surya Nepal2, and Zhi Wang4
∗ Both authors contributed equally to this work
1 University of New South Wales
2 Data61, CSIRO, Australia
3 Baidu Security
4 Florida State University, America
Abstract—The rowhammer bug allows an attacker to induce bit flips
in victim rows by frequently accessing their adjacent hammer rows,
enabling privilege escalation or theft of private data. A key
requirement of all existing rowhammer attacks is that an attacker
must have access to at least part of an exploitable hammer row.
We refer to such rowhammer attacks as PeriHammer. State-of-the-art
software-only defenses against PeriHammer attacks place the
exploitable hammer rows beyond the attacker’s access permission.
In this paper, we question the necessity of the above requirement
and propose a new class of rowhammer attacks, termed TeleHammer.
It is a paradigm shift in rowhammer attacks since it crosses the
privilege boundary to stealthily rowhammer an inaccessible row
through implicit DRAM accesses. Such accesses are achieved by
abusing inherent features of modern hardware and/or software. We
propose a generic model to rigorously formalize the necessary
conditions to initiate TeleHammer and PeriHammer, respectively.
Compared to PeriHammer, TeleHammer can defeat the advanced
software-only defenses, is stealthy in hiding itself, and is hard
to mitigate.
To demonstrate the practicality of TeleHammer and its advantages,
we have created an instance of TeleHammer, called PThammer, which
leverages the address-translation feature of modern processors. We
observe that a memory access from user space can induce a load of
a Level-1 page-table entry (L1PTE) from memory and thus hammer the
L1PTE once, although the L1PTE is not accessible to us. To achieve
a high enough hammering frequency, we flush the relevant TLB and
cache entries effectively and efficiently. We demonstrate PThammer
on three different test machines and show that it can cross the
user-kernel boundary and induce the first bit flips in L1PTEs
within 15 minutes of double-sided hammering. PThammer does not
require the superpage system setting, and works on Ubuntu Linux.
We have exploited PThammer to defeat advanced software-only
rowhammer defenses in the default system setting.
I. INTRODUCTION
In 2014, Kim et al. [18] discovered an infamous software-induced
hardware fault, the so-called “rowhammer” bug. Specifically,
frequently accessing the same addresses in two DRAM (Dynamic
Random Access Memory) rows (i.e., hammer rows) can cause bit flips
in an adjacent row (i.e., a victim row). If sensitive structures,
such as page tables, are placed onto the victim row, an adversary
can corrupt the structures by exploiting adjacent hammer rows
although she has no access to the structures. As such, the bug can
be exploited to break MMU-based isolation between different
security domains (e.g., user and kernel) without software
vulnerabilities, enabling a powerful class of attacks targeting
DRAM-based systems. These attacks can either gain privilege
escalation [34], [12], [5], [8], [35], [11], [36], [38] or steal
private data [33], [3], [21]. To exploit the bug, all existing
rowhammer attacks require access to at least part of an
exploitable hammer row (a hammer row is exploitable when part of
it is sensitive [21] or its adjacent victim row is sensitive
[34]), as shown in Figure 1. As their access to the hammer row is
legitimate and conforms to the privilege boundary enforced by the
MMU, we term such attacks PeriHammer.
To defeat PeriHammer-type attacks, numerous hardware- and
software-based mitigation techniques have been proposed. As
hardware-based mitigations require DRAM updates or upgrades and
cannot be backported, recent software-only defenses including
CATT [6], RIP-RH [4] and CTA [37] are practical for bare-metal
systems. These defenses all enforce DRAM-based memory isolation at
different granularities to deprive attackers of access to the
exploitable hammer rows. Take CTA [37] as an example: it primarily
isolates the DRAM memory of page tables such that all exploitable
hammer rows that can flip bits in page tables are beyond the
privilege boundary of unprivileged attackers.
TeleHammer: in this paper, we introduce a paradigm shift in
rowhammer attacks through a new class of rowhammer attacks, called
TeleHammer. As shown in Figure 1, an attacker T in TeleHammer
tricks a benign entity into implicitly hammering exploitable rows
that are inaccessible to T. This cross-privilege-boundary property
thus eliminates the above key requirement of PeriHammer.
Essentially, TeleHammer abuses built-in features of modern
hardware and/or software (such as speculative execution or the
system call handler), and the entity is either hardware such as
the processor or software such as a system call handler. For
instance, an unprivileged attacker can exploit speculative
execution to cross the user-kernel privilege boundary and access
kernel memory [19]. If such accesses occur frequently and are
served from DRAM, then the accessed kernel memory might be
vulnerable to the rowhammer bug.
To exploit the rowhammer bug, TeleHammer has the following
challenges to overcome. First, an unprivileged attacker such as T
in Figure 1 should find a feature of either hardware or software
that provides a path (e.g., the solid line with an arrow in
Figure 1) to the inaccessible and exploitable hammer rows. Second,
T should have an approach to effectively specify the path each
time it tricks X into hammering the hammer rows. Third, both the
specifying and the hammering operations should be efficient enough
to trigger the rowhammer bug (see details in Section III-B).
arXiv:1912.03076v3 [cs.CR] 30 Jul 2020
[Figure 1 (diagram). Recoverable labels: T, X, P, Cache/DRAM,
DRAM, Hammer Row, “only accessible to X”, “only accessible to P”,
“only accessible to both X and P”, TeleHammer, PeriHammer.]
Fig. 1: TeleHammer and PeriHammer. In all existing rowhammer
attacks (i.e., PeriHammer), an attacker P requires exploitable
hammer rows to be at least partially accessible. An attacker T in
TeleHammer removes this requirement by tricking a benign entity X
(e.g., a system call handler) into implicitly hammering the
exploitable hammer rows. Compared to PeriHammer, this
cross-privilege-boundary property makes TeleHammer stealthy and
hard to mitigate, which is the main difference between TeleHammer
and PeriHammer.
We propose a generic formal model to rigorously formalize
the above challenges. Also, the model can be used to formalize
PeriHammer and show that PeriHammer is a special case of
TeleHammer. TeleHammer exhibits the following advantages
over PeriHammer.
• TeleHammer can defeat the aforementioned advanced
software-only defenses [6], [4], [37], since it uses cross-
privilege-boundary rowhammer and eschews the aforemen-
tioned critical requirement of PeriHammer.
• TeleHammer is stealthy, since it hammers a hammer row
not by itself but using a target domain (e.g., kernel), making
it hard to trace the real culprit.
• TeleHammer is difficult to defend against, since many
instances can be derived from its design. A countermeasure
against specific instances cannot defeat TeleHammer.
PThammer: to demonstrate the practicality of TeleHammer, we create
a working instance, called PThammer, that satisfies the formal
model of TeleHammer. Specifically, we observe that a memory access
triggers address translation in modern OSes on the x86-64
architecture. In response to the memory access, the processor
first searches the Translation-lookaside Buffer (TLB) to check
whether a corresponding physical address exists. If the search
fails (i.e., a TLB miss), the processor searches the paging
structure, which hosts partial address mappings of different
page-table levels [2]. If another miss occurs, it fetches the
four-level page-table entries (PTEs) from the CPU cache or, if
they are not cached, from DRAM. Fetching PTEs from DRAM hammers
the PTEs once. Intuitively, the four-level PTEs will be hammered
if the TLB, paging structure and cache are flushed effectively and
efficiently.
PThammer abuses the above address-translation feature to target
Level-1 PTEs (L1PTEs). It requires effectively and efficiently
flushing the corresponding address mappings from the TLB and the
relevant L1PTEs from the cache. However, it is not authorized to
perform such flushes explicitly by means of instructions. Instead,
PThammer first prepares a complete pool of eviction sets for the
TLB and the cache, respectively, which can be used to implicitly
flush a target entry from the TLB and a target memory line from
the cache. From each of the TLB and cache eviction pools,
PThammer selects two eviction sets for subsequent double-sided
hammering. Note that the pool preparation for either TLB or cache
is a one-off cost and can be done at the beginning of PThammer. We
launch PThammer as an unprivileged process in two system settings
(i.e., the superpage setting is either disabled, which is the
default, or enabled) on three different machines. The experimental
results indicate that PThammer is able to cross the user-kernel
boundary and induce the first bit flip in L1PTEs within 15 minutes
of double-sided hammering in either setting. Furthermore, we
develop a PThammer-based exploit to defeat all the aforementioned
practical defenses (see details in Section IV).
Note that PThammer is not the only working instance
of TeleHammer, and there should exist other instances that
also leverage built-in system features to achieve the frequent
implicit DRAM accesses. We discuss potential instances in
Section V.
Summary of Contributions: the main contributions of this
paper are as follows:
• All previous rowhammer exploits (i.e., PeriHammer) require
access permission to exploitable hammer rows. In contrast, we
propose a new class of rowhammer attacks, called TeleHammer,
that performs cross-privilege-boundary rowhammer and removes
this critical requirement.
• We present a generic model to formally define necessary
conditions to launch TeleHammer and PeriHammer, respec-
tively. Based on the model, we summarize three advantages
of TeleHammer over PeriHammer.
• We propose an instance of TeleHammer, called PThammer,
that leverages the address-translation feature to cross the
user-kernel boundary and induce rowhammer bit flips in
Level-1 page tables.
• We evaluate PThammer in two system settings (i.e., superpage
is either inactive by default or active) on three different
machines. The results indicate that PThammer can cross the
user-kernel boundary and induce the first bit flip in Level-1
page-table entries within 15 minutes of double-sided hammering,
and defeat the aforementioned advanced software-only defenses
in the default system setting.
II. BACKGROUND AND RELATED WORK
In this section, we first introduce CPU cache, Translation-
lookaside Buffer (TLB), Dynamic Random-access Memory
(DRAM) and then the rowhammer bug as well as its attacks.
A. CPU Cache
On commodity Intel x86 platforms, there are three levels of CPU
caches. Among all levels, the first-level cache (i.e., L1 cache)
is closest to the CPU. The L1 cache is split into two caches:
L1D, which caches data, and L1I, which caches instructions. The
second-level cache, L2, is unified, caching both data and
instructions. Similar to L2, the last-level cache (LLC), or L3, is
also unified. Generally speaking, a cache of a specific level is
set-associative and consists of S sets. Each set contains L lines,
and data or code can be cached in any line of the set; this is
referred to as an L-way set-associative cache. Each cache line
stores B bytes. Thus, the overall cache size of that level is
S × L × B.
When an accessed variable is stored in a cache, Intel
micro-architectures use its virtual or physical address to decide
its corresponding cache set at a specific cache level. For
instance, an L1 cache set is indexed using bits 6 to 11 of a
virtual address. For L3, the indexing scheme is more complicated.
In contrast to L1 and L2, which are private to a physical core, L3
is shared among all cores. L3 is first partitioned into slices,
one per core, and each slice serves its own core with a higher
priority. Each slice is further divided into cache sets as
mentioned above. As such, some physical-address bits are XORed to
decide a slice, and some bits (bits 6 to 16) are used to index a
cache set [25].
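As a minimal sketch of the arithmetic above, the following Python snippet computes the capacity of one cache level as S × L × B and extracts an L1 set index from bits 6 to 11 of an address. The geometry used (64 sets, 8 ways, 64-byte lines) and the function names are illustrative assumptions, not values reported in this paper.

```python
def cache_size(num_sets: int, ways: int, line_bytes: int) -> int:
    """Total capacity of one cache level: S * L * B bytes."""
    return num_sets * ways * line_bytes


def l1_set_index(vaddr: int) -> int:
    """Set index taken from bits 6..11 of the address (64 possible sets)."""
    return (vaddr >> 6) & 0x3F


# Assumed example geometry: 64 sets, 8 ways, 64-byte lines.
L1_BYTES = cache_size(num_sets=64, ways=8, line_bytes=64)

# Two addresses that are 4 KiB apart differ only above bit 11,
# so they fall into the same L1 set.
addr_a = 0x7F0000001240
addr_b = addr_a + 4096
same_set = l1_set_index(addr_a) == l1_set_index(addr_b)
```

Addresses that share a set index in this way compete for the same L lines, which is the property that eviction-based cache flushing relies on.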
B. Translation-lookaside Buffer
The Translation-lookaside Buffer (TLB) has two levels. The
first level (i.e., L1) consists of two parts: one that caches
translations for code pages, called L1 instruction TLB (L1
iTLB), and the other that caches translations for data pages,
called L1 data TLB (L1 dTLB). The second level TLB (L2
sTLB) is larger and shared for translations of both code and
data. Similar to the CPU cache above, the TLB at each level
is also partitioned into sets of ways. One way is a TLB entry
that stores one address mapping between a virtual address and
a physical address.
Note that a virtual address (VA) determines a TLB set of
each level. Although there is no public information about the
mapping between the VA and the TLB set, it has been reverse-
engineered on quite a few Intel commodity platforms [10].
C. Dynamic Random-access Memory
The main memory of most modern computers uses Dynamic
Random-access Memory (DRAM). Memory modules are usually produced
in the form of dual inline memory modules (DIMMs), where both
sides of the memory module have separate electrical contacts for
memory chips. Each memory module is directly connected to the
CPU’s memory controller through one of the two channels.
Logically, each memory module consists of two ranks, corresponding
to its two sides, and each rank consists of multiple banks. A bank
is structured as an array of memory cells organized in rows and
columns.
Every cell of a bank stores one bit of data whose value depends on
whether the cell is electrically charged or not. A row is the
basic unit of memory access. Each access to a bank “opens” a row
by transferring the data in all the cells of the row to the bank’s
row buffer. This operation discharges all the cells of the row. To
prevent data loss, the row buffer is then copied back into the
cells, thus recharging them. Consecutive accesses to the same row
are served by the row buffer, while accessing another row flushes
the row buffer.
D. Rowhammer Overview
Rowhammer bugs: Kim et al. [18] discovered that current DRAM
modules are vulnerable to disturbance errors induced by charge
leakage. In particular, their experiments have shown that
frequently opening the same row (i.e., hammering the row) can
cause sufficient disturbance to a neighboring row and flip its
bits without even accessing the neighboring row. Because the row
buffer acts as a cache, another row in the same bank is accessed
after each hammering to flush the row buffer, so that the next
hammering re-opens the hammered row, leading to bit flips in its
neighboring row.
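The role of the row buffer can be illustrated with a small simulation. The sketch below is a simplified model, assuming a single bank with one row buffer and ignoring refresh and controller policies; the function name and row numbers are hypothetical. It counts how many times a row is actually opened for a given access sequence.

```python
def count_activations(rows_accessed):
    """Count row activations ("opens") in one bank with a single row buffer."""
    open_row = None
    activations = 0
    for row in rows_accessed:
        if row != open_row:
            # Row-buffer miss: the requested row must be opened.
            activations += 1
            open_row = row
        # Otherwise the access is served from the row buffer.
    return activations


# 1000 accesses to the same row open it only once.
same_row = count_activations([5] * 1000)

# 1000 accesses alternating between two rows open a row every time.
alternating = count_activations([5, 7] * 500)
```

In this model, repeated accesses to a single row do not hammer it, whereas alternating between two rows in the same bank forces an activation on every access.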
Hammering techniques: generally speaking, there are three
techniques for hammering vulnerable DRAM.
Double-sided hammering: it is the most efficient technique to
induce bit flips. The two rows adjacent to a victim row are
hammered simultaneously, and these adjacent rows are called
hammer rows [18].
Single-sided hammering: Seaborn et al. [34] proposed single-sided
hammering, which randomly picks multiple addresses and hammers
them in the hope that such addresses are in different rows within
the same bank.
One-location hammering: one-location hammering [11] randomly
selects a single address for hammering. It exploits the fact that
advanced DRAM controllers employ a more sophisticated policy to
optimize performance, preemptively closing accessed rows earlier
than necessary.
Key requirements: PeriHammer-based attacks must meet the following
requirements to gain either privilege escalation or private
information.
First, the CPU cache must be either flushed or bypassed. Cached
data can be invalidated by instructions such as clflush on x86.
In addition, conflicts in the cache can evict data from the cache
since the cache is much smaller than main memory. Therefore, to
evict hammer rows from the cache, we can use a crafted access
pattern [13] to cause cache conflicts for hammer rows. Also, we
can bypass the cache by accessing uncached memory.
Second, the row buffer must be cleared between consecutive
accesses to a hammer row. Both double-sided and single-sided
hammering explicitly alternate accesses to two or more rows within
the same bank to clear the row buffer. One-location hammering
relies on the memory controller to clear the row buffer.
Third, existing rowhammer attacks require that a hammer row be
accessible to an attacker in order to gain privilege escalation or
steal private data, such that a victim row can be compromised by
hammering the hammer row.
Fourth, either the hammer row or the victim row must contain the
sensitive data objects (e.g., page tables) being targeted. If the
victim row hosts the data objects, an attacker can either gain
privilege escalation or steal private data [34], [3]. If the
hammer row hosts the data objects, an attacker can steal private
data [21].
1) Rowhammer Attacks: Frequent and direct memory access is a
prerequisite for triggering the rowhammer bug. Thus, we classify
rowhammer attacks into three categories based on how they flush
or bypass the cache.
Instruction-based: the clflush instruction is commonly
used for explicit cache flush [18], [34], [11], [33] ever since
Kim et al. [18] revealed the rowhammer bug. Also, Qiao et
al. [32] reported that non-temporal store instructions such as
movnti and movntdqa can be used to bypass cache and
access memory directly.
Eviction-based: alternatively, an attacker can evict a target
address by accessing congruent memory addresses which are
mapped to the same cache set and same cache slice as the
target address [1], [13], [5], [25], [27]. A large enough set
of congruent memory addresses is called an eviction set. Our
PThammer also chooses the eviction-based approach to flush
Level-1 PTEs from cache.
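The idea of an eviction set can be sketched as follows. This is a simplified illustration that assumes the set index of every candidate address is known and that the cache has no slice hashing; a real unprivileged attacker does not know physical addresses and typically has to find congruent addresses by timing measurements. The function names, the 1024-set geometry and the 12-way associativity are hypothetical.

```python
def set_index(paddr: int, num_sets: int = 1024, line_bytes: int = 64) -> int:
    """Simplified set index: cache-line number modulo the number of sets."""
    return (paddr // line_bytes) % num_sets


def build_eviction_set(target, candidates, ways, index_fn=set_index):
    """Pick `ways` candidate addresses that map to the same set as `target`."""
    target_set = index_fn(target)
    congruent = [c for c in candidates
                 if c != target and index_fn(c) == target_set]
    if len(congruent) < ways:
        raise ValueError("not enough congruent addresses in the candidate pool")
    return congruent[:ways]


# Candidate pool: every cache-line-aligned address in a 4 MiB buffer.
candidates = range(0, 4 * 1024 * 1024, 64)
target = 0x12340
eviction_set = build_eviction_set(target, candidates, ways=12)
```

Accessing all addresses in `eviction_set` fills the target's cache set with other lines, which evicts the target without any flush instruction.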
Uncached Memory-based: as direct memory access (DMA)
memory is uncached, past rowhammer attacks such as
Throwhammer [35] and Nethammer [23] on x86 microarchi-
tecture and Drammer [36] on ARM platform have abused
DMA memory for hammering. Note that such attacks hammer
target rows that are within an attacker’s access permission.
III. TELEHAMMER OVERVIEW
In this section, we first present the threat model and assump-
tions, and then introduce the formal model of TeleHammer,
followed by an instance of TeleHammer to demonstrate its
practicality.
A. Threat Model and Assumptions
Our threat model is similar to other rowhammer attacks [38],
[33], [32], [5], [12], [34]. Specifically,
• The kernel is considered to be secure against software-only
attacks. In other words, our attack does not rely on any
software vulnerabilities.
• An adversary controls an unprivileged user process that has
no special privileges, such as access to pagemap, which holds
the mapping between virtual addresses and physical addresses.
• An attacker has no knowledge about the kernel memory
locations that are bit-flippable.
• The installed DRAM modules are susceptible to
rowhammer-induced bit flips. Pessl et al. [31] report
that mainstream DRAM manufacturers have vulnerable
DRAM modules, including DDR3 and DDR4.
B. Formal Modeling of TeleHammer
We propose a formal model of TeleHammer to characterize
its attack paradigm.
Let $U$ be a set of entities, which can be any component in
modern OSes that is able to initiate memory accesses. If
$u \in U$, then $u$ could be, for instance, a user process or the
kernel. A set of memory addresses is denoted by $M$. Each memory
address has access permissions assigned to each entity. Given a
memory address $m \in M$, the permission function $\Pi(m)$
returns the set of entities that can access (e.g., read) $m$.
In this model only, memory refers not only to DRAM rows but also
to other types of high-speed memory (e.g., cache, register, DRAM
row buffer, etc.). Generally, a memory access starts by searching
for the content in the fastest memory hardware first (e.g.,
registers). If the search fails, it proceeds to slower memory
hardware, ending with the slowest, the DRAM row. The validity
function $V(m) \in \{0, 1\}$ indicates whether $m$ contains valid
contents (i.e., $V(m) = 1$) or not (i.e., $V(m) = 0$) to satisfy
the search. The time function $T_{node}(u, m)$ returns the latency
taken by $u$ to access $m$.
A memory access from one entity may trigger other entities to
perform subsequent memory accesses in order to complete a
computing task. For instance, when a regular user initiates a
memory access, it can trigger the processor to access page-table
entries. Such a situation is modeled by the directed memory graph
defined below.
Definition 1 (Directed Memory Graph). A directed memory graph $G$
(e.g., Figure 2) is a pair $(M, E)$, where the memory addresses in
$M$ constitute the nodes of $G$, and $E$ contains all the directed
edges. A directed edge in $E$ is represented by a triple such as
$(m_a, u_1, m_1)$ in Figure 2, where $m_a, m_1 \in M$ and
$u_1 \in U$. An edge $(m_a, u_1, m_1) \in E$ has the following
semantics:
• $u_1 \in \Pi(m_1)$ and,
• an access to $m_a$ can potentially trigger $u_1$ to access $m_1$
within time $T_{edge}(m_a, u_1, m_1)$.
Note that the time $T_{edge}(m_a, u_1, m_1)$ is determined by
$m_a$ triggering $u_1$ and then $u_1$ accessing $m_1$. As such,
$T_{edge}(m_a, u_1, m_1)$ should be greater than
$T_{node}(u_1, m_1)$ given the time taken by the trigger. Since
memory addresses in this model have different memory types, there
exist other edges starting from $m_a$, such as
$(m_a, u_1, \hat{m}_1)$. Which edge is taken at runtime is highly
dependent on the time taken by the edge. Intuitively, the edge
that takes a shorter time has a higher chance of being selected.
Take the last-level cache as an example: it is shared between all
the cores of the processor and partitioned into multiple slices
(one for each core). Each core will choose to access its own slice
rather than others since accessing its own slice takes less time.
To exploit the rowhammer bug, an attacker must hammer a node
(e.g., $m_h$ in Figure 2) that is located in a DRAM row, rather
than other nodes (e.g., $\hat{m}_h$ in cache). As such, the
attacker must ensure that the edge $(m_n, u_h, m_h)$ is selected
at runtime, and we call such an edge a memory access edge.
Definition 2 (Memory Access Edge). An edge $(m_n, u_h, m_h)$ with
$V(m_h) = 1$ is defined as a memory access edge, denoted by
$\overrightarrow{(m_n, u_h, m_h)}$, if every
$(m_n, u_h, \hat{m}_h) \in E$ with $\hat{m}_h \neq m_h$ satisfies
one of the following requirements:
• $V(\hat{m}_h) = 0$ or,
• $T_{edge}(m_n, u_h, \hat{m}_h) > T_{edge}(m_n, u_h, m_h)$.
If $V(\hat{m}_h) = 0$, such nodes do not contain valid content.
Thus, their edges will not be selected at runtime. If the edges
(e.g., $(m_n, u_h, \hat{m}_h)$) take a longer time, then such
edges will not be taken, either. At runtime, specifying the memory
access to $m_h$ can be done by setting $V(\hat{m}_h) = 0$ if
$T_{edge}(m_n, u_h, \hat{m}_h) < T_{edge}(m_n, u_h, m_h)$. We use
a function $T_{set}(m_h)$ to denote the time cost of specifying
the memory access to $m_h$. For instance, suppose $\hat{m}_h$ and
$m_h$ are within the cache and a DRAM row, respectively, and each
holds the same valid data. If $u_h$ wants to access $m_h$, $u_h$
could either invoke an instruction (e.g., clflush) or leverage
previous cache eviction approaches [1], [13] to flush $\hat{m}_h$,
that is, set $V(\hat{m}_h)$ to 0.
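The memory-access-edge condition of Definition 2 can be sketched as a small predicate. The snippet below is an illustration only: the `Edge` class, the node names and the latencies (40 and 200 cycles) are hypothetical and are not taken from the paper's measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    src: str      # source node (m_n in the model)
    entity: str   # entity that performs the access (u_h)
    dst: str      # accessed node (m_h in DRAM, or a faster copy of it)
    latency: int  # edge latency in cycles (T_edge)


def is_memory_access_edge(edge, edges, valid):
    """Definition 2: the target is valid, and every competing edge with the
    same source and entity is either invalid or strictly slower."""
    if not valid[edge.dst]:
        return False
    for other in edges:
        competing = (other.src == edge.src and other.entity == edge.entity
                     and other.dst != edge.dst)
        if competing and valid[other.dst] and other.latency <= edge.latency:
            return False
    return True


cache_edge = Edge("m_n", "u_h", "m_h_cached", latency=40)
dram_edge = Edge("m_n", "u_h", "m_h", latency=200)
edges = [cache_edge, dram_edge]

valid = {"m_h_cached": True, "m_h": True}
before_flush = is_memory_access_edge(dram_edge, edges, valid)

# Flushing or evicting the cached copy sets its validity to 0.
valid["m_h_cached"] = False
after_flush = is_memory_access_edge(dram_edge, edges, valid)
```

While the cached copy is valid, the faster cache edge wins and the DRAM edge is not a memory access edge; invalidating the cached copy is what makes the DRAM edge the one taken.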
If an access to another node such as $m_a$ shown in Figure 2
triggers a chain of memory accesses at runtime, a communication
path is built up. Formally, the communication path is defined
below.
Definition 3 (Communication Path). As shown in Figure 2, let
$m_1, m_n \in M$ ($n > 1$). A communication path $P(m_1, m_n)$ is
a sequence of memory access edges $\vec{e}_i$,
$i \in \{1, 2, \ldots, n-1\}$, for which there is a sequence of
distinct nodes $m_i$, $i \in \{1, 2, \ldots, n\}$, such that
$\vec{e}_i = \overrightarrow{(m_i, u_{i+1}, m_{i+1})}$ for
$i \in \{1, 2, \ldots, n-1\}$.
Given the path $P(m_1, m_n)$, we use $\mathrm{last}(P(m_1, m_n))$
to denote the last memory access edge in the path. Let
$\vec{e} = \mathrm{last}(P(m_1, m_n))$. Then,
$P(m_1, m_n)|\vec{e}$ denotes the subpath of $P(m_1, m_n)$
excluding $\vec{e}$; that is, the concatenation of
$P(m_1, m_n)|\vec{e}$ and $\vec{e}$ is $P(m_1, m_n)$. For a path
$P(m_1, m_n)$, its time latency is denoted by $T_p(P(m_1, m_n))$.
Definition 4 (Communication Latency). Let $P(m_a, m_h)$ be a
communication path, $\mathrm{last}(P(m_a, m_h)) = \vec{e}$, and
$\vec{e} = \overrightarrow{(m_n, u_h, m_h)}$. Then,
$T_p(P(m_a, m_h))$ is defined as:
• $T_p(P(m_a, m_h)) = T_{edge}(m_n, u_h, m_h)$, if
$P(m_a, m_h) = \vec{e}$, otherwise,
• $T_p(P(m_a, m_h)) = T_p(P(m_a, m_h)|\vec{e}) + T_{edge}(m_n, u_h, m_h)$.
Note that when $P(m_a, m_h) = \vec{e}$, $m_a$ and $m_n$ are the
same node, i.e., $m_a = m_n$, such that $P(m_a, m_h)$ has only one
memory access edge.
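The recursion in Definition 4 can be written out directly. The following sketch represents a communication path only by the list of its edge latencies, in order; the function name and the example latencies are hypothetical.

```python
def path_latency(edge_latencies):
    """Path latency as in Definition 4: peel off the last edge and recurse."""
    if not edge_latencies:
        raise ValueError("a communication path has at least one edge")
    *prefix, last = edge_latencies
    if not prefix:
        # Base case: the path consists of a single memory access edge.
        return last
    # Recursive case: latency of the subpath plus the last edge's latency.
    return path_latency(prefix) + last


# Assumed latencies (in cycles) for a three-edge path.
three_edge_path = path_latency([30, 50, 200])
single_edge_path = path_latency([200])
```

Unrolling the recursion shows that the path latency is simply the sum of the latencies of all memory access edges on the path.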
When a hammer row is being hammered, the rowhammer bug can affect
either the hammer row itself [21] or a victim row that is at least
one row away (within the same DRAM bank) from the hammer row
[18], [38]. We use $R_{max}$ to indicate the maximum row distance
between a hammer row and a victim row, since some defenses rely
on an empirically determined $R_{max}$ for their effectiveness.
For example, ZebRAM [20] empirically reported that $R_{max}$ is 1
on its test DRAM modules. However, our experiments on a Lenovo
X230 show that $R_{max}$ can be 2 (see Section V).
Let $m_h$ be the hammered node in DRAM and $m_v$ be another
affected DRAM node; then $m_v$ resides either in the same row as
$m_h$ or in a victim row. We use a row-index function
$\mathrm{Row}(m_h)$ to return the row index of $m_h$ if $m_h$ is
within a DRAM row, or $-1$ otherwise. As such, the row distance
between $\mathrm{Row}(m_h)$ and $\mathrm{Row}(m_v)$ should be no
greater than $R_{max}$. To make the rowhammer bug exploitable,
$m_v$ must contain sensitive information (e.g., a page table or a
cryptographic key); the sensitivity function $S(m_v)$ returns 1 in
that case and 0 otherwise.
As a minimum hammering frequency is required to hammer $m_h$ and
trigger the rowhammer bug, we use $T_{max}$ to represent the
maximum time latency allowed per hammering. $T_{max}$ is highly
dependent on the DRAM module and the hammering technique. For
example, we perform a double-sided rowhammer test on the Lenovo
X230 in Section IV-A5, and the time cost per double-sided
hammering must be less than 1500 cycles (i.e., $T_{max}$) to
trigger rowhammer bit flips.
The following defines the necessary conditions for a
TeleHammer-based exploit.
Definition 5 (TeleHammer). Let $G$ be the directed memory graph of
a computing task being conducted by an attack process $a$,
exemplified in Figure 2, where $m_a$, $m_h$ and $m_v$
($a \notin \Pi(m_v)$) represent an attack address, a hammer
address and a victim address, respectively. $a$ can launch a
TeleHammer-based exploit if the conditions below are satisfied:
• $\mathrm{Row}(m_h) \neq -1$, $\mathrm{Row}(m_v) \neq -1$, $S(m_v) = 1$,
• $|\mathrm{Row}(m_h) - \mathrm{Row}(m_v)| \leq R_{max}$,
• $\exists P(m_a, m_h)$ in $G$,
• $T_{set}(m_h) + T_{node}(a, m_a) + T_p(P(m_a, m_h)) + T_\delta \leq T_{max}$.
The last condition specifies the time requirement for TeleHammer.
As shown in Figure 2, modern hardware tends to take the fastest
path to handle the computing task for the attack process $a$,
i.e., $P(m_a, \hat{m}_h)$. However, the path must be changed to
$P(m_a, m_h)$ so as to hammer $m_h$ by using $u_h$. As $u_h$
accessing nodes such as $\hat{m}_h$ takes a shorter time, $a$ must
set such nodes invalid to make $u_h$ access $m_h$. The time taken
by this specifying step is $T_{set}(m_h)$.
Also, $a$ must account for the time to access $m_a$ (i.e.,
$T_{node}(a, m_a)$), the time to walk through the communication
path $P(m_a, m_h)$ (i.e., $T_p(P(m_a, m_h))$) and the time $a$ has
to wait before performing the next hammering (i.e., $T_\delta$).
Note that the value of $T_\delta$ depends on whether $m_h$ is the
last node that the computing task needs to access. If it is, as in
Figure 2, $T_\delta$ is negligible. Otherwise, $a$ waits for
$T_\delta$, during which the computing task proceeds from $m_h$ to
its last node. Take the task of a system call handler as an
example: if $a$ targets an $m_h$ that hosts one entry of the
system-call table for hammering, $a$ needs to wait for the system
call handler to complete its routine after $m_h$ is accessed.
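The conditions of Definition 5 can be checked mechanically once the model quantities are known. The sketch below is a hedged illustration: the function and parameter names are hypothetical, and all numbers except the 1500-cycle bound quoted above for the Lenovo X230 are assumed values chosen for the example. The permission condition that the attack process cannot access the victim address is taken as given and is not modeled.

```python
def telehammer_feasible(row_h, row_v, sensitive_v, r_max, path_exists,
                        t_set, t_node, t_path, t_delta, t_max):
    """Check the four conditions of Definition 5 (latencies in cycles)."""
    # Both nodes lie in DRAM rows and the victim holds sensitive data.
    in_dram = row_h != -1 and row_v != -1 and sensitive_v
    # The victim row is within the maximum disturbance distance.
    close_enough = abs(row_h - row_v) <= r_max
    # One hammering (specify, access, walk the path, wait) fits in the budget.
    fast_enough = t_set + t_node + t_path + t_delta <= t_max
    return in_dram and close_enough and path_exists and fast_enough


# Assumed example: adjacent rows, 1210 cycles per hammering, 1500-cycle budget.
ok = telehammer_feasible(row_h=100, row_v=101, sensitive_v=True, r_max=1,
                         path_exists=True, t_set=900, t_node=10,
                         t_path=300, t_delta=0, t_max=1500)

# Same setting, but specifying the path is too slow (1710 cycles in total).
too_slow = telehammer_feasible(row_h=100, row_v=101, sensitive_v=True, r_max=1,
                               path_exists=True, t_set=1400, t_node=10,
                               t_path=300, t_delta=0, t_max=1500)
```

The example highlights that the cost of specifying the path can dominate the time budget, so an otherwise valid path may still be too slow to trigger bit flips.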
When $m_a$ and $m_h$ refer to the same memory address, i.e.,
$m_a = m_h$, the attacker has access to $m_h$ and TeleHammer
actually becomes PeriHammer. As such, we can also formally define
PeriHammer below based on the above formal model.
Definition 6 (PeriHammer). PeriHammer would succeed if the
following conditions are met:
• $\mathrm{Row}(m_h) \neq -1$, $\mathrm{Row}(m_v) \neq -1$, $S(m_v) = 1$,
• $|\mathrm{Row}(m_h) - \mathrm{Row}(m_v)| \leq R_{max}$,
• $a \in \Pi(m_h)$,
• $T_{set}(m_h) + T_{node}(a, m_h) + T_\delta \leq T_{max}$.
Clearly, the last condition removes the latency caused by the path
$P(m_a, m_h)$, making it faster to hammer once. Besides, it is
much easier for $a$ to specify the access to $m_h$ rather than
other nodes, and $a$ spends much less time doing so than in
TeleHammer, as discussed for Definition 5. $T_\delta$ is
negligible since $m_h$ is the only accessed node and $a$ does not
have to wait for the next hammering.
[Figure 2 (diagram). Recoverable labels: α; nodes $m_a$, $m_1$,
$m_n$, $m_h$, $m_v$, $\hat{m}_1$, $\hat{m}_h$; edges
$(m_a, u_1, m_1)$ and $(m_n, u_h, m_h)$; panels TeleHammer and
PeriHammer.]
Fig. 2: Formal Modeling of TeleHammer and PeriHammer. TeleHammer
specifies the path from the initial memory access
$\overrightarrow{(m_a, u_1, m_1)}$ to the last
$\overrightarrow{(m_n, u_h, m_h)}$ so as to hammer $m_h$
indirectly. When $m_a$ and $m_h$ refer to the same node, i.e.,
$m_a = m_h$, TeleHammer can hammer $m_h$ directly and thus becomes
PeriHammer. (Node $\hat{m}_h$ has a dashed circle, meaning that
$T_{edge}(m_n, u_h, \hat{m}_h)$ has a lower latency than
$T_{edge}(m_n, u_h, m_h)$; thus $a$ has to set $\hat{m}_h$ to
invalid to specify the memory access edge from $m_n$ to $m_h$.)
A comparison of TeleHammer and PeriHammer: as shown in Figure 2,
TeleHammer is effective against rowhammer defenses that place
$m_h$ in a physically isolated DRAM partition, since it requires
no access to $m_h$.
On top of that, TeleHammer is stealthy and hard to trace by
dynamic analysis at runtime, since it has a complicated
communication path and hammers $m_h$ by using $u_h$. In contrast,
an attacker via PeriHammer hammers $m_h$ directly by herself.
Besides, mitigating TeleHammer is challenging due to abundant
communication path candidates. An attacker can identify many paths
by leveraging built-in features of modern hardware and/or
software. Thus, eliminating the communication path we identify in
the following sections cannot by itself defend against
TeleHammer.
Comparing the time conditions in their respective definitions,
TeleHammer is clearly slower than PeriHammer, indicating that
PeriHammer is potentially faster in inducing bit flips.
To demonstrate the practicality of TeleHammer, we have created an
instance of it, called PThammer. Besides, we discuss other
possible instances in detail in Section V.
C. PThammer: page-table based TeleHammer
PThammer is a page-table based TeleHammer. It allows an
unprivileged attacker to cross the privilege boundary and hammer
page tables, resulting in bit flips in other page tables. In the
following, we discuss how PThammer satisfies the formal conditions
in Definition 5.
1) Satisfy Formal Definition 5: First, page tables are critical in
memory isolation and are inaccessible to an unprivileged attacker.
If the attacker can compromise a memory address hosting page
tables by the rowhammer effect, then it satisfies the first
condition; here, the address refers to $m_v$ and $S(m_v) = 1$.
Second, page tables are common and can be widely distributed in
modern OS kernels. Thus, both $m_h$ and $m_v$ can be kernel
addresses hosting page tables, that is, hammering the page tables
at $m_h$ will flip bits in the page tables at $m_v$. To this end,
we can leverage previous techniques such as memory spray [7] and
memory ambush [34] to force the kernel to create a large number of
page-table pages, with the hope that some page tables are placed
at hammer addresses like $m_h$ while others are placed at victim
addresses like $m_v$. As such, we can create numerous pairs of
such hammer addresses and victim addresses so that exploitable bit
flips in page tables become highly likely. Note that the
rowhammer defense CTA [37] allocates all page tables from a
reserved memory partition, and this greatly increases the number
of such pairs compared to page-table allocation from the whole
system memory.
Third, there exists a communication path (see Definition 5)
that allows an attacker to indirectly access page tables. To
this end, we observe that a least privileged memory access triggers
an address translation in which the processor can access page
tables from memory. When a user allocates a virtual memory
page by malloc and then accesses the page for the first time,
an address-translation process occurs. Within the process, the
processor performs a multi-level page-table walk, the corresponding
page-table entries (PTEs) are populated and a physical
memory page is allocated for the user. To facilitate subsequent memory
access as shown in Figure 3, Translation Look-aside Buffer
(TLB) stores a complete address mapping from a virtual
address to a physical address. Paging structure caches a
partial address mapping of different page-table levels [2]. For
instance, the paging structure of Level-2 PD translates a virtual
address to a physical address of Level-1 PT. With bits 12∼20
from the virtual address [15], a corresponding physical address
of a Level-1 PTE (L1PTE) can be obtained. CPU cache copies
the accessed four-level PTEs from memory. By doing so, the
processor will search these hardware structures in the order
of priority to get a matching physical address. If TLB, paging
structure and cache are all effectively flushed, the processor
then has to access the four-level PTEs from memory. As such,
an access to an address ma by a can trigger the processor to
access four-level PTEs from memory if the flushing operation
is conducted effectively.
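As an illustration of how a virtual address selects the four-level PTEs, the following minimal Python sketch splits an address into its table indices and locates the L1PTE inside a Level-1 page-table page. It assumes the common x86-64 layout (4KiB pages, 512 eight-byte entries per table, 48-bit virtual addresses); the function names and the pt_base parameter are hypothetical and only serve the example.

```python
PAGE_SHIFT = 12   # 4KiB pages (assumed)
INDEX_BITS = 9    # 512 entries per page-table page (assumed)
PTE_SIZE = 8      # bytes per page-table entry (assumed)


def walk_indices(vaddr):
    """Split a 48-bit virtual address into its four table indices and page offset."""
    mask = (1 << INDEX_BITS) - 1
    return {
        "pml4": (vaddr >> 39) & mask,   # Level-4 index
        "pdpt": (vaddr >> 30) & mask,   # Level-3 index
        "pd": (vaddr >> 21) & mask,     # Level-2 index
        "pt": (vaddr >> 12) & mask,     # Level-1 index, i.e., bits 12..20
        "offset": vaddr & ((1 << PAGE_SHIFT) - 1),
    }


def l1pte_address(pt_base, vaddr):
    """Physical address of the L1PTE, given the physical base of the Level-1 PT."""
    return pt_base + walk_indices(vaddr)["pt"] * PTE_SIZE
```

Only the Level-1 index matters for PThammer's optimized path, since the PD-level paging structure already supplies the base of the Level-1 PT.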
Last, as ma can be within cache, the time (Tnode(a,ma))
to access it is negligible. To meet the time condition in
the definition, the time (Tp(P(ma,mh))) to walk through the
above identified path, the time (Tset(mh)) to specify the path,
and the time (Tδ) to wait for next hammering must all be as
low as possible.
Optimize Tp(P(ma,mh)) and Tδ: we optimize the identified
communication path by making mh host one L1PTE rather
than other-level PTE, shown as a solid line with an arrow in
Figure 3. The path is optimized for the following reasons.
• flushing paging structure is required when mh hosts other-
level PTE. Directly flushing paging structure requires ex-
ecuting a privileged instruction such as invlpg, while
indirect flushing needs to reverse-engineer the mapping
between a virtual address and the paging structure index.
• flushing all-level (or other single-level) PTEs from paging
structure and cache is intuitively more time consuming
compared to flushing the L1PTE, as shown in Figure 3.
• specifying such a path consumes much less memory, mak-
ing the exploit stealthier. As mentioned above, we need to
allocate a lot of page-table pages to flip exploitable bits
in page tables. Creating a PT-level page of 512 entries
requires exhausting 2MiB memory. For a higher page-table-
level page, its creation requires much more memory. For
example, the PD-level page creation requires 1GiB memory.
• compared to other-level PTEs, a bit flip in one L1PTE is
easier to become exploitable and gain privilege escalation,
since the L1PTE decides the physical page that a user can
access as well as the access permission.
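The memory figures in the third reason follow from simple arithmetic, sketched below in Python under the assumption of 4KiB pages and 512 entries per page-table page. The function name memory_to_fill is hypothetical.

```python
PAGE = 4 * 1024   # bytes per 4KiB page (assumed)
ENTRIES = 512     # entries per page-table page (assumed)


def memory_to_fill(level):
    """Bytes of memory mapped by one fully populated page-table page.

    level 1 is a PT-level page, level 2 a PD-level page, and so on.
    """
    return PAGE * ENTRIES ** level


pt_bytes = memory_to_fill(1)   # 2MiB for one PT-level page
pd_bytes = memory_to_fill(2)   # 1GiB for one PD-level page
```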
As mh that hosts one L1PTE is the last accessed memory
address when translating a virtual address, it means that Tδ ≈
0.
Optimize Tset(mh): to specify the path to the L1PTE (shown
in red solid line in Figure 3), we only need to flush the address
mapping from TLB and the L1PTE from cache with the PD-
level paging structure still being effective.
Intuitively, we can simply invoke invlpg to flush the
target TLB entry. As for the cache flush, we can perform a page-
table walk, get the virtual address of the L1PTE and thus flush
its valid content from cache by invoking clflush. By doing
so, we are able to flush both TLB and cache as quickly as
possible. However, kernel privilege is required to complete the
flushing. Alternatively, we can perform the flushing indirectly
by manipulating cache and TLB replacement states. As the
size of TLB and cache is limited, we can simply create many
pages as an eviction buffer and access those pages one by one
so as to evict target TLB entry and cache line. Although this
approach can effectively flush both TLB and cache, it does
not reduce Tset(mh) to its minimum.
In a nutshell, the key challenge to minimize the time is to
determine two minimum eviction sets so as to flush targeted
TLB entry and cache line effectively and efficiently.
2) Effective and Efficient TLB Flush: As Gras et al. [10]
have revealed that there exists an explicit mapping between a
virtual page number and multi-level TLB set, we simply create
an initial eviction set that contains multiple (physical) pages
[Figure 3: flowchart of address translation from a virtual address to a physical address. A TLB hit yields the physical address directly; on a miss, the PD, PDPT and PML4 paging structures are searched in turn, and the Level-4 to Level-1 entries (PML4E, PDPTE, PDE, PTE) are fetched from cache on a hit or from memory on a miss, starting from CR3.]
Fig. 3: Address Translation. A solid line with an arrow indicates
the fastest communication path that PThammer identifies
to hammer a Level-1 page-table entry (PTE). When specifying
the path, PThammer only flushes TLB and cache while keeping
the paging structures of all levels effective. Note that PML4E,
PDPTE and PDE are the PTEs of the other three levels.
to flush a cached virtual address from TLB. One subset of the
pages is congruent and mapped to the same L1 dTLB set while
the other is congruent and mapped to the same L2 sTLB set if
TLB applies a non-inclusive policy.
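A minimal Python sketch of building such an initial eviction set is given below. It assumes, purely for illustration, that both TLB levels use a linear mapping (set index equals the virtual page number modulo the number of sets) with 16 L1 dTLB sets and 128 L2 sTLB sets; these constants and all function names are assumptions, and the actual mapping must be taken from [10] for the machine at hand.

```python
PAGE_SHIFT = 12   # 4KiB pages
L1_SETS = 16      # assumed number of L1 dTLB sets
L2_SETS = 128     # assumed number of L2 sTLB sets
WAYS = 4          # 4-way associativity at both levels


def tlb_sets(vaddr):
    """Return the (L1 dTLB set, L2 sTLB set) of a virtual address (linear mapping assumed)."""
    vpn = vaddr >> PAGE_SHIFT
    return vpn % L1_SETS, vpn % L2_SETS


def initial_eviction_set(target, buf_base, buf_pages):
    """Pick WAYS pages congruent with target in each TLB level from a buffer."""
    t1, t2 = tlb_sets(target)
    l1_pages, l2_pages = [], []
    for i in range(buf_pages):
        page = buf_base + (i << PAGE_SHIFT)
        s1, s2 = tlb_sets(page)
        if s2 == t2 and len(l2_pages) < WAYS:
            l2_pages.append(page)       # congruent in the L2 sTLB
        elif s1 == t1 and len(l1_pages) < WAYS:
            l1_pages.append(page)       # congruent in the L1 dTLB
    return l1_pages, l2_pages
```

With 4 ways per level this yields the 8 pages mentioned in the text, which is only a starting point since the replacement policy may not be true LRU.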
Take one of our test machines, Lenovo T420, as an example:
both L1 dTLB and L2 sTLB are 4-way set-associative, and thus
8 (physical) pages should be enough as a minimum eviction set
to evict a target virtual address from TLB. However, when we create such an
eviction set and then profile the access latency of a target
virtual address, its latency remains unstable. To collect finer-
grained information on TLB misses induced by the target
address, we develop a kernel module that applies Intel Perfor-
mance Counters (PMCs) to monitor the TLB-miss event (i.e.,
dtlb_load_misses.miss_causes_a_walk). The ex-
perimental results show that TLB misses in both levels do
not always occur when profiling the target address, meaning
that the target address has not been effectively evicted by the
eviction set, and thereby rendering the TLB flush ineffective.
A possible reason is that the eviction policy on TLB is not
true Least Recently Used (LRU).
Decide the Minimal Size for a TLB Eviction Set: to this
end, we propose Algorithm 1, which can decide a
minimal size without knowing the TLB eviction policy. Note that the
minimal size is used to prepare a minimal TLB eviction set in
PThammer while PThammer itself does not use the algorithm.
Specifically, lines 2 to 8 define a function profile_tlb_set
that reports a TLB-miss number (tlb_miss_num) induced by
accessing target_addr. Within the function, target_addr is first
write-accessed so that its mapping is cached in TLB (line 3); the
function argument (i.e., set) is then write-accessed (lines 4-6) to
flush that cached entry, and tlb_miss_num of
accessing target_addr is reported in line 7. Based on a
pre-allocated buf, we select all the pages that are indexed
Algorithm 1: Decide a minimal eviction-set size for TLB
 1 Initially: target_addr is a page-aligned virtual address that
   needs its cached TLB entry flushed. A buffer (buf) is
   pre-allocated, size of which is decided by available TLB
   entries. A set (init_set) is initialized to empty. A unique
   number is assigned to data_marker.
 2 Function profile_tlb_set(set)
 3     target_addr ← data_marker
 4     foreach page ∈ set do
 5         page[0] ← data_marker
 6     end
 7     tlb_miss_num is decided by accessing target_addr.
 8     return tlb_miss_num
 9 foreach page ∈ buf do
10     if page and target_addr are in the same set then
11         page[0] ← data_marker
12         add page into init_set.
13     end
14 end
15 threshold ← profile_tlb_set(init_set)
16 for page ∈ init_set do
17     take one page out of init_set.
18     temp_tlb_miss ← profile_tlb_set(init_set)
19     if temp_tlb_miss < threshold then
20         put page back into init_set and break.
21     end
22 end
23 return the size of init_set
to the same TLB set as target_addr by leveraging the
reverse-engineered mapping [10] in lines 9-14. Note that the
buf size is large enough to effectively flush any targeted
virtual address. It is decided by the number of TLB entries
that serve 4KiB-page translation if target_addr is allocated
from a 4KiB-page list; otherwise, the number of TLB entries
that serve 2MiB or 1GiB pages should be used. The selected
pages are then populated and added into init_set as shown
in lines 10-13. It is necessary to populate the selected pages in
order to trigger the address translation, so that the TLB
caches the address mappings accordingly. In line 15, we
obtain a threshold for effective TLB flushes. We then
trim the set to its minimum while retaining its effectiveness in
lines 16-23.
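The trimming loop of Algorithm 1 (lines 15-23) can be sketched in Python as follows. This is a simulation under stated assumptions rather than the kernel-module implementation: the real profile_tlb_set reads a performance counter, whereas simulated_profile below is a hypothetical stand-in that reports a high miss count only while the set holds at least `needed` pages.

```python
def minimal_eviction_set_size(init_set, profile_tlb_set):
    """Shrink init_set until removing one more page lowers the TLB-miss count."""
    pages = list(init_set)
    threshold = profile_tlb_set(pages)          # line 15
    while pages:                                # lines 16-22
        page = pages.pop()                      # take one page out
        if profile_tlb_set(pages) < threshold:
            pages.append(page)                  # put it back and stop
            break
    return len(pages)                           # line 23


def simulated_profile(pages, needed=12, trials=100):
    """Hypothetical miss counter: eviction succeeds only with enough pages."""
    return trials if len(pages) >= needed else trials // 3


size = minimal_eviction_set_size(range(16), simulated_profile)
```

On real hardware the miss count is noisy, so a practical version would likely compare averaged counts against the threshold with some tolerance.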
3) Effective and Efficient Cache Flush: Now we are going
to flush a cached Level-1 PTE (L1PTE) that corresponds to a
target virtual address. Considering that last-level cache (LLC)
is inclusive [15], we target flushing the L1PTE from LLC
such that the L1PTE will also be flushed out from both L1
and L2 caches (we thus use cache and LLC interchangeably
in the following section). In contrast to TLB, which is addressed
by a virtual page-frame number, LLC is indexed by physical-address
bits, and the mapping between them has also been reverse
engineered [14], [26], [16]. Based on the mapping, we can
intuitively create an eviction set consisting of many congruent
memory lines (i.e., cache-line-aligned virtual addresses), which
are mapped to the same cache slice and cache set as the
L1PTE. On top of that, the eviction set can also be minimized
in the case where the eviction policy of LLC is not publicly
documented.
Decide the Minimal Size for an LLC Eviction Set: we
extend the aforementioned kernel module to count LLC-miss events
(i.e., longest_lat_cache.miss) and use an algorithm similar
to Algorithm 1 to decide the minimal size for
an LLC eviction set. Namely, we prepare a large enough eviction
set congruent with a target virtual address and obtain a threshold
of the LLC-miss number induced by accessing the target address.
We then remove memory lines randomly from the set one by one and
verify whether the currently induced LLC-miss number is less
than the threshold. If yes, a minimal size can be determined.
This algorithm is also performed in an offline phase long
before PThammer is launched.
Although the size of eviction-set is determined ahead of
time, PThammer in our threat model cannot know the mapping
between a virtual and a physical address, making it challenging
to prepare an eviction set for any target virtual address during
its execution. Also, PThammer cannot obtain the L1PTE’s
physical address, and thus it is difficult to learn the L1PTE’s
exact location (e.g., cache set and cache slice) in LLC. To
address the above two problems, PThammer at the beginning
prepares a complete pool of eviction sets, which can be used to
flush any target data object including the L1PTE. It then selects
an eviction set from the pool to evict a target L1PTE without
knowing the L1PTE’s cache location. Note that preparing the
eviction pool is a one-off cost and PThammer only needs to
repeatedly select eviction sets from the pool when hammering
L1PTEs.
Prepare a Complete Pool of LLC Eviction Sets: the pool
has a large enough number of eviction sets and each can be
used to flush a memory line from a specific cache set within
a cache slice in LLC. The size of each eviction set is the
pre-determined minimum size. We implement the preparation
based on previous works [25], [9]. Both works rely on the
observation that a program can determine whether a target line
is cached or not by profiling its access latency. If a candidate
set of memory lines is its eviction set, then the target line’s
access latency is above a time threshold after iterating all the
memory lines within the candidate set.
Specifically, if a target system enables superpage, a virtual
address and its corresponding physical address have the same
least significant 21 bits. Thus, if we know a virtual
address from a pre-allocated superpage, then bits 0∼20 of its
physical address are leaked and we know the cache-set
index that the virtual address maps to (see Section II-A).
The only unknown is the cache-slice index. Based on a past
algorithm [25], we allocate a large enough memory buffer
(e.g., twice the size of LLC), select memory lines from the
buffer that have the same cache-set index and group them into
different eviction sets, each for one cache slice.
If superpage is disabled, then only the least significant 12
bits (i.e., the 4KiB-page offset) are shared between virtual and
physical addresses and consequently we know a partial cache-
set index (i.e., bits 6∼11). As such, we utilize another previous
work [9] to group potentially congruent memory lines into
Algorithm 2: Select a minimal LLC eviction set
 1 Initially: a virtual page-aligned address (target_addr) is
   allocated and needs its L1PTE cache-line flushed. A complete
   pool of individual eviction sets (eviction_sets).
   l1pte_offset is decided by target_addr. max_latency is
   initialized to 0 and indicates the maximum latency induced by
   accessing target_addr. max_set represents the eviction set
   used for the L1PTE cache flush.
 2 Function profile_evict_set(set, target)
 3     foreach memory_line ∈ set do
 4         read-access memory_line.
 5     end
 6     flush a target TLB entry.
 7     latency is decided by accessing target.
 8     return latency
 9 foreach set ∈ eviction_sets do
10     obtain page_offset from first memory_line in set.
11     if page_offset == l1pte_offset then
12         latency ← profile_evict_set(set, target_addr).
13         if max_latency < latency then
14             max_latency = latency.
15             max_set = set.
16         end
17     end
18 end
19 return max_set
a complete pool of individual eviction sets. Compared to
the above grouping operation, this grouping process is much
slower, since there are many more memory lines sharing the
same partial cache-set bits rather than complete bits.
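To see why the regularpage setting leaves many more candidates, the following Python sketch counts how many (slice, set) locations remain possible for a memory line when only the page-offset bits of its physical address are known. It assumes 64-byte cache lines and two LLC slices (one per core on the dual-core test CPUs); both values and the function names are assumptions made for this example.

```python
LINE_SIZE = 64   # bytes per cache line (assumed)
LINE_BITS = 6    # log2(LINE_SIZE)
MIB = 1024 * 1024


def set_index_bits(llc_bytes, ways, slices):
    """Number of address bits selecting a cache set inside one slice."""
    sets_per_slice = llc_bytes // (LINE_SIZE * ways * slices)
    return sets_per_slice.bit_length() - 1   # log2 of a power of two


def candidate_sets(page_offset_bits, llc_bytes, ways, slices):
    """(slice, set) pairs still possible when only the page offset is known."""
    index_bits = set_index_bits(llc_bytes, ways, slices)
    known = min(page_offset_bits - LINE_BITS, index_bits)
    return (1 << (index_bits - known)) * slices


superpage = candidate_sets(21, 3 * MIB, 12, 2)   # only the slice is unknown
regular = candidate_sets(12, 3 * MIB, 12, 2)     # 5 set-index bits unknown too
```

Under these assumptions a 12-way 3MiB LLC has 11 set-index bits (bits 6∼16), so a 2MiB superpage reveals all of them while a 4KiB page reveals only bits 6∼11.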
Select a Target LLC Eviction Set: based on the pool
preparation, we develop Algorithm 2 to select an eviction
set from the pool and evict an L1PTE corresponding to a target
address.
In line 9, we enumerate all the eviction sets in the pool
and collect those sets that have the same page offset as the
L1PTE in line 11. This collection policy is based on an interesting
property of the cache. Oren et al. [30] report that if the first
memory lines of two different physical memory pages
are mapped to the same cache set of LLC, then the remaining
memory lines of the two pages also share (different) cache
sets. This means that if we request many (physical) memory lines
that have the same page offset as the L1PTE and access each
memory line, then we can flush the L1PTE from LLC.
After the collection, lines 12-16 select the target eviction
set from the collected ones. In line 12, we profile every
collected eviction set through the function predefined in lines
2-8. Within this function, we perform read access to each
memory line of one eviction set, which will implicitly flush
the L1PTE from LLC if the eviction set is congruent with
the L1PTE, and then flush the target TLB entry related to
target_addr to make sure the subsequent address translation
will access the L1PTE. At last, we measure the latency induced
by accessing target_addr. Based on this function, we can
find the targeted eviction set that causes the maximum latency
in lines 13-16, as fetching the L1PTE from DRAM is
time-consuming when accessing target_addr triggers the address
translation in line 7. Given that LLC is shared between
page-table entries and user data, we must carefully set target_addr
to be page-aligned (normally 4KiB-aligned) but not
superpage-aligned (normally 2MiB-aligned), that is, its page offset is 0
and different from l1pte_offset, which is the page offset of
the L1PTE. As such, they are placed into different cache sets and
the selected eviction set is ensured to flush the target L1PTE
rather than target_addr.
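The selection logic of Algorithm 2 can be sketched in Python as below. It is a simulation, not the actual attack code: profile_evict_set is passed in as a callable because the real measurement (reading the eviction set, flushing the TLB entry and timing the access) is hardware-specific. The sketch further assumes 8-byte PTEs and 64-byte cache lines, and represents each eviction set as a list of line-aligned addresses.

```python
PAGE_SHIFT = 12
PTE_SIZE = 8        # bytes per L1PTE (assumed)
LINE_MASK = ~0x3F   # aligns an offset down to a 64-byte cache line (assumed)


def l1pte_offset(target_addr):
    """Page offset of the cache line that holds the L1PTE of target_addr."""
    index = (target_addr >> PAGE_SHIFT) & 0x1FF   # bits 12..20
    return (index * PTE_SIZE) & LINE_MASK


def select_eviction_set(eviction_sets, target_addr, profile_evict_set):
    """Return the candidate set that makes accessing target_addr slowest."""
    wanted = l1pte_offset(target_addr)
    max_latency, max_set = 0, None
    for ev_set in eviction_sets:                  # line 9
        page_offset = ev_set[0] & 0xFFF           # line 10
        if page_offset != wanted:                 # line 11
            continue
        latency = profile_evict_set(ev_set, target_addr)   # line 12
        if max_latency < latency:                 # lines 13-16
            max_latency, max_set = latency, ev_set
    return max_set                                # line 19
```

Filtering on the page offset first keeps the number of profiled sets small, which is what makes the per-target selection cheap compared with the one-off pool preparation.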
IV. EVALUATION
In this section, we test PThammer on three different machines,
each running an Ubuntu system, as shown in Table I. Each
Ubuntu system disables the superpage feature by default. No
matter whether superpage is enabled or not, we can observe
the first cross-boundary bit flip on each test machine. As a
case study, we then leverage PThammer to compromise the
state-of-the-art rowhammer defenses on Lenovo T420 with the default
system setting.
A. PThammer
We first decide the minimal eviction-set size to effectively
and efficiently flush TLB and last-level cache (LLC) at an
offline stage. Based on the minimal size, we can prepare a
minimal TLB or LLC eviction set from a complete pool of
TLB or LLC eviction sets.
1) Respective Minimal Eviction-Set Size: Based on
Algorithm 1 in Section III-C2, we first obtain an initial eviction
set whose number of pages is twice the combined wayness of L1dTLB
and L2sTLB (i.e., 16 pages on our test machines) and in which each page is mapped to
the same L1dTLB set or L2sTLB set as a target page-aligned
virtual address. We then remove one page from the set each
time to check a TLB miss rate of the target virtual address,
as shown in Figure 4a. As all the test machines have the
same wayness of L1dTLB and L2sTLB, the TLB miss rate
on each machine initially remains relatively stable when the
eviction-set size drops down one by one until 12, and thereafter
decreases dramatically. We choose 12 as the minimal TLB
eviction-set size.
For LLC, each Lenovo machine has 12-wayness of LLC
and thus each initial eviction set is set to 24 memory lines
that map to the same LLC set as a target virtual address. For
the Dell machine, it has 16-wayness and thus its initial set
size is 32. Similar to TLB, memory lines in the eviction set
are also removed one by one and the LLC miss rate for each
removal is shown in Figure 4b. Clearly, the LLC miss rate on
the Lenovo machines stays stable (more than 95%) until the
set size of 13, but decreases gradually below 90% after 12.
For the Dell machine, its miss rate drops below 94% when its
set size is less than 17. As such, we choose 13 as the minimal
eviction-set size for the Lenovo machines and 17 for the Dell
machine.
2) One-off Respective Eviction Pool Preparation: For TLB,
we allocate a complete pool of 4KiB pages and its page
number is 8 times as many as the number of both L1dTLB
and L2sTLB entries that translate a 4KiB-page. As can be seen
Machine Model  Architecture  CPU Version  TLB (Wayness)               LLC (Wayness, Size)  DRAM
Lenovo T420    Sandy Bridge  i5-2540M     4-way L1dTLB, 4-way L2sTLB  12-way, 3MiB         Samsung DDR3 8GiB
Lenovo X230    Ivy Bridge    i5-3230M     4-way L1dTLB, 4-way L2sTLB  12-way, 3MiB         Samsung DDR3 8GiB
Dell E6420     Sandy Bridge  i7-2640M     4-way L1dTLB, 4-way L2sTLB  16-way, 4MiB         Samsung DDR3 8GiB
TABLE I: System Configurations.
[Figure 4: two panels plotting the miss rate (percentage) on the x-axis against the eviction-set size on the y-axis, with one series per test machine. Panel (a): TLB eviction set, 11 to 16 pages. Panel (b): LLC eviction set, 11 to 32 memory lines.]
(a) The TLB miss rate on each machine remains relatively stable (no
more than 95%) when the TLB eviction set reduces from 16 to 12
and then decreases dramatically when the set is reduced to 11.
(b) The LLC miss rate on the two Lenovo machines stays greater than
95% when their eviction set is no less than 13 while the Dell machine
still has a high miss rate of 94% when its eviction set is 17.
Fig. 4: TLB/LLC miss rate with regard to the size of TLB/LLC eviction set.
                               One-off Eviction Pool Preparation  Eviction Set Selection  Hammering Cost
Machine Model  System Setting  TLB      LLC                       TLB    LLC              Until First Bit Flip
Lenovo T420    superpage       11 ms    0.3 min                   1 µs   285 ms           10 min
Lenovo T420    regularpage     11 ms    18 min                    1 µs   283 ms           10 min
Lenovo X230    superpage       7 ms     0.3 min                   1 µs   282 ms           15 min
Lenovo X230    regularpage     7 ms     19 min                    1 µs   288 ms           15 min
Dell E6420     superpage       7 ms     0.3 min                   1 µs   258 ms           14 min
Dell E6420     regularpage     7 ms     38 min                    1 µs   270 ms           12 min
TABLE II: Averaged time costs for conducting PThammer on each machine for 5 runs. The first bit flip can be observed
within 15 minutes of double-sided hammering. The pool preparation for either TLB or LLC is a one-off cost and performed
only once at the beginning of PThammer such that respective TLB and LLC eviction set are selected each time right before
hammering.
from Table II, TLB pool preparation in each setting on each
test machine is pretty fast (within milliseconds).
For LLC, we prepare a complete pool of either 2MiB pages
(i.e., superpage enabled) or 4KiB pages (i.e., regularpage by
default) and its size in both settings is twice the size of LLC.
As the cache-set bits in the superpage setting are all known,
the eviction pool preparation is much faster compared to that in
the regularpage setting. Particularly, in the regularpage setting,
the pool preparation on the Dell machine takes nearly 20
minutes longer than that on the two Lenovo machines, probably
because the Dell machine has larger wayness
and size of LLC (see Table I). For each machine in each
setting, the number of eviction sets in each pool is almost
the same as the number of LLC sets, making the efficiency of
selecting an eviction set comparable (within 290 milliseconds).
3) Respective Minimal Eviction Set Selection: TLB eviction
set selection relies on a complete reverse-engineered mapping
between virtual addresses and TLB sets [10], and thus it
introduces no false positives, indicating that PThammer can
always select a matching eviction set for TLB.
However, selecting an LLC eviction set is based on pro-
filing the access latency to a target address, described in
Algorithm 2. As such, the profiled latency is not completely
precise due to noise (e.g., interrupts) and may introduce false
positives to the selection. To this end, we develop a kernel
module to obtain the physical address of each L1PTE and
thus verify whether the L1PTE is congruent with the eviction
set selected by Algorithm 2. The experimental results show
that the eviction set selection for LLC has no more than 6%
false positives in each system setting on each test machine.
[Figure 5: time (seconds) until the first bit flip, from 0 to 2000, on the y-axis against time (cycles) per double-sided hammering, from 300 to 1600, on the x-axis, with one series per test machine.]
Fig. 5: As the time cost per double-sided hammering increases,
the time to find the first bit flip also grows. When the time
cost per hammering is greater than 1500 cycles in both Lenovo
machines and 1600 cycles in the Dell machine, no bit flip is
observed within 2 hours.
Note that selecting a TLB-based eviction set is within 1
microsecond while the LLC eviction set selection is within 290
milliseconds. Both of them are quite efficient, indicating that
we can quickly start double-sided PThammering as mentioned
below.
4) Double-sided PThammering: To efficiently induce bit
flips, we should hammer two L1PTEs that are one row apart
within the same bank, similar to the way how double-sided
hammering works. As such, we expect to select appropriate
user virtual addresses such that their relevant L1PTEs meet
the above requirement. However, the physical address of each
L1PTE is required to know its location (i.e., DIMM, rank,
bank, and row) in DRAM, given that the mapping from a physical
address to a DRAM location has been reverse-engineered [38], [31].
As we have no permission to access the kernel space, we
cannot know the physical address of an L1PTE, making
it challenging to conduct double-sided PThammering.
To address this problem, we draw on previous
works [7], [21], [36] and take the following two steps. Firstly,
we select a pair of addresses whose respective L1PTEs
are highly likely to be two rows apart (i.e., separated by exactly
one row). As the DRAM row
size per row index (i.e., RowsSize) is publicly available (e.g.,
RowsSize is 256KiB in each test machine), we can abuse the
buddy allocator to force a large enough allocation of Level-1
page tables (e.g., 2GiB out of the total 8GiB DRAM for Level-1
page tables, 8K times as large as RowsSize). By doing so,
most page-table pages are in consecutive physical-page order
and will exhaust consecutive rows with a high probability. On
top of that, one Level-1 page-table page of 4KiB maps 512
physical pages of 4KiB. To this end, we choose pairs of
addresses whose address difference is 2 · RowsSize · 512
(e.g., the address difference is 256MiB in our test machines).
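The address difference can be checked with a short calculation, sketched here in Python. The sketch assumes that the sprayed Level-1 page-table pages are physically contiguous, which the text only expects to hold with high probability; the function name pair_stride is hypothetical.

```python
KIB = 1024
ROW_SIZE = 256 * KIB     # RowsSize on the test machines
PAGE = 4 * KIB           # size of one page-table page and of one mapped page
PTES_PER_PAGE = 512      # L1PTEs per Level-1 page-table page


def pair_stride(rows_apart=2):
    """Virtual-address distance whose L1PTEs lie rows_apart row indices apart."""
    pt_pages_between = rows_apart * ROW_SIZE // PAGE   # page-table pages to skip
    return pt_pages_between * PTES_PER_PAGE * PAGE     # memory those pages map


stride = pair_stride()                         # 256MiB
sprayed_rows = (2 * 1024 ** 3) // ROW_SIZE     # row indices covered by a 2GiB spray
```

Skipping two rows of page tables means skipping 128 page-table pages, each of which maps 2MiB of virtual memory, which gives the 256MiB stride.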
Secondly, loading two L1PTEs residing in the same bank of
different rows triggers the row-buffer conflict [29] and causes
clearly higher latency, compared to loading two L1PTEs that
are within different banks. To this end, we perform TLB and
LLC flushes for a selected address pair to load two L1PTEs
from DRAM, and then profile the access latency to the address
pair. If the latency is no less than a predetermined threshold,
then the two L1PTEs are believed to be in the same bank with
one row apart and the corresponding address pair will thus be
sent for hammering. We identify the threshold at an offline
stage by profiling access latency to 1000 address pairs of two
kinds (i.e., one kind is that two L1PTEs for one address pair
are in different rows of the same bank while the other one is
that the two L1PTEs are in different banks). The experimental
results show that each machine in both system settings has the
same threshold. In each setting of each machine, no less than
95% of address pairs are in the same bank when their access
latency satisfies the threshold. Within the 95% address pairs,
no less than 90% address pairs are in the same bank with one
row apart.
5) Time Costs for PThammer: As discussed in Section III-C,
the time cost per double-sided hammering must be no greater
than the maximum latency allowed to induce bit flips. We
firstly identify the maximum time cost that permits a bit flip
on each machine through a published double-sided hammering
tool 1.
The tool embeds two clflush instructions inside each
round of double-sided hammering. To increase the time cost
for each round of hammering, we add a certain number of NOP
instructions preceding the clflush instructions in each run
of the tool. We incrementally add the NOP number so that the
time cost per hammering will grow after each run. The time
cost for the first bit flip to occur on each machine is shown
in Figure 5. As shown in the figure, the time until the
first bit flip increases with an increasing cost per hammering.
When the time cost per hammering is more than 1500 cycles
on both Lenovo machines or 1600 cycles on the Dell machine,
not a single bit flip is observed within 2 hours. As such, 1500
and 1600 cycles can be taken as the maximum cost permitted to flip bits for
the Lenovo and Dell machines, respectively.
We then check whether the time taken by each round of
double-sided PThammering is no greater than the permitted
cost. For each double-sided PThammering, it requires accesses
to two user virtual addresses as well as their respective TLB
eviction set (i.e., 24 virtual addresses in total on each machine)
and LLC eviction set (i.e., 26 virtual addresses on each Lenovo
machine and 34 virtual addresses on the Dell machine). In
both system settings, we conduct double-sided PThammering
for 50 rounds and measure the time that each round takes.
The experimental results (see Figure 6 in Appendix A) show
that the time taken per double-sided PThammering is well
below the aforementioned maximum cost on each machine,
meaning that most address accesses within each PThammering
are served by CPU caches rather than DRAM.
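The access counts above, and the average cycle budget they imply, can be reproduced with the following Python sketch. It only restates the numbers given in the text (minimal set sizes of 12, 13 and 17, and the 1500- and 1600-cycle limits); the function names are hypothetical, and the per-access figure is an average that says nothing about individual accesses.

```python
def accesses_per_round(tlb_set_size, llc_set_size, targets=2):
    """Memory accesses in one round: each target plus its two eviction sets."""
    return targets * (1 + tlb_set_size + llc_set_size)


def cycles_per_access(budget_cycles, tlb_set_size, llc_set_size):
    """Average cycles available per access if a round must fit the budget."""
    return budget_cycles / accesses_per_round(tlb_set_size, llc_set_size)


lenovo = accesses_per_round(12, 13)   # 2 targets + 24 TLB + 26 LLC addresses
dell = accesses_per_round(12, 17)     # 2 targets + 24 TLB + 34 LLC addresses
lenovo_budget = cycles_per_access(1500, 12, 13)
dell_budget = cycles_per_access(1600, 12, 17)
```

An average budget of fewer than 30 cycles per access is only achievable if most of these accesses hit in the CPU caches, which is consistent with the observation above.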
1 https://github.com/google/rowhammer-test
As a result, we conduct PThammer for 5 runs in both system
settings and display the averaged time costs in Table II. The
table clearly shows that we can successfully observe the first
bit flip within 15 minutes of double-sided PThammering.
B. Defeat the state-of-the-art software-only defenses
To defend against rowhammer attacks, numerous software-
only defenses have been proposed. Among the software-
based defenses, CATT [6], RIP-RH [4] and CTA [37] are
practical to mitigate existing rowhammer attacks in bare-metal
systems. Note that RIP-RH [4] enforces DRAM-based process
isolation and thus prevents attackers from hammering target
user processes. However, it does not protect the kernel and
its page tables. Clearly, PThammer can defeat it by inducing
rowhammer bit flips in an L1PTE and gaining kernel privilege.
In this section, we use Lenovo T420 to demonstrate proof-of-
concept attacks against CATT [6] and CTA [37] respectively
in the default system setting.
Compromise CATT [6]: CATT [6] partitions each DRAM
bank into a kernel part and a user part. These two parts are
separated by at least one unused row. When a physical memory
request is initiated, CATT allocates memory from either the
kernel part or the user part according to the intended use of
the memory. By doing so, CATT can confine bit-flips induced
by the user domain to its own partition and thereby eliminate
exploitable hammer rows that badly affect the kernel domain,
the so-called physical kernel isolation.
However, we are still able to implicitly hammer kernel
memory from the user domain by leveraging PThammer.
Specifically, our exploit has the following four phases:
1) Rely on the past work [7] to allocate consecutive DRAM
rows for Level-1 page-table pages (2GiB memory is allo-
cated out of the total 8GiB DRAM);
2) Perform double-sided PThammering;
3) Verify whether “exploitable” bit flips have occurred by
checking if a virtual address points to a page-aligned
L1PTE. If not, go to step 2 and restart PThammer (On
average, 4 PThammer-induced bit flips are needed to have
one exploitable one);
4) If yes, we have gained the kernel privilege and we can gain
the root privilege by changing uid of current process to 0.
Compromise CTA [37]: CTA (i.e., Cell-Type-Aware) [37]
focuses on PTE-based privilege escalation rowhammer attacks.
In such attacks, the attackers induce bit flips in L1PTEs
such that the corrupted PTEs no longer point to the attackers’
memory pages but instead point to other page-table pages of
the same process, thereby gaining illegal access to the page
tables. In order to destroy this core property, CTA places
Level-1 page tables in DRAM true-cells above a “Low Water
Mark” in the physical memory. If a PTE has a bit-flip in its
physical frame number, it only points to a physical address
lower than the “Low Water Mark” rather than the page-table
region.
By leveraging PThammer, we can break CTA and gain root
privilege. The key steps for the exploit are listed below:
1) We spray the physical memory under the “Low Water
Mark” with a large enough number of security critical
structures, i.e., cred (note that cred stores the uid field.).
Specifically, the attack process creates 32K child processes
by invoking the fork system call. For each child process
creation, the kernel allocates a kernel stack and multiple
kernel structures including cred.
2) Each child process first registers a signal handler and
then goes to sleep. The registered signal will help the attack
process wake up the child process when necessary.
3) After completing the child-process creations, the attack
process starts to occupy consecutive DRAM rows above the
“Low Water Mark” by forcing page-table page allocations
(4GiB memory is used out of the total 8GiB DRAM).
4) The attack process performs double-sided PThammering.
5) The attack process verifies whether an “exploitable” bit flip
has occurred by checking whether a virtual address (VA) now
points to a page that holds a cred structure. As cred contains
three user ids (e.g., uid and suid) and three group ids (e.g.,
gid and sgid) stored sequentially, the attack process can
construct a unique string of the six ids and compare it against
the page that the VA points to. If the page does not contain
the string, the attack process goes back to step 4 and restarts
PThammer (on average, 8 PThammer-induced bit flips are needed
to obtain one exploitable flip).
6) If the page contains the string, the attack process has
located a cred structure. It changes uid to 0 and then wakes up
every child process by delivering the registered signal. Inside
the signal-catching function, each child process checks whether
it has become a root process by invoking getuid.
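The check in step 5 can be sketched as a byte-pattern search over the page that the VA points to. This is an illustrative model only: the 32-bit little-endian encoding, the interleaved order of the ids, the 4 KiB page size, and the function names are assumptions, not the exact kernel layout or our exploit code.

```python
import struct

PAGE_SIZE = 4096  # assumed regular page size


def cred_signature(uids, gids):
    """Pack three user ids and three group ids as six consecutive 32-bit values.

    The interleaved order (uid, gid, ...) and the little-endian 32-bit
    encoding are assumptions made for illustration.
    """
    ids = [v for pair in zip(uids, gids) for v in pair]
    return struct.pack("<6I", *ids)


def find_cred(page, signature):
    """Return the byte offset of the signature in the page, or -1 if absent."""
    return page.find(signature)


# Build a fake page with the signature planted at a hypothetical offset.
sig = cred_signature((1000, 1000, 1000), (1000, 1000, 1000))
page = bytearray(PAGE_SIZE)
page[128:128 + len(sig)] = sig
print(find_cred(bytes(page), sig))  # 128
```

A return value of -1 corresponds to going back to step 4, and any non-negative offset corresponds to step 6.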
V. DISCUSSION
Defeat ZebRAM [20]: ZebRAM is a rowhammer defense that
works only for virtualized systems. We plan to extend
PThammer to defeat it in future work.
Empirically, ZebRAM observes that hammering row i can
only affect the adjacent rows i+1 and i−1. Based on this
observation, ZebRAM leverages the hypervisor to split the
memory of a VM into safe and unsafe regions using even and odd
rows in a zebra pattern. That is, all even rows of the VM form
the safe region that contains data, while all odd rows form
the unsafe region that serves as swap space. As such, a
rowhammer attack from the safe region can only incur useless
bit flips in the unsafe region. A rowhammer attack from the
unsafe region is not possible, since the unsafe region is
inaccessible to an unprivileged attacker.
However, our experimental results on the Lenovo X230 show
that hammering rows i+2 and i−2 can induce bit flips in row i:
173 bit flips occurred in row i within 16 hours. Clearly,
ZebRAM’s observation does not hold on at least our test
machine, which enables both PeriHammer and TeleHammer to
defeat ZebRAM on such a machine. Also, Kim et al. [18] report
that hammering a row can affect three or more rows in a
certain number of DRAM modules, which contradicts ZebRAM’s
observation.
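The effect of the hammering reach on the zebra pattern can be sketched as follows. This is a minimal model, assuming rows are numbered consecutively and even rows form the safe region; the function names and the `reach` parameter are hypothetical.

```python
def is_safe_row(row):
    """Even rows hold data (safe region); odd rows are swap space (unsafe region)."""
    return row % 2 == 0


def safe_victims(hammered_row, reach):
    """List the safe-region rows within `reach` rows of a hammered safe-region row."""
    if not is_safe_row(hammered_row):
        raise ValueError("an unprivileged attacker can only hammer safe-region rows")
    return [
        hammered_row + d
        for d in range(-reach, reach + 1)
        if d != 0 and is_safe_row(hammered_row + d)
    ]


print(safe_victims(10, reach=1))  # []
print(safe_victims(10, reach=2))  # [8, 12]
```

With a reach of one row, no data row is exposed, which matches ZebRAM’s assumption. With a reach of two rows, the neighboring data rows become victims, which matches the behavior observed on the Lenovo X230.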
For modules where ZebRAM’s empirical observation does hold,
an attacker can compromise ZebRAM as follows. ZebRAM does not
protect the physical memory of the hypervisor, and thus
extended page tables (EPTs) residing in the hypervisor space
are adjacent to each other. As such, an unprivileged attacker
can issue regular memory accesses to conduct PThammer-like
attacks, causing exploitable bit flips in EPT entries and
escaping the VM.
Other Possible Instances of TeleHammer: Besides PThammer,
there might exist other instances of TeleHammer that leverage
other built-in features of modern hardware or software.
In particular, features that focus on functionality and
performance are potential candidates. For hardware, we discuss
two well-known CPU features: out-of-order and speculative
execution. Both are optimizations that execute multiple
instructions in parallel to use instruction cycles efficiently.
An unprivileged attacker can leverage such hardware features to
bypass the user-kernel privilege boundary and access kernel
memory [19], [24].
For software, we discuss OS kernel features that handle local
and network requests. A system call is a programmatic feature
through which a user application requests a service from the
kernel. By invoking a system call handler, a user indirectly
accesses kernel memory. A network I/O mechanism is also a
programmatic feature; it allows the OS to serve requests from
the network. In particular, the network interface card (NIC)
raises a hardware interrupt to notify the kernel of each
network packet it receives. Within the interrupt handler, the
kernel accesses kernel memory. Thus, a remote user can invoke
this feature to indirectly access kernel memory.
As a result, an attacker might build an exploitable
communication path to a target kernel address by abusing the
above features.
Mitigation: We might detect both TeleHammer and PeriHammer
using performance counters [1]. However, such anomaly-based
detection is by nature prone to false positives and/or false
negatives [6].
Alternatively, we might adopt hardware defenses such as
PARA [18], TRR [28], [17] and TWiCe [22] to increase the
DRAM refresh rate for specified rows, which would reduce
Tmax in Definition 5 (see Section III-B) as much as possible
so as to break the last time condition in the definition.
Unfortunately, these defenses require new hardware designs and
thus cannot be used to protect legacy systems.
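As a rough illustration of why a probabilistic defense such as PARA [18] interferes with hammering, the sketch below assumes the commonly described scheme in which each row activation triggers a refresh of the neighboring rows with a small probability p. The function name and the values of p and of the activation count are hypothetical and are not measurements.

```python
def survival_probability(p, activations):
    """Chance that none of `activations` row activations triggers a neighbor refresh."""
    return (1.0 - p) ** activations


# Hypothetical parameters: p = 0.001 and 100,000 activations needed for a flip.
chance = survival_probability(0.001, 100_000)
print(chance < 1e-40)  # True
```

Under these assumed parameters, an uninterrupted hammering run is practically impossible, since the victim row is almost certainly refreshed before enough activations accumulate.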
For PThammer, we might cache PTEs in an isolated cache
to eliminate the communication path identified by PThammer.
Since PTEs would be placed in a separate cache, PThammer could
no longer use the cache-eviction approach to evict PTEs.
However, reserving an isolated cache only for page-table pages
is expensive and requires re-designing the hardware.
Even if CPU manufacturers released such an isolated cache for
PTEs, there might exist other communication paths for PThammer
to hammer PTEs, or other instances of TeleHammer that hammer
other critical structures in the kernel space. In summary, we
believe that TeleHammer-based rowhammer attacks are hard to
mitigate.
VI. CONCLUSION
In this paper, we first observed a critical condition required
by existing rowhammer exploits to gain privilege escalation or
steal private data. We then proposed a new class of rowhammer
attacks, called TeleHammer, that crosses privilege boundaries
and thus eschews the condition. In addition, we presented a
formal model that defines the key conditions to set up
TeleHammer and PeriHammer, and summarized three advantages of
TeleHammer over PeriHammer.
On top of that, we created an instance of TeleHammer, called
PThammer, that can cross the user-kernel boundary and induce
bit flips in Level-1 page-table entries. Our experimental
results on three test machines showed that the first
cross-boundary bit flip occurred within 15 minutes of
double-sided PThammering. Furthermore, we developed
PThammer-based attacks that allow an unprivileged attacker to
compromise state-of-the-art software-only defenses in the
default system setting.
REFERENCES
[1] Zelalem Birhanu Aweke, Salessawi Ferede Yitbarek, Rui Qiao, Reetu-
parna Das, Matthew Hicks, Yossi Oren, and Todd Austin. Anvil:
Software-based protection against next-generation rowhammer attacks.
ACM SIGPLAN Notices, 51(4):743–755, 2016.
[2] Thomas W Barr, Alan L Cox, and Scott Rixner. Translation caching:
skip, don’t walk (the page table). ACM SIGARCH Computer Architecture
News, pages 48–59, 2010.
[3] Sarani Bhattacharya and Debdeep Mukhopadhyay. Curious case of
rowhammer: flipping secret exponent bits using timing analysis. In
International Conference on Cryptographic Hardware and Embedded
Systems, pages 602–624. Springer, 2016.
[4] Carsten Bock, Ferdinand Brasser, David Gens, Christopher Liebchen,
and Ahmad-Reza Sadeghi. Rip-rh: Preventing rowhammer-based
inter-process attacks. In Proceedings of the 2019 ACM Asia Conference on
Computer and Communications Security, pages 561–572. ACM, 2019.
[5] Erik Bosman, Kaveh Razavi, Herbert Bos, and Cristiano Giuffrida.
Dedup est machina: Memory deduplication as an advanced exploitation
vector. In Security and Privacy, 2016 IEEE Symposium on, pages 987–
1004. IEEE, 2016.
[6] Ferdinand Brasser, Lucas Davi, David Gens, Christopher Liebchen, and
Ahmad-Reza Sadeghi. Can’t touch this: Software-only mitigation against
rowhammer attacks targeting kernel memory. In USENIX Security
Symposium, 2017.
[7] Yueqiang Cheng, Zhi Zhang, Surya Nepal, and Zhi Wang. Cattmew:
Defeating software-only physical kernel isolation. IEEE Transactions
on Dependable and Secure Computing, 2019.
[8] Pietro Frigo, Cristiano Giuffrida, Herbert Bos, and Kaveh Razavi. Grand
pwning unit: accelerating microarchitectural attacks with the gpu. In
Security and Privacy, 2018 IEEE Symposium on. IEEE, 2018.
[9] Daniel Genkin, Lev Pachmanov, Eran Tromer, and Yuval Yarom. Drive-
by key-extraction cache attacks from portable code. In International
Conference on Applied Cryptography and Network Security, pages 83–
102. Springer, 2018.
[10] Ben Gras, Kaveh Razavi, Herbert Bos, and Cristiano Giuffrida.
Translation leak-aside buffer: Defeating cache side-channel protections
with TLB attacks. In 27th USENIX Security Symposium (USENIX
Security 18), pages 955–972, 2018.
[11] Daniel Gruss, Moritz Lipp, Michael Schwarz, Daniel Genkin, Jonas
Juffinger, Sioli O’Connell, Wolfgang Schoechl, and Yuval Yarom.
Another flip in the wall of rowhammer defenses. arXiv preprint
arXiv:1710.00551, 2017.
[12] Daniel Gruss, Clémentine Maurice, and Stefan Mangard.
Rowhammer.js: A remote software-induced fault attack in javascript.
In Detection of Intrusions and Malware, and Vulnerability Assessment,
pages 300–321. Springer, 2016.
[13] Daniel Gruss, Clémentine Maurice, and Stefan Mangard. Program for
testing for the dram rowhammer problem using eviction.
https://github.com/IAIK/rowhammerjs, May 2017.
[14] Ralf Hund, Carsten Willems, and Thorsten Holz. Practical timing side
channel attacks against kernel space aslr. In 2013 IEEE Symposium on
Security and Privacy, pages 191–205. IEEE, 2013.
[15] Intel, Inc. Intel 64 and IA-32 architectures optimization reference
manual. September 2014.
[16] Gorka Irazoqui, Thomas Eisenbarth, and Berk Sunar. Systematic reverse
engineering of cache slice selection in intel processors. In 2015
Euromicro Conference on Digital System Design, pages 629–636. IEEE,
2015.
[17] JEDEC Solid State Technology Association. Low power double
data rate 4 (lpddr4).
https://www.jedec.org/standards-documents/docs/jesd209-4b, 2015.
[18] Yoongu Kim, Ross Daly, Jeremie Kim, Chris Fallin, Ji Hye Lee,
Donghyuk Lee, Chris Wilkerson, Konrad Lai, and Onur Mutlu. Flipping
bits in memory without accessing them: An experimental study of dram
disturbance errors. In ACM SIGARCH Computer Architecture News,
volume 42, pages 361–372. IEEE Press, 2014.
[19] Paul Kocher, Daniel Genkin, Daniel Gruss, Werner Haas, Mike Ham-
burg, Moritz Lipp, Stefan Mangard, Thomas Prescher, Michael Schwarz,
and Yuval Yarom. Spectre attacks: Exploiting speculative execution.
arXiv preprint arXiv:1801.01203, 2018.
[20] Radhesh Krishnan Konoth, Marco Oliverio, Andrei Tatar, Dennis
Andriesse, Herbert Bos, Cristiano Giuffrida, and Kaveh Razavi. Zebram:
comprehensive and compatible software protection against rowhammer
attacks. In 13th USENIX Symposium on Operating Systems Design
and Implementation (OSDI 18), pages 697–710, 2018.
[21] Andrew Kwong, Daniel Genkin, Daniel Gruss, and Yuval Yarom.
Rambleed: Reading bits in memory without accessing them. In 41st
IEEE Symposium on Security and Privacy (S&P), 2020.
[22] Eojin Lee, Ingab Kang, Sukhan Lee, G Edward Suh, and Jung Ho
Ahn. Twice: preventing row-hammering by exploiting time window
counters. In Proceedings of the 46th International Symposium on
Computer Architecture, pages 385–396. ACM, 2019.
[23] Moritz Lipp, Misiker Tadesse Aga, Michael Schwarz, Daniel Gruss,
Clémentine Maurice, Lukas Raab, and Lukas Lamster. Nethammer:
Inducing rowhammer faults through network requests. arXiv preprint
arXiv:1805.04956, 2018.
[24] Moritz Lipp, Michael Schwarz, Daniel Gruss, Thomas Prescher, Werner
Haas, Stefan Mangard, Paul Kocher, Daniel Genkin, Yuval Yarom, and
Mike Hamburg. Meltdown. arXiv preprint arXiv:1801.01207, 2018.
[25] Fangfei Liu, Yuval Yarom, Qian Ge, Gernot Heiser, and Ruby B Lee.
Last-level cache side-channel attacks are practical. In 2015 IEEE
Symposium on Security and Privacy, pages 605–622. IEEE, 2015.
[26] Clémentine Maurice, Nicolas Scouarnec, Christoph Neumann, Olivier
Heen, and Aurélien Francillon. Reverse engineering intel last-level cache
complex addressing using performance counters. In Proceedings of the
18th International Symposium on Research in Attacks, Intrusions, and
Defenses, RAID 2015, pages 48–65, 2015.
[27] Clémentine Maurice, Manuel Weber, Michael Schwarz, Lukas Giner,
Daniel Gruss, Carlo Alberto Boano, Stefan Mangard, and Kay Römer.
Hello from the other side: Ssh over robust cache covert channels in the
cloud. In NDSS, pages 8–11, 2017.
[28] Micron, Inc. Ddr4 sdram mt40a2g4, mt40a1g8, mt40a512m16 data
sheet. https://www.micron.com/products/dram/ddr4-sdram/, 2015.
[29] Thomas Moscibroda and Onur Mutlu. Memory performance attacks:
Denial of memory service in multi-core systems. In USENIX Security
Symposium, 2007.
[30] Yossef Oren, Vasileios P Kemerlis, Simha Sethumadhavan, and Ange-
los D Keromytis. The spy in the sandbox: Practical cache attacks in
javascript and their implications. In Proceedings of the 22nd ACM
SIGSAC Conference on Computer and Communications Security, pages
1406–1418. ACM, 2015.
[31] Peter Pessl, Daniel Gruss, Clémentine Maurice, Michael Schwarz, and
Stefan Mangard. Drama: Exploiting dram addressing for cross-cpu
attacks. In USENIX Security Symposium, pages 565–581, 2016.
[32] Rui Qiao and Mark Seaborn. A new approach for rowhammer attacks. In
Hardware Oriented Security and Trust (HOST), 2016 IEEE International
Symposium on, pages 161–166. IEEE, 2016.
[33] Kaveh Razavi, Ben Gras, Erik Bosman, Bart Preneel, Cristiano Giuf-
frida, and Herbert Bos. Flip feng shui: Hammering a needle in the
software stack. In USENIX Security Symposium, pages 1–18, 2016.
[34] Mark Seaborn and Thomas Dullien. Exploiting the dram rowhammer
bug to gain kernel privileges. In Black Hat’15, 2015.
[35] Andrei Tatar, Radhesh Krishnan Konoth, Elias Athanasopoulos, Cris-
tiano Giuffrida, Herbert Bos, and Kaveh Razavi. Throwhammer:
Rowhammer attacks over the network and defenses. In 2018 USENIX
Annual Technical Conference, 2018.
[36] Victor van der Veen, Yanick Fratantonio, Martina Lindorfer, Daniel
Gruss, Clémentine Maurice, Giovanni Vigna, Herbert Bos, Kaveh
Razavi, and Cristiano Giuffrida. Drammer: Deterministic rowhammer
attacks on mobile platforms. In Proceedings of the 2016 ACM SIGSAC
Conference on Computer and Communications Security, pages 1675–
1689. ACM, 2016.
[37] Xin-Chuan Wu, Timothy Sherwood, Frederic T. Chong, and Yanjing
Li. Protecting page tables from rowhammer attacks using monotonic
pointers in dram true-cells. In Proceedings of the Twenty-Fourth
International Conference on Architectural Support for Programming
Languages and Operating Systems, ASPLOS ’19, pages 645–657, 2019.
[38] Yuan Xiao, Xiaokuan Zhang, Yinqian Zhang, and Radu Teodorescu. One
bit flips, one cloud flops: Cross-vm row hammer attacks and privilege
escalation. In USENIX Security Symposium, pages 19–35, 2016.
APPENDIX
In each system setting, we conduct double-sided PThammering
for 50 rounds on each machine and measure the time that each
round takes, as shown in Figure 6.
As Figure 6a shows for the regularpage setting, most of the
time costs per double-sided PThammering (no less than 96%) on
both Lenovo machines fall in the range of 600 to 900 cycles
(all are below 1000 cycles), while those on the Dell machine
fall in the range of 900 to 1400 cycles.
In Figure 6b for the superpage setting, most of the time costs
per double-sided PThammering (no less than 94%) on both Lenovo
machines fall in the range of 400 to 900 cycles (all are below
1100 cycles), while the Dell machine has the same range of 900
to 1400 cycles.
Given the maximum time cost that allows bit flips in
Figure 5 of Section IV-A5, we conclude that PThammer is
efficient enough to induce bit flips.
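To put the cycle counts in perspective, the following sketch converts a per-hammer time cost into the number of double-sided hammers that fit into one DRAM refresh window. The 2.6 GHz clock frequency and the 64 ms refresh window are assumptions chosen for illustration, and the function name is hypothetical.

```python
CPU_HZ = 2_600_000_000  # assumed clock frequency (hypothetical)
REFRESH_WINDOW_MS = 64  # assumed DRAM refresh window


def hammers_per_window(cycles_per_hammer, cpu_hz=CPU_HZ, window_ms=REFRESH_WINDOW_MS):
    """Number of complete hammers that fit into one refresh window."""
    cycles_in_window = cpu_hz * window_ms // 1000
    return cycles_in_window // cycles_per_hammer


# Upper ends of the ranges reported for the Lenovo and Dell machines.
for cost in (900, 1400):
    print(cost, hammers_per_window(cost))
```

A lower time cost per hammer yields more hammers per window, which is why the per-hammer cost has to stay below the maximum shown in Figure 5.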
[Figure 6: two panels plotting Time (Cycles) Per Hammer against Hammer Number (#), from 0 to 50, for the Lenovo T420, Lenovo X230, and Dell E5420.]
(a) Double-sided PThammering in the default regularpage setting.
(b) Double-sided PThammering in the superpage setting.
Fig. 6: In both system settings, the time-cost range on each machine is well below the maximum time cost (see Figure 5) that
allows bit flips.
