Using Name Confusion to Enhance Security by Ziad, Mohamed Tarek Ibn et al.
Using Name Confusion to Enhance Security
M. Tarek Ibn Ziad
Columbia University
mtarek@cs.columbia.edu
Miguel A. Arroyo
Columbia University
miguel@cs.columbia.edu
Evgeny Manzhosov
Columbia University
evgeny@cs.columbia.edu
Vasileios P. Kemerlis
Brown University
vpk@cs.brown.edu
Simha Sethumadhavan
Columbia University
simha@cs.columbia.edu
Abstract—We introduce a novel concept, called Name Confu-
sion, and demonstrate how it can be employed to thwart multiple
classes of code-reuse attacks. By building upon Name Confusion,
we derive Phantom Name System (PNS): a security protocol
that provides multiple names (addresses) to program instructions.
Unlike the conventional model of virtual memory with a one-to-
one mapping between instructions and virtual memory addresses,
PNS creates N mappings for the same instruction, and randomly
switches between them at runtime. PNS achieves fast randomiza-
tion, at the granularity of basic blocks, which mitigates a class
of attacks known as (just-in-time) code-reuse.
If an attacker uses a memory safety-related vulnerability to
cause any of the instruction addresses to be different from the
one chosen during a fetch, the exploited program will crash. We
quantitatively evaluate how PNS mitigates real-world code-reuse
attacks by reducing the success probability of typical exploits
to approximately 10−12. We implement PNS and validate it by
running SPEC CPU2017 benchmark suite. We further verify
its practicality by adding it to a RISC-V core on an FPGA.
Lastly, PNS is mainly designed for resource constrained (wimpy)
devices and has negligible performance overhead, compared to
commercially-available, state-of-the-art, hardware-based protec-
tions.
I. INTRODUCTION
Virtual memory addresses serve as references, or names,
to objects (i.e., instructions, data) during computation. For
instance, every instruction in a program is uniquely identified
(at run time) with a virtual memory address: the value in the
Program Counter (PC). Typically, the virtual memory address
assigned to an instruction is kept constant and unique for the
life time of the program. In this work, we show that having
multiple names for an instruction—at any given time instant—
improves the security of the system with minimal hardware
support without performance degradation.
How can having multiple names improve security? Given
multiple names for an instruction, we define a security protocol
that specifies a random sequence of names to be used during
execution. If the attacker does not follow the security protocol
by supplying an incorrect name, the exploited program will
crash. In other words, if there are N addresses (names) per
instruction, and if the attacker has to reuse P instruction
sequences to complete an attack, the probability of detecting
the attack is 1− (1/N)P, without any false positives. For
example, for N = 256 and P = 5, then the probability of an
attack succeeding is 1 in 1 trillion. This kind of protection
makes this technique suitable to be used as a standalone
solution, or in tandem with other, heavier-weight hardening
mechanisms. We refer to such classes of architectures as Name
Confusion Architectures.
Name confusion is fundamentally different from other
hardening paradigms. For example, in the information-hiding
paradigm [31], the program addresses (or parts of them) are
kept a secret, but there is only one name per instruction. Sim-
ilarly, Instruction Set Randomization (ISR) techniques [36],
[51], [59] randomize the encoding of instructions in memory,
while also maintaining a unique instruction name per program
execution. In the metadata-based paradigm, such as Control-
Flow Integrity (CFI) [1], [12], the set of targets (names) that
can result from the execution of certain instructions (i.e., in-
direct branches) are computed statically and checked during
execution. In moving target paradigms, such as Shuffler [72]
and Morpheus [28], the names of instructions change over
time; however, at any given time, there is only one valid
name/address for an instruction.
In this work, we explore an application of a name
confusion-based architecture, and show how it is used to
mitigate a class of attacks known as code-reuse (ROP [11],
[15], [57], JOP [6], COP [30]), including their just-in-time
variants [61]. The instance we consider, called Phantom Name
System (PNS), provides up to N different names, for any
instruction, at any given time, where N is a configurable
parameter (it is set to 256 in our design). The security protocol
for PNS is simply a truly random selection among the different
names. Specifically, PNS works as follows: during instruction
fetch, the address used to fetch the instruction is randomly
chosen from one of the N possible names for the instruction,
and the instruction is retrieved from that address. From that
point on, any PC-relative addresses used by the program relies
on the name obtained during fetch. If the attacker’s strategy
causes any of the PC-relative addresses to be different from the
one used during the fetch, then an invalid instruction will be
executed, leading to unexpected effects, such as an alignment,
or instruction-decoding, exception. These unexpected effects
lead to program crashes that can work as signals of bad
actions taking place especially in the case of repeated crashes.
Orthogonal mechanisms that turn these signals into a defensive
advantage exist [45].
A naive implementation of PNS would require each in-
struction to be stored in N locations so they will have N
names. Consequently, the capacity of all PC-indexed mi-
croarchitectural structures would be divided by N, heavily
impacting performance. Further, this requires changes to the
ar
X
iv
:1
91
1.
02
03
8v
3 
 [c
s.C
R]
  2
6 A
ug
 20
20
M. Tarek Ibn Ziad, M. Arroyo, E. Manzhosov, V. P. Kemerlis, and S. Sethumadhavan
compiler, linker, loader, etc. In this work, we use a simple
technique to avoid these problems: we intentionally alias the
different instruction names/addresses so they point to the same
instruction, allowing us to serve the N instructions from one
copy. This idea is similar to how multiple virtual addresses
can point to the same physical addresses (used to implement
copy-on-write [9]) with two key differences: first, in PNS the N
names correspond to the same virtual address, not a physical
address; and second, the PNS addresses do not need to be
page-aligned as required for data synonyms—i.e., PNS virtual
address names can be arbitrarily offset. The first difference
ensures that PNS can be handled at the application level
without requiring significant changes to the operating system
(OS), which manages the virtual-to-physical address mappings,
while the second is key to providing security.
With the above optimizations, we show that ROP/JOP/-
COP attack protection is provided at almost no performance
overhead and without any binary changes. We further show,
that our scheme can be combined with previously known tech-
niques [18], [43], [47], [66] that encrypt instruction addresses
stored in the heap or the global data section(s), viz., function
pointers, to provide robust security against even larger class of
attacks, such as COOP [56]. The combined protection scheme
has 6% performance overhead, making it better than state-of-
the-art commodity security solutions, like the ARM pointer
authentication code (PAC) [53] that is available in the latest
iPhone devices, and has the additional benefit of not requiring
a 64-bit architecture. Supporting non-64-bit architectures is
important as they make up the majority of the computing
devices that exist nowadays: in 2018, 11.75 million servers
shipped worldwide [63] vs. 28.1 billion 32-bit (or smaller)
microcontrollers [64].
II. NAME CONFUSION ARCHITECTURE
A name confusion architecture assigns different addresses,
or names, to any contiguous group of instructions randomly at
runtime. In this section, we introduce PNS, a security protocol
derived from the principles of name confusion architectures.
PNS consists of N phantoms (domains). It requires every
instruction in the program to have N unique names. To assign
the names, we use a mapping function, namep = f (va, p),
which takes the instruction virtual address, va, and a phantom
index, p as inputs and returns the phantom name, namep. This
way any instruction is mapped by f to unique location in each
of the N phantoms. The function f does not have to be kept a
secret, as security is purely derived from the random selection
of p at fetch time. For mapping a phantom name to its original
virtual address, we use the inverse function, va= f−1(namep).
To enable the inverse function we ensure that the phantom
name encodes the phantom index, p as part of the phantom
address.
A. PNS Framework
There are four main operations to realize PNS: Populate,
Randomize, Resolve, and Conceal.
Populate. PNS creates multiple phantoms of basic blocks, and
populates them in the phantom name space. The left-hand side
of Figure 1 shows a program with two Phantoms, such that
every basic block (BBL) has two different names (addresses)
Phantom Name Space
Virtual 
Address 
Space
BBL A
BBL B
BBL C
BBL A
BBL B
BBL C
BBL A
BBL C
Security 
Shift
(δ)
. .
 . 
BBL B
Phantom1
Phantom0
Phantom 
Offset
(Δ)
Physical 
Address 
Space
BBL A
BBL B
BBL C
Fig. 1: Basic block mapping for PNS. BBLs are only dupli-
cated in the phantom space.
in Phantom0 (aka the original domain) and Phantom1. PNS
separates the two Phantoms by a phantom offset, ∆, in the
phantom space. To add discrepancy between the Phantom
copies, we introduce a minor security shift, δ , so that they
are not perfectly overlapped after removing ∆. This is shown
by the shaded basic block in Figure 1 and is necessary for
security, as will be illustrated in Section IV-C. The inverse
mapping function f−1 maps all phantoms to a single name in
the virtual address space, which is then translated to a physical
address by the OS.
Randomize. We modify the hardware to randomize program
execution between the Phantoms at runtime. For example,
some basic blocks will be executed from Original (Phantom0)
while other basic blocks will be executed from any other Phan-
tom. Correctness is unaltered because all Phantoms provide the
same functionality by construction.
Resolve. Accessing different instruction names at runtime
incurs additional performance overheads as each name needs
to be translated to a virtual address and then a physical one
before usage. To mitigate this problem, PNS uses the inverse
mapping function f−1 to resolve the different Phantoms to
their archetype basic block. By doing so, the processor back-
end continues to operate as if there is only one copy of the
program in the phantom name space.
Conceal. Normal programs push return addresses to the ar-
chitectural stack to help return from non-leaf function calls.
The attacker may learn the domain of execution, the Phantom
index, by monitoring the stack contents at runtime using arbi-
trary memory disclosure vulnerabilities [61]. Thus to preserve
name confusion, we need to conceal the execution domain of
the instructions.
B. PNS Construction
In this section, we discuss alternative design choices for
the different operations in the PNS framework.
Populate. Many approaches can be used to populate the
Phantoms. One approach is to use the most significant bits
(MSBs) to separate the program copies in the phantom space.
For example, a ∆ of 0x8000_0000_0000_0000 will create
two phantoms on 64-bit systems, where each phantom resides
in one half of the address space. This approach is acceptable
2
M. Tarek Ibn Ziad, M. Arroyo, E. Manzhosov, V. P. Kemerlis, and S. Sethumadhavan
for 64-bit systems because VA allows for 64 bits, yet only 48
are used in practice, leaving the higher order bits available for
phantom addresses. However, this is costly for 32-bit systems
as it will reduce the effective range of addresses a program can
use by half. Instead, to store the phantom index we add n addi-
tional bits to the hardware program counter, while maintaining
the 32-bit virtual address space of the program. This allows
PNS to generate N = 2n phantoms. Specifically, f , sets the
additional n bits at control-flow transitions to randomize the
execution at runtime. For simplicity, we set the phantom offset
as ∆ = 1 32 and the minor security shift of any phantom
to be a multiple of the phantom index (i.e., δ p = p×δ ). We
elaborate more on the PNS realization in Section III.
Randomize. PNS can randomize program addresses at any
level of granularity, ranging from individual instructions to
entire programs. In the rest of the paper, we use basic blocks as
our elements of interest. We do not evaluate finer granularities
here due to the lack of a strong security need. We define
the basic block as a single entry, single exit region of code.
Thus, any instruction that changes the PC register (referred
to by control-flow instructions, such as jmp, call, ret)
terminates a BBL and starts a new one.1
Conceal. We can prevent attackers from learning the execution
domain in a number of ways. One straightforward way is to
encrypt the return address with a secret key and only decrypt
it upon function return. Another key-less, and low overhead,
method that we implement is to split this information so
that the public part is what is common between the phantom
domains, and the private part that distinguishes the domains is
hidden away without architectural access.
We split the return addresses between the architectural
stack and a new hardware structure called the Secret
Domain Stack (SDS), which by construction is immutable
to external writes. SDS achieves this goal by splitting the return
address (32+n) bits into two parts; the n-bits, which represent
the phantom index (p), and the lower 32 bits of the address,
which encodes the security shift (δ ). With each function call
instruction, the lower 32 bits of the return address are pushed to
the architectural (software) stack, whereas the phantom index p
is pushed onto the SDS. A ret instruction pops the most
recent p from the top of SDS and concatenates it with the
return address stored on the architectural stack in memory.
While under attack, the return address on the architectural
stack will be corrupted by the attacker. However, the attacker
cannot access SDS so they cannot reliably adjust the malicious
return address to correctly encode δ , leading to an incorrect
target address after PNS merges the malicious return address
with the phantom index p from SDS. Deployment issues with
the SDS such as sizing, overflows, multithreading, etc. are
described in Section VIII.
C. PNS Correctness
Since the addresses are selected during a fetch, how can
we be assured that all PC relative computations are used in the
correct way as encoded in the original program? In order to
discuss the correctness of PNS construction and operation, we
consider the structured programming theorem [20], in which a
1Some compilers, such as LLVM, deviate from this definition and treat
call instructions as part of the BBL.
program is composed from any subset of the control structures:
Sequence, Selection, Iteration, and Recursion. We show that
PNS does not affect the four structures. For simplicity, let
us assume n = 1, so that only two phantoms exist, Original
or Phantom. First, the Sequence structure represents a series
of ordered statements or subroutines executed in sequence.
PNS guarantees this property by executing the statements
(instructions) of a BBL in the same domain of execution
(either Original or Phantom).
For handling the Selection structure, let us assume that a
program is represented by a binary tree with a branching factor
of 2. Hence, we define two types of such trees. Type I is the
tree where nodes are represented by the BBLs of committed
instructions. Type II is the tree where each node has the
address of the first instruction in the executed BBL. The edges
are given by the direction taken by the last instruction of each
BBL. The root of the tree is the first instruction fetched from
the _start() section of a binary (or the address of this
instruction in the address based tree). The leaf node is the last
BBL of the program (or the address of this BBL in trees of
Type II). In the case of PNS, every taken branch on the tree
of Type I is the same as every taken branch on the tree of
Type II, i.e., program functional decisions are not affected.
However, the contents of the tree nodes in Type II trees would
be different for each program execution, as each BBL will be
fetched from either Original or Phantom domain and addresses
of these domains differ by ∆. In other words, after the branch
is resolved to be taken or not taken, PNS operates on the
outcome and randomly chooses the domain of execution for
the next basic block.
The above argument also applies for the Iteration control-
structure, in which the same basic block is executed multiple
times. PNS does not change the functionality of the basic
block, however, each time the basic block is executed it will
be executed in one of the phantom domains. The Recursion
construct is similar to iterative loops and thus is guaranteed
to be executed correctly with PNS. The same proof holds
for n> 1 by making the branching factor of the tree equals 2n.
From the user perspective, the program will produce the same
result, since the order and flow of instructions has not changed
(Type I trees are unique). However, from the perspective
of an entity that observes only addresses, each execution of
a program appears to be a different sequence of addresses
since the addresses of individual basic blocks will be altered
(Type II trees are not unique). This enables a substantial
security benefit as we show next.
III. HARDWARE DESIGN
Figure 2 summarizes our modifications to support PNS.
The changes are limited to structures that operate on PC.
A. Selector
With PNS, each PC is extended by (additional) n-bits,
dubbed the phantom index (p). So, a program counter from
phantom p will have the following format:
PCp[31+n : 0] = {p[n−1 : 0],PC[31 : 0]} (1)
3
M. Tarek Ibn Ziad, M. Arroyo, E. Manzhosov, V. P. Kemerlis, and S. Sethumadhavan
ITLB
FETCH
L1 I-Cache
PC
Selector
D
E
C
O
D
E
R
DEC EXECUTE
 nextPC
B
D
B
R
A
S
Train
B
TB
ALU
ALU
Branch
Unit
W
R
I
T
E
B
A
C
K
WB
...
...
...
Fig. 2: Processor pipeline with PNS hardware.
The Selector (S) is responsible for adjusting the PC before
executing any new BBL so that the execution flow cannot be
predicted by the attacker. Specifically, the selector takes the
predicted target for a branch (PCnew) with control-flow signal s
as input: s is set to one if the Branch Predictor Unit (BPU) has
a predicted target for this instruction, or to zero otherwise. The
selector generates the nextPC as the output. If s equals one,
the selector generates an n-bit random phantom index pnext .2
Based on pnext , the selector adjusts the nextPC according to
Equation 2.
nextPC[31+n : 0] ={pnext [n−1 : 0],
PCnew− (pnext − pnew)×δ} (2)
Note that pnew is the phantom index of the predicted target
PCnew. For example, assuming n = 8-bits, we have 28 = 256
phantoms. If PCnew corresponds to the fifth phantom (i.e.,
pnew = 5) and the selector randomly chooses the eighth phan-
tom (i.e., pnext = 8), nextPC will equal {8,PCnew−3δ}. On
the other hand, if the selector randomly chooses the second
phantom (i.e., pnext = 2), nextPC will equal {2,PCnew+3δ}.
As the security shift δ is only used to break the overlapping
between the names in different phantoms, it can be arbitrarily
set to a single byte on CISC architectures or multiples of the
instruction size on RISC architectures.
Performance Optimization #1. The aforementioned selector
adds one cycle latency to the nextPC calculations in the fetch
stage. To alleviate this, we move the selector to the commit
stage—placing the selector at the commit stage allows us to
mask latency overheads needed for target address adjustments
so that it does not affect performance.
At the commit stage, the target of the branch instruction
is known and sent back to the fetch stage to update (train)
the BPU buffers. At this point, the selector will adjust the
target address by using pnext , as explained above and update
the BPU buffers with nextPC. This ensures that the next
execution for this control-flow instruction will be random and
unpredicted. To bootstrap the first execution of a control-flow
instruction, we consider the two possible cases: correct and
incorrect prediction. If the first occurrence of the control-
flow instruction is correctly predicted to be PC + 4 (falling
through), then the selector will keep using the current domain
of execution (unknown to the attacker) for the next BBL. If
the first occurrence of the control-flow instruction is incorrectly
predicted, it would be detected later on in the commit stage
2This can be implemented using n metastable flip-flops [37].
PC [31:0]p [n-1:0]
Incoming Extended PC
PC [31:0]
Virtual Address
Left Shift
by δ
+
Fig. 3: Mapping the extended PC (i.e., the phantom name) to
the virtual address before indexing into the microarchitectural
structures.
and the pipeline will be flushed. In this case, the selector will
adjust the resolved target address by using p (unknown to the
attacker) and update the BPU buffers with nextPC.
B. Branch Prediction Unit (BPU)
The branch prediction unit stores a record of previous target
addresses in the branch target buffer (BTB), and the recent
return addresses in the return address stack (RAS). For the cur-
rent PC value, the BPU checks if the corresponding entry exists
in the BTB by indexing with the PC. If it exists, the found
target address becomes the nextPC. Otherwise, nextPC is
incremented to PC + 4 (or PC + Instruction size).
If the predicted target address turns out to be incorrect later in
the instruction pipeline, the processor re-fetches the instruction
with the correct target address (available usually at the execute
stage of the branch instruction) and nullifies the instructions
fetched with the predicted target address.
Performance Optimization #2. PNS assigns N different
addresses for the same control-flow instruction. In this case,
we will have multiple entries in the prediction tables for the
same effective instruction; this reduces the capacity to 100N %.
To handle this issue, we map the incoming phantom address
to its original name before indexing into the BPU tables, as
shown in Figure 3. We do so by modifying the hashing function
of the BPU tables to avoid adding any latency to the lookup
operation. This way we guarantee that all phantom addresses
(names) map to the same table entry. After indexing, we get
the desired values from the prediction tables. As explained
in Section III-A, the nextPC values stored in the BTB are
already chosen at random from the last successful commit
of this control-flow instruction (or any of its phantoms). The
branch direction prediction results (Taken vs. Not Taken) in
the branch direction buffer (BDB) remain the same.
C. Translation Look-aside Buffer (TLB)
Performance Optimization #3. Similar to the BPU buffers,
the fact that we have N variants of every BBL with different
virtual addresses may lead to multiple different virtual-to-
physical address entries in the TLB for the same translation,
reducing its capacity to 100N %. To avoid potential perfor-
mance degradation, we map the incoming phantom address
to its original name before accessing the ITLB. For example,
the following two phantom addresses, {2, 0x00BB_FFF4}
and {0, 0x00BB_FFF8}, will point to the same virtual
address, 0x00BB_FFF8. This common virtual address has
a unique mapping to a physical address, 0x0011_DDFC,
that is stored in the ITLB. Thus, the translations related to
all Phantoms map to a single entry in the ITLB, while we
4
M. Tarek Ibn Ziad, M. Arroyo, E. Manzhosov, V. P. Kemerlis, and S. Sethumadhavan
do not modify physical addresses so that the stored physical
address part of the translation remains unaffected.
D. Instruction Cache
Performance Optimization #4. Creating N variants of the
code sections for each program means that the L1-I$ capacity
would be effectively reduced to 100N %. PNS maps the incoming
phantom address to its virtual address before accessing the
L1-I$ (in case of virtually-indexed caches) or performing the
tag comparison (in case of virtually-tagged caches).3 This
represents our simple inverse mapping function, f−1. The
latency of the adjustment operations (shifting and addition)
can be masked within the cache read operation. This incoming
address adjustment ensures that while executing a BBLPhantom
we fetch the correct instruction.
E. Execution Unit
Performance Optimization #5. If the target architecture
allows forwarding the PC register through the pipeline for
regular instructions, we make sure that the PC register is
always mapped to the virtual address before operating on
it. This mapping may introduce additional latency for the
execute stage as it should be done before/after it. To mask
such latencies, one solution is to always forward the two
versions, Phantomp and Original, of the PC register to the
desired execution units. Although such a solution completely
hides the adjustment latency, it may increase the execution
unit(s) area.
F. Secret Domain Stack
Performance Optimization #6. Unlike prior work, which
stores a complete version of the return addresses (e.g., 32-bit
on AARCH32) in what is called a shadow stack [14], we only
store n = 8 bits per return address. To minimize silicon area
within the processor and facilitate managing the SDS, as dis-
cussed in Section II-B, we do not need to store the full return
address. This structure does not introduce additional latency
as it is accessed in parallel to the normal architectural stack
access. We evaluate the optimal size of SDS in Section VI.
IV. CODE REUSE PROTECTION WITH PNS
In this section, we summarize code-reuse attacks and
defenses, and discuss how PNS is used to mitigate such attacks.
A. Background
Attacks that chain together gadgets whose last instruction
is a ret are known as return oriented programming (ROP)
attacks [11], [57]. To mount a ROP attack, the attacker has to
first analyze the code to identify the respective gadgets, which
are sequences of instructions in the victim program (including
any linked-in libraries) that end with a return. Second, the
attacker uses a memory corruption vulnerability to inject a
sequence of return addresses corresponding to a sequence of
gadgets. When the function returns, it returns to the location
of the first gadget. As that gadget terminates with a return,
the return address is that of the next gadget, and so on. As
3 No changes are needed for Physically-Indexed Physically-Tagged (PIPT)
caches.
ROP executes legitimate instructions belonging to the program,
it is not prevented by WˆX [23]. Note that variants of ROP
that use indirect jmp or call instructions, instead of ret,
to chain the execution of small instruction sequences together
also exist, dubbed jump-oriented programming (JOP) [6] and
call-oriented programming (COP) [30], respectively.
B. Currently Deployed Mitigations
The standard mitigation technique against ROP/JOP/COP,
and pretty much every Code Reuse Attack (CRA) variant, is
address space layout randomization (ASLR), which is cur-
rently a well-adopted defense, enabled on (pretty much) every
contemporary OS [70]. Essentially, ASLR forces the attacker
to first disclose the code layout (e.g., via a code pointer)
to determine the addresses of gadgets. Snow et al. [61] ob-
served that typical programs have multiple memory disclosure
vulnerabilities. They developed a just-in-time ROP (JIT-ROP)
compiler that explores the program’s memory, disassembling
any code it finds (in memory), as well as, searching for
API/system calls. Then, they construct a compatible code-reuse
payload on the fly. Note that, in principle, JIT-ROP is not
restricted to dynamically stitching together only ROP payloads;
it can also compile JOP, COP, or any other code-reuse payload.
Recently, ARM introduced PAC in Armv8.3A, which
is implemented in the Apple’s iPhone XS SoC [53]. The
idea is based on a concept known as cryptographic control-
flow integrity (CCFI) [47]. For every code pointer, such
as return addresses and function pointers, CCFI stores a
cryptographically-secure authentication code in the pointer’s
unused most significant bits. Checking the authentication code
of a pointer before any indirect branches prevents control-
flow hijacking because the attacker cannot compute a valid
authentication code without access to keys. As we will show
in Section VI, to achieve low overheads with this scheme, it is
essential to have 64-bit architecture and to apply the solution
to only a subset of the pointers: full-application of the idea
on a 32-bit processor results in 91% overhead for SPEC2017.
In contrast, we want to enable security for 16, 32- and 64-bit
systems, as non-64-bit systems are widely used in Internet-of-
Things and Cyber Physical Systems. Thus, there is a need for
new low overhead deployable solutions.
C. PNS for CRA Protection
PNS mitigates ROP by ensuring that the addresses of the
ROP gadgets in the gadget chain change after the chain is
built. This will result in undefined behavior of the payload
(likely leading to a program crash). Consider the example in
Figure 1: PNS simultaneously populates multiple (apparent)
phantoms of the program code in the phantom name space; to
successfully thwart the ROP gadget chain, the location of the
ROP gadgets in all phantoms should be different [21].
Traditional in-place randomization techniques [22], [52]
can be used to generate Phantoms. However, using an ag-
gressive randomization approach will complicate the inverse
mapping function, f−1, which is responsible for recovering
the archetype basic block from the different Phantoms. This
will cause performance overheads with almost no additional
security (beyond changing the gadget addresses in the phantom
copies). PNS adopts a more efficient code layout randomiza-
tion technique by introducing a security shift, δ , between the
5
M. Tarek Ibn Ziad, M. Arroyo, E. Manzhosov, V. P. Kemerlis, and S. Sethumadhavan
Regular Program
Execution
①
②
③
Inst 20 (Add)
Inst 21 (Sub)
Inst 22 (Mul)
Inst 23 (Add)
Inst 24 (Jump)
Inst 72 (Sub)
Inst 73 (Ret)
...
...
Program Execution
with Code Reuse Attack
①
②
③
④
Program Execution with PNS
①
③
Program Execution with PNS
and Code Reuse Attack
Inst 20 (Add)
Inst 21 (Sub)
Inst 22 (Mul)
Inst 24 (Jump)
Inst 71 (Add)
Inst 72 (Sub)
Inst 73 (Ret)
Inst 7 (Add)
Inst 8 (Sub)
Inst 10 (Call)
Inst 11 (Add)
Inst 12 (Jump)
...
Inst 23 (Add)
❸
(a) (b) (c) (d)
Inst 9 (mov)
...
Inst 7 (Add)
Inst 8 (Sub)
Inst 10 (Call)
Inst 11 (Add)
Inst 12 (Jump)
Inst 9 (mov)
Inst 20 (Add)
Inst 21 (Sub)
Inst 22 (Mul)
Inst 24 (Jump)
Inst 72 (Sub)
Inst 73 (Ret)
Inst 23 (Add)
...
Inst 8 (Sub)
Inst 10 (Call)
Inst 11 (Add)
Inst 12 (Jump)
Inst 9 (mov)
Inst 71 (Add) Inst 71 (Add)
... Inst 20 (Add)
Inst 21 (Sub)
Inst 22 (Mul)
Inst 24 (Jump)
Inst 72 (Sub)
Inst 73 (Ret)
Inst 23 (Add)
Inst 8 (Sub)
Inst 10 (Call)
Inst 11 (Add)
Inst 12 (Jump)
Inst 9 (mov)
Inst 71 (Add)
...
Inst 7 (Add)
Inst 7 (Add)δ
...
②
①
③
Inst 20 (Add)
Inst 21 (Sub)
Inst 22 (Mul)
Inst 24 (Jump)
Inst 72 (Sub)
Inst 73 (Ret)
Inst 23 (Add)
...
Inst 8 (Sub)
Inst 10 (Call)
Inst 11 (Add)
Inst 12 (Jump)
Inst 9 (mov)
Inst 71 (Add)
... Inst 20 (Add)
Inst 21 (Sub)
Inst 22 (Mul)
Inst 24 (Jump)
Inst 72 (Sub)
Inst 73 (Ret)
Inst 23 (Add)
Inst 8 (Sub)
Inst 10 (Call)
Inst 11 (Add)
Inst 12 (Jump)
Inst 9 (mov)
Inst 71 (Add)
...
Inst 7 (Add)
Inst 7 (Add)δ
...
②
WRONG 
Phantom0 Phantom1 Phantom0 Phantom1
Fig. 4: PNS effect on CRAs: (a) shows the regular program execution; (b) shows successful CRA via ROP; (c) shows regular
program execution with PNS; (d) shows ROP failure with PNS due to missing the desired gadget.
individual Phantoms, so that they are not perfectly overlapped
after removing the phantom offset, ∆. This simplifies f−1
computations (as shown in Figure 3 and maintains code
locality.
While the program is executing, PNS randomly de-
cides which copy of the program should be executed next.
Figure 4(a) shows the normal execution of a program,
where Inst 10 changes the control-flow of the program to
a different BBL (starting with Inst 71). After the called
BBL is executed, the control-flow is transmitted to the original
landing point (Inst 11 via a ret instruction). Figure 4(b)
shows a successful CRA via ROP, in which the attacker uses
a memory safety vulnerability to overwrite the return address
stored on the stack and divert the control flow to Inst 24
upon executing the ret instruction. Figure 4(c) shows the
diversified execution of a program with PNS. For simplicity,
we only show two phantoms and use a security shift, δ ,
sized to one instruction. Each control flow instruction can
arbitrary choose to change the execution domain or not. Here,
the Randomize operation decides to execute Inst 71 from
the Phantom domain. As the attacker cannot predict this
runtime decision in advance, they provide the wrong gadget
address on the stack (now shifted by δ ). Thus, they will end-
up executing a WRONG instruction, as shown in Figure 4(d).
This WRONG instruction may belong to a different BBL or
divert the execution to a new undesired BBL. In general, if
the attacker makes the wrong guess, they will execute one less
(or one more) instruction compared to the desired gadget. If δ
is smaller than the instruction size, the attacker will skip a
portion of the instruction resulting in an incorrect instruction
decoding.
V. PNS SECURITY EXTENSIONS
In this section, we discuss two PNS enhancements that
boost its security guarantees against state-of-the-art CRAs.
A. TRAP Instructions
To further limit the attack surface of PNS, we add TRAP
instructions. These instructions are inserted at the beginning of
PhantomN
4: TRAP
6: Inst B2 (Jmp)
5: Inst B1 (Sub)
δ 0: TRAP
6: Inst B2 (Jmp)
5: Inst B1 (Sub)
Phantom0 Phantom1
0: TRAP
1: Inst A1 (Mov)
2: Inst A2 (Add)
3: Inst A3 (Jmp)
1: Inst A1 (Mov)
2: Inst A2 (Add)
4: TRAP
3: Inst A3 (Jmp)
0: TRAP
6: Inst B2 (Jmp)
5: Inst B1 (Sub)
1: Inst A1 (Mov)
2: Inst A2 (Add)
4: TRAP
3: Inst A3 (Jmp)
δ
...
Fig. 5: PNS with TRAP instructions.
every basic block. While PNS is enabled, the security shift, δ ,
will cause the TRAP instruction that exists in the beginning of
a BBL in Original domain to appear at different locations of
the same BBL in each of the Phantom domains, as shown in
Figure 5. This way we provide the ability to catch attackers
that guess the incorrect diversification on the TRAP instruction
while targeting BBL boundaries.
Let us consider the example of Figure 4(d). Here, the at-
tacker tries to divert the control flow to Inst 24 upon execut-
ing the ret instruction. Our Randomize operation decides
to execute Inst 71 from the Phantom domain. As a result,
the attacker will step on the instruction that follows Inst
24 in Phantom, which will be a TRAP instruction in the
new model. Executing a TRAP instruction results in a security
exception and program termination, effectively thwarting the
attacker. Programs never execute TRAP instructions in normal
conditions (no attack) as there exist no control-flow transfers
to them.4
B. Lightweight Pointer Encryption
Besides ROP, CRA variants also extensively rely on pointer
corruption (e.g, JOP/COOP [6], [56]) to subvert a program’s
intended control flow. There also exist many software-based
mitigations for JOP/COOP-like attacks [8], [13], [44], [74]. In
4We modify the hardware to handle the case of a fall-through BBL to
prevent legitimately stepping on a TRAP instruction.
6
M. Tarek Ibn Ziad, M. Arroyo, E. Manzhosov, V. P. Kemerlis, and S. Sethumadhavan
this paper, we use a hardware-based technique for hardening
PNS against them. Since the attacker needs to overwrite
legitimate pointers used by indirect branches to launch the
attack, we encrypt the contents of the pointer upon creation
and only decrypt it upon usage (at a call site). Consequently,
attackers cannot correctly overwrite it.
To achieve the above goal our Lightweight Pointer En-
cryption (PtrEnc) scheme adds two new instructions: ENCP
and DECP. The two instructions can either be emitted by the
compiler (if re-compiling the program is possible) or inserted
by a binary rewriter.
Encrypt Pointer (ENCP RegX). The mnemonic ENCP indi-
cates an encryption instruction. RegX is the register containing
the pointer, e.g., virtual function pointers. The register that
holds the encryption key is hardware-based and never appears
in the program binary.
Decrypt Pointer (DECP RegX). The mnemonic DECP indi-
cates a decryption instruction. RegX is the register containing
the pointer. The register that holds the decryption key is
hardware-based and does not appear in the program binary.
As a result, the attacker cannot directly leak the key’s value.
Moreover, the attacker cannot simply use the new instructions
as signing gadgets to encrypt/decrypt arbitrary pointers as they
will have to hijack the control flow of the program first.
Unlike prior pointer encryption solutions, which use weak
XOR-based encryption [18], [66], PNS relies on strong cryp-
tography (The QARMA Block Cipher Family [2]). In contrast
to full CCFI solutions [47], [53], which use pointer authentica-
tion to protect all code pointers including return addresses, our
approach only guards pointer usages (loads and stores). Return
addresses are handled by PNS randomization, reducing the
overall performance overheads, as will be shown in Section VI.
VI. PERFORMANCE EVALUATION
In this section, we compare PNS against prior solutions,
and devise experiments to analyze the benefits of each aspect
of the PNS scheme.
As we focus on resource constrained (wimpy) devices, we
use ARM ISA to demonstrate PNS as it dominates the embed-
ded and mobile markets with its 32-bit ARMv5-8 instruction
set architecture (ISA). However, the concept of PNS can be
applied to any other ISA (e.g., RISC-V).
A. Experimental Setup
We implement PNS in the out-of-order (OoO) CPU model
of Gem5 [4] for the ARM architecture. We execute ARM32
binaries from the SPEC CPU2017 [10] C/C++ benchmark suite
on the modified simulator in syscall emulation mode with the
ex5_big configuration (see Table I), which is based on the
ARM Cortex-A15 32-bit processor.
To compile the benchmarks, we build a complete toolchain
based on a modified Clang/LLVM v7.0.0 compiler including
musl [48], compiler-rt, libunwind, libcxxabi, and
libcxx. Using a full toolchain allows us to instrument all
binary code including shared libraries and remove them from
the trusted code base (TCB). In order to evaluate PNS, we use
our modified toolchain to generate the following variants.
Core ARMv7a OoO core at 1.8 GHz
BPred: BiModeBP, 4096-entry BTB, 48-entry RAS
Fetch: 3 wide, 48-entry IQ
Issue: 8 wide, 60-entry ROB
Writeback: 8 wide, 16-entry LQ, 16-entry SQ
L1 I-cache 32KB, 2-way, 2 cycles, 64B blocks, LRU replacement,
2 MSHRs, no prefetch
L1 D-cache 32KB, 2-way, 2 cycles, 64B blocks, LRU replacement,
16-entry write buffer, 6 MSHRs, no prefetch
L2 cache 2MB, 16-way, 15 cycles, 64B blocks, LRU replacement,
8-entry write buffer, 16 MSHRs, stride prefetch
DRAM LPDDR3, 1600 MHz, 1GB, 15ns CAS latency and row
precharge, 42ns RAS latency
TABLE I: Simulation parameters.
Baseline. This is the case of an unmodified unprotected
machine. Specifically, we compile and run the SPEC CPU2017
benchmarks using an unmodified version of the toolchain and
Gem5 simulator. In all of our experiments, we use the total
number of cycles (numCycles) to complete the program, as re-
ported by Gem5, to report performance. The numCycles values
of the defenses are normalized to this baseline implementation
without defenses; thus, a normalized value greater than one
indicates higher performance overheads.
PNS. In this scenario, we run unmodified binaries on our
modified Gem5 implementation with all optimizations, as
described in Section III.
PNS-TRAP. In order to prepare PNS-TRAP binaries, we imple-
ment an LLVM backend pass to insert TRAPs at the beginning
of BBLs. This step can also be achieved with an appropriate
binary rewriter making it compatible with legacy binaries [32],
[38], [73]. In our compiler pass, we modified our PNS address
translation function in Gem5 to avoid executing the inserted
TRAP instructions for normal program execution while resolv-
ing the BBLPhantoms correctly to a single BBL in the virtual
address space.
PNS-PtrEncLite. To evaluate the performance of PNS with
PtrEnc, we first write an LLVM IR pass to instrument the code
(including shared libraries) and insert the relevant instructions
as described in CCFI [47]. Specifically, we emit instructions
whenever (1) a new object is created (to encrypt the contents
of the vptr), (2) a virtual function call is made (to decrypt the
vptr), or (3) any operation on code pointers in C programs.
Then, we appropriate the encodings for ARM’s ldc and stc
instructions respectively, which are themselves unimplemented
in Gem5, to behave as ENCP and DECP instructions. We
add a dedicated functional unit in Gem5 to handle these
instruction’s latency in order to avoid any contention on the
regular functional units. We also assume equal cycle counts
of 8 for both instructions [2]. This latency is to emulate the
effect of the actual encryption/decryption.
PtrEncFull. In this approach, we instrument code pointer
load/store operations in addition to function entry/exit points to
protect return addresses for non-leaf functions. Conceptually,
this solution is similar to ARM PAC [43]. However, due to the
absence of PAC support in Gem5 (and for 32-bit ARM archi-
tectures in general), we only perform behavioral simulation
7
M. Tarek Ibn Ziad, M. Arroyo, E. Manzhosov, V. P. Kemerlis, and S. Sethumadhavan
per
lbe
nch gcc mc
f
om
net
pp
xal
anc
bm
k
x26
4
dee
psje
ng lee
la xz
nam
d lbm
ble
nde
r
ima
gick nab aMe
an
gM
ean
0.0
1.0
1.2
1.4
1.6
1.8
No
rm
. C
yc
le
s
2.3 3.4 4.6 4.8 1.9
PNS PNS+TRAP PNS+PtrEncLite PtrEncFull NNC
Fig. 6: PNS performance evaluation for SPEC2017 C/C++ benchmarks.7
for comparison purposes, without keeping track of the actual
pointer metadata.
Naive Name Confusion (NNC). For the sake of completeness
and fair comparison, we also implement a static version where
there are two copies of the code, i.e., a version without
the phantom aspect of the naming scheme. In this model,
we have two virtual addresses for each instruction but these
addresses are physically stored in memory, essentially halving
the capacity of the microarchitectural structures.5 We create
the two copies by introducing a shift of TRAP instruction
size in one of them. At a high-level our implementation
works as follows: (1) clone functions using an LLVM IR
pass, (2) LLVM backend pass to insert TRAPs for cloned
functions, (3) instruct the LLVM backend to globalize BBL
labels, (4) emit a diversifier BBL for every BBL, and (5)
rewrite branch instruction targets to point to the diversifier.
Of the 16 C/C++ benchmarks, 14 compile with all different
toolchain modifications. parest has compatibility issues with
musl due to exception handling usages, while povray failed
to run on Gem5. For NNC, gcc, xalancbmk, and x264
present compilation and/or linking issues.
B. Experimental Results
We run all benchmarks to completion with the test input
set on our augmented Gem5. We verified the correctness
of the outputs against the reference output. Figure 6 shows
the performance overhead of the different design approaches
(all normalized to Baseline). As expected, PNS has identi-
cal performance to Baseline. The overhead of PNS-TRAP is
minimal, 0%– 22% (avg. 4%)6. Adding support for PtrEnc in-
creases the performance overheads of PNS-PtrEncLite to 0%–
61% (avg. 6%). The perlbench benchmark suffers from a
relatively high overhead due to its extensive use of function
pointers and indirect branches. On the other hand, fully protect-
ing the binaries with a deterministic defense such as PtrEncFull
encounters a 91% overhead on average (geometric mean
of 62%). Our static implementation of software NNC intro-
duces an arithmetic average overhead of 31% (geometric mean
of 26%)7. In contrast to software Isomeron [21] which relies
5This model is similar to the Isomeron solution proposed by Davi et al. [21],
with the modification that it is used at the BBL granularity as opposed to the
original work, which uses a dynamic binary rewriting framework with function
granularity.
6This average is for 14/16 benchmarks, as explained in Section VI-A
7Our NNC implementation does not instrument external libraries (only the
main application code) due to compilation issues. This leads to overheads that
are less than intuitively expected.
per
lbe
nchgcc mc
f
om
net
pp
xal
anc
bm
k
x26
4
dee
psje
nglee
la xz
nam
d lbm
ble
nde
r
ima
gick nab aMe
an
gM
ean
0.0
1.0
1.2
1.4
1.6
No
rm
. C
yc
le
s
Fetch Cache Cache+Fetch
Fig. 7: PNS performance evaluation with additional one-cycle
access latency for fetch stage, L1-I$, and both.
on dynamic binary instrumentation (DBI), the overheads for
our implementation are primarily attributed to the indirection
every BBL branch must make to the diversifier.
As illustrated in Section III, the required PNS modifications
do not add additional cycle latency to the processor pipeline.
However, we performed an additional set of experiments with
a more conservative assumption of having one additional cycle
latency for all instructions in fetch stage, or one more cycle
for accessing L1 instruction cache, or both. We show results
compared to an unmodified baseline in Figure 7. We notice
an average performance overhead of 1% for stalling the fetch
stage. However, stalling the instruction cache for one cycle
(hit latency is originally two cycles) is more harmful to the
performance. Thus, the I$ optimizations are mandatory, as
described in Section III.
Finally, the call depths listed in Table II show that SPEC
programs do not exceed a depth of 244 (leela), indicating
that a 256-entry hardware Secret Domain Stack is suf-
ficient to handle the common execution cases.
Bench. Call Bench. Call Bench. Call
Name Depth Name Depth Name Depth
perlbench 24 x264 15 lbm 10
gcc 28 deepsjeng 48 blender 23
mcf 28 leela 244 imagick 22
omnetpp 196 xz 16 nab 16
xalancbmk 77 namd 12 povray -
TABLE II: Max. call depth for SPEC C/C++ benchmarks.
C. FPGA Prototyping
For the sake of completeness, we have developed an
FPGA prototype of PNS using the Bluespec hardware descrip-
tion language (HDL). Specifically, we added PNS hardware
8
M. Tarek Ibn Ziad, M. Arroyo, E. Manzhosov, V. P. Kemerlis, and S. Sethumadhavan
modifications to the front-end of the 32-bit Flute RISC-V
processor, a 5-stage in-order pipelined processor typically used
for low-end applications that need MMUs [7]. We prototyped
the processor on the the Xilinx Zynq (ZCU106) Evaluation
Kit.
Our evaluation results shows that we can reliably run with
a clock period of 7.5 ns (maximum frequency of 133 MHz) for
both the baseline core and the modified one. The area increase
due to PNS is negligible (0.83% extra Flip-Flops with 2.02%
additional LUTs). We verified the correctness of our FPGA
implementation by running simple bare-metal applications.
Due to the lack of mature support for 32-bit OSs on RISC-V,
we were not able to run the application benchmarks we used
in simulation. We leaving booting Linux on our modified core
as future work.
VII. SECURITY ANALYSIS & EVALUATION
In this section, we present our evaluation results regarding
the effectiveness of PNS and discuss its security guarantees
against CRAs.
A. Threat Model
Adversarial Capabilities. We consider an adversary model
that is consistent with previous work on code-reuse attacks and
mitigations [21], [28], [47], [56]. We assume that the adversary
is aware of the applied defenses and has access to the source
code, or binary image, of the target program. Furthermore,
the target program suffers from memory safety-related vul-
nerabilities that allow the adversary to read from, and write
to, arbitrary memory addresses. The attacker’s objective is to
(ab)use memory corruption and disclosure bugs, mount a code-
reuse attack, and achieve privilege escalation.
Hardening Assumptions. We assume that the underlying OS
is trusted. If the OS is compromised and the attacker has
kernel privileges, the attacker can execute malicious code
without making ROP-style attacks; a simple mapping of the
data page as executable will suffice. We assume that ASLR
and WˆX protection are enabled—i.e., no code injection is
allowed (non-executable data), and all code sections are non-
writable (immutable code). Thus, attacks that modify pro-
gram code at runtime, such as rowhammer [39], are out of
scope. We also do not consider non-control data attacks [65],
such as Data-Oriented Programming [33] and Block-Oriented
Programming [35]. This class of attacks only tamper-with
memory load and store operations, without inducing any
unintended control flows in the program. This limitation also
applies to prior work as well [12], [21], [28], [47]. Lastly,
every other standard hardening feature (e.g., stack-smashing
protection [17], CFI [12]) is orthogonal to PNS; our proposed
scheme does not require nor preclude any such feature.
B. Secrets
There are no secret parameters in the basic PNS scheme.
The number of phantoms and the security shift can be made
public as security comes from the random selection of names.
For PNS extensions, a per-process key (used for encryption)
should be kept secret for the lifetime of the respective process.
Bench. PNS PNS Bench. PNS PNS Bench. PNS PNS
Name Chains Chains Name Chains Chains Name Chains Chains
perlbench 17 0 x264 23 0 lbm 23 0
gcc 23 0 deepsjeng 11 0 blender 23 0
mcf 11 0 leela 15 0 imagick 23 0
omnetpp 23 0 xz 11 0 nab 23 0
xalancbmk 15 0 namd 23 0 povray 23 0
TABLE III: ROP gadget-chain reduction for SPEC2017 C/C++
benchmarks. PNS and PNS correspond to the number of valid
ROP chains before and after PNS.
C. Quantitative Security Analysis
ROP-Gadget Chain Evaluation. To evaluate PNS against
real-world ROP attacks we use Ropper [55], a tool that can find
gadgets and build ROP chains for a given binary. A common
ROP attack is to target the execve function with /bin/sh as
an input to launch a shell. As the chain-creation functionality in
Ropper is only available for x86 [55], we analyze SPEC2017
x86 binaries for this particular exploit and report the number
of available gadget chains (PNS).
To emulate the effect of PNS, we modified the Ropper code
to extend each gadget length by one byte, decode the gadget,
and check if the new gadget is semantically equivalent to the
old one or not. This emulates the effect of an attacker targeting
a particular address, but instead executing the one before due
to the PNS security shift, δ . As shown in Table III, PNS foils
all the gadget-chains found by our modified Ropper. Extending
the Ropper chain-creation functionality to the ARM ISA is part
of our future work. Intuitively, the results would be even worse
for the attacker in ARM as the state-space is more constrained
due to instruction alignment requirements.
Control-flow Hijacking Evaluation. We further evaluate se-
curity by using RIPE [71], an open source intrusion prevention
benchmark suite. We port RIPE to ARM and run it on our
modified Gem5, with n= 8 bits, as described in Section VI. We
mainly focus on return-address manipulation as a target code
pointer and ret2libc/ROP as attack payloads. Shellcode
attacks are not considered as we expect WˆX.
Our ported RIPE benchmark contains 54 (relevant) attack
combinations. On an unprotected Gem5 system, 50 attacks
succeed and 4 attacks fail. After deploying PNS, all of the 54
attacks fail including the single-gadget ret2libc attacks.
That is mainly due to our high number of phantoms present
at runtime, 28 = 256.
That said, real-world exploits typically involve payloads
with several gadgets. According to Cheng et al. [16] the
shortest gadget chain consists of thirteen gadgets. Hence,
the probability for successful execution of a gadget chain
is psuccess ≤
( 1
256
)13
= 4.93×10−32. Snow et al. [61] success-
fully exploited a vulnerability with a ROP payload consisting
of only six gadgets, which would equate to a better, but still
low, success probability of psuccess =
( 1
256
)6
= 3.55×10−15.
D. Qualitative Security Evaluation
Just-In-Time Return-Oriented Programming. Although JIT-
ROP [61] permits the attacker to construct a compatible code-
reuse payload on the fly, they cannot modify the gadget chain
9
M. Tarek Ibn Ziad, M. Arroyo, E. Manzhosov, V. P. Kemerlis, and S. Sethumadhavan
after the control flow has been hijacked. As a result, the
attacker needs to guess the domain of execution of the entire
JIT-ROP gadget-chain in advance. So, PNS mitigates JIT-
ROP similarly to how it mitigates (static) ROP/JOP/COP: i.e.,
by removing the attacker’s ability to put together (either in
advance or on the fly) a valid code-reuse payload. The above
security guarantees are achieved by the regular PNS proposal
(as explained in Section IV) with no extensions or program
recompilation, making it suitable for legacy binaries and shared
third party libraries.
Blind Return-Oriented Programming. BROP attacks can
remotely find ROP gadgets, in network-facing applications,
without prior knowledge of the target binary [5]. The idea
is to find enough gadgets to invoke the write system call
through trial and error; then, the target binary can be copied
from memory to the network to find more gadgets. As a proof
of concept, the authors showed an example with 5-gadgets that
invokes write. With PNS, the success probability of invoking
write would be
( 1
256
)5
= 9.09×10−13. Note that completing
an end-to-end attack requires harvesting, and using, even more
gadgets, after dumping the target binary, which makes the
attack unfeasible on a PNS-hardened system. Additionally,
BROP requires services that restart after a crash, while failed
attempts will be noticeable to a system admin.
Whole-function Reuse. Unlike ROP attacks, which (re)use
short instruction sequences, entire functions are invoked, in
this case, to manipulate the control-flow of the program. This
type of attack includes counterfeit object-oriented program-
ming (COOP) attacks, in which whole C++ functions are
invoked through code pointers in read-only memory, such as
vtables [56]. PNS relies on the PtrEnc extension to prevent
the attacker from manipulating pointers (vptr) that point to
vtables—a necessary step for mounting a COOP attack.
Ret2libc is another example for whole function reuse
attacks, in which the attacker tries to execute entire libc
functions [49], [62].8 With PNS, the attacker will have to guess
the address of the first basic block of the function in order to
lunch the attack, reducing the success probability to
( 1
256
)
=
0.0039.
Our analysis of real-world exploits shows that executing
a ret2libc attack incurs multiple steps in order for the
attacker to (1) prepare the function arguments based on the
calling convention, (2) jump to the desired function entry,
(3) silence any side-effects that occur due to executing the
whole function, and (4) reliably continue (or gracefully termi-
nate) the victim program without noticeable crashes. (1) and
(3) generally requires code-reuse (ROP) gadgets, as demon-
strated by the following publicly-available exploits: (a) ROP
+ ret2libc-based exploit against mcrypt [25], (b) ROP
+ ret2libc-based exploit against Nginx [27], (c) ROP +
ret2libc + shellcode-based exploit for Apache + PHP [24]
and (d) ROP + ret2libc-based exploit against Netperf [26].
Thus, if the ROP part of the exploit requires G gadgets,
the probability for successfully exploiting the program would
exponentially decrease to psuccess ≤
( 1
256
)G
. That is because
the attacker will have to guess the domain of execution (out
of 28 = 256 phantoms) of every gadget.
8In general, any function, of any other shared library, or even the main
binary itself, can be used instead.
A natural question to ask is: can the attacker leverage PNS’s
mechanisms to hijack the system? The answer is no for the
following reasons. To divert the control flow of a program,
an attacker must corrupt either (1) return addresses, which are
protected with PNS randomization, or (2) function pointers,
which are protected by the PtrEnc extension. To corrupt
return addresses, an attacker must make a guess (this will
exponentially scale with the number of return addresses to
be corrupted) to determine the correct execution domain. To
bypass PtrEnc, an attacker has to leak the key, which is
hardware-based, or divert the control flow to a signing gadget
(an encryption instruction). The latter requires hijacking the
control-flow first, which is already guarded with randomization
and encryption thus constructing a chicken and egg dilemma.
Side-channel Attacks. PNS takes multiple steps to be resilient
to side channel attacks. Firstly, PNS purposefully avoids timing
variances introduced due to hardware modifications, in order
to limit timing-based side channel attacks. Additionally, the
attacker cannot leak the random phantom index, p, which are
generated by the selector as it is unreadable from both user and
kernel mode—it exists within the processor only. Similarly, the
execution domain cannot be leaked to the attacker through the
architectural stack, as PNS keeps it within the hardware in
the SDS.
VIII. PNS SYSTEM LEVEL SUPPORT
For completeness, we outline design changes required to
deploy a PNS general-purpose system.
Sizing. Although SDS only stores eight bits per return address
in hardware, it still has a limited size that cannot be dynami-
cally increased as the architectural stack. This means programs
with deeply nested function calls may result in a SDS overflow.
To handle this issue, we add two new hardware exception sig-
nals: hardware-stack-overflow and hardware-stack-underflow.
The former is raised when the SDS overflows. In this case, the
OS (or another trusted entity), encrypts and copies the contents
of the SDS to the kernel memory. This kernel memory location
will be a stack of stacks and every time a stack is full it will
be appended to the previous full stack. The second exception
will be raised when the SDS is empty to decrypt and page-in
the last saved full-stack from kernel memory.
Stack Unwinding. Since addresses are split across the ar-
chitectural (software) stack and the SDS it is vital to keep
them in sync for correct operation. Earlier, we described how
normal LIFO call/rets are handled. In some cases, however,
the stack can be reset arbitrarily by setjmp/longjmp or
C++ exception handling. To ensure the stack cannot be dis-
closed/manipulated maliciously during non-LIFO operations,
we change the runtime to encrypt the jmp_buffer before
storing it to memory. Additionally, we also store the current
index of the SDS. When a longjmp is executed, we decrypt
the contents of the jmp_buffer and use the decrypted SDS
index to re-synchronize it with the architectural stack. The
same approach can be applied to the C++ exception handling
mechanism by instrumenting the appropriate APIs.
Context Switches. The SDS of the current process is stored in
the Process Control Block before a context switch. In terms of
cost, the typical size of the SDS is 256-bytes (256 entries, each
has 8-bits). Moving this number of bytes between the SDS
10
M. Tarek Ibn Ziad, M. Arroyo, E. Manzhosov, V. P. Kemerlis, and S. Sethumadhavan
and memory during context switch requires just a few load
and store instructions, which consume a few cycles. This
overhead is negligible with respect to the overhead of the rest
of the context switch (which happens infrequently; every tens
of milliseconds).
Multithreading. To support multithreading, the SDS has to
be extended with a multithreading context identifier, which
increases the size of stack linearly with number of thread
contexts that can be supported per hardware core.
Dynamic Linking. Dynamically-linked shared libraries are
essential to modern software as they reduce program size
and improve locality. Although most embedded system soft-
ware (the primary target in this work) in MCUs is typically
statically-linked, we note that PNS is compatible with shared
libraries as it can be fully realized in hardware. Thus, it does
not differentiate between BBLs related to the main program
and the ones corresponding to shared libraries. On the other
hand, dynamic linking has been a challenge for many CFI
solutions, as control flow graph edges that span modules may
be unavailable statically. CCFI [47] suffers from the same
limitation as the dynamically shared library code needs to be
instrumented before execution; otherwise, the respective pages
will be vulnerable to code pointer manipulation attacks.
IX. RELATED WORK
As shown in Section I, the idea of having multiple names
for the same instruction is fundamentally different compared
to other security paradigms. Further, in Section VI we showed
that PNS has lower overheads compared with the state-of-
the-art commercial solution (ARM PAC). In this section, we
explore prior CRA mitigations and discuss their benefits and
differences (summarized in Table IV).
CRA Mitigations for Resource Constrained Devices. Ny-
man et al. [50] introduced CaRE, an interrupt-aware control-
flow integrity (CFI) scheme for low-end microcontrollers that
leverages TrustZone-M security extensions. CaRE instruments
binaries in a manner which removes all function calls and
indirect branches and replaces them with dispatch instructions
that trap control flow to a branch monitor. Although the
branch monitor eliminates control-flow attacks that utilize
such branches, the performance overheads introduced range
between 13% and 513%. In the case of indirect calls, CaRE
matches the branch target against a record of valid subroutine
entry points. Unlike PNS-PtrEncLite, this coarse-grained ap-
proach does not protect against whole function reuse attacks.
Hardware-based CRA Mitigations. Intel architectures offer
a hardware-based CFI technology named Control-flow En-
forcement Technology (CET) that is to be available in future
x86 processors [34], [58]. CET adds a new ENDBRANCH
instruction, which is placed at the entry of each BBL that
can be invoked via an indirect branch. When an indirect
forward branch occurs, the following instruction is expected
to be an ENDBRANCH, otherwise an attack is assumed. CET
provides only coarse-grained protection where any of the
possible indirect targets are allowed at every indirect control-
flow transfer. Thus, an attacker can still reuse the whole BBL
and store the address of the ENDBRANCH of the desired BBL
in the stack as before. The above attack will fail against PNS
with high probability as every instruction (and basic block)
can have up to N different addresses forcing the attacker to
gamble on which one to use.
Additionally, CET protects call-return instructions
using a full shadow stack (i.e., 32 or 64 bits per entry),
that resides in virtual memory. Unlike a shadow stack which
compares return address on every ret instruction, our SDS
only concatenates the domain bits to the return address with
no wasteful comparisons. Furthermore, PNS uses a smaller
hardware structure (the SDS) that consumes 8 bits per entry
and that cannot be leaked by an attacker who can illegally
tamper main memory.
Recently, ARM introduced the Pointer Authentication Code
(PAC) feature in Armv8.3A as a hardware primitive to
mitigate CRAs [53]. Hans et al. showed how to harden ARM
PAC against reply attacks by using unique tweaks (along with
the authentication key) for different pointer types [43]. As
discussed in Section IV-B, ARM PAC relies on the currently
unused upper bits of the 64-bit pointers. Mapping the same
technique to non 64-bit systems results in high performance
overheads, as evaluated in Section VI.
While our PNS-PtrEncLite extension relies on crypto-
graphic algorithms similar to ARM PAC [53], PNS-PtrEncLite
has two main advantages. First, PNS-PtrEncLite uses en-
cryption instead of authentication to avoid storing additional
metadata (authentication code) per pointer on 32-bit systems.
Second, ARM PAC is applied for all code pointers including
return addresses and function pointers. This is represented
by PtrEncFull in our evaluation. On the other hand, PNS-
PtrEncLite is only applied for function pointers (and C++ vir-
tual pointers) as the return addresses are protect by PNS’s fine-
grained randomization. The reduction in the cryptographically-
protected locations highly reduced the performance overheads,
as shown in Section VI.
N-Variant eXecution Systems. The general idea of N-variant
execution (NVX) systems is to run N different copies/variants
of the same code, alongside each other, while checking their
runtime behavior [3], [19]. If the variants produce a different
response to a single common input (due to an internal failure
or external attack payload), the checker detects such diver-
gences in execution and raises an alert. Since 2006, many
NVX systems have been proposed to achieve reliability and
security goals [29], [40], [42], [46], [67], [68]. While NVX
systems can offer additional benefits over PNS, such as precise
failure detection, they suffer from considerable performance
(at least 100%) and memory overheads, and therefore are not
suitable for resource constrained systems.
Code Randomization. Multiple work has proposed using in-
place code randomization (aka instruction set randomization)
to defeat code reuse attacks [36], [51], [59]. The main idea is
to randomize the encoding of instructions in memory, while
maintaining a unique instruction name per program execution.
Unlike PNS, which dynamically change instruction names
at runtime, the above techniques are static (i.e., applied in
compile time) in nature. As a result, they are susceptible to
JIT-ROP attacks while PNS is not.
Live Randomization. Recent work has pioneered the use
of hardware moving target defenses to protect against code-
reuse attacks [28]. Gallagher et al. proposed Morpheus, an
architecture that (1) displaces code and data pointers in the
11
M. Tarek Ibn Ziad, M. Arroyo, E. Manzhosov, V. P. Kemerlis, and S. Sethumadhavan
Proposal Hardware Software Randomization Main Sources of Overheads Cost of Portability to 32-bit systems Energy
Support Modifications Interval Overheads
CaRE [50] Yes Recompile No Directing every branch to external monitor None Moderate
Intel CET [34] Yes Recompile No Maintaining full shadow stack None Low
CCFI [47] No Recompile No Using complete pointer authentication Extra Load/Store per pointer Moderate
ARM PAC [53] Yes Recompile No Using complete pointer authentication (negligible on h/w) Extra Load/Store per pointer Moderate
NVX [3], [46] No Recompile No Running N program copies simultaneously Increase overheads by a factor of N High
Isomeron [21] No DBI 1 ms (Func. time) Maintaining two program copies (high TLB and I$ misses) None High
Shuffler [72] No DBI 50 ms Offloading computations to another core/thread Double overheads on single-core systems High
Morpheus [28] Yes Recompile 50 ms Adding 2-bit tags per 64-bit words (pointer size) Double memory tags overhead Low
PNS Yes None 10 ns (BBL time) Using Lightweight Pointer Encryption None Low
TABLE IV: Comparison with prior work.
address space (2) diversifies the representation of code and
pointers using strong encryption, and (3) periodically repeats
the above steps using a different displacement and key.
The main conceptual difference between Morpheus and
PNS is that in PNS, at any given instant there are multiple
names (addresses) for an instruction while there is only one
name (address) for an instruction in Morpheus. This distinction
is also true of PNS and software moving target systems [72]
used to protect against code reuse attacks.
In terms of security, Morpheus must keep two parameters
a secret until they are changed: displacements for the code and
data regions, and keys for encrypting/decrypting pointers. In
the basic PNS there are no secrets, and in the enhanced PNS,
there is only one secret viz. the key used for code pointer
encryption. If PNS is added to Morpheus it increases the
security level offered by Morpheus because the attacker has
to disclose the displacement and break the name confusion to
mount a code-reuse attack.
PNS can also provide an illusion of a faster churn rate. The
churn time can be thought of as the time an attacker has to
deploy a countermeasure. PNS, forces the attacker to have a
counter strategy every basic block which normally completes
execution in the order of nanoseconds. While Morpheus’ churn
rate (milliseconds for PNS level of performance) is sufficient
to protect against remote network adversaries, the (apparently)
faster churn provided by PNS is meaningful in offering pro-
tection against local attackers especially with side channel
capabilities, and thus is again complementary to Morpheus.
The BBL-by-BBL apparent churn offered by PNS also comes
at much lower energy cost compared to Morpheus as it does
not require memory scanning to identify pointers. Finally from
a deployment perspective, a unique benefit of PNS is that it
works for “wimpy” non-64-bit systems while Morpheus and
software moving target systems, rely on the availability of a
64-bit address space for security.
Memory Safety Defenses. Hardware primitives for memory
safety, such as LowFat [41], CHERI [69], REST [60], and
Califorms [54], can mitigate CRAs by detecting the initial
memory safety violation. While providing higher security
guarantees than PNS, the above techniques are less suitable
for resource constrained systems due to their high energy
overheads in addition to their intrusive changes to the entire
software stack (including software, hardware, and OS). On
the contrary, PNS is able to enhance security even for legacy
binaries.
X. CONCLUSION
In this paper, we proposed PNS, a name confusion design
that allows for multiple addresses/names for individual instruc-
tions. We also demonstrated an application of PNS, which
is used to mitigate code-reuse attacks, a prominent class of
attacks that have proven costly to mitigate. The key idea is
to force the attacker to carry out the difficult task of guessing
which randomly-chosen name will be used, by the hardware,
to carry out a successful attack. Building on this idea we
show that we protect against a variety of code-reuse attacks,
including the state-of-the-art JIT-ROP and COOP.
While it offers strong security guarantees, PNS requires
minor modifications to the processor front-end: specifically, it
requires changes to indexing functions, 8 metastable flip-flops,
and 256 bytes of state. Experimental results showed that PNS
incurs negligible performance impact compared to hardware-
based cryptographic control-flow integrity schemes, which is
the state-of-the-art commercial solution (implemented on the
iPhone XS). Another major benefit of PNS is that it does not
depend on “free” bits or the vastness of the 64-bit address
space to work, making it suitable for 16- and 32-bit micro-
controllers and microprocessors. For the foreseeable future,
code-reuse attacks will continue to plague systems security.
The increased proliferation of resource-constrained systems
that cannot deal with the performance overheads of server-
grade defenses calls for more efficient mitigations. Thus,
PNS provides a cheaply deployable hardware technique that
strengthens control flow protection uniformly across embedded
and server ecosystems.
REFERENCES
[1] M. Abadi, M. Budiu, U. Erlingsson, and J. Ligatti, “Control-flow
integrity,” in Proceedings of the 12th ACM Conference on Computer
and Communications Security, ser. CCS ’05, Alexandria, VA, USA,
2005, pp. 340–353.
[2] R. Avanzi, “The QARMA block cipher family. almost MDS matrices
over rings with zero divisors, nearly symmetric even-mansour construc-
tions with non-involutory central rounds, and search heuristics for low-
latency S-boxes,” IACR Transactions on Symmetric Cryptology, vol.
2017, no. 1, pp. 4–44, Mar. 2017.
[3] E. D. Berger and B. G. Zorn, “Diehard: Probabilistic memory safety
for unsafe languages,” in Proceedings of the 27th ACM SIGPLAN
Conference on Programming Language Design and Implementation, ser.
PLDI ’06, Ottawa, Ontario, Canada, 2006, pp. 158–168.
[4] N. Binkert, B. Beckmann, G. Black, S. K. Reinhardt, A. Saidi, A. Basu,
J. Hestness, D. R. Hower, T. Krishna, S. Sardashti, R. Sen, K. Sewell,
M. Shoaib, N. Vaish, M. D. Hill, and D. A. Wood, “The Gem5
simulator,” SIGARCH Computer Architecture News, 2011.
12
M. Tarek Ibn Ziad, M. Arroyo, E. Manzhosov, V. P. Kemerlis, and S. Sethumadhavan
[5] A. Bittau, A. Belay, A. Mashtizadeh, D. Mazie`res, and D. Boneh,
“Hacking blind,” in Proceedings of the 2014 IEEE Symposium on
Security and Privacy, ser. S&P ’14, Washington, DC, USA, 2014, pp.
227–242.
[6] T. Bletsch, X. Jiang, V. W. Freeh, and Z. Liang, “Jump-oriented
programming: A new class of code-reuse attack,” in Proceedings of the
6th ACM Symposium on Information, Computer and Communications
Security, ser. ASIACCS ’11, Hong Kong, China, 2011, pp. 30–40.
[7] Bluespec, “Flute: 5-stage, in-order, piplined RISC-V CPU,” https:
//github.com/bluespec/Flute, 2019, [Online; accessed 15-June-2020].
[8] D. Bounov, R. Gokhan Kici, and S. Lerner, “Protecting C++ dynamic
dispatch through VTable interleaving,” in Proceedings of the 2016
Network and Distributed System Security Symposium, ser. NDSS ’16,
San Diego, CA, USA, February 2016.
[9] D. Bovet and M. Cesati, Understanding the Linux Kernel, Second
Edition, 2nd ed., A. Oram, Ed. Sebastopol, CA, USA: O’Reilly &
Associates, Inc., 2002.
[10] J. Bucek, K.-D. Lange, and J. v. Kistowski, “SPEC CPU2017:
Next-generation compute benchmark,” in Proceedings of the 2018
ACM/SPEC International Conference on Performance Engineering, ser.
ICPE ’18, Berlin, Germany, April 2018, pp. 41–42.
[11] E. Buchanan, R. Roemer, H. Shacham, and S. Savage, “When good in-
structions go bad: Generalizing return-oriented programming to RISC,”
in Proceedings of the 15th ACM Conference on Computer and Com-
munications Security, ser. CCS ’08, Alexandria, Virginia, USA, 2008,
pp. 27–38.
[12] N. Burow, S. A. Carr, J. Nash, P. Larsen, M. Franz, S. Brunthaler, and
M. Payer, “Control-flow integrity: Precision, security, and performance,”
ACM Computing Surveys (CSUR), vol. 50, no. 1, p. 16, 2017.
[13] N. Burow, D. McKee, S. A. Carr, and M. Payer, “CFIXX: Object type
integrity for C++ virtual dispatch,” in Proceedings of the 2018 Network
and Distributed System Security Symposium, ser. NDSS ’18, San Diego,
CA, USA, February 2018.
[14] N. Burow, X. Zhang, and M. Payer, “SoK: Shining light on shadow
stacks,” in Proceedings of the 2019 IEEE Symposium on Security and
Privacy, ser. S&P ’19, May 2019.
[15] S. Checkoway, L. Davi, A. Dmitrienko, A.-R. Sadeghi, H. Shacham,
and M. Winandy, “Return-oriented programming without returns,” in
Proceedings of the 17th ACM Conference on Computer and Communi-
cations Security, ser. CCS ’10, Chicago, Illinois, USA, 2010, pp. 559–
572.
[16] Y. Cheng, Z. Zhou, M. Yu, X. Ding, and R. H. Deng, “Ropecker:
A generic and practical approach for defending against ROP attacks,”
in Proceedings of the 2014 Network and Distributed System Security
Symposium, ser. NDSS ’16, San Diego, CA, USA, February 2014.
[17] C. Cowan, C. Pu, D. Maier, J. Walpole, P. Bakke, S. Beattie, A. Grier,
P. Wagle, Q. Zhang, and H. Hinton, “Stackguard: Automatic adaptive
detection and prevention of buffer-overflow attacks.” in USENIX secu-
rity symposium, vol. 98, 1998, pp. 63–78.
[18] C. Cowan, S. Beattie, J. Johansen, and P. Wagle, “Pointguard: Protecting
pointers from buffer overflow vulnerabilities,” in Proceedings of the
12th Conference on USENIX Security Symposium - Volume 12, ser.
SSYM ’03, Washington, DC, USA, 2003, p. 7.
[19] B. Cox, D. Evans, A. Filipi, J. Rowanhill, W. Hu, J. Davidson, J. Knight,
A. Nguyen-Tuong, and J. Hiser, “N-variant systems: A secretless
framework for security through diversity,” in Proceedings of the 15th
Conference on USENIX Security Symposium - Volume 15, ser. USENIX-
SS’06, Vancouver, B.C., Canada, 2006.
[20] O.-J. Dahl, E. W. Dijkstra, and C. A. R. Hoare, Eds., Structured
Programming. London, UK: Academic Press Ltd., 1972.
[21] L. Davi, C. Liebchen, A.-R. Sadeghi, K. Z. Snow, and F. Monrose, “Iso-
meron: Code randomization resilient to (Just-In-Time) return-oriented
programming,” in Proceedings of the 2015 Network and Distributed
System Security Symposium, ser. NDSS ’15, San Diego, CA, USA,
February 2015.
[22] L. V. Davi, A. Dmitrienko, S. Nu¨rnberger, and A.-R. Sadeghi, “Gadge
me if you can: Secure and efficient ad-hoc instruction-level random-
ization for x86 and ARM,” in Proceedings of the 8th ACM SIGSAC
Symposium on Information, Computer and Communications Security,
ser. ASIA CCS ’13, Hangzhou, China, 2013, pp. 299–310.
[23] U. Drepper, “Security enhancements in redhat enterprise Linux (beside
SELinux),” 2004.
[24] Exploit Database, “Apache 2.4.7 + php 7.0.2 - openssl seal() unini-
tialized memory code execution,” https://www.exploit-db.com/exploits/
40142, [Online; accessed 15-June-2020].
[25] ——, “mcrypt 2.5.8 - local stack overflow,” https://www.exploit-db.
com/exploits/22928, [Online; accessed 15-June-2020].
[26] ——, “Netperf 2.6.0 - stack-based buffer overflow,” https://www.
exploit-db.com/exploits/46997, [Online; accessed 15-June-2020].
[27] ——, “Nginx 1.3.9 ¡ 1.4.0 - chuncked encoding stack buffer overflow,”
https://www.exploit-db.com/exploits/25775, [Online; accessed 15-June-
2020].
[28] M. Gallagher, L. Biernacki, S. Chen, Z. B. Aweke, S. F. Yitbarek, M. T.
Aga, A. Harris, Z. Xu, B. Kasikci, V. Bertacco, S. Malik, M. Tiwari, and
T. Austin, “Morpheus: A vulnerability-tolerant secure architecture based
on ensembles of moving target defenses with churn,” in Proceedings
of the Twenty-Fourth International Conference on Architectural Support
for Programming Languages and Operating Systems, ser. ASPLOS ’19,
Providence, RI, USA, 2019, pp. 469–484.
[29] R. Gawlik, P. Koppe, B. Kollenda, A. Pawlowski, B. Garmany, and
T. Holz, “Detile: Fine-grained information leak detection in script
engines,” in Proceedings of the 13th International Conference on
Detection of Intrusions and Malware, and Vulnerability Assessment -
Volume 9721, ser. DIMVA ’16, San Sebastian, Spain, 2016, pp. 322–
342.
[30] E. Go¨ktas, E. Athanasopoulos, H. Bos, and G. Portokalidis, “Out of
control: Overcoming control-flow integrity,” in Proceedings of the 2014
IEEE Symposium on Security and Privacy, ser. S&P ’14, San Jose, CA,
USA, may 2014, pp. 575–589.
[31] E. Go¨ktas¸, R. Gawlik, B. Kollenda, E. Athanasopoulos, G. Portokalidis,
C. Giuffrida, and H. Bos, “Undermining information hiding (and what
to do about it),” in Proceedings of the 25th USENIX Security Symposium
(USENIX Security 16), Austin, TX, Aug. 2016, pp. 105–119.
[32] D. Ha, W. Jin, and H. Oh, “REPICA: Rewriting position independent
code of ARM,” IEEE Access, vol. 6, pp. 50 488–50 509, 2018.
[33] H. Hu, S. Shinde, S. Adrian, Z. L. Chua, P. Saxena, and Z. Liang,
“Data-oriented programming: On the expressiveness of non-control data
attacks,” in Proceedings of the 2016 IEEE Symposium on Security and
Privacy, ser. S&P ’16, San Jose, CA, USA, May 2016, pp. 969–986.
[34] Intel, “Intel control-flow enforcement technology preview,”
https://software.intel.com/sites/default/files/managed/4d/2a/
control-flow-enforcement-technology-preview.pdf, 2017, [Online;
accessed 15-June-2020].
[35] K. K. Ispoglou, B. AlBassam, T. Jaeger, and M. Payer, “Block oriented
programming: Automating data-only attacks,” in Proceedings of the
2018 ACM SIGSAC Conference on Computer and Communications
Security, ser. CCS ’18, Toronto, Canada, 2018, pp. 1868–1882.
[36] G. S. Kc, A. D. Keromytis, and V. Prevelakis, “Countering code-
injection attacks with instruction-set randomization,” in Proceedings of
the 10th ACM Conference on Computer and Communications Security,
ser. CCS ’03, Washington D.C., USA, 2003, pp. 272–280.
[37] L.-S. Kim and R. W. Dutton, “Metastability of CMOS latch/flip-flop,”
IEEE Journal of Solid-State Circuits, vol. 25, no. 4, pp. 942–951, Aug
1990.
[38] T. Kim, C. H. Kim, H. Choi, Y. Kwon, B. Saltaformaggio, X. Zhang,
and D. Xu, “RevARM: A platform-agnostic ARM binary rewriter for
security applications,” in Proceedings of the 33rd Annual Computer
Security Applications Conference, ser. ACSAC ’17, Orlando, FL, USA,
2017, pp. 412–424.
[39] Y. Kim, R. Daly, J. Kim, C. Fallin, J. H. Lee, D. Lee, C. Wilkerson,
K. Lai, and O. Mutlu, “Flipping bits in memory without accessing them:
An experimental study of DRAM disturbance errors,” in Proceeding of
the 41st Annual International Symposium on Computer Architecuture,
ser. ISCA ’14, Minneapolis, Minnesota, USA, 2014, pp. 361–372.
[40] K. Koning, H. Bos, and C. Giuffrida, “Secure and efficient multi-variant
execution using hardware-assisted process virtualization,” in Proceed-
ings of the 2016 46th Annual IEEE/IFIP International Conference on
Dependable Systems and Networks, ser. DSN ’16, June 2016, pp. 431–
442.
[41] A. Kwon, U. Dhawan, J. M. Smith, T. F. Knight, Jr., and A. DeHon,
13
M. Tarek Ibn Ziad, M. Arroyo, E. Manzhosov, V. P. Kemerlis, and S. Sethumadhavan
“Low-fat pointers: Compact encoding and efficient gate-level implemen-
tation of fat pointers for spatial safety and capability-based security,”
in Proceedings of the 2013 ACM SIGSAC Conference on Computer &
Communications Security, ser. CCS ’13, Berlin, Germany, 2013, pp.
721–732.
[42] Y. Kwon, D. Kim, W. N. Sumner, K. Kim, B. Saltaformaggio, X. Zhang,
and D. Xu, “LDX: Causality inference by lightweight dual execution,”
in Proceedings of the Twenty-First International Conference on Archi-
tectural Support for Programming Languages and Operating Systems,
ser. ASPLOS ’16, Atlanta, Georgia, USA, 2016, pp. 503–515.
[43] H. Liljestrand, T. Nyman, K. Wang, C. C. Perez, J.-E. Ekberg, and
N. Asokan, “PAC it up: Towards pointer integrity using ARM pointer
authentication,” in Proceedings of the 28th USENIX Security Sympo-
sium, ser. (USENIX Security 19), Santa Clara, CA, USA, Aug. 2019,
pp. 177–194.
[44] LLVM, “Control flow integrity design,” https://clang.llvm.org/docs/
ControlFlowIntegrityDesign.html, [Online; accessed 15-June-2020].
[45] M. E. Locasto, S. Sidiroglou, and A. D. Keromytis, “Application
communities: Using monoculture for dependability,” in Proceedings
of the First Conference on Hot Topics in System Dependability, ser.
HotDep05. Yokohama, Japan: USENIX Association, 2005, p. 9.
[46] K. Lu, M. Xu, C. Song, T. Kim, and W. Lee, “Stopping memory disclo-
sures via diversification and replicated execution,” IEEE Transactions
on Dependable and Secure Computing, 2018.
[47] A. J. Mashtizadeh, A. Bittau, D. Boneh, and D. Mazie`res, “CCFI:
Cryptographically enforced control flow integrity,” in Proceedings of
the 22Nd ACM SIGSAC Conference on Computer and Communications
Security, ser. CCS ’15, Denver, Colorado, USA, 2015, pp. 941–951.
[48] musl, “musl libc,” http://www.musl-libc.org/, [Online; accessed 15-
June-2020].
[49] Nergal, “The advanced return-into-lib(c) exploits: PaX case study,” http:
//phrack.org/issues/58/4.html, 2001, [Online; accessed 15-June-2020].
[50] T. Nyman, J.-E. Ekberg, L. Davi, and N. Asokan, “CFI CaRE:
Hardware-supported call and return enforcement for commercial micro-
controllers,” in Research in Attacks, Intrusions, and Defenses. Springer
International Publishing, 2017, pp. 259–284.
[51] A. Papadogiannakis, L. Loutsis, V. Papaefstathiou, and S. Ioannidis,
“ASIST: Architectural support for instruction set randomization,” in
Proceedings of the 2013 ACM SIGSAC Conference on Computer and
Communications Security, ser. CCS ’13, Berlin, Germany, 2013, pp.
981–992.
[52] V. Pappas, M. Polychronakis, and A. D. Keromytis, “Smashing the
gadgets: Hindering return-oriented programming using in-place code
randomization,” in Proceedings of the 2012 IEEE Symposium on
Security and Privacy, ser. S&P ’12, May 2012, pp. 601–615.
[53] I. Qualcomm Technologies, “Pointer authentication on
ARMv8.3,” https://www.qualcomm.com/media/documents/files/
whitepaper-pointer-authentication-on-armv8-3.pdf, 2017, [Online;
accessed 15-June-2020].
[54] H. Sasaki, M. A. Arroyo, M. T. I. Ziad, K. Bhat, K. Sinha, and
S. Sethumadhavan, “Practical byte-granular memory blacklisting using
Califorms,” in Proceedings of the 52nd Annual IEEE/ACM International
Symposium on Microarchitecture, ser. MICRO ’52, Columbus, OH,
USA, 2019, pp. 558–571.
[55] S. Schirra, “Ropper,” https://github.com/sashs/Ropper, [Online; ac-
cessed 15-June-2020].
[56] F. Schuster, T. Tendyck, C. Liebchen, L. Davi, A.-R. Sadeghi, and
T. Holz, “Counterfeit object-oriented programming: On the difficulty
of preventing code reuse attacks in C++ applications,” in Proceedings
of the 2015 IEEE Symposium on Security and Privacy, ser. S&P ’15,
Oakland, CA, USA, 2015, pp. 745–762.
[57] H. Shacham, “The geometry of innocent flesh on the bone: Return-
into-libc without function calls (on the x86),” in Proceedings of the
14th ACM Conference on Computer and Communications Security, ser.
CCS ’07, Alexandria, Virginia, USA, 2007, pp. 552–561.
[58] V. Shanbhogue, D. Gupta, and R. Sahita, “Security analysis of pro-
cessor instruction set architecture for enforcing control-flow integrity,”
in Proceedings of the 8th International Workshop on Hardware and
Architectural Support for Security and Privacy, ser. HASP ’19, Phoenix,
AZ, USA, 2019.
[59] K. Sinha, V. P. Kemerlis, and S. Sethumadhavan, “Reviving instruction
set randomization,” in Proceedings of the 2017 IEEE International
Symposium on Hardware Oriented Security and Trust (HOST), 2017,
pp. 21–28.
[60] K. Sinha and S. Sethumadhavan, “Practical memory safety with REST,”
in Proceedings of the 45th Annual International Symposium on Com-
puter Architecture, ser. ISCA ’18, Los Angeles, California, USA, 2018,
pp. 600–611.
[61] K. Z. Snow, F. Monrose, L. Davi, A. Dmitrienko, C. Liebchen, and
A.-R. Sadeghi, “Just-in-time code reuse: On the effectiveness of fine-
grained address space layout randomization,” in Proceedings of the 2013
IEEE Symposium on Security and Privacy, ser. S&P ’13, Berkeley, CA,
USA, May 2013, pp. 574–588.
[62] Solar Designer, “Getting around non-executable stack (and fix),” http:
//seclists.org/bugtraq/1997/Aug/63, August 1997.
[63] Statista, Gartner, and IDC, “Server shipments worldwide
from 2010 to 2018.” https://www.statista.com/statistics/219596/
worldwide-server-shipments-by-vendor/, 2019, [Online; accessed
15-June-2020].
[64] Statista and I. Insights, “Microcontroller unit (mcu) shipments world-
wide from 2015 to 2023.” https://www.statista.com/statistics/935382/
worldwide-microcontroller-unit-shipments/, 2019, [Online; accessed
15-June-2020].
[65] L. Szekeres, M. Payer, T. Wei, and D. Song, “SoK: Eternal war in
memory,” in Proceedings of the 2013 IEEE Symposium on Security
and Privacy, ser. S&P ’13, San Francisco, CA, USA, 2013, pp. 48–62.
[66] N. Tuck, B. Calder, and G. Varghese, “Hardware and binary modifica-
tion support for code pointer protection from buffer overflow,” in 37th
International Symposium on Microarchitecture (MICRO ’04), Portland,
OR, USA, 2004, pp. 209–220.
[67] S. Volckaert, B. Coppens, and B. De Sutter, “Cloning your gadgets:
Complete ROP attack immunity with multi-variant execution,” IEEE
Transactions on Dependable and Secure Computing, vol. 13, no. 4, pp.
437–450, July 2016.
[68] S. Volckaert, B. Coppens, A. Voulimeneas, A. Homescu, P. Larsen,
B. De Sutter, and M. Franz, “Secure and efficient application monitoring
and replication,” in Proceedings of the 2016 USENIX Conference on
Usenix Annual Technical Conference, ser. ATC ’16, Denver, CO, USA,
2016, pp. 167–179.
[69] R. N. M. Watson, J. Woodruff, P. G. Neumann, S. W. Moore, J. An-
derson, D. Chisnall, N. H. Dave, B. Davis, K. Gudka, B. Laurie,
S. J. Murdoch, R. M. Norton, M. Roe, S. D. Son, and M. Vadera,
“CHERI: A hybrid capability-system architecture for scalable software
compartmentalization,” in Proceedings of the 2015 IEEE Symposium on
Security and Privacy, San Jose, CA, USA, May 2015, pp. 20–37.
[70] O. Whitehouse, “An analysis of address space layout randomization on
windows vista,” Jan 2007.
[71] J. Wilander, N. Nikiforakis, Y. Younan, M. Kamkar, and W. Joosen,
“RIPE: Runtime intrusion prevention evaluator,” in Proceedings of the
27th Annual Computer Security Applications Conference, ser. ACSAC
’11, Orlando, Florida, USA, 2011, pp. 41–50.
[72] D. Williams-King, G. Gobieski, K. Williams-King, J. P. Blake, X. Yuan,
P. Colp, M. Zheng, V. P. Kemerlis, J. Yang, and W. Aiello, “Shuffler:
Fast and deployable continuous code re-randomization,” in Proceedings
of the 12th USENIX Symposium on Operating Systems Design and
Implementation, ser. OSDI 16. Savannah, GA, USA: USENIX
Association, 2016, pp. 367–382.
[73] D. Williams-King, H. Kobayashi, K. Williams-King, G. Patterson,
F. Spano, Y. J. Wu, J. Yang, and V. P. Kemerlis, “Egalito: Layout-
agnostic binary recompilation,” in Proceedings of the Twenty-Fifth
International Conference on Architectural Support for Programming
Languages and Operating Systems, 2020, pp. 133–147.
[74] C. Zhang, S. A. Carr, T. Li, Y. Ding, C. Song, M. Payer, and D. Song,
“Vtrust: Regaining trust on virtual calls,” in Proceedings of the 2016
Network and Distributed System Security Symposium, ser. NDSS ’16,
San Diego, CA, USA, February 2016.
14
