PAC it up: Towards Pointer Integrity using ARM Pointer Authentication by Liljestrand, Hans et al.
Hans Liljestrand
Aalto University, Finland
Huawei Technologies Oy, Finland
hans.liljestrand@aalto.fi
Carlos Chinea Perez
Huawei Technologies Oy, Finland
carlos.chinea.perez@huawei.com
Thomas Nyman
Aalto University, Finland
thomas.nyman@aalto.fi
Jan-Erik Ekberg
Huawei Technologies Oy, Finland
Aalto University, Finland
jan.erik.ekberg@huawei.com
Kui Wang
Huawei Technologies Oy, Finland
Tampere University of Technology, Finland
wang.kui1@huawei.com
N. Asokan
Aalto University, Finland
asokan@acm.org
Abstract
Run-time attacks against programs written in memory-
unsafe programming languages (e.g., C and C++) remain a
prominent threat against computer systems. The prevalence
of techniques like return-oriented programming (ROP) in at-
tacking real-world systems has prompted major processor
manufacturers to design hardware-based countermeasures
against specific classes of run-time attacks. An example is
the recently added support for pointer authentication (PA)
in the ARMv8-A processor architecture, commonly used in
devices like smartphones. PA is a low-cost technique to au-
thenticate pointers so as to resist memory vulnerabilities. It
has been shown to enable practical protection against mem-
ory vulnerabilities that corrupt return addresses or function
pointers. However, so far, PA has received very little atten-
tion as a general purpose protection mechanism to harden
software against various classes of memory attacks.
In this paper, we use PA to build novel defenses
against various classes of run-time attacks, including the
first PA-based mechanism for data pointer integrity. We
present PARTS, an instrumentation framework that inte-
grates our PA-based defenses into the LLVM compiler and
the GNU/Linux operating system and show, via systematic
evaluation, that PARTS provides better protection than cur-
rent solutions at a reasonable performance overhead.
1 Introduction
Memory corruption vulnerabilities, such as buffer overflows,
continue to be a prominent threat against modern software
applications written in memory-unsafe programming languages,
like C and C++. These vulnerabilities can be exploited
to overwrite data in program memory. By overwriting
control data, such as code pointers and return addresses,
attackers can redirect execution to attacker-chosen
locations. Return-oriented programming (ROP) [38] is a
well known technique that allows the attacker to leverage
corrupted control-data and pre-existing code sequences to
construct powerful (Turing-complete) attacks without the
need to inject code into the victim program. By over-
writing non-control data, such as variables used for deci-
sion making, attackers can also influence program behav-
ior without breaking the program’s control-flow integrity
(CFI) [1]. Such attacks can cause the program to leak sen-
sitive data or escalate attacker privileges. Recent work has
shown that non-control-data attacks can also be generalized
to achieve Turing-completeness. Such data-oriented pro-
gramming (DOP) attacks [18] are difficult to defend against,
and are an appealing attack technique for future run-time ex-
ploitation. Software defenses against run-time attacks can
offer strong security guarantees, but their usefulness is lim-
ited by high performance overhead, or requiring significant
changes to system software architecture. Consequently, de-
ployed solutions (e.g., Microsoft EMET [28]) trade off secu-
rity for performance. Various hardware-assisted defenses in
the research literature [16, 46, 45, 15, 41, 43, 31, 35] can
drastically improve the efficiency of attack detection, but
the majority of such defenses are unlikely to ever be de-
ployed as they require invasive changes to the underlying
processor architecture. However, the prevalence of advanced
attack techniques (e.g., ROP) in modern run-time exploitation
has prompted major processor vendors to integrate security
primitives into their processor designs to thwart specific
attacks efficiently [20, 32, 34]. Recent additions to
the ARMv8-A architecture [3] include new instructions for
pointer authentication (PA). PA uses cryptographic message
authentication codes (MACs), referred to as pointer authen-
tication codes (PACs), to protect the integrity of pointers.
However, PA is vulnerable to pointer reuse attacks where an
authenticated pointer is substituted with another [34]. Practi-
cal PA-based defenses must minimize the scope of such sub-
stitution.
arXiv:1811.09189v4 [cs.CR] 24 May 2019
Goals and Contributions. In this work, we further the
security analysis of ARMv8-A PA by categorizing pointer
reuse attacks, and show that PA enables practical defenses
against several classes of run-time attacks. We propose an
enhanced scheme for pointer signing that enforces pointer
integrity for all code and data pointers. We also propose run-
time type safety which constrains pointer substitution attacks
by ensuring the pointer is of the correct type. Pointer signing
and run-time type safety are effective against both control-
flow and data-oriented attacks. Finally, we design and im-
plement Pointer Authentication Run-Time Safety (PARTS),
a compiler instrumentation framework that leverages PA to
realize our proposed defenses. We evaluate the security
and practicality of PARTS to demonstrate its effectiveness
against memory corruption attacks. Our main contributions
are:
• Analysis: A categorization and analysis of pointer reuse
and other attacks against ARMv8-A pointer authentica-
tion (Section 3).
• Design: A scheme for using pointer integrity to system-
atically defend against control-flow and data-oriented
attacks, and run-time type safety, a scheme for guar-
anteeing safety for data and code pointers at run-time
(Section 5).
• Implementation: PARTS, a compiler instrumentation
framework that uses PA to realize data pointer, code
pointer, and return address signing (Section 6).
• Evaluation: Systematic analysis of PARTS showing
that it has a reasonable performance overhead (< 0.5%
average overhead for code-pointer and return address
signing, 19.5% average overhead for data-pointer signing
in nbench-byte (Section 7)) and provides better security
guarantees than fully-precise static CFI (Section 9).
2 Background
2.1 Run-time attacks
Programs written in memory-unsafe languages are prone
to memory errors like buffer-overflows, use-after-free er-
rors and format string vulnerabilities [42]. Traditional
approaches for exploiting such errors by corrupting pro-
gram code have been rendered largely ineffective by the
widespread deployment of measures like data execution pre-
vention (DEP). This has given rise to two new attack classes:
control-flow attacks and data-oriented attacks [12].
2.1.1 Control-flow attacks (on ARM)
Control-flow attacks exploit memory errors to hijack pro-
gram execution by overwriting code pointers (function return
addresses or function pointers). Corrupting a code pointer
can cause a control-flow transfer to anywhere in executable
memory. Corrupting the return address of a function can be
used for ROP attacks, which are feasible on several architec-
tures, including ARM [22].
ARM processors, similar to other RISC processor designs,
have a dedicated Link Register (LR) that stores the return ad-
dress. LR is typically set during a function call by the Branch
with Link (bl) instruction. An attacker cannot directly influ-
ence the value of LR, as it is unlikely for a program to con-
tain instructions for directly modifying it. However, nested
function calls require the return address of a function to be
stored on the stack before the next function call replaces the
LR value. While the return address is stored on the stack, an
attacker can use a memory error to modify it to subsequently
redirect the control flow on function return. On both x86 and
ARM, it is possible to perform ROP attacks without the use
of return instructions. Such attacks are collectively referred
to as jump-oriented programming (JOP) [10].
Control-flow integrity (CFI) [1] is a prominent defense
technique against control-flow attacks. The goal of CFI is
to allow all the control flows present in a program’s control-
flow graph (CFG), while rejecting other flows. Practical de-
ployment of CFI solutions must trade off precision with per-
formance overhead. Thus, widely deployed CFI solutions
are less precise than state-of-the-art solutions presented in
scientific literature.
2.1.2 Data-oriented attacks
In contrast to control-flow attacks, data-oriented attacks
can influence program behavior without the need to mod-
ify code pointers. Instead, they corrupt variables that in-
fluence the program’s decision making, or leak sensitive in-
formation from program memory. Such attacks are called
non-control-data attacks. Chen et al. [12] demonstrated a
variety of non-control-data attacks for forging user creden-
tials, changing security critical configuration parameters, by-
passing security checks, and escalating privileges. Recent
work on DOP [18] showed that non-control-data corruption
can also enable expressive attacks without compromising
control-flow integrity. DOP may compromise the input of
individual program operations and chain together a chosen
sequence of operations to achieve the intended functionality.
A data-oriented attack can in principle corrupt arbitrary
program objects, but corrupting data pointers is often the pre-
ferred attack vector [13]. In Chen et al.’s attack against the
GHTTPD web server [12], a stack buffer overflow is used to
corrupt a data pointer used in input string validation in order
to bypass security checks on the input under the attacker’s
control. Data pointers are also routinely corrupted in heap
exploitation. For instance, the “House of Spirit” attack on
Glibc¹ involves corrupting a pointer returned by malloc()
to trick subsequent malloc() calls into returning attacker
controlled memory chunks. The DOP attacks in [18] also in-
volve the corruption of pointers as a means to control which
data is processed by vulnerable code.
¹ Team Shellphish repository of educational heap exploitation techniques:
https://github.com/shellphish/how2heap
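[Figure 1 diagram: the instruction pacia pointer, modifier feeds the pointer address and the modifier into a keyed MAC under the PA key; the resulting PAC is placed alongside the address in the pointer.]
Figure 1: The PAC is created using key-specific PA instructions
(pacia) and is a keyed MAC calculated over the
pointer address and a modifier.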
2.2 ARM Pointer Authentication
ARMv8.3-A includes a new feature called pointer authen-
tication (PA). PA is intended for checking the integrity of
pointers with minimal size and performance impact. It is
available when the processor executes in 64-bit ARM state
(AArch64). PA adds instructions for creating and authen-
ticating pointer authentication codes (PACs). The PAC is
a tweakable message authentication code (MAC) calculated
over the pointer value and a 64-bit modifier as the tweak
(Figure 1). Different combinations of key and modifier
pairs allow domain separation among different classes of
authenticated pointers. This prevents authenticated pointer
values from being arbitrarily interchangeable with one another;
for example, it prevents an attacker from using a function
pointer as a return address, or vice versa.
The idea of using MACs to protect pointers at run-time
is not new. Cryptographic CFI (CCFI) [27] uses MACs to
protect control-flow data such as return addresses, function
pointers, and vtable pointers. Unlike ARMv8-A PA, CCFI
uses hardware-accelerated AES for speeding up MAC calcu-
lation. Run-time software checks are needed to compare the
calculated MAC to a reference value. PA, on the other hand,
uses either QARMA [6] or a manufacturer-specific MAC,
and performs the MAC comparison in hardware.
64-bit ARM processors only use part of the 64-bit address
space for virtual addresses (Figure 2). The PAC is stored
in the remaining unused bits of the pointer. On a default
AArch64 Linux kernel configuration with 39-bit addresses
and without address tagging [3, D4.1.4], the PAC size is 24
bits. However, depending on the memory addressing scheme
and whether address tagging is used, the size of the PAC is
between 3 and 31 bits [34]. Security implications of the PAC
size are discussed in Section 9.
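The PAC sizes quoted above follow from simple bit counting. The sketch below is an illustration only: it assumes, based on the layout in Figure 2, that bit 55 is reserved for selecting the upper or lower address range and that address tagging claims the top 8 bits. The helper name pac_bits is hypothetical and not part of any toolchain.

```python
def pac_bits(va_size: int, tagging: bool) -> int:
    """Number of pointer bits left over for the PAC (illustrative model)."""
    if not 1 <= va_size <= 55:
        raise ValueError("va_size must leave bit 55 free")
    unused = 64 - va_size   # bits above the virtual address
    unused -= 1             # bit 55 selects the upper/lower address range
    if tagging:
        unused -= 8         # bits 63..56 are taken by the address tag
    return unused


print(pac_bits(39, tagging=False))  # 24, the default Linux case described above
```

Under the same assumptions, a 52-bit address range with tagging leaves 3 bits and a 32-bit range without tagging leaves 31 bits, matching the bounds cited from [34].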
PA provides five different keys for PAC generation: two
for code pointers, two for data pointers, and one for generic
use. The keys are stored in hardware registers configured
to be accessible only from a higher privilege level: e.g., the
kernel maintains the keys for a user space process, generat-
ing keys for each process at process exec. The keys remain
constant throughout the process lifetime, whereas the modifier
is given in an instruction-specific register operand on
each PAC creation and authentication (i.e., MAC verification).
Thus it can be used to describe the run-time context in
which the pointer is created and used. The modifier value is
not necessarily confidential (see Section 4) but ideally such
that it 1) precisely describes the context of use in which the
pointer is valid, and 2) cannot be influenced by the attacker.
[Figure 2 diagram: pointer fields from bit 63 down to bit 0: tag / reserved (bits 63 to 56), upper/lower bit (bit 55), reserved (bits 54 down to va_size), address (below va_size). The PAC occupies the reserved fields.]
Figure 2: Pointer layout on 64-bit ARM. The PAC is stored
in the reserved bits, and its size depends on the used virtual
address range. If pointer tagging is disabled, then the PAC
can also extend to the tag bits.
PA is used by instrumenting code with PAC creation and
authentication instructions. PA instruction mnemonics are
generally prefixed either with pac or aut for creation and
authentication, respectively, followed by two characters that
select one of the data or code keys. For instance, the pacia
instruction in Figure 1 will generate an authenticated pointer
(pac) based on the instruction (i) A-key (a). Table 5 in Appendix C
provides a list of PA instructions. An authenticated
pointer cannot be used directly, as the PAC embedded
in the pointer value intentionally interferes with address
translation. The corresponding PA authentication instruction
(in this case, autia) removes the PAC from the pointer if
authentication is successful, i.e., if the current pointer value,
key, and modifier for autia yield a PAC that matches the
PAC embedded in the pointer. If authentication fails, the
pointer is invalidated such that a dereference or call using
the pointer will cause a memory translation fault. Dedicated
PA instructions are encoded in NOP space; older processors
without PA support will ignore them. For code pointers,
ARM has combined PA instructions that can do authentica-
tion and branching in one instruction, but these are not back-
wards compatible. For instance, the blra (Branch with Link
to Register, with pointer Authentication) instruction can be
used to implement an indirect function call using an authen-
ticated pointer.
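The create/authenticate cycle described above can be modelled in software. The following is a minimal sketch, not the hardware algorithm: HMAC-SHA256 stands in for QARMA, the PAC is kept in the top 24 bits of the pointer (a simplification of the layout in Figure 2), and the function names pac and aut are hypothetical stand-ins for pacia and autia.

```python
import hashlib
import hmac

VA_SIZE = 39                    # assumed virtual address width
PAC_BITS = 24                   # PAC width for 39-bit addresses without tagging
ADDR_MASK = (1 << VA_SIZE) - 1
PAC_SHIFT = 64 - PAC_BITS       # simplification: PAC lives in the top 24 bits


def _mac(key: bytes, address: int, modifier: int) -> int:
    """Stand-in for the hardware MAC: HMAC-SHA256 truncated to PAC_BITS."""
    msg = address.to_bytes(8, "little") + modifier.to_bytes(8, "little")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "little") & ((1 << PAC_BITS) - 1)


def pac(key: bytes, pointer: int, modifier: int) -> int:
    """Model of pacia: embed a PAC in the unused upper bits of the pointer."""
    address = pointer & ADDR_MASK
    return (_mac(key, address, modifier) << PAC_SHIFT) | address


def aut(key: bytes, signed_pointer: int, modifier: int):
    """Model of autia: return the stripped address, or None on mismatch.

    Real hardware does not return an error; it yields an invalid pointer
    that faults when dereferenced.
    """
    address = signed_pointer & ADDR_MASK
    if signed_pointer >> PAC_SHIFT != _mac(key, address, modifier):
        return None
    return address
```

In this model, authenticating with the modifier used at signing time returns the original address, while a different modifier or a modified address is rejected unless the truncated MACs happen to collide.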
Return address signing. Qualcomm’s return address sign-
ing scheme [34] is the first to make use of ARMv8-A PA. It
was first introduced in Linaro’s GCC toolchain, but has been
supported by mainline GCC since version 7.0². It thwarts
attacks that manipulate function return addresses through stack
corruption (see Section 2.1.1) by ensuring that the return ad-
dress in LR always contains a PAC when written to or re-
trieved from memory. Listing 1 shows an example.
² GCC return address signing and PA support is based on patches
provided by ARM, https://github.com/gcc-mirror/gcc/commit/06f29de13f48f7da8a8c616108f4e14a1d19b2c8
function:
    paciasp                 ; ① create PAC
    stp FP, LR, [SP, #0]    ; store LR
    ; ...
    ldp FP, LR, [SP, #0]    ; load LR
    autiasp                 ; ② authenticate
    ret                     ; return
Listing 1: Return address signing using PA. At function entry,
paciasp is used to create a PAC in LR (①). The value is then
authenticated with autiasp before return (②).
The instrumentation adds paciasp (①) at the beginning of the
function prologue, before the LR value is stored on the stack.
paciasp adds a PAC to LR using the current Stack Pointer (SP)
value as the modifier. Before function return, autiasp (②)
authenticates the pointer and either removes the PAC or
invalidates the pointer. An alternative is to use the combined
autiasp+ret instruction, retaa, but it is not backwards-
compatible with older processors.
The PAC cryptographically binds the return address to the
current SP value. It is valid only when authenticated using
the same SP value as on PAC creation. The goal is to limit
the validity of the PAC to the function invocation that created
it, thus preventing reuse of authenticated return addresses.
3 Attacks on Pointer Authentication
PA prevents an attacker from injecting or forging pointer val-
ues. This effectively prevents any attack that relies on cor-
rupting pointers, resisting even attackers with arbitrary ac-
cess to program memory.
To protect authenticated pointers, PA relies on the con-
fidentiality of process-specific PA keys and the immutabil-
ity (but not confidentiality) of PA modifier values. PA keys
are managed by the kernel and never revealed to user space.
Although the keys are used by PAC creation and authenti-
cation operations in user space, such operations take place
using dedicated PA instructions, and direct access to the PA
key registers is subject to hardware-enforced access controls.
Consequently, our adversary model (Section 4) assumes that
the attacker cannot read or modify the PA keys.
The modifier value used in computing a PAC can depend
on both static (e.g., a hard-coded value) and dynamic (e.g.,
the SP) information. We assume that the program code it-
self is not confidential and that the attacker can learn how
dynamic modifiers are generated and may infer their values.
PA also relies on the security of the underlying crypto-
graphic primitives. In particular, an attacker may attempt
to brute-force either the PA keys themselves, or individual
PAC values. Sophisticated adversaries may even attempt
cryptanalysis attacks based on known PAC values, or side-channel
attacks against the hardware circuitry for computing
PACs. The security of the QARMA block cipher has
already been analyzed [47, 26]. We leave the scrutiny of the
cryptographic building blocks outside the scope of this pa-
per. Nevertheless, the limited PAC size means that guessing
attacks are a potential concern. We discuss the feasibility of
brute-forcing PACs in Section 7.2.4. Assuming proper pre-
cautions for the lifetime of PA keys (see Section 2.2), we
do not consider guessing attacks the primary attack vector
against PA. However, the following concerns for the security
of PA-based defenses remain: 1) an attacker controlling the
creation of PAC values, or 2) an attacker reusing previously
authenticated pointers.
Malicious PAC generation. Attackers can potentially
control PAC values in three ways, by controlling:
1. the unauthenticated pointer value before PAC creation:
get an arbitrary authenticated pointer for any context
with the same modifier and PA key.
2. the PA modifier value: get an authenticated
pointer for a context with the same PA key, but with
an attacker-chosen modifier.
3. both: get arbitrary authenticated pointers for a context
with attacker-chosen modifier, and the same PA key.
To prevent the attacker from generating arbitrary authen-
ticated pointers, the program must not contain PA creation
instructions with attacker controlled inputs. Also, a control-
flow attack could be mounted by chaining together instruc-
tion sequences to prepare the PA operand registers with at-
tacker controlled input and then jump to a PA instruction at
another part of the program. This suggests that PA-based de-
fenses must provide, or be combined with, CFI guarantees
that prevent the use of individual authentication instructions
as attacker-controlled gadgets.
Reuse attacks. The attacker can read authenticated point-
ers (including PAC values), and later reuse them to either:
• rollback an authenticated pointer to a previous value, or
• substitute an authenticated pointer with another using
the same PA modifier.
For instance, in GCC’s return address signing scheme
(Section 2.2), the return address is bound to the location
of the stack frame by using the current SP value as the PA
modifier. However, the SP value is not necessarily unique
to a specific function invocation. Consequently, an attacker
can reuse the authenticated return address value from one
function when a different vulnerable function executes with
a matching SP value. Given that typical programs offer no
guarantees on the uniqueness of SP values between different
function invocations, this approach exposes a large attack
surface for pointer reuse attacks. Therefore, a concern for
any PA-based defense is partitioning authenticated pointers
into distinct classes based on different <PA key, modifier>
pairs.
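The substitution attack on SP-bound return addresses can be illustrated with a small model. This is a sketch under stated assumptions: HMAC-SHA256 truncated to 24 bits stands in for the hardware MAC, a signed pointer is represented as an (address, PAC) tuple, and all names, addresses, and the key are hypothetical.

```python
import hashlib
import hmac

KEY = b"hypothetical per-process PA key"  # stands in for a key register


def sign(address: int, modifier: int):
    """Return (address, PAC); HMAC-SHA256 truncated to 24 bits models the MAC."""
    msg = address.to_bytes(8, "little") + modifier.to_bytes(8, "little")
    return address, hmac.new(KEY, msg, hashlib.sha256).digest()[:3]


def authenticate(signed, modifier: int):
    """Return the raw address if the PAC verifies under modifier, else None."""
    address, pac = signed
    _, expected = sign(address, modifier)
    return address if hmac.compare_digest(pac, expected) else None


SP = 0x7FFFF000                    # both calls happen to run with this SP value
saved_by_f = sign(0x400123, SP)    # f() signs its return address on entry
saved_by_g = sign(0x400456, SP)    # g() later runs at the same stack depth

# The attacker overwrites g()'s saved return address with the value observed from f().
print(authenticate(saved_by_f, SP) == 0x400123)  # True: the substitution is accepted

# A forged address carrying g()'s PAC is rejected, barring a 24-bit MAC collision.
forged = (0x400999, saved_by_g[1])
print(authenticate(forged, SP))
```

The model shows why the modifier matters: the MAC blocks forged addresses, but any two pointers signed under the same key and modifier remain interchangeable.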
Attackers can reuse only those pointers they can observe
(as opposed to all possible values a function pointer can take).
Even with full read access to memory (and hence the ability
to observe any pointer value that has been generated so far),
attackers are still limited to authenticated pointer values the
program has already generated.
4 Adversary Model and Requirements
4.1 Pointer Integrity
Kuznetsov et al. [24] introduced the idea of code pointer in-
tegrity: ensuring precise memory safety for all code point-
ers in a program. Since control-flow attacks depend on the
manipulation of code pointers, guaranteeing code pointer in-
tegrity will render all control-flow attacks impossible [24].
The notion of pointer integrity is generalizable to both
code and data pointers. In Section 9.1, we provide a more
rigorous definition of pointer integrity. Intuitively, pointer
integrity aims to prevent unintentional changes to pointers
while they remain in program memory so that the value of a
pointer at the time it is “used” (e.g., dereferenced or loaded
from memory) is the same as when it was created or stored in
memory. In particular, integrity-protected pointers reference
the intended target objects. As explained in Section 2.1, all
control-flow attacks, all known DOP attacks and many other
data-oriented attacks rely on the manipulation of vulnerable
pointers. Consequently, ensuring pointer integrity will pre-
vent these attacks.
4.2 Attacker Capabilities
To reason about how effectively PA defends against state-
of-the-art attacks we assume attacker capabilities consistent
with prior work on run-time attacks (Section 2.1). Our adver-
sary model assumes a powerful attacker with arbitrary mem-
ory read and write capabilities restricted only by DEP. The
attacker can thus read any program memory and write to non-
code segments. We further assume that the attacker has no
control of higher privilege levels, i.e., an attacker targeting a
user space process cannot access the kernel or higher privi-
lege levels. Specifically, we assume that the attacker cannot
infer the PA keys, as they are in registers not directly read-
able from user space (Section 2.2). We discuss protection of
kernel code using PA in Section 10. The attacker’s ability to
read arbitrary memory precludes the use of randomization-
based defenses that cannot withstand information disclosure
(e.g., address space layout randomization [39] or software
shadow-stacks [1]). PA was specifically designed to remain
effective even when the entire memory layout of the victim
process is known.
4.3 Goal and Requirements
Our goal is to thwart control-flow and data-oriented attacks
by preventing the attacker from forging pointers used by a
vulnerable program. We identify the following requirements
that our solution should satisfy:
R1 Pointer Integrity: Detect/prevent the use of corrupted
code and data pointers.
R2 PA-attack resistance: Resist attempts to control PAC
generation, and pointer reuse attacks.
R3 Compatibility: Allow protection of existing programs
without interfering with their normal operation.
R4 Performance: Minimize run-time and memory over-
head and gracefully scale in relation to the number of
protected pointers and dereferences/calls.
5 Design
To meet our requirements (Section 4.3) we must solve a
number of challenges which we elaborate below:
• Instrument program with PA instructions (Section 5.1)
• Create PACs in statically allocated data (Section 5.2)
• Pointer compartmentalization (Section 5.3)
• On-load data pointer authentication (Section 5.4)
• Handling pointer conversions (Section 5.5)
5.1 Instrument program with PA instructions
To meet requirement R1, the program executable must be
instrumented with PA instructions to create and authenticate
PACs when needed. For this, we designed and implemented
Pointer Authentication Run-Time Safety (PARTS), a com-
piler enhancement that emits PA instructions to sign pointers
in memory as required. Specifically, it protects:
• return addresses;
• local, global and static pointers; and
• pointers in C structures.
Figure 3 shows the overall architecture of the PARTS-
enhanced compiler. PARTS analyzes the compiler’s inter-
mediate representation (IR) to identify any pointers used by
the program and then emits PA instructions at points in the
program where pointers are (a) created or stored in memory,
and (b) loaded from memory or used.
[Figure 3 diagram: source code passes through the Clang frontend and the LLVM optimizer and backend, both extended by PARTS, to produce the executable, which uses PARTSlib.]
Figure 3: The PARTS instrumentation is set up with compiler
modifications and utilizes a run-time support library.
Table 1: For code and data pointers PARTS uses a static PA modifier based on the pointer’s ElementType as defined by LLVM.
Return address signing uses a 48-bit function-id and the 16 most-significant bits of the SP value.
     Mechanism               Key      Modifier type     Modifier construction
  ①  Data pointer signing    Data A   static            type-id = SHA3(ElementType)
  ②  Code pointer signing    Instr A  static            type-id = SHA3(ElementType)
  ③  Return address signing  Instr B  dynamic + static  SP | function-id (function-id = compile-time nonce)
5.2 Create PACs in statically allocated data
Programs may contain pointers which are initialized by the
compiler, e.g., defined global variables. However, PAC values
for authenticated pointers cannot be calculated before
program execution, as PA keys are set only at program
launch. Consequently, initialized pointers in the program’s
data segment pose a challenge, as their values are normally
initialized by the linker and loaded into memory separately.
PARTS solves this problem by generating a custom initializer
function for pointers requiring PACs. At run-time, the
PARTS runtime library, PARTSlib, processes the relocated
variables and invokes the generated initializer function to ensure
that any defined pointers are furnished with a PAC.
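The run-time bootstrap can be pictured as a loop over the statically initialized pointers. The sketch below is a simplified model and not PARTSlib itself: the data segment is a dictionary, the slot list stands in for the compiler-generated initializer, and HMAC-SHA256 replaces the hardware MAC. All names and values are hypothetical.

```python
import hashlib
import hmac


def sign(key: bytes, address: int, type_id: int) -> int:
    """Model PAC creation: 24-bit truncated HMAC placed above a 39-bit address."""
    msg = address.to_bytes(8, "little") + type_id.to_bytes(8, "little")
    pac = int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest()[:3], "little")
    return (pac << 40) | address


def run_initializer(key: bytes, data_segment: dict, signed_slots: list) -> None:
    """Replace each listed raw pointer with its signed form, in place."""
    for name, type_id in signed_slots:
        data_segment[name] = sign(key, data_segment[name], type_id)


# The linker has already filled in raw addresses; the key exists only at run time.
segment = {"handler": 0x401000, "config": 0x602000, "counter": 7}
slots = [("handler", 0x11), ("config", 0x22)]  # hypothetical compiler-generated list
run_initializer(b"key chosen at exec", segment, slots)
```

Because the key is an input to sign, the signed values differ from one process to the next, which is why they cannot be emitted by the compiler or linker ahead of time.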
5.3 Pointer compartmentalization
As described in Section 3 the attacker may attempt to
reuse previously signed pointers. To meet requirement R2
PARTS therefore limits the scope of such reuse attacks
by compartmentalizing pointers in three different ways, as
shown in Table 1.
Code / Data Pointer Compartmentalization: Recall from
Section 2.2 that PA provides separate keys for data and
code pointers. Signing the two classes with different keys
prevents a signed data pointer from being reused as a code
pointer, and vice versa.
Run-time type safety: Pointer compartmentalization, while
effective, is coarse-grained. To address this, PARTS adds
run-time type safety for data and code pointers. Run-time
type safety records the pointer’s type by encoding it in the
PA modifier. Then, it checks that pointer dereferences or in-
direct calls take place using a pointer with a recorded type
that matches the type expected at the use site. PARTS as-
signs pointers a unique id, type-id, based on the pointer’s
LLVM ElementType which depends on the pointed-to data,
structure, or function signature. Two pointers are compatible
(have the same type-id) if their ElementType is the same.
PARTS uses a deterministic scheme, detailed in Section 6.1
and shown in Table 1, to calculate type-ids during compi-
lation. This ensures that separate compilation units generate
equivalent type-ids for compatible objects, and different
type-ids for non-compatible ones.
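A deterministic type-id of this kind can be sketched as a truncated hash of a textual type description. This is an illustration only: the use of SHA3-256, the truncation to the first 8 bytes, and the LLVM-style type strings are assumptions, and type_id is a hypothetical helper rather than the PARTS implementation.

```python
import hashlib


def type_id(element_type: str) -> int:
    """Derive a 64-bit id from a textual type description (illustrative)."""
    digest = hashlib.sha3_256(element_type.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")  # keep the first 64 bits


# Any compilation unit hashing the same description obtains the same id.
same = type_id("%struct.node") == type_id("%struct.node")
different = type_id("i32 (i8*)") != type_id("void (i32)")
print(same, different)
```

Because the id depends only on the type description, no coordination between compilation units is needed to agree on modifiers.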
Improved Return Address Signing: While run-time type
safety could also be applied for return addresses, it would
result in an over-permissive policy for backward edges. As
described in Section 3, binding the authenticated return ad-
dress to the current stack pointer value alone is insufficient
because the stack pointer may not be unique to a specific
function invocation. Instead, PARTS uses a combination of
the current stack pointer value, and a compile-time nonce
(function-id), ensuring that the authenticated return address
cannot be reused across invocations of different functions,
while the stack pointer value effectively compartmentalizes
return addresses to callers with different stack layouts.
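The combined modifier can be sketched as a simple bit-packing of the two components from Table 1. The field order (SP bits above the function-id) is an assumption made for illustration, and the helper takes the 16 SP bits as an already extracted value so that it does not commit to which bits of SP are selected.

```python
FUNCTION_ID_BITS = 48
SP_BITS = 16


def return_modifier(sp_field: int, function_id: int) -> int:
    """Pack a 16-bit SP field and a 48-bit function-id into a 64-bit modifier."""
    if not 0 <= sp_field < (1 << SP_BITS):
        raise ValueError("sp_field must fit in 16 bits")
    if not 0 <= function_id < (1 << FUNCTION_ID_BITS):
        raise ValueError("function_id must fit in 48 bits")
    return (sp_field << FUNCTION_ID_BITS) | function_id


# Two functions running with the same SP field still get distinct modifiers.
print(hex(return_modifier(0x1234, 0xABCDEF)))  # 0x1234000000abcdef
print(return_modifier(0x1234, 0xABCDEF) != return_modifier(0x1234, 0xABCDF0))  # True
```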
5.4 On-load data pointer authentication
Pointers with PACs can be authenticated either as they are
loaded from memory, or immediately before they are used.
We refer to these as on-load and on-use authentication, re-
spectively. Data pointers are often dereferenced frequently
without intervening function calls, i.e., they will not be
cleared after use. This allows the compiler to optimize mem-
ory accesses such that, for instance, temporary values might
never be written to memory. PARTS accommodates this be-
havior by only using on-load authentication for data point-
ers. The combined PA instructions can be used for on-use
authentication of code pointers, which are typically loaded
to a register, used once, and cleared. On-load authentication
always uses the standalone authentication instructions. An
attacker could attempt to exploit either the standalone au-
thentication or the separate pointer dereference by diverting
control flow to either. However, as mentioned in Section 3,
PA solutions must be combined with CFI guarantees, which
prevent this type of attack.
5.5 Handling pointer conversions
A data pointer to an object of a specific type may be con-
verted to a pointer to a different object type. When run-time
type safety is applied to authenticated pointers, special care
must be taken to not interfere with legitimate pointer conversions
to meet requirement R3. For instance, if a struct
pointer is cast to a pointer to its first field, it will change the
type-id and hence the expected PAC.
If the source and destination object types are compatible,
no special consideration is needed. If not, PARTS must convert
the authenticated pointer to the correct type-id. Because
data pointer PAC creation and authentication are done
at store/load, PARTS handles conversions by (a) on load,
validating and stripping the PAC
using the type-id of the original object, and (b) on store,
creating a new PAC using the destination object type-id.
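The load-then-store handling of conversions can be modelled as follows. This is a sketch under assumptions: a memory slot is an (address, PAC) tuple, HMAC-SHA256 truncated to 24 bits stands in for the hardware MAC, a failed authentication raises an exception instead of producing a faulting pointer, and all names and type-id values are hypothetical.

```python
import hashlib
import hmac

KEY = b"hypothetical data key"


def _pac(address: int, type_id: int) -> bytes:
    msg = address.to_bytes(8, "little") + type_id.to_bytes(8, "little")
    return hmac.new(KEY, msg, hashlib.sha256).digest()[:3]


def store_pointer(address: int, type_id: int):
    """On store: create a PAC bound to the type-id of the destination."""
    return address, _pac(address, type_id)


def load_pointer(slot, type_id: int) -> int:
    """On load: authenticate against the expected type-id and strip the PAC."""
    address, pac = slot
    if not hmac.compare_digest(pac, _pac(address, type_id)):
        raise ValueError("PAC authentication failed")
    return address


def convert(slot, src_type_id: int, dst_type_id: int):
    """Pointer conversion: validate under the source type, re-sign for the destination."""
    return store_pointer(load_pointer(slot, src_type_id), dst_type_id)


STRUCT_ID, FIELD_ID = 0x1111, 0x2222          # hypothetical type-ids
struct_ptr = store_pointer(0x5000, STRUCT_ID)
field_ptr = convert(struct_ptr, STRUCT_ID, FIELD_ID)
print(load_pointer(field_ptr, FIELD_ID) == 0x5000)  # True
```

Without the conversion step, loading struct_ptr with FIELD_ID would fail authentication even though the cast is legitimate.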
A pointer to a function of one type may be converted to a
pointer to a function of another type. However, the behav-
ior when calling a function pointer cast to a non-compatible
type is undefined [21, 6.3.2.3§8]. Hence, PARTS does not
need to convert the pointer’s PAC to match the destination
function’s type-id. If the converted pointer is converted
back, the result is expected to be the same as the original
pointer [21, 6.3.2.3§8]. PARTS satisfies this as it does not
modify the pointer’s PAC.
6 Implementation
The PARTS compiler is based on LLVM 6.0 but modifies
existing passes and adds new passes in the optimizer and the AArch64 backend
(Figure 4). The optimization passes (❶) generate necessary
metadata for PA modifiers, insert wrappers for compatibility
with legacy code, and prepare initializers for statically
allocated pointers. The AArch64 Frame Lowering
emits function prologues and epilogues and is modified to
include instructions for authenticating the LR value (❷). The
PARTS backend passes (❸) retrieve the PA modifiers and
instrument the program with the appropriate low-level instructions. The resulting
binary is linked with PARTSlib (❹), which at run-time creates
PACs for the initialized pointers.
6.1 LLVM Compiler Integration
While the LLVM 6.0 AArch64 backend recognizes PA in-
structions, they are not used by any pre-existing security fea-
ture. Our modifications consist of added optimizer and back-
end passes, minor modifications to the AArch64 backend,
and new PARTS-specific intrinsics. Where applicable, we
use optimizer passes that operate on the high-level LLVM
intermediate representation (IR). Nonetheless, much of the
needed functionality is PA-specific and thus implemented in
the backend that uses low-level LLVM machine IR (MIR),
and a register- and instruction set specific to 64-bit ARM.
Figure 4: PARTS architecture. (Diagram labels: source code, Clang Frontend, LLVM opt with PARTS opt-passes, LLVM IR, backend with AArch64 modifications and PARTS backend-passes, Machine IR, PARTSlib, executable; legend: new component, LLVM internal.)
Determining pointer type-id. The compiler backend
views the program from a low-level perspective, and the MIR
has lost much of the semantics present in C or the high-level
IR. Therefore, PARTS must determine type-ids during
its optimizer passes where this information is still available
(Figure 4, ¶). The type-id for data consists of a SHA-3 hash,
truncated to 64 bits, of the pointer’s LLVM ElementType.
The ElementType represents the IR level data type and
distinguishes between basic data types, but does not re-
tain typedef or other information from the frontend (i.e.,
clang). Code pointers use the same scheme wherein the
ElementType consists of the function signature at the same
abstraction level. The type-ids are passed to the backend
either via PARTS-specific compiler intrinsics, or by embed-
ding them as metadata in the existing IR instructions. The
AArch64 instruction selection retrieves the information from
the IR instructions and transfers it to the emitted MIR (Fig-
ure 4, ·). To facilitate the run-time bootstrap (Section 6.2)
PARTS also includes a pass that prepares a custom initial-
izer function that is called at run-time to generate PACs for
defined global pointers (Figure 4, ¶).
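As a rough illustration of the type-id derivation described above, the following sketch hashes a textual type description with SHA-3 and keeps 64 bits. The choice of SHA3-256, the decision to keep the leading 8 bytes, the string encoding of the ElementType, and the `TYPEDEFS` table are assumptions made for illustration; the text only states that a SHA-3 hash is truncated to 64 bits.

```python
import hashlib

# Hypothetical frontend aliases that are no longer visible at the IR level.
TYPEDEFS = {"my_int": "i32", "callback_t": "void (i8*)"}


def ir_element_type(frontend_type):
    """Resolve a typedef to its IR-level type; other names pass through."""
    return TYPEDEFS.get(frontend_type, frontend_type)


def type_id(element_type):
    """64-bit type-id: SHA3-256 of the IR type string, truncated to 8 bytes."""
    digest = hashlib.sha3_256(element_type.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")
```

Because the hash is taken over the IR-level type, `my_int` and `i32` map to the same type-id in this sketch, while function signatures are hashed in the same way as data types.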
Return address signing. Return address signing is implemented
in the AArch64 backend during frame lowering
(Figure 4, ·). Frame lowering emits the function prologues
and epilogues, and for non-leaf functions, emits instruc-
tions for storing and retrieving the LR value from the stack.
PARTS authenticates the value of the LR only if it was re-
trieved from the stack. The PAC modifier is based on the 16
least-significant bits of the SP value and a 48-bit function-
specific function-id. The function-id is guaranteed to
be unique within the current compilation unit or, with link
time optimization (LTO), the whole program. To avoid rep-
etition across different compilation units, the function-id
is generated using a pseudorandom, non-repetitive sequence.
MACRO movFunctionId Mod
    movk Mod, #func_id16, lsl #16
    movk Mod, #func_id32, lsl #32
    movk Mod, #func_id48, lsl #48
ENDM

function:
    mov Xd, SP              ; À get SP
    movFunctionId Xd        ; Á get id
    pacib LR, Xd            ; Â PAC
    stp FP, LR, [SP, #0]    ; store
    ; function body
    ldp FP, LR, [SP, #0]    ; load LR
    mov Xd, SP              ; Ä get SP
    movFunctionId Xd        ; Ã get id
    autib LR, Xd            ; Å auth
    ret
Listing 2: The PARTS return address signing binds the PAC
to the SP (À,Ä) and unique function id (Á,Ã). The PA modi-
fier is in register Xd during PAC creation (Â) and authentica-
tion (Å). The 48-bit func-id is split into three 16-bit parts,
each moved individually to Xd by left-shifting.
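The modifier layout produced by Listing 2 can be modeled arithmetically. The sketch below assumes the register semantics suggested by the listing: `mov Xd, SP` copies SP, and the three `movk` instructions then overwrite bits 16 to 63 with the function-id. The function name and the example values are hypothetical.

```python
MASK16 = 0xFFFF
MASK48 = (1 << 48) - 1


def return_modifier(sp, function_id):
    """64-bit PA modifier: 48-bit function-id above the low 16 bits of SP."""
    return ((function_id & MASK48) << 16) | (sp & MASK16)
```

One consequence visible in this model is that two stack pointer values that agree in their 16 least-significant bits yield the same modifier for a given function, since the upper SP bits are replaced by the function-id.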
MACRO movTypeId Mod
    mov  Mod, #type_id00
    movk Mod, #type_id16, lsl #16
    movk Mod, #type_id32, lsl #32
    movk Mod, #type_id48, lsl #48
ENDM

    mov cPtr, #instr_addr   ; load cPtr
    movTypeId Xd            ; ¶ get id
    pacia cPtr, Xd          ; · PAC
    ; no intermediate cPtr instrumentation
    movTypeId Xd            ; ¸ get id
    blraa cPtr, Xd          ; ¹ branch
Listing 3: The PARTS forward-edge code pointer signing
uses the code pointer’s type-id as the PA modifier (¶,¸).
The 64-bit type-id is split into four 16-bit parts. The
PAC is created only once when initially creating the code
pointer (·). Upon use, i.e., indirect call, the PAC is authenticated
using the combined branch-and-authenticate instruction
(¹). PARTS does not instrument intermediate store/load
operations.
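To make the four-part split concrete, the following sketch models how a 64-bit type-id could be broken into 16-bit immediates and reassembled in a register, in the spirit of the movTypeId macro. The helper names are hypothetical, and the model treats every step as a `movk`-style update that replaces one 16-bit field and leaves the others unchanged.

```python
def split_immediates(value, parts=4):
    """Split a value into 16-bit immediates, least-significant part first."""
    return [(value >> (16 * i)) & 0xFFFF for i in range(parts)]


def apply_movk(register, immediate, shift):
    """Model of movk: replace one 16-bit field, keep the remaining bits."""
    return (register & ~(0xFFFF << shift)) | ((immediate & 0xFFFF) << shift)


def load_type_id(type_id):
    """Rebuild the type-id in a zeroed register, one 16-bit part at a time."""
    register = 0
    for index, immediate in enumerate(split_immediates(type_id)):
        register = apply_movk(register, immediate, 16 * index)
    return register
```

The same helper with `parts=3` and shifts starting at 16 corresponds to the function-id case in Listing 2, where the lowest 16 bits are left to the SP value.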
Code pointer signing. PARTS uses the combined PA in-
structions for branches and converts branch instructions di-
rectly to their PA variants (Figure 4, ¸). The PAC for any
code pointer is created only once at the time of pointer cre-
ation, e.g., when the address of a function is taken. This is
instrumented by adding a PAC-creation instruction immedi-
ately after the instruction that moves a code pointer to a regis-
ter. Subsequent load and store operations do not authenticate
the signed code pointers; instead, they are authenticated only
on use.
Data pointer signing. As discussed in Section 5.4, it is
not feasible to perform on-use authentication for data point-
ers. Instead, we authenticate data pointers when they are
loaded from memory and create PACs before storing them.
In some cases, e.g., using globals, the IR will include ex-
plicit load and store operations that can be furnished with the
type-id. Our modified Instruction Selection then forwards
the type-id to the emitted MIR (Figure 4, ·). However,
stack-based store and load operations, in particular, are often
not present before the backend finalizes the stack-layout and
register allocation. Thus, some load and store instructions
must be instrumented solely in the backend.
While it would be possible to modify the AArch64 back-
end (e.g., register allocation), we have instead opted for a less
invasive approach. The PARTS backend pass (Figure 4, ¸)
finds load and store instructions in the MIR, and uses the
attached type-id for instrumentation. When the type-id
is not present, e.g., because the load and store is a register
spill, the type-id is fetched from surrounding code. For in-
stance, when instrumenting the store due to register spilling
a pointer variable, the correct type-id can be fetched from
the original load.
6.2 Run-time Bootstrap
Programs may contain pointers in statically allocated data,
i.e., pointers stored in global variables or static local vari-
ables. These are initialized by the compiler or linker, and
therefore cannot include PACs. The PARTSlib runtime li-
brary instead invokes the compiler generated custom PAC
initializer function at process startup. Our Proof-of-Concept
implementation invokes the PARTSlib bootstrap using com-
piler instrumentation that explicitly calls the functionality
when entering main.
Our current approach relies on LTO, because the initializer
function is created once for each optimization unit. An alter-
native is to use the C constructor feature supported by Clang
and GCC. The libc initialization will run all constructor
functions before invoking the program’s main function. The
order in which constructor functions are run is well-defined
only within the same translation unit. This means that pro-
grams that already use C constructors to run custom code
may interfere with the PA initialization routine. Therefore,
we aim to move support for PA directly to the dynamic linker
(see Section 10).
6.3 Instrumentation
PARTS uses only in-line instrumentation and does not re-
quire storage of separate run-time metadata. With the ex-
ception of the bootstrap process the original code structure
is thus largely unchanged. As discussed in Section 2.2, no
explicit error handling is added by PARTS; instead, an au-
thentication failure will set specific high-order bits in the
pointer, thus triggering a memory translation fault on sub-
sequent dereference or call using the pointer that failed au-
thentication. The high-order bits ensure that the fault is dis-
tinguishable as one caused by authentication failure. Our
code listings use two macros for setting up PA modifiers for
return address signing and type-id based PACs; these are
shown in Listing 2 and Listing 3.
Return address signing. The return address signing in-
strumentation is similar to GCC’s implementation [34] but
includes an added modifier (Listing 2). The function pro-
logue is instrumented such that it prepares the PA modifier
by moving the SP value (À) into a free register. The SP value is
combined with the function-id (Á) to form the PA modifier,
which is then used with the instruction B-key to create the PAC (Â). The
function-id is generated at compile-time using LLVM’s
random number generator, and is guaranteed to be unique
within the LLVM Module (i.e., the whole program, when
using link time optimization). The function epilogues (i.e.,
any part that ends with a return or a tail-call) are similarly
instrumented to generate the same PA modifier (Ã,Ä) and to
verify the PAC in the restored LR (Å).
Code pointer signing. PARTS instruments code pointers
only on creation and use (Listing 3). Specifically, when
a code pointer is initially created, PARTS will use the in-
struction A-key to create a PAC (·) based on the target
type-id (¶). The instrumentation will at no point re-
move the PAC from a code pointer. Instead, PARTS uses
the combined authenticate and branch instructions — e.g.,
blraa — to perform the branch directly on an authenticated
pointer (¹), again using the same PA modifier (¸).
Data pointer signing. All data pointer stores and loads are
instrumented such that a PAC is created immediately before
store and authenticated immediately after load (Listing 4).
When a data pointer is used, the instrumentation first sets up
the correct PA modifier, i.e., the type-id (À). The pointer is
then immediately authenticated using the modifier and data
A-key (Á); this also strips the PAC from the pointer. As long
as the data pointer resides in a register it can thus be used
without any performance overhead. PARTS creates PACs for
pointers immediately before store in the same manner, save
for the pacda instruction.
    ldr dPtr, [SP, #0]          ; load dPtr
    movTypeId Xd, #type_id      ; À get id
    autda dPtr, Xd              ; Á authenticate
    ; dPtr is directly usable
Listing 4: PARTS immediately authenticates data pointers
loaded from writeable memory. This is done by first loading
the type-id (À) and then verifying the PAC (Á).
7 Evaluation
We develop our Proof-of-Concept implementation of PARTS
on the ARMv8-A Base Platform Fixed Virtual Platform
(FVP), based on Fast Models 11.4, which supports version
8.0 to 8.4 of the ARMv8-A architecture [5]. At the time of
writing, the only PA-capable hardware is the Apple A12 and
S4 SoCs featuring ARMv8.3-A CPUs [2]. However, these
proprietary SoCs are, to the best of our knowledge, not avail-
able in development versions outside Apple. The FVP pro-
vides a software simulation of an ARMv8.3-A processor in
AArch64 mode, and is, to the best of our knowledge, the only
publicly available environment with ARMv8-A PA support.
7.1 ARMv8.3 Emulation and Software Stack
We use GNU/Linux with a 4.14 kernel, modified to support
PA. We modified the bootloader and kernel to activate
ARMv8-A PA and to allow key configuration during kernel
scheduling at Exception Level 1 (EL1 in Figure 5). Our
kernel modifications are based on Mark Rutland’s 2018 PA
patches (footnote 3).
During system boot, the PA setup proceeds as follows. First, as
the PA feature is turned off by default, it needs to be activated
through the SCR_EL3 system control register by the
ARM Trusted Firmware, i.e., at Exception Level 3 (EL3).
Second, in order to support kernel (EL1) or hypervisor (EL2)
use, the hardware by default traps the use of PA and the set-
ting of PA keys to EL3. As a consequence, we must release
the trapping of PA instructions to these exception levels (bits
SCR_EL3_API, HCR_EL2_API in the SCR/HCR registers re-
spectively) as well as the trapping of PA key writing (bits
SCR_EL3_APK, HCR_EL2_APK). The PA key management in
the Linux kernel can only be performed with these precondi-
tions.
Footnote 3: https://lwn.net/Articles/752116/
Figure 5: The trapping of PA configuration must be released ¶, in order to allow the kernel to manage the PA keys on process creation and context switches ·. Faults generated by failed authentications will be trapped by the kernel ¸. (Diagram labels: source, PARTS compiler, binaries with PARTS at EL0 user space; EL1 kernel with scheduler, mm_context_t (1/task), and key reg. bank (1/core); EL2 hypervisor; EL3 ARM trusted FW.)
The kernel scheduler is modified to dynamically determine
whether PA is enabled. PA keys for each task are
stored in a process-specific mm_context_t structure (in
the process’ memory descriptor in the kernel) which contains
architecture-specific data related to the process address
space. Threads within the same process have a common
memory descriptor, and thus share the same PA keys. The
scheduler will configure the PA key registers using the keys
in the process’ memory descriptor whenever a task is sched-
uled to run. When a new child process is forked, the parent’s
keys are duplicated to the child’s memory descriptor. How-
ever, when a new executable file is exec’d in the context of
an existing process, the kernel initializes a new set of PA keys
using get_random_bytes(). In other words, each new pro-
cess receives a new set of PA keys which remain unchanged
thereafter.
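The key life cycle described above (keys copied on fork, regenerated on exec) can be summarized in a toy model. This is not kernel code: the `Process` class and its method names are hypothetical, `os.urandom` stands in for the kernel's get_random_bytes(), and the five 128-bit key names follow the ARMv8.3-A key registers.

```python
import os
from dataclasses import dataclass, field

# ARMv8.3-A defines five 128-bit PA keys (two instruction, two data, one generic).
KEY_NAMES = ("APIA", "APIB", "APDA", "APDB", "APGA")


def fresh_keys():
    """Generate a new random key set, as done when an executable is exec'd."""
    return {name: os.urandom(16) for name in KEY_NAMES}


@dataclass
class Process:
    """Toy stand-in for the per-process memory descriptor holding PA keys."""
    keys: dict = field(default_factory=fresh_keys)

    def fork(self):
        # The child receives a copy of the parent's keys.
        return Process(keys=dict(self.keys))

    def execve(self):
        # A new executable image gets a new, independent key set.
        self.keys = fresh_keys()
```

In this model, a forked child can authenticate pointers signed by its parent until it calls `execve`, which is also why pre-forked workers share keys with their siblings.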
7.2 Security Evaluation
PARTS provides a practical realization of Pointer Integrity.
PARTS ensures: 1) that code pointers in indirect branches
are always authenticated using the combined branch and au-
thenticate instructions, and 2) that a data pointer dereference
is always preceded by a PA authentication, if the data pointer
was read from memory. We evaluate the security properties
of PARTS and demonstrate its practical efficacy in prevent-
ing existing attacks.
7.2.1 Return address signing
Return address signing in both GCC [34] and PARTS prevents
an attacker from introducing forged return addresses to
the program stack ( R1 ). PARTS further narrows the scope
for reuse attacks compared to the return address signing in
GCC ( R2 ). Recall that return address signing in GCC [34]
determines a function’s execution context solely using the SP
value. Therefore, it falls short of compartmentalizing return
addresses to individual function invocations in cases where
the value of the SP coincides between different function in-
vocations. The likelihood of SP values coinciding depends
on many factors, including the order in which functions are
called and the stack frame sizes of functions. Determining the
possible SP value collisions in an arbitrary program’s call
graph, in other words the susceptibility of the program to-
wards return address reuse attacks, would require an exhaus-
tive search through all the program’s potential stack states.
Compared to GCC, PARTS augments the PA modifier
used for return address signing by combining a function-
specific identifier with the SP value. As a result, PARTS
return address signing precludes the possibility of reuse of
the return address between different functions, irrespective
of SP value collisions. It remains susceptible to pointer reuse
between distinct invocations of the same function from call
sites with the same SP value.
7.2.2 Forward-edge code pointer signing
As with PARTS return address signing, forward-edge code
pointer signing prevents an attacker from using forged code
pointers injected into program memory ( R1 ). This prevents
a large class of attacks (e.g., typical ROP/JOP gadgets) that
rely on redirecting the control flow to code in the middle
of functions, i.e., addresses that never were valid targets of
benign control-flow transfers.
PARTS restricts forward-edge code pointer reuse by en-
forcing run-time type safety for signed pointers ( R2 ). Un-
der this scheme, pointers used in a pointer reuse attack must
share the same type-id (i.e., have a matching type on the
LLVM IR level). This prevents large classes of function-
reuse attacks. The solution is compatible with common pro-
gramming patterns involving function pointers ( R3 ), such
as callbacks, but allows reuse between code pointers to func-
tions with identical type signatures.
7.2.3 Data pointer signing
PARTS data pointer signing protects all data pointers and
prevents an attacker from loading a forged data pointer to
program memory ( R1 ). This prevents all non-control data
attacks that rely on corrupting data pointers to unintended
parts of memory. This class of attacks includes all currently
known DOP attacks [18].
PARTS restricts data pointer reuse by enforcing run-time
type safety also for data pointers ( R2 ). Reuse attacks would
be more useful to an attacker if they could substitute a vulner-
able pointer with one referencing an object of different size
or type. Therefore restricting pointer substitution based on
the pointer’s type restricts the attacker’s capability to cause
unintended data flows within the program. However, pointer
conversions are a challenge for data pointer integrity. As
discussed in Section 5.3, PARTS accommodates data point-
ers that are cast from type A to an incompatible type B by
writing the converted pointer using the type-id of B. This
may expand the effective set of reusable pointers under our
threat model; the attacker can record pointers of type A and
reuse them at PAC conversion site A → B, thereby obtaining
a pointer of type B to an object of type A. This converted
pointer can then be used at de-reference sites that require
pointers of type B. If the program also includes a conversion
from B to A this makes both types interchangeable.
PARTS data pointer integrity does not guarantee spatial
safety of pointer accesses to data objects, nor does it address
the temporal safety (e.g., prevent use-after-free conditions).
ARMv8-A PA does not provide facilities to directly address
these challenges. We discuss orthogonal schemes that can
be used in combination with PARTS to provide spatial and
temporal safety guarantees in Section 8.
7.2.4 PAC entropy
As explained in Section 3, the PAC size b is a concern for any
PA-based scheme. On typical AArch64 Linux systems, b is
between 16 and 24. To succeed with probability p, a PAC
guessing attack requires $\frac{\log(1-p)}{\log(1-2^{-b})}$ guesses on the assumption
that a PAC comparison failure leads to program termination.
On our simulator setup where b = 16, achieving a
50%-likelihood for a correct guess requires 45425 attempts.
Note that ROP/DOP attacks require an environment where
a set of jumps (gadgets) can be set up, each requiring a separate
PAC to be broken. Consequently, the success probability of
a complete attack will decrease exponentially with the num-
ber of jumps necessary.
Pre-forked or multithreaded programs will share the same
PA key between the parent and all sibling threads/processes.
This could allow an attacker to brute force a PAC by target-
ing a sibling, if PAC failure on a sibling does not result in the
termination (and hence PA key reset) of all threads/processes
sharing the same PA key. In this scenario, $2^{b-1}$ guesses on
average are enough to guess a b-bit PAC (32768 guesses for
b = 16). Multithreaded / pre-forking applications could be
hardened against guessing attacks by requiring a full appli-
cation restart if the number of unexpected terminations of
child threads/processes exceeds a pre-defined threshold.
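The guessing estimates in this section can be checked numerically. The sketch below evaluates the two expressions from the text for a PAC of b bits: the number of independent guesses needed to succeed with probability p when every failure terminates the program, and the average number of guesses when the key survives failures. The function names are illustrative, and the chain estimate assumes that each PAC in a gadget chain must be guessed independently.

```python
import math


def guesses_with_termination(p, b):
    """Guesses needed for success probability p when each failure resets the key."""
    return math.log(1.0 - p) / math.log1p(-(2.0 ** -b))


def average_guesses_shared_key(b):
    """Average guesses when failures do not reset the key (e.g., sibling processes)."""
    return 2 ** (b - 1)


def chain_success_probability(b, pacs_needed):
    """Probability of guessing every PAC in a chain correctly on one attempt."""
    return 2.0 ** (-b * pacs_needed)
```

For b = 16 and p = 0.5, the first function evaluates to slightly more than 45425, and the second returns 32768, matching the figures quoted in the text.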
7.3 Performance Evaluation
The FVP processor, peripheral models, and micro-architectural
fabric are simplified. Consequently, timing on
the FVP model differs from actual hardware. The ARM Fast
Models documentation states that “all instructions execute
in one processor master clock cycle”. We confirm this behavior
for PA instructions in the FVP by using microbenchmarks
that allow PA instructions to be timed in isolation
(Section 7.3.1). As a result, we cannot use the FVP to estimate
the expected run-time overhead of PARTS. Instead,
we estimate the execution time of PA instructions and develop
a PA-analogue that emulates the run-time cost of PA
instructions (Section 7.3.2). We then run large-scale benchmarks
on real (non-PA) hardware using our PA-analogue
(Section 7.3.4).
00000000000008b0 <start>:
 8b0: eb0a011f   cmp x8, x10          ; À
 8b4: 540000aa   b.ge 8c8 <exit>
 8b8: dac10109   pacia x9, x8         ; Á
 8bc: dac11109   autia x9, x8         ; Â
 8c0: 91000508   add x8, x8, #0x1
 8c4: 17fffffb   b 8b0 <start>
Listing 5: Minimal loop for timing PA instructions on FVP.
Results and documentation indicate that all instructions require
only one cycle, i.e., in this case adding two PA instructions
among the four non-PA instructions incurs a 50% overhead.
7.3.1 Confirming simulator behavior
To measure PA instruction performance on the FVP we use a
hand-crafted assembly loop with a pacia (Á) and autia (Â)
instruction (Listing 5). We configured this loop to execute
$10^7$ times (À) and measured run time by reading the
cntvct_el0 register, which provides access to a timer clocked
at 100MHz. We then compared the timing with and without
the PA instructions (Á,Â). To exclude potential differences
between different host machines we took measurements with
FVP rate limiting enabled on the following underlying host
CPUs: i7–8700K, i7–7600U and i7–7500U. In all cases, we
observed an overhead of 50.00%, which is consistent with
the assumed PA instruction behavior on the FVP.
7.3.2 PA-analogue
From [6, Table 8] we can deduce that on a (1.2GHz) mobile
core, the PAC is computable with an approximate overhead
of 4 cycles, without accounting for the potential speed ben-
efits of opportunistic pipelining or the inclusion of several
parallel PAC computing engines per core. For simplicity, we
assume equal cycle counts for all PA instructions. Based on
this assumption we construct a PA-analogue (Listing 6) as
a proxy to measure overhead of PA instrumentation on non-
PA CPUs: it consists of four exclusive-or (eor) operations to
account for the 4 cycles. The final eor operates on the modi-
fier and SP to enforce a memory read/write dependency, thus
preventing the CPU pipeline from arbitrarily delaying the op-
erations. We have confirmed that our PA-analogue exhibits
the expected overhead using our microbenchmarks.
    eor Xptr, Xptr, #0x2    ; spend cycles
    eor Xptr, Xptr, #0x3    ; to approximate
    eor Xptr, Xptr, #0x5    ; PA instruction
    eor Xptr, Xptr, Xmod    ; overhead
Listing 6: PA-analogue simulating PA instructions
7.3.3 Expected overhead
Based on expected PA instruction cost we can estimate
micro-level overhead ( R4 ) of PARTS. In particular, we can
estimate which factors will contribute to the overhead for
specific PARTS features. For a single PA instruction, our instrumentation
overhead consists of four move instructions to
prepare the PA modifier, and a PA instruction. In modern
ARM Cortex-A processors, certain movk/movk pairs (e.g.,
those used by PARTS to prepare the PA modifiers in Listings 2, 3,
and 4) can be executed with one-cycle execute latency and
four-instruction/cycle execution throughput [4]. Therefore,
we estimate the cost of a single PAC creation or authentication
to be between 6 and 8 cycles. Based on microbenchmarks
similar to Listing 5, we have confirmed that our instrumentation,
using our PA-analogue, causes micro-level overheads
in line with these estimates. However, the proportional overhead
depends on the specific program and enabled PARTS
features.
Return address signing. Each non-leaf function call re-
quires two PA instructions for storing and loading the re-
turn address, for an estimated run-time cost of 12 to 16 cy-
cles. The total overhead introduced by PARTS return ad-
dress signing, therefore, scales linearly with the number of
non-leaf function invocations.
Forward-edge code pointer integrity. Code pointers are
instrumented only on initial pointer creation, and on all sub-
sequent indirect calls using the pointer. For a typical pro-
gram, the expected overhead will largely consist of authenti-
cated indirect calls; each with an estimated 6 to 8 cycle cost.
Similarly, any new code pointers created at run-time incur
an estimated 6 to 8 cycle overhead when created.
Data pointer integrity. All data pointer loads and stores,
to/from any memory, are instrumented; each with an esti-
mated overhead of 6 to 8 cycles. In particular, this includes
memory stores and loads where intermediary values are tem-
porarily written to the stack (e.g., during function calls).
Compiler optimizations generally strive to minimize inter-
mediate stores into memory, and as a result also reduce the
number of PARTS on-load authentications. In other words,
the performance overhead incurred by PARTS data pointer
integrity scales linearly with the number of stores and loads,
rather than the number of pointer dereferences. We estimate
the worst case impact of PARTS data pointer integrity on
synthetic performance benchmarks in Section 7.3.4 by dis-
abling all compiler optimizations.
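The per-feature estimates above combine into a simple linear cost model. The sketch below is a back-of-the-envelope aid rather than a measurement tool: the function and parameter names are hypothetical, and it assumes every PAC creation or authentication costs between 6 and 8 cycles, as estimated in this section.

```python
PA_OP_CYCLES = (6, 8)  # estimated (low, high) cost of one PAC creation or authentication


def estimated_overhead_cycles(nonleaf_calls=0, indirect_calls=0,
                              code_ptr_creations=0, data_ptr_loads=0,
                              data_ptr_stores=0):
    """Return (low, high) estimated added cycles for the given event counts."""
    pa_ops = (2 * nonleaf_calls      # sign in prologue, authenticate in epilogue
              + indirect_calls       # authenticated branch at each call site
              + code_ptr_creations   # one PAC creation per new code pointer
              + data_ptr_loads       # authenticate after each pointer load
              + data_ptr_stores)     # sign before each pointer store
    low, high = PA_OP_CYCLES
    return pa_ops * low, pa_ops * high
```

For example, `estimated_overhead_cycles(nonleaf_calls=1)` gives the 12 to 16 cycle range quoted for return address signing, and the data pointer terms grow with the number of pointer loads and stores rather than with the number of dereferences.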
7.3.4 nbench-byte benchmarks
For our performance evaluation we use the Linux nbench-byte
2.2.3 synthetic benchmark (footnote 4) designed to measure CPU
and memory subsystem performance, providing a reasonable
prediction of real-world system performance (footnote 5). We follow
work such as [7, 11, 25, 36, 40] and use nbench rather
than the SPEC CPU standardized applications benchmarks
for our evaluation, as nbench allows us to verify the functionality
of PARTS instrumentation with manageable simulation
times on the FVP. The current version of the SPEC CPU
benchmark suite, SPEC CPU2017 (footnote 6), has replaced many tests
in the previous, now retired SPEC CPU2006 (footnote 7) with significantly
larger and more complex workloads (up to ~10X
higher dynamic instruction counts). As a result, the SPEC
simulation times on the FVP proved to be unmanageable; for
example, running individual SPEC benchmarks takes hours to
days to complete on the FVP. This is a challenge for both re-
searchers and industry practitioners who rely on hardware
simulation for evaluation [33]. We report our results for a
subset of SPEC CPU2017 tests in Appendix B.
The nbench benchmarks include 10 different tests. We
adopt the same methodology as Brasser et al. [7] and run
each test a constant number of iterations for the following
cases: a) uninstrumented baseline b) each PARTS scheme
(return address signing, forward-edge code pointer integrity,
and data pointer integrity) enabled individually, and c) all
schemes enabled simultaneously. All benchmarks are
compiled with LLVM 6.0, but using different switches
to enable measured PARTS features. Compiler optimiza-
tions were disabled for all tests. The tests were performed
on a 96boards Kirin 620 HiKey (LeMaker version) with
an ARMv8-A Cortex-A53 octa-core CPU (1.2GHz) / 2GB
LPDDR3 SDRAM (800MHz) / 8GB eMMC, running the
Linux kernel v4.18.0 and BusyBox v1.29.2. Figure 6 shows
the results, normalized to the baseline. A more detailed de-
scription can be found in Appendix A.
Return address signing incurs a negligible overhead of
less than 0.5%. This is expected because the estimated per-
function overhead of 12 to 16 cycles is typically small com-
pared to the full execution time of the instrumented func-
tion. The same holds for indirect calls (6-8 cycle overhead at
the call site), although indirect calls are underrepresented in
nbench-byte. However, our microbenchmarks for the code
pointer integrity instrumentation indicate that the estimate
of a 6 to 8 cycle overhead per indirect function call (Section 7.3.3)
is reasonable under the assumed QARMA performance.
Footnote 4: http://www.math.utah.edu/~mayer/linux/bmark.html
Footnote 5: http://www.math.utah.edu/~mayer/linux/byte/bdoc.pdf
Footnote 6: https://www.spec.org/cpu2017/
Footnote 7: https://www.spec.org/cpu2006/
Figure 6: nbench benchmark results.
(a) Results of instrumented nbench-byte tests, normalized to a non-instrumented baseline. [Bar chart; y-axis: normalized overhead, 0.9 to 1.5; x-axis: the ten nbench tests; series: return address signing, forward-edge code pointer signing, data pointer signing, all enabled.]
(b) Run-time count of executed locations instrumentable by PARTS. Because the program’s memory profile affects performance, the benchmark results clearly correlate with observed memory use (e.g., FP emulation has a large data pointer integrity overhead because it uses many data pointers). [Bar chart; y-axis: event count; bar labels as recovered from the chart:]
Test               Return address signing   Forward-edge code pointer signing   Data pointer signing
Numeric sort       1.8E+03                  1.5E+01                             3.0E+08
String sort        4.0E+06                  1.5E+01                             1.8E+08
Bitfield           5.7E+03                  1.5E+01                             1.0E+08
FP emulation       6.2E+05                  1.5E+01                             5.9E+08
Fourier            5.2E+06                  1.5E+01                             2.8E+04
Assignment         2.3E+05                  1.5E+01                             1.9E+08
Idea               1.6E+06                  1.5E+01                             2.0E+08
Huffman            1.8E+04                  1.5E+01                             3.4E+08
Neural net         3.6E+05                  1.5E+01                             7.8E+02
Lu decomposition   1.9E+04                  1.5E+01                             1.9E+08
Data pointer integrity, as mentioned (Section 7.3.3), de-
pends largely on the memory profile of the instrumented pro-
gram. For instance, the floating point emulation test exten-
sively handles data pointers, resulting in a 39.5% overhead.
In contrast, the Fourier and neural network benchmarks con-
tain no data pointers and thus incur no discernible overhead.
The geometric mean of the overhead of the combined instru-
mentation for all tests is 19.5%.
7.4 Compatibility Evaluation
Based on our evaluation, PARTS is compatible with standard
C code ( R3 ). However, the presence of PACs in protected
pointers may interfere with code that expects a particular
pointer layout. This is a limitation of any PA-based solu-
tion. Because return address signing only affects the instru-
mented function, it can be safely applied without interfering
with the operation of other parts of programs, or uninstru-
mented code.
PARTS forward-edge code pointer integrity and data
pointer integrity can be safely applied to complete code
bases. However, if PARTS is applied only to a partial
code base, the instrumented code interfacing with non-instrumented
(legacy) libraries requires special consideration.
In particular, pointers used by both instrumented
and uninstrumented code cannot be passed directly between
them. We discuss solutions for backwards compatibility with
legacy libraries in Section 10.
We encountered no compatibility issues with PARTS dur-
ing our performance evaluation with nbench (Section 7.3).
8 Related Work
Code-pointer integrity (CPI) [24] protects access to code
pointers, and data pointers that may point to code pointers,
by storing them in a disjoint area of memory, the SafeStack (footnote 8).
The SafeStack itself must be protected from unauthorized
access. Randomizing the location of the SafeStack
is efficient [23], but easily defeated by an attacker who can
read arbitrary memory. Stronger protection of the SafeS-
tack using hardware-enforced isolation or software-isolation
incurs an average performance overhead of 8.4% or 13.8%
in SPEC CPU2006 benchmarks. Code-Pointer Separation
(CPS) [24] is a variant of CPI that only secures code point-
ers to achieve reduced run-time overhead. CPS implemented
using hardware-enforced segmentation or information hiding
incurs a performance overhead in the order of 1.8 to 2.2%.
Footnote 8: https://clang.llvm.org/docs/SafeStack.html
Protecting pointers using cryptography. Prior cryptographic
defenses against run-time attacks generally assume
the attacker cannot read memory. PointGuard [13] instruments
a program to apply a secret XOR mask to all pointer
values. This prevents an attacker from reliably forging
pointer values without knowledge of the mask. Data randomization
[8] extends data masking to cover all data in
memory. It uses static points-to analysis and distinct masks
to partition memory accesses in separate classes. Both
PointGuard and data randomization rely on the secrecy of
the XOR mask, but store their secrets within the process’
address space. Neither PointGuard nor data randomization
remain effective under our threat model.
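The weakness of XOR masking under a memory-disclosure adversary can be illustrated with a minimal sketch. It is not PointGuard's implementation: the names `MASK`, `encode`, and `decode` and the example addresses are hypothetical. It only shows that one masked pointer with a known target reveals the mask.

```python
import secrets

POINTER_BITS = 64
# Hypothetical per-process secret; in the schemes above it resides in
# the process' own address space.
MASK = secrets.randbits(POINTER_BITS)


def encode(pointer: int) -> int:
    """Mask a pointer before it is stored in memory."""
    return pointer ^ MASK


def decode(stored: int) -> int:
    """Unmask a pointer after it is loaded from memory."""
    return stored ^ MASK


# Normal operation: the round trip restores the original pointer.
target = 0x0000_7FFF_DEAD_BEE0
stored = encode(target)
assert decode(stored) == target

# An attacker who reads `stored` and knows (or guesses) `target`
# recovers the mask with a single XOR and can forge any pointer.
recovered_mask = stored ^ target
forged = 0x0000_7FFF_0BAD_C0DE ^ recovered_mask
assert decode(forged) == 0x0000_7FFF_0BAD_C0DE
```

An attacker with arbitrary read access does not even need a known pointer value, since the mask itself can be read from memory.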
Similarly to ARMv8-A PA, Cryptographic CFI
(CCFI) [27] uses MACs to protect control-flow data,
such as return addresses, function pointers, and vtable
pointers. Like PARTS, CCFI uses a function’s type signature to
separate function pointers into distinct protection domains,
but does not protect function pointers embedded in C
structures. The use of MACs gives CCFI and PA several
advantages over traditional CFI approaches: it prevents
attackers from introducing non-authenticated pointers into
program memory, and it allows separating pointers into
different protection domains based on static or run-time
characteristics, which enables finer-grained separation
of sensitive pointers than stateless CFI. Unlike PA, CCFI
benefits from hardware acceleration only for the AES
operations used to compute MACs, resulting in a high
performance overhead (52% on average in SPEC CPU2006
benchmarks). In contrast, PARTS benefits from
hardware-accelerated checks by using ARMv8-A PA instructions,
and protects both code and data pointers, including pointers
embedded in C structures.
Hardware-assisted mechanisms. Various hardware-
assisted defenses are described in research litera-
ture [16, 46, 45, 15, 41, 43, 31, 35]. CHERI [46] is a
hardware-assisted memory capability model for the 64-bit
MIPS IV ISA that adds new instructions allowing byte-
granularity enforcement of memory accesses. A memory
capability is unforgeable and grants access to a certain
memory range. CHERI can support a number of protection
models, such as pointer safety [46] and software
compartmentalization [45, 43]. At the time of writing, CHERI has
only been realized as a soft microprocessor prototype on a
64-bit MIPS FPGA. Hardware-Assisted Data-flow Isolation
(HDFI) [41] is a tagged memory extension for the RISC-V
instruction set architecture that provides instruction-level
granularity isolation and the ability to enforce a variety of
security models (including pointer integrity). HDFI is effi-
cient (< 2% overhead) but only supports two simultaneous
protection domains.
Only a few commercial processors, such as the SPARC
M7⁹, support tagged memory, which can be used to realize
a variety of security models (including pointer integrity).
ARM recently announced support for memory tagging in the
ARMv8.5-A architecture¹⁰. It enforces that all accesses to
memory must be made via a pointer with the correct tag.
Pointer tags use the existing address tagging feature in the
ARM ISA that partly overlaps with the bits used to store
PACs, meaning that enabling both features simultaneously
reduces the available PAC size by eight bits.
⁹ https://swisdev.oracle.com/_files/What-Is-ADI.html
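The eight-bit reduction can be made concrete with a small calculation. The sketch below assumes that the PAC occupies the pointer bits above the virtual address, except bit 55, and that address tagging claims the top byte. The helper `pac_bits` and the 39-bit virtual address size are illustrative assumptions.

```python
def pac_bits(va_size: int, address_tagging: bool) -> int:
    """Number of bits available for a PAC in a 64-bit pointer.

    Assumes bit 55 is reserved for selecting the address range and,
    with address tagging enabled, bits 63..56 hold the tag.
    """
    available = 64 - va_size   # bits above the virtual address
    available -= 1             # bit 55 is not used for the PAC
    if address_tagging:
        available -= 8         # top byte is taken by the tag
    return available


# Illustrative configuration: 39-bit virtual addresses.
print(pac_bits(39, address_tagging=False))
print(pac_bits(39, address_tagging=True))
```

Under these assumptions a 39-bit address space leaves 24 bits for the PAC without tagging and 16 bits with tagging.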
Hardware-assisted memory tagging is designed primar-
ily as a statistical debug aid against use-after-free and other
temporal memory errors. Hardware-Assisted AddressSan-
itizer (HWASAN) [37] is an AArch64-specific compiler-
based tool that builds upon AddressSanitizer (ASAN) — a
memory-error detector popular for vetting memory safety
bugs during software testing. ASAN can detect both spatial
and temporal memory errors. HWASAN can leverage
hardware tagged memory, such as SPARC ADI and the upcoming
ARMv8.5-A memory tagging, to reduce the performance overhead
associated with managing tagged memory checks in software.
ASAN and HWASAN are complementary to PARTS, as they provide
spatial and temporal safety for data accesses via pointers.
Intel Memory Protection Extensions (MPX) is a hardware
feature for detecting spatial memory errors that debuted in
the Intel Skylake microarchitecture. MPX is similar to the
software-based SoftBound [29] and its hardware-based
predecessor [16]. Although Intel MPX is a hardware-assisted
approach specifically designed to provide spatial memory
safety guarantees, it is not faster than software-based ap-
proaches [32]. It can cause up to 4x slowdown in the worst
case with an average run-time overhead of 50%. It also suf-
fers from other shortcomings, such as the lack of support for
multithreading and several common C/C++ idioms. GCC
has dropped support for MPX altogether¹¹.
Control-flow integrity. Carlini et al. [9] define fully-
precise static CFI as follows: “An indirect control-flow
transfer along some edge is allowed only if there exists a
non-malicious trace that follows that edge.” In other words,
fully-precise static CFI enforces that execution follows a
CFG that contains an edge if and only if that edge is exer-
cised by intended program behavior. Fully-precise static CFI
is thus the most restrictive stateless policy possible without
breaking intended functionality. To date, there exists no
implementation of fully-precise CFI; all practical
implementations are limited by the precision of CFGs obtained
through static control-flow analysis.
Carlini et al. further show that all stateless CFI schemes,
including fully-precise static CFI, are vulnerable to
control-flow bending: attacks where each control-flow transfer
is within a valid CFG, but where the program execution trace
conforms to no feasible benign execution trace. For instance,
in a stateless policy such as fully-precise static CFI, the best
possible policy for return instructions (i.e., backward edges
in the CFG) is to allow return instructions within a function
F to target any instruction that follows a call to F. In other
words, fully-precise static CFI checks if a given control-flow
transfer conforms to any of the known control-flow transfers
from the current position in the CFG, and does not
distinguish between different paths in the CFG that lead to a
given control-flow transfer.
¹⁰ https://community.arm.com/processors/b/blog/posts/arm-a-profile-architecture-2018-developments-armv85a
¹¹ https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=261304
The seminal work on stateful CFI [1] combines the restric-
tion of indirect call instructions to valid targets within the
CFG with a shadow call stack to enforce integrity of return
addresses stored on the call stack. The shadow stack main-
tains a shadow copy of each return address on the call stack
in a separate region of memory the attacker cannot access.
Each return instruction is then instrumented to validate that
the return addresses on the call stack and shadow stack match.
This ensures that each return only returns to its correspond-
ing call site.
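The shadow stack check can be sketched as a small model. This is an illustration rather than the instrumentation of [1]: the `Machine` class and its method names are hypothetical, and the isolation of the shadow stack from the attacker is simply assumed.

```python
class ShadowStackViolation(Exception):
    """Raised when the call stack and shadow stack disagree."""


class Machine:
    def __init__(self):
        self.call_stack = []    # writable by the attacker in this model
        self.shadow_stack = []  # assumed inaccessible to the attacker

    def call(self, return_address: int) -> None:
        """Push the return address onto both stacks."""
        self.call_stack.append(return_address)
        self.shadow_stack.append(return_address)

    def ret(self) -> int:
        """Pop both stacks and validate that the addresses match."""
        address = self.call_stack.pop()
        expected = self.shadow_stack.pop()
        if address != expected:
            raise ShadowStackViolation(f"{address:#x} != {expected:#x}")
        return address


machine = Machine()
machine.call(0x1000)
machine.call(0x2000)
assert machine.ret() == 0x2000       # benign return passes the check

machine.call_stack[-1] = 0xBAD       # attacker overwrites a return address
try:
    machine.ret()
    detected = False
except ShadowStackViolation:
    detected = True
assert detected
```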
Context-sensitive CFI [44, 17] is a generalization of state-
ful CFI techniques. It provides stronger security properties
than stateless CFI. For instance, path-sensitive CFI [17] can
ensure that each control-flow transfer taken by the program
is consistent with a non-malicious trace. Context-sensitive
CFI does not rely on data integrity, and can thus be
enforced more efficiently than full data integrity.
Nevertheless, context-sensitive CFI has been dismissed as
impractical for real-world adoption [1]. Recent
context-sensitive CFI implementations [44, 17] that rely on
branch recording features available in modern 64-bit Intel
microprocessors show promise in enabling context-sensitive CFI
enforcement with reasonable overhead on commodity hardware.
However, state-of-the-art implementations are limited by the
size of the branch history used to make CFI decisions, by
over-approximation of the program CFG, or by reliance on
complex run-time monitoring, none of which are likely to
be acceptable for integration into commodity operating
systems. uCFI is a recent context-sensitive CFI scheme that
uses hardware-assisted tracing capabilities on Intel
processors to achieve precise CFI with low overhead by combining
compile-time analysis with optimized run-time tracing, but is
still reliant on a separate monitoring process [19]. Similarly
to all CFI solutions, uCFI and context-sensitive CFI cannot
protect against non-control data attacks that do not influence
the program’s execution trace.
Control-flow integrity on ARM. CFI for Clang provides
a number of CFI schemes for C and C++ using LLVM and
its Clang compiler front end¹². For C code, it checks that
function calls target a function of the correct type and uses
a shadow stack to protect backward edges. MoCFI [14] is
a software-based CFI approach targeting ARM-based
smartphone platforms; it uses a combination of a shadow stack,
static analysis, and run-time heuristics to determine the set
of valid targets for control-flow transfers. Specifically,
indirect function calls are constrained to target instructions at
the beginning of functions (as determined by static analysis)
and indirect jumps (e.g., into branch tables) are restricted to
the function’s scope. However, MoCFI makes no attempt to
protect the integrity of the shadow stack data, and is thus
susceptible to data-oriented attacks that can break shadow stack
integrity. CFI CaRE [30] is a CFI solution targeting small,
embedded ARM-based microcontrollers (MCUs). Similarly
to MoCFI, it uses a shadow stack, but accommodates small
MCUs by relaxing the restriction on indirect calls to only
validate that each call targets the beginning of a function. In
contrast to MoCFI, CFI CaRE uses the ability to perform
hardware-enforced isolated execution on ARM MCUs to
isolate the shadow stack from the protected program.
¹² https://clang.llvm.org/docs/ControlFlowIntegrity.html
9 Comparison with other integrity policies
9.1 Fully-precise pointer integrity
As discussed in Section 4.1, Pointer Integrity can be loosely
defined as a policy ensuring that the value of a pointer at the
time of use (dereference or call) corresponds to the value of
the pointer when it was created. In this section, we provide a
more rigorous definition of Pointer Integrity.
We define fully-precise pointer integrity as follows: A
pointer dereference is allowed if and only if the pointer is
based on its target object. We adopt Kuznetsov et al.’s [24]
definition of “based on” and say a pointer P is based on a
target object X if, and only if, P is obtained at run-time by “(i)
allocating X on the heap, (ii) explicitly taking the address of
X, if X is allocated statically, such as a local or global
variable, or is a control-flow target (including return locations,
whose addresses are implicitly taken and stored on the stack
when calling a function), (iii) taking the address of a
sub-object y of X (e.g., a field in the struct X), or (iv) computing
a pointer expression (e.g., pointer arithmetic, array
indexing, or simply copying a pointer) involving operands that are
either themselves based on object X or are not pointers.”
Kuznetsov et al.’s CPI [24] (Section 8) provides
fully-precise integrity guarantees for code pointers by ensuring that
accesses to sensitive pointers are safe (sensitive pointers are
code pointers and pointers that may later be used to ac-
cess sensitive pointers). However, CPI requires dedicated,
integrity-protected storage for sensitive pointers.
As discussed in Section 7.2, PARTS, and PA solutions in
general, achieve an approximation of fully-precise pointer
integrity. In particular, PARTS allows the substitution of a
pointer P by another pointer P′ based on object X , if P and
P′ share the PA modifier. In other words, when PA modifiers
are unique to each protected pointer value, PA provides
fully-precise pointer integrity. However, ensuring the uniqueness
of PA modifiers is not possible in practice for the
following reasons: 1) program semantics may require a set of
pointers to be substitutable with each other (e.g., pointers to
callback functions), and 2) the choice of allowed pointers may
depend on run-time properties (e.g., which callback
function was registered earlier). In these cases, a unique
modifier must be determined at run-time. Fully-precise pointer
integrity does not imply memory safety. In the case of PA,
if the modifier is determined at run-time and stored in
memory, the PA modifier itself may become a target for an
attacker wishing to undermine the integrity policy. To avoid
this, modifier values must be derived in a way that leaves
the value outside the control of the attacker, e.g., stored in a
dedicated hardware register or in read-only program memory.
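The role of the modifier in pointer substitution can be illustrated with a software model. The sketch below is an assumption-laden stand-in, not the hardware mechanism: it uses a truncated HMAC-SHA256 in place of the PA cipher, keeps the PAC next to the address instead of in unused pointer bits, and raises an exception where hardware would produce an invalid pointer. All names and constants are hypothetical.

```python
import hashlib
import hmac
from typing import Tuple

# Hypothetical stand-ins: real PA keys live in system registers.
PA_KEY = b"illustrative-pa-key"
PAC_BYTES = 3  # 24-bit PAC, chosen only for illustration


def compute_pac(address: int, modifier: int) -> int:
    """Keyed MAC over (address, modifier), truncated to the PAC size."""
    message = address.to_bytes(8, "little") + modifier.to_bytes(8, "little")
    digest = hmac.new(PA_KEY, message, hashlib.sha256).digest()
    return int.from_bytes(digest[:PAC_BYTES], "little")


def sign(address: int, modifier: int) -> Tuple[int, int]:
    """Return the pointer together with its PAC."""
    return address, compute_pac(address, modifier)


def authenticate(signed: Tuple[int, int], modifier: int) -> int:
    """Return the raw address, or raise if the PAC does not match."""
    address, pac = signed
    if pac != compute_pac(address, modifier):
        raise ValueError("PAC authentication failed")
    return address


CALLBACK_MODIFIER = 0x1111  # hypothetical modifier shared by all callbacks
OTHER_MODIFIER = 0x2222     # hypothetical modifier of an unrelated pointer

on_open = sign(0x400100, CALLBACK_MODIFIER)
on_close = sign(0x400200, CALLBACK_MODIFIER)
unrelated = sign(0x400300, OTHER_MODIFIER)

# Substitution within the same modifier is accepted.
slot = on_open
slot = on_close
assert authenticate(slot, CALLBACK_MODIFIER) == 0x400200

# A pointer signed under a different modifier is rejected.
try:
    authenticate(unrelated, CALLBACK_MODIFIER)
    rejected = False
except ValueError:
    rejected = True
```

The accepted substitution is exactly the residual imprecision discussed above: the check distinguishes modifiers, not individual pointers.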
9.2 Fully-precise static CFI
In contrast to stateless CFI, which allows control-flow tran-
sitions present in its CFG regardless of the origin of the code
pointer value, PA-based solutions (including PARTS) can
preclude forged pointer values from outside the process. The
policy that prevents pointer reuse can suffer from limitations
similar to those present in stateless CFI.
PARTS return address signing provides strong guarantees
even when subjected to pointer reuse. In contrast, a stateless
CFI policy allows a function to return to any of its call sites.
As such, static CFI cannot prevent injection of pointers that
are within the expected CFG, i.e., control-flow bending
attacks. For an attack to succeed, PARTS additionally requires
matching SP values, and that the reused return address
originates from a prior invocation of the same function within
the same process.
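The conditions for reusing a signed return address can be sketched in the same style. The modifier layout below (low 16 bits of SP combined with a function identifier) is an assumption made for illustration, as are the HMAC stand-in for the PA cipher and every name in the code.

```python
import hashlib
import hmac
from typing import Tuple

PA_KEY = b"illustrative-pa-key"  # hypothetical; real keys are in registers


def compute_pac(address: int, modifier: int) -> int:
    """Truncated keyed MAC standing in for the hardware PAC."""
    message = address.to_bytes(8, "little") + modifier.to_bytes(8, "little")
    digest = hmac.new(PA_KEY, message, hashlib.sha256).digest()
    return int.from_bytes(digest[:3], "little")


def return_modifier(sp: int, function_id: int) -> int:
    """Assumed layout: 48-bit function identifier and low 16 bits of SP."""
    return ((function_id & 0xFFFF_FFFF_FFFF) << 16) | (sp & 0xFFFF)


def sign_return(lr: int, sp: int, function_id: int) -> Tuple[int, int]:
    """Sign a return address on function entry."""
    return lr, compute_pac(lr, return_modifier(sp, function_id))


def check_return(signed_lr: Tuple[int, int], sp: int, function_id: int) -> bool:
    """Authenticate a return address before returning."""
    lr, pac = signed_lr
    return pac == compute_pac(lr, return_modifier(sp, function_id))


# A signed return address captured from an earlier invocation.
captured = sign_return(0x401000, sp=0x7FFC1000, function_id=7)

same_context = check_return(captured, sp=0x7FFC1000, function_id=7)
other_sp = check_return(captured, sp=0x7FFC0F80, function_id=7)
other_function = check_return(captured, sp=0x7FFC1000, function_id=8)
print(same_context, other_sp, other_function)
```

Only the first check is expected to pass. In this model, stack pointers that agree in their low 16 bits are indistinguishable, which is a property of the assumed layout.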
PARTS forward-edge code pointer integrity provides sim-
ilar guarantees (under reuse attacks) as LLVM’s type-based
protection (when subjected to any forged pointer). In both
cases, attacks are limited to using pointers of the correct dy-
namic type. PARTS in addition requires that the injected
pointer originates from the victim process.
Path-sensitive CFI (Section 8) can provide stronger poli-
cies compared to both stateless CFI and PA-based solutions
but current implementations use either extensive run-time
monitoring or a shadow stack. In order for a shadow stack
to be effective, it must be protected from modification by
the attacker. This can be achieved by software instrumenta-
tion that sanitizes all memory accesses, hardware support for
per-instruction memory isolation, or randomization. While
shadow stacks protected through randomization can be
implemented with minimal performance overhead, our
adversary model precludes this approach. Furthermore,
software-isolated shadow stack solutions impose impractical
performance overheads, and ARM processors do not currently
provide direct hardware support for shadow stacks.
10 Conclusion and Future Work
We plan to extend the PARTS protection architecture to other
protection domains, such as the OS kernel or hypervisor. Such
additions require that key configuration is trapped at a
higher exception level (EL), i.e., in the hypervisor or trusted
software. Trapping key configuration beyond the kernel’s
reach prevents the kernel from updating PA keys, and thus
from handling context switches. Nonetheless, the only
significant change for the PARTS architecture is to arrange for key
configuration for both kernel and EL0 PARTS to be trapped
(and managed) at a higher exception level (EL2 or EL3). We are
further looking at adding C++ support to PARTS. While we do
not expect any fundamental problems, some C++-specific
features, such as inheritance, cannot be directly handled by
our current instrumentation strategy. Authenticated pointers
with PACs cannot be used by legacy code (Section 2.2) while
PARTS-instrumented code will trap if pointers without PACs
are used. For legacy and PARTS code to interact, we can use
wrappers that manipulate function arguments and return val-
ues by embedding/stripping PACs. For shared pointers or
complex data structures, annotations can disable authentica-
tion of selected pointers, allowing programmers to manually
adjust pointer conversion to and from legacy code.
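A wrapper of the kind described above might look as follows in a software model. This is a sketch under assumptions, not the PARTS design: the PAC is modeled as a separate value with a truncated HMAC standing in for the PA cipher, and `legacy_advance`, `advance_wrapper`, and `BUFFER_MODIFIER` are hypothetical names.

```python
import hashlib
import hmac
from typing import Tuple

PA_KEY = b"illustrative-pa-key"   # hypothetical; real keys are in registers
SignedPointer = Tuple[int, int]   # (address, PAC) stand-in for an in-pointer PAC


def compute_pac(address: int, modifier: int) -> int:
    """Truncated keyed MAC standing in for the hardware PAC."""
    message = address.to_bytes(8, "little") + modifier.to_bytes(8, "little")
    digest = hmac.new(PA_KEY, message, hashlib.sha256).digest()
    return int.from_bytes(digest[:3], "little")


def sign(address: int, modifier: int) -> SignedPointer:
    """Embed a PAC: return the address together with its PAC."""
    return address, compute_pac(address, modifier)


def authenticate(pointer: SignedPointer, modifier: int) -> int:
    """Check the PAC and strip it, returning the raw address."""
    address, pac = pointer
    if pac != compute_pac(address, modifier):
        raise ValueError("PAC authentication failed")
    return address


def legacy_advance(raw_pointer: int) -> int:
    """Uninstrumented library function: takes and returns raw pointers."""
    return raw_pointer + 16


def advance_wrapper(pointer: SignedPointer, modifier: int) -> SignedPointer:
    """Strip the PAC for the legacy call, then sign the returned pointer."""
    raw = authenticate(pointer, modifier)
    result = legacy_advance(raw)
    return sign(result, modifier)


BUFFER_MODIFIER = 0x42  # hypothetical modifier for this pointer type
p = sign(0x5000, BUFFER_MODIFIER)
q = advance_wrapper(p, BUFFER_MODIFIER)
assert authenticate(q, BUFFER_MODIFIER) == 0x5010
```

Signing the value returned by legacy code means trusting that code for the duration of the call, which is one reason such conversions are best kept explicit.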
Currently, the PARTS compiler assumes shared libraries to
be uninstrumented. Instrumented shared libraries must deal
with PACs for statically allocated pointers after linking, and
thus require changes to the dynamic linker. Moreover, if a
future PA-scheme utilizes dynamic modifiers for shared ob-
jects, the dynamic linker could then harmonize PA modifiers
among all callees using the shared resources.
Pointer integrity does not imply full memory safety (Sec-
tion 9.1). Although ARMv8-A PA does not support bounds
checking for pointer accesses with authenticated pointers, it
has a general-purpose instruction, pacga, for producing and
validating PACs computed over the contents of two 64-bit
registers. This can be used to build authenticated canaries to
identify buffer overflow attacks, or to validate the integrity
(freshness) of atomic data, such as integer or counter values.
In principle, pacga instructions can even be chained to vali-
date arbitrary-sized blocks of data.
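Chaining can be sketched as follows. The function `pacga` below is a software stand-in that models a 32-bit MAC over two 64-bit inputs using a truncated HMAC. The chaining construction, the length binding, and all names are assumptions for illustration and have not been analyzed for security.

```python
import hashlib
import hmac
from typing import Sequence

GENERIC_KEY = b"illustrative-generic-key"  # hypothetical key material


def pacga(value: int, modifier: int) -> int:
    """Stand-in: 32-bit MAC over two 64-bit inputs, in the upper 32 bits."""
    message = value.to_bytes(8, "little") + modifier.to_bytes(8, "little")
    digest = hmac.new(GENERIC_KEY, message, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "little") << 32


def chained_mac(words: Sequence[int]) -> int:
    """Feed each result back in as the modifier for the next 64-bit word."""
    tag = len(words)  # bind the length into the first modifier
    for word in words:
        tag = pacga(word, tag)
    return tag


# An authenticated canary: a MAC over a counter value and its location.
counter_value = 41
counter_address = 0x7FFC2000
canary = pacga(counter_value, counter_address)
assert canary == pacga(counter_value, counter_address)

# A MAC over a larger block of data, one 64-bit word at a time.
block = [0x1111, 0x2222, 0x3333]
tag = chained_mac(block)
tampered_tag = chained_mac([0x1111, 0x2222, 0x3334])
print(tag != tampered_tag)
```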
Finally, finding effective ways of complementing PA with other
emerging memory safety mechanisms, such as the forthcoming
support for memory tagging in ARMv8.5-A, is an important
line of future work.
Acknowledgments
This work was supported in part by the Academy of Finland
under grant nr. 309994 (SELIoT), and the Intel Collabora-
tive Research Institute for Collaborative Autonomous & Re-
silient Systems (ICRI-CARS).
The authors thank Kostya Serebryany and Rémi Denis-
Courmont for interesting discussions and Zaheer Gauhar for
implementation assistance.
References
[1] ABADI, M., ET AL. Control-flow integrity principles,
implementations, and applications. ACM Trans. Inf.
Syst. Secur. 13, 1 (Nov. 2009), 4:1–4:40.
[2] APPLE INC. iOS Security — iOS 12.
https://www.apple.com/business/site/docs/
iOS_Security_Guide.pdf, 2018.
[3] ARM LTD. ARMv8 architecture reference manual, for
ARMv8-A architecture profile (ARM DDI 0487C.a).
https://static.docs.arm.com/ddi0487/ca/
DDI0487C_a_armv8_arm.pdf, 2017.
[4] ARM LTD. Cortex A57 Software Optimization Guide.
http://infocenter.arm.com/help/topic/
com.arm.doc.uan0015b/Cortex_A57_Software_
Optimization_Guide_external.pdf, 2018.
[5] ARM LTD. Fast models, version 11.4,
fixed virtual platforms (FVP) reference guide.
https://static.docs.arm.com/100966/1104/
fast_models_fvp_rg_100966_1104_00_en.pdf,
2018.
[6] AVANZI, R. The QARMA block cipher family. al-
most MDS matrices over rings with zero divisors,
nearly symmetric even-mansour constructions with
non-involutory central rounds, and search heuristics for
low-latency s-boxes. IACR Trans. Symmetric Cryptol.
2017, 1 (2017), 4–44.
[7] BRASSER, F., ET AL. DR.SGX: Hardening SGX
enclaves against cache attacks with data location
randomization. https://arxiv.org/abs/1709.
09917, 2017.
[8] CADAR, C., ET AL. Data randomization. Tech. Rep.
MSR-TR-2008-120, Microsoft Research, September
2008.
[9] CARLINI, N., ET AL. Control-flow bending: On the ef-
fectiveness of control-flow integrity. In Proc. USENIX
Security ’15 (2015), pp. 161–176.
[10] CHECKOWAY, S., ET AL. Return-oriented program-
ming without returns. In Proceedings of the 17th ACM
Conference on Computer and Communications Secu-
rity (New York, NY, USA, 2010), CCS ’10, ACM,
pp. 559–572.
[11] CHEN, S., ET AL. Detecting privileged side-channel
attacks in shielded execution with Déjà Vu. In Proc.
ACM ASIA CCS ’17 (2017), pp. 7–18.
[12] CHEN, S., XU, J., SEZER, E. C., GAURIAR, P., AND
IYER, R. K. Non-control-data attacks are realistic
threats. In Proc. USENIX Security ’05 (2005), pp. 177–
191.
[13] COWAN, C., ET AL. PointGuard™: Protecting
pointers from buffer overflow vulnerabilities. In Proc.
USENIX Security ’03 (2003), pp. 91–104.
[14] DAVI, L., ET AL. MoCFI: A framework to mitigate
control-flow attacks on smartphones. In Proc. NDSS ’12
(2012).
[15] DAVI, L., ET AL. HAFIX: Hardware-assisted flow in-
tegrity extension. In Proc. ACM/EDAC/IEEE DAC ’15
(2015), pp. 74:1–74:6.
[16] DEVIETTI, J., ET AL. Hardbound: Architectural sup-
port for spatial safety of the C programming language.
In Proc. ’08 (2008), pp. 103–114.
[17] DING, R., ET AL. Efficient protection of path-sensitive
control security. In Proc. USENIX Security ’17 (2017),
pp. 131–148.
[18] HU, H., ET AL. Data-oriented programming: On the
expressiveness of non-control data attacks. In Proc.
IEEE S&P ’16 (2016), pp. 969–986.
[19] HU, H., ET AL. Enforcing unique code target property
for control-flow integrity. In Proceedings of the 2018
ACM SIGSAC Conference on Computer and Commu-
nications Security (New York, NY, USA, 2018), CCS
’18, ACM, pp. 1470–1486.
[20] INTEL. Control-flow enforcement technology pre-
view. https://software.intel.com/sites/
default/files/managed/4d/2a/control-flow-
enforcement-technology-preview.pdf, 2016.
[21] ISO/IEC. ISO/IEC 9899:201x committee draft — De-
cember 2, 2010. http://www.open-std.org/jtc1/
sc22/wg14/www/docs/n1548.pdf, 2010.
[22] KORNAU, T. Return Oriented Programming for
the ARM Architecture. PhD thesis, Ruhr-Universität
Bochum, 2009.
[23] KUZNETSOV, V., ET AL. Poster: Getting the point(er):
On the feasibility of attacks on code-pointer integrity.
IEEE S&P ’15.
[24] KUZNETSOV, V., ET AL. Code-pointer integrity. In
Proc. USENIX OSDI ’14 (2014), pp. 147–163.
[25] LEE, S., ET AL. Inferring fine-grained control flow
inside SGX enclaves with branch shadowing. In Proc.
USENIX Security ’17 (2017), pp. 557–574.
[26] LI, R., AND JIN, C. Meet-in-the-middle attacks on
reduced-round QARMA-64/128. The Computer Jour-
nal 61, 8 (2018), 1158–1165.
[27] MASHTIZADEH, A. J., ET AL. CCFI: Cryptograph-
ically enforced control flow integrity. In Proc. ACM
CCS ’15 (2015), pp. 941–951.
[28] MICROSOFT. Enhanced Mitigation Experience
Toolkit. https://www.microsoft.com/emet, 2016.
[29] NAGARAKATTE, S., ET AL. SoftBound: Highly com-
patible and complete spatial memory safety for C. In
Proc. ACM PLDI ’09 (2009), pp. 245–258.
[30] NYMAN, T., ET AL. CFI CaRE: Hardware-supported
call and return enforcement for commercial microcon-
trollers. In Research in Attacks, Intrusions, and De-
fenses (2017), pp. 259–284.
[31] NYMAN, T., ET AL. HardScope: Thwarting DOP
with hardware-assisted run-time scope enforcement.
arXiv:1705.10295 [cs.CR], 2017.
[32] OLEKSENKO, O., ET AL. Intel MPX explained:
An empirical study of Intel MPX and software-based
bounds checking approaches. https://arxiv.org/
abs/1702.00719, 2017.
[33] PANDA, R., ET AL. Wait of a decade: Did SPEC CPU
2017 broaden the performance horizon? In Proc. IEEE
HPCA ’18 (2018), pp. 271–282.
[34] QUALCOMM TECHNOLOGIES, INC. Pointer authenti-
cation on ARMv8.3. https://www.qualcomm.com/
media/documents/files/whitepaper-pointer-
authentication-on-armv8-3.pdf, 2017.
[35] ROESSLER, N., AND DEHON, A. Protecting the stack
with metadata policies and tagged hardware. In Proc.
IEEE S&P ’18 (2018), pp. 1072–1089.
[36] SEO, J., ET AL. SGX-Shield: Enabling address
space layout randomization for SGX programs. In
Proc. NDSS ’17 (2017).
[37] SEREBRYANY, K., ET AL. Memory tagging and
how it improves C/C++ memory safety. arXiv:
1802.09517 [cs.CR], 2018.
[38] SHACHAM, H. The geometry of innocent flesh on the
bone: Return-into-libc without function calls (on the
x86). In Proc. ACM CCS ’07 (2007), pp. 552–561.
[39] SHACHAM, H., ET AL. On the effectiveness of
address-space randomization. In Proc. ACM CCS ’04
(2004), pp. 298–307.
[40] SHIH, M.-W., ET AL. T-SGX: Eradicating controlled-
channel attacks against enclave programs. In Proc.
NDSS ’17 (2017).
[41] SONG, C., ET AL. HDFI: Hardware-assisted data-flow
isolation. In Proc. IEEE S&P ’16 (2016), pp. 1–17.
[42] SZEKERES, L., ET AL. SoK: Eternal war in memory.
In Proc. IEEE S&P ’13 (2013), vol. 12, pp. 48–62.
[43] TSAMPAS, S., ET AL. Towards automatic
compartmentalization of C programs on capability machines. In
Workshop on Foundations of Computer Security 2017
(8 2017), pp. 1–14.
[44] VAN DER VEEN, V., ET AL. Practical Context-
Sensitive CFI. In Proc. ACM CCS ’15 (2015), pp. 927–
940.
[45] WATSON, R. N. M., ET AL. CHERI: A hybrid
capability-system architecture for scalable software
compartmentalization. In Proc. IEEE S&P ’15 (2015),
pp. 20–37.
[46] WOODRUFF, J., ET AL. The CHERI capability model:
Revisiting RISC in an age of risk. In Proc. ’14 (2014),
pp. 457–468.
[47] ZONG, R., AND DONG, X. Meet-in-the-middle at-
tack on QARMA block cipher. IACR Cryptology ePrint
Archive (2016).
A nbench experimental setup
The nbench benchmarks employ dynamic workload
adjustment to allow the tests to expand or contract depending on
the capabilities of the system under test. To achieve this,
nbench employs timestamping to ensure that a test run
exceeds a pre-determined minimum execution time. If a test
run finishes before the minimum execution time has been
reached, the test dynamically adjusts its workload, and tries
again. For example, the Numeric Sort test will construct an
array filled with random numbers and measure the time taken
to sort the array. If the time is less than the pre-determined
minimum time, the test will build two arrays, and try again.
If sorting two arrays takes less time than the pre-determined
minimum, the process repeats with more arrays.
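The adjustment loop described above can be sketched as follows. This is not nbench code: the function names, the array size, the cap on the number of arrays, and the growth step of one array per attempt are assumptions.

```python
import random
import time


def timed_numeric_sort(num_arrays: int, array_size: int = 1000) -> float:
    """Build `num_arrays` random arrays and return the seconds spent sorting."""
    arrays = [[random.random() for _ in range(array_size)]
              for _ in range(num_arrays)]
    start = time.perf_counter()
    for array in arrays:
        array.sort()
    return time.perf_counter() - start


def adjust_workload(run_test, min_time, max_arrays=4096):
    """Grow the workload until one run takes at least `min_time`."""
    num_arrays = 1
    while num_arrays <= max_arrays:
        elapsed = run_test(num_arrays)
        if elapsed >= min_time:
            return num_arrays, elapsed
        num_arrays += 1  # assumed growth step: one more array per attempt
    raise RuntimeError("minimum execution time not reached")


if __name__ == "__main__":
    arrays, seconds = adjust_workload(timed_numeric_sort, min_time=0.005)
    print(f"{arrays} arrays sorted in {seconds:.4f} s")
```

Because the final workload depends on the speed of the machine, two systems may end up running different amounts of work, which makes relative overhead hard to compare directly.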
Since we want to determine the relative overhead in exe-
cution time caused by our instrumentation, we employ the
methodology described by Brasser et al. [7] and modify
nbench to instead run each test for a constant number of
iterations. The number of iterations was determined
individually for each test based on the iteration counts
determined by an unmodified nbench run on the FVP. We then
instrument the nbench benchmarks using our PA-analogue
(Section 7.3.2) and measure the relative execution time be-
tween non-instrumented and instrumented nbench tests on
the HiKey development platform using the BusyBox time
utility.
Each individual benchmark test was run 200 times
using the pre-determined number of iterations. Figure 6a in
Section 7.3.4 shows instrumentation overhead for
individual tests in relation to the uninstrumented test run. Table 3
shows the numeric overhead ratio for each individual test.
Because the nbench benchmarks are designed to measure
performance in a manner which is operating system agnos-
tic, they are written in ANSI C and only execute in a single
thread. We therefore only consider user time when measur-
ing the overhead of the instrumentation, and exclude context
switches and system calls.
The run-time overhead of PARTS is dependent on spe-
cific run-time events, such as the number of function invo-
cations in the case of return address signing. Figure 6b in
Section 7.3.4 shows the order of magnitude of instrumented
run-time events in the nbench tests. We also report the user
mode run-time for uninstrumented nbench tests, the number
of iterations of each individual test, and number of instru-
mented run-time events in Table 4.
B SPEC CPU2017 experimental setup
Due to unmanageable simulation times in the FVP simulator,
we have verified the correctness of PARTS instrumentation
only on a subset of SPEC CPU2017 benchmarks.
Specifically, we chose the 505.mcf_r and 519.lbm_r benchmarks
from the SPECrate 2017 integer and floating point suites,
because these were the smallest C benchmarks in terms of
lines of code. The benchmarks were compiled using SPEC
runcpu, with an AArch64-specific configuration specifying
whole-program-llvm¹³, with our PARTS-enabled LLVM, as
the compiler. We then extracted the bitcode (created by
whole-program-llvm during compilation) and used it to
instrument and compile the binaries we used for evaluation:
one uninstrumented, one instrumented with PA instructions,
and one instrumented with our PA-analogue. We enabled
both return address and forward-edge code pointer signing
for the instrumented binaries.
We ran the PARTS-instrumented binaries on the FVP
simulator to confirm correct functionality. The simulation time
for the tested benchmarks was between 12 and 48 hours.
Performance benchmarks, for baseline and PA-enabled binaries,
were run on the HiKey devices, using the same setup as our
nbench evaluation. The results are shown in Table 2, and
are based on five runs of each benchmark. In 505.mcf_r we
observed overheads consistent with our results from nbench.
We observed no discernible overhead in 519.lbm_r. We
attribute this to the following properties of 519.lbm_r: (a) it
does not exhibit forward-edge code pointers, and (b) it has
few non-leaf function calls in relation to the arithmetic
computation performed as part of the benchmark.
¹³ https://github.com/travitch/whole-program-llvm
Table 2: Overhead as ratio and standard deviation (σ) for
return address signing and (forward-edge) code pointer signing
for 505.mcf_r and 519.lbm_r SPEC benchmarks.
Benchmark    Uninstrumented      ret. addr. sign. + code ptr. integrity
             ratio    σ          ratio    σ
505.mcf_r    1        0.004      1.005    0.004
519.lbm_r    1        0.000      1.000    0.000
Table 3: Overhead as ratio and standard deviation (σ) for nbench tests, reported separately for uninstrumented, return address
signing, (forward-edge) code pointer signing, data pointer signing, and all instrumentation enabled.
                                      PARTS
                  Uninstrumented      ret. addr. sign     code ptr. signing   data ptr. signing   all enabled
Test              ratio   σ           ratio   σ           ratio   σ           ratio   σ           ratio   σ
Numeric sort      1       0.002       1       0.003       1       0.003       1.293   0.003       1.293   0.003
String sort       1       0.002       1.01    0.002       1       0.002       1.251   0.002       1.259   0.002
Bitfield          1       0.002       1       0.002       1       0.002       1.15    0.002       1.15    0.001
FP emulation      1       0.001       1       0.001       1       0.001       1.395   0.001       1.396   0.001
Fourier           1       0.002       1.027   0.004       0.999   0.003       0.998   0.002       1.016   0.003
Assignment        1       0.001       1       0.002       1       0.002       1.145   0.002       1.145   0.002
Idea              1       0.001       1.004   0.002       1       0.002       1.279   0.002       1.283   0.002
Huffman           1       0.001       0.999   0.001       0.999   0.001       1.294   0.001       1.295   0.002
Neural net        1       0.001       1.002   0.002       1       0.002       1.001   0.002       1.001   0.003
Lu decomposition  1       0.001       1       0.002       1       0.002       1.173   0.002       1.173   0.002
Geometric average 1       -           1.004   -           1.000   -           1.191   -           1.195   -
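The geometric averages in the last row of Table 3 can be recomputed from the per-test ratios. The short sketch below does this for the data pointer signing and all-enabled columns; the variable and function names are illustrative.

```python
import math

# Per-test overhead ratios copied from Table 3, in row order.
DATA_PTR_SIGNING = [1.293, 1.251, 1.15, 1.395, 0.998,
                    1.145, 1.279, 1.294, 1.001, 1.173]
ALL_ENABLED = [1.293, 1.259, 1.15, 1.396, 1.016,
               1.145, 1.283, 1.295, 1.001, 1.173]


def geometric_mean(values):
    """Exponential of the arithmetic mean of the logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


print(f"data ptr. signing: {geometric_mean(DATA_PTR_SIGNING):.3f}")
print(f"all enabled:       {geometric_mean(ALL_ENABLED):.3f}")
```

Rounded to three decimals, the results agree with the 1.191 and 1.195 reported in the table.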
Table 4: User mode run-time (utime) and standard deviation (σ) in seconds for uninstrumented nbench tests, the pre-determined
number of iterations for each individual test, and the number of run-time events that are affected by instrumentation. Non-leaf
calls correspond to function invocations protected by return address signing. Leaf calls correspond to function invocations
which do not store the value of LR in memory, and thus can be left uninstrumented. Instruction pointers created and indirect
calls are instrumented by (forward-edge) code pointer signing, and data pointer loads / stores correspond to events where data
pointer instrumentation is active.
Test              Baseline                    Instrumented events
                  utime   σ       iterations  non-leaf calls  leaf calls  instr. ptr. created  indirect calls  data ptr. ldr/str
Numeric sort      3.573   0.007   350         1802            7117598     10                   5               302212833
String sort       2.971   0.005   125         3977237         1022510     10                   5               180105579
Bitfield          2.687   0.004   101647890   5669            4308        10                   5               104670943
FP emulation      5.862   0.004   35          616536          37906118    10                   5               589518589
Fourier           2.693   0.005   25870       5240188         161         10                   5               27504
Assignment        4.414   0.005   10          225602          113353      10                   5               190662093
Idea              2.808   0.004   1500        1640184         54420196    10                   5               196844406
Huffman           4.212   0.005   1000        17659           46983276    10                   5               343176061
Neural net        5.477   0.007   10          359423          441412      10                   5               782
Lu decomposition  3.596   0.005   230         18970           441412      10                   5               186704928
C ARMv8-A PA Instructions
Table 5: List of PA instructions [3]. PA Key indicates the PA key the instruction uses. Addr. indicates the source of the address
to be signed / authenticated (Xd indicates that the address is specified using a general purpose register). Mod. indicates the
modifier used by the instruction (Xm indicates that the modifier is specified by a general purpose register.) The backwards-
compatible column indicates if the instruction encoding resides in the NOP space for pre-existing ARMv8-A processors.
Instruction
  Mnemonic     PA key     Addr.   Mod.   Backwards-compatible
BASIC POINTER AUTHENTICATION INSTRUCTIONS
Add PAC to instr. addr.
  paciasp      Instr. A   LR      SP     yes
  pacia        Instr. A   Xd      Xm     no
  paciaz       Instr. A   LR      zero   yes
  paciza       Instr. A   Xd      zero   no
  pacia1716    Instr. A   X17     X16    yes
  pacibsp      Instr. B   LR      SP     yes
  pacib        Instr. B   Xd      Xm     no
  pacibz       Instr. B   LR      zero   yes
  pacizb       Instr. B   Xd      zero   no
  pacib1716    Instr. B   X17     X16    yes
Add PAC to data addr.
  pacda        Data A     Xd      Xm     no
  pacdza       Data A     Xd      zero   no
  pacdb        Data B     Xd      Xm     no
  pacdzb       Data B     Xd      zero   no
Calculate generic MAC
  pacga        Generic    -       -      no
Authenticate instr. addr.
  autiasp      Instr. A   LR      SP     yes
  autia        Instr. A   Xd      Xm     no
  autiaz       Instr. A   LR      zero   yes
  autiza       Instr. A   Xd      zero   no
  autia1716    Instr. A   X17     X16    yes
  autibsp      Instr. B   LR      SP     yes
  autib        Instr. B   Xd      Xm     no
  autibz       Instr. B   LR      zero   yes
  autizb       Instr. B   Xd      zero   no
  autib1716    Instr. B   X17     X16    yes
Authenticate data addr.
  autda        Data A     Xd      Xm     yes
  autdza       Data A     Xd      zero   yes
  autdb        Data B     Xd      Xm     yes
  autdzb       Data B     Xd      zero   yes
Strip PAC
  xpacd        -          Xd      -      no
  xpaci        -          Xd      -      no
  xpaclri      -          LR      -      yes
COMBINED POINTER AUTHENTICATION INSTRUCTIONS
Authenticate instr. addr. and return
  retaa        Instr. A   LR      SP     no
  retab        Instr. B   LR      SP     no
Authenticate instr. addr. and branch
  braa         Instr. A   Xd      Xm     no
  braaz        Instr. A   Xd      zero   no
  brab         Instr. B   Xd      Xm     no
  brabz        Instr. B   Xd      zero   no
Authenticate instr. addr. and branch with link
  blraa        Instr. A   Xd      Xm     no
  blraaz       Instr. A   Xd      zero   no
  blrab        Instr. B   Xd      Xm     no
  blrabz       Instr. B   Xd      zero   no
Authenticate instr. addr. and exception return
  eretaa       Instr. A   ELR     SP     no
  eretab       Instr. B   ELR     SP     no
Authenticate data addr. and load register
  ldraa        Data A     Xd      zero   no
  ldrab        Data B     Xd      zero   no
