PACStack: an Authenticated Call Stack by Liljestrand, Hans et al.
PACStack: an Authenticated Call Stack
Hans Liljestrand
Aalto University, Finland
Huawei Technologies Oy, Finland
hans.liljestrand@aalto.fi
Thomas Nyman
Aalto University, Finland
thomas.nyman@aalto.fi
Lachlan J. Gunn
Aalto University, Finland
lachlan@gunn.ee
Jan-Erik Ekberg
Huawei Technologies Oy, Finland
Aalto University, Finland
jan.erik.ekberg@huawei.com
N. Asokan
Aalto University, Finland
asokan@acm.org
ABSTRACT
A popular run-time attack technique is to compromise the control-
flow integrity of a program by modifying function return addresses
on the stack. So far, shadow stacks have proven to be essential for
comprehensively preventing return address manipulation. Shadow
stacks record return addresses in integrity-protected memory, se-
cured with hardware-assistance or software access control. Soft-
ware shadow stacks incur high overheads or trade off security for
efficiency. Hardware-assisted shadow stacks are efficient and secure,
but require the deployment of special-purpose hardware.
We present authenticated call stack (ACS), an approach that uses
chained message authentication codes (MACs) to achieve compa-
rable security without requiring additional hardware support. We
present PACStack, a realization of ACS on the ARMv8.3-A archi-
tecture, using its general purpose hardware mechanism for pointer
authentication (PA). Via a rigorous security analysis, we show
that PACStack achieves security comparable to hardware-assisted
shadow stacks without requiring dedicated hardware. We demon-
strate that PACStack’s performance overhead is negligible (<1%).
1 INTRODUCTION
Traditional code-injection attacks are ineffective in the presence
of W⊕X policies that prevent the modification of executable mem-
ory [42]. However, code-reuse attacks can alter the run-time be-
havior of a program without modifying any of its executable code
sections. Return-oriented programming (ROP) is a prevalent attack
technique that corrupts function return addresses to hijack a pro-
gram’s control flow. ROP can be used to achieve Turing-complete
computation by chaining together existing code sequences in the
victim program. To prevent ROP, return addresses must be protected
when stored in memory. At present, the most powerful protection
against ROP is using an integrity-protected shadow stack that main-
tains a secure reference copy of each return address [1]. Integrity
of the shadow stack is maintained by making it inaccessible to
the adversary either by randomizing its location in memory or by
using specialized hardware [25]. Recent software-based shadow
stacks show reasonable performance [10], but are vulnerable to
an adversary capable of exploiting memory vulnerabilities to infer
the location of the shadow stack. To date, only hardware-assisted
shadow stacks, such as Intel CET [25], achieve negligible over-
head without any security trade-offs. But employing such a custom
hardware mechanism incurs a development and deployment cost.
Recent ARM processors include support for general-purpose
pointer authentication (PA), a hardware extension that uses a tweakable
message authentication code (MAC) to sign and verify pointers [2].
One initial use case of PA is the signing and verification of
return addresses [39]. However, current PA schemes are vulnerable
to reuse attacks, where the adversary can reuse previously observed
valid protected pointers [31]. Prior work [31, 39] and current im-
plementations by GCC 1 and LLVM 2 mitigate reuse attacks, but
cannot completely prevent them.
In this paper, we propose a new approach, authenticated call
stack (ACS), providing security comparable to hardware-assisted
shadow stacks, with minimal overhead and without requiring new
hardware-protected memory. ACS binds all return addresses into a
chain of MACs that allow verification of return addresses before
their use. We show how ACS can be efficiently realized using ARM
PA while resisting reuse attacks. The resulting system, PACStack,
can withstand strong adversaries with full memory access. Our
contributions are:
• ACS, a new approach for precise verification of function
return addresses by chaining MACs (Section 5).
• PACStack, an LLVM-based realization of ACS using ARM PA
without requiring additional hardware (Section 6).
• A systematic evaluation of PACStack security, showing that
its security is comparable to shadow stacks (Section 7).
• An experimental evaluation of PACStack performance, show-
ing that it incurs negligible overhead (<1%) (Section 8).
For realizing PACStack, we implemented an efficient authenticated
stack using ARM PA. This approach may be generalizable to other
data structures and applications (Section 10.1).
2 BACKGROUND
2.1 ROP on ARM
In ROP, the adversary exploits a memory vulnerability to manipu-
late return addresses stored on the stack, thereby altering the pro-
gram’s backward-edge control flow. ROP allows Turing-complete at-
tacks by chaining together multiple gadgets, i.e., adversary-chosen
sequences of pre-existing program instructions that together per-
form the desired operations. ARM architectures use the link register
(LR) to hold the current function’s return address. LR is automati-
cally set by the branch with link (bl) or branch with link to register
(blr) instructions that are used to implement regular and indirect
1https://gcc.gnu.org/onlinedocs/gcc/AArch64-Function-Attributes.html
2https://reviews.llvm.org/D49793
arXiv:1905.10242v1 [cs.CR] 24 May 2019
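[Figure 1 (diagram): a 64-bit pointer held in a general purpose register,
consisting of an 8-bit tag/PAC field, a reserved bit, a sign extension/PAC
field, and a VA_SIZE-bit virtual address (AP). The PAC, 3 to 23 bits wide,
is HK(AP, M), computed from the address, a 64-bit modifier (M) held in a
general purpose register, and a PA key (K) held in a configuration register.]
Figure 1: PA uses an embedded authentication token based
on the pointer’s address, a modifier, and a key.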
function calls. Because LR is overwritten on call, non-leaf functions
must store the return address onto the stack. This opens up the
possibility of performing ROP on ARM architectures [27].
2.2 ARM Pointer Authentication
The ARMv8.3-A PA extension supports calculating and verifying
pointer authentication codes (PACs) [2]. A PA pac instruction
calculates a keyed tweakable MAC, HK(AP, M), over the address AP of
a pointer P using a 64-bit modifier M as the tweak. The resulting
authentication token, referred to as a PAC, is embedded into the
unused high-order bits of P. It can be verified using a PA aut
verification instruction that recalculates HK(AP, M) and compares the
result to the PAC embedded in P.
Since the PAC is stored in unused bits of a pointer, its size is
limited by the virtual address size (VA_SIZE in Figure 1) and whether
address tagging is enabled [2]. On a 64-bit ARM machine running
a default Linux kernel, VA_SIZE is 39, which leaves 16 bits for
the PAC when excluding the reserved and address tag bits. PA
provides five different keys: two for code pointers, two for data
pointers, and one for generic use. Each key has a separate set of
instructions3, e.g., the autia and pacia instructions always operate
on the instruction key A, stored in the APIAKey_EL1 register. Access
to the key registers and PA configuration registers can be restricted
to a higher exception level (EL). Linux v5.0 4 adds full support for
PA, such that the kernel (at EL1) manages user-space (EL0) keys
and prevents EL0 from modifying them.
As currently specified, PA does not cause a fault on verification
failure; instead, it strips the PAC from the pointer P and flips one
of the high-order bits such that P becomes invalid. Upon using an
invalid pointer, the memory management unit detects the invalid
bit and generates a memory translation fault. We use the term using
to indicate the execution of any instruction that causes the pointer
to be translated, including pointer dereference and instruction fetch
on return or branch.
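The signing and verification behavior described above can be sketched as a small C model. This is only an illustration under stated assumptions: `toy_pac` is a non-cryptographic stand-in for HK (the real MAC is computed in hardware), the helper names `pac_sign`, `pac_auth` and `is_usable` are hypothetical, the layout assumes VA_SIZE = 39 with a 16-bit PAC in bits 39..54, and the choice of bit 54 as the bit set on verification failure is an assumption made for the model.

```c
#include <stdint.h>

#define VA_SIZE   39                                  /* address bits, as in the text */
#define PAC_BITS  16                                  /* PAC occupies bits 39..54 */
#define ADDR_MASK ((1ULL << VA_SIZE) - 1)
#define PAC_MASK  (((1ULL << PAC_BITS) - 1) << VA_SIZE)
#define ERROR_BIT (1ULL << 54)                        /* assumption: bit set on failure */

/* splitmix64 finalizer, used only as a convenient mixing function. */
static uint64_t mix64(uint64_t z) {
    z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
    z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
    return z ^ (z >> 31);
}

/* Toy stand-in for H_K(A_P, M), shifted into the PAC bit positions.
   NOT cryptographically secure. */
static uint64_t toy_pac(uint64_t key, uint64_t addr, uint64_t modifier) {
    return (mix64(addr ^ mix64(modifier ^ key)) << VA_SIZE) & PAC_MASK;
}

/* Models a pac instruction: embed the PAC into the unused high-order bits. */
uint64_t pac_sign(uint64_t key, uint64_t ptr, uint64_t modifier) {
    uint64_t addr = ptr & ADDR_MASK;
    return addr | toy_pac(key, addr, modifier);
}

/* Models an aut instruction: strip the PAC and, on mismatch, set a
   high-order bit so that the pointer becomes invalid instead of faulting. */
uint64_t pac_auth(uint64_t key, uint64_t signed_ptr, uint64_t modifier) {
    uint64_t addr = signed_ptr & ADDR_MASK;
    if ((signed_ptr & PAC_MASK) == toy_pac(key, addr, modifier))
        return addr;
    return addr | ERROR_BIT;
}

/* Models the translation check: only pointers with clear high bits are usable. */
int is_usable(uint64_t ptr) {
    return (ptr & ~ADDR_MASK) == 0;
}
```

In this model a fault only occurs when an invalid pointer is later used, which mirrors the text: `pac_auth` itself never aborts, it merely returns a pointer for which `is_usable` is false.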
PA also supports the generic pacga instruction, which outputs a
32-bit PAC based on a 64-bit input value and a 64-bit modifier, but
does not include a corresponding verification instruction. To verify
the pacga PAC, instrumented code must explicitly compare it to
the expected value.
2.2.1 PA-based return address protection. Return address protec-
tion is the first published PA-based control-flow protection [39].
It is implemented as the -msign-return-address feature of GCC
3A full list of PA instructions from [31] is available in Appendix B.
4https://kernelnewbies.org/Linux_5.0#ARM_pointer_authentication
prologue:
    paciasp            ; sign LR ❶
    str LR, [SP]       ; push LR onto stack
function_body:
    ...
epilogue:
    ldr LR, [SP]       ; pop stack onto LR
    autiasp            ; verify LR ❷
    ret
Listing 1: The GCC and LLVM/Clang -msign-return-address
feature uses PA to sign and verify the return address in LR
when storing and loading it from the stack.
and LLVM/Clang. 5 An authenticated return address is computed
using the paciasp instruction (❶ in Listing 1) and verified with the
autiasp instruction (❷ in Listing 1). These instructions implicitly
use the value of the stack pointer (SP) as the modifier. An adversary
cannot create the correct PAC for an arbitrary pointer and there-
fore cannot modify the return address without causing a fault on
function return.
Prior PA-based solutions, including -msign-return-address,
are vulnerable to reuse attacks where an adversary replaces a valid
authenticated return address with another authenticated return
address previously read from the process’s memory. For a reused
PAC to pass verification, both the original and replacement PAC
must have been computed using the same PA key and modifier. This
applies to any PA scheme, not only authenticated return addresses.
For instance, if a constant modifier is used then all pointers based
on the same key are interchangeable. Using only the SP value as
a modifier reduces the set of interchangeable pointers, but still
allows reuse attacks when SP values coincide. Reuse attacks can be
mitigated by further narrowing the scope of modifier values [31],
but such techniques cannot completely prevent reuse attacks.
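The reuse condition can be illustrated with a short C sketch. It assumes the same kind of toy model as before: `toy_pac` is a non-cryptographic stand-in for HK, and `sign_return`, `auth_return` and `reuse_attack` are hypothetical names modelling paciasp, autiasp and the adversary's overwrite. The point it shows is that a signed return address observed earlier passes verification whenever the SP value at the victim's return matches the SP value used when it was signed.

```c
#include <stdint.h>

#define VA_SIZE   39
#define ADDR_MASK ((1ULL << VA_SIZE) - 1)
#define PAC_MASK  (0xffffULL << VA_SIZE)              /* 16-bit PAC in bits 39..54 */
#define ERROR_BIT (1ULL << 54)                        /* assumption: bit set on failure */

static uint64_t mix64(uint64_t z) {
    z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
    z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
    return z ^ (z >> 31);
}

/* Toy stand-in for H_K; NOT cryptographically secure. */
static uint64_t toy_pac(uint64_t key, uint64_t addr, uint64_t modifier) {
    return (mix64(addr ^ mix64(modifier ^ key)) << VA_SIZE) & PAC_MASK;
}

/* Models paciasp: sign the return address with SP as the modifier. */
uint64_t sign_return(uint64_t key, uint64_t ret, uint64_t sp) {
    uint64_t addr = ret & ADDR_MASK;
    return addr | toy_pac(key, addr, sp);
}

/* Models autiasp: verify with the current SP as the modifier. */
uint64_t auth_return(uint64_t key, uint64_t signed_ret, uint64_t sp) {
    uint64_t addr = signed_ret & ADDR_MASK;
    if ((signed_ret & PAC_MASK) == toy_pac(key, addr, sp))
        return addr;
    return addr | ERROR_BIT;
}

/* Adversary step: overwrite the victim's stack slot with a signed return
   address read earlier. The adversary never needs the key for this; the
   key parameter only drives the victim's own epilogue in the model. */
uint64_t reuse_attack(uint64_t key, uint64_t observed_signed_ret,
                      uint64_t victim_sp) {
    uint64_t stack_slot = observed_signed_ret;        /* memory overwrite */
    return auth_return(key, stack_slot, victim_sp);   /* victim's epilogue */
}
```

If two functions are called in sequence from the same caller, they typically see the same SP on entry, so a value signed for the first call can redirect the return of the second.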
3 ADVERSARY MODEL
In this work, we consider a powerful adversary, A, with arbitrary
control of process memory but restricted by a W⊕X policy. There-
fore A can read all process memory, but write operations and
execution are restricted such that A can neither modify program
code nor execute memory pages reserved for data (e.g., the pro-
gram stack). This adversary model is consistent with prior work
on run-time attacks [42].
These abilities allow A to modify any pointers in the process
data memory pages. In particular, A can modify the function return
addresses when they are stored on the program call stack.
In this work, we exclude adversaries with kernel mode privilege
escalation capabilities, i.e., A cannot undermine kernel integrity
or confidentiality. As a consequence, A cannot modify or read
sensitive data in kernel memory or kernel-managed registers, such
as the PA keys. As in prior work on control-flow integrity (CFI), we
do not consider non-control data attacks [12], such as data-oriented
programming (DOP) [23].
5https://gcc.gnu.org/gcc-7/changes.html and https://reviews.llvm.org/D49793
[Figure 2 (diagram): the return addresses ret0, ret1, ..., retn are chained
through auth0 = HK(ret0, 0), auth1 = HK(ret1, auth0), ...,
authn = HK(retn, authn−1).]
Figure 2: ACS is a chained MAC of tokens authi , i ∈ [0,n − 1] that are cryptographically bound to the corresponding return
addresses, reti , i ∈ [0,n], and the last authn .
4 REQUIREMENTS & ASSUMPTIONS
Our goal is to thwart A who modifies function return addresses
on the call stack in order to hijack the program control flow. We
define the following requirements for our solution:
R1 Return address integrity: Detect if a function return address
has been modified while in program memory.
R2 Memory disclosure tolerance: Remain effective even when A
can read the entire process address space.
R3 Compatibility: Be applicable to typical (standards-compliant)
C code, without requiring source code modifications.
R4 Performance: Impose only minimal run-time performance
and memory overhead, while meeting R1–R3.
We make the following assumptions about the system:
A1 A W⊕X policy that protects code memory pages from modification
by non-privileged processes. W⊕X is today supported
by all major processor architectures, including ARMv8-A.
A2 Coarse-grained forward-edge CFI. We assume that ACS is
combined with a CFI solution that restricts forward control-
flow transfers to a set of valid targets. Specifically, we assume
that indirect function-calls always target the beginning of
a function and that indirect jumps to arbitrary addresses
are infeasible. This property can be satisfied by several pre-existing
software-only CFI solutions with reasonable overhead
[1, 16, 28, 33], as well as with negligible overhead by
using hardware-assisted mechanisms like ARM PA itself [31],
branch target indicators [2], or TrustZone-M [5, 36].
Coarse-grained forward-edge CFI (A2) and W⊕X (A1) are used to
prevent A from tampering with the instrumentation that maintains
the ACS, as discussed in Section 7.2.
5 DESIGN: AUTHENTICATED CALL STACK
In this section we present our general design for ACS, not tied to a
particular hardware-assisted mechanism. In Section 6, we present
our implementation that efficiently realizes ACS using PA. While
PA approximates pointer integrity, it falls short when the modifier is
not unique to a pointer. Our key idea is to provide a modifier for the
return address by cryptographically binding it to all previous return
addresses in the call stack. This makes the modifier statistically
unique to a particular control-flow path, thus preventing reuse-type
attacks and allowing precise verification of return addresses.
Recall that on ARM systems, the return address is initially
stored in LR, which cannot be manipulated by A (Section 2.1).
However, non-leaf functions need to store their return address on
the stack before invoking a nested function. The return addresses
reti , i ∈ [0,n − 1] (where n is the depth of the call stack in terms of
[Figure 3 (diagram): stack-frame0 holds ret0; stack-frame1 holds auth0 and
ret1; stack-framei holds authi−1 and reti. The tokens are
auth0 := HK(ret0, 0), auth1 := HK(ret1, auth0) = HK(ret1, HK(ret0, 0)),
auth2 := HK(ret2, auth1) = HK(ret2, HK(ret1, HK(ret0, 0))), and in general
authi := HK(reti, authi−1) = HK(reti, HK(reti−1, authi−2)).]
Figure 3: ACS stores return addresses and intermediate authentication
tokens, authi , i ∈ [0,n − 1], on the stack. Only
the last token (authn ) needs to be securely stored.
active function records) must thus always be stored on the stack,
where A can modify them by exploiting memory vulnerabilities.
ACS protects these values by computing a series of chained au-
thentication tokens authi , i ∈ [0,n] that cryptographically bind
the latest authn to all return addresses reti , i ∈ [0,n − 1] stored
on the stack (Figure 2). Only the MAC key and the last authentica-
tion token authn must be stored securely to ensure that previous
auth tokens and return addresses can be correctly verified when
unwinding the call stack. We use a tweakable MAC function HK to
generate a b-bit authentication token authi :
\[
\mathit{auth}_i =
\begin{cases}
H_K(\mathit{ret}_i, \mathit{auth}_{i-1}) & \text{if } i > 0 \\
H_K(\mathit{ret}_i, 0) & \text{if } i = 0
\end{cases}
\]
authn is maintained in a register unmodifiable by A. Figure 3 shows
how authentication tokens and return addresses are stored on the
call stack. On function calls, authi is retained across the call to
the callee, which calculates authi+1 and stores both authi and the
corresponding return address reti+1 on its stack frame. On return,
the auth′i−1 and ret′i values are loaded from the stack and are verified
by comparing HK(ret′i, auth′i−1) to authi. If the results differ, then one
or both of the loaded values have been corrupted (R1). Otherwise,
they are valid, i.e., auth′i−1 = authi−1 and ret′i = reti, in which
case authi is discarded from the secure register and replaced with
the verified authi−1 before the function returns to reti.
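The call and return steps above can be sketched in C as follows. This is a minimal model under stated assumptions, not the PACStack instrumentation: `toy_mac` is a non-cryptographic stand-in for HK, tokens are a full 64 bits wide for simplicity, and the type and function names (`acs_t`, `acs_call`, `acs_return`) are hypothetical. The `stack` array models adversary-writable memory, while the `key` and `auth` fields model the protected key and the register holding authn.

```c
#include <stdint.h>
#include <stddef.h>

#define MAX_DEPTH 64

static uint64_t mix64(uint64_t z) {
    z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
    z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
    return z ^ (z >> 31);
}

/* Toy stand-in for H_K(ret, auth); NOT cryptographically secure. */
static uint64_t toy_mac(uint64_t key, uint64_t ret, uint64_t auth_prev) {
    return mix64(ret ^ mix64(auth_prev ^ key));
}

typedef struct {
    uint64_t ret;        /* ret_i, stored in writable memory */
    uint64_t auth_prev;  /* auth_{i-1}, stored in writable memory */
} frame_t;

typedef struct {
    uint64_t key;               /* MAC key, assumed out of the adversary's reach */
    uint64_t auth;              /* auth_n, models the protected register */
    frame_t  stack[MAX_DEPTH];  /* models the call stack */
    size_t   depth;
} acs_t;

void acs_init(acs_t *s, uint64_t key) {
    s->key = key;
    s->auth = 0;                /* so that auth_0 = H_K(ret_0, 0) */
    s->depth = 0;
}

/* Function call: store ret_i and auth_{i-1}, then compute auth_i. */
int acs_call(acs_t *s, uint64_t ret) {
    if (s->depth == MAX_DEPTH)
        return -1;
    s->stack[s->depth].ret = ret;
    s->stack[s->depth].auth_prev = s->auth;
    s->depth++;
    s->auth = toy_mac(s->key, ret, s->auth);
    return 0;
}

/* Function return: reload ret'_i and auth'_{i-1}, recompute the token and
   compare it with auth_i. Returns 0 and the verified address on success,
   -1 if the stack is empty or the stored values were modified. */
int acs_return(acs_t *s, uint64_t *ret_out) {
    if (s->depth == 0)
        return -1;
    frame_t f = s->stack[s->depth - 1];
    if (toy_mac(s->key, f.ret, f.auth_prev) != s->auth)
        return -1;
    s->depth--;
    s->auth = f.auth_prev;      /* verified auth_{i-1} replaces auth_i */
    *ret_out = f.ret;
    return 0;
}
```

Only `auth` and `key` need protection in this model; every value in `stack` can be overwritten, and a single modified field is caught when the frame is popped.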
[Figure 4 (diagram): the CPU registers LR and CR, and the stacks of f1 and
f2. Before calling f2, f1 copies authi−1 from LR to CR; the call sets LR to
reti; after f2 returns, f1 copies authi−1 from CR back to LR. On entry, f2
stores reti (LR) and authi−1 (CR) on its stack and sets LR to
HK(reti, authi−1). Before returning, f2 loads auth′i−1 and ret′i from the
stack into CR and R; it returns to R iff LR equals
HK(R = ret′i, CR = auth′i−1), and aborts otherwise.]
Figure 4: To maintain the integrity of ACS the last authentication
token is maintained and retained through function
calls in the designated CR. The notation x ′ indicates that x is
read from the stack and may have been compromised.
5.1 Authenticated return addresses
We can avoid the need to maintain separate auth and ret values by
defining a combined authenticated return address as follows:
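\[
\mathit{aret}_i = \mathit{auth}_i \parallel \mathit{ret}_i, \quad \text{where} \quad
\mathit{auth}_i =
\begin{cases}
H_K(\mathit{ret}_i, \mathit{aret}_{i-1}) & \text{if } i > 0 \\
H_K(\mathit{ret}_i, 0) & \text{if } i = 0
\end{cases}
\]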
We call authi and the corresponding areti valid if they are equal
to HK(reti ,areti−1) for some given areti−1.
In this variant, not only the current authentication token, but
also the current return address is securely stored. Because the
plain return address reti is never stored on the stack, A is limited
to manipulating the earlier authenticated return addresses on stack,
i.e., areti , i ∈ [0,n−1]. A compromised authenticated return address
must therefore pass two authentications before use: first when being
restored from the stack, and second, when being used as the target
of a function return. We discuss the security properties in Section 7.
The remainder of Section 5 will focus on aret , but unless other-
wise noted, similar argumentation also applies for separate auth
tokens.
5.2 Securing the authentication token
The current authenticated return address aretn is secured by keeping
it exclusively in a CPU register. On processors with a dedicated
link register, LR can be used to store aret ; otherwise an additional
register must be reserved for this purpose. Because a function call
overwrites LR, aret must be securely retained across the call.
This is done by modifying the calling convention such that
aret is kept in a specific register which we call a chain register (CR)
(Figure 4).
ACS protects the integrity of backward-edge control-flow trans-
fers. Combined with coarse-grained forward-edge CFI (Assump-
tion A2), it ensures that: 1) immediately after function return, the
aretn in CR is valid, 2) at function entry the aretn−1 stored in CR
is valid, and 3) LR is always used as or set to a valid aret . This
ensures that token updates are done securely, and that the ACS
instrumentation cannot be bypassed or used to generate arbitrary
authenticated return addresses.
5.3 Mitigation of hash-collisions:
authentication token masking
Though aretn is protected by hardware, the fact that it is embedded
in the return pointer means that the size b of the authentication
token auth is limited by the pointer address size. This is significant, as
collisions can be found after A has seen, on average, approximately
1.253 · 2^(b/2) tokens [40, Section 1.4.2] (e.g., 321 tokens for b = 16).
Despite this, we can still prevent A from recognizing collisions,
thus forcing A to guess which authenticated return addresses yield
a collision, succeeding with a probability of 2^(−b). The auth of any aret
stored on the stack is masked using a pseudo-random value derived
from the previous aret value:
\[
\mathit{auth}_i = H_K(\mathit{ret}_i, \mathit{aret}_{i-1}) \oplus H_K(0, \mathit{aret}_{i-1}).
\]
The mask HK(0, areti−1) is exclusive-OR-ed with HK(reti, areti−1) after
it is generated and before it is authenticated, thereby preventing
A from identifying opportunities for pointer reuse.
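The masking step can be sketched in C on top of a toy model of combined authenticated return addresses. The assumptions are the same as in any such model: `toy_pac` is a non-cryptographic stand-in for HK, the 16-bit token sits in bits 39..54 of a 64-bit pointer, and the names `sign_aret`, `auth_aret`, `masked_prologue` and `masked_epilogue` are hypothetical. Because the mask is the token of address 0, it has no address bits set and only changes the token part of aret.

```c
#include <stdint.h>

#define VA_SIZE   39
#define ADDR_MASK ((1ULL << VA_SIZE) - 1)
#define PAC_MASK  (0xffffULL << VA_SIZE)              /* 16-bit token in bits 39..54 */
#define ERROR_BIT (1ULL << 54)                        /* assumption: bit set on failure */

static uint64_t mix64(uint64_t z) {
    z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
    z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
    return z ^ (z >> 31);
}

/* Toy stand-in for H_K; NOT cryptographically secure. */
static uint64_t toy_pac(uint64_t key, uint64_t addr, uint64_t modifier) {
    return (mix64(addr ^ mix64(modifier ^ key)) << VA_SIZE) & PAC_MASK;
}

/* aret = auth || ret, with auth = H_K(ret, modifier). */
static uint64_t sign_aret(uint64_t key, uint64_t ret, uint64_t modifier) {
    uint64_t addr = ret & ADDR_MASK;
    return addr | toy_pac(key, addr, modifier);
}

/* Returns ret on success, an unusable address otherwise. */
static uint64_t auth_aret(uint64_t key, uint64_t aret, uint64_t modifier) {
    uint64_t addr = aret & ADDR_MASK;
    if ((aret & PAC_MASK) == toy_pac(key, addr, modifier))
        return addr;
    return addr | ERROR_BIT;
}

/* Function entry: spill aret_{i-1} to the stack slot and return the masked
   aret_i, whose token is H_K(ret_i, aret_{i-1}) xor H_K(0, aret_{i-1}). */
uint64_t masked_prologue(uint64_t key, uint64_t ret, uint64_t aret_prev,
                         uint64_t *stack_slot) {
    *stack_slot = aret_prev;
    uint64_t unmasked = sign_aret(key, ret, aret_prev);
    uint64_t mask = sign_aret(key, 0, aret_prev);     /* token bits only */
    return unmasked ^ mask;
}

/* Function exit: recreate the mask from the reloaded aret'_{i-1}, remove it,
   then verify. *prev_out receives the value restored as the new chain head. */
uint64_t masked_epilogue(uint64_t key, uint64_t aret, uint64_t stack_slot,
                         uint64_t *prev_out) {
    uint64_t mask = sign_aret(key, 0, stack_slot);
    *prev_out = stack_slot;
    return auth_aret(key, aret ^ mask, stack_slot);
}
```

A value that A reads from the stack is therefore always a masked token, so two equal unmasked tokens under different modifiers do not appear equal in memory unless their masks also happen to coincide.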
5.4 Irregular stack unwinding
The C standard includes the setjmp / longjmp programming in-
terface, which can be used to add exception-like functionality to
C (Listing 2). The longjmp C function executes a non-local jump
to a prior calling environment stored using the setjmp function.
At setjmp, callee-saved registers (whose values are guaranteed to
persist through function invocations), as well as the stack pointer
SP and return address are stored in the given jmp_buf buffer (➀ in
Listing 2). setjmp returns 0 to indicate that execution is continuing
directly after the call. Upon executing longjmp, the environment
is restored from jmp_buf (➂); program execution continues at the
setjmp return site with a non-zero value (➁). Calling longjmp us-
ing an expired buffer, i.e., after the corresponding setjmp caller
has returned (➃), results in undefined behavior (the implications of
this are discussed in Section 10.2). Because jmp_buf also stores the
latest authenticated token, ACS needs a mechanism to ensure its
integrity when using setjmp and longjmp.
When stored in memory, the integrity of jmp_buf cannot be
guaranteed. Nonetheless, the stored areti is bound to the corre-
sponding areti−1 on the setjmp caller’s stack. This ensures that
longjmp always restores a valid ACS state. To limit the set of values
A can inject into jmp_buf, we replace the setjmp return address
retb in jmp_buf with aretb , defined as:
\[
\mathit{aret}_b = (H_K(\mathit{ret}_b, \mathit{aret}_i) \parallel \mathit{ret}_b) \oplus H_K(\mathit{SP}_b, \mathit{aret}_i),
\]
where SPb is the SP value stored in jmp_buf. When executing
longjmp, aretb is recalculated based on the buffer values to verify
that the stored areti was stored by a setjmp. A cannot generate
the aretb value for an arbitrary areti , nor replace aretb with a
previously observed areti . However, because longjmp explicitly
allows jumping to prior states, ACS cannot ensure that the target is
the intended one, i.e., A could substitute the correct jmp_buf with
another. Shadow stacks share a similar limitation [15], and cannot
guarantee that the intended state has been reached, only that the
return address (and stack pointer) in that state is intact.
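The binding of jmp_buf contents can be sketched in C as follows. This is an illustration under assumptions: `toy_pac` is a non-cryptographic stand-in for HK, `toy_jmp_buf` is a hypothetical three-field layout (a real jmp_buf is platform-specific and holds more state), and `setjmp_bind` and `longjmp_check` are hypothetical names for the steps performed around setjmp and longjmp.

```c
#include <stdint.h>

#define VA_SIZE   39
#define ADDR_MASK ((1ULL << VA_SIZE) - 1)
#define PAC_MASK  (0xffffULL << VA_SIZE)              /* 16-bit token in bits 39..54 */
#define ERROR_BIT (1ULL << 54)                        /* assumption: bit set on failure */

static uint64_t mix64(uint64_t z) {
    z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
    z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
    return z ^ (z >> 31);
}

/* Toy stand-in for H_K; NOT cryptographically secure. */
static uint64_t toy_pac(uint64_t key, uint64_t value, uint64_t modifier) {
    return (mix64(value ^ mix64(modifier ^ key)) << VA_SIZE) & PAC_MASK;
}

/* Hypothetical, simplified jmp_buf layout used only for this sketch. */
typedef struct {
    uint64_t aret_b;   /* replaces the plain setjmp return address ret_b */
    uint64_t aret_i;   /* chain head at the time of the setjmp call */
    uint64_t sp_b;     /* stack pointer at the time of the setjmp call */
} toy_jmp_buf;

/* At setjmp: aret_b = (H_K(ret_b, aret_i) || ret_b) xor H_K(SP_b, aret_i). */
void setjmp_bind(uint64_t key, toy_jmp_buf *buf, uint64_t ret_b,
                 uint64_t sp_b, uint64_t aret_i) {
    uint64_t addr = ret_b & ADDR_MASK;
    buf->aret_i = aret_i;
    buf->sp_b = sp_b;
    buf->aret_b = (addr | toy_pac(key, addr, aret_i))
                  ^ toy_pac(key, sp_b, aret_i);
}

/* At longjmp: remove the SP-dependent term using the buffer values, then
   verify the token. Returns ret_b on success, an unusable address otherwise. */
uint64_t longjmp_check(uint64_t key, const toy_jmp_buf *buf) {
    uint64_t unmasked = buf->aret_b ^ toy_pac(key, buf->sp_b, buf->aret_i);
    uint64_t addr = unmasked & ADDR_MASK;
    if ((unmasked & PAC_MASK) == toy_pac(key, addr, buf->aret_i))
        return addr;
    return addr | ERROR_BIT;
}
```

As the text notes, this check cannot tell whether the buffer is the one the programmer intended; it only ties the return address and stack pointer in the buffer to the stored chain head.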
#include <setjmp.h>

#define E_NUM 1                   // example error code, must be non-zero

jmp_buf ebuf;

void checked_func(void);
void handle_error(int err);       // error handler, defined elsewhere

void try_catch(void) {
    int err;

    if (!(err = setjmp(ebuf))) {  // iff ebuf set ➀
        checked_func();           // after setjmp
    } else {
        handle_error(err);        // after longjmp ➁
    }
}

void checked_func(void) {
    // ...
    longjmp(ebuf, E_NUM);         // throw exception ➂
    // ...
}

int main(void) {
    try_catch();
    longjmp(ebuf, E_NUM);         // undefined behavior! ➃
}
Listing 2: setjmp / longjmp allows the programmer to trans-
fer execution to another location, potentially in another
function. The location, and the state of the environment af-
ter the transfer, is determined by an in-memory buffer con-
taining the calling environment of a previous setjmp call.
Calling longjmp after the calling environment is destroyed
results in undefined behavior.
6 IMPLEMENTATION: PACSTACK
We present PACStack, an ACS realization using ARMv8.3-A PA.
PACStack is based on LLVM 7.0 and integrated into the 64-bit ARM
backend, used via llc, the LLVM static compiler. PACStack adds
two compilation passes: 1) to instrument function calls for aret
propagation, and 2) to instrument function prologues and epilogues.
The instrumentation is applied by passing the -pafss-ng flag to
llc when transforming LLVM bitcode to target-specific assembly.
We plan to add PACStack support to the Clang compiler driver.
Compiler source code will be made available at https://pacstack.github.io
6.1 Function call instrumentation
Recall from Section 5 that ACS can be implemented using separate
auth and ret tokens (variant 1), or using a combined authenticated
return address (variant 2).
In both PACStack variants, we designate the general purpose
register X28 as the chain register (CR) and reserve it for instrumen-
tation use. PACStack instruments call sites to move auth (variant 1)
or aret (variant 2) to CR (❶ in Listing 3) in order to retain its value
through function calls that overwrite the link register (LR) (❷). After
function return, the contents of CR are restored to LR (❸).
The advantage of using X28 is that it is a callee-saved register.
Whenever a function uses a callee-saved register, it must also ensure
that the old value is restored before return. By using X28 as CR,
call-site:
    mov X28, LR        ; CR ← authi ❶
    bl  @func          ; LR ← reti+1 ❷
    mov LR, X28        ; LR ← authi ❸
Listing 3: PACStack retains the last auth / aret via CR, defined
as the general purpose register X28.
PACStack can be transparently mixed with uninstrumented code
(either PACStack-instrumented applications using uninstrumented
libraries, or vice-versa). We discuss the security implications of
mixing instrumented and uninstrumented code in Section 10.3.
Our current PACStack implementation reserves X28 exclusively
for instrumentation use because the LLVM 7.0 implementation pre-
vents LR-use without substantial changes to compiler internals6.
However, we expect the performance cost to be negligible, as cases
where the compiler needs to utilize all callee-saved registers (X19-
X29) are infrequent. Note that reserving exclusive use of a register
has also been proposed for shadow stacks on the x86 architec-
ture [10], even though x86 has fewer general purpose registers
compared to 64-bit ARM processors. Unlike shadow stacks, ACS
in general can avoid consuming additional registers by using LR to
store auth (variant 1, Section 6.2) or aret (variant 2, Section 6.2).
6.2 Authenticated return addresses with PA
Variant 1: generating auth with pacga. In this variant, we use
pacga to generate auth tokens:
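\[
\mathrm{Xd} \leftarrow \mathit{auth}_i =
\begin{cases}
\mathrm{pacga}(\mathrm{Xd},\ \mathrm{LR} = \mathit{ret}_i,\ \mathrm{CR} = \mathit{auth}_{i-1}) & \text{if } i > 0 \\
\mathrm{pacga}(\mathrm{Xd},\ \mathrm{LR} = \mathit{ret}_i,\ \mathrm{CR} = \text{any}) & \text{if } i = 0
\end{cases}
\]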
To generate and verify authentication tokens, PACStack instruments
function prologues and epilogues (Listing 4). In the function
prologue, authi−1 and reti (in CR and LR, respectively) are stored
on the function stack frame and then used to generate a new authi
with pacga (❶). Before function return, PACStack verifies the
auth′i−1 and ret′i read from the stack by calculating the corresponding
auth′i (❸) and comparing it to authi, stored in LR (❹). For ret0,
any value currently in CR is used to generate auth0 and stored for
later validation.
Variant 1 can efficiently compute 32-bit authentication token
values using pacga. However, it has two drawbacks. First, an additional
stack store / load is added for the 4-byte token; to preserve the
callee-saved behavior of CR, the full 8-byte register content must be
stored on the stack. Second, the output of pacga must be explicitly
checked using a comparison and a conditional branch instruction.
For this reason, our current implementation only supports variant 2
below. However, in Section 10.1 we discuss using pacga to bind
other stack-based write-once data to a specific ACS state.
Variant 2: generating aret with pacib. In this variant, we use
pacib and autib instructions to efficiently calculate and verify ACS
authenticated return addresses (Listing 5). These instructions differ
from pacga in that the output is an authenticated return address
6https://github.com/llvm/llvm-project/blob/llvmorg-7.0.0/llvm/lib/Target/AArch64/AArch64CallingConvention.td#L278
prologue:
    stp   X28, LR, [SP]   ; stack ← authi−1, reti
    pacga LR, LR, X28     ; LR ← authi ❶
function_body:
    ...
epilogue:
    ldp   X28, Xr, [SP]   ; CR, Xr ← auth′i−1, ret′i from stack ❷
    pacga Xd, Xr, X28     ; Xd ← auth′i ❸
    cmp   Xd, LR          ; if (auth′i ≠ authi) ❹
    b.ne  abort           ; then abort
    ret   Xr              ; return via Xr to reti
Listing 4: Variant 1 of PACStack generates and verifies auth
tokens using pacga (❶ and ❸). Both authi−1 and reti are
stored on the stack, and are hence validated against authi
on function return (❷). Where possible, the store pair (stp) /
load pair (ldp) instructions are used to minimize the latency
for successive loads / stores.
prologue:
    str   X28, [SP]       ; stack ← areti−1 ➀
    pacib LR, X28         ; LR ← areti ➁
function_body:
    ...
epilogue:
    ldr   X28, [SP]       ; CR ← aret′i−1 from stack ➂
    autib LR, X28         ; LR ← (reti or ret∗i) ➃
    ret
Listing 5: At function entry, PACStack stores the prior
areti−1 on the stack (➀) and generates the new areti (➁).
Before return, areti−1 is loaded from the stack (➂) and
verified against areti (➃). On verification failure, LR is set to
an invalid address ret∗i , causing a fault on return.
which is directly written to LR:
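\[
\mathrm{LR} \leftarrow \mathit{aret}_i =
\begin{cases}
\mathrm{pacib}(\mathrm{LR} = \mathit{ret}_i,\ \mathrm{CR} = \mathit{aret}_{i-1}) & \text{if } i > 0 \\
\mathrm{pacib}(\mathrm{LR} = \mathit{ret}_i,\ \mathrm{CR} = \text{any}) & \text{if } i = 0
\end{cases}
\]
The corresponding verification is similar, and defined as:
\[
\mathrm{LR} \leftarrow \mathrm{autib}(\mathrm{LR} = \mathit{aret}_i,\ \mathrm{CR}) =
\begin{cases}
\mathit{ret}_i & \text{if } H_K(\mathit{ret}_i, \mathrm{CR}) = \mathit{auth}_i \\
\mathit{ret}^{*}_i & \text{otherwise,}
\end{cases}
\]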
where autibwill automatically handle verification errors by setting
LR to an unusable address ret∗i . No additional checking is needed;
executing a return to ret∗i causes an address translation fault
(Section 2.2). In variant 2, PACStack requires no additional stack space
as areti−1 is stored on the stack in place of reti , not in addition to
it. The value of CR for aret0 is handled identically as in variant 1
for auth0.
6.3 Mitigating hash collisions: PAC masking
To prevent A from identifying PAC collisions that can be reused to
violate the integrity of the call stack, PACStack masks all authentication
token values before storing them on the stack (Listing 6).
A pseudo-random value is generated by computing a PAC for the
address 0x0, pacib(0, areti−1) (❶, ❸).
prologue:
    str   X28, [SP]       ; stack ← areti−1
    pacib LR, X28         ; LR ← unmasked areti
    mov   Xd, X28         ; Xd ← areti−1
    mov   X28, #0         ; CR ← 0
    pacib X28, Xd         ; CR ← maski ❶
    eor   X28, X28, LR    ; CR ← maski ⊕ unmasked areti = areti ❷
function_body:
    ...
epilogue:
    ldr   Xd, [SP]        ; Xd ← aret′i−1 from stack
    mov   LR, #0          ; LR ← 0
    pacib LR, Xd          ; LR ← mask′i ❸
    eor   LR, LR, X28     ; LR ← mask′i ⊕ areti ❹
    mov   X28, Xd         ; CR ← aret′i−1
    autib LR, X28
    ret
Listing 6: PACStack masks authentication tokens to prevent
A from detecting PAC collisions. The mask is created in
CR with pacib(0, areti−1) (❶), and exclusive-OR-ed with the
unmasked authentication token (❷). On return, the mask
is recreated (❸) and applied to the masked authentication
token areti (❹) before verification.
By using pacib we efficiently obtain a pseudo-random value
that can be directly applied to the authentication token part of aret
using only an exclusive-or instruction (eor).
Because this construction uses the same key to generate both
authentication tokens and masks, A must not obtain an areti for a
reti = 0x0 and any existing areti−1. PACStack will never generate
such aret values, as the return address never points to memory
address zero. To prevent leaking the mask directly, it is cleared after
use. We can thus be certain that no HK(0,x) value is visible to A
nor possible to pre-compute without the confidential PA key.
This approach to masking requires two additional PAC calcu-
lations for each function activation. Our current implementation
supports this as an optional feature that can be invoked using the
-pafss-ng-cp flag.
6.4 Irregular stack unwinding
PACStack binds jmp_buf buffers to the areti at the time of setjmp
call by replacing the setjmp return address retb with its authenti-
cated counterpart aretb (Section 5.4). We do not modify the libc
implementation; instead setjmp / longjmp calls are replaced with
wrapper functions that inject the necessary instrumentation.
The setjmp_wrapper wrapper function (Listing 7) executes
setjmp and updates the buffer with aretb . PACStack generates
aretb based on the current SP value, CR and the setjmp return ad-
dress; this avoids the need to read the values setjmp has stored.
The longjmp_wrapper (Listing 8) retrieves aretb , areti , and the SP
values from the buffer. It then verifies the values and writes retb
into jmp_buf.
6.5 Multi-threading
Because ARMv8-A has only one set of general purpose registers,
their values must be stored in memory when entering EL1 (i.e.
setjmp_wrapper:
    ...                   ; Xb ← jmp_buf
    bl    <setjmp>
ret_b:
    cbnz  X0, <return>    ; exit iff after longjmp
    mov   Xd, <ret_b>     ; Xd ← retb
    mov   X28, SP         ; CR ← SPb
    pacib Xd, LR          ; Xd ← pacib(retb, areti)
    pacib X28, LR         ; CR ← pacib(SPb, areti)
    eor   X28, X28, Xd    ; CR ← aretb
    str   X28, [Xb, #r]   ; replace return in jmp_buf
return:
    ...
Listing 7: PACStack redirects setjmp calls to its own
setjmp_wrapper that binds the return address and areti
in jmp_buf to the current stack frame and corresponding
areti−1. #r is the offset of the return address within jmp_buf.
1 longjmp_wrapper:
2 ... ; Xb← jmp_buf
3 ldr X28 , [Xb, #a] ; CR← aret ′i
4 ldr LR , [Xb, #r] ; LR← aret ′b
5 ldr Xd , [Xb, #s] ; CR← SP ′b
6 autib Xd , LR; ; Xd← autib(SP ′b, aret ′i )
7 autib X28 , LR; ; CR← autib(aret ′b, aret ′i )
8 eor X28 , X28 , Xd; ; CR← r etb
9 str X28 , [Xb, #r] ; replace return in jmp_buf
10 bl <longjmp >
11 ...
Listing 8: Before longjmp, the PACStack longjmp_wrapper
checks the binding of the aret ′b , ret
′
b and sp
′
b values stored
in jmp_buf.A cannot generate aret ′b for arbitrary values and
therefore cannot inject them in jmp_buf. #r, #a and #s are the
offsets to retb , CR, and reti within jmp_buf.
kernel-mode) from EL0 (i.e. user-mode), for example during context
switches and system calls. This must not allow A to modify the
aret values or read the mask, which are both exclusively in either
CR or LR during execution (Listings 5 and 6), but must be stored
in memory during the context switch. On ARMv8-A, system calls
are implemented using the supervisor call instruction (svc) that
switches the CPU to EL1 and triggers a configured handler. On
64-bit ARM, Linux v5.0 uses the kernel_entry⁷ macro to store all
register values on the EL1 stack, where they cannot be accessed
by user-space processes. During context switches, callee-saved
registers (including CR) and LR are stored in struct cpu_context⁸,
which belongs to the in-kernel task structure and cannot be accessed
by user space. The CR and LR values of a non-executing task are
thus securely stored within the kernel, beyond the reach of other
processes or other threads within the same process. Consequently, no kernel
modifications are needed to securely apply PACStack to multi-threaded
applications.
⁷ https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/kernel/entry.S?h=v5.0
⁸ https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/include/asm/processor.h?h=v5.0
7 SECURITY EVALUATION
We address two questions in this section:
(1) Is the ACS scheme cryptographically secure?
(2) Do ACS’s guarantees hold when instantiated as PACStack?
7.1 ACS security
A generic representation of an attack against ACS is shown in
Figure 5. Under normal operation, function C returns to A if called
from A (Figure 5a); i.e., when called from A, the return address of
C is an address retA in A. The goal of A, shown in Figure 5b, is to
cause the function C to return to some other address retB .
Since the authenticated return address aretA containing retA
is protected from A, in order to perform a backward-edge
control-flow attack, A must achieve two goals successfully:
AG-Jump: Obtain an authenticated return address aretB ,
valid with respect to some known modifier, which will validate
successfully when C returns.
AG-Load: Violate the integrity of the call stack such that the
LR register is loaded with aretB from AG-Jump rather than the
correct authenticated return address aretA.
This requires two returns: one from a ‘loader’ function to load
A’s aretB into LR, and another from C to the return address retB
contained in aretB .
In the analyses below, we treat the authentication token HK(P, m)
as a random oracle with respect to both the pointer P and modifier m.
This means that if HK(P, m) has never been computed by a function
call, HK(P, m) will match any value with probability 2^{-b}. The design
of ACS ensures that there is no authentication oracle available: the
only way to test whether an authentication token is valid with
respect to some pointer and modifier is to attempt to return using
the return address and token, triggering a crash if the token is
incorrect. The difficulty of achieving these goals therefore depends
on whether A’s desired control-flow violation follows the call graph
of the program and whether authentication tokens are masked.
Violating control-flow integrity while still traversing the call graph
is easier because this allows A to harvest authentication tokens
and search for collisions; violations that do not follow the call graph
are more difficult because they require that A make one or more
guesses, risking a crash.
7.1.1 Violations that follow the call graph. As A can harvest au-
thenticated return pointers when they are written to the stack, the
short authentication tokens mean that in the absence of masking
an attacker can violate the integrity of the call stack by finding
collisions in HK(·, ·).
In order to achieve goal AG-Load, A must find two authenticated
return addresses aretA and aretB, such that i) they are both returned
to by a function C, ii) C contains a call-site to the loader function
with a corresponding return address retC, and iii)
$$H_K(\mathit{ret}_C, \mathit{aret}_A) = H_K(\mathit{ret}_C, \mathit{aret}_B) = \mathit{auth}_{\mathrm{collision}}. \tag{1}$$
Note that the collisions must be for different values in the sec-
ond argument only, since this is the value under the control of A.
[Figure 5 (diagram omitted). Panel (a), normal control flow: functions A, B and C with the correct control flow. Panel (b), A’s desired control flow: functions A, B, C and the loader, with the AG-Load, AG-Jump and correct control-flow edges.]
Figure 5: Anatomy of a backward-edge control-flow attack against ACS. In order to force function C to return to B instead of its
caller A, A substitutes their authenticated return address aretB when some function (the ‘loader’) returns to retC in function
C (goal AG-Load). If aretB is valid with respect to some known modifier, then at the end of function C the program will return
to the corresponding retB (goal AG-Jump).
Collisions that require different values for retC cannot be exploited
because retC is in a register and cannot be modified by A.
The authentication tokens contained in aretA and aretB depend
on the path that A has taken through the call graph. A can obtain
as many authentication tokens with retC as a pointer as there are
distinct execution paths leading to C. The number of such paths will
explode combinatorially as the complexity of the program increases,
and cycles in the call graph—as occur in Figure 5—make the number
of paths essentially infinite, limited only by available stack space.
Having found such a collision, A then arranges for function C
to be called, traversing the call graph in such a way that it is set up
to return to A using aretA. Then, when the function C calls into the
loader function, it will set LR to aretC . When the loader function
returns to retC , it will attempt to load aretA from the stack. Instead,
A substitutes aretB , which because of (1) will validate correctly
when returning to retC. Since aretB is a valid authenticated return
address, C will successfully return to retB , thereby violating the
integrity of the call stack.
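The substitution described above can be reproduced in a toy model. This sketch makes several assumptions that are not part of PACStack: the token is a truncated HMAC-SHA256, its width is reduced to 8 bits so that a collision is guaranteed after a few hundred harvested values, masking is disabled, and all names are hypothetical.

```python
import hashlib
import hmac

B = 8           # deliberately tiny token width so that a collision is quick to find
ADDR_BITS = 48  # assumed virtual-address width
ADDR_MASK = (1 << ADDR_BITS) - 1


def h(key: bytes, pointer: int, modifier: int) -> int:
    """Toy stand-in for H_K(pointer, modifier), truncated to B bits."""
    msg = pointer.to_bytes(8, "little") + modifier.to_bytes(8, "little")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return digest[0] & ((1 << B) - 1)


def make_aret(key: bytes, ret: int, prev_aret: int) -> int:
    """aret_i = H_K(ret_i, aret_{i-1}) in the token bits, ret_i below (no masking)."""
    return (h(key, ret, prev_aret) << ADDR_BITS) | ret


def checked_return(key: bytes, lr: int, prev_from_stack: int) -> int:
    """Epilogue check: verify LR against the previous aret loaded from the stack."""
    ret = lr & ADDR_MASK
    if (lr >> ADDR_BITS) != h(key, ret, prev_from_stack):
        raise RuntimeError("authentication failure: process crashes")
    return ret


def harvest_collision(key: bytes, ret_c: int, harvested):
    """Find two harvested aret values whose tokens under pointer ret_c coincide.

    The key appears here only because the toy model recomputes tokens; a real
    adversary would instead read the unmasked tokens from the stack.
    """
    seen = {}
    for candidate in harvested:
        token = h(key, ret_c, candidate)
        if token in seen:
            return seen[token], candidate
        seen[token] = candidate
    return None
```

With more than 2^8 distinct harvested values a colliding pair (aretA, aretB) must exist; `checked_return(key, make_aret(key, ret_c, aret_a), aret_b)` then succeeds even though aretB was substituted for aretA, which is the AG-Load step.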
More concretely, after collecting q authentication tokens, the
probability that some pair collides is [40, Section 1.4.2]:
$$p_{\mathrm{collision}}(q) = 1 - \frac{(2^b)!}{(2^b - q)! \cdot 2^{q \cdot b}}$$
This quickly approaches 1 as A collects more tokens, with a collision
occurring on average after obtaining
$$q = \sqrt{\frac{\pi 2^b}{2}}$$
tokens. With a 16-bit PAC, A will therefore obtain a collision after
harvesting 321 pointers on average.
In order to successfully mount the attack described above, A
must find two colliding authentication tokens and then perform
the substitution. Without masking, A can read the authentication
token from the stack. A can then keep collecting authentication
tokens until they find two that collide; since these are both valid
pointers, A will always succeed once this occurs, thus
P[AG-Load|Collision] = 1.
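The birthday bound used in this analysis can be checked numerically. The sketch below evaluates the collision probability as a running product, which is algebraically equal to the factorial expression but avoids huge intermediate values; the function names are illustrative only.

```python
import math


def p_collision(q: int, b: int) -> float:
    """Probability that at least two of q uniformly random b-bit tokens collide."""
    n = 2 ** b
    p_no_collision = 1.0
    for i in range(q):
        # The (i+1)-th token must avoid the i distinct values seen so far.
        p_no_collision *= (n - i) / n
    return 1.0 - p_no_collision


def expected_tokens(b: int) -> float:
    """Approximate mean number of tokens harvested before the first collision."""
    return math.sqrt(math.pi * 2 ** b / 2)
```

For b = 16, `expected_tokens(16)` is approximately 320.8, which rounds to the 321 pointers quoted in the text.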
When masked, A cannot identify authentication token collisions:
aretA and aretB have different masking values HK(0, aretA)
and HK(0, aretB), and so it is impossible to identify a collision with
a more than negligibly better probability than by random selection.
This means that A will succeed in the attack above with a
probability of 2^{-b}. We give a detailed proof in Appendix A.
In practice, this means that A can use this attack to traverse the
program’s call graph, but cannot jump to an address that is not a
valid return address for function C.
7.1.2 Violations that leave the call graph. We now consider A’s
probability of success when attempting to return to an address retB
in a way that does not follow the program’s call graph.
In this case, the path from B to C has not been traversed, and
the instrumentation has never before computed the authentication
token HK(retC, aretB). Therefore, A succeeds at AG-Load, i.e.,
HK(retC, aretB) = HK(retC, aretA), with probability P[AG-Load] =
2^{-b}, irrespective of whether the substituted aretB is a valid
authenticated return address. On failure, which has probability 1 − 2^{-b},
the process will crash.
A’s probability of then achieving goal AG-Jump depends on
whether retB is the return address of a valid call-site. If it is, then
A can obtain a valid authenticated return pointer for that location
in the same way as in Section 7.1.1, thereby succeeding with
probability P[AG-Jump] = 1. If retB has never been used as a return
address, then no authentication token has ever been generated for
that pointer. Therefore, AG-Jump is achieved with probability at
most P[AG-Jump] = 2^{-b}; otherwise, the process crashes.
A can therefore succeed with probability 2^{-b} when the return
address is a valid call-site return address, or with probability
2^{-2b} when it is not.
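As a worked check of the two off-graph cases, the overall success probability is the product of the probabilities of the two goals. This step relies on the random-oracle treatment of HK stated earlier, under which the two events are independent. The numeric values use b = 16 for illustration, matching the 16-bit PAC example in Section 7.1.1.

```latex
\begin{align*}
P[\text{success}] &= P[\text{AG-Load}] \cdot P[\text{AG-Jump}] \\
\text{call-site target:}\quad P[\text{success}] &= 2^{-b} \cdot 1 = 2^{-b}
  \approx 1.5 \times 10^{-5} \quad (b = 16) \\
\text{arbitrary target:}\quad P[\text{success}] &\le 2^{-b} \cdot 2^{-b} = 2^{-2b}
  \approx 2.3 \times 10^{-10} \quad (b = 16)
\end{align*}
```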
We summarize our results in Table 1.
Violation type                   No masking   Masking
On-graph                         1            2^{-b}
Off-graph to call-site           2^{-b}       2^{-b}
Off-graph to arbitrary address   2^{-2b}      2^{-2b}
Table 1: Maximum probability of success for various call-stack
integrity violations with and without masking.
7.2 Run-time attack resistance of PACStack
PACStack must ensure the integrity of aretn and the confidentiality
of the masks. The former is achieved by storing aretn in LR or
CR, which is reserved for this purpose, not used by regular code,
and hence inaccessible to A (Section 6.1). The latter is maintained
as the mask is re-generated each time it is needed, only stored
in LR, and cleared after use (Section 6.3). This holds true also in
multi-threading environments (Section 6.5).
Recent results have shown that traditional CFI solutions are unable
to withstand control-flow bending [11]: attacks where each
control-flow transfer follows the program’s CFG, but the program
execution trace conforms to no feasible benign execution trace.
PACStack, and ACS in general, is not susceptible to backward-edge
control-flow bending, because it precisely protects the integrity of
the authenticated return addresses while they remain on the stack.
A cannot trick PACStack into deviating from an expected return flow
by replacing aretn with a valid but outdated aret value, because
PACStack never writes aretn onto the stack. A also cannot reliably
exploit PAC collisions to replace part of the aret chain, as each
aret is masked. A cannot tamper with the instrumentation itself by
modifying the instructions in memory (Assumption A1). By requiring
coarse-grained forward-edge CFI (Assumption A2), PACStack
ensures that authentication token calculations and masking are
executed atomically and cannot be used to manipulate reti, areti−1
or the mask during the function prologue and epilogue. This holds
even if the forward-edge CFI is susceptible to control-flow bending.
7.2.1 Tail-calls and signing gadgets. A recent discovery by Google
Project Zero⁹ shows that PA schemes can be vulnerable to an
attack whereby specific code sequences can be used as gadgets to
generate PACs for arbitrary pointers. Recall that on PAC verification
failure an aut instruction removes the PAC, but corrupts a well-known
high-order bit such that the pointer becomes invalid. If a
pac instruction adds a PAC to a pointer P with corrupt high-order
bits, it treats the high-order bits as though they were correct when
calculating the new PAC, and flips a well-known bit p of the PAC
if any high-order bit was corrupt. This means that instruction
sequences such as the one shown in Listing 9, consisting of an aut
instruction followed by a pac instruction, can be used to generate a
valid PAC for a pointer even if the original pointer is not valid to
begin with. A writes an arbitrary pointer P to memory (❶) and
allows it to be verified. When verification fails, autia removes the
PAC and corrupts the high-order bit in P, writing the resulting P∗
to the destination register (❷). The subsequent pacia will add the
correct PAC for P, then flip bit p of the PAC to indicate that the input
pointer was invalid (❸). A can now flip bit p back (❺) in order to
obtain the correct PAC for pointer P (❻).
⁹ https://googleprojectzero.blogspot.com/2019/02/examining-pointer-authentication-on.html
    ...                     ; A injects P at <ptr> ❶
    ldr   Xd, <ptr>         ; Xd ← P
    autia Xd, <mod>         ; Xd ← P∗ ❷
    pacia Xd, <mod>         ; Xd ← pacia(P, <mod>) ⊕ p ❸
    str   Xd, <ptr>         ; <ptr> ← Xd
    ...                     ; A sets <ptr> to <ptr> ⊕ p ❺
    ldr   Xd, <ptr>         ; Xd ← pacia(P, <mod>)
    autia Xd, <mod>         ; Xd ← P (valid pointer) ❻
Listing 9: PA adds a PAC based on the address bits. An invalid
input pointer (❶) causes only a single bit-flip in the output
PAC (❸). This could be exploited to generate valid PACs for
arbitrary pointers.
A:
epilogue:
    ...
    ldr   X28, [SP]         ; load invalid aret′i−1
    autib LR, X28           ; LR ← ret∗i ➂
    b     <B>               ; tail call B ➀

B:
prologue:
    str   X28, [SP]
    pacib LR, X28           ; LR ← areti ⊕ p ➄
    ...
epilogue:
    ...
    autib LR, X28           ; LR ← ret∗i ➃
    ret                     ; ➁
Listing 10: Tail-call optimizations on 64-bit ARM remove
an unnecessary return by converting a branch with link
instruction to a non-linking branch instruction (➀).
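The signing gadget can be mimicked with a toy model of the aut and pac semantics described above. This is a sketch under explicit assumptions, not a model of real hardware: the PAC is a truncated HMAC-SHA256 in bits 48 to 62, the failure marker is bit 63, bit p is taken to be bit 48, and the injected pointer already has bit 63 set so that the first verification is guaranteed to fail. All names and bit positions are hypothetical.

```python
import hashlib
import hmac

ADDR_BITS = 48
ADDR_MASK = (1 << ADDR_BITS) - 1
PAC_BITS = 15            # toy PAC field occupying bits 48..62
ERR_BIT = 1 << 63        # toy "well-known high-order bit" set when aut fails
P_BIT = 1 << ADDR_BITS   # toy position of the PAC bit p


def h(key: bytes, addr: int, mod: int) -> int:
    """Toy PAC: HMAC-SHA256 over (address, modifier), truncated to PAC_BITS."""
    msg = addr.to_bytes(8, "little") + mod.to_bytes(8, "little")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "little") & ((1 << PAC_BITS) - 1)


def aut(key: bytes, ptr: int, mod: int) -> int:
    """Strip the PAC; on mismatch, return the address with the error bit set."""
    addr = ptr & ADDR_MASK
    if (ptr >> ADDR_BITS) == h(key, addr, mod):
        return addr
    return addr | ERR_BIT


def pac(key: bytes, ptr: int, mod: int) -> int:
    """Sign the address bits; flip bit p if the input had corrupt high-order bits."""
    addr = ptr & ADDR_MASK
    signed = addr | (h(key, addr, mod) << ADDR_BITS)
    if ptr >> ADDR_BITS:
        signed ^= P_BIT
    return signed


def signing_gadget(key: bytes, addr: int, mod: int) -> int:
    """Replay the aut/pac sequence of the gadget with the adversary's bit flip."""
    stored = addr | ERR_BIT           # adversary injects an invalid pointer
    stored = aut(key, stored, mod)    # verification fails, error bit remains set
    stored = pac(key, stored, mod)    # correct PAC for addr, but with bit p flipped
    return stored ^ P_BIT             # adversary flips bit p back in memory
```

In this model `signing_gadget(key, addr, mod)` equals `pac(key, addr, mod)` for any address, even though the adversary never learns the key.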
The PA signing gadget requires finding a matching
⟨autib, pacib⟩ pair operating on pointer P in the code without any
use of P between these instructions. In PACStack, each verification
is immediately followed by a return, which ensures that the
failure is detected. Tail-calls are a notable exception. A tail-call is a
function call executed immediately before a return and optimized so that
the callee returns directly to the caller of the calling function, as with
the invocation of B in Listing 10. On 64-bit ARM processors, tail-calls are
implemented using the b or br instructions, which do not update LR (➀). The
tail-called function can return (➁) to the LR value set before
the tail-call (➂). PACStack limits A to modifying the previous
authentication token on the stack. A could attempt to exploit the
signing gadget to trick PACStack into accepting an invalid aret′i−1 (➃),
and subsequently load it into LR after return. This is not possible
because A cannot flip the bit p of aret′i (➄): areti ⊕ p is 1) kept
in LR while in B, and 2) verified against areti+1 on subsequent
function calls from B. The invalid aret′i−1 is thus rejected by autib
(➃) before the return from B.
    ; replace:
    ;   pacia LR, CR
    ; with:
    eor LR, LR, #const1
    eor LR, LR, #const2
    eor LR, LR, #const3
    eor LR, LR, CR
Listing 11: PA-analog used to simulate overhead on non-PA
hardware, based on an estimated overhead of 4
cycles. Three exclusive-or inputs are constants, whereas
the last instruction uses both inputs to ensure that the instruction
pipeline must obtain both values.
8 PERFORMANCE EVALUATION
At present, the only publicly available PA-enabled SoCs are the
Apple A12 and S4, neither of which supports PA for third-party code
at the time of writing. To verify the correctness of the instrumentation,
we ran all benchmarks on the ARMv8-A Base Platform Fixed Virtual
Platform (FVP), based on Fast Models 11.4, which supports ARMv8.3-A [4].
Because the FVP runs the v4.14 kernel, we used the PA RFC
patches¹⁰, modified to support all PA keys.
The FVP is not cycle-accurate and executes all instructions in
one master cycle; therefore, it cannot be used for performance
evaluation. Based on prior evaluations of the QARMA cipher [7],
which is used as the underlying cryptographic primitive in reference
implementations of PA [39], Liljestrand et al. estimate that the PAC
calculations incur an average overhead of four cycles on a 1.2GHz
CPU [31]. We employ the PA-analog (Listing 11) introduced by
Liljestrand et al. to estimate the run-time overhead of PACStack.
The PA-analog consists of four eor instructions that both read and
write the registers used by the corresponding PA instruction in
order to induce similar constraints on instruction pipelining within
the CPU. To preserve compiler behavior, the PA-analog is swapped-
in during a separate pre-emit pass, i.e., after both register allocation
and instruction scheduling.
Using the PA-analog, we conducted benchmarks on a 96board
Kirin 620 HiKey (LeMaker version) with an ARMv8-A Cortex A53
Octa-core CPU (1.2GHz) / 2GB LPDDR3 SDRAM (800MHz) / 8GB
eMMC, running the Linux kernel v4.18.0 and BusyBox v1.29.2.
We have performed benchmarks using both the nbench-byte-2.2.3¹¹
program and the SPEC CPU 2017 benchmark package¹².
8.1 nbench-byte-2.2.3
The nbench program includes 10 separate benchmarks and is designed
to measure CPU and memory performance. The benchmarks
employ dynamic workload adjustment to ensure that a test run takes
at least a certain amount of time. In order to determine the relative
overhead introduced by PACStack, we took the same approach as
prior work [9, 31] and modified nbench to perform a pre-determined
number of iterations of each benchmark and measured the execution
time of each separately. All binaries used in the performance
evaluation were produced by our PACStack-enabled compiler. We
disabled all optimizations when compiling benchmark binaries (-O0
flag for Clang and LLVM, and -O=0 for llc). We evaluated the performance
of nbench in three configurations: i) PACStack disabled,
to determine the baseline execution time; ii) PACStack enabled,
without PAC masking; and iii) PACStack enabled, with PAC masking.
We repeated each benchmark 10 times and measured the user
time using the time utility for each benchmark run. The results are
shown in Figure 6, and indicate an overhead of 0.5% when using
PAC masking, and an overhead of <0.3% without (geometric mean
of all benchmarks [44]).
¹⁰ https://lwn.net/Articles/752116/
¹¹ http://www.math.utah.edu/~mayer/linux/bmark.html
¹² https://www.spec.org/cpu2017
[Figure 6 (bar chart omitted). Horizontal axis: relative overhead, from -0.50% to 3.00%. Benchmarks: Numeric sort, String sort, Bitfield, FP emulation, Fourier, Assignment, IDEA, Huffman, Neural net, LU decomposition. Series: PACStack without masking, PACStack with masking.]
Figure 6: Relative performance overhead of the individual
nbench-byte-2.2.3 benchmarks. The error bars indicate the
standard error for n = 10 test runs per benchmark. The geometric
mean of all benchmarks is 0.5%.
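The reported statistics can be computed along the following lines. This is a hedged sketch: the paper does not spell out its exact procedure, so the choice to take the geometric mean over the ratios (1 + overhead), which keeps the computation well defined when an individual overhead is negative, is an assumption, as are the function names.

```python
import math
import statistics


def relative_overhead(baseline_times, instrumented_times):
    """Relative overhead of the mean instrumented run time over the mean baseline."""
    return statistics.mean(instrumented_times) / statistics.mean(baseline_times) - 1.0


def standard_error(samples):
    """Standard error of the mean: sample standard deviation over sqrt(n)."""
    return statistics.stdev(samples) / math.sqrt(len(samples))


def geometric_mean_overhead(overheads):
    """Geometric mean of the run-time ratios (1 + overhead), expressed as an overhead."""
    log_sum = sum(math.log(1.0 + o) for o in overheads)
    return math.exp(log_sum / len(overheads)) - 1.0
```

For example, hypothetical timings of 10.0 s (baseline) and 10.1 s (instrumented) give a relative overhead of 1%; these numbers are illustrative and are not measurements from the paper.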
8.2 SPEC CPU 2017
In contrast to nbench-byte-2.2.3, SPEC CPU 2017 is an industry-
standard benchmarking suite that consists of larger units of work
based on real-world applications. Due to resource constraints it was
not feasible to install both the PACStack compiler and the SPEC
CPU suite on the FVP or HiKey board. Instead we compiled the
benchmarks with the SPEC runcpu utility configured to use WLLVM¹³
as the compiler. WLLVM produces binaries containing the
LLVM Intermediate Representation (IR), which we extracted and
instrumented using PACStack and the PA-analog. All compilation
phases were executed using the -O0 or -O=0 flag to disable opti-
mizations. The benchmark execution command and input files were
determined using the SPEC specinvoke utility and then timed on
the HiKey board using time.
Our measurements include all C-language SPECrate benchmarks,
with the exception of two benchmarks that were incompatible with
the WLLVM build environment that we used. For each benchmark,
we compared the performance of three different configurations: i) a
baseline with PACStack disabled, ii) PACStack without masking,
and iii) PACStack with masking. Results are shown in Figure 7 and
are reported as the mean overhead and corresponding standard error.
The SPEC CPU 2017 benchmark suite is resource intensive [38];
a single iteration of all SPEC benchmarks in Figure 7 took 13 times
longer than an iteration of all nbench benchmarks. We therefore
performed fewer measurements for SPEC than for nbench. Consequently,
though the SPEC benchmarks are more representative of
real-world workloads, their results are more sensitive to outliers than those
in Figure 6. The results show an overhead of 0.9% with masking,
and 0.4% without masking (geometric mean of all benchmarks).
The performance overhead of PACStack is proportional to the frequency
of function calls; benchmarks with few function calls are
affected less by the instrumentation compared to benchmarks with
frequent function calls. For instance, the 519.lbm_r benchmark
performs fluid dynamics and consists of large nested loops with few
function calls. Consequently we see little effect on performance in
519.lbm_r; in fact, our measurements show a small improvement
in performance, which is likely caused by CPU pipeline optimizations
that happen to be advantageous.
¹³ https://github.com/travitch/whole-program-llvm
[Figure 7 (bar chart omitted). Horizontal axis: relative overhead, from -0.50% to 3.00%. Benchmarks: 505.mcf_r, 519.lbm_r, 525.x264_r, 538.imagick_r, 544.nab_r, 557.xz_r. Series: PACStack without masking, PACStack with masking. Sample-size labels shown in the plot: n=8, n=4, n=7, n=20, n=4, n=4.]
Figure 7: Relative performance overhead for SPEC CPU 2017
benchmarks; error bars show the standard error for n measurements.
The geometric mean of all benchmarks is 0.9%.
Based on these results, we expect the overhead for both PAC-
Stack configurations to be negligible on ARMv8.3-A PA-capable
hardware.
9 RELATED WORK
Control-flow hijacking attacks were discovered and popularized
more than two decades ago [41]. The majority of CFI solutions
proposed since then are stateless: they validate each control-flow
transfer in isolation without distinguishing among different paths
in the control-flow graph (CFG). Fully-precise static CFI [11] is in
theory the most restrictive stateless policy that is possible with-
out breaking the intended functionality of the protected program.
In fully-precise static CFI, and by extension any stateless policy,
the best possible policy for return instructions is to allow returns
within a function F to target any instruction that follows a call to
F . All stateless CFI schemes, including fully-precise static CFI, are
vulnerable to control-flow bending [11].
Stateful CFI can express policies that take previous control-
flow transfers into account. HAFIX [17] is a hardware-assisted CFI
scheme that confines function returns to active call sites. Context-
sensitive CFI [18, 24, 45] further ensures that each control-flow
transfer taken by the program is consistent with a non-malicious
trace. This leads to a more expressive policy compared to stateless
CFI, but context-sensitive CFI enforcement has been dismissed as
impractical for real-world adoption [1]. Hardware-assisted branch
recording features available in modern 64-bit Intel microprocessors
show promise in enabling context-sensitive CFI enforcement on
commodity hardware, but suffer from i) limited branch history used
to make CFI decisions, ii) over-approximation of the program CFG,
iii) reliance on complex run-time monitoring. HAFIX, on the other
hand, requires changes to the underlying processor architecture.
Stateless forward-edge CFI enforcement is often combined with
a shadow stack [1, 13–16, 20, 21, 25, 35, 36, 43] to enforce the in-
tegrity of return addresses stored on the call stack. In fact, the
results by Carlini et al. [11] show that a shadow stack (or equiva-
lent mechanism) is essential for the security of CFI. The shadow
stack maintains a copy of each return address in a separate region
of memory. Each return instruction is then instrumented to validate
that the return addresses on the call and shadow stack match. This
ensures that each return is restricted only to its corresponding call
site.
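The shadow stack mechanism described above can be sketched in a few lines. This is a generic illustration rather than any of the cited designs; the class and method names are hypothetical, and a real shadow stack additionally needs its memory region to be integrity-protected.

```python
class ShadowStack:
    """Keeps a second copy of every return address and checks it on return."""

    def __init__(self):
        self._shadow = []  # stands in for the separate, protected memory region

    def on_call(self, return_address: int) -> None:
        """Instrumentation at a call: record the return address."""
        self._shadow.append(return_address)

    def on_return(self, address_from_call_stack: int) -> int:
        """Instrumentation at a return: the two copies must match."""
        if not self._shadow:
            raise RuntimeError("return without a matching call")
        expected = self._shadow.pop()
        if expected != address_from_call_stack:
            raise RuntimeError("return address mismatch")
        return expected
```

A return address overwritten on the regular call stack no longer matches the shadow copy, so `on_return` raises instead of transferring control.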
Although shadow stacks provide precise protection, traditional
shadow stacks incur significant performance overhead and lead to
false positives for programming constructs that cause mismatches
between calls and returns (C++ exceptions with stack unwinding,
setjmp / longjmp). Recent shadow stack designs demonstrate that performance
can be increased by either leveraging a parallel shadow
stack [15], or using a dedicated register for shadow stack address-
ing [10]. However, in these schemes the shadow stack still resides
in the same address space as the target application, and can be
compromised if the shadow stack location is known to A. For
traditional shadow stacks, a typical solution for dealing with mis-
matches between calls and returns is to pop return addresses off
the shadow stack until a match is found, or the shadow stack is
empty (e.g., binary RAD [13]). This not only increases the complex-
ity and run-time of the shadow stack instrumentation placed in
the function epilogue, but also sacrifices precision, e.g., it allows
A to redirect longjmp to any previously active call site. This can
be avoided by storing and validating both the return address and
stack pointer [14, 37, 43]. So far, only hardware-assisted shadow
stacks promise to achieve negligible overhead without any security
trade-offs (e.g., Intel CET [25]).
The idea of using MACs to protect the return address at run-time
was introduced in Cryptographic CFI (CCFI) [33], which uses
MACs to protect return addresses and other control-flow data (e.g.,
function pointers and C++ vtable pointers). CCFI’s return address
protection is similar to PA-based return address signing [39]; both
bind the return address to the address of the function’s stack frame
and thus provide only coarse-grained resistance against pointer
reuse attacks [31].
Other prominent defenses against control-flow attacks include
fine-grained code randomization [30], and code-pointer integrity
(CPI) [28]. Code randomization makes it more difficult for A to find
suitable gadgets to exploit in their attacks, but is not effective if the
memory layout of the program becomes known. CPI protects code
pointers by storing them in a separate safe stack. The safe stack
requires similar integrity guarantees as shadow stacks to remain
effective [19].
PACStack targets the ARM architecture, which traditionally has
received less attention compared to the x86 family of computer
architectures in terms of CFI research. MoCFI [16] is a software-based
CFI approach specifically targeting ARM application processors
used in smartphones. It uses a combination of a shadow stack, static
analysis and run-time heuristics to determine the set of valid targets
for control-flow transfers, but suffers from the same drawbacks that
plague traditional shadow stack schemes. CFI CaRE [36] is a CFI
solution targeting small, embedded ARM-based microcontrollers
(MCUs). It uses the ability to perform hardware-enforced isolated
execution on ARMv8-M MCUs to isolate the shadow stack to a
secure processor state. The ARMv8-M [5] architecture enforces
that calls to secure functions must target secure gate instructions
placed at the beginning of such functions. The ARMv8.5-A architecture
introduces similar branch target indicators (BTI) [3] for
ARM application processors as well. BTI constitutes one way to meet the
PACStack pre-requisite of coarse-grained CFI for indirect branch
instructions, e.g., calls via function pointers.
10 DISCUSSION
10.1 Generalizing ACS to other data structures
ACS builds on the idea of chaining cryptographic authentication
codes. This simple, yet powerful, construct is similar to hash chains,
which have been used before as means of password protection
(Lamport signatures [29]), digital signatures (Merkle trees [34]),
and have seen use in technologies such as blockchain [46] and
trusted hardware access control authorization policies [6].
While the focus of this work is on applying this idea to protect
the integrity of return addresses in the program call stack, the
same approach can be generalized to other data structures and
applications. For example, the call-stack protection could easily
be extended to cover the frame pointer, or other data stored in a
function’s stack frame, and protect such data from unauthorized
modification.
In addition to instrumentation that can protect the call stack,
an ACS-like authenticated stack, or another data structure such as
a Merkle tree [34], can be implemented as a reusable library, which
would allow application developers to protect the integrity of critical
data structures from manipulation as a result of software [12, 23]
or hardware attacks [26].
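A library-level authenticated stack of this kind might look as follows. This is a minimal sketch and not part of PACStack: it uses a full-length HMAC-SHA256 chain in place of PA, it assumes that the key and the most recent tag are held in storage the adversary cannot modify (playing the role that registers play in ACS), and the class name is hypothetical.

```python
import hashlib
import hmac


class AuthenticatedStack:
    """Stack whose entries are chained with MACs; only the top tag is trusted."""

    def __init__(self, key: bytes):
        self._key = key
        self._top_tag = bytes(32)  # trusted state, analogous to the chain register
        self.items = []            # untrusted storage: (value, previous tag) pairs

    def _mac(self, prev_tag: bytes, value: bytes) -> bytes:
        return hmac.new(self._key, prev_tag + value, hashlib.sha256).digest()

    def push(self, value: bytes) -> None:
        """Store the value with the current tag, then advance the trusted tag."""
        self.items.append((value, self._top_tag))
        self._top_tag = self._mac(self._top_tag, value)

    def pop(self) -> bytes:
        """Remove the top entry and verify it against the trusted tag."""
        value, prev_tag = self.items.pop()
        if not hmac.compare_digest(self._mac(prev_tag, value), self._top_tag):
            raise ValueError("stack entry failed authentication")
        self._top_tag = prev_tag
        return value
```

Because each tag covers the previous tag, modifying any stored value or tag in `items` is detected when that entry is popped.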
An example of such a use case is data structures in operating
system kernels. For instance, the Linux kernel source code features
a generic doubly linked list implementation, which doubles as a
queue and stack, depending on where in the kernel it is used¹⁴.
Kernel data structures are critical to system security. Many
of the vulnerabilities found in the kernel allow limited access to
kernel data. Malicious modification of kernel data can lead to a
wide range of effects, including privilege escalation and process
hiding [8]. Applying ACS-like protection to critical kernel stacks
can protect such structures from i) malicious modification by A in
an effort to compromise kernel data integrity, and ii) accidental misuse
by programmers, e.g., operating on a stack as a queue and vice
versa (a side-effect of reusing generic list implementations).
10.2 Support for software exceptions
The setjmp / longjmp interface has traditionally been used to pro-
vide exception-like functionality in C. However, modern coding
standards for C and C++ that aim to facilitate code safety, secu-
rity, and reliability consider them harmful and forbid their use, e.g.,
MISRA C:2004 [22, Rule 20.7] and JSF AV C++ [32, Rule 20]. Recall
from Section 5.4 that calling longjmp with an expired jmp_buf
is undefined behavior. For PACStack, this means that although
it binds the aretb in jmp_buf to the corresponding SP and authi, it cannot
guarantee their freshness. A can modify jmp_buf to contain a
previously used aretb and SPb, but must also modify the stack
frame at SPb such that it contains the prior areti. This allows a
control-flow transfer to a previously valid setjmp return site and
SP value. To prevent reuse of expired jmp_buf buffers, longjmp
can be rewound step-by-step, i.e., conceptually performing returns
until the correct stack frame is reached.
¹⁴ https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/list.h?h=v5.0
We plan to extend PACStack support to LLVM libunwind¹⁵,
which provides frame-by-frame unwinding of the call stack. By
validating the ACS on each stack frame unwinding, we can ensure
that only a fresh and valid state is reached.
Because C++ exceptions also cause irregular stack unwinding
they pose a similar challenge. However, C++ already performs more
fine-grained stack unwinding to correctly destroy objects in un-
wound stack frames. The LLVM libcxxabi library will, depending
on configuration, use libunwind for this purpose. With PACStack
support in libunwind, we will be able to secure both setjmp /
longjmp and support C++ exception handling.
10.3 Interoperability with unprotected code
Interoperability with unprotected (uninstrumented) code is an
important deployment consideration. On one hand, a PACStack-
protected application may need to interoperate with unprotected
shared libraries. On the other, an unprotected app may need to
interoperate with PACStack-protected shared libraries. The latter
scenario is relevant for deployment in mobile operating systems
such as Android, where multiple stakeholders provide application
binaries to consumer devices. The deployment of PACStack, or
any other run-time protection mechanism, is likely to be driven
by OEMs that enable specific protection schemes for the operat-
ing system and system applications. However, OEMs are not in
control of native code deployed as part of applications distributed
through standard application marketplaces. It should be possible
for one version of the shared libraries shipped with the operating
system to remain interoperable with both PACStack-protected and
unprotected apps.
In Section 6.1 we explain how the use of callee-saved registers
allows PACStack to remain interoperable with unprotected code.
Recall that because CR is a callee-saved register, it will be restored
upon return. However, PACStack cannot guarantee that CR remains
unmodified during the execution of unprotected code, which could
temporarily store its value on the stack. To achieve the security
guarantees described in Section 7, PACStack instrumentation must
be applied to both the application and any shared libraries.
However, partial protection, e.g., PACStack-protected shared libraries,
can significantly raise the bar for the attacker, as calls into protected
functions can still benefit from return address authentication.
Common shared libraries like libc are a popular source of gadgets
for run-time attacks because of their size and availability. Because
functions in a PACStack-protected library validate the return
address on return from library functions, they effectively remove a
potentially large set of reusable gadgets from A’s disposal.
15https://github.com/llvm/llvm-project/tree/master/libunwind
11 CONCLUSION
We showed how a general-purpose hardware security mechanism
(ARM PA) can provide guarantees on par with hardware-assisted
shadow stacks, without requiring additional hardware support or
compromising security. Other general-purpose primitives like
memory tagging and branch target indicators are being rolled out.
Creative uses of such primitives hold the promise of significantly
improving software protection.
ACKNOWLEDGMENTS
This work was supported in part by the Academy of Finland under
grant nr. 309994 (SELIoT), and the Intel Collaborative Research
Institute for Collaborative Autonomous & Resilient Systems (ICRI-
CARS).
REFERENCES
[1] Martín Abadi et al. 2009. Control-flow Integrity Principles, Implementations, and
Applications. ACM Trans. Inf. Syst. Secur. 13, 1 (Nov. 2009), 4:1–4:40.
[2] ARM Ltd. 2017. ARMv8 Architecture Reference Manual, for ARMv8-A archi-
tecture profile (ARM DDI 0487C.a). https://static.docs.arm.com/ddi0487/ca/
DDI0487C_a_armv8_arm.pdf.
[3] ARM Ltd. 2018. Arm A-Profile Architecture Developments 2018: Armv8.5-
A. https://community.arm.com/developer/ip-products/processors/b/processors-
ip-blog/posts/arm-a-profile-architecture-2018-developments-armv85a.
[4] ARM Ltd. 2018. Fast Models, Version 11.4, Fixed Virtual Platforms (FVP) Reference
Guide. https://static.docs.arm.com/100966/1104/fast_models_fvp_rg_100966_
1104_00_en.pdf.
[5] ARM Ltd. 2019. Armv8-M Architecture Reference Manual (ARM DDI 0553B.g).
https://static.docs.arm.com/ddi0553/bg/DDI0553B_g_armv8m_arm.pdf.
[6] Will Arthur and David Challener. 2015. A Practical Guide to TPM 2.0: Using the
Trusted Platform Module in the New Age of Security (1st ed.). Apress, Berkeley, CA,
USA.
[7] Roberto Avanzi. 2017. The QARMA Block Cipher Family. Almost MDS Matrices
Over Rings With Zero Divisors, Nearly Symmetric Even-Mansour Constructions
With Non-Involutory Central Rounds, and Search Heuristics for Low-Latency
S-Boxes. IACR Trans. Symmetric Cryptol. 2017, 1 (2017), 4–44.
[8] Ahmed M. Azab et al. 2014. Hypervision Across Worlds: Real-time Kernel Pro-
tection from the ARM TrustZone Secure World. In Proc. ACM CCS ’14. 90–102.
[9] Ferdinand Brasser et al. 2017. DR.SGX: Hardening SGX Enclaves against Cache
Attacks with Data Location Randomization. https://arxiv.org/abs/1709.09917.
[10] Nathan Burow et al. 2019. SoK: Shining Light on Shadow Stacks.
arXiv:1811.03165v2 [cs.CR]. https://arxiv.org/abs/1811.03165v2
[11] Nicolas Carlini et al. 2015. Control-flow Bending: On the Effectiveness of Control-
flow Integrity. In Proc. USENIX Security ’15. 161–176.
[12] Shuo Chen et al. 2005. Non-control-data Attacks Are Realistic Threats. In Proc.
USENIX Security ’05. 177–191.
[13] Tzi-Cker Chiueh and Fu-Hau Hsu. 2001. RAD: a compile-time solution to buffer
overflow attacks. In Proc. 21st International Conference on Distributed Computing
Systems. 409–417.
[14] Marc L. Corliss, E. Christopher Lewis, and Amir Roth. 2005. Using DISE to Protect
Return Addresses from Attack. ACM SIGARCH Comput. Archit. News 33, 1 (2005),
65–72.
[15] Thurston H.Y. Dang, Petros Maniatis, and David Wagner. 2015. The Performance
Cost of Shadow Stacks and Stack Canaries. In Proc. ACM ASIA CCS ’15. 555–566.
[16] Lucas Davi et al. 2012. MoCFI: A framework to mitigate control-flow attacks on
smartphones. In Proc. NDSS ’12.
[17] Lucas Davi et al. 2015. HAFIX: Hardware-assisted Flow Integrity Extension. In
Proc. ACM/EDAC/IEEE DAC ’15. 74:1–74:6.
[18] Ren Ding et al. 2017. Efficient Protection of Path-Sensitive Control Security. In
Proc. USENIX Security ’17. 131–148.
[19] I. Evans et al. 2015. Missing the Point(er): On the Effectiveness of Code Pointer
Integrity. In Proc. IEEE S&P ’15. 781–796.
[20] Jonathon T. Giffin, Somesh Jha, and Barton P. Miller. 2002. Detecting Manipulated
Remote Call Streams. In Proc. USENIX Security ’02. 61–79.
[21] Jonathon T. Giffin, Somesh Jha, and Barton P. Miller. 2004. Efficient context-
sensitive intrusion detection. In Proc. NDSS ’04.
[22] HORIBA MIRA Ltd. 2004. Guidelines for the Use of the C Language in Critical
Systems. http://www.misra.org.uk/
[23] Hong Hu et al. 2016. Data-Oriented Programming: On the Expressiveness of
Non-control Data Attacks. In Proc. IEEE S&P ’16. 969–986.
[24] Hong Hu et al. 2018. Enforcing Unique Code Target Property for Control-Flow
Integrity. In Proc. ACM CCS ’18. 1470–1486.
[25] Intel. 2016. Control-flow Enforcement Technology Preview. https:
//software.intel.com/sites/default/files/managed/4d/2a/control-flow-
enforcement-technology-preview.pdf.
[26] Yoongu Kim et al. 2014. Flipping Bits in Memory Without Accessing Them:
An Experimental Study of DRAM Disturbance Errors. In Proc. IEEE ISCA ’14.
361–372.
[27] Tim Kornau. 2009. Return Oriented Programming for the ARM Architecture. Ph.D.
Dissertation. Ruhr-Universität Bochum.
[28] Volodymyr Kuznetsov et al. 2014. Code-pointer Integrity. In Proc. USENIX OSDI
’14. 147–163.
[29] Leslie Lamport. 1981. Password Authentication with Insecure Communication.
Commun. ACM 24, 11 (1981), 770–772.
[30] Per Larsen et al. 2014. SoK: Automated Software Diversity. In Proc. IEEE S&P ’14.
276–291.
[31] Hans Liljestrand et al. 2019. PAC it up: Towards Pointer Integrity using
ARM Pointer Authentication. (to appear) Usenix SEC 2019, arXiv:1811.09189
[cs.CR]. https://arxiv.org/abs/1811.09189
[32] Lockheed Martin Corporation. 2005. Joint Strike Fighter Air Vehicle C++ Coding
Standards (Revision C). http://www.jsf.mil/downloads/down_documentation.
htm
[33] Ali Jose Mashtizadeh et al. 2015. CCFI: Cryptographically Enforced Control Flow
Integrity. In Proc. ACM CCS ’15. 941–951.
[34] Ralph C. Merkle. 1988. A Digital Signature Based on a Conventional Encryption
Function. In CRYPTO ’87. Springer-Verlag, 369–378.
[35] Danny Nebenzahl, Mooly Sagiv, and Avishai Wool. 2006. Install-Time Vaccination
of Windows Executables to Defend Against Stack Smashing Attacks. IEEE Trans.
Dependable Secur. Comput. 3, 1 (2006), 78–90.
[36] Thomas Nyman et al. 2017. CFI CaRE: Hardware-Supported Call and Return
Enforcement for Commercial Microcontrollers. In Research in Attacks, Intrusions,
and Defenses. 259–284.
[37] H. Ozdoganoglu et al. 2006. SmashGuard: A Hardware Solution to Prevent
Security Attacks on the Function Return Address. IEEE Trans. Comput. 55, 10
(2006), 1271–1285.
[38] R. Panda et al. 2018. Wait of a Decade: Did SPEC CPU 2017 Broaden the Perfor-
mance Horizon?. In Proc. IEEE HPCA ’18. 271–282.
[39] Qualcomm. 2017. Pointer Authentication on ARMv8.3. https://www.qualcomm.
com/media/documents/files/whitepaper-pointer-authentication-on-armv8-
3.pdf.
[40] Nigel P Smart. 2016. Cryptography Made Simple. Springer.
[41] Solar Designer. 1997. lpr LIBC RETURN exploit. http://insecure.org/sploits/
linux.libc.return.lpr.sploit.html
[42] László Szekeres et al. 2013. SoK: Eternal War in Memory. In Proc. IEEE S&P ’13.
48–62.
[43] Caroline Tice et al. 2014. Enforcing Forward-edge Control-flow Integrity in GCC
& LLVM. In Proc. USENIX Security ’14. 941–955.
[44] Erik van der Kouwe et al. 2018. Benchmarking Crimes: An Emerging Threat in
Systems Security. https://arxiv.org/abs/1801.02381.
[45] Victor van der Veen et al. 2015. Practical Context-Sensitive CFI. In Proc. ACM
CCS ’15. 927–940.
[46] Dylan Yaga et al. 2018. Blockchain Technology Overview. Technical Report
NIST.IR.8202. National Institute of Standards and Technology.
A SECURITY PROOFS
In Section 7.1, we gave an informal analysis of the security of ACS;
here we give a more detailed proof of security, and in particular
prove that authentication token masking prevents $\mathcal{A}$ from
obtaining exploitable authentication token collisions.
The argument proceeds as follows: we suppose that $\mathcal{A}$, after
obtaining $q$ authentication tokens, can find a pair of inputs $(x, y)$
and $(x, y')$ whose authentication tokens $H_K(\cdot, \cdot)$ collide. This can
be used to construct a distinguisher of the masks $H_K(0, \cdot)$ from a
random string. The structure of the authentication tags is such that
this further reduces to a semantic security game for one-time pad
encryption of the masks. Then, we show that any violation of the
integrity of an ACS-protected call stack also yields values whose
authentication tokens collide as described above, allowing us to
bound the probability of an integrity violation.
We summarize our notation in Table 2.
Games
  G_ACS (Figure 13): Security game for ACS integrity.
  G_PAC-Collision (Figure 8): Security game for the identification of
    colliding authentication tokens.
  G_PAC-Distinguish (Figure 9): Security game for the distinguishability
    of H_K(·, ·) from a random oracle.
  G_1, G_2, G_3 (Figure 11): Semantic security games for the mask
    H_K(0, ·).

Adversary interfaces
  G_ACS
    A_oracle-request: Get path through the call-graph for which A wants
      the final authenticated return address pushed to the stack.
    A_oracle-response: Return a previously-requested authenticated
      return address.
    A_ACS-Violation: Return to the challenger authenticated return
      values that can be used to violate call stack integrity.
  G_PAC-Collision
    A_oracle-request: Get a value for which A wants a masked
      authentication token.
    A_oracle-response: Return a previously-requested masked
      authentication token.
    A_gen-collision: Return to the challenger two authenticated return
      values with colliding authentication tokens.
  G_PAC-Distinguish
    A_oracle-request: Get a value for which A wants an authentication
      tag.
    A_oracle-response: Return a previously-requested authentication
      token.
    A_distinguish: Return to the challenger a single bit identifying
      whether the given tokens were from a random oracle or H_K(·, ·).
  G_1, G_2
    B_distinguish: Identify the authentication token function used to
      generate masked authentication tokens.
  G_3
    B_distinguish': As for G_1, G_2, but with the inputs represented as
      strings rather than functions.

Table 2: Notation used in Appendix A.
G^A_PAC-Collision(1^λ, H, q)
  K ←$ {0, 1}^λ
  // Give A q masked authentication tokens of their choice.
  for i ∈ {1, ..., q} do
    (x, y) ← A_oracle-request()
    A_oracle-response(H_K(x, y) ⊕ H_K(0, y))
  endfor
  // A is challenged to provide inputs whose authentication tokens collide.
  (x̂, ŷ, ŷ′) ← A_gen-collision()
  if ŷ ≠ ŷ′ ∧ H_K(x̂, ŷ) = H_K(x̂, ŷ′) then
    return 1
  else
    return 0
  endif
Figure 8: Security game for finding colliding PACs given
masked authentication tokens.
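The game in Figure 8 can be sketched as executable code. This is an illustrative model only: it assumes a truncated HMAC-SHA256 as a stand-in for $H_K$, a toy token size of 8 bits, and an adversary modeled as a function that receives a query-limited oracle rather than the request/response interface of the figure. The names `H`, `masked_token`, `pac_collision_game`, and `random_adversary` are hypothetical.

```python
import hashlib
import hmac
import secrets

B_BITS = 8     # assumption: toy token size b, kept small for experimentation
VA_SIZE = 39   # assumption: virtual address size in bits


def H(key, x, y):
    """Toy H_K(x, y): the top B_BITS bits of an HMAC-SHA256 over (x, y)."""
    msg = x.to_bytes(8, "big") + y.to_bytes(16, "big")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") >> (256 - B_BITS)


def masked_token(key, x, y):
    """Masked authentication token H_K(x, y) XOR H_K(0, y)."""
    return H(key, x, y) ^ H(key, 0, y)


def pac_collision_game(adversary, q, lam=128):
    """Return 1 if the adversary outputs (x, y, y2) with y != y2 and colliding H_K."""
    key = secrets.token_bytes(lam // 8)
    queries = 0

    def oracle(x, y):
        nonlocal queries
        if queries >= q:
            raise RuntimeError("query budget exhausted")
        queries += 1
        return masked_token(key, x, y)

    x, y, y2 = adversary(oracle)
    return int(y != y2 and H(key, x, y) == H(key, x, y2))


def random_adversary(oracle):
    """Baseline adversary that ignores the oracle and guesses at random."""
    x = secrets.randbits(VA_SIZE)
    return x, secrets.randbits(VA_SIZE + B_BITS), secrets.randbits(VA_SIZE + B_BITS)
```

Running `pac_collision_game(random_adversary, q=0)` many times should give a win rate close to $2^{-b}$ under these assumptions, which is the baseline that the advantage in the proof is measured against.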
Theorem A.1 (PAC-masking prevents collision-finding).
Suppose that after $q$ queries, an adversary $\mathcal{A}$ can distinguish
$H_K(\cdot, \cdot)$ from a random oracle with advantage no greater than
$\mathrm{Adv}^{\mathcal{A}}_{\text{PAC-Distinguish}}(1^\lambda, H, q)$, as given in Figure 9. Then, assuming a
key length of $\lambda$ for $H_K(\cdot, \cdot)$, and given access to $q$ masked
authentication tokens, $\mathcal{A}$ can identify a pair of inputs $(\hat{x}, \hat{y})$ and
$(\hat{x}, \hat{y}')$ whose corresponding unmasked authentication tokens collide
with advantage at most $2\,\mathrm{Adv}^{\mathcal{A}}_{\text{PAC-Distinguish}}(1^\lambda, H, q)$.
Proof. We begin with a collision game $G^{\mathcal{A}}_{\text{PAC-Collision}}(1^\lambda, H, q)$,
shown in Figure 8, in which the adversary is given oracle access
to the authentication token generator and then asked to provide
values $x, y, y'$ such that $H_K(x, y) = H_K(x, y')$.
An adversary that selects $(x, y, y')$ at random from
$\{0,1\}^{\text{VA\_SIZE}} \times \{0,1\}^{\text{VA\_SIZE}+b} \times \{0,1\}^{\text{VA\_SIZE}+b}$, such that $y \neq y'$, will win with
probability $2^{-b}$; $\mathcal{A}$'s advantage is therefore
$$\mathrm{Adv}^{\mathcal{A}}_{\text{PAC-Collision}}(1^\lambda, H, q) = P\left[G^{\mathcal{A}}_{\text{PAC-Collision}}(1^\lambda, H, q) = 1\right] - 2^{-b}.$$
We will bound this advantage by reduction to a semantic security
game for the masks. We consider the following games, shown in
Figure 11 and described in Figure 10.
The first hop, from $G_1$ to $G_2$, is based on indistinguishability
and relaxation: we suppose that $H_K(\cdot, \cdot)$ can be distinguished
from a random oracle with probability no more than
$\tfrac{1}{2} + \mathrm{Adv}^{\mathcal{A}}_{\text{PAC-Distinguish}}(1^\lambda, H, q)$, and that the adversary is not
limited in the number of queries that can be made to the masked
authentication token oracle. Then,
$$P\left[G^{\mathcal{B}}_1(1^\lambda, H, q) = 1\right] \le P\left[G^{\mathcal{B}}_2(1^\lambda, H, q) = 1\right] + \mathrm{Adv}^{\mathcal{A}}_{\text{PAC-Distinguish}}(1^\lambda, H, q).$$
The second hop, from $G_2$ to $G_3$, is a mere reformulation of $G_2$ such
that random oracles are represented as strings, and that rather than
allowing $\mathcal{B}$ to request arbitrarily many authentication tokens from
G^A_PAC-Distinguish(1^λ, H, q)
  K ←$ {0, 1}^λ
  // A is given values of their choice from either
  // H_K(·, ·) or a random oracle RO(x, y).
  S_0(x, y) := RO(x, y)
  S_1(x, y) := H_K(x, y)
  c ←$ {0, 1}
  for i ∈ {1, ..., q} do
    (x, y) ← A_oracle-request()
    A_oracle-response(S_c(x, y))
  endfor
  // A is challenged to determine whether it received
  // values from H_K(·, ·) or the random oracle.
  ĉ ← A_distinguish()
  if c = ĉ then
    return 1
  else
    return 0
  endif
Figure 9: Security game in which A attempts to distinguish
H_K(·, ·) from a random oracle.
G^B_1(1^λ, H, q): B obtains masked authentication tokens
  H_K(x, y) ⊕ H_K(0, y) for up to q pairs (x, y) of B’s choice, and
  must then distinguish the masks H_K(0, ·) from a random oracle.
G^B_2(1^λ, H, q) (hop: H_K(·, ·) → random oracle): This is the same as
  the previous game, except that H_K(·, ·) is replaced by a random
  oracle and B is not limited in their number of queries. B must now
  distinguish between two random oracles, one of which is used in
  computing the authentication tokens, and one of which is
  independent of the authentication tokens.
G^B_3(1^λ, H, q) (hop: reformulation): This is the semantic security
  game for repeated one-time-pad encryptions of a random string.
Figure 10: The game-hops used in Figure 11.
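the challenger, we instead give $\mathcal{B}$ direct access to the oracle, as
represented by the sequence of strings $T_{1 \ldots 2^{\text{VA\_SIZE}}}$.
The third game is a semantic security game for the one-time
pad, where $\mathcal{B}$ is given $2^{\text{VA\_SIZE}}$ encryptions of $S_1$ and then asked
to distinguish between $S_1$ and a random string. The perfect secrecy
of the one-time pad means that $P[G^{\mathcal{B}}_3(1^\lambda) = 1] = \tfrac{1}{2}$, and so
$$P\left[G^{\mathcal{B}}_1(1^\lambda) = 1\right] \le \tfrac{1}{2} + \mathrm{Adv}^{\mathcal{A}}_{\text{PAC-Distinguish}}(1^\lambda, H, q). \tag{2}$$
Finally, we provide a reduction from $G^{\mathcal{A}}_{\text{PAC-Collision}}(1^\lambda, H, q)$ to
$G^{\mathcal{B}}_1(1^\lambda)$. Suppose $\mathcal{A}$ can win $G^{\mathcal{A}}_{\text{PAC-Collision}}(1^\lambda, H, q)$ with
advantage $\mathrm{Adv}^{\mathcal{A}}_{\text{PAC-Collision}}(1^\lambda, H, q)$. Then, we define an adversary $\mathcal{B}^{\mathcal{A}}$
for $G^{\mathcal{B}}_1(1^\lambda)$, shown in Figure 12.
This adversary wins $G^{\mathcal{B}}_1(1^\lambda)$ with probability at least
$\tfrac{1}{2} + \tfrac{1}{2}\mathrm{Adv}^{\mathcal{A}}_{\text{PAC-Collision}}(1^\lambda, H, q)$, and so by (2)
$$\mathrm{Adv}^{\mathcal{A}}_{\text{PAC-Collision}}(1^\lambda, H, q) \le 2\,\mathrm{Adv}^{\mathcal{A}}_{\text{PAC-Distinguish}}(1^\lambda, H, q).$$
If the MAC $H_K(\cdot, \cdot)$ is a pseudo-random function family with respect
to $K$, then $\mathrm{Adv}^{\mathcal{A}}_{\text{PAC-Distinguish}}(1^\lambda, H, q)$ is negligible, and thus so is
$\mathrm{Adv}^{\mathcal{A}}_{\text{PAC-Collision}}(1^\lambda, H, q)$. □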
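For reference, the inequalities used in the proof of Theorem A.1 can be collected into a single chain. This is only a restatement under the proof's own assumptions, in particular that the second hop is a pure reformulation, so that $G_2$ and $G_3$ are won with equal probability. The arguments $(1^\lambda, H, q)$ are omitted for brevity.

```latex
\begin{align*}
\tfrac{1}{2} + \tfrac{1}{2}\,\mathrm{Adv}^{\mathcal{A}}_{\text{PAC-Collision}}
  &\le P\big[G_1^{\mathcal{B}^{\mathcal{A}}} = 1\big]
     && \text{(reduction, Figure 12)} \\
  &\le P\big[G_2^{\mathcal{B}^{\mathcal{A}}} = 1\big]
       + \mathrm{Adv}^{\mathcal{A}}_{\text{PAC-Distinguish}}
     && \text{(first hop)} \\
  &= P\big[G_3^{\mathcal{B}^{\mathcal{A}}} = 1\big]
       + \mathrm{Adv}^{\mathcal{A}}_{\text{PAC-Distinguish}}
     && \text{(second hop, reformulation)} \\
  &= \tfrac{1}{2} + \mathrm{Adv}^{\mathcal{A}}_{\text{PAC-Distinguish}}
     && \text{(perfect secrecy of the one-time pad)}
\end{align*}
```

Subtracting $\tfrac{1}{2}$ from both ends and multiplying by 2 gives $\mathrm{Adv}^{\mathcal{A}}_{\text{PAC-Collision}} \le 2\,\mathrm{Adv}^{\mathcal{A}}_{\text{PAC-Distinguish}}$.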
With a bound on $\mathcal{A}$'s probability of successfully obtaining a
PAC collision, we may now obtain a bound on their probability of
violating the integrity of an ACS-protected call stack.
Theorem A.2 (Security of ACS). Consider a program whose
call stack is protected by ACS, which has a call-graph $C$ and $b$-bit
masked authentication tokens $T_K(x, y) = H_K(x, y) \oplus H_K(0, y)$. Then, an
adversary with arbitrary control over memory can violate backward-
edge control-flow integrity with probability
$$P\left[G^{\mathcal{A}}_{\text{ACS}}(1^\lambda, H, C, q)\right] \le P\left[G^{\mathcal{A}}_{\text{PAC-Collision}}(1^\lambda, H, q)\right] \le 2^{-b} + 2\,\mathrm{Adv}^{\mathcal{A}}_{\text{PAC-Distinguish}}(1^\lambda, H, q).$$
Proof. We begin with a security game for ACS, shown in
Figure 13.
Our goal is to provide a black-box reduction from
$G^{\mathcal{A}}_{\text{ACS}}(1^\lambda, H, C, q)$ to $G^{\mathcal{A}}_{\text{PAC-Collision}}(1^\lambda, H, q)$.
From line 24 of Figure 13, winning $G^{\mathcal{A}}_{\text{ACS}}$ implies that $\mathcal{A}$ has
obtained colliding authentication tokens, and therefore $\mathcal{A}$ can win
$G^{\mathcal{A}}_{\text{PAC-Collision}}$ with probability at least $P[G^{\mathcal{A}}_{\text{ACS}}]$. Substituting the
bound from Theorem A.1, we obtain the bound given. □
G^B_1(1^λ, H, q)
  K ←$ {0, 1}^λ
  S_0(y) := RO(y)
  S_1(y) := H_K(0, y)
  T(x, y) := H_K(x, y) ⊕ H_K(0, y)    (x ≠ 0, first q queries)
  // The adversary is given S_0 and S_1 and challenged to
  // determine which is used to calculate T(·, ·).
  c ←$ {0, 1}
  ĉ ← B_distinguish(T, S_c, S_{1−c})
  if c = ĉ then
    return 1
  else
    return 0
  endif

Hop G_1 → G_2: H_K(·, ·) → random oracle

G^B_2(1^λ, H, q)
  S_0(y) := RO_0(y)
  S_1(y) := RO_1(0, y)
  T(x, y) := RO_1(x, y) ⊕ RO_1(0, y)    (x ≠ 0)
  // The adversary is given S_0 and S_1 and challenged to
  // determine which is used to calculate T(·, ·).
  c ←$ {0, 1}
  ĉ ← B_distinguish(T, S_c, S_{1−c})
  if c = ĉ then
    return 1
  else
    return 0
  endif

Hop G_2 → G_3: random oracle → random string

G^B_3(1^λ, H, q)
  P_{1...2^VA_SIZE} ← {0, ..., 2^b − 1}^(2^(b+VA_SIZE))
  S_0 ←$ {0, ..., 2^b − 1}^(2^(b+VA_SIZE))
  S_1 ←$ {0, ..., 2^b − 1}^(2^(b+VA_SIZE))
  T_{1...2^VA_SIZE} ← P_{1...2^VA_SIZE} ⊕ S_1
  // The adversary is given S_0 and S_1 and challenged to
  // determine which is used to calculate T_{...}.
  c ←$ {0, 1}
  ĉ ← B_distinguish'(T, S_c, S_{1−c})
  if c = ĉ then
    return 1
  else
    return 0
  endif

Figure 11: Security games used in Theorem A.1.
B^A_oracle-request()
  return A_oracle-request()

B^A_oracle-response(x)
  A_oracle-response(x)

B^A_distinguish(T, S, S′)
  x, y, y′ ← A_gen-collision(T)
  if S(y) ⊕ S(y′) = T(x, y) ⊕ T(x, y′) then
    return 1
  else
    return 0
  endif

Figure 12: An adversary B^A for G_1 used in our black-box
reduction of G_PAC-Collision to G_1. Not shown is the variant
B^A_distinguish'(T, S, S′) that is identical to B^A_distinguish(T, S, S′)
except that T, S, and S′ are given in the form of strings.
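The test performed by the adversary in Figure 12 rests on a simple identity: if $H_K(x, y) = H_K(x, y')$, the two unmasked tokens cancel, so $T(x, y) \oplus T(x, y') = H_K(0, y) \oplus H_K(0, y')$. The sketch below checks this on a toy instance. It assumes a truncated HMAC-SHA256 standing in for $H_K$ with 8-bit tokens, and it simulates a successful collision-finding adversary by brute force with knowledge of the key, which a real adversary would not have. All function names are hypothetical.

```python
import hashlib
import hmac

B_BITS = 8  # assumption: toy token size so that collisions are easy to find


def H(key, x, y):
    """Toy H_K(x, y): the top B_BITS bits of an HMAC-SHA256 over (x, y)."""
    msg = x.to_bytes(8, "big") + y.to_bytes(8, "big")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") >> (256 - B_BITS)


def masked_token(key, x, y):
    """T(x, y) = H_K(x, y) XOR H_K(0, y)."""
    return H(key, x, y) ^ H(key, 0, y)


def find_collision(key, x):
    """Simulate a successful A: find y != y2 with H_K(x, y) == H_K(x, y2).

    More than 2^b candidates are tried, so a collision exists by pigeonhole.
    """
    seen = {}
    for y in range(1, 1 << (B_BITS + 1)):
        token = H(key, x, y)
        if token in seen:
            return seen[token], y
        seen[token] = y


def b_distinguish(T, S, S_other, gen_collision):
    """Guess 1 if S is consistent with being the mask used to compute T."""
    x, y, y2 = gen_collision(T)
    return int(S(y) ^ S(y2) == T(x, y) ^ T(x, y2))
```

When `S` is the real mask $H_K(0, \cdot)$ and the collision is genuine, `b_distinguish` always returns 1. For an independent random `S`, the equality holds only by chance, with probability $2^{-b}$ in this toy setting.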
G^A_ACS(1^λ, H, C, q)
 1 : K ←$ {0, 1}^λ
 2 :
 3 : // Give A q tokens from call-graph traversals.
 4 : for i ∈ {1, ..., q} do
 5 :   p_{1...m+1} ← A_oracle-request()
 6 :   // Is the request for a real path through the call-graph?
 7 :   if ∃j : p_j → p_{j+1} ∉ edges(C) then
 8 :     return 0
 9 :   endif
10 :   auth_m ← T_K(p_m, T_K(p_{m−1}, ···) ∥ p_{m−1}) ∥ p_m
11 :   A_oracle-response(auth_m)
12 : endfor
13 :
14 : ptr_jumper, ptr_correct, auth_correct, t_correct,
15 :   ptr_adv, auth_adv, t_adv ← A_ACS-Violation()
16 :
17 : // The substituted masked authenticated return address must be different.
18 : if ptr_correct = ptr_adv ∧ auth_correct = auth_adv then
19 :   return 0
20 : endif
21 :
22 : // Does the return pointer authenticate correctly with the adversary’s
23 : // new masked authenticated return address as the modifier?
24 : if H_K(ptr_jumper, auth_correct ∥ ptr_correct)
25 :     ≠ H_K(ptr_jumper, auth_adv ∥ ptr_adv) then
26 :   return 0
27 : endif
28 :
29 : // Did the adversary provide a valid masked authenticated return address?
30 : if auth_adv = H_K(ptr_adv, t_adv)
31 :   return 1
32 : else
33 :   return 0
34 : endif
Figure 13: Security game for ACS with respect to a program
having call-graph C and authentication token function
T_K(·, ·).
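The oracle phase of the game in Figure 13 (lines 4 to 12) can be sketched as follows. This is a reading of the figure under stated assumptions, not the authors' code: $H_K$ is modeled by a truncated HMAC-SHA256, the chain is seeded with 0 (the figure elides the base case), the call-graph is a set of `(caller, callee)` pairs, and an invalid path is signalled by returning `None` instead of ending the game. `VA_SIZE`, `B_BITS`, and all function names are hypothetical.

```python
import hashlib
import hmac

VA_SIZE = 39   # assumption: pointer width in bits
B_BITS = 16    # assumption: token width b in bits


def H(key, x, y):
    """Toy H_K(x, y): the top B_BITS bits of an HMAC-SHA256 over (x, y)."""
    msg = x.to_bytes(8, "big") + y.to_bytes(8, "big")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") >> (256 - B_BITS)


def masked_token(key, x, y):
    """T_K(x, y) = H_K(x, y) XOR H_K(0, y)."""
    return H(key, x, y) ^ H(key, 0, y)


def is_valid_path(path, edges):
    """Line 7: every consecutive pair p_j -> p_{j+1} must be a call-graph edge."""
    return all((a, b) in edges for a, b in zip(path, path[1:]))


def oracle_response(key, path, edges):
    """Line 10: chained value T_K(p_m, T_K(p_{m-1}, ...) || p_{m-1}) || p_m."""
    if not is_valid_path(path, edges):
        return None
    aret = 0  # assumed seed for the chain
    for p in path[:-1]:  # p_1 .. p_m; p_{m+1} only terminates the path
        aret = (masked_token(key, p, aret) << VA_SIZE) | p
    return aret
```

Each returned value packs the masked token in the upper bits and the pointer $p_m$ in the lower `VA_SIZE` bits, so the token for a given node depends on the whole path that led to it.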
B ARMv8.3-A PA INSTRUCTIONS
Table 3: List of PA instructions [31]. PA Key indicates the PA key the instruction uses. Addr. indicates the source of the address
to be signed or authenticated. Mod. indicates the modifier used by the instruction. Xd and Xm indicate that the input is taken
from a general purpose register. The Backwards-compatible column indicates whether the instruction is safe on pre-ARMv8.3-A processors.

Mnemonic     PA Key     Addr.  Mod.   Backwards-compatible

BASIC POINTER AUTHENTICATION INSTRUCTIONS
Add PAC to instr. addr.
  paciasp    Instr. A   LR     SP     ✓
  pacia      Instr. A   Xd     Xm     ✗
  paciaz     Instr. A   LR     zero   ✓
  paciza     Instr. A   Xd     zero   ✗
  pacia1716  Instr. A   X17    X16    ✓
  pacibsp    Instr. B   LR     SP     ✓
  pacib      Instr. B   Xd     Xm     ✗
  pacibz     Instr. B   LR     zero   ✓
  pacizb     Instr. B   Xd     zero   ✗
  pacib1716  Instr. B   X17    X16    ✓
Add PAC to data addr.
  pacda      Data A     Xd     Xm     ✗
  pacdza     Data A     Xd     zero   ✗
  pacdb      Data B     Xd     Xm     ✗
  pacdzb     Data B     Xd     zero   ✗
Calculate generic MAC
  pacga      Generic    -      -      ✗
Authenticate instr. addr.
  autiasp    Instr. A   LR     SP     ✓
  autia      Instr. A   Xd     Xm     ✗
  autiaz     Instr. A   LR     zero   ✓
  autiza     Instr. A   Xd     zero   ✗
  autia1716  Instr. A   X17    X16    ✓
  autibsp    Instr. B   LR     SP     ✓
  autib      Instr. B   Xd     Xm     ✗
  autibz     Instr. B   LR     zero   ✓
  autizb     Instr. B   Xd     zero   ✗
  autib1716  Instr. B   X17    X16    ✓
Authenticate data addr.
  autda      Data A     Xd     Xm     ✗
  autdza     Data A     Xd     zero   ✗
  autdb      Data B     Xd     Xm     ✗
  autdzb     Data B     Xd     zero   ✗
Strip PAC
  xpacd      -          Xd     -      ✗
  xpaci      -          Xd     -      ✗
  xpaclri    -          LR     -      ✓

COMBINED POINTER AUTHENTICATION INSTRUCTIONS
Authenticate instr. addr. and return
  retaa      Instr. A   LR     SP     ✗
  retab      Instr. B   LR     SP     ✗
Authenticate instr. addr. and branch
  braa       Instr. A   Xd     Xm     ✗
  braaz      Instr. A   Xd     zero   ✗
  brab       Instr. B   Xd     Xm     ✗
  brabz      Instr. B   Xd     zero   ✗
Authenticate instr. addr. and branch with link
  blraa      Instr. A   Xd     Xm     ✗
  blraaz     Instr. A   Xd     zero   ✗
  blrab      Instr. B   Xd     Xm     ✗
  blrabz     Instr. B   Xd     zero   ✗
Authenticate instr. addr. and exception return
  eretaa     Instr. A   ELR    SP     ✗
  eretab     Instr. B   ELR    SP     ✗
Authenticate data addr. and load register
  ldraa      Data A     Xd     zero   ✗
  ldrab      Data B     Xd     zero   ✗
