Shadow stacks are the go-to solution for perfect backward-edge control-flow integrity (CFI). Software shadow stacks trade off security for performance. Hardware-assisted shadow stacks are efficient and secure, but expensive to deploy. We present authenticated call stack (ACS), a novel mechanism for precise verification of return addresses using aggregated message authentication codes. We show how ACS can be realized using ARMv8.3-A pointer authentication, a new low-overhead mechanism for protecting pointer integrity. Our solution achieves security comparable to hardware-assisted shadow stacks, while incurring negligible performance overhead (< 0.5%) and requiring no additional hardware support.
INTRODUCTION
Control-flow attacks are a prevalent threat; defences such as data execution prevention (DEP) can be circumvented using code-reuse attacks, such as return-oriented programming (ROP). ROP is performed by corrupting function return addresses to alter a program's backward-edge control-flow. The gold standard for preventing ROP is the shadow stack [1], which securely stores a copy of each return address, thus allowing exact verification of returns. While software solutions can achieve reasonable performance [4], so far, only hardware-assisted shadow stacks, such as Intel CET [5], promise negligible overhead without security trade-offs. ARMv8.3-A recently introduced pointer authentication (PA) [2], which can be used to protect return addresses [8]. Current PA schemes are vulnerable to reuse attacks, where previously signed pointers are reused to bypass authentication [7].
In this paper, we propose authenticated call stack (ACS), a novel scheme to precisely protect return addresses, that achieves security comparable to hardware-assisted shadow stacks, with minimal overhead, and without integrity-protected memory. ACS aggregates message authentication codes (MACs) bound to return addresses. We show how ACS can be realized using ARM PA. The resulting system, PACStack, is efficient, can withstand strong adversaries with full memory access, and requires no additional hardware. Our contributions are:
• ACS, an aggregated MAC design that provides precise verification of return addresses (Section 4).
• PACStack, an LLVM-based ACS implementation using ARM PA to achieve negligible overhead (< 0.5%) at no additional hardware cost (Sections 5 and 6).
BACKGROUND
ROP on ARM: In ROP, the attacker exploits a memory vulnerability to manipulate return addresses and alter the program's backward-edge control-flow. ROP allows Turing-complete attacks by chaining multiple returns into attacker-chosen code. The ARM architecture uses a dedicated link register (LR) to hold the current function's return address. For non-leaf function calls the LR value must be stored on the stack, thereby allowing ROP [6].
ARM PA: ARMv8.3-A introduced PA, which supports authenticating pointers [2]. Pointers are signed using a PA instruction to calculate a MAC, $H_K(V \| M)$, using key $K$, a given input $V$ and a 64-bit modifier $M$. The resulting MAC, the pointer authentication code (PAC), is typically embedded into the pointer itself. On a 64-bit ARM running a default Linux kernel, the virtual address (VA) is 39 bits, which, excluding other reserved bits, leaves 16 bits for the PAC. For example, the pacia instruction generates a PAC from the input pointer's VA and an arbitrary 64-bit modifier, and the autia instruction verifies that PAC to authenticate the pointer value. Additionally, a generic PA instruction, pacga, outputs a separate 32-bit PAC based on an arbitrary 64-bit input and a 64-bit modifier. PA will be supported in Linux 5.0, where the kernel will set a process's keys at exec.
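The way a PAC is embedded into the unused upper bits of a pointer can be illustrated with a small software model. This is only a sketch under stated assumptions: HMAC-SHA256 truncated to 16 bits stands in for the hardware MAC, the PAC is placed directly above the 39 VA bits, and the function names `compute_pac`, `sign_pointer` and `authenticate_pointer` are hypothetical, not part of any ARM or LLVM interface.

```python
import hashlib
import hmac

VA_BITS = 39          # virtual-address width given in the text
PAC_BITS = 16         # PAC width given in the text
PAC_SHIFT = VA_BITS   # assumption: the PAC sits directly above the VA bits
VA_MASK = (1 << VA_BITS) - 1
PAC_MASK = (1 << PAC_BITS) - 1


def compute_pac(key: bytes, va: int, modifier: int) -> int:
    """Stand-in for H_K(V || M): HMAC-SHA256 truncated to PAC_BITS bits."""
    message = va.to_bytes(8, "little") + modifier.to_bytes(8, "little")
    digest = hmac.new(key, message, hashlib.sha256).digest()
    return int.from_bytes(digest[:2], "little") & PAC_MASK


def sign_pointer(key: bytes, pointer: int, modifier: int) -> int:
    """Model of pacia: place the PAC in the unused upper pointer bits."""
    va = pointer & VA_MASK
    return va | (compute_pac(key, va, modifier) << PAC_SHIFT)


def authenticate_pointer(key: bytes, signed: int, modifier: int) -> int:
    """Model of autia: return the bare VA, or raise if the PAC mismatches."""
    va = signed & VA_MASK
    pac = (signed >> PAC_SHIFT) & PAC_MASK
    if pac != compute_pac(key, va, modifier):
        raise ValueError("pointer authentication failed")
    return va
```

One simplification is worth noting: in this model a failed check raises an exception, whereas the hardware instruction instead produces an invalid pointer that faults when it is used.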
THREAT MODEL
In this work, we assume a powerful adversary with arbitrary read and write access to process memory, restricted by DEP. This threat model is consistent with prior work on control-flow attacks. We do not consider adversaries that can manipulate kernel memory, or directly alter CPU configuration and registers. Software-only shadow stacks cannot efficiently protect against such an adversary.
DESIGN
On ARM, the return address $ret_n$ of a leaf function can be securely stored in LR. Prior return addresses $ret_i$, $i \in [0, n-1]$, must however be stored on the stack. ACS protects these values by aggregating a series of iterative MACs $auth_i$, $i \in [0, n]$, that cryptographically bind all return values $ret_i$, $i \in [0, n]$, to the latest $auth_n$ (Figure 1). The topmost token, $auth_n$, is securely stored in LR to prevent its modification. To prevent pre-computation of hash values, we utilize a keyed MAC function $H_K$ to generate $auth_i$:
$$auth_i = H_K(ret_i \,\|\, auth_{i-1})$$
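The chaining idea can be sketched as a small simulation. It assumes the relation $auth_i = H_K(ret_i \| auth_{i-1})$, which matches the form used in the security evaluation, and it makes further assumptions that are not taken from the paper: HMAC-SHA256 truncated to 32 bits stands in for $H_K$, the chain starts from an initial value of 0, and every function spills its frame to the stack. The class and method names are hypothetical.

```python
import hashlib
import hmac


def mac(key: bytes, ret: int, prev_auth: int) -> int:
    """Stand-in for H_K(ret_i || auth_{i-1}), truncated to 32 bits."""
    message = ret.to_bytes(8, "little") + prev_auth.to_bytes(8, "little")
    digest = hmac.new(key, message, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "little")


class AuthenticatedCallStack:
    def __init__(self, key: bytes) -> None:
        self.key = key
        self.chain_register = 0  # trusted register; assumption: chain starts at 0
        self.stack = []          # attacker-writable memory

    def call(self, ret: int) -> None:
        """Function entry: spill (ret_i, auth_{i-1}) and keep auth_i in a register."""
        prev_auth = self.chain_register
        self.stack.append([ret, prev_auth])
        self.chain_register = mac(self.key, ret, prev_auth)

    def do_return(self) -> int:
        """Function exit: reload the spilled pair and verify it against auth_i."""
        ret, prev_auth = self.stack.pop()
        if mac(self.key, ret, prev_auth) != self.chain_register:
            raise RuntimeError("return address verification failed")
        self.chain_register = prev_auth
        return ret
```

Only the newest token lives in the trusted register. Every older pair sits in writable memory, yet changing any of them breaks the check at the return that consumes it, because each token depends on all of the values below it.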
On function call, LR cannot be used to hold $auth_i$; it is instead put in the chain register (CR in Figure 2). After verification of $auth_{i-1}$ and $ret_i$ in the function epilogue, $auth_{i-1}$ is put in CR.
Authenticated return-address variant: To avoid storing both $auth_i$ and $ret_i$ we define a combined $aret_i$: $aret_i = auth_i \,\|\, ret_i$, where $auth_i = H_K(ret_i \,\|\, aret_{i-1})$.
We secure $aret_n$ in LR, and store $aret_i$, $i \in [0, n-1]$, on the stack.
Irregular stack unwinding: In some cases, notably C++ exceptions and C setjmp / longjmp, the call stack is unwound in an irregular manner. To preserve ACS integrity, the unwinding is modified such that the associated data (e.g., jmp_buf) is tied to the corresponding $auth_i$, i.e., $auth_{buf} = H_K(\texttt{jmp\_buf} \,\|\, auth_i)$.
IMPLEMENTATION
We implement PACStack, an ACS solution on top of LLVM 7.0, using PA to efficiently compute auth and aret values. We use Mark Rutland's Linux kernel patches.
pacga-based instrumentation: Our first variant uses pacga as $H_K$ to generate and verify $auth_i$ in the function prologue and epilogue, respectively. Call sites are instrumented to pass $auth_i$ via CR, set to x28. x28 is a callee-saved register, i.e., its value is preserved across calls, allowing PACStack to call non-instrumented code.
autia-/pacia-based instrumentation: This variant uses pacia and autia to authenticate return addresses. It avoids additional store / load operations because CR is stored on the stack instead of LR.
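The data flow of the autia-/pacia-based variant can be sketched as a simulation. This is one reading of the description above, not PACStack's actual instrumentation: it assumes the prologue spills the previous authenticated return address from CR and signs the new return address with that value as modifier, and that the epilogue reverses this. HMAC-SHA256 truncated to 16 bits stands in for the PAC computation, the chain is assumed to start at 0, and all names are hypothetical.

```python
import hashlib
import hmac

VA_BITS = 39   # virtual-address width used in the text
PAC_BITS = 16  # PAC width used in the text
VA_MASK = (1 << VA_BITS) - 1
PAC_MASK = (1 << PAC_BITS) - 1


def pac(key: bytes, ret: int, modifier: int) -> int:
    """Stand-in for the 16-bit PAC over a return address and a modifier."""
    message = ret.to_bytes(8, "little") + modifier.to_bytes(8, "little")
    digest = hmac.new(key, message, hashlib.sha256).digest()
    return int.from_bytes(digest[:2], "little")


class PaciaCallStack:
    def __init__(self, key: bytes) -> None:
        self.key = key
        self.cr = 0      # chain register; assumption: chain starts at 0
        self.stack = []  # attacker-writable memory

    def prologue(self, lr: int) -> None:
        """Spill aret_{i-1}, then sign ret_i with it (models pacia)."""
        prev_aret = self.cr
        self.stack.append(prev_aret)
        ret = lr & VA_MASK
        self.cr = ret | (pac(self.key, ret, prev_aret) << VA_BITS)  # aret_i

    def epilogue(self) -> int:
        """Reload aret_{i-1} and authenticate aret_i against it (models autia)."""
        aret = self.cr
        prev_aret = self.stack.pop()
        ret = aret & VA_MASK
        embedded = (aret >> VA_BITS) & PAC_MASK
        if embedded != pac(self.key, ret, prev_aret):
            raise RuntimeError("return address authentication failed")
        self.cr = prev_aret
        return ret
```

In this model only one value per frame is written to the stack, because the return address and its authentication token travel together in a single 64-bit word.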
EVALUATION
Performance: We use nbench-byte-2.2.3 on a 96Boards Kirin 620 HiKey running Linux kernel v4.18.0 and BusyBox v1.29.2 for evaluation. Because the hardware lacks PA support, we simulate PA overhead based on an estimated cost of four cycles per PA instruction [3, 7]. Our functional evaluation was done on a Linaro FVP simulator that supports PA instructions. Our current PACStack instrumentation is based on the autia / pacia variant and incurs an overhead of less than 0.5%.
Security evaluation: ROP on the pacga variant requires modifying the stack to inject a $ret'_n$ and $auth'_{n-1}$ such that $auth_n = H_K(ret'_n \,\|\, auth'_{n-1})$. Because the attacker cannot control $auth_n$, the only course of action is correctly guessing the 32-bit $auth'_{n-1}$, which yields a success probability of $2^{-32}$. The second variant is subtly different: an attacker must correctly guess both $auth'_{n-1}$ and $auth'_{n-2}$. First, to inject an $aret'_{n-1}$ that passes authentication against, and then replaces, $aret_n$ in LR, while still returning to $ret_n$. Second, to inject an $aret'_{n-2}$ such that the return to $aret_{n-1}$ passes authentication and a return to $ret'_{n-1}$ is performed. The guesses cannot be done separately, thus the attack success probability on a system with a 16-bit PAC is $2^{-32}$. The security of PACStack depends on the cryptographic properties of PA and the integrity of the topmost $auth_n$.
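The two success probabilities can be written out explicitly. The arithmetic below assumes that every guess of a $b$-bit PAC is correct with probability $2^{-b}$, independently of other guesses, which is the idealized behaviour expected of a secure MAC rather than a measured property:

```latex
\begin{align*}
\Pr[\text{success, pacga variant}] &= 2^{-32} \\
\Pr[\text{success, autia/pacia variant}] &= 2^{-16} \cdot 2^{-16} = 2^{-32}
\end{align*}
```

Because both 16-bit guesses must succeed within the same attempt, their probabilities multiply, so the variant with the shorter PAC reaches the same bound as the 32-bit pacga variant.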
CONCLUSION
We showed how a general-purpose hardware security mechanism (ARM PA) can provide security similar to hardware-assisted shadow stacks, without requiring additional hardware support or sacrificing CFI precision. Other general-purpose primitives are being rolled out. Creative uses of such primitives hold the promise of significantly improving software protection.
