A Leak-Resilient Dual Stack Scheme for Backward-Edge Control-Flow
  Integrity by Zieris, Philipp & Horsch, Julian
A Leak-Resilient Dual Stack Scheme for Backward-Edge
Control-Flow Integrity
Philipp Zieris
Fraunhofer AISEC
philipp.zieris@aisec.fraunhofer.de
Julian Horsch
Fraunhofer AISEC
julian.horsch@aisec.fraunhofer.de
ABSTRACT
Manipulations of return addresses on the stack are the basis for a
variety of attacks on programs written in memory unsafe languages.
Dual stack schemes for protecting return addresses promise an ef-
ficient and effective defense against such attacks. By introducing
a second, safe stack to separate return addresses from potentially
unsafe stack objects, they prevent attacks that, for example, mali-
ciously modify a return address by overflowing a buffer. However,
the security of dual stacks is based on the concealment of the safe
stack in memory. Unfortunately, all current dual stack schemes
are vulnerable to information disclosure attacks that are able to
reveal the safe stack location, and therefore effectively break their
promised security properties. In this paper, we present a new, leak-
resilient dual stack scheme capable of withstanding sophisticated
information disclosure attacks. We carefully study previous dual
stack schemes and systematically develop a novel design for stack
separation that eliminates flaws leading to the disclosure of safe
stacks. We show the feasibility and practicality of our approach by
presenting a full integration into the LLVM compiler framework
with support for the x86-64 and ARM64 architectures. With an
average of 2.7% on x86-64 and 0.0% on ARM64, the performance
overhead of our implementation is negligible.
CCS CONCEPTS
• Security and privacy → Software and application security;
KEYWORDS
Control-flow integrity; Dual stacks; Code reuse attacks; Information
leaks; Information hiding; ASLR; LLVM
ACM Reference Format:
Philipp Zieris and Julian Horsch. 2018. A Leak-Resilient Dual Stack Scheme
for Backward-Edge Control-Flow Integrity. In ASIA CCS ’18: 2018 ACM
Asia Conference on Computer and Communications Security, June 4–8, 2018,
Incheon, Republic of Korea. ACM, New York, NY, USA, 12 pages.
https://doi.org/10.1145/3196494.3196531
ASIA CCS ’18, June 4–8, 2018, Incheon, Republic of Korea
© 2018 Copyright held by the owner/author(s). Publication rights licensed to the
Association for Computing Machinery.
This is the author’s version of the work. It is posted here for your personal use. Not
for redistribution. The definitive Version of Record was published in ASIA CCS ’18:
2018 ACM Asia Conference on Computer and Communications Security, June 4–8, 2018,
Incheon, Republic of Korea, https://doi.org/10.1145/3196494.3196531.
1 INTRODUCTION
A vast majority of today’s security-relevant vulnerabilities arise
from the broad use of unsafe programming languages, such as C
and C++. In favor of efficiency and flexibility, these languages omit
the enforcement of strong type and memory safety. The lack of such
an enforcement frequently causes programming errors to result in
vulnerable code pointers that can be corrupted at runtime in order
to divert a program’s control-flow and induce malicious program
behavior [46]. A common form of runtime attacks on software
takes control of its target by illicitly altering return addresses on
the program’s stack through a buffer overflow [8, 44]. The malicious
program behavior is then carried out after the exploited function
returns, diverting control to a location of the attacker’s choosing.
This threat sparked academic research into various defense
techniques that commonly focus on securing the backward-edges—
as taken by function returns—of a program’s Control-Flow Graph
(CFG). The first and most common approaches for backward-edge
Control-Flow Integrity (CFI) [1, 15] maintain copies of return ad-
dresses on a shadow stack and use these copies to verify the integrity
of function returns. In order to reduce the performance overhead of
shadow stacks, more sophisticated solutions, namely SafeStack [32]
and AG-Stack [35], separate potentially unsafe stack objects (e.g.,
local stack variables that store user-supplied data) from sensitive
stack objects (e.g., return addresses). They realize this as dual stack
schemes by introducing a second stack into the protected programs
to hold the unsafe stack objects, while keeping the safe objects on
the program’s original stack. These new stacks—commonly referred
to as unsafe stack and safe stack—effectively thwart straightforward
runtime attacks that overflow return addresses. As they decouple
vulnerable stack objects from the sensitive control-flow data, the
attacker is unable to overwrite return addresses by exploiting the
vulnerable objects.
However, recent runtime attacks [22, 26, 37] undermine the se-
curity property of dual stacks. Both SafeStack and AG-Stack rely on
information hiding and conceal the location of their safe stacks by
means of Address Space Layout Randomization (ASLR). By using
techniques to disclose the location of safe stacks, the attacker is able
to overwrite return addresses using arbitrary vulnerable pointers
within the program’s memory (e.g., stack or heap), even if these
return addresses are protected by backward-edge CFI mechanisms.
Hence, utilizing information disclosure as part of a sophisticated
runtime attack enables the attacker to circumvent the protective
property of dual stack schemes altogether. In detail, two major
attack surfaces for information disclosures on dual stacks can be
identified [26]: (1) Leaks of pointers to the safe stack, found in
various, unsafe locations throughout the program’s memory space,
and (2) the safe stack itself, prone to identification by searching the
entire address space of the program.
In this paper, we present a new, leak-resilient dual stack design
that eliminates the aforementioned attack surfaces. For this, we
shift the paradigm of dual stacks from relocating potentially unsafe
objects towards placing the sensitive objects on the new stack while
keeping the unsafe objects on the program’s original stack. This
basic design change enables our solution (1) to completely avoid
pointer leaks and (2) to gain defensive properties against searching
the safe stack within the program’s address space.
We show the feasibility of our approach by presenting effi-
cient implementations for the widely used x86-64 and ARM64
(AArch64) architectures. To this end, we fully integrate our pro-
posed dual stack design into the LLVM compiler framework and
provide fully compatible Linux runtime environments on both ar-
chitectures. The changes applied to LLVM and the Linux runtime
do not restrict any compiler and linker features, such as tail call
optimization or Position-Independent Code (PIC), and are interop-
erable with language features, such as shortened return sequences
(setjmp()/longjmp()), multi-threading, or C++ exceptions. Fur-
ther, the changes do not restrict the interoperability of protected
programs with statically linked or dynamically loaded libraries,
including unprotected legacy libraries.
In summary, we make the following contributions:
• We propose a novel approach to stack separation that—by
design—eliminates disclosure of the safe stack location by
exploiting leaked pointers or searching the address space,
using most recent techniques such as memory allocation
oracles [37].
• We present a full integration of our proposed dual stack
scheme into the LLVM compiler framework and the Linux
runtime environment with complete support for the x86-64
and ARM64 architectures.¹
• We show that our approach is practically feasible with neg-
ligible performance overhead using standard CPU bench-
marks and the Apache and Nginx web servers.
In the remainder of this paper, we introduce our threat model (§ 2),
give a brief overview on information hiding (§ 3), study the weak-
nesses of previous dual stack designs (§ 4), describe the design of
our leak-resilient dual stack scheme (§ 5), present our implemen-
tation (§ 6), evaluate the performance (§ 7), discuss related work
(§ 8), and conclude (§ 9).
2 THREAT MODEL
Dual stack schemes are designed to prevent runtime attacks that
achieve malicious code execution by exploiting backward-edges in
a program’s CFG. To evaluate our own dual stack approach and
draw comparisons to other approaches, we define a threat model
within which the ultimate goal of an attacker is to maliciously alter
a return address on the target program’s stack. We base our threat
model on prior work in the area [1, 35, 47].
First, we assume that the compiler toolchain is trusted and reliably
inserts our defense mechanism into the target program. Next,
we assume that the target program is running on a system where
the hardware and all privileged software components, such as hypervisors
or operating systems, are trusted and operating correctly.
Hence, program-specific runtime data held by privileged components
(e.g., register states) is out of the attacker’s reach. The system
further employs standard defense mechanisms, i.e., Data Execution
Prevention (DEP), Write XOR Execute (W⊕X), and ASLR. Therefore,
the target’s program code and read-only data cannot be modified
by the attacker and Just-In-Time (JIT) compiled and self-modifying
code are out of scope.
¹ Available at https://github.com/llvm-return-stack.
We assume that the target program contains memory errors
resulting in (remotely) exploitable, input-controlled vulnerabilities
granting the attacker the following capabilities:
• The attacker is able to read from any memory location in the
target’s address space. This enables him to perform informa-
tion disclosure attacks in order to leak the process’ memory
layout.
• The attacker is able to write data to any writable location in
the target’s address space. This includes non-contiguous and
contiguous writes. The latter describes a write to a location
directly adjacent to a vulnerable location, for example, a
buffer overflow.
These assumptions yield a realistic threat model with strong at-
tacker capabilities. The goal of our attacker is to alter a return
address on the target program’s stack granting malicious code execution,
for example, in the form of a chain of return-oriented instruction
sequences (i.e., gadgets) or a function call with attacker-supplied
parameters (e.g., execve() to spawn a shell).
3 INFORMATION HIDING
Dual stack schemes, like most modern defenses that protect the
control-flow of programs, rely on information hiding to strengthen
their protective properties. As such, these defenses separate sensi-
tive pointers (e.g., return addresses, function pointers, and dispatch
tables) and metadata thereof from everything else in memory by
placing them in self-contained hidden areas. Access to these areas is
then restricted to legitimate and non-exploitable code, concealing
the areas from unauthorized access by attackers. However, new
attacks circumvent this concealment by means of information dis-
closure, i.e., by searching for the hidden areas and leaking their
locations. To conduct such an information disclosure, the attacker
has to overcome the entropy with which the hidden area is placed
in the target process’ address space. Depending on the defense, this
entropy is equal to the size of the process’ address space or the
entropy introduced by the deployed information hiding technique.
Our leak-resilient dual stack, like the previous dual stack schemes
SafeStack and AG-Stack, is developed for the Linux operating sys-
tem. On Linux, the common technique deployed for information
hiding is ASLR. In the following, we discuss the virtual memory lay-
out of a typical, 64-bit Linux (kernel version 4.14.4) process under
ASLR. We highlight the differences between the ASLR implementa-
tions on the x86-64 and ARM64 architectures and briefly discuss
their implications on the security of dual stacks and other defenses.
Processes are provided a virtual address space of $2^b$ bytes by the
Linux kernel, organized in virtual pages of 4 KiB ($2^{12}$ bytes) and
spanning the address range $[0, 2^b)$. On the x86-64 architecture,
the number of address bits $b$ is 47, yielding an address space of 128
TiB ($2^{47}$ bytes; $2^{35}$ pages). On ARM64, $b$ is 48, yielding an address
space of 256 TiB ($2^{48}$ bytes; $2^{36}$ pages). The kernel populates the
address space deterministically, starting from the lowest address,
with the loadable ELF segments (.text, .data, .bss, etc.) followed
by the heap, and, from the highest address, with the stack followed
by the memory mapping space. Accordingly, the stack and memory
mapping space grow down towards lower addresses, while the
heap grows in the opposite direction. The layout of the virtual
address space is depicted in Figure 1. The memory mapping space
is organized by the kernel’s mmap() infrastructure and initially
populated with shared libraries that are required during the process’
initialization. These libraries are loaded in a deterministic manner
and typically include ld and libc. During execution, the memory
mapping space also holds shared libraries loaded dynamically as
needed, (large) heap allocations backed by mmap(), as well as file
and anonymous mappings retrieved through mmap().

[Figure 1: Memory layout of an ASLR-enabled process on 64-bit Linux.
From the highest address ($2^b$) down to 0x0: stack (stack_base,
randomized by stack_offset), memory mapping space (mmap_base,
randomized by mmap_offset), heap (brk_start to brk, randomized by
brk_offset), and loadable ELF segments (randomized by load_offset).]
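The address space sizes quoted above follow directly from the number of address bits and the 4 KiB page size. As a minimal sketch (the helper name address_space is our own and not part of any kernel interface), the arithmetic can be reproduced as follows:

```python
PAGE_SHIFT = 12  # 4 KiB pages, as stated in the text


def address_space(bits: int) -> tuple[int, int]:
    """Return (size in TiB, number of pages) of a 2**bits-byte address space."""
    size_bytes = 1 << bits
    size_tib = size_bytes >> 40           # 1 TiB = 2**40 bytes
    pages = size_bytes >> PAGE_SHIFT      # one page per 2**12 bytes
    return size_tib, pages


for arch, b in (("x86-64", 47), ("ARM64", 48)):
    tib, pages = address_space(b)
    # pages is a power of two, so bit_length() - 1 is its exact log2
    print(f"{arch}: {tib} TiB, 2^{pages.bit_length() - 1} pages")
```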
During the initialization of the process, the kernel randomizes
the memory layout by applying randomly chosen offsets to the base
addresses of the loadable ELF segments, the heap, the memory map-
ping space, and the stack. The offset to the loadable ELF segments
is only randomized for Position-Independent Executables (PIEs)
and otherwise fixed to a pre-defined value. The intervals from which
the offsets are randomly chosen are listed for both architectures
in Table 1. For each interval, the offset is page-aligned,
which reduces the bits available for randomization (i.e., the entropy)
by the 12 bits of the page offset. The addressable bits and resulting
entropy for each interval are listed in Table 1 as well.
Typically, defenses that rely on information hiding place their
hidden areas within the memory mapping space. This space grows
towards the heap and can occupy a vast majority of the process’
available virtual memory. These defenses are therefore able to allocate
large hidden areas, offering an entropy of 28 bits on x86-64
but only 18 bits on ARM64. Other defenses, such as SafeStack and
AG-Stack, that declare the stack as a hidden area, only achieve an
entropy of 22 bits on x86-64 and 18 bits on ARM64.
Table 1: Randomization intervals of ASLR and their resulting
entropies on 64-bit Linux for the x86-64 and ARM64 architectures.

Arch.    Name           Interval²              Bits   Entropy
x86-64   stack_offset   [0, 0x400000000)       34     22
x86-64   mmap_offset    [0, 0x10000000000)     40     28
x86-64   brk_offset     [0, 0x2000000)         25     13
x86-64   load_offset    [0, 0x10000000000)     40     28
ARM64    stack_offset   [0, 0x40000000)        30     18
ARM64    mmap_offset    [0, 0x40000000)        30     18
ARM64    brk_offset     [0, 0x40000000)        30     18
ARM64    load_offset    [0, 0x40000000)        30     18

² Taken from randomize_stack_top(), arch_pick_mmap_layout(),
arch_randomize_brk(), and load_elf_binary(), respectively.
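The Bits and Entropy columns of Table 1 can be derived from the interval bounds alone. The following sketch assumes that each interval is a power of two and that offsets are aligned to 4 KiB pages; the dictionary layout and the function name bits_and_entropy are illustrative choices and are not taken from the kernel sources.

```python
PAGE_SHIFT = 12  # offsets are page-aligned, so the low 12 bits carry no entropy

# Exclusive upper bounds of the randomization intervals from Table 1.
INTERVALS = {
    "x86-64": {
        "stack_offset": 0x400000000,
        "mmap_offset": 0x10000000000,
        "brk_offset": 0x2000000,
        "load_offset": 0x10000000000,
    },
    "ARM64": {
        "stack_offset": 0x40000000,
        "mmap_offset": 0x40000000,
        "brk_offset": 0x40000000,
        "load_offset": 0x40000000,
    },
}


def bits_and_entropy(upper_bound: int) -> tuple[int, int]:
    """Return (addressable bits, entropy in bits) for the interval [0, upper_bound)."""
    bits = (upper_bound - 1).bit_length()
    return bits, bits - PAGE_SHIFT


for arch, intervals in INTERVALS.items():
    for name, upper in intervals.items():
        bits, entropy = bits_and_entropy(upper)
        print(f"{arch:7} {name:13} bits={bits:2} entropy={entropy:2}")
```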
4 ON THE SECURITY OF DUAL STACKS
In recent years, dual stack schemes emerged as the most sophisti-
cated defense mechanism against attackers focusing on exploiting
backward-edges in a program’s CFG. Dual stack schemes divide a
program’s stack into two separated areas, the unsafe stack holding
potentially unsafe stack objects, such as local stack variables that
store user-supplied data, and the safe stack holding sensitive stack
objects, such as return addresses that are crucial to the integrity
of the program’s control-flow. To strengthen the resilience against
information disclosure attacks, these schemes randomize the lo-
cation of the safe stack within the address space of the protected
program. The two main dual stack approaches are SafeStack [32]
and AG-Stack [35]. Before presenting our leak-resilient dual stack
design, we introduce both mechanisms below and discuss recent
advances in targeted information disclosure attacks against them.
4.1 SafeStack
SafeStack was initially presented by Kuznetsov et al. [32] as part
of a general solution to enforce Code Pointer Integrity (CPI). Yet,
SafeStack works independently from the overall CPI solution and
can be used to guarantee backward-edge CFI. As a result, Safe-
Stack has been integrated into the LLVM compiler framework as
an architecture-independent³ extension [21].
SafeStack performs compile-time analysis in order to distribute
all stack objects onto a safe and an unsafe stack. This analysis
considers register spills and return addresses to be always safe. The
safety of function parameters and local variables is determined by
analyzing their type and scope. Stack objects are deemed safe if
they are accessed exclusively within their corresponding function
and only with a fixed offset through the local stack pointer or
frame pointer. All remaining stack objects, i.e., those passed to child
functions, are marked unsafe and stored on the unsafe stack.
In regard to the implementation of SafeStack, both the original
work and the LLVM extension save the reference to the safe stack
in the program’s stack pointer register (e.g., RSP on x86-64) and the
reference to the unsafe stack in the program’s Thread Local Storage
(TLS), as multi-threading is supported. Both implementations use
memory mappings to store their unsafe stacks and dereference the
pointer from the TLS upon access to unsafe stack objects. The two
implementations differ in terms of the placement of safe stacks.
Kuznetsov et al. store safe stacks within a (much) larger memory
mapping, the safe region, used by the overall CPI solution to store
sensitive data. In contrast, due to the lack of this region, the LLVM
extension opts to store safe stacks as a replacement for the program’s
original stacks. As a result, the main thread’s original stack
becomes the safe stack and each child thread’s safe stack is allocated
as a dedicated memory mapping.
³ The extension supports the ARM, ARM64, MIPS32, MIPS64, x86, and x86-64
architectures.
4.2 AG-Stack
AG-Stack was presented by Lu et al. [35] as part of a general
solution—called ASLR-Guard—to prevent information disclosure
attacks that leak the location of any executable program code or
any pointer to executable program code, ultimately rendering code-
reuse attacks infeasible. Lu et al. achieve this goal by separating and
randomizing code and data sections of a program, encrypting point-
ers to program code when they are treated as data, and deploying a
dual stack scheme—namely AG-Stack—to secure return addresses.
Hence, within ASLR-Guard, AG-Stack provides backward-edge CFI.
AG-Stack distributes stack objects onto a safe and an unsafe stack
according to a straightforward approach: The only objects consid-
ered to be safe are return addresses. All other objects on a stack,
namely register spills, function parameters, and local variables are
deemed unsafe and stored on the unsafe stack. The prototype im-
plementation by Lu et al. was conducted on the x86-64 architecture
and also supports multi-threading. AG-Stack keeps its references
to the safe and unsafe stack in two dedicated registers, namely RSP
and R15. Further, like SafeStack, AG-Stack stores the unsafe stacks
as memory mappings and uses the safe stacks as a replacement for
the program’s original stacks.
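The difference between the two placement policies described in § 4.1 and § 4.2 can be summarized in a toy model. This is only a sketch of the classification rules as stated in the text, not of the actual compiler passes: the StackObject type and its fixed_local_access flag are hypothetical and stand in for the result of SafeStack's compile-time analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StackObject:
    name: str
    kind: str  # "return_address", "register_spill", "parameter", or "local"
    # Hypothetical analysis result: True if the object is accessed only inside
    # its own function and only at fixed offsets from the stack/frame pointer.
    fixed_local_access: bool = True


def safestack_is_safe(obj: StackObject) -> bool:
    """SafeStack rule: spills and return addresses are always safe."""
    if obj.kind in ("return_address", "register_spill"):
        return True
    return obj.fixed_local_access


def agstack_is_safe(obj: StackObject) -> bool:
    """AG-Stack rule: only return addresses are considered safe."""
    return obj.kind == "return_address"


frame = [
    StackObject("ret", "return_address"),
    StackObject("saved_rbx", "register_spill"),
    StackObject("counter", "local", fixed_local_access=True),
    StackObject("input_buf", "local", fixed_local_access=False),
]

for obj in frame:
    safestack = "safe" if safestack_is_safe(obj) else "unsafe"
    agstack = "safe" if agstack_is_safe(obj) else "unsafe"
    print(f"{obj.name:10} SafeStack: {safestack:6} AG-Stack: {agstack}")
```

In this model, only AG-Stack keeps the safe stack free of everything except return addresses, which is the property that later matters for stack spraying.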
4.3 Information Disclosure Attacks
Both SafeStack and AG-Stack conceal their safe stacks within the
address space of the protected program. For this, they rely on the
principles of information hiding, and are therefore prone to informa-
tion disclosure attacks. These attacks exploit memory corruptions,
i.e., use vulnerable data variables or data pointers [46], to locate a
hidden area, such as the safe stack, within a much larger memory
space. In recent years, attackers developed different approaches
to locate hidden areas by means of leveraging leaked pointers in
linked libraries, probing the program’s address space in a brute-
force manner, and utilizing memory allocation oracles. We discuss
these attacks briefly in the following, before describing our dual
stack design, which withstands all of them.
As described in § 4.1, the original SafeStack scheme places its
safe stacks within the CPI safe region. Thus, information disclo-
sure attacks against these stacks require leaking the safe region,
which imposes different obstacles on the attacker. Hence, we base
our discussion regarding SafeStack on its implementation in LLVM.
Further, as described in § 4.2, AG-Stack is part of the overall defense
solution ASLR-Guard. Since we do not have access to an implemen-
tation of ASLR-Guard, we base our discussion of AG-Stack solely
on the information given in the paper by Lu et al. [35].
4.3.1 Pointer Leaks. A major aspect in keeping hidden areas
concealed is the protection of references to these areas. Typically,
CFI solutions realize this by accessing the hidden areas exclusively
through dedicated registers. However, even assuming protected
programs themselves never write these dedicated registers to un-
protected memory, it is not guaranteed that libraries linked into the
programs do not leak them anyhow. Therefore, a valid approach
for an attacker to disclose hidden areas is to leverage such pointer
leaks created at known memory locations by linked libraries.
Considering dual stack schemes, leaks of the stack pointer regis-
ter (e.g., RSP on x86-64) reveal the location of safe stacks, in case the
schemes utilize that register to hold the safe stack pointer (like Safe-
Stack and AG-Stack). The stack pointer register is typically written
to memory by libraries that intervene in the control-flow of programs.
On GNU/Linux runtime environments, most prominently,
the GNU C standard and GCC runtime libraries perform control-flow
changes due to features such as shortened return sequences,
stack unwinding, and multi-threading.
Shortened Return Sequences. Through the setjmp()/longjmp()
interface, the GNU C standard library allows functions to return
to a different function than their immediate caller. This creates a
shortened return sequence, as functions in-between the call site of
setjmp() and the invocation of longjmp() are not returned to, but
skipped entirely. For this, a copy of the entire register state is created
upon entering setjmp() and reinstated into the registers by invok-
ing longjmp(). The register state is stored at a programmer-chosen
memory location, potentially leaking the stack pointer register to
unsafe memory (e.g., when the register state is stored on the heap).
Stack Unwinding. The GCC runtime library implements stack
unwinding to facilitate C++ exception handling, debugging func-
tionalities, and multi-threading. During unwinding, the unwinder
traverses backwards through the stack cleaning up stack frames
along the way. To keep track of the stack frames, the unwinder
stores the stack pointer of the current frame in an unwinding con-
text. This unwinding context is allocated on the heap creating a
leak of the stack pointer register to unsafe memory.
Multi-threading. Support for multi-threading is provided by the
GNU C standard library in the form of POSIX threads. On creation of
new threads, the library allocates an extra stack in the memory
mapping space and stores its base address and size within a cor-
responding Thread Control Block (TCB). References to this TCB
are then passed—via the thread ID—to programmer-controlled vari-
ables or stored at known offsets in the library’s data segment (e.g.,
stack_used). Thus, the location of the stack is leaked at multiple
locations throughout the unsafe memory.
Impact on SafeStack and AG-Stack. Both schemes are affected by
pointer leaks in linked libraries. Because the underlying causes of
these leaks differ, two types can be distinguished:
  Neglected Spills. Libraries that directly intervene in the
  control-flow of programs need to store register states to
  memory, spilling the register dedicated to holding the safe stack
  pointer (e.g., shortened return sequences and stack unwinding).
  Neglected Metadata. Libraries that maintain stack metadata
  and keep references to that metadata throughout their memory
  leak the safe stacks of SafeStack and AG-Stack by default,
  because both schemes replace the process’ original stacks
  with safe stacks (e.g., multi-threading).
SafeStack and AG-Stack do not deploy any countermeasures
to prevent pointer leaks through spills and metadata.
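A neglected spill can be illustrated with a toy memory model. The sketch below is purely illustrative: the address SAVED_STATE_ADDR, the slot index SP_SLOT, and the layout of the saved register state are assumptions made for the example, and library-specific details (such as any pointer mangling a real C library may apply) are not modeled.

```python
import random

PAGE_SIZE = 4096
SAFE_STACK_PAGES = 1 << 11            # 8 MiB safe stack, as in the text
SAVED_STATE_ADDR = 0x5555_0000_2000   # hypothetical heap address known to the attacker
SP_SLOT = 6                           # hypothetical slot of the stack pointer


def make_safe_stack(rng: random.Random) -> range:
    """Place the safe stack at a random page-aligned base (22 bits of entropy)."""
    base = 0x7F00_0000_0000 + rng.randrange(1 << 22) * PAGE_SIZE
    return range(base, base + SAFE_STACK_PAGES * PAGE_SIZE)


def save_register_state(memory: dict[int, int], stack_pointer: int) -> None:
    """Model a library spilling the register state to unprotected memory."""
    for slot in range(8):
        memory[SAVED_STATE_ADDR + 8 * slot] = 0  # other registers, irrelevant here
    memory[SAVED_STATE_ADDR + 8 * SP_SLOT] = stack_pointer


def attacker_read(memory: dict[int, int], address: int) -> int:
    """Arbitrary read primitive granted by the threat model."""
    return memory[address]


rng = random.Random(0)
safe_stack = make_safe_stack(rng)
memory: dict[int, int] = {}
save_register_state(memory, stack_pointer=safe_stack.stop - 0x180)

leaked = attacker_read(memory, SAVED_STATE_ADDR + 8 * SP_SLOT)
print(f"leaked pointer {leaked:#x} lies in safe stack: {leaked in safe_stack}")
```

A single read at a known location suffices in this model, regardless of how much entropy was used to place the safe stack.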
4.3.2 Brute-force Probing. Another approach to disclose hidden
areas is scanning the entire address space of the targeted program in
a brute-force manner while probing each encountered
page to check whether it belongs to the hidden area. When conducting such
a scan, the attacker has to overcome the entropy of the searchable
address space, which, depending on the implementation, is equal to
the size of the addressable memory space or the entropy introduced
by the used information hiding technique.
However, attackers have managed to drastically reduce this entropy
by leveraging information about the hidden areas, such as
their size, internal structure (i.e., sparsely populated or characteristic
byte sequences), and placement within the larger address space (i.e.,
vicinity to known mappings, such as libraries). In this way, recent
attacks successfully disclosed the safe region of CPI [20, 22] and the
safe stacks of the LLVM extension [26]. These attacks conducted
brute-force searches by utilizing fault analysis and timing side-channels,
both of which were originally used to search for gadgets (e.g.,
within shared libraries) in ASLR-protected programs to facilitate
code-reuse attacks [5, 43].
Göktaş et al. [26] studied the effectiveness of fault-tolerant memory
oracles to locate small-sized hidden areas, and, in the process,
successfully attacked and disclosed the safe stacks of LLVM. In
LLVM, safe stacks are created on a per-thread basis, with each stack
being at most 8 MiB ($2^{23}$ bytes; $2^{11}$ pages) in size and allocated
in the memory mapping space. With an entropy of 28 bits⁴ for
memory-mapped addresses, this results in $2^{17}$ ($2^{28}/2^{11}$) possible
page-aligned start addresses for each safe stack.
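The reduction from 28 to 17 bits can be made concrete with a small simulation. The sketch assumes a hypothetical crash-free oracle that reports whether a given page belongs to the safe stack; it is a model of the search strategy only, not of the attack by Göktaş et al. Because the safe stack spans $2^{11}$ consecutive pages, probing one page per $2^{11}$-page stride is guaranteed to hit it.

```python
import random

SPACE_PAGES = 1 << 28   # 28 bits of entropy for memory-mapped addresses (x86-64)
STACK_PAGES = 1 << 11   # 8 MiB safe stack


def make_oracle(start_page: int):
    """Hypothetical oracle: does this page belong to the hidden safe stack?"""
    def is_safe_stack_page(page: int) -> bool:
        return start_page <= page < start_page + STACK_PAGES
    return is_safe_stack_page


def strided_search(oracle):
    """Probe one page per stack-sized stride; return (hit page, probes used)."""
    probes = 0
    for page in range(0, SPACE_PAGES, STACK_PAGES):
        probes += 1
        if oracle(page):
            return page, probes
    return None, probes


rng = random.Random(1)
start = rng.randrange(SPACE_PAGES - STACK_PAGES + 1)  # unknown to the attacker
hit, probes = strided_search(make_oracle(start))
print(f"found page {hit} after {probes} probes "
      f"(upper bound: {SPACE_PAGES // STACK_PAGES})")
```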
In order to reduce this entropy, Göktaş et al. developed two
techniques, namely thread spraying and stack spraying, that artificially
prepare the safe stacks for attacks. With thread spraying,
the authors exploit (mis-)configurations of target programs that
allow spawning a multitude of worker threads (e.g., with JavaScript
control in browsers to start web workers at will). For each new
thread, a safe stack is allocated in the memory mapping space
using mmap(). As subsequent calls to mmap() allocate memory at
consecutive addresses, threads spawned in quick succession create a
contiguous memory region containing safe stacks only. Using this
approach, Göktaş et al. spawn 200 worker threads and increase the
combined stack size to 400 MiB ($2^{16}$ pages), in turn reducing the
search area to $2^{28}/2^{16} = 2^{12}$ pages. To find safe stacks within this
area, Göktaş et al. further conduct stack spraying to populate the
stacks with unique byte sequences that are easily searchable. For
this, they utilize web workers within their controlled JavaScript
environment to call a function recursively with their unique byte
sequence as argument. Thus, the memory location of a safe stack
can be obtained by probing at worst $2^{12}$ pages for the byte sequence.
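The effect of thread spraying on the search effort is a matter of simple arithmetic. The following sketch recomputes the figures from the text; it assumes that the size of the sprayed region is rounded down to the nearest power of two, which reproduces the $2^{16}$ pages quoted for 400 MiB. The function names are our own.

```python
PAGE_SIZE = 4096
MMAP_ENTROPY_BITS = 28  # x86-64 entropy for memory-mapped addresses
MIB = 1 << 20


def floor_log2(n: int) -> int:
    """Largest k such that 2**k <= n (for n >= 1)."""
    return n.bit_length() - 1


def remaining_search_bits(region_bytes: int,
                          entropy_bits: int = MMAP_ENTROPY_BITS) -> int:
    """Bits of search effort left once a contiguous region of this size is the target."""
    region_pages = region_bytes // PAGE_SIZE
    return entropy_bits - floor_log2(region_pages)


single_stack = remaining_search_bits(8 * MIB)    # one 8 MiB safe stack
sprayed = remaining_search_bits(400 * MIB)       # 400 MiB of sprayed safe stacks
print(f"single stack: 2^{single_stack} probes, sprayed: 2^{sprayed} probes")
```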
Impact on SafeStack and AG-Stack. Both schemes are susceptible
to brute-force probing for several reasons. First, as both
schemes replace the process’ original stacks with safe stacks, the
main thread’s safe stack receives an entropy of only 22 bits, while
the safe stacks of child threads receive an entropy of 28 bits. As
safe stacks are allocated with a default (maximum) stack size of
8 MiB, the effective entropy for a linear search is reduced to 11
bits ($2^{22}/2^{11}$) for the main thread and 17 bits ($2^{28}/2^{11}$) for all child
threads.
⁴ These values are for the x86-64 architecture. On ARM64, the entropies are even lower.
Second, due to the allocation through mmap(), the safe stacks of
child threads are likely to be placed in memory locations adjacent
to each other or to well-known libraries. This can further reduce
the effective entropy, as the safe stacks form a contiguous memory
region stretching over a multiple of $2^{11}$ pages and the adjacency
to libraries enables an attacker to search for these libraries (which
ideally occupy more than $2^{11}$ pages) instead of the safe stacks.
Finally, both schemes are inherently prone to thread spraying and
SafeStack is additionally exposed to stack spraying. For both, thread
spraying is amplified by the fact that memory mappings retrieved
through mmap() observe a spatial adjacency. Stack spraying, on the
other hand, is thwarted by AG-Stack, as return addresses are strictly
separated from other stack objects by placing them exclusively on
the safe stack. In doing so, an attacker is not able to exploit other
stack objects to artificially populate the safe stacks.
4.3.3 Memory Allocation Oracles. Instead of probing memory
in a brute-force manner, hidden areas can also be disclosed with
constant costs using memory allocation oracles. Memory allocation
oracles reveal the size of holes (unmapped memory) between allo-
cations inside the target’s address space. Therefore, by identifying
all holes in an address space, hidden areas can be trivially inferred
by an attacker. Oikonomopoulos et al. [37] presented attacks based
on memory allocation oracles and broke all current CFI and CPI
solutions that rely on information hiding, including the CPI safe
region, the safe stacks of LLVM, and ASLR-Guard with AG-Stack.
In their paper, Oikonomopoulos et al. describe two types of mem-
ory allocation oracles: Ephemeral Allocation Primitives (EAPs) that
create short-lived allocations and Persistent Allocation Primitives
(PAPs) that create allocations with a lifetime lasting for the dura-
tion of the attack. To craft these primitives, the authors rely on
target programs that allocate memory objects (e.g., using mmap())
as part of their input-handling logic, and additionally on a memory
corruption that grants an attacker the ability to control the size of
these memory allocations. Using the example of web servers, when
an attacker is able to allocate memory as part of HTTP client con-
nections, an EAP can be crafted using a non-persistent connection,
while a PAP can be crafted using a persistent connection.
An attacker with access to an EAP and its size parameter can
perform a binary search to find the larger one of the two holes
surrounding the hidden area. In each binary search iteration, the
attacker performs a single EAP invocation and observes the posi-
tive (successful allocation; empty space) or negative (unsuccessful
allocation; space contains parts of the hidden area) feedback. Based
on this feedback, the attacker adapts the size for the next iteration.
At the end, the search naturally returns the size of the larger hole.
To generalize the attack, i.e., searching for multiple holes within
an address space, the attacker combines EAP with PAP in order to
permanently allocate holes already discovered.
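The binary search over allocation sizes can be sketched with a toy model. The class EphemeralAllocator and its fields are hypothetical stand-ins for an EAP; the model assumes an address space that contains a single hidden area, so that an allocation succeeds exactly when the requested size fits into the larger of the two surrounding holes.

```python
class EphemeralAllocator:
    """Toy EAP: a short-lived allocation succeeds iff it fits into some hole."""

    def __init__(self, total_pages: int, hidden_start: int, hidden_pages: int):
        # Holes below and above the hidden area (sizes in pages).
        self.holes = (hidden_start, total_pages - hidden_start - hidden_pages)
        self.calls = 0

    def try_alloc(self, pages: int) -> bool:
        self.calls += 1
        return pages <= max(self.holes)


def largest_hole(oracle: EphemeralAllocator, total_pages: int) -> int:
    """Binary search for the largest allocation size that still succeeds."""
    lo, hi = 0, total_pages          # invariant: the answer lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi + 1) // 2     # round up so that the range always shrinks
        if oracle.try_alloc(mid):    # positive feedback: a hole of this size exists
            lo = mid
        else:                        # negative feedback: no hole is this large
            hi = mid - 1
    return lo


TOTAL_PAGES = 1 << 35                # x86-64 address space in pages
oracle = EphemeralAllocator(TOTAL_PAGES, hidden_start=12_345_678, hidden_pages=1 << 11)
size = largest_hole(oracle, TOTAL_PAGES)
print(f"larger hole: {size} pages, found with {oracle.calls} EAP invocations")
```

The number of invocations grows only with the logarithm of the address space size, which is why such a search is far cheaper than probing pages one by one.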
Impact on SafeStack and AG-Stack. Information disclosure attacks
based on memory allocation oracles reveal the entire memory
layout of processes, enabling an attacker to distinguish between
mapped and unmapped memory. As both dual stack schemes allocate
safe stacks as self-contained memory objects, their locations
can trivially be inferred when the memory layout is known.

Table 2: Resilience of dual stack schemes against information
disclosure attacks.

                          SafeStack   AG-Stack   Leak-Resilient Dual Stack
Pointer Leaks             ✗           ✗          ✓
Brute-Force Probing
  (ASLR) Entropy          22/28       22/28      32
  Max. Size (in pages)    $2^{11}$    $2^{11}$   $2^{3}$
  Effective Entropy       11/17       11/17      29
  Spatial Adjacency       ✗           ✗          ✓
  Thread Spraying         ✗           ✗          ✗
  Stack Spraying          ✗           ✓          ✓
Allocation Oracles        ✗           ✗          ✓
4.4 Summary
Both SafeStack and AG-Stack are vulnerable to all three types of
information disclosure, as summarized in Table 2. For brute-force
probing, the overall vulnerability is comprised of the effective en-
tropy (distinguishable for main/child threads and dependent on
ASLR entropy and maximum stack size), spatial adjacency (received
through mmap()) and general susceptibility to thread and stack
spraying.
The weaknesses of SafeStack and AG-Stack can be attributed to
four specific design flaws:
❶ Structural Flaw. Safe stacks that contain additional objects
besides return addresses are susceptible to stack spraying (this
only applies to SafeStack, not AG-Stack).
❷ Oversizing. The size of safe stacks directly weakens the re-
silience against brute-force probing by reducing the effective
entropy. While dual stack schemes divide stack objects onto
two stacks, both SafeStack and AG-Stack allocate safe stacks
with the system-default size of (at most) 8 MiB (2^11 pages).
❸ Architectural Conformity. When replacing the process’
original stacks with safe stacks and using the architecture’s
regular stack pointer register (i.e., RSP for SafeStack and
AG-Stack) to reference these stacks, they are susceptible to
pointer leaks through neglected metadata.
❹ Inherited Inflexibility. Due to the architectural confor-
mity, safe stacks also have to adhere to the system’s spec-
ifications. For this reason, safe stacks strictly receive the
entropy of stack_offset and mmap_offset (see Figure 1),
and further inherit the weaknesses imposed by mmap(), i.e.,
a spatial adjacency in memory, as well as a susceptibility to
thread spraying and memory allocation oracles.
Furthermore, both SafeStack and AG-Stack suffer from pointer leaks
through neglected spills, which cannot be avoided by a specific
design choice but must be handled manually as described in § 6.3.
An illustration of the design flaws is given in Table 3. Based on
these flaws, we will present the design differences of our
leak-resilient dual stack in the following section.
Table 3: Design flaws of SafeStack and AG-Stack. The numbers
correspond to ❶ the structural flaw, ❷ oversizing, ❸ the
architectural conformity, and ❹ the inherited inflexibility.

                          Design Flaws
Pointer Leaks                 ❸
Brute-Force Probing
  Effective Entropy           ❷ ❹
  Spatial Adjacency           ❹
  Thread Spraying             ❹
  Stack Spraying              ❶
Allocation Oracles            ❹
5 LEAK-RESILIENT DUAL STACK DESIGN
According to our threat model from § 2, dual stack schemes have
to protect return addresses against contiguous and arbitrary writes
to enforce backward-edge CFI effectively. The prevention of con-
tiguous writes is achieved through the basic design principle of
dual stacks: Separating potentially unsafe stack objects, such as
local stack variables storing user-supplied data, from sensitive stack
objects, i.e., the return addresses. Hence, we maintain this basic
design and separate stack objects onto safe and unsafe stacks.
To prevent arbitrary writes on safe stacks, their locations must
be kept secret from an adversary. Therefore, the safe stacks have to
withstand powerful information disclosure attacks that, according
to our threat model, are capable of reading arbitrary memory
locations within the target program’s address space. In order to harden
dual stack schemes, we improve on the four design flaws identified
in § 4 and summarized in Table 3. In the following, we present the
improvements to these flaws and build our leak-resilient dual stack
scheme. The improvements in terms of leak-resilience achieved
with our dual stack scheme are summarized in Table 2.
5.1 Return Stacks
In order to prevent structural flaws (❶), an important design choice
for dual stack schemes is the differentiation between safe and unsafe
stack objects. In this regard, and as discussed in § 4.3.2, the naïve
approach of declaring all stack objects, except return addresses,
unsafe is more effective against attacks that conduct stack spraying.
The lack of controllable objects prevents an attacker from artificially
populating the safe stack to his own choosing, taking away his
ability to identify safe stacks easily during brute-force probing.
Thus, we adopt this design choice and only store return addresses
on our safe stacks, which we name return stacks accordingly.
5.2 Return Stack Size
Capitalizing on the fact that we only store return addresses on our
return stacks, we are able to counteract the oversizing flaw (❷). The
size required by the return stack directly depends on the number
of nested function calls in the protected program. We measured the
maximum function call depth for SPEC CPU2017 by instrumenting
function prologues and epilogues. The results, summarized in
Table 4, show an average call depth of 465 and a maximum of 2581
for the gcc benchmark. As one virtual memory page is able to hold
512 64-bit addresses, we conclude that a return stack size of 8 pages
is sufficient for practical use.
A Leak-Resilient Dual Stack Scheme for Backward-Edge CFI ASIA CCS ’18, June 4–8, 2018, Incheon, Republic of Korea
Table 4: Maximum function call depth of the SPEC CPU2017
benchmark on the x86-64 architecture.

                    Max. Call Depth
500.perlbench_r           310
502.gcc_r                2581
505.mcf_r                  29
520.omnetpp_r             303
523.xalancbmk_r            85
525.x264_r                 15
531.deepsjeng_r            63
541.leela_r               779
557.xz_r                   20
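The sizing argument can be checked with a few lines of arithmetic. The following sketch assumes 4 KiB pages (which is what 512 64-bit addresses per page implies); the constant and function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Assumptions of this sketch: 4 KiB pages and 8-byte return addresses. */
enum { RS_PAGE_BYTES = 4096, RS_ADDR_BYTES = 8 };

/* Number of pages needed to hold `depth` return addresses, rounded up. */
static size_t pages_for_call_depth(size_t depth) {
    size_t per_page = RS_PAGE_BYTES / RS_ADDR_BYTES; /* 512 addresses */
    return (depth + per_page - 1) / per_page;
}
```

For the deepest call chain in Table 4 (2581 for gcc), this yields 6 pages, so an 8-page return stack (4096 return addresses) leaves some headroom.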
By reducing the size of our return stacks, we directly strengthen
the resilience of our dual stack scheme against brute-force probing.
For our scheme, the effective entropy, with which return stacks are
placed in the program’s address space, is only reduced by 3 bits,
instead of 11 bits for SafeStack and AG-Stack.
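The relation between placement entropy, stack size, and effective entropy used in Table 2 can be written down directly. This is a sketch of the arithmetic only, assuming stack sizes that are powers of two (in pages); the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* log2 of a power of two, computed by counting right shifts. */
static unsigned log2_pow2(uint64_t x) {
    unsigned bits = 0;
    assert(x != 0 && (x & (x - 1)) == 0); /* sketch handles powers of two only */
    while (x > 1) {
        x >>= 1;
        bits++;
    }
    return bits;
}

/* Effective entropy in bits: placement entropy minus log2 of the
 * stack size in pages. */
static unsigned effective_entropy(unsigned entropy_bits, uint64_t stack_pages) {
    return entropy_bits - log2_pow2(stack_pages);
}
```

With the values from Table 2, a 2^11-page safe stack reduces 22 and 28 bits to 11 and 17 bits, whereas a 2^3-page return stack reduces 32 bits to 29 bits.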
5.3 Return Stack Pointer
Another design choice of SafeStack and AG-Stack is maintaining
an architectural conformity (❸) by replacing the process’ original
stacks with safe stacks and using the architecture’s regular stack
pointer register to reference these stacks. As discussed in § 4.3.1,
this results in pointer leaks through neglected metadata. To avoid
such undesired references to our return stacks, we opt to use a
different register to hold the return stack pointer.
In particular, we choose one of the architecture’s general-purpose
registers and dedicate it exclusively to the return stack pointer. As
a consequence, in our dual stack scheme, the process’ original
stack becomes the unsafe stack and the architecture’s regular stack
pointer register still holds the unsafe stack pointer. This way, we
guarantee that neglected metadata may only contain references
to the unsafe stack and that the return stack pointer cannot be
leaked into unsafe memory. This design choice also creates different
compatibility options for the use with legacy (unprotected) libraries,
which we will discuss in § 5.6.
5.4 Return Stack Region
As a consequence of not replacing the process’ original stacks with
return stacks, we gain flexibility (❹) when placing return stacks
within the process’ address space. Our dual stack scheme is not
forced to place the main thread’s return stack at the system’s des-
ignated memory location (i.e., stack_base; see Figure 1). Instead,
all return stacks can be allocated freely within the virtual address
space. This lifts the restriction on the entropy of the main thread’s
return stack, adjusting all return stacks in our scheme to the same
entropy. To specifically harden our dual stack scheme, we place all
return stacks within a dedicated return stack region that follows
two design goals: (1) Maintaining no metadata about its internal
structure at any time and (2) presenting a strong protection against
information disclosure that severely impedes brute-force probing
and directly renders memory allocation oracles impossible.
The return stack region is allocated in the memory mapping
space using mmap() and occupies a size of 2^44 bytes (2^32 pages).
Within this region, return stacks are assigned randomized base
addresses resulting in an effective entropy of 29 bits (2^32/2^3). Due to
this randomness, we are further able to break any spatial adjacency
that is otherwise experienced when using plain mmap() to allocate
stacks within the memory mapping space. Our return stack region
guarantees stacks to have no spatial adjacency to themselves or to
any well-known libraries that are stored in the memory mapping
space. Thus, we prevent an attacker from artificially constructing
contiguous memory regions occupied by return stacks and from
searching well-known libraries instead of the return stacks them-
selves. As a consequence, the randomization within our return stack
region greatly reduces the ability of an attacker to search for return
stacks. The effects of thread spraying are also minimized, as the
return stack region is large enough to theoretically hold (2^32 − 1)/9
return stacks with page-sized gaps (i.e., guard pages) between them.
Summarizing, our dual stack scheme heavily impedes brute-force
probing, even when supported by thread spraying.
To render memory allocation oracles impossible, and in contrast
to other schemes that rely on information hiding, we remove all
access permissions from the return stack region upon allocation.
Then, we only set read/write access on memory pages as needed,
i.e., as soon as they are occupied by return stacks. This way, we
keep return stacks within an always-allocated, but non-accessible
memory region. Memory allocation oracles are therefore able to find
the return stack region, but are unable to search for return stacks
within that region. To disclose stacks within the return stack region,
an attacker has to perform an exhaustive search in a brute-force
manner, which is heavily impeded as explained before. Additionally,
this design requires no metadata to manage safe stacks, and thus
prevents the explicit leakage of stack locations.
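The always-allocated but non-accessible region can be approximated on Linux with mmap() and mprotect(). The following is a minimal sketch under stated assumptions, not our runtime support library: region_reserve() and region_set_access() are hypothetical helpers, and a demonstration would typically reserve far fewer pages than the 2^32 pages described above.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Reserve `pages` pages without any access permissions. MAP_NORESERVE
 * asks the kernel not to reserve swap space for the mapping. */
static void *region_reserve(size_t pages) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, pages * page, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Grant (enable != 0) or revoke read/write access for `count` pages
 * starting at page index `first` inside the region. */
static int region_set_access(void *region, size_t first, size_t count,
                             int enable) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    assert(region != NULL);
    return mprotect((char *)region + first * page, count * page,
                    enable ? PROT_READ | PROT_WRITE : PROT_NONE);
}
```

Because the whole region stays mapped at all times, an allocation oracle observes it as one occupied block regardless of how many pages inside it are currently accessible.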
5.5 Stack Creation and Destruction
To manage the creation of new return stacks within the metadata-
free return stack region, we rely on information about the access
permissions of memory pages. As no interface is exposed by the
Linux kernel to retrieve this information, we leverage the write()
syscall for this purpose. The syscall takes a memory address and
file descriptor as input and attempts to write the bytes stored at
the specified address into the file. On success, the syscall returns
the number of bytes written indicating that the memory address
is readable. On failure, an error code is returned indicating no
read permissions. Note that write() only reads from the specified
address, and therefore has no effect on the process’ memory. Hence,
we can utilize write() to determine whether memory pages of our
return stack region are currently free (non-readable) or occupied
by a return stack (readable), which enables us to create and destroy
return stacks without maintaining any metadata.
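The probing idea can be sketched as follows. This is an illustration, not the code of our runtime support library: addr_readable() is a hypothetical helper, and the use of a pipe as the sink file descriptor is an assumption of this sketch, chosen because the kernel has to copy the byte out of user memory to complete the write.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Probe one byte at `addr` with write().
 * Returns 1 if the byte is readable, 0 if write() fails with EFAULT
 * (no read permission), and -1 on any other error. The probed memory
 * is only read, never modified. */
static int addr_readable(const void *addr) {
    int fds[2];
    int saved_errno;
    ssize_t n;

    if (pipe(fds) != 0)
        return -1;
    do {
        n = write(fds[1], addr, 1); /* kernel reads 1 byte from addr */
    } while (n < 0 && errno == EINTR);
    saved_errno = errno;
    close(fds[0]);
    close(fds[1]);

    if (n == 1)
        return 1;
    return (n < 0 && saved_errno == EFAULT) ? 0 : -1;
}
```

Creating a pipe per probe keeps the sketch short; a real implementation would presumably reuse one descriptor across probes.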
For the stack creation, the return stack region is randomly probed
using the write() syscall until 10 non-readable, consecutive mem-
ory pages (8 pages for the return stack, enclosed by two guard pages)
are discovered. As the region can hold a maximum of (2^32 − 1)/9
return stacks, a collision is highly unlikely. Therefore, the perfor-
mance overhead for stack allocations and the corresponding thread
creations is negligible. The permissions of the 8 center pages are set
to read/write access and the dedicated return stack pointer register
is initialized to the base address of the new return stack. For the de-
struction of return stacks, the return stack pointer is obtained from
its dedicated register and the permissions of the memory pages
occupying the return stack are reset to non-accessible.
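Stack creation and destruction as described above might look as follows. This is a hedged sketch: all function names are hypothetical, random() stands in for a proper source of randomness, a pipe is assumed as the sink for the write() probe, and the retry limit of 1000 is arbitrary. The destroy function takes the base address directly instead of deriving it from the dedicated register.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

enum { STACK_PAGES = 8, SPAN_PAGES = STACK_PAGES + 2 /* two guard pages */ };

/* 1 = readable (occupied), 0 = EFAULT (free), -1 = other error. */
static int page_readable(const int pipefd[2], const void *addr) {
    char drain;
    ssize_t n;
    do {
        n = write(pipefd[1], addr, 1);
    } while (n < 0 && errno == EINTR);
    if (n == 1) /* remove the probed byte again so the pipe never fills */
        return read(pipefd[0], &drain, 1) == 1 ? 1 : -1;
    return (n < 0 && errno == EFAULT) ? 0 : -1;
}

/* Randomly probe the region until 10 consecutive non-readable pages are
 * found, then make the 8 center pages read/write. Returns the base
 * address of the new return stack or NULL on failure. */
static void *return_stack_create(void *region, size_t region_pages,
                                 const int pipefd[2]) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    assert(region_pages >= SPAN_PAGES);
    for (int attempt = 0; attempt < 1000; attempt++) {
        size_t first = (size_t)random() % (region_pages - SPAN_PAGES + 1);
        char *span = (char *)region + first * page;
        int free_pages = 0;
        while (free_pages < SPAN_PAGES &&
               page_readable(pipefd, span + (size_t)free_pages * page) == 0)
            free_pages++;
        if (free_pages != SPAN_PAGES)
            continue; /* collision with an existing stack: retry */
        char *base = span + page; /* skip the leading guard page */
        if (mprotect(base, STACK_PAGES * page, PROT_READ | PROT_WRITE) != 0)
            return NULL;
        return base;
    }
    return NULL;
}

/* Reset the pages of a return stack to non-accessible. */
static int return_stack_destroy(void *base) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    return mprotect(base, STACK_PAGES * page, PROT_NONE);
}
```

Note that the sketch keeps no list of allocated stacks: the page permissions themselves are the only record of which parts of the region are in use.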
5.6 Library Compatibility Options
Our dual stack scheme is designed to protect executables and li-
braries alike. Depending on the usage scenario, it might be neces-
sary to have interoperability with legacy libraries. Therefore, we
provide three options for library integration:
Secure. The library is compiled and instrumented with our
dual stack scheme and therefore fully protected. Note that
this option does not require any changes to the source code
of the library.
Aware. The library is aware of the presence of return stacks
without requiring instrumentation by our scheme. Because
we utilize a general-purpose register to hold return stack
pointers, libraries can be compiled to omit the usage of that
register altogether. This allows us to link unprotected li-
braries into protected programs while guaranteeing the re-
turn stack pointer is never spilled by the unprotected library.
As a result, our scheme facilitates the creation of fully aware
runtime environments (e.g., an entire Linux system) that are
able to execute protected and unprotected programs simul-
taneously without compromising the security of protected
programs in any way.
Compatible. A pre-compiled library is still compatible with
our scheme. However, such a library might spill the return
stack pointer onto the unsafe stack.
In summary, our dual stack scheme achieves compatibility with all
types of libraries. While the first two options require re-compilation,
none of the cases require source code changes.
6 IMPLEMENTATION
We present implementations of our dual stack scheme for the x86-
64 and ARM64 architectures. The implementations are designed for
the ELF binary format in conjunction with GNU/Linux execution
environments. Our dual stack scheme is integrated into the LLVM
compiler framework (version 6.0.0), with modifications to the com-
piler backend (machine code generation) and the runtime support
library. For the execution of secured programs, our scheme further
requires modifications to the GNU C standard and GCC runtime
libraries.
6.1 Compiler Backend
Data on the stack is organized in stack frames—contiguous, per-
function data regions—containing the function’s return address
and other stack objects, such as the function’s parameters, register
spills, and local variables. Stack frames are created and destroyed
by the prologue and epilogue of a function, which are emitted by
the compiler backend during machine code generation. To add
support for our dual stack scheme, we modify function prologues
and epilogues to maintain our return stack and store return
addresses there instead of on the regular stack. In the following, we
present these modifications for the x86-64 and ARM64 Instruction
Set Architectures (ISAs).

     1  PUSH %RBP                   1  POPQ (%R15)
     2  MOV  %RSP, %RBP             2  LEA  8(%R15), %R15
     3  PUSH %RBX                   3  PUSH %RBP
     4  SUB  $72, %RSP              4  MOV  %RSP, %RBP
     5  ...                         5  PUSH %RBX
     6  ADD  $72, %RSP              6  SUB  $64, %RSP
     7  POP  %RBX                   7  ...
     8  POP  %RBP                   8  ADD  $64, %RSP
     9  RETQ                        9  POP  %RBX
                                   10  POP  %RBP
                                   11  LEA  -8(%R15), %R15
                                   12  JMPQ (%R15)

Figure 2: Typical prologues and epilogues of a regular function
(left) and a function with support for our return stack
(right) on the x86-64 architecture.
6.1.1 x86-64 Architecture. The x86-64 ISA [30] implements full
descending stacks with the dedicated registers RSP to hold the
stack pointer and RBP to hold the frame (or base) pointer. While
the stack is maintained by the function prologue and epilogue, the
return address is pushed onto the stack by the CALL instruction
and popped by the RET instruction. In the following, we assume
a typical function that saves one general-purpose register (RBX)
alongside the frame pointer and return address onto the stack, and
additionally reserves 64 bytes of stack space for local variables. In
x86-64 machine code, this translates to the prologue (ln. 1–4) and
epilogue (ln. 6–9) shown on the left-hand side of Figure 2. The stack
is extended by an additional 8 bytes, as RSP must be kept at a
16-byte alignment. The return address is pushed onto the stack by
the preceding CALL instruction, which is not depicted in the figure.
To extend the x86-64 ISA for our dual stack scheme, an additional
dedicated register has to be identified that holds the return stack
pointer. As discussed in § 5, general-purpose registers are most
suitable for this purpose, as they maintain compatibility with legacy
libraries. Based on this criterion, we choose R15 as the dedicated
register for the return stack pointer.
To support return stacks, it would be ideal to replace CALL in-
structions and push return addresses directly onto the return stack
before jumping to the called function. However, this is not an ac-
ceptable option, as it would break compatibility to legacy libraries
that still use the CALL instruction. Therefore, we opt to move re-
turn addresses from the regular stack onto the return stack during
function prologues. This is achieved through two additional instruc-
tions (ln. 1–2), as depicted on the right-hand side of Figure 2. The
first instruction directly pops the return address from the regular
stack onto the return stack, without using a temporary register.
The second instruction increments the return stack pointer by the
size of the pushed return address. The epilogue is extended by one
additional instruction (ln. 11) to decrement the return stack pointer
before returning by directly jumping to the address stored on the
return stack (ln. 12). We use the LEA instruction for manipulating
the return stack pointer to ensure that the operation has no side
effects on the flags register. The remaining instructions (ln. 3–10)
are consistent with the regular prologue/epilogue pair, with the
exception that no stack realignment has to be carried out as our
prologue spills an even number of registers.
In summary, our implementation realizes an empty ascending
stack to hold return addresses in only three additional instructions
for each function’s prologue/epilogue pair.

     1  SUB SP, SP, #96
     2  STR X19, [SP, #64]
     3  STP FP, LR, [SP, #80]
     4  ADD FP, SP, #80
     5  ...
     6  LDP FP, LR, [SP, #80]
     7  LDR X19, [SP, #64]
     8  ADD SP, SP, #96
     9  RET

Figure 3: Typical prologue and epilogue of a function on the
ARM64 architecture.
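The effect of the three additional x86-64 instructions from Figure 2 can be modeled in C. This is a simplified model for illustration only: the struct cpu_model and both functions are hypothetical, registers are represented as pointers into ordinary arrays, and each array slot stands for 8 bytes.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint64_t *rsp; /* models RSP: regular stack, full descending */
    uint64_t *r15; /* models R15: return stack, empty ascending */
} cpu_model;

/* Prologue, ln. 1-2: POPQ (%R15); LEA 8(%R15), %R15 */
static void prologue_move_return_address(cpu_model *c) {
    *c->r15 = *c->rsp; /* POPQ (%R15): copy the return address ...       */
    c->rsp += 1;       /* ... and release its slot on the regular stack  */
    c->r15 += 1;       /* LEA: advance to the next free return stack slot */
}

/* Epilogue, ln. 11-12: LEA -8(%R15), %R15; JMPQ (%R15).
 * Returns the address that the jump would transfer control to. */
static uint64_t epilogue_return_target(cpu_model *c) {
    c->r15 -= 1;    /* LEA: step back to the most recent return address */
    return *c->r15; /* JMPQ (%R15): jump target */
}
```

In this model, overwriting the slot on the regular stack after the prologue has run does not influence the target returned by epilogue_return_target(), because the return address is only read from the return stack.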
6.1.2 ARM64 Architecture. The ARM64 ISA [2] implements full
descending stacks with the dedicated registers SP to hold the stack
pointer and FP to hold the frame pointer. In contrast to x86-64, the
ARM64 ISA does not provide a CALL instruction to push return
addresses, but instead utilizes the so-called link register (LR) to pass
them to called functions. As a consequence, functions have to save
return addresses themselves during the creation of stack frames.
In the following, we again assume a typical function that saves
one general-purpose register (X19) alongside the frame pointer and
return address onto the stack, and additionally reserves 64 bytes
of stack space for local variables. In ARM64 machine code, this
translates to the prologue (ln. 1–4) and epilogue (ln. 6–9) shown in
Figure 3. The prologue first extends the stack by the combined size
of register spills, alignment bytes, and local variables (i.e., 24+8+64).
Then, the registers are spilled and the frame pointer is updated
to point to the saved frame pointer value on the stack. Likewise,
the epilogue destroys the stack frame by reversing the prologue’s
instructions.
To extend the ARM64 ISA for our dual stack scheme, we chose
the general-purpose register X28 to hold the return stack pointer.
Further, we implemented the empty ascending stack to hold re-
turn addresses by utilizing memory transfer instructions with pre-
and post-indexed addressing unique to ARM architectures. In pre-
indexed addressing, the memory address accessed is the sum of the
used base register and an offset, and the address is written back
to the base register. With post-indexed addressing, the accessed
address is the value in the base register, and the sum of the address
and the offset is written back to the base register. This way, we
are able to add support for return stacks with zero or only two
additional instructions per prologue/epilogue pair (see footnote 5).
Adapting the example prologue/epilogue pair from Figure 3 requires
no additional instructions. The modifications are depicted in Figure 4.

Footnote 5: If the function spills an odd number of registers, we
need zero additional instructions; if it spills an even number, we
need two additional instructions, a store and a load.

     1  STR LR, [X28], #8
     2  SUB SP, SP, #80
     3  STP X19, FP, [SP, #64]
     4  ADD FP, SP, #72
     5  ...
     6  LDP X19, FP, [SP, #64]
     7  LDR LR, [X28, #-8]!
     8  ADD SP, SP, #80
     9  RET

Figure 4: Typical prologue and epilogue of a function with
support for our return stack on the ARM64 architecture.
The prologue first stores LR onto the return stack using a store
instruction with post-indexed addressing (ln. 1). The post-indexed
store increments the base register X28 by 8 bytes upon completion.
The remaining instructions (ln. 2–4), as in the regular prologue,
reduce the stack pointer, spill the registers, and update the frame
pointer. Because our scheme stores the return address on the
return stack, the prologue now spills an even number of registers,
namely X19 and FP, which can be merged into one store instruction.
Additionally, no realignment is needed, which is why the stack is
extended only by the size of register spills and local variables (i.e.,
16 + 64). The epilogue (ln. 6–9) operates similarly to the regular
epilogue, except that LR is restored from the return stack using
a pre-indexed load instruction, which decrements X28 by 8 bytes
before loading the return address into LR.
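The write-back behavior of the two addressing modes can be modeled in C. This is an illustrative model only, with hypothetical function names; the base register is represented as a byte pointer and memory accesses use memcpy().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Models "STR Xt, [Xn], #imm" (post-indexed): store at the address in
 * Xn first, then write Xn + imm back to Xn. */
static void str_post_index(uint8_t **xn, uint64_t xt, ptrdiff_t imm) {
    memcpy(*xn, &xt, sizeof xt);
    *xn += imm;
}

/* Models "LDR Xt, [Xn, #imm]!" (pre-indexed): write Xn + imm back to
 * Xn first, then load from the updated address. */
static uint64_t ldr_pre_index(uint8_t **xn, ptrdiff_t imm) {
    uint64_t xt;
    *xn += imm;
    memcpy(&xt, *xn, sizeof xt);
    return xt;
}
```

A post-indexed store with offset 8 followed by a pre-indexed load with offset -8 thus behaves like a push and pop on an empty ascending stack, each in a single instruction.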
6.2 Runtime Support Library
Our runtime support library adds a constructor to the initialization
section of protected ELF binaries. At load-time, the dynamic linker
executes our constructor, which first initializes the return stack re-
gion using mmap() and then creates the return stack for the process’
main thread. The runtime support library also handles the creation
and destruction of return stacks for multi-threaded ELF binaries.
For this, the thread initialization is intercepted. Our interceptor
function performs two tasks: Creating the child thread’s return
stack and registering a destructor to clean up the return stack upon
thread cancellation.
6.3 GNU/Linux Runtime Libraries
Executing protected ELF binaries under our dual stack scheme re-
quires certain adjustments to the GNU/Linux runtime environment.
We discuss these adjustments in the following.
6.3.1 Dynamic Linker. For compatibility with our dual stack
scheme, the dynamic linker of the GNU C standard library must be
compiled with awareness of the dedicated return stack pointer reg-
ister. Because our runtime support library has to initialize the return
stack pointer early in the loading process, the dynamic linker might
clear our dedicated register at a later stage. Therefore, we recompile
the dynamic linker using the GCC compiler flags -ffixed-r15 for
x86-64 and -ffixed-x28 for ARM64. This guarantees the integrity
of our return stack pointer register during load-time.
6.3.2 Shortened Return Sequences. As discussed in § 4.3.1, the
GNU C standard library spills registers to unsafe memory through
the setjmp()/longjmp() feature. To prevent leaks of our return
stack pointer, we introduce functionally equivalent safe versions of
setjmp() and longjmp() and add a custom transformation pass
to the LLVM compiler framework.
The transformation pass performs two tasks: First, all calls to
setjmp() and longjmp() are replaced with calls to their safe
counterparts. Second, for each call to safesetjmp(), a unique,
pointer-sized marker is pushed onto the return stack. To distinguish the
marker from return addresses, a value above the process’ maximum
virtual address is used. Next, safesetjmp() reads the marker upon
invocation and stores it as part of the register state (i.e., instead of
the return stack pointer). Finally, when safelongjmp() is invoked,
the marker is read from the register state and used to unwind the re-
turn stack. The return stack is traversed backwards until the marker
is encountered, decrementing the return stack pointer along the
way. This implicitly restores the original return stack pointer.
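The marker-based unwinding can be sketched as a backward scan over the return stack. The sketch rests on several assumptions that are not part of the description above: user-space addresses are assumed to stay below 2^47, MARKER_BASE and unwind_to_marker() are hypothetical names, and the restored pointer is taken to be the slot directly above the marker.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: no return address has the upper 16 bits set, so values
 * at or above MARKER_BASE cannot be confused with return addresses. */
#define MARKER_BASE 0xFFFF000000000000ull

/* Walk the return stack backwards from `rsp` (next free slot) towards
 * `stack_base`, decrementing the pointer until `marker` is found.
 * Returns the pointer just above the marker, or NULL if the marker is
 * not on the stack. */
static uint64_t *unwind_to_marker(uint64_t *rsp, const uint64_t *stack_base,
                                  uint64_t marker) {
    while (rsp > stack_base) {
        rsp--;
        if (*rsp == marker)
            return rsp + 1;
    }
    return NULL;
}
```

Since only the marker value is stored in the saved register state, the saved state in unsafe memory does not contain an address within the return stack.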
6.3.3 Stack Unwinding. As discussed in § 4.3.1, the GCC runtime
library needs to keep track of currently unwound stack frames. This
also applies to our return stack, as the unwinder eventually has
to reinstate the correct return stack pointer. To support our dual
stack scheme without compromising its security, we extended the
GCC runtime library to perform unwinding without storing the
return stack pointer in the unwinding context. Instead, we store
an offset in the unwinding context, which indicates the position of
the return address of the currently unwound stack frame relative
to the top of the return stack.
On both x86-64 and ARM64, stack unwinding instructions are
generated according to the DWARF standard. To implement return
stack support with minimal ramifications on the existing unwind-
ing specification, we introduce a new unwinding instruction, called
DW_CFA_def_rsp_offset. This unwinding instruction takes a sin-
gle unsigned LEB128 offset representing the spill size of the stack
frame (i.e., function) on the return stack (i.e., 8 bytes for the return
address). The instruction is inserted alongside the regular unwind-
ing instructions in order to inform the unwinder to additionally
unwind the return stack. On encountering the instruction, the un-
winder stores the offset in the unwinding context, adding it to
any previously stored offset. Like this, the unwinder accumulates
the offset from the top of the return stack to the return address
belonging to the currently unwound stack frame. At the end, the
unwinder directly updates the return stack pointer in its register
by subtracting the offset.
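The offset accumulation can be sketched as follows. This is not the code of our modified GCC runtime library: the struct unwind_context and the functions on_def_rsp_offset() and finish_unwind() are hypothetical, and only the unsigned LEB128 decoding follows the encoding defined by the DWARF standard.

```c
#include <assert.h>
#include <stdint.h>

/* Decode one unsigned LEB128 value and advance *p past it. */
static uint64_t read_uleb128(const uint8_t **p) {
    uint64_t result = 0;
    unsigned shift = 0;
    uint8_t byte;
    do {
        byte = *(*p)++;
        if (shift < 64) /* ignore bits that do not fit into 64 bits */
            result |= (uint64_t)(byte & 0x7f) << shift;
        shift += 7;
    } while (byte & 0x80);
    return result;
}

/* Hypothetical unwinding context: stores an offset, never the return
 * stack pointer itself. */
typedef struct {
    uint64_t rs_offset;
} unwind_context;

/* Handler sketch for DW_CFA_def_rsp_offset: add the frame's spill size
 * on the return stack to the accumulated offset. */
static void on_def_rsp_offset(unwind_context *ctx, const uint8_t **operand) {
    ctx->rs_offset += read_uleb128(operand);
}

/* At the end of unwinding, the new return stack pointer is the current
 * one minus the accumulated offset. */
static uint64_t finish_unwind(const unwind_context *ctx, uint64_t rsp_reg) {
    return rsp_reg - ctx->rs_offset;
}
```

For example, unwinding three frames that each spill 8 bytes accumulates an offset of 24, which is subtracted from the register value in a single step.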
6.3.4 Multi-threading. On GNU/Linux environments, support
for multi-threading is provided by the GNU C standard library in
form of user-level and kernel-level context switching. User-level
threads operate without kernel support and store switched contexts
within the program’s address space. Therefore, user-level threads
inherently leak the return stack pointer and must not be used
when protecting programs with our dual stack scheme or any other
equivalent scheme relying on information hiding.
Table 5: Overhead of SafeStack and our leak-resilient dual
stack scheme on the x86-64 ISA.

                    Baseline    SafeStack   Leak-Resilient Dual Stack
500.perlbench_r     570 s         2.63%       4.21%
502.gcc_r           411 s         0.97%       5.60%
505.mcf_r           623 s         1.28%       1.44%
520.omnetpp_r       687 s         2.91%       2.04%
523.xalancbmk_r     570 s        -0.53%       0.18%
525.x264_r          393 s         0.51%       3.82%
531.deepsjeng_r     437 s         1.14%       4.35%
541.leela_r         761 s         1.97%       7.10%
557.xz_r            562 s         0.71%       0.89%
Apache              130.70 s      0.05%       0.11%
Nginx               178.54 s      0.18%       0.15%
Mean                              1.08%       2.72%
On the other hand, kernel-level—or POSIX—threads rely on the
Linux kernel to handle and store context switches. As such, the
return stack pointer is safely stored in the kernel’s address space and
our dual stack scheme is able to support POSIX threads naturally
without leaking the return stack pointer. Protected programs must
be compiled with a modified version of the GCC runtime library
(see § 6.3.3), as the stacks of threads are forcefully unwound during
thread cancellation to perform clean-up.
7 PERFORMANCE EVALUATION
We compare the performance of our leak-resilient dual stack scheme
to a baseline with regular stacks and the SafeStack implementation
in order to evaluate the overhead added by our additional security
measures. We primarily tested with the SPEC CPU2017 bench-
mark suite (with 2 iterations) on x86-64 and the EEMBC CoreMark
benchmark (with 25,000 iterations) on ARM64. We also applied
both schemes to real world programs, i.e., the Apache (version
2.4.33) and Nginx (version 1.13.10) web servers, and tested their
throughput with ApacheBench. For this, we sent 1 million HTTP
requests to our server from a remote (LAN) host and served each
request with 128 bytes of data. All evaluations were carried out
on an Intel Core i5-7440HQ machine running Debian Buster and a
Raspberry Pi 3 running Yocto Poky Linux. The quad-core
processors of both systems were clocked at a static 2.2 GHz and 600 MHz,
respectively, to avoid frequency scaling. For Debian Buster, we used
dual stack-aware GNU C standard and GCC runtime libraries, while
for the Raspberry Pi 3 we compiled a fully aware Yocto Poky Linux
system from scratch (see § 5.6). For all evaluations, we used natively
LLVM-compiled binaries as baseline and enabled the compiler flags
-fsanitize=safe-stack and -fsanitize=return-stack for the
respective dual stack schemes.
On the x86-64 ISA, the performance measurements, as recorded
in Table 5, show an average overhead of 2.72% for our scheme, com-
pared to 1.08% for SafeStack. This means the additional security
gained by our scheme comes at a mean loss of 1.64% in performance.
This is mainly due to the fact that our scheme adds three
additional instructions to every function, whereas SafeStack only
requires additional instructions for accessing objects on the unsafe
stack, which does not occur for all functions. However, even with
our scheme, some programs, such as the Apache and Nginx web
servers, as well as the xalancbmk and xz benchmarks, experience
virtually no performance overhead, which we attribute to larger
functions with fewer transitions between them. This assumption
is supported by our measurements of the maximum function call
depth from § 5.2 (see Table 4), as programs with a shallow call depth
will call fewer functions in general.
Table 6: Overhead of SafeStack and our leak-resilient dual
stack scheme on the ARM64 ISA.

                    Baseline    SafeStack   Leak-Resilient Dual Stack
CoreMark            16.43 s      -0.73%      -0.43%
Apache              196.36 s      1.49%       0.85%
Nginx               215.98 s      0.73%      -0.50%
Mean                              0.50%      -0.03%
For the ARM64 ISA, as recorded in Table 6, on average our
scheme causes no measurable performance overhead (i.e., -0.03%),
compared to an overhead of 0.50% for SafeStack. In comparison to
x86-64, this is a clear performance gain for our scheme, which we
attribute to the flexibility of the ISA’s memory transfer instructions
and the subsequent reduction in instructions needed per function.
In general, some benchmarks gain performance using our scheme,
a result also observable in some benchmarks with SafeStack. We
attribute those results to improved caching behavior since both
approaches can increase spatial locality of similarly accessed data,
e.g., return addresses.
8 RELATED WORK
To this day, a variety of defense mechanisms has been proposed
to secure return addresses against exploitation through runtime
attacks. In the following, we briefly present defenses related to dual
stack schemes and discuss their differences to our work.
Stack canaries [13] are randomized values that are placed be-
tween the local variables and the return address of a stack frame
and checked for their correctness before function returns. Other
designs refrain from adding canary values, but check the integrity
of the saved frame pointer [3, 40] instead. Stack canaries do not
hold up against our threat model, as they only prohibit contiguous
writes, but leave the stack unprotected against arbitrary writes.
Shadow stack schemes provide backward-edge CFI by maintain-
ing copies of return addresses in a separate memory region—the
shadow stack—and using these copies to verify the integrity of
function returns. The different types of shadow stack schemes en-
sure the integrity of return addresses by either checking that the
addresses on the regular stack match the stored copies [10, 18,
23, 28, 31, 36, 41, 51], or by overwriting the return addresses on
the regular stack with the copies before issuing function returns
[1, 15, 16]. Shadow stacks only partly satisfy our threat model, as
they detect both contiguous and arbitrary writes onto return ad-
dresses, but do not provide countermeasures against information
disclosure attacks [11]. If anything, shadow stacks are typically
stored at well-known memory locations (such as the data segment,
within the TLS, or at a fixed offset from the regular stack), enabling
an attacker to overwrite the copies of return addresses. Only
some recent hardware-assisted schemes deploy isolated monitoring
processes, which inherently protect their shadow stacks from
information disclosure but also induce non-negligible performance
overheads [27, 34].
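The two shadow stack variants, and their shared dependence on a hidden shadow stack location, can be sketched as follows. This is a simplified Python model under stated assumptions, not the implementation of any cited scheme; the Machine class, the ShadowStackViolation exception, and the corrupted_return helper are hypothetical names.

```python
class ShadowStackViolation(Exception):
    """Raised when a return address does not match its shadow copy."""


class Machine:
    """Toy model: the regular stack is attacker-writable; the shadow stack
    is only as safe as its location is secret."""

    def __init__(self):
        self.stack = []   # regular stack holding return addresses
        self.shadow = []  # shadow stack holding the copies

    def call(self, return_address):
        self.stack.append(return_address)
        self.shadow.append(return_address)

    def ret_checked(self):
        """Variant 1: compare the stack value against the copy."""
        address, copy = self.stack.pop(), self.shadow.pop()
        if address != copy:
            raise ShadowStackViolation(hex(address))
        return address

    def ret_overwritten(self):
        """Variant 2: ignore the stack value and return to the copy."""
        self.stack.pop()
        return self.shadow.pop()


def corrupted_return(variant, leak_shadow=False, target=0xBAD):
    """Run one call/return pair with a corrupted return address."""
    machine = Machine()
    machine.call(0x1000)
    machine.stack[-1] = target       # attacker overwrites the return address
    if leak_shadow:
        machine.shadow[-1] = target  # attacker also located the shadow stack
    return getattr(machine, variant)()
```

As long as the shadow stack stays untouched, the checking variant raises an exception and the overwriting variant silently returns to 0x1000. Once leak_shadow is set, both variants return to the attacker-chosen target.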
Other defense solutions provide backward-edge CFI based on
heuristic schemes that either detect gadget chains based on their
characteristics [9, 53] or restrict function returns to call-preceded
instructions [39, 48, 50, 52], active call sites [17], or white-listed code
locations [23]. However, heuristic-based defenses do not hold up
against our threat model, as deviations from the intended CFG are
permitted within the scope of their respective heuristic [7, 24, 25].
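A minimal sketch of the call-preceded heuristic shows why such deviations remain possible. The code layout, the fixed instruction size, and both helper functions below are hypothetical and chosen purely for illustration; they do not reproduce any of the cited systems.

```python
# Hypothetical program layout: address -> instruction.
CODE = {
    0x100: "call f",  # call site in the caller of f
    0x104: "mov",     # call-preceded: the legitimate return target of f
    0x200: "call g",  # unrelated call site elsewhere in the program
    0x204: "add",     # call-preceded, but never a valid return target of f
    0x300: "pop",     # not call-preceded
}
INSTRUCTION_SIZE = 4  # fixed-width instructions, assumed for simplicity


def is_call_preceded(address):
    """Heuristic policy: any instruction following a call is acceptable."""
    previous = CODE.get(address - INSTRUCTION_SIZE, "")
    return previous.startswith("call")


def precise_cfg_allows(address, callee):
    """Precise policy: the preceding call must target the returning function."""
    return CODE.get(address - INSTRUCTION_SIZE) == f"call {callee}"
```

In this model, a return from f to address 0x204 passes the heuristic check although the precise CFG forbids it, which is the kind of slack the cited attacks exploit.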
Dual stack schemes have been proposed even before SafeStack
and AG-Stack. SplitStack [49], ASR [4], and XFI [19] (with possible
hardware support [6]) separate stack objects onto two independent
stacks by applying different methods to differentiate between safe
and unsafe objects. Similar approaches apply source code trans-
formations [14, 45] to move unsafe objects to the heap instead of
storing them on the stack. Due to their similarities with SafeStack,
these dual stack solutions suffer from the same weaknesses against
information disclosure attacks.
True hardware-based shadow stack schemes provide superior
resilience under our threat model, as the shadow stacks are
completely out of the attacker's reach, either by cryptographically securing
them through hardware measures [12] or by placing them in separate
hardware caches [33, 38]. However, these solutions require custom
hardware, rendering their application in off-the-shelf systems
infeasible. Outside of the academic research community, the need for
backward-edge CFI recently led Intel and ARM to integrate practical
solutions into their next generation processors. Intel CET [29]
introduces new CALL and RET instructions to maintain a shadow
stack through a dedicated register and provides a new page table
protection policy to secure the shadow stack in user-space mem-
ory. The ARMv8.3 64-bit ISA introduces new instructions that add
cryptographic authentication codes [42] to pointers (including
return addresses) stored in memory. Both solutions provide superior
resilience under our threat model but require the purchase of new
hardware that will not be available in the near future.
In summary, in contrast to our presented dual stack design,
none of the related approaches is able to provide a readily and
widely usable, leak-resilient and secure solution for backward-edge
CFI that holds up against a strong threat model.
9 CONCLUSION
In this paper, we presented a leak-resilient dual stack scheme capa-
ble of protecting return addresses even in the presence of sophisti-
cated information disclosure attacks, including leveraging leaked
pointers in libraries, probing in a brute-force manner, and utilizing
memory allocation oracles. To achieve this goal, we studied the
vulnerabilities of previous dual stack designs and developed a novel
approach for stack separation. Our approach minimizes the size of
the safe stack, carefully avoids spills of pointers to the safe stack,
and optimizes the random placement of the stack in memory. Our
approach is highly flexible and can be used to create completely
protected runtime environments that secure programs and libraries
alike, but also to create aware environments that are able to exe-
cute protected and unprotected programs simultaneously without
compromising the security of protected programs in any way. Our
implementation using the LLVM compiler framework shows that
our design is practical and highly efficient, on average causing no
measurable performance overhead on ARM64 (i.e., 0.0%) and only
a negligible overhead of 2.7% on x86-64.
REFERENCES
[1] Martín Abadi, Mihai Budiu, Úlfar Erlingsson, and Jay Ligatti. 2005. Control-Flow
Integrity: Principles, Implementations, and Applications. In CCS. ACM.
[2] ARM Ltd. 2017. ARM Architecture Reference Manual ARMv8.
https://developer.arm.com/docs/ddi0487/latest. (2017). Version B.b.
[3] Arash Baratloo, Navjot Singh, and Timothy Tsai. 2000. Transparent Run-time
Defense against Stack Smashing Attacks. In USENIX ATC.
[4] Sandeep Bhatkar, R. Sekar, and Daniel C. DuVarney. 2005. Efficient Techniques
for Comprehensive Protection from Memory Error Exploits. In USENIX Security.
[5] Andrea Bittau, Adam Belay, Ali Mashtizadeh, David Mazières, and Dan Boneh.
2014. Hacking Blind. In S&P. IEEE.
[6] Mihai Budiu, Úlfar Erlingsson, and Martín Abadi. 2006. Architectural Support
for Software-based Protection. In ASID. ACM.
[7] Nicolas Carlini, Antonio Barresi, Mathias Payer, David Wagner, and Thomas R.
Gross. 2015. Control-flow Bending: On the Effectiveness of Control-flow Integrity.
In USENIX Security.
[8] Stephen Checkoway, Lucas Davi, Alexandra Dmitrienko, Ahmad-Reza Sadeghi,
Hovav Shacham, and Marcel Winandy. 2010. Return-Oriented Programming
Without Returns. In CCS. ACM.
[9] Yueqiang Cheng, Zongwei Zhou, Miao Yu, Xuhua Ding, and Robert H. Deng.
2014. ROPecker: A Generic and Practical Approach for Defending Against ROP
Attacks. In NDSS.
[10] Tzi-cker Chiueh and Fu-Hau Hsu. 2001. RAD: A Compile-time Solution to Buffer
Overflow Attacks. In ICDCS.
[11] Mauro Conti, Stephen Crane, Lucas Davi, Michael Franz, Per Larsen, Marco
Negro, Christopher Liebchen, Mohaned Qunaibit, and Ahmad-Reza Sadeghi.
2015. Losing Control: On the Effectiveness of Control-Flow Integrity Under Stack
Attacks. In CCS. ACM.
[12] Marc L. Corliss, E. Christopher Lewis, and Amir Roth. 2005. Using DISE to Protect
Return Addresses from Attack. ACM SIGARCH Computer Architecture News 33, 1
(2005).
[13] Crispan Cowan, Calton Pu, Dave Maier, Jonathan Walpole, Peat Bakke, Steve
Beattie, Aaron Grier, Perry Wagle, and Qian Zhang. 1998. StackGuard: Auto-
matic Adaptive Detection and Prevention of Buffer-Overflow Attacks. In USENIX
Security.
[14] Christopher Dahn and Spiros Mancoridis. 2003. Using Program Transformation
to Secure C Programs against Buffer Overflows. In WCRE. IEEE.
[15] Thurston H.Y. Dang, Petros Maniatis, and David Wagner. 2015. The Performance
Cost of Shadow Stacks and Stack Canaries. In ASIACCS. ACM.
[16] Lucas Davi, Alexandra Dmitrienko, Manuel Egele, Thomas Fischer, Thorsten
Holz, Ralf Hund, Stefan Nürnberger, and Ahmad-Reza Sadeghi. 2012. MoCFI: A
Framework to Mitigate Control-Flow Attacks on Smartphones. In NDSS.
[17] Lucas Davi, Matthias Hanreich, Debayan Paul, Ahmad-Reza Sadeghi, Patrick
Koeberl, Dean Sullivan, Orlando Arias, and Yier Jin. 2015. HAFIX: Hardware-
Assisted Flow Integrity Extension. In DAC. ACM.
[18] Lucas Davi, Ahmad-Reza Sadeghi, and Marcel Winandy. 2011. ROPdefender:
A Detection Tool to Defend Against Return-oriented Programming Attacks. In
ASIACCS. ACM.
[19] Úlfar Erlingsson, Martín Abadi, Michael Vrable, Mihai Budiu, and George C.
Necula. 2006. XFI: Software Guards for System Address Spaces. In USENIX OSDI.
[20] Isaac Evans, Samuel Fingeret, Julian Gonzalez, Ulziibayar Otgonbaatar, Tiffany
Tang, Howard Shrobe, Stelios Sidiroglou-Douskos, Martin Rinard, and Hamed
Okhravi. 2015. Missing the Point(er): On the Effectiveness of Code Pointer
Integrity. In S&P. IEEE.
[21] LLVM Foundation. [n. d.]. SafeStack. https://clang.llvm.org/docs/SafeStack.html.
([n. d.]).
[22] Robert Gawlik, Benjamin Kollenda, Philipp Koppe, Behrad Garmany, and
Thorsten Holz. 2016. Enabling Client-side Crash-resistance to Overcome Diversi-
fication and Information Hiding. In NDSS.
[23] Xinyang Ge, Weidong Cui, and Trent Jaeger. 2017. GRIFFIN: Guarding Control
Flows Using Intel Processor Trace. In ASPLOS. ACM.
[24] Enes Göktaş, Elias Athanasopoulos, Herbert Bos, and Georgios Portokalidis. 2014.
Out of Control: Overcoming Control-Flow Integrity. In S&P. IEEE.
[25] Enes Göktaş, Elias Athanasopoulos, Michalis Polychronakis, Herbert Bos, and
Georgios Portokalidis. 2014. Size Does Matter: Why Using Gadget-Chain Length
to Prevent Code-Reuse Attacks is Hard. In USENIX Security.
[26] Enes Göktaş, Robert Gawlik, Benjamin Kollenda, Elias Athanasopoulos, Georgios
Portokalidis, Cristiano Giuffrida, and Herbert Bos. 2016. Undermining Informa-
tion Hiding (And What to do About it). In USENIX Security.
[27] Yufei Gu, Qingchuan Zhao, Yinqian Zhang, and Zhiqiang Lin. 2017. PT-CFI: Trans-
parent Backward-edge Control Flow Violation Detection Using Intel Processor
Trace. In CODASPY. ACM.
[28] Suhas Gupta, Pranay Pratap, Huzur Saran, and S. Arun-Kumar. 2006. Dynamic
Code Instrumentation to Detect and Recover from Return Address Corruption.
In WODA. ACM.
[29] Intel Corp. 2017. Control-flow Enforcement Technology Preview.
https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf. (2017).
[30] Intel Corp. 2017. Intel 64 and IA-32 Architectures Software Developer’s Manual.
https://software.intel.com/en-us/articles/intel-sdm. (2017).
[31] Mehmet Kayaalp, Meltem Ozsoy, Nael Abu-Ghazaleh, and Dmitry Ponomarev.
2012. Branch Regulation: Low-Overhead Protection from Code Reuse Attacks.
In ISCA. IEEE.
[32] Volodymyr Kuznetsov, László Szekeres, Mathias Payer, George Candea, R. Sekar,
and Dawn Song. 2014. Code-Pointer Integrity. In USENIX OSDI.
[33] Ruby B. Lee, David K. Karig, John P. McGregor, and Zhijie Shi. 2004. Enlist-
ing Hardware Architecture to Thwart Malicious Code Injection. In Security in
Pervasive Computing. Springer.
[34] Yutao Liu, Peitao Shi, Xinran Wang, Haibo Chen, Binyu Zang, and Haibing Guan.
2017. Transparent and Efficient CFI Enforcement with Intel Processor Trace. In
HPCA. IEEE.
[35] Kangjie Lu, Chengyu Song, Byoungyoung Lee, Simon P. Chung, Taesoo Kim, and
Wenke Lee. 2015. ASLR-Guard: Stopping Address Space Leakage for Code Reuse
Attacks. In CCS. ACM.
[36] Danny Nebenzahl, Mooly Sagiv, and Avishai Wool. 2006. Install-time Vaccina-
tion of Windows Executables to Defend against Stack Smashing Attacks. IEEE
Transactions on Dependable and Secure Computing 3, 1 (2006).
[37] Angelos Oikonomopoulos, Elias Athanasopoulos, Herbert Bos, and Cristiano
Giuffrida. 2016. Poking Holes in Information Hiding. In USENIX Security.
[38] Hilmi Özdoganoglu, T.N. Vijaykumar, Carla E. Brodley, Benjamin A. Kuperman,
and Ankit Jalote. 2006. SmashGuard: A Hardware Solution to Prevent Security
Attacks on the Function Return Address. IEEE Trans. Comput. 55, 10 (2006).
[39] Vasilis Pappas, Michalis Polychronakis, and Angelos D. Keromytis. 2013.
Transparent ROP Exploit Mitigation Using Indirect Branch Tracing. In USENIX Security.
[40] Seon-Ho Park, Young-Ju Han, Soon jwa Hong, Hyoung-Chun Kim, and Tai-
Myoung Chung. 2007. The Dynamic Buffer Overflow Detection and Prevention
Tool for Windows Executables Using Binary Rewriting. In ICACT. IEEE.
[41] Manish Prasad and Tzi-cker Chiueh. 2003. A Binary Rewriting Defense Against
Stack-based Buffer Overflow Attacks. In USENIX ATC.
[42] Qualcomm Technologies, Inc. 2017. Pointer Authentication on ARMv8.3.
https://www.qualcomm.com/media/documents/files/whitepaper-pointer-authentication-on-armv8-3.pdf. (2017).
[43] Jeff Seibert, Hamed Okhravi, and Eric Söderström. 2014. Information Leaks
Without Memory Disclosures: Remote Side Channel Attacks on Diversified Code.
In CCS. ACM.
[44] Hovav Shacham. 2007. The Geometry of Innocent Flesh on the Bone: Return-
into-libc without Function Calls (on the x86). In CCS. ACM.
[45] Stelios Sidiroglou, Giannis Giovanidis, and Angelos D. Keromytis. 2005. A Dy-
namic Mechanism for Recovering from Buffer Overflow Attacks. In ISC. Springer.
[46] László Szekeres, Mathias Payer, Tao Wei, and Dawn Song. 2013. SoK: Eternal
War in Memory. In S&P. IEEE.
[47] Caroline Tice, Tom Roeder, Peter Collingbourne, Stephen Checkoway, Úlfar
Erlingsson, Luis Lozano, and Geoff Pike. 2014. Enforcing Forward-Edge Control-
Flow Integrity in GCC & LLVM. In USENIX Security.
[48] Yubin Xia, Yutao Liu, Haibo Chen, and Binyu Zang. 2012. CFIMon: Detecting
Violation of Control Flow Integrity Using Performance Counters. In DSN. IEEE.
[49] Jun Xu, Zbigniew Kalbarczyk, Sanjay Patel, and Ravishankar K. Iyer. 2002. Ar-
chitecture Support for Defending Against Buffer Overflow Attacks. In EASY.
[50] Chao Zhang, Tao Wei, Zhaofeng Chen, Lei Duan, Laszlo Szekeres, Stephen Mc-
Camant, Dawn Song, and Wei Zou. 2013. Practical Control Flow Integrity and
Randomization for Binary Executables. In S&P. IEEE.
[51] Mingwei Zhang, Rui Qiao, Niranjan Hasabnis, and R. Sekar. 2014. A Platform for
Secure Static Binary Instrumentation. In VEE.
[52] Mingwei Zhang and R. Sekar. 2013. Control Flow Integrity for COTS Binaries. In
USENIX Security.
[53] Hong Wei Zhou, Xin Wu, Wen Chang Shi, Jin Hui Yuan, and Bin Liang. 2014.
HDROP: Detecting ROP Attacks Using Performance Monitoring Counters. In
ISPEC. Springer.
