Binary Compatibility For SGX Enclaves by Shinde, Shweta et al.
Binary Compatibility For SGX Enclaves
Shweta Shinde†
UC Berkeley
Jinhua Cui
National University of Singapore &
National University of Defense Technology
Satyaki Sen Pinghai Yuan Prateek Saxena
National University of Singapore
Abstract—Enclaves, such as those enabled by Intel SGX,
offer a powerful hardware isolation primitive for application
partitioning. To become universally usable on future commodity
OSes, enclave designs should offer compatibility with existing
software. In this paper, we draw attention to 5 design decisions
in SGX that create incompatibility with existing software. These
represent concrete starting points, we hope, for improvements
in future TEEs. Further, while many prior works have offered
partial forms of compatibility, we present the first attempt to offer
binary compatibility with existing software on SGX. We present
RATEL, a system that enables a dynamic binary translation engine
inside SGX enclaves on Linux. Through the lens of RATEL,
we expose the fundamental trade-offs between performance and
complete mediation on the OS-enclave interface, which are rooted
in the aforementioned 5 SGX design restrictions. We report on an
extensive evaluation of RATEL on over 200 programs, including
micro-benchmarks and real applications such as Linux utilities.
I. INTRODUCTION
Commercial processors today have native support for trusted
execution environments (TEEs) to run user-level applications
in isolation from other software on the system. A prime
example of such a TEE is Intel Software Guard eXtensions
(SGX) [47]. The hardware-isolated environment created by
SGX, commonly referred to as an enclave, runs a user-level
application without trusting privileged software.
Enclaved TEEs offer a powerful new foundation for com-
partmentalization on commodity OSes. Enclaves remove priv-
ileged software layers (OS or hypervisor) from the trusted
code base of isolated components. Therefore, they crucially
differ from existing isolation primitives like processes, virtual
machines, and containers. Enclaves offer the intriguing possi-
bility of becoming ubiquitously used abstractions in the future,
much like processes, but this demands a scale of usage not
originally envisioned with SGX. Enclaves would need to be
compatible with a large fraction of the existing software and
OS abstractions. Arguably, compatibility is the most important
challenge facing future enclaved TEEs. One would only be
concerned with additional enclave security threats if they could
run the desired application in the first place.
Right from its inception, compatibility with existing
x86_64 binaries has been a recognized issue with SGX [22].
Prior works have proposed a number of different ways of
achieving partial compatibility: offering specific programming
languages for authoring enclave code [4], [54], keeping
compatibility with container interfaces [3], [35], or conformance
to specific versions of library interfaces provided by
library OSes [6], [17], [58], [61], [64].
†Part of the research was done while at National University of Singapore.
While these approaches are promising and steadily matur-
ing, none of them offer binary compatibility with existing soft-
ware. In existing approaches, applications are expected to be
relinked against specific versions of libraries (e.g., musl, libc,
glibc), ported to a customized OS, or containerized. Such
modifications require significant changes to the complex build
systems in place, often demanding developer involvement and
even access to source code. More importantly, most prior
works have enabled sufficient SGX compatibility to handle
specific applications [38], [52], [53], standard libraries, or
language runtimes [20], [30]–[32], [72] that these platforms
choose. Trade-offs arising between compatibility, security, and
performance are often resolved in the favor of performance in
prior designs. Thus, the complete picture of these fundamental
trade-offs has never been presented.
The purpose of this paper is two-fold. First, we believe that
future designs of enclave TEEs would benefit from under-
standing which design choices made in SGX create binary
incompatibility. In particular, we pinpoint 5 specific SGX
restrictions and explain how they create a sweeping incom-
patibility with existing OS abstractions of multi-threading,
memory mapping, synchronization, signal-handling, shared
memory, and others. These challenges affect other compati-
bility approaches too. However, our emphasis on full binary
compatibility brings them out more comprehensively.
Second, we study the feasibility of a new approach that
can offer binary compatibility for unmodified applications in
SGX enclaves. Our approach enables interposition of enclave
applications via dynamic binary translation (DBT). DBT is
a mature technique for cross-platform binary compatibility
available even before SGX [11]. It works by instrumenting
machine instructions on-the-fly to provide a layer of trans-
parency to the underlying system. This is sometimes referred
to as application-level virtualization.
We report on our experience of running a widely used DBT
framework called DynamoRIO inside SGX enclaves [14]. The
resulting system called RATEL enables DBT on Intel SGX
enclaves for unmodified x86_64 Linux binaries. RATEL aims
to adhere to the enclave threat model: it does
not trust the OS in its design. The challenge of enabling
a full-fledged DBT engine is instructive in exposing a set of
fundamental trade-offs on SGX: one has to choose “2 out
of 3” between security, binary compatibility, and performance.
RATEL chooses to resolve these in favor of security and binary
compatibility.
arXiv:2009.01144v1 [cs.CR] 2 Sep 2020
The main challenge addressed by RATEL is that it offers
secure and complete mediation on all data and control flow
between the OS and the application. Our main finding is that
the problem of complete mediation suffers from the “last mile”
phenomenon: We pay modestly in performance to get to partial
mediation, as seen in many prior works, but significantly for
the full binary compatibility on SGX.
As a standalone tool, RATEL offers its own conceptual
utility. It offers complete mediation on all application-OS
interactions, which is useful for security interposition. This
sidesteps the challenges of static source-code-based solutions,
which expect changes ahead of time, and can work for dynami-
cally generated or self-modifying code. Further, DBT provides
the facility of instruction-level instrumentation. This can be
useful in many ways: inlining security monitors, sandboxing,
fine-grained resource accounting, debugging, or deployment
of third-party patches in response to newly discovered flaws
(though these are not our focus) [12], [13], [25], [39], [44].
Contributions & Results. RATEL is the first system that targets
binary compatibility for SGX, to the best of our knowledge.
Our proposed design enables an industrial-strength dynamic
binary translation engine inside SGX enclaves. We evaluate
compatibility offered by RATEL extensively. We successfully
run a total of 203 unique unmodified binaries across 5
benchmark suites (58 binaries), 4 real-world application use-
cases (12 binaries), and 133 Linux utilities. These encompass
various work-load profiles including CPU-intensive (SPEC
2006), I/O system call intensive (FSCQ, IOZone), system
stress-testing (HBenchOS), multi-threading support (Parsec-
SPLASH2), a machine learning library (Torch), and real-world
applications demonstrated in prior works on SGX. RATEL
offers compatibility but does not force applications to use any
specific libraries or higher-level interfaces.
Our work pin-points 5 specific design choices in SGX that
are responsible for incompatibility. We believe these create
challenges for prior approaches providing partial compatibility
as well. We hope future enclave TEE designs pay attention to
addressing these 5 points from the outset.
II. WHY IS BINARY COMPATIBILITY CHALLENGING?
Intel SGX allows execution of code inside a hardware-
isolated environment called an enclave [22].1 SGX enforces
confidentiality and integrity of enclave-bound code and data.
All enclave memory is private and only accessible when
executing in enclave-mode. Data exchanged with the external
world (e.g., the host application or OS) must reside in public
memory which is not protected. At runtime, one can only
synchronously enter an enclave via ECALLs and exit an enclave
via OCALLs. Any illegal instructions or exceptions in the
enclave create asynchronous entry-exit points. SGX restricts
enclave entry and exits to pre-specified points in the program.
If the enclave execution is interrupted asynchronously, SGX
saves the enclave context and resumes it at the same program
point at a later time. Our challenge is to interpose securely
and completely on all the control and data transfers between
the enclave and the OS.
1 Unless stated otherwise, we use the term Intel SGX v1 to refer to the
hardware as well as the trusted platform software (PSW) and the trusted
software development kit (SDK), as shown in Figure 2.
OS abstraction                         Restrictions affecting abstraction
System call arguments                  R1
Dynamically loaded / generated code    R2
Thread support                         R5, R2
Signal handling                        R1, R5
Thread synchronization                 R3, R1
File / memory mapping                  R1, R2, R3, R4
IPC / shared memory                    R3, R4
TABLE I: Ramifications of SGX design restrictions on common
OS abstractions.
A. Restrictions Imposed by SGX Design
Intel SGX protects the enclave by enforcing strict isolation
at several points of interactions between the OS and the user
code. We outline 5 SGX design restrictions.
R1. Spatial memory partitioning. SGX enforces spatial
memory partitioning. It reserves a region that is private to
the enclave and the rest of the virtual memory is public.
Memory can either be public or private, not both.
R2. Static memory partitioning. The enclave has to specify
the spatial partitioning statically. The size, type (e.g.,
code, data, stack, heap), and permissions for its private
memory have to be specified before creation and these
attributes cannot be changed at runtime.
R3. Non-shareable private memory. An enclave cannot
share its private memory with other enclaves on the same
machine.
R4. 1-to-1 private virtual memory mappings. Private mem-
ory spans over a contiguous virtual address range, whose
start address is decided by the OS. Each private virtual address
has a one-to-one mapping with a physical address.
R5. Fixed entry points. An enclave can resume execution only
from its last point and context of exit. Any other entry
points/contexts have to be statically pre-specified as valid
ahead of time.
B. Ramifications
Next, we explain the impact of these design restrictions on
various OS and application functionality (see Table I).
R1. Since SGX spatially partitions the enclave memory, any
data which is exchanged with the OS requires copying between
private and public memory. In normal applications, an OS
assumes that it can access all the memory of a user process, but
this is no longer true for enclaves. Any arguments that point to
enclave private memory are not accessible to the OS or the host
process. The enclave has to explicitly manage a public and a
private copy of the data to make it accessible externally and to
shield it from unwanted modification when necessary. We refer
to this as a two-copy mechanism. Thus, R1 breaks functional-
ity (e.g., system calls, signal handling, futex), introduces non-
transparency (e.g., explicitly synchronizing both copies), and
introduces security gaps (e.g., TOCTOU attacks [18], [29]).
R2. Applications often require changes to the size or permis-
sions of enclave memory. For example, memory permissions
change after dynamic loading of libraries (e.g., dlopen)
or files (e.g., mmap), executing dynamically generated code,
creating read-only zero-ed data segments (e.g., .bss), and
for software-based isolation of security-sensitive data. The
restriction R2 is incompatible with such functionality. To
work with this restriction, applications require careful semantic
changes: either weaken the protection (e.g., read-and-execute
instead of read-or-execute), use a two-copy design, or rely on
some additional form of isolation (e.g., using segmentation or
software instrumentation).
R3. SGX has no mechanism to allow two enclaves to share
parts of their private memory directly. This restriction is in-
compatible with the synchronization primitives like locks and
shared memory when there is no trusted OS synchronization
service. Keeping two copies of locks breaks the semantics
and creates a chicken-and-egg issue: how to synchronize two
copies of a shared lock without another trusted synchronization
primitive.
R4. When applications demand new virtual address mappings
(e.g., malloc) the OS adds these mappings. The application
can ask the OS to map the same physical page at several
different offsets—either with same or different permissions.
For example, the same file is mapped as read-only at two
places in the program. Since SGX does not allow the same
physical address to be mapped to multiple virtual addresses,
any such mappings generate a general protection fault in SGX.
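To make the R4 scenario concrete, the following Python sketch shows the ordinary (non-enclave) OS behavior that an unmodified binary may rely on: one file mapped at two virtual addresses, with a store through one view becoming visible through the other. It only illustrates the aliasing that SGX private pages do not permit; the helper name map_file_twice is hypothetical and is not part of RATEL.

```python
import mmap
import os
import tempfile


def map_file_twice(payload: bytes):
    """Map the same file pages at two virtual addresses (outside any enclave)."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        # Two shared mappings backed by the same file pages.
        writer = mmap.mmap(fd, len(payload), access=mmap.ACCESS_WRITE)
        reader = mmap.mmap(fd, len(payload), access=mmap.ACCESS_READ)
        try:
            before = bytes(reader[:])
            writer[0:1] = b"X"         # store through the first mapping
            after = bytes(reader[:])   # load through the second mapping
        finally:
            writer.close()
            reader.close()
        return before, after
    finally:
        os.close(fd)
        os.unlink(path)
```

Under R4, an enclave cannot alias a private page this way, so a compatibility layer has to emulate the second view (for example, with a separate copy).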
R5. SGX starts or resumes enclave execution only from con-
trolled entry points i.e., the virtual address and the execution
context. However, there are several unexpected entry points to
an application when we run them unmodified in an enclave
(e.g., exception handlers, library functions, illegal instruc-
tions). Statically determining all potential program points for
re-entry is challenging. Further, when the enclave resumes
execution, it expects the same program context. This does
not match typical program functionality. Normally, if
the program wants to execute custom error handling code,
say after a divide-by-zero (SIGFPE) or illegal instruction
(SIGILL), it can resume execution at a handler function in the
binary with appropriate execution context setup by the OS.
On the contrary, SGX will resume enclave execution at the
same instruction and same context (not the OS setup context
for exception handling), thus re-triggering the exception.
Intel is shipping SGX v2, wherein an enclave can make
dynamic changes to private page permissions, type, and size.
We discuss their specifics in Section VII. Note that SGX v2
only addresses R2 partially, while all the other restrictions
still hold true. Thus, for the rest of the paper, we describe our
design based on SGX v1.
III. OVERVIEW
Before we present our design, we emphasize our key empir-
ical takeaway that led to it: Working with restrictions R1−R5,
we are faced with a “choose 2-out-of-3” trilemma between
security, performance, and binary compatibility. We explain
these trade-offs in Section III-B. Our design picks security
and compatibility over performance, wherever necessary. In
this design principle, it fundamentally departs from prior work.
Several different approaches to enable applications in SGX
enclaves have been proposed. In nearly all prior works, performance
considerations dominate design decisions. A prominent
way to side-step the performance costs of ensuring compatibility
and complete mediation is to ask the application to use a
prescribed program-level interface or API. The choice of interface
varies: options include specific programming languages [20],
[28], [30], [73], application frameworks [40], container in-
terfaces [3], and particular implementation of standard libc
interfaces. Figure 1 shows the prescribed interfaces in three
approaches, including library OSes and container engines, and
where they intercept the application to maintain compatibility.
Given that binary compatibility is not the objective of prior
works, they handle subsets of R1 − R5. One drawback of
these approaches is that if an application does not originally
use the prescribed API, the application needs to be rewritten,
recompiled from source, or relinked against specific libraries.
Our work poses the following question: Can full binary
compatibility be achieved on SGX, and if so, with what
trade-offs in security or performance? Application binaries are
originally created with the intention of running on a particular
OS and we aim to retain compatibility with the OS system
call interface (e.g., Linux). In concept, applications are free to
use any library, direct assembly code, and runtime that uses
the Linux system call interface. The central challenge we face
is to enable secure and complete mediation of all data and
control flow between the application and the OS. We do this
by enabling a widely used DBT engine inside enclaves.
A. Background: Dynamic Binary Translation
Dynamic binary translation is a well-known approach to
achieving full binary compatibility. It was designed to provide secure
and complete mediation: the ability to intercept each instruc-
tion in the program before it executes. DBT works by first
loading the binary code that is about to be executed into its
own custom execution engine. It then updates the code in-situ,
if required, and then dispatches it for execution. To contrast
it with the approach of changing libc, DBT intercepts the
application right at the point at which it interacts with the OS
(see Figure 1).
In this paper, we choose DynamoRIO as our DBT engine,
since it is open-source and widely used in industry [11].2
DynamoRIO is itself an example of a just-in-time compilation
engine, which dynamically generates code. At a high level,
DynamoRIO first loads itself and then loads the application
code in a separate part of the virtual address space, as shown in
Figure 2. Similarly, it sets up two different contexts, one for
itself and one for the application. DynamoRIO can update
the code on-the-fly before putting it in the code cache by
re-writing instructions (e.g., converting a syscall instruction to a
library function call). Such rewriting ensures that the DynamoRIO
engine takes control before each block of code executes,
enabling it to interpose on every instruction. Instrumented
code blocks are placed in a region of memory called
a code cache. When the code cache executes, DynamoRIO
regains control as the instrumentation logic desires. It does
post-execution updates to itself for book-keeping or to the
program’s state. Additionally, DynamoRIO hooks on all events
intended for the process (e.g., signals). The application itself
is prevented from accessing DynamoRIO memory via
address-space isolation techniques [39]. Thus, DynamoRIO acts as an
arbiter between the application’s binary code and the external
environment (e.g., OS, filesystem) with secure and complete
mediation.
2 The other option is Intel Pin [36], but it is not open-source.
Fig. 1: Different abstraction layers for compatibility. Black
shaded regions are untrusted, gray shaded regions are modifications
or additions, thick solid lines are enclave boundaries,
clear boxes are unmodified components, zig-zag lines
show break in compatibility. (a) Container abstraction with
musl libc interface (Scone [3]). (b) Library OS with glibc
interface (Graphene-SGX [17]). (c) Process abstraction with
POSIX interface (Panoply [64]). (d) Dynamic Binary Translation
with DynamoRIO in RATEL (this work).
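The translate-then-dispatch loop described in this subsection can be sketched in a few lines of Python. This is a toy model under loud assumptions: the instruction set, the ToyDBT class, and the call_handler pseudo-instruction are all hypothetical, and none of this is DynamoRIO's actual API or internal representation. It only shows the idea of rewriting blocks once, caching them, and regaining control at block boundaries.

```python
from typing import Any, Callable, Dict, List, Tuple

Instr = Tuple[str, Any]                      # (opcode, operand) in a made-up toy ISA
ILLEGAL_IN_ENCLAVE = {"syscall", "cpuid", "rdtsc"}


class ToyDBT:
    def __init__(self, program: Dict[int, List[Instr]],
                 handler: Callable[[str, Any], int]):
        self.program = program               # block address -> original instructions
        self.handler = handler               # stand-in for the engine's call-out stub
        self.code_cache: Dict[int, List[Instr]] = {}
        self.translations = 0

    def translate(self, addr: int) -> List[Instr]:
        """Return the cached translation of a block, creating it on first use."""
        if addr not in self.code_cache:
            self.translations += 1
            self.code_cache[addr] = [
                ("call_handler", (op, arg)) if op in ILLEGAL_IN_ENCLAVE else (op, arg)
                for op, arg in self.program[addr]
            ]
        return self.code_cache[addr]

    def run(self, entry: int) -> int:
        acc, addr = 0, entry
        while addr is not None:
            next_addr = None
            for op, arg in self.translate(addr):   # engine regains control per block
                if op == "add":
                    acc += arg
                elif op == "call_handler":
                    acc += self.handler(*arg)      # rewritten illegal instruction
                elif op == "jmp":
                    next_addr = arg
                    break
                elif op == "halt":
                    break
                else:
                    raise RuntimeError(f"untranslated instruction: {op}")
            addr = next_addr
        return acc
```

Running the same program twice translates each block only once; the second run is served from the code cache, which is where a real engine recovers much of its performance.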
RATEL retains the entire low-level instruction translation
and introspection machinery of DynamoRIO, including the
code cache and its performance optimizations. This en-
ables reusing well-established techniques for application trans-
parency, instrumentation, and performance enhancements. We
eliminate the support for auxiliary plugins to reduce TCB.
B. RATEL Approach
Our design must provide compatibility for both the Dy-
namoRIO DBT engine as well as any application binary code
that runs translated. We provide a high-level overview of
RATEL and then explain the key trade-offs we face when
forced to comply with SGX restrictions R1−R5.
High-level Overview. We modify DynamoRIO to adhere to
SGX virtual memory limitations (R1-R4) by setting up our
custom layout. Specifically, we analyze DynamoRIO code to
identify its entire virtual address layout. This allows us to load
RATEL and start its execution without violating the memory
semantics of SGX. We register a fixed entry point in RATEL
when entering or resuming the enclave. This entry point acts
as a unified trampoline, such that upon entry, RATEL decides
where to redirect the control flow, depending on the previously
saved context. In DynamoRIO code, we manually replace
instructions that are illegal in SGX with an external call that
executes outside the enclave. Thus, RATEL execution itself is
guaranteed to never violate R5.
Fig. 2: RATEL overview. (Figure labels: Enclave, DBT Engine,
Loader, Code Dispatcher, Code Cache, Signal Handler, Memory
Manager, Thread Manager, Lock Manager, VA Layout with code,
data, stack, and heap for {DR} and {App, asm, libc, …},
SDK, PSW, Operating System.)
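The unified trampoline can be pictured as a single dispatcher that consults a privately saved context on every entry. The Python sketch below is a minimal model under stated assumptions: the SavedContext fields, the exit reasons, and the string results are hypothetical placeholders chosen for illustration, not RATEL's actual data structures.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class SavedContext:
    reason: str        # why the enclave exited, e.g. "syscall" (hypothetical label)
    resume_pc: int     # where the interrupted code should continue


class Trampoline:
    """One registered entry point; every entry or re-entry is routed from here."""

    def __init__(self) -> None:
        self._saved: Optional[SavedContext] = None     # kept in private memory
        self._routes: Dict[str, Callable[[SavedContext], str]] = {}

    def register(self, reason: str, route: Callable[[SavedContext], str]) -> None:
        self._routes[reason] = route

    def exit_enclave(self, ctx: SavedContext) -> None:
        self._saved = ctx                              # save the context before leaving

    def enter(self) -> str:
        ctx, self._saved = self._saved, None
        if ctx is None:
            return "start application"                 # first entry, nothing to resume
        if ctx.reason not in self._routes:
            raise RuntimeError(f"no route for exit reason {ctx.reason!r}")
        return self._routes[ctx.reason](ctx)
```

Because the routing decision depends only on state saved inside the enclave, the untrusted OS cannot choose the resumption point; it can only trigger the one registered entry.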
The same challenges show up when RATEL starts loading
and running the translated application binary. However, we
have the advantage of dynamically rewriting the application
logic to adapt it to R1-R5. We statically initialize the virtual
memory size of the application to the maximum allowed
by SGX; the type and permissions of memory are set to those
specified in the binary. We add a memory management
unit to DynamoRIO to keep track of and transparently update
the application's layout. At runtime, when the application
makes direct changes to its own virtual memory layout via
system calls, RATEL dynamically adapts it to SGX (e.g., by
making two copies or relocating the virtual mappings). RATEL
intercepts all application interactions with the OS. It modifies
application parameters, OS return values, and OS events to monitor
indirect changes to the memory (e.g., thread creation).
In the other direction, RATEL also intercepts OS events on
behalf of the application. Upon re-entry, if the event has to
be delivered to the application, it sets/restores the appropriate
execution context and resumes execution via the trampoline.
Lastly, before executing any application logic, RATEL scans
the code cache for any instructions (e.g., syscall, cpuid) that
may be deemed illegal in SGX and replaces them
with an external call. Thus, RATEL remedies the application
on-the-fly to adhere to R1-R5.
Key Design Trade-offs. RATEL aims for secure and complete
mediation on all data and control flow between the application
and OS, through the use of DBT. This makes RATEL useful
for a wide variety of reasons: in-lining security monitoring,
software sandboxing, and even profiling and debugging. We
do not assume that the application is written to help RATEL
by adhering to restrictions beyond those specified by a normal
OS, nor do we trust the OS. SGX restrictions R1−R5 give rise
to trade-offs between security, compatibility, and performance.
We point out that these are somewhat fundamental and apply
to RATEL and other compatibility efforts equally. However,
RATEL chooses security and compatibility over performance,
whenever conflicts between the three arise.
Whenever the application reads from or writes outside the
enclave, the data needs to be placed in public memory due to
R1. Computing on data in public memory, which is exposed
to the OS, is insecure. Therefore, if the application wishes
to securely compute on the data, a copy must necessarily be
maintained in a separate private memory space, as R2 forbids
changing data permissions. This leads to a “two
copy” mechanism, instances of which repeat throughout the
design. The two-copy mechanism, however, incurs both space
and computational performance overheads, as data has to be
relocated at runtime. Further, certain data structures whose
semantics require a single memory copy (e.g., futexes) become
impossible to support compatibly (see Section IV-D).
R3 creates an “all or none” trust model between enclaves.
Either memory is shared with all entities (including the OS)
or kept private to one enclave. R4 restricts sharing memory
within an enclave further. These restrictions are in conflict with
semantics of shared memory and synchronization primitives.
To implement such abstractions securely, the design must rely
on a trusted software manager which necessarily resides in
an enclave, since the OS is untrusted. Applications can then
have secure and compatible lock and shared memory abstractions,
but at the cost of performance: accesses to shared
services turn into procedure calls to the trusted manager.
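The cost noted above comes from routing every lock operation through a single trusted owner of the lock state. The following Python sketch illustrates that shape only; the TrustedLockManager class and its method names are hypothetical, and the sketch makes no claim about how RATEL's own lock manager is implemented.

```python
import threading


class TrustedLockManager:
    """Toy manager: lock state lives in one place and is changed only via calls."""

    def __init__(self) -> None:
        self._guard = threading.Lock()   # protects the manager's own table
        self._owners = {}                # lock id -> current owner id

    def try_acquire(self, lock_id: str, owner: str) -> bool:
        with self._guard:
            if lock_id in self._owners:
                return False             # already held; caller must retry
            self._owners[lock_id] = owner
            return True

    def release(self, lock_id: str, owner: str) -> None:
        with self._guard:
            if self._owners.get(lock_id) != owner:
                raise PermissionError("only the current owner may release")
            del self._owners[lock_id]
```

Each acquire or release that would normally be one atomic instruction on shared memory becomes a procedure call into the manager, which is the performance price the text describes.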
Restriction R5 requires that whenever the enclave resumes
control after an exit, the enclave state (or context) should be
the same as right before exit. This implies that the security
monitor (e.g., the DBT engine) must take control before all
exit points and after resumption, to save-restore contexts—
otherwise, the mediation can be incomplete, creating security
holes and incompatibility. Without guarantees of complete
mediation, the OS can return control into the enclave, bypassing
the security checks that the DBT engine implements. The
price for complete mediation on binaries is performance: the
DBT engine must intercept all entry/exit points and simulate
additional context switches in software. Prior approaches, such
as library-OSes, sacrifice complete mediation (security) for
better performance, by asking applications to link against spe-
cific library interfaces which tunnel control via certain points.
But, this does not enforce complete mediation as applications
can make direct OS interactions or override entry handlers,
intentionally or by oversight. There are several further security
considerations that arise in the details of the above design
decisions. These include (a) avoiding naïve designs that are vulnerable to
TOCTOU attacks; (b) saving and restoring the execution
context from private memory; (c) maintaining RATEL-specific
metadata in private memory to ensure integrity of memory
mappings that change at runtime; and (d) explicitly zeroing
out memory content and pointers after use. We explain them
inline in Section IV.
C. Scope
Many challenges are common between the design of RA-
TEL presented here and other systems. These include en-
cryption/decryption of external file or I/O content [3], [35],
[61], [69], sanitization of OS inputs to prevent Iago at-
tacks [18], [37], [66], [71], defenses against known side-
channel attacks [8], [34], [56], [62], [63], additional attestation
or integrity of dynamically loaded/generated code [30]–[32],
[72], and so on. These are important but largely orthogonal to
the focus of this work. These can be implemented on top of
RATEL in the future.
Our focus is to expose the compatibility challenges that
SGX creates with rich process-level abstractions. These re-
quire careful design to eliminate additional security threats.
One limitation of RATEL is that the present implementation
supports a majority, but not all, of the Linux
system calls. The most notable of the unsupported system calls
is fork which is used for multi-processing. We believe that
the basic design of RATEL can be extended to support fork
with the two-copy mechanism, similar to prior work [64]. Our
experience suggests that adding other system calls is a tedious
but conceptually straight-forward effort in RATEL. We expect
to expand the syscall coverage over time, possibly with the
help of automated tools.
IV. RATEL DESIGN
Our main challenge is to execute the system functionality
securely while faithfully preserving its semantics for compati-
bility. We explain how RATEL design achieves this for various
sub-systems that are typically expected by applications.
A. System Calls & Unanticipated Entry-Exits
SGX does not allow enclaves to execute several instructions
such as syscall, cpuid, rdtsc. If the enclave executes
them, SGX exits the enclave and generates a SIGILL signal.
Gracefully recovering from the failure requires re-entering
the enclave at a different program point. Due to R5, this is
disallowed by SGX. In RATEL, either DynamoRIO or the
application can invoke illegal instructions, which may create
unanticipated exits from the enclave.
RATEL has three ways to handle them: (a) entirely delegate
the instruction outside the enclave (e.g., file, networking,
and timer operations); (b) execute the instruction outside the
enclave while explicitly updating the in-enclave state (e.g.,
thread operations, signal handling); or (c) completely simulate
the instruction inside the enclave.
RATEL changes DynamoRIO logic to convert such illegal
instructions to stubs that either delegate or emulate the functionality.
For the target application, whenever RATEL observes an
illegal instruction in the code cache, it replaces the instruction
with a call to the RATEL syscall handler function.
Syscalls are a special case—they access process memory for
input-output parameters and error codes. Since enclaves do not
allow this, for delegating the syscall outside the enclave, RA-
TEL creates a copy of input parameters from private memory
to public memory. This includes simple value copies as well as
deep copies of structures. The OS then executes the syscall and
generates results in public memory. Post-call, RATEL copies
back the explicit results and error codes to private memory.
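The copy-in, execute, copy-out sequence for a delegated syscall can be sketched as follows. This is a minimal Python model under assumptions: dictionaries stand in for private and public memory, and delegate_syscall and the untrusted_os callback are hypothetical names, not RATEL functions.

```python
import copy
from typing import Any, Callable, Dict


def delegate_syscall(name: str, private_args: Dict[str, Any],
                     untrusted_os: Callable[[str, Dict[str, Any]], Dict[str, Any]]
                     ) -> Dict[str, Any]:
    """Run a syscall outside the enclave using copies of its arguments and results."""
    public_args = copy.deepcopy(private_args)        # copy-in: private -> public
    public_result = untrusted_os(name, public_args)  # OS only sees the public copy
    return copy.deepcopy(public_result)              # copy-out: public -> private
```

The deep copy matters for arguments that contain pointers to further structures: a shallow copy would leave the OS holding references into private state, which the enclave boundary does not allow.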
Memory copies alone are not sufficient. For example, when
loading a library, the application uses dlopen, which in
turn calls mmap. When we execute the mmap call outside the
enclave, the library is mapped in the untrusted public address
space of the application. However, we want it to be mapped
privately inside the enclave. As another example, consider
when the enclave wants to create a new thread local storage
(TLS) segment. If RATEL executes the system call outside
the enclave, the new thread is created for the DynamoRIO
runtime instead of the target application. Thus, when a syscall
implicitly changes application state, RATEL has to explicitly
propagate those changes inside the enclave.
Alternatively, RATEL selectively emulates some syscalls
inside the enclave. For example, arch_prctl is used to read
the FS base; RATEL substitutes it with an rdfsbase instruction
and executes it inside the enclave. We outline the details of
other syscall subsystems that are fully or partially emulated
by RATEL in Sections IV-B, IV-C, IV-D, IV-E.
RATEL resumes execution only after the syscall state has
been completely copied inside the enclave. This allows it to
employ sanitization of OS results before using them. Further, all
the subsequent execution is strictly over private memory to
avoid TOCTOU attacks.
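Why executing strictly over private memory avoids TOCTOU can be seen in a small Python model. The PublicCell class below is a hypothetical stand-in for a word of public memory that the untrusted OS rewrites between two enclave reads; the two helper functions contrast a double fetch with a single fetch into a private variable.

```python
class PublicCell:
    """Public memory word that the untrusted OS may rewrite between enclave reads."""

    def __init__(self, first: int, later: int) -> None:
        self._first, self._later, self._reads = first, later, 0

    def read(self) -> int:
        self._reads += 1
        return self._first if self._reads == 1 else self._later


def unsafe_length(cell: PublicCell, limit: int):
    if cell.read() <= limit:     # time of check: first fetch from public memory
        return cell.read()       # time of use: second fetch may differ
    return None


def safe_length(cell: PublicCell, limit: int):
    private = cell.read()        # single fetch into private memory
    return private if private <= limit else None
```

With a cell that reports 8 on the first read and 4096 afterwards, the double-fetch version returns a value that was never checked, while the single-fetch version validates and uses the same private copy.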
B. Memory Management
For syscalls that change process virtual address layout, RA-
TEL has to explicitly reflect their changes inside the enclave.
First, this is not straightforward. Due to R1-R4, several layout
configurations are not allowed for enclave virtual memory
(e.g., changing memory permissions). Second, RATEL does not
trust the information provided by the OS (e.g., via procmap).
To address these challenges, RATEL maintains its own
procmap-like structure. Specifically, RATEL keeps its own
view of the process virtual memory inside the enclave, tracks
the memory-related events, and updates the enclave state. For
example, after an mmap call succeeds outside the enclave, RATEL
allocates and records the new virtual address located inside
the enclave.
Further, RATEL synchronizes the two copies of memory
to maintain execution semantics. For example, after mmap,
RATEL creates a new memory mapping inside the enclave
and then copies the content of the mmap-ed memory inside the
enclave. On subsequent changes to mmap-ed memory, RATEL
updates the non-enclave memory via a write. This is done
whenever the application either unmaps the memory or invokes
the sync or fsync system call.
Fig. 3: Design for multi-threading in RATEL. (Figure labels:
Enclave, Host OS, App Parent Thread, Child Threads, DBT Engine,
Thread Assistant, Clone Thread APIs, Clone OCALL requests,
Clone RET, ECALLs, TCS 1 … TCS n marked busy or non-busy.)
With mediation over memory management, RATEL transparently
side-steps SGX restrictions. When the application makes
a request that is not allowed in SGX (e.g., changing memory
permissions), RATEL replaces it with a sequence of valid SGX
operations that achieve the same effect (e.g., moving the content
to memory which has the required permissions). Subsequently,
when the application binary accesses memory, RATEL can
pre-emptively replace the addresses to access the correct in-enclave
copy of the memory. This allows us to safely and
transparently mimic disallowed behavior inside the enclave.
RATEL does not blindly replicate OS-dictated memory
layout changes inside the enclave. It first checks if the resultant
layout will violate any security semantics (e.g., mapping a
buffer at the zero address). It proceeds to update the enclave layout
and memory content only if these checks succeed. To do this,
RATEL keeps its metadata in private memory.
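The bookkeeping described in this subsection can be sketched as a small enclave-private map. The sketch below is an illustration under stated assumptions, not RATEL's data structure: the class names ShadowMap and Region, the method names, the 4096-byte page size, and the specific checks (zero address, alignment, range, overlap) are all hypothetical choices for the example.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # assumed page size


@dataclass
class Region:
    start: int          # in-enclave start address
    data: bytearray     # private copy of the mapped contents


@dataclass
class ShadowMap:
    """Enclave-private, procmap-like view of the application's mappings."""
    low: int                                   # lowest usable address
    high: int                                  # one past the highest usable address
    regions: dict = field(default_factory=dict)

    def layout_ok(self, start, length):
        # Reject the zero address, unaligned or empty mappings, and anything
        # outside the range this map manages.
        if start == 0 or start % PAGE_SIZE or length <= 0:
            return False
        if start < self.low or start + length > self.high:
            return False
        # Reject overlap with any mapping already recorded.
        return all(start + length <= r.start or r.start + len(r.data) <= start
                   for r in self.regions.values())

    def record_mmap(self, start, outside):
        """After the outside mmap succeeded: check first, then copy contents in."""
        if not self.layout_ok(start, len(outside)):
            raise ValueError("refusing OS-dictated layout")
        self.regions[start] = Region(start, bytearray(outside))

    def write_back(self, start, outside):
        """On unmap, sync, or fsync: push the private copy to public memory."""
        data = self.regions[start].data
        outside[:len(data)] = data
```

The application only ever touches the private copy held in Region.data; the public buffer is refreshed at the points where the text says the two copies are synchronized.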
C. Multi-threading
SGX requires the application to pre-declare the maximum
number of threads before execution (R2). Further, it does not
allow the enclave to resume at arbitrary program points or
execution contexts (R5). This creates several compatibility and
security challenges in RATEL.
DynamoRIO and the target application share the same
thread, but they have separate TLS segments for a cleaner context
switch. DynamoRIO keeps the thread’s default TLS segment
for the target application and creates a new TLS segment for
itself at a different address. DynamoRIO switches between
these two TLS segments by changing the segment register:
DynamoRIO uses gsbase, and the application uses fsbase. SGX
allocates one TLS segment per enclave thread. SGX uses the
same mechanism as DynamoRIO (i.e., changing the segment)
to maintain a shadow TLS segment for itself when executing
enclave code.
Multiplexing Base Registers. When we attempt to run Dy-
namoRIO inside SGX, there are not enough registers to save
three offsets (one for DynamoRIO, one for SGX, one for
the application). To circumvent this limitation, RATEL adds
two TLS segment fields to store fsbase and gsbase register
values. We use these TLS segment fields to save and restore
pointers to the segment base addresses. This allows us to
maintain and switch between three clean TLS segment views
per thread.
Primary-secondary TLS Segment Design. Since RATEL is
in charge of maintaining the view of multiple threads, it has to
switch the TLS segment to the corresponding thread every time
the execution enters or exits the enclave. We simplify these
operations with a primary-secondary TLS segment design.
RATEL adds a new field to the SGX thread data structure: a
flag to indicate whether the TLS segment is the primary or not.
RATEL marks the default first TLS segment created by
SGX as the primary. To do this, it sets the flag of the TLS
segment when the execution enters the enclave for the first
time after creation. All subsequent TLS segments, if created,
are marked as secondary. If the flag is false, the base value
stores the pointer to the address of the primary TLS segment.
Otherwise, it points to the secondary TLS segment required
to execute DynamoRIO. With this mechanism, upon enclave
entry, RATEL circulates through the TLS segment pointers
until it finds the address of the primary TLS segment.
Dynamic Threading. Since the number of TCS entries is fixed
at enclave creation time, the maximum number of threads
supported is capped. RATEL multiplexes the limited TCS
entries, as shown in Figure 3. When an application wants
to create a new thread (e.g., via clone), RATEL first checks
if there is a free TCS slot. If it is the case, it performs an
OCALL to do so outside the enclave. Otherwise, it busy-waits
until a TCS slot is released. Once a TCS slot is available,
the OCALL creates a new thread outside the enclave. After
finishing thread creation, the parent thread returns back to the
enclave and resumes execution. The child thread explicitly
performs an ECALL to enter the enclave and DynamoRIO
resumes execution for the application’s child thread.
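The slot multiplexing described above can be sketched with ordinary threads. This is a simplified model, not RATEL's implementation: the names TcsPool and clone, the use of Python threads in place of OCALLs and ECALLs, and the release of a slot when the child function returns are assumptions made for illustration.

```python
import threading
import time


class TcsPool:
    """Hands out a fixed number of TCS slots, decided when the pool is created."""

    def __init__(self, n_slots):
        self._guard = threading.Lock()
        self._busy = [False] * n_slots

    def try_acquire(self):
        """Return a free slot index and mark it busy, or None if all are busy."""
        with self._guard:
            for slot, busy in enumerate(self._busy):
                if not busy:
                    self._busy[slot] = True
                    return slot
        return None

    def acquire(self):
        """Busy-wait until a slot is released."""
        while True:
            slot = self.try_acquire()
            if slot is not None:
                return slot
            time.sleep(0)  # yield the CPU between polls

    def release(self, slot):
        with self._guard:
            self._busy[slot] = False


def clone(pool, child_fn):
    """Stand-in for the clone path: reserve a slot, start the child, and let
    the child run on that slot."""
    slot = pool.acquire()

    def child_entry():           # plays the role of the child's enclave entry
        try:
            child_fn(slot)
        finally:
            pool.release(slot)   # the slot becomes non-busy when the child exits

    t = threading.Thread(target=child_entry)
    t.start()
    return t
```

With two slots and five clone requests, the third request spins in acquire until one of the first two children finishes, which mirrors the busy and non-busy TCS states shown in Figure 3.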
For all threading operations, RATEL ensures transparent
context switches to preserve binary compatibility as intended
by DynamoRIO. For security, RATEL creates and stores all
thread-specific context either inside the enclave or SGX’s
secure hardware-backed storage at all times. It does not use
any OS data structures or addresses for thread management.
D. Thread Synchronization
SGX provides basic synchronization primitives (e.g., SGX
mutex and conditional locks) backed by hardware locks. But
they can only be used for enclave code. Thus, they are
semantically incompatible with the lock mechanisms used by
DynamoRIO or legacy applications, which use OS locks. For
example, DynamoRIO implements a sophisticated mutex lock
using the futex syscall, where the lock is kept in kernel
memory. Supporting this requires trusting the OS for sharing
locks. Given R1 and R3, SGX does not offer any memory
sharing model, making it impossible to support futexes.
Fig. 4: Design choices for lock synchronization. (a) Traditional
Futex. (b) Two-copy design with futex in public memory. (c)
Dedicated lock manager in a separate enclave. (d) RATEL case
where the threads are in the same enclave. [Diagram omitted. Panel labels: Encl A with threads t1 and t2, Encl B with threads t3 and t4, the OS, and a Manager Enclave.]
Need for a Lock Manager. A naive design would be to
maintain a shadow futex variable in public memory, such
that it is accessible to the enclave(s) and the OS. However, the
OS can arbitrarily change the lock state and attack the application.
As an alternative, we can employ a two-copy mechanism
for locks. The enclave can keep the lock in private memory.
When it wants to communicate a state change to the OS, RATEL
can tunnel a futex OCALL to the host OS. There are several
problems even with this approach. Threads inside the enclave
may frequently update the locks in private memory. Polling or
accessing the futex outside the enclave (including the kernel
and the untrusted part of the enclave) requires the latest state
of the lock every time there is an update in private memory.
This creates an opportunity for the OS to launch a TOCTOU
attack. Even without TOCTOU attacks, it is challenging to
synchronize the two copies in benign executions. Specifically,
private lock states can be changed while the public state is
being updated. This leaves threads inside and outside the
enclave with inconsistent views of the same lock (i.e., private
and public copy). The more frequent the local updates, the
higher the probability of such inconsistencies. In general,
the only way to avoid such race condition bugs is to use
locks for synchronizing the private and public state of the
enclave locks. This is impossible with SGX because it does
not support secure memory sharing between the OS and the
enclave(s). Figure 4 shows the schematics of design choices
for implementing synchronization primitives.
RATEL Lock Manager. Given the futex usage of DynamoRIO,
we identify that we can avoid sharing an enclave’s
lock directly with the OS or other enclaves. The DynamoRIO
usage of futexes can be replaced with a simpler primitive such
as spinlocks to achieve the same functionality. Specifically, we
implement a lock manager in RATEL. We use the hardware
spinlock exposed by SGX to do this securely and efficiently
inside the enclave. In RATEL, we invoke our lock manager
implementation wherever DynamoRIO tries to use futexes.
The other instance of futex usage is in the application binaries
being executed with RATEL. To handle those cases, when
RATEL loads the application binary into the code cache, it replaces
thread-related calls (e.g., pthread_cond_wait) with stubs that
invoke our lock manager to use safe synchronization.
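A lock manager built only from spinlocks held in private memory can be sketched as below. This is an illustrative model rather than RATEL's code: the class names SpinLock and LockManager are hypothetical, locks are keyed by a made-up application lock address, and a non-blocking threading.Lock.acquire stands in for the atomic test-and-set that a hardware spinlock would provide.

```python
import threading
import time


class SpinLock:
    """Spinlock; a non-blocking Lock.acquire stands in for atomic test-and-set."""

    def __init__(self):
        self._flag = threading.Lock()

    def acquire(self):
        while not self._flag.acquire(blocking=False):
            time.sleep(0)   # spin (yielding) until the flag is clear

    def release(self):
        self._flag.release()


class LockManager:
    """Keeps every lock in private memory, keyed by the application's lock address."""

    def __init__(self):
        self._guard = SpinLock()   # protects the table itself
        self._locks = {}

    def _lookup(self, addr):
        self._guard.acquire()
        try:
            if addr not in self._locks:
                self._locks[addr] = SpinLock()
            return self._locks[addr]
        finally:
            self._guard.release()

    def lock(self, addr):
        self._lookup(addr).acquire()

    def unlock(self, addr):
        self._lookup(addr).release()
```

No lock state ever leaves the manager, so there is no public copy for the OS to modify; the price is that waiters spin instead of sleeping in the kernel, which matches the cost discussed later in the performance analysis.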
E. Signal Handling
RATEL cannot piggyback on the existing signal handling
mechanism exposed by SGX, due to R5. Specifically, when
DynamoRIO executes inside the enclave, the DynamoRIO
signal handler needs a description of the event to handle
it (Figure 5(a)). However, Intel’s SGX platform software
removes all such information when it delivers the signal
to the enclave. This breaks the functionality of programmer-defined
handlers that recover from well-known exceptions
(e.g., divide by zero). Further, any illegal instructions inside the
enclave generate exceptions, which are raised in the form of
signals. Existing binaries may not have handlers for recovering
from instructions that are illegal in SGX.
SGX allows entering the enclave at fixed program points
and context. Leveraging this, RATEL employs a primary signal
handler that it registers with SGX. For any signals generated
for DynamoRIO or the application, we always enter the en-
clave via the primary handler and we copy the signal code into
the enclave. We then use the primary as a trampoline to route
the control to the appropriate secondary signal handler inside
the enclave, based on the signal code. At a high-level, we
realize a virtualized trap-and-emulate signal handling design.
We use SGX signal handling semantics for our primary handler. For
the secondary, we set up and tear down a separate stack to
mimic the semantics in software. The intricate details of
handling the stack state at the time of such context switches
are elided here. Figure 5(b) shows a schematic of our design.
Registration. The original DynamoRIO code and the application
binary use sigaction to register signal handlers for themselves. We
refer to them as secondary handlers. In RATEL, we first change
the DynamoRIO logic to register only the primary signal handler
with SGX. We then record the DynamoRIO and application
registrations as secondary handlers. This way, whenever SGX
or the OS wants to deliver a signal to the enclave, SGX
directs control to our primary handler inside the enclave (see footnote 3).
Since this is a pre-registered handler, SGX allows it. The
primary handler checks the signal code and explicitly routes
execution to the secondary.
Delivery. A signal may arrive when the execution control is
inside the enclave (e.g., timer). In this case, RATEL executes
the primary signal handler code, which delivers the signal to the
enclave. However, if the signal arrives when the CPU is in a
non-enclave context, SGX does not automatically invoke the
enclave to redirect execution flow. To force this, RATEL has
to explicitly enter the enclave. But it can only enter at a pre-
registered program point with a valid context (R5). Thus we
first wake up the enclave at a valid point (via ECALL) and
copy the signal code. We then simulate the signal delivery by
setting up the enclave stack to execute the primary handler.
Footnote 3: Vanilla SGX PSW does not provide an API that allows the enclave to
register for signals; we have changed the PSW to support this.
Exit. After executing their logic, handlers invoke sigreturn
to return control to the point before the signal
interrupted the execution. When RATEL observes this call
in the secondary handler, it has to simulate a return
back to the primary handler instead. The primary handler
then performs its own real sigreturn. SGX then resumes
execution from the point before the signal was generated.
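The registration, delivery, and exit steps above amount to a trampoline: one pre-registered primary handler that routes to recorded secondary handlers. The following sketch models only that control flow. It is not RATEL's implementation; the class SignalRouter, its methods, and the trace strings are hypothetical, and stack setup and context save/restore are deliberately left out.

```python
import signal


class SignalRouter:
    def __init__(self):
        self._secondary = {}   # signal number -> recorded handler
        self.trace = []        # order of events, for illustration only

    def sigaction(self, signo, handler):
        """Intercepted registration: record the handler instead of passing it on."""
        self._secondary[int(signo)] = handler

    def primary_handler(self, signo):
        """Single pre-registered entry point; routes by signal code."""
        signo = int(signo)                      # copy of the signal code
        self.trace.append("primary: enter")
        handler = self._secondary.get(signo)
        if handler is not None:
            self.trace.append("secondary: run")
            handler(signo)
            # The secondary's sigreturn is simulated as a return to the primary.
            self.trace.append("secondary: simulated sigreturn")
        self.trace.append("primary: real sigreturn")


router = SignalRouter()
handled = []
router.sigaction(signal.SIGFPE, handled.append)   # e.g., a divide-by-zero handler
router.primary_handler(signal.SIGFPE)
```

The trace makes the nesting explicit: the secondary handler always runs between the primary's entry and the primary's own return, so only the primary ever interacts with the real signal mechanism.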
V. IMPLEMENTATION
We implement RATEL by using DynamoRIO. We run DynamoRIO
inside an enclave with the help of the standard Intel
SGX development kit that includes user-land software (SDK
v2.1), platform software (PSW v2.1), and a Linux kernel driver
(v1.5). We make a total of 9667 LoC software changes to
the DynamoRIO and SDK infrastructure. We run RATEL on
unmodified hardware that supports SGX v1.
The RATEL design makes several changes to the DynamoRIO core
(e.g., memory management, lock manager, signal forwarding).
When realizing our design, we address three high-level implementation
challenges. Their root cause is the mismatch between the way the Intel SDK
and PSW expose hardware features and what DynamoRIO
expects. There are several low-level challenges that we do not
discuss here for brevity.
Self-identifying Load Address. DynamoRIO needs to know
its location in memory mainly to avoid using its own address
space for the application. In RATEL, we use a call-pop
mechanism to self-identify our location in memory. These
contiguous instructions allow us to dynamically retrieve our
own address. Specifically, we align this address and decrease
it at page-size granularity to compute our start address in
memory.
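The address arithmetic in this step can be illustrated as follows. The call-pop sequence itself is machine code and is not reproduced here; the sketch starts from an address assumed to lie inside the image. The 4096-byte page size and the is_start_page predicate are assumptions: the text does not say how the first page is recognized, so the predicate is purely a placeholder.

```python
PAGE_SIZE = 4096  # assumed page size


def page_align_down(addr, page_size=PAGE_SIZE):
    """Clear the low bits so the address falls on a page boundary."""
    return addr & ~(page_size - 1)


def find_start(addr_in_image, is_start_page, page_size=PAGE_SIZE):
    """Walk down one page at a time from an address known to be inside the
    image until the (hypothetical) is_start_page predicate accepts a page."""
    page = page_align_down(addr_in_image, page_size)
    while page >= 0:
        if is_start_page(page):
            return page
        page -= page_size
    return None


# Example with made-up addresses: the image is assumed to start at 0x7f0000010000.
start = find_start(0x7F0000012345, lambda page: page == 0x7F0000010000)
print(hex(start))
```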
Insufficient Hardware Slots. By default, SGX SDK and PSW
assume two SSA frames, which are sufficient to handle most of
the nested signals. Since RATEL design needs one SSA frame
for itself, we increase the SSA frames to three to ensure we
can handle the same set of nested signals as SGX. The SGX
specification allows this by changing the NSSA field in our
PSW implementation.
Preserving Execution Contexts. To start execution of a
newly created thread, RATEL invokes a pre-declared ECALL
to enter the enclave. This is a nested ECALL, which is not
supported by the SGX SDK. To allow it, we modify the SDK to
facilitate the entrance of child threads and initialize the thread
data structure for them. Specifically, we check if the copy of thread
arguments inside the enclave matches the ones outside before
resuming thread execution. We save specific registers so that
the thread can exit the enclave later. Since the child thread
has its own execution path that differs from the parent’s,
RATEL bridges its return address to the point in the
code cache where a new thread always starts. After the thread
is initialized, we explicitly update DynamoRIO data structures
to record the new thread (e.g., the TLS base for application
libraries). This way, DynamoRIO is aware of the new thread
and can control its execution in the code cache.
Fig. 5: (a) Original signal design in DynamoRIO. (b) Signal design in RATEL. [Diagrams omitted. Panel (a) labels: App, DR Handler, Signal Handler, Signal Pending, Save user context, No Pending Signal, Restore user context, User, Kernel. Panel (b) labels: Host, Public Memory, Kernel, Enclave, Signal Pending, Save user context, Save Encl context A, Exit, External handler, Internal primary, Save context B, Secondary handler, Restore context B, sigret, Restore context A, No Pending Signal, Restore user context, Resume.]
Propagating Implicit Changes & Metadata. A thread uses the
exit or exit_group syscall to terminate itself. The OS then
zeros out the child thread ID (ctid). In RATEL, we explicitly
create the new thread inside the enclave, so we have to terminate
it explicitly by zeroing out the pointers to the IDs. Further,
we clean up and free the memory associated with each thread
inside and outside the enclave.
VI. EVALUATION
We primarily evaluate the compatibility of RATEL and
highlight the advantages gained due to binary compatibility.
We further provide a TCB breakdown of RATEL and point
out the performance ramifications, which are common to and
comparable with other state-of-the-art approaches.
Setup. All our experiments are performed on a Lenovo machine
with SGX v1 support, 128 MB EPC configured in the
BIOS of which 96 MB is available for user enclaves, 12 GB
RAM, 64 KB L1, 256 KB L2, and 4096 KB L3 cache, and a 3.4 GHz
processor. Our software setup comprises Ubuntu version
16.04, Intel SGX SDK v2.1, PSW v2.1, driver v1.5,
gcc v5.4.0, and DynamoRIO v6.2.17. All performance statistics
reported are the geometric mean over 5 runs.
A. Compatibility
Selection Criteria. We select 310 binaries that cover an
extensive set of benchmarks, utilities, and large-scale applications.
They comprise commonly used evaluation targets
for DynamoRIO and enclave-based systems [3], [10], [17],
[21], [33], [65], [66] that we surveyed for our study. Further,
they represent a mix of memory, CPU, multi-threading,
network, and file I/O workloads. Of these, 69 binaries are
micro-benchmark targets: 29 from SPEC2006 (CPU), 1
from IOZone v3.487 (I/O), 9 from FSCQ v1.0 (file API),
21 from HBenchOS v1.0 (system stress-test), and 9 from
Parsec-SPLASH2 (multi-threading). We run 12 real-world binaries:
3 applications, namely cURL v7.65.0 (server-side utilities),
SQLite v3.28.0 (database), and Memcached v1.5.20 (key-value store),
and 9 applications from Privado (secure ML
framework). We test 229 Linux utilities from our system’s
/bin and /usr/bin directories.

                                               Implementation       Covered
Subsystem                      Total   Impl    Del    Emu  P.Emu   (DR + Binaries)
Process                           12      8      4      2      2      3
Filename based                    37     25     25      0      0     16
Signals                           12      7      3      4      0      6
Memory                            18     10      6      0      4      4
Inter process communication       12      4      4      0      0      0
File descriptor based             65     53     48      0      5     30
File name or descriptor based     19      9      9      0      0      5
Networks                          19     17     15      0      2     15
Misc                             124     79     79      0      0     36
Total                            318    212    193      6     13    115

TABLE II: RATEL syscall support. Columns 2-3: total Linux
system calls and support in RATEL. Columns 4-6: syscalls
implemented by full delegation, full emulation, and partial
emulation. Column 7: syscalls tested in RATEL.
Porting Efforts. For benchmarks and applications, we download
the source code and compile it with the default flags
required to run them natively on our machine. We directly
use the existing binaries for Linux utilities. We test the same
binaries on native hardware, with DynamoRIO, and with
RATEL. Thus, we do not change the original source code or
the binaries. We first test the target binaries on native Linux and on
vanilla DynamoRIO. 278 of the 310 targets execute successfully
with these baselines.
The remaining 32 binaries either use unsupported devices
(e.g., NTFS) or do not run on our machine. So we discard
them from our RATEL experiments, since vanilla DynamoRIO
also does not work on them. Of the remaining 278 binaries
that work on the baselines, RATEL has support for the system
calls used by 203 of these. We support all of these and directly
execute them with RATEL, with zero porting effort.
Fig. 6: System calls statistics over all 208 binaries. (a) Unique syscalls for each binary; and (b) frequency per syscall. [Plots omitted. Axes: (a) number of unique system calls (y) per binary (x); (b) frequency (y) per unique system call (x).]

                     Trusted                      Untrusted
Function     SDK+PSW      DR   Total    SDK+PSW     DR  Driver   Host   Total
Original      147928  129875  277803      49838  66629    2880   1769  121116
Loader            69    1604    1673         27     89     N/A    332     448
MM                46    2241    2287         44      0     N/A      0      44
Syscalls           0    1801    1801          0      0     N/A   1432    1432
Instr              0      45      45          0      0     N/A     26      26
TLS               18      60      78         18      0     N/A      0      18
Signals          201     236     437        136      0     N/A      0     136
Threading        389     393     782        130      0     N/A    157     287
Sync               0     173     173          0      0     N/A      0       0

TABLE III: Breakdown of RATEL TCB (in LoC).
System Call Support & Coverage. RATEL supports a total of
212/318 (66.66%) syscalls exposed by the Linux Kernel. We
emulate 6 syscalls purely inside the enclave and delegate 193
of them via OCALLs. For the remaining 13, we use partial
emulation and partial delegation. Table II gives a detailed
breakdown of our syscall support. Syscall usage is not uniform
across frequently used applications and libraries [70]. Hence
we empirically evaluate the degree of expressiveness supported
by RATEL. Across all of the 278 binaries in our evaluation, we
observe a total of 121 unique syscalls, of which RATEL supports 115.
Table II shows the syscalls supported by RATEL and
their usage in our benchmarks and real-world applications
(see Appendix A). Figures 6a and 6b show the distribution of
unique syscalls and their frequency as observed over binaries
supported by RATEL. Thus, our empirical study shows that
RATEL supports 115/121 (95.0%) of the syscalls observed in our
dataset of binaries. To support 212 syscalls, we added 3233
LoC (about 15 LoC per syscall on average). In the future, RATEL
can be extended to increase the number of supported syscalls.
Interested readers can refer to Appendix A for more details.
RATEL handles 31 out of the 32 standard signals defined for
Linux. It does not handle SIGPROF presently because the
original DynamoRIO has no support for it natively.
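The coverage figures quoted in this paragraph can be re-derived from Table II. The short check below only restates numbers from the table and the text; the variable names are ours.

```python
# Rows of Table II:
# (total, implemented, delegated, emulated, partially emulated, covered)
TABLE_II = {
    "Process": (12, 8, 4, 2, 2, 3),
    "Filename based": (37, 25, 25, 0, 0, 16),
    "Signals": (12, 7, 3, 4, 0, 6),
    "Memory": (18, 10, 6, 0, 4, 4),
    "Inter process communication": (12, 4, 4, 0, 0, 0),
    "File descriptor based": (65, 53, 48, 0, 5, 30),
    "File name or descriptor based": (19, 9, 9, 0, 0, 5),
    "Networks": (19, 17, 15, 0, 2, 15),
    "Misc": (124, 79, 79, 0, 0, 36),
}

# Column sums should reproduce the "Total" row of the table.
totals = tuple(sum(col) for col in zip(*TABLE_II.values()))

# In every row, implemented = delegated + emulated + partially emulated.
impl_consistent = all(impl == d + e + p
                      for _, impl, d, e, p, _ in TABLE_II.values())

supported_pct = round(100 * totals[1] / totals[0], 2)   # 212 of 318
observed_pct = round(100 * 115 / 121, 2)                # 115 of 121 observed

print(totals, impl_consistent, supported_pct, observed_pct)
```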
There are 72 binaries that use syscalls that are presently
unsupported in RATEL. 3 from HBenchOS fail because RATEL
does not support fork. 1 from SPEC2006 fails because SGX
has insufficient virtual memory. 68 are Linux utilities, of
which 39 are multi-process, 10 use unsupported ioctls, 12 use
unsupported signals, 1 fails for the same reason as the one
from SPEC2006, and 6 use other unsupported system calls.
Library vs Binary Compatibility. We maintain full binary
compatibility with all 203 binaries tested for which we had
system call support in RATEL. For them, RATEL works out-of-the-box
in our experiments. We report that, given the same
inputs as native execution, RATEL produces the same outputs.
With the aim of true binary compatibility, RATEL supports
binaries without limiting them to a specific implementation or
version of libc. To empirically demonstrate this independence
from libc, we test RATEL with binaries that
use different libc implementations. Specifically, we compile the
HBenchOS benchmark (12 binaries) with glibc v2.23 and
musl libc v1.2.0. We report that RATEL executes these
system stress workloads out-of-the-box with both libraries.
We do not make any change to our implementation. Lastly,
we report our experience on porting our micro-benchmarks
to a state-of-the-art library OS called Graphene-SGX in Appendix B.
Of the 77 programs tested, Graphene-SGX fails on
13. RATEL works out-of-the-box for all except 1, which failed
due to the virtual memory limit enforced by SGX hardware.
B. TCB Breakdown
Since we aim to run applications inside enclaves, we trust the
Intel SGX support software (SDK, PSW) that allows us to
interface with the hardware. This choice is the same as in any other
system that uses enclaves. RATEL comprises one additional
trusted component, i.e., DynamoRIO. Put together, RATEL
amounts to a TCB of 277,803 LoC. This is comparable to existing
SGX frameworks that have 100K to 1M LoC [3], [17], but only
provide library-based compatibility.
Table III (columns 2-4) summarizes the breakdown of the
LoC included in the trusted components of the PSW, SDK, and
DynamoRIO as well as the code contributed by each of the
sub-systems supported by RATEL. Original DynamoRIO comprises
353,139 LoC. We reduce it to 129,875 LoC (trusted)
and 66,629 LoC (untrusted) by removing the components that
are not required or used by RATEL. Then we add 8,589 LoC
to adapt DynamoRIO to SGX as per the design outlined in
Section IV. Apart from this, as described in Section V, we
change the libraries provided by Intel SGX (SDK, PSW, Linux
driver) and add 1,078 LoC.
Of the 277,803 LoC of trusted code, 123,322 LoC is from
the original DynamoRIO code base responsible for loading
the binaries, code-cache management, and syscall handling.
110,848 LoC and 37,080 LoC are from the Intel SGX SDK and
PSW respectively. The RATEL implementation adds only 6,553
LoC on top of this. A large fraction of our
added TCB (27.5%) is due to the OCALL wrappers, which
are amenable to automated testing and verification [37], [66].
Benchmark/Application      LOC     Size   OCALLs Syscalls PgFaults   CtxSw    DR(s) RATEL(s)   Ovh(%)

SPEC CINT 2006
astar                     4280    56 KB    26561   618555   271544     181     8.77    10.71    18.13
bzip2                     5734    73 KB    26048   618115   443869     238    21.74    34.49    36.96
gobmk                   157650   4.4 MB    26594   618629   272957     150     1.73     4.37    60.43
hmmer                    20680   331 KB    26144   629203   271106     139     0.98     0.85   -15.45
sjeng                    10549   162 KB    26606   618638  2049404     382     5.08     4.57   -11.20
libquantum                2611    51 KB    25969   618004   271071     150     0.50     0.47    -7.81
h264ref                  36097   602 KB    27033   619033   272100     284    18.25    34.83    47.60
omnetpp                  26652   871 KB    26961   618990   271813     151     2.47     2.72     9.20
Xalan                   267376   6.3 MB    28121   620185   273953     198     5.53     4.05   -36.64
gcc                     385783   3.8 MB    25758   656241    56201     454    12.85     6.30  -103.97
gromacs                  87921   1.1 MB    26783   654600    55019     633     2.85     4.79    40.50

SPEC CFP 2006
leslie3d                  2983   177 KB    26831   618865   271723     204    18.80    21.63    13.05
milc                      9580   150 KB    32551   624587   271506     192    13.05    22.41    41.76
namd                      3892   330 KB    28550   620582   271665     173    18.86    19.41     2.85
cactusADM                60235   819 KB    27619   619634   370217     190     5.34     7.78    31.35
calculix                105123   1.8 MB    27319   629243   313313     174     3.06     3.90    21.44
dealII                   94458   4.3 MB    26858   618872   273471     240    24.68    24.73     0.20
GemsFDTD                  4883   440 KB    25207   617226   366889     177     5.44     2.08  -161.25
povray                   78684   1.2 MB    29082   621108   272267     166     4.42     4.72     6.38
soplex                   28282   507 KB    26880   618909   271861     164     2.07     2.08     0.31
specrand                    54   8.7 KB    25863   617897   270924     150     0.35     0.27   -27.85
specrand                    54   8.7 KB    25863   617897   270990     164     0.34     0.33    -2.35
tonto                   107228   4.6 MB    30562   622574   273669     186     6.89     6.65    -3.53
zeusmp                   19030   280 KB    27163   619201  1755130     434    20.65    52.03    60.32

IOZONE
read re-read             26545   1.1 MB    27254   622791    23785    1215     0.88     0.89     1.35
random readwrite         26545   1.1 MB    27376   622913    23744     844     0.88     1.09    19.41
read backward            26545   1.1 MB    27431   622968    23854    1159     0.84     1.38    39.22
fwrite re-fwrite         26545   1.1 MB    27212   622750    24317     581     0.87     0.86    -0.93
iozone fread re-fread    26545   1.1 MB    27223   622760    23742     374     0.86     0.65   -31.90

FSCQ
fscq large file            383    25 KB    25889  1165892   270914     168     0.47     3.41    86.22
fscq small file            161    19 KB    26352   929795   270959     181     0.34     0.17  -104.82
fscq write file             74    18 KB   262015   930226   270867     143     0.31     0.13  -142.19
multicreatewrite            20    11 KB    65721   969595   270969     248     0.38     0.83    54.59
multiopen                   14   9.8 KB   225719  1129593   270842     452     0.57     2.44    76.64
multicreate                 18   9.9 KB    55720   949625   270866     212     0.31     0.67    53.59
multiwrite                  16   9.9 KB    35720   939594   270864     152     0.23     0.30    22.52
multicreatemany             19    11 KB    45729   959605   271034     198     0.36     0.77    53.63
multiread                   17   9.9 KB   325721  1229595   270901     589     0.62     3.70    83.20

PARSEC-SPLASH2
water_nsquare             2885    46 KB    27109   622992    25093     199     0.94     0.88    -6.44
water_spatial             3652    46 KB    27171   622991    61992     164     1.05     2.31    54.69
barnes                    4942    46 KB    26801   622748    28191     287     0.85     1.02    16.59
fmm                       7611    64 KB    27218   622909    62025      58     0.97     2.18    55.72
raytrace                200091    92 KB    27291   623175    65323     194     1.23     2.59    52.32
radiosity                21586   230 KB    27609   623118    26162     139     4.30     5.82    26.01
ocean_cp                 10519    81 KB    27234   622990    25480      86     1.19     1.05   -12.56
ocean_ncp                 6275    65 KB    27052   622948    25362      91     1.03     1.08     4.71
volrend                  27152   271 KB    27309   623167    25082     178     0.75     0.88    14.56

Applications
SQLite (10 K keys)      140420   1.3 MB   400548  1818195   272323    1261     5.82     6.94    16.19
cURL (10 MB)             22064    30 KB    35897   940802   272552    1031     1.78     1.17   -51.54
Memcached (100K)         44921   795 KB  1021241  1118691   540649  104765     5.99     9.46    36.68
densenetapp              12551    32 MB    27826   616749   123894     354     7.25    12.90    43.77
lenetapp                   230   313 KB    26029   616411    21166     362     0.49     0.26   -92.43
resnet110app              9528   110 MB    27238   696291    23716     200     1.98     2.45    18.82
resnet50app               2826    98 MB    26591   616291   136139     274     7.64    11.81    35.36
resnext29app              1753   132 MB    26410   616728   187284     411    11.38    16.33    30.33
squeezenetapp              914   4.8 MB    26258   616290    23001     252     1.22     1.11    -9.83
vgg19app                   990    77 MB    26192   630872    96647     402     1.40     2.44    42.65
wideresnetapp             1495   140 MB    26352   631004   172712     303    20.02    55.38    63.85
inceptionv3               4875    92 MB    26862  1088344   250880     355    13.25    24.63    46.20

TABLE IV: RATEL statistics for benchmarks and real-world applications. Columns 2-3: total application LoC and binary size.
Columns 4-7: total OCALLs, system calls, page faults, and context switches during one run. Columns 8-9: execution time (in seconds)
on vanilla DynamoRIO and RATEL. Column 10: RATEL’s execution overhead (in %) w.r.t. DynamoRIO. RATEL performs
better than DynamoRIO in some cases because of eager binary loading which improves cache hits.
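The text does not give the formula behind the overhead column. The reported values appear consistent with normalizing the time difference by the RATEL time, i.e., (T_RATEL - T_DR) / T_RATEL; this is our inference from the table, not a definition stated by the authors. The sketch below checks that reading against four rows of Table IV. Small deviations are expected because the table reports times rounded to two decimals.

```python
def overhead_pct(t_dr, t_ratel):
    """Time difference as a percentage of the RATEL time (assumed formula)."""
    return 100.0 * (t_ratel - t_dr) / t_ratel


# (DynamoRIO time, RATEL time, overhead reported in Table IV)
rows = {
    "astar": (8.77, 10.71, 18.13),
    "bzip2": (21.74, 34.49, 36.96),
    "gcc": (12.85, 6.30, -103.97),
    "zeusmp": (20.65, 52.03, 60.32),
}

for name, (dr, ratel, reported) in rows.items():
    print(f"{name}: computed {overhead_pct(dr, ratel):.2f}, reported {reported:.2f}")
```

Under this reading, a negative value means RATEL finished faster than vanilla DynamoRIO, and the magnitude is bounded above by 100% but unbounded below.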
Fig. 7: RATEL Performance. (a) cURL, (b) Privado, (c) SQLite, and (d) Memcached. (a) and (b) show RATEL execution time
overhead w.r.t. vanilla DynamoRIO. (c) shows average time per operation (micros/op) with increasing database size represented
as number of primary keys in thousands (K); (d) shows the throughput versus latency.
[Plots omitted. Axes: (a) overhead (x times) vs. file size (1 MB, 10 MB, 100 MB, 1 GB); (b) overhead (x times) for lenetapp, squeezenetapp, densenetapp, vgg19app, resnet50app, resnet110app, resnext29app, wideresnetapp, and inceptionv3; (c) throughput (ops/s) vs. database size (number of keys in K), for DynamoRIO and RATEL; (d) latency (ms) vs. throughput (kops/s), for DynamoRIO and RATEL.]
Fig. 8: RATEL Performance for micro-benchmarks: (a) SPEC 2006 (CPU), (b) Parsec-SPLASH-2 (multi-threading), and (c)
IOZone (I/O). (a) and (b) show RATEL execution time overhead w.r.t. vanilla DynamoRIO; lower value indicates better
performance. (c) shows bandwidth loss w.r.t. vanilla DynamoRIO; value close to 0 indicates worse performance.
[Plots omitted. Axes: (a) overhead (x times) per SPEC 2006 benchmark; (b) overhead (x times) per Parsec-SPLASH-2 benchmark for 1, 2, 4, 8, and 16 threads; (c) bandwidth loss (x times) for write, rewrite, read, reread, random-read, random-write, bkwd-read, record-rewrite, stride-read, fwrite, frewrite, fread, and freread.]
Property / Sub-property                     DR       RATEL

Memory-intensive operations, bandwidth (MB/s): more iterations, smaller chunk size
Raw Memory Read                       24976.73    24665.05
Raw Memory Write                      12615.13    12580.36
Bzero Bandwidth                       60877.42    65072.41
Memory copy libc aligned              56883.67    60377.04
Memory copy libc unaligned            56270.52    61543.81
Memory copy unrolled aligned          12272.93    12351.22
Memory copy unrolled unaligned        12279.55    12295.40
Mmapped Read                            423.85      190.73
File Read                                29.15       12.05

Memory-intensive operations, bandwidth (MB/s): fewer iterations, larger chunk size
Raw Memory Read                       13292.24     5717.96
Raw Memory Write                      10664.76     4563.60
Bzero Bandwidth                       31315.64     4166.62
Memory copy libc aligned              12969.82     1570.94
Memory copy libc unaligned            13141.00     1556.58
Memory copy unrolled aligned           6714.93     2054.55
Memory copy unrolled unaligned         5853.26     2081.05
Mmapped Read                           7163.38     3299.77
File Read                              3724.39      769.66

File system, latency (s)
Filesystem create                        32.43      115.34
Filesystem delforward                    18.41       33.53
Filesystem delrand                       21.17       37.28
Filesystem delreverse                    18.31       33.13

System call, latency (s)
getpid                                  0.0065      0.0058
getrusage                                 0.64        7.45
gettimeofday                            0.0239      6.5986
sbrk                                    0.0064      0.0065
sigaction                                 2.21        2.79
write                                     0.51        7.23

Signal handler, latency
Installing Signal                         2.24        2.79
Handling Signal                           8.88       81.58

TABLE V: Summary of HBenchOS benchmark results.
The remaining 4,752 LoC are for memory management, signal
handling, TLS, and the multi-threading interface.
RATEL relies on, but does not trust, the code executing
outside the enclave in the host process (e.g., OCALLs). This
includes 2,391 LoC of changes (Table III, columns 6-10).
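The LoC figures quoted in this subsection can be cross-checked against Table III. The check below uses only numbers from the table and the text. Grouping the 8,589 LoC as DynamoRIO-side plus host-side additions, and the 1,078 LoC as SDK+PSW additions, is our reading of the columns; the sums happen to match the text exactly under that reading.

```python
# Added LoC per subsystem, from Table III:
# (trusted SDK+PSW, trusted DR, untrusted SDK+PSW, untrusted DR, untrusted host)
ADDED = {
    "Loader": (69, 1604, 27, 89, 332),
    "MM": (46, 2241, 44, 0, 0),
    "Syscalls": (0, 1801, 0, 0, 1432),
    "Instr": (0, 45, 0, 0, 26),
    "TLS": (18, 60, 18, 0, 0),
    "Signals": (201, 236, 136, 0, 0),
    "Threading": (389, 393, 130, 0, 157),
    "Sync": (0, 173, 0, 0, 0),
}

t_sdk, t_dr, u_sdk, u_dr, u_host = (sum(col) for col in zip(*ADDED.values()))

trusted_total = 123322 + 110848 + 37080 + t_dr   # DR base + SDK + PSW + additions
untrusted_added = u_sdk + u_dr + u_host          # changes outside the enclave
dr_side_added = t_dr + u_dr + u_host             # assumed grouping for 8,589
sdk_side_added = t_sdk + u_sdk                   # assumed grouping for 1,078
ocall_share = round(100 * ADDED["Syscalls"][1] / t_dr, 1)

print(t_dr, trusted_total, untrusted_added, dr_side_added, sdk_side_added, ocall_share)
```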
C. Performance Analysis
RATEL explicitly trades off performance for secure and
complete mediation, by design. We present the performance
implications of these design choices. We have two main
findings. First, the performance overheads vary significantly
based on the application workload. Second, we find that most
of the overheads stem from the specific SGX restrictions
R1-R5 and from the limited physical memory available.
To measure these, we collect various statistics of the execution
profile of 58 programs in our micro-benchmarks and
4 real-world applications (12 binaries in total). Specifically,
we log the target application LoC, binary size, number of
OCALLs, ECALLs, syscalls, enclave memory footprint (stack
and heap), number of page faults, and number of context
switches. Table IV shows these statistics for the benchmarks
and applications evaluated with RATEL. Interested readers can
refer to Appendices C and D for detailed performance breakdowns.
There are three main avenues of overhead costs.
First, fundamental limitations of SGX result in increased
memory-to-memory operations (e.g., the two-copy design) or the
use of slower constructs (e.g., spin-locks instead of fast
futexes). Our evaluation on system stress workloads for each
subsystem measures the worst-case cost of these operations. We
report that, on average, SPEC CPU benchmarks incur
40.24% overhead (Figure 8a), while I/O-intensive workloads
incur a 75% slowdown (Figure 8c for IOZone benchmarks).
Further, the performance overhead increases with larger I/O
record sizes. The same is observed for HBenchOS binaries,
as reported in Table V. The expensive spin-locks incur a cost
that increases with the number of threads (Figure 8b for Parsec-
SPLASH2 benchmarks). Overall, we observe that benchmarks
that require large memory copies consistently exhibit significant
slow-downs compared to others, highlighting the costs
imposed by the two-copy design. The cost of signal handling
also increases due to added context saves and restores in
RATEL, as seen in a dedicated benchmark of HBenchOS (see
the last two rows in Table V).
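To make the magnitude of these slowdowns concrete, the following minimal sketch computes per-row latency ratios from a few Table V entries. It assumes that the first value in each row is the DynamoRIO baseline and the second is RATEL, and that the selected rows are latencies (lower is better). The names `rows` and `slowdown` are illustrative and are not part of RATEL.

```python
# Selected latency rows copied from Table V as (first column, second column).
# Assumption: first column = DynamoRIO baseline, second column = RATEL.
rows = {
    "getpid": (0.0065, 0.0058),
    "gettimeofday": (0.0239, 6.5986),
    "write": (0.51, 7.23),
    "Installing Signal": (2.24, 2.79),
    "Handling Signal": (8.88, 81.58),
}


def slowdown(baseline, ratel):
    """Latency ratio; values above 1.0 mean RATEL is slower than the baseline."""
    return ratel / baseline


ratios = {name: slowdown(b, r) for name, (b, r) in rows.items()}

# Print rows from the smallest to the largest ratio.
for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
    print(f"{name:18s} {ratio:8.2f}x")
```

Under this column assumption, calls that do little work beyond crossing the enclave boundary (such as gettimeofday) show the largest ratios among the selected rows, while getpid stays close to the baseline.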
We believe some of these observed costs will be common
to other compatibility engines, while the rest stem
from our preference for binary compatibility in designing
RATEL. As an example, our evaluation on Graphene-SGX
(Appendix B) shows that memory-intensive workloads exhibit a
similar increase in overheads. Graphene-SGX does not use
spin-locks and tunnels all signal handling through libc, as
it prioritizes performance over binary compatibility, and therefore has
reduced overheads compared to RATEL.
Second, the current SGX hardware implementation has
limited secure physical memory (called the EPC), only 90 MB.
Executing anything on such a severely limited memory resource results
in large slow-downs (e.g., increased page faults). Further,
the cost of each page-in and page-out operation is itself higher in
SGX because of hardware-based memory encryption. We measure
the impact of this limitation by executing benchmarks and
applications whose working sets exceed 90 MB for both
data and code. For example, we test varying download sizes
in cURL (Figure 7a) and database sizes in SQLite (Figure 7c).
When the data exceeds 90 MB, we observe a sharp increase in
throughput loss. Similarly, when we execute ML models of varying sizes
that require an increasing amount of code page memory,
we observe an increase in page faults and lowered performance
(Figure 7b). We observe a similar degradation in latency and throughput
when applications reach a critical point in memory usage, as
in memcached (Figure 7d). A detailed performance breakdown
for these applications is in Appendix D.
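As a rough illustration of why crossing the EPC limit hurts, the sketch below estimates how many 4 KB pages of a working set cannot be resident in a 90 MB EPC at the same time. The helper name `pages_over_epc` is hypothetical, and the sketch treats 90 MB as 90 x 1024 x 1024 bytes. It also ignores the EPC space consumed by RATEL, DynamoRIO, and SGX metadata, so it understates the real memory pressure.

```python
import math

EPC_BYTES = 90 * 1024 * 1024  # EPC size reported in the text (assumed binary MB)
PAGE_BYTES = 4096             # EPC pages are 4 KB


def pages_over_epc(working_set_bytes, epc_bytes=EPC_BYTES):
    """Number of working-set pages that cannot be EPC-resident simultaneously."""
    needed = math.ceil(working_set_bytes / PAGE_BYTES)
    available = epc_bytes // PAGE_BYTES
    return max(0, needed - available)


for mb in (32, 90, 128, 256):
    excess = pages_over_epc(mb * 1024 * 1024)
    print(f"{mb:4d} MB working set -> {excess} pages beyond the EPC")
```

Every page counted here has to be evicted and reloaded through encrypted paging whenever it is touched again, which matches the sharp degradation described once the working set passes the limit.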
Third, the overhead attributable solely to RATEL is the
cost of dynamic binary translation itself. In the original
DynamoRIO, the DBT design achieves close to native performance or
better after the code caches warm up [11]. In RATEL, we
expect to preserve the same performance characteristics as
DynamoRIO. However, the SGX physical memory limits
directly impact the execution profile of DynamoRIO running
in the enclave. Specifically, RATEL may increase
physical memory usage for its own execution, which may slow down
the target binary. It is difficult to directly measure the exact
cost incurred by this factor because we cannot increase the
hardware physical memory in our setup.
Lastly, we observe that the variation in performance cost
across workloads is common to other platforms. As a
direct comparison point, we tested HBenchOS, a benchmark
with varying workloads, with Graphene-SGX. Graphene-SGX
is a popular and well-maintained library OS that has been
under active development for several years as of this writing.
Interested readers can refer to Appendices B and D for details. RATEL
offers better binary compatibility than Graphene-SGX,
which provides compatibility with glibc.
VII. RELATED WORK
Several prior works have targeted SGX compatibility. There
are two main ways that prior work has overcome these challenges.
The first approach is to fix the application interface.
The target application is either re-compiled or re-linked to
use such interfaces. The approach that enables the best compatibility
exposes specific libc (glibc or musl libc) versions
as interfaces. This allows such systems to adapt to SGX restrictions at
a layer below the application. Container or library OS solutions
use this to execute re-compiled/re-linked code inside the
enclave, as done in Haven [6], Scone [3], Graphene-SGX [17],
Ryoan [35], SGX-LKL [58], and Occlum [61]. The second line of
work is compiler-based solutions. They require modifying
application source code to use language-level interfaces [28], [50],
[64], [73].
Both styles of approach can have better performance than
RATEL, but require recompiling or relinking applications. For
example, library OSes like Graphene-SGX and containerization
engines like Scone expose a particular glibc and
musl version that applications are asked to link with. New
library versions and interfaces can be ported incrementally,
but this creates a dependence on the underlying platform
interface provider, and incurs a porting effort for each library
version. Applications that use inline assembly or runtime code
generation also become incompatible, as they access
system calls directly, without using the API. RATEL's approach
of handling R1-R5 comprehensively offers secure and complete
mediation, without any assumptions about specific interfaces
beyond those implied by binary compatibility.
Security Considerations. As in RATEL, other approaches to
SGX compatibility eventually have to use OCALLs, ECALLs,
and syscalls to exchange information between the enclave
and the untrusted software. This interface is known to be
vulnerable [18], [71]. Several shielding systems for file [15],
[66] and network I/O [5] provide specific mechanisms to
safeguard the OS interface against these attacks. For security,
defense techniques offer compiler-based tools for enclave code,
covering memory safety [41], ASLR [60], prevention of controlled-channel
leakage [62], data location randomization [8], secure
page fault handlers [56], and branch information leakage [33].
Performance. Several other works build optimizations by
modifying existing enclave-compliant library OSes. Examples
include Hotcalls [74] and Eleos [55], which add exit-less
calls to reduce the overheads of OCALLs. These well-known
optimizations are now also available as part of the default Intel
SGX SDK.
Language Run-times. A recent body of work has also shown
how executing either entire [72] or partial [20] language
runtimes inside an enclave can help to port existing code
written in interpreted languages such as Python [48], [59],
Java [20], WebAssembly [31], Go [30], and JavaScript [32].
Programming TEE Applications. Intel provides a C/C++
SGX software stack, which includes an SDK and OS drivers
for simulation and PSW for running local enclaves. There
are other SDKs developed in memory-safe languages such
as Rust [28], [50], [73]. Frameworks such as Asylo [4],
OpenEnclave [54], and MesaTEE [49] expose a high-level
front-end for writing native TEE applications using a common
interface. They support several back-end TEEs, including Intel
SGX and ARM TrustZone. Our experience will help them to
sidestep several design challenges as well as improve their
forward compatibility.
Future TEEs. New enclave TEE designs have been proposed
[16], [23], [26], [42], [65]. Micro-architectural side
channels [7] and new oblivious execution capabilities [23],
[45] are significant concerns in these designs. Closest to our
underlying TEE is the recent Intel SGX v2 [1], [46], [75]. SGX
v2 enables dynamic memory and thread management inside
the enclave, thus addressing R2 to some extent. The other
restrictions are largely not addressed in SGX v2, and therefore,
the RATEL design largely applies to it as well. Designing enclave
TEEs that do not impose the same restrictions as SGX remains
promising future work.
VIII. CONCLUSION
We present the design of RATEL, the first work to offer
binary compatibility with existing software on SGX. RATEL
is a dynamic binary translation engine inside SGX enclaves on
Linux. Through the lens of RATEL, we expose the fundamental
trade-offs between performance and secure mediation
on the OS-enclave interface. These trade-offs are rooted in 5
SGX design restrictions, which offer concrete challenges to
next-generation enclave TEE designs.
ACKNOWLEDGMENTS
We thank David Kohlbrenner, Zhenkai Liang, and Roland
Yap for their feedback on improving earlier drafts of the
paper. We thank Shipra Shinde for help on formatting the
figures in this paper. This research was partially supported
by a grant from the National Research Foundation, Prime
Minister's Office, Singapore under its National Cybersecurity
R&D Program (TSUNAMi project, No. NRF2014NCR-
NCR001-21) and administered by the National Cybersecurity
R&D Directorate. This material is in part based upon work
supported by the National Science Foundation under Grant
No. DARPA N66001-15-C-4066 and Center for Long-Term
Cybersecurity. Any opinions, findings, and conclusions or
recommendations expressed in this material are those of the
authors and do not necessarily reflect the views of the National
Science Foundation.
AVAILABILITY
The RATEL implementation, including the modified Intel SGX SDK,
PSW, and driver, is available at https://ratel-enclave.github.io/
Our project webpage and GitHub repository also contain unit
tests, benchmarks, Linux utils, and case studies evaluated in
this paper.
REFERENCES
[1] Software Guard Extensions Programming Reference Rev. 2.
software.intel.com/sites/default/files/329298-002.pdf, Oct 2014.
[2] Aex vector error graphene. https://github.com/oscarlab/graphene/issues/
1155.
[3] Sergei Arnautov, Bohdan Trach, Franz Gregor, Thomas Knauth, Andre
Martin, Christian Priebe, Joshua Lind, Divya Muthukumaran, Daniel
O’Keeffe, Mark L Stillwell, David Goltzsche, Dave Eyers, Rüdiger
Kapitza, Peter Pietzuch, and Christof Fetzer. SCONE: Secure Linux
Containers with Intel SGX. In OSDI, 2016.
[4] Google asylo: An open and flexible framework for enclave applications.
https://asylo.dev/, 2019.
[5] Pierre-Louis Aublin, Florian Kelbert, Dan O’Keeffe, Divya Muthukumaran,
Christian Priebe, Joshua Lind, Robert Krahn, Christof Fetzer,
David M. Eyers, and Peter R. Pietzuch. Talos: Secure and transparent
tls termination inside sgx enclaves, 2017.
[6] Andrew Baumann, Marcus Peinado, and Galen Hunt. Shielding Appli-
cations from an Untrusted Cloud with Haven. In OSDI, 2014.
[7] Thomas Bourgeat, Ilia A. Lebedev, Andrew Wright, Sizhuo Zhang,
Arvind, and Srinivas Devadas. MI6: Secure Enclaves in a Speculative
Out-of-Order Processor. In MICRO, 2019.
[8] Ferdinand Brasser, Srdjan Capkun, Alexandra Dmitrienko, Tommaso
Frassetto, Kari Kostiainen, Urs Müller, and Ahmad-Reza Sadeghi.
DR.SGX: hardening SGX enclaves against cache attacks with data
location randomization. CoRR, abs/1709.09917, 2017.
[9] Aaron B. Brown. HBench-OS Operating System Benchmarks. https:
//www.eecs.harvard.edu/margo/papers/sigmetrics97-os/hbench/, 2019.
[10] D. Bruening and Q. Zhao. Practical memory checking with dr. memory.
In International Symposium on Code Generation and Optimization
(CGO 2011), 2011.
[11] Derek Bruening, Evelyn Duesterwald, and Saman Amarasinghe. Design
and implementation of a dynamic optimization framework for windows.
In ACM Workshop on Feedback-Directed and Dynamic Optimization,
Austin, Texas, Dec 2001.
[12] Derek Bruening, Timothy Garnett, and Saman Amarasinghe. An
infrastructure for adaptive dynamic optimization. In Proceedings of
the International Symposium on Code Generation and Optimization:
Feedback-Directed and Runtime Optimization, CGO 03, 2003.
[13] Derek Bruening and Qin Zhao. Practical memory checking with dr.
memory. In Proceedings of the IEEE/ACM International Symposium on
Code Generation and Optimization, 2011.
[14] Derek L. Bruening and Saman Amarasinghe. Efficient, Transparent, and
Comprehensive Runtime Code Manipulation. PhD thesis, USA, 2004.
AAI0807735.
[15] Dorian Burihabwa, Pascal Felber, Hugues Mercier, and Valerio Schi-
avoni. Sgx-fs: Hardening a file system in user-space with intel sgx. In
2018 IEEE International Conference on Cloud Computing Technology
and Science (CloudCom), pages 67–72. IEEE, 2018.
[16] D. Champagne and R. B. Lee. Scalable architectural support for trusted
software. In HPCA - 16 2010 The Sixteenth International Symposium
on High-Performance Computer Architecture, pages 1–12, Jan 2010.
[17] Chia che Tsai, Donald E. Porter, and Mona Vij. Graphene-sgx: A
practical library OS for unmodified applications on SGX. In 2017
USENIX Annual Technical Conference (USENIX ATC 17), pages 645–
658, Santa Clara, CA, 2017. USENIX Association.
[18] Stephen Checkoway and Hovav Shacham. Iago attacks: Why the system
call api is a bad untrusted rpc interface. In Proceedings of the Eighteenth
International Conference on Architectural Support for Programming
Languages and Operating Systems, ASPLOS ’13, 2013.
[19] Haogang Chen, Daniel Ziegler, Tej Chajed, Adam Chlipala, M. Frans
Kaashoek, and Nickolai Zeldovich. Using crash hoare logic for certi-
fying the fscq file system. In Proceedings of the 25th Symposium on
Operating Systems Principles, SOSP ’15, 2015.
[20] Tsai Chia-Che, Jeongseok Son, Bhushan Jain, John McAvey, Raluca Ada
Popa, and Donald E. Porter. Civet: An efficient java partitioning
framework for hardware enclaves. In 29th USENIX Security Symposium
(USENIX Security 20), Boston, MA, August 2020. USENIX Association.
[21] JaeWoong Chung, Michael Dalton, Hari Kannan, and Christos
Kozyrakis. Thread-safe dynamic binary translation using transactional
memory. In 2008 IEEE 14th International Symposium on High Perfor-
mance Computer Architecture, 2008.
[22] Victor Costan and Srinivas Devadas. Intel sgx explained. Cryptology
ePrint Archive, Report 2016/086, 2016. http://eprint.iacr.org/2016/086.
[23] Victor Costan, Ilia Lebedev, and Srinivas Devadas. Sanctum: Minimal
hardware extensions for strong software isolation. In USENIX Security
’16.
[24] curl Home Page. https://curl.haxx.se/, 2019.
[25] Github - zwimer/drshadowstack: A software defined dynamic shadow
stack utilizing dynamorio. https://github.com/zwimer/DrShadowStack.
[26] Andrew Ferraiuolo, Andrew Baumann, Chris Hawblitzel, and Bryan
Parno. Komodo: Using verification to disentangle secure-enclave hard-
ware from software. In SOSP, 2017.
[27] File map error graphene. https://github.com/oscarlab/graphene/issues/
433.
[28] fortanix/rust-sgx: The fortanix rust enclave development platform. https:
//github.com/fortanix/rust-sgx.
[29] Tal Garfinkel, Mendel Rosenblum, and Dan Boneh. Flexible OS support
and applications for trusted computing. In Michael B. Jones, editor,
Proceedings of HotOS’03: 9th Workshop on Hot Topics in Operating
Systems, May 18-21, 2003, Lihue (Kauai), Hawaii, USA. USENIX, 2003.
[30] Adrien Ghosn, James R. Larus, and Edouard Bugnion. Secured routines:
Language-based construction of trusted execution environments. In 2019
USENIX Annual Technical Conference (USENIX ATC 19). USENIX
Association, July 2019.
[31] David Goltzsche, Manuel Nieke, Thomas Knauth, and Rüdiger Kapitza.
Acctee: A webassembly-based two-way sandbox for trusted resource
accounting. In Middleware, 2019.
[32] David Goltzsche, Colin Wulf, Divya Muthukumaran, Konrad Rieck, Peter
Pietzuch, and Rüdiger Kapitza. Trustjs: Trusted client-side execution
of javascript. In Proceedings of the 10th European Workshop on Systems
Security, page 7. ACM, 2017.
[33] Karan Grover, Shruti Tople, Shweta Shinde, Ranjita Bhagwan, and
Ramachandran Ramjee. Privado: Practical and secure dnn inference
with enclaves, 2018.
[34] Daniel Gruss, Julian Lettner, Felix Schuster, Olya Ohrimenko, Istvan
Haller, and Manuel Costa. Strong and efficient cache side-channel
protection using hardware transactional memory. In 26th USENIX
Security Symposium (USENIX Security 17), pages 217–233, Vancouver,
BC, 2017. USENIX Association.
[35] Tyler Hunt, Zhiting Zhu, Yuanzhong Xu, Simon Peter, and Emmett
Witchel. Ryoan: A Distributed Sandbox for Untrusted Computation on
Secret Data. In 12th USENIX Symposium on Operating Systems Design
and Implementation (OSDI 16), pages 533–549, GA, 2016. USENIX
Association.
[36] Intel. Pin - A Dynamic Binary Instrumentation Tool.
https://software.intel.com/en-us/articles/pin-a-dynamic-binary-
instrumentation-tool.
[37] Mustakimur Rahman Khandaker, Yueqiang Cheng, Zhi Wang, and Tao
Wei. Coin attacks: On insecurity of enclave untrusted interfaces in
sgx. In Proceedings of the Twenty-Fifth International Conference
on Architectural Support for Programming Languages and Operating
Systems, ASPLOS 20, 2020.
[38] Seongmin Kim, Youjung Shin, Jaehyung Ha, Taesoo Kim, and Dongsu
Han. A first step towards leveraging commodity trusted execution
environments for network applications. HotNets 2015.
[39] Vladimir Kiriansky, Derek Bruening, Saman P Amarasinghe, et al.
Secure execution via program shepherding. In USENIX Security Sym-
posium, volume 92, page 84, 2002.
[40] Roland Kunkel, Do Le Quoc, Franz Gregor, Sergei Arnautov, Pramod
Bhatotia, and Christof Fetzer. Tensorscone: A secure tensorflow frame-
work using intel SGX. CoRR, abs/1902.04413, 2019.
[41] Dmitrii Kuvaiskii, Oleksii Oleksenko, Sergei Arnautov, Bohdan Trach,
Pramod Bhatotia, Pascal Felber, and Christof Fetzer. Sgxbounds:
Memory safety for shielded execution. In Proceedings of the Twelfth
European Conference on Computer Systems, EuroSys ’17, pages 205–
221, New York, NY, USA, 2017. ACM.
[42] Dayeol Lee, David Kohlbrenner, Shweta Shinde, Krste Asanović, and
Dawn Song. Keystone: An open framework for architecting trusted
execution environments. In Proceedings of the Fifteenth European
Conference on Computer Systems, EuroSys 20, 2020.
[43] Leveldb benchmarks. http://www.lmdb.tech/bench/microbench/
benchmark.html.
[44] Yan Lin, Xiaoxiao Tang, Debin Gao, and Jianming Fu. Control flow
integrity enforcement with dynamic code optimization. In Matt Bishop
and Anderson C A Nascimento, editors, Information Security. Springer
International Publishing, 2016.
[45] Martin Maas, Eric Love, Emil Stefanov, Mohit Tiwari, Elaine Shi,
Krste Asanovic, John Kubiatowicz, and Dawn Song. Phantom: Practical
oblivious computation in a secure processor. In Proceedings of the 2013
ACM SIGSAC Conference on Computer and Communications Security,
CCS ’13, pages 311–324, New York, NY, USA, 2013. ACM.
[46] F McKeen, I Alexandrovich, I Anati, D Caspi, S Johnson, R Leslie-Hurd,
and C Rozas. Sgx instructions to support dynamic memory allocation
inside an enclave. 2016.
[47] Frank McKeen, Ilya Alexandrovich, Alex Berenzon, Carlos V. Rozas,
Hisham Shafi, Vedvyas Shanbhogue, and Uday R. Savagaonkar. In-
novative instructions and software model for isolated execution. In
Proceedings of the 2nd International Workshop on Hardware and
Architectural Support for Security and Privacy, HASP ’13, pages 10:1–
10:1, New York, NY, USA, 2013. ACM.
[48] mesalock-linux/mesapy: A fast and safe python based on pypy. https:
//github.com/mesalock-linux/mesapy.
[49] Mesatee: A framework for universal secure computing. https://
mesatee.org/.
[50] apache/mesatee-sgx: Rust sgx sdk provides the ability to write intel sgx
applications in rust programming language. https://github.com/apache/
mesatee-sgx.
[51] W. Norcott and D. Capps. IOzone file system benchmark. URL:
www.iozone.org, 2019.
[52] Olga Ohrimenko, Manuel Costa, Cédric Fournet, Christos Gkantsidis,
Markulf Kohlweiss, and Divya Sharma. Observing and preventing
leakage in mapreduce. In CCS ’15.
[53] Olga Ohrimenko, Felix Schuster, Cedric Fournet, Aastha Mehta, Sebas-
tian Nowozin, Kapil Vaswani, and Manuel Costa. Oblivious multi-party
machine learning on trusted processors. In USENIX Security ’16.
[54] Open enclave sdk. https://openenclave.io/sdk/.
[55] Meni Orenbach, Pavel Lifshits, Marina Minkin, and Mark Silberstein.
Eleos: Exitless os services for sgx enclaves. In Proceedings of the
Twelfth European Conference on Computer Systems, EuroSys ’17, pages
238–253, New York, NY, USA, 2017. ACM.
[56] Meni Orenbach, Yan Michalevsky, Christof Fetzer, and Mark Silberstein.
Cosmix: a compiler-based system for secure memory instrumentation
and execution in enclaves. In 2019 USENIX Annual Technical
Conference (USENIX ATC 19), pages 555–570, 2019.
[57] The PARSEC Benchmark Suite. https://parsec.cs.princeton.edu/, 2019.
[58] Christian Priebe, Divya Muthukumaran, Joshua Lind, Huanzhou Zhu,
Shujie Cui, Vasily A Sartakov, and Peter Pietzuch. Sgx-lkl: Securing the
host os interface for trusted execution. arXiv preprint arXiv:1908.11143,
2019.
[59] adombeck/python-sgx: Python interface to the sgx sdk. https://
github.com/adombeck/python-sgx.
[60] Jaebaek Seo, Byoungyoung Lee, Seong Min Kim, Ming-Wei Shih, Insik
Shin, Dongsu Han, and Taesoo Kim. Sgx-shield: Enabling address space
layout randomization for sgx programs. In NDSS, 2017.
[61] Youren Shen, Hongliang Tian, Yu Chen, Kang Chen, Runji Wang,
Yi Xu, Yubin Xia, and Shoumeng Yan. Occlum: Secure and efficient
multitasking inside a single enclave of intel sgx. In Proceedings of
the Twenty-Fifth International Conference on Architectural Support for
Programming Languages and Operating Systems, ASPLOS 20, 2020.
[62] Ming-Wei Shih, Sangho Lee, Taesoo Kim, and Marcus Peinado. T-
sgx: Eradicating controlled-channel attacks against enclave programs.
In NDSS. Internet Society, February 2017.
[63] Shweta Shinde, Zheng Leong Chua, Viswesh Narayanan, and Prateek
Saxena. Preventing page faults from telling your secrets. In Proceedings
of the 11th ACM on Asia Conference on Computer and Communications
Security, ASIA CCS ’16, 2016.
[64] Shweta Shinde, Dat Le Tien, Shruti Tople, and Prateek Saxena. Panoply:
Low-TCB Linux Applications With SGX Enclaves. In 24th Annual
Network and Distributed System Security Symposium, NDSS, 2017.
[65] Shweta Shinde, Shruti Tople, Deepak Kathayat, and Prateek Saxena.
PodArch: Protecting Legacy Applications with a Purely Hardware TCB.
Technical report, National University of Singapore, 2015.
[66] Shweta Shinde, Shengyi Wang, Pinghai Yuan, Aquinas Hobor, Abhik
Roychoudhury, and Prateek Saxena. Besfs: A POSIX filesystem for
enclaves with a mechanized safety proof. In 29th USENIX Security
Symposium (USENIX Security 20), 2020.
[67] Spec 2006 benchmarks. https://www.spec.org/, 2019.
[68] SQLite Home Page. https://www.sqlite.org/index.html, 2019.
[69] Chia-Che Tsai, Kumar Saurabh Arora, Nehal Bandi, Bhushan Jain,
William Jannen, Jitin John, Harry A. Kalodner, Vrushali Kulkarni,
Daniela Oliveira, and Donald E. Porter. Cooperation and Security
Isolation of Library OSes for Multi-Process Applications. In EuroSys,
2014.
[70] Chia-Che Tsai, Bhushan Jain, Nafees Ahmed Abdul, and Donald E.
Porter. A study of modern linux api usage and compatibility: What
to support when you’re supporting. In Proceedings of the Eleventh
European Conference on Computer Systems, EuroSys ’16, 2016.
[71] Jo Van Bulck, David Oswald, Eduard Marin, Abdulla Aldoseri, Flavio D.
Garcia, and Frank Piessens. A tale of two worlds: Assessing the
vulnerability of enclave shielding runtimes. In Proceedings of the 2019
ACM SIGSAC Conference on Computer and Communications Security,
CCS ’19, 2019.
[72] Huibo Wang, Erick Bauman, Vishal Karande, Zhiqiang Lin, Yueqiang
Cheng, and Yinqian Zhang. Running language interpreters inside sgx:
A lightweight, legacy-compatible script code hardening approach. In
Proceedings of the 2019 ACM Asia Conference on Computer and
Communications Security, Asia CCS ’19, pages 114–121, New York,
NY, USA, 2019. ACM.
[73] Huibo Wang, Pei Wang, Yu Ding, Mingshen Sun, Yiming Jing, Ran
Duan, Long Li, Yulong Zhang, Tao Wei, and Zhiqiang Lin. Towards
memory safe enclave programming with rust-sgx. In CCS, 2019.
[74] Ofir Weisse, Valeria Bertacco, and Todd Austin. Regaining lost cycles
with hotcalls: A fast interface for sgx secure enclaves. In Proceedings
of the 44th Annual International Symposium on Computer Architecture,
ISCA ’17, pages 81–93, New York, NY, USA, 2017. ACM.
[75] Bin Cedric Xing, Mark Shanahan, and Rebekah Leslie-Hurd. Intel
software guard extensions (intel sgx) software support for dynamic
memory allocation inside an enclave. In Proceedings of the Hardware
and Architectural Support for Security and Privacy 2016, HASP 2016,
2016.
APPENDIX
A. Details of Compatibility Tests with RATEL
Linux Utilities. We test the compatibility offered by RATEL
with all the Linux built-in binaries on our experimental Ubuntu
system. These comprise 229 shared-object binaries in total,
which are typically in the directories /bin and /usr/bin.
We run each utility with the most representative options
and inputs. Of the 229, 195 work on our test machine both
natively and with DynamoRIO. Of these 195 binaries, a total of
138 have all system calls presently supported in RATEL, all
of which worked correctly in our tests out of the box. The 57
programs that did not work fail for 2 reasons: missing syscall
support and virtual memory limits imposed by SGX. Table VI
and Table VII list all Linux utilities and binaries from real-world
applications and benchmarks that ran successfully, and
present the number of unique system calls for each. Table VIII
and Table IX summarize the reasons for all binaries that fail
in RATEL and in native and DynamoRIO, respectively.
45 of the utilities fail due to the lack of multi-processing
(fork) support in RATEL. 5 utilities use certain POSIX signals
that we do not yet support completely (e.g., real-time
signals SIGRTMIN+n). 6 utilities fail because they invoke
other system calls unsupported in RATEL, most of which the
restriction R3 in SGX fundamentally does not permit (e.g.,
shared memory syscalls such as shmat, shmdt, shmctl, etc.).
1 utility fails because of the virtual memory limit in SGX, as
it loads more than 100 shared libraries. It should be noted that
the ioctl syscall involves more than 100 variable parameters, and
RATEL's syscall stubs currently do not cover all of them.
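The counts reported above can be cross-checked with a short tally. The sketch below uses only the numbers stated in this appendix; the variable names and category labels are illustrative.

```python
# Counts as stated in the text of this appendix.
total_utilities = 229
work_natively_and_with_dr = 195  # work natively and with DynamoRIO
work_with_ratel = 138            # all required syscalls supported in RATEL

# Failure categories for utilities that run natively but not with RATEL.
failures = {
    "no multi-processing (fork) support": 45,
    "incompletely supported signals": 5,
    "other unsupported syscalls": 6,
    "SGX virtual memory limit": 1,
}

failing = work_natively_and_with_dr - work_with_ratel
# The per-category counts should add up to the number of failing utilities.
assert failing == sum(failures.values())

for reason, count in failures.items():
    print(f"{reason:38s} {count:3d} ({100 * count / failing:.1f}% of failures)")

success_rate = 100 * work_with_ratel / work_natively_and_with_dr
print(f"worked with RATEL: {work_with_ratel}/{work_natively_and_with_dr} ({success_rate:.1f}%)")
```

The tally confirms that the four categories account for all 57 failing utilities, and that missing fork support alone explains most of them.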
Other Benchmarks & Applications. Of the 81 binaries
from micro-benchmarks and real applications, 7 do not
work with RATEL. 5 binaries from HBenchOS (lat_proc,
lat_pipe, lat_ctx, lat_ctx2, bw_pipe) use either fork
or shared memory system calls disallowed by R3. 2 binaries
(lat_memsize from HBenchOS, mcf from SPEC2006) with
DynamoRIO require virtual memory larger than SGX limits
on our experimental setup.
B. Comparison to Graphene-SGX
To compare RATEL’s binary compatibility and performance
with other approaches, we have chosen Graphene-SGX, a
library OS that runs inside an SGX enclave. Graphene-SGX
offers the lowest compatibility barrier of all prior systems to
our knowledge, specifically offering compatibility with glibc.
Compatibility. Applications using Graphene-SGX have to
work only with a specific library interface, namely a custom
glibc, which requires re-linking and build process changes.
RATEL, in contrast, has been designed for binary compatibility,
which is a fundamental difference in design. To demonstrate
the practical difference, we reported in Section VI-A that
the HBenchOS benchmark works out of the box when built
with both glibc and musl, as an example.
Graphene-SGX requires a manifest file for each application
that specifies the main binary name as well as the dynamic
libraries, directories, and files used by the application. By
default, Graphene-SGX does not allow creation of new files
during runtime. We use the allow_file_creation option to disable
this default. We tested all 77 benchmark and application binaries
(HBench-OS, Parsec-SPLASH2, SPEC, IOZone, FSCQ,
SQLite, cURL, Memcached, Privado), out of which 64 work
with Graphene-SGX. Of the 13 that fail on Graphene-SGX,
all except 1 work on RATEL, with the only failure being due
to virtual memory limits.
For Graphene-SGX, 3/9 Parsec-SPLASH2 binaries
(water_nsquare, water_spatial, and volrend), the IOZone
binary, and the SQLite database workload [43] failed due to an I/O
error (e.g., [27]), which is an open issue. 3/25 binaries
from SPEC2006 failed. cactusADM
failed due to a signal failure, which is mentioned as
an existing open issue on the Graphene-SGX public project page [2]. The
calculix program fails with a segmentation fault. The
omnetpp program could not process the input file despite
the input file being marked as allowed in the corresponding manifest file.
4 networking-related binaries from HBench-OS, namely
lat_connect, lat_tcp, lat_udp, and bw_tcp, could not
run, resulting in a “bad address” error while connecting
to localhost. lat_memsize from HBench-OS fails on
Graphene-SGX, as it does on RATEL, due to the virtual
memory limit.
Performance. We report the performance overheads of
Graphene-SGX for HBenchOS benchmarks in Table X, as
compared to the DynamoRIO baseline and RATEL. The slowdown
in both systems is comparable for I/O benchmarks, since
both of them incur two copies. Graphene-SGX is significantly
faster than the DynamoRIO baseline and RATEL for system calls
and signal handling, because it implements a library OS inside
the enclave and avoids expensive context switches. RATEL delegates
most of the system calls to the OS and does not emulate
them as Graphene-SGX does, offering compatibility with multiple
libraries in contrast. Further, RATEL offers instruction-level
instrumentation capability.
C. Performance: Micro-benchmarks
We measure the performance for targeted workloads that
stress system APIs, CPU, and I/O. The breakdown helps to
explain the costs associated with executing diverse workloads
with RATEL.
Methodology. For each target binary, we record the execution
time in the following two settings:
• Baseline (DynamoRIO). We execute the application binary
directly with DynamoRIO.
• RATEL. We use RATEL to execute the application binary
in an enclave. We offset the execution time by deducting
the overhead of creating and initializing the enclave, loading
DynamoRIO and the application binary inside it, and destroying the
enclave.
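The offsetting step in the RATEL setting can be written out as simple arithmetic. The sketch below is a minimal illustration, not the measurement harness used in the paper: the function names are hypothetical, and the timing values are invented placeholders rather than measured results.

```python
def adjusted_time(total_s, create_init_s, load_s, destroy_s):
    """RATEL execution time with enclave setup and teardown deducted."""
    return total_s - (create_init_s + load_s + destroy_s)


def overhead_percent(baseline_s, ratel_s):
    """Relative overhead of RATEL over the DynamoRIO baseline, in percent."""
    return 100.0 * (ratel_s - baseline_s) / baseline_s


# Hypothetical timings in seconds, for illustration only.
baseline = 10.0  # application run directly with DynamoRIO
ratel = adjusted_time(total_s=18.5, create_init_s=2.0, load_s=1.5, destroy_s=1.0)

print(f"adjusted RATEL time: {ratel:.2f} s")
print(f"overhead vs. baseline: {overhead_percent(baseline, ratel):.1f}%")
```

Deducting the one-time enclave costs keeps short-running benchmarks from being dominated by setup, so the reported overhead reflects steady-state execution.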
System Stress Workloads. We use HBench-OS [9], a benchmark
to measure the performance of primitive functionality
provided by an OS and hardware platform. In Table V, we show
the cost of each system-level operation such as system calls,
Utility # of syscalls Utility # of syscalls Utility # of syscalls Utility # of syscalls Utility # of syscalls Utility # of syscalls
ed 18 ppdpo 29 dirmngr 21 hcitool 20 systemctl 32 systemd-cgtop 25
cvt 13 psnup 14 enchant 20 bluemoon 21 vim.basic 45 dirmngr-client 13
eqn 36 t1asm 13 epsffit 13 btattach 21 hciconfig 25 systemd-escape 18
gtf 18 troff 15 faillog 18 fwupdate 19 brltty-ctb 21 systemd-notify 19
pic 13 uconv 13 gendict 14 gatttool 24 fusermount 19 wpa_passphrase 18
tbl 13 bccmd 25 hex2hcd 18 gencnval 13 journalctl 26 gamma4scanimage 13
xxd 14 btmgmt 24 icuinfo 15 lessecho 13 sudoreplay 16 systemd-analyze 31
curl 32 busctl 37 kbxutil 14 loginctl 31 watchgnupg 18 systemd-inhibit 31
derb 23 catman 16 lastlog 14 makeconv 14 xmlcatalog 12 systemd-resolve 31
find 27 cd-it8 21 lesskey 13 ppdmerge 26 zlib-flate 13 ulockmgr_server 18
gawk 25 expiry 14 lexgrog 15 psresize 14 cupstestdsc 26 systemd-tmpfiles 34
grep 21 genbrk 14 manpath 14 psselect 14 cupstestppd 18 gpg-connect-agent 22
htop 26 gencfu 13 obexctl 35 t1binary 13 hostnamectl 30 kerneloops-submit 20
kmod 17 grotty 13 pkgdata 14 t1binary 13 systemd-run 29 evince-thumbnailer 21
ppdc 30 l2ping 21 ppdhtml 27 t1disasm 13 timedatectl 32 fcitx-dbus-watcher 18
ppdi 30 l2test 27 preconv 15 t1disasm 13 brltty-trtxt 22 systemd-detect-virt 18
qpdf 14 psbook 14 sdptool 22 transfig 15 dbus-monitor 31 dbus-cleanup-sockets 19
gpg2 27 pstops 14 ssh-add 20 vim.tiny 34 dbus-uuidgen 13 systemd-stdio-bridge 25
wget 29 rctest 25 t1ascii 13 dbus-send 30 fcitx-remote 23 systemd-ask-password 20
btmon 23 soelim 12 udevadm 27 gpg-agent 18 gpgparsemail 14 webapp-container-hook 26
genrb 13 whatis 20 volname 15 hciattach 18 systemd-hwdb 17 systemd-machine-id-setup 18
grops 14 rfcomm 19 xmllint 15 localectl 30 systemd-path 18 systemd-tty-ask-password-agent 25
mandb 27 bootctl 19 ciptool 20 pg_config 16 enchant-lsmod 13 dbus-update-activation-environment 31
TABLE VI: List of GNU utilities (138) tested with RATEL along with the number of unique system calls called during a single
run.
Utility # of syscalls Utility # of syscalls Utility # of syscalls Utility # of syscalls Utility # of syscalls Utility # of syscalls
gcc 43 dealII 43 leslie3d 43 xalancbmk 49 bw_mmap_rd 42 water_spatial 44
fmm 45 soplex 43 calculix 41 LFS-write 43 resnet50app 41 inceptionv3app 42
curl 55 povray 43 GemsFDTD 42 multiopen 41 densenetapp 42 multicreatemany 41
milc 42 barnes 45 specrand 42 multiread 41 multicreate 47 multicreatewrite 41
namd 42 iozone 47 specrand 42 memcached 59 lat_fslayer 42 lat_syscall(sbrk) 42
bzip2 48 lat_fs 41 lenetapp 41 radiosity 44 lat_connect 42 lat_syscall(write) 42
gobmk 42 bw_tcp 42 vgg19app 43 ocean_ncp 44 resnext29app 42 lat_syscall(getpid) 42
hmmer 41 h264ref 42 raytrace 45 bw_mem_cp 42 resnet110app 41 lat_syscall(sigaction) 42
sjeng 42 omnetpp 43 ocean_cp 45 bw_mem_rd 42 wideresnetapp 43 lat_syscall(getrusage) 43
tonto 44 gromacs 46 volerand 45 bw_mem_wr 42 squeezenetapp 41 lat_syscall(gettimeofday) 42
astar 43 lat_sig 44 bw_bzero 42 libquantum 42 LFS-smallfile 49
sqlite 47 lat_tcp 42 lat_mmap 42 multiwrite 41 LFS-largefile 42
zeusmp 43 lat_udp 42 cactusADM 43 bw_file_rd 42 water_nsquare 44
TABLE VII: List of applications (12) and individual benchmarks (63) tested with RATEL along with the number of unique
system calls called during a single run.
Property / Sub-property              Performance
                                     DR          Ratel       Graphene
Memory-intensive operations, bandwidth (MB/s), more iterations, smaller chunk size
  Raw Memory Read                    24976.73    24665.05    23210.89
  Raw Memory Write                   12615.13    12580.36    12114.76
  Bzero Bandwidth                    60877.42    65072.41    47844.32
  Memory copy libc aligned           56883.67    60377.04    63423.44
  Memory copy libc unaligned         56270.52    61543.81    69444.44
  Memory copy unrolled aligned       12272.93    12351.22    12161.04
  Memory copy unrolled unaligned     12279.55    12295.40    10079.86
  Mmapped Read                         423.85      190.73     3814.69
  File Read                             29.15       12.05      325.52
Memory-intensive operations, bandwidth (MB/s), fewer iterations, larger chunk size
  Raw Memory Read                    13292.24     5717.96     5310.06
  Raw Memory Write                   10664.76     4563.60     3495.86
  Bzero Bandwidth                    31315.64     4166.62     4046.79
  Memory copy libc aligned           12969.82     1570.94     1534.59
  Memory copy libc unaligned         13141.00     1556.58     1545.08
  Memory copy unrolled aligned        6714.93     2054.55     2009.46
  Memory copy unrolled unaligned      5853.26     2081.05     1999.58
  Mmapped Read                        7163.38     3299.77     1454.30
  File Read                           3724.39      769.66      134.42
File system, latency (s)
  Filesystem create                     32.43      115.34     1272.86
  Filesystem delforward                 18.41       33.53     1185.10
  Filesystem delrand                    21.17       37.28     1073.69
  Filesystem delreverse                 18.31       33.13     1266.38
System call, latency (s)
  getpid                                0.0065      0.0058      0.09
  getrusage                             0.64        7.45        0.09
  gettimeofday                          0.0239      6.5986      6.85
  sbrk                                  0.0064      0.0065      0.01
  sigaction                             2.21        2.79        0.59
  write                                 0.51        7.23        0.53
Signal handler, latency
  Installing Signal                     2.24        2.79        0.60
  Handling Signal                       8.88       81.58        0.37
TABLE X: Summary of HBenchOS benchmark results for
Graphene-SGX along with DynamoRIO and RATEL.
Reason category        # of unsuccessful binaries   Case examples
fork                   49                           strace, scp, lat_proc and lat_pipe from HBenchOS, etc.
execv                  1                            systemd-cat
signal                 5                            colormgr, cd-iccdump, bluetoothctl, etc.
Unsupported syscalls   6                            webapp-container, webbrowser-app, etc.
Out-of-memory          3                            shotwell, mcf from SPEC2006, lat_memsize from HBenchOS
TABLE VIII: Summary of the reasons for failure of all 64
unsuccessful binaries tested with RATEL.
Reason category            # of unsuccessful binaries   Case examples
NTFS related               16                           ntfs-3g, ntfs-3g.probe, ntfs-3g.secaudit, etc.
Printer related            7                            lp, lpoptions, lpq, lpr, lprm, etc.
Scanner related            2                            sane-find-scanner, scanimage
Failure in native run      5                            umax_pp, cd-create-profile, and bwaves from SPEC2006, etc.
Failure in DynamoRIO run   8                            ssh, ssh-keygen, dig, etc.
TABLE IX: Summary of the reasons for failure of all 38
unsuccessful binaries tested with native and DynamoRIO.
memory operations, context switches, and signal handling.
Memory-intensive operation latencies vary with benchmark
setting: (a) when the operations are done with more
iterations (in millions) and a smaller memory chunk size
(4 KB), the performance is comparable; (b) when the
operations are done with fewer iterations (1 K) and a larger
memory chunk size (4 MB), RATEL incurs overhead ranging from
117% to 651%. This happens because when the chunk size is
large, we need to allocate and de-allocate memory inside the
enclave for every iteration as well as copy large amounts of
data.
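As a rough illustration of how such overhead percentages relate to bandwidth measurements, the sketch below assumes overhead is defined as (baseline bandwidth / RATEL bandwidth - 1) x 100. This definition is an assumption, not one stated in the text, and the function name is hypothetical. The two sample rows are the DR and Ratel columns of Table X for the large-chunk setting.

```python
def bandwidth_overhead_pct(baseline_mb_s: float, measured_mb_s: float) -> float:
    """Percent overhead when the metric is bandwidth (higher is better)."""
    if measured_mb_s <= 0:
        raise ValueError("measured bandwidth must be positive")
    return (baseline_mb_s / measured_mb_s - 1.0) * 100.0


# (DR, Ratel) bandwidth in MB/s, large-chunk setting, taken from Table X.
rows = {
    "Mmapped Read": (7163.38, 3299.77),
    "Bzero Bandwidth": (31315.64, 4166.62),
}

overheads = {
    name: bandwidth_overhead_pct(dr, ratel) for name, (dr, ratel) in rows.items()
}
for name, pct in overheads.items():
    print(f"{name}: {pct:.2f}% overhead")
```

Under this assumed definition the two rows land close to the endpoints of the quoted range. Other rows of Table X give different values, so the match should not be read as confirming how the range was derived.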
The file operation latencies match the latencies we
observed in our I/O-intensive workloads (Figure 8c).
Specifically, the write operation incurs large overhead.
Hence, the create workload incurs 259% overhead because the
benchmark creates a file and then writes a predefined amount
of data to it. Costs of system calls that are executed as
OCALLs vary depending on the return value and type of the
system call. For example, system calls such as getpid, sbrk,
and sigaction that return integer values are much faster.
System calls such as getrusage and gettimeofday return
structures or nested structures. Thus, copying these
structures to and from the enclave causes much of the
performance slowdown. RATEL has a custom mechanism for
registering and handling signals (Section IV-E); it
introduces a latency overhead of 19.71% and 89.11%
respectively. Registering signals is cheaper because, unlike
handling a signal, it does not cause a context switch.
Further, after accounting for the OCALL costs, our custom
forwarding mechanism does not introduce any significant
slowdown.
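To make the structure-copy cost concrete, the following minimal sketch models copying a gettimeofday-style result from an untrusted buffer into enclave-private memory. It is not RATEL's code: the Timeval layout assumes a typical 64-bit Linux ABI, and copy_into_enclave is a hypothetical helper standing in for the marshalling step of an OCALL return.

```python
import ctypes


class Timeval(ctypes.Structure):
    """Layout of struct timeval on a typical 64-bit Linux ABI (assumed)."""

    _fields_ = [("tv_sec", ctypes.c_long), ("tv_usec", ctypes.c_long)]


def copy_into_enclave(untrusted: Timeval) -> Timeval:
    """Hypothetical helper: copy a syscall result into enclave-private memory."""
    private = Timeval()
    ctypes.memmove(
        ctypes.addressof(private),
        ctypes.addressof(untrusted),
        ctypes.sizeof(Timeval),
    )
    return private


# Stand-in for the buffer the untrusted side fills after the real syscall.
outside = Timeval(tv_sec=1_600_000_000, tv_usec=250)
inside = copy_into_enclave(outside)

# A later change to the untrusted buffer does not affect the private copy.
outside.tv_sec = 0
print(inside.tv_sec, inside.tv_usec, ctypes.sizeof(Timeval))
```

An integer-returning call such as getpid needs no such copy, whereas a structure-returning call needs at least one byte-wise copy per returned structure on top of the enclave transition itself.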
CPU-bound Workloads. RATEL incurs 40.24% overhead av-
eraged over 23 applications from SPEC 2006 [67] with respect
to DynamoRIO. Table IV shows the individual overheads for
each application with respect to all baselines. From Table IV
we observe that applications that incur a higher number of
page faults and OCALLs suffer larger performance slowdowns.
Thus, similar to other SGX frameworks, the costs of enclave
context switches and limited EPC size are the main bottlenecks
in RATEL.
IO-bound Workloads. RATEL performs OCALLs for file I/O
by copying read and write buffers to and from the enclave.
We measure the per-API latencies using the FSCQ suite for
file operations [19]. Table IV shows the costs of each file
operation and file access pattern. Apart from the cost of
the OCALL, writes are more expensive than reads in general;
the multiple copy operations in RATEL amplify the
performance gap between them. Next we use IOZone [51], a
commonly used benchmark to measure file I/O latencies.
Figure 8c shows the bandwidth over file sizes from 16 MB to
1024 MB and record sizes from 4 KB to 4096 KB for common
patterns. The trend of writes being more expensive holds
for IOZone too. RATEL incurs an average slowdown of 75%
over all operations, record sizes, and file sizes.
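The buffer copying described above can be sketched as a pair of stubs that stage data in an untrusted buffer on either side of the real system call. This is a simplified model of where the copies occur, not RATEL's implementation; ocall_write and ocall_read are hypothetical names, and ordinary Python buffers stand in for enclave and non-enclave memory.

```python
import os
import tempfile


def ocall_write(fd: int, enclave_buf: bytes) -> int:
    """Hypothetical OCALL stub: stage data outside, then issue the write."""
    staging = bytearray(enclave_buf)  # copy: enclave buffer -> untrusted staging
    return os.write(fd, staging)      # the kernel then copies from the staging buffer


def ocall_read(fd: int, length: int) -> bytearray:
    """Hypothetical OCALL stub: read outside, then copy the result inward."""
    staging = os.read(fd, length)     # kernel -> untrusted staging buffer
    return bytearray(staging)         # copy: untrusted staging -> enclave buffer


payload = b"x" * 4096
fd, path = tempfile.mkstemp()
try:
    written = ocall_write(fd, payload)
    os.lseek(fd, 0, os.SEEK_SET)
    data = ocall_read(fd, len(payload))
finally:
    os.close(fd)
    os.remove(path)
print(written, len(data))
```

Each direction pays for one extra user-space copy in addition to the transfer the kernel already performs, which is one way to picture why larger record sizes increase the per-call cost.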
Multi-threaded Workloads. We use the standard Parsec-
SPLASH2 [57] benchmark suite. It comprises a variety of
high performance computing (HPC) and graphics applications.
We use it to benchmark RATEL overheads for multi-threaded
applications. Since some of the programs in Parsec-SPLASH2
mandate the thread count to be a power of 2 (e.g.,
ocean_ncp), we fixed the maximum number of threads in our
experiment to 16. RATEL changes the existing SGX design to
handle thread creation and synchronization primitives, as
described in Sections IV-C and IV-D. We measure the effect
of this specific change on the application execution by
configuring the enclave to use a varying number of threads
between 1 and 16.
Figure 8b shows a performance overhead of 832%, on
average, across all benchmarks and thread configurations. For
single-threaded execution, RATEL causes an overhead of 28%.
With 2, 4, 8, and 16 threads, the overhead rises to 416%,
910%, 1371%, and 1438% respectively. We measure the
breakdown of costs and observe that, on average: (a)
creating each thread contributes a fixed cost of 57 ms; (b)
shared access to variables becomes more expensive, by a
factor of 1 to 7 compared to the elapsed time of futex
synchronization, as the number of threads increases. This is
expected because synchronization is cheaper in DynamoRIO
execution, which uses unsafe futex primitives exposed by the
kernel. On the other hand, RATEL uses the expensive spinlock
mechanism exposed by SGX hardware for security. In
particular, some individual benchmarks, such as
water_spatial, fmm, and raytrace, involve many lock
contention events and have an extremely high frequency of
spinlock calls (e.g., for raytrace with 8 threads, spinning
takes about 423,000 ms in RATEL, while futex calls take
about 500 ms in DynamoRIO). Thus, they incur large overheads
in synchronization.
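The difference between spinning and blocking synchronization can be illustrated with a small sketch. The SpinLock class below is a hypothetical busy-waiting lock built on a non-blocking try-acquire; it is not the SGX spinlock, and threading.Lock is used only as a stand-in for a blocking primitive that lets a waiting thread sleep instead of consuming CPU.

```python
import threading


class SpinLock:
    """Busy-waiting lock built on a non-blocking try-acquire (illustrative only)."""

    def __init__(self) -> None:
        self._flag = threading.Lock()
        self.spins = 0  # failed attempts; approximate, updated without synchronization

    def acquire(self) -> None:
        while not self._flag.acquire(blocking=False):
            self.spins += 1  # keep retrying instead of sleeping

    def release(self) -> None:
        self._flag.release()


def run_workers(lock, n_threads: int, n_iters: int) -> int:
    """Increment a shared counter under `lock`; return the final value."""
    counter = 0

    def worker() -> None:
        nonlocal counter
        for _ in range(n_iters):
            lock.acquire()
            try:
                counter += 1
            finally:
                lock.release()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


spin = SpinLock()
spin_total = run_workers(spin, n_threads=4, n_iters=2000)
blocking_total = run_workers(threading.Lock(), n_threads=4, n_iters=2000)
print(spin_total, blocking_total, spin.spins)
```

Both locks protect the counter correctly; the difference is that every failed attempt on the spinning lock is CPU time spent waiting, which grows with contention and thread count.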
D. Performance: Real-world Case Studies
We work with 4 representative real-world applications: a
database server (SQLite), a command-line utility (cURL), a
machine learning inference-as-a-service framework (Privado),
and a key-value store (memcached). These applications have
been used in prior work [64].
SQLite is a popular database [68]. We select it as a case
study because of its memory-intensive workload. We configure
it as a single-threaded instance. We use a database benchmark
workload [43] and measure the throughput (ops/sec) for each
database operation with varying sizes of the database (total
number of entries). Table IV shows the detailed breakdown
of the runtime statistics for a database with 10,000 entries.
Figure 7c shows the average throughput over all operations.
With RATEL, we observe a throughput loss of 25.14% on
average over all database sizes. The throughput loss
increases with the database size. The drop is noticeable at
500K entries, where the database size crosses the maximum
enclave size threshold and results in a significant number
of page faults. This result matches observations from other
SGX frameworks that report SQLite performance [3].
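For readers who want to reproduce the flavor of this measurement, the sketch below times bulk inserts into an in-memory SQLite database and reports operations per second for two database sizes. It is a hypothetical harness, not the benchmark workload [43] used above; the table schema, the function name, and the chosen sizes are assumptions made for illustration.

```python
import sqlite3
import time


def insert_throughput(n_entries: int) -> float:
    """Insert n_entries rows into an in-memory database; return ops/sec."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE kv (k INTEGER PRIMARY KEY, v TEXT)")
        start = time.perf_counter()
        with conn:  # one transaction around all inserts
            conn.executemany(
                "INSERT INTO kv (k, v) VALUES (?, ?)",
                ((i, f"value-{i}") for i in range(n_entries)),
            )
        elapsed = time.perf_counter() - start
        (count,) = conn.execute("SELECT COUNT(*) FROM kv").fetchone()
        if count != n_entries:
            raise RuntimeError("unexpected row count")
        return n_entries / elapsed
    finally:
        conn.close()


results = {n: insert_throughput(n) for n in (1_000, 10_000)}
for n, ops in results.items():
    print(f"{n} entries: {ops:.0f} inserts/sec")
```

Running the same harness natively and under an instrumented or enclave setting, and comparing the ops/sec values per database size, gives the kind of throughput-loss curve discussed above.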
cURL is a widely used command-line utility to download data
from URLs [24]. It is network intensive. We test it with
RATEL via the standard library test suite. Table IV shows a
detailed breakdown of execution time on RATEL. We measure
the cost of executing cURL with RATEL for downloading files
of various sizes from an Apache (2.4.41) server on the local
network. Figure 7a shows the throughput for various baselines
and file sizes. On average, RATEL causes a throughput loss of
142.27% compared to DynamoRIO. For all baselines, small
files (below 100 MB) have smaller download times; larger
files naturally take longer. This can be explained by the
direct copying of packets to non-enclave memory, which does
not add any memory pressure on the enclave. The only
remaining bottleneck is the cost of dispatching OCALLs, which
increases linearly with the requested file size.
Privado is a machine learning framework that provides secure
inference-as-a-service [33]. It comprises several
state-of-the-art models available as binaries that can
execute on an input image to predict its class. The binaries
are CPU intensive and have sizes ranging from 313 KB to
140 MB (see Table IV). We execute models from Privado on all
the images from the corresponding image dataset (CIFAR or
ImageNet) and measure inference time. Figure 7b shows the
performance of baselines and RATEL for 9 models in increasing
order of binary size. We observe that RATEL performance
degrades with increase in binary size. This is expected
because the limited enclave physical memory leads to page
faults. Hence, the largest model (140 MB) exhibits the
highest inference time and the smallest model (313 KB)
exhibits the lowest inference time. Thus, RATEL and enclaves
in general can add significant overheads, even for
CPU-intensive server workloads, if they exceed the working
set size of 90 MB.
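A back-of-the-envelope calculation shows why the largest model is affected and the smallest is not. The sketch below assumes 4 KB pages, reads KB and MB as binary units, and treats the binary size alone as the working set, which ignores heap, stack, and the translation engine itself; the helper names are hypothetical.

```python
PAGE_SIZE = 4096           # bytes per page (4 KB pages assumed)
BUDGET_BYTES = 90 * 2**20  # the 90 MB working-set figure, read as MiB (assumption)


def pages_needed(size_bytes: int) -> int:
    """Number of pages needed to hold size_bytes (integer ceiling division)."""
    return -(-size_bytes // PAGE_SIZE)


def pages_over_budget(size_bytes: int) -> int:
    """Pages that cannot be resident at once if only the budget fits."""
    return max(0, pages_needed(size_bytes) - BUDGET_BYTES // PAGE_SIZE)


smallest = 313 * 2**10  # 313 KB model binary
largest = 140 * 2**20   # 140 MB model binary

for label, size in (("smallest", smallest), ("largest", largest)):
    print(label, pages_needed(size), pages_over_budget(size))
```

Under these assumptions the smallest binary fits with room to spare, while the largest needs 35,840 pages against a budget of 23,040, leaving 12,800 pages that must be swapped in and out during execution.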
Memcached is an in-memory key-value cache. We evaluate
it with four popular YCSB workloads: A (50% read and
50% update), B (95% read and 5% update), C (100% read), and
D (95% read and 5% insert). We run it with the default 4
worker threads in the DynamoRIO and RATEL settings. We vary
the YCSB client threads with Load and Run operations (to
load the data and then run the workload tests, respectively).
We fix the data size to 1,000,000 with a Zipfian distribution
of key popularity. We increase the number of clients from 1
to 100 to find the saturation point of each targeted/scaled
throughput for the two settings. Here, we only present
workload A (throughput vs. average latency for read and
update); the other workloads display similar behavior.
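The workload mixes and key popularity described above can be sketched as a small trace generator. This is a simplified stand-in for YCSB, not its implementation: the Zipfian exponent of 0.99 and the function and variable names are assumptions, and the key space is kept small so the example runs quickly.

```python
import random
from collections import Counter

# Operation fractions for the four workloads described above.
WORKLOADS = {
    "A": {"read": 0.50, "update": 0.50},
    "B": {"read": 0.95, "update": 0.05},
    "C": {"read": 1.00},
    "D": {"read": 0.95, "insert": 0.05},
}


def generate_ops(workload: str, n_ops: int, n_keys: int,
                 theta: float = 0.99, seed: int = 0):
    """Return a list of (operation, key_rank) pairs with Zipfian key popularity."""
    rng = random.Random(seed)
    mix = WORKLOADS[workload]
    ranks = range(1, n_keys + 1)
    # Popularity of the key with a given rank is proportional to 1 / rank**theta.
    weights = [1.0 / rank**theta for rank in ranks]
    keys = rng.choices(ranks, weights=weights, k=n_ops)
    ops = rng.choices(list(mix), weights=list(mix.values()), k=n_ops)
    return list(zip(ops, keys))


trace = generate_ops("A", n_ops=10_000, n_keys=1_000)
op_counts = Counter(op for op, _ in trace)
key_counts = Counter(key for _, key in trace)
print(op_counts, key_counts.most_common(3))
```

With a skewed distribution like this, a small set of hot keys receives most requests, which is the regime in which lock contention on shared cache structures becomes visible.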
As shown in Figure 7d, the client latencies of the two
settings for a given throughput are similar until
approximately 10,000 ops/sec. Then RATEL jitters until it
achieves its maximum throughput of around 17,000 ops/sec,
while DynamoRIO is flat until 15,000 ops/sec (its maximum is
21,000 ops/sec). The shared reason for the slowdown in both
settings is that DynamoRIO slows down Read and Update
operations. For RATEL, the additional bottleneck is the high
frequency of lock contention with the spin-lock primitive.
For example, spin-lock synchronization in RATEL costs
18,320,000 ms, while DynamoRIO's futex calls cost only around
500 ms, for a given throughput of 10,000 ops/sec with 10
clients.
