Abstract: Recent research has demonstrated that Intel's SGX is vulnerable to various software-based side-channel attacks. In particular, attacks that monitor CPU caches shared between the victim enclave and untrusted software enable accurate leakage of secret enclave data. Known defenses assume developer assistance, require hardware changes, impose high overhead, or prevent only some of the known attacks. In this paper we propose data location randomization as a novel defensive approach to address the threat of side-channel attacks. Our main goal is to break the link between the cache observations by the privileged adversary and the actual data accesses by the victim. We design and implement a compiler-based tool called DR.SGX that instruments enclave code such that data locations are permuted at the granularity of cache lines. We realize the permutation with the CPU's cryptographic hardware-acceleration units providing secure randomization. To prevent correlation of repeated memory accesses we continuously re-randomize all enclave data during execution. Our solution effectively protects many (but not all) enclaves from cache attacks and provides a complementary enclave hardening technique that is especially useful against unpredictable information leakage.
I. INTRODUCTION
Intel Software Guard Extensions (SGX) [15], [32] enables execution of security-critical application code, called enclaves, in isolation from the untrusted system software. In particular, SGX was designed to ensure confidentiality of enclave data and integrity of enclave execution. Protections in the processor prevent a malicious OS from directly reading or modifying enclave memory at runtime. Processors are also equipped with keys that allow encryption for persistent storage (sealing) and remote verification of enclave software configuration (attestation). SGX enables development of applications and online services with improved security; the architecture is especially useful in cloud computing applications, where data and computation can be outsourced to an external computing infrastructure without having to fully trust the cloud provider and its entire software stack.
a) Information leakage: Recent research has, however, demonstrated that SGX isolation can be attacked by exploiting information leakage through various software-based side channels. In SGX, memory management, including paging, is left to the untrusted OS [15]. Consequently, the OS can force page faults at any point of enclave execution and from the requested pages learn (coarse-grained) enclave control flow information or data access patterns [79]. Confidential enclave data can also be inferred by monitoring CPU caches that are shared between the enclave and untrusted software [9], [62], [49], [27]. Compared to page-fault attacks, cache monitoring enables more fine-grained information leakage at the granularity of a single cache line (64 bytes on current Intel CPUs). Side-channel attacks are a serious concern as they violate data confidentiality and thus defeat one of the main benefits of SGX: the ability to compute over private data on an untrusted (cloud) platform.
In this paper we focus on information leakage through shared caches. The problem of cache attacks has been extensively studied independently of SGX and several countermeasures have been proposed. One common approach is side-channel resilient software implementation, where the developer manually hardens the application code. For example, the scatter-and-gather technique [11] is often used to protect cryptographic implementations in which a look-up table is accessed based on a secret key. Another common approach is Oblivious RAM (ORAM) [68] and Oblivious Execution [46], [44], [43]. These techniques hide any data (or code) access by using data encryption and repeated shuffling. Unfortunately, without specialized hardware the performance overhead of such approaches is extremely high. Also, new processor and cache architectures have been proposed to prevent information leakage. For example, Sanctum [16] partitions the L3 cache and RPCache [73] randomizes cache eviction patterns. Finally, recent research has proposed SGX-specific side-channel defenses. T-SGX [65] and Déjà Vu [13] use transactional memory features to prevent attacks that repeatedly interrupt the victim enclave. Cloak [28] uses transactional memory to render an attacker's cache observations oblivious: all cache lines are accessed before sensitive memory content is accessed. Cloak relies on the SGX enclave developer to annotate sensitive memory. Raccoon [58] attempts to hide accesses to developer-annotated data.
b) Our goals and approach: The goal of our work is to develop a generic and practical countermeasure against SGX cache attacks. In particular, we focus on information leakage due to secret-dependent data accesses, as those can effectively leak fine-grained information such as complete cryptographic keys [62], [49], [27] or genomic data used for person identification [9].
Our solution should meet the following high-level requirements: as enclaves are often written by developers who are not security experts, the defense should not rely on the developer (in contrast to manual hardening and code annotation); the performance overhead should be moderate for various practical uses (in contrast to ORAM and oblivious execution); the defense should protect enclaves in the current SGX architecture (in contrast to new hardware designs); and the solution should apply to any cache attack strategy (in contrast to point-defenses that, e.g., only detect attacks based on repeated interrupts).
We assume the common SGX adversary model, where the adversary controls the OS and any other system software. The adversary uses cache monitoring techniques, such as Prime+Probe [53] , to obtain a trace of the victim enclave's memory accesses. Although all demonstrated cache attacks exhibit significant noise in practice, we consider a powerful adversary that is able to obtain a perfect memory access trace at the granularity of cache lines. The adversary can monitor any cache level (L1, L2 or L3) and run the victim repeatedly.
Our main idea is to transform the enclave code such that all enclave data locations are randomized at fine granularity. The enclave picks a secret randomization key and on every data access computes a permutation for the accessed memory address based on the key. As a result, the adversary cannot map the observed (permuted) memory address to the actually used address. Because all enclave data is randomized without the need to understand its structure or semantics, we call our approach semantic-agnostic data randomization.
Randomization is a well-known hardening technique, but our approach is different from the existing solutions that typically randomize application code by leveraging its known structure (e.g., randomization at the granularity of functions or code blocks). Due to the well-known difficulty of C and C++ code analysis and pointer tracking, comparable semantic and structural information is not available for data [7]. Indeed, ASLR systems like SGX-Shield [64] do not randomize enclave data and therefore do not prevent recent cache attacks [9], [62], [49], [27]. We propose a conceptually different approach that allows randomization of enclave data regardless of its semantics.
c) Challenges and results:
The secure and practical realization of our approach imposes technical challenges. The first is secure and efficient permutation computation under adversarial cache monitoring. If the adversary is able to derive information from the process of address permutation, he can revert our randomization. The second challenge is the performance impact imposed by the randomization. Computing a permutation for every data access is expensive and causes a high runtime overhead. The third main challenge is information leakage through repeated memory accesses. Although an individual access is effectively hidden from the adversary, repetitive access patterns allow (permuted) address correlation and thus leak information.
We design and implement a compiler-based tool called DR.SGX (Data Location Randomization for SGX) that instruments enclave code at compile time such that all memory locations used to store enclave data are permuted at cache-line granularity during run time. The key technique of our tool is a secure permutation mechanism that computes the permutation as small-domain encryption [6] using the CPU's cryptographic hardware-acceleration units (AES-NI). Therefore, the permutation process itself leaks no information to an adversary that monitors the cache. To increase performance, we implement a "permutation cache" that stores recently used address permutations. To increase the security of our solution against correlation attacks due to repeated memory accesses, we re-randomize enclave data probabilistically and gradually during enclave execution. The randomization rate is an adjustable parameter: more aggressive re-randomization hides repeated memory access patterns better at the cost of imposing a higher overhead.
We evaluate the performance of our tool and find that its runtime overhead ranges from 3.39× to 9.33× depending on the re-randomization rate. While this overhead is substantial, we argue that it is still practical for many applications. In comparison, state-of-the-art oblivious execution systems that leverage customized hardware cause overheads from 5× to 15× [43] , [46] . Our solution is software only.
We analyze the security of our solution and show that it effectively prevents cache attacks on many types of enclaves. However, as our solution re-randomizes enclave data locations gradually and probabilistically (to improve performance), it cannot eliminate all information leakage (similar to ORAM and oblivious execution) and prevent all attacks. In particular, correlation attacks on enclaves, where secret-dependent data accesses immediately follow predictable access patterns, are a limitation of our solution (see Section VII for details).
We see our approach as complementary to known side-channel countermeasures. ORAM and oblivious execution provide strong protection at high cost. Manual hardening allows elimination of known attacks. Our solution enables hardening of enclaves against unpredictable information leakage (e.g., new attack vectors or sensitive enclave data that developers would fail to mark) with practical and adjustable overhead. DR.SGX can be used as the only or as an additional enclave hardening technique, depending on the deployment scenario.
d) Contributions: To summarize, this paper makes the following main contributions:
• Novel approach. We propose a novel defensive approach called semantic agnostic data randomization as a countermeasure against cache-based side-channel attacks on SGX.
• New tool. We design and implement a compiler-based tool called DR.SGX that instruments the code to permute an enclave's data memory at cache-line granularity and re-randomizes it gradually.
• Evaluation. We evaluate the performance of our system and find its overhead practical. We analyze the security of our solution and show that it prevents many cache attacks effectively.
The rest of the paper is organized as follows. In Section II we provide background information and Section III defines our problem. Section IV presents our approach and system design, and Section V details our implementation. We evaluate DR.SGX's performance in Section VI and analyze its security in Section VII. Section VIII provides discussion, Section IX reviews related work, and Section X concludes the paper.
II. BACKGROUND
A. Intel SGX
SGX introduces a set of new CPU instructions for creating and managing software components, called enclaves, that are isolated from all other software running on the system, including privileged software like the operating system (OS) and the hypervisor [48], [34]. SGX assumes the CPU itself to be the only trustworthy hardware component of the system, i.e., enclave data is handled in plain-text only inside the CPU. Data is stored unencrypted in the CPU's caches and registers. However, whenever data is moved out of the CPU, e.g., into DRAM, it is encrypted and integrity protected.
The OS, although untrusted, is responsible for creating and managing enclaves. It allocates memory for the enclaves, manages virtual to physical address translation for the enclave's memory and copies the initial data and code into the enclave. However, all initialization actions of the OS are recorded securely by SGX and can be verified by an external party through (remote) attestation [3] . SGX's sealing capability enables persistent secure storage of data.
B. Cache Architecture
In the following we provide details of the Intel x86 cache architecture [35] , [33] using terminology from Intel documents [2] . We focus on the Intel Skylake processor generation, i.e., the type of CPU we used for our implementation and evaluation. 1 Memory caching "hides" the latency of memory accesses to the system's dynamic random access memory (DRAM) by keeping a copy of currently processed data in cache. When a memory operation is performed, the cache controller checks whether the requested data is already cached, and if so, the request is served from the cache, called a cache hit, otherwise cache miss. Due to higher cost (production, energy consumption), caches are orders of magnitude smaller than DRAM and only a subset of the memory content can be present in the cache at any point in time.
For each memory access the cache controller has to check if the data is present in the cache. Sequentially iterating through the entire cache would be very expensive. Therefore, the cache is divided into cache lines, and for each memory address the corresponding cache line can be determined quickly: the lower bits of a memory address select the cache line. Hence, multiple memory addresses map to the same cache line.
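The address-to-cache mapping described above can be illustrated with a minimal sketch. It assumes 64-byte lines and a hypothetical cache with 64 index positions (hardware documentation usually calls these sets); the constant `NUM_SETS` and the function name `set_index` are illustrative assumptions and are not taken from the paper.

```python
LINE_SIZE = 64   # bytes per cache line (as stated for current Intel CPUs)
NUM_SETS = 64    # hypothetical number of index positions, a power of two


def set_index(addr: int) -> int:
    """Return the cache index selected by the lower address bits.

    The lowest 6 bits are the byte offset inside the line; the next
    log2(NUM_SETS) bits select where in the cache the line is placed.
    """
    return (addr // LINE_SIZE) % NUM_SETS


def collide(addr_a: int, addr_b: int) -> bool:
    """Two addresses compete for the same cache position if their index matches."""
    return set_index(addr_a) == set_index(addr_b)
```

Under these assumptions, two addresses that are `LINE_SIZE * NUM_SETS` bytes apart map to the same position, which is the property that cache-monitoring techniques rely on.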
The current Intel CPUs have a three level hierarchy of caches. The last level cache (LLC), also known as level 3 (L3) cache, is the largest and slowest cache; it is shared between all CPU-cores. Each CPU core has a dedicated L1 and L2 cache, but they are shared between the core's simultaneous multi-threading (SMT) execution units (also known as hyperthreading).
C. SGX Side-Channel Attacks
Since the early days of SGX it has been speculated that it would be susceptible to side-channel attacks [15]. Intel acknowledges the possibility of side-channel attacks on SGX enclaves [36], but considers them out of scope of the SGX adversary model and delegates the responsibility to protect against them to the enclave developer: "Intel SGX does not provide explicit protection from side-channel attacks. It is the enclave developer's responsibility to address side-channel attack concerns." [36].
Intel provides two recommendations in the SGX Developer Guide [36] on how to counter side-channel attacks on enclaves. A more general recommendation is that "[...] the application enclave should use an appropriate crypto implementation that is side channel attack resistant inside the enclave if side-channel attacks are a concern." [36]. A second recommendation is with regard to the side-channel attack demonstrated by Xu et al. [79]: "[...] aligning specific code and data blocks to exist entirely within a single page" [36]. However, these recommendations are not sufficient when considering cache-based side-channel attacks on non-cryptographic implementations [10], [9].
Cache-based side-channel attacks can be classified with respect to which memory content is targeted. On one hand, code accesses can be observed to identify secret-dependent execution paths. On the other hand, data accesses can be targeted to identify secret-dependent data object usage. Both types of attacks have been shown to successfully extract information from SGX enclaves. Lee et al. [41] demonstrated an attack targeting execution paths, for which they also presented a defense approach. However, the majority of attacks target data accesses [10], [9], [62], [49], [27]. Another classification is based on the victim enclave type. The demonstrated attacks target either cryptographic algorithms [62], [49], [27] or privacy-sensitive processing such as genomic indexing [10], [9].
III. PROBLEM STATEMENT
In this work we focus on systems that provide an isolated execution environment that is implemented as an execution mode of the main CPU. In particular, the CPU's shared resources, like caches, are used by all execution modes of the CPU and thus are shared between isolation domains. Our work is targeted towards Intel SGX; however, the model also applies to other architectures like ARM TrustZone [4] or software-based isolation solutions [47].
The attacker's capabilities to launch side-channel attacks might vary for those architectures. We overestimate the adversary's powers by assuming he can extract a "perfect trace" of cache events, i.e., he can identify every single cache access of the program under attack. We detail our adversary model below (Section III-A).
We consider enclaves that process sensitive data in a hostile environment. Sensitive data in this context is not limited to cryptographic keys, which are the "classical" targets of side-channel attacks. Instead, sensitive data has to be understood much more broadly, for instance, when processing privacy-sensitive data in the cloud. Brasser et al. [9] demonstrated that enclaves processing genomic data can be attacked using side channels. Basically, all enclaves will process sensitive data of some kind, which is what motivates the developer to utilize the isolation properties of SGX in the first place.
A. Adversary Model
The adversary's goal is to extract sensitive information from an isolated execution environment (or enclave) through a cache side-channel attack. The adversary can freely configure and modify all software of the system, including privileged software like the operating system (OS). He knows the initial memory layout of the enclave, i.e., the code and initial data of the enclave. Furthermore, we assume that the adversary can initiate the enclave arbitrarily often. If the enclave interacts with an external entity, we assume the adversary can simulate that entity, e.g., by replaying data when needed.
However, the adversary cannot directly access the memory of the enclave. The internal processor state (e.g., the CPU registers) is inaccessible to the adversary; in the event of an interrupt the state is securely stored in an isolated memory region. He cannot modify the code or initial data of the enclave, i.e., the integrity of the enclave is ensured. 2
For this work we focus on cache side channels based on secret-dependent data accesses in memory. Memory accesses to code can leak sensitive data as well; however, we limit the scope of this paper to data memory accesses only and discuss code randomization in Section IX.
We assume that the adversary has a noise-free cache side-channel and can obtain a "perfect cache trace" of the enclave. This means that he can observe all memory accesses of an enclave, e.g., using a cache attack technique such as Prime+Probe [53] , where the adversary infers information about the victim's memory accesses by monitoring evictions in its own cache lines (the attack works because the victim's and the attacker's memory compete for the same cache lines). He can precisely determine which cache line has been used by the enclave and also the order in which the cache lines have been accessed. 3 The adversary cannot extract information which is more fine grained than accesses to cache lines, i.e., the offset inside a cache line is not observable to him.
More formally, the perfect trace is an ordered list Λ of address prefixes accessed by the target enclave. An address prefix is the part of an address that determines the cache line it gets mapped to. On current Intel CPUs the cache line size is 64 bytes, thus the lowest six bits of an address are hidden from the adversary. 4
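To make the definition concrete, the following minimal sketch derives the trace Λ from a list of accessed addresses by dropping the six offset bits. The function names `address_prefix` and `perfect_trace` are illustrative assumptions, not terms from the paper.

```python
OFFSET_BITS = 6  # 64-byte cache lines: the lowest six bits are the in-line offset


def address_prefix(addr: int) -> int:
    """Return the part of the address that identifies the cache line."""
    return addr >> OFFSET_BITS


def perfect_trace(accessed_addresses):
    """Ordered list of address prefixes, i.e. what the modeled adversary observes."""
    return [address_prefix(addr) for addr in accessed_addresses]
```

Two accesses inside the same 64-byte line, such as `0x1000` and `0x103F`, yield the same prefix and are therefore indistinguishable in the trace, while `0x1040` falls into the next line.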
B. Design Goals
General statements about which memory accesses of a program could leak information are hard to make in practice. All memory accesses must be assumed to potentially leak information if the attacker can associate them with relevant data elements or structures. For the adversary it is sufficient to distinguish two memory locations to learn one bit of information. Those memory locations could be two different data structures, e.g., two variables, or different elements within the same data structure, e.g., different entries in a table. To protect all possible programs, both the data structures of a program and the elements within data structures need to be randomized.
The goal of our work is to provide a protection mechanism against side-channel attacks that can be applied to arbitrary enclave programs without involvement of their developers. In particular, the developer must not be required to follow any rules or guidelines when programming the application or to add annotations to the source code. This is important for seamless integration of a side-channel defense into existing development processes. While annotating "critical" data generally helps improve the performance of most solutions, it is also error-prone for non-security experts. Especially in non-cryptographic applications it is not always obvious which accesses to data objects might leak sensitive information.
C. Limitations of Software-only Side-Channel Defenses
Various side-channel defenses have been proposed in the past. One approach is to modify software such that accesses to data and/or code are not secret-dependent, for instance using the scatter-gather technique [11]. This approach eliminates the root cause of side-channel vulnerabilities. However, it requires extensive manual effort and is therefore only applied to a small set of software programs (or libraries). Cryptographic algorithms are natural candidates for this defense approach; for example, the NaCl library aims at providing side-channel resilient implementations [1]. Xiao et al. [78] suggest that most cryptographic libraries, like OpenSSL and mbedTLS, are vulnerable to side-channel attacks in the adversary model of SGX.
Another approach proposed in the past is based on the detection of an ongoing side-channel attack. The effects of the attack on the cache are detected, e.g., by monitoring cache events using performance counter monitors (PCM) in hardware [14], [81]. This approach requires either a software entity like the OS to run the protection (and the OS is untrusted in the context of SGX), or PCM access for the victim application (which an enclave does not have).
In the context of SGX two defenses have been proposed in response to the attack by Xu et al. [79]. T-SGX [65] and Déjà Vu [13] allow an enclave to detect when it has been interrupted, which is an effect of many side-channel attacks [79], [49], [41]. However, Brasser et al. [9] showed that a side-channel attack on an SGX enclave can be performed without interrupting it; thus, these defenses cannot protect against all known side-channel attacks.
Oblivious RAM [68] provides strong guarantees and could thwart side-channel attacks. However, ORAM considers a client-server model in which the client can keep metadata securely. In the context of SGX the client would be the enclave, while the RAM needs to be considered the untrusted server. The enclave has no secure storage and must store the metadata in RAM, where an adversary can observe accesses to the metadata as well. Therefore, the metadata itself needs to be accessed and processed in an oblivious fashion, making ORAM not easily applicable to SGX. Furthermore, hiding all data accesses during program execution using ORAM is costly. Oblivious execution implementations supported by specialized hardware impose overheads from 5× to 15× [43], [46].
IV. OUR APPROACH
Our core idea is to break the link between side-channel observations made by an attacker and the sensitive information processed by the victim. Side-channel attacks inherently rely on the fact that the attacker has knowledge about correlation between an observable effect and the data he aims to extract. For instance, in a cache-monitoring attack the adversary observes (indirectly through the cache) which memory locations are accessed by the victim. From the memory locations the adversary infers the data elements that were accessed by the victim. The individual data elements in turn are linked to the sensitive data the attacker is interested in. Our defense obfuscates the link between memory locations and data elements. Data elements are located at randomized memory locations, so that the adversary cannot know from an observed memory access location which data element was accessed. The adversary no longer learns which data element was accessed but only learns that some data element was accessed. Our goal is to improve enclaves' security against side-channel attacks.
A. Requirements and Challenges
Below we describe the main challenges to be tackled when implementing randomization-based side-channel defenses.
1) Semantic Gap:
As discussed earlier our goal is to provide side-channel protection without involvement of the developer, e.g., source code annotations. Randomization of data without support by a program's developer is a challenging task due to the semantic gap that is inherent to unsafe languages like C and C++. Currently C and C++ are the only programming languages officially supported in the software development kit (SDK) Intel provides for the development of SGX enclaves.
2) Re-randomization: Randomizing the memory layout of a program once is not sufficient to prevent an adversary from learning which data has been accessed. The adversary can determine the relation between memory locations and data objects based on various information. For instance, if a particular data element is accessed at a predictable point in the program's execution, the adversary learns the randomized location of that object by observing memory accesses at this point. Similarly, access frequency can reveal the randomized location of data elements: if a particular object is accessed a known number of times, the adversary can identify the object by finding the memory location that was accessed the correct number of times.
To thwart the adversary in recovering the randomized memory location of data objects their locations need to be changed throughout the runtime. Furthermore, the adversary must not be able to link individual memory accesses to data objects. In particular, the n-th memory access must not be related to a particular data object.
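The frequency argument above can be illustrated with a small simulation. It is a sketch under simplifying assumptions: a single static permutation of cache lines (here a plain Python list, standing in for any one-time randomization), a noise-free trace, and hypothetical helper names `observe` and `recover_location`.

```python
from collections import Counter


def observe(original_trace, perm):
    """Map the victim's accesses (original line numbers) to the permuted
    line numbers that the adversary sees under a static permutation."""
    return [perm[line] for line in original_trace]


def recover_location(observed_trace, known_count):
    """Return all permuted lines that were accessed exactly known_count times.

    If the adversary knows that one object is accessed known_count times and
    only one line matches, the randomized location of that object is revealed.
    """
    counts = Counter(observed_trace)
    return [line for line, count in counts.items() if count == known_count]
```

For example, if the victim touches one line five times and every other line once, `recover_location(observed, 5)` returns exactly the permuted position of that line, regardless of which static permutation was chosen. Changing the permutation during execution is what removes this stable count.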
3) (Re-)randomization under Adversary Surveillance: All memory-related actions of the program can be observed by the adversary, including those required during the initial data randomization and during the re-randomization process. The initial (un-randomized) memory layout is known to the adversary and he can monitor memory events while data is copied to its randomized locations. Similarly, once the adversary has recovered information about the randomized location of data, he could link it across re-randomization operations. Therefore, the randomization has to be done in a way whose effects are not observable by the adversary.
B. DR.SGX
Our solution, a compiler-based tool called DR.SGX, addresses the design goals and challenges described above by randomizing all program data at fine granularity and re-randomizing the data continuously throughout the run time of the program. Figure 1 shows the high-level design of DR.SGX. The un-randomized memory layout of the example code shown on the left allows the adversary to distinguish, for instance, memory accesses to the first element of array a (a[0]) and the last element (a[31]). As discussed in Section III, the adversary can distinguish memory accesses at cache line granularity, i.e., he cannot identify individual elements within a single cache line. In the example, the variables p, v and i all reside in the same cache line.
DR.SGX performs randomization at granularity of cache lines. As shown in Figure 1 , a random permutation function π is used to reorder the program's data in memory. The randomization is based on secret values which are generated and only accessible inside the enclave and only processed by cache-monitoring resilient algorithms.
The size of the memory region that holds the data of an enclave can be adjusted by selecting parameters of the permutation function π (Section V details those parameters).
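A keyed permutation over cache-line indices can be sketched as follows. This is not the paper's construction: instead of small-domain encryption on AES-NI, the sketch assumes a balanced Feistel network with HMAC-SHA256 as round function, chosen only because it is available in the Python standard library. The names `permute_line` and `permute_address` and the parameter `index_bits` (which fixes the region size to 2^index_bits cache lines) are hypothetical, and Python code gives no timing or cache-access guarantees.

```python
import hashlib
import hmac

OFFSET_BITS = 6  # 64-byte cache lines


def _round_value(key: bytes, rnd: int, half: int, half_bits: int) -> int:
    """Pseudo-random round function (HMAC-SHA256 stand-in, an assumption)."""
    msg = bytes([rnd]) + half.to_bytes(8, "big")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") & ((1 << half_bits) - 1)


def permute_line(key: bytes, line: int, index_bits: int, rounds: int = 8) -> int:
    """Map a line index in [0, 2**index_bits) to a permuted line index.

    A Feistel network is a bijection for any round function, so distinct
    lines never collide.
    """
    if index_bits % 2 != 0 or not 0 <= line < (1 << index_bits):
        raise ValueError("index_bits must be even and line must be in range")
    half_bits = index_bits // 2
    mask = (1 << half_bits) - 1
    left, right = line >> half_bits, line & mask
    for rnd in range(rounds):
        left, right = right, left ^ _round_value(key, rnd, right, half_bits)
    return (left << half_bits) | right


def permute_address(key: bytes, addr: int, index_bits: int) -> int:
    """Permute the cache line of addr while keeping the in-line offset."""
    line = addr >> OFFSET_BITS
    offset = addr & ((1 << OFFSET_BITS) - 1)
    return (permute_line(key, line, index_bits) << OFFSET_BITS) | offset
```

Keeping the offset unchanged reflects randomization at cache-line granularity: data within one line moves together, and only the line's position changes.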
1) Memory Access Instrumentation: DR.SGX performs randomization at cache-line granularity for two reasons: (a) randomizing at finer granularity provides no advantage in security, and (b) randomizing in a data-structure-aware fashion is impractical due to the semantic gap. DR.SGX randomization requires that all memory accesses are instrumented; this is done in a compiler pass. The program code determines the memory location (i.e., memory address) of the data in the original, un-randomized layout. Then, before the access is performed, the randomized location of that address is calculated. The data is then accessed at its new, randomized location.
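The effect of this instrumentation can be modeled with a small sketch in which every load and store first translates the original address. The class `RandomizedMemory` and its methods are hypothetical. As a loudly labeled simplification, the permutation is a shuffled lookup table; such a table lookup would itself be observable through the cache, which is why the design computes the permutation with cache-monitoring resilient means instead.

```python
import random

LINE_SIZE = 64
OFFSET_BITS = 6


class RandomizedMemory:
    """Toy model of instrumented accesses over a line-permuted data region."""

    def __init__(self, num_lines: int, seed: int = 0):
        # Stand-in permutation (assumption): a shuffled table of line indices.
        self.perm = list(range(num_lines))
        random.Random(seed).shuffle(self.perm)
        self.backing = bytearray(num_lines * LINE_SIZE)

    def translate(self, addr: int) -> int:
        """Original address -> randomized address (offset within line kept)."""
        line, offset = addr >> OFFSET_BITS, addr & (LINE_SIZE - 1)
        return (self.perm[line] << OFFSET_BITS) | offset

    def store(self, addr: int, value: int) -> None:
        """Instrumented one-byte store."""
        self.backing[self.translate(addr)] = value

    def load(self, addr: int) -> int:
        """Instrumented one-byte load."""
        return self.backing[self.translate(addr)]
```

The program keeps using addresses of the original layout, e.g. `mem.store(130, 7)` followed by `mem.load(130)`, while the bytes physically reside in whichever line the permutation selects.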
As we will elaborate in later sections, the cost of performing the randomization calculation for every memory access is significant. We overcome this problem by implementing a "permutation cache" for randomization information, i.e., the randomized location of recently used data can be looked up and does not need to be computed again. We use cache-monitoring resilient algorithms to access the permutation cache.
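One way to picture a lookup that does not depend on the queried address is a small table that is scanned in full on every access. The sketch below is only an assumption about how such a structure could look; the paper's actual permutation cache is detailed in its implementation section. The class name `PermutationCache`, the slot count, and round-robin replacement are hypothetical, and Python itself offers no constant-time guarantees.

```python
class PermutationCache:
    """Fixed-size table of (original line -> permuted line) entries."""

    def __init__(self, slots: int = 8):
        self.keys = [-1] * slots    # -1 marks an empty slot
        self.values = [0] * slots
        self.next_slot = 0          # round-robin replacement position

    def lookup(self, line: int):
        """Scan every slot regardless of where (or whether) the line is stored.

        Returns (hit, permuted_line); permuted_line is 0 on a miss.
        """
        hit, result = 0, 0
        for key, value in zip(self.keys, self.values):
            match = int(key == line)
            result |= value & -match   # -match is 0 on mismatch, all ones on match
            hit |= match
        return bool(hit), result

    def insert(self, line: int, permuted: int) -> None:
        """Store a freshly computed permutation; call only after a miss."""
        self.keys[self.next_slot] = line
        self.values[self.next_slot] = permuted
        self.next_slot = (self.next_slot + 1) % len(self.keys)
```

Because every slot is touched on each lookup and the replacement position does not depend on the looked-up line, the sequence of table accesses is the same for every query in this model.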
2) Initial Randomization: The initial randomization of the enclave's data needs to be done in a way that cannot be observed by the adversary, to keep him from learning the randomization function π. In particular, if the adversary can observe a read operation from the un-randomized initial memory layout and a subsequent write operation to a randomized address, he can link data structures to the randomized memory locations. A general approach to break up this linkage is to load a set of data into CPU registers (register operations cannot be tracked by the adversary) and write the data in a random fashion to their new locations. This approach, however, is limited in the size of the data that can be loaded at once into registers, enabling the adversary to learn partial information about the randomized memory layout. DR.SGX uses an initial randomization method which completely hides write operations from the adversary, using write operations, known as non-temporal writes [35] , that evade the CPU's caches. The adversary cannot observe effects of the write in the cache and cannot extract any information about the randomized memory layout. Hence, the adversary only observes reads from the initial, un-randomized memory, whose layout is known to him beforehand.
3) Re-Randomization: A single randomization of an enclave's memory layout is not sufficient to prevent an adversary from learning the relation between (randomized) memory locations and data objects. Therefore, DR.SGX continuously re-randomizes the memory layout. Starting from the initial memory layout l_0, a random permutation function π_1 is applied to derive the first randomized layout l_1 = π_1(l_0). After a configurable window the memory layout is re-randomized, applying π_2 to derive l_2 = π_2(l_1).
As with the initial randomization, the adversary (who can observe reads from l_n and writes to l_{n+1}) could link those operations to learn the relation between those memory layouts. To prevent this, DR.SGX uses non-temporal writes and performs the re-randomization in a progressive and probabilistic manner. Data is not moved from l_n to l_{n+1} in a single bulk operation that could easily be identified by the adversary. Instead, at each point in time two memory layouts exist simultaneously, and data is progressively moved from l_n to l_{n+1}, interleaved with the "normal" memory operations of the enclave. However, precisely following this scheme would produce deterministic accesses, which would allow an adversary to tell "normal" memory accesses and layout transfer operations apart. To hide this information, the transfer operations are only performed with probability p, making the order of memory accesses non-deterministic.
Maintaining two memory layouts simultaneously imposes the challenge that write operations will cause inconsistency between them. To tackle this problem for each memory access we need to keep track of which copy is currently valid. We do this using a pointer λ in the original memory layout l 0 . Accesses to addresses a ≤ λ use the newer memory layout (l n+1 ), while accesses to addresses a > λ use the previous layout (l n ). In Figure 1 , λ is between cache lines 4 and 5. In the randomized layouts the invalid memory parts are grayed out. For instance, the variables p, v, i and the first bytes of array a have been transferred to the new memory layout. Whenever a cache line is transferred over to the new layout, λ is incremented by the size of a cache line. After all data has been copied to the memory layout l n+1 the process starts over with a new memory layout l n+2 .
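The bookkeeping with the pointer λ can be illustrated with a small toy model. The following Python sketch is not the DR.SGX implementation: the class name `TwoLayoutMemory`, the shuffled lists standing in for the permutations, and the choice to store one value per cache line are all assumptions made for illustration. In this sketch `lam` counts the bytes already transferred, so addresses below `lam` resolve to the newer layout.

```python
import random

LINE = 64  # cache line size in bytes


class TwoLayoutMemory:
    """Toy model: each line lives in the old or the new layout, selected by lam."""

    def __init__(self, num_lines, seed=0):
        rng = random.Random(seed)
        self.num_lines = num_lines
        # Hypothetical stand-ins for the permutations of l_n and l_{n+1}.
        self.perm_old = list(range(num_lines))
        self.perm_new = list(range(num_lines))
        rng.shuffle(self.perm_old)
        rng.shuffle(self.perm_new)
        self.old = [None] * num_lines
        self.new = [None] * num_lines
        self.lam = 0  # bytes (in original-layout order) already moved to the new layout

    def _slot(self, addr):
        line = addr // LINE
        if addr < self.lam:
            return self.new, self.perm_new[line]  # already transferred
        return self.old, self.perm_old[line]      # still in the previous layout

    def write(self, addr, value):
        store, slot = self._slot(addr)
        store[slot] = value

    def read(self, addr):
        store, slot = self._slot(addr)
        return store[slot]

    def transfer_one_line(self):
        """Move the next line to the new layout and advance lam by one line."""
        line = self.lam // LINE
        if line >= self.num_lines:
            return False
        self.new[self.perm_new[line]] = self.old[self.perm_old[line]]
        self.old[self.perm_old[line]] = None
        self.lam += LINE
        return True
```

Because every access consults `lam` first, a write that arrives in the middle of a transfer always lands in the copy that is currently valid, which is the consistency property the pointer is meant to provide.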
V. IMPLEMENTATION
This section describes the details of our DR.SGX implementation. We start with the instrumentation process, then we explain how to randomize the memory layout in a way that is unobservable to the adversary. Next, we show how we improved DR.SGX's performance by introducing a cache for permutation results. Finally, we discuss the re-randomization of an enclave's data sections.
Throughout this section we will refer to data memory regions or data memory accesses simply as memory regions and accesses (omitting data). When distinction from code memory regions or accesses is required, we will use the appropriate terms.
A. Memory Access Instrumentation
DR.SGX randomizes the memory locations of an SGX enclave's data. The enclave, however, has been developed targeting a linear (virtual) memory model. Each memory access of an enclave has to be instrumented to determine the correct randomized memory location of the data element that is meant to be accessed. With DR.SGX the IR file is processed by an instrumentation pass before it is translated into machine code. Furthermore, DR.SGX adds a small library, which contains functions used to perform the randomization, to the enclave. This library can be written in a high-level language like C / C++ and is translated into IR as well.
Additionally, the instrumentation pass examines all allocations on the stack and transforms those which are larger than a single cache line into heap allocations. A pointer to the heap allocation is placed on the stack and the code using such a large element is modified to access the heap allocation instead of accessing the stack. We detail this mechanism in Section V-C. As an optimization, the instrumentation is not performed for addresses that the compiler knows to be on the stack.
Example: Figure 2 gives an example of the instrumentation of a store instruction. In the C file the value 42 is written to the 257-th element of an array var. In IR this operation is translated into a store of an immediate value (42) to a dereferenced pointer. This pointer points to the unrandomized location of the array var and needs to be updated. The instrumented IR (Inst. IR) code shows the modifications that have been applied by DR.SGX. First, the pointer to the 257-th element of var is obtained through the use of getelementptr, and stored as an integer store.arg.int. Next, the function addrencrypt is called with the pointer to var as argument. This function is implemented in the DR.SGX library and resolves the given address to the corresponding randomized address. The randomized address is assigned to addrencrypt.res.int, which is afterwards converted into a pointer addrencrypt.res. Finally, the actual memory access is performed using the randomized pointer value.
B. Random Permutation
DR.SGX uses run-time data randomization, which is required for both the unobservable initial randomization and the re-randomizations. This means that the randomized location of data must be recovered dynamically. Using a purely random permutation would require storing extensive metadata, which would then need to be accessed in an unobservable way. Therefore, DR.SGX uses a pseudo-random permutation function to determine the random location of data. This approach has two advantages: (1) collisions are inherently avoided, and (2) randomized locations can be computed based on a non-secret algorithm and a key, which is small compared to the metadata in the naive approach. However, the permutation function itself must be resilient against side-channel attacks; otherwise the adversary learns the randomization secret and can disclose the accessed cache lines.
We use small-domain encryption for our random permutation function; the domain size must be on the order of the memory size 6 used by the enclave employing DR.SGX. We choose a randomized address space of 4 MB 7 . In particular, we use the FFX Format-Preserving Encryption scheme, which is based on a 10-round Feistel network [6]. As the underlying block cipher for FFX we use AES, for which the hardware acceleration extension AES-NI [35] is available in all SGX-enabled CPUs. AES-NI provides both good performance and resiliency against cache-based side-channel attacks.
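To illustrate why a Feistel network yields a collision-free permutation over a small domain, the following Python sketch permutes 16-bit cache line indices (4 MB / 64 bytes = 2^16 lines) with a 10-round balanced Feistel construction. It is not FFX and does not use AES-NI: the round function is HMAC-SHA256 truncated to 8 bits, chosen only because it is available in the standard library, and the function names are hypothetical.

```python
import hashlib
import hmac

HALF_BITS = 8                      # two 8-bit halves form a 16-bit line index
HALF_MASK = (1 << HALF_BITS) - 1
ROUNDS = 10
LINE = 64                          # cache line size in bytes


def _round(key: bytes, rnd: int, half: int) -> int:
    # Assumption: HMAC-SHA256 stands in for the AES-based round function.
    digest = hmac.new(key, bytes([rnd, half]), hashlib.sha256).digest()
    return digest[0] & HALF_MASK


def permute_line(key: bytes, line: int) -> int:
    """Map a 16-bit cache line index to its permuted index."""
    left, right = line >> HALF_BITS, line & HALF_MASK
    for rnd in range(ROUNDS):
        left, right = right, left ^ _round(key, rnd, right)
    return (left << HALF_BITS) | right


def unpermute_line(key: bytes, line: int) -> int:
    """Invert permute_line by running the rounds backwards."""
    left, right = line >> HALF_BITS, line & HALF_MASK
    for rnd in reversed(range(ROUNDS)):
        left, right = right ^ _round(key, rnd, left), left
    return (left << HALF_BITS) | right


def randomized_address(key: bytes, addr: int) -> int:
    """Permute at cache line granularity; the offset within the line is kept."""
    line, offset = divmod(addr, LINE)
    return permute_line(key, line) * LINE + offset
```

Each Feistel round is invertible regardless of the round function, so the composition is a bijection on the 2^16 line indices, and only the key has to be kept secret.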
C. Initial Randomization
The initial randomization is particularly challenging since the adversary knows the initial memory layout of an enclave. Figure 3 shows the entire memory layout of an enclave with DR.SGX. If we used standard write operations to copy data from the initial data section l 0 to the randomized section l x , the adversary would be able to learn the randomized layout. Copying the data would lead to a memory access trace with alternating accesses to l 0 and l x (each value is read from l 0 and written to its randomized location in l x ), so the adversary would learn for each position in l x which element from l 0 was copied there.
In DR.SGX we use non-temporal write instructions to tackle this problem [35]. Non-temporal write instructions provide the processor with the meta-information that the data will not be used again soon by the program and need not be stored in the cache. On current Intel processors memory write operations using this instruction immediately affect the DRAM and are not buffered in the CPU's cache. 8 This means the write operations do not cause the written data to be added to the CPU cache and are therefore invisible to the adversary. He only observes consecutive accesses to the initial data section of the enclave l 0 .
The secret keys we need as input to our random permutation are generated by the hardware random number generator from inside the enclave. We use rdseed to obtain true random numbers from the CPU [35] . This way the adversary cannot influence or obtain the secret key used by DR.SGX.
The stack of an enclave is not randomized (cf. Figure 3); however, with DR.SGX it only holds data elements that are smaller than a cache line. Large data objects, like arrays, are allocated on the heap and accessed by dereferencing a pointer placed on the stack. This prevents data leakage through secret-dependent accesses within those data objects.
D. Permutation Cache
Performing the calculation for pseudo-random permutations is very costly and needs to be performed for each memory access. To improve performance we introduced a cache for memory translations (Perm. Cache in Figure 3) . Permutation is performed at cache line granularity, i.e., all bytes in one cache line in l 0 are mapped as a single block.
When this block is moved to l 1 it will, with high probability, be mapped to a different cache line, and to yet another cache line in l 2 and so on. On recent x86 processors a cache line is 64 bytes; thus, by storing the result, no extra calculations are necessary for memory accesses that fall within the same cache line. Our cache is currently 1 KB, which allows direct-mapped storage of 256 permutation results. However, to prevent leakage through our permutation cache we have to access it in a way that is oblivious to the adversary. For each read to the cache we access all CPU cache lines in our permutation cache, which can be achieved with only 8 unaligned read operations.
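The lookup discipline described above can be sketched as follows. This Python model is illustrative only: the class `PermutationCache`, the tag/value arrays, and the `permute` callable are assumptions, and Python offers no constant-time guarantees, so the sketch models only which slots are touched, not real side-channel behavior.

```python
class PermutationCache:
    """Direct-mapped cache of permutation results, scanned in full on every read."""

    def __init__(self, permute, entries=256):
        self.permute = permute          # hypothetical callable: line index -> permuted index
        self.entries = entries
        self.tags = [None] * entries    # original line index held by each slot
        self.values = [0] * entries     # permuted line index held by each slot
        self.hits = 0
        self.misses = 0

    def lookup(self, line):
        index = line % self.entries
        found, result = False, 0
        # Touch every slot so that the set of slots read does not depend on `line`.
        for i in range(self.entries):
            match = (i == index) and (self.tags[i] == line)
            found = found or match
            result = self.values[i] if match else result
        if found:
            self.hits += 1
            return result
        # Miss: compute the permutation and fill the direct-mapped slot.
        # (Hiding the update itself is outside the scope of this sketch.)
        self.misses += 1
        result = self.permute(line)
        self.tags[index] = line
        self.values[index] = result
        return result
```

A second access to any byte in the same cache line hits the cache and skips the costly permutation, while two lines whose indices differ by a multiple of 256 evict each other, as expected for a direct-mapped design.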
E. Re-Randomization
DR.SGX constantly re-randomizes the memory layout of an enclave. Figure 3 shows the overall memory layout, with the initial global data and heap in section l 0 . Initially all data are copied in a pseudo-random permutation to the first random section l x . From there, the data are progressively copied to the second random section l y . As for the initial permutation, non-temporal write operations are used to hide accesses to l y from the adversary. For both sections different secret keys are used to define the permutations.
The data are copied in the order of the original memory layout (in Figure 3 the first five elements of the initial layout l 0 are in l x , while the remaining are in l y ). This scheme makes it possible to decide quickly which permutation is valid for a given memory address. As discussed above, on each memory access an address in the initial layout is permuted.
DR.SGX does re-randomization gradually for each cache line sized memory block, i.e., one block is updated at a time. The cost of re-randomization primarily comes from the permutation calculations required. However, the pipelining of AES instructions in the CPU makes encrypting a small number of addresses only slightly more expensive than encrypting a single address. Therefore, the calculations for re-randomization are done using a piggybacking scheme, when a memory element is requested whose address is not in our permutation cache (cache miss) and that address needs to be computed anyway.
Predictable re-randomization operations, however, can be filtered out by the adversary, since we assume that the adversary has a perfect trace of memory accesses. To overcome this challenge the re-randomization actions are made probabilistic in DR.SGX. On each cache miss a re-randomization is performed with a configurable probability p. After multiple re-randomization steps the uncertainty for the adversary becomes too high for a reliable side-channel attack.
Re-randomizing only on cache-miss events in the permutation cache has the disadvantage that application-dependent cache misses can be rare events. In such a case the re-randomization would not be sufficient to provide good protection against side-channel attacks. Therefore, we introduce a new configurable parameter t, the forced cache miss threshold. If t − 1 cache hits happen consecutively, the following lookup is treated as a cache miss, regardless of the actual cache status. As with all cache misses, a new line is then re-randomized with probability p. This way we can enforce a lower bound on the re-randomization rate of DR.SGX.
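The interaction between the probability p and the threshold t can be summarized in a short sketch. The class below is a hypothetical model, not DR.SGX code: it only decides, per permutation-cache lookup, whether one re-randomization step is piggybacked, and it uses Python's `random` module where the enclave would need a secure randomness source.

```python
import random


class RerandomizationScheduler:
    """Decide per lookup whether one cache line is re-randomized."""

    def __init__(self, p, t=None, seed=0):
        self.p = p                      # probability of re-randomizing on a miss
        self.t = t                      # forced cache miss threshold (None disables it)
        self.rng = random.Random(seed)  # assumption: stand-in for secure randomness
        self.consecutive_hits = 0
        self.steps = 0                  # number of lines re-randomized so far

    def on_lookup(self, cache_hit):
        # After t - 1 consecutive hits, the next lookup counts as a miss.
        if self.t is not None and self.consecutive_hits >= self.t - 1:
            cache_hit = False
        if cache_hit:
            self.consecutive_hits += 1
            return False
        self.consecutive_hits = 0
        if self.rng.random() < self.p:
            self.steps += 1
            return True
        return False
```

With a workload that always hits the permutation cache, at least one of every t lookups becomes a re-randomization trial, which is how the threshold bounds the re-randomization rate from below.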
VI. PERFORMANCE EVALUATION
We evaluated the performance of DR.SGX using the benchmark suite Nbench [12] . Benchmarking SGX code can be challenging, since well-known benchmark suites rely on a number of features, including system calls, timestamps, and the file system, which are not directly available in SGX. We chose Nbench because it has been previously used to analyze the performance, e.g., of SGX-Shield [64] . It relies only marginally on the file system and it is relatively simple (5217 lines of code), so it can be easily adapted to run inside an SGX enclave. The original version relies on timestamps to run each benchmark for an equal amount of time; since timestamps are not available in SGX enclaves we manually chose for each benchmark the lowest number of iterations that yielded a run time greater than 100 ms. Our test system has an Intel Skylake i7-6700 processor clocked at 3.40 GHz and runs Ubuntu 14.04.4. a) Overhead of DR.SGX modifications: To understand the impact of the various components of DR.SGX, we ran the benchmark multiple times with a subset of the components (see Section V). We first tested moving bigger allocations from the stack to the heap, i.e., replacing allocations on the stack bigger than 63 bytes with calls to malloc. We measured a negligible overhead well below 1% (Stack → Heap in Figure 4 ). Then, we tested the instrumentation of reads and writes (LLVM instruction getelementptr). In DR.SGX, instances of that instruction are followed by calls to our permutation function, unless the argument to the instruction is on the stack. In this test, the permutation function returns its parameter immediately, so the overhead reflects the impact of the instrumentation alone. We measured overheads between 0 and 96%, with a geometric mean of 40% (GEP inst. in Figure 4 ). Finally, we tested the whole system (without any forced cache miss threshold). We chose a value of 0.5 for the re-randomization probability p. 
Overheads range between 0.36× and 19.68×, with a geometric mean of 3.39×. The benchmarks Assign and LU have the biggest overheads, 16.22× and 19.68× respectively, due to a high miss rate in our permutation cache. Those benchmarks have a cache miss rate above one miss per thousand CPU cycles, while the other benchmarks have an average cache miss rate of 0.12 misses per thousand CPU cycles.
b) Overhead of re-randomization: Next, we assessed the impact of various forced cache miss thresholds on the overhead and the re-randomization rate. We chose our forced cache miss thresholds t ∈ {256, 64, 16} but other values are possible. As a measure of the re-randomization rate, we compute the average number of memory events, i.e., reads or writes to the randomized area, during a re-randomization cycle. Figure 6 shows this measure for the benchmarks and the various thresholds. Figure 5 compares the overhead using the various thresholds. The data points in Figures 7 and 8 represent the geometric means of the overhead and the re-randomization rate respectively, while the colored areas represent the minimum and maximum values.
As mentioned, the geometric mean of the performance overhead without a forced cache miss threshold is 3.39× and the maximum is 19.68×. The geometric mean of the memory events per re-randomization is 18.6M, mainly because of benchmarks like StringSort and NNET which have a very high cache hit rate (99.99999% and 99.99977% respectively) and, thus, up to $1.7 \times 10^{11}$ memory events per re-randomization cycle.
Introducing a high forced cache miss threshold (t = 256) effectively reduces the memory events per cycle: the geometric mean is 1.5M, while the maximum is 4.0M (a reduction of five orders of magnitude). On the other hand, the performance overhead only increases slightly: the geometric mean is 4.01× while the maximum is 19.80×. The lowest threshold we considered is t = 16, which further reduces the number of memory events per re-randomization cycle: the geometric mean is 0.23M (maximum: 0.28M, a further reduction of an order of magnitude). The geometric mean of the performance overhead is in this case 9.33× (maximum: 23.30×). The best compromise is t = 64, where the geometric mean of the number of memory events per cycle is 0.62M (maximum: 1.1M) while the geometric mean of the performance overhead is 5.20× (maximum: 19.84×). Using t = 64 the overhead increases only slightly (the maximum overhead is practically the same) but the maximum number of memory events per randomization cycle decreases by a factor of 160000. c) Summary: Although the performance overhead is substantial (e.g., 5.2× for parameters t = 64 and p = 0.5), we argue that our solution is still practical in many use cases. Oblivious execution systems that leverage customized hardware impose similar or higher overheads (e.g., 5× in [43] and 15× in [46]), so implementing similar protections fully in software would be significantly slower compared to our approach.
Developers and system administrators can adjust the parameters of DR.SGX based on the available computing resources. For example, if the deployment scenario allows up to 10× overhead, the cache miss threshold can be set to t = 16 for maximal re-randomization rate and security (see Figure 7) . 
VII. SECURITY ANALYSIS
In this section we analyze the security of DR.SGX and demonstrate that it provides significant security improvements compared to vanilla enclaves.
Prerequisites We recall that we consider a strong adversary model (cf. Section III-A) where the attacker has access to the source code of the enclave and can obtain a "perfect cache trace" of ordered events, which precisely records all cache events of a victim in sequential order. In practice, attackers typically have to additionally deal with noise in the side channel, which is introduced by, e.g., interrupts that cannot be completely eliminated even by privileged adversaries. Furthermore, state-of-the-art attack techniques (e.g., [9] , [62] , [49] , [27] ) do not provide sufficient accuracy and resolution necessary for capturing all the cache events. By considering such a strong model that over-approximates typical adversarial capabilities, we ensure our solution can defeat attacks of today's adversaries as well as potential future adversaries that might develop more effective attack techniques.
We would also like to remind the reader that our solution does not intend to completely eliminate information leakage through side channels, but rather aims to provide improved security at a more reasonable performance cost than, e.g., ORAM solutions [68], [59]. Hence, in the following we discuss possible forms of leakage and the respective attack scenarios, and elaborate how well DR.SGX covers them.
A. Random guessing attacks
In the simplest scenario, the attacker identifies the part of the cache trace that includes accesses to the secret-dependent data structure (e.g., a hash table). While such identification is going to be quite a challenging task in practice (as we elaborate later on in Section VII-C), we assume for a moment that it can be done, e.g., by running the non-randomized version of the victim enclave and counting the number of cache events from the beginning of execution until the secret-dependent memory access occurs. Using this information, the attacker can find the location of the secret-dependent memory accesses in the trace of a randomized enclave, which is likely to be at the same location.
As a next step, the attacker attempts to reverse-engineer the permutation of elements within the data structure. In particular, when observing k distinct secret-dependent accesses to an object of n elements, the search space is given by the number of arrangements of k out of n:

$$A_n^k = \frac{n!}{(n-k)!}$$
This number grows rapidly with n, making it non-trivial for an attacker to brute-force the permutation even for objects of moderate size. For example, given a data structure of 50 elements and any number of secret-dependent accesses resulting in 25 distinct accesses to the data structure, the number of arrangements is as large as $1.96 \times 10^{39}$, which gives a random guess a success chance of approximately $2^{-131}$, smaller than the probability of guessing an AES-128 key.
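The numbers in this example can be checked directly. The sketch below assumes that "arrangement of k from n" denotes the number of ordered selections n!/(n − k)!, which Python exposes as `math.perm` (Python 3.8 or later); the helper names are hypothetical.

```python
import math


def arrangements(n: int, k: int) -> int:
    """Number of ordered selections of k distinct elements out of n: n! / (n - k)!."""
    return math.perm(n, k)


def guess_bits(n: int, k: int) -> float:
    """Search-space size in bits; one random guess succeeds with probability 2**-bits."""
    return math.log2(arrangements(n, k))


count = arrangements(50, 25)   # roughly 1.96e39
bits = guess_bits(50, 25)      # between 130 and 131 bits
```

The search space of slightly more than 130 bits is consistent with the comparison to a 128-bit AES key.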
B. Predictable and secret-dependent access
In the second attack scenario, a victim enclave exhibits a predictable access pattern, e.g., it initializes a security-sensitive data structure in a deterministic order, which is known to the attacker from the source code. Hence, it becomes possible to disclose information about permuted memory locations by analyzing the access pattern of such deterministic routines in the permuted trace.
To defeat this type of attack, DR.SGX uses probabilistic gradual re-randomization as the enclave execution progresses (cf. Section IV-A2). Thus, the amount of leaked information and, hence, the security guarantees provided depend on two factors: (i) the attack window a, which denotes the part of the trace that includes both deterministic and secret-dependent accesses to the same data structure, and (ii) the re-randomization window T, which is a security parameter that determines the length of the trace without re-randomization. The smaller the window T is, the higher the chances that the memory layout is re-randomized in between deterministic and secret-dependent accesses.
Our randomization strategy is to periodically attempt re-randomization, which eventually takes place with the given probability. This can be modeled as the number of Bernoulli trials performed until a success occurs, which follows the geometric distribution. Therefore, let $X_i$ be a geometrically distributed random variable denoting the number of re-randomization trials performed until re-randomization takes place, with p being the success probability of a re-randomization trial. For the geometric distribution, the expected (mean) number of trials up to and including the first success is $1/p$. Furthermore, DR.SGX re-randomizes one cache-line-sized memory block at a time, which means that N re-randomization iterations are needed in order to re-randomize the entire memory region of size N · (cache line size). Hence, the expected length of the complete re-randomization window T can be expressed as $T = \Delta T \cdot N \cdot \frac{1}{p}$, where ∆T is a basic re-randomization window (in a number of cache events) 9 , N is the size of the protected memory region in cache lines, and p is the success probability of the re-randomization trial. For example, for ∆T = 1, N = 4 MB/64 bytes = $2^{16}$ and p = 0.5: $T = 1 \cdot 2^{16} \cdot 2 = 2^{17}$.
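The expected window length can be cross-checked numerically. The sketch below evaluates the stated formula T = ∆T · N · 1/p and compares it against a simple Monte Carlo simulation in which each of the N blocks needs a geometrically distributed number of trials; the function names and the use of Python's `random` module are assumptions for illustration. Evaluating the formula with ∆T = 1, N = 2^16 blocks and p = 0.5 gives 2^17 cache events.

```python
import random


def expected_window(delta_t: float, n_blocks: int, p: float) -> float:
    """Expected length of a full re-randomization window: delta_t * N * (1 / p)."""
    return delta_t * n_blocks / p


def simulate_window(delta_t: float, n_blocks: int, p: float, seed: int = 0) -> float:
    """Count trials until every block has been re-randomized once."""
    rng = random.Random(seed)
    trials = 0
    for _ in range(n_blocks):
        # Geometric number of Bernoulli trials until this block is moved.
        while True:
            trials += 1
            if rng.random() < p:
                break
    return trials * delta_t


n_blocks = (4 * 1024 * 1024) // 64          # 4 MB in 64-byte blocks = 2**16
analytic = expected_window(1, n_blocks, 0.5)
simulated = simulate_window(1, n_blocks, 0.5)
```

Lowering p lengthens the window proportionally, which makes explicit the trade-off between the noise introduced per trial and how quickly a complete re-randomization finishes.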
Note that the attack probability depends on the victim enclave, and in particular on the enclave-specific attack window a. Furthermore, it also depends on the alignment of the re-randomization window T with the attack window a in the trace. In particular, T should begin before or together with a and end with or after it; otherwise there is a chance that the locations of the secret-dependent data entries are affected by re-randomization. Assuming that $|a| = m_1$ and $|T| = m_2$, the attack probability can be expressed as follows:
Generally, the sooner secret-dependent memory accesses happen after deterministic initialization, the narrower the attack window a becomes and the harder it gets to ensure that the re-randomization window T is even tighter (i.e., to ensure that the condition $m_1 > m_2$ holds). Hence, it is not far-fetched to assume that some fraction of enclaves will be susceptible to this attack if no additional countermeasures are taken.
As an additional countermeasure, we resort to the next line of defense provided by DR.SGX. In particular, we observe that our progressive re-randomization introduces noise in the side channel as a side effect. In more detail, re-randomization of each memory address adds additional memory accesses 10 . Because re-randomization is interleaved with enclave execution, these accesses cannot be distinguished from accesses made by the enclave's application logic.
The noise is added with a given probability p at intervals ∆T. Hence, it can be modeled as a sequence of Bernoulli trials with one of two possible outcomes, true or false, where 'true' represents the event of noise injection and 'false' otherwise. A sequence of Bernoulli trials that counts the number of 'true' occurrences can be modeled as a random variable that follows the binomial distribution. The probability mass function of the binomial experiment is as follows: $P(k; n, p) = \Pr(X = k) = \binom{n}{k} p^k q^{n-k}$, $k = 0, 1, \ldots, n$, where $\binom{n}{k}$ is a binomial coefficient, $q = 1 - p$, and $P(k; n, p)$ represents the probability that there will be k 'true' outcomes among n trials.
Therefore, the probability of finding how many 'true' events have happened within k attempts is given by the c.d.f. of the binomial distribution:

$$F(k; n, p) = \Pr(X \le k) = \sum_{i=0}^{k} \binom{n}{i} p^i q^{n-i}$$
This implies that the larger n becomes, the harder it gets for the attacker to find the exact number of 'true' events. This becomes particularly difficult for privacy-sensitive applications operating on large volumes of private data, such as genomic indexing or machine learning algorithms.
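The effect of growing n can be made concrete with a few lines of Python. The sketch below implements the standard binomial p.m.f. and c.d.f. and reports the probability of the single most likely count of 'true' events, which is the best an attacker can do when guessing the exact number of injected noise events. The function names are hypothetical, and treating the attacker's task as one guess of the exact count is a simplifying assumption.

```python
import math


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k 'true' outcomes among n Bernoulli(p) trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def binom_cdf(k: int, n: int, p: float) -> float:
    """Probability of at most k 'true' outcomes among n trials."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))


def best_guess_probability(n: int, p: float) -> float:
    """Success probability when guessing the most likely number of 'true' events."""
    return max(binom_pmf(k, n, p) for k in range(n + 1))


# The best single guess becomes less reliable as the number of trials grows.
guess_10 = best_guess_probability(10, 0.5)
guess_100 = best_guess_probability(100, 0.5)
guess_1000 = best_guess_probability(1000, 0.5)
```

For p = 0.5 the best-guess probability falls roughly with the square root of n, so longer traces with more re-randomization trials leave the attacker with more uncertainty about how many observed accesses are noise.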
C. Noise filtering
The attacker might try to filter out noise introduced by our progressive re-randomization by collecting multiple execution traces and averaging the noise out, the typical strategy used by adversaries dealing with noisy channels. However, our periodic re-randomization breaks the correlation between different traces, because any observations made in a previous re-randomization window become useless after re-randomization. Hence, the attacker is prevented from combining traces of different executions. Alternatively, the attacker might try to combine different segments of the same trace, if they are all from the same re-randomization window. This might be useful in cases when the victim enclave periodically performs secret-dependent memory accesses within a single execution (e.g., when encrypting multiple blocks using a block cipher). Even in such a case, the attacker is heavily limited by re-randomization in the number of useful trace segments he can collect. For instance, when running the IDEA benchmark we observed that it performed about 300 decryptions in total, while the address space was re-randomized 8.8 times. This allows an attacker to collect at most 34 identically randomized traces (of the decryption routine). On the other hand, effective noise filtering typically requires significantly more repetitions. For instance, the attack by Götzfried et al. [27] required traces of 9600 decryptions to reveal an AES key, while the attack by Brasser et al. [9] leaked 70% of an RSA key in ca. 300 repetitions.
D. Frequency-based access analysis
The next attack vector uses the fact that repetitive access patterns might be recognizable even in permuted memory layouts. This might ease the attacker's analysis by, e.g., helping him to align traces for noise filtering or to localize secret-dependent memory accesses in the trace. For instance, let us assume that a non-randomized enclave produces 100 accesses to the address addr 01, which is permuted to addr 53 and addr 76 in the first and the second trace, respectively. When the attacker observes 53 53 53 53 ... a hundred times in the first trace and 76 76 76 76 28 76 76 ... in the second trace, he might learn that re-randomization took place in the second trace and can filter the noise (28) out. As a countermeasure, we introduce an additional parameter t, which we refer to as the forced cache miss threshold, which enforces an additional cache miss whenever the number of cache hits reaches the threshold. This in turn triggers an additional re-randomization round, which injects additional noise as a side effect, thus effectively breaking the repetitive sequence.
E. Summary
To summarize, DR.SGX provides better security guarantees to enclaves with larger security sensitive data structures, and to those that do not use security sensitive structures immediately after their initialization. Furthermore, if enclaves do not provide the means to verify correctness of a guessed secret, it becomes practically impossible to filter out noise or guess the permutation of a data structure.
Overall, DR.SGX provides significantly better security guarantees than baseline vanilla enclaves running without any protection. Furthermore, DR.SGX's security parameters provide the flexibility to achieve good trade-offs between performance and security guarantees.
VIII. DISCUSSION
a) Leakage quantification: Quantification of cache-based information leakage has been studied in previous works. CacheAudit [21] is a static analysis framework that, given an x86 binary and a cache configuration, yields an upper bound on the amount of information leakage via cache- and time-based side channels. The information leakage is quantified based on the number of side-channel observations an attacker can obtain. In the model of CacheAudit, randomly permuted observations contribute to the total number of observations, even though the attacker may not learn any useful information from such accesses. Therefore, CacheAudit is not directly suitable for analyzing our defense. Zhang and Lee [80] modeled the cache as a finite state machine to analyze how well various cache architectures defend against side-channel attacks. The leakage is quantified using mutual information, which can be computed using a model-checking tool. Unfortunately, the currently available finite state machine models do not capture our defense; hence, we cannot directly use their tool to analyze our solution. Extending these tools to analyze our approach would be an interesting direction for future work, but is beyond the scope of this paper.
b) Cryptographic vs. non-cryptographic enclaves: We notice that properties like repetitive use of the same secret in a single execution and the availability of means to verify the correctness of guessing attempts (e.g., plaintext/ciphertext pairs) are typical for cryptographic algorithms, which are usually implemented by security experts and are often hardened against side-channel attacks at the source code level.
On the other hand, larger security-sensitive data structures and long gaps between initialization routines and secret-dependent accesses are common in non-cryptographic algorithms, such as genomic indexing and machine learning. Furthermore, these algorithms typically operate on secret data rather than cryptographic keys; hence, the attacker cannot easily verify the correctness of guesses. In our opinion, providing a protection mechanism for non-cryptographic applications is most desirable, as they are typically implemented by developers without any security background. While developers might not be able to select the right security level for their enclaves, they can estimate acceptable performance penalties for their applications. Based on this they can get the best possible security guarantees within their performance requirements.
IX. RELATED WORK
In this section we review side-channel attacks on SGX and compare existing countermeasures to our solution. a) SGX side-channel attacks: Intel has acknowledged that SGX may be susceptible to side-channel attacks [36] . Costan et al. [15] hypothesized possible attack strategies, but did not implement a concrete attack. The first demonstrated attack was by Xu et al. [79] who leveraged monitoring of (forced) page faults. The attack exploits the fact that the untrusted OS is responsible for enclave memory management, including paging. Lee et al. [41] implemented a side-channel attack, called branch shadowing, that reveals fine-grained control flow information of an SGX enclave. The attack exploits the fact that SGX does not clear the branch history cache on context switch.
Several recent works have shown that shared caches can also leak confidential enclave data. Brasser et al. [9] implemented a customized version of Prime+Probe that leverages Intel Performance Monitoring Counters (PMC) to mount an L1 cache attack on RSA decryption and a human genome processing library (with 40 repetitions the attack leaks patterns used for person identification during DNA sequence indexing). The attack does not interrupt the victim enclave to avoid detection [13]. Schwarz et al. [62] demonstrated that a malicious enclave can launch a cross-core L3 cache attack on other software running on the system (enclave or normal process). By monitoring roughly 340 mbedTLS RSA decryptions the malicious enclave can recover 96% of the victim private key. Moghimi et al. [49] implemented an L1 cache attack on AES that monitors AES S-Box accesses. By interrupting the victim at a high frequency they were able to extract a secret key in 10 iterations (for simple AES implementations). Götzfried et al. [27] leverage PMCs to mount an L1 cache attack on AES. They assume that the attacker is synchronized with the victim enclave for more precise priming and probing. The attack required 9600 decryptions to reveal the key. b) Code randomization: Address Space Layout Randomization (ASLR) [57] is a known defensive technique against memory corruption attacks (e.g. ROP [60]). ASLR hides the locations of mapped memory (code and data) regions by randomizing their offsets at load time. More fine-grained solutions randomize code (but not data) at function [38], basic block [75], [18], and single instruction [56], [31] level. Static randomization solutions require that the memory layout remains secret. However, this assumption has been shown to be invalid [69], [63], [8], [67]. A countermeasure is to re-randomize the memory layout at runtime. Bigelow et al. [7] observe that re-randomizing the memory layout between every program output and input is sufficient to defend against attackers who are limited to observing the output of a program. Shuffler [77] implements continuous code re-randomization during execution.
The aforementioned randomization techniques are insufficient to defend against privileged SGX adversaries that mount side-channel attacks. Traditional offset-based ASLR is not effective since the attacker is responsible for memory management and thus learns the "secret" randomized offsets. Also, any form of code (re-)randomization, and in particular fine-grained code randomization as implemented in SGX-Shield [64] , is ineffective against attacks that only monitor data accesses (e.g. [62] , [79] , [9] ). The attacker is not limited to observing program outputs, but can additionally observe the memory access pattern during victim enclave execution.
The fine-grained code randomization techniques are not easily applicable to data, where similar "randomizable units", such as functions or code blocks, are not available. Also, reliably tracking data pointers is a harder problem than tracking code pointers [7] . The C standard mandates that a function pointer cannot be cast to another data type, but no limitations are imposed on data pointers. A typical C program also contains many more data pointers than code pointers, which makes dynamic data pointer tracking expensive [7] .
Crane et al. [17] use dynamic software diversity to thwart side-channel attacks. Multiple, diversified code copies of a program are created and loaded in parallel. During the execution the currently executed copy is dynamically switched, leading to unpredictable effects in the side channel. This approach, however, is only applicable to read-only memory elements, like code. Using the same approach for data would lead to inconsistency between the copies. c) ORAM and oblivious execution: Oblivious RAM (ORAM) [24] , [25] and its variants [68] , [59] , [26] , [76] hide the memory access pattern of a trusted client (e.g., CPU or network client) to an untrusted and encrypted memory (e.g., DRAM or server) by introducing fake accesses and shuffling the encrypted memory elements such that the observable access pattern is independent of the actual access pattern. The performance of ORAM depends on the amount of secure memory available within the trusted client. State-of-the-art schemes, like Path ORAM [68] and Ring ORAM [59] , incur memory bandwidth overhead in the order of 100×.
Oblivious execution [46] , [44] , [43] techniques attempt to hide all observable effects of program execution, including both memory accesses (code and data) and timing information. The goal is that an attacker cannot distinguish executions of a program on one input from executions on different inputs. Oblivious execution on standard processor architectures is extremely expensive, and thus oblivious execution systems typically leverage customized hardware. GhostRider [43] combines a new FPGA hardware design with ORAM. Phantom [46] is an ORAM-based oblivious processor whose memory controller leverages bank parallelism in DRAM chips to improve performance. Depending on the workload, GhostRider and Phantom incur a performance overhead in the order of 5× to 15×.
In the context of SGX, Raccoon [58] provides oblivious data access for developer-annotated enclave data by modifying the enclave source code. Secret-dependent memory accesses are hidden by either using ORAM or by streaming over the entire data structure. Also, decoy paths are introduced such that all program paths exhibit the same observable effects. ZeroTrace [61] is an oblivious data structure framework for SGX that runs on top of a software memory controller. Currently, oblivious arrays, lists, and dictionaries are supported, but the framework could be extended with additional data structures.
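The streaming strategy mentioned above can be illustrated with a minimal sketch. It is not Raccoon's implementation; the function name `oblivious_read` and the optional `trace` list are hypothetical and exist only for illustration. The sketch assumes integer table entries and only models the sequence of indices touched. Python itself gives no constant-time guarantees, so this shows the access pattern, not a hardened implementation.

```python
def oblivious_read(table, secret_index, trace=None):
    """Return table[secret_index] while reading every element in order.

    `trace` (hypothetical, for illustration) records the indices touched,
    which stands in for what a cache-monitoring observer could see.
    """
    result = 0
    for i in range(len(table)):
        if trace is not None:
            trace.append(i)
        value = table[i]
        # mask is -1 (all bits set) for the wanted element and 0 otherwise,
        # so the selection is done arithmetically rather than by skipping reads.
        mask = -int(i == secret_index)
        result |= value & mask
    return result


table = [10, 20, 30, 40]
trace_a, trace_b = [], []
first = oblivious_read(table, 1, trace_a)
second = oblivious_read(table, 3, trace_b)
# Both calls touch indices 0..3 in the same order, whatever the secret index is.
```

The cost is linear in the size of the data structure for every access, which is why streaming is only practical for small structures and ORAM is used for larger ones.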
Oblivious implementations of specific libraries have also been proposed. Ohrimenko et al. propose data-oblivious machine learning algorithms [52] and a side-channel resilient MapReduce framework [51] . These techniques, however, are not applicable to arbitrary enclaves. d) Hardware-assisted memory bus protection: The sequence of memory accesses appearing on the untrusted memory bus can reveal information about the execution of a program [42] , [70] . Several defenses that decorrelate the actual memory accesses of a program from the memory accesses appearing on the memory bus have been proposed. HIDE [83] is a hardware-assisted mechanism that breaks the correlation between repeated accesses by permuting the address space at well-defined execution points. An integral part is the ability to lock cache lines, which requires changing contemporary cache architectures. Shuffle [82] augments the CPU chip with a shuffle buffer that randomly shuffles memory blocks such that a memory block is written to a new location each time the block leaves the CPU chip. Gao et al. [22] propose a lightweight hardware-assisted scheme that reduces the high number of memory accesses in HIDE, as well as the high number of page faults in the Shuffle scheme. ObfusMem [5] assumes that both the processor and DRAM are trusted and hides access patterns with randomized encryption.
The proposed hardware modifications are not available in current SGX processors. Although our defense was designed to defend against cache attacks, it also addresses leakage on the memory bus (as all observed addresses are randomized). e) New cache architectures: Cache partitioning schemes [19] , [54] , [55] , [71] , [73] , [74] , [20] divide the cache into partitions that are not shared between processes. A static-partition cache [80] statically divides the cache such that different processes do not share any cache lines, while partition-locked caches [73] allow more fine-grained partitioning by giving a process the capability of locking a cache line, meaning that another process cannot evict this cache line until it is unlocked. This allows, for example, AES to be implemented securely by preloading the S-boxes.
Cache access obfuscation techniques [39] , [74] , [37] , [45] , [40] , [72] , [73] obfuscate the side-channel information obtained by the attacker, either by introducing noise or by randomizing the address to cache line mapping. A random eviction cache [80] adds noise to the channel by periodically evicting a random cache line. A random permutation cache [73] uses a dynamic random memory address to cache line mapping (note that traditional caches have a static deterministic mapping that is known to an attacker). Since this random mapping is dynamically updated and unknown to the adversary, the adversary learns nothing about which memory was accessed.
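The difference between a static and a permuted address to cache set mapping can be sketched with a toy model. This is not the design of [73]; the class `PermutedMapping`, the constant `NUM_SETS`, and the use of Python's non-cryptographic `random.Random` are assumptions made for illustration, and a real design would need a secure randomness source.

```python
import random

LINE_SIZE = 64  # bytes per cache line, as on current Intel CPUs
NUM_SETS = 64   # hypothetical number of cache sets in this toy model


def static_set(addr):
    """Traditional mapping: the set index is a fixed function of the address."""
    return (addr // LINE_SIZE) % NUM_SETS


class PermutedMapping:
    """Toy model: a secret permutation of set indices that can be refreshed."""

    def __init__(self, seed=None):
        # random.Random is NOT cryptographically secure; it is used here
        # only so that the sketch is reproducible.
        self._rng = random.Random(seed)
        self.remap()

    def remap(self):
        """Draw a fresh random permutation of the set indices."""
        self._perm = list(range(NUM_SETS))
        self._rng.shuffle(self._perm)

    def set_index(self, addr):
        return self._perm[static_set(addr)]


mapping = PermutedMapping(seed=1)
addresses = [line * LINE_SIZE for line in range(NUM_SETS)]
permuted_sets = [mapping.set_index(a) for a in addresses]
```

In this model the attacker still sees which set is touched, but without knowing the current permutation cannot map that set back to an address; calling `remap()` models the dynamic update of the mapping.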
Due to the limited control over caches in contemporary SGX processors it is not feasible to implement cache partitioning in software. Cache obfuscation techniques, on the other hand, can be implemented in software. Similarly to a random permutation cache, our defense dynamically randomizes the observations of an adversary. f) Software-only side-channel defenses for SGX: Some SGX side-channel attacks rely on frequently interrupting the victim enclave (e.g. [79] , [41] , [49] ). T-SGX [65] and Déjà Vu [13] are compiler-based defensive mechanisms to detect malicious enclave interrupts from inside an enclave. Upon detection the enclave can take counteractive measures, such as simply stopping its execution. T-SGX [65] leverages the Intel Transactional Synchronization Extension (TSX) to detect synchronous enclave exits. Déjà Vu [13] monitors the execution time of an enclave to detect a slowdown caused by frequent enclave interrupts, which allows detection of both synchronous and asynchronous enclave exits. However, SGX cache attacks that do not rely on interrupting the victim enclave [9] , [62] , [27] are not prevented.
Cloak [28] uses transactional memory (TSX) to perform atomic memory operations (i.e., memory operations that cannot be intercepted by the adversary) that hide sensitive memory accesses. Before sensitive memory is accessed all cache lines are touched (primed) by the enclave. The cache-observing adversary always sees that all cache lines have been accessed, and thus learns nothing about the enclave's sensitive accesses. Cloak relies on the enclave's developer to annotate sensitive data structures to be protected from side-channel attacks.
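The effect of priming can be sketched with a toy cache model. This is not Cloak's implementation and uses no TSX; the function `observed_fills` and the constant `NUM_LINES` are hypothetical. The model assumes that no line is evicted between priming and the sensitive access, which is the property the transactional execution is meant to provide.

```python
NUM_LINES = 8  # hypothetical number of cache lines covering the sensitive data


def observed_fills(secret_line, prime):
    """Return the cache-line fills (misses) an observer could notice, in order."""
    cache = set()
    fills = []

    def touch(line):
        # Only a miss brings a new line into the cache and is observable here.
        if line not in cache:
            cache.add(line)
            fills.append(line)

    if prime:
        for line in range(NUM_LINES):  # touch every line before the secret access
            touch(line)
    touch(secret_line)                 # the sensitive, secret-dependent access
    return fills


primed_a = observed_fills(2, prime=True)
primed_b = observed_fills(5, prime=True)
unprimed_a = observed_fills(2, prime=False)
unprimed_b = observed_fills(5, prime=False)
```

With priming the observation is the same list of all lines for every secret; without priming the single observed fill reveals the secret line directly.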
DR.SGX does not require the developer to identify sensitive data; the developer only has to tune DR.SGX's security-performance trade-off, which requires no security expertise. g) Memory-less encryption: Memory-less encryption has been studied in response to cold-boot attacks [30] , [29] . TRESOR [50] provides memory-less AES encryption by using the x86 debug registers for secure key storage. At boot time the kernel loads the AES key into the debug registers and afterwards ensures that the contents of the debug registers are neither modified nor written to memory. Loop-Amnesia [66] implements memory-less disk encryption by storing a randomly generated master secret, which is used to encrypt the disk volume key for each mounted volume, in model-specific (CPU) registers. PRIME [23] implements optimized memory-less RSA encryption by storing non-critical or symmetrically encrypted intermediary values in DRAM. These schemes are similar to our implementation of permutation computation.
X. CONCLUSION
In this paper we have proposed semantic-agnostic data randomization as a novel defensive approach against cache-based side-channel attacks on SGX. Our tool DR.SGX instruments enclave code such that all data locations in enclave memory are permuted at cache-line granularity and re-randomized during execution. Our solution effectively prevents many cache attacks and provides a complementary way to harden enclaves against information leakage.
