Oblivious RAM (ORAM) is a renowned technique to hide the access patterns of an application to an untrusted memory. According to the standard ORAM definition presented by Goldreich and Ostrovsky, two ORAM access sequences must be computationally indistinguishable if the lengths of these sequences are identically distributed. An artifact of this definition is that it does not apply to modern ORAM implementations adopted in current secure processor technology, because their memory access sequences have arbitrary lengths that depend on the programs' behavior (their termination times). As a result, the theoretical foundations of ORAM do not clearly argue about the timing and termination channels.
INTRODUCTION
Security of private data storage and computation in an untrusted cloud server is a critical problem that has received considerable research attention. A popular solution to this problem is to use tamper-resistant hardware based secure processors including TPM [2, 38, 46], TPM+TXT [21], Bastion [8], eXecute Only Memory (XOM) [28-30], Aegis [43, 44], Ascend [15], Phantom [31], Intel SGX [32], and Sanctum [9]. In this setting, a user's encrypted data is sent to the secure processor in the cloud, inside which the data is decrypted and computed upon. The final results are encrypted and sent back to the user. The secure processor chip is assumed to be tamper-resistant, i.e., an adversary is not able to look inside the chip to learn any information.

© 2017 Copyright held by the owner/author(s). This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in Proceedings of , , http://dx.doi.org/10.475/123 4.
While an adversary cannot access the internal state of the secure processor, sensitive information can still be leaked through the processor's interactions with the (untrusted) main memory. Although all the data stored in the external memory can be encrypted to hide the data values, the memory access pattern (i.e., address sequence) may leak information. For example, existing work [23] demonstrates that by observing accesses to an encrypted email repository, an adversary can infer as much as 80% of the search queries. Similarly, [55] shows that the control flow of a program can be learned by observing the main memory access patterns, which may leak sensitive private data.
Oblivious RAM (ORAM), first proposed by Goldreich and Ostrovsky [17], is a cryptographic primitive that completely obfuscates the memory access pattern, thereby preventing leakage via memory access patterns. Significant research effort over the past decade has resulted in more and more efficient ORAM schemes [7, 10, 18-20, 33, 34, 39, 41, 42, 50].
Generally speaking, an ORAM interface translates a single logical read/write into accesses to multiple randomized locations. As a result, the locations touched in successive logical reads/writes have exactly the same distribution and are indistinguishable to an adversary. More precisely, according to the original definition of ORAM introduced by Goldreich and Ostrovsky [17], the ORAM access sequences ORAM($A_1$) and ORAM($A_2$) generated by the ORAM for any two logical access sequences $A_1$ and $A_2$ respectively are computationally indistinguishable if ORAM($A_1$) and ORAM($A_2$) have the same length distribution (where the distribution is over the coin flips used in the ORAM interface). Almost all follow-up ORAM proposals claim to follow the same definition of ORAM security.
A crucial subtlety regarding the above mentioned ORAM security definition is that it is only applicable to the class of ORAM access sequences whose lengths are identically distributed. Specifically, two ORAM access sequences ORAM($A_1$) and ORAM($A_2$) may in fact be distinguishable if they have different length distributions. In modern secure processors [15, 31], a conventional DRAM controller is replaced with a functionally-equivalent ORAM controller that makes ORAM requests on last-level cache (LLC) misses. Since a program can have a different number of LLC misses for different inputs, the lengths of the corresponding ORAM access sequences are not identically distributed, and can leak sensitive information (e.g., locality) via the program's termination channel by revealing when the program terminates. Furthermore, specific ORAM implementations introduce further variance in the length of ORAM access sequences due to the additional caching/buffering used for performance reasons, e.g., a Path ORAM [42] caching the position map blocks for future reuse [13]. Hence, the original ORAM definition [17] does not apply to practical ORAM implementations embraced by modern secure processors due to their arbitrary distributions of lengths of ORAM access sequences. In other words, this definition neither clearly separates out nor includes leakage over the program's termination channel.
Another source of leakage under Goldreich and Ostrovsky's ORAM is the ORAM access timing, i.e., when an ORAM access is made. Since the ORAM requests are issued upon LLC misses, the ORAM access timing strongly correlates with the program's locality and can potentially leak sensitive information via the ORAM timing channel. Periodic ORAM access schemes have been proposed to protect the ORAM timing channel [14, 15]. Notice, however, that these schemes essentially transform the timing channel leakage into termination channel leakage. Completely preventing termination channel leakage without sacrificing performance is a hard problem. Instead, the leakage can be bounded to only a small number of bits [14].
In this work, we show that Goldreich and Ostrovsky's ORAM definition, appropriately interpreted for infinite length input access sequences, not only implies the standard ORAM definition [17] for finite length input access patterns, but also separates out termination channel leakage via ORAM access sequences. The proposed definition bridges the gap between theory and practice in the ORAM paradigm for secure processor technology and also simplifies proving the security of practical ORAM constructions. Specifically, for Path ORAM [42], by leveraging the background eviction [36] technique, our definition relaxes the bounds on stash size and stash overflow probability while greatly simplifying the security proof presented in [42] and yet offering similar security properties.
We also analyze a 'strong' ORAM definition stating that two sequences ORAM($A_1$) and ORAM($A_2$) must be computationally indistinguishable if the lengths of the input sequences $A_1$ and $A_2$ are equal. This definition implicitly includes a form of termination channel obfuscation and is applicable for ORAMs used for remote disk storage. Path ORAM satisfies this stronger definition; its security proof must now show that the stash overflow probability is negligible (a complex analysis).

The paper makes the following contributions:
(1) A first rigorous study of the original ORAM definition presented by Goldreich and Ostrovsky, in view of modern practical ORAMs (e.g., Path ORAM), demonstrating the gap between theoretical foundations and real implementations in secure processor architectures.
(2) We show that the Goldreich and Ostrovsky ORAM definition interpreted for infinite length input sequences separates out leakage over the ORAM termination channel. We show how this definition implies the Goldreich and Ostrovsky ORAM definition for finite length input sequences, fits the modern practical ORAM implementations in secure processor architectures, and greatly simplifies the Path ORAM security analysis by relaxing the constraints around the stash size and overflow probability, essentially transforming the security argument into a performance consideration problem.
(3) A generic framework for dynamic resource partitioning in secure processor architectures is proposed to control leakage via contention on shared resources, allowing leakage vs. performance trade-offs. In particular, this can be used to analyse termination channel leakage.
(4) We analyze a 'strong' ORAM definition which implies the Goldreich and Ostrovsky ORAM definition interpreted for infinite length input sequences. The 'strong' ORAM definition implicitly includes obfuscation of the ORAM termination channel, and this is useful in ORAM for remote disk storage (in order to prove that Path ORAM satisfies this definition one now needs to show a negligible probability of stash overflow).
BACKGROUND
2.1 Leakage Types via Address Bus Snooping
Privacy of user's sensitive data stored in the cloud has become a serious concern in computation outsourcing. Even though all the data stored in the untrusted storage can be encrypted, an adversary snooping the memory address bus in order to monitor the user's interactions with the encrypted storage can potentially learn sensitive information about the user's computation/data [23, 55] .
In particular, such an adversary can potentially learn secret information about the user's program/data by observing the following three behaviors:
(1) The addresses sent to the main memory to read/write data (i.e., the address channel).
(2) The time when each memory access is made (i.e., the timing channel).
(3) The total runtime of the program (i.e., the termination channel).
The countermeasures to prevent leakage via the above mentioned channels are orthogonal to each other and can be implemented as needed.
Oblivious RAM
Oblivious RAM is a renowned technique that obfuscates a user's access pattern to an untrusted storage so that an adversary monitoring the access sequence to the storage cannot learn any information about the user's application or data. Informally speaking, the ORAM interface translates the user's access sequence of program addresses $A = (a_1, a_2, \ldots, a_n)$ into a sequence of ORAM accesses $S = (s_1, s_2, \ldots, s_m)$ such that for any two access sequences $A_1$ and $A_2$, the resulting ORAM access sequences $S_1$ and $S_2$ are computationally indistinguishable given that $S_1$ and $S_2$ are of the same length. In other words, the ORAM physical access pattern (S) is independent of the logical access pattern (A), except for the lengths of the two access patterns, which are correlated. Precisely, an ORAM protects against leakage via the memory address channel only (cf. Section 2.1). The data stored in ORAMs should be encrypted using probabilistic encryption to conceal the data content and also hide which memory location, if any, is updated. With ORAM, an adversary is not able to tell (a) whether a given ORAM access is a read or write, (b) which logical address in ORAM is accessed, or (c) what data is read from/written to that location. We revisit the formal definition of ORAM presented by Goldreich and Ostrovsky [17] and discuss it in more detail in Section 3.
Path ORAM
Path ORAM [42] is currently the most efficient and simplified ORAM scheme for limited client (processor) storage. Over the past few years, several crucial optimizations to basic Path ORAM have been proposed which have resulted in practical ORAM implementations for the secure processor setting.
Path ORAM [42] has two main hardware components: the binary tree storage and the ORAM controller (cf. Figure 1). The binary tree stores the data content of the ORAM and is implemented on DRAM. Each node in the tree is defined as a bucket which holds up to Z data blocks. Buckets with fewer than Z blocks are filled with dummy blocks. To be secure, all blocks (real or dummy) are encrypted and cannot be distinguished. The root of the tree is referred to as level 0, and the leaves as level L. Each leaf node has a unique leaf label s. The path from the root to leaf s is defined as path s. The binary tree can be observed by any adversary and is in this sense not trusted. The ORAM controller is a piece of trusted hardware that controls the tree structure. Besides necessary logic circuits, the ORAM controller contains two main structures, a position map and a stash.
The position map is a lookup table that associates the program address of a data block (a) with a path in the ORAM tree (path s).
The stash is a piece of memory that stores up to a small number of data blocks at a time.
At any time, each data block in Path ORAM is mapped (randomly) to some path s via the position map. Path ORAM maintains the following invariant: if data block a is currently mapped to path s, then a must be stored either on path s, or in the stash (see Figure 1) . Path ORAM follows the following steps when a request on block a is issued by the processor.
(1) Look up the position map with the block's program address a, yielding the corresponding leaf label s.
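(2) Read all the buckets on path s. Decrypt all blocks within the ORAM controller and add them to the stash if they are real (i.e., not dummy) blocks.
(3) Return block a to the processor.
(4) Remap block a to a new random leaf $s'$ and update the position map accordingly.
(5) Encrypt and write back as many blocks as possible from the stash to path s, filling any remaining space on the path with dummy blocks.

Step 4 is the key to Path ORAM's security. It guarantees that a random path will be accessed when block a is accessed later and that this path is independent of any previously accessed random paths (unlinkability). As a result, each ORAM access is random and unlinkable regardless of the request pattern.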
Although the unlinkability property follows trivially from the construction of Path ORAM, another crucial property to be proven is that the stash overflow probability is negligible for a small stash, i.e., an O(λ)-sized stash where λ is the security parameter.
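The access procedure described above, together with the remapping and write-back that the text refers to as Steps 4 and 5, can be sketched as follows. This is a toy model under loud assumptions: there is no encryption, the stash is unbounded, dummy blocks are represented by empty bucket slots, and the class and attribute names (`PathORAM`, `trace`, and so on) are hypothetical rather than taken from [42].

```python
import random

class PathORAM:
    """Toy Path ORAM: no encryption, unbounded stash (illustrative only)."""

    def __init__(self, levels, bucket_size, seed=None):
        self.L = levels                 # root is level 0, leaves are level L
        self.Z = bucket_size            # real blocks per bucket
        self.rng = random.Random(seed)
        # Heap layout: node 1 is the root, children of n are 2n and 2n + 1.
        self.tree = {n: [] for n in range(1, 2 ** (levels + 1))}
        self.position = {}              # program address -> leaf label
        self.stash = {}                 # program address -> data
        self.trace = []                 # leaf labels an adversary would observe

    def _path(self, leaf):
        """Node indices from the root down to the given leaf label."""
        node, nodes = leaf + 2 ** self.L, []
        while node >= 1:
            nodes.append(node)
            node //= 2
        return nodes[::-1]

    def access(self, addr, new_data=None):
        leaves = 2 ** self.L
        # Step 1: look up the leaf label (fresh random label for a new block).
        leaf = self.position.setdefault(addr, self.rng.randrange(leaves))
        self.trace.append(leaf)
        path = self._path(leaf)
        # Step 2: read every bucket on the path into the stash.
        for node in path:
            self.stash.update(self.tree[node])
            self.tree[node] = []
        # Step 3: serve the request from the stash.
        if new_data is not None:
            self.stash[addr] = new_data
        result = self.stash.setdefault(addr, None)
        # Step 4: remap the block to a new, independent random leaf.
        self.position[addr] = self.rng.randrange(leaves)
        # Step 5: write back greedily, deepest bucket first.
        for depth in range(self.L, -1, -1):
            node = path[depth]
            fits = [a for a in self.stash
                    if (self.position[a] + leaves) >> (self.L - depth) == node]
            self.tree[node] = [(a, self.stash.pop(a)) for a in fits[: self.Z]]
        return result

oram = PathORAM(levels=4, bucket_size=4, seed=1)
for a in range(20):
    oram.access(a, new_data=a * 10)
values = [oram.access(a) for a in range(20)]
```

Because every access remaps the block before writing back, the leaf recorded in `trace` for the next access to the same address is drawn independently of all earlier entries, which is the unlinkability property discussed above.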
Recursive Path ORAM
In practice, the position map is usually too large to be stored in the trusted processor. Recursive ORAM has been proposed to solve this problem [39] . In a 2-level recursive Path ORAM, for instance, the original position map is stored in a second ORAM, and the second ORAM's position map is stored in the trusted processor.
The above trick can be repeated, i.e., adding more levels of ORAMs to further reduce the final position map size at the expense of increased latency. The recursive ORAM has a similar organization to OS page tables.
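As a rough illustration of this trade-off, the following sketch counts how many ORAM levels are needed before the final position map fits on chip. The function name and all three parameters are hypothetical, and the model assumes every position map block stores the same number of leaf labels.

```python
def recursion_levels(num_blocks, labels_per_block, on_chip_entries):
    """Return (number of ORAMs, size of the final on-chip position map).

    Assumption: each position map block holds `labels_per_block` leaf labels,
    so each extra ORAM level shrinks the position map by that factor.
    """
    if labels_per_block < 2:
        raise ValueError("each position map block must hold at least 2 labels")
    levels, entries = 1, num_blocks
    while entries > on_chip_entries:
        entries = -(-entries // labels_per_block)   # ceiling division
        levels += 1
    return levels, entries

result = recursion_levels(num_blocks=2 ** 20, labels_per_block=32,
                          on_chip_entries=1024)
print(result)   # (3, 1024)
```

Under these assumed numbers, a data ORAM of 2^20 blocks needs two additional position map ORAMs, so every logical request costs three path accesses instead of one.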
Background Eviction
In Steps 4 and 5 of the basic Path ORAM operation, the accessed data block is remapped from the old leaf s to a new random leaf $s'$, making it likely to stay in the stash for a while. In practice, this may cause blocks to accumulate in the stash and finally overflow the stash. It has been proven in [42] that the stash overflow probability is negligible for Z ≥ 6. For smaller Z, background eviction [36] has been proposed to prevent stash overflow.
The ORAM controller stops serving real requests and issues background evictions (dummy accesses) when the stash is full. A background eviction reads and writes a random path $s_r$ in the binary tree, but does not remap any block. During the writing back phase (Step 5 in Section 2.3) of a Path ORAM access, all blocks that were just read in can at least go back to their original places on $s_r$, so the stash occupancy cannot increase. In addition, the blocks that were originally in the stash are also likely to be written back to the tree, as they may share a common bucket with $s_r$ that is not full of blocks. Background eviction is proven secure in terms of the unlinkability property in [36].
GOLDREICH'S OBLIVIOUS RAM
Oblivious RAM was first proposed by Goldreich and Ostrovsky [17]. In this section, we first revisit their definition of ORAM and then discuss its implications for modern real ORAM implementations for secure processor architectures, specifically Path ORAM.
Formal Definition
Let A be a sequence of program addresses $a_1, \cdots, a_i, \cdots$ requested by the CPU during a program execution, and let ORAM(A) be a probabilistic access sequence to the actual storage such that it yields the correct data corresponding to A. Then ORAM is called an oblivious RAM if it is a probabilistic RAM and satisfies the following definition.
Definition 3.1 (Oblivious RAM) [17]. For every two logical access sequences $A_1$ and $A_2$ and their corresponding probabilistic access sequences ORAM($A_1$) and ORAM($A_2$), if |ORAM($A_1$)| and |ORAM($A_2$)| are identically distributed, then so are ORAM($A_1$) and ORAM($A_2$).
Intuitively, according to Definition 3.1, the sequence of memory accesses generated by an oblivious RAM does not reveal any information about the original program access sequence other than its length distribution. Specifically, this definition only protects against leakage over the memory address channel (cf. Section 2.1).
In the above definition we usually interpret $A_1$ and $A_2$ as finite length sequences, implying that |ORAM($A_1$)| and |ORAM($A_2$)| will also be of finite length. If infinite length input sequences A are allowed, then the original ORAM definition (i.e., Definition 3.1) turns out to be equivalent to Definition 4.1 in Section 4. We will argue below why it is important to admit infinite length input sequences.
A Bogus ORAM
As explained below, Definition 3.1 for finite length input sequences invites the construction of a strange 'bogus' ORAM in which the access sequence of any probabilistic RAM, even if it is not oblivious, can be padded with additional accesses so that it becomes oblivious. Since the access sequence of a non-oblivious probabilistic RAM is only padded, this reveals information about the input access sequence to the probabilistic RAM. This, of course, breaks our intuitive understanding of what oblivious means. The reason why our construction is oblivious is that the additional padding creates a 1-1 correspondence between the access sequence of the probabilistic RAM and the final length of the access sequence after padding; this allows us to abuse Definition 3.1, as we essentially code all the information about the access sequence of the probabilistic RAM in the termination channel (the length of the ORAM sequence). This means that each access pattern A will produce a unique length |ORAM(A)|, so there are no two different sequences in Definition 3.1 for our 'bogus' construction that will be compared.
The bogus construction does not introduce any smartness; it effectively pushes all the work of making the access pattern oblivious to making the termination channel oblivious. This observation will lead to a slightly stronger ORAM definition in Section 4 which is independent of the concept of a termination channel, i.e., the length of an ORAM access sequence does not play a role in the new definition (which turns out to be equivalent to Definition 3.1 for unrestricted and possibly infinite length input sequences).
Algorithm 1 shows how a (non-oblivious) probabilistic RAM $RAM_f(\cdot)$ can be padded in order to create an ORAM. Here an input access sequence A to $RAM_f(\cdot)$ is finite, so that a finite length output sequence $RAM_f(A)$ is created which can be uniquely interpreted as an integer x in line 3. The resulting padded ORAM sequence has length x, see line 4.
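The padding idea can be illustrated with a small sketch. The encoding used here (fixed-width addresses with a leading 1 bit, so that the mapping is injective and x is at least the number of accesses) is an assumption made for the example, and the function names `bogus_oram` and `decode_length` are hypothetical; the paper's exact encoding may differ.

```python
import random

def bogus_oram(ram_trace, address_bits, filler):
    """Pad a non-oblivious trace so that its final length encodes the trace."""
    # Interpret the trace as an integer x; the leading 1 bit is an assumed
    # encoding detail that preserves leading zeros and ensures x >= len(trace).
    x = 1
    for addr in ram_trace:
        x = (x << address_bits) | addr
    # Append accesses from a fixed distribution until the length equals x.
    padding = [filler() for _ in range(x - len(ram_trace))]
    return list(ram_trace) + padding

def decode_length(length, address_bits):
    """What an adversary recovers from the length alone."""
    n = (length.bit_length() - 1) // address_bits
    mask = (1 << address_bits) - 1
    return [(length >> (address_bits * (n - 1 - i))) & mask for i in range(n)]

rng = random.Random(0)
trace = [2, 0, 3]
padded = bogus_oram(trace, address_bits=2, filler=lambda: rng.randrange(4))
print(len(padded), decode_length(len(padded), 2))   # 99 [2, 0, 3]
```

Two different traces never yield the same padded length, so Definition 3.1 never compares them, yet the length by itself reveals the entire original trace.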
Applicability for Secure Processors
Modern secure processors [15, 31] have embraced the Path ORAM interface as a part of their trusted computing base (TCB). In these implementations, the ORAM controller serves the last level cache (LLC) misses by making ORAM requests to the main memory. Consider the LLC misses sequence of an execution to be the input (A) to the ORAM interface defined in Definition 3.1. In order to conclude indistinguishability (as per the above definition) of two ORAM access sequences generated as a result of two different LLC misses sequences (i.e., by running different programs, or running the same program with different inputs), the ORAM access sequences must have the same length distribution. However, since the LLC misses pattern changes dynamically across various programs and different inputs to the same program [24], it is very unlikely that the corresponding ORAM access sequences of two different executions will have the same length distribution. In particular, this would leak information about the program behavior through the total runtime of the application (i.e., the termination channel).
Another perspective to look at this fact is that Definition 3.1 is completely satisfied by only a small class of ORAM access sequences whose lengths are identically distributed. However, in practice, under the secure processor setting, the lengths of ORAM access sequences can have arbitrary different distributions as discussed earlier. Furthermore, several optimizations and extensions proposed in the literature for Path ORAM, resulting in better performance/security, introduce further probabilistic variance in the total runtime of the program, i.e., the termination channel. This, as a result, prevents the ORAM definition under consideration from being directly applicable to secure processors.
ORAM Optimizations vs. Program Runtime
In the following discussion, we briefly talk about various optimizations and tricks proposed in the literature that have resulted in more and more efficient and secure Path ORAM implementations. Each of these techniques typically introduces some amount of variance in the length of the ORAM access sequence as a function of the program input, hence modifying the total runtime of the program, which then correlates with the given input and leaks some information about it.

Algorithm 1
2: Access memory according to $RAM_f(A)$
3: Represent $RAM_f(A)$ as a binary bit string and interpret it as an integer x which is ≥ the number of accesses in $RAM_f(A)$
4: Access memory according to another sequence $A'$ (taken from some a-priori fixed distribution) such that the number of accesses in $RAM_f(A)$ combined with $A'$ is equal to x
5: end procedure
3.4.1 Unified Path ORAM & PLB. Unified ORAM [13] is an improved, state-of-the-art recursion technique to recursively store a large position map. It leverages the fact that each block in a position map ORAM stores the leaf labels for multiple data blocks that are consecutive in the address space. In other words, we can find the position maps of several blocks in a single access to the position map ORAM, although only one of them is of interest. Therefore, Unified ORAM caches position map ORAM blocks in a small cache called the position map lookaside buffer (PLB) to exploit locality (similar to the TLB exploiting locality in page tables). To hide whether a position map access hits or misses in the cache, Unified ORAM stores both data and position map blocks in the same binary tree.
Having good locality in position map blocks results in more PLB hits and overall fewer position map accesses to the Unified ORAM tree, and vice versa.
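The dependence of the access count on locality can be seen in a minimal model of the PLB. The sketch below assumes an LRU replacement policy and a single level of recursion, neither of which is stated in the text, and the function name and parameters are hypothetical.

```python
from collections import OrderedDict

def plb_misses(addresses, labels_per_block, plb_entries):
    """Count position map accesses to the ORAM tree under an assumed LRU PLB."""
    plb = OrderedDict()                  # position map block id, in LRU order
    misses = 0
    for addr in addresses:
        block = addr // labels_per_block # consecutive addresses share a block
        if block in plb:
            plb.move_to_end(block)       # hit: no extra ORAM access
        else:
            misses += 1                  # miss: one extra ORAM access
            plb[block] = True
            if len(plb) > plb_entries:
                plb.popitem(last=False)  # evict the least recently used block
    return misses

sequential = list(range(64))             # good locality
strided = [i * 8 for i in range(64)]     # every access touches a new block
print(plb_misses(sequential, 8, 4), plb_misses(strided, 8, 4))   # 8 64
```

Both inputs contain 64 requests, yet the resulting number of ORAM accesses differs by 56, so the length of the ORAM access sequence reflects the locality of the program.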
3.4.2 ORAM Prefetching. In order to exploit data locality in programs under Path ORAM, ORAM prefetchers have been proposed [36, 52]. At first glance, exploiting data locality and obfuscation seem contradictory: on the one hand, obfuscation requires that all data blocks are mapped to random locations in the memory. On the other hand, locality requires that certain groups of data blocks can be efficiently accessed together. However, Path ORAM prefetchers address this problem by (statically/dynamically) creating "super blocks" of data blocks exhibiting locality, and mapping the whole super block onto the same path. As a result, a single path read for accessing one particular block yields the corresponding super block, which is loaded into the LLC, effectively resulting in a prefetch. Consequently, good data locality in the program results in more prefetch hits and overall fewer ORAM accesses, and vice versa.
3.4.3 Timing Channel Protection. As noted earlier, the ORAM definition does not protect against leakage over the timing channel (cf. Section 2.1), i.e., when an ORAM access is made. Periodic ORAM schemes have been proposed to protect the timing channel [14, 15]. A periodic ORAM always makes an access at strict periodic intervals, where the time interval $O_{int}$ between two consecutive accesses is public. If there is no pending memory request when an ORAM access needs to happen due to periodicity, a dummy access will be issued (the same operation as background eviction). Conversely, if a real request arrives before the next ORAM access time, it waits until the next ORAM access time to enforce a deterministic behavior. Hence, periodic ORAMs essentially transform the timing channel leakage into termination channel leakage by potentially introducing extra ORAM accesses due to periodicity.
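A simplified model of such a periodic schedule is sketched below. It assumes that request arrival times are given up front and that an access completes within one interval, which ignores the feedback between ORAM latency and later LLC misses; the function name and the `real`/`dummy` labels are hypothetical.

```python
from collections import deque

def periodic_schedule(arrivals, o_int):
    """Return (time, kind) for each ORAM slot until all requests are served."""
    if o_int <= 0:
        raise ValueError("the public interval must be positive")
    pending = deque(sorted(arrivals))
    schedule, t = [], o_int
    while pending:
        if pending[0] <= t:
            pending.popleft()            # a waiting request is served now
            schedule.append((t, "real"))
        else:
            schedule.append((t, "dummy"))  # nothing pending: dummy access
        t += o_int
    return schedule

slots = periodic_schedule([1, 2, 35], o_int=10)
print(slots)   # [(10, 'real'), (20, 'real'), (30, 'dummy'), (40, 'real')]
```

An adversary sees an access at every multiple of the interval and cannot tell real from dummy accesses; what remains observable is the total number of slots, i.e., when the program terminates.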
Implications on Path ORAM Stash Size
Proving that the stash overflow probability is negligible implies Path ORAM's correctness and security. The stash overflow probability drops exponentially in the stash size. A significantly complex proof presented in [42] shows that, for Z ≥ 6, a negligible stash overflow probability can be achieved by configuring the stash size appropriately, where Z represents the number of blocks per node in Path ORAM's binary tree. These parameter settings might be well suited for asymptotic analysis; however, real implementations might choose a different set of parameters to optimize various design points. For example, a smaller stash size is desired to save hardware area overhead. Similarly, studies [36] have shown that Z = 3 yields the best performance for Path ORAM.
For smaller stash sizes and/or Z < 6, stash overflow can be prevented through background eviction (cf. Section 2.5), which essentially adds 'extra' dummy accesses to the original ORAM access sequence. Notice, however, that satisfying Definition 3.1 requires restricting the ORAM access sequences to have identical length distributions, and hence it does not apply to background eviction, which would probabilistically modify the lengths of ORAM sequences depending upon the stash occupancy, which is correlated with the program input.
As an example, consider a 2-level recursive Path ORAM where the original position map is stored in a second ORAM, and the second ORAM's position map is stored in the trusted processor (cf. Section 2.4). Let $A_1$ and $A_2$ be two program address sequences and let ORAM($A_1$) and ORAM($A_2$) be their corresponding ORAM access sequences. Notice that each entry of the sequence ORAM($A_i$) consists of two accesses corresponding to the position map ORAM and data ORAM respectively, and is therefore likely to increase the stash occupancy by 2. Further notice that by definition of the recursive ORAM structure, each position map ORAM block contains path/leaf labels of several data ORAM blocks consecutively located in the program's address space.
Assume that A 1 accesses consecutive data blocks in the program's address space, whereas A 2 accesses random data blocks.
Then, subsequent accesses from sequence ORAM($A_1$) will exhibit higher temporal locality for position map blocks. This is because several position map accesses, corresponding to data blocks consecutive in the program's address space, will access the same position map block, which is likely to be present already in the stash. Therefore, the stash occupancy will grow at a rate of < 2 blocks per recursive access. In contrast, subsequent accesses from ORAM($A_2$) exhibit extremely poor temporal locality among position map blocks due to the randomized sequence $A_2$; therefore the stash occupancy will grow at a rate of ≈ 2 blocks per recursive access. Consequently, the two ORAM access sequences exhibit two different stash occupancies due to the underlying program's behavior.
PROPOSED DEFINITION
In order to argue about indistinguishability of ORAM access sequences, we interpret Goldreich and Ostrovsky's ORAM definition to also incorporate infinite length input access sequences. This implicitly obfuscates termination channel leakage, so that the termination channel cannot be used for leakage in the definition (this separates out the termination channel and invalidates our bogus ORAM as an ORAM).
] n for some (finite) integers i and j. Let k = max{i, j}.
Then (by using causality)

Finally, we notice that the above definitions can also be adopted in a Universal Composability framework as in [16].
Application
In the secure processor setting an input sequence A represents the LLC misses sequence. In practice, we may think of the processor as continuously accessing memory (DRAM) and therefore producing an infinite length input access sequence A. This sequence is produced by several programs being context switched in and out, some programs terminating and new ones starting.
This shows that for an ORAM definition to be useful in the secure processor architecture setting we require Goldreich and Ostrovsky's ORAM Definition 3.1 phrased for infinite length input sequences. The termination channel is separated out as the ORAM interface does not terminate and keeps on executing. If a program (module) P terminates, it will communicate its computed result over a different I/O channel. The moment at which this happens leaks information to an observing adversary; in fact, the adversary can be another program running on the secure processor whose own termination channel leaks to what extent P has slowed down the adversarial program by using shared resources. In Section 5 we propose a framework for analysing leakage over covert channels induced by shared resources.
In the secure processor architecture setting we do not need the 'strong' ORAM definition. It turns out that PathORAM + background eviction satisfies the ORAM definition for infinite length input sequences, and its security proof is straightforward. However, PathORAM + background eviction does not satisfy the 'strong' ORAM definition. We notice that PathORAM without optimizations such as background eviction does satisfy the 'strong' ORAM definition, and proving this requires a much more complex analysis (as one needs to show that the stash only overflows with negligible probability). The 'strong' ORAM definition makes sense and is useful in the remote disk storage setting because we will access the remote storage in bursts of requests and we wish the ORAM interface to only reveal the length of the burst and nothing more; in this way the 'strong' ORAM implicitly provides a useful characterization of leakage through the timing channel (i.e., when accesses happen).
The above definitions translate to write-only ORAM: HIVE [6] and Li et al. [27] essentially use the 'strong' write-only ORAM definition since these papers discuss the remote disk storage setting. Flat ORAM [22], on the other hand, is designed and optimized for the secure processor setting and is secure under the Goldreich and Ostrovsky equivalent of a write-only ORAM definition for infinite length input sequences. Flat ORAM does not (and does not need to) satisfy the 'strong' write-only ORAM definition.
Adapting ORAM Optimizations
An ORAM interface satisfying Definition 4.1 "automatically" caters for the arbitrary and dynamically changing rate of ORAM(A) accesses to memory per input access in A, and is therefore naturally a better fit for practical ORAM implementations (e.g., Path ORAM) in the secure processor setting when compared to Goldreich and Ostrovsky's ORAM Definition 3.1 for finite length input sequences:
The cumulative effect of various performance optimizations outlined in Section 3.4 on the termination channel can be incorporated in the proposed ORAM by definition. E.g., the additional accesses added by the periodic ORAM schemes in order to hide the ORAM access timing, or ORAM prefetching resulting in a reduced number of accesses, only result in an altered access sequence, which still remains of infinite length.
Simplified Stash Analysis
Another crucial advantage of the proposed definition is that it greatly simplifies the stash size analysis for Path ORAM. As mentioned earlier, the stash must never overflow for the correctness or security of Path ORAM under the 'strong' ORAM Definition 4.3, which imposes certain restrictions on the minimum stash size and ORAM parameters, e.g., Z ≥ 6. In contrast, according to Definition 4.1, it is totally acceptable to have a substantial percentage of background eviction accesses among the overall ORAM accesses, if needed, in order to prevent stash overflow for arbitrary parameter settings. However, the impact of this relaxed ORAM definition with any chosen parameter settings is reflected in the overall performance of the system. The system performance can then essentially be benchmarked to tune the optimum settings for desired design points depending upon the application.
PRIVACY LEAKAGE ANALYSIS
Recall from Section 3.1 that a standard oblivious RAM protects only against leakage over the memory address channel. In this section, we first discuss common mitigation techniques for other leakage sources, e.g., the ORAM timing channel and termination channel. Later, we present a generic framework, called PRAXEN, that offers security vs. performance trade-offs against a wide range of hardware side channel attacks in a secure processing environment.
Timing Channel
5.1.1 Static Periodic Behavior. A straightforward approach to hide the ORAM timing behavior is to use a periodic ORAM scheme [15], as introduced in Section 3.4.3. An ORAM access is made strictly after predefined periods, where the access period is statically defined offline, i.e., before the program runs.
The security of this approach follows trivially, as it completely trades off the timing channel leakage against the total runtime of the program, i.e., it alters the termination channel behavior. Notice that even if the periodic ORAM controller dynamically changes some internal performance parameters, such as the prefetching rate or the threshold that controls the background eviction rate, the resultant ORAM access sequence remains strictly periodic, so these changes only alter the termination time of a program.
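The behavior described above can be illustrated with a minimal sketch. It assumes a simplified model that is not part of the original text: requests arrive at known times, the controller serves at most one request per slot, and a slot with no pending request is filled by a dummy access. The function name `periodic_schedule` and the example arrival times are hypothetical.

```python
from collections import deque


def periodic_schedule(arrivals, period):
    """Serve requests on a strictly periodic schedule (simplified model).

    One ORAM access is issued at every multiple of `period`. If a request
    has arrived by then it is served ("real"); otherwise a dummy access is
    issued. Returns a list of (time, kind) slots until all requests are done.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    pending = deque(sorted(arrivals))
    slots = []
    t = period
    while pending:
        if pending[0] <= t:
            pending.popleft()
            slots.append((t, "real"))
        else:
            slots.append((t, "dummy"))
        t += period
    return slots


# Two hypothetical programs with different request patterns.
trace_a = periodic_schedule([1, 2, 35], period=10)
trace_b = periodic_schedule([5, 6, 7], period=10)

# The adversary cannot tell real from dummy accesses, so it sees only times.
print([t for t, _ in trace_a])  # [10, 20, 30, 40]
print([t for t, _ in trace_b])  # [10, 20, 30]
```

In this model both observable traces are prefixes of the same periodic sequence, so the only thing that differs between the two programs is the trace length, i.e., the termination time.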
5.1.2 Dynamic Periodic Behavior. While the static periodic approach discussed above is secure, studies have shown that this approach can potentially result in significant performance overheads across a range of programs [14]. On one hand, a constant rate of ORAM accesses throughout the program execution is desirable for security, whereas on the other hand, a dynamically varying access rate is desirable for performance. In order to achieve a balance between the two extremes, [14] proposes a framework that splits the program execution into coarse-grained (logical) time epochs, and enforces, within each epoch, a strict ORAM access rate that is selected dynamically at the start of each epoch.
Let $L_{\max}$ be the maximum program runtime in terms of the number of ORAM accesses, such that all programs can complete with at most $L_{\max}$ ORAM accesses. Let $E$ denote the list of epochs of a program execution, or the epoch schedule, where each epoch is characterized by its number of ORAM accesses, and let $R$ denote the list of allowed ORAM access rates. While running a program during a given epoch, the secure processor is restricted to a single ORAM access rate, and it picks a new rate configuration at the start of the next epoch. Given $|E|$ epochs and $|R|$ rates, there are $|R|^{|E|}$ possible epoch schedules, which can potentially reveal the dynamic behavior of the program. Thus, the timing channel leakage alone can be upper bounded by $\log_2(|R|^{|E|}) = |E| \log_2 |R|$ bits. To control the amount of leakage, $|E|$ can be set to a small value, e.g., $|E| = \log_2 L_{\max}$, resulting in only $\log_2 L_{\max} \cdot \log_2 |R|$ bits of leakage while achieving good performance.
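As a worked illustration of this bound, the following sketch evaluates $|E| \log_2 |R|$ for one parameter choice. The values $L_{\max} = 2^{62}$ and $|R| = 4$ are assumptions chosen only for the example, and the function name is hypothetical.

```python
import math


def timing_leakage_bits(num_epochs: int, num_rates: int) -> float:
    """Upper bound |E| * log2|R| on timing-channel leakage, in bits."""
    if num_epochs < 0 or num_rates < 1:
        raise ValueError("need num_epochs >= 0 and num_rates >= 1")
    return num_epochs * math.log2(num_rates)


# Assumed example parameters (not taken from the text).
l_max = 2**62
num_rates = 4
num_epochs = int(math.log2(l_max))  # |E| = log2(L_max) = 62

print(timing_leakage_bits(num_epochs, num_rates))  # 124.0
```

With a single allowed rate the bound evaluates to zero, which matches the static periodic case where the schedule carries no information.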
Termination Channel
If the results of a program are sent back as soon as the application actually terminates, i.e., the actual termination time is visible to the adversary, sensitive information about the application's input can be leaked by this behavior. Given the maximum number of ORAM accesses $L_{\max}$ within which all programs can terminate, the maximum number of termination traces/lengths that any program can possibly have is upper bounded by $L_{\max}$, i.e., one trace per termination point. Therefore, applying the information theoretic argument from [14, 40], at most $\log_2 L_{\max}$ bits about the inputs can leak through the termination time alone per execution. In practice, due to the logarithmic dependence on $L_{\max}$, termination time leakage is small. For example, $\log_2 L_{\max} = 62$ should work for all programs, which is very small if the user's input is at least a few kilobytes. Further, we can reduce this leakage through discretization of the runtime. For example, if we "round up" the termination time to the next $2^{30}$ accesses, the leakage is reduced to $\log_2 2^{62-30} = 32$ bits. The overall leakage through both the timing and termination channels is given by $\log_2 L_{\max} \cdot \log_2 |R| + \log_2 L_{\max}$ bits.
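The arithmetic in this paragraph can be checked with a short sketch. The function names are hypothetical, and $|R| = 4$ is an assumed value used only to evaluate the combined bound.

```python
import math


def termination_leakage_bits(l_max: int, granularity: int = 1) -> float:
    """log2 of the number of distinguishable termination points.

    The termination time is rounded up to a multiple of `granularity`
    accesses, so at most ceil(l_max / granularity) lengths are observable.
    """
    if l_max < 1 or granularity < 1:
        raise ValueError("l_max and granularity must be positive")
    points = -(-l_max // granularity)  # integer ceiling division
    return math.log2(points)


def total_leakage_bits(l_max: int, num_rates: int) -> float:
    """Combined bound log2(L_max) * log2|R| + log2(L_max), in bits."""
    log_l = math.log2(l_max)
    return log_l * math.log2(num_rates) + log_l


L_MAX = 2**62
print(termination_leakage_bits(L_MAX))         # 62.0
print(termination_leakage_bits(L_MAX, 2**30))  # 32.0
print(total_leakage_bits(L_MAX, num_rates=4))  # 186.0 (|R| = 4 is assumed)
```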
Other Hardware Side Channels
While an outside adversary can only monitor an ORAM's external side channels, such as the timing and termination channels, a modern multi-core secure processor also has several internal hardware-based side channels due to the inevitable sharing of various structures. SVF [11] experimentally measured information leakage in a processor and showed that any "shared structure" can leak information. In particular, privacy leakage over a shared cache has been explicitly demonstrated in [1, 54] for two VMs sharing a cache (without TEE support), showing that secret key bits can leak from one VM to the other, even if the VMs are placed on different cores in the same machine.
Researchers have explored how to counter timing channel attacks due to cache interference [12, 48], where solutions rely on either static or dynamic cache partitioning. The static approach lowers processor efficiency but has a strong security guarantee: no information leakage. Current solutions based on the dynamic cache partitioning approach improve processor efficiency but do not guarantee bounds on information leakage. We note that efficient cache partitioning is important as it improves processor efficiency [5, 25, 26, 35, 37, 45, 51].
Researchers have also explored how to counter timing channel attacks due to network-on-chip interference in multi-cores [47, 49]. Both these schemes use static network partitioning to enable information-leak protection through the processor communication patterns.
Finally, the most important shared resource channel in the ORAM context that leaks information from the hardware layer is the shared ORAM controller that connects (via a traditional memory controller) the processor to the off-chip memory. A recent work [4] shows that, under Path ORAM, an adversary running a malicious thread on one of the cores of the multi-core system can learn sensitive information about the behavior of user thread(s) running on other core(s) by introducing contention at the shared ORAM controller and observing the service times of its own requests. Again, a static partitioning scheme for this information leakage channel can be used at the cost of efficiency.
We want to design a generic dynamic resource partitioning scheme, applicable to any shared resource(s), based on the insight that leakage can be quantified using information theory [3, 40, 53], in order to achieve a balance between security and performance.
PRAXEN: A PRivacy Aware eXecution ENvironment
In order to control privacy leakage while still dynamically sharing resources for efficiency, we propose a generic resource scheduling strategy which takes only a small, yet sufficient, amount of information about the current and past execution of application threads into account. Each application thread $i \in A$ is associated with a configuration $c_i$ which serves as input to the resource scheduler for allocating resources to each thread, i.e., the scheduler assigns resources to each thread according to some (probabilistic) algorithm $\mathrm{Alloc}((c_j)_{j \in A})$.
For example, based on the collection of configurations $(c_j)_{j \in A}$, the resource scheduler may first, by using interpolation and extrapolation, reconstruct a complete approximate picture of all performance indicators which measure how all resources are being used by each of the threads. This rough picture is used to allocate the current resources to each thread; this allocation will not change (it is static) until one of the application threads' configurations $c_i$ changes.
The reason not to use the currently measured performance indicators (as an input to Alloc) for scheduling is that these change dynamically with respect to execution decisions based on each application thread's state, which gives an uncontrolled amount of leakage. As we will see, the above static allocation allows precise control of privacy leakage from one application to another.
We call a change of thread $i$'s configuration from a current configuration $c_i$ to a new configuration $c'_i$ a decision point for $i$. Each decision point is associated with an actual time $t_i$. At a decision point, the scheduler takes the real, i.e., actually measured, performance indicators of thread $i$ in combination with its history of resource allocations to select a new configuration $c'_i$ together with
• a future time $t'_i$ at which the next decision point for $i$ occurs, as well as
• a set of future configurations $C'_i$ from which the next configuration for $i$ will be taken.
We record the tuples $(i, c'_i, t_i, C'_i, t'_i)$ in a history ordered by time $t_i$. Notice that according to this ordering $(i, c'_i, t_i, C'_i, t'_i) < (i, c''_i, t'_i, C''_i, t''_i)$ for two consecutive decision points of thread $i$, and the above requirements state that $c''_i \in C'_i$.
So, at time $t_i$ a decision has been made about what configuration for $i$ can be selected at the next decision point, and when this decision is applied. For a current time $t$ we can extract from the past history PastHist of decision points the most recent tuples $(i, c'_i, t_i, C'_i, t'_i)$ with $t_i < t$ for $i \in A$. We compute the time of the next upcoming decision point NextDecisionPoint as $\mathrm{Next}(\mathrm{PastHist}, t) = \min_{i \in A,\; t'_i \geq t} t'_i$.
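The selection of the next decision point can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: each recorded tuple is assumed to carry the thread identifier, the selected configuration, the decision time, the set of future configurations, and the announced time of that thread's next decision point. The names `DecisionPoint` and `next_decision_point`, and the example history, are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, NamedTuple, Optional


class DecisionPoint(NamedTuple):
    thread: int                   # thread identifier i
    config: str                   # configuration selected at this point
    time: int                     # actual time of this decision point
    future_configs: FrozenSet[str]  # allowed configurations at the next point
    next_time: int                # announced time of the next decision point


def next_decision_point(
    past_hist: Iterable[DecisionPoint], now: int
) -> Optional[DecisionPoint]:
    """Return the most recent tuple whose announced next time is the minimum
    over all threads among those with next_time >= now."""
    latest: Dict[int, DecisionPoint] = {}
    for dp in sorted(past_hist, key=lambda d: d.time):
        if dp.time < now:
            latest[dp.thread] = dp  # keep the most recent tuple per thread
    upcoming = [dp for dp in latest.values() if dp.next_time >= now]
    return min(upcoming, key=lambda d: d.next_time) if upcoming else None


# Hypothetical history with two threads.
history = [
    DecisionPoint(0, "low", 0, frozenset({"low", "high"}), 10),
    DecisionPoint(1, "high", 2, frozenset({"low", "high"}), 7),
]
nxt = next_decision_point(history, now=5)
print(nxt.thread, nxt.next_time)  # 1 7
```

The thread stored in the returned tuple is the one whose configuration the scheduler may change at the upcoming decision point.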
Let $i$ be the application thread which corresponds to the upcoming decision point. At this decision point the scheduler is allowed to change only $i$'s configuration. The scheduler computes:

    if Time ∈ [NextDecisionPoint, NextDecisionPoint + δ] then    ▷ δ makes the approach reliable
        Obtain PerfInd_i for thread i corresponding to NextDecisionPoint
        x ← F(PastHist, NextDecisionPoint, PerfInd_i)
        Change configuration c_i to the one indicated in x at time NextDecisionPoint + δ

Leakage Analysis: In the worst case all cores/threads, except for one, can collaborate (i.e., act as malicious threads) to observe one specific (victim) thread $i$ (and, in particular, observe its configuration changes). Note that the collaborating threads can only observe the victim thread $i$ through changes in resource allocation. We argue that this information is fully captured by $i$'s configuration changes and the times when these changes happened. The reason is that each epoch has (1) a static resource allocation among threads (e.g., DRAM bandwidth, ORAM access rate, etc.), preventing internal side channel leakage within an epoch, and (2) indistinguishability of real vs. dummy ORAM accesses, preventing external side channel leakage within an epoch.
Therefore, across time the collaborating threads can only observe and use the output of $\mathrm{Alloc}((c_j)_{j \in A})$ in order to extract information about thread $i$. Hence, the privacy leakage of thread $i$ is at most the information about thread $i$ contained in PastHist, which includes the history of configurations (that form the inputs to Alloc). We notice that each decision point at time $t$ in PastHist is the result of an algorithm $F$ which only takes as inputs a history $\mathrm{PastHist}_j$ of past decision points before time $t$, together with the corresponding NextDecisionPoint (which is also a function of past decision points before time $t$), and $\mathrm{PerfInd}_j$. Therefore, by using induction on $t$, we can prove that only the decision points corresponding to thread $i$ in PastHist contribute to the leakage (through $\mathrm{PerfInd}_i$) of thread $i$.
We conclude that the privacy leakage of a specific thread $i$ is at most the information about thread $i$ given by the history of $i$'s decision points (i.e., configuration changes and the times at which these happen): the number of leaked bits is at most the Shannon entropy $H(\mathrm{PastHist}_i)$ of the tuples recorded for thread $i$. We may order these random variables [...]; and the $j$-th term of the second sum is upper bounded by $\log |C^{(j-1)}|$ because $c^{(j)} \in C^{(j-1)}$. Let $\lambda_j = \log |C^{(j-1)}|$; then the $i$-th thread leaks at most $\sum_j \lambda_j$ bits.
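The final bound is easy to evaluate for a concrete history. The sketch below assumes base-2 logarithms (since the leakage is stated in bits) and takes as input the sizes $|C^{(j-1)}|$ of the future-configuration sets announced before each of thread $i$'s decision points. The function name and the example sizes are hypothetical.

```python
import math
from typing import Iterable


def thread_leakage_bound_bits(future_set_sizes: Iterable[int]) -> float:
    """Sum of lambda_j = log2|C^(j-1)| over a thread's decision points."""
    sizes = list(future_set_sizes)
    if any(n < 1 for n in sizes):
        raise ValueError("each configuration set must be non-empty")
    return sum(math.log2(n) for n in sizes)


# Hypothetical thread with three decision points whose next configuration
# was drawn from sets of 4, 2 and 8 allowed configurations respectively.
print(thread_leakage_bound_bits([4, 2, 8]))  # 6.0
```

A scheduler working under a fixed leakage budget could use such a running sum to decide how large the next set of allowed configurations may be; a set of size one contributes zero bits.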
Given that algorithms $F$ and Alloc have enough freedom to reallocate resources, our framework offers a controlled leakage model while maintaining optimum performance. This methodology can be used on almost all resource sharing paradigms. It particularly has applications in settings where there is a finite bounded leakage budget.
CONCLUSION
We present a first rigorous study of the original oblivious RAM definition presented by Goldreich and Ostrovsky, in view of modern practical ORAMs (e.g., Path ORAM), and demonstrate the gap between theoretical foundations and real ORAM implementations. Goldreich and Ostrovsky's ORAM definition, appropriately interpreted for infinite length input access sequences, separates out the ORAM termination channel and fits modern practical ORAM implementations in the secure processor setting. The proposed definition greatly simplifies the Path ORAM security analysis by relaxing the constraints around the stash size and overflow probability, and essentially transforms the security argument into a performance consideration problem. A generic framework for dynamic resource partitioning has also been proposed, which mitigates the leakage of sensitive information via internal hardware-based side channels, such as contention on shared resources, with minimal performance loss.
ACKNOWLEDGMENTS
The work is partially supported by NSF grant CNS-1413996 for MACS: A Modular Approach to Cloud Security.
