Abstract. In this paper we show how to engineer proofs of security for software implementations of leakage-resilient cryptosystems on execution platforms with concurrency and caches. The proofs we derive are based on binary executables of the cryptosystem and on simple but realistic models of microprocessors.
Introduction
The sharing of hardware resources is fundamental for the cost-effective implementation of concurrency in processors, operating systems, and the cloud. Unfortunately, sharing of hardware between conflicting parties introduces side channels that breach the isolation between processes and virtual machines. Typical goals of side-channel attacks are the recovery of cryptographic keys [20] and of private information about users [22]; shared resources that have been exploited to this end are processor caches [7], branch prediction units [3], and main memory [14].
Leakage-resilient cryptosystems [10, 26] offer formal security guarantees even if the underlying hardware reveals partial information about the internal state of the computation. In today's leakage-resilient cryptosystems, the modeling of leakage focuses on physical characteristics such as power consumption or electromagnetic radiation. So far, there has been little focus on applying leakage-resilient cryptography to other forms of leakage that arise in the setting of modern computing platforms, and in particular to leakage through caches.
In this paper we show how to engineer proofs of security for software implementations of leakage-resilient cryptosystems on execution platforms with concurrency and caches. Our proofs are based on binary executables of the cryptosystem and on simple but realistic models of microprocessors. We obtain them by tackling the following technical challenges:
-We propose a novel notion of leakage that caters for concurrent access-based adversaries [13, 20]. This notion of leakage characterizes an adversary that can choose an initial cache state and observe the final cache state for each time slice of a concurrently running computation. We specialize this notion of leakage to pseudorandom generators, and propose a new security definition in which the adversary can freely interleave request queries, which leak information about keys, with test queries, which return a real or a random output according to a secret bit b.
Then, we prove leakage-resilience of a PRF-based PRG in this model. Our proof goes beyond the one from [26] in that it allows leakage functions to be adaptively chosen and makes weaker assumptions on locality of leakage. These relaxations are essential for dealing with concurrent access-based adversaries as considered in this paper. We cast our proof in terms of games, for future certification using automated tools [5] .
-We propose a novel program analysis technique that allows us to statically derive upper bounds on the range of all leakage functions that a concurrent access-based adversary can apply to the state of the cipher. These upper bounds can be used for instantiating the parameters of the cryptographic proof. Our technique is based on an algorithm that efficiently maintains a compact representation of a superset of the set of observations that any concurrent cache adversary can make. We cast this algorithm as an abstract domain [8] , which we plug into CacheAudit [9] , a framework for the automatic, static analysis of cache side-channels of binary executables.
We perform a case study in which we use our analysis techniques for certifying the security of a binary executable of a leakage-resilient pseudorandom generator that is based on a library implementation of the AES block cipher. Using our novel abstract domain, we derive bounds for the side-channel leakage of this implementation to concurrent cache adversaries, which we use to instantiate the cryptographic proof. For example, we can show that the advantage of an adversary in distinguishing the output of the PRG from random is upper-bounded by $2^{-94}$ for an 8KB cache with 128B line size for AES-256. We stress that on several modern CPUs, AES is either implemented in hardware [1] or can be implemented in software without cache side channels [15]. Here, we use AES as a simple yet realistic example for demonstrating the feasibility of platform-based security proofs.
In summary, our contributions are to show how existing cryptosystems can be connected to a notion of leakage that captures caches and concurrency, and to develop program analysis techniques that enable us to statically deliver leakage bounds based on executable code.
Related Work. Leakage-resilient cryptography (e.g. [10, 19]) provides models for expressing the security of cryptosystems against adversaries that can obtain partial information about the internal state of the computation. Yu et al. [26] specialize these models to match engineering experience in power analysis attacks. In particular, they account for an adversary who chooses the leakage functions a priori, i.e. before the attack. Moreover, their model requires that each leakage function is applied only to the inputs and outputs of a particular round. Based on these assumptions, they prove the security of a simple pseudorandom generator.
Our leakage model is inspired by that of Yu et al. [26] , but it differs in the following two aspects. First, we allow the adversary to adaptively choose leakage functions between rounds, from a fixed set of leakage functions. This accounts for the fact that, in concurrent cache attacks, the adversary can partially influence the leakage during the attack by interacting with the cache and the scheduler.
Second, instead of applying a leakage function to the inputs and outputs of a round, we apply it to all previously sampled keys. This accounts for the fact that, in cache attacks, keys might persist in the cache beyond the rounds in which they are used.
On the static analysis side, we base our work on CacheAudit [9] , a framework for the automatic, static analysis of cache-based side-channels. CacheAudit makes use of the fact that one can obtain upper bounds for the information leaked through the cache by abstract interpretation and model counting [17, 18] .
An alternative, language-based approach by Zhang et al. [27] is to mitigate timing side channels based on systematic addition of delays. Another approach by Stefan et al. [24] uses typing and restrictive scheduling to close cache timing leaks. Our adversary model differs from theirs in that we consider access-based adversaries, i.e. those that can probe the cache. Finally, two recent approaches rely on the operating system making sure that caches are flushed upon context switches [4] or that security-relevant blocks are never evicted from the cache [16] . In contrast, our approach focuses on the security of the client program and makes only weak assumptions on the operating system.
Organization of this paper
In Section 2 we formalize a leakage model for concurrent cache adversaries and in Section 3 we present a proof of cryptographic security against this leakage model. In Section 4 we present algorithms to compute bounds on the leakage based on binary code, which we put to work in a case study in Section 5. We conclude in Section 6.
Leakage to Concurrent Cache Adversaries
In this section we express the information that is leaked to a cache side-channel adversary in terms of program semantics, where we consider a scenario in which the adversary and the victim are concurrent processes that share the same processor cache. Upon a context switch the adversary partially observes the final cache state of the victim's computation. The adversary can further choose the initial state of the cache for the victim's subsequent time slice. Early instances of this kind of attack against AES can be found in [7, 20]; a more recent and highly effective one is [13].
Programs, Computations, Caches
A program P = (Σ, I, T) consists of the following components: a set of states Σ, a set of initial states I ⊆ Σ, and a transition relation T ⊆ Σ × Σ.
For reasoning about cache side channels, we consider a program state that consists of a logical memory in M (representing values of memory locations and registers) and a cache state in C (representing the memory blocks that are currently loaded, but not their content), i.e., Σ = M × C. The memory update is a function $\mathit{upd}_M : M \to M$ that is determined by the instruction set semantics. The cache update is a function $\mathit{upd}_C : M \times C \to C$ that is determined by the cache replacement strategy. For a formal description of the LRU replacement strategy, see Appendix A.1. We obtain the global transition relation T ⊆ Σ × Σ as
$$T = \{((m, c), (\mathit{upd}_M(m), \mathit{upd}_C(m, c))) \mid m \in M, c \in C\},$$
which formally captures the asymmetric relationship between logical memories and caches: the cache update depends on the logical memory, but not vice versa.
A computation of P is a sequence of states $\sigma_0 \sigma_1 \ldots \sigma_n \in \Sigma^*$ such that $\sigma_0 \in I$ and $(\sigma_i, \sigma_{i+1}) \in T$ for all $i \in \{0, \ldots, n-1\}$. The set of all computations is the trace collecting semantics $\mathit{Col}(P)$. We further denote the projection of all computations to logical memories by $\mathit{Col}_M(P)$.
Leakage to Concurrent Adversaries
We assume that our program runs concurrently with the adversary, where we make the worst-case assumption that the adversary can probe and set the cache state at each context switch. For formalizing this adversary we assume a given set of context switches $A \subseteq \mathbb{N}$. A concurrent computation for A is a sequence of states $(m_0, c_0), \ldots, (m_n, c_n) \in \Sigma^*$ such that
1. for all $i \in \{0, \ldots, n-1\}$: $m_{i+1} = \mathit{upd}_M(m_i)$, i.e. the logical memory is always updated according to the program semantics;
2. for all $i \in \{0, \ldots, n-1\} \setminus A$: $c_{i+1} = \mathit{upd}_C(m_i, c_i)$, i.e. without a context switch the cache is updated according to the program semantics;
3. for all $i \in A \cup \{0\}$: $c_{i+1} = \mathit{upd}_C(m_i, c^*_i)$, i.e. at each context switch and initially, the cache can be set to an arbitrary state $c^*_i$ by the adversary.
That is, the adversary's choices can be expressed as a tuple $a \in C^{A \cup \{0\}}$ of cache states. They define a mapping into $\mathit{Col}_A(P)$, where $\mathit{Col}_A(P)$ denotes the set of all concurrent computations for A. Likewise, we can express the observations an adversary can make at context switches in A as a function $\pi_A : \mathit{Col}_A(P) \to C^A$ that projects concurrent computations to sub-sequences of cache states with indices in A. The composition of both functions defines a leakage function $\Lambda_{A,a}$ that maps internal states of the computation to cache observations at A. Notice that cache states in our model only track which memory blocks are loaded, but not their content. Observing a cache state models leakage about accesses to memory space that is shared between victim and adversary, as in [13]. For modeling disjoint memory spaces, we consider observations that only reveal how many memory blocks are loaded in each cache set [17].
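The definitions of concurrent computations and of the leakage function can be made tangible with a toy simulation. This is a minimal sketch under simplifying assumptions, not part of the formal model: the logical memory is abstracted to the sequence of blocks the victim accesses, the cache is a single fully associative LRU set, and the adversary observes the exact cache state (the shared-memory case). The names `lru_update`, `concurrent_caches` and `leakage` are hypothetical.

```python
Cache = tuple[int, ...]  # loaded memory blocks, most recently used first


def lru_update(cache: Cache, block: int, ways: int) -> Cache:
    """Single fully associative LRU set: move/insert `block` to the front, evict beyond `ways`."""
    rest = tuple(b for b in cache if b != block)
    return ((block,) + rest)[:ways]


def concurrent_caches(accesses: list[int], A: set[int], a: dict[int, Cache], ways: int) -> list[Cache]:
    """Cache states c_0..c_n of a concurrent computation for the context switches A.

    accesses[i] is the block touched by the victim in step i (a stand-in for m_i);
    a[i] is the cache state chosen by the adversary for every i in A ∪ {0}.
    """
    caches: list[Cache] = [()]  # c_0 is unconstrained: step 0 starts from a[0] instead
    for i, block in enumerate(accesses):
        base = a[i] if (i in A or i == 0) else caches[i]  # conditions 3 and 2
        caches.append(lru_update(base, block, ways))
    return caches


def leakage(accesses: list[int], A: set[int], a: dict[int, Cache], ways: int) -> tuple[Cache, ...]:
    """Lambda_{A,a}: project the concurrent computation onto the cache states at A (pi_A)."""
    caches = concurrent_caches(accesses, A, a, ways)
    return tuple(caches[i] for i in sorted(A) if i < len(caches))


if __name__ == "__main__":
    # The victim's access in step 1 depends on a secret s; one context switch at position 2.
    observations = {leakage([10, s, 11, 12], {2}, {0: (), 2: ()}, ways=2) for s in (5, 6, 7)}
    print(len(observations))  # 3 distinct observations, i.e. |ran(Lambda)| = 3 here
```

In this toy setting, counting the distinct observations over all secrets yields the size of the range of the leakage function, which is the quantity the static analysis later bounds.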
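Leakage about a Key. For expressing the leakage about keys in round-based constructions, we assume that the key of round j can only affect the cache between positions α(j) and ω(j) of each computation. As a consequence, information about the key is observable at context switches between those positions. Moreover, information about the key may also persist in the cache state beyond ω(j) and be observable at the subsequent context switch. We account for this by letting $A_{\alpha,\omega}$ denote the context switches of A between α = α(j) and ω = ω(j), together with the first context switch after ω. Writing $\Lambda^{\alpha,\omega}_{A,a}$ for the restriction of the observations of $\Lambda_{A,a}$ to the context switches in $A_{\alpha,\omega}$, the leakage about the key of round j can then be over-approximated by $\Lambda^{\alpha,\omega}_{A,a}$.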
Schedulers. Without any restrictions on when context switches can occur, we cannot hope to obtain meaningful security guarantees. To model such restrictions, we introduce the notion of a scheduler $S \subset \mathcal{P}(\mathbb{N})$ that describes all permitted sets of context switches A. For a given scheduler, we can completely characterize the set of functions $\mathcal{L}_j$ the adversary can apply to a round key by
$$\mathcal{L}_j = \{\Lambda^{\alpha(j),\omega(j)}_{A,a} \mid A \in S,\ a \in C^{A \cup \{0\}}\}.$$
This class of leakage functions will provide the interface between the cryptographic proofs and the guarantees derived by the static analysis.
Leakage Resilient PRG
Stateful pseudo-random number generators (PRGs) that depend on a secret key can be used as the basis for stream ciphers. Such constructions have been proposed as a means to provide leakage resilient cryptographic primitives [10, 21, 25, 26] . In this section, we prove the security of a stateful pseudo-random number generator based on a pseudo-random function (PRF), assuming partial leakages L j on round keys as discussed in the previous section. Our proof is given in terms of bounds on the advantage of distinguishing the PRG from a truly random generator, depending on the computational power of the adversary and the maximal leakage per key.
Leakage is depicted by dotted arrows.
A Leakage Resilient PRG
The construction is depicted in Figure 1 and is based on a pseudorandom function $\mathsf{2PRF} : \{0,1\}^n \to \{0,1\}^{n+m}$ that takes as input a round key $k_i$ of n bits and returns as output a pseudorandom string of n + m bits. The first n bits of this string are used as the key $k_{i+1}$ for the next round, and the last m bits are output as $x_i$.
The sequence $x_0 x_1 \ldots$ is the output of the pseudorandom generator.
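The round structure can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: `two_prf` is a hypothetical stand-in that instantiates 2PRF with HMAC-SHA256 (not the AES-based instantiation used in the case study), and n = m = 128 bits are illustrative choices.

```python
import hashlib
import hmac

N_BYTES = 16  # n = 128 bits (illustrative choice)
M_BYTES = 16  # m = 128 bits (illustrative choice)


def two_prf(key: bytes) -> bytes:
    """Hypothetical stand-in for 2PRF: {0,1}^n -> {0,1}^(n+m).

    HMAC-SHA256 yields 32 bytes, i.e. exactly n + m bits for the sizes above.
    """
    return hmac.new(key, b"2PRF", hashlib.sha256).digest()[: N_BYTES + M_BYTES]


def prg(k0: bytes, rounds: int) -> list[bytes]:
    """Return x_0 .. x_{rounds-1}; every round key is used for exactly one 2PRF call."""
    assert len(k0) == N_BYTES
    outputs = []
    k = k0
    for _ in range(rounds):
        block = two_prf(k)
        k, x = block[:N_BYTES], block[N_BYTES:]  # first n bits: k_{i+1}; last m bits: x_i
        outputs.append(x)
    return outputs


if __name__ == "__main__":
    for x in prg(bytes(N_BYTES), 3):
        print(x.hex())
```

Each key is overwritten as soon as the next one is derived, which is what confines the leakage about $k_i$ to a bounded window of the computation.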
Our security proof is inspired by [26] and follows the spirit of so-called practical leakage-resilient cryptography, where bounds are obtained assuming leakage functions that match engineering practice. In particular, we make the assumption that leakage does not reveal information about future computations; for concurrent access-based cache attacks, this assumption is perfectly natural, since caches only hold information about past computations made by the victim. Our proof is in the random oracle model; extending it to the standard model, as done e.g. in [26], is left for future work.
The security of the pseudorandom generator is expressed in terms of a cryptographic game where, in each round i, the adversary can do one of the following:
-test the round, and get the legitimate output $x_i$ or a random output, according to a secret bit b sampled uniformly at the onset of the game. This is called a test query.
Moreover, the adversary has access to an oracle $\mathsf{2PRF}_{adv}$, which he can query for the output of the 2PRF on a key of his choice. After p rounds of this game, the adversary is asked to guess the bit b. The adversary wins if his guess $\hat{b}$ is correct, i.e. $b = \hat{b}$. In summary, this game captures the notion that outputs of the 2PRF should be indistinguishable from random.
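The mechanics of test queries and of the $\mathsf{2PRF}_{adv}$ oracle can be sketched as a small Python class. This is a toy sketch, not the game of Figure 2: request queries and leakage are omitted, 2PRF is modeled as a lazily sampled random oracle stored in a map S (in line with the random oracle model used here), and the convention that b = 1 selects the real output is an assumption. The class and method names are hypothetical.

```python
import secrets


class ToyPRGGame:
    """Toy real-or-random game; request queries and leakage are deliberately left out."""

    def __init__(self, n_bytes: int = 16, m_bytes: int = 16):
        self.n, self.m = n_bytes, m_bytes
        self.S: dict[bytes, bytes] = {}          # random-oracle table for 2PRF
        self.K_adv: set[bytes] = set()           # keys queried through 2PRF_adv
        self.K_reqtest: set[bytes] = set()       # round keys consumed by test queries
        self.b = secrets.randbits(1)             # secret bit, sampled at the onset
        self.key = secrets.token_bytes(n_bytes)  # current round key k_i

    def _two_prf(self, k: bytes) -> bytes:
        """Lazily sample a fresh (n + m)-byte string the first time k is queried."""
        if k not in self.S:
            self.S[k] = secrets.token_bytes(self.n + self.m)
        return self.S[k]

    def two_prf_adv(self, k: bytes) -> bytes:
        """The adversary's direct access to the 2PRF oracle."""
        self.K_adv.add(k)
        return self._two_prf(k)

    def test(self) -> bytes:
        """Advance one round; return x_i if b = 1, otherwise a fresh random string."""
        self.K_reqtest.add(self.key)
        out = self._two_prf(self.key)
        self.key, x = out[: self.n], out[self.n :]
        return x if self.b == 1 else secrets.token_bytes(self.m)

    def finalize(self, guess: int) -> bool:
        """The adversary wins iff its guess equals b."""
        return guess == self.b
```

The sets `K_adv` and `K_reqtest` mirror the bookkeeping used in the proof: a distinguisher can only gain an advantage when these sets intersect or when round keys collide.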
The game is formally defined in Figure 2. We use $K_{adv}$ to store the keys queried by the adversary to the $\mathsf{2PRF}_{adv}$ oracle and $K_{reqtest}$ to store the round
$\mathcal{L}_j$ is the set of functions that the adversary can apply to the key of round j.
We now present the main theorem of this section, which quantifies the advantage of any adversary in distinguishing the PRG from a truly random number generator, given that he makes at most q queries to the $\mathsf{2PRF}_{adv}$ oracle, sees at most p outputs of the PRG, and that the total leakage per key is bounded by a constant d.
Theorem 1. Let A be an adversary that makes at most p queries to request or test, and at most q queries to
Proof. The idea of the proof is to bound the adversary's advantage by the probability of the following events: 1) there is a cycle in the PRG, due to a repetition of a round key, 2) the adversary guesses a round key that was already used in a previous round, and 3) the adversary guesses a round key before it is used. These are precisely the cases in which an adversary could distinguish the PRG from a truly random generator, as we show using a game reduction and Shoup's Lemma [23]. Starting from the original game as depicted in Figure 2, we define a transformed game $G_1$, where we modify the oracles 2PRF and $\mathsf{2PRF}_{adv}$ so that only adversary queries are stored in the map S, whereas queries originating from request and test are always answered with fresh random values. This only makes a difference if there is a collision between secret keys, i.e. $k_i = k_{i'}$ for distinct i and i′ (we call this event $\mathsf{bad}_{RR}$), or if the adversary calls the $\mathsf{2PRF}_{adv}$ oracle with a secret key, i.e. $k_i \in K_{adv}$ for some i; we distinguish the case where the adversary queries $\mathsf{2PRF}_{adv}$ with $k_i$ before the i-th round (we call this event $\mathsf{bad}_{AR}$) from the case where the adversary query occurs after the i-th round (and call this event $\mathsf{bad}_{RA}$). We introduce bad flags to capture these events and modify the code of the oracles as follows:
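The two games are equivalent up to bad. It follows from Shoup's Lemma [23] that
$$\left|\Pr[G : b = \hat{b}] - \Pr[G_1 : b = \hat{b}]\right| \le \Pr[G_1 : \mathsf{bad}_{RR} \vee \mathsf{bad}_{AR} \vee \mathsf{bad}_{RA}],$$
where G denotes the original game of Figure 2.
Moreover, the key $k_i$ is always a freshly sampled value and $|K_{reqtest}| \le p$; therefore the event $k_i \in K_{reqtest}$ ($\mathsf{bad}_{RR}$) is a standard birthday event and has a probability of at most $\frac{p^2}{2^n}$. Also, the probability of the event $k_i \in K_{adv}$ ($\mathsf{bad}_{AR}$) for a given freshly and uniformly sampled key $k_i$ is upper-bounded by $\frac{q}{2^n}$. There are at most p rounds, i.e. $0 \le i < p$; thus the probability of a collision between some $k_i$ and a key in $K_{adv}$ is upper-bounded by $\frac{qp}{2^n}$. Finally, the value x output by the test oracle is a fresh uniformly sampled value for each round, and hence the probability of the adversary A guessing the bit b correctly in $G_1$ is $\frac{1}{2}$. Summarizing, we have (using the union bound)
$$\Pr[G : b = \hat{b}] \le \frac{1}{2} + \frac{p^2}{2^n} + \frac{qp}{2^n} + \Pr[G_1 : \mathsf{bad}_{RA}].$$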
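To get a feeling for the size of the first two bad-event terms, the sketch below evaluates them in log2 for the adversary parameters used in the case study ($q = 2^{50}$ queries, $p \le 2^{25}$ rounds). The key length n = 128 is an illustrative assumption, the birthday term uses the standard bound p(p−1)/2^{n+1} for collisions among p uniformly sampled n-bit keys, and the term $\Pr[G_1 : \mathsf{bad}_{RA}]$, which is handled via the leakage game $G_2$, is not included. The helper name is hypothetical.

```python
import math


def bad_event_bound_log2(n: int, p: int, q: int) -> float:
    """log2 of the bounds on Pr[bad_RR] + Pr[bad_AR] for p round keys and q adversary queries."""
    birthday = p * (p - 1) / 2 / 2**n  # some collision among p fresh uniform n-bit keys
    adv_hit = q * p / 2**n             # one of p fresh keys already lies in K_adv, |K_adv| <= q
    return math.log2(birthday + adv_hit)


print(bad_event_bound_log2(n=128, p=2**25, q=2**50))  # about -53.0: the q*p term dominates
```

With these parameters the birthday term is around $2^{-79}$ and therefore negligible next to the $qp/2^n$ term; the leakage-dependent term is what ultimately decides the security level reported in Section 5.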
Next, we introduce a game $G_2$ in which a fresh key k is sampled uniformly at the onset of the game, and an adversary A′ can observe at each round i the value $\lambda_i(k)$, for a leakage function $\lambda_i$ drawn from a set $\mathcal{L}_i$. The adversary wins if he guesses the key correctly. The game is formalized as follows:
One can prove that for every adversary A against $G_1$ making at most p queries to the request oracle and q queries to the $\mathsf{2PRF}_{adv}$ oracle, there exists an adversary
Computing Bounds on the Leakage
For computing the range of the leakage functions that a concurrent cache-based adversary can apply to the internal state of a concrete program, one needs to consider all possible computations, which is infeasible in most cases. Abstract interpretation [8] overcomes this fundamental problem by resorting to an approximation of the state space and the transition relation. In this section, we present corresponding approximations for concurrent cache-based adversaries.
We proceed by reducing the problem in two steps to the problem of computing numbers of reachable cache states, for which static analysis techniques are in place [9, 17] . Throughout the section we rely on the notation introduced in Section 2.
Reduction to empty initial cache states
In the first reduction step, we show how to soundly abstract from the adversary's choices of cache states. The result is a generalization of a result from [9] to concurrent computations.
Let $a_\emptyset \in C^{A \cup \{0\}}$ be the mapping that takes each $i \in A \cup \{0\}$ to the empty cache state.
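Lemma 1. For the LRU replacement strategy and all A and $a \in C^{A \cup \{0\}}$, we have $|\mathrm{ran}(\Lambda_{A,a})| \le |\mathrm{ran}(\Lambda_{A,a_\emptyset})|$.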
Proof. With LRU replacement, each cache set (seen as a list of memory blocks) of the final cache state of the time slice of a computation with initial cache state $a_\emptyset(i)$ is a prefix of the corresponding cache set of the same computation performed with initial cache state $a(i)$. The remaining lines of each set are determined by $a(i)$. This correspondence defines, for each a, a surjective mapping from $\mathrm{ran}(\Lambda_{A,a_\emptyset})$ to $\mathrm{ran}(\Lambda_{A,a})$, from which the assertion follows.
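The prefix argument can be checked empirically for a single LRU set. The sketch below is an illustration with hypothetical helpers, not a proof: it runs random access sequences from the empty state and from an arbitrary adversary-chosen state, checks the prefix property, and checks that the second final state is a function (`complete`) of the first one and of the initial state, which is the surjective mapping used in the proof.

```python
import random


def lru_update(cache: tuple, block: int, ways: int) -> tuple:
    """One LRU cache set, most recently used block first."""
    return ((block,) + tuple(b for b in cache if b != block))[:ways]


def run(cache: tuple, accesses: list, ways: int) -> tuple:
    for block in accesses:
        cache = lru_update(cache, block, ways)
    return cache


def complete(from_empty: tuple, initial: tuple, ways: int) -> tuple:
    """Fill the lines left free by the empty-start run with the surviving blocks of `initial`."""
    return (from_empty + tuple(b for b in initial if b not in from_empty))[:ways]


def lemma1_mapping_holds(trials: int = 2000, ways: int = 4, seed: int = 0) -> bool:
    rng = random.Random(seed)
    for _ in range(trials):
        accesses = [rng.randrange(8) for _ in range(rng.randrange(10))]
        initial = tuple(rng.sample(range(16), rng.randrange(ways + 1)))  # adversary's choice a(i)
        from_empty = run((), accesses, ways)
        from_initial = run(initial, accesses, ways)
        if from_initial[: len(from_empty)] != from_empty:  # prefix property
            return False
        if complete(from_empty, initial, ways) != from_initial:  # determined by from_empty and a(i)
            return False
    return True


print(lemma1_mapping_holds())  # True
```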
Abstract Interpretation
Reachability problems on programs can be cast as finding fixpoints of the transition relation, because reaching a fixpoint means that no new states can be discovered. Abstract interpretation [8] computes such fixpoints based on approximations of the state space and the transition relation. The relationship between the abstract and the concrete state space is given by a concretization function γ that maps abstract states to sets of concrete states. A static analysis is (globally) sound if the concretization of an abstract fixpoint contains a concrete fixpoint. In our case, the goal is to define such an abstract domain $T^\sharp$ whose fixpoints
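The concrete counterpart of such a fixpoint computation is a plain worklist iteration over sets of states. The sketch below is a generic illustration with a hypothetical `reachable` helper; it stops exactly when no new state is discovered. Abstract interpretation performs the same kind of iteration on abstract elements, which keeps it feasible when the concrete sets are too large to enumerate.

```python
from typing import Callable, Hashable, Iterable


def reachable(initial: Iterable[Hashable], step: Callable[[Hashable], Iterable[Hashable]]) -> set:
    """Least fixpoint of X = initial ∪ post(X), computed by iterating until nothing new appears."""
    seen = set(initial)
    frontier = set(seen)
    while frontier:  # once the frontier is empty, post(seen) ⊆ seen, i.e. a fixpoint is reached
        new = {t for s in frontier for t in step(s)} - seen
        seen |= new
        frontier = new
    return seen


print(sorted(reachable({0}, lambda s: {(s + 2) % 6})))  # [0, 2, 4]
```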
CacheAudit [9] is an abstract interpretation framework that enables computing such abstract fixpoints $t^*$ based on binary executables and concrete cache models. For framing a sound cache analysis within CacheAudit, an abstract domain needs to satisfy the following local soundness condition, where B denotes the set of all memory blocks:
This statement captures that the abstract cache update function $\mathit{upd}_{T^\sharp}$ over-approximates the concrete cache update function $\mathit{upd}_T$. The set of observations that are reachable w.r.t. $\mathit{upd}_{T^\sharp}$ is hence necessarily a superset of the set of observations reachable w.r.t. $\mathit{upd}_T$. Global soundness then follows from the fact that CacheAudit updates the abstract cache with a superset of the set of possible memory blocks that are accessed at each program point.
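Local soundness can be checked mechanically for small domains. The sketch below uses a deliberately coarse, hypothetical abstract domain (not the one used by CacheAudit): an abstract cache is the set of blocks that may be cached, its concretization is every LRU set of at most `ways` distinct blocks drawn from it, and the abstract update with a set of possibly accessed blocks is the union. The check enumerates concretizations and verifies that every concrete update of every concretized cache, for every possibly accessed block, lands in the concretization of the abstract update.

```python
from itertools import permutations


def lru_update(cache: tuple, block: int, ways: int) -> tuple:
    return ((block,) + tuple(b for b in cache if b != block))[:ways]


def gamma(may: frozenset, ways: int) -> set:
    """Concretization: all LRU sets (MRU first) with at most `ways` distinct blocks from `may`."""
    return {p for k in range(ways + 1) for p in permutations(sorted(may), k)}


def abstract_update(may: frozenset, accessed: frozenset) -> frozenset:
    """Abstract cache update: any possibly accessed block may now be cached."""
    return may | accessed


def locally_sound(may: frozenset, accessed: frozenset, ways: int) -> bool:
    image = gamma(abstract_update(may, accessed), ways)
    return all(lru_update(c, b, ways) in image for c in gamma(may, ways) for b in accessed)


print(locally_sound(frozenset({1, 2, 3}), frozenset({3, 4}), ways=2))  # True
```

An update that simply returned `may` unchanged would fail this check as soon as a new block is accessed, which is exactly the kind of error the local soundness condition rules out.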
Theorem 2. Local soundness implies global soundness, i.e. (2) ⇒ (1)
This theorem from [9] is a specialization of a result of [8] to the way in which abstract domains are combined in CacheAudit. We present our new abstract domain in two steps: in the first step we abstract sets of cache traces by traces of sets of cache states, while keeping enough information about the history of computation to obtain reasonably precise bounds. In the second step we further abstract to obtain finite representations. Since in abstract interpretation, abstract domains compose, it is enough to prove soundness of each step to have the soundness of the whole abstraction process.
An Abstract Domain for Concurrent Computations
One of the main reasons for the intractability of computing leakage functions is the need to keep track of sets of traces. The first step we propose is to abstract such sets into a single (possibly infinite) trace of abstract states that abstract sets of caches and the possible interruption choices of the adversary. For that purpose, we assume a given abstract domain for cache states, such as the one from [11], whose elements we denote by $c^\sharp$. The concretization of an abstract cache $c^\sharp$ is a set of caches.
In addition, we assume that this abstract domain is equipped with a join ⊔ that soundly over-approximates unions of sets of caches. We define the abstract domain $T^\sharp$ that groups together cache states with the same concurrent access history as follows: each element $t^\sharp \in T^\sharp$ consists of a partial map from pairs of nonnegative integers to abstract cache states.
The result is a set of traces built from a trace of sets using the Cartesian product ·, where the end of each line corresponds to the states that can be observed at a context switch. For a scheduler S, we then define the concretization of an abstract state as the union of the concretizations w.r.t. all sets of context switches in S, i.e.
where $S_n = \{A \cap \{1, \ldots, n\} \mid A \in S\}$ denotes the sets of context switches that are truncated at position n.
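Abstract Transition Function. We define an abstract transition function
$\mathit{upd}_{T^\sharp}\big((c^\sharp_{i,j})_{0 \le i \le j \le n}, M\big) = (c$
Lemma 2. $\mathit{upd}_{T^\sharp}$ satisfies the local soundness condition (2).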
The proof of Lemma 2 is given in Appendix A.2; it proceeds by a simple unfolding of definitions and a reduction to the soundness of upd C ♯ .
Compact Representations of Infinite Computations
With the abstraction described above we can represent cache observations up to a moment n in time. We now propose a further abstraction that enables us to finitely represent and compute cache observations for all points in time, using fixpoint techniques of abstract interpretation. Our current abstract domain for concurrent computations grows in two directions, both of which have to be bounded in a meaningful way: (1) the number of instructions since the beginning of the computation, and (2) the number of instructions since the last context switch happened.
For bounding (1) we assume that the computation of the individual rounds of the PRG is performed in one main loop whose body takes exactly ℓ steps to execute. We leverage this knowledge to fold the abstract states corresponding to the same number of instructions inside the loop body. Technically, we abstract each program point $n \in \mathbb{N}$ by the unique $j \in \{0, \ldots, \ell - 1\}$ such that $j \equiv n \pmod{\ell}$, and we write $j = [n]$. In this way we only need to maintain elements c
For bounding (2), we impose a threshold s on the length of the history we track about the last context switch. To achieve soundness, we modify the update function such that the last element c 
Since the second subscript is an equivalence class, the update functions define a set of fixpoint equations. The transfer function for this set of equations is obviously monotonic, so we can iterate from a matrix entirely filled with empty caches to compute this fixpoint. In addition, we can use program points to store columns of this matrix and use the fixpoint iteration techniques already developed for CacheAudit. Even though the entries $c^\sharp_{i,j}$ with $i > s$ of the abstract state defined in Section 4.3 are not explicitly represented in the above state, we define their concretization by
The definition of $(d^\sharp_{i,j})$ and the corresponding update function ensure that the thus defined $\gamma_T(c^\sharp_{i,n})$ is always a superset of the concretization of the explicit representation. We obtain the following corollary for the compactly represented abstract state.
Proof. Follows from the proof of the previous lemma and by monotonicity of the new update function.
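The folding of program points modulo ℓ and the history threshold s can be illustrated with a concrete, set-based analogue. The sketch below is hypothetical and simplified, not the paper's abstract domain: it replaces abstract caches by explicit sets of LRU states, assumes a loop body of ℓ steps in which step j may access any block of `accesses[j]`, starts every time slice from the empty cache (as justified by Lemma 1), and joins all histories of length at least s into the entry with index s. Entry `D[h][j]` collects the cache states that can be observed at loop position j after h steps (capped at s) since the last context switch; the iteration stops when no entry changes.

```python
def lru_update(cache: tuple, block: int, ways: int) -> tuple:
    return ((block,) + tuple(b for b in cache if b != block))[:ways]


def folded_fixpoint(accesses: list[set[int]], s: int, ways: int) -> list[list[set[tuple]]]:
    """D[h][j]: caches possible at loop position j with h = min(steps since last switch, s)."""
    assert s >= 1
    ell = len(accesses)
    D = [[set() for _ in range(ell)] for _ in range(s + 1)]
    for j in range(ell):
        D[0][j].add(())  # a context switch at position j leaves (w.l.o.g.) the empty cache
    changed = True
    while changed:  # monotone iteration; terminates because there are finitely many caches
        changed = False
        for h in range(s + 1):
            for j in range(ell):
                nxt_h, nxt_j = min(h + 1, s), (j + 1) % ell  # cap history at s, fold modulo ell
                for cache in list(D[h][j]):
                    for block in accesses[j]:
                        new = lru_update(cache, block, ways)
                        if new not in D[nxt_h][nxt_j]:
                            D[nxt_h][nxt_j].add(new)
                            changed = True
    return D


def max_observations(D) -> int:
    """Largest number of distinct cache states observable at a single context switch."""
    return max(len(cell) for row in D[1:] for cell in row)


# Step 1 of the loop body accesses a secret-dependent block (2 or 3).
print(max_observations(folded_fixpoint([{1}, {2, 3}], s=1, ways=2)))  # 4
```

With s = 1, as in the case study, every history is joined into a single row, which trades precision for a very small state.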
Computing Bounds on the Leakage
We now present an algorithm for upper-bounding the size of the range of the leakage function based on a fixpoint $(d^\sharp_{i,j})$ with $i \le s$ and $j \le \ell - 1$ of the abstract domain described above. For convenience of notation we describe the algorithm in terms of $(c^\sharp_{i,j})$ and explicit indices, which can be immediately translated using Equation (3).
Recall from Section 2 that 
where (*) follows from Lemma 1, (**) follows from Theorem 2, and $A_{\alpha,\omega} = \{i_1, \ldots, i_{k+1}\}$. We can instantiate this upper bound by applying existing procedures for counting concretizations of abstract cache states [17] to the elements of (c
For upper-bounding the leakage for a given scheduler S, we need to maximize the above expression over all A ∈ S. We show how this can be done for a very general class of schedulers, whose only requirement is a lower bound f on the number of instructions processed by the victim between two interruptions by the adversary, i.e. $S_f = \{A \subseteq \mathbb{N} \mid \forall x, y \in A : x < y \Rightarrow y - x \ge f\}$.
For bounding the leakage w.r.t. $S_f$, we first give a recursive formula that expresses the maximal number of observations that an adversary can make at and between context switches at positions x and y:
$R_{x,y} = \max$
where we adopt the convention that $R_{x,y} = 1$ whenever $y < x + f$. Observe that it is not sufficient to use $R_{\alpha,\omega}$ for upper-bounding the leakage of a secret that is present in logical memory between positions α and ω because (1) α does not necessarily coincide with a context switch, and (2) the final context switch may happen an indefinite number of steps after ω. To account for this fact we define
The left term in the definition of $L_{\alpha,\omega}$ captures the case where no context switch happens between α and ω. The second term captures the case where at least one context switch happens; in this case x is the position of the first, and y is the position of the last context switch between α and ω. For readability we omit further constraints on the minimal distance f between any two context switches.
The following lemma states that $L_{\alpha,\omega}$ describes an upper bound on the information that is leaked about the logical memory between positions α and ω.
The correctness of $R_{x,y}$ follows by a simple induction on y using the bound on $\Lambda^{\alpha,\omega}_{A,a}$ described above. The correctness of $L_{\alpha,\omega}$ follows by construction. Equation (5) immediately suggests an implementation of the algorithm using dynamic programming.
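A memoized dynamic program in this spirit is sketched below. Since the exact recursion is not reproduced here, the sketch assumes a simple multiplicative form: `n_obs(z, w)` is a hypothetical callback returning the number of cache states observable at a context switch at w when the previous switch was at z (for instance, obtained by model counting on the fixpoint); $R_{x,y}$ takes the best choice of the last intermediate switch z, keeping distances of at least f; and $R_{x,y} = 1$ whenever $y < x + f$, following the convention above.

```python
from functools import lru_cache
from typing import Callable


def max_switch_observations(n_obs: Callable[[int, int], int], x: int, y: int, f: int) -> int:
    """Hypothetical DP in the spirit of R_{x,y}: maximize the product of per-switch counts.

    n_obs(z, w): assumed number of cache states observable at a context switch at w
    when the previous context switch was at z.
    """

    @lru_cache(maxsize=None)
    def R(a: int, b: int) -> int:
        if b < a + f:  # convention: no admissible pair of switches
            return 1
        best = n_obs(a, b)  # no intermediate switch between a and b
        for z in range(a + f, b - f + 1):  # z: last intermediate switch, >= f away from a and b
            best = max(best, R(a, z) * n_obs(z, b))
        return best

    return R(x, y)


# Two observable cache states per switch; minimal distance f = 1 allows switches at 1, 2 and 3.
print(max_switch_observations(lambda z, w: 2, 0, 3, 1))  # 8
```

Memoization makes the cost quadratic in the number of (a, b) pairs times the inner loop, which stays small because positions are folded modulo ℓ.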
Leakage Per Key
For deriving bounds on the leakage per key for the PRG described in Section 3, we also assume that $k_i$ is part of the internal state of rounds i and i + 1. If each round can be computed in ℓ commands, the leakage about $k_i$ is hence upper-bounded by $L_{i\ell,(i+2)\ell-1}$. As we identify all positions modulo ℓ in (5), we immediately obtain that $L_{0,2\ell-1}$ is an upper bound for the leakage about each round key.
Corollary 2. For all $j \in \mathbb{N}$ we have
$$\forall \Lambda \in \mathcal{L}_j : |\mathrm{ran}(\Lambda)| \le L_{0,2\ell-1}.$$
Case Study
In this section we report on a case study in which we use our techniques to derive formal security guarantees against concurrent cache-based adversaries for a binary executable of the leakage-resilient pseudorandom number generator from [26].
Implementation of abstract domain and counting
We implement a special case of the abstract domain presented in Section 4. Namely, we use a fixed history threshold of s = 1, thereby trading precision for efficiency of the analysis. We connect this abstract domain to the CacheAudit platform [9] . CacheAudit takes as input a (32 bit) x86 binary executable, reconstructs the control flow, and uses abstract interpretation to compute an over-approximation of the set of program states (which comprise the cache) that are reachable. The local soundness of our novel domain (stated in Lemma 2), together with the correctness of CacheAudit (stated in Theorem 2) ensures that our analysis soundly over-approximates the set of all concurrent computations.
We further choose f such that the scheduler interrupts at most once per round of the pseudorandom number generator. With Corollary 2 we see that the leakage per key exceeds the leakage per round by a factor of at most three. We can hence obtain a leakage bound by maximising over the number of concretizations of abstract cache states that appear in the fixpoint, which is how we avoid implementing the concurrent counting procedure from Section 4.5 in full generality. We perform the counting of individual cache states using the techniques described in [17] .
Implementation of the PRG. We implement the 2PRF by concatenating two blocks produced by a block cipher $BC : \{0,1\}^n \times \{0,1\}^n \to \{0,1\}^n$. More precisely, for an initialization vector $IV \in \{0,1\}^{n-1}$ we compute
$$\mathsf{2PRF}(k) = BC(k, IV \| 0) \,\|\, BC(k, IV \| 1).$$
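The two-block construction can be sketched as follows, assuming the two cipher inputs are IV‖0 and IV‖1. Here `bc` is a hypothetical keyed stand-in for BC built from HMAC-SHA256, used only to keep the sketch dependency-free; it is neither AES nor a block cipher (it is not invertible), and n = 128 is an illustrative choice.

```python
import hashlib
import hmac


def bc(key: bytes, block: int, n: int) -> int:
    """Hypothetical stand-in for BC: {0,1}^n x {0,1}^n -> {0,1}^n (NOT AES; requires n <= 256)."""
    digest = hmac.new(key, block.to_bytes(n // 8, "big"), hashlib.sha256).digest()
    return int.from_bytes(digest, "big") >> (256 - n)  # keep the top n bits


def two_prf_from_bc(key: bytes, iv: int, n: int = 128) -> int:
    """Compute BC(k, IV||0) || BC(k, IV||1) as a 2n-bit integer."""
    assert 0 <= iv < 2 ** (n - 1), "IV must have n - 1 bits"
    left = bc(key, (iv << 1) | 0, n)   # cipher input IV||0
    right = bc(key, (iv << 1) | 1, n)  # cipher input IV||1
    return (left << n) | right


print(hex(two_prf_from_bc(bytes(32), iv=5)))
```

Appending a single distinguishing bit to a fixed IV guarantees that the two cipher calls never share an input under the same key.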
For our implementation, we instantiate BC with the AES implementation from the PolarSSL library [2], where we use a key length of 256 bits. We put the key schedule and the two calls to the block cipher in an infinite loop and use gcc to compile this program to a 32-bit x86 executable, which is the artifact we analyze using the techniques developed in this paper.
Experimental results
We perform the analysis of the executable on a set-associative cache with LRU replacement strategy, where we consider different cache sizes, line sizes, and associativities. The results of our analysis are given in Table 1. Columns 5 and 6, respectively, present bounds on the leakage to a concurrent cache adversary per round and per key. Our data show that leakage increases with the cache size and decreases with the line size. The first effect occurs because a larger cache size means that the tables are spread out over more cache sets, which increases the resolution with which the adversary can observe the memory accesses of the victim. The second effect occurs because a larger line size decreases the adversary's resolution. Finally, our data show that greater associativities lead to better bounds.
The entries of Column 6 can be used to instantiate the parameters in Theorem 1, where we consider an adversary with $q = 2^{50}$ and set the amount of observable data with the same IV to 1GB, thus $p \le 2^{25}$. The cryptographic security guarantees we obtain with these parameters are given in Column 7; they range from very strong (e.g. 126.1 bits for a 2KB cache with 64B line size) to non-existent (e.g. for an 8KB cache with 32B line size).
Discussion. Column 4 presents an absolute, program-independent bound on the number of cache states an adversary can observe at each context switch. Throughout the case study, we consider an adversary whose memory space is disjoint from the victim's, i.e. one who can observe how many memory blocks are loaded in each cache set, but not which. For the example of an 8KB cache with 4 ways and lines of 64B, this number amounts to $(4+1)^{8192/(4 \cdot 64)} = 5^{32}$, where the base denotes the number of observations per set (0 to 4 blocks have been loaded into that set by the victim) and the exponent denotes the number of (independent) cache sets.
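The program-independent bound of Column 4 is simple arithmetic. The sketch below (hypothetical helper `observation_bits`) expresses $(\text{ways} + 1)^{\#\text{sets}}$ in bits, assuming the number of sets is cache size / (ways × line size).

```python
import math


def observation_bits(cache_bytes: int, line_bytes: int, ways: int) -> float:
    """log2 of (ways + 1)^(#sets): per-set occupancy counts visible to a disjoint-memory adversary."""
    sets = cache_bytes // (ways * line_bytes)
    return sets * math.log2(ways + 1)


print(round(observation_bits(8192, 64, 4), 2))  # 74.3, i.e. 32 sets * log2(5)
```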
A comparison between columns 4 and 5 sheds light on the scope of our technique. For caches up to 4KB, the entries in both columns almost coincide. This is due to the fact that the 4KB+256B of tables in the PolarSSL AES implementation entirely fill such small caches, and that the static analysis can only predict that each of the corresponding memory blocks can either be loaded or not. The small difference in leakage stems from the fact that the static analysis can predict that the memory blocks containing local variables will be loaded. For caches of 8KB or more the static analysis can moreover determine that the memory access patterns of the executable only affect the memory blocks in which the tables and the local variables reside, hence the bounds obtained by static analysis are significantly better than those obtained by pure combinatorics.
Finally, we remark that there are several timing-relevant features of hardware that our approach does not yet cover (or make assertions about), including out-of-order execution, pipelines, TLBs, and multiple levels of caches. Likewise, implementations of instruction-based scheduling [24] are not yet widely deployed. From a practical perspective, it is currently still wise to rely on implementations that entirely avoid secret-dependent memory lookups, e.g. [1, 6, 15].
Conclusions
We have presented the first proof of resilience against side-channel attacks by concurrent cache-based adversaries. To achieve this, we extended existing leakage-resilient cryptosystems to a notion of leakage that captures caches and concurrency, and we developed program analysis techniques for statically deriving formal security guarantees based on executable code.
that blocks of non-zero age are preceded by other blocks, i.e. that caches do not contain "holes". The cache update for LRU is then given by
A.3 Bounded range leakage and guessing
We recall the following result (see [9] for a proof): if X is uniformly distributed over n-bit strings and Y = f(X) for a function f, then the probability that an adversary who observes Y guesses X is at most $\frac{|\mathrm{ran}(Y)|}{2^n}$. In the context of $G_2$ in Theorem 1, the key k is a uniformly chosen random n-bit string, and Λ(k), the vector of observations of k given to the adversary, corresponds to the variable Y. Therefore, the probability of an adversary outputting k′ such that k = k′ is upper-bounded by
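For small n the recalled bound can be checked exhaustively. The sketch below (hypothetical helper `optimal_guess_probability`) computes the exact success probability of the best guesser for a uniform n-bit X given Y = f(X) with deterministic f: for each observation y the best guess is correct with probability 1/|f⁻¹(y)|, so the sum over y equals |ran(Y)|/2^n, i.e. the bound is attained.

```python
from collections import defaultdict
from fractions import Fraction


def optimal_guess_probability(n: int, f) -> Fraction:
    """Exact success probability of the best guesser for uniform n-bit X given Y = f(X)."""
    preimage_sizes = defaultdict(int)
    for x in range(2**n):
        preimage_sizes[f(x)] += 1
    total = Fraction(0)
    for count in preimage_sizes.values():
        # Pr[Y = y] * max_x Pr[X = x | Y = y] = (count / 2^n) * (1 / count)
        total += Fraction(count, 2**n) * Fraction(1, count)
    return total


print(optimal_guess_probability(8, lambda x: x % 4))  # 1/64, i.e. |ran(Y)| / 2^n = 4 / 256
```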
