* Corresponding author. E-mail addresses: zhaoxinjieem@163.com (X. Zhao), fan.zhang@engineer.uconn.edu (F. Zhang).
Introduction
Cache attacks are a class of side-channel attacks (SCAs) that extract secrets from the behavior of the processor cache. These attacks exploit the fact that a cache miss has a different leakage profile from a cache hit. Demonstrated cache attacks fall into three categories, depending on the channel used to collect the leakages: spy processes (Osvik et al., 2006), timing information (Bernstein, 2004; Bonneau and Mironov, 2006), and power/electromagnetic (EM) traces (Acıiçmez and Koç, 2006a, 2006b; Bertoni et al., 2005; Bonneau, 2006; Fournier and Tunstall, 2006; Lauradoux, 2005). The focus of this paper is the trace driven cache attack (TDCA), which exploits power or EM traces.
With direct access to the cryptographic device, an adversary can monitor its power/EM traces with minimal invasion of the device. As the name suggests, TDCAs monitor cache hits and misses from power/EM traces and recover the secret key used in the computation. The number of traces required in TDCAs is much smaller than in conventional differential power attacks (DPAs) (Kocher et al., 1999), correlation power attacks (CPAs) (Brier et al., 2004) or other types of cache attacks (Osvik et al., 2006; Bernstein, 2004; Bonneau and Mironov, 2006). Taking AES as an example, only 30 cache traces are required in TDCAs, instead of hundreds (or thousands) of power traces in DPAs and CPAs (Brier et al., 2004), hundreds of cache traces in access driven cache attacks (Osvik et al., 2006), and millions of cache traces in timing driven cache attacks (Bernstein, 2004; Bonneau and Mironov, 2006).
AES was targeted in many TDCAs (Acıiçmez and Koç, 2006a, 2006b; Bertoni et al., 2005; Bonneau, 2006; Fournier and Tunstall, 2006; Lauradoux, 2005). Throughout this paper, AES refers to AES-128 by default. Bertoni et al. (2005) showed that the cache traces manifested in the power profiles can be used to reveal the secret key. The cache events in the first round of AES S-Box lookups¹, implemented with a table of 256 bytes, were estimated from power simulations and analyzed in Bertoni et al. (2005). Further research in TDCAs on AES splits into two directions. One is about exploiting new and real leakages in TDCAs, where cache traces were collected from real power consumption (Fournier and Tunstall, 2006) and from EM emanations. The other is improving the efficiency of TDCAs on different AES implementations. In the attacks on AES with large lookup tables (e.g., 1 KB), TDCAs can exploit cache events in the first round (Lauradoux, 2005), the first two rounds (Acıiçmez and Koç, 2006a) or the last round (Acıiçmez and Koç, 2006b; Bonneau, 2006). For AES implemented with a compact table (256 bytes), TDCAs can exploit the cache events in the first round (Bertoni et al., 2005) or the first two rounds (Fournier and Tunstall, 2006). This paper follows the latter direction and tries to improve TDCA on AES.
In the aforementioned TDCAs on AES, the cache events utilized are limited to the first 20 table lookups in the first two rounds because of the avalanche effect. Since the traces are captured for entire encryptions, exploiting the cache events in the third and later rounds can improve the efficiency of TDCA. Combining TDCAs with algebraic techniques and conducting algebraic side-channel attacks (ASCAs) (Oren et al., 2010; Oren and Wool, 2012) is a very promising way to improve TDCA. Previous ASCAs mainly focused on the power-based Hamming weight leakage model (Oren and Wool, 2012) or the Hamming distance leakage model (Oren et al., 2010). The original ASCA can only work when the deduction on the targeted states is single and correct. The error-tolerant ASCA in Oren et al. (2010) and Oren and Wool (2012) can only work with limited deductions where the variance of the error is small and fixed. Previous ASCAs cannot be directly and easily combined with TDCA because, in practice, there are multiple deductions and the variance of the errors is large and uncertain.
In COSADE 2012, Zhao et al. proposed the multiple deductions-based ASCA (MDASCA) (Zhao et al., 2012). The work in Zhao et al. (2012) showed that, due to inaccurate measurements or interference from other components in the cryptosystem, the deduction on the targeted intermediate state from SCA is not always correct. As a result, attacks have to deal with the fact that the correct value is among multiple candidates obtained during the process, which are also referred to as multiple deductions. How to represent and utilize these multiple deductions is critical to improving the error tolerance and exploiting new leakage models for ASCAs (Oren et al., 2010; Oren and Wool, 2012). Several questions remain open when MDASCA is applied to TDCA: whether it works under the error-tolerant attack scenario², the partially preloaded cache scenario (Bonneau, 2006), and different AES key lengths (e.g., AES-192/256); how many rounds of leakage MDASCA can exploit; how much MDASCA can improve TDCAs in terms of data complexity; and in which new scenarios MDASCA can be applied in cache attacks. This paper aims to answer these questions and gives a systematic and comprehensive study of the multiple deductions-based algebraic TDCAs (MDATDCAs).
The rest of this paper is organized as follows. Section 2 describes the notation used throughout the paper. In Section 3, we formalize the key recovery problem in TDCA with an abstract model, which is independent of specific TDCA techniques (Acıiçmez and Koç, 2006a, 2006b; Bertoni et al., 2005; Bonneau, 2006; Fournier and Tunstall, 2006; Lauradoux, 2005; Zhao et al., 2012). Then we categorize previous TDCAs on AES into three types and study their limitations in Section 4. In Section 5, we describe the detailed procedure of MDATDCA and analyze the overhead. To evaluate the efficiency of MDATDCA on AES, we build a mathematical model in Section 6 to estimate the maximal number of leakage rounds that can be exploited and the minimal number of cache traces required in a successful MDATDCA. Unlike previous work that can only analyze the cache events in the first rounds, this paper can analyze any cache events. The attack setup is described in Section 7. The preliminary results under an error-free scenario are presented in Section 8 to verify the theoretical analysis. The results with different error rates are shown in Section 9. MDATDCAs on AES with a partially preloaded cache are described in Section 10. To demonstrate the power of MDATDCA, we extend the attack to AES-192/256 in Section 11. Finally, we conclude the paper in Section 12.
¹ The S-Box is usually implemented with a lookup table. In this paper, S-Box lookup is sometimes written as table lookup.
² In TDCA, an error means that the deduction of a cache event (hit or miss) from side-channel leakages is incorrect. The error-free attack scenario means that the adversary can deduce all the cache events correctly in one attack. By comparison, the error-tolerant attack scenario means that there are errors when deducing some cache events in an attack.
Notation
Throughout the paper, $P$ denotes the public variable (plaintext or ciphertext) and $K$ denotes the targeted secret variable (the master key or an equivalent key). Variables $p_i$ and $k_i$ denote the $i$-th ($i \ge 0$) part of $P$ and $K$, respectively. Each part contains $l$ bits. Let $q_j$ denote the $j$-th table lookup in the execution of the block cipher and $l$ denote the number of table lookups considered in the attack, $0 \le j < l$. $H$ and $M$ denote that $q_j$ is a cache hit or a cache miss, respectively. $y_j$ denotes the index of the lookup $q_j$. $U_j$ and $V_j$ are the sets of $p_i$ and $k_i$ that determine $y_j$. Let $f_j$ be the function that computes $y_j$ from $U_j$ and $V_j$, so that $y_j = f_j(U_j, V_j)$.
Suppose each entry in the table has $2^e$ bytes and each cache line has $2^d$ bytes. Let $\langle y_j \rangle_b$ be the $b$ most significant bits (MSBs) of $y_j$.
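To make the notation concrete, the following minimal Python sketch computes $\langle y_j \rangle_b$ for a lookup index. It assumes (this is not stated in the text above) that the table is aligned on a cache-line boundary, so that a line holds $2^{d-e}$ consecutive entries and $b$ equals the index width minus $d - e$. The function names and the `index_bits` parameter are illustrative only.

```python
def num_msbs(index_bits: int, e: int, d: int) -> int:
    """Number b of index MSBs that select the cache line.

    Assumption: the table is cache-line aligned, so the low (d - e)
    bits of the index only select an entry inside one line.
    """
    offset_bits = d - e
    if not 0 <= offset_bits <= index_bits:
        raise ValueError("cache line and entry sizes are inconsistent")
    return index_bits - offset_bits


def cache_line_index(y: int, index_bits: int, e: int, d: int) -> int:
    """Return <y>_b, the b most significant bits of the lookup index y."""
    b = num_msbs(index_bits, e, d)
    return y >> (index_bits - b)


# Example: 256-entry table of 1-byte entries (e = 0), 16-byte lines (d = 4).
# Two indexes share a cache line exactly when their <y>_b values are equal.
same_line = cache_line_index(0xA7, 8, 0, 4) == cache_line_index(0xA0, 8, 0, 4)
```

Under these assumptions, a 1 KB table with 4-byte entries ($e = 2$) and 64-byte lines ($d = 6$) also gives $b = 4$.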
3. The TDCA problem
In this section, we propose an abstract model that generalizes all TDCAs. TDCA on a block cipher is illustrated in Fig. 1, where $l$ lookups are considered. By observing power or EM traces, one can detect whether $q_j$ is a cache hit ($H$) or miss ($M$), $0 \le j < l$. From Fig. 1 we can see that a cache miss has a distinct amplitude peak compared with a cache hit (note also that the number of clock cycles is distinctly different). The goal of TDCA is to extract the value of all $k_i$ in $K$ (the secret key) from the knowledge of the $p_i$ (known public variables) and the cache events of the $q_j$.
Suppose the cache contains no data from the table before each encryption. To analyze the cache event of $q_t$, suppose $q_j$ is the only cache miss before $q_t$. A cache hit of $q_t$ means that $y_t$ and $y_j$ access the same cache line, so Eq. (1) holds if $q_t$ is a cache hit:
$$\langle y_t \rangle_b = \langle y_j \rangle_b \qquad (1)$$
The key technique of TDCA is to use Eq. (1) to reduce the search space of $V_t \cup V_j$, which converges to $K$ if $t$ is large enough. Since both $U_t$ and $U_j$ are known, the adversary can check all the assignments to those $k_i$ in $V_t \cup V_j$ against Eq. (1). If Eq. (1) is satisfied, the assignment is a possible value for the $k_i$. Otherwise, the assignment is an incorrect guess. Similarly, if $q_t$ is a cache miss, Eq. (2) can be used in the key search:
$$\langle y_t \rangle_b \ne \langle y_j \rangle_b \qquad (2)$$
From Section 2, there are $n$ cache misses $q_{M_1}, \dots, q_{M_n}$ among the lookups before $q_t$. The set $O^t_M = \{M_1, \dots, M_n\}$ ($n < t$) can be used to build $n$ additional equations (or inequations). If $q_t$ is a hit, only one of $q_{M_1}, \dots, q_{M_n}$ accesses the same cache line as $q_t$ (because if two of them were in the same cache line as $q_t$, one of them would have been a hit). In this case, Eq. (3) holds:
$$\langle y_t \rangle_b \in \{\langle y_{M_1} \rangle_b, \dots, \langle y_{M_n} \rangle_b\} \qquad (3)$$
If $q_t$ is a miss, Eq. (4) holds:
$$\langle y_t \rangle_b \ne \langle y_{M_i} \rangle_b, \quad 1 \le i \le n \qquad (4)$$
Using the $n$ equations (inequations) in (3) or (4), more assignments to the key bits in $V_{M_1} \cup \dots \cup V_{M_n} \cup V_t$ can be verified. The key recovery is thus converted into the problem of how to make $V_{M_1} \cup \dots \cup V_{M_n} \cup V_t$ converge to the master key $K$ using cache events. In TDCA, the adversary can analyze different table lookups and traces until the search space of $K$ is reduced to a level where a brute-force attack is feasible.
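The search-space reduction described above can be illustrated with a small Python sketch. It is not the attack of any cited paper: it assumes a single 256-entry table with $b = 4$, an empty cache before each encryption, and the standard first-round AES index $y_i = p_i \oplus k_i$, and it only uses the second lookup $q_1$, for which $q_0$ is the sole earlier miss. All function names are hypothetical.

```python
B = 4  # assumed number of index MSBs that select the cache line


def line(y: int) -> int:
    """<y>_b for an 8-bit lookup index."""
    return y >> (8 - B)


def first_round_events(pt, key):
    """Simulate hit/miss events of the 16 first-round lookups
    (index p_i xor k_i) starting from an empty cache."""
    loaded, events = set(), []
    for p, k in zip(pt, key):
        ln = line(p ^ k)
        events.append("H" if ln in loaded else "M")
        loaded.add(ln)
    return events


def candidates_k0_xor_k1(traces):
    """Filter candidates for <k_0 xor k_1>_b with the hit equation
    (Eq. (1)) or the miss inequation (Eq. (2)) applied to q_1."""
    cands = set(range(1 << B))
    for pt, events in traces:
        observed = line(pt[0] ^ pt[1])   # equals <k_0 xor k_1>_b on a hit
        if events[1] == "H":
            cands &= {observed}          # hit: only this value remains
        else:
            cands.discard(observed)      # miss: this value is impossible
    return cands
```

Each hit pins the value down immediately, while each miss removes one candidate, which is why misses alone need several traces to converge.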
The above abstract model helps us to understand the TDCA problem and is generic to block ciphers using the S-Box (table) lookup structure (Acıiçmez and Koç, 2006a, 2006b; Bertoni et al., 2005; Bonneau, 2006; Fournier and Tunstall, 2006; Lauradoux, 2005; Poddar et al., 2011; Mukhopadhyay, 2010, 2011; Zhao and Wang, 2010). Different attack techniques can be developed to solve this problem, such as the traditional TDCA technique (Acıiçmez and Koç, 2006a, 2006b; Bertoni et al., 2005; Bonneau, 2006; Fournier and Tunstall, 2006; Lauradoux, 2005), the MDASCA technique (Zhao et al., 2012) or others to be proposed in the future.
4. Analysis of previous work
4.1. AES implementations
All AES implementations can be categorized into three types based on (1) $g_t$, the number of lookup tables; (2) $g_s$, the size of the lookup tables; (3) $g_l$, the number of lookups in one round that access the same table; and (4) $g_c$, the size of the cache line, where $g_c = 2^d$. Note that the scope of this paper is AES implementations that use one or more lookup tables for the sole S-Box, and not the lookup tables for the field multiplication in the MixColumns operation of AES (Fournier and Tunstall, 2006).
Fig. 1. S-Box (table) lookup structure targeted in TDCA.
The attack in Bertoni et al. (2005) focused on the cache events of the first round of AES. To further reduce the key search space and the number of plaintexts (or power traces) required, attacks such as that in Fournier and Tunstall (2006) also utilized some cache events in the second round. In Fournier and Tunstall (2006), equations are generated only from the cache hits, as shown in Eq. (3). In the first round, a hit gives $\langle p_j \oplus k_j \rangle_4 = \langle p_t \oplus k_t \rangle_4$, where $\oplus$ is the bitwise XOR, so $\langle k_j \oplus k_t \rangle_4$ can be derived. It is shown in Fournier and Tunstall (2006) that the search space of the AES key can be reduced to $2^{68}$ with at most 240 adaptive chosen plaintexts. To improve the attack, the work in Fournier and Tunstall (2006) ...
TDCA on AES of Type C
The work in Acıiçmez and Koç (2006b) and Bonneau (2006) showed that under Type C implementations, TDCA on the final round of AES is much more effective than on the first round. In the first round of AES, $f_t(\cdot)$ is a linear function and only the higher $b$ bits of $k_j \oplus k_t$ ($0 \le j < t \le 15$) are leaked.
Limitation of previous TDCAs
Traditional TDCAs rely on the representation of $y_t$ in different rounds. As noted in Acıiçmez and Koç (2006a), the full avalanche effect is achieved in the third round, where $f_t(U_t, V_t)$ becomes too complicated for manual analysis. Because of this, current traditional TDCAs on AES (Acıiçmez and Koç, 2006a, 2006b; Bertoni et al., 2005; Bonneau, 2006; Fournier and Tunstall, 2006; Lauradoux, 2005) can at most analyze the cache events in the first two rounds, more precisely, in the first 20 lookups. Moreover, all current TDCA works are for AES-128. For AES with longer key lengths (e.g., AES-192/256), the key expansion algorithms become more complicated and the first 20 lookups only leak part of the master key. How to recover more key bits and conduct effective TDCAs on them is still an open problem.
The manual representation of table indexes is awkward. A tool for analyzing the leakages in deeper rounds is needed in order to improve the attacks, and combining algebraic techniques with TDCA is a promising way to provide one. The MDASCA proposed by Zhao et al. (2012) in COSADE 2012 is a generic method to exploit many types of side-channel leakages with algebraic techniques.
5. MDASCA-based trace driven cache attacks (MDATDCAs)
In TDCA, the key issue is to obtain the cache events related to table lookups and to represent the possible (and/or impossible) candidates of lookup indexes with equations. We use the notation of Section 2 in the following discussion. Recall that there are $n$ misses $q_{M_1}, \dots, q_{M_n}$ before $q_t$. Let $S^t_M = \{\langle y_{M_1} \rangle_b, \dots, \langle y_{M_n} \rangle_b\}$ be the set of the $b$ MSBs of their indexes, and let $d = \langle y_t \rangle_b$. If $q_t$ is a cache hit, the cache line that includes the index $y_t$ has been loaded into the cache by earlier table lookups, which means that $d$ is equal to exactly one element of the multiple deductions in $S^t_M$. If $q_t$ is a cache miss, the cache line that includes $y_t$ has not been accessed, which means that $d$ is not equal to any element of the multiple deductions in $S^t_M$. From the above, we can see that representing the relations between $y_t$ and its multiple possible (for a hit) or impossible (for a miss) deductions is the most challenging part of TDCA.
The work in Zhao et al. (2012) proposes a generic method to convert the multiple deductions into algebraic equations and applies it to TDCA. For convenience, we refer to the MDASCA-based trace driven cache attack, or multiple deductions-based algebraic TDCA, as MDATDCA. In this type of attack, the cipher is first represented with a system of algebraic equations. Then the cache hit/miss events are profiled via power/EM measurements, and the multiple deductions on the $b$ MSBs of the lookup index for each cache event are converted into equations, which are added to the original equation system of the cipher. Finally, the secret key is recovered by solving the whole equation system (Faugère, 2007; Soos et al., 2009). More details about MDASCA can be found in Zhao et al. (2012). Next, we describe the core of MDATDCA, which is to represent cache hit and miss events with algebraic equations.
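5.1. Representing a cache hit
Let $D = \{d_1, \dots, d_{s_p}\}$ denote the possible deduction set of $d = \langle y_t \rangle_b$ and $s_p$ be the size of $D$. Each $d_i$ and $d$ is represented with $b$ one-bit variables, $d_i = d_i^1 \dots d_i^b$ and $d = d^1 \dots d^b$. For each $d_i$, $b$ variables $e_i^j$ and one variable $c_i$ are introduced as
$$e_i^j = \neg(d_i^j \oplus d^j), \qquad c_i = \prod_{j=1}^{b} e_i^j \qquad (6)$$
where $\neg$ denotes the NOT operation, so that $c_i = 1$ if and only if $d_i = d$.
Only one $d_i$ is equal to $d$ ($c_i$ is 1 then), which can be represented as:
$$c_1 \vee c_2 \vee \dots \vee c_{s_p} = 1, \qquad \neg c_i \vee \neg c_j = 1, \quad 1 \le i < j \le s_p \qquad (7)$$
According to Eq. (6) and Eq. (7), $(b + 1) \times s_p$ variables are introduced to represent $D$.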
Representing a cache miss
Let $D$ denote the impossible deduction set of $\langle y_t \rangle_b$ and $s_n$ be the size of $D$. For each $d_i$ in $D$, a variable $c_i$ is also introduced as in Section 5.1. None of the $d_i$ in $D$ is equal to $d$, which can be represented as:
$$c_i = 0, \quad 1 \le i \le s_n \qquad (8)$$
According to Eq. (8), $(b + 1) \times s_n$ variables and $(b + 1) \times s_n$ ANF equations are introduced to represent $D$. As shown in this section, the algebraic equations for the new constraints are quite simple. They can be easily fed into a solver, e.g., the SAT solver CryptoMiniSat (Soos et al., 2009), to recover the key.
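As an illustration of how such hit and miss constraints might be handed to a SAT solver, the sketch below emits them as CNF clauses rather than ANF equations. It assumes the equality indicators take the form $e_i^j = \neg(d_i^j \oplus d^j)$ and $c_i = \prod_j e_i^j$; this form, the `Cnf` container and all function names are assumptions made for the example, not the encoding used by the authors. Literals follow the DIMACS convention (positive integer for a variable, negative for its negation).

```python
from itertools import combinations, product


class Cnf:
    """Tiny clause container; variables are 1-based integers."""

    def __init__(self):
        self.num_vars = 0
        self.clauses = []

    def new_var(self):
        self.num_vars += 1
        return self.num_vars

    def add(self, *lits):
        self.clauses.append(list(lits))


def equal_indicator(cnf, d_bits, di_bits):
    """Add b variables e_j <-> not(d_j xor di_j) and one c <-> AND(e_j)."""
    es = []
    for a, x in zip(d_bits, di_bits):
        e = cnf.new_var()
        cnf.add(e, a, x)       # a = x = 0 forces e = 1
        cnf.add(e, -a, -x)     # a = x = 1 forces e = 1
        cnf.add(-e, a, -x)     # a = 0, x = 1 forces e = 0
        cnf.add(-e, -a, x)     # a = 1, x = 0 forces e = 0
        es.append(e)
    c = cnf.new_var()
    for e in es:
        cnf.add(-c, e)                 # c implies every e_j
    cnf.add(c, *[-e for e in es])      # all e_j together imply c
    return c


def encode_hit(cnf, d_bits, deductions):
    """Exactly one possible deduction equals d."""
    cs = [equal_indicator(cnf, d_bits, di) for di in deductions]
    cnf.add(*cs)                       # at least one c_i is true
    for ci, cj in combinations(cs, 2):
        cnf.add(-ci, -cj)              # no two c_i are true together
    return cs


def encode_miss(cnf, d_bits, deductions):
    """No impossible deduction equals d."""
    cs = [equal_indicator(cnf, d_bits, di) for di in deductions]
    for c in cs:
        cnf.add(-c)
    return cs


def satisfiable(cnf, fixed):
    """Brute-force check for tiny instances; fixed maps var -> bool."""
    free = [v for v in range(1, cnf.num_vars + 1) if v not in fixed]
    for bits in product([False, True], repeat=len(free)):
        val = dict(fixed)
        val.update(zip(free, bits))
        if all(any(val[abs(l)] == (l > 0) for l in cl) for cl in cnf.clauses):
            return True
    return False
```

Each deduction costs $b + 1$ auxiliary variables in this sketch, which matches the $(b + 1) \times s_p$ and $(b + 1) \times s_n$ variable counts quoted above. In a real attack the bits of $d$ and $d_i$ would be variables of the cipher's equation system instead of being fixed by hand.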
Evaluation of MDATDCAs on AES
For simplicity, this section only estimates the number of rounds that can be exploited and the number of cache traces required in MDATDCAs on AES-128 under the error-free attack scenario, where the cache does not contain any AES data prior to each encryption. Extending these estimations to AES-192/256 is straightforward.
The number of rounds that can be exploited
For convenience, $D$ is used to denote the set of cache lines that will be filled with data from the lookup tables. Let $m$ be the number of cache lines in $D$, $m = 2^b$.
$E$ denotes the maximal number of rounds that can be utilized in MDATDCA. As long as $D$ is not filled up, there may exist some cache misses (before $q_t$) that can be used for key recovery. To estimate $E$ in MDATDCA, we introduce the metric $n_t$ as in Acıiçmez and Koç (2006a, 2006b) and Zhao et al. (2012), which is the number of cache lines loaded after $t$ table lookups. For a cache line, the probability of not being filled after $t$ lookups is $((m - 1)/m)^t$. Then, after $t$ lookups to the same table, the expected number of loaded cache lines, $n_t$, can be calculated as
$$n_t = m \left(1 - \left(\frac{m - 1}{m}\right)^t\right) \qquad (9)$$
Fig. 2 shows how $n_t$ changes with $t$ and $b$ in different types of implementations. The solid curves in blue, green and red are for $b = 3$, 4, and 5, respectively. We can see that, for Type A (or B), after the first 2/3/6 rounds (or 5/10/10 rounds), $n_t$ approaches $m$ (i.e., $D$ is filled up). For Type C, all 16 lookups in the last round can be used for key recovery.
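The expected number of loaded cache lines can be tabulated with a few lines of Python. The sketch assumes $m = 2^b$ lines and uniformly distributed lookup indexes, so that $n_t = m(1 - ((m-1)/m)^t)$; the 90% fill threshold in `lookups_until_filled` is an arbitrary illustrative choice, not a value taken from the paper.

```python
def expected_loaded_lines(t: int, b: int) -> float:
    """Expected number n_t of distinct cache lines touched by t
    uniformly random lookups into a table spanning m = 2**b lines."""
    m = 2 ** b
    return m * (1.0 - ((m - 1) / m) ** t)


def lookups_until_filled(b: int, fraction: float = 0.9) -> int:
    """Smallest t with n_t >= fraction * m (illustrative threshold)."""
    m = 2 ** b
    t = 0
    while expected_loaded_lines(t, b) < fraction * m:
        t += 1
    return t


# n_t after each of the first three rounds, assuming 16 lookups per
# round into the same table and b = 4.
per_round = [expected_loaded_lines(16 * r, 4) for r in (1, 2, 3)]
```

Dividing the result of `lookups_until_filled` by the number of lookups per round that hit the same table gives a rough feel for how many rounds still produce informative misses.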
6.2. The number of cache traces required
The work in Zhao et al. (2012) presents a preliminary study of estimating the minimal number of cache traces required in TDCA. In this section, we introduce four metrics and adopt the information-theoretic approach to optimize the estimations on the minimal number of cache traces required for a successful MDATDCA.
(1) $r_t$: the ratio between the size of the search space of $K_t$ after and before analyzing $q_t$.
The probability that $q_t$ is a cache hit is $n_t/m$. If $q_t$ is a hit, the expected number of candidates of $\langle y_t \rangle_b$ is reduced from $m$ to $n_t$. Similarly, $q_t$ is a miss with probability $1 - n_t/m$, in which case the number of candidates is reduced from $m$ to $m - n_t$. Then $r_t$ can be calculated as
$$r_t = \frac{n_t}{m} \cdot \frac{n_t}{m} + \left(1 - \frac{n_t}{m}\right) \cdot \frac{m - n_t}{m} \qquad (10)$$
Fig. 3 shows how $r_t$ changes with $t$ and $b$. We choose $r_t \le 0.9$ as the threshold (marked as a purple line in each subfigure). Note that the number of lookups read from the intersection between the purple line and the other lines can also be used to calculate the maximal number of rounds that can be utilized, and the result is consistent with the earlier analysis.
(2) $p_t$: the number of key bits that can be derived from $q_t$.
$$p_t = -\log_2(r_t) \qquad (11)$$
Fig. 4 shows how $p_t$ changes with $t$ and $b$. We can see that (1) $q_0$ is always a cache miss, so $p_0 = 0$; (2) $p_t \le 1$ for all values of $t$, which means that fewer than one key bit can be extracted by adding one lookup. This can also be observed in Fig. 3, where $r_t \ge 0.5$ for all the curves.
(4) $s_i$: the maximal number of key bits recovered in the $i$-th round.
Let $s_0$, $s_1$, and $s_9$ denote the maximal number of key bits recovered in the first, second and last round. For AES of Types A and B, since all the key variables can be extracted using the cache events in the first two rounds, we just need to calculate $s_0$ and $s_1$. For Type C, we calculate $s_9$ in the final round. Note that in practice the sets $K_t$ of different table lookups intersect, which constrains $s_i$. Since no information is leaked on the first access to each lookup table, according to Section 4, $s_0 = 15b$ and $s_1 = 128$ for Type A, $s_0 = 12b$ and $s_1 = 128$ for Type B, and $s_9 = 128$ for Type C.
With all the aforementioned variables, we can now roughly estimate the minimal number of cache traces required to get $s_i$ bits in the $i$-th round, denoted as $N_i$. $N_0$ and $N_9$ are calculated from these metrics.
As $s_0$ bits are recovered in the first round, we only need to recover the remaining $128 - s_0$ bits in the second round, and $N_1$ is roughly calculated accordingly (Eq. (14)). Let $E_N$ denote the estimated minimal number of cache traces needed to recover the master key. $E_N = \max\{N_0, N_1\}$ for Types A and B, and $E_N = N_9$ for Type C. Table 2 lists the values of $N_0$, $N_1$, $N_9$ and $E_N$ for different AES implementations. The value of $E_N$ helps us to determine the number of traces needed in practical attacks. We will verify it with experimental results in Sections 7 and 8.
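The per-lookup metrics can be explored numerically with the following sketch. It assumes that $r_t$ is the hit/miss-weighted average $(n_t/m)^2 + (1 - n_t/m)^2$ and that $p_t = -\log_2(r_t)$, with $n_t$ the expected number of lines loaded by the $t$ lookups preceding $q_t$. The first form is an interpretation of the description above rather than a formula quoted from the paper, and the function names are illustrative.

```python
import math


def n_t(t: int, m: int) -> float:
    """Expected number of cache lines loaded by t uniform lookups."""
    return m * (1.0 - ((m - 1) / m) ** t)


def r_t(t: int, m: int) -> float:
    """Assumed search-space ratio after observing q_t: a hit (probability
    x = n_t/m) keeps a fraction x of the candidates, a miss keeps 1 - x."""
    x = n_t(t, m) / m
    return x * x + (1.0 - x) * (1.0 - x)


def p_t(t: int, m: int) -> float:
    """Estimated number of key bits obtained from the lookup q_t."""
    return -math.log2(r_t(t, m))


# Leakage profile of the first 32 lookups for b = 4 (m = 16 lines).
profile = [round(p_t(t, 16), 3) for t in range(32)]
```

Under these assumptions the leakage is zero for $q_0$, peaks at one bit when about half of the lines are loaded, and decays again as the table fills the cache, which agrees with the qualitative observations made for Figs. 3 and 4.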
6.3. Comparisons with previous work
It should be noted that we provide an approach to calculate the leakage of round-key bits in all rounds. We use this approach to roughly estimate the minimal number of traces required in TDCA. Our estimation can be verified by the experiments (to be addressed in Sections 7 and 8). Future work could quantify the joint cache leakage of many key bits over more rounds and estimate the lower bound more accurately.
Experiment setup
The overall process of MDATDCA has been described in Section 5. Due to the page limit, here we only list a few important details about the setup. Each run of MDATDCA with different parameters is referred to as a case. Each case is repeated many times, and each repetition is referred to as an instance.
7.1. Build the AES equation set
Representing the S-Box is the most difficult part of the algebraic analysis. We adopt the technique in Knudsen and Miolane (2010) to derive every S-Box output bit with high-degree equations (degree 7) from the eight S-Box input bits. More details can be found in Appendix 1.
Profile the cache traces
This paper mainly focuses on the analysis part of MDATDCAs. Details of profiling the events can be found in Zhao et al. (2012). In Sections 8, 10, and 11, we assume that the cache hits and misses are distinguishable in the EM traces. This can be achieved by modifying the AES source code in OpenSSL and generating the sequences of cache events under different configurations. To prove the feasibility of MDATDCA, in Section 9 we conduct concrete MDATDCA experiments against AES implemented with a 256-byte compact table on a 32-bit ARM microprocessor, the NXP LPC2124. In practice, the cache hits and misses are not always distinguishable from the EM traces; such events are treated as uncertain cache events or errors. We propose a method to adapt MDATDCA to handle these errors. More details can be found in Section 9.
Utilize the cache traces
We build additional equations from the generated cache events. When the leakage information is not enough, there may exist multiple solutions when solving for the key. The SAT solver may output a wrong but satisfying solution, which foils the whole attack, as noted by Oren et al. (2010) and Oren and Wool (2012). In order to verify these multiple solutions, we append a set of new equations that describes a full AES encryption with a pair of known plaintext and ciphertext. With this approach, the correct key can always be derived within a reasonable time.
For each instance, we first try to solve the generated equations directly. However, some instances cannot be solved within a day. To accelerate the solving process, we first guess $n_k$ key bits and run an exhaustive search over all the $2^{n_k}$ guesses. If the guess is correct, the solver can output the correct key within a reasonable amount of time. Otherwise, it will output "unsatisfiable" very quickly. For Types A, B, C, we set $n_k = 4$, 8, 4, respectively. We can see that $n_k$ is relatively small and the exhaustive search is affordable.
Table 2. $N_i$ for different AES implementations.
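The guess-and-solve loop described above can be sketched as follows. The `solve` callable is a hypothetical wrapper around whichever solver is used: it is assumed to take a mapping from guessed key-bit positions to values and to return the recovered key, or `None` when the system is unsatisfiable under that guess. It does not correspond to any real solver API.

```python
from itertools import product
from typing import Callable, Dict, Optional, Sequence


def guess_and_solve(
    solve: Callable[[Dict[int, int]], Optional[int]],
    guessed_bits: Sequence[int],
) -> Optional[int]:
    """Try all 2**n_k assignments of the guessed key bits and return the
    first key for which the (hypothetical) solver finds a solution."""
    for values in product((0, 1), repeat=len(guessed_bits)):
        key = solve(dict(zip(guessed_bits, values)))
        if key is not None:      # correct guess: the system is satisfiable
            return key
    return None                  # no guess produced a solution


# Toy stand-in for the solver: it "solves" only when the guess matches
# the low bits of a fixed secret, mimicking quick UNSAT answers otherwise.
SECRET = 0b1011

def toy_solve(assumptions: Dict[int, int]) -> Optional[int]:
    ok = all(((SECRET >> pos) & 1) == val for pos, val in assumptions.items())
    return SECRET if ok else None

recovered = guess_and_solve(toy_solve, [0, 1, 2, 3])
```

Because the $2^{n_k}$ guesses are independent, the loop could also be distributed over several solver processes; the sequential form is kept here for brevity.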
Solve the equation system
Many automatic tools can be used, such as Gröbner basis-based (Faugère, 2007) or SAT-based solvers (Soos et al., 2009). We use a SAT-based solver, CryptoMiniSat 2.9.0 (Soos et al., 2009), on an AMD Athlon 64 Dual Core 3600+ processor clocked at 2.0 GHz. In Sections 8, 9, and 10, three case studies of MDATDCA on AES-128 are performed, considering different attack scenarios.
Case 1: error-free MDATDCAs on AES-128
In this section, we conduct MDATDCA on AES-128 under two assumptions. The first is that the cache does not contain any AES data prior to each encryption. The second is that the adversary can precisely distinguish a cache miss event from a cache hit event. We call this scenario error-free MDATDCA on AES; it is also widely used in previous TDCA work (Acıiçmez and Koç, 2006a, 2006b; Bertoni et al., 2005; Bonneau, 2006; Fournier and Tunstall, 2006; Lauradoux, 2005).
Data and time complexity
We conduct nine cases of MDATDCAs for Types A, B, C and $b = 3$, 4, 5. For each case, we randomly generate a secret key and collect $N$ cache traces for $N$ different pairs of plaintexts and ciphertexts ($N$ is chosen based on $E_N$ in Table 2). For each case, we run 100 instances where the correct values of $n_k$ key bits are fed into the equation set first. Let $t$ denote the average full attack time (in seconds) when the correct key is retrieved in the exhaustive search over the $2^{n_k}$ guesses of the $n_k$ key bits. Fig. 6(a)-(i) show the distribution of the solving times (in seconds) for the nine cases by analyzing $N$ cache traces. The bold number is $E_N$. From Fig. 6, we can see that:
(1) The number of cache traces required in practice, $N$, is consistent with the estimated value of $E_N$. For Types A and B, sometimes $N$ is smaller than $E_N$. This is because the calculation of $E_N$ only considers the leakages in the first two rounds or in the last round, while in real attacks, the cache events in deeper rounds can also contribute to the attacks.
(2) The equation solving time seems to follow an exponential distribution. Most instances can be solved in a short time (less than 100 s). Only a few require a longer time (more than 200 s, as shown at the right end of each subfigure). Similar observations have also been reported in the literature.
(3) The time complexity of the full attack on AES is affordable. Most attacks can succeed within 7200 s. The time required in attacking AES for Type A and Type C is less than for Type B.
Note that we set 200 s as the threshold of the equation solving time for a successful MDATDCA. If the adversary has more computation power, the attack may require fewer cache traces. For example, for Type A with $d = 4$, if we set the threshold to 3600 s, only 5 cache traces are required (Zhao et al., 2012). How to find a good tradeoff between the data complexity (the number of cache traces) and the time complexity (the solving time) is a very interesting problem.
Overhead for the equation system
The original AES with $r$ rounds can be represented with a set of equations. Suppose the numbers of equations and variables in this set are $N^r_e$ and $N^r_v$, respectively. For the lookup $q_t$, the overhead introduced can be calculated as in Sections 5.1 and 5.2. The overhead in the number of ANF equations and variables is measured by $EQ_r$ and $VA_r$, respectively; Fig. 7 shows how $EQ_r$ and $VA_r$ change with $r$.
We can observe that: (1) both $EQ_r$ and $VA_r$ are small (less than 1); (2) both $EQ_r$ and $VA_r$ increase linearly with $r$ (as the size of the deduction set, $s_p$ or $s_n$, for $q_t$ increases); (3) $EQ_r$ is a little larger than $VA_r$ for the same $r$.
8.3. Comparisons with previous work
The comparisons of MDATDCAs with previous work are listed in Table 3. The first three columns describe the AES implementations. The next three columns list the attacks and the numbers of traces and rounds that are required. The last column lists the reduced key search space. We can see that MDATDCAs perform better than all previous work in terms of both data and time complexity.
Case 2: error-tolerant MDATDCAs on AES-128
In this section, we will describe how to conduct MDATDCA on AES-128 in practice, where there are errors when deducing cache events.
We implemented unprotected AES in software on a 32-bit ARM microprocessor, the NXP LPC2124, and profiled the cache collisions via an EM probe. We reset the cache to clear the AES data prior to each encryption. The acquisition was performed with a Langer RF-B 3-2 probe, a Langer PA303N 30 dB preamplifier and a Tektronix DPO 4104 oscilloscope.
In the attack, for most of the time, a cache miss related EM trace is likely to have a distinct peak compared to a cache hit, which is also illustrated in Fig. 1 . However, for some table lookups, it is hard to tell whether they are cache miss or hit because the peak is not high enough. We consider such cache events as uncertain ones or errors. Under this scenario, the key issue is to deal with the uncertain cache events. Next, we describe the error-tolerant strategy and present the experimental results on AES.
Error tolerance strategy
In the attack, we set two thresholds on the amplitude peak value to deduce the cache events: the upper bound threshold $V_M$ and the lower bound threshold $V_H$. Suppose $V_t$ is the amplitude peak value of $q_t$ in the trace. If $V_t < V_H$, the targeted cache event $q_t$ is considered a hit; if $V_t > V_M$, $q_t$ is considered a miss; if $V_H \le V_t \le V_M$, $q_t$ is considered an uncertain event.
We adopt the following strategy to analyze each cache event.
1. $q_t$ is a hit.
Then $D$, the possible deduction set of $d$ ($\langle y_t \rangle_b$), is composed of the index set related to both previous cache miss events and previous uncertain cache events. Thus, the set size $s_p$ is much larger than in error-free MDATDCA. Note that as some uncertain cache events might be cache hits in reality, there might exist two or more deductions that are equal to $d$. Thus, we need to update Eq. (7) of Section 5 to
$$c_1 \vee c_2 \vee \dots \vee c_{s_p} = 1 \qquad (15)$$
2. $q_t$ is a miss.
Then the impossible deduction set of $d$ ($\langle y_t \rangle_b$) is only composed of the index set related to previous cache miss events. Note that as some cache miss events in practice may be considered uncertain cache events, the set size $s_n$ is much smaller than in error-free MDATDCA.
3. $q_t$ is an uncertain cache event.
Then no analysis is performed on this cache event.
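The three-way classification and the resulting deduction sets can be sketched in Python as follows. The threshold values, the `'U'` label for uncertain events and the function names are illustrative assumptions; the deduction sets are returned as positions of earlier lookups whose $\langle y \rangle_b$ would enter the possible or impossible set.

```python
def classify(peak: float, v_h: float, v_m: float) -> str:
    """Map an amplitude peak to 'H' (hit), 'M' (miss) or 'U' (uncertain)."""
    if peak < v_h:
        return "H"
    if peak > v_m:
        return "M"
    return "U"


def deduction_sets(events):
    """For each lookup t return (kind, earlier lookup positions):
    hit       -> ('possible',   earlier misses + earlier uncertain events)
    miss      -> ('impossible', earlier misses only)
    uncertain -> ('skip', [])   no equations are generated."""
    out = []
    for t, ev in enumerate(events):
        misses = [j for j in range(t) if events[j] == "M"]
        uncertain = [j for j in range(t) if events[j] == "U"]
        if ev == "H":
            out.append(("possible", sorted(misses + uncertain)))
        elif ev == "M":
            out.append(("impossible", misses))
        else:
            out.append(("skip", []))
    return out


def error_rate(events) -> float:
    """Fraction of uncertain events among all observed cache events."""
    return sum(ev == "U" for ev in events) / len(events)


# Illustrative peaks with thresholds V_H = 3 and V_M = 7.
events = [classify(p, 3.0, 7.0) for p in (9.0, 2.0, 5.0, 1.0, 8.0)]
sets = deduction_sets(events)
```

Treating uncertain events as possible candidates for a hit but never as evidence for a miss keeps every generated constraint sound, at the cost of weaker constraints per trace.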
Experimental results and comparisons
In the error-tolerant MDATDCA experiments, we denote the error rate as $e = N_E/N_A$, where $N_E$ is the number of uncertain cache events and $N_A$ is the number of all the utilized cache events.
For simplicity, we only perform the MDATDCA on AES for Type A with $b = 4$. The extensions to other cases are straightforward. To investigate the number of cache traces required for a successful MDATDCA for different $e$, we conduct several attacks with different $e$ and repeat the attacks for 100 random keys in every case. The results are plotted in Fig. 8, which shows how the number of traces required for MDATDCA changes with $e$. The number of required cache traces clearly increases linearly with the error rate. In practice, the error rate is about 40%, and only 12 cache traces are required to break AES. The online complexity of data acquisition is comparable to DPA, CPA and other types of cache attacks. The offline complexity is also affordable: recovering the full key from a set of cache traces takes less than an hour on the computer mentioned in Section 7.4. Table 6 compares MDATDCA with the only previous work on error-tolerant TDCA on AES. We can see that our error-tolerant MDATDCA can analyze the cache events of the first three rounds and requires fewer cache traces than that work. Moreover, when the error rate is as high as 90%, MDATDCA still works with 80 cache traces, which is better than the 80% reported in that work.
10. Case 3: MDATDCAs on AES-128 with preloaded cache
The MDATDCAs in Sections 8 and 9 are all conducted under the assumption that the cache is cleaned before the attack. In practice, the cache might be partially filled with some lines of the lookup table. This setting is known as TDCA in the partially preloaded cache scenario and has been widely studied in previous work (Bonneau, 2006). This section presents the cache analysis strategy and experimental results of MDATDCAs on AES-128 with partially preloaded cache (Table 4).
Cache analysis strategy
Under this scenario, since some data of the AES lookup table are already in the cache, more cache hit events can be observed in a single cache trace. A cache hit may then correspond to a preloaded line, in which case it provides no valuable information to the attack. We therefore utilize the cache miss events in our MDATDCA on AES.
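To illustrate why hits become ambiguous under this scenario, the following sketch simulates a hit/miss trace for a sequence of table-line indices. It is a simplified model under stated assumptions: the names `cache_trace` and `usable_events` are hypothetical, the whole lookup table is assumed to fit in the cache (no evictions), and only lookup-table accesses are modeled.

```python
from typing import Iterable, List, Sequence

def cache_trace(lines: Sequence[int], preloaded: Iterable[int] = ()) -> str:
    """Simulate hits ('H') and misses ('M') for a sequence of line indices.

    Assumes no evictions: once a line is loaded it stays in the cache.
    """
    cached = set(preloaded)
    trace = []
    for line in lines:
        if line in cached:
            trace.append("H")
        else:
            trace.append("M")
            cached.add(line)  # a miss loads the line into the cache
    return "".join(trace)

def usable_events(trace: str) -> List[int]:
    """Positions of the miss events, the ones kept in this scenario."""
    return [t for t, q in enumerate(trace) if q == "M"]
```

With the access sequence `[3, 5, 3, 7]`, a clean cache gives `"MMHM"`, whereas preloading line 5 gives `"MHHM"`. The hit at position 1 in the second trace is caused by the preloaded line rather than by an earlier lookup, so only the misses at positions 0 and 3 are kept.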
Experimental results and comparisons
The comparisons of our results with previous work are listed in Table 5. We can see that, under the partially preloaded cache scenario, MDATDCA requires fewer cache traces to break AES than previous work. Even when ten of sixteen cache lines are preloaded into the cache before the AES encryption, MDATDCA can still succeed within 120 cache traces, which is better than the eight preloaded cache lines reported in previous work.
Fig. 9 - Leakages in TDCA on AES-128.
11. Extensions of MDATDCAs to AES-192/256
This section considers AES-192 and AES-256, in which the key expansion algorithm is much more complicated and the second round key has little (e.g., AES-192) or no relation (e.g., AES-256) with the first round key. How to conduct TDCA remains an open problem because of the difficulty of analyzing the cache leakages in more rounds. Next, we show why and how MDATDCA can be used to attack AES-192 and AES-256.
MDATDCA on AES-192
The key leakages in TDCA on AES-192 are depicted in Fig. 10, which is exactly the master key.
Take TDCA on AES of Type A when $b = 4$ as an example. With the most efficient TDCA technique in 
MDATDCA on AES-256
The key leakages in TDCA on AES-256 are depicted in Fig. 11. According to the key schedule of AES-256, the master key is just the concatenation of $K_0$ and $K_1$. To break AES-256, the cache events of at least the first three rounds have to be analyzed, and MDATDCA works well for this. We show that 15 cache traces can recover the AES key within 30 min on average under the known plaintext and error-free scenario for the full attack.
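The relation between the master key and the first round keys can be checked with a short sketch of the standard AES key expansion (FIPS-197). The function names below are hypothetical, and the code is meant only to illustrate the key-schedule property used in this section, not the attack itself. For AES-256 the first two round keys concatenate to the master key, while for AES-192 the master key covers the first round key and the first half of the second.

```python
from typing import List

def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def build_sbox() -> List[int]:
    """AES S-Box: field inverse followed by the affine transformation."""
    box = []
    for x in range(256):
        inv = 0
        if x:
            inv = 1
            for _ in range(254):  # x^254 is the inverse of x in GF(2^8)
                inv = gf_mul(inv, x)
        s = 0x63
        for k in range(5):  # inv XOR its left rotations by 1..4 bits
            s ^= ((inv << k) | (inv >> (8 - k))) & 0xFF
        box.append(s)
    return box

SBOX = build_sbox()

def expand_key(key: bytes) -> List[bytes]:
    """Return the 16-byte round keys for a 16-, 24- or 32-byte AES key."""
    nk = len(key) // 4          # key length in 32-bit words
    nr = nk + 6                 # number of rounds
    words = [key[4 * i:4 * i + 4] for i in range(nk)]
    rcon = 1
    for i in range(nk, 4 * (nr + 1)):
        temp = words[i - 1]
        if i % nk == 0:
            temp = temp[1:] + temp[:1]                    # RotWord
            temp = bytes(SBOX[b] for b in temp)           # SubWord
            temp = bytes([temp[0] ^ rcon]) + temp[1:]     # add round constant
            rcon = gf_mul(rcon, 2)
        elif nk > 6 and i % nk == 4:
            temp = bytes(SBOX[b] for b in temp)           # extra SubWord (AES-256)
        words.append(bytes(a ^ b for a, b in zip(words[i - nk], temp)))
    return [b"".join(words[4 * r:4 * r + 4]) for r in range(nr + 1)]
```

For a 32-byte key `k`, `expand_key(k)[0] + expand_key(k)[1]` equals `k`, so an attack that recovers only the first round key learns half of the AES-256 master key.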
12. Conclusions and future work
This paper gives a comprehensive study of MDATDCA, the MDASCA-based trace driven cache attack on AES, under different AES implementations, attack scenarios and key lengths. We show that MDATDCA can exploit the cache hit/miss leakages in more rounds than traditional TDCAs. Thus, the data and time complexity in both the online and offline phases can be significantly reduced. For the first time, we show that TDCAs on AES-192 and AES-256 become possible with the MDATDCA technique. We have achieved many improvements of TDCA on AES compared to previous work. Combining algebraic cryptanalysis with cache attacks is an efficient way to fully utilize the leakages and improve cache attacks. We stress that MDATDCAs remain effective against Boolean masking of software AES implementations in the case where all S-Boxes share the same random mask, as detailed in Bonneau and Mironov (2006). When such a masking scheme is used, our attacks outperform higher order DPAs or CPAs, which typically require thousands of traces. The countermeasures against MDATDCA are the same as those against TDCA, which are widely discussed in previous work (Acıiçmez and Koç, 2006a, 2006b; Bertoni et al., 2005; Bonneau, 2006; Fournier and Tunstall, 2006; Lauradoux, 2005). They include prefetching the lookup table into the cache prior to encryption and shuffling the order of table lookup computations. Meanwhile, Intel and other processor manufacturers have started to provide AES-NI, on-chip atomic instructions that implement AES without using the cache (WATER PAPER, 2010), which can prevent TDCA efficiently.
Note that MDATDCA can also be extended to improve TDCAs on other block ciphers, such as Camellia and CLEFIA (Mukhopadhyay, 2010, 2011; Zhao and Wang, 2010). Several problems might be interesting for future work: the trade-off between the data and time complexity in the online and offline phases of MDATDCA; how to further quantify the contributions of the key bits leaked from cache events to the recovery of the master key of AES; how to evaluate MDATDCA on AES in the error-tolerant and preloaded cache attack scenarios; and how to develop new attack techniques to solve the TDCA problem. We hope this paper can bring the understanding of both ASCA and TDCA to a new level, and help to evaluate the physical security of block cipher implementations.
input bits (denoted as $x_1 x_2 \ldots x_8$), which provides an explicit representation of the dependence of the S-Box output on the input. An example of $y_1$ represented by $x_1 x_2 \ldots x_8$ is shown below.
$$
\begin{aligned}
y_1 ={}& x_1 \oplus x_3 \oplus x_4 \oplus x_6 \\
&\oplus x_1x_8 \oplus x_1x_7 \oplus x_2x_6 \oplus x_4x_6 \oplus x_1x_3 \oplus x_6x_7 \oplus x_2x_4 \oplus x_2x_8 \oplus x_6x_8 \oplus x_3x_5 \\
&\oplus x_2x_3x_8 \oplus x_3x_5x_6 \oplus x_3x_7x_8 \oplus x_3x_4x_8 \oplus x_1x_4x_7 \oplus x_1x_5x_8 \oplus x_1x_2x_8 \oplus x_4x_7x_8 \oplus x_2x_3x_6 \\
&\oplus x_3x_5x_7 \oplus x_3x_6x_8 \oplus x_2x_7x_8 \oplus x_1x_2x_4 \oplus x_2x_6x_7 \oplus x_2x_5x_7 \oplus x_1x_2x_6 \oplus x_5x_6x_8 \oplus x_1x_3x_5 \\
&\oplus x_2x_4x_6 \oplus x_3x_4x_5 \oplus x_1x_6x_8 \oplus x_3x_5x_8 \oplus x_5x_6x_7 \oplus x_2x_3x_4 \oplus x_2x_5x_8 \oplus x_1x_3x_4 \\
&\oplus x_1x_2x_4x_7 \oplus x_1x_2x_5x_7 \oplus x_1x_2x_3x_6 \oplus x_1x_2x_3x_7 \oplus x_2x_3x_4x_5 \oplus x_1x_4x_5x_6 \oplus x_4x_5x_6x_7 \\
&\oplus x_5x_6x_7x_8 \oplus x_4x_6x_7x_8 \oplus x_4x_5x_7x_8 \oplus x_3x_6x_7x_8 \oplus x_1x_2x_3x_4 \oplus x_1x_5x_6x_8 \oplus x_2x_3x_5x_8 \\
&\oplus x_1x_3x_4x_7 \oplus x_3x_5x_6x_7 \oplus x_1x_5x_6x_7 \oplus x_3x_4x_6x_7 \oplus x_2x_4x_5x_8 \oplus x_1x_4x_7x_8 \oplus x_1x_3x_5x_8 \\
&\oplus x_1x_2x_5x_8 \oplus x_1x_4x_5x_8 \oplus x_1x_4x_6x_8 \oplus x_2x_4x_6x_8 \oplus x_1x_2x_4x_6 \oplus x_1x_6x_7x_8 \oplus x_1x_4x_5x_7 \\
&\oplus x_1x_2x_4x_8 \oplus x_2x_5x_7x_8 \oplus x_3x_5x_6x_8 \oplus x_2x_5x_6x_8 \oplus x_2x_4x_6x_7 \\
&\oplus x_1x_2x_3x_5x_7 \oplus x_1x_2x_3x_4x_7 \oplus x_1x_2x_3x_7x_8 \oplus x_1x_2x_3x_4x_5 \oplus x_1x_2x_3x_6x_7 \oplus x_2x_3x_4x_6x_7 \\
&\oplus x_1x_2x_4x_6x_7 \oplus x_1x_2x_4x_5x_8 \oplus x_2x_3x_5x_6x_7 \oplus x_1x_2x_3x_5x_8 \oplus x_4x_5x_6x_7x_8 \oplus x_2x_3x_4x_5x_6 \\
&\oplus x_2x_3x_4x_7x_8 \oplus x_1x_3x_4x_7x_8 \oplus x_3x_4x_5x_7x_8 \oplus x_1x_4x_5x_7x_8 \oplus x_2x_3x_4x_5x_8 \oplus x_2x_4x_5x_6x_8 \\
&\oplus x_1x_3x_6x_7x_8 \oplus x_3x_4x_5x_6x_8 \oplus x_1x_2x_6x_7x_8 \oplus x_1x_4x_5x_6x_8 \oplus x_2x_4x_6x_7x_8 \oplus x_2x_5x_6x_7x_8 \\
&\oplus x_1x_2x_4x_5x_6x_8 \oplus x_1x_2x_3x_4x_5x_7 \oplus x_1x_2x_5x_6x_7x_8 \oplus x_1x_2x_4x_5x_7x_8 \oplus x_1x_3x_4x_5x_7x_8 \\
&\oplus x_1x_2x_3x_5x_7x_8 \oplus x_1x_2x_3x_5x_6x_8 \oplus x_2x_3x_4x_5x_6x_8 \oplus x_1x_2x_3x_4x_6x_8 \oplus x_1x_3x_4x_5x_6x_8 \\
&\oplus x_1x_3x_4x_6x_7x_8 \\
&\oplus x_1x_2x_3x_4x_5x_7x_8 \oplus x_1x_2x_3x_4x_5x_6x_8
\end{aligned}
$$

Each S-Box can be represented by 254 ANF equations with 262 variables.
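As a hedged illustration of how such an ANF can be derived, the sketch below computes the ANF coefficients of a Boolean function from its truth table with the binary Möbius transform. The function names and the convention that bit $i$ of the table index holds $x_{i+1}$ are assumptions of this sketch. The bit ordering behind $x_1, \ldots, x_8$ in the expression above is not specified here, so the code is not claimed to reproduce that exact polynomial.

```python
from typing import List, Tuple

def anf_coefficients(truth_table: List[int]) -> List[int]:
    """Binary Moebius transform: truth table -> ANF coefficients.

    truth_table[m] is f(x) for the input whose bit i (of m) is x_{i+1}.
    coeffs[m] == 1 means the monomial with variables {x_{i+1} : bit i of m set}
    appears in the ANF (m == 0 is the constant term).
    """
    n = len(truth_table).bit_length() - 1
    if len(truth_table) != 1 << n:
        raise ValueError("truth table length must be a power of two")
    coeffs = list(truth_table)
    for i in range(n):
        step = 1 << i
        for mask in range(len(coeffs)):
            if mask & step:
                coeffs[mask] ^= coeffs[mask ^ step]
    return coeffs

def monomials(coeffs: List[int]) -> List[Tuple[int, ...]]:
    """List the monomials of the ANF as tuples of 1-based variable indices."""
    n = len(coeffs).bit_length() - 1
    return [
        tuple(i + 1 for i in range(n) if (m >> i) & 1)
        for m, c in enumerate(coeffs)
        if c
    ]
```

For example, the truth table `[0, 1, 1, 1]` of a two-input OR yields the monomials `(1,)`, `(2,)` and `(1, 2)`, that is, $x_1 \oplus x_2 \oplus x_1x_2$. Applying the same transform to the 256-entry truth table of one S-Box output bit gives a polynomial of the kind shown above.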
The numbers of new variables and ANF equations introduced for the different AES operations are listed in Table 6.
References
