Abstract. Implementations of cryptographic algorithms are vulnerable to Side Channel Analysis (SCA). To counteract it, masking schemes are usually employed; they randomize key-dependent data by the addition of one or several random values (the masks). When dth-order masking is involved (i.e. when d masks are used per key-dependent variable), the complexity of performing an SCA grows exponentially with the order d. The design of generic dth-order masking schemes taking the order d as security parameter is therefore of great interest for the physical security of cryptographic implementations. This paper presents the first generic dth-order masking scheme for AES with provable security and a reasonable software implementation overhead. Our scheme is based on the hardware-oriented masking scheme published by Ishai et al. at Crypto 2003. Compared to this scheme, our solution can be efficiently implemented in software on any general-purpose processor. This result is of importance considering the lack of solutions for d ≥ 3.
Introduction
Side Channel Analysis exploits information that leaks from physical implementations of cryptographic algorithms. This leakage (e.g. the power consumption or the electromagnetic emanations) may reveal information on the data manipulated by the implementation. Some of these data are sensitive in the sense that they are related to the secret key, and the information leaking about them enables efficient key-recovery attacks [7, 19].
Due to the very large variety of side channel attacks reported against cryptosystems and devices, significant effort has been devoted to designing countermeasures with provable security. They all start from the assumption that a cryptographic device can keep at least some secrets and that only computation leaks [25]. Based on these assumptions, two main approaches have been followed. The first one consists in designing new cryptographic primitives inherently resistant to side channel attacks. In [25], a very powerful side channel adversary is considered who has access to the whole internal state of the ongoing computation. In such a model, the authors show that if a physical one-way permutation exists which does not leak any information, then it can be used in the pseudo-random number generator (PRNG) construction proposed in [4] to give a PRNG provably secure against the aforementioned side channel adversary. Unfortunately, no such leakage-resilient one-way permutation is known to date. Besides, the obtained construction is quite inefficient since each computation of the one-way permutation produces one single random bit. To get more practical constructions, further works focused on designing primitives secure against a limited side channel adversary [13]. The definition of such a limited adversary is inspired by the bounded retrieval model [10, 22], which assumes that the device leaks a limited amount of information about its internal state for each elementary computation. In such a setting, the block cipher based PRNG construction proposed in [30] is provably secure assuming that the underlying cipher is ideal. Other constructions were proposed in [13, 31] which do not require such a strong assumption but are less efficient [40].
The main limitation of these constructions is that they do not enable the choice of an initialization vector (otherwise the security proofs do not hold anymore), which prevents their use for encryption with synchronization constraints or for challenge-response protocols [40]. Moreover, as they consist of new constructions, these solutions do not allow for the protection of the implementation of standard algorithms such as DES or AES [14, 15].
The second approach to design countermeasures provably secure against side channel attacks consists in applying secret sharing schemes [2, 39]. In such schemes, the sensitive data is randomly split into several shares in such a way that a chosen number (called the threshold) of these shares is required to retrieve any information about the data. When the SCA threat appeared, secret sharing was quickly identified as a pertinent protection strategy [6, 17] and numerous schemes (often called masking schemes) were published that were based on this principle (see for instance [1, 3, 18, 23, 26, 29, 34, 38]). Actually, this approach is very close to the problem of defining Multi-Party Computation (MPC) schemes (see for instance [9, 12]) but the resources and constraints differ in the two contexts (e.g. MPC schemes are often based on a trusted dealer who does not exist in the SCA context). A first advantage of this approach is that it can be used to secure standard algorithms such as DES and AES. A second advantage is that dth-order masking schemes, for which sensitive data are split into d + 1 shares (the threshold being d + 1), are sound countermeasures to SCA in realistic leakage models. This fact has been formally demonstrated by Chari et al. [6], who showed that the complexity of recovering information by SCA on a bit shared into several pieces grows exponentially with the number of shares. As a direct consequence of this work, the number of shares (or equivalently of masks) into which sensitive data are split is a sound security parameter for the resistance of a countermeasure against SCA.
The present paper deals with the problem of defining an efficient masking scheme to protect the implementation of the AES block cipher [11]. Until now, most works published on this subject have focused on first-order masking schemes where sensitive variables are masked with a single random value (see for instance [1, 3, 23, 26, 29]). However, this kind of masking has been shown to be efficiently breakable in practice by second-order SCA [24, 27, 42]. To counteract those attacks, higher-order masking schemes must be used, but very few have been proposed. A first method was introduced by Ishai et al. [18]; it makes it possible to protect an implementation at any chosen order. Unfortunately, it is not suited for software implementations and it induces a prohibitive overhead for hardware implementations. A scheme devoted to securing the software implementation of AES at any chosen order has been proposed by Schramm and Paar [38], but it was subsequently shown to be secure only in the second-order case [8]. Alternative second-order masking schemes with provable security were further proposed in [34], but no straightforward extension of them exists to get an efficient and secure masking scheme at any order. Actually, to date, no method exists in the literature that makes it possible to mask an AES implementation at any chosen order d ≥ 3 with a practical overhead; the present paper fills this gap.
Preliminaries on Higher-Order Masking

Basic Principle
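When higher-order masking is involved to secure the physical implementation of a cryptographic algorithm, every sensitive variable x occurring during the computation is randomly split into d + 1 shares x_0, ..., x_d in such a way that the following relation is satisfied for a group operation ⊥:

$$x_0 \perp x_1 \perp \cdots \perp x_d = x. \qquad (1)$$

In the rest of the paper, we shall consider that ⊥ is the exclusive-or (XOR) operation denoted by ⊕. The shares x_1, ..., x_d (the masks) are picked at random and the share x_0 (the masked variable) is computed so that (1) holds. A dth-order masking nevertheless remains theoretically vulnerable to (d + 1)th-order SCA, which exploits the leakages of d + 1 intermediate variables simultaneously [24, 37, 38]. Indeed, the leakages resulting from the d + 1 shares (i.e. the masked variable and the d masks) are jointly dependent on the sensitive variable. Nevertheless, such attacks become impractical as d increases, which makes higher-order masking a sound countermeasure.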
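The sharing relation above, with ⊥ taken as XOR, can be illustrated by a minimal Python sketch. The helper names `share` and `unshare` are hypothetical (they do not come from the paper), and Python's `secrets` module merely stands in for the random generator of the target device.

```python
import secrets
from functools import reduce
from operator import xor

def share(x, d, nbits=8):
    """Split x into d + 1 XOR shares: d random masks plus one masked value."""
    masks = [secrets.randbits(nbits) for _ in range(d)]  # x_1, ..., x_d
    x0 = reduce(xor, masks, x)                           # x_0 = x ^ x_1 ^ ... ^ x_d
    return [x0] + masks

def unshare(shares):
    """Recombine the shares: x = x_0 ^ x_1 ^ ... ^ x_d."""
    return reduce(xor, shares, 0)

shares = share(0xA7, d=3)   # 4 shares of the byte 0xA7
recovered = unshare(shares)
```

Any d of the d + 1 shares are jointly uniform in this construction; only the full set recombines to the sensitive value.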
Soundness of Higher-Order Masking
The soundness of higher-order masking was formally demonstrated by Chari et al. in [6]. They assume a simplified but still realistic leakage model where a bit b is masked using d random bits x_1, ..., x_d such that the masked bit is defined as

$$x_0 = b \oplus x_1 \oplus \cdots \oplus x_d.$$
The adversary is assumed to be provided with observations of d + 1 leakage variables L_i, each one corresponding to a share x_i. For every i, the leakage is modelled as L_i = x_i + N_i where the noises N_i are assumed to have Gaussian distributions N(µ, σ^2) and to be mutually independent. Under this leakage model, they show that the number of samples q required by the adversary to distinguish the distribution (L_0, ..., L_d | b = 0) from the distribution (L_0, ..., L_d | b = 1) with a probability at least α satisfies:

$$q \geq \sigma^{d+\delta}, \qquad (2)$$
where δ = 4 log α/ log σ. This result encompasses all the possible side-channel distinguishers and hence formally states the resistance against every kind of side channel attack. Although the model is simplified, it could probably be extended to more common leakage models such as the Hamming weight/distance model. The point is that if an attacker observes noisy side channel information about d + 1 shares corresponding to a variable masked with d random masks, the number of samples required to retrieve information about the unmasked variable is lower bounded by an exponential function of the masking order whose base is related to the noise standard deviation. This formally demonstrates that higher-order masking is a sound countermeasure especially when combined with noise. Many works also made this observation in practice for particular side channel distinguishers (see for instance [37, 38, 41] ).
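A short worked rewriting may help in reading this bound. Assuming the bound has the form q ≥ σ^{d+δ} with δ as defined above, and assuming both logarithms are taken in the same base, the δ term collapses to a factor that depends only on the success probability α:

```latex
\sigma^{d+\delta}
  = \sigma^{d}\cdot\sigma^{4\log\alpha/\log\sigma}
  = \sigma^{d}\cdot\left(\sigma^{\log\alpha/\log\sigma}\right)^{4}
  = \alpha^{4}\,\sigma^{d}.
```

Under these assumptions, for a fixed α, each additional mask multiplies the lower bound on the number of samples by the noise standard deviation σ.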
Higher-Order Masking Schemes
When dth-order masking is involved in protecting a block cipher implementation, a so-called dth-order masking scheme (or simply a masking scheme if there is no ambiguity on d) must be designed to enable computation on masked data. In order to be complete and secure, the scheme must satisfy the two following properties:
- completeness: at the end of the computation, the sum of the d + 1 shares must yield the expected ciphertext (and more generally each masked transformation must result in a set of shares whose sum equals the correct intermediate result),
- dth-order SCA security: every tuple of d or fewer intermediate variables must be independent of any sensitive variable.
If the dth-order security property is satisfied, then no attack of order lower than d + 1 is possible and we benefit from the security bound (2). Most block cipher structures (e.g. AES or DES) alternate several rounds composed of a key addition, one or several linear transformation(s), and a non-linear transformation. The main difficulty in designing masking schemes for such block ciphers lies in masking the nonlinear transformations. Many solutions have been proposed to deal with this issue but the design of a dth-order secure scheme for d > 1 has quickly been recognized as a difficult problem by the community. As mentioned above, only three methods exist in the literature, respectively proposed by Ishai, Sahai and Wagner [18], by Schramm and Paar [38] (secure only for d ≤ 2) and by Rivain, Dottax and Prouff [34] (dedicated to d = 2). Among them, only [18] can be applied to secure a non-linear transformation at any order d. This scheme is recalled in the next section.
The Ishai-Sahai-Wagner Scheme
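In [18], Ishai et al. propose a higher-order masking scheme (referred to as ISW in this paper) that secures the hardware implementation of any circuit at any chosen order d. They describe a way to transform the circuit to protect into a new circuit (dealing with masked values) such that no subset of d of its wires reveals information about the unmasked values. For such a purpose, they assume without loss of generality that the circuit to protect is exclusively composed of NOT and AND gates. Securing a NOT for any order d is straightforward since

$$\mathrm{NOT}(x_0 \oplus x_1 \oplus \cdots \oplus x_d) = \mathrm{NOT}(x_0) \oplus x_1 \oplus \cdots \oplus x_d.$$

The main difficulty is therefore to secure the AND gates. To compute shares (c_i)_i of c = ab from shares (a_i)_i of a and shares (b_i)_i of b, the ISW scheme proceeds as follows:
1. For every 0 ≤ i < j ≤ d, pick a random bit r_{i,j}.
2. For every 0 ≤ i < j ≤ d, compute r_{j,i} = (r_{i,j} ⊕ a_i b_j) ⊕ a_j b_i.
3. For every 0 ≤ i ≤ d, compute c_i = a_i b_i ⊕ ⊕_{j≠i} r_{i,j}.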
Remark 1. The use of brackets indicates the order in which the operations are performed, which is mandatory for security of the scheme.
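The completeness of the solution follows from:

$$\bigoplus_{i} c_i = \bigoplus_{i}\Big(a_i b_i \oplus \bigoplus_{j \neq i} r_{i,j}\Big) = \bigoplus_{i} a_i b_i \oplus \bigoplus_{i<j} \big(r_{i,j} \oplus r_{j,i}\big) = \bigoplus_{i} a_i b_i \oplus \bigoplus_{i<j} \big(a_i b_j \oplus a_j b_i\big) = \Big(\bigoplus_{i} a_i\Big)\Big(\bigoplus_{i} b_i\Big).$$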
In [18] it is shown that the AND computation above is secure against any attack of order lower than or equal to d/2. In Section 4, we give a tighter security proof: we show that the scheme is actually dth-order secure.
Practical issues. Although the ISW scheme is an important theoretical result, its practical application suffers from a few issues. Firstly, it induces an important overhead in silicon area for the masked circuit. Indeed, every single AND gate is encoded using (d + 1)^2 AND gates plus 2d(d + 1) XOR gates, and it requires the generation of d(d + 1)/2 random bits at every clock cycle. As an illustration, masking the compact circuit for the AES S-box described in [5] would multiply its size (in terms of number of gates) by 7 for d = 2, by 14 for d = 3 and by 22 for d = 4 (without taking the random bits generation into account). Secondly, masking at the hardware level is sensitive to glitches, which induce first-order flaws although in theory every internal wire carries values that are independent of the sensitive variables [20, 21]. Preventing glitches in masked circuits implies the addition of synchronizing elements (e.g. registers or latches), which further increases the circuit size significantly (see for instance [32]).
Since software implementations of masking schemes do not suffer area overhead and are not impacted by the presence of glitches at the hardware level, a straightforward approach to deal with the practical issues discussed above could be to implement the ISW scheme in software. Namely, we could represent each non-linear transformation S to protect by a tuple of Boolean functions (f_i)_i, usually called coordinate functions of S, and evaluate the f_i's with the ISW scheme by processing the AND and XOR operations with CPU instructions. However, this approach is not practical since the timing overhead would clearly be prohibitive. The present paper follows a different approach: we generalize the ISW scheme to secure any finite field multiplication rather than a simple multiplication over F_2 (i.e. a logical AND). We apply this idea to design a secure higher-order masking scheme for the AES and we show that its software implementation induces a reasonable overhead.
Higher-Order Masking of AES
The AES block cipher iterates a round transformation composed of a key addition, a linear layer and a nonlinear layer which applies the same substitution-box (S-box) to every byte of the internal state. As previously explained, the main difficulty while designing a masking scheme for such a cipher is the masking of the nonlinear transformation, which in that case lies in the masking of the S-box. Our method for masking the AES S-box is presented in the next section, afterward the masking of the whole cipher is described.
In what follows, we shall consider that a random generator is available which on an invocation rand(n) returns n unbiased random bits.
Higher-Order Masking of the AES S-box
The AES S-box S is defined as the right-composition of an affine transformation Af over F_2^8 with the power function x → x^254 over the field F_{2^8} ≅ F_2[x]/(x^8 + x^4 + x^3 + x + 1). Since the affine transformation is straightforward to mask, our scheme mainly consists in a method for masking the power function at any order d. Our solution consists in a secure computation of the exponentiation to the power 254 over F_{2^8}. Such an approach has already been described by Blömer et al. for d = 1 [3]. The core idea is to apply an exponentiation algorithm (e.g. the square-and-multiply algorithm) on the first-order masked input while ensuring the mask correction step by step. Compared to Blömer et al.'s solution, our exponentiation algorithm is able to operate on dth-order masked inputs and it achieves dth-order SCA security for any value of d. To perform such a secure exponentiation, we define hereafter some methods to securely compute a squaring and a multiplication over F_{2^8} at the dth order.
Masking the field squaring. Since we operate on a field of characteristic 2, the squaring is a linear operation and we have

$$(x_0 \oplus x_1 \oplus \cdots \oplus x_d)^2 = x_0^2 \oplus x_1^2 \oplus \cdots \oplus x_d^2.$$

Securely computing a squaring can hence be carried out by squaring every share separately. More generally, for every natural integer j, raising x to the power 2^j can be done securely by raising each x_i to the power 2^j separately.
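The share-by-share squaring can be checked with a small Python sketch. The names `gf_mul` and `sec_pow2j` are hypothetical helpers introduced for illustration; `gf_mul` is a plain shift-and-add multiplication modulo the AES polynomial and, with its data-dependent branches, is not itself a side-channel-safe implementation.

```python
from functools import reduce
from operator import xor

def gf_mul(a, b):
    """Multiply two bytes in F_{2^8} modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B  # reduce by the AES polynomial
    return r

def sec_pow2j(shares, j):
    """Raise a shared value to the power 2^j by acting on each share alone."""
    out = list(shares)
    for _ in range(j):
        out = [gf_mul(s, s) for s in out]  # one squaring per share
    return out

shares = [0x12, 0x34, 0x56]               # arbitrary shares of x = 0x70
x = reduce(xor, shares)
squared = reduce(xor, sec_pow2j(shares, 1))
```

Because no share is ever combined with another one, this operation needs no fresh randomness.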
Masking the field multiplication. For the usual field multiplication we use the ISW scheme recalled in Section 2.4. Even if it has been described to securely compute a logical AND (that is, a multiplication over F_2), it can actually be transposed to secure a multiplication over any field of characteristic 2: variables over F_2 are replaced by variables over F_{2^n}, binary multiplications (i.e. ANDs) are replaced by multiplications over F_{2^n} and binary additions (i.e. XORs) are replaced by additions over F_{2^n} (that are n-bit XORs). This leaves the completeness of the scheme recalled in Section 2.4 unchanged. The whole secure multiplication over F_{2^n} is depicted in the following algorithm.
Algorithm 1 SecMult - dth-order secure multiplication over F_{2^n}
Input: shares a_i satisfying ⊕_i a_i = a, shares b_i satisfying ⊕_i b_i = b
Output: shares c_i satisfying ⊕_i c_i = ab
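A minimal Python sketch of such a secure multiplication, for n = 8, is given below. It follows the ISW recipe generalized to F_{2^8} as described in the text (a random r_{i,j} for i < j, then r_{j,i} = (r_{i,j} ⊕ a_i b_j) ⊕ a_j b_i); the function names and the use of the `secrets` module are assumptions made for illustration, and Python offers no guarantee on the evaluation order at the hardware level.

```python
import secrets
from functools import reduce
from operator import xor

def gf_mul(a, b):
    """Shift-and-add multiplication in F_{2^8} modulo the AES polynomial."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return r

def sec_mult(a, b):
    """Return d + 1 shares of a*b from d + 1 shares of a and of b."""
    d = len(a) - 1
    r = [[0] * (d + 1) for _ in range(d + 1)]
    for i in range(d + 1):
        for j in range(i + 1, d + 1):
            r[i][j] = secrets.randbits(8)
            # The bracketing matters: r_ij is added to a_i*b_j first.
            r[j][i] = (r[i][j] ^ gf_mul(a[i], b[j])) ^ gf_mul(a[j], b[i])
    c = []
    for i in range(d + 1):
        ci = gf_mul(a[i], b[i])
        for j in range(d + 1):
            if j != i:
                ci ^= r[i][j]
        c.append(ci)
    return c

a = [0x57 ^ 0x11 ^ 0x22, 0x11, 0x22]   # shares of 0x57
b = [0x83 ^ 0x0F ^ 0xA0, 0x0F, 0xA0]   # shares of 0x83
c = sec_mult(a, b)                     # shares of 0x57 * 0x83
```

For d + 1 shares, this sketch performs (d + 1)^2 field multiplications and draws d(d + 1)/2 random bytes, which matches the gate and randomness counts quoted for the bit-level scheme.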
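Masking the power function. Now that we have a secure squaring and a secure multiplication over F_{2^8}, we can design a secure exponentiation to the power 254. The secure squaring is processed share by share, whereas the secure multiplication (Algorithm 1) involves (d + 1)^2 field multiplications together with fresh randomness and is therefore much more costly. Our goal is therefore to design an exponentiation algorithm using the fewest possible multiplications that are not squarings. It can be checked that an exponentiation to the power 254 requires at least 4 such multiplications. The exponentiation algorithm presented hereafter achieves this lower bound and requires few additional squarings. It involves three intermediate variables denoted y, z and w (note that x and y may be associated to the same memory address).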
Algorithm 2
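One addition chain meeting this description (four multiplications that are not squarings, plus raisings to the powers 2, 4 and 16, with intermediate variables y, z and w) can be sketched in Python as follows. This is an unmasked illustration under the assumption that the chain x^2, x^3, x^12, x^15, x^240, x^252, x^254 is used; the helper names are hypothetical.

```python
def gf_mul(a, b):
    """Shift-and-add multiplication in F_{2^8} modulo the AES polynomial."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return r

def pow2j(x, j):
    """Raise x to the power 2^j with j successive squarings."""
    for _ in range(j):
        x = gf_mul(x, x)
    return x

def exp254(x):
    """Compute x^254 with 4 multiplications that are not squarings."""
    z = pow2j(x, 1)      # z = x^2
    y = gf_mul(z, x)     # y = x^3    (multiplication 1)
    w = pow2j(y, 2)      # w = x^12
    y = gf_mul(y, w)     # y = x^15   (multiplication 2)
    y = pow2j(y, 4)      # y = x^240
    y = gf_mul(y, w)     # y = x^252  (multiplication 3)
    y = gf_mul(y, z)     # y = x^254  (multiplication 4)
    return y
```

Since x^255 = 1 for every non-zero x in F_{2^8}, the output is the field inverse of x, with 0 mapped to 0.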
As we will argue in Section 4, for the dth-order security to hold, it is important that the masks (a_i)_{i≥1} and (b_i)_{i≥1} in input of the SecMult algorithm are mutually independent. That is why we shall refresh the masks at some points during the secure exponentiation by calling a procedure RefreshMasks. The whole exponentiation to the power 254 over F_{2^8} secure against dth-order SCA is depicted in the following algorithm.
For completeness, we describe the RefreshMasks algorithm hereafter. Table 1 ). In comparison, the 2nd-order countermeasures previously published [34, 38] require at least 512 look-ups and 512 XORs and have a memory consumption of at least 256 bytes (see [33, 35] for a detailed comparison).
Algorithm 4 RefreshMasks
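A common way to refresh a sharing, shown here as a hedged Python sketch, is to add a fresh random byte to x_0 and to each x_i for i ≥ 1, so that the XOR of the shares is unchanged. Whether this matches the paper's Algorithm 4 step for step is an assumption; the function name and the use of `secrets` are illustrative.

```python
import secrets
from functools import reduce
from operator import xor

def refresh_masks(shares):
    """Re-randomize a sharing without changing the value it encodes."""
    out = list(shares)
    for i in range(1, len(out)):
        tmp = secrets.randbits(8)  # fresh random byte
        out[0] ^= tmp              # added once to the masked variable ...
        out[i] ^= tmp              # ... and once to the i-th mask: it cancels
    return out

old = [0xA5, 0x3C, 0x0F, 0x77]
new = refresh_masks(old)
```

Each refresh of d + 1 shares consumes d random bytes and 2d XORs in this sketch.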
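Masking the full S-box. The affine transformation is straightforward to mask. After recalling that the additive part of Af equals 0x63, it can be checked that we have:

$$\mathrm{Af}(x_0) \oplus \mathrm{Af}(x_1) \oplus \cdots \oplus \mathrm{Af}(x_d) = \begin{cases} \mathrm{Af}(x) & \text{if } d \text{ is even}, \\ \mathrm{Af}(x) \oplus \mathtt{0x63} & \text{if } d \text{ is odd}. \end{cases}$$

Masking the affine transformation hence simply consists in applying it to every input share separately and, in case of an odd d (i.e. when the number d + 1 of shares is even, so that the constants 0x63 cancel out), in adding 0x63 to one of the shares afterward. The full S-box computation secure against dth-order SCA is summarized in the following algorithm.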
Algorithm 5 SecSbox
Input: shares x_i satisfying ⊕_i x_i = x
Output: shares y_i satisfying ⊕_i y_i = S(x)
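Putting the pieces together, a masked S-box can be sketched in Python as below. The placement of the two mask refreshes and the addition chain are assumptions consistent with the description in this section rather than a transcription of the paper's algorithms; all function names are hypothetical, and the code only illustrates completeness, not side-channel security of a Python runtime.

```python
import secrets
from functools import reduce
from operator import xor

def gf_mul(a, b):
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B  # AES polynomial x^8 + x^4 + x^3 + x + 1
    return r

def sec_mult(a, b):
    n = len(a)
    r = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            r[i][j] = secrets.randbits(8)
            r[j][i] = (r[i][j] ^ gf_mul(a[i], b[j])) ^ gf_mul(a[j], b[i])
    return [reduce(xor, (r[i][j] for j in range(n) if j != i), gf_mul(a[i], b[i]))
            for i in range(n)]

def refresh_masks(x):
    out = list(x)
    for i in range(1, len(out)):
        tmp = secrets.randbits(8)
        out[0] ^= tmp
        out[i] ^= tmp
    return out

def sec_pow2j(x, j):
    for _ in range(j):
        x = [gf_mul(s, s) for s in x]
    return x

def sec_exp254(x):
    z = refresh_masks(sec_pow2j(x, 1))  # x^2, re-masked before meeting x
    y = sec_mult(z, x)                  # x^3
    w = refresh_masks(sec_pow2j(y, 2))  # x^12, re-masked before meeting y
    y = sec_mult(y, w)                  # x^15
    y = sec_pow2j(y, 4)                 # x^240
    y = sec_mult(y, w)                  # x^252
    return sec_mult(y, z)               # x^254

def affine(v):
    rot = lambda t, k: ((t << k) | (t >> (8 - k))) & 0xFF
    return v ^ rot(v, 1) ^ rot(v, 2) ^ rot(v, 3) ^ rot(v, 4) ^ 0x63

def sec_sbox(x):
    d = len(x) - 1
    y = [affine(s) for s in sec_exp254(x)]
    if d % 2 == 1:      # even number of shares: the 0x63 constants cancelled
        y[0] ^= 0x63
    return y

def share(v, d):
    masks = [secrets.randbits(8) for _ in range(d)]
    return [reduce(xor, masks, v)] + masks
```

For instance, `reduce(xor, sec_sbox(share(0x53, 2)))` recombines to the AES S-box output for the input 0x53.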
Implementation aspects. Multiplications over F_{2^8} are typically implemented in software using log/alog tables (see for instance [11]). Note that for security reasons, such an implementation must avoid conditional branches in order to ensure a constant operation flow. The squaring and the raisings to the powers 4 and 16 may be performed by table look-ups. Different time-memory tradeoffs are possible. If not much ROM is available, the squaring can be implemented using logical shifts and XORs (see for instance [11]), and the raising to the power 2^j, j ∈ {2, 4}, can then be simply processed by j sequential squarings. Otherwise, depending on the amount of ROM available, one can either use one, two or three look-up table(s) to implement the raisings to the powers 2^j, j ∈ {1, 2, 4}.
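The log/alog approach can be sketched in Python as follows. The generator 0x03, the table layout and the bit trick replacing the "is zero" test are assumptions chosen for illustration; Python itself gives no constant-time guarantee, so the sketch only shows the branch-free structure that an assembly implementation would follow.

```python
def build_tables():
    """Build log/alog tables of F_{2^8} for the generator 0x03."""
    alog = [0] * 255
    log = [0] * 256              # log[0] is a dummy entry (0 has no logarithm)
    x = 1
    for i in range(255):
        alog[i] = x
        log[x] = i
        x2 = x << 1              # x * 0x02, reduced by the AES polynomial
        if x2 & 0x100:
            x2 ^= 0x11B
        x = x2 ^ x               # x * 0x03 = x * 0x02 + x
    return log, alog

LOG, ALOG = build_tables()

def gf_mul_table(a, b):
    """Table-based multiplication without a branch on the operands."""
    t = ALOG[(LOG[a] + LOG[b]) % 255]   # wrong when a or b is 0, fixed below
    nz_a = ((a | -a) >> 8) & 1          # 1 if a != 0, else 0
    nz_b = ((b | -b) >> 8) & 1          # 1 if b != 0, else 0
    return t & -(nz_a & nz_b)           # keep t only if both are non-zero

def gf_mul_ref(a, b):
    """Reference shift-and-add multiplication used to check the tables."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return r
```

The two tables occupy 511 bytes in this layout; trading them for shift-and-XOR code is one of the time-memory tradeoffs mentioned above.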
Remark 2. For the implementations presented in Section 5, we chose to implement the squaring by a look-up table, getting the raising to the 4 (resp. 16) by accessing this table sequentially 2 (resp. 4) times.
Our scheme may also be implemented in hardware. The sensitive part is the implementation of the SecMult algorithm (see Algorithm 1), which may be subject to glitches and which should incorporate synchronizing elements. In particular, the evaluation of the c_i shares should not start before the evaluation of all the r_{i,j}'s has been fully completed. Another approach would be to enhance the software implementation of the scheme with special purpose hardware instructions. For instance, the multiplication, squaring and raisings to the powers 4 and 16 over F_{2^8} could be added to the instruction set of the processor.
Higher-Order Masking of the Whole Cipher
In the previous section, we have shown how the AES S-box can be masked at any chosen order d. We now detail the dth-order masking scheme for the whole AES block cipher.
The AES block cipher [11] operates on a 4 × 4 array of bytes called the state and denoted s = (s_{l,j})_{1≤l,j≤4}. The state is initialized by the plaintext value and holds the ciphertext value at the end of the encryption. Each round of AES is composed of four stages: AddRoundKey, SubBytes, ShiftRows, and MixColumns (except the last round that omits the MixColumns). AES is composed of either 10, 12 or 14 rounds, depending on the key length (the longer the key, the higher the number of rounds) plus a final AddRoundKey stage. The round keys involved in the different rounds are derived from the secret key through a key expansion process.
In what follows, we describe how to mask an AES computation at the dth order. We will assume that the secret key has been previously masked and that its d + 1 shares are provided as input to the algorithm (otherwise a straightforward first-order attack would be possible). At the beginning of the computation, the state (holding the plaintext) is split into d + 1 states s_0, s_1, ..., s_d satisfying:

$$s = s_0 \oplus s_1 \oplus \cdots \oplus s_d.$$

This is done by generating d random states s_i ← rand(16 × 8) and by computing s_0 ← s ⊕ ⊕_{i≥1} s_i. At the end of the AES computation, the state (holding the ciphertext) is recovered by s ← ⊕_i s_i.
In the next sections, we describe how to perform the different AES transformations on the state shares in order to guarantee the completeness as well as the dth-order security.
Masking AddRoundKey. The AddRoundKey stage at round r consists in adding (by XOR) the rth round key k^r to the state. The masked key expansion (see description hereafter) provides d + 1 shares (k^r_i)_i for every round key k^r. To securely process the addition of k^r, one simply adds each of its shares to one share of the state and the completeness holds from:

$$s \oplus k^r = \bigoplus_{i} \big(s_i \oplus k^r_i\big).$$
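The state sharing and the share-wise key addition can be sketched in Python as follows. States are represented as flat lists of 16 bytes, and the names `share_state`, `unshare_state` and `masked_add_round_key` are hypothetical; the key shares are produced here by the same sharing helper purely for the sake of the example, whereas the scheme assumes they come from the masked key expansion.

```python
import secrets

def xor_states(s, t):
    """Byte-wise XOR of two 16-byte states."""
    return [a ^ b for a, b in zip(s, t)]

def share_state(s, d):
    """Split a state into d + 1 states whose XOR equals s."""
    masks = [[secrets.randbits(8) for _ in range(16)] for _ in range(d)]
    s0 = list(s)
    for m in masks:
        s0 = xor_states(s0, m)
    return [s0] + masks

def unshare_state(shares):
    """Recombine the d + 1 state shares."""
    s = [0] * 16
    for si in shares:
        s = xor_states(s, si)
    return s

def masked_add_round_key(state_shares, key_shares):
    """Add the i-th key share to the i-th state share, for every i."""
    return [xor_states(si, ki) for si, ki in zip(state_shares, key_shares)]

state = list(range(16))
key = [0xFF - i for i in range(16)]
out = masked_add_round_key(share_state(state, 3), share_state(key, 3))
```

No share of the state is ever combined with another share of the state, and the same holds for the key shares.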
Masking SubBytes. The SubBytes transformation consists in applying the AES S-box S to each byte of the state: SubBytes(s) = (S(s_{l,j}))_{1≤l,j≤4}.
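In order to mask this transformation, we apply the secure S-box computation described in Section 3.1 to the d + 1 shares of every state byte.
Masking ShiftRows and MixColumns. In the ShiftRows transformation, the bytes of each row of the state are cyclically shifted. In the MixColumns transformation, each column of the state is treated as a polynomial over F_{2^8} and is multiplied modulo x^4 + 1 with a fixed polynomial a(x) = 3x^3 + x^2 + x + 2. Since they are both linear with respect to the XOR operation, masking these transformations is straightforward. One just applies them to every state share separately and the completeness holds from:

$$\mathrm{ShiftRows}(s) = \bigoplus_{i} \mathrm{ShiftRows}(s_i),$$

and:

$$\mathrm{MixColumns}(s) = \bigoplus_{i} \mathrm{MixColumns}(s_i).$$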
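The linearity argument for MixColumns can be checked on a single column with a short Python sketch. The function names are hypothetical, and the column is given top to bottom as four bytes multiplied by the circulant matrix associated with a(x) = 3x^3 + x^2 + x + 2.

```python
def xtime(a):
    """Multiply a byte by 0x02 in F_{2^8} (AES polynomial)."""
    a <<= 1
    return a ^ 0x11B if a & 0x100 else a

def mix_column(col):
    """Apply MixColumns to one 4-byte column."""
    a0, a1, a2, a3 = col
    return [
        xtime(a0) ^ (xtime(a1) ^ a1) ^ a2 ^ a3,   # 2*a0 + 3*a1 +   a2 +   a3
        a0 ^ xtime(a1) ^ (xtime(a2) ^ a2) ^ a3,   #   a0 + 2*a1 + 3*a2 +   a3
        a0 ^ a1 ^ xtime(a2) ^ (xtime(a3) ^ a3),   #   a0 +   a1 + 2*a2 + 3*a3
        (xtime(a0) ^ a0) ^ a1 ^ a2 ^ xtime(a3),   # 3*a0 +   a1 +   a2 + 2*a3
    ]

def masked_mix_column(col_shares):
    """Apply MixColumns to every share of a column separately."""
    return [mix_column(c) for c in col_shares]

def xor_cols(cols):
    """XOR a list of 4-byte columns together."""
    out = [0, 0, 0, 0]
    for c in cols:
        out = [a ^ b for a, b in zip(out, c)]
    return out

col = [0xDB, 0x13, 0x53, 0x45]
m1 = [0x01, 0x23, 0x45, 0x67]                 # arbitrary fixed masks
m2 = [0x89, 0xAB, 0xCD, 0xEF]
shares = [xor_cols([col, m1, m2]), m1, m2]    # d = 2 sharing of col
result = xor_cols(masked_mix_column(shares))
```

ShiftRows only permutes byte positions, so applying it to every share separately commutes with the XOR recombination in the same way.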
Masking the key expansion. The AES key expansion generates a 4 × 4(Nr + 1) array of bytes w, called the key schedule, where Nr is the number of rounds (which depends on the key length). Let w_{*,j} denote the jth column of w. Each group of 4 columns (w_{*,4r−3}, w_{*,4r−2}, w_{*,4r−1}, w_{*,4r}) forms a round key k^r that is XORed to the state during the rth AddRoundKey stage. The first Nk columns of the key schedule are filled with the key bytes (where the key byte-length is 4Nk) and the next ones are derived according to the process described hereafter.
Let SubWord be the transformation that takes a four-byte input column and applies the AES S-box to each byte. Let RotWord be the transformation that takes a 4-byte column as input and performs a cyclic shift of one byte from bottom to top. Following the AES specification [11], each new column satisfies w_{*,j} = w_{*,j−Nk} ⊕ t, where t is derived from the previous column w_{*,j−1} by applying, when required, RotWord, SubWord and the addition of a round constant Rcon. In order to securely process the key expansion at the dth order, the key schedule w is split into d + 1 schedules w_0, w_1, ..., w_d. The first columns of each schedule share are filled with the key shares at the beginning of the ciphering. Each time a new schedule column w_{*,j} must be computed, its d + 1 shares (w_0)_{*,j}, (w_1)_{*,j}, ..., (w_d)_{*,j} are computed as:

$$(w_i)_{*,j} = (w_i)_{*,j-Nk} \oplus t_i, \qquad 0 \leq i \leq d,$$
where the t_i's denote the 4-byte shares of t that are securely computed from the 4-byte shares of w_{*,j−1}. Such a secure computation can be easily deduced from the methods described above. The SubWord transformation is processed by applying the secure S-box computation described in Section 3.1 to the byte shares (w_0)_{l,j}, (w_1)_{l,j}, ..., (w_d)_{l,j} for each row-coordinate l ∈ [1, 4]. Since RotWord is linear with respect to the XOR, it is applied (when involved) to every share separately. Finally, when Rcon_{j/Nk} must be added to t, it is added to one of its shares (e.g. t_0).
The whole dth-order secure key expansion process is summarized in the following algorithm.
*** State unmasking *** 15. c ← s0
Security Analysis
In this section, we give a formal security proof for our scheme. After describing the security model, we pay particular attention to the secure field multiplication algorithm SecMult (i.e. the generalized ISW scheme) which is the sensitive part of our scheme. We improve the security proof given in [18] for the ISW scheme and we show that it achieves dth-order security rather than (d/2)th-order security. Afterward, we prove the security of the whole AES computation (Algorithm 7).
Security Model
We consider a randomized encryption algorithm E taking a plaintext p and a (randomly shared) secret key k as inputs and performing a deterministic encryption of p under the secret key k while randomizing its internal computations by means of an external random number generator (RNG). The RNG outputs are assumed to be perfectly random (uniformly distributed, mutually independent and independent of the plaintext and of the secret key). Any variable that can be expressed as a deterministic function of the plaintext and the secret key, which is not constant with respect to the secret key, is called a sensitive variable, with the exception of the ciphertext E_k(p) or any deterministic function of it. Note that every intermediate variable computed during an execution of E (except the plaintext and the ciphertext) can be expressed as a deterministic function of a sensitive variable and of the RNG outputs.
We shall consider the plaintext, the secret key and the intermediate variables of E as random variables. The distributions of the intermediate variables are induced by the distributions of the algorithm inputs (p and k) and by the uniformity of the RNG outputs. The joint distribution of all the intermediate variables of E thus depends on (p, k). On the other hand, some subsets of intermediate variables may be jointly independent of (p, k). This leads us to the following formal definition of dth-order SCA security.
Definition 1. A randomized encryption algorithm is said to achieve dth-order SCA security if every d-tuple of its intermediate variables is independent of any sensitive variable.
Equivalently, an encryption algorithm achieves dth-order SCA security if any d-tuple of its intermediate variables, except the plaintext and the ciphertext (or any function of one of them), is independent of the algorithm inputs (p, k).
Before proving the security of our scheme, we need to introduce a few additional notions. To prove the dth-order SCA security of our scheme, we will first show that it can be split into several randomized elementary transformations each achieving dth-order SCA security. Afterward, the security of the whole algorithm will be demonstrated.
As in [18] , our proofs shall apply similar techniques as zero-knowledge proofs [16] . We shall show that the distribution of every d-tuple of intermediate variables (v 1 , v 2 , . . . , v d ) of our randomized AES algorithm can be perfectly simulated without knowing p and k. Namely, we show that it is possible to construct a d-tuple of random variables which is identically distributed as (v 1 , v 2 , . . . , v d ), independently of any statement about p and k. In some cases, the simulated distribution shall involve some intermediate variables ( 
Improved Security Proof for the ISW Scheme
The theorem hereafter states that the generalized ISW scheme (Algorithm 1) achieves dth-order SCA security. The proof given hereafter follows the outlines of that given by Ishai et al. in their paper but it is tighter: we prove that the scheme achieves dth-order SCA security rather than (d/2)th-order SCA security as proved in [18].
- If i ∉ I (regardless of j), then r_{i,j} does not enter into the computation for any v_h. Thus, its value can be left unassigned.
- If i ∈ I, but j ∉ I, then r_{i,j} is assigned a random independent value. Indeed, if i < j this is what would have happened in Algorithm 1. If i > j, however, we are making use of the fact that r_{j,i} will never be used in the computation of any v_h (otherwise we would have j ∈ I by construction). Hence we can treat r_{i,j} as a uniformly random and independent value.
- If {i, j} ⊆ I and {i, j} ⊆ J, then we have access to a_i, a_j, b_i and b_j and we thus compute r_{i,j} and r_{j,i} exactly as they would have been computed in Algorithm 1; i.e., one of them (say r_{i,j}) is assigned a random value and the other, r_{j,i}, is assigned r_{i,j} ⊕ a_i b_j ⊕ a_j b_i.
- If {i, j} ⊆ I and {i, j} ⊄ J, then at least r_{i,j} or r_{j,i} (or both) does not enter into the computation for any v_h (otherwise we would have {i, j} ⊆ J by construction). Following the same reasoning as previously (case i ∈ I, j ∉ I), we can then assign a random independent value to the one (if any) that enters in the computation of the v_h's.
4. For every intermediate variable v_h of the form a_i, b_i, a_i b_i, r_{i,j} (for any i ≠ j), or a sum of values of the above form (including c_i as a special case), we know that i ∈ I and i ∈ J, and all the needed values of r_{i,j} have already been assigned in a perfect simulation. Thus, v_h can be computed in a perfect simulation.
The only types of intermediate variables remaining are v_h = a_i b_j and v_h = r_{i,j} ⊕ a_i b_j (with i ≠ j). By construction, we have i ∈ I and j ∈ J, which allows us to compute a_i b_j, and since all the r_{i,j} (entering into the computation of the v_h's) have been assigned, the value of v_h can be simulated perfectly.
Security Proof of Our Scheme
The following theorem states the security of our whole randomized AES (Algorithm 7).
Theorem 2. The randomized AES computation depicted in Algorithm 7 achieves dth-order SCA security.
In order to demonstrate the theorem statement, we will use the following lemma. 
Implementation Results
To compare the efficiency of our proposal with that of other methods proposed in the literature, we applied them to protect an implementation of the AES-128 algorithm in encryption mode. We have implemented our new countermeasure for d ∈ {1, 2, 3}, namely to counteract either first-order SCA (d = 1) or second-order SCA (d = 2) or third-order SCA (d = 3). Among the numerous methods proposed in the literature to thwart first-order SCA we chose to implement only that having the best timing performance (the table re-computation method [23] ) and that offering the best memory performance (the tower field method [28] ).
In the second-order case, we implemented the only two existing methods: the one proposed in [38] and the one proposed in [34]. Finally, since no countermeasure against 3rd-order SCA existed before the one introduced in this paper, it is the only one in its category.
We wrote the codes in assembly language for an 8051-based 8-bit architecture. The implementations only differ in their approaches to protect the S-box computations. The linear steps of the AES have been implemented in the same way, by following the outlines of the method presented in Sect. 3.2 (and also used in [38] and [34] ). In Table 2 , we list the timing/memory performances of the different implementations.
As expected, in the first-order case the countermeasures introduced in [23] and [28, 29] are much more efficient than ours. This is a consequence of the generic character of our method, which is not optimized for one choice of d but aims to work for any d. For instance, the representation of the AES S-box used in [28, 29] involves fewer field multiplications than our representation. Moreover, those field multiplications can be defined in the subfield F_16 of F_256, where the field operations can be entirely looked-up thanks to a table of 256 bytes in code memory.
In the second-order case, our proposal becomes much more efficient than the existing solutions. It is 2.2 times faster than the countermeasure proposed in [38] with a RAM memory requirement divided by around 10. It is also 2.5 times faster than the countermeasure in [34] and requires 5.3 times less RAM. Memory allocation differences are merely due to the fact
