Power analysis attacks against embedded secret-key cryptosystems have been widely studied since the seminal 1998 paper of Paul Kocher, Joshua Jaffe, and Benjamin Jun, which introduced the powerful Differential Power Analysis (DPA). The strength of DPA is such that it became necessary to develop sound and efficient countermeasures. Nowadays, embedded cryptographic primitives usually integrate one or several of these countermeasures (e.g. masking techniques, asynchronous designs, balanced dynamic dual-rail gate designs, noise addition, power consumption smoothing, etc.). This document presents a simple, yet interesting, countermeasure to DPA and HO-DPA attacks, called the brutal countermeasure, and new power analysis attacks using multi-linear approximations (MLPA attacks), based on very recent and still unpublished results of Tavernier et al.
Introduction
Since the discovery of Differential Power Analysis (DPA) and High-Order Differential Power Analysis (HO-DPA) attacks in 1998 [13], the urge to develop resistant hardware implementations of symmetric ciphers has not ceased. Two families lead the most popular countermeasures against these devastating attacks: the transformed masking methods (initiated by M.-L. Akkar and C. Giraud in [2]) and the duplication method (first proposed by L. Goubin and J. Patarin in [9]). While the duplication method of rank n has been shown to be vulnerable to an n-th order DPA [3], the masking method, which tries to randomize the information leaked from the target device, gave better results in terms of resistance and performance. Thus, after several proposals of enhanced DES implementations [2, 3, 1], the work of Jiqiang Lv and Yongfei in 2005 [16] finally proposed an enhanced version of DES claimed to be secure against DPA and HO-DPA. To our knowledge, this countermeasure still holds against those attacks. It uses the unique masking method of [3], where a new random mask is used for every encryption. Hence, before each encryption, a set of several custom SBoxes (dependent on the newly generated mask) is generated and stored in RAM. These techniques have the serious drawback of assuming that the SBox generation is done in a secure way (i.e. no information should leak from this operation [3]); otherwise, it is easy to see that the leaked information would lead to HO-DPAs combining consumption traces recorded during the SBox generation with consumption traces recorded during the actual encryption. Given these considerations, and the fact that implementations of such countermeasures must be thoroughly scrutinized, they inevitably slow down the designers of such embedded systems (smartcards, FPGA devices) and thus lengthen the product's time to market.
Moreover, the resulting implementation, which integrates the additional computations (SBox generation), might prove inefficient in terms of execution time because of the need for secure computations [3]. We present here a brutal way to counteract Power Analysis attacks. The advantages of the countermeasure come from its simplicity and from how naturally it disables relevant information leakage, making it easier to design and implement without assuming that any part of the design is more secure than another. We will discuss its cost compared to Jiqiang Lv and Yongfei's bounds for DES unique masking countermeasures [16], thus isolating some cases where the brutal countermeasure is attractive to designers. Then we introduce a new set of power analysis attacks based on linear and multi-linear cryptanalysis that will put the first bounds on the brutal countermeasure for DES and AES. Finally, we give the current results obtained with MLPA attacks on some simulations and on some real consumption traces (the DPA contest traces found at http://www.dpacontest.org/).
Preliminaries on embedded symmetric ciphers and Power Analysis attacks
This section first discusses the symmetric cipher design model on which our study is based, and then the way Power Analysis attacks can be applied to those designs.
Embedded symmetric cipher design model
Our study restricts itself to smartcards and FPGA devices that are meant to host a symmetric cipher implementation. As it is now commonly accepted that hardware implementations of symmetric (as well as asymmetric) ciphers achieve both better performance and better security, the development of such devices has increased tremendously in the last few years. Hardware implementations of symmetric ciphers can take many forms: synchronous or asynchronous designs, pipelined versions, and implementations optimized for restricted area, low consumption and/or high throughput. For the sake of clarity, we will describe the studied designs using the common shape of symmetric ciphers: a Substitution-Permutation Network (SPN) organized in rounds (the key schedule is not taken into account in our study; we only suppose that the round keys are available when needed). A symmetric cipher can be represented as in Figure 1 (note that the sub-blocks within a round can be ordered more or less differently). The permutation part of the cipher, as well as the add-round-key part, are linear functions that can be implemented very efficiently in hardware with simple combinational logic. However, the substitution part is usually made of SBoxes, which are highly non-linear functions on 4, 6 (DES) or 8 (AES) bits and are not so easy to implement in combinational logic. As a matter of fact, in many designs the SBoxes are stored as lookup tables in memory (RAM or ROM) and accessed when needed in order to save critical logic space. Hence, one way to implement one round of the symmetric cipher is to split it into three clock cycles: the first dedicated to the add-round-key function, the second to the SBox table lookups, and the last to the diffusion function.
Of course, each of them can be split again into several clock cycles if needed (in AES, for instance, there can be 8 RAM accesses to the same SBox in one round, or just one RAM access if the SBox is duplicated in RAM). Furthermore, when throughput is more critical than area, it is usually fairly easy to pipeline the execution; in that case, every round must be implemented in hardware, instead of a single round and a loop counter. To our knowledge, Power Analysis attacks on smartcards (ASIC) and FPGAs target implementations following such specifications, and these will be the basis of our study of PA attacks and countermeasures. This high-level design (what is computed during each clock cycle) is assumed to be known by the attacker, as some probing techniques would give him this information anyway.
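To make the three-step round structure concrete, here is a minimal Python sketch of one round of a toy 16-bit SPN. The 4-bit S-box, the bit permutation and all function names are arbitrary illustrative assumptions (they are neither those of DES nor those of AES); the point is only that each step ends in a register value, which is where the leakage considered in this study occurs.

```python
# Toy 16-bit SPN round, split into the three register-to-register steps
# described in the text. TOY_SBOX and the bit permutation are arbitrary
# illustrative choices (assumptions), not those of DES or AES.

TOY_SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
            0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def add_round_key(state, round_key):
    """Linear step: XOR the 16-bit state with the 16-bit round key."""
    return state ^ round_key


def substitute(state):
    """Non-linear step: apply the 4-bit S-box to each of the four nibbles."""
    out = 0
    for nibble in range(4):
        value = (state >> (4 * nibble)) & 0xF
        out |= TOY_SBOX[value] << (4 * nibble)
    return out


def permute(state):
    """Linear diffusion step: move bit i to position 4*i mod 15 (bit 15 is fixed)."""
    out = 0
    for bit in range(16):
        target = 15 if bit == 15 else (4 * bit) % 15
        out |= ((state >> bit) & 1) << target
    return out


def spn_round(state, round_key):
    """Return the register contents after each of the three clock cycles."""
    after_key = add_round_key(state, round_key)   # clock cycle 1
    after_sbox = substitute(after_key)            # clock cycle 2 (table lookups)
    after_perm = permute(after_sbox)              # clock cycle 3 (diffusion)
    return after_key, after_sbox, after_perm
```

Calling `spn_round(state, round_key)` returns the three intermediate register values, i.e. the quantities an attacker would try to predict from the plaintext and a key guess.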
Power Analysis Attacks
Power analysis is a dynamic and active field of research, driven by the need to develop resistant cryptographic hardware devices. The study of PA attacks and their countermeasures has taken off prodigiously since the introduction of the very efficient DPA attacks in 1998.
Power consumption in CMOS circuits. Without going into the depths of CMOS gate power consumption (a simple presentation, sufficient for our needs, can be found in [17], pages 27-60), what we would like to point out here is that the power consumption of CMOS circuits depends on the data being manipulated, as transitions from 0 to 1 and from 1 to 0 consume significantly more power than 0 to 0 or 1 to 1 transitions through a logic gate. An attacker observing the overall consumption of a CMOS circuit during two different executions can tell, at a chosen point in time, which execution led to the greater number of data changes. It is worth noting, though, that the power consumption of combinational logic (in ASIC or FPGA) at a point within a clock cycle will not give the attacker relevant information on the data, since one usually assumes that the attacker does not know the netlist precisely enough to predict the glitches occurring throughout the logic circuit (see [17], pages 39-40). Consequently, power analysis attacks are based on the study of the power consumption of registers and buses, since their data transitions are synchronized with the clock edges and do not involve combinational logic. To our knowledge, all PA attacks are based on this principle.
Hamming distance and Hamming weight models. When considering the consumption of a bus or register, since the power consumption is significantly higher when a bit value changes, the Hamming distance (HD) model states that the power consumption is closely related to the Hamming weight of the difference (bitwise XOR) of two successive data values. Note that the absolute values of the measured power traces are of no use to the attacker; only their values relative to other measurements are relevant. A simpler model, the Hamming weight (HW) model, approximates the power consumption directly by the Hamming weight of the manipulated data value. Other models exist; they are basically variants of these two, based on some knowledge the attacker might have of the targeted hardware design (see [17], pages 38-43).
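The two leakage models can be written down in a few lines. The following Python sketch is only an illustration; the function names are ours and register values are assumed to be represented as non-negative integers.

```python
def hamming_weight(value):
    """HW model: number of bits set to 1 in the manipulated data value."""
    return bin(value).count("1")


def hamming_distance(previous, current):
    """HD model: number of bits that flip when a register or bus goes
    from `previous` to `current`, i.e. the Hamming weight of their XOR."""
    return hamming_weight(previous ^ current)
```

Note that the HW model is the special case of the HD model in which the previous register content is zero: `hamming_distance(0, x)` equals `hamming_weight(x)`.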
SPA, DPA, HO-DPA
SPA, DPA and HO-DPA attacks are semi-invasive passive attacks introduced in [13] by Paul Kocher, Joshua Jaffe, and Benjamin Jun in 1998. Their semi-invasiveness and passiveness make them easy to set up, i.e. there is no need for a complete knowledge of the implementation, for timing analysis, and so on. Let us give a rough description of these attacks and introduce some useful notations.
Simple Power Analysis. SPA is the simplest way to use power analysis to attack a cryptographic implementation. It requires interpreting the power consumption trace of one execution of the cryptographic function. According to [13], SPA can be used to break cryptographic implementations in which the execution path depends on the data being processed (e.g. conditional branching, comparisons, multipliers, exponentiators, etc.). Furthermore, the authors consider the prevention of SPA to be fairly simple.
Differential Power Analysis. The efficiency of DPA attacks comes from the fact that, instead of studying the power consumption over the execution time directly, it focuses on data-related instructions. By statistical means, DPA allows the attacker to suppress the measurement noise and bring to light data-dependent operations. Let us borrow the notations of [13] here:
• $T_i[j]$ : the $j$-th sample of $T_i$, the $i$-th recorded power trace.
• $D(P, B, K_s)$ : the DPA selection function, which computes $B$ (the Hamming weight of intermediate bits at a fixed point in time) as a function of a secret key block $K_s$ and the plaintext $P$ (it could also be the ciphertext $C$). In the original DPA on DES from [13], $B$ is the Hamming weight of one intermediate bit (i.e. the value of that bit). For now, let us assume that the value of $B$ is 0 or 1.
After observing $m$ executions of the cryptographic primitive, recording each power trace $T_{1 \cdots m}[1 \cdots k]$ and the corresponding plaintexts $P_{1 \cdots m}$ (respectively ciphertexts $C_{1 \cdots m}$), the attacker computes the values $\{B_i\}_{1 \cdots m}$ using the selection function:
$$B_i = D(P_i, B, K_s), \quad i = 1, \cdots, m.$$
The traces are divided into two sets $S_0$ and $S_1$, such that $T_i \in S_0$ iff $B_i = 0$ and $T_i \in S_1$ otherwise, and the differential trace $\Delta_D[1 \cdots k]$ over the $k$ samples is computed:
$$\Delta_D[j] = \frac{\sum_{i=1}^{m} B_i \, T_i[j]}{\sum_{i=1}^{m} B_i} - \frac{\sum_{i=1}^{m} (1 - B_i) \, T_i[j]}{\sum_{i=1}^{m} (1 - B_i)}$$
If $K_s$ was a wrong guess, then the values $\{B_i\}_{1 \cdots m}$ are not related to the manipulated data and, as the number of tests increases ($m \to \infty$), the differential trace tends to a flat trace ($\forall j, \ \Delta_D[j] \to 0$).
On the other hand, if $K_s$ was a right guess, the values $\{B_i\}_{1 \cdots m}$ are correct and the differential trace is related to the power consumption that coincides with the value of $B$. Furthermore, the values of the other bits and the measurement noise, which are not considered by $D$, will affect the differential trace less and less as the number of tests increases. Hence, as $m$ increases, the differential trace will show spikes at the samples where the manipulated data is correlated with $D$.
Remarks. Other methods have been developed to evaluate, more or less precisely, the correlation between the power consumption traces and selection functions; the interested reader can refer to the work of E. Brier, C. Clavier and F. Olivier in [6], which uses the Pearson coefficient (CPA), and to the maximum likelihood method of R. Bévan and E. Knudsen [4]. Moreover, while our description details single-bit DPA ($B$ represents a single bit), more complicated selection functions can be used where $B$ takes more than two values (e.g. the Hamming weight of an intermediate data value); this kind of attack (multi-bit DPA) has been gathered under the name Partitioning Power Analysis (PPA) by Thanh-Ha Le, Jessy Clédière, Cécile Canovas, Bruno Robisson, Christine Servière and Jean-Louis Lacoume in [14].
High-order DPA. In an $n$-th order DPA, a combination of $n$ points in the data path is involved in the selection function, i.e. for each power trace, $n$ samples are differentiated in the same differential trace $\Delta_D(1 \cdots n)$.
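The difference-of-means computation described above can be sketched in Python as follows. This is a minimal illustration under our own conventions: traces are lists of `k` floating-point samples, and `bits` holds the values $B_i \in \{0, 1\}$ predicted by the selection function for one key guess; the function name is ours.

```python
def differential_trace(traces, bits):
    """Single-bit DPA: mean of the traces with predicted bit 1 minus the
    mean of the traces with predicted bit 0, computed sample by sample."""
    ones = sum(bits)
    zeros = len(bits) - ones
    if ones == 0 or zeros == 0:
        raise ValueError("both sets S_0 and S_1 must be non-empty")
    delta = []
    for j in range(len(traces[0])):
        mean_one = sum(t[j] for t, b in zip(traces, bits) if b == 1) / ones
        mean_zero = sum(t[j] for t, b in zip(traces, bits) if b == 0) / zeros
        delta.append(mean_one - mean_zero)
    return delta


# Four noiseless toy traces of three samples; sample 1 leaks a data bit.
toy_traces = [[1.0, 1.0, 1.0], [1.0, 3.0, 1.0], [1.0, 1.0, 1.0], [1.0, 3.0, 1.0]]
right_guess = differential_trace(toy_traces, [0, 1, 0, 1])  # spike at sample 1
wrong_guess = differential_trace(toy_traces, [0, 0, 1, 1])  # flat trace
```

In an actual attack, the attacker would compute one such differential trace per candidate value of $K_s$ and keep the candidate whose trace shows the largest spike.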
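As an illustration of the correlation-based variant (CPA) mentioned in the remarks, the Python sketch below computes, for each sample index, the Pearson coefficient between the predicted leakage (for instance a predicted Hamming weight) and the measured samples. It is a generic sketch with names of our own choosing, not the exact procedure of [6].

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences
    (returns 0.0 when one of them is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def correlation_trace(traces, predictions):
    """One correlation value per sample index j, between the predictions
    made under a key guess and the j-th sample of every trace."""
    return [pearson(predictions, [t[j] for t in traces])
            for j in range(len(traces[0]))]
```

As with the differential trace, the key guess whose correlation trace reaches the largest value is retained.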
Countermeasures
As introduced in section 1, the unique masking techniques use fresh random data for every call to the encryption function in order to randomize the power consumption. Additive masking consists in manipulating data that have been XORed with a random value (the mask) and in tracking the mask value throughout the cipher execution so that it can be removed when needed (at the end of a round, of a set of rounds, or even at the end of the cipher execution). Even though tracking the additive mask value is fairly easy through linear functions, it proves tricky through highly non-linear functions such as SBoxes. Hence, the proposed masking techniques [2, 3, 1] generate custom SBoxes related to the current masks, such that the custom SBoxes make it easy to track the mask values. The custom SBoxes are then stored in RAM; the original versions of the SBoxes can be stored in RAM or ROM. In [16], the authors proved that three 32-bit random masks and six custom SBoxes are the minimal cost for a secure DES implementation masking all the outputs of the SBoxes of the sixteen rounds.
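The principle of a mask-dependent custom SBox can be illustrated by the generic additive-masking sketch below. It is not the specific construction of [2, 3, 1] or [16]: the 4-bit table `TOY_SBOX`, the helper names and the use of a single input mask and a single output mask are assumptions made for the example.

```python
import secrets

# Arbitrary 4-bit permutation standing in for a real SBox (assumption).
TOY_SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
            0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def masked_sbox(sbox, mask_in, mask_out):
    """Build the table S'(x') = S(x' XOR mask_in) XOR mask_out, so that a
    masked input x XOR mask_in is mapped to the masked output
    S(x) XOR mask_out without ever unmasking the data."""
    return [sbox[x ^ mask_in] ^ mask_out for x in range(len(sbox))]


def fresh_masked_sbox(sbox):
    """Draw new masks and regenerate the custom table, as would be done
    before each encryption (input and output widths assumed equal)."""
    mask_in = secrets.randbelow(len(sbox))
    mask_out = secrets.randbelow(len(sbox))
    return mask_in, mask_out, masked_sbox(sbox, mask_in, mask_out)
```

The table generation itself manipulates the masks and the original SBox, which is why it must not leak: this is precisely the assumption discussed in the introduction.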
A brutal countermeasure
As detailed in the previous section, power analysis attacks are based on the study of the power consumption of registers and buses: the transitions from one data value to another inside them occur at a precise time within the clock cycle, which allows the consumption of such a transition to be determined precisely. This consumption is assumed to be closely related to the Hamming weight of the manipulated data (directly in the HW model, or through the difference of two successive data values in the HD model). DPA attacks work under the assumption that the attacker can predict the value of one bit (or of a set of bits) actually manipulated by a register or bus as a function of the known input (or output) bits of the cryptographic primitive and of a few key bits. In practice, no more than 32 key bits should be involved [3, 1, 16]; otherwise the attack could not be carried out, considering its cost in memory and acquisition time. From the above considerations, a straightforward way to disable such Power Analysis attacks is to avoid using registers and buses until every bit stored in a register or going through a bus is either independent of the secret key or dependent on more than 32 bits of the secret key (i.e. not before a certain number of rounds).
Countermeasure setup and drawbacks
Depending on the diffusion functions of the target symmetric cipher, one can fix the number of rounds that must be executed during one clock cycle (i.e. between two registers or two accesses to a bus). Let us consider the two most popular symmetric ciphers: the Data Encryption Standard (DES) and its successor, the Advanced Encryption Standard (AES). The brutal countermeasure for DES would be to compute the first three rounds by pure combinational logic in one clock cycle and, by symmetry, to do the same for the last three rounds. For AES, since its diffusion function is more efficient, the first round should be done in one clock cycle, as well as the last two rounds (since the last round of AES does not contain the MixColumn diffusion). Let us call these incompressible blocks the "glued blocks". The obvious drawback of this countermeasure is that it makes it mandatory to implement the SBoxes in combinational logic (using a LUT implementation for instance). Furthermore, on a pipelined implementation, it would limit the overall throughput (since it forbids dividing the first and last blocks of logic into several clock cycles). The advantages of the countermeasure are that it is very simple to implement (no additional functions are needed) and that it does not rely on a secure pre-computation. It seems important to note here that this countermeasure is not compatible with the unique masking methods, since those methods, as seen in section 2.2.2, need to generate mask-dependent SBoxes at runtime.
Drawback bypass. In some cases it is possible to work around the pipeline drawback. When area is not critical, it is possible to put several glued blocks in parallel, driven by a slower clock (generated by a PLL component for instance), and to connect them to the original round implementations, which run at a faster clock. This solution would keep a high throughput even with the countermeasure. Let us also note that the AES SBox has a very efficient implementation in terms of speed and area using the multiplicative inverse function in $GF(2^8)$ [11].
Finally, this countermeasure may be attractive to designers who have a large combinational logic space and give priority to strong security, even though the cost in area is outrageous.
(M)LPA Attack description and complexity
This section introduces Linear Power Analysis and Multi-Linear Power Analysis attacks. Those attacks correspond strictly to linear [18] and multi-linear [12] cryptanalysis in the side-channel world. We first introduce some useful notations for the study of linear approximations. Then we introduce the idea of LPA and MLPA before describing the attack algorithms and their complexity. Finally, we discuss their practical setup.
Linear approximations of a symmetric cipher
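Linear cryptanalysis was introduced by Matsui in 1993 [18]; since then, it has become one of the most important foundations of the study of block cipher security. Nowadays, new block ciphers must prove some inherent resistance against linear cryptanalysis. Let us remark that many cryptanalysis methods are based on this fundamental discovery, among others multi-linear cryptanalysis [12, 5]. Writing $C(P, K)$ for the output of the cipher (or of a reduced number of its rounds) on plaintext $P$ under key $K$, and $\langle \cdot, \cdot \rangle$ for the inner product over $\mathbb{F}_2$, a linear approximation is a linear relation between plaintext, output and key bits, selected by an input mask $\Pi$, an output mask $\Gamma$ and a key mask $\kappa$, that holds with a probability different from 1/2:
$$\Pr\left[\langle P, \Pi \rangle \oplus \langle C(P, K), \Gamma \rangle = \langle K, \kappa \rangle\right] = \frac{1}{2} + \epsilon,$$
where $\epsilon$ is the bias of the approximation.
Given such a linear equation, Matsui showed that recovering the key bits involved in the equation with a high probability of success using linear cryptanalysis requires a data complexity (i.e. a number of plaintext-ciphertext pairs) of $N = 1/\epsilon^2$.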
Multi-linear cryptanalysis. It was shown in [5] that, instead of using a single linear approximation, using several linear approximations involving the same key bits significantly improves the performance of the attack. As a matter of fact, given $n$ linearly independent approximations of respective biases $\epsilon_j$, $j = 1, \cdots, n$, the data complexity of the attack is reduced to
$$N = \frac{1}{\sum_{j=1}^{n} \epsilon_j^2}.$$
In a very recent, yet to be published, paper, Tavernier et al. [15] studied the problem of finding all the linear approximations of a given Boolean function with a given bias. The authors showed the equivalence between the problem of finding linear approximations for a fixed output mask ($\Gamma$ fixed) and a list decoding problem in the first-order Reed-Muller code. They were then able to find good linear approximations for up to 8 rounds of DES and thus, based on the results of [8], to break a reduced version of the cipher with low data complexity ($2^{21}$ plaintext-ciphertext pairs).
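The data-complexity estimates used in this section are easy to evaluate numerically. The short Python sketch below assumes the estimate $N = 1/\sum_j \epsilon_j^2$ for $n$ approximations, which reduces to Matsui's $N = 1/\epsilon^2$ when $n = 1$; the function name and the example biases are ours.

```python
def data_complexity(biases):
    """Estimated number of plaintext-ciphertext pairs for an attack using
    approximations with the given biases, assuming N = 1 / sum(eps_j ** 2)."""
    return 1.0 / sum(eps * eps for eps in biases)


single = data_complexity([2.0 ** -10])       # one approximation of bias 2^-10
multi = data_complexity([2.0 ** -10] * 4)    # four approximations of equal bias
```

With these illustrative values, one approximation of bias $2^{-10}$ calls for about $2^{20}$ pairs, and four approximations of the same bias bring the estimate down to $2^{18}$.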
Introduction to (M)LPA
As mentioned above, (M)LPA implies the use of linear approximations to attack a hardware implementation of a symmetric cipher by power analysis. We will introduce two different ways for an attacker to use linear approximations; the latter is the so-called (M)LPA attack. Let us denote by $H(u)$ the Hamming weight of a vector of bits $u$.
A first approach: a classical approach. A very straightforward approach would be to attack by DPA, CPA or PPA using a linear approximation as the basis of the selection function. This makes the selection function of the attack dependent on the approximation bias $\epsilon$ and thus increases the data complexity. The advantage of such an attack is that one can look for linear approximations involving few bits of the key (fewer than 32 in practice) while evaluating data values, stored in registers or going through buses, that strictly depend on more than 32 key bits from the point of view of the cipher function. Hence it would make it possible to attack a cipher implementation where the unique masking technique or the brutal countermeasure is used only for the data bits that depend on fewer than 32 key bits. For instance, let us consider the single-bit DPA attack presented by Kocher in [13]. Using the notations introduced in section 2.2.1, let us denote by $m$ the complexity of the attack when the selection function $D(P, b, K)$ is not probabilistic (classic DPA) and by $M$ the complexity when the selection function $D_\epsilon(P, b, K)$ is probabilistic (meaning that $D_\epsilon$ has probability $1/2 + \epsilon$ of being right).
It is easy to see that when the key guess is wrong, the probabilistic selection function is not correlated with the manipulated data (just like the original selection function) and the differential trace tends to a flat trace when $M \to \infty$. Let us now consider that the key guess is right. Since $D_\epsilon$ is right with probability $p = 1/2 + \epsilon$, let us denote by $D_{true}$ the cases where the selection function is right and by $D_{false}$ the others. Then, after re-indexing the plaintexts and traces, we have
where $D$ is an uncorrelated selection function (it has one chance in two of being wrong) whose differential trace therefore tends to a flat trace when $M \to \infty$. Finally, the data complexity of the attack is such that $2\epsilon M \geq m$; in other words, the complexity of the attack increases by a factor $1/(2\epsilon)$ when the selection function has a bias $\epsilon$.
Remark 1 Let us note here that the term
in the above equation will crush the amplitude of the potential spikes; in practice, $\epsilon$ does not have to be very small for the data complexity to become unreachable, since the measurement acquisition time cannot be neglected in Power Analysis attacks.
Remark 2 The attack described above can easily be extended to a multi-linear approximation attack.
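The effect of a probabilistic selection function on the spike can be checked with a small simulation. The Python sketch below is a toy model under explicit assumptions of ours: a single leaking sample equal to `leak` times the target bit plus Gaussian noise, a uniformly random target bit, and selection errors that are independent of the data. Under these assumptions the expected spike height is $2\epsilon$ times the height obtained with an exact selection function.

```python
import random


def spike_amplitude(num_traces, eps, leak=1.0, noise_sigma=0.5, seed=0):
    """Difference of means at the leaking sample when the selection
    function predicts the target bit correctly with probability 1/2 + eps."""
    rng = random.Random(seed)
    sum_one = sum_zero = 0.0
    count_one = count_zero = 0
    for _ in range(num_traces):
        bit = rng.getrandbits(1)
        sample = leak * bit + rng.gauss(0.0, noise_sigma)
        # The biased selection function is wrong with probability 1/2 - eps.
        guess = bit if rng.random() < 0.5 + eps else 1 - bit
        if guess:
            sum_one += sample
            count_one += 1
        else:
            sum_zero += sample
            count_zero += 1
    return sum_one / count_one - sum_zero / count_zero


exact = spike_amplitude(20000, 0.5)     # exact selection: spike close to leak
biased = spike_amplitude(20000, 0.25)   # biased selection: spike close to 2 * 0.25 * leak
```

The simulation only illustrates the reduction of the spike amplitude; the number of traces needed to distinguish the reduced spike from the noise depends on the noise level of the actual measurement setup.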
Second approach: an HD and HW models approach. An interesting way to use linear approximations is to directly approximate the Hamming weight of a register, since this is the quantity most correlated with what is being measured. Thanks to the work of Tavernier et al. [15], it is possible to find linear approximations of $\langle H(C(P, K)), \Gamma_H \rangle$ for any chosen vector $\Gamma_H$ ($\Gamma_H$ is a vector of length $\log_2(|C|)$, with respect to the notations of section 4.1). If we assume that the actual value of the measurement sample $T_i[j]$ is closely related to the Hamming weight of the manipulated data (for the HW model) or of the difference between two successively manipulated data values (for the HD model), then the use of linear approximations of the Hamming weight of a register (or a bus) would lead to very efficient attacks (this assumption is discussed later, in section 4.3.2). This important remark is the origin of the new MLPA attacks, which should prove much more dangerous than the previous DPA-like approach.
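The quantity $\langle H(u), \Gamma_H \rangle$ is simply a parity of selected bits of the binary expansion of the Hamming weight. The Python sketch below illustrates it; the function names are ours, and register values and masks are assumed to be given as non-negative integers.

```python
def hamming_weight(value):
    """Number of bits set to 1 in value."""
    return bin(value).count("1")


def hw_inner_product(register_value, gamma_h):
    """<H(u), Gamma_H>: parity of the bits of the Hamming weight of the
    register value that are selected by the mask gamma_h."""
    return hamming_weight(hamming_weight(register_value) & gamma_h) & 1


# A 64-bit register with 33 bits set: H(u) = 33 = 0b100001.
u = (1 << 33) - 1
bit5 = hw_inner_product(u, 0x20)   # selects bit 5 of the Hamming weight
bit4 = hw_inner_product(u, 0x10)   # selects bit 4 of the Hamming weight
```

For a 64-bit register, the mask `0x20` therefore tells whether the Hamming weight lies in the range 32 to 63 or not, which is a property that a power measurement can plausibly reveal.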
The MLPA attack
As introduced in the previous section, the LPA attack is based on the HW and HD models. If we assume that these models are relevant, then multi-linear approximations can be used at their full strength. As presented in [8, 15] in the context of classical multi-linear cryptanalysis, one can consider the recovery of some key bits as the decoding problem of a code whose length is equal to the number of available linear relations, over a memoryless channel whose capacity depends on the respective biases of the linear approximations. Let us consider a set of $n$ linear relations of biases $\epsilon_l$, $l = 1, \cdots, n$, of the following form:
where the vectors $\kappa_l$, $l = 1, \cdots, n$, are such that a limited number $k$ of key bits are involved in the equations (in practice fewer than 32 bits) and form a matrix of rank $k$. The idea is to reconstruct a codeword $y$ of length $2^k$ from a noisy and erased codeword $\tilde{y}$ which is close enough to $y$ to be decoded in the first-order Reed-Muller code.
Attack algorithm
After observing $N$ encryptions and selecting, in each trace $T_i$, $i = 1, \cdots, N$, the sample $j$ where the target intermediate data bits are manipulated, the attack proceeds as follows:
1. For each linear approximation and each "plaintext-$T_i[j]$" pair (for the HD model it would be each "plaintext pair-$T_i[j]$ pair"), compute the predicted value of $\langle K, \kappa_l \rangle_i$ using the right-hand side of equation 2 (which would be "
2. For each linear approximation, separate the traces into two sets $S_0^l$ and $S_1^l$ for which $\langle K, \kappa_l \rangle$ has been evaluated to 0 and 1 respectively.
3. Construct the noisy and erased codeword $\tilde{y}$ such that the value of $\tilde{y}$ at position $x_l = \kappa_l$ ($\kappa_l$ is seen here as its value in ). The positions where no linear approximation is defined are set to zero, and are thus treated as erasure positions.
4. Decode $\tilde{y}$ in the first-order Reed-Muller code, i.e. the most probable codeword $y$ is the one that maximises the inner product $\sum_{x \in \{0,1\}^t} (-1)^{y(x)} \tilde{y}(x)$. The Fast Fourier Transform does the trick with a time complexity of $O(k 2^k)$ and a data complexity of $O(2^k)$.
For details on the efficiency of Reed-Muller decoding over a Gaussian and erasure channel, the interested reader should refer to the results of I. Dumer and R. Krichevskiy in [7].
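The final decoding step of the algorithm above can be sketched with a fast Walsh-Hadamard transform, which is the form the Fourier transform takes over $\{0,1\}^k$. The Python sketch below is an illustration under conventions of our own: each key mask $\kappa_l$ is encoded as an integer below $2^k$, the received word holds $+1$ where $\langle K, \kappa_l \rangle$ was estimated to be 0, $-1$ where it was estimated to be 1, and $0$ at erased positions, and only the linear codewords $y(x) = \langle a, x \rangle$ are considered. The secret value, the masks and the single wrong estimate are made-up test data.

```python
def fwht(values):
    """Fast Walsh-Hadamard transform.
    Returns W with W[a] = sum over x of (-1)^<a,x> * values[x]."""
    w = list(values)
    h = 1
    while h < len(w):
        for start in range(0, len(w), 2 * h):
            for i in range(start, start + h):
                low, high = w[i], w[i + h]
                w[i], w[i + h] = low + high, low - high
        h *= 2
    return w


def decode_key_bits(k, estimates):
    """estimates maps a key mask (int < 2**k) to +1 or -1 as described
    above. Returns the k key bits, packed in an integer, whose codeword
    has the largest inner product with the received word."""
    received = [0] * (1 << k)            # 0 marks an erased position
    for kappa, sign in estimates.items():
        received[kappa] = sign
    spectrum = fwht(received)
    return max(range(1 << k), key=lambda a: spectrum[a])


def parity(value):
    return bin(value).count("1") & 1


# Made-up example: 6 key bits, 11 linear relations, one wrong estimate.
SECRET = 0b101101
MASKS = [1, 2, 4, 8, 16, 32, 3, 12, 48, 21, 42]
estimates = {m: (-1) ** parity(SECRET & m) for m in MASKS}
estimates[3] = -estimates[3]             # simulate one erroneous relation
recovered = decode_key_bits(6, estimates)
```

In a real attack the entries of the received word would rather be soft values reflecting the reliability of each relation; the hard $\pm 1$ values used here are a simplification.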
Practical setup
The attack presented above may seem completely unrealistic, since it directly uses the measured value as the Hamming weight of the manipulated data, which seems to contradict the remark made in section 2.2 on the use of absolute measurement values. Two practical setups seem possible to get around this:
• First of all, let us assume that the targeted device can be run with chosen plaintexts. Under this hypothesis, it is possible to attack by re-initializing the registers before each encryption (resetting the registers amounts to running a set of fixed plaintexts until the device is in the same state before each encryption). Then, using simple pretesting on the board, it would be possible to relate the consumption traces to the targeted quantities, modelled as following a Gaussian law.
• For a more practical attack, assuming that we have access to a twin device into which we can load arbitrarily chosen keys, it would be possible to run the algorithm that searches for linear approximations directly on the twin device, as a pre-processing phase of our attack. As the algorithm treats the Boolean function as a black box, using the consumption measurement as the output value of the Boolean function might make the attack even more efficient than in the model presented above. Furthermore, it then becomes possible to mount attacks on unknown ciphers, since no knowledge of the symmetric cipher is needed except for its SPN structure (the hardware device is seen as a black box whose outputs are the consumption leakages).
Results
This section presents the results obtained by applying the attacks described above to the DES and AES ciphers. There are two sets of results. The first ones are called simulations and can be seen as the validation of our attack in the theoretical model. The second set of experiments was carried out on real power traces and validates the practical feasibility of the attack. Table 1 and Table 2 summarize some of the results. In these tables, "# linear equ." refers to the total number of linear approximations found for the attack (not all of them were useful), "# Plaintext" or "# Traces" refers to the data complexity of the attack, and "Pr(Success)" refers to the probability of success of the attack in simulation. Table 1 summarizes our results (with respect to the HW and HD models). They show that a glued block of three rounds for a DES version of the brutal countermeasure would not be enough. The simulation assumes that the leakage of the cipher implementation gives the Hamming weight of the targeted data. Hence, in the HW model, the linear approximations evaluate the Hamming weight of the round register (assuming that there is a register after a glued block of 1, 2 or 3 rounds); in the HD model, the linear approximations evaluate the Hamming weight of the difference of the round register between two executions (two different plaintexts). Let us note here that, in a chosen-plaintext attack, the HW model results correspond to an HD model. It is important to note that no linear approximation has been found for the first round in the HD model, as if no information leaked from the Hamming weight of the manipulated data. The attack on AES was carried out on the last round, since it does not contain the MixColumn diffusion function.
Attack on DPA-contest traces. Thanks to the DPA contest, power consumption traces are freely available. As we were unable to obtain and set up a hardware device ourselves, these traces allowed us to try our attack on real power traces and thus to prove its feasibility in a real setup. The attack was launched on the contest traces (secmatv1 2006 04 0809), a set of about 80000 power consumption traces. The linear approximations evaluate the Hamming weight of the difference of the data stored in the implementation register (LR) (see [10] for more details on the DES implementation); the attack description and setup can be found in the annex of this document.
Conclusion and future work
The results shown in section 4.3.3 prove the feasibility of MLPA attacks. It is our belief that this set of attacks is a starting point for new results on power analysis attacks against embedded symmetric ciphers. The next steps will be of two kinds:
• The search for better linear approximations, in terms of bias, that can approximate more rounds of the symmetric cipher. This requires computation time that we did not have while writing this document.
• Experiments on an unknown cipher implementation, with the search for linear approximations carried out directly on the board. This may lead to very efficient attacks, since it directly approximates the leakage function without using any consumption model.
Annex: Setup of the attack on the DPA-contest traces
This annex describes an MLPA attack on power traces found on the DPA contest website: http://www.dpacontest.org/. The traces used for our attack are stored under the name secmatv1 2006 04 0809; the set contains 81089 power traces measured on a straightforward DES implementation detailed in [10].
The implementation is described in Figure 2 (from [10]). Let us denote by $H(X)$ the Hamming weight function, by $IP(X)$ the initial permutation of the DES cipher and by $DES_n(X, K)$ the first $n$ rounds of the DES encryption of a 64-bit vector $X$ under an $(n \times 64)$-bit $K$. The power measurement samples we are interested in are the ones corresponding to the loading of the register LR after rounds 1 and 2. According to the Hamming distance model, they should correspond to $H(IP(X) \oplus DES_1(X, K))$ (noted $C_1(X, K)$) and $H(DES_1(X, K) \oplus DES_2(X, K))$ (noted $C_2(X, K)$) respectively. The sample indexes were found by simply simulating a DPA attack on the first round and on the second round (using the first round key). It is our belief that this information could have been found by an attacker using simple timing measurements; in any case, it is a hypothesis of the MLPA attack that this information is known. The loading of register LR after the first round (respectively the second round) was found to correspond to the 5749th (respectively the 6374th) sample of the power traces.
Figure 2: Schematic of the DES implementation
Linear approximations have been generated for $\langle C_i(P, K), \Gamma_H \rangle$, $i \in \{1, 2\}$. Only the ones where $\Gamma_H$ equals 0x10 or 0x20 were kept. Table 3 gives an example of 11 of these approximations for the second round ($C_2$). Over these 11 equations, only 6 key bits are involved ($K[j]$ is the $j$-th bit of the secret key). The last thing we need in order to apply the MLPA algorithm is a way to tell the value of $\langle C_i(P, K), \Gamma_H \rangle$, $i \in \{1, 2\}$, from the consumption measurement at the selected sample. This is why, to simplify the attack, we only select the output mask $\Gamma_H$ to be 0x10 or 0x20: we then just have to separate the traces into two sets, $S_1$ containing the traces whose power measure is greater than the average power measure and $S_0$ containing the others, assuming that $\langle C_i(P, K), \text{0x20} \rangle = 1$ for the power traces in $S_1$ and $\langle C_i(P, K), \text{0x20} \rangle = 0$ for the others. We likewise assume that $\langle C_i(P, K), \text{0x10} \rangle = 0$ for the power traces in $S_1$ and $\langle C_i(P, K), \text{0x10} \rangle = 1$ for the others, since there is very little chance of having $C_i(P, K) < \text{0x10}$ or $C_i(P, K) \geq \text{0x30}$ with random plaintexts. With this setup, and considering only these 11 equations, the 6 key bits are retrieved from the first 2000 traces.
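The partitioning rule used in this annex can be sketched in Python as follows. The function name and the toy sample values are ours; the sketch only encodes the stated assumption that a measure above the average indicates a Hamming distance in the range 0x20 to 0x2F, and a measure below the average a Hamming distance in the range 0x10 to 0x1F.

```python
def predict_hw_bits(samples):
    """Split the measures taken at the selected sample index around their
    mean. Returns two lists: the predicted values of <C_i, 0x20> and of
    <C_i, 0x10> for each trace, assuming 0x10 <= C_i < 0x30."""
    mean = sum(samples) / len(samples)
    bit5 = [1 if s > mean else 0 for s in samples]   # trace in S_1 iff 1
    bit4 = [1 - b for b in bit5]                     # complement in this range
    return bit5, bit4


# Toy measures at the selected sample (arbitrary units, made-up values).
pred_0x20, pred_0x10 = predict_hw_bits([1.0, 3.0, 2.5, 0.5])
```

The complement relation between the two masks holds because, for any value between 0x10 and 0x2F, exactly one of bits 4 and 5 is set.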
