Abstract. In this paper we present an enhanced Differential Fault Attack that can be applied to the AES using a single fault. We demonstrate that when a single random byte fault is induced at the input of the eighth round, the AES key can be deduced using a two-stage algorithm. The first step would be expected to reduce the number of possible key hypotheses to $2^{32}$, and the second step to a mere $2^{8}$. Furthermore, we show that, with certain faults, this can be further reduced to two key hypotheses.
Introduction
The Advanced Encryption Standard (AES) [18] has been a standard for symmetric key cryptography since October 2000. Smart cards and secure microprocessors, therefore, often include implementations of AES for cryptographic purposes. This can take the form of either VLSI modules (i.e. cryptographic accelerators) or embedded software routines.
Fault attacks are attacks in which an adversary intentionally induces a fault in a device in order to derive information on cryptographic keys. The first example of such an attack to appear in the literature was published in 1996 by Boneh, DeMillo and Lipton [6, 7]. This fault attack was applicable to public key cryptosystems, specifically RSA [22] when computed using the Chinese Remainder Theorem. Subsequently, the idea of analysing faults in an implementation of a cryptographic algorithm was applied to block ciphers, such as DES [17]. This technique is referred to as Differential Fault Analysis (DFA) [4]. In both cases the attack functions by comparing a correct result of an algorithm with the result where a fault has been injected.
DFA has evolved into a very strong and effective form of attack, and countermeasures need to be included in devices, such as smart cards, where it has been demonstrated that it is possible to induce a fault. The reported work on inducing faults, such as the optical fault induction reported in [24], has motivated research in the field of fault-based cryptanalysis of AES. Other methods for injecting faults include variations in the power supply to create a glitch or spike [5], changing characteristics of the supplied clock [2], laser light [3] and eddy currents [23].
Several fault attacks that apply DFA to the AES have been reported in the literature, and most attacks exploit the properties of the encryption function. In [10], Giraud describes a DFA on AES where faults are considered to affect one byte during the computation of the ninth round of AES, and which would require 250 faulty ciphertexts. An attack described by Blömer and Seifert [5] extends this to allow an attacker to recover a secret key with 128 to 256 faulty ciphertexts. In [9], Dusart et al. show that an attacker would be able to derive a secret key using 40 faulty ciphertexts, if it would be possible to induce a fault that affects a byte anywhere between the computation of the eighth and ninth round MixColumn. In [21], Piret and Quisquater describe an attack where two faulty ciphertexts are required, by injecting faults at the input to the eighth or ninth round. In [15], the authors present a fault attack on AES where a fault is considered to be induced in a 32-bit word of AES in the ninth round, and propose two models for fault induction. In the first model, they assume that at least one of the four targeted bytes is uncorrupted. In the second model, they assume that all four bytes are corrupted. The former fault model would require six faulty ciphertexts to derive a secret key, while the latter model would require around 1500 faulty ciphertexts to derive a key. Other authors have considered faults in the key schedule [25, 26], where the most recent publication has demonstrated that the secret key could also be derived with two faulty ciphertexts [11].
We can note that when the assumptions are on the value of a byte (either it being faulty or uncorrupted) the number of faulty pairs is quite small. However, it is difficult to be able to insert a given value with any certainty. When numerous faulty ciphertexts are required the same problems still exist, since an attacker needs to find a method of determining which faulty ciphertexts correspond to the desired model. We can, therefore, state that the most efficient attacks are those that require the fewest faulty ciphertexts and the fewest assumptions on the effect of a fault.
In [16] a fault attack against AES was proposed, which suggested that the AES secret key can be derived using a single byte fault induction at the input of the eighth round. The attack exploited the inter-relations between the fault values in the state matrix after the ninth round MixColumn operation and reduced the AES-128 secret key space to around $2^{32}$, which would require a somewhat lengthy exhaustive search to determine a secret key. In this paper, we describe an extended version of this attack where the exhaustive search is reduced to $2^{8}$. We also show that, with certain assumptions on the fault induced, the number of key hypotheses can be reduced to two.
Notation
When referring to operations when analysing AES, multiplications are polynomial multiplications over $\mathbb{F}_{2^8}$ modulo the irreducible polynomial $x^8 + x^4 + x^3 + x + 1$. It should be clear from the context when a mathematical expression contains integer multiplication.
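The multiplication defined above can be illustrated with a minimal Python sketch. The function name `gf_mul` is our own choice for illustration; the reduction constant 0x11B is simply the bit pattern of $x^8 + x^4 + x^3 + x + 1$.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes as polynomials over F_2, reduced modulo
    x^8 + x^4 + x^3 + x + 1 (bit pattern 0x11B)."""
    result = 0
    while b:
        if b & 1:        # lowest bit of b set: add (XOR) the current multiple of a
            result ^= a
        a <<= 1          # multiply a by x
        if a & 0x100:    # degree 8 reached: reduce by the irreducible polynomial
            a ^= 0x11B
        b >>= 1
    return result


# Worked example from the AES specification: {57} * {83} = {c1}
print(hex(gf_mul(0x57, 0x83)))  # 0xc1
```

Multiplications by 2 and 3, which appear throughout the fault equations, are the special cases `gf_mul(2, x)` and `gf_mul(3, x)`.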
Organisation
The paper is organised as follows: In Section 2 we describe the background to this paper. In Section 3 we describe an attack based on one of the fault models given in Section 2. In Section 4 we extend this attack by using a different model. In Section 5 we compare the attacks described in this paper with previous work, and we conclude in Section 6.
Background

The Advanced Encryption Standard
Require: The 128-bit plaintext block P and key K.
Ensure: The 128-bit ciphertext block C.
  X ← AddRoundKey(P, K)
  for i ← 1 to 10 do
    X ← SubBytes(X)
    X ← ShiftRows(X)
    if i ≠ 10 then
      X ← MixColumns(X)
    end if
    K ← KeySchedule(K)
    X ← AddRoundKey(X, K)
  end for
  C ← X
  return C

The structure of the Advanced Encryption Standard (AES) [18], as used to perform encryption, is illustrated in Figure 1. Note that we restrict ourselves to considering 128-bit AES and that the description above omits a permutation typically used to convert the plaintext P = (p_1, p …

- The ShiftRows function is a byte-wise permutation of the state.
- The SubBytes function is the only non-linear step of the block cipher. It is a bricklayer permutation consisting of an S-box applied to the bytes of the state. Each byte of the state matrix is replaced by its multiplicative inverse, followed by an affine mapping. Thus the input byte $x$ is related to the output $y$ of the S-box by the relation $y = A \cdot x^{-1} + B$, where A and B are constant matrices. In the remainder of this paper we will refer to the function $S$ as the SubBytes function and $S^{-1}$ as the inverse of the SubBytes function.
- The KeySchedule function generates the next round key from the previous one. The first round key is the input key with no changes; subsequent round keys are generated using the SubBytes function and XOR operations.
- The MixColumn function is a bricklayer permutation operating on the state column by column. Each column of the state matrix is considered as a 4-dimensional vector where each element belongs to $\mathbb{F}_{2^8}$. A 4×4 matrix M, whose elements are also in $\mathbb{F}_{2^8}$, is used to map this column into a new vector. This operation is applied to all four columns of the state matrix. Here M is defined as:

$$M = \begin{pmatrix} 2 & 3 & 1 & 1 \\ 1 & 2 & 3 & 1 \\ 1 & 1 & 2 & 3 \\ 3 & 1 & 1 & 2 \end{pmatrix}$$

  The inverse of the MixColumn can be represented by the matrix:

$$M^{-1} = \begin{pmatrix} 14 & 11 & 13 & 9 \\ 9 & 14 & 11 & 13 \\ 13 & 9 & 14 & 11 \\ 11 & 13 & 9 & 14 \end{pmatrix}$$

  As with the MixColumn operation, all elements of the InverseMixColumn matrix are elements of $\mathbb{F}_{2^8}$.
- AddRoundKey: Each byte of the array is XORed with a byte from a corresponding array of round subkeys.
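The SubBytes and MixColumn descriptions above can be checked with a short Python sketch. It assumes the standard AES constants: the affine map is written as an XOR of four bit rotations plus the constant 0x63, and M is the standard MixColumn matrix. All function names are our own and purely illustrative.

```python
def gf_mul(a, b):
    """Multiply in F_2^8 modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def build_sbox():
    """S(x) = affine(x^-1), with 0 mapped to 0 before the affine step."""
    sbox = []
    for x in range(256):
        inv = next((y for y in range(1, 256) if gf_mul(x, y) == 1), 0)
        y = inv
        for s in range(1, 5):  # affine map: XOR of the byte rotated left by 1..4
            y ^= ((inv << s) | (inv >> (8 - s))) & 0xFF
        sbox.append(y ^ 0x63)
    return sbox


SBOX = build_sbox()
INV_SBOX = [SBOX.index(v) for v in range(256)]

M = [[2, 3, 1, 1], [1, 2, 3, 1], [1, 1, 2, 3], [3, 1, 1, 2]]
M_INV = [[14, 11, 13, 9], [9, 14, 11, 13], [13, 9, 14, 11], [11, 13, 9, 14]]


def apply_matrix(matrix, column):
    """Multiply a 4x4 matrix by one state column over F_2^8."""
    out = []
    for row in matrix:
        acc = 0
        for coeff, byte in zip(row, column):
            acc ^= gf_mul(coeff, byte)
        out.append(acc)
    return out


column = [0xDB, 0x13, 0x53, 0x45]
mixed = apply_matrix(M, column)
print([hex(b) for b in mixed])               # ['0x8e', '0x4d', '0xa1', '0xbc']
print(apply_matrix(M_INV, mixed) == column)  # True
print(hex(SBOX[0x53]))                       # 0xed
```

Building the S-box by brute-force inversion is slow compared with a lookup table, but it keeps the sketch free of hard-coded constants.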
The 128-bit AES algorithm operates on a 128-bit plaintext block and has 10 rounds. Each round key is derived from a 128-bit secret key using a key-scheduling algorithm. The scheduling algorithm to generate the $r$-th round key from the $(r-1)$-th round key is described in Figure 2. The key scheduling algorithm also uses a round constant, denoted by $h_r$ for the $r$-th round, that is used to eliminate symmetries in the round keys. The symbol << is used to denote a bitwise left shift.
Require: $(r-1)$-th round key (X = x_i for i ∈ {1, ..., 16}).
Ensure: $r$-th round key X.
  for i ← 0 to 3 do
    ...
    end if
  end for
  return X
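One step of the key schedule, and its inverse, can be sketched in Python as follows. This is an illustration under stated assumptions rather than a transcription of the algorithm above: it uses the byte ordering of the AES specification (key byte at row r, column c stored at index r + 4c), which may differ from the indexing $x_1, \ldots, x_{16}$ used in this paper, and the function names are hypothetical. The round constant $h_r$ is computed as $x^{r-1}$ in $\mathbb{F}_{2^8}$.

```python
def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def build_sbox():
    sbox = []
    for x in range(256):
        inv = next((y for y in range(1, 256) if gf_mul(x, y) == 1), 0)
        y = inv
        for s in range(1, 5):
            y ^= ((inv << s) | (inv >> (8 - s))) & 0xFF
        sbox.append(y ^ 0x63)
    return sbox


SBOX = build_sbox()


def round_constant(r):
    """h_r = x^(r-1) in F_2^8, e.g. h_1 = 0x01 and h_10 = 0x36."""
    value = 1
    for _ in range(r - 1):
        value = gf_mul(value, 2)
    return value


def _transform(last_column, rcon):
    # Rotate the column by one byte, apply the S-box, add the round constant.
    a, b, c, d = last_column
    return [SBOX[b] ^ rcon, SBOX[c], SBOX[d], SBOX[a]]


def next_round_key(key, rcon):
    """Derive round key r from round key r-1 (16 bytes, column-major)."""
    out = []
    for col in range(4):
        prev = _transform(key[12:16], rcon) if col == 0 else out[4 * (col - 1):4 * col]
        out.extend(key[4 * col + row] ^ prev[row] for row in range(4))
    return out


def previous_round_key(key, rcon):
    """Invert one step: derive round key r-1 from round key r."""
    out = [0] * 16
    for i in range(4, 16):              # columns 1..3 are XORs of adjacent columns
        out[i] = key[i] ^ key[i - 4]
    temp = _transform(out[12:16], rcon)  # uses the recovered last column
    for row in range(4):
        out[row] = key[row] ^ temp[row]
    return out


# Key expansion example from the AES specification.
round_keys = [list(bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c"))]
for r in range(1, 11):
    round_keys.append(next_round_key(round_keys[-1], round_constant(r)))
print(bytes(round_keys[10]).hex())  # d014f9a8c9ee2589e13f0cc8b6630ca6
print(previous_round_key(round_keys[10], round_constant(10)) == round_keys[9])  # True
```

The inverse step is what allows the ninth round key to be written in terms of the tenth round key later in the analysis.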
The Fault Model
The implementation of AES we target is an iterative one, as described in [1] . The literature shows that unrolled or pipelined designs of AES are unpopular because they do not allow a block cipher to operate in Output Feedback Mode (OFB) or Cipher Block Chaining (CBC) mode [14] .
Since designs are typically synchronous, an attacker can determine at what point in time certain events are taking place, e.g. when a particular round commences. Moreover, the time certain events take place can often be determined by analysing a suitable side channel. For example, the power consumption of an FPGA or microprocessor has been shown to reveal the details of an implementation [13, 20].
In this paper we consider two different fault models, that we use to build a method for Differential Fault Analysis of AES.
Random Effect on One Byte
The first fault model that we consider is the same as that used in many other papers, for example [16] , where we assume that the effect of an induced fault is to change one byte to a random value.
For example, an attacker could attempt to use a glitch in the clock to create a fault at the input of a particular round with a certain probability. An iterative design helps in this regard, as an attacker is able to control the timing of fault induction by simply counting the number of clock edges from the start of the instantiation of a cryptographic algorithm. Also, it may be noted that our experiments show that the registers internal to the FPGA device take a certain amount of time to change their value to the next correct value. During the migration, there is a certain amount of time where, if the clock terminates too soon, the correct value will not be written to the registers. This effect applied to a microprocessor is described in [2] .
Fixing a Byte Value
The second fault model that we consider is where an instruction (i.e. an opcode) is skipped in a process, and the potential effects this can have on an implementation of AES. Specifically, we consider the case where skipping an instruction means that one byte becomes a known, or predictable, value.
It has been shown that a glitch in the clock or voltage applied to a microprocessor can be used to make the value returned from one specific instruction constant [2] or skip an instruction [8] . In [8] , an attack is described where the round counter of AES is modified to reduce the number of rounds to one. In our case we will consider a more subtle effect where the loop implementing one round function is terminated early so that one byte of the current state matrix is not overwritten. In this case the value is unknown, but can be computed for a hypothesised key.
The Fault Analysis
In this section we define the strategy to perform a differential fault analysis, where we assume that an adversary has induced a fault in a byte of the input to the eighth round. The first step of the fault attack is equivalent to the analysis described in [16], and is extended in the second step. We also assume that the fault corresponds to the first fault model discussed in the previous section, where this byte becomes some random and unknown value.
The First Step of the Fault Attack
If a fault is induced in a byte of the state matrix, which is input to the eighth round, the MixColumn operation at the end of the round propagates this fault to the entire column. The ShiftRow operation at the beginning of the following round will then shift these bytes to occupy different columns. The next MixColumn operation will then propagate the fault to the remaining twelve bytes.
This process is shown in Figure 3, where we show the diffusion of a byte fault induced at the input of the eighth round. The XOR difference of the state matrices of the two results, one fault free and the other faulty, is shown. This is what we will use as the basis for a differential fault analysis.
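The diffusion described above can be reproduced with a small Python sketch that tracks only which bytes of the XOR difference are non-zero. It relies on two facts: SubBytes and AddRoundKey never change which bytes differ, and a column with exactly one differing byte becomes fully differing after MixColumn, because every entry of M is non-zero. The function names and the (row, column) layout are our own.

```python
def shift_rows(active):
    # Row r is rotated left by r positions.
    return [[active[r][(c + r) % 4] for c in range(4)] for r in range(4)]


def mix_columns(active):
    # A column containing a differing byte is marked as fully differing.
    # This is exact when the column holds a single differing byte, which is
    # the only case that arises in this propagation.
    out = [[False] * 4 for _ in range(4)]
    for c in range(4):
        if any(active[r][c] for r in range(4)):
            for r in range(4):
                out[r][c] = True
    return out


def count(active):
    return sum(sum(row) for row in active)


# A fault in the first byte of the state at the input of the eighth round.
state = [[False] * 4 for _ in range(4)]
state[0][0] = True
history = [count(state)]

for _ in (8, 9):  # rounds eight and nine; SubBytes/AddRoundKey leave the pattern unchanged
    state = mix_columns(shift_rows(state))
    history.append(count(state))

print(history)  # [1, 4, 16]
```

The tenth round has no MixColumn, so all sixteen ciphertext bytes differ between the fault-free and faulty results.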
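If we define the two ciphertexts $CT$ and $CT'$ produced from the same plaintext, fault free and faulty respectively, they can be represented by

$$CT = \begin{pmatrix} x_1 & x_2 & x_3 & x_4 \\ x_5 & x_6 & x_7 & x_8 \\ x_9 & x_{10} & x_{11} & x_{12} \\ x_{13} & x_{14} & x_{15} & x_{16} \end{pmatrix} \quad \text{and} \quad CT' = \begin{pmatrix} x'_1 & x'_2 & x'_3 & x'_4 \\ x'_5 & x'_6 & x'_7 & x'_8 \\ x'_9 & x'_{10} & x'_{11} & x'_{12} \\ x'_{13} & x'_{14} & x'_{15} & x'_{16} \end{pmatrix}$$

where $x_i$ and $x'_i$, for $i \in \{1, \ldots, 16\}$, are all in $\{0, \ldots, 255\}$, i.e. one byte. We also define the key matrix for the subkey used in the tenth round as:

$$K_{10} = \begin{pmatrix} k_1 & k_2 & k_3 & k_4 \\ k_5 & k_6 & k_7 & k_8 \\ k_9 & k_{10} & k_{11} & k_{12} \\ k_{13} & k_{14} & k_{15} & k_{16} \end{pmatrix}$$

where each $k_i$, for $i \in \{1, \ldots, 16\}$, is in $\{0, \ldots, 255\}$.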
If we consider the state of the XOR difference between the values required to compute $CT$ and $CT'$ after the ninth round shift row, we can derive the following set of equations. These include the values of the key bytes $k_1$, $k_8$, $k_{11}$ and $k_{14}$, thus giving an expression for 32 bits of $K_{10}$.
Here $k_1$, $k_8$, $k_{11}$ and $k_{14}$ are all unknown values in $\{0, \ldots, 255\}$, and $\delta_1$ is an unknown value in $\{1, \ldots, 255\}$, i.e. if $\delta_1$ is equal to zero then no fault that corresponds to our model has occurred. The above system of equations can be used to reduce the possibilities for these 32 bits of the key. An attacker could select a value for $\delta_1$ and determine which values of $k_1$, $k_8$, $k_{11}$ and $k_{14}$ satisfy all the equations using four independent exhaustive searches. Each equation will return 0, 2, or 4 hypotheses [19]. If any of the four equations cannot be satisfied, i.e. there is an impossible differential [12], then any hypotheses for that value of $\delta_1$ can be discarded.
If we consider the first equation in the above set:

$$2\,\delta_1 = S^{-1}(x_1 \oplus k_1) \oplus S^{-1}(x'_1 \oplus k_1)$$

We know the values of $x_1$ and $x'_1$ from the correct and faulty ciphertexts respectively. For a given value of $2\,\delta_1$ there will be 0, 2 or 4 valid key hypotheses. The mean number of hypotheses over all $\delta_1 \in \{1, \ldots, 255\}$ is approximately one, giving 256 expected key hypotheses when all $\delta_1 \in \{1, \ldots, 255\}$ are considered.
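The counting argument for a single equation can be checked with a Python sketch. The ciphertext byte values below are arbitrary placeholders chosen for illustration, not data from an experiment, and the helper names are our own.

```python
def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def build_sbox():
    sbox = []
    for x in range(256):
        inv = next((y for y in range(1, 256) if gf_mul(x, y) == 1), 0)
        y = inv
        for s in range(1, 5):
            y ^= ((inv << s) | (inv >> (8 - s))) & 0xFF
        sbox.append(y ^ 0x63)
    return sbox


SBOX = build_sbox()
INV_SBOX = [SBOX.index(v) for v in range(256)]


def candidates(x, x_faulty, coeff, delta):
    """Key bytes k with coeff * delta = S^-1(x ^ k) ^ S^-1(x_faulty ^ k)."""
    target = gf_mul(coeff, delta)
    return [k for k in range(256)
            if INV_SBOX[x ^ k] ^ INV_SBOX[x_faulty ^ k] == target]


x1, x1_faulty = 0x3A, 0xC5  # placeholder ciphertext bytes (must differ)
counts = [len(candidates(x1, x1_faulty, 2, d)) for d in range(1, 256)]

print(sorted(set(counts)))  # [0, 2, 4]
print(sum(counts))          # 256
```

Every key byte is consistent with exactly one value of $\delta_1$, which is why the hypotheses sum to 256 over all 255 possible fault values.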
The same can be said for each of the four equations in the set given above. However, for a given value of $\delta_1$ each of the four equations would be expected to return approximately one hypothesis for a key byte. These values will give one hypothesis for the quartet of key bytes $\{k_1, k_8, k_{11}, k_{14}\}$. Given that an attacker will have to take into account all the values in $\{0, \ldots, 255\}$, there will be 256 expected possible values for the quartet $\{k_1, k_8, k_{11}, k_{14}\}$.
Information on the remaining key bytes can be derived by using the following sets of equations. In order to obtain information on $k_2$, $k_5$, $k_{12}$ and $k_{15}$ an attacker can use the following equations:
In order to obtain information on $k_3$, $k_6$, $k_9$ and $k_{16}$ an attacker can use the following equations:
Finally, in order to obtain information on $k_4$, $k_7$, $k_{10}$ and $k_{13}$ an attacker can use the following equations:
It can be noted that the four sets of four equations have an identical structure and, therefore, the expected number of valid key hypotheses is the same for each. An evaluation of each set of equations would be expected to return $2^8$ unique hypotheses for the corresponding quartet of key bytes. Therefore, an attacker would expect to have $(2^8)^4 = 2^{32}$ key hypotheses for the secret key after evaluating all of the above equations.
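The first step for one quartet of key bytes can be sketched in Python as follows. To stay independent of any byte-numbering convention, the sketch works on the four ciphertext bytes fed by one column of the state after the ninth round MixColumn, listed in row order, and assumes the fault differences in that column are $(2\delta, \delta, \delta, 3\delta)$. The ciphertext pair is simulated with a fixed random seed rather than taken from a device, and all names are hypothetical.

```python
import itertools
import random


def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def build_sbox():
    sbox = []
    for x in range(256):
        inv = next((y for y in range(1, 256) if gf_mul(x, y) == 1), 0)
        y = inv
        for s in range(1, 5):
            y ^= ((inv << s) | (inv >> (8 - s))) & 0xFF
        sbox.append(y ^ 0x63)
    return sbox


SBOX = build_sbox()
INV_SBOX = [SBOX.index(v) for v in range(256)]
COEFFS = (2, 1, 1, 3)  # assumed fault multipliers for rows 0..3 of the column


def first_step(ct, ct_faulty):
    """Return every key quartet consistent with some non-zero delta."""
    hypotheses = []
    for delta in range(1, 256):
        per_byte = []
        for row in range(4):
            target = gf_mul(COEFFS[row], delta)
            per_byte.append([k for k in range(256)
                             if INV_SBOX[ct[row] ^ k] ^ INV_SBOX[ct_faulty[row] ^ k] == target])
        # An empty list for any byte (an impossible differential) yields no quartets.
        hypotheses.extend(itertools.product(*per_byte))
    return hypotheses


# Simulate the last round for one column: SubBytes followed by AddRoundKey
# (ShiftRows only relocates the bytes, so it is omitted here).
rng = random.Random(1)
key = [rng.randrange(256) for _ in range(4)]
state = [rng.randrange(256) for _ in range(4)]
delta = rng.randrange(1, 256)
faulty = [s ^ gf_mul(c, delta) for s, c in zip(state, COEFFS)]
ct = [SBOX[s] ^ k for s, k in zip(state, key)]
ct_faulty = [SBOX[s] ^ k for s, k in zip(faulty, key)]

quartets = first_step(ct, ct_faulty)
print(len(quartets), tuple(key) in quartets)
```

The number of surviving quartets varies from one ciphertext pair to another around the expected value of $2^8$; repeating this for all four columns gives the overall search space of roughly $2^{32}$.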
The Second Step of the Fault Attack
In order to further reduce the key hypotheses we use the relationship between the ninth round key and the tenth round key.
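The ninth round key is defined as:

$$K_9 = \begin{pmatrix} k'_1 & k'_2 & k'_3 & k'_4 \\ k'_5 & k'_6 & k'_7 & k'_8 \\ k'_9 & k'_{10} & k'_{11} & k'_{12} \\ k'_{13} & k'_{14} & k'_{15} & k'_{16} \end{pmatrix}$$

where each $k'_i$, for $i \in \{1, \ldots, 16\}$, is in $\{0, \ldots, 255\}$. Thus, considering the key-scheduling algorithm (see Algorithm 2), the ninth round key, $K_9$, generates the tenth round key, $K_{10}$. The key schedule is invertible and $K_9$ can be expressed in terms of elements of $K_{10}$. The value of $K_9$ can therefore be expressed as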
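Again we denote the fault-free ciphertext as:

$$CT = \begin{pmatrix} x_1 & x_2 & x_3 & x_4 \\ x_5 & x_6 & x_7 & x_8 \\ x_9 & x_{10} & x_{11} & x_{12} \\ x_{13} & x_{14} & x_{15} & x_{16} \end{pmatrix}$$

and the faulty ciphertext as:

$$CT' = \begin{pmatrix} x'_1 & x'_2 & x'_3 & x'_4 \\ x'_5 & x'_6 & x'_7 & x'_8 \\ x'_9 & x'_{10} & x'_{11} & x'_{12} \\ x'_{13} & x'_{14} & x'_{15} & x'_{16} \end{pmatrix}$$

We trace each of the ciphertexts back to the input of the ninth round SubByte (see Figure 3). The state matrix after AddRoundKey, InverseShiftRow and InverseSubBytes becomes

$$\begin{pmatrix} y_1 & y_2 & y_3 & y_4 \\ y_5 & y_6 & y_7 & y_8 \\ y_9 & y_{10} & y_{11} & y_{12} \\ y_{13} & y_{14} & y_{15} & y_{16} \end{pmatrix}$$

where each $y_i$, for $i \in \{1, \ldots, 16\}$, is defined as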
In the above state matrix we can express the ninth round key bytes in terms of the tenth round key bytes using the equations stated above. Similarly, we can obtain the state matrix at the output of the ninth round MixColumn for the faulty ciphertext $CT'$.
From Figure 3, we can observe that the fault value in the first column of the state matrix at the output of the eighth round MixColumn is $(2f', f', f', 3f')$, where $f'$ is a non-zero arbitrary value in $\mathbb{F}_{2^8}$. Using the InverseMixColumn operation and the above matrices, we can define the following equation:
Similarly, we can define the following equations:

The second stage of the attack is coupled with the first stage. All of the key hypotheses generated by the first stage are tested using the above equations. If a key hypothesis satisfies the equations it is stored; otherwise it is discarded.
The expected number of key hypotheses that fulfil the above equations, for a given value of $f'$, will be 256, since the equations are similar to those in Section 3.1. That is, each key hypothesis will have a probability of 0,
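The test applied in the second step can be sketched in Python as a filter on a hypothesis for the tenth round key. This is an illustration under assumptions, not the exact equations of this section: it uses the byte ordering of the AES specification (row r, column c at index r + 4c), it places the fault in the first byte of the state, and every function name is hypothetical. The ciphertext pair is simulated from the last two rounds only.

```python
import random


def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def build_sbox():
    sbox = []
    for x in range(256):
        inv = next((y for y in range(1, 256) if gf_mul(x, y) == 1), 0)
        y = inv
        for s in range(1, 5):
            y ^= ((inv << s) | (inv >> (8 - s))) & 0xFF
        sbox.append(y ^ 0x63)
    return sbox


SBOX = build_sbox()
INV_SBOX = [SBOX.index(v) for v in range(256)]
M = [[2, 3, 1, 1], [1, 2, 3, 1], [1, 1, 2, 3], [3, 1, 1, 2]]
M_INV = [[14, 11, 13, 9], [9, 14, 11, 13], [13, 9, 14, 11], [11, 13, 9, 14]]


def xor(a, b):
    return [p ^ q for p, q in zip(a, b)]


def shift_rows(s):
    return [s[r + 4 * ((c + r) % 4)] for c in range(4) for r in range(4)]


def inv_shift_rows(s):
    return [s[r + 4 * ((c - r) % 4)] for c in range(4) for r in range(4)]


def mix(s, matrix):
    out = []
    for c in range(4):
        for row in matrix:
            acc = 0
            for coeff, byte in zip(row, s[4 * c:4 * c + 4]):
                acc ^= gf_mul(coeff, byte)
            out.append(acc)
    return out


def previous_round_key(key, rcon):
    out = [0] * 16
    for i in range(4, 16):
        out[i] = key[i] ^ key[i - 4]
    t = out[12:16]
    out[:4] = xor(key[:4], [SBOX[t[1]] ^ rcon, SBOX[t[2]], SBOX[t[3]], SBOX[t[0]]])
    return out


def round9_input(ct, k10, k9):
    """Undo round ten and round nine up to the input of the ninth SubBytes."""
    y = [INV_SBOX[b] for b in inv_shift_rows(xor(ct, k10))]
    z = mix(xor(y, k9), M_INV)
    return [INV_SBOX[b] for b in inv_shift_rows(z)]


def survives_second_step(ct, ct_faulty, k10):
    k9 = previous_round_key(k10, 0x36)  # 0x36 is the tenth round constant
    d = xor(round9_input(ct, k10, k9), round9_input(ct_faulty, k10, k9))
    f = d[1]
    return f != 0 and d[:4] == [gf_mul(2, f), f, f, gf_mul(3, f)]


# Simulated pair: state w at the output of round eight, with and without the fault.
rng = random.Random(2024)
k10 = [rng.randrange(256) for _ in range(16)]
k9 = previous_round_key(k10, 0x36)
w = [rng.randrange(256) for _ in range(16)]
f = rng.randrange(1, 256)
w_faulty = xor(w, [gf_mul(2, f), f, f, gf_mul(3, f)] + [0] * 12)


def last_two_rounds(state):
    state = xor(mix(shift_rows([SBOX[b] for b in state]), M), k9)
    return xor(shift_rows([SBOX[b] for b in state]), k10)


ct, ct_faulty = last_two_rounds(w), last_two_rounds(w_faulty)
print(survives_second_step(ct, ct_faulty, k10))  # True
```

In use, the filter would be applied to each hypothesis produced by the first step. Since it imposes three independent byte relations for an unknown $f'$, an incorrect hypothesis would be expected to survive with probability of roughly $2^{-24}$, which is consistent with the reduction from $2^{32}$ to $2^8$ stated above.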
Attacking Other Bytes
In the previous sections we describe an attack where we base our differential fault analysis on the knowledge that a fault has been induced in the first byte of the state matrix. However, we can note that the analysis returns a very small number of hypotheses. We could, therefore, conduct 16 independent equivalent analyses under the assumption that a fault is induced in one of the 16 bytes of the state at the beginning of the eighth round. An attacker would expect this to produce $2^4 \times 2^8 = 2^{12}$ valid key hypotheses, which is still a trivial exhaustive search.
Extending the Fault Attack
In this section we demonstrate that the fault attack described in the previous section can be extended using the second fault model defined in Section 2.2. This model requires more assumptions but allows the number of hypotheses to be further reduced.
In the second model defined in Section 2.2 we assume that a fault is injected that modifies the opcodes being processed in a microprocessor. Specifically, we consider a fault that reduces the number of bytes processed by the MixColumn function in the seventh round from 16 bytes to 15 bytes, i.e. there is one byte that remains unchanged by this function.
The advantage of this type of fault is that an attacker would expect to be able to identify when the desired fault has been induced by observing a suitable side channel. For example, Figure 4 shows the power consumption of a microprocessor towards the end of a computation of the MixColumn function. The red trace shows the power consumption where the MixColumn function treats all 16 bytes. The black trace shows the power consumption where the MixColumn function treats 15 bytes. The traces have an almost identical power consumption on the left side of the figure, and diverge towards the middle of the figure. This difference could be seen by an attacker and the corresponding faulty ciphertext could then be attacked using the method described in this paper.

An attacker could use this acquired faulty ciphertext and the correct ciphertext to conduct the attack described in Section 3. After generating an expected $2^8$ possible key hypotheses, an attacker could proceed to verify that the effect of the fault is possible with the generated key hypotheses, i.e. that the input of the relevant byte to the MixColumn function is the same as the output when a fault is induced. This verification can be conducted at the same time as the evaluation of the second set of equations described in Section 3.2, since the generated values can be used to verify the effect of a fault.
The expected number of hypotheses that would be returned by this verification would be approximately two. This can be seen by considering that, of the $2^8$ hypotheses, one will be the correct one and each of the remaining $2^8 - 1$ hypotheses will satisfy the test with a probability of $\frac{1}{256}$. The expected number of remaining hypotheses is, therefore, $1 + \frac{255}{256} \approx 2$.
As with the attack described in Section 3, the same analysis could be applied if an attacker is not able to determine which byte has been affected. Each of the fifteen evaluations where an attacker is not considering the correct byte will return one hypothesis on average (if a set of $2^8$ hypotheses does not contain the correct key value, random values would be valid with a probability of $\frac{1}{256}$). The total number of key hypotheses would, therefore, be $16 + \frac{255}{256} \approx 17$.
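The two expected values above follow from simple arithmetic, which can be reproduced exactly with Python's `fractions` module. The variable names are our own.

```python
from fractions import Fraction

p_false_positive = Fraction(1, 256)  # chance that a wrong hypothesis passes the check
hypotheses = 2 ** 8                  # hypotheses remaining after the second step

# Affected byte known: the correct key plus the false positives among the other 255.
known_byte = 1 + (hypotheses - 1) * p_false_positive

# Affected byte unknown: fifteen further analyses, each over 256 wrong hypotheses.
unknown_byte = known_byte + 15 * hypotheses * p_false_positive

print(known_byte, float(known_byte))      # 511/256 1.99609375
print(unknown_byte, float(unknown_byte))  # 4351/256 16.99609375
```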
Comparison with Previous Work
There are several versions of fault-based differential cryptanalysis that are able to reduce the number of key hypotheses from one fault injected into an implementation of AES [11, 16, 21] . However, the analysis proposed in this paper is more effective as the resulting exhaustive search can be reduced to a trivial amount from one fault. The number of key hypotheses returned by previous work would result in an exhaustive search that would be somewhat time consuming.
The attacks proposed in [11, 16, 21] require two faults to reduce the number of key hypotheses so that an exhaustive search becomes trivial. This can be problematic, since faults are, typically, only successful with a certain probability, and the effect cannot always be predetermined. This would mean that an attacker could potentially have to search among numerous faulty ciphertexts to find a pair that both have the desired fault. The advantage of the proposed attack is that it does not need to reproduce a successful fault in order to be able to determine a secret key. The attack proposed in this paper allows an attacker to minimise the number of faults that are required to derive a secret key. This is important as each fault injected into a given device may also render that device unusable. This is because each fault will stress a device and there will be some probability that it will produce a permanent, rather than transient, fault.
In our simulations it has taken approximately 50 minutes to generate all the possible key hypotheses, which would mean that an attacker would expect to find a given secret key after 25 minutes if they could test each possible key as it was generated. This is also somewhat time consuming but the advantage of only requiring one faulty ciphertext is substantial.
Conclusion
The paper proposes a fault-based differential cryptanalysis of AES, that is an extended version of the attack described in [16] . We base our analysis on the often used fault model of one byte being changed to a random value. We also introduce an extension to this attack where we consider a model that provides more information to an attacker, and has also been shown to be practically possible [2, 8] .
An attacker would, therefore, expect to be able to reduce the number of key hypotheses from $2^{128}$ to two with one well placed fault. As noted in [15], these attacks can be conducted without any knowledge of the plaintext being enciphered, as an attacker would just need to know that the plaintexts are the same.
There are many descriptions of a fault-based differential cryptanalysis of AES that could be prevented by repeating the last two or three rounds of an implementation of AES, to verify that no exploitable fault has been inserted [5, 9, 10, 21, 26] . However, to prevent the attack described in this paper the last four rounds would need to be repeated to check no fault was injected. Moreover, given how much information can be gleaned from one fault, one would expect there to be attacks that require more faulty ciphertexts but would be able to make use of faults in earlier rounds. One would, therefore, suggest that in order to protect an implementation of AES, it would be prudent to protect the last five rounds against fault injection.
