Abstract: Cryptographic implementations are often vulnerable to physical attacks, fault injection analysis being among the most popular techniques. On par with the development of attacks, the area of countermeasures is advancing rapidly, utilizing both hardware- and software-based approaches. When it comes to software encoding countermeasures for fault protection and their evaluation, there are very few proposals so far, mostly focusing on single operations rather than on a cipher as a whole.
Introduction
Protection of cryptographic implementations and physical attacks on them are ever-evolving areas, resulting in a continuous effort on each side to advance over the other. Attackers utilize various techniques that can break the protection and reveal information about the data or the secret key. On the other hand, data owners and custodians try to prevent these attacks by applying a wide range of countermeasures.
There are various ways to analyze a device and its implementation, Fault Analysis (FA) being one of the most popular ones. Since the first reported attacks, protecting the implementations of ciphers has become a major concern. When selecting a countermeasure, one needs to decide what degree of protection to implement, taking into account the value of the data and the price of the protection. There is no universal countermeasure; each method has its advantages and limitations. In general, countermeasures can be classified into hardware-based and software-based.
Implementers currently still rely more on hardware-based approaches, such as shielding [1] , sensors [1] , or hardware redundancy [2] . This is mostly because to inject a fault, physical methods are normally used, such as lasers, electromagnetic pulses, or voltage/clock glitches [3] , and therefore, physical protections are effective in detecting/thwarting these.
There are works that utilize encoding techniques in hardware to provide fault resiliency, e.g. [4] , [5] , [6] . However, there is no straightforward way to implement such schemes in software and therefore, these papers do not provide any details on potential efficiency and security in case the countermeasure is ported into software.
Our Contribution
In this work we are interested in analyzing software encoding countermeasures for a full cipher implementation. To facilitate the evaluation, we formalize fault models and encoding countermeasures in software, shedding light on what is needed and what is possible.
We formalize evaluation metrics that measure the robustness of a code against bit flip faults and instruction skips on a full cipher implementation. We present the exact formula of our metric for a code used in protecting a single operation. Such an analysis gives us insights into what kind of codes to choose: we show that both the minimum and maximum distances of a code are important. This leads us to the notion of an anticode from coding theory, a definition of a code that bounds both the minimum and maximum distances of a binary code.
We provide theoretical analysis for what parameters an anticode exists, which gives a direct overview of feasibility without the need to manually search for the anticode existence. As the next step, we present an algorithm to automatically select anticodes with required properties for protecting cryptographic implementations against DFA (if such codes exist).
We develop an evaluation method for encoding countermeasures that is based on dynamic code analysis and works directly on assembly implementations. We implemented a protected version of the PRESENT-80 cipher in AVR assembly language and used our evaluation method to analyze the performance of different anticodes w.r.t. the aforementioned metrics. Our results reveal what trade-offs between the security level and the efficiency (speed, time) can be achieved. Both advantages and disadvantages are discussed. To the best of our knowledge, this is the first work implementing and evaluating the software encoding countermeasure on a full cipher.
The rest of the paper is organized as follows. Section 2 discusses related works. Section 3 formalizes fault attacks and encoding countermeasures in software. In Section 4 we present our metric for evaluating a code used in encoding countermeasure with respect to bit flip faults and instruction skips. The exact formula for our metric in case the code is used for protecting one operation is provided in Section 5 along with the notion of anticodes. Algorithms used for code selection and for evaluation of software implementations are detailed in Section 6. Section 7 provides a case study on block cipher PRESENT. Section 8 gives a guideline on how to choose anticode parameters. Discussion is stated in Section 9 and finally, Section 10 concludes this paper and provides a motivation for future work.
Related Work

Differential Fault Analysis
When it comes to analyzing symmetric block ciphers under fault conditions, the most effective and popular method is Differential Fault Analysis (DFA) [7]. Following this method, the attacker normally disturbs the computation circuit during the last three rounds of the encryption and then compares the faulted ciphertext with the non-faulty one. By analyzing this pair of ciphertexts, she can obtain information about the secret key used in the encryption. In some cases, a single pair is enough to reduce the key search space to a feasible size [8], [9]. In other cases, several fault injections are necessary [10], [11].
Countermeasures
Software countermeasures against fault attacks can be generally divided into two main groups: instruction-level and algorithm-level techniques [12]. Instruction-level countermeasures include instruction duplication or triplication, and fault-tolerant instruction sequences, where an instruction is replaced by a functionally equivalent sequence of more secure instructions [13]. This technique was recently extended to a new approach called intra-instruction redundancy [14], in which data is split among several instructions by using redundant bit-slicing.
On the other hand, algorithm-level countermeasures include temporal and information redundancy on an algorithm level [15] . Temporal redundancy techniques normally execute the algorithm several times and then compare the results for inconsistencies [3] , [16] .
Software encoding countermeasures fall in the second category, introducing redundancy in the information being processed. Depending on the encoding scheme design and the amount of redundancy, these countermeasures can provide a robust alternative to hardware-based approaches [17]. Breier and Hou [18] showed how to select codes with desired fault properties for protecting binary operations. Theoretical bounds of the software encoding countermeasure used in a whole cipher implementation are considered in [19], [20]. However, no real implementation or simulation was given in either work. Servant et al. [20] considered a particular code when used in a full cipher, which they referred to as a (3, 6)-code and which is actually a (6, 16, 2)-binary code (see Definition 3). The probability of detecting a fault was analyzed in this case and it is 93.75%. The approach in [19] does not consider some important aspects of fault injection, such as the ability of the attacker to precisely select the fault mask or her ability to inject instruction skips. Generally, to avoid a successful fault injection attack on the countermeasure in [19], the code used would have to remain secret.
Countermeasure Evaluation Methods
Moro et al. [21] developed an evaluation platform based on electromagnetic fault injection to experimentally verify temporal redundancy countermeasures at the assembly instruction level. They implemented a protected version of FreeRTOS to conduct the study. Two countermeasures were tested: an instruction skip protection and a fault detection that is applicable to a subset of assembly instructions. Their experiments showed that both countermeasures work as intended, however with obvious limitations that come from their designs: they either protect only against instruction skips and not against other, more complex fault models, or they can only protect several chosen instructions of the code.
Yuce et al. [12] provided an experimental evaluation of several instruction-level countermeasures by using single clock glitches. They showed that the most popular choices, such as instruction duplication/triplication, parity, and the instruction skip countermeasure, can be broken by a careful choice of fault scenario.
Goubet et al. [22] aimed at formal verification of countermeasures by using automata and an SMT solver. Such an approach requires decomposing the code into pieces and analyzing each piece separately. Also, the method works by comparing the unprotected code with the protected one. The proposed method, however, does not scale to a full cipher evaluation: even for code snippets, 10 lines of code need 10.7 s to evaluate. Furthermore, analyzing small snippets separately might not reveal the vulnerabilities that might arise from connecting them into a full implementation (cf. Remark 10).
Breveglieri et al. [6] evaluate a subset of encoding-based countermeasures for hardware, based on parity/residue check bits. However, such methods provide only a limited amount of security -odd number of bit-flips is detected, but even number always passes the checks. Moreover, the attacker can also disturb the parity bit or the "checkpoint" which provides the integrity check.
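The limitation of a single parity bit mentioned above can be illustrated with a minimal sketch. The helper names `parity` and `parity_check_detects` are hypothetical and only model the check on an integer word; they are not taken from [6].

```python
def parity(word: int) -> int:
    """Return the XOR of all bits of word (1 for odd weight, 0 for even)."""
    return bin(word).count("1") % 2


def parity_check_detects(data: int, fault_mask: int) -> bool:
    """True if a stored parity bit exposes the bit flips in fault_mask."""
    stored_parity = parity(data)          # computed before the fault
    faulty = data ^ fault_mask            # bit flips injected into the data
    return parity(faulty) != stored_parity


# One flipped bit changes the parity; two flipped bits restore it.
detected_one = parity_check_detects(0b1011, 0b0100)
detected_two = parity_check_detects(0b1011, 0b0110)
```

In this sketch every fault mask of odd weight is detected and every mask of even weight passes, which matches the behaviour described in the text.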
In case of encoding based software countermeasures, there are no works proposing a full cipher evaluation to the best of our knowledge. The closest works to this one evaluate only a single operation on encoded data [18] , [17] .
Our method is universal for encoding based software countermeasures and provides details on all the possible bit flips and instruction skips. Also, the dynamic code analysis technique that was implemented can efficiently evaluate a full cipher implementation in a short time.
Software Encoding Schemes
In this section we first give the formalization of fault attacks in software. Then, we provide necessary coding theory background and present the formalization of encoding countermeasure that can be applied to all symmetric ciphers, which we refer to as fault resilient encoding scheme.
Fault Attacks in Software
Assembly language is a low-level programming language, specific to a particular architecture. Normally, there is a one-to-one mapping between assembly instructions and machine code that is being executed on the device. Assembly language uses a mnemonic to represent machine operations in the form of instructions. Each instruction falls into one of three categories: data movement, arithmetic/logic, and control-flow.
Operands are entities operated upon by an instruction. Addresses are the locations of specified data in the memory. Operands can be immediate (constant values), registers (values in the processor registers), or memory (values stored in the memory). A standard instruction can have zero to three operands, the leftmost operand usually being the destination register and the second and third being source registers.
For our purpose, registers are the most important storage units. The size of a register is typically stated in bits and depends on the device architecture (e.g. 8-bit, 32-bit, 64-bit). Normally, all the registers of a particular device have the same size. Registers are the fastest type of memory in a computer and are directly accessible by the arithmetic logic unit (ALU) performing the operations.
Definition 1. We define a program to be an ordered sequence of assembly instructions $F = \{f_1, f_2, \dots, f_{N_F}\}$. $N_F$ is called the number of instructions of the program. For any assembly instruction $f \in F$, if $f$ has a destination register, we denote this register by $r_f$. Let S denote the set of all programs.
A fault attack is an intentional change of the original data value into a different value. This change can happen in a register/memory, on the data path, or directly in the ALU. In general, there are two main fault models to be considered: program flow disturbances and data flow disturbances. The first one is achieved by disturbing the instruction execution process, which can result in changing or skipping the instruction currently being executed. The second one is achieved either by directly changing the data values in storage units, or by changing the data on the data paths or inside the ALU. For the purpose of a fault injection attack, these three data flow changes are equivalent and can be modeled by changing the values in registers.
Definition 2 (Instruction skip and fault mask). 1) For any i ∈ Z >0 , an ith instruction skip is a function
• if 1 ≤ i < N F and f i has a destination register r fi whose length is at least N, then
, $f_{N_F}\}$, where $\hat{f}_i$ = eor $r_{f_i}$ $j$, i.e. $\hat{f}_i$ changes the value in $r_{f_i}$ to the XOR of the value in $r_{f_i}$ and $j$.
• ς i, j (F ) = F otherwise.
In our evaluation framework, we consider a single fault adversary: under this attacker model, at most one fault is injected during the execution of the encryption/decryption algorithm. The attacker can inject a random m-bit flip fault such that all the bits have equal probability of being affected by the fault. In other words, for a random m-bit flip, each fault mask value between 1 and N has the same probability to occur.
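The two fault models can be sketched on a toy program representation. This is an illustrative model only: the tuple-based `Instruction` type, the helper names, and the index range used for the instruction skip are assumptions made here, and the register-length condition of Definition 2 is not modeled.

```python
from typing import Callable, Dict, List, Tuple

Registers = Dict[str, int]
# A toy instruction: (destination register, function computing its new value).
Instruction = Tuple[str, Callable[[Registers], int]]


def run(program: List[Instruction], regs: Registers) -> Registers:
    """Execute the toy program on a copy of the register file."""
    state = dict(regs)
    for dest, op in program:
        state[dest] = op(state)
    return state


def instruction_skip(program: List[Instruction], i: int) -> List[Instruction]:
    """Return the program without its i-th instruction (1-indexed)."""
    if 1 <= i <= len(program):
        return program[: i - 1] + program[i:]
    return program


def fault_mask(program: List[Instruction], i: int, j: int) -> List[Instruction]:
    """Insert an XOR with mask j on the destination register of instruction i."""
    if 1 <= i < len(program):
        dest = program[i - 1][0]
        flip: Instruction = (dest, lambda s, d=dest, m=j: s[d] ^ m)
        return program[:i] + [flip] + program[i:]
    return program


# Toy program: r0 <- 0101, r1 <- 0011, r0 <- r0 xor r1.
toy_program: List[Instruction] = [
    ("r0", lambda s: 0b0101),
    ("r1", lambda s: 0b0011),
    ("r0", lambda s: s["r0"] ^ s["r1"]),
]
initial = {"r0": 0, "r1": 0}
```

Running `run(fault_mask(toy_program, 1, 0b0001), initial)` flips the lowest bit of `r0` right after the first instruction, while `run(instruction_skip(toy_program, 2), initial)` leaves `r1` at its initial value.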
Fault Resilient Encoding Scheme
An encoding scheme in our context is a protection method that acts against fault injection attacks by detecting malicious changes to secret data processed by the encryption algorithm. In this part we provide the necessary formalization, which establishes the foundation for Section 4, where a generic metric for evaluating encoding scheme robustness is proposed.
A binary code, which we denote by C, is a subset of $\mathbb{F}_2^n$, the n-dimensional vector space over $\mathbb{F}_2$, where n is called the length of the code C. Each element c ∈ C is called a codeword of C and each element $x \in \mathbb{F}_2^n$ is called a word [23, p.6]. Take two words $x, y \in \mathbb{F}_2^n$; the Hamming distance between x and y, denoted by dis(x, y), is defined to be the number of places at which x and y differ [23, p.9]. More precisely, if $x = x_1 x_2 \dots x_n$ and $y = y_1 y_2 \dots y_n$, then
$$\mathrm{dis}(x, y) = \sum_{i=1}^{n} \mathrm{dis}(x_i, y_i),$$
where $x_i$ and $y_i$ are treated as binary words of length 1 and hence
$$\mathrm{dis}(x_i, y_i) = \begin{cases} 1 & \text{if } x_i \neq y_i, \\ 0 & \text{if } x_i = y_i. \end{cases}$$
Furthermore, for a word x ∈ F
For a linear code with dimension k, the standard notation is [n, k, d], where n is its length and d is its minimum distance. For a non-linear code, there is no notion of dimension and we follow the standard notation (n, M, d) as presented in [24]. We would like to emphasize that we do not restrict the code to linear codes, which allows the analysis to cover more code candidates for use in the encoding countermeasure.
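As a small illustration of the Hamming distance and the minimum distance d in the (n, M, d) notation, the following sketch represents words as Python integers. The function names are chosen here for illustration.

```python
from itertools import combinations
from typing import Iterable


def hamming_distance(x: int, y: int) -> int:
    """Number of bit positions in which the words x and y differ."""
    return bin(x ^ y).count("1")


def minimum_distance(code: Iterable[int]) -> int:
    """Smallest Hamming distance between two distinct codewords."""
    return min(hamming_distance(a, b) for a, b in combinations(set(code), 2))


# The words 1011 and 0010 differ in the first and the last position.
example_distance = hamming_distance(0b1011, 0b0010)
# The code {00, 11} has length n = 2, M = 2 codewords and distance d = 2.
example_d = minimum_distance([0b00, 0b11])
```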
To simplify the notation we introduce the symbol ⊥, which indicates an error. Note that the exact implementation of ⊥ places certain restrictions on the code C that can be used: if zero is used to implement ⊥, we require that $0 \notin C$.
To formally define encoding countermeasure, we first adopt the definition of symmetric cipher from [25] :
and ∀κ ∈ K, ∀P ∈ P, D(κ, E(κ, P)) = P. We refer to K , P, M, E and D as key space, plaintext space, ciphertext space, encryption and decryption of this cipher, respectively. We define S to be the set of all symmetric ciphers (K, P, M, E, D) such that
A symmetric cipher with encoding countermeasure either outputs an error message or the correct ciphertext. We give the formal definition of such a cipher as follows:
In an encoding countermeasure, the important part is the error detection, which is closely related to the encoding and decoding. Here we formalize the notion of encoder and decoder.
Definition 6. Given an $(n, M = 2^k, d)$-binary code C, an encoding-decoding scheme associated with C is a pair of functions $(\mathrm{Encoder}_C, \mathrm{Decoder}_C)$,
$$\mathrm{Encoder}_C : \mathbb{F}_2^k \to C, \qquad \mathrm{Decoder}_C : \mathbb{F}_2^n \to \mathbb{F}_2^k \cup \{\perp\},$$
such that $\mathrm{Decoder}_C(\mathbb{F}_2^n \setminus C) = \{\perp\}$ and $\mathrm{Encoder}_C$ is bijective with $\mathrm{Decoder}_C|_C$ being its inverse.
Thus for $\mathrm{Decoder}_C$ an error message ⊥ is returned if the input is not a codeword. More details regarding encoding-decoding schemes can be found in Appendix A.
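An encoding-decoding scheme of this kind can be sketched with two lookup tables. The sketch below assumes that message i is mapped to the i-th codeword of a list and uses Python's `None` to stand for ⊥; both choices are made here for illustration.

```python
from typing import Callable, List, Optional, Tuple

ERROR = None  # stands for the error symbol ⊥


def make_scheme(codewords: List[int]) -> Tuple[Callable[[int], int],
                                               Callable[[int], Optional[int]]]:
    """Build (encode, decode) where message i is paired with codewords[i]."""
    if len(set(codewords)) != len(codewords):
        raise ValueError("codewords must be distinct")
    encoder = dict(enumerate(codewords))
    inverse = {c: m for m, c in encoder.items()}

    def encode(message: int) -> int:
        return encoder[message]

    def decode(word: int) -> Optional[int]:
        # Any word that is not a codeword decodes to the error symbol.
        return inverse.get(word, ERROR)

    return encode, decode


# The code {00, 11} with 0 -> 00 and 1 -> 11.
encode, decode = make_scheme([0b00, 0b11])
```

With this scheme, `decode(encode(b))` returns `b` for both bits, while `decode(0b01)` and `decode(0b10)` return the error symbol.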
for some positive integers $M_1, M_2, \dots, M_{m+1}$. Let S denote the set of all operations.
Note that an assembly implementation of an operation is a program (see Definition 1). Example 1. The xor operation defined on 1-bit strings is an operation $g : \mathbb{F}_2 \times \mathbb{F}_2 \to \mathbb{F}_2$, $g(x, y) = x \oplus y$.
satisfies x i =⊥ for at least one i ∈ {1, 2, . . . , m}, then h(x) =⊥. Let S ⊥ denote the set of all operations with error detection. Remark 2. By the above definition, for any symmetric cipher
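−binary code C with associated encoding-decoding scheme (Encoder C , Decoder C ), Encoder C ∈ S and Decoder C ∈ S ⊥ .
Example 2. Consider the following operation with error detection $h : (\mathbb{F}_2 \cup \{\perp\}) \times (\mathbb{F}_2 \cup \{\perp\}) \to \mathbb{F}_2 \cup \{\perp\}$. h outputs the xor of two bits when no error is detected:
$$h(x, y) = \begin{cases} x \oplus y & \text{if } x \neq \perp \text{ and } y \neq \perp, \\ \perp & \text{otherwise.} \end{cases}$$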
An operation g ∈ S can be changed to an operation with error detection utilizing binary codes:
. . , x m ) , and
Example 3. Let us take the function g from Example 1 and the (2, 2, 2)-binary code C = {00, 11} with the following encoding-decoding scheme:
Encoder C : 0 → 00, 1 → 11,
Decoder C : 00 → 0, 01 → ⊥, 10 → ⊥, 11 → 1.
2 ∪ {⊥}) → C is an operation with error detection and
(The proof can be found in Appendix B.1.)
This justifies that we can split or merge multiple cipher operations while considering applying encoding countermeasure to a symmetric cipher (cf. Section 7.1).
Encoding countermeasure applied to a symmetric cipher can be considered as applying a function which is closely related to a binary code on the encryption and decryption of the cipher. Here we give the definition of such a function.
)− binary code C with an associated encoding-decoding scheme (Encoder C , Decoder C ), we define fault resilient C-map to be the following function
and D (κ, ⊥) =⊥. Now we are ready to formalize encoding countermeasure, which we refer to as fault resilient encoding scheme.
Definition 11 (Fault resilient encoding scheme). Given a symmetric cipher (K, P, M, E, D) ∈ S and an (n, M = 2 k , d)-binary code C with an encoding-decoding scheme (Encoder C , Decoder C ), a cipher of the form Φ C (K, P, M, E, D) is called a fault resilient encoding scheme.
Remark 4. Taking k = 1 and C = {01, 10}, we get the bitsliced encoding, e.g. the one used in [26] (Encoder C (0) = 01 and Encoder C (1) = 10), which follows the principle of dual-rail precharge logic. In Section 7.1, we use k = 4, mainly because the PRESENT cipher uses a 4-bit SBox. For a better understanding of how the fault resilient encoding scheme works, the design overview is shown in Figure 1. Informally, an encoder is first applied to both the plaintext and the key. Then, the encryption process is performed, preserving the encoding. In the end, a decoder is applied in order to get the encrypted message. The decryption process is analogous.
Evaluation Metric
In this section we first formalize faults in encoding schemes and provide concepts of safe and missed faults. Then, we propose two metrics for evaluating different binary codes used for fault resilient encoding scheme: one for bit flip fault model and one for instruction skip fault model.
Faults in Fault Resilient Encoding Schemes
We first give the definition of safe and missed faults for an implementation of ϕ C (g) (i.e. for a single operation), where C is a binary code and g is an operation.
The set of possible instruction skips for F is
2) The set of possible fault masks for F is
3) For an integer 1 ≤ m ≤ n, the set of possible m−bit flips for F is
4) A fault on F is defined to be a function belonging to $G_{(F,\mathrm{sk})}$ or $G_{(F,\mathrm{fm})}$. 5) Fixing an input x, a fault on F is said to be safe if the output of the faulted program on x is ⊥ or g(x); it is said to be a missed fault otherwise.
Remark 5. A fault is closely related to a tampering function defined in [27] . In our notation, a fault is defined on the program code level, but in a broader sense, the effect of introducing a fault in the program execution can be considered as an application of a tampering function.
Given an (n, M = 2 k , d)−binary code C associated with an encoding-decoding scheme (Encoder C , Decoder C ) and a sym-
. The assembly implementations of E and D are programs. If we let F 1 and F 2 be the assembly implementations of E and D respectively, then for any κ ∈ K, P ∈ P, Msg ∈ M ∪ {⊥}, F 1 (κ, P) = E (κ, P) and F 2 (κ, Msg) = D (κ, Msg). We assume the registers involved in the implementation all have length at least n. Recall that E, D ∈ S (Remark 2), we hence give the following definition of safe and missed faults for a fault resilient encoding scheme.
Definition 13 (Safe and missed faults). For a fixed plaintext P ∈ P and a key κ ∈ K, a fault 1 on F 1 is safe if F 1 (κ, P) =⊥ or E(κ, P) and it is called a missed fault otherwise. Similarly, a fault
Recall that for a differential fault analysis [7], the attacker needs to inject a fault during the execution. Depending on where the fault is introduced, diffusion can spread it over up to the whole cipher state by the end of the encryption. The attacker then compares the faulty output with the correct one and can gain information about the secret key. If the fault is missed, the attacker can use a similar technique. In this case, the cipher output is equivalent to the faulty output obtained by attacking an unprotected cipher implementation. On the other hand, if the fault is safe, the output is either ⊥ or the correct output, which does not give the attacker valuable information.
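The distinction between safe and missed faults can be made concrete with the code C = {00, 11} from Example 3. The sketch below is a simplified model in which a fault mask is applied directly to one encoded bit; the helper `classify_fault` and the use of `None` for ⊥ are assumptions of this illustration.

```python
from typing import Optional

ERROR = None                      # stands for the error symbol ⊥
DECODER = {0b00: 0, 0b11: 1}      # Decoder_C restricted to the codewords


def decode(word: int) -> Optional[int]:
    """Decode a 2-bit word, returning ERROR for non-codewords."""
    return DECODER.get(word, ERROR)


def classify_fault(faulty_output: Optional[int], correct_output: int) -> str:
    """A fault is safe if it yields ⊥ or the correct output, missed otherwise."""
    if faulty_output is ERROR or faulty_output == correct_output:
        return "safe"
    return "missed"


encoded_one = 0b11                # encoding of the data bit 1
single_flip = classify_fault(decode(encoded_one ^ 0b01), 1)
double_flip = classify_fault(decode(encoded_one ^ 0b11), 1)
```

A 1-bit flip turns 11 into a non-codeword and is detected, whereas the 2-bit flip turns 11 into the codeword 00, so the decoder silently returns the wrong bit.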
Metrics for Bit Flips and Instruction Skips
In this part we give the metrics we use to evaluate the fault resistance property of a binary code used in fault resilient encoding scheme. Since bit flips and instruction skips are quite different fault models in nature, we propose different metrics for each of them.
The metrics are defined for the implementation of encryption. Similar metrics can be defined for the implementation of decryption.
As mentioned earlier, for an m−bit flip fault attack model, we assume all combinations of m bits have equal probability to be flipped. Thus,
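Furthermore, given a particular fault, the probability that the fault is safe is calculated assuming that the plaintext and the key are independent random variables following the uniform distribution$^{1}$. More precisely,
$$\Pr[\text{the fault is safe}] = \frac{|\{(P, \kappa) : P \in \mathcal{P}, \kappa \in \mathcal{K}, \text{the fault is safe for plaintext } P \text{ and key } \kappa\}|}{|\mathcal{P}||\mathcal{K}|}.$$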
Definition 14 (m−bit fault resistance probability). Following the notations from Definition 13. Let m be an integer such that 1 ≤ m ≤ n, the m−bit fault resistance probability of C w.r.t.
We are interested in the best case for the attacker, i.e. we consider she can inject a fault that has the highest probability to be missed by the encoding scheme. Therefore, we have to take the minimum of the m−bit fault resistance probabilities. To check the overall resistance of a code in fault resilient encoding scheme, we consider all the possible bit flips and define bit flip resistance probability as follows:
Definition 15 (bit flip fault resistance probability). Given an (n, M = 2 k , d)-binary code C, a symmetric cipher (K, P, M, E, D), and an implementation F 1 of E, the bit flip fault resistance probability for C w.r.t. (K, P, M, E, D) and F 1 , denoted by p C,bf , is defined as:
where p C,m is the m-bit fault resistance probability of C w.r.t. (K, P, M, E, D) and F 1 .
The bit flip fault resistance probability will be used as our metric for evaluating a code used in fault resilient encoding scheme w.r.t. bit flip fault attacks.
For instruction skips, we give the following metric:
, and an implementation F 1 of E, the instruction skip resistance probability for C w.r.t. (K, P, M, E, D) and F 1 , denoted by p C,sk , is defined as:
Remark 6. We do not assume faults on input data, such as the plaintext and the key. In case the attacker wants to attack these, she could do it anytime before the actual algorithm execution, even before the encoding. Similarly, we do not assume faults on the ciphertext: in this case, the attacker would not get any meaningful information about the secret key.
Anticodes for Fault Resilient Encoding Scheme
In this section, we first provide the exact formulas for p C,bf and p C,sk in case of a simple "cipher" which consists of one binary operation (Section 5.1). Binary operations are very common in symmetric ciphers, e.g. xor, and, modular addition. We remark that analyzing the fault resistance property of a code C with respect to a single operation gives insights on the overall fault resistance of using C in fault resilient encoding scheme. Hence it provides a good approximation of the fault resistance of a full cipher implementation.
In Section 5.2 we introduce anticodes which give improved resistance probabilities compared to codes with unbounded distances.
Finally, Section 5.3 provides a way to check the existence of an anticode for given parameters. 
2 is a binary operation.
Evaluation of Single Operations
Let g ∈ S be a binary operation $g : \mathbb{F}_2^{M_1} \times \mathbb{F}_2^{M_2} \to \mathbb{F}_2^{M_3}$ and let C be an (n, M = 2 k , d)-binary code with associated encoding-decoding scheme (Encoder C , Decoder C ) and distance d ≥ 2. We will use the zero string to denote ⊥, the error message. Hence we further require that $0 \notin C$. We choose k such that k = max{M 1 , M 2 }.
Remark 7. As mentioned in Remark 1, we do not restrict our codes to be linear. Thus the method of calculating the syndrome [23, p.62] of a word and checking whether this word is a codeword does not apply in our setting. Furthermore, using a table lookup for the implementation and a null word for denoting an error does not require an extra computation (e.g. calculating the syndrome) to detect an error.
Let F be the assembly implementation (in Figure 1) of ϕ C (g). In F, two different instructions are used: LDI loads immediate data into the destination register, and LPM loads data from program memory into the destination register, serving as a table lookup for the binary operation g. Before executing each table lookup we precharge the destination register to zero by using the exclusive-or operation (EOR in line 3). Note that the table has $2^n \times 2^n$ entries. The value stored at address (a, b) is zero if $a \notin C$ or $b \notin C$, and it is $\mathrm{Encoder}_C(g(x, y))$ if a = Encoder C (x) and b = Encoder C (y).
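The content of such a lookup table can be sketched in a few lines. The following is an illustrative model, not the AVR implementation: the function name `build_table`, the list-of-lists layout, and the convention that message i maps to the i-th codeword are assumptions made here.

```python
from typing import Callable, List


def build_table(codewords: List[int], g: Callable[[int, int], int],
                n: int) -> List[List[int]]:
    """Return a 2^n x 2^n table for the encoded version of g.

    Entry (a, b) is Encoder(g(x, y)) when a = Encoder(x) and b = Encoder(y),
    and 0 (the error word) when a or b is not a codeword.
    """
    if 0 in codewords:
        raise ValueError("the zero word is reserved for the error symbol")
    encoder = dict(enumerate(codewords))          # message -> codeword
    decoder = {c: m for m, c in encoder.items()}  # codeword -> message
    table = [[0] * (1 << n) for _ in range(1 << n)]
    for a, x in decoder.items():
        for b, y in decoder.items():
            table[a][b] = encoder[g(x, y)]
    return table


# Encoded 1-bit xor with a toy code of length 3: 0 -> 011, 1 -> 101.
xor_table = build_table([0b011, 0b101], lambda x, y: x ^ y, n=3)
```

Any faulted operand that is no longer a codeword selects a zero entry, so the error propagates through subsequent lookups without an explicit check.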
By Definition 12 and the assumptions stated in Remark 6, the set of possible instruction skips and the set of possible fault masks for F are given by
The values of p C,m and p C,sk with respect to the program F and ϕ C (g) can be then calculated as follows:
2. p C,sk = 1.
(The proof can be found in Appendix B.2.)
Remark 8. If S m,C = 0, then p C,m = 1. This is equivalent to saying that in case there are no two codewords in C that are at distance m from each other, m−bit flip fault model would not result in missed faults.
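The condition in Remark 8 can be checked mechanically by counting pairs of codewords at each distance. The sketch below uses this pair count only as a stand-in for the condition "no two codewords are at distance m"; it is not claimed to reproduce the exact quantity S m,C of Proposition 1, and the function names are chosen here for illustration.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List


def pairs_at_distance(code: Iterable[int]) -> Counter:
    """Map each distance m to the number of unordered codeword pairs at distance m."""
    return Counter(bin(a ^ b).count("1") for a, b in combinations(set(code), 2))


def flip_sizes_reaching_codewords(code: Iterable[int], n: int) -> List[int]:
    """All m in 1..n for which some m-bit flip maps a codeword to another codeword."""
    counts = pairs_at_distance(code)
    return [m for m in range(1, n + 1) if counts[m] > 0]


toy_code = [0b0011, 0b0101, 0b1001, 0b0110]
risky_sizes = flip_sizes_reaching_codewords(toy_code, n=4)
```

For this toy code only 2-bit and 4-bit flips can map one codeword to another, so 1-bit and 3-bit flips cannot result in missed faults on a single encoded word.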
Fault Resilient Anticode Scheme
In this part, we explain the rationale behind extending the encoding scheme with a usage of anticodes to provide better bit flip resistance probabilities.
When selecting the code parameters, the choice of n is dependent on the architecture of the device and the memory constraint. The value of M is mostly related to the cipher design (see Section 8) .
For binary codes with the same length n and cardinality M, the formula from Proposition 1 shows that the smaller the value of if n is odd .
It is known that (see e.g. [28, p.26] )
Hence, we would like to have S m,C = 0 for m "close to" n and we do not want S m,C = 0 for too many m (see Lemma 2). In view of the above, we recall the notion of anticode:
A binary anticode is an array of binary digits with n rows and M columns, constructed so that the maximum Hamming distance between any pair of rows is less than or equal to a certain value δ. This value δ is the maximum distance of the anticode.
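The maximum-distance condition can be checked directly. In the sketch below, rows are represented as Python integers and may repeat; the function names are assumptions of this illustration.

```python
from itertools import combinations
from typing import List, Tuple


def distance(a: int, b: int) -> int:
    """Hamming distance between two words given as integers."""
    return bin(a ^ b).count("1")


def is_anticode(rows: List[int], delta: int) -> bool:
    """True if every pair of rows (repeats allowed) is at distance at most delta."""
    return all(distance(a, b) <= delta for a, b in combinations(rows, 2))


def min_max_distance(code: List[int]) -> Tuple[int, int]:
    """Minimum and maximum distance over pairs of distinct codewords."""
    dists = [distance(a, b) for a, b in combinations(sorted(set(code)), 2)]
    return min(dists), max(dists)


toy_code = [0b0011, 0b0101, 0b0110, 0b1001]
d, delta = min_max_distance(toy_code)
```

Here the pair 0110 and 1001 is at distance 4, so the toy code has maximum distance 4; dropping the last codeword gives a code whose minimum and maximum distances are both 2.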
If we have a binary code, we can take its codewords as rows and then get an anticode. Note that a binary code does not have repeated codewords but an anticode can have repeated rows [29] . The above discussion shows that essentially what we want is a binary code which is also an anticode with a proper maximum distance δ. We introduce the following definition. From the definition, it is clear that a binary code can always be considered as a binary anticode. The difference is that the notion of anticode captures the maximum distance of the code, which is closely related to the selection of codes with better bit flip fault resistance probability. Here, we rename our fault resilient encoding scheme below to emphasize the usage of anticode. To analyze the choice of C that is used in a fault resilient anticode scheme, we theoretically study the performance of C with
Next, we consider n, M as fixed parameters and we assume > 2 (Equation 3), hence we also assume n ≥ 6. For any (n, M, d, δ)−binary code C, let p C,bf (resp. p C,m ) denote its bit flip fault resistance probability (resp. m−bit fault resistance probability) w.r.t. F and ϕ C (g) in Section 5.1.
We have the following observations.
Lemma 2 (Advantage of anticodes for fault detection).
(The proof can be found in Appendix B.3.) We remark that in 3-a, taking m = n implies $S_{n,C_3} = 0$, which means that in this case $\delta_3 < n$. This corresponds to our previous observation that $S_{m,C} = 0$ for m "close to" n may give an anticode with a better fault resilience property. Condition 3-b implies that there are at least three values of m such that $S_{m,C_3} \neq 0$, which corresponds to our observation that it is not desirable to have $S_{m,C} = 0$ for too many m.
The Possible Choices of Anticodes
The We have
vi $N(n, 2r - 1, 2\ell - 1) \le N(n + 1, 2r, 2\ell)$, where $r, \ell \in \mathbb{Z}_{>0}$; vii $N(n, 2r - 1, 2\ell) \le N(n + 1, 2r, 2\ell)$, where $r, \ell \in \mathbb{Z}_{>0}$.
(The proof can be found in Appendix B.4.)
In Section 7.1 we will study and analyze the implementation of a fault resilient anticode scheme with the PRESENT cipher. Because of the cipher design we will be interested in anticodes with cardinality 16 (see Section 7.1).
Using the above lemma, we computed the possible values of d and δ for n = 8, 9, 10 and M = 16, stated in Table 2. These values are useful when considering the selection of anticodes (see Section 8). On the other hand, the existence of binary anticodes satisfying condition 3-a or 3-b in Lemma 2 is not guaranteed. However, by using our anticode selection algorithm (Section 6.1), we were able to find anticodes satisfying both conditions 3-a and 3-b in Lemma 2. As expected, they have a high bit flip fault resistance probability when used in a fault resilient anticode scheme (cf. Remark 10). We would like to emphasize that searching for an anticode with Algorithm 1 is time-consuming, especially for codes with high n. Also, it might not be apparent whether an anticode exists until the whole code space is searched. Lemma 3 helps in this direction: it tells us whether it makes sense to run Algorithm 1 for given parameters.
Algorithms
In this section, we provide two useful algorithms for practical evaluation of encoding schemes. The first one selects binary anticodes according to user requirements and the second one evaluates software implementations that follow the fault resilient anticode scheme.
Anticode Selection Algorithm
In order to use and analyze the fault resilient anticode scheme, we first need to select the binary anticodes. The algorithm created for this purpose is described in this section.
As in the previous section, we choose the anticodes based on their performance on a single cipher operation. This gives a good approximation of the overall resistance when the anticode is used in a full cipher implementation.
Pseudocode outlining the main idea of the anticode selection is stated in Algorithm 1. The inputs are: parameters n, M, d, δ for the binary anticode, and ε such that we require that the selected binary anticode C satisfies 1 − p C,m < ε for all 1 ≤ m ≤ n, where p C,m is the m−bit fault resistance probability of C with respect to F and ϕ C (g) in Section 5.1. Thus the calculation of p C,m follows from Proposition 1.
We note that for our implementation we use zero word as ⊥ and thus in line 3 we choose sets S which do not contain 0.
The algorithm takes each possible binary code S that consists of M codewords, each of length n (line 3), and tests whether the distance conditions are satisfied (line 4), i.e. whether the following two conditions hold: 1) $\min\{\mathrm{dis}(c, c') : c, c' \in S,\ c \neq c'\} = d$; 2) $\max\{\mathrm{dis}(c, c') : c, c' \in S,\ c \neq c'\} = \delta$. If the distance conditions are satisfied, we further check whether the requirement on the fault resistance probability of S is fulfilled (line 5).
The ε parameter is crucial for selecting an anticode with good fault resilience. As long as at least one anticode exists for the given ε, the algorithm lowers this value (line 9) by a pre-specified constant, in order to find binary anticodes that satisfy the conditions with an even smaller ε.
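The distance test of lines 3-4 can be illustrated with a short Python sketch. It is not the authors' implementation: the function names and the `limit` parameter are hypothetical, and the ε-based check of line 5 is left out because it depends on Proposition 1. The zero word is excluded from the search space, following the convention that 0 represents ⊥.

```python
from itertools import combinations


def dis(x, y):
    """Hamming distance between two words stored as integers."""
    return bin(x ^ y).count("1")


def distance_profile(code):
    """Return (minimum, maximum) pairwise Hamming distance of a code."""
    distances = [dis(a, b) for a, b in combinations(code, 2)]
    return min(distances), max(distances)


def select_anticodes(n, M, d, delta, limit=None):
    """Exhaustively list codes with M nonzero words of length n whose
    minimum distance is d and maximum distance is delta.

    Assumption: the fault resistance check (the epsilon test) is omitted.
    """
    found = []
    for S in combinations(range(1, 1 << n), M):  # 0 is reserved for the error symbol
        if distance_profile(S) == (d, delta):
            found.append(S)
            if limit is not None and len(found) >= limit:
                break
    return found


if __name__ == "__main__":
    # Tiny parameters only: the search space has C(2^n - 1, M) candidates.
    print(select_anticodes(4, 4, 2, 2, limit=3))
```

Even this toy version makes the cost visible: for n = 8 and M = 16 the loop would have to visit C(255, 16) candidate sets, which is why a feasibility test on (d, δ) before running the search is valuable.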
Dynamic Code Analysis
For the purpose of fault analysis, we have designed a dynamic code analyzer that is able to simulate the code execution and fault injection with bit precision in any instruction of the code. Along with the bit flips, it can simulate instruction skips (see Definition 2). Pseudocode implementing the evaluation is stated in Algorithm 2.
For a symmetric cipher (K, P, M, E, D) and an (n, M, d, δ)-binary anticode C, let (K, P, M ∪ {⊥}, E′, D′) denote the corresponding fault resilient anticode scheme (Definition 19). Given F, an implementation of E′, Algorithm 2 calculates approximations of the m-bit fault resistance probability $p_{C,m}$ (Definition 14), the bit flip fault resistance probability $p_{C,bf}$ (Definition 15) and the instruction skip resistance probability $p_{C,sk}$ (Definition 16) of C with respect to (K, P, M, E, D) and F.
By definition, the values of $p_{C,m}$, $p_{C,bf}$, $p_{C,sk}$ should be calculated by evaluating each pair of plaintext and secret key. However, for a symmetric cipher, this would require an infeasible amount of calculations. For PRESENT-80, it would need $2^{144}$ evaluations of each fault model (80-bit key and 64-bit plaintext). Thus, we allow a user input noOfIter which specifies how many pairs of random plaintext and random secret key to consider. Hence the output will be approximations of our evaluation metrics.
We first select a random pair of plaintext and secret key, then compute the corresponding correct ciphertext (line 3).
Lines 4 to 13 evaluate bit flip faults for the selected pair of plaintext and key. The first loop iterates over every possible fault mask, which is later xor-ed with the intermediate value in order to change the original value in the destination register of an instruction (line 9). According to Definition 2, a fault mask is a binary string; however, it is more convenient and efficient to use an integer in the implementation. The second loop iterates over every instruction in F to select the position in the program to be faulted. The last loop is the program execution itself: it iterates over the instructions in F and executes them one by one. If the instruction number corresponds to the number currently being targeted, a bit flip is performed (line 9). After the execution of F finishes, the output value is checked (lines 10-13). If the value equals the expected ciphertext E(P, κ), or the value is ⊥, it is a safe fault. Otherwise, it is a missed fault (see Definition 13). In each case we increment the corresponding value in an array, where the array index indicates the Hamming weight of the fault mask.
Lines 14-23 evaluate instruction skips. This part works in the same fashion as the previous one, but it needs one loop fewer because no fault mask is required. Output evaluation is analogous, but the records of safe/missed faults are integers instead of arrays of integers.
Lines 24-25 calculate the approximated values of $p_{C,m}$ for each m, which is equal to the number of safe m-bit flip faults divided by the total number of m-bit flip faults considered. Line 26 calculates the approximated value of $p_{C,bf}$, which is the minimum of $p_{C,m}$ over all m. Line 27 calculates the approximated value of $p_{C,sk}$, which is equal to the number of safe instruction skips divided by the total number of instruction skip faults considered.
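The structure of this evaluation can be sketched in Python on a toy target. The sketch makes several assumptions that are not taken from the paper: the "program" is a four-instruction encoded xor rather than a cipher, the length-4 code `CODE` is an arbitrary placeholder and not one of the selected anticodes, any non-codeword output is treated as ⊥, and all names are hypothetical.

```python
import random

BOTTOM = 0                                  # error symbol, represented by the zero word
N = 4                                       # toy code length
CODE = [0b0011, 0b0101, 0b1010, 0b1100]     # placeholder code, NOT from the paper
ENC = dict(enumerate(CODE))                 # 2-bit value -> codeword
DEC = {c: v for v, c in ENC.items()}        # codeword -> 2-bit value
XOR_TABLE = {(x, y): ENC[DEC[x] ^ DEC[y]] for x in CODE for y in CODE}


def make_program(a, b):
    """Toy implementation F: list of (destination register, operation)."""
    return [
        ("r0", lambda regs: ENC[a]),
        ("r1", lambda regs: ENC[b]),
        ("out", lambda regs: BOTTOM),       # precharge of the destination register
        ("out", lambda regs: XOR_TABLE.get((regs["r0"], regs["r1"]), BOTTOM)),
    ]


def run(program, fault=None):
    """Execute the program; fault=(idx, mask) flips bits of the destination
    register after instruction idx, fault=(idx, None) skips instruction idx."""
    regs = {"r0": 0, "r1": 0, "out": 0}
    for idx, (dst, op) in enumerate(program):
        hit = fault is not None and fault[0] == idx
        if hit and fault[1] is None:
            continue                        # instruction skip
        regs[dst] = op(regs)
        if hit:
            regs[dst] ^= fault[1]           # bit flip with the given mask
    return regs["out"]


def is_safe(output, expected):
    """Safe fault: correct output, or an output that is not a codeword."""
    return output == expected or output not in DEC


def evaluate(no_of_iter, seed=0):
    rng = random.Random(seed)
    safe_bf, total_bf = [0] * (N + 1), [0] * (N + 1)   # indexed by Hamming weight
    safe_sk = total_sk = 0
    for _ in range(no_of_iter):
        program = make_program(rng.randrange(4), rng.randrange(4))
        expected = run(program)
        for mask in range(1, 1 << N):
            m = bin(mask).count("1")
            for idx in range(len(program)):
                total_bf[m] += 1
                safe_bf[m] += is_safe(run(program, (idx, mask)), expected)
        for idx in range(len(program)):
            total_sk += 1
            safe_sk += is_safe(run(program, (idx, None)), expected)
    p_m = {m: safe_bf[m] / total_bf[m] for m in range(1, N + 1)}
    return p_m, min(p_m.values()), safe_sk / total_sk


if __name__ == "__main__":
    print(evaluate(20))
```

In this toy setting every instruction skip ends either in the correct result or in ⊥, because the destination register is precharged and the xor table returns ⊥ for non-codeword inputs.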
Case Study
In this section, we present a case study on the block cipher PRESENT, fully implemented using the fault resilient anticode scheme with (n, 16, d, δ)-binary anticodes for n = 8, 9, 10 (Table D lists all the anticodes used). The anticodes are selected by Algorithm 1. In Section 7.1, we provide implementation details for a generic microcontroller. Section 7.2 provides the results of the code analysis using Algorithm 2.
PRESENT Cipher Implementation
PRESENT is an ultra-lightweight block cipher, developed in 2007 [32]. It is a symmetric cipher following an SPN structure, where the block length is 64 bits and the key length can be either 128 bits or 80 bits. A round function consists of three operations: addRoundKey (xor of the state with the round key), sBoxLayer (substitution by a 4-bit SBox, which we refer to as the PRESENT SBox), and pLayer (bitwise permutation). After 31 rounds, there is one more addRoundKey, used for post-whitening. The whole process is depicted in Figure 2. Because of its lightweight character, it is recommended to use the 80-bit key length in order to keep the computation fast and energy efficient [32]. We focus on this variant, denoted by PRESENT-80. For our implementation, we take pre-computed round keys which are already encoded; therefore, we omit the description of the key schedule here.
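For orientation, the plain (unencoded) cipher described above can be modelled in a few lines of Python. This is only a reference sketch with illustrative function names, not the encoded assembly implementation evaluated in this case study. The key schedule, which the text omits because round keys are precomputed, is included solely so that the model can be checked against the all-zero test vector of the PRESENT specification.

```python
# PRESENT SBox as given in the cipher specification.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def sbox_layer(state):
    """Apply the 4-bit SBox to each of the 16 nibbles of the 64-bit state."""
    out = 0
    for i in range(16):
        out |= SBOX[(state >> (4 * i)) & 0xF] << (4 * i)
    return out


def p_layer(state):
    """Bit permutation: bit i moves to 16*i mod 63 (bit 63 stays in place)."""
    out = 0
    for i in range(64):
        pos = 63 if i == 63 else (16 * i) % 63
        out |= ((state >> i) & 1) << pos
    return out


def round_keys_80(key):
    """Derive the 32 round keys from an 80-bit key."""
    keys = []
    for counter in range(1, 33):
        keys.append(key >> 16)                                  # leftmost 64 bits
        key = ((key << 61) | (key >> 19)) & ((1 << 80) - 1)     # rotate left by 61
        key = (SBOX[key >> 76] << 76) | (key & ((1 << 76) - 1)) # SBox on top nibble
        key ^= counter << 15                                    # add round counter
    return keys


def encrypt(plaintext, key):
    """31 rounds of addRoundKey, sBoxLayer, pLayer, then a final addRoundKey."""
    rk = round_keys_80(key)
    state = plaintext
    for r in range(31):
        state ^= rk[r]
        state = sbox_layer(state)
        state = p_layer(state)
    return state ^ rk[31]


if __name__ == "__main__":
    print(hex(encrypt(0, 0)))
```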
By definition, evaluations of $p_{C,m}$, $p_{C,bf}$, $p_{C,sk}$ are done on an assembly implementation; it is therefore important to specify what kind of implementation is used. The main properties of the implementation in our case study are as follows:
1) Each operation is implemented as a table look-up from memory.
2) Before the table look-up, the destination register of an operation is precharged to zero, so that a single instruction skip is protected against.
3) The error message ⊥ is represented by the value 0.
We note that for PRESENT-80, pLayer can be considered as four parallel bitwise operations where each is a function: F This property helps us to tailor the look up tables in a way that can provide more efficient space/time implementation compared to implementing the two layers separately. In the following, we will explain the design of such an implementation .
Encoded Round Function for PRESENT
In this part, we explain the implementation of the round functions for the fault resilient anticode scheme with PRESENT-80 using (n, 16, d, δ)-binary anticodes. Remark 3 justifies that we can split or merge multiple cipher operations while using the fault resilient C-map (Definition 10), preserving the correct data flow.
The addRoundKey is a binary operation, xor-ing the key with the current state. Therefore, it can be directly implemented by an xor lookup table, similar to the implementation F of ϕ C (g) in Section 5.1. The sBoxLayer maps an input value to an output value, therefore the standalone implementation would be even easier than the xor. However, we have decided to merge sBoxLayer together with the pLayer, because the latter cannot be implemented in a straightforward way. The overview of this merged implementation is depicted in Figure 4 , which explains how the first encoded nibble is obtained. The explanation of this approach is given below.
Let C be an (n, 16, d, δ)-binary anticode. The implementation of $\Phi_C$ for pLayer ∘ sBoxLayer relies on the xor lookup table and eight other tables, which can be divided into two groups:
1) Bit-extracting Sbox tables: This group has four tables, T0, T1, T2, T3, such that Ti takes a codeword, say $\mathrm{Encoder}_C(x_0 x_1 x_2 x_3)$, and returns the codeword $\mathrm{Encoder}_C(xs_i\,000)$. If the input is not a codeword, the return value is ⊥. Here we assume that the PRESENT SBox maps $x_0 x_1 x_2 x_3$ to $xs_0\, xs_1\, xs_2\, xs_3$. In other words, this group first computes an Sbox on the encoded data and then extracts one bit; the bit position depends on which of the four tables is used. So, the output of these tables is the codeword corresponding to either 0 or 8.
2) Bit-shifting tables: This group has four tables as well, TB0, TB1, TB2 and TB3. For a codeword of the form $\mathrm{Encoder}_C(x000)$, the tables TB0, TB1, TB2, TB3 return the codewords $\mathrm{Encoder}_C(x000)$, $\mathrm{Encoder}_C(0x00)$, $\mathrm{Encoder}_C(00x0)$, $\mathrm{Encoder}_C(000x)$, respectively. If the input is not a codeword, the return value is ⊥ for all four tables. In other words, the tables in this group provide the bit shifting operations that are necessary to finalize the pLayer. The outputs of tables TB0, TB1, TB2, TB3 can be codewords corresponding to 8, 4, 2, 1 or 0, depending on the value and the bit position.
After the Sbox is computed and the bit shifts on the resulting data are done, the data is combined back to the 4-bit format by using the xor table; in total, three xor operations are required to combine the data. In the following, we explain this process step by step.
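Assume we have four encoded nibbles $\mathrm{Encoder}_C(a_0 a_1 a_2 a_3)$, $\mathrm{Encoder}_C(b_0 b_1 b_2 b_3)$, $\mathrm{Encoder}_C(c_0 c_1 c_2 c_3)$, $\mathrm{Encoder}_C(d_0 d_1 d_2 d_3)$, and that the PRESENT SBox maps $a_0 a_1 a_2 a_3$ to $as_0\, as_1\, as_2\, as_3$ (and similarly for b, c, d). Then:
1) $\mathrm{Encoder}_C(a_0 a_1 a_2 a_3)$ is passed to tables T0, T1, T2, T3, the four returned values are passed to TB0 and we get: $\mathrm{Encoder}_C(as_0\,000)$, $\mathrm{Encoder}_C(as_1\,000)$, $\mathrm{Encoder}_C(as_2\,000)$, $\mathrm{Encoder}_C(as_3\,000)$;
2) $\mathrm{Encoder}_C(b_0 b_1 b_2 b_3)$ is passed to tables T0, T1, T2, T3, the four returned values are passed to TB1 and we get: $\mathrm{Encoder}_C(0\,bs_0\,00)$, $\mathrm{Encoder}_C(0\,bs_1\,00)$, $\mathrm{Encoder}_C(0\,bs_2\,00)$, $\mathrm{Encoder}_C(0\,bs_3\,00)$;
3) $\mathrm{Encoder}_C(c_0 c_1 c_2 c_3)$ is passed to tables T0, T1, T2, T3, the four returned values are passed to TB2 and we get: $\mathrm{Encoder}_C(00\,cs_0\,0)$, $\mathrm{Encoder}_C(00\,cs_1\,0)$, $\mathrm{Encoder}_C(00\,cs_2\,0)$, $\mathrm{Encoder}_C(00\,cs_3\,0)$;
4) $\mathrm{Encoder}_C(d_0 d_1 d_2 d_3)$ is passed to tables T0, T1, T2, T3, the four returned values are passed to TB3 and we get: $\mathrm{Encoder}_C(000\,ds_0)$, $\mathrm{Encoder}_C(000\,ds_1)$, $\mathrm{Encoder}_C(000\,ds_2)$, $\mathrm{Encoder}_C(000\,ds_3)$.
Afterwards, each output nibble requires three xor table lookups:
1) The first encoded nibble is given by $\mathrm{Encoder}_C(as_0\,000) \oplus \mathrm{Encoder}_C(0\,bs_0\,00) \oplus \mathrm{Encoder}_C(00\,cs_0\,0) \oplus \mathrm{Encoder}_C(000\,ds_0)$;
2) The second encoded nibble is given by $\mathrm{Encoder}_C(as_1\,000) \oplus \mathrm{Encoder}_C(0\,bs_1\,00) \oplus \mathrm{Encoder}_C(00\,cs_1\,0) \oplus \mathrm{Encoder}_C(000\,ds_1)$;
3) The third encoded nibble is given by $\mathrm{Encoder}_C(as_2\,000) \oplus \mathrm{Encoder}_C(0\,bs_2\,00) \oplus \mathrm{Encoder}_C(00\,cs_2\,0) \oplus \mathrm{Encoder}_C(000\,ds_2)$;
4) The fourth encoded nibble is given by $\mathrm{Encoder}_C(as_3\,000) \oplus \mathrm{Encoder}_C(0\,bs_3\,00) \oplus \mathrm{Encoder}_C(00\,cs_3\,0) \oplus \mathrm{Encoder}_C(000\,ds_3)$.
Here ⊕ represents an xor table lookup.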
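The table construction above can be checked with a small Python model. The sketch rests on assumptions that are not from the paper: `CODE` is a placeholder set of 16 nonzero 8-bit words (its distances were not selected by Algorithm 1), tables are modelled as dictionaries that return ⊥ = 0 for missing keys, the leftmost bit of a nibble has index 0, and all names are hypothetical.

```python
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
BOTTOM = 0
# Placeholder code: nibble v is stored next to its complement. NOT a paper anticode.
CODE = [v | ((v ^ 0xF) << 4) for v in range(16)]
ENC = dict(enumerate(CODE))


def bit(x, i):
    """Bit i of a nibble, where index 0 is the leftmost bit."""
    return (x >> (3 - i)) & 1


# T[i]: Encoder(x) -> Encoder(xs_i 000), i.e. the codeword of 8 or of 0.
T = [{ENC[x]: ENC[bit(SBOX[x], i) << 3] for x in range(16)} for i in range(4)]
# TB[j]: Encoder(x000) -> codeword with that bit moved to position j.
TB = [{ENC[0]: ENC[0], ENC[8]: ENC[8 >> j]} for j in range(4)]
# Encoded xor table; any pair containing a non-codeword is absent.
XOR = {(ENC[u], ENC[v]): ENC[u ^ v] for u in range(16) for v in range(16)}


def lookup(table, key):
    """Table look-up that yields BOTTOM for anything that is not a valid entry."""
    return table.get(key, BOTTOM)


def merged_layer(a, b, c, d):
    """Encoded sBoxLayer followed by pLayer on four encoded nibbles."""
    out = []
    for i in range(4):
        parts = [lookup(TB[j], lookup(T[i], w)) for j, w in enumerate((a, b, c, d))]
        acc = parts[0]
        for p in parts[1:]:                 # three xor table lookups
            acc = lookup(XOR, (acc, p))
        out.append(acc)
    return out


def reference_layer(a, b, c, d):
    """Same operation on plain nibbles: output nibble i is as_i bs_i cs_i ds_i."""
    s = [SBOX[x] for x in (a, b, c, d)]
    return [sum(bit(s[j], i) << (3 - j) for j in range(4)) for i in range(4)]


if __name__ == "__main__":
    plain = (0x1, 0x2, 0x3, 0x4)
    print(merged_layer(*(ENC[x] for x in plain)))
    print([ENC[v] for v in reference_layer(*plain)])
```

Because every output nibble depends on all four inputs, a single non-codeword input turns all four outputs into ⊥ in this model.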
Results
For selecting anticodes, we ran Algorithm 1 for all combinations of the parameters n, M, d, δ in Table 2. For each (n, M, d, δ), ε was set to 1; the selected anticodes are presented in Table 4. To analyze the performance of different anticodes in the fault resilient anticode scheme, for each anticode in Table 4 we ran Algorithm 2 on an implementation of PRESENT-80 following the specification stated in Section 7.1. To decide the input noOfIter, we randomly picked 10 anticodes and executed the algorithm with different values of noOfIter. Our results showed that for noOfIter ≥ 200, the change in the output probabilities for different values of noOfIter became negligible (< $10^{-6}$). Therefore, we set noOfIter = 200 for the evaluation of each anticode.
The analysis results for the anticodes (Table 4) show the following:
1) The instruction skip resistance probability is 1 for all anticodes. This is due to the precharge of the destination register required by our implementation specification.
2) The improvement from using a longer code length is evident: bit flip fault resistance probabilities for length 8 go up to ≈0.933, for length 9 up to ≈0.966, and for length 10 up to ≈0.979.
3) For n = 8, 9, 10 the anticodes with the best performance (i.e. the highest bit flip fault resistance probability $p_{C,bf}$) are anticodes with parameters (8, 16, 3, 6), (9, 16, 3, 7), (10, 16, 3, 8), respectively.
4) Every (8, 16, 4, 8)-binary anticode has the property that an 8-bit flip is missed with probability 1, i.e. $p_{C,8} = 0$. We note that this finding is in accordance with the one described in [18].
Remark 10.
• The anticodes that achieve the best bit flip fault resistance probabilities satisfy both conditions 3-a and 3-b in Lemma 2.
• Comparing the last two columns of [...] Therefore, it shows the importance of simulating the execution of the cryptographic implementation to get more precise insights into the code robustness.
Selection of Anticode Parameters
A natural question to ask now is how to choose the parameters for anticodes in general, e.g. for different device architectures and security requirements. We propose the following guidelines:
1) Code length (n): This parameter depends entirely on the underlying device architecture. Because of the addressing in the table look-up implementations, it is necessary to fit the whole address into one instruction. Therefore, e.g. for an 8-bit device, one can use at most n = 8. For 16- or 32-bit architectures, greater lengths can be used. However, in that case, memory requirements need to be taken into account (these are explained in more detail in Section 9).
2) Number of codewords (M): The number of codewords is, on the other hand, independent of the underlying architecture: it does not affect the table size or require a specific register size. The designer needs to take the cipher and the security requirements into account when deciding on the number of codewords. For example, in the case of PRESENT, the operations are computed on nibbles and therefore 16 codewords is the preferred number, providing a good trade-off between security and execution speed. A lower number of codewords would mean higher security, but slower speed, since the operations need to be carried out on smaller chunks of data.
3) Distance (d) and maximum distance (δ): These parameters do not depend on the device architecture, but can affect the resulting security significantly. The first, and obvious, selection criterion is whether a code with certain d, δ exists for some n and M. For this purpose, Lemma 3 provides an answer, with results for n = 8, 9, 10 stated in Table 2. Another selection criterion is whether some particular fault models can be prevented by other means: suppose we have an additional error detecting module that can detect 2 or 4 bit flips. Then we can use the (8, 16, 2, 4)-anticode from Figure 5, since only these two models are undetected using this code. Furthermore, the selection also depends on the attacker model assumption. If we want to build an implementation resistant against a specific fault model, e.g. nibble flip [33], then we would like to select an anticode with the highest 4-bit flip fault resistance probability, $p_{C,4}$. In any case, while the values of n and M can be decided before the actual cipher implementation, d and δ should be decided after running the evaluation in Section 6.2.
Discussion
Memory and Speed Trade-Offs
Table 4 shows that if the anticode C has a longer length, the fault resilient anticode scheme using C has better fault resistance properties. On the other hand, it also means a bigger memory consumption that increases sub-exponentially with the code length. In the following, we discuss the overheads. When it comes to speed, the fastest non-bit-sliced 8-bit implementation of PRESENT-80 requires 8,721 clock cycles [34], out of which ≈ 1,248 are spent on the key schedule (since we consider the round keys to be already in memory, we only count 7,473 clock cycles for the implementation from [34]). If the selected code can be fully implemented in the SRAM (so that a table look-up operation LD takes 2 clock cycles), the fault resilient anticode scheme implementation takes 9,424 clock cycles (≈ 26.1% overhead). If all the look-up tables are stored in the flash memory (so that the LPM instruction, which takes 3 clock cycles, has to be used), the approach takes 13,640 clock cycles (≈ 82.5% overhead). Therefore, compared to the most popular time redundancy countermeasure, which runs the encryption twice and compares the results [3], the encoding method provides reasonable timing overheads, especially if the look-up tables can be stored in the SRAM.
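The overhead percentages quoted above follow directly from the cycle counts; the short Python check below reproduces the arithmetic. The variable and function names are illustrative only.

```python
# Cycle counts quoted in the text.
TOTAL_CYCLES_REF = 8721      # reference PRESENT-80 implementation, including key schedule
KEY_SCHEDULE_CYCLES = 1248   # approximate cost of the key schedule
BASELINE = TOTAL_CYCLES_REF - KEY_SCHEDULE_CYCLES   # round keys assumed precomputed


def overhead_percent(cycles):
    """Relative timing overhead with respect to the baseline, in percent."""
    return 100.0 * (cycles - BASELINE) / BASELINE


if __name__ == "__main__":
    print(BASELINE)                              # baseline cycle count
    print(round(overhead_percent(9424), 1))      # tables in SRAM
    print(round(overhead_percent(13640), 1))     # tables in flash
```

For comparison, running the encryption twice costs at least 100% overhead by the same measure, before the comparison step is counted.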
While the speed of the implementation might be reasonable, the memory overheads quickly grow to sizes that are not practical for real-world cryptography. It has to be noted that even if the code length is smaller than the memory address length, the table normally has to occupy the size according to the address length; otherwise the unused bits in the address could be faulted and would point to another part of the memory that is used for a different purpose. Therefore, if we want to use a binary anticode of length 6 in a device with a 16-bit addressing space, the constructed table still has to be of size 8 × 8 bits. For such an architecture, codes longer than 8 bits would not be possible: if the code length is between 9 and 16, we need a 32-bit addressing space. Also, the number of codewords does not affect the memory requirements, since the table size for the same code length is constant; only the number of non-zero values changes with a different number of codewords. Efficient implementation of encoding schemes therefore still remains an open problem.
Table 3 provides memory requirements for some standard cryptographic operations. Since block ciphers combine several functions in order to achieve the security requirements for confusion and diffusion, several tables normally have to be stored in the memory. For example, the PRESENT implementation in Section 7 uses one xor table and eight shifting tables for the combined pLayer and sBoxLayer, resulting in a total of 81,920 bytes of memory. To test the feasibility, we made an implementation for the Atmel ATmega328P, an 8-bit microcontroller. However, only the eight smaller tables could fit into the device memory, while the big xor table had to be put on an external EEPROM module (256 Kbit Microchip 24LC256).
Instruction Modification
Recently, a fault attack approach utilizing instruction replacement has emerged [35]. To date, there is no dedicated protection against such a fault model. In [35], the attacker has to change the instruction opcode that specifies the operation; e.g. when changing ADD to SUB in AVR, as presented in [35], the instruction opcode needs to be changed from 000011 to 000110. If standard instructions are used and the attacker is able to achieve this model on a particular device, she can do so for any implementation executed on that device. However, in the case of the table look-up based fault resilient anticode scheme, the means to achieve an instruction replacement that results in executing another operation are different. Each operation is executed by fetching the result from a table, and hence the address of this table specifies the operation. Instead of changing the instruction opcode, the attacker needs to know the address of the table for the operation she wants to switch to. Such an address would vary from implementation to implementation, and it is not trivial to predict whether the attacker would be able to achieve such a precise change.
Cache Timing Attacks
Look-up tables in general are susceptible to cache timing attacks, since fetching a value from one position in the table takes a different time compared to using another position due to cache misses [36]. As mentioned in [37], there are various ways of protecting such implementations. One way is to use two different round function implementations: some rounds use look-up tables, while the others do not. This method can be further investigated in order to provide the best properties w.r.t. cache-timing, power, and fault attacks. Another approach is cache warming, which loads the whole table into the cache, resulting in constant execution time and avoiding cache misses completely. Furthermore, one can add random delays in the execution to make the attack harder.
Other Fault Analysis Methods
Apart from the Differential Fault Analysis (DFA), there are several other methods that can be used by the attacker. There are methods that have similar requirements to DFA, such as Collision Fault Analysis or Algebraic Fault Analysis, where the knowledge of the fault propagation is necessary in order to get the secret information. Therefore, our scheme can be applied as a countermeasure for these methods as well.
On the other hand, there are approaches that utilize the behavior where the fault does not propagate in all cases, such as Safe-Error Analysis or Ineffective Fault Analysis (recently utilized in [38]). These two methods, when used for block ciphers, require a stuck-at fault model, i.e. a model where a certain value becomes either '0' or '1', no matter what value was in the register before. The attacker then just needs the information whether the output is faulty or not, without knowledge of the fault value. Therefore, any error detection method that outputs ⊥ reveals such information to the attacker. Even if it carries out the computation one more time and provides a correct output on the second run, there is already a timing difference that can be observed. However, these attacks can be thwarted by well-designed error correction codes. Some results in this direction are stated in [18], along with the code properties. Similar properties could be derived for the fault resilient anticode scheme in case such protection is necessary.
Conclusion
In this paper, we have formalized fault resilient anticode schemes and provided a way to evaluate software implementations protected by anticodes. We have practically implemented and evaluated the symmetric block cipher PRESENT with encoded operations, using assembly code for an 8-bit microcontroller.
For the future work, we would like to extend our evaluation methodology to pipelined architectures.
, where $\ell_N = \min\{\ell : k \mid (N + \ell)\}$, i.e. we add $\ell_N$ zero bits to $x$ to get $x'$ so that the length of $x'$ is divisible by $k$. Let $\mathrm{Encoder}_C(x) := \mathrm{Encoder}_C(x')$.
• If k|N, say N = kk , for any x = (x 1 , x 2 , . . . ,
It follows that Encoder C : F that Decoder C C k → F N 2 is the inverse of Encoder C and Decoder C (F nk 2 ∪{⊥})\C k = {⊥}. Example. Let us consider a (2, 2, 2)−binary code C = {00, 11} with associated encoding-decoding scheme as follows:
$\mathrm{Encoder}_C$: 0 → 00, 1 → 11
$\mathrm{Decoder}_C$: 00 → 0, 01 → ⊥, 10 → ⊥, 11 → 1.
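The example scheme for C = {00, 11} can be written out as a small Python sketch. Representing ⊥ by `None`, using bit strings, and the function names are assumptions made for illustration. The blockwise extension shown here encodes each message bit separately and rejects the whole word as soon as one block is not a codeword.

```python
BOTTOM = None                     # stands in for the error symbol
ENC = {"0": "00", "1": "11"}      # Encoder_C for C = {00, 11}
DEC = {"00": "0", "11": "1"}      # Decoder_C on codewords; everything else is an error


def encode(bits):
    """Encode a bit string by replacing every bit with its codeword."""
    return "".join(ENC[b] for b in bits)


def decode(word):
    """Decode blockwise; return BOTTOM if any 2-bit block is not a codeword."""
    if word is BOTTOM or len(word) % 2 != 0:
        return BOTTOM
    out = []
    for i in range(0, len(word), 2):
        symbol = DEC.get(word[i:i + 2])
        if symbol is None:
            return BOTTOM
        out.append(symbol)
    return "".join(out)


if __name__ == "__main__":
    codeword = encode("10")
    print(codeword, decode(codeword), decode("1101"))
```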
Extend Encoder C , Decoder C to F 2 2 , we have the following encoding-decoding scheme: ∪ {⊥} → C km+2 ∪ {⊥}, such that for y = Encoder C (a) ∈ C km+1 , ϕ C (g 2 )(y) = Encoder C (g 2 (a)) and ∀y ∈ F nkm+1 2 ∪ {⊥}\C km+1 , ϕ C (g 2 )(y) =⊥. We have ϕ C (g 2 ) • ϕ C (g 1 ) is a map 
B.2 Proof of Proposition 1
Proof. 1. For any j ∈ F n 2 , ς 3, j (F ) always has the same output as F for any plaintext x and any key value key. Thus ς 3, j is safe for any j ∈ F n 2 . For a given plaintext x and a key value key, let y be the correct output of F . Then for any j ∈ F and let C := {c : c ∈ C}. Then dis (c 1 C is an (n, d, δ) −binary anticode, then we're done. Otherwise, dis (c 3 ,c 4 ) = δ − 1. Suppose c 3 = (x 1 , x 2 , . . . , x n ) and take C := C \{c 3 } ∪ {(0, x 2 , . . . , x n )}}, then dis ((0, x 2 , . . . , x n ),c 4 ) = δ and C is an (n, d, δ)−binary anticode. vi, vii. Let C be an (n, M, 2r − 1, δ) binary anticode. Take c 1 , c 2 , c 3 , c 4 ∈ C such that dis (c 1 , c 2 ) = 2r − 1 and dis (c 3 , c 4 ) = δ. We add one parity check bit for each codeword in C to get a binary anticode C : For any c = (c 1 , c 2 , . . . , c n ) ∈ C, definec := (c 1 , c 2 , . . . , c n , c 1 + c 2 + · · · + c n mod 2) and let C := {c : c ∈ C}. Since 2r − 1 is odd, dis (c 1 ,c 2 ) = 2r and ∀x, y ∈ C, dis (x, y) ≥ 2r. If δ = 2 − 1 is odd, dis (c 3 ,c 4 ) = 2 and ∀x, y ∈ C , dis (x, y) ≤ 2 . So C is an (n, M, 2r, 2 )−binary anticode. This proves vi. If δ = 2 is even, ∀x, y ∈ C with dis (x, y) = δ, dis (x , y ) = δ and we have C is an (n, M, 2r, 2 )−binary anticode. This proves vii. 
Appendix C Further Results on Fault Analysis
Appendix D Anticodes
