Abstract. Embedded devices performing RSA signatures are subject to Fault Attacks, particularly when the Chinese Remainder Theorem is used. In most cases, the modular exponentiation and the Garner recombination algorithms are targeted. To thwart Fault Attacks, we propose a new generic method of computing modular exponentiation and we prove its security in a realistic fault model. By construction, our proposal is also protected against Simple Power Analysis. Based on our new resistant exponentiation algorithm, we present two different ways of computing CRT RSA signatures in a secure way. We show that those methods do not increase execution time and can be easily implemented on low-resource devices.
Introduction
In 1997, Boneh, DeMillo and Lipton [1] introduced a new type of cryptanalysis based on erroneous computations: Fault Attacks (FA). Various public-key cryptosystems were affected, but the RSA algorithm was especially targeted. Indeed, Fault Attacks are particularly effective when the Chinese Remainder Theorem (CRT) is applied. Using these techniques, an RSA modulus of arbitrary length can be factorized practically instantly on a PC.
Fault Attacks can be directed at cryptographic embedded devices, like smart cards, as shown in [2]. Straightforward protection mechanisms compute the signature twice, or verify it by performing the inverse operation. Nevertheless, this can be time consuming and further complicated if the corresponding public key is unknown to the device. So, alternative counter-measures, inside the algorithm itself, have been proposed to protect RSA signature computations against Fault Attacks [3] [4] [5] [6] [7]. Unfortunately, many of them have been broken since their publication [2, 8, 9].
This work was done while the author was with Oberthur Card Systems.
Counteracting FA is not sufficient to ensure the security of an embedded cryptosystem. Indeed, another threat comes from physical leakage during cryptographic computations. A category of attacks, called Side Channel Analysis (SCA), exploits this leakage to retrieve information about sensitive data manipulated by the algorithm. Among these attacks, Simple Power Analysis (SPA) is the easiest to mount in practice and an implementation of a cryptosystem in mobile devices must thwart it. Counteracting FA and SPA attacks at the same time is an issue. Indeed, some counter-measures against SPA can be exploited by elaborate Fault Attacks such as Safe Error Attacks. In fact, it appears that Simple Power Analysis and Fault Attacks (classical and Safe Error) must be simultaneously taken into account when implementing cryptographic algorithms.
The paper is organized as follows. In the next section we briefly recall the RSA cryptosystem, the use of the Chinese Remainder Theorem to speed up generation of RSA signatures and the description of Fault Attacks directed against it. Then, in Sect. 3, we present our method for computing a modular exponentiation protected against Fault Attacks, proving its security in a practical fault model whose relevance to (real life) scenarios is discussed. The new algorithm is used in Sect. 4 to design two CRT RSA implementations resistant to FA and SPA.
RSA and Physical Attacks

RSA Cryptosystem
The public-key cryptosystem RSA [10] involves a public modulus N , which is the product of two large secret primes p and q. The public exponent e is co-prime with (p − 1) · (q − 1) and the private exponent d is the modular inverse of e modulo (p − 1) · (q − 1).
An RSA signature S of a message M is computed with the following formula: $S = M^d \bmod N$.
To speed up the exponentiation on low-resource devices, like smart cards, one usually applies the Chinese Remainder Theorem [11]. The resulting CRT RSA signature algorithm is about four times faster than the classical method. It involves two modular exponentiations and a recombination step using Garner's Algorithm [12]. It needs 5 parameters: the two large primes p and q, the values $d_p = d \bmod (p-1)$ and $d_q = d \bmod (q-1)$, and the pre-computed value $A = p^{-1} \bmod q$.
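To make the role of the five parameters concrete, we give a minimal Python sketch of a CRT RSA signature with Garner's recombination. The toy primes, the exponent e = 17, the message value and the function name `crt_rsa_sign` are assumptions chosen for illustration, and the recombination formula is one common form of Garner's Algorithm, not necessarily the exact one of Algorithm 2.1.

```python
def crt_rsa_sign(M, p, q, d_p, d_q, A):
    """CRT RSA signature; A = p^{-1} mod q is Garner's coefficient."""
    S_p = pow(M % p, d_p, p)          # half-size exponentiation modulo p
    S_q = pow(M % q, d_q, q)          # half-size exponentiation modulo q
    # Garner recombination: the result equals S_p mod p and S_q mod q.
    return S_p + p * (((S_q - S_p) * A) % q)

# Toy parameters (assumed for illustration only; far too small for real use).
p, q, e = 61, 53, 17
N = p * q
d = pow(e, -1, (p - 1) * (q - 1))     # private exponent (Python 3.8+)
d_p, d_q = d % (p - 1), d % (q - 1)
A = pow(p, -1, q)

M = 65
S = crt_rsa_sign(M, p, q, d_p, d_q, A)
```

The two exponentiations work on operands of half the size of N, which is where the speed-up over the direct computation of M^d mod N comes from.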
In Algorithm 2.3, each iteration of the loop involves a modular multiplication whatever the bit-value of the exponent d. Since the sequence of operations performed is independent of the key bits, attacks such as SPA no longer apply.
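Algorithm 2.3 itself is not reproduced here. As an illustration, the following Python sketch shows one right-to-left square-and-multiply-always variant that is consistent with the description above. The register names S[0], S[1] and A follow those used in Sect. 3; the index convention (bit 1 updates S[0], bit 0 updates the dummy register S[1]) and the function name are assumptions.

```python
def square_and_multiply_always(M, d, N, n=None):
    """Right-to-left exponentiation performing one multiplication and one
    squaring per exponent bit, whatever the value of the bit."""
    n = d.bit_length() if n is None else n
    S = [1, 1]      # S[0]: real accumulator, S[1]: dummy accumulator
    A = M % N       # A holds M^(2^i) at the start of iteration i
    for i in range(n):
        d_i = (d >> i) & 1
        # Bit 1 updates S[0]; bit 0 performs a dummy update of S[1].
        S[1 - d_i] = (S[1 - d_i] * A) % N
        A = (A * A) % N
    return S[0]

result = square_and_multiply_always(65, 2753, 3233)
```

This is only a functional model: Python integers and list indexing give no guarantee about power consumption or timing, so the sketch shows the operation sequence, not an SPA-resistant implementation.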
Fault Attacks on CRT RSA Algorithm
Fault Attacks have been suggested by Boneh et al. [1] . They observed that if a device outputs an erroneous CRT RSA signature, an attacker can deduce the private key from this information and the correct signature.
Indeed, let us assume that an error occurs during one of the modular exponentiations of Algorithm 2.1. This results in an incorrect intermediate result, e.g. $\tilde{S}_p$, which will generate an erroneous signature $\tilde{S}$. The faulty signature $\tilde{S}$ and the correct signature $S$ are likely to satisfy $\tilde{S} \not\equiv S \bmod p$ and $\tilde{S} \equiv S \bmod q$. Consequently, if $S - \tilde{S}$ is not divisible by p, the prime number q is revealed by a gcd computation: $q = \gcd(S - \tilde{S}, N)$.
Remark 1. As noticed in [18], the attack can also be performed without the knowledge of the correct signature: computing $\gcd(\tilde{S}^e - M, N)$ also reveals q.
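The attack can be reproduced on toy parameters. The sketch below is a simulation under assumed values: the fault is modelled by adding 1 to the half-signature modulo p, and the key, the message and the function name are chosen for illustration only.

```python
from math import gcd

def crt_rsa_sign(M, p, q, d_p, d_q, A, fault=0):
    """CRT RSA signature; `fault` is added to S_p to model a disturbed
    exponentiation modulo p (fault=0 gives the correct signature)."""
    S_p = (pow(M % p, d_p, p) + fault) % p
    S_q = pow(M % q, d_q, q)
    return S_p + p * (((S_q - S_p) * A) % q)

# Toy key (assumed values, Python 3.8+ for the modular inverses).
p, q, e = 61, 53, 17
N = p * q
d = pow(e, -1, (p - 1) * (q - 1))
d_p, d_q, A = d % (p - 1), d % (q - 1), pow(p, -1, q)

M = 65
S = crt_rsa_sign(M, p, q, d_p, d_q, A)
S_faulty = crt_rsa_sign(M, p, q, d_p, d_q, A, fault=1)

# The faulty signature is still correct modulo q but not modulo p.
factor_with_S = gcd(S - S_faulty, N)
# Variant of Remark 1: only the message and the public exponent are needed.
factor_without_S = gcd(pow(S_faulty, e, N) - M, N)
```

In both cases a single faulty signature is enough to recover a prime factor of N.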
The classical protection against this attack is to verify the computed signature with the public exponent e before sending the signature. Since the erroneous signature is not returned, the gcd computation can no longer be performed. However, this can be costly in time (depending on the value of e) and sometimes impossible, if the public key is unknown to the device.
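A minimal sketch of this classical protection is given below, again on assumed toy parameters. The function names and the convention of returning `None` instead of a faulty signature are illustrative choices, not part of the original description.

```python
def crt_rsa_sign(M, p, q, d_p, d_q, A, fault=0):
    """CRT RSA signature; `fault` models a disturbance of S_p."""
    S_p = (pow(M % p, d_p, p) + fault) % p
    S_q = pow(M % q, d_q, q)
    return S_p + p * (((S_q - S_p) * A) % q)

def sign_with_verification(M, e, p, q, d_p, d_q, A, fault=0):
    """Release the signature only if S^e = M (mod N)."""
    N = p * q
    S = crt_rsa_sign(M, p, q, d_p, d_q, A, fault)
    if pow(S, e, N) != M % N:
        return None               # the faulty signature is withheld
    return S

# Toy key (assumed values, Python 3.8+ for the modular inverses).
p, q, e = 61, 53, 17
d = pow(e, -1, (p - 1) * (q - 1))
d_p, d_q, A = d % (p - 1), d % (q - 1), pow(p, -1, q)

good = sign_with_verification(65, e, p, q, d_p, d_q, A)
bad = sign_with_verification(65, e, p, q, d_p, d_q, A, fault=1)
```

The additional exponentiation by e is cheap when e is small, but its cost approaches that of a full signature when e is large.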
In Sections 2.2 and 2.3, we recalled two simple ways of thwarting SPA and FA separately. In the next section, we show that this approach is not enough to obtain a secure implementation of a modular exponentiation.
Fault Attacks on SPA-resistant RSA Algorithm
Algorithm 2.3 ensures protection against SPA, but introduces a weakness with respect to another type of Fault Attacks, known as Safe Error, as described in [19] .
When an exponent bit equals 0, the result of the dummy computation of the modular multiplication is not used any more in the algorithm. Consequently, if a fault is induced on this modular multiplication, an attacker can determine the value of the bit, depending on the correctness or the incorrectness of the modular exponentiation. If the result is correct, the modified modular multiplication was a dummy operation, and so the bit of the exponent was 0. On the contrary, if the result is erroneous, the modified modular multiplication was used in the rest of the algorithm, meaning that the exponent bit was 1.
This kind of attack can be applied to an RSA signature to recover the private key, irrespective of whether it uses the CRT mode. This kind of cryptanalysis requires more work on the part of the attacker than the analysis discussed in Sect. 2.3 where only one fault was sufficient to obtain the private key. However, it is much more powerful since it thwarts the classical counter-measure consisting in checking the signature before sending it.
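The Safe Error principle can be simulated as follows. The sketch assumes the right-to-left square-and-multiply-always structure with the dummy register S[1], a fault that adds 1 to the result of one chosen multiplication, and an attacker who only learns whether the final output is correct. All names and parameters are illustrative.

```python
def sma_with_fault(M, d, N, n, fault_at=None):
    """Square-and-multiply-always; if fault_at = i, the result of the
    multiplication of iteration i is corrupted."""
    S = [1, 1]
    A = M % N
    for i in range(n):
        d_i = (d >> i) & 1
        prod = (S[1 - d_i] * A) % N
        if i == fault_at:
            prod = (prod + 1) % N     # arbitrary non-zero disturbance
        S[1 - d_i] = prod
        A = (A * A) % N
    return S[0]

def recover_exponent(M, d, N, n):
    """Safe Error attack: one faulted execution per exponent bit."""
    reference = sma_with_fault(M, d, N, n)
    bits = 0
    for i in range(n):
        # A wrong output means the faulted multiplication was a real one,
        # hence the corresponding exponent bit equals 1.
        if sma_with_fault(M, d, N, n, fault_at=i) != reference:
            bits |= 1 << i
    return bits

recovered = recover_exponent(65, 2753, 3233, 12)
```

In this model one faulted execution is needed per exponent bit, which is why the attack costs more than the single fault of the gcd attack. Checking the signature before output does not help, since the attacker only needs to know whether an error was raised.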
The attack described above illustrates the difficulty of thwarting SPA and FA simultaneously. In the following section, we present a new method to compute modular exponentiation resistant against Simple Power Analysis, Fault Attacks and Safe Error Attacks.
Exponentiation Resistant to Fault Attacks
Our Proposal
Our idea consists essentially in modifying an SPA-resistant algorithm by introducing some coherence test at the end. This test aims at ensuring that no fault has been induced during the execution of the algorithm. In fact, our reasoning is very close to that proposed in [5] and [20] .
Before explaining the core idea of our proposal, let us recall the content of the loop of Algorithm 2.3:
Our idea is based on the three following observations:
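- The value A is independent of d. At the end of the algorithm, A satisfies: $A = M^{2^n} \bmod N$, where $n$ denotes the bit-length of $d$.
- The variable S[1] is multiplied by A each time an exponent bit equals 0. At the end of the algorithm, it therefore satisfies $S[1] = M^{\bar{d}} \bmod N$, where $\bar{d} = 2^n - 1 - d$ is the bitwise complement of $d$.
- Since S[0] equals $M^d \bmod N$, the following relation holds for the content of A after the loop:
$M \cdot S[0] \cdot S[1] \equiv M^{1 + d + \bar{d}} \equiv M^{2^n} \equiv A \pmod{N}$ (1)
Equation (1) provides the coherence test performed at the end of the algorithm: if it does not hold, the algorithm returns "Error".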
As can easily be checked, our algorithm is still resistant to SPA: a modular square always follows a modular multiplication, independently of the value of the exponent. We have added two modular multiplications to the original version (Algorithm 2.3). One modular multiplication can be avoided if, at the beginning of the algorithm, S[1] is initialized with the message M. But as we will argue in Sect. 4, the re-use of the message at the end of the algorithm is useful when it comes to protecting a CRT RSA that performs exponentiations with Algorithm 3.1. To prove the resistance of our proposal to Fault Attacks, we first have to clarify the capabilities of an attacker. In the following, we define the model in which our algorithm will be proved secure.
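The following Python sketch illustrates the construction. It is a reconstruction under assumptions, not a transcription of Algorithm 3.1: we assume a right-to-left loop with S[0] and S[1] initialized to 1 and A initialized to M, and a final test made of the equality M · S[0] · S[1] = A mod N together with the condition A ≠ 0. The function name and the use of an exception for the "Error" output are illustrative.

```python
def fa_resistant_exp(M, d, N, n):
    """Square-and-multiply-always followed by a coherence test.
    Returns M^d mod N, or raises if the test fails.
    n must be at least the bit-length of d."""
    S = [1, 1]
    A = M % N
    for i in range(n):
        d_i = (d >> i) & 1
        S[1 - d_i] = (S[1 - d_i] * A) % N   # multiplication (real or dummy)
        A = (A * A) % N                     # squaring, always performed
    # Fault-free values: S[0] = M^d, S[1] = M^(2^n - 1 - d), A = M^(2^n).
    if (M * S[0] * S[1]) % N == A and A != 0:
        return S[0]
    raise RuntimeError("Error")

result = fa_resistant_exp(65, 2753, 3233, 12)
```

The two multiplications of the final test are the only overhead with respect to the unprotected loop. The dummy register S[1] now enters the test, so its content is no longer discarded.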
Attacker Model
As argued in [3] and [8] , sensitive applications (e.g. Banking, GSM or Identity Card) cannot make use of countermeasures with ad hoc security but need countermeasures which are provably secure against a precisely modeled adversary. Blömer et al. [3] , Wagner [8] and, more recently, Lemke-Rust and Paar [21] have introduced adversarial models for Fault Analysis. They consider various natures of faults and attack scenarios with a focus on pervasive computing on low-cost cryptographic devices. The attacker model presented hereafter follows the outlines of those described in [3] and [8] . It is divided into three parts which respectively aim at specifying how the attacker interacts with the device, the kind of variable targeted during the attack and the type of fault.
We shall assume that the attacker is only able to induce one fault per execution of the algorithm (this assumption is discussed in [2] ). In [3] , Blömer et al. identified three different ways to induce faults on an algorithm.
1. Modification of the input parameters [22].
2. Modification of the algorithm execution [23].
3. Modification of the local variables [3].
A powerful adversary is able to induce a fault in the three different manners listed above, and today's devices are usually equipped with hardware mechanisms that make the task of such an adversary as difficult as possible. Adding redundancy through hardware functions (e.g. based on error correcting codes or on hash functions) is often sufficiently effective to protect an implementation against permanent modification of input parameters (first model). Hardware mechanisms can also be successfully used to guarantee the correctness of an algorithm execution (second model) and they give confidence that the algorithm does not end before all the exponent bits are processed [23]. Even if they are effective and efficient against fault inductions of types 1 and 2, hardware mechanisms are rarely able to thwart attacks based on the perturbation of local variables. Defeating such attacks is usually the main role of software countermeasures. In the rest of the paper, we shall consider an adversary that modifies local variables, assuming that the security against the two other kinds of fault induction is ensured by the hardware.
Remark 2.
In Appendix A, we propose a slightly modified version of Algorithm 4.1 in which a simple mechanism has been added to counteract some fault injections belonging to the first and the second categories of faults. This version may be used when the effectiveness of some hardware countermeasures is in doubt. It makes it possible to check that the loop has been entirely executed and that the exponent d used during the calculation (and temporarily stored in RAM) has not been modified and equals the exponent d stored in the non-volatile memory of the device.
Let $X$ denote the value of an n-bit local variable and let $\tilde{X}$ denote the corresponding faulty value. From $X$ and $\tilde{X}$ one can deduce an error vector ε such that $\tilde{X} = X + \varepsilon$. The nature of the error vectors ε essentially depends on the adversary type: a strong adversary shall be able to disturb the value of a local variable at a very precise position (e.g. a bit modification at a given position), whereas a weak adversary could induce a fault but could not determine its position or its value. Blömer et al. exhibited in [3] four different kinds of fault. We recall their classification hereafter.
1. Precise Bit Errors. In the strongest scenario, an attacker can change the value of one bit: $\tilde{X} = X \pm 2^k$ for $0 \le k \le n - 1$.
2. Precise Byte Errors. One selected byte is affected by the attack: $\tilde{X} = X \pm b \cdot 2^k$ for a known $0 \le k \le n - 8$ and an unknown $0 \le b \le 255$.
3. Unknown Byte Errors. One random byte is affected by the attack: $\tilde{X} = X \pm b \cdot 2^k$ for an unknown $0 \le k \le n - 8$ and an unknown $0 \le b \le 255$.
4. Random Errors. An attacker has no knowledge of the modification.
In our security proof exhibited in the next section, we shall not need to focus on a type of fault in particular and we will prove that our proposal is secure whatever the nature of the fault ε induced by the adversary.
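For readers who wish to experiment with these fault classes, the sketch below models them as Python functions acting on an n-bit value. The function names are hypothetical, byte errors are modelled by overwriting an 8-bit window (which yields an error of the form ± b · 2^k), and the Random Errors class is modelled, as an assumption, by replacing the value with a uniformly random n-bit value.

```python
import random

def precise_bit_error(X, k):
    """Flip bit k of X, i.e. add or subtract 2^k."""
    return X ^ (1 << k)

def precise_byte_error(X, k, new_byte):
    """Overwrite the 8 bits of X starting at bit k. The error equals
    (new_byte - old_byte) * 2^k, i.e. +/- b * 2^k with 0 <= b <= 255."""
    mask = 0xFF << k
    return (X & ~mask) | ((new_byte & 0xFF) << k)

def unknown_byte_error(X, n, rng):
    """Same as above, but position and value are drawn at random."""
    k = rng.randrange(0, n - 7)           # 0 <= k <= n - 8
    return precise_byte_error(X, k, rng.randrange(256))

def random_error(n, rng):
    """Assumed model: the faulty value is a uniformly random n-bit value."""
    return rng.getrandbits(n)

rng = random.Random(0)                    # fixed seed for reproducibility
X, n = 0x1234ABCD, 32
bit_faulty = precise_bit_error(X, 5)
byte_faulty = precise_byte_error(X, 8, 0x00)
unknown_faulty = unknown_byte_error(X, n, rng)
random_faulty = random_error(n, rng)
```

In each case the error vector is recovered as the difference between the faulty value and X.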
Security Proof
The message M being assumed to be not null, it can be easily checked that A cannot equal 0 if no fault is introduced. An attack consisting in setting A to zero during the execution of the loop is thwarted by the second test at Step 7. In the rest of this section, we argue that the first test at Step 7 detects any other kind of fault induction in the model described in Sect. 3.2.
Wagner proposed in [8] a framework to prove the resistance of an algorithm against Fault Attacks. He suggests dividing the algorithm into a succession of finite states that correspond to single step computations and studying how faults propagate throughout the algorithm. Such an analysis makes it possible to establish that the fault is either detected by the algorithm or cannot be exploited by the attacker.
The algorithm is split up in such a way that the initial state corresponds to the input of the algorithm and the final state corresponds to the output. 
To prove that the coherence test at the end of our algorithm detects any error during the computation of the three variables, we simulate a fault in a random state i + 1 for the three schemes above:
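1. Attack changing the content of S[0]:
The wrong state $M^{\sum_{j=0}^{i} d_j \cdot 2^j} + \varepsilon$ implies a final state $\widetilde{M^d}$ satisfying:
$\widetilde{M^d} = M^d + \varepsilon \cdot M^{\sum_{j=i+1}^{n-1} d_j \cdot 2^j} \bmod N$, where $n$ denotes the bit-length of $d$.
2. Attack changing the content of S[1]. In a similar way, a disturbance of S[1] at any moment results in the following state:
$\widetilde{M^{\bar{d}}} = M^{\bar{d}} + \varepsilon \cdot M^{\sum_{j=i+1}^{n-1} \bar{d}_j \cdot 2^j} \bmod N$, where $\bar{d}_j = 1 - d_j$ and $\bar{d} = \sum_{j=0}^{n-1} \bar{d}_j \cdot 2^j$,
which differs from $M^{\bar{d}}$ if ε and M are not equal to 0 modulo N.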
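The two cases above can be checked exhaustively on a toy modulus. The sketch below rests on several assumptions: the loop structure and final test are the reconstructed ones (registers initialized to 1, 1 and M; test M · S[0] · S[1] = A mod N and A ≠ 0), the fault adds ε to the targeted register between two iterations, and the parameters N = 11 · 13, d = 103, M = 2 are illustrative.

```python
def run_with_fault(M, d, N, n, target, at, eps):
    """Checked exponentiation with eps added to S[target] just before
    iteration `at` (at = n means after the last iteration).
    Returns (detected, final content of S[0])."""
    S = [1, 1]
    A = M % N
    for i in range(n + 1):
        if i == at:
            S[target] = (S[target] + eps) % N   # single fault injection
        if i == n:
            break
        d_i = (d >> i) & 1
        S[1 - d_i] = (S[1 - d_i] * A) % N
        A = (A * A) % N
    passed = (M * S[0] * S[1]) % N == A and A != 0
    return (not passed), S[0]

# Toy parameters (assumed): M is co-prime with N, d has n = 7 bits.
M, d, N, n = 2, 103, 143, 7

undetected = [
    (target, at, eps)
    for target in (0, 1)              # 0: fault on S[0], 1: fault on S[1]
    for at in range(n + 1)
    for eps in range(1, N)            # every non-zero error modulo N
    if not run_with_fault(M, d, N, n, target, at, eps)[0]
]
```

With M co-prime with N, the list `undetected` is expected to stay empty, since the error term ε · M^(...) cannot vanish modulo N.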
3. Attack changing the content of A:
Contrary to the two previous cases, attacking A at a random state i + 1 impacts the content of the two other registers S[0] and S[1].
To better analyze this error propagation, let us rewrite the error in a multiplicative way:
i is co-prime with N , we deduce from the additive error ε the multiplicative error β such that:
is not co-prime with N , we denote by z the least common multiple of M and N . The error β is such that:
So, the different states of the three variables are the following
. . .
and the contents of S[0], S[1] and A finally equal
When applying our verification formula, we get:
which is different from the value M has been shown in [24] that those values are extremely rare. For instance if N is a RSA modulus equal to the product of two primes p and q, then we have β
, where f p and f q are the orders of β modulo p and q respectively. If p and q are such that p − 1 and q −1 are not divisible by large powers of 2, then the probability that this equality holds is comparable to the probability of factoring N by randomly picking one of its prime factors.
Consequently, any error in an intermediate state of the three variables will result in an erroneous result. Thus, we prove that the final check of our algorithm detects any disturbance of any variable during any step of the computation.
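The case of a fault on A can also be explored numerically. The sketch below is a toy experiment under assumptions, not a substitute for the proof: it uses the reconstructed loop and final test, injects an additive error into A between two iterations only, and works on the illustrative parameters N = 11 · 13, d = 103, M = 2. With this fault timing, a fault should pass the test only if the faulty value of A shares a prime factor with N, an event that is very unlikely for a modulus of real size.

```python
from math import gcd

def run_with_fault_on_A(M, d, N, n, at, eps):
    """Checked exponentiation with eps added to A just before iteration
    `at` (at = n means after the last iteration).
    Returns (detected, faulty value of A right after the injection)."""
    S = [1, 1]
    A = M % N
    faulty_A = None
    for i in range(n + 1):
        if i == at:
            A = (A + eps) % N
            faulty_A = A
        if i == n:
            break
        d_i = (d >> i) & 1
        S[1 - d_i] = (S[1 - d_i] * A) % N
        A = (A * A) % N
    passed = (M * S[0] * S[1]) % N == A and A != 0
    return (not passed), faulty_A

# Toy parameters (assumed): N = 11 * 13, 7-bit exponent.
M, d, N, n = 2, 103, 143, 7

total = 0
undetected = []
for at in range(n + 1):
    for eps in range(1, N):
        total += 1
        detected, faulty_A = run_with_fault_on_A(M, d, N, n, at, eps)
        if not detected:
            undetected.append((at, eps, faulty_A))
```

On such a small modulus the undetected cases are visible. Their proportion shrinks roughly like 1/p + 1/q as the primes grow.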
CRT RSA Resistant to Fault Attacks
In the previous section, we introduced an exponentiation algorithm and proved its security in a realistic fault model. However, even if the two modular exponentiations in the CRT RSA algorithm have not yet been compromised, the correctness of the whole algorithm is not guaranteed. Indeed, it has been shown in [2] that Garner's recombination can be successfully attacked using FA techniques.
The following algorithms use the same principle as the method described in [5] and [20] . A secure modular exponentiation algorithm (Algorithm 3.1) is used to prevent faults during the two exponentiations in the CRT RSA algorithm. Then, additional information given by this secure modular exponentiation is employed to check that the recombination step was not disturbed.
First Method
Algorithm 3.1 can be used to strengthen the security of a CRT RSA implementation but it has to be slightly modified. Instead of always returning the result of the exponentiation, it returns the three variables if they satisfy Equality (1). Garner's Algorithm is then applied three times, and finally a check is performed to verify that those results satisfy an equality we exhibit below. The goal of this coherence verification is to protect the recombination step.
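The structure of this first method can be sketched as follows. Since Algorithm 4.1 and its final equality are not reproduced here, the sketch relies on assumptions: the exponentiation is the reconstructed checked loop returning (S[0], S[1], A), and the final test is obtained by lifting Equality (1) through the CRT, namely M · S · S' = A mod N for the three recombined values. Function names and toy parameters are illustrative.

```python
def checked_exp(M, d, N, n):
    """Return the three registers (S0, S1, A) if the coherence test holds."""
    S = [1, 1]
    A = M % N
    for i in range(n):
        d_i = (d >> i) & 1
        S[1 - d_i] = (S[1 - d_i] * A) % N
        A = (A * A) % N
    if (M * S[0] * S[1]) % N != A or A == 0:
        raise RuntimeError("Error")
    return S[0], S[1], A

def garner(x_p, x_q, p, q, A_inv):
    """Recombine residues modulo p and q; A_inv = p^{-1} mod q."""
    return x_p + p * (((x_q - x_p) * A_inv) % q)

def crt_rsa_sign_checked(M, p, q, d_p, d_q, A_inv, n):
    """CRT RSA with three recombinations and a final coherence test.
    n must be at least the bit-length of both d_p and d_q."""
    N = p * q
    S_p, T_p, A_p = checked_exp(M % p, d_p, p, n)
    S_q, T_q, A_q = checked_exp(M % q, d_q, q, n)
    S = garner(S_p, S_q, p, q, A_inv)   # candidate signature
    T = garner(T_p, T_q, p, q, A_inv)   # recombined dummy registers
    A = garner(A_p, A_q, p, q, A_inv)   # recombined squares
    if (M * S * T) % N != A:            # assumed form of the final check
        raise RuntimeError("Error")
    return S

# Toy key (assumed values, Python 3.8+ for the modular inverses).
p, q, e = 61, 53, 17
d = pow(e, -1, (p - 1) * (q - 1))
d_p, d_q, A_inv = d % (p - 1), d % (q - 1), pow(p, -1, q)
n = max(d_p.bit_length(), d_q.bit_length())

S = crt_rsa_sign_checked(65, p, q, d_p, d_q, A_inv, n)
```

The message M enters the final test directly. As a result, a modification of M before one of the two exponentiations breaks the equality modulo the corresponding prime.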
Security. The security of Algorithm 4.1 with respect to FA is straightforwardly deduced from the coherence test and the analysis done in Sect. 3 (it thwarts in particular the recent attack [25]). The Square and Multiply Always structure of the algorithm makes it resistant against known-plaintext SPA attacks. SPA attacks assuming that the messages can be chosen by the adversary (e.g. [26, 27]) are out of the scope of this paper. Classical countermeasures such as the randomization of M (see for instance [28]) can be used together with our SPA/FA countermeasure to counteract such attacks by rendering the value of M unpredictable. The use of the message at the end of Algorithm 4.1 (during the last check) protects against modification of the message before one of the two exponentiations and thwarts the attack described in [8]. To ensure the validity of the other input parameters of Algorithm 4.1, hardware mechanisms may be used (for instance in order to check the CRC value of each parameter).
Complexity. This method requires adding only two Garner's recombinations and two modular multiplications to the classical CRT RSA algorithm. However, memory consumption is larger. Four l-bit values and two additional 2l-bit values are required compared to non-protected implementations.
As an alternative, we propose the following algorithm which detects an error with some probability.
Second Method
Our second proposal uses less memory than the previous one, but the coherence verification is probabilistic, with an error probability that depends on the bit-length b of a security parameter r. This means that an error can remain undetected with a probability equal to
