Abstract. This article describes concrete results and practically validated countermeasures concerning differential fault attacks on RSA using the CRT. We investigate smartcards with an RSA coprocessor where any hardware countermeasures to defeat fault attacks have been switched off. This scenario was chosen in order to analyze the reliability of software countermeasures. We start by describing our laboratory setting for the attacks. We then describe the experiments and results for a straightforward implementation of a well-known countermeasure. This implementation turned out to be insufficient. With the data obtained from these experiments we developed a practical error model. This enabled us to specify enhanced software countermeasures for which we were not able to produce any successful attacks on the investigated chips. Nevertheless, we are convinced that only sophisticated hardware countermeasures (sensors, filters, etc.) in combination with software countermeasures will be able to provide security.
Introduction
This paper shows and proves that fault attacks on RSA with the CRT (also known as Bellcore attacks) due to [BDL] are feasible. They are indeed devastating if neither hardware mechanisms (sensors, filters, etc.) nor appropriate software countermeasures are implemented in the underlying smartcard ICs. However, this does not imply that modern high-security smartcard ICs are vulnerable to this kind of attack. Instead, it shows that fault tolerance and especially sophisticated hardware countermeasures are essential for the design of secure hardware. Moreover, we stress that it is very difficult in the field to switch off these sophisticated hardware countermeasures. This was done exceptionally for our study of software countermeasures against the Bellcore attack.
To provide better security for data protected by strong encryption, more and more implementations on tamper-proof devices (e.g., smartcard ICs) are being proposed. The main reason is that smartcard ICs provide high reliability and security with more memory capacity and better performance characteristics than conventional magnetic stripe cards. Because of their computational ability, a large variety of cryptographic applications benefit from smartcard ICs. This attracted a large amount of research on physical attacks against smartcards, in 1996 due to [Koch], [BDL] and again in 1999 by [KJJ], followed by [GMO,SQ]. However, most research so far has focused on Timing or Power Analysis attacks. This is surprising, as fraud with smartcards by inducing faults is a reality, cf. [A, AK1, AK2], whereas no fraud via Timing or Power Analysis attacks has been reported so far. Moreover, research on fault-based cryptanalysis is not very active compared to the other side-channel attacks. Furthermore, no practical investigation of the Bellcore attack is presently known. Indeed, this topic is publicly addressed within this paper for the first time. The paper takes up a question of Kaliski and Robshaw [KR] of how practical these attacks might be; it is answered here definitively by physicists, designers and manufacturers of secure hardware.
The present paper is organized as follows: Section 2 briefly recalls RSA using the CRT and its fault-based cryptanalysis according to [BDL,JLQ]; it also discusses the advantages and limitations of the publicly known software countermeasures to defeat fault attacks on RSA in CRT mode. Section 3 first explains so-called spike attacks and their realization on smartcard ICs, their complexity from an attacker's point of view, and describes suitable test equipment to implement fault attacks. Second, we present the resulting errors on unprotected hardware and software for RSA in CRT mode. This demonstrates the insufficiency of a straightforward implementation of a well-known countermeasure due to [Sh]. In section 4 we investigate enhanced software countermeasures derived from our practical observations and our proposed model to counteract fault attacks on RSA. Finally, section 5 adds some practical conclusions concerning software countermeasures to prevent Bellcore attacks.
Preliminaries

The RSA System
Let $N = p \cdot q$ be the product of two large primes of similar length. To sign a message $m \in \mathbb{Z}_N$ using RSA one computes $S := m^d \bmod N$, where $d$ is the private exponent satisfying $e \cdot d \equiv 1 \bmod (p-1)(q-1)$ for the public exponent $e$. The computationally expensive part of signing is the modular exponentiation. For better efficiency most implementations exponentiate as follows: using repeated square and multiply they first compute $S_p := m^d \bmod p$ and then $S_q := m^d \bmod q$. Then they construct the signature $S = m^d \bmod N$ using the CRT. This last step takes negligible time compared to the two exponentiations. It is done efficiently by computing

$$S = S_q + q \cdot \big( (S_p - S_q) \cdot (q^{-1} \bmod p) \bmod p \big) \qquad (1)$$

using Garner's algorithm, cf. [Kn]. The exponentiation using the CRT is much faster than the full exponentiation. To see this, observe that

$$S_p = m^d \bmod p = m^{d \bmod (p-1)} \bmod p,$$

where $d$ is of order $N$, whereas $d \bmod (p-1)$ is of order $p$. Consequently, computing $S_p$ requires half as many multiplications as computing $S$ directly. In addition, intermediate values during the computation of $S_p$ are only half as big: they are in the range $[1, \ldots, p]$ rather than $[1, \ldots, N]$. Clearly, the same arguments are valid for the computation of $S_q$. When multiplication has quadratic time complexity, multiplying two numbers in $\mathbb{Z}_p$ takes a quarter of the time of multiplying two elements in $\mathbb{Z}_N$. Hence, computing $S_p$ takes an eighth of the time of computing $S$ directly, and computing both $S_p$ and $S_q$ this way takes a quarter of the time of computing $S$ directly. Thus, CRT exponentiation is four times faster than direct exponentiation. This is the reason for using the CRT for RSA signature generation, cf. [CQ,MvOV].
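The CRT signing procedure can be illustrated with a minimal Python sketch. It is not the implementation running on the investigated smartcards; the function name `crt_sign`, the variable names and the toy primes (two Mersenne primes standing in for real RSA primes) are assumptions made for illustration only.

```python
# Sketch of RSA-CRT signing with Garner's recombination.
# All names and parameters are illustrative, not taken from the paper.

def crt_sign(m, p, q, dp, dq, q_inv):
    """Return m^d mod (p*q) from two half-size exponentiations."""
    s_p = pow(m, dp, p)            # S_p = m^(d mod (p-1)) mod p
    s_q = pow(m, dq, q)            # S_q = m^(d mod (q-1)) mod q
    h = ((s_p - s_q) * q_inv) % p  # (S_p - S_q) * q^-1 mod p
    return s_q + h * q             # S = S_q + q * h, already reduced mod p*q


# Toy parameters: 2^61 - 1 and 2^89 - 1 are primes, far too small for real use.
p, q = 2**61 - 1, 2**89 - 1
e = 65537
d = pow(e, -1, (p - 1) * (q - 1))  # modular inverse, Python 3.8+
dp, dq, q_inv = d % (p - 1), d % (q - 1), pow(q, -1, p)
m = 0x1234567890ABCDEF

signature = crt_sign(m, p, q, dp, dq, q_inv)
```

The result agrees with the direct computation `pow(m, d, p * q)`, while each of the two exponentiations works with an exponent and a modulus of roughly half the length.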
The Fault-Based Cryptanalysis of RSA Using CRT
We briefly recall the fault-based cryptanalysis of RSA with the CRT due to [BDL,JLQ]. Assume that during the computation of an RSA signature for a message $m$ a random error occurs during the computation of $S_p$. This yields a faulty signature part $\tilde{S}_p$, whereas the computation of $S_q$ is done correctly. The combination of $\tilde{S}_p$ and $S_q$ via (1) yields an incorrect signature $\tilde{S}$. For $\tilde{S}$ it holds that $S - \tilde{S} \not\equiv 0 \bmod p$ but $S - \tilde{S} \equiv 0 \bmod q$. Therefore, one obtains the factorization of $N$ by computing $\gcd((m - \tilde{S}^e) \bmod N, N) = q$.
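The attack can be reproduced in software with a short Python sketch. The fault is modeled here as a single bit flip in $S_p$, which is an assumption for illustration (the spikes discussed later produce less controlled errors); the function names and the toy primes are likewise hypothetical.

```python
from math import gcd

def crt_sign_faulty(m, p, q, dp, dq, q_inv, flip_bit=None):
    """RSA-CRT signature; optionally flip one bit of S_p to model a fault."""
    s_p = pow(m, dp, p)
    if flip_bit is not None:
        s_p ^= 1 << flip_bit       # transient fault in the first half
    s_q = pow(m, dq, q)
    return s_q + (((s_p - s_q) * q_inv) % p) * q

def bellcore_factor(m, s_faulty, e, n):
    """Return gcd((m - S~^e) mod N, N), which is q for a fault in S_p."""
    return gcd((m - pow(s_faulty, e, n)) % n, n)


# Toy parameters (Mersenne primes), for illustration only.
p, q = 2**61 - 1, 2**89 - 1
e = 65537
d = pow(e, -1, (p - 1) * (q - 1))
dp, dq, q_inv = d % (p - 1), d % (q - 1), pow(q, -1, p)
m = 0x1234567890ABCDEF

s_bad = crt_sign_faulty(m, p, q, dp, dq, q_inv, flip_bit=3)
recovered = bellcore_factor(m, s_bad, e, p * q)
```

A single faulty signature on a known message is enough: `recovered` equals `q`, and `p` follows by division.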
Simple Software Countermeasure to Defeat the Fault Attack
Some simple ad-hoc countermeasures have already been suggested in [BDL, KR]. One approach is to perform the calculation twice; the other is to verify the correctness of the signature by applying the inverse operation and comparing the result with the input $m$. The first approach is very time-consuming and cannot always provide a satisfactory solution, because a permanent error may remain undetected even when the function is computed more than once. The second approach is generally not satisfactory either, since the parameter $e$ could be a large integer, in which case the checking procedure becomes time-consuming. Additionally, in a real-life software implementation the programmer cannot rely on $e$ being known and small. On the other hand, this countermeasure seems to be the safest.
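The verification approach amounts to a single check before the signature leaves the device. The following Python sketch assumes that the public exponent `e` is available to the signing software, which, as noted above, is often not the case in practice; the function name `release_if_valid` and the toy parameters are hypothetical.

```python
def release_if_valid(m, s, e, n):
    """Return s only if s^e mod n reproduces m; otherwise return None."""
    return s if pow(s, e, n) == m % n else None


# Toy parameters (Mersenne primes), for illustration only.
p, q = 2**61 - 1, 2**89 - 1
n, e = p * q, 65537
d = pow(e, -1, (p - 1) * (q - 1))
m = 0x1234567890ABCDEF

good = pow(m, d, n)   # correct signature
bad = good ^ 1        # signature with one corrupted bit

released = release_if_valid(m, good, e, n)
withheld = release_if_valid(m, bad, e, n)
```

The cost of the check is one exponentiation with `e`, which is cheap for a small public exponent and expensive for a large one.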
An interesting countermeasure is the introduction of randomness into the RSA signature process. Here, RSA is applied to F (m, r) where F is some formatting function and r is a random string which ensures that the user never signs the same message twice and the attacker does not know the signed message, cf. [BDL,BR,KR] . Other countermeasures are mentioned in [Ro] .
Shamir's Software Countermeasure
Shamir's idea, cf. [Sh], is to select a random integer $t$ and to do the following computations:

$$S_{pt} := m^d \bmod (p \cdot t), \qquad S_{qt} := m^d \bmod (q \cdot t).$$

In the case of $S_{pt} \equiv S_{qt} \bmod t$ the computation is defined to be error free and $S$ is computed according to the CRT recombination equation (1). One drawback of Shamir's method, as pointed out in [JPY], is the following: within the CRT mode of real RSA applications the value $d$ is not known; only the values $d_p = d \bmod (p-1)$ and $d_q = d \bmod (q-1)$ are known. Although $d$ can be efficiently computed from $d_p$ and $d_q$ only, as described in [FS], this will limit the acceptance of Shamir's method. Moreover, his check will be shown to be insufficient anyway. Our enhanced software countermeasures, however, resolve these critical points of his method.
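Shamir's check can be sketched in Python as follows. The sketch is an illustration under stated assumptions: `t` is fixed to a 32-bit prime instead of being chosen at random, the fault is injected through a hypothetical `fault` callback, and the toy primes are far too small for real use. Note that the full exponent `d` is needed, which is exactly the drawback mentioned above.

```python
def shamir_sign(m, d, p, q, q_inv, t, fault=None):
    """RSA-CRT with Shamir's check; returns None if the check fails."""
    s_pt = pow(m, d, p * t)       # S_pt = m^d mod (p*t)
    s_qt = pow(m, d, q * t)       # S_qt = m^d mod (q*t)
    if fault is not None:
        s_pt = fault(s_pt)        # hypothetical hook modelling a transient fault
    if s_pt % t != s_qt % t:      # both must equal m^d mod t
        return None
    s_p, s_q = s_pt % p, s_qt % q
    return s_q + (((s_p - s_q) * q_inv) % p) * q


# Toy parameters, for illustration only.
p, q = 2**61 - 1, 2**89 - 1
e = 65537
d = pow(e, -1, (p - 1) * (q - 1))
q_inv = pow(q, -1, p)
t = 4294967291                    # 2^32 - 5, a 32-bit prime
m = 0x1234567890ABCDEF

ok = shamir_sign(m, d, p, q, q_inv, t)
caught = shamir_sign(m, d, p, q, q_inv, t, fault=lambda v: v ^ (1 << 5))
```

Here the fault changes $S_{pt}$ by 32, so the two residues modulo `t` differ and the faulty signature is withheld. A fault that leaves $S_{pt} \bmod t$ unchanged would pass this check.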
General Remarks on Methods to Overcome Fault Attacks
Research on countermeasures against fault attacks has emerged only very recently. For instance, a series of papers [YJ, YKLM1, YKLM2, JQYY] assumes that the attacker has very precise knowledge of the implementation details and, in particular, absolutely accurate control over the timing of the fault induction. Under this strong assumption the private exponent $d$ can be reconstructed by abusing the implemented correctness check as an oracle for the bits of $d$. However, all the described fault attacks can easily be prevented by various randomization techniques for the RSA algorithm. In a side-channel secure RSA signature implementation such techniques are present.
Moreover, [YKLM1] proposed the following very interesting countermeasure: their key idea is to influence the computation of $S_q$ or the overall computation of $S$ when an error occurred during the computation of $S_p$, or vice versa. In that case the cryptanalysis given in section 2 no longer yields a successful fault attack. Unfortunately, it was recently shown by [BMS] that their proposal for a so-called infective RSA-CRT computation is not secure.
Physical Fault Attacks Realization
First of all, we would like to stress again that modern high-end cryptographic devices, e.g., smartcards, are usually protected by numerous sophisticated hardware mechanisms that detect any attempt to intrude into their system behavior, cf. [Ma,NR]. Hardware manufacturers of cryptographic devices such as smartcard ICs have long been aware of the importance of protecting against intrusions by, e.g., external voltage variations, external clock variations, etc. However, it should be clear that the design of such mechanisms is a very difficult engineering task. Such mechanisms should be able to tolerate slight natural deviations from the standard values of the electrical parameter to be safeguarded. This is necessary to ensure proper functionality of the underlying device within the specified range, as described for example in [ISO]. On the other hand, they also have to detect very fast and unnaturally small deviations from the specified standard range. This condition is necessary to detect any attempt to alter a computation's result by modifying the electrical execution conditions. For example, the standard specification [ISO] allows a supply voltage between 4.5 V and 5.5 V on the smartcard IC's contact $V_{CC}$ under normal operating conditions.
Although there are lots of possibilities to introduce an error during the cryptographic operation of an unprotected smartcard hardware, we will only explain in detail the so-called spike attacks. The reason is that spike attacks are non invasive attacks. Thus, they require no physical opening and no chemical preparation of the smartcard IC. For further information on various methods how to enforce erroneous computations of chips we refer to [A, AK1, AK2, Gu1, Gu2, Koca, Ma] .
Spikes
A smartcard of voltage class type A should be able to tolerate a supply voltage between 4.5 V and 5.5 V on the contact $V_{CC}$, where the standard voltage is specified as 5 V. Within this range the smartcard will be able to work properly. However, a deviation of the external power supply of much more than the specified 10% tolerance could cause problems with the smartcard IC. Indeed, it could then lead to a wrong computation result, provided that the smartcard IC is still able to finish its computation completely. Most often this is not possible, as the spike disturbs the CPU of the smartcard IC too much. Although a spike as explained above seems very simple, a specific type of power spike is determined by nine parameters. Using figure 1 we will explain them:
1. Initial value of the power supply $V_2$.
2. Starting point $t_1$ of the spike.
3. Rise time $t_2 - t_1$ of the spike.
4. Shape of the rising transition.
5. Height $V_3 - V_1$ of the power spike.
6. Length of the power spike $t_3 - t_2$.
7. Falling time $t_4 - t_3$ of the spike.
8. Shape of the falling transition.
9. Final value $V_1$ of the power supply.

This indicates the huge range of different parameters which must be scanned for penetration attacks against cryptographic devices. On the other hand, it also reveals the strong demands on the corresponding sensor and filter mechanisms. From the preceding discussion of spike attacks, one can envision the difficulties an attacker is confronted with when trying to overcome all the activated hardware countermeasures within modern high-security smartcard ICs.
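To give a feeling for the size of the search space, the nine parameters can be collected in a small Python data structure and scanned as a Cartesian product. This is only a sketch: the field names, the units and the candidate values are hypothetical and do not describe the test software used in our laboratory.

```python
from dataclasses import dataclass, fields
from itertools import product

@dataclass(frozen=True)
class Spike:
    v_initial: float   # 1. initial value of the power supply (V)
    t_start: float     # 2. starting point t1 of the spike (s)
    rise_time: float   # 3. rise time t2 - t1 (s)
    rise_shape: str    # 4. shape of the rising transition
    height: float      # 5. height of the spike (V)
    length: float      # 6. length t3 - t2 (s)
    fall_time: float   # 7. falling time t4 - t3 (s)
    fall_shape: str    # 8. shape of the falling transition
    v_final: float     # 9. final value of the power supply (V)

def scan(grid):
    """Yield one Spike per combination of the candidate values in `grid`."""
    names = [f.name for f in fields(Spike)]
    for values in product(*(grid[name] for name in names)):
        yield Spike(*values)


# Hypothetical grid with only two candidate values per parameter.
grid = {
    "v_initial": [5.0, 5.5], "t_start": [1e-3, 2e-3],
    "rise_time": [1e-8, 1e-7], "rise_shape": ["linear", "step"],
    "height": [-2.0, -3.0], "length": [1e-7, 1e-6],
    "fall_time": [1e-8, 1e-7], "fall_shape": ["linear", "step"],
    "v_final": [5.0, 4.5],
}
count = sum(1 for _ in scan(grid))
```

Even two candidates per parameter already give 2^9 = 512 distinct spikes; a realistic resolution per parameter makes exhaustive scanning far more expensive.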
Laboratory Setting
In order to systematically investigate the effects of spikes, and especially our proposed countermeasures, we used the spike-enforcing hardware set-up shown in figure 2. With such a test set-up it is indeed possible to enforce a spike with very high accuracy. This is necessary if the spike is to induce only a tiny random computation fault rather than a complete destruction of the smartcard's computation, which would make the computation result unusable for a successful attack. By coupling the control and communication of the smartcard with a PC running dedicated test software, it is possible to observe and analyze the smartcard's reaction to the applied spike form as discussed above, e.g., whether it answers with a correct or a wrong answer sequence. Furthermore, the PC is responsible for the stimuli, timing and control of the above spike parameters. The spike generator is triggered by the PC via an interface card, which provides the time and voltage information for the specific spike to be applied to the card. The spike generator is directly connected to the power supply $V_{CC}$ of the smartcard and provides its IC with the necessary operating voltage, including the voltage drop of the spike. By synchronizing the PC, the spike generator and the chipcard itself, a very high attack reproducibility of more than 90% can be achieved. What remains is to find parameters for a spike which induces a tiny random computation fault but leaves the main computation untouched.
Results on Unprotected Hardware and Software
We will now discuss our results of successfully applied spike-attacks on unprotected smartcards, i.e., ICs where any hardware countermeasures against fault attacks have been switched off. Moreover, we have also switched off any (hardware and software) countermeasures against other classical side-channel attacks, like Timing Analysis [Koch] , Power Analysis [KJJ] , Electromagnetic Analysis [SQ,GMO] , etc.
However, to introduce a spike at the right position of the RSA with the CRT, one should investigate the power profile of the critical computation first. Such a power profile of our investigated smartcard equipped with an RSA coprocessor is shown in figure 3. Let us explain this power profile a little bit more: The upper line represents the profile of the smartcard's I/O behavior. The first I/O activity is the start impulse for the smartcard and the second peak is the answer sequence given by the smartcard. Between these two peaks the smartcard is computing a 2048-bit RSA signature using the CRT. This is shown in the lower line where the main power profile of the smartcard is depicted.
The RSA-CRT computation starts at time block 1.5 and ends at time block 9.2 (in the figure the blocks are numbered from 0 to 9). This is visible as an increase in power consumption due to the coprocessor's activity. One immediately recognizes the two different exponentiations, as they are the main power consumers.
In our case the first exponentiation lies in the time frame 1.6 to 5.1, and the second exponentiation lies in the time frame 5.3 to 8.8. Before the first exponentiation one recognizes the loading of the data into the crypto coprocessor. After the first exponentiation follow the corresponding correctness checks and the loading of the data for the second exponentiation, and after the second exponentiation again the correctness checks of that exponentiation. Finally, one sees the CRT combination of the two partial exponentiations, followed by an additional correctness check for the CRT combination.

Results on completely unprotected RSA using the CRT. The first algorithm we attacked with our spike equipment was the pure RSA signature algorithm using the CRT. The table of observed errors contains the following entries:

Handling data within CPU; wrong exponentiation mod p, q; Error within CPU or coprocessor; modification of $q^{-1} \bmod p$; Moving data from E2 to coprocessor; wrong combination of $S_p$ and $S_q$; All listed errors; faulty signature mod p and mod q; Moving data from coprocessor; wrong answer of smartcard; Fatal error within CPU.

Note that the first five errors may lead to a successful attack, whereas the last two do not. Thus, we can conclude that it is absolutely necessary to have sophisticated hardware and software countermeasures to avoid such kinds of attacks. Within the remaining sections we will analyze already existing software countermeasures and also develop new and more reliable countermeasures.
Results on unprotected hardware with simple software countermeasures. Motivated by the devastating results obtained in the previous section, we next tested the reliability of the naively implemented software countermeasure due to [Sh] as described in section 2. Thus, we applied spikes to the unprotected smartcard while it computed the RSA signature algorithm shown in figure 4.
Again, we firstly summarize some of the observed errors.
Table: Observed error scenarios.

The above table is organized as follows. The second column denotes the kind of error which might occur. Column A indicates whether the countermeasure recognizes the induced fault, column B indicates whether the corresponding faulty signature reveals the secret key, and column C says whether the countermeasure works correctly in the corresponding case. We will briefly comment on the observed errors row by row:

1. During the computation of $p'$ the value of $p$ may be changed to some value $\tilde{p}$, such that $p' = \tilde{p} r$. Then $S_p$ is computed correctly modulo $r$, but not modulo $p$. If $p$ is destroyed later, then the check reveals the attack. If a destroyed $\tilde{p}$ is used for the computation of $d_p$, then the check will not recognize this relevant fault.
2. If $d$ is changed before the first two reductions, this will not be detected but is not security relevant. If $d$ is changed between the first two reductions, this will be recognized by the check.
3. If $d_p$ or $d_q$ is destroyed, the check will detect this modification.
4. Depending on the time at which $r$ is destroyed, various things can happen: either the errors will be recognized or they are not security relevant.
5. The destruction of one of the two exponentiations is the classical Bellcore attack. This will be recognized.
6. If $S_p$ is changed before the combination to $S$, then the check will fail.
7. If $q^{-1} \bmod p$ is changed, then the faulty signature will reveal the key. The check will not recognize the attack.
8. Cf. last row.
9. If the correct signature is destroyed, the resulting $S$ reveals no information about the key.

Figure 4 (excerpt): randomly choose a short prime $r$ of, e.g., 32 bits.
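The case of a modified $q^{-1} \bmod p$ (item 7) is easy to reproduce in software, because Shamir's check never looks at this value. The following Python sketch models the fault as a single bit flip in the stored inverse; the function name, the fault model, the fixed 32-bit prime and the toy RSA primes are assumptions for illustration.

```python
from math import gcd

def shamir_sign_with_qinv_fault(m, d, p, q, q_inv, r, corrupt_q_inv=False):
    """Shamir-style RSA-CRT; optionally corrupt q^-1 mod p before recombination."""
    s_pr = pow(m, d, p * r)
    s_qr = pow(m, d, q * r)
    if s_pr % r != s_qr % r:      # the check covers only the exponentiations
        return None
    if corrupt_q_inv:
        q_inv ^= 1                # fault in the stored inverse, unseen by the check
    s_p, s_q = s_pr % p, s_qr % q
    return s_q + (((s_p - s_q) * q_inv) % p) * q


# Toy parameters, for illustration only.
p, q = 2**61 - 1, 2**89 - 1
n, e = p * q, 65537
d = pow(e, -1, (p - 1) * (q - 1))
q_inv = pow(q, -1, p)
r = 4294967291                    # 2^32 - 5, a 32-bit prime
m = 0x1234567890ABCDEF

s_bad = shamir_sign_with_qinv_fault(m, d, p, q, q_inv, r, corrupt_q_inv=True)
leaked = gcd((m - pow(s_bad, e, n)) % n, n)
```

The faulty signature is released, remains correct modulo `q` and is wrong modulo `p`, so `leaked` is the prime factor `q`. A check on the recombination step itself is therefore needed in addition.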
Practical Fault Attacks Countermeasures for Unprotected Hardware
Within this section we will use the formerly discussed errors to propose a simple practical error model. Hereafter, we propose enhanced countermeasures.
Model to Understand Resulting/Possible Faults
From the observed error scenario, we have learned by an extensive data analysis the following facts:
- During the computation, every input value to the RSA signature algorithm can be altered to a value different from the original value.
- During the computation, every variable can be changed.
- The instruction sent to the CPU or a peripheral can be changed.
- The only values to trust are the values which are stored in ROM or EEPROM.
Armed with this knowledge, we formulated the following checking philosophy:
Check (at least in a probabilistic sense) every computed intermediate result with respect to its correctness by relying on trusted values only.
In a rough sense, this is reflected by figure 5 . In this context we adapt the transient fault model due to [BDL] which assumes that our power spikes introduce arbitrary errors. Additionally, we assume that the attacker can induce only one spike but at a specific time chosen by himself.
Software Countermeasures Derived According to the Model
Inspired by the previous section, we developed the following countermeasure (shown in figure 6) to counteract fault attacks. It takes into account that in a practical application only $d_p$ and $d_q$ are given. It also avoids the use of the public exponent $e$, which in real applications is most often not known to the signature software.
We will briefly comment on this algorithm. The check after the CRT combination ensures that $S$ is correctly computed from the data $S_p$ and $S_q$. Therefore, it remains to guarantee that the latter are correct. The central check ($S_{pt}^{d_{qt}} \equiv S_{qt}^{d_{pt}} \bmod t$) proves that the two big exponentiations themselves were processed correctly, assuming that the inputs are not compromised. Note that an erroneous pass of this check can only be due to some very subtle modifications of these input values. Such errors will be intercepted by the first two checking blocks. Finally, we would like to point out the following important advice for a careful implementation: for the two checking blocks the secret parameters $d_p$ and $d_q$ have to be reloaded from a secure area (EEPROM).
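The structure of these checks can be sketched in Python. This is a reconstruction from the description above, not the exact algorithm of figure 6: the precise form of the two input checking blocks, the range of the random multipliers, the `fault` test hook and the toy primes are assumptions made for illustration. The arguments `p`, `q`, `dp` and `dq` used inside the checks stand for values reloaded from trusted storage.

```python
import secrets

def enhanced_crt_sign(m, p, q, dp, dq, q_inv, t, fault=None):
    """Return the signature, or None if any consistency check fails.

    `fault` is a hypothetical test hook: a dict mapping a variable name
    to a function that corrupts that variable once.
    """
    def hit(name, value):
        return fault[name](value) if fault and name in fault else value

    # Checking block 1: exponentiation modulo p*t with a randomized exponent.
    p_t = hit("p_t", p * t)
    dp_r = hit("dp_r", dp + secrets.randbelow(2**32) * (p - 1))
    s_p_t = hit("s_p_t", pow(m, dp_r, p_t))
    if p_t % p != 0 or (dp_r - dp) % (p - 1) != 0:
        return None

    # Checking block 2: the same for q.
    q_t = hit("q_t", q * t)
    dq_r = hit("dq_r", dq + secrets.randbelow(2**32) * (q - 1))
    s_q_t = hit("s_q_t", pow(m, dq_r, q_t))
    if q_t % q != 0 or (dq_r - dq) % (q - 1) != 0:
        return None

    # CRT recombination followed by its own check.
    s_p, s_q = s_p_t % p, s_q_t % q
    s = s_q + (((s_p - s_q) * hit("q_inv", q_inv)) % p) * q
    if (s - s_p_t) % p != 0 or (s - s_q_t) % q != 0:
        return None

    # Central check: without a fault both sides equal m^(dp_r * dq_r) mod t.
    s_pt, d_pt = s_p_t % t, dp_r % (t - 1)
    s_qt, d_qt = s_q_t % t, dq_r % (t - 1)
    if pow(s_pt, d_qt, t) != pow(s_qt, d_pt, t):
        return None
    return s


# Toy parameters, for illustration only.
p, q = 2**61 - 1, 2**89 - 1
d = pow(65537, -1, (p - 1) * (q - 1))
dp, dq, q_inv = d % (p - 1), d % (q - 1), pow(q, -1, p)
t = 4294967291                    # 2^32 - 5, a 32-bit prime
m = 0x1234567890ABCDEF
```

For example, `enhanced_crt_sign(m, p, q, dp, dq, q_inv, t, fault={"q_inv": lambda v: v ^ 1})` returns `None`, because the check after the CRT combination fails. A fault in one exponentiation result passes that check and is caught by the central check, except with a probability of roughly 1/t.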
Measurement Results for Enhanced Software Countermeasures
By extensive penetration tests via spikes on the algorithm shown in figure 6 we obtained the following table. It proves empirically the reliability of our software countermeasures.
Figure 6 (excerpt): input: $m$, $p$, $q$, $d_p$, $d_q$, $q^{-1} \bmod p$; let $t$ be a short prime number, e.g., 32 bits; $p' := p \cdot t$; $d_p' := d_p + \mathrm{random}_1 \cdot (p - 1)$.

Table (excerpt): yes, yes, yes; faulty signature mod p and mod q: prob. $1 - 1/t$, no, yes.

Clearly, the probability that an error is undetected is equal to $1/t$. For $t$ a 32-bit integer, this probability is small enough; $t$ can thus be seen as a security parameter.
Conclusion
We have shown that the classical Bellcore fault attack is in principle feasible when using completely unprotected microcontrollers. Moreover, our study shows that unskilled implementations of countermeasures are not always reliable. It also answers a question of Kaliski and Robshaw [KR] and shows that these attacks are indeed practical. Our investigation further reveals that one should test any conceivable countermeasure in reality against all possible attack scenarios before trusting it. This was done in particular for our newly developed software countermeasures.
Although our software countermeasure seems to be very promising, we are strongly convinced that cryptographic hardware should never be used without appropriate hardware countermeasures in combination with software countermeasures. We therefore close with the advice given by Kaliski and Robshaw [KR] from RSA Laboratories that good engineering practices in the design of secure hardware are essential.
