Techniques of Side Channel Cryptanalysis by Muir, James




presented to the University of Waterloo
in fulfilment of the




Waterloo, Ontario, Canada, 2001
© James Alexander Muir 2001
I hereby declare that I am the sole author of this thesis. This is a true copy of
the thesis, including any required final revisions, as accepted by my examiners.
I understand that my thesis may be made electronically available to the public.
Abstract
The traditional model of cryptography examines the security of cryptographic prim-
itives as mathematical functions. This approach does not account for the physical
side effects of using these primitives in the real world. A more realistic model em-
ploys the concept of a side channel. A side channel is a source of information that is
inherent to a physical implementation of a primitive. Research done in the last half
of the 1990s has shown that the information transmitted by side channels, such as
execution time, computational faults and power consumption, can be detrimental
to the security of ciphers like DES and RSA.
This thesis surveys the techniques of side channel cryptanalysis presented in [30],
[10], and [31] and shows how side channel information can be used to break imple-
mentations of DES and RSA. Some specific techniques covered include the timing
attack, differential fault analysis, simple power analysis and differential power anal-
ysis. Possible defenses against each of these side channel attacks are also discussed.
Acknowledgements
I was introduced to the concept of power analysis at the 1st CACR Information
Security Workshop in November 1998. The following May, I joined the Secure
Systems group at Pitney Bowes in Shelton, CT where I gained practical experience
in this area by working in their power analysis lab. I owe many thanks to both the
CACR and Pitney Bowes; the opportunities they have provided me have enriched
both my experience as a Masters student and the contents of this thesis.
I would like to thank the Department of Combinatorics and Optimization and
NSERC for their financial support during my studies. Also, a special thank-you is
extended to Kim Gingerich and Lori McConnell for their help and encouragement
in pursuing NSERC funding.
Many of my friends and fellow graduate students lent me their attention and
patience as I worked through this thesis. In particular, I would like to thank Kristi
Herridge, John Irving, Debbie Maclean, John Proos, Chris Snyder, and Kerri Webb.
Above all, I would like to thank my family for their love and support.
The Road goes ever on and on Down from the door where it began.
Now far ahead the Road has gone, And I must follow, if I can,
Pursuing it with eager feet, Until it joins some larger way,




2 Timing Analysis
2.1 Introduction
2.2 The Idea
2.3 Attack Details
2.3.1 RSAREF 2.0
2.3.2 Analysis
2.3.3 Experimental Results
2.3.4 An Improvement
2.4 Other Vulnerable Systems
2.5 Countermeasures
2.6 Remarks
3 Fault Analysis
3.1 Introduction
3.2 RSA Vulnerabilities
3.2.1 RSA with CRT
3.2.2 Other Implementations
3.3 DES Vulnerabilities
3.3.1 DES Algorithm
3.3.2 Differential Fault Analysis
3.3.3 Intrusive Fault Analysis
3.4 Countermeasures
3.5 Remarks
4 Power Analysis
4.1 Introduction
4.2 Power Dissipation
4.3 Correlation with Operations
4.3.1 Simple Power Analysis
4.4 Correlation with Operands
4.4.1 Hamming Weights
4.4.2 Differential Power Analysis
4.4.3 Multiple bit DPA
4.5 Countermeasures





1.1 Traditional cryptographic model
1.2 Representing side channels
2.1 Left-to-right square-and-multiply algorithm
2.2 Modular exponentiation in RSAREF
2.3 Timing distribution of modular squares
2.4 Timing distribution of modular multiplications
2.5 Timing distribution of modular exponentiations
2.6 Results of the timing attack using 100 timings
2.7 Results of the timing attack using 1000 timings
2.8 Square and unconditional multiply algorithm
3.1 Faulted right-to-left square-and-multiply algorithm
3.2 Randomized algorithm for RSA fault analysis
3.3 DES algorithm
3.4 Faulted DES encryption algorithm
3.5 DES computation path
3.6 Difference distribution table of S1
3.7 DES expansion permutation
3.8 Permanent DES fault
4.1 CMOS logic inverter and capacitor
4.2 SPA trace of DES
4.3 SPA trace of DES rounds one to three
4.4 SPA trace of an RSA signature
4.5 Round one DES subkey
4.6 A differential trace for R0
4.7 DES f function
4.8 A differential trace for R1




Mathematical abstraction can be a very useful tool in the study of cryptographic
primitives. Cryptographers often evaluate the security of ciphers by considering







Figure 1.1: The traditional cryptographic model.
In this model, two people, Alice and Bob, attempt to use a cipher to engage
in a private conversation across a public channel. An eavesdropper, Eve, monitors
the public channel and tries to deduce what Alice and Bob are talking about. Eve
has at her disposal all the details of the cipher, except for the secret key ( this is
CHAPTER 1. BACKGROUND 2
known as Kerckhoffs’ assumption ), a few plaintext-ciphertext pairs generated by
either Alice or Bob, as well as some reasonable amount of computing power.
Traditionally, any cipher which resisted Eve’s scrutiny in this model was thought
to be secure. Whether or not such a cipher would be implemented in the real world
was then a matter of practicality ( e.g., key length, encryption speed, memory
requirements ). However, as this thesis will illustrate, ciphers which are secure
when specified as mathematical functions are not necessarily secure in real world
implementations.
In reality, ciphers are implemented on physical devices which interact with and
are influenced by their environments. Electronic devices, like pagers and smart
cards, consume power and emit radiation as they operate; they also react to tem-
perature changes and electromagnetic fields. These physical interactions can be
instigated and monitored by adversaries, like Eve, and may result in information
which is useful in cryptanalysis.
An insightful demonstration of this point is related by Peter Wright in [52]. He
explains that in 1956, the British intelligence organization, MI5, was trying to break
a cipher used by the Egyptian Embassy in London, but their efforts were stymied
by the limits of their computational power. Wright, a scientist with GCHQ at the
time, suggested that a carefully placed microphone might help. The Egyptians
were using a Hagelin machine, a rotor based cipher, and after some tests Wright
discovered that the audible click which occurred as the rotors turned could be
exploited. During a special service call to fix a faulty telephone in the embassy, a
microphone was placed close to the Hagelin machine. By listening to the clicks of
the rotors as cipher clerks reset them each morning, MI5 was able to deduce the
core position of 2 or 3 of the machine’s rotors. This extra information allowed the
task of calculating the initial setting of the Hagelin machine to fall within the means
of MI5 computing resources, and subsequently allowed them to read the embassy’s













Figure 1.2: A model which includes side channels.
The traditional cryptographic model does not account for the physical side
effects of using ciphers in the real world. A more realistic model can be described
using the concept of a side channel, as shown in Figure 1.2. A side channel is a
source of information that is inherent to a physical implementation. MI5’s break of
the Hagelin cipher exploited a side channel consisting of sound, but there are many
others.
The chapters of this thesis demonstrate how the analysis of side channel infor-
mation can be used in cryptanalysis. In particular, three kinds of side channels are
examined: execution time, computational faults and power consumption. The aca-
demic research in these three topics was initiated by Kocher [30], Boneh, DeMillo
and Lipton [10], and Kocher, Jaffe, and Jun [31], respectively. The techniques of




Commercial cryptographers have long been concerned with how much execution
time their cryptographic implementations require. The amount of time used to
encrypt a message or produce a digital signature is often used as a benchmark when
comparing different cryptographic schemes; with all other factors being equal, the
fastest scheme is considered the most efficient and is hence the most marketable.
The amount of time it takes to compute a cryptographic function depends not
only on what that function does but also what inputs are passed to it. Certain en-
codings of messages may require less time to encrypt because of the mathematical
operations used. For example, an encryption function based on integer multiplica-
tion might be quick to evaluate with pen and paper if the message to encrypt is a
power of ten. A prudent cryptographer might then try to express every message
as a power of ten to exploit this computational shortcut. However, in addition to
CHAPTER 2. TIMING ANALYSIS 5
messages, cryptographic functions often take secret keys as input and so the value
of a key might influence publicly observable timing characteristics.
On 29 November 1995, Paul Kocher described how the timing characteristics
of cryptosystems such as RSA, DSS and Diffie-Hellman can be correlated to the
values of their secret keys. He further outlined how an attacker is able to analyze
measurements of the time it takes to compute several, say, RSA signatures and
deduce the signing entity’s secret key. After a preliminary version of Kocher’s results
circulated, the cryptographic community began to realize that some products and
protocols currently in use were vulnerable to the attack ( e.g., SSL ). With the
growing popularity of electronic commerce, this new method of cryptanalysis made
quite a good story; it even made the front page of the New York Times [35].
Outline
We first describe the idea behind Kocher’s timing attack on modular exponentia-
tion. Next, we give details about how the attack can be applied to the modular
exponentiation routine in the freely available RSAREF 2.0 cryptographic toolkit.
An analysis of the attack is then presented which allows us to estimate the number
of timing measurements required to extract a secret exponent. A modification of the
attack is then discussed, as well as other cryptosystems and operations which are
potentially vulnerable to timing analysis. We end by presenting a countermeasure
which makes RSA immune to this version of the timing attack.
2.2 The Idea
An operation which is fundamental to the RSA cryptosystem is modular expo-
nentiation. It is used to encrypt and decrypt as well as to sign message blocks.
When RSA was introduced, the inventors suggested a repeated square-and-multiply
algorithm ( see Figure 2.1 ) as a way to implement this operation efficiently [46].
Several RSA implementations followed this example including RSAREF, a reference
implementation authored by RSA Laboratories.
Figure 2.1 describes the left-to-right square-and-multiply algorithm. The al-
gorithm’s parameters are labeled using notation from common descriptions of the
RSA cryptosystem. The output S can be thought of as a digital signature. The
private exponent d can be represented using at most n bits where n is the bit length
of the RSA modulus N .
INPUT: M, N, d = (d_{n−1} d_{n−2} ... d_1 d_0)_2
OUTPUT: S = M^d mod N
1 S ← 1
2 for j = n − 1 ... 0 do
3     S ← S^2 mod N
4     if d_j = 1 then
5         S ← S · M mod N
6 return S
Figure 2.1: The left-to-right repeated square-and-multiply algorithm for modular
exponentiation.
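As a minimal sketch (not RSAREF code), the algorithm of Figure 2.1 can be written in Python as follows. The function name and the two operation counters are illustrative additions; the counters make visible that the number of multiplications equals the number of one bits in d, which is the source of the timing variation discussed in this section.

```python
def square_and_multiply(M, d, N):
    """Left-to-right square-and-multiply, following Figure 2.1.

    Returns M^d mod N together with the number of squarings and
    multiplications performed (the counters are an illustrative addition).
    """
    n = N.bit_length()          # d < N, so d fits in n bits
    S = 1
    squares = multiplies = 0
    for j in range(n - 1, -1, -1):
        S = S * S % N           # line 3: executed in every iteration
        squares += 1
        if (d >> j) & 1:        # line 4: depends on the secret bit d_j
            S = S * M % N       # line 5: executed only when d_j = 1
            multiplies += 1
    return S, squares, multiplies


# Toy parameters chosen only for illustration.
S, squares, multiplies = square_and_multiply(7, 0b1011, 187)
```

With these toy parameters the loop runs once per bit of the 8-bit modulus, while the multiplication count tracks the Hamming weight of the exponent.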
Kocher made some important observations about square-and-multiply algorithms.
In Figure 2.1, the conditional expression at line 4 causes the execution
path of this algorithm to vary according to the value of the exponent. In any loop
iteration, if the relevant bit of d is one, then both a modular square and multiply
are performed ( lines 3 and 5 respectively ); if the relevant bit is zero only a modular
square is performed. So, the required amount of computation, and hence execution
time, to complete the n loop iterations is influenced by the value of the exponent.
If an attacker could observe and compare the execution time of several loop
iterations in the square-and-multiply algorithm, he or she may be able to deduce
the value of the corresponding exponent bits. This technique, when applied against
an RSA signature operation, would reveal bits of the signer’s private key. However,
it is not clear how an attacker might observe the timing characteristics of individual
loop iterations (see footnote 1). Kocher’s timing attack describes how an attacker
can use the total execution time of the algorithm to deduce bits of the private
exponent. This timing information can be easily observed by a passive attacker.
Suppose that a malicious user, Marvin, sends a series of signature requests
to a PC that implements RSA using the repeated square-and-multiply algorithm.
Marvin records the times $T_1, T_2, \ldots, T_k$ it takes the PC to return a signature on
each of the known messages $M_1, M_2, \ldots, M_k \in \mathbb{Z}_N$. The attack then allows
Marvin to recover the bits of $d$ one at a time.
Since $d < N$ and $n$ is the bit length of $N$, the binary representation of $d$ may
contain leading zeroes (see footnote 2), but to simplify our discussion we will assume
that $d_{n-1} = 1$. Tracing through the pseudo-code of Figure 2.1, Marvin knows that at
the start of the second loop iteration $S = M \bmod N$ and then, after the squaring step,
$S = M^2 \bmod N$. If $d_{n-2} = 1$, the PC computes the product $M \cdot M^2 \bmod N$, otherwise it
does not. Using his knowledge of the physical specifications of the target PC,
Marvin simulates on an identical PC (i.e., same processor, RAM cache, etc.) the
time $\hat{t}_i$ it takes to compute $M_i^2 \cdot M_i \bmod N$ for each of the known messages. The
value of $M_i$ influences the amount of time required to perform this calculation
(see footnote 3).
Kocher noticed that, when $d_{n-2} = 1$, the two ensembles $\{\hat{t}_i\}$ and $\{T_i\}$ are
correlated. For example, if $\hat{t}_i$ is much larger than its expectation, then $T_i$ is also
likely to be larger than its expectation. If $d_{n-2} = 0$, then the two ensembles behave
as independent random variables. By measuring the correlation Marvin can decide
the value of $d_{n-2}$. Now Marvin knows the value of $S$ at the start of the third loop
iteration. To get $d_{n-3}$ Marvin reconstructs the ensemble $\{\hat{t}_i\}$ by simulating the
time it takes the PC to compute $S \cdot M \bmod N$, where the value of $S$ is known,
and compares it with the ensemble $\{T_i\}$. Marvin continues in this way to recover
the remaining bits of $d$.
Footnote 1: We will see in Chapter 4 how the execution time of individual loop
iterations can be deduced using power analysis.
Footnote 2: In practice, any leading zeroes in $d_{n-1} d_{n-2} \ldots d_1 d_0$ are skipped to reduce
the number of loop iterations required in the square-and-multiply algorithm. However,
this implementation detail is incidental to our method of attack.
Footnote 3: In classical implementations of modular multiplication the product
$M_i \cdot M_i^2$ is first calculated in $\mathbb{Z}$ and then reduced modulo $N$. Since it takes more
time to multiply large numbers together than small ones, the value of $M_i$ influences
the required computation time.
2.3 Attack Details
The principles underlying the timing attack are elementary, but there are several
details which must be addressed when putting it into practice. For example, it is
not clear how Marvin might measure the correlation between the various ensembles
or how many timing measurements are required for a successful attack. Answers
to these points depend upon the characteristics of the target implementation, but
the techniques Kocher describes in [30] offer some direction.
We analyze Kocher’s method of attack and explain how it can be applied against
modular exponentiation in the RSAREF cryptographic library.
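Before turning to RSAREF, the bit-by-bit idea of Section 2.2 can be sketched as a small simulation. Everything here is an assumption made for illustration: the 32-bit prime modulus, the toy timing model `op_time` (the Hamming weight of the reduced product stands in for a data-dependent running time), and the decision rule, which declares a bit to be one when the sample covariance between Marvin's simulated multiplication times and the observed total times exceeds half the variance of the simulated times. That rule is one simple way of "measuring the correlation"; it is not taken from [30].

```python
import random

N = 4294967291  # 2**32 - 5, a small prime modulus chosen only to keep the demo fast


def op_time(a, b):
    """Toy timing model (an assumption): Hamming weight of a*b mod N."""
    return bin(a * b % N).count("1")


def sign_time(M, bits):
    """Total time of the square-and-multiply loop of Figure 2.1 for exponent bits."""
    S, total = 1, 0
    for bit in bits:
        total += op_time(S, S)
        S = S * S % N
        if bit:
            total += op_time(S, M)
            S = S * M % N
    return total


def covariance(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def recover_bits(msgs, times, n):
    """Recover the exponent bits one at a time, assuming the top bit is 1."""
    states = [M % N for M in msgs]  # S at the start of the second iteration
    bits = [1]
    for _ in range(n - 1):
        squared = [S * S % N for S in states]
        # Simulated time of the conditional multiplication for each message.
        t_hat = [op_time(S2, M) for S2, M in zip(squared, msgs)]
        # If the multiplication really happens, cov(t_hat, T) is near var(t_hat);
        # otherwise it is near zero.
        bit = 1 if covariance(t_hat, times) > covariance(t_hat, t_hat) / 2 else 0
        bits.append(bit)
        states = [S2 * M % N if bit else S2 for S2, M in zip(squared, msgs)]
    return bits


random.seed(7)
secret = [1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0]  # hypothetical 12-bit exponent
msgs = [random.randrange(2, N) for _ in range(4000)]
times = [sign_time(M, secret) for M in msgs]
guessed = recover_bits(msgs, times, len(secret))
```

Only the messages and the total times are passed to `recover_bits`; the secret exponent is used solely to generate the timings. An error in an early bit would corrupt every later simulation, which is one reason the number of timing measurements matters.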
2.3.1 RSAREF 2.0
RSAREF 2.0 was released by RSA Laboratories in 1994. It was intended as an
educational reference implementation of some common cryptographic schemes. In-
cluded in the RSAREF 2.0 library are routines for Diffie-Hellman key agreement
and RSA signatures. In both systems, modular exponentiation is accomplished via
the function NN-ModExp. Pseudo-code is given for NN-ModExp in Figure 2.2.
The algorithm in Figure 2.2 is a generalization of the basic square-and-multiply
algorithm presented earlier and it inherits similar timing properties. When used to
calculate an RSA signature, the algorithm first computes the values $M^2 \bmod N$
and $M^3 \bmod N$, and then 2 bits of the private exponent are processed at a time.
Each loop iteration does two squaring operations and, if either exponent bit is
nonzero, one multiply operation.
RSAREF does multiplication in ZN by first calculating a product in Z and
then reducing it modulo N . Squares are calculated using the same technique. The
execution time of this simple method is related to the Hamming weight of the
factors. The function NN-ModMult is used to evaluate each operation.
INPUT: M, N, d = (d_{n−1} d_{n−2} ... d_1 d_0)_2
OUTPUT: S = M^d mod N
1 m_1 ← M mod N
2 m_2 ← m_1 · M mod N
3 m_3 ← m_2 · M mod N
4 S ← 1
5 for j = n − 1 ... 0 by 2 do
6     S ← S^2 mod N
7     S ← S^2 mod N
8     if (d_j d_{j−1})_2 ≠ 0 then
9         S ← S · m_{(d_j d_{j−1})_2} mod N
10 return S
Figure 2.2: A left-to-right repeated square-and-multiply method which uses a two
bit window.
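The two-bit window method of Figure 2.2 can be sketched in Python as follows. This is an illustration of the pseudo-code, not a transcription of the RSAREF source; the function name is hypothetical, and padding the bit length to an even number is an assumption made so that the exponent splits cleanly into pairs.

```python
def window2_exp(M, d, N):
    """Two-bit window exponentiation, following Figure 2.2."""
    n = N.bit_length()
    n += n % 2                          # assumption: pad to an even number of bits
    m = [1, M % N, M * M % N, M * M * M % N]   # lines 1 to 3 (m[0] is unused)
    S = 1
    for j in range(n - 1, 0, -2):       # j = n-1, n-3, ..., 1
        S = S * S % N                   # line 6
        S = S * S % N                   # line 7
        w = (d >> (j - 1)) & 3          # the pair of bits (d_j d_{j-1})
        if w != 0:                      # line 8
            S = S * m[w] % N            # line 9: the operand depends on both bits
    return S


# Toy parameters chosen only for illustration.
result = window2_exp(7, 0b10110110, 187)
```

Note that the operand of the conditional multiplication, not just its presence, depends on the exponent bits.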
A common misconception about the timing attack is that it only determines
whether or not a conditional multiplication is performed. If this were the case then
the attack would not succeed against the algorithm in Figure 2.2. Knowing that a
multiplication is executed in a particular loop iteration would only eliminate one of
four possible values for the relevant pair of exponent bits. To determine the value
of a pair of exponent bits it is necessary to know what operands were used in the
conditional multiplication. The timing attack is able to exploit timing variation in
the multiplications and the squares to do just that.
Suppose Alice and Marvin engage in a signature protocol using their PCs.
When Marvin sends a message to Alice, she uses the RSAREF routines and her
private key pair 〈N, d〉 to sign it. Alice then sends her signature to Marvin. Marvin
records the time $T_i$ that it takes Alice to respond after he sends her the message
$M_i$.
There are several factors which contribute to the value of $T_i$. Returning to
Figure 2.2, the time required to perform the operations on lines 1 to 4 makes a
contribution which we denote by $c_i$. In the loop of Figure 2.2, for a particular value
of $j$, the time required to execute lines 6, 7 and 9 also contributes to $T_i$. We denote
these contributions by $r_{i,j}$, $s_{i,j}$, and $t_{i,j}$, respectively. Note that $r_{i,j}$ and $s_{i,j}$ are
strictly positive values, but $t_{i,j}$ may be zero. Other factors, such as measurement
error and transmission distance, also contribute to $T_i$ and may be treated as sources
of error. We denote these contributions by $e_i$. Now, we can write:
\[
\begin{aligned}
T_i &= e_i + c_i + (r_{i,n-1} + s_{i,n-1} + t_{i,n-1}) + (r_{i,n-3} + s_{i,n-3} + t_{i,n-3}) \\
    &\qquad + \cdots + (r_{i,1} + s_{i,1} + t_{i,1}) \\
    &= e_i + c_i + \sum_{j} (r_{i,j} + s_{i,j} + t_{i,j}).
\end{aligned}
\]
The bits of Alice’s secret exponent influence the value of almost all of the
components in this sum. For a particular value of $j$, the operands used in the two
squaring operations are completely determined by the value of the exponent bits
$d_{n-1} d_{n-2} \ldots d_{j+2} d_{j+1}$. The operands used in the multiplication step are affected by
these same bits as well as the bits $d_j d_{j-1}$. Thus, the components $r_{i,j}$, $s_{i,j}$, and
$t_{i,j}$ are all influenced by exponent bits. The value of $c_i$ is influenced only by the
value of $M_i$.
Consider the first loop iteration of NN-ModExp. Using a PC identical to Alice’s,
Marvin can simulate and time the four possible sets of calculations Alice performed
in the first loop iteration when she signed the message $M_i$. Effectively, Marvin
generates four candidates for the value of $c_i + r_{i,n-1} + s_{i,n-1} + t_{i,n-1}$. To construct
each candidate, Marvin can simply sign the message $M_i$ four times using the
exponents 00, 01, 10 and 11. Denote the time required for these four signatures by
$\hat{T}_{i,n-1,0}$, $\hat{T}_{i,n-1,1}$, $\hat{T}_{i,n-1,2}$ and $\hat{T}_{i,n-1,3}$ where the first two indices indicate the
relevant message and loop iteration, and the last index represents a guess for the bits
$d_{n-1} d_{n-2}$. Marvin can construct the following table:
\[
\begin{array}{cccc}
00 & 01 & 10 & 11 \\ \hline
T_1 - \hat{T}_{1,n-1,0} & T_1 - \hat{T}_{1,n-1,1} & T_1 - \hat{T}_{1,n-1,2} & T_1 - \hat{T}_{1,n-1,3} \\
T_2 - \hat{T}_{2,n-1,0} & T_2 - \hat{T}_{2,n-1,1} & T_2 - \hat{T}_{2,n-1,2} & T_2 - \hat{T}_{2,n-1,3} \\
\vdots & \vdots & \vdots & \vdots \\
T_k - \hat{T}_{k,n-1,0} & T_k - \hat{T}_{k,n-1,1} & T_k - \hat{T}_{k,n-1,2} & T_k - \hat{T}_{k,n-1,3}
\end{array}
\]
In one of the four columns, Marvin’s simulated operations will be the same as the
operations Alice actually performed up to the end of the first loop iteration. In this
column, Marvin’s candidate value will hopefully be closer to
$c_i + r_{i,n-1} + s_{i,n-1} + t_{i,n-1}$ than the three other candidate values. As the analysis
in the following section shows, with high probability this will cause the sample
variance (see footnote 4) of the correct column to be lower than the others. By
comparing the four sample variances, Marvin can determine the value of $d_{n-1} d_{n-2}$.
Footnote 4: This statistic is usually denoted $S^2$. If $Y_1, Y_2, \ldots, Y_k$ is a set of
observations and $\bar{Y}$ is their arithmetic mean, then
$S^2 = \frac{1}{k-1} \sum_{i=1}^{k} (Y_i - \bar{Y})^2$.
The next pair of exponent bits, $d_{n-3} d_{n-4}$, can be deduced by timing the four
possible sets of calculations Alice performed before the end of the second loop
iteration. For each message, Marvin can measure the time it takes to sign $M_i$ using
the four exponents $d_{n-1}d_{n-2}00$, $d_{n-1}d_{n-2}01$, $d_{n-1}d_{n-2}10$ and $d_{n-1}d_{n-2}11$. Denote
the time required for these four signatures by $\hat{T}_{i,n-3,0}$, $\hat{T}_{i,n-3,1}$, $\hat{T}_{i,n-3,2}$ and
$\hat{T}_{i,n-3,3}$. Marvin then reconstructs his table with rows of the form:
\[
T_i - \hat{T}_{i,n-3,0} \qquad T_i - \hat{T}_{i,n-3,1} \qquad T_i - \hat{T}_{i,n-3,2} \qquad T_i - \hat{T}_{i,n-3,3}
\]
Again, Marvin calculates the sample variance of each column to determine the
actual bit values. The value of the other pairs of bits may be decided in turn using
similar tables.
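The table-based procedure above can be sketched as a simulation in which one program plays both Alice and Marvin. The assumptions are loud ones: the modulus is a 32-bit prime rather than an RSA modulus, the timing model `op_time` (Hamming weight of the reduced product) is a toy stand-in for real execution time, there is no measurement error, and all function names are hypothetical. For each pair of bits, Marvin builds the four columns $T_i - \hat{T}_{i,j,g}$ and keeps the guess whose column has the smallest sample variance.

```python
import random

N = 4294967291  # 2**32 - 5, a small prime used only to keep the demo fast


def op_time(a, b):
    """Toy timing model (an assumption): Hamming weight of a*b mod N."""
    return bin(a * b % N).count("1")


def precompute(M):
    """Lines 1-4 of Figure 2.2: the table (1, m1, m2, m3) and its cost c_i."""
    m1 = M % N
    m2 = m1 * M % N
    m3 = m2 * M % N
    return (1, m1, m2, m3), op_time(m1, M) + op_time(m2, M)


def iteration(S, table, pair):
    """Lines 6-9 for one pair of bits: returns (new S, time r + s + t)."""
    elapsed = op_time(S, S)
    S = S * S % N
    elapsed += op_time(S, S)
    S = S * S % N
    if pair != 0:
        elapsed += op_time(S, table[pair])
        S = S * table[pair] % N
    return S, elapsed


def sign(M, pairs):
    """Alice's signing operation: returns (M^d mod N, total time T_i)."""
    table, total = precompute(M)
    S = 1
    for pair in pairs:
        S, elapsed = iteration(S, table, pair)
        total += elapsed
    return S, total


def sample_variance(xs):
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)


def recover_pairs(msgs, times, n_pairs):
    tables = [precompute(M) for M in msgs]
    # Per message: Marvin's current value of S and his simulated time so far.
    state = [(1, cost) for _, cost in tables]
    recovered = []
    for _ in range(n_pairs):
        best = None
        for guess in range(4):
            trial = [iteration(S, tab, guess) for (S, _), (tab, _) in zip(state, tables)]
            column = [T - (sim + dt) for T, (_, sim), (_, dt) in zip(times, state, trial)]
            v = sample_variance(column)
            if best is None or v < best[0]:
                best = (v, guess, trial)
        _, guess, trial = best
        state = [(S, sim + dt) for (_, sim), (S, dt) in zip(state, trial)]
        recovered.append(guess)
    return recovered


random.seed(2001)
secret_pairs = [2, 3, 0, 1, 3, 2]  # hypothetical exponent d = (10 11 00 01 11 10)_2
msgs = [random.randrange(2, N) for _ in range(4000)]
times = [sign(M, secret_pairs)[1] for M in msgs]
recovered = recover_pairs(msgs, times, len(secret_pairs))
```

Instead of re-signing each message from scratch for every guess, the sketch carries Marvin's simulated state forward from one pair to the next; the resulting columns are the same as those described in the text.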
2.3.2 Analysis
Let $j_0$ be a particular value of $j$ in the square-and-multiply algorithm of Figure
2.2, and let $g \in \{0, 1, 2, 3\}$. Marvin proceeds with the timing attack by filling in
table columns with values of the form $T_i - \hat{T}_{i,j_0,g}$ where $g$ is a guess for the value of
the exponent bits $d_{j_0} d_{j_0-1}$. Assuming that Marvin has correctly determined the value
of the bits $d_{n-1} d_{n-2} \ldots d_{j_0+2} d_{j_0+1}$, we have:
\[
\hat{T}_{i,j_0,g} = c_i + \sum_{j>j_0} (r_{i,j} + s_{i,j} + t_{i,j}) + (r_{i,j_0} + s_{i,j_0} + \hat{t}_{i,j_0,g}),
\]
where $\hat{t}_{i,j_0,g}$ is a candidate value for $t_{i,j_0}$. If $g = 0$ then $\hat{t}_{i,j_0,g} = 0$, otherwise
$\hat{t}_{i,j_0,g} > 0$. Now, we have:
\[
\begin{aligned}
T_i - \hat{T}_{i,j_0,g} &= e_i + c_i + \sum_{j} (r_{i,j} + s_{i,j} + t_{i,j}) - \hat{T}_{i,j_0,g} \\
&= e_i + \sum_{j<j_0} (r_{i,j} + s_{i,j} + t_{i,j}) + (t_{i,j_0} - \hat{t}_{i,j_0,g}).
\end{aligned}
\]
Either $\hat{t}_{i,j_0,g}$ is a correct measure of the time it took Alice to calculate the
multiplication at line 9 of Figure 2.2 when $j = j_0$ or it is not. If it is correct, then
$\hat{t}_{i,j_0,g}$ equals $t_{i,j_0}$, and so:
\[
T_i - \hat{T}_{i,j_0,g} = e_i + \sum_{j<j_0} (r_{i,j} + s_{i,j} + t_{i,j}).
\]
If it is not correct, then $\hat{t}_{i,j_0,g}$ does not usually equal $t_{i,j_0}$, so there will be no
cancellation. Marvin can use statistics to determine whether or not this cancellation
occurs and hence check the guess $g$.
The subtraction of the term $\hat{t}_{i,j_0,g}$ affects the variance of a column of data. To
see this, we treat the timing measurements as occurrences of random variables.
The random variable $T$ describes how long it takes to sign a message in $\mathbb{Z}_N$ using
Alice’s private exponent, $d$. The random variable $\hat{T}_{j_0,g}$ describes how long it takes
to exponentiate a message using the $n - j_0$ most significant bits of $d$ appended
with a two bit guess (dependent on the value of $g$). The random variables $r$ and
$s$ describe how long it takes to square an element of $\mathbb{Z}_N$. The random variable $t$
describes how long it takes to multiply two elements of $\mathbb{Z}_N$ (note that $t$ is strictly
positive). Lastly, the random variable $e$ describes the effects of error.
Assuming the times for squares and multiplications in successive loop iterations
are independent of each other and of the error, the variance of the random variable
$T - \hat{T}_{j_0,g}$, when the guess $g$ is correct, is:
\[
\begin{aligned}
\mathrm{Var}(T - \hat{T}_{j_0,g}) &= \mathrm{Var}\Bigl( e + \sum_{j<j_0} (r_j + s_j + t_j) \Bigr) \\
&= \mathrm{Var}(e) + \frac{j_0 - 2}{2}\,\mathrm{Var}(r) + \frac{j_0 - 2}{2}\,\mathrm{Var}(s) + \ell \cdot \mathrm{Var}(t).
\end{aligned}
\]
The variable $\ell$ is an integer which is determined by the number of pairs of bits from
the remaining, less significant part of the exponent that are nonzero. Recall
that the random variables $r$ and $s$ both describe the time it takes to do a squaring
operation. Thus, $r$ and $s$ are identically distributed and the variance of $T - \hat{T}_{j_0,g}$
can be further simplified to:
\[
\mathrm{Var}(T - \hat{T}_{j_0,g}) = \mathrm{Var}(e) + (j_0 - 2)\,\mathrm{Var}(s) + \ell \cdot \mathrm{Var}(t).
\]
When the guess $g$ is incorrect then there are two possibilities for the variance of
$T - \hat{T}_{j_0,g}$, depending on the value of $g$. Recall that:
\[
T_i - \hat{T}_{i,j_0,g} = e_i + \sum_{j<j_0} (r_{i,j} + s_{i,j} + t_{i,j}) + (t_{i,j_0} - \hat{t}_{i,j_0,g}).
\]
First, suppose that both $t_{i,j_0}$ and $\hat{t}_{i,j_0,g}$ are nonzero. Then, the value
$t_{i,j_0} - \hat{t}_{i,j_0,g}$ is the difference of two (usually unequal) occurrences of the random
variable $t$. The variance of the difference of two independent occurrences of $t$ is
$\mathrm{Var}(t) + \mathrm{Var}(-t) = 2 \cdot \mathrm{Var}(t)$, thus for the relevant table column(s):
\[
\mathrm{Var}(T - \hat{T}_{j_0,g}) = \mathrm{Var}(e) + (j_0 - 2)\,\mathrm{Var}(s) + (\ell + 2)\,\mathrm{Var}(t).
\]
Next, suppose that one of $t_{i,j_0}$ or $\hat{t}_{i,j_0,g}$ is zero. Then, for any column(s) of data
with this property:
\[
\mathrm{Var}(T - \hat{T}_{j_0,g}) = \mathrm{Var}(e) + (j_0 - 2)\,\mathrm{Var}(s) + (\ell + 1)\,\mathrm{Var}(t).
\]
So, the column of data based on a correct guess has a variance which is $\mathrm{Var}(t)$
or $2 \cdot \mathrm{Var}(t)$ lower than the other data columns. The sample variance, $S^2$, is a good
estimator of the true variance and we will now present a heuristic estimate of the
probability that this statistic will distinguish the correct column.
To develop our estimate, we first introduce some notation and state two facts
which are established in most introductory texts on probability (e.g., [8]). We
write $X \sim N(\mu, \sigma^2)$ to indicate that the random variable $X$ is normally distributed
with mean $\mu$ and variance $\sigma^2$. The mean of a random variable $X$ is also denoted by
$E(X)$. If $Y$ is a random variable with $Y = aX + b$, where $a$ and $b$ are constants, and
$X \sim N(\mu, \sigma^2)$, then $Y \sim N(a\mu + b, a^2\sigma^2)$. If $X \sim N(\mu_X, \sigma_X^2)$ and $Y \sim N(\mu_Y, \sigma_Y^2)$,
where $X$ and $Y$ are independent, then $X + Y \sim N(\mu_X + \mu_Y, \sigma_X^2 + \sigma_Y^2)$.
The column of data in Marvin’s table which corresponds to a correct guess has an
expected variance of $\mathrm{Var}(e) + (j_0 - 2)\,\mathrm{Var}(s) + \ell \cdot \mathrm{Var}(t)$. There is a second column in
Marvin’s table that has an expected variance of
$\mathrm{Var}(e) + (j_0 - 2)\,\mathrm{Var}(s) + (\ell + 1)\,\mathrm{Var}(t)$.
These two variances differ by $\mathrm{Var}(t)$. Suppose there is a third column of data with
expected variance $\mathrm{Var}(e) + (j_0 - 2)\,\mathrm{Var}(s) + (\ell + 2)\,\mathrm{Var}(t)$. Its variance differs from
the first column by $2 \cdot \mathrm{Var}(t)$. The success probability of Marvin’s statistical test,
which consists of calculating $S^2$, is lower when he applies it to the first and second
columns, as opposed to when he applies it to the first and third columns. We derive
an estimate of this worst-case probability of success. An estimate of the probability
in the other case can be derived similarly.
Suppose $r$, $s$ and $t$ are normally distributed. Let $N(\mu_s, \sigma_s^2)$ denote the
distribution of $r$ and $s$, and let $N(\mu_t, \sigma_t^2)$ denote the distribution of $t$. Both:
\[
\sum_{j<j_0} (r_j + s_j) \quad\text{and}\quad \sum_{j<j_0} t_j
\]
are normally distributed and the data in the correct and incorrect table columns
are distributed according to the sums:
\[
\sum_{j<j_0} (r_j + s_j + t_j) \quad\text{and}\quad \sum_{j<j_0} (r_j + s_j + t_j) + t.
\]
Both of these random variables are normally distributed. Denote the distribution
of the former one by $N(\mu_0, \sigma_0^2)$. Note that $\sigma_0^2 = (j_0 - 2)\sigma_s^2 + \ell\sigma_t^2$.
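Suppose we have a total of $k$ accurate timing measurements. Let $X_1, X_2, \ldots, X_k$
and $Y_1, Y_2, \ldots, Y_k$ be standard normal variates. If the effects of error are negligible,
we can model the data in the two columns as:
\[
\begin{array}{ll}
\sigma_0 X_1 + \mu_0 & (\sigma_0 X_1 + \mu_0) + (\sigma_t Y_1 + \mu_t) \\
\sigma_0 X_2 + \mu_0 & (\sigma_0 X_2 + \mu_0) + (\sigma_t Y_2 + \mu_t) \\
\quad\vdots & \quad\vdots \\
\sigma_0 X_k + \mu_0 & (\sigma_0 X_k + \mu_0) + (\sigma_t Y_k + \mu_t)
\end{array}
\]
To simplify our notation, we let $V_i = \sigma_0 X_i + \mu_0$ and
$W_i = (\sigma_0 X_i + \mu_0) + (\sigma_t Y_i + \mu_t)$. We want to estimate:
\[
\Pr(S_W^2 > S_V^2) = \Pr\Bigl( \sum_{i=1}^{k} (W_i - \bar{W})^2 > \sum_{i=1}^{k} (V_i - \bar{V})^2 \Bigr).
\]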
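The random variables $V_i$ and $W_i$ are normally distributed with means of $\mu_0$ and
$\mu_0 + \mu_t$, respectively. Approximating the sample means $\bar{V}$ and $\bar{W}$ by these values
gives $V_i - \bar{V} \approx \sigma_0 X_i$ and $W_i - \bar{W} \approx \sigma_0 X_i + \sigma_t Y_i$, so:
\[
\begin{aligned}
\Pr(S_W^2 > S_V^2) &\approx \Pr\Bigl( \sum_{i=1}^{k} (\sigma_0 X_i + \sigma_t Y_i)^2 > \sum_{i=1}^{k} \sigma_0^2 X_i^2 \Bigr) \\
&= \Pr\Bigl( 2\sigma_0\sigma_t \sum_{i=1}^{k} X_i Y_i + \sigma_t^2 \sum_{i=1}^{k} Y_i^2 > 0 \Bigr).
\end{aligned}
\]
Since the $Y_i$ are standard normal variates, $E\bigl(\sum_{i=1}^{k} Y_i^2\bigr) = k$, and we will use
this value to approximate $\sum_{i=1}^{k} Y_i^2$. The products $X_i Y_i$ are independent with
$E(X_i Y_i) = 0$ and $\mathrm{Var}(X_i Y_i) = E(X_i^2 Y_i^2) = 1$. Applying the central limit theorem,
$\sum_{i=1}^{k} X_i Y_i$ approximately follows a $N(0, k)$ distribution. If $Z$ is a standard normal
variate, Marvin’s probability of success (in the worst-case) is roughly:
\[
\Pr(S_W^2 > S_V^2) \approx \Pr\bigl( 2\sigma_0\sigma_t\sqrt{k}\,Z + \sigma_t^2 k > 0 \bigr)
= \Pr\Bigl( Z > -\frac{\sigma_t\sqrt{k}}{2\sigma_0} \Bigr)
= \Phi\Bigl( \frac{\sigma_t\sqrt{k}}{2\sigma_0} \Bigr),
\]
where $\Phi(z)$ is the area under the standard normal curve from $-\infty$ to $z$. By reapplying
the steps of our approximation, we can estimate Marvin’s probability of success
in the other case, where the two variances differ by $2\sigma_t^2$, as:
\[
\Phi\Bigl( \frac{\sigma_t\sqrt{2k}}{2\sigma_0} \Bigr).
\]
Notice that, as expected, this probability is larger than the first case.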
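Recall that $\sigma_0^2 = (j_0 - 2)\sigma_s^2 + \ell\sigma_t^2$. Guessing that $\ell$ is $\frac{3}{4} \cdot \frac{(j_0 - 2)}{2}$ gives
$\sigma_0^2 \approx (j_0 - 2)\bigl(\sigma_s^2 + \frac{3}{8}\sigma_t^2\bigr)$, and the two estimates become:
\[
\Phi\Biggl( \frac{\sigma_t\sqrt{k}}{2\sqrt{(j_0 - 2)\bigl(\sigma_s^2 + \frac{3}{8}\sigma_t^2\bigr)}} \Biggr)
\quad\text{and}\quad
\Phi\Biggl( \frac{\sigma_t\sqrt{2k}}{2\sqrt{(j_0 - 2)\bigl(\sigma_s^2 + \frac{3}{8}\sigma_t^2\bigr)}} \Biggr).
\]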
Thus the probability of success, in each of the two cases, depends on the values of
σs, σt, j0 and k. As Marvin proceeds with the timing attack, j0 ranges from n − 1
to 1. As more bits of the secret exponent are recovered, j0 decreases, and so the
probability of success should increase. Also, with more timing measurements, k
increases, so the probability of success should increase.
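The worst-case estimate can be sanity-checked numerically. The sketch below is an illustration under the normal model of this section, with the error term neglected: it draws a correct column $V_i \sim N(\mu_0, \sigma_0^2)$ and an incorrect column $W_i = V_i + N(\mu_t, \sigma_t^2)$, counts how often $S_W^2 > S_V^2$, and compares the rate with the closed form $\Phi(\sigma_t\sqrt{k}/(2\sigma_0))$ that the approximation steps above suggest. The parameter values are illustrative, not measurements, and the function names are hypothetical.

```python
import math
import random


def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def sample_variance(xs):
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)


def success_rate(sigma0, sigma_t, k, trials, rng):
    """Fraction of trials in which S^2_W > S^2_V for k simulated timings."""
    wins = 0
    for _ in range(trials):
        # The means mu_0 and mu_t are omitted: shifting a column by a
        # constant does not change its sample variance.
        V = [rng.gauss(0.0, sigma0) for _ in range(k)]
        W = [v + rng.gauss(0.0, sigma_t) for v in V]
        if sample_variance(W) > sample_variance(V):
            wins += 1
    return wins / trials


sigma0, sigma_t, k = 5.0, 1.0, 100   # illustrative values only
predicted = phi(sigma_t * math.sqrt(k) / (2.0 * sigma0))
observed = success_rate(sigma0, sigma_t, k, trials=2000, rng=random.Random(1))
```

The closed form replaces the sample means by the true means and $\sum Y_i^2$ by its expectation, so a small gap between `predicted` and `observed` is to be expected.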
In the next section, we evaluate many of the assumptions made in this approx-
imation using experimental data collected from a simulation of the timing attack.
2.3.3 Experimental Results
The instruction set of many PC processors includes a Read Time Stamp Counter
( RDTSC ) function. The time stamp counter is a 64-bit counter which is zeroed on
power-up and is incremented with each CPU clock cycle. By reading this counter
immediately before and after a particular task is executed on a PC, it is possible
to determine the number of CPU cycles consumed by this task⁵. This number can
then be converted into standard units of time ( e.g., microseconds ), according to
the speed of the processor, but, for the purposes of the timing attack, this is not
necessary.
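The read-before, read-after pattern can be sketched as follows. This is only an illustration: Python cannot read the time stamp counter directly, so `time.perf_counter_ns()` is used as a stand-in and the result is in nanoseconds rather than CPU cycles. The helper name `time_operation` is hypothetical.

```python
import time


def time_operation(operation, inputs):
    """Return the elapsed time, in nanoseconds, of operation(x) for each input x."""
    timings = []
    for x in inputs:
        start = time.perf_counter_ns()  # read the counter immediately before the task
        operation(x)
        stop = time.perf_counter_ns()   # and again immediately after it
        timings.append(stop - start)
    return timings


# Example: time a few modular squarings with an arbitrary illustrative modulus.
modulus = (1 << 127) - 1
samples = time_operation(lambda x: (x * x) % modulus, [3 ** 70, 5 ** 50, 7 ** 40])
print(samples)
```

On a multitasking system these readings include scheduling noise, so many repetitions are needed before the distribution of the timings becomes meaningful.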
To estimate the distributions of the time required for RSAREF modular squares,
multiplications and exponentiations, we timed several of these operations, using the
RDTSC function, as they were executed on a PC running MS-DOS®. The PC's
processor was a 450-MHz Pentium II®. The modulus used throughout all of our
experiments was fixed as the 512-bit sample prime, PRIME1, from RSAREF’s Diffie-
Hellman demonstration program.
Figure 2.3 displays the distribution of the time required to square random values
of Z_N. The data was collected by timing 10^6 squaring operations. Each of the 10^6 values
squared was drawn uniformly from Z_N. The resulting distribution is approximately
normal with µs = 2.7131 × 10^5 ticks and σs = 1.4719 × 10^3 ticks. This supports
the assumption in the previous section that the random variable s is normally
distributed. Also, we see that Var(s) ≈ (1.4719 × 10^3)^2 = 2.1665 × 10^6.

⁵ For an example of how to call the RDTSC function using standard C, see [27].





















Figure 2.3: The timing distribution of one million modular squares.
Figure 2.4 displays the distribution of the time required to multiply random
values of Z_N. The data was collected by timing the multiplication of 10^6 pairs
of values drawn uniformly from Z_N. The resulting distribution is approximately
normal with µt = 2.7119 × 10^5 ticks and σt = 1.3186 × 10^3 ticks. Again, the
distribution of the data supports the previous assumption that the random variable
t is normally distributed. Also, Var(t) ≈ (1.3186 × 10^3)^2 = 1.7387 × 10^6.
It is interesting to note that although the function NN-ModMult is used by
RSAREF to do both modular squares and multiplications, their respective timing
distributions differ. The standard deviation of the multiplication times is lower
than that of the squares. This is evidence that the values of the operands used in
the NN-ModMult function do indeed have an influence on the observed execution
times.

Figure 2.4: The timing distribution of one million modular multiplications.
If we ignore the effects of error, we can use the previous two distributions to
predict the value of the parameters µ and σ in the distribution of modular exponen-
tiation times, assuming that the length of the exponent is known. Consider a 64-bit
exponent drawn uniformly from the space of all such exponents. On average, 24 of
the 32 pairs of bits in this exponent will be nonzero. Thus, when an element of ZN
is exponentiated with this exponent, we expect 24 conditional multiplications to be
performed. The number of squaring operations performed is exactly 64 since two
squares are calculated for each pair of bits. Using the linearity of expected values
and variances, we predict that:
\[
\mu \approx 64 \cdot 2.7131 \times 10^5 + 24 \cdot 2.7119 \times 10^5 = 2.3872 \times 10^7 \text{ ticks}
\]
\[
\sigma \approx \sqrt{64 \cdot 2.1665 \times 10^6 + 24 \cdot 1.7387 \times 10^6} = 1.3431 \times 10^4 \text{ ticks}.
\]
Figure 2.5 displays the timing distribution of 10^5 modular exponentiations using
a fixed 64-bit exponent. The values exponentiated were drawn uniformly from
Z_N and the value of the fixed exponent was 0xFEDCBA9876543210 (i.e., the 16
hexadecimal⁶ digits written in decreasing order). Exactly 24 of 32 pairs of bits
in this exponent are nonzero, so we expect the predicted values above to be quite
close to the observed ones. The distribution is approximately normal with µ =
2.3685 × 10^7 ticks and σ = 1.5026 × 10^4 ticks. The observed value of σ is larger
than our prediction and this may be caused by the effects of measurement error.
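The prediction for this exponent can be reproduced with a few lines of arithmetic. The sketch below assumes the 2-bit window model used in this section (two squares per pair of bits, one multiplication per nonzero pair) and plugs in the measured means and variances quoted above. The function names are illustrative.

```python
import math


def count_nonzero_pairs(exponent: int, bit_length: int) -> int:
    """Count the 2-bit windows of the exponent that are not 00."""
    return sum(1 for shift in range(0, bit_length, 2) if (exponent >> shift) & 0b11)


def predict_exponentiation_time(bit_length, multiplications, mu_s, var_s, mu_t, var_t):
    """Predict the mean and standard deviation of the exponentiation time in ticks."""
    squares = bit_length  # two squares are calculated for each pair of bits
    mu = squares * mu_s + multiplications * mu_t
    sigma = math.sqrt(squares * var_s + multiplications * var_t)
    return mu, sigma


pairs = count_nonzero_pairs(0xFEDCBA9876543210, 64)
mu, sigma = predict_exponentiation_time(64, pairs, 2.7131e5, 2.1665e6, 2.7119e5, 1.7387e6)
print(pairs, mu, sigma)
```

The model ignores measurement error, so it gives a lower estimate of the spread than a real measurement would.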
To demonstrate that a comparison of sample variances is a valid method for
distinguishing bits of a secret exponent, we conducted two experiments, each one
targeting the toy exponent d = 0xFEDCBA9876543210. In the first experiment, we
measured the time it took to exponentiate 100 values, M1,M2, . . . ,M100, drawn
uniformly from ZN . Using the resulting timings, T1, T2, . . . , T100, we attempted to
deduce every second pair of exponent bits.
For example, the first pair of bits considered were d_61 d_60; the bits d_63 d_62 were
skipped since we considered only every second pair of bits. To determine d_61 d_60,
the messages M_1, M_2, ..., M_100 were exponentiated using the four exponents 0xC,
0xD, 0xE, and 0xF. This resulted in four sequences of timing measurements which
were respectively subtracted from the sequence T_1, T_2, ..., T_100. The exponent that
produced the data set with the lowest sample variance was then determined and
compared to the actual value of the bits d_61 d_60. If the two values coincided, then
the attack was successful. The next pair of bits considered were d_57 d_56. Again, four
sequences of timing measurements were generated, this time using the exponents
0xFC, 0xFD, 0xFE, and 0xFF, and the resulting sample variances were compared.
For subsequent pairs of bits, the experiment proceeded in a similar manner.

⁶ Values written in hexadecimal are prefixed with 0x.

Figure 2.5: The timing distribution of ten thousand modular exponentiations.
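The selection step of this experiment can be sketched as below. The code assumes the simulated timings for each candidate exponent have already been collected; the helper names `candidate_exponents` and `best_guess` are hypothetical, and the sample variance is computed with the standard library.

```python
import statistics


def candidate_exponents(known_prefix: int):
    """Append each possible pair of bits (00, 01, 10, 11) to the known prefix."""
    return [(known_prefix << 2) | pair for pair in range(4)]


def best_guess(observed, simulated_by_guess):
    """Pick the guess whose residual timings have the lowest sample variance.

    observed is the sequence T_1..T_k; simulated_by_guess maps each candidate
    exponent to the k timings measured when exponentiating with that candidate.
    """
    variances = {}
    for guess, simulated in simulated_by_guess.items():
        residual = [t - s for t, s in zip(observed, simulated)]
        variances[guess] = statistics.variance(residual)
    return min(variances, key=variances.get), variances


print([hex(c) for c in candidate_exponents(0b11)])
print([hex(c) for c in candidate_exponents(0b111111)])
```

With the prefix 11 the candidates are the four exponents used for the first pair of bits, and with the prefix 111111 they are those used for the next pair considered.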
We can use the approximation in the previous section to estimate the probability
of successfully determining the bits d_61 d_60 using 100 timing measurements. This
event will occur only if the sample variance of the correct data set is lower than
that of all three of the other data sets. If we identify the four data sets with their
corresponding bit guesses, we see that the expected variance of data set 11 (the
correct guess) differs from that of the data sets 00, 01 and 10 by Var(t), 2 · Var(t)
and 2 · Var(t) respectively. Treating the three comparisons as independent events,
the probability of success is approximately the product:
\[
\Phi\left( \frac{\sigma_t\sqrt{k}}{2\sigma_0} \right) \cdot
\Phi\left( \frac{\sigma_t\sqrt{k}}{\sqrt{2}\,\sigma_0} \right)^2 .
\]
However, this estimate assumes that the sample variances of the incorrect data sets
are independent of each other and this is not the case. If the sample variance of
data set 00 is large then it is likely that the sample variances of data sets 01 and
10 will be large as well. Without digressing into any further statistical analysis, we
will simply treat the product above as a lower bound on the probability of success.
For the bits d_61 d_60, with k = 100, this gives:
\[
\Pr(\text{success}) \gtrsim
\Phi\left( \frac{\sigma_t\sqrt{100}}{2\sigma_0} \right) \cdot
\Phi\left( \frac{\sigma_t\sqrt{100}}{\sqrt{2}\,\sigma_0} \right)^2
\approx 0.6954 \cdot 0.7652^2 \approx 0.41.
\]
When considering a pair of bits that have an actual value of 00, the approximate
probability of success is calculated differently. For example, the first pair of bits our
experiment considers with this value is d_49 d_48. In this case, the expected variance
of data set 00 (the correct guess) differs from each of the data sets 01, 10 and
11 by Var(t), so the corresponding lower bound is:
\[
\Pr(\text{success}) \gtrsim \Phi\left( \frac{\sigma_t\sqrt{k}}{2\sigma_0} \right)^3 .
\]
The experiment was repeated 25 times, using new values of T_1, T_2, ..., T_100
in each iteration. For each pair of bits, the observed probability of success was
calculated and compared to the estimated probability of success. The results are
shown in Figure 2.6.

Figure 2.6: Result of the timing attack with 100 timings.
The table of Figure 2.6 identifies every second pair of exponent bits with their
relevant hex digit. The first entry indicates that 0.44 × 25 = 11 of the 25 trials to
determine the bits d_61 d_60 were successful, which is slightly better than our estimate
of 0.41 × 25 ≈ 10. As expected, the observed probability of success increases, in the
two respective cases, as the attack progresses. Overall, the observed probability of
success was 240/(25 · 16) ≈ 0.60.
Some of the experimental data deviates quite drastically from our estimates.
Most noticeably, the estimated probability of success in the case when the correct
bits are 00 is significantly higher than the observed probability. One possible reason
might be that the variance of the error in the timing measurements, Var(e), is non-
negligible. For example, Var(e) and Var(t) might be close to the same value.
The second experiment followed the same design as the first one except that 1000
timing measurements were used rather than 100. With this number, the estimated
probabilities of success are higher. The results are shown in Figure 2.7.

Figure 2.7: Result of the timing attack with 1000 timings.
As expected, nearly all of the observed probabilities have increased from their
values in the first experiment. However, the probabilities in the case when the correct
bits are 00 deviate even further from the estimated values. More experimental
data is required to investigate this aberrant behaviour. Overall, the observed probability
of success was 310/(25 · 16) ≈ 0.78.
The purpose of using a small exponent size in our experiments was to simplify
our explanation; however, Kocher's timing attack has successfully been implemented
by other researchers who have targeted exponents of practical sizes [18].
2.3.4 An Improvement
In our description of the timing attack, four exponents, each one representing a
guess for a pair of bits, are used to generate four sets of timing data. The exponents
0xC, 0xD, 0xE and 0xF, were used to determine the value of the third and fourth
most significant bits of d = 0xFEDCBA9876543210, in the experiment described in
the previous section. According to our analysis, the expected variances of the four
resulting sets of timing data are:
    0xC    Var(e) + (j0 − 2)Var(s) + (ℓ + 1)Var(t)
    0xD    Var(e) + (j0 − 2)Var(s) + (ℓ + 2)Var(t)
    0xE    Var(e) + (j0 − 2)Var(s) + (ℓ + 2)Var(t)
    0xF    Var(e) + (j0 − 2)Var(s) + ℓ · Var(t)
Appending the pair of bits 00 to each of these four exponents can exaggerate the
difference between the expected variance of these data sets.
Suppose that four new sets of timing data are generated using the exponents
0x30, 0x34, 0x38 and 0x3C. In binary, the exponent 0x30 is 110000 (leading zeroes
removed) which is just the value 0xC concatenated with 00; similarly, for the other
three exponents. The new timing data will differ from the previous data because the
appended pair of bits causes two additional squaring operations to be performed. If
the third and fourth most significant bits of these exponents do not agree with the
bits of d, the additional two squaring operations increase the variance of the data set
by 2 · Var(s). Alternatively, if these bits do agree with the bits of d, the variance of
the data set will decrease by the same amount. The respective variances are now:
    0x30    Var(e) + j0 · Var(s) + (ℓ + 1)Var(t)
    0x34    Var(e) + j0 · Var(s) + (ℓ + 2)Var(t)
    0x38    Var(e) + j0 · Var(s) + (ℓ + 2)Var(t)
    0x3C    Var(e) + (j0 − 4)Var(s) + ℓ · Var(t)
Using this technique, the correct guess for the pair of exponent bits is more
likely to be distinguished by the sample variance of its resulting data set.
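The size of the improvement can be made concrete by tabulating the expected variances under both schemes. The sketch below assumes the correct bits are 11, as in the example above, and writes `ell` for the count of remaining multiplication terms; this name, like the function itself, is illustrative.

```python
def expected_variances(j0, ell, var_s, var_t, var_e=0, improved=False):
    """Expected variance of each residual data set when the true pair of bits is 11."""
    # Extra multiplication terms left in the residual for each guess.
    extra_multiplications = {"00": 1, "01": 2, "10": 2, "11": 0}
    result = {}
    for guess, extra in extra_multiplications.items():
        if improved:
            # Appending 00 adds two squares: they cancel for the correct guess
            # and add noise for the incorrect ones.
            squares = j0 - 4 if guess == "11" else j0
        else:
            squares = j0 - 2
        result[guess] = var_e + squares * var_s + (ell + extra) * var_t
    return result


basic = expected_variances(61, 22, 3, 2)
better = expected_variances(61, 22, 3, 2, improved=True)
print(basic["00"] - basic["11"], better["00"] - better["11"])
```

In this model the gap between the correct guess and the closest incorrect one grows from Var(t) to 4 · Var(s) + Var(t).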
2.4 Other Vulnerable Systems
The timing attack can be tailored against virtually any operation which takes a
variable amount of time. The algebraic operations used in public key systems
and signature schemes such as ECC, RSA and ElGamal often run in non-constant
time. Block ciphers such as Rijndael and IDEA are also at risk since they use
multiplication in their encryption processes [33, 29]. The bit rotations used in RC5
and DES can leak the Hamming weight of their operands if these operations are
implemented using a shift and conditional bit “wrap around” [24, 28].
Cryptographic engineers must pay careful attention to the influence of key values
on the timing characteristics of table look-ups, bit shifts/rotations, addition,
subtraction and multiplication operations to assess the vulnerability of specific
implementations to timing attacks.
2.5 Countermeasures
Before describing how to defeat the timing attack, we will first consider two other
common approaches towards developing countermeasures.
The first and most obvious method is to ensure all operations run in a constant
amount of time. Unfortunately, it is difficult to achieve this goal. Compiler opti-
mizations and memory look-ups can introduce unexpected timing variations which
are beyond the control of implementors. Withholding the result of an operation
until a specified amount of time has expired may seem a promising approach, but
the length of the added delay may be conveyed through the system’s power con-
sumption or CPU usage. Using this method would also degrade system efficiency
since all operations would behave as if they were processing worst-case inputs.
Unconditionally performing the multiplication in each loop iteration of a square-and-multiply
algorithm (see Figure 2.8) does not make the execution time of the
algorithm constant. Variability in the multiplication and squaring operations will
still remain and this can be exploited. As we emphasized earlier, the timing attack
can determine what operands were used in each step of the algorithm as well as the
path of execution.

INPUT: M, N, d = (d_{n−1} d_{n−2} ... d_1 d_0)_2
OUTPUT: S = M^d mod N
S ← 1
for j = n − 1 ... 0 do
    S ← S^2 mod N
    T ← S · M mod N
    if d_j = 1 then
        S ← T
return S
Figure 2.8: This modification of the square-and-multiply algorithm is still vulnerable
to the timing attack.
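For reference, the always-multiply variant can be written out as runnable code. This is a sketch of the algorithm as described, not a hardened implementation: Python's arbitrary-precision arithmetic is itself operand-dependent, which is exactly the residual variability the text warns about. The function name is illustrative.

```python
def always_multiply_exp(M: int, d: int, N: int) -> int:
    """Left-to-right square-and-multiply that multiplies in every iteration."""
    S = 1
    for j in range(d.bit_length() - 1, -1, -1):
        S = (S * S) % N
        T = (S * M) % N      # the multiplication is performed unconditionally
        if (d >> j) & 1:
            S = T            # the product is kept only when the exponent bit is 1
    return S


print(always_multiply_exp(42, 7467, 11413) == pow(42, 7467, 11413))
```

The number of multiplications no longer depends on the exponent, but the time taken by each multiplication and squaring still depends on its operands.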
If the multiplication and squaring operations ran in constant time, then the time
for a modular exponentiation would only be correlated to the Hamming weight of
the exponent. For random exponents, the Hamming weight does not, on average,
reveal much information about its value. Montgomery multiplication runs in almost
constant time, but there is a small source of variability resulting from a conditional
subtraction. RSA with Montgomery multiplication is vulnerable to the timing
attack, as is shown in [18].
The second method is to add noise to the execution time of operations. The in-
tended effect is to increase the required number of timing measurements so that the
attack becomes infeasible. Our method of attack and subsequent analysis assumed
the effects of noise were negligible, but this may not be the case. Inserting random
delays in operations provides a source of noise, but this will reduce efficiency if the
mean of the delay is large. For a successful timing attack, the required number of
timing measurements roughly increases linearly as a function of the variance of the
random delay.
To defeat the timing attack, implementors should prevent an attacker from
learning the inputs to a vulnerable operation. In the case of RSA, if Marvin
does not know the value of the base used in a modular exponentiation, then
the corresponding timing information is of no use. The algebraic structure of
Z_N allows messages to be blinded [13] before they are signed. Rather than sign
the message M ∈ Z*_N, Alice can pick a random r ∈ Z*_N and sign the message
M′ = r^e · M mod N instead. Denote the resulting signature by S′. Alice now calculates
r^{−1} S′ = r^{−1} r^{ed} M^d = r^{−1} r M^d = M^d mod N to obtain her signature on the
message M. The suitability of this technique depends entirely on the details of the
cryptosystem, but many public key systems have the required algebraic structure.
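The blinding steps can be sketched with toy parameters. The primes, exponents and the function name below are chosen for illustration only and are far too small for real use; the built-in `pow` with a negative exponent (Python 3.8 or later) supplies the modular inverse.

```python
import math
import secrets


def blinded_sign(M: int, d: int, e: int, N: int) -> int:
    """Sign M with RSA blinding: the exponentiation never sees M itself."""
    while True:
        r = secrets.randbelow(N - 2) + 2          # random r with 2 <= r < N
        if math.gcd(r, N) == 1:                   # r must be invertible modulo N
            break
    M_blind = (pow(r, e, N) * M) % N              # M' = r^e * M mod N
    S_blind = pow(M_blind, d, N)                  # S' = (M')^d mod N
    return (pow(r, -1, N) * S_blind) % N          # r^-1 * S' = M^d mod N


# Toy key: N = 101 * 113, e = 3, d = e^-1 mod (100 * 112).
N, e = 101 * 113, 3
d = pow(e, -1, 100 * 112)
print(blinded_sign(42, d, e, N) == pow(42, d, N))
```

Because a fresh r is drawn for every signature, the base of the exponentiation is unknown to Marvin even when he knows M.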
2.6 Remarks
Our analysis of the timing attack, as it is applied to modular exponentiation in
RSAREF, is complicated by the fact that the exponentiation method there pro-
cesses exponents using a 2-bit window. Our discussion could be greatly simplified
if a method using a one bit window was considered. In [30], Kocher simplifies his
analysis by assuming that every second bit of the exponent is known.
Kocher presents results from several experiments in [30] which support his theo-
retical description of the timing attack. Unfortunately, in that publication, Kocher
reveals few practical details of how he actually performed his experiments; this
makes the task of reproducing his experiments somewhat difficult for a reader.
Other authors have been more forthcoming with the details of their experiments.
For example, there is a detailed discussion in [28] which describes how precise tim-
ing information ( e.g., microseconds or better ) can be measured on a PC. In our
own experiments, we found Heidenstrom’s document “Timing on the PC family
under DOS” [27] to be an excellent source of information.
It should be noted that Kocher’s timing attack, as presented in [30] and sum-
marized in the previous sections, does not directly apply to the RSA signature
operation in RSAREF 2.0. Like many implementations of RSA, RSAREF 2.0 uses
the Chinese Remainder Theorem (CRT) to calculate signatures⁷. A consequence
of this method is that the inputs to the two component modular exponentiations
are effectively blinded, so the timing attack cannot be applied. If an adversary
has the ability to choose which messages are signed then the timing attack can be
applied to CRT implementations as shown in [47].
Timing attacks are more threatening to dedicated cryptographic devices ( e.g.,
smartcards ) than they are to multitasking devices like PCs. Unless a PC is op-
erating in some controlled mode where interrupts are disabled, isolating the time
required by a single computation can be difficult. Usually, computations are contin-
ually interrupted as the operating system makes routine system calls ( e.g., updating
the system clock ). These interruptions can introduce a large amount of error in
timing measurements.




Chapter 3

Fault Analysis

Participating in a cryptographic protocol is a relatively painless process these days;
usually, any required computation or transmission is quickly done with digital hardware
(e.g., PC, smartcard, cellular phone). Most of these devices seem to operate
reliably when we use them so we might not think to question if the security of a
protocol depends on the reliability of the device which implements it.
Hardware faults and errors which occur during the operation of a cryptographic
module can affect security. For example, a device might transmit ciphertext or
plaintext according to the value of a single register bit. If that bit was flipped
accidentally by, say, a power surge, then subsequent transmissions would be unintentionally
sent in the clear. In this case, the fault changed the operational mode of
the module, and had no influence on the strength of any underlying cryptography.
Engineering criteria have been developed to ensure cryptographic modules operate
correctly in the presence of faults like this one [21]. However, until the mid 1990s it
was not clear that cryptographers had to worry that faults might increase a cipher’s
vulnerability to cryptanalysis.
On 25 September 1996, Boneh, DeMillo and Lipton announced that the oc-
currence of computational faults can have severe consequences to the strength of
cryptographic schemes [36]. In an extreme example, these researchers demonstrated
that a single erroneous RSA signature can compromise a signer’s private key. Their
results were particularly relevant to the design of smartcard systems since the small
size and intended use of these devices provide an adversary with the opportunity
to induce faults and cause erroneous output¹. This discovery received widespread
attention and prompted research into the effects of faults in other cryptosystems.
Outline
The first part of this chapter explains two techniques of fault analysis that can be
used to break RSA implementations. Both attacks exploit computational errors
that occur during an RSA signature operation. The second part of the chapter
explains how fault analysis can also be applied to symmetric ciphers; DES, in
particular. A number of attacks are presented, the first of which assumes that an
adversary is able to obtain two DES encryptions of the same message: a faulty
one, and a valid one. Other attacks are suggested which are less restrictive in
terms of what ciphertexts are useful to an adversary, but make the assumption
that the internal circuitry of the target implementation can be manipulated by
the adversary. To end, countermeasures against all of the mentioned attacks are
discussed.

¹ A malicious user may try to induce a fault in his or her smartcard by, say, bombarding it with
radiation or putting it in a microwave. Inducing faults on a remote PC seems to be more difficult.
3.2 RSA Vulnerabilities
Modular arithmetic is a fundamental component of many cryptographic schemes.
One consequence of this fact is that these schemes inherit mathematical properties
such as associativity, commutativity and transitivity which may be exploited by
both system designers and attackers. In the case of RSA, modular arithmetic
allows an adversary to carefully examine the effect of faults which occur in a signature
operation.
3.2.1 RSA with CRT
The Chinese Remainder Theorem ( CRT ) can be used to speed up RSA signature
generation. Suppose Alice wishes to sign a message M ∈ Z_N where N is the product
of the primes p and q. Rather than calculate the value S = M^d mod N directly,
she uses the factors of N and computes:
\[
S_p = M^{d_p} \bmod p \quad\text{and}\quad S_q = M^{d_q} \bmod q
\]
where d_p = d mod (p − 1) and d_q = d mod (q − 1). She then computes S to be
\[
S = u_p S_p + u_q S_q \bmod N,
\]
where u_p ≡ 1 (mod p), u_p ≡ 0 (mod q) and u_q ≡ 0 (mod p), u_q ≡ 1 (mod q).
The values dp, dq, up, uq can be pre-computed, and the time required to calculate
a linear combination of Sp and Sq is negligible compared to the two component
exponentiations.
The speed-up in using the CRT comes from the fact that doing two exponen-
tiations with moduli half the size of N is quicker than doing one exponentiation
modulo N. If n is the bit length of N then calculating M^d mod N using a square-and-multiply
algorithm takes time proportional to n^3. The factors of N have bit
length n/2, so each of the two component exponentiations takes time proportional
to (n/2)^3. Signing with the CRT is therefore about n^3 / (2 · (n/2)^3) = 4 times
faster than direct exponentiation. For this reason, many RSA implementations use
the CRT for signature generation including RSAREF 2.0 which was presented in the
previous chapter [44].
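A toy version of CRT signing is sketched below. It assumes the recombination coefficients are the usual idempotents, u_p = q · (q^{−1} mod p) and u_q = p · (p^{−1} mod q); the thesis text available here does not spell out their definition, so treat that choice, the parameters and the function name as illustrative.

```python
def crt_sign(M: int, d: int, p: int, q: int) -> int:
    """Compute M^d mod pq from two half-size exponentiations."""
    N = p * q
    d_p, d_q = d % (p - 1), d % (q - 1)
    u_p = q * pow(q, -1, p)   # u_p = 1 mod p and 0 mod q (assumed form)
    u_q = p * pow(p, -1, q)   # u_q = 0 mod p and 1 mod q (assumed form)
    S_p = pow(M, d_p, p)
    S_q = pow(M, d_q, q)
    return (u_p * S_p + u_q * S_q) % N


p, q, e = 101, 113, 3
d = pow(e, -1, (p - 1) * (q - 1))
print(crt_sign(42, d, p, q) == pow(42, d, p * q))
```

In a real implementation d_p, d_q, u_p and u_q would be computed once and stored with the private key.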
Boneh, DeMillo and Lipton observed that if exactly one of the values Sp or Sq
is computed incorrectly, then an adversary who has two signatures on the same
message, one correct and the other faulty, can factor N . Based on this result,
Lenstra noted that knowledge of only the faulty signature is sufficient to factor N .
We summarize his technique now.
Suppose an error occurs during the calculation of M^{d_q} mod q, resulting in the
value Ŝ_q ≠ M^{d_q} mod q. The resulting signature 〈M, Ŝ〉 will be invalid. Consider
the value M − Ŝ^e. We have:
\[
M - \hat{S}^e \bmod p \;=\; M - S_p^{\,e} \bmod p \;=\; 0 \bmod p
\]
and
\[
M - \hat{S}^e \bmod q \;=\; M - \hat{S}_q^{\,e} \bmod q \;\neq\; 0 \bmod q.
\]
Thus p is a factor of M − Ŝ^e and q is not. So, an adversary merely needs to calculate
gcd(N, M − Ŝ^e) = p in order to factor N. With additional access to the correct
signature 〈M, S〉 the adversary could instead calculate gcd(N, S − Ŝ) = p, as Boneh
et al. originally suggested.
This attack does not assume anything about the nature of the error that occurred
during the calculation of M^{d_q} mod q. It makes no difference if the miscalculation
was the result of a single hardware fault, multiple ones, or even a software
bug. For this reason, this method of fault analysis is the most general of the ones
we consider in this chapter.
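The factoring step is short enough to demonstrate end to end. In the sketch below the fault is simulated by adding 1 to S_q, which is an arbitrary choice since any wrong value works; the toy key and the function names are illustrative.

```python
import math


def faulty_crt_sign(M: int, d: int, p: int, q: int) -> int:
    """CRT signature in which S_q is deliberately corrupted (simulated fault)."""
    S_p = pow(M, d % (p - 1), p)
    S_q = (pow(M, d % (q - 1), q) + 1) % q     # any incorrect value will do
    u_p = q * pow(q, -1, p)
    u_q = p * pow(p, -1, q)
    return (u_p * S_p + u_q * S_q) % (p * q)


def factor_from_faulty_signature(M: int, S_hat: int, e: int, N: int) -> int:
    """Recover a prime factor of N from a single faulty signature."""
    return math.gcd(N, (M - pow(S_hat, e, N)) % N)


p, q, e = 101, 113, 3
d = pow(e, -1, (p - 1) * (q - 1))
S_hat = faulty_crt_sign(42, d, p, q)
print(factor_from_faulty_signature(42, S_hat, e, p * q))
```

Only public values and the faulty signature enter the gcd, so the attacker needs no knowledge of how the fault arose.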
3.2.2 Other Implementations
Not all implementations of RSA use the CRT. However, analyzing these systems
under a more restrictive fault model can still lead to some interesting attacks. We
now describe a variation of an attack presented in [10] which exploits register faults
that occur during modular exponentiation.
Suppose that a non-CRT implementation of RSA uses the right-to-left repeated
square-and-multiply algorithm to do modular exponentiation. Figure 3.1 describes
such an algorithm where the output, S, can be thought of as an RSA signature. This
algorithm requires at least two data registers to store the intermediate values of the
variables z and S. The variable z is used to store the values M, M^2, M^{2^2}, ..., M^{2^{n−1}}
as well as the superfluous value M^{2^n}. A subset of these values is used to form a
product which equals M^d mod N, the signature on the message M. A fault in
the register which contains the variable z can corrupt the factors used in this
product and therefore cause an invalid signature. Assuming that register faults flip
individual bits of z, we show that an adversary with access to a number of erroneous
signatures resulting from single faults can efficiently deduce the value of d.

INPUT: M, N, d = (d_{n−1} d_{n−2} ... d_1 d_0)_2
OUTPUT: S = M^d mod N
1 z ← M
2 S ← 1
3 for j = 0 ... n − 1 do
∗     register fault: z ← z ± 2^w
4     if d_j = 1 then
5         S ← S · z mod N
6     z ← z^2 mod N
7 return S
Figure 3.1: A modification of the right-to-left repeated square-and-multiply algorithm
which models register faults.
The attack works by recovering blocks of bits from the binary representation of
d, starting with the most significant bits. To illustrate the technique, suppose that
during the signing of the message M a single register fault, denoted in Figure 3.1
at line ∗, occurs when j = n − 2. This error propagates and corrupts two of the
intermediate values of z. Listing the values of z we have:
\[
M,\; M^2,\; \ldots,\; M^{2^{n-3}},\; \tilde{M},\; \tilde{M}^2,
\]
where $\tilde{M} = M^{2^{n-2}} \pm 2^w$ for some w. The value $\tilde{M}$ is the result of a fault in the z
register when $z = M^{2^{n-2}}$. This fault caused the bit in position w of $M^{2^{n-2}}$ to be
flipped. The resulting erroneous signature is:
\[
\hat{S} = \left( \prod_{i=0}^{n-3} \left(M^{2^i}\right)^{d_i} \right)
          \tilde{M}^{d_{n-2}} \left(\tilde{M}^2\right)^{d_{n-1}} \bmod N.
\]
Using binary notation in the exponents, this can be written more simply as:
\[
\hat{S} = M^{(d_{n-3} \ldots d_0)}\, \tilde{M}^{(d_{n-1} d_{n-2})} \bmod N.
\]
With the public exponent e, we can derive the following equivalences modulo N:
\begin{align*}
\left(M^{e \cdot 2^{n-2}}\right)^{(d_{n-1} d_{n-2})} \hat{S}^e
  &\equiv M^{ed} \left(\tilde{M}^e\right)^{(d_{n-1} d_{n-2})} \pmod{N} \\
  &\equiv M \left(\tilde{M}^e\right)^{(d_{n-1} d_{n-2})} \pmod{N} \\
  &\equiv M \left(M^{2^{n-2}} \pm 2^w\right)^{e \cdot (d_{n-1} d_{n-2})} \pmod{N}.
\end{align*}
So, with knowledge of 〈Ŝ, M〉 and the fact that $\tilde{M} = M^{2^{n-2}} \pm 2^w$ for some w, an
adversary can exhaust the possible values of w, d_{n−1}, d_{n−2} until the condition above
holds, thereby revealing 2 bits of d. Since 〈Ŝ, M〉 is erroneous, Ŝ^e ≢ M mod N
and therefore d_{n−1} d_{n−2} ≠ 00. Thus, there are 3 values for the pair of bits d_{n−1} d_{n−2}
to consider. If n is the bit length of N then there are n possible values of w to
consider. So it takes at most (2^2 − 1)n = 3n trials to find the correct values of
w, d_{n−1} and d_{n−2}.
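In practice, an adversary does not know the value of j when the register fault
occurred; denote this value by j*. Generalizing the argument above, if j* = j then:
\[
\left(M^{e \cdot 2^{j}}\right)^{(d_{n-1} d_{n-2} \ldots d_j)} \hat{S}^e
  \equiv M \left(M^{2^{j}} \pm 2^w\right)^{e \cdot (d_{n-1} d_{n-2} \ldots d_j)} \pmod{N}. \tag{3.1}
\]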
Consider the following example. Suppose we verify the signature 〈S, M〉 =
〈5066, 42〉 against the RSA parameters e = 3, N = 101 · 113 and determine that it
is erroneous. According to the bit length of N, n = 14, and substituting this and
the other values into equivalence 3.1 gives
\[
\left(42^{3 \cdot 2^{j}}\right)^{(d_{13} d_{12} \ldots d_j)} 5066^3
  \equiv 42 \left(42^{2^{j}} \pm 2^w\right)^{3 \cdot (d_{13} d_{12} \ldots d_j)} \pmod{11413},
\]
for some values of j, w and bits of d. After some trial and error we find that
\[
\left(42^{3 \cdot 2^{11}}\right)^{(011)_2} 5066^3
  \equiv 42 \left(42^{2^{11}} + 2^0\right)^{3 \cdot (011)_2} \pmod{11413},
\]
and so, for this erroneous signature, we have j* = 11, but more importantly, we
have learned that the two most significant bits of d are 11.
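The trial-and-error search can be sketched as follows. The code simulates a single bit flip in the z register of the right-to-left algorithm and then tests equivalence 3.1 for candidate values of j, w, the sign and the leading bits of d. All function names and the `max_bits` limit are illustrative, and with a modulus this small a false identification is possible, so the search result is only a candidate.

```python
def faulty_sign(M, d, N, n, fault_j, w):
    """Right-to-left square-and-multiply with one bit of z flipped at step fault_j."""
    z, S = M % N, 1
    for j in range(n):
        if j == fault_j:
            z ^= 1 << w                      # register fault: z <- z +/- 2^w
        if (d >> j) & 1:
            S = (S * z) % N
        z = (z * z) % N
    return S


def condition_3_1(M, S_hat, e, N, j, w, sign, top_bits):
    """Test equivalence 3.1 for a guessed j, w, sign and value of (d_{n-1} ... d_j)."""
    clean = pow(M, 1 << j, N)                        # M^(2^j) mod N
    corrupted = (clean + sign * (1 << w)) % N        # M^(2^j) +/- 2^w
    lhs = (pow(clean, e * top_bits, N) * pow(S_hat, e, N)) % N
    rhs = (M * pow(corrupted, e * top_bits, N)) % N
    return lhs == rhs


def search(M, S_hat, e, N, n, max_bits=4):
    """Return the first (j, w, sign, top_bits) satisfying 3.1, scanning j downwards."""
    for j in range(n - 1, n - 1 - max_bits, -1):
        for top_bits in range(1, 1 << (n - j)):
            for w in range(n):
                for sign in (1, -1):
                    if condition_3_1(M, S_hat, e, N, j, w, sign, top_bits):
                        return j, w, sign, top_bits
    return None


N, e, n = 101 * 113, 3, 14
d = pow(e, -1, 100 * 112)
S_hat = faulty_sign(42, d, N, n, fault_j=11, w=0)
print(search(42, S_hat, e, N, n))
```

The cost of the search grows with the number of unknown leading bits, which is why recovering d block by block from several faulty signatures is cheaper than attacking one signature in isolation.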
For a particular value of j, equivalence 3.1 allows an attacker to identify any
erroneous signature with j* = j at a computational cost of O((2^{n−j*} − 1)n). Identifying
the correct value of j* also reveals the value of the bits d_{n−1} d_{n−2} ... d_{j*}.
Knowledge of these bits reduces the effort required to identify j* for other erroneous
signatures since there are now fewer unknown bits of d. An adversary with
access to a number of erroneous signatures, with possibly different values of j*, can
exploit this property using the method of attack described in Figure 3.2.
INPUT: e, n, N, 〈M_0, S_0〉, Ŝ = { 〈M_1, Ŝ_1〉, 〈M_2, Ŝ_2〉, ..., 〈M_k, Ŝ_k〉 }
OUTPUT: d = (d_{n−1} d_{n−2} ... d_1 d_0)_2
1 for j = n − 1 ... 0 do
2     for each 〈M_i, Ŝ_i〉 ∈ Ŝ do
3         if j*_i = j then
4             update known bits of d
5             Ŝ ← Ŝ − 〈M_i, Ŝ_i〉
6 solve S_0 = M_0^d mod N for the unknown bits of d
7 return d
Figure 3.2: A randomized algorithm to deduce the value of d from k invalid
signatures and one valid signature.
Throughout this attack an adversary manages a set of invalid signatures, Ŝ. For
each value of j the set Ŝ is scanned to identify any signature 〈M_i, Ŝ_i〉 with j*_i = j.
The value j*_i is the value of j in the square-and-multiply algorithm when the fault
that generated 〈M_i, Ŝ_i〉 occurred. The first identification² made at a particular
value of j reveals some bits of d. Subsequent identifications at the same value
of j can be done quickly using the updated value of d. These identifications do
not contribute any previously unknown bits of d, but discarding these signatures
ensures that effort is not wasted on them in succeeding loop iterations. Most of the
work involved in the attack is spent checking the condition j*_i = j at line 3. This
condition is checked via equivalence 3.1. Once the set Ŝ is exhausted any remaining
unknown bits of d are deduced by solving S_0 = M_0^d mod N where 〈M_0, S_0〉 is a
valid signature.

² We ignore the possibility of false identifications since Boneh et al. show in [10] that if this
probability is non-negligible then N can be efficiently factored.
We now give a heuristic analysis of the expected running time of the attack.
For any i, the value j*_i lies in the interval [n − 1, ..., 0]. The values j*_1, j*_2, ..., j*_k
can be ordered and, if necessary, relabeled so that j*_1 ≤ j*_2 ≤ ... ≤ j*_k. The
first identification made at line 3 of Figure 3.2 recovers n − j*_k bits of d. The
second identification recovers an additional j*_k − j*_{k−1} bits, and so on for subsequent
identifications. Thus, the running time of the attack is proportional to:
\[
\sum_{l=1}^{n - j_k^*} k \cdot (2^l - 1) \cdot n
\;+\; \sum_{l=1}^{j_k^* - j_{k-1}^*} (k-1) \cdot (2^l - 1) \cdot n
\;+\; \cdots
\;+\; \sum_{l=1}^{j_1^*} 1 \cdot (2^l - 1) \cdot n
\]











Assuming the j*_i follow a uniform distribution, the probability that none of
these values hit a particular interval of width r is (1 − r/n)^k ≈ e^{−rk/n}. Since there are
at most n such intervals, the probability that all of them contain a hit is at least
1 − n e^{−rk/n}. Taking r = (n/k) ln 2n, we see that this event occurs with probability at
least 1/2. Thus, with probability at least 1/2, the differences j*_i − j*_{i−1} are bounded by
r = (n/k) ln 2n.





















With k = n lnn erroneous signatures, the attack takes O(n3 ln2 n) time.
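The choice of interval width in this argument can be checked numerically. The sketch below assumes the bound n · e^{−rk/n} on the probability that some interval of width r receives no fault, and the width r = (n/k) ln 2n; the function names are illustrative.

```python
import math


def interval_width(n: int, k: float) -> float:
    """Width r = (n / k) * ln(2n) used in the heuristic analysis."""
    return (n / k) * math.log(2 * n)


def miss_probability_bound(n: int, k: float, r: float) -> float:
    """Upper bound n * exp(-r * k / n) on the chance that some interval is missed."""
    return n * math.exp(-r * k / n)


n = 1024
k = n * math.log(n)          # number of erroneous signatures, k = n ln n
r = interval_width(n, k)
print(r, miss_probability_bound(n, k, r))
```

With k = n ln n the width r is only slightly larger than 1, so each identification has to search over very few unknown bits.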
One possible improvement to this method is to check j*_i against two values at
each invocation of line 3 in Figure 3.2. Equivalence 3.1 is used to check the condition
j*_i = j, and the following equivalence can be used to check if j*_i = n − j:
\[
\left(\hat{S}^e\right)^{2^{n-j}} \left(\tilde{M}^e\right)^{(d_{(n-j)-1} d_{(n-j)-2} \ldots d_0)}
  \equiv \left(M^{e \cdot 2^{n-j}}\right)^{(d_{(n-j)-1} d_{(n-j)-2} \ldots d_0)} \tilde{M} \pmod{N} \tag{3.2}
\]
This modification allows blocks of bits to be recovered from both the left and right
ends of d, so the set Ŝ is exhausted more quickly. There is no advantage in terms
of the number of invalid signatures required, however if the value of j∗i is controlled
by the attacker rather than being uniformly distributed over [n−1, . . . , 0] then the
required number of signatures is reduced significantly. An attacker could effectively
divide d into, say, 10 bit blocks and recover each one by brute force. This capability
might be possible in a more intrusive fault model.
3.3 DES Vulnerabilities
After reviewing the discoveries of Boneh, DeMillo and Lipton, one might consider
whether fault analysis can be applied to cryptosystems which do not utilize modular
arithmetic. Typically, symmetric ciphers use bit or byte oriented operations ( e.g.,
AND, XOR, ROTATES ) and so the techniques previously discussed are not directly
applicable.
Biham and Shamir quickly answered this point in [6]. They showed that an
implementation of the Data Encryption Standard ( DES ) could be easily broken
if it was subject to the same random register faults that Boneh et al. considered.
Their method of attack combined techniques from differential cryptanalysis [5] with
fault analysis, and was aptly named differential fault analysis.
We present a version of Biham and Shamir’s attack on DES and then describe
how an adversary can attack DES by exploiting permanent register faults. Before
dealing with these topics, we give a brief overview of DES.
3.3.1 DES Algorithm
DES is the most widely recognized and implemented block cipher in the world to
date. Most readers will be familiar with this cipher so our description will mainly
serve as an introduction to the notation which we use in subsequent sections. Further
details about DES can be found in [22].
DES is a 16-round Feistel cipher which uses a 56-bit key K to map 64-bit message
blocks to 64-bit ciphertext blocks. Each round of DES updates two 32-bit registers,
Ri and Li, using the round function f and some bits of K. A DES encryption is
described in Figure 3.3.
The algorithm works as follows. The message M is subject to an initial permu-
tation, IP , and is then halved into L0 and R0. Each half is then updated according
to the operations described in lines 4 and 5. The round function, f , takes two
INPUT: M,K
OUTPUT: C = DESK(M)
1 derive the subkeys K1, K2, . . . , K16 from K
2 L0R0 ← IP (M)
3 for i = 1 . . . 16 do
4 Li ← Ri−1
5 Ri ← Li−1 ⊕ f(Ri−1, Ki)
6 C ← FP (R16L16)
7 return C
Figure 3.3: The DES algorithm.
inputs, the value of Ri−1 and a subkey. Each of the 16 subkeys, K1, K2, . . . , K16,
is composed of a subset of 48 bits of K. The subkeys are pre-computed in line 1
but to save memory it is also possible to generate them on the fly.
The round function f is defined as:
f(Ri−1, Ki) = P (S(E(Ri−1) ⊕Ki)).
Here, the right half, Ri−1, is expanded to 48 bits by the expansion permutation, E,
and is then xored with Ki. The result is then broken into 6-bit blocks and used to
index entries in 8 tables or S-boxes. This operation is denoted by S. Each table
entry is 4 bits so the result of S is 32 bits. The bits of the returned table entries
are then permuted according to the round permutation P .
At the end of round 16 a final permutation, FP , is applied to the right and
left halves. This is the inverse of the initial permutation ( i.e., FP = IP^{-1} ). The
output of the algorithm is C = DESK(M), the encryption of M under the key K.
The definitions of all the permutations and tables used in DES are public knowledge.
Thus, the security of a DES encryption rests solely in the secrecy of the
key.
3.3.2 Differential Fault Analysis
Consider a smartcard which implements DES, as summarized in Figure 3.3. The
environment in which the smartcard operates can be controlled by any party in
possession of it, so there are several ways in which a malicious user can force
a malfunction, including changing the power supply voltage, adjusting the clock
frequency or applying radiation.
Suppose that smartcard malfunctions are realized as single bit inversions in the
registers which store the 32-bit values Li−1 and Ri−1. These faults affect interme-
diate values computed during a DES encryption and can therefore cause erroneous
output. From the description of DES, we have Ri = Li−1 ⊕ f(Ri−1, Ki), so the
only consequence of a single bit error in Li−1 is an identical single bit error in Ri.
Because of this 1-1 correspondence, we can simplify our fault model to consider
errors only in Ri−1. In the following analysis, we assume erroneous encryptions
are the result of a single bit of Ri−1 being flipped, for some value of i. Figure 3.4
describes a version of the DES algorithm under this model.
To mount Biham and Shamir’s attack, an adversary obtains two encryptions of
some ( possibly unknown ) plaintext from the smartcard. One encryption is carried
INPUT: M,K
OUTPUT: C = DESK(M)
1 derive the subkeys K1, K2, . . . , K16 from K
2 L0R0 ← IP (M)
3 for i = 1 . . . 16 do
∗ register fault: Ri−1 ← Ri−1 ⊕ e
4 Li ← Ri−1
5 Ri ← Li−1 ⊕ f(Ri−1, Ki)  where f(Ri−1, Ki) = P (S(E(Ri−1) ⊕Ki))
6 C ← FP (R16L16)
7 return C
Figure 3.4: A faulted version of the DES algorithm.
out under normal environmental conditions, resulting in the ciphertext C, and the
other is carried out under some environmental stress so that the register fault at
line ∗ occurs, resulting in the ciphertext Ĉ. We will assume, for the time being, that
only one register fault occurs during an erroneous encryption. Denote the value of
i, or equivalently, the encryption round, when the fault occurred by i∗. Ciphertext
Ĉ was corrupted by a single bit error in Ri∗−1. Figure 3.4 denotes a particular bit
error using a 32-bit string, e, which has Hamming weight equal to 1.
By inverting the final permutation, FP , an adversary can construct R16L16 from
C and R̂16L̂16 from Ĉ. Further, since L16 = R15 ( Figure 3.4, line 4 ) an adversary
also knows R15 and R̂15. If the register fault occurred in round 16 ( i.e., i∗ = 16 )
then R15 ⊕ R̂15 will reveal precisely which bit of R̂15 was inverted. The subsequent
steps in the attack may seem more intuitive with reference to a diagram of the
last few rounds of DES ( Figure 3.5 ).



















Figure 3.5: The computational path of the last few rounds of a DES encryption.
Continuing under the assumption i∗ = 16, we have L15 = L̂15. The output of
the function f in round 16 is masked by this value, but an attacker can calculate
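$$R_{16} \oplus \hat{R}_{16} = \left( L_{15} \oplus f(R_{15}, K_{16}) \right) \oplus \left( \hat{L}_{15} \oplus f(\hat{R}_{15}, K_{16}) \right) = f(R_{15}, K_{16}) \oplus f(\hat{R}_{15}, K_{16})$$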
to reveal the difference in the output of the two round functions. Moreover, since
permutations are linear operations, the difference in the output of the S-box table
lookups is revealed by:
$$P^{-1}(R_{16} \oplus \hat{R}_{16}) = S(E(R_{15}) \oplus K_{16}) \oplus S(E(\hat{R}_{15}) \oplus K_{16})$$
By design, the S-box operation is nonlinear so the influence of K16 is not cancelled
out in this last calculation. The difference in the input to the S-box operation is
revealed by E(R15 ⊕ R̂15).
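The cancellation of L15 in the case i∗ = 16 can be checked numerically. The following is a minimal sketch, not an implementation of DES: the round function `toy_f`, the subkey values and the register contents are all hypothetical stand-ins, and the permutations IP and FP are omitted. The identity depends only on the Feistel structure, so any deterministic round function will do.

```python
import hashlib

def toy_f(r: int, k: int) -> int:
    """Stand-in round function (NOT the DES f): a deterministic 32-bit mix."""
    data = r.to_bytes(4, "big") + k.to_bytes(6, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big")

def feistel(l: int, r: int, subkeys, fault_round=None, e=0):
    """Run the loop of Figure 3.4 without IP and FP; returns (L16, R16)."""
    for i, k in enumerate(subkeys, start=1):
        if i == fault_round:
            r ^= e                     # line *: R_{i-1} <- R_{i-1} xor e
        l, r = r, l ^ toy_f(r, k)      # lines 4 and 5
    return l, r

subkeys = [0x0123456789AB + i for i in range(16)]  # arbitrary 48-bit values
L0, R0 = 0xDEADBEEF, 0x01234567                    # arbitrary register contents
e = 1 << 7                                         # error string of Hamming weight 1

L16, R16 = feistel(L0, R0, subkeys)
L16f, R16f = feistel(L0, R0, subkeys, fault_round=16, e=e)

# L16 = R15, so the xor of the two left halves exposes the inverted bit.
assert L16 ^ L16f == e
# L15 is common to both encryptions, so it cancels in R16 xor R16-hat.
assert R16 ^ R16f == toy_f(L16, subkeys[15]) ^ toy_f(L16f, subkeys[15])
```

With the real DES round function the right-hand side is exactly the quantity that is fed through P−1 in the next step of the attack.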
Differential cryptanalysis uses these input and output differences to derive
information about the 48-bit subkey K16. To illustrate, suppose we have:
E(R15 ⊕ R̂15) = 0x100000000000 = 000100|0 . . . 0
P−1(R16 ⊕ R̂16) = 0xC0000000 = 1100|0 . . . 0
These values³ indicate that in round 16, the input difference to the first S-box, S1,
is 000100, or 0x4, and the output difference is 1100 or 0xC. Six bits of K16 influence
the output difference of S1. Referring to the difference distribution tables compiled
in [5], we see that out of all possible 6-bit values only two can produce this output
difference. Hence, we are effectively able to deduce 5 bits of K16.
S1    0x0 0x1 0x2 0x3 0x4 0x5 0x6 0x7 0x8 0x9 0xA 0xB 0xC 0xD 0xE 0xF
0x1     0   0   0   6   0   2   4   4   0  10  12   4  10   6   2   4
0x2     0   0   0   8   0   4   4   4   0   6   8   6  12   6   4   2
0x4     0   0   0   6   0  10  10   6   0   4   6   4   2   8   6   2
0x8     0   0   0  12   0   8   8   4   0   6   2   8   8   2   2   4
0x10    0   0   0   0   0   0   2  14   0   6   6  12   4   6   8   6
0x20    0   0   0  10   0  12   8   2   0   6   4   4   4   2   0  12
Figure 3.6: Rows from the difference distribution table of S1.
³ Hexadecimal values are prefixed with 0x.
The rows of the difference distribution table for S1 which correspond to single
bit errors are presented in Figure 3.6. The leftmost column of the table indicates
the input difference and the uppermost row indicates the output difference. The
remaining entries enumerate the number of 6-bit values of the key which produce a
given output difference. One of the design criteria of the S-boxes was that changing
an input bit causes at least two output bits to change. This explains why five
columns of zeroes appear in Figure 3.6.
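The rows of Figure 3.6 can be regenerated directly from the definition of S1. The sketch below assumes the standard S1 table from the DES specification [22] and the usual indexing convention, in which the outer two input bits select the row and the middle four bits select the column; the function names are ours.

```python
# S-box S1 of DES; the outer two input bits select the row, the middle four the column.
S1 = [
    [14, 4, 13, 1, 2, 15, 11, 8, 3, 10, 6, 12, 5, 9, 0, 7],
    [0, 15, 7, 4, 14, 2, 13, 1, 10, 6, 12, 11, 9, 5, 3, 8],
    [4, 1, 14, 8, 13, 6, 2, 11, 15, 12, 9, 7, 3, 10, 5, 0],
    [15, 12, 8, 2, 4, 9, 1, 7, 5, 11, 3, 14, 10, 0, 6, 13],
]

def s1(x: int) -> int:
    """Look up a 6-bit input in S1 and return the 4-bit output."""
    row = ((x >> 4) & 0b10) | (x & 1)
    col = (x >> 1) & 0xF
    return S1[row][col]

def ddt_row(dx: int) -> list:
    """Count, for each output difference, the 6-bit inputs x that produce it."""
    counts = [0] * 16
    for x in range(64):
        counts[s1(x) ^ s1(x ^ dx)] += 1
    return counts

for dx in (0x1, 0x2, 0x4, 0x8, 0x10, 0x20):
    print(f"{dx:#04x}", ddt_row(dx))
```

Since the S-box input is the xor of known bits with key bits, counting inputs is the same as counting 6-bit key values. The entry `ddt_row(0x4)[0xC]` is the value 2 used in the example above.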
On average, an error in round 16 eliminates all but 64/11 ≈ 6 key values for each
S-box it affects. The first error which affects an S-box will provide an attacker with
about 3 key bits. Because of the definition of the expansion permutation ( Figure
3.7 ) a fault in R̂15 is just as likely to affect two S-boxes as one, and this would
reveal additional key bits.
32 1 2 3 4 5
4 5 6 7 8 9
8 9 10 11 12 13
12 13 14 15 16 17
16 17 18 19 20 21
20 21 22 23 24 25
24 25 26 27 28 29
28 29 30 31 32 1
Figure 3.7: DES expansion permutation.
Of course, not all faults occur in round 16, but faults in round 15 can be analyzed
in a similar manner. Suppose now that i∗ = 15. In this case we do not know exactly
which bit of R̂14 was inverted, but the range of possibilities can be narrowed. As
before, we can determine R15 and R̂15; then P−1(R15 ⊕ R̂15) reveals which S-box(es)
were affected by the fault in R̂14. For example, if this calculation reveals that only
S2 was affected by the fault then we know from Figure 3.7 that the error occurred
in bit position 6 or 7. Likewise, if both S1 and S2 were affected by the fault, then
the error occurred in bit position 4 or 5.
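The lookup from affected S-boxes back to candidate bit positions can be automated. A small sketch using the table of Figure 3.7, where row s of the table lists the bits of the right half that feed S-box s; the helper names are ours.

```python
# DES expansion permutation E, as listed in Figure 3.7 (one row per S-box).
E = [
    32, 1, 2, 3, 4, 5,
    4, 5, 6, 7, 8, 9,
    8, 9, 10, 11, 12, 13,
    12, 13, 14, 15, 16, 17,
    16, 17, 18, 19, 20, 21,
    20, 21, 22, 23, 24, 25,
    24, 25, 26, 27, 28, 29,
    28, 29, 30, 31, 32, 1,
]

def sboxes_fed_by(bit: int) -> list:
    """Return the S-boxes (numbered 1 to 8) whose input includes this bit."""
    return sorted({index // 6 + 1 for index, source in enumerate(E) if source == bit})

def candidate_bits(affected) -> list:
    """Return the bit positions that feed exactly the given set of S-boxes."""
    return [bit for bit in range(1, 33) if sboxes_fed_by(bit) == sorted(affected)]

print(candidate_bits([2]))     # bits that reach S2 only
print(candidate_bits([1, 2]))  # bits that reach both S1 and S2
```

Half of the 32 bit positions appear twice in E, which is why a single bit fault is as likely to affect two S-boxes as one.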
The value of P−1(R15 ⊕ R̂15) also reveals the output difference of any affected
S-box in round 15 and this may determine the exact location of the error. Suppose
that the fault inverted either bit 1 or 32 and the output difference of S1 is 0x5.
From the difference distribution table of Figure 3.6 we see there are no 6-bit key
values which produce an output difference of 0x5 when the input difference is 0x10.
Therefore, the error could not have inverted bit 1.
The output difference of the S-boxes in round 16 is revealed by P−1(R16⊕R̂16⊕e)
where e is one of two possible error strings. At least two of the S-boxes will have
non-zero input differences and the value of e can mask the output difference of at
most one of them. Thus, not knowing the exact value of e is of little consequence
since we can work around it if necessary.
Potentially, a fault in round 15 can reveal more information about K16 than a
fault in round 16. In any case, given a ciphertext pair 〈C, Ĉ〉 an adversary can
easily determine whether i∗ = 15 or 16 and then uncover that information. With
enough ciphertext pairs all 48 bits of K16 can be determined. The remaining 8 bits
of K can be found using a brute force search or, alternatively, the last round of
DES can be peeled back and then differential fault analysis can be re-applied using
a subset of the faulty ciphertext pairs. The latter technique can be used to attack
triple DES.
On a personal computer, Biham and Shamir implemented their attack by sim-
ulating random faults in the Ri register throughout the 16 rounds of DES; that is,
one fault, uniformly distributed over the 16 rounds, per encryption. The two were
able to deduce bits of K16 using ciphertext pairs where the erroneous ciphertext
resulted from an error in the last three DES rounds.
A particularly clever part of their implementation is the way they
counted 6-bit key values. Initially, each S-box in round 16 is affected by any one
of 64 possible 6-bit key values. As ciphertext pairs are analyzed, input and output
differences are derived that narrow the correct 6-bit values to subsets of the 64
possibilities. For each S-box, the number of times that a 6-bit value falls in one of
these subsets is counted. After all ciphertext pairs have been analyzed the correct
6-bit values are expected to be counted more frequently than any other value and
can thereby be identified.
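The counting technique can be sketched for a single S-box as follows. This is an illustration under simplifying assumptions rather than a reconstruction of Biham and Shamir's program: only S1 is modelled, every simulated fault flips one input bit of S1, and the secret 6-bit key chunk, the seed and the number of pairs are arbitrary choices of ours.

```python
import random

S1 = [
    [14, 4, 13, 1, 2, 15, 11, 8, 3, 10, 6, 12, 5, 9, 0, 7],
    [0, 15, 7, 4, 14, 2, 13, 1, 10, 6, 12, 11, 9, 5, 3, 8],
    [4, 1, 14, 8, 13, 6, 2, 11, 15, 12, 9, 7, 3, 10, 5, 0],
    [15, 12, 8, 2, 4, 9, 1, 7, 5, 11, 3, 14, 10, 0, 6, 13],
]

def s1(x: int) -> int:
    """Look up a 6-bit input in S1 (outer bits pick the row, middle bits the column)."""
    return S1[((x >> 4) & 0b10) | (x & 1)][(x >> 1) & 0xF]

def consistent_keys(a: int, a_faulty: int, dy: int) -> set:
    """Key chunks k for which the known inputs a, a_faulty give output difference dy."""
    return {k for k in range(64) if s1(a ^ k) ^ s1(a_faulty ^ k) == dy}

def count_keys(observations) -> list:
    """Tally how often each 6-bit key chunk is consistent with an observation."""
    counts = [0] * 64
    for a, a_faulty, dy in observations:
        for k in consistent_keys(a, a_faulty, dy):
            counts[k] += 1
    return counts

# Simulated observations: (6 bits of E(R15), 6 bits of E(R15-hat), S1 output difference).
rng = random.Random(2001)
secret = 0x2B                      # hypothetical 6-bit chunk of K16
observations = []
for _ in range(8):
    a = rng.randrange(64)
    a_faulty = a ^ (1 << rng.randrange(6))
    observations.append((a, a_faulty, s1(a ^ secret) ^ s1(a_faulty ^ secret)))

counts = count_keys(observations)
print("most frequently counted chunks:",
      [hex(k) for k in range(64) if counts[k] == max(counts)])
```

The correct chunk is consistent with every observation, so its count always equals the number of pairs analyzed; wrong chunks drop out as pairs with different input differences accumulate.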
Although on average only 3/16 of their generated ciphertext pairs were useful in
attacking K16, Biham and Shamir found they were able to completely determine
this subkey with 50 to 200 ciphertext pairs.
3.3.3 Intrusive Fault Analysis
One criticism of differential fault analysis is that the fault model it assumes is
unrealistic. Biham and Shamir responded to this with a host of alternate attacks
which exploit permanent or stuck faults in hardware registers, which they hope are
less controversial.
These new attacks require an adversary to physically intrude into the circuitry
of cryptographic tokens and then fix the contents of some memory cells with the aid
of, say, a narrow laser beam. More frugal attackers might choose to probe memory
cells [25]. For smart cards, this capability first requires an adversary to expose the
circuitry of the embedded chip. Anderson and Kuhn explain how to accomplish
this with a process they claim is easy to do [2]. Under this intrusive fault model it
is possible to analyze erroneous DES encryptions without the use of the differential
tables previously required.
DES may be implemented in hardware using an iterative design so that only
one register is used to store the 16 values of Li. Suppose that the least significant
bit of this register is damaged by cutting the wire which either enters or leaves that
memory cell, so that its contents are always 0. Figure 3.8 depicts the last round of
DES under this fault.








Figure 3.8: The effect of a permanent register fault in the last round of DES.
Recall that, given a ciphertext, an adversary can reconstruct R16. The least
significant bit of R16 equals the least significant bit of L15 xored with an output
bit of an S-box. Because of the register damage, the least significant bit of L15 is
0 so the S-box output bit is revealed. By inverting the round permutation P we
find that we can determine an output bit of S7. The input to S7 is the xor of 6
unknown key bits and 6 known bits of R15. All of the 64 possible key values can be
exhausted to see which ones give an output that agrees with the least significant
bit of R16. One ciphertext will eliminate about half of the possible 6-bit key values.
With several ciphertexts an attacker can use the key counting technique described
previously to identify the correct value. The key input to other S-boxes can be
revealed by damaging additional register cells and obtaining more ciphertext.
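The elimination step can be illustrated with a short sketch. We use S1 as a stand-in for S7, since only the input and output sizes matter for the argument, and we assume the attacker has already mapped the exposed bit of R16 back through P to one output bit of the S-box. The secret key chunk and the known inputs are hypothetical.

```python
S1 = [
    [14, 4, 13, 1, 2, 15, 11, 8, 3, 10, 6, 12, 5, 9, 0, 7],
    [0, 15, 7, 4, 14, 2, 13, 1, 10, 6, 12, 11, 9, 5, 3, 8],
    [4, 1, 14, 8, 13, 6, 2, 11, 15, 12, 9, 7, 3, 10, 5, 0],
    [15, 12, 8, 2, 4, 9, 1, 7, 5, 11, 3, 14, 10, 0, 6, 13],
]

def s1(x: int) -> int:
    """Look up a 6-bit input in S1 (stand-in for S7 in this sketch)."""
    return S1[((x >> 4) & 0b10) | (x & 1)][(x >> 1) & 0xF]

def surviving_keys(known_input: int, observed_bit: int, bit_index: int, candidates):
    """Keep the key chunks whose S-box output agrees with the exposed bit."""
    return [k for k in candidates
            if ((s1(known_input ^ k) >> bit_index) & 1) == observed_bit]

secret = 0x15                                   # hypothetical 6-bit chunk of K16
known_inputs = [0x00, 0x2A, 0x13, 0x3F, 0x07, 0x19]
bit_index = 3                                   # which S-box output bit is exposed

candidates = list(range(64))
sizes = []
for a in known_inputs:
    exposed = (s1(a ^ secret) >> bit_index) & 1  # what the damaged token leaks
    candidates = surviving_keys(a, exposed, bit_index, candidates)
    sizes.append(len(candidates))

print(sizes)
```

Each row of a DES S-box is a permutation of the values 0 to 15, so every output bit is balanced and the first ciphertext leaves exactly 32 candidates.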
In this attack it is not necessary for an adversary to process pairs of ciphertext.
Ciphertext that results from faulty DES encryptions alone will suffice, although it
may be advantageous to obtain a valid plaintext ciphertext pair before any damage
is done to the token. With about six ciphertexts per S-box an attacker will be able
to uncover K16 and then the remaining key bits can be found by brute force search.
If a hardware token implements DES using distinct registers for the values of
Li ( i.e., an unrolled implementation ) the attack becomes easier. Destroying all
the memory cells of L15 exposes the output of the S-boxes in round 16. With one
ciphertext, the input and output of any S-box can be exhaustively compared using each
of 64 possible key inputs. This will narrow the key value to one of four possibilities.
It takes only about two ciphertexts to determine the last round’s subkey.
In iterated implementations of DES it is also possible to target key bits across the
16 rounds, rather than the key bits used in a particular round ( i.e., a subkey ), in
what Biham and Shamir describe as a vertical attack. In this approach an adversary
successively encrypts a constant message, M , 48 times. Each encryption is carried
out with an additional one of the wires which transfers subkey bits into the f
function severed. Initially, no wires are severed. Denote the resulting ciphertexts
by C0, C1, . . . C47.
Ciphertext C47 is the encryption of M with all subkeys equal to zero except in
their last bit position. An adversary can now determine the last bit of each subkey
by encrypting M under the $2^{16}$ possible sets of subkeys and comparing the results to C47.
The value of these bits gives 16 bits of the DES key. Additional key bits can be
recovered by repeating this process using C46. The key bits used to form bits 47
and 46 of each subkey are not independent of each other so there are fewer than $2^{16}$
values to exhaust in this second step. The attack proceeds by examining each of
C47, C46, C45, . . . in this way until the complete key is reconstructed.
3.4 Countermeasures
All of the attacks we have surveyed in this chapter require a cryptographic im-
plementation to somehow provide erroneous output. To resist these attacks, it is
sufficient that an implementation simply does not provide this output. To this
purpose, the results of cryptographic operations can be verified before they are
publicly exposed. Verifying a result requires extra work and the subsequent loss in
efficiency depends upon the details of the implementation.
In ciphers such as DES, checking a ciphertext for correctness can be done by
computing the encryption function twice and comparing the two results. This
decreases efficiency by a factor of 2, and worse, in the random transient fault
model, this precaution may fail to detect erroneous output with non-negligible
probability. For example, in our discussion of differential fault analysis there were
32·16 = 512 bit positions in which a fault could occur. Given two faulty encryptions,
the probability that the same fault occurred in each is 1/512. Now, the number of
encryptions a malicious user must try in order to obtain the required number of
faulty ciphertexts increases by a factor of 512. In the intrusive fault model, this
countermeasure will fail completely since faults are permanent and they will affect
both encryptions in the same way. Using a decryption to verify the correctness of
a ciphertext seems to be a better choice for a computational check.
The time it takes to verify an RSA signature depends upon the value of the
public exponent, and it is common to use a small value ( e.g., e = 3 ) to
exploit this fact. Thus, verifying a signature may not be as costly as generating
it, and the overhead of using this countermeasure can be small. When e is large,
Shamir has proposed the following check for implementations which use the CRT
that is less costly than a full signature verification. Recall that with the CRT,
signatures are calculated using the values $S_p = M^d \bmod p$ and $S_q = M^d \bmod q$
and errors in these computations are particularly disastrous. To facilitate a quick
computational check, Shamir instead suggests that a random value, r, about 32 bits
in size, be chosen, and then the values $S_{pr} = M^d \bmod pr$ and $S_{qr} = M^d \bmod qr$
calculated. If $S_{pr} \bmod r = S_{qr} \bmod r$, then the exponentiations were carried out
correctly ( with high probability ) and the signature can be constructed from a
linear combination of $S_p = S_{pr} \bmod p$ and $S_q = S_{qr} \bmod q$.
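Shamir's check can be sketched in a few lines. The example below is a toy under stated assumptions: the primes are far too small for real use, the value of r is fixed rather than random, the full exponent d is used in both exponentiations as in the description above, and a fault is simulated by xoring a value into one half-signature. The function names are ours, and `pow(q, -1, p)` requires Python 3.8 or later.

```python
def crt_combine(sp: int, sq: int, p: int, q: int) -> int:
    """Return the unique S mod p*q with S = sp (mod p) and S = sq (mod q)."""
    h = (pow(q, -1, p) * (sp - sq)) % p
    return sq + h * q

def sign_checked(m: int, d: int, p: int, q: int, r: int, fault: int = 0):
    """RSA-CRT signature with Shamir's check; returns None if the check fails."""
    spr = pow(m, d, p * r) ^ fault   # a nonzero `fault` simulates a computational error
    sqr = pow(m, d, q * r)
    if spr % r != sqr % r:           # both residues should equal M^d mod r
        return None                  # withhold the erroneous result
    return crt_combine(spr % p, sqr % q, p, q)

# Toy parameters (assumptions for illustration only).
p, q, e = 104729, 65537, 3
d = pow(e, -1, (p - 1) * (q - 1))
r = 2147483647                       # stands in for a random 32-bit value
m = 123456789

signature = sign_checked(m, d, p, q, r)
print(signature is not None and pow(signature, e, p * q) == m)
print(sign_checked(m, d, p, q, r, fault=1))
```

The extra cost is that of working with moduli roughly 32 bits longer than p and q, which is small compared with a full verification when e is large.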
Randomization can also be used to resist fault analysis attacks. Padding a
message M with random bits before it is encrypted or signed will defeat all of the
attacks we have discussed, except for the ones which exploit intrusive faults. For
DES and other ciphers with small block sizes, this approach is not likely to be
viable since some number of input bits would have to be sacrificed to store random
values. For RSA, signature schemes which incorporate randomization have been
described in detail and can be implemented with little overhead [3].
Intrusion detection and self-tests are other methods which cryptographic tokens
can use to protect against these attacks. Cryptographic hardware is commonly engi-
neered to conform to the FIPS 140-1 standard which encompasses these techniques
[21].
3.5 Remarks
Fault analysis has been applied to elliptic curve cryptosystems, as described in [4].
The authors there explain how register faults can perturb points from cryptographi-
cally strong curves onto less strong curves. An adversary can then solve the discrete
log problem on the weaker curve to gain information about the private key.
The attacks Biham and Shamir proposed which exploit intrusive faults are very
similar to the probing attacks described in [25]. The authors in [25] address the
fact that even if an adversary can intrude into the circuitry of a target device it is
unlikely that he or she will be able to target particular memory cells or bus lines.
Biham and Shamir do not deal with this issue since they assume that an adversary
can damage particular components at will. With this capability, it is likely that an
adversary will choose to probe memory cells containing key bits and read the key




The notion that the power consumption of a cryptographic token can convey sensi-
tive information to an adversary was suggested, almost offhandedly, in [30]. There,
Kocher noted that padding the execution time of operations with dummy compu-
tations ( e.g., empty loops ) may be an ineffective defense against timing attacks
since the power consumption of dummy computations can be much different from
meaningful ones. In this case, an adversary could plot, or trace, the power con-
sumption of a token as it executes a particular operation and then deduce a valid
timing measurement from the length of the initial pattern in the trace.
It is not difficult to imagine a situation where an adversary might have the
opportunity to collect power consumption data. In digital cash systems, a patron
typically initiates a purchase by inserting his or her token into a device, such as a
reader, which is assigned to a vendor. If the token draws power from the reader
CHAPTER 4. POWER ANALYSIS 61
then the vendor can potentially monitor this power consumption. So, to evaluate
the security of such systems, the information that an adversary can derive from a
token’s power consumption must be accounted for. Kocher and his newly founded
consulting company apparently spent several months investigating this topic.
In 1998, Kocher and the results of his research were again featured in the New
York Times [50]. The story there summarized some of the details concerning power
analysis that Kocher had recently announced. One particularly startling claim was
that for some tokens, a power trace of a single cryptographic operation is enough
to completely reveal the value of an embedded secret key. Even more startling was
the claim that by examining roughly 1000 power traces Kocher and his employees
had managed to break every smart card product they had examined in the last
year and a half. As more technical details [31] concerning these discoveries were
released it became clear that power analysis was a serious threat to the security of
cryptographic tokens.
Outline
We first explain why the power consumption of tokens is correlated to the calcu-
lations they perform. Next, we show how power consumption information can be
analyzed to deduce what operation a token is executing at a particular moment,
as well as what operands it is manipulating. We describe Simple Power Analy-
sis ( SPA ), some Hamming weight attacks, and then Differential Power Analysis
( DPA ). We end by surveying some countermeasures against these attacks.
Much of the author’s research into power analysis was conducted during a work
term with the Advanced Concepts and Technology group at Pitney Bowes Inc. in
Shelton, CT. The graphics displayed in this chapter appear courtesy of Pitney Bowes.
4.2 Power Dissipation
Electronic devices draw current from a power source during their operation. The
amount of current they draw varies as the paths the current follows through the
device change. To measure the flow of current a small ( approximately 10-50 Ω )
resistor is put in series with a device’s power supply. An oscilloscope can be used
to measure the voltage difference across the resistor and the current can then be
deduced using Ohm’s law¹. Digital oscilloscopes can be used to sample voltages at
high frequencies giving a trace of the flow of current over an interval of time.
The source of current for most devices is supplied at a constant voltage and so
the power dissipated by these devices is proportional to the flow of current through
them². Because of this, power analysis attacks work just as well with current
measurements as they do with power measurements. Hence, the only difference
between a power analysis attack and a current analysis attack is a constant factor.
Most modern cryptographic devices are implemented in CMOS ( Complementary
Metal Oxide Semiconductor ) logic. The basic building block of CMOS logic
is the inverter, or NOT gate. As depicted in Figure 4.1, the inverter contains two
transistors which act as voltage controlled switches. When the input voltage to the
inverter is high, the top switch opens while the bottom switch closes. This grounds
the inverter’s output and so it goes low. Conversely, when the input voltage is low,
the top switch closes and the bottom opens, setting the output to +V, which thus goes
high.
Figure 4.1: CMOS logic inverter leading to a capacitor.
¹ From Ohm’s law, we have I = V/R where V is the voltage measured across a resistance, R, and
I is the current.
² If P is the power then P = IV .
There is a brief instant, when the inverter is in transition between states, when
both transistors conduct current. This causes a short circuit from +V to the ground
which temporarily draws current. Even in a static state, transistors continu-
ously draw a small amount of current which is dissipated as heat and radiation.
The dominant source of power dissipation is usually the charging
and discharging of internal capacitive loads attached to gate outputs. A thorough
discussion of all three factors is given in [51] and [16].
Power consumption information is useful to an adversary because it is correlated
to the calculations the token is making.
4.3 Correlation with Operations
In devices with microprocessors, such as smart cards, a few primitive operations
( e.g., LOAD, STORE, etc. ) are used repeatedly during a computation, causing a
regular switching of transistors. This regularity is often observable in power traces
as repeated patterns. In iterative computations, including most cryptographic al-
gorithms, this regularity is especially apparent and can leak sensitive information
to observers.
4.3.1 Simple Power Analysis
Simple power analysis ( SPA ) is a technique whereby information about the oper-
ation of a cryptographic token is deduced directly from a power trace. Depending
on how a cipher is implemented, this information may reveal key material.
Figure 4.2 displays two representations of power consumption data acquired
from a smart card during the first few rounds of a DES operation. The mea-
surements were collected at a rate of 100 MHz using a digital oscilloscope which
converted analog voltages, measured across a resistor, into 12-bit values.
The top trace is composed of raw voltage samples. As transistors in the device
switch, the measured voltage either spikes or dips suddenly. Since a large number of
transistors switch during this computation the trace appears rather noisy. However,
some features, in the early part of the trace at least, can still be discerned. The
bottom trace replaces each group of 100 samples in the top trace with their average,
which smooths out most of the erratic spikes and dips. The resulting trace is
much clearer and can be compared in greater detail to the description of the DES
algorithm.
Figure 4.2: SPA traces of a DES operation.
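The averaging step used to produce the bottom trace can be written as follows. A minimal sketch: the function name is ours, and discarding a trailing partial block is our choice rather than something stated in the text.

```python
def block_average(samples, width=100):
    """Replace each consecutive group of `width` samples with its mean.

    A trailing group shorter than `width` is discarded.
    """
    usable = len(samples) - len(samples) % width
    return [sum(samples[i:i + width]) / width for i in range(0, usable, width)]

# Tiny example with width 2 so the result is easy to check by hand.
print(block_average([1, 3, 5, 7, 9], width=2))
```

With a width of 100, a trace sampled at 100 MHz is reduced to one averaged point per microsecond.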
A sequence of operations, constituting a single round, is iterated 16 times during
a DES encryption. In the averaged trace of Figure 4.2, we can see a pattern between
indices 95 000 and 140 000 which seems to repeat throughout the remainder of the
trace. Each occurrence of this pattern is prefixed by a characteristic that resembles
either a V or a W. From a trace of the complete DES operation a total of 16 V’s
and W’s appear, marking 16 occurrences of the pattern. This evidence suggests
that the pattern may represent the calculations of a single DES round. Figure
4.3 provides a more detailed view of the trace of the first three occurrences of the
pattern.
The exact sequence of V’s and W’s, as read from the complete trace, is:
VVWWWWWWVWWWWWWV. When DES subkeys are generated on the fly, the
56-bit key is initially halved into two registers which are then rotated and permuted
as the subkeys are needed. The sequence of V’s and W’s corresponds exactly to the
sequence of rotations described in the DES key schedule [22]: 1122222212222221.
From this fact we can infer that the smart card generates its subkeys on the fly.
Figure 4.3: An SPA trace of DES rounds one to three.
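The comparison can be made mechanical. In this sketch we assume, as the correspondence above suggests, that a V marks a rotation by one position and a W marks a rotation by two; the names are ours.

```python
# Left-rotation amounts for rounds 1 to 16 of the DES key schedule.
DES_ROTATIONS = [1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 1]

def rotations_from_marks(marks: str) -> list:
    """Translate the characteristics read from a trace into rotation amounts."""
    return [1 if mark == "V" else 2 for mark in marks]

observed = "VVWWWWWWVWWWWWWV"
print(rotations_from_marks(observed) == DES_ROTATIONS)
```

The rotation amounts sum to 28, so each 28-bit register returns to its starting position after the sixteenth round.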
Identifying the power characteristics of key rotations can sometimes reveal key
bits. A common way to implement rotations is to shift one bit off the end of a
register and, by default, append a zero on the other [32]. If the bit shifted off the
end is a one, then the appended zero bit is flipped. This conditional operation may
be detectable in a power trace. In the case of DES, making this determination in
each round would reveal all 56 bits of the key.
Figure 4.4: An SPA trace of an RSA signature operation, annotated with the key bits 0 0 1 1 1.
Exponentiation can also be analyzed using SPA. The conditional branches of
square and multiply algorithms can be identified from power traces if the square
and multiply operations have different power characteristics. Figure 4.4 shows
a portion of a trace from a smart card calculating an RSA signature. Each of
the nine spikes indicates the beginning of a square or multiply operation. Initially,
registers are loaded with values to be squared or multiplied. Multiplications require
additional register loads which increases the width of the leading spike. As a result,
square operations ( narrow spike ) can be distinguished from square-and-multiply
operations ( narrow spike followed by a wider spike ). Thus, five key bits can be
determined from the trace: 00111.
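The way a sequence of operations gives away exponent bits can be sketched as follows. We model the power trace as a log of the letters S ( square ) and M ( multiply ), which assumes the two operations have already been told apart; the left-to-right algorithm shown is the textbook one, and the modulus and base are arbitrary.

```python
def square_and_multiply(m: int, d_bits: str, n: int, log: list) -> int:
    """Left-to-right binary exponentiation that records each operation in `log`."""
    acc = 1
    for bit in d_bits:
        acc = (acc * acc) % n
        log.append("S")
        if bit == "1":              # key-dependent branch: the source of the leak
            acc = (acc * m) % n
            log.append("M")
    return acc

def bits_from_ops(ops) -> str:
    """Recover exponent bits: a square followed by a multiply is 1, a lone square is 0."""
    bits, i = [], 0
    while i < len(ops):
        if ops[i] != "S":
            raise ValueError("every exponent bit must begin with a square")
        if i + 1 < len(ops) and ops[i + 1] == "M":
            bits.append("1")
            i += 2
        else:
            bits.append("0")
            i += 1
    return "".join(bits)

log = []
result = square_and_multiply(7, "00111", 1000003, log)
print("".join(log), bits_from_ops(log))
```

A defense, discussed later in this chapter, is to make the sequence of operations independent of the key bits.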
Interpreting SPA characteristics is more easily done with some details about the
target implementation. With complete details ( e.g., source code ), an attacker can
focus on particular regions of a power trace to try to distinguish the characteristics
of specific operations. Generally, any implementation where the path of execution
is determined by key bits has a potential vulnerability to this attack.
4.4 Correlation with Operands
Microprocessors retrieve values from memory using a data bus. The data bus has
a capacitance associated with it that is charged and discharged according to the
values loaded on it. This causes some variation in a device’s power consumption,
but the effects are usually small and can be overshadowed by measurement error
and other sources of noise³.
³ Statistically, each power consumption measurement in a trace can be treated as an observation
of a random variable. The noise affecting a measurement is just the standard deviation of the
corresponding random variable.
Experiments in [42] and [1] have discovered two types of correlations between
data values and power consumption. Hamming weight correlation occurs when
power consumption varies with the number of ones driven onto the bus. Transition
count correlation occurs when power consumption varies with the number of bits
which change on the bus ( i.e., the Hamming weight of the xor of the current and
previous data value ). Which type of correlation is observed in a particular device
depends on its design.
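The two quantities can be computed as follows. A minimal sketch for values on an 8-bit bus; the function names are ours.

```python
def hamming_weight(value: int) -> int:
    """Number of ones in the binary representation of a non-negative integer."""
    return bin(value).count("1")

def transition_count(previous: int, current: int) -> int:
    """Number of bus lines that change state between two consecutive values."""
    return hamming_weight(previous ^ current)

print(hamming_weight(0b10110000))        # three ones driven onto the bus
print(transition_count(0xFF, 0x0F))      # the four high lines change
```

If the bus is precharged to all zeros before each transfer, the two measures coincide, since the transition count from 0 is just the Hamming weight.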
The power consumption of operations which manipulate key bits is of par-
ticular interest to an adversary. However, without detailed knowledge of an im-
plementation, locating these operations in a single power trace can be difficult.
With access to several power traces, an adversary can apply statistical techniques
to locate these regions.
In this section we give a brief example of how an adversary might exploit power
consumption information correlated to the Hamming weight of operands and then
describe some more general attacks which detect power biases due to the value of
individual bits.
4.4.1 Hamming Weights
















bits of information about its value. The microprocessors used in many crypto-
graphic tokens manipulate data in 8-bit blocks, so power analysis can potentially
















bits of information per key byte for a total of $7 \times 2.54 \approx 17.8$ key bits. This
information makes a DES key ( even more ) susceptible to a brute force attack
since the size of the key space is now reduced to roughly $2^{38}$. However, against ciphers
with longer key lengths, such as triple-DES, exhaustive key searches are infeasible
even with Hamming weight information.
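The figures quoted above can be checked numerically. The sketch below assumes that the 2.54 bits per byte is the Shannon entropy of the Hamming weight of a uniformly random 8-bit value, and that the 56 key bits are treated as seven such bytes; the function name is our own.

```python
from math import comb, log2


def hamming_weight_entropy(bits: int) -> float:
    """Shannon entropy (in bits) of the Hamming weight of a uniform `bits`-bit value."""
    total = 2 ** bits
    entropy = 0.0
    for w in range(bits + 1):
        p = comb(bits, w) / total  # probability that the weight equals w
        entropy -= p * log2(p)
    return entropy


per_byte = hamming_weight_entropy(8)   # about 2.54 bits
leaked = 7 * per_byte                  # about 17.8 bits over 56 key bits
remaining = 56 - leaked                # about 38.2 bits left to search
```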
10 51 34 60 49 17 33 57
2 9 19 42 3 35 26 25
44 58 59 1 36 27 18 41
22 28 39 54 37 4 47 30
5 53 23 29 61 21 38 63
15 20 45 14 13 62 55 31
Figure 4.5: The round one DES subkey.
Depending on the details of the target cipher, it may be possible to use Hamming
weight information in more effective attacks. This is the case with DES, as noted in
[42] and [7]. To illustrate, denote the bits of a DES key, including parity check bits^4,
by $k_1 k_2 \ldots k_{64}$. The key bits which compose the first round’s subkey are described
in Figure 4.5. The Hamming weight of the first byte of this subkey can be described
with the equation:
$$k_{10} + k_{51} + k_{34} + k_{60} + k_{49} + k_{17} + k_{33} + k_{57} = w_1.$$
Expressing the Hamming weight of all key bytes throughout all subkeys in this way
generates a total of 96 equations in 56 unknowns. A calculation using linear algebra
software shows that the coefficient matrix of this system has full rank, so any vector
of Hamming weights produced by an actual key determines that key uniquely.
^4 Every eighth bit is set so that each key byte has an odd Hamming weight.
Practically speaking, it is likely that any vector of Hamming weights deduced
from a power trace will contain errors which can cause difficulties in finding an
integral solution to the system. This problem can be overcome in two ways. The
redundancy in the system of equations can be exploited using standard techniques
from error correcting codes. Alternatively, a careful study of the DES key schedule
shows that each of the 96 equations contains variables from only one of two subsets
of 28 key bits^5. Thus, the original system can be split into two independent systems
of 48 equations and 28 unknowns. Each of the $2^{28}$ possible solutions in each system
can be tried to see which ones agree most closely with the observed Hamming
weights. Thus, the value of the DES key can be deduced.
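The second approach, trying every candidate and keeping the one that agrees most closely with the observed weights, can be sketched on a toy system. Everything below is an assumption made for illustration: the key has only 7 bits, and the 14 equations are cyclic windows of 3 and 2 key bits rather than the real DES key schedule. The mismatch measure (sum of absolute differences) is also our own choice.

```python
from itertools import product

NUM_BITS = 7  # toy key size; each half of the DES system has 28 bits

# Hypothetical equations: each tuple lists the key-bit positions whose sum
# (Hamming weight) is assumed to leak. Cyclic windows of 3 bits, then of 2 bits.
EQUATIONS = [tuple((i + d) % NUM_BITS for d in range(3)) for i in range(NUM_BITS)]
EQUATIONS += [tuple((i + d) % NUM_BITS for d in range(2)) for i in range(NUM_BITS)]


def predicted_weights(key_bits):
    """Hamming weight that each equation would produce for a candidate key."""
    return [sum(key_bits[i] for i in eq) for eq in EQUATIONS]


def closest_key(observed):
    """Exhaustively pick the candidate whose weights best match the observations."""
    def mismatch(candidate):
        return sum(abs(p - o) for p, o in zip(predicted_weights(candidate), observed))
    return min(product((0, 1), repeat=NUM_BITS), key=mismatch)


true_key = (1, 0, 1, 1, 0, 0, 1)
observed = predicted_weights(true_key)
observed[4] += 1  # simulate one weight misread from the power trace
recovered = closest_key(observed)
```

In this toy system the redundant equations are enough for the search to tolerate the single misread weight. For DES the same search would run over $2^{28}$ candidates per half instead of $2^7$.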
^5 This property results from the definition of the DES permutation PC-2.
4.4.2 Differential Power Analysis
Differential power analysis ( DPA ) is probably the most threatening attack to
result from Kocher’s research. To carry out a DPA attack, an adversary must
have a number of power traces collected from a token as it repeatedly executes a
cryptographic operation. The attack proceeds by deducing bits of the secret key,
used in each operation, from the observed power consumption. An adversary must
also have knowledge of either the inputs or outputs processed by the device during
each operation. Usually, an encryption token will use the same key over multiple
operations and any generated ciphertext can be freely obtained by an eavesdropper.
The basic technique of DPA is as follows. Suppose an adversary is able to partition
power traces from several cryptographic operations into two groups according
to the intermediate value of some bit, b, calculated during each operation. This
bit is manipulated during each operation and its value may affect the observed
power consumption. If this is the case, then the two groups of traces should show
different power biases at the locations where b is manipulated. Averaging
the traces in each group helps reduce any noise that may be obscuring these usually
small biases. Plotting the difference of the two average traces reveals any locations
in the traces where these biases occur.
More precisely, let $T_1, T_2, \ldots, T_n$ be the traces collected from a token. Each
trace is an array of k power consumption measurements and represents the power
consumed during one cryptographic operation. For example, a token might execute,
say, 1000 encryptions allowing an adversary to collect n = 1000 traces and
1000 corresponding ciphertexts. The number of measurements in each trace, k,
depends on the sampling rate and memory capacity of an adversary’s equipment,
as well as the duration of the cryptographic operation. Typically, we might have
$10^4 \le k \le 10^6$.
The two halves of the partition are defined as:
$$\mathcal{T}_0 = \{T_i : b = 0\}$$
$$\mathcal{T}_1 = \{T_i : b = 1\}.$$
The value of b is usually related to the inputs or outputs processed by the token.
If these inputs or outputs are sufficiently random, then both $\mathcal{T}_0$ and $\mathcal{T}_1$ will contain
roughly the same number of traces. The partitioning bit b might simply be a
particular bit of ciphertext.
The average trace of each set is defined for $j = 1 \ldots k$ as:
$$A_0[j] = \frac{1}{|\mathcal{T}_0|} \sum_{T_i \in \mathcal{T}_0} T_i[j]$$
$$A_1[j] = \frac{1}{|\mathcal{T}_1|} \sum_{T_i \in \mathcal{T}_1} T_i[j]$$
where $|\mathcal{T}_1| + |\mathcal{T}_0| = n$ and $T_i[j]$ is the jth power consumption measurement in trace
$T_i$. Each of $A_0$ and $A_1$ is an array of k averages. The difference, or differential
trace, of $A_0$ and $A_1$ is defined for $j = 1 \ldots k$ as:
$$\Delta[j] = A_1[j] - A_0[j].$$
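The partition, the two average traces and the differential trace can be illustrated with synthetic data. This is a minimal sketch under assumed conditions: the traces are pure Gaussian noise, the bit b is known for every trace, and the device adds a fixed bias at one sample whenever b = 1. All constants and names are hypothetical, and k is kept far smaller than in a real experiment.

```python
import random

rng = random.Random(1)          # fixed seed so the sketch is repeatable
N_TRACES, N_SAMPLES = 1000, 50  # n and k in the text (k kept tiny here)
LEAK_INDEX = 20                 # the sample at which bit b is manipulated
EPSILON, SIGMA = 1.0, 1.0       # assumed power bias and noise level


def simulate_trace(bit):
    """One synthetic trace: Gaussian noise plus a bias of EPSILON at LEAK_INDEX when bit == 1."""
    trace = [rng.gauss(0.0, SIGMA) for _ in range(N_SAMPLES)]
    trace[LEAK_INDEX] += EPSILON * bit
    return trace


def average(traces):
    """Pointwise mean of a list of equal-length traces."""
    return [sum(column) / len(traces) for column in zip(*traces)]


bits = [rng.getrandbits(1) for _ in range(N_TRACES)]
traces = [simulate_trace(b) for b in bits]

# Partition the traces by the value of b, then average each half.
set0 = [t for t, b in zip(traces, bits) if b == 0]
set1 = [t for t, b in zip(traces, bits) if b == 1]
delta = [a1 - a0 for a1, a0 in zip(average(set1), average(set0))]

# Index of the largest entry (in absolute value) of the differential trace.
peak = max(range(N_SAMPLES), key=lambda j: abs(delta[j]))
```

With these parameters the bias is invisible in any single trace, since it is no larger than the noise, yet the differential trace shows a clear spike at the leaking sample.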
It might be that the token manipulates bit b more than once throughout an operation.
This is the case with the plaintext bits that enter a DES implementation.
Suppose the bit b is manipulated by the token at times $j^*$. If the expected difference
in power when the token manipulates the two values of b is ε, then we have:
$$E[\, T_i[j^*] \mid b = 1 \,] - E[\, T_i[j^*] \mid b = 0 \,] = \varepsilon.$$
At times $j \ne j^*$ the power consumption is independent of the value of b, so:
$$E[\, T_i[j] \mid b = 1 \,] - E[\, T_i[j] \mid b = 0 \,] = E[\, T_i[j] \,] - E[\, T_i[j] \,] = 0.$$
As the number of traces grows, $A_1[j]$ and $A_0[j]$ converge to $E[\, T_i[j] \mid b = 1 \,]$ and
$E[\, T_i[j] \mid b = 0 \,]$ respectively, so in the limit:
$$A_1[j] - A_0[j] = \begin{cases} \varepsilon & \text{for } j = j^* \\ 0 & \text{otherwise} \end{cases}$$
so the differential trace will appear flat with spikes of height ε at times $j^*$.
Figure 4.6 displays the result of this technique when applied to an implementa-
tion of DES. Using known plaintexts, traces of the first two rounds of one thousand
DES encryptions were partitioned into two sets according to the value of the first
bit of the register R0. This bit is just a copy of a particular plaintext bit, and the
distribution of the plaintexts determined that roughly half of the traces were placed
in each partition. For reference, the differential trace is plotted below an average
of all the traces collected. A clear bias or spike can be distinguished in the first
round.
Figure 4.6: The average of 1000 traces and a differential trace.
To see how this technique can be used to recover bits of the secret key, consider
another iteration of this last experiment where the first bit of R1 is used to partition
the traces. Recall that R1 = L0 ⊕ f(R0, K1). Since the plaintext used in each
encryption is known, the only unknowns in this equation are the key bits. Without
knowledge of the key bits, we cannot determine the value of the first bit of R1 and
hence we cannot partition the traces. However, from the definition of the round
function f in Figure 4.7, we see that any bit of R1 is influenced by only 6 key bits.
By exhausting each of the $2^6$ key values we can calculate $2^6$ different partitions
of the traces. Only the correct 6-bit key value will partition the traces according
to the value of the bit actually calculated in the device. Thus, only one of the $2^6$
differential traces will show biases and can therefore be identified. Figure 4.8 shows the
differential trace for the correct key.
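The key-guessing step can be sketched with simulated traces. The example below makes several simplifying assumptions that are not part of the experiment described above: only S-box S1 (as tabulated in [22]) is modelled, the expansion E, the permutation inside f and the XOR with L0 are ignored, the 6 known input bits are drawn at random, and the device is assumed to leak the low output bit of S1 at a single sample. The secret value, noise level and trace sizes are hypothetical.

```python
import random

# DES S-box S1 as given in FIPS 46-3 (rows are indexed by the outer two input bits).
S1 = [
    [14, 4, 13, 1, 2, 15, 11, 8, 3, 10, 6, 12, 5, 9, 0, 7],
    [0, 15, 7, 4, 14, 2, 13, 1, 10, 6, 12, 11, 9, 5, 3, 8],
    [4, 1, 14, 8, 13, 6, 2, 11, 15, 12, 9, 7, 3, 10, 5, 0],
    [15, 12, 8, 2, 4, 9, 1, 7, 5, 11, 3, 14, 10, 0, 6, 13],
]


def target_bit(known_input, key_guess):
    """Low output bit of S1 for a 6-bit known input XORed with a 6-bit key guess."""
    x = known_input ^ key_guess
    row = ((x >> 4) & 2) | (x & 1)   # first and last input bits select the row
    col = (x >> 1) & 0xF             # middle four input bits select the column
    return S1[row][col] & 1


rng = random.Random(7)
N_TRACES, N_SAMPLES, LEAK_INDEX = 2000, 10, 6   # toy sizes (assumptions)
EPSILON, SIGMA = 1.0, 0.5                       # assumed bias and noise
SECRET = 0b101101                               # hypothetical 6-bit subkey chunk

inputs = [rng.randrange(64) for _ in range(N_TRACES)]
traces = []
for x in inputs:
    trace = [rng.gauss(0.0, SIGMA) for _ in range(N_SAMPLES)]
    trace[LEAK_INDEX] += EPSILON * target_bit(x, SECRET)  # the device leaks the real bit
    traces.append(trace)


def peak_for_guess(guess):
    """Largest |differential trace| entry when partitioning by the guessed bit."""
    sums = [[0.0] * N_SAMPLES, [0.0] * N_SAMPLES]
    counts = [0, 0]
    for x, trace in zip(inputs, traces):
        b = target_bit(x, guess)
        counts[b] += 1
        for j, value in enumerate(trace):
            sums[b][j] += value
    return max(abs(sums[1][j] / counts[1] - sums[0][j] / counts[0])
               for j in range(N_SAMPLES))


# Try all 2**6 guesses and keep the one whose differential trace has the tallest spike.
best_guess = max(range(64), key=peak_for_guess)
```

Wrong guesses still produce partitions that are partly correlated with the true bit, so their differential traces are not perfectly flat. They are simply expected to show smaller spikes than the correct guess.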
Proceeding in this way, the subkey used in the first round can be reconstructed
6 bits at a time. Once the complete subkey is known, the remaining 8 bits of the
DES key can be found using an exhaustive search. If an exhaustive search is not
possible, as is the case with triple-DES, the attack can be repeated using the bits
of R2 to partition the traces.
The attack can also be implemented using known ciphertexts. In this case, the
traces are partitioned using bits of L15. Since L15 = R16 ⊕ f(L16, K16), this variation
of the attack extracts bits of the last round’s subkey.
Figure 4.7: The DES f function. ( Diagram labels: input R, expansion E, subkey K_i, and eight S-boxes S. )
Figure 4.8: The differential trace for a correct key guess.
It is important that the execution of the instructions which manipulate the bit
used to partition the traces is aligned in each of the traces ( i.e., the j∗’s are
constant across the different traces ). If this is not the case, then the averaging
step will degrade the power biases rather than reinforce them. In practice, aligning
the power traces can be done by identifying characteristics common to each trace
using SPA techniques.
DPA is generally considered to be a more powerful attack than SPA since the
only implementation aspect it relies on is that the power consumed when a token
processes a 0 is different from when it processes a 1 ( i.e., ε > 0 ). In devices
which show Hamming weight correlation this is certainly true. With transition
count correlation, the DPA technique may still be applicable using a partition function
based on two bits.
4.4.3 Multiple bit DPA
The number of traces, n, required for a successful DPA attack is related to the size
of the power bias ε attributed to the value of the partitioning bit and the noise in


























so the noise in the differential trace is roughly $2\sigma/\sqrt{n}$. To distinguish the biases in
∆ we must have $\varepsilon > 2\sigma/\sqrt{n}$, or equivalently $n > 4\sigma^2/\varepsilon^2$.






Multiple bit DPA attempts to increase the magnitude of the power bias ε so that
DPA can be carried out using fewer traces ( i.e., smaller value of n ).
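The condition that the bias ε must exceed $2\sigma/\sqrt{n}$ can be turned into a rough trace count. The sketch below assumes the simple model in which every measurement has independent noise of standard deviation σ and each half of the partition holds about n/2 traces, so that each average has variance $2\sigma^2/n$ and their difference has standard deviation $2\sigma/\sqrt{n}$. The function names are our own, and the result is only a threshold at which the spike equals one standard deviation of the noise; a practical attack would need a comfortable multiple of it.

```python
from math import sqrt


def differential_noise(sigma: float, n: int) -> float:
    """Approximate noise (standard deviation) of one entry of the differential trace."""
    return 2.0 * sigma / sqrt(n)


def traces_needed(sigma: float, epsilon: float) -> int:
    """Smallest n with epsilon > 2*sigma/sqrt(n), i.e. n > 4*sigma**2 / epsilon**2."""
    bound = 4.0 * sigma ** 2 / epsilon ** 2
    return int(bound) + 1


single_bit = traces_needed(sigma=10.0, epsilon=1.0)   # bias of one bit
whole_byte = traces_needed(sigma=10.0, epsilon=8.0)   # bias eight times larger
```

Because n grows with the square of σ/ε, an eightfold increase in the bias reduces this threshold by a factor of roughly 64.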
In devices which show Hamming weight correlation, if the power bias of different
bit values is ε, then the power bias of different byte values can be as large as 8ε.
Thus, sorting power traces according to the value of multiple bits can result in
differential traces with large spikes, which may be distinguished with fewer traces.
With DES, a guess for the key input to an S-box allows all four output bits of
an S-box to be predicted. In our previous description, we kept track of the value
of only one output bit and ignored the others. Sorting the traces into the sets:
$$\mathcal{T}_0 = \{T_i : \text{S-box output is } 0000\}$$
$$\mathcal{T}_1 = \{T_i : \text{S-box output is } 1111\}$$
will produce a differential trace with spikes of height roughly 4ε. The disadvantage
of this approach is that each of $\mathcal{T}_0$ and $\mathcal{T}_1$ contains fewer traces ( roughly $n/2^4$ each )
so the average traces $A_0$ and $A_1$ will contain higher levels of noise. A detailed
discussion of the trade-offs between spike height and the noise in ∆ is given in [42].
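The trade-off between a taller spike and noisier averages can be explored with a back-of-the-envelope model. This is our own simplification and is not the analysis of [42]: we assume a d-bit all-zeros versus all-ones partition, a spike of height dε, about $n/2^d$ traces in each set, and independent noise of standard deviation σ on every measurement.

```python
from math import sqrt


def spike_to_noise(d: int, n: int, epsilon: float, sigma: float) -> float:
    """Spike height over differential-trace noise for a d-bit all-zeros/all-ones partition.

    Assumed model: spike height d*epsilon, about n / 2**d traces in each set,
    independent noise of standard deviation sigma on every measurement.
    """
    per_set = n / 2 ** d
    noise = sqrt(2.0 * sigma ** 2 / per_set)  # two averages, each with variance sigma^2/per_set
    return d * epsilon / noise


n, epsilon, sigma = 1600, 1.0, 10.0
ratios = {d: spike_to_noise(d, n, epsilon, sigma) for d in (1, 2, 3, 4, 8)}
```

Under these assumptions the four-bit partition improves the ratio over the single-bit case by a factor of $\sqrt{2}$, while an eight-bit partition discards so many traces that it does worse than a single bit.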
Designing multiple bit sorting functions must be done with respect to the words
that a device actually manipulates. Although a key guess may allow a few inter-
mediate bits to be determined, multiple bit DPA is only applicable if these bits are
manipulated together ( e.g., in the same byte ).
4.5 Countermeasures
Techniques for resisting power analysis can be implemented at both the hardware
and software levels. Countermeasures at the software level seem to be more de-
sirable, from a commercial standpoint at least, since they can be implemented on
existing architectures. Hardware countermeasures are generally more costly to im-
plement, but they may be necessary depending on the required level of security.
We give examples of countermeasures at each of the two levels now.
Using secret values to perform conditional operations can cause SPA vulnerabilities
in cryptographic algorithms. We saw this with RSA in Figure 4.4. Avoiding
these types of conditional statements when implementing these algorithms can
eliminate many SPA weaknesses. In algorithms which inherently assume this type
of key dependent branching, it may not be possible to remove these statements
completely. However, operations with large power characteristics ( e.g., multiplications )
can be moved outside of conditional branches to decrease the size of SPA
characteristics. This strategy can be applied to the square-and-multiply algorithm
as shown in Figure 4.9.
INPUT: M, N, d = (d_{n−1} d_{n−2} . . . d_1 d_0)_2
OUTPUT: S = M^d mod N
1 S ← 1
2 for j = n − 1 . . . 0 do
3     S_0 ← S^2 mod N
4     S_1 ← S_0 · M mod N
5     S ← S_{d_j}
6 return S
Figure 4.9: An SPA resistant version of the square-and-multiply algorithm.
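The algorithm of Figure 4.9 translates directly into code. The sketch below is illustrative only: Python integers are not constant time, and the indexed selection on the last line of the loop may itself leak on a real device, so this shows the structure of the algorithm rather than a hardened implementation. The function name is our own.

```python
def modexp_always_multiply(m: int, d: int, n: int) -> int:
    """Left-to-right square-and-multiply following Figure 4.9.

    Both S0 (square) and S1 (square then multiply) are computed for every
    exponent bit; the bit only selects which result is kept.
    """
    s = 1
    for j in reversed(range(d.bit_length())):
        s0 = (s * s) % n        # S_0 <- S^2 mod N
        s1 = (s0 * m) % n       # S_1 <- S_0 * M mod N, computed even when unused
        bit = (d >> j) & 1
        s = (s0, s1)[bit]       # S <- S_{d_j}
    return s
```

The price of the uniform operation sequence is one modular multiplication per exponent bit, whether or not the bit is set.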
The microcode run by some microprocessors can cause large operand dependent
power consumption features, as noted in [32, 12, 38]. Even constant execution
path code can demonstrate serious power analysis vulnerabilities when run on these
components. One way to counteract this problem is to split operands into shares,
using a threshold scheme ( a technique of secret sharing ), and then have the processor
compute by manipulating shares of sensitive data rather than the data itself
[12]. To deduce sensitive data, an adversary must now combine multiple power
consumption measurements from various locations within a power trace. This effectively
increases the amount of noise, σ, obscuring the value of the sensitive data.
Recall that distinguishing the biases in a differential trace requires $\varepsilon > 2\sigma/\sqrt{n}$.
So, increasing σ causes the
number of required power traces to increase. The authors in [12] argue that the
number of power traces required for a successful DPA attack increases exponentially
as a function of the number of shares. Unfortunately, the performance penalty associated
with this countermeasure limits its practicality. The technique of random
masking, a similar mode of defense introduced in [23], has better performance characteristics.
However, implementing this countermeasure must be done carefully,
as shown in [40] and [15].
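The idea of computing on shares can be sketched with the simplest possible sharing, in which a byte is split into random bytes whose XOR equals the original value. This is an assumption made for illustration; the scheme analysed in [12] may differ in its details, and the function names are our own.

```python
import secrets
from functools import reduce


def split_byte(value: int, shares: int) -> list:
    """Split a byte into `shares` random bytes whose XOR equals `value`."""
    parts = [secrets.randbelow(256) for _ in range(shares - 1)]
    parts.append(reduce(lambda a, b: a ^ b, parts, value))  # last share fixes the XOR
    return parts


def combine(parts: list) -> int:
    """Recombine XOR shares into the original byte."""
    return reduce(lambda a, b: a ^ b, parts, 0)


def xor_with_constant(parts: list, constant: int) -> list:
    """XOR a public constant into shared data by touching a single share."""
    return [parts[0] ^ constant] + parts[1:]


sensitive = 0xA7
shared = split_byte(sensitive, 3)
masked_result = xor_with_constant(shared, 0x3C)  # the value 0xA7 never appears on its own
```

Operations that are linear with respect to XOR can be applied to the shares in this way without ever recombining them. Non-linear steps such as S-box lookups need more care, which is one source of the performance penalty mentioned above.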
Interleaving random computations into the execution of cryptographic opera-
tions is a common defense against DPA. If an encryption operation is interrupted
at random times with dummy computations then the times at which, say, a partic-
ular key byte is manipulated will vary from encryption to encryption. Power traces
collected from devices protected in this way will not be aligned with respect to
the operations the device has performed. As a result, spikes which would normally
appear thin and tall in a differential trace appear shorter and are smeared across
an interval. Similar to the secret sharing countermeasure, this technique increases
the amount of noise in the differential trace, which hopefully increases the number
of traces necessary for a successful DPA attack to an unreasonable number. More
details on this technique can be found in [14]. Microprocessors which are capable of
randomized multithreading are especially suited to this countermeasure. Clocking
devices using a randomized clock signal produces a similar effect [34].
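The smearing effect of random delays can be illustrated by averaging noise-free traces that each contain one spike at a randomly shifted position. All values below (trace counts, spike position, delay range) are hypothetical.

```python
import random

rng = random.Random(3)
N_TRACES, N_SAMPLES = 2000, 40
SPIKE_INDEX, MAX_DELAY = 10, 8   # hypothetical spike position and delay range
EPSILON = 1.0


def averaged_spike(max_delay):
    """Average noise-free traces whose single spike is shifted by a random delay."""
    total = [0.0] * N_SAMPLES
    for _ in range(N_TRACES):
        delay = rng.randrange(max_delay)  # 0 .. max_delay - 1 dummy steps
        total[SPIKE_INDEX + delay] += EPSILON
    return [value / N_TRACES for value in total]


aligned = averaged_spike(1)          # no random delay: spike of height EPSILON
smeared = averaged_spike(MAX_DELAY)  # spike spread over MAX_DELAY samples
```

With delays spread uniformly over eight positions, the averaged spike is roughly one eighth of its aligned height, while the total area under it is unchanged.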
Hardware components ( e.g., capacitors and inductors ) can be added to the
power line of tokens to filter, or smooth out, power consumption characteristics
[16, 45]. This approach attempts to decrease the size of the power bias, ε, thereby
increasing the number of traces required for a successful DPA attack.
A hardware countermeasure, which has been developed with smartcard systems
in mind, is to ensure that external power supplies are never connected directly to
the internal chip [48, 45]. This approach attempts to decorrelate the flow of current
on external power lines from internal computations. This is done by inserting a kind
of buffer between external power lines and internal ones. Of course, the internal
chip needs power, so the buffer must accommodate this. Systems of capacitors and
transformers have been proposed which function in this way.
Unfortunately, given enough power consumption traces, adversaries can over-
come most countermeasures. For this reason, system designers should adopt a
leak-tolerant design methodology, as recommended in [32] and [16]. As a token con-
sumes power, engineers should expect that some secret information will be leaked
to observers. The rate at which information is leaked can be used to determine key
lifespans. Keys can be refreshed using non-linear update functions ( e.g., SHA-1 )
when they expire. Several tests to determine the leakage rate of devices are pro-
posed in [16].
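A key refresh of the kind suggested above might look as follows. This is a sketch under our own assumptions, not a construction taken from [32] or [16]: the next key is the SHA-1 digest of the current key truncated to the key length, and the lifespan is a hypothetical budget that retires a key once a chosen fraction of its entropy is assumed to have leaked.

```python
import hashlib


def refresh_key(key: bytes) -> bytes:
    """Derive the next key by hashing the current one and truncating to the same length."""
    if len(key) > hashlib.sha1().digest_size:
        raise ValueError("key longer than a SHA-1 digest")
    return hashlib.sha1(key).digest()[: len(key)]


def key_lifespan(entropy_bits: float, leak_bits_per_use: float, margin: float = 0.5) -> int:
    """Hypothetical budget: uses allowed before `margin` of the key entropy has leaked."""
    return int(entropy_bits * margin / leak_bits_per_use)


current = bytes(8)                 # placeholder 8-byte key, for illustration only
following = refresh_key(current)   # key to switch to once the budget is spent
budget = key_lifespan(entropy_bits=56, leak_bits_per_use=0.125)
```

Because the update is a one-way function, partial information gathered about one key does not combine directly with information leaked by its successor.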
4.6 Remarks
Power analysis has been applied to many different ciphers, including several of
the recent AES candidates [11]. Depending on the ways that an adversary can
manipulate a token, it is possible to attack a cipher with different variations of
power analysis. For example, in [43], it is explained how the ability to re-key a
token, or a copy of a token, can be exploited with power analysis.
Some researchers have proposed attacks on tokens which make use of highly
detailed power consumption profiles [20, 1, 7]. The work required to profile a
device is substantial when compared with standard DPA, but the profile can be
reused to extract keys from several tokens which presumably use different keys.
Details on the equipment necessary to perform power analysis attacks can be
found in [11]. We only note that the cost of this equipment is low; an adversary
should be able to purchase the required equipment for less than $10,000.
Chapter 5
Conclusions
This thesis has presented several ways in which an adversary might use side chan-
nel information to cryptanalyze ciphers such as DES and RSA. The purpose of
discussing these particular ciphers was to exploit the reader’s likely familiarity with
them — a precedent which was set in the first papers to deal with this type of
cryptanalysis [30, 10, 32]. Many of the techniques which we applied to these ci-
phers can be readily applied to others. The timing attack, for example, was not
so much an attack on RSA as it was an attack on modular exponentiation. Any
cryptosystem which implements this operation ( e.g., DSS or Diffie-Hellman ) may
be vulnerable.
Of the three sources of side channel information we considered, it seems that
power consumption presents the most serious problem to cryptographic engineers.
Recall that message blinding and computational checks were relatively effective
software countermeasures against timing and fault analysis. Unfortunately, there
has yet to appear a defense with similar qualities against attacks like differential
power analysis. The secret sharing countermeasure presented in [12] is attractive
because of the proof of security which comes with it. However, the performance
penalty incurred in using this countermeasure is quite high in terms of memory and
execution time. Contrary to what was initially suggested, secret sharing must be
done extensively throughout a cipher’s computation, rather than only in, say, the
first three and last three rounds of DES. Profiling attacks can be used to conduct
DPA on the inner rounds of DES, so secret sharing must be used there as well.
Much of the power analysis literature which appears now focuses on ways to
defeat previously suggested countermeasures [14, 15]. This trend has caused any
newly suggested countermeasures to be greeted with much scepticism, but this is
an important part of developing a sound defense.
The reader who is interested in pursuing his or her own investigations into
side channel cryptanalysis will hopefully find the collection of references in the
bibliography useful. Since this is a relatively new field of study much of the relevant
literature is not well known. So far, the majority of the papers on the subject have
been presented at the Cryptographic Hardware and Embedded Systems ( CHES )
conferences.
Bibliography
[1] M.L. Akkar, R. Bevan, P. Dischamp, and D. Moyart. Power Analysis: What
is now Possible. In T. Okamoto, editor, Advances in Cryptology - Proceedings
of ASIACRYPT 2000, volume 1976 of LNCS, pages 489–502. Springer-Verlag,
2000.
[2] R. Anderson and M. Kuhn. Tamper Resistance – a Cautionary Note. In
Proceedings of the Second USENIX Workshop on Electronic Commerce, pages
1–11, November 1996. Available from http://www.usenix.org.
[3] M. Bellare and P. Rogaway. The Exact Security of Digital Signatures – How
to Sign with RSA. In U. Maurer, editor, Advances in Cryptology - Proceedings
of EUROCRYPT 96, volume 1070 of LNCS, pages 399–416. Springer-Verlag,
1996. Available from http://www-cse.ucsd.edu/users/mihir.
[4] I. Biehl, B. Meyer, and V. Müller. Differential Fault Attacks on Elliptic Curve
Cryptosystems. In M. Bellare, editor, Advances in Cryptology - CRYPTO
2000, volume 1880 of LNCS, pages 131–146. Springer-Verlag, 2000.
[5] E. Biham and A. Shamir. Differential Cryptanalysis of the Data Encryption
Standard. Springer-Verlag, 1993.
[6] E. Biham and A. Shamir. Differential Fault Analysis of Secret Key Cryptosys-
tems. In B. Kaliski, editor, Advances in Cryptology - CRYPTO ’97, volume
1294 of LNCS, pages 513–525. Springer-Verlag, 1997.
[7] E. Biham and A. Shamir. Power Analysis of the Key Scheduling of
the AES Candidates. In Second AES Candidates Conference, March
1999. Available from http://csrc.nist.gov/encryption/aes/round1/conf2/aes2conf.htm.
[8] G. Blom. Probability and Statistics: Theory and Applications. Springer-Verlag,
1989.
[9] D. Boneh. Twenty Years of Attacks on the RSA Cryptosystem. Notices
of the American Mathematical Society, 46(2):203–213, 1999. Available from
http://crypto.stanford.edu/~dabo/pubs.html.
[10] D. Boneh, R. DeMillo, and R. Lipton. On the Importance of Checking Cryp-
tographic Protocols for Faults. In W. Fumy, editor, Advances in Cryptology -
EUROCRYPT ’97, volume 1233 of LNCS, pages 37–51. Springer-Verlag, May
1997.
[11] S. Chari, C. S. Jutla, J. R. Rao, and P. Rohatgi. A Cautionary Note Regarding
Evaluation of AES Candidates on Smart-Cards. In Second AES Candidates
Conference, March 1999. Available from http://csrc.nist.gov/encryption/aes/round1/conf2/aes2conf.htm.
[12] S. Chari, C. S. Jutla, J. R. Rao, and P. Rohatgi. Towards Sound Approaches
to Counteract Power-Analysis Attacks. In M. Wiener, editor, Advances in
Cryptology - CRYPTO ’99, volume 1666 of LNCS, pages 398–412. Springer-
Verlag, August 1999.
[13] D. Chaum. Blind Signatures for Untraceable Payments. In R. Rivest,
A. Sherman, and D. Chaum, editors, Advances in Cryptology - Proceedings of
CRYPTO 82, pages 199–203. Plenum Press, 1983.
[14] C. Clavier, J.S. Coron, and N. Dabbous. Differential Power Analysis in the
Presence of Hardware Countermeasures. In Ç. K. Koç and C. Paar, editors,
Cryptographic Hardware and Embedded Systems - CHES 2000, volume 1965 of
LNCS, pages 252–263. Springer-Verlag, August 2000.
[15] J.S. Coron and Louis Goubin. On Boolean and Arithmetic Masking against
Differential Power Analysis. In Ç. K. Koç and C. Paar, editors, Cryptographic
Hardware and Embedded Systems - CHES 2000, volume 1965 of LNCS, pages
231–237. Springer-Verlag, August 2000.
[16] J.S. Coron, P. Kocher, and D. Naccache. Statistics and Secret Leakage. In
Financial Cryptography ’00, 2000.
[17] J. Daemen and V. Rijmen. Resistance Against Implementation Attacks:
A Comparative Study of the AES Proposals. In Second AES Candidates
Conference, March 1999. Available from http://csrc.nist.gov/encryption/aes/round1/conf2/aes2conf.htm.
[18] J.F. Dhem, F. Koeune, P.A. Leroux, P. Mestré, J.J. Quisquater, and J.L.
Willems. A Practical Implementation of the Timing Attack. Technical Re-
port CG-1998/1, Université catholique de Louvain, 1998. Available from
http://www.dice.ucl.ac.be/crypto.
[19] W. Diffie and M. Hellman. New Directions in Cryptography. IEEE Transac-
tions on Information Theory, IT-22(6):644–654, November 1976.
[20] P. Fahn and P. Pearson. IPA: A New Class of Power Attacks. In Ç. K. Koç
and C. Paar, editors, Cryptographic Hardware and Embedded Systems - CHES
’99, volume 1717 of LNCS, pages 173–186. Springer-Verlag, August 1999.
[21] FIPS 140-1. Security Requirements for Cryptographic Modules. Federal Infor-
mation Processing Standard, National Institute of Standards and Technology,
January 1994. Available from http://csrc.nist.gov/fips/fips1401.pdf.
[22] FIPS 46-3. Data Encryption Standard. Federal Information Processing Stan-
dard, National Institute of Standards and Technology, 25 October 1999. Avail-
able from http://www.itl.nist.gov/fipspubs/by-num.htm.
[23] L. Goubin and J. Patarin. DES and Differential Power Analysis - The “Dupli-
cation” Method. In Ç. K. Koç and C. Paar, editors, Cryptographic Hardware
and Embedded Systems - CHES ’99, volume 1717 of LNCS, pages 158–172.
Springer-Verlag, August 1999.
[24] H. Handschuh and H. Heys. A Timing Attack on RC5. In Workshop
Record of Selected Areas of Cryptography - SAC ’98, pages 318–329. Queen’s
University, 1998.
[25] H. Handschuh, P. Paillier, and J. Stern. Probing Attacks on Tamper-Resistant
Devices. In Ç. K. Koç and C. Paar, editors, Cryptographic Hardware and Em-
bedded Systems - CHES ’99, volume 1717 of LNCS, pages 303–315. Springer-
Verlag, August 1999.
[26] M.A. Hasan. Countermeasures for Koblitz Curve Cryptosystems. In Ç. K. Koç
and C. Paar, editors, Cryptographic Hardware and Embedded Systems - CHES
2000, volume 1965 of LNCS, pages 93–108. Springer-Verlag, August 2000.
[27] K. Heidenstrom. FAQ / Application notes: Timing on the PC family under
DOS, 1995. Available from ftp://ftp.simtel.net/pub/simtelnet/msdos/info/pctim003.zip.
[28] A. Hevia and M. Kiwi. Strength of Two Data Encryption Standard Implementations
Under Timing Attacks. ACM Transactions on Information and System
Security, 2(4):416–437, November 1999.
[29] J. Kelsey, B. Schneier, D. Wagner, and C. Hall. Side Channel Cryptanalysis
of Product Ciphers. Journal of Computer Security, 8(2-3):141–158, 2000.
Available from http://www.counterpane.com/side_channel.html.
[30] P. Kocher. Timing Attacks on Implementations of Diffie-Hellman, RSA,
DSS and Other Systems. In N. Koblitz, editor, Advances in Cryptology -
CRYPTO ’96, volume 1109 of LNCS, pages 104–113. Springer-Verlag, August
1996. An alternate version is available from http://www.cryptography.com/timingattack/paper.html.
[31] P. Kocher, J. Jaffe, and B. Jun. Introduction to Differential Power Analysis and
Related Attacks. Technical report, Cryptography Research Inc., 1998. Avail-
able from http://www.cryptography.com/dpa/technical/index.html.
[32] P. Kocher, J. Jaffe, and B. Jun. Differential Power Analysis. In
M. Wiener, editor, Advances in Cryptology - CRYPTO ’99, volume 1666
of LNCS, pages 388–397. Springer-Verlag, August 1999. Available from
http://www.cryptography.com/dpa/Dpa.pdf.
[33] F. Koeune and J.J. Quisquater. A Timing Attack Against Rijndael. Technical
Report CG-1999/1, Université catholique de Louvain, 1999. Available from
http://www.dice.ucl.ac.be/crypto.
[34] O. Kömmerling and M. Kuhn. Design Principles for Tamper-Resistant Smart-
card Processors. In USENIX Workshop on Smartcard Technology - Smartcard
’99, pages 9–20. USENIX Association, May 1999.
[35] J. Markoff. Secure Digital Transactions Just Got a Little Less Secure. New
York Times, page A1, 11 December 1995.
[36] J. Markoff. Potential Flaw in Cash Card Security Seen. New York Times,
page D1, 26 September 1996.
[37] M. Matsui. The First Experimental Cryptanalysis of the Data Encryption
Standard. In Y. G. Desmedt, editor, Advances in Cryptology - CRYPTO ’94,
volume 839 of LNCS, pages 1–11. Springer-Verlag, August 1994.
[38] R. Mayer-Sommer. Smartly Analyzing the Simplicity and the Power of Simple
Power Analysis on Smartcards. In Ç. K. Koç and C. Paar, editors, Crypto-
graphic Hardware and Embedded Systems - CHES 2000, volume 1965 of LNCS,
pages 78–92. Springer-Verlag, August 2000.
[39] A. J. Menezes, P. C. van Oorschot, and S. A. Vanstone. Handbook of Applied
Cryptography. CRC, 1996.
[40] T. Messerges. Securing the AES Finalists Against Power Analysis Attacks. In
B. Schneier, editor, Fast Software Encryption Workshop - FSE 2000, volume
1978 of LNCS. Springer-Verlag, April 2000.
[41] T. Messerges. Using Second-Order Power Analysis to Attack DPA Resistant
Software. In Ç. K. Koç and C. Paar, editors, Cryptographic Hardware and Em-
bedded Systems - CHES 2000, volume 1965 of LNCS, pages 238–251. Springer-
Verlag, August 2000.
[42] T. Messerges, E. Dabbish, and R. Sloan. Investigations of Power
Analysis Attacks on Smartcards. In USENIX Workshop on Smart-
card Technology, pages 151–161, May 1999. Available from
http://www.eecs.uic.edu/~tmesserg/papers.html.
[43] T. Messerges, E. Dabbish, and R. Sloan. Power Analysis Attacks of Modular
Exponentiation in Smart Cards. In Ç. K. Koç and C. Paar, editors, Cryptographic
Hardware and Embedded Systems - CHES ’99, volume 1717 of LNCS,
pages 144–157. Springer-Verlag, August 1999.
[44] PKCS #1 v2.0. RSA Cryptography Standard. Public Key Cryptog-
raphy Standard, RSA Laboratories, September 1998. Available from
ftp://ftp.rsasecurity.com/pub/pkcs/ascii/pkcs-1v2.asc.
[45] P. Rakers, L. Connell, T. Colins, and D. Russell. Secure Contactless Smart-
card ASIC with DPA Protection. In IEEE 2000 Custom Integrated Circuits
Conference, pages 239–242, May 2000.
[46] R. Rivest, A. Shamir, and L. Adleman. A Method for Obtaining Digital Sig-
natures and Public-Key Cryptosystems. Technical Memo LCS/TM 82, MIT
Laboratory for Computer Science, 4 April 1977. Revised 12 December 1977.
[47] W. Schindler. A Timing Attack against RSA with the Chinese Remainder
Theorem. In Ç. K. Koç and C. Paar, editors, Cryptographic Hardware and Em-
bedded Systems - CHES 2000, volume 1965 of LNCS, pages 109–124. Springer-
Verlag, August 2000.
[48] A. Shamir. Protecting Smart Cards from Passive Power Analysis with De-
tached Power Supplies. In Ç. K. Koç and C. Paar, editors, Cryptographic
Hardware and Embedded Systems - CHES 2000, volume 1965 of LNCS, pages
71–77. Springer-Verlag, August 2000.
[49] D. Stinson. Cryptography: Theory and Practice. CRC Press, 1995.
[50] P. Wayner. Code Breaker Cracks Smart Cards’ Digital Safe. New York Times,
page D1, 22 June 1998.
[51] N. Weste and K. Eshraghian. Principles of CMOS VLSI Design: A Systems
Perspective. Addison-Wesley, 2nd edition, 1994.
[52] P. Wright. Spy Catcher: The Candid Autobiography of a Senior Intelligence
Officer. Viking, 1987.
