Abstract. Side-channel attacks on block ciphers and public key algorithms have been discussed extensively. However, there is only sparse literature about side-channel attacks on stream ciphers. The few existing references mainly treat timing [10] and template attacks [13], or provide a theoretical analysis [8], [9] of weaknesses of stream cipher constructions. In this paper we present attacks on two focus candidates, Trivium and Grain, of the eSTREAM stream cipher project. The attacks exploit the resynchronization phase of the ciphers. A novel concept for choosing initial value vectors is introduced, which totally eliminates the algorithmic noise of the device, leaving only the pure side-channel signal. This attack allows the secret key to be recovered with a small number of samples and without building templates. To prove the concept we apply the attack to hardware implementations of the ciphers. For both stream ciphers we are able to reveal the complete key.
Introduction
Differential power analysis (DPA) is a well-known and thoroughly studied threat for implementations of block ciphers, like DES and AES, and public key algorithms, like RSA. In the field of stream ciphers, however, this topic has received little attention. More generally, this is true for any kind of side-channel analysis (SCA). Side-channel attacks build on the fact that cryptographic algorithms are implemented on a physical device. SCA can use all kinds of physical emanation from the device, like current consumption, electromagnetic radiation, or execution time variations. This so-called side channel may leak information about secret data. Even the regular output of the algorithm can be seen as a side channel, in the case that an attacker is able to induce a fault into the data or control path of the computation. Although there is vast literature about SCA on implementations of block ciphers and public key algorithms, only a few publications can be found about attacks on stream ciphers. In [6] the authors study fault attacks on stream ciphers like LILI-128, RC4, and SOBER-t32. The latter was the target of a timing attack in [10]. Template attacks, which were introduced in [2], were mounted on RC4 in [12]. So far, there are no reports on a practical DPA targeting a hardware implementation of a stream cipher. There is only one work, [9], which theoretically describes DPA attacks on A5/1 and E0. These are classical DPA attacks which aim at raising the side-channel signal above the algorithmic noise by statistical means (by averaging over many power traces). In contrast, the presented method cancels out the algorithmic noise exactly by using specially tailored sets of initial value vectors.
Differential power analysis was introduced by Kocher in [7]. In a DPA an attacker generates a set of hypotheses (about some secret value or a partial key) and tries to identify the (unique) true hypothesis by finding the highest correlation between the power consumption of the physical realization of an algorithm and those internal bits which the attacker can compute by virtue of one of these hypotheses. The classical setup for a DPA is illustrated in Figure 1.
[Figure 1: classical DPA setup. Inputs: unknown data/partial key k and known data/message m; output bit c(m, k).]
Some parts S of an implementation of a cryptographic [...] This implies, of course, that S cannot be linear. But in any good cipher one is always able to identify some parts with this property.
Our attack targets the resynchronization phase of stream ciphers. Unlike in [9], where a known IV DPA attack is described, we will describe and execute a chosen IV DPA attack. It will be shown that the signal-to-noise ratio of the side-channel signal, which carries information about the secret key, can be optimized by specially chosen initial value vectors. These are constructed in such a way that, in the statistical analysis of the power traces, contributions to the power consumption that are not related to the correlation signal cancel out. We will elaborate the attacks for two recently published stream ciphers, Grain [5] and Trivium [1]. In the case of the first cipher, we will make partial use of a nonlinear element S. In the case of the second cipher we will not use any nonlinearity; S will rather be an XOR gate. By virtue of the new selection scheme for the initial value vectors we are able to mount practical attacks on hardware implementations of the two ciphers. The attack is efficient in practice, as there is no need to construct templates, and the number of samples is small.
Outline: We will describe the attacks on both stream ciphers. In each case we will first give a definition of the cipher and briefly describe a straightforward hardware implementation, in order to state a theoretical power model for this implementation. We will describe the actual attack on the cipher and show why the attack works in our chosen power model. Finally, the attack on a physical implementation of Grain on a field programmable gate array (FPGA) device will be reported.
The target of the attack will be the second version of Grain [5]. After a description of the structure and the implementation of the cipher, the power model for a CMOS implementation will be defined and the theory for the attack will be elaborated. Finally, we will report the results of a practical realization of the attack on a hardware implementation.
Definition of Grain
Grain is a binary additive synchronous stream cipher with an internal state of 160 bits $s_i, s_{i+1}, \ldots, s_{i+79}$ and $b_i, b_{i+1}, \ldots, b_{i+79}$ residing in a linear feedback shift register (LFSR) and a nonlinear feedback shift register (NLFSR), respectively. It supports a key $k = (k_0, \ldots, k_{79})$ of 80 bits and an initial value $IV = (IV_0, \ldots, IV_{63})$ of 64 bits. After a run-up time of 160 iteration steps it outputs a key stream $z_i$. During run-up the output bits $z_i$ ($0 \le i < 160$) are not used as key stream, but are fed back into the LFSR and NLFSR components. Run-up (for $0 \le i < 160$) and output generation (for $160 \le i$) are described by the following recursion formula:
All variables represent elements of the binary field $\mathbb{F}_2$. $g : \mathbb{F}_2^{64} \to \mathbb{F}_2$ is a nonlinear function which we will not describe any further, since the exact form is not essential for the attack. The function h is defined by
The indicator function $\delta_{[0,159]}(i)$ is 1 for $0 \le i \le 159$ and 0 otherwise. In addition to the specification in [5] we introduced the two intermediary values $\tilde\sigma_i$ and $\sigma_i$. They are not essential for the definition of the cipher. However, these values will be a part of our hypothesis.
Notation 1. For the whole paper we fix the secret key k. Besides k, the sequences $b_i, s_i, g_i, f_i, \tilde\sigma_i, \sigma_i, z_i$ depend on the initial value. Therefore we will often write $b_i^{(\nu)}$ [...] do not depend on the initial value $\nu$, but only on the key k.
Implementation in Hardware
A structural view of the hardware implementation is given in [...]. Of course there is also some control logic for loading the key and initial value, clocking the FSRs, and switching $\delta$.
Power Model
We will use a discrete, Hamming distance based power model to describe the power consumption, since this suits the power consumption of a CMOS implementation very well. For a fixed key k and an initial value $\nu$ the power consumption of Grain is a function
where $P(i)$ is the integral over the power consumption during the i-th clock cycle. The i-th clock cycle is the period of time when the values ..., $s_{i+80}$ are evaluated and the two FSRs are shifted. Therefore we can write
where $P_G$, $P_H$, $P_F$ and $P_{FF}$ denote the power consumption of G, H (including the generation of $\sigma_i$), F, and a flip-flop, respectively. $\Omega$ describes the noise which is independent of the described architectural elements. It is reasonable to model $P_G$, $P_H$, $P_F$ and $P_{FF}$ such that they only depend on the old and new input values:
Note that this equation may not be fully correct for i = 0, since the "old" values may not always exist (e.g. $b_{-1}$) or could have some default values (after resetting the circuit). In this case the corresponding constant values must be used. We will make no further assumption about the functions $P_G : \mathbb{F}_2^{65} \to \mathbb{R}$ and $P_H : \mathbb{F}_2^{64+8} \to \mathbb{R}$, since this would add an unnecessary difficulty. It turns out that the precise form will not be needed. We define that $P_F : \mathbb{F}_2^{12} \to \mathbb{R}$ is only a function of the Hamming distances of consecutive bits of the LFSR:
For $P_{FF}$ we make the usual approximation
We do not assume that all $P_{FF,N_j}$ or all $P_{FF,L_j}$ are equal, as this cannot be expected to hold in an arbitrary implementation. $\Omega$ contains all noise contributions which are independent of the key and the initial value, such as the noise generated by the control hardware of the cipher or switching activity of circuits in the environment.
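The Hamming distance view of the flip-flop contribution can be illustrated with a minimal Python sketch. It assumes that the approximation for $P_{FF}$ charges a flip-flop-specific constant whenever the stored bit toggles and nothing otherwise; the function names and the weights below are hypothetical and only serve as an illustration.

```python
from typing import Sequence


def flip_flop_power(old: int, new: int, weight: float) -> float:
    """Power of one flip-flop in one clock cycle: `weight` on a toggle, else 0."""
    return weight * (old ^ new)


def register_power(old_state: Sequence[int], new_state: Sequence[int],
                   weights: Sequence[float]) -> float:
    """Sum of the per-flip-flop contributions (weights need not be equal)."""
    if not (len(old_state) == len(new_state) == len(weights)):
        raise ValueError("state and weight vectors must have equal length")
    return sum(flip_flop_power(o, n, w)
               for o, n, w in zip(old_state, new_state, weights))


# Hypothetical 4-bit register before and after one clock cycle.
old = [1, 0, 1, 1]
new = [0, 1, 1, 0]
weights = [1.0, 0.5, 2.0, 1.5]  # assumed, implementation-specific constants
total = register_power(old, new, weights)
```

Keeping one weight per flip-flop mirrors the remark that the individual flip-flop characteristics are not assumed to be equal.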
Notation 2. For a fixed key k the whole cipher depends on the initial value $\nu$. $P^{(\nu)}, P_G^{(\nu)}, \ldots$ denote the respective power consumption functions. $P^{(\nu)}$ will still be variable because of $\Omega$. Therefore we will use its expected value $\bar{P}^{(\nu)} := \mathrm{E}[P^{(\nu)}]$, which can be approximated by measuring the power consumption for the same initial value $\nu$ several times and taking the arithmetic mean value. Now $\bar{P}^{(\nu)}$ depends on the initial value $\nu$ only.
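The approximation of the expected power trace by an arithmetic mean can be sketched as follows. This is an illustration on simulated data only: the signal values, the Gaussian noise standing in for $\Omega$, and all function names are assumptions, and the count of 256 repetitions is merely borrowed from the practical attack reported later.

```python
import random
from typing import List, Sequence


def mean_trace(traces: Sequence[Sequence[float]]) -> List[float]:
    """Pointwise arithmetic mean of equally long power traces."""
    if not traces:
        raise ValueError("need at least one trace")
    length = len(traces[0])
    if any(len(t) != length for t in traces):
        raise ValueError("all traces must have the same length")
    return [sum(t[j] for t in traces) / len(traces) for j in range(length)]


def simulate_trace(signal: Sequence[float], noise_sigma: float,
                   rng: random.Random) -> List[float]:
    """Deterministic signal plus independent Gaussian noise (assumed model)."""
    return [s + rng.gauss(0.0, noise_sigma) for s in signal]


rng = random.Random(0)
signal = [0.0, 1.0, 3.0, 1.0]  # hypothetical noise-free power values per cycle
traces = [simulate_trace(signal, 0.5, rng) for _ in range(256)]
estimate = mean_trace(traces)  # approaches `signal` as the trace count grows
```

With independent noise the standard deviation of each averaged point shrinks with the square root of the number of traces, which is why repeating the measurement for the same initial value is sufficient to suppress $\Omega$.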
Attack on Grain: Theory
The attack on Grain consists of three steps. The first two steps are differential power analyses that yield information about 34 and 16 bits of the key, respectively. The third step is an exhaustive search over the remaining 30 bits of the key.
Step 1 is a DPA with chosen IVs. It is done in 17 rounds. In the i-th round ($0 \le i \le 16$) we set up our hypothesis $(b^h_{i+63}, \sigma^h_i)$ about the pair $(b_{i+63}, \sigma_i)$ and try to identify the true hypothesis by using the recorded power traces of the key setup phase for several initial values $\nu \in IV_i$. The set $IV_i$ of initial values is tailored in such a way that the intrinsic power consumption of Grain will cancel out when computing the difference of the power traces (the "correlation function"). In each round the results of the previous ones will be used.
Step 1 is illustrated in Table 3. Note that $(b^h_{i+63}, \sigma^h_i)$ as well as $\nu$ must be used in [...] and the "correlation function" [...]. $IV_i$ contains 32 initial values which toggle the bits $\nu_{i+3}$, $\nu_{i+13}$, $\nu_{i+22}$, $\nu_{i+23}$, $\nu_{i+25}$ and set $\nu_{i+46}$ to 1.
For an explanation of this lemma cf. the paragraph after Lemma 2.
Remark 3. Of course, this DPA also works for families IV i of randomly chosen initial values. However, the algorithmic noise level caused by the remaining 159 flip-flops (other than L 79 ) will be higher compared to the correlation signal. As a consequence the number of necessary samples #IV i for each family would be larger. The same is true for Step 2.
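The structure of a Step 1 family $IV_i$ (32 initial values obtained by toggling five bits while one bit is held at 1) can be sketched in Python. The values of all remaining IV bits are not specified in this text, so the sketch assumes an all-zero base vector; the function name `iv_family` and the `base` parameter are hypothetical.

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple

IV_BITS = 64
TOGGLED_OFFSETS = (3, 13, 22, 23, 25)  # bits nu_{i+3}, nu_{i+13}, ... are toggled
FIXED_ONE_OFFSET = 46                  # bit nu_{i+46} is set to 1


def iv_family(i: int, base: Optional[Sequence[int]] = None) -> List[Tuple[int, ...]]:
    """All 2^5 = 32 initial values of round i (0 <= i <= 16) of Step 1.

    `base` supplies the remaining IV bits; all-zero is an assumption.
    """
    if not 0 <= i <= 16:
        raise ValueError("Step 1 uses rounds 0..16")
    if base is None:
        base = [0] * IV_BITS
    if len(base) != IV_BITS:
        raise ValueError("an initial value has 64 bits")
    family = []
    for bits in product((0, 1), repeat=len(TOGGLED_OFFSETS)):
        iv = list(base)
        iv[i + FIXED_ONE_OFFSET] = 1
        for offset, bit in zip(TOGGLED_OFFSETS, bits):
            iv[i + offset] = bit
        family.append(tuple(iv))
    return family


family_0 = iv_family(0)
```

Enumerating every combination of the toggled bits is what allows contributions that do not depend on these bits to appear equally often in both parts of the difference of power traces.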
Step 2 works similarly to the first step, but in only 16 rounds. The hypotheses will be $(g^h_{i-17}, \tilde\sigma^h_i)$, for $17 \le i \le 32$. Moreover, the construction of the $IV_i$ will make use of the previously learned data. The second step of the attack is given in Table 4. Note [...] and the "correlation function" [...]. The next lemma gives the justification for the algorithm.
Lemma 2.
In round i ($17 \le i \le 32$) of the above algorithm we have:
The proofs of the last two lemmata are straightforward, but involve rather lengthy calculations. They are preferably supported by an algebraic software package. The families $IV_i$ of initial value vectors are constructed such that the following properties hold:
- The toggling of the bits $s_{i+13}$ and $s_{i+23}$ directly causes a toggling of $s_{i+80}$, but not of $s_{i+79}$. This distributes the power functions $\bar{P}^{(\nu)}$, $\nu \in IV_i$, to the two sums in the "correlation function" such that all power contributions not depending on $s_{i+13}$ and $s_{i+23}$ will cancel out.
- The toggling of the bit $s_{i+22}$ has the same effect by influencing $s_{i+79}$.
- The function $h(x_0, x_1, x_2, x_3, x_4)$ has the following property: if $x_2 = x_3 = 1$ and $(x_0, x_1) \in \mathbb{F}_2^2$, changing $x_4$ to its complement results in a change of h in exactly 50% of the cases. This is the classical assumption for a bit to be used in a DPA.
- The additional conditions in the definition of $IV_i$ for $17 \le i \le 32$ have the following reason: toggling of bit $s_{i+22}$ does not result in a toggling of other bits which also depend on the bits $s_{i+13}$ and $s_{i+23}$, asserting that the first two facts are orthogonal.
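The 50% property of h stated in the list above can be checked by brute force. Since the definition of h is not reproduced in this text, the sketch below assumes the filter function of the published Grain specification with the argument order $(x_0, \ldots, x_4)$; if the paper's h differs, the check would have to be adapted.

```python
from itertools import product
from typing import List, Tuple


def h(x0: int, x1: int, x2: int, x3: int, x4: int) -> int:
    """Filter function h, assumed to match the published Grain specification."""
    return (x1 ^ x4 ^ (x0 & x3) ^ (x2 & x3) ^ (x3 & x4)
            ^ (x0 & x1 & x2) ^ (x0 & x2 & x3) ^ (x0 & x2 & x4)
            ^ (x1 & x2 & x4) ^ (x2 & x3 & x4))


def pairs_where_x4_matters(x2: int = 1, x3: int = 1) -> List[Tuple[int, int]]:
    """All (x0, x1) for which complementing x4 changes the value of h."""
    return [(x0, x1) for x0, x1 in product((0, 1), repeat=2)
            if h(x0, x1, x2, x3, 0) != h(x0, x1, x2, x3, 1)]


sensitive = pairs_where_x4_matters()
fraction = len(sensitive) / 4  # share of (x0, x1) pairs where h follows x4
```

Under this assumption h changes with $x_4$ exactly when $x_0 = x_1$, i.e. for two of the four pairs $(x_0, x_1)$.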
Remark 5. In a practical attack on a hardware implementation the characteristics described in Lemmata 1 and 2 will transform into a peak for correct hypotheses $(b_{i+63}, \sigma_i)$ or $(g_i, \tilde\sigma_i)$, and a negative peak for hypotheses with reversed $\sigma_i$ or $\tilde\sigma_i$. Each of the other two hypotheses should not show a significant peak. During the following 79 clock cycles peaks can still be expected because the correlating signal remains.
Step 3 is now straightforward. After having obtained the 50 values $b_{63}, \ldots, b_{79}$, $\sigma_0, \ldots, \sigma_{16}$, $\tilde\sigma_{17}, \ldots, \tilde\sigma_{32}$, fifty independent linear equations in $k_0, \ldots, k_{79}$ can be written down. Solving these equations, a linear map $\kappa : \mathbb{F}_2^{30} \to \mathbb{F}_2^{80}$ is obtained, such that the image of $\kappa$ contains all possible remaining keys. Hence an exhaustive key search can be performed in practice. The complexity of this final step is $2^{30}$. The whole key can be recovered with, e.g., one appropriate plaintext/ciphertext pair. The attack can be improved by using the additional information contained in $(g_0, \ldots, g_{15})$ to exclude more keys.
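How a map like $\kappa$ arises from a linear system over $\mathbb{F}_2$ can be sketched with a small Gaussian elimination in Python. The toy system below (5 unknowns, 3 equations) is hypothetical and merely stands in for the 50 equations in 80 key bits; `solve_gf2` and `kappa` are illustrative names, and the map is realized as a particular solution plus a combination of null-space vectors.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple


def solve_gf2(equations: Sequence[Tuple[Sequence[int], int]],
              n: int) -> Tuple[int, List[int]]:
    """Solve A k = b over GF(2); vectors are packed into ints (bit j = k_j).

    Returns one particular solution and a basis of the null space.
    """
    pivots: Dict[int, Tuple[int, int]] = {}  # pivot column -> reduced (mask, rhs)
    for coeffs, rhs in equations:
        mask = sum(c << j for j, c in enumerate(coeffs))
        for col, (pmask, prhs) in pivots.items():  # clear known pivot columns
            if (mask >> col) & 1:
                mask ^= pmask
                rhs ^= prhs
        if mask == 0:
            if rhs:
                raise ValueError("inconsistent system")
            continue  # dependent equation, carries no new information
        col = (mask & -mask).bit_length() - 1  # lowest set bit is the new pivot
        for c, (pmask, prhs) in list(pivots.items()):  # keep older rows reduced
            if (pmask >> col) & 1:
                pivots[c] = (pmask ^ mask, prhs ^ rhs)
        pivots[col] = (mask, rhs)
    particular = sum(rhs << col for col, (_, rhs) in pivots.items())
    basis = []
    for f in (j for j in range(n) if j not in pivots):  # free variables
        vec = 1 << f
        for col, (pmask, _) in pivots.items():
            if (pmask >> f) & 1:
                vec |= 1 << col
        basis.append(vec)
    return particular, basis


def kappa(free_bits: Sequence[int], particular: int, basis: Sequence[int]) -> int:
    """Map a choice of the free bits to a full candidate key."""
    key = particular
    for bit, vec in zip(free_bits, basis):
        if bit:
            key ^= vec
    return key


secret = [1, 0, 1, 1, 0]  # hypothetical 5-bit "key"
rows = [[1, 1, 0, 0, 0], [0, 0, 1, 0, 1], [1, 0, 0, 1, 1]]
eqs = [(r, sum(a & s for a, s in zip(r, secret)) % 2) for r in rows]
particular, basis = solve_gf2(eqs, 5)
candidates = {kappa(bits, particular, basis)
              for bits in product((0, 1), repeat=len(basis))}
secret_int = sum(s << j for j, s in enumerate(secret))
```

In the toy case two bits stay free, so four candidates remain; with 50 independent equations in 80 key bits the same procedure leaves 30 free bits and $2^{30}$ candidates for the exhaustive search.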
Attack on Grain: Practical Realization
We implemented a version of Grain which generates one bit of key stream per clock cycle. This is probably the realization most relevant for hardware constrained environments. We chose an implementation on an Altera FLEX EPF10K100ARC240-2 FPGA as our target of attack. In a standard measurement setup the voltage drop at a shunt in the power supply line of the FPGA was measured. The FPGA was operated at 2.5 MHz and the power traces were recorded using a LeCroy LC684DXL oscilloscope with a sample rate of 2 gigasamples per second. A set of 256 power traces for each initial value in each family $IV_i$ was obtained. The corresponding sample averages $\bar{P}^{(\nu)}$ were used to verify or falsify the hypotheses. An example is shown in Figure 5.
Differential Power Analysis of Trivium
In this section we describe a DPA attack on the stream cipher Trivium [1]. This attack is not based on any nonlinear part; instead, correlations with the power consumption of the three flip-flops $B_{81}$, $B_{82}$ and $B_{83}$ are exploited. These flip-flops lie behind an XOR gate, which mixes known and controllable bits with secret bits. Again we are able to recover the whole key.
Definition of Trivium
Trivium is a stream cipher with an internal state of 288 bits $a_i, a_{i+1}, \ldots, a_{i+92}$, $b_i, b_{i+1}, \ldots, b_{i+83}$ and $c_i, c_{i+1}, \ldots, c_{i+110}$, residing in three coupled feedback shift registers A, B, and C of 93, 84, and 111 bits, respectively. It uses a key $k = (k_0, \ldots, k_{79})$ of 80 bits as well as an initial value $IV = (IV_0, \ldots, IV_{79})$ of 80 bits. After a run-up time of $4 \cdot 288$ iteration steps it outputs a key stream $z_i$. Run-up (for $0 \le i < 4 \cdot 288$) and output generation (for $4 \cdot 288 \le i$) can be described by the following recursion formula: [...] $(1, 1, 1, 0, \ldots, 0)$
All variables represent elements in $\mathbb{F}_2$. The intermediate value $\sigma_i := a_i + a_{i+1} a_{i+2} + a_{i+27}$ will be our hypothesis in the DPA.
Implementation in Hardware
Again, the target of attack will be an implementation of the cipher which generates one bit of key stream per clock cycle. [...] $\ldots, c_{i+110}$, three additional AND gates, and a few XOR gates (as given in the recursion formula), as well as additional control logic for loading the key and initial value and for clocking the NLFSRs.
Power Model
We will use the same power model and notation as in the previous attack. The model for the power consumption is
as well as
Furthermore, we make the same assumption regarding the power consumption of the flip-flops as in the previous attack. To simplify the description we ignore the power consumption of the single AND and XOR gates. The notations, like $\bar{P}^{(\nu)}$, will also be used equivalently.
Attack on Trivium: Theory
To simplify the description we make the assumption that all flip-flops in our power model have the same power characteristic, i.e., for all appropriate j: $e_{xy} = P_{FF,A_j}(x, y) = P_{FF,B_j}(x, y) = P_{FF,C_j}(x, y)$ with some constants $e_{00} \approx 0 \approx e_{11} \ll e_{01}, e_{10}$. This restriction, however, is not necessary to mount the attack.
The DPA is done in 76 rounds. In the i-th round we already know $(\sigma_j)_{j=0}^{i-1}$ and evaluate $\sigma_i$. In fact, we will not need to make any hypothesis. The attack is illustrated in Table 6. We assume that the values $\sigma_0$, $\sigma_1$, $\sigma_2$, and $\sigma_{10}$ are already known (in this case, one may make an "external" hypothesis on these 4 bits).
[Table 6: For i := 3 to 78 except 10 do: using the knowledge of $(\sigma_j)$ [...] and [...]]
[...] $b_{i+81}$ and $b_{i+83}$ (mod 2), therefore the respective index "$\nu+$" in the above algorithm can be left out. (ii) For $\bar{P}_i(i)$ we have the values given in Table 1.
Remark 6. In a practical attack on a hardware realization, by virtue of Lemma 3, the two inequalities will transform into the decisions {no peak↔negative peak} and {positive peak↔no peak}. The boundary $(e_{01} + e_{10})/2$ was just used for illustration purposes.
Extraction of the key: After gaining the 79 values $(\sigma_i)_{i=0}^{78}$ (possibly depending on the 4 hypothetical values $\sigma_0$, $\sigma_1$, $\sigma_2$, and $\sigma_{10}$) we can write down the equations $\sigma_i = a_i + a_{i+1} a_{i+2} + a_{i+27}$ for $0 \le i \le 78$. These are 79 equations with 80 indeterminates $(k_i)_{i=0}^{79}$, which are shown explicitly in Table 2. One equation is dependent on the others. For solving the system of equations we may assume any value in $\mathbb{F}_2$ for $k_{12}$ and $k_{13}$. By reordering the equations as in Table 3, and leaving out the equation for $\sigma_{12}$, we can solve one equation after the other and obtain a full key for each previously chosen pair $(k_{12}, k_{13})$. Counting also the hypotheses $\sigma_0$, $\sigma_1$, $\sigma_2$, and $\sigma_{10}$ we may get at most $2^6 = 64$ different possible keys. Finding the right one is now trivial.
Table 3. Reordered equations; each row ends with the key bits it yields.
$\sigma_0 = k_{65}$, $\sigma_1 = k_{64}$, ..., $\sigma_{11} = k_{54}$ (yields $k_{65}, \ldots, k_{54}$)
$\sigma_{27} = k_{65} + k_{64}k_{63} + k_{38}$, ..., $\sigma_{36} = k_{56} + k_{55}k_{54} + k_{29}$ (yields $k_{38}, \ldots, k_{29}$)
$\sigma_{54} = k_{38} + k_{37}k_{36} + k_{11}$, ..., $\sigma_{61} = k_{31} + k_{30}k_{29} + k_4$ (yields $k_{11}, \ldots, k_4$)
$\sigma_{78} = k_{14} + k_{13}k_{12} + k_{56}$, ..., $\sigma_{69} = k_{23} + k_{22}k_{21} + k_{65}$ (yields $k_{14}, \ldots, k_{23}$)
$\sigma_{53} = k_{39} + k_{38}k_{37} + k_{12}$, ..., $\sigma_{42} = k_{50} + k_{49}k_{48} + k_{23}$ (yields $k_{39}, \ldots, k_{50}$)
$\sigma_{26} = k_{66} + k_{65}k_{64} + k_{39}$, ..., $\sigma_{15} = k_{77} + k_{76}k_{75} + k_{50}$ (yields $k_{66}, \ldots, k_{77}$)
$\sigma_{68} = k_{24} + k_{23}k_{22} + k_{66}$, ..., $\sigma_{66} = k_{26} + k_{25}k_{24} + k_{68}$ (yields $k_{24}, \ldots, k_{26}$)
$\sigma_{41} = k_{51} + k_{50}k_{49} + k_{24}$, ..., $\sigma_{39} = k_{53} + k_{52}k_{51} + k_{26}$ (yields $k_{51}, \ldots, k_{53}$)
$\sigma_{38} = k_{54} + k_{53}k_{52} + k_{27}$, $\sigma_{37} = k_{55} + k_{54}k_{53} + k_{28}$ (yields $k_{27}, k_{28}$)
$\sigma_{14} = k_{78} + k_{77}k_{76} + k_{51}$, $\sigma_{13} = k_{79} + k_{78}k_{77} + k_{52}$ (yields $k_{78}, k_{79}$)
$\sigma_{62} = k_{30} + k_{29}k_{28} + k_3$, ..., $\sigma_{65} = k_{27} + k_{26}k_{25} + k_0$ (yields $k_3, \ldots, k_0$)
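The idea of solving one equation after the other can be sketched for the first two groups of equations, $\sigma_0, \ldots, \sigma_{11}$ and $\sigma_{27}, \ldots, \sigma_{36}$. In this Python illustration the $\sigma$ values are computed from a random test key instead of being read off power traces; the function names are hypothetical, and the remaining groups of equations would be handled by the same pattern.

```python
import random
from typing import Dict, List


def sigma_first_rows(k: List[int]) -> Dict[int, int]:
    """sigma_0..sigma_11 and sigma_27..sigma_36 as functions of the key bits."""
    sigma = {}
    for i in range(0, 12):
        sigma[i] = k[65 - i]                 # sigma_0 = k65, ..., sigma_11 = k54
    for i in range(27, 37):
        # sigma_27 = k65 + k64*k63 + k38, ..., sigma_36 = k56 + k55*k54 + k29
        sigma[i] = k[92 - i] ^ (k[91 - i] & k[90 - i]) ^ k[65 - i]
    return sigma


def recover_first_rows(sigma: Dict[int, int]) -> Dict[int, int]:
    """Recover k54..k65 directly, then k29..k38 by substitution."""
    k: Dict[int, int] = {}
    for i in range(0, 12):
        k[65 - i] = sigma[i]
    for i in range(27, 37):
        # Every key bit on the right-hand side was found in the first loop.
        k[65 - i] = sigma[i] ^ k[92 - i] ^ (k[91 - i] & k[90 - i])
    return k


rng = random.Random(1)
test_key = [rng.randrange(2) for _ in range(80)]  # stand-in for the secret key
recovered = recover_first_rows(sigma_first_rows(test_key))
```

Each equation contains exactly one key bit that is not yet known at the time it is processed, so no search is needed within a group once the ordering is fixed.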
Conclusion
We presented differential power analyses of two recently proposed stream ciphers, which are focus candidates of the eSTREAM project. A novel concept for choosing initial value vectors was introduced, which eliminates the algorithmic noise of the device, leaving only the pure side-channel signal. This attack allowed us to recover the secret key with a small number of samples and without building templates. It is plausible that this kind of attack can be applied to stream ciphers with a similar construction philosophy. Both ciphers belong to the category of profile 2, which aims at employment in hardware constrained environments, such as mobile communication, identification, and RFID tags. In these fields of application side-channel attacks are a serious threat and protection mechanisms are usually mandatory. On the list of evaluation criteria, which hold for all-purpose ciphers, the size of the hardware implementation is prominent for profile 2 ciphers. If one employs generic countermeasures against DPA, like differential logic styles or masking at gate level, however, the area of a circuit would typically increase by a factor of 3.5 to 5 compared to a straightforward CMOS implementation (cf. [3], [4], [11] for extensive references). Hence it is important that ciphers employed in hardware and power restricted environments are suited for more economical DPA countermeasures. Ideally these should be considered from the outset in the design. We expect that there will be larger differences in the robustness of the various eSTREAM designs against DPA (and other types of SCA); it is even possible that some apparently unprotected design is applicable in practice whereas another design already needs costly countermeasures. Hence the significance of area (and performance) figures of the ciphers is difficult to assess without explicitly considering the susceptibility of the ciphers to SCA.
