Abstract. The main application of stream ciphers is online encryption of arbitrarily long data. Many practically used and intensively discussed stream ciphers consist of a small number of linear feedback shift registers (LFSRs) and a compression function that transforms the bitstreams produced by the LFSRs into the output keystream. In 2002, Krause proposed a Binary Decision Diagram (BDD) based attack on this type of cipher, which ranges among the best generic short-keystream attacks on practically used ciphers such as the A5/1 generator used in GSM and the E0 generator from the Bluetooth standard. In this paper we show how to extend the BDD-technique to nonlinear feedback shift registers (NFSRs), feedback shift registers with carry (FCSRs), and arbitrary compression functions. We apply our findings to the eSTREAM focus ciphers Trivium, Grain and F-FCSR. In the case of Grain, we obtain the first nontrivial cryptanalytic result besides generic time-memory-data tradeoffs.
z = C(w).
Practical examples for this design include the $E_0$ generator used in Bluetooth [4], the A5/1 generator from the GSM standard for mobile telephones [5], and the self-shrinking generator [18].
In 2002, Krause proposed a Binary Decision Diagram (BDD) attack [14, 15] on stream ciphers that are based on Linear Feedback Shift Registers (LFSRs). The BDD-attack is a generic attack in the sense that it does not depend on specific design properties of the respective cipher. It only relies on the assumptions that the generator's output behaves pseudorandomly and that the test whether a given internal bitstream w produces a sample keystream can be represented in a Free Binary Decision Diagram (FBDD) of size polynomial in the length of w. In addition, the attack reconstructs the secret key from the shortest information-theoretically possible prefix of the keystream (usually a small multiple of the keysize), whereas other generic attack techniques like algebraic attacks [1, 6] and correlation attacks [10] in many cases require amounts of known keystream that are unlikely to be available in practice. In the case of $E_0$, the A5/1 generator and the self-shrinking generator, it has been shown in [16] that the performance of the attack in practice does not deviate significantly from the theoretical figures. The inherently high memory requirements of the attack can be reduced by divide-and-conquer strategies based on guessing bits in the initial state at the expense of slightly increased runtime [16, 20].
In the ECRYPT stream cipher project eStream [8], a number of new ciphers have recently been proposed and analyzed. Many new designs partly replace LFSRs by other feedback shift registers such as nonlinear feedback shift registers (NFSRs) and feedback shift registers with carry (FCSRs) in order to prevent standard cryptanalysis techniques like algebraic attacks and correlation attacks. Moreover, combinations of different types of feedback shift registers permit alternative compression functions. We show that the BDD-based approach remains applicable in the presence of NFSRs and FCSRs combined with arbitrary output functions as long as not too many new internal bits are produced in each clock cycle of the cipher.
Three of the most promising hardware-oriented submissions to the eStream project are the ciphers Trivium [7], Grain [12], and the F-FCSR family [2]. All three ciphers are part of the focus group and are now being considered for the final portfolio that will be announced in the middle of 2008. We show that the BDD-attack is applicable to these ciphers and obtain the first exploitable cryptanalytic result on the current version of Grain besides generic time-memory-data-tradeoff attacks. Our results for the F-FCSR family emphasize already known but differently motivated security requirements for the choice of parameters.
This paper is organized as follows. We discuss some preliminaries on FSR-based keystream generators and BDDs in Sect. 2 and explain the extended BDD-attack in Sect. 3. Section 4 presents generic constructions for the BDDs that are used in the attack, and Sect. 5 applies our observations to the eStream focus ciphers Trivium, Grain and F-FCSR. The FSR-construction is illustrated in Fig. 1.
Preliminaries

FSR-Based Keystream Generators
As an alternative to LFSRs and NFSRs, Feedback Shift Registers with Carry were introduced and extensively analyzed in [13].
Definition 2. A Feedback Shift Register with Carry in Fibonacci architecture (Fibonacci FCSR) consists of an n-bit feedback shift register $a = (a_0, a_1, \ldots, a_{n-1})$ with feedback taps $(c_1, \ldots, c_n)$ and an additional q-bit memory b with $q \le \log_2(n)$. Starting from an initial configuration $(a^0, b^0)$, in each clock $a_0$ is produced as output, the sum $\sigma := b + \sum_{i=1}^{n} a_{n-i} c_i$ is computed over the integers, and the shift register and memory are updated according to $a := (a_1, \ldots, a_{n-1}, \sigma \bmod 2)$ and $b := \sigma \operatorname{div} 2$.
A Fibonacci FCSR of length n is illustrated in Fig. 2 .
Fig. 2. FCSR of length n in Fibonacci architecture
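The update rule of Definition 2 can be sketched in a few lines of Python. This is only an illustrative simulation; the function name `fibonacci_fcsr`, the list-based state layout and the toy parameters in the usage note are assumptions of this sketch and do not come from any of the ciphers discussed here.

```python
def fibonacci_fcsr(a, c, b, steps):
    """Run a Fibonacci FCSR for `steps` clocks.

    a: initial register bits (a_0, ..., a_{n-1})
    c: feedback taps (c_1, ..., c_n), stored as c[0] = c_1, ..., c[n-1] = c_n
    b: initial integer memory
    Returns (output bits, final register, final memory).
    """
    a = list(a)
    n = len(a)
    out = []
    for _ in range(steps):
        out.append(a[0])  # a_0 is produced as output
        # sigma = b + sum_{i=1}^{n} a_{n-i} * c_i, computed over the integers
        sigma = b + sum(a[n - i] * c[i - 1] for i in range(1, n + 1))
        a = a[1:] + [sigma % 2]  # shift and append sigma mod 2
        b = sigma // 2           # memory keeps sigma div 2
    return out, a, b
```

For example, `fibonacci_fcsr([1, 0, 1], [1, 1, 0], 0, 6)` clocks a toy register of length 3 with taps $c_1 = c_2 = 1$, $c_3 = 0$ and returns the first six output bits together with the final register and memory contents.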
We call an FCSR state (a, b) periodic if, left to run, the FCSR will eventually return to that same state. In case the Fibonacci FCSR is in a periodic state, the memory required to store the integer sum b can be further bounded as follows (cf. [11], Proposition 3.2).
Proposition 1. If a Fibonacci FCSR is in a periodic state, then the value of the memory b is in the range $0 \le b < \mathrm{wt}(c)$.
Hence, if we know that the initial state $(a^0, b^0)$ is periodic, we can limit the size of the memory to $q := \lfloor \log_2(\mathrm{wt}(c) - 1) \rfloor + 1$ bits.
Based on the initial configuration $(a^0, b^0)$, we can describe the output bitstream $(w_t)_{t \ge 0}$ of a Fibonacci FCSR by
Similarly to the Galois architecture of LFSRs, there exists a Galois architecture for FCSRs, which was first observed in [19] and further analyzed in [11]. We note that the Galois architecture is generally more efficient than the Fibonacci architecture because the feedback computations can be performed in parallel and each addition involves at most 3 bits. An example Galois FCSR is illustrated in Fig. 3.
In an FSR-based keystream generator, the FSRs may be interconnected in the sense that the update function $F_i$ of $R_i$ may also depend on the current content of the other registers, i.e., we have
The output function $C : \{0,1\}^n \to \{0,1\}^*$, which derives the output of each clock from the current state, usually depends on one or more state bits from each FSR.
Similarly to a single FSR, we can think of an FSR-based keystream generator with k registers as producing an internal bitstream $(w_t)_{t \ge 0}$, where $w_t := w^{r(t)}_{s(t)}$ with $r(t) = t \bmod k$ and $s(t) = t \operatorname{div} k$, i.e., the t-th internal bit of the generator corresponds to the s(t)-th bit in the bitstream produced by $R_{r(t)}$. Again, the internal bitstream and hence the output of an FSR-based keystream generator are entirely determined by its starting state $s^0$, and the first m bits of the internal bitstream w can be computed as
We denote the prefix of the keystream that is produced from an m-bit internal bitstream w by $C_m(w)$.
We call an integer i an initial position in an internal bitstream w if $w_i$ corresponds to a bit from the initial state of some FSR, and a combined position otherwise. Correspondingly, we denote by IP(i) the set of initial positions and by CP(i) the set of combined positions in $\{0, \ldots, i-1\}$. For an internal bitstream w, let IB(w) denote the bits at the initial positions in w. Let $n_{\min}$ denote the maximum i for which all $i' \le i$ are initial positions and $n_{\max}$ the minimum i for which all $i' > i$ are combined positions.
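The round-robin indexing of the internal bitstream and the notion of initial positions can be made concrete with a small sketch. It assumes that the bitstream of each register starts with that register's initial state bits; the helper names `interleave` and `initial_positions` and the toy register lengths are hypothetical.

```python
def interleave(streams, m):
    """First m internal bits: w_t is bit s(t) = t div k of register r(t) = t mod k."""
    k = len(streams)
    return [streams[t % k][t // k] for t in range(m)]


def initial_positions(lengths, m):
    """Positions t < m whose bit belongs to the initial state of its register.

    lengths[j] is the length of register R_j; under the assumption above, the
    first lengths[j] bits of its bitstream are initial state bits.
    """
    k = len(lengths)
    return [t for t in range(m) if t // k < lengths[t % k]]
```

For two toy registers of lengths 2 and 4, `initial_positions([2, 4], 10)` yields the positions 0, 1, 2, 3, 5 and 7, so that $n_{\min} = 3$ and $n_{\max} = 7$ in this example.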
Definition 4. We call an FSR-based keystream generator regular, if $|C_m(w)| = \beta(m)$ for all $w \in \{0,1\}^m$, i.e., an internal bitstream of length m always yields β(m) keystream bits.
Two important parameters of FSR-based keystream generators are the best-case compression ratio and the information rate, which we define as follows. 
Definition 5. If γm is the maximum number of keybits that the generator produces from internal bitstreams of length
[C(w) is prefix of z] = $p_C(m)$, i.e., the probability of C(w) being a prefix of z is independent of z.
As shown in [14] , the computation of α can be simplified as follows if the generator fulfills the independence assumption. 
Lemma 1. If the Independence Assumption holds for a keystream generator, we have
m . Finally, we assume the output keystream to behave pseudorandomly, which we formalize as follows. 
Assumption 2 (Pseudorandomness Assumption
We expect the Pseudorandomness Assumption to hold since a significant violation would imply the vulnerability of the generator to a correlation attack.
Binary Decision Diagrams (BDDs)
We briefly review the definitions of Binary Decision Diagrams and their most important algorithmic properties. 
Definition 6. A Binary Decision Diagram (BDD) over a set of variables $X_n = \{x_1, \ldots, x_n\}$ is a directed, acyclic graph $G = (V, E)$ with $E \subseteq V \times V \times \{0, 1\}$.
Definition 7. For a BDD G over $X_n$, let $G^{-1}(1) \subseteq \{0,1\}^n$ denote the set of inputs accepted by G, i.e., all inputs $a \in \{0,1\}^n$ such that $f_{\mathrm{root}}(a) = 1$.
Since general BDDs have many degrees of freedom for representing a particular Boolean function, many important operations and especially those that are needed in our context are NP-hard. We therefore concentrate on the more restricted model of Ordered Binary Decision Diagrams (OBDDs), which are defined as follows. 
Definition 8. A variable ordering π for a set of variables $X_n = \{x_1, \ldots, x_n\}$ is a permutation of the index set $I = \{1, \ldots, n\}$, where π(i) denotes the position of $x_i$ in the π-ordered variable list $x_{\pi^{-1}(1)}, x_{\pi^{-1}(2)}, \ldots, x_{\pi^{-1}(n)}$.
Definition 9. A π-Ordered Binary Decision Diagram (π-OBDD) with respect to a variable ordering π is a BDD in which the sequence of tests on a path from the root to a sink is restricted by π, i.e., if an edge leads from an $x_i$-node to an $x_j$-node, then π(i) < π(j). We call a BDD G an OBDD if there exists a variable ordering π such that G is a π-OBDD. We define the width of an OBDD
In contrast to BDDs, OBDDs allow for efficient implementations of the operations that we are interested in. Let π denote a variable ordering for X n = {x 1 , . . . , x n } and let the π-OBDDs G f , G g and
, and there exists an algorithm MIN that computes in time O(|G f |) the uniquely determined minimal 
We refer the reader to [21] for details on BDDs, OBDDs and the corresponding algorithms.
Note that we can straightforwardly use BDDs as a data structure for subsets of $\{0,1\}^n$. In order to represent an $S \subseteq \{0,1\}^n$, we construct a BDD $G_S$ that computes the characteristic function $f_S$ of S given by $f_S(x) = 1$ if $x \in S$ and $f_S(x) = 0$ otherwise. Hence, $G_S$ will accept exactly the elements of S. Moreover, we can compute a BDD representing the intersection $S \cap T$ of two sets S and T from their BDD-representations $G_S$ and $G_T$ by an AND-synthesis of $G_S$ and $G_T$.
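The use of OBDDs as a set data structure can be illustrated by the following minimal sketch. It is a toy model and not an efficient BDD package: nodes are nested tuples `(var, low, high)`, the variable ordering is fixed to the natural order $x_0 < x_1 < \ldots$, and all function names are assumptions of this example.

```python
from functools import lru_cache
from itertools import product

FALSE, TRUE = "0", "1"  # the two sinks


def node(var, low, high):
    """Create an x_var-node; skip it if both outgoing edges agree."""
    return low if low == high else (var, low, high)


def from_set(S, n, var=0):
    """OBDD of the characteristic function f_S of a set S of n-bit tuples."""
    if var == n:
        return TRUE if S else FALSE
    low = from_set({x[1:] for x in S if x[0] == 0}, n, var + 1)
    high = from_set({x[1:] for x in S if x[0] == 1}, n, var + 1)
    return node(var, low, high)


@lru_cache(maxsize=None)
def bdd_and(f, g):
    """AND-synthesis of two OBDDs that share the natural variable ordering."""
    if f == FALSE or g == FALSE:
        return FALSE
    if f == TRUE:
        return g
    if g == TRUE:
        return f
    v = min(f[0], g[0])  # next variable in the ordering
    f0, f1 = (f[1], f[2]) if f[0] == v else (f, f)
    g0, g1 = (g[1], g[2]) if g[0] == v else (g, g)
    return node(v, bdd_and(f0, g0), bdd_and(f1, g1))


def accepts(g, x):
    """Follow the path selected by input x from the root to a sink."""
    while isinstance(g, tuple):
        var, low, high = g
        g = high if x[var] else low
    return g == TRUE


def accepted(g, n):
    """The set G^{-1}(1) of all n-bit inputs accepted by g."""
    return {x for x in product((0, 1), repeat=n) if accepts(g, x)}
```

With `S = {(0, 0, 1), (0, 1, 1), (1, 1, 0)}` and `T = {(0, 1, 1), (1, 1, 0), (1, 1, 1)}`, the diagram `bdd_and(from_set(S, 3), from_set(T, 3))` accepts exactly the elements of `S & T`.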
BDD-Based Initial State Recovery
The BDD-based attack on keystream generators, which was first introduced in [14] , is a known-plaintext initial state recovery attack, i.e., the attacker tries to reconstruct the unknown initial state s 0 of the keystream generator from a few known plaintext bits p 1 We observe that for any internal bitstream w ∈ {0, 1} m that yields a prefix of the observed keystream, the following two conditions must hold.
Condition 1. w is an m-extension of the initial state bits in w, i.e., we have $H_{\le m}(IB(w)) = w$.
Condition 2. $C_m(w)$ is a prefix of the observed keystream z.
We call any $w \in \{0,1\}^m$ that satisfies these conditions an m-candidate. Our strategy is now to start with $m = n_{\min}$ and to dynamically compute the m-candidates for $m > n_{\min}$, until only one m-candidate is left. The first bits of this m-candidate will contain the initial state $s^0$ that we are looking for. We can expect to be left with only one m-candidate for $m \ge \alpha^{-1} n$, which follows directly from the following Lemma (cf. [14] for a proof).
Lemma 2. Under Assumption 2, it holds for all keystreams z and all
The proof of Lemma 3 is analogous to the LFSR-case presented in [14, 15] and can be found in Appendix A. From this bound on w(P m ), we can straightforwardly derive the time, space and data requirements of the BDD-based attack. 
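The candidate-filtering strategy of this section can be illustrated on a toy generator. The sketch below deliberately replaces the BDD machinery by exhaustive enumeration of all initial states, so it only shows how Conditions 1 and 2 narrow down the set of m-candidates as m grows. The register length, the feedback rule and the output function are hypothetical choices for this example and do not correspond to any cipher discussed in this paper.

```python
from itertools import product

N = 4  # toy register length (hypothetical)


def extend(state, m):
    """Toy stand-in for H_{<=m}: extend the initial bits to m internal bits."""
    w = list(state)
    while len(w) < m:
        w.append(w[-4] ^ w[-3])  # toy linear update relation
    return w[:m]


def compress(w):
    """Toy output function C_m: one keystream bit per three internal bits."""
    return [w[i] ^ (w[i + 1] & w[i + 2]) for i in range(0, len(w) - 2, 3)]


def candidates(z, m):
    """All m-candidates, found by brute force instead of BDD operations."""
    result = []
    for state in product((0, 1), repeat=N):
        w = extend(state, m)      # Condition 1 holds by construction
        out = compress(w)
        if out == z[:len(out)]:   # Condition 2: C_m(w) is a prefix of z
            result.append(w)
    return result
```

Given a keystream `z = compress(extend(secret, 30))` for some 4-bit `secret`, the number of candidates returned by `candidates(z, m)` can only shrink as m increases, and the bitstream generated from `secret` is always among them.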
Generic BDD Constructions
Keystream Consistency Check Q m
In most cases, the BDD $Q_m$ that checks Condition 2 can be straightforwardly derived from the definition of the output function C. If the computation of an output bit $z_t$ depends on u(j) > 1 bits from an FSR $R_j$, a fixed bit in the bitstream produced by $R_j$ will generally appear and have to be read in the computation of up to u(j) output bits. In this case, we compute an output bit $z_t$ from a number of new bits which are being considered for the first time, and several old bits that were already involved in the computation of previous output bits. This would imply reading a fixed variable more than once on the same path in $Q_m$, which is prohibited by the OBDD-definition. The less restrictive BDDs permit this construction, but can no longer guarantee the efficiency of the operations that our attack depends on. A similar problem has been considered in [14] in the context of the irregularly clocked A5/1 generator [5], which uses the bits of the internal bitstream both for computing output bits and as input for the clock control mechanism. A possible solution, which was also proposed in [14], is to increase the number of unknowns by working with u(j) synchronized duplicates of the $R_j$-bitstream at the expense of a reduced information rate α. We now consider the more general situation that the update function depends on the new bits and some function(s) $g_1, \ldots, g_r$ in the old bits. In this case, it suffices to introduce auxiliary variables for the values of these functions in order to ensure that $z_t$ is computed only from new bits. This construction is illustrated in the following example.
Example 1. Consider the output function
Assuming canonical reading order, $w_{t+9}$ would be the new bit and $w_{t+5}$ and $w_{t+7}$ the old bits. With the auxiliary variable $\tilde{w}_t := g_1(w_{t+5}, w_{t+7})$ and $g_1(x_1, x_2) := x_1 \oplus x_2$, we can express $z_t$ as $z_t = \tilde{w}_t \oplus w_{t+9}$.
If we add for each auxiliary variable an FSR to the generator that outputs at clock t the corresponding value of $g_j$, we can compute $z_t$ without considering the bits from the internal bitstream more than once. Obviously, the resulting equivalent generator is regular, but will have a lower information rate than before, since more bits of the internal bitstream have to be read in order to compute the same number of keystream bits.
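The auxiliary-variable construction of Example 1 can be checked with a short sketch. It assumes that the output function of the example is $z_t = w_{t+5} \oplus w_{t+7} \oplus w_{t+9}$, which is what the decomposition into old and new bits suggests; the function names are hypothetical.

```python
def output_direct(w, t):
    """z_t computed directly from the internal bitstream (assumed form)."""
    return w[t + 5] ^ w[t + 7] ^ w[t + 9]


def aux_stream(w, length):
    """Bitstream of the additional FSR: g_1(w_{t+5}, w_{t+7}) at clock t."""
    return [w[t + 5] ^ w[t + 7] for t in range(length)]


def output_with_aux(w, w_aux, t):
    """z_t computed from one auxiliary bit and the single new bit w_{t+9}."""
    return w_aux[t] ^ w[t + 9]
```

Both functions produce the same keystream, but the second one reads each bit of the two streams only once per output bit, at the price of one additional internal bit per clock.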
FSR Consistency Check R m
Recall that each bit w t of an internal bitstream w is either an initial state bit of some FSR or a combination of other internal bits. In order to decide for a given internal bitstream whether it satisfies Condition 1, we need to check whether the update relations imposed on the bits at the combined positions are fulfilled. Hence, if a combined bit w t is produced by an update relation f i (s 0 , . . . , s n−1 ), we need to check whether f i (w i1 , . . . , w ip ) = w t , which is equivalent to testing whetherf and a reading order π ∈ σ n , we define the set of active monomials at clock t as
Hence, AM(f, t) contains all monomials in f for which at least one, but not all factors are known after the first t inputs have been read. 
and the root of G f as (1, 1 Note that for p = 1, we obtain the LFSR-bound that was proved in [14] . We now turn to the case of Fibonacci FCSRs. Equation (1) implies that we need access to σ t−1 in order to check whether the update relation holds for w t . Therefore, we work with a modified FCSR that outputs the sum σ t instead of the bit w t = σ t mod 2 in each clock. More precisely, the modified FCSR outputs for an initial memory state (b Proof. In order to check whether
This test can be performed in a π-OBDD as follows. Define the vertex set V as
consists of a variable number k corresponding to some σ We can straightforwardly verify that this construction yields a π-OBDD of width at most 2 q+1 which accepts only those inputs for which σ t satisfies the update relations.
In the case of Galois FCSRs with $b_i \le c_i$ at all times, we denote by $a_i(t)$ and $b_i(t)$ the value of the register cells $a_i$ and $b_i$ at time t. The definition of Galois FCSRs implies $a_{n-1}(t) = a_0(t-1)$ and for $i \in \{n-2, \ldots, 0\}$ that $a_i(t) = a_{i+1}(t-1)$ if $c_i = 0$ and $a_i(t) = a_{i+1}(t-1) \oplus b_{i+1}(t-1) \oplus a_0(t-1)$ if $c_i = 1$. We therefore focus on the nontrivially computed bits and think of the main register as producing the bitstream $a_1(0), \ldots, a_{n-1}(0), \ldots, a_{i_1}(t), \ldots, a_{i_l}(t), \ldots,$
Proof. According to Corollary 2, we can test the linear conditions on the $a_i(t)$ where $c_{i+1} = 1$ in a π-OBDD of width at most 2. Similarly, Corollary 2 yields a maximum width of $2^3 = 8$ in the case of the carry register since $b_{i_j}(t)$ can be computed as
Applications
Trivium
Trivium [7] is a regular keystream generator consisting of three interconnected NFSRs $R_0$, $R_1$, $R_2$ of lengths $n^{(0)} = 93$, $n^{(1)} = 84$, and $n^{(2)} = 111$. The 288-bit initial state of the generator is derived from an 80-bit key and an 80-bit IV. The output function computes a keystream bit $z_t$ by linearly combining six bits of the internal state, with each NFSR contributing two bits (cf. Appendix 6 for details). In order to mount the BDD-attack on Trivium, we write the output function as
Theorem 3 shows that the BDD-attack is applicable to Trivium, but its performance is not competitive with recently published attacks, which recover the initial state in around $2^{100}$ operations from $2^{61.5}$ keystream bits [17] or in around $2^{135}$ operations from O(1) keystream bits [9].
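For reference, one clock of Trivium can be sketched as follows. The tap positions are taken from the published Trivium specification rather than from the text above, and the flat 288-entry list layout (with `s[0]` holding $s_1$ of the specification) is an assumption of this sketch. Key and IV loading and the initialization rounds are omitted.

```python
def trivium_clock(s):
    """One clock of Trivium on a 288-bit state list; returns (z, new state)."""
    # Output: six state bits, two from each register, combined linearly.
    t1 = s[65] ^ s[92]
    t2 = s[161] ^ s[176]
    t3 = s[242] ^ s[287]
    z = t1 ^ t2 ^ t3
    # Nonlinear feedback: each register is fed by its neighbour.
    t1 ^= (s[90] & s[91]) ^ s[170]
    t2 ^= (s[174] & s[175]) ^ s[263]
    t3 ^= (s[285] & s[286]) ^ s[68]
    # Shift the three registers of lengths 93, 84 and 111.
    new_state = [t3] + s[0:92] + [t1] + s[93:176] + [t2] + s[177:287]
    return z, new_state
```

Each clock adds three new internal bits (one per register) and emits one keystream bit, which is the ratio that enters the analysis of the attack.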
Grain-128
The regularly clocked stream cipher Grain-128 was proposed in [12] and supports a key size of 128 bits and an IV size of 96 bits. The design is based on two interconnected shift registers, an LFSR $R_0$ and an NFSR $R_1$, both of length $n^{(0)} = n^{(1)} = 128$, and a nonlinear output function.
Theorem 4 is to the best of our knowledge the first exploitable cryptanalytic result besides generic time-memory-data-tradeoff attacks [3], which require time and keystream around $2^{128}$.
The F-FCSR Stream Cipher Family
The F-FCSR stream cipher family in its current version is specified in [2] . It consists of the variants F-FCSR-H and F-FCSR-16. In order to mount the BDD-attack, we split the FCSR into the main register R 0 and the carry register R 1 . Since each output bit is computed as the sum of up to 15 internal bits, we are in a similar situation as described in Example 1 and need additional LFSRs R 2 , . . . , R 9 to compute the keystream bits b j , 0 ≤ j < 8. The modified output function simply returns these bits in each clock. With l := wt(c)−1 we obtain eight output bits from 2l+8 internal bits, hence β(m) = 2l+16 . The modified generator satisfies the Independence Assumption and the BDD assumption as before, and we have l = 130 additional unknowns. We obtain by applying Theorem 3: Our analysis supports the security requirement that the Hamming weight of c should not be too small, which was also motivated by completely different arguments in [2] . Although the BDD-attack is to the best of our knowledge the first nontrivial attack on the current version of the F-FCSR family, it is far less efficient than exhaustive key search.
Conclusion
In this paper, we have shown that the BDD-attack can be extended to keystream generators based on nonlinear feedback shift registers (NFSRs) and feedback shift registers with carry (FCSRs) as well as arbitrary output functions. We have applied our observations to the three eStream focus candidates Trivium, Grain and F-FCSR. In the case of Grain, we obtain the first exploitable cryptanalytic result besides generic time-memory-data tradeoffs. Our analysis of the F-FCSR family provides additional arguments for already proposed security requirements. 
C Grain Algorithm
In each clock, an output bit z t is derived by 
D F-FCSR-H Algorithm
At each clock, the generator uses the following static filter to extract a pseudorandom byte: $F = (\mathrm{ae985dff\ 26619fc5\ 8623dc8a\ af46d590\ 3dd4254e})_{16}$. The filter splits into 8 subfilters (subfilter j is obtained by selecting the bit j in each byte of F)
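The splitting of F into subfilters can be sketched as follows. The sketch assumes that bit j denotes the j-th least significant bit of a byte and that the bytes of F are read from left to right; both conventions, as well as the function name `subfilters`, are assumptions of this example and should be checked against the specification [2].

```python
F_HEX = "ae985dff26619fc58623dc8aaf46d5903dd4254e"  # static filter F, 20 bytes


def subfilters(f_hex):
    """Split the filter into 8 subfilters, one per bit position in a byte.

    Subfilter j collects bit j of every byte, leftmost byte first, and is
    returned as an integer with one bit per byte of the filter.
    """
    data = bytes.fromhex(f_hex)
    subs = []
    for j in range(8):
        value = 0
        for byte in data:
            value = (value << 1) | ((byte >> j) & 1)
        subs.append(value)
    return subs
```

Since F has 20 bytes, `subfilters(F_HEX)` returns eight 20-bit values, and recombining them bit by bit restores the original filter.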
