Abstract. This paper provides a fresh analysis of the widely used Common Scrambling Algorithm stream cipher (CSA-SC). Firstly, a new representation of CSA-SC with a state size of only 89 bits is given, a significant reduction from the 103-bit state of a previous CSA-SC representation. Analysis of this 89-bit representation demonstrates that the basis of a previous guess-and-determine attack is flawed. Correcting this flaw increases the complexity of that attack so that it is worse than exhaustive key search. Although that attack is not feasible, the reduced state size of our representation makes it obvious that CSA-SC is vulnerable to several generic attacks, for which feasible parameters are given.
Introduction
The Digital Video Broadcasting Common Scrambling Algorithm (CSA) has been used to encrypt European cable digital television signals since 1994. It was specified by the European Telecommunications Standards Institute (ETSI), and the proprietary algorithm was distributed to cable TV subscribers in the form of a hardware chip. Although some high-level details appear in patents [3], the algorithm has never officially been revealed. In 2002, a software program that implemented the algorithm was released in binary form. This was reverse engineered by hackers, who released the details of the CSA algorithm.
The CSA algorithm can be considered as the application of two cipher layers: a stream cipher layer and a block cipher layer. For encryption, the block cipher is applied first, followed by the stream cipher layer. For decryption, the stream cipher layer is applied first, followed by the block cipher layer. Both the block and stream ciphers are initialized using the same 64-bit key. We do not consider the block cipher component within this paper. The stream cipher component is a binary additive stream cipher. We refer to the keystream generator for the stream cipher component of the CSA algorithm as CSA-SC.
The CSA-SC structure comprises two nonlinear Feedback Shift Registers (FSRs), a combiner with memory and an output function. In the patent application [3], the total internal state size of CSA-SC is described as 107 bits. In previous analysis of CSA-SC, Weinmann and Wirt [8] showed that it can be modelled using 103 bits, and noted that the period of the keystream produced by CSA-SC is upper bounded by $2^{103}$. In this paper, we provide a new representation of CSA-SC that uses only 89 state bits. Consequently, the maximum CSA-SC keystream period must be much less than the $2^{103}$ bits asserted by Weinmann and Wirt [8], with an upper bound of $2^{89}$ bits. This significant reduction in state size also has implications for the security of CSA-SC.
Weinmann and Wirt [8] presented an analysis of CSA-SC and proposed a guess-and-determine attack with complexity less than $2^{45}$, based on their 103-bit CSA-SC representation and predicated on the state cycle structure of one of the FSRs during keystream generation. They claimed that the state cycle structure for this FSR consists of many leading paths and short cycles, and supported this conjecture with experimental simulations. However, in examining the cycle structure of the FSR when developing our 89-bit model, we found that there are no leading paths to cycles and that short cycles were not readily located, implying that the Weinmann-Wirt attack cannot work as claimed.
We present our model of the CSA-SC in Section 2. In Section 3, we provide theoretical observations about the state update functions used in CSA-SC. These observations contradict the results presented by Weinmann and Wirt [8] , but show that there are security vulnerabilities that can be exploited in cryptanalytic attacks. Section 4 describes an exploration of the FSR state cycle structure. This is motivated by the discrepancy between our observations and the results presented in [8] . Section 5 discusses several possible attacks on CSA-SC. In Section 5.1, the analysis of CSA-SC in [8] is summarized, and the problem with their attack is discussed. In Section 5.2, the vulnerability of CSA-SC to time-memory tradeoff attacks is demonstrated. Section 6 presents some closing remarks on the security of CSA-SC.
Specification of the CSA-SC
CSA-SC comprises two FSRs and a combiner with memory. These are denoted FSR-A, FSR-B and FSM-C, respectively. For our representation of CSA-SC, FSR-A and FSR-B each have ten stages, with each stage containing one four-bit word (a nibble). The combiner FSM-C consists of two stages, each containing a nibble, and a single-bit carry. The total state size is 89 bits. Figure 1 shows a high-level view of the relationships between components of the CSA-SC during keystream generation.
During keystream generation, FSR-A is autonomous. The state update function for FSR-A is nonlinear, with the output of the s-box $S_A$ used in calculating the next state value. Nonlinear outputs from FSR-A are also used as inputs to the state update functions of FSR-B and FSM-C.

The notation used in this paper is as follows. Let $A^t$ represent the contents of FSR-A at time $t$. Then $A^t_{i,j}$ represents bit $j$ of the $i$th stage of FSR-A at time $t$, where $i \in \{0, 1, \ldots, 9\}$ and $j \in \{0, 1, 2, 3\}$. Similarly, $B^t_{i,j}$ represents bit $j$ of the $i$th stage of FSR-B at time $t$. At time $t$, the contents of the two four-bit stages of FSM-C are denoted $D^t_{i,j}$, where $i \in \{0, 1\}$ and $j \in \{0, 1, 2, 3\}$, and the contents of the one-bit carry are denoted $c^t$. According to ETSI conventions, the most significant bit of a stage is denoted by index 0. Binary addition (XOR) and modular addition in $\mathbb{Z}_{2^4}$ are represented by the ⊕ and ⊞ operators, respectively. $\mathrm{ROL}_x$ represents word rotation to the left by $x$ bits. $c \| D$ represents the concatenation of bit $c$ and word $D$; for example, $0 \| 1001 = 01001$.
Generating keystream
When the keystream generator is clocked at time $t$, FSR-A, FSR-B and FSM-C are simultaneously clocked. FSR-A is autonomous, and contributes to the state update functions of FSR-B and FSM-C. Nonlinear combinations of values stored in FSR-A stages, obtained through the use of various s-boxes, are used in the state update functions of all components. S-boxes $S_A$, $S_B$ and $S_D$ each take twenty-bit inputs from FSR-A and provide 4-bit outputs. S-boxes $S_P$ and $S_Q$ take 5 bits of input from FSR-A, and each produce a 1-bit output. Specific details regarding the s-boxes are contained in Appendix A. The state update functions for the CSA-SC components are as follows:
The keystream is produced as a series of 2-bit words. 
where
The function F C produces four bits of output, with each output bit formed from a linear combination of four bits from FSR-B. Specifically, F C (B t ) = (B 
A Note on Previous Representations
In the DVB patent [3] , the CSA-SC algorithm is defined as having a state of 107 bits. This representation facilitates an efficient hardware implementation. The CSA-SC representation of Weinmann and Wirt [8] reduced the state size from 107 to 103 bits.
Our representation obtains a further reduction in the state size to 89 bits by removing the 4-bit memories X, Y and Z and the one-bit memories p and q from the representation used by Weinmann and Wirt. These memories hold the outputs of s-boxes at time $t$, and are used in the state update function at time $t + 1$ to form feedback. In the Weinmann-Wirt representation, none of the s-boxes use the final stage of FSR-A, $A_9$. When the state update function is applied to FSR-A, the contents of $A^t_{0..8}$ are shifted to become $A^{t+1}_{1..9}$. All of the values required to calculate the feedback at time $t$ therefore remain in the FSR at time $t + 1$, so the memories, while useful in constructing efficient hardware, are not required in an equivalent representation of the shift register. In the equivalent representation, the indices of the stages used as inputs to the s-boxes must be incremented by one.
Note that in the original representation, the initial value of all memories is zero. For our representation, this necessitates a special case for the first clock of the key initialization process where the output of the s-boxes must be treated as zero irrespective of their inputs. Although this may not be the most efficient hardware implementation, it permits a cryptographically equivalent representation of CSA-SC using only 89 bits.
Neither our work nor that of Weinmann and Wirt considers the key initialization during cryptanalysis, so we omit the details of the initialisation process in this paper. There are several errors in the specification given in the work of Weinmann and Wirt [8] , which we correct in Appendix A of this paper.
Some observations on CSA-SC
In this section, we make some observations regarding CSA-SC. Firstly, in Section 3.1, we show that during keystream generation the state update functions of both FSR-A and FSR-B are invertible. That is, given the state of the two FSRs at time t + 1, the corresponding FSR states at time t can be uniquely determined. This contradicts the claims of leading paths to short cycles made by Weinmann and Wirt [8] , motivating our exploration of the FSR-A state cycles presented in Section 4. Secondly, in Section 3.2 we make observations regarding the ratio of the CSA-SC state size to the key size, which indicates a vulnerability to a generic style of attack.
State update functions during keystream generation
The state update functions for FSR-A and FSR-B are invertible. As FSR-A is autonomous during keystream generation, but FSR-B is dependent on FSR-A, it is necessary to establish that the state update function for FSR-A is invertible before examining the state update function for FSR-B. Following this, the conditions under which FSM-C may be inverted are also presented.
The state update function for FSR-A makes use of $S_A$, a $20 \times 4$ s-box. Although $S_A$ itself is not bijective, the FSR-A state update function is nevertheless invertible because (rearranging the state update function in Section 2.1):
That is, for the inversion, the register contents are shifted back one stage, rather than forward, and the contents of the last stage, $A_9$, are computed from the contents of A , this initially appears to cause a circular dependency. However, if $S_A$ is considered as the concatenation of four 5-input Boolean functions, then the output of each of these functions can be computed individually and the dependency avoided. is more complex, due to the use of integer addition rather than XOR.
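The shift-back inversion can be illustrated on a toy register. The sketch below is not CSA-SC: it assumes a hypothetical 10-stage register of single bits (FSR-A has ten 4-bit stages) and a made-up nonlinear feedback `g` that reads only stages 1 to 8, so the circular dependency discussed above does not arise in the toy. It checks exhaustively that undoing one clock recovers the previous state, and that the forward clock is a bijection on the toy state space.

```python
N = 10  # hypothetical toy register: ten stages of one bit each


def g(a):
    """Hypothetical nonlinear feedback on stages 1..8 (not the CSA s-box S_A)."""
    return (a[1] & a[2]) ^ (a[3] & a[4] & a[5]) ^ a[6] ^ (a[7] | a[8])


def clock(a):
    """Forward clock: stages shift up by one, stage 0 receives a[9] XOR g(a)."""
    return (a[9] ^ g(a),) + a[:9]


def unclock(b):
    """Inverse clock: shift back one stage and recompute the old last stage.

    The old stages 0..8 are the new stages 1..9. The old stage 9 equals
    b[0] XOR g(old state); g reads only old stages 1..8 (= new stages 2..9),
    so it can be evaluated before the old stage 9 is known.
    """
    prev = b[1:] + (0,)  # placeholder for the still-unknown old stage 9
    return b[1:] + (b[0] ^ g(prev),)


def all_states():
    """Yield every toy state as a tuple of N bits."""
    for x in range(2 ** N):
        yield tuple((x >> i) & 1 for i in range(N))


if __name__ == "__main__":
    ok = all(unclock(clock(s)) == s for s in all_states())
    images = {clock(s) for s in all_states()}
    print(ok, len(images) == 2 ** N)  # True True
```

Because every state has exactly one predecessor, repeated clocking can only trace out disjoint cycles, which is the property used in Section 4.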
Consider the case where is in the least significant bit, but as the addition is integer addition, this raises the possibility of a carry to the next bit position, and so on. That is, if $c^{t-1} = 1$ and the least significant bit of $D^{t-1}_1$ is 1, then the least significant bit of the integer sum will be 0, and the influence of $c^{t-1}$ is carried to the next position. However, where the least significant bit of the integer sum is 1 (that is, the sum is an odd value), the value of $c^{t-1}$ and the value of the least significant bit of $D^{t-1}_1$ must differ, so there is no possibility that the influence of $c^{t-1}$ extends beyond that least significant bit position. The 2-bit keystream output $z^t$ is useful in discovering whether the two least significant bits in D
is odd, and known keystream bits can be used to determine this unique state. Otherwise, there are two possible prior states for the combiner.
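The carry argument above can be checked exhaustively. The sketch assumes a generic form, a 4-bit value `x` plus a one-bit carry `c` taken modulo 16, standing in for the combiner sum (the exact FSM-C update equation is not reproduced here). It confirms that whenever the sum is odd the carry never reaches the bit positions above the least significant bit, whereas an even sum can hide a propagated carry.

```python
def carry_confined(x, c):
    """True if adding the carry c to the 4-bit value x (mod 16) leaves x's three high bits unchanged."""
    s = (x + c) % 16
    return (s >> 1) == (x >> 1)


odd_cases = [(x, c) for x in range(16) for c in (0, 1) if (x + c) % 2 == 1]
even_cases = [(x, c) for x in range(16) for c in (0, 1) if (x + c) % 2 == 0]

# Odd sum: c and the low bit of x differ, so no carry leaves bit 0.
print(all(carry_confined(x, c) for x, c in odd_cases))       # True
# Even sum with c = 1 and x odd: the carry propagates upward.
print(any(not carry_confined(x, c) for x, c in even_cases))  # True
```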
The ratio of state size to key size
The CSA-SC key initialization uses the combination of a 64-bit key and 64-bit IV to populate the 89-bit state in preparation for keystream generation. That is, the initialisation function takes 128 bits of input and produces an 89-bit output: the CSA-SC initial state at the start of keystream generation. Although this paper does not consider the specific details of the initialisation function, clearly there exist multiple key-IV pairs that produce the same internal state at the start of keystream generation, and hence the same keystream. That is, the use of different keys does not guarantee the production of different keystreams. Even for a single 64-bit key, clearly there are multiple IVs for which the initial 40-bit state of FSR-A will be the same. As the nonlinearity of the functions used in CSA-SC is largely determined by the contents of FSR-A, CSA-SC may be vulnerable to divide and conquer attacks which target FSR-A.
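The counting behind this observation is a pigeonhole argument. As a worked example using only the sizes stated above:

```latex
\frac{|\{\text{key-IV pairs}\}|}{|\{\text{internal states}\}|}
  = \frac{2^{64} \cdot 2^{64}}{2^{89}} = 2^{39},
\qquad
\frac{|\{\text{IVs for a fixed key}\}|}{|\{\text{FSR-A states}\}|}
  = \frac{2^{64}}{2^{40}} = 2^{24}.
```

So on average each 89-bit state is reached from $2^{39}$ key-IV pairs, and for a fixed key each 40-bit FSR-A state is reached from $2^{24}$ IVs; in each case at least one state must be reached at least that often.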
The small CSA-SC state size also indicates a potential vulnerability to time-memory-data (TMD) tradeoff attacks. These known-plaintext attacks can be used to identify either the internal state of CSA-SC, or the key. These attacks are discussed in greater detail in Section 5.
Exploring the state cycles of FSR-A
In Section 3.1, the state update function of FSR-A is shown to be invertible. Therefore every state of FSR-A has exactly one predecessor state and exactly one successor state. Consequently, the FSR-A state space consists either of a single cycle of length $2^{40}$ or of multiple disjoint cycles, some of which may be short. There can be no overlapping cycles, and there are no cycles with leading paths.
Floyd's algorithm [6] can be used to detect cycles. This simple algorithm (Algorithm FA), when applied to FSR-A of CSA-SC, detects a single cycle: two instances of FSR-A are initialized with the same starting state $s$, one instance is clocked once and the other twice per step, and a cycle is detected when the two states coincide.

A naive memoryless approach is to begin with $s = 0$ and increment $s$ until $s = 2^{40} - 1$. This ensures that all cycles are mapped, but the running time of the process, at $O(2^{80})$, makes it infeasible. A modification to the algorithm, using a time-memory tradeoff, tracks the state values visited (using a one-bit flag per state). During each invocation of Floyd's algorithm, the value $s$ is chosen from the complement of the set of visited states. The running time of this algorithm is $2^{40} \cdot c$, but the storage requirement is $2^{40} \times \log_2 2^{40}$ bits = 2.3 terabytes. In practice, this version of the algorithm must be implemented using hard disks, which have high latency, so the constant $c$ becomes quite large. The storage requirements can be reduced by tracking and storing only "distinguished points". For example, states with an 8-bit prefix consisting of 0 bits may be considered "distinguished". This reduces disk usage by a factor of $2^8$. However, the algorithm will not detect any cycles that traverse only non-distinguished points. Since Weinmann and Wirt [8] claim the existence of small cycles, we do not take this approach.
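As a point of reference, the textbook form of Floyd's cycle-finding algorithm is sketched below on a hypothetical 10-bit toy register (not FSR-A, whose state is 40 bits). This is the generic algorithm, not necessarily the exact Algorithm FA: two instances start from `s`, one is clocked once and the other twice per step, and once they meet the cycle length is measured by walking one instance around the cycle.

```python
N = 10  # hypothetical toy register size in bits
MASK = (1 << N) - 1


def clock(x):
    """Hypothetical invertible NLFSR: bits shift up, bit 0 gets bit 9 XOR a nonlinear g of bits 1..8."""
    b = [(x >> i) & 1 for i in range(N)]
    g = (b[1] & b[2]) ^ (b[3] & b[4] & b[5]) ^ b[6] ^ (b[7] | b[8])
    return ((x << 1) & MASK) | (b[9] ^ g)


def floyd_cycle_length(s):
    """Floyd's algorithm from state s; returns the length of the cycle that is reached."""
    tortoise = clock(s)          # instance clocked once per step
    hare = clock(clock(s))       # instance clocked twice per step
    while tortoise != hare:
        tortoise = clock(tortoise)
        hare = clock(clock(hare))
    # The instances met on the cycle; walk once around it to measure its length.
    length = 1
    probe = clock(tortoise)
    while probe != tortoise:
        probe = clock(probe)
        length += 1
    return length


if __name__ == "__main__":
    print(floyd_cycle_length(0), floyd_cycle_length(1))
```

Floyd's algorithm needs only two state registers, which is why it suits a memoryless search; its cost is the repeated walking that the following paragraphs try to reduce.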
A possible compromise is to use the first approach, in which at iteration i, s i = i, but with a slight modification to include early stopping criteria. If Floyd's algorithm traverses over state j < i at iteration i, then the cycle has been visited previously and the algorithm aborts early. Similarly, if the cycle length l is larger than the state space not so far searched, then the algorithm has rediscovered a cycle and aborts.
Given that we know there are no leading paths, the algorithm can be optimized by using a single instance of FSR-A, with the stopping condition that a cycle has been found when the starting point is reached for the second time. The resulting cycles are listed in Table 1. They were generated using several Intel Core Duo machines; each core is capable of iterating through $2^{38}$ states per day. The identification of nine large cycles, together with the observation that the update function is invertible, provides strong evidence against the claim of Weinmann and Wirt [8] that 98% of the state space of FSR-A can be partitioned into very short cycles.
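The single-instance search with the early-stopping rules described above can be sketched as follows, again on a hypothetical 10-bit toy register rather than the 40-bit FSR-A. Starting points are taken in order, $s_i = i$; a walk is abandoned as soon as it reaches a state $j < i$, because that cycle was already recorded when its smallest state was the starting point. The outer loop stops once the recorded cycle lengths cover the whole state space, a simple stopping rule in the same spirit as the second criterion. No per-state flags are stored.

```python
N = 10  # hypothetical toy register size in bits
MASK = (1 << N) - 1


def clock(x):
    """Hypothetical invertible NLFSR (toy construction, not FSR-A)."""
    b = [(x >> i) & 1 for i in range(N)]
    g = (b[1] & b[2]) ^ (b[3] & b[4] & b[5]) ^ b[6] ^ (b[7] | b[8])
    return ((x << 1) & MASK) | (b[9] ^ g)


def enumerate_cycles():
    """Return {smallest state on the cycle: cycle length} for every cycle."""
    cycles = {}
    covered = 0
    for i in range(2 ** N):
        if covered == 2 ** N:        # every state accounted for: stop early
            break
        x, length = clock(i), 1      # a single instance started at s_i = i
        while x != i:
            if x < i:                # cycle already found from a smaller start
                break
            x, length = clock(x), length + 1
        else:                        # returned to i: a new cycle whose minimum is i
            cycles[i] = length
            covered += length
    return cycles


if __name__ == "__main__":
    cycles = enumerate_cycles()
    print(len(cycles), sorted(cycles.values(), reverse=True))
```

Each cycle is recorded exactly once, from its smallest state, and its length is counted by walking it a single time.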
Cryptanalysis of CSA-SC
Observations made in Section 3.2 indicate CSA-SC may be vulnerable to two common styles of attack. The dependence of other components on the autonomous 40-bit FSR-A for nonlinearity indicates the potential for divide and conquer style attacks which target FSR-A. The ratio of the state size to the key size indicates a vulnerability to a generic style of attack known as the Time-Memory Tradeoff (TMTO) attack. The attack by Weinmann and Wirt [8] targets FSR-A, and is reviewed in Section 5.1. The application of TMTO attacks to CSA-SC is discussed in Section 5.2.
The Weinmann and Wirt Attack
A guess-and-determine attack on CSA-SC which targets FSR-A is presented by Weinmann and Wirt [8]. The aim of the attack is to recover the internal state of the cipher during keystream generation, when FSR-A is autonomous. The attack complexity is claimed to be less than $2^{45}$. We explain the flaw in this attack and show that, when the flaw is corrected, the attack performance is actually worse than exhaustive key search.
The attack is performed in three phases. In the first phase, the attacker guesses 53 bits of state comprising FSR-A, FSM-C and the 4-bit memory X used by Weinmann and Wirt [8] for their 103-bit CSA-SC representation. Because each output bit of F C is a linear combination of bits within FSR-B, and these output bits are linearly combined to form the keystream, a system of equations can be formed relating the keystream bits to the unknown contents of FSR-B. The second phase of the attack solves this system of equations using Gaussian elimination. The third phase of the attack comprises consistency checking to establish the veracity of guesses made in the first phase. If the consistency checking fails, the guess in the first phase is considered incorrect and a new guess is made. Otherwise, the attack terminates and the combination of the 53-bit guess and the solution to the equation system is used to recover the initial internal state.
The cost of the first phase is $2^{53}$ operations. In the second phase, a system of 60 equations in 40 unknowns is formed, which can be solved using Strassen's algorithm in $2^{17.7}$ operations. The cost of the third phase is negligible. The total complexity of the state recovery attack is therefore around $2^{53} \times 2^{17.7} = 2^{70.7}$ operations, which is about one hundred times worse than a brute-force search over the 64-bit key.
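The complexity figures can be reproduced with log2 arithmetic. The sketch below takes the $2^{17.7}$ cost per linear-system solve from the text and compares the total against the $2^{64}$ cost of exhaustive key search; `attack_log2` is a hypothetical helper name introduced only for illustration.

```python
BRUTE_FORCE_LOG2 = 64  # exhaustive search over the 64-bit key
GAUSS_LOG2 = 17.7      # cost of solving the 60 x 40 linear system for one guess


def attack_log2(guess_bits):
    """log2 of total cost: 2^guess_bits guesses, each followed by one 2^17.7 solve."""
    return guess_bits + GAUSS_LOG2


total = attack_log2(53)
print(round(total, 1), round(2 ** (total - BRUTE_FORCE_LOG2)))  # 70.7 104
```

Dropping the redundant 4-bit memory X from the guess (49 bits instead of 53) gives `attack_log2(49)`, that is, about $2^{66.7}$ operations.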
Weinmann and Wirt [8] claimed to have identified numerous short cycles produced by the FSR-A feedback function. They performed 10,000 random initializations of the cipher, and found that in 98.4% of cases those key-IV pairs led to FSR-A state cycles with lengths between 108 and 121,992. This led to the assumption that, with probability 98.4%, the effective state space of FSR-A for any key-IV pair equals the sum of the lengths of those short cycles. An attacker does not know the key, so must guess FSR-A states from all of the points on all of the short cycles. Ignoring leading paths, this gives a total of 313,169 possibilities. Weinmann and Wirt [8] therefore claim that the cost of the first phase is reduced to $2^{19} \times 2^{9}$, where the second term is the cost of guessing the memories and registers in FSM-C.
The optimisation of the guessing phase is necessary for the divide and conquer attack to be faster than exhaustive search. However, the basis for this optimisation is unfounded: the theoretical observation in Section 3.1 shows that no leading paths exist, because the FSR-A feedback function is invertible, and the empirical results in Section 4 show that the FSR-A state cycles form a small number of large disjoint cycles. Thus the performance of Weinmann and Wirt's attack is worse than exhaustive key search, unless further optimisations are identified.
Time-Memory tradeoff attacks
As the TMTO approach is well known, it is not described here. Instead, we refer the reader to the work of Hong and Sarkar [7] for a description of the phases of TMTO attacks. Our analysis aims to determine the feasibility of applying TMTO attacks to CSA-SC given the constraints on the amount of keystream available to an attacker. The specification for Digital Video Broadcasting indicates that keystream is propagated at a maximum rate of 64 Mb/s, and that rekeying occurs at least every 120 seconds. Therefore, for a single key-IV pair, assume that an attacker has access to about $D = 2^{33}$ bits of data. We consider possible tradeoffs for two styles of TMTO.
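The data bound follows from the rate and the rekeying interval. The sketch assumes decimal megabits ($64 \times 10^6$ bits per second); using binary megabits changes the result only slightly.

```python
import math

RATE_BITS_PER_S = 64 * 10**6  # maximum keystream rate, 64 Mb/s (decimal megabits assumed)
REKEY_INTERVAL_S = 120        # rekeying occurs at least every 120 seconds

bits_per_key = RATE_BITS_PER_S * REKEY_INTERVAL_S
print(bits_per_key, round(math.log2(bits_per_key), 2))  # 7680000000 32.84
```

This is just under $2^{33}$ bits of keystream per key-IV pair.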
The first style of TMTO attack is one in which the attacker attempts to invert the CSA-SC keystream to recover the internal state. For this style of attack, the attacker must satisfy the time-memory-data tradeoff curve

$$TM^2D^2 = N^2, \qquad D^2 \le T,$$

where $N = 2^{89}$ is the total number of states, $T$ is the time taken to execute the attack, and $M$ is the amount of memory required to store precomputed tables. The value $2^K = 2^{64}$ represents the computational effort required to launch a brute force attack. The attacker is unable to make use of all available data, since with $D = 2^{33}$ the condition $T \ge D^2$ implies that $T \ge 2^{66} > 2^K$. Reasonable parameters are therefore $D = 2^{25}$, $T = 2^{50}$ and $M = 2^{39}$, for which the attacker requires around 6 terabytes of disk space. This is feasible by today's standards.
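The quoted parameters can be checked against the curve in log2 form. The sketch assumes each precomputed table entry holds one 89-bit state, an illustrative storage model that the text does not specify; under that assumption $M = 2^{39}$ entries occupy roughly 6 TB. The helper `on_curve` is a hypothetical name.

```python
N_LOG2 = 89  # CSA-SC internal state size in bits
K_LOG2 = 64  # key size in bits


def on_curve(t, m, d, n=N_LOG2):
    """Check T * M^2 * D^2 = N^2 and D^2 <= T, with all arguments given as log2 values."""
    return t + 2 * m + 2 * d == 2 * n and 2 * d <= t


T, M, D = 50, 39, 25
entry_bytes = 89 / 8                 # assumption: one 89-bit state per table entry
table_tb = 2 ** M * entry_bytes / 1e12

print(on_curve(T, M, D), T < K_LOG2, round(table_tb, 1))  # True True 6.1
```

Using all $D = 2^{33}$ bits would force $T \ge 2^{66}$, above the $2^{64}$ brute-force bound, which is why a smaller $D$ is chosen.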
If the key initialization function were invertible, then the recovered internal state could be used to derive the key. However, this does not seem to be the case. This attack is therefore of limited use, since frequent key-IV rekeying is mandatory within the DVB specification, and the attack must be repeated on the keystream segment obtained after each rekeying.
The second style of TMTO attack attempts to invert the CSA-SC keystream to recover the key. For this style of attack, the state size is immaterial. The same time-memory-data tradeoff curve holds, but $N$ now refers to the number of possible key-IV values, rather than the number of possible internal states. Here, $N = 2^{64+64} = 2^{128}$. Taking the Dunkelman-Keller approach [4], the attacker prepares different tables for many IVs in the precomputation phase. This removes the restriction that $T \ge D^2$, at the expense of reducing the success rate of the attack if the right IVs are not used. Due to the larger size of $N$, the computational complexity of this attack is higher than that of the first, with one possible parameter set being $D = 2^{48.5}$, $T = 2^{53}$, $M = 2^{53}$. However, as this is key recovery rather than state recovery, the stipulation for frequent rekeying does not present a limitation. Therefore, any proposed key recovery attack on CSA-SC with time complexity greater than $T = 2^{53}$ should be regarded as uncompetitive unless its data and memory requirements are much lower than those given for this TMD attack.
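For the key-recovery variant, only the curve itself needs checking, since the multiple-IV tables remove the $D^2 \le T$ restriction. A minimal log2 check of the quoted parameters, with `on_curve` again a hypothetical helper:

```python
N_LOG2 = 64 + 64  # key bits plus IV bits


def on_curve(t, m, d, n=N_LOG2):
    """Check T * M^2 * D^2 = N^2 with all values as log2 (no D^2 <= T constraint here)."""
    return abs(t + 2 * m + 2 * d - 2 * n) < 1e-9


print(on_curve(53, 53, 48.5))  # True
```

Note that $D = 2^{48.5}$ exceeds the roughly $2^{33}$ bits available per key-IV pair, so the data must come from keystream segments under multiple IVs.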
Discussion and Conclusion
In this paper we provide a new representation of CSA-SC that uses only 89 state bits, a significant reduction over the 107 bits and 103 bits used for previous representations. Theoretical observations about the state update functions of components of CSA-SC contradict the empirical results presented in previous research [8] , motivating an exploration of the state cycles for FSR-A.
Our FSR-A state-cycle findings raise doubts about the validity of the optimisation required for the divide and conquer attack presented by Weinmann and Wirt [8] to succeed. It appears that the complexity of that state recovery attack is not around $2^{45}$, as claimed, but in fact worse than exhaustive key search. Even applying their approach to our representation of CSA-SC, where the state size is reduced because the memory X is redundant, results in an overall attack complexity of about $2^{66.7}$ operations. This is about thirteen times the expected $2^{63}$ trials of a brute-force key search, and several orders of magnitude worse than the TMTO attacks, although its memory requirement is lower than that of the TMTO attacks.
The reduction in the state space obtained in our model indicates that CSA-SC is vulnerable to TMTO attacks. Given a keystream segment produced from a single key-IV pair, a state recovery attack is possible with data, time and memory parameters of $D = 2^{25}$, $T = 2^{50}$ and $M = 2^{39}$, respectively. Additionally, for an increased data value obtained by taking keystream segments formed from a single key but possibly multiple known IVs, a key recovery attack is possible, with data, time and memory requirements of $D = 2^{48.5}$, $T = 2^{53}$ and $M = 2^{53}$, respectively. This application of a generic attack style to CSA-SC shows that CSA-SC is vulnerable to cryptanalytic attack.
The attacks discussed in this paper made no use of the CSA-SC initialisation process. Exploring the initialisation process may reveal weaknesses that lead to improved attacks on this cipher. Additionally, it may be possible to improve the performance of divide and conquer attacks targeting FSR-A by reducing the cost of determining whether guessed FSR-A and FSM-C states are correct. This could be accomplished by using a distinguisher, and only proceeding to solve the system of equations to recover the contents of FSR-B for a correct guess.
We examined the cipher with respect to differential and linear attacks, and to guess-and-determine attacks. Even though the s-boxes are far from optimal with respect to differential and linear attacks, the fact that in each clock cycle half of the bits in FSR-A are passed through s-boxes makes it difficult to utilize the s-box biases; i.e., the diffusion in the register is good. Likewise, this means that a large number of bits must be guessed in order to determine a single nibble in the register, and a straightforward guess-and-determine approach is ineffective in reducing the complexity of an attack below $2^{53}$ operations, even considering the reduced effective state size.
Although there are generic attacks that apply to CSA-SC, it appears that the attack strategy of Weinmann and Wirt [8] does not succeed in key recovery with better complexity than brute force.

x0x2x3 + x0x2x3x4 + x0x1 + x0x1x4 + x0x1x3 + x0x1x3x4
10: x3 + x2x4 + x1x3x4 + x1x2 + x1x2x4 + x0 + x0x3x4 + x0x1x4
11: x4 + x2 + x2x3 + x2x3x4 + x1x3 + x1x2 + x1x2x3 + x0x3x4 + x0x2x3 + x0x2x3x4 + x0x1x3x4 + x0x1x2x3
12: x4 + x3 + x3x4 + x2 + x1 + x1x3x4 + x0x4 + x0x3x4 + x0x2 + x0x2x3 + x0x2x3x4 + x0x1x3x4 + x0x1x2x3
13: x4 + x3x4 + x2 + x2x3 + x2x3x4 + x1 + x1x2 + x0 + x0x1x3 + x0x1x3x4
