Abstract. In this paper, a technique for fault detection in hardware implementations of the PP-1 symmetric block cipher is studied. Simulations of fault propagation in the key scheduling process are reported. The simulations show that both parts of the algorithm, data-path and control, should be protected. Previous studies [1, 2] have only considered the data-path, ignoring the key scheduling.
Keywords: PP-1 cipher, Concurrent Error Detection, round key scheduling, parity bits.
Słowa kluczowe: szyfr PP-1, współbieżne wykrywanie błędów, generowanie kluczy rundowych, bity parzystości.
Introduction
Secure implementations of cryptographic systems have received much attention. Conventional cryptanalysis deals with the mathematical properties of a system, whereas physical cryptanalysis focuses on the physical behaviour of a system during operation. In principle, every implemented cipher is exposed to all kinds of physical attacks. The form each attack takes depends on the actual implementation and on the properties of the cipher.
Differential fault analysis (DFA) is a method of physical cryptanalysis originally proposed by Biham and Shamir in 1997 [3]. It assumes that an attacker can induce faults into a cipher and collect both the correct and the faulty behaviours. The attacker then compares these behaviours in order to retrieve the secret information embedded in the cipher. Fault detection is therefore a desirable property for preventing malicious attacks aimed at extracting sensitive information, such as the secret key, from the device.
There are different types of faults and methods of fault injection in encryption algorithms. Faults can be transient or permanent. Various transient and permanent faults, together with fault injection methods such as varying the supply voltage, the external clock or the temperature, or irradiating the device with white light, laser or X-rays, are discussed in detail in [4].
Concurrent Error Detection (CED) techniques are widely used to enhance system dependability. The proposed solutions use various forms of redundancy to obtain an attack-resistant architecture, and they differ in area overhead, performance penalty and fault detection latency [5, 6, 7]. This paper proposes a possible countermeasure against fault attacks, mainly a parity-check method that verifies the correctness of the round key. The proposed method does not require modification of the PP-1 algorithm. We build on the model presented in [1, 5] and extend the fault analysis to the Key Schedule unit. We show that the Key Schedule unit has a highly dispersive behaviour that allows an error to propagate quickly, but this does not compromise the detection rate of the parity code.
We provide simulation results related to the fault coverage of the proposed approach. The paper is organized as follows. Sections 2 and 3 present the processing path and the key scheduling module of the PP-1 block cipher, respectively. Section 4 shows error propagation in the key scheduling module. Possible faults and fault models are described in Section 5. Section 6 presents the CED schemes. Simulation results are given in Section 7, and Section 8 concludes the paper.
Processing path of PP-1 cipher
The scalable PP-1 cipher is a symmetric block cipher designed at the Institute of Control and Information Engineering, Poznań University of Technology. It was designed for platforms with limited resources and can be implemented, for example, in simple smart cards.
The PP-1 algorithm is an SP-network. It processes n-bit data blocks in r rounds, using cipher keys of length n or 2n bits, where $n = 64t$ and $t = 1, 2, 3, \ldots$. One round of the algorithm is presented in Fig. 1. It consists of $t = n/64$ parallel processing paths. In each path, the 64-bit nonlinear operation NL is performed. Additionally, the n-bit permutation P is used; in the last round, P is not performed. The algorithm is described in detail in [6]. Two n-bit round keys $k_i' = k_{2i-1}$ and $k_i'' = k_{2i}$ are used in round i, where $i = 1, 2, \ldots, r$. Let j denote the index of the parallel processing path, counted from left to right, $j = 1, 2, \ldots, t$.
The 64-bit round subkeys $k_{i,j}'$ and $k_{i,j}''$ used in element NL #j consist of eight 8-bit elementary keys.
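To make the key layout concrete, the following is a minimal Python sketch of how an n-bit round key could be divided among the t processing paths and then into elementary keys. The bit ordering is an assumption, not taken from the PP-1 specification: NL #1 is assumed to receive the most significant 64 bits, and the elementary keys are listed most significant byte first. The helper name `split_round_key` is hypothetical.

```python
def split_round_key(key: int, n: int) -> list[list[int]]:
    """Split an n-bit round key into t = n // 64 subkeys (NL #1..#t, left to right),
    each given as eight 8-bit elementary keys.

    ASSUMPTION: subkey j takes the j-th 64-bit slice counted from the most
    significant end, and elementary keys are ordered most significant byte first.
    """
    if n <= 0 or n % 64 != 0:
        raise ValueError("n must be a positive multiple of 64")
    if key < 0 or key >= 1 << n:
        raise ValueError("key does not fit in n bits")
    t = n // 64
    subkeys = []
    for j in range(1, t + 1):
        shift = n - 64 * j                       # position of the j-th 64-bit slice
        sub = (key >> shift) & 0xFFFF_FFFF_FFFF_FFFF
        elementary = [(sub >> (56 - 8 * m)) & 0xFF for m in range(8)]
        subkeys.append(elementary)
    return subkeys
```

For example, with n = 128 and a key whose bytes are 0x00, 0x01, ..., 0x0F (big-endian), the sketch yields two subkeys, [0, ..., 7] for NL #1 and [8, ..., 15] for NL #2.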
The same algorithm is used for encryption and decryption. However, if the round keys $k_1, k_2, \ldots, k_{2r}$ are used in the encryption process, then in the decryption process these keys must be used in the reverse order, i.e. $k_{2r}, k_{2r-1}, \ldots, k_1$.
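The pairing $k_i' = k_{2i-1}$, $k_i'' = k_{2i}$ combined with the reversed key order for decryption can be sketched in a few lines of Python. Round keys are treated here as opaque integers, and the function name `round_key_pairs` is hypothetical.

```python
def round_key_pairs(keys: list[int], decrypt: bool = False) -> list[tuple[int, int]]:
    """Return the pair (k_i', k_i'') for rounds i = 1..r from [k_1, ..., k_2r].

    Encryption uses k_i' = k_{2i-1} and k_i'' = k_{2i}. Decryption runs the same
    algorithm with the key sequence reversed: k_2r, k_2r-1, ..., k_1.
    """
    if len(keys) % 2 != 0:
        raise ValueError("expected an even number (2r) of round keys")
    seq = list(reversed(keys)) if decrypt else list(keys)
    # seq[2i] and seq[2i+1] (0-based) are the two keys consumed by round i+1
    return [(seq[2 * i], seq[2 * i + 1]) for i in range(len(seq) // 2)]
```

With r = 2 and keys [1, 2, 3, 4], encryption uses (1, 2) and (3, 4), while decryption uses (4, 3) and then (2, 1).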
Round key scheduling
The round key scheduling is performed in 2r+1 iterations ($i = 0, 1, \ldots, 2r$), where r is the number of rounds. One iteration of key scheduling is presented in Fig. 2. The round keys $k_1, k_2, \ldots, k_{2r}$ are produced at the outputs of iterations #1 to #2r [6]. The KS element of the iteration is shown in Fig. 3. It is composed of the substitution S, XOR, and addition and subtraction modulo 256. The operation $RR(e_i)$ is the rotation of the n-bit block $V_i$ by $e_i$ bits to the right. The 4-bit integer $e_i$ is obtained by XORing two 4-bit arguments, namely the 4 most significant bits of the outputs of the two leftmost S-boxes. Thus, for $V_i = v_1 v_2 \ldots v_n$, where $v_1$ is the most significant bit, the value of $e_i$ is calculated as follows:
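Because the rotation amount is data-dependent, a small Python sketch of $e_i$ and $RR(e_i)$ may help. It rests on an assumption not stated explicitly above: the outputs of the two leftmost S-boxes are taken to occupy bits $v_1 \ldots v_8$ and $v_9 \ldots v_{16}$ of $V_i$, so $e_i$ is the XOR of $v_1 v_2 v_3 v_4$ and $v_9 v_{10} v_{11} v_{12}$. The function names are hypothetical.

```python
def compute_e(v: int, n: int) -> int:
    """4-bit rotation amount e_i.

    ASSUMPTION: the two leftmost S-box outputs are bits v1..v8 and v9..v16 of V_i
    (v1 = most significant bit), so e_i = (v1 v2 v3 v4) XOR (v9 v10 v11 v12).
    """
    hi1 = (v >> (n - 4)) & 0xF    # v1..v4
    hi2 = (v >> (n - 12)) & 0xF   # v9..v12
    return hi1 ^ hi2


def rotate_right(v: int, e: int, n: int) -> int:
    """RR(e): rotate the n-bit block v by e bits to the right."""
    mask = (1 << n) - 1
    e %= n
    return ((v >> e) | (v << (n - e))) & mask
```

For n = 64 and a block whose two leading bytes are 0xAB and 0xCD, the sketch gives $e_i$ = 0xA XOR 0xC = 6.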
Error propagation in key scheduling
The error propagation behaviour of the data path (i.e., the encryption or decryption process) was studied in [1]. Another part of the algorithm implementation that can be affected by faults is the key schedule. A single faulty bit injected during the round key computation may cause a large number of erroneous bits in the subsequent round keys. An error propagation analysis was carried out to understand the effect of an error injected into the round key computation. In the experiments, single bit-flip errors were injected at randomly chosen bit positions, and the number of erroneous bits was counted. One faulty bit injected at one of the S-box inputs in the first round causes about 52% of the bits in the subsequent rounds to be faulty.
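The measurement described above amounts to flipping one input bit and counting how many output bits differ. A minimal Python harness for that metric is sketched below. The function `toy_mix` is a hypothetical stand-in for some key-schedule iterations and is not PP-1; in a real experiment `f` would be the faulted portion of the key schedule.

```python
import random
from typing import Callable


def propagation_ratio(f: Callable[[int], int], x: int, n: int, bit: int) -> float:
    """Fraction of the n output bits of f that change when input bit `bit` is flipped."""
    diff = f(x) ^ f(x ^ (1 << bit))
    return bin(diff).count("1") / n


def mean_propagation(f: Callable[[int], int], n: int, trials: int, seed: int = 0) -> float:
    """Average propagation ratio over random inputs and random single-bit flips."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        x = rng.getrandbits(n)
        bit = rng.randrange(n)
        total += propagation_ratio(f, x, n, bit)
    return total / trials


def toy_mix(x: int) -> int:
    """HYPOTHETICAL 64-bit mixing function standing in for key-schedule iterations."""
    mask = (1 << 64) - 1
    for _ in range(4):
        x = (x * 0x9E3779B97F4A7C15) & mask
        x ^= x >> 29
    return x
```

Calling `mean_propagation(toy_mix, 64, 1000)` estimates the average share of corrupted bits for the toy function; a strongly dispersive function approaches one half, in line with the roughly 52% reported for the key schedule.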
This analysis helps us in choosing suitable error detection schemes.
Fault models
A fault attack tries to modify the operation of the computing device in order to retrieve the secret key. The attacker induces a fault during cryptographic computations. The feasibility of a fault attack, or at least its efficiency, depends on the exact capabilities of the attacker and the type of faults he can induce.
In our considerations we use a realistic fault model wherein either transient or permanent faults are induced randomly into the device. We consider single and multiple faults.
Faults are modelled as m-bit error vectors.
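The fault types considered here can be expressed as operations between a data word and an m-bit error vector. The Python sketch below assumes the common convention that a 1 in the error vector marks an affected bit; the helper names are hypothetical. A transient fault is applied in a single cycle, a permanent one in every cycle. Note that a stuck-at fault has no visible effect when the affected bit already holds the stuck value.

```python
def apply_fault(x: int, e: int, kind: str) -> int:
    """Apply error vector e (1 = affected bit) to data word x."""
    if kind == "bit_flip":
        return x ^ e          # affected bits are inverted
    if kind == "stuck_at_0":
        return x & ~e         # affected bits forced to 0
    if kind == "stuck_at_1":
        return x | e          # affected bits forced to 1
    raise ValueError(f"unknown fault kind: {kind}")


def faulty_stream(words: list[int], e: int, kind: str,
                  permanent: bool, cycle: int = 0) -> list[int]:
    """Apply the fault to every word (permanent) or only to word `cycle` (transient)."""
    return [apply_fault(w, e, kind) if permanent or t == cycle else w
            for t, w in enumerate(words)]
```

For example, `apply_fault(0b1000, 0b0001, "stuck_at_0")` leaves the word unchanged, so such a fault cannot be observed in that cycle.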
CED
Assume that the data path is fault-free and that the key schedule is affected by a single faulty bit injected at some round. It has been verified that a faulty bit injected in the early rounds causes a large number of erroneous bits. If an erroneous round key is used, the presence of the faulty bit in the key material cannot be detected: the sender is unable to realize that the transmitted encrypted data is corrupted, and the receiver decrypts useless data. Consequently, special attention must be paid to fault management in the round key computation, which uses the same operations as the data processing path.
A proposal for error detection in the data-path of PP-1 was described in [5] . The goal there was to prevent an attacker from breaking the cipher system by injecting one or more incorrect bits.
In this paper we analyse the possibilities of error detection in the key schedule. As mentioned above, it uses the same operations as the data processing path, namely the substitution box S, XOR, and addition and subtraction modulo 256. In addition, the operation $RR(e_i)$ is used, i.e., the rotation of the n-bit block $V_i$ by $e_i$ bits to the right. Each of these operations is protected.
A parity-based method of concurrent error detection in S-boxes was proposed, implemented and tested in [5]. The S-box is usually implemented as a 256×8-bit memory, consisting of a data storage section and an address decoding circuit. To increase dependability and detect input, output and internal memory errors of the S-box, we propose replacing the 256×8-bit memory that stores the S-box values with a 256×10-bit memory. One of the two additional bits is the parity bit generated for the incoming data byte; the other is the parity bit generated for the outgoing data byte (Fig. 6).
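The 256×10-bit organisation can be modelled in software to see which errors each extra bit catches. The Python sketch below is one possible reading of the scheme, since Fig. 6 is not reproduced here. Each word is assumed to store the S-box byte, the parity of its address (the expected input parity) and the parity of the stored byte. A lookup compares the stored input parity with the parity generated upstream for the incoming byte, and recomputes the output parity. The table is a hypothetical placeholder permutation, not the PP-1 S-box, and the class name is hypothetical.

```python
import random


def parity8(x: int) -> int:
    """Even-parity bit of a byte (1 if the number of ones is odd)."""
    return bin(x & 0xFF).count("1") & 1


class ParitySBox:
    """256 x 10-bit memory: 8 data bits + input parity bit + output parity bit."""

    def __init__(self, table: list[int]):
        if len(table) != 256:
            raise ValueError("S-box table must have 256 entries")
        # Word at address a: (S(a), parity of a, parity of S(a))
        self.mem = [(table[a], parity8(a), parity8(table[a])) for a in range(256)]

    def lookup(self, x: int, p_in: int) -> tuple[int, bool]:
        """x: (possibly faulty) address; p_in: parity generated for the incoming byte upstream.
        Returns the data byte and an error flag."""
        data, stored_p_in, stored_p_out = self.mem[x]
        error = stored_p_in != p_in or parity8(data) != stored_p_out
        return data, error


# HYPOTHETICAL placeholder S-box: a fixed pseudo-random permutation of 0..255.
_rng = random.Random(1)
PLACEHOLDER_TABLE = list(range(256))
_rng.shuffle(PLACEHOLDER_TABLE)
```

A single-bit error on the address changes the parity of the address actually read, so its stored input parity no longer matches the upstream parity. A single-bit error in the stored data byte breaks the output parity.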
In our experiments we focused on transient and permanent, single and multiple stuck-at faults and bit-flip faults. Single transient stuck-at-0/1 errors are detected in 50% of cases, whereas single permanent ones are detected in 100% of cases. The detection rate for single bit-flip errors is close to 100% for both permanent and transient errors.
The two other operations, XOR and $RR(e_i)$ (rotation of the n-bit block $V_i$ by $e_i$ bits to the right), are protected using a parity code. Parity bits can detect all single-bit errors and those multiple-bit errors in which the number of erroneous bits is odd. We cannot, however, employ just a single parity bit for fault detection. As shown in Section 4, errors spread quickly throughout the key scheduling block and, on average, about half of the state bits become corrupted. Hence, the fault coverage of a single parity bit would be at best around 50%, which is unacceptable in practice.
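The 50% bound can be checked with a quick Monte Carlo experiment. If each state bit is corrupted independently with probability 1/2, the total number of errors is odd about half the time, so a single parity bit misses roughly half of all errors. With one parity bit per byte, an error is missed only when every byte has an even number of flipped bits, which for 64 bits happens with probability close to $2^{-8}$. The sketch below is a hypothetical illustration of that argument in Python, not the simulation used in this paper.

```python
import random
from typing import Callable


def detected_single_parity(e: int) -> bool:
    """One parity bit over the whole block detects an odd number of flipped bits."""
    return bin(e).count("1") % 2 == 1


def detected_bytewise_parity(e: int, n: int) -> bool:
    """One parity bit per byte detects the error if any byte has an odd number of flips."""
    return any(bin((e >> (8 * b)) & 0xFF).count("1") % 2 == 1 for b in range(n // 8))


def coverage_random_errors(detector: Callable[[int], bool], n: int,
                           trials: int = 20000, seed: int = 0) -> float:
    """Estimate coverage for error vectors where each of the n bits is corrupted
    independently with probability 1/2 (about half of the state bits wrong)."""
    rng = random.Random(seed)
    hits = done = 0
    while done < trials:
        e = rng.getrandbits(n)
        if e == 0:
            continue  # no error occurred, nothing to detect
        done += 1
        hits += detector(e)
    return hits / trials
```

Running `coverage_random_errors(detected_single_parity, 64)` should give a value near 0.5, while `coverage_random_errors(lambda e: detected_bytewise_parity(e, 64), 64)` should come out near $1 - 2^{-8}$.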
To circumvent this problem, we propose associating one parity bit with each input/output byte of the XOR element (Fig. 7).
where A is the data byte and K is the key byte. In this way, each parity bit depends only on a limited portion of the data (8 bits).
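The rotation $RR(e_i)$ is protected with only one parity bit for the input data and one for the output data. Since a rotation does not change the number of ones in the block, the two parity bits must be equal. This detects all single-bit errors and those multiple-bit errors in which the number of erroneous bits is odd.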
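Because rotation only permutes bit positions, the parity of the output must equal the parity of the input. A minimal Python sketch of this check follows. The `fault_mask` parameter is a hypothetical hook for injecting an error into the rotator output, and the function names are hypothetical.

```python
def parity(x: int) -> int:
    """Parity of an arbitrary-width word (1 if the number of ones is odd)."""
    return bin(x).count("1") & 1


def protected_rotate_right(v: int, e: int, n: int, p_in: int,
                           fault_mask: int = 0) -> tuple[int, bool]:
    """RR(e) on an n-bit block with a single input parity bit p_in.

    Rotation preserves the number of ones, so parity(out) must equal p_in.
    Returns the (possibly faulty) result and an error flag.
    """
    mask = (1 << n) - 1
    e %= n
    out = (((v >> e) | (v << (n - e))) & mask) ^ fault_mask
    return out, parity(out) != p_in
```

An error on the input word that occurs after `p_in` was generated, or a single flipped output bit, is flagged. An even number of flipped bits is not.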
A CED method for the two successive operations, addition and subtraction modulo 256, is shown in Fig. 8. For each data byte, the inverse operation is applied. In this case the area overhead exceeds 100%, but all errors are detected.
Fig. 8. CED for addition operation
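Checking with the inverse operation detects every error on the result. Since $y \mapsto (y - K) \bmod 256$ is a bijection, any corrupted sum $y' \neq y$ recovers an operand different from the original A. The Python sketch below illustrates this for one byte. The `fault_mask` parameter is a hypothetical hook for injecting an error into the arithmetic unit's output, the checker itself is assumed fault-free, and the function names are hypothetical.

```python
def checked_add(a: int, k: int, fault_mask: int = 0) -> tuple[int, bool]:
    """(a + k) mod 256, verified by the inverse operation (subtraction mod 256)."""
    y = ((a + k) & 0xFF) ^ fault_mask      # optional injected fault on the result
    error = ((y - k) & 0xFF) != a          # undo the addition and compare with a
    return y, error


def checked_sub(a: int, k: int, fault_mask: int = 0) -> tuple[int, bool]:
    """(a - k) mod 256, verified by the inverse operation (addition mod 256)."""
    y = ((a - k) & 0xFF) ^ fault_mask
    error = ((y + k) & 0xFF) != a
    return y, error
```

For example, `checked_add(200, 100)` returns `(44, False)`, and every nonzero `fault_mask` from 1 to 255 sets the error flag.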
Simulation results
In this section, we provide simulation results related to the fault coverage of the proposed approach for the fault models described in Section 5. The faults were introduced at the inputs and outputs of all operations and into the internal memory of the S-box.
To measure the detection capability, we used the VHDL hardware description language and the Active-HDL simulator from Aldec. The VHDL model of the key scheduling module of the PP-1 cipher was modified to inject the faults. We used a realistic fault model wherein faults are induced randomly into the device at the beginning of the rounds, i.e., faults are not injected between the round operations. In this experiment we focused on transient and permanent, single and multiple stuck-at and bit-flip faults.
We perform a check at the output of each round operation (Figs. 9 and 10) and at the end of every round (Fig. 11). In the first case, we determine the probability of detecting all injected faults; each security module operates independently of the others and detects errors only in its own area. In the second case, we determine the probability of detecting only those faults that changed the round keys. In this case, all single permanent errors are detected. Fig. 11 shows that the percentage of undetected multiple permanent errors is small (less than 0.15%) and decreases approximately exponentially with the number of erroneous bits.
The percentage of undetected transient errors is higher, with a maximum of 1.2%.
Conclusion
Fault attacks are becoming a serious threat to hardware implementations of ciphers, and proper countermeasures must be adopted to foil them. The simulations show that both parts of the algorithm, data-path and control, should be protected. Previous studies [1, 2] have only considered the data-path, ignoring the key scheduling. In this paper we have presented an operation-centered approach to incorporating fault detection into cryptographic device implementations with small hardware overhead. This error detection method can provide useful protection against fault attacks and, more generally, against errors occurring during the encryption process. It provides full coverage of single-bit errors and high coverage of multiple-bit errors, which are the most common in fault attacks. The proposed fault detection method for the key scheduling module requires a limited amount of circuit overhead and does not require modification of the PP-1 algorithm.
