In a hardware implementation of a cryptographic algorithm, secret information can be leaked by deliberately injecting controlled faulty bits, even when the algorithm itself is mathematically secure. The technique is particularly effective against crypto processors embedded in smart cards. In this paper, a few fault-detecting architectures for the RC4 algorithm are designed and implemented on a Virtex-5 (ML505, LX110T) FPGA board. The results indicate that the proposed architectures can handle most faults without loss of throughput, while consuming only marginally more hardware and power.
Introduction
RC4 is a very simple algorithm and is widely used as a stream cipher. Today RC4 is part of many network protocols, e.g. SSL, TLS, WEP and WPA. Many cryptanalytic studies have examined its key weaknesses [1], and many new stream ciphers have followed [2]. RC4 nevertheless remains a popular stream cipher because it executes fast and provides high security. It is believed that a mathematically secure crypto algorithm becomes vulnerable when it is implemented in hardware [3], since secret information can be extracted by introducing faults in a controlled fashion; fault detection techniques therefore become a key issue in hardware implementation. Moreover, the shrinking dimensions of devices induce Single Event Upsets (soft errors), defined as a change of logic state caused by ions or electromagnetic radiation striking the device. Dense devices use more hardware components for faster processing, which in turn increases the internal faults caused by ion beam radiation. This radiation causes state changes in the CLB [4]. Two types of fault tolerant circuits are usually in use: Hardware Based Fault Tolerant (HBFT) circuits [5] [6] [7] and Algorithm Based Fault Tolerant (ABFT) circuits [8] [9] [10] [11]. For HBFT, faults are detected either at the CLB level or at the LUT level.
A Hamming code based fault detecting and correcting scheme is proposed in [12] for stream ciphers such as A5/1 (GSM), E0 (Bluetooth), RC4 (WEP) and W7 on a hardware platform. Faults do not necessarily originate from the system itself. ABFT circuits at the communication level, with specific reference to RC4, are proposed in [13] and [11]. In [13], a sequence number is appended to each cipher character of RC4. In [11], data are stored in a matrix and a 1-byte checksum is added to each row and column of the matrix; for multiple error detection, a Knight Checksum is used. Both methods detect faults only after the cipher text has been generated and thus take additional time. There is also a considerable body of literature on AES fault tolerance schemes [8] [9]. The article [8] tries to identify the sections of the AES algorithm from which errors are most likely to spread. It observes how single-bit and multiple-bit errors spread over the data as the algorithm iterates. A fault is first detected and then located. The authors introduce a parity checker scheme (16 bits) for the input data block (16 bytes), which can detect single-bit errors as well as odd multiple-bit errors. The error detecting efficiency of this scheme is not sufficient for an efficient error-detecting crypto system. For the key scheduling process, [8] proposes an inverse key scheduling module whose error detecting efficiency is appreciable, but whose resource usage is twice that of the original key scheduling module. In [9], three types of fault detection schemes based on cyclic redundancy checks (CRC) are proposed. In this paper, an ABFT scheme is proposed for the RC4 stream cipher, in which additional fault-detecting hardware blocks are designed to detect the maximum number of errors using minimum resources. The proposed scheme can detect faults at the very instant the ciphering is being executed.
The fault blocks and the algorithm block are executed in parallel, so the throughput remains unchanged. When a fault is detected, the system is reset to alert the user. Faults are only detected here, not corrected. In the absence of fault-detecting blocks, the occurrence of faults would change the power and timing parameters, which would give side channel attackers information related to the secret key. The paper is organized as follows: Section 2 gives an overview of the fault detection techniques. The experimental results are summarized in Section 3. The conclusion and references are given in Sections 4 and 5.
Fault detection techniques adopted for RC4
RC4 has two sequential algorithms, namely the KSA (Key Scheduling Algorithm) and the PRGA (Pseudo-Random Generation Algorithm), which are shown in [...]. As both processes have more or less identical operations, the design of the fault tolerance modules is discussed for one process only. The detailed hardware architecture of the core algorithm has been described elsewhere [14]. [...] Figure 2, and the structure of the CRC code is shown in Figure 3. Before the execution of each round, the core algorithm checks the "no fault" signal contributed by the three proposed fault checkers. Each of the three fault checkers feeds no_fault to the algorithm block through an AND gate. Any fault detected by a particular fault module can stop the execution of the algorithm at that clock edge.
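For reference, the two RC4 processes discussed above can be written as the following minimal software sketch. It is the textbook form of the KSA and PRGA, not the dual-edge hardware architecture of [14]; the function names ksa, prga and rc4 are chosen here for illustration, and the statement order is not meant to reproduce the line numbering of the paper's own listing.

```python
def ksa(key):
    """Key Scheduling Algorithm: build the 256-entry S-Box from the key."""
    s = list(range(256))
    j = 0
    for i in range(256):
        # Three-operand addition modulo 256, followed by a swap.
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]
    return s


def prga(sbox, n):
    """Pseudo-Random Generation Algorithm: produce n key stream bytes."""
    s = list(sbox)  # work on a copy so the caller's S-Box is untouched
    i = j = 0
    out = bytearray()
    for _ in range(n):
        i = (i + 1) & 0xFF          # counter 'i'
        j = (j + s[i]) & 0xFF       # addition
        s[i], s[j] = s[j], s[i]     # swap
        out.append(s[(s[i] + s[j]) & 0xFF])  # key stream byte Z
    return bytes(out)


def rc4(key, data):
    """Encrypt or decrypt data by XORing it with the key stream."""
    stream = prga(ksa(key), len(data))
    return bytes(d ^ k for d, k in zip(data, stream))
```

The sketch exposes the three operations that the fault checkers monitor: the S-Box contents, the modulo-256 additions, and the counter i.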
Error detection on S-Box: CRC Checker
To detect faults in the S-Box array, the standard CRC technique with a degree-4 polynomial is used. A low-degree polynomial is chosen to reduce the length of the redundant CRC bits. CRC has been seen to detect single-bit errors, double-bit errors and odd numbers of errors with good efficiency. A dedicated hardware block to execute the CRC algorithm has not been designed, since that would require huge hardware resources and large computation power, and might cause synchronization problems with the main algorithm. Synchronization is a sensitive issue because the main crypto core has a very high throughput based on dual-edge sensitivity. To bypass this, a standard degree-4 divisor, x^4 + x^3 + 1, is chosen and a four-bit residue is computed as the CRC code, which is appended to each S-Box element; each element thus becomes 12-bit data instead of 8-bit, as shown in Figure 3. This new S-Box is stored as two S-Boxes in the CRC hardware block (see Figure 4).
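The construction of the 12-bit S-Box element can be illustrated with a short software sketch. It assumes the usual CRC convention (the 8-bit value is shifted left by four bits and divided by x^4 + x^3 + 1 over GF(2), and the 4-bit remainder is appended); the exact bit ordering used in Figure 3 is not reproduced here, and the names crc4, encode and check are hypothetical.

```python
POLY = 0b11001  # divisor x^4 + x^3 + 1


def _remainder(value12):
    """Remainder of a 12-bit value divided by POLY over GF(2)."""
    for bit in range(11, 3, -1):
        if value12 & (1 << bit):
            # Align the divisor's leading term with this bit and subtract (XOR).
            value12 ^= POLY << (bit - 4)
    return value12  # now fits in 4 bits


def crc4(byte):
    """4-bit CRC residue of one 8-bit S-Box element."""
    return _remainder(byte << 4)


def encode(byte):
    """12-bit stored word: 8 data bits followed by 4 CRC bits."""
    return (byte << 4) | crc4(byte)


def check(word12):
    """True when the stored 12-bit word is consistent with its CRC."""
    return _remainder(word12) == 0
```

Because x^4 + x^3 + 1 has a nonzero constant term and multiplicative order 15, which exceeds the 12-bit word length, every single-bit and every double-bit flip of a stored word makes check return False.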
Error Detection on Addition Checker
There are a few standard error detecting techniques for arithmetic operations [15], of which the parity and residue techniques are the most popular due to their low cost and high error detecting efficiency. The residue technique is based on the modulus operation, which has a large hardware footprint [16] on FPGA-based platforms. For this reason the parity scheme is adopted in this paper for the addition checker. The 8-bit data is split into two 4-bit nibbles to increase the error detecting efficiency, as seen in Table 2. It is now necessary to describe how the parity prediction scheme is set up for the addition operation. For the two 8-bit numbers to be added, denoted add and aug, the nibble parities are p(add lower) = p(add(0) xor add(1) xor add(2) xor add(3)), p(add higher) = p(add(4) xor add(5) xor add(6) xor add(7)), p(aug lower) = p(aug(0) xor aug(1) xor aug(2) xor aug(3)), and p(aug higher) = p(aug(4) xor aug(5) xor aug(6) xor aug(7)).
Prediction for arithmetic addition
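It is well known that the parity of the sum of two natural numbers can be obtained by XORing the parities of both summands and of all carries propagated between any two adjacent bits, plus the possible carry-in into the least significant position. Hence p(sum) = p(add) xor p(aug) xor p(carry), where carry collects the carries propagated between adjacent bits together with the carry-in, and the same relation holds for each nibble when only the carries entering the bit positions of that nibble are counted.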
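The parity prediction rule stated above can be sketched in software as follows. This is a behavioural illustration under stated assumptions, not the hardware of the addition checker: the carries are recomputed by a ripple chain independent of the observed sum, the sum is taken modulo 256, and the function names are hypothetical.

```python
def parity(x):
    """XOR of all bits of x."""
    return bin(x).count("1") & 1


def carries_into_bits(a, b, carry_in=0):
    """8-bit mask whose bit k is the carry entering bit position k of a + b."""
    mask = 0
    c = carry_in
    for k in range(8):
        mask |= c << k
        ak = (a >> k) & 1
        bk = (b >> k) & 1
        c = (ak & bk) | (ak & c) | (bk & c)  # carry out of position k
    return mask


def predicted_nibble_parities(a, b, carry_in=0):
    """Predicted (lower, higher) nibble parities of the 8-bit sum."""
    c = carries_into_bits(a, b, carry_in)
    low = parity(a & 0x0F) ^ parity(b & 0x0F) ^ parity(c & 0x0F)
    high = parity(a >> 4) ^ parity(b >> 4) ^ parity(c >> 4)
    return low, high


def addition_ok(a, b, observed_sum, carry_in=0):
    """Compare the predicted nibble parities with those of the observed sum."""
    observed = (parity(observed_sum & 0x0F), parity((observed_sum >> 4) & 0x0F))
    return observed == predicted_nibble_parities(a, b, carry_in)
```

For example, addition_ok(a, b, (a + b) & 0xFF) holds for every pair of bytes, while a single flipped bit in the observed sum is always flagged. Two flipped bits inside the same nibble cancel in the parity and escape detection, which is the inherent limit of a parity check and the reason for splitting the byte into nibbles.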
Hardware strategy of addition checker
In the KSA process, the inputs of the addition checker, namely j, S[i] and K[i], and their summation result are passed to the addition checker block. Using the parity prediction technique, the addition checker fault block can check whether the summation is right or wrong. The efficiency is about 75%, as shown in Table 3. The same addition checker module is used for the Z computation in lines 5 and 7 of the PRGA process.
Error Detection on i counter
Several techniques [17] have already been developed to improve the reliability of binary counters. A completely new technique is proposed here that consumes very few resources and exhibits very high error detecting efficiency. An interesting pattern can be observed in binary counting. If the parity of the even bit positions is computed, the parities of the first 4 data values are the complements of the parities of the next 4 data values. Similarly, if the parity of the odd bit positions is computed, the parity of the first 4 data values will also be [...] and feed a decision on whether the counting is right or wrong, based on the pattern prediction described in Table 4.
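The even-position part of this pattern can be checked with a short sketch. It assumes that bit positions are numbered from 0 at the least significant bit, so that the even positions are bits 0, 2, 4 and 6, and that each window of 8 counter values starts at a multiple of 8. Only the even-position rule is modelled; the full pattern is the one given in Table 4. The names even_parity and window_ok are hypothetical.

```python
def even_parity(value):
    """Parity of the bits at even positions (0, 2, 4, 6) of an 8-bit value."""
    return bin(value & 0x55).count("1") & 1


def window_ok(window):
    """Check the even-position pattern on 8 consecutive counter values.

    The parities of the first four values must be the complements of the
    parities of the next four values, position by position.
    """
    if len(window) != 8:
        raise ValueError("expected exactly 8 counter values")
    first = [even_parity(v) for v in window[:4]]
    second = [even_parity(v) for v in window[4:]]
    return all(p != q for p, q in zip(first, second))
```

For a correct count such as 0 to 7, the even-position parities are 0, 1, 0, 1 followed by 1, 0, 1, 0, so window_ok returns True. A flip of bit 0 or bit 2 in any buffered value breaks the complement relation and is flagged, whereas a flip in an odd position is invisible to this half of the check.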
Hardware overview of counter checker
The main RC4 algorithm core increments 'i' in every clock cycle. Every 8 data values are buffered into an array in the counter checker fault block. The fault block separates the 8 values into two groups of 4. The fault checking algorithm checks the pattern given in Table 4 after every 8 clock cycles and decides whether a fault has occurred; this decision is fed to the main algorithm block. The error detecting [...]
Results and discussion
The individual fault blocks have been implemented on a Xilinx Virtex-5 FPGA board. The resource consumption of the three fault blocks is low compared to the main architecture. Of the proposed fault blocks, the counter checker and addition checker sub-blocks take very few resources compared to the main architecture (0.09% and 0.31%), while the main CRC checker sub-block (50%) is resource hungry and takes considerably more. In addition, the three blocks use 45%, 0.2% and 0.26% of the LUTs relative to the main architecture. The detailed estimate of resource usage is given in Table 7.
The Xilinx XPower tool is used to measure system power consumption [18]. The three fault blocks, namely the CRC checker, the counter checker and the addition checker, consume 7%, 1.2% and 4.3% of the power of the main architecture. The resource utilization in Table 7 and the power consumption in Table 6 show that the additional fault blocks have low resource utilization and low power consumption, which is the desired goal for this kind of fault detection application on an FPGA-based platform. In an earlier paper [14], the RC4 algorithm was implemented in hardware on a Virtex-5 FPGA with an execution speed of approximately 1 byte per clock. This was achieved by carrying out the addition (line 5 of the PRGA process) on the rising edge of a clock pulse and the swapping and key stream generation (lines 6 and 7 of the PRGA process) on the falling edge of the same clock pulse, with a loss of one initial clock pulse. The timing diagrams of the three proposed fault modules with respect to the main algorithm clock are shown in Figure 5. At the falling edge of every 8th consecutive clock, 'i' is checked; the addition checker executes its task at every rising edge and the CRC checker at every falling edge. It is evident that the fault modules are designed in hardware such that the throughput of the main RC4 algorithm remains unchanged.
Conclusion
In this paper, three low-cost fault blocks are designed for RC4 and implemented on an FPGA. They operate concurrently with the main algorithm, consume little power and few resources, and provide run-time fault detection without affecting throughput. On detection of even one fault, the algorithm ceases execution. Had the main algorithm and the fault blocks been executed sequentially, the throughput would have been reduced and an attacker would have been able to obtain the secrets of the algorithm by observing power and timing parameters.
References

