Abstract-In cryptographic engineering, extensive attention has been devoted to improving the performance and security of cryptographic algorithms. Nonetheless, approaches for increasing the reliability of the efficient hash functions ECHO and Fugue have not been presented to date. We propose efficient fault detection schemes by presenting closed formulations for the predicted signatures of different transformations in these algorithms. These signatures are derived to achieve low overhead for the specific transformations and can be tailored to include byte/word-wide predicted signatures. Through simulations, we show that the proposed fault detection schemes are highly capable of detecting natural hardware failures and of reducing the effectiveness of malicious fault attacks. The proposed reliable hardware architectures are implemented on the application-specific integrated circuit (ASIC) platform using a 65-nm standard technology to benchmark their hardware and timing characteristics. The results of our simulations and implementations show very high error coverage with acceptable overhead for the proposed schemes.
I. INTRODUCTION
Cryptographic hash functions take arbitrary-length inputs and generate fixed-length outputs. The output of a hash function is then utilized to provide authentication and integrity for the transferred data. In this paper, due to the efficiency of the algorithms ECHO [1] and Fugue [2] (which has been improved to Fugue 2.0), and the fact that these are inspired by the widely utilized Advanced Encryption Standard (AES), we present their respective fault detection schemes. These AES-inspired hash functions (which have been part of the NIST competition) have received much attention in the literature. For instance, in [3] and [4], differential and side-channel analysis attacks for ECHO are presented. Moreover, much effort has been put into developing high-performance and efficient hardware implementations of these algorithms, see, for instance, [5], [6], and [7]. As discussed in [8], one important feature of these hash functions is that some resources can be shared between the AES and these hash algorithms. Thus, low-complexity implementations are achieved.
Fault attacks pose serious threats to the implementations of crypto-algorithms. Therefore, many fault detection schemes have been proposed to date for cryptographic and arithmetic entities, see, for instance, [9], [10], [11], [12], [13], [14], [15], [16], [17], and [18]. Nonetheless, to the best of our knowledge, schemes for increasing the reliability of these algorithms have not been presented in the open literature. Effective fault detection schemes with minimal overhead on these algorithms are essential for achieving reliable hardware architectures.
The summary of our contributions is presented in the following.
• We have obtained new formulations for the predicted signatures of different transformations for the hash algorithms ECHO [1] and Fugue [2]. The presented closed formulations are used for proposing high-performance and effective fault detection schemes.
• Our simulation results show high fault detection capability for the proposed schemes for all the algorithms. This makes the proposed architectures reliable in practice.
• We have used ASIC implementations to benchmark the hardware and timing characteristics of the proposed schemes. The high efficiency of the proposed schemes makes the proposed architectures suitable for high-performance applications.
II. PRELIMINARIES
ECHO (presented by Benadjila et al.) [1] supports hash outputs of any length from 128 to 512 bits and takes a message and a salt as input. Although any output length in this range is supported, the four output lengths for the NIST competition were 224, 256, 384, and 512 bits. In what follows, we explain the hash function Fugue (presented by IBM) [2]. Fugue-256 generates a 256-bit output $H$ for the message $M$, which is split into 32-bit blocks $m_i$, $1 \le i \le t$. The chaining value of Fugue-256 (denoted by $h$) is also split into 32-bit blocks denoted by $S_i$, $0 \le i \le 29$. The following transformation sequence is used for updating $h$ from $m_i$: TIX, ROR3, CMIX, SMIX, ROR3, CMIX, and SMIX (called one round $R$). The sequence ROR3, CMIX, SMIX is called a sub-round. Therefore, a round $R$ consists of the TIX transformation followed by two sub-rounds [2]. More details are presented throughout the paper as needed.
III. THE PROPOSED FAULT DIAGNOSIS APPROACHES
In what follows, for each of the algorithms presented in this paper, we propose respective fault detection schemes.
A. ECHO
An overview of the ECHO algorithm for $128 \le H_{size} \le 256$, including the $\mathrm{Compress}_{512}$ functions, is presented in Fig. 1. As seen in Fig. 1, each of the $t$ $\mathrm{Compress}_{512}$ functions gets the 128-bit salt, a 4×4 state of 128-bit entries, and the counter $C_i$, $1 \le i \le t$ (used to count the number of message bits being hashed). The first column of the state consists of four 128-bit values which construct the chaining variable of the previous $\mathrm{Compress}_{512}$, i.e.,
The other three columns include the 128-bit blocks of the input message. Therefore, in total, there are 12 × t 128-bit message blocks to be processed to give the output (see Fig. 1 ).
As seen in Fig. 1, each $\mathrm{Compress}_{512}$ consists of four different transformations, i.e., BIG.SubWords, BIG.ShiftRows, BIG.MixColumns, and BIG.Final. Each BIG.SubWords contains two AES rounds. The first transformation, SubBytes, which includes 16 S-boxes, is the only nonlinear AES transformation. In the AES S-box, the irreducible polynomial $x^8 + x^4 + x^3 + x + 1$ is used to construct the binary field $GF(2^8)$. Let $X \in GF(2^8)$ and $Y \in GF(2^8)$ be the 8-bit input and output of each S-box, respectively. Then, the S-box consists of a multiplicative inversion, i.e., $X^{-1} \in GF(2^8)$, followed by an affine transformation to obtain $Y \in GF(2^8)$. Look-up tables (LUTs) and composite fields are used to implement the S-boxes; polynomial basis, normal basis, mixed basis, and redundant basis are among the approaches for the low-area composite field variant [19], [20], [21], [22]. In general, with composite field realizations, a transformation matrix first transforms a field element in the binary field $GF(2^8)$ to the corresponding representation in the composite field $GF(((2^2)^2)^2)$. Then, a multiplicative inversion consisting of composite field operations in the sub-field $GF((2^2)^2)$ is performed. Finally, through an inverse transformation matrix, the inverted output is obtained. A number of research works address error detection of the S-boxes; for the sake of brevity, we do not discuss them here.
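As an illustration of the S-box structure described above (inversion in $GF(2^8)$ followed by an affine transformation), the following Python sketch may serve as a software reference model against which a LUT or composite field realization could be checked. It is not the hardware architecture of this paper; the function names are hypothetical, and the inversion is computed by plain exponentiation rather than through composite fields.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two elements of GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:          # reduce as soon as the degree reaches 8
            a ^= 0x11B
    return result


def gf_inv(x: int) -> int:
    """Multiplicative inverse in GF(2^8); zero is mapped to zero as in the AES."""
    if x == 0:
        return 0
    result = 1
    for _ in range(254):       # x^254 = x^(-1), since x^255 = 1 for x != 0
        result = gf_mul(result, x)
    return result


def affine(b: int) -> int:
    """AES affine transformation over GF(2) with the constant 0x63."""
    def rotl(v: int, n: int) -> int:
        return ((v << n) | (v >> (8 - n))) & 0xFF
    return b ^ rotl(b, 1) ^ rotl(b, 2) ^ rotl(b, 3) ^ rotl(b, 4) ^ 0x63


def sbox(x: int) -> int:
    """S-box output Y for the 8-bit input X: inversion, then affine transformation."""
    return affine(gf_inv(x))
```

For example, `sbox(0x00)` evaluates to `0x63` and `sbox(0x53)` to `0xED`, which agrees with the standard AES S-box table.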
The next transformation used in BIG.SubWords of ECHO is ShiftRows, whose fault detection is straightforward and realized by rewiring. Moreover, for the two final linear transformations, i.e., MixColumns and AddRoundKey, the 32-bit error indication flag $E_c = \sum_{r=0}^{3} (in_{r,c} + k_{r,c} + out_{r,c})$, $0 \le c \le 3$, can be used (8 bits for each of the four columns, with the additions performed modulo 2). It is noted that $in_{r,c}$, $k_{r,c}$, and $out_{r,c}$ are the input to MixColumns, the round key, and the output of AddRoundKey, respectively. This error indication flag can be compressed so that an $n$-bit, $1 \le n \le 32$, error indication flag for these two transformations is achieved. Finally, after two rounds of the AES, the output of BIG.SubWords is derived.
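The error indication flag $E_c$ can be illustrated with a short Python sketch for a single column. It relies on the fact that the coefficients in each column of the MixColumns matrix (02, 03, 01, 01) add up to 01 modulo 2, so the modulo-2 sum of a column is preserved by MixColumns. The function names are hypothetical and the code is a software model under these assumptions, not the proposed hardware.

```python
def xtime(a: int) -> int:
    """Multiply by x (i.e., by 02) in GF(2^8) modulo 0x11B."""
    a <<= 1
    return (a ^ 0x11B) if a & 0x100 else a


def mix_column(col):
    """AES MixColumns applied to one 4-byte column."""
    a0, a1, a2, a3 = col
    return [
        xtime(a0) ^ (xtime(a1) ^ a1) ^ a2 ^ a3,
        a0 ^ xtime(a1) ^ (xtime(a2) ^ a2) ^ a3,
        a0 ^ a1 ^ xtime(a2) ^ (xtime(a3) ^ a3),
        (xtime(a0) ^ a0) ^ a1 ^ a2 ^ xtime(a3),
    ]


def add_round_key(col, key_col):
    """AES AddRoundKey applied to one column."""
    return [b ^ k for b, k in zip(col, key_col)]


def error_flag(col_in, key_col, col_out):
    """8-bit flag E_c for one column: modulo-2 sum of in, k, and out over the rows.

    The flag is zero when col_out is the fault-free result of
    MixColumns followed by AddRoundKey.
    """
    flag = 0
    for i, k, o in zip(col_in, key_col, col_out):
        flag ^= i ^ k ^ o
    return flag


col_in = [0xDB, 0x13, 0x53, 0x45]          # arbitrary illustrative input column
key_col = [0x01, 0x23, 0x45, 0x67]         # arbitrary illustrative round-key column
col_out = add_round_key(mix_column(col_in), key_col)
assert error_flag(col_in, key_col, col_out) == 0

faulty = list(col_out)
faulty[2] ^= 0x10                           # single-bit fault at the output
assert error_flag(col_in, key_col, faulty) != 0
```

In this sketch, errors that flip the same bit position in an even number of bytes of one column cancel in the flag, which is one reason the coverage is assessed through fault injection simulations.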
Fault detection for the next transformation in ECHO, BIG.ShiftRows, is by permutation. The last transformation in BIG.Round, i.e., BIG.MixColumns, is an expansion of MixColumns of the AES. Specifically, the output state of BIG.SubWords (input state of BIG.MixColumns) is arranged as a 4-row, 64-column matrix. Then, each 4 × 4 sub-matrix is multiplied by the fixed MixColumns matrix. Therefore, we obtain the error indication flags of the BIG.MixColumns (B.MC) transformation for the sub-matrices indexed by $j$, $0 \le j \le 15$, as follows
where, in the sub-matrices, $in_{r,c}$ and $out_{r,c}$ are the input and output of BIG.MixColumns, respectively, for which $0 \le r \le 3$ and $0 \le c \le 63$.
Finally, the BIG.Final transformation is performed as the last transformation in each $\mathrm{Compress}_{512}$ (see Fig. 1 
Proof. According to [1] , we have v 
B. Fugue
To propose a fault detection scheme for Fugue, we observe that the Fugue transformations can be divided into three types. The first type is the rotation transformations, i.e., ROR3, ROR14, and ROR15. The second category contains the two linear transformations TIX and CMIX. Finally, the last one is the nonlinear transformation SMIX.
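For the rotation transformations, a minimal Python sketch can show why a word-wide signature (the modulo-2 sum of all 30 state words) is unaffected by ROR3: a rotation only permutes the words. The rotation direction used below (new $S_i$ equals old $S_{i-3}$) is our reading of the Fugue specification and should be treated as an assumption; the invariance holds for either direction. The function names are hypothetical.

```python
from functools import reduce
from operator import xor


def word_signature(state):
    """Word-wide signature: modulo-2 sum (XOR) of all 32-bit state words."""
    return reduce(xor, state, 0)


def ror3(state):
    """Rotate the state by three word positions (assumed: new S_i = old S_{i-3})."""
    return state[-3:] + state[:-3]


# Arbitrary illustrative 30-word state.
state = [(i * 0x9E3779B1 + 0x1234) & 0xFFFFFFFF for i in range(30)]

rotated = ror3(state)
# A rotation permutes the words, so the signature is preserved.
assert word_signature(rotated) == word_signature(state)

# A fault in any single word after the rotation changes the signature.
faulty = list(rotated)
faulty[7] ^= 0x00000400
assert word_signature(faulty) != word_signature(state)
```

The same check applies to ROR14 and ROR15 by changing the rotation amount.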
Each Fugue round has the following sequence: TIX, ROR3, CMIX, SMIX, ROR3, CMIX, and SMIX. First, we propose the following theorem for the first three transformations TIX, ROR3, and CMIX in the round sequence. Then, we propose the fault detection scheme for the nonlinear transformation SMIX.
Theorem 1: Let $\sigma_{S_i} = \sum_{i=0}^{29} S_i$ be the 32-bit result of modulo-2 additions of $S_i$, $0 \le i \le 29$ (called the word-wide signature). Then, the predicted word-wide signature of the transformation sequence TIX, ROR3, and CMIX ($\hat{\sigma}_{TRC}$) in the Fugue round is obtained as
Proof. For TIX, the following substitutions are performed: 
We propose the following theorem for the predicted parity of the Super-Mix function.
Theorem 2: Let $I_i \in GF(2^8)$ and $O_i \in GF(2^8)$, $0 \le i \le 15$, be the 16-byte input and output of the Super-Mix function in Fugue, respectively. Then, the predicted parity for this function, i.e., $\hat{P}_{SM}$, is derived as follows (we note that parity is just an example and any other detecting code can be utilized)
where the multiplication is performed using the irreducible polynomial 
IV. SIMULATION RESULTS AND ASIC IMPLEMENTATIONS
The proposed error detection architectures have been simulated after injecting faults. The proposed architectures have the capability of detecting both permanent and transient faults (this covers both natural and malicious faults). In this paper, we use the stuck-at error model. The objective in using this model is to cover the malicious errors injected by attackers to break the algorithm (by injecting one or more incorrect bits) and to detect natural errors caused by bit flips. The stuck-at error forces one bit (for the single stuck-at error model) or multiple bits (for the multiple stuck-at error model) to be stuck at logic one or zero. This makes the resulting value independent of the error-free intended value.
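The stuck-at model described above can be sketched in Python as follows. The function name and its parameters are hypothetical; the mask selects the affected bit positions, so a mask with one set bit models a single stuck-at error and a mask with several set bits models multiple stuck-at errors.

```python
def stuck_at(value: int, mask: int, stuck_to_one: bool) -> int:
    """Force the bits selected by mask to logic one or logic zero."""
    if stuck_to_one:
        return value | mask
    return value & ~mask


# Single stuck-at-one on bit 0: the result no longer depends on the intended bit.
assert stuck_at(0b1010, 0b0001, True) == 0b1011
assert stuck_at(0b1011, 0b0001, True) == 0b1011

# Multiple stuck-at-zero on bits 0 and 1.
assert stuck_at(0b1010, 0b0011, False) == 0b1000
```

As the second assertion illustrates, a stuck-at fault produces an observable error only when the intended bit differs from the stuck value.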
In fault attacks, single error injection is the ideal case for gaining the maximum information. Nevertheless, due to technological constraints, a more realistic error model is to inject multiple errors. Therefore, for covering both natural errors and fault attacks, multiple errors need to be considered. The proposed diagnosis schemes in this paper are independent of the lifetime of errors. Therefore, both permanent and transient stuck-at errors lead to the same error coverage. We also note that intelligent attackers are not confined to multiple stuck-at faults, and thus the ability to detect single faults is important.
The fault model used to test the proposed architectures is created using external-feedback linear-feedback shift registers (LFSRs), which generate pseudo-random fault vectors that flip random bits at the outputs of the gates at random intervals. For the architectures presented, we have injected up to 80,000 faults and recorded the number of errors. We have also used the redundant-basis S-boxes in composite fields where applicable. Moreover, the false alarm ratios are derived. The error coverage in all the cases is more than 99% (and for the case of single stuck-at faults, 100% if we harden the error indication flag comparison units), with a relatively low false alarm ratio of 0.1%-0.3%. As we inject more faults, the error detection results do not change considerably, which indicates the relatively high accuracy of the results.
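A minimal Python sketch of pseudo-random fault vector generation with an external-feedback (Fibonacci) LFSR is given below. The 16-bit width, the tap positions (feedback polynomial $x^{16} + x^{14} + x^{13} + x^{11} + 1$), and the seed are assumptions chosen for illustration; they are not the parameters used in our simulations.

```python
def lfsr16_step(state: int) -> int:
    """One step of a 16-bit external-feedback LFSR (taps 16, 14, 13, 11)."""
    bit = ((state >> 0) ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)


def fault_vectors(seed: int, count: int):
    """Yield `count` pseudo-random 16-bit fault vectors from a nonzero seed."""
    state = seed & 0xFFFF
    if state == 0:
        raise ValueError("LFSR seed must be nonzero")
    for _ in range(count):
        state = lfsr16_step(state)
        yield state


def inject(word: int, fault_vector: int) -> int:
    """Flip the bits of `word` selected by the fault vector."""
    return word ^ fault_vector


# Inject five pseudo-random fault vectors into an illustrative 16-bit value.
golden = 0x3C5A
for vector in fault_vectors(seed=0xACE1, count=5):
    faulty = inject(golden, vector)
    assert faulty != golden          # a nonzero vector always changes the value
```

In a simulation flow, each faulty value would be passed to the error indication flag logic, and the detected and undetected cases would be counted to estimate the error coverage.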
We also present the performance and implementation metrics of the presented constructions through ASIC implementations of the algorithms in 256-bit form. The benchmarking is performed for the error detection architectures using the TSMC 65-nm library and Synopsys Design Compiler (shown in Table I for area, frequency, throughput, and efficiency [throughput over GE]). We note that in Table I, in order to make the area results meaningful when switching technologies, we have also provided the NAND-gate equivalency (gate equivalents: GE). This is obtained using the area of a NAND gate in the utilized TSMC 65-nm CMOS library, which is 1.41 µm². The results presented in Table I show acceptable overhead (degradation) for performance and implementation metrics. We also note that the utilized platform is merely for benchmarking, and we expect similar results on field-programmable gate arrays (FPGAs) or different ASIC libraries.
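The conversion from area to gate equivalents and the efficiency metric can be sketched in Python as follows. Only the NAND-gate area of 1.41 µm² is taken from the text; the function names and the numeric inputs in the example are hypothetical and are not entries of Table I.

```python
NAND_AREA_UM2 = 1.41  # area of a NAND gate in the utilized 65-nm library (from the text)


def gate_equivalents(area_um2: float) -> float:
    """Convert a synthesized area in square micrometers to gate equivalents (GE)."""
    return area_um2 / NAND_AREA_UM2


def efficiency(throughput: float, ge: float) -> float:
    """Efficiency as throughput over GE."""
    return throughput / ge


def overhead_percent(original: float, protected: float) -> float:
    """Relative overhead of the fault detection architecture, in percent."""
    return 100.0 * (protected - original) / original


# Hypothetical numbers for illustration only.
ge_original = gate_equivalents(14_100.0)     # about 10,000 GE
ge_protected = gate_equivalents(16_920.0)    # about 12,000 GE
print(round(overhead_percent(ge_original, ge_protected), 1))  # area overhead in percent
```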
V. CONCLUSIONS
In this paper, we have proposed efficient fault detection schemes by presenting closed formulations for the predicted signatures of different transformations in the hash algorithms ECHO and Fugue. These signatures are derived to achieve low overhead for the specific transformations and can be tailored to include byte/word-wide predicted signatures. Through simulations, we have shown that the proposed fault detection schemes are highly capable of detecting natural hardware failures and of reducing the effectiveness of malicious fault attacks. The proposed reliable hardware architectures have also been implemented on the ASIC platform using a 65-nm standard technology to benchmark their hardware and timing characteristics. The high efficiency of the proposed schemes makes the proposed reliable architectures suitable for high-performance applications.
