Abstract: Security has become an important research topic, and many cryptographic algorithms have been developed to improve the performance of information-protecting procedures. A hash function, such as MD5, RIPEMD-160 or SHA-1, is a cryptographic algorithm that does not use a key. In this paper, a hash function from the SHA-2 family is designed in order to fulfil the performance requirements of cryptographic algorithms. A SHA-256 design and a SHA-256 unfolding design based on reconfigurable hardware have been completed using Verilog code. These designs were simulated and verified using ModelSim. The results show that the proposed SHA-256 unfolding design gives better throughput on Arria II GX, reaching 2429.52 Mbps.
I. INTRODUCTION
The National Institute of Standards and Technology (NIST) standard specifies secure hash algorithms such as SHA-1, SHA-224, SHA-256, SHA-384 and SHA-512 [1]. Hash function algorithms are used during data transmission to produce a message digest, which makes them an essential tool for embedded security in e-mail, internet banking, and other applications. A hash function takes a message input of arbitrary length and produces an output of fixed length. A hash function is one-way: it is computationally infeasible to recover a message input from its hash value. Furthermore, it is computationally infeasible to find a different message that produces the same hash value. A hash function must have these properties in order to be used securely.
The purpose of this paper is to provide a high-speed hardware implementation of the SHA-256 algorithm. The algorithm is synthesised and implemented on an Arria II GX FPGA. The motivation of this design is to increase the performance of the SHA-256 algorithm. The organisation of this paper is as follows: Section 2 describes the SHA-256 algorithm; Section 3 presents the proposed SHA-256 design. The implementation results are discussed in Section 4 together with a comparison with other SHA-256 designs. Finally, the last section provides the conclusions of this project.
II. SHA-256 ALGORITHM
SHA-2 consists of four hash functions: SHA-224, SHA-256, SHA-384 and SHA-512. The output length depends on the SHA-2 variant and ranges from 224 to 512 bits. In this paper, the SHA-256 hash function has been designed. This section describes the SHA-256 algorithm together with its block diagram.

The SHA-256 algorithm can be divided into two stages: pre-processing and hash computation. Pre-processing involves padding the message, parsing the padded message into m-bit blocks (m = 512 for SHA-256), and setting the initialisation values used in the hash computation. Hash computation produces a message schedule from the padded message and uses the message schedule, functions, constants and word operations iteratively to generate a hash value. The final hash value generated by the hash computation is the message digest. Table 1 shows the characteristics of the SHA-256 hash function. The security of the SHA-256 hash function depends on the size of the hash value.

The first step of the SHA-256 hash function is pre-processing, in which the input message is padded. A single 1-bit is appended to the end of the message, followed by n 0-bits until the length of the message is congruent to 448 modulo 512. The last 64 bits are reserved for the length of the message. Thus, the padded message has a length that is a multiple of 512 bits; for a single-block message, the overall input is 512 bits.

[Equations (4), (5), (6) and (7)]

Figure 3 illustrates the proposed SHA-256 hash function architecture. An input of 15 blocks of 32-bit data is padded as input data: a single 1-bit is added at the end of the message, followed by n 0-bits, and the last 64 bits hold the length of the message. The overall message of the SHA-256 hash function is 512 bits. The input message can be obtained by using Equation (1). The sequence of the message is generated by using a counter module.
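The padding and parsing steps described above can be sketched in software as follows. This is a minimal Python sketch for illustration, not the Verilog design; the function names `sha256_pad` and `parse_blocks` are hypothetical, and the sketch assumes byte-aligned messages.

```python
import struct

def sha256_pad(message: bytes) -> bytes:
    """Append a 1-bit, then 0-bits until the length is congruent to
    448 mod 512, then the 64-bit message length (assumes whole bytes)."""
    bit_len = len(message) * 8
    padded = message + b"\x80"                          # 1-bit plus seven 0-bits
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)  # 0-bits up to 448 mod 512
    padded += struct.pack(">Q", bit_len)                # 64-bit big-endian length
    return padded

def parse_blocks(padded: bytes) -> list:
    """Split the padded message into 512-bit blocks of sixteen 32-bit words."""
    return [list(struct.unpack(">16I", padded[i:i + 64]))
            for i in range(0, len(padded), 64)]

blocks = parse_blocks(sha256_pad(b"abc"))
print(len(blocks), f"{blocks[0][0]:08x}", f"{blocks[0][15]:08x}")
```

For the 24-bit message "abc", the result is a single 512-bit block whose first word is 0x61626380 (the message followed by the 1-bit) and whose last word is 0x00000018 (the message length in bits).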
The SHA-256 hash function uses 64 iterations (rounds) of the compression function in order to obtain the final hash code. Before SHA-256 starts processing the message, the eight buffer initialisation values of SHA-256 are loaded with the help of a multiplexer module. ROM blocks are used to define the constants, Kt; the 64 constants are stored as a 64 × 32-bit ROM. Finally, the output module is used to produce the SHA-256 message digest. In this module, the buffer initialisation values are added to the last output of the SHA-256 compression function.

The message schedule and compression function of the SHA-256 algorithm need to be modified in order to produce the unfolding architecture. Unfolding is a transformation technique that reduces latency, in clock cycles, according to the unfolding factor J [10]. This technique can also increase the throughput of the SHA-256 algorithm. In this paper, the unfolding technique with factor 2 has been implemented. Modifications of each block in the message schedule and compression function have to be considered. Figure 4 and Figure 5 show the corresponding block diagrams. The two architectures for Cho(next_e,e,f) and Majo(next_a,a,b) are shown in Figure 6 and Figure 7. Both architectures consist of AND, NOT and XOR gates with different implementation structures. As Equations (4) and (5) indicate, the data inputs of these two architectures differ. The data inputs next_e and next_a can be obtained from the compression function of the SHA-256 algorithm, as shown in Figure 2.

The proposed SHA-256 design and SHA-256 unfolding design were successfully designed using Verilog code. Both designs were analysed, synthesised, and placed and routed using Altera Quartus II. Table 3 illustrates the synthesis and implementation results of the SHA-256 design and the SHA-256 unfolding design. These designs were simulated using ModelSim.
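The factor-2 unfolding of the message schedule and compression function described above can be sketched in software as follows. This is a minimal Python model under stated assumptions, not the Verilog design: each loop iteration computes two message-schedule words and two rounds, so the loop runs 32 times instead of 64. The names `compress_unfolded` and `sha256_unfolded` are hypothetical, the constants Kt and the initial buffer values are derived from their standard definitions (fractional parts of cube and square roots of primes) rather than read from a ROM, and the result is checked against Python's `hashlib`.

```python
import hashlib
import struct

MASK = 0xFFFFFFFF

def _primes(n):
    found, c = [], 2
    while len(found) < n:
        if all(c % p for p in found):
            found.append(c)
        c += 1
    return found

def _frac_root(p, k):
    """First 32 fractional bits of the k-th root of p (exact integer search)."""
    target, lo, hi = p << (32 * k), 0, 1 << 40
    while lo < hi:                      # largest r with r**k <= p * 2**(32k)
        mid = (lo + hi + 1) // 2
        if mid ** k <= target:
            lo = mid
        else:
            hi = mid - 1
    return lo & MASK

K = [_frac_root(p, 3) for p in _primes(64)]   # round constants Kt
H0 = [_frac_root(p, 2) for p in _primes(8)]   # initial buffer values

def rotr(x, n): return ((x >> n) | (x << (32 - n))) & MASK
def ch(x, y, z): return (x & y) ^ (~x & z)
def maj(x, y, z): return (x & y) ^ (x & z) ^ (y & z)
def big_s0(x): return rotr(x, 2) ^ rotr(x, 13) ^ rotr(x, 22)
def big_s1(x): return rotr(x, 6) ^ rotr(x, 11) ^ rotr(x, 25)
def small_s0(x): return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3)
def small_s1(x): return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10)

def compress_unfolded(state, block):
    w = list(struct.unpack(">16I", block))
    a, b, c, d, e, f, g, h = state
    for t in range(0, 64, 2):           # 32 iterations, two rounds each
        if t >= 16:                     # two schedule words per iteration
            w.append((small_s1(w[t - 2]) + w[t - 7] + small_s0(w[t - 15]) + w[t - 16]) & MASK)
            w.append((small_s1(w[t - 1]) + w[t - 6] + small_s0(w[t - 14]) + w[t - 15]) & MASK)
        # round t
        t1 = (h + big_s1(e) + ch(e, f, g) + K[t] + w[t]) & MASK
        next_a = (t1 + big_s0(a) + maj(a, b, c)) & MASK
        next_e = (d + t1) & MASK
        # round t + 1 uses Ch(next_e, e, f) and Maj(next_a, a, b)
        t1b = (g + big_s1(next_e) + ch(next_e, e, f) + K[t + 1] + w[t + 1]) & MASK
        t2b = (big_s0(next_a) + maj(next_a, a, b)) & MASK
        a, b, c, d, e, f, g, h = ((t1b + t2b) & MASK, next_a, a, b,
                                  (c + t1b) & MASK, next_e, e, f)
    return [a, b, c, d, e, f, g, h]

def sha256_unfolded(message: bytes) -> str:
    data = message + b"\x80"
    data += b"\x00" * ((56 - len(data) % 64) % 64) + struct.pack(">Q", len(message) * 8)
    state = H0
    for i in range(0, len(data), 64):
        out = compress_unfolded(state, data[i:i + 64])
        state = [(s + o) & MASK for s, o in zip(state, out)]  # output module addition
    return "".join(f"{s:08x}" for s in state)

print(sha256_unfolded(b"abc") == hashlib.sha256(b"abc").hexdigest())
```

The second round of each iteration takes next_e and next_a directly from the first round, which is why its Ch and Maj inputs differ from those of the first round. In hardware this lengthens the critical path, which is consistent with the lower maximum frequency reported for the unfolding design.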
The throughput of these designs can be calculated from the maximum clock frequency and the number of clock cycles. The proposed SHA-256 design, the SHA-256 unfolding design and other related SHA-256 publications are compared in Table 4. The proposed SHA-256 design uses 1301 ALUTs and has a maximum clock frequency of 218.9 MHz. Compared with other SHA-256 designs on different types of FPGA architectures, the proposed SHA-256 design gives the highest maximum frequency, with a throughput of 1660.40 Mbps. The throughput of the proposed SHA-256 design is similar to that of the SHA-2 design in [2]; the remaining difference is attributed to the different FPGA architecture used. Besides, paper [2] does not mention the type of SHA-2 that has been designed.

The throughput of the SHA-256 design can be increased by using the unfolding transformation. Table 4 shows that the SHA-256 unfolding design gives the highest throughput, 2429.52 Mbps, with a maximum frequency of 156.59 MHz. Other SHA-256 designs [2-8] based on different types of FPGA devices are also given in Table 4 in order to provide a comparison of FPGA implementations. In this paper, the proposed SHA-256 design produces better results in terms of maximum frequency. Furthermore, it also uses a small implementation area of 1301 ALUTs.

The novelty of this paper is the design of SHA-256 using the unfolding transformation method. This method improves the throughput of the SHA-256 design because it requires fewer clock cycles than the traditional design: the number of clock cycles of SHA-256 decreases from 64 to 32. Thus, a high-throughput SHA-256 design can be obtained by using the unfolding transformation method.
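The relationship between maximum frequency, clock cycles and throughput discussed above can be illustrated with a short calculation. The sketch assumes the formula commonly used for iterative hash cores, throughput = block size × maximum frequency / clock cycles per block, with a 512-bit block. The function name `throughput_mbps` is hypothetical, and the cycle counts passed in are assumptions: 32 is the iteration count of the unfolding design, and 33 assumes one additional cycle per block.

```python
def throughput_mbps(block_bits: int, fmax_mhz: float, cycles: int) -> float:
    """Throughput in Mbps, assuming one block is processed every `cycles` clocks."""
    return block_bits * fmax_mhz / cycles

# Unfolding design at 156.59 MHz, for two assumed cycle counts per block.
for cycles in (32, 33):
    print(cycles, round(throughput_mbps(512, 156.59, cycles), 2))
```

Under these assumptions, 33 cycles per block at 156.59 MHz gives 2429.52 Mbps, which matches the throughput reported for the SHA-256 unfolding design.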
VI. CONCLUSION
In conclusion, the proposed SHA-256 and SHA-256 unfolding designs were successfully completed and tested. They are comparable to other SHA-256 designs in terms of area and maximum frequency. Based on the comparison with other SHA-256 designs, the proposed SHA-256 unfolding design gives the highest throughput, 2429.52 Mbps. In future, the proposed SHA-256 design can be applied to Keyed-hash Message Authentication Codes (HMAC).
