The Secure Hash Algorithm SHA-512 is a dedicated cryptographic hash function widely considered for use in data integrity assurance and data origin authentication security services. Reconfigurable hardware devices such as Field Programmable Gate Arrays (FPGAs) offer a flexible and easily upgradeable platform for implementation of cryptographic hash functions. Owing to the iterative structure of SHA-512, even a single transient error at any stage of the hash value computation will result in a large number of errors in the final hash value. Hence, detection of errors becomes a key design issue. In this paper, we present a detailed analysis of the propagation of errors to the output in the hardware implementation of SHA-512. Included in this analysis are single, transient as well as permanent faults that may appear at any stage of the hash value computation. We then propose an error detection scheme based on parity codes and hardware redundancy. We report performance metrics such as area, memory and throughput for the implementation of SHA-512 with error detection capability on an Altera FPGA. We achieved 100% fault coverage in the case of single faults with an area overhead of 21% and with throughput reduced by 11.6%.
INTRODUCTION
Cryptographic hash functions are recommended by several Internet Engineering Task Force requests for comments (RFCs) in applications such as digital signature schemes in public-key cryptosystems, password storage and verification. Hash functions are also the building blocks of secret-key message authentication codes used in two popular security protocols, namely: secure sockets layer (SSL) recommended in RFC 2246 [1] and IP Security (IPsec) specified in RFC 2404 [2]. The implementation of cryptographic hash functions using reconfigurable hardware devices such as field programmable gate arrays (FPGAs) is faster than software implementation, and FPGAs are also flexible and easily upgradeable. Hash functions that have a 'dedicated' design are fast and have a considerable advantage over algorithms that are based on block ciphers [3]. Dedicated hash functions suitable for both software and hardware implementation have been proposed. One of the dedicated hash functions widely considered for use in real applications is the Secure Hash Algorithm SHA-512, whose security matches that of the advanced encryption standard (AES) with a key size of 256 bits [4]. The complexity of the best birthday attack on SHA-512 is 2^256. In a cryptographic hash function, a message of arbitrary length is first padded and broken into blocks and then converted into a fixed-length output (hash value). The hash values of individual blocks are used iteratively by a compression function to find the final hash value, referred to as the message digest. A common sequence of operations is called a digest round and the compression function produces a hash value by subjecting a block of the message to many digest rounds.
There are different types of faults and methods of fault injection in public-key encryption algorithms. The faults can be transient or permanent in nature. Transient faults can be induced, such as single-event upsets, which are typically a flipping of a logical state, or multiple-event faults caused by several single-event faults. Some permanent faults, such as total dose-rate faults due to exposure to a harmful environment, can cause defects. Transient and permanent faults, and methods of fault injection such as varying the supply voltage, external clock or temperature, or inducing faults using white light, laser and X-rays, are discussed in detail in Bar-El et al. [5]. If an attacker deliberately generates a glitch attack, causing a flip-flop state to change or corrupting data values when they are transferred from one digest operation to another, even a single fault can result in multiple errors in the computed hash value. The severity of the problem makes error detection a key design issue. Moreover, as a digest round consists of several operations, errors can creep in at any of these operations and can affect one or several bits. As SHA-512 is considered for use in essential security services, concurrent error detection (CED) is very important. This necessitates an analysis of the propagation of an error from the point of origin to the output when implemented in hardware. Even though CED is very desirable, it has certain associated penalties, such as hardware cost and performance degradation due to interaction between the circuit and the detection logic, which need to be considered while designing the error detection circuit. The design goal of the CED is to achieve 100% error detection with minimal penalty. CED techniques involve redundancy (extra logic) in one form or another, such as hardware, time or information. A CED circuit based on hardware redundancy duplicates the complete circuit.
Duplication targets unrestricted error models, but this approach will result in 100% hardware overhead. Time redundancy techniques re-compute the output at different times with the same circuit thereby increasing the performance overhead by 100%. Moreover, only transient faults can be detected by these CED circuits as the original faulty circuit is used for re-computation. In information redundancy technique, data are appended with additional bits and a coding scheme is used to detect errors. Coding techniques marginally increase the hardware as well as performance overhead and are suitable for simple arithmetic operations with restricted error models. Combinations of the above techniques are also employed to minimize the overhead for CED.
Many techniques have been reported [6-25] for CED in different application domains. Karri et al. [6] investigated a low-cost, low-latency CED for the Rijndael symmetric encryption algorithm. A low-cost method based on a modification of the parity bit of the input into the parity bit of the output for the AES encryption algorithm was implemented on a Xilinx Virtex 1000 FPGA by Wu et al. [7]. A complete scheme for parity-based fault detection in a hardware implementation of the AES was presented by Bertoni et al. [8]. Wu and Karri [9] proposed a CED technique using idle cycles in a data path to do the re-computation, with RC6 encryption as a case study. Bertoni et al. [10] carried out a detailed study of the propagation of errors in a hardware implementation of Rijndael AES and proposed two fault detection schemes, namely: a redundancy-based scheme and an error detection coding scheme. Yen and Wu [11] proposed a scalable and symmetrical error detection scheme based on cyclic redundancy check. A technique described in Siewiorek and Swarz [12], known as duplication with comparison (DWC), performs CED by duplicating the circuit and comparing the two results. Johnson [13] described triple modular redundancy, a hardware redundancy-based error correction technique with at least 200% hardware overhead. A method similar to the duplication of the circuit, but with partial replication based on prediction logic functions, was used by Drineas and Makris [14] for concurrent fault detection in random combinational logic, thereby reducing the hardware overhead. Patel and Fung [15] developed a time redundancy-based technique for permanent faults by re-computing with shifted operands. Time redundancy techniques using alternating data were proposed by Reynolds and Metze [16]. A register transfer level CED technique based on time redundancy was proposed by Karri and Wu [17].
A combination of hardware and time redundancy techniques, referred to as quadruple time redundancy, was proposed by Townsend et al. [18] for concurrent error correcting adder design. For performing arithmetic operations, Parhami [19] suggested a parity code-based system using even-parity redundant operands. Several fault-secure designs for arithmetic operators based on parity prediction were reported by Nicolaidis et al. [20]. Several information redundancy-based CED techniques use codes such as Bose-Lin codes [21] and Berger codes [22]. A method that combines parity check codes and duplication was reported by Almukhaizim and Makris [23] for fault tolerant design of combinational and sequential logic. A non-intrusive CED technique based on compaction of circuit outputs, prediction of the compacted responses and comparison of restricted error models was suggested by Almukhaizim et al. [24]. The idea of multiple parity bits was introduced by Sogomonyan [25].
A single-chip FPGA solution for two of the dedicated hash functions, SHA-384 and SHA-512, was proposed by McLoone and McCanny [26]. A comparative analysis of the hardware implementation of SHA-1 and SHA-512 on a Xilinx Virtex FPGA was done by Grembowski et al. [27]. An elegant application specific integrated circuit (ASIC) implementation of SHA-512 making use of delay balancing and pipelining was recently reported by Dadda and Macchetti [28]. None of the above-mentioned hardware implementations of dedicated hash functions considered the issue of fault tolerance. To the best of our knowledge, an analysis of the propagation of errors, together with error detection schemes, for a hardware implementation of SHA-512 has not been reported.
In this paper, we analyze the propagation of single and multiple errors occurring at different operations of a digest round in the hardware implementation of SHA-512. Both transient and permanent faults that may occur at different stages of the hash value computation are considered in the analysis. As the hardware implementation on an FPGA is too complex for an exhaustive analysis, we implemented SHA-512 in software solely for carrying out the analysis. We propose an error detection scheme based on information as well as hardware redundancy. Errors in most of the operations in a digest round are detected by simple parity circuits, and errors in the few remaining operations are detected by the simplest DWC technique, wherein the hardware pertaining to the operations is duplicated and a comparison is made between the original and the duplicated circuits. As the DWC technique results in minimum additional delay but 100% hardware overhead, we used it only for the non-linear operations of SHA-512. Performance metrics such as area, memory and throughput are also computed for the implementation of SHA-512 with error detection capability on an Altera FPGA.
The remainder of the paper is organized as follows. In Section 2, we discuss the SHA-512 algorithm. In Section 3, an analysis of error propagation for single errors is discussed. Section 4 explains the error detection techniques proposed in this paper. Results are discussed in Section 5, and Section 6 concludes the paper.
SHA-512 ALGORITHM
In this section, the SHA-512 algorithm is discussed in detail. When a message of any length < 2^128 bits is input, SHA-512 computes a 512-bit condensed representation of the message, referred to as the message digest. The procedure for the message digest computation consists of two stages, namely: preprocessing and hash computation. In the preprocessing stage, the message is padded and parsed into m-bit blocks, and the initialization values to be used in the hash computation are set. A message scheduler divides each m-bit block into 16 words and prepares a message schedule by passing one word at a time. A series of hash values is generated iteratively from functions, constants and word operations, and the final hash value is called the message digest. The operations performed during the two stages are listed below:
Preprocessing:
(1) Padding the message to a multiple of 1024 bits.
(2) Parsing the padded message into 1024-bit blocks.
(3) Setting the initial hash values to be used in the hash computation.
Hash computation:
Step 1: The hash values H_0^(i-1), ..., H_7^(i-1) are assigned to variables a, b, c, d, e, f, g and h. The eight initial hash values, which are 64 bits wide, are shown in Table 1.
Steps 2 and 3 are performed 80 times on a 1024-bit block.
Step 2: The message schedule W_t is prepared, where ROTR^n(x) is a circular rotation of a variable x by n positions to the right and SHR^n(x) is a shift of a variable x by n positions to the right.
Step 3: (i) A sequence of 80 constant 64-bit words, K_t, is used by the processing unit. (ii) The processing unit uses four functions, Ch, Maj, Σ0 and Σ1.
Step 4: The ith intermediate hash values H_0^(i) to H_7^(i) are computed by 64-bit adders (addition modulo 2^64) after the iterations.
The message digest is formed from the hash values H_0^(N) to H_7^(N) obtained after processing all N blocks in the message.
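For reference, the functions and update equations used in Steps 2 to 4 can be written as below. This is a summary of the definitions in the Secure Hash Standard (FIPS 180), not a reproduction of the original typeset equations; M_t (the t-th 64-bit word of the current block) and T_1, T_2 (temporary words) are the standard's symbols and are assumed here. All additions are modulo 2^64.

```latex
\begin{align*}
\mathrm{Ch}(e,f,g)  &= (e \wedge f) \oplus (\neg e \wedge g) \\
\mathrm{Maj}(a,b,c) &= (a \wedge b) \oplus (a \wedge c) \oplus (b \wedge c) \\
\Sigma_0(a) &= \mathrm{ROTR}^{28}(a) \oplus \mathrm{ROTR}^{34}(a) \oplus \mathrm{ROTR}^{39}(a) \\
\Sigma_1(e) &= \mathrm{ROTR}^{14}(e) \oplus \mathrm{ROTR}^{18}(e) \oplus \mathrm{ROTR}^{41}(e) \\
\sigma_0(x) &= \mathrm{ROTR}^{1}(x) \oplus \mathrm{ROTR}^{8}(x) \oplus \mathrm{SHR}^{7}(x) \\
\sigma_1(x) &= \mathrm{ROTR}^{19}(x) \oplus \mathrm{ROTR}^{61}(x) \oplus \mathrm{SHR}^{6}(x) \\
W_t &= \begin{cases}
  M_t, & 0 \le t \le 15 \\
  \sigma_1(W_{t-2}) + W_{t-7} + \sigma_0(W_{t-15}) + W_{t-16}, & 16 \le t \le 79
\end{cases} \\
T_1 &= h + \Sigma_1(e) + \mathrm{Ch}(e,f,g) + K_t + W_t \\
T_2 &= \Sigma_0(a) + \mathrm{Maj}(a,b,c) \\
(a,b,c,d,e,f,g,h) &\leftarrow (T_1 + T_2,\ a,\ b,\ c,\ d + T_1,\ e,\ f,\ g) \\
H_j^{(i)} &= H_j^{(i-1)} + v_j, \quad (v_0,\dots,v_7) = (a,\dots,h),\ 0 \le j \le 7
\end{align*}
```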
The intermediate hash values are computed by 64-bit modular adders after the iterations and hence, an additional eight such adders are required to find the final hash value for the block. Carry save adders (CSAs) are used for addition as they are faster and have a smaller area than carry look ahead (CLA) adders. In a CSA, a full array adder (FAA) performs the addition of three binary vectors without propagating the carries and generates two binary vectors, the pseudo-sum and the pseudo-carry. The reduction-by-rows technique is used to perform the multi-operand addition, wherein an array of full adders is used with a CLA in the last stage to add the final pseudo-sum and pseudo-carry. The implementation of SHA-512 with CSAs reported by Dadda and Macchetti [28] is shown in Fig. 1.
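The carry-save idea can be illustrated with a small behavioural sketch. It is not the circuit of Fig. 1; the function names `carry_save_add` and `add_many` are hypothetical, and the final carry-propagating addition (done by a CLA in hardware) is modelled here with ordinary integer addition.

```python
MASK = (1 << 64) - 1  # 64-bit words, addition modulo 2^64


def carry_save_add(x, y, z):
    """3:2 compression of three words into (pseudo_sum, pseudo_carry).

    No carry ripples between bit positions: each bit is an independent
    full adder, and the carry vector is simply shifted left by one.
    """
    pseudo_sum = x ^ y ^ z
    pseudo_carry = (((x & y) | (x & z) | (y & z)) << 1) & MASK
    return pseudo_sum, pseudo_carry


def add_many(operands):
    """Reduce operands with carry-save stages, then one propagating add."""
    ops = list(operands)
    while len(ops) > 2:
        s, c = carry_save_add(ops[0], ops[1], ops[2])
        ops = ops[3:] + [s, c]
    return sum(ops) & MASK  # stands in for the final CLA stage


words = [0xFFFFFFFFFFFFFFFF, 0x123456789ABCDEF0, 0x0F0F0F0F0F0F0F0F, 7, 42]
assert add_many(words) == sum(words) & MASK
```

Because pseudo-sum plus pseudo-carry always equals the sum of the three inputs modulo 2^64, only the last stage needs a carry-propagating adder.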
The message digest computation in SHA-512 involves a sequence of operations (digest round) performed on a word B_t^i of a message block B^i. The digest round consists of the arithmetic and logical operations listed in Steps 2 and 3 of the hash value computation. These two steps are repeated 80 times to compute the hash value H^i for a block. The second stage is the block round, which repeats the digest round operations on different blocks of the message to compute the final hash value H. The digest and block round operations are depicted in Fig. 2.
The basic operations of an SHA-512 are listed below: Final addition of pseudo-sum and pseudo carry using three CLAs. All operations of a digest round except rotation/shift can be performed on bytes. Hence, the operands, which are 64 bits wide, are partitioned into 8 bytes to facilitate better error detection capability.
ERROR ANALYSIS
In this section, the propagation of error in an SHA-512 is discussed. Error analysis is carried out to understand the effect of an error injected into the hash computation circuit. Error injection points are the following: (i) One of the inputs of an individual operation of a digest round, (ii) One of the inputs in a digest round and (iii) One of the inputs at the beginning of a block round, i.e. first digest round of the first block in a multiblock message. A restricted error model is used in our analysis as an SHA-512 involves several loops. This model assumes that at any time only one bit will be in error and the error detection circuits are also designed based on this model. Included in this study are transient as well as permanent faults. In Section 5, we discuss in detail the quantum of experiments conducted.
Error analysis of individual operations in a digest round
In this section, we discuss the effect of a single error on every operation of a digest round individually. Even though the digest round of SHA-512 consists of operations performed in sequence as shown in Fig. 1, we discuss in this section the effect of an error in the basic operations listed in Section 2. Experiments were conducted by injecting a single error in one of the inputs of an operation block and obtaining the number of erroneous bits at the output of the same block. As Steps 2 and 3, shown in Fig. 2, are repeated 80 times (hereafter referred to simply as rounds), the errors were introduced at different bits randomly in every round and the number of bits that were in error was computed. The distribution of the number of erroneous bits was plotted by computing the frequency of the number of errors, where frequency of errors = (number of times a specific number of bits is in error)/80. As memory and register transfer operations merely transfer the input to the output, the erroneous bit in the input is also transferred to the output at the same position. Similarly, in the addition operation performed by the FAA, an erroneous bit in any one of the operands results in a single-bit error in the pseudo-sum output at the same position. The situation is different in the case of the pseudo-carry output of the FAA. The error is shifted to the left by 1 bit position, since every carry bit produced has a higher positional weight than the corresponding pseudo-sum bit, and error masking also takes place. In Table 2, the propagation of an error in a FAA is presented. The shaded cells in Table 2 depict the error masking in the pseudo-carry output. Error masking takes place in the Maj function as well, since it has the same logic function as the pseudo-carry output of the FAA.
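The single-bit behaviour of a full adder described above can be enumerated directly. The sketch below is an illustration only (the helper names `full_adder` and `masking_table` are hypothetical): it flips the Z input for every input combination and records whether the pseudo-carry output hides the error.

```python
from itertools import product


def full_adder(x, y, z):
    """One-bit full adder: returns (pseudo_sum_bit, pseudo_carry_bit)."""
    return x ^ y ^ z, (x & y) | (x & z) | (y & z)


def masking_table():
    """For each (x, y, z), compare correct outputs with outputs when z is flipped."""
    rows = []
    for x, y, z in product((0, 1), repeat=3):
        s, c = full_adder(x, y, z)
        s_err, c_err = full_adder(x, y, z ^ 1)
        rows.append({"x": x, "y": y, "z": z,
                     "sum": s, "carry": c,
                     "sum_err": s_err, "carry_err": c_err,
                     "carry_masked": c == c_err})
    return rows


for row in masking_table():
    print(row)
```

The pseudo-sum always changes, while the pseudo-carry changes only when X and Y differ, so the error is masked in the carry for four of the eight input combinations.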
Table 2. Propagation of an error in a FAA. Columns: correct outputs (pseudo-sum, pseudo-carry) and erroneous outputs with an error in Z (pseudo-sum, pseudo-carry).
The functions Σ0 and Σ1 are basically rotate functions and an erroneous bit in the input produces a single error in the output of each rotation, but the position of the erroneous bit in the output is changed by the rotate operation. As there are three rotations in Σ0, uniformly three erroneous bits were obtained in the output in all the 80 rounds. The same is the case with Σ1. Figure 3a shows the effect of a single-bit error injected at the only input, W_(t-2), of the σ1 block. In 75% of rounds, three bits of the output of σ1 were in error. As σ1 contains a shift function [SHR^6(x)], and the errors are injected at random in each round, the effect of errors in the six least significant bits of the input is ignored in the output. The result is zero error in 20% of the rounds. A similar set of output error patterns was obtained in the case of σ0.
In the case of Ch, one of the three inputs had to be chosen for the injection of an error. The function Ch(e, f, g) = (e ∧ f) ⊕ (¬e ∧ g) is such that the error in the output could not be detected due to error masking in four out of eight cases, similar to the pseudo-carry output of the FAA shown in Table 2, if a single error was injected into the f or g inputs, and in three out of eight cases if the choice of input was e. In Fig. 3b, the effect of a single-bit error injected at the input e of the Ch block is shown. It is clear from the graph of Ch that error masking takes place in 55% of cases.
Propagation of errors by CLA is markedly different from that of FAAs. An error in any one of the inputs is propagated by a CLA to several bits in the output. The propagation is done through the bit carried over to the subsequent cells, which in turn propagates errors to sum bits and carry bits and so on. The trend line is exponential in nature and every bit is in error at one time or the other. In 50% of the rounds, the error was present in only one output of the CLA, and the worst-case error of seven erroneous bits in the output occurred only in one round.
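The way a single input error spreads through a carry-propagating adder can be explored with a behavioural sketch. This models only the arithmetic result of a 64-bit addition, not the gate-level CLA, and the operands are random values rather than the data of the experiments; `bits_in_error` and `error_histogram` are hypothetical names.

```python
import random
from collections import Counter

MASK = (1 << 64) - 1


def bits_in_error(a, b, bit):
    """Number of sum bits that differ when one bit of operand a is flipped."""
    good = (a + b) & MASK
    bad = ((a ^ (1 << bit)) + b) & MASK
    return bin(good ^ bad).count("1")


def error_histogram(trials=80, seed=0):
    """Count how often each number of erroneous sum bits occurs."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        a, b = rng.getrandbits(64), rng.getrandbits(64)
        counts[bits_in_error(a, b, rng.randrange(64))] += 1
    return counts


print(sorted(error_histogram().items()))
```

A flipped input bit changes the sum by plus or minus one power of two, so the error stays in one output bit unless it triggers a run of carries, which is why long error bursts are rare.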
From the foregoing analysis it is clear that memory, register and pseudo-sum outputs of FAA are linear operations and rest of the operations are non-linear in nature. A study of this nature helps us in choosing a suitable error detection scheme, which is discussed in detail in Section 4.
Error analysis of a digest round
Experiments were conducted by injecting a single fault in the computed value of W_t. A message consisting of a data block of 1024 bits was used and an error was introduced in only 1 bit position of W_t, choosing the position randomly in every round in order to simulate the transient fault condition. The number of erroneous bits in the message digest was computed for this block. The results of the simulation are shown in Fig. 4.
It can be seen from Fig. 4 that 50% of the bits (256) of the message digest are in error in almost 90% of cases and that the number of erroneous bits decreases drastically in the last 10% of the cases. The number of erroneous bits in the message digest was only three when a fault was inserted in the 80th round. The reason for this behavior is the iterative nature of the algorithm.
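A software experiment of this kind can be sketched as follows. This is not the simulator used for Fig. 4: it is a plain SHA-512 following FIPS 180, with one assumed fault model in which a single bit of the W_t value delivered to round t of the first block is flipped (the stored schedule itself is left intact). All names are hypothetical, and the constants are derived from the square and cube roots of the first primes as the standard describes.

```python
import hashlib
from math import isqrt

MASK = (1 << 64) - 1

def _primes(n):
    found, c = [], 2
    while len(found) < n:
        if all(c % p for p in found):
            found.append(c)
        c += 1
    return found

def _icbrt(n):
    """Floor of the cube root, by binary search on integers."""
    lo, hi = 0, 1 << (n.bit_length() // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

_P = _primes(80)
H_INIT = [isqrt(p << 128) & MASK for p in _P[:8]]  # fractional bits of sqrt(p)
K = [_icbrt(p << 192) & MASK for p in _P]          # fractional bits of cbrt(p)

def rotr(x, n):
    return ((x >> n) | (x << (64 - n))) & MASK

def compress(h, block, fault=None):
    """One block round; fault=(t, bit) flips one bit of W_t fed to round t."""
    w = [int.from_bytes(block[8 * i:8 * i + 8], "big") for i in range(16)]
    for t in range(16, 80):
        s0 = rotr(w[t - 15], 1) ^ rotr(w[t - 15], 8) ^ (w[t - 15] >> 7)
        s1 = rotr(w[t - 2], 19) ^ rotr(w[t - 2], 61) ^ (w[t - 2] >> 6)
        w.append((s1 + w[t - 7] + s0 + w[t - 16]) & MASK)
    a, b, c, d, e, f, g, hh = h
    for t in range(80):
        wt = w[t] ^ (1 << fault[1]) if fault and fault[0] == t else w[t]
        big1 = rotr(e, 14) ^ rotr(e, 18) ^ rotr(e, 41)
        ch = (e & f) ^ (~e & g)
        t1 = (hh + big1 + ch + K[t] + wt) & MASK
        big0 = rotr(a, 28) ^ rotr(a, 34) ^ rotr(a, 39)
        maj = (a & b) ^ (a & c) ^ (b & c)
        t2 = (big0 + maj) & MASK
        a, b, c, d, e, f, g, hh = (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g
    return [(x + y) & MASK for x, y in zip(h, (a, b, c, d, e, f, g, hh))]

def sha512(msg, fault=None):
    """SHA-512 digest; an optional fault is injected in the first block only."""
    padded = (msg + b"\x80" + b"\x00" * ((111 - len(msg)) % 128)
              + (8 * len(msg)).to_bytes(16, "big"))
    h = list(H_INIT)
    for i in range(0, len(padded), 128):
        h = compress(h, padded[i:i + 128], fault if i == 0 else None)
    return b"".join(x.to_bytes(8, "big") for x in h)

def digest_bits_in_error(msg, rnd, bit):
    good, bad = sha512(msg), sha512(msg, fault=(rnd, bit))
    return bin(int.from_bytes(good, "big") ^ int.from_bytes(bad, "big")).count("1")

assert sha512(b"abc") == hashlib.sha512(b"abc").digest()  # sanity check
print([digest_bits_in_error(b"abc", r, 0) for r in (0, 40, 79)])
```

The exact counts depend on the fault model and message, so they are not expected to reproduce Fig. 4; the sketch only shows how early faults diffuse over the whole digest while a fault in the last round touches few words.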
Error analysis of a block round
In a message, which consists of several blocks of data, the intermediate hash value H i of a block i computed in digest rounds is used in the subsequent block as initial H values. In order to simulate the effect of a single fault in a block round, a fault was injected in the computed hash value of a block and the block round was unrolled to estimate the erroneous bits in final hash value. A message consisting of 10 data blocks was used and it was found that out of 5120 bits of intermediate hash values, 1600 bits were in error. This is more than 30%.
ERROR DETECTION SCHEMES
Fault detection is achieved in a circuit only by including redundancy in one form or another. The selection of a suitable scheme depends on the type of errors to be covered, the error models and the performance degradation acceptable in terms of hardware overhead and delay. As discussed in Section 1, schemes based only on time redundancy are not capable of handling permanent faults, and the simplest form of error detection scheme that can detect transient as well as permanent faults with hardware redundancy is the DWC scheme. Even though the hardware overhead in this case is a little more than 100%, with very little additional delay, such schemes can handle unrestricted error models. Coding schemes can address the problem of detecting transient as well as permanent faults with less overhead in hardware and delay, but they cannot match the DWC scheme in terms of error detection capability. The earliest error detecting coding scheme used is the parity bit scheme. It is well known that parity codes can detect all single-bit errors and all errors with an odd number of erroneous bits. A single parity bit scheme is suitable if the group of bits handled by the parity bit is small in size. In SHA-512, as the operands are 64 bits long, multiple parity bits are required to improve the error detection capability.
We use a multiple parity bit scheme comprised of eight parity bits with each parity bit handling a byte of an operand. The eight parity bits of an operand so generated will be referred to as parity vector of the operand in the rest of the paper. In our scheme, the single parity bit of a vector can certainly detect not only all single bit errors in a byte but also all odd number of erroneous bits in the same group. Moreover, multiple even errors in the 64-bit operands may also be detected if the errors are spread in such a way that there are only odd number of erroneous bits in a byte. As the parity bits are handling only a byte at a time, the hardware complexity is also reduced. The parity coding of an SHA-512 digest round is shown in Fig. 5 . Every parity block is 8 bits wide and FAAs have two sets of parity blocks, one for pseudo-sum and another for pseudo-carry outputs. In the following section, we discuss the method of predicting and checking parity bits for each operation.
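The detection capability of a byte-wise parity vector can be illustrated with a short sketch. The bit ordering (parity bit i covers byte i, counted from the least significant byte) and the names `parity_vector` and `detects` are assumptions made for this example only.

```python
def parity_vector(x):
    """Eight even-parity bits of a 64-bit operand, one per byte (bit i <-> byte i)."""
    pv = 0
    for i in range(8):
        byte = (x >> (8 * i)) & 0xFF
        pv |= (bin(byte).count("1") & 1) << i
    return pv


def detects(x, error_mask):
    """True if flipping the bits in error_mask changes the parity vector of x."""
    return parity_vector(x) != parity_vector(x ^ error_mask)


x = 0x0123456789ABCDEF
print(detects(x, 1 << 17))             # single-bit error
print(detects(x, 0b11))                # two errors in the same byte
print(detects(x, (1 << 0) | (1 << 8))) # two errors in different bytes
```

A single error is always caught, an even number of errors confined to one byte escapes, and even-weight errors spread over different bytes are caught because each affected byte sees an odd number of flips.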
Parity prediction and checking
The parity prediction scheme depends on the operation performed. Parity checking basically compares the parity generated from the output of an operation with that of the predicted parity.
(1) Memory and register transfer operations: The initial constant H is stored in 8 × 64 = 512 bits and the constants K_t in 80 × 64 = 5120 bits of memory. As these are constants, even parity bits are computed beforehand for every byte of the operand and stored in memory as a vector along with the operands. Memory capacity is increased to 8 × 72 = 576 bits for H and 80 × 72 = 5760 bits for K to store the constants along with the parity bits. During memory read or register transfer operations, the parity vectors are transferred along with the operands to the destination. A total of 704 extra bits are required to store the parity bits generated.
(2) Pseudo-sum output of FAAs: In FAAs, the sum output is generated by full adders and is given by pseudo-sum = X ⊕ Y ⊕ Z, where X, Y and Z are operands. If pX, pY and pZ are the parity vectors of X, Y and Z, respectively, then, as parity is a linear operator, the parity vector of the pseudo-sum is pX ⊕ pY ⊕ pZ. In the FAAs of the SHA-512 circuit, the eight parity bits of the pseudo-sum are therefore predicted by XORing the parity bits of the respective bytes of the operands. A parity vector generator computes the parity vector of the pseudo-sum. The predicted parity vector is then compared with the generated parity vector of the pseudo-sum. The parity prediction and checking procedure is shown in Fig. 6a for the pseudo-sum output of the FAA, wherein X, Y and Z are 8-bit operands. Implementation of this scheme requires a parity generator block for the pseudo-sum, which is constructed with an 8-bit XOR for every byte of the pseudo-sum; each such XOR generates 1 bit of the parity vector.
(3) Pseudo-carry output of FAA, logic and rotate/shift operations: For the pseudo-carry output of the FAA, the logic operations Ch and Maj and the rotate/shift operations σ1, σ0, Σ0 and Σ1, no parity codes can be generated as such, since these operations are non-linear in nature and any attempt to create a coding scheme results in a marked increase in hardware overhead. Moreover, rotate/shift operations cannot be performed on bytes, which adds to the complexity of a coding scheme. Hence, parity prediction is done for these operations by duplicating the operation block and generating the parity vectors of the original block and the duplicated block separately. The parity vectors so generated are then compared to detect errors. Two schemes were attempted, which are shown in Fig. 6b and c. The scheme chosen for implementation is that of Fig. 6b, as its hardware overhead was lower than that of the scheme in Fig. 6c. For example, the σ0 function used 44 logic elements when implemented using the scheme in Fig. 6b, whereas the scheme in Fig. 6c required 83 logic elements.
Propagation of errors by a CLA was discussed in Section 3.1. As the carries are propagated to subsequent cells, any error in a carry will spill over to other bits. A scheme for predicting the parity bit is shown in Fig. 7.
If A and B are the pseudo-sum and pseudo-carry from the FAA, pA and pB are their respective parity bits, C_0 is the carry-in, pS is the parity of the final sum output and RC_(n-1) is the XOR of all the carries generated in all the cells excluding the last cell, then predicted parity = pA ⊕ pB ⊕ RC_(n-1) ⊕ C_0. The parity check then involves comparing pS with the predicted parity. For example, let A = 1001 and B = 0111; then pA = 0 and pB = 1, and pS = 0 as the sum is 0000. The carry-in is C_0 = 0, the carries C_3 C_2 C_1 C_0 are 1110 and the carry-out is C_4 = 1. The XOR of all the carries generated in all the cells excluding the last cell, i.e. excluding C_4, is RC_3 = C_1 ⊕ C_2 ⊕ C_3 = 1 ⊕ 1 ⊕ 1 = 1.
The predicted parity computed by the expression is pA ⊕ pB ⊕ RC_3 ⊕ C_0 = 0 ⊕ 1 ⊕ 1 ⊕ 0 = 0, which is the same as pS. In the implementation of the adder, the required modifications are carried out to handle the parity generation.
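The worked example can be checked with a small behavioural sketch of the parity prediction for a carry-propagating adder. It uses a single parity bit over an n-bit word and derives the internal carries from the arithmetic result, whereas in hardware they would be tapped from the adder cells; the function names are hypothetical.

```python
def parity(x):
    """Even parity (XOR of all bits) of a non-negative integer."""
    return bin(x).count("1") & 1


def predict_sum_parity(a, b, c0=0, n=64):
    """Predicted parity of the n-bit sum: pA xor pB xor RC_(n-1) xor C_0."""
    total = a + b + c0
    carries = a ^ b ^ total            # bit i = carry into cell i, bit n = carry-out
    rc = parity((carries >> 1) & ((1 << (n - 1)) - 1))  # C_1 .. C_(n-1)
    return parity(a) ^ parity(b) ^ rc ^ c0


def sum_parity_ok(sum_out, a, b, c0=0, n=64):
    """Compare the parity of the observed sum with the predicted parity."""
    return parity(sum_out & ((1 << n) - 1)) == predict_sum_parity(a, b, c0, n)


# Example from the text: A = 1001, B = 0111, 4-bit adder, sum = 0000.
assert predict_sum_parity(0b1001, 0b0111, n=4) == 0
assert sum_parity_ok(0b0000, 0b1001, 0b0111, n=4)
assert not sum_parity_ok(0b0100, 0b1001, 0b0111, n=4)  # one sum bit flipped
```

Since each sum bit is a_i ⊕ b_i ⊕ c_i, XORing all sum bits gives exactly the expression above, so any single flipped sum bit makes the observed parity disagree with the prediction.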
The prediction scheme for a digest round of an SHA-512 is constructed by cascading the individual prediction schemes discussed in this section.
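Two of the building blocks that are cascaded can be sketched behaviourally: parity prediction for the pseudo-sum of a FAA, and a duplication-based check for a non-linear block such as the pseudo-carry (Maj) logic. This only illustrates the principle described in this section and is not the circuit of Fig. 5 or Fig. 6; all function names are hypothetical.

```python
def parity_vector(x):
    """Eight even-parity bits of a 64-bit operand, one per byte."""
    pv = 0
    for i in range(8):
        pv |= (bin((x >> (8 * i)) & 0xFF).count("1") & 1) << i
    return pv


def maj(x, y, z):
    """Bitwise majority, the logic behind the pseudo-carry (before the shift)."""
    return (x & y) | (x & z) | (y & z)


def pseudo_sum_ok(x, y, z, observed_sum):
    """Linear case: predict the parity vector by XORing the operand parity vectors."""
    predicted = parity_vector(x) ^ parity_vector(y) ^ parity_vector(z)
    return predicted == parity_vector(observed_sum)


def duplicated_ok(op, inputs, observed):
    """Non-linear case: recompute on a duplicate block and compare parity vectors."""
    return parity_vector(op(*inputs)) == parity_vector(observed)


x, y, z = 0x0123456789ABCDEF, 0x0F0F0F0F0F0F0F0F, 0xDEADBEEFCAFEF00D
assert pseudo_sum_ok(x, y, z, x ^ y ^ z)
assert not pseudo_sum_ok(x, y, z, (x ^ y ^ z) ^ (1 << 9))
assert duplicated_ok(maj, (x, y, z), maj(x, y, z))
assert not duplicated_ok(maj, (x, y, z), maj(x, y, z) ^ (1 << 33))
```

Comparing parity vectors rather than full 64-bit outputs keeps the comparator to eight bits per block, at the cost of missing even-weight errors within a byte.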
Parity checking points
The parity bits are checked at the output of all operational blocks except at the register outputs where only the transfer of parity bits is done. The total number of checkpoints is 25, which are marked in Fig. 5 . Experiments were conducted by injecting a single fault at each of these check points and we found that all 25 check points are required as each block assumes that the incoming parities are error free. Moreover, checking parity at the output of every operation prevents transmission of errors to subsequent operational blocks and provides the shortest detection latency.
EXPERIMENTAL RESULTS
The parity scheme for the SHA-512 algorithm proposed in this paper was initially implemented in software in order to perform exhaustive analysis as the hardware implementation on FPGA is too complex. The algorithm was later designed and tested using comprehensive design software, the Altera Quartus II, version 5.0. The designs were synthesized using Verilog HDL and VHDL, placed and routed in Altera device EP1S20F780C5 of Stratix family FPGA. The design was tested for a large number of test cases encompassing the different types of expected errors. The test cases can be classified into three categories on the basis of position of error introduced.
In the first category, a message of one data block size was chosen and a single fault was injected as discussed in Section 3.2. In each of the 80 rounds, the error position was randomly chosen among the 64 bits of W_t in order to simulate transient errors. Ten such blocks were tested in the same manner, amounting to 10 × 80 = 800 test cases. These tests were used to detect errors in digest rounds. Next, messages with varying numbers of blocks were used and 10 such messages were tested with a single fault injected in each of the 80 rounds, giving rise to the same number of test cases as above. This tests errors in the block loop. The third category of test cases was used to detect permanent errors. A message of one data block size was chosen and a single fault was injected in every bit of W_t in each of the 80 rounds. Five such blocks were tested in the same manner, amounting to 5 × 64 × 80 = 25 600 test cases. In all cases, the errors were injected in the bits of W_t and 100% error detection was achieved.
The performance metrics area (a) and memory (m) were obtained from the tool, and throughput (d) was computed for a digest round of the SHA-512 algorithm with and without error detection capability; these performance metrics are presented in Table 3. The resources used, in terms of the number of logic elements for the implementation of the algorithm, are referred to as the area. A memory segment consists of a bit-slice of a memory that is implemented in a single embedded cell. Each embedded cell implements one output of the memory. From Table 3, the number of additional logic elements required for the error detection circuits = (5038 − 4158) = 880, which amounts to a hardware overhead of 21%, i.e. (880/4158) × 100. The delay overhead computed from the reciprocal of the frequency of both circuits is 11.6%. These parameters show that our scheme is superior to the complete hardware redundancy technique in terms of hardware overhead and has much less delay overhead than time redundancy techniques.
CONCLUSIONS
In this paper, a detailed analysis of the propagation of errors in a hardware implementation of SHA-512 is presented. This analysis included single, transient as well as permanent faults injected at all stages of the hash value computation. It is found that even a single injected error resulted in half the bits of the hash value being in error, with the errors spread across the computed hash value. In-depth analysis of the individual operations of SHA-512 guided us in proposing suitable parity prediction schemes. We proposed an error detection scheme based on parity codes and hardware redundancy, and our experiments conducted on a large number of test cases show that our scheme has 100% fault coverage in the case of single errors. The computed performance metrics, such as area and throughput, indicate that our scheme has an area overhead of 21% and a delay overhead of 11.6%. Future work will be directed toward developing error detection schemes with pipelined architecture and developing error correction codes for the SHA-512 algorithm.
