Abstract: Satellites are extensively used by public and private sectors to support a variety of services. Considering the cost and the strategic importance of these spacecraft, it is fundamental to use strong cryptographic primitives to assure their security. However, due to the harsh environment found in space, it is also of utmost importance to consider fault tolerance in their designs while keeping area and power consumption low. Therefore, this paper proposes novel fault-tolerant schemes for the SHA-2 family of hash functions and analyzes their resistance to Single Event Upsets (SEUs). Results obtained through FPGA implementation show that our best fault-tolerant scheme for SHA-512 uses up to 32% less area and consumes up to 43% less power than the commonly used Triple Modular Redundancy (TMR) technique. Moreover, its memory and registers are, respectively, 435 and 175 times more resistant to SEUs than those of TMR. These results are crucial for supporting low-area, low-power fault-tolerant cryptographic primitives in satellites.
INTRODUCTION
Satellites are extensively used by public and private sectors to support communication services, conduct scientific experiments, provide navigation and meteorological support, or increase homeland security. Some countries also employ commercial satellites for military communications [1]. The Consultative Committee for Space Data Systems (CCSDS) has highlighted in [2] that advances in technology allow complex attacks to be easily carried out against satellites, making security an important goal in satellite designs. Considering the cost and the strategic importance of these spacecraft, it is not advisable to rely on the uniqueness and obscurity of their designs to assure their security. In fact, due to the lack of appropriate security, some satellites have already been compromised [3], [4], [5]. Threats to satellites can pose severe risks to communications infrastructures [1], [6], and architectures must therefore be designed to provide security services such as authentication, data integrity and confidentiality.
978-1-4244-2622-5/09/$25.00 © 2009 IEEE
Security architectures [7], [8] have recently been proposed to authenticate communications between ground stations and spacecraft or to provide security services to data processed on-board. To do so, they use cryptographic primitives such as hash functions for integrity checking and authentication, and block ciphers for confidentiality. These architectures face implementation challenges in protecting the security mechanisms from the harsh space environment. Radiation can hit a satellite's circuitry and cause errors known as Single Event Upsets (SEUs) [9]. SEUs are a form of soft error: they cause transient bit flips but do not damage the hardware.
Spacecraft include both ASICs and a variety of FPGAs in their subsystems. ASICs and non-volatile FPGAs (Anti-Fuse or Flash) offer increased SEU resistance compared to FPGAs based on Static Random Access Memory (SRAM) technology, because the construction of SRAM cells makes them more sensitive to SEUs. In some cases, a single bit-flip in a configuration element of an SRAM FPGA can disrupt the entire functioning of the design implemented in the chip. However, the low production volume of satellites makes FPGAs an attractive way to reduce non-recurring engineering costs. In addition, their reconfigurability allows post-launch updates and patches to the satellite hardware. SRAM FPGAs, in particular, provide both high density and high speed, and therefore offer a good trade-off between performance and flexibility. To overcome the issue of SEUs occurring in configuration elements, designers using SRAM FPGAs implement methods such as read-back, CRC checking and reconfiguration. These methods consist of reading the configuration back from the FPGA, checking its correctness using a CRC, and then reconfiguring the device with a correct bitstream. Device hardening is another technique to make FPGAs more resistant to radiation.
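The read-back, CRC check and reconfiguration loop can be sketched as follows. This is a minimal Python model under stated assumptions, not a vendor flow: the bitstream is a plain byte string, `zlib.crc32` stands in for whatever CRC the device or external scrubber actually uses, the golden copy is assumed to be kept in radiation-tolerant storage, and the helper names `scrub` and `flip_bit` are hypothetical.

```python
import zlib
from typing import Tuple


def scrub(readback: bytes, golden: bytes, golden_crc: int) -> Tuple[bytes, bool]:
    """Return (configuration to keep, whether a reconfiguration was needed).

    `readback` models the configuration read back from the FPGA; `golden` and
    `golden_crc` model a trusted copy of the bitstream and its CRC-32.
    """
    if zlib.crc32(readback) == golden_crc:
        return readback, False          # configuration is intact
    return golden, True                 # mismatch: reconfigure with the golden bitstream


def flip_bit(data: bytes, bit: int) -> bytes:
    """Model a single SEU by inverting one bit of the configuration."""
    buf = bytearray(data)
    buf[bit // 8] ^= 1 << (bit % 8)
    return bytes(buf)


golden = bytes(range(256)) * 4          # toy 1 KiB "bitstream"
golden_crc = zlib.crc32(golden)

config, reconfigured = scrub(golden, golden, golden_crc)
print(reconfigured)                     # False

upset = flip_bit(golden, 1234)          # one SEU in the configuration memory
config, reconfigured = scrub(upset, golden, golden_crc)
print(reconfigured, config == golden)   # True True
```

A CRC detects every single-bit error, which is why one comparison per read-back pass is enough to trigger reconfiguration in this model; note that, as the text explains, this only protects the configuration, not the user data.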
Considering that the hardware description is protected against SEUs either by the underlying technology (i.e., ASICs and non-volatile FPGAs) or by the aforementioned techniques, the present work focuses on protecting the data processed by the device. Indeed, the issue of SEUs in registers and memory persists for all kinds of FPGAs as well as for ASICs. Therefore, whatever the underlying technology, we assume that the configuration of the device is free of errors, and the proposed techniques target error detection and correction in the data processed within the device.
This work focuses on the fault-tolerant design of hash functions for space applications. Hash functions are employed in satellite systems in many different ways. They are used as building blocks in authentication schemes such as digital signatures [10], [11] and Hash-based Message Authentication Codes (HMAC) [12]. These cryptographic primitives allow the integrity of data received from a ground station to be checked, assuring that the data were not accidentally or maliciously modified. Hash functions can also be employed in intrusion detection and recovery schemes [7]. Such schemes make it possible, for example, to determine whether an attacker who may have broken into the satellite has tampered with the system's program memories or FPGA configuration. Given these applications, a single bit-flip can have disastrous consequences, such as provoking an unjustified satellite reset or intrusion alert. Therefore, SEUs must be properly addressed to guarantee the correct and reliable operation of spacecraft employing cryptographic algorithms, and fault-tolerant designs for hash functions are required.
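The integrity-checking role of HMAC, and why a single flipped bit matters, can be illustrated with Python's standard `hmac` and `hashlib` modules. This is a sketch only: the key, the command string and the choice of HMAC-SHA-256 are hypothetical examples, not the protocol of any particular satellite.

```python
import hashlib
import hmac


def tag(key: bytes, message: bytes) -> bytes:
    """HMAC-SHA-256 tag computed by the sender (e.g., a ground station)."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, received_tag: bytes) -> bool:
    """Receiver-side check; compare_digest avoids timing side channels."""
    return hmac.compare_digest(tag(key, message), received_tag)


key = b"example-shared-key"              # hypothetical shared key
command = b"SET_ATTITUDE 12.5 -3.0 0.7"  # hypothetical telecommand
t = tag(key, command)

print(verify(key, command, t))           # True

# Model one SEU in the buffer holding the command (bit 0 of byte 4).
corrupted = bytearray(command)
corrupted[4] ^= 0x01
print(verify(key, bytes(corrupted), t))  # False
```

The same failure mode appears if the SEU hits the hash circuit's internal state instead of the message: a legitimate command is then rejected, which is exactly the kind of unjustified alert the text warns about.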
In this paper we investigate fault-tolerant architectures for the family of Secure Hash Algorithms (SHA) [13], which are the most commonly used hash functions in integrity checking and authentication architectures. More specifically, the SHA-2 family of hash functions is recommended for adoption by the CCSDS as a standard for space systems [14]. Thus, we propose novel fault-tolerant solutions for SHA-2 that combine existing techniques, i.e., Triple Modular Redundancy (TMR) and Hamming Codes (HC), in order to offer several performance and cost trade-offs. Moreover, while the applicability of these solutions is independent of the underlying technology, we provide a detailed study of power consumption and area based on FPGA implementations. We also analyze the robustness of each scheme against SEUs and show that the proposed solutions offer better resistance to bit-flips than traditional TMR. For instance, the best scheme proposed for SHA-512 uses up to 32% less area and consumes up to 43% less power than TMR. Furthermore, compared to TMR, its memory and registers are, respectively, 435 and 175 times more resistant to SEUs.
The remainder of this paper is organized as follows. Related work is presented in Section 2. Section 3 describes the SHA-2 algorithms, while Section 4 introduces non-fault-tolerant designs for SHA-2. The fault-tolerant architectures proposed in this work are presented in Section 5, and their robustness against SEUs is evaluated in Section 6. Section 7 reports our experimental results in terms of power, area and frequency of operation, and provides a comprehensive comparison of the proposed schemes applied to the SHA-2 family of hash functions. Our conclusions are presented in Section 8.
RELATED WORK
With the growing worldwide demand for satellite-based services, the dependence on these spacecraft tends to increase. Consequently, the potential losses caused by a satellite failure tend to grow over the years, and the disruption of satellite services, whether intentional or not, can have a major economic impact. This context motivated the United States General Accounting Office to issue a report [1] presenting several threats to satellite systems. Its conclusion stresses that the security of commercial satellites should be more fully addressed in order to achieve higher levels of protection for the country's critical infrastructure. The CCSDS has also highlighted [2] the importance of including security in space missions. In fact, several proposals have been made for the CCSDS to standardize the use of strong cryptographic mechanisms for integrity checking, authentication and encryption [14], [15].
Several security architectures have been proposed for satellites to remotely manage their hardware configuration from the ground station [16], and for key distribution [17] and management [18], [19], [20]. All of these works use cryptographic primitives such as hash functions to achieve their goals, but none of them considers the harsh environment surrounding spacecraft, and more specifically the resistance to SEUs. Only [7] proposes a security scheme for key recovery in satellites with SEU-resistant hardware. The authors, however, employ existing fault-tolerance techniques such as TMR and HC and do not investigate new approaches to fault tolerance.
Hardware implementations of hash functions have been proposed in several works. In [21], a single-chip FPGA implementation of SHA-384 and SHA-512 is introduced. A SHA-256 processor, also implemented on FPGAs, is presented in [22]. In [23], [24], the whole SHA-2 family is implemented in FPGAs and compared in terms of area, frequency of operation and throughput. Other works [25], [26], [27] provide further comparisons of FPGA-based implementations of hash functions. Optimizations based on pipelining, loop unrolling, operation rescheduling and hardware reuse are proposed in [28], [29]. Previous research has thus extensively addressed the implementation of the SHA-2 family of hash functions. However, none of these works considers resistance against SEUs, so they are not appropriate for space applications. Only [30] considers error detection for SHA-512 in FPGAs. Since that approach employs parity prediction for the internal hash function operations, it can detect errors but, unlike the schemes proposed in this paper, cannot correct them.
Fault-tolerant designs of cryptographic primitives have mainly been proposed for block ciphers such as the Advanced Encryption Standard (AES) [31]. In [8], fault detection and correction capabilities are added to an AES implementation on FPGAs. SEUs are detected in each round transformation using parity prediction, and corrected using Hamming codes applied to the round data matrix. Another approach is proposed in [32], in which single bit-flips in the substitution box of the AES algorithm are detected using look-up tables and parity prediction.
Chips can be made resistant to SEUs through different techniques such as radiation hardening, which is the case for Actel's Anti-Fuse [33] and Flash [34] FPGAs. Anti-Fuse devices provide higher reliability than Flash ones, but their main drawback is that they can be configured only once. Xilinx also provides radiation-hardened SRAM-based FPGAs [35] to meet the requirements of space applications. SRAM-based FPGAs are available in higher densities than their Anti-Fuse and Flash counterparts, but they are more sensitive to SEUs because of the characteristics of the SRAM cell itself. Altera also proposes a strategy to protect the configuration of SRAM-based FPGAs against SEUs at runtime: some Altera FPGAs include built-in Cyclic Redundancy Check (CRC) circuitry [36]. Although this strategy can check the FPGA's internal configuration, it does not detect errors in the user data stored or processed within the device. Other designs employ ASICs to achieve high reliability, at the expense of very high non-recurring engineering costs. However, whatever the underlying technology, the user data being processed within the device remain unprotected against SEUs.
Aerospace applications have traditionally used techniques based on modular redundancy to mitigate SEUs. For instance, TMR combined with FPGA reconfiguration is proposed in [37] to fully protect systems from SEUs. However, TMR can be very costly: it requires the triplication of all architectural elements along with a voter, thus demanding considerable resources. Attempts were made in [38] to reduce the costs associated with fault mitigation by applying TMR only to the most critical components of a design.
In contrast to previous research, we propose novel fault-tolerant designs of the SHA-2 family of hash functions. To this end, we investigate original combinations of TMR and Hamming codes (HC) that support not only error detection but also error correction. We explore different levels of granularity involving TMR and HC, and analyze all the implementations in terms of area, frequency of operation, throughput and power consumption. Additionally, this research analyzes the resistance of each new scheme against SEUs and compares it to that of traditional TMR.
SHA-2 ALGORITHMS
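In this section, the SHA-2 family of hash algorithms [13] is described. Before the hash computation begins, the eight initial hash values, $H_0^{(0)}, \dots, H_7^{(0)}$, are set. Each algorithm uses a distinct set of initial hash values given in [13].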
Hash Computation
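The entire computation of the message digest is based on operations over D-bit words, where D = 32 for SHA224/256 and D = 64 for SHA384/512. A sequence of constant D-bit words $K_t$, whose values are given in [13], is also used. Furthermore, six logical functions are employed, as shown below. The operations $ROTR^n(x)$ and $SHR^n(x)$ denote, respectively, the rotation and the shift of x by n bits to the right. The functions Ch and Maj have the same definition in all four algorithms:
$Ch(x, y, z) = (x \wedge y) \oplus (\neg x \wedge z)$
$Maj(x, y, z) = (x \wedge y) \oplus (x \wedge z) \oplus (y \wedge z)$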
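SHA-224 and SHA-256:
$\Sigma_0(x) = ROTR^{2}(x) \oplus ROTR^{13}(x) \oplus ROTR^{22}(x)$
$\Sigma_1(x) = ROTR^{6}(x) \oplus ROTR^{11}(x) \oplus ROTR^{25}(x)$
$\sigma_0(x) = ROTR^{7}(x) \oplus ROTR^{18}(x) \oplus SHR^{3}(x)$
$\sigma_1(x) = ROTR^{17}(x) \oplus ROTR^{19}(x) \oplus SHR^{10}(x)$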
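SHA-384 and SHA-512:
$\Sigma_0(x) = ROTR^{28}(x) \oplus ROTR^{34}(x) \oplus ROTR^{39}(x)$
$\Sigma_1(x) = ROTR^{14}(x) \oplus ROTR^{18}(x) \oplus ROTR^{41}(x)$
$\sigma_0(x) = ROTR^{1}(x) \oplus ROTR^{8}(x) \oplus SHR^{7}(x)$
$\sigma_1(x) = ROTR^{19}(x) \oplus ROTR^{61}(x) \oplus SHR^{6}(x)$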
Then, for each message block i, 1 ≤ i ≤ N, a four-step digest round is performed as follows, where all additions are performed modulo $2^D$:
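Step 1: Initialize the eight working variables:
$a = H_0^{(i-1)},\ b = H_1^{(i-1)},\ c = H_2^{(i-1)},\ d = H_3^{(i-1)},\ e = H_4^{(i-1)},\ f = H_5^{(i-1)},\ g = H_6^{(i-1)},\ h = H_7^{(i-1)}$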
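Step 2: Prepare the message schedule, where $M_t^{(i)}$ is the t-th D-bit word of message block i:
$W_t = M_t^{(i)}$, for $0 \le t \le 15$
$W_t = \sigma_1(W_{t-2}) + W_{t-7} + \sigma_0(W_{t-15}) + W_{t-16}$, for $16 \le t \le j-1$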
The number of words processed by the message scheduler is given by j, which also corresponds to the number of iterations performed by the algorithm: j = 64 for SHA224/256, whereas j = 80 for SHA384/512.
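Step 3: For t = 0 to j − 1 do:
$T_1 = h + \Sigma_1(e) + Ch(e, f, g) + K_t + W_t$
$T_2 = \Sigma_0(a) + Maj(a, b, c)$
$h = g,\ g = f,\ f = e,\ e = d + T_1,\ d = c,\ c = b,\ b = a,\ a = T_1 + T_2$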
Step 4: Compute the i-th intermediate hash value:
$H_0^{(i)} = a + H_0^{(i-1)},\ H_1^{(i)} = b + H_1^{(i-1)},\ \dots,\ H_7^{(i)} = h + H_7^{(i-1)}$
After all N blocks of message M have been processed, the final message digest is obtained by concatenating the final hash values $H_k^{(N)}$. More precisely, the message digest of each algorithm is given by the concatenations shown below, where the concatenation of words is represented by the symbol ||.
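SHA-224: $H_0^{(N)} \| H_1^{(N)} \| H_2^{(N)} \| H_3^{(N)} \| H_4^{(N)} \| H_5^{(N)} \| H_6^{(N)}$
SHA-256 and SHA-512: $H_0^{(N)} \| H_1^{(N)} \| \dots \| H_7^{(N)}$
SHA-384: $H_0^{(N)} \| H_1^{(N)} \| H_2^{(N)} \| H_3^{(N)} \| H_4^{(N)} \| H_5^{(N)}$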
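To make Steps 1 to 4 concrete, the following Python sketch implements SHA-256 (D = 32, j = 64) in software so that it can be checked against Python's `hashlib`. It models the algorithm only, not the hardware datapath. Two choices are assumptions of the sketch rather than part of the text: message padding (handled outside the datapath in this paper) is included so the output is comparable with `hashlib`, and the constants $K_t$ and $H^{(0)}$ are derived from the first 64 primes as specified in the standard [13] instead of being read from a constants memory.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF  # D = 32 for SHA-224/256


def first_primes(count):
    primes, n = [], 2
    while len(primes) < count:
        if all(n % p for p in primes if p * p <= n):
            primes.append(n)
        n += 1
    return primes


def icbrt(n):
    """Largest integer x with x**3 <= n (binary search)."""
    lo, hi = 0, 1 << ((n.bit_length() + 2) // 3)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


PRIMES = first_primes(64)
# First 32 fractional bits of cube roots (K_t) and square roots (H^(0)).
K = [icbrt(p << 96) & MASK for p in PRIMES]
H0 = [math.isqrt(p << 64) & MASK for p in PRIMES[:8]]


def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def ch(x, y, z): return (x & y) ^ (~x & z)
def maj(x, y, z): return (x & y) ^ (x & z) ^ (y & z)
def big_s0(x): return rotr(x, 2) ^ rotr(x, 13) ^ rotr(x, 22)
def big_s1(x): return rotr(x, 6) ^ rotr(x, 11) ^ rotr(x, 25)
def small_s0(x): return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3)
def small_s1(x): return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10)


def pad(message):
    """Append 0x80, zeros, and the 64-bit big-endian message bit length."""
    bit_len = 8 * len(message)
    message += b"\x80" + b"\x00" * ((55 - len(message)) % 64)
    return message + struct.pack(">Q", bit_len)


def sha256(message, j=64):
    H = list(H0)
    padded = pad(message)
    for off in range(0, len(padded), 64):            # message blocks i = 1..N
        a, b, c, d, e, f, g, h = H                   # Step 1
        W = list(struct.unpack(">16I", padded[off:off + 64]))
        for t in range(16, j):                       # Step 2
            W.append((small_s1(W[t - 2]) + W[t - 7]
                      + small_s0(W[t - 15]) + W[t - 16]) & MASK)
        for t in range(j):                           # Step 3
            t1 = (h + big_s1(e) + ch(e, f, g) + K[t] + W[t]) & MASK
            t2 = (big_s0(a) + maj(a, b, c)) & MASK
            h, g, f, e = g, f, e, (d + t1) & MASK
            d, c, b, a = c, b, a, (t1 + t2) & MASK
        H = [(x + y) & MASK for x, y in zip(H, (a, b, c, d, e, f, g, h))]  # Step 4
    return b"".join(struct.pack(">I", x) for x in H)  # H_0 || ... || H_7


print(sha256(b"abc") == hashlib.sha256(b"abc").digest())  # True
```

The tuple assignments in Step 3 read all right-hand values before writing, which mirrors the parallel register update of the shift-register datapath described in the next section.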
HARDWARE DESIGN
In this section, a non-fault-tolerant hardware design for the SHA-2 algorithms is presented; this design is used in the following sections to support the description of the evaluated fault-tolerant techniques. It consists of shift registers, logical operations, D-bit adders, and a memory to store the algorithm's initialization values and constants. As in the hash hardware implementations mentioned in Section 2, we do not perform message padding in hardware; our main focus is on the hash computation datapath.
As shown in Figure 1, the architectural elements of the SHA-2 implementation can be divided into four main blocks: Intermediate Hash Computation, Compressor, Message Scheduler, and Constants Memory. The constants memory uses the FPGA's RAM blocks. Since this design does not involve any kind of fault tolerance, it is referred to as NoFT.
At the end of the j iterations, the intermediate hash computation must be performed. This operation could be executed in one clock cycle, but it would require eight adders. In order to save implementation area, only two adders are used instead, and the computation of the intermediate hash is spread over the last four iterations, with two additions per clock cycle. More precisely, for SHA224/256 the additions are performed when t = 60, ..., 63: when t = 60, $H_3$ and $H_7$ are computed; when t = 61, $H_2$ and $H_6$ are computed; and so on. SHA384/512 follows the same strategy, but the additions are executed when t = 76, ..., 79.
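The two-adder schedule can be written as a small mapping from iteration to the pair of hash words updated in that cycle. This Python sketch assumes that the "and so on" in the text continues the same descending pattern (t = j − 2 updates $H_1$ and $H_5$, t = j − 1 updates $H_0$ and $H_4$); the function name is hypothetical.

```python
def intermediate_hash_schedule(j):
    """Map iteration t -> indices (k1, k2) of the two H words added in that cycle.

    With two adders, the eight additions H_k = H_k + (working variable) are
    spread over the last four iterations, starting with (H_3, H_7) at t = j - 4.
    """
    return {j - 4 + k: (3 - k, 7 - k) for k in range(4)}


print(intermediate_hash_schedule(64))  # {60: (3, 7), 61: (2, 6), 62: (1, 5), 63: (0, 4)}
print(intermediate_hash_schedule(80))  # {76: (3, 7), 77: (2, 6), 78: (1, 5), 79: (0, 4)}
```

Every one of the eight hash words is updated exactly once, so two adders suffice at the cost of occupying the adders during the last four iterations.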
In the case of a multi-block message, a new execution cycle begins with 16 more words $M_t$ being shifted into the module, and the same procedure described above is executed. For the last message block, read operations are performed to shift out the message digest. Specifically, SHA-224, SHA-256, SHA-384 and SHA-512 require, respectively, 7, 8, 6, and 8 read operations.
The total memory required to store the constants $K_t$ and the initial hash values $H_0^{(0)}, \dots, H_7^{(0)}$ is 2304 bits for SHA224/256 and 5632 bits for SHA384/512, as shown in Table 1. Given the variables $W_0, \dots, W_{15}$, a, ..., h and $H_0, \dots, H_7$, the total register requirements are 1024 bits for SHA224/256 and 2048 bits for SHA384/512, as listed in Table 2. Considering the number of iterations involved in a hash computation, a single bit-flip in a memory that provides constants or input blocks, or in registers propagating intermediate values, can be devastating for the applications using the hash function (e.g., data integrity checking, authentication). In order to make hash function designs appropriate for space applications, error detection and correction schemes must be incorporated so that SEUs do not compromise their normal operation. This issue is addressed in Section 5.
To show the improvements offered by these new solutions, we implemented each scheme for every hash function of the SHA-2 family (i.e., SHA-224, SHA-256, SHA-384, and SHA-512). In the following, SHA-2 refers to all four functions of the family; when a distinction between functions is required, we name the specific hash function. For Hamming codes, we use the notation (w,v), where v is the number of data bits and w is the total number of data and parity bits. Tables 1 and 2 summarize the memory and register requirements discussed below.
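The (w,v) codes used below are single-error-correcting Hamming codes. As an illustrative sketch (not the paper's encoder and decoder circuits), the Python code below builds a textbook Hamming code with parity bits at power-of-two positions; this bit ordering is an assumption, since the paper does not specify one. It reproduces the (38,32) and (71,64) parameters and shows a single flipped bit being corrected.

```python
def parity_bits(v):
    """Smallest p with 2**p >= v + p + 1 (single-error-correcting Hamming code)."""
    p = 0
    while (1 << p) < v + p + 1:
        p += 1
    return p


def _syndrome(code, w):
    """XOR of the (1-based) positions of all set bits in the codeword."""
    s = 0
    for pos in range(1, w + 1):
        if (code >> (pos - 1)) & 1:
            s ^= pos
    return s


def encode(data, v):
    """Encode v data bits into a w-bit codeword, w = v + parity_bits(v).

    Positions 1..w; powers of two hold parity bits, the rest hold data bits.
    """
    p = parity_bits(v)
    w = v + p
    code, d = 0, 0
    for pos in range(1, w + 1):
        if pos & (pos - 1):                  # not a power of two: data position
            if (data >> d) & 1:
                code |= 1 << (pos - 1)
            d += 1
    s = _syndrome(code, w)
    for i in range(p):                       # choose parity bits so the syndrome is 0
        if (s >> i) & 1:
            code |= 1 << ((1 << i) - 1)
    return code


def decode(code, v):
    """Correct at most one flipped bit, then extract the v data bits."""
    w = v + parity_bits(v)
    s = _syndrome(code, w)
    if s:                                    # syndrome = position of the flipped bit
        code ^= 1 << (s - 1)
    data, d = 0, 0
    for pos in range(1, w + 1):
        if pos & (pos - 1):
            if (code >> (pos - 1)) & 1:
                data |= 1 << d
            d += 1
    return data


word = 0xDEADBEEF
codeword = encode(word, 32)                  # a (38,32) codeword
print(decode(codeword ^ (1 << 17), 32) == word)      # True: one SEU corrected
print(32 + parity_bits(32), 64 + parity_bits(64))    # 38 71
```

Two flips in the same codeword produce a wrong syndrome and are not corrected, which is the failure condition used for the encoded schemes in Section 6.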
Full Triple Modular Redundancy
As described in Section 2, triple modular redundancy consists of triplicating the circuit and using a voter to determine the output. In this work, three SHA-2 hardware modules are instantiated and share the same inputs, as depicted in Figure 2. This scheme is referred to as FullTMR for short. During the implementation of FullTMR, special attention was paid to design partitioning to prevent the synthesizer from merging common circuitry and registers, which would lead to misleading synthesis results. FullTMR is used only as a reference model for the comparisons performed in the next sections.
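A TMR voter is typically a bitwise two-out-of-three majority function. The paper does not detail its voter circuit, so the following Python sketch assumes this usual form and applies it to the outputs of three modules.

```python
def vote(x: int, y: int, z: int) -> int:
    """Bitwise 2-out-of-3 majority: each output bit equals the value held
    by at least two of the three inputs."""
    return (x & y) | (x & z) | (y & z)


digest = 0x0123456789ABCDEF
print(hex(vote(digest, digest, digest)))         # 0x123456789abcdef
print(hex(vote(digest ^ 0xFF, digest, digest)))  # 0x123456789abcdef (faulty module masked)
```

The voter masks any number of wrong bits coming from a single module, but once two modules disagree with the correct value on the same bit, the majority itself is wrong.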
Figure 2. FullTMR block diagram
Since FullTMR uses three instances of the NoFT module, it triplicates the memory and register requirements: the memory requirements of FullTMR are 6912 bits for SHA224/256 and 16896 bits for SHA384/512, and the register requirements are 3072 bits for SHA224/256 and 6144 bits for SHA384/512. An advantage of FullTMR is that it adds fault tolerance without a large impact on the module's speed, since the three NoFT modules work in parallel. On the other hand, replication imposes a large area penalty. Thus, other schemes are proposed to reduce the memory requirements and to achieve a smaller implementation area and lower power consumption.
Regarding SEU resistance, FullTMR has three times as much memory and as many registers as the NoFT module. Since these memories do not employ any fault-tolerant mechanism, a single bit-flip in one of them compromises the processing of the corresponding NoFT module. A second bit-flip in any location of the other two memories compromises the entire FullTMR module. The same failure condition applies to the registers: one bit-flip in a register of one of the modules, and a second bit-flip in any of the registers of the other two modules. Considering that spacecraft may have mission lifetimes reaching decades, multiple bit-flips in the memory elements are very likely. As a result, it is very important to protect memory against SEUs while reducing memory requirements; this is the idea behind the scheme presented in the following section.
TMR with Shared Encoded Memory
In order to scale down the number of memory bits used, as well as the power consumption, the constants memory can be shared among the three SHA-2 modules. This scheme, named TMR&HCMem, is depicted in Figure 3. However, the shared memory needs to be protected against SEUs. This is accomplished by encoding the memory with a (38,32) Hamming code for SHA224/256 and a (71,64) Hamming code for SHA384/512. On each memory read, a Hamming decoder detects and corrects any single bit-flip, thus sending the correct value to the modules. The use of a single memory in TMR&HCMem decreases the memory overhead compared to FullTMR, even with parity bits attached to each memory element: the memory requirements of TMR&HCMem are 2736 bits for SHA224/256 and 6248 bits for SHA384/512. The consequence of using HC, though, is the inclusion of the Hamming decoder between the memory and the modules, which lengthens the critical path of the circuit and thus decreases its frequency of operation. The register requirements are exactly the same as in FullTMR, i.e., 3072 bits for SHA224/256 and 6144 bits for SHA384/512.
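The memory figures quoted for NoFT, FullTMR and TMR&HCMem follow directly from the word counts: j constants plus eight initial hash values per design, stored either as plain D-bit words or as w-bit Hamming codewords. The short Python check below reproduces them; it is only arithmetic over numbers already given in the text, and the dictionary layout is an arbitrary choice.

```python
# D-bit words, j iterations (= number of K_t constants), codeword width w
PARAMS = {
    "SHA224/256": {"D": 32, "j": 64, "w": 38},
    "SHA384/512": {"D": 64, "j": 80, "w": 71},
}


def memory_bits(variant):
    """Constants memory size in bits: (j constants + 8 initial hash words)."""
    p = PARAMS[variant]
    words = p["j"] + 8
    noft = words * p["D"]
    return {"NoFT": noft, "FullTMR": 3 * noft, "TMR&HCMem": words * p["w"]}


for variant in PARAMS:
    print(variant, memory_bits(variant))
# SHA224/256 {'NoFT': 2304, 'FullTMR': 6912, 'TMR&HCMem': 2736}
# SHA384/512 {'NoFT': 5632, 'FullTMR': 16896, 'TMR&HCMem': 6248}
```

Sharing one encoded copy costs only w/D times the NoFT memory (about 1.19 for SHA224/256 and 1.11 for SHA384/512), versus a factor of 3 for FullTMR.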
On the other hand, the main advantage of encoding the memory is that it now tolerates one bit-flip in each of its memory elements: a failure requires two bit-flips in the same memory element. Hence, the encoded memory is less likely to fail than the triplicated, unprotected memory in FullTMR. A deeper analysis of the probability of memory failure is provided in Section 6. Due to its better resistance to bit-flips, the encoded memory is used in all the schemes introduced next.
TMR for Registers and Shared Encoded Memory
Given our concern with protecting only the data being processed, a further optimization of TMR&HCMem is to move the redundancy from the module level to the register level. This scheme is called TMRRegs&HCMem.
More precisely, instead of triplicating the entire SHA-2 module, only one SHA-2 module is used, but all its registers are triplicated, as illustrated in Figure 4. The register requirements are exactly the same as in FullTMR and TMR&HCMem, i.e., 3072 bits for SHA-224 and SHA-256, and 6144 bits for SHA-384 and SHA-512. Now, however, one voter is used for each trio of registers in order to mask out register errors. In total, 32 voters are needed, resulting in a larger implementation area. Similarly to TMR&HCMem, this approach uses an encoded memory to keep the SHA-2 constants protected against SEUs. Given the Hamming decoder and the multiple voters used, lower frequencies of operation are expected compared to FullTMR and TMR&HCMem. The main advantage of this scheme, however, is that errors are masked in every clock cycle. In other words, the design fails only if two copies of the same register suffer bit-flips in the same clock cycle. Since this is very unlikely to happen, this scheme achieves a very high protection of the registers against SEUs, as discussed more formally in Section 6.
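The difference between the failure conditions of FullTMR and of register-level TMR can be expressed as two predicates over a list of bit-flip events. This Python sketch is a simplified model of the conditions stated in the text: a flip anywhere in a FullTMR module is assumed to corrupt that module's result, and the per-register voters of TMRRegs&HCMem are assumed to repair every trio at the end of each cycle. The event encoding and the register names are hypothetical.

```python
from collections import defaultdict

# A flip event is (clock_cycle, copy, register), where copy in {0, 1, 2}
# identifies the replica (module in FullTMR, register copy in TMRRegs&HCMem).


def fulltmr_fails(events):
    """Voting happens only on the final result, so flips in two different
    modules at any time during the hash defeat the voter."""
    return len({copy for _, copy, _ in events}) >= 2


def tmrregs_fails(events):
    """Voters repair each register trio every cycle, so only two different
    copies of the same register hit in the same cycle defeat them."""
    hits = defaultdict(set)
    for cycle, copy, reg in events:
        hits[(cycle, reg)].add(copy)
    return any(len(copies) >= 2 for copies in hits.values())


spread = [(3, 0, "a"), (41, 1, "e")]          # two flips, different cycles
same_cycle = [(10, 0, "W5"), (10, 2, "W5")]   # two copies of W5 in one cycle
print(fulltmr_fails(spread), tmrregs_fails(spread))          # True False
print(fulltmr_fails(same_cycle), tmrregs_fails(same_cycle))  # True True
```

The window in which a second flip can cause a failure shrinks from the whole hash computation to a single clock cycle and a single trio, which is what the probability analysis in Section 6 quantifies.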
Encoding/Decoding All Registers with Hamming Codes
As an alternative to replicating registers and modules, a scheme named HCAllRegs is proposed. In this scheme, Hamming codes are used instead of TMR to protect the register contents. The goal is to detect and correct potential bit-flips in each clock cycle of the SHA-2 functions. To achieve this, register contents are encoded before each write and decoded after each read, in every clock cycle. As a consequence, the registers are kept encoded at all times and are therefore protected against bit-flips. This scheme is depicted in Figure 5, where the shaded and dark areas represent Hamming encoders and decoders.
In order to reduce the number of parity bits, all 32 registers are merged and treated as a single register. For example, if SHA224/256 encoded its thirty-two 32-bit registers separately using a (38,32) Hamming code, a total of 192 parity bits would be necessary. By treating the registers as one 1024-bit register, a (1035,1024) Hamming code can be used, and only 11 parity bits are needed. Likewise, if SHA384/512 encoded its thirty-two 64-bit registers separately using a (71,64) Hamming code, a total of 224 parity bits would be needed. When the registers are merged into a single 2048-bit register, a (2060,2048) Hamming code can be employed, thus requiring only 12 parity bits. Notice that, although Figure 5 shows each register associated with an encoder and a decoder, a single encoder and a single decoder are employed. In sum, the register requirements are 1035 bits for SHA224/256 and 2060 bits for SHA384/512.
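The parity savings from merging the registers follow from the fact that a single-error-correcting Hamming code needs only about log2(v) parity bits for v data bits. The Python check below reproduces the counts in the text; it assumes the standard condition 2^p ≥ v + p + 1 for p parity bits.

```python
def parity_bits(v):
    """Parity bits of a single-error-correcting Hamming code for v data bits."""
    p = 0
    while (1 << p) < v + p + 1:
        p += 1
    return p


for width in (32, 64):                       # SHA224/256 and SHA384/512 register widths
    separate = 32 * parity_bits(width)       # thirty-two registers encoded one by one
    merged = parity_bits(32 * width)         # one merged (32 * width)-bit register
    print(width, separate, merged, 32 * width + merged)
# 32 192 11 1035
# 64 224 12 2060
```

The trade-off is that a merged codeword tolerates only one flip across all registers per cycle, whereas separate codes would tolerate one flip per register; the text accepts this because the merged register is refreshed every clock cycle.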
The main disadvantage of this approach is that the Hamming encoders and decoders for all registers lengthen the critical path of the SHA-2 modules and thus reduce the frequency of operation of the design. However, since the merged register is decoded and re-encoded in every clock cycle, it fails only if two bit-flips happen in the same clock cycle. As a consequence, this scheme provides high protection against SEUs, as described in Section 6.
Encoding/Decoding Main Registers with Hamming Codes
Since HCAllRegs keeps all the registers encoded at all times, they are continuously protected against SEUs. Although HCAllRegs offers high resistance against SEUs, this comes at the cost of large Hamming encoders and decoders that detect and correct errors in every clock cycle, which translates into a higher demand for implementation area and power. It is therefore worth seeking a better trade-off between SEU protection, area and power consumption. This trade-off is explored in the scheme described in this section.
By analyzing Figure 5, it can be noticed that, in a given algorithm iteration, only registers a, [...] (see Figure 6). Similarly to HCAllRegs, HCMainRegs uses an encoded memory to keep the constants protected against SEUs. Besides, it uses encoders and decoders in its datapath, which increases the circuit's critical path. As a result, lower frequencies of operation are expected for HCMainRegs than for FullTMR.
EVALUATION OF ROBUSTNESS AGAINST SEUS
In addition to factors such as implementation area and power consumption, it is important to evaluate the robustness of the proposed schemes against SEUs. A qualitative analysis, as done in Section 5, gives a good idea of how resilient each scheme is. However, a quantitative evaluation of the proposed strategies allows a better comparison with traditional schemes such as TMR.
The evaluations conducted in this section analyze the probability of failure of two main elements: memory and registers. We assume that there is one scheme per device (ASIC or FPGA), and that its implementation is spread uniformly over the device resources. Therefore, we define the following terms:
M: total memory resources, in bits;
R: total number of registers, in bits;
m: used memory resources, in bits;
r: used registers, in bits.
To simplify the discussion, we assume that all designs operate at the same frequency, and we consider the computation of a single hash. We further assume that the hardware devices are in the same environment and subject to the same bit-flip rate. Then, we define:
µ: bit-flip rate per memory bit per clock cycle;
ρ: bit-flip rate per register bit per clock cycle;
n: period of time in which bit-flips may occur, expressed in number of clock cycles.
Given that all schemes based on TMR and Hamming codes tolerate one bit-flip, we analyze the condition for failure, which is to have two bit-flips corrupting the SHA-2 computation. We define the following terms, where X can be either memory (M) or register (R) elements:
$P(F_M)$: probability of a memory failure;
$P(F_R)$: probability of a register failure;
$P(X_1)$: probability of the first bit-flip in X;
$P(X_2)$: probability of the second bit-flip in X.
From these definitions, it is possible to determine the probability of memory and register failures for each of the schemes presented in Section 5.
Full Triple Modular Redundancy
The traditional TMR triplicates the memory requirements of a non-fault-tolerant module. In order to have a failure in FullTMR, one bit-flip must occur in one of the memories and a second bit-flip in any location of the other two memories. Let the total memory usage of TMR be m bits out of the M bits available in the device. Considering first the case of a first bit-flip, the probability of a particle hitting a given memory element on-chip is 1/M. Given that m bits are used for TMR, the probability of this particle hitting a memory element of the TMR design is m/M. Further, the three modules take n clock cycles to compute a SHA-2 operation, and while in operation their memory elements suffer a bit-flip rate of µ per clock cycle. Thus, the probability of the first bit-flip is $P(M_1) = mn\mu/M$. Assume that a bit-flip happened in the memory of one of the modules; this corrupted memory occupies m/3 bits of the total memory. In order to have a failure, another bit-flip must happen in the remaining 2m/3 bits of the other two healthy modules, and the probability of this event is 2m/(3M). Consequently, the probability of the second bit-flip is $P(M_2) = 2mn\mu/(3M)$. Since both events must occur, the final probability of a memory failure in FullTMR is
$P(F_M) = P(M_1) \cdot P(M_2) = \frac{mn\mu}{M} \cdot \frac{2mn\mu}{3M} = \frac{2m^2n^2\mu^2}{3M^2}$ (1)
The probability of register failure in FullTMR is defined in exactly the same way as that of memory failure. A failure happens when one bit-flip occurs in a register of one of the modules and a second bit-flip occurs in any of the registers of the other two modules. The total register usage of TMR is r bits out of the R bits available. Given that n clock cycles are needed to compute a SHA-2 operation, and that the chip suffers a register bit-flip rate of ρ per clock cycle, the probability of the first bit-flip in a register is $P(R_1) = rn\rho/R$. Furthermore, the probability of the second bit-flip is $P(R_2) = 2rn\rho/(3R)$. As a result, the final probability of a register failure in FullTMR is
$P(F_R) = P(R_1) \cdot P(R_2) = \frac{rn\rho}{R} \cdot \frac{2rn\rho}{3R} = \frac{2r^2n^2\rho^2}{3R^2}$ (2)
TMR with Shared Encoded Memory
In order to achieve a higher protection level for the memory, each memory position is encoded using a Hamming code. The condition for failure in TMR&HCMem is to have two bit-flips in the same memory element. Let the total encoded memory be m bits out of the M bits available. The probability of a particle hitting the encoded memory is m/M. Since TMR&HCMem takes n clock cycles to compute a SHA-2 operation, and considering a bit-flip rate of µ per clock cycle, the probability of the first memory bit-flip is $P(M_1) = mn\mu/M$. Given that each memory position is individually encoded, the condition for failure is a second bit-flip in the same memory location that received the first one. Assume that each encoded memory location uses l bits and that the first bit-flip happened in one of these l bits. A failure then happens if a second bit-flip hits one of the remaining (l − 1) bits, which has probability (l − 1)/M. As a consequence, the probability of the second bit-flip is $P(M_2) = (l-1)n\mu/M$, and the final probability of a memory failure in TMR&HCMem is
$P(F_M) = P(M_1) \cdot P(M_2) = \frac{mn\mu}{M} \cdot \frac{(l-1)n\mu}{M} = \frac{m(l-1)n^2\mu^2}{M^2}$ (3)
Since TMR&HCMem is still a TMR-based scheme for registers, its probability of register failure is exactly the same as in FullTMR, i.e., it is given by Equation (2).
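Because n, µ and M appear identically in the FullTMR memory-failure probability, (mnµ/M)(2mnµ/(3M)), and in the encoded-memory probability, (mnµ/M)((l−1)nµ/M), their ratio depends only on the memory sizes and on l. The Python sketch below evaluates both products with the memory sizes from Section 5 (6912 and 2736 bits with l = 38 for SHA224/256; 16896 and 6248 bits with l = 71 for SHA384/512). The values of n, µ and M are arbitrary placeholders, valid under the stated assumption that both schemes share the same device and environment. For SHA384/512 the ratio comes out at about 435, consistent with the memory figure reported for SHA-512.

```python
def p_mem_fulltmr(m, n, mu, M):
    """FullTMR memory failure: P(M1) * P(M2) = (m n mu / M) * (2 m n mu / (3 M))."""
    return (m * n * mu / M) * (2 * m * n * mu / (3 * M))


def p_mem_encoded(m, l, n, mu, M):
    """Shared encoded memory failure: (m n mu / M) * ((l - 1) n mu / M)."""
    return (m * n * mu / M) * ((l - 1) * n * mu / M)


n, mu, M = 100, 1e-9, 10**7    # placeholders; they cancel in the ratio

for name, m_tmr, m_hc, l in (("SHA224/256", 6912, 2736, 38),
                             ("SHA384/512", 16896, 6248, 71)):
    ratio = p_mem_fulltmr(m_tmr, n, mu, M) / p_mem_encoded(m_hc, l, n, mu, M)
    print(name, round(ratio))
# SHA224/256 315
# SHA384/512 435
```

Algebraically the ratio is 2·m_TMR² / (3·m_HC·(l − 1)): the encoded memory wins because a second flip must land in the same l-bit word rather than anywhere in two whole memories.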
TMR for Registers and Shared Encoded Memory
TMRRegs&HCMem uses the same encoded memory as TMR&HCMem; therefore, its probability of memory failure is given by Equation (3). Although TMRRegs&HCMem is also a TMR-based scheme, the redundancy is implemented at the register level, so its probability of register failure is computed differently from that of the other TMR schemes. In order to have a register failure in TMRRegs&HCMem, two copies of the same register must suffer bit-flips in the same clock cycle.
Consider a total register usage of r bits out of R bits available, and a register bit-flip rate of ρ per clock cycle. Notice that in FullTMR and TMR&HCMem, errors occurring in the middle of processing must wait n clock cycles to be masked out by the voter. In TMRRegs&HCMem, in contrast, errors are masked in every clock cycle for each trio of registers, so the bit-flip analysis is performed over a single clock cycle. As a result, the probability of the first register bit-flip is $P(R_1) = r\rho/R$. The failure condition for TMRRegs&HCMem is to have two registers of the same trio corrupted, so that the voter cannot decide on the correct result. Thus, assume that a trio of registers occupies t bits, and that one of these registers suffered a bit-flip. The probability of having a second register of the trio corrupted is 2t/(3R). Hence, the probability of the second bit-flip is $P(R_2) = 2t\rho/(3R)$. Thus, the final probability of register failure in TMRRegs&HCMem is

$$P_{R}^{\mathrm{TMRRegs\&HCMem}} = P(R_1)\,P(R_2) = \frac{r\rho}{R}\cdot\frac{2t\rho}{3R} = \frac{2rt\rho^2}{3R^2} \tag{4}$$
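To illustrate why TMRRegs&HCMem only needs a single clock cycle as its failure window, the sketch pairs Equation (4) with a per-cycle 2-out-of-3 voter. The bitwise majority voter is our assumption about how each register trio is voted; the text does not describe the voter's internals.

```python
from fractions import Fraction


def vote(a, b, c):
    """Bitwise 2-out-of-3 majority over three copies of a register."""
    return (a & b) | (a & c) | (b & c)


def p_reg_tmrregs(r, t, R, rho):
    """Probability of register failure in TMRRegs&HCMem (Equation (4))."""
    p_first = r * rho / R                # P(R_1): window of a single clock cycle
    p_second = 2 * t * rho / (3 * R)     # P(R_2): another register of the same t-bit trio
    return p_first * p_second            # = 2 r t rho^2 / (3 R^2)


word = 0x6A09E667                        # any 32-bit register value
one_flip = vote(word ^ (1 << 5), word, word)
two_flips_diff_bits = vote(word ^ (1 << 5), word ^ (1 << 9), word)
two_flips_same_bit = vote(word ^ (1 << 5), word ^ (1 << 5), word)
print(one_flip == word, two_flips_diff_bits == word, two_flips_same_bit == word)
# prints: True True False
```

With a bitwise voter, two flips that hit different bit positions of two copies are still outvoted, so counting any two corrupted registers of a trio as a failure, as Equation (4) does, errs on the conservative side.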
Encoding/Decoding All Registers with Hamming Codes
Even though HCAllRegs adopts a totally different fault-tolerant technique, it uses the same encoded memory as TMR&HCMem. Hence, its probability of memory failure is given by Equation (3). To save parity bits, this scheme merges all the registers before encoding. The resulting Hamming-encoded register can then be treated as a single element occupying r bits out of R bits available in the hardware device. Since this single register is decoded and re-encoded in every clock cycle, a failure occurs only if two bit-flips hit the encoded register within the same clock cycle.
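Merging registers before encoding saves parity bits because the number of Hamming check bits grows only logarithmically with the data width. The sketch compares encoding eight registers separately with encoding them as one word, again assuming a plain single-error-correcting Hamming code. The choice of eight 32-bit or 64-bit registers (for instance the working variables a to h) is a hypothetical example, not the exact register set of HCAllRegs.

```python
def hamming_parity_bits(k):
    """Parity bits of a single-error-correcting Hamming code for k data bits."""
    p = 0
    while 2 ** p < k + p + 1:
        p += 1
    return p


def parity_cost(widths):
    """(parity bits if each register is encoded alone, parity bits if merged)."""
    separate = sum(hamming_parity_bits(w) for w in widths)
    merged = hamming_parity_bits(sum(widths))
    return separate, merged


print(parity_cost([32] * 8))   # prints (48, 9)
print(parity_cost([64] * 8))   # prints (56, 10)
```

The saving comes with the constraint stated above: the whole merged word is decoded and re-encoded in every clock cycle, and any two flips anywhere in the r-bit word during that cycle cause a failure.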
The probability of a bit-flip hitting the encoded register is r/R. Considering a single clock cycle and a bit-flip rate of ρ per clock cycle, the probability of the first register bit-flip is $P(R_1) = r\rho/R$. Assuming that a first bit-flip happened, the condition for failure is to have a second bit-flip in the encoded register, where (r − 1) bits remain that may be hit. Consequently, the probability of a second bit-flip in the encoded register is $P(R_2) = (r-1)\rho/R$, and the final probability of register failure in HCAllRegs is

$$P_{R}^{\mathrm{HCAllRegs}} = P(R_1)\,P(R_2) = \frac{r\rho}{R}\cdot\frac{(r-1)\rho}{R} = \frac{r(r-1)\rho^2}{R^2} \tag{5}$$
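A direct transcription of the HCAllRegs register-failure probability, P(R_1)·P(R_2) = r(r − 1)ρ²/R², as a Python sketch. The word size of 265 bits (256 data bits plus 9 parity bits under a plain Hamming code) and the value of R are hypothetical.

```python
from fractions import Fraction


def p_reg_hcallregs(r, R, rho):
    """Register failure probability in HCAllRegs (one merged encoded word, one-cycle window)."""
    p_first = r * rho / R             # P(R_1): first flip anywhere in the r encoded bits
    p_second = (r - 1) * rho / R      # P(R_2): second flip in one of the other r - 1 bits
    return p_first * p_second         # = r (r - 1) rho^2 / R^2


# Hypothetical sizes: a 265-bit encoded word in a device with R register bits.
r, R = Fraction(265), Fraction(33216)
coeff = p_reg_hcallregs(r, R, Fraction(1))   # coefficient of rho^2
print(float(coeff))
```

Unlike FullTMR there is no n² factor, because the merged word is corrected every cycle; the price is that the second flip may land anywhere in the whole r-bit word.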
Encoding/Decoding Main Registers with Hamming Codes
Although HCMainRegs is an optimization of HCAllRegs that reduces the register requirements, it uses the same encoded memory as HCAllRegs and TMR&HCMem. Thus, its probability of memory failure is given by Equation (3). In HCMainRegs, however, all registers are encoded individually using Hamming codes.
The condition for a register failure is to have two bit-flips in the same register (a, .. while they are in their idle period. Suppose that all encoded registers use r bits out of the R bits available in the hardware device. Then, the probability of a bit-flip in any bit of the encoded registers is r/R. Further, assume a bit-flip rate of ρ per clock cycle. In this scheme the registers are kept encoded all the time, but they are not decoded and re-encoded in every clock cycle. Instead, they have an idle period, which we define as i. More precisely, i is the maximum number of clock cycles without performing error detection and correction on the data held in the registers. Therefore, the probability of the first bit-flip in a register is $P(R_1) = ri\rho/R$. To have a failure, a second bit-flip must occur in the same encoded register within these i clock cycles. Consider that a first bit-flip has already occurred in an encoded register, and that each encoded register uses g bits. Then, the second bit-flip must hit one of the (g − 1) remaining bits. Thus, the probability of a second bit-flip in the same encoded register is $P(R_2) = (g-1)i\rho/R$. As a result, the final probability of register failure in HCMainRegs is

$$P_{R}^{\mathrm{HCMainRegs}} = P(R_1)\,P(R_2) = \frac{ri\rho}{R}\cdot\frac{(g-1)i\rho}{R} = \frac{r(g-1)i^2\rho^2}{R^2} \tag{6}$$
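The HCMainRegs expression r(g − 1)i²ρ²/R² can be sketched the same way. Written as a function of the idle period i, it makes two properties easy to check: the probability grows with i², and with i = 1 and g = r (a single encoded word refreshed every cycle) it reduces to the HCAllRegs expression r(r − 1)ρ²/R². The sizes below are hypothetical; g = 38 assumes a 32-bit register protected by a plain Hamming code.

```python
from fractions import Fraction


def p_reg_hcmainregs(r, g, i, R, rho):
    """Register failure probability in HCMainRegs.

    r -- bits of all encoded registers, g -- bits of one encoded register,
    i -- idle period (max cycles between decode/correct/re-encode).
    """
    p_first = r * i * rho / R           # P(R_1): any encoded bit during i cycles
    p_second = (g - 1) * i * rho / R    # P(R_2): same g-bit register during i cycles
    return p_first * p_second           # = r (g - 1) i^2 rho^2 / R^2


r, g, R = Fraction(8 * 38), Fraction(38), Fraction(33216)   # hypothetical sizes
for i in (1, 2, 4):
    print(i, float(p_reg_hcmainregs(r, g, i, R, Fraction(1))))
```

Keeping i short is therefore the main lever of this scheme: halving the idle period cuts the register failure probability by a factor of four.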
Device-Specific Probability of Failure
In order to employ the aforementioned equations to perform an analysis of device-specific probability of failure, it is necessary to consider the parameters defining the target hardware device, such as the total number of registers (R) and memory bits (M) available. Given that we use an Altera CycloneII EP2C35F672C6 FPGA to obtain our experimental results, the same device was used to conduct the analysis in this section. Moreover, information on the design of each scheme must also be known. The memory and register requirements are shown in Tables 1 and 2, respectively. The specific values for all variables defined while determining the probability equations are listed in Table 3.
Memory Analysis-The memory usage (m) of each scheme is listed in Table 1. By using Table 3 and the equations provided in this section, it is possible to determine, in terms of $\mu^2$, the probability of memory failure of all fault-tolerant schemes. The results for all schemes are organized in Table 4. From Equation (1), it follows that the probabilities of memory failure of FullTMR are $557.28\times10^{-3}\mu^2$ and $5202.99\times10^{-3}\mu^2$ for SHA224/256 and SHA384/512, respectively. Since TMR&HCMem, as well as all the remaining schemes, use a Hamming-encoded memory, their probabilities of memory failure are given by Equation (3). More precisely, the Hamming-encoded memory decreases the probability of memory failure to $1.77\times10^{-3}\mu^2$ and $11.96\times10^{-3}\mu^2$ for SHA224/256 and SHA384/512, respectively. To obtain a better picture of the resistance provided by the Hamming-encoded memory, a normalized analysis is performed taking the triplicated memory of FullTMR as reference. From Table 5, it is possible to conclude that encoding the memory increases the memory resistance of SHA224/256 against SEUs by a factor of 314.63. This increase is even higher for SHA384/512, whose memory becomes 435.15 times more resistant when Hamming codes are used instead of TMR.
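Because Tables 1 and 3 are not reproduced in this section, the sketch below re-derives the memory figures from parameter values that we assume rather than quote: M = 483,840 memory bits (which we take to be the M4K RAM total of the EP2C35), n = 64 and n = 80 (the number of SHA-2 rounds), and a memory holding the round constants K plus the eight initial hash words (72 × 32 or 88 × 64 bits), triplicated for FullTMR and Hamming-encoded once (l = 38 or 71) otherwise. Equation (1) is taken as 2m²n²µ²/(3M²), the memory analogue of Equation (2). With these assumptions the sketch reproduces the values above, and the encoded SHA-384/512 memory of 88 × 71 = 6248 bits agrees with the figure quoted in the conclusions.

```python
# Assumed parameters (our reconstruction, not copied from Tables 1 and 3).
M = 483_840   # memory bits available, taken as the EP2C35 M4K total

CASES = {
    # name: (n, FullTMR memory bits, encoded memory bits, encoded word length l)
    "SHA224/256": (64, 3 * 72 * 32, 72 * 38, 38),
    "SHA384/512": (80, 3 * 88 * 64, 88 * 71, 71),
}


def p_mem_fulltmr(m, n, M):
    """Assumed Equation (1), coefficient of mu^2: 2 m^2 n^2 / (3 M^2)."""
    return 2 * m**2 * n**2 / (3 * M**2)


def p_mem_hc(m, l, n, M):
    """Equation (3), coefficient of mu^2: m (l - 1) n^2 / M^2."""
    return m * (l - 1) * n**2 / M**2


results = {}
for name, (n, m_tmr, m_hc, l) in CASES.items():
    tmr, hc = p_mem_fulltmr(m_tmr, n, M), p_mem_hc(m_hc, l, n, M)
    # (FullTMR x1e3, encoded x1e3, resistance gain of the encoded memory)
    results[name] = (round(tmr * 1e3, 2), round(hc * 1e3, 2), round(tmr / hc, 2))
    print(name, results[name])
# prints SHA224/256 (557.28, 1.77, 314.63)
#        SHA384/512 (5202.99, 11.96, 435.15)
```

Note that the resistance gain is independent of n and M, since both probabilities share the factor n²/M²; it depends only on how many bits can complete a failure in each scheme.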
Register Analysis-A similar analysis can be carried out for the registers by using the register usage (r) listed in Table 2 along with Table 3. The probabilities of register failure for all schemes, obtained from Equation (2) and the corresponding equations of the other schemes, are listed in Table 6. Now, if TMRRegs&HCMem is used, Equation (4) If a normalized analysis is performed taking FullTMR as reference, it is easy to observe, from
EXPERIMENTAL RESULTS
In order to better analyze the advantages of each scheme proposed in Section 5, it is necessary to evaluate them in terms of implementation area, throughput, frequency of operation, and power consumption. Although all the schemes discussed here can be implemented using both ASICs and FPGAs (Flash, Anti-Fuse, SRAM), we selected an SRAM FPGA capable of performing CRC checks automatically: an Altera CycloneII EP2C35F672C6 FPGA. Hence, we described the proposed schemes in VHDL and performed the FPGA implementation. The tool employed for the description, synthesis, simulation, and power estimation of all hardware modules was QuartusII [39] version 7.2, service pack 1.
We conducted the synthesis targeting low implementation area and low power consumption. To minimize the power consumption even further, we performed a two-step synthesis and simulation procedure. Once the first synthesis and simulation were complete, a signal activity file was created. This file was then fed back to the tool, allowing for better synthesis optimization and thus leading to additional power savings. The data shown in Tables 8, 9, and 10 reflect the results of the final synthesis and simulation. Table 11 shows the dynamic power resulting from the simulation of the modules at their maximum frequency of operation ($F_{\max}$), whereas Table 12 provides the normalized power estimates with all modules running at 33.33MHz.
Area Results
Implementation area is measured in terms of the number of logic elements (LEs) used to implement a given scheme in the FPGA. By analyzing Table 8, it is possible to observe certain trends in the SHA-2 area utilization. FullTMR occupies about 3 times as much area as NoFT, while TMR&HCMem is slightly larger than FullTMR. Further, both TMRRegs&HCMem and HCAllRegs employ, on average, 3.9 times as much area as NoFT, and are thus less efficient than FullTMR. HCMainRegs provides the highest area efficiency: on average, it uses 2.2 times as much area as NoFT, and less than 3/4 of the area of FullTMR.
Frequency Results
The frequency of operation of the schemes considered in this work is presented in Table 9. By analyzing the SHA-224 results in that table, one can notice that the fault-tolerant scheme with the highest frequency of operation is FullTMR: due to the high parallelism involved, it can operate at 73.05MHz. Because TMR&HCMem uses a Hamming decoder between the memory and the TMR part of the module, its critical path is impacted negatively. As a consequence, it operates at a lower frequency. By observing the results for the other SHA-2 algorithms in Table 9, one can see that FullTMR is in fact the fault-tolerant scheme that provides the highest frequencies of operation. Although TMR&HCMem and TMRRegs&HCMem have similar frequencies of operation, the former is slightly faster than the latter. These two modules are followed quite closely by HCMainRegs. The slowest scheme in all cases, though, is HCAllRegs.
Throughput Results
For the purpose of simulation and throughput estimation, we consider the hash of a single message block. The block size is 512 bits for SHA224/256 and 1024 bits for SHA384/512. More precisely, the throughput is defined as

$$\text{Throughput} = \frac{\text{message block size}}{\#\text{cycles}/F_{\max}} = \frac{\text{message block size}\times F_{\max}}{\#\text{cycles}}$$

To compute a message digest, SHA-224, SHA-256, SHA-384, and SHA-512 take 88, 89, 103, and 105 clock cycles, respectively. The reported numbers of clock cycles include the complete SHA-2 processing as well as the time spent writing/reading data to/from the module.
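The throughput definition above is straightforward to evaluate. The helper names below are ours; the block sizes and cycle counts are those given in the text, and 73.05MHz is the SHA-224 FullTMR frequency quoted in the frequency results. The inverse helper recovers the $F_{\max}$ implied by a reported throughput; such implied values are derived here, not reported.

```python
BLOCK_BITS = {"SHA-224": 512, "SHA-256": 512, "SHA-384": 1024, "SHA-512": 1024}
CYCLES = {"SHA-224": 88, "SHA-256": 89, "SHA-384": 103, "SHA-512": 105}


def throughput_mbps(alg, fmax_mhz):
    """Block size / (#cycles / Fmax); with Fmax in MHz the result is in Mbps."""
    return BLOCK_BITS[alg] * fmax_mhz / CYCLES[alg]


def implied_fmax_mhz(alg, throughput):
    """Inverse of throughput_mbps: the Fmax (MHz) that yields a given throughput (Mbps)."""
    return throughput * CYCLES[alg] / BLOCK_BITS[alg]


print(f"{throughput_mbps('SHA-224', 73.05):.2f} Mbps")   # prints 425.02 Mbps
print(f"{implied_fmax_mhz('SHA-224', 443.58):.2f} MHz")  # prints 76.24 MHz (NoFT SHA-224)
```

The formula also explains why frequency dominates the throughput comparison: for a given algorithm all schemes share the same block size and cycle count, so throughput scales linearly with $F_{\max}$.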
As shown in Table 10, the frequency of operation has a strong influence on the throughput: the higher the frequency, the higher the throughput. In fact, SHA-224 using FullTMR has the highest throughput among the fault-tolerant modules. Its throughput (425.02Mbps) is 96% of the NoFT throughput (443.58Mbps). Further, TMR&HCMem (276.01Mbps) and TMRRegs&HCMem (264.20Mbps) achieve 62% and 60% of the NoFT throughput, respectively. HCAllRegs presents a relatively low throughput (165.93Mbps), which is 37% of the NoFT throughput. Finally, HCMainRegs presents a throughput of 249.72Mbps, comparable to those of TMR&HCMem and TMRRegs&HCMem; this represents 56% of the NoFT throughput.
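The percentages quoted above are each scheme's throughput divided by the NoFT throughput. A short check using the SHA-224 values from the text:

```python
NOFT_MBPS = 443.58
SHA224_MBPS = {
    "FullTMR": 425.02,
    "TMR&HCMem": 276.01,
    "TMRRegs&HCMem": 264.20,
    "HCAllRegs": 165.93,
    "HCMainRegs": 249.72,
}

# Throughput of each fault-tolerant scheme as a percentage of NoFT.
relative = {name: round(100 * tp / NOFT_MBPS) for name, tp in SHA224_MBPS.items()}
print(relative)
# prints {'FullTMR': 96, 'TMR&HCMem': 62, 'TMRRegs&HCMem': 60, 'HCAllRegs': 37, 'HCMainRegs': 56}
```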
Following the same analysis, the throughputs of SHA-256, SHA-384, and SHA-512 using FullTMR are, on average, higher than 90% of the corresponding NoFT throughputs. TMR&HCMem, TMRRegs&HCMem, and HCMainRegs have similar relative throughputs: on average, they achieve 62%, 60%, and 57% of the NoFT throughput, respectively.
Power Results
Power consumption is another important factor to be analyzed in space systems. Table 11 reports the dynamic power consumption of the implementations performing one hash computation at their maximum frequency of operation. The SHA-224 NoFT design consumes 78.17mW. When FullTMR is used, the power consumption is 2.8 times higher, i.e., 222.04mW. This is an expected result, given that FullTMR triplicates the SHA-224 datapath. By using an encoded memory, TMR&HCMem and TMRRegs&HCMem consume 146.58mW and 149.89mW, respectively, i.e., 1.9 times as much power as NoFT. Moreover, HCAllRegs consumes 267.94mW, i.e., 3.4 times as much power as NoFT. The scheme with the lowest power consumption (125.04mW) is HCMainRegs, which consumes 1.6 times as much power as NoFT.
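Likewise, the power multipliers above are ratios to the 78.17mW NoFT figure, rounded to one decimal. The sketch recomputes them from the SHA-224 values in the text:

```python
NOFT_MW = 78.17
SHA224_MW = {
    "FullTMR": 222.04,
    "TMR&HCMem": 146.58,
    "TMRRegs&HCMem": 149.89,
    "HCAllRegs": 267.94,
    "HCMainRegs": 125.04,
}

# Dynamic power of each scheme as a multiple of the NoFT power.
factor = {name: round(p / NOFT_MW, 1) for name, p in SHA224_MW.items()}
print(factor)
# prints {'FullTMR': 2.8, 'TMR&HCMem': 1.9, 'TMRRegs&HCMem': 1.9, 'HCAllRegs': 3.4, 'HCMainRegs': 1.6}
```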
Considering all SHA-2 variants, the power consumption of FullTMR is slightly less than 3 times that of NoFT. In addition, TMR&HCMem uses on average twice as much power as NoFT, whereas TMRRegs&HCMem uses 1.9 times as much. HCMainRegs consumes, on average, 1.6 times as much power as NoFT.
In order to provide a fair comparison among the implementations, we performed a normalized power estimation by running all the designs at a common frequency of 33.33MHz.
Since the HCAllRegs modules have a maximum frequency of operation lower than 33.33MHz, they were not included in this comparison. Looking at the normalized power consumption of SHA-256, SHA-384, and SHA-512 in Table 12, one can conclude that FullTMR, TMR&HCMem, and TMRRegs&HCMem use, on average, 3 times as much normalized power as NoFT. HCMainRegs, in turn, leads to an average normalized power increase of 2.8 times compared to NoFT.
Discussion
This section highlights the most important improvements brought by the proposed schemes compared to TMR. The benefits of using an encoded memory are twofold. First, as shown in Table 1, it uses 60% less memory than FullTMR. Second, as listed in Table 5, the memory becomes 314.63 times more resistant against SEUs in SHA224/256, and 435.15 times more resistant in SHA384/512.
TMRRegs&HCMem offers the highest level of protection against SEUs. More precisely, as listed in Table 7, its registers become 131072 times more resistant against SEUs in SHA224/256, and 204800 times more resistant in SHA384/512, when compared to FullTMR. However, this scheme uses 1.3 times as much area as FullTMR to achieve these higher levels of resistance. HCAllRegs also offers high protection against SEUs but, as can be noticed from Figures 7 and 8, it is the least efficient scheme in terms of area, throughput, and power consumption.
The best trade-off among implementation area, power consumption, and protection against SEUs is achieved with HCMainRegs. As depicted in Figures 7 and 8, this scheme uses up to 32% less area and consumes up to 43% less power than FullTMR. Further, it employs up to 63% fewer registers than FullTMR. Additionally, when HCMainRegs is applied to SHA224/256 and SHA384/512, their registers become, respectively, 159.1 and 175.33 times more resistant against SEUs than when using FullTMR. Also, the memory of SHA224/256 and SHA384/512 becomes, respectively, 314.63 and 435.15 times more resistant than that of FullTMR.
CONCLUSIONS
This paper proposes fault tolerant schemes for the SHA-2 family of hash functions, providing both error detection and correction. Although all schemes are applicable to ASICs and FPGAs (SRAM, Flash, Anti-Fuse), we implemented them in an SRAM FPGA to evaluate them in terms of area, frequency of operation, throughput, and power consumption. Additionally, we performed a comprehensive analysis of the resistance of the schemes against SEUs.
For the sake of comparison, we implemented the traditional TMR and showed that this fault tolerance technique, when applied to the SHA-2 hash functions, demands three times as much area as a non-fault-tolerant approach. The proposed scheme named HCMainRegs provides a better trade-off between area and power consumption and improves the resistance to errors caused by SEUs. For instance, SHA-512 adopting HCMainRegs employs 6897 LEs and 6248 memory bits, and has a dynamic power consumption of 241.63mW. Compared with NoFT, these results represent area and power increases of 2.1 and 1.7 times, respectively. Nevertheless, HCMainRegs uses up to 32% less area and consumes up to 43% less power than FullTMR. Moreover, its memory and registers are, respectively, 435 and 175 times more resistant against SEUs than they would be using FullTMR.
As a result, HCMainRegs can successfully replace TMR for achieving fault tolerance in the SHA-2 family of hash functions. Moreover, it provides higher levels of protection against SEUs and favors low power and low implementation area, which are crucial in space applications. To the best of our knowledge, this is the first implementation and analysis of the SHA-2 family of hash functions providing both error detection and correction reported in the literature.
