Abstract: Recently, the tradeoff between energy consumption and fault-tolerance in real-time systems has been highlighted. Prior works have focused on dynamic voltage scaling (DVS) to reduce dynamic energy dissipation and on time redundancy to achieve transient-fault tolerance. While the time-redundancy technique exploits the available slack-time to increase the fault-tolerance by performing recovery executions, DVS exploits slack-time to save energy. Therefore, we believe there is a resource conflict between the time-redundancy technique and DVS. The first aim of this paper is to propose the use of information redundancy to solve this problem. We demonstrate through analytical and experimental studies that it is possible to achieve both higher transient-fault tolerance [tolerance to single event upsets (SEUs)] and lower energy consumption using a combination of information and time redundancy when compared with using time redundancy alone. The second aim of this paper is to analyze the interplay of transient-fault tolerance (SEU-tolerance) and adaptive body biasing (ABB) used to reduce static leakage energy, which has not been addressed in previous studies. We show that the same technique (i.e., the combination of time and information redundancy) is applicable to ABB-enabled systems and provides more advantages than time redundancy alone.
Index Terms: Embedded systems, energy efficiency, fault tolerance, single event upsets.
embedded systems [3], [7]. The time-redundancy technique is popular since it is cost-effective and wastes fewer resources (hardware/software) on tolerating transient faults than other fault-tolerance techniques [25], [27]. This technique uses slack-time in the system schedule to improve transient-fault tolerance by performing recovery executions whenever faulty runs occur. The number of possible recovery executions depends on the available slack-time. Since DVS also requires slack-time, DVS and the time-redundancy technique compete for slack-time, which is a limited resource: if more slack-time is given to DVS to save more energy, less slack-time is left for transient-fault tolerance, and vice versa.
DVS not only reduces the slack-time available to time-redundancy-based fault-tolerance, but also increases the rate of transient faults or single event upsets (SEUs) (bit-flips due to the impact of particles on flip-flops). Indeed, it was reported that the rate of SEUs increases exponentially as supply voltage decreases [3], [8]-[10]. Traditionally, SEUs were regarded as a major concern only for space applications. However, SEUs have recently become a major source of concern even at ground level due to the continuing technology shrinkage [10], [12]. Unfortunately, packaging cannot be effectively used to shield against SEUs [13], [14], since the chip and the packaging materials themselves emit alpha particles that can cause SEUs. Also, SEUs can be caused by neutrons, which can easily penetrate through packages [8], [10].
The energy consumption of a VLSI system can be subdivided into two main components: dynamic energy and leakage energy. Until recently, dynamic energy has been the main source of energy consumption. However, in deep-submicron CMOS the technology shrinkage causes transistor subthreshold leakage current to increase exponentially, which results in a corresponding increase in leakage energy, so that the leakage energy becomes comparable to the dynamic energy. Hence, it is essential to use techniques to manage the leakage energy [21], [26]. Adaptive body bias (ABB) has been shown to be an effective technique to reduce leakage power [28] by tuning the threshold voltage of the transistors: reverse body bias reduces subthreshold leakage in standby mode, and forward body bias improves performance in active mode. To implement ABB in practice, generator circuits supplying the body-bias voltages are required [29]. However, this biasing also affects the frequency at which the circuit operates and, therefore, influences the slack-time [6]. That is, a problem similar to the one discussed above for DVS also exists for ABB, i.e., ABB and the time-redundancy technique compete for slack-time. Furthermore, it has been shown that ABB can worsen the SEU rate by 36% [11].
As opposed to the previous works [1]-[3], [31] on fault-tolerant DVS-enabled real-time systems, which focused on time redundancy, in this paper we propose the usage of information redundancy in fault-tolerant DVS-enabled and ABB-enabled systems. Since both DVS and ABB require slack-time, information redundancy is used to decouple the fault-tolerance from the slack-time and, hence, to provide more slack-time to DVS and ABB without degrading the fault-tolerance capability of the system. To the best of our knowledge, this paper is the first attempt to address energy management through DVS and ABB in conjunction with SEU-tolerance through information redundancy. Also, this paper is the first attempt to consider the energy/fault-tolerance tradeoff in ABB-enabled systems. It should be noted that the aim of the paper is not to propose any new fault-tolerance or energy management technique, but rather to identify, among the existing fault-tolerance and energy management techniques, those which are more suitable to be used together. This is necessary since the continuing diversity of embedded systems applications requires that such systems exhibit both reliability and energy efficiency. Toward this, we have evaluated the fault-tolerance/energy tradeoff, for a given deadline, of two existing fault-tolerance techniques when employed in real-time embedded systems to improve their reliability to SEU faults. Our study shows that a combination of time and information redundancy interferes less with energy management techniques (i.e., ABB and DVS) than time redundancy alone (which has been the focus of the previous works [1]-[3]).
The rest of the paper is organized as follows. Section II discusses the related works. Section III presents the system fault-tolerance and energy models. Section IV compares the fault-tolerance and the energy consumption of the proposed approach (which uses both time-redundancy and information-redundancy) and the conventional approach (which solely uses time-redundancy), using the models presented in Section III and experimental results. Finally, Section V concludes the paper.
II. RELATED WORKS
The tradeoff problem between fault-tolerance and energy consumption in DVS-enabled real-time systems has recently been highlighted [3] and become subject to investigations [1] , [2] , [31] . Nonuniform checkpoint placement policies for the combined purpose of conserving energy and providing fault-tolerance have been proposed in [1] . The technique proposed in [2] uses an adaptive check-pointing scheme to achieve fault-tolerance and energy saving in a unified manner. An integrated approach for achieving fault tolerance and energy savings in fixed-priority real-time embedded systems has been investigated in [31] . Although all these techniques [1] , [2] , [31] are effective in achieving fault tolerance, the obtained energy savings are limited due to the fact that the time redundancy requires slack-time; slack-time that otherwise could be exploited through DVS to reduce the energy consumption. In the context of leakage energy reduction, although there are some works on the impact of body biasing on SEU rate [11] , [16] , [17] , these works do not consider any fault-tolerance technique and the interplay of ABB and fault-tolerance techniques has not been studied.
In addition to the aforementioned works, which consider the energy/fault-tolerance tradeoff in DVS-enabled real-time systems, there have recently been some other reported works which are not directly related to this paper (here we focus on ABB-enabled and DVS-enabled systems); however, they provide further evidence that the energy/fault-tolerance tradeoff exists [15], [18], [19]. Using circuit-level simulation, [19] shows that in small circuits (4-bit counters) the transient-fault tolerance and power dissipation are at odds. In the context of on-chip communication, [18] analyzes the impact of redundant bus coding on the energy/reliability tradeoff and [15] proposes a dynamic voltage swing approach to optimize the energy consumption of a reliable communication scheme.
III. SYSTEM MODELS
In this paper, we will compare and analyze two types of fault-tolerant energy-aware real-time systems, defined as follows.
A) Conventional R System: This represents a fault-tolerant energy aware system which uses pure rollback-recovery, i.e., the conventional approach based on time-redundancy [ Fig. 1(a) ]. In this system, whenever transient faults (i.e., SEUs) occur during the task execution, a recovery execution (re-execution) of the same task is required [3] , [7] . All the systems which use rollback recovery have some error detection mechanisms (e.g., control flow checking techniques, consistency check, etc. [25] , [27] ). Using these mechanisms when a system detects that the results generated by a task are in error, the system re-executes the task [27] . Also, in these systems, highly reliable memory units are required [27] . This is necessary for the correct operation of rollback-recovery. For example, if an error occurs in the memory area which contains the task code, all recovery executions will be faulty since they will re-execute the same erroneous code (for more information on the requirements of rollback-recovery, please refer to [27] ).
As an example, Fig. 1(a) shows a possible scenario which can occur in the conventional R system. As shown in this figure, during the original task execution three SEUs (two SEUs in the same clock cycle and one single SEU) cause a faulty run, hence, necessitating a recovery execution (recovery execution one). Such executions have to be performed until a nonfaulty run happens [e.g., recovery execution two in Fig. 1(a) ]. In order to achieve a certain degree of fault-tolerance it is necessary to reserve some system time for recovery executions (slack-time for recoveries), while the remaining slack-time until the task deadline D can be exploited via DVS and ABB to reduce the system's energy dissipation.
B) Proposed RI System: These fault-tolerant energy-aware systems use both rollback-recovery and information redundancy [25] , i.e., the fault-tolerance is achieved through recovery executions as well as through redundant information that can be used to correct faults during execution (i.e., without necessitating a re-execution). Consider Fig. 1(b) , which demonstrates this approach using the same SEUs as in Fig. 1(a) . As we can observe, whenever one SEU occurs during a single clock cycle (first and third faults in Fig. 1 ), the resulting error can be corrected by some additional hardware which is used for information redundancy. Faults that require a recovery execution occur only if two or more SEUs happen during a single clock cycle [for instance, second fault during the original execution in Fig. 1(b) ]. Accordingly, the number of necessary recoveries is reduced leaving more exploitable slack-time to DVS and ABB.
Suppose a task and its recoveries run at the same frequency $f$. Let $N$ be the number of clock cycles which are needed to execute the task, $D$ be the deadline (in seconds), and $P_F$ be the probability of having a faulty run. Then, the task execution time is $N/f$ seconds, and the amount of total slack-time is $D - N/f$. The first recovery execution is only required when the original execution fails; hence, the probability to run the first recovery is $P_F$. The second recovery execution is only required when both the original and the first recovery executions fail; hence, the probability to run the second recovery is $P_F^2$. Similarly, the $k$th recovery will be executed with probability $P_F^k$. Thus, with $K$ denoting the number of recovery executions that fit before the deadline, the expected time required for executing recoveries is

$$T_{rec} = \sum_{k=1}^{K} P_F^{k}\,\frac{N}{f} \tag{1}$$

Therefore, the slack-time which is left for DVS and ABB is

$$T_{slack} = D - \frac{N}{f} - \sum_{k=1}^{K} P_F^{k}\,\frac{N}{f} \tag{2}$$

DVS and ABB can use the slack-time to save energy. It can be seen from (2) that as $P_F$ (i.e., the probability of having a faulty run) decreases, $T_{slack}$ increases. Note that in the RI system, the usage of information redundancy decreases $P_F$, so that $T_{slack}$ increases and more slack-time becomes available to save energy [compare Fig. 1(a) and (b)].
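As a rough numerical illustration of (1) and (2), the following sketch evaluates the expected recovery time and the slack-time left for DVS and ABB. The function names, the finite cap `max_recoveries` on the number of recoveries, and all numbers are assumptions chosen for illustration; they are not values from the paper.

```python
def expected_recovery_time(n_cycles, freq_hz, p_faulty, max_recoveries):
    """Expected time spent in recoveries, in the spirit of (1).

    The k-th recovery is executed with probability p_faulty**k and costs
    one task execution time (n_cycles / freq_hz seconds).
    """
    exec_time = n_cycles / freq_hz
    return sum(p_faulty ** k * exec_time for k in range(1, max_recoveries + 1))


def slack_left_for_dvs_abb(deadline_s, n_cycles, freq_hz, p_faulty, max_recoveries):
    """Slack-time left for DVS and ABB, in the spirit of (2)."""
    exec_time = n_cycles / freq_hz
    recovery = expected_recovery_time(n_cycles, freq_hz, p_faulty, max_recoveries)
    return deadline_s - exec_time - recovery


if __name__ == "__main__":
    # Hypothetical task: 1e6 cycles at 100 MHz, 40 ms deadline, up to 3 recoveries.
    for p in (0.5, 0.1, 0.01):
        slack = slack_left_for_dvs_abb(0.04, 1_000_000, 100e6, p, 3)
        print(f"P(faulty run) = {p}: slack for DVS/ABB = {slack:.5f} s")
```

Lowering the faulty-run probability, which is what information redundancy does in the RI system, increases the slack returned by `slack_left_for_dvs_abb`.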
Information redundancy in the proposed RI system is obtained by adding some additional hardware to the conventional circuit, as shown in Fig. 1(c). This hardware comprises a parity generator (which produces parity bits, e.g., overlapping parity bits [25]), flip-flops to store the parity bits, and a single-bit error corrector (which restores the affected registers to their original content as long as only one bit is corrupted). We will demonstrate in Section IV that the extra energy associated with the additional hardware can be overcompensated by DVS and ABB (because of the increase in the slack-time available to them), i.e., the RI systems can yield higher energy savings when compared to the conventional R systems.
To clarify the hardware required for information redundancy in the proposed RI system, consider a 4-bit register which has been protected using the overlapping parity technique. In the overlapping parity technique, three parity bits are required to protect four bits of information. Each parity bit is generated from a subset of the data bits, called a parity group. For example, assuming that the four original data bits are stored in flip-flops D0 through D3 and the three parity bits are stored in flip-flops P0 through P2, Table I shows the three parity groups associated with the parity bits. As shown in this table, the parity groups overlap in such a manner that each data bit appears in more than one parity group. The concept of overlapping parity is to assign each bit to a unique combination of parity bits [25], so that if an SEU occurs in any one bit (either data or parity), the combination of the parity bits which detect the error is unique. For example, it can be seen from Table I that data bit D2 contributes to the generation of parity bits P2 and P1; hence, when bit D2 is in error (because of an SEU in flip-flop D2) the unique combination of the parity bits which detect this error is P2P1, i.e., parity bits P2 and P1 detect the error simultaneously. However, if an SEU occurs in any bit (either data or parity) other than D2, the combination of the affected parities will not be P2P1, and a different set of parities will be affected. Table II shows the combination of the parity bits which detect SEUs.
The single error correction circuitry shown in Fig. 1(c) can detect and locate an SEU because each SEU affects the parity bits in a unique manner (as shown in Table II). Once the location of the erroneous bit is known, the error can be corrected by simply complementing the output of the erroneous flip-flop. For example, when parity bits P2 and P0 are affected, bit D1 is erroneous and the output of flip-flop D1 should be complemented (Table II). The output of the single error correction circuitry maintains the correct value until the next clock edge, on which the erroneous bit is removed when new data is clocked in. Note that the single error correction circuitry cannot correct errors if more than one SEU occurs in the protected registers during a clock cycle. For example, when data bits D1 and D0 are erroneous, parity bits P2 and P1 detect the error (Table I). However, the combination P2P1 is assigned to bit D2 (Table II); hence, the single error correction circuitry complements bit D2, instead of complementing the erroneous bits D1 and D0.
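The behavior described above can be sketched in software as follows. This is a minimal model, not the hardware of Fig. 1(c). The parity groups for D2 and D1 follow the examples in the text; the groups for D0 and D3 are inferred from the double-error example and from the requirement that each data bit maps to a unique combination of at least two parity bits, so they should be treated as assumptions. All function and variable names are hypothetical.

```python
# Parity groups: which data bits feed each parity bit.
# D2 -> {P2, P1} and D1 -> {P2, P0} are stated in the text;
# D0 -> {P1, P0} and D3 -> {P2, P1, P0} are inferred (assumption).
PARITY_GROUPS = {
    "P2": ("D3", "D2", "D1"),
    "P1": ("D3", "D2", "D0"),
    "P0": ("D3", "D1", "D0"),
}
DATA_BITS = ("D3", "D2", "D1", "D0")

# Each data bit is identified by the set of parity bits it contributes to.
SYNDROME_TO_BIT = {
    frozenset(p for p, group in PARITY_GROUPS.items() if d in group): d
    for d in DATA_BITS
}


def parity_bits(data):
    """Generate the parity bits (even parity) for a dict of data bits."""
    return {p: sum(data[d] for d in group) % 2 for p, group in PARITY_GROUPS.items()}


def syndrome(data, stored_parity):
    """Return the set of parity bits that disagree with the stored ones."""
    fresh = parity_bits(data)
    return frozenset(p for p in PARITY_GROUPS if fresh[p] != stored_parity[p])


def correct(data, stored_parity):
    """Complement the data bit that the syndrome points to, if any.

    A syndrome with fewer than two parity bits (no error, or an SEU in a
    parity flip-flop) leaves the data bits unchanged.
    """
    fixed = dict(data)
    bit = SYNDROME_TO_BIT.get(syndrome(data, stored_parity))
    if bit is not None:
        fixed[bit] ^= 1
    return fixed


if __name__ == "__main__":
    original = {"D3": 1, "D2": 0, "D1": 1, "D0": 1}
    stored = parity_bits(original)

    single = dict(original, D2=original["D2"] ^ 1)      # one SEU in D2
    print(sorted(syndrome(single, stored)), correct(single, stored) == original)

    double = dict(original, D1=original["D1"] ^ 1, D0=original["D0"] ^ 1)
    print(sorted(syndrome(double, stored)), correct(double, stored) == original)
```

In this model a single SEU in D2 is located and corrected, whereas simultaneous SEUs in D1 and D0 produce the syndrome assigned to D2 and are miscorrected, which is why such runs still need a recovery execution.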
It should be noted that if an error (SEU) occurs directly in one of the parity flip-flops, it will have no impact on the data bits read out from the register. This is because when an SEU occurs in a parity flip-flop, only the corresponding parity bit is affected. However, the error correction circuitry inverts a data bit only when at least two parities are affected (Note that each data bit has been assigned to more than one parity bit). The penalty for using overlapping parity on four bits of information is high; three parity bits are required for the four bits of information. However, as the number of information bits increases, the number of parity bits required becomes a smaller percentage of the number of actual information bits. For example, only 7 parity bits are adequate for protecting a 64-bit register [25] . It will be shown later, in Table III , Section IV-B, that the proposed RI system is still effective, even when considering the imposed hardware overheads.
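The shrinking relative overhead can be checked with a short calculation. The sketch below assumes the usual single-error-correction condition, namely that every data bit and every parity bit needs its own distinct nonzero combination of parity bits, so that r parity bits suffice for k data bits when 2^r >= k + r + 1. The function name is hypothetical, and the paper itself takes the 64-bit figure from [25].

```python
def min_parity_bits(data_bits):
    """Smallest r with 2**r >= data_bits + r + 1 (single-error correction)."""
    r = 1
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r


if __name__ == "__main__":
    for width in (4, 8, 16, 32, 64):
        r = min_parity_bits(width)
        print(f"{width:2d} data bits: {r} parity bits ({100.0 * r / width:.1f}% overhead)")
```

Under this assumption the count is three parity bits for a 4-bit register and seven for a 64-bit register, matching the figures quoted in the text.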
A. Fault-Tolerance Assessment
The correctness of a real-time system depends not only on the logical correctness of computation, but also on the time which the application takes to complete successfully. Hence, to measure the fault-tolerance of a real-time system, we need to consider both the tolerance to computation faults as well as the capability to meet deadlines (timely completion). Note that in soft real-time systems, occasionally missing deadlines has negligible effects, however, it is still incorrect (i.e., incorrect but negligible).
While SEUs can cause computation faults, lowering the performance (speed) required by DVS and ABB, increases the application execution time and, hence, in the case of excessive performance reduction, it can cause missed deadlines. Also, in the time-redundancy technique [3] , [7] , the use of rollback executions to tolerate SEUs requires time and when a faulty run occurs the task can be re-executed only if it does not result in missing the deadline (i.e., there is enough slack-time available). From this discussion, it can be seen why in addition to computation faults, timely completion should be taken into account when assessing the fault-tolerance of real-time systems. To do this, the related literature has used the following metrics to measure the fault-tolerance of real-time systems: [2] and [31] have used "the likelihood of timely task completion in the presence of faults," and [3] and [20] have used "the probability to complete the application correctly within its deadline in the presence of faults." It can be seen that these two definitions are equivalent and both of them consider timely task completion in the presence of faults. However, from a terminology point of view, the term "performability" is used in [3] and [20] to refer to this definition, while [2] and [31] do not use the term "performability."
It should be noted that for hard real-time systems, it is important to guarantee timeliness in worst case scenarios (for example, in scheduling the tasks); however, from the reliability point of view, it is not possible to claim that a task will certainly be completed correctly within its deadline. For example, in a hard real-time system, even when the probability of having errors is very low, errors can occur consecutively in the original and recovery executions, so that the task cannot be completed within its deadline (the probability of this happening is very low, but it is not zero). It can be stated that "in hard real-time systems, we require that a task finishes correctly within its deadline with a very high probability." In other words, based on the above-mentioned performability definition, it can be stated that "in hard real-time systems we require a high performability (i.e., very near to 1)." For soft real-time systems, although missed deadlines will not cause catastrophes, we still require that a task can be completed within its deadline. However, the probability of finishing the task within its deadline could be less than what is required for hard real-time systems. In other words, in soft real-time systems, we require a lower performability as compared to hard real-time systems.
Using the performability criterion, this section presents an analysis for both the conventional R and proposed RI systems. In this section, we first consider the system operational frequency since it determines the performance (speed) of the system, which has an important impact on the performability. Then, we consider SEU rate which is another factor with important influence on performability, since SEUs cause faults in computation results. Finally, using the analytical models of operational frequency and SEU rate, we develop the performability models for both the conventional R and proposed RI systems.
A) Operational Frequency: In DVS-enabled systems, reducing the supply voltage of a digital circuit requires the reduction of the frequency in order to ensure correct operation. Similarly, in ABB-enabled systems, reducing the body-bias voltage requires the reduction of the frequency. Analytical models for the impact of ABB and DVS on the system operational frequency have been developed in [21]. In this section, we use the same models to formulate the operational frequency of the conventional R and proposed RI systems.
When the conventional R system runs at a given supply voltage and body-bias voltage (applied between the body and source of transistors), the operational frequency can be expressed as in (3) [21], whose parameters are the logic depth of the critical path, constants for the given process technology, and a measure of velocity saturation whose value has been approximated to be one [21].
This paper proposes the usage of information redundancy, which requires some extra hardware logic to process the redundant information. Suppose that, because of the extra hardware logic, the depth of the critical path of the proposed RI system is a constant factor times the depth of the critical path of the conventional R system; then the operational frequency of the proposed RI system is given by (4), evaluated at the supply voltage and body-bias voltage of the proposed RI system.

B) SEU Rate: The SEU rate is the average number of SEUs occurring in a system per unit of time (e.g., second, hour). It has been observed that the supply voltage (DVS) has an important influence on the SEU rate: as the supply voltage decreases, the SEU rate increases exponentially [8], [9]. In fact, the SEU rate increases by about 1-2 orders of magnitude as the supply voltage decreases by 1 V [8], [9]. Also, it has been reported that body biasing techniques (ABB), used to reduce leakage power, can worsen the SEU rate by 36% in flip-flops [11].
To analyze the impact of combined dynamic voltage scaling and adaptive body biasing on the SEU rate of flip-flops, we have used SPICE-based fault injection experiments. In these experiments, faults were injected into flip-flops similar to the flip-flops used in [11] (in [11], SEU rate measurements are performed by subjecting the flip-flops to accelerated alpha and neutron fluxes). Fig. 2 shows the schematic of these flip-flops. The simulations were carried out using a CMOS 0.25-μm technology. Faults were injected using current sources, which can accurately represent the electrical impact of particle strikes. Similar approaches have been used in prior works [10], [17], [19]. The injected current caused by a particle strike is given by (5) [10]. An SEU occurs if the collected charge (caused by a particle strike) exceeds the critical charge of a circuit node. In other words, the critical charge can be defined as the minimum charge collected due to a particle strike that can cause an SEU [10]. It has been shown that there is an exponential relationship between the SEU rate and the critical charge [10], as expressed in (6).
On the other hand, the critical charge can be derived using (7) [30], which involves the drain current induced by the charged particle and the flipping time; the flipping time defines the irreversibility point after which the feedback mechanism of the flip-flop will take over to continue the flipping process. SPICE simulations were used to measure the flipping time, which can be used to calculate the critical charge. Fig. 3 depicts the experimental results and shows the impact of supply voltage and body-bias voltage variations on the critical charge. In this figure, three curves are plotted for three different body-bias voltages. Each curve illustrates how the critical charge changes as the supply voltage changes. Three interesting observations can be made from Fig. 3.
1) Regardless of the body-bias voltage, there is a linear relationship between the critical charge and the supply voltage, as expressed in (8).
2) When the body-bias voltage changes, the line is shifted up or down; however, the slope of the line is almost the same. This means that the intercept of the line is a function of the body-bias voltage, but the line slope is not. Therefore, (8) can be rewritten as (9).
3) The impact of the supply voltage on the critical charge is much more significant than the impact of the body-bias voltage. For example, when the body-bias voltage is constant and equal to 0, if one reduces the supply voltage by 1.5 V (from 3.3 to 1.8 V), the critical charge will be reduced by about 11.5 fC. However, when the supply voltage is constant and equal to 3.3 V, if one reduces the body-bias voltage by 2 V (from 0 to -2 V), the critical charge will be reduced by about 3 fC. This result is in agreement with the conclusions reached in [8], [9], and [11], i.e., while variations in the supply voltage change the SEU rate by several orders of magnitude (e.g., multiplied by a factor of 10, 100, or 1000) [8], [9], variations in the body-bias voltage change the SEU rate only by a factor of about 1.36. It should be noted that although ABB does not have a major impact on the SEU rate (as compared to DVS), it still has an important impact on the system fault tolerance (Section III-A3). This is because when ABB is used to reduce energy consumption, it uses slack-time and leaves less slack-time for the recovery executions.

As mentioned previously, the SEU rate depends exponentially on the critical charge; therefore, the SEU rate can be expressed as (10). To put (10) in a more convenient form, consider the maximum supply voltage and the voltage step by which the supply voltage must decrease for the SEU rate to increase by one order of magnitude. Then (10) can be rewritten as (11). It should be noted that (11) is obtained from (10) just by defining new constants. Also, (11) can be rewritten as (12), which is expressed in terms of the SEU rate corresponding to the maximum supply voltage. In this paper, it is assumed that the SEU rate increases by one order of magnitude as the supply voltage decreases by 1 V (a reasonable assumption based on the data in [8] and [9]); hence, this voltage step is 1 V.
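The assumed voltage dependence of the SEU rate can be illustrated with a short sketch: the rate grows by one order of magnitude for every `decade_step_v` volts that the supply voltage drops below its maximum. The function and parameter names, the reference rate, and the 3.3-V maximum are hypothetical placeholders; only the 1 V per decade step is taken from the text.

```python
def seu_rate(v_supply, v_max, rate_at_v_max, decade_step_v=1.0):
    """SEU rate at v_supply, assuming one decade of increase per decade_step_v.

    rate_at_v_max is the SEU rate (faults/s) at the maximum supply voltage.
    """
    return rate_at_v_max * 10 ** ((v_max - v_supply) / decade_step_v)


if __name__ == "__main__":
    V_MAX = 3.3        # hypothetical maximum supply voltage, in volts
    RATE_0 = 1e-6      # hypothetical SEU rate at V_MAX, in faults/s
    for v in (3.3, 2.8, 2.3, 1.8):
        print(f"supply voltage {v:.1f} V: SEU rate {seu_rate(v, V_MAX, RATE_0):.3e} faults/s")
```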
Also, it is assumed that faults/s (FPS), i.e., the SEU rate at and [a reasonable assumption based on the data in [3]]. Although this assumption about the SEU rate is reasonable for typical environments [3], the SEU rate varies across environments, so we will analyze the impact of SEU rate variations on both the proposed RI and conventional R systems in Section IV-C. As mentioned previously, it has been shown that ABB can worsen the SEU rate by 36% [11]. Therefore, it is assumed that , where is the minimum value of . The use of information redundancy requires some extra flip-flops to store the redundant bits. However, as the number of flip-flops increases, the rate at which the flip-flops are hit by particles increases linearly [19]. Suppose that because of the redundant bits, the number of the flip-flops of the proposed RI system is times the number of the flip-flops of the conventional R system, then the SEU rate of the proposed RI system is (13)

C) Performability Model: It has been observed that the time instants at which radiation particle hits take place follow a Poisson process [12]. Consequently, the Poisson distribution has been commonly used to model the rate of particle-induced faults (i.e., SEUs) [2], [3], [12]. Let $\lambda$ denote the SEU rate and $f$ the operational frequency of the conventional R system, so that one clock cycle lasts $1/f$ seconds. In the conventional R system, based on the Poisson distribution, the probability of having no SEU during a given clock cycle is

$$P_0 = e^{-\lambda/f} \tag{14}$$

Therefore, in the conventional R system, the probability of having a faulty run (at least one SEU during one of the clock cycles) of the task is

$$P_F = 1 - \left(e^{-\lambda/f}\right)^{N} \tag{15}$$

where $N$ is the number of clock cycles which are needed to execute the task. Since the time required for one execution of the task is $N/f$, the maximum number of possible recoveries is

$$K = \left\lfloor \frac{D}{N/f} \right\rfloor - 1 \tag{16}$$

where $D$ is the deadline (in seconds).

Based on (15) and (16), the performability of the conventional R system, i.e., the probability that at least one of the $K+1$ executions is fault-free, is

$$\mathit{Perf}_{R} = 1 - P_F^{\,K+1} \tag{17}$$

In the proposed RI system, with SEU rate $\lambda_{RI}$ and operational frequency $f_{RI}$, based on the Poisson distribution, the probability of having no SEU during a given clock cycle is

$$P_{0,RI} = e^{-\lambda_{RI}/f_{RI}} \tag{18}$$

and the probability of having exactly one SEU in the clock cycle is

$$P_{1,RI} = \frac{\lambda_{RI}}{f_{RI}}\, e^{-\lambda_{RI}/f_{RI}} \tag{19}$$

Hence, the probability of having a faulty run in the proposed RI system can be expressed as

$$P_{F,RI} = 1 - \left(P_{0,RI} + P_{1,RI}\right)^{N} \tag{20}$$

Note that, as mentioned in Section III, the proposed RI system has a faulty run if more than one SEU (at least two SEUs) occurs during a clock cycle. Based on (20), and with $K_{RI} = \lfloor D f_{RI}/N \rfloor - 1$ the maximum number of possible recoveries in the RI system, the performability of the proposed RI system is

$$\mathit{Perf}_{RI} = 1 - P_{F,RI}^{\,K_{RI}+1} \tag{21}$$

Equations (17) and (21) will be used in Section IV to compare the performabilities of the conventional R system (based on time redundancy only) and the proposed RI system (i.e., the proposed approach based on the combination of time and information redundancy). It is important to note that the performability of both the conventional R system and the proposed RI system increases with increasing supply voltage and body-bias voltage (and consequently increasing operational frequency). This is due to two reasons: 1) more recovery executions can be performed within the task deadline and 2) the system is less susceptible to SEUs at higher supply and body-bias voltages. However, in general, the performability of the RI system is better than that of the R system when the same supply and body-bias voltages are used. This is due to the fact that the additional information redundancy in the RI system, which does not require slack-time for any recovery execution, covers one SEU per clock cycle, hence leaving more slack-time for recoveries. This aspect will be clarified in Section IV.
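To make the comparison between the two performability models concrete, the following sketch evaluates both under the Poisson assumption described above: a run of the R system is faulty if any clock cycle sees at least one SEU, while a run of the RI system is faulty only if some cycle sees two or more. The function names and all numeric values are assumptions for illustration (the SEU rate is deliberately exaggerated), and the logarithmic forms are used only to avoid floating-point cancellation for very small per-cycle probabilities.

```python
import math


def faulty_run_prob_r(seu_rate, freq_hz, n_cycles):
    """P(faulty run) for the R system: 1 - exp(-rate/f)**N."""
    return -math.expm1(-seu_rate * n_cycles / freq_hz)


def faulty_run_prob_ri(seu_rate, freq_hz, n_cycles):
    """P(faulty run) for the RI system: 1 - (P[0 SEUs] + P[1 SEU])**N per cycle."""
    x = seu_rate / freq_hz                      # expected SEUs per clock cycle
    log_ok_cycle = math.log1p(x) - x            # log((1 + x) * exp(-x))
    return -math.expm1(n_cycles * log_ok_cycle)


def max_recoveries(deadline_s, freq_hz, n_cycles):
    """Number of whole re-executions that fit before the deadline."""
    return math.floor(deadline_s * freq_hz / n_cycles) - 1


def performability(p_faulty, recoveries):
    """Probability that at least one of the recoveries + 1 runs is fault-free."""
    return 1.0 - p_faulty ** (recoveries + 1)


if __name__ == "__main__":
    # Hypothetical setting: 1e6 cycles at 100 MHz, 45 ms deadline, 1 SEU/s.
    rate, freq, cycles, deadline = 1.0, 100e6, 1_000_000, 0.045
    k = max_recoveries(deadline, freq, cycles)
    p_r = faulty_run_prob_r(rate, freq, cycles)
    p_ri = faulty_run_prob_ri(rate, freq, cycles)
    print("recoveries that fit:", k)
    print("R : P(faulty run) =", p_r, " performability =", performability(p_r, k))
    print("RI: P(faulty run) =", p_ri, " performability =", performability(p_ri, k))
```

The sketch uses the same rate and frequency for both systems; in the paper the RI system has a higher SEU rate (more flip-flops) and a lower frequency (longer critical path), which can be modeled by passing different arguments to the RI functions.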
B. Energy Consumption Model
The energy consumption per cycle of the conventional R system is the sum of a dynamic energy term and a static energy term, as given by (22) [21]. The parameters of (22) are the average switched capacitance per cycle for the whole circuit, the number of logic gates in the circuit, a set of constant parameters, and the current due to junction leakage.
As mentioned in Section III, in the proposed RI system, some extra hardware logic is needed to process the redundant information. Suppose that, because of the extra hardware, the number of gates in the proposed RI system is a constant factor times the number of gates in the conventional R system. Given the average switched capacitance per cycle for this extra hardware logic, the energy consumption (per cycle) of the proposed RI system is given by (23).

As mentioned in Section III, both the conventional R and the proposed RI systems use rollback-recovery, i.e., after a faulty run the task has to be re-executed. Such recovery executions consume energy just like the original execution. Therefore, to analyze the energy consumption of the conventional R and proposed RI systems, the expected value of energy consumption should be considered. The expected energy consumption is given by (24) [3], in which the energy consumption per cycle is given either by (22) or (23), depending on which system type is considered. According to (22)-(24), if the conventional R system and the proposed RI system operate at the same supply and body-bias voltages, the RI system will show higher energy consumption than the R system. However, it is important to note that the RI system has a much better performability than the R system at the same voltage setting, so it is possible (see Section IV) to lower the supply voltage and body-bias voltage of the RI system via DVS and ABB to achieve less energy dissipation than the R system, while the RI system still provides better performability than the R system.
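The notion of expected energy used above can be sketched as follows, assuming that the original execution always runs and that the k-th recovery runs (and consumes the same energy as one full execution) only when all previous runs were faulty. This form, the function name, and the numbers are assumptions for illustration rather than a reproduction of (24).

```python
def expected_energy(energy_per_cycle_j, n_cycles, p_faulty, max_recoveries):
    """Expected task energy in joules under rollback-recovery.

    Run k (k = 0 is the original execution) is executed with probability
    p_faulty**k and costs n_cycles * energy_per_cycle_j.
    """
    one_run = n_cycles * energy_per_cycle_j
    return sum(p_faulty ** k * one_run for k in range(max_recoveries + 1))


if __name__ == "__main__":
    # Hypothetical: 1 nJ per cycle, 1e6 cycles, up to 3 recoveries.
    for p in (0.0, 0.1, 0.5):
        print(f"P(faulty run) = {p}: expected energy = {expected_energy(1e-9, 1_000_000, p, 3):.6e} J")
```

With a small faulty-run probability the expected energy is dominated by the single original execution, so the per-cycle energy at the chosen voltages is what mainly separates the two systems.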
IV. EXPERIMENTAL AND ANALYTICAL RESULTS
In this section, we validate the efficiency and applicability of the proposed combined time and information redundancy approach as compared to the time-redundancy approach. For this purpose, we have performed a Crusoe processor case study as well as some experiments using several ITC'99 benchmarks. Section IV-A compares the performability and energy dissipation of the conventional R and the proposed RI systems based on the Crusoe processor. Section IV-B investigates the influence of hardware overhead on the suitability of the proposed approach and presents synthesis results to clarify the typical hardware overhead. Section IV-C studies the impact of the SEU rate on the proposed approach.
A. Case Study: Crusoe Processor
This section demonstrates that it is possible to achieve both higher performability and less energy consumption using a combination of information and time redundancy techniques (proposed RI system) when compared to using time redundancy alone (conventional R system). We use as a case study a Transmeta Crusoe processor implemented in 0.18-μm CMOS technology, for which implementation-relevant parameters are given in [21] and [22]. These parameters comprise the following constants needed for the evaluation of performability and energy [see (3), (4), and (14)-(24)]: V, F, A. As an example, a task with clock cycles and a deadline at ms is considered here. This task has a worst case execution time of ms, when V and V. These values for execution time and deadline serve only as an example for plotting the tradeoff graphs. For this example, the deadline allows three recovery executions of the whole task at V and V. Furthermore, for the RI system we assume a hardware overhead as well as increased switching activity of 100% (i.e., ), and a critical path depth increase of 10% ( ). This assumption will be examined in Section IV-B.
Using the analytical models developed in Section III, we analyze the energy/performability tradeoff in the conventional R and proposed RI systems when: a) DVS is used and b) both DVS and ABB are simultaneously used.
1) Energy/Performability Tradeoff in DVS-Enabled Systems:
Fig. 4 shows how the energy consumption and the performability of the conventional R and proposed RI systems change when DVS is used (supply voltage changes and body-bias voltage is constant V). In this figure, the curve of the conventional R system is an energy/performability tradeoff graph obtained from (17) and (24). Also, the energy/performability tradeoff graph of the proposed RI system has been obtained from (21) and (24). It can be seen from this figure that in both systems we can improve the performability (fault-tolerance) by increasing the supply voltage; however, this increases the energy consumption of the system.
As shown in Fig. 4, when the supply voltage of the proposed RI system increases from 1.4 to 1.5 V, the performability does not increase considerably. However, when the supply voltage increases from 1.5 to 1.6 V, the performability abruptly increases. This is because when the supply voltage increases from 1.4 to 1.5 V the number of possible recovery executions remains the same (the performability improves only slightly because of the SEU rate reduction). When the supply voltage changes from 1.5 to 1.6 V, however, the operational frequency reaches the level sufficient to allow one more recovery execution, which leads to an abrupt improvement in performability. A similar pattern is observed for the conventional R system when, for example, the supply voltage changes from 1.2 to 1.4 V and from 1.4 to 1.6 V.
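The step-like behavior described above can be illustrated with a small sketch. It assumes that the number of recovery executions is the number of whole task runs that fit before the deadline minus one, and that the task fails only if every run is faulty. The operating points, task length, deadline, and fault probabilities below are hypothetical and are not the Crusoe parameters of [21] and [22].

```python
# Hypothetical operating points: (supply voltage in V, clock in MHz,
# per-run fault probability). None of these values come from the paper.
OPERATING_POINTS = [
    (1.4, 350, 1.0e-3),
    (1.5, 390, 8.0e-4),
    (1.6, 400, 6.0e-4),
]
TASK_CYCLES = 1_000_000  # assumed task length in clock cycles
DEADLINE_US = 10_000     # assumed deadline in microseconds


def max_recoveries(task_cycles: int, freq_mhz: int, deadline_us: int) -> int:
    """Number of full re-executions that fit before the deadline."""
    cycles_available = freq_mhz * deadline_us  # MHz * microseconds = cycles
    full_runs = cycles_available // task_cycles
    if full_runs < 1:
        raise ValueError("task cannot finish even once before the deadline")
    return full_runs - 1


def failure_probability(p_fault: float, recoveries: int) -> float:
    """Probability that the original run and all recovery runs are faulty."""
    return p_fault ** (recoveries + 1)


for vdd, freq_mhz, p_fault in OPERATING_POINTS:
    k = max_recoveries(TASK_CYCLES, freq_mhz, DEADLINE_US)
    p_fail = failure_probability(p_fault, k)
    print(f"Vdd={vdd} V: {k} recoveries, failure probability {p_fail:.3e}")
```

In this sketch the step from 1.4 to 1.5 V changes only the per-run fault probability, whereas the step from 1.5 to 1.6 V admits one more recovery execution and reduces the failure probability by several orders of magnitude.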
It can be seen from Fig. 4 that the curve of the proposed RI system is below the curve of the conventional R system. This leads to an interesting conclusion.
• When the DVS technique is employed, it is possible to achieve both higher fault-tolerance and less energy using the proposed RI system when compared to the conventional R system. We clarify this by means of the following examples.
1) Suppose we require a performability higher than - (this performability is very near to 1, which means that we require a hard real-time system). As can be seen in Fig. 4, to meet this requirement, we can use the conventional R system with the supply voltage V. However, if we use the proposed RI system with the supply voltage V, we will achieve the required performability as well as about 43% energy saving. In fact, compared to the conventional R system at V, the proposed RI system can even provide both higher performability and lower energy consumption at the same time if we apply the supply voltages 1.4, 1.5, and 1.6 V (Fig. 4).
2) Suppose we require a maximum energy consumption of 10 mJ. As can be seen in Fig. 4, to meet this requirement, we can use the conventional R system with the supply voltage V, which leads to a performability of -. However, if we use the proposed RI system with the supply voltage V, we will achieve the required energy constraint and at the same time a better performability (i.e., -) than the conventional R system.
2) Comparison of the R and RI Systems and Simultaneous DVS and ABB:
Using the analytical models developed in Section III, Fig. 5 shows how the energy consumption and the performability of the conventional R and proposed RI systems change when DVS and ABB are used simultaneously. In this figure, for each system (R and RI) two curves are plotted for two different body-bias voltages, i.e., , and V. Each curve illustrates the energy/performability tradeoff when the supply voltage changes. As shown in Fig. 5, the curves of the proposed RI system are below the curves of the conventional R system. An interesting observation can be made from Fig. 5.
• When both the DVS and ABB techniques are employed, for the same constraint on system fault-tolerance (performability) the proposed RI system offers lower energy consumption than the conventional R system. For example, if we require a performability more than -, as can be seen in Fig. 5, we can use one of the following combinations: 1) conventional R ; 2) conventional R ; 3) proposed RI ; 4) proposed RI . However, if we use combination 4, i.e., proposed RI , we will achieve the required performability as well as the least energy consumption.
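The selection in this example amounts to choosing, among all operating points that satisfy the performability constraint, the one with the lowest energy. The sketch below illustrates that selection; the OperatingPoint type, the cheapest_feasible helper, and every voltage, energy, and performability value are hypothetical and are not read from Fig. 5.

```python
from typing import NamedTuple, Optional, Sequence


class OperatingPoint(NamedTuple):
    system: str            # "R" or "RI"
    vdd: float             # supply voltage (V), hypothetical
    vbs: float             # body-bias voltage (V), hypothetical
    energy_mj: float       # expected energy (mJ), hypothetical
    performability: float  # hypothetical


def cheapest_feasible(points: Sequence[OperatingPoint],
                      min_performability: float) -> Optional[OperatingPoint]:
    """Lowest-energy point whose performability exceeds the constraint."""
    feasible = [p for p in points if p.performability > min_performability]
    return min(feasible, key=lambda p: p.energy_mj) if feasible else None


# Made-up candidates in the spirit of combinations 1) to 4).
POINTS = [
    OperatingPoint("R", 1.6, 0.0, 12.0, 0.99992),
    OperatingPoint("R", 1.6, -0.4, 11.0, 0.99992),
    OperatingPoint("RI", 1.4, 0.0, 8.0, 0.99995),
    OperatingPoint("RI", 1.4, -0.4, 7.0, 0.99995),
    OperatingPoint("RI", 1.2, -0.4, 5.0, 0.99),  # cheapest, but infeasible
]

best = cheapest_feasible(POINTS, min_performability=0.9999)
print(best)
```

The same filter-then-minimize pattern applies when the constraint is on energy and the objective is performability, as in the DVS-only examples.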
B. Hardware Overhead
Although the previous analysis has been carried out for the Crusoe processor, most of the parameters (Section IV-A) are independent of the Crusoe design and depend only on the technology used. In fact, the only parameters that depend on the Crusoe processor are: 1) number of gates and flip-flops; 2) average switched capacitance; and 3) depth of critical path. The hardware overhead, which is required to process the redundant information, influences these three parameters. In order to examine the assumptions made in Section IV-A about the hardware overhead value and to study the impact of the overhead on the efficiency of the proposed approach, we have regenerated the plots of Fig. 5 in Fig. 6 for different parameter settings, i.e., critical path increase , hardware overhead ( and ), and switching activity (switched capacitance) overhead . As we can observe from Fig. 6(a), if the RI system hardware overhead as well as the switching activity are assumed to be 50% higher than in the original R system and the critical path increase to be 4%, then the proposed RI system proves advantageous in terms of both fault-tolerance (performability) and energy dissipation. With increasing critical path (up to 10%) and hardware and switching overheads (up to 200%), the energy consumption and performability of the proposed RI system move closer to those of the conventional R system [Fig. 6(a)-(d)]; however, the proposed RI system still provides better performability and energy dissipation.
To provide insight into the critical path, hardware, and switching activity overhead required for typical circuit designs, we have carried out some synthesis experiments using four circuits from the ITC'99 benchmarks and Synopsys Design Compiler. The benchmarks used are b12-b15. These benchmarks are: 80386 processor (subset), Viper processor (subset), 1 player game, and sensor interfaces. Some of the other ITC'99 benchmarks are so small that they can be regarded as simple components (such as b1, b2). Also, the other ITC'99 benchmarks include several copies of benchmarks b15 and b14 (such as b16 and b17). We have used the most appropriate benchmarks among the ITC'99 benchmarks (such as processors which can be used in real-time applications).
The experiments were performed for the unmodified circuits (representing the R systems) as well as for the modified circuits (based on the overlapping parity method [25]) that included the extra hardware for the redundant information (representing the RI systems). To apply the overlapping parity technique, the flip-flops of the system are divided into registers (with different sizes) and each register is replaced with a corresponding SEU-tolerant register (see Section III). This process has been performed manually.
After synthesis, the total number of signal transitions was used as a criterion to analyze the average switched capacitance and, hence, the dynamic energy consumption. It should be noted that the hardware overhead also accounts for the static energy overhead (see Section III-B). Table III shows the experimental results. As shown in this table, the performed experiments indicate a hardware overhead of 42% to 173% and a switching activity overhead of 59% to 161%. Also, it has been found that the critical path length increase is less than 7%. Note that for such overheads the proposed RI system yields better results in terms of energy and performability (Fig. 6). Overall, the experiments presented in this section have shown that the proposed RI systems offer advantages in terms of energy and performability over conventional R systems. This is particularly the case if the hardware overhead for the additional information redundancy can be kept below 200% (Fig. 6).
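As a reading aid for the overhead figures, the sketch below shows how a relative overhead can be computed from the synthesis results of an unmodified and a modified circuit, and how it can be checked against the 200% bound discussed above. The helper name and the gate and transition counts are hypothetical; they are not entries of Table III.

```python
def overhead_percent(original: float, modified: float) -> float:
    """Relative increase of `modified` over `original`, in percent."""
    if original <= 0:
        raise ValueError("original must be positive")
    return 100.0 * (modified - original) / original


# Hypothetical synthesis results for one circuit (not taken from Table III).
gates_r, gates_ri = 1_000, 1_420                  # gate counts
transitions_r, transitions_ri = 50_000, 79_500    # total signal transitions

hw_overhead = overhead_percent(gates_r, gates_ri)
sw_overhead = overhead_percent(transitions_r, transitions_ri)
print(f"hardware overhead: {hw_overhead:.0f}%")
print(f"switching activity overhead: {sw_overhead:.0f}%")
print("within the 200% bound:", hw_overhead < 200.0 and sw_overhead < 200.0)
```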
C. Impact of SEU Rate
So far, we have assumed that FPS (Section III-A2). However, the SEU rate depends on the application environment and, hence, it is worthwhile to study the impact of the SEU rate on the efficiency of the proposed RI system. To do this, we have regenerated the plots of Fig. 5 in Fig. 7 for different SEU rates. Here, it is assumed that: switching activity overhead % , hardware overhead % , and critical path increase % . It can be seen from Fig. 7 that the advantage of the proposed RI system over the conventional R system grows as the SEU rate increases. We clarify this by means of the following example. Suppose we require a performability more than -. This level of performability can be achieved as follows.
• When FPS [Fig. 7(a)], we can use the conventional R system at and the proposed RI system at . However, at these voltage settings, the proposed RI system offers about 22% energy saving as compared to the conventional R system.
• When FPS [Fig. 7(b)], we can use the conventional R system at and the proposed RI system at . However, at these voltage settings, the proposed RI system offers about 42% energy saving as compared to the conventional R system.
• When FPS [Fig. 7(c)], we can use the conventional R system at and the proposed RI system at . However, at these voltage settings, the proposed RI system offers about 44% energy saving as compared to the conventional R system.
In short, with the performability constraint of -, as the SEU rate increases from FPS to FPS, the energy saving of the proposed RI system over the conventional R system increases from 22% to 44%.
V. CONCLUSION
High fault-tolerance against transient faults (SEUs) and low energy consumption are key objectives in the design of real-time embedded systems. There exist effective energy-saving techniques such as DVS and ABB and mature fault-tolerance techniques which can be used to achieve these objectives. However, careful consideration is needed to achieve both objectives simultaneously, since it has been shown that these two objectives are at odds, i.e., the usage of fault-tolerance techniques increases energy dissipation and the usage of energy-saving techniques reduces system reliability. This paper is intended to contribute to the effort of finding suitable fault-tolerance techniques to be used with systems that employ energy management techniques. It is not intended to provide any new fault-tolerance or energy-saving technique. Toward this goal, this paper has presented the first investigation into the usage of information redundancy in DVS-enabled and ABB-enabled systems. Experimental and analytical studies have shown that the use of a combination of information redundancy and rollback-recovery in DVS-enabled and ABB-enabled real-time systems can significantly improve the system's fault-tolerance as well as energy dissipation, when compared to real-time systems that rely solely on rollback-recovery, even when considering the imposed hardware overheads. Since the SEU rate varies in different environments, the impact of the SEU rate on the suitability of the proposed approach has been analyzed. The analysis has shown that as the SEU rate increases, the proposed system (based on the combination of information redundancy and rollback-recovery) proves more advantageous in terms of energy consumption than the conventional system (rollback-recovery alone).
