Using additional store-checkpoints (SCPs) and compare-checkpoints (CCPs), we present an adaptive checkpointing scheme for double modular redundancy (DMR). The proposed approach dynamically adjusts the checkpoint intervals. We also design methods to calculate the optimal numbers of checkpoints, which minimize the average execution time of tasks. Further, the adaptive checkpointing is combined with a DVS (dynamic voltage scaling) scheme to reduce energy consumption. Simulation results show that, compared with previous methods, the proposed approach significantly increases the likelihood of timely task completion and reduces energy consumption in the presence of faults.
Introduction
Checkpointing is an important fault-tolerance technique for real-time systems operating in harsh environments. Three types of checkpoints are well known: CSCP, SCP and CCP [1] [2] [3]. At a CCP, the states of the processors are compared without being stored; at an SCP, the processors store their states without comparing them. A checkpoint that performs both operations is called a CSCP. Using CCPs and SCPs, Ziv and Bruck have shown numerically that the task execution time is significantly reduced [1, 4]. Using additional CCPs and SCPs, Nakagawa and Fukumoto analyzed, for triple modular redundancy and double modular redundancy respectively, the optimal checkpoint intervals that minimize the task execution time [5]. In addition, many real-time systems are energy-constrained, since system lifetime is determined to a large extent by battery lifetime [2]. Examples include autonomous airborne and sea-borne systems running on a limited battery supply, space systems running on a limited combination of solar and battery power, and time-sensitive systems deployed in remote locations where a steady power supply is not available [3, 6]. DVS has emerged as a popular solution for reducing power consumption during system operation. DVS became possible with the availability of embedded processors that can dynamically scale their frequency by adjusting the operating voltage [2, 3]. Many current embedded processors can dynamically scale their operating voltage, for example Intel's mobile processors with SpeedStep [7] technology. In real-time systems, DVS techniques focus on minimizing the energy consumption of the system while meeting deadlines. DVS and fault tolerance for real-time systems have been studied as separate problems; only recently has an attempt been made to combine fault tolerance with DVS [3].
Combining DVS with CSCPs (or with CCPs and SCPs) can satisfy a system's energy requirements while improving the performance of real-time systems. However, none of the papers mentioned above addresses these issues jointly. In this paper, we use additional SCPs and CCPs to modify the methods of [3] for double modular redundancy (DMR). Unlike existing methods, our approach tunes the scheme to the specific system on which it is implemented and uses both the comparison and the storage operations efficiently, which improves the performance of the checkpointing scheme.
Some notation used in this paper:
$t_s$: the time to store the states of the processors.
$t_{cp}$: the time to compare the processors' states.
$t_r$: the time to roll back the processors to a consistent state.
Adaptive checkpointing scheme
Assume that task τ has period T, deadline D, and worst-case computation time N when there are no faults in the system. An upper bound k gives the number of fault occurrences that must be tolerated, and C is the overhead of a checkpoint. If faults arrive as a Poisson process with rate λ, the average execution time of the task is minimized by a constant checkpoint interval of $\sqrt{2C/\lambda}$ [8]. We refer to this as the Poisson-arrival approach. When the Poisson-arrival approach is used, the effective task execution time in the absence of faults must be less than the deadline D. If the fault-free execution time of a task is N, the worst-case execution time for up to k faults is minimized by a constant checkpoint interval of $\sqrt{NC/k}$ [9]. This is the k-fault-tolerant approach.
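The two constant intervals can be checked numerically. The sketch below is ours, not taken from [8] or [9]: it assumes simple first-order cost models (a checkpoint costs C per interval I; in the Poisson case a fault wastes I/2 of work on average, in the worst case at most one interval I), and shows that minimizing these models over a grid reproduces $\sqrt{2C/\lambda}$ and $\sqrt{NC/k}$. All function names are hypothetical.

```python
import math


def poisson_interval(C, lam):
    """Poisson-arrival approach: constant interval sqrt(2C/lambda)."""
    return math.sqrt(2.0 * C / lam)


def kft_interval(N, C, k):
    """k-fault-tolerant approach: constant interval sqrt(NC/k)."""
    return math.sqrt(N * C / k)


def poisson_overhead_rate(C, lam, I):
    # Assumed first-order model: checkpoint overhead per unit of work (C/I)
    # plus expected lost work per unit time (lam * I/2, because a fault
    # lands in the middle of an interval on average).
    return C / I + lam * I / 2.0


def kft_worst_case_time(N, C, k, I):
    # Assumed model: N/I checkpoints of cost C, plus k faults that each
    # force re-execution of at most one interval of length I.
    return N + (N / I) * C + k * I


if __name__ == "__main__":
    grid = [i / 10 for i in range(1, 10000)]  # candidate intervals 0.1 .. 999.9
    C, lam, N, k = 4.0, 0.5, 100.0, 1          # illustrative values only
    best_p = min(grid, key=lambda I: poisson_overhead_rate(C, lam, I))
    best_k = min(grid, key=lambda I: kft_worst_case_time(N, C, k, I))
    print(best_p, poisson_interval(C, lam))  # grid minimum vs closed form
    print(best_k, kft_interval(N, C, k))
```

Setting the derivative of either model to zero gives the same closed forms, which is why the grid minimum and the formula agree.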
In addition, we assume that task τ is divided equally into n intervals of length $T = N/n$, and a CSCP is always placed at the end of each interval.
Additional SCPs
Each CSCP interval is divided equally into m intervals of length $T_1 = T/m$ (Figure 1). SCPs are placed between the CSCPs: the states of the two processors are stored at $iT_1$ ($i = 1, 2, \ldots, m-1$) and at $jT$. If the two states disagree at time $jT$, we find the most recent SCP at which the stored states are identical and roll back to it. In Figure 1, for example, the two processors are rolled back to $(i-1)T_1$ because an error occurred during $((i-1)T_1, iT_1)$, and execution is repeated from $(i-1)T_1$. The average execution time $R_1(m)$ of a CSCP interval $((j-1)T, jT)$ is given by a renewal equation [4, 10]:
Therefore, the average execution time of a task is $R_{SCP}(n) = nR_1(m)$, where m is chosen to minimize $R_1(m)$ (Figure 2). The adaptive checkpointing with SCPs, adapchp-SCP(D, E, C, k, λ), is described in Figure 3. Line 4 checks whether the task has been completed, and line 5 checks the deadline constraint. Lines 6 and 7 set the lengths of the SCP and CSCP intervals, respectively. Line 9 checks whether a fault has been detected. If there is no fault, the task continues to run; otherwise, the processors roll back to the most recent SCP with identical states and resume execution, as described in lines 12 to 16. In lines 2 and 14, we use the procedure interval(R_d, R_t, C, R_f, λ) [3] (Figure 4) to calculate the checkpoint interval.

Fig. 1 (labels: iT_1, jT, T_1, error detection, rollback point)

Fig. 3 Adaptive checkpointing with SCPs
…
5.  if (R_t > R_d) break with task failure;
6.  Insert SCP with interval length itv;
7.  Insert CSCP with interval length Itv;
8.  Update …;
9.  if (no error has been detected at CSCP)
10.     Resume execution;
11. else {
12.     Roll back to the most recent SCP with identical states;
13.     R_f = R_f - 1;
        …
        Resume execution; }}
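The renewal equation for $R_1(m)$ is not reproduced here, but the SCP rollback rule can be illustrated with a Monte Carlo sketch. The model is our own simplification, not the paper's derivation: lam is an assumed rate at which the two processors' states diverge, locating the most recent SCP with identical states is assumed to cost nothing, and simulate_scp_interval, estimate_R1 and best_m are hypothetical names. It estimates $R_1(m)$ for one CSCP interval of length T and picks the m that minimizes the estimate; for a task split into n CSCP intervals of length N/n, multiplying by n gives an estimate of $R_{SCP}(n)$.

```python
import math
import random


def simulate_scp_interval(T, m, lam, t_s, t_cp, t_r, rng):
    """Time to finish one CSCP interval of work T with m-1 SCPs (assumed model).

    - the interval is split into m pieces of length T1 = T/m; states are stored
      (cost t_s) at every iT1 and at jT, and compared (cost t_cp) only at jT;
    - lam is an assumed rate at which the processors' states diverge;
    - on a mismatch we pay t_r and resume from the last SCP before the first
      erroneous piece (finding that SCP is assumed to be free).
    """
    T1 = T / m
    p_err = 1.0 - math.exp(-lam * T1)  # P(at least one error in a piece)
    start, elapsed = 0, 0.0
    while True:
        first_bad = None
        for i in range(start, m):
            if first_bad is None and rng.random() < p_err:
                first_bad = i  # errors after this point go undetected until jT
            elapsed += T1 + t_s  # execute the piece, then store the state
        elapsed += t_cp  # compare the two processors at the CSCP
        if first_bad is None:
            return elapsed
        elapsed += t_r
        start = first_bad  # the SCP at first_bad*T1 still holds identical states


def estimate_R1(T, m, lam, t_s, t_cp, t_r, runs=20000, seed=1):
    """Average of simulate_scp_interval over many runs (estimate of R_1(m))."""
    rng = random.Random(seed)
    total = sum(simulate_scp_interval(T, m, lam, t_s, t_cp, t_r, rng) for _ in range(runs))
    return total / runs


def best_m(T, lam, t_s, t_cp, t_r, m_max=20, runs=20000):
    """Number of pieces m in 1..m_max with the smallest estimated R_1(m)."""
    return min(range(1, m_max + 1), key=lambda m: estimate_R1(T, m, lam, t_s, t_cp, t_r, runs))


if __name__ == "__main__":
    # t_s, t_cp, t_r follow the SCP simulation setting; T and lam are illustrative.
    T, lam = 1000.0, 1e-3
    m = best_m(T, lam, t_s=2.0, t_cp=20.0, t_r=0.0, m_max=10, runs=2000)
    print("best m:", m, "estimated R_1(m):", estimate_R1(T, m, lam, 2.0, 20.0, 0.0))
```

Without faults the model reduces to $T + m t_s + t_{cp}$, so extra SCPs only pay off once errors are frequent enough that shorter re-execution outweighs the added store cost.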
In Figure 4, I_2(R_t, R_f, C) is the checkpoint interval of the k-fault-tolerant approach; the other intervals and thresholds used in Figure 4 are defined in [3]. Figure 4 first calculates the number of faults Exp-fault that are expected to occur in the remaining time R_t. If Exp-fault is less than or equal to R_f, the k-fault-tolerant requirement is deemed to be more stringent than the Poisson-arrival criterion. In line 3, a check is performed to see whether R_t exceeds a first threshold; if so, the checkpoint interval is set to I_3(R_t, R_d, C). In line 5, a check is performed to see whether R_t exceeds the threshold Th(R_d, R_f, C) without exceeding the threshold of line 3; if so, the checkpoint interval is set to I_2(R_t, Exp-fault, C). Otherwise, the k-fault-tolerant threshold is met, and the checkpoint interval is set to I_2(R_t, R_f, C) in line 7. Lines 8 to 10 handle the case in which the k-fault-tolerant requirement is deemed less stringent than the Poisson-arrival criterion, where the interval is I_1(C, λ).
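The control flow shared by Figure 3 and [3] (check completion, check the deadline, execute an interval, and recompute the interval after each fault) can be sketched as follows. This is a hypothetical simplification: one checkpoint type on one processor, $t_r = 0$ as in the simulations, and an interval() that keeps only the choice between I_2(R_t, R_f, C) and I_1(C, λ). The closed forms $\sqrt{R_t C/R_f}$ and $\sqrt{2C/\lambda}$ are our assumption (the constant-interval rules with N replaced by R_t and k by R_f); the I_3 and threshold branches of [3] are omitted.

```python
import math
import random


def interval(R_t, C, R_f, lam):
    """Hypothetical, simplified stand-in for procedure interval() of [3]."""
    exp_fault = lam * R_t  # expected faults in the remaining time
    if R_f > 0 and exp_fault <= R_f:
        return math.sqrt(R_t * C / R_f)  # k-fault-tolerant branch, I_2(R_t, R_f, C)
    if lam > 0:
        return math.sqrt(2.0 * C / lam)  # Poisson-arrival branch, I_1(C, lam)
    return R_t  # nothing to tolerate: a single checkpoint at the end


def adaptive_run(N, D, C, k, lam, rng):
    """One run of the simplified adaptive loop; True if the deadline is met.

    Assumes C > 0 so that every interval makes progress in time.
    """
    R_t, R_d, R_f = float(N), float(D), k  # remaining work, time left, faults to tolerate
    itv = interval(R_t, C, R_f, lam)
    while True:
        if R_t <= 0:
            return True   # task completed
        if R_t > R_d:
            return False  # deadline can no longer be met
        seg = min(itv, R_t)
        R_d -= seg + C    # execute one interval and take a checkpoint
        fault = lam > 0 and rng.expovariate(lam) < seg + C
        if not fault:
            R_t -= seg    # checkpoint succeeded: keep the work
        else:
            R_f = max(R_f - 1, 0)            # roll back: this interval's work is lost
            itv = interval(R_t, C, R_f, lam)  # adapt the interval after the fault


def timely_completion(N, D, C, k, lam, runs=10000, seed=0):
    """Fraction of runs that finish by the deadline (an estimate of P)."""
    rng = random.Random(seed)
    return sum(adaptive_run(N, D, C, k, lam, rng) for _ in range(runs)) / runs


if __name__ == "__main__":
    # Illustrative values only.
    print(timely_completion(N=5000, D=7000, C=22.0, k=5, lam=1e-3, runs=2000))
```

Because fault arrivals are memoryless, drawing a fresh exponential time for each interval is equivalent to a single Poisson process over the whole run.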
Additional CCPs
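As with SCPs, each CSCP interval is divided equally into m intervals of length $T_2 = T/m$. The CCPs are placed between the CSCPs, and the states of the two processors are compared at $iT_2$ ($i = 1, 2, \ldots, m-1$) and at $jT$. If the two states disagree at $iT_2$ or at $jT$, some error has occurred during the current CSCP interval. Because CCPs do not store states, the two processors are rolled back to the last CSCP at $(j-1)T$.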
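For CCPs, every detected mismatch restarts the interval from $(j-1)T$, so the expected time per CSCP interval follows from a renewal argument: it equals the expected time of one attempt divided by the probability that an attempt succeeds. The closed form below is a sketch under our own assumptions, not the paper's expression: lam is an assumed state-divergence rate for the processor pair, every CCP costs $t_{cp}$, the CSCP costs $t_s + t_{cp}$, and the function name is hypothetical.

```python
import math


def expected_ccp_interval_time(T, m, lam, t_s, t_cp, t_r):
    """Expected time to finish one CSCP interval with m-1 CCPs (assumed model)."""
    T2 = T / m
    p = math.exp(-lam * T2)  # probability that one piece is error-free
    attempt = 0.0
    for i in range(1, m + 1):
        # First error falls in piece i and is caught at the i-th comparison;
        # the CSCP (i == m) also pays the store cost, then we roll back.
        cost = i * (T2 + t_cp) + (t_s if i == m else 0.0) + t_r
        attempt += p ** (i - 1) * (1.0 - p) * cost
    attempt += p ** m * (m * (T2 + t_cp) + t_s)  # error-free attempt
    return attempt / p ** m  # renewal: E = E[attempt] / P(success)


if __name__ == "__main__":
    # CCP-friendly costs as in the CCP simulations (t_s=20, t_cp=2); T, lam illustrative.
    for m in range(1, 7):
        print(m, round(expected_ccp_interval_time(1000.0, m, 1e-3, 20.0, 2.0, 0.0), 2))
```

Since a CCP costs only $t_{cp}$, adding CCPs shortens the time to detection cheaply, which is why this variant suits systems where storing is the expensive operation.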
Adaptive checkpointing with DVS
With additional SCPs and CCPs, we now show how the adaptive checkpointing scheme can be combined with DVS to obtain both fault tolerance and power savings in real-time systems. On the one hand, our approach maximizes the probability that the task meets its deadline in the presence of faults; on the other hand, it reduces energy consumption through DVS.
Assume that task τ requires a fixed number of computation cycles N in the fault-free case. Because variable-voltage CPUs are available, the time to execute τ depends on the processor speed. We therefore characterize τ by a fixed quantity N, namely its worst-case number of CPU cycles, needed to execute the task at the minimum processor speed. For the rest of this paper, we normalize the units of N so that the minimum processor speed is 1: if the minimum processor speed is S cycles per second, we express the number of cycles in units of S cycles, which normalizes the minimum processor speed to $S_{min} = 1$. Likewise, the period T and the deadline D are expressed in terms of the number of CPU cycles at the minimum processor speed.

Fig. 4 Calculating checkpointing interval
…
6.      chk_interval = I_2(R_t, Exp-fault, C);
7.  else chk_interval = I_2(R_t, R_f, C); }
8.  else chk_interval = I_1(C, λ);
…

To simplify the analysis and allow analytical formulas to be derived, we assume a single processor with two speeds $f_1$ and $f_2$, where $f_1$ is the minimum processor speed, i.e., $f_1 = S_{min} = 1$. Moreover, the processor can switch its speed in a negligible amount of time.
Additional notation used below:
$R_c$: the number of instructions of the task that remain to be executed at the time of the voltage scaling decision.
$c$: the number of clock cycles that a single checkpoint takes.
$t_{est}$: an estimate of the time that the task needs to execute in the presence of faults and with checkpointing. The expected number of faults during $t_{est}$ is $\lambda t_{est}$.
The checkpointing cost C at frequency f is given by $C = c/f$.
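The speed-dependent quantities above are simple arithmetic. The helpers below (hypothetical names) compute the checkpoint cost $C = c/f$, the fault-free time $R_c/f$ for the remaining work (this treats each remaining instruction as one cycle, which is our assumption), and the expected fault count $\lambda t_{est}$. With c = 22 as in the simulations, a checkpoint costs 22 time units at $f_1 = 1$ and 11 at $f_2 = 2$.

```python
def checkpoint_cost(c, f):
    """C = c / f: cost of a checkpoint of c clock cycles at normalized speed f."""
    return c / f


def remaining_time(R_c, f):
    """Fault-free time for R_c remaining cycles at speed f (one instruction per cycle assumed)."""
    return R_c / f


def expected_faults(lam, t_est):
    """Expected number of faults over the estimated execution time t_est."""
    return lam * t_est


if __name__ == "__main__":
    f1, f2, c = 1.0, 2.0, 22  # two speeds with f2 = 2*f1; c as in the simulations
    for f in (f1, f2):
        print(f, checkpoint_cost(c, f), remaining_time(5000, f))
```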
To tolerate $\lambda t_{est}$ faults during task execution, the checkpointing interval must be set to:
In addition, we have:
We consider voltage scaling to be feasible if:
This forms the basis of the energy-aware adaptive checkpointing procedures adapchp_dvs_SCPs and adapchp_dvs_CCPs (Figure 6 and Figure 7).

…
7.  Insert CCP with interval length …;
8.  Insert CSCP with interval length …;
9.  Update …, according to speed …;
10. if (no error has been detected at CCP)
11.     Resume execution;
12. else {
13.     Roll back to the last CSCP;
14.     R_f = R_f - 1;
…
Simulation results
We carried out a set of simulation experiments to evaluate our adaptive checkpointing schemes adapchp_dvs_CCPs and adapchp_dvs_SCPs (referred to as A_D_C and A_D_S) and to compare them with the Poisson-arrival (Poisson) and k-fault-tolerant (k-f-t) checkpointing schemes and with ADT_DVS [3] (A_D). Faults are injected into the system as a Poisson process with various values of the arrival rate λ. Because fault arrival is stochastic, each experiment is repeated 10,000 times for the same task, and the results are averaged over these runs. We are interested in the probability that the task completes on time and in the energy consumption. Energy consumption is measured by summing, over all segments of the task, the product of the square of the voltage and the number of computation cycles [3]. As in [3], we use the term task utilization U for the ratio N/D. To compare with the results of the ADT_DVS scheme, we set $t_r = 0$ and $f_2 = 2f_1$. Finally, P and E denote the probability of timely completion of tasks and the energy consumption, respectively.
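The evaluation metrics and the utilization can be written down directly. In the sketch below (hypothetical helper names), energy() follows the measure of [3], the sum of $V^2$ times the cycle count over all task segments; the supply voltage at each speed is left to the caller, since it is not specified here. utilization() computes $N/(fD)$, which is U = N/D at f = 1, and timely_probability() averages per-run outcomes into P.

```python
def energy(segments):
    """Sum of V^2 * cycles over task segments given as (voltage, cycles) pairs."""
    return sum(v * v * cycles for v, cycles in segments)


def utilization(N, D, f=1.0):
    """Task utilization N / (f * D); f = 1 gives U = N/D."""
    return N / (f * D)


def timely_probability(outcomes):
    """Fraction of runs (booleans) that met the deadline: the metric P."""
    outcomes = list(outcomes)
    return sum(outcomes) / len(outcomes)


if __name__ == "__main__":
    # Hypothetical voltages: 6000 cycles at V = 1.0 and 2000 cycles at V = 2.0.
    print(energy([(1.0, 6000), (2.0, 2000)]), utilization(8000, 10000))
```

Because energy grows with $V^2$ per cycle, running more of the task at the lower speed reduces E as long as the deadline can still be met.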
Additional SCPs
As mentioned previously, the additional-SCPs scheme suits systems in which the time overhead is determined mainly by the time to compare the processors' states. Therefore, the parameters are as follows: D = 10000, $t_s = 2$, $t_{cp} = 20$, c = 22. For low fault arrival rates and high task utilization, we draw conclusions similar to those described above.
We assume that both the Poisson-arrival and the k-fault-tolerant schemes use the higher speed $f_2$, so the task utilization in this case is $U = N/(f_2 D)$. The experimental results are shown in Table 2; again, our scheme outperforms the other three schemes.
Additional CCPs
The additional-CCPs scheme suits systems in which the time overhead is determined mainly by the time to store the processors' states. Therefore, the parameters are as follows: D = 10000, $t_s = 20$, $t_{cp} = 2$, c = 22.
Tab. 3 The comparison between adapchp-dvs-CCPs and other algorithms
In Table 3 and Table 4, both the Poisson-arrival and the k-fault-tolerant schemes use the lower speed $f_1$. As in Section 4.1, the simulation results show that, compared with the ADT_DVS scheme, the proposed scheme significantly increases the likelihood of timely task completion and reduces energy consumption in the presence of faults.
Conclusion
In this paper, we presented an adaptive checkpointing scheme for DMR with two processors that is tuned to the specific system on which it is implemented. The scheme inserts two types of checkpoints (CCPs and SCPs) between CSCPs. Separating the comparison and store operations makes it possible to choose the optimal interval for each operation independently of the other. We also discussed the optimal numbers of checkpoints that minimize the average execution time. Based on this, we combined the adaptive checkpointing with DVS to reduce energy consumption. Simulation results showed the advantages of our scheme. In future work, we will extend the proposed scheme to other task-duplication systems with security needs.
