Two-phase low-energy N-modular redundancy for hard real-time multi-core systems by Salehi, Mohammad et al.
 
 
Two-Phase Low-Energy N-Modular Redundancy 
for Hard Real-Time Multi-Core Systems 
Mohammad Salehi, Alireza Ejlali, and Bashir M. Al-Hashimi, Fellow, IEEE  
Abstract—This paper proposes an N-modular redundancy (NMR) technique with low energy-overhead for hard real-time multi-
core systems. NMR is well-suited for multi-core platforms as they provide multiple processing units and low-overhead 
communication for voting. However, it can impose considerable energy overhead and hence its energy overhead must be 
controlled, which is the primary consideration of this paper. For this purpose the system operation can be divided into two 
phases: indispensable phase and on-demand phase. In the indispensable phase only half-plus-one copies for each task are 
executed. When no fault occurs during this phase, the results must be identical and hence the remaining copies are not 
required. Otherwise, the remaining copies must be executed in the on-demand phase to perform a complete majority voting. In 
this paper, for such a two-phase NMR, an energy-management technique is developed where two new concepts have been 
considered: i) Block-partitioned scheduling that enables parallel task execution during on-demand phase, thereby leaving more 
slack for energy saving, ii) Pseudo-dynamic slack, that results when a task has no faulty execution during the indispensable 
phase and hence the time which is reserved for its copies in the on-demand phase is reclaimed for energy saving. The energy-
management technique has an off-line part that manages static and pseudo-dynamic slacks at design time and an online part 
that mainly manages dynamic slacks at run-time. Experimental results show that the proposed NMR technique provides up to 
29% energy saving and is six orders of magnitude more reliable than a recent previous work. 
Index Terms— Energy minimization, multi-core systems, real-time and embedded systems, reliability, scheduling.   
——————————      —————————— 
1 INTRODUCTION
MULTI-CORE platforms have emerged as popular 
and powerful computing engines for many recent 
embedded systems [1], [2], [3], [4], [5]. While such archi-
tectures have been employed for embedded applications 
that require high-performance computing, we believe 
they also offer considerable new opportunities for designing 
embedded systems where hard real-time operation, 
high reliability in the presence of transient faults, and low 
energy consumption are required [6], [7], [8]. In this pa-
per, we address the use of multi-core platforms to achieve 
high reliability with low energy-overhead for hard real-
time embedded systems. 
To achieve reliability against transient faults, we con-
sider N modular redundancy (NMR) [9], [10], where mul-
tiple processing units execute identical copies for each 
task and their results are voted on to produce a single 
output. NMR is well-suited for multi-core platforms as 
they satisfy NMR requirements such as multiple pro-
cessing units and low-overhead communication for vot-
ing [3]. An NMR system can mask faults while less than 
half of its units are faulty. Fault-tolerant real-time systems 
that have been considered in previous works require fault-detection 
mechanisms (e.g., [5], [6], [7], [8], [11]) and these 
works have assumed (usually implicitly) that they have 
perfect detection mechanisms (i.e., they can detect all 
faulty task executions). However, common fault-detection 
mechanisms are far less effective than what is required 
for highly reliable systems, whereas NMR does not re-
quire any specific fault-detection mechanism and uses 
result comparison (majority voting) for fault-detection 
and masking [9], [10]. Since it is very unlikely that all 
modules in NMR become faulty at the same time and 
produce the same erroneous results, comparing the results 
can provide almost perfect fault-detection/-masking [9], 
[10]. Also, result comparison can be combined with hash-
based detection mechanisms, e.g. Fingerprinting [31], to 
achieve very high detection coverage, about 1 − 2^−16 [31]. 
Therefore, in our experiments in Section 5 we will assume 
1 − 2^−16 detection coverage for our system. Like all other 
fault-tolerance and fault-masking techniques, NMR can 
impose considerable energy overhead [9], [10], which is 
an important concern in the embedded systems where 
energy consumption is prominent. To reduce the energy 
overhead, we propose an energy-management technique 
that bears the major contributions of the work and is spe-
cifically developed for NMR when used for hard real-
time multi-core systems (Sections 3 and 4). The main 
contributions of this work are: 
i) Considering the dominance of fault-free executions 
over faulty executions [6], [8], [38], a two-phase 
NMR is proposed that achieves minimized energy 
consumption in the absence of faults while guaran-
teeing reliability and deadline requirements. 
ii) A specific type of slack time, called pseudo dynamic 
slack, is considered in this work. As explained in 
Section 4, this type of slack time is different from 
conventional slack times, i.e., static and dynamic 
slack [6], [8], [20]. 
 
———————————————— 
• Mohammad Salehi and Alireza Ejlali are with the Department of Computer 
Engineering, Sharif University of Technology, Tehran 14588, Iran (e-mail: 
mohammad_salehi@ce.sharif.edu, ejlali@sharif.edu). 
• Bashir M. Al-Hashimi is with the School of Electronics and Computer 
Science, University of Southampton, Southampton SO17 1BJ, U.K. (e-mail: 
bmah@ecs.soton.ac.uk). 
 
iii) An energy-management technique is proposed that 
exploits the pseudo-dynamic and static slacks 
through offline optimization (Section 4.2). This is 
different from previous works that have not pro-
posed a mechanism to manage the pseudo-
dynamic slack. Also, an online energy-management 
technique is proposed to exploit dynamic slacks at 
run-time (Section 4.3). 
iv) A specific scheduling technique is developed, 
called block-partitioned scheduling (Section 3) that 
enables in-advance parallel task execution 
(Section 3) to exploit pseudo-dynamic slacks 
more effectively. 
The remainder of this paper is organized as follows. In 
Section 2 we review the related work. The proposed technique 
is presented in Section 3. Section 4 describes the 
energy-management method which is used for the pro-
posed technique. The experimental results are presented 
in Section 5. Finally, Section 6 concludes the paper. 
2 RELATED WORK 
Some research works, e.g., [6], [7], [11], have addressed 
both fault tolerance and low energy-consumption in fault-
tolerant real-time systems with two processors. These 
works have not considered multiple faults per task execu-
tion, and also they assume they have perfect fault-
detection mechanisms. [13] has proposed voltage-scaling 
techniques to reduce the energy consumption of triple-
modular redundancy (TMR). However, this work has 
only considered single task applications. 
Many previous works in the context of multi-processor 
systems either propose energy reduction management 
techniques without considering reliability (e.g., [14], [15], 
[32], [33]) or consider reliability without considering en-
ergy consumption (e.g., [4], [30], [34]). [14] has considered 
variation in execution times to propose a scheduling algo-
rithm based on dynamic voltage scaling (DVS) [12] for 
multi-processor systems. [15] has studied the energy effi-
ciency of multi-core platforms that use multiple voltage 
islands. [32] has proposed a technique to minimize chip-
level peak power consumption in multi-core systems 
running sporadic real-time tasks. [33] has proposed an 
adaptive task partitioning for multi-core systems running 
independent periodic real-time tasks. [4] has evaluated 
scheduling heuristics for tasks with different criticality. 
[30] has proposed a mapping optimization technique for 
mixed critical multi-core systems with different reliability 
requirements. [34] has proposed software transformations 
to increase reliability through reducing instructions vul-
nerabilities and the executions of critical instructions. 
Recently, research works have also focused on 
both energy and reliability considerations in multi-core 
systems. Some works, e.g. [35], [36], [37] have proposed 
multi-core architectures that exploit redundancy at differ-
ent levels of abstraction to target low-energy consump-
tion and reliability. [35] has proposed an adaptive multi-
core architecture that selectively adjusts pipeline-level 
redundancy to satisfy reliability target with low energy 
consumption. [36] has proposed a customizable chip-level 
redundancy technique for multi-core systems that utilizes 
power efficient hardware fault-detection mechanisms 
along with forward recovery to reduce overheads in case 
of fault-free executions. [37] has considered the effects of 
DVS on the soft error rate and proposed a flexible dual 
modular redundancy (DMR) mechanism that selectively 
enables per-core DMR to increase reliability. However, 
these works require hardware modiﬁcation or redesign, 
and hence, cannot be used by the current commercial-off-
the-shelf processors, while our proposed technique is 
general and can be exploited by any multi-core processor 
that supports DVS. Some works, e.g. [5], [16], [17], [18], 
[38], have proposed energy-management techniques for 
task-level redundancy in multi-core systems. [5] and [16] 
have considered only one faulty execution for each task to 
preserve the original system reliability, while for many 
applications (e.g., the applications that are used in harsh 
environments) a high level of reliability cannot be 
achieved without tolerating multiple faulty tasks [9], [10], 
[17], [18]. Some works have considered different applica-
tion models, e.g. periodic independent real-time tasks in 
[17] and [38] and parallel independent applications in 
[18]. However, these works cannot be applied to tasks 
with precedence constraints (e.g., task graphs [5], [6], [7]), 
while we consider hard real-time applications with task 
precedence constraints and propose a scheduling and 
energy-management technique for these applications.   
3 PROPOSED TWO-PHASE NMR TECHNIQUE 
In this paper, we consider frame-based applications [5], 
[6], [7] with hard timing requirements and task prece-
dence constraints where n dependent tasks {T1, T2, 
T3,…,Tn} are executed within each execution frame and 
must be completed as a whole before the end of the frame 
(specified by a deadline D). We also consider that the task 
precedence constraints (dependencies between the tasks) 
are depicted as a directed acyclic graph (DAG) [5], [6], [7]. 
For example, Fig. 1a shows an application task 
graph with six tasks, where the number placed above each 
task is its worst-case execution time at the maximum 
supply voltage Vmax and the maximum operational frequency 
fmax (denoted by Wi for each task Ti). For this type 
of applications we propose a two-phase NMR technique 
with low energy consumption running on multi-core 
platforms. To do this, a new scheduling technique is pro-
posed and a new type of slack time (which is specific to 
the proposed two-phase NMR) is exploited to manage 
energy consumption. In this section we describe the two-
phase operation of the system and the proposed schedul-
ing technique and in the next section we explain the ener-
gy-management technique. 
The two operation phases of the proposed NMR are: 
1. Indispensable phase: At first the system operates in 
its indispensable phase where it executes a multi-core 
schedule containing ⌈N/2⌉ copies of each 
task. For each task, the results of the ⌈N/2⌉ task 
copies are compared. If no fault occurs, the task results 
must be identical, and in this case they are used as 
the result of the system. However, when the results 
are not identical (when some faults have occurred 
during the indispensable phase), the system tem-
porarily switches to the on-demand phase where it 
executes the remaining ⌊N/2⌋ copies of the task to 
perform a complete majority voting. 
2. On-demand phase: In this phase, the system executes 
a part of a multi-core schedule that contains the 
remaining copies of the task which had faulty executions 
in the indispensable phase. As ⌈N/2⌉ copies 
of the task have already been executed in the 
indispensable phase, in the on-demand phase we 
execute the remaining ⌊N/2⌋ copies of the same 
task to obtain N results for performing a complete 
majority voting to mask the faults. 
Therefore, each of the two operation phases of the 
proposed NMR technique requires its own schedule, so 
we need to synthesize two schedules from the same 
application task graph. These two schedules are: i) a multi-core 
schedule containing ⌈N/2⌉ copies of each task for 
the indispensable phase, ii) a multi-core schedule containing 
⌊N/2⌋ copies of each task for the on-demand phase. 
It is known that finding the optimal multi-core schedule 
to maximize parallelism (i.e., to minimize the schedule 
length) is an NP-hard problem [5]. Multi-core 
schedules are therefore typically obtained by the list scheduling 
algorithm [19], a simple heuristic that also provides 
parallelism. Similarly, in this paper we use list scheduling 
to synthesize the multi-core schedules of the indispensable 
and on-demand phases. In the list scheduling, 
whenever several tasks can be scheduled (i.e., tasks 
whose predecessors have all been scheduled), we use 
the longest task first (LTF) policy to determine the execution 
order. We will discuss in Section 4 why the LTF policy 
is effective for our proposed technique. For example, 
considering a TMR system (i.e., NMR with N=3), Fig. 1 
shows the step by step generation of the two multi-core 
schedules for a given task graph (Fig. 1a) using list 
scheduling with LTF policy. Fig. 1b shows the schedule 
with two copies of each task for the indispensable phase 
and Fig. 1c shows the schedule with one copy of each task 
for the on-demand phase. 
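The list scheduling step with the LTF policy can be sketched as follows. This is a minimal illustration rather than the implementation used in the paper: the function name `list_schedule_ltf`, the rule of placing the q copies on the q earliest-idle cores with a common start time, the tie-breaking by task name, and the six-task graph (`WCET`, `PREDS`) are all assumptions made for the example, and the graph is not claimed to match Fig. 1a.

```python
def list_schedule_ltf(wcet, preds, q, num_cores):
    """Greedy list scheduling with a longest-task-first (LTF) policy.

    wcet:  {task: worst-case execution time at maximum frequency}
    preds: {task: set of predecessor tasks} (a DAG)
    q:     number of copies of each task placed in the schedule
    Returns {task: (start, finish, cores)}; the q copies start together.
    """
    if q > num_cores:
        raise ValueError("this sketch assumes at least q cores")
    core_free = [0] * num_cores          # time at which each core becomes idle
    schedule = {}
    while len(schedule) < len(wcet):
        # Ready tasks: unscheduled, with all predecessors already scheduled.
        ready = [t for t in wcet
                 if t not in schedule
                 and all(p in schedule for p in preds.get(t, ()))]
        if not ready:
            raise ValueError("task graph contains a cycle")
        # LTF: longest ready task first (ties broken by name, an assumption).
        task = min(ready, key=lambda t: (-wcet[t], t))
        data_ready = max((schedule[p][1] for p in preds.get(task, ())), default=0)
        # Assumed placement rule: use the q cores that become idle earliest.
        cores = sorted(range(num_cores), key=lambda c: core_free[c])[:q]
        start = max(data_ready, max(core_free[c] for c in cores))
        finish = start + wcet[task]
        for c in cores:
            core_free[c] = finish
        schedule[task] = (start, finish, cores)
    return schedule


# Hypothetical six-task graph (WCETs and edges are illustrative assumptions).
WCET = {"T1": 20, "T2": 60, "T3": 40, "T4": 30, "T5": 40, "T6": 20}
PREDS = {"T2": {"T1"}, "T3": {"T1"}, "T4": {"T1"}, "T5": {"T3"}, "T6": {"T4"}}

s_ind = list_schedule_ltf(WCET, PREDS, q=2, num_cores=4)  # indispensable phase (TMR)
s_tmp = list_schedule_ltf(WCET, PREDS, q=1, num_cores=4)  # on-demand phase (TMR)
```

Calling the same function with q = 2 and q = 1 mirrors how a TMR system would obtain its two schedules from one task graph; the on-demand schedule produced this way is not necessarily block-partitioned.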
For the schedule which is used in the on-demand 
phase, we require that each task overlap (in time) 
with at most one other task on each of the other cores. For 
example, the multi-core schedule of Fig. 1c (step 6) does 
not satisfy this condition, as in this schedule T2 overlaps 
with both T3 and T5 on Core2 and also overlaps with both 
T4 and T6 on Core3. We therefore need the schedule of 
Fig. 1c (step 6) to be transformed into a schedule like the 
one in Fig. 1d, which satisfies the condition as each task 
overlaps with at most one other task on each of the other 
cores. We require this condition because it 
lets us partition the multi-core schedule into time blocks, 
so that each block contains either a single task or multiple parallel 
tasks. For example, in Fig. 1d the block B1 only 
consists of the task T1, the block B2 consists of the parallel 
tasks T2, T3 and T4, and the block B3 consists of the parallel 
tasks T5 and T6. In this paper, we call such schedules 
block-partitioned (BP) schedules. As we will show later in 
this section, whenever a fault occurs during the indispen-
sable phase, we switch to the on-demand phase to execute 
exactly one block of the BP schedule and then we switch 
back to the indispensable phase to continue executing the 
schedule of the indispensable phase. 
As a multi-core schedule which has been synthesized 
using the list scheduling technique with LTF policy (e.g., 
the schedule of Fig. 1c) may not be BP, we use a simple 
technique to convert ordinary schedules to BP schedules. 
Suppose that in a multi-core schedule a task TA overlaps 
with two other tasks TB and TC scheduled on another core 
(Fig. 2a). Assuming that the task TC comes after the task 
TB, we simply shift the task TC (and all its successor tasks) 
to the right until there is no overlap between TA and TC. 
As can be seen from Fig. 2a, the amount of this shift 
(denoted by σ in the figure) is simply the difference be-
tween the finish time of TA and the start time of TC. We 
start from the beginning of a multi-core schedule, move to 
the right, and apply this technique until we obtain a BP 
multi-core schedule. As an example, when we apply this 
technique to the schedule of Fig. 1c (step 6), we obtain the 
BP schedule of Fig. 2b (step 3). One point that should be 
noted here is that block-partitioning may increase the 
execution time of an application and hence it may cause 
the application to be unschedulable. Therefore, we use the 
proposed energy-management technique (Section 4) 
only when the application total execution time does not exceed its 
deadline. This implies that the energy-management technique 
may not be applicable to some applications that have 
[Figure 1: panels (a) task graph G with tasks T1 to T6; (b) ListSchedulingLTF(G, 2) for the indispensable phase, steps 1 to 6, on Core1 to Core4; (c) ListSchedulingLTF(G, 1) for the on-demand phase, steps 1 to 6; (d) block-partitioned schedule with blocks B1, B2, B3. Legend: Indispensable phase; On-demand phase; Majority voting (Result comparison)/Saving results.]
 
Fig. 1. Synthesizing a TMR system (i.e., NMR with N=3) on a quad-core platform. a) An example task graph, b) Creating a schedule with two 
copies for each task for the indispensable phase, c) Creating a schedule with one copy for each task for the on-demand phase, and d) A 
block-partitioned version of the on-demand phase schedule. 
tight deadlines. Similar schedulability conditions are used 
by other techniques, e.g. [16], [17] and [38], to define in-
feasible solutions.   
In the following, we describe how the proposed two-
phase NMR technique works by means of the example of 
Fig. 1 where we have a TMR system running on a quad-
core platform. When no fault has occurred the system 
executes the schedule of the indispensable phase (Fig. 1b 
(step 6)) where two copies of each task Ti are executed 
and their results are compared. If the results are identical, 
they are used as the result of the system. Whenever the results 
of a task Ti are not identical (which indicates that some 
faults have occurred during the indispensable phase), we 
switch to the on-demand phase to execute the block of the 
BP schedule of the on-demand phase (Fig. 1d) that in-
cludes the same task Ti. After executing the third copy of 
Ti in the on-demand phase, a majority voting is done over 
the three results to mask the faults. Then, we switch back 
to the indispensable phase to continue executing its 
schedule from the point at which it was interrupted. 
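The per-task behaviour described above (run ⌈N/2⌉ copies, compare, and fall back to the remaining ⌊N/2⌋ copies plus majority voting only on a mismatch) can be sketched as below. This is an illustrative model under simplifying assumptions: `run_task_two_phase` and `run_copy` are hypothetical names, copies are executed one after another rather than in parallel on separate cores, and results are assumed to be hashable values that can be compared directly.

```python
from collections import Counter


def run_task_two_phase(run_copy, n):
    """Model one task under two-phase NMR.

    run_copy: zero-argument callable standing in for one execution of the task.
    n:        the N of NMR (e.g., 3 for TMR).
    Returns (result, number_of_copies_executed).
    """
    first = (n + 1) // 2                       # ceil(N/2) copies, indispensable phase
    results = [run_copy() for _ in range(first)]
    if len(set(results)) == 1:
        # All results identical: the on-demand phase is not needed.
        return results[0], first
    # Mismatch: execute the remaining floor(N/2) copies (on-demand phase).
    results += [run_copy() for _ in range(n - first)]
    value, votes = Counter(results).most_common(1)[0]
    if votes <= n // 2:
        raise RuntimeError("no majority among the N results")
    return value, n


# Fault-free TMR run: two identical results, no third copy is executed.
fault_free = run_task_two_phase(iter([7, 7]).__next__, 3)

# One faulty copy in the indispensable phase: the third copy is executed
# and majority voting masks the fault.
one_fault = run_task_two_phase(iter([42, 41, 42]).__next__, 3)
```

The sketch deliberately leaves out the block-level behaviour (executing a whole block of the BP schedule and saving in-advance results), which depends on the schedules rather than on a single task.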
Fig. 3 shows how the proposed technique operates 
when some faults occur during the execution of the application 
of Fig. 1. Note that, in this paper, whenever we say a fault 
occurs or a task becomes faulty, we mean that the task 
gives an incorrect result due to some errors (e.g. one or 
more transient faults). Assuming that the task T2 becomes 
faulty, when comparing the results of T2, they do not 
match, and hence the system temporarily switches to the 
on-demand phase. The result mismatch may happen due 
to a fault during the task execution or even due to a fault 
that corrupts the result comparison between the two 
phases. In the on-demand phase as T2 belongs to the block 
B2 of the BP schedule of Fig. 1d, the block B2 is executed 
(the highlighted tasks T2 and T4 in Fig. 3), and then a ma-
jority voting is done over the results of the three copies of 
T2 to mask the fault (Fig. 3a). Here, T3 is not executed in 
the block B2 during the on-demand phase as it has already 
finished successfully before detecting the fault in T2 and 
hence it is no longer required. The important point to be 
noted here is that when we execute B2 during the on-
demand phase we not only execute T2 (whose result is 
required for majority voting as its execution in the indispensable 
phase has been faulty), but we also execute T4 in 
parallel with T2 and its result is saved in memory, so that 
it can be used later for possible majority voting. After 
executing the block B2 the system switches back to the 
indispensable phase and continues executing the schedule 
from the point at which it was interrupted. After switching back to 
the indispensable phase, two possible execution scenarios 
can be considered regarding the task T4: 
i) If a fault occurs during the execution of T4 in the 
indispensable phase (Fig. 3b), when the result 
comparison indicates fault occurrence, the system 
does not need to switch to the on-demand phase as 
the results of three copies of T4 are already availa-
ble to be voted on (the results of two copies of T4 
are obtained in the indispensable phase and the re-
sult of another copy of T4 already exists in the in-
ternal memory as it was executed in-advance in the 
previous on-demand phase). 
ii) If no fault occurs during the execution of T4 in the 
indispensable phase (Fig. 3a), the results of the 
copy of T4 that was executed in-advance in the pre-
vious on-demand phase are no longer required and 
can be dropped from the memory. 
One question that may arise here is “what happens if the 
in-advance execution of T4 becomes faulty?” Such a fault 
may occur during the in-advance execution of T4 in the 
on-demand phase, while saving the results of the in-advance 
execution of T4 between the two phases, or even 
later, in the stored results of the in-advance execution of T4. 
In this case, when the system executes T4 in the indis-
pensable phase, if no fault occurs (Fig. 3c), we will not 
use the results of the in-advance execution of T4, and 
hence no problem occurs. However, if the execution of 
T4 in the indispensable phase also becomes faulty (Fig. 
3d), the system cannot mask this second fault as the 
stored values of the in-advance execution are also faulty. 
Indeed, a TMR system can mask only one faulty execu-
tion for each task (generally speaking, an NMR system 
can mask ⌊N/2⌋ faulty executions for each task) 
[9], [10].      
The in-advance executions of tasks (e.g., T4 in the 
block B2 in Fig. 3) in the on-demand phase are useful 
because: 
i) Because of the use of parallel execution in the on-
[Figure 2: (a) tasks TA, TB and TC before and after shifting TC by σ; (b) BlockPartitioning(S) applied to schedule S in steps 1 to 3, yielding blocks B1, B2, B3.]
 
Fig. 2. Block partitioning scheme. a) A technique to convert ordinary schedules to block-partitioned (BP) schedules and b) Block-partitioning a 
schedule that is not BP. 
[Figure 3: panels (a) to (d), each showing the schedule of Fig. 1 with block B2 executed in the on-demand phase. Legend: Indispensable phase; On-demand phase; Result comparison/Saving results; Transient fault; Result mismatch; Two identical results for Ti.]
 
Fig. 3. Operation of the proposed technique when faults occur during 
the execution of the application of Fig. 1. 
demand phase, in-advance executions do not im-
pose any time overhead. For example, it can be 
seen in Fig. 3 that when the system has to execute 
T2 in the on-demand phase, the in-advance execu-
tion of T4 is performed in parallel with it. Also, be-
cause of the use of LTF scheduling, tasks that come 
later in the schedule (e.g., T4) can never be longer 
than the tasks that come earlier (e.g., T2) which 
means that the in-advance execution of T4 cannot 
lengthen the execution of the block B2 in Fig. 3. In-
deed, if we did not use in-advance executions, we 
would not have any parallel execution during the 
on-demand phase which implies that the use of in-
advance executions helps us reserve relatively less 
slack time for the on-demand phase, resulting in 
more slack to be available for energy management. 
ii) Although in-advance executions of tasks in the on-
demand phase may turn out to be useless when no 
fault occurs later during the execution of the task in 
the indispensable phase, they have a negligible im-
pact on the average energy consumption. This is 
because an in-advance parallel execution is per-
formed only when a fault occurs in the indispensa-
ble phase (for example in Fig. 3 the in-advance exe-
cution of T4 has been performed because a fault has 
occurred in T2 during the indispensable phase). 
Note that while from a reliability point of view the 
consideration of faults is a must, from the average 
energy consumption point of view, we do not need 
to consider the cases where the system tolerates a 
fault [6], [13]. As an example, consider T2 and T4 in 
Fig. 3. Suppose that the probability that a task execution 
becomes faulty is 10^−4 and the energy consumption 
of T2 and T4 is 10 mJ and 5 mJ, respectively. When no 
fault occurs, the system only executes T2 in the in-
dispensable phase and consumes 2×10=20 mJ. If a 
fault occurs during the execution of T2 in the indis-
pensable phase, the system will execute T2 and T4 
in the on-demand phase and hence consumes 
(10+5)=15 mJ more energy. Therefore, the average 
energy consumption for the execution of T2 and T4 
is (1−10^−4)×20 + 10^−4×(20+15) = 20.0015 mJ, which is 
very close to the energy consumption when no 
faults occur (20 mJ). This is also consistent with our 
experimental observations showing that the average 
energy consumption differs by less than 0.01% 
from the fault-free energy consumption. This ex-
ample shows that the energy overhead of the in-
advance executions is negligible from the view-
point of average energy consumption. 
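The arithmetic of this example can be reproduced with a few lines of code. The function name `average_energy` and its parameters are illustrative; the numbers are those of the example above (fault probability 10^−4, 20 mJ for the two indispensable copies of T2, and 15 mJ extra when block B2 is executed in the on-demand phase).

```python
def average_energy(p_fault, e_fault_free, e_extra):
    """Expected energy when the extra on-demand energy is spent with probability p_fault."""
    return (1 - p_fault) * e_fault_free + p_fault * (e_fault_free + e_extra)


e_avg = average_energy(p_fault=1e-4, e_fault_free=20.0, e_extra=15.0)   # in mJ
relative_overhead = (e_avg - 20.0) / 20.0   # fraction of the fault-free energy
```

With these inputs the expected energy is approximately 20.0015 mJ, i.e., an overhead of about 0.0075% over the fault-free case, which is in line with the figure of less than 0.01% quoted above.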
Fig. 4 shows the pseudo-code of the proposed schedul-
ing method used in our technique that receives an appli-
cation task graph (G) to make schedules for the indispen-
sable and on-demand phases (i.e., SIND and SBP respective-
ly). The pseudo-code of Fig. 4a is the main body of the 
scheduling technique that calls the functions presented in 
Figs 4b and 4c. The function of Fig. 4b (ListScheduling-
LTF(G, q)) implements the list scheduling algorithm with 
the LTF policy to make a schedule S containing q copies 
of each task from a task graph G. In this function, line 1 is 
for the initialization purpose. In line 2, we begin a while 
body to apply the scheduling to all tasks. Line 3 is used to 
implement LTF list scheduling, as it selects the largest 
unscheduled task Ti whose predecessors have all been scheduled. 
In line 4, q parallel copies of Ti are scheduled. Finally, 
line 6 returns the schedule S. As can be seen from 
Fig. 4a, this function is required for both the indispensable 
and on-demand phases. For the indispensable phase 
we need a schedule containing ⌈N/2⌉ copies of each 
task, and for the on-demand phase we need a schedule 
containing ⌊N/2⌋ copies of each task. We make these 
two schedules in lines 1 and 2 of Fig. 4a. In line 3 of Fig. 
4a we use the function of Fig. 4c (BlockPartitioning(S)) to 
convert the schedule STMP (temporary schedule obtained 
from line 2 of Fig. 4a) to the BP schedule SBP. The function 
of Fig. 4c receives a multi-core schedule S and starts from 
the beginning of the schedule (line 1). In line 2 we check if 
each task, say TA, overlaps with more than one task in 
another core in the schedule S, say TB, TC (where TC 
comes after TB on the same core). If so, through lines 3 and 
4 we shift the task TC (and all its successor tasks in S) to 
the right until there is no overlap between TA and TC. In 
Inputs: 
G: application task graph 
N: parameter N of NMR, e.g, 3 for TMR 
Outputs: 
SIND: schedule for the indispensable phase 
SBP: BP schedule for the on-demand phase 
1: SIND=ListSchedulingLTF(G, ⌈N/2⌉);       // Fig. 4b 
2: STMP=ListSchedulingLTF(G, ⌊N/2⌋);     // Fig. 4b 
3: SBP=BlockPartitioning(STMP);                      // Fig. 4c 
(a) 
function ListSchedulingLTF(G, q) 
// G: input task graph, q: number of copies for each task in the  
// schedule, S: the output schedule 
1: S = Null;              // Initialize S with an empty schedule 
2: while not all tasks in G are scheduled do 
3:        Ti = the largest unscheduled task in G whose predecessors  
               have all been scheduled; 
4:       Add q parallel copies of Ti to S; 
5: endwhile; 
6: return S; 
(b) 
function BlockPartitioning(S) 
// S: the input multi-core schedule 
1: for each task TA from the beginning of S do 
2:      if TA overlaps with more than one task on another core, say TB and TC  
             (where TC comes after TB on the same core) then 
3:                σ = (finish time of TA) – (start time of TC); 
4:                shift TC and all its successors in S to the right by σ; 
5:      endif; 
6:  endfor; 
7:  for each block B in S do 
8:     shift all tasks in B to the right and place them at the end of B;  
9:  endfor; 
10: return S; 
(c) 
Fig. 4. The proposed scheduling technique.  
line 4, when we shift TC to the right, we need to shift all 
the tasks that come after TC on the same core and the 
tasks that are dependent on TC (successors of TC in the task 
graph) but are scheduled on the other cores. After removing 
possible overlaps in the schedule (i.e., partitioning the 
schedule into blocks), through lines 7 to 9 we shift all 
tasks in each block to the right to place them at the end of the 
block. We will discuss in Section 4 why this is effective for 
our proposed technique. Finally, in line 10 the schedule S 
(i.e., a BP schedule) is returned. 
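The overlap-removal part of the block-partitioning function (lines 1 to 6 of Fig. 4c) can be sketched as follows. This is a hedged illustration, not the authors' code: the schedule representation (a dictionary of `start`, `finish`, `core`), the helper names, the order in which violations are processed, and the `max_steps` guard are assumptions, and termination is not proven here. The right-alignment step of lines 7 to 9 is omitted.

```python
from collections import defaultdict


def overlaps(a, b):
    """True if the half-open intervals [start, finish) of two slots intersect."""
    return a["start"] < b["finish"] and b["start"] < a["finish"]


def find_violation(sched):
    """Return (TA, TC) for the earliest TA that overlaps two tasks on one other core."""
    for ta in sorted(sched, key=lambda t: (sched[t]["start"], t)):
        by_core = defaultdict(list)
        for t, slot in sched.items():
            if slot["core"] != sched[ta]["core"] and overlaps(sched[ta], slot):
                by_core[slot["core"]].append(t)
        for core in sorted(by_core):
            hits = sorted(by_core[core], key=lambda t: sched[t]["start"])
            if len(hits) > 1:
                return ta, hits[1]            # hits[0] plays TB, hits[1] plays TC
    return None


def shift_closure(sched, succs, tc):
    """TC plus, transitively, later tasks on the same core and graph successors."""
    todo, moved = [tc], set()
    while todo:
        t = todo.pop()
        if t in moved:
            continue
        moved.add(t)
        todo.extend(u for u, s in sched.items()
                    if s["core"] == sched[t]["core"] and s["start"] >= sched[t]["finish"])
        todo.extend(succs.get(t, ()))
    return moved


def block_partition(sched, succs, max_steps=1000):
    sched = {t: dict(s) for t, s in sched.items()}   # work on a copy
    for _ in range(max_steps):
        violation = find_violation(sched)
        if violation is None:
            return sched
        ta, tc = violation
        sigma = sched[ta]["finish"] - sched[tc]["start"]   # shift amount of Fig. 2a
        for t in shift_closure(sched, succs, tc):
            sched[t]["start"] += sigma
            sched[t]["finish"] += sigma
    raise RuntimeError("overlap condition not reached within max_steps")


# Hypothetical two-core schedule in the spirit of Fig. 2a; TD depends on TC.
SCHED = {"TA": {"start": 0, "finish": 10, "core": 0},
         "TD": {"start": 12, "finish": 15, "core": 0},
         "TB": {"start": 0, "finish": 4, "core": 1},
         "TC": {"start": 6, "finish": 12, "core": 1}}
SUCCS = {"TC": ["TD"]}
bp = block_partition(SCHED, SUCCS)
```

In this example TA overlaps both TB and TC on the other core, so TC and its dependent task TD are shifted right by σ = 10 − 6 = 4 time units, after which no task overlaps more than one task on any other core.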
It is noteworthy that although the proposed NMR 
technique needs at least ⌈N/2⌉ cores for parallel execution 
of each task in the indispensable and on-demand 
phases, if fewer than ⌈N/2⌉ cores are available, the proposed 
technique can still be used (with a slight change) 
but with less parallelism. Indeed, the technique can 
even be used on a single core where, for each task, the 
system first executes ⌈N/2⌉ copies of the task one after another 
(in series) in its indispensable phase and then compares 
their results. If some faults occur during the indispensable 
phase, the system executes the remaining copies of the 
task (again in series) for the on-demand phase and finally 
the whole results are voted on to mask the faults. It 
should be noted that this reduced parallelism obviously 
takes more time and hence may not be suitable for real-
time systems with tight deadlines. When more cores are 
available, more parallelism can be achieved, which results in 
a shorter schedule and hence higher schedulability 
[19]. This can also release some static slack time that can 
be used for energy management. 
4 ENERGY MANAGEMENT 
For the proposed NMR technique we have implemented 
a specific energy-management technique which compris-
es offline (Section 4.2) and online (Section 4.3) stages and 
exploits different types of slack time to reduce the system 
energy consumption through DVS [12]. 
Let WIND and WBP be the worst-case times it takes to execute 
the schedules SIND and SBP in the indispensable and 
on-demand phases respectively. We need not only to 
reserve the time WIND for the indispensable phase but also 
to reserve the time WBP for the on-demand phase. Hence, 
the proposed technique is feasible when WIND+WBP≤D (D 
is the application deadline) and the static slack SS which 
is left over from the application and can be used for ener-
gy management is: 
 $SS = D - W_{IND} - W_{BP}$    (1) 
where WIND+WBP is the application total execution time. 
As the amount of static slack is known at design time, 
ofﬂine techniques (e.g., the even slack distribution tech-
nique in [20]) can be used at design time to distribute this 
slack among the tasks. However, in the proposed tech-
nique, there are also two other types of slack time that 
are created at run-time, and hence, unlike the static slack, 
cannot be allocated at design time, and have to be allo-
cated at run-time. These two types of slacks are: 
• Dynamic slack: This slack results at run-time when 
a task consumes less than its worst-case execution 
time due to early completion [6], [8], [11]. It should 
be noted that the actual execution time of a task is 
not known at design time, and hence the dynamic 
slack time which is obtained from the task is also 
not known at design time. 
• Pseudo-dynamic slack: Although we always reserve 
enough time to execute the BP schedule completely, 
we do not usually need to execute the tasks of 
this schedule at run-time. This is because when the 
⌈N/2⌉ copies of a task finish successfully during 
the indispensable phase, this task no longer requires 
the additional ⌊N/2⌋ copies in the on-demand phase. 
Therefore, the task copies can be dropped from the 
BP schedule, thereby releasing some slack. We have 
called this slack pseudo-dynamic slack because, just 
like dynamic slacks, it is created at run-time, but 
unlike dynamic slacks, its amount can be calculated 
offline at design time. 
When a task Ti executes successfully in the indispensa-
ble phase and we drop its copies from the schedule of the 
on-demand phase, the pseudo-dynamic slack time δi is 
released that can be exploited by DVS to reduce the ener-
gy consumption of the subsequent tasks in the indispen-
sable phase. As the schedule of the on-demand phase is 
available at design time, the amount of this reclaimed 
slack can be calculated offline at design time. To do this, 
at design time we drop the tasks from the schedule of the 
on-demand phase one after another, in the order in which 
they appear in the schedule of the indispensable phase; 
the time released by dropping a task Ti is its 
pseudo-dynamic slack δi. 
Fig. 5 shows in more detail how we calculate the pseu-
do-dynamic slack δi which is released after dropping Ti 
from the schedule of the on-demand phase. To calculate 
the pseudo-dynamic slack δi the following three cases can 
be considered: 
1. Case I (Fig. 5a): If there is no task except Ti in the 
block, when Ti is dropped from the schedule the re-
leased slack δi will be Wi+ci, where Wi is the worst-
case execution time of Ti and ci is the maximum time 
which is required for comparing the results (majority 
voting) or saving results. 
2. Case II (Fig. 5b): If Ti is the largest task in the block 
(i.e., Wi ≥ max{Wj} for all the remaining tasks in the 
block), after dropping Ti from the schedule the value 
of pseudo-dynamic slack δi is Wi-max{Wj}. 
3. Case III (Fig. 5c): If there exists at least one task Tj in 
the block larger than Ti, after removing Ti from the 
schedule no pseudo-dynamic slack will be released. 
Considering the three cases in Fig. 5, δi is calculated as: 
Fig. 5.  Pseudo-dynamic slack (δi) calculation: (a) Case I, (b) Case II, (c) Case III, (d) example BP schedule with blocks B1 to B3 and tasks T1 to T6. 
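$$\delta_i = \begin{cases} W_i + c_i & \text{when only } T_i \text{ exists in the block (Case I)} \\ W_i - \max\{W_j\} & \text{when } W_i \ge \max\{W_j\} \text{ (Case II)} \\ 0 & \text{when } W_i < W_j \text{ for at least one task } T_j \text{ (Case III)} \end{cases} \tag{2}$$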
In the following we illustrate how pseudo-dynamic 
slack is calculated by means of an example. Fig. 5d shows 
the pseudo-dynamic slack δi which will be released after 
dropping each task Ti from the BP schedule. The worst-
case execution times of the example tasks are shown in 
Fig. 1a. The tasks are dropped from Fig. 5d in the order in 
which they are scheduled. In this example, without loss of 
generality, we assume that comparing results (majority 
voting) and saving results for all the tasks consume 5 time 
units (i.e., ci=5 for all the tasks). For this example, it can be 
seen from the schedule of Fig. 5d that as the task T1 is a 
single task in the block B1 (Case I), if we drop T1 from the 
schedule, the released slack will be δ1=W1+c1=25. After 
dropping T1, if we drop T2 from the schedule, as T2 is the 
largest task in the block B2 (Case II), the released slack 
will be δ2=W2−W3=20. After dropping T2, as the task T3 is 
the largest task in B2, the released slack will be 
δ3=W3−W4=10 (Case II). After dropping T3, the task T4 will 
be a single task in the block B2 and if we drop T4 from the 
schedule the released slack will be δ4=W4+c4=35 (Case I). 
Similarly, we obtain: δ5=20 and δ6=25. Although the 
amount of the pseudo-dynamic slacks can be calculated at 
design time, it should be noted that this slack is not avail-
able (and hence cannot be allocated) from the beginning 
of the application execution and is created at run time 
when a task ﬁnishes successfully in the indispensable 
phase. This is why we call it pseudo-dynamic slack. 
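The three cases of (2) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function name and the data layout (a list of blocks, each listing its tasks in drop order) are assumptions, and the worst-case execution times below are back-derived from the δi values quoted in the example above rather than read from Fig. 1a.

```python
def pseudo_dynamic_slacks(blocks, wcet, cmp_time):
    """Return {task: delta_i} following the three cases of Eq. (2).

    blocks   -- list of blocks; each block lists its tasks in drop order
                (assumed layout, hypothetical helper)
    wcet     -- {task: W_i}, worst-case execution times
    cmp_time -- {task: c_i}, result comparison / saving times
    """
    deltas = {}
    for block in blocks:
        remaining = list(block)
        for task in block:
            remaining.remove(task)
            if not remaining:
                # Case I: the task is alone in its block
                deltas[task] = wcet[task] + cmp_time[task]
            else:
                longest = max(wcet[t] for t in remaining)
                # Case II if W_i >= max W_j, otherwise Case III (no slack)
                deltas[task] = max(wcet[task] - longest, 0)
    return deltas


# Values inferred from the worked example (assumption): c_i = 5 for all tasks.
W = {"T1": 20, "T2": 60, "T3": 40, "T4": 30, "T5": 40, "T6": 20}
c = {task: 5 for task in W}
blocks = [["T1"], ["T2", "T3", "T4"], ["T5", "T6"]]
deltas = pseudo_dynamic_slacks(blocks, W, c)
```

With these inputs the sketch reproduces the slacks stated in the text (25, 20, 10, 35, 20 and 25 time units for T1 to T6).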
It is noteworthy that the proposed scheduling tech-
nique (Section 3) helps to distribute pseudo-dynamic 
slacks evenly among the tasks. It is known that even slack 
distribution results in more energy saving as compared to 
uneven slack distribution [6], [20]. Indeed pseudo-
dynamic slack is prone to be distributed unevenly among 
the tasks. This is because a pseudo-dynamic slack which 
is obtained from a task cannot be exploited by the same 
task or by its previous tasks and it can only be exploited 
by its subsequent tasks. Therefore, those tasks that appear 
later in the schedule have more chance to gain larger 
pseudo-dynamic slacks as compared to the tasks that 
come earlier. This implies that when pseudo-dynamic 
slack becomes available sooner rather than later, it helps 
to distribute pseudo-dynamic slacks more evenly. To 
achieve this, we use two policies in our proposed sched-
uling technique (Section 3): i) we move tasks in each block 
of the BP multi-core schedule to the end of the block, 
thereby enabling the slacks to appear sooner in the block 
(see Fig. 5b), ii) we use the LTF policy. To give an insight 
into how the LTF policy works, consider the following 
example. Suppose that three tasks T1, T2 and T3 with 
worst-case execution times W1=6, W2=3 and W3=2 appear 
in the LTF order in the indispensable phase. Assuming 
that these tasks are in one block of the on-demand phase, 
using (2), the pseudo-dynamic slacks obtained from these 
tasks will be δ1=W1−W2=3, δ2=W2−W3=1 and δ3=W3=2 (In 
this example we assume that ci=0). However, if the tasks 
appear in the order T3, T1 and T2 which is not LTF, the 
pseudo-dynamic slacks will be δ3=0, δ1=3 and δ2=3. There-
fore, in the LTF order pseudo-dynamic slacks are availa-
ble sooner and hence can be distributed among the tasks 
more evenly. 
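The effect of the task order can be checked numerically. The sketch below is an illustration under stated assumptions: the helper names are hypothetical, ci=0 as in the example, and "slack seen by a task" is taken to mean the total pseudo-dynamic slack released before the task starts, ignoring how much of it earlier tasks have already consumed.

```python
from itertools import accumulate


def block_slacks(order, wcet, c=0):
    """Pseudo-dynamic slack released by each task of one block, dropped in `order` (Eq. (2))."""
    slacks = []
    for k, task in enumerate(order):
        rest = [wcet[t] for t in order[k + 1:]]
        if not rest:
            slacks.append(wcet[task] + c)                    # Case I
        else:
            slacks.append(max(wcet[task] - max(rest), 0))    # Case II / III
    return slacks


def slack_seen_by_each_task(slacks):
    """Slack already released when each task starts (nothing for the first task)."""
    return [0] + list(accumulate(slacks))[:-1]


wcet = {"T1": 6, "T2": 3, "T3": 2}
ltf = block_slacks(["T1", "T2", "T3"], wcet)        # LTF order
non_ltf = block_slacks(["T3", "T1", "T2"], wcet)    # non-LTF order
seen_ltf = slack_seen_by_each_task(ltf)
seen_non_ltf = slack_seen_by_each_task(non_ltf)
```

Both orders release the same total slack (6 time units), but in the LTF order the second and third tasks can already draw on 3 and 4 units, against 0 and 3 units in the non-LTF order.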
As we explained earlier, dynamic slack may result at 
run-time due to early completion of tasks [6], [8], [11]. 
However, as the actual execution time of a task is not 
known at design time, the amount of dynamic slack is 
also not known at design time. Hence, we provide an 
online energy-management technique to exploit dynamic 
slacks at run-time (Section 4.3). With respect to pseudo-
dynamic slacks, since unlike dynamic slack the amount of 
pseudo-dynamic slack is known at design time, we have 
developed a specific offline technique to manage pseudo-
dynamic slacks (Section 4.2). 
4.1 Energy and Reliability Models 
Power consumption of each task Ti mainly comprises 
dynamic power Pdyn(Ti) and static power Pstat(Ti). The 
dynamic power is determined by [6]: 
$$P_{dyn}(T_i) = C_{eff} V_i^2 f_i \tag{3}$$
where Ceff is the effective switched capacitance, Vi and fi 
are, respectively, the supply voltage and the operational 
frequency during the execution of Ti [6], [7]. The static 
power is mainly comprised of sub-threshold leakage 
power and can be written as: 
$$P_{stat}(T_i) = I_{sub} V_i = I_0 e^{-\frac{V_{th}}{\eta V_T}} V_i \tag{4}$$
where Vi is the supply voltage, Isub is the sub-threshold 
leakage current, I0 depends on technology parameters 
and device geometries, η is a technology parameter, Vth is 
the transistor threshold voltage, and VT is the thermal 
voltage [6].  
When DVS is used, each task Ti is executed at a voltage 
Vi, which may be less than Vmax (the maximum possible 
supply voltage). For each task Ti, we deﬁne the normal-
ized supply voltage ρi as follows: 
$$\rho_i = \frac{V_i}{V_{max}} \tag{5}$$
When a task Ti is executed at the scaled voltage 
Vi=ρiVmax, considering an almost linear relationship be-
tween voltage and frequency [6], [7], we have: fi=ρifmax, 
where fi is the operational frequency corresponding to Vi 
and fmax is the maximum possible operational frequency 
(corresponding to Vmax). Therefore, when DVS is used, the 
actual execution time of the task is prolonged from ti to 
ti/ρi, and by substituting Vi=ρiVmax and fi=ρifmax in (3) and 
(4), the total energy which is consumed to execute the 
task Ti is given by [6]: 
$$E(T_i) = \left( I_{sub}\,\rho_i V_{max} + C_{eff}\,\rho_i^2 V_{max}^2\,\rho_i f_{max} \right) \frac{t_i}{\rho_i} = \left( P_S + P_D\,\rho_i^2 \right) t_i \tag{6}$$
where PS=IsubVmax and PD=CeffV2maxfmax are respectively the 
static and dynamic powers when the system performs at 
the maximum voltage and frequency. 
Without considering the energy consumption of the 
on-demand phase (which commonly has a very low 
probability of being performed as faults rarely occur [6], 
[8]), we focus on the indispensable phase and aim at min-
imizing the fault-free energy consumption (like the works 
[5], [6], [8]). Using the energy model of (6) that gives the 
energy consumption of a single task, the energy which is 
consumed to execute a task Ti in the indispensable phase 
(i.e., executing ⌈N/2⌉ copies of the task and comparing 
the results) can be written as: 
$$E_{NMR}(T_i) = \lceil N/2 \rceil \left( P_S + P_D\,\rho_i^2 \right) (t_i + c_i) \tag{7}$$
where ci is the result comparison time. Based on (7), the 
energy consumption of the fault-free execution of an ap-
plication with n tasks using the proposed NMR technique 
can be calculated as:  
$$E_{app} = \sum_{i=1}^{n} E_{NMR}(T_i) = \sum_{i=1}^{n} \lceil N/2 \rceil \left( P_S + P_D\,\rho_i^2 \right) (t_i + c_i) \tag{8}$$
As explained in Section 3, the fault-free energy 
consumption is very close to the average energy con-
sumption. Therefore, we use (8) in our offline energy 
management at design time (Section 4.2) to minimize the 
fault-free energy consumption. Also, in our experiments 
in Section 5 we report the fault-free energy consumption. 
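As a quick numerical check of the energy model, (6) to (8) can be written as small Python functions. This is a minimal sketch: the function names are hypothetical, and the static and dynamic powers are passed in as parameters rather than fixed to any processor.

```python
import math


def task_energy(rho, t, p_static, p_dynamic):
    """Eq. (6): energy of one task copy executed at normalized voltage rho."""
    return (p_static + p_dynamic * rho ** 2) * t


def nmr_task_energy(n, rho, t, c, p_static, p_dynamic):
    """Eq. (7): ceil(N/2) copies in the indispensable phase, including comparison time c."""
    return math.ceil(n / 2) * (p_static + p_dynamic * rho ** 2) * (t + c)


def app_energy(n, rhos, times, cmp_times, p_static, p_dynamic):
    """Eq. (8): fault-free energy of an application; one entry per task in each list."""
    return sum(
        nmr_task_energy(n, rho, t, c, p_static, p_dynamic)
        for rho, t, c in zip(rhos, times, cmp_times)
    )
```

For instance, with illustrative values PS=1, PD=3, ρi=0.5, ti=4 and ci=1, a single copy consumes (1+3·0.25)·4=7 energy units, and the indispensable phase of TMR (N=3, two copies including comparison) consumes 2·1.75·5=17.5 units.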
Transient faults are usually assumed to follow a Pois-
son distribution with an average rate λ [5], [6]. Consider-
ing the effects of DVS on transient fault rates, the fault 
rate at the scaled supply voltage Vi=ρiVmax (ρmin≤ρi≤ ρmax=1) 
is modeled as [5], [6]: 
$$\lambda(\rho_i) = \lambda_0 \, 10^{\frac{d(1-\rho_i)}{1-\rho_{min}}} \tag{9}$$
where λ0=λ(ρmax=1) is the fault rate at the maximum volt-
age Vmax, ρmin is the ratio of the minimum supply voltage 
Vmin to Vmax, and the exponent value d is a technology 
dependent constant [5], [6]. Considering (9) (i.e., the effect 
of voltage scaling on transient fault rate), the probability 
of a task Ti being executed correctly is written as [5], [6]: 
$$R_i(\rho_i) = e^{-\lambda(\rho_i) \frac{t_i}{\rho_i}} \tag{10}$$
where λ(ρi) is given by (9) and ti/ρi is the execution time of 
Ti when executed at Vi=ρiVmax. Conversely, the probability 
of failure of the task Ti (i.e., the unreliability of Ti) is 
given by [5], [6]: 
$$F_i(\rho_i) = 1 - R_i(\rho_i) = 1 - e^{-\lambda(\rho_i) \frac{t_i}{\rho_i}} \tag{11}$$
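The fault model of (9) to (11) is easy to evaluate directly. The sketch below is illustrative only; the function names are hypothetical, and the parameter values (λ0=10^−6 faults/s, d=3, and ρmin taken as 0.85 V/1.55 V) are the ones reported for the experiments in Section 5. Execution times are assumed to be in seconds so that they match the unit of the fault rate.

```python
import math


def fault_rate(rho, lambda0, d, rho_min):
    """Eq. (9): transient fault rate (faults/s) at normalized voltage rho."""
    return lambda0 * 10 ** (d * (1 - rho) / (1 - rho_min))


def reliability(rho, t, lambda0, d, rho_min):
    """Eq. (10): probability that a task with execution time t runs correctly at rho."""
    return math.exp(-fault_rate(rho, lambda0, d, rho_min) * t / rho)


def failure_probability(rho, t, lambda0, d, rho_min):
    """Eq. (11): 1 - R_i(rho_i), computed with expm1 to keep precision for tiny exponents."""
    return -math.expm1(-fault_rate(rho, lambda0, d, rho_min) * t / rho)


LAMBDA0, D = 1e-6, 3
RHO_MIN = 0.85 / 1.55
rate_at_vmax = fault_rate(1.0, LAMBDA0, D, RHO_MIN)
rate_at_vmin = fault_rate(RHO_MIN, LAMBDA0, D, RHO_MIN)
```

With these parameters the rate grows by three orders of magnitude between the maximum and the minimum supply voltage, which is why voltage scaling has to be bounded by a reliability constraint.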
To calculate the reliability of the proposed two-phase 
NMR technique we consider two cases: i) the fault-free 
execution where all ⌈N/2⌉ copies of each task are 
executed successfully in the indispensable phase and ii) the 
case where some tasks in the indispensable phase become 
faulty and we perform the on-demand phase. In NMR, 
the correct execution of at least ⌈N/2⌉ copies of each 
task is required for the system to be functional. In the 
proposed NMR technique, all the correct executions may 
be performed in the indispensable phase (when no fault 
occurs), or some of them are performed in the on-demand 
phase (when a fault occurs). Therefore, the reliability of 
the proposed system can be calculated by considering the 
two cases. The first case gives the reliability of Ti in the 
fault-free state, and the second case gives the reliability 
when some faults occur during the execution of Ti.  
When no fault occurs, ⌈N/2⌉ copies of each task Ti 
are executed in the indispensable phase and the on-
demand phase is not required. When we use DVS in the 
indispensable phase, each task Ti is executed on the scaled 
supply voltage ρiVmax. Therefore, using (10), the reliability 
of a task Ti in the fault-free case can be calculated as: 
$$R1(T_i) = R_i(\rho_i)^{\lceil N/2 \rceil} \tag{12}$$
where Ri(ρi) is given by (10). To calculate the reliability for 
the case that k (1≤k≤⌊N/2⌋) copies of each task become 
faulty (in NMR up to ⌊N/2⌋ faulty executions can be 
masked [9], [10]), we consider all the cases in which j (1≤j≤k) 
of the ⌈N/2⌉ copies of Ti in the indispensable phase 
and k−j of the ⌊N/2⌋ copies of Ti in the on-demand 
phase become faulty. In these cases, the other ⌈N/2⌉−j 
copies in the indispensable phase and ⌊N/2⌋−(k−j) copies 
in the on-demand phase are executed correctly. Therefore, 
the probability of the correct execution of a task Ti when 
up to ⌊N/2⌋ executions of Ti become faulty can be 
calculated using (10) and (11) as: 
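$$R2(T_i) = \sum_{k=1}^{\lfloor N/2 \rfloor} \sum_{j=1}^{k} \underbrace{\binom{\lceil N/2 \rceil}{j} F_i(\rho_i)^{j} R_i(\rho_i)^{\lceil N/2 \rceil - j}}_{\text{indispensable phase}} \times \underbrace{\binom{\lfloor N/2 \rfloor}{k-j} F_i(\rho_{max})^{k-j} R_i(\rho_{max})^{\lfloor N/2 \rfloor - (k-j)}}_{\text{on-demand phase}} \tag{13}$$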
where ρi determines the scaled voltage which is employed 
in the indispensable phase, and ρmax=1 is employed in the 
on-demand phase (as in the on-demand phase no DVS is 
used and tasks are executed at the maximum supply 
voltage Vmax). Considering both the fault-free and faulty 
conditions, the reliability of a task Ti in the presence of up to 
⌊N/2⌋ faults, when executed by the proposed NMR 
technique, can be written as: 
$$R(T_i) = R1(T_i) + R2(T_i) \tag{14}$$
The reliability of an application execution relies on the 
correct execution of all its tasks. Therefore, using (14), the 
reliability of an application with n tasks running by the 
proposed NMR technique can be calculated as: 
$$R_{app} = \prod_{i=1}^{n} R(T_i) \tag{15}$$
4.2 Ofﬂine Energy Management 
As explained in the previous sections, in the proposed 
NMR technique, when no fault occurs, we do not execute 
the on-demand phase (which includes the remaining 
⌊N/2⌋ copies of each task, i.e., about half of the copies), 
which results in considerable energy saving as compared 
with conventional NMR. In this section we discuss how 
the proposed NMR technique exploits static and 
pseudo-dynamic slack times to achieve 
even further energy reduction. For this purpose, we de-
velop a specific technique to allocate static and pseudo-
dynamic slack times to tasks ofﬂine at design time. When 
we allocate static and pseudo-dynamic slack times, we 
assume that no dynamic slack exists, as the availability 
and the amount of dynamic slack are not known at 
design time. That is, we first minimize the expected 
energy consumption of the system through the offline 
allocation of static and pseudo-dynamic slacks, assuming 
that no dynamic slack exists. At run-time, however, we 
also exploit dynamic slacks through our online energy 
management for further energy saving (Section 4.3). 
To develop the offline slack allocation, we formulate 
the problem as an optimization problem. To do this, we 
formulate time constraints as inequalities. For the first 
task T1, as the task is executed at the scaled supply volt-
age ρ1Vmax, its worst-case execution time increases from 
W1+c1 to (W1+c1)/ρ1. The only slack time available to T1 
is the static slack time (SS given by (1)); no 
pseudo-dynamic slack is available to it (as pseudo-dynamic 
slack is obtained only from the previous tasks and T1 has 
no previous task). Hence T1 cannot exploit more than the 
static slack SS, and we have: 
$$\frac{W_1 + c_1}{\rho_1} - (W_1 + c_1) \le SS \tag{16}$$
It should be noted that although the whole of static 
slack SS is available to the first task T1, this does not nec-
essarily mean that it exploits all its available slack time. 
Indeed, each task can exploit only a part of its available 
slack and set aside the remaining for the subsequent 
tasks. During the indispensable phase, when a task Ti 
finishes successfully the pseudo-dynamic slack δi which is 
obtained by dropping the task Ti from the on-demand 
phase, is available to its subsequent tasks in the indispen-
sable phase. Consequently, the task T2 can exploit both 
the part of static slack SS left over by T1 and the pseudo-
dynamic slack δ1 which is obtained by dropping T1 from 
the on-demand phase. Hence, for the task T2, we have: 
$$\frac{W_2 + c_2}{\rho_2} - (W_2 + c_2) \le \underbrace{\left[ SS - \left( \frac{W_1 + c_1}{\rho_1} - (W_1 + c_1) \right) \right]}_{\text{static slack left over from } T_1} + \underbrace{\delta_1}_{\text{obtained from } T_1} \tag{17}$$
Similarly, for each task Ti (1≤i≤n) we have: 
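$$\frac{W_i + c_i}{\rho_i} - (W_i + c_i) \le \underbrace{\left[ SS - \sum_{T_j \in \Phi_i} \left( \frac{W_j + c_j}{\rho_j} - (W_j + c_j) \right) \right]}_{\text{static slack left over from the previous tasks}} + \underbrace{\sum_{T_j \in \Phi_i} \delta_j}_{\text{obtained from the previous tasks}} \tag{18}$$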
where Φi is the set of all tasks that have been executed 
before starting the task Ti. The optimization problem of 
the offline part of energy management can be written as: 
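$$\begin{aligned} \text{minimize:} \quad & E_{app} = \sum_{i=1}^{n} E_{NMR}(T_i) \\ \text{subject to:} \quad & c1: \ \frac{W_i + c_i}{\rho_i} - (W_i + c_i) \le \left[ SS - \sum_{T_j \in \Phi_i} \left( \frac{W_j + c_j}{\rho_j} - (W_j + c_j) \right) \right] + \sum_{T_j \in \Phi_i} \delta_j \quad \text{for all } T_i \ (1 \le i \le n) \\ & c2: \ R_{app} \ge R_{demand} \end{aligned} \tag{19}$$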
where Eapp is the energy consumption of an application 
executed using the proposed NMR technique (given by 
(8)), the constraint c1 (Inequality 18) is used to consider 
time constraints, i.e., to consider how much slack is avail-
able to each task (including pseudo-dynamic and static 
slack), and the constraint c2 guarantees that the system 
reliability does not fall below a required level Rdemand. 
The parameters, i.e., the tasks' worst-case execution times 
(Wi), result comparison times (ci), static slack time (SS) 
and pseudo-dynamic slacks (δi), are all known at design 
time. This implies that the optimization problem can be 
solved offline at design time to determine the ρi values 
which minimize the system energy consumption. It should 
be noted, however, that we cannot assign the obtained ρi 
values to the tasks at design time. Rather, we store the ρi 
values, and during the indispensable phase we assign the 
supply voltage ρiVmax to the task Ti only if all its previous 
tasks have finished successfully. In other words, the ρi 
values that we calculate using the proposed offline 
technique are only valid for the fault-free execution. If 
some faults occur during the indispensable phase, the ρi 
values will no longer be valid. This is because when a fault 
occurs in a task Ti during the indispensable phase, the 
system cannot drop it from the schedule of the on-demand 
phase, which means that the pseudo-dynamic slack δi will 
no longer be available. One possible solution to this 
problem is to calculate offline a set of ρi values for every 
possible fault scenario and to select the proper set at 
run-time based on how faults occur. However, we do not 
use this method, as the fault-free state is the most 
probable state and hence is the most prominent state from 
the viewpoint of average energy consumption [6], [8]. 
Therefore, in the proposed technique we use the ρi values 
that are calculated for the fault-free case. However, if a 
fault occurs at run-time, we temporarily stop using the ρi 
values that are calculated offline (as they are no longer 
valid) and from then on, we only use the proposed online 
management technique (Section 4.3) to allocate pseudo-
dynamic slacks. From the beginning of the next frame we 
again use the ρi values that are calculated offline. 
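The structure of the offline problem (19) can be illustrated with a small exhaustive search. This is only a sketch under several assumptions that are not taken from the paper: the solver is brute force over a discrete set of ρ levels (practical only for very small task sets), the worst-case times Wi stand in for ti in the energy objective, the reliability constraint c2 is supplied by the caller as a hypothetical predicate, and all function names are invented for the illustration.

```python
from itertools import product


def satisfies_c1(rhos, W, c, delta, static_slack, tol=1e-9):
    """Check Inequality (18) for every task, with tasks listed in execution order."""
    consumed = 0.0   # static/pseudo-dynamic slack already used by earlier tasks
    released = 0.0   # pseudo-dynamic slack released by earlier tasks
    for rho, w, ci, d in zip(rhos, W, c, delta):
        stretch = (w + ci) / rho - (w + ci)
        if stretch > static_slack - consumed + released + tol:
            return False
        consumed += stretch
        released += d
    return True


def offline_rhos(levels, W, c, delta, static_slack, p_s, p_d,
                 meets_reliability=lambda rhos: True):
    """Exhaustively pick the rho vector that minimizes the worst-case form of Eq. (8)."""
    best, best_energy = None, float("inf")
    for rhos in product(levels, repeat=len(W)):
        if not satisfies_c1(rhos, W, c, delta, static_slack):
            continue
        if not meets_reliability(rhos):      # stands in for constraint c2
            continue
        # ceil(N/2) is a common factor of every term, so it does not change the argmin
        energy = sum((p_s + p_d * r * r) * (w + ci) for r, w, ci in zip(rhos, W, c))
        if energy < best_energy:
            best, best_energy = rhos, energy
    return best
```

For two tasks with W=(10, 20), ci=0, SS=10 and levels {0.5, 1.0}, only the first task can be slowed down when no pseudo-dynamic slack is released; if dropping the first task releases δ1=20, the second task can be slowed down as well, which shows how pseudo-dynamic slack enlarges the feasible region of (19).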
4.3 Online Energy Management 
Let xi be the slack (including the pseudo-dynamic and 
static slacks) that is allocated to a task Ti at design time 
using the offline part of our energy-management (Section 
4.2). When DVS is used, the task worst-case execution 
time increases from Wi+ci to (Wi+ci)/ρi. On the other hand, 
as we exploit the slack xi by DVS, we can also say that the 
task worst-case execution time increases from Wi+ci to 
Wi+ci+xi. This implies that we have: 
$$x_i = \frac{W_i + c_i}{\rho_i} - (W_i + c_i) \tag{20}$$
Indeed, after calculating the ρi values by solving the of-
fline optimization problem at design time, we obtain the 
slack xi (including pseudo-dynamic and static slacks) that 
we allocate to a task Ti using (20). At run-time, for each 
task Ti, the total slack time SLi which is available to the 
task can be written as: 
$$SL_i = x_i + DS_j \tag{21}$$
where xi is the slack time which has been calculated 
ofﬂine in Section 4.2 (including both pseudo-dynamic and 
static slacks), and DSj is the dynamic slack which has 
been left over by the previous task (the task Tj)  in the 
indispensable phase due to early completion at run-time. 
Since SLi is the whole slack time which is available to the 
task Ti, the scaled supply voltage ρiVmax which is assigned 
to the task, must not prolong its worst-case execution 
time beyond the time Wi+ci+SLi, i.e., we require: 
$$\frac{W_i + c_i}{\rho_i} \le W_i + c_i + SL_i \tag{22}$$
Clearly the proposed online energy management must 
take into account the time constraint given by (22). 
Another important constraint that must be taken into 
account guarantees reliability. Let Hi be the minimum 
value of ρi that does not cause the system reliability 
to fall below the required level. Clearly we require: 
$$\rho_i \ge H_i \tag{23}$$
In the proposed online energy manager, as DVS-enabled 
processors usually have discrete voltage/frequency 
levels (Section 5), we always select the smallest value of 
ρi among the set of possible ρi values that satisfies both 
Inequalities (22) and (23). In order to be able to check 
Inequalities (22) and (23) at run-time we need to have the 
SLi and Hi values at run-time. To calculate the slack time 
SLi (given by (21)) at run-time, note that the xi values 
have been calculated offline and stored to be used at 
run-time. Also, the dynamic slack time DSi which is 
obtained from the task Ti can be easily calculated at 
run-time as follows. When DVS is used for the first task 
(T1), the actual execution time of the task is (t1+c1)/ρ1. 
Since all the slack time which is available to T1 is x1, the 
maximum time which is available for executing T1 is 
W1+c1+x1; therefore, the dynamic slack which is obtained 
from T1 is: 
$$DS_1 = (W_1 + c_1 + x_1) - \frac{t_1 + c_1}{\rho_1} \tag{24}$$
For the remaining tasks (Ti, 2≤i≤n), the maximum availa-
ble time is Wi+ci+xi+DSj (where DSj is the dynamic slack 
which has been left over by the task Tj which is the task 
that is finished just before starting the task Ti). Therefore, 
we can write: 
$$DS_i = (W_i + c_i + x_i + DS_j) - \frac{t_i + c_i}{\rho_i} \tag{25}$$
At the end of each task Ti, we can use (25) (except for 
the first task, for which we use (24)) to calculate DSi at run-time. 
It can be seen from (25) that to calculate the dynamic slack 
DSi, we need to know Wi+ci+xi, DSj, and (ti+ci)/ρi. The 
parameter Wi+ci+xi is known at design time, and hence it 
can be calculated ofﬂine and stored to be used at run-
time. DSj, is the dynamic slack obtained from the task Tj 
(which is the task that is finished just before starting the 
task Ti), and is already calculated at the end of the Tj. 
(ti+ci)/ρi is the actual execution time of the task Ti (includ-
ing the result comparison time), and when the task ﬁnish-
es, its execution time can be easily calculated using the 
internal system clock (as this execution time is the differ-
ence between the start time and ﬁnish time of the task). In 
short, at the end of each task Ti, the dynamic slack time 
DSi, can be calculated at run-time with very low overhead 
as its online calculation only requires a few subtraction 
and addition operations. The minimum possible value of 
ρi that does not cause the system reliability to fall below 
the required level (i.e., the Hi values) can also be 
calculated offline at design time. To do this we can solve 
the optimization problem of (19), but without considering 
the constraint c1. This is because the constraint c1 is used 
to consider time constraints, but to calculate the Hi values 
we want to know which values of ρi can guarantee the 
required level of reliability regardless of time constraints.   
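The run-time part described above amounts to two small operations per task: choosing a voltage level and updating the dynamic slack. The sketch below is a hedged illustration; the function names, the fallback to ρ=1 when no level is feasible, and the example level set are assumptions and are not taken from the paper.

```python
def select_rho(levels, W, c, slack, h_min):
    """Smallest discrete rho that satisfies Inequalities (22) and (23).

    levels -- available normalized voltage levels (assumed values)
    slack  -- SL_i = x_i + DS_j from Eq. (21)
    h_min  -- H_i, the smallest rho allowed by the reliability requirement
    Falls back to rho = 1.0 (maximum voltage) if no level qualifies (assumption).
    """
    feasible = [r for r in levels if r >= h_min and (W + c) / r <= W + c + slack]
    return min(feasible, default=1.0)


def dynamic_slack(W, c, x, ds_prev, elapsed):
    """Eq. (25): slack left over by a task; with ds_prev = 0 this is Eq. (24).

    elapsed -- measured execution time of the task, i.e. (t_i + c_i) / rho_i
    """
    return (W + c + x + ds_prev) - elapsed


levels = [0.4, 0.6, 0.8, 1.0]          # hypothetical discrete levels
rho = select_rho(levels, W=8, c=2, slack=5, h_min=0.5)
ds = dynamic_slack(W=8, c=2, x=5, ds_prev=0, elapsed=12)
```

With Wi+ci=10 and 5 units of slack, the level 0.6 would stretch the task to about 16.7 units and is rejected, so 0.8 is selected; if the task then finishes after 12 units, 3 units of dynamic slack are passed on to the next task.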
5 EVALUATION AND DISCUSSION 
Experiments in this paper were conducted based on the 
power model of the Intel PXA270 processor [21]. This 
processor can operate at different voltage levels in the 
range of 0.85-1.55V, and the corresponding frequencies 
vary from 13MHz to 624MHz. The energy consumption 
for active cores is calculated by (8) where PS and PD (that 
are respectively the static and dynamic power consump-
tion of the system when operating at the maximum volt-
age and frequency) are 925mW and 260mW respectively 
[21]. Also, the Intel PXA270 processor has a low power 
sleep mode with 0.1014mW of idle power consumption. 
We considered that when a core is disabled or is tempo-
rarily unused, it enters the sleep mode and only con-
sumes the idle power. We modified the tool MEET [22] to 
profile execution time and energy consumption while 
using DVS based on the power model of Intel PXA270. 
Like the works [5], [6], [13], [16], we performed system-
level reliability simulation where the reliability was calcu-
lated by (15) and expressed in terms of application probabil-
ity of failure PoFapp (i.e. PoFapp=1-Rapp). The fault rate was 
modelled using (9) under the parameters λ0=10^−6 faults/s 
and d=3 [5], [6]. Therefore, the fault rate varies between 
10^−6 faults/s and 10^−3 faults/s, corresponding to the 
maximum and minimum voltage levels, respectively.   
Previous research works on reliable real-time systems 
that do not use NMR rely on fault-detection mechanisms 
[5], [6], [7], [8], [11]. However, they have usually over-
looked the overhead and fault coverage of detection 
mechanisms. Indeed, they usually do not consider any 
speciﬁc detection mechanism and simply assume that a 
detection mechanism with perfect fault coverage is part of 
the tasks (e.g., [5], [6], [7]). However, to provide fair com-
parisons, we need to include a real fault-detection mech-
anism in any implementation of previous works which is 
used in our comparisons. To do this, we considered that 
the previous works use fault-detection mechanisms in-
cluded in their tasks (i.e., software fault-detection mecha-
nisms). We conducted a set of experiments to investigate 
the energy and execution time overheads of the software 
fault-detection mechanisms that can be used for previous 
works. To consider the effect of fault-detection mechanisms 
on energy and reliability, we used two types of software 
fault-detection mechanisms in the implementations of 
previous works that were used in our comparisons: 
1. Heavy fault-detection mechanisms (called HFD): with 
high fault-detection overheads but relatively high 
fault coverage. For this case we assumed that the 
system uses multiple fault-detection mechanisms 
based on code and data redundancy, arithmetic 
code, consistency check, and control ﬂow checking 
[9], [10], [23], [24], [25], [26] to achieve high fault 
coverage for different fault types. 
2. Light fault-detection mechanisms (called LFD): with 
relatively low fault-detection overheads and also 
low fault coverage. For this case we assumed that 
the system uses fewer mechanisms to reduce the 
fault-detection overhead at the cost of decreased 
detection coverage [26]. 
Table 1 shows the time and energy overheads that the 
software fault-detection mechanisms impose (assuming 
that we use the supply voltage 1.55V). To measure the 
overheads the applications were selected from the 
MiBench [27] benchmarks. It should be noted that while 
both time and energy overheads of software fault-
detection mechanisms are lower than the overhead of 
modular redundancy with result comparison (majority 
voting), the fault coverage of software mechanisms is not 
sufﬁciently high, unlike majority voting that provides 
high fault masking [9], [10], [23], [24], [26]. Furthermore, 
these software fault-detection mechanisms are applica-
tion-speciﬁc so that each task requires its speciﬁc detec-
tion mechanism [9], [10], [25], [26], while result compari-
son and majority voting are general and can be used for 
any type of tasks without requiring any hardware 
modiﬁcation or redesign [9], [10], [25]. 
To evaluate the effectiveness of the proposed NMR 
technique (which we call LE-NMR), we compared LE-NMR 
with a recent work (proposed in [5]). To provide a 
fair comparison, for both the implementations of LE-
NMR and the system of [5], we assumed that both use the 
same level of task replication, i.e., when we consider an 
NMR with N copies for each task, we also considered that 
the system of [5] has N-1 backups for each task (i.e., again 
N copies for each task) to achieve fault tolerance. In addi-
tion, the system of [5] requires a fault-detection mecha-
nism to determine if a backup task must be executed or 
not. Like most of the previous works, [5] has not ad-
dressed any fault-detection mechanism, but we consid-
ered that the tasks that are scheduled in the system of [5] 
use task-speciﬁc software mechanisms for fault-detection. 
To do this, we considered implementations of [5] where 
the tasks included heavy fault-detection mechanisms 
(called [5]-HFD) and light fault-detection mechanisms 
(called [5]-LFD). We also considered in our experiments 
an implementation of conventional NMR, called CNMR, 
where we do not use the two phases indispensable and 
on-demand. In conventional NMR, all N copies of each 
task are executed in parallel (assuming that enough cores 
are available) and the static slack time is only used for 
energy reduction. 
It should be noted that there are various techniques to 
achieve low-energy fault-tolerance in real-time systems 
(e.g., [6], [7], [8], [11], [13], [16], [17], [18]) and it is beyond 
the scope of this paper to compare the proposed tech-
nique with all these various techniques. The main reason 
to choose the technique of [5] for the comparison is that it 
is a recent work with similar conditions to the proposed 
technique, e.g., hard real-time constraints, the use of DVS, 
and the frame-based application model with task prece-
dence constraints (a set of dependent tasks with a global 
deadline) running on multi-core platforms. Also, it is 
noteworthy that for many of the previous works it is not 
meaningful to compare them with the proposed tech-
nique because they considerably differ from ours in ap-
plication model (e.g., chain of dependent tasks in [6], 
single-task frame in [13], periodic tasks in [11], [17], and 
independent tasks in [18]).  
To compare LE-NMR with [5] and conventional NMR, 
we used both synthetic and practical application task 
graphs. To do this, we used the task graph generator 
TGFF [28] and the Standard Task Graph set (STG) [29]. 
The STG benchmark suite contains both synthetic task 
graphs and practical real-time application task graphs 
including robot control, SPEC fpppp and a sparse matrix 
solver. We also conducted experiments on two other real-
world applications: MPEG4 decoder and MJPEG encoder 
(their task graphs can be found in [15]). 
Fig. 6 and Table 2 show, respectively, the energy 
consumption and probability of failure for [5]-HFD, 
[5]-LFD, CTMR (CNMR with N=3), and the proposed 
LE-TMR when running the practical applications. The 
following three observations can be made from Fig. 6 
and Table 2: 
1. LE-TMR not only provides more energy saving (on 
average 28% and up to 33%) as compared to 
[5]-HFD, but also has a lower probability of failure, 
i.e., LE-TMR is more reliable. 
2. Although LE-TMR provides relatively less energy 
saving (on average 12%) as compared to [5]-LFD, 
LE-TMR has a far lower probability of failure (i.e., 
it provides much higher reliability). 
3. LE-TMR provides more energy saving (on average 
34%) as compared to CTMR, while providing almost 
the same level of reliability (Table 2). 
[Figure 6: x-axis: Benchmark (Robot, Sparse, fpppp, MPEG4, MJPEG); y-axis: Energy Consumption (mJ), 0 to 800; series: [5]-HFD, [5]-LFD, CTMR, LE-TMR.]
Fig. 6.  Energy consumption of LE-TMR, [5]-HFD, [5]-LFD, and 
CTMR when running the practical applications.  
TABLE 1 
TIME (T) AND ENERGY (E) OVERHEADS OF HEAVY FAULT-DETECTION (HFD) AND LIGHT FAULT-DETECTION (LFD). 
 
                No Fault-Detection    HFD               LFD               HFD Overhead (%)    LFD Overhead (%)
Benchmark       T(ms)    E(mJ)        T(ms)    E(mJ)    T(ms)    E(mJ)    T        E          T        E
QuickSort       885      1005         1730     1937     1189     1325     95.4     92.9       34.4     31.9
BitCounts       339      385          563      656      438      484      66.0     70.2       29.2     25.8
BasicMath       960      1093         1802     2059     1282     1432     87.7     88.5       33.5     31.1
SusanSmooth     630      729          1138     1306     823      952      80.6     79.2       30.6     30.6
SusanCorners    157      180          305      342      184      204      94.5     90.7       17.2     13.9
SusanEdges      146      167          279      318      186      209      91.0     90.8       27.4     25.2
 
 
Another set of experiments was conducted to analyze 
how the parallelism degree of task graphs affects the 
effectiveness of our technique. To do this, synthetic task 
graphs were generated. It is known that for task graphs 
with the same number of tasks, the height of the task 
graph can be used to take the parallelism degree into 
account [39]. Based on this, three classes of task graphs 
with different parallelism degrees were considered in the 
experiment. Let n be the number of nodes (tasks) in a 
task graph and h be the task graph height. Clearly, h can 
vary between 1 and n; therefore, the three classes of 
considered task graphs are: i) task graphs with 1≤h≤n/3 
(called task graphs with high parallelism degree), ii) task 
graphs with n/3≤h≤2n/3 (called task graphs with 
medium parallelism degree), and iii) task graphs with 
2n/3≤h≤n (called task graphs with low parallelism 
degree). 
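The height-based classification above can be sketched as follows. This is a minimal illustration rather than the authors' tooling: the function names `graph_height` and `parallelism_class` are hypothetical, the height is taken here to be the number of tasks on the longest path, and the shared boundaries h = n/3 and h = 2n/3 are assigned to the higher-parallelism class, which is an assumption because the ranges in the text overlap at those points.

```python
from collections import deque


def graph_height(n, edges):
    """Return the number of tasks on the longest path of a DAG.

    Tasks are numbered 0..n-1 and edges is a list of (pred, succ) pairs.
    """
    successors = [[] for _ in range(n)]
    indegree = [0] * n
    for u, v in edges:
        successors[u].append(v)
        indegree[v] += 1

    depth = [1] * n  # a task with no predecessor sits at level 1
    ready = deque(i for i in range(n) if indegree[i] == 0)
    visited = 0
    while ready:  # topological traversal (Kahn's algorithm)
        u = ready.popleft()
        visited += 1
        for v in successors[u]:
            depth[v] = max(depth[v], depth[u] + 1)
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    if visited != n:
        raise ValueError("task graph contains a cycle")
    return max(depth, default=0)


def parallelism_class(n, h):
    """Classify a task graph with n tasks and height h.

    Assumption: boundary values go to the higher-parallelism class.
    """
    if not 1 <= h <= n:
        raise ValueError("height must satisfy 1 <= h <= n")
    if 3 * h <= n:
        return "high"
    if 3 * h <= 2 * n:
        return "medium"
    return "low"
```

For example, a chain of six tasks has h = n = 6 and falls in the low class, whereas six independent tasks have h = 1 and fall in the high class.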
The tasks of the synthetic task graphs were randomly 
selected from the MiBench benchmarks, and the time and 
energy overheads of the detection mechanisms for these 
tasks were taken from Table 1. The worst-case and actual 
execution times (Wi and ti) of the tasks were generated 
randomly [4], [5], [6]. The worst-case execution times 
were uniformly distributed between 10ms and 100ms. 
However, because the actual execution times of each task 
may follow different probability distributions, as in [4], 
[5], [6] we considered uniform, normal, and exponential 
distributions for the actual execution time ti, and each 
task Ti was executed only for the duration of ti. In the 
experiment, task graphs with 20, 50, 100, 200, and 500 
tasks and different parallelism degrees were executed on 
multi-core systems with 2, 4, 8, 16, and 32 cores. Each 
case (e.g., a task graph with 50 tasks on an 8-core 
system) was simulated 1500 times with different 
parameters (i.e., the tasks' worst-case and actual 
execution times and the application deadline), and the 
average results are reported in Figs. 7 and 8. These 
figures show the energy consumption and probability of 
failure (PoF) for LE-TMR, [5]-HFD, [5]-LFD, and CTMR. 
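As a rough illustration of how such execution times might be drawn, the sketch below generates (Wi, ti) pairs. Only the 10ms to 100ms uniform range for Wi comes from the text; the function name `generate_task_times`, the distribution parameters for ti (mean of half the worst case, and so on), and the clamping of ti into (0, Wi] are assumptions made for this example, since the paper does not state them here.

```python
import random


def generate_task_times(num_tasks, distribution="uniform", seed=None):
    """Return a list of (worst_case_ms, actual_ms) pairs.

    The distribution parameters below are illustrative assumptions.
    """
    rng = random.Random(seed)
    tasks = []
    for _ in range(num_tasks):
        # From the text: worst-case time uniform between 10 ms and 100 ms.
        wcet = rng.uniform(10.0, 100.0)
        if distribution == "uniform":
            actual = rng.uniform(0.0, wcet)
        elif distribution == "normal":
            # Assumed mean and standard deviation, relative to the worst case.
            actual = rng.gauss(0.5 * wcet, 0.15 * wcet)
        elif distribution == "exponential":
            # expovariate takes the rate, i.e. 1 / mean (assumed mean: wcet / 2).
            actual = rng.expovariate(1.0 / (0.5 * wcet))
        else:
            raise ValueError("unknown distribution: " + distribution)
        # Assumption: clamp so that 0 < actual <= worst case always holds.
        actual = min(max(actual, 0.01 * wcet), wcet)
        tasks.append((wcet, actual))
    return tasks
```

Fixing the seed makes a run repeatable, which is convenient when the same parameter set has to be replayed against several techniques.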
The following observations can be made from Figs. 7 
and 8: 
1. It can be seen from Fig. 7 that, for all four systems, 
as the parallelism degree of the task graphs 
increases, the energy consumption decreases. 
Moreover, the energy consumption of LE-TMR is 
always lower than that of the other three systems.  
2. While the energy consumption of all four systems 
decreases as the task graph parallelism degree 
increases, LE-TMR benefits from a larger energy 
reduction than the others. For example, with 16 
cores, as the task graph parallelism degree increases 
from low (Fig. 7a) to high (Fig. 7c), the energy 
consumption of LE-TMR reduces from 1698mJ to 
1231mJ (28% reduction), while the energy 
consumption of CTMR reduces from 2132mJ to 
1944mJ (9% reduction).  
3. As Fig. 8 shows, LE-TMR has a far lower 
probability of failure than the implementations of 
[5], even compared to the implementation of [5] 
that uses heavy fault-detection mechanisms 
([5]-HFD). This is because majority voting (NMR) 
covers faults better than fault-detection 
mechanisms do [9], [10], [24], [25], [26]. 
4. While LE-TMR provides almost the same reliability 
as CTMR (Fig. 8), LE-TMR consumes much less 
energy than CTMR (Fig. 7), mainly because of the 
more sophisticated energy-management technique 
that LE-TMR uses. 
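The reduction percentages quoted in the second observation follow from the reported energy values. The short check below reproduces the arithmetic; the helper name `percent_reduction` is introduced only for this illustration.

```python
def percent_reduction(before_mj, after_mj):
    """Relative energy reduction in percent when going from before_mj to after_mj."""
    return 100.0 * (before_mj - after_mj) / before_mj


# Values for 16 cores, low (Fig. 7a) versus high (Fig. 7c) parallelism degree.
le_tmr = percent_reduction(1698, 1231)  # about 27.5, reported as 28%
ctmr = percent_reduction(2132, 1944)    # about 8.8, reported as 9%
```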
TABLE 2 
PROBABILITY OF FAILURE (POF) FOR LE-TMR, [5]-HFD, [5]-LFD, AND CTMR WHEN RUNNING THE PRACTICAL APPLICATIONS.  
 
Application   [5]-HFD     [5]-LFD     CTMR        LE-TMR
Robot         10^-3.45    10^-2.4     10^-9.64    10^-9.67
Sparse        10^-3.52    10^-2.32    10^-9.52    10^-9.53
Fpppp         10^-3.64    10^-2.45    10^-9.45    10^-9.48
MPEG4         10^-3.22    10^-2.45    10^-9.42    10^-9.44
MJPEG         10^-3.31    10^-2.74    10^-9.86    10^-9.82
 
[Figure 7: three panels, (a) Low parallelism degree, (b) Medium parallelism degree, (c) High parallelism degree. x-axis: Number of cores (2, 4, 8, 16, 32); y-axis: Energy Consumption (mJ), 800 to 2800; series: [5]-HFD, [5]-LFD, CTMR, LE-TMR.]
Fig. 7. Energy consumption of LE-TMR, [5]-HFD, [5]-LFD, and CTMR when running the synthetic applications. 
 
[Figure 8: three panels, (a) Low parallelism degree, (b) Medium parallelism degree, (c) High parallelism degree. x-axis: Number of cores (2, 4, 8, 16, 32); y-axis: Probability of Failure (PoF), log scale, 1E-10 to 1E-02; series: [5]-HFD, [5]-LFD, CTMR, LE-TMR.]
Fig. 8. Probability of failure (PoF) in log scale for LE-TMR, [5]-HFD, [5]-LFD, and CTMR when running the synthetic applications. 
 
We also compared LE-NMR with N=5 and N=7 (i.e., 
LE-5MR and LE-7MR, respectively) with [5] and the 
conventional NMR. The experiments demonstrate that 
LE-NMR outperforms [5] from both the energy-consumption 
and reliability viewpoints. LE-5MR and LE-7MR provide 
on average, respectively, 19% (up to 22%), 17% (up to 
21%), and 31% (up to 36%) energy saving as compared to 
the corresponding implementations of [5] and the 
conventional NMR. An interesting observation from the 
experiments is that none of the implementations of [5] 
can achieve high reliability (the implementations of [5] 
cannot achieve a probability of failure less than 10^-3), 
while LE-NMR satisfies the required reliability level of 
safety-critical applications, which may require the 
probability of failure to be less than 10^-9 [6], [9], [10]. 
This is because the implementations of [5] use software 
fault-detection mechanisms, whose fault coverage is not 
sufficiently high [9], [10], [25], [26], whereas LE-NMR 
uses majority voting, which provides high fault masking 
[9], [10], [24], [25], [26]. 
 
6 CONCLUSION 
In this paper, we described how multi-core platforms can 
be exploited to achieve high reliability with low energy 
overhead for hard real-time systems. To do this, we 
proposed a low-energy NMR technique, called LE-NMR. 
LE-NMR saves energy through two main strategies. First, 
we adopt a two-phase NMR technique, where usually 
(when no fault occurs) only one phase is executed, 
resulting in a considerable energy saving compared with 
conventional NMR systems. Second, to achieve further 
energy saving, we use DVS. In developing the proposed 
LE-NMR technique, we have considered two new 
concepts: i) block-partitioned scheduling and ii) 
pseudo-dynamic slack management. To exploit the 
available slacks in the system by DVS, we have 
developed an energy-management technique with offline 
and online parts. At design time, the offline part derives 
and solves an optimization problem to exploit the slacks 
that are known at design time (i.e., static and 
pseudo-dynamic slacks); at run-time, the online part 
assigns dynamic slacks to the tasks. The experimental 
results show that LE-NMR provides up to 34% energy 
saving and is six orders of magnitude more reliable than 
an implementation of a recent previous work. 
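The two-phase idea summarized above can be illustrated with a small sketch. It is not the authors' implementation: the function name `two_phase_nmr` is hypothetical, the copies are run sequentially here rather than on separate cores, scheduling and DVS are omitted, results are assumed to be hashable values, and N is assumed to be odd (as in TMR, 5MR, and 7MR).

```python
from collections import Counter


def two_phase_nmr(copies, n):
    """Run a task under two-phase NMR and return (result, copies_executed).

    copies is a list of n zero-argument callables, one per replica.
    """
    if n % 2 == 0 or len(copies) != n:
        raise ValueError("expected an odd number n of task copies")
    k = n // 2 + 1  # half-plus-one copies form the indispensable phase

    # Indispensable phase: identical results already constitute a majority.
    results = [copy() for copy in copies[:k]]
    if all(r == results[0] for r in results):
        return results[0], k  # on-demand phase is not needed

    # On-demand phase: run the remaining copies and take a full majority vote.
    results += [copy() for copy in copies[k:]]
    value, votes = Counter(results).most_common(1)[0]
    if votes < k:
        raise RuntimeError("no majority among the N results")
    return value, n
```

In the fault-free case with N=3, only two copies are executed and the third is skipped, which is where the time reserved for the on-demand phase becomes available as slack.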
 
ACKNOWLEDGMENT 
Mohammad Salehi and Alireza Ejlali acknowledge Re-
search Vice-Presidency of Sharif University of Technology 
for funding this work under grant no. G930827. Bashir M. 
Al-Hashimi acknowledges the EPSRC (UK), for funding 
this work in part under grant PRiME EP/K034448/1.  
Experimental data used in this paper can be found at 
DOI:10.5258/SOTON/397799 
(http://dx.doi.org/10.5258/SOTON/397799). 
 
REFERENCES 
[1] J. Henkel, V. Narayanan, S. Parameswaran, and J. Teich, “Run-
Time Adaption for Highly-Complex Multi-Core Systems,” Proc. 
Ninth IEEE/ACM/IFIP Int’l Conf. Hardware/Software Codesign and 
System Synthesis (CODES+ISSS'13), pp. 1-8, Sept. 29 2013-Oct. 4 
2013, doi: 10.1109/CODES-ISSS.2013.6659000. 
[2] W.Y. Lee, “Energy-Efficient Scheduling of Periodic Real-Time 
Tasks on Lightly Loaded Multicore Processors,” IEEE Trans. 
Parall. Distr. Syst., vol. 23, no. 3, pp. 530-537, March 2012, doi: 
10.1109/TPDS.2011.87. 
[3] A. Munir, S. Ranka, and A. Gordon-Ross, “High-Performance 
Energy-Efficient Multicore Embedded Computing,” IEEE Trans. 
Parall. Distr. Syst., vol. 23, no. 4, pp. 684-700, April 2012, doi: 
10.1109/TPDS.2011.214. 
[4] H. Su, D. Zhu, and D. Mosse, “Scheduling Algorithms for 
Elastic Mixed-Criticality Tasks in Multicore Systems,” Proc. 
IEEE 19th Int’l Conf. Embed. Real-Time Computing Syst. and Appli-
cations (RTCSA'13), pp. 352-357, Aug. 2013, doi: 
10.1109/RTCSA.2013.6732239. 
[5] Y. Guo, D. Zhu, and H. Aydin, “Reliability-Aware Power Man-
agement for Parallel Real-Time Applications with Precedence 
Constraints,” Proc. Int’l Green Computing Conf. and Workshops 
(IGCC), pp.1-8, July 2011, doi: 10.1109/IGCC.2011.6008562. 
[6] A. Ejlali, B.M. Al-Hashimi, and P. Eles, “Low-Energy Standby-
Sparing for Hard Real-Time Systems,” IEEE Trans. Comput.-Aid. 
Des. Integr. Circuits Syst., vol. 31, no. 3, pp. 329-342, March 2012, 
doi: 10.1109/TCAD.2011.2173488. 
[7] M.K. Tavana, M. Salehi, and A. Ejlali, “Feedback-Based Energy 
Management in a Standby-Sparing Scheme for Hard Real-Time 
Systems,” Proc. IEEE 32nd Real-Time Systems Symposium 
(RTSS'11), pp. 349-356, Nov. 2011-Dec. 2011, doi: 
10.1109/RTSS.2011.39. 
[8] R. Melhem, D. Mosse, and E. Elnozahy, “The interplay of pow-
er management and fault recovery in real-time systems,” IEEE 
Trans. Comput., vol. 53, no. 2, pp. 217-231,  Feb 2004, doi: 
10.1109/TC.2004.1261830. 
[9] D.K. Pradhan, Fault-tolerant Computer System Design. Prentice-
Hall, Inc., Upper Saddle River, NJ, 1996. 
[10] I. Koren and C.M. Krishna, Fault-Tolerant Systems. Morgan 
Kaufmann, Elsevier, San Francisco, CA, 2007. 
[11] M.A. Haque, H. Aydin, and D. Zhu, “Energy-Aware Standby-
Sparing Technique for Periodic Real-Time Applications,” Proc. 
IEEE 29th Int’l Conf. Comput. Design (ICCD'11), pp. 190-197, Oct. 
2011, doi: 10.1109/ICCD.2011.6081396. 
[12] T.D. Burd, T.A. Pering, A.J. Stratakos, and R.W. Brodersen, “A 
dynamic voltage scaled microprocessor system,” IEEE J. 
Solid-State Circuits (JSSC), vol. 35, no. 11, pp. 1571–1580, Nov. 
2000, doi: 10.1109/4.881202. 
[13] D. Zhu, R. Melhem, D. Mosse, and E. Elnozahy, “Analysis of an 
Energy Efficient Optimistic TMR Scheme,” Proc. Tenth Int’l 
Conf. Parall. and Distr. Syst. (ICPADS'04), pp. 559-568,  July 
2004, doi: 10.1109/ICPADS.2004.1316138. 
[14] J. Cong and K. Gururaj, “Energy Efficient Multiprocessor Task 
Scheduling under Input-dependent Variation,” Proc. Design, 
Automation and Test in Europe Conf. and Exhibition (DATE'09), 
pp. 411-416, April 2009, doi: 10.1109/DATE.2009.5090698. 
[15] X. Qi and D. Zhu, “Energy efficient block-partitioned multicore 
processors for parallel applications,” J. Comput. Science Tech., 
vol. 26, no. 3, pp. 418–433,  May 2011, doi: 10.1007/s11390-011-
1144-5. 
[16] X. Qi, D. Zhu, and H. Aydin, “Global scheduling based reliabil-
ity-aware power management for multiprocessor real-time sys-
tems,” J. Real-Time Syst., vol. 47, no. 2, pp. 109-142, March 2011, 
doi: 10.1007/s11241-011-9117-x. 
[17] M.A. Haque, H. Aydin, and D. Zhu, “Energy-Aware Task 
Replication to Manage Reliability for Periodic Real-Time Appli-
cations on Multicore Platform,” Int’l Green Computing Conf. 
(IGCC'13), pp. 1-11, June 2013, doi: 10.1109/IGCC.2013.6604518. 
 
[18] D. Zhu, R. Melhem, and D. Mosse, “Energy Efficient Redun-
dant Configurations for Real-Time Parallel Reliable Servers,” J. 
Real-Time Syst., vol. 41, no. 3, pp. 195-221, April 2009, doi: 
10.1007/s11241-009-9067-8. 
[19] E.G. Coffman and R.L. Graham, “Optimal Scheduling for Two-
Processor Systems,” Acta Informatica, vol. 1, no. 3, pp. 200-213, 
1972, doi: 10.1007/BF00288685. 
[20] M.T. Schmitz, B.M. Al-Hashimi, and P. Eles, System-Level Design 
Techniques for Energy-Efficient Embedded Systems. Norwell, MA: 
Kluwer, 2004. 
[21] Intel Corp., “Intel® PXA270 Processor,” Available: 
http://www.intel.com. 
[22] M. Bazzaz, M. Salehi, and A. Ejlali, “An Accurate Instruction-
Level Energy Estimation Model and Tool for Embedded Sys-
tems,” IEEE Trans. Instrum. Meas., vol. 62, no. 7, pp. 1927-1934, 
July 2013, doi: 10.1109/TIM.2013.2248288. 
[23] N. Oh, P.P. Shirvani, and E.J. McCluskey, “Control-Flow 
Checking by Software Signatures,” IEEE Trans. Reli., vol. 51, no. 
1, pp. 111-122, Mar 2002, doi: 10.1109/24.994926. 
[24] J. Aidemark, J. Vinter, P. Folkesson, and J. Karlsson, “Experi-
mental Evaluation of Time-Redundant Execution for a Brake-
by-wire Application,” Proc. Int’l Conf. Dependable Syst. and Net-
works (DSN’02), pp. 210-215, 2002, doi: 
10.1109/DSN.2002.1028902. 
[25] K.S. Yim, V. Sidea, Z. Kalbarczyk, D. Chen, and R.K.A. Iyer, “A 
Fault-Tolerant Programmable Voter for Software-Based N-
Modular Redundancy,” Proc. IEEE Aerospace Conf., pp. 1-20, 
March 2012, doi: 10.1109/AERO.2012.6187253. 
[26] S. Feng, S. Gupta, A. Ansari, and S. Mahlke, “Shoestring: Prob-
abilistic Soft Error Reliability on the Cheap,” Proc. 15th Architec-
tural Support for Programming Languages and Operating Syst. 
(ASPLOS’10), pp. 385-396, 2010, doi: 10.1145/1736020.1736063. 
[27] M.R. Guthaus, J. S. Ringenberg, and D. Ernst, “MiBench: A free, 
commercially representative embedded benchmark suite,” Proc. 
IEEE Int’l Workshop on Workload Characterization (WWC-4), pp. 3-
14, Dec. 2001, doi: 10.1109/WWC.2001.990739. 
[28] D. Rhodes and R. Dick, “TGFF: Task Graphs for Free,” Proc. 6th 
Int’l Workshop on Hardware/Software Codesign (CODES/CASHE 
'98), pp. 97-101, Mar 1998, doi: 10.1109/HSC.1998.666245. 
[29] T. Tobita and H. Kasahara, “A standard task graph set for fair 
evaluation of multiprocessor scheduling algorithms,” J. Schedul-
ing, vol. 5, no. 5, pp. 379–394, Sep. 2002, doi: 10.1002/jos.116. 
[30] S.-H. Kang, H. Yang, K. Sungchan, I. Bacivarov,  S. Ha, and L. 
Thiele, “Reliability-aware mapping optimization of multi-core 
systems with mixed-criticality,” Proc. Design, Automation and 
Test in Europe Conf. and Exhibition (DATE’14), pp. 1-4,  March 
2014, doi: 10.7873/DATE.2014.340. 
[31] J.C. Smolens, B.T. Gold, J. Kim, B. Falsafi, J.C. Hoe, A.G. 
Nowatzyk, “Fingerprinting: Bounding Soft-Error-Detection La-
tency and Bandwidth”, IEEE Micro, vol. 24, no. 6, pp. 22-29, 
Nov./Dec. 2004, doi:10.1109/MM.2004.72. 
[32] J. Lee, B. Yun, and K. G. Shin, “Reducing Peak Power Con-
sumption in Multi-Core Systems without Violating Real-Time 
Constraints,” IEEE Trans. Parall. Distr. Syst., vol. 25, no. 4, pp. 
1024-1033, April 2014, doi: 10.1109/TPDS.2013.131. 
[33] S. Saha, J. S. Deogun, Y. Lu, “Adaptive energy-efficient task 
partitioning for heterogeneous multi-core multiprocessor real-
time systems,” Int’l Conf. High Performance Computing and Simu-
lation (HPCS), pp. 147-153, July 2012, doi: 
10.1109/HPCSim.2012.6266904. 
[34] S. Rehman, F. Kriebel, M. Shafique, J. Henkel, “Reliability-
Driven Software Transformations for Unreliable Hardware,” 
IEEE Trans. Comput.-Aid. Des. Integr. Circuits Syst., vol. 33, no. 
11, pp. 1597-1610, Nov. 2014, doi: 10.1109/TCAD.2014.2341894. 
[35] T. Miller, N. Surapaneni, R. Teodorescu, “Flexible Error Protec-
tion for Energy Efficient Reliable Architectures,” 22nd Int’l 
Symp. Comput. Arch. and High Performance Comput. (SBAC-PAD), 
pp. 1-8, Oct. 2010, doi: 10.1109/SBAC-PAD.2010.37. 
[36] R. Jeyapaul, F. Hong, A. Rhisheekesan, A. Shrivastava, K. Lee, 
“UnSync-CMP: Multicore CMP Architecture for Energy-
Efficient Soft-Error Reliability,” IEEE Trans. Parall. Distr. Syst., 
vol. 25, no. 1, pp. 254-263, Jan. 2014, doi: 10.1109/TPDS.2013.14. 
[37] R. Vadlamani, J. Zhao, W. Burleson, and R. Tessier, “Multicore 
soft error rate stabilization using adaptive dual modular re-
dundancy,” Proc. Design, Automation and Test in Europe Conf. and 
Exhibition (DATE'10), pp. 27-32, March 2010, doi: 
10.1109/DATE.2010.5457242. 
[38] T. Wei, P. Mishra, K. Wu, H. Liang, “Fixed-Priority Allocation 
and Scheduling for Energy-Efficient Fault Tolerance in Hard 
Real-Time Multiprocessor Systems,” IEEE Trans. Parall. Distr. 
Syst., vol. 19, no. 11, pp. 1511-1526, Nov. 2008, doi: 
10.1109/TPDS.2008.127. 
[39] H. Topcuoglu, S. Hariri, M.-Y. Wu, “Performance-effective and 
low-complexity task scheduling for heterogeneous computing,” 
IEEE Trans. Parall. Distr. Syst., vol. 13, no. 3, pp. 260-274, Mar 
2002, doi: 10.1109/71.993206.   
 
Mohammad Salehi received the M.S. 
degree in computer engineering from Sharif 
University of Technology, Tehran, Iran, in 
2010, where he is currently working toward 
the Ph.D. degree in computer engineering. 
From 2014 to 2015, he was a visiting re-
searcher in the Chair for Embedded Sys-
tems CES, Karlsruhe Institute of Technolo-
gy (KIT), Germany. His research interests 
include low-power design of embedded 
systems, multi-/many-core systems with a 
focus on dependability/reliability, low power, 
and the tradeoff between the fault tolerance and energy efficiency in 
real-time systems. 
 
Alireza Ejlali is an Associate Professor of 
Computer Engineering at Sharif University 
of Technology, Tehran, Iran. He received a 
Ph.D. degree in computer engineering from 
Sharif University of Technology in 2006. 
From 2005 to 2006, he was a visiting re-
searcher in the Electronic Systems Design 
Group, University of Southampton, UK. In 
2006 he joined Sharif University of Technol-
ogy as a faculty member in the department 
of computer engineering and from 2011 to 
2015 he was the director of Computer Archi-
tecture Group in this department. His research interests include low 
power design, real-time embedded systems, and fault-tolerant em-
bedded systems. 
 
Bashir M. Al-Hashimi (M’99–SM’01–F’09) 
is a Professor of computer engineering, 
Dean of Faculty of Sciences and Engineer-
ing, and the Director of the Pervasive Sys-
tems Center, University of Southampton, 
U.K. He is ARM Professor of computer 
engineering and the Co-Director of the 
ARM-ECS Research Center. His research 
interests include methods, algorithms, and 
design automation tools for low-power 
design and test of embedded systems. 
