Power-Aware Real-Time Scheduling upon Identical Multiprocessor Platforms by Nélis, Vincent et al.
HAL Id: inria-00336172
https://hal.inria.fr/inria-00336172
Submitted on 3 Nov 2008
Power-Aware Real-Time Scheduling upon Identical
Multiprocessor Platforms
Vincent Nélis, Joël Goossens, Raymond Devillers, Dragomir Milojevic, Nicolas
Navet
To cite this version:
Vincent Nélis, Joël Goossens, Raymond Devillers, Dragomir Milojevic, Nicolas Navet. Power-Aware
Real-Time Scheduling upon Identical Multiprocessor Platforms. 2008 IEEE International Conference
on Sensor Networks, Ubiquitous, and Trustworthy Computing, Jun 2008, Taichung, Taiwan. pp.209-216, 10.1109/SUTC.2008.31. inria-00336172
Power-Aware Real-Time Scheduling upon Identical Multiprocessor Platforms
Vincent Nélis∗, Joël Goossens, Raymond Devillers, Dragomir Milojevic
Université Libre de Bruxelles (U.L.B.)
CP 212, 50 Av. F. D. Roosevelt,
1050 Brussels, Belgium
{vnelis, joel.goossens, rdevil, dragomir.milojevic}@ulb.ac.be
Nicolas Navet
LORIA - Equipe TRIO
Campus Scientifique - B.P. 239
54506 Vandoeuvre-lès-Nancy, France
nicolas.navet@loria.fr
Abstract
In this paper, we address the power-aware scheduling
of sporadic constrained-deadline hard real-time tasks us-
ing dynamic voltage scaling upon multiprocessor platforms.
We propose two distinct algorithms. Our first algorithm is
an off-line speed determination mechanism which provides
an identical speed for each processor. That speed guar-
antees that all deadlines are met if the jobs are scheduled
using EDF. The second algorithm is an on-line and adap-
tive speed adjustment mechanism which reduces the energy
consumption while the system is running.
1 Introduction
1.1 Context of the study
Some important applications impose temporal constraints on the response time while running on systems with limited power resources (such as real-time communication in satellites). As a result, over the past 15 years the research community has investigated low-power system design. In particular, the dynamic voltage scaling (DVS) framework has become a major concern for power-aware computer systems. This framework aims at minimizing the system energy consumption by adjusting the working voltage and frequency of the CPU. For real-time systems, the DVS framework focuses on minimizing the energy consumption while respecting all the timing constraints.
∗Supported by the Belgian National Science Foundation (FNRS) under a FRIA grant.
Many power-constrained embedded systems are
built upon multiprocessor platforms because of high-
computational requirements and because multiprocessing
often significantly simplifies the design. As pointed out
in [4], another advantage is that multiprocessor systems are
more energy efficient than equally powerful uniprocessor
platforms, because raising the frequency of a single proces-
sor results in a multiplicative increase of the consumption
while adding processors leads to an additive increase.
1.2 Problem definition
In the following, we consider the problem of minimizing the energy consumption needed to execute a set of sporadic constrained-deadline real-time tasks scheduled upon a fixed number of identical processors. The scheduling is preemptive and uses the global EDF policy [15]. Unlike partitioned algorithms, “global” scheduling algorithms allow different instances of the same task (also called jobs or processes) to be executed upon different processors. Each process can start its execution on any processor and may migrate at run-time from one processor to another if, in the meantime, it is preempted by processes with earlier deadlines.
We first tackle the problem of choosing the smallest (or nearly smallest) processor frequency for the set of CPUs such that all deadlines will be met. The procedure is performed off-line (i.e., before the system starts its execution) and provides a static result, in the sense that the computed speed does not change over time. Such a static solution is sufficient to significantly reduce the energy consumption; however, due to the discrepancy between the Worst-Case Execution Time (WCET) and the Actual-Case Execution Time (ACET) [11], it usually leads to pessimistic results. In a second step, we thus propose an on-line scheme that takes advantage of unused CPU slots to further reduce the energy consumption.
2008 IEEE International Conference on Sensor Networks, Ubiquitous, and Trustworthy Computing
978-0-7695-3158-8/08 $25.00 © 2008 IEEE
DOI 10.1109/SUTC.2008.31
1.3 Previous work
There is a large body of research on uniprocessor energy-aware scheduling, but much less on the multiprocessor case, where low-power scheduling problems are often NP-hard when the actual application constraints are taken into account (see [7] for a starting point). Among the most interesting studies, one can cite [14], where the authors provide power-aware scheduling algorithms for bag-of-tasks applications with deadline constraints on DVS-enabled cluster systems. A study particularly relevant to the DVS framework is [6], which targets energy-efficient scheduling of periodic real-time tasks over multiple DVS processors while taking into account the power consumption due to leakage current (i.e., the static part of the energy dissipation). In [8], the authors propose a set of multiprocessor energy-efficient task scheduling algorithms with different task remapping and slack reclaiming schemes, where tasks have the same arrival time and share a common deadline. A large number of such “slack reclaiming” approaches have been developed over the years for the uniprocessor case. Among those, some strategies dynamically collect the unused computation time at the end of each job and share it among the remaining active jobs. Examples of algorithms following this “reclaiming” approach include the ones proposed in [19, 16, 21, 3]. Some reclaiming algorithms even anticipate the early completion of tasks to further reduce the CPU speed [16, 3], some of them with different levels of “aggressiveness” [3].
1.4 Contribution of the paper
Unlike the work considered in [4], we study the case where the number of processors is already fixed. This constraint can be imposed by the availability of hardware components or by design considerations unrelated to power consumption. Notice that in practical situations, the task characteristics are unknown at (hardware) design time.
The first contribution of this paper is based on [13] and provides a technique which determines the minimum off-line processor speed for the fixed and identical multiprocessor platform using EDF.
The second, and main, contribution of this document is a slack reclaiming algorithm which is, to the best of our knowledge, the first of its kind for the global preemptive scheduling problem of distinct-deadline tasks on multiprocessor platforms. This contribution can be considered as an extension to the multiprocessor case of a previous proposal by Shin and Choi [19], which is usually referred to as “One Task Extension” (OTE). We proved that our on-line proposal does not jeopardize the system feasibility.
Organization of the paper. The document is organized
as follows: in Section 2, we introduce our model of compu-
tation, in particular our task model; in Section 3, we present
our off-line processor speed determination; in Section 4, we
present our on-line speed reduction technique; in Section 5,
we present our experimental results; in Section 6, we con-
sider our future works and in Section 7, we conclude.
2 Model of computation
2.1 Application model
We consider in this paper the scheduling of sporadic constrained-deadline tasks, i.e., systems where each task τi = (Ci, Di, Ti) is characterized by three parameters: a worst-case execution requirement (WCET) denoted Ci, a minimal inter-arrival delay Ti and a deadline Di ≤ Ti. The interpretation is that the task generates successive jobs τi,j (with j = 1, 2, . . . , ∞) arriving at times ei,j such that ei,j+1 − ei,j ≥ Ti; each such job has an execution requirement of at most Ci execution units and must be completed by its deadline, noted Di,j = ei,j + Di. We also assume that the worst-case execution time never exceeds the deadline, i.e., Ci ≤ Di. We assume that preemption is allowed: an executing job may be interrupted and its execution resumed later (possibly upon another processor), with no loss or penalty. Let τ = {τ1, τ2, . . . , τn} denote a sporadic task system. For each task τi, we define its density λi as the ratio of its execution requirement to its deadline:
$$\lambda_i \stackrel{\text{def}}{=} \frac{C_i}{D_i}.$$
Since Ci ≤ Di, we have λi ≤ 1. We also define the total density λsum(τ) and the maximal density λmax(τ) of the sporadic task system τ as
$$\lambda_{\text{sum}}(\tau) \stackrel{\text{def}}{=} \sum_{i=1}^{n} \lambda_i, \qquad \lambda_{\max}(\tau) \stackrel{\text{def}}{=} \max_{\tau_i \in \tau} \lambda_i.$$
Without loss of generality, we assume in the remainder of the paper that λ1 ≥ λ2 ≥ . . . ≥ λn, and consequently λmax(τ) = λ1.
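The definitions above can be sketched in Python. This is a minimal illustration under our own assumptions: a task is represented as a hypothetical `(C, D, T)` tuple, and exact `Fraction` arithmetic is used so that densities compare without rounding error.

```python
from fractions import Fraction

def density(task):
    """lambda_i = C_i / D_i for a task given as a hypothetical (C, D, T) tuple."""
    c, d, t = task
    if not (0 < c <= d <= t):
        raise ValueError("expected 0 < C_i <= D_i <= T_i")
    return Fraction(c, d)

def sort_by_density(tasks):
    """Reindex tasks so that lambda_1 >= lambda_2 >= ... >= lambda_n."""
    return sorted(tasks, key=density, reverse=True)

def total_density(tasks):
    """lambda_sum(tau): sum of all task densities."""
    return sum((density(task) for task in tasks), Fraction(0))

def max_density(tasks):
    """lambda_max(tau): largest task density."""
    return max(density(task) for task in tasks)

# Hypothetical task set, (C_i, D_i, T_i) in arbitrary time units.
tau = sort_by_density([(1, 4, 5), (3, 5, 6), (2, 10, 10)])
print(tau)                 # [(3, 5, 6), (1, 4, 5), (2, 10, 10)]
print(total_density(tau))  # 21/20
print(max_density(tau))    # 3/5
```

After sorting, λmax(τ) = λ1 is simply the density of the first task in the list.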
2.2 Platform model
In our platform model, a processor can dynamically adapt its working frequency in some continuous range [fmin, fmax]. The case where the number of frequencies is finite can be addressed as in [12]. In the remainder of this paper, we denote by s(t) the processor speed at any time instant t. The processor speed s(t) is defined as the ratio of its current functioning frequency (say f(t)) to the maximal frequency fmax, i.e.,
$$s(t) \stackrel{\text{def}}{=} \frac{f(t)}{f_{\max}}, \qquad f_{\min} \le f(t) \le f_{\max}.$$
Notice that the processor speed always lies between $f_{\min}/f_{\max}$ and 1, whatever the values of fmin and fmax, and that each speed corresponds to exactly one frequency.
We consider in this document multiprocessor platforms composed of a known and fixed number m of identical processors {P1, P2, . . . , Pm} upon which a set of real-time tasks is scheduled. The working power of each processor may be characterized by its speed (or computing capacity) s, with the interpretation that a job that executes on a processor of speed s for R time units completes s × R units of execution. The minimal and maximal admissible speeds are identical for all processors and are denoted by
$$s_{\min} \stackrel{\text{def}}{=} \frac{f_{\min}}{f_{\max}} > 0 \quad\text{and}\quad s_{\max} \stackrel{\text{def}}{=} \frac{f_{\max}}{f_{\max}} = 1,$$
respectively. Since we assume that the range of available frequencies is continuous between fmin and fmax, the speed of the processors can take any real value between smin and smax at every instant. Notice that the task computing requirements (Ci's) are defined for the maximal speed smax.
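As a small illustration of the speed model, the sketch below (our own helper names, not from the paper) converts a frequency into a normalized speed and computes how long a job needs at that speed: since a job running for R time units at speed s completes s × R units, a requirement of C units (defined at smax = 1) takes C/s time units. The 200 MHz and 700 MHz values are the extreme Crusoe TM5400 frequencies listed in Table 1.

```python
def speed(f, f_max):
    """Normalized processor speed s = f / f_max."""
    if not (0 < f <= f_max):
        raise ValueError("frequency must lie in (0, f_max]")
    return f / f_max

def time_to_complete(work, s):
    """Time needed to execute `work` units (defined at s_max = 1) at speed s.

    Running R time units at speed s completes s * R units, so R = work / s.
    """
    if s <= 0:
        raise ValueError("speed must be positive")
    return work / s

s_min = speed(200, 700)
print(round(s_min, 3))             # 0.286
print(time_to_complete(2.0, 0.5))  # 4.0
```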
In Section 3 we assume that all the processors share a
common speed which is fixed before the system starts its
execution. This speed does not change during the schedul-
ing and thus, we will use the notation s instead of s(t) to
simplify the presentation. Then, we study the case in Sec-
tion 4 where each processor may run at a different speed
and may change it at any time during the scheduling. In our
work, speed assignments are determined at job-level: volt-
age/speed changes only occur at job dispatching instants.
That is, once a job is assigned to a CPU, the CPU speed is
fixed until the job is preempted or completed.
3 Off-line speed determination
3.1 Introduction
Off-line processor speed determination is the process of determining, during the design of the real-time application, the lowest processor speed s that allows the sporadic task set τ to be scheduled upon an identical multiprocessor platform with m processors running at speed s. In this section, we consider the case where, at any instant, all processors must be running at the same speed, noted s. We shall use the following result:
Theorem 1 (Bertogna, Cirinei and Lipari [5]). Any spo-
radic constrained-deadline task system τ satisfying
$$\lambda_{\text{sum}}(\tau) \le m - (m-1) \cdot \lambda_{\max}(\tau)$$
is schedulable by the EDF algorithm upon a platform with
m identical processors.
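Then, we get the following sufficient feasibility condition:
Corollary 1. A sporadic constrained-deadline task system τ is EDF-schedulable upon an identical multiprocessor platform with m processors running at speed s if:
$$s \ge \lambda_{\max}(\tau) + \frac{\lambda_{\text{sum}}(\tau) - \lambda_{\max}(\tau)}{m} \qquad (1)$$
Indeed, running at speed s scales every density by 1/s, so Theorem 1 applied to the scaled system requires λsum(τ)/s ≤ m − (m − 1) · λmax(τ)/s, which rearranges into Expression (1). Notice that, from Expression (1) (which is a sufficient condition), s is always greater than or equal to λmax(τ), which is a necessary condition to ensure the system schedulability, whatever the scheduling algorithm.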
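Expression (1) is straightforward to evaluate. The following Python sketch (the function name `edf_speed` is ours) uses exact fractions; the densities are hypothetical values listed in non-increasing order.

```python
from fractions import Fraction

def edf_speed(lambdas, m):
    """Sufficient common speed for global EDF, Expression (1):
    s = lambda_max + (lambda_sum - lambda_max) / m.
    """
    if m < 1:
        raise ValueError("need at least one processor")
    lam_max = max(lambdas)
    lam_sum = sum(lambdas, Fraction(0))
    return lam_max + (lam_sum - lam_max) / m

lams = [Fraction(3, 5), Fraction(1, 4), Fraction(1, 5)]
print(edf_speed(lams, 2))  # 33/40
print(edf_speed(lams, 1))  # 21/20: exceeds s_max = 1, so the test fails on one CPU
```

With m = 2, the resulting speed 33/40 = 0.825 stays above λmax(τ) = 3/5, as Expression (1) guarantees.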
3.2 Algorithm EDF(k)
Following an idea from [13], but adapted to our off-
line speed determination where the number of processors is
fixed, we shall present an improvement on the speed needed
in order to schedule sporadic task sets.
Algorithm EDF(k) (Goossens, Funk and Baruah [13]): Assuming that the task indexes are sorted in non-increasing order of task densities and 1 ≤ k ≤ m, EDF(k) assigns priorities to jobs of tasks in τ according to the following rules:
For all i < k, τi jobs are assigned the highest priority (ties are broken arbitrarily).
For all i ≥ k, τi jobs are assigned priorities according to EDF (ties are again broken arbitrarily).
That is, Algorithm EDF(k) assigns the highest priority to jobs generated by the (k − 1) tasks in τ that have the highest densities, and assigns priorities according to deadlines to jobs generated by all other tasks in τ (thus, “pure” EDF is EDF(1)). We show in the following that using EDF(k) instead of EDF yields another sufficient speed s, which is always lower than (or equal to) the one provided by Expression (1). But first, we introduce the notation $\tau^{(i)}$ to refer to the task system composed of the (n − i + 1) minimum-density tasks in τ:
$$\tau^{(i)} \stackrel{\text{def}}{=} \{\tau_i, \tau_{i+1}, \ldots, \tau_n\}$$
(according to this notation, $\tau \equiv \tau^{(1)}$).
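One way to realize the EDF(k) rule in a ready queue is a sort key in which jobs of the first (k − 1) tasks get an effective deadline of −∞ (the same trick Algorithm 1 uses) and every other job keeps its absolute deadline. This is a sketch under our own assumptions: jobs are hypothetical `(i, D_ij)` pairs with 1-based task indices, and ties are broken by task index, which is one arbitrary choice among those the rule allows.

```python
import math

def edfk_key(job, k):
    """Sort key for EDF(k): a smaller key means a higher priority.

    job = (i, abs_deadline), with i the 1-based index of the task in
    non-increasing density order.
    """
    i, abs_deadline = job
    effective = -math.inf if i < k else abs_deadline
    return (effective, i)  # task index breaks ties deterministically

jobs = [(3, 12.0), (1, 20.0), (4, 9.0), (2, 15.0)]
print(sorted(jobs, key=lambda j: edfk_key(j, k=3)))
# [(1, 20.0), (2, 15.0), (4, 9.0), (3, 12.0)]
print(sorted(jobs, key=lambda j: edfk_key(j, k=1)))
# [(4, 9.0), (3, 12.0), (2, 15.0), (1, 20.0)]  (EDF(1) is plain EDF)
```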
Theorem 2. Any sporadic constrained-deadline task system τ is EDF(k)-schedulable upon an identical multiprocessor platform with m processors at speed sk if
$$s_k \ge \max\left\{\lambda_1,\ \lambda_k + \frac{\lambda_{\text{sum}}(\tau^{(k+1)})}{m - k + 1}\right\}$$
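Corollary 2. A sporadic constrained-deadline task system τ is schedulable upon m processors at speed sol by EDF(ℓ), with
$$s_{ol} \stackrel{\text{def}}{=} \max\left\{\lambda_1,\ \min_{k=1}^{m}\left\{\lambda_k + \frac{\lambda_{\text{sum}}(\tau^{(k+1)})}{m - k + 1}\right\}\right\} \qquad (2)$$
where ℓ is the value of k that minimizes sk.
Proof. The proof is a direct consequence of Theorem 2.
This expression never yields a worse bound than Inequality (1): for k = 1, the inner term equals λ1 + λsum(τ^(2))/m = λmax(τ) + (λsum(τ) − λmax(τ))/m, i.e., the right-hand side of (1), so the minimum over k can only be lower or equal.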
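Expression (2) can be evaluated by computing every sk and keeping the smallest. The Python sketch below is illustrative only; it assumes the densities are already sorted in non-increasing order and, as our own assumption, limits k to min(m, n) so that λk is always defined. Ties in sk are resolved towards the smallest k.

```python
from fractions import Fraction

def edfk_speeds(lambdas, m):
    """Map each k in 1..min(m, n) to s_k from Theorem 2."""
    n = len(lambdas)
    speeds = {}
    for k in range(1, min(m, n) + 1):
        tail = sum(lambdas[k:], Fraction(0))  # lambda_sum(tau^(k+1))
        speeds[k] = max(lambdas[0], lambdas[k - 1] + tail / (m - k + 1))
    return speeds

def offline_speed(lambdas, m):
    """Return (s_ol, ell) as in Expression (2)."""
    speeds = edfk_speeds(lambdas, m)
    ell = min(speeds, key=lambda k: (speeds[k], k))
    return speeds[ell], ell

lams = [Fraction(9, 10), Fraction(3, 10), Fraction(3, 10), Fraction(3, 10)]
print(edfk_speeds(lams, 2))    # {1: Fraction(27, 20), 2: Fraction(9, 10)}
print(offline_speed(lams, 2))  # (Fraction(9, 10), 2)
```

Here s1 = 27/20 coincides with Expression (1) and exceeds smax = 1, whereas EDF(2) only needs speed 9/10.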
3.3 Implementation
A more detailed description of our off-line speed determination mechanism is given by Algorithm 1. Let sol denote the returned speed, defined by Expression (2). Before applying this algorithm, we assume that the number of processors is sufficient to schedule the system τ at the maximal speed. Consequently, the speed sol is initially set to smax (line 3). The lower limit slimit = max{smin, λ1} is computed on line 4, since no speed below λ1 can be sufficient and no speed below smin is available. Then, the algorithm searches for the minimal speed by sweeping the value of k between 1 and m (lines 5 to 9), and raises the result to slimit if needed (line 10). Finally, in order for EDF(k) to assign the highest priorities to the (k − 1) tasks that have the highest densities, we set the deadline of these tasks to −∞ (line 11).
Algorithm 1: Off-line speed determination
Input: τ, m, smax, smin
Output: sol
1  begin
2    kopt := 1 ;
3    sol := smax ;
4    slimit := max{smin, λ1} ;
5    for (k := 1 ; k ≤ m and sol > slimit ; k := k + 1) do
6      s := max{λ1, λk + λsum(τ^(k+1)) / (m − k + 1)} ;
7      if (s < sol) then
8        sol := s ;
9        kopt := k ;
10   if (sol < slimit) then sol := slimit ;
11   foreach τi ∈ {τ1, ..., τ(kopt−1)} do Di := −∞ ;
12   return (sol) ;
13 end
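Algorithm 1 translates almost line for line into Python. In this sketch, tasks are hypothetical dictionaries with keys `'C'` and `'D'`, already sorted by non-increasing density; the loop bound `min(m, n)` is our own guard so that λk exists; and the `-math.inf` deadlines are what makes an ordinary EDF dispatcher behave as EDF(kopt).

```python
import math
from fractions import Fraction

def offline_speed_determination(tasks, m, s_max=Fraction(1), s_min=Fraction(0)):
    """Sketch of Algorithm 1. Returns (s_ol, k_opt) and mutates `tasks`."""
    lam = [Fraction(task['C'], task['D']) for task in tasks]  # densities
    n = len(lam)
    k_opt = 1                                 # line 2
    s_ol = s_max                              # line 3
    s_limit = max(s_min, lam[0])              # line 4
    k = 1
    while k <= min(m, n) and s_ol > s_limit:  # line 5
        tail = sum(lam[k:], Fraction(0))      # lambda_sum(tau^(k+1))
        s = max(lam[0], lam[k - 1] + tail / (m - k + 1))  # line 6
        if s < s_ol:                          # lines 7-9
            s_ol, k_opt = s, k
        k += 1
    if s_ol < s_limit:                        # line 10
        s_ol = s_limit
    for task in tasks[:k_opt - 1]:            # line 11: highest priority
        task['D'] = -math.inf
    return s_ol, k_opt                        # line 12

tasks = [{'C': 9, 'D': 10}, {'C': 3, 'D': 10}, {'C': 3, 'D': 10}, {'C': 3, 'D': 10}]
print(offline_speed_determination(tasks, m=2, s_min=Fraction(1, 5)))
# (Fraction(9, 10), 2)
print(tasks[0]['D'])  # -inf
```

Because the function overwrites deadlines, it is meant to be called once per task set.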
4 Multiprocessor One Task Extension
4.1 Introduction
In this section, we consider the case where processors
still share the same minimal and maximal speeds smin and
smax, but each one may run at its own execution speed dur-
ing the scheduling. We assume that, when a processor is
idle, its execution speed is always fixed to the minimal com-
mon speed smin. We propose a low-complexity on-line al-
gorithm that aims to further reduce the speeds of the CPUs
by performing “local” adjustments, when it is safe to reduce
the speed below sol defined by Equation (2).
We term our technique MOTE for Multiprocessor One
Task Extension, since it is a multiprocessor version of the
technique proposed in [19] and usually referred to as OTE.
The idea is the following: the speed of a CPU can safely be
reduced below the speed sol during the execution of a job
if the reduced speed does not change anything with respect
to the schedule of the subsequent jobs scheduled on that
CPU. More precisely, subsequent jobs will not be delayed
by more (nor less) higher-priority workload than with sol.
4.2 Notations
We denote by t the current time in the schedule and by Bi(t) the last release time of τi before or at time t, with Bi(0) initially set to −Ti (see Equation (3) for the reason behind this initialization). During the scheduling, Bi(t) is updated each time t a job is released by τi. The ready queue, denoted by ready-Q, holds all the pending jobs (i.e., ready to be executed but waiting for a CPU) sorted according to the EDF(k) rule, where ties are broken according to an arbitrary rule; recall that, under EDF(k), the priorities of the jobs are constant. In the following, si denotes the processor speed for the job τi,j at time t. We shall use the following functions.
The function Ai(t, t′) indicates whether the sporadic task τi may generate a job at time t′ ≥ t. Since Ti denotes the minimal inter-arrival delay between job releases of the sporadic task τi, we get:
$$A_i(t, t') \stackrel{\text{def}}{=} \begin{cases} 1 & \text{if } t' \ge B_i(t) + T_i \\ 0 & \text{otherwise} \end{cases} \qquad (3)$$
Notice that Bi(0) is initially set to −Ti in order to have Ai(0, 0) = 1, since our task model considers that each task may release its first job at time t = 0.
Then, the function PotActi(t, t′) (for Potentially Active at time t′) indicates whether τi has an active job at time t which may still be active at time t′. This function returns 1 only if τi is active at time t and t′ is earlier than the deadline of this job:
$$\mathit{PotAct}_i(t, t') \stackrel{\text{def}}{=} \begin{cases} 1 & \text{if } \omega_i^{s_i}(t) > 0 \text{ and } t \le t' < B_i(t) + D_i \\ 0 & \text{otherwise} \end{cases}$$
where $\omega_i^{s_i}(t)$ denotes the remaining worst-case execution requirement of the last released job of τi if executed at speed si (if a job is done, its ω is set to zero, even if the WCET is not exhausted).
Theorem 3. The function
$$\Pi(\tau_{u,v}, t, t') \stackrel{\text{def}}{=} m - \sum_{\tau_i \in \tau \setminus \{\tau_u\}} \mathit{PotAct}_i(t, t') - \sum_{\tau_i \in \tau} A_i(t, t'),$$
if non-negative, provides a lower bound on the number of available CPUs at time t′ ≥ t, when ignoring the schedule of the current job of τu (if any).
Corollary 3. At each time t where a job τu,v is allocated to CPU Pℓ, the earliest future time instant in the schedule such that Pℓ may be required by another job (possibly from the same task) is given by:
$$t_{next} = \begin{cases} \min\{t' \ge t \mid \Pi(\tau_{u,v}, t, t') \le 0\} & \text{if } m \le n \\ +\infty & \text{otherwise} \end{cases}$$
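To make Theorem 3 and Corollary 3 concrete, the sketch below evaluates Π directly from Equation (3) and the definition of PotActi, and searches tnext among t and the instants Bi(t) + Di and Bi(t) + Ti, where Π can change value. The `TaskState` record and all helper names are our own; `omega` stands for ωi^si(t), and only its sign matters here.

```python
import math
from dataclasses import dataclass

@dataclass
class TaskState:
    B: float      # last release time B_i(t); initially -T_i
    D: float      # relative deadline D_i
    T: float      # minimal inter-arrival delay T_i
    omega: float  # remaining worst-case requirement of the last job (0 if done)

def A(task, t_prime):
    """Equation (3): 1 if the task may release a job at t_prime."""
    return 1 if t_prime >= task.B + task.T else 0

def pot_act(task, t, t_prime):
    """1 if the task's current job may still be active at t_prime."""
    return 1 if task.omega > 0 and t <= t_prime < task.B + task.D else 0

def Pi(tasks, u, t, t_prime, m):
    """Theorem 3; u is the 0-based index of the task being allocated."""
    busy = sum(pot_act(x, t, t_prime) for i, x in enumerate(tasks) if i != u)
    arrivals = sum(A(x, t_prime) for x in tasks)
    return m - busy - arrivals

def t_next(tasks, u, t, m):
    """Corollary 3: Pi is piecewise constant, so only t and the
    instants B_i + D_i and B_i + T_i need to be checked."""
    if m > len(tasks):
        return math.inf
    candidates = {t}
    for x in tasks:
        candidates.update((x.B + x.D, x.B + x.T))
    for c in sorted(c for c in candidates if c >= t):
        if Pi(tasks, u, t, c, m) <= 0:
            return c
    return math.inf  # unreachable when m <= n: eventually every A_i is 1

tasks = [TaskState(B=0, D=4, T=6, omega=1.0),  # tau_1: active
         TaskState(B=0, D=3, T=5, omega=0.0),  # tau_2: job already finished
         TaskState(B=0, D=8, T=9, omega=2.0)]  # tau_3: job being allocated
print(t_next(tasks, u=2, t=0, m=2))  # 6
```

In this hypothetical example, the job of τ3 (deadline 8) could therefore be slowed down so that it completes at min{8, 6} = 6.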
4.3 MOTE scheme
EDF(k) is a job-level fixed-priority algorithm; consequently, the job executed on a CPU can only be replaced upon its completion or upon the release of a (higher-priority) job that preempts it. In our scheme, the speed reduction of a job is decided when the job is allocated to a CPU, either for the first time or when it resumes after being preempted. Upon its release, a job is inserted into the ready-Q if it cannot receive a processor (i.e., all processors are busy and the job has a lower priority than every running job). We do not make any assumption on the CPU allocation rule when several CPUs are available for a single job. For instance, free CPUs can be granted according to the rule “smallest CPU index first.”
Since we consider multiprocessor platforms, we have to be very careful about any change in the original schedule because of scheduling anomalies. We say that a scheduling algorithm suffers from anomalies if a change which is intuitively positive in a schedulable system can turn it unschedulable. An “intuitively positive change” is a change which seems to help the scheduling, such as reducing the density of a task (by increasing its period or reducing its execution requirement) or advancing the start time of a job; it can also be an increase in the number of processors on the platform. Unfortunately, multiprocessor platforms are subject to scheduling anomalies [2]. For that reason, our on-line low-power mechanism only focuses on the last allocated job and avoids changing the schedule of the other jobs.
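[Figure: schedule on processors P1, P2 and P3 from the current time t, showing the active jobs τ1,1, τ2,1 and τ3,1, their deadlines D1,1, D2,1 and D3,1, the earliest next arrivals A1,2, A2,2 and A3,2, and the instant tnext.]
Figure 1. Illustration of a 3-task system.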
Figure 1 illustrates the main idea of our on-line algorithm when 3 tasks are scheduled upon 3 processors at speed sol. This example shows a schedule where t is the current time, τ1,1, τ2,1 and τ3,1 are the active jobs at time t (the ready-queue is empty since there are only three tasks in the system), and plain circles and vertical arrows represent the deadlines and the (earliest) arrival times (since tasks are sporadic) of each task, respectively. Suppose that τ1,1 and τ2,1 are allocated to P1 and P2. Before allocating τ3,1 to the processor P3, we see that P3 cannot be required by a job other than τ3,1 until time tnext. Indeed, τ1,2 and τ2,2 could be assigned (if they arrive at times A1,2 and A2,2) to the CPUs P1 and P2, since the system feasibility ensures that τ1,1 and τ2,1 will be completed by their deadlines. Consequently, when ignoring the schedule of τ3,1, tnext is the earliest time instant after t such that all processors may be required. Indeed, tnext is the earliest time instant after t such that Π(τ3,1, t, tnext) = 3 − 0 − 3 = 0.
Since tnext is the earliest time instant (after the current time t) such that P3 may be required by a job other than τ3,1 (assuming that all the other active jobs are scheduled on other processors), one can conclude that P3 will only execute the job τ3,1 between time instants t and tnext. Hence, P3 can modify its working speed in such a way that τ3,1 completes, in the worst case, at time min{D3,1, tnext} (or earlier if smin imposes it).
Principle: Our on-line power-aware algorithm relies on a priority rule that assigns a constant priority to each job. In this work, these priorities are determined by the algorithm EDF(k). Our power-aware algorithm is only applied when a job τi,j is to be allocated to a CPU Pℓ at time t during the scheduling, which corresponds to its arrival or to the completion of a higher-priority job. At this time, our method determines the earliest time instant tnext such that Pℓ may be needed by another job. The function Π(τi,j, t, t′) (based on the deadlines of the jobs currently executing) is used to sweep the task set (with a running time linear in the number of tasks). Notice that the function Π(τi,j, t, t′) only needs to be evaluated at the deadlines of the jobs currently under execution and at the next (possible) arrival time of every task, since between these instants Π(τi,j, t, t′) is constant. It follows from Corollary 3 that Pℓ will not execute any job other than τi,j until the time instant tnext. The speed for τi,j can therefore be safely reduced so that it completes at time min{Di,j, tnext} (if the corresponding speed is lower than the current one). Obviously, the working speed of a processor can never be reduced below smin.
Algorithm 2: Determination of tnext
Input: t, τi
Output: tnext
begin
  na := number of active tasks at time t ;
  L := set of the next deadline of each active task and the next possible arrival time of each task, sorted by increasing order of occurrence time ;
  tnext := t ;
  Π := m − (na − 1) ;
  while (Π > 0 and L ≠ ∅) do
    e ← L.top() ;
    tnext := e.occurring time ;
    if (e.task ≠ τi) and (e.type == deadline) then
      Π := Π + 1 ;
    else if (e.type == arrival) then Π := Π − 1 ;
    L.pop() ;
  return tnext ;
end
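The following Python sketch mirrors Algorithm 2 with an explicit event list; sorting that list dominates the O(n · log(n)) cost. Tasks are hypothetical `(B, D, T, omega)` tuples and `u` is the 0-based index of τi. Two details are our own assumptions rather than statements of the paper: deadline events are generated only for active tasks (those with ω > 0), and at equal times deadline events are processed before arrival events, which makes the returned value coincide with the definition in Corollary 3.

```python
import math

def t_next_sweep(tasks, u, t, m):
    """Event-sweep sketch of Algorithm 2 (task u is assumed active at t)."""
    if m > len(tasks):
        return math.inf                    # Corollary 3, case m > n
    DEADLINE, ARRIVAL = 0, 1               # deadlines first on ties
    events = []
    n_active = 0
    for i, (B, D, T, omega) in enumerate(tasks):
        if omega > 0:
            n_active += 1
            if i != u:
                events.append((B + D, DEADLINE))
        events.append((B + T, ARRIVAL))
    events.sort()                          # O(n log n)
    pi = m - (n_active - 1)
    t_nxt = t
    for time, kind in events:
        if pi <= 0:
            break
        t_nxt = max(t, time)               # an arrival may already be possible at t
        pi += 1 if kind == DEADLINE else -1
    return t_nxt

tasks = [(0, 4, 6, 1.0),   # tau_1: active, deadline 4, next arrival 6
         (0, 3, 5, 0.0),   # tau_2: job already finished, next arrival 5
         (0, 8, 9, 2.0)]   # tau_3: task being allocated (u = 2)
print(t_next_sweep(tasks, u=2, t=0, m=2))  # 6
```

Tracing the example: Π starts at 1, rises to 2 when the job of τ1 reaches its deadline at time 4, then drops to 1 and 0 with the possible arrivals at times 5 and 6.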
Let si denote the processor speed of the active job τi,j .
Algorithm 3: Speed allocation to τi,j at time t
Input: τi,j
Output: none
begin
  // Initialization step
  if (τi,j is allocated for the first time) then
    if (i < k) then si := λi ;
    else si := λk + λsum(τ^(k+1)) / (m − k + 1) ;
  // MOTE step
  if (m ≤ n) then tnext := call Algorithm 2(t, τi) ;
  else tnext := ∞ ;
  if (tnext > t) then
    si := min{si, ωi^si(t) · si / (min{Di,j, tnext} − t)} ;
    if (si < smin) then si := smin ;
    τi,j is allocated to any available CPU ;
    the speed of the designated CPU is fixed to si ;
  else
    no speed reduction can occur. The EDF(k) rule applies: τi,j either preempts the lowest-priority job currently under execution or is allocated to any available CPU, and the processor speed is fixed to si ;
end
This speed si is initialized when τi,j is released. In a simple version of the MOTE technique, the execution speed of every released job is initially set to sol, since we assume that the priorities are assigned by EDF(k) and we proved that the system feasibility is guaranteed when it is scheduled by EDF(k) at speed sol (Theorem 2). However, we adopt here another initialization step in order to profit from the individual speed of each processor. In this “optimized” initialization step, two cases may arise at the arrival of the job τi,j:
1. if τi ∈ (τ \ τ^(k)) (the set of the (k − 1) tasks with the highest densities), si is fixed to λi;
2. if τi ∈ τ^(k), si is fixed to $\lambda_k + \frac{\lambda_{\text{sum}}(\tau^{(k+1)})}{m-k+1}$.
We proved that all deadlines are met when the system is scheduled using this rule. Then, when the job τi,j is to be allocated to a CPU during the scheduling, we determine the earliest time instant tnext such that Π(τi,j, t, tnext) ≤ 0 and, if tnext > t, we set:
$$s_i := \min\left\{ s_i,\ \frac{\omega_i^{s_i}(t) \cdot s_i}{\min\{D_{i,j}, t_{next}\} - t} \right\} \qquad (4)$$
We also proved that the system feasibility is not jeopardized by this speed modification.
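The speed computation of Algorithm 3 (initialization followed by Equation (4)) can be sketched as below. This is an illustration under our own assumptions: task indices are 1-based, `k` is the EDF(k) parameter, and we read ωi^si(t) as the remaining worst-case time at the current speed si, so that ωi^si(t) · si is the remaining work. Clamping to smin in every case is our simplification of the remark that a processor never runs below smin.

```python
from fractions import Fraction

def initial_speed(i, lambdas, k, m):
    """'Optimized' initialization: lambda_i for the k-1 top tasks, else the EDF(k) speed."""
    if i < k:
        return lambdas[i - 1]
    tail = sum(lambdas[k:], Fraction(0))  # lambda_sum(tau^(k+1))
    return lambdas[k - 1] + tail / (m - k + 1)

def mote_speed(s_i, omega, t, abs_deadline, t_nxt, s_min):
    """MOTE step: Equation (4), then never go below s_min."""
    if t_nxt > t:
        target = min(abs_deadline, t_nxt)
        s_i = min(s_i, omega * s_i / (target - t))
    return max(s_i, s_min)

lams = [Fraction(9, 10), Fraction(3, 10), Fraction(3, 10), Fraction(3, 10)]
s3 = initial_speed(3, lams, k=2, m=2)
print(s3)  # 9/10
# 2 time units of worst-case execution left at speed 9/10, deadline 8, t_next = 6:
print(mote_speed(s3, Fraction(2), t=0, abs_deadline=8, t_nxt=6, s_min=Fraction(1, 5)))  # 3/10
```

Running at 3/10 instead of 9/10 stretches the same remaining work (2 · 9/10 = 9/5 units) exactly over the interval [0, 6].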
4.4 Implementation
Before the system starts its execution, our algorithm computes the speed sol by determining the optimal value of k thanks to Equation (2) (see Algorithm 1). Then, while the system is running, the decision whether or not to reduce the CPU speed for a job τi,j is taken in only one kind of situation: when the job is allocated to an available CPU (upon its release, or when it is waiting at the head of the ready-Q for an available processor and a job terminates its execution). A detailed description of the procedure applied at any allocation time is given in Algorithm 3. Algorithm 2 shows how to compute tnext with a linearithmic (also called quasilinear) worst-case computing complexity O(n · log(n)), dominated by the sorting of L, where n is the number of tasks.
It is worth noting that the MOTE step (see Algorithm 3) is applied at most once to each job (and only if i > k); indeed, a job whose speed has been changed by this step will not be preempted in the future and thus will not be (re-)stored in the ready-Q before the end of its execution. However, when the speed of a job (with a normal priority) is initialized but not modified by the MOTE step at its arrival, it can possibly be reduced by the MOTE step in the future, if the job is at the head of the ready-Q and another job completes its execution. Section 5 shows that the MOTE algorithm indeed significantly reduces the energy consumption of a real-time sporadic system.
5 Experiments
5.1 Introduction
In our simulations, we have scheduled periodic constrained-deadline systems (i.e., Ti is here the exact inter-arrival delay of each task τi). The energy consumption of each generated system is computed by simulating the three methods described in this paper during one hyperperiod (i.e., the least common multiple of the task periods); indeed, the authors of [9] show that, for the specific case of synchronous periodic task systems, the schedule repeats from the origin with a period equal to the hyperperiod. The three methods are: the off-line speed reduction for EDF (Equation (1)), the off-line speed reduction for EDF(k) (Equation (2)) and the MOTE algorithm (combined with EDF(k)). The energy consumption of each of these three methods is compared with the consumption of the Smax method (i.e., all jobs are executed at the maximal processor speed smax), for different processor models. During our simulations, about 5000 constrained-deadline systems were generated and simulated, with the number of tasks n in [5, 40], each task density below 1, and λsum(τ) between 1 and 10. During each simulation, the ACET of each job was generated using a pseudo-random generator. We produced many plots from our results, but they are omitted here due to space limitations. To ensure that the number m of processors is sufficient to schedule the generated systems at speed smax, m is determined by the following equation (from [13]):
$$m := \min\left\{ n,\ \left\lceil \frac{\lambda_{\text{sum}}(\tau) - \lambda_{\max}(\tau)}{1 - \lambda_{\max}(\tau)} \right\rceil \right\}$$
When it does not exceed n, this value is the smallest m for which Expression (1) holds with s = smax = 1.
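The experimental setup can be reproduced in a few lines. The sketch below (helper names are ours) computes m with the equation above and the hyperperiod as the least common multiple of integer periods; the `max(1, ...)` guard is our own addition for the degenerate single-task case.

```python
import math
from fractions import Fraction
from functools import reduce

def processors_needed(lambdas):
    """m := min{n, ceil((lambda_sum - lambda_max) / (1 - lambda_max))}."""
    n = len(lambdas)
    lam_max = max(lambdas)
    if lam_max >= 1:
        raise ValueError("every density must be below 1")
    lam_sum = sum(lambdas, Fraction(0))
    m = math.ceil((lam_sum - lam_max) / (1 - lam_max))
    return max(1, min(n, m))

def hyperperiod(periods):
    """Least common multiple of integer task periods."""
    return reduce(lambda a, b: a * b // math.gcd(a, b), periods)

lams = [Fraction(3, 5), Fraction(1, 4), Fraction(1, 5)]
print(processors_needed(lams))  # 2
print(hyperperiod([5, 6, 10]))  # 30
```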
5.2 Processor models
In our experiments, we used two realistic processor models. These models, noted P1 and P2 in the following, are derived from the Transmeta Crusoe TM5400 processor and the Intel StrongARM SA-1100 processor, respectively. In these two processor models, the voltage can only vary in a limited range. Moreover, only a fixed number of functioning frequencies/voltages are available. For that reason, when the desired processor speed is not available, we use the available speed immediately above it. Note that using the two available frequencies adjacent to the requested one would be more efficient from an energy point of view (see, for instance, [12]). Table 1 (adapted from [17] and [20]) summarizes the relationship between frequency, voltage, power consumption and the corresponding speed for the Transmeta TM5400 (P1) and the StrongARM SA-1100 (P2).
CPU  Freq. (MHz)  Volt. (V)  Power (% of max.)  Speed
P1   700          1.65       100                1
P1   600          1.60       80.59              0.857
P1   500          1.50       59.03              0.714
P1   400          1.40       41.14              0.571
P1   300          1.25       24.60              0.429
P1   200          1.10       12.70              0.286
P2   206          1.50       100                1
P2   195          1.42       78.9               0.947
P2   180          1.30       63.2               0.874
P2   165          1.20       50.0               0.801
P2   150          1.15       39.9               0.728
P2   135          1.10       33.6               0.655
P2   120          1.08       33.0               0.583
P2   105          0.95       19.8               0.510
P2   90           0.90       15.0               0.437
P2   75           0.82       11.8               0.364
P2   60           0.80       9.44               0.291
Table 1. Processors characteristics.
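The rounding rule described above (use the available speed immediately above the desired one) can be sketched as follows, using the P1 rows of Table 1. The energy estimate is our own simplification, not the paper's simulator: it charges the tabulated relative power for the time C/s needed to execute C units of work, compares it with running at smax, and ignores idle and transition costs.

```python
# (frequency in MHz, power in % of maximum) for the P1 model (Transmeta TM5400), Table 1.
P1 = [(700, 100.0), (600, 80.59), (500, 59.03), (400, 41.14), (300, 24.60), (200, 12.70)]
F_MAX = 700

def pick_level(desired_speed, levels=P1, f_max=F_MAX):
    """Return (speed, power%) of the slowest level whose speed >= desired_speed."""
    feasible = [(f / f_max, p) for f, p in levels if f / f_max >= desired_speed]
    if not feasible:
        raise ValueError("desired speed exceeds s_max")
    return min(feasible)

def relative_energy(desired_speed):
    """Energy per unit of work relative to s_max: (p / 100) * (1 / s)."""
    s, p = pick_level(desired_speed)
    return (p / 100.0) / s

s, p = pick_level(0.65)
print(round(s, 3), p)                  # 0.714 59.03
print(round(relative_energy(0.65), 3)) # 0.826
```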
Table 2 provides the average consumption saving generated by each method (expressed in percent), compared to the consumption of the Smax method over the entire simulation.
Results with the StrongARM SA-1100 processor
Method name      Power saving over Smax   Standard deviation
offline EDF      4.33 %                   3.34
offline EDF(k)   27.12 %                  10.24
MOTE             44.74 %                  8.82
Results with the Crusoe processor
Method name      Power saving over Smax   Standard deviation
offline EDF      0.62 %                   0.76
offline EDF(k)   5.91 %                   4.38
MOTE             23.3 %                   7.55
Table 2. Simulation results.
5.3 Observations
We observe a large variation in the power savings of our algorithms between the Crusoe processor and the StrongARM SA-1100. This variation is due to the difference in the shape of their consumption functions: the consumption function of the StrongARM processor has a higher curvature than that of the Crusoe processor. That is, a speed reduction on the StrongARM implies a more significant reduction of the system energy consumption. This reduction would be even more significant under the standard dynamic consumption model, where the power consumption is modeled as a constant plus a cubic function (or at least a quadratic function) of the speed [22]. However, our results for this theoretical case are omitted due to space limitations.
According to [18], the Crusoe processor performs a speed transition in less than 20 µs. This time overhead is negligible for most real-time systems, since task parameters are typically on the order of a few milliseconds. With the StrongARM SA-1100 processor, Pouwelse et al. [17] report that a voltage/speed change can be performed in less than 140 µs. If this cannot be considered negligible, the “voltage change overheads” can be incorporated into the worst-case execution requirement, since each job undergoes at most two speed transitions (one initially and one for a MOTE step).
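A minimal sketch of this accounting, under our own assumptions: no work progresses during a transition of duration δ, and at most two transitions occur per job as stated above. Adding 2δ to Ci (which is expressed at smax = 1) is then conservative, because at any speed s ≤ 1 the extra 2δ units of work take 2δ/s ≥ 2δ time units. The 0.14 ms value is the StrongARM bound reported above, with hypothetical task parameters in milliseconds.

```python
from fractions import Fraction

def inflate_wcet(tasks, delta, transitions_per_job=2):
    """Return (C_i + transitions_per_job * delta, D_i, T_i) for each (C_i, D_i, T_i)."""
    inflated = []
    for c, d, t in tasks:
        c_new = c + transitions_per_job * delta
        if c_new > d:
            raise ValueError("overhead makes C_i exceed D_i")
        inflated.append((c_new, d, t))
    return inflated

delta = Fraction(14, 100)  # 140 us expressed in ms
tau = inflate_wcet([(3, 5, 6), (1, 4, 5)], delta)
print(tau)  # [(Fraction(82, 25), 5, 6), (Fraction(32, 25), 4, 5)]
```

The inflated task set can then be fed unchanged to the off-line speed determination.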
6 Future works
Currently, this work addresses the impact of the proposed scheduling algorithms only on the dynamic component of the overall microprocessor power dissipation. The proposed methods do not take into account the power dissipated to hold the circuit state and/or the power dissipated due to imperfections of the physical implementation (the static power dissipation component). However, it is well known that for integrated circuits manufactured with technologies below 130 nm, and especially with current 90 nm and 65 nm technologies, the static power dissipation component becomes very important, comparable to the dynamic one [10]. A significant research effort has been, and still is, devoted to static power dissipation reduction techniques. The proposed methods target not only low-level hardware actions (such as clock gating) but also higher-level (operating system) actions that force the processor to enter one of several low-power modes, for a better trade-off between power saving and wake-up time (see [1] for an example).
The increased static power dissipation of sub-micron technologies is the main motivation for our future work, in which we will extend the currently controllable parameters of our scheduling algorithms (voltage and frequency) with a processor switch-off parameter.
7 Conclusion
In this paper, we proposed two approaches which reduce the energy consumption of real-time systems implemented upon multiprocessor platforms. The first one is an off-line speed determination method for global EDF, which we also adapted to EDF(k), a variant of global EDF that allows a lower processor speed than EDF. The second proposal (called MOTE) is an on-line low-power algorithm which exploits "unused" CPU time to adjust the processor speeds while the system is running. Our experiments show that this on-line technique can significantly reduce the energy consumption of the processors (by 44.74% on average for the Intel StrongARM SA-1100, compared with the Smax method). Moreover, MOTE can incorporate the speed/voltage change overheads simply by adding the speed transition time of the processors to the worst-case workload of each task. Both methods address sporadic constrained-deadline real-time systems. This model includes the most popular one: sporadic implicit-deadline task systems. The complexity of each decision (at any job allocation time) is linear in the number of ready jobs in the system. This low complexity makes MOTE a practical on-line technique.
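For readers unfamiliar with EDF(k), the priority rule it applies to ready jobs can be sketched as below. This sketch follows the way EDF(k) is usually defined in the literature (the k-1 tasks with the largest utilizations receive the highest priority, and all remaining jobs are ordered by earliest absolute deadline); it is not the power-aware speed computation of this paper, and the `Job` type, `edfk_order` function and sample values are hypothetical. Sorting is used here for clarity, not for the linear per-decision complexity discussed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    task_id: int          # identifier of the task that released this job
    abs_deadline: float   # absolute deadline of the job


def edfk_order(ready: list[Job], utilizations: dict[int, float], k: int) -> list[Job]:
    """Return ready jobs in EDF(k) priority order (highest priority first).

    The k-1 tasks with the largest utilizations get top priority; the other
    jobs are ordered by earliest absolute deadline. k = 1 reduces to global EDF.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    by_util = sorted(utilizations, key=lambda tid: utilizations[tid], reverse=True)
    heavy = set(by_util[: k - 1])
    # False sorts before True, so jobs of heavy tasks come first.
    return sorted(ready, key=lambda j: (j.task_id not in heavy, j.abs_deadline))


if __name__ == "__main__":
    utilizations = {1: 0.9, 2: 0.2, 3: 0.3}
    ready = [Job(2, 10.0), Job(1, 50.0), Job(3, 20.0)]
    print([j.task_id for j in edfk_order(ready, utilizations, k=2)])  # [1, 2, 3]
    print([j.task_id for j in edfk_order(ready, utilizations, k=1)])  # [2, 3, 1]
```

On m processors, the m jobs at the head of this order are the ones executed, so the heavy task (here task 1) runs even though its deadline is the latest.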
References
[1] Intel® PXA27x Processor Family Optimization Guide.
[2] B. Andersson. Static-priority scheduling on multiprocessors. PhD thesis, Chalmers University of Technology, 2003.
[3] H. Aydin, R. Melhem, D. Mossé, and P. Mejía-Alvarez. Power-aware scheduling for periodic real-time tasks. IEEE Transactions on Computers, 53(5):584–600, 2004.
[4] S. Baruah and J. Anderson. Energy-aware implementation of hard-real-time systems upon multiprocessor platforms. In Proceedings of the ISCA 16th International Conference on Parallel and Distributed Computing Systems, pages 430–435, August 2003.
[5] M. Bertogna, M. Cirinei, and G. Lipari. Improved schedulability analysis of EDF on multiprocessor platforms. In ECRTS'05: Proceedings of the 17th Euromicro Conference on Real-Time Systems, 2005.
[6] J.-J. Chen, H.-R. Hsu, and T.-W. Kuo. Leakage-aware energy-efficient scheduling of real-time tasks in multiprocessor systems. In Proceedings of the 12th IEEE Real-Time and Embedded Technology and Applications Symposium, pages 408–417, 2006.
[7] J.-J. Chen and T.-W. Kuo. Energy-efficient scheduling for real-time systems on dynamic voltage scaling (DVS) platforms. In Proceedings of the 13th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications, pages 28–38. IEEE Computer Society, August 2007.
[8] J.-J. Chen, C.-Y. Yang, and T.-W. Kuo. Slack reclamation for real-time task scheduling over dynamic voltage scaling multiprocessors. In IEEE International Conference on Sensor Networks, Ubiquitous, and Trustworthy Computing (SUTC), Taichung, Taiwan, June 2006.
[9] L. Cucu and J. Goossens. Feasibility intervals for multiprocessor fixed-priority scheduling of arbitrary deadline periodic systems. In Design, Automation and Test in Europe, pages 1635–1640. IEEE Computer Society, 2007.
[10] N. Ekekwe and R. Etienne-Cummings. Power dissipation sources and possible control techniques in ultra deep submicron CMOS technologies. Microelectronics Journal, 37(9):851–860, September 2006.
[11] R. Ernst and W. Ye. Embedded program timing analysis based on path clustering and architecture classification. In Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, pages 598–604, California, United States, 1997. IEEE Computer Society.
[12] B. Gaujal, N. Navet, and C. Walsh. Shortest path algorithms for real-time scheduling of FIFO tasks with optimal energy use. ACM Transactions on Embedded Computing Systems, volume 4, pages 907–933, November 2005.
[13] J. Goossens, S. Funk, and S. Baruah. Priority-driven scheduling of periodic task systems on uniform multiprocessors. Real-Time Systems, 25:187–205, 2003.
[14] K. H. Kim, R. Buyya, and J. Kim. Power aware scheduling of bag-of-tasks applications with deadline constraints on DVS-enabled clusters. In Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGRID 2007), pages 541–548, May 2007.
[15] C. Liu and J. Layland. Scheduling algorithms for multiprogramming in a hard real-time environment. Journal of the ACM, 20(1):46–61, January 1973.
[16] P. Pillai and K. Shin. Real-time dynamic voltage scaling for low-power embedded operating systems. Operating Systems Review, 35:89–102, October 2001.
[17] J. Pouwelse, K. Langendoen, and H. Sips. Dynamic voltage scaling on a low-power microprocessor. In Proceedings of the 7th Annual International Conference on Mobile Computing and Networking, pages 251–259, 2001.
[18] G. Quan and X. Hu. Energy efficient fixed-priority scheduling for real-time systems on variable voltage processors. In Proceedings of the 38th Conference on Design Automation, pages 828–833, 2001.
[19] Y. Shin and K. Choi. Power conscious fixed priority scheduling for hard real-time systems. In Design Automation Conference, pages 134–139, 1999.
[20] A. Sinha and A. P. Chandrakasan. JouleTrack: a web based tool for software energy profiling. In Proceedings of the 38th Conference on Design Automation, pages 220–225, 2001.
[21] F. Zhang and S. Chanson. Processor voltage scheduling for real-time tasks with non-preemptible sections. In 23rd IEEE Real-Time Systems Symposium, pages 235–245, 2002.
[22] D. Zhu. Reliability-aware dynamic energy management in dependable embedded real-time systems. In Proceedings of the 12th IEEE Real-Time and Embedded Technology and Applications Symposium, pages 397–407, April 2006.
