Discrete Frequency Selection of Frame-Based Stochastic Real-Time Tasks by Berten, Vandy et al.
Vandy BERTEN, Chi-Ju CHANG, Tei-Wei KUO
National Taiwan University
Computer Science and Information Engineering dept.
{vberten, ktw}@csie.ntu.edu.tw, james299kimo@gmail.com
October 29, 2018
Abstract
Energy-efficient real-time task scheduling has been actively
explored in the past decade. Different from the past work,
this paper considers schedulability conditions for stochastic
real-time tasks. A schedulability condition is first presented
for frame-based stochastic real-time tasks, and several algo-
rithms are also examined to check the schedulability of a
given strategy. An approach is then proposed based on the
schedulability condition to adapt a continuous-speed-based
method to a discrete-speed system. The approach stays
as close as possible to the continuous-speed-based
method while still guaranteeing schedulability. It is shown
by simulations that the energy saving can be more than 20%
for some system configurations.
Keywords: Stochastic low-power real-time scheduling,
frame-based systems, schedulability conditions.
1 Introduction
In the past decade, energy efficiency has received a
lot of attention in system design, ranging from server
farms to embedded devices. With a limited energy
supply but an increasing demand for system performance,
energy-efficient real-time task scheduling in embedded
systems has become a highly critical issue. There are two
major ways of changing the frequency during task execution:
inter-task and intra-task dynamic voltage scaling (DVS).
Although intra-task DVS seems to save more energy, its
implementation is far more complicated than that of
inter-task DVS. Most of the time it needs very good support
from compilers and/or operating systems, which is often hard
to obtain for many embedded systems. On the other hand,
inter-task DVS is easier to deploy, and tasks might not
even be aware of the deployment of the technology.
Energy-efficient real-time task scheduling has been
actively explored in the past decade. Low-power real-time
systems with stochastic or unknown task durations
have been studied for several years. The problem was
first considered in systems with only one task,
or systems in which each task gets a fixed amount of
time. Gruian [3, 4] and Lorch and Smith [5, 6] both
showed that when intra-task frequency changes are available,
the most efficient way to save energy is to increase
the speed progressively. Solutions using a discrete set of
frequencies and taking speed change overhead into account
have also been proposed [11, 10]. For inter-task frequency
changes, some work has already been undertaken. In [7], the
authors consider a model similar to the one we consider
here, even if it is presented differently. They present
several dynamic power management techniques: Proportional,
Greedy and Statistical. They do not take the full
distribution of the number of cycles into account, but only
its maximum, and its average for Statistical. Depending on
the strategy, a task gives its slack time (the difference
between the worst case and the actual number of cycles used)
either to the next task in the frame, or to all
of them. In [1], the authors attempt to allow the manager to
tune this aggressiveness level, while in [10], they propose
to adapt this aggressiveness automatically using
the distribution of the number of cycles of each task.
The same authors have also proposed a strategy that takes
the number of available speeds into account from the
beginning, instead of patching algorithms developed
for continuous-speed processors [8]. Some multiprocessor
extensions have been considered in [2].
Although excellent research results have been proposed
for energy-efficient real-time task scheduling,
little work has been done on stochastic real-time tasks,
where the execution cycles of tasks might not be known in
advance. In this paper, we are interested in frame-based
stochastic real-time systems with inter-task DVS,
where frame-based real-time tasks share the same
deadline (also referred to as the frame). Note that the
frame-based real-time task model exists in many
embedded system designs, and the results of
this paper can provide insight into the design of more
complicated systems. Our contribution is twofold.
First, we propose a schedulability test that makes it easy
to determine whether a frequency selection allows every
task in the system to meet its deadline. Second, we provide
a general method to adapt a method designed for a continuous
set of speeds (or frequencies) to a discrete set of speeds.
Using the schedulability condition given in the first part,
this can be done more efficiently than with the classical
approach. Apart from this alternative way of adapting a
continuous strategy, we show how the schedulability test
can be used to improve robustness to parameter variations.
The capability of the proposed approach is demonstrated by
a set of simulations, which show that the energy saving can
be more than 20% for some system configurations.
arXiv:0803.4308v2 [cs.OS] 7 Apr 2008
The rest of this paper is organized as follows. Section 2
presents the mathematical model of the real-time system
that we consider. Section 3 presents our first
contribution, which consists of schedulability conditions
and tests for the model. Sections 3.5 and 4 use those
results to explain how we can improve the discretization of
continuous-speed-based strategies. Section 5 shows the
efficiency of this approach experimentally, and Section 6
concludes.
2 Model
We have $N$ tasks $\{T_i, i \in [1, \ldots, N]\}$ which run on a
DVS CPU. They all share the same deadline and period
$D$ (which we call the frame), and are executed in the
order $T_1, T_2, \ldots, T_N$. The maximum number of execution
cycles of $T_i$ is $w_i$. Task $T_i$ requires $x$ cycles
with probability $c_i(x)$, where $c_i(\cdot)$ is the
distribution of the number of cycles. Of course, in practice,
we cannot use such precise information, and authors
usually group cycles into “bins”. For instance, we can
choose a fixed bin system, with $b_i$ the size of the
bins. In this case, the probability distribution $c'_i(\cdot)$ is
such that $c'_i(k)$ represents the probability of using between
$(k-1) \times b_i$ (excluded) and $k \times b_i$ (included) cycles.
The system is said to be expedient if a task never waits
intentionally. In other words, T1 starts at time 0, T2
starts as soon as T1 finishes, and so on.
The CPU can run at M frequencies (or speeds) f1 <
f2 < · · · < fM , and the chosen frequency does not
change during task execution. The mode j consumes
Pj Watts.
We assume we have $N$ scheduling functions $S_i(t)$ for
$i \in [1, \ldots, N]$ and $t \in [0, D]$. The function $S_i$ means that
if $T_i$ starts its execution at time $t$, it runs until its
end at frequency $S_i(t)$, where $S_i(t) \in \{f_1, f_2, \ldots, f_M\}$.
$S_i(t)$ is then a step function (piecewise constant
function) with only $M$ possible values. Note that $S_i(t)$
is not necessarily an increasing or a monotonic function.
This model generalizes several scheduling strategies
proposed in the literature, such as [8, 10] (which
consider a function corresponding to $S_i(D - t)$),
or discrete versions of [7]. Figure 1 shows an example
of such a set of scheduling functions.
A scheduling function can be represented by a set
of points (black dots in Figure 1), each marking the
beginning of a step. $|S_i|$ is the number of steps
of $S_i$. $S_i[k]$, $k \in \{1, \ldots, |S_i|\}$, is one point, with
$S_i[k].t$ being its time component and $S_i[k].f$ its
frequency. $S_i$ then has the same value $S_i[k].f$ in the
interval $\big[S_i[k].t,\ S_i[k+1].t\big[$ (with $S_i[|S_i|+1].t = \infty$), and
we have $S_i(t) = S_i[k].f$, where
\[ k = \max\big\{ j \in \{1, \ldots, |S_i|\} : S_i[j].t \le t \big\}. \]
Notice that finding $k$ can be done in $O(\log |S_i|)$ (by
binary search), and, except in the case of very particular
models, $|S_i| \le M$.
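As an illustration of this representation, the following minimal Python sketch stores a scheduling function as a sorted list of (start time, frequency) points and evaluates it by binary search. The helper name `make_step_function` and the example values are illustrative assumptions, not part of the model above.

```python
from bisect import bisect_right


def make_step_function(points):
    """Build S(t) from step points (t, f) sorted by increasing start time t.

    Illustrative helper (name and interface are assumptions): the first
    point is expected to start at t = 0 so that S is defined on [0, D].
    """
    starts = [t for t, _ in points]
    freqs = [f for _, f in points]

    def S(t):
        # Index of the last step whose start time is <= t (binary search).
        k = bisect_right(starts, t) - 1
        if k < 0:
            raise ValueError("t lies before the first step")
        return freqs[k]

    return S


# Hypothetical three-step function (times in seconds, frequencies in MHz).
S2 = make_step_function([(0.0, 150), (0.02, 400), (0.05, 1000)])
print(S2(0.03))  # frequency selected if the task starts at t = 0.03
```

Because the last step extends to infinity, any start time after the last point simply returns the last frequency.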
We first assume that changing CPU frequency does
not cost any time or energy. See Section 4.1 for exten-
sions.
The scheduling functions Si(t) can be pretty general,
but have to respect some constraints in order to ensure
the system schedulability and avoid deadline misses.
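Figure 1 Example of scheduling with functions $S_i(t)$.
We have 5 tasks $T_1, \ldots, T_5$, running every $D$. In this
frame, $T_1$ is run at frequency $f_1 = S_1(t_1)$, $T_2$ at $f_2 =
S_2(t_2)$, $T_3$ at $f_4 = S_3(t_3)$, etc.
[Figure: timeline of $T_1, \ldots, T_5$ within one frame of length $D$, with start times $t_1, \ldots, t_5$ and frequency levels $f_1, \ldots, f_4$; step-function plots of $S_2(t)$ and $S_3(t)$ over $[0, D]$, with $z_2$ and $z_3$ marked.]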
We need now to define the concept of schedulability
in our model:
Definition 1. An expedient system {Ti, Si(·)}, {fj}(i ∈
{1, . . . , N}, j ∈ {1, . . . ,M}) is said to be schedulable if,
whatever the combination of effective number of cycles for
each task, any task Ti finishes its execution no later than the
end of the frame.
From this definition, we can easily see that if $\{T_i\}$ is
such that $\frac{1}{f_M}\sum_{i=1}^{N} w_i > D$ (the left-hand side
represents the time needed to run every task in the frame at
the highest speed if every task requires its worst-case
execution cycles), the system will never be schedulable,
whatever the set of scheduling functions. In the same
way, we can see that if $\{T_i\}$ is such that
$\frac{1}{f_1}\sum_{i=1}^{N} w_i \le D$, the system is always
schedulable, even with a “very bad” set of scheduling functions.
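The two bounds above can be checked before any scheduling function is built. The sketch below is a hedged illustration; the function name `trivial_schedulability` and its string return values are assumptions made for this example.

```python
def trivial_schedulability(w, f_min, f_max, D):
    """Classify a task set using only the two bounds stated in the text.

    w: worst-case cycle counts w_1..w_N; f_min = f_1, f_max = f_M (cycles
    per time unit); D: frame length. Names are illustrative assumptions.
    """
    total = sum(w)
    if total / f_max > D:
        return "never"    # even f_M cannot fit every WCEC in the frame
    if total / f_min <= D:
        return "always"   # even f_1 everywhere meets the deadline
    return "depends"      # schedulability depends on the functions S_i


# Hypothetical values: cycles, Hz and seconds.
print(trivial_schedulability([2e6, 3e6, 5e6], 1.5e8, 1e9, 0.03))
```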
Of course, a non-schedulable system could still be able
to run its tasks completely in almost every case. Being
non-schedulable means that almost surely
(with a probability equal to 1), there will be a frame
in which a task does not have the time to finish before the
deadline (the end of the frame).
3 Schedulability and Discretization
3.1 Danger Zone
Lemma 1. Any task in $\{T_i, T_{i+1}, \ldots, T_N\}$ can always
finish no later than $D$ if and only if the system is expedient and
$T_i$ starts no later than $z_i$, defined as
\[ z_i = D - \frac{1}{f_M} \sum_{k=i}^{N} w_k. \]
Proof. This lemma can be proved by induction.
Initialization. We first consider the case of $T_N$. The very
last time the task $T_N$ can start is the time allowing it to
end before $D$ even if it consumes its $w_N$ cycles. At the
highest frequency $f_M$, $T_N$ takes at most $\frac{w_N}{f_M}$ to finish.
$T_N$ then necessarily has to start no later than $D - \frac{w_N}{f_M}$.
Otherwise, if the task starts after that time, even at the
highest frequency, there is no guarantee that $T_N$ will
finish by $D$.
Induction. We know that if (and only if) $T_{i+1}$ starts no
later than $z_{i+1}$, the schedulability of $\{T_{i+1}, \ldots, T_N\}$ is
ensured. We then need to show that if $T_i$ starts no later
than $z_i$, it can be finished by $z_{i+1}$. If $T_i$ starts no later
than $z_i$, we can choose the frequency so that $T_i$
finishes by
\[ z_i + \frac{w_i}{f_M} = D - \frac{1}{f_M} \sum_{k=i}^{N} w_k + \frac{w_i}{f_M} = z_{i+1}. \]
Definition 2. The danger zone of $T_i$ is the range $]z_i, D]$.
This danger zone means that if $T_i$ has to start in
$]z_i, D]$, we cannot guarantee the schedulability anymore.
However, because of the variable nature of execution
times, we cannot be sure either that some task will miss
its deadline. Of course, the danger zone of
$T_i$ is larger than that of $T_j$ if $i < j$, which means that
$z_i < z_j$ iff $i < j$.
In order to simplify some notation, we set
$z_{N+1} = D$.
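The values $z_i$ are easy to compute with one backward pass over the tasks. The following sketch assumes cycle counts in cycles, frequencies in Hz and times in seconds; the function name `danger_zone_starts` and the numbers are illustrative.

```python
def danger_zone_starts(w, f_max, D):
    """Return [z_1, ..., z_N, z_{N+1}] with z_{N+1} = D.

    z_i = D - (w_i + ... + w_N) / f_max is the latest start time of T_i
    that still guarantees every remaining task can finish by D.
    """
    z = [D]
    for w_i in reversed(w):
        # z_i = z_{i+1} - w_i / f_M
        z.append(z[-1] - w_i / f_max)
    z.reverse()
    return z


# Hypothetical system: three tasks, f_M = 1 GHz, frame of 20 ms.
print(danger_zone_starts([2e6, 3e6, 5e6], 1e9, 0.02))
```

A negative $z_1$ means the task set cannot be scheduled even at $f_M$.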
3.2 Schedulability Conditions
Let us now consider conditions on $\{S_i\}$ that
guarantee the schedulability of the system. We prove
the following theorem:
Theorem 1.
\[ S_i(t) \ge \frac{w_i}{z_{i+1} - t} \quad \forall i \in [1, \ldots, N],\ t \in [0, z_i[, \]
where
\[ z_i = D - \frac{1}{f_M} \sum_{k=i}^{N} w_k, \]
is a necessary and sufficient condition to guarantee
that, if task $T_i$ never requires more than $w_i$ cycles and
the system is expedient, any task $T_i$ can finish no later than
$z_{i+1}$, and hence the last one, $T_N$, no later than $D$.
Proof. We show this by induction. Let τi be the worst
finishing time of task Ti. Please note that this does not
necessarily correspond to the case where any task be-
fore Ti consumes its WCEC. Figure 2 highlights why.
Figure 2 Example showing that a smaller number of
cycles for one task can result in a worse ending time for
subsequent tasks. Here, $t'$ is the point at which $S_2(t)$
goes from $f_1$ to $f_2$. In the top plot, $T_1$ uses slightly fewer
cycles than in the bottom plot, and $T_2$ uses the same
number in both cases, but is run at $f_1$ in the first case,
and at $f_2$ in the second one.
[Figure: two timelines (top and bottom) showing $T_1$ followed by $T_2$, with frequency levels $f_1$ and $f_2$ and the switching point $t'$ marked.]
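First, we have to show that in the range $[0, z_i]$,
$\frac{w_i}{z_{i+1} - t} \le f_M$. As this function is an increasing
function of $t$, we just need to consider its maximal value,
reached at $t = z_i$:
\[ \frac{w_i}{z_{i+1} - z_i}
 = \frac{w_i}{D - \frac{1}{f_M} \sum_{k=i+1}^{N} w_k - \left( D - \frac{1}{f_M} \sum_{k=i}^{N} w_k \right)}
 = \frac{w_i}{\frac{1}{f_M} w_i} = f_M. \]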
Initialization. For the initialization, we consider $T_1$.
Clearly, as the execution length is not taken into account
for the frequency selection, the worst case occurs
when $T_1$ uses $w_1$ cycles. As $T_1$ starts at time 0, we have
\[ \tau_1 = \frac{w_1}{S_1(0)}. \]
As $S_1(t) \ge \frac{w_1}{z_2 - t}$ by hypothesis, we have
\[ \tau_1 \le \frac{w_1}{\frac{w_1}{z_2}} = z_2. \]
$T_1$ then ends no later than $z_2$ in any case. Similarly,
if $S_1(t) < \frac{w_1}{z_2 - t}$ at $t = 0$, then $\tau_1 > z_2$, and we cannot
guarantee that $T_1$ finishes no later than $z_2$.
Induction. Let us now consider $T_i$, with $i > 1$. We
know by induction that $T_{i-1}$ finishes its execution
between time 0 and time $z_i$. Let $\theta$ be this end time.
Knowing that task $T_i$ starts at $\theta$, the worst case for $T_i$ is to use
$w_i$ cycles. The worst end time of $T_i$ is then
\[ \tau_i = \theta + \frac{w_i}{S_i(\theta)} \quad \text{with } \theta \in [0, \tau_{i-1}],\ \tau_{i-1} \le z_i. \]
Then, as $S_i(t) \ge \frac{w_i}{z_{i+1} - t}$ (which is possible,
because we have just shown that the right-hand side is
not higher than $f_M$ in the range we have to consider),
we have
\[ \tau_i = \theta + \frac{w_i}{S_i(\theta)} \le \theta + \frac{w_i}{\frac{w_i}{z_{i+1} - \theta}} = \theta + z_{i+1} - \theta = z_{i+1}. \]
We then have that if $S_i(t) \ge \frac{w_i}{z_{i+1} - t}$, task $T_i$ always
finishes no later than $z_{i+1}$, and, as a consequence,
every task finishes no later than $z_{N+1} = D$.
Symmetrically, we can also show that if $S_i(t) <
\frac{w_i}{z_{i+1} - t}$, then $\tau_i$ is higher than $z_{i+1}$, and then $\tau_N$ is
higher than $D$, and the system is not schedulable.
Note that the expedience hypothesis is a little
too strong. It would be enough to require that $T_i$ never
waits intentionally later than $z_i$. $T_1$ does not even have
to start at time 0, as long as it starts no later than $z_1$.
With this hypothesis, the initialization would be: in the
worst case, $T_1$ would start at time $\theta$, somewhere
between 0 and $z_1$, and use $w_1$ cycles. In this case, it would
end at
\[ \tau_1 = \theta + \frac{w_1}{S_1(\theta)} \le \theta + \frac{w_1}{\frac{w_1}{z_2 - \theta}} = z_2, \]
and we know that the CPU can be set to the speed $\frac{w_1}{z_2 - \theta}$,
which is not higher than $f_M$ because $\theta$ is in $[0, z_1]$.
Definition 3. We denote by $L_i(t)$ the schedulability limit,
or
\[ L_i(t) = \frac{w_i}{z_{i+1} - t} \]
where
\[ z_i = D - \frac{1}{f_M} \sum_{k=i}^{N} w_k. \]
An example of such schedulability limits is given in
Figure 3, with four tasks, and a maximum frequency of
1000MHz.
Figure 3 Set of limit functions Li(t), for an example of
4 tasks. DZ represents the Danger Zone of T4.
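The schedulability limit of Definition 3 can be evaluated directly from the task parameters. The sketch below is a minimal illustration; the function name, the 1-based task index argument and the numerical values are assumptions made for this example.

```python
def schedulability_limit(w, f_max, D, i, t):
    """Evaluate L_i(t) = w_i / (z_{i+1} - t) for the 1-based task index i.

    w is the list [w_1, ..., w_N]; with 0-based Python lists, w[i:] holds
    w_{i+1}, ..., w_N, which is exactly what z_{i+1} needs.
    """
    z_next = D - sum(w[i:]) / f_max      # z_{i+1}
    if t >= z_next:
        raise ValueError("t must be smaller than z_{i+1}")
    return w[i - 1] / (z_next - t)


w = [2e6, 3e6, 5e6]                      # hypothetical WCECs (cycles)
f_max, D = 1e9, 0.02                     # 1 GHz, 20 ms frame
z_1 = D - sum(w) / f_max
print(schedulability_limit(w, f_max, D, 1, 0.0))   # minimal speed at t = 0
print(schedulability_limit(w, f_max, D, 1, z_1))   # close to f_max at t = z_1
```

The second call illustrates the bound shown in the proof of Theorem 1: the limit reaches $f_M$ exactly at $t = z_i$.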
3.3 Discrete Limit
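The set of scheduling functions closest to the limit is
\[ S_i(t) = \min \{ f \in \{f_1, \ldots, f_M\} : f \ge L_i(t) \}. \]
Informally, we could write this function $S_i(t) =
\left\lceil \frac{w_i}{z_{i+1} - t} \right\rceil$, where $\lceil x \rceil$ stands for “the smallest available
frequency not lower than $x$”. This function varies as a
discrete hyperbola between $\left\lceil \frac{w_i}{z_{i+1}} \right\rceil$ and
\[ \left\lceil \frac{w_i}{z_{i+1} - z_i} \right\rceil = \left\lceil \frac{w_i}{w_i / f_M} \right\rceil = \lceil f_M \rceil = f_M. \]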
This function is however in general not very efficient:
$T_1$ is run at the slowest frequency that still allows
the following jobs to run in the remaining time. But
then, $T_1$ is run very slowly, while $\{T_2, \ldots, T_N\}$ have a
pretty high probability of running at a high frequency. A
more balanced frequency usage is often better.
This strategy actually corresponds to the Greedy
technique (DPM-G) described by Mossé et al. [7],
except that they consider continuous speeds.
Building such a function is very easy, and is in $O(M)$
for each task, with the method given by Algorithm 1.
We mainly need to be able to invert $L$: $L_i^{-1}(f) =
z_{i+1} - \frac{w_i}{f}$.
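Algorithm 1 Building Limit, the worst-case scheduling
functions. $(a)^+$ means $\max\{0, a\}$.
    $z \leftarrow D$
    foreach $i \in \{N, \ldots, 1\}$ do
        $S_i \overset{+}{\leftarrow} (0, f_1)$
        foreach $j \in \{2, \ldots, M\}$ do
            $S_i \overset{+}{\leftarrow} \left( \left( z - \frac{w_i}{f_{j-1}} \right)^+, f_j \right)$
        $z \leftarrow z - \frac{w_i}{f_M}$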
In the following, this strategy is called Limit.
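A direct transcription of Algorithm 1 into Python could look as follows. This is a sketch under stated assumptions: the function name `build_limit`, the 0-based task indexing and the list-of-points representation are choices made for this example.

```python
def build_limit(w, freqs, D):
    """Build the Limit scheduling functions of Algorithm 1.

    w: worst-case cycles per task; freqs: available frequencies in
    increasing order (f_1..f_M); D: frame length.
    Returns one list of (start_time, frequency) points per task.
    """
    f_max = freqs[-1]
    S = [None] * len(w)
    z = D                                   # z holds z_{i+1}
    for i in range(len(w) - 1, -1, -1):     # tasks N down to 1
        points = [(0.0, freqs[0])]
        for j in range(1, len(freqs)):
            # f_j starts where the limit L_i reaches f_{j-1}, clamped at 0.
            t = max(0.0, z - w[i] / freqs[j - 1])
            points.append((t, freqs[j]))
        S[i] = points
        z -= w[i] / f_max                   # z becomes z_i
    return S


# Hypothetical system: 3 tasks, 3 frequencies (Hz), 30 ms frame.
for task_points in build_limit([2e6, 3e6, 5e6], [2.5e8, 5e8, 1e9], 0.03):
    print(task_points)
```

When the clamp applies, several points share the start time 0; a lookup that keeps the last point whose start time is not greater than $t$ then returns the highest of these frequencies, as intended.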
3.4 Checking the schedulability
Provided a set of scheduling functions {S}, checking
its schedulability is pretty simple. As we know that the
limit function is non decreasing, we just need to check
that each step of Si is above the limit. This can be done
with the following algorithm.
Algorithm 2 Schedulability check
    $z \leftarrow D$
    foreach $i \in \{N, \ldots, 1\}$ do
        foreach $k \in \{2, \ldots, |S_i|\}$ do
            if $S_i[k-1].f < \frac{w_i}{z - S_i[k].t}$ then
                return false
        $z \leftarrow z - \frac{w_i}{f_M}$
    return true
This check can then be performed in $O\left( \sum_{i=1}^{N} |S_i| \right)$,
which, if $S_i$ is non-decreasing (which is almost always
the case), is no higher than $O(N \times M)$.
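The check can be sketched in Python as below. It follows the idea of Algorithm 2 (compare each step with the limit at the end of the step), with three additions that are assumptions of this sketch rather than part of the algorithm as printed: the last step is also checked, at $z_i$; empty steps and steps starting after $z_i$ are skipped; and a small relative tolerance absorbs floating-point rounding.

```python
import math


def is_schedulable(S, w, f_max, D, rel_tol=1e-9):
    """Check S_i(t) >= w_i / (z_{i+1} - t) for start times t in [0, z_i].

    S[i] is a list of (start_time, frequency) points sorted by start time.
    """
    z_next = D                                   # z_{i+1}
    for i in range(len(w) - 1, -1, -1):
        z_i = z_next - w[i] / f_max
        if z_i < -rel_tol * D:
            return False                         # T_i cannot start in time
        z_i = max(z_i, 0.0)
        points = S[i]
        for k, (start, f) in enumerate(points):
            nxt = points[k + 1][0] if k + 1 < len(points) else math.inf
            if nxt <= start or start > z_i:
                continue                         # empty or irrelevant step
            end = min(nxt, z_i)                  # the limit is largest here
            if f < w[i] / (z_next - end) * (1.0 - rel_tol):
                return False
        z_next = z_i
    return True


# Hypothetical two-task system, frequencies 500 MHz and 1 GHz, D = 10 ms.
w, f_max, D = [2e6, 3e6], 1e9, 0.01
limit = [[(0.0, 5e8), (0.003, 1e9)], [(0.0, 5e8), (0.004, 1e9)]]
print(is_schedulable(limit, w, f_max, D))
print(is_schedulable([[(0.0, 5e8)], [(0.0, 5e8)]], w, f_max, D))
```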
This test can be used offline to check the schedula-
bility of some method or heuristic, but can also be per-
formed as soon as some parameter change has been de-
tected. For instance, if the system observes that a task
Ti used more cycles than its (expected) WCEC wi, the
test could be performed with the new WCEC in order
to see if the current set of S functions can still be used.
Notice that we only need to check tasks between 1 and
i, because the schedulability of tasks in {i + 1, . . . , N}
does not depend upon wi. See Section 6 about future
work for more details.
3.5 Using Schedulability Condition to Discretize Continuous Methods
Figure 4 Two different ways of discretizing a continu-
ous strategy: Discr. strat. 1 rounds up to the first avail-
able frequency. Discr. strat. 2 (our proposal) uses the
closest available frequency, taking the limit into ac-
count. Limit is the strategy described by Algorithm 1.
There are mainly two ways of building a set of S-
functions for a given system. The first method consists
in considering the problem with continuous available
frequencies, and by some heuristic, adapting this re-
sult for a discrete speeds system. The second method
consists in taking into account from the beginning that
there are only a limited number of available speeds.
The second family of methods has the advantage of be-
ing usually more efficient in terms of energy, but the
disadvantage of being much more complex, requiring
a non negligible amount of computations or memory.
This is not problematic if the system is very stable and
its parameters do not change often, but as soon as some
on-line adaptation is eventually required, heavy and
complex computations cannot be performed anymore.
In the first family, the heuristic usually used consists
in computing a continuous function $S^c_i(t)$ which
is built to be schedulable, and obtaining a
discrete function by using for any $t$ the smallest
frequency above $S^c_i(t)$, or $S_i(t) = \lceil S^c_i(t) \rceil$. However, this
strategy is often pessimistic. So far, there was
no other method to ensure the schedulability.
This is no longer the case, because the schedulability
condition provided in this paper can be used instead.
The main idea is, instead of using the smallest
frequency above $S^c_i(t)$, to use the closest frequency to
$S^c_i(t)$, and, if needed, to round this up with the
schedulability limit $L_i(t)$. In other words, we will use
\[ S_i(t) = \max \{ \lceil S^c_i(t) \rfloor, \lceil L_i(t) \rceil \}, \]
where $\lceil x \rfloor$ denotes the available frequency closest to $x$.
The advantage of this technique is that we have more
chance to be closer to the continuous function (which
is often optimal in the case of continuous CPU). How-
ever, both techniques (ceiling and closest frequency)
are approximations, and none of them is guaranteed
to be better than the other one in any case. As we will
show in the experimental section, there are systems in
which the classical discretization is better, but there are
also many cases where our discretization is better.
Algorithm 3 shows how step functions can be obtained.
For each task, computing its function is in
$O(M \times A)$, where $A$ is the complexity of computing
$(S^c_i)^{-1}(f)$. Depending on the kind of continuous method
we use, $A$ can range between 1 (if $(S^c_i)^{-1}(f)$ has a
constant-time closed form) and $\log(D/\varepsilon) \times B$, with a binary
search, where $\varepsilon$ is the desired precision, and $B$ the
complexity of computing $S^c_i(t)$.
Actually, computing the closest frequency
amongst $\{f_1, f_2, \ldots, f_M\}$ roughly boils down to
computing the round-up frequency amongst the set
$\left\{ \frac{f_1 + f_2}{2}, \frac{f_2 + f_3}{2}, \ldots, \frac{f_{M-1} + f_M}{2} \right\}$. Then, the range
corresponding to $\frac{f_1 + f_2}{2}$ is mapped onto $f_2$, etc. In
Algorithm 3, if we simply use $f_{j-1}$ instead of $f$, we
obtain the classical round-up operation.
Algorithm 3 Algorithm computing the closest step function
to $S^c_i(\cdot)$, respecting the schedulability limit
$L_i(\cdot)$. $(a)^+$ stands for $\max\{0, a\}$.
    foreach $i \in \{N, \ldots, 1\}$ do
        $S_i \overset{+}{\leftarrow} (0, f_1)$
        foreach $j \in \{2, \ldots, M\}$ do
            $f \leftarrow (f_{j-1} + f_j)/2$
            $t \leftarrow \min\{ (S^c_i)^{-1}(f),\ L_i^{-1}(f_{j-1}) \}$
            $S_i \overset{+}{\leftarrow} ((t)^+, f_j)$
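Algorithm 3 can be sketched in Python as follows. The continuous strategy is passed as a callback `sc_inv(i, f)` returning the time at which $S^c_i$ reaches frequency $f$; this callback, the 0-based task index and the example strategy used below are hypothetical and only serve the illustration. The sketch assumes that $S^c_i$ is non-decreasing so that its inverse is well defined.

```python
def discretize_closest(w, freqs, D, sc_inv):
    """Closest-frequency discretization bounded by the schedulability limit.

    w: worst-case cycles per task; freqs: increasing frequencies f_1..f_M;
    sc_inv(i, f): time at which the continuous strategy of task i (0-based)
    reaches frequency f. Returns a list of (start_time, frequency) points
    per task.
    """
    f_max = freqs[-1]
    S = [None] * len(w)
    z = D                                        # z_{i+1}
    for i in range(len(w) - 1, -1, -1):
        points = [(0.0, freqs[0])]
        for j in range(1, len(freqs)):
            mid = (freqs[j - 1] + freqs[j]) / 2  # boundary for "closest"
            t_closest = sc_inv(i, mid)
            t_limit = z - w[i] / freqs[j - 1]    # L_i^{-1}(f_{j-1})
            points.append((max(0.0, min(t_closest, t_limit)), freqs[j]))
        S[i] = points
        z -= w[i] / f_max
    return S


# Hypothetical continuous strategy: S^c_i(t) = 2 * (w_i + ... + w_N) / (D - t).
w, freqs, D = [2e6, 3e6], [5e8, 1e9], 0.01
S = discretize_closest(w, freqs, D, lambda i, f: D - 2 * sum(w[i:]) / f)
print(S)
```

Replacing `mid` by `freqs[j - 1]` in the call to `sc_inv` gives the classical round-up discretization.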
4 Model Extensions
4.1 Frequency Changes Overhead
Our model makes it easy to take the time penalty of
frequency changes into account. Let $P_T(f_i, f_j)$ be the time
penalty of changing from $f_i$ to $f_j$. This means that once
the frequency change is requested (usually, a special
register has been set to some predefined value), the
processor is “idle” during $P_T(f_i, f_j)$ units of time before the
next instruction is run. We assume that the worst time
overhead occurs when the CPU goes from $f_1$ to $f_M$. We
denote it by $P_T^M = \max_{i,j} P_T(f_i, f_j) = P_T(f_1, f_M)$.
Notice that this model is rather pessimistic: on modern
DVS CPUs, the processor does not stop after a
change request, but still runs at the old frequency for a
few cycles before the change becomes effective. However,
even if the processor never stops, there is still
a penalty, but the time penalty is negative when the
speed goes down (because the job will be finished
sooner than if the frequency change had been performed
before it started). Then, as a first approximation,
we could consider that negative penalties compensate
for positive penalties. But this approximation
does not hold for energy penalties, because all of them
are obviously positive.
We also want to take the switching time before jobs
into account, even if there is no frequency change (we
assume that the job switching time is already included
in $P_T$). Let $S_T(f_i)$ be the switching time when
the frequency is $f_i$ and is not changed between two
consecutive jobs. Again, let $S_T^M$ denote $S_T(f_M)$. Usually,
we have $S_T(f_i) < S_T(f_j)$ if $f_i > f_j$. We make here
the simplifying hypothesis that the switching time is
job independent, which is an approximation since this
time usually depends upon the amount of memory used.
However, for our purpose, we only need to consider
an upper bound on this time.
As before, we know that $T_N$ must start no later than
$D - \frac{w_N}{f_M}$. If $T_N$ starts at this limit (or even before), the
selected frequency must be $f_M$. Then we could have
two situations:
• Best case: the previous task $T_{N-1}$ was already
running at $f_M$. Then $T_{N-1}$ needs to finish before
the start limit for $T_N$, minus the switching time,
that is, $D - \frac{w_N}{f_M} - S_T^M$;
• Worst case: the previous task $T_{N-1}$ was not
running at $f_M$, so we need to change the frequency.
In the worst case, the time penalty will be $P_T^M$.
$T_{N-1}$ then needs to finish no later than $D - \frac{w_N}{f_M} -
P_T^M$.
The first limit is then a necessary condition, and the
second a sufficient condition, to ensure the
schedulability of $T_N$. Similarly, we can see that $T_i$ must
start before $z^n_i$ to ensure the schedulability of itself
and any subsequent task (necessary condition), and
this schedulability is ensured (sufficient condition) if
$T_i$ starts before $z^s_i$, where $z^n_i$ and $z^s_i$ are defined as:
\[ z^n_i = D - \frac{1}{f_M} \sum_{k=i}^{N} w_k - (N - i + 1) S_T^M = z_i - (N - i + 1) S_T^M \]
and
\[ z^s_i = D - \frac{1}{f_M} \sum_{k=i}^{N} w_k - (N - i + 1) P_T^M = z_i - (N - i + 1) P_T^M. \]
We can then provide two schedulability conditions:
• Necessary condition: $S_i(t) \ge \frac{w_i}{z^n_{i+1} - t}$;
• Sufficient condition: $S_i(t) \ge \frac{w_i}{z^s_{i+1} - t}$.
Algorithm 3 can easily be adapted using those conditions.
We then use $L_i(t) = \frac{w_i}{z^s_{i+1} - t}$.
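The two families of start limits can be computed together in one backward pass. The sketch below is illustrative: the function name and the argument names `s_max` (for the switching time at $f_M$) and `p_max` (for the worst frequency change penalty) are assumptions, as are the numerical values.

```python
def start_limits_with_overhead(w, f_max, D, s_max, p_max):
    """Return (z_nec, z_suf), the necessary and sufficient start limits.

    z_nec[i] = z_i - (N - i + 1) * s_max and
    z_suf[i] = z_i - (N - i + 1) * p_max, written here for 0-based lists.
    """
    n = len(w)
    z_nec = [0.0] * n
    z_suf = [0.0] * n
    work = 0.0                                   # (w_i + ... + w_N) / f_M
    for i in range(n - 1, -1, -1):
        work += w[i] / f_max
        count = n - i                            # N - i + 1 for 1-based i + 1
        z_nec[i] = D - work - count * s_max
        z_suf[i] = D - work - count * p_max
    return z_nec, z_suf


# Hypothetical values: 10 us switching time, 100 us worst change penalty.
print(start_limits_with_overhead([2e6, 3e6], 1e9, 0.01, 1e-5, 1e-4))
```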
4.2 Soft Deadlines
If we want to be a little more flexible, we could
consider soft deadlines and adapt our schedulability
condition accordingly. The main idea is not to
consider the WCEC, but to use some percentile: if $\kappa_i(\varepsilon)$
is such that $P[c_i < \kappa_i(\varepsilon)] \ge 1 - \varepsilon$, where $c_i$ is the actual
number of cycles of $T_i$, we can use $\kappa_i(\varepsilon)$ in place of the
worst-case number of execution cycles.
However, it seems to be almost impossible to compute
analytically the probability of missing a deadline
with this model. It would boil down to computing
$P[E_1 + E_2 + E_3 + \cdots + E_N > D]$, where $E_i$ represents the
execution time of jobs of task $T_i$. $E_i$ then depends upon
the job length distribution, but also upon the speed at
which $T_i$ is run, which depends upon the time at which
$T_{i-1}$ ends, which depends upon the time $T_{i-2}$ ended,
and so on. As the $E_i$'s are not independent, it seems
that we cannot use the central limit theorem.
If we accept an approximation of the failure probability,
we could proceed in the following way. Let $C_i$ be
the random variable giving the number of cycles of
$T_i$, and $C = \sum_i C_i$. Let $W = \sum_i w_i$ be the maximal
value of $C$ (the frame worst-case execution cycles). Let
$C_\varepsilon = \min \{ c : P[C < c] > 1 - \varepsilon \}$.
We assume that using the deadline $D \frac{W}{C_\varepsilon}$ will allow
deadlines to be respected with a probability close to $1 - \varepsilon$.
Those propositions are only heuristics, and would
require more work, both analytic and experimental.
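As a rough illustration of this heuristic, the sketch below computes $C_\varepsilon$ and the relaxed deadline $D \frac{W}{C_\varepsilon}$ from per-task distributions. It rests on assumptions that the text does not state: cycle counts are integers (or bin indices), the $C_i$ are treated as independent so that their distributions can be convolved, and all function names are hypothetical.

```python
def convolve(p, q):
    """Distribution of the sum of two independent integer random variables."""
    out = {}
    for a, pa in p.items():
        for b, pb in q.items():
            out[a + b] = out.get(a + b, 0.0) + pa * pb
    return out


def frame_quantile(dists, eps):
    """Smallest integer c such that P[C < c] > 1 - eps, with C = sum of C_i."""
    total = {0: 1.0}
    for d in dists:
        total = convolve(total, d)
    cumulative = 0.0
    for c in sorted(total):
        cumulative += total[c]          # now equals P[C <= c] = P[C < c + 1]
        if cumulative > 1.0 - eps:
            return c + 1
    raise ValueError("no such c (eps too small for floating-point sums)")


def relaxed_deadline(dists, D, eps):
    """Deadline D * W / C_eps, with W the sum of per-task maxima."""
    W = sum(max(d) for d in dists)
    return D * W / frame_quantile(dists, eps)


# Two hypothetical tasks, each using 1 or 2 bins with equal probability.
dists = [{1: 0.5, 2: 0.5}, {1: 0.5, 2: 0.5}]
print(frame_quantile(dists, 0.8), relaxed_deadline(dists, 0.03, 0.8))
```

Because the definition uses a strict inequality, this integer version can return $W + 1$ for small $\varepsilon$, which is one more reason to treat it as a heuristic only.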
5 Experimental Results
In order to evaluate the advantage of using a “closest”
approach instead of an “upper bound” approach, we
applied it to two methods. The first one is described
by Mossé et al. in [7], and is called DPM-S (Dynamic
Power Management-Statistical), and the second one,
described by Xu, Melhem and Mossé [10], is
called PITDVS (Practical Inter-Task DVS).
5.1 DPM-S
The method DPM-S described in [7] bets that the next
jobs will not need more cycles than their average, and
computes the speed under this assumption when
a job starts. Of course, the schedulability limit is also
taken into account. In their paper, the authors consider
that they can use any (normalized) frequency between
0 and 1. In order to apply this method to a system with
a limited number of frequencies, we can either round
them up, or use our “closest” approach. They do not take
frequency change overheads into account, but as
claimed above, those overheads are
easy to integrate.
We now compute the following two step functions using
Algorithm 3, adapted to take frequency change overhead
into account (cf. Section 4.1), where $avg_k$ stands for the
average number of cycles of $T_k$:
• DPM-Sup: we replace $(S^c_i)^{-1}$ by
\[ D - \frac{\sum_{k=i}^{N} avg_k}{f_{j-1}}; \qquad (1) \]
• DPM-Sclosest: we replace $(S^c_i)^{-1}$ by
\[ D - \frac{\sum_{k=i}^{N} avg_k}{f}. \qquad (2) \]
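To make the difference between Eq. (1) and Eq. (2) concrete, the sketch below computes the candidate switch times of one task for both variants, before they are combined with the schedulability limit and clamped at 0 in Algorithm 3. It reads the sums in Eq. (1) and (2) as running over the average cycle counts of the remaining tasks; the function name, the 0-based task index and the numbers are illustrative assumptions.

```python
def dpm_s_switch_times(avg, freqs, D, i, closest=True):
    """Candidate switch times for task i (0-based), one per frequency f_2..f_M.

    closest=False uses f_{j-1} as in Eq. (1) (round up); closest=True uses
    the midpoint (f_{j-1} + f_j) / 2 as in Eq. (2).
    """
    remaining = sum(avg[i:])            # average cycles of tasks i..N
    times = []
    for j in range(1, len(freqs)):
        if closest:
            f = (freqs[j - 1] + freqs[j]) / 2
        else:
            f = freqs[j - 1]
        times.append(D - remaining / f)
    return times


avg, freqs, D = [1e6, 1.5e6], [5e8, 1e9], 0.01   # hypothetical values
print(dpm_s_switch_times(avg, freqs, D, 0, closest=False))
print(dpm_s_switch_times(avg, freqs, D, 0, closest=True))
```

In this example the closest variant switches to the higher frequency later than the round-up variant, so the task keeps the lower frequency for a wider range of start times.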
5.2 PITDVS
The second method we consider, by Xu, Melhem
and Mossé [10], is called PITDVS (Practical Inter-Task
DVS), and aims at patching OITDVS (Optimal Inter-Task
DVS [9]), an optimal method for ideal processors
(with a continuous range of available frequencies).
The authors apply several patches in order to make this
optimal method usable on realistic processors. They
start by taking speed change overhead into account,
then they introduce maximal and minimal speeds (OITDVS
assumes speeds from 0 to infinity), and finally,
they round the S-function up to the smallest available
frequency. It is in this last patch that we apply our
technique. Using the $\beta_i$ value described in [10]
(representing the aggressiveness level), we compute the
step functions in the following way: in Algorithm 3
adapted to take frequency change overhead into account
(cf. Section 4.1),
• PITDVSup (in [10]): we replace $(S^c_i)^{-1}$ by
\[ D - P_T \times (N - i) - \frac{w_i}{\beta_i f_{j-1}}; \qquad (3) \]
• PITDVSclosest (our adaptation): we replace $(S^c_i)^{-1}$ by
\[ D - P_T \times (N - i) - \frac{w_i}{\beta_i f}. \qquad (4) \]
In the following, we also run simulations using L
(Limit) to choose the frequency. Our aim was not to
show how efficient or how bad this technique is, but
more to show that often, we observe rather counterin-
tuitive results.
5.3 Workloads and Simulation Architec-
ture
For the simulations we present below, we use two
different sets of workloads. The first one is pretty simple,
and quite theoretical. We use a set of 12 tasks, each of
them having lengths uniformly distributed between
miscellaneous bounds, different from each other. For
the second set of simulations, we used several
workloads coming from H.264 video decoding, which
is used in our lab for some other experiments on a TI
DaVinci DM6446 DVS processor. Figure 9 shows
the distribution of the 8 video clips we used, each with
several thousands of frames.
We present here experimental results for two
different kinds of DVS processors (see for instance [8] for
details about their characteristics): an Intel XScale processor
(with frequencies 150, 400, 600, 800 and 1000MHz), and
a PowerPC 405LP (with frequencies 33, 100, 266 and
333MHz). We took frequency change overhead into
account, but its contribution was usually negligible in
all of the simulations we performed (lower than 0.1% in
most cases). As a third CPU, we used the characteristics
of the XScale, but we disabled one of its available
frequencies (400MHz in the plots we show here), in order
to highlight the advantage of using our approximation
over the round-up approximation when the number of
available frequencies is quite low.
5.4 Simulations
We performed a large number of simulations in order
to compare the energy performance of “round up” and
“round to closest”. We compare several processor
characteristics, and several job characteristics. We use both
theoretical models and realistic values extracted from
production systems.
Figure 5 Energy consumption relative to DPM-Sclosest, for a set of 12 tasks with uniformly distributed lengths.
[Figure: three panels (PowerPC, XScale, XScale (no 400MHz)); x-axis: Frame length (Deadline); y-axis: (Relative) Energy; curves: DPM-Sclosest, DPM-Sup, Limit.]
Figure 6 Energy consumption relative to PITDVSclosest, for a set of 12 tasks with uniformly distributed lengths.
[Figure: three panels (PowerPC, XScale, XScale (no 400MHz)); x-axis: Frame length (Deadline); y-axis: (Relative) Energy; curves: PITDVSclosest, PITDVSup, Limit.]
For the figures we present here, we simulated
the same system with different strategies computed
with variations of Algorithm 3, amongst DPM-Sclosest
(Eq. (2)), DPM-Sup (Eq. (1)), PITDVSclosest (Eq. (4)),
PITDVSup (Eq. (3)) and Limit (Algorithm 1), computed
the energy consumption, and presented the ratio of
this energy to that of PITDVSclosest or DPM-Sclosest. We then
simulated the same system for various deadlines,
going from the deadline allowing every task to run at
the lowest frequency ($D = \frac{1}{f_1} \sum_{i=1}^{N} w_i$) to the
smallest deadline allowing every task to run at the highest
frequency ($D = \frac{1}{f_M} \sum_{i=1}^{N} w_i$). We even used smaller
deadlines, because this limit represents a frame where
every task needs its WCEC at the same time, which has a
very tiny probability of occurring. We can consider that
decreasing the deadline boils down to increasing the load:
the smaller the deadline, the higher the average
frequency. And quite intuitively, for small and large
deadlines (or frame lengths), we do not see any difference
between strategies, because they all always use either the
lowest (large deadline) or the highest (small deadline)
frequency.
A first observation is that in many cases, the S-function
of PITDVSup is already almost equal to Limit. As a
consequence, we cannot observe any difference between
PITDVSup and PITDVSclosest. This can be seen for instance in
Figure 6, right plot: for deadlines between 0.1 and 0.06,
there is no visible difference between PITDVSclosest and
Limit.
In the first set of simulations (Figures 5 and 6), we
used 12 tasks, each with a uniformly distributed number of
cycles and miscellaneous parameters. On the PowerPC
processor, the relative performance varies widely.
Depending on the load (or the frame length), PITDVSclosest
can gain around 30% compared to PITDVSup, or lose almost
20%. The comparison between DPM-Sclosest and DPM-Sup is
similar, but with smaller values.
We also observe very abrupt and surprising variations,
such as in Figure 6, middle and right, for Limit, around
0.03. A closer look at these variations shows that they
usually occur when the frequency of T1 changes. Indeed, as
T1 always starts at time 0, its speed does not really depend
upon S1(t), but only upon S1(0). So when D varies, S1(0)
suddenly jumps from one frequency to another, and a very
slight variation of D can have a big impact on each frame.
Such slight variations do not have the same impact for
other tasks, because of the stochastic nature of task
lengths. For instance, if we slightly change Si (i ≠ 1), it
will only impact a few task speeds. But slight changes in S0
have either no impact at all, or an impact on every task
in every frame.
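The jump in the frequency of T1 can be illustrated with a small sketch. It assumes, purely for illustration, that the continuous speed at time 0 is the total number of cycles divided by D and that it is rounded up to the next available frequency. This stand-in is not the S-function of DPM-S or PITDVS, and all names and values are hypothetical.

```python
import bisect


def round_up(speed, freqs):
    """Smallest available frequency >= speed (freqs sorted ascending).

    Falls back to the highest frequency if speed exceeds all of them.
    """
    i = bisect.bisect_left(freqs, speed)
    return freqs[min(i, len(freqs) - 1)]


def first_task_frequency(deadline, total_cycles, freqs):
    # Hypothetical stand-in for S1(0): the constant speed that finishes
    # total_cycles exactly at the deadline, rounded up.
    return round_up(total_cycles / deadline, freqs)


freqs = [100e6, 200e6, 400e6]  # Hz, hypothetical
total_cycles = 1e7             # cycles, hypothetical

# Deadlines 0.0501 s and 0.0499 s differ by about 0.4%, yet the
# selected frequency of the first task doubles.
just_above = first_task_frequency(0.0501, total_cycles, freqs)
just_below = first_task_frequency(0.0499, total_cycles, freqs)
```

Because the first task starts at time 0 in every frame, such a switch affects every frame, which is consistent with the abrupt changes in the curves.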
From those first figures, we certainly cannot claim
that a “closest” approach is always better than an
“upper bound” approach. But those simulations highlight
that there are situations where one approach is better
than the other, and situations where the opposite holds.
System designers should therefore pay attention to the way
they round continuous frequencies. With a very small
additional effort, we can often do better than simply
rounding up the original scheduling function.
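The difference between the two rounding directions can be sketched as below. This shows only the raw rounding step, with hypothetical names and values. Naive nearest rounding may select a frequency below the continuous speed, which is why a "closest" strategy has to be combined with a schedulability check, whereas rounding up never goes below the requested speed as long as it does not exceed the highest frequency.

```python
def round_up(speed, freqs):
    """Smallest available frequency >= speed, or the highest one."""
    candidates = [f for f in freqs if f >= speed]
    return min(candidates) if candidates else max(freqs)


def round_closest(speed, freqs):
    """Available frequency nearest to speed (ties go to the lower one)."""
    return min(sorted(freqs), key=lambda f: abs(f - speed))


freqs = [100e6, 200e6, 400e6]  # Hz, hypothetical
speed = 240e6                  # continuous speed, hypothetical

up = round_up(speed, freqs)            # 400 MHz: safe, but far from 240 MHz
closest = round_closest(speed, freqs)  # 200 MHz: closer, but below 240 MHz
needs_check = closest < speed          # True: schedulability must be verified
```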
For the second set of simulations (using real video
workloads), in Figures 7 and 8, we observe the same kind of
differences as in the previous experiments: depending on
the configuration, one rounding method is better than the
other. With the PowerPC configuration, PITDVSclosest is
better than PITDVSup, but DPM-Sup seems to be better than
DPM-Sclosest. However, with the XScale processor where we
disabled one frequency, both “closest” methods are better
than the “up” methods. Note that we observe the same kind
of benefit when disabling a frequency other than 400MHz.
Figure 7 Energy consumption relative to DPM-Sclosest, for a set of 8 tasks distributed as shown in Figure 9.
[Plots: three panels (PowerPC, XScale, XScale (no 400MHz)); x-axis: frame length (deadline); y-axis: relative energy; curves: DPM-Sclosest, DPM-Sup, Limit.]
Figure 8 Energy consumption relative to PITDVSclosest, for a set of 8 tasks distributed as shown in Figure 9.
[Plots: three panels (PowerPC, XScale, XScale (no 400MHz)); x-axis: frame length (deadline); y-axis: relative energy; curves: PITDVSclosest, PITDVSup, Limit.]
From the many experiments we performed, it seems
that our approach is especially interesting when the
number of available frequencies is limited, which is
not surprising. Indeed, the fewer frequencies are
available, the further the system is from the continuous
model. As the two strategies we adapt were originally
designed for a continuous model, and as our adaptation
attempts to stay closer to the original strategy than the
classical adaptation does, such behavior was to be expected.
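One rough way to quantify how far a frequency set is from the continuous model is the largest ratio between two consecutive frequencies. A continuous speed just above the lower one is rounded up to the higher one. The sketch below uses an assumed XScale-like frequency set, not values taken from this section, and the metric is only an illustration, not one used in the simulations.

```python
def worst_round_up_ratio(freqs):
    """Largest ratio between two consecutive available frequencies.

    A continuous speed just above one frequency is rounded up to the
    next one, so this ratio bounds the overshoot of rounding up for
    speeds between the lowest and the highest frequency.
    """
    ordered = sorted(freqs)
    if len(ordered) < 2:
        raise ValueError("need at least two frequencies")
    return max(high / low for low, high in zip(ordered, ordered[1:]))


# Assumed XScale-like frequency set in MHz (hypothetical values).
xscale = [150, 400, 600, 800, 1000]
xscale_no_400 = [f for f in xscale if f != 400]

full_ratio = worst_round_up_ratio(xscale)            # 400 / 150
reduced_ratio = worst_round_up_ratio(xscale_no_400)  # 600 / 150
```

Under these assumed values, removing the 400MHz level widens the largest gap, which leaves more room for a strategy that stays closer to the continuous speed than plain rounding up.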
We have also observed that “smooth” systems, such as
the one with uniform distributions (we also simulated
other distributions, such as normal or bimodal normal
distributions), do not give smoother curves than the
realistic workload, even though several of the realistic
distributions contain very chaotic data. The irregular
behavior of our curves does not seem to be related to
irregular data, but rather to the fact that, as already
mentioned, slight variations in S0 can have a big impact
on the average energy. In this paper, we do not present a
huge number of simulations, because we do not claim that
our approach is always better: what we present should be
enough to persuade system designers to look more closely
at the way they handle discretization.
6 Conclusions and Future Work
The aim of our work was twofold. First, we presented
a simple schedulability condition for frame-based
low-power stochastic real-time systems. Thanks to this
condition, we are able to quickly check whether a given
scheduling function guarantees the schedulability of the
system, even when frequency change overheads are taken
into account. This test can be used either off-line, to
check that a scheduling function is schedulable, or
on-line, after some parameter changes, to check whether
the functions can still be used.
The second contribution of this paper was to use
this schedulability condition in order to improve the
way a strategy developed for systems with continuous
speeds can be adapted to systems with a discrete set
of available speeds. We showed that our approach is not
always better than the classical one, which consists in
rounding up to the first available frequency, but that it
can, in some circumstances, give a gain of almost 40% in
the simulations we presented.
Our future work includes several aspects. First, by
running many more simulations, we would like to
identify more precisely when our approach is better
than the classical one. This would allow system designers
to choose the approach to use without running simulations
or performing experiments on their system.
Another aspect we would like to consider is how the
schedulability test we provide can improve the
robustness of a system. In particular, if we observe that
a job has required more than its (expected) worst-case
number of cycles, how can we temporarily adapt our system
in order to improve its schedulability, before we can
compute the new set of functions using those new
parameters?
Figure 9 Distribution of the number of cycles needed to decode different kinds of video, ranging from news
streaming to complex 3D animations. The x-axis is the number of cycles, and the y-axis the probability.
[Plots: eight panels, one per video, with x-axis ranges: 2DAnima (0 to 4.2e+07), 3DAnima (0 to 5.5e+07), Action (0 to 6.5e+07), NewsConf (0 to 3.5e+07), News (0 to 4.0e+07), onepiece (0 to 4.4e+07), Shrek3600 (0 to 4.2e+07), Trnas (0 to 4.3e+07).]
References
[1] AYDIN, H., MEJI´A-ALVAREZ, P., MOSSE´, D., AND
MELHEM, R. Dynamic and aggressive scheduling
techniques for power-aware real-time systems. In
RTSS ’01: Proceedings of the 22nd IEEE Real-Time
Systems Symposium (RTSS’01) (Washington, DC,
USA, 2001), IEEE Computer Society, p. 95.
[2] CHEN, J.-J., YANG, C.-Y., KUO, T.-W., AND
SHIH, C.-S. Energy-efficient real-time task
scheduling in multiprocessor dvs systems. In
ASP-DAC ’07: Proceedings of the 2007 conference on
Asia South Pacific design automation (Washington,
DC, USA, 2007), IEEE Computer Society, pp. 342–
349.
[3] GRUIAN, F. Hard real-time scheduling for low-
energy using stochastic data and dvs processors.
In ISLPED ’01: Proceedings of the 2001 interna-
tional symposium on Low power electronics and design
(New York, NY, USA, 2001), ACM, pp. 46–51.
[4] GRUIAN, F. On energy reduction in hard real-
time systems containing tasks with stochastic ex-
ecution times. In Proceedings of Workshop on Power
Management for Real-Time and Embedded Systems
(2001), pp. 11–16.
[5] LORCH, J. R., AND SMITH, A. J. Improving dy-
namic voltage scaling algorithms with pace. In
SIGMETRICS ’01: Proceedings of the 2001 ACM
SIGMETRICS international conference on Measure-
ment and modeling of computer systems (New York,
NY, USA, 2001), ACM, pp. 50–61.
[6] LORCH, J. R., AND SMITH, A. J. Pace: A new
approach to dynamic voltage scaling. IEEE Trans-
actions on Computers 53, 7 (2004), 856–869.
[7] MOSSE, D., AYDIN, H., CHILDERS, B., AND
MELHEM, R. Compiler-assisted dynamic power-
aware scheduling for real-time applications. In
COLP’00: Proceedings of the Workshop on Compilers
and Operating Systems for Low-Power (2000).
[8] XU, R., MELHEM, R., AND MOSSE´, D. A unified
practical approach to stochastic dvs scheduling.
In EMSOFT ’07: Proceedings of the 7th ACM & IEEE
international conference on Embedded software (New
York, NY, USA, 2007), ACM, pp. 37–46.
[9] XU, R., MOSSE´, D., AND MELHEM, R. Minimiz-
ing expected energy in real-time embedded sys-
tems. In EMSOFT ’05: Proceedings of the 5th ACM
international conference on Embedded software (New
York, NY, USA, 2005), ACM, pp. 251–254.
[10] XU, R., MOSSE´, D., AND MELHEM, R. Mini-
mizing expected energy consumption in real-time
systems through dynamic voltage scaling. ACM
Trans. Comput. Syst. 25, 4 (2007), 9.
[11] XU, R., XI, C., MELHEM, R., AND MOSSE´, D.
Practical pace for embedded systems. In EMSOFT
’04: Proceedings of the 4th ACM international con-
ference on Embedded software (New York, NY, USA,
2004), ACM, pp. 54–63.
