In this paper, we first provide a schedulability condition for frame-based stochastic low-power real-time systems. We also provide algorithms that use this condition to check the schedulability of a given strategy, and explain how these algorithms can be used to improve robustness to parameter changes.
Introduction
Because more and more devices need improved autonomy or better mobility, or simply because climate change forces everyone to take energy considerations into account, there are fewer and fewer fields in computer science and electronics where power consumption is not considered. Embedded systems now face crucial energy problems, and soon it will no longer be possible to build such systems without respecting power constraints, in the same way that engineers today have to take into account possible failures or parameter variations. Amongst the huge family of systems where real-time constraints ought to be considered, we focus on those where the length of tasks is not known in advance, and for which we only know probabilistic properties.
In the embedded system industry, designers often prefer systems with simple characteristics, because they are easier to design, to analyze and to manage. For instance, there exist many multi-task systems where all tasks share the same deadline, with synchronized arrivals. Those systems are called frame-based systems.
The problem of energy reduction is usually addressed by changing the CPU frequency. The main idea is that a task executed at a frequency f consumes roughly twice as much energy as the same task at frequency f/2, or, in other words, one unit of time at frequency f consumes approximately four times as much energy as one unit of time at frequency f/2. The relationship between frequency and power consumption does not need to be exactly quadratic; it only needs to be convex to allow energy savings.
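The arithmetic behind these two ratios can be checked with a minimal sketch. It assumes, purely for illustration, that power is exactly quadratic in frequency, P(f) = k·f²; the constant `k` and the function names are hypothetical and do not come from the paper.

```python
def power(f, k=1.0):
    """Power drawn per unit of time at frequency f, assuming P(f) = k * f**2."""
    return k * f ** 2


def task_energy(cycles, f, k=1.0):
    """Energy needed to execute `cycles` cycles at constant frequency f."""
    duration = cycles / f           # time spent running at frequency f
    return power(f, k) * duration   # equals k * cycles * f


f = 1000.0
ratio_per_time = power(f) / power(f / 2)                          # energy per unit of time
ratio_per_task = task_energy(1e6, f) / task_energy(1e6, f / 2)    # energy per task
```

Under this assumption, halving the frequency divides the power by four but doubles the execution time, which is why the energy of a whole task is only halved.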
Those frequency changes can be performed at several moments: only between two tasks (inter-task systems), at any time during the execution of a task (intra-task systems), or both (hybrid systems). Intra-task and hybrid systems usually save more energy than inter-task systems, but they are also more complex. For instance, they require being able to interrupt a job in the middle of its execution, change the frequency, and resume the job, which requires advanced OS mechanisms that are not always available on embedded systems.
Inter-task frequency changes can even be performed in systems where neither the OS nor the tasks are really aware of the model, as long as some "scheduler" is allowed to change the frequency. This scheduler can be a simple program (or even a script) launching a process, waiting for its termination, changing the frequency, launching the next one, and so on.
In this paper, we are interested in frame-based stochastic low-power real-time systems, with inter-task frequency changes.
Our contribution is twofold. First, we propose a schedulability test that makes it easy to determine whether a frequency selection allows every task in the system to meet its deadline. This is done by first giving a general model for scheduling functions in the system we consider, and then by providing some conditions on those functions. As a second contribution, we provide a general method for adapting a strategy designed for a continuous set of speeds (or frequencies) to a discrete set of speeds. This can be done more efficiently than with the classical approach by using the schedulability condition we give in the first part. Apart from this alternative way of adapting continuous strategies, we will show how this schedulability test can be used to improve robustness to parameter variations.
We do not aim at providing a new scheduling strategy; rather, given a continuous strategy, we want to improve the way it is used on systems with discrete speeds.
Related Work
Low-power real-time systems with stochastic or unknown durations have been studied for several years. The problem was first considered in systems with only one task, or systems in which each task gets a fixed amount of time. Gruian [2, 3] and Lorch and Smith [4, 5] have both shown that when intra-task frequency changes are available, the most efficient way to save energy is to progressively increase the speed. First developed for ideal processors with a continuous range of speeds, more practical solutions using a discrete set of frequencies and taking into account speed change overhead have since been proposed, for instance in [10] or [9].
For inter-task frequency changes, some work has already been undertaken. In [6], the authors consider a model similar to the one we consider here, even if it is presented differently. They consider systems with a single periodic task, which can however be split into several independent sub-tasks, allowing the frequency to be changed between sub-tasks. This is roughly equivalent to considering several independent tasks with inter-task speed changes. The authors present several dynamic power management techniques: Proportional, Greedy and Statistical. They do not really take into account the distribution of the number of cycles, but only its maximum, and its average for Statistical. Depending on the strategy, a task gives its slack time (the difference between the worst case and the actual number of used cycles) either to the next task in the frame, or to all of them.
In [1], the authors attempt to allow the manager to tune this aggressiveness level, while in [9], they propose to automatically adapt this aggressiveness using the distribution of the number of cycles of each task. They first propose a technique for an ideal continuous DVS processor (OITDVS), and then adapt it to take into account the discrete number of speeds and the speed change overhead (PITDVS). The same authors have also proposed a strategy that takes the number of available speeds into account from the beginning, instead of patching algorithms developed for continuous speed processors [7].
Outline
Our paper is organized as follows. We first present the mathematical model of the real-time systems we consider in Section 2. We then present our first contribution in Section 3, which consists of schedulability conditions and tests for the model exposed in the previous section.
We then use those results in Sections 4 and 5 to explain how we can improve the discretization of continuous strategies, show the efficiency of this approach in the experimental part in Section 6, and finally conclude in Section 7.
Model
We have $N$ tasks $\{T_i, i \in [1, \ldots, N]\}$ which run on a DVS CPU. They all share the same deadline and period $D$ (which we call the frame), and are executed in the order $T_1, T_2, \ldots, T_N$. The maximum number of execution cycles of $T_i$ is $w_i$. Task $T_i$ requires $x$ cycles with probability $c_i(x)$, where $c_i(\cdot)$ is the distribution of its number of cycles. Of course, in practice, we cannot use such precise information, and authors usually group cycles in "bins". For instance, we can choose to use a fixed bin system, with $b_i$ the size of the bin. In this case, the probability distribution $c_i(\cdot)$ is such that $c_i(k)$ represents the probability to use between
The system is said to be expedient if a task never waits intentionally. In other words, T 1 starts at time 0, T 2 starts as soon as T 1 finishes, and so on.
The CPU can run at M frequencies (or speeds) f 1 < f 2 < · · · < f M , and the chosen frequency does not change during task execution. The mode j consumes P j Watts per unit of time.
We assume we have $N$ scheduling functions $S_i(t)$ for $i \in [1, \ldots, N]$ and $t \in [0, D]$. This function means that if $T_i$ starts its execution at time $t$, it will run until its end at frequency $S_i(t)$, where $S_i(t) \in \{f_1, f_2, \ldots, f_M\}$. $S_i(t)$ is then a step function (piecewise constant function), with only $M$ possible values. Note that $S_i(t)$ is not necessarily an increasing or even a monotonic function. This model generalizes several scheduling strategies proposed in the literature, such as [7, 9], or discrete versions of [6]. Figure 1 shows an example of such a set of scheduling functions.
A scheduling function can be represented by a set of points (black dots on Figure 1), each representing the beginning of a step. $|S_i|$ is the number of steps of $S_i$, and $S_i[k]$, $k \in \{1, \ldots, |S_i|\}$, is one point, $S_i[k].t$ being its time component, and $S_i[k].f$ the frequency. $S_i$ has then the same value on the whole step: $S_i(t) = S_i[k].f$, where $k$ is such that $S_i[k].t \le t < S_i[k+1].t$.
Notice that finding $k$ can be done in $O(\log |S_i|)$ (by binary search), and, except in the case of very particular models, $|S_i| \le M$.
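As an illustration of this representation, the following sketch stores a scheduling function as a sorted list of step starts and evaluates it by binary search. The helper name `make_step_function` and the sample points are hypothetical; only the idea (points marking the beginning of each step, logarithmic lookup) comes from the text.

```python
from bisect import bisect_right


def make_step_function(points):
    """Build S(t) from a list of (t, f) pairs sorted by t.

    Each pair marks the beginning of a step: S(t) = f from that t until
    the next point.
    """
    times = [t for t, _ in points]
    freqs = [f for _, f in points]

    def S(t):
        k = bisect_right(times, t) - 1   # last step starting at or before t
        if k < 0:
            raise ValueError("t lies before the first step")
        return freqs[k]

    return S


# Hypothetical scheduling function with three steps.
S1 = make_step_function([(0.0, 150), (2.0, 400), (3.5, 1000)])
```

The lookup costs O(log |S|) per call, which matches the bound stated above.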
We first assume that changing CPU frequency does not cost any time or energy. See Section 5.1 for extensions.
The scheduling functions S i (t) can be pretty general, but have to respect some constraints in order to ensure the system schedulability and avoid deadline misses.
Figure 1 Example of scheduling with function S i (t).
We have 5 tasks T 1 , . . . , T 5 , running every D. In this frame, T 1 is run at frequency 
Schedulability
We first need to define the concept of schedulability in our model:
Definition 1. A system $\{T_i\}$ with a set of scheduling functions $\{S_i\}$ is said to be schedulable if, whatever the combination of effective number of cycles for each task, any task $T_i$ finishes its execution no later than the end of the frame.
From this definition, we can easily see that if $\{T_i\}$ is such that $\frac{1}{f_M} \sum_{i=1}^{N} w_i > D$ (the left-hand side represents the time needed to run every task in the frame at the highest speed if every task requires its worst case execution cycles), the system will never be schedulable, whatever the set of scheduling functions. In the same way, we can see that if $\{T_i\}$ is such that $\frac{1}{f_1} \sum_{i=1}^{N} w_i \le D$, the system is always schedulable, even with a "very bad" set of scheduling functions.
Of course, a non-schedulable system could be able to run its tasks completely in almost every case. Being non-schedulable means that almost surely (with a probability equal to 1), there will eventually be a frame in which a task does not have the time to finish before the deadline (or the end of the frame).
Danger Zone
Lemma 1. Any task in $\{T_i, T_{i+1}, \ldots, T_N\}$ can always finish no later than $D$ if and only if the system is expedient, and $T_i$ starts no later than $z_i$, defined as $z_i = D - \frac{1}{f_M} \sum_{k=i}^{N} w_k$.
Proof. This lemma can be proved by induction. Initialization. We first consider the case of $T_N$. The very last time at which task $T_N$ can start is the time allowing it to end before $D$ even if it consumes its $w_N$ cycles. At the highest frequency $f_M$, $T_N$ takes at most $\frac{w_N}{f_M}$ to finish.
$T_N$ must then necessarily start no later than $z_N = D - \frac{w_N}{f_M}$.
Otherwise, if the task starts after that time, even at the highest frequency, there is no certainty that $T_N$ will finish by $D$.
Induction. We know that if (and only if) $T_{i+1}$ starts no later than $z_{i+1}$, the schedulability of $\{T_{i+1}, \ldots, T_N\}$ is ensured. We then need to show that if $T_i$ starts no later than $z_i$, it will be finished by $z_{i+1}$. If $T_i$ starts no later than $z_i$, we can choose the frequency so that $T_i$ finishes before $z_i + \frac{w_i}{f_M} = z_{i+1}$.
This danger zone means that if $T_i$ has to start in $]z_i, D]$, we cannot guarantee the schedulability anymore, even though, because of the variable nature of execution times, we cannot claim either that some task will certainly miss its deadline. Of course, the danger zone of $T_i$ is larger than that of $T_j$ if $i < j$, which means that $z_i < z_j$ iff $i < j$.
In order to simplify some notation, we set $z_{N+1} = D$.
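The recursion used in the proof (the last task must start by $D - w_N / f_M$, and each earlier limit is shifted by $w_i / f_M$) can be sketched as follows. The function name and the sample values are hypothetical, and times are expressed in whatever unit makes cycles divided by frequency consistent.

```python
def danger_zone_starts(w, f_max, D):
    """Return [z_1, ..., z_N, z_{N+1}], with z_{N+1} = D and
    z_i = z_{i+1} - w_i / f_max (latest safe start of task i)."""
    z = [D]
    for w_i in reversed(w):
        z.append(z[-1] - w_i / f_max)
    z.reverse()
    return z


# Three hypothetical tasks on a CPU whose highest frequency is 1000.
z = danger_zone_starts(w=[200, 100, 300], f_max=1000, D=1.0)
```

A negative first value would mean that even the highest frequency cannot fit all worst cases in the frame, which is the "never schedulable" situation described earlier.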
Schedulability Conditions
Let us now consider conditions on $\{S_i\}$ that guarantee the schedulability of the system. We prove the following theorem:
Theorem 1. $S_i(t) \ge \frac{w_i}{z_{i+1} - t}$ for all $i \in [1, \ldots, N]$ and $t \in [0, z_i]$ is a necessary and sufficient condition to guarantee that, if task $T_i$ never requires more than $w_i$ cycles and the system is expedient, any task $T_i$ will finish by $z_{i+1}$, and then the last one, $T_N$, before $D$.
Proof. We show this by induction. Let $\tau_i$ be the worst finishing time of task $T_i$. Please note that this does not necessarily correspond to the case where every task before $T_i$ consumes its WCEC. Figure 2 highlights why.
Figure 2 Example showing that a smaller number of cycles for one task can result in a worse ending time for subsequent tasks. Here, $t$ is the point at which $S_2(t)$ goes from $f_1$ to $f_2$. On the top plot, $T_1$ uses slightly fewer cycles than in the bottom plot, and $T_2$ uses the same number in both cases, but is run at $f_1$ in the first case, and at $f_2$ in the second one.
First, we have to show that in the range $[0, z_i]$, $\frac{w_i}{z_{i+1} - t} \le f_M$.
As this function is an increasing function of $t$, we just need to consider its maximal value over the range: $\frac{w_i}{z_{i+1} - z_i} = \frac{w_i}{w_i / f_M} = f_M$.
For the initialization, we consider $T_1$. Clearly, as the execution length is not taken into account for the frequency selection, the worst case occurs when $T_1$ uses $w_1$ cycles. As $T_1$ starts at time 0, we have $\tau_1 = \frac{w_1}{S_1(0)}$.
Since, by hypothesis, $S_1(0) \ge \frac{w_1}{z_2}$, we have $\tau_1 \le z_2$.
$T_1$ then ends before $z_2$ in any case. Similarly, if $S_1(t) < \frac{w_1}{z_2 - t}$, then $\tau_1 > z_2$, and we cannot guarantee that $T_1$ finishes before $z_2$.
Induction. Let us now consider $T_i$, with $i > 1$. We know by induction that $T_{i-1}$ finished its execution between time 0 and time $z_i$. Let $\theta$ be this end time. Knowing that task $T_i$ starts at $\theta$, the worst case for $T_i$ is to use $w_i$ cycles. For this start time, the worst end time of $T_i$ is then $\theta + \frac{w_i}{S_i(\theta)}$.
As $S_i(\theta) \ge \frac{w_i}{z_{i+1} - \theta}$ (which is possible, because we have just shown that the right-hand side is not higher than $f_M$ in the range we have to consider), we have $\tau_i \le \theta + (z_{i+1} - \theta) = z_{i+1}$.
We then have that if $S_i(t) \ge \frac{w_i}{z_{i+1} - t}$, task $T_i$ always finishes before $z_{i+1}$, and then, as a consequence, that any task finishes before $z_{N+1} = D$. Symmetrically, we can also show that if $S_i(t) < \frac{w_i}{z_{i+1} - t}$, then $\tau_i$ is higher than $z_{i+1}$, and then $\tau_N$ is higher than $D$, and the system is not schedulable.
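The effect illustrated by Figure 2 (a shorter job leading to a later frame end) is easy to reproduce with a small simulation of one expedient frame. This is only a sketch: the function name and the two-task numbers below are hypothetical, chosen so that the second scheduling function steps from one frequency to a higher one at t = 1.

```python
def finish_times(cycles, S):
    """Simulate one expedient frame.

    cycles[i] is the actual number of cycles of task i, and S[i](t) the
    frequency used when task i starts at time t. Returns each end time.
    """
    t, ends = 0.0, []
    for c, S_i in zip(cycles, S):
        t += c / S_i(t)      # the frequency is fixed when the task starts
        ends.append(t)
    return ends


S1 = lambda t: 100.0
S2 = lambda t: 100.0 if t < 1.0 else 200.0   # steps from f_1 to f_2 at t = 1

shorter = finish_times([99.0, 100.0], [S1, S2])    # T_1 ends just before the step
longer = finish_times([100.0, 100.0], [S1, S2])    # T_1 ends exactly at the step
```

With these numbers the frame where the first task uses fewer cycles ends later, which is why the worst finishing time cannot simply be obtained by giving every task its WCEC.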
Note that the expedience hypothesis is a little bit too strong. It would be enough to require that $T_i$ never waits intentionally later than $z_i$. $T_1$ does not even have to start at time 0, as long as it starts no later than $z_1$. With this hypothesis, the initialization would be: in the worst case, $T_1$ would start at time $\theta$, somewhere between 0 and $z_1$, and use $w_1$ cycles. In this case, it would end at $\theta + \frac{w_1}{S_1(\theta)}$,
and we know that the CPU can be set to the speed $\frac{w_1}{z_2 - \theta} \le f_M$, which guarantees that $T_1$ ends by $z_2$.
An example of such schedulability limits is given in Figure 3 , with four tasks, and a maximum frequency of 1000MHz. 
Discrete Limit
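The set of scheduling functions closest to the limit is $S_i(t) = \min\left\{ f \in \{f_1, \ldots, f_M\} \;:\; f \ge \frac{w_i}{z_{i+1} - t} \right\}$.
Informally, we could write this function $S_i(t) = \left\lceil \frac{w_i}{z_{i+1} - t} \right\rceil$, where $\lceil x \rceil$ stands for "the smallest available frequency not lower than $x$". This function varies as a discrete hyperbola between $\left\lceil \frac{w_i}{z_{i+1}} \right\rceil$ (at $t = 0$) and $f_M$ (at $t = z_i$).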
This function is however in general not very efficient: T 1 is run at the slowest frequency allowing to still run the following jobs in the remaining time. But then, T 1 is run very slowly, while {T 2 , . . . , T N } have a pretty high probability to run at a high frequency. A more balanced frequency usage is often better.
This strategy actually corresponds to the Greedy technique (DPM-G) described by Mossé et al. [6], except that they consider continuous speeds.
Building such a function is very easy, and is in $O(M)$ for each task, with the method given by Algorithm 1. We mainly need to be able to invert $L_i(t) = \frac{w_i}{z_{i+1} - t}$: $L_i^{-1}(f) = z_{i+1} - \frac{w_i}{f}$.
Algorithm 1 Building Limit, worst case scheduling functions. $(a)^+$ means $\max\{0, a\}$.
In the following, this strategy is called Limit.
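The body of Algorithm 1 is not reproduced here, so the following is only a sketch of how such a function could be built in O(M) per task from the inverse of the limit, $t = z_{i+1} - w_i / f$. The function name, the list-of-points output and the sample values are assumptions made for illustration.

```python
def build_limit(w_i, z_next, freqs):
    """Step points (t, f) of the lowest schedulable function of one task.

    freqs must be sorted increasingly; z_next stands for z_{i+1}.
    """
    points = [(0.0, freqs[0])]
    for f_prev, f in zip(freqs, freqs[1:]):
        # The limit reaches f_prev at t = z_next - w_i / f_prev; from there
        # on, the next frequency is needed. max(0, .) plays the role of (a)^+.
        t = max(0.0, z_next - w_i / f_prev)
        if t == points[-1][0]:
            points[-1] = (t, f)     # f_prev is already too slow from time 0 on
        else:
            points.append((t, f))
    return points


# Hypothetical task: 300 cycles, z_{i+1} = 1.0, XScale-like frequency set.
limit_points = build_limit(300, 1.0, [150, 400, 600, 800, 1000])
```

Because steps are stored as left-closed intervals, this sketch switches to the higher frequency exactly at the crossing point, which is slightly conservative at that single instant but never unsafe.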
Checking the schedulability
Given a set of scheduling functions $\{S_i\}$, checking its schedulability is pretty simple. As we know that the limit function is non-decreasing, we just need to check that each step of $S_i$ is above the limit. This can be done with the following algorithm.
This check can then be performed in $O\left(\sum_{i=1}^{N} |S_i|\right)$, which, if $S_i$ is non-decreasing (which is almost always the case), is lower than $O(N \times M)$.
This test can be used offline to check the schedulability of some method or heuristic, but can also be performed as soon as some parameter change has been detected. For instance, if the system observes that a task $T_i$ used more cycles than its (expected) WCEC $w_i$, the test could be performed with the new WCEC in order to check whether the current scheduling functions can still be used.
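A possible shape for such a check is sketched below. It is not the paper's algorithm, only an illustration of the idea stated above: since the limit $w_i / (z_{i+1} - t)$ is non-decreasing, it suffices to compare each step with the limit at the end of that step. All names, the tolerance `eps` and the sample systems are hypothetical.

```python
def is_schedulable(S_points, w, f_max, D, eps=1e-9):
    """S_points[i]: sorted (t, f) step starts of task i, first step at t = 0."""
    z = [D]
    for w_i in reversed(w):
        z.append(z[-1] - w_i / f_max)
    z.reverse()                        # z[i]: latest safe start of task i (0-based)
    if z[0] < 0:
        return False                   # even f_max cannot fit all worst cases
    for i, points in enumerate(S_points):
        for k, (t, f) in enumerate(points):
            if t > z[i]:
                break                  # task i never starts that late if earlier checks pass
            t_end = points[k + 1][0] if k + 1 < len(points) else z[i]
            t_end = min(t_end, z[i])
            # f >= w_i / (z_{i+1} - t_end), written without a division
            if f * (z[i + 1] - t_end) < w[i] * (1 - eps):
                return False
    return True


limit = [[(0.0, 400), (0.25, 600), (0.5, 800), (0.625, 1000)]]
ok_limit = is_schedulable(limit, [300], 1000, 1.0)
ok_slow = is_schedulable([[(0.0, 150)]], [300], 1000, 1.0)
```

Each step is visited once, so the cost is proportional to the total number of steps, in line with the complexity given above. Re-running the function with an updated `w` is one way to perform the online re-check mentioned in the text.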
Using Schedulability Condition to Discretize Continuous Methods
Figure 4 Two different ways of discretizing a continuous strategy: Discr. strat. 1 rounds up to the first available frequency. Discr. strat. 2 (our proposal) uses the closest available frequency, taking the limit into account. Limit is the strategy described by Algorithm 1.
There are mainly two ways of building a set of S-functions for a given system. The first method consists in considering the problem with continuous available frequencies, and then adapting this result, through some heuristic, to a system with discrete speeds. The second method consists in taking into account from the beginning that there are only a limited number of available speeds. The second family of methods has the advantage of usually being more efficient in terms of energy, but the disadvantage of being much more complex, requiring a non-negligible amount of computation or memory. This is not problematic if the system is very stable and its parameters do not change often, but as soon as some on-line adaptation is required, heavy and complex computations can no longer be performed.
In the first family, the heuristic usually used consists in computing a continuous function $S_i^c(t)$ which is built in order to be schedulable, and obtaining a discrete function by using, for any $t$, the smallest frequency above $S_i^c(t)$, or $S_i(t) = \lceil S_i^c(t) \rceil$, where $\lceil x \rceil$ is the smallest available frequency not lower than $x$. However, this strategy is often pessimistic. So far, there was no other method to ensure schedulability. This is no longer the case, because the schedulability condition provided in this paper can be used instead.
The main idea is, instead of using the smallest frequency above $S_i^c(t)$, to use the closest frequency to $S_i^c(t)$, and, if needed, to round this up with the schedulability limit $L_i(t)$. In other words, writing $\lfloor x \rceil$ for the available frequency closest to $x$, we will use: $S_i(t) = \max\{\lfloor S_i^c(t) \rceil, \lceil L_i(t) \rceil\}$.
The advantage of this technique is that we have more chance to be closer to the continuous function (which is often optimal in the case of continuous CPU). However, both techniques (ceiling and closest frequency) are approximations, and none of them is guaranteed to be better than the other one in any case. As we will show in the experimental section, there are systems in which the classical discretization is better, but there are also many cases where our discretization is better. 
$(a)^+$ stands for $\max\{0, a\}$.
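Actually, computing the closest frequency amongst $\{f_1, f_2, \ldots, f_M\}$ roughly boils down to computing the round up frequency amongst the set of midpoints $\left\{\frac{f_1 + f_2}{2}, \frac{f_2 + f_3}{2}, \ldots, \frac{f_{M-1} + f_M}{2}\right\}$. Then, the range corresponding to $\frac{f_1 + f_2}{2}$ is mapped onto $f_2$, etc. In Algorithm 3, if we simply use $f_{j-1}$ instead of the midpoint $\frac{f_{j-1} + f_j}{2}$, we obtain the classical round up operation.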
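The two discretizations can be compared pointwise with the following sketch. It evaluates a single time instant rather than building whole step functions as Algorithm 3 does, and the function names, the tie-breaking rule and the sample values are assumptions made for illustration.

```python
from bisect import bisect_left


def round_up(x, freqs):
    """Smallest available frequency not lower than x (capped at the highest)."""
    j = bisect_left(freqs, x)
    return freqs[min(j, len(freqs) - 1)]


def closest(x, freqs):
    """Available frequency closest to x; ties go to the lower frequency."""
    return min(freqs, key=lambda f: (abs(f - x), f))


def discretize_up(s_cont, limit, freqs):
    """Classical approach: round the continuous speed up."""
    return max(round_up(s_cont, freqs), round_up(limit, freqs))


def discretize_closest(s_cont, limit, freqs):
    """Closest frequency, raised to the schedulability limit when needed."""
    return max(closest(s_cont, freqs), round_up(limit, freqs))


freqs = [150, 400, 600, 800, 1000]
relaxed = discretize_closest(420, 300, freqs)   # limit is not binding
binding = discretize_closest(420, 450, freqs)   # limit forces a higher speed
classic = discretize_up(420, 300, freqs)
```

In the first call the closest rule keeps the lower frequency, while in the second the schedulability limit overrides it, so the result is never below what the limit requires.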
Model Extensions

Frequency Changes Overhead
Our model makes it easy to take into account the time penalty of frequency changes. Let $P_T(f_i, f_j)$ be the time penalty of changing from $f_i$ to $f_j$. This means that once the frequency change is requested (usually, a special register has been set to some predefined value), the processor is "idle" during $P_T(f_i, f_j)$ units of time before the next instruction is run. We assume that the worst time overhead occurs when the CPU goes from $f_1$ to $f_M$. We denote this worst case by $P_T^M = P_T(f_1, f_M)$.
Notice that this model is rather pessimistic: on modern DVS CPUs, the processor does not stop after a change request, but still runs at the old frequency for a few cycles before the change becomes effective. However, even if the processor never stops, there is still a penalty, but the time penalty is negative when the speed goes down (because the job will be finished sooner than if the frequency change had been performed before it started). As a first approximation, we could then consider that negative penalties compensate positive penalties. But this approximation does not hold for energy penalties, because all of them are obviously positive.
We also want to take into account the switching time before jobs, even if there is no frequency change (we assume that the job switching time is already taken into account in $P_T$). Let $S_T(f_i)$ be the switching time when the frequency is $f_i$ and is not changed between two consecutive jobs. Again, let $S_T^M$ denote the worst case, $S_T^M = \max_i S_T(f_i)$.
We made here the simplifying hypothesis that the switching time is job independent, which is an approximation since this time usually depends upon the amount of used memory. However, in our purpose, we only need to consider an upperbound of this time.
As before, we know that $T_N$ must start no later than $D - \frac{w_N}{f_M}$. If $T_N$ starts at this limit (and even before), the selected frequency must be $f_M$. Then we could have two situations:
• Best case: the previous task $T_{N-1}$ was already running at $f_M$. Then $T_{N-1}$ needs to finish before the start limit for $T_N$, minus the switching time, $D - \frac{w_N}{f_M} - S_T(f_M)$.
• Worst case: the previous task $T_{N-1}$ was not running at $f_M$, and we then need to change the frequency. In the worst case, the time penalty will be $P_T^M$. $T_{N-1}$ then needs to finish no later than $D - \frac{w_N}{f_M} - P_T^M$.
The first limit is then a necessary condition, and the second a sufficient condition, to ensure the schedulability of $T_N$. Similarly, we can see that $T_i$ must start before $z_i^n$ to ensure the schedulability of itself and any subsequent task (necessary condition), and this schedulability is ensured (sufficient condition) if $T_i$ starts before $z_i^s$.
We can then provide two schedulability conditions:
Algorithm 3 can easily be adapted using those conditions. We then use $L_i(t) = \frac{w_i}{z_{i+1}^s - t}$.
Soft Deadlines
If we want to be a little bit more flexible, we could possibly consider soft deadlines, and adapt our schedulability condition accordingly. The main idea is not to consider the WCEC, but to use some percentile: if $\kappa_i(\varepsilon)$ is such that $P[c_i \le \kappa_i(\varepsilon)] \ge 1 - \varepsilon$,
where $c_i$ is the actual number of cycles of $T_i$, we can use $\kappa_i(\varepsilon)$ as a worst case execution time. However, it seems to be almost impossible to compute analytically the probability of missing a deadline with this model. It would boil down to computing $P\left[\sum_{i=1}^{N} E_i > D\right]$,
where $E_i$ represents the execution time of jobs of task $T_i$. $E_i$ then depends upon the job length distribution, but also upon the speed at which $T_i$ is run, which depends upon the time at which $T_{i-1}$ ends, which depends upon the time at which $T_{i-2}$ ended, and so on. As the $E_i$'s are not independent, it seems that we cannot use the central limit theorem.
If we accept an approximation of the failure probability, we could proceed in the following way. Let $C_i$ be the random variable giving the number of cycles of $T_i$, and $C = \sum_i C_i$. Let $W = \sum_i w_i$ be the maximal value of $C$ (the frame worst case execution cycle). Let $C_\varepsilon$ be such that $P[C \le C_\varepsilon] \ge 1 - \varepsilon$.
We assume that using the deadline $D \, \frac{W}{C_\varepsilon}$ will allow deadlines to be respected with a probability close to $1 - \varepsilon$. Those propositions are only heuristics, and would require more work, both analytic and experimental.
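To make the percentile idea concrete, here is a minimal sketch computing such a bound from a discrete distribution of cycle counts. The dictionary representation, the function name `kappa` and the sample distribution are hypothetical; the small constant only absorbs floating-point rounding in the cumulative sum.

```python
def kappa(dist, eps):
    """Smallest cycle count x such that P[cycles <= x] >= 1 - eps.

    dist maps a number of cycles to its probability (probabilities sum to 1).
    """
    total = 0.0
    for x in sorted(dist):
        total += dist[x]
        if total >= 1.0 - eps - 1e-12:   # tolerance for rounding errors
            return x
    return max(dist)


# Hypothetical distribution of the number of cycles of one task.
dist = {100: 0.5, 200: 0.3, 300: 0.15, 400: 0.05}
k_soft = kappa(dist, 0.05)   # bound exceeded with probability at most 5%
k_hard = kappa(dist, 0.0)    # falls back to the worst case
```

With eps = 0 the bound coincides with the WCEC, so the hard real-time condition is recovered as a special case.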
Experimental Results
In order to evaluate the advantage of using a "closest" approach instead of an "upper bound" approach, we applied it to two methods. The first one is described by Mossé et al. in [6], and is called DPM-S (Dynamic Power Management-Statistical), and the second one is described by Xu, Melhem and Mossé [9], and is called PITDVS (Practical Inter-Task DVS).
DPM-S
The method DPM-S described in [6] bets that the next jobs will not need more cycles than their average, and computes the speed under this assumption when a job starts. Of course, the schedulability limit is also taken into account. In their paper, the authors consider that they can use any (normalized) frequency between 0 and 1. In order to apply this method on a system with a limited number of frequencies, we can either round them up, or use our "closest" approach. They do not take into account frequency change overheads, but according to what we claimed above, those overheads are easy to integrate.
We compute now the two following step functions in this way, where avg i stands for the average number of cycles of T i : in Algorithm 3 adapted to take into account frequency changes overhead (cf Section 5.1),
• DPM-S up : we replace S
• DPM-S closest : we replace S
PITDVS
The second method we consider, by Xu, Melhem and Mossé [9], is called PITDVS (Practical Inter-Task DVS), and aims at patching OITDVS (Optimal Inter-Task DVS [8]), an optimal method for ideal processors (with a continuous range of available frequencies). They apply several patches in order to make this optimal method usable on realistic processors. They start by taking into account speed change overhead, then they introduce maximal and minimal speeds (OITDVS assumes speeds from 0 to infinity), and finally, they round up the S-function to the smallest available frequency. It is in this last patch that we apply our technique. Using the $\beta_i$ value described in [9] (representing the aggressiveness level), we compute the step functions in the following way: in Algorithm 3 adapted to take into account frequency change overhead (cf. Section 5.1),
• PITDVS up (in [9] ): we replace S
• PITDVS closest (our adaptation): we replace S
In the following, we also run simulations using L (Limit) to choose the frequency. Our aim was not to show how efficient or how bad this technique is, but rather to show that we often observe rather counterintuitive results.
Simulations
We performed a large number of simulations in order to compare the energy performance of "round up" and "round to closest". We compare several processor characteristics, and several job characteristics. We both use theoretical models and realistic values extracted from production systems.
We present here experimental results for two different kinds of DVS processors (see for instance [7] for details about their characteristics): an Intel XScale processor (with frequencies 150, 400, 600, 800 and 1000MHz), and a PowerPC 405LP (with frequencies 33, 100, 266 and 333MHz). We took into account frequency change overhead, but its contribution was usually negligible in all of the simulations we performed (lower than 0.1% in most cases). As a third CPU, we used the characteristics of the XScale, but we disabled one of its available frequencies (400MHz in the plots we show here), in order to highlight the advantage of using our approximation over the round up approximation when the number of available frequencies is quite low.
For the figures we present here, we simulated the same system with different strategies computed with variations of Algorithm 3, amongst DPM-S closest (Eq. (2)), DPM-S up (Eq. (1)), PITDVS closest (Eq. (4)), PITDVS up (Eq. (3)) and Limit (Algorithm 1), computed the energy consumption, and presented the ratio of this energy to that of PITDVS closest or DPM-S closest. We then simulated the same system for various deadlines, going from the deadline allowing to run any task at the lowest frequency ($D = \frac{1}{f_1} \sum_{i=1}^{N} w_i$) to the smallest deadline allowing to run any task at the highest frequency ($D = \frac{1}{f_M} \sum_{i=1}^{N} w_i$).
We even used smaller deadlines, because this limit represents a frame where every task needs its WCEC at the same time, which has a very tiny probability of occurring. We can consider that decreasing the deadline boils down to increasing the load: the smaller the deadline, the higher the average frequency. And quite intuitively, for small and large deadlines (or frame lengths), we do not see any difference between strategies, because they all always use either the lowest (large deadline) or the highest (small deadline) frequency.
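The kind of energy figure compared in these simulations can be sketched as follows for a single frame. This is a simplified illustration, not the simulator used for the experiments: it ignores idle power and frequency change overhead, and the power table and scheduling functions are hypothetical.

```python
def frame_energy(cycles, S, power):
    """Energy and end time of one expedient frame.

    cycles[i]: actual cycles of task i; S[i](t): frequency chosen when
    task i starts at time t; power[f]: power drawn at frequency f.
    """
    t, energy = 0.0, 0.0
    for c, S_i in zip(cycles, S):
        f = S_i(t)
        duration = c / f
        energy += power[f] * duration
        t += duration
    return energy, t


power = {500.0: 1.0, 1000.0: 4.0}    # hypothetical values, quadratic in f
slow = [lambda t: 500.0, lambda t: 500.0]
fast = [lambda t: 1000.0, lambda t: 1000.0]
e_slow, end_slow = frame_energy([500.0, 500.0], slow, power)
e_fast, end_fast = frame_energy([500.0, 500.0], fast, power)
ratio = e_fast / e_slow               # energy ratio between the two strategies
```

Averaging such values over many frames drawn from the cycle distributions, and dividing by the energy of a reference strategy, gives ratios of the type plotted in the figures.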
A first observation was that in many cases, the S-function of PITDVS up was already almost equal to Limit. As a consequence, we could not observe any difference between PITDVS up and PITDVS closest. We can for instance see this on Figure 6, right plot: for deadlines between 0.1 and 0.06, we do not see any difference between PITDVS closest and Limit.
In the first set of simulations (Figures 5 and 6), we used 12 tasks, each of them having a uniformly distributed number of cycles, with miscellaneous parameters. On the PowerPC processor, we observe a large variety in performance comparisons. Depending on the load (or the frame length), we see that PITDVS closest can gain around 30% compared to PITDVS up, or lose almost 20%, while we obtain similar comparisons for DPM-S closest and DPM-S up, but with smaller values.
We also observe very abrupt and surprising variations, such as in Figure 6, middle and right, for Limit, around 0.03. A closer look at those variations shows that they usually occur when the frequency of $T_1$ changes. Indeed, as $T_1$ always starts at time 0, its speed does not really depend upon $S_1(t)$, but only upon $S_1(0)$. For instance, if we slightly change $S_i$ ($i \ne 1$), it will only impact a few task speeds. But slight changes in $S_1(0)$ have either no impact at all, or an impact on every task in every frame.
From those first figures, we certainly cannot claim that a "closest" approach is always better than an "upper bound" approach. But those simulations highlight that there are situations where one approach is better than the other, and situations where it is the other way around. System designers should then pay attention to the way they round continuous frequencies. With a very small additional effort, we can often do better than simply rounding up the original scheduling function.
As a second set of simulations, we used several workloads coming from video decoding using H.264, which is used in our lab for some other experiments on a TI DaVinci DM6446 DVS processor. On Figure 9 , we show the distribution of the 8 video clips we used, each with several thousands of frames. Figures 7 and 8 present some results with those traces. We observe the same kind of differences as from the previous experiments: according to the configuration, one round method is better than the other one. With PowerPC configuration, PITDVS closest is better than PITDVS up , but DPM-S up seems to be better than DPM-S closest . However, with the XScale processor where we disabled one frequency, both "closest" methods are better than "up" methods. Remark that we observe the same kind of benefit by disabling another frequency than 400MHz.
From the many experiments we performed, it seems that our approach is especially interesting when the number of available frequencies is limited, which is not surprising. In this paper, we do not present a huge number of simulations, because we do not claim that our approach is always better: what we present should be enough to convince system designers to have a deeper look at the way they manage discretization.
Conclusions and Future Work
The aim of our work was twofold. First, we presented a simple schedulability condition for frame-based low-power stochastic real-time systems. Thanks to this condition, we are able to quickly check whether any scheduling function guarantees the schedulability of the system, even when frequency change overheads are taken into account. This test can either be used off-line to check that a scheduling function is schedulable, or online, after some parameter changes, to check whether the functions can still be used.
The second contribution of this paper was to use this schedulability condition in order to improve the way a strategy developed for systems with continuous speeds can be adapted to systems with a discrete set of available speeds. We showed that our approach is not always better than the classical one, which consists in rounding up to the first available frequency, but can, in some circumstances, give a gain of up to almost 40% in the simulations we presented.
Figure 9 Distribution of the number of cycles needed to decode different kinds of video, ranging from news streaming to complex 3D animations. The x-axis is the number of cycles, and the y-axis the probability.
Our future work includes several aspects. First, by running many more simulations, we would like to identify more precisely when our approach is better than the classical one. This would allow system designers to choose the approach to use without running simulations or making experiments on their system.
Another aspect we would like to consider is to have a deeper look at how the schedulability test we provide can improve the robustness of a system. In particular, if we observe that a job has required more than its (expected) worst case number of cycles, how can we temporarily adapt our system in order to improve its schedulability, before we can compute the new set of functions using those new parameters?
