Reducing energy consumption is one of the main concerns in the design and implementation of embedded real-time systems. For this reason, the current generation of processors allows the voltage and operating frequency to be varied to balance computational speed and energy consumption. This technique is called dynamic voltage scaling (DVS).
Introduction
The number of battery-operated embedded systems is increasing in different application domains, from personal digital assistants (PDAs) to autonomous robots, smart phones and sensor networks. Reducing the energy consumed by these systems has become a key design issue, as they can only operate on a limited battery supply. Battery lifetime is a critical design parameter, and it directly affects the size and the weight of the system.
In recent years, the problem of energy reduction has arisen in the area of real-time servers as well (Rusu et al., 2006). As processors become more and more powerful, their energy consumption increases correspondingly, and it becomes a problem to dissipate the heat produced (Lefurgy et al., 2003; Bianchini and Rajamony, 2004). In fact, the increase of the computational power in current digital systems is mostly obtained by increasing the clock frequency, leading to a higher energy demand (Pouwelse et al., 2001; Ishihara and Yasuura, 1998; Lorch and Smith, 2001; Rusu et al., 2006) and to more heat being generated. Conventional computers are currently air-cooled, and manufacturers are facing the problem of building powerful systems without introducing additional techniques such as liquid cooling (Lefurgy et al., 2003). Not surprisingly, a significant portion of the consumed energy is due to cooling devices, which may consume up to 50% of the total energy (Lefurgy et al., 2003). For example, a 10 kW rack consumes about 10 MWh a month (including cooling), which represents at least 10% of the operation cost (Barroso et al., 2003), with this trend likely to increase in the future. Reducing the energy consumed by the computing components would reduce the energy consumed by the cooling devices and, eventually, the cost of the system.
To prolong the lifetime of all these systems, the energy consumption should be reduced to an absolute minimum through energy-aware techniques. For this reason, the current generation of processors (Intel Corp., 2004a, 2004b; MPC5200; Crusoe, 2003; Intel Centrino) allows the operating system to vary the operating frequency dynamically to balance computational speed against energy consumption. This technique is called dynamic voltage scaling (DVS), and it is used by many energy-aware scheduling policies (Scordino and Lipari, 2004; Pillai and Shin, 2001; Aydin et al., 2004; Qadi et al., 2003; Zhu and Mueller, 2004). However, a weakness of most approaches lies in the set of assumptions, often unrealistic, that are made to simplify the solution. Besides ignoring the energy consumed by the processor when it is idle, these methods often also neglect the delay due to a frequency transition, thus preventing the application of research results to real-world systems. In some approaches (Lee and Sakurai, 2000; Mochocki et al., 2002), such a delay has been considered in the processor model, but only dynamic techniques aimed at reducing the slack time have been developed. For this reason, we have formulated a general model for the processor (Scordino and Bini, 2005), which takes into account both time and energy overheads due to frequency transition.
In real-time systems, any energy-aware policy acting on the processor speed must take timing constraints into account, to guarantee the timely execution of the computational activities. For safety reasons, hard real-time systems are typically designed to handle peak loads. However, peak load conditions rarely happen in practice, and the system resources are underutilised most of the time. For example, server loads often vary significantly depending on the time of the day or other external factors. Researchers at IBM showed that the average processor utilisation of real servers is between 10% and 50% of their peak capacity (Bohrer et al., 2002). Thus, much of the server capacity remains unused during normal operations (Rusu et al., 2006). These issues are even more critical in embedded systems (Xu et al., 2005), where the peak power has an important impact on the size and the cost of the system. Some studies have observed (Wegener and Mueller, 2001) that the actual execution times of tasks in real-world embedded systems can vary by up to 80% with respect to their measured worst case execution cycles (WCECs). This suggests that a striking energy reduction can be achieved by enriching DVS policies with more detailed information on the required workload.
Recently, the discipline of probabilistic timing analysis has significantly advanced (Burns et al., 2003; Ermedahl et al., 2003), and today there exist tools that can provide the probability density function (PDF) of a task's execution time. Basically, these tools partition the task code into basic blocks, which are sequences of instructions between two consecutive conditional branches. The duration of each block depends on the processor status (e.g., caches, pipeline stages, out-of-order execution, etc.) and it can be modelled by a random variable. The PDF of the whole task can be extracted by combining the information of every block. This information can then be exploited to reduce the energy consumption. Lorch and Smith (2001) proposed the processor acceleration to conserve energy (PACE) model, which increases the speed as the task progresses on continuous speed processors. However, the model is not general: it considers only a well-defined power function (i.e., energy per cycle proportional to the square of the speed) and it does not account for the overhead during frequency transitions. A second paper extended the model to the case of discrete speeds and general power functions. Furthermore, its authors took into account idle power and speed change overhead. However, neither of these papers dealt with optimal speed transition instants, and it is not clear how an optimal sequence of transition points can be found. In the original paper, the authors proposed a heuristic to select a 'good' sequence of transition points, but only justified it under the condition of continuous speed (Lorch and Smith, 2001). In the second paper, the problem of optimal transition points is said to be 'still an open problem beyond the scope of the paper'. An attempt to consider stochastic information in energy reduction problems has also been made by Gruian (2002). However, that study also addresses only the case with no transition overheads and a specific power function.
In this paper, we integrate the concept of probabilistic execution time within the framework of energy minimisation, providing the basis of a new challenging approach. We show how probabilistic information about task execution times can be exploited to reduce the energy consumed by the processor without violating the timing constraints. Optimal speed assignments and transition points are found using a very general model that accounts for processor idle power and time/energy overheads due to frequency transitions. We also show how these results can be applied to some significant examples.
Energy management scheme
We focus on the problem of reducing the energy consumed by a task $\tau$ on a variable speed processor. Some existing energy-aware algorithms (Zhu and Mueller, 2004; Lorch and Smith, 2001) have been developed starting from this simple scheme, since it constitutes a good starting point for more complex analysis.
In our model, the task $\tau$ has a period and a deadline both equal to $T$. The number of processor cycles required by the task is modelled by a random variable whose PDF is $f_C(c)$. The maximum possible number of cycles needed by $\tau$ is $C_{\max}$. Since $\tau$ is a hard real-time task, $C_{\max}$ cycles must be available in $[0, T]$ in any case. If the number of required cycles in $[0, T]$ is known in advance, it has been shown (Ishihara and Yasuura, 1998; Pouwelse et al., 2001) that using a constant speed during task execution minimises the energy consumed, assuming a continuous speed processor. In fact, the convexity of the power/speed curve implies that maintaining a constant speed $\alpha$ is better than switching between two different speeds. When the processor offers a limited set of speeds, using the two speeds that are closest to the optimal (ideal) speed minimises the energy consumption (Ishihara and Yasuura, 1998).
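The convexity argument can be checked numerically. The following is a minimal sketch, assuming a cubic power function $p(\alpha) = \alpha^3$ (an illustrative choice) and no transition overheads; the helper names `energy` and `two_speed_energy` are hypothetical and are not taken from the cited works.

```python
def energy(speed, cycles, power=lambda a: a ** 3):
    """Energy needed to execute `cycles` at a constant `speed` (power * time).

    The cubic default for `power` is an assumption made for illustration.
    """
    return power(speed) * cycles / speed


def two_speed_energy(cycles, period, low, high):
    """Energy when `cycles` complete exactly at `period` using speeds low < high.

    The time spent at `low` solves low * t_low + high * (period - t_low) = cycles.
    """
    t_low = (high * period - cycles) / (high - low)
    t_high = period - t_low
    return energy(low, low * t_low) + energy(high, high * t_high)


cycles, period = 6.0, 10.0
ideal_speed = cycles / period  # slowest constant speed that meets the deadline
e_constant = energy(ideal_speed, cycles)
e_adjacent = two_speed_energy(cycles, period, low=0.4, high=0.8)
e_distant = two_speed_energy(cycles, period, low=0.2, high=1.0)
print(e_constant, e_adjacent, e_distant)
```

With these illustrative numbers, the constant speed is the cheapest option, and the pair of speeds closer to the ideal one costs less than the more distant pair, in line with the two statements above.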
In our scheme, it is not possible to compute the optimal constant speed $\alpha$ because the actual number of cycles required by the current instance of $\tau$ is not known in advance. In this case, a common technique (Aydin et al., 2004; Zhu and Mueller, 2004; Pillai and Shin, 2001; Lorch and Smith, 2001) is based upon the idea of deferring some work, expecting that the current instance of $\tau$ will request much less than its WCEC $C_{\max}$.
This technique typically splits the task execution into two parts, as shown in Figure 1. In the first part, the processor runs at a lower speed $\alpha_L$ to reduce the energy consumed in the average case. In the second part, instead, the processor runs at a higher speed $\alpha_H$ in order to provide up to $C_{\max}$ cycles even in the worst case. The idea is that, if a task tends to use much less than its WCEC, the second part, which consumes more energy, may never be needed. The idea of deferring work has been widely used in the literature to create several energy-aware algorithms. For instance, Pillai and Shin (2001) proposed the RTDVS-look ahead algorithm, which defers as much work as possible, setting the processor frequency to the minimum value which ensures that all future deadlines will be met. This technique has also been used by Aydin et al. (2004) in the 'aggressive' version of the DRA algorithm. This algorithm speculatively assumes that current and future instances of the task will most probably present a computational demand lower than the worst case. Hence, it tries to reduce the speed of the running task by deferring all the work above a certain threshold, set according to the average workload. A similar approach has been applied to EDF by Zhu and Mueller (2004): each task instance is divided into two portions and the goal is to provide the average number of cycles $C_{avg}$ within the first portion. All these techniques follow intuitive ideas [such as providing the average execution cycles in the first part (Zhu and Mueller, 2004)] to simplify the solution, and do not analytically study the problem of finding optimal values for speed assignments and transition instants.
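The two-part scheme can be sketched in a few lines. This is only an illustration under stated assumptions: transition overheads are ignored, the first part provides a number of cycles called `c_x` here within `[0, q]`, and the second part provides the remaining cycles up to the worst case within `[q, period]`. The names `split_speeds` and `finishing_time` are hypothetical.

```python
def split_speeds(c_x, q, c_max, period):
    """Speeds of the two parts (overheads ignored, an assumption of this sketch).

    c_x cycles are provided in [0, q]; the remaining c_max - c_x cycles
    are provided in [q, period], so the worst case still meets the deadline.
    """
    alpha_low = c_x / q
    alpha_high = (c_max - c_x) / (period - q)
    return alpha_low, alpha_high


def finishing_time(c, c_x, q, c_max, period):
    """Instant at which an instance that needs c cycles completes."""
    alpha_low, alpha_high = split_speeds(c_x, q, c_max, period)
    if c <= c_x:
        return c / alpha_low            # the second part is never entered
    return q + (c - c_x) / alpha_high   # remaining cycles run at the higher speed


# Illustrative numbers: a light instance ends early, the worst case ends at T.
print(finishing_time(2.0, c_x=4.0, q=8.0, c_max=10.0, period=10.0))
print(finishing_time(10.0, c_x=4.0, q=8.0, c_max=10.0, period=10.0))
```

An instance that needs few cycles never leaves the low-speed part, whereas an instance that needs the full worst case completes exactly at the end of the period.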
Processor model
In CMOS circuits, the energy spent in dynamic switching dominates the energy consumed by leakage currents, and the dynamic portion of energy consumption is modelled by well-known polynomial formulas (Chandrakasan and Brodersen, 1995; Hong et al., 1998). However, as the integration technology advances, it is expected that the leakage will significantly affect, if not dominate, the overall energy consumption in integrated circuits (ICs) (for semiconductors; Rabaey et al., 2002; Gruian, 2002). Very recently, some work has addressed the issue of scheduling a real-time application while reducing the leakage power as well (Quan et al., 2004). Also, an important fraction of the consumed energy depends on the memory. It has been shown (Pouwelse et al., 2001) that at low frequencies the energy consumption is dominated by the memory, whereas at high frequencies it is dominated by the processor core. All these remarks have led us to formulate a general model for the processor energy consumption. The processor is characterised by a set $M$ of operating modes. Each mode $\Lambda_k \in M$ provides a speed $\alpha_k$ and consumes a power $p_k$; entering it costs a time overhead $o_k$ and an energy overhead $e_k$. For the sake of simplicity, we assume that $o_k$ and $e_k$ depend only on the mode $\Lambda_k$, and neither on the mode in which the processor was operating before nor on the processor status. Also, we consider only efficient speeds, that is, speeds that satisfy the condition of equation (1).
In fact, if this condition is not true for some $\Lambda_i$ and $\Lambda_j$, then the mode $\Lambda_j$ would always be more convenient than the mode $\Lambda_i$. DVS architectures may also have inefficient operating frequencies, which can easily be removed from the set of available frequencies (Saewong and Rajkumar, 2003; PARTS).
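Removing inefficient modes can be sketched as a simple filter. The code below assumes that a mode is inefficient when some faster mode spends no more energy per cycle (power divided by speed); this reading of the efficiency condition is an assumption of the sketch, and the function name `efficient_modes` is hypothetical. Overheads and idle power are ignored.

```python
def efficient_modes(modes):
    """Return the modes that are not dominated by a faster mode.

    `modes` is a list of (speed, power) pairs. A mode is kept only if its
    energy per cycle (power / speed) is strictly lower than that of every
    faster mode; this criterion is an assumption made for illustration.
    """
    kept = []
    best_per_cycle = float("inf")
    # Scan from the fastest mode down, tracking the lowest energy per cycle seen.
    for speed, power in sorted(modes, reverse=True):
        per_cycle = power / speed
        if per_cycle < best_per_cycle:
            kept.append((speed, power))
            best_per_cycle = per_cycle
    return sorted(kept)


modes = [(0.25, 0.10), (0.5, 0.15), (0.75, 0.45), (1.0, 1.0)]
print(efficient_modes(modes))
```

In this illustrative set, the slowest mode is dropped because the mode at speed 0.5 is both faster and cheaper per cycle.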
Notice that this model is very general, since it is suitable for both continuous and discrete speed processors. If the processor can vary its speed continuously, then the set $M$ is composed of infinitely many modes; on the other hand, if the processor has only discrete operating modes, then the set $M$ is finite.
Finally, we suppose that the processor has one idle operating mode denoted by $\Lambda_I$. The processor enters idle mode $\Lambda_I$ when all the computation required by the task is completed. When running in idle mode, the processor does not provide any useful computation (i.e., $\alpha_I = 0$). Notice that existing processors have several idle modes with different features. Taking into account only one idle mode $\Lambda_I$, however, constitutes a good starting point for considering more complex processors.
Optimal speed assignment
A speed assignment is optimal when it minimises the average energy consumed. Given the PDF of the task computation time are equal to zero.
Average energy consumption
We now compute the average energy consumption based on the probabilistic information of the task execution time. The optimal values for speed assignments and transition instants will be computed in Section 3.2 based on this result.
The energy consumed by the processor can be split into two separate components: the active energy $E_A$, consumed when executing the task $\tau$, and the idle energy $E_I$, consumed when the task has terminated and the processor has entered mode $\Lambda_I$. Since these two terms must be added, they can be considered separately. First, let us compute the value of the active energy.
Let $\alpha_L$ and $\alpha_H$ be the lower and the higher processor speeds respectively. The period of the scheme is $T$. The number of processor cycles required by the task $\tau$ in each period is modelled by a random variable whose PDF is $f_C(c)$ and the maximum number of cycles is $C_{\max}$. This amount of cycles must be guaranteed in each period because the task is subject to a hard real-time constraint. Our goal is to find the optimal values for the two speed levels $\alpha_L$ and $\alpha_H$ and the instant $Q$ when the transition should occur. We consider the two cases separately.
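In the first case, the task completes before the transition instant $Q$ and only the lower speed $\alpha_L$ is used; equation (2) relates the number of cycles $c$ to the finishing time $f$, and the active energy consumed in one period $T$ is given by equation (3). On the other hand, when the task is still running at $Q$, the processor switches to the higher speed $\alpha_H$; the finishing time is then given by equation (4) and the energy consumption by equation (5).
The energy consumption as a function of the number of cycles $c$ is shown in Figure 2, which relies on the assumption of equation (1). Averaging the active energy over the density $f_C(c)$ gives the average active energy $E_{avg}$ of equation (7), which is written compactly by means of the function $\gamma(x)$ defined in equation (6).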
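The averaging step can be illustrated with a discretised density. The sketch below is not the closed form of the paper: it assumes that the low-speed part provides `c_x` cycles, that each mode charges its energy overhead once when it is entered, and that the density is approximated by a list of `(cycles, probability)` pairs. All function and key names are hypothetical.

```python
def active_energy(c, c_x, low, high):
    """Active energy of an instance that needs c cycles (illustrative model).

    `low` and `high` are dicts with keys 'speed', 'power' and 'e', where 'e'
    is the energy overhead paid when the mode is entered.
    """
    if c <= c_x:
        # The instance ends within the first part: only the low mode is used.
        return low["e"] + low["power"] * c / low["speed"]
    # Otherwise all c_x cycles run at low speed and the rest at high speed.
    first_part = low["e"] + low["power"] * c_x / low["speed"]
    second_part = high["e"] + high["power"] * (c - c_x) / high["speed"]
    return first_part + second_part


def average_active_energy(pmf, c_x, low, high):
    """Expected active energy over a discretised density [(cycles, prob), ...]."""
    return sum(prob * active_energy(c, c_x, low, high) for c, prob in pmf)


pmf = [(2.0, 0.5), (4.0, 0.3), (10.0, 0.2)]
low = {"speed": 0.5, "power": 0.125, "e": 0.0}
high = {"speed": 3.0, "power": 27.0, "e": 0.0}
print(average_active_energy(pmf, c_x=4.0, low=low, high=high))
```

In this example only the rare instance that needs 10 cycles enters the expensive second part, so its energy is weighted by the small probability 0.2.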
Accounting for the idle power
Equation (7) takes into account only the energy consumed when the processor is running the task. We now evaluate the contribution $E_I$ to the energy consumed by the processor after the task has terminated. This contribution is
$$E_I = e_I + p_I \,(T - f - o_I)$$
where $e_I$ and $o_I$ are respectively the energy and time overheads to enter the idle mode and $f$ is the finishing time of the task.
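As a small worked illustration, the idle contribution can be written as a one-line function. The function name and the numerical values are hypothetical, and the sketch assumes that the time left after the finishing time is at least as long as the time overhead needed to enter the idle mode.

```python
def idle_energy(finish_time, period, idle_power, energy_overhead, time_overhead):
    """Idle energy E_I = e_I + p_I * (T - f - o_I).

    Assumes period - finish_time >= time_overhead, so that the processor
    actually has time to enter the idle mode (an assumption of this sketch).
    """
    return energy_overhead + idle_power * (period - finish_time - time_overhead)


# Illustrative values: the task ends at f = 4 in a period T = 10.
print(idle_energy(finish_time=4.0, period=10.0, idle_power=0.05,
                  energy_overhead=0.1, time_overhead=0.5))
```

The earlier the task finishes, the longer the processor stays in idle mode and the larger this contribution becomes.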
From the previous equations (2) and (4), which established the relationship between the required number of cycles and the finishing time $f$, we can express $E_I$ as a function of the number of cycles $c$, which gives equation (8).
As done for the active energy consumption, we calculate the average consumption of idle energy by integrating equation (8) over the density of the number of cycles, using the definition of $\gamma(x)$ given in equation (6). The result can be written in the more compact form of equation (9). Finally, adding equations (7) and (9) gives equation (10).
Equation (10), which extends equation (7) to the case of idle power, is a new result in the literature. It expresses the average energy consumption as a function of the probability density of the task execution cycles, while also taking into account the idle power and the overheads. Now an essential remark is in order. Equation (10) can be obtained from equation (7) by a suitable substitution of the parameters of the operating modes. Hence, we can say that the idle power can be taken into account by a simple adjustment of the operating modes. For this reason, unless specified differently, in the rest of the paper we will consider only equation (7).
Minimum energy consumption
Equation (7) is valid for processors with discrete as well as continuous operating modes. Taking into account discrete operating modes requires the evaluation of the energy consumed by all the mode pairs. The problem then becomes that of finding the pair with the lowest total energy consumption. The complexity of this evaluation process is $O(m^2)$, where $m$ is the number of available efficient operating modes.
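The exhaustive evaluation over mode pairs can be sketched as follows. This is an illustration under stated assumptions, not the procedure of the paper: overheads and idle power are ignored, the density is discretised into `(cycles, probability)` pairs, and for each pair the switch is placed as late as the worst case allows. The names `average_energy` and `best_pair` are hypothetical.

```python
from itertools import combinations_with_replacement


def average_energy(pmf, c_x, low, high):
    """Expected energy when c_x cycles run in mode `low` and the rest in `high`.

    Modes are (speed, power) pairs; overheads are ignored in this sketch.
    """
    total = 0.0
    for c, prob in pmf:
        e = low[1] * min(c, c_x) / low[0]
        if c > c_x:
            e += high[1] * (c - c_x) / high[0]
        total += prob * e
    return total


def best_pair(modes, pmf, c_max, period):
    """Evaluate every (low, high) pair, about m*(m+1)/2 of them, and keep the best."""
    best = None
    for low, high in combinations_with_replacement(sorted(modes), 2):
        if high[0] * period < c_max:
            continue  # even the faster mode cannot guarantee c_max cycles
        if low[0] * period >= c_max:
            c_x = c_max  # the slower mode alone already meets the deadline
        else:
            # Latest switch instant q such that c_max cycles fit in the period.
            q = (high[0] * period - c_max) / (high[0] - low[0])
            c_x = low[0] * q
        energy = average_energy(pmf, c_x, low, high)
        if best is None or energy < best[0]:
            best = (energy, low, high, c_x)
    return best


modes = [(0.5, 0.125), (0.75, 0.421875), (1.0, 1.0)]  # cubic power, for illustration
pmf = [(2.0, 0.5), (4.0, 0.3), (8.0, 0.2)]
print(best_pair(modes, pmf, c_max=8.0, period=10.0))
```

The double loop makes the quadratic cost explicit: each feasible pair is scored once with the discretised density, and pairs that cannot guarantee the worst case are skipped.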
For processors with continuous operating modes, instead, it is possible to analytically find the optimal values for speed assignments and transition instant. Many significant contributions in the literature (Aydin et al., 2004; Pillai and Shin, 2001; Zhu and Mueller, 2004; Scordino and Lipari, 2004) still assume a continuous speed because if the processor speed levels are very close to each other, then this approximation is very close to reality. Obviously, if the optimal speed is not available, it has to be approximated with the closest discrete speed higher than the optimal one. In this case, there is an increase of energy consumption, called energy quantisation error, which was studied by Saewong and Rajkumar (2003) .
The power/speed relationship is commonly modelled by a polynomial function (Chandrakasan and Brodersen, 1995). However, as stated earlier, it is expected that this relationship may differ from the ideal polynomial function (Gruian, 2002). For this reason, we model it by a generic function $p(\alpha)$.
Unlike the case of discrete operating modes, in the case of continuous operating modes we can find the conditions for minimum energy consumption (i.e., optimal transition instant and speed levels) starting from equation (7).
The speeds $\alpha_L$ and $\alpha_H$ are determined by $C_x$ and $Q$ through the equalities of equation (11). These equalities follow directly from the adopted energy management scheme as shown in Figure 1. In the remainder of the paper, we will develop the energy management scheme using $C_x$ and $Q$ as our free variables. It is very insightful to plot the quantity $E_{avg}$ on the plane $(C_x, Q)$, as done in Figure 3.
The minimum of equation (7) can be found analytically by calculating the partial derivatives of $E_{avg}$ with respect to the variables $C_x$ and $Q$. The derivative with respect to $C_x$ is given in equation (12), where we used a property that follows from equation (11). An equivalent property can also be found for the differentiation with respect to $Q$, which leads to equation (14). Equations (12) and (14) are the two components of the gradient of $E_{avg}$.
From equation (7), replacing $\alpha_H$ with $\alpha_{\max}$ and $\alpha_L$ with the expression of equation (15), we find $E_{avg}$ as a function of the single variable $C_x$. The minimal energy solution is found by applying classical techniques for the analysis of one-variable functions.
Examples
Having found the main equations for the general case, we now show how they can be applied to find the optimal pair $(C_x, Q)$ in some common cases.
When considering continuous speed levels, a common assumption is that the relationship between the power consumption $p$ and the speed $\alpha$ is
$$p(\alpha) = k\,\alpha^n$$
The typical value of $n$ is 3. However, we keep the general form as long as the math is tractable. Under this hypothesis, the expressions of the gradient components simplify.
In order to find the conditions of minimum energy, we have to set both the gradient components equal to zero. The math is greatly simplified by assuming no overhead ($e = 0$ and $o = 0$), which leads to equation (16).
Due to lack of space, we do not include all the calculations. Because of their importance, we call equation (16) the minimum stochastic energy equations. Once we know $n$ and the probability density $f_C(c)$, equation (16) can be solved to obtain the pair $(C_x, Q)$ that minimises the energy consumption.
Uniform density
Let us assume that the number of cycles required by the task is uniformly distributed. The function $\gamma(c)$, defined in equation (6), can then be computed in closed form. In this case, the minimum energy can be found simply by substituting $\gamma(c)$ into equation (16), which leads to equation (17). In the same way, if we assume a more realistic power function with $n = 3$, the polynomial to be solved leads to equation (19).
These equations show that our result is an improvement with respect to the sub-optimal solution proposed by Zhu and Mueller (2004) (i.e., setting $C_x$ equal to $C_{avg}$). In fact, from both equations (17) and (19), we see that the optimal value of $C_x$ is always greater than $C_{avg}$ (see also Figure 4). Providing only $C_{avg}$ cycles at speed $\alpha_L$ would increase the average energy consumed in the period.
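The comparison with the choice $C_x = C_{avg}$ can be reproduced numerically. The sketch below rests on several assumptions that are not taken from the text: cycles are uniform on $[0, 1]$ (so $C_{\max} = 1$ and $C_{avg} = 0.5$), the period is $T = 1$, overheads and idle power are zero, speeds are not capped, $p(\alpha) = \alpha^n$, the speeds are $\alpha_L = C_x / Q$ and $\alpha_H = (1 - C_x)/(1 - Q)$, and $\gamma(x)$ is taken as the integral of $(c - x)$ over $[x, 1]$. A plain grid search replaces the analytical solution.

```python
def gamma_uniform(x):
    """Assumed form: integral over [x, 1] of (c - x) dc, uniform density on [0, 1]."""
    return (1.0 - x) ** 2 / 2.0


def average_energy(c_x, q, n):
    """Average energy with p(alpha) = alpha**n and no overheads (sketch model)."""
    c_avg = 0.5
    alpha_low = c_x / q
    alpha_high = (1.0 - c_x) / (1.0 - q)
    g = gamma_uniform(c_x)
    # Energy per cycle is alpha**(n - 1); c_avg - g cycles run on average at
    # low speed and g cycles on average at high speed.
    return alpha_low ** (n - 1) * (c_avg - g) + alpha_high ** (n - 1) * g


def minimise(n, steps=100):
    """Grid search over (c_x, q) in the open unit square; returns (energy, c_x, q)."""
    grid = [i / steps for i in range(1, steps)]
    return min((average_energy(x, q, n), x, q) for x in grid for q in grid)


for n in (2, 3):
    energy, c_x, q = minimise(n)
    at_average = min(average_energy(0.5, i / 100, n) for i in range(1, 100))
    print(n, c_x, q, energy, at_average)
```

Under these assumptions, the search places $C_x$ above $C_{avg}$ for both exponents, and the energy found is lower than the best value obtainable with $C_x$ fixed at $C_{avg}$.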
Exponential density
The probability density considered previously is very simple and it allows finding the exact value of the pair $(C_x, Q)$ that minimises the average energy consumption. We now consider a more sophisticated density $f_C(c)$ that better captures the characteristics of real execution times. Without loss of generality, we normalise the number of cycles with respect to $C_{\max}$ so that the possible values of cycles are in $[0, 1]$.
The remaining parameters are set as before. Unfortunately, when dealing with exponential densities, the minimal energy pair $(C_x, Q)$ can only be found by numerical approximation. We investigated the effect of the PDF asymmetry on the solution. The result is quite interesting and is plotted, in terms of ratios, in Figure 6.
Impact of the overheads
In this experiment, we evaluate the impact of the energy overhead $e$ and the time overhead $o$ on the energy consumed. For this purpose, we fix the PDF $f_C(c)$ equal to the exponential density with $\beta = -50$ (please refer to Section 4.2), and the power function $p(\alpha) = \alpha^3$. Then, we vary the time overhead $o$ and the energy overhead $e$ and, for each value, we compute the average energy consumption $E_{avg}$.
The results are shown in Figure 7 . As expected, increasing the overheads results in an increase of the energy consumed.
Idle power
As remarked at the end of Section 3.1, the energy consumed during the idle operating mode $\Lambda_I$ can be taken into account by properly adjusting the parameters of the two modes $\Lambda_L$ and $\Lambda_H$. In this section, we evaluate the impact of the power $p_I$ consumed during the idle mode on the optimal values of the speed switch. In the experiments, the PDF is set equal to the exponential density and the power function is assumed to be cubic. The results are plotted in Figure 8.
The increase of the average finishing time $f_{avg}$ shown in Figure 8(a) is justified by the fact that the processor consumes energy during the idle mode $\Lambda_I$ as well. To avoid a waste of energy, it is preferable to stay in active mode for a longer time by deferring the average finishing time $f_{avg}$. Therefore, it is possible to lower the speeds $\alpha_L$ and $\alpha_H$. This insight is confirmed by Figures 8(b) and 8(c).
Conclusions
Deferring the workload is an effective technique to reduce the energy consumed by the processor when the actual number of cycles is unknown. We exploited this technique when proposing a general model for the processor that accounts for the idle power and for both the time and the energy overheads due to frequency transitions. We computed the optimal values for the speed assignments and the transition instant within our energy management scheme. We also studied how the overhead and the idle power affect the optimal values. Very interestingly, we showed that the power consumed during the idle operating mode can be easily taken into account by adjusting the active operating modes of the processor.
Finally, we showed how our results can be applied to some common cases (namely, the uniform and exponential densities).
As future work, we plan to implement the algorithm in the RTSim Simulator (http://www.rtsim.sf.net) to evaluate the improvement over the existing energy-aware techniques.
