Abstract: Voltage scheduling is an essential technique used to exploit the benefit of dynamic voltage-scaling processors. Though extensive research exists in this area, current processor limitations such as time and energy transition overhead and voltage-level discretization are often dismissed as insignificant. We show that for hard real-time applications, disregarding these details can lead to suboptimal or even invalid results. We propose two algorithms to account for these limitations. The first is a greedy approach, while the second is more complex, but can significantly reduce the system's energy consumption. Through experimental results on both real and randomly generated systems, we show the effectiveness of both algorithms and explore what conditions make it beneficial to use the complex algorithm over the basic one.
I. INTRODUCTION
The demand for mobile and pervasive computing devices has made low-power/energy computing a critical technology. One of the most effective ways to reduce energy consumption in CMOS processors is dynamic voltage scaling (DVS), i.e., dynamically varying a processor's supply voltage and clock frequency simultaneously. Various DVS processors are commercially available, including Intel's XScale [1] , AMD's Mobile Athlon [2] , and Transmeta's Crusoe processor [3] . Several research groups have also developed their own DVS systems. For example, Burd and Brodersen implemented a DVS system using the ARM8 core [4] , while Pouwelse et al. constructed a similar system using the SA-1100 [5] . Ideally, a DVS processor would operate at any voltage within a specific range and switch from one voltage to another instantaneously. However, due to physical limitations, DVS processors always incur both time and energy overhead during a voltage transition. Furthermore, all commercially available DVS processors today can only operate at discrete voltage levels.
To maximally exploit the benefits of a DVS processor, voltage scheduling, i.e., the selection of voltage levels and operating frequencies, is indispensable. A large number of research results have been published on voltage scheduling for DVS processors. These results differ in many aspects, such as the type of applications considered (e.g., real-time or nonreal-time), the type of systems used (e.g., single or multiple processors), the location where a voltage change is allowed (intratask versus intertask), the execution style of the voltage-scheduling algorithms (e.g., on-line or off-line), etc. It is not difficult to see that nonideal properties of DVS processors, such as discrete voltage levels and transition overhead, can affect voltage-scheduling results and deserve careful study.
In this paper, we focus on voltage scheduling for a set of real-time jobs executed by a single DVS processor. Many embedded applications can be described by such a model. In particular, we study off-line, interjob voltage scheduling where the real-time jobs are executed according to the preemptive earliest deadline first (EDF) scheduling scheme [6]. Preemptive EDF is an optimal scheduling algorithm and has been adopted by many real-time systems [7]. Interjob (or intertask) voltage scheduling is realized by the operating system, which is less intrusive and more portable for a given application. Off-line scheduling does not compete for resources with the actual application and hence can afford to use more sophisticated algorithms. Though off-line scheduling cannot handle dynamic variations, it can often be used as a complement to on-line approaches. The uniqueness of our work is that the impact of practical limitations of DVS processors is analyzed in detail and carefully accounted for during the development of our algorithms.
A. Related Work
Substantial research exists for scheduling real-time applications on DVS processors, e.g., [8]-[14], some of which have considered off-line, interjob voltage scheduling for preemptive EDF-based real-time systems, e.g., [12], [14], [15]. However, only a limited amount of work has examined voltage scheduling in the presence of practical limitations. Some of these works consider only discrete voltage levels. For example, Chandrasena et al. [16] introduce a rate-selection algorithm for a DVS processor with limited voltage levels, but the algorithm provides no deadline guarantee for the tasks. In [17], Lee et al. consider discrete voltage levels in dynamic (i.e., on-line) voltage scheduling for periodic tasks. Their method, called time slicing, requires that tasks be divided into subtasks (or slots), which is not always possible. Even if the division is possible, preemption is not allowed within a subtask, so this method cannot be applied to the preemptive scheduling problem we address here. Kwon et al. give an optimal intratask scheduling algorithm under the preemptive EDF scheme to match a discrete set of voltage levels [18]. Swaminathan and Chakrabarty also introduce a method to optimally schedule job sets off-line, given a discrete set of voltage levels. Their method uses network flow techniques to assign a single valid voltage level to each job [15]. Saewong and Rajkumar give an in-depth analysis of the ideal placement of a set of discrete voltage levels for a DVS processor, and conclude that the energy increase due to voltage/frequency quantization is inversely proportional to the number of levels [19]. Since the above approaches ignore transition overhead, they tend to introduce more transitions in order to better match discrete voltage levels, which further exacerbates the impact of transition overhead.
A number of researchers have studied voltage scheduling when transition overhead is not negligible. Manzak et al. [20] address the transition-time overhead by linearly increasing the total required execution time or decreasing the processor utilization. Such adjustments may lead to either a deadline miss or an overly pessimistic design. In [21], Hong et al. present a heuristic algorithm that accounts for transition overhead during static scheduling, but assumes both the availability of any voltage level and continuous execution of instructions during a transition. Neither assumption is true for most DVS systems [1]-[3], [5]. AbouGhazaleh et al. propose an intratask voltage-scheduling method that accounts for transition overhead. Their method, however, requires both compiler support and that the source code be engineered to give voltage selection "hints" to the operating system [22]. Hsu and Kremer also present a compiler-driven DVS algorithm, but hard deadlines are not guaranteed [23]. Saewong and Rajkumar account for large time overheads using a method called Sys-Clock, which selects the minimum constant speed that will finish all jobs by their deadlines, essentially avoiding transition overhead by having no transitions at all [19]. Zhang and Chakrabarty consider all three limitations when scheduling voltage levels and checkpoint times for fault-tolerant hard real-time systems with periodic tasks [24]. They schedule at most one speed per task, which is coarser than the job-level assignment we consider. Also, they assume that each task can meet all deadlines when running at the smallest processor speed if no faults are present. This assumption will completely eliminate the benefit of DVS for the fault-free environment presented here.
0278-0070/04$20.00 © 2004 IEEE
B. Our Contributions
In this paper, we present several observations regarding the impact of transition overhead on executing real-time jobs according to the preemptive EDF scheme. These observations show that transition time overhead can cause deadline misses as well as a significant increase in energy consumption if not handled carefully. A basic algorithm is devised to guarantee that no deadline violations occur in the presence of transition time overhead. Building on the basic algorithm, we develop an algorithm that considers both transition time and energy overheads as well as discrete voltage levels. A large number of experiments have been conducted for both real-world and randomly generated examples to demonstrate the effectiveness of our algorithms. Careful analysis of the experimental results helps us draw several conclusions regarding the nonideal properties of DVS processors and the performance of our proposed algorithms.
The remainder of this paper is organized as follows. Section II summarizes the relevant background material, including system models and motivation examples. Section III describes our algorithms that handle DVS processor limitations. Section IV presents the experimental results and Section V concludes the paper.
II. PRELIMINARIES

A. System Model
We consider real-time applications consisting of a set of independent jobs, $\mathcal{J} = \{J_1, \ldots, J_n\}$, with each job $J_i \in \mathcal{J}$ having a release time $r_i$, a deadline $d_i$, and worst-case execution cycles $c_i$. The job set is to be executed on a DVS processor whose power consumption is a convex function of the processor speed (frequency) [25]. The convexity assumption holds so long as the switching power is one of the main contributors to the total power, which is the case for current and near-future CMOS devices [26].
The DVS processor can operate at a finite set of supply voltage levels $V = \{V_1, \ldots, V_{\max}\}$, each with an associated speed. To simplify the discussion, we normalize the processor speeds by $S_{\max}$, the speed corresponding to $V_{\max}$, giving $S = \{S_1, \ldots, 1\}$.
Changing from one voltage level to another takes a fixed amount of time, referred to as the transition interval (denoted as $\Delta t$), and consumes a variable amount of transition energy (denoted as $\Delta E$). The transition interval length for a DVS processor alone is usually on the order of 10 to 100 $\mu$s [2], [4], [5]. However, when considering synchronizing with other components in a system, the length can be on the order of milliseconds [19], [27]. Transition energy includes both the energy consumed by the dc/dc converter and the CPU. No instructions are executed during a transition.
The above DVS processor model captures the main properties of most commercial DVS processors [2], [5]. A variable-length transition interval (e.g., the one described in [4]) can be approximated by a fixed-length interval equal to the maximum switching time. For processors that do not block instructions during a transition (e.g., [4]), a schedule that assumes blocking during a transition can be pessimistic, but it will guarantee a valid voltage schedule. As in most DVS work, we assume that each job consumes an equal amount of energy per cycle at a given speed, which is a valid assumption for many applications.
We introduce some notation below which will be used throughout the paper. 
B. Low-Power Earliest Deadline First (LPEDF)
We briefly review the voltage scheduling algorithm LPEDF, presented in [12], as it is referenced throughout the paper. LPEDF is rephrased in Algorithm 1. LPEDF finds off-line an optimal voltage schedule for a set of independent tasks executed according to the preemptive EDF policy. It assumes an ideal DVS processor without transition overhead. The general idea is to iteratively identify (line 5), schedule (lines 6 and 7), and remove (lines 8-14) each critical interval. Lines 8-14 essentially "squeeze" the critical interval to a single time point at $t_s(i)$ by reducing all release times or deadlines inside $T_i$ to $t_s(i)$ and then reducing all release times or deadlines after $t_f(i)$ by $|T_i|$. For the rest of the paper, when we refer to squeezing an interval, we mean performing a similar operation.
Algorithm 1: LPEDF
Algorithm 1 is greedy in the sense that it always picks the critical interval to schedule first. Consequently, the intervals are identified according to a monotonically nonincreasing order of their associated speeds. Due to the convexity of the power function, this monotonicity property (summarized formally in Lemma 1) is beneficial when constructing voltage schedules. For the proof of Lemma 1 and more details on LPEDF, we direct the readers to [12] .
Lemma 1: Critical intervals found by successive iterations of LPEDF are monotonically nonincreasing in intensity, that is, $s_i \ge s_j$ if $i < j$.
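The identify, schedule, and squeeze loop described above can be sketched as follows. This is only a minimal illustration, not the listing of Algorithm 1: it assumes the usual definition of interval intensity from [12] (total cycles of the jobs whose release time and deadline both fall inside the interval, divided by the interval length), it represents each job as a hypothetical `(release, deadline, cycles)` tuple, and it uses a brute-force search over candidate intervals for clarity. The function names are illustrative.

```python
from typing import List, Tuple

Job = Tuple[float, float, float]  # (release, deadline, cycles); assumed layout


def critical_interval(jobs: List[Job]) -> Tuple[float, float, float]:
    """Return (t_s, t_f, speed) of the interval with the highest intensity."""
    releases = sorted({r for r, _, _ in jobs})
    deadlines = sorted({d for _, d, _ in jobs})
    best = (0.0, 0.0, 0.0)
    for ts in releases:
        for tf in deadlines:
            if tf <= ts:
                continue
            # Only jobs entirely contained in [ts, tf] count toward intensity.
            work = sum(c for r, d, c in jobs if r >= ts and d <= tf)
            speed = work / (tf - ts)
            if speed > best[2]:
                best = (ts, tf, speed)
    return best


def squeeze(jobs: List[Job], ts: float, tf: float) -> List[Job]:
    """Drop the jobs inside [ts, tf] and collapse the interval to the point ts."""
    length = tf - ts

    def shift(t: float) -> float:
        if t <= ts:
            return t
        if t >= tf:
            return t - length
        return ts

    return [(shift(r), shift(d), c)
            for r, d, c in jobs if not (r >= ts and d <= tf)]


def lpedf_speeds(jobs: List[Job]) -> List[float]:
    """Speeds of successive critical intervals (assumes cycles > 0, d > r)."""
    speeds = []
    remaining = list(jobs)
    while remaining:
        ts, tf, s = critical_interval(remaining)
        if s <= 0:  # defensive stop for degenerate input
            break
        speeds.append(s)
        remaining = squeeze(remaining, ts, tf)
    return speeds
```

For two jobs `(0, 10, 2)` and `(2, 6, 4)`, the first critical interval is [2, 6] at speed 1.0; after squeezing, the remaining job spans [0, 6] and runs at speed 1/3, which matches the nonincreasing order stated in Lemma 1.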
C. Motivational Example
To illustrate the impact of ignoring transition overhead, we present the following example. Consider the job set in Fig. 1(a), which contains four jobs (where $\bigtriangleup$ represents a job release time and $\bigtriangledown$ a job deadline).
The optimal voltage schedule by LPEDF, assuming both $\Delta t$ and $\Delta E$ are zero, is given in Fig. 1(b). Suppose the same set of jobs is scheduled on a DVS processor with $\Delta t = 100$.
A straightforward approach to include time overhead is to: 1) insert a transition interval at each speed change and 2) rescale the speed of the interval with the lower speed to ensure that the same amount of work is completed. The resulting schedule is shown in Fig. 1(c). The speed of $T'_2$, the modified interval $T_2$, was calculated using 
The schedule in Fig. 1(c) is not required to complete $J_4$. Because (2) fails to take into account release times and deadlines when rescaling the speed, it can result in schedules that require jobs to execute before they are released or after their deadlines. Fig. 1(a), we obtain the schedule shown in Fig. 1(d). Unfortunately, the schedule in Fig. 1(d) still has monotonicity and execution violations since $s_2 > s_1$ and $J_3$ is contained in the transition intervals about $T_1$. Furthermore, the schedule is still not feasible due to the overshoot of $T_2$. The next section presents two algorithms that guarantee valid schedules.
III. ALGORITHMS FOR DVS PROCESSORS WITH PRACTICAL LIMITATIONS
In this section, we first present a scheduling algorithm that can guarantee the feasibility of real-time jobs in the presence of time overhead. We then introduce an enhanced algorithm to improve the energy efficiency of this algorithm and incorporate discrete speeds and transition energy overhead simultaneously.
A. Basic Algorithm for Time Overhead
Regarding monotonicity violations induced by M-LPEDF, we have observed that any critical interval that violates Lemma 1 must be adjacent to the critical interval identified in the previous iteration. (Note that generally critical intervals found in successive iterations are not necessarily adjacent to one another). This observation is stated formally in Lemma 2.
Lemma 2: Let $T_{i-1}$ and $T_i$ be two critical intervals obtained by M-LPEDF. If $s_{i-1} < s_i$, then $T_{i-1}$ is adjacent to $T_i$.
The key observation leading to Lemma 2 is that a monotonicity violation occurs when the allotted time intervals for executing some jobs are shortened due to the addition of the transition intervals of the previous critical interval. The result is that these jobs require a higher speed in order to meet their deadlines. To remove such a violation, we could simply eliminate the transitions by merging jobs in the violation interval with those in the previously identified critical interval. However, we need to be sure that such a merge will not lead to any deadline misses. Lemma 3 below provides this guarantee.
Proof: To schedule
Lemma 3: Let $T_{i-1}$ and $T_i$ be two critical intervals obtained by M-LPEDF with $s_{i-1} < s_i$. The minimum speed at which every job $J_k \in (\mathcal{J}_{i-1} \cup \mathcal{J}_i)$ can execute and still meet its deadline is $s_{i-1}$.
Proof: According to Lemma 2, $T_{i-1}$ and $T_i$ must be adjacent. Therefore, if the speed $s_{i-1}$ is applied to both of these intervals, no voltage transition occurs. According to Lemma 1, $s_{i-1}$ is the minimum speed required to guarantee the deadlines for jobs in $\mathcal{J}_{i-1}$ and is greater than the speed needed to guarantee the deadlines for jobs in $\mathcal{J}_i$, when no time overhead is present. Therefore, $s_{i-1}$ is the minimum speed required for every job in $(\mathcal{J}_{i-1} \cup \mathcal{J}_i)$ to meet its deadline.
Based on Lemmas 2 and 3, we can keep track of monotonicity violations and remove them whenever they occur. In fact, these lemmas also help eliminate execution violations. Observe that an execution violation is just a special case of a monotonicity violation, where the speed required by the violation interval is 1. For instance, in Fig. 1(d Building upon the above lemmas and observations, we propose Algorithm 2, called time overhead earliest deadline first (TOEDF), which eliminates both monotonicity and execution violations as they occur, thus producing a feasible schedule with a time overhead of arbitrary size. In TOEDF, when a monotonicity or execution violation is encountered (line 6), $T_{i-1}$ is extended to include all jobs scheduled in $T_i$ (line 9). "Squeezing" a critical interval changes the timing parameters of some jobs (line 16). The parameters are required if $T_{i+1}$ causes a violation, so we save these parameters (line 13). After $T_i$ is identified, we may need to extend it, as in M-LPEDF, to incorporate $\Delta t$ (line 12).
Algorithm 2: TOEDF
One must be particularly careful during this step. For example, if an end point of $T_i$ is within $\Delta t$ of any previous interval (say $T_j$) in $\mathcal{T}$, we should not extend $T_i$ so that it overlaps $T_j$. This will prevent TOEDF from including more time overhead than necessary. (This can be done by maintaining a proper data structure when implementing the algorithm. We omit the details.) Theorem 1 below summarizes the correctness and complexity of Algorithm 2. The proof follows directly from Lemmas 2 and 3 and is thus omitted. A detailed proof can be found in [28]. In what follows, we apply Algorithm 2 to the job set in Fig. 1(a) and discuss more details of this algorithm. During the first iteration of Algorithm 2, the critical interval [600, 1000] with $s_1 = 1$ is identified. The interval is extended to [500, 1100] (with $\Delta t = 100$) and then $(T_1, s_1)$ is inserted into $\mathcal{T}$. During the second iteration, $J_3$ causes an execution violation. Therefore, $(T_1, s_1)$ is removed from $\mathcal{T}$ and $J_3$ is merged with $J_2 \in \mathcal{J}_1$. The new $T_1$ is expanded to [500, 1050] to reflect the earliest release time and the latest deadline of both jobs $J_2$ and $J_3$, and is further extended to [400, 1150] to accommodate time overhead. $T_1$ keeps $s_1$ from the previous iteration and $(T_1, s_1)$ is once again inserted into $\mathcal{T}$.
In the third iteration, J4 causes a monotonicity violation. Therefore, is inserted into T (we assume no transition at time 0.) The resulting schedule is depicted in Fig. 2 . One can readily verify that this schedule indeed guarantees the schedulability of all the jobs.
B. Improving the Basic Algorithm
Unnecessary energy may be wasted when using TOEDF since, in order to eliminate violations, we use "higher-than-necessary" speeds for some intervals. Although removing violations is critical for guaranteeing the feasibility of a voltage schedule, one should strive to shorten the intervals that demand higher speeds to reduce energy consumption. For example, when applying TOEDF to the job set in Fig. 1(a), the processor speed for the interval [400, 1200] is 1.0. However, one can readily verify that using the processor speed of 1.0 during the interval [590, 1200] can also guarantee the schedulability of the jobs scheduled in that interval, i.e., $J_2$ through $J_4$. Energy is saved in two ways: first, the interval length that demands the speed of 1 is reduced; second, extra time is available for any remaining jobs, such as $J_1$, which can be used to further reduce their required speed. In this case, $J_1$ can be executed at $100/490 \approx 0.2$, instead of 0.29 as in the schedule in Fig. 2. Therefore, if we have to use a higher-than-necessary speed, we should restrict the usage of this speed to as short an interval as possible. The problem can be formulated as follows. Given a set of jobs, $\mathcal{J}$, and a constant speed, $s^*$, where $s^*$ is no less than the minimum speed required to meet all deadlines of $\mathcal{J}$, find the shortest interval in which all deadlines of $\mathcal{J}$ are met.
The key to solving this problem is to realize that we really have one degree of freedom, i.e., how long the start of the interval can be delayed. To prevent any deadline misses due to the delayed executions, we must identify the latest start time for the job set, i.e., the latest time at which jobs in $\mathcal{J}$ can begin execution at speed $s^*$ and still meet all deadlines.
Lemma 4: The job set $\mathcal{J}$ scheduled by EDF has the latest start time where $hp(J_i)$ is the job set containing jobs with priorities equal to or higher than the priority of $J_i$.
Proof: First, we prove that starting at a time later than $t_{LS}(i)$ causes $J_i$ to miss its deadline. Then, we prove that $J_i$ does not miss its deadline if execution begins at or before $t_{LS}(i)$.
1) By EDF, $hp(J_i)$ includes the jobs with deadlines no later than $d_i$. To guarantee that all jobs in $hp(J_i)$ meet their deadlines, there must be sufficient time to execute the cycles of these jobs. The total workload of jobs in $hp(J_i)$ is $W = \sum_{J_k \in hp(J_i)} c_k$, and the total time needed to execute this workload is $W/s^*$. It is trivial to see that starting later than $t_{LS}(i)$ will not give enough time to finish $W$. The minimum $t_{LS}(i)$ is the most restrictive latest start time of all jobs in $\mathcal{J}$, so it is the correct choice for $t_{LS}$.
2) Suppose that beginning execution at $t_{LS}(i)$ causes $J_i$ to miss its deadline. Considering the execution of jobs in $hp(J_i)$, there must be idle time in the interval $[t_{LS}(i), d_i]$ such that no job in $hp(J_i)$ is executed during this idle time. Beginning execution at any time earlier than $t_{LS}(i)$ will not alter the job released after the idle interval (including $J_i$ based on EDF), so $J_i$ must still miss its deadline. This contradicts the assumption that all jobs in $\mathcal{J}$ executing at the speed $s^*$ will finish by their deadlines.
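The argument in the proof suggests a direct computation of the latest start time. The sketch below is a reconstruction from the proof text only, not the paper's stated formula: it assumes that $t_{LS}(i) = d_i - W/s^*$ with $W$ the total cycles of $hp(J_i)$, and that $t_{LS}$ is the minimum of these values over all jobs. Jobs are hypothetical `(release, deadline, cycles)` tuples, and the function name is illustrative.

```python
from typing import List, Tuple

Job = Tuple[float, float, float]  # (release, deadline, cycles); assumed layout


def latest_start(jobs: List[Job], speed: float) -> float:
    """Latest start time at constant `speed`, reconstructed from the proof.

    For each job J_i, hp(J_i) is taken to be the jobs whose deadlines are
    no later than d_i (EDF priority). The candidate start for J_i is
    d_i - W / speed, and the most restrictive (smallest) candidate wins.
    """
    best = float("inf")
    for _, d_i, _ in jobs:
        work = sum(c for _, d, c in jobs if d <= d_i)
        best = min(best, d_i - work / speed)
    return best
```

For example, with jobs `(0, 10, 4)` and `(0, 6, 2)` at speed 1, both candidates equal 4, so execution can be delayed until time 4 while still meeting both deadlines.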
Lemma 4 provides a way to find the latest start time ($t_{LS}$) for a given set of jobs and a constant speed $s^*$. Then, with a simple plane-sweeping algorithm we find the earliest finish time $t_{EF}$ of all jobs in $\mathcal{J}$ based on $t_{LS}$. Together, $t_{LS}$ and $t_{EF}$ form a minimum time interval. We refer to this procedure as Algorithm MININT. To show that MININT indeed produces a minimum-length valid interval, we present Theorem 2. The proof follows from Lemmas 3 and 4. The complete algorithmic description of MININT and proof of Theorem 2 can be found in [28].
Theorem 2: Given a set of jobs, $\mathcal{J}$, and a constant speed, $s^*$, Algorithm MININT finds a minimum length interval $T = [t_s, t_f]$ needed to complete every job $J_k \in \mathcal{J}$ at the speed $s^*$ by its deadline.
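A possible shape for MININT is sketched below. The full algorithm is given in [28] and is not reproduced here, so this is an assumption-laden illustration: the latest start time is computed as $\min_i (d_i - W_i/s^*)$ (a reconstruction from the proof of Lemma 4), and the "plane sweep" is realized as an event-driven simulation of preemptive EDF starting at $t_{LS}$. The tuple layout and the name `min_interval` are hypothetical.

```python
import heapq
from typing import List, Tuple

Job = Tuple[float, float, float]  # (release, deadline, cycles); assumed layout


def min_interval(jobs: List[Job], speed: float) -> Tuple[float, float]:
    """Return (t_LS, t_EF) for running `jobs` at constant `speed` under EDF."""
    # Latest start time: most restrictive d_i - W_i / speed over all jobs.
    t_ls = min(d_i - sum(c for _, d, c in jobs if d <= d_i) / speed
               for _, d_i, _ in jobs)

    pending = sorted(jobs)      # ordered by release time
    ready: list = []            # min-heap of (deadline, remaining cycles)
    t, i, n = t_ls, 0, len(pending)
    while i < n or ready:
        # Admit every job already released at time t.
        while i < n and pending[i][0] <= t:
            _, d, c = pending[i]
            heapq.heappush(ready, (d, c))
            i += 1
        if not ready:
            t = pending[i][0]   # idle until the next release
            continue
        d, c = heapq.heappop(ready)
        finish = t + c / speed
        next_release = pending[i][0] if i < n else float("inf")
        if finish <= next_release:
            t = finish          # job completes before anything new arrives
        else:
            # Run until the next release, then re-evaluate EDF priorities.
            done = (next_release - t) * speed
            heapq.heappush(ready, (d, c - done))
            t = next_release
    return t_ls, t
```

With jobs `(0, 20, 4)` and `(6, 8, 2)` at speed 1, the sketch returns the interval [6, 12], whose length equals the total workload, so no shorter interval could complete both jobs at that speed.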
MININT can be readily incorporated into TOEDF to improve its energy performance. In TOEDF, when a violation occurs and jobs are merged with those executed in the previous critical interval, we can apply MININT to find the shortest interval within which every job deadline is satisfied. This typically results in increased energy savings.
C. Discrete Voltage/Frequency Levels
Until now, we have assumed that the processor speed can be continuously varied. However, current commercial DVS processors [1] - [3] only have a finite number of speeds. This factor must be integrated into voltage-scheduling algorithms to provide a valid and energy-efficient schedule.
One intuitive way to deal with discrete speeds is to round up the required frequency and voltage to some allowed level. Unfortunately, this can be extremely pessimistic and energy inefficient, especially for many commercial processors with only a few voltage levels available [2] . A better approach is proposed in [18] and [29] that can use the two levels immediately above and below the desired voltage/speed value to optimally schedule a single job. Although theoretically optimal, this approach is not practically applicable on real processors because excessive voltage transitions, roughly two per job, are introduced. This, coupled with the omission of time overhead, will cause jobs to miss their deadlines.
To consider both time overhead and discrete speeds simultaneously, we believe that it is more advantageous to incorporate the discrete speed effects into the construction of critical intervals and let it propagate to future critical interval construction. Specifically, after a critical interval is identified in Algorithm 2, its speed is increased to the next available level. Recall that when a higher-than-necessary speed is applied, we can use MININT to find the minimal interval needed for the given job set and speed. However, when increasing the speed to the higher level, it can introduce a significant amount of unused idle time, even after we apply algorithm MININT to find a minimal length interval. A better method to utilize these idle times to save energy is to relax the requirement that all jobs originally found in the critical interval must run at the higher speed (note that Lemma 3 only holds when the speed increase is due to a violation). Therefore, we keep only one of these busy intervals (intervals without idle time) for the final voltage schedule, with the expectation that the rest of the jobs may benefit from the higher-than-necessary speed assignment for this interval and can execute at a lower speed. One problem is how to select which busy interval to keep. A good choice can lead to low computation cost and higher energy efficiency. There are a number of heuristics, such as always selecting the first, the last, the shortest, or the longest busy intervals. Though each of these approaches has its intuitive advantages, none of them dominates the others in our experiments. This is due to the many patterns of job arrival times, deadlines, and execution cycles. Therefore, we simply choose the first busy interval as it is the most computationally convenient. We refer to this procedure as Algorithm DISCRETE. A detailed description of DISCRETE can be found in [28] .
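The round-up step at the start of this procedure, and the idle time it creates, can be illustrated with a short sketch. This is not Algorithm DISCRETE itself (see [28]); it only shows the speed-matching step, assuming normalized speed levels and a critical interval described by its length and required speed. The level set in the usage note is a hypothetical normalization of 500 MHz to 1 GHz in 100 MHz steps.

```python
import bisect
from typing import List, Tuple


def next_available_speed(required: float, levels: List[float]) -> float:
    """Smallest available speed level that is >= the required speed."""
    ordered = sorted(levels)
    i = bisect.bisect_left(ordered, required)
    if i == len(ordered):
        raise ValueError("required speed exceeds the maximum level")
    return ordered[i]


def idle_after_round_up(length: float, required: float,
                        levels: List[float]) -> Tuple[float, float]:
    """Return (chosen speed, idle time) when an interval of `length` that
    needs `required` speed is run at the next available level instead."""
    chosen = next_available_speed(required, levels)
    busy = length * required / chosen   # same work, done at a higher speed
    return chosen, length - busy
```

For instance, with levels `[0.5, 0.6, 0.7, 0.8, 0.9, 1.0]`, an interval of length 100 requiring speed 0.62 is assigned 0.7 and leaves roughly 11.4 time units idle, which is the slack the procedure tries to pass on to the remaining jobs.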
D. Transition-Energy Overhead
So far, we have ignored energy overhead in our voltage scheduling algorithms. Similar to time overhead, we account for energy overhead while constructing the critical intervals. This allows the effect to propagate throughout the schedule. Our approach works as follows: when a new critical interval, Ti is identified, whether or not this critical interval is kept depends on whether or not the energy consumed by adopting its speed is smaller than that consumed by merging it with an adjacent critical interval, Tj . (The term adjacent is defined in Section II-A). The idea is that if the energy transition overhead is so significant that using different processor speeds for different intervals consumes more energy than using a single speed, we can simply merge adjacent critical intervals by adopting the largest speed among them for every job and then removing the associated voltage transitions. Again, MININT is used to find a minimum-length interval. We refer to this procedure as Algorithm ENERGY_OH and refer the readers to [28] for more details.
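The keep-or-merge decision described above can be illustrated with a simplified energy comparison. This is only a sketch under strong assumptions and not Algorithm ENERGY_OH (see [28]): it compares two adjacent intervals, charges a single transition energy `delta_e` when their speeds differ, ignores idle and sleep energy, and takes the power function as a caller-supplied argument. All names are hypothetical.

```python
from typing import Callable


def execution_energy(work: float, speed: float,
                     power: Callable[[float], float]) -> float:
    """Energy to execute `work` cycles at `speed`: power times busy time."""
    return power(speed) * work / speed


def should_merge(work_a: float, speed_a: float,
                 work_b: float, speed_b: float,
                 delta_e: float,
                 power: Callable[[float], float]) -> bool:
    """True if running both intervals at the larger speed (no transition)
    costs less than keeping two speeds and paying `delta_e` once."""
    separate = (execution_energy(work_a, speed_a, power)
                + execution_energy(work_b, speed_b, power)
                + delta_e)
    top = max(speed_a, speed_b)
    merged = execution_energy(work_a + work_b, top, power)
    return merged < separate
```

With an assumed cubic power function `lambda s: s ** 3`, intervals of 100 cycles at speed 1.0 and 10 cycles at speed 0.9 are worth merging when the transition costs 5 energy units, but not when it costs 0.5, since the merged schedule only spends about 1.9 extra units on execution.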
E. Unified Algorithm
By combining the techniques from Sections III-B to III-D with Algorithm 2, a valid voltage schedule with superior energy savings is produced while accounting for practical limitations of real-world DVS processors, including time and energy overhead and discrete voltage levels. We call this unified algorithm UAEDF (see Algorithm 3). Algorithm 3 follows the same general flow as Algorithm 2. First, it identifies the next critical interval assuming any speed is valid (line 6) and matches the next valid speed using DISCRETE (line 7). Then, it removes monotonicity or execution violations and shortens the critical interval with MININT (lines 9-12). ENERGY_OH is used to minimize the impact of energy overhead (line 17). Finally, as in TOEDF, the newly generated interval is "squeezed" into one single time point (line 19) and one iteration of the algorithm is completed. Theorem 3 guarantees the correctness of Algorithm 3. The proof follows directly from Theorems 1 and 2 and is thus omitted. A detailed proof can be found in [28].
Algorithm 3: UAEDF
Theorem 3: Algorithm 3 always produces a valid voltage schedule with a time complexity of $O(n^3)$.
IV. EXPERIMENTAL RESULTS
In this section, we quantify the impact of transition overhead and discrete voltage levels and evaluate the energy savings of our proposed algorithms with both randomly generated job sets and real-world examples. We compare our algorithms against Sys-Clock, presented in [19] , modified to schedule EDF job sets rather than RM task sets. As far as we know, Sys-Clock is the only previous work that can include an arbitrarily large time overhead and still guarantee a valid voltage schedule.
In our experiments, we base our power model on the AMD Mobile Athlon4 DVS processor [2]. The Mobile Athlon4 can run at voltage levels in the range of 1.2-1.4 V with 50 mV steps and corresponding frequencies of 500 MHz to 1 GHz with 100 MHz steps. For experiments with more than five voltage levels, we interpolate these performance points using a second-order polynomial. The minimum speed/voltage for all experiments is 500 MHz at 1.2 V. Power is modeled using the equation $P = C_{SW} f_{op} V_{DD}^2$. The value of $C_{SW}$ is set to 12.75 nF, based on information from AMD's datasheet [2]. The CPU power ranges from 9.2 to 25 W. The system includes a low-power "sleep" state that the processor can utilize when idle. We assume that it takes $\Delta t/2$ time to enter the sleep state, and another $\Delta t/2$ to exit. The power consumed while in the sleep state is 2.4 W.
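As a quick check of the power model, the following sketch evaluates $P = C_{SW} f_{op} V_{DD}^2$ at the two extreme operating points quoted above. The constant and function names are illustrative; only the formula, the value of $C_{SW}$, and the operating points come from the text.

```python
C_SW = 12.75e-9  # switched capacitance in farads (12.75 nF, from the text)


def cpu_power(freq_hz: float, vdd: float) -> float:
    """Dynamic CPU power in watts: P = C_SW * f_op * V_DD^2."""
    return C_SW * freq_hz * vdd ** 2


# Lowest and highest operating points quoted in the text.
p_min = cpu_power(500e6, 1.2)   # about 9.18 W
p_max = cpu_power(1e9, 1.4)     # about 24.99 W
```

These values agree with the stated CPU power range of 9.2 to 25 W.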
Time overhead is modeled by a constant value in the range of 0 (no time overhead) to 5 ms. Note that while most DVS processors have a time overhead in the range of tens to hundreds of microseconds, these numbers may be misleading as they ignore synchronization delay with off-chip components such as memory, which can be quite large. The Compaq iPAQ, for example, requires 20 ms to synchronize with its main memory after a frequency switch [27]. Energy overhead is modeled using $\Delta E = \Delta E_{DC} + \Delta E_{CPU}$, where $\Delta E_{DC}$ is the energy consumed by the dc-dc converter, and $\Delta E_{CPU}$ is the energy consumed by the CPU during a transition. $\Delta E_{DC} = \eta C_{DD} |V_{DD1}^2 - V_{DD2}^2|$ with $C_{DD} = 100\ \mu$F and $\eta = 0.9$ as presented by Burd in [30]. $\Delta E_{CPU} = k \Delta t$, where $k = 2.4$ W is the power consumed in the stop-grant state of the AMD processor (entered during a transition) according to [2].
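The energy-overhead model can be evaluated directly. The sketch below assumes the converter term is $\eta C_{DD} |V_{DD1}^2 - V_{DD2}^2|$, which is the form consistent with the worked value of 0.0468 mJ given later in this section; the constant and function names are illustrative.

```python
ETA = 0.9        # converter efficiency factor (from the text)
C_DD = 100e-6    # converter capacitance in farads (100 uF, from the text)
K_STOP = 2.4     # stop-grant power in watts (from the text)


def transition_energy(v1: float, v2: float, delta_t: float) -> float:
    """Transition energy in joules: dc-dc converter term plus CPU term."""
    e_dc = ETA * C_DD * abs(v1 ** 2 - v2 ** 2)
    e_cpu = K_STOP * delta_t
    return e_dc + e_cpu


# Largest voltage swing (1.4 V <-> 1.2 V) with the CPU term excluded.
e_dc_max = transition_energy(1.4, 1.2, 0.0)      # about 4.68e-5 J
# Same swing with a 100 microsecond transition interval.
e_total = transition_energy(1.4, 1.2, 100e-6)    # about 2.868e-4 J
```

For short transition intervals the two terms are comparable, while for millisecond-scale intervals the CPU term $k \Delta t$ dominates.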
We first construct and test 100 randomly generated sets of 20 jobs each. The jobs are assumed to always require the worst-case execution cycles to complete. Execution cycles and release times are uniformly distributed between [0, 800] and [0, 1000] $\mu$s, respectively. The relative deadlines of the jobs are normally distributed with an average of 810 $\mu$s and a standard deviation of 280 $\mu$s. We then schedule each job set using Sys-Clock, TOEDF, and UAEDF on a processor with $\infty$, 14, 5, and 2 discrete voltage levels. The energy from each schedule was normalized against the optimal LPEDF schedule without transition overhead on a continuous processor, i.e., we use LPEDF as the lower bound on energy. The results are summarized in Fig. 3. Next, we apply our algorithms to three real-world examples: 1) a computerized numeric controller (CNC) task set based on the work by Kim et al. in [31]; 2) an Avionics task set based on Locke's work in [32]; and 3) a video phone task set based on the work by Shin et al. in [11]. Each task set is first converted to a job set by unrolling the tasks to the set's respective system hyperperiod. These results are given in Figs. 4-6. From the experimental results, one can immediately conclude that TOEDF and UAEDF always outperform Sys-Clock in terms of energy efficiency. For example, in Fig. 3(b), when the time overhead is 0.5 ms and the number of voltage levels is 14, TOEDF and UAEDF outperform Sys-Clock by as much as 30% and 39%, respectively. Moreover, our experiments also show the significant improvement of UAEDF over TOEDF, especially with increased time overhead and fewer discrete voltage levels. In Fig. 3(b), at low time overhead, i.e., between 0 and 200 $\mu$s, TOEDF and UAEDF are within 2% of each other. However, when the time overhead is 0.9 ms, UAEDF outperforms TOEDF by around 13%. Also, at two levels [Fig. 3(c)], the difference becomes approximately 25%.
The same conclusion can be drawn from our practical application experiments as shown in Figs. 4-6. Compared to the random job sets, it is interesting to note that the energy consumption does not increase monotonically with time overhead. This is because for a particular job set, some values of $\Delta t$ may be "nicer" than others, as the impact of $\Delta t$ depends on the timing parameters of the given jobs.
Finally, we would like to point out an important aspect not illustrated in our figures, i.e., the energy reduction from including and excluding the energy overhead part of UAEDF. In our experiments the difference was minimal (less than 1% variation). There are three main reasons for this result. First, our algorithms already handle time overhead by reducing the number of transitions, which naturally reduces the impact of energy overhead as well. Second, the energy overhead that is not directly dependent on $\Delta t$, i.e., that incurred by the dc-dc converter (maximally $(0.9)(100 \times 10^{-6})|1.4^2 - 1.2^2| = 0.0468$ mJ), is less than 1% of the energy consumed by a 1 ms interval executing at even the minimum power level (9.2 mJ). Typically critical intervals are longer than 1 ms. Third, assuming that avoiding a transition to eliminate an instance of energy overhead does save energy, the savings are balanced by the execution of some cycles at a higher speed instead of a lower one. This is particularly true for fewer discrete voltage levels.
V. SUMMARY
In this paper, we study the impact that practical limitations of current DVS processors can have on the energy consumed in a hard real-time system. These limitations include time and energy overhead and discrete voltage levels. We have shown through examples and analysis that transition time overhead can cause a theoretically feasible off-line schedule to become invalid if not correctly accounted for during the scheduling process.
We present two algorithms, TOEDF and UAEDF, which always produce valid voltage schedules in the presence of time overhead for a real-time job set theoretically schedulable according to EDF. Our experiments demonstrate that these two algorithms can outperform the previous approach by up to 39%. TOEDF, built on LPEDF [12], constructs a feasible schedule by addressing monotonicity and execution violations that occur when time overhead is introduced. UAEDF enhances TOEDF by reducing the length of intervals in which a high voltage is needed. In addition, it takes discrete voltage levels and energy overhead into consideration, giving UAEDF increased energy-saving potential. Our experiments show that, while UAEDF has comparable energy savings to TOEDF when the time overhead is small, it can greatly improve energy performance, by up to 25%, when the time overhead becomes significant. This is particularly appealing in the face of rapidly elevated operation frequencies in today's commercial processors.
Currently, the optimality of our algorithms is not guaranteed, so further algorithm development may improve results even more. Also, for the scheduling process to give a practical voltage schedule for an even wider range of systems, we need to account for other implementation details, such as context switching overhead, different transition models and support for other priority schemes, such as fixed-priority scheduling.
