As mobile computing is getting popular, there is an increasing interest in techniques that can minimize energy consumption and prolong the battery life of mobile devices. Processor voltage scheduling is an effective way to reduce energy dissipation by reducing the processor speed. In this paper, we study voltage scheduling for real-time periodic tasks with non-preemptible sections. Three schemes are proposed to address this problem. The static speed algorithm derives a static feasible speed based on the Stack Resource Policy (SRP). As worst-case blocking does not always occur, the novel dual speed algorithm switches the processor speed to a lower value whenever possible. The dynamic reclaiming algorithm deploys a reservation-based approach to reclaim unused run time for redistribution. It effectively decreases the processor idle time and further reduces the processor speed. The feasibility conditions are given and proved. Simulation results show that the two dynamic algorithms can reduce processor energy consumption by up to 80 percent over the static speed scheme.
Introduction
For convenience and ease of access, more and more personal computing and communication devices are becoming portable and mobile. These include laptop computers, pocket PCs and PDAs. Most of them are powered by batteries with limited power capacity. Some recent portable systems are equipped with powerful processors that were available on desktop computers only a few years ago. The performance boost comes at the cost of higher energy consumption. On the other hand, as battery technology has not been keeping up with the speed of the processors, the limited battery power has become a major concern. How to conserve power and prolong battery life is of critical importance and has received much attention recently [1, 2, 3].
Work exists on evaluating the power consumption of portable computing systems [4, 5]. It was found that the display (including backlight), processor, hard disk and wireless LAN card account for most of the power consumption [4, 5].
In particular, the processor may consume up to 25% of the system power for laptop computers [4]. This percentage is even higher for PDAs since hard disks are not used. If the power consumed by the processor is reduced, the power consumption of the whole system is also substantially reduced.
As the processor may not be fully utilized all the time, the variation in system load can be exploited to reduce power dissipation. The processor can be turned off or made to operate at a lower speed when the system has little or no work to do. Some processors (e.g., the Crusoe processor of Transmeta [6] and the Intel StrongARM processor [7]) allow the processor voltage to be dynamically adjusted. This is called dynamic voltage scaling (DVS). The relationship among power consumption rate (P), supply voltage ($V_s$) and clock frequency (f) can be described by the following formula [8]:
$$P = C \cdot V_s^2 \cdot f \qquad (1)$$
where C is the switched capacitance. Furthermore, as the operating frequency f (and therefore the processor speed) is approximately proportional to the supply voltage [9], the power consumption rate is roughly proportional to the cube of the supply voltage. It follows that if we decrease the processor speed by lowering the supply voltage, the power consumption will be reduced. Voltage scaling is particularly useful for real-time applications because a longer computation time is acceptable as long as the deadlines are not violated. With voltage scaling, the job scheduler must make two types of decisions: which job to run and at what speed to run it. In this paper, we shall refer to job scheduling with voltage scaling simply as "voltage scheduling".
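The cubic relationship can be illustrated with a short calculation. The sketch below assumes, as in the text, that speed is proportional to supply voltage and normalizes the maximum speed to 1; the function names are illustrative only and do not come from the paper.

```python
def relative_power(speed: float) -> float:
    """Power relative to full speed, assuming P = C * Vs^2 * f
    with both f and Vs proportional to the normalized speed."""
    return speed ** 3


def relative_energy(speed: float) -> float:
    """Energy needed to finish a fixed amount of work, relative to full speed.
    The work takes 1/speed times longer, so energy scales as speed^3 / speed."""
    return relative_power(speed) / speed


if __name__ == "__main__":
    for s in (1.0, 0.8, 0.5):
        print(f"speed={s:.1f}  power={relative_power(s):.3f}  "
              f"energy={relative_energy(s):.3f}")
```

Under these assumptions, halving the speed cuts the power rate to one eighth, but because the same work takes twice as long, the energy for that work falls to one quarter.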
Much work has been done on voltage scheduling with the objective of minimizing processor energy consumption [1, 2, 3, 9, 10, 11, 12]. Most of this work assumes the tasks are preemptible. However, in reality, tasks may have regions that are not preemptible. We call these regions blocking sections. An example is when the task is holding some critical resources or is in the middle of an atomic transaction. Assuming fully preemptible tasks in this case may cause deadline misses or result in incorrect computation.
In this work, we consider voltage scheduling of real-time periodic tasks with blocking sections. We show how to calculate a static speed by which all admitted tasks can be feasibly scheduled with minimal energy. A dynamic dual speed algorithm is also proposed to run the processor at an even lower speed in some intervals. Furthermore, as the actual execution time usually differs from the declared worst-case execution time (WCET), a reservation-based scheme which dynamically reclaims processor time for redistribution is proposed to further reduce energy consumption.
The rest of the paper is organized as follows: The system models are introduced in Section 2. Section 3 gives the feasibility conditions and formulas for the static speed algorithm. Section 4 presents two dynamic voltage scheduling algorithms. Performance evaluation results are presented in Section 5. Section 6 reviews related research on voltage scheduling. Finally, Section 7 concludes this paper.
System Model

Task Model
Real-time periodic tasks are considered in this paper. A periodic task is a sequence of jobs released at constant intervals (called the period). We denote the set of tasks by $\mathcal{T}$.
Each task $T_i \in \mathcal{T}$ is characterized by four parameters:
• $A_i$: time the task is first released.
• $D_i$: relative deadline of the task.
• $P_i$: period of the task.
• $E_i$: worst-case execution time (WCET) of any job in the task.
In this paper, we assume the relative deadline of a task is equal to its period. Each job in a task can be considered as a processing request. It is associated with an absolute deadline by which the job should be completed. We say a job meets the deadline if the job is completed at or before its deadline, and it misses the deadline otherwise. In this paper, we assume hard real-time tasks, i.e., there should be no deadline miss. A task $T_i = (J_{i,1}, J_{i,2}, J_{i,3}, \ldots, J_{i,n})$ consists of n jobs, where job $J_{i,j}$ is characterized by its release time $r_{i,j}$, the execution time $e_{i,j}$ ($\le E_i$) and the absolute deadline $d_{i,j}$. The execution time is defined as the time required to process the job at the processor's maximum speed. Furthermore, jobs are preemptible except when they are running in their blocking sections. A job can have zero, one or more blocking sections, and $G_i$ denotes the length of the longest blocking section of any job in task $T_i$. The positions of the blocking sections are randomly distributed within a job except that they are non-overlapping.
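The task model can be captured in a small data structure. The following is a minimal sketch, assuming jobs are indexed from 1 as in $J_{i,1}, J_{i,2}, \ldots$ and that the relative deadline equals the period; the class and field names are illustrative and not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """A periodic task with relative deadline equal to its period."""
    release: float             # A_i: time the task is first released
    period: float              # P_i (= D_i in this task model)
    wcet: float                # E_i: worst-case execution time at maximum speed
    max_blocking: float = 0.0  # G_i: longest blocking section of any job

    def job_release(self, j: int) -> float:
        """Release time r_{i,j} of the j-th job (j starts at 1)."""
        return self.release + (j - 1) * self.period

    def job_deadline(self, j: int) -> float:
        """Absolute deadline d_{i,j} of the j-th job."""
        return self.job_release(j) + self.period


if __name__ == "__main__":
    t = Task(release=0.0, period=10.0, wcet=2.0, max_blocking=0.5)
    print(t.job_release(3), t.job_deadline(3))
```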
Processor Model
The processor is capable of dynamic voltage scaling and its speed is proportional to the supply voltage. The maximum and minimum possible supply voltages are denoted as $V_{max}$ and $V_{min}$, respectively, while the corresponding processor speeds are $S_{max}$ and $S_{min}$, respectively. The processor voltage can be adjusted at discrete steps within the range. Throughout this paper, we assume the processor's maximum speed is 1 and all other speeds are normalized with respect to the maximum speed. As observed in [2], the voltage transition delay is very short. We therefore assume the voltage transition cost is negligible and the voltage can be adjusted at any time (whether inside or outside a blocking section). We also assume the processor power follows formula (1), which in our case can be simplified to $P = K \cdot V_s^3$,
where K is a constant.
Static Blocking-aware Voltage Scheduling
In this section we show how to find a static voltage/speed setting to minimize energy consumption in a system consisting of periodic tasks with blocking sections. In the static scheme, the processor voltage is changed only when a new task arrives or when an existing task terminates.
The Stack Resource Policy (SRP) was proposed by Baker to schedule tasks with shared resources [13]. The core idea is that a job is allowed to preempt a lower priority job only if all the resources it needs are available. The feasibility condition of the SRP was also derived and is listed in Theorem 1:
Theorem 1 [13] Suppose n periodic tasks are sorted by their periods. They are schedulable by the earliest deadline first (EDF) algorithm with the SRP if
$$\forall k,\ 1 \le k \le n: \quad \sum_{i=1}^{k} \frac{E_i}{D_i} + \frac{B_k}{D_k} \le 1,$$
where $B_i$ is the maximum length of time for which a job in $T_i$ can be blocked.
In voltage scheduling, if the SRP is used with EDF [14], the processor speed can be reduced according to Theorem 2.
Theorem 2 Suppose n periodic tasks are sorted by their periods. They are schedulable by EDF with the SRP at processor speed H if
$$\forall k,\ 1 \le k \le n: \quad \sum_{i=1}^{k} \frac{E_i}{P_i} + \frac{B_k}{P_k} \le H. \qquad (2)$$
Note that we have replaced D with P in the formula since they have the same value in our task model.
Proof: Note that scheduling a task set $\mathcal{T}$ at processor speed H is equivalent to scheduling a task set $\mathcal{T}^*$ at the maximum processor speed, where the execution times and resource holding times of $\mathcal{T}^*$ are 1/H times the corresponding values in $\mathcal{T}$. Hence, the above inequality can be rewritten as
$$\forall k,\ 1 \le k \le n: \quad \sum_{i=1}^{k} \frac{E_i/H}{P_i} + \frac{B_k/H}{P_k} \le 1.$$
Moreover, the maximum blocking time a job in $\mathcal{T}^*$ may encounter is also scaled up by a factor of 1/H. According to Theorem 1, $\mathcal{T}^*$ is schedulable by the SRP at full processor speed (=1), so the original task set $\mathcal{T}$ is schedulable at speed H.
Comparing with (2), it is easy to see $L \le H$. We refer to L as the utilization speed, or simply the "low speed". In contrast, the static speed H (or "high speed") calculated in Section 3 ensures a job will not miss its deadline even if worst-case blocking occurs. We propose a dual speed algorithm that allows the processor to operate at speed L whenever possible. In the dual speed algorithm, if a job blocks a higher priority job, then during the lifetime of the low priority job (i.e., until its deadline), the processor must run at speed H. In all other situations the processor may run at the low speed L. The Dual Speed (DS) algorithm is formally presented in Figure 1. If the system is executing in a high speed interval (i.e., the system is running at speed H), then End_H in the algorithm indicates the time point at the end of that interval. Otherwise End_H = -1. If there is no work to do, the processor will enter the idle state. Note that with discrete speed levels, the Set_Speed function in the algorithm sets the processor speed to the lowest speed level that is greater than or equal to the speed specified in the parameter.
[Figure 1: The Dual Speed (DS) algorithm. H and L are recomputed by the static speed algorithm as a task joins or leaves the system. Initially the processor speed is L and End_H = -1.]
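The behaviour described above can be sketched as a small state machine. This is only an illustration built from the prose description, not a transcription of Figure 1: the class name, the method names `on_blocking` and `on_tick`, and the 0.1 speed step are assumptions made for the example.

```python
import math


class DualSpeed:
    """Sketch of the dual speed policy: run at L unless a blocking job forces H."""

    def __init__(self, high: float, low: float, step: float = 0.1):
        self.high = high
        self.low = low
        self.step = step        # spacing of discrete speed levels (assumed)
        self.end_h = -1.0       # -1 means "not in a high speed interval"
        self.speed = 0.0
        self.set_speed(low)     # initially the processor speed is L

    def set_speed(self, requested: float) -> float:
        """Pick the lowest discrete level that is >= the requested speed."""
        level = math.ceil(round(requested / self.step, 9)) * self.step
        self.speed = min(1.0, round(level, 9))
        return self.speed

    def on_blocking(self, blocker_deadline: float) -> None:
        """The running job blocks a higher priority job: stay at H until
        the deadline of the blocking (low priority) job."""
        self.end_h = max(self.end_h, blocker_deadline)
        self.set_speed(self.high)

    def on_tick(self, now: float) -> None:
        """Leave the high speed interval once its end has been reached."""
        if self.end_h != -1.0 and now >= self.end_h:
            self.end_h = -1.0
            self.set_speed(self.low)


if __name__ == "__main__":
    ds = DualSpeed(high=0.55, low=0.5)
    ds.on_blocking(blocker_deadline=12.0)
    print(ds.speed, ds.end_h)
    ds.on_tick(12.0)
    print(ds.speed, ds.end_h)
```

Rounding before taking the ceiling guards against floating-point noise such as 0.3 / 0.1 evaluating to slightly less than 3.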
In the dual speed scheme, the high speed and the low speed need to be recalculated only when a task joins or leaves the system. By maintaining two task lists respectively sorted by the tasks' periods and maximum blocking lengths, the calculation of H and L can be done in O(n) time, where n is the number of tasks. After H and L have been calculated, both procedures in Figure 1 only take O(1) time. As H and L are not frequently changed, the overhead of the algorithm is minimal.
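As an illustration of how H and L might be computed, the sketch below takes L as the total utilization and H as the largest left-hand side of the speed condition over all k. It assumes that the blocking term of a task is the longest blocking section among tasks with strictly longer periods, which matches the bound used in the feasibility proof of the dual speed algorithm. For clarity it uses a simple quadratic scan rather than the O(n) scheme with two sorted lists; the function name and tuple layout are illustrative.

```python
from typing import List, Tuple


def static_speeds(tasks: List[Tuple[float, float, float]]) -> Tuple[float, float]:
    """tasks: (period P_i, WCET E_i, longest blocking section G_i).
    Returns (H, L) as normalized speeds, before rounding to discrete levels."""
    ordered = sorted(tasks, key=lambda t: t[0])      # sort by period
    low = sum(e / p for p, e, _ in ordered)          # utilization speed L
    high = 0.0
    util = 0.0
    for p_k, e_k, _ in ordered:
        util += e_k / p_k                            # sum of E_i / P_i for i <= k
        # Assumed blocking bound: longest blocking section of any task
        # with a strictly longer period than P_k.
        b_k = max((g for p, _, g in ordered if p > p_k), default=0.0)
        high = max(high, util + b_k / p_k)
    return high, low


if __name__ == "__main__":
    H, L = static_speeds([(10, 2, 0.0), (20, 4, 1.0), (50, 5, 3.0)])
    print(f"H={H:.2f}  L={L:.2f}")
```

For the three example tasks, the binding case is k = 2: the prefix utilization is 0.4 and the blocking term is 3/20, so H is 0.55 while L is 0.5.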
An example is given in Figure 2 to illustrate the difference between the static speed algorithm and the dual speed algorithm. The up and down arrows denote the arrival times and deadlines of the jobs, respectively. The white boxes indicate job execution intervals, while the shaded boxes indicate blocking sections. The two jobs in $T_1$ both need 1 time unit to finish at the high speed H, while the job in $T_2$ needs 4 time units. Suppose the low speed L is half the value of the high speed. Under the static speed algorithm, the processor runs at speed H throughout, while under the dual speed algorithm, the processor runs at speed L before time point 4, at which blocking occurs (Figure 2b). Based on (1), the dual speed algorithm would save 25 percent of energy compared to the case of static speed.
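Although the dual speed algorithm reduces the processor speed, the feasibility of the task set is still maintained. The following theorem guarantees that if a task set is schedulable by the static speed algorithm, it is also schedulable by the dual speed algorithm.
Theorem 3 Suppose n periodic tasks are sorted by their periods. They are schedulable by the dual speed algorithm if
$$\forall k,\ 1 \le k \le n: \quad \frac{\max\{G_j \mid P_k < P_j\}}{P_k} + \sum_{i=1}^{k} \frac{E_i}{P_i} \le H \qquad (3)$$
$$\sum_{i=1}^{n} \frac{E_i}{P_i} \le L \qquad (4)$$
Proof: We prove the theorem by contradiction.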
Suppose the claim is false and t is the earliest time that a job misses its deadline. We find another time t' before t which is the latest time point such that no active job that arrived before t' has a deadline at or before t. Let X = t - t'. We consider two cases, depending on whether a job $J_k$ whose deadline is later than t executes in (t', t]. In the first case, no such job executes, and M denotes the set of jobs that arrive at or after t' and have deadlines at or before t. Because all tasks are periodic, the processor demand generated by the jobs in M is bounded by $\sum_{i=1}^{n} \lfloor X/P_i \rfloor \cdot E_i$. On the other hand, as the processor is never idle and its speed is greater than or equal to L in the entire interval, the processor demand the processor can handle is at least $L \cdot X$. Since a job misses its deadline at time t by assumption, the total processor demand in the interval must exceed what the processor can handle in the same interval. Therefore
$$\sum_{i=1}^{n} \left\lfloor \frac{X}{P_i} \right\rfloor \cdot E_i > L \cdot X.$$
Since $\lfloor X/P_i \rfloor \le X/P_i$, this implies $\sum_{i=1}^{n} E_i/P_i > L$, which contradicts (4).
In the second case, the processor demand during (t', t] is larger than that in the first case due to the execution of $J_k$. However, since $J_k$ will no longer be executed once it leaves its blocking section, the total time $J_k$ can execute during (t', t] cannot exceed its longest blocking section. Suppose $J_k$ belongs to task $T_m$. Then the time it can execute is bounded by $G_m$. Therefore the total processor demand is bounded by $G_m + \sum_{i=1}^{l} \lfloor X/P_i \rfloor \cdot E_i$, where $P_l$ is the longest period that is smaller than or equal to X. As the deadline of $J_k$ is later than t, the processor must have been operating at speed H throughout the whole interval (t', t]; therefore the amount of work processed is $X \cdot H$. If there is a deadline miss at time t, the processor demand must be greater than the work processed, i.e.,
$$G_m + \sum_{i=1}^{l} \left\lfloor \frac{X}{P_i} \right\rfloor \cdot E_i > X \cdot H.$$
Since $P_l < P_m \Rightarrow G_m \le \max\{G_j \mid P_l < P_j\}$, and since $X \ge P_l$ and $\lfloor X/P_i \rfloor \le X/P_i$, we have
$$\frac{\max\{G_j \mid P_l < P_j\}}{P_l} + \sum_{i=1}^{l} \frac{E_i}{P_i} > H,$$
which contradicts (3).
□
The basic dual speed algorithm can be extended to further reduce energy consumption. This is achieved by shortening the lengths of high speed intervals. In the extension, a high speed interval can be terminated at once if one of the following conditions occurs: i) a job whose deadline is later than or equal to End_H starts execution; ii) the processor becomes idle. Because these two situations can never occur in the middle of the interval (t', t] as shown in the proof of Theorem 3, the feasibility of the task set is not impaired.
In the dual speed algorithm, the processor speed is always higher than or equal to the utilization speed, which suggests room for further speed reduction. Moreover, the utilization speed is calculated based on the tasks' WCETs, but the actual processing demand is often lower. When a job completes early, the processor would have some idle time. If this portion of time can be redistributed to the other pending jobs, the processor speed can be further reduced. In this section, we present a reservation-based scheme that dynamically collects the residue time from early completions. As the algorithm is an extension to the dual speed algorithm, we call it the Dual Speed Dynamic Reclaiming (DSDR) algorithm. Aydin et al. [1] proposed a similar approach for a fully preemptible environment, but their algorithm is not applicable when blocking sections are present.
First, we need to define the run time of a job and distinguish it from the job's execution time. The run time (denoted by R) can be viewed as a budget assigned to a job.
It specifies the wall clock time that can be used to process a job and is consumed as the job executes. The run time of a job has a deadline which is set equal to the job's deadline for reclaiming purposes. The execution time (denoted by E) describes the time needed to complete the job under the maximum processor speed. Given the run time and the execution time of a job, the speed at which the processor should operate is $S = E \cdot S_{max} / R$, where $S_{max}$ is the maximum processor speed.
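The speed formula is easy to check numerically. The sketch below is illustrative only; the function name and the clamp to the maximum speed are assumptions added for the example.

```python
def job_speed(residual_wcet: float, run_time: float, s_max: float = 1.0) -> float:
    """Speed S = E * S_max / R needed to finish E units of work
    (measured at maximum speed) within R units of wall clock time."""
    if run_time <= 0:
        raise ValueError("run time must be positive")
    # Clamp to s_max: an assumption, since the processor cannot run faster.
    return min(s_max, residual_wcet * s_max / run_time)


if __name__ == "__main__":
    print(job_speed(2.0, 4.0))        # own run time only
    print(job_speed(2.0, 4.0 + 1.0))  # one extra unit of reclaimed run time
```

With 2 units of work and 4 units of run time the processor can run at half speed; one extra unit of run time lowers the required speed to 0.4.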
We first introduce a reservation-based extension to the dual speed algorithm in which run time is reserved for each job. In order to maintain the same feasibility conditions as in the dual speed algorithm, we need to allocate each job the same amount of run time as it will get under the dual speed algorithm. We take the processor speed determined by the dual speed algorithm (H or L) as the base speed for a job and allocate run time to each job so it can complete its WCET at the base speed before the run time is depleted. When a job arrives, it is assigned an initial run time assuming speed L, which is the optimal allocation if no task is blocked. Note that the processor is never idle if all jobs execute at their WCETs and at speed L. However, since the base speed cannot be determined until the first time the job is selected for execution, the actual run time of the job may be adjusted at that time. As the job executes, it consumes run time and it must complete before its run time is depleted.
The FRT-list, which holds the free run time, is sorted in increasing order of the deadlines. The free run time comes from two sources. The first source is the residue run time when a job completes. The deadline of the reclaimed run time is set equal to the deadline of the completed job. A job may also contribute run time to the FRT-list if it starts to execute (for the first time) in a high speed interval. In this case, because the job's base speed is H, its actual run time assigned could be less than its initial run time allocated when it arrived. The difference in run time is taken from the job and is inserted into the FRT-list. The deadline of this free run time is set to the end of the high speed interval. In this way, DSDR effectively reclaims unused run time for redistribution, which in turn reduces the processor idle time and leads to decreased processor speed.
When a job is scheduled to run, it is eligible to use its own run time as well as the run time in the FRT-list with deadline earlier than or equal to the job's deadline. With the additional free run time, the job can be processed at a lower speed. If the run time in an item of the FRT-list is depleted, the item is removed.
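A deadline-ordered free run time list of this kind can be sketched as follows. The class and method names are illustrative, and the choice to consume the earliest-deadline items first is an assumption made for the example rather than something stated in the text.

```python
import bisect
from typing import List


class FRTList:
    """Free run time items kept sorted by increasing deadline."""

    def __init__(self) -> None:
        self._items: List[List[float]] = []   # each item is [deadline, amount]

    def add(self, deadline: float, amount: float) -> None:
        """Insert reclaimed run time that expires at the given deadline."""
        if amount > 0:
            bisect.insort(self._items, [deadline, amount])

    def usable(self, job_deadline: float) -> float:
        """Run time a job may use: items with deadline <= the job's deadline."""
        return sum(a for d, a in self._items if d <= job_deadline)

    def consume(self, job_deadline: float, amount: float) -> float:
        """Consume up to `amount` from eligible items, earliest deadline first
        (assumed order). Returns the run time actually consumed."""
        used = 0.0
        for item in self._items:
            if item[0] > job_deadline or used >= amount:
                break
            take = min(item[1], amount - used)
            item[1] -= take
            used += take
        # Depleted items are removed from the list.
        self._items = [it for it in self._items if it[1] > 0]
        return used


if __name__ == "__main__":
    frt = FRTList()
    frt.add(10, 2.0)
    frt.add(5, 1.0)
    frt.add(20, 4.0)
    print(frt.usable(10))
    print(frt.consume(10, 1.5), frt.usable(10), frt.usable(20))
```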
Before we formally present the algorithm, we first introduce the following notations used in the algorithm:
• $J_i$: the current job of task $T_i$.
• $R_i^r(t)$: the available run time of job $J_i$ at time t.
• $R_i^F(t)$: the run time in the FRT-list that can be used by job $J_i$ at time t.
• $E_i^r(t)$: the worst-case residue execution time of job $J_i$ under the maximum speed $S_{max}$ at time t.
The core of the DSDR algorithm is given in Figure 3. H, L and End_H in the figure have the same meanings as in the dual speed algorithm. Note that the processor speed is calculated according to the usable run time $R_i^r(t) + R_i^F(t)$ and the worst-case residue execution time $E_i^r(t)$ except when the current job is blocking another job. In the latter case, the processor speed is always set at H to reduce the blocking time. To complete the DSDR algorithm, the following rules are used to update the run time and the worst-case residue execution time of a job, and the run time in the FRT-list:
Rule 3: If a new task arrives at time t, then the run time of all jobs is set to $E_i^r(t)/H'$, where H' is the newly calculated high speed due to the task arrival. The FRT-list is cleared.
Note that these rules do not need to be carried out at every time unit. Instead, Rule 1 is applied only when the current job ($J_i$) completes, blocks another job, or is preempted, and Rule 2 is used only if a new job arrives when the processor is idle.
Based on the above discussions, the following lemmas can be proved.
Lemma 1 When a task set is scheduled by DSDR, no job will deplete its run time before it completes.
The proof should be clear from the DSDR algorithm (Figure 3) and the three updating rules above. Note that a job may use the run time in the FRT-list before consuming its own allocated run time.
Proof: We prove the lemma by contradiction.
Suppose the claim is not true. Let t be the earliest instant that the run time of a job or in the FRT-list is not depleted at its deadline. We choose another time $t_1$ before t, which is the latest time point such that no pending job that arrived before $t_1$ has a deadline at or before t, and the FRT-list does not contain any item whose deadline is at or before t. If such a time point does not exist, let $t_1 = 0$. Constructed in this way, the processor never stops consuming run time throughout the interval $(t_1, t]$. Furthermore, the run time consumed during the interval is either generated after $t_1$ and has a deadline before t (denoted as run time A), or has a deadline after t (denoted as run time B).
We consider two cases. In the first case, only the run time in A is consumed. Note that run time is only generated on job releases. Let $Y = t - t_1$. The total amount of run time in A is bounded by $\sum_{i=1}^{n} \lfloor Y/P_i \rfloor \cdot E_i/L$. As there is still run time left at time t, the amount of run time in A must be greater than the run time consumed in the interval, which is Y. Therefore
$$\sum_{i=1}^{n} \left\lfloor \frac{Y}{P_i} \right\rfloor \cdot \frac{E_i}{L} > Y,$$
which contradicts (6).
In the second case, the run time in both A and B is consumed in the interval. We choose a third time $t_2$ which is the latest time point before t such that the deadline of the run time being consumed at $t_2$ is after t. Since run time in B is consumed in the interval, $t_2$ must exist and $t_1 < t_2 < t$. The scenario is illustrated in Figure 4.
$$\frac{G_b}{H} + \sum_{i=1}^{l} \left\lfloor \frac{Z}{P_i} \right\rfloor \cdot \frac{E_i}{H},$$
where $G_b$ is the length of the longest blocking section in $J_b$. As there is still residue run time at time t whose deadline is at t and the processor keeps consuming run time throughout the period, we have
$$\frac{G_b}{H} + \sum_{i=1}^{l} \left\lfloor \frac{Z}{P_i} \right\rfloor \cdot \frac{E_i}{H} > Z. \qquad (7)$$
Because $Z \ge P_l$ and $Z/P_i \ge \lfloor Z/P_i \rfloor$, (7) can be rewritten as
$$\frac{G_b}{P_l} + \sum_{i=1}^{l} \frac{E_i}{P_i} > H,$$
which contradicts (5).
equal to the end of the current high speed interval, the high speed interval is terminated at once. Note that this extension does not affect the proof of Lemma 2, so the conclusions in Lemma 2 and Theorem 4 are still valid.
Performance Evaluation
Simulation experiments were carried out to evaluate the effectiveness of the proposed algorithms in saving energy. In this section, we first describe the assumptions used and the characteristics of the task sets. The simulation results are then presented and analyzed.
Simulation Setup
The event-driven simulator imitates the processing of periodic tasks on a single processor that is capable of voltage/speed scheduling. We assume a maximum processor speed (=1) and a minimum processor speed (=0.1). Speed levels between the two bounds are discrete and spaced by 0.1. The supply voltage is proportional to the processor speed and the power consumption follows formula (1). We also assume the processor does not consume power when it is in the idle state. The simulation experiments consisted of three phases: task generation, admission control, and job scheduling and execution.
To simulate the mixed workload in real systems, we generated tasks whose periods belonged to one of three ranges: long period (1000~5000 ms), middle period (100~1000 ms) and short period (20~100 ms). The WCETs of the tasks in the three categories were (1~1000 ms), (1~100 ms) and (1~20 ms), respectively. The tasks were uniformly distributed in these categories. Within each category, the tasks' periods and WCETs were randomly selected from the corresponding ranges. After the task set was generated, the WCETs of the tasks were scaled such that the total processor utilization would not exceed the desired value, which was specified by the utilization factor. The blocking factor specified the maximum percentage the blocking section could occupy in a job's execution time, so the maximum length of any blocking section of a task was WCET × blocking_factor.
All tasks generated must undergo an admission control procedure in the order of their release times. We used the formula in Section 3 to calculate a static processor speed H. If the required speed was smaller than the maximum processor speed, the new task was admitted; otherwise the task was dropped without processing.
Finally, the admitted tasks periodically generated jobs. The slack factor specified the difference between the actual job execution time and the task's WCET. Specifically, the execution times of the jobs in a task were uniformly distributed between the task's WCET and (1 - slack_factor) × WCET. Each job could have at most one blocking section, and we used blocking_prob to represent the probability that a job would be assigned a blocking section. The position of the blocking section was randomly chosen. Released jobs were stored in the job queue and processed in EDF order. The voltage scheduling algorithms presented in this paper were used to determine the processor speed for the currently executing job.
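The task generation step might look like the sketch below. It is an illustration of the described setup rather than the simulator used in the experiments: the function names are hypothetical, and scaling the WCETs so that the total utilization exactly equals the utilization factor is an assumption (the text only requires that it not be exceeded).

```python
import random
from typing import List, Tuple

# (period range in ms, WCET range in ms) for long, middle and short periods
CATEGORIES = [
    ((1000, 5000), (1, 1000)),
    ((100, 1000), (1, 100)),
    ((20, 100), (1, 20)),
]


def generate_tasks(n: int, utilization: float, blocking_factor: float,
                   rng: random.Random) -> List[Tuple[float, float, float]]:
    """Return n tasks as (period, WCET, maximum blocking length)."""
    raw = []
    for _ in range(n):
        (p_lo, p_hi), (e_lo, e_hi) = rng.choice(CATEGORIES)  # uniform over categories
        raw.append((rng.uniform(p_lo, p_hi), rng.uniform(e_lo, e_hi)))
    total = sum(e / p for p, e in raw)
    scale = utilization / total     # assumed: hit the target utilization exactly
    return [(p, e * scale, e * scale * blocking_factor) for p, e in raw]


def sample_execution_time(wcet: float, slack_factor: float,
                          rng: random.Random) -> float:
    """Actual execution time, uniform in [(1 - slack_factor) * WCET, WCET]."""
    return rng.uniform((1 - slack_factor) * wcet, wcet)


if __name__ == "__main__":
    rng = random.Random(0)
    tasks = generate_tasks(30, utilization=0.6, blocking_factor=0.2, rng=rng)
    print(len(tasks), round(sum(e / p for p, e, _ in tasks), 6))
```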
Experimental Results
Extensive simulations were conducted with the three proposed algorithms (see footnote 4). In each experiment, we generated a task set of 30 tasks. All tasks were released at time 0. The experiment was carried out for 100,000 ms and the energy consumption was recorded. In order to improve accuracy, we performed simulations on 10 distinct task sets for each set of parameters and took the average. The static speed algorithm was used as the baseline and the power consumptions of the other two algorithms were normalized with respect to the baseline.
Blocking Parameters
In the first set of simulations, we evaluated the effect of the blocking sections on power consumption. The blocking parameters blocking_prob and blocking_factor were varied to produce task sets with different numbers of blocking sections and different maximum blocking lengths. In these experiments, we let all jobs execute at their WCETs. The performance would improve if the actual execution time were less.
We first fixed blocking_prob to 0.5 and varied blocking_factor between 0 and 0.3. The range of the blocking factor was chosen for two reasons: i) blocking sections are usually short in practical applications; ii) as the blocking factor exceeded 0.3, the number of tasks not admitted increased quickly, which affected the accuracy of the results. Figure 5 shows the normalized energy consumption of the dual speed algorithm and the DSDR algorithm under utilization factors 0.4 and 0.6, respectively. As the maximum blocking length increased with the blocking factor, the value of the high speed H also increased. The energy consumed by all three algorithms grew with H, but the energy consumption of the two dynamic algorithms grew more slowly because these two algorithms switched to low speed modes in some intervals. As a result, the dual speed algorithm saved up to 70 percent of the energy consumed under the static algorithm. Since DSDR utilized the run time reclaimed in high speed intervals, it saved more energy. However, the difference is not significant. As observed in the experiments, blocking occurred only rarely and the durations of high speed intervals were very short, which left very little room for DSDR to reclaim. Another observation is that when the blocking factor was smaller than 0.09, the discrete values of the high and low speeds coincided; consequently, all three algorithms consumed the same amount of energy.
Footnote 4: Only the results of the extended dual speed algorithm and the extended DSDR, rather than the basic versions, are presented, since the extended algorithms outperformed the basic algorithms in all cases.
Similarly, we varied blocking_prob with blocking_factor fixed at 0.2. As shown in Figure 6, the energy consumed by the dual speed algorithm increased with the blocking probability. However, the observed actual blocking rate was very low, so the normalized energy consumption increased by less than 10 percent. In DSDR, as a job executing in a high speed interval can use the run time in the FRT-list to compensate for its run time loss due to run time reclamation (see Section 4.2), the impact on energy consumption was even smaller (less than 4 percent).
Slack Factor
In this set of simulations, we fixed blocking_factor and blocking_prob to 0.2 and 0.5, respectively. Figure 7 shows the simulation results for slack factor between 0 and 0.9. The dual speed algorithm was insensitive to the variation of the slack factor. Again, DSDR far outperformed the dual speed algorithm due to its ability to reclaim unused run time from early completions. For example, at utilization 0.4 and slack factor 0.5, DSDR saved 70 percent while the dual speed algorithm saved only about 50 percent of the power consumed under the static speed algorithm.
Processor Utilization
Simulation experiments were also performed at different system load levels. Again we fixed blocking_factor and blocking_prob at 0.2 and 0.5, respectively. Two sets of experiments were carried out with the slack factor set at 0.5 and 0.8, respectively. The utilization factor was varied between 0.1 and 1.0 (see Figure 8). The normalized energy consumptions remained relatively constant when the utilization was low but grew as the utilization exceeded 0.6. It turned out the growth was due to task dropping in the admission control phase. Tasks with large blocking sections or small periods were more likely to be dropped since they would significantly increase H. As a result, H - L began to shrink, which reduced the energy saving of the dynamic algorithms. In all cases, DSDR consistently outperformed the dual speed algorithm.
In summary, both dynamic voltage scheduling algorithms outperformed the static speed algorithm under all parameter settings. Taking advantage of the dynamic reclaiming mechanism, DSDR saved even more energy than the dual speed algorithm, especially when the actual execution times were less than their WCETs.
Work in this area can be classified into two categories: interval-based scheduling [3, 10, 12] and profile-based scheduling [1, 2, 9, 11].
Interval-based scheduling estimates the processor utilization based on observations in the past interval(s). For example, the PAST algorithm records the processor utilization in the previous interval. If the utilization exceeded the upper threshold, the supply voltage is incremented; if the utilization was below the lower threshold, the supply voltage is decremented [12]. The weighted-AVG algorithm [10] makes use of the processor utilization in the previous interval as well as the average utilization in all past intervals. Thus prediction is made based on both short-term and long-term system behaviors. Simulations using real-life traces were also carried out to evaluate the performance of these algorithms. More recently, Lorch et al. [3] proposed to use distributions to estimate the processing requirements of future tasks from the recent behaviors of similar tasks. They suggested using the gamma model for its low complexity and good predictive capability.
Interval-based algorithms assume the workload is more or less stable (or follows some distribution) so that the behaviors of future tasks can be accurately estimated based on past observations. The accuracy in predicting future tasks significantly affects the possible energy reduction. This kind of strategy is not suitable for real-time tasks. As time constraints are not considered, the algorithms may improperly lower the processor speed and cause deadline misses, especially when the execution demands vary greatly from job to job.
Profile-based scheduling is usually used to schedule real-time tasks. Some knowledge of the released jobs, and to some extent of the future jobs in the current task set, is assumed to be known. Yao et al. [11] derived an optimal offline scheduling algorithm for aperiodic real-time tasks. An online approximation algorithm was also presented in the same paper. Hong et al. [9] used a similar planning-based approach to handle non-preemptible tasks. The traditional real-time periodic task model is used in some recent work. Pillai and Shin [2] studied both fixed priority and dynamic priority scheduling and proposed schemes under the rate monotonic (RM) algorithm and the earliest-deadline-first (EDF) algorithm individually. Their look-ahead algorithm takes the processing requirements of future jobs into consideration and delays processing of these jobs as much as possible. Aydin et al. [1] proved there is an optimal static voltage that can feasibly schedule periodic tasks with minimum power consumption. They also proposed a dynamic algorithm which uses the slack of the jobs that complete early.
All of the above work assumed that tasks were either fully preemptible or completely non-preemptible. Scheduling tasks with blocking sections usually requires a higher processor speed than the speed needed for fully preemptible tasks (i.e., the utilization speed). Our approaches utilize this fact and allow the processor to operate at the utilization speed or lower at strategic intervals without impairing the feasibility of the task set.
Conclusion
In this paper, we have investigated voltage scheduling of real-time periodic tasks with non-preemptible blocking sections. Three voltage scheduling schemes have been proposed to minimize energy consumption while satisfying the tasks' time constraints. The static speed scheme is based on the Stack Resource Policy (SRP) [13] and calculates a minimal feasible static speed. Instead of always operating at one static speed, the dual speed algorithm lowers the processor speed to the utilization speed in some intervals. We have also presented a reservation-based scheme which reserves run time for each job. A reclaiming mechanism is used to collect the unused run time and redistribute it to jobs that are able to make use of it. As the jobs that receive extra run time are eligible to run for a longer period of time, the processor speed can be further reduced to save energy.
