Abstract-In this paper, we focus on the pinwheel task model with a variable voltage processor with d discrete voltage/speed levels. We propose an intra-task DVS algorithm, which constructs a minimum energy schedule for k tasks in O(d+k log k) time. We also give an inter-task DVS algorithm with O(d+n log n) time, where n denotes the number of jobs. Previous approaches solve this problem by generating a canonical schedule beforehand and adjusting the tasks' speed in O(dn log n) or O(n^3) time. However, the length of a canonical schedule depends on the hyperperiod of the task periods and is of exponential length in general. In our approach, tasks with arbitrary periods are first transformed into tasks with harmonic periods, and their key features are then profiled. Afterward, an optimal discrete voltage schedule can be computed directly from those features.
INTRODUCTION
In the last decade, energy-aware computing has become widespread not only for portable and mobile devices powered by batteries, but also for large systems in which the cost of energy consumption and cooling is substantial. With dynamic voltage scaling (DVS) techniques [1, 11, 12, 14-16, 27], processors are capable of performing at a range of voltages and frequencies. Since energy consumption is at least a quadratic function of the supply voltage (hence CPU speed), the total energy consumption could be minimized by sharing slack time while satisfying the time constraints of the tasks.
For DVS hard real-time systems, two categories of algorithms are used: inter-task DVS [1, 11, 21, 24] and intra-task DVS [5, 15, 25]. In the former case, speed assignments are determined at task dispatch or completion times. In other words, when an instance (job) of a task is assigned to a CPU, the CPU speed is not changed until the job is preempted or completes. Inter-task scheduling algorithms are often implemented under operating system control, and programs do not need to be modified during their runtime. In contrast, intra-task DVS algorithms adjust the CPU speed within the boundaries of a given task. Intra-task DVS techniques are under software and compiler control and use program checkpoints or voltage scaling points of the target real-time software. Therefore, they exploit all the slack time from the run-time variations of different execution paths, and the CPU speed is gradually increased to assure the timely completion of real-time tasks. However, checkpoints have to be generated at compile time and indicate places in the code where the processor speed and voltage should be re-calculated. They could increase the complexity of programming and the overhead of real-time systems.
Many theoretical models for DVS only consider a convex power consumption function [1, 21, 26, 27]. In these models, the processor must be able to run at any real-valued speed level in order to achieve optimality. In general, an off-the-shelf processor with variable voltages runs only at a finite number of speed levels. For example, the Intel SpeedStep® technology [28] and AMD Cool'n'Quiet® [29], which are currently used in general-purpose mobile and handheld devices, support 3 and 5 speed levels, respectively. Therefore, an applicable model for DVS scheduling should capture the discrete, rather than continuous, nature of the available speed scale.
Several works have addressed the problem of task scheduling and min-energy DVS scheduling. Yao et al. [26] proposed a theoretical DVS model and an O(n^3) algorithm for computing a min-energy DVS schedule on a CPU with continuously variable voltage. Ishihara et al. [9] proposed an optimal voltage allocation technique using a discrete variable voltage processor. However, the optimality of the technique is confined to a single task. Kwon et al. [6] proposed an optimal discrete approach, which is based on the continuous version in [26] and therefore requires O(n^3) time. The recent result proposed by Li et al. [15] gives an O(dn log n) time algorithm, which constructs a minimum energy schedule without first computing the optimal continuous schedule. The min-energy DVS scheduling algorithms mentioned above have to generate certain schedules in advance as an intermediate step. For example, the algorithm Bipartition in [15] has to generate an s-schedule and a reversed s-schedule in advance. Moreover, algorithm Alloc-vt in [14] has to generate a min-energy continuous schedule from [26] beforehand. Since the lengths of such schedules depend on the LCM of the task periods, these algorithms cannot be completed in polynomial time. Moreover, in periodic task systems, the preprocessing overhead produced by these approaches may become very severe when tasks join and leave the system frequently.
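To illustrate why a schedule length tied to the LCM of the task periods is a concern, the following Python sketch compares the hyperperiod of a hypothetical set of five pairwise coprime integer periods with that of a harmonic set. The period values and the helper name `hyperperiod` are illustrative assumptions and are not taken from the cited works.

```python
from functools import reduce
from math import gcd

def hyperperiod(periods):
    """Least common multiple of a list of positive integer periods."""
    return reduce(lambda a, b: a * b // gcd(a, b), periods, 1)

arbitrary = [7, 11, 13, 17, 19]   # hypothetical, pairwise coprime periods
harmonic = [4, 8, 8, 16, 16]      # hypothetical harmonic periods, each no longer than above

print(hyperperiod(arbitrary))  # 323323
print(hyperperiod(harmonic))   # 16
```

With harmonic periods the hyperperiod equals the longest period, whereas with coprime periods it is the product of all of them.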
In real-time applications, broadband 3G (B3G) wireless communication systems provide a packet-switched core network to support broadband wireless multimedia services. The resource management policies in a cell of a B3G system must guarantee the quality-of-service (QoS) of real-time (RT) traffic. To guarantee the QoS of RT traffic in a cell, many researchers [2, 13, 18, 19] proposed pinwheel scheduling algorithms to reduce the jitter of variable bit rate (VBR) traffic. In other applications, such as the medium access control (MAC) layer of CDMA and TDMA-based wireless networks [10, 22, 23], many pinwheel scheduling schemes have been proposed for solving frame-based packet scheduling problems. These pinwheel methods provide low delay and low jitter for RT traffic and short queue lengths for non-RT traffic.
In a network system, jitter is the variation in the time between packet arrivals, which is caused by network congestion, timing drift, or route changes [16]. In a periodic task schedule, a task's jitter is often caused by the interference of other tasks. A jitterless schedule means that the inter-arrival times of successive instances (jobs) of a task are identical. For example, under delay and jitter control in ATM (Asynchronous Transfer Mode) networks, a multimedia stream requires QoS guarantees that include an end-to-end delay of 100 ms, and bandwidth should be allocated appropriately to guarantee that delay. In many real-time applications, tasks must be executed in a distance-constrained manner, rather than just periodically. C.-W. Hsueh et al. proposed the Sr algorithm [6], which transforms the periods of pinwheel tasks into harmonic periods that are shorter than or equal to the original periods. Moreover, D.-R. Chen et al. [4] showed that each task in this type of schedule has constant relative beginning, finishing, and preemption times; such a schedule therefore provides good predictability and allows for offline scheduling optimization.
This paper discusses theoretical power-aware real-time scheduling. We consider a discrete DVS scheduling problem for periodic task systems with given worst-case execution times (WCET). We propose an algorithm that finds a min-energy intra-task schedule in O(d+k log k) time. We also give an inter-task scheduling algorithm that runs in O(d+n log n) time, where k, n, and d denote the number of tasks, jobs, and voltage levels, respectively. Notably, our approaches are off-line and are truly polynomial-time algorithms, which is achieved by the following three phases.
(1) We prove that any slack time can be shared among the tasks with transformed periods. (2) We compute the speed s_c (s_1 ≥ s_c ≥ s_d) at which the total utilization of the tasks first becomes greater than or equal to 1. (3) Given the speed s_c, we compute the features of every task, such as its relative beginning and finishing times, and we adjust the speed of the tasks to prevent missing a deadline.
The rest of the paper is organized as follows: in Section 2, we give the model and the notational conventions. Section 3 presents the properties of a jitterless schedule and the algorithm that generates a pinwheel schedule. The DVS algorithms are proposed in Section 4. In Section 5, we present the performance analyses of the proposed algorithms and compare the utilization of transformed task sets with those of their original task sets. Section 6 concludes this paper.
TASK MODEL
Pinwheel task systems were first motivated by the performance requirements of satellite-based communications [6]. A pinwheel task T_i is defined by two positive integers, an execution requirement a and a window length b, with the interpretation that T_i needs to be allocated the shared resource for at least a out of every b consecutive time units. Additionally, pinwheel scheduling is also applied in channel assignment policies with buffering and preemptive priority for RT traffic. In many RT applications, tasks must be executed in a distance-constrained manner [17, 20], rather than just periodically. For example, in wireless sensor network applications, a multi-sensor assessment task is invoked either periodically or is triggered by certain events. This assessment process may have one or more input tasks for collecting data from different sensors. Similar requirements exist in phased-array radar systems, where the dwell tasks collect device data and recognize the properties of aircraft within certain end-to-end deadlines [8]. In distance-constrained task systems (DCTS), the temporal distance between any two consecutive executions of a task should always be less than a certain value. In DCTS, the distance constraints of pinwheel tasks are transformed by algorithm Sr [6] into periods that are 2^n multiples of a shorter base period [7, 8] and that are not longer than the original distance constraints.
The advantage of the period transformation is that the produced schedules have regular start, preemption, and finish times, and therefore provide good predictability.
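The idea of rounding periods down to harmonic values can be sketched in Python as follows. This is a simplified specialization in the spirit of Sr, not the algorithm of [6]: the choice of candidate bases, the function names, and the example task set are all assumptions made for illustration. Each period is rounded down to `base * 2**j`, and the base that yields the smallest resulting utilization is kept.

```python
import math

def specialize(periods, base):
    """Round each period down to the largest base * 2**j that does not exceed it."""
    return [base * 2 ** math.floor(math.log2(p / base)) for p in periods]

def candidate_bases(periods):
    """Assumed candidates: each period scaled down by a power of two into (p_min/2, p_min]."""
    p_min = min(periods)
    return sorted({p / 2 ** math.ceil(math.log2(p / p_min)) for p in periods})

def harmonize(tasks):
    """tasks: list of (e, p). Returns (utilization, harmonic_periods) with minimal utilization."""
    periods = [p for _, p in tasks]
    best = None
    for base in candidate_bases(periods):
        new_periods = specialize(periods, base)
        u = sum(e / q for (e, _), q in zip(tasks, new_periods))
        if best is None or u < best[0]:
            best = (u, new_periods)
    return best

# Hypothetical task set: (execution time, period)
u, periods = harmonize([(1, 4), (1, 9), (1, 17)])
print(periods, u)  # [4.0, 8.0, 16.0] 0.4375
```

Because every period can only shrink, the utilization after the transformation is never smaller than the original one; the sketch simply picks the base for which the increase is smallest among the candidates it tries.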
We focus our attention on synchronous, preemptive, and periodic task systems. In the task set τ={T_1, …, T_k} of k periodic real-time tasks, every task T_i consists of an infinite sequence of jobs j_{i,1}, j_{i,2}, …. A task T_i with a WCET requirement e_i and a period p_i has the weight w_i=e_i/p_i, where 0<w_i<1. A feasible schedule must give each job its WCET between the arrival time r_i and the deadline d_i. In the task model, we assume that every task period p_i and deadline has been transformed into harmonic form and that the tasks have been sorted according to their periods, p_1 ≤ p_2 ≤ … ≤ p_k. Because the schedule is jitterless, the relative beginning time b_i and finishing time f_i of T_i are fixed and can be obtained efficiently, as described in Section 3.
Denote by s_1 > s_2 > … > s_d the clock speeds corresponding to the d given discrete voltage levels. The highest speed s_1 is always fast enough to guarantee a feasible schedule for the given tasks. Moreover, e i and . For simplicity, U_τ denotes the total weight of τ at the highest speed. The power P, or energy consumed per unit time, is a convex function of the processor speed. The energy consumed by the processor during the time interval [t_1, t_2] is

$$E(t_1, t_2) = \int_{t_1}^{t_2} P(s(t))\,dt,$$

where s(t) denotes the processor speed at time t. We refer to this problem as discrete DVS scheduling (abbreviated to DVS-intra-task). The first goal is to find, for any given task set τ, a feasible schedule produced by DVS-intra-task that minimizes E. In the inter-task version, every job has only one speed during its execution. The second goal is to generate inter-task schedules (abbreviated to DVS-inter-task) whose energy consumption is as close as possible to that of the schedule produced by DVS-intra-task.
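To make the energy objective concrete, the following Python sketch evaluates the energy of a piecewise-constant speed schedule. The cubic power function P(s) = s^3 and the assumption that an idle processor (speed 0) consumes no power are illustrative choices only; the model above requires just that P be convex.

```python
def energy(segments, power=lambda s: s ** 3):
    """Energy of a piecewise-constant speed schedule.

    segments: iterable of (t_start, t_end, speed).
    power:    assumed convex power function of the speed.
    """
    return sum(power(s) * (t_end - t_start) for t_start, t_end, s in segments)

# Both schedules complete 3 units of work within a window of 6 time units.
fast_then_idle = [(0, 3, 1.0), (3, 6, 0.0)]   # run at full speed, then idle
steady = [(0, 6, 0.5)]                        # spread the work over the whole window

print(energy(fast_then_idle))  # 3.0
print(energy(steady))          # 0.75
```

Under a convex power function, spreading the same work over the available slack at a lower speed consumes less energy than running fast and idling, which is why both goals above aim to leave no slack unused.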
JITTERLESS SCHEDULE
In this section, we introduce the concept of an h-schedule produced by Sr and present its important properties. Algorithm Sr [6] converts the periods into a set of special periods that are not greater than the original periods, while minimizing the increase in total task utilization. For example, in Figure 1 
Speed selection
In this subsection, we prove that any slack time in a jitterless schedule with harmonic task periods can be allocated to all jobs. By using this property, we can provide a single speed for a schedule to minimize both the energy consumption and the number of speed adjustments.
Definition 1.
An h-schedule for τ is a schedule that conforms to the RM policy and in which the task periods have been transformed into harmonic periods by using Sr [6].
Without loss of generality, the length of an h-schedule is equal to p k . Notably, as long as the utilization of the task set after transformation is less than or equal to 1, the task set can be feasibly scheduled. 
This contradicts our assumption. □
Definition 2.
In the h-schedule for τ, we define a deadline as being tight if task T_i finishes exactly at its deadline d_i.
Theorem 2.
In an h-schedule for τ, slack time exists if and only if no job in the schedule misses its deadline and no job has a tight deadline.
Proof. For the "only if" direction, we prove by contradiction. Case 1: (k=1) When an h-schedule contains only one task which has a tight deadline, all of its jobs must be finished exactly at their deadlines. Therefore, no slack time exists in the schedule. Without loss of generality, T_k has the lowest priority in τ and we derive U_τ ≥ 1. According to Lemma 1, this contradicts our assumption. The "if" direction is easy to see. Suppose the h-schedule for τ contains no slack time. Without loss of generality we only discuss the jobs performed in interval I. The total execution time of these jobs can be written as: . Since interval I contains no slack, we have:
When U_τ > 1, the h-schedule is not feasible and at least one job misses its deadline. When U_τ = 1, the latest job in interval I must have a tight deadline. This completes the proof. □
Based on Theorem 2, as long as an h-schedule executing at a constant speed misses a deadline or has a tight deadline, there is no wasted slack time in the schedule. We give the following definition of an eligible CPU speed for the h-schedule:
Definition 3.
In an h-schedule for τ, we define the critical speed s_c to be the highest speed such that, when all tasks execute at speed s_c, the total utilization U_τ^c satisfies U_τ^c ≥ 1. The purpose of s_c, produced by CriticalSpeed(τ) in Figure 2, is to rapidly find a suitable h-schedule and to reduce the number of speed adjustments.
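The search for the critical speed can be sketched in Python as a single pass over the speed levels. This is a minimal illustration and not the CriticalSpeed(τ) listing of Figure 2: the function name, the input format, and the assumption that execution time scales inversely with speed (so that the utilization at speed s equals the utilization at s_1 multiplied by s_1/s) are all introduced here for the example.

```python
def critical_speed(tasks, speeds):
    """Return (index, speed) of the highest speed whose scaled utilization is >= 1.

    tasks:  list of (e, p), with e measured at the highest speed speeds[0].
    speeds: speed levels in strictly decreasing order.
    Returns None if the utilization stays below 1 even at the lowest speed.
    """
    u_max = sum(e / p for e, p in tasks)      # utilization at the highest speed
    for c, s in enumerate(speeds):            # O(d) scan from fastest to slowest
        if u_max * speeds[0] / s >= 1.0:      # assumed inverse scaling with speed
            return c, s
    return None

# Hypothetical task set and speed levels
tasks = [(1, 4), (2, 8)]                      # utilization 0.5 at full speed
print(critical_speed(tasks, [1.0, 0.75, 0.5, 0.25]))  # (2, 0.5)
print(critical_speed(tasks, [1.0, 0.75]))             # None
```

The index returned here is zero-based, so index 2 corresponds to the third speed level. The speed one level above the returned one is the slowest level at which the schedule still has slack.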
Task execution profiling
After computing s c , we can derive the beginning and finishing times of every task under speed s c without constructing an actual h-schedule. In Figure 3 
Example. Consider a 3-task system τ={T_1, T_2, T_3} with e_1=0.8, e_2=1.1, e_3=1.6, p_1=3.25, p_2=6.5 and p_3=13. The CPU has three speed levels: s_1=1, s_2=0.8 and s_3=0.5. The h-schedule with the highest speed is shown in Figure 4. The schedule with s_c is illustrated in Figure 4(b). In fact, we can obtain the task features by using Algorithm 2, instead of producing the whole schedule. Therefore, we have b_1=0, f_1=b_2=1.6, f_2=b_3=5.4 and f_3>12.
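The task features in this example can be cross-checked with a brute-force Python simulation of the rate-monotonic schedule over one period p_3 = 13 at speed 0.5. This is deliberately not Algorithm 2, which avoids building the schedule; the simulator, its names, and the inverse scaling of execution time with speed are assumptions for illustration. The value e_2 = 1.1 is used, which is the value consistent with the reported f_2 = 5.4.

```python
EPS = 1e-9

def profile_first_jobs(tasks, speed, horizon):
    """Simulate preemptive RM scheduling and return [(b_i, f_i)] for each task's first job.

    tasks: list of (e, p) sorted by period (index 0 has the highest priority).
    f_i is None when the first job does not finish within the horizon.
    """
    k = len(tasks)
    demand = [e / speed for e, _ in tasks]    # execution time of one job at this speed
    backlog = demand[:]                       # all first jobs are released at time 0
    next_release = [p for _, p in tasks]
    done = [0.0] * k                          # total time executed per task so far
    begin, finish = [None] * k, [None] * k
    t = 0.0
    while t < horizon - EPS:
        t_next = min(min(next_release), horizon)
        ready = [i for i in range(k) if backlog[i] > EPS]
        if ready:
            i = ready[0]                      # shortest period runs first
            if begin[i] is None:
                begin[i] = t
            run = min(backlog[i], t_next - t)
            backlog[i] -= run
            done[i] += run
            t += run
            if finish[i] is None and done[i] >= demand[i] - EPS:
                finish[i] = t
        else:
            t = t_next                        # idle until the next release
        for i, (_, p) in enumerate(tasks):
            if next_release[i] <= t + EPS:    # release the next job of task i
                backlog[i] += demand[i]
                next_release[i] += p
    return list(zip(begin, finish))

tasks = [(0.8, 3.25), (1.1, 6.5), (1.6, 13.0)]
for b, f in profile_first_jobs(tasks, speed=0.5, horizon=13.0):
    print(b, f)
```

Up to floating-point rounding, the simulation gives b_1 = 0, f_1 = b_2 = 1.6, and f_2 = b_3 = 5.4, and the first job of T_3 is still unfinished at time 13, in line with the values stated in the example.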
SCHEDULING ALGORITHMS
Before presenting the algorithms, we introduce the following notation, which is used throughout this section. For an h-schedule under speed s_c, we define: Figure 5 presents an off-line DVS algorithm for the intra-task schedule, which minimizes the 
Proposed algorithm for the intra-task schedule

Theorem 3.
An h-schedule for task set τ consumes the minimum energy using the DVS-intra-task(τ) algorithm.
Proof: An h-schedule with minimum energy consumption does not contain any idle period. In our scheme, the energy consumption is determined by the total amount of time the processor runs at speeds s_c and s_{c-1}, respectively. According to lines 7 and 8 of DVS-intra-task(τ), the resulting h-schedule does not contain any idle period. Moreover, because of the convexity of the power/speed curve, wide-gapped speed adjustments hamper energy savings [26]. Consequently, DVS-intra-task(τ) adjusts the speed by at most one speed level when an h-schedule under speed s_c misses a deadline. Clearly, this speed adjustment minimizes the energy consumption of the h-schedule, and the theorem follows. □
The algorithm in Figure 5 runs in O(d+k log k) time, where k denotes the number of tasks in τ.
In the algorithm, the input periods have to be transformed in advance by Sr [6], which runs in O(k log k) time, and Algorithm 1 (in line 3) needs O(d) time for finding a critical speed. Notably, the time complexity of the algorithm in [15] is O(dn log n), where n denotes the number of jobs. In the task model, the number of jobs is far greater than the number of tasks. Therefore, our scheme outperforms their algorithm, even if we ignore the overhead of producing the whole schedule in their method.
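The proof of Theorem 3 relies on running only at the two adjacent speeds s_{c-1} and s_c with no idle time. The Python sketch below shows how the time at each speed follows from that requirement; it is not a reproduction of the algorithm in Figure 5, and the function name and the way work is measured (in time units at speed 1) are assumptions. The numbers use the earlier 3-task example with e_2 = 1.1 assumed, whose total work over one period p_3 = 13 is 4(0.8) + 2(1.1) + 1.6 = 7.0.

```python
def split_between_adjacent_speeds(work, length, s_hi, s_lo):
    """Return (time_at_s_hi, time_at_s_lo) that completes `work` in exactly `length`.

    Solves  t_hi * s_hi + t_lo * s_lo = work  and  t_hi + t_lo = length.
    work is measured in time units at speed 1.
    """
    if not (s_lo * length <= work <= s_hi * length):
        raise ValueError("work does not fit between the two speeds")
    t_hi = (work - s_lo * length) / (s_hi - s_lo)
    return t_hi, length - t_hi

t_hi, t_lo = split_between_adjacent_speeds(work=7.0, length=13.0, s_hi=0.8, s_lo=0.5)
print(round(t_hi, 4), round(t_lo, 4))  # 1.6667 11.3333
```

In this instance the processor would spend about 1.67 time units at speed 0.8 and the remaining 11.33 at speed 0.5, filling the whole period without idling.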
Proposed algorithm for the inter-task schedule
In an h-schedule produced by DVS-inter-task(τ), every job executes at a single speed. However, the decision for speed adjustment is similar to the knapsack problem with indivisible objects of identical value. More precisely, given a target amount of time derived from the h-schedule, we have to decide which jobs can be placed into the interval in such a way that the sum of their execution times is greater than this target and the difference between the two is minimal. Unfortunately, the optimization algorithm for this decision problem runs in pseudo-polynomial time [3]. In other words, the time complexity depends on the length of an output schedule.
The objective of our algorithm is to avoid the fluctuation of execution speeds as much as possible. Thus, the proposed method has to know the task features produced by Algorithm 2. In order to choose successive jobs whose speed is to be increased, we sort all jobs by their finishing times, which takes O(n log n) time. Lines 7 to 11 search for suitable jobs in this order and book the nearest job with the minimum execution length. Hence, the cost is at most O(n). For the remaining part of the complexity, period transformation takes O(k log k) time, finding the critical speed needs O(d) time, and computing f_i and b_i for every task needs O(k) time. Because n>>k in a periodic task model, the total running time of Algorithm 4 is O(d+n log n).
Example: In Figure 4(c), the first job of T_1 is placed in job set J and other jobs cannot be picked, because of the for-loop of Algorithm 4 in Figure 6. Finally, in line 12, the second job of T_1 is assigned to j_min and included in J.
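The job selection described above can be approximated by the following Python sketch. It is reconstructed from the prose only and is not the listing of Algorithm 4: the formulation in terms of time savings, the function name, and all numbers are assumptions. Each job's saving is taken to be the time gained by raising it from s_c = 0.5 to s_{c-1} = 0.8 in the earlier 3-task example with e_2 = 1.1 assumed (0.6 for a job of T_1, 0.825 for T_2, 1.2 for T_3), and the required saving is the overrun 14 - 13 = 1.0 of the schedule at speed 0.5.

```python
EPS = 1e-9

def select_jobs(jobs, need):
    """Pick jobs to speed up so that their total saving just covers `need`.

    jobs: list of (name, saving), already sorted by finishing time.
    Returns the names of the selected jobs.
    """
    chosen, total = [], 0.0
    for name, saving in jobs:                  # greedy pass in finishing-time order
        if total + saving <= need + EPS:       # take the job only if it does not overshoot
            chosen.append(name)
            total += saving
    if total < need - EPS:                     # cover the shortfall with one extra job
        rest = [(n, s) for n, s in jobs
                if n not in chosen and s >= need - total - EPS]
        if not rest:
            raise ValueError("no single remaining job covers the shortfall")
        j_min = min(rest, key=lambda job: job[1])   # smallest sufficient saving
        chosen.append(j_min[0])
    return chosen

jobs = [("T1#1", 0.6), ("T1#2", 0.6), ("T2#1", 0.825), ("T1#3", 0.6),
        ("T1#4", 0.6), ("T2#2", 0.825), ("T3#1", 1.2)]
print(select_jobs(jobs, need=1.0))  # ['T1#1', 'T1#2']
```

Under these assumptions the sketch selects the first job of T_1 in the greedy pass and then adds the second job of T_1 as j_min, which agrees with the selection described in the example.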
PERFORMANCE ANALYSIS
Besides the proposed methods, we discuss the performance of the techniques in [9, 14, 15, 26]. Because these methods belong to DVS-intra-task(τ) scheduling and are optimal in power consumption, we can only compare their time complexities. In addition, the energy efficiency of our DVS-inter-task(τ) method is presented.
The complexities of the given methods are shown in Table 1. The second and third rows present the time complexity of the DVS-intra-task(τ) and DVS-inter-task(τ) techniques, respectively. The fourth row shows the space complexity of each method. Notations n, k, and d denote the number of jobs, tasks, and speed levels, respectively. Notably, L denotes the length of an input schedule, and "X" denotes that the corresponding method does not provide such a technique. In a periodic task system, the number of jobs is far greater than the number of tasks, and our DVS-intra-task(τ) algorithm outperforms the others.
Notably, the method proposed by Ishihara et al. is formulated as a linear programming problem and has at least O(dn) time complexity [9]. However, the optimality of the technique is confined to a single task. Therefore, the optimality does not hold for the practical case in which every task has a different execution speed. In addition, the techniques proposed in [14, 15] have to generate a "canonical" schedule before voltage/speed adjustments. Their space complexities depend on the length of such a schedule, which cannot be generated in polynomial time. For example, in Figure 1(a), we have to generate a canonical schedule whose length is the least common multiple (LCM) of the periods of these five tasks. In Figure 1(b), the length of the canonical schedule is at most 21.2.
Our experiments were carried out in two respects: (1) the inflation of U_τ caused by Sr and (2) the energy efficiency of the DVS-inter-task(τ) scheme. Because of period transformation, U_τ is greater than the original utilization, and the difference between them is called inflation. In Figure 7, Sr is performed on 20,000 randomly generated task sets of varying sizes. For example, when every τ contains 9 tasks, the average inflation per task set is 0.14. The figure indicates that inflation rises in proportion to the size of the task set.
Since the DVS-intra-task(τ) scheme produces a min-energy schedule, it can serve as a yardstick for assessing the energy efficiency of DVS-inter-task(τ). Figure 8 presents the normalized deviation, in percent, of the energy consumed by DVS-inter-task(τ) from that of an optimal solution versus the size of the task set. Because the previous works [9, 14, 15, 26] are optimal in energy consumption, we do not need to compare them again with the proposed DVS-inter-task(τ). The figure shows that the energy consumption of DVS-inter-task(τ) does not deviate by more than 6% from that of the optimal solution. It also shows that energy consumption is rather sensitive to the size of the task set. The deviation is less than 4% when the task sets contain more than 10 tasks. The main reason is that, when the number of tasks increases, the number of candidate tasks for sharing the slack time also increases.
CONCLUSIONS
In this paper, we considered pinwheel task scheduling on a variable voltage processor with d discrete voltage/speed levels. Under the assumption of harmonic task periods, we gave an intra-task scheduling algorithm, which constructs a minimum energy schedule for k periodic tasks in O(d+k log k) time. We also gave an inter-task scheduling algorithm, which decreases the number of speed/voltage switches and runs in O(d+n log n) time, where n denotes the number of jobs. Because of the periodic task model, the number of jobs is far greater than the number of tasks. Our schemes outperform the scheme in [15] even when the number of jobs equals the number of tasks. Moreover, since the result is obtained without first generating an actual schedule, our schemes are truly polynomial-time algorithms. We also presented some fundamental properties associated with jitterless schedules. These properties may provide new insights for jitterless task scheduling.
