Abstract
In recent times, much attention has been devoted to power optimization for real-time systems, while guaranteeing that such systems meet their hard (or soft) scheduling deadlines. To reduce power, different tasks in such systems may be run at different power supply voltages, in order to maximally utilize slack in the schedule. However, prior approaches have ignored the practical aspects of switching the power supply. In a typical IC, the VDD net is highly capacitive, and as a result, its voltage cannot be changed instantaneously. In traditional approaches, the assumption is that this net switches instantaneously, which in effect makes it essential to include the VDD switching time in the worst-case execution time (WCET) of a process (adding pessimism to the WCET value). In our approach, we precisely model the switching of the VDD net, and allow the system to continue computations [...]
1. Introduction
The area of embedded system scheduling arguably started with the seminal work of Liu and Layland [1] in 1973. In that paper, the authors assumed a single-processor system with n independent periodic tasks and given worst-case execution times (WCETs). Liu and Layland showed that if tasks are scheduled statically using a priority inversely proportional to their period, the resulting schedule is optimal among all fixed priority schedules.
In recent times, the work of Liu and Layland has been extended in several ways. There have been several approaches to devise static power-conscious scheduling algorithms [2, 3, 4, 5]. Dynamic schedulers that utilize dynamic voltage scheduling (DVS) were also reported [6, 7, 8, 9, 10]. In a DVS processor, voltage may be modified dynamically, allowing the scheduler to trade off power for delay (by varying the VDD value of the processor). The independent task assumption of [1] was removed in [11]. Techniques to generate variable supply voltages were reported in [12, 13].
In the above scheduling algorithms, it is assumed that the delay overhead of VDD switching is negligible. However, this is not the case for realistic processors that are used to implement real-time embedded systems. These processors have significantly capacitive VDD nets. For example, the capacitance on the VDD net in [14] was reported to be 160 nF. In fact, designers make special efforts to increase this capacitance for signal integrity reasons, which further weakens the zero VDD switching delay assumption. As a result, the assumption that VDD switching has negligible delay overhead is unjustified in modern designs. If we were to use the zero VDD switching delay assumption, the worst-case VDD switching delay would have to be included in the WCET of each task, resulting in more conservative WCETs. If the computation of a task is pre-empted n times by other tasks (operating at different voltages), we need to increment the WCET of the task by n times the worst [...] technique. Suppose a task φi with scheduled voltage VDDi is completed and the next task φj with scheduled voltage VDDj is started. Let VDDi < VDDj. Assume that task φj had already been queued while φi was being processed. In this case, we begin the VDD switching from VDDi to VDDj while φi is still being executed, so that φi completes its work (defined henceforth as the number of cycles required in the computation of a task) earlier. Task φj now begins earlier than planned. We find the time T1* at which, if VDD switching from VDDi to VDDj is initiated, the speed-up of φi is equal to the increased delay of φj. In other words, the work of both φi and φj completes before their respective deadlines. We formulate this problem and find an expression for the time T1*. We report the results of experiments to validate this expression, showing a close match between the mathematical model and the experimental delays.
[...] our approach. Section 5 summarizes our work.
2. Previous Work
In the seminal paper by Liu and Layland [1], the authors motivated the area of real-time systems, and provided a fixed priority scheduler which had an asymptotic upper bound for processor utilization of 69%. The focus of this work was schedulability, rather than power.
In recent times, with the growing interest in low-power real-time embedded systems, there have been several efforts to augment the work of [1] for low power applications. Most of these efforts attempt to reduce power by scaling the frequency of operation or the value of VDD, or by powering down the system during periods of inactivity. An excellent review of low power scheduling is found in [15].
In [2], the authors augment a fixed priority schedule in a power-conscious manner. If there is dead-time in the schedule, such periods are filled in by reducing the clock frequency or the VDD value, or by system power-down. In [3], the authors devised an algorithm to find the optimal voltage for each task. They ignore the delay and power overhead of switching VDD. However, this is a problem in general, since the VDD net on an IC can be significantly capacitive, especially for Systems-on-a-Chip (SoCs). For a large IC, this capacitance can be in the range of a few hundred nF [14]. Later, in [4], an energy-efficient fixed priority scheduling algorithm was reported, which could be used to find optimal voltages for each task or for entire task sets. Finally, in [5], a genetic DVS algorithm was presented.
Fixed priority dynamic voltage scheduling (DVS) approaches have also been extended to dynamic schedulers. In [6], a DVS algorithm was reported for dynamic schedules, using slack analysis. In [7], the authors reported a static and dynamic algorithm for voltage and clock scaling of real-time embedded systems. In [8, 9] [...] reported for heterogeneous real-time distributed embedded systems.
The independent task assumption was removed in [11], where the authors reported a scheduling algorithm for periodic task graphs, with the additional ability to handle aperiodic tasks. These algorithms were generalized to the DVS scenario as well. In [16] [...]
In all the above efforts, the thrust was on scheduling algorithms. The time required to switch VDD was ignored, and implicitly included in the task WCET. This adds pessimism to the schedule, since the worst-case time taken by the VDD net to switch must be factored into the WCET of each process. In our work, we allow task execution during VDD switching, al[...]
Figure 1: DVS timing diagrams
Figure 1 illustrates the problem being addressed by our approach. When a DVS-enabled processor switches from VDDi to VDDj during operation, its switching waveform is a rising (or falling) exponential, since the VDD net on a modern IC is significantly capacitive and the voltage regulation circuit has a finite series resistance. In this section, we discuss both cases, VDDi < VDDj and VDDi > VDDj. Traditional scheduling approaches consider the VDD transition to be an ideal step function, which means that the worst-case dead-time (Δ) during VDD switching increases [...] that of φ2 is T2. Assume that φ1 is queued already.
Subfigure a) illustrates the ideal case (i.e., the VDD switching waveform is an ideal step function). In other words, this assumes that the dead-time (Δ) is zero.
[...] In this manner, we can perform computation while VDD is being switched, allowing us to reduce the pessimism in the WCETs of the tasks. The VDD switching is therefore performed on-the-fly, even while tasks φi and φj are being computed.
If VDDi > VDDj, then by starting the transition at the same time T1 as in the ideal case of Figure 2a), we can ensure that the work of φi is completed before its deadline T1. For φj, the average VDD value is above VDDj, and hence it completes earlier than scheduled, again guaranteeing that its work is completed before its deadline T2.
The computation of T1* is performed as follows.
The total delay of computing tasks φi and φj using our on-the-fly VDD [...] (Figure 2a), which assumes that the VDD change is a step function) is given by Equation 3:
$$N_{ideal} = \int_{T_1^*}^{T_1} F[V_{DDi}]\,dt + \int_{T_1}^{T_2} F[V_{DDj}]\,dt \qquad (3)$$
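As a rough numerical illustration of the ideal (step-function) case, the following Python sketch counts the cycles completed when VDD is held at one value up to T1 and at another value from T1 to T2. The frequency model `freq` is an assumption: an alpha-power-law form with exponent 2, chosen only because the experiments mention an exponent close to 2. The function names, the constant `k`, and the default threshold voltage are hypothetical and are not taken from the paper.

```python
def freq(vdd, vt=0.3, alpha=2.0, k=1.0e9):
    """Assumed clock frequency (Hz) at supply voltage vdd.

    Alpha-power-law form k * (vdd - vt)**alpha / vdd; this model and its
    constants are illustrative assumptions, not the paper's F[VDD].
    """
    if vdd <= vt:
        return 0.0  # below threshold the circuit is treated as not switching
    return k * (vdd - vt) ** alpha / vdd


def ideal_work(va, vb, t1_star, t1, t2, **model):
    """Cycles completed if VDD steps instantly from va to vb at time t1.

    The frequency is constant on each interval, so each integral reduces
    to frequency times interval length.
    """
    first_task = freq(va, **model) * (t1 - t1_star)
    second_task = freq(vb, **model) * (t2 - t1)
    return first_task + second_task
```

For example, `ideal_work(1.0, 1.8, 0.0, 5e-6, 20e-6)` gives the cycle count for a step from 1.0 V to 1.8 V at 5 µs under the assumed model; only the relative comparison against a non-ideal waveform is meaningful, since `k` is arbitrary.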
Subfigure b) illustrates the actual VDD switching waveform, which our method incorporates into the schedule. Our method must obviously [...]
From T1* to T2, VDD is an exponential function of time: [...]
If VDD switching is a step function (the ideal case of Figure 2a)), we have: [...]
Without loss of generality, we let T1* = 0. In that case, T1 becomes the look-ahead time, which we need to determine: [...]
Let t' = t/τ and T2' = T2/τ, and using the assumption that T2 > τ, we get: [...] (Va − VT)^2 [...] (Vb − VT)^2 [...]
As we can see, T1 is independent of T2, which is intuitively reasonable. When both Va and Vb are much greater than VT, T1 is very close to τ.
The first task we performed was to find the value of α. We determined through experimentation that α ≈ 2. We used a simple 9-stage ring oscillator, along with two other circuits (apex7 and alu4) from the MCNC91 benchmark suite. We mapped these circuits using SIS [20] to a library of 21 gates using a predictive 0.1 µm process technology [...]
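As a numerical sanity check on the look-ahead time T1 discussed in this section, the sketch below equates the work done under an exponential VDD transition (started at T1* = 0) with the work done under an ideal step at T1, and solves for T1. It rests on several assumptions that are not taken from the paper: the frequency model is an alpha-power law with exponent 2, the waveform is a single-time-constant exponential with time constant `tau`, both voltages are above the threshold voltage, and the integral is evaluated numerically with Simpson's rule rather than with the closed-form expression. All function and parameter names are hypothetical.

```python
import math


def freq(vdd, vt, alpha=2.0, k=1.0):
    """Assumed alpha-power-law frequency: k * (vdd - vt)**alpha / vdd.

    Valid only for vdd > vt > = 0; the model itself is an assumption.
    """
    return k * (vdd - vt) ** alpha / vdd


def vdd_waveform(t, va, vb, tau):
    """Exponential VDD transition from va to vb, starting at t = 0."""
    return vb + (va - vb) * math.exp(-t / tau)


def actual_work(va, vb, vt, tau, t2, steps=20000):
    """Cycles completed on [0, t2] while VDD follows the exponential.

    Uses composite Simpson's rule; steps must be even.
    """
    h = t2 / steps
    total = freq(va, vt) + freq(vdd_waveform(t2, va, vb, tau), vt)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * freq(vdd_waveform(i * h, va, vb, tau), vt)
    return total * h / 3.0


def look_ahead_time(va, vb, vt, tau, t2):
    """T1 at which an ideal step from va to vb does the same work on [0, t2].

    Solves freq(va)*T1 + freq(vb)*(t2 - T1) = actual_work for T1;
    requires va != vb.
    """
    n_actual = actual_work(va, vb, vt, tau, t2)
    return (n_actual - freq(vb, vt) * t2) / (freq(va, vt) - freq(vb, vt))
```

Under these assumptions, calling `look_ahead_time(1.0, 1.8, 0.0, 1e-6, 20e-6)` (threshold voltage set to zero, T2 equal to twenty time constants) should return a value very close to `tau`, and doubling `t2` should leave the result essentially unchanged. Both behaviours agree with the observations that T1 is independent of T2 and approaches the time constant when the supply voltages are much greater than VT.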
