Experiences in Implementing an Energy-Driven Task Scheduler in RT-Linux by Vishnu Swaminathan et al.
Experiences in Implementing an Energy-Driven Task Scheduler in RT-Linux
1
Vishnu Swaminathan, Charles B. Schweizer, Krishnendu Chakrabarty and Amil A. Patel
Department of Electrical & Computer Engineering
Duke University
Durham, NC 27708
{vishnus,cbs2,krish,aap}@ee.duke.edu
Abstract
Dynamic voltage scaling (DVS) is being increasingly
used for power management in embedded systems. Energy
is a scarce resource in embedded real-time systems and en-
ergy consumption must be carefully balanced against real-
time responsiveness. We describe our experiences in imple-
menting an energy driven task scheduler in RT-Linux. We
attempt to minimize the energy consumed by a taskset while
guaranteeing that all task deadlines are met. Our algo-
rithm, which we call LEDF, follows a greedy approach and
schedules as many tasks as possible at a low CPU speed in
a power-aware manner. We present simulation results on
energy savings using LEDF, and we validate our approach
using the RT-Linux testbed on the AMD Athlon 4 processor.
Power measurements taken on the testbed closely match
the power estimates obtained using simulation. Our results
show that DVS results in significant energy savings for practical
real-life task sets. We also show that when CPU speeds
are restricted to only a few discrete values, this approach
saves more energy than currently existing methods.
Keywords: Deadlines, low energy earliest-deadline-ﬁrst
(LEDF) scheduling, RT-Linux, dynamic voltage scaling,
variable-speed task scheduling.
1 Introduction
Energy consumption has become an important design
parameter for battery-operated portable and embedded systems.
Since the amount of power available to these systems
is limited, it is desirable to minimize the energy consumption
such that the life of the battery or battery pack may
be maximized. Embedded systems such as transmitters and
sensors tend to be situated at remote locations; hence, the
cost of replacing battery packs is high when the batteries
that power these systems fail.
Footnote 1: This research was supported in part by DARPA under grant no.
N66001-001-8946, in part by a graduate fellowship from the North Carolina
Networking Initiative, and in part by DARPA and Army Research
Office under Award No. DAAD19-01-1-0504. Any opinions, findings, and
conclusions or recommendations expressed in this publication are those of
the author(s) and do not necessarily reflect the view of the DARPA and
ARO agencies.
A number of embedded and mobile systems are also de-
signed for real-time use. These systems must be designed
to meet both functional and timing requirements. Thus, the
correct behavior of these systems depends not only on the
accuracy of computations but also on their timeliness. Any
real-time scheduling algorithm must guarantee timeliness
and schedule tasks such that the deadline of every task is
met. Energy minimization adds a new dimension to these
design issues. While energy minimization for embedded
and mobile computing is of great importance, energy con-
sumption must be carefully balanced against the need for
real-time responsiveness.
One approach to conserve energy is to employ low-
power design methodologies. A number of low-power de-
sign techniques have been developed for various levels of
design, from the architectural to the transistor levels. At the
system level, idle functional units can be powered down [6].
Predictive shutdown based power optimization techniques
are presented in [12, 22]. Although these methods are easy
to implement, they are tailored for systems with a fixed supply
voltage. Most embedded processors are equipped with
a voltage supply that is capable of providing at least two
different voltages to the system, and the above methods are
unable to make use of this feature. Furthermore, since these
methods are generally applied at the design/synthesis phase,
they are inﬂexible and cannot adapt to changing workload
conditions.
An alternative approach to saving energy is based on dynamic
power management (DPM), in which the operating
system (OS) is responsible for reducing system power consumption.
DPM was made possible largely due to the introduction
of the Advanced Configuration and Power Interface
(ACPI) standard for desktop and notebook systems [2].
The standard allows hardware power states to be controlled
by the OS through system calls, effectively transferring the
power reduction responsibility from the hardware (BIOS)
to the software (OS). The ACPI standard has contributed
to the increase in availability of variable-speed processors.
The speeds of these processors can be changed dynamically
through OS system calls. In most cases, an increase (decrease)
in processor speed is achieved by increasing (decreasing)
the supply voltage to the processor. Thus, when
the processor workload is low, the OS can potentially reduce
the supply voltage to the processor, thereby utilizing
the quadratic dependence of power on voltage to reduce
power consumption.
Proceedings of the Eighth IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS'02)
0-7695-1739-0/02 $17.00 © 2002 IEEE
In embedded systems with variable-speed processors,
the OS can reduce energy consumption by scheduling tasks
appropriately. For real-time systems, optimal preemptive
ofﬂine scheduling algorithms have been developed [23].
Heuristics for ofﬂine scheduling of non-preemptive real-
time tasks were presented in [9]. In [10], preemptive task
scheduling was addressed with a limit on the number of
speed changes allowed. More recently, an ofﬂine ﬁxed pri-
ority scheduling algorithm was presented in [19]. In [15],
an online DVS scheme that considers workload-variation
slack time was presented. DVS techniques for multiproces-
sor systems are described in [8] and [13]. In [3], a spec-
ulative technique is proposed to reduce processor speed in
anticipation of reduced execution times. Another predic-
tive scheme is proposed in [20]. This scheme has been im-
plemented in the eCos operating system [7] running on a
StrongArm processor board. The placement of power management
points to adjust CPU speed has been studied in [1].
Compiler-assisted DVS is also studied in [11].
Although the scheduling methods cited above are very
efﬁcient, most of them make the assumption that the CPU
can operate at different voltage levels (and hence different
clock frequencies), which can be varied continuously. In
addition, several of these methods are aimed at the synthe-
sis of low-power designs and they do not address energy
minimization during task execution. Furthermore, most of
the above methods investigate DVS techniques from a the-
oretical standpoint. It is unclear how these methods can be
implemented in practice and how effective they would be
in a practical system. In [17], the authors have developed
simple real-time scheduling strategies based on the earli-
est deadline ﬁrst and rate monotonic algorithms [16], and
they have implemented them on the Linux operating system
running on an x86 platform. Their techniques were imple-
mented as extendable modules to the Linux kernel, which
does not guarantee real-time behavior. In [18], another im-
plementation of a real-time DVS algorithm, termed energy
priority scheduling, is presented. This method too has been
implemented on the non-real-time Linux OS running on a
modiﬁed StrongArm system board.
In this paper, we ﬁrst present an online scheduling al-
gorithm for real-time systems that attempts to minimize
the energy consumed by a given taskset. This algorithm
was implemented on a laptop equipped with AMD's PowerNow!
DVS capability and running Real-Time Linux (RT-Linux).
We then present implementation details and experimental
power measurements that validate and support
our simulation results. Our algorithm is based on the well-known
earliest-deadline-first (EDF) algorithm. We consider
a practical scenario where a single CPU executes a set of
non-preemptable tasks. A generic periodic taskset is con-
verted using the LCM method into one where all tasks have
the same period [14]. The voltage, and consequently the
clock speed, of the CPU may be switched between two or
more values dynamically at run-time through OS system
calls. To the best of our knowledge, this is the ﬁrst im-
plementation of a real-time DVS algorithm on a real-time
operating system.
The rest of the paper is organized as follows. In Sec-
tion 2, we develop a framework for minimum-energy task
scheduling and describe the underlying assumptions. In
Section 3, we present the LEDF heuristic for task schedul-
ing. We also present simulation results to evaluate LEDF.
In Section 4, we describe our RT-Linux testbed and in Sec-
tion 5, we provide experimental results on power measure-
ments using this testbed. Section 6 summarizes our work,
discusses the lessons learned during this implementation
and experimentation, and describes extensions and direc-
tions for future work.
2 Preliminaries and Problem Statement
In this section, we present our notation and the underlying
assumptions. We are given a set $R = \{r_1, r_2, \ldots, r_n\}$ of
$n$ tasks. Associated with each task $r_i \in R$ are the following
parameters:
• its release (or arrival) time $a_i$,
• its deadline $d_i$,
• its length $l_i$ (represented in number of instruction cycles), and
• its period $p_i$.
Each task is released at time $a_i$. This is the time at which
the task is placed in the ready queue and is ready to start
execution. We assume, without loss of generality, that all
tasks have identical periods. If the periods of tasks are
different, we can perform a polynomial transformation to
this scenario through the application of the LCM theorem.
When the CPU voltage can take on only a
few discrete values, it is advantageous to perform this transformation
rather than use a utilization-based DVS scheme.
Although this method may not be the best way to transform
periodic tasksets under certain conditions, we show that it
works well for RT-Linux when the number of jobs in a given
taskset is in the hundreds. Finally, all tasks must complete
execution before their deadlines.
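The transformation to a common period can be illustrated with a minimal sketch. It assumes integer periods, releases at the start of each period, and deadlines at the end of each period; the names `PeriodicTask`, `Job`, and `expand_to_hyperperiod` are hypothetical and do not come from the original implementation.

```python
from math import lcm  # available in Python 3.9 and later
from typing import List, NamedTuple


class PeriodicTask(NamedTuple):
    name: str
    period: int   # in time units (assumed integer)
    length: int   # instruction cycles per job


class Job(NamedTuple):
    name: str
    release: int
    deadline: int
    length: int


def expand_to_hyperperiod(tasks: List[PeriodicTask]) -> List[Job]:
    """Unroll every periodic task into its jobs over one hyperperiod.

    The hyperperiod is the LCM of all task periods, so the resulting
    job set repeats with a single common period.
    """
    hyper = lcm(*(t.period for t in tasks))
    jobs = []
    for t in tasks:
        for k in range(hyper // t.period):
            release = k * t.period
            # Assumption: each job's deadline is the end of its period.
            jobs.append(Job(f"{t.name}.{k}", release, release + t.period, t.length))
    # Order by deadline (then release) for convenience in EDF-style scheduling.
    return sorted(jobs, key=lambda j: (j.deadline, j.release))
```

For example, two tasks with periods 4 and 6 expand into five jobs over a hyperperiod of 12, each with its own release time and deadline.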
We assume that the CPU can operate at one of $k$ voltages:
$V_1, V_2, \ldots, V_k$. Depending on the voltage level, the
CPU speed may take on $k$ values: $s_1, s_2, \ldots, s_k$. The supply
voltage to the CPU is under OS control and the OS may
dynamically switch the voltage during run-time with relatively
low overhead. CPU speeds are specified in terms of
the number of instructions executed per second. Each task
$r_i$ may be executed at a voltage $v_i \in \{V_1, V_2, \ldots, V_k\}$, and
correspondingly, at a speed $x_i \in \{s_1, s_2, \ldots, s_k\}$. Tasks
are not preemptable, i.e., once a task starts executing, no
other task can execute until it completes execution. There
is also no inserted idle time. This means that the scheduling
algorithm does not allow the processor to be idle if there is
a task that has been released but has not started execution.
It is well-known that power consumption in CMOS circuits
has a quadratic dependence on the CPU voltage.
Therefore, the energy consumed per instruction cycle
by task $r_i$ is proportional to $v_i^2$. Hence, the energy $E_i$ consumed
by task $r_i$ of length $l_i$ is proportional to $v_i^2 l_i$. In other
words, energy consumption of task $r_i$ varies quadratically
with its assigned processor voltage $v_i$ and linearly with its
length $l_i$.
We now present a formal statement of the scheduling
problem:
$P_{cpu}$: Given a set $R$ of $n$ tasks, and for each task $r_i \in R$,
(i) a release time $a_i \in \mathbb{Z}_0^+$, (ii) a deadline $d_i \in \mathbb{Z}^+$, and
(iii) a length $l_i \in \mathbb{Z}^+$, and a processor capable of operating
at $k$ different voltages $V_1, V_2, \ldots, V_k$ with corresponding
speeds $s_1, s_2, \ldots, s_k$, determine a sequence of voltages
$v_1, v_2, \ldots, v_n$ and corresponding speeds $x_1, x_2, \ldots, x_n$ for
the taskset $R$ such that each task meets its release time constraints
and deadlines, and the total energy $\sum_{i=1}^{n} v_i^2 l_i$
consumed by the system is minimized.
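As an illustration of the objective, the following minimal sketch evaluates one candidate assignment of voltages and speeds for tasks executed non-preemptively in a given order. The function name `evaluate` and the tuple layout are hypothetical. Lengths are taken in units of $10^6$ instructions and speeds in MIPS, so times come out in seconds; energy is in the paper's units of $\sum v_i^2 l_i$.

```python
from typing import List, Optional, Tuple

Task = Tuple[float, float, float]  # (release a_i, deadline d_i, length l_i)


def evaluate(tasks: List[Task],
             volts: List[float],
             speeds: List[float]) -> Optional[float]:
    """Return the total energy sum(v_i^2 * l_i) of a candidate assignment,
    or None if some task would miss its deadline.

    Tasks run non-preemptively in list order; a task starts as soon as
    the processor is free and the task has been released.
    """
    t = 0.0
    energy = 0.0
    for (a, d, l), v, x in zip(tasks, volts, speeds):
        start = max(t, a)      # cannot start before the release time
        t = start + l / x      # execution time is length / speed
        if t > d:
            return None        # deadline violated: infeasible assignment
        energy += v * v * l
    return energy
```

With two tasks of lengths 1600 and 800, windows [0, 5] and [3, 7], running the first at 400 MIPS (3.3 V) and the second at 300 MIPS (2.47 V) is feasible, whereas running both at 300 MIPS is not, because the first task would finish after its deadline.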
3 The LEDF Scheduling Algorithm
The low-energy earliest deadline first heuristic, or simply
LEDF, is an extension of the well-known earliest deadline
first (EDF) algorithm. For this description, we assume
that LEDF uses two speeds rather than $k$ discrete frequencies.
The operation of LEDF is as follows:
LEDF maintains a list of all released tasks, called the
ready list. When tasks are released, the task with the near-
est deadline is chosen to be executed. A check is performed
to see if the task deadline can be met by executing it at the
lower voltage (speed). If the deadline can be met, LEDF
assigns the lower voltage to the task and the task begins ex-
ecution. During the task’s execution, other tasks may be
released. These tasks are assumed to be placed automatically
on the ready list. LEDF again selects the task with the
nearest deadline to be executed. As long as there are tasks
waiting to be executed, LEDF does not keep the processor
idle. This process is repeated until all the tasks have been
scheduled. Figure 1 explains the algorithm in pseudo-code
form.
Procedure LEDF()
$t_c$: current time;
$X_l$: Low speed;
$X_h$: High speed;
begin
1. if ready list $\neq$ NULL
2. Sort task deadlines in ascending order;
3. Select task $r_i$ with earliest deadline;
4. if $t_c + l_i / X_l \leq d_i$
5. if remaining tasks meet their deadlines at high speed
6. Schedule $r_i$ to run at $X_l$;
7. else
8. Schedule $r_i$ to run at $X_h$;
9. else
10. do-nothing;
end
Figure 1. The LEDF algorithm.
[Figure 2: timeline of the schedule for tasks r1 to r17 on a time axis from 0 to 55. Legend: hatched, task at 3.3 V (400 MIPS); unhatched, task at 2.47 V (300 MIPS). Energy consumption: 169709 units.]
Figure 2. Schedule generated using LEDF.
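The two-speed procedure can be sketched as a small simulation. This is one reading of Figure 1, not the authors' code: it assumes that the high speed is used whenever either the task's own deadline check or the check on the remaining ready tasks fails at the low speed, and that the "remaining tasks" are the other tasks currently on the ready list, executed back to back in deadline order. The function name `ledf` and the tuple layout are hypothetical.

```python
def ledf(tasks, x_low, x_high):
    """Simulate two-speed LEDF for non-preemptable tasks.

    tasks: list of (name, release, deadline, length) tuples.
    Returns a list of (name, start_time, speed) in execution order.
    """
    pending = sorted(tasks, key=lambda t: t[1])   # order by release time
    schedule, t_c = [], 0.0
    while pending:
        ready = [t for t in pending if t[1] <= t_c]
        if not ready:
            # Nothing has been released yet: advance to the next release.
            t_c = min(t[1] for t in pending)
            continue
        ready.sort(key=lambda t: t[2])            # earliest deadline first
        name, _, deadline, length = ready[0]
        speed = x_high
        finish_low = t_c + length / x_low
        if finish_low <= deadline:
            # Would the other ready tasks still meet their deadlines at
            # the high speed if this task ran at the low speed?
            t, ok = finish_low, True
            for _, _, d, l in ready[1:]:
                t += l / x_high
                if t > d:
                    ok = False
                    break
            if ok:
                speed = x_low
        schedule.append((name, t_c, speed))
        t_c += length / speed                     # run to completion
        pending.remove(ready[0])
    return schedule
```

With lengths in units of $10^6$ instructions and speeds in MIPS, a call such as `ledf([("a", 0, 5, 1600), ("b", 0, 10, 1200)], 300, 400)` runs task `a` at the high speed (it cannot finish by time 5 at 300 MIPS) and task `b` at the low speed.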
Our algorithm has a computational complexity of
$O(n \log n)$, where $n$ is the number of tasks ready for execution.
The worst-case scenario occurs when all $n$ tasks are
released at time $t = 0$. This involves sorting $n$ tasks in
the ready list and then selecting the task with the earliest
deadline for execution. Given that all $n$ tasks have already
arrived and that they are already sorted by deadline, we no
longer need to perform sorting on the taskset. Scheduling a
task for execution has $O(n)$ complexity due to the deadline
check for all later tasks. This results in a worst-case execution
time of $O(n \log n)$. We now show results of the LEDF
simulations for a taskset of seventeen tasks.
The example taskset is given in Table 1. It consists of
seventeen tasks $r_1$ to $r_{17}$. Each task has a release time, a
deadline and a length. We assume the two processor speeds
to be 300 million instructions per second (MIPS) at 2.47 V
and 400 MIPS at 3.3 V.
The execution times for the tasks at the two different
speeds are also shown. Figure 2 shows the LEDF schedule
for the taskset. The tasks that execute at 400 MIPS
are hatched. The energy consumed by the optimal schedule
(generated by an off-line, fully-informed method) is 167327
units. Note that energy is measured in units of $\sum_{i=1}^{n} v_i^2 l_i$, as
explained in Section 2. The energy consumed by the LEDF
schedule is 169709 units. We observe that the energy consumed
by the LEDF-generated schedule is only about 1.4% greater than
that for the optimal schedule.
Task   Release $a_i$   Deadline $d_i$   Length $l_i$ (x $10^6$)   $l_i/(300 \times 10^6)$   $l_i/(400 \times 10^6)$
r1     3               7                800                       2.66                      2.0
r2     9               21               750                       2.5                       1.875
r3     0               5                1600                      5.33                      4.0
r4     18              25               1000                      3.33                      2.5
r5     14              16               600                       2.0                       1.5
r6     7               10               1200                      4.0                       3.0
r7     20              27               1100                      3.66                      2.75
r8     14              20               1600                      5.33                      4.0
r9     11              14               500                       5.0                       3.75
r10    30              35               1400                      4.66                      3.5
r11    27.5            30               800                       2.66                      2.0
r12    40              42               600                       2.0                       1.5
r13    34              39               1600                      5.33                      4.0
r14    40              46               1200                      4.0                       3.0
r15    44              50               1400                      4.66                      3.5
r16    44              55               2000                      6.66                      5.0
r17    40              43               300                       1.0                       0.75
Table 1. A simple taskset consisting of 17 tasks.
The increased energy consumption of LEDF arises due
to the fact that LEDF does not possess a priori knowledge of
the release times. The optimal schedule executes tasks $r_{12}$
and $r_{17}$ at a higher speed (voltage) even though both tasks
could have met their respective deadlines by running at a
lower speed (voltage), because this allows the longer task
$r_{15}$ to run at a lower speed (voltage). This characteristic of
the taskset cannot be utilized by any online scheduling algorithm,
because such an algorithm has no prior knowledge
of release times. We have thus followed a greedy approach
to low-energy scheduling. Our algorithm, LEDF, schedules
a task at a low speed, if possible, without considering the
effect of this decision on the energy consumption of future
tasks.
4 Implementation Testbed
4.1 Hardware Platform
Several hardware options were considered for our implementation.
First, the Hitachi SuperH SH-4, an embedded
processor, was considered due to its application domain.
As explained in the introduction, the systems that benefit
most from DPM are small, and often portable, embedded
systems. Unfortunately, the SH-4 has no scalable dynamic
power saving features. As with many embedded processors,
it can be shut down entirely to conserve power for periods
of time, but it does not possess the ability to switch voltages
or frequencies on-the-fly.
Power state   Speed (MHz)   Voltage (V)
1             1100          1.4
2             900           1.35
3             700           1.25
Table 2. Speed and voltage settings for the Athlon 4 processor.
The next processor considered was from the Intel line of
x86 processors. In addition to the practical benefits of using
an x86 system for prototyping and testing, the mobile Pen-
tium III processor also features a well-known power saving
mechanism called SpeedStep technology. Processors with
SpeedStep technology contain two encoded bus frequency
multipliers (and core voltages) instead of one. This allows a
maximum performance state while the machine is plugged
into an AC outlet and a low-power operating state when the
machine is running on battery power. The mobile Pentium
III processor was ruled out because it is restricted to only
two power states that are hard-wired into the processor [4].
After ruling out the above alternatives, we chose the mo-
bile AMD Athlon 4 processor for our power measurement
experiments. AMD's PowerNow! technology offers greater
ﬂexibility in setting both frequencies and core voltages [5].
The 1.1 GHz Mobile Athlon 4 processor can be set at sev-
eral core voltage levels ranging from 1.2V to 1.4V in 0.05V
increments. For each core voltage there is a predetermined
maximum clock frequency, though lower frequencies can
be used if desired.
The power states we chose to use in our scheduler and
simulations are shown in Table 2. Ratios for power consumption
in different states can be calculated using the
well-known relationship for CMOS power consumption, i.e.,
$P = f a C V_{dd}^2$, where $P$ is the power, $f$ is the frequency
of operation, $a$ is the average switching activity, $C$ is the
switching capacitance, and $V_{dd}$ is the operating voltage.
The switching capacitance and average switching activity
will be constant for the same processor and software, so we
only need to consider the frequency and the square of the
core voltage. We calculate that power state 2 uses approximately
76% as much power as power state 1 (the maximum
power state). Power state 3 uses only 51% as much power as
the maximum power state. The minimum power configuration
for this processor is 300 MHz at 1.2V, which consumes
only 20% of the power consumed in the maximum power
state.
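The ratios quoted above follow directly from $P \propto f V_{dd}^2$ once the activity factor and capacitance cancel. A minimal sketch of the arithmetic, using the settings from Table 2 and the stated minimum configuration (the function name `relative_power` is hypothetical):

```python
def relative_power(freq_mhz, volts, ref_freq_mhz=1100.0, ref_volts=1.4):
    """Power of a state relative to the reference state, using P ~ f * V^2.

    The switching activity and capacitance are assumed identical in both
    states, so they cancel in the ratio.
    """
    return (freq_mhz * volts ** 2) / (ref_freq_mhz * ref_volts ** 2)


state2 = relative_power(900, 1.35)    # roughly 0.76 of maximum power
state3 = relative_power(700, 1.25)    # roughly 0.51 of maximum power
minimum = relative_power(300, 1.2)    # roughly 0.20 of maximum power
```

These are ratios of CPU power only; they say nothing about the rest of the system, which is why the measurements later in the paper separate CPU current from system current.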
PowerNow! technology was developed primarily to extend
battery life on mobile systems. We therefore conducted
our experiments on a laptop rather than a desktop PC. Instead
of inserting a current probe into the laptop (a highly
complicated and invasive proposition), we opted to simply
measure system power during the experiments. The laptop's
system power is drawn from the power converter at
approximately 18.5V DC. In [17], a digital oscilloscope
was used to take current measurements at very high
frequencies. These measurements were then averaged over
time and an accurate measurement of power consumption
was derived. This is certainly a viable test setup for
taking laptop power measurements. We chose, however,
the simpler approach of using a large capacitor to average
out the DC current drawn by the laptop, thus allowing us
to take a single reading instead of averaging a large volume
of data. This method works primarily due to the periodic
nature of our tests. In a periodic real-time system,
the power drawn over one hyperperiod is roughly the same
as the power drawn over the next hyperperiod as long as
no tasks are added or removed from the taskset. Since a
fairly large amount of energy needs to be sourced and sunk
by the capacitor at the different processor speeds and activity
levels, we use a 30V DC 360 mF capacitance (a 160
mF and a 200 mF capacitor in parallel). This capacitance is
capable of averaging current loads for power state periods
ranging up to hundreds of milliseconds. When the processor
power state switches at a lower rate than this, the capacitor
is unable to provide or absorb all the transient energy
requirements and the current measurements taken between
the AC/DC converter and the voltmeter readings fluctuate.
Figure 3 illustrates our experimental hardware setup.
[Figure 3: wall outlet, standard 19V AC/DC adapter, 360 mF capacitance across power and ground, digital multimeter (example reading 1.8632 A), and the testing station, an NX-7321 laptop without battery.]
Figure 3. Illustration of the experimental setup.
4.2 Software Architecture
We used RT-Linux [21] as the operating system for our
experiments. In addition to providing hard real-time guarantees
for tasks and a periodic scheduling system, RT-Linux
also provides a well-documented method of changing
scheduling policies. An elegant modular interface allows
for easy adaptation of the scheduler module to use LEDF
and then load and unload it as necessary. We used this feature
of RT-Linux to swap LEDF for a regular EDF scheduler
during power consumption comparisons. Furthermore, RT-Linux
uses Linux as its idle task, providing a very convenient
method of control and evaluation for the execution of
the real-time tasks. In the future, we plan to implement our
scheduling algorithm in eCos to show that it will be effective
in a smaller, simpler, embedded operating system. Although
eCos does not have the dynamic modular interface
of Linux and RT-Linux, it does have an innovative compile-time
configuration mechanism that can be used to select the
scheduling policy.
The release times of all tasks are set to zero (the beginning
of the period). All tasks have an absolute deadline associ-
ated with them that is recalculated at each release based on
the absolute time of release and the relative deadline.
LEDF sorts all tasks by their absolute deadlines and
chooses the task with the earliest deadline ﬁrst. If there
are no real-time tasks pending, the Linux/Idle task is cho-
sen and run at the lowest available speed. In this case, a
timeout is set to preempt the task at the next known release
time. If any real-time tasks are present, each speed at which
the processor can run is considered in order from lowest
to highest. For each speed, the worst-case execution time
is calculated based on the maximum instruction count. If
this execution time is too high to meet the current absolute
deadline for the task, the next higher speed is considered.
Otherwise, a schedulability test is applied to verify that all
ready tasks will be able to meet their deadlines when the
current earliest-deadline task is run at a lower speed. The
test consists of iterating down the ordered list of tasks and
comparing the worst-case completion time for each task (at
the highest speed) against its absolute deadline. If any task
will miss its deadline, the current speed for the current task
is insufﬁcient and the next higher speed is considered. This
is only guaranteed to work when all tasks share the same
period. When tasks have different periods, but their dead-
lines occur at the end of their periods, a better test would be
to remember the last speed at which each task was run and
compute the overall utilization (substituting the proposed
speed for the current task instead of its last speed); verify-
ing that this utilization is below a certain threshold implies
schedulability. Once a speed is identiﬁed for the current
task, the switching code is invoked if the processor is not
already operating at that speed.
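The speed-selection loop described above can be sketched as follows. This is an illustration under stated assumptions, not the RT-Linux module itself: speeds are given in cycles per second in ascending order, each ready task is a pair of absolute deadline and maximum cycle count, and the highest speed is returned as a fallback when no speed passes both checks (the text does not say what the implementation does in that case). The function name `choose_speed` is hypothetical.

```python
from typing import Sequence, Tuple


def choose_speed(now: float,
                 ready: Sequence[Tuple[float, float]],
                 speeds: Sequence[float]) -> float:
    """Pick the lowest speed for the earliest-deadline task.

    ready:  (absolute_deadline, max_cycles) pairs sorted by deadline;
            ready[0] is the task about to run.
    speeds: available speeds in cycles per second, lowest first.
    """
    top = speeds[-1]
    deadline, cycles = ready[0]
    for s in speeds:
        finish = now + cycles / s          # worst-case completion at speed s
        if finish > deadline:
            continue                       # too slow for the current task
        # Schedulability test: remaining ready tasks run at the highest speed.
        t, feasible = finish, True
        for d, c in ready[1:]:
            t += c / top
            if t > d:
                feasible = False
                break
        if feasible:
            return s
    return top                             # assumed fallback: run flat out
```

As the text notes, this test is only guaranteed to work when all tasks share the same period, since it considers only tasks that are already on the ready list.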
The present implementation performs an $O(n^2)$ sort at
every scheduling instant for simplicity. For efficiency, this
could be replaced by an $O(n \log n)$ sorting technique. Alternatively,
the task queue could be separated into a deadline-ordered
ready queue and a blocked queue. The latter option
would require an $O(n)$ insertion whenever a task is released,
but would allow a constant-time task selection and an $O(n)$
schedulability test.
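The deadline-ordered ready queue mentioned as an alternative could look like the sketch below. It is not part of the described implementation; the class name `ReadyQueue` and its methods are hypothetical. Entries are stored with negated deadlines so that the earliest deadline sits at the end of the list, which makes selection and removal constant-time while insertion stays $O(n)$.

```python
import bisect


class ReadyQueue:
    """Deadline-ordered ready queue: O(n) insert, O(1) earliest-deadline pop."""

    def __init__(self):
        # Items are (-deadline, name), kept in ascending order, so the task
        # with the earliest deadline is always the last element.
        self._items = []

    def release(self, name, deadline):
        """Insert a newly released task (binary search plus O(n) list shift)."""
        bisect.insort(self._items, (-deadline, name))

    def pop_earliest(self):
        """Remove and return (name, deadline) of the earliest-deadline task."""
        neg_deadline, name = self._items.pop()
        return name, -neg_deadline

    def in_deadline_order(self):
        """Return (name, deadline) pairs, earliest deadline first, for an
        O(n) schedulability pass."""
        return [(name, -nd) for nd, name in reversed(self._items)]

    def __len__(self):
        return len(self._items)
```

A blocked queue for tasks awaiting their next release would sit alongside this structure; it is omitted here to keep the sketch short.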
Switching the power state of a mobile Athlon 4 processor
simply consists of writing to a model specific register
(MSR). After identifying the core voltage and clock frequency
at which the processor is to be set, these values must
be encoded into a 32-bit word along with 3 control bits.
Another 32-bit word contains the stop-grant timeout count
(SGTC), which represents the number of 100 MHz system
clocks during which the processor is stalled for the voltage
and frequency changes. The maximum phase-locked
loop (PLL) synchronization time is 50 μs and the maximum
time for ramping the core voltage is 100 μs. The WRMSR
macro provides a very simple mechanism for instrumenting
the power state change. For debugging, the RDMSR
macro was used with a status MSR to retrieve the processor's
power state. Decoding the two 32-bit word values reveals
the maximum, current, and default frequency and core
voltage.
The RT-Linux high-resolution timer used for scheduling
is based (in x86 systems) on the time-stamp counter (TSC).
The TSC is a special counter introduced by Intel that simply
counts clock periods in the CPU since it was started (boot
time). The gethrtime() RT-Linux method (and all methods
derived from it) converts the TSC value into a time value
using the recorded clock frequency. Thus, a simple calculation
to determine time in nanoseconds from the TSC value
would be the TSC value divided by the speed,
where speed is measured in gigahertz (cycles per nanosecond).
Because RT-Linux
was initially developed without the need for dynamic frequency
switching, the speed used for the calculation of time
is set at boot time and never changed. Thus, when the processor
is slowed to a low-power state with a lower clock frequency,
the TSC counts at a lower rate. However, the gethrtime()
method is oblivious to this and the measurement of
time slows down proportionally. In order to circumvent this
issue, we chose to track time from within the LEDF module,
utilizing knowledge of the processor's operating speed.
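Tracking time with knowledge of the operating speed amounts to converting each run of counter ticks at the frequency that was active while they were counted. The sketch below illustrates the idea only; it is not the LEDF module's code. The class name `ScaledClock` is hypothetical, counter readings are passed in as plain integers rather than read from hardware, and time elapsed during the switch itself is ignored.

```python
class ScaledClock:
    """Accumulate elapsed nanoseconds from a cycle counter whose rate
    changes whenever the processor frequency is switched."""

    def __init__(self, tsc_now: int, freq_mhz: float):
        self._last_tsc = tsc_now
        self._freq_mhz = freq_mhz
        self._elapsed_ns = 0.0

    def _accumulate(self, tsc_now: int) -> None:
        cycles = tsc_now - self._last_tsc
        # cycles / (freq in MHz) gives microseconds; multiply by 1000 for ns.
        self._elapsed_ns += cycles * 1000.0 / self._freq_mhz
        self._last_tsc = tsc_now

    def switch_speed(self, tsc_now: int, new_freq_mhz: float) -> None:
        """Close the interval at the old frequency, then adopt the new one."""
        self._accumulate(tsc_now)
        self._freq_mhz = new_freq_mhz

    def now_ns(self, tsc_now: int) -> float:
        """Elapsed time in nanoseconds since the clock was created."""
        self._accumulate(tsc_now)
        return self._elapsed_ns
```

For instance, 1,100,000 ticks at 1100 MHz followed by 700,000 ticks at 700 MHz both correspond to one millisecond, whereas a conversion fixed at the boot-time frequency would undercount the second interval.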
One notable effect of using time calculations based on
the TSC is that the effect of a speed switch on the TSC, and
thus how to measure time during the switch, is unclear. The
TSC does appear to be incremented during some part of the
speed switch, but the count is not a reliable means of mea-
suring time. We were also unable to verify that the SGTC
value speciﬁed when performing the speed switch was an
accurate method of calculating the time elapsed during a
switch. However, because time is now “controlled” en-
tirely by the scheduler, we can simulate ideal conditions for
the experiment by ignoring time elapsed during the speed
switch. This would clearly not work for a real embedded
system where real-time tasks would have deadlines with
significance outside of the system (where ignoring the speed
switch times would cause drift in measured time as com-
pared to real time).
5 Experimental Results
We now present data from our power measurement ex-
periments. In this paper we measure total system power
consumption of the laptop. Knowledge of CPU power sav-
ings is useful, however, in generalizing the results. CPU
power savings can easily be derived from a set of experi-
ments. In order to isolate the power used by the proces-
sor and system board, we can turn off all system compo-
nents except the CPU and system board. We can then take
a power reading when the CPU is halted. This power mea-
surement represents the total system power excluding CPU
power. We can then subtract this base power from all future
power readings in order to obtain CPU power alone. How-
ever, we found that halting a processor is far more complex
than simply issuing a “HLT” instruction. Disconnecting the
CPU from the system bus is necessary to halt the CPU clock
and achieve significant power savings. This action can be
performed through the Northbridge or Southbridge depending
on the chipset and CPU state. We were unable to obtain
sufﬁcient documentation to implement this.
As an alternative method of estimating power drawn by
the system board and components, the power consumption
of the CPU with maximum load can be calculated from system
measurements at 2 power states. Tests can be devised to
isolate power drawn by the LCD screen, hard drive, and the
portion of the system beyond our control. Once an estimate
for system power is available, we can eliminate that from
all our readings to get an approximation of the fraction of
CPU power being saved.
In our case, we chose to compare a fully loaded processor
operating at 700 MHz (with a core voltage of 1.25V)
and at 1100 MHz (with a core voltage of 1.4V). The 700
MHz configuration uses $(700 \times 1.25^2)/(1100 \times 1.4^2)$, or
50.73% as much CPU power as the 1100 MHz configuration.
A measured difference of $(2.373 - 1.647) = 0.726$ A
implies that the fully loaded CPU operating at 1100 MHz
draws approximately 1.474A. Knowing this, we deduce
from the information in Table 3 that the system board and
basic components draw approximately 0.456A, and that under
normal operation, the system (including the disk drive
and display) draws about 0.976A in addition to the load
from the CPU. This estimation, although approximate, provides
a useful method of isolating energy used by the CPU
for various utilizations and scheduling algorithms. Knowing
CPU energy consumption is useful primarily in estimating
the effect of LEDF in a different system, specifically
one that has a significantly different ratio of CPU to system
power consumption. However, total system power is
certainly a reasonable metric to compare the efficiency of
LEDF with alternative power-saving scheduling methods.
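The derivation above can be retraced numerically. The sketch assumes, as the text implies, that the non-CPU load is identical in the two measurements and that the supply voltage is constant, so that CPU current scales with $f V^2$; the function name `cpu_current_at_full_speed` is hypothetical.

```python
def cpu_current_at_full_speed(i_fast: float, i_slow: float, power_ratio: float) -> float:
    """Solve i_fast - i_slow = I_cpu * (1 - power_ratio) for I_cpu.

    i_fast, i_slow: total system current (A) with the CPU fully loaded at
    the fast and slow settings; power_ratio: slow CPU power / fast CPU power.
    """
    return (i_fast - i_slow) / (1.0 - power_ratio)


# CPU power at 700 MHz / 1.25 V relative to 1100 MHz / 1.4 V (about 0.5073).
ratio = (700 * 1.25 ** 2) / (1100 * 1.4 ** 2)

# Fully loaded CPU current at 1100 MHz (about 1.474 A).
i_cpu = cpu_current_at_full_speed(2.373, 1.647, ratio)

# Table 3, max load with screen off and disk in standby: 1.93 A total.
board = 1.93 - i_cpu       # system board and basic components, about 0.456 A

# Table 3, max load with screen and disk on: 2.45 A total.
system = 2.45 - i_cpu      # everything except the CPU, about 0.976 A
```

Subtracting the `system` estimate from a measured total then gives an approximate CPU current for any given utilization and scheduling policy.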
We performed several experiments with 3 different versions
of the scheduling algorithm and 3 different taskset
sizes at various CPU utilization levels. The experiments
were modeled loosely after those performed in [17]. We
constructed a pseudo-random task generator to generate our
test sets.

Proceedings of the Eighth IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS'02)
0-7695-1739-0/02 $17.00 © 2002 IEEE

CPU (1100 MHz)   Screen   Disk    Current drawn (A)
Idle             Off      STBY    1.5
Idle             Off      On      1.54
Idle             On       STBY    1.91
Idle             On       Sleep   1.9
Idle             On       On      1.97
Max Load         Off      STBY    1.93
Max Load         On       On      2.45
Table 3. Current consumptions of various system components.

The task generator has 2 modes used to create different
types of task sets. In the first mode, a random set
of tasks is created with different periods. The method used
for choosing periods is based on that described in [17]. In
this mode, release times are set to the beginning of a period
and deadlines to the end of a period. Computation require-
ments for the tasks are chosen randomly and then scaled to
meet the target utilization. In the second mode, a taskset is
generated in which all tasks share the same period. Dead-
lines are chosen randomly, though conditions are imposed
to guarantee schedulability in the resulting taskset.
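
As an illustration of the first mode, the following Python sketch draws random computation requirements and rescales them so that the taskset meets a target utilization. It is a minimal sketch, not our generator: the candidate period values, the range of the random draw, and the name `generate_taskset` are assumptions made for the example, and the period-selection method of [17] is not reproduced here.

```python
import random


def generate_taskset(n_tasks, target_utilization,
                     periods=(10, 20, 50, 100), seed=None):
    """Return n_tasks periodic tasks whose total utilization equals the target.

    The candidate periods are hypothetical values chosen for illustration.
    """
    rng = random.Random(seed)
    tasks = []
    for _ in range(n_tasks):
        period = rng.choice(periods)
        # Unscaled computation requirement, drawn at random.
        wcet = rng.uniform(0.1, 1.0) * period
        tasks.append({"period": period, "wcet": wcet})

    # Scale all computation requirements to hit the target utilization.
    raw_utilization = sum(t["wcet"] / t["period"] for t in tasks)
    scale = target_utilization / raw_utilization
    for t in tasks:
        t["wcet"] *= scale
        t["release"] = 0              # released at the beginning of a period
        t["deadline"] = t["period"]   # due at the end of the period
    return tasks


tasks = generate_taskset(5, 0.5, seed=1)
total = sum(t["wcet"] / t["period"] for t in tasks)
print(f"total utilization: {total:.3f}")
```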
The simulator is a simple Perl program that reads in
task data and generates the schedule that would be generated
by the LEDF scheduler. It then takes user-supplied
baseline power measurements and uses them to compute the
power consumption of the taskset. Multiplying the fraction
of the period spent in each state by the appropriate power
consumption measurement, and summing over all states,
produces the overall power consumption for the taskset.
As a reasonable representation of the load generated by the
Linux/Idle task, the simulator assumes that the Linux/Idle
task consumes a certain amount of power whose value lies
between the power consumptions of a fully-loaded and a
fully-idle system running at a given speed. This power value
was determined by measuring the power consumption of the
laptop with regular Linux running a subset of daemon
processes in the background.
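
The power computation performed by the simulator amounts to a time-weighted average. The Python sketch below illustrates it; the state names and the power values are hypothetical placeholders, not our baseline measurements, and the actual simulator is written in Perl.

```python
def average_power(time_in_state, power_in_state):
    """Time-weighted average power over one period.

    time_in_state:  mapping state -> time spent in that state (any unit)
    power_in_state: mapping state -> baseline power measurement (W)
    """
    total_time = sum(time_in_state.values())
    return sum((time_in_state[s] / total_time) * power_in_state[s]
               for s in time_in_state)


# Hypothetical example: times in ms over a 100 ms period.
time_in_state = {"busy_1100": 25, "busy_700": 25, "linux_idle": 50}
power_in_state = {"busy_1100": 40.0, "busy_700": 30.0, "linux_idle": 28.0}

avg = average_power(time_in_state, power_in_state)
print(f"average power: {avg:.2f} W")
```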
We used a single power-state version of LEDF (in effect,
EDF) as a comparison point. These tests show the maximum
power requirements for the amount of work (computation)
to be done. We also used 2-speed and 3-speed versions
of LEDF to observe what effect adding additional power
states would have. The 2-speed version used operating
frequencies of 700 and 1100 MHz, and the 3-speed version
incorporated an intermediate 900 MHz operating frequency.
The CPU utilizations ranged from 10% to 80% in increments
of 10%. The maximum utilization of 80% was necessary
to guarantee that the Linux/Idle task had sufficient
time available for control operations. Without forcing the
scheduler to leave 20% of the period open for the Linux/Idle
task, the shell became unresponsive, forcing a hard reboot
of the machine between each test. We also implemented
the cycle-conserving EDF (ccEDF) algorithm from [17] and
compared our algorithm to it. These results are shown in
Figures 4, 5 and 6 for 5, 10, and 15 randomly-generated
tasks, respectively. Note that LEDF2 (LEDF3) and ccEDF2
(ccEDF3) refer to the use of two (three) processor speeds.
We see that LEDF performs much better than ccEDF for
tasksets with intermediate and high utilization values.

Figure 4. Heuristic comparison for 5-task taskset.
[Plot "Experimental Results for 5-task Task Sets": power consumed (W) versus utilization (10 to 80) for EDF, LEDF2, LEDF3, ccEDF2 and ccEDF3.]

The reason behind this observation is as follows. The
ccEDF algorithm operates by considering the utilization of
the taskset at each scheduling instant. It is a well-known
fact that a taskset is schedulable under EDF if its utiliza-
tion is less than or equal to 1 for a given CPU speed. The
ccEDF algorithm first statically finds a lower operating
frequency $f_l$ such that the total utilization of the taskset
exactly equals 1. However, such a frequency $f_l$ can be
found only if the CPU frequency is continuously variable.
When CPU speeds are limited to a small subset of the
frequency range, finding an $f_l$ for which utilization equals 1
may not be possible (especially when the frequencies are far
apart). Intuitively, ccEDF attempts to distribute the available
slack in the system evenly among all jobs by increasing
the execution times of all jobs. When only discrete
frequency values are allowed, it is advantageous to allocate
slack to a few jobs in a more efﬁcient manner. Since LEDF
allocates slack to jobs on a per-job basis, it performs better
than ccEDF at intermediate and higher utilization values.
This is also the reason why we do not consider a preemp-
tive algorithm in this paper. When preemption is allowed, it
is usually easier to test for schedulability based on utiliza-
tion. A utilization-based DVS scheme is inefﬁcient for the
discrete speed case.
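
The limitation of utilization-based frequency selection with discrete speeds can be seen in a small Python sketch. It is only an illustration of the static step described above, under the assumption that execution times scale inversely with frequency; it is not the ccEDF implementation of [17], and the function names are hypothetical.

```python
def ideal_frequency(utilization, f_max):
    """Continuous frequency at which the scaled utilization would equal 1."""
    return utilization * f_max


def lowest_feasible_speed(utilization, f_max, speeds):
    """Lowest available discrete speed that keeps the scaled utilization <= 1."""
    target = ideal_frequency(utilization, f_max)
    for f in sorted(speeds):
        if f >= target:
            return f
    raise ValueError("taskset is not schedulable at any available speed")


def scaled_utilization(utilization, f_max, f):
    """Utilization of the taskset when run at frequency f instead of f_max."""
    return utilization * f_max / f


u = 0.7  # utilization measured at 1100 MHz
for speeds in ([700, 1100], [700, 900, 1100]):
    f = lowest_feasible_speed(u, 1100, speeds)
    print(f"speeds={speeds}: ideal {ideal_frequency(u, 1100):.0f} MHz, "
          f"chosen {f} MHz, scaled utilization "
          f"{scaled_utilization(u, 1100, f):.2f}")
```

With only 700 and 1100 MHz available, a taskset at 70% utilization cannot be slowed down at all by this rule, since the ideal frequency of about 770 MHz lies between the two speeds. A per-job allocation can still run some jobs at 700 MHz.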
Comparisons between measured experimental results
and simulation results are shown in Figures 7 and 8.
Although not shown explicitly, the EDF simulation re-
sults almost precisely match experimental results. The two-
and three-speed versions of LEDF are a close match. In the
worst case, the measurement was 70 mA off from the simulation
result (3-speed LEDF with 5 tasks at 70% utilization).
Based on our estimate of system load, this translates
to about 7% of the current drawn by the CPU in this case. In
most cases, the simulated and measured values are the same
or within 2% of each other.

Figure 5. Heuristic comparison for 10-task taskset.
[Plot "Experimental Results for 10-task Task Sets": power consumed (W) versus utilization (10 to 80) for EDF, LEDF2, LEDF3, ccEDF2 and ccEDF3.]

Because EDF is a single-speed scheduler, we expect its power
consumption to vary linearly with the computation requirements
of the taskset. In all cases the EDF power consumption
variation is nearly linear, using approximately 37W to
37.4W for tasksets with low utilization and over 42W for
80% utilization.
The results for the different taskset sizes are all quite similar.
The only major difference is that the 15-task tasksets gave
slightly smoother results. This is expected because it is less
likely that randomly tight deadlines will have a great effect
on the results. In the 5-task experiments, the effect of a
randomly tight deadline can be greater because the task is
likely to account for a greater portion of the processing time.
Averaging results over more task sets for each utilization
may reduce this effect. Also, note that the effect should not
be present under a fully utilized system and a minimally
utilized system because all tasks will be running at the highest
or lowest speed. Indeed the 80% utilization mark shows a
consistent 39.6W consumption rate and the 10% utilization
point shows a consistent 27.8W consumption rate.
The power savings ranged from 9.4W in a minimally
utilized system to 2.6W in a fully utilized system. The fully
utilized system has lower power consumption under LEDF
because LEDF schedules the non-real-time component at
the lowest speed. Note, however, that up to the 50% mark
the power savings remain over 9W and remain in most cases
over 7W for 60% utilization. With a maximum utilization
of 80%, the system can still save significant power with a
reasonable task load.

Figure 6. Heuristic comparison for 15-task taskset.
[Plot "Experimental Results for 15-task Task Sets": power consumed (W) versus utilization (10 to 80) for EDF, LEDF2, LEDF3, ccEDF2 and ccEDF3.]

The simulation results provided a very close match to
the experimental results, indicating that the simulation
engine accurately models the real hardware. Because
the simulation engine does not take into account the
scheduler's computation time, the fidelity of the results may
degrade for very high task counts due to the extra cost of
sorting the deadlines. In order to verify this, we evaluated
LEDF with several randomly-generated tasksets with different
utilizations, with the number of tasks ranging from
10 to 50, and measured the execution time of the scheduler
for each taskset. Our results show that the execution time
of the scheduler was on the order of microseconds, while
the task execution times were on the order of milliseconds.
For increasing taskset size, scheduler runtime increases at a
very slow rate. Thus, scheduling overhead does not prove
to be too costly for the power-aware version of EDF for
taskset utilization ranging from 10% to 80%. Scheduler
overhead becomes an issue only when the number of tasks
is in the hundreds. However, for tasksets with more than
100 tasks, the RT-Linux platform tended to become
unresponsive. The measured scheduler runtimes are shown in
Table 4. The entries
in the table correspond to tasksets with 20% utilization, but
with varying number of tasks. The other tasksets we exper-
imented on (taskset utilizations of 50% and 80%) also have
the same trend in scheduler runtime and are not reproduced
here due to space limitations. The start times in the table
represent the scheduler clock values at the start of each test
run and the end times represent the clock reading at the end
of the first scheduler iteration. The time difference between
the end and start times thus includes the sorting time and
time for identification of the active task. Although our
implementation of LEDF is of $O(n^2)$ complexity, scheduling
overhead is negligible for up to a hundred tasks for utilizations
ranging from 10% to 80%. We emphasize that no task
deadlines are missed for tasksets at any utilization value.
For small tasksets, consisting of at most a few hundred
tasks, scheduling overhead is negligible compared to
task execution times.

Figure 7. Comparison of experimental 2-state LEDF with expected results.
[Plot "Results for 2-speed LEDF with 15-task Task Sets": power consumed (W) versus utilization (10 to 80), simulation and experimental curves.]

Number     Scheduler     Scheduler     Time
of tasks   start (ns)    end (ns)      difference (ns)
10         581872567     581876243     3676
15         920912729     920926639     13910
20         1562702962    1562707667    4705
25         1601616110    1601622331    6221
30         321138025     321141561     3536
35         1289818471    1289823622    5151
40         1858597555    1858606179    8624
45         917709246     917715072     5826
Table 4. Scheduling overhead for varying taskset sizes.
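
To put the entries of Table 4 in perspective, the following Python sketch recomputes the time differences and expresses each as a fraction of a task execution time. The 1 ms task execution time is an assumed reference value, chosen only because task execution times were on the order of milliseconds; the start and end values are those of Table 4.

```python
# (number of tasks, scheduler start in ns, scheduler end in ns), from Table 4.
TABLE_4 = [
    (10, 581872567, 581876243),
    (15, 920912729, 920926639),
    (20, 1562702962, 1562707667),
    (25, 1601616110, 1601622331),
    (30, 321138025, 321141561),
    (35, 1289818471, 1289823622),
    (40, 1858597555, 1858606179),
    (45, 917709246, 917715072),
]

REFERENCE_TASK_NS = 1_000_000  # assumed 1 ms task execution time

overheads = [end - start for _, start, end in TABLE_4]

for (n_tasks, _, _), overhead in zip(TABLE_4, overheads):
    share = 100.0 * overhead / REFERENCE_TASK_NS
    print(f"{n_tasks:2d} tasks: {overhead / 1000:6.2f} us "
          f"({share:.2f}% of a 1 ms task)")
```

Even the largest entry stays below 14 microseconds, which is under 1.5% of the assumed 1 ms reference.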
Figure 8. Comparison of experimental 3-state LEDF with expected results.
[Plot "Results for 3-speed LEDF with 15-task Task Sets": power consumed (W) versus utilization (10 to 80), simulation and experimental curves.]

6 Conclusions
Energy has emerged as a major resource constraint in
battery-operated systems. For real-time embedded systems
such as sensor networks, energy consumption must be carefully
balanced with real-time responsiveness. In this paper,
we have presented a dynamic power management scheme
that uses a low-energy EDF scheduler to guarantee real-time
execution with significant energy savings. The LEDF
scheduler is efficient and it can be easily integrated into
the kernels of real-time operating systems. We have implemented
LEDF on an RT-Linux testbed running on an
AMD Athlon 4 processor. We have evaluated LEDF on
several tasksets, both real-life and randomly generated, and
have shown that LEDF provides significant energy savings
in real-time systems.
Most of the lessons learned during this experimentation
occurred during the planning and development phases rather
than the actual testing phase. The first lesson is that
unless we are working on an in-house embedded system,
using a common platform such as the x86 platform helps to
keep options open. Furthermore, a wealth of free and easily
available information, software, and support is available
for various aspects of an x86 system. Although not discussed
in this paper, early prototype work using a SuperH
SH-4-based embedded system resulted in a lot of difficulty
(from a variety of sources) in just getting RT-Linux installed.
Unless there are strong reasons to the contrary, it
is advisable to implement first on a familiar system, then
later port to the desired end-system if necessary.
The next lesson is that highly-competitive hardware
companies tend to encapsulate as much information as they
can under the label “proprietary”. This might not be such
a problem for embedded processors (the SuperH SH-4 had
excellent documentation for everything we wanted to do),
but it took us a long time to get the information we needed
for power switching from AMD. Hence it is advisable to
identify at the outset what detailed information is needed,
and either work on obtaining this information before anything
else, or pad the development schedule for delays.
Another lesson learnt is to verify an implementation with
external measurements as much as possible. When making
modifications to kernel code, always suspect any information
that also comes from the kernel. It was not until we
attempted to verify the execution times of programs running
at lower speeds that we even suspected the timing issue
stemming from the TSC. Prior to this, it simply appeared
that the power state switches were failing, indicating that
we were not writing the correct information to the MSR.
The status MSR indicated successful power switches, however,
which caused further alarm.
An interesting point that emerges from this work is that
for discrete speeds, using utilization values as a metric to
lower operating frequency (as in the case of ccEDF) results
in inefficient DVS algorithms. Using the LCM polynomial
transformation provides greater energy savings when only
discrete CPU frequencies are available. Although computationally
expensive for some special cases, the transformation
of generic periodic tasksets into tasksets where all
tasks have the same period can result in an efficient DVS
algorithm that can be easily implemented in RT-Linux.
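
The idea of turning a generic periodic taskset into one with a single common period can be sketched in Python as follows. This is a minimal sketch under the assumption that the transformation expands each task into its job instances over the least common multiple (LCM) of the periods; the function names and the example tasks are hypothetical, and the details may differ from our implementation.

```python
from functools import reduce
from math import gcd


def hyperperiod(periods):
    """Least common multiple of a list of integer periods."""
    return reduce(lambda a, b: a * b // gcd(a, b), periods)


def expand_to_common_period(tasks):
    """Expand (name, period, wcet) tasks into jobs over one LCM-long period.

    Each job is released at the start of its period and is due at the end.
    """
    lcm = hyperperiod([period for _, period, _ in tasks])
    jobs = []
    for name, period, wcet in tasks:
        for k in range(lcm // period):
            jobs.append({
                "task": name,
                "release": k * period,
                "deadline": (k + 1) * period,
                "wcet": wcet,
            })
    # Earliest deadline first; ties broken by release time.
    jobs.sort(key=lambda j: (j["deadline"], j["release"]))
    return lcm, jobs


lcm, jobs = expand_to_common_period([("A", 4, 1), ("B", 6, 2)])
print("common period:", lcm)
for job in jobs:
    print(job)
```

The number of jobs grows with the LCM of the periods, which is why the transformation can become expensive when the periods share few common factors.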
Our results give rise to several new directions. A few of
these are listed below:
• The algorithm presented in this paper does not handle
preemption. In mission-critical systems, preemption is
an extremely important method to guarantee safety.
• The model presented in this paper does not consider
the arrivals of sporadic tasks. Algorithms that handle
these kinds of tasks usually attempt to minimize response
time. A more intelligent DVS algorithm can be
developed and implemented that attempts to minimize
energy while at the same time minimizing the response
times of sporadic tasks.
• Several algorithms have been developed that attempt to
utilize the slack available in a taskset to further scale
down CPU voltages and speeds. Implementing these
methods can lead to more energy savings than considering
worst-case execution times alone.
References
[1] N. AbouGhazaleh, D. Mosse, B. Childers and R. Melhem. Toward
the placement of power management points in real-time applications.
Proc. Workshop on Compilers and Operating Systems for Low
Power (COLP), 2001.
[2] Advanced Conﬁguration and Power Interface (ACPI).
http://www.acpi.info/.
[3] H. Aydin, R. Melhem, D. Mosse, P. Mejia-Alvarez. Dynamic and
aggressive scheduling techniques for power-aware real-time systems.
Proc. Real-Time Systems Symp., pp. 95–105, 2001.
[4] Intel SpeedStep Technology.
http://support.intel.com/support/processors/mobile/pentiumiii/ss.htm
[5] AMD PowerNow! Technology.
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24319.pdf
http://www.amd.com/epd/processors/6.32bitproc/8.amdk6fami/x24404/24404a.pdf
[6] A. Chandrakasan and R. Brodersen. Low Power Digital CMOS Design,
Kluwer Academic Publishers, Norwell, MA, 1995.
[7] The embedded Conﬁgurable operating system (eCos).
www.redhat.com/embedded/technologies/ecos
[8] F. Gruian and K. Kuchcinski. LEneS: Task scheduling for low-
energy systems using variable supply voltage processors. Proc. Asia
South Paciﬁc Design Automation Conf., pp. 449–455, 2001.
[9] I. Hong, D. Kirovski, G. Qu, M. Potkonjak and M. B. Srivastava.
Power optimization of variable-voltage core-based systems. Proc.
Design Automation Conf., pp. 176–181, 1998.
[10] I. Hong, G. Qu, M. Potkonjak, and M. B. Srivastava. Synthesis tech-
niques for low-power hard real-time systems on variable-voltage
processors. Proc. Real-Time Systems Symp., pp. 178–187, 1998.
[11] C-H. Hsu, U. Kremer and M. Hsiao. Compiler-directed dynamic
frequency and voltage scheduling. Proc. Workshop on Power-Aware
Computer Systems, 2000.
[12] C. Hwang and A. C-H. Wu. A predictive system shutdown method
for energy saving of event-driven computation. Proc. Intl. Conf.
Computer-Aided Design, pp. 28–32, 1997.
[13] J. Luo and N. K. Jha. Power-conscious joint scheduling of periodic
task graphs and aperiodic tasks in distributed real-time embedded
systems. Proc. Intl. Conf. Computer-Aided Design, pp. 357–364,
2000.
[14] E. L. Lawler and C. U. Martel. Scheduling periodically occurring
tasks on multiple processors. Information Processing Letters, vol.
12, no. 1, pp. 9–12, 1981.
[15] S. Lee and T. Sakurai. Run-time voltage hopping for low-power
real-time systems. Proc. Design Automation Conf., pp. 806–809,
2000.
[16] C. L. Liu and J. Layland. Scheduling algorithms for multiprogram-
ming in a hard real-time environment. Journal of the ACM, vol. 20,
pp. 46–61, 1973.
[17] P. Pillai and K. G. Shin. Real-time dynamic voltage scaling for low-
power embedded operating systems. Proc. Symp. Operating Sys-
tems Principles, pp. 89–102, 2001.
[18] J. Pouwelse, K. Langendoen, and H. Sips. Energy priority schedul-
ing for variable voltage processors. Proc. Intl. Symp. Low-Power
Electronics and Design, pp. 28–33, 2001.
[19] G. Quan and X. Hu. Energy efﬁcient ﬁxed-priority scheduling for
real-time systems on variable voltage processors. Proc. Design Au-
tomation Conf., pp. 828–833, 2001.
[20] V. Raghunathan, P. Spanos and M. B. Srivastava. Adaptive power-
ﬁdelity in energy-aware embedded systems. Proc. Real-Time Sys-
tems Symp., pp. 106–115, 2001.
[21] The RT-Linux Operating System.
http://www.fsmlabs.com/community/.
[22] M. B. Srivastava, A. P. Chandrakasan and R. W. Brodersen. Predictive
system shutdown and other architectural techniques for energy
efficient programmable computation. IEEE Trans. VLSI Systems,
vol. 4, pp. 42–55, 1996.
[23] F. Yao, A. Demers and S. Shenker. A scheduling model for reduced
CPU energy. Proc. IEEE Annual Foundations of Computer Science,
pp. 374–382, 1995.