Multiple Frequency Selection in DVFS-Enabled Processors to Minimize
  Energy Consumption by Rizvandi, Nikzad Babaii et al.
 
Nikzad Babaii Rizvandi (1,2), Albert Y. Zomaya (1), Young Choon Lee (1), Ali Javadzadeh Boloori (1,2), Javid Taheri (1)

(1) Center for Distributed and High Performance Computing
School of Information Technologies, University of Sydney
NSW 2006, Australia

(2) National ICT Australia (NICTA), Australian Technology Park
Sydney, NSW 1430, Australia
{nikzad, yclee}@it.usyd.edu.au 
{albert.zomaya, Javid.taheri}@sydney.edu.au   
ajav4801@uni.sydney.edu.au 
 
1. Introduction
   Research on low-power systems has received a great deal of attention in recent years,
since the sustainability of current technologies and practices has become a serious issue.
A few example systems where lowering power usage is critical are:
• Wireless sensors: several sensors extract data from the environment concurrently,
transmit these data to a processing unit and receive processed data accompanied by
appropriate commands from the processing unit [1-4]. The sensors and their
receivers/transmitters are generally powered by batteries and/or solar cells.
• Satellite circuits: satellites typically involve a massive number of complex circuits that
must work at low power. These circuits are supplied by solar cells, the only available
power supply in satellites.
• Robots and surveillance devices: these devices are heavily used in the military, in mining
and in environments that are difficult or unsafe for humans.
• Cell phones and laptops: these devices are powered by batteries, which are expected
to last for a long time.
In the meantime, steep increases in energy prices and the environmental impact of carbon
dioxide emissions associated with energy generation and transportation have extended the
issue of reducing energy consumption to a broader range of systems,
including High Performance Computing Systems (HPCS).
Various issues, such as resource management at both the software and hardware levels, must
be addressed to reduce energy consumption in HPCS. An important issue in hardware
resource management is how to reduce power usage in processors. In the recent past,
many hardware-based approaches have been proposed to reduce energy
consumption, particularly for processors. Dynamic voltage-frequency scaling (DVFS) is
perhaps the most appealing method and is incorporated into many recent processors. Energy
savings with this method are based on the fact that the power consumption of CMOS
circuits is proportional to the frequency and to the square of the supply voltage. The
execution time and power consumption can therefore be controlled by switching between the
processor's frequencies and voltages. Although this approach was initially designed for
single-processor task scheduling [5], it has recently received much attention in
multiprocessor systems as well [6, 7].
The DVFS technique and task scheduling can be combined in two ways: (1) schedule
generation, and (2) slack reclamation. In schedule generation, task graphs are
(re)scheduled on DVFS-enabled processors using a global cost function that includes both
energy saving and makespan, so as to meet energy and time constraints at the same time
[8, 9]. In slack reclamation, which works as a post-processing procedure on the output of
scheduling algorithms, the DVFS technique is used to minimize the energy consumption of
tasks in a schedule generated by a separate scheduler. The existing methods based on
the DVFS technique, however, have two major shortcomings: (1) most of them focus on
schedule generation and do not adequately exploit slack reclamation
to save more energy, and (2) the existing slack reclamation methods use only one
frequency for each task from the discrete set of processor frequencies. Using one
frequency usually leaves uncovered slack time, during which the processor and other
devices only waste energy.
In this chapter we focus on slack reclamation and propose a new slack reclamation
technique, Multiple Frequency Selection DVFS (MFS-DVFS). The key idea is to execute
each task with a linear combination of more than one frequency such that this
combination uses the lowest energy while covering the whole slack time of the
task. We have tested our algorithm with both random and real-world application task
graphs and compared it with the results of previous research in [7] and [10]. The
experimental results show that our approach achieves energy savings almost identical to
the optimum energy saving.
2. Energy Efficiency in HPCSs
    Many electronic systems in everyday life, such as satellite systems, cell phones, game
devices and so on, use rechargeable batteries as their power supply. Although
battery capacity has grown significantly in recent years (by about 5% per year),
battery life is still the major drawback of most electronic
systems. In addition to power-aware battery-based systems, the issue of energy
consumption has recently attracted a great deal of attention in high performance
computing systems (HPCS). The energy consumption issue in such systems can be classified
into three groups: (1) system-level resource allocation, (2) service-level energy-load
distribution, and (3) task scheduling level (Figure 1).
At the system level, the problem is how to distribute computational resources (e.g. CPU,
network, memory and I/O) among large-scale data storage and processing centers (such
as supercomputers and data centers). Distributing resources fairly among applications (or
services) requires not only adapting individual resources but also
understanding the interaction between individual resources when they work as a system.
Therefore, the big challenge here is to find both the relationships among system resources
and their trade-offs, which may lead to an optimal balance between performance, QoS and
energy consumption [11]. Among the system-level technologies for managing
resources between workloads, virtualization has become a key technology in data centers.
Virtualization allows computational resources to be shared between different
workloads. Many of the workloads arriving at data centers are medium-sized
and often require only a small fraction of the computational resources. Servers typically
draw around 70% of their maximum power even at low utilization. With
virtualization, such workloads can be run within virtual machines (VMs), yielding
significant savings in overall energy usage. The associated VMs may require fewer
resources and can therefore be run on a single hardware unit. The less
hardware is used overall, the less energy is spent on both running
and cooling the servers.
At the service level, the concern is energy reduction by load balancing, scheduling and
mapping workloads. The main challenge is to use appropriate algorithms both to
multiplex/demultiplex workloads in order to save energy and to make a trade-off between
performance and the reduction in service cost due to energy savings. Also, to avoid hotspots
in data centers caused by highly loaded nodes, services can be moved from nodes with high
load and high temperature to nodes with smaller load and lower temperature. Generally,
this movement of services should happen only when the destination nodes can run the
services in an energy-efficient way [11].
In site-level data/task scheduling, the focus of this chapter, the operating system (OS) and
hardware configuration, such as dynamic power management, micro-architecture
techniques and dynamic voltage scaling, are used to decrease power. Here, the typical
question could be:
"What is the suitable OS/hardware configuration to process tasks in the shortest possible 
time and with minimum energy?" 
  
Figure 1. Energy consumption levels in HPCS 
3. Exploitation of dynamic voltage-frequency scaling
    Dynamic voltage-frequency scaling is a technique in computer architecture used to
reduce the energy consumption of microprocessors or to control the amount of heat generated
by the circuit. This technique is commonly used in battery-powered devices such as
laptops and cell phones, where reducing the energy drawn from the battery is necessary. In
addition, DVFS is used in high-performance computing nodes not only to decrease the power
of the nodes but also to reduce the energy needed to cool the rooms that house them. An
approximate model shows that the dynamic power in CMOS circuits is a linear function of both
the switching frequency and the square of the voltage, $P = C V^2 f$, where $C$ is the
effective switching capacitance per clock cycle. Therefore, a workload (or task) consumes
less energy when it is executed at a lower voltage and frequency. In general, a computing
node executes several tasks with inter-task relationships (e.g., precedence constraints)
simultaneously. These inter-task relationships typically incur slack time (idle time)
between tasks, which can be used by DVFS to reduce energy usage. Specifically, the slack
time associated with a task is used to execute the task at a lower voltage-frequency
setting; this in turn results in energy reduction.
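As a rough numeric illustration of the $C V^2 f$ model, the following sketch compares the dynamic energy of a fixed amount of work at two operating points. The capacitance, cycle count and voltage/frequency pairs are made-up illustration values, not measurements from any processor, and the function names are hypothetical.

```python
def dynamic_power(c_eff, voltage, freq_hz):
    """Dynamic CMOS power from the approximation P = C * V^2 * f (watts)."""
    return c_eff * voltage ** 2 * freq_hz


def dynamic_energy(c_eff, voltage, freq_hz, cycles):
    """Energy to run `cycles` clock cycles: power times run time (cycles / f)."""
    run_time = cycles / freq_hz
    return dynamic_power(c_eff, voltage, freq_hz) * run_time


# Hypothetical values chosen only for illustration.
C_EFF = 1e-9          # effective switching capacitance, farads
CYCLES = 8_000_000    # work to do, in clock cycles

fast = dynamic_energy(C_EFF, 1.6, 1.0e9, CYCLES)   # high voltage/frequency
slow = dynamic_energy(C_EFF, 1.2, 0.5e9, CYCLES)   # low voltage/frequency

# Frequency cancels out of the energy (E = C * V^2 * cycles), so under this
# model the saving comes from the lower voltage that a lower frequency permits.
print(f"fast: {fast:.5f} J, slow: {slow:.5f} J, ratio: {slow / fast:.4f}")
```

Under this approximation the energy ratio equals the squared voltage ratio, which is why frequency and voltage are scaled together.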
    There are two ways to combine scheduling and dynamic voltage-frequency scaling:
(1) independent slack reclamation, and (2) integrated schedule generation. The existing
methods in the literature based on these combinations have two major limitations: (1) most
of them focus on integrating DVFS and scheduling (integrated schedule generation) and do
not sufficiently consider slack reclamation approaches to save more energy, and (2)
the existing slack reclamation methods use only one frequency for each task from the
discrete set of processor frequencies. Using one frequency usually leaves uncovered
slack time, during which the processor and other devices only waste energy.
3.1. Independent slack reclamation
Independent slack reclamation works on the output of other scheduling algorithms as a
post-processing procedure, applying the DVFS technique to minimize the energy consumption
of the tasks in a schedule generated by a scheduler. In [7], Kimura et al. proposed an
energy reduction algorithm for power-scalable clusters supporting DVFS. In a simplified
version of this algorithm, the appropriate frequency for each task is chosen from the set
of processor frequencies according to its slack time. Another algorithm was proposed in
[10] to reclaim slack time for each task in a DAG by a linear combination of the
processor's highest and lowest frequencies. To the best of our knowledge, among existing
energy-aware algorithms in HPCS, these two methods are the approaches most similar to our
MFS-DVFS algorithm presented in this chapter. We refer to the simplified versions of these
two algorithms as Reference DVFS (RDVFS) and Maximum-Minimum-Frequency DVFS
(MMF-DVFS) in the rest of this chapter and use them as benchmarks to evaluate the
performance of our proposed algorithm.
3.2. Integrated schedule generation
    In integrated schedule generation, task graphs are (re)scheduled on DVFS-enabled
processors using a global cost function that includes both energy saving and makespan, so
as to meet energy and time constraints at the same time [8, 9]. The final
schedule is therefore a trade-off between makespan and energy. Kappiah et al. [12]
presented a just-in-time DVFS technique to fill slack time in MPI programs. They used
a system called Jitter to reduce the frequency on nodes with more slack time and less
computation. Jitter aimed to ensure that tasks arrive just in time without
increasing the overall execution time. The DVS technique was applied in [8] to processors
that did not work at peak performance during the execution of a parallel application. The
best processor frequency for each task was selected by analyzing computation and
communication power profiles collected prior to execution. A method to reduce
power consumption by adaptively activating and deactivating hardware resources, in
particular memory, for intensive HPC applications was presented in [13]. Cache
misses when accessing main memory also play an important role in adjusting and
triggering processor slack times. Lee and Zomaya [9] presented a DVFS-based
algorithm to minimize both the completion time and the energy consumption of
precedence-constrained parallel jobs on HPC systems. This method minimizes the sum of
two cost functions: completion time and energy. Consequently, the final result is a
trade-off between the quality of the schedule and energy consumption. The concept of
energy scalability was introduced in formal terms by Ding et al. in [14]. In addition to
studying the energy efficiency/iso-efficiency concept, they extended an analytical model to
investigate the trade-off between performance and energy saving in HPCS. Molnos et al.
[15] classified the slack times in real-time applications into static, work and shared slack
groups for multiple dependent tasks on multiple DVFS-enabled processors. They
proposed a dynamic dependency-aware task scheduling method that adjusts the
voltage/frequency of each processor according to the tasks' real-time deadlines. A
profile-based power-performance optimization method that also uses DVFS in HPCS was
presented in [16]. Here, the execution of a program was divided into several regions. In
trial runs, profile information for each region, including power and execution profiles,
was extracted and then used to find its best combination of processor voltages and
frequencies. In [17], an upper limit on system energy usage was set externally.
Subsequently, a combination of performance modeling and performance prediction was applied
to reduce execution times subject to the predefined energy usage upper limit. After creating
models for both execution time and energy consumption, the key parameters of the models were
estimated by executing a program a small number of times and then regressing the
estimated parameters. For better estimation of the parameters, the following steps were
iterated until a proper schedule was achieved: (1) using the models to predict each possible
schedule of tasks, (2) executing the program a few times with the best predicted
schedule and (3) updating the estimated key parameters. Rountree et al. [18] proposed an
energy-aware schedule generation algorithm for DVFS-enabled processors in which a
combination of all processor frequencies is included in an overall linear programming
optimization.
 
4. Preliminaries
In this section, the system, application and energy models used in our study are
described.
4.1. System and application models
    In this work, we assume an HPC system comprising N homogeneous processors
with individual memories. The switching time from one frequency to another is typically
in microseconds (between 30 μs and 150 μs; refer to [19]) while the execution time of
tasks is in milliseconds. Therefore, compared with the tasks' execution time, the switching
time can be ignored. We consider a set of M dependent tasks denoted as
$A^{(1)}, A^{(2)}, \ldots, A^{(M)}$, represented by a task graph or directed acyclic graph (DAG).
The $k$-th task ($A^{(k)}$) has the following parameters (Figure 2-a): (a) $t_{OS}^{(k)}$ is the
task execution time in the original schedule without slack reclamation; (b) $T^{(k)}$ is the
whole time the processor assigns to this task, i.e. the sum of the task's execution and slack
times; (c) $t_i^{(k)}$ is the task execution time when it is executed at frequency $f_i$;
(d) $K^{(k)}$ is the number of clock cycles required to execute this task, which can
be calculated as $K^{(k)} = f_N\, t_{OS}^{(k)} = f_{RD}^{(k)}\, t_{RD}^{(k)}$, where $f_N$ is the
highest processor frequency; and (e) $f_{RD}^{(k)}$ and $t_{RD}^{(k)}$ are the frequency
calculated by the RDVFS algorithm, explained in Section 5.2, and its associated time,
respectively.
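The task parameters above can be captured in a small data structure. The sketch below is only an illustration: the class and field names are hypothetical, and the example numbers (a 6 ms task in a 10 ms window on a processor with a 1000 MHz top frequency) are invented.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Timing parameters of one scheduled task (names are illustrative)."""
    t_os: float     # t_OS: execution time in the original schedule, at f_N (s)
    t_total: float  # T: execution time plus slack assigned to the task (s)

    def cycles(self, f_max):
        """K = f_N * t_OS: clock cycles the task needs."""
        return f_max * self.t_os

    def time_at(self, f_i, f_max):
        """t_i = K / f_i: execution time if the whole task runs at f_i."""
        return self.cycles(f_max) / f_i

    def slack(self):
        """Idle time left in the original schedule."""
        return self.t_total - self.t_os


task = Task(t_os=0.006, t_total=0.010)   # 6 ms of work in a 10 ms window
F_MAX = 1000e6                           # hypothetical 1000 MHz top frequency
print(task.cycles(F_MAX))                # cycles needed
print(task.time_at(600e6, F_MAX))        # run time at 600 MHz
print(task.slack())                      # slack in the original schedule
```

In this example, running at 600 MHz stretches the task to fill its window, which is the situation slack reclamation aims for.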
4.2. Energy model
A typical DVFS-enabled processor can execute a task at a discrete set of frequencies
$(f_1 < f_2 < \ldots < f_{N-1} < f_N)$. For example, the AMD Turion MT-34 can operate at six
frequencies ranging from 800 MHz to 1800 MHz [5]. The power consumption of a
processor consists of two parts: (1) a dynamic part that is mainly related to CMOS circuit
switching energy, and (2) a static part that accounts for the CMOS circuit leakage power [20].
In CPUs, the power consumption is formulated as [21]:
$$\begin{cases} P_{dynamic} = C_{eff}\, v^2 f \\ P_{leakage} \propto v \end{cases} \tag{1}$$
Here, $C_{eff}$, $f$ and $v$ represent the effective capacitance, the processor's frequency and
voltage, respectively. Because the leakage power is assumed negligible compared with the
dynamic power [20], the overall energy consumption of the $k$-th task ($A^{(k)}$) in the DAG is
calculated as:
$$E^{(k)} = P_{dynamic}\, t_i^{(k)} + P_{Idle}\left(T^{(k)} - t_i^{(k)}\right) \tag{2}$$
CPU power consumption can be modeled as a convex function of frequency,
$P_{dynamic} \propto f^3$ [21]; therefore, the energy of the $k$-th task ($A^{(k)}$) in Eqn.2
becomes:
$$E^{(k)} = f_i^3\, t_i^{(k)} + P_{Idle}\left(T^{(k)} - t_i^{(k)}\right) \tag{3}$$
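To make Eqn.3 concrete, here is a minimal sketch that evaluates the task energy at several frequencies. Frequencies are in normalized units (highest frequency = 1.0), and the cycle count, time window and idle power are hypothetical values chosen for illustration only.

```python
def task_energy(freq, t_exec, t_total, p_idle):
    """Eqn.3: E = f^3 * t + P_idle * (T - t), with f in normalized units."""
    if t_exec > t_total + 1e-12:
        raise ValueError("task does not fit in its time window")
    return freq ** 3 * t_exec + p_idle * (t_total - t_exec)


# Hypothetical task: 6 units of work (cycles), window T = 10, idle power 0.05.
CYCLES, T, P_IDLE = 6.0, 10.0, 0.05
energies = []
for f in (1.0, 0.8, 0.6):
    t = CYCLES / f                       # run time at frequency f
    energies.append(task_energy(f, t, T, P_IDLE))
    print(f, round(energies[-1], 4))
```

The energy falls as the frequency drops and the task stretches into its slack, which is the behaviour the slack reclamation methods in the next section exploit.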
  
Figure 2. Time representation of MFS-DVFS and other algorithms: (a) The original 
scheduling, (b) The RDVFS algorithm, (c) the optimum continuous frequency, (d) the 
MMF-DVFS algorithm and (e) our proposed method in this chapter (MFS-DVFS). 
 
5. Energy-aware scheduling using DVFS
In this section, we explain existing DVFS-based approaches that reduce the energy
consumption of processors by reclaiming the slack time of each task. We then
present our algorithm, MFS-DVFS, which uses a linear combination of frequencies to solve
the stated problem.
 
5.1. Optimum Continuous Frequency 
The optimal way to remove slack time and, as a result, reduce the energy consumption of
a processor is to execute a task at a continuous frequency (Figure 2-c). Before moving
further, the following two theorems need to be proved.
Theorem 1: If frequencies $f_1$ and $f_2$ ($f_2 < f_1$) execute a task in $t_1$ and $t_2$,
respectively, then $E^{(k)}(f_2, t_2) < E^{(k)}(f_1, t_1)$.
Proof: Using Eqn.3 with $t_1 = K^{(k)}/f_1$ and $t_2 = K^{(k)}/f_2$,
$$\begin{aligned}
E^{(k)}(f_2,t_2) - E^{(k)}(f_1,t_1)
&= \left[f_2^3\, t_2 + P_{Idle}\left(T^{(k)} - t_2\right)\right] - \left[f_1^3\, t_1 + P_{Idle}\left(T^{(k)} - t_1\right)\right] \\
&= \frac{K^{(k)}}{f_1 f_2}\,(f_2 - f_1)\left(f_1 f_2 (f_2 + f_1) + P_{Idle}\right) < 0
\end{aligned}$$
As generally $P_{Idle} \ge 0$, the last factor is positive while $f_2 - f_1 < 0$; therefore
Theorem 1 is proved.
Theorem 2: If the processor frequency is continuous (an unrealistic assumption), the optimum
energy for the $k$-th task is obtained when the task covers its whole assigned time,
including the slack time ($T^{(k)}$).
Proof: Theorem 1 shows that lowering the frequency lowers the energy, so the frequency at
which the task covers the whole slack time gives the optimum energy consumption. Note that
this frequency may not exist unless the frequency set is continuous.
According to Theorem 2, for the $k$-th task ($A^{(k)}$), the optimum continuous frequency and
its related energy are denoted $f_{\text{opt-cont}}^{(k)}$ and $E_{\text{opt-cont}}^{(k)}$ and are
calculated as [10]:
$$\begin{cases}
f_{\text{opt-cont}}^{(k)} = f_N\, \dfrac{t_{OS}^{(k)}}{T^{(k)}} \\[2mm]
E_{\text{opt-cont}}^{(k)} = \left(f_{\text{opt-cont}}^{(k)}\right)^3 T^{(k)}
\end{cases} \tag{4}$$
In actual systems, however, frequencies must be chosen from a discrete set of
frequencies. Also, finishing a task by its deadline may require choosing a frequency that
is higher than the optimal frequency. Therefore, the optimal discrete frequency of the
$k$-th task is the first frequency in the discrete set larger than
$f_{\text{opt-cont}}^{(k)}$. This discrete frequency and its associated time are
$f_{RD}^{(k)}$ and $t_{RD}^{(k)}$, respectively. The algorithm calculating this
frequency is referred to as RDVFS in our comparison [7].
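Eqn.4 is simple enough to evaluate directly. The sketch below does so for a hypothetical task, using normalized frequencies (highest frequency = 1.0); the function name and numbers are illustrative assumptions.

```python
def optimum_continuous(f_max, t_os, t_total):
    """Eqn.4: the frequency that stretches the task over its whole window T,
    and the corresponding energy f^3 * T (normalized cubic power model)."""
    f_opt = f_max * t_os / t_total
    return f_opt, f_opt ** 3 * t_total


# Hypothetical task: 6 time units at f_max = 1.0, window of 10 time units.
f_opt, e_opt = optimum_continuous(1.0, 6.0, 10.0)
print(f_opt, e_opt)
```

If the processor offered, say, only the frequencies 1.0, 0.8 and 0.5, the value 0.6 computed here would not be available, and the next higher frequency (0.8) would have to be used instead.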
5.2. Reference Dynamic Voltage-Frequency Scaling (RDVFS)
RDVFS is a simplified version of the algorithm introduced by Kimura et al. in [7] for
power-scalable high performance clusters supporting DVFS. It reduces the energy
consumption of processors by selecting the smallest available processor frequency
($f_{RDVFS}$) capable of finishing a task in a given time frame (Figure 2-b). The details of
the RDVFS algorithm are shown in Figure 3.

For each task assigned to a processor, $f_{RDVFS}$, which is the first frequency larger than
the optimal frequency ($f_{\text{opt-cont}}$) calculated from Eqn.4, is likely to be the best
discrete frequency candidate to execute the task within the given time frame and cover its
related slack time. As mentioned before, a major limitation of the RDVFS technique is the
use of only one frequency to execute the task.

RDVFS algorithm: slack reclamation by one frequency
Input: the scheduled tasks on a set of P processors
1.   for task $A^{(k)}$ scheduled on processor $P_j$
2.      Compute the optimum continuous frequency ($f_{\text{opt-cont}}^{(k)}$) from Eqn.4
3.      Pick the closest higher frequency to $f_{\text{opt-cont}}^{(k)}$ in the CPU frequency set,
        e.g. for the set $[f_{max}, \ldots, f_n, f_{n-1}, \ldots, f_{min}]$:
        $f_{n-1} < f_{\text{opt-cont}}^{(k)} \le f_n \;\Rightarrow\; f_{RDVFS}^{(k)} = f_n$
4.      $t_{RDVFS}^{(k)} = \dfrac{f_{\text{opt-cont}}^{(k)}}{f_{RDVFS}^{(k)}}\, T^{(k)}$
5.      $E_{RDVFS}^{(k)} = \left(f_{RDVFS}^{(k)}\right)^3 t_{RDVFS}^{(k)} + P_{Idle}\left(T^{(k)} - t_{RDVFS}^{(k)}\right)$
6.   end for
7.   return ($f_{RDVFS}^{(k)}$ and $t_{RDVFS}^{(k)}$ for all tasks)

Figure 3. RDVFS algorithm
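The single-frequency selection of RDVFS can be sketched for one task as follows. This is an illustrative reading of the steps in Figure 3 under the cubic power model of Eqn.3, not the implementation of [7]; the function name, the normalized frequency set and the task values are assumptions.

```python
def rdvfs(freqs, t_os, t_total, p_idle):
    """Single-frequency slack reclamation following the steps of Figure 3.

    freqs   -- available frequencies (any order), normalized units
    t_os    -- execution time at the highest frequency
    t_total -- T, execution time plus slack (t_os <= t_total)
    Returns (f_rdvfs, t_rdvfs, energy).
    """
    f_max = max(freqs)
    f_opt = f_max * t_os / t_total                     # step 2 (Eqn.4)
    # Step 3: lowest available frequency that is not below f_opt.
    f_sel = min(f for f in freqs if f >= f_opt - 1e-12)
    t_sel = f_opt / f_sel * t_total                    # step 4
    energy = f_sel ** 3 * t_sel + p_idle * (t_total - t_sel)  # step 5
    return f_sel, t_sel, energy


# Hypothetical task: 5 time units at f_max, window of 10, idle power 0.05.
f_sel, t_sel, energy = rdvfs([1.0, 0.8, 0.6, 0.4], t_os=5.0, t_total=10.0, p_idle=0.05)
print(f_sel, round(t_sel, 4), round(energy, 4))
```

Here the continuous optimum is 0.5, so the task runs at 0.6 and part of its window remains idle; that uncovered slack is what the multi-frequency methods below try to use.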
5.3. Maximum-Minimum Frequency for Dynamic Voltage-Frequency Scaling (MMF-DVFS)
The Maximum-Minimum Frequency for Dynamic Voltage-Frequency Scaling (MMF-DVFS)
technique presented in [10] is similar to RDVFS in that both approaches use DVFS
to reduce the energy consumption of scheduled dependent tasks in clusters. Unlike the RDVFS
algorithm, which applies only one frequency to execute a task, MMF-DVFS uses a linear
combination of the maximum and minimum processor frequencies to achieve the optimal
energy consumption with respect to the slack time of the task, as shown in Figure 2-d. Before
explaining further details of MMF-DVFS, the following lemma needs to be proved:
Lemma: If $f_{DVFS}$ is the appropriate DVFS frequency obtained from the RDVFS algorithm,
with task energy consumption $E_{DVFS}$, then there is always a linear combination of the
processor's minimum and maximum frequencies with energy consumption less than
$E_{DVFS}$.
Proof: If $f_N$, $f_1$ and $f_{DVFS}$ are the maximum, minimum and appropriate DVFS processor
frequencies extracted from the DVFS algorithm, then the lemma states that the following
inequality always has non-zero solutions $t_{f_N}$ and $t_{f_1}$ for the $k$-th task:
$$\begin{cases} E_{f_N} + E_{f_1} < E_{DVFS} \\ t_{f_N} + t_{f_1} = T \end{cases} \tag{5}$$
According to Eqn.3 in Section 4.2, $E_f = K f^3 t$. By combining this with Eqn.5, we
obtain:
$$\begin{cases} f_N^3\, t_{f_N} + f_1^3\, t_{f_1} < f_{DVFS}^3 \left(t_{f_N} + t_{f_1}\right) \\ t_{f_N} + t_{f_1} = T \end{cases} \tag{6}$$
Assuming $t_{f_N} + t_{f_1} = T$, Eqn.6 becomes:
$$0 < t_{f_N} < \frac{f_{DVFS}^3 - f_1^3}{f_N^3 - f_{DVFS}^3}\; t_{f_1}$$
which indicates that there are always valid positive $t_{f_N}$ and $t_{f_1}$. The details of
the MMF-DVFS algorithm are shown in Figure 4.
 
MMF-DVFS algorithm: linear combination of maximum and minimum frequencies
Input: the scheduled tasks on a set of P processors
1.   for task $A^{(k)}$ scheduled on processor $P_j$
2.      Calculate the amount of time for $f_{max}$, $f_{min}$:
        - $t_{f_N}^{(k)} = \dfrac{f_{RDVFS}^{(k)}\, t_{RDVFS}^{(k)} - T^{(k)} f_1}{f_N - f_1}$
        - $t_{f_1}^{(k)} = \dfrac{T^{(k)} f_N - f_{RDVFS}^{(k)}\, t_{RDVFS}^{(k)}}{f_N - f_1}$
3.      $E_{MMF\text{-}DVFS}^{(k)} = f_N^3\, t_{f_N}^{(k)} + f_1^3\, t_{f_1}^{(k)}$
4.   end for
5.   return (the set of ($t_{f_N}^{(k)}$, $t_{f_1}^{(k)}$) for all tasks)

Figure 4. MMF-DVFS algorithm
The algorithm finds the appropriate time portions of the maximum and minimum
frequencies to execute each scheduled task. It can be seen from Figure 7 that the MMF-DVFS
algorithm performs the same as RDVFS in the worst case.
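The time split in step 2 of Figure 4 follows from two conditions: the two time portions must add up to $T^{(k)}$, and together they must execute the task's cycles. The sketch below computes the split for one task; the function name and the normalized example values are assumptions made for illustration.

```python
def mmf_dvfs(f_min, f_max, cycles, t_total):
    """Split the window T between f_max and f_min as in Figure 4.

    cycles is the work to do (f_RDVFS * t_RDVFS in the figure's notation).
    Returns (t_at_fmax, t_at_fmin, energy) under the cubic power model.
    """
    t_hi = (cycles - t_total * f_min) / (f_max - f_min)
    t_lo = (t_total * f_max - cycles) / (f_max - f_min)
    if t_hi < -1e-12 or t_lo < -1e-12:
        raise ValueError("work cannot fill the window with these frequencies")
    energy = f_max ** 3 * t_hi + f_min ** 3 * t_lo
    return t_hi, t_lo, energy


# Hypothetical task: 5 units of work in a window of 10, frequencies 0.4 and 1.0.
t_hi, t_lo, energy = mmf_dvfs(0.4, 1.0, cycles=5.0, t_total=10.0)
print(round(t_hi, 4), round(t_lo, 4), round(energy, 4))
# Both constraints hold: the window is filled and the work is completed.
print(round(t_hi + t_lo, 6), round(1.0 * t_hi + 0.4 * t_lo, 6))
```

The check at the end confirms that the split leaves no idle time while executing exactly the required number of cycles.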
In the next section, we present the MFS-DVFS algorithm, which uses a linear combination of
several processor frequencies instead of two to perform a pre-defined task (Figure 2-e).
The new approach is more energy-efficient than the other algorithms
discussed earlier in this chapter; its energy saving is quite close to that of the
continuous optimum frequency.
5.4. Multiple Frequency Selection for Dynamic Voltage-Frequency Scaling (MFS-DVFS)
The RDVFS algorithm decreases a task's execution energy by choosing the best
processor speed with respect to the task's idle time [7]. As an example, a set of four
tasks scheduled on two processors is shown in Figure 2-a, where Figures 2-b, 2-c and 2-d
are the results of applying the RDVFS, optimum continuous frequency and MMF-DVFS
algorithms to the tasks, respectively. Figure 2-e shows the principle of the MFS-DVFS
algorithm proposed in this chapter. Initially, the task is executed for $t_N^{(k)}$
time units at the highest processor frequency; then its execution frequency is reduced
to the second highest value, at which it spends $t_{N-1}^{(k)}$ time units. The
frequency then continues to decrease and the task is executed at the other frequencies
until it finishes.
The key idea of MFS-DVFS is to execute tasks using a linear combination of available
frequencies so that their slack times are fully covered. MFS-DVFS can be defined
as finding the best combination of available frequencies $(f_1, \ldots, f_N)$ to perform a
predefined task with K steps of computation within a predefined time T. Therefore, the
energy consumption minimization of the $k$-th task ($A^{(k)}$) in the MFS-DVFS algorithm is
formulated as the following optimization problem:
$$\begin{aligned}
\text{Min:}\quad & E^{(k)} = \sum_{i=1}^{N} t_i^{(k)} f_i^3 + P_{Idle}\Big(T^{(k)} - \sum_{i=1}^{N} t_i^{(k)}\Big) \\
\text{s.t.}\quad & 1.\ \sum_{i=1}^{N} t_i^{(k)} f_i = K^{(k)} \\
& 2.\ \sum_{i=1}^{N} t_i^{(k)} \le T^{(k)} \\
& 3.\ t_i^{(k)} \ge 0 \quad \text{for } i = 1, 2, \ldots, N
\end{aligned} \tag{7}$$
The optimization problem in Eqn.7 represents the energy consumption problem: how to
choose $t_i^{(k)}$ so that the consumed energy of task $A^{(k)}$ is minimized. To execute the
task, the processor has to use the same number of clock cycles as in the RDVFS and MMF-DVFS
algorithms, which is constraint 1 in Eqn.7. Applying the two theorems above (the optimum
covers the whole assigned time, so the idle term vanishes) simplifies the
optimization problem in Eqn.7 to:
$$\begin{aligned}
\text{Min:}\quad & E^{(k)} = \sum_{i=1}^{N} t_i^{(k)} f_i^3 \\
\text{s.t.}\quad & 1.\ \sum_{i=1}^{N} t_i^{(k)} f_i = K^{(k)} \\
& 2.\ \sum_{i=1}^{N} t_i^{(k)} = T^{(k)} \\
& 3.\ t_i^{(k)} \ge 0 \quad \text{for } i = 1, 2, \ldots, N
\end{aligned} \tag{8}$$
To find the best possible values of $t_i^{(k)}$, this optimization must be applied to all
tasks in the schedule. There are cases in which MFS-DVFS cannot improve the energy
consumption, for example when a task already reaches $f_1$ (the lowest frequency) in the
RDVFS algorithm or has no idle time. Therefore, to speed up the MFS-DVFS
algorithm, eligible tasks should be identified before optimization.
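The chapter solves Eqn.8 with linear programming. As a dependency-free stand-in, the sketch below instead enumerates the basic feasible solutions of the problem: because Eqn.8 has only two equality constraints, some optimal solution uses at most two frequencies, so trying every pair of frequencies reaches the same optimum for a problem of this size. The function name, the normalized frequencies and the example task are illustrative assumptions.

```python
from itertools import combinations


def mfs_dvfs(freqs, cycles, t_total, tol=1e-12):
    """Solve Eqn.8 for one task by enumerating basic feasible solutions.

    Returns (energy, {frequency: time}) for the best pair of frequencies,
    or None if no pair can execute `cycles` in exactly `t_total`.
    """
    best = None
    for f_a, f_b in combinations(sorted(set(freqs)), 2):
        # Solve t_a + t_b = T and t_a*f_a + t_b*f_b = K for this pair.
        t_a = (t_total * f_b - cycles) / (f_b - f_a)   # time at lower f_a
        t_b = t_total - t_a                            # time at higher f_b
        if t_a < -tol or t_b < -tol:
            continue                                   # violates t_i >= 0
        energy = f_a ** 3 * t_a + f_b ** 3 * t_b
        if best is None or energy < best[0]:
            best = (energy, {f_a: max(t_a, 0.0), f_b: max(t_b, 0.0)})
    return best


# Hypothetical task: 5 units of work in a window of 10 time units.
energy, allocation = mfs_dvfs([0.4, 0.6, 0.8, 1.0], cycles=5.0, t_total=10.0)
print(round(energy, 4), {f: round(t, 4) for f, t in allocation.items()})
```

For this example the best split uses the two frequencies that bracket the continuous optimum (0.5), and its energy lies between the continuous-frequency value $0.5^3 \cdot 10 = 1.25$ and the value obtained by mixing only the lowest and highest frequencies.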
Task eligibility: to simplify the formulation, let us consider just 4 discrete
frequencies (real processors normally have 4-5 frequencies). In any case, the same
procedure can be used for a larger number of frequencies. The problem in Eqn.8
becomes:
$$\begin{aligned}
\text{Min:}\quad & E^{(k)} = \sum_{i=1}^{4} t_i^{(k)} f_i^3 \\
\text{s.t.}\quad & 1.\ t_1^{(k)} f_1 + t_2^{(k)} f_2 + t_3^{(k)} f_3 + t_4^{(k)} f_4 = K^{(k)} \\
& 2.\ t_1^{(k)} + t_2^{(k)} + t_3^{(k)} + t_4^{(k)} = T^{(k)} \\
& 3.\ t_i^{(k)} \ge 0 \quad \text{for } i = 1, 2, \ldots, 4
\end{aligned}$$
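Merging constraints 1 and 2 results in:
$$\begin{cases}
t_1^{(k)} = \dfrac{T^{(k)} f_2 - K^{(k)}}{f_2 - f_1} + t_3^{(k)}\,\dfrac{f_3 - f_2}{f_2 - f_1} + t_4^{(k)}\,\dfrac{f_4 - f_2}{f_2 - f_1} \\[3mm]
t_2^{(k)} = \dfrac{K^{(k)} - T^{(k)} f_1}{f_2 - f_1} - t_3^{(k)}\,\dfrac{f_3 - f_1}{f_2 - f_1} - t_4^{(k)}\,\dfrac{f_4 - f_1}{f_2 - f_1}
\end{cases}$$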
 
Therefore, the energy consumption function changes to
$$E^{(k)} = a_0^{(k)} + a_1^{(k)}\, t_3^{(k)} + a_2^{(k)}\, t_4^{(k)} \tag{9}$$
where
$$\begin{cases}
a_0^{(k)} = f_1^3\,\dfrac{T^{(k)} f_2 - K^{(k)}}{f_2 - f_1} + f_2^3\,\dfrac{K^{(k)} - T^{(k)} f_1}{f_2 - f_1} \\[3mm]
a_1^{(k)} = f_3^3 + f_1^3\,\dfrac{f_3 - f_2}{f_2 - f_1} - f_2^3\,\dfrac{f_3 - f_1}{f_2 - f_1} \\[3mm]
a_2^{(k)} = f_4^3 + f_1^3\,\dfrac{f_4 - f_2}{f_2 - f_1} - f_2^3\,\dfrac{f_4 - f_1}{f_2 - f_1}
\end{cases} \tag{10}$$
To guarantee lower energy consumption with the MFS-DVFS algorithm, the
following condition should be satisfied:
$$a_0^{(k)} + a_1^{(k)}\, t_3^{(k)} + a_2^{(k)}\, t_4^{(k)} < E_{RD}^{(k)} \tag{11}$$
The expression $a_0^{(k)} + a_1^{(k)} t_3^{(k)} + a_2^{(k)} t_4^{(k)}$ describes a surface in three
dimensions, and the search region is where it satisfies the following three constraints:
(1) $t_3^{(k)} \ge 0$, (2) $t_4^{(k)} \ge 0$ and (3) the energy bound $E_{RD}^{(k)}$ of Eqn.11. The
first two constraints are also enforced by the optimization in Eqn.8. The only one
that specifies the search region is constraint 3. If a task satisfies this last constraint,
then there is a valid search region for this task where MFS-DVFS
gives a better result than RDVFS. Linear programming then explores this search region to
find the most suitable frequencies and their associated times. The details of the MFS-DVFS
algorithm are shown in Figure 5.
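The coefficients of Eqn.10 can be checked numerically: for any choice of $t_3^{(k)}$ and $t_4^{(k)}$, the plane of Eqn.9 should give the same energy as summing $t_i^{(k)} f_i^3$ directly. The sketch below performs that check with four hypothetical normalized frequencies and an invented task; the function names are illustrative.

```python
def eligibility_coefficients(f1, f2, f3, f4, cycles, t_total):
    """Coefficients of Eqn.10, so that E = a0 + a1*t3 + a2*t4 (Eqn.9)."""
    d = f2 - f1
    a0 = (f1 ** 3 * (t_total * f2 - cycles) + f2 ** 3 * (cycles - t_total * f1)) / d
    a1 = f3 ** 3 + f1 ** 3 * (f3 - f2) / d - f2 ** 3 * (f3 - f1) / d
    a2 = f4 ** 3 + f1 ** 3 * (f4 - f2) / d - f2 ** 3 * (f4 - f1) / d
    return a0, a1, a2


def times_from_t3_t4(f1, f2, f3, f4, cycles, t_total, t3, t4):
    """t1 and t2 obtained by merging the cycle and time constraints."""
    d = f2 - f1
    t1 = (t_total * f2 - cycles) / d + t3 * (f3 - f2) / d + t4 * (f4 - f2) / d
    t2 = (cycles - t_total * f1) / d - t3 * (f3 - f1) / d - t4 * (f4 - f1) / d
    return t1, t2


F = (0.4, 0.6, 0.8, 1.0)      # hypothetical normalized frequencies f1..f4
K, T = 5.0, 10.0              # hypothetical cycles and time window
t3, t4 = 1.0, 0.5             # an arbitrary feasible choice

a0, a1, a2 = eligibility_coefficients(*F, K, T)
t1, t2 = times_from_t3_t4(*F, K, T, t3, t4)
direct = sum(t * f ** 3 for t, f in zip((t1, t2, t3, t4), F))
print(round(a0 + a1 * t3 + a2 * t4, 6), round(direct, 6))
```

Both printed values agree, which confirms that the substitution of $t_1^{(k)}$ and $t_2^{(k)}$ into the objective was carried out consistently.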
   
 
MFS-DVFS algorithm: linear combination of frequencies
Input: the scheduled tasks on a set of P processors
1.   for task $A^{(k)}$ scheduled on processor $P_j$
2.      Apply the RDVFS algorithm to this task
3.      if the $E_{RD}^{(k)}$ condition of Eqn.11 holds for this task then
           - this task is eligible for MFS-DVFS
           - Solve the optimization problem in Eqn.8 by linear
              programming
        else
             RDVFS is the optimal result
4.      end if
5.   end for
6.   return (the voltages and frequencies of optimal execution of
     the task)

Figure 5. MFS-DVFS algorithm

6. Experimental Results
In this section we present the energy consumption results obtained from simulating our
MFS-DVFS algorithm in comparison with RDVFS, MMF-DVFS and the optimum
continuous frequency. In order to compare the algorithms, the following schedulers were
used with different numbers of processors: (1) list scheduling, (2) list scheduling with
Longest Processing Time first (LPT) and (3) list scheduling with Shortest Processing
Time first (SPT).
The simulations were carried out using a simulator we developed as part of this study.
6.1. Simulation Settings
We use the voltage/frequency settings of two real processors in our simulations:
Transmeta Crusoe [7] and Intel Xscale [22]. Table 1 shows the voltage/frequency levels and
the related power consumption of these processors, followed by the convex model of each
processor. These models use least-squares curve fitting to fit a convex cubic function of
frequency to the frequency-power data of the two real processors, as shown in Figure 6.

Figure 6. The least-squares modelling of (a) Transmeta Crusoe, and (b) Intel Xscale
processors

We evaluated the performance of MFS-DVFS with two sets of task graphs: randomly
generated and real-world parallel applications. The two real-world applications used in
our experiments were LU decomposition and Gauss-Jordan, with DAGs extracted from
[19]. We applied a large number of variations in the number of processors and tasks for
each application in our simulations. The random task graph set consisted of 1500 graphs
with five graph sizes of 100, 200, 300, 400 and 500 nodes, together with three different
schedulers on five sets of 2, 4, 8, 16 and 32 processors.
 
TABLE 1. The voltage/frequency settings of two real processors in the experiments with
their power consumption and convex models

Transmeta Crusoe
Level   Frequency (MHz)   Voltage (V)   Power (W)
0       667               1.6           5.3
1       600               1.5           4.2
2       533               1.35          3.0
3       400               1.225         1.9
4       300               1.2           1.3
Convex model: $P = 1.94 \times 10^{-5} \left(\dfrac{f}{10^{6}}\right)^{3} + 44.4$ mW

Intel Xscale
Level   Frequency (MHz)   Voltage (V)   Power (W)
0       1000              1.8           1.6
1       800               1.6           0.9
2       600               1.3           0.4
3       400               1             0.17
4       150               0.75          0.08
Convex model: $P = 1.55 \times 10^{-6} \left(\dfrac{f}{10^{6}}\right)^{3} + 60$ mW
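The convex models in Table 1 can be compared against the tabulated power values. The sketch below assumes that $f/10^{6}$ in the models denotes the frequency in MHz and that the models yield milliwatts; this reading is an interpretation of the table, chosen because it reproduces the tabulated power at the highest Xscale level to within rounding. The function and variable names are hypothetical.

```python
# Power tables from Table 1: (frequency in MHz, measured power in W).
CRUSOE = [(667, 5.3), (600, 4.2), (533, 3.0), (400, 1.9), (300, 1.3)]
XSCALE = [(1000, 1.6), (800, 0.9), (600, 0.4), (400, 0.17), (150, 0.08)]


def model_power_w(f_mhz, coeff, offset_mw):
    """Convex model P = coeff * f_MHz^3 + offset (in mW), returned in watts."""
    return (coeff * f_mhz ** 3 + offset_mw) / 1000.0


for name, table, coeff, offset in (
    ("Transmeta Crusoe", CRUSOE, 1.94e-5, 44.4),
    ("Intel Xscale", XSCALE, 1.55e-6, 60.0),
):
    for f_mhz, measured in table:
        fitted = model_power_w(f_mhz, coeff, offset)
        print(f"{name:17s} {f_mhz:5d} MHz  measured {measured:5.2f} W  model {fitted:5.2f} W")
```

Being a least-squares fit, the model is not expected to match every level exactly; the printout makes the per-level deviation visible.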
 
These task graphs have different numbers of tasks, task distributions, communication costs
and task dependencies. The execution cycles of the randomly generated tasks were drawn
from a uniform distribution between 5 and 10 million cycles. We used 150 real-world
application task graphs based on the LU decomposition algorithm in our experiments.
For the real-application graphs, the same numbers of tasks (ranging from 100 to 500
tasks) with three schedulers and five sets of processors were investigated.
6.2. Results 
The simulation results of normalized energy consumption for all DAGs (Figures 7 and 8)
are summarized in Table 2. In every case MFS-DVFS saves at least as much energy as
RDVFS and MMF-DVFS, and its savings are close to those of the optimal continuous-
frequency case. Figure 8 shows that, although all algorithms including MFS-DVFS save a
significant amount of energy on LU decomposition, they save very little on Gauss-Jordan
task graphs. To examine this behaviour more closely, Figure 9 shows a sample schedule of
a three-level Gauss-Jordan application on three processors. As explained before, since
there is no idle time among tasks in Gauss-Jordan task graphs, none of these algorithms
can reduce energy consumption efficiently.
An interesting issue for further investigation is the relationship between energy
consumption and the number of processors in our experiments. Increasing the number of
processors speeds up processing and consequently reduces the makespan; however, as a
drawback, it also increases the system slack time. Figure 10 addresses this issue and
shows the percentage of overall energy saving versus the number of processors for random
and LU decomposition task graphs. The graphs in this figure show that increasing the
number of processors results in larger energy savings.
 
Figure 7. The normalized energy consumption versus the number of tasks: (a) The typical
list scheduler, (b) The list scheduler with Longest Processing Time first (LPT) and (c) The
list scheduler with Shortest Processing Time first (SPT).

Figure 8. The normalized energy consumption of MFS-DVFS and other algorithms versus the
number of tasks for two real-world applications: (a) LU decomposition, (b) Gauss-Jordan.
 
 
 
 
 
Table 2. The energy saving percentage of MFS-DVFS and other algorithms on 1800 random
and real task graphs.

Experiment                     Random tasks   Gauss-Jordan   LU decomposition
RDVFS                          13.00%         0.1%           24.8%
MMF-DVFS                       13.50%         0.11%          25.5%
MFS-DVFS                       14.40%         0.11%          27.0%
Optimal Continuous Frequency   14.84%         0.14%          27.81%
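To make the gap to the continuous-frequency bound in Table 2 easier to read, the short sketch below expresses each algorithm's saving as a percentage of the optimal continuous-frequency saving. The dictionary layout and the helper name `fraction_of_optimal` are assumptions of this example; the numbers are copied from the table, with the MFS-DVFS entry taken from the row for the proposed algorithm.

```python
# Energy savings (%) copied from Table 2.
SAVINGS = {
    "RDVFS":    {"Random": 13.00, "Gauss-Jordan": 0.10, "LU": 24.8},
    "MMF-DVFS": {"Random": 13.50, "Gauss-Jordan": 0.11, "LU": 25.5},
    "MFS-DVFS": {"Random": 14.40, "Gauss-Jordan": 0.11, "LU": 27.0},
}
OPTIMAL = {"Random": 14.84, "Gauss-Jordan": 0.14, "LU": 27.81}

def fraction_of_optimal(saving, optimal):
    """Saving achieved, as a percentage of the continuous-frequency saving."""
    return 100.0 * saving / optimal

for algorithm, row in SAVINGS.items():
    cells = ", ".join(
        f"{workload}: {fraction_of_optimal(value, OPTIMAL[workload]):.1f}%"
        for workload, value in row.items()
    )
    print(f"{algorithm:9s} {cells}")
```

For the random and LU decomposition graphs this puts MFS-DVFS at roughly 97% of the continuous-frequency saving, compared with about 88% to 92% for RDVFS and MMF-DVFS.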
 
The major limitation of most DVFS-based algorithms that run each task at a single
frequency (such as the RDVFS algorithm) is that the set of available frequencies is fixed.
These algorithms would work best if the processor could run at any arbitrary frequency.
However, due to technological constraints, the number of valid frequencies is limited, so
these algorithms have to choose the most appropriate frequency among the discrete set
defined by DVFS. Because the number of clock cycles of a task is fixed (constraint 1 in
Eqn. 8), the relation among $t_{RD}^{(k)}$, $f_{RD}^{(k)}$, $f_N$ and $t_{OS}^{(k)}$ for
task $A^{(k)}$ is:
$$t_{RD}^{(k)} = \frac{f_N}{f_{RD}^{(k)}} \, t_{OS}^{(k)}$$

This shows that although $t_{RD}^{(k)}$ is a continuous variable, it can only take the
values that correspond to the discrete frequencies; therefore the slack time of tasks
cannot be minimized. In the MFS-DVFS algorithm, however, the relation between those
variables is
$$f_{RD}^{(k)} t_{RD}^{(k)} = f_1 t_1^{(k)} + f_2 t_2^{(k)} + \cdots + f_N t_N^{(k)}$$
which is one equation in more than one variable $(t_1^{(k)}, \ldots, t_N^{(k)})$ and may
have many feasible solutions; thus, appropriate values of these variables, chosen with
regard to the task conditions, can minimize the slack time and/or reduce energy
consumption.
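The extra freedom gained by splitting a task over several frequencies can be illustrated with a small search. The sketch below is not the chapter's solver: it is a brute-force stand-in that, for one task, tries every single frequency and every pair of frequencies from the Intel Xscale levels of Table 1 and keeps the cheapest feasible option. The function name `best_mix`, the choice of units (millions of cycles, MHz, seconds) and the assumption that idle time costs no energy are all assumptions of this example.

```python
from itertools import combinations

# (frequency in MHz, power in W) for the Intel Xscale levels of Table 1
XSCALE = [(1000, 1.6), (800, 0.9), (600, 0.4), (400, 0.17), (150, 0.08)]

def best_mix(levels, mcycles, slot, eps=1e-12):
    """Return (energy_J, {freq_MHz: seconds}) with minimum energy for one task.

    mcycles: task length in millions of cycles; slot: available time in seconds.
    Idle time is assumed to cost no energy (an assumption of this sketch).
    Returns None if the task does not fit in the slot even at the top frequency.
    """
    best = None
    # Option 1: a single frequency, running for mcycles / f seconds.
    for f, p in levels:
        t = mcycles / f
        if t <= slot + eps:
            candidate = (p * t, {f: t})
            if best is None or candidate[0] < best[0]:
                best = candidate
    # Option 2: two frequencies that fill the slot exactly, i.e.
    #   t_i + t_j = slot  and  f_i * t_i + f_j * t_j = mcycles
    for (fi, pi), (fj, pj) in combinations(levels, 2):
        ti = (mcycles - fj * slot) / (fi - fj)
        tj = slot - ti
        if ti >= -eps and tj >= -eps:
            candidate = (pi * ti + pj * tj, {fi: ti, fj: tj})
            if best is None or candidate[0] < best[0]:
                best = candidate
    return best

energy, plan = best_mix(XSCALE, mcycles=700, slot=1.0)
print(f"energy = {energy:.4f} J, plan = {plan}")
```

For a task of 700 million cycles in a 1 s slot, the search selects 0.5 s at 800 MHz plus 0.5 s at 600 MHz (0.65 J), whereas running at 800 MHz alone would take 0.875 s and 0.7875 J. Checking only singles and pairs is enough for this formulation because a linear program with one cycle constraint and one time constraint has an optimal solution with at most two non-zero time portions.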
Figure 9. Gauss-Jordan task graph: (a) a sample schedule of a three-level Gauss-Jordan
task graph on three processors, (b) a Gauss-Jordan DAG for three levels. The
communication costs (Cij) are equal to 10 time units for all i and j.

An overhead of MFS-DVFS and MMF-DVFS is the transition time of switching from one
frequency to another. A reasonable assumption is that the transition time is much smaller
than the execution times of tasks; therefore the transition-time overhead can be neglected
in the calculations. In our experiments, only tasks whose T is at least 20 times the
transition time are considered for the MFS-DVFS algorithm.
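The eligibility rule above can be sketched as a simple filter. The transition time, the task names and the slot lengths below are hypothetical values chosen for illustration (the chapter does not state them); only the factor of 20 comes from the text.

```python
# Hypothetical values: the chapter does not state the actual transition time.
TRANSITION_TIME_US = 100  # assumed time per frequency switch, in microseconds
MIN_RATIO = 20            # threshold quoted in the text

def eligible_for_mfs(task_slots_us, transition_us=TRANSITION_TIME_US, ratio=MIN_RATIO):
    """Keep only tasks whose time T is at least `ratio` transition times."""
    return {
        name: slot
        for name, slot in task_slots_us.items()
        if slot >= ratio * transition_us
    }

# Hypothetical task times T in microseconds.
slots = {"A1": 50_000, "A2": 1_500, "A3": 2_000}
print(eligible_for_mfs(slots))
```

With these assumed numbers, A2 is excluded because its time is shorter than 20 transition times, so the switching overhead could no longer be neglected for it.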
Figure 10. Comparison of the percentage of energy saving of MFS-DVFS and the other
algorithms versus the number of processors: (a) 1500 randomly generated task graphs,
(b) 300 LU decomposition task graphs.

7. Conclusion
    Since most traditional static task scheduling algorithms in HPCS do not consider
power management, we addressed the energy issue together with task scheduling and
presented the MFS-DVFS algorithm. Our algorithm adopts the DVFS technique, a recent
advance in processor design, to reduce energy consumption.
In this chapter, we studied existing DVFS-based approaches to cover idle time and, in
particular, the use of a linear combination of more than one frequency to reduce energy
consumption on processors. First, we reviewed the energy model of DVFS-enabled
processors. Then, we formulated our algorithm (MFS-DVFS) as an optimization problem
over all frequencies for each task and solved it to find suitable time portions.
Simulation results on 1500 randomly generated task graphs and 300 real-world
application task graphs showed the effectiveness of the MFS-DVFS algorithm compared
with other algorithms.
8. Acknowledgment   
The work reported in this chapter is in part supported by National ICT Australia 
(NICTA). Professor A.Y. Zomaya's work is supported by an Australian Research Council 
Grant LP0884070. 
REFERENCES
[1] N. Kamyabpour and D. B. Hoang, "A hierarchy energy driven architecture for 
wireless sensor networks," presented at the 24th IEEE International Conference 
on Advanced Information Networking and Applications (AINA-2010), Perth, 
Australia, 2010. 
[2] N. Kamyabpour and D. B. Hoang, "A Task Based Sensor-Centeric Model for 
overall Energy Consumption," Computing Research Repository(CoRR), 2012. 
[3] K. Almiani, S. Selvakennedy, and A. Viglas, "RMC: An Energy-Aware Cross-
Layer Data-Gathering Protocol for Wireless Sensor Networks," presented at the
22nd International Conference on Advanced Information Networking and
Applications (AINA), Ginowan, Okinawa, Japan, 2008.
[4] K. Almiani, A. Viglas, and L. Libman, "Energy-efficient data gathering with tour
length-constrained mobile elements in wireless sensor networks," presented at the
35th Annual IEEE Conference on Local Computer Networks (LCN), Denver,
Colorado, USA, 2010.
[5] J. Zhuo and C. Chakrabarti, "Energy-efficient dynamic task scheduling algorithms 
for DVS systems," ACM Trans. Embed. Comput. Syst., vol. 7, pp. 1-25, 2008. 
[6] R. Ge, X. Feng, and K. W. Cameron, "Performance-constrained Distributed DVS 
Scheduling for Scientific Applications on Power-aware Clusters," presented at the 
Proceedings of the 2005 ACM/IEEE conference on Supercomputing, Seattle, WA, 
USA, 2005. 
[7] H. Kimura, M. Sato, Y. Hotta, T. Boku, and D. Takahashi, "Empirical study on
Reducing Energy of Parallel Programs using Slack Reclamation by DVFS in a
Power-scalable High Performance Cluster," in 2006 IEEE International
Conference on Cluster Computing, Barcelona, Spain, 2006, pp. 1-10.
[8] R. Xiaojun, Q. Xiao, Z. Ziliang, K. Bellam, and M. Nijim, "An Energy-Efficient 
Scheduling Algorithm Using Dynamic Voltage Scaling for Parallel Applications 
on Clusters," in Computer Communications and Networks, 2007. ICCCN 2007. 
Proceedings of 16th International Conference on, Honolulu, Hawaii, USA, 2007, 
pp. 735-740. 
[9] Y. C. Lee and A. Y. Zomaya, "Minimizing Energy Consumption for Precedence-
Constrained Applications Using Dynamic Voltage Scaling," presented at the 
Proceedings of the 2009 9th IEEE/ACM International Symposium on Cluster 
Computing and the Grid (CCGrid), Shanghai, China, 2009. 
[10] N. B. Rizvandi, J. Taheri, A. Y. Zomaya, and Y. C. Lee, "Linear Combinations of 
DVFS-enabled Processor Frequencies to Modify the Energy-Aware Scheduling 
Algorithms," presented at the Proceedings of the 2010 10th IEEE/ACM 
International Symposium on Cluster, Cloud and Grid Computing (CCGrid), 
Melbourne, Australia, May 17-20, 2010. 
[11] A. Berl, E. Gelenbe, M. d. Girolamo, G. Giuliani, H. d. Meer, M. Q. Dang, and K. 
Pentikousis, "Energy-Efficient Cloud Computing," The Computer Journal, 2009. 
[12] N. Kappiah, V. W. Freeh, and D. K. Lowenthal, "Just In Time Dynamic Voltage 
Scaling: Exploiting Inter-Node Slack to Save Energy in MPI Programs," 
presented at the Proceedings of the 2005 ACM/IEEE conference on 
Supercomputing, Seattle, USA, 2005. 
[13] Z. Zhu and X. Zhang, "Look-Ahead Architecture Adaptation to Reduce Processor 
Power Consumption," IEEE Micro, vol. 25, pp. 10-19, 2005. 
[14] D. Yang, K. Malkowski, P. Raghavan, and M. Kandemir, "Towards energy 
efficient scaling of scientific codes," in Parallel and Distributed Processing, 
2008. IPDPS 2008. IEEE International Symposium on, Miami, USA, 2008, pp. 1-
8. 
[15] A. Molnos and K. Goossens, "Conservative Dynamic Energy Management for 
Real-Time Dataflow Applications Mapped on Multiple Processors," presented at 
the 12th Euromicro Conference on Digital System Design, Architectures, 
Methods and Tools, Patras, Greece, 2009. 
[16] Y. Hotta, M. Sato, H. Kimura, S. Matsuoka, T. Boku, and D. Takahashi, "Profile-
based optimization of power performance by using dynamic voltage scaling on a 
pc cluster," presented at the IEEE International Symposium on Parallel and 
Distributed Processing (IPDPS), Isle of Rhodes, Greece, 2006. 
[17] R. Springer, D. K. Lowenthal, B. Rountree, and V. W. Freeh, "Minimizing
execution time in MPI programs on an energy-constrained, power-scalable
cluster," presented at the Proceedings of the eleventh ACM SIGPLAN
symposium on Principles and practice of parallel programming, New York, New
York, USA, 2006.
[18] B. Rountree, D. K. Lowenthal, S. Funk, V. W. Freeh, B. R. de Supinski, and M. 
Schulz, "Bounding energy consumption in large-scale MPI programs," in 
Supercomputing, 2007. SC '07. Proceedings of the 2007 ACM/IEEE Conference 
on, 2007, Reno, Nevada, USA, pp. 1-9. 
[19] T. Simunic, L. Benini, A. Acquaviva, P. Glynn, and G. D. Micheli, "Dynamic 
voltage scaling and power management for portable systems," presented at the 
Proceedings of the 38th annual Design Automation Conference, Las Vegas, 
Nevada, USA, 2001. 
[20] P. d. Langen and B. Juurlink, "Trade-Offs Between Voltage Scaling and 
Processor Shutdown for Low-Energy Embedded Multiprocessors," presented at 
the Embedded Computer Systems: Architectures, Modeling, and Simulation, 
Samos, Greece, 2007. 
[21] J.-J. Chen, C.-Y. Yang, T.-W. Kuo, and C.-S. Shih, "Energy-Efficient Real-Time 
Task Scheduling in Multiprocessor DVS Systems," presented at the Proceedings 
of the 2007 Asia and South Pacific Design Automation Conference, Yokohama, 
Japan, 2007. 
[22] C. Xian and Y.-H. Lu, "Dynamic voltage scaling for multitasking real-time 
systems with uncertain execution time," presented at the Proceedings of the 16th 
ACM Great Lakes symposium on VLSI, Philadelphia, PA, USA, 2006. 
 
 
