Turkish Journal of Electrical Engineering and Computer Sciences
Volume 24

Number 5

Article 59

1-1-2016

Energy-aware stochastic scheduling model with precedence
constraints on DVFS-enabled processors
MOHAMMAD SAJID
ZAHID RAZA

Recommended Citation
SAJID, MOHAMMAD and RAZA, ZAHID (2016) "Energy-aware stochastic scheduling model with
precedence constraints on DVFS-enabled processors," Turkish Journal of Electrical Engineering and
Computer Sciences: Vol. 24: No. 5, Article 59. https://doi.org/10.3906/elk-1505-112
Available at: https://journals.tubitak.gov.tr/elektrik/vol24/iss5/59

This Article is brought to you for free and open access by TÜBİTAK Academic Journals. It has been accepted for
inclusion in Turkish Journal of Electrical Engineering and Computer Sciences by an authorized editor of TÜBİTAK
Academic Journals. For more information, please contact academic.publications@tubitak.gov.tr.

Turkish Journal of Electrical Engineering & Computer Sciences
http://journals.tubitak.gov.tr/elektrik/

Turk J Elec Eng & Comp Sci
(2016) 24: 4117 – 4128
© TÜBİTAK
doi:10.3906/elk-1505-112

Research Article

Energy-aware stochastic scheduling model with precedence constraints on
DVFS-enabled processors
Mohammad SAJID∗, Zahid RAZA
School of Computer and Systems Sciences, Jawaharlal Nehru University, New Delhi, India
Received: 14.05.2015 • Accepted/Published Online: 11.07.2015 • Final Version: 20.06.2016

Abstract: The stochastic scheduling of precedence-constrained jobs on heterogeneous processors is a challenging problem
that requires solutions with one or more optimized QoS parameters. In this work, an energy-aware stochastic algorithm
is proposed to schedule the batch of precedence-constrained jobs on heterogeneous DVFS-enabled processors with the
objective of optimizing turnaround time and energy consumption. The processing time of tasks in all jobs and their
precedence-constraint times are governed by independent probability distributions. The performance of the proposed
stochastic algorithm is compared with SHEFT and ECS based on randomly generated batches of diﬀerent sizes. The
experimental study reveals that the proposed algorithm significantly outperforms the SHEFT and ECS algorithms in
terms of turnaround time and energy consumption.
Key words: Stochastic scheduling, batch of stochastic precedence-constrained jobs, slack sharing, DVFS-enabled
processors, turnaround time, energy consumption

1. Introduction
The fastest supercomputer, Tianhe-2, consists of 16,000 physical nodes, each node having 2 Intel Xeon Ivy
Bridge and 3 Xeon Phi processors with overall power consumption of 17.808 MW [1]. The production of 1 kW
of electric power consumes approximately 0.4 kg of coal and 4 L of water, and produces 0.272 kg of solid
powder, 0.997 kg of CO2, and 0.03 kg of SO2. The data centers consist of thousands of supercomputers, which
results in very high power consumption; therefore, the power consumption of data centers is a considerable
issue due to the associated negative environmental eﬀects, high monetary costs, and reduced reliability of the
systems [2]. To address this issue, various techniques, including resource hibernation, dynamic voltage-frequency
scaling (DVFS), resource consolidation, and memory optimization, have been proposed [2]. DVFS is a useful
technique that automatically adjusts the inputs of CMOS-enabled processors by scaling the frequency up and
down to reduce power consumption [2]. For DVFS-enabled processors, the scheduling algorithms must consider
the assignment of jobs to processors as well as the selection of frequency for execution in order to exploit
the idle slots that occur due to precedence constraints. In general, the scheduling of precedence-constrained
jobs has been proven to be NP-complete [3]. Furthermore, the scheduling on heterogeneous DVFS-enabled
processors to optimize turnaround time as well as energy consumption makes it an even more challenging
problem [4]. A plethora of energy-aware and nonenergy-aware algorithms have been proposed considering
deterministic processing and communication times [5–11]. For a deterministic scheduling problem, all the
parameters regarding the job and system are known beforehand and cannot deviate from the given parameters.

∗ Correspondence: sajid.cst@gmail.com

SAJID and RAZA/Turk J Elec Eng & Comp Sci

For real environments, the parameters of the job as well as the system can deviate considerably due to unknown
operating system conditions, memory access time, and so on. The stochastic scheduling model considers the
parameters as a random variable with a given probability density function and makes an eﬀort to optimize
the expected performance [12–15]. In realizing this, this work proposes an energy-aware stochastic algorithm
that schedules the batch of precedence-constrained jobs on many heterogeneous DVFS-enabled processors to
optimize the turnaround time and energy consumption. The processing times and precedence-constraint times
follow the independent normal probability distributions.
The remainder of the paper is organized as follows. Section 2 discusses the related research on
energy-aware and stochastic scheduling. Section 3 formulates the energy-aware stochastic scheduling problem on
a heterogeneous DVFS-enabled computation system. Section 4 explains the ESHEFT algorithm and, in Section 4.1,
presents the experimental study for different batches. Section 5 presents the concluding remarks.
2. Related research
Scheduling with the objective of minimizing turnaround time and energy consumption is an active
research area, and various energy-aware algorithms have been proposed for DVFS-enabled heterogeneous computing
systems. Zhu et al. introduced the concept of slack sharing, which reclaims the time unused by a task to reduce
the frequency of processors for reducing the energy consumption on a homogeneous computing system (HCS).
The two algorithms using the slack sharing concept, GSSR and FLSSR, were proposed for independent and
dependent tasks, respectively [7]. Zhang et al. presented many energy-eﬃcient algorithms for the HCS with
changeable continuous and discrete speeds for reducing energy consumption while meeting time deadlines.
Simulation studies indicate that for both continuous and discrete speeds computers, the hybrid algorithms have
superior performance and oﬀer the best task schedules [8]. Lee and Zomaya presented 2 energy-aware algorithms,
ECS and ECS + idle, in order to schedule the dependent tasks on a DVFS-enabled HCS based on the relative
superiority metric (RS) and makespan-conservative energy reduction technique (MCER) [9]. Li explored 2
energy-aware models, i.e. a problem to optimize turnaround time with energy consumption as the constraint
and a problem to optimize the energy consumption with turnaround time as the constraint, in order to schedule
dependent tasks on DVFS-enabled processors. Li proposed 3 types of algorithms (prepower-determination,
postpower-determination, and hybrid algorithms) to solve each subproblem eﬃciently [10]. Zhuravlev et al.
presented a survey of energy-cognizant scheduling algorithms considering 3 types of hardware mechanisms:
DVFS-enabled processors, thermal management, and asymmetric multicore designs [11].
For stochastic scheduling, the well-known algorithms are SEPT, WSEPT, and LEPT, which take into
consideration the expectation of execution times for making scheduling decisions. Skutella and Uetz proposed
constant-factor approximation algorithms for precedence-constrained stochastic tasks to optimize the total
weighted completion time on the HCS. For precedence-constraint scheduling with and without release dates
on m processors, the CMNS [12] algorithm (with κ > 0) has been proven with a performance guarantee of (1
+ κ)(1 + (1/ κ) + max {1, ((m – 1)/m)∆ }) and (1 + κ)(1 + ((m – 1)/m κ) + max{1, ((m – 1)/m) ∆ }),
respectively [13]. Tang et al. proposed SHEFT to address the problem of scheduling precedence-constrained
stochastic tasks on the HCS employing the average execution time of tasks based on expectation and variance
[14]. Li et al. presented the SDLS algorithm to schedule precedence-constraint tasks using the stochastic
bottom level and stochastic dynamic level on the HCS for independently normally distributed processing and
communication times [15].
It is to be noted that either energy/nonenergy-aware algorithms are proposed for jobs with deterministic
processing times or nonenergy-aware approximation/heuristics algorithms have been reported in the literature.
This work presents an energy-aware stochastic scheduling algorithm for a batch of precedence-constrained jobs
with stochastic processing times to schedule on per-chip DVFS-enabled heterogeneous processors.
3. Problem formulation
This section explains the computation system employed, batch model, and energy model, along with the problem
statement.
3.1. Computation system
The computation system C is an HCS of K different per-chip DVFS-enabled processors represented by
$C = \{p_x : 1 \le x \le K\}$. Each processor $p_x \in C$ can run on $n_x$ discrete voltage levels, and the set of
$n_x$ voltage levels is given by $V_x = \{v_{x,z} : 1 \le z \le n_x\}$, such that if $y < z$, then $v_{x,y} > v_{x,z}$.
For each voltage level $v_{x,z}$, there exists a corresponding frequency $f_{x,z}$ of processor $p_x$, and the set of
$n_x$ frequency levels for processor $p_x$ is given by $F_x = \{f_{x,z} : 1 \le z \le n_x\}$. The total number of
frequency/voltage levels ($N_C$) in computation system C is equal to the sum of the voltage/frequency levels of
all processors, i.e. $N_C = \sum_{x=1}^{K} |V_x|$. The DVFS capability
allows processors to switch from one voltage level to another with a trade-oﬀ between the power consumption
and processing time. Therefore, the processing time of any task at the highest frequency (fxmax ) of processor px
is minimum, whereas energy consumption is maximum. It is assumed that all processors are connected by a
fast interconnection network, that the communication cost between tasks on the same processor is zero, and that
the communication cost between tasks on different processors is nonzero. The present computation system C can be
extended to multicore DVFS-enabled or multicore per-core DVFS-enabled computation systems.
3.2. Batch model
Batch B consists of a collection of M independent jobs to be executed on computation system C and is represented
by $B = \{J_i : 1 \le i \le M\}$, with each job $J_i \in B$ consisting of multiple dependent tasks. Each job
$J_i \in B$ is given in the form of a directed acyclic graph (DAG) as $J_i = (T_i, E_i)$, where
$T_i = \{t_{i,j} : 1 \le j \le M_i\}$ represents the set of $M_i$ dependent and atomic tasks, and
$E_i = \{e_{i,(j,k)} : 1 \le j \le k \le M_i\} \subset T_i \times T_i$ is the set of precedence-constrained edges
between tasks. The edge $e_{i,(j,k)} \in E_i$ represents the precedence constraint between tasks $t_{i,j}$
and $t_{i,k}$ such that task $t_{i,k}$ starts its execution after completion of task $t_{i,j}$. Task $t_{i,k}$ is a
successor task of task $t_{i,j}$, and the set of all successor tasks of $t_{i,j}$ is given by
$Succ_{i,j} = \{t_{i,k} : e_{i,(j,k)} \in E_i\}$. A predecessor task of task $t_{i,j}$ is connected by the edge
$e_{i,(k,j)} \in E_i$, and the set of predecessors of task $t_{i,j}$ is given by
$Pred_{i,j} = \{t_{i,k} : e_{i,(k,j)} \in E_i\}$. A task $t_{i,j}$ having zero predecessor tasks, i.e.
$Pred_{i,j} = \Phi$, is called an entry task, and a task $t_{i,j}$ having zero successor tasks, i.e.
$Succ_{i,j} = \Phi$, is called an exit task. The level of task $t_{i,j}$ is 1 if it has zero predecessor tasks,
i.e. $Pred_{i,j} = \Phi$. The successors of the first level's tasks are placed on the next level, and so on. The
level of task $t_{i,j}$ is determined recursively as follows:
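$$L_{i,j} = \begin{cases} 1, & Pred_{i,j} = \Phi \\ 1 + \max\{L_{i,k} : \forall t_{i,k} \in Pred_{i,j}\}, & \text{otherwise.} \end{cases} \tag{1}$$

The level of the batch ($L_B$) is the maximum level of any task $t_{i,j} \in B$ and is computed as:

$$L_B = \max\{L_{i,j} : \forall t_{i,j},\ Succ_{i,j} = \Phi\}. \tag{2}$$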

The tasks of all jobs are randomly distributed over diﬀerent levels from 1 to LB .
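
The level computation of Eqs. (1) and (2) can be illustrated with a short sketch. The Python code below is only an illustration and is not part of the original simulation; the dictionary `preds` (mapping each task to its predecessor list) and the task names are hypothetical.

```python
def task_levels(preds):
    """Return {task: level} following Eq. (1).

    preds maps every task to the list of its predecessors (hypothetical
    representation of Pred_{i,j}); entry tasks map to an empty list.
    """
    levels = {}

    def level(task):
        if task not in levels:
            parents = preds.get(task, [])
            # Entry tasks are on level 1; others sit one level below
            # their deepest predecessor.
            levels[task] = 1 if not parents else 1 + max(level(p) for p in parents)
        return levels[task]

    for task in preds:
        level(task)
    return levels


def batch_level(levels):
    """Eq. (2): the deepest level is always reached at an exit task."""
    return max(levels.values())


# Hypothetical batch: one diamond-shaped job (t1..t4) and a single-task job (t5).
preds = {"t1": [], "t2": ["t1"], "t3": ["t1"], "t4": ["t2", "t3"], "t5": []}
levels = task_levels(preds)
print(levels, batch_level(levels))
```

For very deep DAGs an iterative topological traversal would avoid Python's recursion limit; the recursive form is kept here because it mirrors Eq. (1) directly.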
3.3. Scheduling design principles
Stochastic scheduling makes an effort to optimize the expected performance of the solutions under the assumption
that parameters are random variables with known probability distributions. In this work, the processing
time of each task $t_{i,j} \in T_i$ and the precedence-constraint time of each edge $e_{i,(j,k)} \in E_i$ in a job
$J_i$ follow independent normal probability distributions [12–17]. If the processing time of task $t_{i,j}$ and the
precedence-constraint time of edge $e_{i,(j,k)}$ follow normal probability distributions with expectations $\mu$,
$\mu'$ and variances $\sigma^2$, $\sigma'^2$, respectively, then $t_{i,j} \sim N(\mu, \sigma^2)$ and
$e_{i,(j,k)} \sim N(\mu', \sigma'^2)$. The processors deployed in computation system C are heterogeneous and
DVFS-enabled; therefore, the processing time of each task $t_{i,j} \in T_i$ with respect to each frequency level
$f_{x,z} \in F_x$ will have a different expectation and variance. For example, the processing time of task $t_{i,j}$
w.r.t. $v_{x,z}$ follows an independent normal probability distribution with expectation $\mu_{i,j,x,z}$ and
variance $\sigma^2_{i,j,x,z}$, i.e. $t_{i,j} \sim N(\mu_{i,j,x,z}, \sigma^2_{i,j,x,z})$. In the theory of randomness,
the performance of stochastic scheduling depends on the function $\Delta$ of the expectation and variance of a
random processing time Y, where $Var[Y]/E[Y]^2 \le \Delta$ [12, 14–17]. For $\Delta > 1$, the performance of
stochastic scheduling increases as the variance of the random variable decreases. Otherwise, the performance is
affected by the sum of the expectation and variance of the random variable. Therefore, the processing times and
precedence-constraint times are computed based on the expectation and variance of the random variable. The
approximate value (AV(Y)) of a random variable Y using its mean (E[Y]) and variance (Var[Y]) can be determined as:

$$AV(Y) = \begin{cases} E[Y] + \sqrt{Var[Y]}, & \text{if } \dfrac{Var[Y]}{E[Y]^2} \le 1 \\ E[Y]\left(1 + \dfrac{1}{\sqrt{Var[Y]}}\right), & \text{otherwise.} \end{cases} \tag{3}$$
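
As a worked illustration of Eq. (3), the following Python sketch evaluates the approximate value for a given mean and variance. The function name `approximate_value` and the sample numbers are assumptions made for this example only.

```python
import math


def approximate_value(mean, variance):
    """Approximate value AV(Y) of a random time Y, following Eq. (3).

    Assumes mean > 0 and variance >= 0 (hypothetical input checks; the
    paper does not discuss degenerate inputs).
    """
    if mean <= 0 or variance < 0:
        raise ValueError("expected mean > 0 and variance >= 0")
    if variance / mean ** 2 <= 1:
        # Low relative variance: add one standard deviation to the mean.
        return mean + math.sqrt(variance)
    # High relative variance: scale the mean by (1 + 1/sqrt(Var[Y])).
    return mean * (1 + 1 / math.sqrt(variance))


# E[Y] = 100, Var[Y] = 400: ratio 0.04 <= 1, so AV = 100 + 20 = 120.
print(approximate_value(100.0, 400.0))
# E[Y] = 1, Var[Y] = 400: ratio 400 > 1, so AV = 1 * (1 + 1/20) = 1.05.
print(approximate_value(1.0, 400.0))
```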

Let $[w_{i,j,x,z}]$ be a matrix of order $M \times M_{max} \times K \times N_C$, where $w_{i,j,x,z}$ gives the
approximate value ($AV(t_{i,j})$) of the processing time of task $t_{i,j} \in T_i$ with regard to frequency level
$f_{x,z} \in F_x$ on processor $p_x$ using Eq. (3), and $M_{max}$ represents the maximum number of tasks in any job
of the batch. Let $w_x^{max}(t_{i,j})$ give the approximate processing time of task $t_{i,j} \in T_i$ with regard to
the fastest frequency ($f_x^{max}$) of processor $p_x$. Let $[w_{i,(j,k)}]$ be a matrix of order
$M \times M_{max} \times M_{max}$, where $w_{i,(j,k)}$ gives the approximate precedence-constraint time of edge
$e_{i,(j,k)}$ between tasks $t_{i,j}$ and $t_{i,k}$. The average processing time of task $t_{i,j}$ with respect to
the DVFS-enabled computation system C can be written as:
$$AP(t_{i,j}) = \frac{\sum_{x=1}^{K} \sum_{z=1}^{n_x} w_{i,j,x,z}}{\sum_{x=1}^{K} n_x}. \tag{4}$$

The stochastic b-level ($sb_{i,j}$) of task $t_{i,j}$ in the batch B with respect to the DVFS-enabled computation
system C can now be written as [5,6]:

$$sb_{i,j} = \begin{cases} AP(t_{i,j}), & Succ_{i,j} = \Phi \\ AP(t_{i,j}) + \max_{t_{i,k} \in Succ_{i,j}} \{sb_{i,k} + w_{i,(j,k)}\}, & \text{otherwise.} \end{cases} \tag{5}$$
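
Eqs. (4) and (5) can be sketched as follows. This is a minimal Python illustration under assumed data structures: `w_task[x][z]` holds the approximate times of one task over all processors and frequency levels, `succs` maps a task to its successors, and `edge_time` holds the approximate precedence-constraint times. All of these names and values are hypothetical.

```python
def average_processing_time(w_task):
    """Eq. (4): mean of w over every (processor, frequency level) pair.

    w_task[x][z] is the approximate time of one task on processor x at
    frequency level z; rows may have different lengths n_x.
    """
    total = sum(sum(row) for row in w_task)
    count = sum(len(row) for row in w_task)
    return total / count


def stochastic_b_levels(ap, succs, edge_time):
    """Eq. (5): b-level of each task from its AP value and its successors."""
    sb = {}

    def visit(task):
        if task not in sb:
            children = succs.get(task, [])
            if not children:
                sb[task] = ap[task]            # exit task
            else:
                sb[task] = ap[task] + max(
                    visit(c) + edge_time[(task, c)] for c in children
                )
        return sb[task]

    for task in ap:
        visit(task)
    return sb


# Hypothetical task on 2 processors with 2 and 3 frequency levels.
ap_t1 = average_processing_time([[4.0, 6.0], [5.0, 7.0, 8.0]])   # 30 / 5
ap = {"t1": ap_t1, "t2": 3.0, "t3": 5.0}
succs = {"t1": ["t2", "t3"]}
edge_time = {("t1", "t2"): 2.0, ("t1", "t3"): 1.0}
sb = stochastic_b_levels(ap, succs, edge_time)
print(sb)
```

In this example the b-level of t1 is 6 + max(3 + 2, 5 + 1) = 12, so t1 would precede t2 and t3 in any queue sorted by decreasing b-level.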

To compute the turnaround time of the whole batch, it is required to define the start time ($ST_{i,j,x}$) and the
finish time ($FT_{i,j,x}$) of task $t_{i,j} \in B$ scheduled on processor $p_x \in C$, the allocation metric
($X_{i,j}^{x,z}$), and the ready time ($RT_x$) of processor $p_x$. Let $X_{i,j}^{x,z} = 1$ if task $t_{i,j}$ is
scheduled on processor $p_x$ at frequency level $f_{x,z}$ and $X_{i,j}^{x,z} = 0$ otherwise. If
$X_{i,j}^{x,z} = 1$, then the start time ($ST_{i,j,x}$) and finish time ($FT_{i,j,x}$) of task $t_{i,j}$ on
$f_{x,z} \in F_x$ can be computed using Eqs. (6) and (7), respectively, and the expected ready time of the
processor is given by Eq. (8).
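$$ST_{i,j,x} = \begin{cases} RT_x, & Pred_{i,j} = \Phi \\ \max\left\{RT_x,\ \max_{t_{i,k} \in Pred_{i,j},\ x \ne y} \{FT_{i,k,y} + e_{i,(k,j)}\}\right\}, & \text{otherwise,} \end{cases} \tag{6}$$

$$FT_{i,j,x} = ST_{i,j,x} + w_{i,j,x,z}, \tag{7}$$

$$RT_x = \begin{cases} FT_{i,j,x}, & \text{if } X_{i,j}^{x,z} = 1 \\ FT_{l,m,x}, & \text{otherwise.} \end{cases} \tag{8}$$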

Here, $FT_{l,m,x}$ represents the finish time of the last scheduled task ($t_{l,m}$) on processor $p_x$.
Due to the simultaneous execution of M precedence-constrained jobs on computation system C, the
turnaround time ($TAT_B$) of batch B will be the maximum of the finish times of all the tasks and can be computed
as:

$$TAT_B = \max\{FT_{i,j,x} : \forall t_{i,j} \in B,\ Succ_{i,j} = \Phi\}. \tag{9}$$
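
The timing rules of Eqs. (6)–(8) can be illustrated for a single task as below. This Python sketch is not the authors' MATLAB code; the list `ready` (one $RT_x$ per processor), the `(y, ft, e)` predecessor triples, and the tie-breaking by lowest processor index are assumptions made for the example.

```python
def start_time(x, ready, preds):
    """Eq. (6): earliest start of a task on processor x.

    ready[x] is RT_x; preds holds one (y, ft, e) triple per predecessor:
    its processor y, its finish time, and the edge time. Edge times are
    only charged when the predecessor ran on a different processor.
    """
    arrivals = [ft + e for (y, ft, e) in preds if y != x]
    return max([ready[x]] + arrivals)


def select_processor(ready, preds, w_max):
    """Return (x, ST, FT) for the processor with the minimum finish time.

    w_max[x] is the approximate processing time at the highest frequency
    of processor x. Ties fall to the lowest processor index (assumption).
    """
    candidates = []
    for x in range(len(ready)):
        st = start_time(x, ready, preds)
        candidates.append((st + w_max[x], x, st))   # Eq. (7)
    ft, x, st = min(candidates)
    return x, st, ft


# Hypothetical state: two processors, one predecessor that finished on p0 at t = 10.
ready = [10.0, 4.0]
preds = [(0, 10.0, 3.0)]
w_max = [5.0, 8.0]
x, st, ft = select_processor(ready, preds, w_max)
ready[x] = ft                                        # Eq. (8)
print(x, st, ft, ready)
```

Here processor 0 wins with a finish time of 15, because moving the task to processor 1 would delay its start to 13 (predecessor finish plus edge time) and its finish to 21. The batch turnaround time of Eq. (9) is then simply the maximum finish time over all exit tasks.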
3.4. Energy consumption model
The energy consumption of computational system C depends on all involved resources, namely the processors,
cooling system, memory accesses, communication networks, and so on [2]. The processors consume the most
significant portion of the total power consumption. The power consumption ($Pow_{x,z}$) of the processor consists
of static power ($P_{Static}$) and dynamic power ($P_{Dynamic}$), and it is given at voltage $v_{x,z} \in V_x$ as [7,9–11]:

$$Pow_{x,z} = P_{Static} + P_{Dynamic}. \tag{10}$$

If processor $p_x$ is idle, the processor consumes the static power (leakage power) constantly, irrespective of
voltage and frequency, i.e. $Pow_x = P_{Static}$. In busy mode, the dynamic power ($P_{Dynamic}$) depends on
frequency $f_{x,z}$ and voltage $v_{x,z}$ and is given as:

$$P_{Dynamic} = \gamma_x \times v_{x,z}^2 \times f_{x,z}, \tag{11}$$

where $\gamma_x$ is a constant representing the activity factor and the physical capacitance. The constant
$\gamma_x$ is a manufacturing constant, whereas frequency $f_{x,z}$ and voltage $v_{x,z}$ can be decided at
compile time or run time.
If $X_{i,j}^{x,z} = 1$, then task $t_{i,j}$ is executed on processor $p_x$ at frequency $f_{x,z}$; the energy
consumed by task $t_{i,j}$ can be computed using Eqs. (10) and (11) as:

$$E_{i,j,x,z} = Pow_{x,z} \times w_{i,j,x,z}. \tag{12}$$

The total energy consumption ($EN_B$) of batch B depends on the energy consumed by all tasks and by the idle slots
created due to precedence constraints. Let $N_x$ represent the total number of idle slots on processor $p_x$ and
$IS_{x,i}$ represent the length of the $i$th idle slot on processor $p_x$; the energy consumed by the idle slot
($IS_{x,i}$) is computed as:

$$EN_{x,i} = P_{Static} \times IS_{x,i}. \tag{13}$$
Therefore, the total energy consumption ($EN_B$) of batch B on computation system C is computed as the sum
of the energy consumed by the allocated tasks (Eq. (12)) and the energy consumed during idle slots (Eq. (13)),
and it is given as follows:

$$EN_B = \sum_{i=1}^{M} \sum_{j=1}^{M_i} \sum_{x=1}^{K} \sum_{z=1}^{n_x} E_{i,j,x,z} \times X_{i,j}^{x,z} + \sum_{x=1}^{K} \sum_{i=1}^{N_x} EN_{x,i}. \tag{14}$$
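
The energy model of Eqs. (10)–(14) can be sketched in a few lines. The Python code below is an illustration under assumed inputs: the `Processor` class, the per-processor static power, and the numeric values (in arbitrary units) are hypothetical and do not come from the paper's experiments.

```python
from dataclasses import dataclass


@dataclass
class Processor:
    p_static: float   # static (leakage) power, assumed constant per processor
    gamma: float      # activity factor times physical capacitance
    levels: list      # (voltage, frequency) pairs, one per DVFS level z


def busy_power(proc, z):
    """Eqs. (10) and (11): static plus dynamic power at level z."""
    voltage, frequency = proc.levels[z]
    return proc.p_static + proc.gamma * voltage ** 2 * frequency


def batch_energy(procs, schedule, idle_slots):
    """Eq. (14): energy of scheduled tasks plus energy of idle slots.

    schedule holds one (x, z, w) triple per task: processor, frequency
    level, and approximate processing time. idle_slots maps a processor
    index to the lengths of its idle slots.
    """
    task_energy = sum(busy_power(procs[x], z) * w for x, z, w in schedule)   # Eq. (12)
    idle_energy = sum(
        procs[x].p_static * length                                           # Eq. (13)
        for x, slots in idle_slots.items()
        for length in slots
    )
    return task_energy + idle_energy


# Hypothetical single processor with two DVFS levels.
procs = [Processor(p_static=1.0, gamma=0.5, levels=[(2.0, 2.0), (1.0, 1.0)])]
schedule = [(0, 0, 10.0), (0, 1, 4.0)]     # 5.0 * 10 + 1.5 * 4 = 56
idle_slots = {0: [3.0, 2.0]}               # 1.0 * (3 + 2) = 5
total = batch_energy(procs, schedule, idle_slots)
print(total)
```

The example shows why lowering the frequency of a task that has slack pays off: at the lower level the busy power drops from 5.0 to 1.5 units, while the idle time it replaces would have cost static power anyway.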

3.5. Problem statement
The formulated scheduling problem can be represented in Graham et al.'s notation as
$Q_K \mid t_{i,j} \sim pred, stoc \mid \min TAT_B\ \min EN_B$ [4]. The first field, $Q_K$, represents the computation
system C of K heterogeneous parallel DVFS-enabled processors given in Section 3.1, whereas the second field
represents the tasks $t_{i,j} \in B$ following the precedence constraints with stochastic time requirements as given
in Section 3.2. The third field corresponds to the objective, i.e. to optimize the energy consumption and
turnaround time of the batch B. Let $S_B$ represent the schedule generated by the scheduling algorithm. The
statement of the problem can then be written as:

$$\text{Minimize} \begin{cases} TAT_B \\ EN_B \end{cases}, \tag{15}$$

$$\text{s.t.} \begin{cases} \sum_{x=1}^{K} \sum_{z=1}^{n_x} X_{i,j}^{x,z} = 1, & \forall t_{i,j} \\ \sum_{i=1}^{M} \sum_{j=1}^{M_i} \sum_{x=1}^{K} \sum_{z=1}^{n_x} X_{i,j}^{x,z} = \sum_{i=1}^{M} M_i. \end{cases} \tag{16}$$
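
The constraints of Eq. (16) state that every task is allocated to exactly one processor and frequency level and that the number of allocations equals the number of tasks in the batch. A minimal Python check is sketched below; the representation of the allocation as a list of `(task, x, z)` triples with $X = 1$ is an assumption made for illustration.

```python
from collections import Counter


def satisfies_allocation_constraints(tasks, assignments):
    """Check both constraints of Eq. (16).

    tasks is the list of all tasks in the batch; assignments lists one
    (task, x, z) triple for every allocation variable that equals 1.
    """
    counts = Counter(task for task, _x, _z in assignments)
    each_task_once = all(counts[t] == 1 for t in tasks)
    total_matches = sum(counts.values()) == len(tasks)
    return each_task_once and total_matches


tasks = ["t1", "t2"]
valid = [("t1", 0, 1), ("t2", 1, 0)]
duplicated = [("t1", 0, 1), ("t1", 1, 0), ("t2", 1, 0)]
print(satisfies_allocation_constraints(tasks, valid))
print(satisfies_allocation_constraints(tasks, duplicated))
```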

4. The ESHEFT algorithm
The proposed ESHEFT algorithm executes the batch of precedence-constrained jobs (B) on the computation
system (C) and is based on HEFT [6] and SHEFT [14]. However, HEFT and SHEFT are single-job algorithms with
the aim of optimizing turnaround time, whereas the proposed ESHEFT algorithm is a batch-aware and energy-aware
algorithm that schedules the batch of jobs to optimize both turnaround time ($TAT_B$) and energy consumption
($EN_B$). Furthermore, the ESHEFT algorithm employs a modified stochastic b-level (Eq. (5)) of the tasks
in order to follow the precedence constraints between tasks, and it uses idle slots to choose the execution
frequency that reduces energy consumption without sacrificing turnaround time.
The pseudocode of our energy-aware stochastic heterogeneous earliest finish time (ESHEFT) algorithm
is shown in Algorithm 1. The ESHEFT algorithm takes 2 inputs, batch B and computation system C, and
returns schedule $S_B$ with expected turnaround time ($TAT_B$) and energy consumption ($EN_B$) as output. As
the algorithm proceeds, it computes the stochastic b-level ($sb_{i,j}$) and level $L_{i,j}$ for each task
$t_{i,j} \in B$ using Eqs. (5) and (1), respectively. For the tasks of the $k$th level, a queue $Q_k$ is formed in
decreasing order of stochastic b-level ($sb_{i,j}$), and the algorithm selects the processor and the execution
frequency for each task level-wise. For the tasks of the $k$th level, it removes the first task (say $t_{i,j}$)
from queue $Q_k$ and computes the expected earliest starting time ($ST_{i,j,x}$) and expected earliest finish time
($FT_{i,j,x}$) with respect to each processor $p_x \in C$ at maximum frequency ($f_x^{max}$) using Eqs. (6) and (7),
respectively. For task $t_{i,j}$, the algorithm selects the processor that offers the minimum expected earliest
finish time ($FT_{i,j,x}$) and adds this task to the scheduling queue ($SCHQ_k$) of the $k$th level. The algorithm
updates schedule $S_B$ information as:

$$S_B(Proc_{i,j}) = p_x. \tag{17}$$

Following this, the algorithm selects the frequency for each task of the same level; the frequency is chosen
using the idle slot before the task, so that the turnaround time is not affected. For the tasks in $SCHQ_k$, it
removes the first task (say $t_{i,j}$) and determines its allocated processor $p_x$ using $t_{i,j}$ and $S_B$.
Following this, the algorithm determines the earliest start time ($EST_{i,j,x}$) of task $t_{i,j}$ on processor
$p_x$ as:

$$EST_{i,j,x} = \max\left\{FT_{l,m,x},\ \max_{t_{i,k} \in Pred_{i,j},\ x \ne y} \{FT_{i,k,y} + e_{i,(k,j)}\}\right\}. \tag{18}$$

Next, the algorithm computes the maximum possible total idle-slot time that can be made available to task
$t_{i,j}$ without violating the precedence constraints as:

$$SlackT_{i,j,x} = FT_{i,j,x} - EST_{i,j,x} - w_x^{max}(t_{i,j}), \quad \text{if } EST_{i,j,x} < ST_{i,j,x}. \tag{19}$$

The algorithm then determines the minimum possible continuous frequency ($f_{new}(t_{i,j})$) of processor $p_x$
that still meets the deadline given by the expected earliest finish time ($FT_{i,j,x}$) as:

$$f_{new}(t_{i,j}) = \frac{w_x^{max}(t_{i,j})}{w_x^{max}(t_{i,j}) + SlackT_{i,j,x}} \times f_x^{max}. \tag{20}$$

Algorithm 1. ESHEFT algorithm.
Input: Batch B, computation system C
Output: Schedule $S_B$ with $TAT_B$, $EN_B$
Begin
1. Compute the level $L_{i,j}$ of each task $t_{i,j} \in B$ using Eq. (1)
2. Compute the stochastic b-level ($sb_{i,j}$) of each task $t_{i,j} \in B$ using Eq. (5)
3. Divide batch B into levels from 1 to $L_B$
4. For each level k, form queue $Q_k$ by adding its tasks and sorting them in decreasing order of $sb_{i,j}$
5. For each level k = 1 to $L_B$ do
6.    // Processor selection
      While there are tasks in queue $Q_k$
         a. Remove the first task $t_{i,j}$ from queue $Q_k$
         b. Compute the expected earliest starting time ($ST_{i,j,x}$) and expected earliest finish time
            ($FT_{i,j,x}$) w.r.t. each $p_x \in C$ at $f_x^{max}$ using Eqs. (6) and (7), respectively
         c. Assign the processor $p_x \in C$ with minimum $FT_{i,j,x}$ to task $t_{i,j}$
         d. Add task $t_{i,j}$ to $SCHQ_k$
         e. Update $S_B(Proc_{i,j})$ using Eq. (17)
      End While
7.    // Frequency selection without sacrificing the turnaround time
      For each task $t_{i,j} \in SCHQ_k$
         a. Determine the processor $p_x = S_B(Proc_{i,j})$ and $EST_{i,j,x}$ using Eq. (18)
         b. Compute the slack slot $SlackT_{i,j,x}$ using Eq. (19)
         c. Compute the continuous frequency $f_{new}(t_{i,j})$ using Eq. (20)
         d. Schedule task $t_{i,j}$ on processor $p_x$ with the discrete frequency $f_{x,z}$ that is
            immediately greater than or equal to $f_{new}(t_{i,j})$
         e. Update $ST_{i,j,x}$ and $FT_{i,j,x}$ corresponding to $f_{x,z}$ using Eqs. (21) and (22)
         f. Send precedence-constraints data to all successors $t_{i,k} \in Succ_{i,j}$
         g. Update $S_B(freq_{i,j})$ using Eq. (23)
      End For
8. End For
9. Get schedule $S_B$
10. Compute $TAT_B$ using schedule $S_B$ and Eq. (9)
11. Compute $EN_B$ using schedule $S_B$ and Eq. (14)
12. Return $S_B$, $TAT_B$, and $EN_B$
End
Next, the ESHEFT algorithm selects the discrete frequency ($f_{x,z}$) immediately greater than or equal to the
continuous frequency $f_{new}(t_{i,j})$; this frequency is assigned to task $t_{i,j}$ for execution. Due to the
difference between the discrete ($f_{x,z}$) and continuous ($f_{new}(t_{i,j})$) frequencies, the expected earliest
starting time ($ST_{i,j,x}$) and expected earliest finish time ($FT_{i,j,x}$) are updated using Eqs. (21) and (22),
respectively. The algorithm also updates schedule $S_B$ information using Eq. (23).

$$ST_{i,j,x} = EST_{i,j,x}, \tag{21}$$

$$FT_{i,j,x} = ST_{i,j,x} + w_{x,z}(t_{i,j}), \tag{22}$$

$$S_B(freq_{i,j}) = f_{x,z}. \tag{23}$$

Following scheduling and frequency selection for all tasks, the algorithm computes the turnaround time ($TAT_B$)
and energy consumption ($EN_B$) of the batch B using Eqs. (9) and (14), respectively.
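
The frequency selection step (Eqs. (19) and (20), followed by rounding up to an available discrete level) can be sketched as follows. This Python illustration makes two assumptions that the text does not state explicitly: the slack is taken as zero when $EST_{i,j,x}$ is not smaller than $ST_{i,j,x}$, and the discrete frequencies of the processor are passed in as a plain list. The numeric values are hypothetical.

```python
def select_frequency(ft, est, st, w_max, freqs):
    """Return (slack, f_new, f_chosen) for one task on its processor.

    ft, st  : finish and start time found at the maximum frequency
    est     : earliest start time from Eq. (18)
    w_max   : approximate processing time at the maximum frequency
    freqs   : available discrete frequencies of the processor
    """
    # Eq. (19); zero slack when est >= st is an assumption of this sketch.
    slack = ft - est - w_max if est < st else 0.0
    f_max = max(freqs)
    # Eq. (20): slowest continuous frequency that still finishes by ft.
    f_new = w_max / (w_max + slack) * f_max
    # Lowest discrete level that is not below the continuous frequency.
    f_chosen = min(f for f in freqs if f >= f_new)
    return slack, f_new, f_chosen


freqs = [1.0, 1.5, 2.0]   # hypothetical levels in GHz
print(select_frequency(ft=30.0, est=10.0, st=20.0, w_max=10.0, freqs=freqs))
print(select_frequency(ft=25.0, est=10.0, st=15.0, w_max=10.0, freqs=freqs))
```

In the first call the slack equals the processing time, so the task can run at half the maximum frequency (1.0 GHz). In the second call the continuous frequency is about 1.33 GHz, which is rounded up to the 1.5 GHz level. The finish time would then be recomputed from the processing time at the chosen level, as in Eq. (22).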
4.1. Performance evaluation
To evaluate performance, a simulation program was developed in MATLAB on an Intel Core i5-3470 machine. It
models DVFS-enabled processors of 5 types, namely Intel Core 2 Duo with 4 frequencies, Intel Core 2 Extreme
with 4 frequencies, AMD Sempron APUs with 3 frequencies, AMD Athlon APUs with 3 frequencies, and TI
DSP with 2 frequencies. A simulation environment of 15 processors consisting of 3 processors of each type
is created. To generate the batch B consisting of a random number of jobs (DAGs) with random processing
and precedence-constraint times, a statistical prediction technique is used to get the normal distribution of
processing and precedence-constraint times. The simulation program takes 4 input parameters: 1) the number
of stochastic jobs in a batch B; 2) the range of the number of stochastic tasks in each job; 3) the ranges of
expectations and variances of the processing times of tasks; 4) the ranges of expectations and variances of the
precedence-constraint times [18]. The performance of ESHEFT is compared with ECS [9] and SHEFT [14], each in
2 variations. Similar to ECS and SHEFT, ECS-1 and SHEFT-1 schedule jobs from batch B one-by-one in random
order, whereas SHEFT-2 and ECS-2 convert the whole batch B into a single DAG by adding pseudo-entry and
pseudo-exit tasks with zero processing and communication times to schedule the tasks. Five different batches are
created that consist of 10, 20, 30, 40, and 50 jobs with 32 to 256 tasks per job, which results in total task
counts in the ranges [320, 2560], [640, 5120], [960, 7680], [1280, 10240], and [1600, 12800], respectively. The
ranges of expectation and variance of processing times are [1, 3000] and [20, 1000], respectively, whereas the
ranges of expectation and variance of precedence-constraint times are [1, 500] and [20, 100], respectively. The
turnaround time, energy consumption, and their standard deviations for the 5 batches are shown in Figures 1–4.

[Figure 1. Turnaround time of different batches. Axes: Number of Jobs in Batch (10 to 50) vs. Turnaround Time (s). Series: ECS-1, ECS-2, SHEFT-1, SHEFT-2, ESHEFT.]

[Figure 2. Turnaround standard deviation of different batches. Axes: Number of Jobs in Batch (10 to 50) vs. Turnaround Time Std. Dev. Series: ECS-1, ECS-2, SHEFT-1, SHEFT-2, ESHEFT.]

[Figure 3. Energy consumption of different batches. Axes: Number of Jobs in Batch (10 to 50) vs. Energy Consumption (kJ). Series: ECS-1, ECS-2, SHEFT-1, SHEFT-2, ESHEFT.]

[Figure 4. Energy consumption standard deviation of different batches. Axes: Number of Jobs in Batch (10 to 50) vs. Energy Std. Deviation (kJ). Series: ECS-1, ECS-2, SHEFT-1, SHEFT-2, ESHEFT.]

Figure 1 shows the turnaround time of ECS-1, ECS-2, SHEFT-1, SHEFT-2, and ESHEFT on the
computation system of 15 processors for the 5 batches. It is observed from Figure 1 that the order of turnaround
time performance, from best to worst, is ESHEFT, SHEFT-2, ECS-2, SHEFT-1, and ECS-1. It can also be seen from
Figure 1 that the performance of ESHEFT improves with the size of the batch. Averaged over the 5 batches,
the improvements of ESHEFT with respect to ECS-1, ECS-2, SHEFT-1, and SHEFT-2 are 12.5%, 11.5%, 12.2%, and
10.7%, respectively, in terms of turnaround time. It is observed from Figure 2 that the
ESHEFT algorithm offers more stable performance, as the standard deviation of its turnaround time
is smaller in comparison to the other algorithms. Corresponding to the turnaround time presented in Figure 1, the
energy consumption (kJ) of the same 5 batches is shown in Figure 3. It is observed from Figure 3 that the
ESHEFT algorithm consumes the least energy; the order of energy consumption, from lowest to highest, is ESHEFT,
ECS-2, SHEFT-2, ECS-1, and SHEFT-1. The energy-aware algorithms ECS-1 and ECS-2 both take into consideration
the execution time and energy consumption of tasks; hence, they offer lower energy consumption in comparison
to SHEFT-1 and SHEFT-2, respectively. Averaged over the 5 batches, the improvements in energy consumption
of ESHEFT with respect to ECS-1, ECS-2, SHEFT-1, and SHEFT-2 are
11.1%, 10.4%, 11.4%, and 10.7%, respectively. It is also observed from Figure 4 that the ESHEFT algorithm
offers the minimum energy standard deviation, which results in more stable performance in comparison to its
peers.
Both the ECS-1 and SHEFT-1 algorithms schedule jobs one-by-one in random order and only exploit the parallelism
between the tasks of a job rather than making use of parallelism between different jobs, resulting in a greater
number of idle slots in the schedule. Therefore, ECS-1 and SHEFT-1 offer higher turnaround time in comparison
to the ECS-2, SHEFT-2, and ESHEFT algorithms. Idle slots still consume static power, so more idle slots add to
the total energy consumption, which causes ECS-1 and SHEFT-1 to consume more energy in comparison to the ECS-2,
SHEFT-2, and ESHEFT algorithms. ECS-2 and SHEFT-2 form a single DAG, adding some precedence
constraints between jobs as well as tasks, which results in less parallelism and more delay in the execution
of tasks; hence, these algorithms offer greater turnaround time in comparison to ESHEFT. Since the proposed
ESHEFT algorithm explores and exploits the parallelism between and within the jobs, it
makes fuller use of resources, which results in lower turnaround time. Additionally, SHEFT-2 and ECS-2 consume
more energy in idle slots in comparison to the ESHEFT algorithm.
The main reason behind the good performance of the ESHEFT algorithm is that it combines all of the
jobs to exploit the parallelism between and within the jobs of the batch and generates a continuous frequency
based on the available idle slot, which helps to choose the discrete frequency of the processor. It can be
concluded that combining the jobs into batches and scheduling them using ESHEFT has a major impact on
performance in comparison to peers for the $Q_K \mid t_{i,j} \sim pred, stoc \mid \min TAT_B\ \min EN_B$ problem.
5. Conclusion
An energy-aware stochastic ESHEFT algorithm is proposed to schedule the batch of precedence-constrained jobs
(DAGs) on DVFS-enabled processors, incorporating the slack time of tasks to minimize energy consumption
without sacrificing turnaround time. Using a randomly generated batch of precedence-constrained jobs, the
simulation results demonstrate the superiority of the ESHEFT algorithm over SHEFT and ECS in terms of
turnaround time, energy consumption, and their standard deviations. From the experimental study, it is found
that scheduling the precedence-constrained stochastic jobs into batches is preferable to scheduling a single job
in terms of turnaround time and energy consumption, as it results in higher system utilization, lower monetary
costs, and lower negative environmental effects. Additionally, the performance of the ESHEFT algorithm
suggests its possible use for scheduling in general multiprocessor, multicore, and large-scale parallel
computing systems.
Symbol              Explanation
C                   Computation system C with K different per-chip DVFS-enabled processors $p_x$.
$V_x$ / $F_x$       The set of $n_x$ discrete voltage ($v_{x,z}$) / frequency ($f_{x,z}$) levels for processor $p_x \in C$.
B                   Batch of M jobs, with each job $J_i \in B$ consisting of multiple dependent tasks.
$T_i$ / $E_i$       $T_i$ consists of $M_i$ dependent tasks $t_{i,j}$ and $E_i$ consists of the precedence-constrained edges $e_{i,(j,k)}$ between tasks.
$Succ_{i,j}$        The set of all successor tasks of $t_{i,j}$.
$Pred_{i,j}$        The set of predecessors of task $t_{i,j}$.
$L_{i,j}$ / $L_B$   The level of task $t_{i,j}$ / batch B.
$[w_{i,j,x,z}]$     A matrix of order $M \times M_{max} \times K \times N_C$, where $w_{i,j,x,z}$ gives the processing time of task $t_{i,j} \in T_i$ w.r.t. frequency level $f_{x,z} \in F_x$.
$[w_{i,(j,k)}]$     A matrix of order $M \times M_{max} \times M_{max}$, where $w_{i,(j,k)}$ gives the approximate precedence-constraint time of edge $e_{i,(j,k)}$ between tasks $t_{i,j}$ and $t_{i,k}$.
$sb_{i,j}$          The stochastic b-level of task $t_{i,j}$ w.r.t. the DVFS-enabled processors.
$ST_{i,j,x}$        Expected start time of task $t_{i,j}$ scheduled on processor $p_x$.
$FT_{i,j,x}$        Expected finish time of task $t_{i,j}$ scheduled on processor $p_x$.
$RT_x$              Expected ready time of processor $p_x$.
$TAT_B$             The turnaround time of batch B.
$Pow_{x,z}$         The power consumption of processor $p_x$, consisting of static power ($P_{Static}$) and dynamic power ($P_{Dynamic}$) at voltage $v_{x,z}$.
$EN_B$              The total energy consumption of batch B.

References
[1] TOP 500 Supercomputers, TIANHE-2 November 2014 List. URL: http://www.top500.org/lists/2014/11/. Accessed
on 8 May 2015.
[2] Venkatachalam V, Franz M. Power reduction techniques for microprocessor systems. ACM Comput Surv 2005; 37:
195-237.
[3] Garey MR, Johnson DS. Computers and Intractability: A Guide to the Theory of NP-Completeness. New York,
NY, USA: W. H. Freeman & Co., 1990.
[4] Leung JYT. Handbook of Scheduling: Algorithms, Models, and Performance Analysis. New York, NY, USA:
Chapman & Hall/CRC, 2004.
[5] Kwok YK, Ahmad I. Static scheduling algorithms for allocating directed task graphs to multiprocessors. ACM
Comput Surv 1999; 31: 406-471.
[6] Topcuoglu H, Hariri S, Wu M. Performance-eﬀective and low-complexity task scheduling for heterogeneous computing. IEEE T Parall Distr 2002; 13: 260-274.
[7] Zhu D, Melhem R, Childers B. Scheduling with dynamic voltage/speed adjustment using slack reclamation in
multiprocessor real-time systems. IEEE T Parall Distr 2003; 14: 686-700.
[8] Zhang S, Chatha KS. Approximation algorithm for the temperature-aware scheduling problem. In: IEEE/ACM
2007 Computer-Aided Design Conference; 4–8 November 2007; San Jose, CA. Piscataway, NJ, USA: IEEE/ACM.
pp. 281-288.
[9] Lee YC, Zomaya A. Energy conscious scheduling for distributed computing systems under diﬀerent operating
conditions. IEEE T Parall Distr 2011; 22: 1374-1381.
[10] Li K. Scheduling precedence constrained tasks with reduced processor energy on multiprocessor computers. IEEE
T Comput 2012; 61: 1668-1681.
[11] Zhuravlev S, Saez JC, Blagodurov S, Fedorova A, Prieto M. Survey of energy-cognizant scheduling techniques. IEEE
T Parall Distr 2013; 24: 1447-1464.
[12] Chekuri C, Motwani R, Natarajan B, Stein C. Approximation techniques for average completion time scheduling.
SIAM J Comput 2001; 31: 146-166.

[13] Skutella M, Uetz M. Stochastic machine scheduling with precedence constraints. SIAM J Comput 2005; 34: 788-802.
[14] Tang X, Li K, Liao G, Fang K, Wu F. A stochastic scheduling algorithm for precedence constrained tasks on grid.
Future Gener Comp Sys 2011; 27: 1083-1091.
[15] Li K, Tang X, Veeravalli B, Li K. Scheduling precedence constrained stochastic tasks on heterogeneous cluster
systems. IEEE T Comput 2015; 64: 191-204.
[16] Möhring RH, Schulz AS, Uetz M. Approximation in stochastic scheduling: the power of LP-based priority policies.
J ACM 1999; 46: 924-942.
[17] Scharbrodt M, Schickinger T, Steger A. A new average case analysis for completion time scheduling. J ACM 2006;
53: 121-146.
[18] Kasahara H. STG: Standard Task Graph Set. Kasahara Laboratory, Waseda University, Tokyo, Japan. URL:
http://www.kasahara.elec.waseda.ac.jp/schedule/. Accessed on 2 May 2015.


