A Machine Assignment Mechanism for Compile-Time List-Scheduling Heuristics by Hagras, Tarek & Janeček, Jan
Computing and Informatics, Vol. 24, 2005, 341–350
A MACHINE ASSIGNMENT MECHANISM
FOR COMPILE-TIME LIST-SCHEDULING HEURISTICS
Tarek Hagras, Jan Janeček
Department of Computer Science and Engineering
Czech Technical University in Prague
Prague, Czech Republic
e-mail: tarek@felk.cvut.cz, janecek@cs.felk.cvut.cz
Manuscript received 18 November 2003; revised 3 May 2005
Communicated by Ladislav Hluchý
Abstract. Finding an optimal solution for a scheduling problem is NP-complete.
Therefore, it is necessary to use heuristics to find a good schedule rather than
evaluating all possible schedules. List scheduling is generally accepted as an
attractive approach, since it combines low complexity with good results. List
scheduling consists of two phases: a task prioritization phase, where a priority
is computed and assigned to each task, and a machine assignment phase, where each
task (in order of its priority) is assigned a machine that minimizes a suitable
cost function. This paper presents a machine assignment mechanism that can be used
with any list-scheduling algorithm. The mechanism is called the Reverse Duplicator
Mechanism and outperforms the current mechanisms.
Keywords: Compile-time scheduling, machine assignment mechanisms, list scheduling,
homogeneous computing systems.
1 INTRODUCTION
Efficient scheduling of parallel programs is one of the most essential and
difficult issues in achieving high performance in both homogeneous and
heterogeneous computing environments [1]. The main objective of a scheduling
mechanism is to map tasks to machines and order their execution so that
precedence requirements are satisfied and the overall completion time (makespan)
is minimized. When the structure of the parallel program, in terms of its task
execution times, task dependencies and size of communicated data, is known
a priori, the application is represented by the static model and scheduling can
be accomplished statically at compile-time. In the general form of static task
scheduling, the application is represented by a directed acyclic graph
(DAG) [2, 3], in which nodes represent application tasks and edges represent
inter-task data dependencies. Each node is labeled with the computation cost
(expected computation time) of the task and each edge is labeled with the
communication cost (expected communication time).
Finding an optimal solution for the scheduling problem is NP-complete [4, 5].
Therefore, it is necessary to use heuristics to find a good schedule rather than
evaluating all possible schedules. Most scheduling heuristics are based on
list scheduling [2, 3, 5–8]. List scheduling consists of two phases: a task
prioritizing phase, where a task list (L) that contains all tasks is constructed
and a priority is computed and assigned to each task in L, and a machine
assignment phase, where each task (in order of its priority) is assigned a machine
that minimizes a suitable cost function. The scheduling heuristic is called static
if the machine assignment phase starts after the task prioritizing phase has
finished [2, 5, 9], and dynamic if the two phases are interleaved [10, 11].
This paper presents a machine assignment mechanism called Reverse Duplicator (RD)
that outperforms the current mechanisms while keeping a low complexity. The
mechanism can be used for the machine assignment phase of any list-scheduling
algorithm, in both homogeneous and heterogeneous environments.
The remainder of this paper is organized as follows. The next section describes the
current machine assignment mechanisms. Section 3 presents the proposed mechanism.
Section 4 gives the complexity of the mechanisms. In Section 5, the performance
comparison of the examined mechanisms is presented. Section 6 provides the
conclusion.
2 CURRENT MACHINE ASSIGNMENT MECHANISMS
This section presents the current machine assignment mechanisms used in static
list-scheduling algorithms: the non-insertion based mechanism [3, 4, 8] and the
insertion based mechanism [6, 9].
2.1 Non-Insertion Based Mechanism
The non-insertion (NI) mechanism tries to assign each task vi ∈ L a machine pm ∈ P
that allows the task to be executed as early as possible. The Task Start Time (TST)
of vi on a machine pq is defined as

$$\mathrm{TST}(v_i, p_q) = \max_{v_n \in \mathrm{prnt}(v_i)} \{\mathrm{RT}(p_q),\ \mathrm{FT}(v_n) + k \cdot c_{n,i}\},$$

where:
prnt(vi) is the set of immediate predecessors of vi,
RT(pq) is the time when pq is available,
FT(vn) is the completion time of the parent node vn,
cn,i is the communication cost between vn and vi, and
k = 1 if the machine assigned to parent task vn is not pq, and k = 0 otherwise.
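
The TST definition can be illustrated with a minimal Python sketch. The function name, the dictionary-based data layout and the toy values are assumptions made for this illustration only; they are not taken from the paper.

```python
def task_start_time(task, machine, parents, finish_time, assigned, comm_cost, ready_time):
    """TST(v_i, p_q): earliest start of `task` on `machine` without insertion."""
    start = ready_time[machine]                      # RT(p_q)
    for parent in parents[task]:
        # k = 0 if the parent ran on the same machine, k = 1 otherwise
        k = 0 if assigned[parent] == machine else 1
        start = max(start, finish_time[parent] + k * comm_cost[(parent, task)])
    return start


# Hypothetical toy data: v3 has two parents placed on different machines.
parents = {"v3": ["v1", "v2"]}
finish_time = {"v1": 4, "v2": 6}                     # FT(v_n)
assigned = {"v1": "p1", "v2": "p2"}
comm_cost = {("v1", "v3"): 5, ("v2", "v3"): 1}       # c_{n,i}
ready_time = {"p1": 4, "p2": 6}                      # RT(p_q)

tst_p1 = task_start_time("v3", "p1", parents, finish_time, assigned, comm_cost, ready_time)
tst_p2 = task_start_time("v3", "p2", parents, finish_time, assigned, comm_cost, ready_time)
best = min(("p1", "p2"), key=lambda p: task_start_time(
    "v3", p, parents, finish_time, assigned, comm_cost, ready_time))
```

In this toy case the NI mechanism would pick p1, because the large communication cost from v1 is avoided there.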
2.2 Insertion Based Mechanism
The insertion based (IB) mechanism considers a possible insertion of each task vi ∈ L
in the earliest idle time slot between two already scheduled tasks on a given machine.
For each task vi, the absolute start time (AST) on a machine pq is computed as

$$\mathrm{AST}(v_i, p_q) = \max_{v_n \in \mathrm{prnt}(v_i)} \{\mathrm{FT}(v_n) + k \cdot c_{n,i}\},$$

where prnt(vi), FT(vn), cn,i and k have the same meaning as in TST(vi, pq) of the
non-insertion based mechanism.
A task vi can be inserted into the machine pq, which holds the task sequence
{vq1, vq2, ..., vqx}, after task vqy if

$$\mathrm{TST}(v_{q_{y+1}}, p_q) - \max\{\mathrm{AST}(v_i, p_q),\ \mathrm{TFT}(v_{q_y}, p_q)\} \ge w_i,$$

where:
TST(vqy, pq) is the start time of task vqy on pq,
TFT(vqy, pq) is the finish time of task vqy on pq,
wi is the computation cost of task vi, and
TST(vqx+1, pq) is equal to ∞.
The machine pm that minimizes the start time of vi is selected.
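
The slot test above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function name and the list-of-pairs layout are hypothetical, and the sketch additionally treats the interval before the first scheduled task as a candidate slot, which the condition in the text does not state explicitly.

```python
import math


def insertion_start(ast, w, busy):
    """Earliest start for a task of cost `w` whose data are ready at `ast`.

    `busy` lists the (start, finish) pairs of the tasks already placed on the
    machine, sorted by start time.
    """
    prev_finish = 0  # assumption of this sketch: the machine is idle from time 0
    # Sentinel entry: TST(v_q(x+1), p_q) = infinity, so appending always fits.
    for start, finish in busy + [(math.inf, math.inf)]:
        candidate = max(ast, prev_finish)
        if start - candidate >= w:       # the slot is long enough for the task
            return candidate
        prev_finish = finish


busy = [(0, 4), (10, 14)]                # two tasks already on the machine
first = insertion_start(ast=3, w=5, busy=busy)    # fits into the idle slot 4..10
second = insertion_start(ast=3, w=7, busy=busy)   # too long, appended at the end
```

Scanning every idle slot of a machine for every task is what distinguishes IB from NI, which only looks at RT(pq).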
3 PROPOSED MECHANISM
Basically, task-duplication algorithms try to duplicate the parent tree, or some
selected parents, of the currently selected task on an unbounded number of
machines. The goal of this duplication is to minimize the start time of the
duplicated parents, so that the machine that minimizes the start time of the
selected task can be chosen. This large number of duplications increases the
algorithm complexity, while optimality is still far from being achieved. Moreover,
assuming an unbounded number of machines as the target computing environment is
impractical.
The main idea of the proposed machine assignment mechanism (Figure 1) is to:
1. select the machine that minimizes the start time of the current selected task,
2. examine the idle time left by the selected task on the selected machine for
duplicating one selected parent,
3. confirm this duplication if it will reduce the start time of the current selected
task.
In contrast to the basic idea of general duplication algorithms, the proposed
mechanism selects the machine first and then checks for duplication. Instead of
examining one task at each step, the mechanism examines one task and one parent,
which does not increase the complexity of the classical non-insertion based
machine assignment mechanism.
The following five definitions should be given to clarify the proposed mechanism:
Definition 1. The Task Start Time (TST) on a machine is defined as follows:

$$\mathrm{TST}(v_i, p_q) = \max_{v_n \in \mathrm{prnt}(v_i)} \{\mathrm{RT}(p_q),\ \mathrm{FT}(v_n) + k \cdot c_{n,i}\},$$

where:
prnt(vi) is the set of immediate parents of vi,
RT(pq) is the time when pq is available,
FT(vn) is the completion time of parent vn,
cn,i is the communication cost between vn and vi, and
k is equal to 1 if the machine assigned to parent task vn is not pq, and is
equal to 0 otherwise.
Definition 2. The Duplication Time Slot (DTS) is

$$\mathrm{DTS}(v_i, p_m) = \mathrm{TST}(v_i, p_m) - \mathrm{RT}(p_m).$$
Definition 3. The Critical Parent is the parent vCP (scheduled on pq) of vi
(tentatively scheduled on pm) whose data arrival time to vi is the latest.
Definition 4. DAT(vCP2, pm) is the data arrival time of the second critical parent
vCP2 on pm.
Definition 5. The Duplication Condition is

$$\mathrm{DTS}(v_i, p_m) > w_{CP} \quad \text{and} \quad \mathrm{TST}(v_{CP}, p_m) + w_{CP} < \mathrm{TST}(v_i, p_m).$$

If the duplication condition is satisfied, the mechanism works as follows:
1. duplicate vCP on pm at the later of RT(pm) and TST(vCP, pm),
2. update RT(pm),
3. assign vi to pm at the later of RT(pm) and DAT(vCP2, pm).
while not the end of L do
    dequeue vi from L
    for each machine pq in the machine set P do
        compute TST(vi, pq)
    select the machine pm that minimizes TST of vi
    select vCP and vCP2 of vi
    if the duplication condition is satisfied
        if TST(vCP, pm) ≤ RT(pm)
            duplicate vCP on pm at RT(pm)
            RT(pm) = RT(pm) + wCP
        else
            duplicate vCP on pm at TST(vCP, pm)
            RT(pm) = TST(vCP, pm) + wCP
        if DAT(vCP2, pm) > RT(pm)
            assign vi to pm at DAT(vCP2, pm)
            RT(pm) = DAT(vCP2, pm) + wi
        else
            assign vi to pm at RT(pm)
            RT(pm) = RT(pm) + wi
    else
        assign task vi to pm at TST(vi, pm)
        RT(pm) = TST(vi, pm) + wi
Fig. 1. Proposed machine assignment mechanism
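
The mechanism of Figure 1 can be sketched in Python as shown below. This is a minimal illustration, not the authors' code: the function and variable names, the dictionary layout and the toy graph are hypothetical. Two further assumptions of the sketch are that the data of a parent with several copies arrive from its best copy, and that a critical parent already placed on pm is never duplicated there.

```python
def reverse_duplicator(order, parents, w, c, machines):
    """Assign tasks in priority order; returns (schedule, makespan)."""
    ready = {p: 0 for p in machines}          # RT(p)
    copies = {}                               # task -> {machine: finish time}
    schedule = {p: [] for p in machines}      # machine -> [(task, start, finish)]

    def arrival(n, i, p):
        # Data arrival time of parent n for task i on p (best copy: assumption).
        return min(ft + (0 if m == p else c[(n, i)]) for m, ft in copies[n].items())

    def tst(i, p):
        # Definition 1; the result is never earlier than RT(p).
        return max([ready[p]] + [arrival(n, i, p) for n in parents.get(i, [])])

    def place(i, p, start):
        finish = start + w[i]
        schedule[p].append((i, start, finish))
        copies.setdefault(i, {})[p] = finish
        ready[p] = finish

    for i in order:
        pm = min(machines, key=lambda p: tst(i, p))
        start = tst(i, pm)
        ranked = sorted(parents.get(i, []),
                        key=lambda n: arrival(n, i, pm), reverse=True)
        if ranked:
            cp = ranked[0]                                        # critical parent
            dat2 = arrival(ranked[1], i, pm) if len(ranked) > 1 else 0
            cp_start = tst(cp, pm)            # later of RT(pm) and TST(vCP, pm)
            if (pm not in copies[cp]                              # sketch assumption
                    and start - ready[pm] > w[cp]                 # DTS > wCP
                    and cp_start + w[cp] < start):
                place(cp, pm, cp_start)                           # duplicate vCP
                start = max(ready[pm], dat2)
        place(i, pm, start)

    makespan = max(f for entries in schedule.values() for _, _, f in entries)
    return schedule, makespan


# Hypothetical toy graph: a -> b, a -> c, b -> d, c -> d.
parents = {"b": ["a"], "c": ["a"], "d": ["b", "c"]}
w = {"a": 2, "b": 6, "c": 3, "d": 2}
c = {("a", "b"): 4, ("a", "c"): 4, ("b", "d"): 1, ("c", "d"): 1}
schedule, makespan = reverse_duplicator(["a", "b", "c", "d"], parents, w, c, ["p1", "p2"])
```

On this toy graph, task c is sent to p2 and its critical parent a is duplicated into the idle slot at the start of p2, so c starts at time 2 instead of waiting for the data of a to arrive at time 6.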
4 MECHANISMS COMPLEXITY
Complexity is usually expressed in terms of the number of nodes v, the number of
edges e, and the number of machines p. The complexity of the mechanisms is shown
in Table 1.
Mechanism    Complexity
RD           O(pv^2)
NI           O(pv^2)
IB           O(pv^3)
Table 1. Mechanisms complexity
Fig. 2. Application graph (ten tasks, v1 to v10)
5 EXPERIMENTAL RESULTS AND DISCUSSION
This section compares the performance of the examined mechanisms (non-insertion
(NI) and insertion based (IB)) with that of the proposed mechanism (Reverse
Duplicator (RD)). For this purpose, we used a list L generated by the well-known
list-scheduling algorithm Modified Critical Path (MCP) [8]. For the application
graph in Figure 2, the list L generated using the MCP heuristic is
{v1, v6, v3, v2, v4, v5, v8, v7, v9, v10} and the schedules produced by the
examined mechanisms are shown in Figure 3. In Figure 3 b), the gray tasks are the
inserted tasks, and in Figure 3 c), the gray tasks are the duplicated tasks. The
comparison uses a large number of randomly generated task graphs with varying
characteristics and the comparison metrics defined below.
5.1 Comparison Metrics
The comparisons of the mechanisms are based on the following metrics.
Makespan. The makespan is defined as the overall completion time and can be
specified as follows:
makespan = FT(vexit),
where: FT(vexit) is the finish time of the scheduled exit task.
Schedule Length Ratio (SLR). The main performance measure is the schedule
length (makespan). Since a large set of task graphs with different properties
is used, it is necessary to normalize the schedule length to the lower bound;
the normalized value is called the Schedule Length Ratio (SLR). The SLR is
defined as

$$\mathrm{SLR} = \frac{\mathit{makespan}}{\sum_{i \in CT} w_i}.$$

The denominator is the sum of the computation costs of the tasks on a critical
path (CP). The average SLR is used in our experiments.

Fig. 3. Schedules produced by the examined mechanisms on machines P1, P2 and P3:
a) Non-insertion (Makespan = 46), b) Insertion based (Makespan = 42),
c) Reverse Duplicator (Makespan = 34)
Quality of Schedules. The percentage of experiments in which a mechanism
produced a better, worse, or equal-quality schedule compared to every other
mechanism is counted.
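
The SLR and the quality-of-schedules metrics can be computed as in the following sketch. The function names and sample values are hypothetical, and the critical path is passed in by the caller, since this section does not fix how it is determined.

```python
def slr(makespan, critical_path, w):
    """Schedule Length Ratio: makespan over the computation cost of a critical path."""
    return makespan / sum(w[i] for i in critical_path)


def pairwise(lengths_a, lengths_b):
    """Percentages of graphs on which A is better, equal or worse than B."""
    n = len(lengths_a)
    better = sum(a < b for a, b in zip(lengths_a, lengths_b))   # shorter is better
    equal = sum(a == b for a, b in zip(lengths_a, lengths_b))
    worse = n - better - equal
    return tuple(100.0 * x / n for x in (better, equal, worse))


# Hypothetical values for illustration only.
w = {"a": 2, "b": 6, "c": 3, "d": 2}
ratio = slr(makespan=11, critical_path=["a", "b", "d"], w=w)
shares = pairwise([10, 12, 9, 9], [11, 12, 9, 8])
```

Because the makespan cannot be shorter than the computation cost along a critical path, the SLR is at least 1, and values closer to 1 indicate better schedules.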
5.2 Random Graph Generator
A random graph generator was implemented to generate application DAGs with
various characteristics that depend on several input parameters. The generator
requires the following input parameters to build weighted DAGs:
• number of tasks in the graph v,
• number of graph levels l,
• communication to computation ratio CCR, which is defined as the ratio of the
average communication cost to the average computation cost.
In all experiments, graphs with a single entry node and a single exit node were
considered. In each experiment, the values of the parameters were selected from
the following sets:
v ∈ {20, 40, 60, 80, 100, 120},
0.2 v ≤ l ≤ 0.8 v,
CCR ∈ {0.5, 1.0, 2.0}.
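
A generator driven by these three parameters might look like the sketch below. The paper does not describe how tasks are spread over levels, how edges are wired or how costs are drawn, so the rules used here (edges only between adjacent levels, uniformly drawn costs, the `mean_w` and `seed` parameters) are assumptions of this illustration, not the authors' generator.

```python
import random


def random_dag(v, levels, ccr, mean_w=10.0, seed=0):
    """Return (w, c): computation cost per task and communication cost per edge."""
    if not 3 <= levels <= v:
        raise ValueError("need 3 <= levels <= v")
    rng = random.Random(seed)
    # The first and last level hold one task each (single entry, single exit);
    # every inner level gets one task and the rest are spread at random.
    sizes = [1] * (levels - 2)
    for _ in range(v - levels):
        sizes[rng.randrange(len(sizes))] += 1
    layout, nxt = [[0]], 1
    for size in sizes:
        layout.append(list(range(nxt, nxt + size)))
        nxt += size
    layout.append([v - 1])

    edges = set()
    for upper, lower in zip(layout, layout[1:]):
        for t in lower:                       # every task gets at least one parent
            edges.add((rng.choice(upper), t))
        has_child = {a for a, _ in edges}
        for t in upper:                       # every task gets at least one child
            if t not in has_child:
                edges.add((t, rng.choice(lower)))

    w = {t: rng.uniform(0.5 * mean_w, 1.5 * mean_w) for t in range(v)}
    raw = {e: rng.uniform(0.5, 1.5) for e in sorted(edges)}
    # Rescale the edge costs so that avg(c) / avg(w) equals the requested CCR.
    avg_w = sum(w.values()) / v
    scale = ccr * avg_w * len(raw) / sum(raw.values())
    c = {e: x * scale for e, x in raw.items()}
    return w, c


w, c = random_dag(v=20, levels=8, ccr=2.0)
```

Task identifiers grow with the level, so every edge points from a lower to a higher identifier and the graph is acyclic by construction.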
5.3 Performance Results
The performance of the mechanisms was compared with respect to graph size. The
experiments were repeated for each v from the set given above. For each v,
1 000 graphs were generated, with CCR and the number of levels (l) selected at
random from the sets given above for each graph. The average SLR for each v is
given in Figure 4. In all experiments, 16 fully connected machines were used. In
general, IB performs better than NI, and RD outperforms both.
Fig. 4. Average SLR versus number of tasks for RD, IB and NI
Finally, the percentage of cases in which each mechanism produced a better (B),
equal (E) or worse (W) schedule length compared to every other mechanism was
counted over all generated graphs. Each cell in Table 2 gives the result of
comparing the mechanism on the left with the mechanism at the top.
          RD        IB        NI
RD   B    -         89.62%    93.62%
     E    -         4.72%     4.57%
     W    -         5.67%     1.82%
IB   B    5.67%     -         31.28%
     E    4.72%     -         64.08%
     W    89.62%    -         4.63%
NI   B    1.82%     4.63%     -
     E    4.57%     64.08%    -
     W    93.62%    31.28%    -
Table 2. Pairwise comparison of the examined mechanisms
6 CONCLUSION
In this paper we presented a simple machine assignment mechanism called Reverse
Duplicator. The mechanism can be used with list-scheduling heuristics for both
a limited and an unlimited number of machines. The performance of the mechanism
was examined using randomly generated graphs with varying characteristics. Three
comparison metrics were used to measure its performance. The Reverse Duplicator
mechanism outperformed both the non-insertion and insertion based mechanisms
while having the same complexity as the non-insertion based mechanism.
REFERENCES
[1] Feitelson, D., Rudolph, L., Schwiegelshohn, U., Sevcik, K.,
Wong, P.: Theory and Practice in Parallel Job Scheduling. JSSPP, 1997,
pp. 1–34.
[2] Kwok, Y., Ahmad, I.: Benchmarking the Task Graph Scheduling Algorithms. Proc.
IPPS/SPDP, 1998.
[3] Liou, J., Palis, M.: A Comparison of General Approaches to Multiprocessor
Scheduling. Proc. Int’l Parallel Processing Symp., 1997, pp. 152–156.
[4] Khan, A., McCreary, C., Jones, M.: A Comparison of Multiprocessor
Scheduling Heuristics. ICPP, Vol. 2, 1994, pp. 243–250.
[5] Hagras, T., Janeček, J.: A High Performance, Low Complexity Algorithm
for Compile-Time Job Scheduling in Homogeneous Computing Environments. IEEE
Proc. Int’l Conf. Parallel Processing Workshops (ICPP03 workshops), October 2003,
pp. 149–155.
[6] Kwok, Y., Ahmad, I.: Dynamic Critical-Path Scheduling: An Effective Technique
for Allocating Task Graph to Multiprocessors. IEEE Trans. Parallel and Distributed
Systems, Vol. 7, 1996, pp. 506–521.
[7] Zhou, H.: Scheduling DAGs on a Bounded Number of Processors. Int’l Conf.
Parallel and Distributed Processing Techniques and Applications, 1996.
[8] Wu, M.-Y., Gajski, D.: Hypertool: A Programming Aid for Message-Passing
Systems. IEEE Trans. Parallel and Distributed Systems, Vol. 1, 1990, No. 3.
[9] Topcuoglu, H., Hariri, S., Wu, M.-Y.: Task Scheduling Algorithm for
Heterogeneous Processors. Heterogeneous Computing Workshop, 1999, pp. 3–14.
[10] Hwang, J., Chow, Y., Anger, E., Lee, C.: Scheduling Precedence Graphs in
Systems with Interprocessor Communication Times. SIAM Journal on Computing,
Vol. 18, 1989, No. 2, pp. 244–257.
[11] Sih, G., Lee, E.: A Compile-Time Scheduling Heuristic for
Interconnection-Constrained Heterogeneous Processor Architectures. IEEE Trans.
on Parallel and Distributed Systems, Vol. 4, 1993, No. 2, pp. 75–87.
Tarek Hagras received his M. Sc. degree in computer engineering from Asyut
University, Asyut, Egypt, and his Ph.D. degree in computer engineering and
informatics from Czech Technical University in Prague, Czech Republic, in 1998
and 2005, respectively. Currently, he is a lecturer at Higher Institute of Energy,
Aswan, Egypt. He is a member of International Society of Computers and their
Applications (ISCA), IEEE and its Computer and Communication Societies, and
Egyptian Syndicate of Professional Engineers. His research interests include
parallel and distributed systems and task scheduling in homogeneous and
heterogeneous computing systems.
Jan Janeek is an associate professor in the Department of
Computer Science and Engineering at the Czech Technical Uni-
versity in Prague. He received his M. Sc. degree and his Ph.D.
degree in technical cybernetics from the Czech Technical Univer-
sity in Prague in 1973 and 1981, respectively. Currently he lec-
tures on local area networks, advanced Technologies of computer
networks, distributed systems and applications of embedded sys-
tems. His research focuses on distributed computation, middle-
ware technologies, networking and embedded applications. He
has led and participated in research teams working on projects
dealing with networking technologies (X.25 PAD, ATM switch, VoIP software), software
implementation tools (embedded C and Pollux compilers, efficient SOAP parser), dis-
tributed quorum algorithms, support for asynchrony in distributed applications, and ef-
fectiveness of middleware technologies. He is a member of IEEE and its Computer and
Communication Societies, and serves as a vice chairman to the Czech Chapter of the IEEE
Computer Society.
