It has already been demonstrated that cost-effective multiprocessor designs may be obtained by combining processors of different speeds in the same architecture (a heterogeneous architecture), so that the serial and critical portions of the application may benefit from a fast single processor. In such an environment, the problem of assigning tasks to processors becomes a very important one. This paper presents a systematic way to build static heuristic scheduling algorithms. Using this strategy, several algorithms are proposed and their performance is compared through simulation. One of the proposed algorithms is shown to achieve substantial performance gains as the degree of heterogeneity of the architecture increases.
1. Introduction. Parallel applications can be envisioned as being composed of a set of interrelated tasks, which are sequential units of processing [5, 13]. They can be characterized by several parameters, such as fraction of sequential processing, average parallelism, and maximum parallelism [16]. When multiprocessors are used to execute parallel applications, the parallel portion of the application can be sped up according to the number of processors allocated to the application. If all processors are identical (homogeneous architecture), the sequential portion of the application will have to be executed on one of the processors, considerably degrading the execution time of the application [3]. Menascé and Almeida [10, 11, 12] were the first to use analytic models to show the cost-effectiveness of having a larger processor tightly coupled to several smaller ones in what is called a heterogeneous architecture. A similar analysis by Andrews and Polychronopoulos followed [4]. Recently, researchers at CMU carried out an experiment connecting a Cray YMP/832 to a 32,000 node Connection Machine CM-2 through a fast HIPPI data path. They were able to obtain a speedup of 10 in a distributed solution to the assignment problem. The serial portions of the algorithm were run on the Cray, and the parallel ones on the CM-2 [17]. Freund [6] also discussed models that explored the cost-effectiveness of a set of loosely coupled supercomputers.
Once we recognize the value of heterogeneity, some fundamental problems have to be solved in order to make the idea work. One of the most important is the scheduling of tasks among processors. In a homogeneous environment one has to be able to determine the optimum number of processors to be allocated to an application (processor allocation), as well as which tasks are going to be assigned to each processor (processor assignment) [7]. In a heterogeneous setting, we have to determine not only how many but also which processors are allocated to an application, as well as which processors are going to be assigned to each task. This paper deals with static heuristic task scheduling algorithms for heterogeneous environments. Section two defines the basic concepts that will be used throughout the paper. The next section presents a meta-algorithm that will be used to systematically build several static heuristic scheduling algorithms. Section four describes some properties used to classify the algorithms, and the following section gives several examples of scheduling algorithms. These algorithms were evaluated using simulation in order to compare their relative performance. The results of this analysis are reported in section six.
2. Basic Concepts: Parallel Heterogeneous Architectures, Parallel Applications and Scheduling. For the purpose of this paper, a heterogeneous parallel architecture is a set, $P = \{p_1, p_2, \ldots, p_m\}$, of $m$ interconnected processors. Each processor has an instruction set partitioned into $I$ execution-time-equivalent classes. The instruction execution times of a given processor $p_j$ are represented by the vector $\vec{\tau}_j = (\tau_{1j}, \tau_{2j}, \ldots, \tau_{Ij})$, where $\tau_{ij}$ is the execution time of a type $i$ instruction at processor $p_j$. A value of $\infty$ for $\tau_{ij}$ indicates that processor $p_j$ does not execute instructions of type $i$.
A parallel application $\Pi$ is a set of partially ordered interrelated tasks. Let $T(\Pi) = \{t_1, t_2, \ldots, t_n\}$ be the set of tasks of $\Pi$ and $G(\Pi)$ its acyclic directed precedence graph, as used in many other studies [5, 13, 16]. Each node in this graph represents one of the tasks of the application. Arcs in the graph link a task to its immediate successors in the execution sequence. A task $t_k$ is said to become executable when all its immediate predecessors in the graph finish their execution. With each task $t_k$ we associate a vector $\vec{\delta}_k = (\delta_{1k}, \ldots, \delta_{Ik})$ such that $\delta_{ik}$ is the average number of instructions of type $i$ executed by task $t_k$. This vector will be called the task service demand vector. Notice that in a homogeneous architecture, the service demand of a task can be measured in time units by a single scalar. In the heterogeneous environment it is no longer possible to measure the service demand in time units.
In a heterogeneous architecture, the execution time of a task depends on the processor that is going to execute it. Hence, the average execution time of task $t_k$ at processor $p_j$, denoted by $\epsilon(t_k, p_j)$, is given by the dot product of the service demand vector of $t_k$ and the instruction execution time vector of $p_j$, i.e. $\epsilon(t_k, p_j) = \vec{\delta}_k \cdot \vec{\tau}_j$. Thus, a parallel application with $n$ tasks and $m$ processors can be represented by a precedence graph $G$ and an $n \times m$ matrix $E$ such that $E_{k,j} = \epsilon(t_k, p_j)$, as defined above.
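As an illustration of the dot product above, the following minimal Python sketch builds the task-by-processor execution time matrix from service demand vectors and instruction execution time vectors. The function names and the sample data are hypothetical, and the sketch assumes that an infinite instruction time marks an instruction type that a processor cannot execute.

```python
import math

def execution_time(demand, instr_times):
    """Dot product of a task's service demand vector and a processor's
    instruction execution time vector. Returns math.inf when the task
    needs an instruction type that the processor cannot execute."""
    total = 0.0
    for count, time_per_instr in zip(demand, instr_times):
        if count == 0:
            continue  # skip unused types so that 0 * inf never occurs
        total += count * time_per_instr
    return total

def execution_time_matrix(demands, processors):
    """n x m matrix: entry [k][j] is the time of task k on processor j."""
    return [[execution_time(d, p) for p in processors] for d in demands]

# Hypothetical data: I = 2 instruction classes, 2 tasks, 2 processors.
demands = [[100, 0], [50, 20]]                # instructions per class
processors = [[1.0, math.inf], [0.5, 2.0]]    # time per instruction class
matrix = execution_time_matrix(demands, processors)
print(matrix)
```

In this made-up example the first processor cannot run the second task at all, so a scheduler working from the matrix would never pair them.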
As defined in [18], the processor scheduling problem can be viewed as a two-step process, namely processor allocation and processor assignment. Processor allocation in a heterogeneous setting deals with the determination of not only how many but also which processors are to be allocated to a job. The processor assignment problem deals with the assignment of allocated processors to the tasks of the job. This paper deals with the processor assignment problem in heterogeneous environments.
After all tasks have been assigned to processors, each processor ends up with an ordered list of tasks that will run on it as soon as they become executable.

3. A Meta-Algorithm for Building Scheduling Algorithms. The types of scheduling algorithms considered here are static and heuristic: static in the sense that scheduling decisions are taken prior to the execution of the application, and heuristic since certain rules, called heuristics, will be used in the scheduling process in order to yield good sub-optimal and computationally inexpensive task assignments.
This section presents a meta-algorithm that can be used to build, in a systematic way, a range of static heuristic algorithms. The meta-algorithm is composed of an envelope component and a heuristic component. This modular structure lends itself to the systematic construction of scheduling algorithms. Some important properties of heuristics and envelopes will be discussed in the next section.

4. Properties of Envelopes and Heuristics.

Task Based (TB). The processor is selected at random from the processor domain and the task is selected according to a given heuristic.

Processor Based (PB). The task is selected at random from the task domain and the processor is selected according to a given heuristic.

Task Processor Based (TPB). Heuristics (not necessarily the same) are used to select both the task and the processor.

5. Some Scheduling Algorithms for Heterogeneous Parallel Architectures. This section presents some examples of envelopes and heuristics, and their combinations, i.e. the resulting scheduling algorithms. The goal of the heuristics considered in this paper is to minimize the execution time of the application.
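The three heuristic classes (Task Based, Processor Based, and Task Processor Based) differ only in which half of the task-processor pair is chosen by a rule and which half is chosen at random. The Python sketch below illustrates that distinction under stated assumptions: the function names, the score callables, and the sample data are hypothetical, and each heuristic is modelled simply as a score to be maximized.

```python
import random

def task_based(tasks, processors, task_score, rng=random):
    """TB: the task is chosen by a heuristic, the processor at random."""
    task = max(tasks, key=task_score)
    return task, rng.choice(processors)

def processor_based(tasks, processors, proc_score, rng=random):
    """PB: the task is chosen at random, the processor by a heuristic."""
    task = rng.choice(tasks)
    return task, max(processors, key=lambda p: proc_score(task, p))

def task_processor_based(tasks, processors, task_score, proc_score):
    """TPB: heuristics choose both the task and the processor."""
    task = max(tasks, key=task_score)
    return task, max(processors, key=lambda p: proc_score(task, p))

# Hypothetical domains and scores: prefer large tasks and fast processors.
demand = {"a": 3, "b": 7}
speed = {"slow": 1, "fast": 4}
tasks, processors = ["a", "b"], ["slow", "fast"]
pair = task_processor_based(tasks, processors, demand.get,
                            lambda t, p: speed[p])
print(pair)
```

Passing a seeded `random.Random` instance as `rng` makes the random half of a TB or PB selection reproducible, which is convenient when comparing algorithms.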
5.1. Envelopes. Some definitions are in order before the envelopes can be presented. The set of predecessors of a task $t$ in $G$ will be denoted by $pred_G(t)$. During the scheduling process tasks can be in three different states: scheduled, schedulable, and non-schedulable. A task $t$ is said to be in the scheduled state if it is already assigned to a processor; $t$ is said to be in the schedulable state if it is not scheduled and either $pred_G(t) = \emptyset$ or all its predecessors are in the scheduled state; and, if at least one of the predecessors of $t$ is not scheduled, then $t$ is in the non-schedulable state. For the NAO envelopes described in what follows, we assume that $pred_G(t) = \emptyset \;\; \forall t \in T$.
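The three task states defined above can be expressed as a small classification function. This is only a sketch: the function name, the dictionary-based graph encoding, and the sample graph are assumptions made for illustration.

```python
def task_state(task, predecessors, scheduled):
    """Classify a task as scheduled, schedulable, or non-schedulable.

    predecessors maps each task to its predecessor tasks in G;
    scheduled is the set of tasks already assigned to a processor."""
    if task in scheduled:
        return "scheduled"
    # A task with no predecessors passes this test trivially.
    if all(p in scheduled for p in predecessors.get(task, ())):
        return "schedulable"
    return "non-schedulable"

# Hypothetical graph: t1 precedes t2, and both precede t3.
predecessors = {"t1": [], "t2": ["t1"], "t3": ["t1", "t2"]}
scheduled = {"t1"}
states = {t: task_state(t, predecessors, scheduled) for t in predecessors}
print(states)
```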
Several envelopes are described below. An informal description, as well as a more precise one in terms of the three steps of an envelope, is given. When the Initialization step is null it is not mentioned. We will also assume in what follows that $(t_k, p_j)$ is the task-processor pair selected by the heuristic.

The selected processor $p_j$ is the one that minimizes the finish time of $t_k$ in the deterministic execution simulation. In other words, $p_j$ is such that $\epsilon(t_k, p_j) + freet(p_j) = \min_{p \in P} \{\epsilon(t_k, p) + freet(p)\}$, where $\epsilon(t_k, p)$ is the execution time of $t_k$ at processor $p$ and $freet(p)$ is, as previously defined, the next instant of time at which processor $p$ will become free in the deterministic execution simulation.
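The minimum finish time rule stated above amounts to an argmin over the processors. A minimal Python sketch follows; the function name, the dictionary layout, and the numbers are hypothetical and chosen only to show that a busy fast processor can still win, or lose, depending on when it becomes free.

```python
def min_finish_time_processor(task, processors, exec_time, freet):
    """Return the processor minimizing exec_time[task][p] + freet[p]."""
    return min(processors, key=lambda p: exec_time[task][p] + freet[p])

# Hypothetical data: one task, a slow free processor and a fast busy one.
exec_time = {"t": {"p_hom": 8.0, "p_het": 2.0}}
processors = ["p_hom", "p_het"]

# p_het frees up at time 5: finish times are 8 (p_hom) and 7 (p_het).
early = min_finish_time_processor("t", processors, exec_time,
                                  {"p_hom": 0.0, "p_het": 5.0})
# p_het frees up at time 7: finish times are 8 (p_hom) and 9 (p_het).
late = min_finish_time_processor("t", processors, exec_time,
                                 {"p_hom": 0.0, "p_het": 7.0})
print(early, late)
```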
Highest Level First (HLF). This is an AO TB heuristic which relies on an adaptation of the definition of level given in [1, 9]. The level of a task $t_k$, denoted by $L(t_k)$, is a measure of the impact of $t_k$ on the tasks that succeed it directly or indirectly. One way to calculate $L(t_k)$ is to consider the set $\Phi_k$ of all paths in $G$ that go from $t_k$ to the final task of the application. If we assign a weight to each task, we can define the weight of a path as the sum of the weights of all tasks in the path. Then, the level of a task can be defined as the maximum weight over all paths that originate from it. In a homogeneous environment, a natural definition for the weight of a task would be its execution time. However, in a heterogeneous architecture, the execution time depends on the processor that is going to be assigned to the task, and this is only known after the scheduling is complete. Porto proposed an estimated average execution time for task $t_k$, denoted by $\bar{\epsilon}_k$, which is an average of the execution times $\epsilon(t_k, p)$ for all $p \in P$, weighted by the probability of processor $p$ being assigned to task $t_k$ [14]. This probability is computed assuming a geometric distribution synthesized in a manner that reflects the fact that processors that execute a task faster should have a higher probability of being assigned to it. So, the heuristic selects $t_k$ such that $L(t_k) = \max_{t_j \in T^+} \{L(t_j)\}$, and the processor at random from $P^+$.

Lowest Co-Level First (LCF). This heuristic is also AO and TB and, like the previous one, relies on a definition of co-level given in [1], with the adaptations for the heterogeneous case suggested in [14]. Co-level is a measure similar to level except that, instead of assessing the impact of a task on the remaining ones that depend on it, it measures the amount of work already performed until the task can be started. As in the previous case, we use the approximate average execution times of tasks, the $\bar{\epsilon}$'s, as the weights of tasks. So, the co-level $C(t_k)$ of $t_k$ is the maximum weight of all paths in $G$ from the initial task into task $t_k$. Then, the heuristic selects $t_k$ such that $C(t_k) = \min_{t_j \in T^+} \{C(t_j)\}$, and the processor at random from $P^+$.

Highest Load of Successor Tasks First (HLSTF). This is an AO TB heuristic which selects the task that has the maximum load, where the load of a task $t_k$, denoted by $load(t_k)$, is simply the sum of the approximate average execution times (the $\bar{\epsilon}$'s) of all immediate successors of $t_k$. So, HLSTF selects task $t_k$ such that $load(t_k) = \max_{t_j \in T^+} \{load(t_j)\}$, and the processor at random from $P^+$.

Largest Task First (LTF). This is a NAO TB heuristic which selects the task with the largest service demand. So, $t_k$ is such that $\sum_{i=1}^{I} \delta_{ik} = \max_{t} \ldots$

6. Performance Comparison. This section describes the performance comparison studies that were carried out based on simulation results. For this purpose, a special simulator was built in a VS/PASCAL IBM environment. The conclusions drawn in this section are supported not only by the graphs depicted in this paper, but also by over a hundred graphs generated during the study. The ones presented here serve the purpose of illustrating our findings.
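Returning to the level, co-level, and load measures used by the HLF, LCF, and HLSTF heuristics of Section 5, the following Python sketch computes all three on a small task graph. Everything here is illustrative: the function names and the sample weights are hypothetical, the weights stand in for the estimated average execution times, and the sketch assumes that a task's level includes its own weight while its co-level excludes it (one possible reading of "work already performed until the task can be started").

```python
def invert(successors):
    """Build predecessor lists from successor lists."""
    preds = {t: [] for t in successors}
    for t, succs in successors.items():
        for s in succs:
            preds[s].append(t)
    return preds

def levels(successors, weight):
    """Level: heaviest path from the task (inclusive) to the final task."""
    memo = {}
    def level(t):
        if t not in memo:
            below = (level(s) for s in successors[t])
            memo[t] = weight[t] + max(below, default=0)
        return memo[t]
    return {t: level(t) for t in successors}

def co_levels(successors, weight):
    """Co-level: heaviest path from the initial task up to, but
    excluding, the task itself (an assumption of this sketch)."""
    preds = invert(successors)
    memo = {}
    def co_level(t):
        if t not in memo:
            above = (co_level(p) + weight[p] for p in preds[t])
            memo[t] = max(above, default=0)
        return memo[t]
    return {t: co_level(t) for t in successors}

def loads(successors, weight):
    """Load: sum of the weights of the immediate successors."""
    return {t: sum(weight[s] for s in successors[t]) for t in successors}

# Hypothetical diamond-shaped graph with made-up weights.
successors = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
weight = {"a": 1, "b": 2, "c": 5, "d": 1}
print(levels(successors, weight))
print(co_levels(successors, weight))
print(loads(successors, weight))
```

With these values, HLF would prefer task c over task b whenever both are schedulable, since c lies on the heavier path to the final task.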
6.1. Model Parameters. The input parameters for the simulation model reflect the essential characteristics of a parallel application composed of a set of $n$ tasks which are to be executed in a parallel architecture composed of a set of $m$ processors, as detailed in the next subsections.
6.1.1. Heterogeneous Architectures. The heterogeneous architectures considered here were based on the models described in section two, with some simplifying assumptions related to the number and type of processors, namely:

Any processor is able to execute any task.

There is only one heterogeneous or serial processor in $P$ ($p_{het}$), which has the highest processing capacity.

[...] $\epsilon(t_j, p_l)$. With this assumption it is possible to consider that the instruction set has only one partition ($I = 1$).

Given the two previous assumptions, and also for simplicity, we assume, without loss of generality, that the instruction execution times are $\tau_{hom} = 1$ and $\tau_{het} = 1/PPR$, where PPR is the Processor Power Ratio defined in [10], which measures the ratio between the speed of the fastest processor and the speed of each of the remaining ones.
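Under these simplifying assumptions, a task's execution time depends only on its scalar service demand and on whether it runs on the fast processor. The sketch below is a hypothetical illustration that assumes a single instruction class, a per-instruction time of 1 on every ordinary processor, and 1/PPR on the single fast one; the function name and the numbers are made up.

```python
def execution_times(demands, m, ppr):
    """Row k holds the execution time of task k on the processors
    [p_het, p_hom, ..., p_hom], for m processors in total."""
    per_instruction = [1.0 / ppr] + [1.0] * (m - 1)
    return [[d * t for t in per_instruction] for d in demands]

# Hypothetical example: two tasks, three processors, PPR = 4.
times = execution_times([10, 4], m=3, ppr=4)
print(times)
```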
6.1.2. Parallel Applications. As described in section two, parallel applications are seen as a set of $n$ tasks, characterized by their service demand vectors and the precedence relationships between them (graph $G$). We consider the same kind of topologies previously used by other authors [19, 20] to analyze scheduling disciplines, namely MVA, GRAVITY and MATRIX$_n$, which are briefly described below. By changing the service demand of the tasks, these topologies can be made to cover a diversified range of real and general purpose parallel applications.

MVA. This is a dynamic programming problem, which represents several computations qualified as wave front, because the topology is characterized by a regular mesh divided into two distinct phases. The first phase shows a slowly increasing parallelism, which slowly decreases to one during the second phase.

GRAVITY.

MATRIX$_n$. This is the implementation of a matrix multiplication algorithm. This program uses a blocking algorithm, designed to increase performance by exploiting cache locality. It is a simple fork-and-join structure, where the serial fraction is concentrated in the initial and final tasks. The subscript $n$ indicates the number of tasks that are forked after the initial task.
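The fork-and-join structure described above is simple enough to construct directly. The Python sketch below builds such a graph and computes the share of the total service demand held by the initial and final tasks; the function names, the task labels, and the demand values are hypothetical, and treating that share as the serial fraction is an assumption of this sketch rather than a definition taken from the text.

```python
def fork_join(n):
    """Successor lists of a fork-and-join graph with n forked tasks."""
    forked = [f"t{i}" for i in range(1, n + 1)]
    graph = {"init": forked, "final": []}
    for t in forked:
        graph[t] = ["final"]
    return graph

def serial_share(demand):
    """Share of the total demand held by the initial and final tasks."""
    serial = demand["init"] + demand["final"]
    return serial / sum(demand.values())

# Hypothetical instance with three forked tasks.
graph = fork_join(3)
demand = {"init": 2, "t1": 4, "t2": 4, "t3": 4, "final": 2}
print(graph)
print(serial_share(demand))
```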
6.2. The Simulation Process. The simulation is divided into two basic phases, namely scheduling and execution. During scheduling, all tasks are assigned to processors according to one of the scheduling algorithms discussed in the previous section. The second phase is the simulation of the execution process, where the processors execute the tasks that were assigned to them during the scheduling phase, following the partial order given by the task graph. During this last phase, task execution times are assumed to be exponentially distributed random variables, with means given by the appropriate entries of the execution time matrix $E$. The mean response time is obtained after several iterations, when a 95% confidence level is reached. However, the performance metric in the graphs given below is the normalized mean response time $T_{rel}$, where the normalization value is the deterministic response time, obtained through a simulation where all processors are identical to $p_{hom}$ and task execution times are considered to be deterministic and equal to the entries of the estimated mean execution time matrix $E$. Moreover, the graphs below present this performance metric as a function of two distinct comparison parameters, namely Serial Fraction ($F_s$) and Processor Power Ratio (PPR). The serial fraction is the fraction of sequential processing of the application. This parameter is calculated independently of the number of processors assigned to the application. PPR measures the degree of heterogeneity of the architecture.

6.3. Simulation Results. The graphs presented in this subsection are divided according to the comparison parameters. For each comparison parameter, there are two sets of graphs: the first one shows curves of different algorithms built with a common envelope and distinct heuristics, while the second set depicts algorithms with a common heuristic and distinct envelopes.
The label under the x-axis indicates the topology, followed by either an envelope or heuristic name, followed by the values used for parameters other than the one plotted on the x-axis.
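The execution phase of the simulation process of Section 6.2 can be sketched in a few lines of Python. This is not the simulator used in the study: the function names, the data layout, and the sample schedule are hypothetical, and the sketch averages over a fixed number of runs instead of iterating until a 95% confidence level is reached.

```python
import random

def response_time(schedule, predecessors, duration):
    """Completion time of an application under a static schedule.

    schedule maps each processor to its ordered task list;
    duration(task, processor) returns that task's execution time."""
    finish = {}
    free = {p: 0.0 for p in schedule}
    position = {p: 0 for p in schedule}
    remaining = sum(len(tasks) for tasks in schedule.values())
    while remaining:
        progressed = False
        for p, tasks in schedule.items():
            while position[p] < len(tasks):
                t = tasks[position[p]]
                preds = predecessors.get(t, ())
                if any(q not in finish for q in preds):
                    break  # the head task must wait for a predecessor
                start = max([free[p]] + [finish[q] for q in preds])
                finish[t] = free[p] = start + duration(t, p)
                position[p] += 1
                remaining -= 1
                progressed = True
        if not progressed:
            raise ValueError("schedule is inconsistent with the task graph")
    return max(finish.values(), default=0.0)

def mean_response_time(schedule, predecessors, mean_time, runs, rng):
    """Average response time with exponentially distributed task times."""
    def sample(t, p):
        return rng.expovariate(1.0 / mean_time[t][p])
    total = sum(response_time(schedule, predecessors, sample)
                for _ in range(runs))
    return total / runs

# Hypothetical fork-and-join application on one fast and one slow processor.
predecessors = {"a": ["init"], "b": ["init"], "final": ["a", "b"]}
schedule = {"het": ["init", "a", "final"], "hom": ["b"]}
mean_time = {"init": {"het": 1.0}, "a": {"het": 1.0},
             "b": {"hom": 4.0}, "final": {"het": 1.0}}
deterministic = response_time(schedule, predecessors,
                              lambda t, p: mean_time[t][p])
stochastic = mean_response_time(schedule, predecessors, mean_time,
                                runs=1000, rng=random.Random(1))
print(deterministic, stochastic)
```

Dividing a stochastic mean by a deterministic reference value obtained on identical processors would give a normalized metric in the spirit of the one plotted in the figures.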
Serial Fraction ($F_s$). Figures 1 and 2 show the performance of algorithms with the Deterministic Execution Simulation (DES) envelope, as a function of the serial fraction, for two different topologies, namely MVA and MATRIX$_{50}$. All curves show an increasing performance (decreased normalized execution time) with an increase in the serial fraction, at different levels of performance. Although it is not shown here, it has been observed that for all other envelopes, except DES, all heuristics tend to exhibit very similar performance levels for a fixed envelope. As may be seen from the figures, the MVA case (figure 1) shows a greater spread between the curves than the MATRIX$_{50}$ one (figure 2). This happens because with the MATRIX$_{50}$ topology, the serial fraction is concentrated on the initial and final tasks for all experiments. Since DES obeys the precedence order of the task graph, it is clear that when the initial and final tasks are to be scheduled, they are alone in the Task Domain.

Figures 3, 4, 5, and 6 are used to compare the performance of scheduling algorithms with the same heuristic but with a different envelope, also for the MVA and MATRIX$_{50}$ topologies. These graphs also show the performance curve of the algorithm composed of MFTPO + DES, as a performance reference. As can be seen, the DES algorithms outperform all other algorithms with the same heuristic but with different envelopes. Nevertheless, when the application is almost entirely sequential, all algorithms tend to exhibit identical performance. The MFTPO + DES algorithm, on the other hand, shows the best performance for all serial fraction values because it is the only one capable of assigning all tasks to $p_{het}$, even though there are other processors in the Processor Domain.
The first two graphs (figures 3 and 4) use the LTF heuristic, which is a TB one, while the last two of this series (figures 5 and 6) use the LTF+SEETFPO heuristic, which is TPB. When composed with DES, TPB heuristics behave very similarly to the MFTPO + DES algorithm. This indicates that under DES scheduling, a correct processor selection is generally more effective than the task selection criterion.

(Footnote 3: At these instances all processors are free according to the DES procedure.)
Degree of Heterogeneity (PPR). Figures 7 and 8 compare DES based algorithms that use different heuristics, as a function of the Processor Power Ratio (PPR), for topologies MVA and GRAVITY, respectively. It should be noted that not all heuristics are explicitly referenced in the figures, since many behave almost identically. For instance, although not shown in the figures, the curves for Largest Task First (LTF), Highest Load of Successor Tasks First (HLSTF) and Highest Level First (HLF) in figures 7 and 8 are practically identical. Only the LTF curve is shown. The first important observation is that the MFTPO heuristic outperforms the remaining ones in both cases. It can also be seen from the figures that the curves of different heuristics under DES are well separated. This kind of behavior was also observed in most of the situations analyzed by the authors, i.e., different heuristics tend to behave similarly under the same envelope, except for DES. Actually, DES is the only type of envelope investigated that displays this sensitivity to the degree of heterogeneity of the architecture. See also the two lower curves of figures 9 and 10 for another display of the sensitivity to the PPR of algorithms that use DES as an envelope. The solid curve in both cases refers to the MFTPO+DES algorithm, which is there for comparison purposes. As we can see from the figures, DES is the envelope that outperforms all others for the same heuristic. Moreover, when we combine DES with the MFTPO heuristic, we get an even better performance. This can easily be understood if we consider an extreme situation where $p_{het}$ has an infinite processing capacity when compared to $p_{hom}$.
Certainly, in this case, the best schedule for any parallel application would be to assign all tasks to $p_{het}$. The only algorithms which are capable of making this choice are the ones composed with the MFTPO heuristic (with DES as the envelope). It can also be observed that CPD envelopes as well as UPD ones are rather insensitive to the degree of heterogeneity. Moreover, the algorithms composed with MFTPO exhibit the best performance and are the only ones which maintain an increasing performance even when the PPR is very high.
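The extreme case above can be made concrete with a short worked inequality. The derivation below is only an illustrative sufficient condition, not a result stated in the paper: it assumes a single instruction class, deterministic execution times, no communication delays, and the hypothetical notation $d_k$ for the scalar service demand of task $t_k$, so that $t_k$ takes $d_k$ time units on a $p_{hom}$ and $d_k / PPR$ on $p_{het}$.

```latex
\begin{align*}
W &= \sum_{k=1}^{n} d_k, \qquad d_{\min} = \min_{k} d_k,\\
T(S_{het}) &= \frac{W}{PPR}
  && \text{(all tasks run back to back on } p_{het}\text{)},\\
T(S) &\ge d_k \ge d_{\min}
  && \text{(any schedule } S \text{ that places some } t_k \text{ on a } p_{hom}\text{)},\\
PPR \ge \frac{W}{d_{\min}} &\;\Longrightarrow\; T(S_{het}) \le d_{\min} \le T(S).
\end{align*}
```

For instance, for ten tasks of unit demand, $W = 10$ and $d_{\min} = 1$, so under these assumptions a PPR of 10 or more already makes the all-on-$p_{het}$ schedule at least as good as any schedule that uses a slower processor.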
Comparing figures 9 and 10, we observe again the difference between the performance of TB heuristics (Lowest Co-Level First, figure 9) and TPB ones (the LTF+SEETFPO case, figure 10) with DES as the envelope, in relation to MFTPO. The latter exhibit a performance closer to that of MFTPO than the former. This emphasizes the already mentioned importance of a correct processor choice during the scheduling procedure when compared to task selection.
AO/non AO and CTD, CPD analysis. It was observed that the AO feature of an envelope or heuristic does not necessarily imply a significant performance gain over NAO ones, as might be expected.
On the other hand, it was found that CTD envelopes tend to perform better than non CTD ones, especially for some TPB heuristics such as LTF+SEETFPO (see figure 10). It was also observed that, in general, envelopes could be ranked, performance-wise, in the order UPD, CPD and DES, with DES being the best. Another observation is that PB and TPB heuristics tend to perform better than TB ones (see figures 1, 7 and 8).
Variation of the Number of Processors. All previous analyses assumed that the number of processors was fixed. Figure 11 shows the impact of the variation of the number of processors for an MVA topology and for a DES envelope. The results for four different heuristics (LTF, SEETFPO, MFTPO, and LUF) are displayed in the figure. The maximum logical parallelism for the MVA topology considered here is 6. As can be seen from the curves in figure 11, MFTPO, besides displaying the smallest average normalized response time, is rather insensitive to an increase in the number of processors. The reason is that it will tend to select the fastest processor even though there are other slower processors available. LTF and LUF experience an increase in the average normalized response time as the number of processors increases, because the likelihood that a slower processor will be available, and hence selected, increases with the number of processors.
Variation of the Coefficient of Variation of Task Execution Times. In all previous studies, we assumed that task execution times were exponentially distributed.

Table 1: $T_{rel}$ as a function of PPR and $C_t$.

The scheduling decisions made by DES assume that the task execution times are deterministic, i.e. $C_t = 0$. So, the larger the value of $C_t$, the more the actual task execution times will differ from the static assumptions made prior to execution. The table shows that for values of PPR greater than 4, DES+LTFMFT is rather insensitive to the variance of the task execution times. Even for small values of PPR (PPR = 2), the maximum relative difference between a value of $T_{rel}$ obtained with a large $C_t$ and a value of $T_{rel}$ obtained with a small $C_t$ is of the order of 30%. It can also be seen that for any value of $C_t$, $T_{rel}$ decreases dramatically with PPR, as shown in figures 7 and 8, which were obtained with $C_t = 1$.
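To experiment with coefficients of variation other than 0 (deterministic) and 1 (exponential), one needs a distribution family whose mean and coefficient of variation can be set independently. The text does not say which family was used in the study; the Python sketch below assumes a gamma distribution purely for illustration, and the function name and parameter values are hypothetical.

```python
import random

def sample_execution_time(mean, cv, rng):
    """Draw a task execution time with the given mean and coefficient
    of variation cv (standard deviation divided by mean).

    Assumption of this sketch: a gamma distribution is used, which
    reduces to the exponential distribution when cv == 1."""
    if cv == 0:
        return mean  # deterministic execution time
    shape = 1.0 / (cv * cv)
    scale = mean * cv * cv   # shape * scale equals the requested mean
    return rng.gammavariate(shape, scale)

rng = random.Random(1)
samples = [sample_execution_time(5.0, 2.0, rng) for _ in range(20000)]
average = sum(samples) / len(samples)
print(average)
```

The sample average should land close to the requested mean of 5, while individual draws vary far more than they would under an exponential distribution with the same mean.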
[...] will become free in the deterministic execution simulation; $delay'(t_k)$ is the communication delay that would be imposed on $t_k$ if it were to be assigned to $p_i$. This search is done over the entire set of processors, $P$, previously allocated to the parallel application. If the selected processor $p_j$ is busy, then the task-processor pair is made null, meaning that no selection has been made.

7. Concluding Remarks. This paper presented a novel approach, based on the concepts of envelope and heuristic, to systematically build static heuristic scheduling algorithms for multiprocessors. Using this approach, several heuristic scheduling algorithms were suggested and their performance was compared through the use of a simulator specially built for this purpose. The analysis compares several heuristics under the same envelope as well as several envelopes under the same heuristic. These comparisons are carried out as a function of two different parameters: the fraction of sequential processing of the application and the degree of heterogeneity of the architecture (PPR). The results show that (i) the Deterministic Execution Simulation (DES) envelope is clearly very sensitive to the degree of heterogeneity of the architecture, i.e. its performance improves significantly as the architecture becomes more predominantly heterogeneous; (ii) Minimum Finish Time Processor Oriented (MFTPO) is the best heuristic among all those compared; (iii) DES, when combined with MFTPO, generates the best algorithm among all, due to its capability of performing a lookahead of processor availability instants during the deterministic execution simulation; (iv) different heuristics exhibit rather different behavior under DES, while they tend to perform similarly under other envelopes; (v) Processor Based and Task Processor Based heuristics tend to perform better than pure Task Based ones; (vi) the Application Orientation of envelopes and heuristics is not a determinant factor in performance.
Finally, it is worth observing that the schedulers examined here, as in other studies [2], rely on input data, such as task graphs and number of instructions executed, which might not be readily available in some cases, either because they were not collected or because they are data dependent. Although this is a rather important concern, we feel that being able to determine the kind of information needed to achieve the best performance in scheduling decisions will help compiler and operating system designers to incorporate the needed data collection facilities into their designs.
