Abstract: Consider the problem of determining a task-to-processor assignment for a given collection of implicit-deadline sporadic tasks upon a multiprocessor platform in which there are two distinct kinds of processors. We propose a polynomial-time approximation scheme (PTAS) for this problem. It offers the following guarantee: for a given task set and a given platform, if there exists a feasible task-to-processor assignment, then given an input parameter, ε, our PTAS succeeds, in polynomial time, in finding such a feasible task-to-processor assignment on a platform in which each processor is 1 + 3ε times faster. In the simulations, our PTAS outperforms the state-of-the-art PTAS [1]; moreover, for the vast majority of task sets, it requires a significantly smaller processor speedup than its upper bound of 1 + 3ε to determine a feasible task-to-processor assignment.
I. INTRODUCTION
This paper addresses the problem of finding an assignment of tasks to processors (also referred to as partitioning) for a given set of implicit-deadline sporadic tasks (also referred to as Liu and Layland (LL) tasks [2]) on a heterogeneous multiprocessor platform comprising processors of two unrelated types: type-1 and type-2. We refer to such a computing platform as a two-type platform. Our interest in this platform model is motivated by the fact that many chip makers offer chips having two types of processors [3]-[7].
In the partitioning problem, every task must be statically assigned to a processor at design time and all its jobs must execute on that processor at run time. The challenge is to find, at design time, a task-to-processor assignment such that, at run time, a uniprocessor scheduling algorithm running on each processor meets all the deadlines. Scheduling tasks to meet deadlines on a uniprocessor platform is a well-understood problem. One may use Earliest-Deadline First (EDF) [2], for example. EDF is an optimal scheduling algorithm on uniprocessor systems [2], [8], in the sense that it always constructs a schedule in which all the deadlines are met, if such a schedule exists. Therefore, assuming that an optimal scheduling algorithm is used on each processor, the challenging part is to find a partitioning for which there exists a schedule that meets all the deadlines; such a partitioning is said to be a feasible partitioning hereafter. Even in the simpler case of identical multiprocessors, finding a feasible partitioning is strongly NP-complete [9]. Hence, this hardness result continues to hold for two-type platforms. In this work, we propose a polynomial-time approximation scheme (PTAS) for this problem that outperforms the state-of-the-art PTAS [1].
Definition 1 (PTAS). A PTAS takes an instance of an optimization problem (for which exact solutions are intractable) and a parameter ε > 0 and, in polynomial time, produces a solution that is within a factor f(ε) of being optimal, where the function f() is independent of the problem instance.
Definition 2 (Approximation ratio). An algorithm for solving an optimization problem is said to have an approximation ratio of A if for all instances of the problem, the algorithm produces a solution that is within a factor of A from the optimal value.
Related work. The partitioning problem on heterogeneous multiprocessors has been studied in the past [10]-[14]. In [10]-[12], the authors proposed algorithms, with an approximation ratio of 2, for the problem of partitioning LL task sets on heterogeneous multiprocessors. All these approaches [10]-[12] focused on generic heterogeneous multiprocessor platforms with two or more processor types. Due to their practical relevance, Andersson et al. [13] considered the partitioning problem on two-type platforms and proposed an algorithm, FF-3C, and a couple of its variants based on the first-fit heuristic. These had the same performance guarantee as the approaches in [10]-[12] (i.e., requiring processors twice as fast, in the worst case) but can be implemented efficiently and exhibited better average-case performance than those in [11], [12].
In a recent significant development, Wiese et al. [1] proposed a PTAS (referred to as PTAS LP since it uses "Linear Programming") for partitioning LL task systems on limited heterogeneous multiprocessors in which processors are of a relatively small number (≥ 2) of distinct types. PTAS LP provides the following guarantee: if there exists a feasible partitioning of a given task set on a limited heterogeneous multiprocessor platform, then PTAS LP succeeds in partitioning the task set on a platform in which each processor is (1+ε)/(1−ε) times faster. This is theoretically a significant result since PTAS LP partitions the task set in polynomial time, to any desired degree of accuracy. However, its practical significance is severely limited because the algorithm relies heavily on solving linear programming formulations and therefore has a very high run-time complexity. Even on a two-type platform, this complexity makes its implementation highly inefficient (which is confirmed by the simulations in Section VIII). Therefore, we propose a PTAS for two-type platforms which does not rely on solving linear programs and hence offers a significantly better time complexity than PTAS LP.
Contribution and significance of this work. We present a PTAS for the problem of partitioning a given LL task set on a two-type platform which offers the following guarantee. If there exists a feasible partitioning of a task set τ on a two-type platform π then, given an ε > 0, our PTAS succeeds, in polynomial time, in finding a feasible partitioning of τ on π^(1+3ε), where π^(1+3ε) is a two-type platform in which each processor is 1 + 3ε times faster than the corresponding processor in π.
We believe the significance of this work is as follows. For the problem under consideration, our PTAS has superior performance compared to the prior state-of-the-art, i.e., PTAS LP. Specifically, compared to PTAS LP, our PTAS has (i) a much better run-time complexity and (ii) a competitive approximation ratio. We evaluate the average-case performance of these algorithms with randomly generated task sets. The evaluation is based on (i) the processor speedup the algorithm needs, for a given task set, so as to succeed, compared to an optimal algorithm and (ii) the running time. Overall, our algorithm outperforms PTAS LP by requiring a much smaller processor speedup and running faster by orders of magnitude. Also, for the vast majority of task sets, it requires a significantly smaller processor speedup than its upper bound of 1 + 3ε.
II. SYSTEM MODEL
We consider the problem of partitioning a task set τ = {τ 1 , τ 2 , . . . , τ n } of n implicit-deadline sporadic tasks (LL tasks) on a two-type heterogeneous multiprocessor platform π comprising m processors, of which m 1 are of type-1 and m 2 are of type-2. Each task τ i is characterized by two parameters: a worst-case execution time (WCET) and a period T i . Each task τ i releases a (potentially infinite) sequence of jobs, with the first job released at any time during the system execution and subsequent jobs released at least T i time units apart. Each job released by a task τ i has to complete its execution within T i time units from its release. We assume that an optimal scheduling algorithm such as EDF is used on each processor.
On a two-type platform, the WCET of a task depends on the processor type on which it executes. We denote by C^1_i and C^2_i the WCET of task τ i on a type-1 and a type-2 processor, respectively, and by u i = C^1_i / T i and v i = C^2_i / T i its utilizations on type-1 and type-2 processors, respectively. A task τ i that cannot be executed on processors of type-1 (resp., type-2) is modeled by setting its u i = ∞ (resp., v i = ∞).
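The task model above can be sketched in a few lines of Python. This is only an illustrative sketch: the names `Task`, `wcet1`, `wcet2` and `is_feasible_partition` are hypothetical, and the check relies on the fact that, for implicit-deadline sporadic tasks under EDF, a processor meets all deadlines exactly when the utilizations assigned to it sum to at most its capacity.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    wcet1: float   # WCET on a type-1 processor (math.inf if it cannot run there)
    wcet2: float   # WCET on a type-2 processor (math.inf if it cannot run there)
    period: float  # minimum inter-arrival time T_i, also the relative deadline

    @property
    def u(self) -> float:
        """Utilization on a type-1 processor."""
        return self.wcet1 / self.period

    @property
    def v(self) -> float:
        """Utilization on a type-2 processor."""
        return self.wcet2 / self.period


def is_feasible_partition(type1_bins, type2_bins, speed=1.0):
    """Check a partitioning: one list of tasks per processor.

    Under EDF, each processor is schedulable iff the utilizations of the
    tasks assigned to it sum to at most its capacity (`speed`).
    """
    tol = 1e-9  # guard against floating-point round-off
    ok1 = all(sum(t.u for t in b) <= speed + tol for b in type1_bins)
    ok2 = all(sum(t.v for t in b) <= speed + tol for b in type2_bins)
    return ok1 and ok2
```

Passing `speed=1.3`, for instance, checks the same partitioning on a platform whose processors are 1.3 times faster.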
III. AN OVERVIEW OF OUR APPROACH
We now give an overview of our algorithm (referred to as PTAS NF since it uses "Next-Fit"). Our PTAS takes ε > 0 as an input parameter and outputs a feasible partitioning. Let us partition the given task set τ into two subsets as follows:
τ hvy = {τ i ∈ τ : u i ≥ ε ∨ v i ≥ ε} and τ lgt = {τ i ∈ τ : u i < ε ∧ v i < ε}.
Intuitively, τ hvy refers to "heavy" and τ lgt refers to "light" tasks. Our PTAS has the following steps:
Step 1. We first approximate the utilizations of every task in τ hvy to some finite number of pre-computed values. The motivation for doing this is twofold: (i) by restricting the number of pre-computed values to a constant, we ensure polynomial complexity for the algorithm and (ii) by choosing these values cleverly, we ensure that the approximation ratio of the algorithm is bounded. Then, we assign the tasks in τ hvy to processors using the algorithm A hvy described in Section IV-A. In Section IV-E, we show that after using A hvy , the sum of the utilizations of the tasks assigned on processors of type-1 (resp., type-2) does not exceed (1 + ε) × m 1 (resp., (1 + ε) × m 2 ).
Step 2. Some tasks from τ hvy (with u i ≥ ε ∧ v i < ε, or u i < ε ∧ v i ≥ ε) may remain unassigned after using A hvy . These unassigned tasks form the set τ int ("intermediate" tasks). Now, A int fractionally assigns the tasks (i.e., tasks can be split between processors) with u i < ε ∧ v i ≥ ε (resp., u i ≥ ε ∧ v i < ε) to type-1 (resp., type-2) processors as described in Section V-A. In Section V-B, we show that after using A int , the sum of the utilizations of all the tasks assigned so far on processors of type-1 (resp., type-2) still does not exceed (1 + ε) × m 1 (resp., (1 + ε) × m 2 ).
Step 3. Fractionally assign the tasks in τ lgt to processors using the algorithm A lgt (which makes use of a fractional knapsack property) described in Section VI-A. In Section VI-B, we show that after using A lgt , the sum of the utilizations of all the tasks assigned so far on processors of type-1 (resp., type-2) does not exceed (1 + 2ε) × m 1 (resp., (1 + 2ε) × m 2 ).
Step 4. Finally, those tasks from τ int and τ lgt that were assigned fractionally by A int and A lgt are assigned integrally using the algorithm A fract described in Section VII-A. In Section VII-B, we show that after using A fract , the sum of the utilizations of all the tasks assigned so far on processors of type-1 (resp., type-2) does not exceed (1 + 3ε) × m 1 (resp., (1 + 3ε) × m 2 ). Hence, we conclude that if τ has a feasible partitioning on π then PTAS NF succeeds in finding such a feasible partitioning of τ on π^(1+3ε).
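To make the task classes used in Steps 1 to 3 concrete, the following sketch groups tasks by comparing their two utilizations against the threshold ε. It assumes that "heavy" means u i ≥ ε or v i ≥ ε and "light" means both utilizations are below ε, as the step descriptions suggest; the function name and dictionary keys are hypothetical. Note that the set τ int of the paper only contains the tasks left unassigned by A hvy, so the two "light on one type only" groups below are merely the candidates for τ int.

```python
def split_by_threshold(utils, eps):
    """Group (u, v) utilization pairs by comparing each value with eps.

    heavy_on_both        : u >= eps and v >= eps
    light_on_type1_only  : u <  eps and v >= eps (cheap on type-1)
    light_on_type2_only  : u >= eps and v <  eps (cheap on type-2)
    light                : u <  eps and v <  eps
    """
    groups = {
        "heavy_on_both": [],
        "light_on_type1_only": [],
        "light_on_type2_only": [],
        "light": [],
    }
    for u, v in utils:
        if u < eps and v < eps:
            groups["light"].append((u, v))
        elif u >= eps and v >= eps:
            groups["heavy_on_both"].append((u, v))
        elif u < eps:
            groups["light_on_type1_only"].append((u, v))
        else:
            groups["light_on_type2_only"].append((u, v))
    return groups
```

A task that cannot run on type-1 processors (u i = ∞) but has v i < ε falls into `light_on_type2_only`, which matches the intuition that it can only be placed on type-2 processors.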
IV. ASSIGNING THE TASKS IN τ hvy (STEP 1)
In this section, we describe the algorithm, A hvy , for integrally assigning (a subset of) the tasks in τ hvy to processors and also analyze its returned assignment.
A. Description of the algorithm A hvy
It consists of three steps described in the next three sections:
Step 1.1. It defines a finite set S(ε) of utilization values, based on the value of the input parameter, ε. Then, it computes the "rounded-down utilizations" u rd i and v rd i of every task τ i ∈ τ by rounding down u i and v i to one of the quantized values in S(ε). We will denote by τ rd hvy the set of tasks obtained by rounding down the utilizations of the tasks of τ hvy .
Step 1.2. It uses dynamic programming to determine, in polynomial time, (i) all the subsets of τ rd hvy that can be partitioned upon m 1 processors of type-1 and (ii) all the subsets that can be partitioned upon m 2 processors of type-2.
Step 1.3. It exhaustively considers each pair of subsets such that one subset can be partitioned on m 1 processors of type-1 and the other subset can be partitioned on m 2 processors of type-2. Using the ordered pair of subsets under consideration, it integrally assigns (a subset of) the tasks from τ hvy to processors (at least all the tasks with u i ≥ ε ∧ v i ≥ ε).
B. Rounding-down the utilizations of the tasks (Step 1.1)
We compute the set S(ε) of all real numbers ≤ 1 that are of the form ε(1+ε)^k, for all integers k ≥ 0. Then, we compute the rounded-down utilizations u rd i and v rd i of every task τ i ∈ τ by rounding down each of its utilizations (u i and v i ) to the nearest value present in the set S(ε). For tasks with u i < ε (resp., v i < ε), we set u rd i = 0 (resp., v rd i = 0) and for tasks with
The definition of S(ε) leads to the following property: for every task τ i with ε ≤ u i ≤ 1,
u rd i ≤ u i < (1 + ε) × u rd i
and thus
u i / u rd i < 1 + ε.
The same holds for v i .
Therefore, if the utilizations of each task are reduced by this maximal factor, it follows that any collection of tasks with their reduced utilizations summing to ≤ 1 would have their original utilizations summing to ≤ (1 + ε).
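Let us now determine the number L of distinct values in S(ε). Since only values with ε(1+ε)^k ≤ 1 belong to S(ε), k can be at most log_(1+ε)(1/ε), which gives L = ⌊log_(1+ε)(1/ε)⌋ + 1; L is thus a constant for a given ε. For each ℓ, 0 ≤ ℓ < L, let X ℓ (resp., Y ℓ ) denote the number of tasks in τ rd hvy whose rounded-down utilization on type-1 (resp., type-2) processors equals ε(1+ε)^ℓ.
Note that each X ℓ and each Y ℓ is no greater than |τ hvy |.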
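As an illustration of Step 1.1, the sketch below builds the quantized values and rounds a utilization down to them. It assumes that S(ε) consists of the values ε(1+ε)^k ≤ 1 for integers k ≥ 0; the function names are hypothetical, and mapping any utilization above 1 (including ∞) to ∞ is an assumption made here to keep the example total.

```python
import bisect
import math


def quantized_values(eps):
    """Return the sorted values eps * (1 + eps)**k that are <= 1, k = 0, 1, ..."""
    values = []
    x = eps
    while x <= 1.0:
        values.append(x)
        x *= 1.0 + eps
    return values


def round_down(u, values):
    """Round a utilization down to the nearest quantized value.

    Utilizations below eps (= values[0]) become 0. Utilizations above 1
    can never fit on a unit-capacity processor of that type and are mapped
    to infinity (an assumption of this sketch).
    """
    if u > 1.0:
        return math.inf
    if u < values[0]:
        return 0.0
    # Index of the largest quantized value that is <= u.
    return values[bisect.bisect_right(values, u) - 1]
```

For ε = 0.2 the loop produces 9 values (k = 0, ..., 8), and every utilization between ε and 1 shrinks by a factor smaller than 1 + ε when rounded down.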
C. Generating the feasible configurations (Step 1.2)
The rounding down of the utilizations described in the previous section ensures that the utilizations of the tasks in τ hvy may only take one of the values in S(ε), providing the set τ rd hvy . In this section, using dynamic programming, we determine, in polynomial time, all the subsets of τ rd hvy that can be partitioned upon m 1 processors of type-1 (resp., m 2 processors of type-2). Once all the feasible subsets (also referred to as feasible configurations) are determined, we use this information to assign a subset of tasks from τ hvy on type-1 and type-2 processors (described in Section IV-D). The algorithm A hvy uses the same approach as the one presented in [14] to determine all the configurations (x 0 , x 1 , . . . , x L−1 ) of tasks in τ rd hvy (resp., (y 0 , y 1 , . . . , y L−1 )) that are feasible on m 1 processors of type-1 (resp., m 2 processors of type-2), in which x ℓ ≤ X ℓ ≤ |τ hvy | (resp., y ℓ ≤ Y ℓ ≤ |τ hvy |) for each ℓ, 0 ≤ ℓ < L. This approach [14] is summarized below. As there are no more than Π_{ℓ=0}^{L−1} (1 + X ℓ ) = O(n^L) such feasible configurations on type-1 processors (and the same holds for type-2 processors) and since L is a constant for a given value of ε, the time to determine all the feasible configurations is polynomial in n.
Summary of the approach in [14]. For type-1 processors, a table is constructed with m 1 rows and Π_{ℓ=0}^{L−1} (1 + X ℓ ) columns. Each column corresponds to a different configuration and each cell has a value ∈ {yes, no}. A cell in the i'th row and the j'th column is a "yes" if the corresponding configuration is feasible on i processors of type-1. This table is filled row-wise starting with the first row. Filling in the first row is straightforward for all the configurations: it is a "yes" if the corresponding configuration, say (x 0 , x 1 , . . . , x L−1 ), satisfies Σ_{ℓ=0}^{L−1} x ℓ × ε(1+ε)^ℓ ≤ 1, and a "no" otherwise. The i'th row is filled in by using the entries of the (i−1)'th row. Specifically, for the configuration corresponding to the j'th column, say (x 0 , x 1 , . . . , x L−1 ), the cell at the i'th row is a "yes" if and only if there exist two configurations (x' 0 , x' 1 , . . . , x' L−1 ) and (x'' 0 , x'' 1 , . . . , x'' L−1 ) such that:
• (x' 0 , x' 1 , . . . , x' L−1 ) is a feasible configuration on i−1 processors of type-1;
• (x'' 0 , x'' 1 , . . . , x'' L−1 ) is a feasible configuration on one processor of type-1; and
• x ℓ = x' ℓ + x'' ℓ for each ℓ, 0 ≤ ℓ < L.
For each cell in the i'th row, there are polynomially many possible candidates for the role of (x' 0 , x' 1 , . . . , x' L−1 ); hence, each cell in the i'th row can be filled in polynomial time. The second table, for type-2 processors, is constructed similarly. Note: By using standard dynamic programming tricks which require storing additional information [14], we can obtain a task-to-processor assignment from the feasible configurations.
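The row-by-row construction can be sketched as follows. This is a minimal illustration of the idea, not the exact procedure of [14]: instead of a yes/no table it keeps, for each row, the set of configurations marked "yes", and it assumes that a configuration fits on one processor when its rounded-down utilizations sum to at most 1. The function name and parameters are hypothetical.

```python
from itertools import product


def feasible_configurations(values, counts, m):
    """Configurations (x_0, ..., x_{L-1}) that fit on m >= 1 identical processors.

    values[l] : rounded-down utilization of a task of class l
    counts[l] : number of tasks available in class l (upper bound on x_l)
    """
    tol = 1e-9
    all_configs = product(*(range(c + 1) for c in counts))
    # Row 1: configurations whose total utilization fits on one processor.
    single = {
        c for c in all_configs
        if sum(x * s for x, s in zip(c, values)) <= 1.0 + tol
    }
    row = set(single)
    # Row i: a configuration is feasible iff it is the sum of a configuration
    # feasible on i-1 processors and one feasible on a single processor.
    for _ in range(m - 1):
        nxt = set()
        for a in row:
            for b in single:
                c = tuple(x + y for x, y in zip(a, b))
                if all(x <= cap for x, cap in zip(c, counts)):
                    nxt.add(c)
        row = nxt
    return row
```

Because the all-zero configuration always fits on one processor, every configuration feasible on i−1 processors stays feasible on i processors, as expected.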
D. Determining the partitioning (Step 1.3)
Using the two configuration tables that were constructed in the previous step, we now determine a partitioning for (a subset of) the heavy tasks. The main idea is as follows. Suppose that the task set τ can indeed be partitioned on the given platform and let H feas denote one such feasible partitioning. For each ℓ, 0 ≤ ℓ < L, let x feas ℓ denote the number of tasks with u rd i = ε(1+ε)^ℓ that are assigned to type-1 processors in H feas , and let y feas ℓ denote the number of tasks with v rd i = ε(1+ε)^ℓ that are assigned to type-2 processors in H feas . Since H feas is a feasible partitioning, the configuration (x feas 0 , x feas 1 , . . . , x feas L−1 ) must appear in the table constructed for type-1 processors and the cell at the m 1 'th row of the corresponding column must contain "yes"; likewise, the configuration (y feas 0 , y feas 1 , . . . , y feas L−1 ) must appear in the table constructed for type-2 processors and the cell at the m 2 'th row of the corresponding column must contain "yes". However, since we do not know which of the feasible configurations in our tables correspond to H feas , we consider every ordered pair of configurations that are feasible on m 1 and m 2 processors of type-1 and type-2, respectively. Since there are only polynomially (i.e., O(n^L)) many distinct feasible configurations in each table, it follows that there are at most polynomially many such ordered pairs of feasible configurations to consider.
For each considered ordered pair of configurations, by assuming that they are the ones corresponding to H feas , we attempt to construct a task-to-processor assignment for the tasks in τ hvy that is similar to that of H feas . The assignment obtained will be similar to H feas in the following sense: although the tasks assigned in the two assignments may not be the same, it holds (as we show later) that the sum of utilizations of the tasks assigned by our algorithm on each processor type does not exceed that of H feas by more than a factor of 1 + ε.
Let {(x 0 , x 1 , . . . , x L−1 ), (y 0 , y 1 , . . . , y L−1 )} denote the currently considered ordered pair of feasible configurations on m 1 and m 2 processors of type-1 and type-2, respectively. The algorithm A hvy to determine the corresponding task-to-processor assignment of tasks from τ hvy is as follows.
Step 
E. Assignment analysis
Let H hvy denote the assignment of the heavy tasks returned by A hvy . In this section, we show that in H hvy , the subset of tasks assigned to each processor consumes no more than (1 + ε) of the capacity of that processor.
, it is straightforward (from the fact that we consider the ordered pair P feas ) to see that A hvy successfully assigns exactly x feas ℓ tasks τ i satisfying ε(1+ε)^ℓ ≤ u i < ε(1+ε)^(ℓ+1) to type-1 processors (through either case 1.3.1.2 or 1.3.1.3). While these may not be the same tasks as those that are assigned to these processors in H feas , the utilization of each task does not exceed that of the corresponding task assigned in H feas by more than a factor of (1 + ε). Hence the lemma holds for the heavy tasks in Γ
+1 that is assigned to processors of type-2 through one of these cases, there is a task, say τ k , also with (
+1 which is also assigned to processors of type-2 in H feas (since we consider the ordered pair P feas ). Since we have shown that the lemma holds as long as A hvy does not declare failure, we now show that A hvy cannot fail while considering the ordered pair P feas of feasible configurations. For a failure to occur, it is necessary for A hvy to go through case 1. = (1 + ) are assigned to type-2 processors. Therefore, it must be the case that in H feas , some of the n 1 − y feas "additional" tasks were assigned to type-1 processors. Let τ j denote one of these additional tasks, thus satisfying v rd j = (1 + ) and u
, A hvy necessarily went through case 1.3.2.1 and since this case allows tasks with smaller utilization on type-2 processors to be accommodated in unused slots that were reserved for tasks with larger utilization, τ j must have been assigned at that moment. This contradicts our assumption that τ j is unassigned at this time instant. Hence, we can conclude that A hvy does not declare failure for the ordered pair P feas of feasible configurations and the lemma holds for every task in Γ 
Lemma 2. After assigning the tasks in τ hvy , we have
and
Proof: We show only the proof of Expression (4), as the proof of Expression (5) is quite similar. The proof is a direct consequence of Lemma 1. We know from Lemma 1 and Definition 5 that there exists a 1:1 mapping between every task τ i in Γ 
Finally, we know from the feasibility of H feas that 
V. ASSIGNING THE TASKS IN τ int (STEP 2)
A. The description of the algorithm A int
The algorithm A int assigns the tasks in τ int as follows:
1) Assign all the tasks in τ 1 int (the tasks of τ int with u i < ε ∧ v i ≥ ε) to type-1 processors using the wrap-around technique. This technique works as follows. Take the first processor of type-1 and assign as many of the tasks as possible from τ 1 int "integrally" onto that processor. When a task fails to be assigned integrally, assign that task "fractionally" such that the current processor is filled completely and the remaining fraction is assigned to the next processor of type-1. Continue this procedure until all the tasks from τ 1 int are assigned to type-1 processors.
2) Analogously, assign all the tasks in τ 2 int (the tasks of τ int with u i ≥ ε ∧ v i < ε) to type-2 processors using the wrap-around technique.
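The wrap-around technique can be sketched as below for one processor type. The function name, the `capacity` parameter and the choice to raise `ValueError` when the tasks do not fit are assumptions of this sketch; in the paper the capacity available to A int on each processor is whatever A hvy left unused.

```python
def wrap_around(utils, m, capacity=1.0):
    """Assign tasks to m processors, splitting a task when a processor fills up.

    utils    : utilization of each task on this processor type
    returns  : bins[p] = list of (task_index, share) placed on processor p
    raises   : ValueError if the total utilization exceeds m * capacity
    """
    tol = 1e-12
    bins = [[] for _ in range(m)]
    p, free = 0, capacity
    for i, u in enumerate(utils):
        remaining = u
        while remaining > tol:
            if p >= m:
                raise ValueError("tasks do not fit on the given processors")
            share = min(remaining, free)  # whole task, or whatever still fits
            bins[p].append((i, share))
            remaining -= share
            free -= share
            if free <= tol:               # processor is full: move to the next
                p, free = p + 1, capacity
    return bins
```

With utilizations [0.6, 0.6, 0.6] on two unit-capacity processors, the second task is split: 0.4 of it fills the first processor and the remaining 0.2 goes to the second.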
B. Assignment analysis
We now show that for a task set τ that is feasible on a platform π, A int always succeeds in assigning all the tasks in τ int . In the following lemma, we make use of the fact that the two sets of tasks Γ 1 hvy and Γ 2 hvy have been obtained by algorithm A hvy , using the ordered pair P feas of feasible configurations.
Lemma 3. After assigning all the tasks in τ int using the ordered pair P feas of feasible configurations, we have:
Proof: In the feasible assignment H feas , |τ 1 int | tasks with u i < ε ∧ v i ≥ ε must have been assigned to type-1 processors. This is a consequence of the fact that P feas contains exactly the same number of tasks with utilization ≥ ε on the processor that they are assigned to, as in H feas . Let Φ 1 int denote the set of tasks with u i < ε ∧ v i ≥ ε that are assigned to type-1 processors in H feas . Since H feas is a feasible assignment, it holds that,
Since the number of tasks with u i < ε ∧ v i ≥ ε that have been assigned to type-1 processors is the same in both H feas and the assignment computed by our algorithm, we have
Here, it is worth recalling Step 1.3.1.3 and Step 1.3.2.4 of algorithm A hvy . In these steps, while assigning the tasks to processors of type-1 (resp., type-2), when A hvy has to choose a few tasks to assign from the available set of tasks, it always chooses those tasks that have a larger utilization on type-2 (resp., type-1) processors (leaving "easier" tasks for A int to assign). Now coming back to algorithm A int , although the tasks (with u i < ε ∧ v i ≥ ε) assigned by A int to type-1 processors may not be the same as those assigned by H feas , we can infer that:
Applying Inequality (6) and (12) on (11) and then performing some arithmetic manipulations (see [15] for details), we get:
Using similar reasoning as above, we can show that Expression (10) holds as well. Hence the proof.
Corollary 1. After assigning the tasks in τ int , we have:
Proof: Inequality (13) follows from Expressions (6) and (12) 
VI. ASSIGNING THE TASKS IN τ lgt (STEP 3)
A. The description of the algorithm A lgt
The pseudo-code for assigning tasks in τ lgt is shown in Algorithm 1 (which uses the fract-next-fit subroutine shown in Algorithm 2). The intuition behind the design of this algorithm is that, assuming a platform π^(1+2ε), we first assign tasks to processors on which they have a smaller utilization (lines 1 and 2). Then, if there are remaining tasks, these have to be assigned to processors on which they have a larger utilization (lines 7 and 15).
B. Assignment analysis
First, we present a useful result in Lemma 4, obtained by relating the problem under consideration to the fractional knapsack problem (see Chapter 16.2 in [16]). This result will be used in Lemma 5. The relation between the fractional knapsack problem and the problem under consideration was explored in [13]. Lemma 4 is an adaptation of Lemma 5 in [13]. Hence, we only state the lemma here. The detailed description of the fractional knapsack problem, its relation with the task assignment problem and the proof of Lemma 4 can be found in Appendix A in [15].
Footnote 2: While assigning tasks to type-1 processors, if a task cannot be assigned integrally on the m 1 'th processor (the last processor of type-1), then assign a fraction of that task such that the m 1 'th processor is fully utilized and assign the rest of the task to the m 2 'th processor (the last processor of type-2). This task is denoted by τ f later in the proofs in Section VII. This is not shown in the pseudo-code explicitly for ease of presentation.
Algorithm 1: A lgt : An algorithm to assign τ lgt tasks 
x and for every pair of tasks
It then holds that: 
where
1) all the tasks in τ hvy \ τ int are assigned integrally;
2) some tasks in τ int are assigned fractionally and the rest are assigned integrally;
3) some tasks in τ lgt are assigned fractionally and the rest are assigned integrally.
Proof: Informally, the claim can be written as follows: if there exists a feasible partitioning for a task set τ on a two-type platform π then algorithms A hvy , A int and A lgt succeed in assigning the tasks in τ on a platform π^(1+2ε), with some tasks assigned fractionally. We already know from Lemma 3 that after assigning the tasks in τ hvy \ τ int and τ int using algorithms A hvy and A int , respectively, the sum of the utilizations of the tasks assigned on type-1 (resp., type-2) processors does not exceed (1 + ε)m 1 (resp., (1 + ε)m 2 ).
Therefore, we need to show that after assigning the tasks in τ lgt by using algorithm A lgt , the sum of the utilizations of the tasks assigned on processors of type-1 (resp., type-2) does not exceed (1 + 2ε)m 1 (resp., (1 + 2ε)m 2 ). An equivalent claim is that, after assigning tasks in τ hvy \ τ int and τ int by using algorithms A hvy and A int respectively, if A lgt fails to assign the tasks of τ lgt (with fractional assignment of tasks allowed) on platform π^(1+2ε) then there does not exist a feasible partitioning of the tasks in τ on platform π. Here, we prove this equivalent claim by contradiction. Assume that there exists a feasible assignment H feas of τ on π but A lgt fails to assign the tasks in τ lgt on π^(1+2ε) (after A hvy and A int successfully assigned the tasks of τ hvy \ τ int and τ int ).
Since A lgt failed to assign these tasks, it must have declared FAILURE and we explore all possibilities for this to occur: 
where P 1 and P 2 denote the set of type-1 and type-2 processors respectively and U [p] denotes the sum of the utilization of the tasks assigned on processor p.
Since τ f1 ∈ τ 1 lgt ⇒ u f1 < ε ≤ ε × m 1 and analogously since τ f2 ∈ τ 2 lgt , we know that v f2 < ε ≤ ε × m 2 . Using these on Expressions (23) and (24), we get
Observe that (i) the set of tasks that has been assigned on type-1 processors so far is Γ 
Applying Expressions (13) and (14) to Expressions (27) and (28), respectively, performing some arithmetic manipulations and summing the resulting expressions (see [15] for details) yields:
It is trivial to see that assigning all the tasks of τ 1 lgt and τ 2 lgt to type-1 and type-2 processors, respectively (as in the above expression), requires the minimum processing capacity. Hence, Expression (29) continues to hold for any other assignment of these tasks, implying that H feas cannot be a feasible assignment, which leads to a contradiction. 
We know that the tasks assigned to type-2 processors at this stage are Γ . Therefore, we can rewrite Expression (30) as:
Using this on Expression (31), then applying Expression (14) and finally performing some arithmetic manipulations (see [15] for details) gives us:
We also know that, when A lgt executed line 1 (where it performed fract-next-fit), there must have been a task τ f1 ∈ τ 1 lgt \Γ 1 lgt 1 which was attempted on type-1 processors but failed to be assigned. Note that this task τ f1 may be the same as τ f mentioned above or it may be different. Because it was not possible to assign τ f1 on type-1 processors, we know that:
We know that the tasks assigned to type-1 processors are Γ 
Since τ f1 ∈ τ 1 lgt \ Γ 1 lgt 1 , we have u f1 < ≤ 2 . Using this on Expression (34), then applying Expression (13) and finally performing some arithmetic manipulations (see [15] for details) gives us:
Finally, Expression (35) can be rewritten as:
Let us now discuss the feasible assignment H feas . Let Φ 
Expression (37) can be rewritten as:
We can now reason about the inequalities we obtained about the assignment H feas and the one constructed by A lgt . We can see that Expressions (36) 
• A2 is Γ 2 lgt 1 ; Note that for every pair of tasks τi ∈ A1 and τj ∈ A2 it holds that
we get:
vi to both the sides in the above inequality, then applying Expressions (37) and (38) to the right-hand side and then applying Expressions (32) and (35) to the left-hand side yields:
This is a contradiction.
Failure on line 18 in Algorithm 1: A contradiction results -proof analogous to the previous case.
We showed that all the cases where A lgt declares FAILURE lead to a contradiction. Hence, the lemma holds.
VII. INTEGRAL ASSIGNMENT OF τ int AND τ lgt (STEP 4)
We now discuss how to integrally assign the tasks from τ int and τ lgt that were fractionally assigned by algorithms A int and A lgt , respectively. We also show that, if there is a feasible partitioning of the given task set on a given two-type platform then our PTAS succeeds in finding such a feasible partitioning on a platform in which each processor is 1 + 3ε times faster.
A. The description of the algorithm A fract
The algorithm, A fract , works as follows: 1)
Proof: We know from Lemma 5 that if there exists a feasible partitioning of τ on π then the three algorithms A hvy , A int and A lgt described in Sections IV-VI succeed in assigning tasks in τ (with a subset of tasks from τ int and τ lgt fractionally assigned) on π^(1+2ε). As a consequence, we have:
We also know that in such an assignment (as a consequence of using the wrap-around technique in A int and A lgt ):
• at most m 1 −1 tasks are split between processors of type-1 with one task split between each pair of consecutive processors; let the set Γ 1 split denote these fractional tasks.
• at most m 2 −1 tasks are split between processors of type-2 with one task split between each pair of consecutive processors; let the set Γ 2 split denote these fractional tasks.
• at most one task (from τ lgt ) is split between processors of type-1 and type-2; let τ f ∈ τ lgt denote this task that must be split between the m 1 'th processor of type-1 and the m 2 'th processor of type-2.
• the rest of the tasks are integrally assigned to either type-1 or type-2 processors. Let τ
To prove the theorem, we need to show that A fract succeeds in integrally assigning all the fractional tasks on π^(1+3ε). On Step 1, A fract copies the assignment from π^(1+2ε) onto a faster platform π^(1+3ε). After this step,
int ∪ τ lgt , we have:
On Step 2, A fract assigns the split tasks integrally. So, ∀p 1 ∈ type-1 of π^(1+3ε), it moves the fraction of the task τ 1 p1,p1+1 that is assigned to the (p 1 + 1)'th processor of type-1 to the p 1 'th processor of type-1. After this re-assignment, it follows from Expressions (41) and (42) that:
Analogously, ∀p 2 ∈ type-2 of π^(1+3ε), it moves the fraction of the task τ 2 p2,p2+1 that is assigned to the (p 2 + 1)'th processor of type-2 to the p 2 'th processor of type-2. After this re-assignment, it follows from Expressions (41) and (43) that:
Finally, the task τ f that is split between the m 1 'th processor of type-1 and the m 2 'th processor of type-2 remains to be integrally assigned. Since τ f ∈ τ lgt , it holds that u f < ε and v f < ε. From Expressions (45) and (47), it follows that task τ f can be integrally assigned to either the m 1 'th or the m 2 'th processor. Hence, after integrally assigning this task, we obtain:
Since Expression (48) is a necessary and sufficient schedulability condition for EDF on a uniprocessor of capacity 1 + 3ε, the assignment of τ on π^(1+3ε) returned by our algorithm, PTAS NF , is a feasible assignment. Hence, the proof.
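The re-assignment performed on Step 2 of A fract can be illustrated for a single processor type as follows. The sketch assumes the fractional assignment is given as one list of (task, share) pairs per processor, in processor order, and moves every split task entirely onto the first processor that holds a piece of it. The function name and data layout are hypothetical, and the single task τ f that may be split across the two processor types is not handled here.

```python
def make_integral(bins):
    """Turn a wrap-around (fractional) assignment into an integral one.

    bins[p] : list of (task_id, share) on processor p, in processor order
    returns : out[p] = list of (task_id, total_utilization) on processor p,
              where each task sits on the first processor that held a piece.
    """
    first_proc = {}  # task_id -> index of the first processor holding it
    total = {}       # task_id -> sum of all its shares
    for p, pieces in enumerate(bins):
        for task, share in pieces:
            first_proc.setdefault(task, p)
            total[task] = total.get(task, 0.0) + share
    out = [[] for _ in bins]
    for task, p in first_proc.items():
        out[p].append((task, total[task]))
    return out
```

Since every split task has a utilization below ε on the processor type it is assigned to, a processor that was loaded up to 1 + 2ε before this step is loaded to less than 1 + 3ε afterwards.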
VIII. EXPERIMENTAL SETUP AND RESULTS
After studying the theoretical (worst-case) bound, i.e., the approximation ratio of our algorithm, PTAS NF , we evaluate its average-case performance and compare it with the prior state-of-the-art, i.e., PTAS LP . For this purpose, we ask the following questions: (i) how much faster must the processors be, in practice, for our algorithm to successfully partition a task set, compared to PTAS LP ? (ii) how fast does our algorithm run compared to PTAS LP ? and (iii) how much pessimism is there in our theoretically derived performance bound? In order to answer these questions, we performed two sets of experiments. The first set of experiments, described in Section VIII-A, addresses (i) and (ii), and the second set of experiments, described in Section VIII-B, addresses (iii).
A. Comparison with prior state-of-the-art
We compare the average-case performance of PTAS NF with PTAS LP . We implemented both the algorithms in C on an Intel Core2 (2.80 GHz) machine. For PTAS LP , we used a state-of-the-art LP solver, IBM ILOG CPLEX [17] .
For a given task set, we define the minimum required speedup factor, MRSF NF , of PTAS NF as the minimum amount of extra speed of processors that PTAS NF needs, so as to succeed in finding a feasible partitioning as compared to an optimal algorithm. We define MRSF LP of PTAS LP analogously. For different values of , we assess the averagecase performance of these algorithms by measuring their (i) minimum required speedup factors and (ii) running times.
The problem instances (number of tasks, their utilizations and number of processors of each type) were generated randomly. Each problem instance had at most 10 tasks and at most 2 processors of each type. For a fair evaluation, we converted the randomly generated task sets into critically feasible task sets (more details in Appendix B in [15]). A task set is termed critically feasible if it is feasible on a given two-type platform but is rendered infeasible if all u_i and v_i are increased by an arbitrarily small factor. The intuition behind using critically feasible task sets in our simulations is that it is "hard" to find a feasible partitioning for these task sets, since only a few assignments are feasible among all the possible assignments. Hence, by using such task sets, we believe our evaluations have been fair and unbiased.
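One way to obtain a critically feasible task set from an arbitrary one is sketched below. It is only an illustration under stated assumptions and is not the procedure of Appendix B in [15]: it enumerates all assignments by brute force (practical only for instances as small as those used here), finds the smallest achievable maximum processor load, and rescales all utilizations so that this value becomes exactly 1. The function names and the (u_i, v_i) task encoding are hypothetical, and the task set is assumed to be non-empty with positive utilizations.

```python
from itertools import product
from typing import List, Sequence, Tuple

Task = Tuple[float, float]  # (u_i on type-1, v_i on type-2); encoding is an assumption


def min_max_load(tasks: Sequence[Task], m1: int, m2: int) -> float:
    """Smallest maximum processor load over all task-to-processor assignments.

    Brute force over (m1 + m2) ** len(tasks) assignments; processors
    0..m1-1 are type-1 and the rest are type-2 (hypothetical indexing).
    """
    best = float("inf")
    for assignment in product(range(m1 + m2), repeat=len(tasks)):
        load = [0.0] * (m1 + m2)
        for (u, v), p in zip(tasks, assignment):
            load[p] += u if p < m1 else v
        best = min(best, max(load))
    return best


def make_critically_feasible(tasks: Sequence[Task], m1: int, m2: int) -> List[Task]:
    """Rescale utilizations so the best assignment loads some processor to exactly 1."""
    scale = min_max_load(tasks, m1, m2)
    return [(u / scale, v / scale) for u, v in tasks]


tasks = [(0.4, 0.8), (0.6, 0.3), (0.2, 0.5)]
critical = make_critically_feasible(tasks, 1, 1)
print(min_max_load(critical, 1, 1))
```

After rescaling, the best assignment fills at least one processor to capacity, so any uniform increase of the utilizations makes every assignment overload some processor, which matches the definition of critical feasibility given above.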
We ran PTAS_NF and PTAS_LP on 5000 critically feasible task sets and, for each task set, obtained MRSF_NF and MRSF_LP as follows. We initially set the speedup factor to 1.0 and input the task set to the algorithm. If the algorithm cannot find a feasible mapping, we increment the speedup factor by a small value, i.e., by 0.01, divide the original utilizations, u_i and v_i, of each task by the new speedup factor, and feed the resulting task set to the algorithm. These steps (adjust the speedup factor and feed back the derived task set) are repeated until the algorithm succeeds, which gives us the MRSF of the algorithm for the task set. This entire procedure is repeated for each of the 5000 critically feasible task sets.
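The search for the minimum required speedup factor can be sketched as follows. This is a minimal illustration, not the authors' C implementation: the partitioning algorithm is passed in as a callable, and the `first_fit` heuristic used here is only a stand-in (it is neither PTAS_NF nor PTAS_LP). The names `mrsf`, `first_fit`, and the `max_speedup` safety cap are assumptions introduced for this example.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Task = Tuple[float, float]  # (u_i on type-1, v_i on type-2); encoding is an assumption
Partitioner = Callable[[Sequence[Task], int, int], Optional[List[int]]]


def first_fit(tasks: Sequence[Task], m1: int, m2: int) -> Optional[List[int]]:
    """Stand-in partitioner (NOT PTAS_NF): first-fit on unit-capacity processors."""
    load = [0.0] * (m1 + m2)
    assignment: List[int] = []
    for u, v in tasks:
        for p in range(m1 + m2):
            demand = u if p < m1 else v  # processors 0..m1-1 are type-1
            if load[p] + demand <= 1.0 + 1e-9:
                load[p] += demand
                assignment.append(p)
                break
        else:
            return None  # no processor can accommodate this task
    return assignment


def mrsf(tasks: Sequence[Task], m1: int, m2: int, partition: Partitioner,
         step: float = 0.01, max_speedup: float = 10.0) -> Optional[float]:
    """Smallest speedup (on a grid of width `step`) at which `partition` succeeds."""
    k = 0
    while 1.0 + k * step <= max_speedup:
        speedup = 1.0 + k * step
        # Always rescale the ORIGINAL utilizations by the current speedup.
        scaled = [(u / speedup, v / speedup) for u, v in tasks]
        if partition(scaled, m1, m2) is not None:
            return speedup
        k += 1
    return None  # safety cap reached (hypothetical guard, not in the paper)


tasks = [(0.6, 0.6), (0.6, 0.6), (0.6, 0.6)]
print(mrsf(tasks, 1, 1, first_fit))
```

Computing the speedup as 1.0 + k * step from an integer counter, rather than repeatedly adding 0.01, avoids accumulating floating-point error over many iterations.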
With this procedure, we obtain the histograms of MRSFs for both algorithms for different values of ε. Figure 1 shows the histogram for ε = 0.2 (note that the y-axis is in log scale). As can be seen, MRSF_NF never exceeded 1.12, which covers 20% of the gap between the optimal value of 1.0 and its upper bound of 1 + 3ε = 1.60, i.e., (1.12 − 1.0)/(1.60 − 1.0) × 100 = 20%, whereas MRSF_LP is as high as 1.28, which covers 56% of the gap between the optimal value of 1.0 and its upper bound of (1 + ε)/(1 − ε) = 1.50, i.e., (1.28 − 1.0)/(1.50 − 1.0) × 100 = 56%. Therefore, PTAS_NF requires a much smaller processor speedup on average than PTAS_LP in order to find a feasible partitioning. The observations for other values of ε follow the same trend (see Appendix B in [15]).
We also measure the average running times of both algorithms for different values of ε. In these experiments, the speedup factor is set to 1 + 3ε for PTAS_NF and to (1 + ε)/(1 − ε) for PTAS_LP. This ensures that both algorithms succeed in finding a feasible partitioning for a given task set in a single run, and hence the experiments are not biased in favor of either of them. In our experiments with 5000 critically feasible task sets, as can be seen in Table I, for ε = 0.1, PTAS_NF runs approximately 50000 times faster than PTAS_LP. This factor is even higher for other values of ε.
To summarize, our algorithm exhibits better average-case performance: it requires a significantly smaller processor speedup for finding a feasible partitioning and runs orders of magnitude faster than PTAS_LP. Overall, PTAS_NF outperforms the prior state-of-the-art, PTAS_LP.
B. Evaluation of PTAS_NF for different values of ε
In order to understand how much pessimism there is in the analysis of PTAS_NF, we evaluated its performance for different values of ε. In this set of experiments, we chose a larger number of problem instances, with each problem instance being more complex³. We generated 10000 critically feasible task sets, where each task set had at most 25 tasks and at most 3 processors of each type. Then, for different values of ε, we ran PTAS_NF on these 10000 critically feasible task sets and obtained the histograms of MRSF_NF. Figure 2 shows the histogram for ε = 0.3. As can be seen, for almost 98% of the task sets, MRSF_NF did not exceed 1.06, i.e., approximately 7% of the gap between the optimal value of 1.0 and its theoretical bound (i.e., 1 + 3ε = 1.90); for the remaining 2% of the task sets, the factor did not exceed 1.12, i.e., approximately 13% of that gap. Thus, in the simulations, for the vast majority of task sets, our algorithm requires a much smaller processor speedup than indicated by its approximation ratio. The observations for other values of ε follow the same trend (see Appendix B in [15]).
Hence, PTAS_NF performs significantly better in simulations than indicated by its theoretical bound.
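The percentages quoted for ε = 0.3 can be reproduced by normalizing an observed speedup s against the gap between the optimal value 1.0 and the bound 1 + 3ε. The symbol g is introduced here only for illustration and does not appear in the original analysis:

```latex
\[
g(s,\epsilon) \;=\; \frac{s - 1}{(1 + 3\epsilon) - 1} \;=\; \frac{s - 1}{3\epsilon},
\qquad
g(1.06,\,0.3) = \frac{0.06}{0.90} \approx 6.7\%,
\qquad
g(1.12,\,0.3) = \frac{0.12}{0.90} \approx 13.3\%.
\]
```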
IX. CONCLUSIONS
A polynomial-time approximation scheme was proposed for the problem of partitioning a given collection of implicit-deadline sporadic tasks upon a multiprocessor platform in which there are two distinct kinds of processors. It provides the following guarantee: if a task set has a feasible partitioning on a two-type platform then, given an input ε > 0, our PTAS succeeds in finding such a feasible partitioning for the task set on a two-type platform in which each processor is 1 + 3ε times faster. In simulations, our algorithm outperforms the prior state-of-the-art PTAS [1] and also performs significantly better than indicated by its theoretical bound.

³ Since we do not run PTAS_LP (which takes much longer to output a solution) in this batch of experiments, we could increase both the number of problem instances and the size of each instance compared to the previous set of experiments.
