We consider pipelined real-time systems, commonly found in assembly lines, consisting of a chain of tasks executing on a distributed platform. Their processing is pipelined: each processor executes only one interval of consecutive tasks. We are therefore interested in minimizing both the input-output latency and the period. For dependability reasons, we are also interested in maximizing the reliability of the system. We therefore assign several processors to each task, so as to increase the reliability of the system. We assume that both processors and communication links are unreliable and subject to transient failures, the arrival of which follows a constant parameter Poisson law. We also assume that the failures are statistically independent events. We study several variants of this multiprocessor mapping problem with several hypotheses on the target platform (homogeneous/heterogeneous speeds and/or failure rates). We provide NP-hardness complexity results, and optimal mapping algorithms for polynomial problem instances.
Introduction
Pipelined real-time systems are commonly found in assembly lines and are subject to strict dependability and real-time constraints. They consist of a chain of tasks executing on a distributed platform. Each task is a block of code with a known amount of work to be processed. The role of the first task of the chain is to acquire some data set from the environment (through sensor drivers), to process it, and finally to transmit its result to the second task. Each subsequent task receives its input data from its immediate predecessor, processes it, and transmits its result to its immediate successor, except the last task, which transmits its result to the environment (through actuator drivers).
Tasks are assigned to processors of the platform using an interval mapping, which groups consecutive tasks of the linear chain and assigns them to the same processor. Interval mappings are more general than one-to-one mappings, which establish a unique correspondence between tasks and processors; they are very useful for reducing communication overheads, not to mention the many situations where there are more tasks than processors and where interval mappings are mandatory. The key performance-oriented metrics to determine the best interval mapping are the period and the latency. The period is the time interval required between the beginning of the execution of two consecutive data sets. Equivalently, the inverse of the period is the throughput that measures the aggregate rate of processing of data. The latency is the time elapsed between the beginning and the end of the execution of a given data set, hence it measures the response time of the system to process the data set entirely. Minimizing the latency is antagonistic to minimizing the period, and tradeoffs should be found between these criteria.
Besides real-time constraints, expressed as an upper bound on the period and/or the latency, pipelined real-time systems must also satisfy crucial dependability constraints, which are expressed as a lower bound on the reliability of the mapping. Increasing the reliability is achieved by replicating the intervals on several processors. Increasing the replication level is therefore good for the reliability, but bad for the period and latency. We thus have three antagonistic criteria, the reliability, the period, and the latency.
We evaluate the reliability of a single task mapped onto a processor according to the classical model of Shatz and Wang [21], where each hardware component (processor or communication link) is fail-silent and is characterized by a constant failure rate per time unit $\lambda$: the reliability of a task of duration $d$ is therefore $e^{-\lambda d}$. For an interval of several tasks mapped onto a single processor, we just have to sum up the task durations, hence obtaining $e^{-\lambda D}$, where $D$ is the sum of the task durations. For a mapping with replication, we compute the reliability by building the Reliability Block Diagram (RBD) corresponding to this mapping. Here we face the delicate issue that computing the reliability is exponential in the size of the mapping (or equivalently the size of the RBD). To solve this issue, we insert routing operations in the mapping to guarantee that the RBD is by construction serial-parallel, therefore allowing us to compute its reliability in linear time.
We first present the models in Section 2, and then discuss related work in Section 3. The core of our contribution is presented in Sections 4, 5, and 6. Finally, we conclude in Section 7.
Framework
In this section, we detail the application model, the platform model, the failure model, and the replication model. We end with the formal definition of the mono-or multicriteria multiprocessor mapping problem.
Application model
An application is a chain of $n$ tasks $C = (\tau_i)_{1 \le i \le n}$. Each task $\tau_i$ is a block of code that receives its input from its predecessor $\tau_{i-1}$, computes a known amount of work, and produces an output data set of a known size. Therefore, each task $\tau_i$ is represented by the pair $(w_i, o_i)$, where $w_i$ is the amount of work and $o_i$ is the output data size. By convention, $o_n = 0$ because $\tau_n$ emits its result directly to the environment through actuator drivers. Specifying the size of the input data set required by a task is not necessary since, by definition of a chain, it is equal to the size of the output data set of its immediate predecessor. Figure 1 shows an example of a chain composed of $n$ tasks.
Assuming that the amount of work of each task is known is not a critical assumption, since worst-case execution time (WCET) analysis has been applied with success to real-life processors actually used in embedded systems. In particular, it has been applied to the most critical existing embedded system, namely the Airbus A380 avionics software running on the Motorola MPC755 processor [9, 22].
Platform model
The target platform consists of $p$ processors connected by point-to-point communication links. We denote by $P$ the set of processors: $P = (P_u)_{1 \le u \le p}$. We assume that communication links are homogeneous: all links have the same bandwidth $b$. In contrast, each processor $P_u$ may have a different speed $s_u$. Such platforms correspond to networks of workstations with plain TCP/IP interconnects or other LANs.
In order to derive a realistic communication model, we assume that the number of outgoing point-to-point connections of each processor is limited to K. A given processor is thus capable of simultaneously sending messages to (and receiving messages from) K other processors. Indeed, there is no physical device capable of sending, say, 100 messages to 100 distinct processors, at the same speed as if it was a single message. The output bandwidth of the sender's network card would be a limiting factor. Our assumption of bounded multi-port communications [14] is reasonable for a large range of platforms, from large-scale clusters to multi-core System-on-Chips (SoCs).
In addition, we assume that communications are overlapped with computations, that is, a processor can compute the current instance of task τ i and, in parallel, send to another processor the result of the previous instance of τ i . This model is consistent with current processor architectures where a SoC can include a processor and several communication co-processors.
Interval mapping
The chain of tasks is executed repeatedly in a pipelined manner to achieve a better throughput. As a consequence, mapping the chain on the platform involves dividing the chain into $m$ intervals of consecutive tasks, and assigning each processor to a unique interval. This technique is known as interval mapping. Figure 2 shows an example of a division of a chain of tasks into $m$ intervals. In a mapping without replication, each interval is assigned to a single processor, while in a mapping with replication, each interval is assigned to several processors. Replication is crucial to increase the reliability of the system [10]. If the number of processors is greater than the number of tasks, then each interval can be of size one (that is, one task per interval), but this is rarely the case for real-life systems. Also, having many small intervals is likely to decrease the period but will also increase the communication costs and hence decrease the total reliability: thus a trade-off is to be found.
For each $1 \le j \le m$, the interval $I_j$ is the set of consecutive tasks between indices $f_j$ and $l_j$. Moreover, $f_1 = 1$, $f_j = l_{j-1} + 1$ for all $2 \le j \le m$, and $l_m = n$. The amount of work processed by $I_j$ is therefore $W_j = \sum_{\tau_i \in I_j} w_i = \sum_{i=f_j}^{l_j} w_i$. The size of the output data set produced by interval $I_j$ is that of its last task, that is, $o_{l_j}$.
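As a small illustration of these definitions, the sketch below computes the work $W_j$ of each interval from the last indices $l_1 < \ldots < l_m$ of the intervals. The function name `interval_works` and the list-based encoding are assumptions made for this example only; they do not come from the paper.

```python
def interval_works(w, last_indices):
    """Return [W_1, ..., W_m] for a chain split into m intervals.

    w            -- works w_1..w_n, stored in a 0-based Python list
    last_indices -- 1-based last task index l_j of each interval (hypothetical encoding)
    """
    if not last_indices or last_indices[-1] != len(w):
        raise ValueError("the last interval must end at task n")
    works = []
    first = 1  # f_1 = 1
    for last in last_indices:
        if last < first:
            raise ValueError("intervals must be non-empty and increasing")
        # W_j = sum of w_i for i = f_j .. l_j (shifted to 0-based slicing)
        works.append(sum(w[first - 1:last]))
        first = last + 1  # f_{j+1} = l_j + 1
    return works
```

For instance, a chain of five tasks with works 2, 3, 1, 4, 5 split after tasks 2, 3 and 5 gives three intervals of works 5, 1 and 9.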
Failure model
Both processors and communication links can fail, and they are fail-silent. Classically, we adopt the failure model of Shatz and Wang [21]: failures are transient and the maximal duration of a failure is such that it affects only the current operation executing on the faulty processor, and not the subsequent operations (the same holds for communication links); this is the "hot" failure model. Besides, the occurrence of failures on a processor (or on a communication link) follows a Poisson law with a constant parameter $\lambda$, called its failure rate per time unit. Modern fail-silent hardware components can have a failure rate around $10^{-6}$ per hour. Since communication links are homogeneous, we denote by $\lambda$ their identical failure rate per time unit. Concerning the processors, we denote by $\lambda_u$ the failure rate per time unit of the processor $P_u$, for each $P_u$ in $P$.
Moreover, failure occurrences are statistically independent events. Note that transient failures are the most common failures in modern embedded systems, all the more when processor voltage is lowered to reduce the energy consumption, because even very low energy particles are likely to create a critical charge leading to a transient failure [25] .
The reliability of a system measures its continuity of service. It is defined as the probability that it functions correctly during a given time interval [2]. According to our model, the reliability of the processor $P$ (resp. the communication link $L$) during the duration $d$ is $r = e^{-\lambda d}$, where $\lambda$ is the failure rate per time unit of $P$ or $L$. Conversely, the probability of failure of the processor $P$ (resp. the communication link $L$) during the duration $d$ is $f = 1 - r = 1 - e^{-\lambda d}$. Hence, the reliability of the task $\tau_i$ on processor $P_u$ is:
$$r_{u,i} = e^{-\lambda_u w_i / s_u} \qquad (1)$$
Accordingly, the reliability of the interval $I$ of total work $W = \sum_{\tau_i \in I} w_i$ mapped on the processor $P_u$ is:
$$r_{u,I} = e^{-\lambda_u W / s_u} \qquad (2)$$
Equations (1) and (2) show that platform heterogeneity may come from two factors: (i) processors having different speeds, and (ii) processors having different failure rates. We say that the platform is homogeneous if all processors have the same speed and the same failure rate (hence the reliability and the execution time of an interval no longer depend on the processor it is assigned to), and we say that the platform is heterogeneous otherwise.
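The failure model can be checked numerically with a few lines of code. The sketch below assumes that a task of work $w_i$ runs for $w_i / s_u$ time units on $P_u$, which is the form of $r_{u,i}$ used later in the reductions; the function names are illustrative and not part of the paper.

```python
import math


def task_reliability(lam_u, w_i, s_u):
    """Reliability of one task: exp(-lambda_u * duration), duration = w_i / s_u (assumed)."""
    return math.exp(-lam_u * w_i / s_u)


def interval_reliability(lam_u, works, s_u):
    """Reliability of an interval on one processor: task durations are summed up."""
    return math.exp(-lam_u * sum(works) / s_u)


def failure_probability(lam, d):
    """Probability that a component of failure rate lam fails during duration d."""
    return 1.0 - math.exp(-lam * d)
```

Because the exponent is additive, the reliability of an interval equals the product of the reliabilities of its tasks on the same processor, which is why summing the task durations is enough.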
Replication model
We use spatial redundancy to increase the reliability of a system: in other words, we replicate the intervals on several processors. Figure 3 shows an example of mapping by interval with spatial redundancy: the interval $I_1$ is mapped on the processors $\{P_1, P_2, P_3\}$, the interval $I_2$ is mapped on the processors $\{P_4, P_5\}$, and so on until the interval $I_m$ mapped on the processors $\{P_{t-1}, P_t\}$. Concerning the communications, the data-dependency $o_{l_1}$ is mapped on the point-to-point links $\{L_{14}, L_{15}, L_{24}, L_{25}, L_{34}, L_{35}\}$, and so on. To increase the reliability, each processor of a given interval communicates with each processor of the next interval. Specifically, for any $1 \le j \le m - 1$, all the processors executing interval $I_j$ send their result to all processors executing the next interval $I_{j+1}$. Because of the bounded number $K$ of possible communications (see Section 2.2), the maximum number of replicas per interval is also limited to $K$.
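The all-to-all communication pattern between two consecutive replicated intervals can be enumerated directly. The sketch below is only an illustration: the function name `replica_links` and the string labels for links are assumptions, and the check against $K$ encodes the bound on the number of replicas stated above.

```python
from itertools import product


def replica_links(senders, receivers, K):
    """List the point-to-point links used between two consecutive intervals.

    senders, receivers -- processor indices of the replicas of I_j and I_{j+1}
    K                  -- bound on simultaneous connections, hence on replicas
    """
    if len(senders) > K or len(receivers) > K:
        raise ValueError("an interval cannot have more than K replicas")
    # Every replica of I_j sends its result to every replica of I_{j+1}.
    return [f"L{u}{v}" for u, v in product(senders, receivers)]
```

With replicas {1, 2, 3} for the first interval and {4, 5} for the second, this yields the six links of the example of Figure 3.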
Multiprocessor mapping problem
We study several variants of the multiprocessor interval mapping problem. The inputs of the problem are a chain of n tasks C = (τ i ) 1≤i≤n , a hardware platform of p processors P = (P u ) 1≤u≤p , and a bound K on the maximal number of replications for each interval of tasks. The output is an interval mapping of C onto P, that is, a distribution of C into m intervals and an assignment of each interval to at most K processors of P, such that each processor executes only one interval. Each variant of the mapping problem optimizes a different set of criteria among the following ones:
• the reliability,
• the expected input-output latency,
• the worst-case input-output latency,
• the expected period,
• the worst-case period.
Our contribution is multifold. In Section 4, we show how to compute the different objectives (reliability, expected and worst-case latency, and expected and worst-case period) for a given multiprocessor mapping. Then, for homogeneous platforms, we prove that:
1. computing a mono-criterion mapping that optimizes the reliability is polynomial (Section 5.1);
2. optimizing both the reliability and the period remains polynomial (Section 5.2);
3. the problem of optimizing both the reliability and the latency is NP-complete (Section 5.3).
For heterogeneous platforms, we prove that optimizing the reliability only is NP-complete, and hence all the multicriteria mapping problems that include the reliability in their criteria are also NP-complete (Section 6). Finally, we state some concluding remarks in Section 7.
Related work
Several papers have dealt with workflow applications whose dependence graph is a linear chain. The pioneering papers [23, 24] investigate bi-criteria (period, latency) optimization of such workflows on homogeneous platforms. An extension of these results to heterogeneous platforms is provided in [5, 6] .
All the previous papers deal with fully reliable platforms. In our previous work [4], we have studied the (reliability, latency) mapping problem with fail-silent processors. The model in [4] is quite different from, and much cruder than, the one of this paper: each processor has an absolute probability of failing, independent of task durations, and the faults are unrecoverable. To the best of our knowledge, there is no other published work on optimizing linear chain workflows for reliability. However, many papers deal with a single directed acyclic graph (DAG) instead of a pipelined workflow, be it a fully general DAG [8], a linear chain [20], or even independent tasks [15, 20]. The closest of the latter papers is [20]: it contains a short section on linear chains, with a mono-criterion dynamic programming algorithm for optimizing the reliability which is similar to Algorithm 1 (see Section 5.1).
Finally, the specific problem of bi-criteria (length, reliability) multiprocessor scheduling has also been addressed in [7, 1, 13, 19, 11, 12] for general DAGs of operations, but except [1, 11, 12] , these papers do not replicate the operations and have thus a very limited impact on the reliability. Moreover, none consider chains of tasks and interval mappings, and therefore they attempt to minimize the length of the mapping without distinguishing between the period and the latency (the latter one being similar to the schedule length).
Evaluation of a given mapping
In this section, we detail the computation of the different objectives (reliability, expected and worst-case latency, and expected and worst-case period) for a given mapping. We compute the reliability of a mapping by building its reliability block diagram (RBD) [18, 3]. Formally, an RBD is a directed acyclic graph $(N, E)$, where each node of $N$ is a block representing an element of the system, and each arc of $E$ is a causality link between two blocks. Two particular connection points are its source $S$ and its destination $D$. An RBD is operational if and only if there exists at least one operational path from $S$ to $D$. A path is operational if and only if all the blocks in this path are operational. The probability that a block is operational is its reliability. By construction, the probability that an RBD is operational is equal to the reliability of the system that it represents.
In our case, the system is the multiprocessor interval mapping, possibly partial, of the application on the platform. A mapping is partial if not all intervals have been mapped yet, but of course those intervals that are mapped are such that all their predecessors are also mapped. Each block represents an interval I j placed onto a processor P u or a data-dependency o lj between the two intervals I j and I j+1 placed onto a communication link. The reliability of a block is therefore computed according to Equation (2) .
Computing the reliability in this way assumes that the occurrences of the failures are statistically independent events (see Section 2.4). Without this hypothesis, the fact that some blocks belong to several paths from S to D makes the computation of the reliability very complex. Concerning hardware faults, this hypothesis is reasonable, but this would not be the case for software faults [17] .
The main drawback of the approach is that the computation of the reliability is, in general, exponential in the size of the RBD. When the schedule is without replication, the RBD is serial (i.e., there is a single path from S to D) so the computation of the reliability is linear in the size of the RBD. But when the schedule is with replications, the RBD has no particular form, so the computation of the reliability is exponential in the size of the RBD. The reason is that processors are heterogeneous: the completion dates of a given interval on its assigned processors are different, so the reception dates by the processors of the next interval are different. This is true even when the application is a chain of intervals rather than a general graph. See Figure 4 for an illustration, where the RBD corresponding to the mapping has no specific form.
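To make the exponential cost concrete, the following sketch evaluates the reliability of an arbitrary RBD by enumerating all the operational/failed states of its blocks and summing the probabilities of the states in which $S$ still reaches $D$. This brute-force enumeration is an illustration chosen for this example, not the method of the paper; the function name and the dictionary/edge-list encoding are assumptions, and block failures are taken to be independent.

```python
from itertools import product


def rbd_reliability(blocks, edges, source="S", dest="D"):
    """Exact reliability of an RBD by enumerating the 2^|blocks| block states.

    blocks -- dict: block name -> reliability (independent failures assumed)
    edges  -- list of arcs (a, b); source and dest are connection points, not blocks
    """
    names = list(blocks)
    succ = {}
    for a, b in edges:
        succ.setdefault(a, []).append(b)
    total = 0.0
    for states in product([True, False], repeat=len(names)):
        up = dict(zip(names, states))
        prob = 1.0
        for name in names:
            prob *= blocks[name] if up[name] else 1.0 - blocks[name]
        # Depth-first search from S through operational blocks only.
        stack, seen, reached = [source], {source}, False
        while stack:
            node = stack.pop()
            if node == dest:
                reached = True
                break
            for nxt in succ.get(node, []):
                if nxt not in seen and (nxt == dest or up.get(nxt, False)):
                    seen.add(nxt)
                    stack.append(nxt)
        if reached:
            total += prob
    return total
```

On a serial RBD this returns the product of the block reliabilities, and on two blocks in parallel it returns one minus the product of their failure probabilities; the cost, however, doubles with every additional block.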
One solution to compute the reliability of the mapping of Figure 4 involves enumerating all the minimal cut sets of its RBD [16]. A cut set in an RBD is a set of blocks $C$ such that there is no path from $S$ to $D$ if we remove all the blocks of $C$ from the RBD. A cut $C$ is minimal if, whatever the block that is removed from it, the resulting set is not a cut anymore. It follows that the reliability of a minimal cut set is the reliability of all its blocks put in parallel. The reliability of the mapping can then be approximated by the reliability of the alternative RBD composed of all the minimal cut sets put in sequence. Because this RBD is serial-parallel, this computation is linear in the number of minimal cut sets. The problem is that, in general, the number of minimal cuts is exponential in the size of the mapping.
For this reason, we follow the approach of [11] and we insert routing operations between the intervals to make sure that the RBD representing a mapping is always serial-parallel, therefore making the computation of the reliability tractable. This is illustrated in Figure 5, where a routing operation $R$ has been mapped on processor $P_5$ and the RBD corresponding to the mapping is serial-parallel; as a consequence, the reliability of this mapping can be computed in linear time w.r.t. the number of intervals. Routing operations can be mapped on any processor. For instance, in the mapping of Figure 5, $R$ could have been mapped on $P_1$ instead of $P_5$, therefore avoiding the need for the communication $(o_{l_1}/L_{15})$. Also, routing operations are assumed to be executed in 0 time units [11], hence for any processor $P_u$, the reliability of the block $(R/P_u)$ is 1.
As we have advocated, inserting routing operations yields the huge advantage of making the reliability computation linear in time. This comes at a cost in the execution time of the system because of the increased number of communications. However, it has been shown in [11] that the overhead incurred by the routing operations is reasonable (only +3.88 % on average).
For an interval $I$ of weight $W$ mapped on the subset of processors $P_I$, let $ec$ be its expected time of computation, and let $wc$ be its WCET (by the slowest processor of $P_I$). Assume that the processors in $P_I$ are ordered according to their speed, from the fastest $P_1$ to the slowest $P_t$: that is, $\forall 1 \le u < t$, we have $s_u \ge s_{u+1}$. Then, the expected and worst-case execution times of $I$ on $P_I$ are:
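Equation (3) sums up, for each $P_u$, the case where the first $u - 1$ fastest processors fail, and the $u$-th one is successful. Then, for a mapping $(I_1, P_1), \ldots, (I_m, P_m)$, the expected latency $EL$ and the expected period $EP$ are:
$$EL = \sum_{i=1}^{m} \left( ec(I_i, P_i) + \frac{o_{l_i}}{b} \right) \qquad (5)$$
$$EP = \max\left\{ \max_{1 \le i \le m} ec(I_i, P_i),\ \max_{1 \le i \le m} \frac{o_{l_i}}{b} \right\} \qquad (6)$$
The worst-case latency $WL$ and the worst-case period $WP$ are defined similarly, but with the worst-case cost of intervals (Equation (4)) instead of the expected cost (Equation (3)):
$$WL = \sum_{i=1}^{m} \left( wc(I_i, P_i) + \frac{o_{l_i}}{b} \right) \qquad (7)$$
$$WP = \max\left\{ \max_{1 \le i \le m} wc(I_i, P_i),\ \max_{1 \le i \le m} \frac{o_{l_i}}{b} \right\} \qquad (8)$$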
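The verbal description of the expected and worst-case costs can be turned into a short computation. The sketch below is written under explicit assumptions: the expected cost weights the duration on the $u$-th fastest processor by the probability that the $u - 1$ faster ones fail and the $u$-th succeeds, without any normalization by the probability that at least one replica succeeds; the latency adds up computation and communication times; and the period is their maximum because communications overlap with computations. All function names are hypothetical.

```python
import math


def exec_times(W, speeds, rates):
    """Return (ec, wc) for an interval of work W replicated on several processors.

    speeds, rates -- speed s_u and failure rate lambda_u of each replica
    """
    procs = sorted(zip(speeds, rates), key=lambda sr: -sr[0])  # fastest first
    ec = 0.0
    all_faster_failed = 1.0  # probability that every faster replica has failed
    for s, lam in procs:
        d = W / s
        r = math.exp(-lam * d)
        ec += all_faster_failed * r * d  # u-th replica is the first to succeed
        all_faster_failed *= 1.0 - r
    wc = W / procs[-1][0]  # duration on the slowest replica
    return ec, wc


def latency_and_period(costs, comm_times):
    """Aggregate per-interval costs (expected or worst-case) and communication times."""
    latency = sum(costs) + sum(comm_times)
    period = max(max(costs), max(comm_times))
    return latency, period
```

Passing the expected costs to `latency_and_period` gives the expected latency and period, while passing the worst-case costs gives their worst-case counterparts.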
Finally, thanks to the routing operations, the reliability of the mapping $(I_1, P_1), \ldots, (I_m, P_m)$ is:
$$r = \prod_{i=1}^{m} \left( 1 - \prod_{P_u \in P_i} \left( 1 - r_{comm,i-1} \times r_{u,I_i} \times r_{comm,i} \right) \right) \qquad (9)$$
Equation (9) above is computed according to the generic form of the RBD of Figure 5. To account for the fact that the first interval $I_1$ has no incoming communication, we just set $o_0 = 0$, hence $r_{comm,0} = 1$. The same occurs for the outgoing communication of the last interval $I_m$. Finally, routing operations do not appear in Equation (9) since their reliability is always equal to 1.
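The serial-parallel structure obtained with routing operations leads to a linear-time evaluation. The sketch below follows that structure: each replica of an interval forms a serial branch (incoming communication, computation, outgoing communication), the branches of one interval are in parallel, and the intervals are in series. The tuple encoding of a mapping and the function name are assumptions made for this example.

```python
def mapping_reliability(intervals):
    """Reliability of a replicated interval mapping with a serial-parallel RBD.

    intervals -- list of (r_comm_in, [r_u for each replica], r_comm_out);
                 use r_comm_in = 1 for the first interval and
                 r_comm_out = 1 for the last one.
    """
    r = 1.0
    for r_in, replicas, r_out in intervals:
        all_branches_fail = 1.0
        for r_u in replicas:
            # One branch works only if both communications and the computation succeed.
            all_branches_fail *= 1.0 - r_in * r_u * r_out
        r *= 1.0 - all_branches_fail  # intervals are in series
    return r
```

The routing operations contribute a factor of 1 and are therefore omitted; the cost is one multiplication per replica, hence linear in the size of the mapping.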
Complexity results for homogeneous platforms
In this section, we provide optimal polynomial algorithms for the mono-criterion reliability optimization problem, and then for the bi-criteria (reliability, period) optimization problem. Finally, we prove the NP-completeness of the bi-criteria (reliability, latency) optimization problem. Note that on homogeneous platforms, the expected latency and worst-case latency are the same. This also holds true for the expected period and worst-case period.
Reliability optimization
We present a mono-criterion polynomial-time algorithm that maximizes the reliability of a given chain of tasks on a given homogeneous platform. Algorithm 1 is a dynamic programming algorithm. It is a simplified version of Algorithm 2 for bi-criteria (reliability, period) optimization, which we present in the next section.
Data: a number $p$ of fully homogeneous processors of failure rate $\lambda$, a list $A$ of $n$ tasks of sizes $w_i$, and a maximal number $K$ of replications
Result: a reliability $r$
for $k = 1$ to $\min\{K, p\}$ do

Proof. In this algorithm, $F(i, k)$ is the optimal reliability when mapping the first $i$ tasks on $k$ processors, and it is computed iteratively with the dynamic programming procedure.
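A dynamic program of this kind can be sketched as follows. This is not the listing of Algorithm 1, whose body is not reproduced here, but one plausible recurrence written under stated assumptions: `F[i][k]` is the best reliability for the first `i` tasks using at most `k` processors, the last interval covers tasks `j+1..i` with `q` replicas, and each replica branch includes its incoming and outgoing communications. The parameter `lam_link` (link failure rate) and the function name are hypothetical.

```python
import math


def max_reliability(w, o, p, K, lam, lam_link=0.0, s=1.0, b=1.0):
    """Best reliability of an interval mapping on p identical processors (sketch)."""
    n = len(w)
    prefix = [0.0]
    for x in w:
        prefix.append(prefix[-1] + x)
    out = [0.0] + list(o)  # out[i] = o_i, with the convention o_0 = 0
    # F[0][k] = 1 (nothing to map), F[i][0] = 0 for i >= 1 (no processor left).
    F = [[1.0 if i == 0 else 0.0 for _ in range(p + 1)] for i in range(n + 1)]
    for i in range(1, n + 1):
        for k in range(1, p + 1):
            for j in range(i):  # last interval = tasks j+1 .. i
                r_branch = math.exp(-lam_link * out[j] / b
                                    - lam * (prefix[i] - prefix[j]) / s
                                    - lam_link * out[i] / b)
                for q in range(1, min(K, k) + 1):  # q replicas of that interval
                    cand = F[j][k - q] * (1.0 - (1.0 - r_branch) ** q)
                    F[i][k] = max(F[i][k], cand)
    return F[n][p]
```

The four nested loops give a running time in O(n² p K) for this sketch, which is polynomial in the size of the instance.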
Reliability/period optimization
We now present a bi-criteria (reliability, period) polynomial-time algorithm that optimizes the reliability of a mapping given a bound on the period. Recall that, for homogeneous platforms, the worst-case period and the expected period are the same.
Data: a number $p$ of fully homogeneous processors of failure rate $\lambda$, a list $A$ of $n$ tasks of sizes $w_i$, a maximal number $K$ of replications, and an upper bound $P$ on the period
Result: a reliability $r$
for $k = 1$ to $\min\{K, p\}$ do

Theorem 2. Algorithm 2 computes in polynomial time the optimal mapping for reliability optimization on fully homogeneous platforms, when a bound on the period is given.
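Proof. In this algorithm, $F(i, k)$ is the optimal reliability when mapping the first $i$ tasks on $k$ processors. The dynamic programming procedure of Algorithm 1 has been modified to account for the period bound: an interval is considered only if its computation time and its communication time do not exceed the bound.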
Finally, we observe that the converse problem, namely optimizing the period when a bound on the reliability is enforced, is polynomial too (use a binary search on the period and repeatedly execute Algorithm 2 until the optimal value is found).
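Both directions can be sketched together. The code below is an illustration under assumptions, not the paper's Algorithm 2: the dynamic program skips any interval whose computation time or outgoing communication time exceeds the period bound, and the converse problem is solved by a binary search over the finite set of candidate periods (all interval computation times and all communication times). The names `max_reliability_bounded`, `min_period` and `lam_link` are hypothetical.

```python
import math


def max_reliability_bounded(w, o, p, K, lam, period, lam_link=0.0, s=1.0, b=1.0):
    """Best reliability on p identical processors with period <= period (sketch)."""
    n = len(w)
    out = [0.0] + list(o)  # out[i] = o_i, with o_0 = 0
    F = [[1.0 if i == 0 else 0.0 for _ in range(p + 1)] for i in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(i):  # candidate last interval: tasks j+1 .. i
            t_comp = sum(w[j:i]) / s
            if t_comp > period or out[i] / b > period:
                continue  # this interval would violate the period bound
            r_branch = math.exp(-lam_link * out[j] / b - lam * t_comp
                                - lam_link * out[i] / b)
            for k in range(1, p + 1):
                for q in range(1, min(K, k) + 1):
                    cand = F[j][k - q] * (1.0 - (1.0 - r_branch) ** q)
                    F[i][k] = max(F[i][k], cand)
    return F[n][p]


def min_period(w, o, p, K, lam, r_min, lam_link=0.0, s=1.0, b=1.0):
    """Smallest period allowing reliability >= r_min, or None if unreachable."""
    n = len(w)
    cands = sorted({sum(w[j:i]) / s for j in range(n) for i in range(j + 1, n + 1)}
                   | {x / b for x in o})

    def ok(period):
        return max_reliability_bounded(w, o, p, K, lam, period, lam_link, s, b) >= r_min

    if not ok(cands[-1]):
        return None
    lo, hi = 0, len(cands) - 1
    while lo < hi:  # the achievable reliability is non-decreasing in the period bound
        mid = (lo + hi) // 2
        if ok(mid_period := cands[mid]):
            hi = mid
        else:
            lo = mid + 1
    return cands[lo]
```

The binary search is valid because relaxing the period bound can only enlarge the set of feasible mappings, so the best reliability never decreases as the bound grows.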
Reliability/latency optimization
We now prove the NP-completeness of the bi-criteria (reliability, latency) optimization problem on homogeneous platforms. As for the period, there is no difference between the worst-case latency and the expected latency on such platforms.
Theorem 3. The problem of optimizing the reliability on homogeneous platforms, with a bound on the latency, is NP-complete.
Proof. Consider the associated decision problem: given a homogeneous platform, a chain of tasks, a bound $K$ on the number of replications, a reliability $r$ and a latency $L$, does there exist a mapping of reliability at least $r$ and latency not exceeding $L$? This problem is obviously in NP: given a mapping, it is easy to compute its reliability and latency, and to check that it is valid in polynomial time.
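To establish the completeness, we use a reduction from 2-PARTITION. Consider the following general instance $I_1$ of 2-PARTITION: given a set $A$ of $n$ numbers $a_1, \ldots, a_n$, does there exist a subset $A' \subset A$ such that $\sum_{a \in A'} a = \sum_{a \notin A'} a$? Let $T = \frac{1}{2} \sum_{a \in A} a$, $a_{\min} = \min_{1 \le i \le n}\{a_i\}$ and $a_{\max} = \max_{1 \le i \le n}\{a_i\}$. We build the following instance $I_2$ of our problem with $3n + 1$ tasks and $6n$ identical processors: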
• $K = 2$ and $\lambda = 10^{-8} \times 10^{-n} \times a_{\max}^{-3n}$;
• s = b = 1 (unit processor speed and link bandwidth);
• $\forall 1 \le i \le n$, $r_i = e^{-\lambda w_i}$ and $r_{comm,i} = 1$;
• L = (n + 1)B + n 2 + 3T ; • it follows that the reliability of the mapping is r = (1
The size of instance $I_2$ is polynomial in the size of $I_1$. We now show that $I_1$ has a solution if and only if $I_2$ has a solution. Suppose first that $I_1$ has a solution $A'$. Then we propose the following solution for $I_2$:
• all intervals are replicated 2 times;
• each task of size $B$ makes up an interval by itself;
• for all $1 \le i \le n$, if $a_i \in A'$, then $T_{3i-1}$ and $T_{3i}$ are assigned to two different intervals; otherwise they constitute one single interval.
This yields the following costs for the latency:
• the sum of computation costs does not depend on the mapping: (n + 1)B + n 2 + 2T ;
• for each $a_i \in A'$, we add a communication cost $a_i$.
We thus obtain a latency L = (n + 1)B + n 2 + 3T . Concerning the reliability, it is the product of the reliability of all intervals:
• the reliability of intervals of size $B$ is $1 - (1 - e^{-\lambda B})^2$;
• for each $a_i \in A'$, the product of the reliabilities of the two intervals for tasks $T_{3i-1}$ and $T_{3i}$ is
, which is greater
∈ A , the reliability of the interval for tasks T 3i−1 and
We thus obtain, for the product of all these reliabilities,
Suppose now that $I_2$ has a solution. The exponent in the reliability bound implies that any interval is replicated at least 2 times, and the bound on replication is 2. This means that all intervals are replicated exactly 2 times. Suppose that one of the tasks of size $B$ is computed together with another task in the same interval. This yields the bound on reliability:
< r
This means that each task of size $B$ makes up an interval by itself. Let $A'$ be the set of values $i$ such that $T_{3i-1}$ and $T_{3i}$ are not in the same interval. We obtain the following formulas:
• For the reliability:
• For the latency:
This means $\sum_{a_i \notin A'} a_i \le T$ and $\sum_{a_i \in A'} a_i \le T$. Since the two sums add up to $2T$, both are equal to $T$. Hence, $A'$ is a solution for $I_1$. This concludes the proof.
We conclude that, on homogeneous platforms, the bi-criteria (reliability, period) problem is polynomial, while the bi-criteria (reliability, latency) problem is NP-complete. As a consequence, the tri-criteria (reliability, period, latency) problem is NP-complete too.
It is striking, and somewhat unexpected, that the bi-criteria (reliability, period) problem is easier than the (reliability, latency) one. The intuition for this difference is the following: when the period bound is given, we know once and for all which processors are fast enough to be enrolled for a given interval. Therefore, the mapping choices are local. On the contrary, the computation of the latency remains global, and its final value, including communication costs, depends upon the choices that will be made further on.
Complexity results for heterogeneous platforms
In this section, we prove the NP-completeness of the reliability optimization problem on heterogeneous platforms.
Theorem 4. The problem of optimizing the reliability on heterogeneous platforms is NP-complete.
Proof. Consider the associated decision problem: given a heterogeneous platform, a chain of tasks, a bound $K$ on the number of replications, and a reliability $r$, does there exist a mapping of reliability at least $r$? This problem is obviously in NP: given a mapping, it is easy to compute its reliability and to check that it is valid in polynomial time. To establish the completeness, we use a reduction from 3-PARTITION. Consider the following general instance $I_1$ of 3-PARTITION: given $3n$ numbers $a_1, \ldots, a_{3n}$ and a number $T$ such that $\sum_{1 \le j \le 3n} a_j = nT$, do there exist $n$ disjoint subsets $B_1, \ldots, B_n$ of $\{a_1, \ldots, a_{3n}\}$ such that for all $1 \le i \le n$, $\sum_{a_j \in B_i} a_j = T$? Let $a_{\min} = \min_{1 \le i \le 3n}\{a_i\}$.
We build the following instance I 2 with n tasks and p = 3n processors:
• K = 3;
• $\forall 1 \le i \le n$, $w_i = 1/n$ (all tasks have cost $1/n$);
• $r_{u,i} = e^{-\lambda_u w_i / s_u}$;
• $r_{comm,i} = 1$;
• $\forall 1 \le u \le 3n$, $\lambda_u = \lambda \gamma^{a_u}$ and $s_u = 1$;
• it follows that the reliability of the mapping is r = 1 − λ 3 γ T n .
The size of I 2 is polynomial in the size of I 1 . We show that I 1 has a solution if and only if I 2 has a solution. Suppose first that I 1 has a solution B 1 , . . . , B n . We propose the following solution for I 2 :
• we have one interval per task;
• the i-th task is replicated three times and allocated to the set of processors {P u |u ∈ B i }.
We obtain a reliability for task i which is equal to
Suppose now that $I_2$ has a solution. We first show that the optimal mapping consists of $n$ intervals, one per task, each replicated three times. Suppose that we know the number of intervals in the optimal mapping. There are at most $n$ intervals, and we have enough processors to replicate all of them three times, which increases the reliability. We conclude that all intervals will be replicated three times. Suppose now that one of these intervals contains $t > 1$ tasks. There are enough processors to split this interval into $t$ single-task intervals, each replicated 3 times. Let $r_1$ be the reliability of the original interval with $t$ tasks, and $r_t$ the reliability of the same tasks assigned to $t$ intervals replicated 3 times. By hypothesis of optimality, we have:
However, $\lambda \le 10^{-8}$, which contradicts the hypothesis. This means that, in the optimal solution, each task constitutes an interval.
Let, for all $i$, $B_i = \{a_j \mid T_i \text{ is mapped on } P_j\}$. We obtain the following reliability:
Suppose that, for some $i$, $\sum_{a_j \in B_i} a_j \ne T$. Then, by convexity,
By hypothesis, we have: 4 nγ 4T ≥ 1. This contradicts the hypothesis. Then, if {B 1 , . . . , B n } corresponds to a solution of I 2 , we have aj ∈Bi a j = T for 1 ≤ i ≤ n. This shows that B 1 , . . . , B n is a solution for I 1 , which concludes the proof.
Because mono-criterion reliability optimization is already NP-complete, all multi-criteria problems with period or latency or both, are also NP-complete on heterogeneous platforms.
Conclusion
We have addressed problems related to the mapping of linear chain workflows on homogeneous and heterogeneous distributed platforms. The main goal was to optimize the reliability of the mapping through task replication, while enforcing bounds on performance-oriented criteria (period and latency). We have been able to derive a comprehensive set of NP-hardness complexity results, together with optimal algorithms for polynomial instances. Altogether, these results provide a solid theoretical foundation for the study of multi-criteria mappings of linear chain workflows. Another contribution of this paper is the introduction of a realistic communication model that nicely accounts for the inherent physical limitations on the communication capabilities of state-of-the-art processors.
Communication failures have been incorporated in the model through routing operations, which guarantee that evaluating the system reliability remains computationally tractable. An interesting research direction would be to investigate whether it is feasible to remove this routing procedure, and accurately approximate the reliability of general (non serial-parallel) systems.
Another direction for future work involves the design of efficient heuristics for even more difficult problems that would mix performance-related criteria (period, latency) with several other objectives, such as reliability, resource cost, and power consumption.
