We investigate the problem of scheduling a set of tasks with individual deadlines and conditional precedence constraints on a heterogeneous Network on Chip (NoC)-based Multi-Processor System-on-Chip (MPSoC) such that the total expected energy consumption of all the tasks is minimized, and propose a novel approach. Our approach consists of a scheduling heuristic that constructs a single unified schedule for all the tasks and assigns a frequency to each task and each communication assuming continuous frequencies, together with an Integer Linear Programming (ILP)-based algorithm and a polynomial-time heuristic for assigning discrete frequencies and voltages to tasks and communications. We have performed experiments on 16 synthetic and 4 real-world benchmarks. The experimental results show that, compared to the state-of-the-art approach, our approach using the ILP-based algorithm and our approach using the polynomial-time heuristic achieve average improvements of 31% and 20%, respectively, in terms of energy reduction.
Introduction
Modern mobile systems such as robots and driverless cars require computationally powerful and energy-efficient hardware due to their complex functions and battery power constraints. An MPSoC is an ideal architecture for those mobile systems due to its high performance and low power dissipation. Examples of commercial MPSoCs include the Samsung Exynos 5422 SoC [1] and Zynq UltraScale+ MPSoC devices [3]. The Samsung Exynos 5422 SoC powers the famous Samsung Galaxy smartphone series. Zynq UltraScale+ MPSoC devices have been used in robots. Typically, an MPSoC consists of processors with different power and performance profiles. For example, the Samsung Exynos 5422 SoC consists of four high-performance ARM Cortex-A15 CPUs and four low-power ARM Cortex-A7 CPUs. Modern MPSoCs have a large number of processors, and the number of processors on MPSoCs is expected to grow [10]. According to the International Technology Roadmap for Semiconductors (ITRS), MPSoCs will integrate thousands of processors [15] by 2025. Therefore, traditional bus-based on-chip communication is no longer feasible due to its poor scalability. NoC-based communication provides a significant improvement in terms of flexibility, scalability and performance over hierarchical (e.g., Advanced Microcontroller Bus Architecture and STBus) and traditional bus structures [23].
Mobile systems are battery powered. Although battery lifetimes have increased over the years, modern batteries are still far from meeting the needs of power-hungry mobile devices. Therefore, energy efficiency is a critical issue in mobile systems. One way to improve energy efficiency is to apply Dynamic Voltage and Frequency Scaling (DVFS). DVFS reduces energy consumption by lowering the voltage/frequency of a processor/communication link when it is underutilized. For example, in order to reduce the energy consumption of a Nexus 4 Android smartphone, the on-demand governor scales the CPU frequency and voltage level based on CPU utilization every 50 milliseconds [17]. In addition to processors, NoC communication links and routers also consume a large amount of on-chip energy. For the Alpha 21364 processor [32], out of 125 W total on-chip power consumption, 23 W (20%) is consumed by NoC routers and links, and out of these 23 W, the NoC links consume 58% of the power. Therefore, it is important to take communication energy into account when mapping applications onto NoC-based MPSoCs.
In this paper, we target energy-efficient mobile embedded systems such as driverless cars, robots and advanced combat helmets using a NoC-based MPSoC as the hardware platform. For those mobile systems, the complex functions, such as object recognition and communication, are known at the design stage, and the embedded software is typically modelled as a set of tasks with conditional precedence constraints and individual deadlines. We investigate the problem of scheduling a set of tasks with conditional precedence constraints and individual deadlines on a heterogeneous NoC-based MPSoC such that the total expected processor and communication energy is minimized. The processors and NoC links are voltage scalable and can operate at a set of discrete voltage/frequency levels. We make the following major contributions:
1) We propose a novel offline task scheduling approach. Our approach consists of a task scheduling heuristic that constructs a single unified schedule for all the tasks and collectively assigns a frequency to each task and each communication assuming continuous frequencies, and an ILP-based algorithm and a polynomial-time heuristic for assigning a discrete frequency to each task and each communication. To the best of our knowledge, our approach is the first one that investigates the problem of scheduling a set of tasks and communications with conditional precedence constraints on NoC-based MPSoCs such that the total expected energy consumption is minimized.
2) We have performed experiments on 20 benchmarks. Compared to the state-of-the-art approach proposed by Li and Wu [24] that does not consider conditional precedence constraints, our approach using the ILP-based algorithm achieves an average improvement of 31% and a maximum improvement of 61%, and our approach using the polynomial-time heuristic achieves an average improvement of 20% and a maximum improvement of 46%. Furthermore, both our approach using the ILP-based algorithm and our approach using the polynomial-time heuristic run approximately three times faster than the state-of-the-art approach.
The rest of this paper is organized as follows. Section 2 gives an overview of the related work. Section 3 describes all the models, including the task model, the power models, and the MPSoC model. Section 4 presents our heuristic for task scheduling and frequency assignment assuming continuous frequencies for both processors and communications. Section 5 proposes an ILP-based algorithm and a polynomial-time heuristic for assigning discrete frequencies to tasks and communications. Section 6 presents our experimental results and analysis. Lastly, Section 7 concludes this paper.
Related Work
Several approaches have been proposed to minimize energy consumption for heterogeneous multi-processor systems. Gebotys et al. [12] investigate the problem of scheduling tasks onto heterogeneous processors such that total energy consumption is minimized. In their approach, task mapping and scheduling are integrated with dynamic voltage scaling to maximize energy efficiency. Singh et al. [38] propose a contention-aware, energy-efficient, duplication-based mixed integer programming (CEEDMIP) formulation for scheduling task graphs on NoC-based heterogeneous multiprocessors. The key idea of their approach is to duplicate some tasks to reduce the communication energy as well as traffic congestion. Zhang et al. [45] propose an ILP-based, energy-aware task mapping algorithm on heterogeneous multi-processors, and an evolutionary algorithm-based, energy-efficient task mapping heuristic. Cai et al. [6] propose an energy-efficient approach for heterogeneous multi-processor mobile embedded systems. Their approach assigns discrete frequencies to tasks based on the critical path lengths of tasks. Lin et al. [25] propose an energy-efficient algorithm for heterogeneous MPSoC-based mobile devices. They integrate task mapping and scheduling with dynamic voltage scaling to reduce the energy consumption of mobile devices.
Huang et al. [21] propose a simulated annealing-based energy-aware task mapping algorithm on heterogeneous NoC-based MPSoCs. In their model, processors are assumed to be voltage scalable and NoC links operate at a fixed frequency. Mixed Integer Linear Programming (MILP) is used to assign voltages/frequencies to tasks. Shin et al. [37] consider a NoC model with voltage scalable links and propose a genetic algorithm for minimizing the communication energy of the NoC by scaling the link voltages. Ghosh et al. [13] consider a model similar to that of Huang et al. [21] and propose an energy-aware task scheduling heuristic based on MILP relaxation and randomized rounding. Li et al. [24] assume a NoC model with voltage scalable links. They propose a task mapping algorithm and a genetic algorithm-based task voltage/frequency assignment algorithm. A detailed survey on approaches for multi-processor energy-efficient embedded computing is given in [31].
Only a few approaches have been proposed that aim at minimizing the energy consumption of tasks with conditional precedence constraints. Shin et al. [36] propose a scenario-based offline Non-Linear Programming (NLP) algorithm that assigns each task a speed for each scenario. The approach has an exponential time complexity as it constructs a separate schedule for each scenario. Wu et al. [41] propose an approach that employs the schedule table generated by [9] to identify the available slack time in the worst case and propose a heuristic that assigns a frequency to each task. In [28], a heuristic is proposed to assign each task a speed based on the critical path length. The heuristic has an exponential time complexity in the worst case since it enumerates all the possible scenarios when computing the critical path lengths. Umair et al. [40] propose a task mapping algorithm for periodic CTGs, and propose an NLP-based algorithm and a heuristic to assign voltages/frequencies to tasks.
Energy efficiency is not only critical in mobile embedded systems but also important in cloud computing. In cloud data centers, efficient power management may reduce operation costs, increase system reliability and reduce the adverse effects of large power consumption on the environment. Many approaches have been proposed to minimize energy consumption in the data center. Hasan et al. [19] formulate the problem of offline scheduling of jobs on the servers of a data center, such that the total energy consumption is minimized, as a binary integer program. They also propose an online heuristic for the same problem. Sarood et al. [35] propose an Integer Linear Programming (ILP)-based scheduler to reduce energy consumption in data centers. Huai et al. [20] propose a load balancing algorithm combined with DVFS to significantly reduce the energy consumption of data center servers. Roukh et al. [34] argue that database management systems (DBMSs) are one of the major energy consumers in data centers, and propose a machine learning-based approach to reduce the energy consumption of nodes of database clusters when optimizing queries. Xu et al. [42] propose an energy-aware query optimization platform called PET for DBMSs. PET estimates the energy costs of queries offline, and the evaluation engine of the DBMS configures PET parameters towards a desired energy/performance trade-off. Guo et al. [16] propose an energy-efficient query processing framework in DBMSs. Their approach works out energy cost query plans and makes a trade-off between the performance and the energy plans. The authors of [18] and [14] discuss in detail practices to reduce the energy footprint of data centers. Mittal et al. [30] give a survey of power management techniques for data centers.
Our approach differs from all the previous approaches in three major aspects. First, our approach considers conditional 
Models
The target application is modelled by a conditional task graph (CTG) [39]. A CTG is a weighted directed acyclic graph $G(V, E, A, X)$ defined as follows. $V = \{v_1, v_2, \ldots, v_n\}$ is a set of tasks. Each task $v_i$ has an execution time, represented by the number of clock cycles on each processor, and a deadline $d_i$. All the tasks are non-preemptible. $E \subseteq V \times V$ is a set of directed edges, each denoting the dependency between two tasks. $A$ is a set of triplets $(e_i, c_i, p(c_i))$, where $e_i \in E$, and $c_i$ and $p(c_i)$ represent the condition associated with $e_i$ and its probability [26], respectively. $X$ is a set of edge weights. An edge weight $\chi_s \in X$ of an edge $e_s = (v_i, v_j)$ represents the communication volume in bits from task $v_i$ to task $v_j$. Our task model is described in detail in [39]. The execution probability of each node $v_i \in V$ is represented by $p(v_i)$. We use the algorithm presented in [26] to compute the probabilities.
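As a rough illustration of the task model, the following Python sketch stores a CTG as a list of edges annotated with a condition, its probability and a communication volume. The class and field names are hypothetical and are not taken from [39] or [26]; the check that the conditional edges leaving a task have probabilities summing to one is an assumption about how fork conditions are specified, not the probability algorithm of [26].

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Edge:
    src: str
    dst: str
    volume_bits: float               # edge weight chi_s in X
    condition: Optional[str] = None  # None marks an unconditional edge
    probability: float = 1.0         # p(c_i); 1.0 for unconditional edges


@dataclass
class CTG:
    deadlines: Dict[str, float]      # task name -> deadline d_i
    edges: List[Edge] = field(default_factory=list)

    def successors(self, task: str) -> List[str]:
        return [e.dst for e in self.edges if e.src == task]

    def fork_probabilities_ok(self, tol: float = 1e-9) -> bool:
        # ASSUMPTION: conditions on the edges leaving one task are exhaustive,
        # so their probabilities should add up to 1.
        totals: Dict[str, float] = {}
        for e in self.edges:
            if e.condition is not None:
                totals[e.src] = totals.get(e.src, 0.0) + e.probability
        return all(abs(t - 1.0) <= tol for t in totals.values())


# Toy instance: v1 forks to v2 (condition a) or v3 (condition not a).
ctg = CTG(
    deadlines={"v1": 10.0, "v2": 20.0, "v3": 20.0, "v4": 30.0},
    edges=[
        Edge("v1", "v2", 100e6, condition="a", probability=0.6),
        Edge("v1", "v3", 200e6, condition="not a", probability=0.4),
        Edge("v2", "v4", 80e6),
        Edge("v3", "v4", 80e6),
    ],
)
```

Keeping the condition on the edge rather than on the node mirrors the triplets in $A$.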
The MPSoC has a set $P = \{pe_1, pe_2, \ldots, pe_m\}$ of $m$ processors. We assume heterogeneous processors, where each processor $pe_k \in P$ is DVFS-enabled and can operate on a set $\{(V_{dd_1}, f_1), \ldots, (V_{dd_{n_k}}, f_{n_k})\}$ of $n_k$ discrete voltage-frequency pairs. A matrix $NC$ represents the execution times in clock cycles of all the tasks in $G$ on different processors, where $NC(j, i)$ is the number of clock cycles of task $v_i$ on $pe_j$.
The dynamic power $P_{d_{k,i}}$ of a task $v_i$ on processor $pe_k$, dominated by the discharging and charging of load capacitance due to gate switching, is given as [4]
$$P_{d_{k,i}} = C_{eff_{k,i}} \cdot V_{dd_{k,i}}^{2} \cdot f_{k,i},$$
where $C_{eff_{k,i}}$, $V_{dd_{k,i}}$ and $f_{k,i}$ are the effective load switching capacitance, the supply voltage and the operating frequency, respectively. The execution time of a task $v_i$ on processor $pe_k$ operating at frequency $f_{k,i}$ is given as $t_{k,i} = NC(k, i)/f_{k,i}$. The operating frequency $f$ is approximated by
, where $K_1$, $K_2$, $K_6$ and $V_{th1}$ are circuit-dependent constants, $L_d$ is the logic depth, and $\alpha$ is the velocity saturation imposed by the technology used ($1.4 \le \alpha \le 2$). The total energy consumption $E_{k,i}$ of a task $v_i$ on $pe_k$ is computed as follows [4]:
We consider the 2D mesh NoC architecture, where each processor is associated with a router, and there are $N_R$ rows and $N_C$ columns. Every router has five ports, with one port used to communicate with the associated processor and the remaining four ports used to communicate with the neighboring routers. A link connecting two routers is called a global link, and a link connecting a router with its associated processor is referred to as a local link. All the links are full duplex. All the global links are identical and have the same link width (also called bus width or the number of wires) $b_w$.
We only take into account the energy consumption of global links and neglect the energy consumption of the local links. In the rest of this paper, links refer to global links unless explicitly specified otherwise.
The NoC links can operate at a set $\{(V_{dd_1}, f_1), \ldots, (V_{dd_F}, f_F)\}$ of voltage-frequency pairs. In a 2D mesh, the Manhattan distance $\eta_{i,j}$ between two processors $pe_i$ and $pe_j$ is defined as follows: $\eta_{i,j} = |x_i - x_j| + |y_i - y_j|$, where $(x_i, y_i)$ and $(x_j, y_j)$ are the coordinates of $pe_i$ and $pe_j$, respectively.
Wormhole switching [27], [24] and deterministic XY routing are used. We do not scale router frequencies, as adjusting router frequencies makes the problem too complex. We assume that router frequencies are fixed and commensurate with link frequencies as in [24].
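To make the mesh geometry concrete, the following Python sketch computes the Manhattan distance between two mesh positions and lists the global links a message would traverse under deterministic XY routing (all hops along the x dimension first, then along y). The function names and the tuple-based link representation are illustrative assumptions.

```python
from typing import List, Tuple

Coord = Tuple[int, int]  # (x, y) position of a processor/router in the mesh


def manhattan(a: Coord, b: Coord) -> int:
    """Manhattan distance eta between two mesh positions."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def xy_route(src: Coord, dst: Coord) -> List[Tuple[Coord, Coord]]:
    """Global links traversed under XY routing: x dimension first, then y."""
    links = []
    x, y = src
    while x != dst[0]:
        nx = x + (1 if dst[0] > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dst[1]:
        ny = y + (1 if dst[1] > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links


route = xy_route((0, 0), (2, 1))
```

The number of links on an XY route always equals the Manhattan distance between its endpoints, which is why $\eta_{i,j}$ can stand in for the hop count.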
Consider the message $e_i$ for a communication node. The time taken by $e_i$ on the links operating at frequency $f_i$, such that $e_i$ traverses the network without contention, is calculated as follows [24]:
We use the bit energy model given in [43], [29] for communication. Assume that the source node and the destination node of $e_i$ are mapped on processors $pe_s$ and $pe_d$, respectively. The energy of transmitting one bit of the message $e_i$ is
, where $E_{Rbit}$ is the energy consumption of one bit on one router, and $E_{lbit_i}$ is the energy consumption of transmitting one bit on one link when all the links of $e_i$ operate at $f_i$. Thus, the energy consumption of transmitting $e_i$ on the links operating at frequency $f_i$ is calculated as follows:
where $P_i$ is the total power consumed in transmitting one bit when the links that $e_i$ traverses operate at frequency $f_i$. $P_i$ is the sum of the dynamic power $P_{dyn_i}$ and the static power $P_{stat_i}$, $P_i = P_{dyn_i} + P_{stat_i}$ [4]. The static and dynamic powers depend on how links are implemented. The frequency $f_i$ is approximated by
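The quantities in this section can be illustrated numerically with a small Python sketch. It is written under stated assumptions and is not a substitute for the models of [4], [43] and [29]: the processor part uses only the standard CMOS dynamic power relation $P = C_{eff} V_{dd}^2 f$ together with $t = NC/f$ and ignores static power, and the per-bit communication energy assumes the common form in which a message crossing $\eta$ links passes $\eta + 1$ routers. All function names are hypothetical.

```python
def dynamic_power(c_eff: float, v_dd: float, f: float) -> float:
    """Standard CMOS dynamic power: C_eff * V_dd^2 * f."""
    return c_eff * v_dd ** 2 * f


def execution_time(cycles: float, f: float) -> float:
    """Execution time of a task needing `cycles` clock cycles at frequency f."""
    return cycles / f


def dynamic_energy(c_eff: float, v_dd: float, f: float, cycles: float) -> float:
    """Dynamic energy only (static/leakage energy is deliberately left out)."""
    return dynamic_power(c_eff, v_dd, f) * execution_time(cycles, f)


def bit_energy(hops: int, e_rbit: float, e_lbit: float) -> float:
    # ASSUMPTION: `hops` links are crossed and hops + 1 routers are visited.
    return (hops + 1) * e_rbit + hops * e_lbit


# 1e6 cycles at 1 GHz and 1.0 V with C_eff = 1 nF
e_task = dynamic_energy(1e-9, 1.0, 1e9, 1e6)
```

In this simplified view the frequency cancels out of the dynamic energy, so the saving from DVFS comes from the lower supply voltage that a lower frequency permits.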
Task Mapping, Scheduling and Frequency Assignment
In order to schedule tasks and communications in a unified way, we first transform a CTG $G$ into an extended CTG by adding an additional node for every edge in the original CTG. We refer to these additional nodes as communication nodes. The original nodes in $G$ are kept unchanged and are referred to as task nodes. Specifically, for each edge $(v_i, v_j) \in G$, we add
has the same condition and probability. The extended graph is represented by $G_e(V + V^*, E', A')$, where $V$ is a set of task nodes, $V^*$ is a set of communication nodes, $E'$ is a set of edges, and $A'$ is a set of 3-tuples where each 3-tuple consists of an edge, the condition associated with the edge and the probability of the condition. Figure 1(b) shows the extended graph $G_e$ of the CTG in Figure 1(a).
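The transformation into an extended graph can be sketched in Python as follows. The sketch replaces every edge by a communication node and two edges. Which of the two new edges carries the original condition is an assumption made here for illustration (the incoming edge keeps it, the outgoing edge is unconditional), and the node naming scheme is hypothetical.

```python
from typing import List, Optional, Tuple

# (source, destination, condition or None, probability)
CondEdge = Tuple[str, str, Optional[str], float]


def extend_ctg(edges: List[CondEdge]) -> Tuple[List[str], List[CondEdge]]:
    """Insert one communication node per edge of the original CTG."""
    comm_nodes: List[str] = []
    ext_edges: List[CondEdge] = []
    for src, dst, cond, prob in edges:
        c = f"c_{src}_{dst}"  # hypothetical name of the communication node
        comm_nodes.append(c)
        # ASSUMPTION: the condition stays on the edge entering the
        # communication node; the edge leaving it is unconditional.
        ext_edges.append((src, c, cond, prob))
        ext_edges.append((c, dst, None, 1.0))
    return comm_nodes, ext_edges


comm, ext = extend_ctg([("v1", "v2", "a", 0.6), ("v1", "v3", "not a", 0.4)])
```

Task nodes are untouched, so the extended graph has $|V| + |E|$ nodes and $2|E|$ edges.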
Successor-Tree-Consistent Deadline
Our offline scheduling algorithm schedules nodes using the priorities of task nodes and communication nodes. We extend the notion of successor-tree-consistent deadline [39] to NoC-based MPSoCs, and propose a priority scheme for nodes, where the priority of each node $v_i$ is its successor-tree-consistent deadline denoted by $d'_i$. When computing the successor-tree-consistent deadline of each node, we assume that all the processors and NoC links operate at the maximum frequencies. Furthermore, the original CTG is used rather than the extended graph $G_e$. Before defining the successor-tree-consistent deadline, we introduce the worst-case set of a task. Let $IPred(v_i)$ and $ISucc(v_i)$ be the sets of all the immediate predecessors and all the immediate successors of a task $v_i$, respectively.
Definition 1: The worst-case set of a task $v_i$, denoted by $WCS(v_i)$, is a set of tasks defined as follows:
Definition 2: Given a CTG $G$ and a task $v_i$, the successor tree of a task $v_i$ is a weighted directed tree ST(G,
Definition 3: Given a task $v_i$, if $v_i$ is a sink task, its successor-tree-consistent deadline $d'_i$ is equal to its preassigned deadline $d_i$. Otherwise, $d'_i$ is the upper bound on the latest completion time of $v_i$ in any feasible schedule of the relaxed problem instance: a set $V' = \{v_i\} \cup WCS(v_i)$ of tasks with the precedence constraints in the form of the successor tree $ST(G, v_i)$, where the deadline of each task $v_j \in WCS(v_i)$ is its successor-tree-consistent deadline, and the deadline of $v_i$ is its preassigned deadline, and the same MPSoC.
The successor-tree-consistent deadlines of all the tasks in G are computed as follows. 
Earliest Successor-Tree-Consistent Deadline First Algorithm
In a CTG, the number of scenarios grows exponentially as the number of conditions increases. Therefore, it is not feasible to construct a separate schedule for each scenario. Our offline scheduling approach constructs a single unified schedule for all the scenarios by exploiting the mutual exclusion relations between communication and task nodes. Two nodes are said to be mutually exclusive in the graph $G_e$ if they cannot coexist in any scenario. For example, in Figure 1(a), $v_5$ and $v_6$ are mutually exclusive. Two mutually exclusive nodes can be allocated the same resource at the same time. In a CTG, two nodes are said to be concurrent if they are not reachable from each other in graph $G_e$ and are not mutually exclusive.
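The two relations defined above can be checked with a short Python sketch. It assumes that every node comes with a precomputed guard, i.e. a mapping from fork nodes to the outcome that must hold for the node to execute; this guard representation and the function names are illustrative assumptions rather than part of the model in [39].

```python
from typing import Dict, List, Set

Guard = Dict[str, str]  # fork node -> outcome required for the node to run


def reachable(succ: Dict[str, List[str]], start: str) -> Set[str]:
    """All nodes reachable from `start` by following directed edges."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        n = stack.pop()
        for m in succ.get(n, []):
            if m not in seen:
                seen.add(m)
                stack.append(m)
    return seen


def mutually_exclusive(ga: Guard, gb: Guard) -> bool:
    """True if the guards require different outcomes of the same fork."""
    return any(f in gb and gb[f] != o for f, o in ga.items())


def concurrent(succ: Dict[str, List[str]], guards: Dict[str, Guard],
               a: str, b: str) -> bool:
    """Not reachable from each other and not mutually exclusive."""
    if b in reachable(succ, a) or a in reachable(succ, b):
        return False
    return not mutually_exclusive(guards[a], guards[b])


succ = {"v1": ["v2", "v3", "v4"], "v2": ["v5"], "v3": ["v5"]}
guards = {"v1": {}, "v2": {"v1": "a"}, "v3": {"v1": "not a"},
          "v4": {}, "v5": {}}
```

Here `v2` and `v3` may share a resource in the same time slot, whereas `v2` and `v4` are concurrent and would have to be serialized if they compete for one resource.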
We propose an Earliest-Successor-Tree-consistent Deadline First (ESTDF) list scheduling algorithm assuming that all processors and links operate at the maximum frequencies. ESTDF is called by our main algorithm IOETCS described in the next subsection. It determines the order in which task nodes and communication nodes execute and captures this order by adding additional precedence constraints in the input graph $G$. The output of ESTDF is the input graph $G$ with the additional precedence constraints. Given a CTG $G$, a matrix $NC$ of worst-case clock cycles of tasks, a vector $X$ of communication volumes and a task-to-processor mapping $Map$, ESTDF works by constructing a set ReadySet containing the source nodes of $G$ and repeating the following steps until ReadySet is empty.
1) Select a node $v_j$ with the minimum successor-tree-consistent deadline from ReadySet.
2) Compute its ready time $r_j = \max\{\zeta_l : v_l \in IPred(v_j)\}$, where $\zeta_l$ is the finish time of the node $v_l$.
3) If $v_j$ is a communication node, compute its finish time $\zeta_j = r_j + t_j$, where $t_j$ is given in Equation (2), and insert unconditional directed edges in $G$ from $v_j$ to the communication nodes that are concurrent to $v_j$, have
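The general shape of a deadline-driven list scheduler such as ESTDF can be sketched in Python as below. This is a deliberately simplified stand-in and not the ESTDF algorithm itself: it serializes nodes that share a resource by tracking when the resource becomes free instead of inserting extra precedence edges, and it ignores mutual exclusion. The argument names are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def list_schedule(pred: Dict[str, List[str]], succ: Dict[str, List[str]],
                  priority: Dict[str, float], duration: Dict[str, float],
                  resource: Dict[str, str]) -> Dict[str, Tuple[float, float]]:
    """Earliest-deadline-first list scheduling; returns node -> (start, finish).

    priority: deadline used as priority (smaller is scheduled first)
    duration: execution/communication time at the maximum frequency
    resource: processor or link identifier the node occupies
    """
    remaining = {n: len(pred.get(n, [])) for n in priority}
    ready = [(priority[n], n) for n in priority if remaining[n] == 0]
    heapq.heapify(ready)
    free_at: Dict[str, float] = {}
    times: Dict[str, Tuple[float, float]] = {}
    while ready:
        _, n = heapq.heappop(ready)  # node with the minimum deadline
        # ready time: latest finish time among immediate predecessors
        r = max((times[p][1] for p in pred.get(n, [])), default=0.0)
        start = max(r, free_at.get(resource[n], 0.0))
        finish = start + duration[n]
        times[n] = (start, finish)
        free_at[resource[n]] = finish
        for s in succ.get(n, []):
            remaining[s] -= 1
            if remaining[s] == 0:
                heapq.heappush(ready, (priority[s], s))
    return times


times = list_schedule(
    pred={"c": ["a"]}, succ={"a": ["c"]},
    priority={"a": 10.0, "b": 5.0, "c": 20.0},
    duration={"a": 2.0, "b": 3.0, "c": 1.0},
    resource={"a": "pe1", "b": "pe1", "c": "pe2"},
)
```

In the toy instance, `b` has the smaller deadline and therefore occupies `pe1` before `a`, which in turn delays `c`.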
Iterative Offline Energy-Aware Task and Communication Scheduling Algorithm (IOETCS)
We propose an iterative offline energy-aware task and communication scheduling algorithm (IOETCS), Algorithm 1, for a NoC-based MPSoC. IOETCS constructs a single unified schedule iteratively assuming continuous frequencies for both processors and links.
IOETCS repeats three major steps until all the nodes in $G_e$ are mapped and scheduled. First, it selects an unscheduled task node $v_i \in V$ with the smallest successor-tree-consistent deadline among all the unscheduled task nodes. Second, it initializes the initial energy consumption $E_{ini}$ of the schedule to infinity and repeats the following steps for every $pe_k \in P$:
1) Tentatively assign $v_i$ to the processor $pe_k$ by $Map[i] \leftarrow k$ and construct a sub-graph $G_s(V_s + V^*_s, E_s)$, where $V_s$ is the set of all the mapped task nodes, $V^*_s$ is a set of communication nodes with both child and parent nodes mapped on different processors, and $E_s$ is a set of all the edges where every edge in $E_s$ belongs to $E'$ and both its head and tail nodes are in $V_s + V^*_s$. For each communication node $v_s$ whose parent node $v_p$ and child node $v_c$ are mapped on the same processor, insert a directed edge $(v_p, v_c)$ to $E_s$.
ALGORITHM 1: IOETCS
input: CTG $G_e(V + V^*, E', A')$ with a matrix $NC$ and a set $X$, node deadlines, and a NoC-based MPSoC
output: Schedule graph $G^*(V_s + V^*_s, E_s)$, a vector $Map$ for task mapping, and a communication and task voltage assignment.
Construct a list $L$ of nodes in $V$ sorted in non-descending order of successor-tree-consistent deadlines;
Compute voltage assignment of nodes in $G_s$ and total expected energy $E_{exp}$ of $G_s$ by solving NLP;
2) Call $G_s \leftarrow ESTDF(G_s, NC, X, Map)$ to construct a local schedule and capture the resource constraints introduced by the local schedule.
3) Given a task-to-processor mapping and a graph $G_s$, assign voltages/frequencies to task and communication nodes by solving a non-linear programming (NLP) problem. The objective of the NLP is to minimize the total expected energy consumption of graph $G_s$. The expected energy consumption is given as
The NLP problem is formulated as follows:
Equations (4) and (5) are the task execution time and communication time constraints, respectively. Equation (6) is the deadline constraint, and Equations (7) and (8) are precedence constraints. Since the constraints and the objective function are convex, this NLP problem can be solved in polynomial time [44].
4) If the initial energy $E_{ini}$ is greater than $E_{exp}$, set $p \leftarrow k$, $G^* \leftarrow G_s$, and $E_{ini} \leftarrow E_{exp}$.
In the final step, IOETCS maps $v_i$ to processor $p$ and sets $Map[i] \leftarrow p$.
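The outer loop of this procedure is a greedy search over processors, which the following Python sketch illustrates. The callback `evaluate` is a hypothetical placeholder for steps 1) to 3) (sub-graph construction, ESTDF and the NLP) and is assumed to return the expected energy of a partial mapping, or infinity if no feasible assignment exists; the sketch does not implement those steps.

```python
import math
from typing import Callable, Dict, Hashable, List


def greedy_map(tasks_by_deadline: List[str], processors: List[Hashable],
               evaluate: Callable[[Dict[str, Hashable]], float]
               ) -> Dict[str, Hashable]:
    """Map tasks one by one, keeping the processor with the lowest energy."""
    mapping: Dict[str, Hashable] = {}
    for task in tasks_by_deadline:
        best_energy, best_pe = math.inf, None  # E_ini starts at infinity
        for pe in processors:
            mapping[task] = pe                 # tentative assignment
            energy = evaluate(mapping)         # stands in for E_exp
            if energy < best_energy:           # keep the best candidate so far
                best_energy, best_pe = energy, pe
        if best_pe is None:
            raise ValueError(f"no feasible processor for {task}")
        mapping[task] = best_pe                # final assignment of the task
    return mapping


# Toy cost model: a fixed energy per (task, processor) pair.
cost = {("v1", 0): 5.0, ("v1", 1): 3.0, ("v2", 0): 2.0, ("v2", 1): 4.0}
result = greedy_map(["v1", "v2"], [0, 1],
                    lambda m: sum(cost[(t, p)] for t, p in m.items()))
```

Because each task is fixed before the next one is considered, the loop calls `evaluate` once per task and processor pair rather than once per complete mapping.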
Discrete Frequency Assignment
Algorithm 1 constructs a graph $G^*(V_s + V^*_s, E_s)$ that captures the original precedence constraints and the constraints introduced by the schedule, and assigns an optimal frequency/voltage to each node in $G^*$. However, the frequency/voltage level assigned to a node may not be a valid discrete frequency/voltage of the processor/link where the node is mapped. Therefore, we propose an ILP-based algorithm and a polynomial-time heuristic for assigning a discrete frequency to each node.
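A natural first step when moving from a continuous frequency to discrete levels is to locate the two adjacent levels that bracket it. The Python sketch below does only that; whether the cases used by the ILP-based algorithm correspond exactly to this bracketing is an assumption, and the function name is hypothetical.

```python
import bisect
from typing import List, Optional, Tuple


def neighbours(f_opt: float, levels: List[float]
               ) -> Tuple[Optional[float], Optional[float]]:
    """Return the (lower, upper) discrete levels bracketing f_opt.

    `levels` must be sorted in ascending order. If f_opt is itself a valid
    level, both entries equal f_opt. A missing neighbour is None.
    """
    i = bisect.bisect_left(levels, f_opt)
    if i < len(levels) and levels[i] == f_opt:
        return f_opt, f_opt
    lower = levels[i - 1] if i > 0 else None
    upper = levels[i] if i < len(levels) else None
    return lower, upper


levels_ghz = [1.0, 1.5, 2.0]
```

Since $t_{k,i} = NC(k, i)/f_{k,i}$, choosing the upper neighbour can only shorten a node, while choosing the lower neighbour lengthens it and may therefore require other nodes to speed up to keep deadlines.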
ILP-Based Algorithm
The optimal frequency f if v i is a task node.
Let $V_{opt}$ be a set of nodes that lie in Case 1.
is a set of task nodes and $V^*_R = V^*_s \setminus V_{opt}$ is a set of communication nodes for which Case 2 holds. The expected energy consumption is now given as
(given in Equation (1)) are the energy consumptions of a task node $v_i$ on a processor $pe_k$ at the frequencies f , respectively and $C$ is the sum of energy consumption of nodes in $V_{opt}$. The ILP problem is formulated as follows:
The decision variables are task execution time $t_{k,i}$, communication time $t_i$, binary variable $x_i$ and start time $\rho_i$. Equations (14) and (15) collectively define the deadline constraints, and Equations (16), (17), (18) and (19) collectively define the precedence constraints.
Heuristic Algorithm
ILP is a well-known NP-complete problem. Therefore, the previous ILP-based algorithm is not scalable. Next, we propose a polynomial-time heuristic to assign discrete frequencies to task and communication nodes. The heuristic uses the schedule constructed by the IOETCS algorithm (Algorithm 1) and works as follows:
1) Compute the cuts of graph $G^*$ as follows:
• Create a copy $G'$ of $G^*$ and repeat the following steps until $G'$ is empty:
a) Create a cut containing all the source nodes with zero in-degree in $G'$.
b) Remove all the source nodes and their incident edges from $G'$.
2) For every node $v_i \in V_s + V^*_s$, if its optimal frequency computed by NLP is a discrete frequency, assign the optimal frequency to $v_i$. Otherwise, assign f
3) Construct a new local schedule using the new frequency such that the order between nodes remains the same as in the schedule used by the NLP-based algorithm.
4) If there is no late task node, the algorithm terminates. Otherwise, repeat the following steps until there is no late task node.
• Find the first late task node $v_j$ and repeat the following steps until $v_j$ is not late.
a) Find a set $B$ of nodes where every node $v_z \in B$ satisfies the following two conditions. First, $v_z$ belongs to the set $\{v_j\} \cup Pred(v_j)$, where $Pred(v_j)$ is a set of predecessors of $v_j$. Second, the frequency of $v_z$ has not been adjusted before and $v_z$ has not been assigned an optimal discrete frequency by NLP.
b) For every node $v_i \in B$, compute its rank. The rank of $v_i$ is a 2-tuple $(g_i, \kappa_i)$ which reflects the impact of $v_i$ on shifting the late node $v_j$ to an earlier time. Let $C_p$ be a set of nodes of a cut containing $v_i$, $C'_p$ be $C_p \cap B$, F T containing $v_i$, is given as
The normalized time gain $\kappa_i$ of $v_i$ is computed as: if $v_i$ is a task node. Update the schedule.
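Step 1) of the heuristic, the computation of cuts, amounts to peeling off the zero in-degree nodes of the schedule graph layer by layer. A minimal Python sketch of that step is given below; the function name and the edge-list input format are illustrative assumptions, and the later ranking steps are not covered.

```python
from typing import Dict, List, Tuple


def compute_cuts(nodes: List[str], edges: List[Tuple[str, str]]
                 ) -> List[List[str]]:
    """Repeatedly collect and remove all nodes with zero in-degree."""
    indeg: Dict[str, int] = {n: 0 for n in nodes}
    succ: Dict[str, List[str]] = {n: [] for n in nodes}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    remaining = set(nodes)  # plays the role of the working copy of the graph
    cuts: List[List[str]] = []
    while remaining:
        cut = sorted(n for n in remaining if indeg[n] == 0)
        if not cut:
            raise ValueError("graph contains a cycle")
        cuts.append(cut)
        for n in cut:
            remaining.discard(n)   # remove the source node ...
            for m in succ[n]:
                indeg[m] -= 1      # ... and its outgoing edges
    return cuts


cuts = compute_cuts(["a", "b", "c", "d"],
                    [("a", "c"), ("b", "c"), ("c", "d")])
```

Every node lands in exactly one cut, and no two nodes of the same cut are connected by an edge.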
Performance Evaluation
In this section, we use IOETCS-ILP and IOETCS-Heuristic to denote our approach using the ILP-based algorithm and the heuristic, respectively, for assigning a discrete frequency to each task and each communication. To demonstrate the effectiveness of IOETCS-ILP and IOETCS-Heuristic, we compare them with three approaches. The first approach is the Li-Wu approach, a state-of-the-art approach for the unconditional task graph model proposed in [24]. The second approach is ILP-vpv-flv, which is the same as IOETCS-ILP except that the NLP and ILP algorithms are modified such that they only scale processor frequencies/voltages and assign the maximum link frequency to all communication nodes. The third approach is ILP-fpv-vlv, which is the same as IOETCS-ILP except that the NLP and ILP algorithms are modified such that they only scale the voltages of links and assign the maximum processor frequencies to task nodes.
Simulation Setup
We use the same experimental setup as in [11], [7], [4]. The technology parameters are taken from [7]. We use two types of processors in our experiments, Type 1 and Type 2, modelled after the processors in [7] and [8], respectively. The configuration for the NoC links is adopted from [24]. The execution times in cycles of tasks are randomly generated within [ Table II The communication volumes are generated randomly within $[80, 800] \times 10^6$ bits. The deadline for each application is set to twice the makespan of the schedule of the application constructed by the IOETCS algorithm assuming the maximum processor frequencies, the maximum link frequencies and a common deadline of 300 seconds for all the tasks, so that there is reasonable slack for energy reduction. All the approaches are implemented in MATLAB version R2015a. We use the fmincon, quadprog and intlinprog solvers to solve the NLP, quadratic programming and ILP problems, respectively. The hardware platform consists of an Intel(R) Core(TM) i5-4570 CPU with a clock frequency of 3.20 GHz, 8.00 GB memory, and 3 MB caches.
Results and Discussion

Experiments with conditional task graphs:
In the first set of experiments we choose eight benchmarks and their details are given in Table II where x/y/z/D stands for the number of tasks, the number of OR-FORK tasks, the number of conditions and the deadline of the application in seconds, respectively. The column with heading Dim represents NoC dimensions. The benchmarks in Table II are the same benchmarks used in [26] .
IOETCS-ILP achieves an average improvement of 31%, a maximum improvement of 62% for CTG 7 and a minimum improvement of 1.03% for CTG 1 over ILP-vpv-flv. It achieves an average improvement of 27%, a maximum improvement of 61% for CTG 3 and a minimum improvement of 7.9% for CTG 6 over ILP-fpv-vlv. IOETCS-Heuristic achieves an average improvement of 23%, a maximum improvement of 40% for CTG 5 and a minimum improvement of 1.3% for CTG 1 in comparison to ILP-vpv-flv. It achieves an average improvement of 18%, a maximum improvement of 61% for CTG 3 and a minimum improvement of 4% for CTG 6 over ILP-fpv-vlv. We observe that ILP-vpv-flv performs significantly better in terms of energy consumption if the computation energy dominates the total energy, and ILP-fpv-vlv performs better if communication energy dominates the total energy. voltages and the link voltages. We choose two real-world benchmarks, vehicle cruise controller [33] and Robot control [2], that are the task graphs of actual applications. These benchmarks are executed on a 3x3 NoC where the processors are selected randomly as either Type 1 or Type 2. Both IOETCS-ILP and IOETCS-Heuristic perform significantly better than ILP-vpv-flv and ILP-fpv-vlv in terms of energy consumption. In terms of running time, IOETCS-ILP and IOETCS-Heuristic take longer than ILP-vpv-flv and ILP-fpv-vlv. The reason is that the IOETCS algorithm cannot find a feasible solution for some sub-problems, and thus the solver takes a longer time to converge.
Experiments with non-conditional task graphs:
To demonstrate the effectiveness of our approach on task graphs without conditional precedence constraints, we have conducted a second set of experiments. We choose eight task graphs (TG) and their details are given in Table I, where a/b/D stand for the number of tasks, the number of edges and the deadline of the application in seconds, respectively. The column with the heading Dim represents NoC dimensions. The benchmarks in Table I are the same benchmarks used in [26] except that all the edges are treated as unconditional edges. Figure 6(a) gives a comparison of the 8 benchmarks in Table I in terms of energy consumption where all the processors are of Type 1. IOETCS-ILP achieves an average improvement of 31%, a maximum improvement of 61% for TG 6 and a minimum improvement of 9% for TG 1 over the Li-Wu approach. IOETCS-Heuristic achieves an average improvement of 20%, a maximum improvement of 46% for TG 4 and a minimum improvement of 2% for TG 1 over the Li-Wu approach. We observe that the Li-Wu approach makes very poor mapping decisions for heterogeneous processors. The benchmarks TG 3, TG 4, TG 6 and TG 8 are executed on MPSoCs where the processors are randomly selected as either Type 1 or Type 2. The reason for the poor performance of the Li-Wu approach is that it does not take into account the energy profiles of processors when making mapping decisions. The benchmarks TG 1, TG 2, TG 5 and TG 8 are executed on MPSoCs with homogeneous processors (Type 1). As a result, the Li-Wu approach performs considerably better. In terms of running time, IOETCS-ILP and IOETCS-Heuristic run approximately three times faster than the Li-Wu approach. The major reason is that the genetic algorithm takes significantly longer, as it constructs a new schedule for each candidate solution using ETFGBF.
We have chosen two real-world benchmarks JPEG encoder [22] and Automatic Target Recognition (ATR) [24] . JPEG encoder is executed on a 3x3 MPSoC and ATR is executed on a 4x5 MPSoC. The processors are randomly selected as either Type 1 or Type 2. For both benchmarks, IOETCS-ILP and IOETCS-Heuristic outperform Li-Wu approach in terms of both running time and energy consumption.
We observe that the energy consumption of the schedules produced by IOETCS-Heuristic is close to that of the schedules produced by IOETCS-ILP. IOETCS-ILP achieves an average improvement of 11% over IOETCS-Heuristic in terms of energy consumption for all the problem instances. In terms of running time, IOETCS-Heuristic runs slightly faster than IOETCS-ILP.
Conclusion
We investigate the problem of energy-aware mapping and scheduling of tasks and communications with conditional precedence constraints and individual deadlines on a heterogeneous NoC-based MPSoC and propose a novel approach. Our approach reduces the total expected energy consumption by collectively optimizing the voltages/frequencies of processors and NoC links. The IOETCS algorithm maps tasks to processors and serializes communications that use the same communication links. It constructs a unified schedule and assigns voltages/frequencies to tasks and communications collectively assuming continuous voltages/frequencies. The IOETCS algorithm significantly narrows down the search space for our ILP-based algorithm and our heuristic for assigning discrete frequencies/voltages to tasks and communications. The experimental results show that, in terms of energy consumption, our approach using either the ILP or the heuristic outperforms the state-of-the-art approach proposed by Li and Wu [24] that considers only unconditional task graphs. Compared to the state-of-the-art approach, our ILP-based approach achieves an average improvement of 31%, a maximum improvement of 61% and a minimum improvement of 9%, and our heuristic-based approach achieves an average improvement of 20%, a maximum improvement of 46% and a minimum improvement of 2%. In terms of running time, our approach is approximately 3 times faster than the state-of-the-art approach.
