ABSTRACT Network-on-Chip (NoC)-based multiprocessor systems-on-chip (MPSoCs) are becoming the de facto computing platform for computationally intensive real-time applications in embedded systems due to their high performance, exceptional quality-of-service (QoS) and energy efficiency compared with superscalar uniprocessor architectures. Energy saving is important in embedded systems because it reduces the operating cost, prolongs the lifetime and improves the reliability of the system. In this paper, contention-aware energy-efficient static mapping on a NoC-based heterogeneous MPSoC for real-time tasks with individual deadlines and precedence constraints is investigated. Unlike other schemes, task ordering, mapping, and voltage assignment are performed in an integrated manner to minimize the processing energy while explicitly reducing both the contention between communications and the communication energy. Furthermore, both dynamic voltage and frequency scaling and dynamic power management are used to optimize energy consumption. The developed contention-aware integrated task mapping and voltage assignment (CITM-VA) static energy management scheme orders tasks using the earliest latest finish time first (ELFTF) strategy, which gives tasks with a shorter latest finish time (LFT) priority over tasks with a longer LFT. It remaps every task to a processor and/or discrete voltage level that reduces processing energy consumption. Similarly, the communication energy is minimized by assigning discrete voltage levels to the NoC links. Total energy efficiency is further improved by putting the processor into a low-power state when feasible. Moreover, this approach resolves the contention between communications that traverse the same link by allocating links to communications with higher priority.
The results obtained through extensive simulations of real-world benchmarks demonstrate that the CITM-VA approach outperforms a state-of-the-art technique and achieves an average total energy improvement of ∼30%. Additionally, it maintains high QoS and robustness for real-time applications.
I. INTRODUCTION
Over the past decade, energy dissipation in Systems-on-Chip (SoCs) has become a critical design constraint as it limits performance, reliability and battery life [1]. Therefore, energy management techniques such as Dynamic Voltage and Frequency Scaling (DVFS) and Dynamic Power Management (DPM) are adopted to optimize the energy consumption. DVFS concurrently reduces the supply voltage and frequency of a DVFS-enabled processor when the performance requirement is low [1], [2]. DPM shuts down the processor or switches it to a low-power state, i.e. sleep mode, when it is idle and wakes it up when needed [3]. Modern embedded systems use SoCs which integrate processors, memory, advanced peripherals and power management circuitry on a single chip [4].
Multiprocessor Systems-on-Chip (MPSoCs) are widely deployed in high-performance computing and application-specific embedded systems such as gaming and aerospace for real-time response. Moreover, they offer energy efficiency and performance advantages over uniprocessor architectures [1], [5], [6]. Thus, MPSoCs are becoming the computing engines in embedded systems for real-time applications.
A dramatic increase in their use is expected in the upcoming years, and there will be hundreds of processors on a single chip [5]. Consequently, Network-on-Chip (NoC) will replace traditional bus-based communication, which suffers from limited bandwidth, high latency and poor scalability [7].
Heterogeneous MPSoCs significantly reduce the energy consumption and enhance the performance compared to homogeneous MPSoCs [8]. They contain interconnected DVFS-enabled processors exhibiting different power-performance profiles and computing capabilities [9], [10]. Samsung Exynos 5422 is a popular example of a heterogeneous MPSoC, used in the Galaxy S5 smartphone. It contains high-performance ARM Cortex-A15 processors and energy-efficient ARM Cortex-A7 processors [11]-[13]. Other commercial heterogeneous MPSoCs include Cell by IBM and Toshiba, Texas Instruments OMAP™ 1510 and OMAP™ 3630, Apple A5X, and SHAPES, a NoC-based heterogeneous MPSoC [14].
Mapping is the allocation of a set of tasks representing an application to the processors of an MPSoC architecture in order to either reduce the energy consumption or enhance the performance of the system [15]. Task mapping on MPSoCs is a well-known NP-hard problem [16]. Therefore, no efficient algorithm for finding an optimal solution is known, and heuristics are deployed to obtain a near-optimal solution [17]. Task mapping approaches use different formulations based on Mixed Integer Linear Programming (MILP) [18], Integer Linear Programming (ILP) [7], [19]-[21] and Non-Linear Programming (NLP) [22]. Additionally, search-based algorithms have been developed using Genetic Algorithm (GA) [23], [24], Ant Colony Optimization (ACO) [25], Particle Swarm Optimization (PSO) [19], [26] and Simulated Annealing (SA) [21], [27]. Among these algorithms, GA is a popular and widely adopted algorithm for task mapping on MPSoCs.
Task mapping is either dynamic or static. In dynamic mapping, the tasks are assigned to the processors at runtime, while in static mapping the task set is allocated to the processors before the embedded system runs [28]. Dynamic mapping can fully utilize the available resources of the embedded system but significantly increases the optimization complexity [29]. Static mapping can be used in various real-time Internet-of-Things (IoT) based multimedia applications, for example human gait analysis [30], remote ultrasound systems [31], object or person tracking [32], surveillance [33], [34] and human recognition [35]. The IoT paradigm has enabled embedded systems (smart nodes, sensors, actuators) to connect to the Internet using networking technologies for monitoring and controlling physical and/or environmental conditions, as shown in FIGURE 1 [36]-[38]. Quality-of-Service (QoS) is an important concept for energy-constrained embedded systems in IoT because degradation of QoS is unacceptable in health-care and safety-critical applications [39]-[41]. Therefore, it is important that QoS is maintained at all times along with energy savings in IoT. In this work, a NoC-based heterogeneous MPSoC architecture with DVFS-enabled processors is considered, and contention- and energy-aware static mapping for real-time Directed Acyclic Graph (DAG) tasks with individual deadlines and precedence constraints is studied.
First, task mapping, scheduling, and voltage scaling are performed in an integrated manner, unlike other approaches which map the tasks first and then assign voltage levels separately. Our Contention-aware Integrated Task Mapping and Voltage Assignment (CITM-VA) scheme guides the mapping of tasks and communications to a more energy-efficient solution. Moreover, both DVFS and DPM are integrated to reduce the total energy consumption.
Second, the proposed CITM-VA static energy management scheme saves communication energy by minimizing the communication over the NoC. It further reduces the communication energy by scaling the voltage levels of the NoC links and assigning each communication a voltage level such that the communication energy is minimized. Hence, the available slack is efficiently shared between the communications and tasks. Furthermore, contentions among concurrent tasks and communications are resolved by prioritizing the execution of high-priority tasks and communications over low-priority ones.
Third, our experimental results are generated from simulations conducted on five real-world benchmarks adopted from the Embedded Systems Synthesis Benchmarks (E3S) [42]. The results are compared to the state-of-the-art Energy-efficient Contention-aware Mapping (ECM) static energy management scheme developed by Li and Wu [24]. The proposed CITM-VA approach outperforms ECM in terms of energy savings and QoS. Compared to ECM, CITM-VA reduces the average total energy consumption by ∼30%.
The rest of the paper is organized as follows: Section II reviews existing task mapping and scheduling approaches for multiprocessor systems. Preliminaries are discussed in Section III. Section IV presents the proposed static contention- and energy-aware scheme. The results are examined in Section V, and Section VI concludes this paper.
II. RELATED WORK
Motivated by the fact that MPSoCs are a high-performance green computing platform, several studies have investigated the static task mapping and scheduling problem. Static task mapping and scheduling strategies targeting energy savings can be categorized into those reducing (A) computation energy, (B) communication energy and (C) total energy.
A. COMPUTATION ENERGY REDUCTION
Srinivasan and Chatha [27] developed an SA-based energy optimization approach called (LPPWU)_sa for an MPSoC platform with DVFS-enabled homogeneous processors using bus-based communication. (LPPWU)_sa reduced the overall run time and combined DVFS and DPM to achieve maximum energy savings. This approach first combines only DVFS with the scheduling and then separately deploys DPM as the final step to minimize the overall energy consumption.
Tosun [43] mapped periodic independent tasks on heterogeneous MPSoC architecture to reduce computation energy consumption. A heuristic algorithm using the Earliest Deadline First (EDF) strategy is deployed for task mapping and optimal voltage levels are assigned using DVFS. The investigation assumes independent task model and does not perform experiments on dependent tasks. Moreover, task duplication is used which can negatively affect the memory usage of the MPSoC platform.
Chen et al. [18] formulated the energy optimization problem as an MILP using a NoC-based homogeneous MPSoC architecture for dependent real-time tasks represented by a DAG. A non-preemptive schedule is generated and a discrete voltage level is assigned to each task to optimize energy consumption. However, this study considers neither the communication energy overhead, i.e. inter-processor communications, nor a heterogeneous MPSoC architecture for further energy savings.
An ILP-based meta-heuristic called Shuffled Frog Leaping Algorithm (SFLA) is proposed by Zhang et al. [19] to map real-time tasks on a bus-based MPSoC platform consisting of heterogeneous processors. The study focuses only on real-time independent periodic applications and does not consider inter-task communication or task precedence constraints.
Tariq and Wu [22] investigated the problem of energy-aware scheduling of tasks with conditional precedence constraints on a shared-memory homogeneous MPSoC. Their objective was to minimize the total computation energy. Their approach first maps the tasks to processors and then performs voltage scaling. They proposed an NLP-based algorithm that assigns an optimal continuous voltage level to each task given an initial schedule. The major drawback of their approach is the assumption that processors can operate at any voltage level between the minimum and maximum voltage and frequency levels. This is not a practical assumption, as the processors in MPSoCs only operate at discrete voltage and frequency levels.
B. COMMUNICATION ENERGY OPTIMIZATION
Shin and Kim [44] considered a NoC with voltage scalable links and proposed a GA based approach for reducing the communication energy by finding the optimal voltages for communications on the links. Similarly, Wang et al. [45] studied the static mapping approach deploying Adaptive Genetic Algorithm (AGA) for communication energy management. Chou and Marculescu [46] improved the contention of NoC architecture for MPSoCs platform using DAG. Mapping problem is formulated as an ILP for improving congestion control efficiency and providing best-effort communication in the network. A Linear Programming (LP) based heuristic is also developed to overcome the scalability issue. However, these studies have not considered the reduction of computation energy consumed by the processors of MPSoCs architecture.
Wang et al. minimized the inter-processor communication overhead and improved memory usage and throughput on a homogeneous MPSoC platform with a traditional bus-based interconnect. First, intra-data dependencies are transformed into inter-data dependencies in the DAG. Second, Heuristic Memory-Aware Task Scheduling (HMATS) is proposed to obtain a near-optimal task schedule [45]. Since the scheduling heuristic focuses only on traditional bus-based communication, it cannot be extended to NoC-based multiprocessor systems.
Sing et al. [47] proposed a contention-aware, energy-efficient, duplication-based mixed integer programming (CEED-MIP) formulation to schedule task graphs on a heterogeneous NoC-MPSoC architecture. This approach duplicates some tasks on the processors to reduce the inter-processor communication energy and avoid traffic congestion. The study does not minimize the processing energy consumption. Furthermore, duplication of tasks adversely affects the overall system energy savings, an effect that is not considered in the energy model.
C. TOTAL ENERGY SAVINGS
Wang et al. [20] optimized communication and computation energy for real-time streaming DAG tasks using a bus-based homogeneous MPSoC architecture. The task mapping and scheduling problem is formulated as an ILP and the schedule length is minimized by reducing the inter-processor communication overhead. In another study, Wang et al. deployed a homogeneous MPSoC architecture for real-time streaming applications and integrated task-level coarse-grained software pipelining with DPM and DVFS to reduce the total energy consumption. A two-phase energy optimization algorithm is developed. In the first phase, the DAG is transformed into an independent task model using re-timing. In the second phase, a GA-based algorithm known as GeneS is used to find a feasible schedule, and DVFS and DPM are used to reduce the computation and inter-processor communication energy consumption [23]. Although processing and communication energies are minimized, neither study considers NoC-based communication. Furthermore, both studies assume that all processors are homogeneous.
Huang et al. [21] extended the ILP formulation to both communication and processing energies optimization of NoC-MPSoCs platform with heterogeneous processors.
Li and Wu [24] proposed a two-step contention- and energy-aware real-time task mapping on a NoC-based homogeneous MPSoC architecture. First, task mapping is formulated as a quadratic binary programming problem that minimizes the communication energy. Second, voltage levels are assigned to each task and communication using a GA.
Tariq et al. [7] used a NoC-based heterogeneous MPSoC and minimized the total energy consumption of tasks with conditional precedence constraints. They proposed an Iterative Offline Energy-aware Task and Communication Scheduling (IOETCS) algorithm that collectively performs scheduling and voltage scaling. Given an initial schedule generated by the Earliest Successor-Tree-Consistent Deadline First algorithm, their NLP-based algorithm assigns each task and communication optimal voltage and frequency levels within a continuous voltage and frequency range. The optimal continuous voltage and frequency levels are then mapped to valid discrete voltage and frequency levels using either an ILP-based or a heuristic-based algorithm. However, their approach does not integrate DPM with DVFS and assumes that no energy is consumed during the idle time slots.
Unlike the aforementioned studies, we consider a NoC-based heterogeneous MPSoC architecture with distributed memory for real-time dependent tasks represented by a DAG. Moreover, we reduce both processing and communication energy consumption and integrate DVFS and DPM in our developed CITM-VA heuristic while explicitly considering NoC link contention. To the best of our knowledge, this is the first approach to solve this problem.
III. PRELIMINARIES
In this section we introduce models on which the proposed contention and energy-aware static task mapping approach CITM-VA is based. In this paper tiles and processors are used interchangeably.
A. APPLICATION MODEL
A real-time application with dependent tasks can be modeled by a Directed Acyclic Graph (DAG) as shown in FIGURE 2. A DAG $G(V, E, X)$ is an edge-weighted task graph. $V = \{v_1, v_2, \ldots, v_n\}$ is a set of nodes and each node $v_i \in V$ represents a task (a sequential chunk of execution). Each task $v_i$ has an execution requirement in worst-case clock cycles on a processor $pe_j$, represented by $NCC_{(i,j)}$, as well as an implicit deadline. $E \subseteq V \times V$ denotes a set of edges where each edge $(v_i, v_j) \in E$ represents a data dependency relation between tasks $v_i$ and $v_j$. $X$ is a set of edge weights. Thus, $\chi_{(i,j)}$, the edge weight of an edge $(v_i, v_j)$, is the volume of data sent from $v_i$ to $v_j$ in units of bits. Each task in the DAG has an individual deadline.
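As an illustration of the application model, the task graph $G(V, E, X)$ can be held in a small data structure. The following Python sketch is only one possible representation; the class name `TaskGraph`, its field names and the use of Kahn's algorithm to check acyclicity are assumptions introduced here and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class TaskGraph:
    """Edge-weighted DAG G(V, E, X) (hypothetical representation)."""
    ncc: dict        # ncc[(task, pe)] -> worst-case clock cycles NCC_(i,j)
    deadline: dict   # deadline[task] -> individual deadline of the task
    edges: dict = field(default_factory=dict)  # edges[(vi, vj)] -> bits sent

    def tasks(self):
        return sorted(self.deadline)

    def successors(self, v):
        return [j for (i, j) in self.edges if i == v]

    def predecessors(self, v):
        return [i for (i, j) in self.edges if j == v]

    def topological_order(self):
        """Kahn's algorithm; raises ValueError if the graph has a cycle."""
        indeg = {v: len(self.predecessors(v)) for v in self.tasks()}
        ready = [v for v in self.tasks() if indeg[v] == 0]
        order = []
        while ready:
            v = ready.pop(0)
            order.append(v)
            for s in self.successors(v):
                indeg[s] -= 1
                if indeg[s] == 0:
                    ready.append(s)
        if len(order) != len(indeg):
            raise ValueError("graph contains a cycle")
        return order
```

A valid precedence order exists only when the graph is acyclic, which is why the sketch rejects cyclic inputs before any scheduling step would use them.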
B. PLATFORM MODEL
A NoC-based MPSoC consisting of $k$ heterogeneous tiles is considered, as shown in FIGURE 3(a). Each tile comprises a local memory, a processor (pe) and a network interface. Therefore, the MPSoC contains a set $P = \{pe_1, pe_2, \ldots, pe_k\}$ of $k$ DVFS-enabled processors. The links that connect a router (represented by R) to a tile are termed local links, while global links interconnect the routers with each other for data communication. Each processor $pe_i \in P$ can operate at a set of discrete voltage/frequency levels. Moreover, each processor supports DPM and can be switched into different power modes.
C. COMMUNICATION INTERCONNECT MODEL
A 2D mesh topology NoC architecture is assumed for inter-processor communication. It consists of $N_x$ rows and $N_y$ columns of routers; therefore, the number of routers $k$ is equal to $N_x \times N_y$. Each router has five ports to communicate with neighboring routers and a processor, as shown in FIGURE 3(b). Each port has a link and a buffer. All links are identical and full duplex with bandwidth $b_w$. Similar to the processors, the links in the NoC can operate at $n$ discrete voltage/frequency levels.
We assume the simple and energy-efficient Wormhole (WH) packet switching technique for NoC communication. WH splits a data packet into small pieces called flits, which are delivered through the network in a pipelined fashion. Furthermore, we assume the widely used deterministic XY routing scheme. The distance between two processors $pe_i$ and $pe_j$ in the 2D mesh is given by the Manhattan distance $\eta_{i,j} = |x_i - x_j| + |y_i - y_j|$, where $(x_i, y_i)$ are the coordinates of processor $pe_i$ and $(x_j, y_j)$ are the coordinates of processor $pe_j$ in the mesh.
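The hop count and the links used under deterministic XY routing can be sketched as follows. This is a minimal illustration: the function names are hypothetical, tiles are identified by their $(x, y)$ coordinates, and each direction of a full-duplex link is treated as a separate resource, which is an assumption made here rather than a statement from the paper.

```python
def hop_count(src, dst):
    """Manhattan distance between two tiles given as (x, y) coordinates."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def xy_route(src, dst):
    """Directed links traversed under XY routing: first along x, then y."""
    x, y = src
    links = []
    while x != dst[0]:
        nx = x + (1 if dst[0] > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dst[1]:
        ny = y + (1 if dst[1] > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links


def in_conflict(route_a, route_b):
    """Two communications contend if their routes share a directed link."""
    return bool(set(route_a) & set(route_b))
```

Because XY routing is deterministic, the set of links used by a communication is fixed once its source and destination tiles are known, so potential contention can be identified statically from the mapping.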
D. ENERGY MODEL
We consider the energy consumed by processors, routers and network links in our energy model. The dynamic power $P_d^i$ dissipated in executing a task $v_i$ on a processor $pe_j$ is given by the following equation [22], [48]:

$$P_d^i = C_{eff}^i \cdot (V_{dd}^j)^2 \cdot f_j \tag{1}$$

where $V_{dd}^j$, $f_j$, and $C_{eff}^i$ denote the supply voltage, operating frequency and effective load switching capacitance, respectively. Equation (1) shows that the dynamic power decreases when the supply voltage is reduced; thus, it serves as the baseline for DVFS. The total power $P_T^i$ dissipated in executing a task on $pe_j$ is the sum of the dynamic power, the static power and the inherent power $P_{on}$ required to keep the processor on, i.e. the idle power when no task is running on the processor. At $V_{dd}^j$ and $f_j$, $P_T^i$ is calculated as follows [22]:

$$P_T^i = C_{eff}^i (V_{dd}^j)^2 f_j + L_g \left( V_{dd}^j K_3 e^{K_4 V_{dd}^j} e^{K_5 V_{bs}} + |V_{bs}| I_j \right) + P_{on} \tag{2}$$

where $K_3$, $K_4$ and $K_5$ are technology-dependent constants, $L_g$ is the number of logic gates in the circuit, and $I_j$ and $V_{bs}$ are the junction leakage current and body-bias voltage, respectively. As $E = P \times t$, where $t$ denotes time, the energy $E_i$ consumed in executing a task $v_i$ on $pe_j$ is given as follows [7], [18]:

$$E_i = P_T^i \cdot t_i \tag{3}$$

where $t_i$ is the execution time of $v_i$ and is given by $t_i = NCC_{(i,j)} / f_j$. The operating frequency $f$ and supply voltage $V_{dd}$ are related by the following equation [48]:

$$f = \frac{(V_{dd} - V_{th})^{\alpha}}{K_6 L_d} \tag{4}$$

where $V_{th}$ is the threshold voltage, $K_6$ is a process-dependent constant, $L_d$ is the logic depth of the processor critical path and $\alpha$ reflects velocity saturation ($1.4 \le \alpha \le 2$). So, according to Equation (4), the energy consumption depends on the voltage and frequency level assigned to the tasks. The energy consumed by the processor when it is idle in active mode is given as follows:

$$E_{idle} = P_a \cdot t_{idle} \tag{5}$$

where $P_a$ is the power consumed by the processor in active mode and $t_{idle}$ is the time period for which the processor is idle. Similarly, the energy consumed by a processor in sleep mode is calculated by:

$$E_{sleep} = P_{sleep} \cdot t_{sleep} \tag{6}$$

where $P_{sleep}$ is the power dissipated by the processor in sleep mode and $t_{sleep}$ is the duration for which the processor stays in sleep mode. Since $P_a > P_{sleep}$, energy efficiency can be achieved by switching the processor into sleep mode.
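The processor energy terms can be evaluated numerically as in the sketch below. It assumes the commonly used leakage expression built from the symbols $L_g$, $K_3$, $K_4$, $K_5$, $V_{bs}$ and $I_j$ named in this section; the function names are hypothetical and the constants in `PARAMS` are round illustrative placeholders, not the 70 nm values of TABLE 3.

```python
import math

# Hypothetical placeholder constants for illustration only (not TABLE 3).
PARAMS = dict(C_eff=1.0e-9, L_g=1.0e6, K3=1.0e-7, K4=1.0, K5=1.0,
              I_j=1.0e-10, V_bs=-0.5, P_on=0.1)


def dynamic_power(v_dd, f, p=PARAMS):
    """Dynamic power: C_eff * V_dd^2 * f."""
    return p["C_eff"] * v_dd ** 2 * f


def static_power(v_dd, p=PARAMS):
    """Assumed leakage model: L_g * (V_dd * I_sub + |V_bs| * I_j)."""
    i_sub = p["K3"] * math.exp(p["K4"] * v_dd) * math.exp(p["K5"] * p["V_bs"])
    return p["L_g"] * (v_dd * i_sub + abs(p["V_bs"]) * p["I_j"])


def task_energy(ncc, v_dd, f, p=PARAMS):
    """Energy of a task with ncc worst-case cycles run at (v_dd, f)."""
    t = ncc / f                       # execution time t_i = NCC / f
    total = dynamic_power(v_dd, f, p) + static_power(v_dd, p) + p["P_on"]
    return total * t


def idle_energy(p_active, t_idle):
    """Energy spent idling in active mode: P_a * t_idle."""
    return p_active * t_idle


def sleep_energy(p_sleep, t_sleep):
    """Energy spent in sleep mode: P_sleep * t_sleep."""
    return p_sleep * t_sleep
```

Note that lowering the frequency stretches the execution time, so the static and $P_{on}$ contributions grow while the dynamic contribution shrinks; the level with the lowest total energy is therefore not necessarily the lowest voltage level.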
However, there are switching costs associated with transitioning the processor between active and sleep modes. The processor break-even time $t_{BET}$ represents the shortest idle interval that justifies the processor's transition from active mode to sleep mode. Thus, if the idle interval is shorter than $t_{BET}$, the mode-switch overheads are larger than the energy saving and the transition to the low-power mode should be avoided. The definition of $t_{BET}$ is given as follows:

$$t_{BET} = \max\left( t_{sw}, \frac{E_{sw} - P_{sleep} \cdot t_{sw}}{P_a - P_{sleep}} \right) \tag{7}$$

where $t_{sw}$ and $E_{sw}$ are the total switching time and the total switching energy overhead, respectively. Under a fixed operating frequency $f_l$ of the links, the time taken to transmit the message ($\chi_{(i,j)}$) of a communication is in general dominated by the serialization delay. The time $t_i^{com}$ for transmitting the message of a communication $e_i$
is given as follows [7]:
Let the parent node $v_s$ of $e_i$ be mapped on $pe_i$ and its child node $v_k$ be mapped on $pe_j$. Then the hop count between $pe_i$ and $pe_j$ is $\eta_{i,j}$. The energy consumed in transmitting one bit of a message $e_i$ is $E_{bit} = E_{Rbit}(\eta_{i,j} + 1) + \eta_{i,j} E_{lbit}$, where $E_{Rbit}$ is the energy consumed by a router in transmitting one bit and $E_{lbit}$ is the energy consumed by a link in transmitting one bit. The energy $E_c^i$ consumed in transmitting a data volume $\chi_{(i,j)}$ is calculated as follows [7], [24]:
where $P_i$ denotes the total power consumption for one bit on the links that $e_i$ traverses at $f_i$. Thus, $P_i$ is the sum of the static power ($P_s$) and the dynamic power ($P_d$), i.e. $P_i = P_d^i + P_s^i$ [48]. Inserting Equation (10) in Equation (9) yields the following equation:
Equation (11) indicates that the communication energy can be reduced by assigning optimal discrete frequency or voltage levels (as voltage and frequency are interchangeable according to Equation (4)) to the mesh topology NoC links for transmitting the data.
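The per-bit energy expression $E_{bit} = E_{Rbit}(\eta + 1) + \eta E_{lbit}$ stated above can be evaluated directly, as in the following sketch. The function names are hypothetical, and `message_energy` multiplies the bit energy by the data volume, which is a simplifying assumption that covers only the bit-transfer component and ignores the voltage-dependent link power terms.

```python
def bit_energy(hops, e_rbit, e_lbit):
    """E_bit = E_Rbit * (hops + 1) + hops * E_lbit."""
    return e_rbit * (hops + 1) + hops * e_lbit


def message_energy(volume_bits, hops, e_rbit, e_lbit):
    """Assumed bit-transfer energy of a message: volume times E_bit."""
    return volume_bits * bit_energy(hops, e_rbit, e_lbit)
```

Under this expression, every additional hop adds one router and one link traversal per bit, so mapping heavily communicating tasks onto nearby tiles reduces communication energy.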
IV. CONTENTION AND ENERGY-AWARE APPROACH
Energy optimization of real-time tasks with precedence and deadline constraints depends on the mapping because a heterogeneous MPSoC architecture consists of processors with different performance profiles. Moreover, energy saving depends on the order in which the tasks and communications are executed. Thus, a significant energy reduction can be achieved by prioritizing tasks and communications with shorter deadlines. This priority assignment strategy enables the DVFS technique to efficiently utilize the available slack by assigning lower voltage levels to the tasks. Moreover, the task and communication deadlines of the real-time application are not violated by the DVFS technique, which ensures high QoS. Furthermore, DPM also plays a vital role when the processor's leakage power in idle mode is taken into consideration, and energy efficiency can be increased by switching the processors into a low-power state. Therefore, the quality and energy efficiency of an approach targeting heterogeneous MPSoC architectures is influenced by four factors: (1) Task Mapping, (2) Voltage Scaling, (3) Task Ordering and (4) Slack Power Management. Algorithms 1-4 are developed for these steps and implemented in an integrated manner. The notations and terms used in these algorithms are listed in TABLE 1.
Before the CITM-VA approach, shown in Algorithm 1, is explained, the extended graph ($G_e$) is defined. $G_e$ is a transformed version of the original DAG ($G$). It is generated by inserting an additional node $v_s$ into $G$ for each edge $(v_i, v_j)$ whose head node $v_j$ and tail node
1) INITIAL SOLUTION
In the initial solution step of CITM-VA, two matrices are first generated: a mapping matrix of dimension κ × |V| and a voltage-assignment matrix of dimension κ × |V||E|, where κ is an input parameter of the CITM-VA algorithm. The following steps generate an initial solution:
1) First, each task node $v_i$, 1 ≤ i ≤ |V|, is randomly mapped to a processor (Line 5) to generate each row of the mapping matrix. Then each row of the voltage-assignment matrix is generated by assigning to $v_i$ the maximum voltage of the processor on which it is mapped if $v_i$ is a task node, or the maximum link voltage if $v_i$ is a communication node, 1 ≤ i ≤ |V||E| (Lines 6-8).
2) Each pair of corresponding rows of the two matrices now forms a solution. To compute the fitness value, a schedule is first generated using the Earliest Latest Finish Time First (ELFTF) algorithm described in Section IV-C (Lines 9-12).
3) ELFTF then returns the feasibility and energy of each solution, from which the fitness value is computed using the fitness function given in Equation (12).

2) Second, a new solution is generated from an existing solution by re-mapping a task node to a processor and/or voltage level, or a communication to a voltage level, such that the total energy is minimized and the deadlines are satisfied. This is done by the ReMap algorithm, which is explained in Section IV-B (Line 39).
3) Finally, the ReMap algorithm updates the solution if energy savings occur; otherwise, the existing solution remains unchanged. The fitness value of each solution is calculated and the new solutions are combined with the existing solutions to generate κ solutions for the next iteration.
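The generation of the initial solutions and the merging of new and existing solutions can be sketched as below. Several points are assumptions made for illustration: the function names are hypothetical, the voltage-assignment row is taken to hold one entry per task and communication node, a single integer `max_level` stands in for the per-resource maximum voltage level, and the fitness function is supplied by the caller with lower values treated as better, since the fitness expression of Equation (12) is not reproduced here.

```python
import random


def initial_solutions(kappa, n_tasks, n_comms, n_procs, max_level, seed=None):
    """Build kappa random mappings, each paired with a max-voltage row."""
    rng = random.Random(seed)
    # One row per solution: the processor index chosen for each task node.
    mapping = [[rng.randrange(n_procs) for _ in range(n_tasks)]
               for _ in range(kappa)]
    # One row per solution: every task and communication node starts at
    # the maximum voltage level (assumed to be a single shared index).
    levels = [[max_level] * (n_tasks + n_comms) for _ in range(kappa)]
    return mapping, levels


def keep_best(solutions, fitness, kappa):
    """Elitist merge: keep the kappa solutions with the lowest fitness."""
    return sorted(solutions, key=fitness)[:kappa]
```

Starting every node at its maximum voltage level gives the shortest execution and transmission times, so the initial schedules have the largest slack for the subsequent re-mapping to trade against energy.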
3) STAGNATION CONTROL
CITM-VA adopts an elitism-based search that keeps the best and discards the worst solutions in each iteration. Therefore, CITM-VA is prone to stagnation, like other elitist approaches. Stagnation occurs when no improvement in energy reduction is observed over a predefined number of iterations. When stagnation is detected, a stagnation control strategy is followed to escape from the local optimum (Lines 21-32).
1) We delete
C. ELFTF
The CITM-VA algorithm examines the feasibility of the schedule under a given mapping and voltage assignment (for tasks and communications) by calling Algorithm 3, which implements the ELFTF approach. The two major steps performed by ELFTF are as follows:
1) For each task and communication node, the latest finish time is first computed (Lines 2-12).
2) A schedule for the nodes is generated based on their priorities, represented by PRI, where PRI is equal to the sum of the latest finish time and the earliest start time (Lines 16-35). Nodes with a smaller value of PRI have higher priority than nodes with a larger value of PRI. The nodes with higher priorities are scheduled earlier in time than nodes with lower priorities.
To achieve the goal of the second step, a set named cSet($v_i$) is defined, which relies on the notion of concurrent nodes. Two nodes in $G_e$ are concurrent if they are not reachable from each other in $G_e$. For a task node $v_i$, cSet($v_i$) is the set of task nodes that are concurrent to $v_i$ and mapped on the same processor as $v_i$. For a communication node $v_i$, cSet($v_i$) is the set of communication nodes that are concurrent to $v_i$ and in conflict with $v_i$ (communication nodes are in conflict if they use the same NoC links). After a node $v_i$ is scheduled, the earliest start times of all the nodes that belong to cSet($v_i$) and ImSuc($v_i$) are updated to the finish time of $v_i$.
This ensures that no two concurrent nodes are scheduled at the same time and that nodes with lower priority are scheduled later than nodes with higher priority. The schedule feasibility is determined and its energy is also computed in the process.
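The two ELFTF steps described above can be sketched as follows. This is an illustrative simplification and not Algorithm 3 itself: the function name and argument layout are hypothetical, every node (task or communication) is given an execution time, a deadline and a set of resources it occupies (a processor, or the links on its route), nodes whose resource sets intersect are serialized in place of an explicit cSet, earliest start times are raised with `max` rather than overwritten, ties are broken by node name, and the energy computation is omitted.

```python
def elftf(exec_time, deadline, succ, resources):
    """Return (start, finish, feasible) for an acyclic node set."""
    nodes = list(exec_time)
    pred = {v: [] for v in nodes}
    for u in nodes:
        for s in succ.get(u, []):
            pred[s].append(u)

    # Topological order (Kahn's algorithm) of the extended graph.
    indeg = {v: len(pred[v]) for v in nodes}
    order, ready = [], [v for v in nodes if indeg[v] == 0]
    while ready:
        v = ready.pop()
        order.append(v)
        for s in succ.get(v, []):
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)

    # Step 1: latest finish times, walking from sinks back to sources.
    lft = {}
    for v in reversed(order):
        bounds = [lft[s] - exec_time[s] for s in succ.get(v, [])]
        lft[v] = min([deadline[v]] + bounds)

    # Step 2: schedule the ready node with the smallest PRI = LFT + EST.
    est = {v: 0.0 for v in nodes}
    start, finish = {}, {}
    remaining = set(nodes)
    while remaining:
        ready = [v for v in remaining if all(p in finish for p in pred[v])]
        v = min(ready, key=lambda n: (lft[n] + est[n], str(n)))
        start[v] = est[v]
        finish[v] = start[v] + exec_time[v]
        remaining.remove(v)
        for u in remaining:
            # Immediate successors and resource-sharing nodes wait for v.
            if u in succ.get(v, []) or resources[u] & resources[v]:
                est[u] = max(est[u], finish[v])

    feasible = all(finish[v] <= deadline[v] for v in nodes)
    return start, finish, feasible
```

Serializing all unscheduled nodes that share a resource has the same effect as restricting the update to concurrent nodes, because a non-concurrent node on the same resource is already ordered after the scheduled node by precedence.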
D. SLACK POWER MANAGEMENT
There may be slack available after voltage scaling. As discussed in Section III-D, the processor can remain in the active state during the idle period or it can switch to the sleep state. This is determined by Algorithm 4, the Slack Power Management (SPM) algorithm. After scheduling a node, we call SPM (in ELFTF, Line 22). It performs the following steps:
1) We first determine the length of the idle time slot $t_{idle}$ (Line 1).
2) We calculate the break-even time $t_{BET}$ (Line 2).
3) If the idle interval is greater than or equal to the break-even time, the processor is switched to sleep mode; otherwise, the processor stays in active mode (Lines 3-6).
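The three SPM steps can be illustrated with the following sketch. It assumes a commonly used break-even definition, namely the larger of the switching time and the interval at which idling in active mode costs as much as switching to sleep; the function names and the accounting of the switching overhead inside the idle slot are assumptions made here, not details confirmed by Algorithm 4.

```python
def break_even_time(e_sw, t_sw, p_active, p_sleep):
    """Assumed definition: max(t_sw, (E_sw - P_sleep*t_sw)/(P_a - P_sleep))."""
    return max(t_sw, (e_sw - p_sleep * t_sw) / (p_active - p_sleep))


def slack_power_management(t_idle, e_sw, t_sw, p_active, p_sleep):
    """Return the chosen mode and the energy spent during the idle slot."""
    t_bet = break_even_time(e_sw, t_sw, p_active, p_sleep)
    if t_idle >= t_bet:
        # Sleep: pay the switching overhead, then sleep for the remainder.
        return "sleep", e_sw + p_sleep * (t_idle - t_sw)
    # Slot too short to amortize the switching cost: stay active.
    return "active", p_active * t_idle
```

With hypothetical values $E_{sw} = 10$, $t_{sw} = 1$, $P_a = 5$ and $P_{sleep} = 1$, the break-even time is 2.25 time units, so an idle slot of 4 units is spent asleep while a slot of 2 units is spent in active mode.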
V. RESULTS AND DISCUSSION
The energy performance of the CITM-VA scheme is evaluated by comparing it to the GA-based Energy-efficient Contention-aware Mapping (ECM) approach developed by Li and Wu [24]. ECM first maps DAG tasks on a NoC-based homogeneous MPSoC architecture using quadratic binary programming, solved by relaxation-based iterative rounding, to minimize the communication energy. Then a GA is used to assign discrete voltages to the task and communication nodes to reduce the processing energy consumption while considering the deadline. ECM performs mapping and voltage scaling separately and does not consider the LFT of the tasks to generate a suitable task schedule. Moreover, the energy-performance profiles of the processors are not considered for higher energy efficiency. Additionally, the energy consumption during the idle periods of the processors is assumed to be negligible, which is not a practical scenario: a certain amount of energy is consumed to keep the processor on even if no task is running on it.
A. EXPERIMENTAL SETUP
Five real-world benchmarks listed in TABLE 2 are selected from E3S [42], which is the most suitable and widely adopted benchmark suite in automated system-level allocation and scheduling research [42], [49]. In the experimental analysis, a NoC-based MPSoC architecture with heterogeneous DVFS-enabled processors is used. Furthermore, the processors have the capability to switch into different power modes. The results are generated considering different scenarios and parameters. A 70 nanometer (nm) processor technology is adopted from [18] and the values of the different parameters are given in TABLE 3. These parameters correspond to CMOS fabrication in 70 nm technology. The reason for considering 70 nm CMOS technology is that leakage power consumption constitutes a significant portion of the total power [18], [48]. This makes it a suitable technology for CITM-VA, which aims to optimize the leakage power consumption during the idle periods of the processor for real-time applications. In future embedded systems, leakage power consumption is expected to dominate dynamic power [50]. Therefore, static energy consumption optimization is important. The simulation environment is built in Matlab version R2016a, and the experiments are conducted on a hardware platform with an Intel(R) Xeon(R), i5-3570 CPU with a clock frequency of 3.5 GHz, 16.0 GB memory and 10 MB cache. Two types of processors, Type 1 (Transmeta Crusoe) and Type 2 (PXA-250), are deployed. The Type 1 processor model is taken from [18] and operates at five different voltage levels in the range from 0.65 V to 0.85 V with a 50 mV step. Type 2 is modeled using the processor in [51] and operates at four discrete voltage levels from 0.85 V to 1.3 V with a 0.15 V step. The power and energy consumption for the different power modes of these processors are listed in TABLE 4. TABLE 5 and TABLE 6 show the operating speeds, i.e. the different voltage and frequency levels, of the Type 1 and Type 2 processors, respectively. Furthermore, the intlinprog solver for ILP problems is also used in the simulation.
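The discrete voltage levels implied by the stated ranges and step sizes can be enumerated as in the short sketch below. The helper name `voltage_levels` is hypothetical, and only the voltages are derived here; the matching frequencies are listed in TABLE 5 and TABLE 6 and are not reproduced.

```python
def voltage_levels(v_min, v_max, step):
    """Enumerate discrete voltage levels from v_min to v_max inclusive."""
    n = round((v_max - v_min) / step) + 1
    return [round(v_min + i * step, 2) for i in range(n)]


TYPE1_LEVELS = voltage_levels(0.65, 0.85, 0.05)  # five levels
TYPE2_LEVELS = voltage_levels(0.85, 1.30, 0.15)  # four levels
```

The resulting lists contain five and four entries, which matches the level counts stated for the Type 1 and Type 2 processors.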
B. CITM-VA ENERGY PERFORMANCE
Different results have been generated considering the parameters such as dynamic energy (d), static energy (s), number of tiles or NoC structure and makespan. 
2) 6 × 4 NoC
The energy efficiency of CITM-VA is further investigated for all five benchmarks when the number of tiles is increased to 24 as shown in the
4) ROBUSTNESS AND QOS
The robustness and QoS of both schemes are analyzed when the makespan generated by ECM is taken as the baseline and multiplied by an MM of 0.9, as shown in FIGURE 8. The ECM approach cannot produce an energy consumption estimate at 0.9 × makespan because it does not implement the ELFTF strategy to re-arrange the order of the task nodes according to the deadlines. Therefore, QoS degrades under strict deadlines, and ECM fails to efficiently perform task mapping and voltage scaling under strict deadlines for real-time applications. Thus, ECM shows no robustness and exhibits poor QoS. In contrast, CITM-VA@(0.9 × makespan)_ECM converges under strict deadlines and produces energy consumption values, although the energy efficiency reduces to ∼18% and ∼24% for (CITM-VA)_d and (CITM-VA)_{d+s}, respectively, compared to the normal makespan.
5) ENERGY AND MM
FIGURE 9 shows the total energy consumption of the real-world benchmarks for different makespans when a 7 × 4 NoC with 28 tiles is used. The total energy consumption of both CITM-VA and ECM decreases when the MM value is increased from 0.9. This reduction in energy consumption occurs because the common deadline is extended, so lower discrete voltage levels can be applied to the task and communication nodes. It is worth noting that CITM-VA generates energy consumption values even under strict deadlines, while ECM does not converge and fails to produce output for MM values below 1.0. Moreover, the total energy consumption of ECM is higher than that of CITM-VA for all benchmarks. So, CITM-VA performs better than ECM at different values of MM.
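The effect of extending the deadline on the usable voltage level can be illustrated with a small sketch. The function name and the voltage/frequency pairs below are hypothetical placeholders, not values from TABLE 5 or TABLE 6, and the example considers a single task in isolation rather than a full schedule.

```python
def lowest_feasible_level(ncc, levels, time_budget):
    """Pick the lowest (voltage, frequency) level that meets the budget.

    levels must be sorted by increasing voltage; returns None when even
    the highest level cannot finish ncc cycles within time_budget.
    """
    for v, f in levels:
        if ncc / f <= time_budget:
            return v, f
    return None


# Hypothetical (voltage in V, frequency in Hz) pairs for illustration.
LEVELS = [(0.65, 3e8), (0.75, 5e8), (0.85, 7e8)]
BASE_BUDGET = 2.5e-3   # hypothetical time budget in seconds at MM = 1.0
```

With a task of 10^6 cycles, scaling the budget by 1.4 admits the 0.65 V level, a factor of 1.0 or 0.9 requires the 0.75 V level, and a factor of 0.5 leaves no feasible level, which mirrors the trend of lower energy at larger MM values.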
VI. CONCLUSION
In this paper, contention-aware static mapping and voltage scaling is investigated for real-time DAG task sets with precedence constraints and individual deadlines using a NoC-based heterogeneous MPSoC architecture with DVFS-enabled processors. The proposed CITM-VA approach optimizes both the communication and computation energy and performs task mapping, scheduling and voltage scaling in an integrated manner. It adopts the ELFTF strategy and generates a prioritized task schedule to adequately utilize the available slack and links. The ReMap algorithm is used to efficiently map the task and communication nodes to the resources and discrete voltage levels such that the overall energy consumption is reduced. To further improve the energy savings, DPM is deployed when an idle processor is in a high-power state. Contention between communications traversing the same link is eliminated by dedicating the links to higher-priority communications. The extensive evaluation results show that, compared to the state-of-the-art technique ECM, the proposed CITM-VA approach achieves better energy efficiency, with an average improvement of ∼30%. Moreover, it maintains high QoS and robustness under strict task deadlines with significant energy savings.
[51] R. Jejurikar, C. Pereira, and R. Gupta, ' 
