Abstract: Outsourcing of the various aspects of IC design and fabrication flow strongly questions the classic assumption that "hardware is trustworthy". Multiprocessor System-on-Chip (MPSoC) platforms face some of the most demanding security concerns, as they process, store, and communicate sensitive information using third-party intellectual property (3PIP) cores that may be untrustworthy. The complexity of an MPSoC makes it expensive and time consuming to fully analyze and test it during the design stage. Consequently, the trustworthiness of the 3PIP components cannot be ensured. To protect MPSoCs against malicious modifications, we propose to incorporate trojan toleration into MPSoC platforms by revising the task scheduling step of the MPSoC design process. We impose a set of security-driven diversity constraints into the scheduling process, enabling the system to detect the presence of malicious modifications or to mute their effects during application execution. Furthermore, we pose the security-constrained MPSoC task scheduling as a multi-dimensional optimization problem, and propose a set of heuristics to ensure that the introduced security constraints can be fulfilled with minimum performance and hardware overhead.
I. INTRODUCTION
Heterogeneous Multiprocessor System-on-Chip (MPSoC) architectures have become a routine way of building embedded systems such as smart phones, network routers, storage and web servers, and gaming systems [1]. MPSoC designers typically integrate third-party intellectual property (3PIP) cores and outsource fabrication and testing steps. This allows designers of high-end embedded system applications to quickly respond to the increasing demands in functionality, power consumption, and programmability without sacrificing design productivity [2].
Notwithstanding these benefits, outsourcing the design steps is making heterogeneous MPSoC platforms an important attack vector, especially since defenses against software-based attacks have become strong. Heterogeneous MPSoCs are vulnerable to malicious modifications (also known as Hardware Trojan Horses) in the 3PIPs and in the manufactured IC during fabrication. Trojans may cause system failures at some key point during application execution or could create backdoors to leak confidential information back to the attacker. Moreover, the growing complexity of MPSoCs makes it expensive and time consuming to fully test or analyze a system for the presence of trojans, since they are purposefully inserted by an attacker in hard-to-detect sites in the design.
Since it is not possible to guarantee trustworthiness (i.e., 100% trojan free-ness) of 3PIPs, researchers have started to look into the possibility of enabling MPSoCs to detect trojans or mute their effects during application execution. In this paper, we propose a security-driven MPSoC task scheduling technique to account for the untrustworthiness of the 3PIP cores. Our main contribution is the incorporation of diversity into MPSoC task schedules. As multiple copies of the same 3PIP may be instantiated in the target MPSoC, diversity is essential to prevent two copies of a task from producing the same incorrect outputs. We describe diversity-based scheduling constraints that enable the target MPSoC to (1) detect trojans that maliciously alter task outputs, and (2) preclude potential collusion between 3PIP cores from the same vendor 1 . Incorporating the outlined security constraints into heterogeneous MPSoCs, while desirable, needs to accommodate the performance, power, cost, and design complexity constraints that the designer already faces. We pose security-driven MPSoC task scheduling as a multi-dimensional optimization problem, and outline task scheduling heuristics which, by exploiting the flexibility inherent in schedule generation, are able to minimize the performance, power, and hardware overhead of the security constraints.
978-1-4799-1585-9/13/$31.00 ©2013 IEEE
The rest of this paper is organized as follows. Section II briefly reviews MPSoC design security and outlines the technical motivation of the proposed technique. Section III presents the proposed security-driven scheduling constraints and heuristics. Section IV experimentally verifies the efficacy of the technique, while Section V summarizes the paper.
II. BACKGROUND AND MOTIVATION

A. Security challenges in MPSoC design
To reduce costs and meet the tight time-to-market deadlines, companies such as Apple Inc. purchase IP cores from third parties (e.g., ARM and Marvell), integrate these cores, generate the layout, and send it to foundries (e.g., Samsung Foundries) for fabrication. This globalized design trend makes MPSoCs prone to insider attacks. A rogue insider in the foundry may make subtle mask changes, or alter chemical compositions to accelerate failures in critical circuitry [3]. A rogue insider in a third-party design house may insert malicious logic in an IP [3], [4], [5] to modify functionality, deny service, or create a backdoor to leak confidential information.
Current trojan detection techniques are based on functional testing [6] and/or side-channel analysis [7], [8]. These techniques hinge on the characterization of system function, path delays [9], and/or power consumption [6], [10], [11] so that manufactured chips can be measured and compared against the expected values to detect trojans. Yet these techniques are not effective for trojans in 3PIPs, since there is no golden (trojan-free) model for the designer to refer to. Techniques used to detect design bugs, such as RTL verification, are not effective for detecting trojans in 3PIPs either [12], since they are very time consuming and do not scale for complex MPSoCs.
1 Typically, different vendors do not share a common back door or trigger pattern, since to do so a rogue element would have to expose itself to other rogue elements.
B. The key idea
We question the implicit assumption that underlies all MPSoC design flows: 3PIPs procured from 3PIP vendors are trustworthy. There may be a rogue designer in a third-party design house who may insert non-trivial, malicious logic in 3PIPs coming out of the design house. The inserted trojan may cause the task running on the malicious 3PIP either to produce incorrect output or to generate additional, unexpected data to trigger trojans in another 3PIP core from the same vendor.
We incorporate security constraints in the scheduling step of the MPSoC design flow to handle trojans that may have eluded the verification based approaches. Our scheme either detects or mutes the effect of trojans as follows:
• A trojan can be always on. The MPSoC designer has no control over an always-on trojan in a procured 3PIP. These trojans will be detected through duplicating tasks on 3PIP cores with the same functionality but from different vendors and checking their results.
• A trojan can be conditionally triggered. An attacker may distribute trojans across multiple cores such that a trojan in one 3PIP activates a trojan in another 3PIP at a chosen time, at a chosen location, and on a chosen input [3]. A concrete example is shown in Fig. 1. To mute these distributed trojans, we prevent 3PIPs produced by the same vendor from communicating with each other. This way, even if a core generates an unexpected trigger, such data is never communicated to any core from the same vendor that may be looking for it.
C. Related work
A wealth of trojan detection techniques target malicious modifications during fabrication. Hardware trojans are identified by detecting their impact on power [6], [10], [11], delay [9], or a combination of both [7], [8]. Yet side-channel analysis is usually limited by the measurement capabilities of the analog probes. For trojans of small size, the subtle differences in power and delay can be masked by process variations and measurement errors, which can be as high as 5% [13]. In [13], several non-destructive techniques combining algebraic, numerical, and statistical methods with power and delay measurements have been proposed to detect hardware trojans in the presence of process variations.
A few techniques have been proposed to tackle trojans in 3PIPs. In [14], a trojan detection and prevention scheme is proposed for homogeneous systems. Each program is partitioned into segments and redundantly executed on three or more cores, aiming at limiting the data access capability of each core. In [4], a register transfer level technique is proposed to monitor inter-component communications to detect malicious behavior. This work is extended in [15] to prevent trojans from being triggered through obfuscating and scrambling the inputs to infected hardware units. As these techniques require detailed RTL information, their application to third-party IP cores is limited. Another related work [12] requires checking a 3PIP against pre-defined agreements on security-related properties provided by the vendor. Yet the development of security-related properties for a 3PIP is still in its infancy. Further, there may still be opportunities for the rogue designer to deliver malicious 3PIP cores that honor these security properties.
Previous work in MPSoC security targets software attacks such as buffer overflow, stack overflow, and software-based side-channel attacks. Architectural enhancements to MPSoCs have been proposed in [16], [17] to detect software-based attacks through monitoring timing, control flow, and instruction execution counts at runtime. MPSoCs are protected against software-based side-channel attacks by creating a trusted execution environment that isolates the cores that execute critical tasks from the rest [18]. Support for customizing the security policies for different applications executing on an MPSoC has been proposed in [19].
III. SECURITY-DRIVEN TASK SCHEDULING
Task scheduling is a critical step in MPSoC design [20] that binds tasks to cores and coordinates necessary communications among the tasks. This step determines the performance and power consumption characteristics of an application, as well as the types and number of the 3PIP cores needed in the MPSoC. Whereas a typical scheduler explores a two-dimensional design space of performance (modeled as schedule length) and cost (modeled as types and total number of cores), in this paper we explore a third optimization dimension: security.
In this section, we first define a set of security constraints, which are then embedded into the MPSoC task schedule. To satisfy these security constraints at minimum performance and hardware overhead, we furthermore develop heuristics that exploit the flexibility inherent in task scheduling.
A. Security-driven scheduling constraints
The proposed technique imposes two security constraints to detect trojans and prevent tasks from passing triggers to activate trojans, respectively.
1) Duplication with diversity: To detect trojans, every task is redundantly executed on two 3PIP cores, each coming from a different vendor. The same inputs are sent to the task and its duplicate, and the outputs of both copies are compared for trojan detection.
The outputs of the task and its duplicate are assumed to be compared by a trustable component (not designed by the third party) to ensure the trustworthiness of the comparison step. Note that this comparison is at a relatively coarse granularity: instead of performing cycle-by-cycle comparison of signals and instruction results, our technique only compares the final task outcome. However, the designer can still protect critical intermediate values, if any, by partitioning tasks into smaller sub-tasks. If the comparison fails, all the dependent tasks are terminated and a security flag is raised.
This security constraint ensures the detection of any task that produces an incorrect output (because of a trojan), as long as attackers in two independent IP design houses do not collude to develop identical trojans that produce identical incorrect outputs (which is highly unlikely). Note that this constraint is stricter than the straightforward duplication used for fault tolerance. Straightforward duplication is sufficient for fault detection because fault behavior is random, and hence the probability of two identical faults occurring in different cores is extremely low (unless these faults are induced by design errors). In contrast, trojans are inherent in 3PIPs. If multiple copies of the same 3PIP are instantiated in the MPSoC, each instantiation will contain the same trojan. Therefore, diversity is essential to prevent two copies of a task from producing the same incorrect outputs.
2) Communication diversity: This constraint aims at preventing collusion between 3PIPs from the same vendor. To prevent activation of these trojans, a task and its predecessor need to be scheduled on 3PIP cores from different vendors.
These two security constraints together prevent activation of distributed trojans. The first constraint exposes incorrect task outputs, preventing them from triggering trojans in any dependent core. The second constraint ensures that all the valid communication paths are between 3PIPs from different vendors. Therefore, during application execution if one core silently produces unexpected data in addition to valid task outputs, the dependent cores, as they come from different vendors and hence are unlikely to collude with this core, will not use such data for trojan triggering. If two cores from the same vendor access the same data object 2 , a security flag will be raised indicating the detection of an invalid communication path.
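As a minimal sketch of how the two constraints could be checked at the task level, the following function lists every violation in a given task-to-vendor binding. The function name `find_violations` and the dictionary-based representation are illustrative assumptions, not part of the proposed flow.

```python
from typing import Dict, List, Tuple


def find_violations(
    deps: List[Tuple[str, str]],   # (predecessor, successor) task pairs
    duplicates: Dict[str, str],    # task -> its duplicate
    vendor_of: Dict[str, str],     # task -> vendor of the core it runs on
) -> List[str]:
    """Return a list of security-constraint violations (hypothetical helper)."""
    violations = []
    # Constraint 1 (duplication with diversity): a task and its duplicate
    # must execute on cores from different vendors.
    for task, dup in duplicates.items():
        if vendor_of[task] == vendor_of[dup]:
            violations.append(f"duplication: {task}/{dup}")
    # Constraint 2 (communication diversity): a task and its predecessor
    # must execute on cores from different vendors.
    for pred, succ in deps:
        if vendor_of[pred] == vendor_of[succ]:
            violations.append(f"communication: {pred}->{succ}")
    return violations
```

An empty result means that the binding satisfies both constraints in their task-level form, where every dependency crosses a vendor boundary.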
B. Reformulation of the MPSoC scheduling problem
Companies such as Microchip Technology, Altera, Sony and Ingenic provide cores sharing the same instruction set (i.e., MIPS [22]). These diverse cores can be integrated on the target MPSoC to achieve the diversity required by the two security constraints. However, an increased level of diversity elevates the design cost. The two security constraints require more cores and create more inter-core communications, which may increase schedule length and energy consumption.
To consider security along with the traditional scheduling goals, we extend task scheduling in three respects to incorporate the outlined security constraints.
• We consider the number of vendors as an additional scheduling metric that impacts hardware and design cost.
• In line with existing scheduling algorithms for heterogeneous systems [23] , [24] , we model heterogeneity through assigning different processing speeds to cores from different vendors.
• The goal is to fulfill security constraints while minimizing the associated overhead in hardware, performance, power, and cost, at a priority specified by the designer.
The proposed scheduling constraints can be incorporated into various scheduling algorithms [20] , [25] , [26] to achieve diversity and enhance security in heterogeneous MPSoCs. To demonstrate the effectiveness and potential overhead of these constraints, we select a representative class of scheduling algorithms (i.e., list scheduling) as the baseline, and present two approaches for incorporating security constraints in the rest of this section.
C. A straightforward scheduling approach
Our first approach directly embeds security constraints into the task graph, and then schedules the graph onto the target MPSoC. In a nutshell, this approach consists of the following four steps, as shown in Fig. 2(a):
• Task graph coloring that satisfies security constraints and determines the number of vendors needed in the target MPSoC.
• Color-constrained task scheduling that determines the total number of cores, the core assignment of each task and its tentative start time.
• Color to core-speed mapping that determines the exact type and speed of each 3PIP core.
• Schedule finalization that determines the exact start time and finish time of each task based on core speed.
2 This can be detected by monitoring the communication channels. For shared memory MPSoCs, an ownership vector (similar to the one used in directory-based cache coherency protocols [21]) can be used. For MPSoCs using on-chip networks, the routers can be augmented to check the types of cores sending and receiving each message.
1) Task graph coloring: Given the security constraints, determining the minimum number of vendors needed in the target MPSoC platform can be modeled as a graph coloring problem 3 .
Based on the two security constraints, a task conflict graph can be constructed. Vertices in this graph represent tasks, while edges in the graph represent conflicts. Specifically, to model the duplication-with-diversity constraint, the entire task graph is duplicated, and an edge is inserted between each task and its duplicate. To model the communication-diversity constraint, an edge is inserted between any pair of dependent tasks.
Once the task conflict graph is constructed, standard graph coloring algorithms [27] can be applied to determine the minimum number of colors needed to color the graph, which is equal to the minimum number of vendors needed in the MPSoC.
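The construction above can be sketched as follows. This is an illustration under stated assumptions: the duplicate of task `t` is named `t'`, dependencies are mirrored only within the duplicated graph, and a simple largest-degree-first greedy coloring stands in for the standard algorithms of [27]. A greedy coloring is always valid but is not guaranteed to use the minimum number of colors.

```python
from typing import Dict, List, Set, Tuple


def build_conflict_graph(
    tasks: List[str], deps: List[Tuple[str, str]]
) -> Dict[str, Set[str]]:
    """Duplicate the task graph and insert both kinds of conflict edges."""
    adj: Dict[str, Set[str]] = {}
    for t in tasks:
        adj[t] = set()
        adj[t + "'"] = set()          # duplicate of t (naming is an assumption)

    def add_edge(u: str, v: str) -> None:
        adj[u].add(v)
        adj[v].add(u)

    for t in tasks:
        add_edge(t, t + "'")          # duplication-with-diversity
    for p, s in deps:
        add_edge(p, s)                # communication-diversity, original graph
        add_edge(p + "'", s + "'")    # communication-diversity, duplicated graph
    return adj


def greedy_coloring(adj: Dict[str, Set[str]]) -> Dict[str, int]:
    """Assign each vertex the smallest color not used by its neighbors."""
    color: Dict[str, int] = {}
    # Visit high-degree vertices first; ties are broken by name.
    for v in sorted(adj, key=lambda x: (-len(adj[x]), x)):
        used = {color[n] for n in adj[v] if n in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color
```

The number of distinct colors returned is an upper bound on the number of vendors that the task graph requires under these assumptions.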
2) Color-constrained task scheduling: This is performed by modifying a standard list scheduling approach to ensure that coloring constraints can be fulfilled. At the beginning, all the cores are colorless; at each scheduling step, a task is placed on colorless cores or cores with the same color to find its earliest start time; if the task is assigned to a colorless core, the core immediately inherits the task color, and only accommodates tasks of the same color from then on.
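A simplified sketch of this step is given below. It assumes that the tasks arrive in a precomputed priority order that is also a topological order, that the inter-core communication overhead `comm` is uniform, and that the number of cores is fixed in advance. The function name and data layout are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def color_constrained_schedule(
    order: List[str],                 # tasks in priority (topological) order
    preds: Dict[str, List[str]],      # task -> list of predecessor tasks
    color: Dict[str, int],            # task -> color from graph coloring
    exec_time: Dict[str, float],      # task -> tentative execution time
    comm: float,                      # uniform inter-core communication overhead
    num_cores: int,
) -> Tuple[Dict[str, Tuple[int, float, float]], List[Optional[int]]]:
    core_color: List[Optional[int]] = [None] * num_cores  # all cores colorless
    core_free = [0.0] * num_cores     # time at which each core becomes idle
    placed: Dict[str, Tuple[int, float, float]] = {}      # task -> (core, start, finish)
    for t in order:
        best = None
        for c in range(num_cores):
            # Only colorless cores or cores of the same color are candidates.
            if core_color[c] not in (None, color[t]):
                continue
            ready = 0.0
            for p in preds.get(t, []):
                pc, _, pf = placed[p]
                ready = max(ready, pf + (0 if pc == c else comm))
            start = max(ready, core_free[c])
            if best is None or start < best[1]:
                best = (c, start)     # keep the earliest start time
        if best is None:
            raise ValueError(f"no core available for color {color[t]}")
        c, start = best
        core_color[c] = color[t]      # a colorless core inherits the task color
        core_free[c] = start + exec_time[t]
        placed[t] = (c, start, core_free[c])
    return placed, core_color
```

Ties between cores are resolved here by the lowest core index, which is one of several reasonable choices.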
3) Color to core-speed mapping: While the prior step colors each individual core, the exact speed of each core has not been determined. Since tasks on critical paths have direct impact on schedule length, we develop a heuristic that assigns a higher speed to a core with more critical tasks in order to maximize schedule performance.
The proposed heuristic ranks cores according to the number of critical tasks a core has. For cores with the same number of critical tasks, a higher rank is given to the one either with more critical child tasks or on the upper levels of the task graph. In this way, the benefit obtained by scheduling these critical tasks earlier can be maximized.
After ranking all the cores, the color to core-speed mapping can be performed in an iterative manner. At each iteration, among all the "not-assigned-yet" cores, the one with the highest rank is identified. This core, along with the ones that have the same color as it, is given the highest speed among all the "available" speeds. Then, that speed is marked as "unavailable" and all these cores are marked as "assigned". This process iterates until all the cores have been assigned a specific speed.
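The iteration can be sketched as follows, assuming that the core ranking has already been computed and is passed in best-first, and that every core appears in that ranking. The names `map_colors_to_speeds`, `core_rank`, and `speeds` are illustrative.

```python
from typing import Dict, Hashable, List


def map_colors_to_speeds(
    core_color: Dict[int, Hashable],  # core -> color
    core_rank: List[int],             # all cores, highest rank first
    speeds: List[float],              # one available speed per vendor
) -> Dict[int, float]:
    available = sorted(speeds, reverse=True)
    speed_of_color: Dict[Hashable, float] = {}
    for core in core_rank:
        col = core_color[core]
        if col in speed_of_color:
            continue                  # assigned earlier via a same-colored core
        if not available:
            raise ValueError("fewer speeds than colors")
        # The highest-ranked unassigned core takes the fastest remaining
        # speed, and that speed becomes unavailable to other colors.
        speed_of_color[col] = available.pop(0)
    return {core: speed_of_color[col] for core, col in core_color.items()}
```

For example, with cores 0 and 2 colored red, core 1 colored blue, the ranking `[1, 0, 2]`, and speeds `[1.0, 0.9]`, the blue core receives speed 1.0 and both red cores receive 0.9.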
4) Schedule finalization: The core speeds assigned in the prior step impact task execution time. Without changing the task-to-core binding, this step adjusts the start and finish time of each task. If the finish time of a task is delayed, its dependent tasks need to be delayed as well.
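A minimal sketch of this adjustment is shown below. It assumes that core speed is expressed as a multiplier relative to the fastest core (so execution time is the base time divided by the speed), that tasks are revisited in the order of their tentative start times, and that the communication overhead is uniform. These modeling choices are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def finalize_schedule(
    order: List[str],                 # tasks sorted by tentative start time
    preds: Dict[str, List[str]],      # task -> predecessor tasks
    core_of: Dict[str, int],          # fixed task-to-core binding
    base_time: Dict[str, float],      # execution time on the fastest core
    speed: Dict[int, float],          # core -> relative speed (1.0 = fastest)
    comm: float,                      # uniform inter-core communication overhead
) -> Dict[str, Tuple[float, float]]:
    core_free: Dict[int, float] = {}
    times: Dict[str, Tuple[float, float]] = {}
    for t in order:
        c = core_of[t]
        ready = 0.0
        for p in preds.get(t, []):
            # A delayed predecessor automatically delays its dependents.
            arrive = times[p][1] + (0 if core_of[p] == c else comm)
            ready = max(ready, arrive)
        start = max(ready, core_free.get(c, 0.0))
        finish = start + base_time[t] / speed[c]
        times[t] = (start, finish)
        core_free[c] = finish
    return times
```

Because the binding and the per-core task order are left untouched, only the start and finish times change.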
D. Schedule quality enhancement
The straightforward scheduling approach fulfills the proposed security constraints at the finest granularity: trojan detection is added to each task output, and collusion prevention is added to each communication path. Yet one significant disadvantage of this approach is that all the communications are forced to be between 3PIP cores. This prohibits the scheduler from putting dependent tasks on the same core to hide communication latency and save energy.
To reduce the performance and energy overhead, we explore the possibility of grouping dependent tasks on critical paths into a cluster and scheduling the entire cluster on a single core. This will reduce the number of inter-core communications while satisfying the security constraints at a relatively coarser granularity. In other words, unlike the straightforward approach that always schedules dependent tasks across different vendors, the cluster-based approach schedules dependent tasks either on the same core (for the intra-cluster cases) or across different vendors (for the inter-cluster cases).
Since a standard list scheduler has the ability of automatically grouping critical tasks and scheduling them on the same core to hide communication latency, we develop a cluster-based scheme that first generates a performance-driven schedule and then colors the schedule to fulfill security constraints. As shown in Fig. 2(b) , the revised scheme also includes four steps: task scheduling, core coloring, color to core-speed mapping, and schedule finalization. As the latter two steps are identical to the straightforward approach, we focus on the first two steps in the rest of this section.
1) Task scheduling: This step generates a tentative schedule and determines the total number of cores needed in the target MPSoC. To maximally exploit the scheduler's ability to group critical tasks, we impose only a minimal constraint during this step: a task i and its duplicate i' cannot be scheduled on the same core (for trojan detection).
2) Core coloring: This step embeds security constraints into the schedule generated in the prior step, determining the exact color of each core and the number of vendors needed in the target MPSoC.
Since tasks are grouped into clusters, security constraints need to be imposed between cores instead of tasks. Hence, we construct a core conflict graph wherein each vertex represents a core and each edge represents a conflict. An edge is inserted between cores i and j in two cases: (1) there exists one or more pairs of duplicated tasks on i and j (to satisfy duplication-with-diversity); (2) there exists one or more communication paths between i and j (to satisfy communication-diversity).
The core conflict graph can be easily constructed based on the tentative schedule generated at the prior step. Subsequently, standard graph coloring algorithms [27] can be applied to determine the color of each core, while each task automatically inherits the color of the core it is scheduled on.
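The two edge-insertion cases can be sketched as follows, assuming the tentative schedule is available as a task-to-core mapping. The function name and argument layout are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def build_core_conflict_graph(
    core_of: Dict[str, int],          # task -> core from the tentative schedule
    duplicates: Dict[str, str],       # task -> its duplicate
    deps: List[Tuple[str, str]],      # (predecessor, successor) pairs, both copies
) -> Dict[int, Set[int]]:
    adj: Dict[int, Set[int]] = {c: set() for c in set(core_of.values())}

    def add_edge(a: int, b: int) -> None:
        if a != b:                    # intra-core (intra-cluster) pairs add no conflict
            adj[a].add(b)
            adj[b].add(a)

    # Case 1: a task and its duplicate sit on cores i and j.
    for task, dup in duplicates.items():
        add_edge(core_of[task], core_of[dup])
    # Case 2: a communication path exists between cores i and j.
    for pred, succ in deps:
        add_edge(core_of[pred], core_of[succ])
    return adj
```

Coloring this graph then assigns a vendor to each core, and every task inherits the color of its core.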
E. Heuristic for vendor count control
While the cluster-based approach is capable of placing critical tasks on a single core to boost performance, it has a potential hazard of increasing the number of vendors needed in the MPSoC. This is because the number of vendors is bounded from below by the maximum clique size 4 of the core conflict graph. Unfortunately, traditional performance-driven list scheduling does not take this into consideration. The scheduling decisions may end up inducing a large clique in the core conflict graph.
To solve this problem, we propose a heuristic to exploit the flexibility in prioritizing scheduling decisions. A traditional scheduler randomly picks a core assignment if a task has the same earliest start time on cores i and j. In contrast, the proposed scheme evaluates these two options based on their impact on the maximum clique size of the corresponding core conflict graph and selects the one with the smaller impact. In this way, the scheduler is able to minimize the number of vendors without degrading schedule performance.
Precisely computing the maximum clique size of a graph is NP-hard. However, as the scheduler processes one task at a time, an efficient heuristic can be developed. Scheduling a task on core i may add one or more edges 5 into the core conflict graph. As these edges share the same vertex i, they may increase the maximum clique size by at most 1. We employ the algorithm proposed in [28] to compute a tight upper bound of the maximum clique size based on the number of triangles in the conflict graph. If an edge e_ij is added, only the vertices i, j and the neighbors of i and j need to be examined, thus reducing the computation complexity.
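The tie-breaking idea can be illustrated with a simplified stand-in. The sketch below does not implement the bound of [28]; instead it assumes, purely for illustration, that the number of new triangles closed by the candidate edges is a reasonable local proxy for the impact on the maximum clique size. Only the neighborhoods of the affected vertices are inspected, in the spirit of the locality argument above.

```python
from typing import Dict, Iterable, Set


def new_triangles(
    adj: Dict[int, Set[int]], core: int, new_neighbors: Iterable[int]
) -> int:
    """Count triangles that adding the edges (core, x) would close."""
    nbrs = set(adj.get(core, set()))
    count = 0
    for x in new_neighbors:
        if x == core or x in nbrs:
            continue                  # no new edge is created
        # Each common neighbor of core and x closes one new triangle.
        count += len(nbrs & adj.get(x, set()))
        nbrs.add(x)                   # later candidate edges see this one
    return count


def break_tie(adj: Dict[int, Set[int]], candidates: Dict[int, Iterable[int]]) -> int:
    """Pick the candidate core whose new conflict edges close the fewest triangles.

    candidates maps each tied core to the cores it would newly conflict with.
    """
    return min(candidates, key=lambda c: (new_triangles(adj, c, candidates[c]), c))
```

With the conflict graph 0-1-2 and an isolated core 3, placing a task that conflicts with core 2 on core 0 would close the triangle 0-1-2, whereas placing it on core 3 closes none, so core 3 is preferred.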
F. Illustrative example
To concretely illustrate the differences between the two scheduling approaches, an example of applying them to the standard task graph of Gaussian Elimination is shown in Fig. 3. Each node includes four values: Ti denoting its task ID i, Cj denoting that it is assigned to core j, and m-n denoting its start time m and finish time n. Task execution time is 10 on the faster cores (light nodes) and 12 on the slower cores (dark nodes). The solid arrows represent intra-core communication (with 0 overhead), while the bold, dashed arrows represent inter-core communication (with 10 overhead).
Clearly, the baseline schedule in Fig. 3(a) does not satisfy the security constraints. Tasks are not duplicated, and cores from the same vendor (C1 and C2) directly communicate. In comparison, the schedule in Fig. 3(b) satisfies both security constraints at the task level. Every task is duplicated on two distinct cores, and every communication is between distinct cores, resulting in a 50% longer schedule than Fig. 3(a). The schedule in Fig. 3(c) is obtained by duplicating the performance-driven schedule in Fig. 3(a) and then coloring it to fulfill security constraints. The schedule contains 6 clusters, and 50% of all the communications are intra-cluster, resulting in a 30% improvement in schedule length over Fig. 3(b).
IV. EXPERIMENTAL RESULTS
A. Methodology
We perform a comprehensive study to evaluate the effectiveness and the potential overhead of the proposed security-driven task scheduling approaches. We select a standard list scheduling algorithm [29] as the baseline. The different algorithms (baseline, straightforward, and cluster-based) are implemented in C.
The test set includes standard parallel task graphs and random task graphs. Standard task graphs include fork-join, LU decomposition, Laplace equation solver, Gaussian elimination, and FFT. Their DAG representations can be found in [30]. We also use TGFF [31] to generate 100 random task graphs representative of a large spectrum of possible parallel applications. Table I reports the configurations for critical parameters in TGFF. High-communication and low-communication task graphs are generated by adjusting computation and communication overhead parameters.
It is assumed that the underlying MPSoC platform can accommodate up to 16 cores. Typically the cores produced by different vendors exhibit variations in many parameters (e.g., speed, area, and power consumption). However, since performance is the main concern of scheduling, we focus on modeling speed variations across 3PIP cores, in line with the assumption made by most scheduling approaches of heterogeneous systems [23], [24]. We set the step of speed differences equal to 10% of the fastest core speed in our experiments. Note that the core speed only affects task execution time, while the inter-core communication overhead remains intact.
B. Results
1) Number of vendors needed: Our first set of experiments explores the minimum number of vendors required for incorporating the proposed security constraints. As this is a crucial factor that impacts both the design cost and the hardware cost, our goal is to control this factor within a reasonable range. Fig. 4 shows the ratio distribution of the minimum number of vendors required for all the task graphs. It can be clearly seen that when the heuristic for minimizing the maximum clique size is applied during scheduling, 4 vendors are sufficient for most of the tested task graphs (105 out of the 107 tested task graphs, including 7 standard and 100 random). We believe that this extra cost for vendors is acceptable for building a trustworthy MPSoC, especially when it is used in critical infrastructure.
Fig. 4. Ratio distribution of minimum number of vendors needed for security, w/o and w/ maximum clique constraint.
2) Length of two security-driven schedules: We evaluate the performance of the straightforward and the cluster-based schemes by comparing their schedule lengths to the baseline list scheduling scheme [29]. To ensure fairness in comparison, the three schemes use the same number of cores (the one that allows the baseline scheme to deliver best performance), and the two security-driven schemes use the same number of vendors. For the two security-driven schemes, their ratio of schedule length increase (ΔSL) over the baseline is calculated using the following equation:
where n is the number of task graphs and SL(i) stands for the schedule length of the ith task graph. The lower the ΔSL, the better the performance.
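Written out under the definitions above, one plausible form of ΔSL is the mean relative increase in schedule length over the baseline. The exact expression is an assumption here, chosen to be consistent with the stated roles of n and SL(i) and with the averaging used for the inter-core communication ratio; the subscripts `sec` and `base` are introduced only for illustration.

```latex
\Delta SL = \frac{1}{n} \sum_{i=1}^{n}
  \frac{SL_{\mathrm{sec}}(i) - SL_{\mathrm{base}}(i)}{SL_{\mathrm{base}}(i)}
```

Under this reading, a ΔSL of 58% means that the security-driven schedules are on average 1.58 times as long as the baseline schedules.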
The results of schedule lengths are shown in Table II. Both the standard and the random task graphs are reported, with the random task graphs divided into four categories based on the communication overhead and the number of vendors needed. As can be seen, duplicating every task increases the schedule length by 58%. When both security constraints are imposed, the straightforward approach has 21%-40% degradation on top of duplication-only, while the cluster-based approach only has 9%-20% on top of duplication-only. This confirms that by grouping critical tasks into a single core, inter-core communication latency can be largely hidden, and hence schedule length can be sizably reduced.
Another observation is that the task graphs requiring 4 vendors have larger overhead in schedule length than those with 3 vendors. This is because slower cores need to be used for execution when the number of vendors increases.
3) Inter-core communication ratio: Finally, we report the ratio of inter-core communication (denoted as "inter-to-all") for each scheduling scheme. This is obtained using the following formula:
$$\text{inter-to-all} = \frac{1}{n} \sum_{i=1}^{n} \frac{\mathit{Comm}_{inter}(i)}{\mathit{Comm}_{all}(i)}$$
with Comm_inter(i) and Comm_all(i) respectively denoting the inter-core communications and all the communication paths for task graph i. A higher ratio implies that more communication paths satisfy the second security constraint (while the first security constraint is always guaranteed).
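As a small worked illustration of this ratio, the following sketch averages the per-graph fraction of inter-core communication paths. The function name and the tuple-based input are assumptions made for the example, and the numbers in the usage note are made up rather than taken from Table II.

```python
from typing import List, Tuple


def inter_to_all(graphs: List[Tuple[int, int]]) -> float:
    """Average inter-core communication ratio over all task graphs.

    Each entry is (inter_core_paths, all_paths) for one task graph.
    """
    return sum(inter / total for inter, total in graphs) / len(graphs)
```

For two hypothetical task graphs with 5 of 10 and 10 of 10 inter-core paths, the per-graph ratios are 0.5 and 1.0, so `inter_to_all([(5, 10), (10, 10)])` evaluates to 0.75.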
The results are also shown in Table II . As expected, all the communication paths in the straightforward scheme are forced to be between cores. For the cluster-based scheme, the ratio of inter-core communications is close to the baseline scheme. This is because the cluster-based approach first generates a performance-driven schedule (in the same way as the baseline) and then imposes security constraints to protect these inter-core communication paths and hence results in a trustworthy MPSoC design.
V. CONCLUSIONS
We have presented a task scheduling based approach that an MPSoC designer can take to protect MPSoCs against malicious modifications in 3PIPs. We proposed two security constraints: the duplication-with-diversity constraint allows the detection of trojans that cause tasks to produce incorrect output, while the communication-diversity constraint allows the prevention of collusion between 3PIPs from the same vendor. We furthermore proposed two approaches for incorporating these constraints into task scheduling, one at the task level and the other at the core level. The experimental results show that using the proposed scheduling heuristic, security constraints can be fulfilled within 4 vendors. By scheduling dependent tasks on the same core, the overhead caused by the two security constraints can be effectively reduced to 81% of schedule length with no extra core needed. We believe that this extra cost is acceptable for building a trustworthy MPSoC, especially when it is used in critical infrastructure.
Future work will be focused on three directions: (1) incorporating security constraints into various other scheduling algorithms (such as genetic, ILP-based), (2) constructing more precise and detailed models of core processing speeds and power consumption, task execution time and variations, and (3) extending the security constraints so as to protect MPSoC against collusion even between different vendors.
