Adaptive Energy-Efficient Task Partitioning for Heterogeneous
Multi-Core Multiprocessor Real-Time Systems by Saha, Shivashis et al.
University of Nebraska - Lincoln 
DigitalCommons@University of Nebraska - Lincoln 
CSE Conference and Workshop Papers Computer Science and Engineering, Department of 
2012 
Adaptive Energy-Efficient Task Partitioning for Heterogeneous 
Multi-Core Multiprocessor Real-Time Systems 
Shivashis Saha 
University of Nebraska-Lincoln, ssaha@cse.unl.edu 
Jitender S. Deogun 
University of Nebraska-Lincoln, jdeogun1@unl.edu 
Ying Lu 
University of Nebraska-Lincoln, ying@unl.edu 
Saha, Shivashis; Deogun, Jitender S.; and Lu, Ying, "Adaptive Energy-Efficient Task Partitioning for 
Heterogeneous Multi-Core Multiprocessor Real-Time Systems" (2012). CSE Conference and Workshop 
Papers. 198. 
https://digitalcommons.unl.edu/cseconfwork/198 
Adaptive Energy-Efficient Task Partitioning for Heterogeneous
Multi-Core Multiprocessor Real-Time Systems
Shivashis Saha, Jitender S. Deogun, and Ying Lu
Department of Computer Science and Engineering,
University of Nebraska-Lincoln, Lincoln, NE 68588-0115, U.S.A.
Email: {ssaha,deogun,ylu}@cse.unl.edu
Abstract—The designs of heterogeneous multi-core multipro-
cessor real-time systems are evolving for higher energy efficiency
at the cost of increased heat density. This adversely affects the
reliability and performance of the real-time systems. Moreover,
the partitioning of periodic real-time tasks based on their worst
case execution time can lead to significant energy wastage.
In this paper, we investigate adaptive energy-efficient task
partitioning for heterogeneous multi-core multiprocessor real-
time systems. We use a power model which incorporates the
impact of temperature and voltage of a processor on its static
power consumption. Two different thermal models are used
to estimate the peak temperature of a processor. We develop
two feedback-based optimization and control approaches for
adaptively partitioning real-time tasks according to their ac-
tual utilizations. Simulation results show that the proposed
approaches are effective in minimizing the energy consumption
and reducing the number of task migrations.
Keywords-Adaptive Task Partitioning; Thermal-Constrained
Task Partitioning; Energy Minimization; Heterogeneous Multi-
Core Multiprocessor Real-Time Systems
I. INTRODUCTION
Energy-efficient designs of computer systems have received
significant interest in recent years due to an increased need for
energy conservation [1]. Heterogeneous multiprocessor real-
time systems are known to be energy efficient and have better
performance as compared to homogeneous systems [2]. The
energy efficiency of recent multiprocessor systems is achieved
by increasing the power density. This in turn results in high
heat density which can significantly impact the reliability and
performance of heterogeneous real-time systems [3].
The power consumption of a processor is divided into
static and dynamic power consumption. The power consumed
by the processor to maintain its activeness is called the
static power consumption [1]. Similarly, the power needed by
the processor while executing a task is called the dynamic
power consumption. The static power is generated by the
leakage current while the dynamic power is a function of
the speed of the processor [4]. This function is known to
be a strictly convex and monotonically increasing function
and is represented by a polynomial of at least second degree
[5]. This convex relationship is exploited by the Dynamic
Voltage Scaling (DVS) techniques for minimizing the total
energy consumption of a processor [6]. There has been
significant research in energy-aware scheduling for homoge-
neous multiprocessor systems which have negligible static
power consumption [7]. It has been shown that static power
consumption is significant and is comparable to dynamic
power consumption [8]. Leakage-aware scheduling strategies
for heterogeneous systems have been recently investigated [1],
[9]. There has also been a recent interest in temperature-
aware multiprocessor scheduling strategies for minimizing the
temperature of a processor and thus improving its reliability
[10], [11], [12]. In an ongoing project, we have been inves-
tigating thermal-constrained energy-efficient partitioning for
heterogeneous multi-core multiprocessor real-time systems.
The worst-case execution times (WCETs) of tasks are generally
known to be pessimistic estimates. Thus, WCET-based task
partitioning may result in significant energy wastage as the
processors may be unnecessarily running at high speeds due
to the pessimistic estimation. Hence, there is a need for an
adaptive approach for partitioning tasks in order to minimize
the energy consumption in real-time systems.
In this paper, we investigate adaptive thermal-constrained
energy-efficient partitioning of periodic tasks in heterogeneous
multi-core multiprocessor real-time systems. We consider a
system which is heterogeneous across multiprocessors, but
homogeneous within a multiprocessor. Thus, our objective is
to find an optimal set of active cores and partitioning of the
tasks based on their actual utilization that results in minimum
energy consumption while satisfying processor constraints and
meeting task deadlines. We use a power model that considers
the impact of temperature [12] and voltage [11] of a processor
on its leakage power consumption. Two thermal models, heat-
independent thermal (HIT) model and heat-dependent thermal
(HDT) model, are used for estimating the peak temperature
of a processor. We consider negligible heat transfer among
the cores in HIT model [11], and non-negligible heat transfer
in HDT model [10], [12]. We present Distributed Utilization
Control (DUC) and Greedy Utilization Control (GUC) heuris-
tics for adaptive task partitioning which are feedback-based
optimization and control approaches. Simulation results show
the effectiveness of the proposed heuristics in minimizing en-
ergy consumption and reducing the number of task migrations.
The rest of the paper is organized as follows: Related
work is given in Section II. The system models and the
problem statement are given in Section III. The adaptive
task partitioning algorithms are presented in Section IV. In
Section V, we present and discuss the simulation results.
Finally, conclusions and future work are given in Section VI.
978-1-4673-2362-8/12/$31.00 ©2012 IEEE
2012 International Conference on High Performance Computing and Simulation (HPCS), 
Digital Object Identifier: 10.1109/HPCSim.2012.6266904  
II. RELATED WORK
Real-time scheduling of periodic tasks in multiprocessor
environment is well investigated [13]. The existing techniques
can be broadly classified into global and partitioning-based
scheduling techniques. In global scheduling techniques, the
global scheduler selects tasks for execution primarily from a
queue of tasks [14]. However, in partitioning-based scheduling
techniques the processors are independently scheduled and
each task is assigned to a single processor for execution [15].
Partitioning-based scheduling techniques are more widely used
than global scheduling techniques due to their simplicity
in design and implementation [16]. Thus, our heuristics are
partitioning-based scheduling techniques aimed to minimize
energy consumption while satisfying the thermal constraints.
There has been a significant amount of research in energy-
aware scheduling strategies for homogeneous multiprocessor
systems [7], [14]. However, research on energy-aware scheduling
strategies for heterogeneous multiprocessor systems is still in its
early stages [1]. Most of the existing work is based on power models
with negligible static power consumption [17]. However, a
power model with non-negligible static power consumption
was recently investigated [1]. Recent research has shown
that the leakage current of a processor changes superlinearly
with its temperature [18]. Thus, thermal-aware scheduling
strategies that assume a dynamic leakage current have recently
been investigated for real-time systems [10], [11], [12], [19].
Thermal models were also proposed to consider non-negligible
heat transfer between different cores in a multi-core system
[10], [12]. Moreover, it has also been shown recently that
leakage current is not only impacted by the temperature of
a processor but also by its supply voltage [11].
Most of the partitioning-based approaches use information
of WCET of tasks which is overly pessimistic and hard to pre-
dict accurately. Thus, there is a need for adaptive partitioning
of tasks based on their actual execution time. Control theory
has been used for developing adaptive centralized scheduling
for multiprocessor real-time systems [20]. Probabilistic distri-
butions of execution time of tasks have also been used for
developing adaptive scheduling strategies [21].
III. SYSTEM MODELS AND PROBLEM DEFINITION
In this section, we describe our models and the problem.
A. Multiprocessor Model
Let $\Omega = \{M_1, M_2, \ldots, M_m\}$ denote a set of interconnected
heterogeneous multiprocessor units. Each multiprocessor unit has $k$
identical cores (or processors), i.e., $M_i = \{M_{i,1}, M_{i,2}, \ldots, M_{i,k}\}$
($i = 1, \ldots, m$). A core $M_{i,j}$ ($j = 1, \ldots, k$) supports dynamic
voltage scaling (DVS) and varies its voltage/speed/frequency $f_{i,j}$ to
one of the discrete levels in the range $[f_i^{min}, f_i^{max}]$, where
$f_i^{max}$ ($f_i^{min}$) is the maximum (minimum) operating frequency of
the multiprocessor unit $M_i$. Thus, our system is heterogeneous across
multiprocessors, but homogeneous within a multiprocessor. For simplicity,
the frequency of a core is normalized with respect to (wrt) $f_i^{max}$,
i.e., we assume $f_i^{max} = 1$. The throughput (or capacity) of a core is
proportional to its operating frequency [22]. The capacity of a core
$M_{i,j}$ is denoted by $\mu_{i,j}$ and is equal to $\alpha_i f_{i,j}$, where
$\alpha_i$ is the performance coefficient of $M_i$. In a heterogeneous
multiprocessor system, higher values of $\alpha_i$ correspond to more
powerful multiprocessor units.
B. Task Model
Let $\Gamma = \{\tau_1, \tau_2, \ldots, \tau_n\}$ denote a set of independent
periodic real-time tasks. A periodic task $\tau_i$ ($i = 1, \ldots, n$) is an
infinite sequence of task instances (jobs) released with periodicity $P_i$
[23]. Thus, the relative deadline of the current instance (job) of $\tau_i$
is given by its period $P_i$. $W_i$ denotes the worst-case execution time of
$\tau_i$ on a core of a standard multiprocessor $\wp$ with performance
coefficient $\alpha_\wp = 1$; the execution time is equal to $W_i / f_\wp$ if
the core $\wp$ is running at a constant frequency $f_\wp$. The worst-case
utilization of $\tau_i$ under the maximum frequency of a standard core is
denoted by $u_i$ and is equal to $W_i / P_i$. Thus, the total worst-case
utilization of task set $\Gamma$ under the maximum frequency of a standard
core is $U_{tot} = \sum_{i=1}^{n} u_i = \sum_{i=1}^{n} W_i / P_i$. The necessary
condition for having a feasible schedule of $\Gamma$ on $\Omega$ is
$U_{tot} \le \sum_{j=1}^{m} k \alpha_j$. We make this assumption throughout
the paper.
Given a partitioning of tasks into cores, the worst-case utilization of
$M_{i,j}$ ($i = 1, \ldots, m$; $j = 1, \ldots, k$) under its maximum frequency
is denoted by $U_{i,j}$. If $\Gamma_{i,j}$ denotes the set of tasks allocated
to $M_{i,j}$, then
$$U_{i,j} = \frac{\sum_{\tau_r \in \Gamma_{i,j}} W_r / P_r}{\alpha_i} = \frac{\sum_{\tau_r \in \Gamma_{i,j}} u_r}{\alpha_i},$$
where $r = 1, \ldots, n$. As each task is allocated to exactly one core,
$U_{tot} = \sum_{r=1}^{n} u_r = \sum_{i=1}^{m} \sum_{j=1}^{k} \alpha_i U_{i,j}$.
$P$ is the hyper-period of task set $\Gamma$, i.e., the minimum positive
number such that the pattern of job releases repeats every $P$ time units.
If $P_1, \ldots, P_n$ are integers, then $P$ is the least common multiple
(LCM) of all task periods.
In addition to the task period $P_i$ and worst-case execution time $W_i$ of
a task $\tau_i$ ($i = 1, \ldots, n$), let the actual execution time of the
$q$th job of $\tau_i$ be $c_{i,q}$. Thus, under a constant frequency $f_\wp$,
the actual execution time of the $q$th job of $\tau_i$ is $c_{i,q} / f_\wp$.
Therefore, the actual utilization of $\tau_i$ under maximum frequency
($f^{max} = 1$) on a standard core $\wp$ is $\tilde{u}_i = c_{i,q} / P_i$.
Given a partitioning of tasks into cores, the actual utilization of
$M_{j,r}$ ($j = 1, \ldots, m$; $r = 1, \ldots, k$) under its maximum frequency
is
$$\tilde{U}_{j,r} = \frac{\sum_{\tau_i \in \Gamma_{j,r}} \tilde{u}_i}{\alpha_j}.$$
Thus, we aim to minimize energy consumption in hyper-period $P$ while
satisfying all the constraints.
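The quantities defined in this subsection can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the task tuples, the function names, and the sample values are hypothetical and do not come from the paper; only the formulas for $U_{tot}$, the necessary feasibility condition, and the hyper-period follow the text.

```python
from fractions import Fraction
from math import lcm

def total_utilization(tasks):
    """U_tot = sum of W_i / P_i over (W_i, P_i) pairs on the standard core."""
    return sum(Fraction(w) / Fraction(p) for w, p in tasks)

def meets_necessary_condition(tasks, alphas, k):
    """Necessary (not sufficient) condition: U_tot <= sum_j k * alpha_j."""
    return total_utilization(tasks) <= k * sum(Fraction(a) for a in alphas)

def hyper_period(tasks):
    """LCM of the task periods, assuming all periods are integers."""
    return lcm(*(p for _, p in tasks))

# Hypothetical task set: (W_i, P_i) in the same time unit.
tasks = [(1, 4), (2, 6), (3, 12)]
u_tot = total_utilization(tasks)      # 1/4 + 1/3 + 1/4
p_hyper = hyper_period(tasks)         # lcm(4, 6, 12)
```

Exact fractions are used here only to keep the comparison in the feasibility check free of rounding effects.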
C. Power Model
The power consumed by a core $M_{i,j}$ is given by $\Phi_{i,j}$
($i = 1, \ldots, m$; $j = 1, \ldots, k$). It is composed of two parts:
$\Phi^s_{i,j}$ and $\Phi^d_{i,j}$ (Eq. 1a). $\Phi^s_{i,j}$ is the static (or
leakage) power consumption of a core, which is generated by the leakage
current for maintaining the activeness of the core [1], [8].
$\Phi^d_{i,j}$ is the dynamic power consumption of a core required for
executing a task [4]. Eq. 1b gives the static power consumption of a core,
which is dependent on both frequency [10], [12] (proportional to the supply
voltage [24]) and temperature [11], [24] of the core. $\gamma_i$,
$\delta_i$, and $\chi_i$ are non-negative constants dependent on the
architecture of $M_i$. The dynamic power consumption of a core is a
function of its frequency. Using current DVS technologies, this function,
$g(f_{i,j})$, is assumed to be strictly convex and monotonically increasing,
and is represented by a polynomial of at least second degree. We assume
$\Phi^d_{i,j}$ is a cubic polynomial in frequency (i.e., $f^3_{i,j}$) [10],
[11], [12] and is given by Eq. 1c.
$$\Phi_{i,j}(f_{i,j}) = \Phi^s_{i,j}(f_{i,j}) + \Phi^d_{i,j}(f_{i,j}) \tag{1a}$$
$$\Phi^s_{i,j}(f_{i,j}) = \gamma_i f_{i,j} + \delta_i f_{i,j} T_{i,j} \tag{1b}$$
$$\Phi^d_{i,j}(f_{i,j}) = \chi_i f^3_{i,j} \tag{1c}$$
D. Temperature Model
In this section, we present two different thermal models,
heat-independent thermal model [11] and heat-dependent ther-
mal model [10], [12].
1) Heat-Independent Thermal (HIT) Model: In this model, we assume there
is negligible or no heat transfer among cores of a multiprocessor unit or
among different multiprocessors [11], [24]. Using the RC thermal model [1],
[10], [11], [12], [24], the temperature of a core $M_{i,j}$
($i = 1, \ldots, m$; $j = 1, \ldots, k$) wrt time is denoted by $T_{i,j}(t)$
and is given by Eq. 2, where $T^{amb}$ is the ambient temperature (in °C),
$R_i$ is the thermal resistance of $M_i$ (in °C/Watt), $C_i$ is the thermal
capacitance of $M_i$ (in J/°C), $\Phi_{i,j}(t)$ is the power consumption of
$M_{i,j}$ wrt time (in Watt), and $\frac{dT_{i,j}(t)}{dt}$ is the derivative
of the temperature of $M_{i,j}$ wrt time.
$$R_i C_i \frac{dT_{i,j}(t)}{dt} + T_{i,j}(t) - R_i \Phi_{i,j}(t) = T^{amb} \tag{2}$$
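If the initial temperature of $M_{i,j}$ at time $t_0$ is $T^0_{i,j}$ and
$M_{i,j}$ is running at a constant frequency $f_{i,j}$, then the final
temperature of $M_{i,j}$ at time $t_1$ is denoted by $T^1_{i,j}$ and is given
by Eq. 3.
$$\frac{dT_{i,j}}{dt} = \frac{T^{amb} + \gamma_i R_i f_{i,j} + \delta_i R_i f_{i,j} T_{i,j} + \chi_i R_i f^3_{i,j}}{R_i C_i} - \frac{T_{i,j}}{R_i C_i} \tag{3a}$$
$$\int_{T^0_{i,j}}^{T^1_{i,j}} \frac{dT_{i,j}}{\frac{T^{amb} + \gamma_i R_i f_{i,j} + \chi_i R_i f^3_{i,j}}{R_i C_i} - \left(\frac{1 - \delta_i R_i f_{i,j}}{R_i C_i}\right) T_{i,j}} = \int_{t_0}^{t_1} dt \tag{3b}$$
$$T^1_{i,j} = \frac{T^{amb} + \gamma_i R_i f_{i,j} + \chi_i R_i f^3_{i,j}}{1 - \delta_i R_i f_{i,j}} - \left(\frac{T^{amb} + \gamma_i R_i f_{i,j} + \chi_i R_i f^3_{i,j}}{1 - \delta_i R_i f_{i,j}} - T^0_{i,j}\right) e^{-\left(\frac{1 - \delta_i R_i f_{i,j}}{R_i C_i}\right)(t_1 - t_0)} \tag{3c}$$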
If $M_{i,j}$ is running at a constant frequency, then its temperature is a
non-decreasing function (Eq. 3) [10], [11], [12], [24]. The temperature of
$M_{i,j}$ becomes steady when the system reaches steady state condition.
The peak temperature of $M_{i,j}$ running at constant frequency $f_{i,j}$ is
denoted by $T^*_{i,j}$
$\left(\frac{d^2 T_{i,j}(t)}{dt^2}\Big|_{T_{i,j}(t) = T^*_{i,j}} > 0\right)$
and is given by Eq. 4.
$$T^*_{i,j} = \frac{\frac{T^{amb} + \gamma_i R_i f_{i,j} + \chi_i R_i f^3_{i,j}}{R_i C_i}}{\frac{1 - \delta_i R_i f_{i,j}}{R_i C_i}} = \frac{T^{amb} + \gamma_i R_i f_{i,j} + \chi_i R_i f^3_{i,j}}{1 - \delta_i R_i f_{i,j}} \tag{4}$$
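The power model (Eq. 1) and the HIT temperature expressions (Eqs. 3c and 4) can be evaluated directly. The sketch below is a minimal illustration, not the authors' simulator: the function names are hypothetical, the sample parameters are taken from the $M_1$ row of Table I with a normalized frequency of 1 and an initial temperature equal to the ambient temperature, and it assumes $1 - \delta_i R_i f_{i,j} > 0$ so that a finite steady state exists.

```python
import math

def power(f, T, gamma, delta, chi):
    """Eq. 1: static power (gamma*f + delta*f*T) plus dynamic power (chi*f^3)."""
    return gamma * f + delta * f * T + chi * f ** 3

def steady_state_temp(f, t_amb, gamma, delta, chi, R):
    """Eq. 4: peak (steady-state) temperature at constant frequency f."""
    denom = 1.0 - delta * R * f
    if denom <= 0:
        raise ValueError("no finite steady state: 1 - delta*R*f must be positive")
    return (t_amb + gamma * R * f + chi * R * f ** 3) / denom

def temp_after(dt, T0, f, t_amb, gamma, delta, chi, R, C):
    """Eq. 3c: temperature after dt = t1 - t0 time units, starting from T0."""
    t_star = steady_state_temp(f, t_amb, gamma, delta, chi, R)
    rate = (1.0 - delta * R * f) / (R * C)
    return t_star - (t_star - T0) * math.exp(-rate * dt)

# Illustrative parameters (M1 row of Table I, f normalized to 1, T_amb = 0).
params = dict(f=1.0, t_amb=0.0, gamma=20.5060, delta=0.1666, chi=3.656, R=0.282)
t_star = steady_state_temp(**params)
t_early = temp_after(10.0, 0.0, C=340, **params)
t_late = temp_after(100.0, 0.0, C=340, **params)
```

A useful sanity check is that the steady-state value satisfies Eq. 2 with the derivative set to zero, i.e., $T^* - R_i \Phi_{i,j}(T^*) = T^{amb}$.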
2) Heat-Dependent Thermal (HDT) Model: In this model, we assume a
non-negligible amount of heat transfer among the cores of a multiprocessor
unit and negligible or no heat transfer among different multiprocessor
units [10], [12]. We also assume that there is a set of heat sinks
$\Xi_i = \{\varpi_{i,1}, \varpi_{i,2}, \ldots, \varpi_{i,h}\}$ for each
multiprocessor unit $M_i$ ($i = 1, \ldots, m$). These heat sinks are only
used for heat dissipation and are placed on top of the cores. They do not
generate any power. Fourier's law can be used to model the dynamic heat
transfer between the cores and heat sinks of a multiprocessor unit, where
each core acts as a discrete thermal element [10], [12]. Using the RC
thermal model [10], [12], let the thermal conductance between two cores
$M_{i,j}$ and $M_{i,j'}$ be $\omega^i_{j,j'}$ ($\forall j, j' \in 1, \ldots, k$).
$\zeta^i_{j,q}$ denotes the vertical thermal conductance between core
$M_{i,j}$ and sink $\varpi_{i,q}$ ($q = 1, \ldots, h$). The horizontal thermal
conductance between the heat sinks $\varpi_{i,q}$ and $\varpi_{i,q'}$
($\forall q, q' \in 1, \ldots, h$) is denoted by $\omega^i_{q,q'}$. We assume
that $\omega^i_{j,j'} = \omega^i_{j',j}$, $\omega^i_{j,j} = 0$,
$\omega^i_{q,q'} = \omega^i_{q',q}$, and $\omega^i_{q,q} = 0$. $\omega^{amb}$
denotes the thermal conductance between the heat sink and the environment.
The thermal capacitances of $M_i$ and $\Xi_i$ are denoted by $C_i$ and
$C^{sink}_i$, respectively. Eq. 5 gives the temperature of $M_{i,j}$ wrt
time, denoted by $T_{i,j}(t)$, where $\frac{dT_{i,j}(t)}{dt}$ and
$\Phi_{i,j}(t)$ are respectively the derivative of the temperature of
$M_{i,j}$ and the power consumption of $M_{i,j}$ wrt time.
$T^{sink}_{i,q}(t)$ denotes the temperature of $\varpi_{i,q}$ wrt time and is
given by Eq. 6, where $\frac{dT^{sink}_{i,q}(t)}{dt}$ and $T^{amb}$ are
respectively the derivative of the temperature of $\varpi_{i,q}$ wrt time
and the ambient temperature of the system.
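$$C_i \frac{dT_{i,j}(t)}{dt} = \Phi_{i,j}(t) - \sum_{j'=1}^{k} \omega^i_{j,j'} \left(T_{i,j}(t) - T_{i,j'}(t)\right) - \sum_{q=1}^{h} \zeta^i_{j,q} \left(T_{i,j}(t) - T^{sink}_{i,q}(t)\right) \tag{5}$$
$$C^{sink}_i \frac{dT^{sink}_{i,q}(t)}{dt} = -\omega^{amb} \left(T^{sink}_{i,q}(t) - T^{amb}\right) - \sum_{j=1}^{k} \zeta^i_{j,q} \left(T^{sink}_{i,q}(t) - T_{i,j}(t)\right) - \sum_{q'=1}^{h} \omega^i_{q,q'} \left(T^{sink}_{i,q}(t) - T^{sink}_{i,q'}(t)\right) \tag{6}$$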
As long as $M_{i,j}$ is running at a constant frequency, its temperature is
a non-decreasing function [10], [11], [12], [24]. When the system reaches a
steady state condition, the temperature of $M_{i,j}$ becomes steady.
$T^*_{i,j}$ denotes the peak (or maximum) temperature of $M_{i,j}$ running at
constant frequency $f_{i,j}$
$\left(\frac{d^2 T_{i,j}(t)}{dt^2}\Big|_{T_{i,j}(t) = T^*_{i,j}} > 0\right)$,
and $T^{sink*}_{i,q}$ denotes the peak temperature of $\varpi_{i,q}$
$\left(\frac{d^2 T^{sink}_{i,q}(t)}{dt^2}\Big|_{T^{sink}_{i,q}(t) = T^{sink*}_{i,q}} > 0\right)$.
Eq. 7 approximates the values of $T^*_{i,j}$ and $T^{sink*}_{i,q}$.
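$$0 = \left(\gamma_i f_{i,j} + \chi_i f^3_{i,j}\right) + T^*_{i,j} \left(\delta_i f_{i,j} - \sum_{j'=1}^{k} \omega^i_{j,j'} - \sum_{q=1}^{h} \zeta^i_{j,q}\right) + \sum_{j'=1}^{k} \omega^i_{j,j'} T^*_{i,j'} + \sum_{q=1}^{h} \zeta^i_{j,q} T^{sink*}_{i,q} \tag{7a}$$
$$0 = \omega^{amb} T^{amb} + T^{sink*}_{i,q} \left(-\omega^{amb} - \sum_{j=1}^{k} \zeta^i_{j,q} - \sum_{q'=1}^{h} \omega^i_{q,q'}\right) + \sum_{j=1}^{k} \zeta^i_{j,q} T^*_{i,j} + \sum_{q'=1}^{h} \omega^i_{q,q'} T^{sink*}_{i,q'} \tag{7b}$$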
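We get Eq. 8 from simplifying Eq. 7 [12], where
$$A_{j,j} = \delta_i f_{i,j} - \sum_{j'=1}^{k} \omega^i_{j,j'} - \sum_{q=1}^{h} \zeta^i_{j,q}$$
$$A_{j,j'} = \omega^i_{j,j'}$$
$$A_{j,k+q} = A_{k+q,j} = \zeta^i_{j,q}$$
$$A_{k+q,k+q} = -\omega^{amb} - \sum_{j=1}^{k} \zeta^i_{j,q} - \sum_{q'=1}^{h} \omega^i_{q,q'}$$
$$A_{k+q,k+q'} = \omega^i_{q,q'}$$
$$\begin{bmatrix} A_{1,1} & \cdots & A_{1,k+h} \\ \vdots & \ddots & \vdots \\ A_{k,1} & \cdots & A_{k,k+h} \\ A_{k+1,1} & \cdots & A_{k+1,k+h} \\ \vdots & \ddots & \vdots \\ A_{k+h,1} & \cdots & A_{k+h,k+h} \end{bmatrix} \begin{bmatrix} T^*_{i,1} \\ \vdots \\ T^*_{i,k} \\ T^{sink*}_{i,k+1} \\ \vdots \\ T^{sink*}_{i,k+h} \end{bmatrix} = - \begin{bmatrix} \gamma_i f_{i,1} + \chi_i f^3_{i,1} \\ \vdots \\ \gamma_i f_{i,k} + \chi_i f^3_{i,k} \\ \omega^{amb} T^{amb} \\ \vdots \\ \omega^{amb} T^{amb} \end{bmatrix} \tag{8a}$$
Therefore, with $[\lambda]$ denoting the column vector on the right-hand
side of Eq. 8a,
$$[T]_{k+h,1} = -[A]^{-1}_{k+h,k+h} \times [\lambda]_{k+h,1} \tag{8b}$$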
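To make Eq. 8 concrete, the following sketch assembles the matrix $A$ and the vector $\lambda$ for one multiprocessor unit and solves $A\,T = -\lambda$. It is an illustration under stated assumptions: the function names and the tiny one-core, one-sink example are hypothetical, a plain Gaussian elimination stands in for whatever solver the authors used, and the zero diagonals $\omega^i_{j,j} = 0$ and $\omega^i_{q,q} = 0$ are assumed as in the text.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular matrix")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def hdt_steady_state(f, gamma, delta, chi, omega, zeta, omega_sink, omega_amb, t_amb):
    """Return [T*_1..T*_k, Tsink*_1..Tsink*_h] for one unit (Eq. 8).

    f: k core frequencies; omega: k x k core conductances (zero diagonal);
    zeta: k x h core-to-sink conductances; omega_sink: h x h (zero diagonal).
    """
    k, h = len(f), len(omega_sink)
    n = k + h
    A = [[0.0] * n for _ in range(n)]
    lam = [0.0] * n
    for j in range(k):
        A[j][j] = delta * f[j] - sum(omega[j]) - sum(zeta[j])
        for j2 in range(k):
            if j2 != j:
                A[j][j2] = omega[j][j2]
        for q in range(h):
            A[j][k + q] = A[k + q][j] = zeta[j][q]
        lam[j] = gamma * f[j] + chi * f[j] ** 3
    for q in range(h):
        A[k + q][k + q] = (-omega_amb - sum(zeta[j][q] for j in range(k))
                           - sum(omega_sink[q]))
        for q2 in range(h):
            if q2 != q:
                A[k + q][k + q2] = omega_sink[q][q2]
        lam[k + q] = omega_amb * t_amb
    return solve_linear(A, [-v for v in lam])

# Hypothetical one-core, one-sink unit with delta = 0 (no temperature feedback).
core_temp, sink_temp = hdt_steady_state(
    f=[1.0], gamma=1.0, delta=0.0, chi=1.0,
    omega=[[0.0]], zeta=[[0.5]], omega_sink=[[0.0]], omega_amb=1.0, t_amb=20.0)
```

In this example the core dissipates 2 units of power, so the sink settles at $T^{amb} + 2/\omega^{amb}$ and the core at a further $2/\zeta$ above the sink, which gives a hand-checkable result.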
E. Problem Definition
Proposition 1: If the frequency of Mi,j is at least U˜i,j ,
then any periodic hard real-time scheduling policy which can
fully utilize the core (e.g., Earliest Deadline First (EDF), Least
Laxity First) can be used to obtain a feasible schedule [25].
According to Proposition 1, we assume that the frequency of $M_{i,j}$ is
tightly related to the actual utilization $\tilde{U}_{i,j}$ instead of the
worst-case utilization $U_{i,j}$. The actual frequency of $M_{i,j}$, denoted
by $\tilde{f}_{i,j}$, is thus the lowest discrete frequency that is greater
than or equal to $\tilde{U}_{i,j}$. The energy consumption of $M_{i,j}$
during the time interval $[0, P]$ is thus estimated by
$P \times \Phi_{i,j}(\tilde{f}_{i,j})$. We also need to make sure that by
running $M_{i,j}$ at frequency $\tilde{f}_{i,j}$, the peak temperature of
$M_{i,j}$, denoted by $T^*_{i,j}$, is no more than $T^{max}_{i,j}$, which is
the maximum feasible operating temperature of $M_{i,j}$.
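The frequency selection rule described above (the lowest discrete level that is at least the actual utilization) can be sketched as follows. The function name and the set of discrete levels are assumptions made for illustration; the paper only states that levels lie in $[f_i^{min}, f_i^{max}]$ with $f_i^{max}$ normalized to 1.

```python
from bisect import bisect_left

def select_frequency(levels, utilization):
    """Return the lowest level >= utilization, or None if no level suffices.

    levels must be sorted in ascending order and normalized so that the
    maximum frequency is 1.0.
    """
    idx = bisect_left(levels, utilization)
    return levels[idx] if idx < len(levels) else None

# Hypothetical discrete levels between f_min = 0.5 and f_max = 1.0.
levels = [0.5, 0.625, 0.75, 0.875, 1.0]
chosen = select_frequency(levels, 0.55)
```

A core whose utilization is below the minimum level still runs at the minimum level, and a utilization above 1.0 yields no feasible level, which signals that the partitioning must change.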
1) Problem Statement: Given a set of periodic real-time
tasks (Γ) and a set of interconnected multi-core multiprocessor
units (Ω), the problem is to identify a subset of cores to be ac-
tivated (Ψ) and partition tasks to the active cores such that the
overall energy consumption is minimized, peak temperature
constraints are satisfied, and task deadlines are not violated.
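This problem is known to be NP-hard in the strong sense [4], [26]. The
problem is formally defined as follows:
$$\text{Minimize: } P \sum_{M_{i,j} \in \Psi} \Phi_{i,j}(\tilde{f}_{i,j}), \quad i = 1, \ldots, m;\ j = 1, \ldots, k;\ \Psi \subseteq \Omega \tag{9}$$
Subject to:
$$0 \le \tilde{U}_{i,j} \le \beta_{i,j}, \quad i = 1, \ldots, m;\ j = 1, \ldots, k;\ M_{i,j} \in \Psi \tag{10a}$$
$$\sum_{M_{i,j} \in \Psi} \alpha_i \tilde{U}_{i,j} = \gamma U_{tot}, \quad 0 < l \le m;\ 0 < c_i \le k;\ \Psi \subseteq \Omega \tag{10b}$$
$$\tilde{f}_{i,j}: \text{lowest discrete frequency where } \tilde{f}_{i,j} \ge \tilde{U}_{i,j}, \quad i = 1, \ldots, m;\ j = 1, \ldots, k;\ M_{i,j} \in \Psi \tag{10c}$$
$$T^*_{i,j} \le T^{max}_{i,j}, \quad i = 1, \ldots, m;\ j = 1, \ldots, k;\ M_{i,j} \in \Psi \tag{10d}$$
where $\beta_{i,j}$ is the schedulable utilization bound (e.g., 1.0 if
$M_{i,j}$ is under EDF) and $\gamma$ is the ratio between the actual and
worst-case total utilization of the tasks.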
IV. ADAPTIVE PARTITIONING ALGORITHMS
In this section we present adaptive partitioning algorithms
which are based on feedback control and real-time scheduling
theories and use actual utilization of tasks to minimize the
overall energy consumption.
During the initial partitioning of tasks, the values of the
actual utilization of tasks are hard to estimate. Thus, for the
initial partitioning it is assumed that the actual and worst-case
utilizations of tasks are the same, i.e., $\tilde{u}_i = W_i / P_i$ ($i = 1, \ldots, n$)
and γ = 1. This information is used to compute the initial
partitioning using the Min-core Worst-fit (MW) heuristic which
is described below. Thus, we achieve the initial partitioning of
tasks and also identify the set of active cores (Ψ) needed to
execute all the tasks without violating any constraints.
In MW heuristic, we use worst-case execution time of
tasks to partition the tasks into cores according to the worst-
fit scheduling strategy. Therefore, the core with maximum
available capacity is selected to execute the next task in
the queue [4], [9]. But, worst-fit scheduling just solves the
partitioning part of the problem. It does not find the least
number of active cores (Ψ) needed to execute all the tasks. In
order to find Ψ, the cores are sorted in decreasing order of their
respective maximum capacity and the number of active cores is
sequentially decreased when using worst-fit scheduling. Each
partitioning scheme ensures that peak temperature constraints
of cores and deadlines of tasks are not violated. Finally, the
partitioning scheme resulting in the least energy consumption
is identified as the initial partitioning strategy for the tasks.
Thus, MW heuristic also identifies the set of active cores (Ψ)
needed to execute all the tasks.
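The MW procedure described above can be sketched as a loop over candidate active-core counts, each evaluated with worst-fit. This is a simplified illustration rather than the authors' implementation: the function names are hypothetical, tasks are taken in queue order, and the energy model and the thermal and deadline checks are passed in as caller-supplied callbacks (`energy_of`, `is_feasible`) because their details depend on the models of Section III.

```python
def worst_fit(task_utils, capacities):
    """Assign each task to the core with the most remaining capacity.

    Returns a list of task-index lists (one per core), or None if a task
    does not fit on any core.
    """
    remaining = list(capacities)
    assignment = [[] for _ in capacities]
    for task, util in enumerate(task_utils):
        core = max(range(len(remaining)), key=lambda c: remaining[c])
        if remaining[core] < util:
            return None
        remaining[core] -= util
        assignment[core].append(task)
    return assignment

def min_core_worst_fit(task_utils, capacities, energy_of,
                       is_feasible=lambda assignment, caps: True):
    """Try the n most capable cores for n = all..1; keep the cheapest result.

    Returns (energy, active_core_indices, assignment) or None.
    """
    order = sorted(range(len(capacities)),
                   key=lambda c: capacities[c], reverse=True)
    best = None
    for n_active in range(len(order), 0, -1):
        active = order[:n_active]
        caps = [capacities[c] for c in active]
        assignment = worst_fit(task_utils, caps)
        if assignment is None or not is_feasible(assignment, caps):
            continue
        cost = energy_of(assignment, caps)
        if best is None or cost < best[0]:
            best = (cost, active, assignment)
    return best

# Toy example: energy is approximated by the number of cores that hold tasks.
count_used = lambda assignment, caps: sum(1 for tasks in assignment if tasks)
result = min_core_worst_fit([0.5, 0.25, 0.25], [1.0, 1.0, 0.5], count_used)
```

With the toy cost function, the search settles on a single active core because all three tasks fit within its capacity.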
The initial partitioning of tasks and identification of active
cores are based on the WCET of tasks which is generally
known to be a pessimistic estimate and can be significantly
greater than the actual execution time. Thus, we use a dynamic
feedback based approach to adapt the initial partitioning using
the actual utilization of tasks. We use a feedback predictor
which monitors the current jobs and estimates the actual
utilization of future jobs [20]. The information of u˜i is
periodically updated and used to recalculate the partitioning
of tasks for minimizing the energy consumption. However,
unlike the initial partitioning, this new solution is not
implemented exactly, since such a dramatic change of task
partitioning may require a large number of task migrations.
Therefore, the new solution of the task partitioning is only
used to determine the total utilization target for a core's tasks,
denoted by $B_{j,r}$ ($j = 1, \ldots, m$; $r = 1, \ldots, k$), i.e., the
target value of $\tilde{U}_{j,r}$. We set
$B_{j,r} = \sum_{\tau_i \in \Gamma_{j,r}} \tilde{u}_i$, where $\Gamma_{j,r}$
denotes the tasks that would be allocated to $M_{j,r}$ if the new
partitioning scheme were followed. Since only the cores in $\Psi$ need to
be active, we have $B_{j,r} = 0$ for every $M_{j,r} \notin \Psi$. Thus,
we develop utilization control methods to adapt the initial task
partitioning and achieve the target utilization of the cores.
We use an approach similar to the controlled task migration
approach [13] for changing the utilization of cores. Each job
of a task is executed only on a single core while different
jobs of the task can execute on different cores. Thus, the
runtime context of a job is maintained in only one core but the
task-level context may be migrated [13]. The task migration
decisions of a core are made by the utilization controller of
the core. The control period ts is selected such that multiple
jobs of a task can be released during the period. Therefore,
U˜j,r(s) is estimated during the time interval [(s−1)ts, sts) and
the main objective of the task migration strategy is to choose
migrations such that U˜j,r(s + 1) is close to Bj,r during the
sth sampling point (time sts). Below we present two different
task migration heuristics for accomplishing our objective.
Distributed Utilization Control (DUC) Heuristic: In the DUC heuristic, a
task is migrated to an under-utilized core ($\tilde{U}_{j,r} < B_{j,r}$)
from an over-utilized core based on the migration probability. Let
$P_{jr,j'r'}$ denote the probability of core $M_{j,r}$ migrating a task to
core $M_{j',r'}$ ($j' = 1, \ldots, m$; $r' = 1, \ldots, k$), and let
$\delta_{j,r}(s)$ denote the total actual utilization of tasks migrated out
of $M_{j,r}$ during the $(s+1)$th control period. The load dynamics of a
core are then given by Eq. 11.
$$\tilde{U}_{j,r}(s+1) = \tilde{U}_{j,r}(s) - \delta_{j,r}(s) + \sum_{j'=1}^{m} \sum_{r'=1}^{k} \frac{\alpha_{j',r'}}{\alpha_{j,r}} P_{j'r',jr}\, \delta_{j',r'}(s) \tag{11}$$
The current utilization of all under-utilized cores is broadcast to all
active cores. Based on this information, a core $M_{j,r}$ computes the
values of $P_{jr,j'r'}$ using Eq. 12, where $\Theta$ denotes the set of
under-utilized cores.
$$P_{jr,j'r'} = \begin{cases} 0 & \text{if } M_{j',r'} \notin \Theta \\ \dfrac{B_{j',r'} - \tilde{U}_{j',r'}(s)}{\sum_{M_{a,b} \in \Theta} \left(B_{a,b} - \tilde{U}_{a,b}(s)\right)} & \text{otherwise} \end{cases} \tag{12}$$
The control rules are given in Eq. 13. The value of positive
control gain Kj,r is selected to ensure overall system stability
[27]. The admission controller in a core accepts a task
migration if and only if the utilization and the task deadline
miss ratios of the core are bounded (e.g., U˜j,r(s+1) ≤ βj,r and
MRj,r(s + 1) ≤ 1%, where MRj,r denotes the task deadline miss
ratio of a core Mj,r). A small bound on miss ratio is selected to
ensure soft real-time properties [28]. The admission controller
also ensures that the peak temperature constraint of the core is
not violated by accepting any new task. An idle core is turned
off or put to a sleep state for saving energy.
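$$y_{j,r}(s) = \tilde{U}_{j,r}(s) - B_{j,r} \tag{13a}$$
$$\delta_{j,r}(s) = \begin{cases} K_{j,r}\, y_{j,r}(s) & \text{if } y_{j,r}(s) \ge 0 \\ 0 & \text{otherwise} \end{cases} \tag{13b}$$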
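The two DUC ingredients, the migration probabilities of Eq. 12 and the control rule of Eq. 13, can be sketched as below. The function names, the dictionary-based representation of cores, and the sample numbers are hypothetical; admission control, the choice of the gain $K_{j,r}$, and the selection of which tasks to move are deliberately left out.

```python
def migration_probabilities(targets, utils):
    """Eq. 12: probability of sending a migrated task to each core.

    targets and utils map a core id to B and to the current actual
    utilization. Cores that are not under-utilized get probability 0.
    """
    slack = {c: targets[c] - utils[c] for c in targets if utils[c] < targets[c]}
    total = sum(slack.values())
    return {c: (slack.get(c, 0.0) / total if total > 0 else 0.0)
            for c in targets}

def outgoing_utilization(util, target, gain):
    """Eq. 13: utilization to migrate out, gain * (util - target) if >= 0."""
    error = util - target
    return gain * error if error >= 0 else 0.0

# Hypothetical snapshot at one sampling point.
targets = {"a": 0.5, "b": 0.5, "c": 0.5}
utils = {"a": 0.75, "b": 0.25, "c": 0.0}
probs = migration_probabilities(targets, utils)
shed = outgoing_utilization(utils["a"], targets["a"], gain=0.5)
```

In this snapshot core "c" has twice the slack of core "b", so it receives twice the migration probability, while the over-utilized core "a" receives none.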
Greedy Utilization Control (GUC) Heuristic: In GUC
heuristic, we follow a greedy approach for migrating the
tasks. The current utilization of all under-utilized cores is
broadcast to all active cores. Based on this information, the
most over-utilized core selects the task with the least actual
utilization and migrates the task to the most under-utilized
core. The admission controller ensures that the utilization
and task deadline miss ratios of the core are bounded (e.g.
U˜j,r(s + 1) ≤ βj,r and MRj,r(s + 1) ≤ 1%) and the peak
temperature constraint of a core is not violated by accepting
a new task. An idle core is turned off or put to a sleep state
for saving energy.
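One GUC migration step can be sketched as follows. This is a simplified illustration: the function name and data layout are hypothetical, all cores are assumed to have the same performance coefficient so that utilizations add directly, and the admission controller is reduced to a utilization bound check (the miss-ratio and peak-temperature checks described above are omitted).

```python
def guc_step(tasks_on_core, task_util, targets, bound=1.0):
    """Move the smallest task of the most over-utilized core to the most
    under-utilized core. Returns (task, source, destination) or None.

    tasks_on_core maps a core id to a list of task ids (modified in place);
    task_util maps a task id to its actual utilization.
    """
    util = {c: sum(task_util[t] for t in ts) for c, ts in tasks_on_core.items()}
    over = [c for c in util if util[c] > targets[c]]
    under = [c for c in util if util[c] < targets[c]]
    if not over or not under:
        return None
    src = max(over, key=lambda c: util[c] - targets[c])
    dst = max(under, key=lambda c: targets[c] - util[c])
    task = min(tasks_on_core[src], key=lambda t: task_util[t])
    # Simplified admission control: only the utilization bound is checked.
    if util[dst] + task_util[task] > bound:
        return None
    tasks_on_core[src].remove(task)
    tasks_on_core[dst].append(task)
    return task, src, dst

# Hypothetical state: core "a" is overloaded, core "c" is empty.
cores = {"a": ["t1", "t2", "t3"], "b": ["t4"], "c": []}
task_util = {"t1": 0.5, "t2": 0.25, "t3": 0.125, "t4": 0.25}
move = guc_step(cores, task_util, {"a": 0.5, "b": 0.5, "c": 0.5})
```

Repeating the step at each sampling point moves the cores toward their targets one small task at a time, which is what keeps the number of migrations low in this sketch.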
V. RESULTS AND DISCUSSION
In this section, we describe the simulation and analyze the
results. Our results represent an average of 20 runs and all
results have 95% confidence level. The maximum feasible
operating temperature of a core and ambient temperature of
the system are assumed to be 75◦C and 0◦C respectively.
We used 6 different sets of periodic tasks, where the total
number of tasks, n are 80, 90, 110, 140, 200, and 300. The
total worst-case utilizations of task sets (Utot) are assumed to
be similar, and the average task utilizations (Utot/n) of task
sets are different. The values of the average task utilization
are increasingly smaller with an average of 0.205, 0.175,
0.145, 0.115, 0.085, and 0.055 respectively. The actual task
utilization is considered to be 40% of the worst-case utilization
in our experiments. The task hyper-period P is 1000 seconds.
A. Results using Heat-Independent Thermal Model
We simulated a set of 8 interconnected multiprocessor units
(m = 8), and each multiprocessor unit has 4 identical cores
(k = 4). The parameters used in the simulation are given in
Table I [11], [22], [24]. It is assumed that $f_i^{min} = 0.5 f_i^{max}$.
TABLE I
SIMULATION PARAMETERS USING HIT MODEL
M_i   f_i^max   γ_i       δ_i      χ_i     R_i     C_i   α_i
M1    3.3       20.5060   0.1666   3.656   0.282   340   2.152
M2    3.4        5.0187   0.1942   2.138   0.487   295   1.666
M3    3.3       12.7880   0.2043   3.645   0.288   320   1.148
M4    3.0       15.6262   0.1942   4.556   0.238   320   1.044
M5    3.2       20.6393   0.1574   3.204   0.278   295   0.869
M6    3.1       11.9759   0.1586   2.719   0.480   255   0.540
M7    3.0       10.3490   0.1124   2.074   0.661   335   0.348
M8    2.6       13.1568   0.1754   2.332   0.680   380   0.340
Fig. 1(a) gives the number of active cores (|Ψ|) needed to
execute the respective task sets. Using WCET-based partition-
ing approach, 16 to 20 active cores are needed to execute
all the task sets. However, using the actual utilization (AET)
based DVS scaling, 7 or 8 active cores are sufficient.
For comparative analysis, we implemented MW heuristic
using the actual task utilizations to get the baseline centralized
solution. The tasks are migrated in the centralized solution for
implementing the new task partitioning scheme, while tasks
are migrated in the feedback approaches for achieving the
utilization target (i.e. Bj,r, where j=1, . . . ,m; r = 1, . . . , k).
The task migrations are classified into two categories: indis-
pensable and normal task migrations. The indispensable task
migrations are composed of those unavoidable task migrations
that result from turning off cores. The normal task
migrations represent the task migrations among the cores
that remain active. Therefore, the number of normal task
migrations is compared for evaluating the migration overhead
of the heuristics.
[Figure 1: three plots against task set size (80, 90, 110, 140, 200, 300).
(a) Number of Active Cores, for the WCET-based and AET-based solutions.
(b) Number of Normal Task Migrations, for the centralized solution, GUC
heuristic, and DUC heuristic.
(c) Total Energy Consumption (KJ), for the WCET-based solution, AET-based
solution, centralized solution, GUC heuristic, and DUC heuristic.]
Fig. 1. Results using Heat-Independent Thermal Model
The number of required normal task migrations is compared
in Fig. 1(b). In our experiments, GUC and DUC heuristics
reduce the number of normal task migrations on an average by
61.5% and 59.4% respectively as compared to the centralized
solution. The number of indispensable task migrations in our
experiments is 24, 32, 34, 39, 68, and 88, respectively, when the
task set sizes are 80, 90, 110, 140, 200, and 300.
The total energy consumed by the different partitioning
approaches is compared in Fig. 1(c). The energy consumption
using the actual utilization (i.e. U˜i, i=1, . . . , n) based DVS
scaling is significantly lower than the worst-case utilization
(i.e. Ui) based DVS scaling. The energy consumption is
further minimized by using adaptive or centralized strategies
which use actual execution time of tasks for obtaining the
partitioning of the tasks. In our experiments, the centralized
solution reduces the energy consumption by 51.86% to 62.17%
as compared to the WCET-based partitioning. DUC heuristic
achieves similar reductions in energy consumption but requires
51.4% to 67.8% fewer normal task migrations as compared
to the centralized solution. GUC heuristic also has similar
energy savings but needs 58.5% to 65.8% fewer normal task
migrations as compared to the centralized solution. Therefore,
the proposed adaptive heuristics are effective in minimizing
energy consumption according to the actual utilization of tasks
and also minimizing the number of normal task migrations.
B. Results using Heat-Dependent Thermal Model
We simulated a set of 4 interconnected multiprocessor units
(m = 4), where each multiprocessor unit has a 2×2 layout with
2 sinks, i.e., each unit has 4 identical cores (k = 4) [10], [12].
The simulation parameters are given in Table II [12], and we
assume that $f_i^{\min} = 0.5 f_i^{\max}$. The values of $A_{i,i}$
($i = 1, \ldots, 4$) in matrix A can only be computed after the
frequencies of the cores have been determined (Eq. 8). Therefore,
these entries are left blank in Table II.
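As a small illustration of how the Table II parameters enter the simulation, the following Python sketch derives the minimum frequency of each unit from the assumption that the minimum frequency is half the maximum, and stores matrix A for M1 with the four core diagonal entries left unset. The names `F_MAX`, `min_frequency`, `A_M1`, and `blank_entries` are hypothetical and chosen only for this example. Filling the diagonal from Eq. 8 is not shown.

```python
import math

# f_i^max for units M1..M4, copied from Table II(e).
F_MAX = {"M1": 2.2, "M2": 2.5, "M3": 2.0, "M4": 1.9}


def min_frequency(f_max, ratio=0.5):
    """Return f_min = ratio * f_max; the text assumes ratio = 0.5."""
    return ratio * f_max


F_MIN = {unit: min_frequency(f) for unit, f in F_MAX.items()}

# Matrix A for M1, copied from Table II(a). A[i][i] for the four cores is
# not tabulated (it depends on the chosen frequencies, Eq. 8), so NaN is
# used here as a placeholder. This placeholder is an assumption of the sketch.
BLANK = math.nan
A_M1 = [
    [BLANK, 0.009, 0.004, 0.000, 0.200, 0.050],
    [0.009, BLANK, 0.000, 0.004, 0.050, 0.060],
    [0.004, 0.000, BLANK, 0.009, 0.200, 0.050],
    [0.000, 0.004, 0.009, BLANK, 0.050, 0.060],
    [0.200, 0.050, 0.200, 0.050, -1.725, 0.300],
    [0.050, 0.060, 0.050, 0.060, 0.300, -1.445],
]


def blank_entries(a):
    """List the (row, col) positions that still need a value."""
    return [(i, j) for i, row in enumerate(a)
            for j, v in enumerate(row) if math.isnan(v)]


if __name__ == "__main__":
    for unit in sorted(F_MIN):
        print(unit, F_MAX[unit], F_MIN[unit])
    print("entries to fill from Eq. 8:", blank_entries(A_M1))
```

Keeping the unknown diagonal as NaN rather than zero makes it harder to use the matrix by accident before the core frequencies have been fixed.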
Fig. 2(a) compares the number of active cores (|Ψ|) needed
to execute the respective task sets. The WCET-based partitioning
approach needs 15 active cores to execute all the task sets,
whereas 6 active cores are sufficient when DVS scaling based on
the actual utilization (AET) is used.
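To put these core counts in proportion, the short Python sketch below computes the platform size from m = 4 and k = 4 and the relative reduction in active cores when moving from 15 to 6. The helper name `relative_reduction` and the constants are hypothetical. Only the numbers 4, 4, 15, and 6 come from the text.

```python
UNITS = 4            # m, number of multiprocessor units
CORES_PER_UNIT = 4   # k, identical cores per unit
TOTAL_CORES = UNITS * CORES_PER_UNIT

WCET_ACTIVE = 15     # active cores with WCET-based partitioning
AET_ACTIVE = 6       # active cores with AET-based DVS scaling


def relative_reduction(baseline, improved):
    """Fraction by which `improved` is smaller than `baseline`."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    saved = relative_reduction(WCET_ACTIVE, AET_ACTIVE)
    print(f"platform size: {TOTAL_CORES} cores")
    print(f"active cores: {WCET_ACTIVE} -> {AET_ACTIVE} ({saved:.0%} fewer)")
```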
The number of required normal task migrations is compared
in Fig. 2(b). In our experiments, the GUC and DUC heuristics
reduce the number of normal task migrations by an average of
47.5% and 43.0%, respectively, compared to the centralized
solution. Due to heat transfer among cores, the HDT model
requires a higher number of normal task migrations than
the HIT model. The numbers of indispensable task migrations
in our experiments are 23, 27, 31, 41, 62, and 89 for task set
sizes of 80, 90, 110, 140, 200, and 300, respectively.
TABLE II
SIMULATION PARAMETERS USING HDT MODEL
(Diagonal entries A_{i,i}, i = 1, ..., 4, are left blank and shown as "*".)
(a) Matrix A for M1
  *      0.009   0.004   0.000   0.200   0.050
 0.009    *      0.000   0.004   0.050   0.060
 0.004   0.000    *      0.009   0.200   0.050
 0.000   0.004   0.009    *      0.050   0.060
 0.200   0.050   0.200   0.050  -1.725   0.300
 0.050   0.060   0.050   0.060   0.300  -1.445
(b) Matrix A for M2
  *      0.025   0.007   0.020   0.500   0.050
 0.025    *      0.020   0.007   0.050   0.200
 0.007   0.020    *      0.025   0.500   0.050
 0.020   0.007   0.025    *      0.050   0.200
 0.500   0.050   0.500   0.050  -2.925   0.900
 0.050   0.200   0.050   0.200   0.900  -2.325
(c) Matrix A for M3
  *      0.020   0.010   0.000   0.400   0.100
 0.020    *      0.000   0.010   0.100   0.120
 0.010   0.000    *      0.020   0.400   0.100
 0.000   0.010   0.020    *      0.100   0.120
 0.400   0.100   0.400   0.100  -2.625   0.700
 0.100   0.120   0.100   0.120   0.700  -2.065
(d) Matrix A for M4
  *      0.013   0.007   0.004   0.300   0.050
 0.013    *      0.004   0.007   0.080   0.090
 0.007   0.004    *      0.013   0.300   0.080
 0.004   0.007   0.013    *      0.080   0.090
 0.300   0.080   0.300   0.080  -2.085   0.400
 0.080   0.090   0.080   0.090   0.400  -1.665
(e) Other Simulation Parameters
 Mi   f_i^max   γi     δi      χi    αi
 M1   2.2       0.10   0.002   1.0   2.152
 M2   2.5       0.20   0.015   1.3   1.666
 M3   2.0       0.18   0.010   2.2   1.044
 M4   1.9       0.15   0.005   1.7   0.540
The total energy consumed by the partitioning schemes is
compared in Fig. 2(c). The energy consumption under DVS scaling
based on the actual utilization (i.e., Ũ_i) is significantly
lower than under DVS scaling based on the worst-case utilization
(i.e., U_i). The adaptive and centralized solutions, which use
the actual execution times of tasks to obtain the task
partitioning, yield significant energy savings. In our experiments,
the centralized solution reduces the energy consumption by
58.5% to 62.2% compared to the WCET-based solution.
(a) Number of Active Cores versus task set size for the WCET-based and AET-based solutions
(b) Number of Normal Task Migrations versus task set size for the centralized solution and the GUC and DUC heuristics
(c) Total Energy Consumption (kJ) versus task set size for the WCET-based, AET-based, and centralized solutions and the GUC and DUC heuristics
Fig. 2. Results using Heat-Dependent Thermal Model
The DUC heuristic achieves similar energy savings but requires
34.8% to 50% fewer normal task migrations than the
centralized solution. The GUC heuristic also achieves similar
energy savings and needs 39.5% to 50% fewer normal task
migrations than the centralized solution. Therefore,
the proposed adaptive heuristics are effective in reducing both
the number of normal task migrations and the energy
consumption according to the actual utilization of tasks.
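The indispensable task migration counts reported for the two thermal models can be placed side by side with a few lines of Python. The sketch below is only illustrative: normalizing each count by the task set size is a choice made for this example, not a metric defined in the paper, and the names `INDISPENSABLE` and `per_task` are hypothetical. The counts and task set sizes are the ones stated in Sections V-A and V-B.

```python
TASK_SET_SIZES = [80, 90, 110, 140, 200, 300]

# Indispensable task migrations per task set size, as reported in the text.
INDISPENSABLE = {
    "HIT": [24, 32, 34, 39, 68, 88],
    "HDT": [23, 27, 31, 41, 62, 89],
}


def per_task(counts, sizes):
    """Indispensable migrations divided by the number of tasks in the set."""
    return [count / size for count, size in zip(counts, sizes)]


if __name__ == "__main__":
    for model, counts in INDISPENSABLE.items():
        ratios = per_task(counts, TASK_SET_SIZES)
        print(model, [round(r, 3) for r in ratios])
```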
VI. CONCLUSION
We present adaptive energy-efficient task partitioning for
heterogeneous multi-core multiprocessor real-time systems.
We use a power model that incorporates the impact of the
temperature and voltage of a core on its static power consumption.
Two different thermal models, namely the HIT and HDT
models, are used for estimating the peak temperature of a core.
We present the DUC and GUC heuristics, which are feedback-based
optimization and control approaches, for adaptive
thermal-constrained energy-efficient partitioning of tasks. In our
simulations with the HIT model, the DUC and GUC heuristics
reduce the energy consumption by an average of 55%
compared to a WCET-based task partitioning scheme and
require an average of 60% fewer normal task migrations
than a centralized solution that obtains similar
energy savings. Similarly, in our simulations with the HDT model,
the DUC and GUC heuristics reduce the energy consumption
by an average of 60% compared to a WCET-based task
partitioning scheme and require an average of 45% fewer
normal task migrations than a centralized solution
that obtains similar energy savings.
In the future, we plan to investigate strategies for dealing with
modeling inaccuracies in the power and thermal parameters and
in task execution times. We also want to evaluate our solutions
on several multiprocessor systems, e.g., multi-core computers,
smart phones, and vehicle computing platforms.
REFERENCES
[1] J.-J. Chen, A. Schranzhofer, and L. Thiele, “Energy minimization for
periodic real-time tasks on heterogeneous processing units,” Parallel and
Distributed Processing Symposium, International, pp. 1–12, 2009.
[2] R. Kumar and D. M. Tullsen, “Core architecture optimization for
heterogeneous chip multiprocessors,” in PACT, 2006, pp. 23–32.
[3] Z. Wang and S. Ranka, “Thermal constrained workload distribution for
maximizing throughput on multi-core processors,” in GREENCOMP,
2010, pp. 291–298.
[4] H. Aydin and Q. Yang, “Energy-aware partitioning for multiprocessor
real-time systems,” in IPDPS, 2003.
[5] Y. Chen et al., “Managing server energy and operational costs in hosting
centers,” ACM SIGMETRICS PER, vol. 33, no. 1, pp. 303–314, 2005.
[6] I. Hong et al., “Synthesis techniques for low-power hard real-time
systems on variable voltage processors,” in RTSS, 1998, pp. 178–187.
[7] J.-J. Chen and C.-F. Kuo, “Energy-efficient scheduling for real-time
systems on dynamic voltage scaling (DVS) platforms,” in RTCSA, 2007.
[8] P. de Langen and B. Juurlink, “Leakage-aware multiprocessor scheduling
for low power,” in IPDPS, April 2006.
[9] P. de Langen and B. Juurlink, “Leakage-aware multiprocessor scheduling,”
J. Signal Process. Syst., vol. 57, pp. 73–88, October 2009.
[10] T. Chantem, X. S. Hu, and R. P. Dick, “Temperature-aware scheduling
and assignment for hard real-time applications on MPSoCs,” IEEE
Transactions on VLSI Systems, no. 99, pp. 1–14, 2010.
[11] G. Quan and V. Chaturvedi, “Feasibility analysis for
temperature-constraint hard real-time periodic tasks,” Industrial Informatics, IEEE
Transactions on, vol. 6, no. 3, pp. 329–339, Aug. 2010.
[12] N. Fisher et al., “Thermal-aware global real-time scheduling and analysis
on multicore systems,” in Journal of Systems Architecture, vol. 57, no. 5,
2011, pp. 547–560.
[13] J. Carpenter et al., “A categorization of real-time multiprocessor
scheduling problems and algorithms,” in Handbook on Scheduling Algorithms,
Methods, and Models. Chapman Hall/CRC, Boca, 2004.
[14] S. K. Baruah et al., “Proportionate progress: A notion of fairness in
resource allocation,” Algorithmica, vol. 15, pp. 600–625, 1994.
[15] S. Baruah, “Task partitioning upon heterogeneous multiprocessor
platforms,” in RTAS, 2004, pp. 536–543.
[16] M. Goraczko et al., “Energy-optimal software partitioning in
heterogeneous multiprocessor embedded systems,” in DAC, 2008, pp. 191–196.
[17] A. Schranzhofer, J.-J. Chen, and L. Thiele, “Dynamic power-aware
mapping of applications onto heterogeneous MPSoC platforms,” Industrial
Informatics, IEEE Transactions on, vol. 6, no. 4, Nov. 2010.
[18] Y. Liu et al., “Accurate temperature-dependent integrated circuit leakage
power estimation is easy,” in DATE, 2007, pp. 1526–1531.
[19] L. Schor et al., “Worst-case temperature guarantees for real-time
applications on multi-core systems,” in Proc. IEEE Real-Time and Embedded
Technology and Applications Symposium (RTAS), 2012.
[20] A. Block et al., “An adaptive framework for multiprocessor real-time
systems,” ECRTS, pp. 23–33, 2008.
[21] J.-J. Chen et al., “Approximation algorithms for multiprocessor
energy-efficient scheduling of periodic real-time tasks with uncertain task
execution time,” RTAS, pp. 13–23, 2008.
[22] L. Wang and Y. Lu, “An efficient threshold-based power management
mechanism for heterogeneous soft real-time clusters,” Industrial
Informatics, IEEE Transactions on, vol. 6, no. 3, pp. 352–364, Aug. 2010.
[23] J. W. S. W. Liu, Real-Time Systems. Prentice Hall PTR, 2000.
[24] V. Chaturvedi, H. Huang, and G. Quan, “Leakage aware scheduling on
maximum temperature minimization for periodic hard real-time
systems,” in ICCIT, 2010, pp. 1802–1809.
[25] H. Aydin et al., “Dynamic and aggressive scheduling techniques for
power-aware real-time systems,” in RTSS, 2001, pp. 95–105.
[26] J. A. Stankovic et al., “Implications of classical scheduling results for
real-time systems,” Computer, vol. 28, pp. 16–25, 1995.
[27] J. Chiasson et al., “The effect of time delays on the stability of load
balancing algorithms for parallel computations,” IEEE Trans. on Control
Systems Technology, pp. 932–942, 2005.
[28] C. Lu et al., “Performance specifications and metrics for adaptive
real-time systems,” RTAS, 2000.
