Energy-efficient thermal-aware multiprocessor scheduling for real-time tasks using TCPNs by Rubio-Anguiano, Laura et al.
Energy-Efficient Thermal-Aware Multiprocessor
Scheduling for Real-Time Tasks Using TCPN
L. Rubio-Anguiano 1, G. Desirena-López 1,
A. Ramírez-Treviño 1 and J.L. Briz 2
Abstract We present an energy-efficient thermal-aware real-time global sched-
uler for a set of hard real-time (HRT) tasks running on a multiprocessor sys-
tem. This global scheduler fulfills the thermal and temporal constraints by
handling two independent variables, the task allocation time and the selection
of clock frequency.
To achieve its goal, the proposed scheduler is split into two stages. An off-line
stage, based on a deadline partitioning scheme, computes the cycles that the
HRT tasks must run per deadline interval at the minimum clock frequency to
save energy while honoring the temporal and thermal constraints, and com-
putes the maximum frequency at which the system can run below the maxi-
mum temperature. Then, an on-line, event-driven stage performs global task
allocation applying a Fixed-Priority Zero-Laxity policy, reducing the overhead
of quantum-based or interval-based global schedulers. The on-line stage embodies
an adaptive scheduler that accepts or rejects soft RT aperiodic tasks,
throttling the CPU frequency to the lowest available one at or above the required
frequency, to minimize power consumption while meeting time and thermal
constraints. This approach leverages the best of two worlds: the off-line stage
computes an ideal discrete HRT multiprocessor schedule, while the on-line stage
manages soft real-time aperiodic tasks with minimum power consumption and
maximum CPU utilization.
Keywords TCPN · modeling · scheduling
1 CINVESTAV. Av del Bosque 1145, Bajío, 45019 Zapopan, Jal
E-mail: {lerubio, gdesirena, art}@gdl.cinvestav.mx
2 Universidad de Zaragoza, María de Luna 3, 50009 Zaragoza, España
E-mail: briz@unizar.es
1 Introduction
Multicore microprocessors are a promising platform for embedded real-time
(RT) systems. They allow the optimization of space, weight and power (SWaP)
requirements, but they pose new scheduling challenges far beyond the long-
established RT theory and practice for unicores. For example, CPU task allo-
cation can lead to hot spots, reducing a microprocessor lifespan. Also, mobile
devices call for a careful energy management to extend battery life. Hence,
a multicore RT scheduler must consider temperature and energy besides the
timing constraints. Many current MPSoCs provide power management mecha-
nisms such as dynamic voltage and frequency scaling (DVFS), which can lower
voltage and clock frequency simultaneously, thus reducing energy consumption
quadratically.
Global schedulers dynamically allocate any task to any processor whereas
partitioned schedulers statically divide tasks among the available processors
(Sec. 2.3). Only global scheduling has been proved hard RT (HRT) and soft
RT (SRT) optimal for implicit deadline task sets, whereas partitioning is lim-
ited to a 50% utilization bound (Oh and Bakker [1998]). No optimal scheduler
exists for constrained or arbitrary deadlines (Fisher et al. [2010]). However,
context-switching and migration costs are high in global schedulers, although
techniques like deadline partitioning (Funk et al. [2011], Moulik et al. [2017])
can lower that overhead. Recent state-of-the-art contributions like Brandenburg
and Gül [2016] and Casini et al. [2017] prove that ad-hoc approaches based
on mixing semipartitioning with a number of techniques can be optimal in spe-
cific cases, always assuming unavoidable context-switch and migration costs.
Meanwhile, the industry remains understandably conservative as we discuss
in Sec. 2.3. Thus, there are still many open issues in the RT multiprocessor
scheduling arena even when restricted to sporadic tasks with implicit deadlines
without resource sharing. This becomes all the more complex when additional
constraints like a bounded maximum temperature are considered.
We propose in this paper a thermal-aware, energy-efficient RT global sched-
uler based on an off-line stage and an on-line stage. The off-line stage takes
two steps. The first step computes the minimum frequency (F∗) required to
correctly run the HRT task set at maximum CPU utilization. Since F∗ is the
minimum feasible frequency, power consumption is minimized. The second step
computes the maximum frequency (F+) subject to the 100% CPU utilization and
thermal constraints. In other words, any frequency increase above F+ would
imply a thermal bound violation. Then, the off-line stage leverages a deadline
partitioning approach to compute the CPU cycles that each task must run
per deadline interval. We get around the NP-completeness of the workload
problem by splitting the hyperperiod into deadline intervals. An on-line stage
performs the task allocation leveraging a simple Fixed-Priority Zero-Laxity
policy (FPZL, Davis and Burns [2011a]). It includes an Adaptive Scheduler
(AS ) which adds robustness by throttling CPU frequency and accepting or
rejecting aperiodic tasks, thus ensuring the correct execution of the HRT task
set minimizing energy consumption and keeping a controlled temperature.
The main contribution of this paper is to show the applicability of TCPNs to
HRT multiprocessor scheduling. We include thermal constraints and SRT ape-
riodic task management to pose a scenario which is considered non-trivial in
today’s HRT multiprocessor scheduling. However, there are implicit meaning-
ful contributions to the state-of-the-art in the field as well. First, the unimod-
ularity of the HRT and thermal constraint matrices reduces the complexity of
the off-line stage to a linear programming problem (LPP), which yields task
cycles per deadline interval at minimum CPU frequency (minimum power).
Second, leveraging a FPZL policy makes the on-line allocation stage event-
driven instead of quantum- or interval-driven, dramatically reducing the num-
ber of scheduling points (i.e. the chances of a context-switch and migration) to
three minimal events (zero-laxity, job completion and aperiodic arrival). Last,
the combination of the off-line stage with the AS manages aperiodic task ar-
rival while guaranteeing HRT and thermal constraints with optimized power
consumption.
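To make the zero-laxity event concrete, the following minimal sketch uses the usual definition of laxity (time left until the deadline minus remaining execution time). The function and parameter names are illustrative and are not taken from the scheduler itself; a single constant frequency, expressed in cycles per time unit, is assumed.

```python
def laxity(now, abs_deadline, remaining_cycles, freq):
    """Slack of a job: time to its deadline minus its remaining execution time."""
    return (abs_deadline - now) - remaining_cycles / freq


def zero_laxity_instant(abs_deadline, remaining_cycles, freq):
    """Latest instant at which a waiting job must start running to meet its deadline."""
    return abs_deadline - remaining_cycles / freq


# Hypothetical job: 3000 cycles left, deadline at t = 10, CPU at 1000 cycles per time unit.
slack_at_4 = laxity(now=4, abs_deadline=10, remaining_cycles=3000, freq=1000)
event_time = zero_laxity_instant(abs_deadline=10, remaining_cycles=3000, freq=1000)
```

A job that is not running reaches zero laxity at `event_time`; from then on it has to hold a CPU until completion, which is why this instant is one of the scheduling points.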
This article is a journal extension of a conference paper (Rubio-Anguiano et al.
[2018]). We now provide complete proofs and obtain context-switch and migra-
tion bounds. Also, we present further experiments assuming a more realistic
DVFS mechanism based on the XScale multicore processor (Chen and Kuo
[2007]), and we include a comparison with a global EDF (g-EDF) scheduler
and a pfair/deadline partitioning hybrid scheduler.
The work is organized as follows. Section 2 presents basic concepts related to
Petri nets and Timed Continuous Petri nets, and the forced TCPN equation as
we use it. Section 3 states the scheduling problem herein addressed. Section 4
describes the modeling methodology. Section 5 describes the off-line stage of
the scheduler. Section 6 extends on the on-line stage, providing context-switch
and migration bounds, detailing the aperiodic scheduler, and stating algo-
rithm complexity. Section 7 compares the proposal with two thermal-aware
schedulers based on global EDF and pfair, discussing simulation results. Last,
Section 8 draws some conclusions and points out future work.
2 Background
The following two subsections introduce basic necessary definitions concerning
Petri nets and Timed Continuous Petri nets. An interested reader may also
consult Mahulea et al. [2008], Desel and Esparza [1995], Silva and Recalde
[2007] to get a deeper insight in the field. A last subsection provides insight
on related research on multiprocessor real-time scheduling, highlighting the
novelty of our proposal.
2.1 Discrete Petri Nets
Definition 1 A (discrete) Petri net is the 4-tuple N = (P, T, Pre, Post)
where P and T are finite disjoint sets of places and transitions, respectively.
Pre and Post are the |P| × |T| pre- and post-incidence matrices, where
Pre(i, j) > 0 (resp. Post(i, j) > 0) if there is an arc going from pi to tj (resp.
going from tj to pi), otherwise Pre(i, j) = 0 (resp. Post(i, j) = 0).
Definition 2 A (discrete) Petri net system is the pair Q = (N, M) where N
is a Petri net and M : P → N ∪ {0} is the marking function assigning zero or
a natural number to each place. The marking is also represented as a column
vector M, such that its i-th element is equal to M(pi), the number of tokens
residing in pi. M0 denotes the initial marking distribution.
A transition t ∈ T is said to be enabled at the marking M ∈ N^|P| iff M ≥ Pre[P, t].
The occurrence or firing of an enabled transition leads to a new marking
distribution M′ ∈ N^|P| that can be computed as M′ = M + C[P, t] =
M + C · e_t, where C = Post − Pre is named the incidence matrix, and e_t
denotes the t-th elementary vector (e_t(k) = 1 if k = t, otherwise e_t(k) = 0).
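The enabling and firing rules can be illustrated with a short sketch. The two-place, two-transition net below is a made-up example, and the helper names are hypothetical; Pre is read as place-to-transition arc weights and Post as transition-to-place arc weights.

```python
# Illustrative net: t0 moves a token from p0 to p1, t1 moves it back.
# PRE[i][j]  = weight of the arc from place p_i to transition t_j
# POST[i][j] = weight of the arc from transition t_j to place p_i
PRE = [[1, 0],
       [0, 1]]
POST = [[0, 1],
        [1, 0]]


def column(matrix, j):
    return [row[j] for row in matrix]


def is_enabled(marking, pre, j):
    """t_j is enabled iff M >= Pre[P, t_j] componentwise."""
    return all(m >= w for m, w in zip(marking, column(pre, j)))


def fire(marking, pre, post, j):
    """Return M' = M + C[P, t_j], with C = Post - Pre."""
    if not is_enabled(marking, pre, j):
        raise ValueError(f"transition t{j} is not enabled")
    return [m - w_in + w_out
            for m, w_in, w_out in zip(marking, column(pre, j), column(post, j))]


m0 = [1, 0]
m1 = fire(m0, PRE, POST, 0)  # token moves from p0 to p1
m2 = fire(m1, PRE, POST, 1)  # and back to p0
```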
2.2 Continuous and Timed continuous Petri nets
Definition 3 A continuous Petri net (ContPN) is a pair ContPN = (N, m0)
where N = (P, T, Pre, Post) is a Petri net (PN) and m0 ∈ {R+ ∪ 0}^|P| is
the initial marking.
The evolution rule is different from the discrete PN case. In continuous PNs
the firing amount is not restricted to be an integer. A transition ti in a ContPN is
enabled at m if ∀ pj ∈ •ti, m[pj] > 0; and its enabling degree is defined as
$$enab(t_i, m) = \min_{p_j \in {}^{\bullet}t_i} \frac{m[p_j]}{Pre[p_j, t_i]}.$$
The firing of ti in a certain positive
amount α ≤ enab(ti, m) leads to a new marking m′ = m + αC[P, ti], where
C = Post − Pre is computed as in the discrete case.
If m is reachable from m0 by firing the finite sequence σ of enabled transitions,
then $m = m_0 + C\vec{\sigma}$, which is named the fundamental equation, where $\vec{\sigma} \in \{R^+ \cup 0\}^{|T|}$
is the firing count vector, i.e., $\vec{\sigma}[t_j]$ is the cumulative amount of firings of tj in
the sequence σ.
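As a small illustration of the continuous firing rule, the sketch below computes the enabling degree and fires a transition by a real amount. The net and the helper names are hypothetical, chosen only to exercise the definitions.

```python
def enabling_degree(marking, pre, j):
    """enab(t_j, m) = min over input places p of m[p] / Pre[p, t_j]."""
    ratios = [m / row[j] for m, row in zip(marking, pre) if row[j] > 0]
    if not ratios:
        raise ValueError("transition has no input place")
    return min(ratios)


def fire_amount(marking, pre, post, j, alpha):
    """Fire t_j by a real amount alpha, with 0 < alpha <= enab(t_j, m)."""
    if not 0 < alpha <= enabling_degree(marking, pre, j):
        raise ValueError("alpha exceeds the enabling degree")
    return [m + alpha * (row_out[j] - row_in[j])
            for m, row_in, row_out in zip(marking, pre, post)]


# Illustrative net: t0 takes 2 units from p0 and 1 from p1 and puts 1 into p2.
PRE = [[2, 0], [1, 0], [0, 1]]
POST = [[0, 1], [0, 1], [1, 0]]
m0 = [3.0, 1.0, 0.0]
degree = enabling_degree(m0, PRE, 0)       # min(3/2, 1/1)
m1 = fire_amount(m0, PRE, POST, 0, 0.5)    # fire half a unit of t0
```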
Definition 4 A timed continuous Petri net (TCPN) is a time-driven continuous
state system described by the tuple (N, λ, m0) where (N, m0) is a
continuous PN and the vector λ ∈ {R+ ∪ 0}^|T| represents the transition rates
determining the temporal evolution of the system.
Transitions fire according to a certain speed, which generally is a function of
the transition rates and the current marking. Such a function depends on the
semantics associated with the transitions. Under the infinite server semantics
(Silva et al. [2011]) the flow through a transition ti (the transition firing speed)
is defined as the product of its rate, λi, and enab(ti, m), the instantaneous
enabling degree of the transition, i.e.,
$$f_i(m) = \lambda_i \, enab(t_i, m) = \lambda_i \min_{p_j \in {}^{\bullet}t_i} \frac{m[p_j]}{Pre[p_j, t_i]}$$
(through the rest of this paper, for the sake of simplicity the flow through a
transition ti is denoted as fi).
The firing rate matrix is denoted by Λ = diag(λ1, ..., λ|T|). For the flow to be
well defined, every continuous transition must have at least one input place,
hence in the following we will assume ∀t ∈ T, |•t| ≥ 1. The "min" in the
above definition leads to the concept of configuration. A configuration of a
TCPN at m is a set of arcs (pi, tj) such that pi provides the minimum ratio
m[p]/Pre[p, tj] among the places p ∈ •tj at the given marking m. We say
that pi constrains tj for each arc (pi, tj) in the configuration. A configuration
matrix is defined for each configuration as follows:
$$\Pi(m) = \begin{cases} \dfrac{1}{Pre[i,j]} & \text{if } p_i \text{ is constraining } t_j \\ 0 & \text{otherwise.} \end{cases} \qquad (1)$$
The flow through the transitions of the net can be written in a vectorial form
as f(m) = ΛΠ(m)m. The dynamical behaviour of a PN system is described
by its fundamental equation:
ṁ = CΛΠ(m)m (2)
In order to apply a control action to (2), a term u such that 0 ≤ ui ≤ fi(m) is
added to every transition ti to indicate that its flow can be reduced. Thus the
controlled flow of transition ti becomes wi = fi − ui. Then, the forced state
equation is
ṁ = C[f(m)− u] = Cw
0 ≤ ui ≤ fi(m)
(3)
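The forced state equation can be explored numerically. The following is a minimal sketch, assuming infinite server semantics and a plain explicit Euler step; the net, the rates and the function names are illustrative and not part of the paper's model.

```python
def flows(marking, pre, rates):
    """Infinite-server flow f_j = lambda_j * min_p m[p] / Pre[p, t_j].

    Every transition is assumed to have at least one input place.
    """
    result = []
    for j, lam in enumerate(rates):
        ratios = [m / row[j] for m, row in zip(marking, pre) if row[j] > 0]
        result.append(lam * min(ratios))
    return result


def euler_step(marking, pre, post, rates, u, dt):
    """One explicit Euler step of m_dot = C (f(m) - u), with u clamped to [0, f]."""
    f = flows(marking, pre, rates)
    w = [fj - min(max(uj, 0.0), fj) for fj, uj in zip(f, u)]  # controlled flow
    return [m + dt * sum((row_out[j] - row_in[j]) * w[j] for j in range(len(w)))
            for m, row_in, row_out in zip(marking, pre, post)]


# Illustrative two-place cycle: t0 drains p0 into p1, t1 drains p1 into p0.
PRE = [[1, 0], [0, 1]]
POST = [[0, 1], [1, 0]]
RATES = [1.0, 2.0]
m0 = [1.0, 0.0]
free = euler_step(m0, PRE, POST, RATES, u=[0.0, 0.0], dt=0.1)  # uncontrolled
held = euler_step(m0, PRE, POST, RATES, u=[1.0, 0.0], dt=0.1)  # t0 fully stopped
```

Setting u equal to the full flow of a transition stops it, which is how a control action can withhold work from a transition in this formulation.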
2.3 Related work
RT scheduling in multiprocessors has traditionally been tackled through two
different approaches, partitioned and global scheduling (Baker [2005], Davis
and Burns [2011b]), as well as mixed solutions (Calandrino et al. [2007]). Partitioned
schedulers allocate tasks statically to CPUs, and tasks are not allowed to
migrate. Under this scheme, HRT schedulability analysis can be derived from
uniprocessor scheduling, ensuring a maximum utilization bound of 50%, which
severely hampers the SWaP compromises (Oh and Bakker [1998]).
In contrast, global schedulers can allocate tasks to any CPU and allow task mi-
gration. Global Earliest Deadline First (gEDF ) (Bertogna and Cirinei [2007])
is a global preemptive scheduling algorithm where all ready tasks are enqueued
in a single ready queue, and the m highest-priority tasks are executed on the m
processors. Job priorities are inversely proportional to their associated dead-
lines, with a smaller absolute deadline corresponding to a higher priority. This
algorithm guarantees soft real-time (SRT) schedulability for implicit-deadline
task sets, managing dynamic priorities at task level while fixing priorities at
job level, but is not optimal under a HRT scheme (Goossens and Funk [2003],
Bertogna et al. [2005]). Global scheduling can benefit from the concept of fluid
scheduling, which consists in instantaneously sharing CPUs among all active
jobs. Practical implementations approach this theoretical behavior by inter-
leaving jobs, keeping a fair CPU share within time periods, and honoring time
constraints in the case of RT task sets. On these principles, global schedulers
based on proportionate fairness like pfair (Baruah et al. [1996]), PD (Baruah
et al. [1995]) and PD2 (Anderson and Srinivasan [2001]) achieve HRT optimal
schedulability for implicit-deadline task sets. In pfair algorithms, time is
discretized and tasks can only be executed during an integer number of
quanta Q. The time quanta are then fairly distributed among the tasks so
that ζ, the difference between the execution time of every task τi and the fluid
schedule, is smaller than 1 quantum at any time.
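The fairness condition can be checked mechanically. The sketch below measures, per quantum, the gap between the fluid allocation (utilization times elapsed quanta) and the quanta actually received, and tests that it stays below one quantum. The schedule encoding and names are hypothetical.

```python
from fractions import Fraction


def lag_trace(schedule, task, utilization):
    """Gap after each quantum: utilization * t minus quanta received in [0, t)."""
    received = 0
    trace = []
    for t, running in enumerate(schedule, start=1):
        if task in running:
            received += 1
        trace.append(utilization * t - received)
    return trace


def is_pfair(schedule, utilizations):
    """True if every task keeps |gap| < 1 quantum at every quantum boundary."""
    return all(abs(gap) < 1
               for task, u in utilizations.items()
               for gap in lag_trace(schedule, task, u))


# Hypothetical single-CPU example: each slot lists the tasks running in that quantum.
utilizations = {"A": Fraction(2, 3), "B": Fraction(1, 3)}
fair = [{"A"}, {"B"}, {"A"}]
trace_a = lag_trace(fair, "A", utilizations["A"])

halves = {"A": Fraction(1, 2), "B": Fraction(1, 2)}
bursty = [{"A"}, {"A"}, {"B"}, {"B"}]  # A runs two quanta in a row
```

In the `bursty` schedule both tasks still receive their full share over four quanta, yet the gap reaches one quantum after the second slot, so the schedule is not proportionate-fair.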
The downside of pfair algorithms is that they incur an impractical number
of context switches and migrations, since scheduling actions are taken on a
quantum basis. Deadline partitioning schedulers such as Dpfair (Funk et al.
[2011]) alleviate that overhead by limiting the scheduling points to the set of all
task deadlines, i.e., scheduling actions now occur at variable time intervals instead
of at a fixed quantum. RT-TCPN (Desirena-Lopez et al. [2016]) leverages a
TCPN system model (tasks and CPUs) to track the fluid schedule as closely
as possible by resorting to a sliding mode feedback controller. Every task is
continuously executed at a rate equal to its utilization factor, formulated as a
per-task continuous (fluid) function. The time is divided in intervals bounded
by the successive deadlines in the ordered set of all task deadlines as in deadline
partitioning. The fluid functions are calculated for every deadline interval. At
any instant ζ, any task τi must have been executed ui × ζ units of time from
the start of the schedule. This ensures fairness at every deadline interval. The
discretization is performed on a quantum basis, with Q being the greatest
common divisor (GCD) of all deadlines in the set. When Q tends to 1, this
approach tends to the pfair behaviour.
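A brief sketch of deadline partitioning for implicit-deadline periodic tasks follows: it collects every deadline within the hyperperiod, forms the intervals between consecutive deadlines, and computes the quantum Q as their GCD. The function names and the two-task example are illustrative only.

```python
from fractions import Fraction
from math import gcd, lcm  # multi-argument forms need Python 3.9+


def deadline_partition(periods):
    """Return (hyperperiod, sorted deadlines, intervals between consecutive deadlines)."""
    hyper = lcm(*periods)
    deadlines = sorted({k * p for p in periods for k in range(1, hyper // p + 1)})
    bounds = [0] + deadlines
    return hyper, deadlines, list(zip(bounds, bounds[1:]))


def fluid_time(utilization, interval):
    """Execution time a task accrues in one interval when run at its utilization rate."""
    start, end = interval
    return utilization * (end - start)


H, deadlines, intervals = deadline_partition([4, 6])  # implicit deadlines: d_i = period
Q = gcd(*deadlines)                                   # quantum used for discretization
share = fluid_time(Fraction(1, 2), intervals[0])      # task with u_i = 1/2 in [0, 4)
```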
Thermal-aware scheduling has been widely studied for single core systems,
exploiting DVFS to reduce power consumption and temperature (Kong et al.
[2014], Hettiarachchi et al. [2014]). Chen et al. [2007] study the temperature
problem in uniprocessors and in homogeneous multiprocessor systems. They
leverage a partitioned EDF-based algorithm that minimizes energy, and derive
an approximation bound for the maximum temperature. Schor et al. [2012]
perform a worst-case temperature analysis for RT tasks with non-deterministic
workloads running on multiprocessor systems. Ahmed et al. [2016] tackle the
problem of thermal constrained scheduling of periodic tasks. Chantem et al.
[2011] use an equivalent circuit model to estimate the temperature for a given
set of HRT tasks on a multicore system, also under a partitioned scheme.
Feedback methods from control theory have been often used to cope with a dy-
namic environment for RT scheduling. The feedback control algorithm in Fu
et al. [2010] enforces both thermal and RT constraints but is restricted to
single-core processors, as they do not consider inter-core thermal coupling in
multicore processors. A general framework of dynamic thermal management
for multicore processors is proposed in Donald and Martonosi [2006]. It con-
sists of a hierarchical feedback control loop with PI controllers but does not
guarantee RT performance. Zanini et al. [2009] define the thermal problem
as a control theory problem with a state space representation, and propose
an optimum solution to the frequency assignment problem for thermal
balancing of MPSoCs, but they consider neither RT constraints nor the scheduling
problem. Other contributions based on control theory are limited to SRT
systems (Fu et al. [2009], Fu et al. [2012]), allowing for a certain percentage of
missed deadlines.
Most former references leverage partitioned approaches. In recent years, context-
switch and migration overhead has tipped the scale in favor of semipartitioned
and empirical designs with few contributions left leveraging global scheduling
techniques (Moulik et al. [2017]). In semipartitioned schedulers, some tasks
are statically allocated to the processors whereas others are split across multiple
processors as in global scheduling. This allows Casini et al. [2017] to deal
with dynamic task sets, for example. Empirical designs resort to a combina-
tion of techniques, but can only solve specific problems (Brandenburg and
Gül [2016]). However, the automotive, aeronautics and space industry, under-
standably conservative but already embracing multicores, keeps clinging to
the traditional cyclic executive upon static partitioning (ARINC [1997], Diniz
and Rufino [2005], AUTOSAR [2017]) suffering from unbalanced SWaP and
facing tough problems such as the fair Worst Case Delay calculation, which
lead to complex ILP solutions (Fernandez et al. [2017], Cardona et al. [2018]).
Moreover, the problem of including additional constraints such as a maxi-
mum temperature or shared resources, still leaves many open avenues in the
HRT multiprocessor scheduling arena, which gives global scheduling a chance
because of its capability to achieve optimal CPU utilization.
In this vein, quantum-based schedulers such as pfair are good candidates to
control temperature, at the cost of a high overhead. Deadline partitioning
approaches limit context switching and migration, but the variable intervals
between deadlines can be too long to cope with temperature variations. In
Desirena-Lopez et al. [2019] we extended and improved the aforementioned
approach taken in Desirena-Lopez et al. [2016] to include thermal constraints.
The fluid time share that each task must be granted at each processor during
the hyperperiod to meet HRT and thermal constraints is computed off-line
in polynomial time by solving an LPP. This time share is a per-CPU (CPUj),
per-task (τi) fluid schedule function $^{j}FSC_{\tau_i}(\zeta)$, ζ being the current time. These
functions implicitly provide a partition, and are calculated at each deadline
interval over the ordered set of all task deadlines (i.e. at each ζ = sdi) following
a deadline partitioning scheme. The fluid scheduler is discretized on-line, on a
quantum basis, with Q = GCD(sdi). At each quantum, a sliding mode feedback
controller makes a TCPN model of tasks and CPUs evolve so that the
error between $^{j}FSC_{\tau_i}(\zeta)$ and the actual fluid execution time of τi on each
CPUj becomes minimum, accounting for disturbances. The evolution of the
TCPN requires only a simple linear computation. Then, a per-CPU task priority
queue is built upon the difference between the fluid and the actual discrete
execution time, and jobs are accordingly dispatched. The feedback controller
allows the system to recover from disturbances such as CPU detentions due
to environmental hazards causing energy interruptions or thermal peaks. The
advantage of this approach is the implementation of an on-off control law,
which leads to a low-overhead scheduler, capable of handling perturbations in
underloaded systems without rescheduling a job.
As in Desirena-Lopez et al. [2019], we follow in this paper a scheme in two
stages, off-line (LPP) and on-line, but there are substantial differences. In
broad terms, we add frequency control to minimize power consumption and
deal with aperiodic tasks, but differences go further when going into detail.
First, the LPP solved during the off-line stage yields CPU cycles per task
and deadline interval (over the set of all task deadlines, according to a dead-
line partitioning approach), instead of providing a fluid time share per task
and CPU that must be discretized later on-line, as in Desirena-Lopez et al.
[2019]. That is, there is no implicit partition or CPU allocation now, and the
time share comes out already in cycles per deadline interval, also meeting the
time and thermal constraints of the HRT task set, at the minimum available
CPU frequency (F ∗), in order to minimize power consumption. F ∗ is always
below the maximum frequency (F+) at which the system can run the task
set without violating the thermal constraint. Second, the on-line stage is now
event-based (zero-laxity, task completion and aperiodic task arrival events)
instead of quantum-based, which dramatically reduces the overhead. Also,
calculations in this stage are much lighter than in Desirena-Lopez et al. [2019],
since the feedback controller can be obviated, and the only pending tasks left
are task allocation and aperiodic task management, a capability unavailable
in Desirena-Lopez et al. [2019]. Task allocation is performed by applying a
Fixed-Priority Zero-Laxity algorithm (FPZL, Davis and Burns [2011a]). Aperiodic
task arrival is managed by an Adaptive Scheduler (AS), which accepts
or rejects arriving aperiodic tasks. The AS adds robustness by throttling CPU
frequency, accepting or rejecting aperiodic tasks upon system limits. This guarantees
the correct execution of the HRT tasks, minimizes energy consumption,
and meets thermal constraints, much improving the scheduling scheme presented
in Desirena-Lopez et al. [2019].
Table 1: System notation
Symbol     Description
n          Number of tasks
m          Number of processors
T          A task set
P          A processor set
τi         The i-th task
cci        The worst-case execution in CPU cycles of τi
ci         The worst-case execution time of τi
di         The relative deadline of τi
ωi         The period of τi
ui         The utilization of task τi
U          System utilization
H          Hyperperiod
F          Set of discrete frequencies
Fs         Set of discrete operating frequencies, Fs ⊆ F
Φ∗         The normalized minimum frequency
F∗         The minimum frequency
Fc         The solution of Eq. (11)
F+         The maximum thermal frequency
Fn         The operating frequency
sd_i^k     The k-th deadline of task τi
SD         The set of ordered deadlines sd_i^q
I_SD^k     The k-th scheduling interval
           The cycles of task τ_i^a to be executed during I_SD^k
Xk         Set of all active tasks in I_SD^k
kq         Thermal conductivity of component q
Vq         Volume of component q
VCPUj      Volume of CPU j
ρq         Density of component q
cpq        Specific heat capacity of component q
h          Convection coefficient
Tq         Temperature of component q
3 Problem definition
This section introduces the scheduling problem herein addressed. It starts by
setting the working scenario of tasks, and then states the Minimum Energy
Thermal-Aware RT Scheduler problem.
The set of independent periodic tasks is denoted by T = {τ1, ..., τn}. Each task
is identified by the 3-tuple τi = (cci, di, ωi), where cci is the worst-case execution
time (WCET) in CPU cycles, ωi is the task period, and di is the relative
implicit deadline (di = ωi) (Baruah et al. [2015]). P = {CPU1, . . . , CPUm} is
a set of m identical processors with a homogeneous clock frequency F ∈ F =
{F1, . . . , Fmax}.
We assume that all task parameters, including task period and CPU cycles
are integers and that any task instance or job can be preempted at any time.
The hyperperiod is defined as the least common multiple of the periods,
H = lcm(ω1, ω2, . . . , ωn), of the n periodic tasks. A task τi executed
at frequency F requires ci = cci/F units of processor
time at every ωi interval. The system utilization is defined as the fraction of
time during which the processors are busy running the task set, i.e.,
$$U = \sum_{i=1}^{n} \frac{c_i}{\omega_i}.$$
The system utilization must be less than or equal to the
number of processors, i.e., U ≤ m (Baruah et al. [1996]).
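The hyperperiod and the utilization test are easy to compute. The sketch below assumes that a task with cc_i cycles needs cc_i/F time units at frequency F (F in cycles per time unit); the task set and function names are hypothetical.

```python
from fractions import Fraction
from math import lcm  # multi-argument form needs Python 3.9+


def hyperperiod(tasks):
    """Least common multiple of all task periods."""
    return lcm(*(period for _, period in tasks))


def utilization(tasks, freq):
    """U = sum(c_i / w_i), taking c_i = cc_i / freq (an assumption of this sketch)."""
    return sum(Fraction(cycles, freq * period) for cycles, period in tasks)


def passes_utilization_test(tasks, freq, n_cpus):
    """Necessary condition U <= m for a task set on m identical CPUs."""
    return utilization(tasks, freq) <= n_cpus


# Hypothetical task set: (cc_i in cycles, period w_i in time units).
tasks = [(2000, 4), (3000, 6), (3000, 12)]
H = hyperperiod(tasks)
U_fast = utilization(tasks, freq=1000)
U_slow = utilization(tasks, freq=500)
```

With two CPUs the set passes the test at 1000 cycles per time unit but not at 500, which shows how the choice of frequency and the condition U ≤ m interact.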
We also consider the arrival of asynchronous, aperiodic tasks. Each aperiodic
task $\tau_i^a$ is described by a tuple of parameters that includes $cc_i^a$ (required CPU cycles).
Formally, the problem addressed in this work is stated as follows.
Problem 3.1 Minimum Energy Thermal Aware RT Scheduler (METARTS).
Given the sets T of tasks and P of CPUs, the METARTS problem consists
in designing an algorithm to allocate within the hyperperiod H the tasks in T
to the m identical CPUs in such a way that the deadlines for T are always
satisfied, the CPU temperatures are always kept below a given bound Tmax and
the consumed energy is minimum. Additionally, the scheduler must execute
aperiodic tasks upon arrival subject to the temporal and thermal constraints,
or reject the aperiodic task otherwise.
4 System modeling methodology
This section describes how we model tasks, CPU allocation, thermal behavior
and energy consumption with a TCPN. The TCPN model is outlined in Fig. 1.
Table 1 summarizes system parameters and symbols as used in the paper, and
Table 2 gathers the notation related to our TCPN model.
4.1 Task model
A task τi arrives every ωi time units to the system. This is modeled in the
TCPN by the period place $p_i^{\omega}$ and the period transition $t_i^{\omega}$, whose firing rate
$\lambda_i^{\omega}$ is such that one token is generated every ωi. Since the arc weight from $t_i^{\omega}$ to
$p_i^{cc}$ is equal to cci, then cci tokens are added to $p_i^{cc}$ every ωi, and the marking
of $p_i^{cc}$ represents the WCET of task τi in CPU cycles (cci). The relative task
deadlines are modeled as the marking of places $p_i^{d}$. Relative deadlines are a
constant parameter, therefore the marking of $p_i^{d}$ remains constant, and places
$p_i^{d}$ have neither input nor output arcs, since we assume implicit deadlines, di = ωi.
Fig. 1: TCPN model integrating task, CPU and thermal modeling. (a) details
the case for a single CPU, and (b) zooms out and extends the model for a chip
multiprocessor.
Fig. 1a also shows the TCPN model of a single CPU (middle dotted box). We
model the allocation of task cycles to different CPUs by linking the place $p_i^{cc}$
of each τi to a boundary transition $t_{i,j}^{alloc}$ of each CPUj. Fig. 1b provides an
overview of a set of tasks and CPUs. Places $p_j^{idle}$ and $p_{i,j}^{busy}$ respectively represent
the idle state and the busy state of the CPU. The marking of place $p_j^{idle}$
models the available CPU cycles (throughput capacity). The initial marking at
$p_j^{idle}$ is set to 1, indicating that CPUj is idle.
The firing of $t_{i,j}^{alloc}$ allocates CPU cycles of task τi to CPUj by moving tokens
from place $p_i^{cc}$ to place $p_{i,j}^{busy}$. The firing of $t_{i,j}^{exec}$ executes the allocated CPU
cycles. The required CPU capacity is reserved during the allocation period
and continuously released as task CPU cycles are executed. Arcs from $p_j^{idle}$ to
$t_{i,j}^{alloc}$ and from $t_{i,j}^{exec}$ to $p_j^{idle}$ are weighted by a constant value η, to ensure that
the flow in transitions $t_{i,j}^{alloc}$ is limited by the throughput capacity of the CPU
(modeled by the marking of place $p_j^{idle}$).
Table 2: Notation for the TCPN model
TCPN symbol      Description
TCPN module for task τi
p_i^ω            Period place of τi
p_i^d            Deadline place of τi
p_i^cc           CPU cycles place of τi
t_i^ω            Period transition of τi
λ_i^ω            Firing rate of transition t_i^ω
TCPN module for CPUj
p_j^idle         Idle state place of CPUj
p_j^pow          Power place of CPUj
p_{i,j}^busy     Busy state place of τi in CPUj
t_{i,j}^alloc    Allocation transition of τi in CPUj
t_{i,j}^exec     Execution transition of τi in CPUj
λ_{i,j}^alloc    Firing rate of transition t_{i,j}^alloc
λ_{i,j}^exec     Firing rate of transition t_{i,j}^exec
η                CPU modeling parameter
TCPN thermal model
p_q^com          Place of component q
p_q^air          Place of component q
p_q^α            Place for leakage power of component q
t_{q→r}^cond     Conduction transition from component q to component r
t_q^conv         Convection transition of component q
t_{q→∞}^conv     Convection transition of component q to air
t_q^α            Leakage power transition for α of component q
t_q^δ            Leakage power transition for δ of component q
λ_{q→r}^cond     Firing rate of transition t_{q→r}^cond
λ_q^conv         Firing rate of transition t_q^conv
λ_{q→∞}^conv     Firing rate of transition t_{q→∞}^conv
λ_q^α            Firing rate of transition t_q^α
λ_q^δ            Firing rate of transition t_q^δ
4.2 Thermal modeling methodology
At the bottom of Fig. 1a (TCPN Thermal Model) we show a set of dotted
boxes, only the central one being closed. Each box corresponds to a prismatic
element modeling the thermal properties and behavior of a specific chip area.
Fig. 1b zooms out of this view to include a number of prismatic elements obeying
a specific meshing of the chip multiprocessor. Firing the boundary transitions
$t_{i,j}^{exec}$ (which model the execution of τi in CPUj) adds tokens to places $p_i^{com_j}$
(at the center of each dotted box in the TCPN Thermal Model, with marking
Ti). This activates the rest of the transitions and places that make up this thermal
model, representing heat transfer by conduction and convection, and thus
causing temperature to increase because of the computing activity. A detailed
explanation of the thermal model is provided in Desirena-Lopez et al. [2014].
4.3 Energy consumption modeling methodology
Let F be the CPUj clock frequency during the time interval (ζ1, ζ2]. Then, the
energy consumed by CPUj during this interval is
$$\int_{\zeta_1}^{\zeta_2} P_{CPU_j}(F)\, d\zeta \qquad (4)$$
$P_{CPU_j}(F)$ represents the power consumed by CPUj. It depends on the
dynamic power due to computational activities of tasks (Pdyn(F)), and on
the static power due to leakage (Pleak). It is computed as
$P_{CPU_j}(F) = P_{dyn}(F) + P_{leak_j}$, where the dynamic term grows with $F^3$.
$P_{leak_j}$ can be modeled as a
linear function of temperature (Ahmed et al. [2016]): Pleak = δT + ρ, where T
is the CPU temperature and δ and ρ are modeling constants.
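The effect of frequency on energy can be illustrated with a small sketch. It assumes a dynamic power of the form kappa * F^3, where kappa is a hypothetical coefficient introduced only for this example, plus the linear leakage term δT + ρ, and it holds frequency and temperature constant over the interval so that the integral reduces to power times duration.

```python
def cpu_power(freq, temp, kappa, delta, rho):
    """Dynamic power kappa * F^3 plus leakage delta * T + rho."""
    return kappa * freq ** 3 + delta * temp + rho


def energy_for_cycles(cycles, freq, temp, kappa, delta, rho):
    """Energy to run `cycles` at constant frequency and temperature: P * (cycles / F)."""
    return cpu_power(freq, temp, kappa, delta, rho) * cycles / freq


# Hypothetical constants, dynamic power only (no leakage).
e_fast = energy_for_cycles(cycles=10, freq=2.0, temp=0.0, kappa=1.0, delta=0.0, rho=0.0)
e_slow = energy_for_cycles(cycles=10, freq=1.0, temp=0.0, kappa=1.0, delta=0.0, rho=0.0)
```

For a fixed number of cycles the dynamic energy scales with F squared, so halving the frequency divides it by four. Leakage works in the opposite direction, because a slower clock keeps the CPU busy for longer.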
4.4 TCPN model behavior
The dynamic behavior of the TCPN model introduced in Fig. 1a is described
by the following equations:
$$\dot{m}_{\mathcal{T}} = C_{\mathcal{T}} \Lambda_{\mathcal{T}} \Pi_{\mathcal{T}}(m)\, m_{\mathcal{T}} - C_{\mathcal{T}}^{alloc} w^{alloc} \qquad (5c)$$
where $C_x$, $\Lambda_x$, and $\Pi_x(m)$ are the incidence matrix, the firing rate
matrix and the configuration matrix ($x \in \{T, a, \mathcal{T}, P\}$) of the thermal, environment,
task and CPU models. Matrices $C_T^{alloc}$, $C_{\mathcal{T}}^{alloc}$ and $C_P^{alloc}$ stand for the
connections of transitions $t_{i,j}^{alloc}$ from (to) places in the thermal model, the
task model, and CPU throughput model respectively. Matrix $C_P^{exec}$ has the
columns of the transitions $t_{i,j}^{exec}$ of the incidence matrix $C_P$. $w^{alloc}$ is the
controlled flow of the allocation transitions (i.e. the allocation rate of tasks to
CPUs). All matrices are computed using the TCPN theory presented in Section 2.
Eq. (5a) represents the evolution of the system temperature. Eq. (5b) indicates that the environmental temperature remains constant during the observation time (its derivative is neglected). Eq. (5c) describes the arrival of periodic tasks to the system. Eq. (5d) models the CPU cycles that are assigned to tasks. Finally, Eq. (5e) models task execution.
5 Off-line stage
This stage computes the time that each task job τi must run during the intervals defined by all deadlines in the task set. First, we compute the thermal constraint and frequency ranges. Then, we use these results as the constraints of an integer linear programming (ILP) problem that computes the execution time of each task within each deadline interval. Fortunately, the constraints can be represented by a unimodular matrix, so this ILP can be solved efficiently as a linear programming problem (LPP).
5.1 Thermal constraint
The TCPN thermal equation Eq. (5a) is used to derive the thermal constraints.
Since the schedule must be periodic, i.e. it must be repeated every hyperperiod,
so must be the thermal solution. In this case, the initial and final temperature
in every hyperperiod must be equal. This can be approached by considering the
steady state of the temperature, i.e. the steady state of the thermal equation:
$\dot{m}_T = A\, m_T + F^3 B\, w^{alloc} + B'\, m_a$ (6)
In the steady state ($m_{T_{ss}}$), when time tends to infinity, $\dot{m}_T = 0$. Hence $m_{T_{ss}} = -A^{-1}(F^3 B\, w^{alloc} + B'\, m_a)$. In order not to violate the thermal constraint of the CPUs, the steady-state temperature must be less than or equal to the maximum temperature level, i.e., $S\, m_{T_{ss}} \le T_{max}$ (thermal constraint). Then:
$-S A^{-1} F^3 B\, w^{alloc} \le T_{max} + S A^{-1} B'\, m_a$ (7)
This equation provides the thermal constraints that the allocation of tasks to
the processors (walloc) must fulfill. It includes the allocated tasks (which in
the steady state are equivalent to the executed tasks), the clock frequency and
the temperature bounds. This equation will be used to compute the range of
feasible operation frequencies.
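The step from the steady-state expression to Eq. (7) can be written out explicitly. The derivation below is only a reading aid; it uses the symbols already introduced ($S$, $A$, $B$, $B'$, $m_a$, $w^{alloc}$) and substitutes $m_{T_{ss}}$ into the thermal constraint:

```latex
\begin{align*}
S\, m_{T_{ss}} &\le T_{max} \\
-S A^{-1}\left(F^{3} B\, w^{alloc} + B'\, m_a\right) &\le T_{max} \\
-S A^{-1} F^{3} B\, w^{alloc} - S A^{-1} B'\, m_a &\le T_{max} \\
-S A^{-1} F^{3} B\, w^{alloc} &\le T_{max} + S A^{-1} B'\, m_a
\end{align*}
```

The last line is Eq. (7): the only terms that depend on the schedule are $F^3$ and $w^{alloc}$, which is why the constraint can be used to bound the feasible frequencies.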
5.2 Minimum frequency
The proposed approach aims to minimize the dynamic energy consumption of the system under the constraints of the RT task deadlines and the thermal limit. We explore energy minimization under DVFS, which varies the processor frequency by selecting one value from a finite set of preset values, i.e. $\mathcal{F} = \{F_1, \ldots, F_{max}\}$. For convenience, we normalize this set as $\phi = \{\phi_{min} = F_1/F_{max}, \ldots, 1\}$. Since all the design parameters of the CPU are fixed, the energy consumed in Eq. 4 is minimized iff the clock frequency $F$ is minimized. Nevertheless, $F$ must be high enough to ensure that the temporal constraints are met. The next proposition obtains the minimum clock frequency that fulfills the temporal constraints.
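Proposition 1 Assuming that the task utilization is less than the number of processors in the METARTS problem, the normalized clock frequency that minimizes the energy consumption while meeting the temporal constraints is
$\Phi^* = \max\left\{\phi_{min},\ \sum_{i=1}^{n} \frac{cc_i}{\omega_i F_{max}\, m}\right\}$ (8)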
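Proof According to Eq. (4), the energy has a minimum iff the consumed power is minimized. With $P_{dyn}(\phi)$ denoting the dynamic power at normalized frequency $\phi$, the problem is to minimize $P_{dyn}(\phi)$ subject to $\sum_{i=1}^{n} \frac{cc_i}{\omega_i \phi F_{max}} \le m$, and $\phi \ge \phi_{min}$. Using Lagrange multipliers, the Lagrangian function is
$L(\phi, \mu_1, \mu_2) = P_{dyn}(\phi) + \mu_1\left(\sum_{i=1}^{n} \frac{cc_i}{\omega_i \phi F_{max}} - m\right) + \mu_2(\phi_{min} - \phi)$.
The solution yields four cases: a) both multipliers are inactive ($\mu_{1,2} = 0$); b) both multipliers are active ($\mu_{1,2} \ge 0$); c) $\mu_1 = 0$ and $\mu_2 \ge 0$; and d) $\mu_1 \ge 0$ and $\mu_2 = 0$. The first case is infeasible, because $\phi$ cannot be zero. In the second case, the only solution is $\phi = \phi_{min} = \sum_{i=1}^{n} \frac{cc_i}{\omega_i F_{max}\, m}$. Finally, if one multiplier is active while the other one is inactive, there are two possible solutions: $\phi = \phi_{min}$ or $\phi = \sum_{i=1}^{n} \frac{cc_i}{\omega_i F_{max}\, m}$. Consequently, in order to fulfill both constraints, the normalized clock frequency that minimizes the total energy consumption is $\Phi^* = \max\left\{\phi_{min},\ \sum_{i=1}^{n} \frac{cc_i}{\omega_i F_{max}\, m}\right\}$.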
The normalized frequency Φ∗ meets the temporal constraints. To guarantee
that the thermal constraints are also fulfilled, we must compute walloc and





= m and the processor frequency is,
$F^* = \min\{F \in \mathcal{F} \mid F \ge \Phi^* F_{max}\}$ (9)
given the discrete nature of the set of frequencies. When computing $\Phi^*$ (Eq. 8) we assume a fully utilized system, but the actual $F^*$ in Eq. 9 can make the execution faster, causing the utilization to drop below 100%. In those cases





to assure that system
utilization is 100%. To make the distribution of the CPU cycles required to
















$w^{alloc}$ controls the flow of the allocation transitions in the TCPN ($t^{alloc}_{i,j}$ in Fig. 1), thus modeling the allocation rate of tasks to CPUs (Eq. 5). If $w^{alloc}$ satisfies Eq. (7), then the thermal constraints are also satisfied. Otherwise, the METARTS problem does not have a solution. If it has a solution ($\Phi^*$ is feasible), then we can compute the maximum CPU cycles available for aperiodic tasks, and the maximum clock frequency that can be used subject to the thermal constraints.
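The two frequency computations of this section can be sketched in a few lines. The code assumes that Eq. (8) takes the form $\Phi^* = \max\{\phi_{min}, \sum_i cc_i/(\omega_i F_{max} m)\}$, which reproduces the value reported later for the example task set; the function names and the error raised when no frequency qualifies are illustrative choices, not part of the paper.

```python
def min_normalized_frequency(tasks, m, f_max, phi_min):
    """Assumed form of Eq. (8).

    tasks: list of (cc_i, omega_i) pairs, i.e. worst-case cycles and period
    in seconds. Returns the normalized frequency Phi*.
    """
    demand = sum(cc / omega for cc, omega in tasks)  # cycles per second
    return max(phi_min, demand / (m * f_max))


def round_up_frequency(phi_star, freqs):
    """Eq. (9): lowest available frequency not below phi_star * F_max."""
    f_max = max(freqs)
    candidates = [f for f in freqs if f >= phi_star * f_max]
    if not candidates:
        # Illustrative handling: the demand exceeds what m CPUs can deliver.
        raise ValueError("no available frequency meets the temporal demand")
    return min(candidates)
```

Because Eq. (9) rounds up to an available level, the selected frequency is generally higher than $\Phi^* F_{max}$, which is the source of the idle cycles discussed in the text.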
5.3 Maximum CPU cycles and clock frequency
The maximum thermal frequency $F^+ \in \mathcal{F}$ is the greatest frequency at which all CPUs can operate at 100% utilization while the temperature meets the thermal constraint. To compute $F^+$, we first solve the programming problem
$\max F_c$
s.t. $-S A^{-1} F_c^3 B \left[\frac{CC_1}{F_c H}, \ldots, \frac{CC_m}{F_c H}\right]^T \le T_{max} + S A^{-1} B'\, m_a$
$\frac{CC_j}{F_c H} = 1 \quad \forall j = 1, \ldots, m$
$F^* \le F_c \le F_{max}$
(11)
The first constraint establishes the thermal requirements. $CC_j$ represents the cycles that $CPU_j$ must execute per hyperperiod. Since all CPUs must work at their maximum capacity, the second constraint implies that the CPU utilization is 100%. The last constraint bounds $F_c$ to the actual clock frequency range of the CPUs. Finally, the solution for $F^+$ has to be in the set $\mathcal{F}$ of discrete frequencies, thus the processor frequency $F^+$ is calculated as
$F^+ = \max\{F \in \mathcal{F} \mid F \le F_c\}$. (12)
With the minimum frequency $F^*$ and the maximum thermal frequency $F^+$ we define
$F_s = \{F \in \mathcal{F} \mid F^* \le F \le F^+\}$, (13)
as the set of operating frequencies that meet the thermal constraint.
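Eqs. (12) and (13) amount to rounding $F_c$ down to an available level and then filtering the frequency set. A minimal sketch follows; the function name and the `(None, [])` return value for the infeasible case are assumptions made for illustration.

```python
def thermal_frequency_set(freqs, f_star, f_c):
    """Return (F_plus, F_s) following Eqs. (12) and (13).

    freqs: available frequencies; f_star: minimum frequency F*;
    f_c: continuous solution of the thermal problem (Eq. 11).
    Returns (None, []) when no available frequency is below f_c.
    """
    below = [f for f in freqs if f <= f_c]
    if not below:
        return None, []
    f_plus = max(below)
    f_s = sorted(f for f in freqs if f_star <= f <= f_plus)
    return f_plus, f_s
```

For instance, with levels {0.15, 0.4, 0.6, 0.8, 1} GHz, $F^* = 0.6$ GHz and a hypothetical $F_c = 0.93$ GHz, the sketch yields $F^+ = 0.8$ GHz and $F_s = \{0.6, 0.8\}$ GHz.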
5.4 Deadline partitioning
Once frequency F ∗ is known (Eq. 9), the WCET that each task must run
at each deadline interval can be computed. For this, we consider the ordered
set of the deadlines of all tasks to define scheduling intervals, as in deadline




within the hyperperiod $H$. Thus every $q_i \cdot \omega_i$, where $q_i = 1, \ldots, n_i$, is a deadline that must be considered in the analysis. These deadlines $sd_i^{q_i}$ are ordered and joined in the set $SD_i = \{sd_i^1, \ldots, sd_i^{n_i}\}$. The general set of deadlines is defined as $SD = SD_0 \cup \ldots \cup SD_{|\mathcal{T}|}$, where $SD_0 = \{0\}$. The elements in $SD$ can be arranged in ascending order and renamed as $SD = \{sd_0, \ldots, sd_\alpha\}$, where $sd_\alpha$ is the last deadline. The scheduling interval $I^k_{SD} = [sd_k, sd_{k+1})$ is defined, and $|I^k_{SD}| = sd_{k+1} - sd_k$ represents the scheduling interval duration.
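The construction of $SD$ and of the interval durations can be sketched as follows, assuming integer periods with implicit deadlines (the function name is illustrative).

```python
from functools import reduce
from math import gcd


def deadline_partition(periods):
    """Return (SD, durations) for integer periods with implicit deadlines.

    SD is the ordered set of all deadlines within the hyperperiod, including
    0; durations[k] is the length of the k-th scheduling interval.
    """
    hyperperiod = reduce(lambda a, b: a * b // gcd(a, b), periods)
    deadlines = {0}
    for omega in periods:
        # Every multiple of the period up to the hyperperiod is a deadline.
        deadlines.update(range(omega, hyperperiod + 1, omega))
    sd = sorted(deadlines)
    durations = [b - a for a, b in zip(sd, sd[1:])]
    return sd, durations
```

For periods 4, 8 and 12 the hyperperiod is 24 and every interval has length 4; for periods 3 and 4 the intervals have uneven lengths, which is the general case.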
The proposed deadline partitioning problem assumes 100% utilization on every CPU. However, because $F^* = \min\{F \in \mathcal{F} \mid F \ge \Phi^* F_{max}\}$, $F^* \ne \Phi^* F_{max}$ in general, and consequently idle cycles may appear. In order to solve this






. Then the cycles that each HRT task must execute in $I^k_{SD}$, i.e. $x^k_i$, are computed through the following linear programming problem.
Let $cc^*_i = \omega_i F^* - cc_i$ be the cycles that task $\tau_i$ can be idle. Thus, the total amount of cycles ($sd_k F^*$) at $sd_k$ can be rewritten as $sd_k F^* = q\, \omega_i F^* + r_i$, where $0 \le r_i < \omega_i F^*$, $q \in \mathbb{Z}$, and $q$ represents the occurrences of a task at $sd_k$. If $r_i = 0$, the deadline of $\tau_i$ is $sd_k$. Then the following LPP is stated for the $k$-th interval, subject to:
$\sum_{i=1}^{n} x^k_i = m\, |I^k_{SD}|\, F^*$
if $r_i = 0$: $\sum_{\gamma=1}^{k} x^\gamma_i = q\, cc_i$
if $r_i \ne 0$: $\sum_{\gamma=1}^{k} x^\gamma_i \ge -q\, cc^*_i + \max\left\{0,\ \sum_{\gamma=1}^{k} |I^\gamma_{SD}|\, F^* - cc^*_i\right\}$
$\forall i:\ x^k_i \le |I^k_{SD}|\, F^*$
(14)
The first constraint implies that the CPU utilization is 100%. It is required since $\Phi^*$ indicates that CPU utilization is 100%. The second constraint guarantees that those tasks that must complete their execution in the current interval actually do so. The last constraint ensures deadline fulfillment.
The following proposition guarantees that if the former LPPs (Eq. 14) are solved in order of the $k$-th interval, then the computed amount of time that each task must run per interval yields a feasible schedule.
Proposition 2 Given a task set $\mathcal{T}$, where the task utilization at $F^*$ is equal to the number of CPUs, the solution of the linear programming problems in Eq. (14) is always integer. Moreover, if each task $\tau_i$ is executed exactly $x^k_i$ cycles during the $k$-th interval, then a feasible schedule is obtained.
Proof Let $\mathcal{T}^k = \mathcal{T}^k_1 \cup \mathcal{T}^k_2$, where $\mathcal{T}^k_1$ and $\mathcal{T}^k_2$ partition the task set. $\mathcal{T}^k_1 = \{\tau_1, \ldots, \tau_v\}$ is the set of tasks that have their deadlines at $sd_k$, and $\mathcal{T}^k_2 = \{\tau_{v+1}, \ldots, \tau_n\} = \mathcal{T} - \mathcal{T}^k_1$. In LPP (14), the last two constraints must be converted into equality equations. This is done by adding slack variables $h_i$ to each constraint. Then all the constraints are represented as $My = b$, where the vector of variables is composed of the workload and slack variables, i.e. $y = [x\ h]^T$. Notice that vector $b$ is always integer. By construction, the







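where $L$ has the form:
$L_{(v+1) \times n} = \begin{pmatrix} 1 & 1 & \cdots & 1 & 1 & \cdots & 1 \\ 1 & 0 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1 & 0 & \cdots & 0 \end{pmatrix}$ (16)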
It is easy to see that the rank of $L$ is $v+1$. Hence, the rank of $M$ is $rank(M) = rank(L) + rank(I) = 2n + 1$, i.e. $M$ is a full row rank matrix. In order to prove that the solution is always integer, we will demonstrate that the restriction matrix $M$ is unimodular.
Since $M$ is a full row rank matrix, the determinant of every square submatrix ($M_{si}$) of order $2n+1$, obtained by removing columns, must be equal to 1, 0 or -1 to prove that $M$ is unimodular (Sierksma [2001]). When the columns are removed, the following three scenarios are possible.
1) If any of the first $v$ columns is removed, $M_{si}$ loses rank, because a row with only zero elements remains; hence the determinant is 0. 2) If the removed column $M(\bullet, j)$ contains a nonzero entry $M(i, j)$, where $j > v$, then the corresponding row $M(i, \bullet)$ has a nonzero element among the first $v$ columns, and $M_{si}$ loses rank since the resulting row is duplicated among the first $v$ rows. Thus the determinant is 0. 3) When any other column not listed before is







Then, according to Theorem 3.2 in Sierksma [2001], $A$ is always a totally unimodular matrix, thus $\det(A) \in \{0, \pm 1\}$. Also, the determinant of the identity matrix is always 1; therefore, computing the determinant by blocks, $\det(M_{si}) = \det(A) \in \{0, \pm 1\}$. □
Fig. 2: Overall view of the Minimum Energy Thermal-Aware scheduler with aperiodic task management.
6 On-line stage
The previous section described the off-line stage, which computes the CPU clock frequency $F^*$, the maximum clock frequency $F^+$ and the task execution time per deadline interval. Fig. 2 provides an overview of the two stages of the scheduler, the parts involved, and their linking signals. The process to build all these parts and run simulations has been fully automated, as part of a publicly available simulation framework (Desirena et al. [2019]). The package also includes other schedulers that can be simulated out-of-the-box. It provides tools to generate RT task sets for simulation, manually or automatically, and to plot the results. This section describes each of the components of the on-line stage and their relation to the whole system.
The inputs of the on-line stage are the deadline partition (i.e. the ordered set of deadline intervals $I^k_{SD}$), the cycles that each task must execute per deadline interval ($X^k_{HRT}$), the set of actual feasible CPU frequencies ($F_s$), the actual cycles that each job has executed since the last scheduling event ($ex^k_i$), and the aperiodic task parameters and arrivals. The output signal ${}^{j}W^{alloc}_{i}$ (Eq. 10) is the vector that determines the task allocation ratio of the allocation transitions ($t^{alloc}_{i,j}$) in the TCPN model (Eq. 5). If the scheduler is implemented on a real system instead of being tested on a TCPN model of the system, it represents the vector of pairs ($\tau_i$, $CPU_j$) determining the allocation of tasks to CPUs.
ALGORITHM 1: On-line Scheduler
Input:
  $I^k_{SD}$ – Scheduling (deadline) intervals;
  $X^k$ – Task cycles per deadline interval;
  $ex^k_i$ – Cycles actually executed by each task since the last scheduling event;
  $F_n$ – CPU frequency
Output: A feasible schedule
1 Initialize k = 0
2 while true do
3   Compute the ordered set SL of laxities;
4   if a new $I^k_{SD}$ is reached then
5     k = k + 1; /* Update scheduling interval */
6     Compute task priorities using Priority Levels;
7     Execute the m tasks with the highest priority;
8   else if a job reaches zero laxity then
9     Compute task priorities using Priority Levels;
10    Execute the m tasks with the highest priority;
11  else if an aperiodic task arrives then




The on-line scheduler (Alg. 1) leverages a Fixed Priority Zero-Laxity (FPZL) algorithm (Davis and Burns [2011a]) to allocate tasks to processors until the next scheduling event. The inputs are the outputs of the off-line stage, and the runtime of each task accumulated since the previous scheduling event ($ex^k_i$, Fig. 2). A scheduling event occurs when a job reaches zero laxity, a job ends, or an aperiodic task arrives. In the latter event, if the aperiodic task is accepted, the adaptive scheduler (AS) provides the cycles that the aperiodic task must run during the deadline interval ($x^k_{\tau^a_i}$), along with the adjusted CPU frequency ($F_n$).
Priority Levels.- Whenever an event occurs, task priorities are updated according to their laxity. Per-job laxities are calculated and ordered in a set $SL = \{l_i \mid l_i = sd_{k+1} - (F_n \cdot x^k_i - ex^k_i) - \zeta\}$ (Alg. 1, step 3). Jobs reaching their zero-laxity time are given the maximum priority (= 1). Jobs being executed and with laxity different from zero receive priority 2. The remaining jobs receive priority 3 (the lowest one). Thus, zero-laxity tasks have the highest priority and must be executed immediately.
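The three priority levels can be sketched as below, taking the per-job laxities as already computed. The tolerance `eps` and the tie-break by smaller laxity inside the same level are assumptions added for illustration; the text only specifies the three levels.

```python
def priority_levels(laxities, running, eps=1e-9):
    """Map each job id to priority 1 (zero laxity), 2 (running) or 3 (other).

    laxities: dict job_id -> laxity; running: set of job ids on a CPU.
    eps is an assumed numerical tolerance for detecting zero laxity.
    """
    levels = {}
    for job, lax in laxities.items():
        if lax <= eps:
            levels[job] = 1
        elif job in running:
            levels[job] = 2
        else:
            levels[job] = 3
    return levels


def pick_m_jobs(laxities, running, m):
    """Select the m jobs with the best priority level.

    Ties inside a level are broken by smaller laxity (an assumption).
    """
    levels = priority_levels(laxities, running)
    ranked = sorted(laxities, key=lambda job: (levels[job], laxities[job]))
    return ranked[:m]
```

Giving running jobs level 2 means a job is displaced only by a zero-laxity job, which keeps the number of preemptions low.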
Execution of m tasks with the highest priority.- In Alg. 1, steps 7 and 10, $m$ tasks are dispatched to the $m$ CPUs. In order to reduce the number of migrations, tasks that are executed during two consecutive events are allocated to the same CPU. In a system simulated by a TCPN, this step means computing Eq. 5 according to ${}^{j}W^{alloc}_{i}$ (Eq. 10), in order to advance the simulation.
In a real system, the set of m tasks are just passed to the dispatcher of the
operating system.
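The rule that keeps a task on the same CPU across consecutive events can be sketched as a small assignment routine. The data layout (a dictionary from CPU index to job, with `None` for an idle CPU) and the function name are assumptions chosen for illustration.

```python
def assign_cpus(selected, previous):
    """Map CPU index -> job, keeping jobs on the CPU they already occupied.

    selected: jobs chosen to run after the current event.
    previous: dict cpu -> job from the last event; its keys are assumed to
    list all m CPUs, with None marking an idle CPU.
    """
    assignment = {}
    remaining = list(selected)
    for cpu, job in previous.items():
        if job in remaining:  # the job keeps running here: no migration
            assignment[cpu] = job
            remaining.remove(job)
    free_cpus = [cpu for cpu in previous if cpu not in assignment]
    for cpu, job in zip(free_cpus, remaining):
        assignment[cpu] = job
    return assignment
```

Only the jobs that were not running before are placed on the CPUs that became free, so a job that runs through two consecutive events never changes processor.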
6.2 Preemptions and migrations bound
Job preemption is one of the causes of run-time overhead and large memory requirements in RT scheduling. In this section we prove that the number of preemptions and migrations incurred by Alg. 1 is bounded.
Proposition 3 Assuming that the conditions of Proposition 2 hold, the context switches at each scheduling interval $I^k_{SD}$ caused by Algorithm 1 have an upper bound given by
$2m + n_a$ (18)
where $n_a$ represents the number of active tasks in each $I^k_{SD}$.
Proof. Let $X$ be the solution of the LPPs in Eq. (14), and $x^k_i \in X$. During the $I^k_{SD}$ interval, task $\tau^k_i$ must run for $x^k_i$ cycles at a given clock frequency $F_n$, but it can be the case that $x^k_i = 0$; hence the number of active tasks $n^k_a$ in the $I^k_{SD}$ interval can be less than $n$. Recall that a task can only leave a processor under two conditions: it has finished its execution, or it is preempted by another task that has reached zero laxity (ZL). In the best-case scenario, all tasks assigned to the processors are already in ZL, therefore $n^k_a = m$. Under this condition each task is allocated to a processor, generating $m$ context switches (CS), and another $m$ CS at the end of their execution, thus producing a total of $2m$ CS. Also, it is possible that the $m$ tasks that were first allocated finish their execution before the end of $I^k_{SD}$. Therefore, at most $n^k_a - m$ further tasks will be allocated, incurring $(n^k_a - m) + 2m$ CS. Finally, there are cases when some tasks reach ZL and others have just finished their execution. Since the schedule is feasible, at most $m$ tasks can reach ZL, lest they miss their deadline. Hence the upper bound of CS is $(n^k_a - m) + 2m + m$, which is the same as $2m + n^k_a$. □
Proposition 4 Assuming that Proposition 2 holds, Algorithm 1 causes at
most m− 1 migrations of tasks.
Proof . Recall that a task τi is preempted only when another task has reached
zero laxity, hence when τi is preempted it will migrate because its original
processor will become unavailable. Therefore at most there will be m − 1
migrations.
6.3 Aperiodic tasks and Adaptive Scheduler
Aperiodic tasks arrive asynchronously to the system. An adaptive scheduler
(AS ) determines if these tasks can be executed without compromising the
HRT constraints of the periodic task set (Fig. 2, Alg. 2). If so, a new CPU
clock frequency is computed allowing the execution of the aperiodic task. The
computed frequency must be in the range Fs = {F ∗ . . . F+}, because from
the off-line stage we know that these frequencies meet the thermal constraints.
Moreover, the frequency must also be as low as possible, in order to guarantee
a minimum power consumption while meeting the temporal constraints.
Upon an aperiodic arrival, the AS determines the current $I^j_{SD}$ and the scheduling interval $I^\Gamma_{SD}$ in which the aperiodic task has its absolute deadline ($r^a_i + d^a_i$). Recall that the schedule is periodic on the hyperperiod, such that the scheduling interval $I^\Gamma_{SD}$ is the one that contains the element $g = (r^a_i + d^a_i) \bmod H$.
Then the AS computes the required CPU cycles Cu for all active tasks from









xγ mod αi (19)
where $\alpha$ is the number of scheduling intervals, $ex^k_i$ is the execution of active task $i$, in CPU cycles, since the last scheduling event, and $X^k = X^k_{HRT} \cup X^k_{ap}$ represents the set of CPU cycles that every active task must execute during the $k$-th scheduling interval. Hence the first sum in Eq. (19) stands for the CPU cycles that the system still needs to execute, while the second sum does the same for the subsequent scheduling intervals up to $I^\Gamma_{SD}$. With this information, the algorithm computes the maximum amount of CPU cycles ($C_{free}$) that the processors can spare when running at the maximum frequency $F^+$,
$C_{free} = m \cdot d^a_i \cdot F^+ - C_u$ (20)
where $m$ is the number of CPUs and $d^a_i$ is the relative deadline of the aperiodic task.
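The admission test built on Eq. (20) is a one-line comparison. The sketch below takes $C_u$ as an input, since it is produced by Eq. (19); function and parameter names are illustrative.

```python
def free_cycles(m, rel_deadline, f_plus, c_u):
    """Eq. (20): spare cycles before the aperiodic deadline at frequency F+.

    m: number of CPUs; rel_deadline: relative deadline in seconds;
    f_plus: maximum thermal frequency in Hz; c_u: cycles already committed.
    """
    return m * rel_deadline * f_plus - c_u


def accept_aperiodic(m, rel_deadline, f_plus, c_u, cc_a):
    """Admission test: accept when the spare cycles cover the demand cc_a."""
    return free_cycles(m, rel_deadline, f_plus, c_u) >= cc_a
```

As a hypothetical example, two CPUs at $F^+ = 1$ GHz with a relative deadline of 10 s offer $2 \times 10^{10}$ cycles; if $1.2 \times 10^{10}$ are committed, a task demanding $4 \times 10^9$ cycles is accepted and one demanding $9 \times 10^9$ is rejected.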
If $C_{free}$ is greater than or equal to the cycles demanded by the aperiodic task ($cc^a_i$), the










Once $F_n$ has been determined, the algorithm calculates the number of cycles $x^k_{\tau^a_i}$ from $\tau^a_i$





ALGORITHM 2: Adaptive Scheduler (AS)
Input:
  $X^k$ – Task cycles per deadline interval; $I^k_{SD}$ – Scheduling intervals;
  $cc^a_i$, $d^a_i$ – Aperiodic task parameters;
  $F_s = \{F^*, \ldots, F^+\}$ – Clock frequencies;
  $F_n$ – CPU operating frequency;
  $ex^k_i$ – Cycles executed by each task since the last scheduling event;
Output:
  $F_n$ – updated operating frequency;
  $x^k_{\tau^a_i}$ – cycles of the aperiodic task per scheduling interval;
1 if an aperiodic task arrives then
2   Determine interval $I^\Gamma_{SD}$, where $\tau^a_i$ has its deadline
3   Compute required CPU cycles for active tasks during interval; /* Eq. (19) */
4   Calculate free CPU cycles $C_{free}$; /* Eq. (20) */
5   if $C_{free} \ge cc^a_i$ then
6     Accept task $\tau^a_i$;
7     Determine new $F_n$; /* Eq. (21) */
8     Calculate $x^k_{\tau^a_i}$ from current $I^k_{SD}$ to $I^\Gamma_{SD}$;





13 if an aperiodic task finishes then
14 Discard the CPU cycles associated to the aperiodic task;
15 Recalculate the new frequency;
16 end






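$x^{k}_{\tau^a_i} = \min\left\{ m\left(|I^k_{SD}| - r^a_i\right) F_n - \sum_{i=1}^{|X^k|} \left(x^k_i - ex^k_i\right),\ cc_r \right\}$
For $\gamma = k + 1$ to $(\Gamma + g)$:
  $cc_r = cc_r - x^{(\gamma \bmod \alpha) - 1}_{\tau^a_i}$
  $x^{\gamma \bmod \alpha}_{\tau^a_i} = \min\left\{ m\, |I^{\gamma \bmod \alpha}_{SD}|\, F_n - \sum_{i=1}^{|X^{\gamma \bmod \alpha}|} x^{\gamma \bmod \alpha}_i,\ cc_r \right\}$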
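The distribution of the aperiodic cycles over consecutive intervals can be read as a greedy fill: each interval receives the smaller of its spare capacity and the cycles still pending. The sketch below follows that reading; the list-based interface, the function names and the clamp of negative spare capacity to zero are assumptions made for illustration.

```python
def spare_capacity(m, duration, f_n, committed_cycles):
    """Cycles still unassigned in an interval of the given duration.

    m CPUs run at frequency f_n (Hz) for duration seconds, and
    committed_cycles are already reserved for other tasks.
    """
    return m * duration * f_n - committed_cycles


def distribute_aperiodic_cycles(cc_a, spare_per_interval):
    """Greedily place cc_a cycles into consecutive intervals.

    spare_per_interval[0] is the spare capacity left in the current interval;
    the following entries belong to the later intervals up to the one that
    holds the deadline. Returns the cycles placed in each interval.
    """
    remaining = cc_a
    placed = []
    for spare in spare_per_interval:
        x = min(max(spare, 0), remaining)  # never place a negative amount
        placed.append(x)
        remaining -= x
    return placed
```

If the admission test has passed, the placed cycles add up to the demand of the aperiodic task; otherwise the shortfall is the demand minus the sum of the returned list.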

6.4 Complexity
The complexity of the on-line stage depends on two algorithms. The computation of priority levels and laxities in Alg. 1 is linear in the number of tasks. At most $n = |\mathcal{T}|$ tasks will end their execution $x^k_i$ in the $k$-th interval (there are at most $n$ tasks). Also, at most $n$ tasks will reach zero laxity. If $q$ aperiodic tasks arrive in the $k$-th interval, then the nested while loop ends in $(n + n + q) \times (n + n)$ steps (number of events × number of operations). Considering that the outer loop runs $\alpha = |I_{SD}|$ times, the number of steps of this algorithm is polynomial in the number of tasks. Alg. 2 runs on the arrival of an aperiodic task and is polynomial in the number of tasks and independent of the number of CPUs. Thus the proposed algorithm is polynomial in the number of tasks.
7 Experimental Results
In this section we simulate the behavior of EETAMS, a scheduler implemented according to the on-line and off-line stages proposed here. First, we present an example to study the thermal behavior and real utilization considering the HRT task set. Then, we demonstrate the ability to deal with the arrival of an aperiodic task while maximizing CPU utilization, controlling temperature and optimizing energy consumption. Last, we compare EETAMS with gEDF, a non-fluid global RT scheduler, and RT-TCPN, a pfair / deadline partitioning hybrid algorithm. Both gEDF and RT-TCPN were introduced in Section 2.3. In the case of gEDF we have applied a straightforward implementation. The comparison focuses on temperature control, number of context switches and consumed energy.
7.1 Experimental environment
For all the experiments we assume a platform composed of two homogeneous Intel XScale silicon microprocessors mounted over a copper heat spreader. The isotropic thermal properties and dimensions of the materials are taken from Desirena-Lopez et al. [2014]. The power model for the Intel XScale is based on Chen and Kuo [2007]. The processor supports five operating frequency levels $\mathcal{F} = \{0.15, 0.4, 0.6, 0.8, 1\}$ GHz, consuming $P_{CPU} = \{80, 170, 400, 900, 1600\}$ mW respectively. Thus, the power consumption function can be modeled approximately as $P_{CPU} = 0.08 + 1.52 \cdot \phi^3$ W. The temperature of the surrounding air is constant and set to 45 °C. In the experiments we assume that cache memories and speculative mechanisms are non-existent or turned off.
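The quality of the cubic approximation can be checked directly against the five tabulated levels. The snippet below is only a sanity check of the numbers quoted above (with $F_{max} = 1$ GHz the normalized frequency equals the value in GHz); the function names are illustrative.

```python
LEVELS_GHZ = [0.15, 0.4, 0.6, 0.8, 1.0]
MEASURED_W = [0.080, 0.170, 0.400, 0.900, 1.600]


def modeled_power(phi):
    """Approximate power in watts for normalized frequency phi."""
    return 0.08 + 1.52 * phi ** 3


def fit_errors():
    """Absolute difference (W) between the cubic model and each level."""
    return [abs(modeled_power(f) - p) for f, p in zip(LEVELS_GHZ, MEASURED_W)]
```

The model matches the highest level essentially exactly and stays within a few tens of milliwatts elsewhere, with the largest deviation at 0.8 GHz.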
We use a publicly available in-house simulation framework (Desirena et al. [2019]). As of today, this framework allows automatic generation of the TCPN model and of the LPP equations and solutions, as well as the integration of other scheduling algorithms programmed in MATLAB [2018]. The schedulers used in this section are available out-of-the-box.
Fig. 3: Temperature evolution (upper plot) for the periodic schedule (lower plot) at CPU1 (above) and CPU2 (below). The maximum temperature produced by this schedule is $T_{CPU_{1,2}} = 45.3$ °C.
7.2 Temperature control and utilization
In this first experiment, we study the thermal control capabilities and real utilization achieved by EETAMS. We consider a set of sporadic tasks with implicit deadlines $\mathcal{T} = \{\tau_1, \tau_2, \tau_3\}$, where $\tau_1 = (1.5e9, 4)$, $\tau_2 = (3e9, 8)$, $\tau_3 = (5e9, 12)$; the hyperperiod is $H = 24$. The maximum operating temperature level is set to $T_{max_{1,2}} = 50$ °C. First, the minimum frequency for the periodic task set is computed off-line according to Eq. (8), obtaining $\Phi^* = 0.5833$. This frequency is raised to the nearest upper frequency actually available for the processor, which is $F^* = 0.6$ GHz. Eq. (11) provides the maximum clock frequency ($F^+ = 1$ GHz), so that the METARTS problem has a solution. We assume that scheduling and context-switch overheads are included in the task WCET. Then, solving the LPP in Eq. (14) for $F^*$ yields the CPU cycles of each task to be executed per interval ($x^k_i$).
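The value of $\Phi^*$ can be reproduced by hand. The arithmetic below assumes that each task is given as (worst-case cycles, period in seconds), that $m = 2$ and $F_{max} = 1$ GHz as in Section 7.1, and that Eq. (8) divides the aggregate cycle demand per second by $m F_{max}$:

```latex
\begin{align*}
\sum_{i=1}^{3} \frac{cc_i}{\omega_i}
  &= \frac{1.5 \times 10^{9}}{4} + \frac{3 \times 10^{9}}{8} + \frac{5 \times 10^{9}}{12}
   \approx 1.1667 \times 10^{9}\ \text{cycles/s} \\
\Phi^{*} &= \max\left\{0.15,\ \frac{1.1667 \times 10^{9}}{2 \cdot 10^{9}}\right\} \approx 0.5833 \\
F^{*} &= \min\{F \in \{0.15, 0.4, 0.6, 0.8, 1\}\ \text{GHz} \mid F \ge 0.5833\ \text{GHz}\} = 0.6\ \text{GHz}
\end{align*}
```

The lower bound $\phi_{min} = 0.15$ is not active here, so the temporal demand alone determines $\Phi^*$.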
Fig. 3 provides the schedule and temperature evolution produced by the algorithm without considering aperiodic tasks. The off-line LPP and the FPZL on-line scheduler are work-conserving and yield a theoretical 100% CPU utilization. However, the fact that $F^* > (\Phi^* \times F_{max})$ makes the tasks allocated to CPU2 run faster in this simulation. This translates into the slack that appears in the lowest plot of the figure (interval [18, 20]), which temporarily lowers the temperature, as shown in the corresponding temperature graph, and decreases the theoretical utilization by about 8%.
Fig. 4: Temperature evolution (upper plot) for the periodic schedule (lower
plot) at CPU1 (above) and CPU2 (below) upon acceptance of the aperiodic
task $\tau^a_1$. The maximum temperature produced by this schedule is $T_{CPU_{1,2}} = 46.24$ °C.
7.3 Handling aperiodic tasks
We now show the behavior of the EETAMS scheduler upon the arrival of an aperiodic task while running the same HRT task set considered in the previous experiment. Fig. 4 depicts the outcome when an aperiodic task $\tau^a_1 = (4000, 10)$ arrives at $\zeta = 2$, during the $I^1_{SD}$ time slice. $\tau^a_1$ has an absolute deadline at $\zeta = 12$. Since $C_{free} \ge cc^a_1$, the AS accepts the aperiodic task and computes $F_n = 0.8$ GHz $\in F_s$ as the frequency at which the processors must run during the interval [2, 12]. The solid red line shows that temperature increases during this interval because of the execution of the tasks at frequency $F_n$, and then decreases after $\zeta = 12$ because a new (lower) frequency has been calculated for the next interval. In both experiments, with and without the aperiodic task, CPU1 achieves full utilization, whereas CPU2 shows a slack (idle time) at about $\zeta = 18$, which translates into a temperature valley. As in the previous experiment, this slack appears because the exact optimal frequency calculated in Eq. (8) is rounded up to a frequency belonging to the discrete set of frequencies actually available in the microprocessor ($F_n \in F_s$).
7.4 Comparison with gEDF and the RT-TCPN pfair scheduler
EETAMS is tied to the fluid scheduling concept. As we summarized in Section 2.3, this type of scheduler achieves 100% system utilization but incurs a high overhead because of the many scheduling points, context switches and migrations. EETAMS is implemented with a deadline partitioning approach instead of a quantum-based one like pfair algorithms, which should lower the overhead. In this section we compare EETAMS with RT-TCPN, which leverages a quantum calculated on a deadline partitioning basis, and with gEDF, a well-known scheduler unrelated to the fluid scheduling concept. We focus on temperature control and scheduling overhead. Since gEDF lacks HRT optimality, a comparison considering aperiodic tasks would lock both gEDF and RT-TCPN at the maximum system clock frequency, which would make the comparison unfair. On the other hand, the fact that the real utilization achieved by EETAMS falls below 100% because of the rounding to an actual available system frequency makes the comparison with gEDF all the more interesting.
Because gEDF is not HRT optimal, we need to ensure that all the task sets are HRT schedulable under the three compared schedulers for the comparison to be fair. Therefore, in this experiment we abide by the schedulability constraints of gEDF. An implicit-deadline HRT task set $\mathcal{T}$ is schedulable under gEDF on $m$ processors if $U_{tot}(\mathcal{T}) \le m - (m - 1) \cdot U_{max}(\mathcal{T})$, where $U_{tot}(\mathcal{T}) = \sum_i u_i$ is the total utilization and $U_{max}(\mathcal{T})$ the maximum utilization of a task in $\mathcal{T}$ (Goossens and Funk [2003]). We consider $m = 2$ in our experiments; therefore, to guarantee schedulability under gEDF, $U_{max}(\mathcal{T})$ should be less than or equal to 0.5, with $U_{tot}(\mathcal{T}) \le 1.5$ (75% of the maximum utilization). Accordingly, we produced tasks with a utilization randomly generated under a uniform distribution by using the UUniFast algorithm (Bini and Buttazzo [2005]). We simulated ten task sets for each total utilization, varying $U_{tot}$ from 50% to 75% on the entire hyperperiod, working at maximum frequency.
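The task-set generation and the schedulability filter can be sketched as follows. The test uses the bound in the form $U_{tot} \le m - (m - 1) \cdot U_{max}$, which gives 1.5 for $m = 2$ and $U_{max} = 0.5$ as stated above; the function names, the optional `rng` argument and the small tolerance `eps` are assumptions for illustration and do not describe the simulation framework used in the paper.

```python
import random


def uunifast(n, total_util, rng=random):
    """UUniFast (Bini and Buttazzo): n utilizations summing to total_util."""
    utils = []
    remaining = total_util
    for i in range(1, n):
        # Draw the utilization left for the remaining n - i tasks.
        next_remaining = remaining * rng.random() ** (1.0 / (n - i))
        utils.append(remaining - next_remaining)
        remaining = next_remaining
    utils.append(remaining)
    return utils


def gedf_schedulable(utils, m, eps=1e-12):
    """Sufficient gEDF test: U_tot <= m - (m - 1) * U_max."""
    return sum(utils) <= m - (m - 1) * max(utils) + eps
```

A generator built on these two pieces would draw a utilization vector with `uunifast` and discard it whenever `gedf_schedulable` returns `False`, so that every retained task set is schedulable under the three compared schedulers.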
Fig. 5 compares the performance of the three schedulers regarding context switches (y-axis) for our experimental range of system utilization (x-axis). We normalize context switches as context switches per job, calculated as the number of context switches divided by the number of jobs along the hyperperiod. The huge vertical gap between the pfair-like, quantum-based RT-TCPN and the other two schedulers requires the use of a logarithmic scale. Both EETAMS and gEDF incurred fewer context switches than the RT-TCPN scheduler. The number of context switches per job for the two fluid-based schedulers (EETAMS and RT-TCPN) decreases steadily as utilization increases, but we cannot fully observe the trend since we limit utilization to 1.5 (75%), in order for the comparison with gEDF to be fair, as explained above.
The percentage of maximum temperature in Fig. 6 indicates the maximum temperature reached by the platform during the simulation of the computed schedule, as a percentage of the temperature bound. As a general rule, the greater the maximum temperature, the shorter the system lifespan. gEDF yields the highest temperature values. EETAMS obtains the lowest temperatures because it successfully finds the minimum frequency for the periodic task set in order to achieve full system utilization. As expected, the percentage of maximum temperature tends to the same value when the total system utilization tends to 100%.
Fig. 5: Average amount of preemptions per job on 2 processors when the total utilization varies between 50% and 75% (series: gEDF, EETAMS, RT-TCPN).
Fig. 6: Maximum temperature comparison for m = 2 and Utot = [1 1.5] (series: gEDF, EETAMS, RT-TCPN).
Fig. 7 indicates the energy consumed during the hyperperiod. The values correspond to the execution of the task sets with a power consumption $P_{CPU} = 0.08 + 1.52 \cdot \phi^3$. EETAMS tends to consume less energy because it exploits DVFS, while gEDF and RT-TCPN always work at the maximum operating frequency.
Fig. 7: Energy consumption (series: gEDF, EETAMS, RT-TCPN).
8 Conclusions
This work shows that the TCPN formalism is a suitable tool for the design of thermal-aware RT schedulers, particularly when complexity rises because of further elements such as additional thermal constraints or aperiodic task management. Leveraging this formalism, we build a two-stage, energy-efficient thermal-aware scheduling system in which an HRT periodic task set executes at minimum clock frequency on a set of processors, with the ability to manage aperiodic tasks, optimizing power consumption, maximizing CPU utilization and honoring the HRT and thermal constraints in all cases. The TCPN models the activity of the tasks, their allocation to CPUs, heat generation and transfer, and how the latter affects the overall system temperature. The feasibility of the thermal schedule is proved by an LPP that captures the RT and thermal restrictions as linear constraints. If there exists a feasible solution, then the LPP finds the maximum operating frequency $F^+$ that satisfies the thermal constraint.
This modeling methodology is automated by a software tool, integrated in a publicly available simulation framework which encompasses EETAMS, a simulated implementation of the scheduling system described in this paper, along with other schedulers that are equally available. We experimentally show that EETAMS achieves successful thermal control while meeting the expected HRT constraints, maximizing CPU utilization, minimizing energy consumption and dealing with aperiodic tasks.
We compare EETAMS with RT-TCPN, a fluid scheduler implementation that uses a quantum, and with gEDF, a non-fluid scheduler with lower overhead that lacks the HRT optimality of fluid schedulers. The comparison shows that EETAMS achieves superior thermal control and the lowest energy consumption while keeping context switching at a level comparable to gEDF.
Future work includes feedback control to add robustness under disturbances, avoiding recalculations upon the arrival of aperiodic tasks, further heuristics to minimize the number of migrations and context switches, and testing the proposed scheduler in an RT kernel.
Acknowledgement
This work has been supported by the Ministerio de Ciencia, Innovación y
Universidades and the European ERDF under Grant TIN2016-76635-C2-1-R
(AEI/ERDF, EU), and by the Aragon Government (T58 17R research group)
and ERDF 2014-2020 “Construyendo Europa desde Aragón”.
References
Dong-Ik Oh and T. P. Baker. Utilization bounds for n-processor rate monotone
scheduling with static processor assignments. Real-Time Systems,
15(2):183–192, 1998.
Nathan Fisher, Joël Goossens, and Sanjoy Baruah. Optimal online multipro-
cessor scheduling of sporadic real-time tasks is impossible. Real-Time Syst.,
45(1-2):26–71, June 2010.
Shelby Funk, Greg Levin, Caitlin Sadowski, Ian Pye, and Scott Brandt. DP-Fair:
a unifying theory for optimal hard real-time multiprocessor scheduling.
Real-Time Systems, 47(5):389–429, 2011.
S. Moulik, R. Devaraj, A. Sarkar, and A. Shaw. A deadline-partition oriented
heterogeneous multi-core scheduler for periodic tasks. In 2017 18th Interna-
tional Conference on Parallel and Distributed Computing, Applications and
Technologies (PDCAT), pages 204–210, 2017.
Björn B. Brandenburg and Mahircan Gül. Global scheduling not required: Sim-
ple, near-optimal multiprocessor real-time scheduling with semi-partitioned
reservation. In IEEE Real-Time Systems Symposium (RTSS 2016), pages
99–110, 2016.
Daniel Casini, Alessandro Biondi, and Giorgio Buttazzo. Semi-partitioned
scheduling of dynamic real-time workload: A practical approach based on
analysis-driven load balancing. In Marko Bertogna, editor, 29th Euromicro
Conference on Real-Time Systems (ECRTS 2017), volume 76 of Leibniz
International Proceedings in Informatics (LIPIcs), pages 13:1–13:23, 2017.
Robert I. Davis and Alan Burns. FPZL schedulability analysis. In Proceedings of
the 2011 17th IEEE Real-Time and Embedded Technology and Applications
Symposium, RTAS ’11, pages 245–256, Washington, DC, USA, 2011a. IEEE
Computer Society. ISBN 978-0-7695-4344-4.
L. Rubio-Anguiano, G. Desirena-López, A. Ramírez-Treviño, and J.L. Briz.
Energy-efficient thermal-aware scheduling for RT tasks using TCPN.
IFAC-PapersOnLine, 51(7):236–242, 2018.
doi: https://doi.org/10.1016/j.ifacol.2018.06.307.
14th IFAC Workshop on Discrete Event Systems WODES 2018.
Jian-Jia Chen and Tei-Wei Kuo. Procrastination determination for peri-
odic real-time tasks in leakage-aware dynamic voltage scaling systems. In
Computer-Aided Design, 2007. ICCAD 2007. IEEE/ACM International
Conference on, pages 289–294, Nov 2007.
C. Mahulea, A. Ramirez-Trevino, L. Recalde, and M. Silva. Steady-state con-
trol reference and token conservation laws in continuous Petri net systems.
Automation Science and Engineering, IEEE Transactions on, 5(2):307–320,
April 2008. ISSN 1545-5955.
J. Desel and J. Esparza. Free Choice Petri Nets. Cambridge Tracts in Theo-
retical Computer Science 40, 1995.
M. Silva and L. Recalde. Redes de Petri continuas: Expresividad, análisis y
control de una clase de sistemas lineales conmutados. Revista Iberoamer-
icana de Automática e informática Industrial, 4(3):5–33, julio 2007. ISSN
1697-7912.
M. Silva, Jorge Júlvez, Cristian Mahulea, and C. Renato Vázquez. On fluidization
of discrete event models: observation and control of continuous Petri
nets. Discrete Event Dynamic Systems, 21(4):427–497, December 2011.
Theodore P Baker. A comparison of global and partitioned EDF schedulability
tests for multiprocessors. In International Conf. on Real-Time and
Network Systems. Citeseer, 2005.
Robert I Davis and Alan Burns. A survey of hard real-time scheduling for
multiprocessor systems. ACM Computing Surveys (CSUR), 43(4):35, 2011b.
John M. Calandrino, James H. Anderson, and Dan P. Baumberger. A hybrid
real-time scheduling approach for large-scale multicore platforms. In Pro-
ceedings of the 19th Euromicro Conference on Real-Time Systems, ECRTS
’07, pages 247–258, Washington, DC, USA, 2007. IEEE Computer Society.
Marko Bertogna and Michele Cirinei. Response-time analysis for globally
scheduled symmetric multiprocessor platforms. In Real-Time Systems Sym-
posium, 2007. RTSS 2007. 28th IEEE International, pages 149–160. IEEE,
2007.
Joël Goossens and Shelby Funk. Priority-driven scheduling of periodic task
systems on multiprocessors. Real-Time Systems, pages 2–3, 2003.
Marko Bertogna, Michele Cirinei, and Giuseppe Lipari. Improved schedulability
analysis of EDF on multiprocessor platforms. In Proceedings of the 17th
Euromicro Conference on Real-Time Systems, ECRTS ’05, pages 209–218,
Washington, DC, USA, 2005. IEEE Computer Society. ISBN 0-7695-2400-1.
Sanjoy K Baruah, Neil K Cohen, C Greg Plaxton, and Donald A Varvel. Pro-
portionate progress: A notion of fairness in resource allocation. Algorithmica,
15(6):600–625, 1996.
Sanjoy K Baruah, Johannes E Gehrke, and C Greg Plaxton. Fast scheduling
of periodic tasks on multiple resources. In IPPS, page 280. IEEE, 1995.
James H Anderson and Anand Srinivasan. Mixed Pfair/ERfair scheduling of
asynchronous periodic tasks. In Real-Time Systems, 13th Euromicro Conference
on, 2001., pages 76–85. IEEE, 2001.
G. Desirena-Lopez, J. L. Briz, C. R. Vázquez, A. Ramírez-Treviño, and
D. Gómez-Gutiérrez. On-line scheduling in multiprocessor systems based
on continuous control using timed continuous Petri nets. In 13th International
Workshop on Discrete Event Systems, pages 278–283, 2016.
Joonho Kong, Sung Woo Chung, and Kevin Skadron. Recent thermal man-
agement techniques for microprocessors. ACM Computing Surveys, 44(3):
13:1–13:42, 2014.
Pradeep M. Hettiarachchi, Nathan Fisher, Masud Ahmed, Le Yi Wang, Shinan
Wang, and Weisong Shi. A design and analysis framework for thermal-resilient
hard real-time systems. Embedded Computing Systems, ACM Transactions
on, 13(5s):146:1–146:25, 2014.
Jian-Jia Chen, Chia-Mei Hung, and Tei-Wei Kuo. On the minimization of the
instantaneous temperature for periodic real-time tasks. In 13th IEEE Real
Time and Embedded Technology and Applications Symposium (RTAS’07),
pages 236–248. IEEE, 2007.
Lars Schor, Iuliana Bacivarov, Hoeseok Yang, and Lothar Thiele. Worst-case
temperature guarantees for real-time applications on multi-core systems.
In 2012 IEEE 18th Real Time and Embedded Technology and Applications
Symposium, pages 87–96. IEEE, 2012.
Rehan Ahmed, Parameswaran Ramanathan, and Kewal K Saluja. Neces-
sary and sufficient conditions for thermal schedulability of periodic real-
time tasks under fluid scheduling model. ACM Transactions on Embedded
Computing Systems (TECS), 15(3):49, 2016.
T. Chantem, X.S. Hu, and R.P. Dick. Temperature-aware scheduling and
assignment for hard real-time applications on MPSoCs. Very Large Scale
Integration (VLSI) Systems, IEEE Transactions on, 19(10):1884–1897, Oct
2011. ISSN 1063-8210.
Yong Fu, Nicholas Kottenstette, Yingming Chen, Chenyang Lu, Xenofon D
Koutsoukos, and Hongan Wang. Feedback thermal control for real-time
systems. In 2010 16th IEEE Real-Time and Embedded Technology and Ap-
plications Symposium, pages 111–120. IEEE, 2010.
James Donald and Margaret Martonosi. Techniques for multicore thermal
management: Classification and new exploration. In ACM SIGARCH Com-
puter Architecture News, volume 34, pages 78–88. IEEE Computer Society,
2006.
Francesco Zanini, David Atienza, and Giovanni De Micheli. A control theory
approach for thermal balancing of MPSoC. In 2009 Asia and South Pacific
Design Automation Conference, pages 37–42. IEEE, 2009.
Xing Fu, Xiaorui Wang, and Eric Puster. Dynamic thermal and timeliness
guarantees for distributed real-time embedded systems. In 2009 15th IEEE
International Conference on Embedded and Real-Time Computing Systems
and Applications, pages 403–412. IEEE, 2009.
Yong Fu, Nicholas Kottenstette, Chenyang Lu, and Xenofon D Koutsoukos.
Feedback thermal control of real-time systems on multicore processors. In
Proceedings of the tenth ACM international conference on Embedded soft-
ware, pages 113–122. ACM, 2012.
ARINC. Specification 651: Design guide for integrated modular avionics, 1997.
N. Diniz and J. Rufino. ARINC 653 in space, 2005.
AUTOSAR. Specification of RTE software, 2017.
Gabriel Fernandez, Javier Jalle, Jaume Abella, Eduardo Quiñones, Tullio
Vardanega, and Francisco J. Cazorla. Computing safe contention bounds for
multicore resources with round-robin and FIFO arbitration. IEEE Trans.
Comput., 66(4):586–600, April 2017.
J. Cardona, C. Hernandez, E. Mezzetti, J. Abella, and F. J. Cazorla. NoCo:
ILP-based worst-case contention estimation for mesh real-time manycores.
In 2018 IEEE Real-Time Systems Symposium (RTSS), pages 265–276, Dec
2018.
G. Desirena-Lopez, A. Ramírez-Treviño, J. L. Briz, C. R. Vázquez, and
D. Gómez-Gutiérrez. Thermal-aware real-time scheduling using timed continuous
Petri nets. ACM Transactions on Embedded Computing Systems (to appear,
accepted Apr. 2019), 2019.
S. Baruah, M. Bertogna, and G. Buttazzo. Multiprocessor Scheduling for
Real-Time Systems. Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2015.
ISBN 978-3-319-08695-8.
G. Desirena-Lopez, C. R. Vázquez, A. Ramírez-Treviño, and D. Gómez-Gutiérrez.
Thermal modelling for temperature control in MPSoC’s using
fluid Petri nets. In IEEE Conference on Control Applications part of Multi-
conference on Systems and Control, 2014.
Gerard Sierksma. Linear and integer programming: theory and practice. CRC
Press, 2001.
G. Desirena, L. Rubio, A. Ramirez, and J.L. Briz. Thermal-aware HRT scheduling
simulation framework, 2019. URL https://www.gdl.cinvestav.mx/art/
uploads/TCPN-Thermal-Aware Real-Time Scheduling.zip.
MATLAB. version 9.4 (R2018a). The MathWorks Inc., Natick, Mas-
sachusetts, 2018.
Enrico Bini and Giorgio C Buttazzo. Measuring the performance of schedula-
bility tests. Real-Time Systems, 30(1-2):129–154, 2005.
