Models and formal verification of multiprocessor system-on-chips  by Brekling, Aske et al.
The Journal of Logic and Algebraic Programming 77 (2008) 1–19
Models and formal veriﬁcation of multiprocessor system-on-chips
Aske Brekling*, Michael R. Hansen, Jan Madsen
Informatics and Mathematical Modelling, Technical University of Denmark, DK-2800 Lyngby, Denmark
Article info
Available online 18 July 2008
Keywords: Multiprocessor system-on-chips; Schedulability; Timed automata; Uppaal; Verification
Abstract
In this article we develop a model for applications running on multiprocessor platforms.
An application is modelled by task graphs and a multiprocessor system is modelled by
a number of processing elements, each capable of executing tasks according to a given
scheduling discipline. We present a discrete model of computation for such systems and
characterize the size of the computation tree it sufﬁces to consider when checking for
schedulability.
Analysis of multiprocessor system on chips is a major challenge due to the freedom
of interrelated choices concerning the application level, the conﬁguration of the execution
platform and the mapping of the application onto this platform. The computational model
provides a basis for formal analysis of systems.
The model is translated to timed automata and a tool for system veriﬁcation and
simulation has been developed using Uppaal as backend. We present experimental results
on rather small systems with high complexity, primarily due to differences between best-
case and worst-case execution times. Considering worst-case execution times only, the
system becomes deterministic and, using a special version of Uppaal where no history
is saved, we could verify a smart-phone application consisting of 103 tasks executing on
four processing elements.
© 2008 Elsevier Inc. All rights reserved.
1. Introduction
Modern hardware systems are moving toward execution platforms made up of multiple programmable and dedicated
processing elements implemented on a single chip, known as multiprocessor system-on-chip (MPSoC). The different parts of
an embedded application are executing on these processing elements; but the activity of mapping the parts of an embedded
program onto the platform elements is non-trivial. First of all, there may be various and often conflicting resource constraints.
Real-time constraints, for example, should be met together with constraints on the use of memory and energy. There is also
a huge variety of choices in the mapping of an application to a platform because there are many ways to
partition an embedded program into parts, there are many ways these parts can be assigned to processing elements, and
each processing element can be set up in many ways.
In this work we aim at models and tools for analysis of problems that must be considered when an application is mapped
to an execution platform. Such models are called system models [25] as they comprise a model for the application executing
on the platform, and the analysis of such systems is also called cross-layer analysis as it deals with problems where decisions
concerning one layer of abstraction, e.g. concerning the scheduling principle used in a processing element, have influence on
the properties at another level of abstraction, e.g. a task is missing a deadline.
To illustrate the problems we shall address, consider the simple example in Fig. 1, where the application is speciﬁed by
ﬁve cyclic tasks τ1, . . . ,τ5 and three processing elements pe1,pe2,pe3. There are causal dependencies between tasks, and τ1,
* Corresponding author.
E-mail address: awb@imm.dtu.dk (A. Brekling).
1567-8326/$ - see front matter © 2008 Elsevier Inc. All rights reserved.
doi:10.1016/j.jlap.2008.05.002
2 A. Brekling et al. / Journal of Logic and Algebraic Programming 77 (2008) 1–19
Fig. 1. Example: best-case execution time gives slower run.
for example, must ﬁnish before τ2 can start (written τ1 ≺ τ2). We want to ﬁnd the shortest period where all tasks meet their
deadlines and analyze two runs corresponding to the two possible execution times for τ1 in Fig. 2. In both runs, τ1 and τ3 are
executing on pe1 and pe3, respectively, in the first time step, where no task is executing on pe2 due to the causal dependencies.
The later time steps have similar explanations. Observe that the shortest possible period is π = 4 corresponding to the case
where the best-case execution time bcetτ1 = 1 is chosen for τ1. Thus, an analysis based on the worst-case execution time
wcetτ1 = 2 would, in this case, not lead to the worst-case scenario. This is an example of a multiprocessing timing anomaly
[7] exhibiting a counterintuitive timing behavior. A locally faster execution, either by making the processor faster or by
making the algorithm more efﬁcient, may lead to an increase of the execution time of the whole system. The presence of
such behavior makes multiprocessor timing analysis more difﬁcult [27].
It is easy to check that a period π = 3 can be achieved for this application, simply by changing the priorities so that
τ4 gets higher priority than τ2. But the problems cannot get much bigger than the one in Fig. 1 before the consequences of
design decisions cannot be comprehended, and it is necessary to have tool support for the design space exploration [10,24].
1.1. Related work
As indicated by the example in Fig. 1, the schedulability analysis of systems relates to the classical job-shop scheduling
problem [23]. A job-shop problem is speciﬁed as follows: suppose a number of non-cyclic tasks and a number of machines
(processing elements) are given. Each task τ is assigned a machine for a speciﬁed amount of time (bcetτ = wcetτ ). A job is
a sequence of tasks (e.g. τ3 ≺ τ4 ≺ τ5), and the job-shop optimization problem is the problem of ﬁnding a minimum-time
schedule for a collection of jobs (two in the above example). This problem is known to be NP-hard. The problem we address
can be considered a generalization of the job-shop problem: we consider cyclic tasks, and tasks can have an offset and best-case
and worst-case execution times. Furthermore, we consider sequential components of tasks given by directed acyclic graphs
(not just a linear order), and we consider a variety of preemptive scheduling disciplines. We will, however, just consider a
decision version of the problem. In [2] it is shown that ﬁnding an optimal schedule for a job-shop optimization problem
corresponds to ﬁnding a shortest path in a certain kind of acyclic timed automata.
Most modelling methodologies for multiprocessor embedded systems are based on the Y-chart of system design [25]. In
the Y-chart there is a clear distinction between the application and the execution platform, and there is an explicit mapping
of the elements of the application onto elements of the execution platform. This mapping results in a particular system
design that can be evaluated using simulation or formal analysis.
Simulation-based methodologies are still the predominant technique for performance evaluation and, for example,
MPARM [17] is a SystemC-based [16] framework for multiprocessor MPSoC architectures aimed at exploring on-chip communication
networks. It is a cycle-accurate model that requires detailed knowledge of the execution platform. The detailed level
allows for executing compiled software, including the real-time operating systems, on the platform model. Refs. [11,22,9]
propose a higher level modelling approach, by viewing the application as a set of tasks executing on a real-time operating
system. As embedded software plays a dominating role in deﬁning the functionality of an embedded system, choosing the
right scheduling policy has a distinct inﬂuence on the system’s rate and reactivity. All three approaches are based on SystemC
and target single processor systems.
The challenges of MPSoC platforms have, for instance, been addressed in [25,10,24,13,19,21]. In [13], the use of virtual
processing units (VPU) for representing processors is proposed. Each VPU supports a priority-based scheduler that models
the execution of software tasks. Tasks are modelled as communicating ﬁnite state machines, where each transition is
Fig. 2. (a) Execution time for τ1 is 1. (b) Execution time for τ1 is 2.
regarded as an atomic operation taking a fixed amount of processor cycles. A similar approach is used in ARTS [20,21], where
synchronization among processors and resource allocation is incorporated into the model of the real-time operating system.
The designer can make simulations analyzing quantitative properties such as timing, memory and power usage, as well
as communications. A tool is proposed in [24] for performance evaluation and exploration based on SystemC. The focus is
multimedia applications that are modelled as Kahn process networks and the performance is evaluated using a trace-driven
simulation approach.
SystemCoDesigner [10] is a SystemC-based simulation framework that supports automatic performance evaluation and
design space exploration. The design space exploration in [10] is based on a multi-objective evolutionary approach. A
similar approach is used for design space exploration in CHARMED [14], which focuses on periodic multi-mode embedded
systems.
The advantage of simulation-based tools is that the designers, early in the development process, receive insight into the
feasibility of the chosen designs before the concrete implementations are considered. However, although the simulation tool
can exhibit a collection of executions of a system, it can in no way be used to guarantee that every execution is meaningful.
In order to handle shortcomings of simulation-based approaches, several formal approaches have been proposed. In [26], an
approach for analyzing communication delays in message scheduling, together with optimization strategies for bus access,
is presented. Thiele et al. [30] provide a real-time calculus for scheduling of preemptive tasks with static priorities and, in
[28], a formal approach based on event ﬂow interfacing is described. A simple tool for analysis of interfacing is provided,
together with algorithms for conﬁguring a global analysis model.
In [6], a timed-automata approach for schedulability analysis of single processor systems is provided. It appears that
the computation model behind preemptive scheduling strategies is stopwatch automata, for which it is known that the
reachability problem is undecidable; however, in [6] it is shown that the schedulability problem for extended timed automata
is decidable for preemptive scheduling strategies. This approach is used in the Times tool [4], which considers schedulability
of tasks on a single processor only. As for the Times tool, our approach is based on a timed automata model.
In [8], Uppaal is used in connection with veriﬁcation and analysis of the legOS scheduler. We provide a more general
model where different schedulers can be veriﬁed on multiprocessor platforms. Altisen and Tripakis propose in [31] an
implementation methodology for transforming a timed automaton into a program, and through that, checking properties of
the execution of that program. This is done by modelling the program as well as the execution platform as untimed and timed
automata. However, the notion of an execution platform is basically modelled in terms of the digital clock of the platform.
Many issues such as multiprocessor anomalies, which our work concerns, cannot be investigated through this methodology.
In [2,1], the timed-automata model is proposed as a natural tool for posing and solving scheduling problems in general.
However, the concept of an execution platform is not considered, and therefore issues such as the multiprocessor anomalies
are not investigated.
Overview: In this paper we provide, in Section 2, a model for applications (in terms of task graphs), execution platforms (in
terms of a number of processing elements each capable of executing tasks according to a specified scheduling discipline), and
systems (in terms of a mapping of an application onto the execution platform). Furthermore, we present the computational
model for systems (in Section 2.4) and a bound on the size of the computation tree (in Section 2.5) it suffices to consider in
order to decide whether all tasks meet their deadlines. We present, in Section 3, a timed-automata [3] implementation of the
model, and, in Section 4, a tool using Uppaal [5,15] as backend, which can be used for design space exploration. Experimental
results are presented in Section 4.1. Finally, the last section contains a conclusion.
2. Model for a system-level framework
We shall now give a model for a system consisting of an application which is running on an execution platform. An
execution platform is made up of multiple programmable and dedicated processing elements (pe) and an application can be
divided up into a number of tasks (τ ), each of which represents a piece of sequential code. These tasks might have timing
constraints as well as causal dependencies with other tasks.
Having processing elements that can execute several different tasks and manage execution time dynamically introduces
a need for a dedicated real-time operating system (os) as a layer between the application and the execution platform. The
real-time operating system should manage scheduling of the different tasks as well as the dependencies tasks might have
with each other.
A system-level framework should be recognized as a framework for the overall system comprising application, real-time
operating system and processing elements. There are tools available for simulation of MPSoCs – one of these is ARTS [21],
and the model in this section as well as the timed-automata model of the next section have background in concepts and
features of ARTS.
Fig. 3 provides an overview of the different components of a system-level framework. A system in ARTS is a mapping of an
application (made up of a number of tasks), onto an execution platform. Dashed arrows from tasks to processing elements
in Fig. 3 represent this mapping, e.g. τ1 is mapped to pe1. The arrows in the task graph show dependencies between tasks,
e.g. τ1 and τ2 must ﬁnish execution before τ3 can start. Each processing element has its own real-time operating system. The
communication network is modelled as a special processing element on which communication tasks are mapped (see [19]
for further details). We shall now present a formal model for the execution of an application on an execution platform.
Fig. 3. System-level framework for a MPSoC.
2.1. Application – the task model
Let a ﬁnite set T of tasks be given. Each task τ ∈ T is characterized by a period πτ ∈ N, a best-case execution time bcetτ ∈ N,
and a worst-case execution time wcetτ ∈ N, where wcetτ ≥ bcetτ > 0. A task τ needs a certain amount, between bcetτ and
wcetτ , of time units of a processor’s time to finish its job in a given period. (This notion is formalized later when we introduce
the execution model for multiprocessor platforms.) Furthermore, a task τ is characterized by an initial offset oτ ∈ N, which
means that the ﬁrst period of τ starts oτ time units after the system has started. Thus, the nth period of task τ is the time
interval [oτ + (n − 1) · πτ ,oτ + n · πτ [ ⊂ R≥0 for n = 1,2, . . .
An application is modelled by a task graph G = (T , ≺), where ≺⊆ T × T is a directed, acyclic graph. An edge (τ ,τ ′) ∈≺
(also written τ ≺ τ ′) represents a causal dependency, i.e. τ must ﬁnish its job before τ ′ can start.
A sequential component Gs = (Ts, ≺s), where Ts ⊆ T and ≺s ⊆ Ts × Ts, is a connected sub-graph of G, for which
• all tasks have the same period πTs , i.e. πi = πj = πTs for τi,τj ∈ Ts, and
• the offsets of tasks are so close to each other that their ﬁrst (and hence the nth) period overlaps, i.e. |oi − oj| < πTs for
τi,τj ∈ Ts.
We shall, from now on, assume that G can be partitioned into sequential components Gi = (Ti, ≺i), for i = 1, . . . ,N, where
Ti ∩ Tj = ∅, whenever i ≠ j, T = T1 ∪ · · · ∪ TN and ≺ = ≺1 ∪ · · · ∪ ≺N .
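The task model can be illustrated by a minimal Python sketch. The names `Task`, `period_interval` and `same_period_and_overlap` are hypothetical and only serve the illustration; the last function checks the two conditions on periods and offsets of a sequential component, and it assumes that connectedness of the sub-graph is checked elsewhere.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    period: int      # pi_tau
    bcet: int        # best-case execution time
    wcet: int        # worst-case execution time
    offset: int = 0  # o_tau

    def __post_init__(self):
        # The model requires wcet >= bcet > 0; a positive period and a
        # non-negative offset are assumed here as well.
        if not (self.wcet >= self.bcet > 0) or self.period <= 0 or self.offset < 0:
            raise ValueError(f"invalid parameters for task {self.name}")

    def period_interval(self, n: int) -> tuple[int, int]:
        """Half-open interval [start, end) of the n-th period, n >= 1."""
        start = self.offset + (n - 1) * self.period
        return (start, start + self.period)


def same_period_and_overlap(tasks: list[Task]) -> bool:
    """Check the period and offset conditions of a sequential component."""
    if not tasks:
        return False
    period = tasks[0].period
    if any(t.period != period for t in tasks):
        return False
    offsets = [t.offset for t in tasks]
    # |o_i - o_j| < period for all pairs  <=>  max offset - min offset < period
    return max(offsets) - min(offsets) < period
```

For a task with offset 3 and period 4, `period_interval(2)` yields the interval from 7 (inclusive) to 11 (exclusive), matching the definition of the nth period.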
2.2. The model of the platform
A platform consists of M ≥ 1 processing elements (also called processors) PE = (pe1,pe2, . . . ,peM), where each processing
element pei is capable of executing tasks according to a given scheduling principle. We will here consider the following
preemptive scheduling strategies: rate-monotonic RM, ﬁxed-priority FP and earliest-deadline-ﬁrst EDF scheduling. There
will be no principal difﬁculties in extending the model with more scheduling principles; but this will, of course, add more
details.
2.3. The system model
A system consists of a mapping m : T → {1, . . . ,M} of tasks to processing elements, and a configuration Sch : PE → {RM,FP,EDF}
of the scheduling principle used by each processing element. We shall often consider the set of tasks $T_{pe_i}$ mapped to a
particular processing element $pe_i$, i.e. let
$$T_{pe_i} = m^{-1}(i) = \{\tau \in T \mid m(\tau) = i\}$$
2.3.1. Scheduling of tasks
We assume that each task has a unique number given by an injective function pr : T → {1, . . . ,card(T )}. This numbering
is used to deﬁne total orderings of tasks in connection with the various scheduling principles.
A task τ has higher ﬁxed priority than τ ′, written τ >FP τ ′, iff pr(τ ) < pr(τ ′), i.e. smaller number given by pr means higher
priority.
For rate-monotonic scheduling, the priority relation among tasks is primarily given by their periods (shorter periods mean
higher priority) and secondarily by their numbering according to pr:
A task τ has higher rate-monotonic priority than τ ′, written τ >RM τ ′, iff πτ < πτ ′ ∨ (πτ = πτ ′ ∧ pr(τ ) < pr(τ ′)).
The priority relations >FP and >RM are total orders on the set of tasks due to the unique numbering given by pr, and,
therefore, there will never be a non-deterministic choice when scheduling according to the rate-monotonic and fixed-priority
disciplines in the system.
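The two static priority relations can be sketched as sort keys in Python, where the task with the smallest key has the highest priority. The tuple layout and the names `fp_key` and `rm_key` are assumptions made for this illustration only.

```python
from typing import NamedTuple


class Task(NamedTuple):
    name: str
    period: int
    pr: int  # unique number; a smaller number means higher priority


def fp_key(task: Task) -> int:
    """Sort key for fixed priority: smaller pr first."""
    return task.pr


def rm_key(task: Task) -> tuple[int, int]:
    """Sort key for rate-monotonic priority: shorter period first, ties broken by pr."""
    return (task.period, task.pr)


tasks = [Task("a", 5, 2), Task("b", 5, 1), Task("c", 3, 3)]
fp_order = [t.name for t in sorted(tasks, key=fp_key)]  # highest priority first
rm_order = [t.name for t in sorted(tasks, key=rm_key)]  # highest priority first
```

Because pr is injective, both keys are unique per task, which reflects that the relations are total orders.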
The relations >FP and >RM are simple since the ﬁxed-priority and rate-monotonic principles are static scheduling prin-
ciples, i.e. the ordering relations among tasks are independent of the current point in time. For earliest deadline ﬁrst this
is different as, for a given time point t, the task with the highest priority is that with the shortest distance to its nearest
deadline, and hence the priority relation between tasks may change when some new period starts during the execution.
Let $t \in \mathbb{N}$ and $\tau \in T$. By $\mathit{dist}_\tau(t)$ we denote the distance from $t$ to the nearest deadline (i.e. the start of a new period) of $\tau$:
$$\mathit{dist}_\tau(t) = p - t, \quad \text{where } p = o_\tau + n \cdot \pi_\tau \text{ for the smallest } n \in \mathbb{N}^+ \text{ such that } p > t$$
Notice that we have a simple setting where the deadline for a task is the same as the start point of its next period. The model
is easily extended to cope with deadlines that are earlier than the start of the next period.
A task $\tau$ has higher earliest-deadline-first priority than $\tau'$ at time $t \in \mathbb{N}$, written $\tau >^t_{EDF} \tau'$, iff
$$\mathit{dist}_\tau(t) < \mathit{dist}_{\tau'}(t) \;\lor\; (\mathit{dist}_\tau(t) = \mathit{dist}_{\tau'}(t) \land pr(\tau) < pr(\tau'))$$
The priority relation $>^t_{EDF}$ is also a total order, and observe that when no task has a deadline between two time points $t_1$ and
$t_2$, then $>^t_{EDF} \,=\, >^{t'}_{EDF}$ for every $t, t'$ where $t_1 < t, t' < t_2$. Thus, the earliest-deadline-first priorities can change at time points
only when some new period for a task starts.
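A minimal sketch of the distance function and the resulting earliest-deadline-first ordering is given below. The representation of a task as a triple (offset, period, pr) and the names `dist` and `edf_key` are assumptions of the sketch.

```python
def dist(offset: int, period: int, t: int) -> int:
    """Distance from time t to the nearest deadline strictly after t."""
    # Smallest n >= 1 with offset + n * period > t.
    n = max(1, (t - offset) // period + 1)
    return offset + n * period - t


def edf_key(task: tuple[int, int, int], t: int) -> tuple[int, int]:
    """Sort key at time t: nearest deadline first, ties broken by pr."""
    offset, period, pr = task
    return (dist(offset, period, t), pr)


x = (0, 4, 1)  # offset 0, period 4, pr 1
y = (0, 6, 2)  # offset 0, period 6, pr 2
first_at_0 = min([x, y], key=lambda task: edf_key(task, 0))
first_at_4 = min([x, y], key=lambda task: edf_key(task, 4))
```

In this small example the ordering changes at time 4, where a new period of x starts, which illustrates that the relation is dynamic.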
2.4. The model of computation
To model the computations of a system, the notion of a state, which is a snapshot of the state of affairs of the individual
processing elements, is introduced. Consider a processing element $pe_i$. For that processing element the state component must
record which task in $T_{pe_i}$ is currently executing, where we denote by $\bot$ that no task is currently executing on $pe_i$. Furthermore,
for every task $\tau \in T_{pe_i}$, the state component also records the execution time $e_i(\tau) \in \{\mathit{bcet}_\tau, \mathit{bcet}_\tau + 1, \ldots, \mathit{wcet}_\tau - 1, \mathit{wcet}_\tau\}$ that
is needed by $\tau$ to finish its job in the current period. We call $e_i$ an execution vector for $pe_i$. Each time a new period for $\tau$ starts,
a non-deterministic choice is taken concerning the execution time for $\tau$ in that period. Thus, a state component for $pe_i$ is a
tuple $(s_i, e_i)$, where $s_i \in T_{pe_i} \cup \{\bot\}$ and $e_i$ is an execution vector for $pe_i$.
A system state or just a state is an M-tuple $\sigma = ((s_1,e_1),(s_2,e_2),\ldots,(s_M,e_M))$, where $(s_i,e_i)$ is a state component for $pe_i$, for
$i \in \{1,\ldots,M\}$.
A trace (or history) is a finite sequence of states:
$$\mathit{tr} = \sigma_1 \sigma_2 \cdots \sigma_k$$
where $k \geq 0$ is the length of $\mathit{tr}$, denoted by $\mathit{length}(\mathit{tr})$. A trace with length $k$ describes a system behavior in the interval $[0,k[$.
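Consider a task $\tau \in T$. The completed periods of $\tau$ in a trace $\mathit{tr}$ is the set
$$\mathit{Completed}_{\mathit{tr}}(\tau) = \{ n \in \mathbb{N}^+ \mid o_\tau + n \cdot \pi_\tau \leq \mathit{length}(\mathit{tr}) \}$$
and the current period number of $\tau$ in $\mathit{tr}$ is
$$\mathit{cpn}_{\mathit{tr}}(\tau) = \begin{cases} 0 & \text{if } \mathit{length}(\mathit{tr}) < o_\tau \\ 1 & \text{if } o_\tau \leq \mathit{length}(\mathit{tr}) < o_\tau + \pi_\tau \\ 1 + \max \mathit{Completed}_{\mathit{tr}}(\tau) & \text{otherwise} \end{cases}$$
The nth period, $n \geq 1$, of $\tau$ in $\mathit{tr} = \sigma_1 \sigma_2 \cdots \sigma_k$ is the (possibly empty) sub-sequence of $\mathit{tr}$:
$$\mathit{tr}(\tau,n) = \sigma_{\alpha+1} \cdots \sigma_{\alpha+\pi_\tau}, \quad \text{where } \alpha = o_\tau + (n-1) \cdot \pi_\tau .$$
Notice that $\mathit{tr}(\tau,n)$ consists of $\pi_\tau$ states just when $n \in \mathit{Completed}_{\mathit{tr}}(\tau)$ and fewer (possibly no) states otherwise.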
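The bookkeeping of periods in a trace can be sketched as follows, assuming that a trace is represented as a Python list of states and that a task is given by its offset and period. The function names are hypothetical.

```python
def completed(offset: int, period: int, length: int) -> set[int]:
    """Set of period numbers n >= 1 with offset + n * period <= length."""
    return set(range(1, (length - offset) // period + 1))


def cpn(offset: int, period: int, length: int) -> int:
    """Current period number of a task in a trace of the given length."""
    if length < offset:
        return 0
    return (length - offset) // period + 1


def nth_period(trace: list, offset: int, period: int, n: int) -> list:
    """Sub-sequence of the trace belonging to the n-th period, n >= 1."""
    alpha = offset + (n - 1) * period
    # States sigma_(alpha+1) .. sigma_(alpha+period) are trace[alpha:alpha+period]
    # with zero-based list indexing; slicing truncates an unfinished period.
    return trace[alpha:alpha + period]
```

For an offset of 2, a period of 3 and a trace of length 8, periods 1 and 2 are completed and the current period number is 3, in agreement with the case distinction above.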
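The execution time $\mathit{exec}_\sigma(\tau)$ of a task $\tau \in T_{pe_i}$ in a state $\sigma$ is
$$\mathit{exec}_\sigma(\tau) = \begin{cases} 1 & \text{if } s_i = \tau \\ 0 & \text{otherwise} \end{cases}$$
where $\sigma = ((s_1,e_1),\ldots,(s_i,e_i),\ldots,(s_M,e_M))$.
This notion extends to a finite sequence of states as follows:
$$\mathit{exec}_{\sigma_1 \cdots \sigma_j}(\tau) = \sum_{m=1}^{j} \mathit{exec}_{\sigma_m}(\tau)$$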
We denote by $\mathit{Finished}_{\mathit{tr}}(\tau,n) \in \mathit{Bool}$ whether $\tau \in T_{pe_i}$ has finished its job in the nth period, $n \geq 1$, in $\mathit{tr}$:
$$\mathit{Finished}_{\mathit{tr}}(\tau,n) = \begin{cases} \mathit{true} & \text{if } \mathit{exec}_{\mathit{tr}(\tau,n)}(\tau) = e_i(\tau) \\ \mathit{false} & \text{otherwise} \end{cases}$$
where $e_i$ is the execution vector for processing element $pe_i$ in the last system state of $\mathit{tr}$, or the zero-execution vector $e_i^0$, where
$e_i^0(\tau) = 0$, for $\tau \in T_{pe_i}$, if the trace $\mathit{tr}$ is empty.
We shall now define the set of possible successor component states for a processing element $pe_i$ given a trace $\mathit{tr}$ with
length $k$. Let $e_i$ be the execution vector for $pe_i$ in the last state in $\mathit{tr}$ or the zero-execution vector if the trace is empty. For each
$\tau \in T_{pe_i}$, there are the following choices for the execution vector $e'$ for the next state component of $pe_i$:
• $e'(\tau) = 0$, if $k < o_\tau$, i.e. the first period for $\tau$ has not started yet,
• $e'(\tau) \in \{\mathit{bcet}_\tau, \mathit{bcet}_\tau + 1, \ldots, \mathit{wcet}_\tau - 1, \mathit{wcet}_\tau\}$, if $\pi_\tau \mid k - o_\tau$ ($\pi_\tau$ divides $k - o_\tau$), i.e. if a new period for $\tau$ starts at time $k$, then
one of the possible execution times for $\tau$ is chosen, and
• $e'(\tau) = e_i(\tau)$, if $k \geq o_\tau$ and $\pi_\tau \nmid k - o_\tau$.
Let $\mathit{EV}_{\mathit{tr}}(i)$ be the (finite) set of all possible execution vectors for the next state component of $pe_i$ given $\mathit{tr}$.
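The three cases for the next execution vector can be enumerated with a short Python sketch. It assumes that the tasks of one processing element are given as a dictionary from task names to (offset, period, bcet, wcet) and that execution vectors are dictionaries from task names to numbers; the function name is hypothetical.

```python
from itertools import product


def next_execution_vectors(tasks: dict, prev: dict, k: int) -> list[dict]:
    """Enumerate the execution vectors allowed for the next state component.

    tasks: maps a task name to (offset, period, bcet, wcet)
    prev:  maps a task name to its execution time in the last state
           (all zeros for the empty trace)
    k:     length of the trace so far
    """
    names = sorted(tasks)
    options = []
    for name in names:
        offset, period, bcet, wcet = tasks[name]
        if k < offset:
            options.append([0])                          # first period not started
        elif (k - offset) % period == 0:
            options.append(list(range(bcet, wcet + 1)))  # new period: free choice
        else:
            options.append([prev[name]])                 # inside a period: unchanged
    return [dict(zip(names, combo)) for combo in product(*options)]
```

The number of returned vectors is the product of the numbers of choices per task, so only tasks that start a new period at time k contribute to the branching.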
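For a given $e \in \mathit{EV}_{\mathit{tr}}(i)$, a task $\tau \in T_{pe_i}$ is enabled given $\mathit{tr}$ and $e$, if
• $e(\tau) > 0$, i.e. $n = \mathit{cpn}_{\mathit{tr}}(\tau) > 0$,
• $\mathit{Finished}_{\mathit{tr}}(\tau,n) = \mathit{false}$, i.e. $\tau$ is not yet finished in the current period, and
• every task $\tau'$ on which $\tau$ depends, i.e. $\tau' \prec \tau$, is finished in the nth period, i.e. $\mathit{Finished}_{\mathit{tr}}(\tau',n) = \mathit{true}$.
Let $\mathit{Enable}_{\mathit{tr}}(e,i) \subseteq T_{pe_i}$ be the set of enabled tasks on $pe_i$ given $\mathit{tr}$ and $e$.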
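The enabled task (if any) with the highest priority is selected to execute on the processing element:
$$\mathit{Select}_{\mathit{tr}}(e,i) = \begin{cases} \bot & \text{if } \mathit{Enable}_{\mathit{tr}}(e,i) = \emptyset \\ \tau & \text{if } \tau \text{ is the biggest element in } \mathit{Enable}_{\mathit{tr}}(e,i) \text{ wrt. } >^k_{Sch(pe_i)} \end{cases}$$
where $k = \mathit{length}(\mathit{tr})$ and the priority relations $>_{FP}$ and $>_{RM}$ are extended to the time domain in the trivial way: $>^t_{FP} \,=\, >_{FP}$
and $>^t_{RM} \,=\, >_{RM}$, for $t \in \mathbb{N}$.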
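The enabling conditions and the selection can be sketched in Python under a simplifying assumption: instead of a trace, the sketch takes the execution time already received by each task in its current period. All names are hypothetical, and `None` plays the role of the idle marker.

```python
def enabled_tasks(candidates, need, done, preds):
    """Tasks on one processing element that may execute in the next state.

    candidates: names of the tasks mapped to the processing element
    need:       chosen execution time per task (0 before the first period)
    done:       execution time received so far in the current period
    preds:      maps a task name to the tasks it depends on
    """
    def finished(task):
        return done[task] == need[task]

    return [t for t in candidates
            if need[t] > 0
            and not finished(t)
            and all(finished(p) for p in preds.get(t, ()))]


def select(candidates, need, done, preds, key):
    """Enabled task with the highest priority (smallest key), or None if idle."""
    enabled = enabled_tasks(candidates, need, done, preds)
    return min(enabled, key=key) if enabled else None
```

The `key` argument can be any of the priority orderings; for earliest-deadline-first it would depend on the current time.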
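The set of possible next component states for $pe_i$ is defined by
$$\mathit{Next}_{\mathit{tr}}(i) = \{ (s,e) \mid e \in \mathit{EV}_{\mathit{tr}}(i) \text{ and } s = \mathit{Select}_{\mathit{tr}}(e,i) \}$$
and the set of possible next system states is defined by
$$\mathit{Next}_{\mathit{tr}} = \{ ((s_1,e_1),\ldots,(s_M,e_M)) \mid (s_i,e_i) \in \mathit{Next}_{\mathit{tr}}(i), \text{ for } 1 \leq i \leq M \}$$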
The computation tree for a system is a finitely branching, infinite tree, where the root is the empty trace $\langle\,\rangle$ and the internal
nodes are (labelled by) system states. Furthermore,
• there is an edge from the root to a distinct node for each system state in $\mathit{Next}_{\langle\,\rangle}$, and
• for every internal node $\mathit{node}$ in the tree, let $\mathit{tr}_{\mathit{node}}$ be the trace of system states leading from the root to $\mathit{node}$. For every system
state $\sigma \in \mathit{Next}_{\mathit{tr}_{\mathit{node}}}$ there is an edge from $\mathit{node}$ to a distinct node labelled by $\sigma$.
A run of the system is an inﬁnite sequence of states
ρ = σ1 σ2 σ3 · · ·
occurring on some path starting from the root of the computation tree.
Notice that when the worst and best case execution times are equal for every task in the system, then there is exactly one
run of the system, as the scheduling is deterministic.
We call a system schedulable if for every run, each task ﬁnishes its job in all its periods. We shall now show that this
problem is decidable, and in the next section we shall use Uppaal to solve it.
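For the deterministic case, where best-case and worst-case execution times coincide, the single run can be explored by a small simulation. The following Python sketch assumes fixed-priority scheduling on every processing element and a hypothetical `Task` record; it is an illustration of the model of computation, not the Uppaal-based tool described later.

```python
from typing import NamedTuple, Optional


class Task(NamedTuple):
    name: str
    pe: int            # index of the processing element the task is mapped to
    offset: int
    period: int
    exec_time: int     # bcet == wcet is assumed, so there is exactly one run
    pr: int            # fixed-priority number; smaller means higher priority
    preds: tuple = ()  # names of the tasks that must finish first


def first_deadline_miss(tasks: list[Task], horizon: int) -> Optional[int]:
    """Return the first time point below horizon at which a deadline is missed."""
    need = {t.name: 0 for t in tasks}  # execution time needed in the current period
    done = {t.name: 0 for t in tasks}  # execution time received in the current period
    pes = sorted({t.pe for t in tasks})
    for k in range(horizon):
        # A new period starting at time k is also the deadline of the previous one.
        for t in tasks:
            if k >= t.offset and (k - t.offset) % t.period == 0:
                if done[t.name] != need[t.name]:
                    return k
                need[t.name], done[t.name] = t.exec_time, 0
        finished = {name: done[name] == need[name] for name in need}
        # Every processing element runs its enabled task of highest priority.
        running = []
        for pe in pes:
            enabled = [t for t in tasks
                       if t.pe == pe and need[t.name] > 0 and not finished[t.name]
                       and all(finished[p] for p in t.preds)]
            if enabled:
                running.append(min(enabled, key=lambda t: t.pr).name)
        for name in running:
            done[name] += 1
    return None
```

A result of `None` only states that no deadline was missed before the chosen horizon; how large the horizon must be is the subject of the next section.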
2.5. Decidability
We now consider the problem of determining schedulability of a system and we will give an upper bound on the size of
the part of the computation tree it sufﬁces to consider when checking for schedulability.
We first establish a periodic behavior of the priority relations among tasks. The two relations $>^t_{FP}$ and $>^t_{RM}$ for the
static scheduling principles are trivial as they are, in fact, static. We just need to consider the priorities with regard to the
earliest-deadline-first principle.
To this end, let a situation at time t, sitt , t ∈ N, be deﬁned by the k-tuple:
sitt = (distτ1 (t),distτ2 (t), . . . ,distτk (t))
where k = card(T ) and the numbering is given by pr. For a given time t, the earliest-deadline-ﬁrst priorities can easily be
extracted from sitt (it would actually sufﬁce just to consider situations for tasks that are mapped to processing elements
with an earliest-deadline-ﬁrst discipline).
Consider an arbitrary task τ . It is easy to see that distτ satisﬁes the following properties:
distτ (t) = 1 ⇐⇒ distτ (t + 1) = πτ (1)
distτ (t) > 1 ⇐⇒ distτ (t + 1) = distτ (t) − 1 (2)
Therefore, for every t,c ∈ N:
Fig. 4. Example: deadline missed at time OM + 3 · H , where OM = 2,H = 3.
distτ (oτ + t) = distτ (oτ + c · πτ + t) (3)
Hence, when the time of the offset for τ has passed, distτ becomes periodic with period πτ and, hence, any multiple of πτ
is a period as well.
This generalizes to situations in the following way. Let $O_M = \max\{o_{\tau_1},\ldots,o_{\tau_k}\}$ be the maximal offset in the system, and
$H = \mathit{LCM}\{\pi_{\tau_1},\ldots,\pi_{\tau_k}\}$ be the hyper-period for the tasks, i.e. the least common multiple of all periods of tasks in the system.
Since the period of any task in the system is a divisor of the hyper-period, we have, using (3), the following periodic properties:
$$\mathit{dist}_\tau(O_M + t) = \mathit{dist}_\tau(O_M + c \cdot H + t) \qquad (4)$$
$$\mathit{sit}_{O_M+t} = \mathit{sit}_{O_M + c \cdot H + t} \qquad (5)$$
for every $t,c \in \mathbb{N}$.
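The maximal offset, the hyper-period and the periodicity of situations can be checked numerically with a small Python sketch. Tasks are assumed to be given as (offset, period) pairs, and the pairing of the offsets 0, 10, 27 with the periods 11, 8, 251 used below is an assumption made for the illustration; `math.lcm` requires Python 3.9 or later.

```python
from math import lcm


def dist(offset: int, period: int, t: int) -> int:
    """Distance from time t to the nearest deadline strictly after t."""
    n = max(1, (t - offset) // period + 1)
    return offset + n * period - t


def max_offset_and_hyper_period(tasks: list[tuple[int, int]]) -> tuple[int, int]:
    """Maximal offset and least common multiple of all periods."""
    om = max(offset for offset, _ in tasks)
    h = lcm(*(period for _, period in tasks))
    return om, h


def situation(tasks: list[tuple[int, int]], t: int) -> tuple[int, ...]:
    """Tuple of distances to the nearest deadline of every task at time t."""
    return tuple(dist(offset, period, t) for offset, period in tasks)


tasks = [(0, 11), (10, 8), (27, 251)]
om, h = max_offset_and_hyper_period(tasks)
# Property (5): situations repeat with the hyper-period after the maximal offset.
periodic = all(situation(tasks, om + t) == situation(tasks, om + c * h + t)
               for t in range(50) for c in range(3))
```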
Hence, the inﬁnite sequence of situations
Sit = sit1 sit2 sit3 . . .
has, using (5), the following form:
Sit = sit1 sit2 sit3 . . . sitOM (sitOM+1 sitOM+2 . . . sitOM+H )ω (6)
Therefore, for any time t ≥ OM + H , we meet situations (and scheduling priorities) that we have seen earlier. This does
not mean, however, that it sufﬁces to consider the computation tree to a depth OM + H . The problem is that the execution
times may not have "stabilized" yet at time OM + H as the example in Fig. 4 shows. Even though the utilization U =
1/3+ 1/3+ 2/3 = 4/3 > 1 and the system obviously is not schedulable, the ﬁrst deadline is missed three hyper-periods
after the maximal offset.
Consider a run $\rho = \sigma_1 \sigma_2 \sigma_3 \cdots$ where, for every task, the same execution time is chosen throughout the run, i.e. for $\tau \in T_{pe_i}$
and $j,k \geq o_\tau$ we have that $e_i(\tau) = e'_i(\tau)$, where $\sigma_j = ((s_1,e_1),\ldots,(s_i,e_i),\ldots,(s_M,e_M))$ and $\sigma_k = ((s'_1,e'_1),\ldots,(s'_i,e'_i),\ldots,(s'_M,e'_M))$.
For a given $t \in \mathbb{N}$, let $\rho_t$ be the prefix of $\rho$ up to time $t$ and $\mathit{exec}_\rho(\tau,t)$ be the execution time of $\tau$ in the current period up to
time $t$ in $\rho_t$, i.e.
$$\rho_t = \sigma_1 \sigma_2 \cdots \sigma_t \quad \text{and} \quad \mathit{exec}_\rho(\tau,t) = \mathit{exec}_\sigma(\tau)$$
where $\sigma = \rho_t(\tau, \mathit{cpn}_{\rho_t}(\tau))$.
Consider a time t ≥ oτ . At time t, some tasks may not have started yet and, therefore, τ may have been granted more
executing time in its current period at time t than one hyper-period later at time t + H , i.e.
execρ(τ ,t) ≥ execρ(τ ,t + H) ≥ 0 (7)
where we exploit that some task is executing on a processing element whenever there is an enabled task on that element –
execution time is not wasted.
When a time $t_p \geq O_M$ is reached where for every task $\tau \in T$:
$$\mathit{exec}_\rho(\tau,t_p) = \mathit{exec}_\rho(\tau,t_p + H)$$
then $\rho$ is periodic from time $t_p$, i.e.
$$\rho = \rho_{t_p} (\sigma_{t_p+1} \sigma_{t_p+2} \ldots \sigma_{t_p+H})^\omega$$
An upper bound on $t_p$ is
$$O_M + H \cdot \Big(1 + \sum_{\tau \in T} \mathit{wcet}_\tau\Big) \qquad (8)$$
since, by (7), the worst-case scenario is when only one task gets its execution time decreased (by one) when one hyper-period
has passed. This upper bound (8) can be improved since $\mathit{exec}_\rho(\tau,O_M) = 0$ for every task $\tau$ that starts a new period at time
$O_M$. These tasks satisfy the property $\pi_\tau \mid (O_M - o_\tau)$. Thus, when searching the computation tree to check for schedulability,
it suffices to search to the depth $O_M + H \cdot (1 + \sum_{\tau \in T_X} \mathit{wcet}_\tau)$, where $T_X = \{\tau \in T \mid \pi_\tau \nmid (O_M - o_\tau)\}$. A missed deadline that
occurs deeper in the computation tree will also occur at a depth closer to the root, but possibly on another path.
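The improved search depth can be computed directly from the task parameters, as the following Python sketch shows. Tasks are assumed to be given as (offset, period, wcet) triples and the function name is hypothetical. The example values are inferred from the numbers reported for the three-task example later in this section (worst-case execution times 3 and 4 for the first two tasks, with the value 13 for the third task being a pure assumption that does not influence the result); they are not read from the original figure.

```python
from math import lcm


def search_depth(tasks: list[tuple[int, int, int]]) -> int:
    """Depth O_M + H * (1 + sum of wcet over T_X) for (offset, period, wcet) triples."""
    om = max(offset for offset, _, _ in tasks)
    h = lcm(*(period for _, period, _ in tasks))
    # T_X: tasks that do not start a new period at the maximal offset.
    wcet_sum = sum(wcet for offset, period, wcet in tasks
                   if (om - offset) % period != 0)
    return om + h * (1 + wcet_sum)


depth = search_depth([(0, 11, 3), (10, 8, 4), (27, 251, 13)])
```

With these assumed parameters only the first two tasks belong to the set of tasks not starting a period at the maximal offset, so the factor is 1 + 3 + 4 = 8.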
Let us examine the number of nodes at the two depths $O_M + H$ and $O_M + H \cdot (1 + \sum_{\tau \in T_X} \mathit{wcet}_\tau)$ in the
computation tree. These numbers are the minimum and maximum number of paths in the tree that need examination
in order to check for schedulability. For a given task $\tau$, we let
• $\mathit{periodicChoices}_\tau$ denote the number of non-deterministic choices for execution time in each period for $\tau$, i.e.
$$\mathit{periodicChoices}_\tau = \mathit{wcet}_\tau - \mathit{bcet}_\tau + 1$$
• $\mathit{initialChoices}_\tau$ denote the total number of non-deterministic choices for the execution time of $\tau$ until $O_M$, i.e.
$$\mathit{initialChoices}_\tau = \lceil (O_M - o_\tau)/\pi_\tau \rceil \cdot \mathit{periodicChoices}_\tau$$
• $\mathit{hyperChoices}_\tau$ denote the number of non-deterministic choices for the execution time of $\tau$ in a hyper-period, i.e.
$$\mathit{hyperChoices}_\tau = (H/\pi_\tau) \cdot \mathit{periodicChoices}_\tau$$
Hence the total number of non-deterministic choices for the system in the time interval $[0, O_M + H[$ is
$$\mathit{totalChoices} = \prod_{\tau \in T} (\mathit{initialChoices}_\tau + \mathit{hyperChoices}_\tau)$$
which equals the number of nodes in the computation tree at depth $O_M + H$, i.e. the minimum depth in the computation
tree to check before a system can be deemed schedulable. Unschedulability may be determined earlier.
The total number of non-deterministic choices for the system in the time interval $[0, O_M + H \cdot (1 + \sum_{\tau \in T_X} \mathit{wcet}_\tau)[$ is
$$\mathit{maxChoices} = \prod_{\tau \in T} \Big(\mathit{initialChoices}_\tau + \mathit{hyperChoices}_\tau \cdot \Big(1 + \sum_{\tau' \in T_X} \mathit{wcet}_{\tau'}\Big)\Big)$$
which equals the number of nodes in the computation tree at depth $O_M + H \cdot (1 + \sum_{\tau \in T_X} \mathit{wcet}_\tau)$, i.e. the depth in the
computation tree it suffices to search when checking for schedulability.
A small example showing a high complexity
It is easy to give a small system with a huge number of states in the computation tree up to the depth OM + H . Consider,
for example, the system with three tasks given in Fig. 5. The maximal offset and the hyper-period of this system are:
OM =max{0,10,27} = 27
H =LCM{11,8,251} = 22088
and the number of non-deterministic choices for each task in the initial part, each period and each hyper-period are given
by
Task   Periodic choices     Initial choices     Hyper-choices
       (periodicChoicesτ)   (initialChoicesτ)   (hyperChoicesτ)
τ1     3                    9                   6024
τ2     4                    12                  11044
τ3     13                   0                   1144
Hence the number of nodes at depth $O_M + H = 22115$ is
$$\mathit{totalChoices} = (9 + 6024) \cdot (12 + 11044) \cdot (0 + 1144) \approx 7.6 \cdot 10^{10}$$
and the number of nodes at depth $O_M + H \cdot (1 + \sum_{\tau \in T_X} \mathit{wcet}_\tau) = 176731$ is
$$\mathit{maxChoices} = (9 + 6024 \cdot 8) \cdot (12 + 11044 \cdot 8) \cdot (0 + 1144 \cdot 8) \approx 3.9 \cdot 10^{13}$$
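The numbers of this example can be recomputed with a short Python sketch. Tasks are assumed to be given as (offset, period, periodic choices) triples; the pairing of offsets and periods with the three tasks is inferred from the table, and the function names are hypothetical. `math.lcm` requires Python 3.9 or later.

```python
from math import lcm, prod


def choice_counts(tasks: list[tuple[int, int, int]]) -> list[tuple[int, int]]:
    """Return (initialChoices, hyperChoices) for every task."""
    om = max(offset for offset, _, _ in tasks)
    h = lcm(*(period for _, period, _ in tasks))
    counts = []
    for offset, period, periodic in tasks:
        periods_before_om = -(-(om - offset) // period)  # integer ceiling
        counts.append((periods_before_om * periodic, (h // period) * periodic))
    return counts


def number_of_nodes(counts: list[tuple[int, int]], hyper_periods: int = 1) -> int:
    """Product over all tasks of initial choices plus scaled hyper-choices."""
    return prod(initial + hyper * hyper_periods for initial, hyper in counts)


example = [(0, 11, 3), (10, 8, 4), (27, 251, 13)]
counts = choice_counts(example)
total_choices = number_of_nodes(counts)
max_choices = number_of_nodes(counts, hyper_periods=8)
```

The factor 8 corresponds to the depth 176731 = 27 + 22088 · 8 stated above.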
3. A timed-automata model for MPSoC
Having a decidability result for schedulability, the next question is to get an implementation of the decision algorithm.
One way would be to construct a program directly on the basis of the computation tree and the results from Section 2.5. We
will, however, take another approach as we aim at a more general framework supporting both simulation and verification of
systems, and, therefore, we will build a model that comprises the components of MPSoCs and exhibits the parallel activities
occurring in such systems. We will aim at a model comprising the concepts supported by the ARTS framework [20] and
our approach is to develop a model of systems using a model-checking tool. Since the model of computation for systems is
discrete, we could use a system like SPIN [12]. But the ARTS framework has notions that are naturally modelled by clocks and
timed automata [3], and, therefore, we will use the Uppaal [5,15] system for modelling, verification and simulation. We will
not give introductions to timed automata and the Uppaal system in this paper. For such introductions we refer to [3,5,15].
Fig. 5. Example: small example with a huge state space in the non-periodic part.
Fig. 6. Communication between execution platform and application.
As the current version of Uppaal does not support stopwatches, and we are dealing with preemptive scheduling (where
the running time of a task can be temporarily stopped), we make the execution time in the Uppaal model discrete. The clocks
handling the periodicity and the dynamically updated scheduling criteria still operate continuously. In a SPIN model the
periodicity and the dynamically updated scheduling criteria would be handled in discrete steps. Thus, getting rid of the
clocks comes at a cost. Whether a SPIN model would give more efficient verification remains to be shown by a later
experiment. At this point we have chosen to use Uppaal as this gives a more natural model.
The Uppaal model is not a direct implementation of the model of computation. However, the notions and general structure
introduced are recognizable in the Uppaal model, and we explain the Uppaal model by relating it to the model of computation.
We will give a top-down presentation of the timed-automata model, where a system is described as a parallel composition
of an application and an execution platform:
$$\mathit{System} = \mathit{Application} \parallel \mathit{ExecutionPlatform}$$
$$\mathit{Application} = \parallel_{\tau \in T} \mathit{TA}(\tau)$$
$$\mathit{ExecutionPlatform} = \parallel_{j=1}^{M} \mathit{TA}(pe_j) \parallel \mathit{DynamicPriorities}$$
where ‖ denotes parallel composition of timed automata. Thus, an application consists of a collection of timed automata for
tasks combined in parallel and an execution platform consists of a parallel composition of timed automata for processing
elements combined in parallel with a single timed automaton handling the dynamic priorities. The timed automaton
DynamicPriorities works for all processing elements, having earliest deadline ﬁrst as scheduling discipline. From a model
point-of-view it would be more natural to have one such automaton for each processing element; from a veriﬁcation view-
point, the global timed automaton is preferred in order to reduce the number of clocks needed for this purpose to just one.
The deﬁnitions of TA(τ ),TA(pej) and DynamicPriorities are given in Sections 3.1, 3.2 and 3.2.4, respectively.
The timed automata comprising a system communicate by synchronous communication and shared variables. We give
an overview of the channel communication in Fig. 6. Suppose that the set of tasks {τj1 , . . . ,τjk } = Tpej is executed on the
processing element pej . The timed automata for these k tasks and the processing element pej share four channels readyj ,
runj , preemptj and finishj . The model will be constructed so that all channel events will occur at integer time points only,
and the correctness of the construction relies on that property.
Fig. 7. Simplified template for the task (timed automaton with locations Start, Done, Released, Running, Discrete and MissedDeadline).
3.1. Template for a task
We present the timed automaton TA(τ ) for a task τ ∈ Tpej (tau) with offset o (offset), period π (pi), and best-case and
worst-case execution times bcet (bcet) and wcet (wcet), respectively, in two steps. In this Uppaal model offset and pi are
global arrays with an entry for each task τ . In the same way there are four channel arrays ready, run, preempt and finish
each with an entry for every processing element pe.
Simpliﬁed template for a task
A ﬁrst simpliﬁed template can be seen in Fig. 7 together with a table explaining the identiﬁers. We will focus on timing
and channel synchronization in this part.
A clock cp is used to handle the offset and the periodicity of the task. The timed automaton stays in the Start location until
the offset time has elapsed. In the remaining locations, the task has entered the periodic behavior, or the "error-location"
missedDeadline.
In the Done location, the task has ﬁnished its job in the current period and is awaiting the start (when cp == pi) of the next,
where the transition from Done to Released is taken. This is signalled to the processing element pej using the channel readyj .
In this transition the clock cp and other variables are initialized. In particular, a choice is taken setting e non-deterministically
to a value in {bcet, bcet + 1, ..., wcet}. This corresponds to the choices for the execution vector in the computation tree when
a new period for τ starts. Furthermore, the integer variable cr holds the execution time for τ in the current period and is
initialized to 0.
Fig. 8. Full template for the task (timed automaton with locations Start, Done, Released, Running, Discrete and MissedDeadline).
In the "standard version" of Uppaal clocks cannot be stopped.¹ Since we have preemptive scheduling, we cannot use a
clock to hold the execution time for τ in the current period. We therefore have to introduce the integer variable cr and update
it in discrete steps as described in the following.
The transition from Released to Running takes place when the processing element pej sends a signal on the runj channel
to the task. This transition takes place when the task is enabled and has highest priority among the enabled tasks on pej . On
this transition, the clock x, which is used to record one time unit, is set to 0. In one round from state Running via Discrete
and back to Running the effect is that one time unit has elapsed and that the running time cr is incremented by 1 (as the
previous runj signal is issued at an integer time point where cp < pi). Notice that the committed location is introduced as
Uppaal does not allow disjunctions in guards.
Transitions from Released and Discrete to missedDeadline are taken when a deadline is missed. Furthermore, a transition
from Running to Done is taken when the task has finished its job in the current period (cr == e) and this is signalled to the
processing element pej by a finishj event. Since cr is not a clock we ensure that the transition is taken without delay by
making the finish channels urgent. In the Running location, the task may be preempted when a task on pej with higher
priority becomes enabled. In this case the processing element issues a preemptj signal and the task makes a transition to the
Released state.
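To make the discrete bookkeeping with cp and cr concrete, the following Python sketch steps a single task through unit time
slots. It is a simplified illustration under our own assumptions and not the Uppaal template: the execution time exe is fixed
rather than chosen non-deterministically in each period, the deadline is assumed to coincide with the end of the period, the
committed location Discrete is folded into the step function, and the argument granted stands in for the run and preempt
signals from the processing element. The names Task, step and simulate are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    offset: int
    period: int
    exe: int                 # execution time per period (fixed in this sketch)
    loc: str = "Start"
    cp: int = 0              # plays the role of clock cp
    cr: int = 0              # executed time units in the current period

    def step(self, granted: bool) -> None:
        """Advance one time unit; `granted` says whether the task may run."""
        if self.loc == "MissedDeadline":
            return
        # Release at the start of the slot (offset elapsed or new period).
        if (self.loc == "Start" and self.cp == self.offset) or (
            self.loc == "Done" and self.cp == self.period
        ):
            self.loc, self.cp, self.cr = "Released", 0, 0
        # Run for one unit if scheduled, otherwise stay or become preempted.
        if self.loc in ("Released", "Running"):
            self.loc = "Running" if granted else "Released"
            if granted:
                self.cr += 1
        self.cp += 1
        # End of the slot: the job is finished, or the deadline is missed.
        if self.loc == "Running" and self.cr == self.exe:
            self.loc = "Done"
        elif self.loc in ("Released", "Running") and self.cp == self.period:
            self.loc = "MissedDeadline"


def simulate(task: Task, grants: list) -> list:
    """Return the location reached after each time unit."""
    trace = []
    for granted in grants:
        task.step(granted)
        trace.append(task.loc)
    return trace


if __name__ == "__main__":
    print(simulate(Task(offset=0, period=4, exe=2), [True, True, False, False, True]))
    print(simulate(Task(offset=0, period=6, exe=2), [False] * 6))
```

Because cr is an integer that only advances in slots where the task is granted the processor, preemption needs no stoppable
clock, which is the point of the discretization.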
Full template for a task
The full timed-automaton template for a task can be seen in Fig. 8, and Fig. 9 provides a table with explanations for the
introduced identifiers. The added information concerns dependencies, value transfer in channel communications and locking
mechanisms. We will briefly discuss the main parts.
Dependencies. In order to manage dependencies, two global Boolean matrices depend and origdep are used, where origdep
represents the static task graph ≺, and depend the dependencies at the current time point. Initially origdep and depend are
equal, and if τi ≺ τj and τi has finished its job in the current period at the current time, then the entry for (τi,τj) in depend is
false and we say that that particular dependency is resolved. The procedure setOrigDep updates depend each time a task is
entering the Released location. It is easy to implement this procedure, as the needed information is available in the variables:
origdep, Finished, and offset.
¹ We are currently experimenting with a version of Uppaal supporting stopwatches and verification using over-approximations.
Fig. 9. Explanation of some of the identifiers used in Fig. 8.
Value transfer. Communication between the platform and the application is done over the synchronization channels ready,
run, preempt and finish. Some of this communication requires value transfer in order to identify the task in question (as all
tasks on a given processing element share the same channels). The global array tauid is used so that tauid[j-1] handles the
communication on pej.
Locking mechanisms. In order to avoid some undesired behavior, two locking mechanisms are used. One locking mechanism
– implemented by the array disc and the Boolean function lockT – makes sure that all tasks are able to react (i.e. they are in the
location Running) when a preempt signal can occur. The other locking mechanism – implemented by the array taskFinishing
and the Boolean function lockPE – makes sure that all finish signals are reacted to by the platform before any ready can be
handled. This is done in order not to preempt a task that has actually just finished.
3.2. Timed-automata model for processing elements
The model for a processing element is described by a parallel composition of a controller, a synchronizer and a scheduler:
$$\mathit{TA}(pe_j) = \mathit{Controller}_j \parallel \mathit{Synchronizer}_j \parallel \mathit{Scheduler}_j$$
where the controller coordinates the activities inside the processing element (i.e. with the tasks, the synchronizer and the
scheduler) and it coordinates the interaction between the processing elements, which is needed since dependent tasks may
be mapped to different processing elements. The synchronizer maintains a consistent view of the dynamic dependency
relation between tasks, and the scheduler implements the scheduling discipline.
The communication internally on each processing element and between the different processing elements is shown in
Fig. 10 in terms of communication between pej and pel .
3.2.1. Template for the controller
A timed-automaton template for a controller is given in Fig. 11 and the identiﬁers introduced are explained in Fig. 12. The
main idea is that the controller, in the initial location Idle, can react to reschedule signals from other controllers and to ready
and finish signals from its own tasks. On such events it invokes its synchronizer and scheduler on the synchronize and
schedule channels as needed, and before reentering the Idle location it issues the necessary preempt, run and reschedule
signals.
Fig. 10. Communication internally and between processing elements.
Fig. 11. Template for the controller (timed automaton with locations Idle, APPtoPEComm, Synchronizing, ReadyForScheduling, Scheduling, ExecutingChange, PEtoAPPComm and PEtoPEComm).
Notice that time does not progress in one round from the Idle location and back, as all other locations of the controller
are committed or urgent. The single urgent location is used to capture all finish signals available at that point in time.
Fig. 12. Explanation of some of the identiﬁers used in Fig. 11.
Fig. 13. Template for the synchronizer (timed automaton with locations Idle and Synchronizing).
Each controller maintains arrays describing the tasks that are currently released, enabled and executing, and on the basis
of this information it is not difﬁcult to implement the procedure opdDep, which updates the dynamic dependencies due
to a ﬁnish signal. Notice that the ﬂag rescheduleNeeded is set by the synchronizer and the variable selected is set by the
scheduler.
3.2.2. Template for the synchronizer
The template for the synchronizer is given in Fig. 13, and Fig. 14 gives an explanation of the identifiers introduced.
When receiving a synchronize signal from the controller, the synchronizer updates the dynamic dependencies caused by
the currently released and finished tasks, using the procedures syncReleased and syncFinish, respectively. These procedures,
which maintain the arrays WaitDep, Enabled and Released as well as the flag depCh, are easily implemented. In the transition
from the Synchronizing location to the Idle location, the flag rescheduleNeeded is set when a dependency has changed (i.e.
depCh is true).
3.2.3. Template for the scheduler
The template for the scheduler is shown in Fig. 15, and Fig. 16 provides an explanation for the identifiers introduced. The
procedure tasksReadyOnProc initiates the scheduling decision making by finding a task that is enabled, setting selected
accordingly, and setting aTaskIsReady to true. If no tasks on the processing element are enabled, aTaskIsReady is set to false.
The procedure findHighestPriority finds the enabled task with the highest priority on the processing element based on
the scheduling discipline given in Sch and sets selected to the number of that task. The cases for the static scheduling
disciplines (fixed priority and rate monotonic in our case) are straightforward implementations that are based on the
definitions in Section 2.3.1. The difficult part, concerning earliest-deadline-first scheduling, is dealt with in the next section.
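For the rate-monotonic case, the selection step can be sketched in a few lines of Python. This is an illustration only and not
the procedure used in the Uppaal model: the function name find_highest_priority_rm is hypothetical, tasks are identified by
their index, and ties between tasks with equal periods are broken in favour of the lower index, which is our assumption (the
definitions in Section 2.3.1 are authoritative). Returning None corresponds to the situation where aTaskIsReady is false.

```python
from typing import Optional, Sequence


def find_highest_priority_rm(
    enabled: Sequence[bool], period: Sequence[int]
) -> Optional[int]:
    """Pick the enabled task with the shortest period (rate monotonic).

    Returns the index of the selected task, or None if no task is enabled.
    Ties are broken by the lower task index (an assumption of this sketch).
    """
    candidates = [i for i, is_enabled in enumerate(enabled) if is_enabled]
    if not candidates:
        return None
    return min(candidates, key=lambda i: (period[i], i))


if __name__ == "__main__":
    periods = [4, 6]                                   # two tasks on one processing element
    print(find_highest_priority_rm([True, True], periods))
    print(find_highest_priority_rm([False, True], periods))
    print(find_highest_priority_rm([False, False], periods))
```

A fixed-priority variant would use a static priority table as the sort key instead of the period.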
3.2.4. Template for dynamic priorities
The template for managing dynamically updated priorities is shown in Fig. 17, and Fig. 18 provides an explanation for the
identifiers introduced. This timed automaton works for all processing elements having earliest deadline first as scheduling
discipline.
The construction of this timed automaton is based on the property (6), which describes the cyclic behavior of sit-
uations, where each situation corresponds to an earliest-deadline-ﬁrst priority list for the tasks. It is, of course, only
necessary to consider tasks that are scheduled according to the earliest-deadline-ﬁrst discipline. The self-loop for the location
OffsetPriorities covers the parts until the time of the maximal offset, and the self-loop for the location Priorities covers
the cyclic part of (6). At each transition a reschedule signal is broadcast to all processing elements (i.e. all controllers).
The hard part is the construction of values for the variables OffSteps, Steps, OffPrios and Prios. But these can be
generated by a program traversing the sequence of situations until the end of the first hyper-period. The idea is simple. Start
by finding the longest prefix $sit_1\, sit_2 \cdots sit_{k_0}$ for which the priority list is constant. Then OffPrios[0] contains this priority
list and OffSteps[0] is equal to $k_0$. Proceed by finding the longest prefix $sit_1 \cdots sit_{k_0}\, sit_{k_0+1} \cdots sit_{k_0+k_1}$ for which the priority
list is constant for the subsequence $sit_{k_0+1} \cdots sit_{k_0+k_1}$. Then OffPrios[1] contains that priority list and OffSteps[1] is $k_1$.
This process stops when the situation for the maximal offset is reached, and then a similar process for Steps and Prios can
start.
Fig. 14. Explanations for the identifiers used in Fig. 13.
Fig. 15. Template for the scheduler (timed automaton with locations Idle and Scheduling).
Fig. 16. Explanation for some of the identifiers used in Fig. 15.
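The prefix construction just described amounts to a run-length encoding of the sequence of priority lists. A minimal Python
sketch is given below. It assumes that the situations are already available as a list with one priority list (a tuple of task
numbers) per situation, and that n_offset is the number of situations before the maximal offset is reached; both the input
format and the names run_lengths and build_tables are hypothetical and not taken from the tool.

```python
from itertools import groupby
from typing import List, Sequence, Tuple

PrioList = Tuple[int, ...]


def run_lengths(prio_lists: Sequence[PrioList]) -> Tuple[List[int], List[PrioList]]:
    """Collapse consecutive equal priority lists into (steps, prios)."""
    steps: List[int] = []
    prios: List[PrioList] = []
    for prio, run in groupby(prio_lists):
        prios.append(prio)             # the constant priority list of this run
        steps.append(len(list(run)))   # how many situations the run covers
    return steps, prios


def build_tables(situations: Sequence[PrioList], n_offset: int):
    """Split at the maximal offset and encode both parts separately."""
    off_steps, off_prios = run_lengths(situations[:n_offset])
    steps, prios = run_lengths(situations[n_offset:])
    return off_steps, off_prios, steps, prios


if __name__ == "__main__":
    sits = [(1, 2), (1, 2), (2, 1), (2, 1), (2, 1), (1, 2), (1, 2)]
    print(build_tables(sits, n_offset=3))
```

Encoding the offset part and the cyclic part separately mirrors the two locations OffsetPriorities and Priorities of the
automaton.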
4. Veriﬁcation of MPSoC Using UPPAAL
The representation of a system model in the Uppaal system can be used for simulation as well as for veriﬁcation. We
will here focus on the veriﬁcation aspect for the schedulability problem only, where we verify whether every task meets its
deadline in every period. This is not the case if there is some state on some path where the Boolean variable missedDeadline
(see Fig. 8) is true. Hence, we issue the Uppaal query:
E<> missedDeadline,
and every task meets every deadline iff this property is not satisfied.
Fig. 17. Template for maintaining dynamic priorities (timed automaton with locations OffsetPriorities and Priorities).
Fig. 18. Explanation of some of the identifiers used in Fig. 17.
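The query asks whether some reachable state satisfies missedDeadline. On an explicit, finite state graph this is plain
reachability, which the following Python sketch illustrates with a breadth-first search. It is only meant to convey the meaning
of the query: Uppaal works on timed automata with symbolic state representations, and the toy graph, the state names and the
function exists_eventually are invented for this illustration.

```python
from collections import deque
from typing import Callable, Hashable, Iterable


def exists_eventually(
    initial: Hashable,
    successors: Callable[[Hashable], Iterable[Hashable]],
    prop: Callable[[Hashable], bool],
) -> bool:
    """Return True iff some state reachable from `initial` satisfies `prop`."""
    seen = {initial}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if prop(state):
            return True
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


if __name__ == "__main__":
    # Toy state graph: one branch leads to a state where a deadline is missed.
    graph = {"s0": ["s1", "s2"], "s1": ["s0"], "s2": ["miss"], "miss": []}
    missed = exists_eventually("s0", lambda s: graph[s], lambda s: s == "miss")
    print("schedulable:", not missed)
```

The system is deemed schedulable exactly when the search finds no such state.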
The Uppaal system can generate a diagnostic trace for such queries and this is a very useful facility for understanding the
scenario where a deadline is missed. It should be noted that this trace is expressed in terms of the timed-automata model
and not in the context of the overall MPSoC model in Section 2.
We have tested the Uppaal model on a collection of examples found in the literature and the veriﬁcations gave correct
results.
A tool has been developed for the generation of a Uppaal model on the basis of a simple description of a system model.
The Uppaal model is generated on the basis of a Java program. This program contains constructors for the components of a
system, i.e. for tasks, applications, processors and platforms. Fig. 20 contains the Java program corresponding to the example
in Fig. 19.
The system consists of two processors with two tasks on each. Rate-monotonic scheduling is used on both processors.
The system has an inter-processor dependency from τ2 to τ3. The tasks have the following timing constraints, where the
best-case execution time equals the worst-case execution time for every task:
Task   Processor   Execution time   Period   Offset
τ1     pe1         2                4        0
τ2     pe1         2                6        0
τ3     pe2         2                6        0
τ4     pe2         3                6        4
This system has an inter-processor dependency and an offset on τ4.
The system is not schedulable and the following trace is extracted by the tool from the diagnostic Uppaal trace:
Time    1   5
---------------
Task: 1 1100110
Task: 2 0011001
Task: 3 000000x
Task: 4 ----111
The trace covers the first seven time units and the missed deadline is marked with x. Furthermore, the symbol - marks
that the task has not yet started, 0 marks that the task is not executing and 1 marks that the task is executing. Hence, task τ3
misses its deadline after 6 time units.
It is easy to explore various designs using this tool and, for example, changing the real-time operating system on pe2 to use
earliest deadline first will give a schedulable system. The only change needed in the Java code concerns the line initializing
p2 as described in the comment. Another observation, which is easily veriﬁed, is that if the offset for τ4 is set to 0, the system
is schedulable in both the case where rate-monotonic scheduling is used and when earliest deadline ﬁrst is used. Hence, the
case where the offset is zero for all tasks, which is the classical worst-case scenario for single processor systems, does not
necessarily give the worst-case scenario in the multiprocessor case.
Fig. 19. Example of a multiprocessor system with a dependency.
Fig. 20. Java code for the system in Fig. 19.
Consider the example in Fig. 21, which shows a multiprocessor anomaly where, contrary to usual single-processor
scheduling theory, the system is not schedulable using dynamically updated scheduling criteria, but it is schedulable using
statically updated scheduling criteria.
This system is not schedulable using earliest-deadline-ﬁrst scheduling disciplines. However, when using ﬁxed-priority
scheduling with pr(τ1) = 2, pr(τ2) = 1 and pr(τ3) = 3, the system is schedulable. This is easily veriﬁed using the tool.
Fig. 21. Example: Rate-monotonic vs. earliest-deadline-ﬁrst scheduling.
We will now examine the system given in Fig. 5. In Section 2.5 it was shown that the minimum number of nodes in the
computation tree will be $7.6 \cdot 10^{10}$ at the depth $O_M + H$ (i.e. t = 22115). The maximal depth that is needed when checking
for schedulability is $O_M + 8 \cdot H$ (i.e. t = 176731) with $3.9 \cdot 10^{13}$ nodes. The schedulability was verified using the Uppaal
model. The verification used 3.1 GB of memory and took less than 11 minutes on a 1.8 GHz AMD CPU with 32 GB of RAM.
If the system is changed slightly by adding an extra choice for execution time to τ3, i.e. $\mathit{wcet}_{\tau_3} = 14$, the number of nodes
at depth $O_M + H$ will be $8.2 \cdot 10^{10}$ and the number of nodes at the depth $O_M + 8 \cdot H$ will be $4.2 \cdot 10^{13}$. When attempting
verification of this revised system on the same CPU, the verification aborts after 19 min with an Out of memory error message
after having used 3.4 GB of memory.
4.1. Further experimental results
In order to make verification on real-life systems we have experimented [18] with a specification of a smart-phone device
[29] consisting of 103 tasks on 4 processing elements. By only using worst-case execution times and, therefore, only having
one run in the computation tree, we could verify the system using a specialized version of Uppaal where no history is saved
and therefore less memory is used. This indicates that the model can also be used in analysis of larger real-life systems.
A more intuitive implementation of the MPSoC model would be using stopwatch automata. However, the absence of
stopwatches in the standard Uppaal version moved our focus to that of discretization of the running time. A version of
Uppaal under development introducing stopwatches – using over-approximations for verification – has however been made
available to us, and we have made some initial experiments with this. The stopwatch model eliminates the need for the
integer variable cr and the clock x and introduces instead a stopwatch. Furthermore, the locking mechanism used for
the discretization is eliminated. The experiments with verification of the stopwatch model have given encouraging results,
as the example that gave an out of memory error with the standard version can be verified with the stopwatch version. All
current experiments have provided exact results. The tendency across all experiments so far is that memory consumption as
well as verification times are reduced by approximately 40% in comparison to the original model given here.
5. Summary
We have developed a discrete model of computation capturing the synchronization and timing behavior of an embedded
application executing on multiple interconnected processors. Tasks mapped to a processor are executed according to a
particular scheduling policy. Analysis of such systems is a major challenge due to the freedom of interrelated choices
concerning the application level, the conﬁguration of the execution platform and the mapping of the application onto this
platform, which often leads to timing anomalies. We have characterized the maximal size of the computation tree that it
suffices to consider when checking for schedulability.
The computational model provides a basis for formal analysis of systems and has been expressed as timed automata
and implemented in the Uppaal system. This allows for veriﬁcation using the Uppaal model checker. We have presented
experimental results on rather small systems with high complexity, primarily due to differences between best-case and
worst-case execution times. By considering worst-case execution times only, where the system becomes deterministic, we
have shown that it is possible to verify a smart-phone application consisting of 103 tasks executing on a platform with four
processors.
We are currently extending our model to introduce costs other than timing, being able to formally verify properties such
as memory and power usage. Furthermore, initial experiments with a stopwatch model have given interesting results, as the
analyses were more efficient than those using the "standard Uppaal tool" and they were precise on the examples conducted
despite the fact that verification is based on over-approximations.
Acknowledgements
We would like to thank Jens Sten Ellebæk Nielsen and Kristian Stålø Knudsen for contribution to the simplification of the
model and working extensively on the frontend tool. Furthermore, we are grateful for comments from Martin Fränzle, Kim
G. Larsen and Tolga Ovatman on this work. Finally, we are grateful to Jacob Illum and Alexandre David for providing us with
versions of Uppaal under development, in order for us to conduct initial experiments with veriﬁcation of larger real-life
systems as well as models with stopwatches.
The work presented in this paper has been supported by ARTIST2 (IST-004527), MoDES (Danish Research Council 2106-
05-0022) and the Danish National Advanced Technology Foundation under project DaNES.
References
[1] Y. Abdeddaïm, E. Asarin, O. Maler, Scheduling with timed automata, Theoret. Comput. Sci., 354 (2) (2006) 272–300.
[2] Y. Abdeddaïm, O. Maler, Job-shop scheduling using timed automata, Proceedings of the 13th International Conference on Computer Aided Veriﬁcation
(CAV’01), vol. 2102, Springer, 2001.
[3] R. Alur, D.L. Dill, A theory of timed automata, Theoret. Comput. Sci. 126 (2) (1994) 183–235.
[4] T. Amnell, E. Fersman, L. Mokrushin, P. Pettersson, W. Yi, Times – A tool for modelling and implementation of embedded systems, Lecture Notes in
Comput. Sci. 2280 (2002) 460–464.
[5] G. Behrmann, A. David, K.G. Larsen, A tutorial on uppaal, Lecture Notes in Comput. Sci. 3185 (2004) 200–236.
[6] E. Fersman, P. Pettersson, W. Yi, Timed automata with asynchronous processes: schedulability and decidability, Lecture Notes in Comput. Sci. 2280
(2002) 67–82.
[7] R.L. Graham, Bounds on multiprocessor timing anomalies, SIAM J. Appl. Math. 17 (2) (1969) 416–429.
[8] L. Halkjaer, K. Haervi, A. Ingolfsdottir, Veriﬁcation of the LegOS scheduler using uppaal, Electron. Notes Theor. Comput. Sci. 39 (3) (2000) 273–292.
[9] P. Hastrono, S. Klaus, S.A. Huss, An integrated SystemC framework for real-time scheduling assessments on system level, in: Proceedings of the 25th
IEEE International Real-Time Systems Symposium (RTSS’04), 2004.
[10] C. Haubelt, J. Falk, J. Keinert, T. Schlichter, M. Streubühr, A. Deyhle, A. Hadert, J. Teich, A SystemC-based design methodology for digital signal processing
systems, EURASIP J. Embedded Syst. 2007 (1) (2007) 15.
[11] F. Hessel, V.M. da Rosa, I.M. Reis, R. Planner, C.A.M. Marcon, A.A. Susin, Abstract RTOS modeling for embedded systems, Proceedings of the 15th IEEE
International Workshop on Rapid System Prototyping (RSP’04), IEEE Computer Society, Washington, DC, USA, 2004.
[12] G.J. Holzmann, The SPIN Model Checker: Primer and Reference Manual, Addison-Wesley, 2003.
[13] T. Kempf, M. Doerper, R. Leupers, G. Ascheid, H. Meyr, T. Kogel, B. Vanthournout, A modular simulation framework for spatial and temporal task mapping
onto multi-processor SoC platforms, Proceedings of the Conference on Design, Automation and Test in Europe (DATE’05), vol. 2, IEEE Computer Society,
Washington, DC, USA, 2005.
[14] V. Kianzad, S.S. Bhattacharyya, CHARMED: a multi-objective co-synthesis framework for multi-mode embedded systems, in: Proceedings of the 15th
IEEE International Conference on Application-Speciﬁc Systems, Architectures and Processors (ASAP’04), 2004.
[15] K.G. Larsen, P. Pettersson, W. Yi, Uppaal in a nutshell, Int. J. Software Tools Technol. Transf. 1 (1–2) (1997) 134–152.
[16] T.G.S. Liao, G. Martin, S. Swan, System Design with systemC, Springer, 2002.
[17] M. Loghi, F. Angiolini, D. Bertozzi, L. Benini, R. Zafalon, Analyzing on-chip communication in a MPSoC environment, in: Proceedings of the Conference
on Design, Automation and Test in Europe (DATE’04), IEEE, 2004.
[18] J. Madsen, M.R. Hansen, K.S. Knudsen, J.E. Nielsen, A.W. Brekling, System-level veriﬁcation of multi-core embedded systems using timed-automata,
in: Proceedings of the 17th World Congress International Federation of Automatic Control Seoul, Korea, July 6–11, 2008, pp. 9302–9307.
[19] J. Madsen, S. Mahadevan, K. Virk, Network-centric system-level model for multiprocessor SoC simulation, Interconnect-Centric Design for Advanced
SoC and NoC, Kluwer Academic, 2004, pp. 341–365.
[20] J. Madsen, K. Virk, M.J. Gonzalez, A SystemC-based abstract real-time operating system model for multiprocessor system-on-chip, in: Multiprocessor
System-on-Chip, Morgan Kaufmann, 2004, pp. 283–312.
[21] S. Mahadevan, K. Virk, J. Madsen, ARTS: a SystemC-based framework for multiprocessor systems-on-chip modelling, Des. Automat. Embedded Syst.
11 (4) (2007) 285–311.
[22] R.L. Moigne, O. Pasquier, J.-P. Calvez, A generic RTOS model for real-time systems simulation with SystemC, in: Proceedings of the Conference on
Design, Automation and Test in Europe (DATE’04), vol. 3, IEEE Computer Society, 2004.
[23] J.F. Muth, G.L. Thompson, Industrial Scheduling, Prentice-Hall International, Englewood Cliffs, NJ, USA, 1963.
[24] A.D. Pimentel, C. Erbas, S. Polstra, A systematic approach to exploring embedded system architectures at multiple abstraction levels, IEEE Trans.
Comput. 55 (2) (2006) 99–112.
[25] A.D. Pimentel, L.O. Hertzberger, P. Lieverse, P. van der Wolf, E.F. Deprettere, Exploring embedded-systems architectures with Artemis, IEEE Comput. 34
(11) (2001) 57–63.
[26] P. Pop, P. Eles, Z. Peng, Bus access optimization for distributed embedded systems based on schedulability analysis, in: Proceedings of the Conference
on Design, Automation and Test in Europe (DATE’00), ACM, New York, NY, USA, 2000.
[27] J. Reineke, B. Wachter, S. Thesing, R. Wilhelm, I. Polian, J. Eisinger, B. Becker, A Deﬁnition and classiﬁcation of timing anomalies, in: Proceedings of 6th
International Workshop on Worst-Case Execution Time (WCET) Analysis, 2006.
[28] K. Richter, M. Jersak, R. Ernst, A formal approach to MpSoC performance veriﬁcation, IEEE Comput. 36 (4) (2003) 60–67.
[29] M.T. Schmitz, B.M. Al-Hashimi, P. Eles, System-Level Design Techniques for Energy-Efﬁcient Embedded Systems, Kluwer Academic Publishers, Norwell,
MA, USA, 2004.
[30] L. Thiele, S. Chakraborty, M. Naedele, Real-time calculus for scheduling hard real-time systems, in: Proceedings of IEEE International Symposium on
Circuits and Systems (ISCAS 2000), vol. 4, Geneva, Switzerland, 2000.
[31] S. Tripakis, K. Altisen, Implementation of timed automata: an issue of semantics or modeling?, Lecture Notes in Comput. Sci. 3829 (2005) 273–288.
