Hard real-time guarantee of automotive applications during mode changes by DZIURZANSKI, Piotr et al.
Hard Real-time Guarantee of Automotive Applications
during Mode Changes
Piotr Dziurzanski
Dept. of Computer Science, University of York
Deramore Lane, Heslington, York
YO10 5GH, UK
PD678@york.ac.uk
Amit Kumar Singh
Dept. of Computer Science, University of York
Deramore Lane, Heslington, York
YO10 5GH, UK
Amit.Singh@york.ac.uk
Leandro Soares Indrusiak
Dept. of Computer Science, University of York
Deramore Lane, Heslington, York
YO10 5GH, UK
Leandro.Indrusiak@york.ac.uk
Björn Saballus
Robert Bosch GmbH Corporate Sector
Research and Advance
Engineering - Software (CR/AEA2)
70442 Stuttgart, Germany
bjoern.saballus@de.bosch.com
ABSTRACT
This paper presents a resource allocation approach that
benefits from the modal nature of the hard real-time systems under
consideration. The modal nature determines the operational
modes of the systems. Thanks to the modal nature of these
systems, it is possible to decrease the number of active cores
consuming high power in certain modes, leading to considerable
energy savings while still not violating any of the timing
constraints. The proposed approach consists of both off-line
and on-line steps. The more computationally intensive steps
are performed off-line, whereas only detection of the current
mode and mode switching are performed on-line. In the
presented automotive use case, the number of required cores
has been decreased by up to 75% in a particular mode and only a
relatively low amount of data has to be migrated during the mode
change.
Categories and Subject Descriptors
C.3 [Special-Purpose and Application-Based Systems]:
Real-time and embedded systems; B.8.2 [Performance and
Reliability]: Performance Analysis and Design Aids
Keywords
hard real-time systems, modal systems, task allocation, task
migration
1. INTRODUCTION
Due to the growing number of electronic control units
(ECUs) in contemporary cars, sometimes reaching even 100,
the automotive industry is gradually abandoning its paradigm
of using a separate unit for each functionality [18].
The requirement of placing a number of ever more sophisticated
functionalities in one chip has resulted in the appearance
of multi-core ECUs [34]. The AUTOSAR (AUTomotive
Open System ARchitecture) standard [1] assumes a static
(i.e. compile-time) mapping of atomic software components,
named runnables, into cores since it is less complex and more
predictable than dynamic resource allocation [16].
Due to the hard real-time constraint in automotive sys-
tems, the cores have to execute all the tasks on time even for
their worst-case execution behavior, where they take worst-
case execution time (WCET), which is usually much higher
than the average execution time [31]. One possibility of de-
creasing the difference between the worst and average task
execution times stems from the modal nature of such appli-
cations, i.e. from the fact that they can behave in a limited,
known at design-time, number of ways, named modes. If
each mode is analysed independently, the average execution
time may be closer to the WCET determined for that mode
[27]. In [21], six modes have been identified in a 4 cylinder
gasoline torque based system, for example Cranking, Idle
and Wide Open Throttle. It has been stressed there that the
execution times of particular runnables differ significantly
between the modes of an ECU, and thus applying a different
mapping for each operating mode may be beneficial. In this way,
fewer cores may be needed than in a corresponding system
design that does not consider operating modes.
However, introducing different mappings for different modes
imposes significant design complications, which have not been
analysed in [21].
The contexts of runnables that are executed on different
cores in different modes have to be migrated from one core
to another, setting additional requirements for the available
communication bandwidth. The process of mode switch-
ing usually incurs overhead (both in execution time and
energy), which is to be taken into account at run-time to
decide whether to switch to a different mode or not. In hard
real-time systems, it is essential to satisfy all the timing
constraints even during the mode switching process, i.e. the
migration time of tasks must be time bounded [12]. There-
fore, the worst case switching time has to be assumed to
provide the timing guarantees.
During the migration process, the taskset schedulability
must not be violated. To guarantee this property, we pro-
pose to treat a migration process as any other asynchronous
process in schedulability analysis, i.e. to use so-called pe-
riodic servers, which are periodic tasks executing aperiodic
jobs. When a periodic server is executed, it processes pend-
ing task migration. If there is no pending migration, the
server simply holds its capacity. To reduce the migration
time, a recursive greedy algorithm for reducing the amount
of data transferred during a mode change is proposed. It
aims to decrease the number of periodic server instances
used during a single mode switching. The proposed ap-
proach can be applied to any hard real-time systems, where
different operating modes can be identified, and automotive
systems in particular.
As an example, throughout this paper we will analyse an
engine ECU code named DemoCar. We will identify its op-
erating modes and apply clustering to decrease their number
and to eliminate task migration between neighbouring mode
pairs (i.e. two modes of which at least one can be
switched directly to the other) if the mode change
is to be finished rapidly. The mappings for each mode will
be determined using a genetic algorithm. This algorithm
applies two optimizing criteria: runnable schedulability in
terms of a number of deadline violations and migration cost
in terms of the context length of the transmitted runnables.
The typical schedulability analysis is used to determine the
necessary network bandwidth to guarantee that the mode
switching migration finishes in the required time.
In the next section, the state-of-the-art solutions are de-
scribed followed by the proposed approach and a discussion
on providing performance guarantees during mode changes.
2. RELATED WORKS
Systems with distinguishable operating modes are increasingly
popular in research. A number of research activities
aim at developing design-time (off-line) heuristics to reduce
the number of operating points, since the number of possible
scenarios is typically prohibitively high [14]. This Design
Space Exploration (DSE) process can be carried out
using classic heuristic techniques (including genetic algorithms
[13], [14], [26], tabu search [25], simulated annealing
[35], particle swarm optimization [20], etc.), or with techniques
for pruning the design space [22], performing statistical
analysis to identify potentially beneficial operating
points, or using a priori knowledge of the target platform [4].
Then, during run-time of that system, a run-time manager
(RTM) chooses an appropriate operating point according
to the available platform resources by solving an instance of the
multi-dimensional multi-choice knapsack problem (MMKP).
Although MMKP belongs to the NP-hard complexity class,
there exist a number of light-weight greedy heuristics that
facilitate finding a quasi-optimal mapping during run-time
[13]. Alternatively, in [21], the current mode can be determined
out of an explicitly given set by observing
some variables of the model. In our work, the current mode
is determined in a similar way.
Two different mapping approaches are proposed in [28].
In the first one, named global static power-aware mapping,
each task is assigned to one particular processing element
independently from the actual scenario. This approach re-
duces the amount of memory required for storing the con-
figurations and increases the efficiency of run-time manage-
ment. However, it results in increased power consumption
in comparison with the second approach, dynamic power-
aware scenario-mapping, where this assumption is relaxed
and different mappings for scenarios are stored. These ap-
proaches do not allow task migration - once a task is assigned
to a processing element, it remains there until finishing its
computation. In contrast, Benini et al. [5] allowed tasks
to migrate between processing elements when the envisaged
performance increase is higher than the precomputed mi-
gration cost. This analysis is performed at each instance of
configuration change.
In order to analyse the worst case switching time between
two modes, it is helpful to show the possible modes and
transitions between them in a formal way, using for example
Finite State Machines (FSMs), as proposed in [27]. In
this way it is possible to enumerate all allowed modes and
transitions, and to check the cost of mode switchings. In [9],
the average switching time overhead for an H.264 decoder has
been measured to be equal to 0.2% of the total system time.
This small value results from the low number of existing
modes, obtained by clustering, and thus from relatively
rare switches. In hard real-time systems, such a reduction of the
number of modes by clustering is even more crucial, and thus it is
incorporated in the proposed design flow. In [30], the authors
suggest mapping as many tasks as possible to the same core in
various modes to avoid moving data or code items
between different resources when switching between modes.
In the proposed approach, we use a genetic algorithm to
minimize the amount of data to be migrated.
To perform a schedulability analysis during mode changes,
the data migration work is performed during time slots al-
located to a periodic server. There exist different kinds of
such servers, including polling servers, sporadic servers and
deferrable servers, with different replenishment policies for the server
execution time [7]. Despite these differences, their period
and maximum execution time during one period are selected
in a way that the chosen end-to-end scheduling test proves
that no deadline is violated. To decrease the timing of best-
effort (i.e. migration) task execution, the best-effort band-
width server includes a slack reclaiming procedure and an
algorithm for determining appropriate server parameters [3].
An example of a direct application of synchronized deferrable
servers for multi-core systems has been demonstrated in
[33], but its authors assumed the migration cost to be ei-
ther negligible, or added to the worst-case execution time of
each task, which is difficult to apply in systems with a
network architecture prone to contention. In the proposed
approach, more realistic migration time is evaluated, tak-
ing into account network parameters and interference from
other flows.
In [6], it was experimentally shown that even a total freez-
ing task migration strategy, i.e. where the migrated task is
stalled while all its code and data are transferred through
links to the target core, can be used in a NoC-based envi-
ronment and still improve the fulfilment of task deadlines
in soft real-time systems. To guarantee hard timing constraints,
the freeze time should be bounded and as short as possible.
One of the possible techniques is a precopy strategy, where
code and data of some tasks are copied before the actual
switching. However, this technique is more complicated than
total freezing and has higher migration time, as some data
portions may be required to be copied more than once due
to their modifications [15]. Storing task code in a few cores
and transferring only the necessary data is another possibil-
ity. However, in doing so, the storage overhead at each core
can increase by a large amount.
A method to guarantee hard real-time behaviour for task migration
is proposed in [17]. However, a costly schedulability analysis
is performed during run-time. No experiments supporting
the proposed approach are provided, but one may predict
that the run-time overhead of such a dynamic analysis could be considerable.
The research described in [21] is the closest to the ap-
proach proposed in this paper. Its authors have identified
mode transition points in an engine management system,
and shown that a load distribution by mode-dependent task
allocation is better balanced in comparison with a static task
allocation. The performance has been evaluated by simula-
tion, but, contrary to our approach, the task migration costs
have not been considered.
Figure 1: Flow graph of the DemoCar example; the runnables belonging to the same task are highlighted
with the same colour, labels are not highlighted. Some flows are drawn in different colours for readability
From the literature survey it follows that designing real-time
systems with distinguishable operating modes has been
mainly limited to soft timing constraints. To the best of the
authors’ knowledge, there is no proposal of any method guaranteeing
that no hard deadline is violated during task migrations.
In particular, schedulability analysis has not been applied
to check the feasibility of the task migration process or to determine
the worst case switching time between two operating
modes, except for the position paper [17].
3. SYSTEM MODEL
3.1 Application model
In this work we assume an application model consistent
with the AUTOSAR standard [1]. A taskset Γ is comprised
of an arbitrary number of periodic runnables, Γ =
{τ1, τ2, τ3, . . .}, grouped in tasks with hard real-time constraints.
The j-th occurrence (j-th job) of runnable τi is
denoted by τi,j. The taskset is known in advance, including
the WCET of each runnable, Ci, its period Ti, priority
Pi and its relative deadline Di, which is equal to its period. Runnables
are atomic schedulable units communicating with each other
through so-called labels, N = {ν1, . . . , νr}, which are memory
locations of a particular length. The order of read and
write operations on labels determines the runnable dependencies,
as the write operation to a particular label should be
completed before the label is read. Deadlines for the mode change
time between each neighbouring pair of modes are also
provided. We assume that the labels are stored in the same
node as the runnable that reads them. If runnables
mapped to different cores read from the same
label, its content is replicated to all the reading nodes
and the writer must update the label value at all these
locations. This means that the writer is aware of all its readers
and knows their locations in all the possible modes.
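The label placement rule above can be illustrated with a minimal sketch. The class and function names (`Runnable`, `label_locations`, `update_targets`) and the numeric WCET, period and priority values are assumptions made for illustration only; the runnable and label names are taken from the DemoCar flow graph.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Runnable:
    name: str
    wcet_us: int        # C_i, worst-case execution time
    period_us: int      # T_i; the relative deadline D_i equals the period
    priority: int       # P_i
    reads: tuple = ()   # names of the labels this runnable reads
    writes: tuple = ()  # names of the labels this runnable writes

def label_locations(runnables, mapping):
    """Return {label: set of cores holding a copy} for one mode.

    A label is stored on every core that hosts one of its readers.
    """
    locations = {}
    for r in runnables:
        for label in r.reads:
            locations.setdefault(label, set()).add(mapping[r.name])
    return locations

def update_targets(writer, runnables, mapping):
    """Cores that `writer` has to update for each label it writes."""
    locations = label_locations(runnables, mapping)
    return {label: locations.get(label, set()) for label in writer.writes}

# Hypothetical timing values; only the names come from DemoCar.
runnables = [
    Runnable("OperatingModeSWC", 100, 10_000, 1, writes=("FuelEnabled",)),
    Runnable("TransFuelMassSWC", 100, 10_000, 2, reads=("FuelEnabled",)),
    Runnable("ThrottleChangeSWC", 100, 10_000, 3, reads=("FuelEnabled",)),
]
mapping = {"OperatingModeSWC": 0, "TransFuelMassSWC": 1, "ThrottleChangeSWC": 2}
targets = update_targets(runnables[0], runnables, mapping)
```

With the two readers placed on different cores, the writer has to keep two copies of the label coherent; mapping both readers to the same core would reduce this to a single copy.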
Example 1. Throughout this paper, we consider a lightweight
engine control system named DemoCar as an example
application. The flow graph of this application is depicted
in Fig. 1. It consists of 18 runnables and 61 labels.
All runnables are periodic: 8 runnables (highlighted
in green) are to be executed every 10ms, the period of
6 runnables (red, blue and yellow) equals 5ms, two (violet)
runnables are executed every 20ms and the period of two
(orange) runnables is 100ms. In Figure 2 (upper part), the 11
identified modes of this application are presented. These
modes have been identified by inspecting the code of the
runnable named OperatingModeSWC, which computes the values
of the transition and output functions of the Finite State
Machine steering this engine. For example, label FuelEnabled
is read by two runnables: TransFuelMassSWC and
ThrottleChangeSWC. If these runnables are mapped to different
cores, the label is to be replicated and kept in both
cores to which these runnables are mapped. It is the role
of the writer, OperatingModeSWC, to update these values
coherently without violating any timing constraints.
Figure 3: An example many-core system platform (a 3x2 mesh of cores π0,0 to π2,1, each attached to a router ψ0,0 to ψ2,1)
3.2 Platform model
The hardware platform assumed in this paper is a mesh
Network on Chip (NoC) with a certain number of cores
π ∈ Π and routers ψ ∈ Ψ, as shown in Figure 3. Each
link is modelled as a single resource, so, for example, to
transfer a portion of data from π0,1 to the sink π2,0,
the following resources have to be allocated simultaneously: π0,1−ψ0,1,
ψ0,1−ψ1,1, ψ1,1−ψ2,1, ψ2,1−ψ2,0, ψ2,0−π2,0. In every mode,
each runnable is mapped to one core and a label is stored
in the local memories of the cores requesting that label. Data
transfer overhead is taken into consideration, assuming a constant
time for transferring a single flit (Flow control digIT,
a piece of a network packet whose length usually equals
the data width of a single link) between two neighbouring
cores if no contention is present. The timing constants for
packet latencies while traversing one router and one link are
denoted as dR and dL, respectively. The priority of data
transfer packets is assumed to be equal to the priority of
the runnable sending them.
Figure 2: Finite State Machine describing mode changes in DemoCar use case: before (upper part) and after
(lower part) the clustering step. Modes before clustering: PowerUp, PowerDown, WaitForPowerDownDelay,
Stalled, Cranking, IdleOpenLoop, IdleClosedLoop, Drive, WaitForOverrun, Overrun, WaitForAFRFeedback;
modes after clustering: PowerDown, PowerUp, Cluster_1
Figure 4: Steps of dynamic resource allocation method benefiting from modal nature of applications.
Off-line: mode detection / clustering (optional); spanning tree construction; static mapping for initial mode;
static mapping for non-initial modes minimizing amount of data to be migrated; schedulability analysis for
taskset during mode changes (bandwidth determining). On-line: detection of current mode; mapping switching
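The link-resource view of the platform model can be sketched as follows. The routing policy is an assumption: the text does not name one, but the example route from π0,1 to π2,0 is consistent with dimension-ordered (X first, then Y) routing, which is what the hypothetical `route_links` function below implements.

```python
def route_links(src, dst):
    """Links allocated for a transfer from core src=(x, y) to core dst=(x, y).

    Assumes X-then-Y routing; each link is returned as a pair of endpoints,
    where an endpoint is ("core", (x, y)) or ("router", (x, y)).
    """
    links = [(("core", src), ("router", src))]   # injection link
    x, y = src
    while x != dst[0]:                           # travel along the X dimension
        nx = x + (1 if dst[0] > x else -1)
        links.append((("router", (x, y)), ("router", (nx, y))))
        x = nx
    while y != dst[1]:                           # then along the Y dimension
        ny = y + (1 if dst[1] > y else -1)
        links.append((("router", (x, y)), ("router", (x, ny))))
        y = ny
    links.append((("router", dst), ("core", dst)))  # ejection link
    return links

# The transfer from core (0, 1) to core (2, 0) used as the example in the text.
links = route_links((0, 1), (2, 0))
```

For this transfer the function returns five links, matching the five resources listed in the text. Two flows contend whenever their link lists share an element.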
3.3 Problem formulation
Given a platform and an application model with a defined
set of operating modes, the problem is to determine schedu-
lable mappings for each mode so that the amount of data
to be migrated during the allowed mode changes is mini-
mized. During mode changing, the taskset should be still
schedulable despite the additional network traffic generated
by the task migrations. The neighbouring modes (i.e. the
modes connected with a link in the FSM describing the al-
lowed mode transitions) with similar runnables’ execution
time can be clustered to decrease the frequency of task mi-
grations. The deadlines for mode changing time between
each neighbouring pair of modes must not be violated.
4. PROPOSED APPROACH
In this section, steps of the proposed design flow are de-
scribed. Since it has been assumed that the tasksets of the
considered application are known in advance, it is possible
to perform some preliminary computations statically. Con-
sequently, the mapping problem can be split into two stages:
off-line (static) and on-line (dynamic), as shown in Figure 4.
The computation time of the off-line part is not crucial and
thus even heuristics with high complexity may be used for
runnable and label mappings. It seems promising to combine
the most effective approaches, such as multi-objective
simulated annealing or genetic algorithms. The possibility
of extending genetic operators so that they benefit from the full
knowledge of the system domain, such as mutation in a way
similar to [24], makes the genetic approach the first choice
at this step.
During run time, detection of the current mode is assumed
to be done by observing a certain variable. (In DemoCar this
variable is named sm and is stored in runnable OperatingModeSWC.)
When the value of this variable changes,
the current runnable and label mapping might have to be
switched. The mappings have been chosen at design
time so as to minimize the amount of data to
be migrated. Schedulability analysis guarantees that even
the worst case switching time does not violate the deadline
required for mode changes. If such a violation is unavoidable,
either the states can be clustered, or the network bandwidth
is to be increased.
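The on-line part described above reduces to a table lookup, as the following minimal sketch suggests. The class `RunTimeManager`, its method names and the example core assignments are hypothetical; the mappings themselves are assumed to have been computed off-line.

```python
def runnables_to_migrate(current, target):
    """Runnables whose core differs between two precomputed mappings.

    Returns {runnable: (source core, destination core)}.
    """
    return {r: (current[r], target[r]) for r in current if current[r] != target[r]}

class RunTimeManager:
    def __init__(self, mappings, initial_mode):
        self.mappings = mappings   # mode -> {runnable: core}, computed off-line
        self.mode = initial_mode

    def on_mode_variable(self, new_mode):
        """Called when the observed mode variable changes its value.

        Returns the migrations to be carried out for the mode change.
        """
        if new_mode == self.mode:
            return {}
        moves = runnables_to_migrate(self.mappings[self.mode],
                                     self.mappings[new_mode])
        self.mode = new_mode
        return moves

# Hypothetical mappings of two runnables for two modes.
mappings = {"PowerUp": {"r1": 0, "r2": 0}, "Cluster_1": {"r1": 0, "r2": 1}}
rtm = RunTimeManager(mappings, "PowerUp")
```

No search or schedulability test is run on-line in this sketch; all costly decisions are encoded in the `mappings` table.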
The off-line part of the proposed approach is comprised
of five steps, which are covered in the following subsections.
4.1 Mode detection / clustering
The reasons for introducing the mode detection & clus-
tering step are twofold. Firstly, some neighbouring modes
can be characterized with similar runtime and resource con-
sumption. Then there is little benefit in preparing differ-
ent mappings for such modes and migrating the runnab-
les when a transition between these neighbouring modes is
made. Secondly, some transitions must be performed
immediately, whereas others are less time-critical. If a
runnable migration is to be performed quickly, for example
between two consecutive runnable occurrences, the band-
width needed to transfer the appropriate amount of data
in that time may be unreasonably high. Therefore, it may
be more sensible to merge two modes with such rapid task
switching time and generate only one mapping for them.
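One possible way of automating the second clustering criterion is sketched below. It is only an illustration under stated assumptions: the function `cluster_modes`, the threshold `min_migration_time` and the example deadlines are hypothetical, and the text does not prescribe any particular clustering procedure.

```python
def cluster_modes(modes, transitions, min_migration_time):
    """Merge modes connected by transitions that are too fast for a migration.

    transitions: {(mode_a, mode_b): mode change deadline}. Two modes end up
    in one cluster if a chain of transitions with deadlines shorter than
    min_migration_time connects them (union-find on the FSM graph).
    """
    parent = {m: m for m in modes}

    def find(m):
        while parent[m] != m:
            parent[m] = parent[parent[m]]   # path halving
            m = parent[m]
        return m

    for (a, b), deadline in transitions.items():
        if deadline < min_migration_time:
            parent[find(a)] = find(b)

    clusters = {}
    for m in modes:
        clusters.setdefault(find(m), set()).add(m)
    return list(clusters.values())

# Hypothetical modes and deadlines (in ms).
clusters = cluster_modes(
    ["A", "B", "C", "D"],
    {("A", "B"): 100, ("B", "C"): 5, ("C", "D"): 5},
    min_migration_time=10,
)
```

Here modes B, C and D are merged because the transitions between them have to complete in 5 ms, whereas A stays separate and can receive its own mapping.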
Example 2. (continuation of Example 1) In DemoCar, transitions
between modes: Stalled, Cranking, IdleOpenLoop,
IdleClosedLoop, Drive, WaitForOverrun, Overrun, WaitForAFRFeedback
and WaitForPowerDownDelay are to be performed
between two consecutive executions of their runnable
occurrences, which is upper-bounded by 5ms for 9 runnables.
Since performing task migration during such a short
time window would require a bandwidth of considerable size,
these modes have been clustered into Cluster 1. Finally,
three modes can be identified after the clustering step:
PowerDown, PowerUp and Cluster 1, as presented in Figure 2
(lower part).
Figure 5: Spanning tree construction for DemoCar
4.2 Spanning tree construction
To minimize the amount of data to be migrated between
two consecutive modes with the technique proposed in this
paper, the FSM describing mode changes should include
weights denoting state transition probabilities. Since prob-
abilities of staying in the current mode are not relevant at
this step, they can be omitted for simplicity. The probabilities
can be given or determined during a long simulation of
the modal system. The FSM must also have all its cycles
removed to guarantee halting of the Static mapping for non-initial
modes step. In this regard, for an FSM treated as a
weighted connected graph G(V,E), where V is the set of vertices
and E denotes the set of edges, a maximum spanning
tree can be constructed. Recall that a spanning tree of
a graph G is its subgraph T(V,E′), which is connected and
whose number of edges is equal to the number of vertices
minus 1, |E′| = |V| − 1. If $\mathcal{T}$ denotes the set of all spanning
trees of G, a spanning tree Tmax(V,Emax) of G
is a maximum spanning tree iff:
\[ \forall\, T(V,E') \in \mathcal{T}: \quad \sum_{(v,z) \in E_{max}} w(v,z) \;\ge\; \sum_{(v,z) \in E'} w(v,z), \]
where w(v, z) is the weight value assigned to the edge from a
vertex v to z. A maximum spanning tree can be constructed
in time O(|E| log |V|), e.g. by the classic Prim’s algorithm
[23].
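A minimal sketch of a maximum spanning tree construction with Prim's algorithm is given below. It treats the FSM as an undirected weighted graph; the function name `maximum_spanning_tree`, the vertex names and the edge weights in the example are hypothetical and do not reproduce the DemoCar graph.

```python
import heapq

def maximum_spanning_tree(vertices, edges):
    """Prim's algorithm on an undirected graph, preferring the heaviest edge.

    vertices: list of vertex names; edges: iterable of (u, v, weight).
    Returns the tree as a list of (u, v, weight) in the order of insertion.
    """
    adj = {v: [] for v in vertices}
    for u, v, w in edges:
        adj[u].append((w, u, v))
        adj[v].append((w, v, u))
    start = vertices[0]
    visited = {start}
    # heapq is a min-heap, so weights are negated to pop the heaviest edge.
    heap = [(-w, u, v) for w, u, v in adj[start]]
    heapq.heapify(heap)
    tree = []
    while heap and len(visited) < len(vertices):
        neg_w, u, v = heapq.heappop(heap)
        if v in visited:
            continue
        visited.add(v)
        tree.append((u, v, -neg_w))
        for w, a, b in adj[v]:
            if b not in visited:
                heapq.heappush(heap, (-w, a, b))
    return tree

# Hypothetical three-mode graph with transition probabilities as weights.
tree = maximum_spanning_tree(
    ["A", "B", "C"],
    [("A", "B", 0.9), ("B", "C", 0.1), ("A", "C", 1.0)],
)
```

In this example the least probable transition (B, C) is left out of the tree, so it is the one that the subsequent mapping step does not optimize.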
The operation performed in this step neither influences
the application behaviour nor limits the possible mode
transitions. It only means that the least frequent transitions are
not optimized during the stage Static mapping for non-initial modes
minimizing amount of data to be migrated (Fig. 4).
Example 3. (continuation of Example 2) For DemoCar,
probabilities of mode changing have been shown in Figure
5 (left). The maximum spanning tree, constructed with
Prim’s algorithm, is presented in Figure 5 (right).
4.3 Static mapping for initial mode
Since mapping for each mode is performed off-line, even
heuristics known for their high computational cost, such
as genetic algorithms, can be applied. A genetic algorithm
used for hard real-time systems shall guarantee that under
the chosen schedule all timing constraints are satisfied. This
can be achieved in several ways. For example, each missed
deadline can impose a certain penalty on the fitness function
value, so that each schedule with unsatisfied constraints
is eliminated during the evolutionary process. A
particular mapping is represented as a chromosome, stored as
a bit string, in which each gene represents the processing core
to which a task is mapped, similarly to [11]. The
bit string one-point crossover operator and flip bit mutation
have been applied together with the tournament selection of
the individuals.
Algorithm 1: Pseudo-code of no deadline violation
with makespan minimisation algorithm for the initial
mode mapping
inputs : Workload Γ;
         Resource set Π;
outputs : Task mapping;
1 Choose an initial random population of task mappings
2 while not termination condition do
3     Evaluate the number of deadline violations; //criterion (i)
4     Evaluate the makespan; //criterion (ii)
5     Create clusters of individuals with the same number of deadline violations;
6     Sort the clusters by increasing number of deadline violations;
7     Sort individuals in each cluster wrt their makespan;
8     Perform tournament selection; //criterion (i) has higher priority than criterion (ii)
9     Generate individuals using crossover and mutation;
10    Create a new population with the best found mappings;
11 end
An algorithm encompassing the aforementioned properties
is described below.
Algorithm 1 presents the pseudo-code of a genetic
algorithm that can be used during the Static mapping for initial
mode step, the third off-line step of the proposed approach,
as depicted schematically in Figure 4. We propose to use
two fitness functions, measuring (i) the number of deadline
violations and (ii) makespan (also known as response time).
The first fitness function value is of primary importance, as
in a hard real-time system no deadline violation is allowed.
Among fully schedulable mappings, however, the one leading to
a lower makespan is chosen, since idle intervals can be used
to decrease energy consumption or execute tasks of lower
criticality levels.
In the algorithm, the following steps can be singled out.
Step 1. Initial population initialisation (line 1). An ar-
bitrary number of random task mappings (individuals) is
created.
Step 2. Creating a new population (lines 3-10). For each
individual, values of two fitness functions are computed -
the number of deadline violations and the makespan (lines
3-4). Individuals with the same number of deadline misses
are grouped together (line 5). The groups are then sorted
with respect to the number of deadline violations in the as-
cending order (line 6). Inside each group, individuals are
sorted according to their growing makespan (line 7). The
tournament selection is then performed - individuals from a
group with lower number of deadline violations are always
preferred, whereas among individuals from one group the
one with the lowest makespan is to be chosen (line 8). The
individuals winning the tournament are then combined us-
ing a typical crossover operation and mutated (line 9). A
new population is created (line 10). Step 2 is repeated in a
loop as long as a termination condition is not fulfilled, which
can be a maximal number of generated populations or lack
of improvement in a number of subsequent generations.
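The two steps above can be sketched as follows, under several assumptions. The fitness function `evaluate` is supplied by the caller and has to return the pair (number of deadline violations, makespan), since the schedulability test itself is not reproduced here. Comparing these pairs lexicographically has the same effect as grouping by violations and sorting by makespan inside each group. For brevity, each gene is an integer core index rather than a bit string, mutation redraws a gene instead of flipping a bit, the best mapping found so far is always carried over, and a fixed number of generations is the only termination condition; the parameters `k`, `rate` and `seed` are hypothetical.

```python
import random

def tournament(population, fitness, k, rng):
    """Best of k random individuals: fewer violations first, then lower makespan."""
    return min(rng.sample(population, k), key=fitness)

def one_point_crossover(a, b, rng):
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(individual, n_cores, rate, rng):
    """Redraw each gene with probability `rate`."""
    return tuple(rng.randrange(n_cores) if rng.random() < rate else g
                 for g in individual)

def evolve(evaluate, n_runnables, n_cores, pop_size=20, generations=100,
           k=2, rate=0.05, seed=0):
    """Return (best mapping, its fitness); evaluate(m) -> (violations, makespan)."""
    rng = random.Random(seed)
    population = [tuple(rng.randrange(n_cores) for _ in range(n_runnables))
                 for _ in range(pop_size)]                      # Step 1
    best = min(population, key=evaluate)
    for _ in range(generations):                                # Step 2
        children = []
        while len(children) < pop_size - 1:
            p1 = tournament(population, evaluate, k, rng)
            p2 = tournament(population, evaluate, k, rng)
            child = one_point_crossover(p1, p2, rng)
            children.append(mutate(child, n_cores, rate, rng))
        population = children + [best]   # keep the best mapping found so far
        best = min(population, key=evaluate)
    return best, evaluate(best)
```

A mapping with zero violations always beats any mapping with at least one violation in the tournament, regardless of makespan, which reflects the priority of criterion (i) over criterion (ii).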
Example 4. (continuation of Example 3) For the PowerUp
(initial) mode of DemoCar to be executed on a multi-core
embedded system, we evaluate makespan and number of vi-
olated deadlines during one hyperperiod (i.e. the least com-
mon multiple of all runnables’ periods) by allocating run-
nables and labels to different cores.
The size of the NoC mesh has been initially configured
as 2x2 with no idle cores, since this size has been checked
earlier (also using Algorithm 1) and is large enough to execute
DemoCar in the most computationally intensive mode,
Cluster 1, without violating any of its timing constraints. The
genetic algorithm is executed again to assign
tasks to cores with timing characteristics for the initial
PowerUp mode. The genetic algorithm has been configured
to generate 100 generations of 20 individuals each. The first
fully schedulable allocation has been found in the 1st generation,
which suggests that it might be possible to allocate
the taskset to a lower number of cores.
Further search has shown that the
taskset in the initial mode is schedulable even when mapped
to a single active core (out of four). The lowest makespan for
the NoC with three idle cores is equal to 8622µs.
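The hyperperiod used in this evaluation follows directly from the runnable periods given in Example 1. The short computation below uses only those periods; the variable names are chosen for illustration.

```python
from math import lcm  # available in Python 3.9 and later

# Periods (in ms) of the 18 DemoCar runnables as listed in Example 1.
periods_ms = [10] * 8 + [5] * 6 + [20] * 2 + [100] * 2

# Hyperperiod: least common multiple of all runnable periods.
hyperperiod_ms = lcm(*periods_ms)

# Number of runnable jobs released within one hyperperiod.
jobs_per_hyperperiod = sum(hyperperiod_ms // t for t in periods_ms)
```

The hyperperiod is 100 ms, within which 212 runnable jobs are released (80 with a period of 10 ms, 120 with 5 ms, 10 with 20 ms and 2 with 100 ms).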
4.4 Static mapping for non-initial modes
It is of primary importance to migrate as little data as pos-
sible during mode changes to minimise the migration time.
Each application A includes a set of tasks and can be
represented with a vector comprised of p runnables and
r shared memory locations (labels) of these tasks, A =
[τ1, . . . , τp, ν1, . . . , νr], and platform Π is composed of s processing
cores, Π = {π1, . . . , πs}. A mapping M is a vector of
p core locations, M = [πτ1 , . . . , πτp ], where each element corresponds
to the appropriate element of A and can be substituted
with any element of set Π. Each element of weight
vector W, W = [wτ1 , . . . , wτp ], is equal to the amount of
data that has to be transferred when a particular runnable
is migrated, including the labels to be read.
Let Mα and Mβ be sets of mappings that are fully schedu-
lable in a given system in state α and β, respectively. The
elements of the difference vector Dmα,mβ = [dτ1 , . . . , dτp ]
indicate which runnables are to be migrated when the mode
is changed from α to β. Each element dδ, δ ∈ {τ1, . . . , τp},
takes value 1 if the particular runnable/label is allocated to
different cores in mappings mα ∈ Mα and mβ ∈ Mβ , and 0
otherwise:
\[ d_{\delta} = \begin{cases} 0, & \text{if } m_{\alpha,\delta} = m_{\beta,\delta}, \\ 1, & \text{otherwise,} \end{cases} \tag{1} \]
where mα,δ and mβ,δ denote the δ-th element of vectors mα
and mβ , respectively. The migration cost c between two
states α and β is then computed in the following way:
\[ c_{m_\alpha, m_\beta} = D_{m_\alpha, m_\beta} \cdot W^{T}. \tag{2} \]
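Equations (1) and (2) translate into a few lines of code. The function names and the numeric mappings and weights below are hypothetical and serve only as a worked example.

```python
def difference_vector(m_alpha, m_beta):
    """Equation (1): 1 where a runnable changes its core, 0 otherwise."""
    return [0 if a == b else 1 for a, b in zip(m_alpha, m_beta)]

def migration_cost(m_alpha, m_beta, weights):
    """Equation (2): dot product of the difference vector and the weights."""
    d = difference_vector(m_alpha, m_beta)
    return sum(di * wi for di, wi in zip(d, weights))

# Four runnables; the second and the fourth change their core.
m_alpha = [0, 0, 1, 2]
m_beta = [0, 1, 1, 3]
weights = [100, 40, 60, 25]   # data to be moved if the runnable migrates
cost = migration_cost(m_alpha, m_beta, weights)
```

Here the difference vector is [0, 1, 0, 1], so the migration cost equals 40 + 25 = 65 units of data.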
A recursive greedy algorithm for reducing the amount of
data transferred during mode changes is presented in
Algorithm 2.
Since some cycles are likely to occur in a graph represent-
ing the Finite State Machine describing transitions between
modes, a spanning tree (ST) is to be built, as described in
the previous subsection. Then the mode corresponding to
the initial state of the FSM is selected as the current mode
(line 1). For this mode, a set of schedulable mappings is
generated, e.g. with Algorithm 1 (line 2). If more than
one schedulable mapping is found, an additional criterion
crit (e.g. minimum makespan value) is used to select one
of them (line 3). Then, for each direct successor of the ST
node corresponding to the FSM initial state, the
FindMappingMin procedure is executed (lines 4 and 5).
Algorithm 2: Pseudo-code of a migration data transfer minimisation algorithm
inputs : A spanning tree ST based on the Finite State Machine (FSM)
         describing the system modes with transition probabilities;
         W - size of each runnable memory footprint;
         crit - mapping optimality criterion (e.g. minimum makespan value);
outputs: Runnable and label mapping for each mode;
1  Select the initial state of ST and assign it to α;
2  Find a set of schedulable mappings Mα;
3  Select mα ∈ Mα wrt criterion crit;
4  forall the β being a direct successor of α in ST do
5      FindMappingMin(α, β, mα);
6  end
7  FindMappingMin(α, β, mα)
8      Find a set of schedulable mappings Mβ minimising criterion (2) using W;
9      Select mβ ∈ Mβ wrt criterion crit;
10     forall the q being a direct successor of β in ST do
11         FindMappingMin(β, q, mβ);
12     end
In the FindMappingMin procedure, a set of schedulable
mappings for that successor node is found using minimal mi-
gration cost criterion (2) (line 8). If more than one schedu-
lable mapping is equally evaluated by this criterion, an addi-
tional criterion, crit, is used (line 9). The FindMappingMin
procedure is then recursively run for each direct successor
of the ST node provided as the function parameter (lines
10 and 11). A larger search space could be browsed by
skipping the selection steps (lines 3 and 9) in the algorithm
and providing all elements of Mα to the FindMappingMin
procedure instead of just one.
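The recursion of Algorithm 2 can be sketched in Python as follows. This is only an illustration under simplifying assumptions: the set of schedulable mappings for each mode is supplied as a pre-computed list through the hypothetical `candidates` callable (the genetic search is not reproduced), the spanning tree is a plain dictionary of successors, and `crit` is an arbitrary stand-in for a criterion such as makespan.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Mapping = Tuple[str, ...]  # core of each runnable, in a fixed runnable order


def migration_cost(m_a: Mapping, m_b: Mapping, weights: Sequence[int]) -> int:
    """Criterion (2): total weight of the runnables that change core."""
    return sum(w for a, b, w in zip(m_a, m_b, weights) if a != b)


def assign_mappings(
    tree: Dict[str, List[str]],
    root: str,
    candidates: Callable[[str], List[Mapping]],
    weights: Sequence[int],
    crit: Callable[[Mapping], float],
) -> Dict[str, Mapping]:
    """Walk the spanning tree from the root and pick one mapping per mode."""
    # Lines 1-3: the initial mode is decided by the additional criterion only.
    chosen: Dict[str, Mapping] = {root: min(candidates(root), key=crit)}

    def find_mapping_min(alpha: str, beta: str) -> None:
        m_alpha = chosen[alpha]
        # Lines 8-9: minimise the migration cost, break ties with crit.
        chosen[beta] = min(
            candidates(beta),
            key=lambda m: (migration_cost(m_alpha, m, weights), crit(m)),
        )
        # Lines 10-12: recurse into the direct successors of beta.
        for q in tree.get(beta, []):
            find_mapping_min(beta, q)

    # Lines 4-6: start the recursion for each direct successor of the root.
    for beta in tree.get(root, []):
        find_mapping_min(root, beta)
    return chosen


# Hypothetical input: three modes in a chain, three runnables, four cores.
tree = {"PowerUp": ["Cluster 1"], "Cluster 1": ["PowerDown"]}
schedulable = {
    "PowerUp": [("c0", "c0", "c0")],
    "Cluster 1": [("c0", "c1", "c2"), ("c1", "c2", "c3")],
    "PowerDown": [("c0", "c1", "c1"), ("c2", "c3", "c3")],
}
weights = [100, 200, 300]
result = assign_mappings(tree, "PowerUp", schedulable.__getitem__, weights,
                         crit=lambda m: len(set(m)))  # stand-in criterion
print(result)
```

Because each mode is decided once and never revisited, the procedure is greedy: a choice made for a parent mode can make the transitions to its descendants more expensive than a jointly optimised solution would.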
Example 5. (continuation of Example 4) Regardless of the
mode, the application has been mapped in a 2x2 mesh Net-
work on Chip without deadline violations. For the PowerUp
mode, schedulable mappings have been found even if three
of the four NoC cores remain idle, as shown in Example 4.
It means that in this mode three cores can be switched off,
leading to considerable energy savings. Similarly, two cores
can remain idle in the PowerDown mode. (PowerDown re-
quires more computations than PowerUp since some main-
tenance procedures are to be consistently performed.) How-
ever, despite intensive search using a genetic algorithm, all
four cores are needed in the Cluster 1 mode to have the
taskset fully schedulable. Thus, when the current mode
changes from PowerUp to Cluster 1, three cores have to
be activated, whereas two cores can be switched off after
leaving the Cluster 1 mode.
Let us focus on the transition between the PowerUp and
Cluster 1 modes. For PowerUp, only one core is active
and thus all runnables are mapped to that core. However,
in other cases a larger set of mappings that are fully
schedulable on the active NoC cores would have been
identified. From these mappings, the one with the lowest
makespan (an additional criterion) is chosen. This mapping
has been used as a parameter of the FindMappingMin
procedure of Algorithm 2. The set of schedulable mappings
minimising criterion (2) is identified using a genetic
algorithm. With this criterion (min(cmα,mβ )), a significantly
lower amount of data has to be migrated during the mode
change. In the best case found, 3 runnables have to be
migrated, with a total cPowerUp,Cluster 1 = 261968 bytes.
When the minimal makespan is used instead of this criterion,
the lowest number of runnables to be migrated equals
13, which results in cPowerUp2,Cluster 1 = 890162 bytes. In
the second case, the amount of data to be transferred using
a periodic server is about 240% higher than for the first
mapping pair. Since periodic servers offer equal throughput
during the system execution, the mode change between the
latter mappings would last more than three times as long as
between the former pair.
Figure 6: Example of two different mappings (mα, mβ) of
runnables τi, τj, τk onto cores πa and πb. In mα, τi and τj are
mapped to πa and τk to πb; in mβ, τi is mapped to πa and τj, τk
to πb.
During the mode change from Cluster 1 to PowerDown, 2
runnables have to be migrated and cCluster 1,PowerDown =
113568 bytes. Although the transition between modes
PowerDown and PowerUp is not optimised, in this case only 2
runnables have to be migrated, with cPowerDown,PowerUp =
113536 bytes.
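The ratio quoted for the two PowerUp to Cluster 1 mapping pairs follows directly from the two reported costs; no additional data is assumed:

```latex
\[
\frac{c_{\mathrm{PowerUp2},\,\mathrm{Cluster\,1}}}{c_{\mathrm{PowerUp},\,\mathrm{Cluster\,1}}}
  = \frac{890162}{261968} \approx 3.398,
\qquad
\frac{890162 - 261968}{261968} \approx 2.398 \;(\approx 240\%).
\]
```

With a constant server throughput, the migration time scales by the same factor of roughly 3.4, which is the "more than three times" stated in the example.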
4.5 Schedulability analysis for taskset during
mode changes
Since some runnables and labels are expected to be located
at different cores in two different modes, their migration is to
be performed during mode changes. A runnable migration
process is schematically depicted in Figure 6. Two mappings
mα and mβ of runnables τi, τj , τk onto nodes πa and
πb are used in two different system modes: α and β,
respectively. The difference between these mappings is the
assignment of runnable τj . We assume no deadline violations for
both mappings mα and mβ . During hyperperiods involved
in the migration process between α and β, the schedulability
analysis for communication resources should take into
consideration not only all the transfers between πa and πb
described in the workload, but also an additional periodic
job, i.e. the periodic server of a certain policy (polling,
sporadic, deferrable, etc.) with a certain execution time in each
period. A technique for determining this time is presented
in this subsection.
When the mode changes from α to β, runnable τj is to
be copied from πa to πb. Since the precopy strategy is
applied, τj is still executed on core πa during the migration.
To migrate runnable τj , the periodic server is used. The
whole context of the runnable is transferred during a number
of subsequent hyperperiods. It is worth stressing that the
maximal migration time can be computed statically, since
the runnable context size and the periodic server time slot
length and period are known. After this time, it is safe to
start executing τj on πb and remove its copy from πa.
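The static bound on the migration time can be sketched as below. The sketch assumes, purely for illustration, that the periodic server carries a fixed number of bytes per time slot and is granted a fixed number of slots per hyperperiod; `bytes_per_slot` and `slots_per_hyperperiod` are hypothetical parameters, and only the context size (261968 bytes) and the 100 ms hyperperiod come from the DemoCar examples.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b for non-negative a and positive b."""
    return -(-a // b)


def migration_hyperperiods(context_bytes: int, bytes_per_slot: int,
                           slots_per_hyperperiod: int) -> int:
    """Number of hyperperiods needed to transfer the whole context."""
    if context_bytes < 0 or bytes_per_slot <= 0 or slots_per_hyperperiod <= 0:
        raise ValueError("sizes must be non-negative and capacities positive")
    return ceil_div(context_bytes, bytes_per_slot * slots_per_hyperperiod)


def max_migration_time_ms(context_bytes: int, bytes_per_slot: int,
                          slots_per_hyperperiod: int, hyperperiod_ms: int) -> int:
    """Upper bound on the migration time, in whole hyperperiods."""
    n = migration_hyperperiods(context_bytes, bytes_per_slot, slots_per_hyperperiod)
    return n * hyperperiod_ms


# 261968 bytes is the PowerUp -> Cluster 1 cost from Example 5;
# the server capacity (6000 bytes per slot, 10 slots) is hypothetical.
n = migration_hyperperiods(261968, 6000, 10)
t = max_migration_time_ms(261968, 6000, 10, 100)
print(n, t)
```

Since every input is known off-line, the bound can be stored in a table and does not have to be recomputed at run time.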
To guarantee schedulability of runnables, one of the schedu-
lability tests, described for example in [8], shall be applied.
It is possible to calculate the longest possible time interval
between the release of runnable τi and its termination, which
is referred to as τi’s worst case response time (WCRT) and is
represented by Ri. The schedulability analysis is performed
in the way described in [2], i.e. by checking whether the WCRTs
of all runnables do not exceed their deadlines. The WCRT of
runnable τi can be computed using the equation:
$$
R_i = C_i + \sum_{\forall \tau_j \in hp(\tau_i)}
\left\lceil \frac{R_i + R_j - C_j}{T_j} \right\rceil C_j,
\tag{3}
$$
where hp(τi) denotes the set of all runnables that can
preempt τi, Ci and Cj are the worst case execution times of τi
and τj , respectively, and Tj denotes the period of τj . The
worst case latency rk of packet ϕk transmitted
over a link in a mesh NoC with wormhole switching, issued
periodically every tk, can be formulated in a similar manner
to that of [29], [11], [19]:
rk = ck + bk + lk, (4)
where ck is the basic network latency, bk is the maximal
blocking time from lower-priority packets, and lk is the maximal
blocking time due to interference with higher-priority
packets. The basic network latency can be computed with the
following equation [11], [19]:
$$
c_k = H \cdot (d_R + d_L) + \left\lceil \frac{PS}{FS} \right\rceil \cdot d_L,
\tag{5}
$$
where dR and dL denote the constant packet latencies while
traversing one router and one link, respectively, PS is the
number of bits in the packet, and FS is the flit length in
bits. H is the number of hops between the source and destination
cores. The remaining terms of equation (4) can be computed
with equations [11], [19]:
$$
b_k = H \cdot (d_R + d_L),
\tag{6}
$$
$$
l_k = \sum_{\forall \varphi_l \in interf(\varphi_k)}
\left\lceil \frac{r_k + (r_l - c_l) + R_i}{t_l} \right\rceil (c_l + b_l),
\tag{7}
$$
where interf(ϕk) denotes the direct interference set of ϕk,
which is the set of all packets that can preempt ϕk, i.e.
have a higher priority and share at least one link with ϕk.
The response time of task τi that releases ϕk, Ri, has been
substituted as a maximum release jitter. The term (rl − cl)
is an upper bound of indirect interference [11].
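Because rk appears on both sides of equations (4) and (7), it is usually obtained by fixed-point iteration. The following Python sketch shows one way to do this under stated assumptions: all times are integers in nanoseconds, the latencies rl of the interfering packets are already known (e.g. computed in priority order), and the iteration is abandoned once a caller-supplied `deadline` is exceeded. The class and function names, and the numeric values, are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b for positive b."""
    return -(-a // b)


@dataclass
class Interferer:
    r: int  # worst case latency r_l of the interfering packet
    c: int  # its basic network latency c_l
    b: int  # its blocking time b_l
    t: int  # its period t_l


def basic_latency(hops: int, d_r: int, d_l: int,
                  packet_bits: int, flit_bits: int) -> int:
    """Equation (5): c_k = H (dR + dL) + ceil(PS / FS) dL."""
    return hops * (d_r + d_l) + ceil_div(packet_bits, flit_bits) * d_l


def blocking_time(hops: int, d_r: int, d_l: int) -> int:
    """Equation (6): b_k = H (dR + dL)."""
    return hops * (d_r + d_l)


def packet_latency(c_k: int, b_k: int, release_jitter: int,
                   interferers: List[Interferer], deadline: int) -> Optional[int]:
    """Equations (4) and (7) solved by fixed-point iteration.

    Returns r_k, or None if the iteration exceeds the deadline.
    """
    r_k = c_k + b_k
    while r_k <= deadline:
        l_k = sum(
            ceil_div(r_k + (p.r - p.c) + release_jitter, p.t) * (p.c + p.b)
            for p in interferers
        )
        r_next = c_k + b_k + l_k
        if r_next == r_k:
            return r_k
        r_k = r_next
    return None


# Hypothetical packet: 2 hops, 128-bit payload, 16-bit flits, times in ns.
c_k = basic_latency(hops=2, d_r=50, d_l=100, packet_bits=128, flit_bits=16)
b_k = blocking_time(hops=2, d_r=50, d_l=100)
hp = [Interferer(r=1500, c=1100, b=300, t=10000)]
r_k = packet_latency(c_k, b_k, release_jitter=2000, interferers=hp,
                     deadline=100000)
print(c_k, b_k, r_k)
```

The iteration is monotonically non-decreasing, so it either reaches a fixed point or crosses the deadline after a finite number of steps.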
By applying equations (3) and (4), the schedulability of both
independent runnables executed on processing cores and
packet transmissions can be verified. However, jobs in the
considered applications, possibly executed on different cores,
are characterised by various dependency patterns. Typically,
a job can start its execution only when all its
parent jobs have been executed (which contributes to the so-called
computation latency) and all the necessary data have been
transferred to the core to which this job is assigned
(communication latency).
The goal is thus to establish whether all task-chains of an
application have their end-to-end deadlines met in a
particular platform, and this assessment is referred to as an end-to-end
schedulability test. Such a test must consider the end-to-end
latency of each task of a task-chain. To check the
schedulability of a task chain, it is sufficient and necessary to test
the individual end-to-end response times of all tasks be-
longing to that chain [13]. In [13], a technique for end-
to-end schedulability analysis is proposed, but it assumes
a pipelined task execution pattern, where multiple jobs of
the same task chain are executed simultaneously over dif-
ferent cores, but the simultaneous execution of more than
one job of the same task is not allowed. When the execution
pattern does not follow this scheme, meeting end-to-end
deadlines can be checked by assigning an appropriate local
deadline for each job in every chain. These local deadlines
shall be chosen in a way that all the jobs on every core are
schedulable and the local deadlines at the chain last stage
do not exceed the respective end-to-end deadline [10].
Figure 7: Tasks’ stages in DemoCar: green - runnable execution, red - write to labels; release times and
deadlines are provided in ms. Panels (one per task): Task_10ms, Task_20ms, CylNumTriggeredTask,
Task_100ms, ActuatorTask, Task_5ms.
Example 6. (continuation of Example 5) In DemoCar, each
task is composed of a series of three subsequent stages: read
from labels, runnable execution, write to labels. Since the
labels are always located in the same core as the runnable
reading them, the read stage can be omitted, and the two
remaining stages are presented in Fig. 7 for all tasks,
highlighted in green and red. Runnables belonging to one task
and drawn one above the other can be executed in parallel,
whereas the execution order of the runnables follows the
dependencies defined by label write and read operations, so that
each label is written by a runnable before it is read
by another runnable. The end-to-end deadline has been
divided among the stages of each task, and in that way the
deadlines for each stage have been determined (in Fig. 7
these deadlines are written beneath the end point of each
stage).
For example, the release time of runnable ThrottleCtrl is
2ms. By this time, all the packets with label values required
by this runnable are assumed to have arrived at the node executing
ThrottleCtrl. The deadline for this runnable execution is
3ms, so the WCRT (Ri) must not be higher than this value.
The packets with data are then issued between 2ms (the
runnable release time) and Ri. They have to reach their
destination nodes earlier than 4ms.
To check schedulability of DemoCar, it is then sufficient
to check schedulability for runnables executed in all (green)
stages and also data transfers to and from labels performed
in the appropriate (red) stages.
Since the earliest execution starting time of each runnable
is limited by the starting time of the stage that includes this
runnable, the stage starting time can be treated as an
offset Oi, as described in [32]:
$$
R_i = C_i + \sum_{\forall \tau_j \in hp(\tau_i)}
\left\lceil \frac{R_i + (R_j - C_j - O_j) - O_i}{T_j} \right\rceil C_j + O_i.
\tag{8}
$$
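Equation (8) can be evaluated by fixed-point iteration, processing the runnables of one core from the highest to the lowest priority so that each Rj is known before it is needed. The sketch below assumes integer time units, deadlines expressed on the same time base as the offsets (as in Example 6), and a hypothetical `Runnable` record; with all offsets set to zero it reduces to equation (3).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b for positive b."""
    return -(-a // b)


@dataclass
class Runnable:
    name: str
    wcet: int       # C
    period: int     # T
    deadline: int   # latest allowed R, on the same time base as the offset
    offset: int = 0  # O, start of the stage containing the runnable


def response_times(by_priority: List[Runnable]) -> Dict[str, Optional[int]]:
    """WCRT of each runnable on one core; None marks a missed deadline.

    Runnables must be ordered from highest to lowest priority.
    """
    solved: List[Tuple[Runnable, int]] = []
    result: Dict[str, Optional[int]] = {}
    feasible = True
    for ri in by_priority:
        r: Optional[int] = None
        if feasible:
            r = ri.wcet + ri.offset
            while r <= ri.deadline:
                interference = sum(
                    ceil_div(r + (rj_resp - rj.wcet - rj.offset) - ri.offset,
                             rj.period) * rj.wcet
                    for rj, rj_resp in solved
                )
                r_next = ri.wcet + interference + ri.offset
                if r_next == r:
                    break
                r = r_next
            if r > ri.deadline:
                r = None
                feasible = False  # lower priorities lack a bound on R_j
        if r is not None:
            solved.append((ri, r))
        result[ri.name] = r
    return result


# Hypothetical runnables, times in microseconds; B starts in a later stage.
with_offsets = response_times([
    Runnable("A", wcet=200, period=5000, deadline=1000),
    Runnable("B", wcet=500, period=10000, deadline=3000, offset=2000),
])
no_offsets = response_times([
    Runnable("A", wcet=1, period=4, deadline=4),
    Runnable("B", wcet=2, period=6, deadline=6),
    Runnable("C", wcet=3, period=12, deadline=12),
])
print(with_offsets, no_offsets)
```

A runnable that misses its deadline leaves the lower-priority results undefined in this sketch, since their interference terms depend on its response time.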
Using a similar rationale, equations (4) and (7) can be
rewritten in the following way:
$$
r_k = c_k + b_k + l_k + R_i,
\tag{9}
$$
$$
l_k = \sum_{\forall \varphi_l \in interf(\varphi_k)}
\left\lceil \frac{(r_k - O_i) + (r_l - c_l - O_{i'})}{t_l} \right\rceil (c_l + b_l),
\tag{10}
$$
where the term (rk−Oi) reflects an additional jitter imposed by
the response time Ri of the task that initiates this transfer, and
Oi′ denotes the offset of the task that releases ϕl. One more
requirement has to be added to the set interf(ϕk): it includes
only those packets that have a higher priority, are transferred
via a path sharing at least one link with ϕk, and whose
sender executions or traffic stages have timing boundaries
that overlap those of ϕk.
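The three membership conditions of interf(ϕk) can be expressed as a simple filter. The sketch below is illustrative only: the `Flow` record, the convention that a lower number means a higher priority, and the encoding of the timing boundaries as a half-open interval `[start, end)` covering the sender execution and the traffic stage are all assumptions, not part of the analysis in [11].

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

Link = Tuple[str, str]


@dataclass(frozen=True)
class Flow:
    name: str
    priority: int            # assumption: lower value = higher priority
    links: FrozenSet[Link]   # links traversed by the packet
    window: Tuple[int, int]  # [start, end) of sender execution + traffic stage


def overlaps(a: Tuple[int, int], b: Tuple[int, int]) -> bool:
    """True if two half-open intervals share at least one instant."""
    return a[0] < b[1] and b[0] < a[1]


def interference_set(k: Flow, flows: List[Flow]) -> List[Flow]:
    """Flows with higher priority, a shared link and an overlapping window."""
    return [
        f for f in flows
        if f.name != k.name
        and f.priority < k.priority
        and f.links & k.links
        and overlaps(f.window, k.window)
    ]


# Hypothetical flows, times in microseconds.
ab, bd, cd = ("a", "b"), ("b", "d"), ("c", "d")
k = Flow("k", 2, frozenset({ab}), (2000, 4000))
flows = [
    k,
    Flow("f1", 1, frozenset({ab, bd}), (0, 2000)),   # window does not overlap
    Flow("f2", 1, frozenset({ab}), (3000, 5000)),    # satisfies all conditions
    Flow("f3", 3, frozenset({ab}), (2000, 4000)),    # lower priority
    Flow("f4", 0, frozenset({cd}), (2000, 4000)),    # no shared link
]
names = [f.name for f in interference_set(k, flows)]
print(names)
```

Filtering by stage overlap shrinks the interference set and therefore tightens the bound of equation (10) compared with equation (7).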
As mentioned earlier, the proposed task mapping technique
aims to benefit from the modal nature of applications,
but it also poses new challenges. If the modes are treated
independently from each other, the end-to-end schedulabil-
ity of runnables and packet transmission in each mode can
be analysed using equations (8) and (9). It is the instant of
transition between these modes that requires special atten-
tion. The task migration time can be computed with equa-
tions (5), (6), (9), (10), where the packet size, PS, is equal
to the sum of the header length and the size of the payload
including the whole context of runnables and labels to be
migrated. If a relatively large runnable is to be migrated in
a highly utilised platform, performing the migration when
the next job of the runnable is due to start could require
rather high bandwidth in order not to violate any deadlines.
We therefore assume the use of the precopy strategy, as described
in [24]. The job is executed in its current (source) location
during the mode switching, until all the runnables have been
migrated to their new (destination) locations. Then the mi-
grated runnables are removed from the source location, and
their next execution will be performed in their destination
locations. If a runnable is of combinational nature (its out-
puts depend solely on input values; all DemoCar runnables
have this property), only the runnable code section is to be
migrated. In case of a sequential nature of a runnable, the
whole context is to be migrated.
Similarly to [17], we split a runnable context into two
parts: invariant, which is not modified at runtime, and dy-
namic, including all volatile memory locations. We assume
that an upper bound of the dynamic part size of all run-
nables is known in advance. This part shall be migrated at
once using the last instance of the periodic server. It means
that the local memory locations that can be modified by
the runnable must not be precopied, but migrated after the
last execution of the runnable in the old location. This
requirement can influence the minimum periodic server size
and, consequently, the network bandwidth, as it must
then be wide enough to guarantee migration of the dynamic part
before the next runnable execution (in the new location).
This property shall be checked using (9).
Table 1: Number of hyperperiods (100ms) required
for switching from state PowerUp to Cluster 1
in DemoCar, depending on the router (dR) and link (dL)
latencies
dR [ns]   dL [ns]   No. of hyperperiods
50        100       1
100       200       2
100       400       3
200       500       4
400       800       6
500       1000      7
In the proposed approach, any kind of periodic server
can be used; however, the trade-off between implementation
complexity and the ability to guarantee the deadlines of hard
real-time tasks, as described for example in [7], shall be
considered.
The number of hyperperiods required for performing
task migration depends on the size of the runnables and labels to
be transferred, the mappings, and the network bandwidth, in
particular the flit size FS and the timing constants dR and dL for
packet latencies while traversing one router and one link.
Example 7. (continuation of Example 6) The flit size, FS,
has been fixed to 16 bits. A few examples of the number of
hyperperiods required to migrate tasks from PowerUp to
Cluster 1, depending on constants dR and dL are presented
in Table 1. The hyperperiod length for DemoCar equals
100ms and this time is enough to migrate all data when the
router and link latencies are equal to 50 and 100ns, respec-
tively.
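As a rough plausibility check of the first row of Table 1, the flit-transfer term of equation (5) can be evaluated under a simplifying assumption that is not made in the paper: the 261968 bytes reported in Example 5 are treated as one header-less packet that suffers no interference.

```latex
\[
\left\lceil \frac{PS}{FS} \right\rceil
  = \left\lceil \frac{261968 \cdot 8}{16} \right\rceil = 130984 \text{ flits},
\qquad
130984 \cdot d_L = 130984 \cdot 100\,\mathrm{ns} \approx 13.1\,\mathrm{ms} < 100\,\mathrm{ms}.
\]
```

The hop term H · (dR + dL) is a multiple of 150 ns and is negligible here. For dL = 1000 ns the same term grows to about 131 ms, which already exceeds one hyperperiod. The exact values in Table 1 additionally depend on the periodic server capacity and on interference from the regular traffic, which this estimate does not model.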
4.6 On-line steps
In the proposed approach, only two steps are performed
on-line: Detection of current mode and Mapping switching.
Both of these steps are characterised with low computational
complexity and thus they impose low overhead for the sys-
tem during run time.
We assume that the system states are defined explicitly
and there is a possibility of determining the current state
by observing some system model variables, similarly to [21].
Otherwise, the most efficient heuristics for the multi-dimensional
multiple-choice knapsack problem (MMKP), listed in the brief
survey earlier, have to be applied to identify the current mode
on-line, as proposed in [13].
When a mode change is requested, an agent residing in
each core prepares a set of packets with the runnables to be
migrated via the network. This agent is configured
statically and is equipped with a table listing the
runnables that have to be migrated during a particular mode
change. Then the precopy of these runnables is performed.
In the following hyperperiods, the runnables are transported
using periodic servers of the length determined statically in
the step Schedulability analysis for taskset during mode changes.
The agent is aware of the number of periodic server instances
that have to be used during the whole migration process (as
in the example in Table 1), and has the volatile portion of the
context identified. Once this number of instances has elapsed,
the source copies of the runnables that have been migrated are
killed.
Simultaneously, the same agent can receive migration data
from other agents in the network. After the appropriate
number of hyperperiods, the contexts of these runnables are
fully migrated and are ready to be executed by the operating
system.
The details of the agent depend on the underlying oper-
ating system.
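The table-driven behaviour of the agent can be pictured with the following Python sketch. It is not the agent envisaged by the authors, whose details depend on the operating system; the class names, the table layout and the runnable names are hypothetical, and the actual packet transfer is left out.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class MigrationPlan:
    runnables: Tuple[str, ...]  # runnables leaving this core
    server_instances: int       # periodic server instances, computed off-line


class MigrationAgent:
    """Per-core agent driven by a statically configured table."""

    def __init__(self, table: Dict[Tuple[str, str], MigrationPlan]) -> None:
        self._table = table
        self._active: Optional[MigrationPlan] = None
        self._remaining = 0

    def request_mode_change(self, old_mode: str, new_mode: str) -> Tuple[str, ...]:
        """Look up the plan and return the runnables to precopy."""
        plan = self._table.get((old_mode, new_mode))
        if plan is None or not plan.runnables:
            return ()
        self._active = plan
        self._remaining = plan.server_instances
        return plan.runnables

    def server_instance_elapsed(self) -> Tuple[str, ...]:
        """Call once per server instance; returns runnables to kill locally."""
        if self._active is None:
            return ()
        self._remaining -= 1
        if self._remaining > 0:
            return ()
        finished = self._active.runnables
        self._active = None
        return finished


# Hypothetical table entry: three runnables, two server instances.
agent = MigrationAgent({
    ("PowerUp", "Cluster 1"): MigrationPlan(("r1", "r2", "r3"), 2),
})
to_copy = agent.request_mode_change("PowerUp", "Cluster 1")
first = agent.server_instance_elapsed()
second = agent.server_instance_elapsed()
print(to_copy, first, second)
```

All decisions are table lookups and counter updates, which is consistent with the low on-line overhead claimed for this step.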
5. CONCLUSIONS
An approach to task migration in a multi-core network-based
embedded system has been proposed as a way to
decrease the number of cores needed for guaranteeing safe
execution of hard real-time software. The steps to be
performed statically have been described in detail using
illustrative examples based on a lightweight engine control unit.
A Finite State Machine describing mode changes has been
extracted from the software code and transition
probabilities have been identified during simulation. The closely
related modes have been merged into clusters. The most
frequent transitions have been identified with the classic
Prim’s algorithm, and a genetic algorithm has been used
to determine the runnable-to-core mapping for the initial
mode. Similarly, a genetic algorithm minimising the amount
of migrated data has been used for selecting the runnables to
be migrated when a change of the current mode is requested.
The migration time has been evaluated using schedulability
analysis depending on the network bandwidth.
The proposed approach requires development of an agent
realising the migration process. Since its architecture details
depend on the underlying operating system, its implemen-
tation and evaluation in real embedded environments are
planned as future work.
Acknowledgement
The research leading to these results has received funding
from the European Union’s Seventh Framework Programme
(FP7/2007-2013) under grant agreement no. 611411.
6. REFERENCES
[1] AUTOSAR: AUTomotive Open System ARchitecture,
2015, http://www.autosar.org/.
[2] N. Audsley, A. Burns, M. Richardson, K. Tindell, A.J.
Wellings, Applying new scheduling theory to static
priority preemptive scheduling, Softw. Eng. J., Volume
8, Issue 5, 1993, Pages 284–292.
[3] S. Banachowski, T. Bisson, S.A. Brandt, Integrating
best-effort scheduling into a real-time system,
International Real-Time Systems Symposium, 2004,
Pages 139–150.
[4] G. Beltrame, L. Fossati, D. Sciuto, Decision-theoretic
design space exploration of multiprocessor platforms,
IEEE Transactions on Computer-Aided Design of
Integrated Circuits and Systems, Volume 29, Issue 7,
2010, Pages 1083–1095.
[5] L. Benini, D. Bertozzi, M. Milano, Resource
management policy handling multiple use-cases in
MPSoC platforms using constraint programming,
Logic Programming, Springer Berlin Heidelberg, 2008,
Pages 470–484.
[6] E.W. Briao, D. Barcelos, F. Wronski, F.R. Wagner,
Impact of task migration in NoC-based MPSoCs for
soft real-time applications, IFIP International
Conference on Very Large Scale Integration (VLSI –
SoC’2007), 2007, Pages 296–299.
[7] R.I. Davis, A. Burns, Hierarchical fixed priority
preemptive scheduling, 26th IEEE Real-Time Systems
Symposium (RTSS’2005), 2005, Pages 389–398.
[8] R.I. Davis, A. Burns, A survey of hard real-time
scheduling for multiprocessor systems, ACM Comput.
Surv., Volume 43, Number 4, Article 35, 2011, 44 pages.
[9] S.V. Gheorghita, M. Palkovic, J. Hamers, A.
Vandecappelle, S. Mamagkakis, T. Basten, L.
Eeckhout, H. Corporaal, F. Catthoor, F. Vandeputte,
K. De Bosschere, System-scenario-based design of
dynamic embedded systems, ACM Trans. Des.
Autom. Electron. Syst., Volume 14, Issue 1, Article 3,
January 2009, 45 pages.
[10] S. Hong, T. Chantem, X.S. Hu, Meeting End-to-End
Deadlines through Distributed Local Deadline
Assignments, IEEE 32nd Real-Time Systems
Symposium (RTSS’11), 2011, Pages 183–192.
[11] L.S. Indrusiak, End-to-end schedulability tests for
multiprocessor embedded systems based on
networks-on-chip with priority-preemptive arbitration,
Journal of Systems Architecture, Volume 60, Issue 7,
August 2014, Pages 553–561.
[12] J.R. van Kampenhout, Deterministic Task Transfer in
Network-on-Chip Based Multi-Core Processors,
Computer Engineering, Issue 18, 2011.
[13] Y.K. Kwok, A.A. Maciejewski, H.J. Siegel, I. Ahmad,
A. Ghafoor, A semi-static approach to mapping
dynamic iterative tasks onto heterogeneous computing
systems, J. Parallel Distrib. Comput., Volume 66,
Issue 1, January 2006, Pages 77–98.
[14] G. Mariani, V. Zaccaria, G. Palermo, P. Avasare, G.
Vanmeerbeeck, C. Ykman-Couvreur, C. Silvano, An
industrial design space exploration framework for
supporting run-time resource management on
multi-core systems, International Conference on
Design, Automation and Test in Europe
(DATE’2010), 2010, Pages 196–201.
[15] D.S. Milojicic, F. Douglis, Y. Paindaveine, R.
Wheeler, S. Zhou, Process migration, ACM
Computing Surveys (CSUR), Volume 32, Issue 3,
September 2000, Pages 241–299.
[16] A. Monot, N. Navet, B. Bavoux, F. Simonot-Lion,
Multisource Software on Multicore Automotive ECUs
- Combining Runnable Sequencing With Task
Scheduling, IEEE Transactions on Industrial
Electronics, vol.59, no. 10, 2012, Pages 3934–3942.
[17] P. Munk, B. Saballus, J. Richling, H.U. Heiss, Position
Paper: Real-Time Task Migration on Many-Core
Processors, 28th International Conference on
Architecture of Computing Systems (ARCS’15), 2015,
Pages 1–4.
[18] M. Di Natale, A.L. Sangiovanni-Vincentelli, Moving
From Federated to Integrated Architectures in
Automotive: The Role of Standards, Methods and
Tools, Proceedings of the IEEE, Volume 98, Number
4, 2010, Pages 603–620.
[19] B. Nikolic, H.I. Ali, S.M. Petters, L.M. Pinho, Are
virtual channels the bottleneck of priority-aware
wormhole-switched NoC-based many-cores?, 21st
International conference on Real-Time Networks and
Systems (RTNS’13), 2013, Pages 13–22.
[20] S. Pandey, L. Wu, S.M. Guru, R. Buyya, A Particle
Swarm Optimization-Based Heuristic for Scheduling
Workflow Applications in Cloud Computing
Environments, 24th IEEE International Conference on
Advanced Information Networking and Applications
(AINA’10), 2010, Pages 400–407.
[21] J. Park, J. Harnisch, M. Deubzer, K. Jeong et al.,
Mode-Dynamic Task Allocation and Scheduling for an
Engine Management Real-Time System Using a
Multicore Microcontroller, SAE Int. J. Passeng. Cars -
Electron. Electr. Syst., Volume 7, Issue 1, 2014, Pages
133–140.
[22] R. Piscitelli, A.D. Pimentel, Design space pruning
through hybrid analysis in system-level design space
exploration, Design, Automation & Test in Europe
Conference & Exhibition (DATE’2012), 2012, Pages
781–786.
[23] R.C. Prim, Shortest connection networks and some
generalizations, Bell System Technical Journal, Volume
36, 1957, Pages 1389–1401.
[24] W. Quan, A.D. Pimentel, Exploring Task Mappings
on Heterogeneous MPSoCs using a Bias-Elitist
Genetic Algorithm, The Computing Research
Repository (CoRR), 2014.
[25] P.K. Saraswat, P. Pop, J. Madsen, Task Mapping and
Bandwidth Reservation for Mixed Hard/Soft
Fault-Tolerant Embedded Systems, 16th IEEE
Real-Time and Embedded Technology and
Applications Symposium (RTAS’10), 2010, Pages
89–98.
[26] M.N.S.M. Sayuti, L.S. Indrusiak, Real-time low-power
task mapping in Networks-on-Chip, IEEE Annual
Symposium on VLSI (ISVLSI’13), 2013, Pages 14–19.
[27] L. Schor, I. Bacivarov, D. Rai, H. Yang, S.H. Kang, L.
Thiele, Scenario-based design flow for mapping
streaming applications onto on-chip many-core
systems, ACM International conference on Compilers,
architectures and synthesis for embedded systems,
2012, Pages 71–80.
[28] A. Schranzhofer, J.J. Chen, L. Thiele, Dynamic
Power-Aware Mapping of Applications onto
Heterogeneous MPSoC Platforms, IEEE Trans. on
Industrial Informatics, Volume 6, Issue 4, November
2010, Pages 692–707.
[29] Z. Shi, A. Burns, Real-time communication analysis
for on-chip networks with wormhole switching,
ACM/IEEE International Symposium on
Networks-on-Chip (NOCS’08), 2008, Pages 161–170.
[30] S. Stuijk, M. Geilen, B. Theelen, T. Basten,
Scenario-aware dataflow: Modeling, analysis and
implementation of dynamic applications, International
Conference on Embedded Computer Systems
(SAMOS’11), 2011, Pages 404–411.
[31] R. Wilhelm, J. Engblom, A. Ermedahl, N. Holsti, S.
Thesing, D. Whalley, G. Bernat, C. Ferdinand, R.
Heckmann, T. Mitra, F. Mueller, I. Puaut, P.
Puschner, J. Staschulat, P. Stenstrom, The worst-case
execution-time problem-overview of methods and
survey of tools, ACM Trans. Embed. Comput. Syst.,
Volume 7, Number 3, Article 36, 2008, 53 Pages.
[32] K. Tindell, Adding Time-Offsets to Schedulability
Analysis, Technical Report YCS 221, Dept. of
Computer Science, University of York, England,
January 1994.
[33] H. Zhu, S. Goddard, M.B. Dwyer, Response Time
Analysis of Hierarchical Scheduling: The Synchronized
Deferrable Servers Approach, 32nd IEEE Real-Time
Systems Symposium (RTSS’11), 2011, Pages 239–248.
[34] D. Zhu, C. Qian, Challenges in Future Automobile
Control Systems with Multicore Processors, Workshop
on Developing Dependable and Secure Automotive
Cyber-Physical Systems from Components, 2011.
[35] Q. Zhu, H. Zeng, W. Zheng, M. Di Natale, A.
Sangiovanni-Vincentelli, Optimization of task
allocation and priority assignment in hard real-time
distributed systems, ACM Trans. Embed. Comput.
Syst., Volume 11, Issue 4, Article 85, January 2013, 30
Pages.
