Scheduling and Communication Synthesis for Distributed Real-Time Systems by Pop, Paul
Publication date: 2000
Citation (APA):
Pop, P. (2000). Scheduling and Communication Synthesis for Distributed Real-Time Systems.
Scheduling and Communication Synthesis for Distributed Real-Time Systems
Paul Pop
ISBN 91-7219-776-5 ISSN 0280-7971
PRINTED IN LINKÖPING, SWEDEN
BY LINKÖPING UNIVERSITY
COPYRIGHT © 2000 PAUL POP
Lianei

Abstract
EMBEDDED SYSTEMS ARE now omnipresent: from cellular phones to pagers, from microwave ovens to PDAs, almost all the devices we use are controlled by embedded systems. Many embedded systems have to fulfill strict requirements in terms of performance and cost efficiency. Emerging designs are usually based on heterogeneous architectures that integrate multiple programmable processors and dedicated hardware components. New tools that extend design automation to the system level have to support the integrated design of both the hardware and software components of such systems.
This thesis concentrates on aspects of scheduling and communication for embedded real-time systems. Special emphasis has been placed on the impact of the communication infrastructure and protocol on the overall system performance. The scheduling and communication strategies proposed are based on an abstract graph representation that captures, at process level, both the dataflow and the flow of control. We have considered non-preemptive static cyclic scheduling and preemptive scheduling with static priorities for the scheduling of processes, while the communications are statically scheduled according to the time-triggered protocol. We have developed static cyclic scheduling algorithms for time-driven systems with control and data dependencies. We show that by considering aspects of the communication protocol, significant improvements can be gained in the schedule quality. In the context of event-driven systems, we have proposed a less pessimistic schedulability analysis that is able to handle both control and data dependencies. Moreover, we have provided a schedulability analysis for the time-triggered protocol, and we have proposed several optimization strategies for the synthesis of communication protocol parameters. Extensive experiments as well as real-life examples demonstrate the efficiency of our approaches.

Acknowledgements
I WOULD LIKE to thank my advisors Petru Eles and Zebo Peng for their precious guidance during my graduate studies and their valuable comments on the thesis.
Many thanks also to the people at Volvo Technological Development in Gothenburg, especially to Jakob Axelsson, for their insightful ideas during the early stages of this work.
I am also grateful to my colleagues at IDA and in the ARTES network for providing a creative and pleasant working environment, and to the administrative and technical staff at IDA, who have always been supportive.
Last, but not least, my deepest gratitude goes to my family and friends for their love and encouragement.

Contents
1 Introduction
  Motivation
  Problem Formulation
  Contributions
  Thesis Overview
2 System Model and Architecture
  Design Representation
  Conditional Process Graph
  System Architecture
  Time vs. Events
  Hardware Architecture
  Software Architecture
3 Related Work
  Codesign
  Scheduling
  Aspects Related to Communication
4 Scheduling and Bus Access Optimization for Time Driven Systems
  Scheduling with Control and Data Dependencies
  List Scheduling based Algorithm
  PCP Priority Function
  Scheduling for Time Driven Systems
  Scheduling of Messages with the Time Triggered Protocol
  Improved Priority Function
  Communication Synthesis
  Experimental Results
5 Schedulability Analysis and Communication Synthesis for Event Driven Systems
  Schedulability Analysis
  Schedulability Analysis with the Time Triggered Protocol
  Static Single Message Allocation
  Static Multiple Message Allocation
  Dynamic Message Allocation
  Dynamic Packets Allocation
  Schedulability Analysis under Control and Data Dependencies
  Tasks with Data Dependencies
  Conditional Process Graphs
  Communication Synthesis
  Experimental Results
6 Application
  Cruise Controller
  Experimental Results
7 Conclusions and Future Work
  Conclusions
  Future Work
Chapter 1
Introduction
THIS THESIS CONCENTRATES on aspects related to the scheduling and synthesis of distributed embedded real-time systems consisting of programmable processors and application-specific hardware components.
We have investigated the impact of particular communication infrastructures and protocols on the overall performance, and how the requirements of such an infrastructure have to be considered for process and communication scheduling. Not only do the particularities of the underlying architecture have to be considered during scheduling, but the parameters of the communication protocol should also be adapted to fit the particular embedded application.
The approaches to scheduling and system synthesis are based
on an abstract graph representation which captures, at process
level, both dataflow and the flow of control.
This introductory chapter presents the motivation behind our research, the formulation of the research problems, and our contributions. An overview of the thesis is also presented.
1.1 Motivation
Figure 1.1 presents the microprocessor market share in the year 1999 [Tur99]. As we can see from the figure, less than 1% of the world's microprocessors are used in general-purpose systems (i.e., computers); more than 99% are used in embedded real-time systems. Embedded real-time systems are now omnipresent: from cellular phones to pagers, from microwave ovens to PDAs, almost all the devices we use are controlled by embedded systems.
Many embedded systems have to fulfill strict requirements in terms of performance and cost efficiency. Emerging designs are usually based on heterogeneous architectures that integrate multiple programmable processors and dedicated hardware components. New tools that extend design automation to the system level have to support the integrated design of both the hardware and software components of such systems.
During the synthesis of an embedded system, the designer maps the functionality captured by the input specification onto different architectures, trying to find the most cost-efficient solution that, at the same time, meets the design requirements [Ern98]. This design process implies the iterative execution of several allocation and partitioning steps before the hardware and software components of the final implementation are generated. The term hardware/software codesign is often used to denote this system-level design process. Surveys on this topic can be found in [Mic96, Mic97, Ern98, Gaj95, Sta97, Wol94].
Figure 1.1: Microprocessor Market Shares (Embedded Systems: 99%; General Purpose: 1%)
Figure 1.2 presents one possible codesign flow. The design starts with an abstract system specification. The initial system specification is implementation-independent, which means that no assumptions are made concerning how different parts will later be implemented. Thus, different implementation alternatives can be evaluated, including hardware/software trade-offs. Moving further into the design process, the designer has to decide which components to include in the hardware architecture and how these components are connected. This is the so-called architecture selection phase. Following the selection of the architecture components, the designer has to decide which part of the functionality should be implemented on which of the selected components (mapping) and what the execution order of the resulting tasks is (scheduling).
Figure 1.2: A Codesign Flow (stages: System Specification, Architecture Selection, Partitioning, Scheduling, Hardware Synthesis, Software Synthesis, Integration)
Scheduling has to be performed during several phases of the design flow. We can, for example, use scheduling for performance estimation during the architecture selection and mapping phases, where we want to quickly explore design alternatives and compare them in terms of timing behaviour. In addition, scheduling can also be used during the final stages of the design process, when we want to synthesize the system such that the time constraints are fulfilled.
Once a partitioning into hardware and software and a mapping have been decided on, the design process continues with the software synthesis and hardware synthesis phases. In the final phase, the hardware and software parts are integrated and tested. All these design steps can partially overlap, and they can be assisted by (semi)automatic synthesis tools.
In this thesis we concentrate on several aspects related to the scheduling and synthesis of systems consisting of communicating processes that are implemented on multiple processors and dedicated hardware components. In such a system, in which several processes communicate with each other and share resources like processors and buses, the scheduling of processes and communications has a decisive influence on the performance of the system and on the way it meets its timing constraints.
1.2 Problem Formulation
The input for our problem is a model of a real-time system captured using a set of conditional process graphs [Ele98a, Ele00], described in detail in Section 2.1.1. Each node in this graph represents one process that can potentially be assigned to one of several programmable or hardware processors. The estimated worst-case execution time of each process on each potential host processor is given. We assume that the amount of data to be transferred during communication between two processes has been determined in advance.
We consider a generic architecture consisting of programmable processors and application-specific hardware processors (ASICs) connected through several buses. As the communication infrastructure for our distributed real-time system we consider the time-triggered protocol (TTP) [Kop94]. TTP is well suited for safety-critical distributed real-time embedded systems and represents one of the emerging standards for several application areas, for example automotive electronics [Wir98]. Chapter 2 describes in more detail the system architectures considered, with Section 2.2.2 introducing the time-triggered protocol.
In our approach, process scheduling can use either a non-preemptive static cyclic or a static priority preemptive scheduling approach, while the bus communication is statically scheduled according to the TTP. Only one process can be executed at a time by a programmable processor, while a hardware processor can execute processes in parallel. Processes on different processors can be executed in parallel. Only one data transfer can be performed by a bus at a given moment. Data transfer on buses and computation can overlap.
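The resource rules above can be checked mechanically on a candidate schedule. The Python sketch below is only an illustration under our own assumptions: the Activity record, the conflicts function and the resource kinds "programmable", "hardware" and "bus" are hypothetical names, and start and finish times are taken as already given.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Activity:
    name: str        # process or message name
    resource: str    # processor or bus it is mapped to
    start: float
    finish: float

def conflicts(activities, kinds):
    """Return pairs of activities that break the resource rules.

    kinds maps each resource to "programmable", "hardware" or "bus".
    Programmable processors and buses serve one activity at a time,
    while a hardware processor may run several processes in parallel.
    Activities on different resources (e.g. a bus transfer and a
    computation) are always allowed to overlap.
    """
    by_resource = {}
    for a in activities:
        by_resource.setdefault(a.resource, []).append(a)
    bad = []
    for res, acts in by_resource.items():
        if kinds[res] == "hardware":
            continue
        for i, a in enumerate(acts):
            for b in acts[i + 1:]:
                if a.start < b.finish and b.start < a.finish:  # intervals overlap
                    bad.append((a.name, b.name))
    return bad

kinds = {"PE1": "programmable", "ASIC": "hardware", "bus": "bus"}
schedule = [
    Activity("P1", "PE1", 0, 3), Activity("P2", "PE1", 2, 5),    # clash on a CPU
    Activity("P3", "ASIC", 0, 4), Activity("P4", "ASIC", 1, 3),  # fine on hardware
    Activity("m1", "bus", 3, 4),                                 # overlaps P2: fine
]
print(conflicts(schedule, kinds))  # [('P1', 'P2')]
```

The pairwise comparison is quadratic per resource, which is acceptable for a sketch; intervals that merely touch (one finishes exactly when the next starts) are not reported.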
Algorithms for automatic hardware/software partitioning have been presented in [Axe96, Ele97, Ern98]. The problems discussed in this thesis concern the performance estimation of a given design alternative and the scheduling of processes and communications. Thus, we assume that each process has been assigned to a (programmable or hardware) processor and that each communication channel connecting processes assigned to different processors has been assigned to a bus.
In this context, our goals are the following:
 • derive a schedulability analysis for systems with both control and data dependencies,
 • derive a schedulability analysis for systems where the communication takes place using the time-triggered protocol,
 • determine the smallest possible worst-case delay by which the system completes its execution and generate the static schedule such that this delay is guaranteed, and
 • determine the parameters of the communication protocol so that the overall system performance is optimized and, thus, the imposed time constraints can be satisfied.
1.3 Contributions
In our approach, an embedded system is viewed as a set of interacting processes mapped on an architecture consisting of several programmable processors and ASICs interconnected by a communication channel.
Process interaction captures not only the dataflow but also the flow of control, since some processes can be activated depending on conditions computed by previously executed processes.
We have considered both the non-preemptive static cyclic scheduling and the static priority preemptive scheduling approaches for the scheduling of processes, while the communications are statically scheduled according to the TTP.
The scheduling strategies are based on a realistic communication model and execution environment. We take into consideration the overheads due to communications and to the execution environment, and we consider the requirements of the communication protocol during the scheduling process.
The main contributions of this thesis are:
 • a less pessimistic schedulability analysis technique to bound the response time of a hard real-time system with both control and data dependencies (modelled as a conditional process graph) [Pop00b, Pop00c];
 • a schedulability analysis in the context of the time-triggered protocol, considering four different approaches to message scheduling [Pop00a, Pop99d];
 • a static scheduling strategy for systems with both data and control dependencies, which takes into consideration the overheads due to communications and to the execution environment and considers the requirements of the communication protocol during the scheduling process [Pop99a, Pop99c]; and
 • several optimization strategies for the synthesis of the bus access scheme in order to fit the communication particularities of a certain application [Pop00a, Pop99a, Pop99b].
1.4 Thesis Overview
This thesis has seven chapters and is structured as follows:
 • Chapter 2 introduces the conditional process graph, describes the hardware and software architectures considered and presents the time-triggered protocol.
 • Chapter 3 presents the related work in the areas of scheduling and communication synthesis, as well as some basic approaches to hardware/software codesign.
 • Chapter 4 considers a non-preemptive static scheduling approach for both processes and messages. In such a context, we present previous work on the static cyclic scheduling of systems with data and control dependencies. This work is then extended to handle the scheduling of messages over the TTP. Several approaches to the synthesis of communication parameters for the TTP are proposed and later evaluated based on extensive experiments.
 • Chapter 5 assumes a preemptive fixed priority scheduling approach for the processes and a non-preemptive static cyclic scheduling approach for the messages, according to the TTP. A schedulability analysis of the TTP is developed considering four message scheduling approaches. This analysis is then extended to systems with data and control dependencies. Optimization strategies that derive the parameters of the communication protocol are proposed. Extensive experiments evaluate the optimization strategies and show that by considering both data and control dependencies we are able to reduce the pessimism of the analysis.
 • Chapter 6 presents a real-life example. We apply our scheduling and communication synthesis strategies to a vehicle cruise controller, and the results obtained validate our research.
 • Chapter 7 is the final chapter of the thesis and presents our conclusions and ideas for future work.
Chapter 2
System Model and Architecture
THIS CHAPTER PRESENTS preliminaries for the later discussions. We start by introducing the conditional process graph that is used for system modelling, and then continue with the presentation of the hardware architecture considered. Our contribution is the software architecture designed for both time-driven and event-driven systems.
2.1 Design Representation
The specification at the input of the design process outlined in Figure 1.2 in the previous chapter can be very heterogeneous: different formalisms are used to specify and model different parts of the system. The information needed for the subsequent design phases, such as architecture selection, partitioning, scheduling and verification, then has to be extracted and mapped to internal representations that are better suited for that purpose. In certain cases, different internal models can be used for different tasks to be performed during system analysis and design.
There is a lot of research in the area of system modelling and
specification, and an impressive number of representations have
been proposed. An overview and classification of different design
representations is given in [Edw97, Ern99].
In this thesis we use the conditional process graph [Ele98,
Ele00] as an abstract model for system representation.
2.1.1 CONDITIONAL PROCESS GRAPH
A process graph is an abstract representation consisting of a directed, acyclic, polar graph G(V, ES, EC). Each node Pi∈V represents one process. ES and EC are the sets of simple and conditional edges, respectively. ES ∩ EC = ∅ and ES ∪ EC = E, where E is the set of all edges. An edge eij∈E from Pi to Pj indicates that the output of Pi is the input of Pj. The graph is polar, which means that there are two nodes, called source and sink, that conventionally represent the first and last process. These nodes are introduced as dummy processes, with zero execution time and no resources assigned, so that all other nodes in the graph are successors of the source and predecessors of the sink, respectively.
A mapped process graph, Γ(V*, ES*, EC*, M), is generated from a process graph G(V, ES, EC) by inserting additional processes (communication processes) on certain edges and by mapping each process to a given processing element. The mapping of processes Pi∈V* to processors and buses is given by a function M: V*→PE, where PE={pe1, pe2, ..., peNpe} is the set of processing elements. PE=PP∪HP∪B, where PP is the set of programmable processors, HP is the set of dedicated hardware components, and B is the set of allocated buses. In certain contexts, we will call both programmable processors and hardware components simply processors. For any process Pi, M(Pi) is the processing element to which Pi is assigned for execution. In the rest of this thesis, when we use the term conditional process graph (CPG), we consider a mapped process graph as defined here.
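To make the definition concrete, the following Python sketch encodes a mapped process graph and checks the structural properties stated above: disjoint edge sets, acyclicity, polarity, and a mapping for every process except the dummy source and sink. The class and field names, and the small example graph, are our own hypothetical choices, not part of the model's definition.

```python
from dataclasses import dataclass

@dataclass
class CPG:
    """Illustrative encoding of a mapped process graph Γ(V*, ES*, EC*, M)."""
    processes: set          # V*
    simple_edges: set       # ES: (Pi, Pj) pairs
    conditional_edges: dict # EC: (Pi, Pj) -> (condition, value)
    mapping: dict           # M: process -> processing element
    source: str
    sink: str

    def validate(self):
        ec = set(self.conditional_edges)
        if self.simple_edges & ec:
            raise ValueError("ES and EC must be disjoint")
        pred = {p: set() for p in self.processes}
        succ = {p: set() for p in self.processes}
        for i, j in self.simple_edges | ec:
            succ[i].add(j)
            pred[j].add(i)
        # Acyclicity (Kahn's algorithm): every node must eventually be removed.
        indeg = {p: len(pred[p]) for p in self.processes}
        ready = [p for p, d in indeg.items() if d == 0]
        removed = 0
        while ready:
            p = ready.pop()
            removed += 1
            for q in succ[p]:
                indeg[q] -= 1
                if indeg[q] == 0:
                    ready.append(q)
        if removed != len(self.processes):
            raise ValueError("graph is not acyclic")
        # Polarity: in a DAG, a unique node without predecessors (successors)
        # precedes (follows) every other node.
        if {p for p in self.processes if not pred[p]} != {self.source}:
            raise ValueError("source is not the only node without predecessors")
        if {p for p in self.processes if not succ[p]} != {self.sink}:
            raise ValueError("sink is not the only node without successors")
        # The dummy source and sink have no resources; all other processes are mapped.
        unmapped = self.processes - {self.source, self.sink} - set(self.mapping)
        if unmapped:
            raise ValueError(f"unmapped processes: {sorted(unmapped)}")
        return True

g = CPG(
    processes={"P0", "P1", "P2", "P3", "P4", "P5"},
    simple_edges={("P0", "P1"), ("P2", "P4"), ("P3", "P4"), ("P4", "P5")},
    conditional_edges={("P1", "P2"): ("C", True), ("P1", "P3"): ("C", False)},
    mapping={"P1": "pe1", "P2": "pe1", "P3": "pe2", "P4": "pe1"},
    source="P0", sink="P5",
)
print(g.validate())  # True
```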
Each process Pi, assigned to a programmable or hardware
processor M(Pi), is characterized by an execution time tPi.
Figure 2.1: Conditional Process Graph (processes P0 to P15 with conditions C and D; legend: Processor 1, Processor 2, ASIC, Bus; execution times shown to the right of each node)
In the process graph depicted in Figure 2.1, P0 and P15 are the source and sink nodes, respectively. The nodes denoted P1, P2, ..., P14 are “ordinary” processes specified by the designer. They are assigned to one of the two programmable processors or to the hardware component (ASIC). The rest of the nodes are so-called communication processes, and they are represented in Figure 2.1 as solid circles. They are introduced during the generation of the system representation for each connection that links processes mapped to different processors. These processes model inter-processor communication, and their execution time ti,j (where Pi is the sender and Pj the receiver process) is equal to the corresponding communication time. All communications in Figure 2.1 are performed on one bus.
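The construction of communication processes can be sketched as an edge-splitting pass. The Python function below is a minimal illustration, assuming a single bus (as in Figure 2.1) and a given communication time per edge; the "Pi->Pj" naming scheme and all parameter names are hypothetical.

```python
def insert_communication_processes(edges, mapping, comm_time, bus="bus"):
    """Insert a communication process on every edge between different processors.

    edges: list of (Pi, Pj); mapping: process -> processor (the dummy source
    and sink are absent, so their edges are left alone); comm_time:
    (Pi, Pj) -> communication time. Returns the new edge list, the extended
    mapping and the execution times of the inserted processes.
    """
    new_edges, new_mapping, times = [], dict(mapping), {}
    for pi, pj in edges:
        src, dst = mapping.get(pi), mapping.get(pj)
        if src is not None and dst is not None and src != dst:
            comm = f"{pi}->{pj}"               # hypothetical naming scheme
            new_mapping[comm] = bus            # single bus, as in Figure 2.1
            times[comm] = comm_time[(pi, pj)]  # execution time = communication time
            new_edges += [(pi, comm), (comm, pj)]
        else:
            new_edges.append((pi, pj))
    return new_edges, new_mapping, times

edges = [("P0", "P1"), ("P1", "P2"), ("P1", "P3"), ("P2", "P4"), ("P3", "P4")]
mapping = {"P1": "pe1", "P2": "pe1", "P3": "pe2", "P4": "pe1"}
new_edges, new_mapping, times = insert_communication_processes(
    edges, mapping, comm_time={("P1", "P3"): 3, ("P3", "P4"): 1})
print(times)  # {'P1->P3': 3, 'P3->P4': 1}
```

A fuller version would also carry the condition of a conditional edge over to the edge leading into the inserted process; the sketch treats all edges alike.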
An edge eij∈EC is a conditional edge (represented with thick
lines in Figure 2.1) and has an associated condition value.
Transmission on such an edge takes place only if the associated
condition value is true and not, like on simple edges, for each
activation of the input process Pi. In Figure 2.1 processes P1 and
P7 have conditional edges at their output.
We call a node with conditional edges at its output a disjunction node (and the corresponding process a disjunction process). A disjunction process has one associated condition, the value of which it computes. Alternative paths starting from a disjunction node, which correspond to complementary values of the condition, are disjoint, and they meet in a so-called conjunction node (with the corresponding process called a conjunction process)¹. In Figure 2.1, circles representing conjunction and disjunction nodes are depicted with thick borders. The alternative paths starting from disjunction node P1, which computes condition C, meet in conjunction node P5. We assume that conditions are independent and that alternatives starting from different processes cannot depend on the same condition.
¹ If no process is specified on an alternative path, it is modelled by a conditional edge from the disjunction to the corresponding conjunction node (a communication process may be inserted on this edge at mapping).
A process that is not a conjunction process can be activated only after all its inputs have arrived. A conjunction process can be activated after messages coming on one of the alternative paths have arrived. All processes issue their outputs when they terminate. If we consider the activation time of the source process as a reference, the activation time of the sink process is the delay of the system at a certain execution. In the worst case, this delay has to be smaller than a certain imposed deadline. Release times of some processes as well as multiple deadlines can be easily modelled by inserting dummy nodes between certain processes and the source or the sink node, respectively. These dummy nodes represent processes with a certain execution time but which are not allocated to any processing element.
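The activation rules above determine, for a given combination of condition values, when the sink is reached. The Python sketch below computes this delay under a simplifying assumption of ours: resources are unlimited, so each process starts as soon as its inputs have arrived (a real schedule must also respect the processor and bus constraints listed in the problem formulation). Taking the maximum over all condition values gives the quantity that must stay below the deadline. All names and the small graph are hypothetical and are not taken from Figure 2.1.

```python
from graphlib import TopologicalSorter
from itertools import product

def execution_delay(preds, exec_time, cond_edges, conjunctions, values, source, sink):
    """Activation time of the sink for one execution, given condition values.

    preds: process -> list of predecessors; exec_time: process -> time
    (dummy nodes default to 0); cond_edges: (Pi, Pj) -> (condition, value);
    conjunctions: set of conjunction processes; values: condition -> bool.
    """
    active, start, finish = set(), {}, {}

    def taken(i, p):
        # A message flows on edge (i, p) if i ran and the edge's condition holds.
        if i not in active:
            return False
        if (i, p) in cond_edges:
            cond, val = cond_edges[(i, p)]
            return values[cond] == val
        return True

    for p in TopologicalSorter(preds).static_order():
        if p == source:
            start[p] = 0
        else:
            arrived = [i for i in preds[p] if taken(i, p)]
            # Conjunctions need one alternative path; other processes need all inputs.
            needed = 1 if p in conjunctions else len(preds[p])
            if not arrived or len(arrived) < needed:
                continue  # p is not activated in this execution
            start[p] = max(finish[i] for i in arrived)
        active.add(p)
        finish[p] = start[p] + exec_time.get(p, 0)
    return start[sink]

def worst_case_delay(conditions, **graph):
    """Largest delay over all combinations of condition values."""
    return max(
        execution_delay(values=dict(zip(conditions, vals)), **graph)
        for vals in product([False, True], repeat=len(conditions))
    )

graph = dict(
    preds={"P0": [], "P1": ["P0"], "P2": ["P1"], "P3": ["P1"],
           "P4": ["P2", "P3"], "P5": ["P4"]},
    exec_time={"P1": 2, "P2": 5, "P3": 1, "P4": 1},
    cond_edges={("P1", "P2"): ("C", True), ("P1", "P3"): ("C", False)},
    conjunctions={"P4"}, source="P0", sink="P5",
)
print(execution_delay(values={"C": False}, **graph))  # 4
print(worst_case_delay(["C"], **graph))               # 8
```

Enumerating all condition values is exponential in the number of conditions, so this brute-force view is only meant to illustrate the semantics, not to serve as a scheduling algorithm.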
A boolean expression XPi, called a guard, can be associated with each node Pi in the graph. It represents the necessary conditions for the respective process to be activated. XPi is not only necessary but also sufficient for process Pi to be activated during a given system execution. Thus, two nodes Pi and Pj, where Pj is not a conjunction node, are connected by an edge eij only if XPj⇒XPi (which means that XPi is true whenever XPj is true). This avoids specifications in which a process is blocked even though its guard is true, because it waits for a message from a process that will not be activated. If Pj is a conjunction node, predecessor nodes Pi can be situated on alternative paths corresponding to a condition.
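The edge condition XPj⇒XPi can be checked by brute force over all condition values. In the sketch below, guards are modelled (by our own choice) as Python predicates over a dictionary of condition values; the graph fragment, including node P6 and its guard, is hypothetical and is not taken from Figure 2.1.

```python
from itertools import product

def implies(g1, g2, conditions):
    """True if guard g1 implies guard g2 under every assignment of the conditions."""
    for vals in product([False, True], repeat=len(conditions)):
        env = dict(zip(conditions, vals))
        if g1(env) and not g2(env):
            return False
    return True

def blocking_edges(edges, guards, conjunctions, conditions):
    """Edges e_ij into a non-conjunction node Pj that violate X_Pj => X_Pi."""
    return [(i, j) for i, j in edges
            if j not in conjunctions and not implies(guards[j], guards[i], conditions)]

guards = {
    "P1": lambda e: True,
    "P2": lambda e: e["C"],
    "P3": lambda e: not e["C"],
    "P4": lambda e: True,              # conjunction node for condition C
    "P6": lambda e: e["C"] and e["D"],
}
edges = [("P1", "P2"), ("P1", "P3"), ("P2", "P4"), ("P3", "P4"),
         ("P2", "P6"), ("P3", "P6")]
print(blocking_edges(edges, guards, {"P4"}, ["C", "D"]))  # [('P3', 'P6')]
```

The flagged edge is exactly the situation the rule excludes: whenever C and D are both true, P6's guard holds, but P3 is never activated, so P6 would wait forever for its message.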
The above execution semantics is that of a so-called single-rate system. It assumes that a node is executed at most once for each activation of the system. If processes with different periods have to be handled, this can be solved by generating several instances of the processes and building a CPG that corresponds to the set of process instances occurring within a time period equal to the least common multiple of the periods of the involved processes.
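The multi-rate to single-rate transformation amounts to unrolling each process over the least common multiple of the periods. The Python sketch below only counts and names the instances; the naming scheme and the release time k·T of instance k are our assumptions (in the CPG such release times would be modelled with dummy nodes, as described above), and building the edges between instances is left out.

```python
from math import lcm  # Python 3.9+

def expand_instances(periods):
    """Hyperperiod and the instances each process has within it.

    periods: process -> period (integer time units). Instance k of a process
    with period T is assumed to be released at k * T.
    """
    hyper = lcm(*periods.values())
    instances = {p: [(f"{p}_{k}", k * T) for k in range(hyper // T)]
                 for p, T in periods.items()}
    return hyper, instances

hyper, inst = expand_instances({"P1": 20, "P2": 30, "P3": 60})
print(hyper)       # 60
print(inst["P1"])  # [('P1_0', 0), ('P1_1', 20), ('P1_2', 40)]
```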
As mentioned, we consider the execution times of processes, as well as the communication times, to be given. In Figure 2.1 they are depicted to the right of each node. In the case of hard real-time systems these will typically be worst-case execution times, whose estimation has been extensively discussed in the literature [Eng99, Li95, Lun99, Mal97]. For many applications, the actual execution times of processes depend on the current data and/or the internal state of the system. By explicitly capturing the control flow in our model, we allow for more fine-tuned modelling and a tighter (less pessimistic) assignment of worst-case execution times to processes, compared to traditional data-flow based approaches.
2.2 System Architecture
As pointed out in the introductory chapter, real-time systems are nowadays omnipresent. Depending on the particular application, real-time systems can be implemented as uniprocessor, multiprocessor, or distributed systems. Systems can be hard or soft, event-driven or time-driven, fault-tolerant, autonomous, etc. A good classification of real-time systems is given in [Kop97a].
This section describes the architecture we consider in this thesis for the implementation of a distributed real-time system. Our hardware architecture consists of a set of nodes interconnected by a communication channel that uses the time-triggered protocol. The software architecture depends on the triggering mechanisms for the start of communication and processing activities.
2.2.1 TIME VS. EVENTS
According to [Kop97a], a trigger is “an event that causes the start of some action, e.g., the execution of a task or the transmission of a message.” Different approaches to the design of real-time systems can be identified, based on the triggering mechanisms for the processing and communication: event-triggered or time-triggered.
In the event-triggered approach, all the activities happen when a significant change of state occurs. The significant events are brought to the attention of the CPU by the interrupt mechanism. Event-triggered systems typically require preemptive priority-based scheduling, where the appropriate process is invoked to service the event.
In the time-triggered approach, all the activities are initiated at predetermined points in time. Thus, there is only one interrupt in each node of a distributed time-triggered system: the timer interrupt. In a distributed time-triggered system, it is assumed that the clocks of all nodes are synchronized to provide a global notion of time. Time-triggered systems typically require non-preemptive static cyclic scheduling, where the process activation or message communication is done based on a schedule table built off-line.
We consider the time-triggered protocol for the communication infrastructure and, thus, the communication of messages is time-triggered. However, depending on the particular application, the activation of processes can be either time-triggered (Chapter 4) or event-triggered (Chapter 5).
2.2.2 HARDWARE ARCHITECTURE
We consider architectures consisting of nodes connected by a broadcast communication channel (Figure 2.2). Every node consists of a TTP controller [Kop97b], a CPU, a RAM, a ROM and an I/O interface to sensors and actuators. A node can also have an ASIC in order to accelerate parts of its functionality.
Time Triggered Protocol. Communication between nodes is based on the time-triggered protocol (TTP) [Kop94]. TTP was designed for distributed real-time applications that require predictability and reliability (e.g., drive-by-wire). It integrates all the services necessary for fault-tolerant real-time systems. The TTP services of importance to our problems are: message transport with acknowledgment and predictable low latency, clock synchronization within the microsecond range, and rapid mode changes.
The communication channel is a broadcast channel, so a message sent by a node is received by all the other nodes. The bus access scheme is time-division multiple access (TDMA) (Figure 2.3). Each node Ni can transmit only during a predetermined time interval, the so-called TDMA slot Si. In such a slot, a node can send several messages packaged in a frame. We consider that a slot Si is at least large enough to accommodate the largest message generated by any process assigned to node Ni, so the messages do not have to be split in order to be sent. A sequence of slots corresponding to all the nodes in the architecture is called a TDMA round. A node can have only one slot in a TDMA round. Several TDMA rounds can be combined together in a cycle that is repeated periodically. The sequence and length of the slots are the same for all the TDMA rounds. However, the length and contents of the frames may differ.
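Because every round repeats the same slot sequence, the transmission opportunities of a node follow directly from the slot lengths. The Python sketch below illustrates this, assuming that round 0 starts at time 0 and that slot i belongs to node Ni; the class name and the slot lengths are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class TDMARound:
    """Slot lengths of one TDMA round; slot i belongs to node N_i.

    The sequence and lengths of the slots are identical in every round.
    """
    slot_lengths: list

    @property
    def length(self):
        return sum(self.slot_lengths)

    def slot_start(self, node, round_index=0):
        """Start time of node's slot in a given round (round 0 starts at t = 0)."""
        return round_index * self.length + sum(self.slot_lengths[:node])

    def next_slot_start(self, node, t):
        """Earliest start of node's slot at or after time t."""
        r = int(t // self.length)
        start = self.slot_start(node, r)
        return start if start >= t else self.slot_start(node, r + 1)

rnd = TDMARound([2, 3, 2, 3])     # slots S0..S3, hypothetical lengths
print(rnd.length)                 # 10
print(rnd.next_slot_start(1, 3))  # 12: S1 of round 0 already started at 2
```

Since a node owns exactly one slot per round, a message that becomes ready just after its node's slot has started waits almost a full round; this is why slot order and lengths matter for the timing of messages.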
Every node has a TTP controller that implements the protocol services and runs independently of the node’s CPU. Communication with the CPU is performed through a so-called message base interface (MBI), which is usually implemented as a dual-ported RAM (Figure 2.4).
Figure 2.2: System Architecture (each node contains a TTP controller, a CPU, RAM, ROM, an ASIC and an I/O interface to sensors/actuators)
The TDMA access scheme is imposed by a so-called message descriptor list (MEDL) that is located in every TTP controller. The MEDL basically contains the time when a frame has to be sent or received, the address of the frame in the MBI, and the length of the frame. The MEDL serves as a schedule table for the TTP controller, which has to know when to send or receive a frame to or from the communication channel.
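Since the MEDL acts as a schedule table, the controller's lookup can be sketched as a search over entries sorted by their time within the cycle. The entry fields below follow the three items listed in the text, plus a send/receive flag; the field names, the time unit and the wrap-around policy are our assumptions and do not reproduce the MEDL format of the TTP specification.

```python
from bisect import bisect_left
from dataclasses import dataclass

@dataclass(frozen=True)
class MedlEntry:
    time: int          # offset of the frame within the cycle
    direction: str     # "send" or "receive"
    mbi_address: int   # address of the frame in the MBI
    length: int        # length of the frame

class Medl:
    """Hypothetical MEDL: entries sorted by their time within a repeating cycle."""

    def __init__(self, entries, cycle_length):
        self.entries = sorted(entries, key=lambda e: e.time)
        self.times = [e.time for e in self.entries]
        self.cycle_length = cycle_length

    def next_entry(self, now):
        """Return (entry, absolute time) of the next frame event at or after now."""
        cycle_start = now - now % self.cycle_length
        i = bisect_left(self.times, now - cycle_start)
        if i == len(self.entries):      # past the last entry: wrap to the next cycle
            cycle_start += self.cycle_length
            i = 0
        entry = self.entries[i]
        return entry, cycle_start + entry.time

medl = Medl([MedlEntry(10, "receive", 0x10, 6), MedlEntry(0, "send", 0x00, 8),
             MedlEntry(20, "send", 0x20, 4)], cycle_length=40)
entry, when = medl.next_entry(25)
print(entry.time, entry.direction, when)  # 0 send 40
```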
The TTP controller provides each CPU with a timer interrupt based on a local clock, synchronized with the local clocks of the other nodes. The clock synchronization is done by comparing the a priori known time of arrival of a frame with its observed arrival time. By applying a clock synchronization algorithm, TTP provides a global time base of known precision without any overhead on the communication.
Information transmitted on the bus has to be properly formatted in a frame. A TTP frame has the following fields: start of frame, control field, data field, and CRC field. The data field can contain one or more application messages.
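A frame whose data field carries several messages can be sketched as follows. The one-byte start-of-frame marker, the one-byte control field and the use of Python's zlib.crc32 are placeholders chosen for illustration only; TTP defines its own field encodings and CRC, which are not reproduced here.

```python
import struct
import zlib

SOF = b"\x7e"  # hypothetical start-of-frame marker, not TTP's real encoding

def build_frame(control: int, messages: list) -> bytes:
    """Pack several application messages into one frame's data field."""
    body = SOF + bytes([control]) + b"".join(messages)
    crc = zlib.crc32(body)                 # stand-in for the protocol's own CRC
    return body + struct.pack(">I", crc)

def frame_is_valid(frame: bytes) -> bool:
    """Check the marker and recompute the checksum over everything before it."""
    body, (crc,) = frame[:-4], struct.unpack(">I", frame[-4:])
    return body[:1] == SOF and zlib.crc32(body) == crc

frame = build_frame(0x01, [b"\x01\x02", b"\x03"])  # two messages in one frame
print(len(frame), frame_is_valid(frame))           # 9 True
```

The sketch stores no message boundaries in the frame; the receiver is assumed to know the offset and length of each message in the data field, for example from the schedule.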
2.2.3 SOFTWARE ARCHITECTURE
We have designed two distinct software architectures: one for time-triggered systems and another for event-triggered systems. The main component of both software architectures is a real-time kernel that runs on top of each node of the architecture.
Figure 2.3: Bus Access Scheme (a cycle of two TDMA rounds, each consisting of slots S0, S1, S2 and S3; each slot carries a frame)
Time Triggered Systems. Each kernel in the software architecture for time-triggered systems has a schedule table. This schedule table contains all the information needed to take decisions on the activation of processes and the transmission of messages, based on the values of conditions (Table 4.1).
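One way to picture such a schedule table is as a map from each process to activation times, one per combination of condition values (a guard). The Python sketch below shows the kernel-side lookup; this layout is our own simplification and is not meant to reproduce the format of Table 4.1, and the times are illustrative.

```python
def activation_time(table, process, known):
    """Start time of a process, given the condition values known so far.

    table: process -> {guard: time}, where a guard is a frozenset of
    (condition, value) literals (frozenset() means "always").
    known: condition -> bool for the conditions computed so far.
    Returns None if no guard matches (not activated, or not yet decidable).
    """
    for guard, t in table[process].items():
        if all(known.get(c) == v for c, v in guard):
            return t
    return None

C, notC = frozenset({("C", True)}), frozenset({("C", False)})
table = {
    "P1": {frozenset(): 0},
    "P2": {C: 4},
    "P3": {notC: 4},
    "P5": {C: 9, notC: 6},  # conjunction process: time depends on the path taken
}
print(activation_time(table, "P5", {"C": False}))  # 6
print(activation_time(table, "P2", {}))            # None: C is not known yet
```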
In order to run a predictable hard real-time application, the overhead of the kernel and the worst-case administrative overhead (WCAO) of every system call have to be determined. In a time-triggered system, all the activity is derived from the progression of time, which means that there are no interrupts other than the timer interrupt.
Several activities, like polling of the I/O or diagnostics, take
place directly in the timer interrupt routine. The overhead due
to this routine is expressed as the utilization factor Ut. Ut repre-
sents a fraction of the CPU power utilized by the timer interrupt
routine, and has an influence on the execution times of the proc-
esses.
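One simple way to reflect Ut in process execution times, assuming the timer routine's load is spread evenly over time, is to stretch each worst case execution time by 1/(1 - Ut). The function name `inflate_wcet` is hypothetical, and the sketch is an illustration rather than the exact accounting used in this thesis.

```python
def inflate_wcet(wcet: float, u_t: float) -> float:
    """Execution time of a process once the timer interrupt routine is
    accounted for, assuming the routine steals a fraction u_t of the CPU
    evenly over time (so only 1 - u_t of each time unit is left)."""
    if not 0.0 <= u_t < 1.0:
        raise ValueError("u_t must be in [0, 1)")
    return wcet / (1.0 - u_t)


print(inflate_wcet(4.0, 0.5))  # 8.0
```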
Figure 2.4: Message Passing, Time-Driven Systems (nodes N0 and N1, each with a CPU running the RT-Kernel, an MBI and a TTP controller; P1 and P2 on N0, P3 on N1; messages m1 and m2)
We also have to take into account the overheads for process
activation and message passing. For process activation we con-
sider an overhead δPA. The message passing mechanism is illus-
trated in Figure 2.4, where we have three processes, P1 to P3. P1
and P2 are mapped to node N0 that transmits in slot S0, and P3
is mapped to node N1 that transmits in slot S1. Message m1 is
transmitted between P1 and P2 that are on the same node, while
message m2 is transmitted from P1 to P3 between the two nodes.
We consider that each process has its own memory locations for
the messages it sends or receives and that the addresses of the
memory locations are known to the kernel through the schedule
table.
P1 is activated according to the schedule table, and when it
finishes it calls the send kernel function in order to send m1, and
then m2. Based on the schedule table, the kernel copies m1 from
the corresponding memory location in P1 to the memory location
in P2. The time needed for this operation represents the WCAO
δS for sending a message between processes located on the same
node.¹ When P2 is activated, it finds the message in the right
location. According to our scheduling policy, whenever a
receiving process needs a message, the message is already
placed in the corresponding memory location. Thus, there is no
overhead on the receiving side for messages exchanged on the
same node.
Message m2 has to be sent from node N0 to node N1. At a cer-
tain time, known from the schedule table, the kernel transfers
m2 to the TTP controller by packaging m2 into a frame in the
MBI. The WCAO of this function is δKS. Later on, the TTP con-
troller knows from its MEDL when it has to take the frame from
the MBI, in order to broadcast it on the bus. In our example the
timing information in the schedule table of the kernel and the
MEDL is determined in such a way that the broadcasting of the
frame is done in the slot S0 of Round 2. The TTP controller of
node N1 knows from its MEDL that it has to read a frame from
slot S0 of Round 2 and to transfer it into the MBI. The kernel in
node N1 then reads the message m2 from the MBI, with a corre-
sponding WCAO of δKR. When P3 is activated based on the
local schedule table of node N1, it already has m2 in its right
memory location.
1. Overheads δS, δKS and δKR depend on the length of the transferred mes-
sage; in order to simplify the presentation, this aspect is not discussed
further.
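The steps for an inter-node message such as m2 can be sketched as a small timing calculation. This is a hypothetical model: it assumes a single slot per node in each TDMA round, a frame occupying its whole slot, and overheads that do not depend on message length (as in the footnote). The function names and all numbers are illustrative.

```python
import math


def next_slot_start(ready: float, slot_offset: float, round_length: float) -> float:
    """Start of the first occurrence of a slot, located at slot_offset within
    every TDMA round, that begins at or after `ready`."""
    k = max(math.ceil((ready - slot_offset) / round_length), 0)
    return k * round_length + slot_offset


def inter_node_arrival(send_time, d_ks, d_kr, slot_offset, slot_len, round_length):
    """Time at which a message is in the receiver's memory location, assuming:
    the sending kernel packs it into the MBI (WCAO d_ks) from send_time on,
    the TTP controller broadcasts the frame in the sender's next slot, and the
    receiving kernel copies it out of the MBI (WCAO d_kr) once the slot ends."""
    packed = send_time + d_ks
    start = next_slot_start(packed, slot_offset, round_length)
    return start + slot_len + d_kr


# P1 ends at 3; copying m1 locally takes delta_S = 0.5, so m2 is sent at 3.5.
print(inter_node_arrival(3.5, d_ks=0.5, d_kr=0.5,
                         slot_offset=0.0, slot_len=4.0, round_length=10.0))  # 14.5
```

Here packing ends at 4.0, after S0 of the first round has already started, so m2 is broadcast in S0 of Round 2 (from 10 to 14) and is in P3's memory location at 14.5.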
Event Triggered Systems. Each kernel in the software
architecture for the event-triggered systems has a so-called tick
scheduler. The tick scheduler is activated periodically by the
timer interrupts and decides on the activation of processes, based
on their priorities. Several activities, like polling of the I/O or
diagnostics, also take place in the timer interrupt routine.
As for time-triggered systems, the overhead of the kernel and the
worst case administrative overhead (WCAO) of every system
call have to be determined. Our schedulability analysis takes
into account these overheads, and also the overheads due to
message passing.
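The decision taken by the tick scheduler can be sketched as a discrete simulation in which, at every tick, the highest-priority ready process runs for one tick. This is an illustration built on assumptions: preemption happens only at tick boundaries, larger numbers mean higher priority, and the timer-routine overhead and WCAOs are ignored. `tick_schedule` is a hypothetical name.

```python
def tick_schedule(procs, tick, horizon):
    """Simulate a tick scheduler: at every tick, run the highest-priority
    ready process for one tick (preemption only at tick boundaries).

    procs: dict name -> (release_time, exec_time, priority); a larger
    priority value means more urgent (an assumption of this sketch).
    Returns a list of (tick_start, process_name or None).
    """
    remaining = {n: c for n, (_, c, _) in procs.items()}
    trace = []
    t = 0
    while t < horizon:
        ready = [n for n, (r, _, _) in procs.items() if r <= t and remaining[n] > 0]
        if ready:
            p = max(ready, key=lambda n: procs[n][2])
            remaining[p] -= min(tick, remaining[p])
            trace.append((t, p))
        else:
            trace.append((t, None))  # idle tick
        t += tick
    return trace


print(tick_schedule({"P1": (0, 2, 1), "P2": (1, 1, 5)}, tick=1, horizon=4))
# [(0, 'P1'), (1, 'P2'), (2, 'P1'), (3, None)]
```

P2, released at tick 1 with a higher priority, preempts P1 at the next tick boundary, and P1 resumes afterwards.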
Figure 2.5: Message Passing, Event-Driven Systems (nodes N0 and N1, each with a CPU running the RTK, an MBI and a TTP controller; P1 and P2 on N0, P3 on N1; transfer process T with the Out queue on N0, delivery process D on N1; messages m1 and m2)
The message passing mechanism is illustrated in Figure 2.5,
where we have three processes, P1 to P3. P1 and P2 are mapped
to node N0 that transmits in slot S0, and P3 is mapped to node
N1 that transmits in slot S1. Message m1 is transmitted between
P1 and P2 that are on the same node, while message m2 is trans-
mitted from P1 to P3 between the two nodes.
Messages between processes located on the same processor
are passed through shared protected objects. The overhead for
their communication is accounted for by the blocking factor,
computed according to the priority ceiling protocol [Sha90].
Message m2 has to be sent from node N0 to node N1. Thus,
after m2 is produced by P1, it is placed into an outgoing
message queue, called Out. Access to the queue is guarded
by a priority-ceiling semaphore. A so-called transfer process
(denoted with T in Figure 2.5) moves the message from the Out
queue into the MBI.
How the message queue is organized, and how the message
transfer process selects particular messages and assembles
them into a frame, depend on the approach chosen for message
scheduling (see Section 5.1). The message transfer process is
activated by the tick scheduler at certain a priori known
moments in order to perform the message transfer. These
activation times are stored in a message handling time table
(MHTT) available to the real-time kernel in each node. Both the
MEDL and the MHTT are generated off-line as a result of the
schedulability analysis and optimization, which will be discussed
later. The MEDL imposes the times when the TTP controller of a
certain node has to move frames from the MBI to the communi-
cation channel. The MHTT contains the times when messages
have to be transferred by the message transfer process from the
Out queue into the MBI, in order to be broadcast further by
the TTP controller. As a result of this synchronization, the
activation times in the MHTT are directly related to those in the
MEDL, and the MHTT results directly from the MEDL.
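Since the MHTT follows from the MEDL, its derivation can be sketched as a simple mapping. The sketch assumes, hypothetically, that the transfer process must run a fixed time `transfer_wcet` before each slot of its node and that the delivery process runs right after each slot carrying a frame for the node. The `MedlEntry` type and `derive_mhtt` function are illustrative and do not reflect the actual table formats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MedlEntry:
    node: str    # node that owns (transmits in) the slot
    start: int   # slot start time within the TDMA cycle
    length: int  # slot length


def derive_mhtt(medl, node, transfer_wcet, cycle_length, receives=()):
    """Hypothetical derivation of one node's MHTT from the MEDL.

    - transfer process T: activated transfer_wcet before every slot owned by
      `node`, so the frame is in the MBI when the TTP controller takes it;
    - delivery process D: activated at the end of every slot whose index in
      `medl` is listed in `receives` (frames this node must read).
    Times wrap around modulo the cycle length. Returns sorted (time, kind).
    """
    table = []
    for i, e in enumerate(medl):
        if e.node == node:
            table.append(((e.start - transfer_wcet) % cycle_length, "T"))
        if i in receives:
            table.append(((e.start + e.length) % cycle_length, "D"))
    return sorted(table)


# Two TDMA rounds of two 4-unit slots each (cycle length 16).
medl = [MedlEntry("N0", 0, 4), MedlEntry("N1", 4, 4),
        MedlEntry("N0", 8, 4), MedlEntry("N1", 12, 4)]
print(derive_mhtt(medl, "N1", transfer_wcet=1, cycle_length=16, receives={2}))
# [(3, 'T'), (11, 'T'), (12, 'D')]
```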
It is easy to observe that the most favourable situation occurs
when, at a certain activation, the message transfer process
finds in the Out queue all the “expected” messages, which can
then be packed into the immediately following frame to be sent
by the TTP controller. However, application processes are not
statically scheduled, and the availability of messages in the Out
queue cannot be guaranteed at fixed times. Worst case situations
have to be considered, as will be shown in Section 5.1.
Let us come back to Figure 2.5. There we assumed a context in
which the broadcasting of the frame containing message m2 is
done in the slot S0 of Round 2. The TTP controller of node N1
knows from its MEDL that it has to read a frame from slot S0 of
Round 2 and to transfer it into its MBI. In order to synchronize
with the TTP controller and to read the frame from the MBI, the
tick scheduler on node N1 will activate, based on its local MHTT,
a so-called delivery process, denoted with D in Figure 2.5. The
delivery process takes the frame from the MBI and extracts the
messages from it. When a message is split into several packets,
sent over several TDMA rounds, we consider that the message
has arrived at the destination node only after all its corre-
sponding packets have arrived. When m2 has arrived, the deliv-
ery process copies it to process P3, which will then be activated.
Activation times for the delivery process are fixed in the MHTT
just as explained earlier for the message transfer process.
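The arrival rule for messages split over several packets can be sketched as follows: a message counts as delivered at the arrival time of its last packet, and incomplete messages are not delivered. The packet tuple layout and the function name are assumptions of this sketch.

```python
def message_arrival_times(packets):
    """Arrival time of each complete message at the destination node.

    packets: iterable of (message_id, packet_index, total_packets, arrival_time).
    A message has arrived only once all of its packets have arrived, so its
    arrival time is that of its last packet; incomplete messages are omitted.
    """
    received = {}
    for msg, idx, total, t in packets:
        entry = received.setdefault(msg, {"total": total, "times": {}})
        entry["times"][idx] = t
    return {
        msg: max(e["times"].values())
        for msg, e in received.items()
        if len(e["times"]) == e["total"]
    }


packets = [("m2", 0, 2, 10), ("m1", 0, 1, 12), ("m2", 1, 2, 30), ("m3", 0, 2, 14)]
print(message_arrival_times(packets))  # {'m2': 30, 'm1': 12}
```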
The number of activations of the message transfer and deliv-
ery processes depends on the number of frames transferred, and
they are taken into account in our analysis, as well as the delay
implied by the propagation on the communication bus.
Chapter 3
Related Work
A LOT HAS BEEN published in recent years in the areas of
hardware/software codesign and real-time systems research.
The intent of this chapter is to give a brief overview of the previ-
ous research on codesign with an emphasis on scheduling and
communication synthesis.
The aspects of our work that differ from the related research
presented in this chapter are:
 • we consider a more complex system model that is able to cap-
ture both the flow of data and that of control;
 • our system architectures are heterogeneous and consider a
realistic communication model based on the time-triggered
protocol;
 • we consider issues related to the interaction between sched-
uling of processes and communication scheduling; and
 • we have provided system level communication synthesis
strategies that lead to significant improvements in the per-
formance of the system.
3.1 Hardware/Software Codesign
Section 1.1 has introduced hardware/software codesign (codesign,
for short) and presented a possible codesign flow. The intention
of this section is to provide a short overview of this emerging
research area. For more details, the reader is referred to several
surveys on this topic [Mic96, Mic97, Ern98, Gaj95, Sta97,
Wol94].
Codesign is a relatively new research area. The “First Inter-
national Workshop on Hardware/Software Codesign” took
place in 1992 and has been a yearly event since then. Around
the same time, hardware/software codesign tracks and sessions
started to appear at important Electronic Design Automa-
tion (EDA) conferences like DAC, DATE, ICCAD, ISSS, etc.
The initial assumptions of codesign were quite restrictive, and
the goals modest. For example, several researchers have
assumed a simple specification in the form of a computer program,
and the main goal was to obtain the highest possible execution
performance within a given cost (acceleration). The architecture
considered consisted of a single processor together with an ASIC
used to accelerate parts of the functionality [Cho95a, Gup95,
Moo97]. In this context, the main problems were to divide the
functionality between the ASIC and the CPU (hardware/soft-
ware partitioning) [Ele97, Ern93, Gup93, Vah94], to automati-
cally generate drivers and other components related to
communication (communication synthesis) [Cho92, Wal94] and
to simulate and verify the resulting system (cosimulation and
coverification) [Val95, Val96]. However, today the initial
assumptions are no longer valid and the goals are much broader
[Bol97, Dav98, Dav99, Dic98, Lak99, Ver96]:
 • The applications are heterogeneous, consisting of hardware
and software components. Both hardware and software can
be data or control dominated, and hardware can be both dig-
ital and analog.
 • The specification for such applications is inherently hetero-
geneous and complex. Several languages as well as several
models of computation can be found within a specification.
 • The architectures are varied, ranging from distributed
embedded systems in the automotive electronics area to
systems on a chip used in telecommunications.
 • The goals include not only acceleration with minimal hard-
ware cost, but also issues related to the reuse of legacy hard-
ware and software subsystems, real-time constraints, quality
of service, fault tolerance and dependability, power consump-
tion, flexibility, time-to-market, etc.
3.2 Scheduling
Process scheduling for performance estimation and synthesis of
real-time systems has been intensively researched in recent
years. The existing approaches differ in the scheduling strategy
adopted, the system architectures considered, and the handling of
communication and process interaction aspects. However, our main
distinction in this section will be made between non-preemptive
static cyclic scheduling and preemptive fixed-priority schedul-
ing. We have to mention that performance estimation and sched-
uling of processes typically require, as an input, estimated
execution times of single processes [Eng99, Ern97, Gon95,
Hen95, Li95, Lun99, Mal97, Suz96].
Non-preemptive static cyclic scheduling. Static cyclic
scheduling of a set of data dependent software processes on a
multiprocessor architecture has been intensively researched
[Kop97a, Xu00].
Several approaches are based on list scheduling heuristics
using different priority criteria [Cof72, Deo98, Jor97, Kwo96,
Wu90] or on branch-and-bound algorithms [Kas84]. These
approaches are based on the assumption that a number of iden-
tical processors are available to which processes are progres-
sively assigned as the static schedule is elaborated. Such an
assumption is obviously not acceptable for distributed embed-
ded systems which are heterogeneous by nature. In [Jor97] a list
scheduling based approach is extended to handle heterogeneous
architectures. Scheduling is performed by progressively assign-
ing tasks to the allocated processors with the goal of minimizing
the length of the schedule. The proposed algorithm handles only
processors which execute one single process at a time (not typi-
cal for hardware) and the resulting partitioning does not take
into consideration any design constraints.
In [Ben96, Pra92] static scheduling and partitioning of proc-
esses, and allocation of system components, are formulated as a
mixed integer linear programming (MILP) problem. A disadvan-
tage of this approach is the complexity of solving the MILP
model. The size of such a model grows quickly with the number
of processes and allocated resources. In [Kuc97] a formulation
using constraint logic programming has been proposed for simi-
lar problems.
In all the previous approaches process interaction is only in
terms of dataflow. However, when including control dependen-
cies significant improvements in the quality of the resulting
schedules can be obtained [Ele98a]. Section 4.1 presents in more
detail related research on the static scheduling for systems with
control and data dependencies that is used as a starting point
for our work.
It has been claimed [Xu93] that the static cyclic scheduling
approach is the only approach that can solve a certain class of
problems. However, advances in the area of fixed priority
preemptive scheduling show that such classes of problems can
also be handled with other scheduling strategies [Aud93,
Tin94b].
Fixed priority preemptive scheduling. Preemptive sched-
uling of independent processes with static priorities running on
single processor architectures has its roots in [Liu73]. The
approach has been later extended to accommodate more general
computational models and has also been applied to distributed
systems [Tin94a]. The reader is referred to [Aud95, Bal98,
Sta93] for surveys on this topic.
In [Yen97] performance estimation is based on a preemptive
scheduling strategy with static priorities using rate monotonic
analysis. In [Lee99] an earliest deadline first strategy is used for
non-preemptive scheduling of processes with possible data
dependencies. Preemptive and non-preemptive static scheduling
are combined in the cosynthesis environment described in
[Dav98, Dav99].
In many of the previous scheduling approaches researchers
have assumed that processes are scheduled independently. How-
ever, this is not the case in reality, where process sets can exhibit
both data and control dependencies. Moreover, knowledge about
these dependencies can be used in order to improve the accuracy
of schedulability analyses and the quality of produced schedules.
One way of dealing with data dependencies between processes
with static priority based scheduling has been indirectly
addressed by the extensions proposed for the schedulability
analysis of distributed systems through the use of the release jit-
ter [Tin94b]. Release jitter is the worst case delay between the
arrival of a process and its release (when it is placed in the run-
queue for the processor) and can include the communication delay
due to the transmission of a message on the communication
channel.
Tindell et al. [Tin94b] and Yen et al. [Yen98] use time offset
relationships and phases, respectively, in order to model data
dependencies. Offset and phase are similar concepts that
express the existence of a fixed interval in time between the
arrivals of sets of processes. The authors show that by introduc-
ing such concepts into the computational model, the pessimism
of the analysis is significantly reduced when bounding the time
behaviour of the system. The work has been later extended with
the concept of dynamic offsets [Pal98]. The works by [Tin94b]
and [Yen98] are further detailed in Section 5.2 that introduces
the schedulability analysis for the time-triggered protocol. Also,
a brief introduction to schedulability analysis is presented in
Section 5.1.
When control dependencies exist then, depending on condi-
tions, only a subset of the set of processes is executed during an
invocation of the system. Modes have been used to model a cer-
tain class of control dependencies [Foh93]. Such a model basi-
cally assumes that at the start of an execution cycle, a
particular functionality is known in advance and is fixed for one
or several cycles until another mode change is performed. How-
ever, modes cannot handle fine grained control dependencies, or
certain combinations of data and control dependencies. Careful
modeling using the periods of processes (lower bound between
subsequent re-arrivals of a process) can also be a solution for
some cases of control dependencies [Ger96]. If, for example, we
know that a certain set of processes will only execute every sec-
ond cycle of the system, we can set their periods to twice the
period of the rest of the processes in the system. However,
using the worst case assumption on periods very often leads to
unnecessarily pessimistic schedulability evaluations. More
refined process models can produce much better schedulability
results, as will be shown later in the thesis. Recent works
[Bar98a, Bar98b] aim at extending the existing models to han-
dle control dependencies. In [Bar98b] Baruah introduces the
recurring real-time task model that is able to capture lower level
control dependencies, and presents an exponential-time analy-
sis for uniprocessor systems.
3.3 Aspects Related to Communication
Currently, more and more real-time systems are used in physi-
cally distributed environments and have to be implemented on
distributed architectures in order to meet reliability, functional,
and performance constraints. However, researchers have often
ignored or greatly simplified aspects concerning the commu-
nication infrastructure.
One typical approach is to consider communication processes
as processes with a given execution time (depending on the
amount of information exchanged) and to schedule them as any
other process, without considering issues like communication
protocol, bus arbitration, packaging of messages, clock synchro-
nization, etc. These aspects are, however, essential in the con-
text of safety-critical distributed real-time applications and one
of our objectives is to develop a strategy which takes them into
consideration for process scheduling.
Many efforts dedicated to communication synthesis have con-
centrated on the synthesis support for the communication infra-
structure but without considering hard real-time constraints
and system level scheduling aspects [Cho95b, Dav95, Knu99,
Nar94]. Lower level communication synthesis aspects under tim-
ing constraints have been addressed in [Ort98, Knu99].
We should mention here some results obtained in extending
real-time schedulability analysis so that network communica-
tion aspects can be handled. In [Tin95], for example, the CAN
protocol is investigated, while the work reported in [Erm97] con-
siders systems based on the ATM protocol. An analysis for a simple
TDMA protocol is provided in [Tin94a], which integrates processor
and communication schedulability and provides a “holistic”
schedulability analysis in the context of distributed real-time
systems.
Chapter 4
Scheduling and Bus Access Optimization for Time Driven Systems
IN THIS CHAPTER we consider time-driven distributed real-
time systems that use the time-triggered protocol for the com-
munication infrastructure. Thus, both the activation of proc-
esses and the transmission of messages are done based on the
progression of time.
The chapter starts by presenting an approach to static sched-
uling under control and data dependencies for distributed real-
time systems [Dob98, Ele98a, Ele00]. The approach considers a
simplified communication model in which the execution time of
the communication processes depends only on the amount of
data exchanged by the processes engaged in the communication.
The communication processes are treated exactly as ordinary
processes during scheduling, and the bus is modelled similarly to
a programmable processor that can “execute” one communica-
tion at a time as soon as the communication becomes “ready”.
We propose in this chapter several extensions to this
approach:
 • scheduling of messages using a realistic communication
model based on the time-triggered protocol (Section 4.2.1);
 • a new priority function for list scheduling that uses knowl-
edge about the bus access scheme in order to improve the
schedule quality (Section 4.2.2); and
 • optimization strategies for the synthesis of parameters of the
communication protocol, aimed at improving the schedule
quality (Section 4.2.3).
4.1 Scheduling with Control and Data Dependencies
In our approach, we consider distributed hard real-time systems
modelled using conditional process graphs.
Optimal scheduling has been proven to be an NP-complete
problem [Ull75] even in simpler contexts than those characteris-
tic of distributed systems represented as CPGs. Thus, it is
essential to develop heuristics which produce good quality
results in a reasonable time.
In [Dob98, Ele98a, Ele00] the authors concentrate on develop-
ing a scheduling algorithm for systems with both control and
data dependencies, modelled using the conditional process
graph. According to this model, some processes can only be acti-
vated if certain conditions, computed by previously executed
processes, are fulfilled. Thus, process scheduling is complicated,
since at a given activation of the system only a certain subset of
the total set of processes is executed, and this subset differs
from one activation to another.
The output produced by their scheduling algorithm is a sched-
ule table that contains all the information needed by a distrib-
uted run time scheduler to take decisions on activation of
processes. It is assumed that during execution a very simple
non-preemptive scheduler located in each processing element
decides on process and communication activation depending on
the actual values of conditions. Only one part of the table has to
be stored in each processor, namely the part concerning deci-
sions which are taken by the corresponding scheduler.
Under these assumptions, Table 4.1 presents a possible sched-
ule (produced by the algorithm in Figure 4.1) for the conditional
process graph in Figure 2.1. In Table 4.1 there is one row for
each “ordinary” or communication process, which contains acti-
vation times corresponding to different values of conditions.
Each column in the table is headed by a logical expression con-
structed as a conjunction of condition values. Activation times in
a given column represent starting times of the processes when
the respective expression is true.
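A lookup in such a table can be sketched as follows: each row holds (guard, time) entries, a guard being the conjunction of condition values that heads a column, and a process is activated at the time whose guard is satisfied by the condition values known so far. The representation and the process `Pk` with its times are hypothetical; only P1's unconditional activation at time 0 comes from Table 4.1.

```python
def activation_time(table, process, known):
    """Activation time of `process` under the condition values known so far.

    table: process -> list of (guard, time); a guard is a dict
    {condition: bool} read as a conjunction, {} standing for the column 'true'.
    known: dict {condition: bool} of conditions already computed and received.
    Returns None if no entry of the row is enabled yet. Guards in one row are
    assumed mutually exclusive, so at most one entry can match.
    """
    for guard, time in table.get(process, []):
        if all(known.get(c) == v for c, v in guard.items()):
            return time
    return None


table = {
    "P1": [({}, 0)],                                  # from Table 4.1
    "Pk": [({"C": True, "D": True}, 20),              # hypothetical row
           ({"C": True, "D": False}, 25)],
}
print(activation_time(table, "P1", {}))                       # 0
print(activation_time(table, "Pk", {"C": True}))              # None (D unknown)
print(activation_time(table, "Pk", {"C": True, "D": False}))  # 25
```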
According to the schedule in Table 4.1 process P1 is activated
unconditionally at the time 0, given in the first column of the
table. However, activation of some processes at a certain execu-
tion depends on the values of the conditions, which are unpre-
dictable. For example, process P11 has to be activated at t=44 if
C∧D is true and t=52 if C∧D is true. At a certain moment during
the execution, when the values of some conditions are already
known, they have to be used in order to take the best possible
decisions on when and which process to activate. Therefore,
after the termination of a process that produces a condition (dis-
junction process), the value of the condition is broadcast from
the corresponding processor to all other processors. This broad-
cast is scheduled as soon as possible on the communication
channel, and is considered together with the scheduling of the
messages.
To produce a deterministic behaviour, which is correct for any
combination of conditions, the table has to fulfill several require-
ments:
1. No process will be activated if, for a given execution, the con-
ditions required for its activation are not fulfilled.
2. Activation times have to be uniquely determined by the con-
ditions.
3. Activation of a process Pi at a certain time t has to depend
only on condition values which are determined at the respec-
tive moment t and are known to the processing element
which executes Pi.
Table 4.1: Schedule Table for Graph in Figure 2.1
process true C C∧D C∧D C C∧D C∧D
P1 0
P2 5
P3 14 14
P4 45 45
P5 51 50 55 47
P6 3 3
P7 7 7
P8 9 9
P9 11 11
P10 13 13
P11 44 52
P12 47 9 55 9
P13 48 13 56 11
P14 14 9
P1,2 4
P4,5 48 47
P2,3 13 13
P3,4 44 44
P12,13 47 10 55
P8,10 12 12
P10,11 43 43
C 3 11 9
D 11 9 11 9
4.1.1 LIST SCHEDULING BASED ALGORITHM
As the starting point for our improved scheduling technique that
is tailored for time-triggered embedded systems we consider the
list scheduling based algorithm in [Dob98, Ele00] presented, in
a very simplified form, in Figure 4.1.
ListScheduling(CurrentTime, ReadyList, KnownConditions)
  repeat
    Update(ReadyList)
    for each processing element PE
      if PE is free at CurrentTime then
        Pi = GetReadyProcess(ReadyList)
        if there exists a Pi then
          Insert(Pi, ScheduleTable, CurrentTime, KnownConditions)
          if Pi is a disjunction process then
            Ci = condition calculated by Pi
            ListScheduling(CurrentTime,
                           ReadyList ∪ ready nodes from the true branch,
                           KnownConditions ∪ true Ci)
            ListScheduling(CurrentTime,
                           ReadyList ∪ ready nodes from the false branch,
                           KnownConditions ∪ false Ci)
          end if
        end if
      end if
    end for
    CurrentTime = time when a scheduled process terminates
  until all processes of this alternative path are scheduled
end ListScheduling
Figure 4.1: List Scheduling Based Algorithm
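A runnable, deliberately reduced version of the list scheduling loop is sketched below. It assumes a fixed mapping of processes to processing elements, treats every process as unconditional (so there is no recursion on disjunction processes) and ignores communication delays. The function and parameter names are hypothetical.

```python
def list_schedule(wcet, mapping, preds, priority):
    """Simplified list scheduling (no conditions, no communication delays).

    wcet: process -> execution time (> 0); mapping: process -> processing
    element; preds: process -> set of predecessors; priority: process ->
    number (higher is extracted first among ready processes on the same PE).
    Returns process -> start time.
    """
    start, finish = {}, {}
    pe_free = {pe: 0 for pe in set(mapping.values())}
    time = 0
    while len(start) < len(wcet):
        # Ready: not yet scheduled, all predecessors scheduled and terminated.
        ready = [p for p in wcet if p not in start
                 and all(q in finish and finish[q] <= time for q in preds.get(p, ()))]
        for pe in sorted(pe_free):
            if pe_free[pe] <= time:
                candidates = [p for p in ready if mapping[p] == pe]
                if candidates:
                    p = max(candidates, key=lambda x: priority[x])
                    start[p], finish[p] = time, time + wcet[p]
                    pe_free[pe] = finish[p]
        # Advance CurrentTime to the next termination of a scheduled process.
        later = [f for f in finish.values() if f > time]
        if not later:
            break  # no progress possible (e.g. cyclic dependencies)
        time = min(later)
    return start


wcet = {"P1": 2, "P2": 3, "P3": 1, "P4": 2}
mapping = {"P1": "N0", "P2": "N1", "P3": "N1", "P4": "N0"}
preds = {"P2": {"P1"}, "P3": {"P1"}, "P4": {"P2", "P3"}}
priority = {"P1": 4, "P2": 3, "P3": 2, "P4": 1}
print(list_schedule(wcet, mapping, preds, priority))
# {'P1': 0, 'P2': 2, 'P3': 5, 'P4': 6}
```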
List scheduling heuristics [Ele98b] are based on priority lists
from which processes are extracted in order to be scheduled at
certain moments. In the algorithm presented in Figure 4.1,
there is such a list, ReadyList, that contains the processes which
are eligible to be activated on the corresponding processor at
time CurrentTime. These are processes which have not yet been
scheduled but have all predecessors already scheduled and ter-
minated.
The ListScheduling function is recursive and calls itself for each
disjunction node in order to separately schedule the nodes in the
true branch, and those in the false branch respectively. Thus,
the alternative paths are not activated simultaneously and
resource sharing is correctly achieved (for details on how the
algorithm fulfils the three requirements on the schedule table
we refer to [Ele00]).
An essential component of a list scheduling heuristic is the
priority function used to solve conflicts between ready processes.
The highest priority process will be extracted by function
GetReadyProcess from the ReadyList in order to be scheduled.
4.1.2 PCP PRIORITY FUNCTION
Priorities for list scheduling are very often based on the critical
path (CP) from the respective process to the sink node. Thus, for CP
scheduling, the priority assigned to a process Pi will be the maxi-
mal total execution time along a path from Pi to the sink node:
$$l_{P_i} = \max_k \sum_{P_j \in \pi_{ik}} t_{P_j},$$
where πik is the kth path from node Pi to the sink node and tPj is the execution time of process Pj.
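The CP priority can be computed with a memoized traversal of the graph, since the longest path from Pi to the sink is tPi plus the longest such path from any successor of Pi. This is a minimal sketch, assuming an acyclic graph given as a successor map; the names are hypothetical.

```python
from functools import lru_cache


def cp_priorities(wcet, succs):
    """Critical-path priority l(Pi): the largest total execution time over all
    paths from Pi to the sink, Pi included (memoized DFS on a DAG)."""
    @lru_cache(maxsize=None)
    def l(p):
        return wcet[p] + max((l(s) for s in succs.get(p, ())), default=0)
    return {p: l(p) for p in wcet}


wcet = {"P1": 2, "P2": 3, "P3": 1, "P4": 2}
succs = {"P1": ["P2", "P3"], "P2": ["P4"], "P3": ["P4"]}
print(cp_priorities(wcet, succs))  # {'P1': 7, 'P2': 5, 'P3': 3, 'P4': 2}
```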
Considering the concrete definition of the problem, significant
improvements of the resulting schedule can be obtained, without
any penalty in scheduling time, by making use of the available
information on process allocation [Ele98b].
Let us consider the graph in Figure 4.2 and suppose that the
list scheduling algorithm has to decide between scheduling proc-
ess PA or PB, which are both ready to be scheduled on the same
programmable processor or bus pei. In Figure 4.2 we depict only
the critical paths from PA and PB to the sink node. Let us consider
that PX is the last successor of PA on the critical path such that
all processes from PA to PX are assigned to the same processing
element pei. The same holds for PY relative to PB. tA and tB are
the total execution time of the chain of processes from PA to PX
and from PB to PY respectively, following the critical paths. λA
and λB are the total execution times of the processes on the rest of
the two critical paths. Thus, we have:
lPA = tA + λA, and lPB = tB + λB.
However, [Ele98b] does not use the length of these critical
paths as a priority. The policy in [Ele98b] is based on the estima-
tion of a lower bound L on the total delay, taking into considera-
tion that the two chains of processes PA-PX and PB-PY are
executed on the same processor. LPA and LPB are the lower
bounds if PA and PB respectively are scheduled first:
LPA = max(T_current + tA + λA, T_current + tA + tB + λB)
LPB = max(T_current + tB + λB, T_current + tB + tA + λA)
Figure 4.2: Delay estimation for PCP scheduling (critical paths from PA through PX and from PB through PY towards the sink, annotated with tA, tB, λA and λB)
The alternative that promises the shorter delay
L = min(LPA, LPB) is selected. It can be observed that if λA > λB
then LPA < LPB, which means that we have to schedule PA first so
that L = LPA; similarly if λB > λA then LPB < LPA, and we have to
schedule PB first in order to get L = LPB.
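The PCP decision rule can be sketched directly from the formulas for LPA and LPB. The example values are hypothetical and were picked so that the CP priority (lPA = 7 > lPB = 6) and the PCP estimate disagree.

```python
def pcp_choose(t_now, t_a, lam_a, t_b, lam_b):
    """Choose which of PA, PB to schedule first on the shared processing
    element, using the lower bounds LPA and LPB on the total delay.
    Returns (choice, L) with L = min(LPA, LPB)."""
    l_pa = max(t_now + t_a + lam_a, t_now + t_a + t_b + lam_b)
    l_pb = max(t_now + t_b + lam_b, t_now + t_b + t_a + lam_a)
    return ("PA", l_pa) if l_pa <= l_pb else ("PB", l_pb)


# CP would pick PA (6 + 1 = 7 > 2 + 4 = 6), but lambda_B > lambda_A, so PCP picks PB.
print(pcp_choose(0, t_a=6, lam_a=1, t_b=2, lam_b=4))  # ('PB', 9)
```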
4.2 Scheduling for Time Driven Systems
We propose several extensions to the scheduling algorithm
briefly described in Section 4.1. The extensions consider a real-
istic communication and execution infrastructure, and include
aspects of the communication protocol in the optimization proc-
ess.
Thus, as an input to our problem we consider a safety-critical
application that has several operating modes, and each mode is
modelled by a conditional process graph. The architecture of the
system is given as described in Section 2.2. Each process of
the process graph is mapped on a CPU or an ASIC of a node. The
worst case execution time (WCET) for each process mapped on a
processing element is known, as well as the length bmi of each
message.
We are interested in deriving a worst case delay on the system
execution time for each operating mode, so that this delay is as
small as possible, and in synthesizing the local schedule tables for
each node, as well as the MEDL for the TTP controllers, which
guarantee this delay.
Considering the concrete definition of our problem, the com-
munication time is no longer dependent only on the length of the
message, as assumed in the previous section. Thus, if the mes-
sage is sent between two processes mapped onto different nodes,
the message has to be scheduled according to the TTP protocol.
Several messages can be packaged together in the data field of a
frame. The number of messages that can be packaged depends
on the slot length corresponding to the node. The effective time
spent by a message mi on the bus is
$$t_{m_i} = \frac{b_{S_i}}{T},$$
where bSi is the length of the slot Si and T is the transmission speed of the
channel. Therefore, the communication time does not depend
on the bit length bmi of the message mi, but on the slot length cor-
responding to the node sending mi.
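The consequence of this formula can be sketched in two small helpers: the bus time depends only on the sender's slot, and a frame can carry only as many messages as fit in its data field. The greedy first-fit packing, the units (bits and bits per ms) and the names are assumptions for illustration; the actual packing policy is part of the scheduling algorithm.

```python
def message_bus_time(slot_bits: int, speed_bits_per_ms: float) -> float:
    """t_mi = b_Si / T: a message occupies the whole slot of its sender,
    whatever its own bit length."""
    return slot_bits / speed_bits_per_ms


def pack_into_slot(messages, data_capacity_bits):
    """First-fit packing of (name, bits) messages, in the given order, into
    the data field of one frame. Returns (packed, left_over)."""
    packed, left_over, used = [], [], 0
    for name, bits in messages:
        if used + bits <= data_capacity_bits:
            packed.append(name)
            used += bits
        else:
            left_over.append(name)  # waits for a later round
    return packed, left_over


print(message_bus_time(2000, 1000.0))                                 # 2.0
print(pack_into_slot([("m1", 800), ("m2", 900), ("m3", 300)], 1200))  # (['m1', 'm3'], ['m2'])
```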
The important impact of the communication parameters on
the performance of the application is illustrated in Figure 4.3 by
means of a simple example.
Figure 4.3: Scheduling Example. Panels: a) schedule length of 24 ms; b) schedule length of 22 ms; c) schedule length of 20 ms; d) graph example with processes P1 to P4 and messages m1 to m4.
In Figure 4.3d we have a process graph consisting of four proc-
esses P1 to P4 and four messages m1 to m4. The architecture con-
sists of two nodes interconnected by a TTP channel. The first
node, N0, transmits in the slot S0 of the TDMA round and the
second node, N1, transmits in the slot S1. Processes P1 and P4
are mapped on node N0, while processes P2 and P3 are mapped
on node N1. With the TDMA configuration in Figure 4.3a, where slot S1 comes first and slot S0 second, the resulting schedule length is 24 ms. However, if we swap the two slots inside the TDMA round without changing their lengths, we can improve the schedule by 2 ms, as seen in Figure 4.3b. Furthermore, with the TDMA configuration in Figure 4.3c, where slot S0 is first, slot S1 is second, and the slot lengths are increased so that each slot can accommodate both of the messages generated on its node, we obtain a schedule length of 20 ms, which is optimal. However, increasing the length of slots does not necessarily improve a schedule, as it delays the communication of messages generated by other nodes.
In the next two sections our goal is to synthesize the local
schedule table of each node and the MEDL of the TTP controller
for a given order of slots in the TDMA round and given slot
lengths. The ordering of slots and the optimization of slot
lengths will be discussed in Section 4.2.3.
4.2.1 SCHEDULING OF MESSAGES WITH THE TTP
Given a certain bus access scheme, that is, a given ordering of the slots in the TDMA round and fixed slot lengths, the CPG has to be scheduled with the goal of minimizing the worst case execution delay. This can be performed using the algorithm ListScheduling (Figure 4.1) presented in Section 4.1.1. Two aspects have to be discussed here: the planning of messages in predetermined slots and the impact of this communication strategy on the priority assignment.
The function ScheduleMessage in Figure 4.4 is called in order to plan the communication of a message m, with length bm, generated on Nodem and ready to be transmitted at TimeReady. ScheduleMessage returns the first round and the corresponding slot (the slot corresponding to Nodem) that can host the message. In Figure 4.4, RoundLength is the length of a TDMA round expressed in time units (in Figure 4.5, for example, RoundLength = 18 ms). The round in progress at TimeReady is the initial candidate. For this round, however, it can be too late to catch the right slot, in which case the next round is selected. Once a candidate round is selected, we have to check that there is enough space left in the slot for our message (boccupied represents the total number of bits occupied by messages already scheduled in the respective slot of that round). If not enough space is left, the communication has to be delayed by another round.
With this message scheduling scheme, the algorithm in Figure 4.1 will generate correct schedules for a TTP based architecture, with guaranteed worst case execution delays. However, the quality of the schedules can be much improved by adapting the priority assignment scheme so that particularities of the communication protocol are taken into consideration.

ScheduleMessage (TimeReady, bm, Nodem)
  -- the slot in which the message has to be sent
  Slot = the slot assigned to Nodem
  -- the first round which could be a candidate
  Round = floor(TimeReady / RoundLength)
  -- is the right slot in this round already gone?
  -- (startSlot is the offset of Slot from the start of the round)
  if TimeReady - Round * RoundLength > startSlot then
    -- if yes, take the next round
    Round = Round + 1
  end if
  -- is enough space left in the slot for the message?
  while bm > bSlot - boccupied do
    -- if not, take the next round
    Round = Round + 1
  end while
  -- return the right round and slot
  return (Round, Slot)
end ScheduleMessage
Figure 4.4: Message Scheduling
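The message planning of Figure 4.4 can be sketched in Python as below. This is an illustrative sketch under stated assumptions, not the implementation used in this work: the `TdmaRound` record is hypothetical, the function books the occupied bits itself (in Figure 4.4 that bookkeeping is left to the caller), and the guard against messages larger than their slot is added so the loop cannot run forever. The slot offsets and the 18 ms round follow Figure 4.5; the slot capacities in bits are invented for the example. The node id stands in for the slot identifier.

```python
from dataclasses import dataclass, field
from math import floor
from typing import Dict, Tuple


@dataclass
class TdmaRound:
    """Hypothetical bus description: one slot per node, same layout every round."""
    slot_start: Dict[int, float]  # node -> offset of its slot within the round (ms)
    slot_bits: Dict[int, int]     # node -> capacity of its slot (bits)
    round_length: float           # RoundLength (ms)
    # (node, round) -> bits already booked in that slot (b_occupied)
    occupied: Dict[Tuple[int, int], int] = field(default_factory=dict)


def schedule_message(bus: TdmaRound, time_ready: float, b_m: int,
                     node: int) -> Tuple[int, int]:
    """Return (round, node whose slot carries m) for a message of b_m bits."""
    if b_m > bus.slot_bits[node]:
        raise ValueError("message can never fit into the slot of its node")
    rnd = floor(time_ready / bus.round_length)        # first candidate round
    if time_ready - rnd * bus.round_length > bus.slot_start[node]:
        rnd += 1                                       # slot already gone: next round
    while b_m > bus.slot_bits[node] - bus.occupied.get((node, rnd), 0):
        rnd += 1                                       # not enough space: next round
    bus.occupied[(node, rnd)] = bus.occupied.get((node, rnd), 0) + b_m
    return rnd, node


bus = TdmaRound(slot_start={0: 0.0, 1: 10.0}, slot_bits={0: 32, 1: 16},
                round_length=18.0)
print(schedule_message(bus, 8.0, 16, 1))  # (0, 1): S1 of round 0 still ahead
print(schedule_message(bus, 8.0, 8, 1))   # (1, 1): S1 of round 0 is now full
```

The second call shows the capacity check at work: the message is ready in time for S1 of round 0, but the slot is already full, so it is pushed to round 1.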
4.2.2 IMPROVED PRIORITY FUNCTION
For the scheduling algorithm outlined previously we initially
used the Partial Critical Path (PCP) priority function [Dob98,
Ele98b, Ele00]. PCP uses as a priority criterion the length of
that part of the critical path corresponding to a process Pi which
starts with the first successor of Pi that is assigned to a proces-
sor different from the processor running Pi. The PCP priority
function is statically computed once at the beginning of the
scheduling procedure.
However, considering the concrete definition of our problem,
significant improvements of the resulting schedule can be
obtained by including knowledge of the bus access scheme into
the priority function. This new priority function will be used by the GetReadyProcess function (Figure 4.1) in order to decide which process to select from the list of ready processes.
Figure 4.5: Priority Function Example. Panels: a) Schedule length of 40 ms; b) Schedule length of 36 ms; c) Graph example. TDMA round: S0 = 10 ms, S1 = 8 ms (Rounds 0 and 1 shown).
Let us consider the graph in Figure 4.5c, and suppose that the list scheduling algorithm has to decide whether to schedule process P1 or P2, which are both ready to be scheduled on the same programmable processor. The worst case execution time of each process is depicted on the right side of the respective node and is expressed in ms. The architecture consists of two nodes interconnected by a TTP channel. Processes P1 and P2 are mapped on node N1, while processes P3 and P4 are mapped on node N0. Node N0 transmits in slot S0 of the TDMA round and N1 transmits in slot S1. Slot S0 has a length of 10 ms while slot S1 has a length of 8 ms. For simplicity we suppose that no message is transferred between P1 and P3. PCP (see Section 4.1.2) assigns a higher priority to P1 because its partial critical path, which starts from P3, has a length of 12, longer than the partial critical path of P2, which is 10 and starts from m. This results in a schedule length of 40 ms, as depicted in Figure 4.5a. On the other hand, if we schedule P2 first, the resulting schedule, depicted in Figure 4.5b, is only 36 ms long.
This apparent anomaly is due to the fact that the way we have computed PCP priorities, considering message communication as a simple activity with a delay of 6 ms, is not realistic in the context of a TDMA protocol. Let us consider the particular TDMA configuration in Figure 4.5 and suppose that the scheduler has to decide at t=0 which of the processes P1 or P2 to schedule. If P2 is scheduled, the message is ready to be transmitted at t'=8. Based on a computation similar to that used in Figure 4.4, it follows that message m will be placed in round $\lfloor 8/18 \rfloor = 0$, and that it arrives in time to get slot S1 of that round (TimeReady = 8 < startS1 = 10). Thus, m arrives at tarr = 18, which means a delay of δ = 10 relative to t'=8 (when the message was ready). This is the delay that should be considered for computing the partial critical path of P2, which now results in δ + tP4 = 14 (longer than the one corresponding to P1).
The obvious conclusion is that priority estimation has to be based on message planning with the TDMA scheme. Such an estimation, however, cannot be performed statically, before scheduling. If we take the same example in Figure 4.5, but consider that the priority-based decision is taken by the scheduler at t=5, m will be ready at t'=13. This is too late for m to get into slot S1 of round 0, so the message is sent in round 1 and arrives at tarr = 36. This leads to a delay due to message passing of δ = 36 - 13 = 23, different from the one computed above.
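The two delays computed above can be reproduced with a short Python sketch. It takes tarr as the end of the slot in which the message is sent and, as a simplifying assumption, ignores slot capacity. The parameters are those of Figure 4.5 (slot S1 starts at 10 ms and lasts 8 ms, rounds of 18 ms); the function name is hypothetical.

```python
from math import floor


def message_delay(t_ready: float, slot_start: float, slot_length: float,
                  round_length: float) -> float:
    """delta = t_arr - t' for a message ready at t' (slot capacity ignored)."""
    rnd = floor(t_ready / round_length)              # round in progress at t'
    if t_ready - rnd * round_length > slot_start:
        rnd += 1                                     # too late for this round's slot
    t_arr = rnd * round_length + slot_start + slot_length  # end of the slot
    return t_arr - t_ready


# Slot S1 of Figure 4.5: starts at 10 ms, lasts 8 ms; rounds of 18 ms.
print(message_delay(8, 10, 8, 18))   # 10  (decision at t=0, m ready at t'=8)
print(message_delay(13, 10, 8, 18))  # 23  (decision at t=5, m ready at t'=13)
```

The same message incurs a different delay depending on when it becomes ready, which is why the estimate cannot be computed once before scheduling.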
We introduce a new priority function, the modified PCP (MPCP), which is computed during scheduling, whenever several processes are in competition to be scheduled on the same resource. Similar to PCP, the priority metric is the length of that portion of the critical path corresponding to a process Pi which starts with the first successor of Pi that is assigned to a processor different from M(Pi). The critical path estimation starts with the time t at which the processes in competition are ready to be scheduled on the available resource. During the partial traversal of the graph, the delay introduced by a certain node Pj is estimated as follows:

$$\delta_{P_j} = \begin{cases} t_{P_j}, & \text{if } P_j \text{ is not a message passing} \\ t_{arr} - t', & \text{if } P_j \text{ is a message passing} \end{cases}$$

where t' is the time when the node generating the message terminates (and the message is ready), and tarr is the time at which the message arrives, that is, the end of the slot to which the message is supposed to be assigned. The slot is determined as in Figure 4.4, but without taking into consideration space limitations in slots.

Lambda(lambda, CurrentProcess)
  if CurrentProcess is a message then
    slot = slot of node sending CurrentProcess
    round = floor(lambda / RoundLength)
    if lambda - RoundLength * round > start of slot in round then
      round = next round
    end if
    while not message fits in the slot of round do
      round = next round
    end while
    lambda = round * RoundLength + start of slot in round + length of slot
  else
    lambda = lambda + WCET of CurrentProcess
  end if
  if lambda > MaxLambda then
    MaxLambda = lambda
  end if
  for each successor of CurrentProcess
    Lambda(lambda, successor)
  end for
  return MaxLambda
end Lambda
Figure 4.6: The Lambda Function
Thus, the priority function MPCP has to be dynamically deter-
mined during the scheduling algorithm for each ready process,
every time the GetReadyProcess function is activated in order to
select a process from the ReadyList. The computation of MPCP is
performed inside the GetReadyProcess function and involves a
partial traversal of the graph, as presented in Figure 4.6.
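A Python sketch of the recursive traversal of Figure 4.6 follows, applied to the branch of Figure 4.5 that follows P2 (message m sent by N1 in slot S1, then P4 with a WCET of 4 ms). The dictionaries encoding the graph and the TDMA round are hypothetical. Following the prose rather than the `while not message fits` loop of Figure 4.6, slot capacity is ignored. Instead of updating a global MaxLambda, the function returns the maximum over all paths.

```python
from math import floor
from typing import Dict, List

# Hypothetical encoding of the branch of Figure 4.5 that follows P2.
WCET: Dict[str, float] = {"P4": 4}    # processes -> worst case execution time (ms)
SENDER: Dict[str, int] = {"m": 1}     # messages -> node whose slot carries them
SUCCESSORS: Dict[str, List[str]] = {"m": ["P4"], "P4": []}

SLOT_START = {0: 0, 1: 10}    # S0 first, then S1 (ms)
SLOT_LENGTH = {0: 10, 1: 8}
ROUND_LENGTH = 18


def lam(t: float, v: str) -> float:
    """Latest finishing time over all paths starting at vertex v, entered at time t."""
    if v in SENDER:                                   # message: wait for its slot to end
        node = SENDER[v]
        rnd = floor(t / ROUND_LENGTH)
        if t - rnd * ROUND_LENGTH > SLOT_START[node]:
            rnd += 1
        t = rnd * ROUND_LENGTH + SLOT_START[node] + SLOT_LENGTH[node]
    else:                                             # process: add its WCET
        t = t + WCET[v]
    return max([t] + [lam(t, s) for s in SUCCESSORS[v]])


# P2 (WCET 8 ms) ends at t'=8 if started at 0, or at t'=13 if started at 5.
print(lam(8, "m") - 8)    # 14
print(lam(13, "m") - 13)  # 27
```

Subtracting the ready time gives the MPCP estimate for P2: 22 - 8 = 14 when m is ready at 8 ms, but 40 - 13 = 27 when it is ready at 13 ms. This is why the priority has to be recomputed whenever GetReadyProcess is called.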
As the experimental results (Section 4.3) show, using MPCP instead of PCP for the TTP based architecture results in a significant improvement in the quality of the generated schedules, with a slight increase in scheduling time.
4.2.3 COMMUNICATION SYNTHESIS
In the previous subsections we have shown how the algorithm
ListScheduling can produce an efficient schedule for a CPG, given
a certain TDMA bus access scheme. However, as shown in
Figure 4.3, both the ordering of slots and the slot lengths
strongly influence the worst case execution delay of the system.
We first present a heuristic which, based on a greedy approach, determines an ordering of slots and their lengths so that the worst case delay corresponding to a certain CPG is as small as possible.
Greedy Approaches. The initial solution, the so-called “straightforward” one, assigns nodes to the slots in order (NodeSi = Ni) and fixes the slot length lengthSi to the minimal allowed value, which is equal to the length of the largest message generated by a process assigned to NodeSi. The algorithm starts with the first slot and tries to find the node which, when transmitting in this slot, will minimize the worst case delay of the system, as produced by ListScheduling. Simultaneously with searching for the right node to be assigned to the slot, the algorithm looks for the optimal slot length. Once a node has been selected for the first slot and a slot length fixed, the algorithm continues with the next slots, trying to assign nodes (and to fix slot lengths) from those nodes which have not yet been assigned.

OptimizeAccess
  -- create the initial, straightforward solution
  for i = 0 to NrSlot - 1 do
    NodeSi = Ni
    lengthSi = MinLengthSi
  end for
  -- over all slots
  for i = 0 to NrSlot - 1 do
    -- over all slots which have not yet been allocated
    -- a node and slot length
    for j = i to NrSlot - 1 do
      swap values (NodeSi, lengthSi) with (NodeSj, lengthSj)
      -- initially, lengthSi has the minimal allowed value
      for all slot lengths λ larger than lengthSi do
        lengthSi = λ
        ListScheduling( ... )
        remember BestSolution = (NodeSi, lengthSi),
          with the smallest δmax produced by ListScheduling
      end for
      swap back values (NodeSi, lengthSi) with (NodeSj, lengthSj)
        to the state before entering the for loop
    end for
    -- slot Si gets a node allocated and a length fixed
    bind (NodeSi, lengthSi) = BestSolution
  end for
end OptimizeAccess
Figure 4.7: Optimization of the Bus Access Scheme
When calculating the length of a certain slot, a first alterna-
tive could be to try all the slot lengths λ allowed by the protocol.
Such an approach starts with the minimum slot length deter-
mined by the largest message to be sent from the candidate
node, and it continues incrementing by the smallest data unit (e.g., 2 bits) up to the largest slot length determined by the maximum allowed data field in a TTP frame (e.g., 32 bits, depending on the controller implementation). We call this alternative OptimizeAccess1. A second alternative, OptimizeAccess2, is based on feedback from the scheduling algorithm, which recommends slot sizes to be tried out. Before starting the actual optimization
process for the bus access scheme, a scheduling of the straight-
forward solution is performed which generates the recom-
mended slot lengths. These lengths are produced by the Sched-
uleMessage function (Figure 4.4), whenever a new round has to
be selected because of lack of space in the current slot. In such a
case the slot length which would be needed in order to accommo-
date the new message is added to the list of recommended
lengths for the respective slot. With this alternative, the optimi-
zation algorithm in Figure 4.7 only selects among the recom-
mended lengths when searching for the right dimension of a
certain slot.
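A Python sketch of the greedy search of Figure 4.7 in its OptimizeAccess1 form follows: every slot length from the minimum up to the maximum is tried, in steps of the smallest data unit. `toy_cost` is a hypothetical stand-in for the worst case delay δmax returned by ListScheduling, chosen only so that the result can be checked by hand. Unlike the literal "larger than lengthSi" in Figure 4.7, the sketch also evaluates the minimal length itself.

```python
from typing import Callable, Dict, List, Tuple

Slot = Tuple[int, int]  # (node transmitting in the slot, slot length in bits)


def optimize_access(nodes: List[int], min_length: Dict[int, int], max_length: int,
                    step: int, cost: Callable[[List[Slot]], float]) -> List[Slot]:
    # straightforward solution: slot i -> node i, minimal allowed length
    slots: List[Slot] = [(n, min_length[n]) for n in nodes]
    for i in range(len(slots)):
        best_cost, best = float("inf"), None
        for j in range(i, len(slots)):                 # not yet allocated candidates
            slots[i], slots[j] = slots[j], slots[i]    # try the node of slot j in slot i
            node, saved = slots[i][0], slots[i]
            for length in range(min_length[node], max_length + 1, step):
                slots[i] = (node, length)
                c = cost(slots)                        # stands in for ListScheduling
                if c < best_cost:
                    best_cost, best = c, (j, node, length)
            slots[i] = saved
            slots[i], slots[j] = slots[j], slots[i]    # swap back
        j, node, length = best                         # bind the best choice to slot i
        slots[i], slots[j] = slots[j], slots[i]
        slots[i] = (node, length)
    return slots


def toy_cost(slots: List[Slot]) -> float:
    """Hypothetical cost: node 1 should transmit early; longer slots cost time."""
    order = [n for n, _ in slots]
    return 10 * order.index(1) + sum(length for _, length in slots)


print(optimize_access([0, 1, 2], {0: 16, 1: 8, 2: 16}, 32, 2, toy_cost))
```

With this cost, node 1 is moved to the first slot and all slots keep their minimal lengths, so the call prints `[(1, 8), (0, 16), (2, 16)]`. An OptimizeAccess2-style variant would replace the `range(...)` of lengths by the list of recommended lengths for the slot.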
Simulated Annealing. A second algorithm we have devel-
oped is based on a simulated annealing (SA) strategy.
The greedy strategy constructs the solution by progressively selecting the best candidate in terms of the schedule length produced by the function ListScheduling. Unlike the greedy strategy, SA tries to escape from a local optimum by randomly selecting a new solution from the neighbours of the current solution. The new solution is accepted if it is an improved solution. However, a worse solution can also be accepted with a certain probability that depends on the deterioration of the cost function and on a control parameter called temperature [Ree93].
In Figure 4.8 we give a short description of this algorithm. An essential component of the algorithm is the generation of a new solution x’ starting from the current one xnow. The neighbours of the current solution xnow are obtained by a permutation of the slots in the TDMA round and/or by increasing/decreasing the slot lengths. We generate the new solution by either randomly swapping two slots (with a probability of 0.3) and/or by increasing/decreasing the length of a randomly selected slot by the smallest data unit (with a probability of 0.7).

SimulatedAnnealing
  construct an initial TDMA round xnow
  temperature = initial temperature TI
  repeat
    for i = 1 to temperature length TL
      generate randomly a neighbouring solution x’ of xnow
      delta = schedule with x’ - schedule with xnow
      if delta < 0 then xnow = x’
      else
        generate q = random (0, 1)
        if q < e^(-delta / temperature) then xnow = x’ end if
      end if
    end for
    temperature = α * temperature
  until stopping criterion is met
  return solution corresponding to the best schedule
end SimulatedAnnealing
Figure 4.8: The Simulated Annealing Strategy
For the implementation of this algorithm, the parameters TI (initial temperature), TL (temperature length), α (cooling ratio), and the stopping criterion have to be determined. They define the so-called cooling schedule and have a decisive impact on the quality of the solutions and the CPU time consumed. We were interested in obtaining values for TI, TL and α that guarantee good quality solutions in an acceptable time.
For graphs with 160 or fewer processes we were able to run an exhaustive search that found the optimal solutions. For the remaining graph dimensions, we performed very long and expensive runs with the SA algorithm, and the best solution ever produced was considered as the optimum for the further experiments. Based on further experiments, we determined the parameters of the SA algorithm so that the optimization time is reduced as much as possible while the optimal result is still produced. For example, for the graphs with 320 processes, TI is 500, TL is 400 and α is 0.97. The algorithm stops if no new solution has been accepted for three consecutive temperatures.
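A Python sketch of the annealing loop of Figure 4.8 follows. It uses the neighbourhood moves described above (a swap of two slots with probability 0.3, otherwise a one-unit change of a random slot length) and the stopping rule of three consecutive temperatures without an accepted solution. The defaults for TI, TL and α are the values quoted for the 320-process graphs. `toy_cost` is a hypothetical stand-in for ListScheduling. Two details are assumptions that guarantee termination: the iteration cap `max_temperatures`, and the rule that equal-cost moves are not counted as new solutions.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

Slot = Tuple[int, int]  # (node transmitting in the slot, slot length in bits)


def neighbour(slots: List[Slot], min_length: Dict[int, int], max_length: int,
              step: int, rng: random.Random) -> List[Slot]:
    new = list(slots)
    if len(new) > 1 and rng.random() < 0.3:          # swap two slots
        i, j = rng.sample(range(len(new)), 2)
        new[i], new[j] = new[j], new[i]
    else:                                            # resize one slot by one data unit
        k = rng.randrange(len(new))
        node, length = new[k]
        length += rng.choice((-step, step))
        new[k] = (node, min(max_length, max(min_length[node], length)))
    return new


def simulated_annealing(initial: List[Slot], cost: Callable[[List[Slot]], float],
                        min_length: Dict[int, int], max_length: int, step: int,
                        ti: float = 500.0, tl: int = 400, alpha: float = 0.97,
                        seed: int = 0, max_temperatures: int = 10_000):
    rng = random.Random(seed)
    now, now_cost = list(initial), cost(initial)
    best, best_cost = list(now), now_cost
    temperature, idle = ti, 0
    for _ in range(max_temperatures):                # safety cap (assumption)
        accepted = False
        for _ in range(tl):                          # temperature length TL
            cand = neighbour(now, min_length, max_length, step, rng)
            delta = cost(cand) - now_cost
            # Metropolis rule; equal-cost moves are rejected (assumption)
            if delta < 0 or (delta > 0 and rng.random() < math.exp(-delta / temperature)):
                now, now_cost, accepted = cand, now_cost + delta, True
                if now_cost < best_cost:
                    best, best_cost = list(now), now_cost
        idle = 0 if accepted else idle + 1
        if idle == 3:                                # three idle temperatures: stop
            break
        temperature *= alpha                         # cooling ratio alpha
    return best, best_cost


def toy_cost(slots: List[Slot]) -> float:
    """Hypothetical stand-in for ListScheduling: node 1 early, short slots."""
    order = [n for n, _ in slots]
    return 10 * order.index(1) + sum(length for _, length in slots)


start = [(0, 16), (1, 8), (2, 16)]
print(simulated_annealing(start, toy_cost, {0: 16, 1: 8, 2: 16}, 32, 2,
                          ti=10.0, tl=50, alpha=0.9))
```

The small TI, TL and α in the usage line only keep this toy run short. The best solution is tracked separately from the current one, matching the "return solution corresponding to the best schedule" step of Figure 4.8.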
4.3 Experimental Results
For the evaluation of our scheduling algorithms we first used conditional process graphs generated for experimental purposes. We considered architectures consisting of 2, 4, 6, 8 and 10 nodes. Forty processes were assigned to each node, resulting in graphs of 80, 160, 240, 320 and 400 processes. Thirty graphs were generated for each graph dimension; thus a total of 150 graphs were used for experimental evaluation. Execution times and message lengths were assigned randomly using both uniform and exponential distributions. For the communication channel we considered a transmission speed of 256 kbps and a length below 20 meters. The maximum length of the data field was 8 bytes, and the frequency of the TTP controller was chosen to be 20 MHz. All experiments were run on a SPARCstation 20.
The first result concerns the quality of the schedules produced
by the list scheduling based algorithm using PCP and MPCP
priority functions. In order to compare the two priority functions
we have calculated the average percentage deviations of the
schedule length produced with PCP and MPCP from the length
of the best schedule between the two. The results are depicted in Figure 4.9. On average, the deviation with MPCP is 11.34 times smaller than with PCP. However, due to its dynamic nature, MPCP has on average a longer execution time than PCP. The average execution times for the ListScheduling function using PCP and MPCP are depicted in Figure 4.10 and are under half a second for graphs with 400 processes.
Figure 4.9: Quality of Schedules with PCP and MPCP. Plot of average percentage deviation versus number of processes (80, 160, 240, 320, 400) for PCP and MPCP.

In the next experiments we were interested in checking the potential of the algorithms presented in Section 4.2.3 to improve the generated schedules by optimizing the bus access scheme.
We compared schedule lengths obtained for the 150 CPGs con-
sidering four different bus access schemes: the straightforward
solution, the optimized schemes generated with the two alterna-
tives of our greedy algorithm (OptimizeAccess1 and
OptimizeAccess2) and the near-optimal scheme produced using the simulated annealing (SA) based algorithm. Very long and extensive runs have been performed with the SA algorithm for each graph, and the best solution ever produced has been considered as the near-optimum for that graph.

Figure 4.10: Average Exec. Times of PCP and MPCP. Plot of average execution time in seconds versus number of processes for PCP and MPCP.

Table 4.1 presents the average and maximum percentage deviation of the schedule lengths obtained with the straightforward solution and with the two optimized schemes from the length obtained with the near-optimal scheme. For each of the graph dimensions, the average optimization time, expressed in seconds, is also given. The first conclusion is that by optimizing the bus access scheme, the results improve significantly compared to the straightforward solution. The
greedy heuristic performs well for all the graph dimensions. As
expected, the alternative OptimizeAccess1 (which considers all
allowed slot lengths) produces slightly better results, on aver-
age, than OptimizeAccess2. However, the execution times are
much smaller for OptimizeAccess2. It is worth mentioning that the average execution times of the SA algorithm needed to find the near-optimal solutions range from 5 minutes for the CPGs with 80 processes to 275 minutes for those with 400 processes.
Finally, Chapter 6 presents a real-life example implementing
a vehicle cruise controller.
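Table 4.1: Evaluation of the Bus Access Optimization Algorithm

| Nr. of proc. | Straightforward: avg. dev. | Straightforward: max. dev. | OptimizeAccess1: avg. dev. | OptimizeAccess1: max. dev. | OptimizeAccess1: exec. time (s) | OptimizeAccess2: avg. dev. | OptimizeAccess2: max. dev. | OptimizeAccess2: exec. time (s) |
|---|---|---|---|---|---|---|---|---|
| 80 | 3.16% | 21% | 0.02% | 0.5% | 0.25 | 1.8% | 19.7% | 0.04 |
| 160 | 14.4% | 53.4% | 2.5% | 9.5% | 2.07 | 4.9% | 26.3% | 0.28 |
| 240 | 37.6% | 110% | 7.4% | 24.8% | 10.46 | 9.3% | 31.4% | 1.34 |
| 320 | 51.5% | 135% | 8.5% | 31.9% | 34.69 | 12.1% | 37.1% | 4.8 |
| 400 | 48% | 135% | 10.5% | 32.9% | 56.04 | 11.8% | 31.6% | 8.2 |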
Chapter 5
Schedulability Analysis and
Communication Synthesis
for Event Driven Systems
TTP HAS BEEN classically associated with non-preemptive
static scheduling of processes, mainly because of fault tolerance
reasons [Kop97a]. In the previous chapter we have addressed
the issue of non-preemptive static process scheduling and com-
munication synthesis using TTP.
However, considering preemptive priority based scheduling at the process level, with time triggered static scheduling at the communication level, can be the right solution under certain circumstances [Lon99]. A communication protocol like TTP provides a global time base and improves fault tolerance and predictability. At the same time, certain particularities of the application or of the underlying real-time operating system can impose a priority based scheduling policy at the process level.
In this chapter we consider event-driven distributed real-time systems where the activation of processes is event-triggered, while the communications are time-triggered, according to the TTP.
The chapter is structured as follows. The next section intro-
duces some previous work on schedulability analysis that is
needed for later discussions. We then show how the current
state of the art schedulability analysis for distributed real-time
systems can be extended to consider the time triggered protocol
(Section 5.2). Section 5.3 presents the schedulability analysis we
have developed for systems with both control and data depend-
encies modelled as a set of conditional process graphs. Once real-
istic communication aspects are captured by the schedulability
analysis, this can be used to drive the communication synthesis
process described in Section 5.4. Finally, Section 5.5 presents the
experimental results obtained for the work presented in this
chapter.
5.1 Schedulability Analysis
A set of processes is schedulable if there exists at least one
scheduling algorithm that is able to produce a feasible schedule.
A schedule is feasible if all tasks can be completed within the
specified constraints.
The aim of the schedulability analysis is to determine suffi-
cient and necessary conditions under which the system is sched-
ulable. There are basically two approaches to this problem:
 • produce an exact (both sufficient and necessary) feasibility test for a set of processes within the constraints; and
 • derive the worst-case response times (the longest time
between the arrival of a process and its completion) and com-
pare them to the deadlines, the so called response time analy-
sis.
The schedulability analysis has its roots in [Liu73], which provided a sufficient feasibility test (based on a utilization bound) for processes that have priorities assigned according to the rate monotonic priority assignment policy [Liu73]:

$$\sum_{i=1}^{n} \frac{C_i}{T_i} \le n\left(2^{1/n} - 1\right)$$

where n is the number of processes, Ci is the worst case execution time of process Pi, and Ti is its period (equal to its deadline under the assumptions of [Liu73]).
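A small Python sketch of the utilization bound follows; the task sets (Ci, Ti) are illustrative. A False result only means that this sufficient test is inconclusive, not that the process set is unschedulable.

```python
from typing import List, Tuple


def liu_layland_test(tasks: List[Tuple[float, float]]) -> bool:
    """Sufficient RM test: sum(C_i / T_i) <= n * (2^(1/n) - 1)."""
    n = len(tasks)
    utilization = sum(c / t for c, t in tasks)
    return utilization <= n * (2 ** (1 / n) - 1)


print(liu_layland_test([(1, 4), (1, 6)]))           # True:  U = 0.417 <= 0.828
print(liu_layland_test([(1, 4), (2, 6), (3, 13)]))  # False: U = 0.814 >  0.780
```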
There are several unrealistic assumptions in [Liu73]: the
processes are considered independent, periodic, having a dead-
line equal to the period, etc.
The second approach, using response time analysis to check the exact feasibility of a set of processes, has been discussed in [Aud91]. It compares the worst case response time of each process to its deadline; if no response time exceeds the corresponding deadline, the system is schedulable. The response time is given by:

$$r_i = C_i + \sum_{\forall j \in hp(P_i)} \left\lceil \frac{r_i}{T_j} \right\rceil C_j$$

where ri is the response time of process Pi and hp(Pi) is the set of processes that have higher priority than Pi. The authors note that the summation term increases monotonically in ri, thus solutions can be found using a recurrence relation. This approach makes no assumptions regarding the priority assignment scheme, and allows the deadlines to be less than the periods.
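The recurrence can be solved by fixed-point iteration, starting from ri = Ci. The following Python sketch makes simplifying assumptions: independent processes on one processor, listed in decreasing priority order, with Di ≤ Ti and no blocking or release jitter. For the illustrative set {(1, 4), (2, 6), (3, 13)}, the utilization (about 0.814) exceeds the three-process bound of [Liu73] (about 0.780). The response time analysis nevertheless shows that every process meets its deadline.

```python
from math import ceil
from typing import List, Optional, Tuple


def response_times(tasks: List[Tuple[int, int, int]]) -> List[Optional[int]]:
    """tasks: (C_i, T_i, D_i) in decreasing priority order, with D_i <= T_i.
    Returns the worst case response time r_i, or None if r_i would exceed D_i."""
    result: List[Optional[int]] = []
    for i, (c_i, _t_i, d_i) in enumerate(tasks):
        r = c_i                                       # start the recurrence at C_i
        while True:
            nxt = c_i + sum(ceil(r / t_j) * c_j for c_j, t_j, _ in tasks[:i])
            if nxt == r or nxt > d_i:                 # converged, or deadline missed
                break
            r = nxt
        result.append(r if nxt == r and r <= d_i else None)
    return result


print(response_times([(1, 4, 4), (2, 6, 6), (3, 13, 13)]))  # [1, 3, 10]
```

For the lowest priority process the iteration visits 3, 6, 7, 9 and 10 before reaching the fixed point r3 = 10 ≤ 13.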
A large body of research in the last decades has aimed at relaxing the assumptions made by these early works, and the reader is referred to the overview in [Aud95]. Of particular importance for our work are the research in [Yen98, Tin94b], which aims to relax the assumptions made on the independence of processes, and the research presented in [Tin94a], which extends the existing analysis to distributed systems with a simple TDMA protocol. We therefore present these works in detail in the coming sections.
5.2 Schedulability Analysis with the Time
Triggered Protocol
For the purpose of this section, we consider applications modelled as a set of processes. Each process Pi is allocated to a certain processor and has a known worst-case execution time Ci, a period Ti, a deadline Di and a uniquely assigned priority. We consider a preemptive execution environment, which means that higher priority processes can interrupt the execution of lower priority processes. A lower priority process can block a higher priority process (e.g., when it is in its critical section), and the blocking time is computed according to the priority ceiling protocol [Sha90]. Processes exchange messages, and for each message mi we know its size $S_{m_i}$. A message is sent once in every $n_m$ invocations of the sending process, and has a unique destination process. Each process is allocated to a node of our distributed architecture, and the messages are transmitted according to the TTP.

We are interested in synthesizing the MEDL of the TTP controllers (and, as a direct consequence, also the MHTTs; see Section 2.2.3) so that the process set is schedulable on as cheap (slow) a processor set as possible.

Under these assumptions, Tindell et al. [Tin94a] integrate processor and communication schedulability and provide a “holistic” schedulability analysis in the context of distributed real-time systems with communication based on a simple TDMA protocol. The basic idea is that the release jitter of a destination process depends on the communication delay between sending and receiving a message. The release jitter of a process is the worst case delay between the arrival of the process and its release (when it is placed in the run-queue for the processor). The communication delay is the worst case time spent between sending a message and the message arriving at the destination process.
Thus, for a process d(m) that receives a message m from a sender process s(m), the release jitter is

$$J_{d(m)} = r_{s(m)} + a_m + r_{deliver} + T_{tick},$$

where rs(m) is the response time of the process sending the message, am (worst case arrival time) is the worst case time needed for message m to arrive at the communication controller of the destination node, rdeliver is the response time of the delivery process (see Section 2.2.3), and Ttick is the jitter due to the operation of the tick scheduler. The communication delay for a message m is

$$C_m = a_m + r_{deliver}.$$

The arrival time am is itself the sum of the access delay and the propagation delay. The access delay is the time a message queued at the sending processor spends waiting for the use of the communication channel. In am we also account for the execution time of the message transfer process (see Section 2.2.3). The propagation delay is the time taken for the message to reach the destination processor once physically sent by the corresponding TTP controller.

The worst case time message m takes to arrive at the communication controller of the destination node is determined in [Tin94a] using the arbitrary deadline analysis, and is given by:

$$a_m = \max_{q = 0, 1, 2, \ldots} \left( w_m(q) + X_m(q) - qT_m \right),$$

where the term $w_m(q) - qT_m$ is the access delay, $X_m(q)$ is the propagation delay, and Tm is the period of the message.

In [Tin94a] an analysis is given for the end-to-end delay of a message m in the case of a simple TDMA protocol. For this case,

$$w_m(q) = \left\lceil \frac{(q+1)\,p_m + I_m(w_m(q))}{S_p} \right\rceil T_{TDMA},$$

where pm is the number of packets of message m, Sp is the size of the slot (in number of packets) corresponding to m, and Im is the interference caused by packets belonging to messages of higher priority than m. Although there are many similarities with the general TDMA protocol, the analysis in the case of TTP is different in several aspects and also differs to a large degree depending on the policy chosen for message scheduling.
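A Python sketch of the simple-TDMA analysis summarized above follows, together with the release jitter of the receiving process. Two ingredients are assumptions not spelled out in this section. First, the interference term is taken as $I_m(w) = \sum_{j \in hp(m)} \lceil w/T_j \rceil p_j$, counting packets of higher-priority messages sharing the slot. Second, the search over q stops at the first q with $w_m(q) \le (q+1)T_m$, as is usual in arbitrary deadline analysis. The propagation delay is treated as a constant $X_m$. All function names and numbers in the usage lines are hypothetical.

```python
from math import ceil
from typing import List, Tuple


def tdma_arrival_time(p_m: int, t_m: float, x_m: float, s_p: int, t_tdma: float,
                      hp: List[Tuple[int, float]], max_q: int = 1000,
                      horizon: float = 1e9) -> float:
    """a_m = max over q of (w_m(q) + X_m - q*T_m) on a simple TDMA bus.

    p_m: packets of m, t_m: period of m, x_m: propagation delay (constant),
    s_p: slot size in packets, t_tdma: TDMA round length,
    hp: (packets, period) of the higher-priority messages sharing the slot.
    """
    def interference(w: float) -> int:              # assumed form of I_m(w)
        return sum(ceil(w / t_j) * p_j for p_j, t_j in hp)

    a_m = 0.0
    for q in range(max_q):
        w = ceil((q + 1) * p_m / s_p) * t_tdma       # initial guess: no interference
        while True:                                 # fixed point of w_m(q)
            nxt = ceil(((q + 1) * p_m + interference(w)) / s_p) * t_tdma
            if nxt == w:
                break
            if nxt > horizon:
                raise RuntimeError("w_m(q) diverges: the slot is overloaded")
            w = nxt
        a_m = max(a_m, w + x_m - q * t_m)
        if w <= (q + 1) * t_m:                      # busy period closed (assumption)
            return a_m
    raise RuntimeError("no convergence within max_q instances")


def release_jitter(r_sender: float, a_m: float, r_deliver: float,
                   t_tick: float) -> float:
    """J_d(m) = r_s(m) + a_m + r_deliver + T_tick."""
    return r_sender + a_m + r_deliver + t_tick


a = tdma_arrival_time(p_m=3, t_m=100, x_m=1, s_p=3, t_tdma=10, hp=[(1, 50)])
print(a)                                                          # 21
print(release_jitter(r_sender=5, a_m=a, r_deliver=2, t_tick=1))   # 29
```

Without the higher-priority packet, the three packets of m fit into one slot and a_m would be 11. The single interfering packet pushes the last packet of m into the next round, doubling the access delay.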
Before going into details for each of the message scheduling
approaches proposed by us, we analyze the propagation delay
and the message transfer and delivery processes, as they do not
depend on the particular message scheduling policy chosen. The
propagation delay Xm of a message m sent as part of a slot S,
with the TTP protocol, is equal to the time needed for the slot S
to be transferred on the bus. This time depends on the slot size
and on the features of the underlying bus.
The overhead produced by the communication activities must
be accounted for not only as part of the access delay for a mes-
sage, but also through its influence on the response time of proc-
esses running on the same processor. We consider this influence
during the schedulability analysis of processes on each proces-
sor. We assume that the worst case computation time of the
transfer process (T in Figure 2.5) is known, and that it is differ-
ent for each of the four message scheduling approaches. Based
on the respective MHTT, the transfer process is activated for
each frame sent. Its worst case period is derived from the mini-
mum time between successive frames.
The response time of the delivery process (D in Figure 2.5), rdeliver, is part of the communication delay. The influence of the delivery process must also be included when analyzing the response time of the processes running on the respective processor. We consider the delivery process during the schedulability analysis in the same way as the message transfer process.

The response times of the message transfer and delivery processes are calculated, as for all other processes, using the arbitrary deadline analysis from [Tin94a].
The four approaches we have considered for scheduling of messages using TTP differ in the way the messages are allocated to the communication channel (either statically or dynamically) and in whether or not they are split into packets for transmission. The next subsections present an analysis for these approaches, as well as the degrees of freedom a designer has in each case when synthesizing the MEDL.
5.2.1 STATIC SINGLE MESSAGE ALLOCATION (SM)
The first approach to scheduling of messages using TTP is to
statically (off-line) schedule each of the messages into a slot of
the TDMA cycle, corresponding to the node sending the message. We also assume that each slot can hold at most one message. This approach is well suited for application
areas (like automotive electronics) where the messages are typ-
ically short and the ability to easily diagnose the system is criti-
cal.
As each slot carries only one fixed, predetermined message, there is no interference among messages. If a message m misses its slot, it has to wait for the following slot assigned to m. The access delay for a message m in this approach is the maximum time between consecutive slots of the same node carrying the message m. We denote this time by $T_m^{max}$, illustrated in Figure 5.1.

In this case, the worst case arrival time am of a message m becomes $a_m = T_m^{max} + X_m$. Therefore, the main aspect influencing the schedulability analysis for the messages is the way the messages are statically allocated to slots, resulting in different values for $T_m^{max}$. Both $T_m^{max}$ and Xm depend on the slot sizes, which in the case of SM are determined by the size of the largest message sent from the corresponding node, plus the bits for control and CRC, as imposed by the protocol.

During the synthesis of the MEDL, the designer has to allocate the messages to slots in such a way that the process set is schedulable. Since the schedulability of the process set can be influenced by the synthesis of the MEDL only through the $T_m^{max}$ parameters, these parameters have to be optimized.
Let us consider the simple example depicted in Figure 5.2, where we have three processes, P1, P2, and P3, each running on a different processor. When process P1 finishes executing, it sends message m1 to process P3 and message m2 to process P2. In the TDMA configuration presented in Figure 5.2a, only the slot for the CPU running P1 is relevant to our discussion; the other slots are shown in light gray. With this configuration, where message m1 is allocated to rounds 1 and 4 and message m2 is allocated to rounds 2 and 3, process P2 misses its deadline because of the release jitter caused by message m2 in round 2. However, with the TDMA configuration depicted in Figure 5.2b, where m1 is allocated to rounds 2 and 4 and m2 is allocated to rounds 1 and 3, all the processes meet their deadlines.
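As a minimal sketch of how $T_m^{max}$ follows from a static allocation, the Python function below measures the largest cyclic gap, in TDMA rounds, between consecutive rounds whose slot carries message m, and converts it to time by multiplying with the round length $T_{TDMA}$. It assumes that a node's slot sits at the same position in every round, so slot-to-slot distances are whole rounds, and it treats $X_m$ as a given input. The function names are hypothetical; only the round numbers come from the Figure 5.2 example.

```python
def t_max(rounds, n_rounds, t_tdma):
    """Largest time between consecutive slots carrying a message.

    rounds: 1-based round numbers in which the message is scheduled.
    n_rounds: number of rounds in the MEDL (the cycle repeats).
    t_tdma: duration of one TDMA round (any time unit).
    """
    rs = sorted(set(rounds))
    if not rs:
        raise ValueError("message is not allocated to any round")
    # Gaps between successive allocations, plus the wrap-around from the
    # last allocated round to the first allocated round of the next cycle.
    gaps = [b - a for a, b in zip(rs, rs[1:])]
    gaps.append(rs[0] + n_rounds - rs[-1])
    return max(gaps) * t_tdma


def arrival_time(rounds, n_rounds, t_tdma, x_m):
    """Worst case arrival time a_m = T_m^max + X_m (X_m is given)."""
    return t_max(rounds, n_rounds, t_tdma) + x_m


# Round allocations of Figure 5.2 (4-round MEDL, T_TDMA = 1 time unit)
config_a = {"m1": [1, 4], "m2": [2, 3]}
config_b = {"m1": [2, 4], "m2": [1, 3]}
for name, cfg in (("a", config_a), ("b", config_b)):
    print(name, {m: t_max(r, 4, 1) for m, r in cfg.items()})
# a {'m1': 3, 'm2': 3}
# b {'m1': 2, 'm2': 2}
```

Configuration b shortens the worst case gap from three rounds to two for both messages. The gap alone does not decide schedulability, which also depends on the release jitter analysis, but it is the quantity the SM optimization acts on.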
5.2.2 STATIC MULTIPLE MESSAGE ALLOCATION (MM)
This second approach is an extension of the first one. Here we allow more than one message to be statically assigned to a slot, and all the messages transmitted in the same slot are packaged together in a frame. In this case there is also no interference, so the access delay for a message m is the same as for the first approach, namely, the maximum time between consecutive slots of the same node carrying the message m, $T_m^{max}$.
Figure 5.1: Worst case arrival time for SM
However, this approach offers more freedom during the synthesis of the MEDL. We now also have to decide how many and which messages should be put in a slot. This allows more flexibility in optimizing the $T_m^{max}$ parameter. To illustrate this, let us consider the same example depicted in Figure 5.2. With the MM approach, the TDMA configuration can be arranged as depicted in Figure 5.2c, where messages m1 and m2 are put together in the same slot in rounds 1 and 2. Thus, the deadlines are met, and the release jitter is further reduced compared to the case presented in Figure 5.2b, where the deadlines were also met but process P3 was experiencing large release jitter.
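Because MM packs several messages into one frame, every candidate MEDL must respect the slot capacity. The sketch below, with hypothetical names and message sizes, checks that the messages assigned to a node's slot in each round fit in the frame's data field; it deliberately ignores the control and CRC bits that the protocol adds on top of the data field.

```python
def frame_fits(medl, sizes, data_field):
    """Check that every round's frame can carry its assigned messages.

    medl: list of rounds; each round is the list of message names packed
          into this node's slot (an empty list means an empty slot).
    sizes: message name -> size in bytes.
    data_field: capacity of the slot's data field in bytes.
    """
    return all(sum(sizes[m] for m in rnd) <= data_field for rnd in medl)


# Hypothetical sizes; the MEDL mirrors the round pattern of Figure 5.2c
sizes = {"m1": 4, "m2": 4}
medl_c = [["m1", "m2"], ["m1", "m2"], [], []]
print(frame_fits(medl_c, sizes, data_field=8))   # True
print(frame_fits(medl_c, sizes, data_field=6))   # False
```

The second call shows that packing m1 and m2 together requires a larger slot, and since slot sizes determine the length of the TDMA round, this choice also affects the other nodes.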
5.2.3 DYNAMIC MESSAGE ALLOCATION (DM)
The previous two approaches have statically allocated one or
more messages to their corresponding slots. This third approach
considers that the messages are dynamically allocated to
frames, as they are produced.
Thus, when a message is produced by a sender process, it is placed in the Out queue, ordered according to the priorities of the messages. At its activation, the message transfer process takes a certain number of messages from the head of the Out queue and constructs the frame. The number of messages accepted is chosen so that their total size does not exceed the length of the data field of the frame. This length is limited by the size of the slot corresponding to the respective processor. Since the messages are sent dynamically, they have to be identified so that they can be recognized when the frame arrives at the delivery process. We consider that each message has several identifier bits appended at the beginning of the message.
Figure 5.2: Optimizing the MEDL for SM and MM
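The frame construction performed by the message transfer process can be sketched in Python as follows. This is an illustration under stated assumptions: the Out queue is modelled with `heapq` (a smaller number means a higher priority), sizes are in bits, `ID_BITS` is a hypothetical per-message identifier overhead, and packing stops at the first message that does not fit, since messages leave the queue strictly in priority order.

```python
import heapq

ID_BITS = 8  # hypothetical identifier overhead per message, in bits


def build_frame(out_queue, data_field_bits):
    """Pop messages from the head of the Out queue while they fit.

    out_queue: heap of (priority, seq, name, size_bits); a lower priority
               value is served first, seq breaks ties.
    Returns the list of message names packed into this frame.
    """
    frame, used = [], 0
    while out_queue:
        _, _, name, size = out_queue[0]          # peek at the head
        if used + size + ID_BITS > data_field_bits:
            break                                # head does not fit: stop
        heapq.heappop(out_queue)
        frame.append(name)
        used += size + ID_BITS
    return frame


q = []
for seq, (prio, name, size) in enumerate([(2, "m2", 40), (1, "m1", 40)]):
    heapq.heappush(q, (prio, seq, name, size))
print(build_frame(q, data_field_bits=64))   # ['m1']  (m2 waits for the next round)
print(build_frame(q, data_field_bits=64))   # ['m2']
```

A message larger than the data field would never leave the queue, which is why DM assumes that every message fits in the slot of its sender.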
Since we dynamically package the messages into frames in
the order they are sorted in the queue, the access delay to the
communication channel for a message m depends on the number
of messages queued ahead of it.
The analysis in [Tin94a] bounds the number of packets of higher priority messages queued ahead of message m, since it assumes that a message can be split into packets before it is transmitted on the communication channel. We use the same analysis, but we have to apply it to the number of messages instead of packets. We also have to take into account that messages can be of different sizes, as opposed to packets, which always have the same size.
Therefore, the total size of higher priority messages queued ahead of a message m in a window w is:
$$I_m(w) = \sum_{\forall j \in hp(m)} \left\lceil \frac{w + r_{s(j)}}{T_j} \right\rceil S_j,$$
where $S_j$ is the size of the message $m_j$, $r_{s(j)}$ is the response time of the process sending message $m_j$, and $T_j$ is the period of the message $m_j$.
Further, we calculate the worst case time that a message m spends in the Out queue. The number of TDMA rounds needed, in the worst case, for a message m placed in the queue to be removed from the queue for transmission is
$$\left\lceil \frac{S_m + I_m}{S_s} \right\rceil,$$
where $S_m$ is the size of the message m and $S_s$ is the size of the slot transmitting m (we assume, in the case of DM, that $S_x \le S_s$ for any message x). This means that the worst case time a message m spends in the Out queue is given by
$$\left\lceil \frac{S_m + I_m}{S_s} \right\rceil T_{TDMA},$$
where $T_{TDMA}$ is the time taken for a TDMA round.
To determine the term $w_m(q) - qT_m$ that gives the access delay (see Section 4), $w_m(q)$ is determined, using the arbitrary deadline analysis, as:
$$w_m(q) = \left\lceil \frac{(q+1)S_m + I_m(w_m(q))}{S_s} \right\rceil T_{TDMA}.$$
Since the size of the messages is given with the application, the parameter that will be optimized during the synthesis of the MEDL is the slot size. To illustrate how the slot size influences schedulability, let us consider the example in Figure 5.3a, which has the same setting as the example in Figure 5.2a. The difference is that message m1 now has a higher priority than message m2, and the messages are scheduled dynamically as they are produced. With the TDMA configuration in Figure 5.3a, message m1 will be dynamically scheduled first in the slot of the first round, while message m2 will wait in the Out queue until the next round comes, thus causing process P2 to miss its deadline. However, if we enlarge the slot so that it can accommodate both messages, message m2 does not have to wait in the queue and is transmitted in the same slot as m1. Therefore, P2 will meet its deadline, as presented in Figure 5.3b. In general, however, increasing the length of slots does not necessarily improve schedulability, as it delays the communication of messages generated by other nodes.
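The fixed point for $w_m(q)$ can be computed by the usual iteration, sketched below in Python. This is a sketch, not the thesis implementation: the function names are hypothetical, the iteration starts from $w = 0$, and the loop over successive instances q (stopping once $w_m(q) \le (q+1)T_m$) follows the standard arbitrary deadline analysis. Sizes and times only need consistent units.

```python
import math


def interference(w, hp):
    """I_m(w): total size of higher priority messages queued ahead of m.

    hp: list of (S_j, T_j, r_sj) for every message m_j in hp(m).
    """
    return sum(math.ceil((w + r_s) / T) * S for S, T, r_s in hp)


def queuing_window(q, S_m, S_s, T_tdma, hp, limit=10**6):
    """Iterate w = ceil(((q+1)S_m + I_m(w)) / S_s) * T_TDMA to a fixed point."""
    w = 0
    while True:
        w_next = math.ceil(((q + 1) * S_m + interference(w, hp)) / S_s) * T_tdma
        if w_next == w:
            return w
        if w_next > limit:
            raise RuntimeError("queuing window does not converge")
        w = w_next


def access_delay(T_m, S_m, S_s, T_tdma, hp, max_q=1000):
    """Largest w_m(q) - q*T_m over the busy period of message m."""
    worst = 0
    for q in range(max_q):
        w = queuing_window(q, S_m, S_s, T_tdma, hp)
        worst = max(worst, w - q * T_m)
        if w <= (q + 1) * T_m:
            return worst
    raise RuntimeError("busy period longer than max_q instances")


# One higher priority message: S_j = 3, T_j = 100, r_s(j) = 5
print(access_delay(T_m=100, S_m=2, S_s=4, T_tdma=10, hp=[(3, 100, 5)]))  # 20
```

Because $I_m(w)$ never decreases as w grows, the iteration from $w = 0$ climbs monotonically to the smallest fixed point or passes the limit.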
5.2.4 DYNAMIC PACKETS ALLOCATION (DP)
This approach is an extension of the previous one, as we allow
the messages to be split into packets before they are transmitted
on the communication channel. We consider that each slot has a
size that accommodates a frame with the data field being a mul-
tiple of the packet size. This approach is well suited for the
application areas that typically have large message sizes, and by
splitting them into packets we can obtain a higher utilization of
the bus and reduce the release jitter. However, since each packet
has to be identified as belonging to a message, and messages
have to be split at the sender and reconstructed at the destina-
tion, the overhead becomes higher than in the previous
approaches.
For the analysis we use the formula from [Tin94a], which is based on assumptions similar to those of this approach:
$$w_m(q) = \left\lceil \frac{(q+1)p_m + I_m(w_m(q))}{S_p} \right\rceil T_{TDMA},$$
where $p_m$ is the number of packets of message m, $S_p$ is the size of the slot (in number of packets) corresponding to m, and
$$I_m(w) = \sum_{\forall j \in hp(m)} \left\lceil \frac{w + r_{s(j)}}{T_j} \right\rceil p_j,$$
where $p_j$ is the number of packets of a message $m_j$.
In the previous approach (DM), the optimization parameter for the synthesis of the MEDL was the size of the slots. Within this approach we can also decide on the packet size, which becomes another optimization parameter. Consider the example in Figure 5.3c, where messages m1 and m2 have a size of 6 bytes each. The packet size is 4 bytes, and the slot corresponding to the messages has a size of 12 bytes (3 packets) in the TDMA configuration. Since message m1 has a higher priority than m2, it will be dynamically scheduled first in the slot of the first round, and it will need 2 packets. In the remaining packet, the first 4 bytes of m2 are scheduled. Thus, the remaining 2 bytes of message m2 have to wait for the next round, causing process P2 to miss its deadline. However, if we change the packet size to 3 bytes and keep the slot size of 12 bytes, we now have 4 packets in the slot corresponding to the CPU running P1 (Figure 5.3d). Message m1 will be dynamically scheduled first and will take 2 packets from the slot of the first round. This allows us to send m2 in the same round, so the deadline for P2 is met.
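The packet-size effect in this example can be reproduced with a few lines of Python. The sketch assumes, as in Figure 5.3c and d, that all messages are already in the Out queue in priority order at the first round, and it ignores packet identification overhead; `completion_rounds` is a hypothetical helper, not part of the analysis above.

```python
import math


def completion_rounds(msg_sizes, packet_size, slot_size):
    """Round (1-based) in which the last packet of each message is sent.

    msg_sizes: list of (name, size_in_bytes) in priority order.
    packet_size, slot_size: in bytes; a slot holds whole packets only.
    """
    packets_per_slot = slot_size // packet_size
    done, sent = {}, 0
    for name, size in msg_sizes:
        sent += math.ceil(size / packet_size)    # packets of this message
        done[name] = math.ceil(sent / packets_per_slot)
    return done


msgs = [("m1", 6), ("m2", 6)]
print(completion_rounds(msgs, packet_size=4, slot_size=12))  # {'m1': 1, 'm2': 2}
print(completion_rounds(msgs, packet_size=3, slot_size=12))  # {'m1': 1, 'm2': 1}
```

With 4-byte packets, each message needs two packets but the slot holds only three, so m2 finishes in round 2; with 3-byte packets the slot holds four packets and both messages finish in round 1.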
In this particular example, with a single sender processor and the given message and slot sizes, the problem seems simple. This is, however, not the case in general. For example, a packet size that fits a particular node can be unsuitable for the messages and slot size corresponding to another node. At the same time, reducing the packet size increases the overheads due to the transfer and delivery processes.
Figure 5.3: Optimizing the MEDL for DM and DP
5.3 Schedulability Analysis under Control and Data Dependencies
In the previous section we extended the response-time analysis to take into account the time-triggered protocol. For this, we modelled an application as a set of processes.
In this section we extend this model to handle both data and control dependencies. Thus, we consider applications modelled as a set ψ of n conditional process graphs Γi, i = 1..n. Every process Pi in such a graph is mapped to a certain processor and has a known worst-case execution time Ci, a deadline Di, and a uniquely assigned priority. All processes belonging to the same CPG Γi have the same period TΓi, which is the period of the respective conditional process graph. Each CPG in the application has its own independent period. Typically, global deadlines δΓi are imposed on the delay of each CPG, rather than individual deadlines on processes.
We consider a priority based preemptive execution environment, which means that higher priority processes will interrupt the execution of lower priority processes. A lower priority process can block a higher priority process (e.g., when it is in its critical section), and the blocking time is computed according to the priority ceiling protocol [Sha90].
Our goal is to develop a schedulability analysis for a system modelled as a set of conditional process graphs. For the rest of the section we consider that global deadlines are imposed on each CPG. The approach can be easily extended if individual deadlines are imposed on processes.
To show the relevance of our problem, let us consider the example depicted in Figure 5.4, where we have a system modelled as two conditional process graphs Γ1 and Γ2 with a total of 9 processes (processes P0, P8, P9 and P12 are dummy processes and are not counted) and one condition. The processes are mapped on three different processors, as indicated by the shading in Figure 5.4, and the worst case execution time in milliseconds of each process on its respective processor is depicted to the left of each node. Γ1 has a period of 200 ms and Γ2 has a period of 150 ms. The deadlines are 100 ms on Γ1 and 90 ms on Γ2.
Table 5.1 presents the estimated worst case delays of the two graphs. The column labelled “no conditions” gives the results for the case when the analysis is applied to the set of processes, ignoring control dependencies. This results in a worst case delay of 120 ms for Γ1 and 82 ms for Γ2. Since the delay of Γ1 exceeds its 100 ms deadline, the system is considered not schedulable.
Figure 5.4: System with Control and Data Dependencies
However, this analysis assumes, as a worst case scenario, that all nine processes may be activated in every execution of the system. This is the result that would be obtained using a data-flow graph representation of the system. Considering the CPG Γ1 in Figure 5.4, however, it is easy to observe that process P3, on the one hand, and processes P2 and P4, on the other, will never be activated during the same period of Γ1.
Using this information in the analysis, we obtain a worst case delay of 100 ms for Γ1, as shown in Table 5.1 in the column headed “conditions”, which indicates that the system is schedulable.
5.3.1 TASKS WITH DATA DEPENDENCIES
Methods for schedulability analysis of data dependent processes with static priority preemptive scheduling have been proposed in [Yen98] and [Tin94b]. They use the concepts of phase [Yen98] and offset [Tin94b] in order to handle data dependencies. [Tin94b] shows that the pessimism of the analysis is reduced through the introduction of offsets. The offsets, however, have to be determined by the designer.
[Yen98] provides a framework that iteratively finds the
phases (offsets) for all processes, and then feeds them back into
the schedulability analysis which in turn is used again to derive
better phases. Thus, the pessimism of the analysis is iteratively
reduced.
Table 5.1: Worst Case Delays (ms) for the System in Figure 5.4
CPG   no conditions   conditions
Γ1    120             100
Γ2    82              82
We have used the framework provided by [Yen98] as a starting point for our analysis. The response time of a process Pi is:
$$r_i = C_i + \sum_{\forall j \in hp(P_i)} \left\lceil \frac{r_i - O_{ij}}{T_j} \right\rceil C_j \qquad (1)$$
where hp(Pi) is the set of processes that have higher priority than Pi, and Oij is the phase of Pj relative to Pi.
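Equation (1) is recursive in $r_i$ and is normally solved by fixed-point iteration starting from $r_i = C_i$. The sketch below performs one such iteration under an assumption that the equation does not state explicitly: a term whose ceiling would be negative (the phase $O_{ij}$ exceeds the current window) is clamped to zero, so a higher priority process that cannot yet have been released contributes no interference. The function names and the deadline-based stopping rule are illustrative.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for b > 0 (also correct for negative a)."""
    return -((-a) // b)


def response_time(C_i, hp, deadline):
    """Solve r = C_i + sum(max(0, ceil((r - O_ij) / T_j)) * C_j).

    hp: list of (C_j, T_j, O_ij) for the higher priority processes P_j
        mapped on the same processor as P_i.
    Returns the fixed point, or None if it exceeds the deadline.
    """
    r = C_i
    while True:
        r_next = C_i + sum(max(0, ceil_div(r - O, T)) * C for C, T, O in hp)
        if r_next == r:
            return r
        if r_next > deadline:
            return None
        r = r_next


print(response_time(2, [(1, 5, 0)], deadline=10))   # 3: P_j interferes once
print(response_time(2, [(1, 5, 3)], deadline=10))   # 2: the phase shifts P_j out
```

The second call illustrates why offsets reduce pessimism: the same higher priority process no longer falls inside the window of Pi.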
As a first step, we have extended this analysis to real-time
systems that use the time-triggered protocol as the underlying
communication infrastructure. This aspect has been discussed
in Section 5.2.
In [Yen98] a system is modelled as a set S of n task graphs Gi, i = 1..n. The assumed system model and the definition of a task graph are similar to our CPG, but without any conditions. The aim of the schedulability analysis in [Yen98] is to derive a worst case delay on the execution time of each task graph in the system that is as tight as possible. This delay estimation is done using the algorithm DelayEstimate described in Figure 5.5.
At the core of this algorithm is a worst case response time cal-
culation based on offsets, similar to the analysis in [Tin94b].
Thus, in the LatestTimes function worst case response times and
upper bounds for the offsets are calculated, while the Earliest-
Times function calculates the lower bounds of the offsets.
The LatestTimes function is a modified critical-path algorithm
that calculates for each node of the graph the longest path to the
sink node. Thus, during the topological traversal of the graph G
within LatestTimes, for each process Pi, the worst case response
time ri is calculated according to the equation (1). This value is
based on the values of the offsets known so far. Once an ri is cal-
culated, it can be used to determine and update offsets for other
successor processes. Accordingly, the EarliestTimes function deter-
mines the lower bounds on the offsets. The influence on graph G
from other graphs in the system is considered in both of the
functions mentioned earlier.
These calculations can be improved by realizing that for a process Pi there might exist a process Pj mapped on the same processor, with priority(Pi) < priority(Pj), such that their execution windows never overlap. In this case, the term in equation (1) that expresses the influence of Pj on the execution of Pi can be dropped, resulting in a tighter worst case response time calculation. This situation is expressed through the so-called maxsep table, computed by the MaxSeparations function, whose value maxsep[Pi, Pj] is less than or equal to 0 if the two processes never overlap during their execution. maxsep stands for maximum separation, an analysis modified from [Mc92] that builds the maxsep table based on the worst case execution times and offsets determined in EarliestTimes and LatestTimes.
Figure 5.5: Delay Estimation and Schedulability Analysis for Task Graphs
DelayEstimate(task graph G, system S)
  -- derives the worst case delay of a task graph G considering
  -- the influence from all other task graphs in the system S
  for each pair (Pi, Pj) in G
    maxsep[Pi, Pj] = ∞
  end for
  step = 0
  repeat
    LatestTimes(G)
    EarliestTimes(G)
    for each Pi ∈ G
      MaxSeparations(Pi)
    end for
    step = step + 1
  until maxsep is not changed or step = limit
  return the worst case delay δG of the graph G
end DelayEstimate
SchedulabilityTest(system S)
  -- derives the worst case delay for each task graph in the system
  -- and verifies if the deadlines are met
  for each task graph Gi ∈ S
    DelayEstimate(Gi, S)
  end for
  if all task graphs meet their deadline, system S is schedulable
end SchedulabilityTest
With a better view of the maximum separation between each pair of processes, tighter worst case response times and offsets can be derived, which in turn contribute to the update of the maxsep table. This iterative tightening process is repeated until there is no modification to the maxsep table, or an imposed limit on the number of iterations is reached.
Finally, the DelayEstimate function returns the worst-case delay δG estimated for a task graph G as the latest time when the sink node of G can finish its execution. Based on the delays produced by DelayEstimate, the function SchedulabilityTest in Figure 5.5 decides whether the system is schedulable.
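The effect of the maxsep table on equation (1) can be sketched as a filter on the interference sum: a higher priority process Pj whose separation from Pi is known to be non-positive is left out. In the sketch below the table is a plain dictionary filled in by the caller (standing in for MaxSeparations, which is not reproduced), and offsets are omitted for brevity; all names are hypothetical.

```python
import math

INF = math.inf


def response_time_maxsep(pi, C, T, hp_of, maxsep, limit=10**6):
    """Fixed-point response time of pi, skipping non-overlapping Pj.

    C, T: process -> WCET and period; hp_of: process -> list of higher
    priority processes on the same processor; maxsep[(pi, pj)] <= 0
    means the two execution windows never overlap.
    Returns the fixed point, or a value above limit if it diverges.
    """
    hp = [pj for pj in hp_of[pi] if maxsep.get((pi, pj), INF) > 0]
    r = C[pi]
    while True:
        r_next = C[pi] + sum(math.ceil(r / T[pj]) * C[pj] for pj in hp)
        if r_next == r or r_next > limit:
            return r_next
        r = r_next


C = {"P1": 3, "P2": 2}
T = {"P1": 10, "P2": 10}
hp_of = {"P1": ["P2"], "P2": []}
print(response_time_maxsep("P1", C, T, hp_of, {}))                  # 5
print(response_time_maxsep("P1", C, T, hp_of, {("P1", "P2"): 0}))   # 3
```

A missing table entry is treated as ∞, matching the initialization in DelayEstimate, so interference is only dropped once the separation analysis has proved it impossible.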
5.3.2 CONDITIONAL PROCESS GRAPHS
Section 2.1.1 presented the conditional process graph. Before introducing our schedulability analysis for CPGs, we revisit two concepts: the unconditional subgraphs and the process guards.
Depending on the values calculated for the conditions, different alternative paths through a conditional process graph are activated for a given activation of the system. To model this, a boolean expression XPi, called guard (introduced in Section 2.1.1), can be associated with each node Pi in the graph. It represents the necessary condition for the respective process to be activated. In Figure 5.6, for example, XP4=C∧D, XP5=C, XP9=true, XP11=true, and XP12=K.
We call an alternative path through a conditional process graph, resulting from a combination of conditions, an unconditional subgraph, denoted by g. For example, the CPG Γ1 in Figure 5.6 has three unconditional subgraphs, corresponding to the following three combinations of conditions: C∧D, C∧D, and C. The unconditional subgraph corresponding to the combination C∧D in the CPG Γ1 consists of processes P1, P2, P4, P6, P7, P9 and P10.
The guards of each process, as well as the unconditional sub-
graphs resulting from a conditional process graph Γ can be
determined through a simple recursive topological traversal of
Γ.
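For small graphs, the unconditional subgraphs can also be obtained by enumerating condition values and keeping the processes whose guards hold. This is a simpler, exponential alternative to the recursive traversal mentioned above, not the thesis's own procedure. In the Python sketch below a guard is a conjunction written as a dictionary from condition name to required value (an empty dictionary means true); the process names and guards are hypothetical, chosen so that condition D is only evaluated when C is true.

```python
from itertools import product


def unconditional_subgraphs(guards, conditions):
    """Distinct process sets activated by some assignment of conditions.

    guards: process -> {condition: required_value}; {} means 'true'.
    """
    subgraphs = set()
    for values in product([True, False], repeat=len(conditions)):
        env = dict(zip(conditions, values))
        active = frozenset(p for p, g in guards.items()
                           if all(env[c] == v for c, v in g.items()))
        subgraphs.add(active)
    return subgraphs


guards = {
    "Pa": {},                        # always active
    "Pb": {"C": True, "D": True},
    "Pc": {"C": True, "D": False},
    "Pd": {"C": False},              # D is irrelevant on this path
}
for g in sorted(sorted(s) for s in unconditional_subgraphs(guards, ["C", "D"])):
    print(g)
```

Four assignments collapse into three subgraphs, because both values of D yield the same path when C is false.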
Ignoring Conditions (IC). A straightforward approach to the schedulability analysis of systems represented as CPGs is to ignore control dependencies and to apply the schedulability analysis as described in Section 5.3.1 (the algorithm SchedulabilityTest in Figure 5.5).
This means that conditional edges in the CPGs are treated like simple edges and the conditions in the model are dropped. The result is a system S consisting of simple task graphs Gi, each derived from a CPG Γi of the given system ψ. The system S can then be analyzed using the algorithm in Figure 5.7. Clearly, if the system S is schedulable, the system ψ is also schedulable.
Figure 5.6: Example of two CPGs
This approach, which we call IC, is, of course, very pessimistic.
However, this is the current practice when worst case arrival
periods are considered and classical data flow graphs are used
for modeling and scheduling.
Brute Force Solution (BF). The pessimism of the previous approach can be reduced by using a conditional process graph model. A simple, brute force solution is to apply the schedulability analysis presented in Section 5.3.1 after the CPGs have been decomposed into their constituent unconditional subgraphs.
Consider a system ψ which consists of n CPGs Γi, i = 1..n. Each CPG Γi can be decomposed into ni unconditional subgraphs $g_j^i$, j = 1..ni. In Figure 5.6, for example, we have three unconditional subgraphs $g_1^1$, $g_2^1$, $g_3^1$ derived from Γ1 and two, $g_1^2$ and $g_2^2$, derived from Γ2.
At the same time, each CPG Γi can be transformed into a simple task graph Gi by transforming conditional edges into ordinary ones and dropping the conditions. When deriving the worst case delay on Γi, we apply the analysis from Section 5.3.1 (algorithm DelayEstimate in Figure 5.5) separately to each unconditional subgraph $g_j^i$ in combination with the graphs (G1, G2, ..., Gi-1, Gi+1, ..., Gn). This means that we consider each alternative path from Γi in the context of the system, instead of the whole graph Gi as in the previous approach. This is described by the algorithm DE/CPG in Figure 5.8a. The schedulability analysis is then based on the delay estimation for each CPG, as shown in the algorithm SA/BF in Figure 5.8b.
Figure 5.7: Schedulability Analysis Ignoring Conditions
SA/IC(system ψ)
  -- verifies the schedulability of a system consisting of a set of
  -- conditional process graphs
  transform each Γi ∈ ψ into the corresponding Gi ∈ S
  SchedulabilityTest(S)
  if S is schedulable, system ψ is schedulable
end SA/IC
This approach, which we call BF, produces tight bounds on the delays but can be expensive in terms of runtime, because the analysis is applied to each unconditional subgraph. In general, the number of unconditional subgraphs can grow exponentially. For many practical systems, however, this is not the case, and the brute force method can be used. Otherwise, less expensive methods, like those presented below, should be applied.
Figure 5.8: Brute Force Schedulability Analysis
DE/CPG(CPG Γ, system S)
  -- derives the worst case delay of a CPG Γ considering
  -- the influence from all other task graphs in the system S
  extract all unconditional subgraphs gj from Γ
  for each gj
    DelayEstimate(gj, S)
  end for
  return the largest of the delays, which is
    the worst case delay δΓ of CPG Γ
end DE/CPG
a) DE/CPG -- Delay Estimate for Conditional Process Graphs
SA/BF(system ψ)
  -- verifies the schedulability of a system consisting of a set ψ of
  -- conditional process graphs
  transform each Γi ∈ ψ into the corresponding Gi ∈ S
  for each Γi ∈ ψ
    DE/CPG(Γi, {G1, G2, ..., Gi-1, Gi+1, ..., Gn})
  end for
  if all CPGs meet their deadline, the system ψ is schedulable
end SA/BF
b) SA/BF -- Schedulability Analysis: the Brute Force approach
Condition Separation (CS). In some situations, the explosion of unconditional subgraphs makes the brute force method inapplicable. Thus, we need an analysis situated somewhere between the two alternatives IC and BF, which means it should not be too pessimistic and should run in acceptable time.
Figure 5.9: Schedulability Analysis using Condition Separation
SA/CS(system ψ)
  -- verifies the schedulability of a system consisting of a set ψ of
  -- conditional process graphs
  transform each Γi ∈ ψ into the corresponding Gi ∈ S
    and keep guard XPi for each Pi
  for each Gi ∈ S
    -- derives the worst case delay of a task graph Gi
    -- considering the influence from all other task graphs
    -- in the system S
    for each pair (Pi, Pj) in Gi
      maxsep[Pi, Pj] = ∞
    end for
    step = 0
    repeat
      LatestTimes(Gi)
      EarliestTimes(Gi)
      for each Pi ∈ Gi
        MaxSeparations(Pi)
      end for
      for each pair (Pi, Pj) in Gi
        if ∃C, C ⊂ XPi ∧ ¬C ⊂ XPj then
          maxsep[Pi, Pj] = 0
        end if
      end for
      step = step + 1
    until maxsep is not changed or step = limit
    δΓi is the worst case delay for Γi
  end for
  if all CPGs meet their deadline, the system ψ is schedulable
end SA/CS
A first idea is to go back to the DelayEstimate algorithm in Figure 5.5 and use the knowledge about conditions in order to update the maxsep table. Thus, if two processes Pi and Pj never overlap their execution because they execute under alternative values of conditions, then we can set maxsep[Pi, Pj] to 0, and thus improve the quality of the delay estimation. Two processes Pi and Pj never overlap their execution if there exists at least one condition C such that C ⊂ XPi (XPi is the guard of process Pi) and ¬C ⊂ XPj, i.e., the two guards require complementary values of C.
In this approach, called CS, we essentially use the same algorithm as for ordinary task graphs and try to exploit the information captured by conditional dependencies in order to exclude certain influences during the analysis. In Figure 5.9 we show the algorithm SA/CS, which performs the schedulability analysis based on this heuristic.
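The condition separation test can be written directly on guards. The Python sketch below assumes that guards are conjunctions of condition literals, represented as a dictionary from condition name to required value, which is a simplification of the general boolean guards of a CPG; the helper and process names are hypothetical.

```python
def mutually_exclusive(guard_i, guard_j):
    """True if some condition C appears as C in one guard and as not-C
    in the other, so the two processes are never activated together."""
    return any(c in guard_j and guard_j[c] != v for c, v in guard_i.items())


def apply_condition_separation(maxsep, guards):
    """Set maxsep[(Pi, Pj)] = 0 for every mutually exclusive pair."""
    for (pi, pj) in maxsep:
        if mutually_exclusive(guards[pi], guards[pj]):
            maxsep[(pi, pj)] = 0          # updating values only, no new keys
    return maxsep


guards = {"Pa": {"C": False}, "Pb": {"C": True, "D": True}, "Pc": {}}
maxsep = {("Pa", "Pb"): 7, ("Pb", "Pc"): 5}
print(apply_condition_separation(maxsep, guards))
# {('Pa', 'Pb'): 0, ('Pb', 'Pc'): 5}
```

The pair (Pb, Pc) keeps its separation because Pc is always active, so no condition can separate it from Pb.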
Relaxed Tightness Analysis (RT). The two approaches discussed here are similar to the brute force algorithm (Figure 5.8). However, they try to reduce the execution time of the analysis by reducing the complexity of the DelayEstimate algorithm (Figure 5.5), which is called from the DE/CPG function (Figure 5.8a). This reduces the execution time of the analysis not by reducing the number of subgraphs that have to be visited (as in the CS approach), but by reducing the time needed to analyze each subgraph. As our experimental results show (Section 5.5), this approach can be very effective in practice. Of course, simplifying DelayEstimate reduces the quality of the analysis compared to the brute force method.
We have considered two alternatives: the first one is more drastic, while the second one tries a more refined trade-off between execution time and quality of the analysis.
In both approaches, the idea is to skip the iterative tightening loop in DelayEstimate, which repeats until no changes are made to maxsep or until the limit is reached. While this tightening loop iteratively reduces the pessimism when calculating the worst case response times, the actual calculation of the worst case response times is done in LatestTimes, and the rest of the algorithm in Figure 5.5 only tries to improve on these values. For the first approach, called RT1, the function DelayEstimate has been transformed as shown in Figure 5.10a.
However, it might be worth using at least MaxSeparations in order to obtain tighter values for the worst case response times. For the alternative RT2 in Figure 5.10b, DelayEstimateRT2 first calls LatestTimes and EarliestTimes, then MaxSeparations in order to build the maxsep table, and again LatestTimes to tighten the worst case response times.
Figure 5.10: Delay Estimation for the RT Approaches
DelayEstimateRT1(task graph G, system S)
  LatestTimes(G)
end DelayEstimateRT1
a) Delay Estimation for RT1
DelayEstimateRT2(task graph G, system S)
  for each pair (Pi, Pj) in G
    maxsep[Pi, Pj] = ∞
  end for
  LatestTimes(G)
  EarliestTimes(G)
  for each Pi ∈ G
    MaxSeparations(Pi)
  end for
  LatestTimes(G)
end DelayEstimateRT2
b) Delay Estimation for RT2
5.4 Communication Synthesis
Once a schedulability analysis for event-driven distributed real-time systems is in place, our problem is to synthesize the communications. This means synthesizing the MEDL of the TTP controllers (and consequently the MHTTs) so that the process set is schedulable on the cheapest possible architecture.
The MEDL is synthesized according to the optimization parameters available for each of the four approaches to message scheduling discussed in Section 5.2. In order to guide the optimization process, we need a cost function that captures the “degree of schedulability” of a certain MEDL implementation. Our cost function is a modified version of that in [Tin92]:
$$\text{cost function} = \begin{cases} f_1 = \sum_{i=1}^{n} \max(0, R_i - D_i), & \text{if } f_1 > 0 \\ f_2 = \sum_{i=1}^{n} (R_i - D_i), & \text{if } f_1 = 0 \end{cases}$$
where n is the number of processes in the application, Ri is the response time of a process Pi, and Di is the deadline of a process Pi. If the process set is not schedulable, there exists at least one Ri that is greater than the deadline Di, therefore the term f1 of the function will be positive. In this case the cost function is equal to f1. However, if the process set is schedulable, then no Ri exceeds the corresponding deadline Di. In this case f1 = 0 and we use f2 as the cost function, as it is able to differentiate between two alternatives that both lead to a schedulable process set. For a given set of optimization parameters leading to a schedulable process set, a smaller f2 means that we have improved the response times of the processes, so the application can potentially be implemented on a cheaper hardware architecture (with slower processors and/or bus). The response time Ri is calculated according to the arbitrary deadline analysis [Tin94a] based on the release jitter of the process (see Section 4), its worst-case execution time, the blocking time, and the interference time due to higher priority processes.
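The two-part cost function translates directly into code. The Python sketch below takes the response times and deadlines as plain lists indexed by process; computing the Ri themselves requires the full schedulability analysis, which is assumed to have been done elsewhere.

```python
def cost_function(R, D):
    """Degree-of-schedulability cost; lower is better.

    R, D: response times and deadlines, indexed by process.
    Returns f1 (total deadline overrun) if some process misses its
    deadline, otherwise f2 (sum of R_i - D_i, which is <= 0).
    """
    f1 = sum(max(0, r - d) for r, d in zip(R, D))
    if f1 > 0:
        return f1
    return sum(r - d for r, d in zip(R, D))


print(cost_function([5, 8], [10, 6]))   # 2   (one deadline missed by 2)
print(cost_function([5, 4], [10, 6]))   # -7  (schedulable, 7 units of slack)
```

Since every schedulable MEDL yields a cost of at most 0 and every unschedulable one a positive cost, a single minimization ranks both kinds of candidates consistently.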
For a given application, we want to synthesize a MEDL such that the cost function is minimized. We also want to evaluate the four approaches to message scheduling in different contexts, thus offering the designer decision support for choosing the approach that best fits the application.
The MEDL synthesis problem belongs to the class of combinatorial problems; therefore, we develop heuristics that are able to find good quality results in reasonable time. We have developed optimization algorithms corresponding to each of the four approaches to message scheduling. A first set of algorithms is based on simple and fast greedy heuristics. A second class of heuristics aims at finding near-optimal solutions using the simulated annealing (SA) algorithm.
The greedy heuristic differs for each of the four approaches to
message scheduling. The main idea is to improve the “degree of
schedulability” of the process set by incrementally trying to
reduce the release jitter of the processes.
The only way to reduce the release jitter in the SM and MM approaches is through the optimization of the $T_m^{max}$ parameters. This is achieved by a proper placement of messages into slots (see Figure 5.2).
The OptimizeSM algorithm presented in Figure 5.11 starts by deciding on a size (sizeSi) for each of the slots. Nothing is gained by enlarging the slot size, since in this approach a slot can carry at most one message. Thus, the slot sizes are set to the minimum size that can accommodate the largest message sent by the corresponding node.
Then, the algorithm has to decide on the number of rounds, thus determining the size of the MEDL. Since the size of the MEDL is physically limited, there is a limit to the number of rounds (e.g., 2, 4, 8, 16, depending on the particular TTP controller implementation). However, there is a minimum number of rounds MinRounds that is necessary for a certain application, which depends on the number of messages transmitted. For example, if the processes mapped on node N0 send a total of 7 messages, then we have to decide on at least 7 rounds in order to accommodate all of them (in the SM approach there is at most one message per slot). Several numbers of rounds, RoundsNo, are tried out by the algorithm, starting from MinRounds up to MaxRounds.
OptimizeSM
  -- set the slot sizes
  for each node Ni do
    sizeSi = max(size of messages mj sent by node Ni)
  end for
  -- find the min. no. of rounds that can hold all the messages
  for each node Ni do
    nmi = number of messages sent from Ni
  end for
  MinRounds = max(nmi)
  -- create a minimal complete MEDL
  for each message mi
    find round in [1..MinRounds] that has an empty slot for mi
    place mi into its slot in round
  end for
  for each RoundsNo in [MinRounds..MaxRounds] do
    -- insert messages in such a way that the cost is minimized
    repeat
      for each process Pi that receives a message mi do
        if Di - Ri is the smallest so far then m = mPi end if
      end for
      BestRound = none
      for each round in [1..RoundsNo] do
        place m into its corresponding slot in round
        calculate the CostFunction
        if the CostFunction is smallest so far then
          BestRound = round
        end if
        remove m from its slot in round
      end for
      place m into its slot in BestRound if one was identified
    until the CostFunction is not improved
  end for
end OptimizeSM
Figure 5.11: Greedy Heuristic for SM
For a given number of rounds (which determines the size of the MEDL), the initially empty MEDL has to be populated with messages in such a way that the cost function is minimized. In order to apply the schedulability analysis that is the basis for the cost function, a complete MEDL has to be provided. A complete MEDL contains at least one instance of every message that has to be transmitted between processes on different processors. A minimal complete MEDL is constructed from an empty MEDL by placing one instance of every message mi into its corresponding empty slot of a round. In Figure 5.2a, for example, we have a MEDL composed of four rounds. We get a minimal complete MEDL, for example, by assigning m2 and m1 to the slots in rounds 3 and 4, and leaving the slots in rounds 1 and 2 empty. However, such a MEDL might not lead to a schedulable system. The “degree of schedulability” can be improved by inserting instances of messages into the available places in the MEDL, thus minimizing the $T_m^{max}$ parameters. For example, in Figure 5.2a, inserting another instance of message m1 in the first round and of m2 in the second round leads to P2 missing its deadline, while in Figure 5.2b inserting m1 into the second round and m2 into the first round leads to a schedulable system.
Our algorithm repeatedly adds a new instance of a message to the current MEDL in the hope that the cost function will be improved. To decide which message should receive an additional instance in the current MEDL, a simple heuristic is used. We identify the process Pi which is in the most “critical” situation, meaning that the difference between its deadline and response time, Di - Ri, is minimal compared with all other processes. The message to be added to the MEDL is the message m = mPi received by the process Pi. Message m will be placed into the round (BestRound) which corresponds to the smallest value of the cost function. The algorithm stops if the cost function cannot be further improved by adding more messages to the MEDL.
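The greedy loop of Figure 5.11 can be illustrated with a minimal Python sketch. It is not the thesis implementation: the schedulability-based CostFunction and the Di - Ri criticality test require the response time analysis, so both are replaced here by a hypothetical toy measure, the longest cyclic gap (in rounds) between consecutive instances of a message, used as a rough proxy for how long a message may wait for its slot. The MEDL model (a list of per-round dictionaries, rounds counted from 0) and the names `max_gap`, `cost`, `optimize_sm` and `optimize_sm_all` are assumptions made for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical MEDL model: medl[r][node] is the message that `node` sends in
# round r, or None if that slot is empty (SM: at most one message per slot).
Medl = List[Dict[str, Optional[str]]]


def max_gap(medl: Medl, node: str, msg: str) -> int:
    """Longest cyclic distance, in rounds, between consecutive instances of msg.

    Toy stand-in for the message's contribution to release jitter.
    """
    n = len(medl)
    rounds = [r for r, slots in enumerate(medl) if slots[node] == msg]
    if not rounds:
        return 10 * n  # hypothetical penalty for a message that is never sent
    return max((rounds[(k + 1) % len(rounds)] - rounds[k]) % n or n
               for k in range(len(rounds)))


def cost(medl: Medl, msgs: Dict[str, str]) -> int:
    """Toy cost function (lower is better); msgs maps message -> sender node."""
    return sum(max_gap(medl, node, m) for m, node in msgs.items())


def optimize_sm(msgs: Dict[str, str], rounds_no: int) -> Medl:
    """Greedy SM heuristic for a fixed number of rounds (>= MinRounds)."""
    nodes = sorted(set(msgs.values()))
    medl: Medl = [{n: None for n in nodes} for _ in range(rounds_no)]
    # Minimal complete MEDL: one instance of every message in a free slot.
    for m, node in msgs.items():
        r = next(r for r in range(rounds_no) if medl[r][node] is None)
        medl[r][node] = m
    best = cost(medl, msgs)
    while True:
        # Stand-in for the most critical process (smallest Di - Ri):
        # pick the message whose instances are furthest apart.
        m = max(msgs, key=lambda k: max_gap(medl, msgs[k], k))
        node = msgs[m]
        best_round = None
        for r in range(rounds_no):
            if medl[r][node] is not None:
                continue  # SM: the slot already carries a message
            medl[r][node] = m  # tentatively insert, evaluate, undo
            c = cost(medl, msgs)
            if c < best:
                best, best_round = c, r
            medl[r][node] = None
        if best_round is None:
            return medl  # the cost function is not improved: stop
        medl[best_round][node] = m


def optimize_sm_all(msgs: Dict[str, str], max_rounds: int) -> Medl:
    """Try every RoundsNo in [MinRounds..MaxRounds] and keep the cheapest MEDL."""
    senders = list(msgs.values())
    min_rounds = max(senders.count(n) for n in set(senders))
    medls = [optimize_sm(msgs, k) for k in range(min_rounds, max_rounds + 1)]
    return min(medls, key=lambda medl: cost(medl, msgs))


msgs = {"m1": "N0", "m2": "N0", "m3": "N1"}  # hypothetical message -> sender map
medl = optimize_sm_all(msgs, max_rounds=4)
```

With these three messages, `optimize_sm(msgs, 4)` ends up alternating m1 and m2 in the N0 slot. Only the structure of the search carries over to the real heuristic; the toy cost does not reflect the schedulability analysis.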
The OptimizeMM algorithm is similar to OptimizeSM. The main
difference is that in the MM approach several messages can be
placed into a slot (which also determines its size), while in the SM
approach there can be at most one message per slot. Also, in the
case of MM, we have to take additional care that the slots do not
exceed the maximum allowed size for a slot.
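The additional MM check can be sketched in a few lines of Python. The sketch assumes, hypothetically, that message sizes and the maximum slot size are given in bytes and that a node's slot has the same length in every round of the TDMA cycle, so the fullest instance of the slot determines its size; `fits_mm` and `mm_slot_size` are illustrative names, not functions from the thesis.

```python
from typing import Dict, List


def fits_mm(slot: List[str], msg: str, sizes: Dict[str, int],
            max_slot_bytes: int) -> bool:
    """True if msg can be added to this slot instance without exceeding the limit."""
    return sum(sizes[m] for m in slot) + sizes[msg] <= max_slot_bytes


def mm_slot_size(instances: List[List[str]], sizes: Dict[str, int]) -> int:
    """Slot length needed so that every round's instance of the slot fits."""
    return max((sum(sizes[m] for m in slot) for slot in instances), default=0)


sizes = {"m1": 8, "m2": 16, "m3": 12}  # hypothetical message sizes in bytes
ok = fits_mm(["m1", "m2"], "m3", sizes, max_slot_bytes=32)  # 36 > 32, rejected
```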
The situation is simpler for the dynamic approaches, namely
DM and DP, since we only have to decide on the slot sizes and, in
the case of DP, on the packet size. For these two approaches, the
placement of messages is dynamic and has no influence on the
cost function. The OptimizeDM algorithm (see Figure 5.12) starts with the first slot Si = S0 of the TDMA round and tries to find the size (BestSizeSi) which corresponds to the smallest CostFunction. This slot size has to be large enough (sizeSi ≥ MinSizeSi) to hold the largest message to be transmitted in this slot, and within bounds determined by the particular TTP controller implementation (e.g., from 2 bits up to MaxSize = 32 bytes). Once the size of the first slot has been determined, the algorithm continues in the same manner with the next slots.

OptimizeDM
  for each node Ni do
    MinSizeSi = max(size of messages mj sent by node Ni)
  end for
  -- identify the size that minimizes the cost function
  for each slot Si do
    BestSizeSi = MinSizeSi
    for each SlotSize in [MinSizeSi..MaxSize] do
      sizeSi = SlotSize
      calculate the CostFunction
      if the CostFunction is best so far then
        BestSizeSi = SlotSize
      end if
    end for
    sizeSi = BestSizeSi
  end for
end OptimizeDM
Figure 5.12: Greedy Heuristic for DM
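A minimal Python sketch of the slot-by-slot sweep of Figure 5.12 follows. In the thesis the CostFunction comes from the schedulability analysis; here `toy_cost` is a hypothetical stand-in in which a larger slot drains a node's queue in fewer rounds but lengthens the TDMA round for every node, which is enough to make the sweep non-trivial. Sizes are in bytes, and the function names are assumptions.

```python
import math
from typing import Callable, Dict, List

Sizes = Dict[str, int]  # slot size (bytes) per node


def optimize_dm(msg_sizes: Dict[str, List[int]], max_size: int,
                cost: Callable[[Sizes], float]) -> Sizes:
    """Greedy DM heuristic: size the slots one after the other."""
    # MinSize_Si: a slot must hold the largest message sent by its node.
    sizes = {node: max(ms) for node, ms in msg_sizes.items()}
    for node in msg_sizes:
        min_size = sizes[node]
        best_size, best_cost = min_size, cost(sizes)
        for s in range(min_size, max_size + 1):
            sizes[node] = s
            c = cost(sizes)
            if c < best_cost:
                best_size, best_cost = s, c
        sizes[node] = best_size  # fix this slot before moving to the next one
    return sizes


def toy_cost(msg_sizes: Dict[str, List[int]]) -> Callable[[Sizes], float]:
    """Hypothetical cost: rounds needed to drain each queue times round length."""
    def cost(sizes: Sizes) -> float:
        round_len = sum(sizes.values())
        return sum(math.ceil(sum(ms) / sizes[n]) * round_len
                   for n, ms in msg_sizes.items())
    return cost


msg_sizes = {"N0": [4, 4], "N1": [6]}  # hypothetical messages per node (bytes)
slots = optimize_dm(msg_sizes, max_size=12, cost=toy_cost(msg_sizes))
```

For this input the sweep enlarges the N0 slot to 8 bytes, so both of its messages fit in one round, and keeps N1 at its minimum of 6 bytes.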
The OptimizeDP algorithm has, in addition, to determine the
proper packet size. This is done by trying all the possible packet
sizes given the particular TTP controller. For example, it can
start from 2 bits and increment with the “smallest data unit”
(typically 2 bits) up to 32 bytes. In the case of the OptimizeDP
algorithm the slot size is a multiple of the packet size, and it is
within certain bounds depending on the TTP controller.
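The candidate space that OptimizeDP sweeps can be enumerated as below. This is a sketch under stated assumptions: it uses the example bounds quoted above (a 2-bit smallest data unit and a 32-byte maximum) and assumes, hypothetically, that slot sizes range over the multiples of the packet size between a controller-dependent minimum and 32 bytes. Each (packet, slot) pair would then be scored with the cost function, in the same way as the slot sizes in OptimizeDM.

```python
from typing import Iterator, Tuple

SMALLEST_UNIT_BITS = 2   # "smallest data unit" quoted above as typical
MAX_BITS = 32 * 8        # 32-byte upper bound quoted above


def dp_candidates(min_slot_bits: int,
                  max_slot_bits: int = MAX_BITS) -> Iterator[Tuple[int, int]]:
    """Yield (packet_size, slot_size) pairs in bits for OptimizeDP to evaluate.

    Assumption: slot sizes are the multiples of the packet size that lie in
    [min_slot_bits, max_slot_bits]; real bounds depend on the TTP controller.
    """
    for packet in range(SMALLEST_UNIT_BITS, MAX_BITS + 1, SMALLEST_UNIT_BITS):
        for k in range(1, max_slot_bits // packet + 1):
            slot = k * packet
            if slot >= min_slot_bits:
                yield packet, slot


candidates = list(dp_candidates(min_slot_bits=250))
```

The first candidate is (2, 250), i.e., 2-bit packets in a 250-bit slot; the last is a single 256-bit packet filling a 256-bit slot.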
We have also developed an SA based algorithm for bus access
optimization corresponding to each of the four message schedul-
ing approaches. In order to tune the parameters of the algorithm
we first performed very long and expensive runs on
selected large examples, and the best solution ever found for each
example was considered the near-optimum. Based on
further experiments we have determined the parameters of the
SA algorithm, for different sizes of examples, so that the optimi-
zation time is reduced as much as possible but the near-optimal
result is still produced. These parameters have then been used
in the large scale experiments presented in the following sec-
tion.
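The SA formulation itself (move set, cooling schedule) is not detailed here, so the following is only a generic simulated annealing loop of the kind referred to, written in Python. The neighbour function is a hypothetical placeholder (for bus access optimization it might, for example, move a message instance to another round or change a slot size), and the default parameters are made up rather than the tuned values used in the experiments.

```python
import math
import random
from typing import Callable, TypeVar

S = TypeVar("S")


def simulated_annealing(initial: S,
                        neighbour: Callable[[S, random.Random], S],
                        cost: Callable[[S], float],
                        t0: float = 10.0, alpha: float = 0.95,
                        iters_per_temp: int = 50, t_min: float = 0.01,
                        seed: int = 0) -> S:
    """Generic SA loop returning the cheapest solution visited (lower is better)."""
    rng = random.Random(seed)
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    t = t0
    while t > t_min:
        for _ in range(iters_per_temp):
            candidate = neighbour(current, rng)
            delta = cost(candidate) - current_cost
            # Accept improvements always, worse moves with probability exp(-delta/t).
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current, current_cost = candidate, current_cost + delta
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
        t *= alpha  # geometric cooling schedule
    return best


# Toy usage: minimise (x - 7)^2 over the integers with +/-1 moves.
result = simulated_annealing(0, lambda x, rng: x + rng.choice((-1, 1)),
                             lambda x: (x - 7) ** 2)
```

The tuning procedure described above amounts to shrinking t0, iters_per_temp and the number of temperature steps for each example size for as long as the best-ever solution is still reached.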
5.5 Experimental Results
We first present the experimental results for the schedulability
analysis with the TTP (Section 5.2), comparing the four message
scheduling approaches. Then, we present the results obtained
for the communication synthesis problem outlined in Section
5.4. Finally, the results for the schedulability analysis of sys-
tems with data and control dependencies (Section 5.3) are pre-
sented.
Schedulability Analysis with TTP and Communication
Synthesis. For the evaluation of our message scheduling
approaches over TTP we used sets of processes generated for
experimental purposes. We considered architectures consisting of
2, 4, 6, 8 and 10 nodes. 40 processes were assigned to each node,
resulting in sets of 80, 160, 240, 320 and 400 processes. 30 sets
were generated for each dimension, thus a total of 150 sets of
processes were used for experimental evaluation. Worst case
computation times, periods, deadlines, and message lengths
were assigned randomly within certain intervals. For the com-
munication channel we considered a transmission speed of 256
kbps. The maximum length of the data field in a slot was 32
bytes, and the frequency of the TTP controller was chosen to be
20 MHz. All experiments were run on a Sun Ultra 10 worksta-
tion.
For each of the 150 generated examples and each of the four
scheduling approaches we have obtained, using our optimization
strategy, the near-optimal values for the cost function (Section
5.4), produced by our SA based algorithm. These values, for a
given example, might differ from one approach to another, as
they depend on the optimization parameters and the schedulability
analysis determined for each of the approaches. We were
interested in comparing the four approaches to message scheduling
based on the values obtained for the cost function.
Thus, Figure 5.13 presents the average percentage deviations of the cost function obtained in each of the four approaches from the minimal value among them. The DP approach generally performs best, because dynamic scheduling of messages is able to reduce release jitter: no space is wasted in the slots if the packet size is properly selected. However, by using the MM approach we can obtain almost the same result if the messages are carefully allocated to slots by our optimization strategy. Moreover, for bigger sets of processes (e.g., 400), MM outperforms DP, as DP suffers from a large overhead due to the handling of the packets. DM performs worse than DP because it does not split the messages into packets, which results in a mismatch between the size of the messages dynamically queued and the slot size, leading to unused slot space that increases the jitter. SM performs the worst, as it leaves little room for improvement, leading to large amounts of unused slot space. Also, DP produced a MEDL that resulted in schedulable process sets in 1.33 times more cases than MM and DM. MM, in turn, produced twice as many schedulable results as the SM approach.
Together with the four approaches to message scheduling, a so-called ad-hoc approach is presented. The ad-hoc approach performs scheduling of messages without trying to optimize the access to the communication channel. The ad-hoc solutions are based on the MM approach and consider a design with a TDMA configuration consisting of a simple, straightforward allocation of messages to slots. The lengths of the slots were selected to accommodate the largest message sent from the respective node. Figure 5.13 shows that the ad-hoc alternative is consistently outperformed by every one of the optimized solutions. This shows that significant improvements can be obtained by optimizing the access to the communication channel.

Figure 5.13: Comparison of the Four Approaches to Message Scheduling (average percentage deviation [%] vs. number of processes, 50 to 450; curves SM, MM, DM, DP and Ad-hoc)
Next, we compared the four approaches with respect to the number of messages exchanged between different nodes and the maximum message size allowed. For the results depicted in Figures 5.14 and 5.15 we assumed sets of 80 processes allocated to 4 nodes. Figure 5.14 shows that as the number of messages increases, the difference between the approaches grows while the ranking among them remains the same. The same holds when we increase the maximum allowed message size (Figure 5.15), with a notable exception: for large message sizes MM becomes better than DP, since DP suffers from the overhead due to packet handling.

Figure 5.14: Four Approaches to Message Scheduling -- The Influence of the Number of Messages (average percentage deviation [%] vs. number of messages, 10 to 50; curves SM, MM, DM and DP)
The above comparison between the four message scheduling
alternatives is mainly based on the issue of schedulability. How-
ever, when choosing among the different policies, several other
parameters can be of importance. Thus, a static allocation of
messages can be beneficial from the point of view of testing and
debugging and has the advantage of simplicity. Similar consid-
erations can lead to the decision not to split messages. In any
case, however, optimization of the bus access scheme is highly
desirable.
In addition, we considered a real-life example implementing an aircraft control system adapted from [Tin94b], where the ad-hoc solution and the SM approach failed to produce a schedulable solution. However, the other three approaches produced schedulable solutions, with DP yielding the smallest cost function, followed by MM and then DM.

Figure 5.15: Four Approaches to Message Scheduling -- The Influence of the Message Sizes (average percentage deviation [%] vs. maximum number of bytes in a message, 0 to 35; curves SM, MM, DM and DP)
We were also interested in the quality of our optimization heuristics for the communication synthesis problem. Thus, we ran all the examples presented above using the four greedy heuristics (OptimizeSM, OptimizeMM, OptimizeDM, OptimizeDP) and compared the results with those produced by the SA based algorithm. Table 5.1 presents the average and maximum percentage deviations of the cost function for each of the five process set sizes.

Table 5.1: Optimization Heuristics (percentage deviation of the greedy heuristics from the SA results)

Optimize        80 procs.  160 procs.  240 procs.  320 procs.  400 procs.
SM      avg.    0.12%      0.19%       0.50%       1.06%       1.63%
        max.    0.81%      2.28%       8.31%       31.05%      18.00%
MM      avg.    0.05%      0.04%       0.08%       0.23%       0.36%
        max.    0.23%      0.55%       1.03%       8.15%       6.63%
DM      avg.    0.02%      0.03%       0.05%       0.06%       0.07%
        max.    0.05%      0.22%       0.81%       1.67%       1.01%
DP      avg.    0.01%      0.01%       0.05%       0.04%       0.03%
        max.    0.05%      0.13%       0.61%       1.42%       0.54%

All four greedy heuristics perform very well, with less than 2% average loss in quality compared to the results produced by the SA algorithms. The execution times of the greedy heuristics were more than two orders of magnitude smaller than those of SA.
Schedulability Analysis for Systems with Control and Data Dependencies. In this context, the two main aspects we were interested in are the quality of the schedulability analysis and the scalability of the algorithms for large examples. An extensive set of experiments was performed on conditional process graphs generated for experimental purposes.
We considered architectures consisting of 2, 4, 6, 8 and 10
processors. 40 processes were assigned to each node, resulting in
graphs of 80, 160, 240, 320 and 400 processes, having 2, 4, 6, 8
and 10 conditions, respectively. The number of unconditional
subgraphs varied for each graph dimension depending on the
number of conditions and the randomly generated structure of
the CPGs. For example, for CPGs with 400 processes, the maxi-
mum number of unconditional subgraphs was 64.
30 graphs were generated for each graph dimension, thus a
total of 150 graphs were used for experimental evaluation.
Worst case execution times were assigned randomly using both
uniform and exponential distributions. These experiments were
also run on a Sun Ultra 10 workstation.
In order to compare the quality of the schedulability approaches, we need a cost function that captures, for a certain system, the difference in quality between the schedulability approaches proposed in Section 5.3. Our cost function is the difference between the deadline and the estimated worst case delay of a CPG, summed over all the CPGs in the system:

$$\text{cost function} = \sum_{i=1}^{n} \left( D_{\Gamma_i} - \delta_{\Gamma_i} \right)$$

where $n$ is the number of CPGs in the system, $\delta_{\Gamma_i}$ is the estimated worst case delay of the CPG $\Gamma_i$, and $D_{\Gamma_i}$ is the deadline on $\Gamma_i$. A higher value for this cost function, for a given system, means that the corresponding approach produces better results (the schedulability analysis is less pessimistic).
For each of the 150 generated example systems and each of the five approaches to schedulability analysis we calculated the cost function defined above, based on results produced with the algorithms described in Section 5.3. These values, for a given system, differ from one analysis to another, with BF being the least pessimistic approach and therefore having the largest value for the cost function.
We are interested in comparing the five approaches based on the values obtained for the cost function. Thus, Figure 5.16
presents the average percentage deviations of the cost function obtained in each of the five approaches, compared to the value of the cost function obtained with the BF approach. A smaller value for the percentage deviation means a larger cost function, thus a better result. The percentage deviation is calculated according to the formula:

$$\text{deviation} = \frac{\text{cost}_{BF} - \text{cost}_{approach}}{\text{cost}_{BF}} \cdot 100$$

Figure 5.17 presents the average runtime of the algorithms, in seconds.
The brute force approach, BF, performs best in terms of quality and obtains the largest values for the cost function, at the expense of a large execution time. The execution time can be up to 7 minutes for large graphs of 400 processes, 10 conditions, and 64 unconditional subgraphs. At the other end, the straightforward approach IC, which ignores the conditions, performs worst and becomes more and more pessimistic as the system size increases. As can be seen from Figure 5.16, even for smaller systems of 160 processes (3 conditions, maximum 8 unconditional subgraphs), IC has 50% worse quality than the brute force approach, and almost 80% loss in quality, on average, for large systems of 400 processes. It is worth noting that the low-quality IC approach also has an average execution time equal or comparable to that of the much higher-quality heuristics (except BF, of course). This is because it tries to improve on the worst case delays through the iterative loop presented in DelayEstimate, Figure 5.5.

Figure 5.16: Average Percentage Deviations for Each of the Five Analyses (average percentage deviation [%] vs. number of processes, 50 to 400; curves IC, CS, RT1, RT2 and BF)
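The cost function and the percentage deviation defined above are straightforward to compute. The following Python sketch does so for a hypothetical system of two CPGs; the deadlines and delay estimates are made-up numbers for illustration, not results from the experiments, and the function names are assumptions.

```python
from typing import Sequence


def cpg_cost(deadlines: Sequence[float], delays: Sequence[float]) -> float:
    """Sum over all CPGs of (D_Gamma_i - delta_Gamma_i); larger means less pessimism."""
    if len(deadlines) != len(delays):
        raise ValueError("need one deadline and one delay estimate per CPG")
    return sum(d - w for d, w in zip(deadlines, delays))


def deviation(cost_bf: float, cost_approach: float) -> float:
    """Percentage deviation of an analysis from the brute force (BF) cost."""
    return (cost_bf - cost_approach) / cost_bf * 100


deadlines = [100, 200]                # hypothetical deadlines of two CPGs
bf = cpg_cost(deadlines, [70, 150])   # 30 + 50 = 80
ic = cpg_cost(deadlines, [90, 190])   # 10 + 10 = 20
print(deviation(bf, ic))              # 75.0
```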
Let us turn our attention to the three approaches CS, RT1 and RT2, which, like BF, consider conditions during the analysis but also try to trade off quality against execution time. Figure 5.16 shows that the pessimism of the analysis is dramatically reduced by considering the conditions during the analysis. The RT1 and RT2 approaches, which visit each unconditional subgraph, perform on average better than the CS approach, which considers condition separation for the whole graph. However, CS is comparable in quality with RT1, and even performs better for graphs smaller than 240 processes (4 conditions, maximum 16 subgraphs).

Figure 5.17: Average Execution Times for Each of the Five Analyses (average execution time [s] vs. number of processes, 50 to 400; curves IC, CS, RT1, RT2 and BF)

The RT2 analysis, which, as opposed to RT1, tries to improve the worst case response times using the MaxSeparations, performs best among the non-brute-force approaches. As can be seen from Figure 5.16, RT2 has less than 20% average deviation from the solutions obtained with the brute force approach. However, if faster runtimes are needed, RT1 can be used instead, as it runs twice as fast as RT2.

Figure 5.18: Comparison of the Schedulability Analysis Approaches Based on the No. of Unconditional Subgraphs (average percentage deviation [%] vs. number of unconditional subgraphs, 0 to 30; curves IC, CS, RT1, RT2 and BF)
We were also interested in comparing the five approaches with respect to the number of unconditional subgraphs and the number of conditional process graphs that form a system. For the results depicted in Figure 5.18 we assumed CPGs consisting of 2, 4, 8, 16 and 32 unconditional subgraphs of at most 50 processes each, allocated to 8 processors. Figure 5.18 shows that as the number of subgraphs increases, the differences between the approaches grow while the ranking among them remains the same as in Figure 5.16. The CS approach performs better than RT1 with a smaller number of subgraphs, but RT1 becomes better as the number of subgraphs in the CPGs increases.
Figure 5.19 presents, on a logarithmic scale, the average percentage deviations for systems consisting of 1, 2, 3, 4 and 5 conditional process graphs of 160 nodes each. As the number of conditional process graphs increases, the IC and CS approaches become more pessimistic. However, RT1 and RT2 perform very well, with RT2 being the least pessimistic approach (apart from BF, which is not depicted in Figure 5.19).

Figure 5.19: Comparison of the Schedulability Analysis Approaches Based on the Number of CPGs (average percentage deviation [%], logarithmic scale, vs. number of conditional process graphs, 1 to 5; curves IC, CS, RT1 and RT2)
Finally, a real-life example implementing a vehicle cruise con-
troller is presented in the next chapter.
Chapter 6
Application
THE RESEARCH PRESENTED in this thesis deals with aspects
of scheduling and communication synthesis for distributed hard
real-time systems. Such systems are today used in many appli-
cation areas.
In this chapter we discuss the relevance of our results to the
area of automotive electronics. The automotive electronics area
deals with the electronically controlled functions onboard vehi-
cles. Such electronic functionality is typically implemented in
modern vehicles using distributed architectures consisting of
several processors interconnected by a communication network.
In this context, aspects of real-time communication are of
extreme importance.
Electronic functionality in a vehicle might include: control of
the displays in the dashboard, light control, mirror adjustment,
seat position adjustment, climate control, window control, sun-
roof control, keyless vehicle entry systems, theft avoidance, con-
trol of the audio systems like radio or telephone, engine control,
power train control, braking, suspension, vehicle dynamics
control, etc.
Most of these functions are not safety critical. However, as the
electronic functionality replaces mechanical and hydraulic func-
tions like braking and steering, the dependability aspect
becomes of extreme importance. In this context, a communica-
tion protocol like TTP provides services that support fault-toler-
ance and predictability.
6.1 Cruise Controller
A typical safety critical application with hard real-time con-
straints, to be implemented on a TTP based architecture, is a
vehicle cruise controller (CC). We have considered a CC system
derived from a requirement specification provided by the indus-
try.
The CC described in this specification delivers the following
functionality: it maintains a constant speed for speeds over 35
km/h and under 200 km/h, offers an interface (buttons) to
increase or decrease the reference speed, and is able to resume
its operation at the previous reference speed. The CC operation
is suspended when the driver presses the brake pedal.
Figure 6.1: Hardware Architecture for the CC (a TTP channel connecting ABS, TCM, ECM, ETM and CEM; a CAN bus connecting CEM, PDM, DDM and further nodes)
The specification assumes that the CC will operate in an envi-
ronment consisting of several nodes interconnected by a TTP
channel (Figure 6.1). There are five nodes which functionally
interact with the CC system: the Anti Blocking System (ABS),
the Transmission Control Module (TCM), the Engine Control
Module (ECM), the Electronic Throttle Module (ETM), and the
Central Electronic Module (CEM). It has been decided to distrib-
ute the functionality (processes) of the CC over these five nodes.
The transmission speed of the communication channel is 256
kbps and the frequency of the TTP controller was chosen to be 20
MHz.
We have modelled the specification of the CC system using a
conditional process graph that consists of 32 processes, and
includes two alternative tracks. The model is presented in
Figure 6.2 where the worst case execution times are depicted to
the right of each process and the message sizes to the left of each
message.
6.2 Experimental Results
We have applied the scheduling algorithms and optimization
heuristics presented in this thesis using as input the CPG corre-
sponding to the CC system.
Let us first discuss the results obtained for the strategies presented in Chapter 4, which deals with time-driven systems. In Chapter 4 we were interested in comparing the quality of the schedules produced by the list scheduling based algorithm using the PCP and MPCP priority functions, and in checking the potential of the algorithms presented in Section 4.2.3 to improve the generated schedules by optimizing the bus access scheme. For the implementation of the cruise controller as a time-driven system we considered a deadline of 400 ms. For the CC example, the straightforward solution for bus access resulted in a schedule corresponding to a maximal delay of 429 ms (which does not meet the deadline) when PCP was used as a priority function, while using MPCP we obtained a schedule length of 398 ms. The first and second greedy heuristics for bus access optimization produced solutions in which the worst case delay was reduced to 314 and 323 ms, respectively. The near-optimal solution (produced with the SA based approach) results in a delay of 302 ms.

Figure 6.2: Cruise Controller Model (conditional process graph mapped to the nodes CEM, ABS, ETM, ECM and TCM; conditions labelled ON, OFF, Speed Up and Speed Down; worst case execution times in ms and message sizes in bytes)
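The figures reported above for the time-driven implementation can be checked with a few lines of Python. The delays and the 400 ms deadline are taken from the text; measuring the improvements against the 398 ms MPCP schedule is an assumption about which straightforward solution the bus access heuristics started from, and `reduction` is an illustrative helper.

```python
DEADLINE_MS = 400  # deadline for the time-driven CC implementation

# Worst case delays reported for the time-driven CC implementation.
delays_ms = {
    "straightforward bus access, PCP": 429,
    "straightforward bus access, MPCP": 398,
    "first greedy heuristic": 314,
    "second greedy heuristic": 323,
    "SA (near-optimal)": 302,
}


def reduction(before: float, after: float) -> float:
    """Relative reduction of the worst case delay, in percent."""
    return (before - after) / before * 100


missed = [name for name, d in delays_ms.items() if d > DEADLINE_MS]
print(missed)  # ['straightforward bus access, PCP']

baseline = delays_ms["straightforward bus access, MPCP"]  # assumed baseline
for name in ("first greedy heuristic", "second greedy heuristic", "SA (near-optimal)"):
    print(f"{name}: {reduction(baseline, delays_ms[name]):.1f}% below MPCP alone")
```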
Section 5.3 has presented several schedulability analysis tech-
niques for systems with both control and data dependencies con-
sidering priority-based preemptive scheduling. For simplicity,
the scheduling of messages has been treated separately, in Sec-
tion 5.2. We have applied the analyses in Section 5.3 to the CPG,
modelling the CC system without considering the messages
(depicted with solid circles in Figure 6.2). The deadline has been
set to 130 ms. In this context, we obtained the following results. Without considering the conditions, IC obtained a worst case delay of 138 ms, so the system appeared to be unschedulable. The system was also found to be unschedulable with the Conditions Separation (CS) approach, which produced a result of 132 ms. However, the Brute Force approach (BF) produced a worst case delay of 124 ms, which proves that the system implementing the vehicle cruise controller is, in fact, schedulable. Both Relaxed Tightness approaches (RT1 and RT2) produced the same worst case delay of 124 ms as BF.
Chapter 7
Conclusions and
Future Work
IN THIS THESIS we have presented aspects of scheduling and
communication for distributed hard real-time systems. Special
emphasis has been placed on the impact of the communication
infrastructure and protocol on the overall system performance.
The scheduling and communication strategies proposed are
based on an abstract graph representation which captures, at
process level, both dataflow and the flow of control.
The intention of this chapter is to summarize the work pre-
sented in the thesis and point out ideas for future work.
7.1 Conclusions
In this thesis we have considered hard real-time systems
implemented on distributed architectures consisting of several nodes
interconnected by a communication channel.
The systems are modelled using conditional process graphs
that are able to capture data as well as control dependencies
between processes.
We have considered both non-preemptive static cyclic schedul-
ing and preemptive scheduling with static priorities for the
scheduling of processes, while the communications are statically
scheduled according to the time triggered protocol.
Time-Driven Systems. We have first proposed an extension
to a static scheduling algorithm for CPGs. Thus, we have shown
that the general scheduling algorithm for conditional process
graphs can be successfully applied if the strategy for message
planning is adapted to the requirements of the TDMA protocol.
At the same time, the quality of generated schedules has been
much improved by adjusting the priority function used by the
scheduling algorithm to the particular communication protocol.
However, not only do the particularities of the underlying
architecture have to be considered during scheduling, but the parameters
of the communication protocol should also be adapted to fit the
particular embedded application. We have shown that important
performance gains can be obtained, without any additional
cost, by optimizing the bus access scheme. The optimization
algorithm, which now implies both process scheduling and opti-
mization of the parameters related to the communication proto-
col, generates an efficient bus access scheme as well as the
schedule tables for activation of processes and communications.
Event-Driven Systems. In the context of fixed priority
preemptive scheduling we have proposed a schedulability analysis
for the time-triggered protocol. We have considered four different
approaches to message scheduling over TTP, which were
compared with respect to schedulability. After this, we presented
optimization strategies for the bus access scheme in
order to fit the communication particularities of a certain appli-
cation. We showed that by optimizing the bus access scheme,
significant improvements in the “degree of schedulability” of a
system can be produced. Our optimization heuristics are able to
efficiently produce good quality results.
The schedulability analyses have then been extended in order
to bound the response time of a hard real-time system with both
control and data dependencies. Five approaches to the schedula-
bility analysis of such systems are proposed, and we show that
by considering the conditions during the analysis, the pessi-
mism of the analysis can be drastically reduced.
7.2 Future Work
Once good performance analysis techniques are developed, they
can be used to guide the partitioning and architecture selection
tasks.
Mapping of processes. In [Pop98] we have addressed the
problem of system level partitioning. Given an architecture con-
sisting of several processors, ASICs and shared buses, the parti-
tioning algorithm in [Pop98] finds the partitioning with the
smallest hardware cost and is able to predict and guarantee the
performance of the system in terms of worst case delay. Our
intention is to extend the work in [Pop98] to consider a more
realistic setting. Important issues that have to be considered
are: reuse of functionality, physical constraints related to the
placement of sensors and actuators, memory constraints, and a
realistic communication infrastructure.
Architecture Selection. During the phase of mapping processes to architecture components, the mapping strategy could find that there are not enough resources to guarantee the constraints imposed on the application. Such situations might include insufficient memory, insufficient computing power to guarantee a certain imposed performance, etc. In such situations, several decisions related to architecture selection have to be made: how much memory to add and where, which processor should be replaced with a more powerful one, whether a new processor should be added to the architecture, etc. However, before architecture selection can be performed, we also have to address the problem of architecture modelling.
References
[Aud91] N. C. Audsley, A. Burns, M. F. Richardson, A. J. Well-
ings, “Hard Real-Time Scheduling: The Deadline
Monotonic Approach”, Proceedings of 8th IEEE
Workshop on Real-Time Operating Systems and Soft-
ware, 127-132, 1991.
[Aud93] N. C. Audsley, K. Tindell, A. Burns, “The End Of The
Line For Static Cyclic Scheduling?”, Proceedings of
the 5th Euromicro Workshop on Real-Time Systems,
36-41, 1993.
[Aud95] N. C. Audsley, A. Burns, R. I. Davis, K. W. Tindell, A.
J. Wellings, “Fixed Priority Pre-emptive Scheduling:
An Historical Perspective”, Real-Time Systems, 8,
173-198, 1995.
[Axe96] J. Axelsson, “Hardware/software partitioning aiming
at fulfilment of real-time constraints”, Journal of
Systems Architecture, 42, 449-464, 1996.
[Bal98] F. Balarin, L. Lavagno, P. Murthy, A. Sangiovanni-Vin-
centelli, “Scheduling for Embedded Real-Time Sys-
tems”, IEEE Design and Test of Computers, 71-82,
January-March, 1998.
[Bar98a] S. Baruah, “Feasibility Analysis of Recurring Branching Tasks”, Proceedings of the 10th Euromicro Workshop on Real-Time Systems, 138-145, 1998.
[Bar98b] S. Baruah, “A General Model for Recurring Real-Time Tasks”, Proceedings of the IEEE Real-Time Systems Symposium, 114-122, 1998.
[Ben96] A. Bender, “Design of an Optimal Loosely Coupled Heterogeneous Multiprocessor System”, Proceedings of ED&TC, 275-281, 1996.
[Bol97] I. Bolsens, H. J. De Man, B. Lin, K. Van Rompaey, S. Vercauteren, D. Verkest, “Hardware/Software Co-Design of Digital Telecommunication Systems”, Proceedings of the IEEE, 85(3), 391-418, 1997.
[Cho92] P. Chou, R. Ortega, G. Borriello, “Synthesis of hardware/software interface in microcontroller-based systems”, Proceedings of the International Conference on Computer Aided Design, 1992.
[Cho95a] P. H. Chou, R. B. Ortega, G. Borriello, “The Chinook Hardware/Software Co-Synthesis System”, Proceedings of the International Symposium on System Synthesis, 22-27, 1995.
[Cho95b] P. Chou, G. Borriello, “Interval Scheduling: Fine-Grained Code Scheduling for Embedded Systems”, Proceedings of ACM/IEEE DAC, 462-467, 1995.
[Cof72] E. G. Coffman Jr., R. L. Graham, “Optimal Scheduling for Two-Processor Systems”, Acta Informatica, 1, 200-213, 1972.
[Dav95] J. M. Daveau, T. Ben Ismail, A. A. Jerraya, “Synthesis of System-Level Communication by an Allocation-Based Approach”, Proceedings of the International Symposium on System Synthesis, 150-155, 1995.
[Dav98] B. P. Dave, N. K. Jha, “COHRA: Hardware-Software Cosynthesis of Hierarchical Heterogeneous Distributed Systems”, IEEE Transactions on CAD, 17(10), 900-919, 1998.
[Dav99] B. P. Dave, G. Lakshminarayana, N. K. Jha, “COSYN: Hardware-Software Co-Synthesis of Heterogeneous Distributed Embedded Systems”, IEEE Transactions on VLSI Systems, 7(1), 92-104, 1999.
[Deo98] J. S. Deogun, R. M. Kieckhafer, A. W. Krings, “Stability and Performance of List Scheduling with External Process Delays”, Real-Time Systems, 15(1), 5-38, 1998.
[Dic98] R. P. Dick, N. K. Jha, “CORDS: Hardware-Software Co-Synthesis of Reconfigurable Real-Time Distributed Embedded Systems”, Proceedings of the International Conference on CAD, 1998.
[Dob98] A. Doboli, P. Eles, “Scheduling under Control Dependencies for Heterogeneous Architectures”, International Conference on Computer Design (ICCD), 1998.
[Edw97] S. Edwards, L. Lavagno, E. A. Lee, A. Sangiovanni-Vincentelli, “Design of Embedded Systems: Formal Models, Validation and Synthesis”, Proceedings of the IEEE, 85(3), March 1997.
[Ele00] P. Eles, A. Doboli, P. Pop, Z. Peng, “Scheduling with Bus Access Optimization for Distributed Embedded Systems”, IEEE Transactions on VLSI Systems, 2000 (to appear).
[Ele97] P. Eles, Z. Peng, K. Kuchcinski, A. Doboli, “System Level Hardware/Software Partitioning Based on Simulated Annealing and Tabu Search”, Design Automation for Embedded Systems, 2(1), 5-32, 1997.
[Ele98] P. Eles, K. Kuchcinski, Z. Peng, A. Doboli, P. Pop, “Process Scheduling for Performance Estimation and Synthesis of Hardware/Software Systems”, Proceedings of the Euromicro Conference, 168-175, 1998.
[Ele98a] P. Eles, K. Kuchcinski, Z. Peng, A. Doboli, P. Pop, “Scheduling of Conditional Process Graphs for the Synthesis of Embedded Systems”, Proceedings of Design Automation & Test in Europe (DATE), 1998.
[Ele98b] P. Eles, K. Kuchcinski, Z. Peng, A. Doboli, P. Pop, “Process Scheduling for Performance Estimation and Synthesis of Hardware/Software Systems”, Proceedings of the 24th Euromicro Conference, 1998.
[Eng99] J. Engblom, A. Ermedahl, M. Sjödin, J. Gustafsson, H. Hansson, “Towards Industry Strength Worst-Case Execution Time Analysis”, Swedish National Real-Time Conference SNART’99, 1999.
[Erm97] A. Ermedahl, H. Hansson, M. Sjödin, “Response-Time Guarantees in ATM Networks”, Proceedings of the IEEE Real-Time Systems Symposium, 274-284, 1997.
[Ern93] R. Ernst, J. Henkel, T. Benner, “Hardware/software co-synthesis for microcontrollers”, IEEE Design & Test of Computers, 64-75, December 1993.
[Ern97] R. Ernst, W. Ye, “Embedded Program Timing Analysis Based on Path Clustering and Architecture Classification”, Proceedings of the International Conference on CAD, 598-604, 1997.
[Ern98] R. Ernst, “Codesign of Embedded Systems: Status and Trends”, IEEE Design and Test of Computers, 45-54, April-June 1998.
[Ern99] L. Thiele, K. Strehl, D. Ziegenbein, R. Ernst, J. Teich, “FunState: an internal design representation for codesign”, International Conference on Computer-Aided Design, 558-565, 1999.
[Foh93] G. Fohler, “Realizing Changes of Operational Modes with Pre Run-time Scheduled Hard Real-Time Systems”, Responsive Computer Systems, H. Kopetz and Y. Kakuda, editors, 287-300, Springer Verlag, 1993.
[Gaj95] D. D. Gajski, F. Vahid, “Specification and Design of Embedded Hardware-Software Systems”, IEEE Design and Test of Computers, 53-67, Spring 1995.
[Ger96] R. Gerber, D. Kang, S. Hong, M. Saksena, “End-to-End Design of Real-Time Systems”, Formal Methods in Real-Time Computing, D. Mandrioli and C. Heitmeyer, editors, John Wiley & Sons, 1996.
[Gon95] J. Gong, D. D. Gajski, S. Narayan, “Software Estimation Using A Generic-Processor Model”, Proceedings of the European Design and Test Conference, 498-502, 1995.
[Gup93] R. K. Gupta, G. De Micheli, “Hardware-software cosynthesis for digital systems”, IEEE Design & Test of Computers, 29-41, September 1993.
[Gup95] R. K. Gupta, “Co-Synthesis of Hardware and Software for Digital Embedded Systems”, Kluwer Academic Publishers, Boston, 1995.
[Hen95] J. Henkel, R. Ernst, “A Path-Based Technique for Estimating Hardware Run-time in Hardware/Software Cosynthesis”, Proceedings of the International Symposium on System Synthesis, 116-121, 1995.
[Jor97] P. B. Jorgensen, J. Madsen, “Critical Path Driven Cosynthesis for Heterogeneous Target Architectures”, Proceedings of the International Workshop on Hardware/Software Codesign, 15-19, 1997.
[Kas84] H. Kasahara, S. Narita, “Practical Multiprocessor Scheduling Algorithms for Efficient Parallel Processing”, IEEE Transactions on Computers, 33(11), 1023-1029, 1984.
[Knu99] P. V. Knudsen, J. Madsen, “Integrating Communication Protocol Selection with Hardware/Software Codesign”, IEEE Transactions on CAD, 18(8), 1077-1095, 1999.
[Kop94] H. Kopetz, G. Grünsteidl, “TTP: A Protocol for Fault-Tolerant Real-Time Systems”, IEEE Computer, 27(1), 14-23, 1994.
[Kop97a] H. Kopetz, “Real-Time Systems: Design Principles for Distributed Embedded Applications”, Kluwer Academic Publishers, 1997.
[Kop97b] H. Kopetz et al., “A Prototype Implementation of a TTP/C Controller”, SAE Congress and Exhibition, 1997.
[Kuc97] K. Kuchcinski, “Embedded System Synthesis by Timing Constraint Solving”, Proceedings of the International Symposium on System Synthesis, 50-57, 1997.
[Kwo96] Y. K. Kwok, I. Ahmad, “Dynamic Critical-Path Scheduling: an Effective Technique for Allocating Task Graphs to Multiprocessors”, IEEE Transactions on Parallel and Distributed Systems, 7(5), 506-521, 1996.
[Lak99] G. Lakshminarayana, K. S. Khouri, N. K. Jha, “Wavesched: A Novel Scheduling Technique for Control-Flow Intensive Designs”, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 18(5), 1999.
[Lee99] C. Lee, M. Potkonjak, W. Wolf, “Synthesis of Hard Real-Time Application Specific Systems”, Design Automation for Embedded Systems, 4(4), 215-241, 1999.
[Li95] Y. S. Li, S. Malik, “Performance Analysis of Embedded Software Using Implicit Path Enumeration”, Proceedings of ACM/IEEE DAC, 456-461, 1995.
[Liu73] C. L. Liu, J. W. Layland, “Scheduling Algorithms for Multiprogramming in a Hard Real-Time Environment”, Journal of the ACM, 20(1), 46-61, 1973.
[Lon99] H. Lonn, J. Axelsson, “A Comparison of Fixed-Priority and Static Cyclic Scheduling for Distributed Automotive Control Applications”, Proceedings of the 11th Euromicro Conference on Real-Time Systems, 142-149, 1999.
[Lun99] T. Lundqvist, P. Stenström, “An Integrated Path and Timing Analysis Method Based on Cycle-Level Symbolic Execution”, Real-Time Systems, 17(2/3), 183-207, 1999.
[Mal97] S. Malik, M. Martonosi, Y. S. Li, “Static Timing Analysis of Embedded Software”, Proceedings of ACM/IEEE DAC, 147-152, 1997.
[Mc92] K. McMillan, D. Dill, “Algorithms for interface timing verification”, Proceedings of the IEEE International Conference on Computer Design, 48-51, 1992.
[Mic96] G. De Micheli, M. G. Sami, eds., “Hardware/Software Co-Design”, NATO ASI 1995, Kluwer Academic Publishers, 1996.
[Mic97] G. De Micheli, R. K. Gupta, “Hardware/Software Co-Design”, Proceedings of the IEEE, 85(3), 349-365, 1997.
[Moo97] V. Mooney, T. Sakamoto, G. De Micheli, “Run-Time Scheduler Synthesis for Hardware-Software Systems and Application to Robot Control Design”, Proceedings of the International Workshop on Hardware-Software Co-design, 95-99, 1997.
[Nar94] S. Narayan, D. D. Gajski, “Synthesis of System-Level Bus Interfaces”, Proceedings of the European Design and Test Conference, 395-399, 1994.
[Ort98] R. B. Ortega, G. Borriello, “Communication Synthesis for Distributed Embedded Systems”, Proceedings of the International Conference on CAD, 437-444, 1998.
[Pal98] J. C. Palencia, M. González Harbour, “Schedulability Analysis for Tasks with Static and Dynamic Offsets”, Proceedings of the 19th IEEE Real-Time Systems Symposium, 1998.
[Pop00a] P. Pop, P. Eles, Z. Peng, “Bus Access Optimization for Distributed Embedded Systems Based on Schedulability Analysis”, Proceedings of the Design, Automation & Test In Europe Conference, 567-574, 2000.
[Pop00b] P. Pop, P. Eles, Z. Peng, “Performance Estimation for Embedded Systems with Data and Control Dependencies”, Proceedings of the 8th International Workshop on Hardware/Software Codesign, 62-66, 2000.
[Pop00c] P. Pop, P. Eles, Z. Peng, “Schedulability Analysis for Systems with Data and Control Dependencies”, Proceedings of the 12th Euromicro Conference on Real-Time Systems, 2000 (to appear).
[Pop98] P. Pop, P. Eles, Z. Peng, “Scheduling Driven Partitioning of Heterogeneous Embedded Systems”, Swedish Workshop on Computer Systems Architecture, 99-102, 1998.
[Pop99a] P. Pop, P. Eles, Z. Peng, “Scheduling with Optimized Communication for Time-Triggered Embedded Systems”, 7th International Workshop on Hardware/Software Codesign, 178-182, 1999.
[Pop99b] P. Pop, P. Eles, Z. Peng, “Communication Scheduling for Time-Triggered Systems”, 11th Euromicro Conference on Real-Time Systems (Work in Progress Proceedings), 1999.
[Pop99c] P. Pop, P. Eles, Z. Peng, “An Improved Scheduling Technique for Time-Triggered Embedded Systems”, 25th Euromicro Conference, 303-310, 1999.
[Pop99d] P. Pop, P. Eles, Z. Peng, “Schedulability-Driven Communication Synthesis for Time Triggered Embedded Systems”, Proceedings of the 6th International Conference on Real-Time Computing Systems and Applications, 287-294, 1999.
[Pra92] S. Prakash, A. Parker, “SOS: Synthesis of Application-Specific Heterogeneous Multiprocessor Systems”, Journal of Parallel and Distributed Computing, 16, 338-351, 1992.
[Ree93] C. R. Reeves, “Modern Heuristic Techniques for Combinatorial Problems”, Blackwell Scientific Publications, 1993.
[Sha90] L. Sha, R. Rajkumar, J. Lehoczky, “Priority Inheritance Protocols: An Approach to Real-Time Synchronization”, IEEE Transactions on Computers, 39(9), 1175-1185, 1990.
[Sta93] J. A. Stankovic, K. Ramamritham, “Advances in Real-Time Systems”, IEEE Computer Society Press, 1993.
[Sta97] J. Staunstrup, W. Wolf, eds., “Hardware/Software Co-Design: Principles and Practice”, Kluwer Academic Publishers, 1997.
[Suz96] K. Suzuki, A. Sangiovanni-Vincentelli, “Efficient Software Performance Estimation Methods for Hardware/Software Codesign”, Proceedings of ACM/IEEE DAC, 605-610, 1996.
[Tin92] K. Tindell, A. Burns, A. J. Wellings, “Allocating Real-Time Tasks (An NP-Hard Problem made Easy)”, Real-Time Systems, 4(2), 145-165, 1992.
[Tin94a] K. Tindell, J. Clark, “Holistic Schedulability Analysis for Distributed Hard Real-Time Systems”, Microprocessing and Microprogramming, 40, 117-134, 1994.
[Tin94b] K. Tindell, “Adding Time-Offsets to Schedulability Analysis”, Department of Computer Science, University of York, Report Number YCS-94-221, 1994.
[Tin95] K. Tindell, A. Burns, A. J. Wellings, “Calculating Controller Area Network (CAN) Message Response Times”, Control Engineering Practice, 3(8), 1163-1169, 1995.
[Tur99] J. Turley, “Embedded Processors by the Numbers”, Embedded Systems Programming, May 1999.
[Ull75] J. D. Ullman, “NP-Complete Scheduling Problems”, Journal of Computer and System Sciences, 10, 384-393, 1975.
[Vah94] F. Vahid, J. Gong, D. Gajski, “A binary-constraint search algorithm for minimizing hardware during hardware/software partitioning”, Proceedings of the European Design Automation Conference EURO-DAC/VHDL, 214-219, 1994.
[Val95] C. A. Valderrama, A. Changuel, P. V. Raghavan, M. Abid, T. Ben Ismail, A. A. Jerraya, “A Unified Model for Co-simulation and Co-synthesis of Mixed Hardware/Software Systems”, Proceedings of the European Design and Test Conference, 1995.
[Val96] C. A. Valderrama, F. Nacabal, P. Paulin, A. A. Jerraya, “Automatic Generation of Interfaces for Distributed C-VHDL Cosimulation of Embedded Systems: an Industrial Experience”, Proceedings of the International Workshop on Rapid System Prototyping, 72-77, 1996.
[Ver96] D. Verkest, K. Van Rompaey, I. Bolsens, H. De Man, “CoWare: A Design Environment for Heterogeneous Hardware/Software Systems”, Design Automation for Embedded Systems, 1, 357-386, 1996.
[Wal94] E. Walkup, G. Borriello, “Automatic Synthesis of Device Drivers for Hardware/Software Co-design”, Technical Report 94-06-04, Dept. of Computer Science and Engineering, University of Washington, 1994.
[Wir98] X-by-Wire Consortium, URL: http://www.vmars.tuwien.ac.at/projects/xbywire/, 1998.
[Wol94] W. Wolf, “Hardware-Software Co-Design of Embedded Systems”, Proceedings of the IEEE, 82(7), 967-989, 1994.
[Wu90] M. Y. Wu, D. D. Gajski, “Hypertool: A Programming Aid for Message-Passing Systems”, IEEE Transactions on Parallel and Distributed Systems, 1(3), 330-343, 1990.
[Xu00] J. Xu, D. L. Parnas, “Priority Scheduling Versus Pre-Run-Time Scheduling”, Real-Time Systems, 18(1), 7-24, 2000.
[Xu93] J. Xu, D. L. Parnas, “On satisfying timing constraints in hard-real-time systems”, IEEE Transactions on Software Engineering, 19(1), 70-84, 1993.
[Yen97] T. Y. Yen, W. Wolf, “Hardware-Software Co-Synthesis of Distributed Embedded Systems”, Kluwer Academic Publishers, 1997.
[Yen98] T. Yen, W. Wolf, “Performance estimation for real-time distributed embedded systems”, IEEE Transactions on Parallel and Distributed Systems, 9(11), 1125-1136, 1998.
