Scheduling of Conditional Process Graphs for the Synthesis of Embedded Systems by Eles, Petru et al.
Scheduling of Conditional Process Graphs for the Synthesis of Embedded Systems
Eles, Petru; Kuchcinski, Krzysztof; Peng, Zebo; Pop, Paul; Doboli, Alex
Published in: Proceedings of the conference on Design, automation and test in Europe
DOI: 10.1109/DATE.1998.655847
Publication date: 1998
Document version: Publisher's PDF, also known as Version of record
Citation (APA): Eles, P., Kuchcinski, K., Peng, Z., Pop, P., & Doboli, A. (1998). Scheduling of Conditional Process Graphs for the Synthesis of Embedded Systems. In Proceedings of the conference on Design, automation and test in Europe: The Most Influential Papers of 10 Years DATE (pp. 132-138). Springer. DOI: 10.1109/DATE.1998.655847
Abstract
We present an approach to process scheduling based on
an abstract graph representation which captures both data-
flow and the flow of control. Target architectures consist of
several processors, ASICs and shared busses. We have
developed a heuristic which generates a schedule table so
that the worst case delay is minimized. Several experiments
demonstrate the efficiency of the approach.
1. Introduction
In this paper we concentrate on process scheduling for
systems consisting of communicating processes implemented
on multiple processors and dedicated hardware components.
In such a system, in which several processes communicate with each other and share resources, scheduling has a decisive influence on the performance of the system and on the way it meets its timing constraints. Thus, process scheduling has to be performed not only for the synthesis of the final system, but also as part of the performance estimation task.
Optimal scheduling, even in simpler contexts than that presented above, has been proven to be an NP-complete problem [13]. In our approach, we assume that some processes can be activated if certain conditions, computed by previously executed processes, are fulfilled. Process scheduling is thus further complicated, since at a given activation of the system only a subset of the processes is executed, and this subset differs from one activation to the other. Capturing both the flow of data and that of control at the process level is an important contribution of our approach, as it allows an accurate and direct modeling of a wide range of applications.
Performance estimation at the process level has been well studied in recent years [10, 12]. Starting from estimated execution times of single processes, performance estimation and scheduling of a system containing several processes can be performed. In [14] performance estimation is based on a preemptive scheduling strategy with static priorities using rate-monotonic analysis. In [11] scheduling and partitioning
of processes, and allocation of system components are for-
mulated as a mixed integer linear programming problem
while the solution proposed in [8] is based on constraint logic
programming. Several research groups consider hardware/
software architectures consisting of a single programmable
processor and an ASIC. Under these circumstances deriving
a static schedule for the software component practically
means the linearization of a dataflow graph [2, 6].
Static scheduling of a set of data-dependent software
processes on a multiprocessor architecture has been inten-
sively researched [3, 7, 9]. An essential assumption in these
approaches is that a (fixed or unlimited) number of identical
processors are available to which processes are progres-
sively assigned as the static schedule is elaborated. Such an
assumption is not acceptable for distributed embedded sys-
tems which are typically heterogeneous.
In our approach we consider embedded systems specified as interacting processes which have been mapped on an architecture consisting of several processors and dedicated hardware components connected by shared busses. Process interaction in our model is not only in terms of dataflow but also captures the flow of control in the form of conditional selection. Considering a non-preemptive execution environment, we statically generate a schedule table for processes and derive a worst case delay which is guaranteed under any conditions.
The paper is divided into 7 sections. In section 2 we formu-
late our basic assumptions and introduce the graph-based
model which is used for system representation. The schedule
table and the general scheduling strategy are presented in
sections 3 and 4. The algorithm for generation of the schedule
table is presented in section 5. Section 6 describes the exper-
imental evaluation and section 7 presents our conclusions.
2. Problem Formulation and the Conditional
Process Graph
We consider a generic architecture consisting of pro-
grammable processors and application specific hardware
processors (ASICs) connected through several busses.
These busses can be shared by several communication
channels connecting processes assigned to different proces-
sors. Only one process can be executed at a time by a
programmable processor while a hardware processor can
execute processes in parallel. Processes on different proces-
sors can be executed in parallel. Only one data transfer can
be performed by a bus at a given moment. Computation and data transfer on several busses can overlap.

Petru Eles (1, 2), Krzysztof Kuchcinski (1), Zebo Peng (1), Alexa Doboli (2), and Paul Pop (2)
(1) Dept. of Computer and Information Science, Linköping University, Sweden
(2) Computer Science and Engineering Department, Technical University of Timisoara, Romania
Authorized licensed use limited to: Danmarks Tekniske Informationscenter. Downloaded on October 20, 2009 at 05:33 from IEEE Xplore.  Restrictions apply.
In [4] we presented algorithms for automatic hardware/
software partitioning based on iterative improvement heu-
ristics. The problem we are discussing in this paper
concerns performance estimation of a given design alterna-
tive and scheduling of processes and communications.
Thus, we assume that each process is assigned to a (pro-
grammable or hardware) processor and each
communication channel which connects processes assigned
to different processors is assigned to a bus. Our goal is to de-
rive a worst case delay by which the system completes
execution, so that this delay is as small as possible, and to
generate the schedule which guarantees this delay.
As an abstract model for system representation we use a directed, acyclic, polar graph Γ(V, ES, EC). Each node Pi∈V represents one process. ES and EC are the sets of simple and conditional edges, respectively; ES ∩ EC = ∅ and ES ∪ EC = E, where E is the set of all edges. An edge eij∈E from Pi to Pj indicates that the output of Pi is the input of Pj. The graph is
polar, which means that there are two nodes, called source
and sink, that conventionally represent the first and last pro-
cess. These nodes are introduced as dummy processes so
that all other nodes in the graph are successors of the source
and predecessors of the sink respectively.
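To make the definition concrete, the following Python sketch checks the structural properties stated above: disjoint edge sets, acyclicity, and polarity. The edge-set representation, the function name `check_cpg` and the small example graph are assumptions made for illustration only.

```python
from collections import defaultdict


def check_cpg(nodes, simple_edges, conditional_edges, source, sink):
    """Return True if the edges describe a valid polar, acyclic graph.

    simple_edges: set of (Pi, Pj) pairs (E_S).
    conditional_edges: dict mapping (Pi, Pj) -> (condition, value) (E_C).
    """
    cond_set = set(conditional_edges)
    if simple_edges & cond_set:            # E_S and E_C must be disjoint
        return False
    edges = simple_edges | cond_set        # E = E_S ∪ E_C

    succ, pred = defaultdict(list), defaultdict(list)
    indegree = {n: 0 for n in nodes}
    for a, b in edges:
        succ[a].append(b)
        pred[b].append(a)
        indegree[b] += 1

    # Kahn's algorithm: the graph is acyclic iff every node can be removed.
    ready = [n for n in nodes if indegree[n] == 0]
    removed = 0
    while ready:
        n = ready.pop()
        removed += 1
        for m in succ[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                ready.append(m)
    if removed != len(nodes):
        return False

    def reachable(start, adj):
        seen, stack = {start}, [start]
        while stack:
            for m in adj[stack.pop()]:
                if m not in seen:
                    seen.add(m)
                    stack.append(m)
        return seen

    # Polarity: every node is a successor of the source and a predecessor of the sink.
    all_nodes = set(nodes)
    return reachable(source, succ) == all_nodes and reachable(sink, pred) == all_nodes


# Hypothetical graph: P1 is a disjunction node on condition C, P4 a conjunction node.
nodes = ["P0", "P1", "P2", "P3", "P4", "P5"]
simple = {("P0", "P1"), ("P2", "P4"), ("P3", "P4"), ("P4", "P5")}
cond = {("P1", "P2"): ("C", True), ("P1", "P3"): ("C", False)}
print(check_cpg(nodes, simple, cond, "P0", "P5"))   # True
```

Adding the edge (P4, P1) creates a cycle, and adding an isolated node breaks polarity; both make the check fail.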
The mapping of processes to processors and busses is given by a function M: V→PE, where PE={pe1, pe2, ..., peNpe} is the set of processing elements. For any process Pi, M(Pi) is the processing element to which Pi is assigned for execution.
Each process Pi, assigned to processor or bus M(Pi), is
characterized by an execution time tPi. In the process graph
depicted in Fig. 1, P0 and P32 are the source and sink nodes
respectively. Nodes denoted P1, P2, ..., P17 are "ordinary" processes specified by the designer. They are assigned to one of the two programmable processors pe1 and pe2 or to the hardware component pe3. The rest are so-called communication processes (P18, P19, ..., P31). They are represented in Fig.
1 as black dots and are introduced for each connection which
links processes mapped to different processors. These pro-
cesses model inter-processor communication and their execu-
tion time is equal to the corresponding communication time.
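The construction of communication processes can be sketched as follows. The mapping and the communication times are those of Fig. 1, but the edge list is rebuilt only from the communication-time table, the same-processor edge (P9, P10) is hypothetical, and the numbering of communication processes beyond P20 is not claimed to match the paper's.

```python
def insert_communication_processes(edges, mapping, comm_time, first_id):
    """Replace every edge between processes on different processing elements
    by a communication process whose execution time is the communication time.

    edges: list of (Pi, Pj); mapping: process -> processing element;
    comm_time: (Pi, Pj) -> communication time; first_id: index of the first
    communication process (the naming scheme is an assumption of this sketch).
    """
    new_edges, comm_processes, exec_time = [], {}, {}
    next_id = first_id
    for a, b in edges:
        if mapping[a] == mapping[b]:
            new_edges.append((a, b))       # same processing element: no transfer
            continue
        c = f"P{next_id}"
        next_id += 1
        comm_processes[c] = (a, b)
        exec_time[c] = comm_time[(a, b)]
        new_edges += [(a, c), (c, b)]
    return new_edges, comm_processes, exec_time


# Process mapping of Fig. 1.
mapping = {f"P{i}": "pe1" for i in (1, 2, 4, 6, 9, 10, 13)}
mapping.update({f"P{i}": "pe2" for i in (3, 5, 7, 11, 14, 15, 17)})
mapping.update({f"P{i}": "pe3" for i in (8, 12, 16)})

# Communication times t_i,j of Fig. 1.
comm_time = {("P1", "P3"): 1, ("P2", "P5"): 3, ("P3", "P10"): 2, ("P3", "P6"): 2,
             ("P4", "P7"): 3, ("P6", "P8"): 3, ("P7", "P10"): 2, ("P8", "P10"): 2,
             ("P11", "P12"): 1, ("P11", "P13"): 2, ("P12", "P14"): 1,
             ("P12", "P15"): 3, ("P13", "P17"): 2, ("P16", "P17"): 2}
edges = list(comm_time) + [("P9", "P10")]   # ("P9", "P10") is hypothetical

new_edges, comm, exec_time = insert_communication_processes(edges, mapping, comm_time, 18)
print(len(comm), comm["P18"], exec_time["P18"])   # 14 ('P1', 'P3') 1
```

All 14 listed transfers cross processing elements, which yields exactly the 14 communication processes P18 to P31; the first three coincide with P18 (1→3), P19 (2→5) and P20 (3→10) of Table 1.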
An edge eij∈EC is a conditional edge (thick lines in Fig. 1)
and it has an associated condition. Transmission on such an
edge takes place only if the associated condition is satisfied.
We call a node with conditional edges at its output a disjunc-
tion node (and the corresponding process a disjunction
process). Alternative paths starting from a disjunction node,
which correspond to a certain condition, are disjoint and they
meet in a so called conjunction node (with the corresponding
process called conjunction process). In Fig. 1 circles repre-
senting conjunction and disjunction nodes are depicted with
thick borders. We assume that conditions are independent.
A boolean expression XPi, called guard, can be associated with each node Pi in the graph. It represents the necessary
condition for the respective process to be activated. In Fig.
1, for example, XP3=true, XP14=D∧K, XP17=true, XP5=C.
Two nodes Pi and Pj, where Pj is not a conjunction node,
can be connected by an edge eij only if XPj⇒XPi (which
means that XPi is true whenever XPj is true). This restriction
avoids specifications in which a process is blocked because
it waits for a message from a process which will not be acti-
vated. If P j is a conjunction node, predecessor nodes Pi can
be situated on alternative input paths.
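A minimal sketch of the edge restriction, assuming (as in the Fig. 1 examples) that guards are conjunctions of condition literals; general boolean guards would need a proper satisfiability check. The guards are those given for Fig. 1, while the edges tested are hypothetical.

```python
# A guard is a conjunction of condition literals, stored as a frozenset of
# (condition, value) pairs: {("D", True), ("K", True)} is D ∧ K, frozenset() is true.

def implies(x, y):
    """x ⇒ y for conjunctions: every literal of y also occurs in x.
    Valid because a set of literals without opposite values is satisfiable."""
    return y <= x


def edge_allowed(guard_i, guard_j, j_is_conjunction):
    """An edge e_ij is allowed if Pj is a conjunction node or X_Pj ⇒ X_Pi."""
    return j_is_conjunction or implies(guard_j, guard_i)


X_P3 = frozenset()                              # true
X_P14 = frozenset({("D", True), ("K", True)})   # D ∧ K
X_P5 = frozenset({("C", True)})                 # C

print(edge_allowed(X_P3, X_P14, False))   # True: D ∧ K ⇒ true
print(edge_allowed(X_P14, X_P5, False))   # False: C does not imply D ∧ K
```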
According to our model, we assume that a process,
which is not a conjunction process, can be activated only
after all its inputs have arrived. A conjunction process can
be activated after messages coming on one of the alternative
paths have arrived. All processes issue their outputs when
they terminate. If we consider the activation time of the
source process as a reference, the activation time of the sink
process is the delay of the system at a certain execution.
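The activation rules can be illustrated by computing, for one alternative path, the earliest activation time of each process when processors and busses are assumed to be always available. This is a simplification: the real schedule must also respect resource sharing. The graph and execution times below are hypothetical, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter


def activation_times(active, preds, exec_time):
    """Earliest activation time of every process executed on one alternative
    path, ignoring contention for processors and busses.

    active: processes executed on this path; preds: process -> predecessors.
    """
    # Only predecessors on the taken path ever send a message. For an ordinary
    # process these are all of its predecessors (guaranteed by X_Pj ⇒ X_Pi);
    # for a conjunction process they are those on the alternative path taken.
    inputs = {p: [q for q in preds.get(p, ()) if q in active] for p in active}
    start = {}
    for p in TopologicalSorter(inputs).static_order():
        # Activated once all inputs have arrived; outputs are issued at termination.
        start[p] = max((start[q] + exec_time[q] for q in inputs[p]), default=0)
    return start


# Hypothetical graph: P1 selects P2 (C true) or P3 (C false); P4 is a conjunction process.
preds = {"P1": ["P0"], "P2": ["P1"], "P3": ["P1"], "P4": ["P2", "P3"], "P5": ["P4"]}
exec_time = {"P0": 0, "P1": 3, "P2": 4, "P3": 2, "P4": 1, "P5": 0}
path_c = activation_times({"P0", "P1", "P2", "P4", "P5"}, preds, exec_time)
path_not_c = activation_times({"P0", "P1", "P3", "P4", "P5"}, preds, exec_time)
print(path_c["P5"], path_not_c["P5"])   # 8 6
```

The activation time of the sink (P5) is the delay of the system for that execution, measured from the activation of the source.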
3. The Schedule Table
For a given execution of the system, a subset of the pro-
cesses is activated which corresponds to the actual path
through the process graph. This path depends on certain
conditions. For each individual path there is an optimal
schedule of the processes which produces a minimal delay.
[Fig. 1 graph: source P0, sink P32, processes P1 to P17 and communication processes (black dots); conditional edges are labelled with the conditions C, D, and K]
Fig. 1. Conditional Process Graph with execution times and mapping
Process mapping:
Processor pe1: P1, P2, P4, P6, P9, P10, P13
Processor pe2: P3, P5, P7, P11, P14, P15, P17
Processor pe3: P8, P12, P16
Communications are mapped to a unique bus
Execution time tPi for processes Pi:
tP1: 3, tP2: 4, tP3: 12, tP4: 5, tP5: 3, tP6: 5, tP7: 3, tP8: 4, tP9: 5,
tP10: 5, tP11: 6, tP12: 6, tP13: 8, tP14: 2, tP15: 6, tP16: 4, tP17: 2
Execution time ti,j for communication between Pi and Pj:
t1,3: 1, t2,5: 3, t3,6: 2, t3,10: 2, t4,7: 3, t6,8: 3, t7,10: 2,
t8,10: 2, t11,12: 1, t11,13: 2, t12,14: 1, t12,15: 3, t13,17: 2, t16,17: 2

Let us consider the process graph in Fig. 1. If all three conditions, C, D, and K, are true, the optimal schedule requires P1 to be activated at time t=0 on processor pe1, and processor pe2 to be kept idle until t=4, in order to activate P3 as soon as possible, that is, as soon as the output of P1 (tP1=3) has been transferred to pe2 (t1,3=1) (see Fig. 4a). However, if C and D are true but K is false, the optimal schedule requires starting both P1 on pe1 and P11 on pe2 at t=0; in this case P3 is activated at t=6, after P11 has terminated (tP11=6) and, thus, pe2 has become free (see Fig. 4b). This example reveals one of the difficulties
when generating a schedule for a system like that in Fig. 1. As the values of the conditions are unpredictable, the decision on which process to activate on pe2, and at which time, has to be taken without knowing which values the conditions will later take. On the other hand, at a certain moment during execution, when the values of some conditions are already known, they have to be used in order to take the best possible decisions on when and which process to activate. An algorithm is therefore needed which produces a schedule of the processes so that the worst case delay is as small as possible. The output of this algorithm is a so-called schedule table. In this table there is one row for each "ordinary" or communication process, which contains activation times for that process corresponding to different values of the conditions. Each column in the table is headed by a logical expression constructed as a conjunction of condition values. Activation times in a given column represent starting times of the processes when the respective expression is true.
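A run time lookup in such a table might look like the sketch below, assuming column headings are stored as sets of condition values. The rows for P1 and P2 are taken from Table 1; the row `Px` and the function name are hypothetical.

```python
def activation_time(table, process, known):
    """Look up the activation time of `process` in a schedule table.

    table: process -> {column heading: time}; a heading is a frozenset of
    (condition, value) pairs, frozenset() being the column `true`.
    known: condition values currently known, e.g. {"D": True}.
    Returns None if no column of the row is selected by the known values.
    """
    for heading, time in table[process].items():
        if all(known.get(cond) == value for cond, value in heading):
            return time
    return None


TRUE = frozenset()
table = {
    "P1": {TRUE: 0},   # from Table 1
    "P2": {TRUE: 3},   # from Table 1
    # Hypothetical row: one activation time for each value of condition C.
    "Px": {frozenset({("C", True)}): 10, frozenset({("C", False)}): 12},
}
print(activation_time(table, "P2", {}))             # 3
print(activation_time(table, "Px", {}))             # None (C not yet known)
print(activation_time(table, "Px", {"C": False}))   # 12
```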
Table 1 shows part of the schedule table corresponding to
the system depicted in Fig. 1. According to this schedule pro-
cesses P1, P2, P11 as well as the communication process P18
are activated unconditionally at the times given in the first
column of the table. No condition has yet been determined
to select between alternative schedules. Process P14, on the
other hand, has to be activated at t=24 if D∧C∧K=true and
t=35 if D∧C∧K=true. To determine the worst case delay,
δmax, we have to observe the rows corresponding to processes P10 and P17: with tP10=5 and tP17=2, δmax = max{34 + tP10, 37 + tP17} = max{39, 39} = 39.
The schedule table contains all the information needed by a distributed run time scheduler to take decisions on activation of processes. We consider that during execution a very simple non-preemptive scheduler located on each programmable/communication processor decides on process and communication activation depending on actual values of conditions. Once activated, a process executes until it completes. To produce a deterministic behavior which is correct for any combination of conditions, the table has to fulfill several requirements:
1. If for a certain process Pi, with guard XPi, there exists an activation time in the column headed by expression Ek, then Ek⇒XPi; this means that no process will be activated if the conditions required for its execution are not fulfilled.
2. Activation times have to be uniquely determined by the conditions. Thus, if for a certain process Pi there are several alternative activation times then, for each pair of such times $(\tau_{P_i}^{E_j}, \tau_{P_i}^{E_k})$ placed in columns headed by expressions Ej and Ek, Ej∧Ek=false.
3. If for a certain execution of the system the guard XPi becomes true, then Pi has to be activated during that execution. Thus, considering all expressions Ej corresponding to columns which contain an activation time for Pi, $\bigvee_j E_j = X_{P_i}$.
4. Activation of a process Pi at a certain time t has to depend only on condition values which are determined at the respective moment t and are known to the processing element M(Pi) which executes Pi.
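Requirements 1 to 3 are purely logical, so for a small number of conditions they can be checked by brute-force enumeration of all condition values, as in this hedged sketch. The guard is that of P14 in Fig. 1, while the column headings are illustrative; requirement 4 involves timing and is not covered.

```python
from itertools import product


def holds(heading, assignment):
    """True if the conjunction `heading` is satisfied by a full assignment."""
    return all(assignment[cond] == value for cond, value in heading)


def check_row(guard, columns, conditions):
    """Check requirements 1-3 for the row of one process by enumerating every
    combination of condition values (feasible only for a few conditions).

    guard and each column heading are frozensets of (condition, value) pairs.
    """
    for values in product((False, True), repeat=len(conditions)):
        a = dict(zip(conditions, values))
        selected = sum(holds(e, a) for e in columns)
        guard_true = holds(guard, a)
        if selected and not guard_true:   # requirement 1: E_k ⇒ X_Pi
            return False
        if selected > 1:                  # requirement 2: E_j ∧ E_k = false
            return False
        if guard_true and not selected:   # requirement 3: ∨ E_j = X_Pi
            return False
    return True


conds = ["C", "D", "K"]
x_p14 = frozenset({("D", True), ("K", True)})   # guard of P14: D ∧ K
ok = [frozenset({("D", True), ("C", True), ("K", True)}),
      frozenset({("D", True), ("C", False), ("K", True)})]
print(check_row(x_p14, ok, conds))       # True
print(check_row(x_p14, ok[:1], conds))   # False: no time when D ∧ ¬C ∧ K holds
```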
The value of a condition is determined at the moment τ at
which the corresponding disjunction process terminates.
Thus, at any moment t, t≥τ, the condition is available for
scheduling decisions on the processor which has executed
the disjunction process. However, in order to be available
on any other processor, the value has to arrive at that pro-
cessor. The scheduling algorithm has to consider both the
time and the resource needed for this communication.
The following strategy has been adopted for scheduling the communication of conditions: after termination of a disjunction process, the value of the condition is broadcast from the corresponding processor to all other processors; this broadcast is scheduled as soon as possible on the first bus which becomes available after termination of the disjunction process. For this task only busses to which all processors are connected are considered, and we assume that at least one such bus exists¹. The time τ0 needed for this communication is the same for all conditions and depends on the features of the employed busses. Given the minimal amount of transferred information, the time τ0 is smaller than (at most equal to) any other communication time. The transmitted condition is available for scheduling decisions on all other processors τ0 time units after initiation of the broadcast. For the example given in Table 1, the communication time for conditions has been considered τ0=1. The last three rows in Table 1 indicate the schedule for communication of conditions C, D, and K.
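The broadcast rule can be sketched as follows, under the simplifying assumption that each bus is described by a single time from which it stays free. The bus names and times are hypothetical.

```python
def schedule_broadcast(t_done, bus_free, tau0):
    """Schedule the broadcast of a condition whose disjunction process
    terminates at t_done.

    bus_free: bus -> time from which that bus is free (only busses connected
    to all processors are listed); tau0: condition communication time.
    Returns (bus, start of broadcast, time the value is known everywhere).
    """
    # The first bus to become available after the disjunction process terminates.
    bus = min(bus_free, key=lambda b: max(bus_free[b], t_done))
    start = max(bus_free[bus], t_done)
    bus_free[bus] = start + tau0   # the bus is busy for tau0 time units
    return bus, start, start + tau0


busses = {"bus1": 8, "bus2": 5}                 # hypothetical availability
print(schedule_broadcast(6, busses, tau0=1))    # ('bus2', 6, 7)
print(schedule_broadcast(6, busses, tau0=1))    # ('bus2', 7, 8)
```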
¹ This assumption is made for simplification of the further discussion. If no bus is connected to all processors, communication tasks have to be scheduled on several busses according to the actual interconnection topology.

Table 1: Part of schedule table for the graph in Fig. 1
Column headings: true | D | D∧C | D∧C∧K | D∧C∧K | D∧C | D∧C∧K | D∧C∧K | D | D∧C | D∧C
P1: 0
P2: 3
P10: 34, 34, 26, 26, 34, 26
P11: 0
P14: 35, 24
P17: 29, 37, 30, 26, 22, 24
P18 (1→3): 3
P19 (2→5): 9, 10
P20 (3→10): 28, 20, 21, 21, 22, 18
D: 6
C: 7, 7
K: 15, 15

4. The Scheduling Strategy
Our goal is to derive a minimal worst case delay and to generate the corresponding schedule table for a process graph Γ(V, ES, EC), a mapping function M: V→PE, and execution times tPi for each process Pi∈V. At a certain execution of the system, one of the Nalt alternative paths
through the process graph will be executed. Each alternative
path corresponds to one subgraph Gk∈Γ, k=1, 2, ..., Nalt. For
each subgraph there is an associated logical expression Lk
(the label of the path) which represents the necessary condi-
tions for that subgraph to be executed.
If all the conditions were known at activation of the system, the processes could be executed according to the (near)optimal schedule of the corresponding subgraph Gk. Under these circumstances the worst case delay δmax would be

δmax = δM, with δM = max{δk, k = 1, 2, ..., Nalt},

where δk is the delay corresponding to subgraph Gk. However, this is not the case, as we do not assume any prediction of the conditions at the start of the system. Thus, what we can say is only that¹ δmax ≥ δM.
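As a worked example, δM for the graph in Fig. 1 follows from the path lengths listed in Fig. 2, and the worst case delay read from Table 1 can be compared with it using the percentage increase that is also reported in the experimental evaluation.

```python
# Lengths of the optimal schedules of the six alternative paths (Fig. 2).
path_delays = [39, 39, 38, 32, 31, 31]

delta_M = max(path_delays)     # δM = max{δk, k = 1, ..., Nalt}
delta_max = 39                 # worst case delay of the merged schedule (Table 1)
increase = 100 * (delta_max - delta_M) / delta_M   # % increase of δmax over δM

assert delta_max >= delta_M    # δmax ≥ δM holds for any merged schedule
print(delta_M, increase)       # 39 0.0
```

For this example the merged schedule table does not increase the worst case delay beyond that of the longest path.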
A scheduling heuristic has to produce a schedule table for
which the difference δmax−δM is minimized. This means that
the perturbation of the individual schedules, introduced by the
fact that the actual path is not known in advance, should be as
small as possible. We have developed a heuristic which, starting from the schedules corresponding to the alternative paths, produces the global schedule table as the result of a so-called schedule merging operation. Hence, we perform scheduling of a process graph as a succession of the following two steps:
1. Scheduling of each individual alternative path;
2. Merging of the individual schedules and generation of
the schedule table.
We present algorithms for scheduling of the individual
paths in [5]. In this paper we concentrate on the generation
mechanism of the global schedule table.
5. The Table Generation Algorithm
The input for the generation of the schedule table is a set of Nalt schedules, each corresponding to an alternative path, labelled Lk, through the process graph Γ. Each such schedule consists of a set of pairs $(P_i, \tau_{P_i}^{L_k})$, where Pi is a process activated on path Lk and $\tau_{P_i}^{L_k}$ is the start time of process Pi according to the respective schedule. The schedule table generated as output fulfills the requirements presented in section 3.
The schedule merging algorithm is guided by the length of the schedules produced for each alternative path. While progressively constructing the schedule table, priority is given at each moment to the requirements of the schedule corresponding to the path, among those which are still reachable, that produces the largest delay. Thus, we induce perturbations into the short delay paths and let the long ones proceed as closely as possible to their (near)optimal schedule.
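The priority rule can be sketched as a selection among the paths that are still reachable from the current node of the decision tree. The conditions x and y, the labels and the delays are illustrative and do not come from Fig. 1.

```python
def select_path(paths, known):
    """Return the (label, delay) pair of the reachable path with the largest delay.

    paths: list of (label, delay); labels and `known` are frozensets of
    (condition, value) pairs. A path is reachable from the current node of
    the decision tree if its label contradicts none of the known values.
    """
    reachable = [(label, delay) for label, delay in paths
                 if not any((c, not v) in known for c, v in label)]
    return max(reachable, key=lambda p: p[1])


# Illustrative labels and delays over hypothetical conditions x and y.
paths = [(frozenset({("x", True), ("y", True)}), 30),
         (frozenset({("x", True), ("y", False)}), 34),
         (frozenset({("x", False)}), 28)]
print(select_path(paths, frozenset())[1])                             # 34
print(select_path(paths, frozenset({("x", False)}))[1])               # 28
print(select_path(paths, frozenset({("x", True), ("y", True)}))[1])   # 30
```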
¹ For this formula to be rigorously correct, δM has to be the maximum of the optimal delays for each subgraph.

5.1. Schedule Merging
The generation algorithm of the schedule table proceeds along a binary decision tree corresponding to all alternative paths, which is explored in depth-first order. Fig. 2 shows the decision tree explored during generation of Table 1. The nodes of the decision tree correspond to the states reached when, according to the actual schedule, a disjunction process has terminated and the value of a new condition has been computed. The algorithm is guided by the following basic rules:
1. Start times of processes are fixed in the table according, with priority, to the schedule of that path which is reachable from the current state and produces the longest delay.
2. The start time $\tau_{P_i}^{L_k}$ of a process Pi is placed in a column headed by the conjunction of all condition values known at $\tau_{P_i}^{L_k}$ on the processing element M(Pi), according to the current schedule. If such a column does not yet exist in the table, it will be generated.
3. After a new path has been selected, its schedule will be adjusted by enforcing the start times of certain processes according to their previously fixed values. This can be the case of a process Pi which is part both of the current path, labelled Lk (Lk⇒XPi), and of a previously handled path, labelled Lq (Lq⇒XPi). When handling path Lq, an activation time for process Pi has been fixed in a column headed by expression E. If E depends exclusively on conditions corresponding to tree nodes which are predecessors of the branching node between the two paths, then the schedule of the current path, Lk, has to be adjusted by taking into consideration the previously fixed activation time of Pi.
4. Further readjustments of the current schedule are performed in order to avoid violation of requirement 2 in section 3. This aspect will be discussed in subsection 5.2.
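Rule 2 can be sketched as follows: a condition value belongs to the column heading if it is known on M(Pi) at the start time, that is, immediately on the processing element that executed the disjunction process and, elsewhere, once the broadcast has arrived. The function name and the data (D computed on pe1 at t=6, τ0=1) are hypothetical.

```python
def column_heading(t, pe, decided, available):
    """Column heading for a start time t on processing element pe (rule 2).

    decided: condition -> (value, pe that ran the disjunction process, time the
             value was computed) for the conditions decided on the current path;
    available: condition -> time the broadcast value reaches the other PEs.
    Returns a frozenset of (condition, value) pairs; frozenset() means `true`.
    """
    known = set()
    for cond, (value, origin_pe, computed_at) in decided.items():
        # Known at once where it was computed, elsewhere after the broadcast.
        ready = computed_at if pe == origin_pe else available[cond]
        if t >= ready:
            known.add((cond, value))
    return frozenset(known)


# Hypothetical: D = true computed on pe1 at t = 6 and broadcast with τ0 = 1.
decided = {"D": (True, "pe1", 6)}
available = {"D": 7}
print(column_heading(6, "pe1", decided, available))   # frozenset({('D', True)})
print(column_heading(6, "pe2", decided, available))   # frozenset()
```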
At the beginning, start times of processes are placed into Table 1 according to the schedule which corresponds to the path labelled D∧C∧¬K. After the first back-step, to node K (Fig. 2), the schedule corresponding to path D∧C∧K becomes the actual one. New start times will be fixed into the schedule table according to an adjusted version of this schedule. The next back-step is to node C. Two schedules are now reachable by taking the branch ¬C, labelled D∧¬C∧K and D∧¬C∧¬K respectively. The one which produces the larger delay will be selected first as the actual schedule. It will be followed until the next back-step has been performed.
[Fig. 2 decision tree: root true, with branching nodes for the conditions D, C, and K]
Length of the optimal schedule for the alternative paths through the graph in Fig. 1:
D∧C∧¬K: 39
D∧C: 39
D∧C∧K: 38
D∧C∧K: 32
D∧C∧K: 31
D∧C: 31
Fig. 2. Decision tree explored for the graph in Fig. 1

The algorithm for generation of the schedule table is briefly described, as a recursive procedure, in Fig. 3. An essential aspect of this algorithm is that, after each back-step, a new schedule has to be selected as the current one. The selection rule gives priority to the path with the largest delay among those which are reachable from the current node in the decision tree. Further start times of processes will be placed into the schedule table according to an adjusted version of the new current schedule. Processes which satisfy
the condition of rule 3 presented above have to be moved to their previously fixed start time. They are considered locked in this new position. As a result of a simple rescheduling procedure, the start times of the other, unlocked, processes are changed to the earliest moment allowed, taking into consideration data dependencies. Relative priorities of unlocked processes assigned to the same non-hardware processor are kept as in the original schedule.
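The adjustment step might be sketched as below. This is one possible reading of the "simple rescheduling procedure", not the authors' implementation: locked processes keep their times, and each unlocked process is placed as early as its data dependencies, its predecessor on the same non-hardware processor (in the original order) and the locked intervals allow. All process names and times are hypothetical.

```python
from collections import defaultdict


def adjust_schedule(order, exec_time, mapping, preds, locked, hardware=()):
    """Adjust a path schedule after some processes have been locked.

    order: processes of the path sorted by start time in the original schedule;
    locked: process -> previously fixed start time (kept unchanged);
    hardware: processing elements that may run several processes in parallel.
    """
    start = dict(locked)
    busy = defaultdict(list)              # intervals occupied by locked processes
    for p, t in locked.items():
        if mapping[p] not in hardware:
            busy[mapping[p]].append((t, t + exec_time[p]))
    last_end = defaultdict(int)           # end of the previous unlocked process per PE
    for p in order:
        if p in locked:
            continue
        pe = mapping[p]
        # Predecessors precede p in `order`, so their start times are known.
        t = max((start[q] + exec_time[q] for q in preds.get(p, ())), default=0)
        if pe not in hardware:
            t = max(t, last_end[pe])      # keep the original relative order
            moved = True
            while moved:                  # skip past overlapping locked intervals
                moved = False
                for s, e in busy[pe]:
                    if t < e and s < t + exec_time[p]:
                        t, moved = e, True
            last_end[pe] = t + exec_time[p]
        start[p] = t
    return start


exec_time = {"A": 3, "B": 2, "C": 4, "D": 3, "E": 4}
mapping = {"A": "pe1", "B": "pe2", "C": "pe1", "D": "pe2", "E": "pe2"}
preds = {"B": ["A"], "D": ["C"]}
result = adjust_schedule(["A", "B", "C", "E", "D"], exec_time, mapping, preds,
                         locked={"D": 8})
print(result)   # {'D': 8, 'A': 0, 'B': 3, 'C': 3, 'E': 11}
```

Process E would fit at t=5 on pe2, but it would overlap the locked process D (t=8 to 11), so it is pushed to t=11.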
In Fig. 4 we show the adjustment of the schedule labelled D∧C∧K performed after the back-step to node K. At this moment, start times of processes P1, P2, P11, P3, P12, P18, P27, and of the communication processes for conditions D, C, and K have already been placed in the table according to the schedule of path D∧C∧¬K, which is shown in Fig. 4b. The activation times of these processes have been placed in columns headed by the expressions true, C, or D∧C, and consequently they are mandatory also for path D∧C∧K (both node C and node D are predecessors of node K, which is the branching node between the two paths). Under these circumstances some of the other processes have to be moved from their original position in this schedule, shown in Fig. 4a, to their position in the adjusted schedule of Fig. 4c. This adjusted version is used in order to fix start times of further processes until the next back-step.
5.2. Conflict Handling at Table Generation
Suppose we are currently handling a path labelled Lk. According to the adjusted schedule of this path we place an activation time $\tau_{P_i}^{L_k}$ of process Pi into the table, so that the respective column is headed by an expression E. The problem is how to preserve the coherence of the table in the sense introduced by requirement 2 defined in section 3. If no activation time has previously been introduced in the row corresponding to Pi, no conflicts can be generated. If, however, the respective row contains activation times, there is a potential conflict between the column headed by E and the columns which already include activation times of Pi. Let us consider that such a column is headed by an expression F. According to requirement 2, we have a conflict between columns E and F if there exists no condition C such that E = q∧C and F = q'∧¬C (or vice versa). Intuitively, such a conflict means that for two or more paths the same process Pi is scheduled at different times, but the conditions known on the processing element M(Pi) do not allow the scheduler to identify the current path and to take a deterministic decision on the activation of Pi.
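When column headings are conjunctions of condition values, the conflict test reduces to a syntactic check on the two headings, as in this hedged sketch with illustrative headings.

```python
def in_conflict(e, f):
    """Headings are frozensets of (condition, value) pairs. Two columns are in
    conflict unless some condition has opposite values in them, i.e. unless
    E = q ∧ C and F = q' ∧ ¬C (or vice versa), which makes E ∧ F = false."""
    fd = dict(f)
    return not any(cond in fd and fd[cond] != value for cond, value in e)


E = frozenset({("D", True)})
F = frozenset({("D", True), ("C", True)})
G = frozenset({("D", True), ("C", False)})
print(in_conflict(E, F))   # True: both can hold at the same time
print(in_conflict(F, G))   # False: C has opposite values, so F ∧ G = false
```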
If placement of an activation time for process Pi in a column headed by expression E produces a conflict, the current schedule has to be readjusted so that an expression E' will head the column that hosts the new activation time of Pi and no conflict is induced in the schedule table. As shown in the algorithm presented in Fig. 3, after adjustment of a schedule, unlocked processes are checked to determine whether their placement in the table produces any conflicts. If this is the case, the process is moved to a new activation time and the schedule is readjusted by changing the start time of some unlocked processes (similar to the operation performed at the initial adjustment). The main problem to be solved is to find the new activation time for Pi such that conflicts are avoided.

BuildScheduleTable(current_schedule, back_step)
    if back_step then
        Select new current_schedule
        Adjust current_schedule
        Check for conflicts and readjust current_schedule
    end if
    while not (EndOfSchedule or
               arrived at moment so that a disjunction process is terminated) do
        Take following process in current_schedule and
        place start time into ScheduleTable
    end while
    if EndOfSchedule then return end if
    BuildScheduleTable(current_schedule, false)
    BuildScheduleTable(current_schedule, true)
end BuildScheduleTable
Fig. 3. Algorithm for generation of a schedule table

[Fig. 4 panels: Gantt charts of the processes on processor pe1, processor pe2, processor pe3 (hardware) and bus pe4, over the time interval 0 to 39:
a) Optimal schedule of the path corresponding to D∧C∧K
b) Optimal schedule of the path corresponding to D∧C∧¬K
c) Adjusted schedule of the path corresponding to D∧C∧K]
Fig. 4. Optimal and adjusted schedules for paths extracted from the process graph in Fig. 1

In [5] we demonstrated the following two theorems in the context of our table generation algorithm:
Theorem 1
Consider a process Pi which is part of two paths, Lk and Lq, with activation times $\tau_{P_i}^{L_k}$ and $\tau_{P_i}^{L_q}$ respectively. If the set of predecessors of Pi is different in the two paths, then no conflict is possible between the columns corresponding to the two activation times.
As a consequence of this theorem, readjustments for conflict handling cannot impose an activation time of a process which is not feasible for the respective path.
Theorem 2
Consider a process Pi such that placement of its activation time $\tau_{P_i}^{L}$, corresponding to the current path L, into the schedule table produces a conflict. There exists an activation time τ' of Pi, corresponding to one of the previously handled paths with which the current one is in conflict, such that τ' has the following property: if Pi is moved to activation time τ' in the current schedule, all conflicts are avoided.
Let W be the set of columns with which a conflict exists at placement of the activation time for Pi. Based on Theorem 2, we know that one of the times $\tau_{P_i}^{F}$ placed in a column F∈W represents a correct solution for conflict elimination. Thus, a loop over the set W, which tries these candidate times in turn, produces a new activation time of process Pi such that all conflicts are avoided.
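One possible form of this loop is sketched below in Python. The callback `heading_at`, which returns the column heading Pi would get after being moved to time t and the current schedule readjusted, is a hypothetical stand-in for that readjustment; the row contents are illustrative, and treating equal times in two columns as non-conflicting is an assumption of this sketch.

```python
def in_conflict(e, f):
    """Headings are frozensets of (condition, value) pairs. Two columns are in
    conflict unless some condition has opposite values in them."""
    fd = dict(f)
    return not any(cond in fd and fd[cond] != value for cond, value in e)


def new_activation_time(row, conflicting_columns, heading_at):
    """Try, in turn, the times already placed in the conflicting columns (the set W).

    row: column heading -> activation time of Pi already in the table.
    heading_at(t): heading that Pi would get if moved to time t in the
    readjusted current schedule (hypothetical callback).
    """
    for f in conflicting_columns:
        t = row[f]
        e = heading_at(t)
        # Columns holding the same time are not alternative activation times.
        if not any(in_conflict(e, g) for g, tg in row.items() if tg != t):
            return t
    return None   # Theorem 2 states a candidate exists when its premises hold


D, NOT_D = frozenset({("D", True)}), frozenset({("D", False)})
row = {D: 10, NOT_D: 14}


# Hypothetical: on the current path D is false, known on M(Pi) from t = 12.
def heading_at(t):
    return NOT_D if t >= 12 else frozenset()


print(new_activation_time(row, [D, NOT_D], heading_at))   # 14
```

Moving Pi to t=10 would leave it in the column true, which conflicts with ¬D; at t=14 its column is ¬D, which excludes D, so no conflict remains.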
6. Experimental Evaluation
The strategy we have presented for generation of the
schedule table guarantees that the path corresponding to the
largest delay, δM, will be executed in exactly δM time. This,
however, does not mean that the worst case delay δmax, cor-
responding to the generated global schedule, is always
guaranteed to be δM. Such a delay can not be guaranteed in
theory. According to our scheduling strategy δmax will be
worse than δM if the schedule corresponding to an initially
faster path is disturbed at adjustment or conflict handling so
that its delay becomes larger than δM.
For evaluation of the schedule merging algorithm we used
1080 conditional process graphs generated for experimental
purpose. 360 graphs have been generated for each dimension
of 60, 80, and 120 nodes. The number of alternative paths
through the graphs is 10, 12, 18, 24, or 32. Execution times
were assigned randomly using both uniform and exponen-
tial distribution. We considered architectures consisting of
one ASIC and one to eleven processors and one to eight
busses [5]. Experiments were run on a SPARCstation 20.
Fig. 5 presents the percentage increase of the worst case
delay δmax over the delay δM of the longest path. The aver-
age increase is between 0.1% and 7.63% and, practically, it
does not depend on the dimension of the graph but only on
the number of merged schedules. It is worth to be men-
tioned that a zero increase (δmax=δM) was produced for
90% of the graphs with 10 alternative paths, 82% with 12
paths, 57% with 18 paths, 46% with 24 paths, and 33% with
32 paths. In Fig. 6 we show the average execution time for
the schedule merging algorithm, as a function of the number
of merged schedules. The time needed for scheduling of the
individual paths depends on the employed algorithm. As
demonstrated in [5], good quality results can be obtained
with a list scheduling based algorithm which needs less than
0.003 seconds for graphs having 120 nodes.
Finally, we present a real-life example which implements
the operation and maintenance (OAM) functions correspond-
ing to the F4 level of the ATM protocol layer [1]. Fig. 7a
shows an abstract model of the ATM switch. Through the
switching network cells are routed between the n input and
q output lines. In addition, the ATM switch also performs
several OAM related tasks.
In [4] we discussed hardware/software partitioning of the
OAM functions corresponding to the F4 level. We concluded
that filtering of the input cells and redirecting of the OAM cells
towards the OAM block have to be performed in hardware as
part of the line interfaces (LI). The other functions are per-
formed by theOAMblock and can be implemented in software.
We have identified three independent modes in the func-
tionality of the OAM block. Depending on the content of the
input buffers (Fig. 7b), the OAM block switches between
these three modes. Execution in each mode is controlled by a
for all columns F∈W do
ifmoving P
i
to τPi
F all conflicts are avoided then
return τPi
F
end if
end for
0
1
2
3
4
5
6
7
10 15 20 25 30 35
I
n
c
r
e
a
s
e
o
f
δδ m
a
x
o
v
e
r
δδ M
(
%
)
Number of merged schedules
120 nodes
80 nodes
60 nodes
Fig.5.Increase of the worst case delay
0.05
0.1
0.15
0.2
0.25
10 15 20 25 30 35
E
x
e
c
u
t
i
o
n
t
i
m
e
(
s
)
Number of merged schedules
120 nodes
80 nodes
60 nodes
Fig. 6.: Execution time for schedule merging
Authorized licensed use limited to: Danmarks Tekniske Informationscenter. Downloaded on October 20, 2009 at 05:33 from IEEE Xplore.  Restrictions apply. 
statically generated schedule table for the respective mode.
We specified the functionality corresponding to each mode
as a set of interacting VHDL processes. Table 2 shows the
characteristics of the resulting process graphs. The main objective of this experiment was to estimate the worst case delays in each mode for alternative architectures of the OAM block. Based on these estimates, as well as on the particular features of the environment in which the switch will be used, an appropriate architecture can be selected and the dimensions of the buffers can be determined.
Fig. 7b shows a possible implementation architecture of the OAM block, using one processor and one memory module (1P/1M). Our experiments also included architecture models with two processors and one memory module (2P/1M), as well as structures consisting of one or two processors and two memory modules (1P/2M and 2P/2M, respectively). The target architectures are based on two types of processors: 486DX2/80MHz and Pentium/120MHz. For each architecture, processes have been assigned to processors taking into consideration the potential parallelism of the process graphs and the amount of communication between processes. The worst case delays resulting after generation of the schedule table for each of the three modes are given in Table 2. As expected, using a faster processor reduces the delay in each of the three modes. Introducing an additional processor, however, has no effect on the execution delay in mode 2, which exhibits no potential parallelism. In mode 3, the delay is reduced by using two 486 processors instead of one; for the Pentium processor, however, the worst case delay cannot be improved by introducing an additional processor. In mode 1, using two processors always improves the worst case delay. As for the additional memory module, only in mode 1 does the model contain memory accesses that can potentially be executed in parallel. Table 2 shows that an additional memory module pays off only for the architecture consisting of two Pentium processors, where it reduces the worst case delay in mode 1 from 2131 ns to 1932 ns.
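The comparisons above can be checked directly against Table 2. The short Python sketch below copies the worst case delays of Table 2 (the column keys and function names are our own choice) and computes the relative reduction of the worst case delay obtained by adding a second processor of the same type (1P/1M to 2P/1M) or a second memory module to a two-processor architecture (2P/1M to 2P/2M).

```python
# Worst case delays (ns) copied from Table 2; one row per mode.
COLUMNS = [
    ("1P/1M", "486"), ("1P/1M", "Pent"),
    ("1P/2M", "486"), ("1P/2M", "Pent"),
    ("2P/1M", "2x486"), ("2P/1M", "2xPent"), ("2P/1M", "486+Pent"),
    ("2P/2M", "2x486"), ("2P/2M", "2xPent"), ("2P/2M", "486+Pent"),
]
ROWS = {
    1: [4471, 2701, 4471, 2701, 2932, 2131, 2532, 2932, 1932, 2532],
    2: [1732, 1167, 1732, 1167, 1732, 1167, 1167, 1732, 1167, 1167],
    3: [5852, 3548, 5852, 3548, 5033, 3548, 3548, 5033, 3548, 3548],
}
# TABLE2[mode][(architecture, processors)] -> worst case delay in ns
TABLE2 = {mode: dict(zip(COLUMNS, row)) for mode, row in ROWS.items()}


def reduction_pct(before: int, after: int) -> float:
    """Relative reduction of the worst case delay, in percent."""
    return 100.0 * (before - after) / before


def second_processor_gain(mode: int, cpu: str) -> float:
    """Gain of going from one to two processors of the same type (1P/1M -> 2P/1M)."""
    t = TABLE2[mode]
    return reduction_pct(t[("1P/1M", cpu)], t[("2P/1M", "2x" + cpu)])


def extra_memory_gain(mode: int, procs: str) -> float:
    """Gain of adding a second memory module to a two-processor architecture."""
    t = TABLE2[mode]
    return reduction_pct(t[("2P/1M", procs)], t[("2P/2M", procs)])


if __name__ == "__main__":
    for mode in sorted(TABLE2):
        print(f"mode {mode}: second 486 {second_processor_gain(mode, '486'):.1f}%, "
              f"second Pentium {second_processor_gain(mode, 'Pent'):.1f}%, "
              f"extra memory (2xPent) {extra_memory_gain(mode, '2xPent'):.1f}%")
```

Running it reports reductions of 34.4% and 21.1% for a second processor in mode 1, 14.0% for a second 486 in mode 3, 9.3% for the extra memory module with two Pentium processors in mode 1, and 0.0% in all other cases, which matches the observations in the text.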
7. Conclusions
We have presented an approach to process scheduling
for the synthesis of embedded systems. The approach is
based on an abstract graph representation which captures, at
process level, both dataflow and the flow of control. A
schedule table is generated by a merging operation per-
formed on the schedules of the alternative paths. The main
problems which have been solved in this context are the
minimization of the worst case delay and the generation of
a logically and temporally deterministic table, taking into
consideration communication times and the sharing of the
busses. The algorithms have been evaluated through experiments on a large number of graphs generated for experimental purposes, as well as on real-life examples.
References
[1] T.M. Chen, S.S. Liu, ATM Switching Systems, Artech House Books, 1995.
[2] P. Chou, G. Borriello, "Interval Scheduling: Fine-Grained Code Scheduling for Embedded Systems", Proc. ACM/IEEE DAC, 1995, 462-467.
[3] E.G. Coffman Jr., R.L. Graham, "Optimal Scheduling for Two-Processor Systems", Acta Informatica, 1, 1972, 200-213.
[4] P. Eles, Z. Peng, K. Kuchcinski, A. Doboli, "System Level Hardware/Software Partitioning Based on Simulated Annealing and Tabu Search", Des. Aut. for Emb. Syst., V2, N1, 1997, 5-32.
[5] P. Eles, K. Kuchcinski, Z. Peng, A. Doboli, P. Pop, "Process Scheduling for Performance Estimation and Synthesis of Embedded Systems", Research Report, Department of Computer and Information Science, Linköping University, 1997.
[6] R.K. Gupta, G. De Micheli, "A Co-Synthesis Approach to Embedded System Design Automation", Des. Aut. for Emb. Syst., V1, N1/2, 1996, 69-120.
[7] H. Kasahara, S. Narita, "Practical Multiprocessor Scheduling Algorithms for Efficient Parallel Processing", IEEE Trans. on Comp., V33, N11, 1984, 1023-1029.
[8] K. Kuchcinski, "Embedded System Synthesis by Timing Constraint Solving", Proc. Int. Symp. on Syst. Synth., 1997.
[9] Y.K. Kwok, I. Ahmad, "Dynamic Critical-Path Scheduling: an Effective Technique for Allocating Task Graphs to Multiprocessors", IEEE Trans. on Par. & Distr. Syst., V7, N5, 1996, 506-521.
[10] Y.S. Li, S. Malik, "Performance Analysis of Embedded Software Using Implicit Path Enumeration", Proc. ACM/IEEE DAC, 1995, 456-461.
[11] S. Prakash, A. Parker, "SOS: Synthesis of Application-Specific Heterogeneous Multiprocessor Systems", Journal of Parallel and Distrib. Comp., V16, 1992, 338-351.
[12] K. Suzuki, A. Sangiovanni-Vincentelli, "Efficient Software Performance Estimation Methods for Hardware/Software Codesign", Proc. ACM/IEEE DAC, 1996, 605-610.
[13] J.D. Ullman, "NP-Complete Scheduling Problems", Journal of Comput. Syst. Sci., 10, 1975, 384-393.
[14] T.Y. Yen, W. Wolf, Hardware-Software Co-Synthesis of Distributed Embedded Systems, Kluwer Academic Publishers, 1997.
[Figure: a) ATM switch, with line interfaces (LI) on the input lines i1 to in and the output lines o1 to oq, and the OAM block connected to the physical layer and management system; b) OAM block, with processor and memory, receiving OAM cells from the LIs]
Fig. 7. ATM switch with OAM block
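Table 2: Worst case delays for the OAM block
Model: mode, number of processes, number of paths. Worst case delays in ns for each architecture (Pent. = Pentium).
Mode | Processes | Paths | 1P/1M 486 | 1P/1M Pent. | 1P/2M 486 | 1P/2M Pent. | 2P/1M 2×486 | 2P/1M 2×Pent. | 2P/1M 486+Pent. | 2P/2M 2×486 | 2P/2M 2×Pent. | 2P/2M 486+Pent.
1 | 32 | 6 | 4471 | 2701 | 4471 | 2701 | 2932 | 2131 | 2532 | 2932 | 1932 | 2532
2 | 23 | 3 | 1732 | 1167 | 1732 | 1167 | 1732 | 1167 | 1167 | 1732 | 1167 | 1167
3 | 42 | 8 | 5852 | 3548 | 5852 | 3548 | 5033 | 3548 | 3548 | 5033 | 3548 | 3548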
