Efficient scheduling of real-time compute-intensive periodic graphs on a large grain data flow multiprocessor by Akin, Cem
Calhoun: The NPS Institutional Archive
Theses and Dissertations Thesis Collection
1993-03










SECURITY CLASSIFICATION OF THIS PAGE
1a. REPORT SECURITY CLASSIFICATION: UNCLASSIFIED
REPORT DOCUMENTATION PAGE
1b. RESTRICTIVE MARKINGS
2a. SECURITY CLASSIFICATION AUTHORITY
2b. DECLASSIFICATION/DOWNGRADING SCHEDULE
3. DISTRIBUTION/AVAILABILITY OF REPORT
Approved for public release;
distribution is unlimited











6c. ADDRESS (City, State, and ZIP Code)
Monterey, CA 93943-5000
7b. ADDRESS (City, State, and ZIP Code)
Monterey, CA 93943-5000




9. PROCUREMENT INSTRUMENT IDENTIFICATION NUMBER







1 1 . TITLE (Include Security Classification)






from 05/92 to 03/93
15. PAGE COUNT
171
14. DATE OF REPORT (Year, Month, Day)
March 1993
16. SUPPLEMENTARY NOTATION: The views expressed in this thesis are those of the author and do not reflect the official
policy or position of the Department of Defense or the United States Government.
17. COSATI CODES
FIELD GROUP SUB-GROUP
18. SUBJECT TERMS (Continue on reverse if necessary and identify by block number)
Large Grain Data Flow, Data Flow, Scheduling, Graph Restructuring, Run-
time, Real-time, Compile-time, Resource Usage, Throughput
20. DISTRIBUTION/AVAILABILITY OF ABSTRACT: UNCLASSIFIED/UNLIMITED
21. ABSTRACT SECURITY CLASSIFICATION: UNCLASSIFIED
22b. TELEPHONE (Include Area Code): (408) 656-2693
DD FORM 1473, 84 MAR: 83 APR edition may be used until exhausted; all other editions are obsolete.
Approved for public release; distribution is unlimited
Efficient Scheduling of Real-time Compute-intensive Periodic Graphs
on a Large Grain Data Flow Multiprocessor




B.S., Turkish Naval Academy, 1987
Submitted in partial fulfillment of the
requirements for the degree of





Architectures of computer systems based on Data Flow (DF) concepts have attracted
great attention as an alternative to conventional sequential (Von Neumann) architectures.
DF architectures are capable of efficiently exploiting a massive amount of parallelism
inherent in many types of computation. They are programmed using directed graphs
whose vertices are function modules and whose edges denote data dependencies between
function modules. An important subclass of DF is Large Grain Data Flow (LGDF), which
is used efficiently in computation-intensive applications such as signal processing.
Presently, most LGDF systems incorporate a nondeterministic run-time technique, such as
First Come First Served (FCFS), to allocate system resources to support the execution.
Although the simplicity of such scheduling techniques results in a low run-time
overhead, system throughput and predictability can degrade rapidly under high
system load. To provide uniform output and improve resource usage even under a high
load, a compile-time technique called Revolving Cylinder (RC) was introduced. In this
thesis, we present an LGDF simulator and a graph restructurer that restructures the given
graph according to the RC technique. We then perform a comparative experimental study
of the different implementations of RC and the FCFS scheduling techniques. Our results
demonstrate that there is a high potential for the RC technique if a satisfactory node
mapping technique is developed.





I. INTRODUCTION
A. DATA FLOW ARCHITECTURES
B. OBJECTIVES
1. Scope of the Thesis
C. THESIS ORGANIZATION
II. BACKGROUND
A. REVOLVING CYLINDER ANALYSIS
B. IMPLEMENTATION OF RC
C. POTENTIALS OF THE RC ANALYSIS
III. SIMULATOR
A. PROGRAM MODEL
B. MACHINE MODEL
1. Processors
a. Generic Processors (GPs)
b. Input/Output Processors (IOPs)
2. Global Memory Modules (GMMs)
3. Scheduler
C. SIMULATOR DESCRIPTION
D. SIMULATOR PROGRAM DATA STRUCTURE
E. USER INTERFACE
1. Input
2. Output
F. EXAMPLE RUNS OF THE SIMULATOR
IV. GRAPH RESTRUCTURING
A. CYLINDER MAPPING
B. INDEX ASSIGNMENT
C. SYNCHRONIZATION ARCS CREATION
D. PERFORMANCE EVALUATION
1. Basic Performance Measurements
2. Evaluating the Effect of Memory Mapping
3. Effect of the Queue Size on the System Throughput
V. CONCLUSION REMARKS
A. CONCLUSIONS
B. FUTURE WORK
APPENDIX A: Sample Graph File
APPENDIX B: Simulator Source Code
APPENDIX C: Graph Restructure Source Code
APPENDIX D: Interrelation of the Files in PIPDAFS and GR
REFERENCES
INITIAL DISTRIBUTION LIST
I. INTRODUCTION
A. DATA FLOW ARCHITECTURES
The Data Flow (DF) architecture is an alternative to Von Neumann architecture and is
capable of exploiting a massive amount of parallelism inherent in many types of
computation. In the DF model of computation, a program is represented by a directed graph
in which the nodes denote operations and the arcs connecting them represent data
dependencies. Nodes become ready when their operands arrive; thus computations are
data-driven. The DF model supports concurrent execution of ready nodes. In this model,
execution of nodes is asynchronous because of its data-driven nature. Also, computations
are free from side effects. There is no notion of shared data storage and results are conveyed
directly by means of data arcs, [KARP 66]. These properties imply that multiprocessor
architectures based on DF model need not suffer from the synchronization and coordination
overheads incurred in Von Neumann architectures.
Depending upon the chosen granularity, DF architectures can be categorized into two
groups: Large Grain Data Flow (LGDF) and Fine Grain Data Flow (FGDF)
architectures. The granularity of nodes is crucial to the effectiveness of multiprocessing.
For a given application, the larger the node grain size, the smaller the degree of parallelism
that can be exploited. However, the larger the grain size, the smaller the amount of
communication overhead that is incurred. Thus, fine granularity does not necessarily imply
better performance, because it also increases communication overhead, which affects the
parallelism exploited. The choice of granularity can be influenced by software engineering
considerations as well as by parallel processing considerations, [LEE 89].
Real-time compute-intensive applications require predictable response time and high
performance as measured by throughput. Satisfying response time and throughput
requirements is critical for the correct functioning of real-time applications. The
predictability of both response time and throughput can be influenced by resource
allocation and communication overhead. Resource allocation and communication
overhead can be controlled at compile-time and at run-time to lead to high throughput and
deterministic response time.
Based on how graph node and arc attributes are used at compile-time and how much
control information is generated to aid the run-time mechanism, DF scheduling
implementations can be classified as fully dynamic, self-timed, static, or fully static. Fully
dynamic allocation performs all scheduling of nodes at run-time based on the readiness of
inputs and resource availability. In a self-timed allocation system, the compiler determines the
order of node execution and allocates resources, but the execution of the nodes is
triggered at run-time by data arrival. Static node allocation involves the assignment of a
node to a processor, but the order of execution is left to the run-time scheduler based on the
node's input data. In fully static allocation, the compiler determines the processor assignment,
the exact execution time, and the ordering of nodes based on their predicted behaviour, [LEE
90].
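The four classes differ in which of three decisions (processor assignment, execution order, and start time) the compiler fixes. As a compact summary of the paragraph above, the following C++ sketch encodes that table; the type, field, and function names are illustrative only and are not taken from the simulator.

```cpp
#include <string>

// Which scheduling decisions are fixed at compile time (true) and which are
// left to the run-time mechanism (false), per the four classes described above.
struct SchedulingClass {
    std::string name;
    bool assignmentAtCompileTime;  // node-to-processor assignment
    bool orderingAtCompileTime;    // order of node execution
    bool timingAtCompileTime;      // exact start time of each node
};

// Listed from the most run-time-driven class to the most compile-time-driven one.
const SchedulingClass kSchedulingClasses[] = {
    {"fully dynamic", false, false, false},
    {"static",        true,  false, false},
    {"self-timed",    true,  true,  false},
    {"fully static",  true,  true,  true},
};

// Number of the three decisions that the compiler fixes for a given class.
int compileTimeDecisions(const SchedulingClass& c) {
    return static_cast<int>(c.assignmentAtCompileTime) +
           static_cast<int>(c.orderingAtCompileTime) +
           static_cast<int>(c.timingAtCompileTime);
}
```

Read top to bottom, each class moves one more decision from run time to compile time.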
B. OBJECTIVES
First-Come-First-Served (FCFS) is a scheduling strategy where the ready node
primitives are ordered by the time they become ready. High system loads may cause
memory contention and imperfect computation/communication overlap resulting in a
degradation of throughput and unpredictable response time. The FCFS strategy does not
exert any run-time control to solve the above-mentioned problems.
In this thesis, compile-time analysis of LGDF graphs is carried out using the
Revolving Cylinder (RC) technique. RC restructures the original graph to obtain high
throughput and predictable response time, [SLZ 92]. Performance analysis of the results
obtained by simulation is carried out to compare the merits of the different approaches.
1. Scope of the Thesis
The RC approach to determining the node execution sequence was first suggested in
[LIT 91] and [SLZ 92] to enhance system throughput.
This thesis refines the RC approach. Its scope is the analysis of previous work
[LIT 91], [BELL 92], the development of an event-driven simulator (PIPDAFS) for LGDF
machines, and finally the comparison of techniques for graph restructuring using the RC
approach.
C. THESIS ORGANIZATION
Chapter II reviews the RC approach and the previous work accomplished in the area.
In Chapter III, we present PIPDAFS (Periodic Inputs DAta Flow Simulator) and sample
runs. In Chapter IV, we discuss and compare the graph restructuring techniques and
analyze the experimental results. In Chapter V, we draw conclusions from the results and
suggest future work.
II. BACKGROUND
A. REVOLVING CYLINDER ANALYSIS
The Revolving Cylinder (RC) technique for determining the node execution sequence
performs compile-time analysis and results in the restructuring of the graph, [LIT 91], [SLZ
92]. The RC technique restructures the application, described by an LGDF graph, using the
machine configuration and the application graph as input.
The first step towards restructuring is to lay out the graph nodes in a schedule that
achieves some purpose (e.g., no two nodes reading from or writing to the same memory module
at the same time). The graphs that we deal with are Directed Acyclic Graphs (DAGs). We
benefit from research done on fine grained scheduling of parallel loops.
The loop scheduling literature [RAU 81], [HSU 86] has dealt with the problem of
scheduling multiple iterations of the same parallel loop (represented by a DAG) to saturate
the available resources. The resultant schedules possess a minimal initial cost for iterations,
while possibly extending the completion time of each iteration.
Our scheduling technique can benefit from these results by noting that the real-time
applications that we deal with require multiple instantiations of a DAG each of whose
nodes is a large-grained primitive. These multiple instantiations have similarities with the
execution of parallel iterations of a loop. Based on that, we derive the following scheduling
principle:¹
"The schedule(s) providing the maximal throughput with minimal initiation interval
for a DAG depend only on the resource requirements of the DAG nodes and not on the
topology of the DAG."
¹ This principle can readily be deduced from the work of [RAU 81] and [HSU 86], although it was not
explicitly stated there.
This principle simply states that schedules with the same throughput can be found for
different DAGs as long as they have similar sets of nodes with respect to the resource
requirements. As a result of this principle, it might be worthwhile to reduce the DAG to be
scheduled to an equivalent DAG with the same set of nodes V and an empty set of edges E
before scheduling it as will be shown below.
The following examples should clarify the principle:
Figure 2.1. Sample Directed Acyclic Graphs (DAGs): (a) DAG I, (b) DAG II, (c) DAG III
DAG I, DAG II and DAG III are three graphs with the same node requirements and
different edges. According to the above principle, they can therefore share a schedule
that produces the same throughput. Two of the many possible schedules for
these graphs are shown in Figure 2.2a and Figure 2.2b (both schedules assume two
processors). These examples depict the node execution pattern that is induced by a
schedule. The nodes might belong to different instances; the exact instance to which a node
belongs depends on the set of edges (ignored so far), as shown in Tables 1, 2 and 3, where
nodes in the schedules are labeled with the indices of the instances they belong to. The above
principle reduces finding the optimal throughput of a DAG schedule to a bin-packing problem.
While the set of edges in a DAG has no effect on the potential throughput of the DAG, it
















Figure 2.2. Two possible schedules for the graphs with two processors
The schedules can be enforced as depicted in Table 1, Table 2 and Table 3,
respectively.
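The reduction to bin packing can be made concrete. Once edges are dropped, finding the smallest initiation interval (the circumference of the cylinder) for P identical processors amounts to asking whether the node execution times can be split into P groups whose sums are each at most that interval. The sketch below assumes integer execution times in cycles and that a node may wrap across the seam of the cylinder, so only the total load per processor matters. It uses first-fit decreasing, a standard bin-packing heuristic rather than an exact solver, so the interval it returns is an upper bound on the optimum. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <numeric>
#include <vector>

// First-fit decreasing: can the execution times be packed onto `processors`
// cylinders of circumference `interval`? A heuristic: it may reject packings
// that an exact bin-packing search would find.
bool fitsFFD(std::vector<int> times, int processors, int interval) {
    std::sort(times.begin(), times.end(), std::greater<int>());
    std::vector<int> load(processors, 0);
    for (int t : times) {
        bool placed = false;
        for (int& l : load) {
            if (l + t <= interval) { l += t; placed = true; break; }
        }
        if (!placed) return false;
    }
    return true;
}

// Smallest initiation interval accepted by the heuristic with edges ignored,
// starting from the resource lower bound max(ceil(sum / P), longest node).
int initiationInterval(const std::vector<int>& times, int processors) {
    assert(processors > 0 && !times.empty());
    int total = std::accumulate(times.begin(), times.end(), 0);
    int longest = *std::max_element(times.begin(), times.end());
    int ii = std::max((total + processors - 1) / processors, longest);
    while (!fitsFFD(times, processors, ii)) ++ii;  // terminates once ii >= total
    return ii;
}
```

For example, execution times {5, 4, 3} on two processors give a lower bound of 6, but no split of those times fits within 6, so the search settles on 7.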




























Consider Tables 1 and 2, both of which show an indexed version of schedules I and
II on DAG I. The corresponding schedules of one instance of DAG I derived from Tables 1 and
2 are shown in Figure 2.3a and Figure 2.3b, respectively. These are clearly different
schedules, yet they yield the same throughput.
The algorithms for assigning indices to the nodes in a schedule, and for synchronizing the
nodes so that execution remains faithful to the schedule, are the core of "RC scheduling". These
algorithms can be found in [SLZ 92].
The name "Revolving Cylinder" reflects a way of looking at the schedule by wrapping
the nodes around a cylinder, thereby causing its end to meet its beginning. For each node
in the original graph, starting at the top and working toward the bottom, attempt to schedule the
node at its earliest start time. If it cannot be inserted at that time, delay the start time by the
width of a slot and repeat until it can be inserted. Then update the earliest start time of all descendants
of that node and repeat the above sequence with the next node as the top node in the graph,
[LIT 91] and [SLZ 92].
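The insertion procedure just described can be sketched as follows. This is a minimal illustration, not the algorithm of [LIT 91] or [SLZ 92]: it assumes an initiation interval `ii` (the cylinder's circumference) chosen beforehand, a slot width of one time unit, nodes supplied in topological order, and processors tried in index order at each candidate start time. Where the text repeats "until it can be inserted", the sketch gives up after one full revolution, because slot occupancy repeats every `ii` slots. All names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One node of the (acyclic) application graph.
struct RcNode {
    int exec;               // execution time in slots
    std::vector<int> succ;  // indices of successor nodes
};

struct Placement {
    int start;      // start slot (may exceed ii; occupancy is taken modulo ii)
    int processor;  // processor index on the cylinder
};

// Nodes must be listed in topological order (top of the graph first).
// Returns false if some node finds no free position within one revolution.
bool revolveCylinder(const std::vector<RcNode>& nodes, int processors, int ii,
                     std::vector<Placement>& out) {
    assert(processors > 0 && ii > 0);
    // busy[p][s]: slot s (mod ii) of processor p is already occupied.
    std::vector<std::vector<bool>> busy(processors, std::vector<bool>(ii, false));
    std::vector<int> earliest(nodes.size(), 0);
    out.assign(nodes.size(), Placement{-1, -1});

    for (std::size_t v = 0; v < nodes.size(); ++v) {
        const int e = nodes[v].exec;
        if (e > ii) return false;  // a node longer than one revolution cannot fit
        bool placed = false;
        // Try the earliest start time first; on failure delay by one slot.
        for (int t = earliest[v]; t < earliest[v] + ii && !placed; ++t) {
            for (int p = 0; p < processors && !placed; ++p) {
                bool fits = true;
                for (int k = 0; k < e && fits; ++k) fits = !busy[p][(t + k) % ii];
                if (fits) {
                    for (int k = 0; k < e; ++k) busy[p][(t + k) % ii] = true;
                    out[v] = Placement{t, p};
                    placed = true;
                }
            }
        }
        if (!placed) return false;
        // Successors may not start before this node finishes; processing in
        // topological order propagates the bound to all descendants.
        for (int w : nodes[v].succ)
            earliest[w] = std::max(earliest[w], out[v].start + e);
    }
    return true;
}
```

With three two-slot nodes (one feeding the other two) on two processors, an interval of 4 packs perfectly, while an interval of 3 fails even though the total work (6 slots) matches the capacity, because the greedy placement leaves the free slots fragmented.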

























































(a) Schedule I (b) Schedule II
Figure 2.3. Execution cycles of schedule I and schedule II on DAG I
Figure 2.4 shows the revolving cylinder execution of DAG I. Each node of the
graph occupies a portion of the cylinder equal to its execution time, and with two
processors a new instantiation can be started every six cycles. Thus another instance of
the graph can overlap the first instance after six cycles. To prevent any conflict
in graph execution when instances overlap, nodes are assigned indices. For the







Figure 2.4. Execution of DAG I with Schedule I
B. IMPLEMENTATION OF RC
Once all nodes have been inserted into the cylinder, node indices based on their








Figure 2.5. Node indices of DAG I with schedule I.
Dependency arcs are created using these indices. The dependency arcs belonging
to DAG I according to schedule I are shown in Figure 2.6. Detailed explanation of node





i: initial tokens; j: threshold






Figure 2.6. A possible restructuring of DAG I
The labels on each dependency arc indicate the initial tokens, the threshold, and the
amount consumed by the sink node, respectively.
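A labelled dependency arc can be read as a token counter with the same firing rule as the data queues of Chapter III. The sketch below is one hypothetical reading: it assumes the source adds `produce` tokens each time it completes (Figure 2.6 does not give this amount) and that the consumption amount is positive. The struct and function names are illustrative.

```cpp
#include <cassert>

// Hypothetical model of one dependency (synchronization) arc labelled with
// initial tokens, threshold, and the amount the sink consumes per firing.
struct SyncArc {
    int tokens;     // starts at the initial-token label
    int threshold;  // sink may start only while tokens >= threshold
    int consume;    // tokens removed when the sink starts
    int produce;    // tokens added when the source completes (assumed)

    bool sinkEnabled() const { return tokens >= threshold; }
    void sourceCompleted() { tokens += produce; }
    void sinkStarted() {
        assert(sinkEnabled());
        tokens -= consume;
    }
};

// How many times the sink can start before the source must complete again.
int sinkRunAhead(SyncArc arc) {
    assert(arc.consume > 0);  // otherwise the loop below would never end
    int starts = 0;
    while (arc.sinkEnabled()) {
        arc.sinkStarted();
        ++starts;
    }
    return starts;
}
```

Under this reading, initial tokens let the sink start a bounded number of times ahead of its source, which is one way an arc can connect nodes carrying different instance indices without letting them drift arbitrarily far apart.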
C. POTENTIALS OF THE RC ANALYSIS
Restructuring the application with the RC technique adds dependencies that enforce the
node execution order, providing higher throughput and more predictable
response time. Without any extraneous control, FCFS scheduling cannot provide uniform
throughput. The nodes receiving external data are ready for execution independently of the
status of other nodes in the graph. If external data arrives more frequently than the
lower nodes of the graph can execute, the lower nodes fall behind the upper nodes of
the graph. As a result, the output queues of the upper nodes fill to capacity, preventing
those nodes from entering the ready node list, [POPS 90].
It is possible to enforce the execution order loosely. Enforcing the cylinder loosely
would make the system fully dynamic, with nodes scheduled at run time only. It is
preferable to run the system in fully dynamic mode; the RC technique and the subsequent
restructuring simply enhance the fully dynamic mode.
It is also possible to enforce the execution order strictly, which would make the system
fully static. For each node of the graph, a specific processor and an exact time to begin
processing can be obtained from the cylinder and enforced directly by the scheduler
to yield the fully static mode. Running the system in fully static mode, however, is
unwarranted: this mode of operation can be limited to machines that do not have a
dedicated run-time scheduler. The failure of a single processor would crash the whole system,
which is unacceptable in a real-time system. Also, since all processors are assumed to be
identical, it is unnecessary to assign a specific processor to a specific node. If a node
is ready for execution, it can be assigned to the first available processor. This reduces
the amount of time a node waits for a processor compared with the fully static case, and provides



III. SIMULATOR
A. PROGRAM MODEL
The input to the Periodic Input Processing DAta Flow Simulator (PIPDAFS) is a
directed graph. Nodes represent the computations to be performed on the input data.
The data-passing links between the nodes are FIFO queues. Each node reads data from its
input queues, performs a nontrivial computation, and writes the produced data to its output
queues. A node is assumed to carry no history. An example of a program graph which











Figure 3.1. Sample program graph (recoverable node parameters: E2 = 2000, P2 = 4; E3 = 10000, P3 = 2, Q3 = 2; E4 = 10000, P4 = 4, Q4 = 4)
The functionality of the nodes is not simulated in PIPDAFS; only their resource
requirements and computation times are. For each node and queue, the following quantities are
prespecified and fixed across different invocations of the graph.
Node Execution Time ($E_i$): the execution time of node i, excluding any
communication and synchronization overhead.
Data Production Amount ($P_i$): the amount of data that node i writes to each of its
output queues on every invocation.
Data Consumption Amount ($Q_i$): the amount of data that node i consumes from each of
its input queues on every read.
Data Threshold Amount ($R_{ij}$): the least amount of data that must be present in queue
(i,j) before node j can start execution.
Data Capacity Amount ($C_{ij}$): the maximum amount of data that queue (i,j) can
store.
When a source node finishes its execution, it writes its results to its output
queues, and the current length of each output queue is incremented by the production amount of the
source node. If the current length of a queue reaches the threshold amount, the queue is said to
be over_threshold. Queues cannot go overcapacity, since a source node that would cause an
overcapacity condition cannot become ready. Input queue lengths are updated as soon as a node
consumes its data from that queue: the current length is decreased by the data consumption
amount. Input data for a node are queued and consumed in FIFO order. A node's production
amount can differ from its consumption amount, as depicted in Figure 3.1. Queues communicate
not only data but also synchronization information.
$E_i$: execution time of node i
$P_i$: data production amount of node i
$Q_i$: data consumption amount of node i
$R_{ij}$: data threshold amount of the queue from node i to node j
$C_{ij}$: data capacity of the queue from node i to node j
$L_{ij}$: current length of the queue from node i to node j
Figure 3.2. A sample graph node with its associated queues.
Consider node N in Figure 3.2, whose input queues are numbered from 1 to n
and whose output queues are numbered from 1 to m. The conditions under which
a node is ready to be executed can be stated precisely as follows.
Queue Current Length ($L_{ij}$): the current length of queue (i,j).
The queue linking node i to node N is over_threshold when
$$L_{iN} \ge R_{iN}, \qquad i = 1, \dots, n$$
The queue linking node i to node N is overcapacity when
$$L_{iN} > C_{iN}, \qquad i = 1, \dots, n$$
When node setup is completed, each input queue of the node is consumed:
$$L_{iN} \leftarrow L_{iN} - Q_N, \qquad i = 1, \dots, n$$
When a node is executed, each output queue of the node is written:
$$L_{Nj} \leftarrow L_{Nj} + P_N, \qquad j = 1, \dots, m$$
A node is ready to execute when the following conditions are satisfied:
i.) $L_{iN} \ge R_{iN}$ for all $i = 1, \dots, n$ (all input queues are over_threshold);
ii.) $L_{Nj} + P_N \le C_{Nj}$ for all $j = 1, \dots, m$ (all output queues have enough space
to store the results);
iii.) no previous instance of node N is still executing (a node cannot have multiple instances
executing at a time.)
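The readiness conditions and queue updates above translate directly into code. This hypothetical C++ sketch mirrors the symbols $E_N$, $P_N$, $Q_N$, $R_{iN}$, $C_{Nj}$ and $L$; the type and function names are illustrative, condition iii.) is modelled with a simple `executing` flag, and the `>=` test for over_threshold follows the definition of the threshold as the least amount of data required.

```cpp
#include <cassert>
#include <vector>

// Queue (i,j): current length L_ij, threshold R_ij and capacity C_ij.
struct DataQueue {
    int length;
    int threshold;
    int capacity;
};

// Node N with execution time E_N, production P_N and consumption Q_N.
struct GraphNode {
    int exec = 0;
    int produce = 0;
    int consume = 0;
    std::vector<DataQueue*> inputs;   // queues (i,N), i = 1..n
    std::vector<DataQueue*> outputs;  // queues (N,j), j = 1..m
    bool executing = false;           // an instance of N is in progress
};

// Conditions i.)-iii.): inputs over_threshold, room in every output queue,
// and no other instance of N in progress.
bool isReady(const GraphNode& n) {
    if (n.executing) return false;
    for (const DataQueue* q : n.inputs)
        if (q->length < q->threshold) return false;
    for (const DataQueue* q : n.outputs)
        if (q->length + n.produce > q->capacity) return false;
    return true;
}

// Setup: L_iN <- L_iN - Q_N for every input queue.
void consumeInputs(GraphNode& n) {
    for (DataQueue* q : n.inputs) {
        assert(q->length >= n.consume);
        q->length -= n.consume;
    }
}

// Breakdown: L_Nj <- L_Nj + P_N for every output queue.
void produceOutputs(GraphNode& n) {
    for (DataQueue* q : n.outputs) q->length += n.produce;
}
```

For instance, with a threshold of 2 on the input queue and $P_N = 4$ against an output capacity of 4, the node becomes ready only once two items are queued and only while its output queue is empty.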
Input data arrival is periodic. In every period, the I/O processors execute the
I/O nodes. The I/O nodes read the raw data from external units such as sensors, format
it, and forward it to their output queues. If the data arrival rate is higher than the I/O node
execution rate, new data may overwrite old data, causing data
loss. It is the responsibility of the application programmer to ensure program correctness,
either by choosing a computation that is not sensitive to mild data loss or by choosing an
appropriate rate that can be met almost always by the different machine components.
B. MACHINE MODEL
The machine model we assume is based on three functional components: the
Scheduler (SCH), the Processors (GPs, IOPs), and the Global Memory Modules (GMMs). As data for
the nodes becomes available, each node must be scheduled to execute on a processor. The
machine model is shown in Figure 3.3, and its functional components are described
in detail below.
Figure 3.3. Machine model (recoverable diagram labels: GP, GMM, Processor is free, Queue overthreshold, Queue overcapacity)








1. Processors
a. Generic Processors (GPs)
Based on the schedule signal, a GP reads the primitive code and the
necessary queue data from the GMMs (setup stage), executes the designated
primitive, and writes the output queue data back to the GMMs (breakdown stage). A
maximum of three nodes can be associated with a GP at any one time:
i.) one node being set up,
ii.) one node executing,
iii.) another node being broken down.
It is important to note that if a node is being broken down, a node assigned to the GP
cannot start its setup (even though it has already been assigned to that GP).
Allowing setup and breakdown to overlap with execution is the main mechanism
for achieving computation-communication overlap.
A processor is said to be free, and available for reassignment by the scheduler, if its
setup stage is not busy. This concept of a free processor differs from
the classical definition: a processor can still be executing and
be considered free. The purpose is to allow maximum computation and
communication overlap.
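The stage rules for a GP can be captured in a small state model. In this hypothetical sketch a node id of -1 marks an empty stage. It encodes that a GP counts as free as soon as its setup stage is idle, that setup cannot begin during a breakdown, and that setup only hands its node on when execution and breakdown are idle (as in the Finish_setup event). The extra deadlock check that PIPDAFS performs in Finish_execution is deliberately omitted.

```cpp
#include <cassert>

// Hypothetical model of one Generic Processor (GP) with its three stages.
// Each stage holds at most one node id; -1 means the stage is empty.
struct GenericProcessor {
    int setupNode = -1;      // node reading code and input queues from the GMMs
    int executingNode = -1;  // node running its primitive
    int breakdownNode = -1;  // node writing output queues back to the GMMs

    // Free in the model's sense: only the setup stage must be idle.
    bool isFree() const { return setupNode < 0; }

    // Setup of an assigned node cannot begin while a breakdown is in progress.
    bool setupMayStart() const { return setupNode >= 0 && breakdownNode < 0; }

    void assign(int node) {
        assert(isFree());
        setupNode = node;
    }

    // Setup hands its node to execution once execution and breakdown are idle.
    bool finishSetup() {
        if (setupNode < 0 || executingNode >= 0 || breakdownNode >= 0) return false;
        executingNode = setupNode;
        setupNode = -1;
        return true;
    }

    // Execution hands its node to breakdown once breakdown is idle
    // (simplification: the simulator's deadlock check is omitted).
    bool finishExecution() {
        if (executingNode < 0 || breakdownNode >= 0) return false;
        breakdownNode = executingNode;
        executingNode = -1;
        return true;
    }

    void finishBreakdown() { breakdownNode = -1; }
};
```

The key design point shows up right after the first `finishSetup()`: the GP reports itself free while its first node is still executing, which is what lets the scheduler start the next node's setup early.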
When the data is ready for an internal node, SCH signals the GMM to send the
code to the assigned GP. Each GP has a local memory capable of storing the primitive
code and the input data. As soon as the GP acquires the code, it requests the input data from the GMMs.
When a GP finishes node setup, it informs SCH that it is free.
b. Input/Output Processors (IOPs)
Input nodes periodically receive data from the sensors. IOPs execute the input
nodes of the DFG by formatting the raw data and writing it to their output queues. Output nodes
are also executed on IOPs, which forward the ready data to the next stage in the system.
When the data is ready for input or output nodes, SCH signals an
IOP to execute the specified input or output node. IOPs signal the GMMs for writing and
reading.
2. Global Memory Modules (GMMs)
The GMMs provide the data storage for the machine. These GMMs differ
from a typical computer memory, since each is proactive (each has some sort of
processor) and operates independently. Each data queue is allocated to a single GMM for
storage. When a GP starts node setup, the input data queues are consumed. When a GP finishes
execution, all the output queues of the completed node are produced (written) to the
appropriate GMMs. After each produce and consume operation, the queues' current lengths are updated.
GMMs check for over_threshold or overcapacity conditions; when a GMM recognizes an
over_threshold condition, it informs the scheduler. When a node is assigned to a GP, SCH instructs
the GMM to write the primitive code to the GP's local memory. Upon receiving the primitive code,
the GP asks the appropriate GMMs for input data. A GMM communicates with only one processor at
a time; requests that arrive while a GMM is busy wait until it becomes
available.
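Because a GMM serves one processor at a time, a transfer requested while the module is busy starts only when the module frees up. PIPDAFS models this by re-entering events with new time stamps; the hypothetical sketch below instead computes the waiting time directly, which assumes that requests are granted in the order they arrive and that each transfer's duration is known when it is requested.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical timing model of one Global Memory Module: it serves one
// processor at a time, so a request arriving while it is busy waits.
class MemoryModule {
public:
    // Returns the completion time of a transfer of `duration` time units
    // requested at time `now` (requests must arrive in non-decreasing time).
    long request(long now, long duration) {
        assert(duration >= 0);
        long start = std::max(now, busyUntil_);  // wait for the module if busy
        busyUntil_ = start + duration;
        return busyUntil_;
    }

    bool busyAt(long t) const { return t < busyUntil_; }

private:
    long busyUntil_ = 0;  // time at which the current transfer completes
};
```

A request for 3 time units made at time 2, while an earlier 5-unit transfer started at time 0 is still in progress, completes at time 8 rather than 5.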
3. Scheduler
In this model, SCH works like a dispatcher and maintains two lists: one of
ready nodes and one of free processors. SCH
dispatches the ready nodes to the free processors. A node can be in one of three states:
i.) Ready node: waiting in the ready list to be assigned to a free
GP for execution.
ii.) Processing node: assigned to a processor and in one of the
setup, execution or breakdown stages.
iii.) Waiting node: waiting for its input data to be ready, for its output
queues to have enough space, or both.
A node becomes ready if all of its input queues are over_threshold and none of its output
queues will be overcapacity after the node executes. When a node becomes ready,
it is put into the ready node list, and SCH attempts to match a free processor to it.
When a processor finishes node setup and starts node execution, it is marked as a
free processor, and SCH checks the ready node list; if the list is not empty, SCH
attempts to match a ready node to the free processor.
We assume that SCH runs on a dedicated processor.¹
¹ Alternatively, the scheduler could be made to run on a GP with a high priority.
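The dispatcher described above reduces to pairing the heads of two lists whenever both are non-empty. This sketch assumes both lists are kept in FCFS order and ignores the scheduling time modelled by the Free_scheduler event; the class and method names are hypothetical.

```cpp
#include <deque>
#include <utility>
#include <vector>

// Hypothetical SCH dispatcher: one FCFS list of ready nodes, one of free GPs.
class Dispatcher {
public:
    using Assignment = std::pair<int, int>;  // (node id, processor id)

    // A node became ready; returns any assignments that are now possible.
    std::vector<Assignment> nodeReady(int node) {
        ready_.push_back(node);
        return match();
    }

    // A processor became free (its setup stage is idle).
    std::vector<Assignment> processorFree(int gp) {
        free_.push_back(gp);
        return match();
    }

private:
    // Pair the oldest ready node with the longest-waiting free processor.
    std::vector<Assignment> match() {
        std::vector<Assignment> made;
        while (!ready_.empty() && !free_.empty()) {
            made.emplace_back(ready_.front(), free_.front());
            ready_.pop_front();
            free_.pop_front();
        }
        return made;
    }

    std::deque<int> ready_;
    std::deque<int> free_;
};
```

Nothing in `match()` considers memory contention or how far an instance has progressed, which is exactly the lack of run-time control that the RC restructuring is meant to compensate for.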
C. SIMULATOR DESCRIPTION
PIPDAFS is an event-driven simulator: events are stored in a queue
prioritized by their time stamps. The simulation terminates when a certain number of
instances (prespecified by the user) has been executed. The event queue is first initialized
with events denoting the readiness of the input nodes; the rest of the events are produced
from these initial events. An instance is counted as started when one of the input nodes
belonging to that instance has started setup. An instance is counted as completed when
all of the output nodes belonging to that instance have finished breakdown. The events that
may occur during the simulation are shown in Figure 3.4, where the simulation starts
from the Reach_production_period event and terminates with the Finish_breakdown event. The events
are:
1) Reach_production_period: This event is produced periodically by the external
input data and, if the output queues of the input node have enough space, makes the input node
ready to execute. This event produces the Input_queues_overthreshold and the
Reach_production_period events.
2) Input_queues_overthreshold: This event indicates that all input queues of a node
are over_threshold; in other words, input data is available for node execution. While
processing this event, the output queues are checked for enough space to store the
results, unless the node is an output node. If the output queues have space or the node is an output
node, the Ready_node event is produced to indicate that the node is ready to be executed.
3) Ready_node: This event indicates that the node is ready to be executed and can be
put into the ready node list and scheduled to a free processor. If a previous instance of
the ready node is currently executing, then, depending on a policy defined by the user
before the simulation, the ready node is either put into the ready node list but not executed until
the previous instance has completed, or not put into the ready node list at all.
For each ready node added, the Schedule_a_node_from_ready_list event is produced.
Figure 3.4. Events that are produced during the simulation (recoverable diagram labels: "When queue is not the last queue to be read, read the next queue"; "When memory module is busy or processor is currently writing"; "When setup or breakdown stage is busy")
4) Schedule_a_node_from_ready_list: If neither the free processor list nor the
ready node list is empty, this event attempts to match a ready node to a free
processor. If the attempt is successful, it produces the Start_setup event.
5) Start_setup: This event is produced when a ready node is successfully matched to a free
processor. If the node is the first input node of a new instance, the event
marks the start time of the new instance.
6) Start_reading_instruction_stream: This event keeps trying to read the
instruction stream until the memory module where the instructions are stored is not busy.
It then marks the memory module as busy and produces the
Finish_reading_instruction_stream event.
7) Finish_reading_instruction_stream: This event marks the memory module as
not busy and, for each input queue of the node whose instruction stream was accessed, produces a
Start_read_queue event.
8) Start_read_queue: This event keeps checking the memory module and the assigned
processor until the memory module is not busy and the processor is not reading; then the
Finish_reading event is produced. Reading is simulated by a time delay.
9) Finish_reading: This event completes the reading of the queue. If it is the last
queue to be read, the Finish_setup event is produced.
10) Finish_setup: If the processor is not executing and the breakdown stage is not busy,
setup can be finished. When setup is finished, the Free_processor and Start_execution events
are produced. Otherwise, the Finish_setup event is reentered with a new time stamp.
//) Freeprocessor: This event indicates that processor is free and produces the
Schedulejajnodej'romjreadyjist event is produced.
12) Start_execution:T\\\s event produces Finish_execution event.
13) Finishjexecution: If breakdown and setup stages are not busy, then produces
the Startjbreakdown event else checks whether setup stage is waiting for execution to
finish or not. When setup stage is waiting for execution to finish, Finishjexecution event
is produced in order to prevent deadlock.If neither setup stage is waiting for execution stage
nor setup and breakdown stages were not ready then Finishjexecution event reentered
with a new time stamp.
14) Start_breakdown: This event produces a Start_write_queue event for each output queue and marks the assigned processor as not free, since setup and breakdown cannot happen concurrently in this model.
15) Start_write_queue: If the memory module to be written is not currently busy and the processor is not currently writing, the Finish_writing event is produced; otherwise Start_write_queue is reentered with a new time stamp.
16) Finish_writing: This event completes the writing to the queue and updates the output queue's current length. If it is the last queue to be written, it produces the Finish_breakdown event. If the queue length is over the threshold, it produces the Queue_overthreshold event.
17) Finish_breakdown: This event completes the breakdown. If the completed node is the last output node of its instance, the instance is considered finished and the instance finish time is written to a file named endtimes.
18) Queue_overthreshold: This event checks whether all input queues of the node are over threshold. If they are, the Input_queues_overthreshold event is produced.
19) Free_scheduler: This event marks the scheduler as not busy and produces the Schedule_a_node_from_ready_list event. The scheduling time can be defined by the user, which is useful in runs where scheduling overhead is the subject of study.
D. SIMULATOR PROGRAM DATA STRUCTURES
The simulator is written in C++. Its data structures and class dependencies are depicted in Figure 3.5. The simulator code is given in Appendix B.
E. USER INTERFACE
1. Input
To execute the simulator, the user invokes the executable and follows the on-screen prompts. The simulator reads the machine description from the file machine.config, the graph data from the file graph.dat, and the memory assignments from the file memory.dat. Sample graph data and a sample machine configuration file are given in Appendix A.
Besides the graph and machine files, the user is prompted for the following data:
- Instance Start Number: The instance number at which information gathering starts.
- Instance End Number: The instance number at which the simulation is completed.
- Instance Overlap: The amount of overlap between multiple instances of the same node. It can be none, which means that while one instance of a node is ready, another instance of the same node cannot be ready; or it can be overlapped, which means that multiple instances of the same node can be ready at the same time.
- Node Execution Priority: The priority used to schedule the nodes in the ready node list. The user chooses one of the following:
Shortest First (Execution time): Nodes in the ready node list are ordered by their execution times.
Longest First (Length): Nodes in the ready node list are ordered by their length.
Random: Node priorities are assigned randomly by the simulator.
User Defined: Node priorities are defined by the user.
No Priority (FCFS)
- Data Period: The user may want to repeat the simulation for different data periods without changing the other data. To spare the user from reinvoking the simulator, the simulator asks for an initial data period, a final data period, and an increment.
Start Period: The data period at which the simulation starts.
End Period: The data period at which the simulation finishes.
Interval: The increment applied between successive simulation runs.
2. Output
After the simulation is completed, the following data are produced:
- Processor Utilization (counting setup and breakdown as useful work): stored in the file utilization.dat.
- Processor Utilization (execution only): stored in the file execution.dat.
- Throughput: stored in the file throughput.dat.
- Response Time: stored in the file resptime.dat.
- Coefficient of Variation: the ratio of the standard deviation of the instance length to the average instance length, stored in the file Coefvar.dat.
Each file also contains the corresponding normalized data rate values.
If the user prefers to store the events occurring during the simulation, a log file can be produced. The log file records the time, event name, node number, queue number, memory module, and processor number of each event. A user interested in other statistics can obtain them by filtering the log file.
F. EXAMPLE RUNS OF THE SIMULATOR
The graph depicted in Figure 3.6 is executed with the simulator. The graph belongs to a correlator.


































Figure 3.6. Correlator graph
The results for the correlator graph are shown in Figure 3.7, Figure 3.8, and Figure 3.9. In these plots, the X-coordinate is the normalized data rate, the ratio of the input data rate to the maximum throughput rate; the maximum throughput rate corresponds to a data period equal to the total execution time divided by the number of processors. The Y-coordinate is throughput, utilization, or coefficient of variation. Throughput is the number of instances executed per unit time, which in this simulator is one second. Utilization is the fraction of time a processor is busy, obtained by dividing the processor busy time by the total simulation time. Two kinds of utilization are calculated for the processors: one counts setup and breakdown as useful work and the other does not. For each data rate, 500 instances were executed and statistical









[Figure 3.7. Correlator graph; x-axis: normalized data rate]


























Figure 3.8. Correlator graph, throughput






Figure 3.9. Correlator graph, coefficient of variation
IV. GRAPH RESTRUCTURING
This chapter describes in detail the graph restructuring that enforces RC scheduling. It also evaluates the potential of RC-based scheduling techniques through a series of experiments.
Figure 4.1. An example of graph restructuring.
Figure 4.1 describes the steps needed to restructure an application graph and make it ready for execution. The first step in the process is cylinder mapping, in which graph nodes are mapped to the cylinder according to a given objective. The circumference of the cylinder is determined by the sum of the nodes' execution times divided by the number of processors (an approximation that does not consider the communication delay). The mapping shown in Figure 4.1(b) assumes two processors. After the cylinder is mapped, the next step is index assignment. In this step, nodes are given relative indices that indicate which relative instance each node is executing. For example, in Figure 4.1(c), while node f is executing, node e can execute an instance one later than f's. Indices are determined by the nodes' locations on the cylinder; Figure 4.1(c) depicts the index assignment. After the indices are assigned, the next step is the creation of synchronization arcs that enforce the schedule given in Figure 4.1(c). Figure 4.1(d) shows the synchronization arcs. The synchronization arcs obtained are pruned by removing redundant arcs, and the resulting arcs are added to the original graph. The restructured graph with the added synchronization arcs is depicted in Figure 4.1(e). The program that restructures the graph is called the Graph Restructurer (GR), and its organization is represented in Figure 4.2.
1. In Figure 4.1(b), the mapping is based on fitting the graph using the nodes in a topologically sorted list.
















Figure 4.2. The structure of the graph restructurer.
The GR has three functions: cylinder mapping, index assignment, and synchronization arc creation.
A. CYLINDER MAPPING
The cylinder mapping component of the GR takes the original graph and the machine configuration as input. In [LIT 91] and [BELL 92], nodes are ordered for mapping according to their execution times, and the graph topology is not considered. This produces more dependency arcs, which increases the communication overhead. Figure 4.3 shows the dependency arcs for the same graph with a random mapping: this mapping produces 7 synchronization arcs, whereas the mapping that considers the graph topology produces only one. The mapping algorithm that considers the graph topology is given in Figure 4.4.
In this algorithm, the nodes are first sorted topologically; mapping then starts from the root node and continues down the sorted list. The heuristic maps a child node to an empty space in which it can start after its parent's finish time. If no such space can be found, the node is placed in the empty space with the earliest start (probably resulting in a synchronization arc). If no empty space can be found at all, the cylinder circumference is incremented by a percentage specified by the












Figure 4.3. Dependency arcs with random cylinder mapping.
procedure Map_the_cylinder(G, C)   /* G is a directed acyclic graph */
                                   /* C is the cylinder */
q <- topological_sort(G);          /* q is a queue */
boolean success = true;
for each node n in q, in topological order
    try to place n in an empty space that starts after the end of its latest-ending parent;
    if there is no such empty space
        try to place n in an empty space that starts before the end of its latest-ending
        parent but has enough room to place n after that parent ends;
        if there is no such empty space
            try to place n in the earliest available empty space;
            if there is no empty space
                success = false;






Figure 4.4. Algorithm for cylinder mapping.
B. INDEX ASSIGNMENT
The data flow execution of the graph requires analyzing the assignment of nodes to the cylinder to determine which instance of each node is actually executed. The algorithm for index assignment is given in [BELL 92].
In this algorithm, an arbitrary node is assigned an index of zero to represent the fact
that it is the current instance of this node that is being executed. Every parent and child of
this node is examined with respect to its relative position on the cylinder.
For a node N, the indices of its parent and child nodes are determined according to the following conditions:
1. If a parent's completion time on the cylinder is greater than node N's start time on the cylinder, then the parent must be executing a later instance. Therefore, the parent is given an index incremented by one.
2. If a parent's completion time on the cylinder is less than or equal to node N's start time on the cylinder, then the parent can be executing the same instance as node N. Therefore, the parent is given the same index as node N.
3. If a child's start time on the cylinder is less than node N's finish time on the cylinder, then the child must be executing a previous instance. Therefore, the child is given an index decremented by one.
4. If a child's start time on the cylinder is greater than or equal to node N's finish time on the cylinder, then the child can be executing the same instance as node N. Therefore, the child is given the same index as node N.
The original graph and the cylinder mapping are given to the algorithm as input, and the node index assignments are written to a file.
C. SYNCHRONIZATION ARCS CREATION
Once the cylinder assignment and the corresponding node indices are established, the synchronization arcs required for a given RC schedule can be determined.
A synchronization arc is a logical arc that enforces a given execution sequence. For example, in Figure 4.1(c), nodes b and c can be executed after node a's completion, node d after node b's completion, node e after node c's completion, node f after the completion of nodes d and e, and node a after node f's completion. The execution sequence of all nodes except node a is satisfied by the data dependencies. To control the execution of node a, we need an arc from node f that triggers node a after f completes its execution. This trigger is called a synchronization arc. Synchronization arcs can be implemented as input queues or as output queues: those implemented as input queues are called logical synchronization arcs, and those implemented as output queues are called physical synchronization arcs.
2. A value greater than one would be acceptable, although the parent's output queue size places an upper bound on that value.
3. A value less than one would be acceptable, although the child's input queue size places a lower bound on that value.
To understand what logical and physical synchronization arcs mean, consider Figure 4.1 again. After restructuring the original graph, we obtain a synchronization arc from node f to node a with two initial tokens. It prevents node a's third execution until node f finishes its first execution. This arc can be implemented as an input token queue that has an initial number of tokens, a threshold, and consumption and production amounts. During execution, these tokens are produced and consumed as the source and sink nodes execute.
Enforcing the RC schedule with logical arcs is called the Start After Finish (SAF) approach. As can be seen from Figure 4.1(e), node a cannot start its third execution until node f finishes its first execution.
The same synchronization arc can also be enforced by a physical arc, which can be implemented as an output token queue. In this approach, the execution sequence is controlled by the queue capacity instead of the queue threshold. Figure 4.5 depicts the physical synchronization arc and the logical synchronization arc that correspond to the same synchronization arc.




Enforcing the RC schedule with physical arcs is called the Start After Start (SAS) approach. As can be seen from Figure 4.5(b), node a cannot start its third execution until node f starts its first execution. The algorithms for the SAF and SAS approaches






are nodes of graph, G */
for all nodes, n_r
check index i of n_r





starts on the cylinder
check index j of n_s
/* if n_r starts at the top of the cylinder, the latest */
/* node ends at the bottom of the cylinder. */
/* In this case, j should be decremented by one */





put i - j initial tokens on the arc
set threshold = 1, consume = 1
else if i < j
put initial tokens on the arc
set threshold = j-i+1, consume = 1
end (for).







are nodes of graph, G */
for all nodes, n_r
check index i of n_r





starts on the cylinder
check index j of n_s
/* if n_r starts at the top of the cylinder, the latest */
/* node starts at the bottom of the cylinder. */
/* In this case, j should be decremented by one */





put i - j initial tokens on the arc
set threshold = 1, consume = 1
else if i < j
put initial tokens on the arc
set capacity = j-i+1, consume = 1, threshold = 1
end (for).
Figure 4.7. Algorithm to generate physical synchronization arcs.
D. PERFORMANCE EVALUATION
In our performance evaluation experiments, we use two sample graphs: one is used for active sonobuoys, and the other is an artificial test graph with a controlled topology. The benchmark graph consists of 50 nodes and 139 data queues and is depicted in Figure 4.8. The small test graph consists of 25 nodes and 27 data queues and is depicted in Figure 4.9.
In the experiments, in order to minimize the effect of communication overhead, the backward synchronization arcs from lower nodes to upper nodes are removed








[Figure 4.8: total execution time = 4181000 cycles; nodes 50 and 51 are I/O nodes]






All execution times are given in cycles.
Figure 4.9. An artificial test graph.
1. Basic Performance Measurements
In this experiment, we examine the performance of the SAS, SAF, and FCFS scheduling techniques running on different numbers of processors with an arbitrary queue-to-memory mapping. The experiment is repeated twice, assuming high and low communication overheads, which correspond to communication-computation ratios of 75 and 15 percent, respectively. To choose a data rate, FCFS scheduling is first simulated and the data rate that saturates the machine is chosen; the cylinder is then mapped at this data rate. The results demonstrate that FCFS has a high throughput in most cases.
Figure 4.10 and Figure 4.11 depict the throughput results for the benchmark graph and the artificial test graph, respectively. In Figure 4.10, SAF scheduling exhibits slightly







Figure 4.10. The throughput obtained from the benchmark graph (10 memory modules; 2, 5, and 11 processors; low communication: 15 percent).
Figure 4.11. The throughput obtained from the artificial test graph (10 memory modules; 2, 3, and 5 processors; low communication: 15 percent).
In Figure 4.11, for high communication overheads, SAS demonstrates better throughput in most cases, but for low communication overheads SAS and FCFS exhibit almost the same throughput.
Figure 4.12. The processor utilization obtained from the benchmark graph (10 memory modules; 2, 5, and 11 processors; low communication: 15 percent).
Figure 4.12 shows the processor utilization for the benchmark graph. The results demonstrate that FCFS scheduling has slightly better utilization than the SAS and SAF scheduling techniques in all cases.
Figure 4.13 depicts the coefficient of variation (COV) of the iteration length distribution. This measures the regularity of output production, which is of utmost importance in real-time systems. As the figure reveals, none of the scheduling techniques is conclusively better than the others with respect to COV.
Figure 4.13. The coefficient of variation obtained from the benchmark graph (10 memory modules; 2, 5, and 11 processors; FCFS, RC(SAS), and RC(SAF); low communication: 15 percent, high communication: 75 percent).
2. Evaluating the Effect of Memory Mapping
The second experiment is performed to see whether mapping the queues to memory modules so as to minimize access conflicts gives better results than a random mapping. The mapping is done manually by inspecting the RC schedule. The experiment assumes a 30 percent communication overhead. For the benchmark graph, 10 memory modules and 5 processors are used, and the results are depicted in Figure 4.14. For the artificial test graph, 10 memory modules and 3 processors are used, and the results are shown in Figure 4.15. As revealed by the figures, mapping the queues to memories according to the cylinder assignment gives better







[Figure 4.14: benchmark graph, 10 memory modules; random vs. adjusted memory mapping]













Figure 4.15. The effect of the memory mapping on scheduling techniques (test graph; 10 memory modules).
3. Effect of the Queue Size on the System Throughput
This experiment examines the effect of changing the data queue sizes on FCFS and compares it with the performance of SAS and SAF using the queue sizes dictated by the RC assignment. The experiment was performed on the artificial test graph with 4 processors and 10 memory modules. The data rate that saturates the machine under FCFS, assuming very large queue sizes, is chosen, and different queue sizes were tried. The results are shown in Figure 4.16. For the given data rate, the experiment







[Figure 4.16: artificial test graph; 10 memory modules; 4 processors; 30 percent communication; normalized data rate 0.90]










In this thesis, a simulator for Large Grain Data Flow machines (PIPDAFS) and a graph restructurer (GR), which determines the node dependencies according to a given strategy and restructures the original graph, were developed. A series of experiments was performed using GR and PIPDAFS, which together form a toolset for testing different scheduling strategies.
In our experiments, we compared FCFS scheduling with two other scheduling techniques in which the graphs were restructured and the nodes were executed according to their graph topologies. While in theory restructuring techniques should improve performance, the experimental results demonstrate that this improvement is meager. We believe the reason is that the mapping techniques employed in these experiments completely ignored the effects of communication, and that combining the restructuring techniques with a mapping that considers communication overhead would give results closer to the anticipated theoretical results.
B. FUTURE WORK
As the experimental results indicate, a new mapping algorithm that considers communication overhead should be designed.
Algorithms allowing complete communication-computation overlap and deterministic, nonconflicting resource usage should be identified.
Currently, the synchronization arcs obtained through restructuring are pruned manually. Work is needed on determining the minimal number of synchronization arcs required to satisfy the graph restructuring, as the number of synchronization arcs can potentially be a major factor in overall system performance.
In this research, a dynamic node-to-processor assignment technique was employed. Using a static processor assignment technique could be a step toward achieving more run-time determinism. This should be critically weighed against the utility of fully dynamic processor assignment techniques with respect to fault tolerance.





















































7 1 3000 -1 -1 1 1
13 1 5000 -1 -1 1 1




8 1024 25000 -1 -1
14 1024 5000 -1 -1
20 1024 10000 -1 -1
3 1024 15000 -1 -1
9 1024 10000 -1 -1
15 1024 20000 -1 -1
21 1024 25000 -1 -1
4 1024 50000 -1 -1
10 1024 25000 -1 -1
16 1024 150000 -1 -1
22 1024 20000 -1 -1
5 1024 10000 -1 -1
11 1024 20000 -1 -1
17 1024 40000 -1 -1
23 1024 80000 -1 -1
6 1024 20000 -1 -1
12 1024 5000 -1 -1
18 1024 10000 -1 -1
24 1024 15000 -1 -1





















































2 2 9 16384 16384 131072
3 3 4 4096 4096 32768
4 4 5 4096 4096 32768
5 5 6 4096 4096 32768
6 6 25 4096 4096 32768
7 7 8 16384 16384 131072
8 8 9 16384 16384 131072
9 9 10 4096 4096 32768
10 10 11 4096 4096 32768
11 11 12 4096 4096 32768
12 12 25 4096 4096 32768
13 13 14 16384 16384 131072
14 14 15 16384 16384 131072
15 15 10 4096 4096 32768
16 16 17 4096 4096 32768
17 17 18 4096 4096 32768
18 18 25 4096 4096 32768
19 19 20 16384 16384 131072
20 20 15 16384 16384 131072
21 21 22 4096 4096 32768
22 22 23 4096 4096 32768
23 23 24 4096 4096 32768
24 24 25 4096 4096 32768
25 20 21 2048 2048 16384
26 8 3 4096 4096 32384
27 9 16 2048 2048 16384
Sample Machine Configuration File




















APPENDIX B: Simulator Source Code
// Author : Cem Akin
// Advisor : Amr Zaky
// Description : LGDF machine simulator class header file
//Date: 12 November 1992











































// Author : Cem Akin
// Advisor : Amr Zaky
// Description : LGDF machine simulator class.
// Date : 12 November 1992




// Constructor of simulator class.
simulator::simulator(){
clock = 0;
sch = new scheduler;
ml = new mlist;




gqueuelist = new Qlist;
gnodelist = new nlist;




























































































drate = double (double( tottime / pnum)/dperiod);
cerr << "SIMULATION IS IN PROCESS PLEASE WAIT !!!!\n";













// This function processes the given event and enqueues the produced events in
// the event queue.
void simulator::process_event(event *e,fstream& startfile,
fstream& endfile,int instancenum){







intnode *tnode =new intnode;
int duration;
clock = e->starttime; //Advance the system clock to event starttime
switch (e->eventname){
//if input data period has been reached
case reach_production_period: {
tempnode=gnodelist->getnode(e->nodenum);
tempevent->eventname=inqsoverthreshold; //Produce event that
tempevent->starttime=clock; //indicates input queues
tempevent->priority =clock; //are overthreshold.
tempevent->nodenum = e->nodenum; //It is assumed that amount
eventqueue->enqueue(tempevent); //of data at each period is
//large enough to make input
//queues overthreshold











tempevent->eventname = ready_node;//If all node output queues
tempevent->starttime = clock; //have space or node is an
tempevent->priority = clock; //output node.then produce





tempnode->waitingforoutput = true;//Otherwise set the flag to




//if node is ready to be executed
case ready_node: {
tempnode = gnodelist->getnode(e->nodenum);
//If node can not be put into the ready list decrease the readamount
//which is previously incremented to prevent
if (onlyonecanbeready && (sch->member(e->nodenum)))
decreasereadamount(tempnode);
//If node is not already in rl then put node to rl and schedule a
//node from rl.








else{ // If more than one node can be ready at any time
//put node to rl. and schedule a node
sch->putnodeinrl(gnodelist->getnode(e->nodenum));
increasewriteamount(tempnode);

























if (startinstance >= inststart)
startfile << startinstance << " " << clock << endl;
if (startinstance == inststart)
interval = clock;
if ((tempnode->getsetuptime()>=0)){






tempnode->getsetuptime()=0){ // If breakdown is not busy
tempproc->setsetupbusytill(clock+tempnode->getsetuptime());
tempproc->setsetupbusy(true);













else{ // Otherwise start to read after setup finished
















if (!(tempproc->isbreakdownbusy())){ // If breakdown is not busy













else{ // Otherwise start to read after setup finished











//Start to read primitive instruction
case start_reading_instruction_stream: {




if (!(tempmemory->isbusy())){ // if memory is not busy
//Calculate reading delay and finish reading after this delay












else { //otherwise try to read after memory become available
tempproc->setsetupbusytill(tempmemory->getbusytime());










//finishes the reading of primitive instructions





//Produce read queue events for input queues
produce_readqueues(tempnode,e);
break;
// Starts to read specified input queue if processor is not already





if (tempqueue->qtype == syn_arc){




















































//Mark processor as not busy and look if there are waiting ready














case start_execution : {
tempnode=gnodelist->getnode(e->nodenum);
//Get the execution delay
duration=tempnode->getexectime();
tempproc=pl->getproc(e->assocproc);









//Finishes execution if breakdown and setup stages are not busy
















else {//Otherwise indicate that execution is waiting for setup
tempproc->waitingonsetup = true;





































//Starts writing to the specified GM if processor is not already













































































































tempproc->onlyexectime = tempproc->onlyexectime +
tempnode->getexectime();








else{// Execution stage is busy try to finish setup
//When execution is finished
tempproc->waitingonexec = true;


























//Increment current length of queue
tempqueue->inccurrentlength(tempqueue->getproductionqty());
//Decrement writeamount since queue is physically written
tempqueue->writeamount=tempqueue->writeamount-
tempqueue->getproductionqty();












//If produced data makes the queue overthreshold
if (tempqueue->isoverthreshold()){





















tempevent = new event;
tempnode=gnodelist->getnode(e->nodenum);
tempnode->processing = false;






if (finishinstance >= inststart)













// If queue is overthreshold check the other input queues












//Marks scheduler as not busy
case free_scheduler: {
sch->setbusy(false);
//Checks if there is a ready node which is waiting for










}; //end of processevent
// Produces read queue events for each input if it is not an input node,





//If it is an input queue, then after fixed amount delay finish setup






















//Produces write queue events for each output queue if node is not an
//output node




//If node is an output node finish breakdown after fixed amount of delay











//If node is an internal node produce write queue event for each output
//queue
for(l->current=l->head;l->current;l->current=l->current->nextnode){









//Consumes the input queues by decreasing the threshold amount















//If all output qs are ready and if source node is waiting













//If sink node still overthreshold and it is not an input node
//indicates that input queues are still overthreshold
if (areallqsot(n)&&(n->getnodetype()!=input)){
increasereadamount(n);






//Returns boolean if all input queues are overthreshold
boolean simulator::areallqsot(gnode *n){
QueueItem* q;







//Increases the read amount of input nodes. It is used to prevent, input queues




intlist* ql = n->getinputqueuelist();
for (ql->current=ql->head;ql->current;ql->current=ql->current->nextnode){
q = gqueuelist->getqueue(ql->current->number);
q->readamount = q->readamount +q->consumptionqty;
//Decreases the read amount of input queues
void simulator::decreasereadamount(gnode *n){
QueueItem* q;
intlist* ql = n->getinputqueuelist();
for (ql->current=ql->head;ql->current;ql->current=ql->current->nextnode) {
q = gqueuelist->getqueue(ql->current->number);
q->readamount = q->readamount -q->consumptionqty;
};
};
//Increases the write amount of output queues to prevent putting an extra ready
//node which will cause overcapacity
void simulator::increasewriteamount(gnode *n){
QueueItem* q;
intlist* ql = n->getoutputqueuelist();
for (ql->current=ql->head;ql->current;ql->current=ql->current->nextnode) {
q = gqueuelist->getqueue(ql->current->number);
q->writeamount = q->writeamount +q->productionqty;
};
};




intlist* ql = n->getoutputqueuelist();
for (ql->current=ql->head;ql->current;ql->current=ql->current->nextnode) {
q = gqueuelist->getqueue(ql->current->number);






//Return true if all output nodes of an instance is executed or not








//Initialize the output queues execution flag











//Prints the statistical information
void simulator::printstatistics(fstream& startfile,fstream& endfile,













cout << "*********************************************************\n";
cout << "*********************************************************\n";
cout << "** **\n";
cout << "** STATISTICS **\n";
cout << "** **\n";
cout << "** **\n";
cout << "*********************************************************\n";
cout << "*********************************************************\n";
cout << endl;
cout << "PROCESSOR ID : PROCESSOR TYPE : PROCESSOR UTILIZATION(with comm.)\n";
for (pl->current=pl->head;pl->current;pl->current=pl->current->nextproc) {
cout << " " << pl->current->p->procid << " ";
cout << pl->current->p->ptype << " ";
cout << double((pl->current->p->durationprocbusy))/double(clock) << endl;
if(pl->current->p->gettype()!=io){
averageexectime = averageexectime +
double((pl->current->p->onlyexectime))/double(clock);
































difference = end - start;
sum = sum + difference;
squaresum = squaresum + (difference * difference);
};
stime=end - stime;
average = sum / double(instnum);
cout << "DATA RATE :" << drate << endl;




cout << "AVERAGE THROUGHPUT :" << ((instnum-1)*1000000)/stime << endl;
cout << "SIMULATION TIME :" << clock << endl;
cout << "INSTANCE LENGTH STANDARD DEVIATION :" << sqrt(variance) << endl;
cout << "COMMUNICATION OF ONE INSTANCE :" << communication;
cout << endl;
cout << "COMPUTATION OF ONE INSTANCE :";
cout << tottime << endl;
cout << "COMMUNICATION / COMPUTATION RATIO :";
cout << communication / tottime;
cout << endl;
cout << endl;
aputil << drate << " " << averagebusytime/double(cnt) << endl;
onlyexec << drate << " " << averageexectime/double(cnt) << endl;
insden << drate << " " << average << endl;
thrput << drate << " " << ((instnum-1)*1000000)/stime << endl;









s = new simulator;
s->simulate();
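printstatistics accumulates the sum and the sum of squares of the instance lengths (end - start) and reports their average, their standard deviation and a throughput of ((instnum-1)*1000000)/stime. The sketch below reproduces that arithmetic in isolation. It assumes times are in microseconds (so the factor 1000000 yields instances per second), that the throughput window runs from the first to the last instance completion, and that the variance, whose computation is not visible in the listing, is the population variance. `InstanceStats` and `computestats` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical summary of instance lengths.
struct InstanceStats {
    double average;      // mean instance length
    double stddev;       // population standard deviation of instance length
    double throughput;   // instances per second, assuming microsecond times
};

// start[i] and end[i] are the start and finish times of instance i; the
// listing reads these from startfile and endfile.
InstanceStats computestats(const std::vector<double>& start,
                           const std::vector<double>& end) {
    assert(start.size() == end.size() && start.size() >= 2);
    const double n = static_cast<double>(start.size());
    double sum = 0.0, squaresum = 0.0;
    for (std::size_t i = 0; i < start.size(); ++i) {
        double difference = end[i] - start[i];
        sum += difference;
        squaresum += difference * difference;
    }
    InstanceStats s;
    s.average = sum / n;
    // Population variance from the running sums: E[x^2] - (E[x])^2,
    // clamped at zero to absorb rounding error.
    double variance = squaresum / n - s.average * s.average;
    s.stddev = std::sqrt(variance > 0.0 ? variance : 0.0);
    // n completions span n - 1 intervals between the first and last one.
    double window = end.back() - end.front();
    s.throughput = ((n - 1.0) * 1000000.0) / window;
    return s;
}
```

Keeping only the running sum and sum of squares, as the listing does, avoids storing every instance length, at the cost of some numerical sensitivity when the lengths are large and nearly equal.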
// Author : Cem Akin
// Advisor : Amr Zaky
// Description : Event Class header file.
// Date : 12 November 1992





















// Author : Cem Akin
// Advisor : Amr Zaky
// Description : Event Class source file.
// Date : 12 November 1992



















//Prints the event to the log file
void event::printevent(fstream & logfile){
logfile << eventname << " " << starttime << " " << priority << " ";
logfile << nodenum << " " << queuenum << " " << assocproc << " ";
logfile << assocmem << endl;
// Author : Cem Akin
// Advisor : Amr Zaky
// Description : GNODE Class (Graph node).
// Date : 12 November 1992







































































// Author : Cem Akin
// Advisor : Amr Zaky
// Description : GNODE Class source code.
//Date : 12 November 1992







































//Loads a single graph node from given stream

















































































































inqlist = new intlist;
outqlist = new intlist;
return extime;
};
//Prints a graph node
ostream& operator<<(ostream& os,gnode& g) {
os << " ID :" << g.nodeid << "node type:" << g.ntype << endl;
os << "AIS :" << g.aissize << " Exe time :" << g.exectime << endl;
os << "setup time" << g.setuptime << endl;
os << "Memory : " << g.memid << endl;
os << *g.inqlist << endl;
os << *g.outqlist << endl;
/* os << " priority :" << g.priority << endl;
os << "proctype :" << g.protype << " altproctype " << g.altproctype << endl;*/
return os;
























//Returns processor type that node can run on
proctype gnode::getproctype(){
return protype;





























































































// Author : Cem Akin
// Advisor : Amr Zaky
// Description : GQUEUE Class Header file(Graph Queue).
// Date : 12 November 1992



















//Loads one queue from given stream




























if (currentlength >= thresholdqty)
overthreshold = true;
gmid = gid;







ostream& operator<<(ostream& os,QueueItem& q) {
os << "ID :" << q.queueid << "NODE IN " << q.nodein << "NODEOUT " << q.nodeout << endl;
os << "MEMORY :" << q.gmid << endl;
os << "THRES : " << q.thresholdqty << endl;
os << "CAPAC : " << q.capacity << "DATARATE " << endl;
os << "LENGTH " << q.currentlength << "overthreshold :" << q.overthreshold << endl;
return os;
};
//Increments the current length of the queue by the production quantity
void QueueItem::inccurrentlength(int t){
currentlength=currentlength+t;
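The queue code above grows currentlength by the production quantity and, when a queue is loaded, sets overthreshold once currentlength reaches thresholdqty. The sketch below shows one plausible way such a flag feeds a readiness test. The rule it uses (every input queue over threshold, every output queue able to absorb one more production) is an assumption, not something the listing states; `ThresholdQueue`, `produce` and `nodeready` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for QueueItem holding only what this sketch needs.
struct ThresholdQueue {
    int currentlength;   // tokens currently held in the queue
    int thresholdqty;    // tokens needed before the consumer may fire
    int capacity;        // maximum number of tokens the queue may hold
    int productionqty;   // tokens added by one firing of the producer
    bool overthreshold;  // same role as the listing's overthreshold flag
};

// Like inccurrentlength, followed by the threshold test used when loading.
void produce(ThresholdQueue& q) {
    assert(q.productionqty >= 0);
    q.currentlength += q.productionqty;
    q.overthreshold = q.currentlength >= q.thresholdqty;
}

// Assumed readiness rule: every input queue is over threshold and every
// output queue can take one more production without exceeding capacity.
bool nodeready(const std::vector<ThresholdQueue>& inputs,
               const std::vector<ThresholdQueue>& outputs) {
    for (const ThresholdQueue& q : inputs)
        if (!q.overthreshold) return false;
    for (const ThresholdQueue& q : outputs)
        if (q.currentlength + q.productionqty > q.capacity) return false;
    return true;
}
```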




















// Author : Cem Akin
// Advisor : Amr Zaky
// Description : NLIST CLASS header file(Node List).
// Date : 12 November 1992


























// Author : Cem Akin
// Advisor : Amr Zaky
// Description : NLIST CLASS source code.
// Date : 12 November 1992














int nlist::loadnodes(fstream& grphfile,fstream& mem,int choice,int nummem){
int numnodes = 0;




for (int loop = 1;loop <= numnodes;loop++){
mem >> gid;
gid = gid % nummem+1;
tempn= new nitem;
tempn->element = new gnode;
if (head == NULL){
exectime=exectime+tempn->element->loadnode(grphfile,choice,gid);








//Get the node whose id number is given





cerr << nid << "***ERROR undefined node id\n";
};
//Prints a node
ostream& operator<<(ostream& os,nlist& n){
if(n.head==NULL)





















q = new pnqueue;
for (current=head;current;current=current->nextitem) {
nodes[count][1] = current->element->nodeid;





for (loop =0;loop<= count;loop++)
if (in_degrees[loop] == 0){
tempnode = new intnode;



















tempnode = new intnode;








g_label = g_label +1;
for(current=head;current;current=current->nextitem)
cout << current->element->nodeid << "--" << current->element->order << endl;
return g_label - 1;
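The fragment above labels graph nodes by working through an in-degree table (in_degrees) and a queue of nodes whose in-degree is zero, which is the pattern of Kahn's topological sort. The sketch below shows that technique on its own with std::queue. It assumes node ids 0..numnodes-1 and one label per node; whether the thesis code gives one label per node or one per level cannot be told from the visible lines. `topologicallabel` is a hypothetical name.

```cpp
#include <cassert>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical helper: Kahn's algorithm. Repeatedly take a node whose
// in-degree is zero, give it the next label, and remove its outgoing arcs.
// Returns the number of labels used, which equals numnodes only if the
// graph is acyclic; unlabeled nodes keep order -1.
int topologicallabel(int numnodes,
                     const std::vector<std::pair<int, int>>& arcs,
                     std::vector<int>& order) {
    std::vector<std::vector<int>> succ(numnodes);
    std::vector<int> in_degrees(numnodes, 0);
    for (const auto& a : arcs) {
        assert(a.first >= 0 && a.first < numnodes);
        assert(a.second >= 0 && a.second < numnodes);
        succ[a.first].push_back(a.second);
        ++in_degrees[a.second];
    }
    std::queue<int> ready;                  // nodes with in-degree zero
    for (int v = 0; v < numnodes; ++v)
        if (in_degrees[v] == 0) ready.push(v);
    order.assign(numnodes, -1);
    int g_label = 0;
    while (!ready.empty()) {
        int v = ready.front();
        ready.pop();
        order[v] = g_label++;
        for (int w : succ[v])
            if (--in_degrees[w] == 0) ready.push(w);
    }
    return g_label;
}
```

Checking the returned count against the number of nodes is a cheap way to detect a cycle, which a periodic data flow graph would need to break with initial tokens before such an ordering applies.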
// Author : Cem Akin
// Advisor : Amr Zaky
// Description : QLIST CLASS header file(Graph Queue List).
// Date : 12 November 1992























// Author : Cem Akin
// Advisor : Amr Zaky
// Description : QLIST CLASS source file.
// Date : 12 November 1992













//Loads graph queues from file namely simdata
void Qlist::loadqueues(fstream &grphfile,fstream &mem,nlist* nolist,int nummem){




for (int loop = 1;loop <= numqueues; loop++) {
mem >> gid;
gid = gid%nummem+1;
tempq = new qitem;
tempq->element = new QueueItem;
if (head == NULL) {
tempq->element->loadqueue(grphfile,gid,nolist);







//Returns the graph queue whose id is given
QueueItem* Qlist::getqueue(int quid){
for (current=head;current;current=current->nextitem)
if (current->element->queueid == quid)
return current->element;
if (current == NULL)
cerr << "****ERROR NO SUCH QUEUE\n";
return NULL;
};
//Prints the graph queue list
ostream& operator<<(ostream& os,Qlist& q){
if (q.head == NULL)







// Author : Cem Akin
// Advisor : Amr Zaky
// Description : MEMORY Class Header file.
//Date : 12 November 1992





















// Author : Cem Akin
// Advisor : Amr Zaky
// Description : LGDF machine simulator class.
// Date : 12 November 1992













//Updates the busy time if it is in the future
void memory::setbusytill(double t){
busytill = t;
};
//Set the given boolean value
void memory::setbusy(boolean value) {
busy = value;
};








//Returns the time that memory will stay busy
int memory::getbusytime(){
return busytill;
};
//Sets the given number as object id
void memory::setobjectid(int t){
memid = t;
};
// Author : Cem Akin
// Advisor : Amr Zaky
// Description : MLIST CLASS (Memory list).
// Date : 12 November 1992

















// Author : Cem Akin
// Advisor : Amr Zaky
// Description : MLIST CLASS source code.
// Date : 12 November 1992


















//Adds given memory to the list
void mlist::addtolist(memory * mem){
if (head == NULL) {











//Returns the memory whose id is given



























// Author : Cem Akin
// Advisor : Amr Zaky
// Description : PROCESSOR CLASS header file.
//Date : 12 November 1992


















































// Author : Cem Akin
// Advisor : Amr Zaky
// Description : PROCESSOR CLASS source file.
// Date : 12 November 1992


























//Updates the busy time of the setup stage
void processor::setsetupbusytill(double t){
if (setupbusytill < t)
setupbusytill = t;
};
//Updates the busy time of execution stage
void processor::setexecbusytill(double t){
if (execbusytill < t)
execbusytill = t;
};
//Updates the breakdown stage busy time
void processor::setbreakdownbusytill(double t){
if (breakdownbusytill < t)
breakdownbusytill = t;
};
//Sets setup busy flag to the given boolean value
void processor::setsetupbusy(boolean value) {
setupbusy = value;
};
//Sets execution stage flag to the given boolean value
void processor::setexecbusy(boolean value){
execbusy = value;
};
//Sets breakdown busy flag to the given boolean value
void processor::setbreakdownbusy(boolean value) {
breakdownbusy = value;
};
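Each processor above keeps a separate busy-till time for its setup, execution and breakdown stages, and each setter only moves that time forward. The sketch below shows how such per-stage times let one node's setup overlap another node's execution. The pipelined three-stage timing rule is an assumption made for illustration, not a rule the listing states; `PipelinedProcessor` and `run` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical processor with three pipelined stages. As with the listing's
// set...busytill setters, each busy-till time only moves forward.
struct PipelinedProcessor {
    double setupbusytill = 0.0;
    double execbusytill = 0.0;
    double breakdownbusytill = 0.0;

    // Runs one node that becomes ready at time t. Each stage starts when this
    // node's previous stage is done and the stage itself is free. Returns the
    // time the node's breakdown stage finishes.
    double run(double t, double setuptime, double exectime, double breakdowntime) {
        assert(setuptime >= 0 && exectime >= 0 && breakdowntime >= 0);
        double setupend = std::max(t, setupbusytill) + setuptime;
        setupbusytill = std::max(setupbusytill, setupend);
        double execend = std::max(setupend, execbusytill) + exectime;
        execbusytill = std::max(execbusytill, execend);
        double breakdownend = std::max(execend, breakdownbusytill) + breakdowntime;
        breakdownbusytill = std::max(breakdownbusytill, breakdownend);
        return breakdownend;
    }
};
```

With two identical nodes (setup 1, execution 5, breakdown 1) arriving at time 0, the second node's setup runs during the first node's execution, so the pair finishes at 12 rather than 14.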
//Sets the processor free flag to the given boolean value
void processor::setfree(boolean value) {
free = value;




//Returns true if setup stage is busy
boolean processor::issetupbusy(){
return setupbusy;
};
//Returns true if the execution stage is busy
boolean processor::isexecbusy(){
return execbusy;
};
//Returns true if breakdown stage is busy
boolean processor::isbreakdownbusy() {
return breakdownbusy;




//Returns time that setup stage will stay busy
double processor::getsetupbusytime() {
return setupbusytill;
};
//Returns time that execution stage will stay busy
double processor::getexecbusytime(){
return execbusytill;
};
//Returns time that breakdown stage will stay busy






//Sets the object id to the given value
void processor::setobjectid(int t){
procid = t;
};
//Sets the processor type according to given integer value
void processor::setproctype(int t){
switch (t){
case 0:{





































// Author : Cem Akin
// Advisor : Amr Zaky
// Description : PLIST CLASS header file(Processor list).
// Date : 12 November 1992






















// Author : Cem Akin
// Advisor : Amr Zaky
// Description : PLIST CLASS source file.
// Date : 12 November 1992















//Adds given processor to processor list
void plist::addtolist(processor * proc){
if (head == NULL) {





for (current=head;current->nextproc ;current=current->nextproc )
;










if (current == NULL) {
cout «"***ERROR undefined processor id\n";
//Loads processor from given stream
int plist::loadprocessors(fstream& grphfile,int numprocs){
int prtype;
int proccnt = 0;
procnode* temp;
for (int loop=1;loop <= numprocs;loop++) {
















// Author : Cem Akin
// Advisor : Amr Zaky
// Description : SCHEDULER CLASS header file.
// Date : 12 November 1992

































// Author : Cem Akin
// Advisor : Amr Zaky
// Description : SCHEDULER CLASS source code.
// Date : 12 November 1992











readynodelist = new pnqueue;
freeproclist = new pnqueue;
//Destructor




//Puts the given graph node to the ready node list
void scheduler::putnodeinrl(gnode* n){
intnode* tempnode=new intnode;











//Sets scheduler busy flag to the given boolean value
void scheduler::setbusy(boolean value) {
busy = value;
};
//Returns true if the scheduler is busy
boolean scheduler::isbusy(){
return busy;













//Updates the scheduler busy time to the given value
void scheduler::setbusytill(int t){
busytill=t;
};
//Returns true if the ready node list is empty
boolean scheduler::emptyrl(){





//Schedules a node from the ready list by matching it to a free processor and
//produces a start-setup event. It first tries to find a node that is not
//currently executing.
void scheduler::schedule_node(nlist* nl,plist *pl,pqueue* eventqueue,double clck){














































cout << "PROCESSOR ID :" << tempevent->assocproc;





































//Returns true if the given node is already in the ready node list
boolean scheduler::member(int num){














cout << readynodelist->current->nodeitem->nid << " ";
cout << ">>>>>>\n";
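The scheduler's schedule_node is described as matching a ready node to a free processor, producing a start-setup event, and first trying nodes that are not currently executing; gnode::getproctype supplies the processor type a node can run on. The sketch below shows that matching step with standard containers in place of pnqueue and plist. The two-pass search, the exact type match and the event fields are assumptions for illustration; `ReadyNode`, `FreeProc`, `StartSetupEvent` and `schedulenode` are hypothetical names.

```cpp
#include <cassert>
#include <deque>
#include <vector>

// Hypothetical, simplified records for the matching step.
struct ReadyNode { int nodeid; int proctype; bool executing; };
struct FreeProc  { int procid; int ptype; };
struct StartSetupEvent { int nodeid; int procid; double starttime; };

// Pass 0 considers only nodes that are not currently executing; pass 1
// accepts any ready node. The first node that has a free processor of its
// type is paired with it, both are removed from their lists, and a
// start-setup event at time clck is returned through ev.
bool schedulenode(std::deque<ReadyNode>& readylist,
                  std::vector<FreeProc>& freeprocs,
                  double clck, StartSetupEvent& ev) {
    for (int pass = 0; pass < 2; ++pass) {
        for (auto n = readylist.begin(); n != readylist.end(); ++n) {
            if (pass == 0 && n->executing) continue;
            for (auto p = freeprocs.begin(); p != freeprocs.end(); ++p) {
                if (p->ptype != n->proctype) continue;
                ev = StartSetupEvent{n->nodeid, p->procid, clck};
                readylist.erase(n);
                freeprocs.erase(p);
                return true;
            }
        }
    }
    return false;   // no ready node matches any free processor
}
```

Preferring idle nodes spreads work across graph nodes instead of letting one node's successive instances occupy several processors while others wait.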











// Author : Cem Akin
// Advisor : Amr Zaky
// Description : PNQUEUE CLASS header file (Priority queue for integer nodes).
// Date : 12 November 1992




















// Author : Cem Akin
// Advisor : Amr Zaky
// Description : PNQUEUE CLASS source file.
// Date : 12 November 1992













//Enqueues the given integer node to queue































//Returns integer node from queue
intnode* pnqueue::dequeue(){
item* temp;
temp= head;






// Author : Cem Akin
// Advisor : Amr Zaky
// Description : PQUEUE CLASS header file(Priority queue for events).
// Date : 12 November 1992




















// Author : Cem Akin
// Advisor : Amr Zaky
// Description : PQUEUE CLASS source file.
// Date : 12 November 1992














//Enqueues the given event according to its priority





























//Dequeues the event from the top of the queue

















for (current = head;current;current = current->nextitem)




// Author : Cem Akin
// Advisor : Amr Zaky
// Description : NODE CLASS header file.
// Date : 12 November 1992

















// Author : Cem Akin
// Advisor : Amr Zaky
// Description : NODE CLASS source file.
// Date : 12 November 1992












//Sets the node's id and priority




ostream& operator<<(ostream& os, intnode& e){
os << "Event ID: " << e.nid;
os << endl;





// Author : Cem Akin
// Advisor : Amr Zaky
// Description : Integer List Class header file.
//Date : 12 November 1992





















// Author : Cem Akin
// Advisor : Amr Zaky
// Description : Integer List Class source file.
//Date : 12 November 1992























//Adds the given integer to the list
void intlist::addtolist(int num){
if(head==NULL){









//Prints the integer list
ostream& operator<<(ostream& os,intlist& ilist) {
os << '<' << " ";
for (ilist.current = ilist.head;ilist.current;
ilist.current = ilist.current->nextnode)
os << ilist.current->number << " ";




APPENDIX C: Graph Restructure Source Code
// Author : Cem Akin
// Advisor : Amr Zaky
// Description : CNODE CLASS header file.
// Date : 10 March 1993
// Last Revised : 13 March 1993
class cnode{//This class is used to represent a space that is
















// Author : Cem Akin
// Advisor : Amr Zaky
// Description : CYLINDER CLASS header file.
// Date : 10 March 1993

























// Author : Cem Akin
// Advisor : Amr Zaky
// Description : Cylinder Class Source file.
//Date : 10 March 1993






int ps = 0;
int circum = 0;
gnodelist = new nlist;
sortedlist=new nlist;



























cylin[c].slice = new nodelist;
cylin[c].el = new nodelist(circum);
};
};








//Finds the parent of the given node that finishes last and returns it














//Maps the cylinder according to its topology








while (!flag) {
};
cout << "I COULD NOT FIND ANY SOLUTION WITH ";
cout << circum << "\n";
cout << "I NEED A LARGER CIRCUMFERENCE. WILL YOU GIVE ME A\n";
cout << "PERCENTAGE BY WHICH I CAN INCREASE THE SIZE OF THE CYLINDERS";
cout << "PERCENT : ";
cin >> percent;
cout << "THANK YOU. NOW I AM TRYING TO FIND A SOLUTION\n";
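When the mapping fails, the program above asks the user for a percentage by which to enlarge the cylinder circumference and tries again. The listing does not show how the percentage is applied, so the sketch below assumes the new circumference is circum plus circum*percent/100, rounded up so that any positive percentage enlarges the cylinder by at least one time unit. `enlargecircumference` is a hypothetical name.

```cpp
#include <cassert>

// Hypothetical helper: enlarges a circumference by a percentage, rounding the
// increase up (integer ceiling of circum * percent / 100).
int enlargecircumference(int circum, int percent) {
    assert(circum > 0 && percent >= 0);
    int increase = (circum * percent + 99) / 100;
    return circum + increase;
}
```

Rounding up matters for small cylinders: without it, a 10 percent increase on a circumference of 7 would truncate to zero and the retry would repeat the failed attempt.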


































boolean found1 = false;
boolean found2 = false;
boolean found3 = false;
int minstart1 = circum;
int minstart2 = circum;










for (gnodelist->current = gnodelist->head;
gnodelist->current;














































tempcnode = new cnode;
tempcnode->id = gnodelist->current->element->nodeid;
tempcnode->start = t1;







tempspace = new cnode;
tempspace->id = ++count;





if (tempcnode->finish < t2){









cout << cylin[s1].el->current->element->start << " ";
cout << cylin[s1].el->current->element->finish << endl;









tempcnode = new cnode;
tempcnode->id = gnodelist->current->element->nodeid;
tempcnode->start = t;





cout << t << " " << t5 << " " << t6 << endl;
cylin[s3].el->remove(t5);
if(t5<t){







if (tempcnode->finish < t6){
tempspace = new cnode;
tempspace->id = ++count;













tempcnode = new cnode;
tempcnode->id = gnodelist->current->element->nodeid;
tempcnode->start = t3;




cout << t << " " << t3 << " " << t4 << endl;
cylin[s2].slice->insert(tempcnode);
cylin[s2].el->remove(t3);
if (tempcnode->finish < t4){


































cylinder *c = new cylinder;













// Author : Cem Akin
// Advisor : Amr Zaky
// Description : ARCNODE CLASS header file.
// Date : 12 November 1992



















// Author : Cem Akin
// Advisor : Amr Zaky
// Description : ARCNODE CLASS header file.
// Date : 12 November 1992





















// Author : Cem Akin
// Advisor : Amr Zaky
// Description : ARCLIST CLASS source file.
//Date : 12 November 1992






















tempn = new sitem;
tempn->element=an;
if(head==NULL)








// Author : Cem Akin
// Advisor : Amr Zaky
// Description : CYLINDER CLASS header file. This is used for index assignment and dependency
// arc creation.
// Date : 12 November 1992




























// Author : Cem Akin
// Advisor : Amr Zaky
// Description : CYLINDER CLASS source file. This is used for index assignment and dependency
// arc creation.
// Date : 12 November 1992








































































































































arclist* arclst;
fstream arcdata,datafile;
arcdata.open("tokens",ios::out);
arclst = new arclist;










































datafile << gnum << " ";
datafile << 0 << " ";
datafile << arclst->current->element->sourcenodeid << " ";
datafile << arclst->current->element->sinknodeid << " ";
datafile << arclst->current->element->threshold << " ";
datafile << arclst->current->element->initial_length << " ";
datafile << arclst->current->element->consumption << " ";
datafile << 100 << endl;
gnum++;
arcdata << arclst->current->element->sourcenodeid << " ";
arcdata << arclst->current->element->sinknodeid << " ";
arcdata << arclst->current->element->initial_length << " ";













arclist* arclst;
fstream arcdata,datafile;
arcdata.open("tokens",ios::out);
arclst = new arclist;

















































datafile << gnum << " ";
datafile << 0 << " ";
datafile << arclst->current->element->sourcenodeid << " ";
datafile << arclst->current->element->sinknodeid << " ";
datafile << arclst->current->element->threshold << " ";
datafile << arclst->current->element->initial_length << " ";
datafile << arclst->current->element->consumption << " ";
datafile << arclst->current->element->capacity << endl;
gnum++;
arcdata << arclst->current->element->sourcenodeid << " ";
arcdata << arclst->current->element->sinknodeid << " ";
arcdata << arclst->current->element->initial_length << " ";
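Each arc is written to datafile as one queue record (queue number, a constant 0, source node, sink node, threshold, initial length, consumption, capacity) and to the "tokens" file as source, sink and initial length; the earlier variant writes a fixed 100 in place of the capacity. The sketch below formats the same two lines with std::ostringstream so the field order can be checked on its own. The listing does not say what the constant 0 column means, and `ArcRecord`, `queueline` and `tokenline` are hypothetical names.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical arc record; field names follow the listing's arc members.
struct ArcRecord {
    int sourcenodeid, sinknodeid, threshold, initial_length, consumption, capacity;
};

// One datafile line, in the field order written above.
std::string queueline(int gnum, const ArcRecord& a) {
    assert(gnum >= 0);
    std::ostringstream out;
    out << gnum << " " << 0 << " " << a.sourcenodeid << " " << a.sinknodeid << " "
        << a.threshold << " " << a.initial_length << " " << a.consumption << " "
        << a.capacity;
    return out.str();
}

// The matching "tokens" line: source, sink and initial length.
std::string tokenline(const ArcRecord& a) {
    std::ostringstream out;
    out << a.sourcenodeid << " " << a.sinknodeid << " " << a.initial_length;
    return out.str();
}
```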








cout << "CHOOSE ONE OF THE FOLLOWING\n\n";
cout << "1..START AFTER FINISH (SAF)\n";
cout << "2..START AFTER START (SAS)\n\n";
cout << "Choice : ";
cin >> choice;
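The menu above offers two kinds of dependency arcs, start after finish (SAF) and start after start (SAS). The sketch below assumes the usual precedence-constraint reading of those names: under SAF a sink node may not start before its source finishes, and under SAS it may not start before its source starts. The listing does not spell out this rule, and `ArcKind` and `earlieststart` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>

// The two menu options; numeric values match the menu choices 1 and 2.
enum class ArcKind { SAF = 1, SAS = 2 };

// Earliest start of a sink node given one source's start and finish times
// and the sink's current earliest start.
double earlieststart(ArcKind kind, double srcstart, double srcfinish,
                     double sinkstart) {
    assert(srcfinish >= srcstart);
    double bound = (kind == ArcKind::SAF) ? srcfinish : srcstart;
    return std::max(sinkstart, bound);
}
```

SAS is the looser constraint, so it generally allows more overlap between a source and its sink at the cost of weaker ordering guarantees.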











ml = new mlist;
plist * pl;
pl = new plist;






















tempnode = new gnode;
for (loop = 0; loop < numprocs;loop++){
cyldata >> numassignednodes;
tempslice = new cylinderrslice;
templist = new clist;
for (loop2 = 0;loop2 < numassignednodes;loop2++){
cyldata >> temp;














































































pic « « " "«levell+100« endl;
else











APPENDIX D: Interrelation of the Files in PIPDAFS and GR
[Figure: file interrelation diagram; recoverable labels: graph.dat, machineconfig.dat, File Name: map, Description: Simulates the graph]















[BELL 92] Bell, Harold A., A Compile-Time Approach for Chaining and Execution Control in the AN/UYS-2 Parallel Signal Processor, Master's Thesis, Naval Postgraduate School, Monterey, California, June 1992.
[HSU 86] Hsu, Y. P., Highly Concurrent Scalar Processing, PhD Thesis, University of Illinois at Urbana-Champaign, Urbana, Illinois, December 1986.
[KARP 66] Karp, R. M., and Miller, R. E., "Properties of a Model for Parallel Computations: Determinacy, Termination, Queueing," SIAM Journal on Applied Mathematics, v. 14, No. 6, November 1966.
[LEE 89] Lee, E. A., Wai-hung Ho, Edwin E. Goei, Jeffrey C. Bier, and Shuvra Bhattacharyya, "Gabriel: A Design Environment for DSP," IEEE Transactions on Acoustics, Speech, and Signal Processing, v. 37, No. 11, pp. 1751-1762, November 1989.
[LEE 90] Lee, E. A., and Bier, J. C., "Architectures for Statically Scheduled Dataflow," Journal of Parallel and Distributed Computing, v. 10, pp. 333-348, December 1990.
[LIT 91] Little, B. S., "A Technique for Predictable Real-Time Execution in the AN/UYS-2 Parallel Signal Processing Architecture," Master's Thesis, Naval Postgraduate School, Monterey, California, December 1990.
[POPS 90] AT&T Technologies, Report 58854401, "Enhanced Modular Signal Processor (EMSP) Principles of Operation (POPS)," AT&T Bell Laboratories, March 1990.
[RAU 81] Rau, B. R., Kuekes, P. J., and Glaeser, C. D., "A Statically Scheduled VLSI Interconnect for Parallel Processors," in VLSI Systems and Computations, Computer Science Press, pp. 389-395, 1981.
[SLZ 92] Shukla, S. B., Little, B. S., and Zaky, A., "A Compile-time Technique for Controlling Real-time Execution of Task-level Data-flow Graphs," presented at the 1992 International Conference on Parallel Processing.
INITIAL DISTRIBUTION LIST
Defense Technical Information Center
Cameron Station
Alexandria, VA 22304-6145
Dudley Knox Library, Code 052
Naval Postgraduate School
Monterey, CA 93943




Dr. Amr Zaky, Code CS/Za
Professor, Computer Science Department
Naval Postgraduate School
Monterey, CA 93943-5000
Dr. Shridhar Shukla, Code EC/Sh





Bakanliklar, Ankara / TURKEY
Golcuk Tersanesi Komutanligi
Golcuk, Kocaeli / TURKEY
Deniz Harp Okulu Komutanligi
81704 Tuzla, Istanbul / TURKEY
Taskizak Tersanesi Komutanligi
Kasimpasa, Istanbul / TURKEY
LTjg Cem AKIN
Haciyusuf mah. I. okul sok.
Esen apt. No:27 kat:2 daire:3








