Simulation of a data flow computer by Torsone, Carol
Rochester Institute of Technology
RIT Scholar Works
Theses Thesis/Dissertation Collections
4-2-1985
Simulation of a data flow computer
Carol Torsone
Recommended Citation
Torsone, Carol, "Simulation of a data flow computer" (1985). Thesis. Rochester Institute of Technology. Accessed from
Rochester Institute of Technology 
School of Computer Science and Technology 
SIMULATION OF A DATA FLOW COMPUTER 
by 
Carol M. Torsone 
A thesis, submitted to 
The Faculty of the School of Computer Science and Technology, 
in partial fulfillment of the requirements for the degree of 
Master of Science in Computer Science 
Approved by:
John L. Ellis, Ph.D.
Lawrence A. Coon, Ph.D.
Peter H. Lutz, Ph.D.
April 2, 1985
PERMISSION TO REPRODUCE 
Title of Thesis: Simulation of a Data Flow Computer 
I, Carol Torsone, hereby grant permission to the Wallace 
Memorial Library, of RIT, to reproduce my thesis in whole or 
in part. Any reproduction will not be for commercial use or 
profit. 
Carol M. Torsone
April 2, 1985
ABSTRACT
A data flow computer is a highly concurrent and asynchronous
multiprocessor due to its fundamentally new architecture. It has
no program counter and is not sequential. Instructions execute
whenever their operands are available to them. Because of this
data-activated instruction execution, multiple instructions can
execute concurrently.
The project for this thesis was the simulation of a data
flow computer. A graph language and machine language were
defined; then a simulator was written which reads and executes a
machine language program in the asynchronous and concurrent
manner of a data flow computer.
KEYWORDS: data flow, concurrent, asynchronous, multiprocessor,
data-activated.
Table of Contents
1. Introduction
2. Data-Flow Program Organization
2.1 Data Flow Graphs
2.2 Data Structures
2.3 Machine Representation of Data Flow Graphs
3. Machine Organization
3.1 Packet Communication
3.2 Synchronization of Instruction Execution
4. Implementations
4.1 MIT Data Flow Computer
4.2 Manchester Data-Flow Computer
4.3 Irvine Data Flow Machine
4.4 Texas Instruments Distributed Data Processor
4.5 Utah Data-Driven Machine
4.6 LAU System
4.7 Newcastle Data-Control Flow Computer
5. Project Description
5.1 Description of Model
5.2 Data Flow Program Organization
5.3 Data Structures
5.3.1 Instructions
5.3.2 Packet Communication Network
5.3.3 Match and Fetch Queues
5.3.4 The Match Unit
5.4 Structure of the Simulator Program
5.4.1 The Main Module
5.4.2 Load Module
5.4.3 Match Module
5.4.4 Proc Module
5.4.5 Monitors
5.4.5.1 Qmgr Monitor
5.4.5.2 FQmgr Monitor
5.4.5.3 MQmgr Monitor
5.4.5.4 BufferMgr Monitor
5.4.5.5 Process Control Monitor
5.5 Trace Feature
5.6 The Data Flow Language
5.6.1 Graph Language
5.6.1.1 Arithmetic Operators
5.6.1.2 Logical Operators
5.6.1.3 Halt
5.6.1.4 Decider Operators
5.6.1.5 Input
5.6.1.6 Output
5.6.1.7 Gate if True, Gate if False
5.6.1.8 Switch
5.6.1.9 Loops
5.6.1.10 Apply Operator and Subprograms
5.6.1.11 Completeness of Graph Language
5.6.2 Mnemonic Language
5.6.2.1 Form of Mnemonic Program
Machine Language Statements
The Machine Language Program
The Assembler
6. Conclusions
7. Bibliography
8. Appendices
9. User Manual
FIGURES
2.1 f(x) = x**2 - 2*x + 3
2.2 f(x) = x**2 - 2*x + 3
2.3 Merge and Switch Operators
2.4 Gate Operator
2.5 Decider Operator
2.6 Apply Operator
2.7 Structures
2.8 Machine Representation of Data Flow Graphs
3.1 Packet Communication Organization
3.2 Token Storage
3.3 Token Matching
4.1 MIT Data Flow Computer
4.2 Cell Block
4.3 Manchester Data-Flow Computer
4.4 Irvine Data-Flow Processing Element
4.5 Texas Instruments Distributed Data Processor
4.6 Individual Data Flow Computer
4.7 Utah Data Driven Machine
4.8 LAU System
4.9 Newcastle Data-Control Flow Computer
5.1 Model of Simulator
5.2 Program Constants and Start Constants
5.3 Division of Program into Code Blocks
5.4 Packet Communication System
5.5 Queue
5.6 Queues and Pool of Buffers
5.7 Token Store in the Match Unit
5.8 Arithmetic Operators
5.9 Logical Operators
5.10 Decider Operators
5.11 Input Statement
5.12 Output Operator
5.13 Gate if true, Gate if false
5.14 Switch Operator
5.15 Loop operators: L, LI, D, DI
5.16 Loop with Tagged Tokens
5.17 Apply Operator and Subprograms
5.18 Conditional and While Schemas
5.19 Subprogram Activation
5.20 Specification of Constants
TABLES
5.1 Mnemonic Language
5.2 Enabling Counts of Subprogram Operators
ACKNOWLEDGMENTS
I would like to recognize the members of my committee for
their assistance in the completion of my thesis. Their direction,
help and ideas were invaluable throughout the project.
I wish to thank Dr. Lawrence Coon for sharing his interest
in data flow computers, pointing me in the direction of the
research in this area, and lending his encouragement, interest
and ideas.
I also owe a debt of gratitude to Dr. Peter Lutz for his
ideas and assistance in the area of concurrent programming in the
Euclid language.
My chairman, Dr. John Ellis, deserves special thanks for the
suggestions he has made, and also for his time and efforts in
guiding me from beginning to end through my thesis project.
CHAPTER 1
Introduction
The computers of today still share fundamental properties of
the von Neumann design, which are:
(1) A central processing unit
(2) A global, updatable memory
(3) A single instruction counter which causes a sequential,
centralized control of computation.
(4) A connecting tube that can transmit a single word or
address between the CPU and memory (the "von Neumann
bottleneck") [Backus 1978].
Computing applications of the future will require billions
of operations a second, and the von Neumann concept has caused
problems in attaining these speeds for the following reasons.
The first is the bottleneck in the physical configuration
of the machines, as described above.
The second is the fact that conventional programming
languages used today reflect the same architectural principles
as the computers:
(1) A variable in a programming language mirrors the concept
of memory storage cells.
(2) An assignment statement is similar to the idea of a
processing unit which performs state changes in the storage
through the "word-at-a-time" tube.
(3) The statements used for control flow (go to, if, do,
call) follow the concept of the instruction counter.
These "von Neumann" languages are complex and weak, due
to a defect at their most basic level: They were designed from
the inside out (they closely reflect the behavior of the
underlying architecture) instead of from the outside in (from the
programmer's point of view, permitting the natural expression
of the problem) [Backus 1978] [Myers 1982].
The third is that, in an effort to achieve greater
speed and computing power, the concept of parallelism has
evolved. However, it has been largely restricted to the
switching of a processor among separate processes, and the use
of multiple processors (programmer-specified decomposition of a
program into parallel instruction or data streams to be processed
by separate processors). The first approach doesn't buy much
in terms of overall speed, and the latter presents a non-trivial
task to the programmer for which he is given few tools, and
still does not offer relief to the problem of memory interference
on a system which has multiple processors and shared memory
[Myers 1982].
Over the past few years, a number of novel computer
architectures have been proposed based on "naturally" parallel
organizations. They arose from a desire to utilize
asynchronous concurrency to increase computer performance, to
exploit VLSI in the design of the computer, and from the need
for new "very high level" programming languages, which are based
on principles which conflict with those of the von Neumann
architecture [Treleaven, Brownbridge and Hopkins 1982]. They
also arose from the feeling that necessary improvements
would only come about through new and radical approaches to the
basic design of computers.
One of these areas of research is in data flow (or data
driven) computers, which was pioneered by Jack Dennis at MIT.
Data flow does away with the basic properties of the von
Neumann design by:
(1) Eliminating the idea of the instruction counter,
sequential instruction execution, and control flow.
Data flow computers are data driven; when the operands to
an instruction are available, the operation is executed. As
a consequence, many instructions may be available for
execution at once.
(2) Eliminating the concept of memory for storing variables.
Data values move from one instruction to the next as the
program executes.
(3) Taking advantage of parallelism within a program
without explicit directions from the programmer.
Highly concurrent computation is a natural consequence of
the data flow concept.
Data flow programs are represented by connecting
instructions (nodes) in a directed graph; i.e. one
instruction's output is another instruction's input. The
order of execution is controlled by the availability and flow
of data among the instructions, not an instruction counter.
A data flow graph is, in effect, the base language or machine
code for a data flow computer.
Data flow processors are stored-program computers in
which the stored program is a representation of data flow
graphs. The machine itself is designed to recognize which of
the instructions in its program memory are enabled, and all such
instructions are dispatched to execution units as soon as
resources are available. Any instructions in the program which
have their data available can execute concurrently [Agerwala
and Arvind 1982].
Language-based computer design can ensure the programmability
of a radical architecture, where the computer is a hardware
interpreter for a specific base language. Since data flow
languages allow the expression of concurrency of program
execution on a large scale, a data flow computer will be able
to support great concurrency and achieve a significant increase
in performance.
In this thesis, the various architectures which have been
designed to support data flow graphs are investigated, as well as
the instruction-handling mechanisms used to build prototype
data flow systems. Further, a data flow base language will be
defined for a "generic" data flow machine architecture and
this machine will be simulated in its execution of programs.
Specific architectural features such as interconnections
between processors, speed of communications lines, etc., will not
be simulated as they are beyond the scope of this thesis.
Chapter two defines what a data flow graph or program is and
describes basic operators commonly used. Also described are
approaches and problems in handling data structures in a data
flow program, and machine representation of data flow graph
programs.
Chapter three describes the configuration of a data flow
computer's resources and how these resources can be allocated to
support execution of a program.
Chapter four describes the implementations of the major data
flow models and prototypes described in the literature.
Chapter five contains a detailed description of the data
flow model simulated for this thesis, as well as a definition of
the graph language used, a mnemonic form of the graph language to
be used as input into an assembler, and the machine language
executed by the simulator.
Conclusions drawn after the project was completed are
contained in Chapter six, i.e. an appraisal of the simulator. Also
described are the next steps which could be taken and related
topics which may be or are currently being investigated.
CHAPTER 2
Data Flow Program Organization
2.1. Data Flow Graphs
In this chapter, data flow graphs or programs will be
described as they are generally viewed by those doing research in
this area [Dennis 1975] [Davis and Keller 1982]. There are
certain operators that are included in almost all graph
representations. Even though they may have variations in name, pictorial
representation, or slight variations in their use (e.g. number of
input and output arcs), they are basically the same operator.
Elementary data flow programs are represented as directed
graphs in which the nodes are operators (i.e. instructions).
Nodes are connected by arcs along which elementary values or
tokens can travel (an elementary data flow value is of type
integer, real or string). An operator is enabled when tokens are
present on all its input arcs. An enabled operator can then
execute or fire at any time, provided no tokens are present on any
output arc. The enabled operator removes the tokens from its
input arcs, computes a value based on the input tokens or
operands, and places this result token on its output arc.
A result may be sent to more than one destination by means
of an operator which duplicates tokens: It removes a token from
its input arc and places copies of the input token on its output
arcs.
A node marked with a constant value is assumed to regenerate
that value as often as it is needed.
An example of a data flow graph is shown in Figure 2.1. This
graph repeatedly computes the polynomial function f(x) = x ** 2 -
2 * x + 3 for a sequence of input tokens. Note that the two
multiply operators are not connected by an arc, which implies there
does not exist a data dependency between them. Therefore, these
operators could execute concurrently. This type of concurrency is
called horizontal, or spatial. Temporal concurrency or pipelining
corresponds to several generations of tokens moving through a
graph, and is illustrated in Figure 2.2 [Davis and Keller 1982].
Figure 2.1 f(x) = x**2 - 2*x + 3
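The horizontal concurrency described above can be illustrated with a
minimal Python sketch. It is not the simulator developed for this
thesis: the operator names (mul_xx, mul_2x, sub, add) and the exact
wiring of the graph are assumptions made for illustration, and the
restriction of one token per arc is not modeled. At each step, every
operator whose operands are available fires.

```python
import operator

# Hypothetical encoding of the graph for f(x) = x**2 - 2*x + 3.
# Each operator maps to (function, inputs); an input is the external
# token "x", the name of another operator, or a numeric constant.
GRAPH = {
    "mul_xx": (operator.mul, ["x", "x"]),
    "mul_2x": (operator.mul, [2, "x"]),
    "sub": (operator.sub, ["mul_xx", "mul_2x"]),
    "add": (operator.add, ["sub", 3]),
}


def run(x):
    """Fire enabled operators step by step; return (result, firing steps)."""
    values = {"x": x}          # tokens produced so far, by producer name
    pending = dict(GRAPH)      # operators that have not fired yet
    steps = []
    while pending:
        # An operator is enabled when every non-constant input has a token.
        enabled = sorted(
            name for name, (_, inputs) in pending.items()
            if all(not isinstance(i, str) or i in values for i in inputs)
        )
        if not enabled:
            raise RuntimeError("no operator is enabled; graph is stuck")
        fired = {}
        for name in enabled:   # all enabled operators fire in the same step
            fn, inputs = pending.pop(name)
            args = [values[i] if isinstance(i, str) else i for i in inputs]
            fired[name] = fn(*args)
        values.update(fired)
        steps.append(enabled)
    return values["add"], steps


result, steps = run(5)
print(result)  # 18
print(steps)   # [['mul_2x', 'mul_xx'], ['sub'], ['add']]
```

Both multiply operators appear in the first step because neither
depends on the other; the subtraction and addition must wait for
their operands.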
An operator is drawn as a box containing the name of the
operation. Certain operators are distinguished by shape. The
operator which duplicates tokens, as described above, is a tiny
circle.
The merge, switch, and gate operators have both data and
control inputs. The merge and switch operators (Figure 2.3) are
used in conditionals and iterations. With a merge, a control
token with the value true or false must first arrive on the
horizontal input. The value of that token determines from which of
the vertical data inputs the next token will be taken. Any token
on the other input arc remains there until selected by a
subsequent true or false token. A switch also waits for a token on
the control input; its value of true or false determines the
output arc to which the vertical token is passed.
A gate is an operator that either passes on or absorbs the
input token depending on the value of a boolean control value
(Figure 2.4). A "gate if true" consumes the vertical data token;
if the horizontal control token has the value true, the data value
is put on the output arc, but if the control value is false, the
data token is absorbed and there is no output value. A "gate if
false" operator works in a similar way, except that a false
control value causes the data token to be put on the output arc.
Figure 2.2 f(x) = x**2 - 2*x + 3
Figure 2.3 Merge and Switch Operators
Figure 2.4 Gate Operator (Gate if True, Gate if False)
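The behavior of the switch, merge and gate operators described above
can be sketched in Python as follows. This is a minimal sketch under
stated assumptions: the function names, the arc labels "true_arc" and
"false_arc", the use of a list to represent "a token or no token",
and the small conditional used to exercise the operators are all
hypothetical and are not taken from the simulator.

```python
from collections import deque


def switch(data, control):
    """Pass the data token to the output arc selected by the control token."""
    return ("true_arc", data) if control else ("false_arc", data)


def merge(control, true_arc, false_arc):
    """Take the next token from the input arc selected by the control token.

    A token waiting on the other arc is left in place.
    """
    return true_arc.popleft() if control else false_arc.popleft()


def gate_if_true(data, control):
    """Return [data] when control is true; otherwise absorb the token."""
    return [data] if control else []


def gate_if_false(data, control):
    """Return [data] when control is false; otherwise absorb the token."""
    return [] if control else [data]


def conditional(x):
    """if x < 0 then x - 1 else x + 1, built from switch and merge."""
    control = x < 0                       # decider operator
    arc, token = switch(x, control)
    true_arc, false_arc = deque(), deque()
    if arc == "true_arc":
        true_arc.append(token - 1)
    else:
        false_arc.append(token + 1)
    return merge(control, true_arc, false_arc)


print(conditional(-3), conditional(3))                  # -4 4
print(gate_if_true(7, False), gate_if_false(7, False))  # [] [7]
```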
A decider operator produces a true or false control result
by applying its associated predicate to the data inputs (Figure
2.5). Typical predicates are equality, inequality, less than,
etc.
Figure 2.5 Decider Operator
A constant-producing node can contain a graph representing a
function, and its use is similar to the way in which conventional
programs use subroutines and procedures. The value of the token
produced by the node is the definition of the function. It is
used as an input token to an apply operator along with other
input tokens which carry parameter values for the function. This
is illustrated in Figure 2.6, where inputs to the apply operator
are the data value of r (radius) and the graph for the function
to compute volume of a sphere.
Figure 2.6 Apply Operator
2.2. Data Structures
The token model of a data flow program shows an elementary
data value being completely swallowed up as input to an
operation, and a new value being produced as output. To be consistent
with this and to maintain the clear semantics of data flow, a
data structure such as an array must be treated as a single
object or token rather than a collection of elements. The entire
structure must be moved through the system in the same way as an
elementary token and must be copied whenever a change is made to
the structure, even if only one element will be altered.
This is obviously impractical as far as overhead is
concerned, and one solution is to add a memory to a data flow
machine that will store only data structures [Myers 1982]
[Rumbaugh 1975] [Dennis 1975]. Structures are then represented as
trees whose nodes are substructures (the branches are ordered),
and whose leaves are non-structure values. Tokens can then carry
pointers to these structures, rather than the structures
themselves. Examples of structures are shown in Figure 2.7.
Based upon pure LISP, each node of a structure must keep a
reference count; i.e. the number of existing pointers to that
node. No changes can ever be made to a shared structure; i.e.
one with a reference count greater than 1 (single assignment
rule). Instead, if a node's values must change and the reference
count for that node is greater than one, then the contents of the
node must be copied before any changes can be made to it in order
to prevent side effects for other pointers to that same node. If
a node's reference count becomes 0, that node's memory space can
be deallocated as it is now inaccessible. A garbage collection
scheme may be used for this purpose.
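The copy-before-change rule for shared nodes can be sketched in
Python as follows. This is a minimal sketch, assuming a flat node
with no substructures; the names Node, acquire, release and update
are hypothetical and chosen only for illustration.

```python
class Node:
    """A structure node holding ordered items and a reference count."""

    def __init__(self, items):
        self.items = list(items)
        self.refcount = 0


def acquire(node):
    """Create a new pointer to node."""
    node.refcount += 1
    return node


def release(node):
    """Drop a pointer; return True when the node can be deallocated."""
    node.refcount -= 1
    return node.refcount == 0


def update(node, index, value):
    """Set node.items[index] through one pointer, copying if shared."""
    if node.refcount > 1:
        release(node)                     # this pointer moves to the copy
        node = acquire(Node(node.items))
    node.items[index] = value
    return node


a = acquire(Node([5, 4, 3]))   # first pointer
b = acquire(a)                 # second pointer to the same node
b = update(b, 0, 9)            # shared, so b receives a private copy
print(a.items, b.items)        # [5, 4, 3] [9, 4, 3]
print(a.refcount, b.refcount)  # 1 1
```

Once the node is no longer shared, a further update through b
changes it in place, since no other pointer can observe the change.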
Figure 2.7 Structures: the vector (5, 4, 3); the LISP-like list
((16, (1), 73), 4, true); a 3 x 3 matrix with rows (1, 2, 3),
(4, 5, 6), (7, 8, 9)
Another problem with treating structures as a single object
is that it limits runtime parallelism since a computation
requiring a structure would not be able to run until the entire
structure is complete, even though many of its elements may be
available and could be used to start the computation. A proposed
solution to this problem is the use of I-structures as a data
type [Arvind and Thomas 1980]. An I-structure is an
asynchronous, array-like data structure which allows random access to
individual elements, and is useful in reducing data dependencies
from the entire structure to individual elements of the
structure. I-structures also use a reference count for reclamation of
structure space.
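The idea of element-level access can be sketched in Python as
follows. The text above does not give the mechanism; this sketch
assumes the behavior usually attributed to I-structures in the
literature, namely that each element is written once and that a read
of an element not yet written is deferred until the write occurs.
The class and method names are hypothetical.

```python
class IStructure:
    """Array-like structure whose elements may be read before being written."""

    _EMPTY = object()

    def __init__(self, size):
        self.cells = [self._EMPTY] * size
        self.deferred = [[] for _ in range(size)]  # readers waiting per cell

    def read(self, index, consumer):
        """Deliver element index to consumer now, or once it is written."""
        if self.cells[index] is self._EMPTY:
            self.deferred[index].append(consumer)
        else:
            consumer(self.cells[index])

    def write(self, index, value):
        """Write an element once and wake any deferred readers."""
        if self.cells[index] is not self._EMPTY:
            raise ValueError("element already written")
        self.cells[index] = value
        for consumer in self.deferred[index]:
            consumer(value)
        self.deferred[index].clear()


received = []
s = IStructure(3)
s.read(1, received.append)    # element 1 not yet produced: read is deferred
s.write(0, 10)                # no reader is waiting on element 0
s.read(0, received.append)    # already present: delivered immediately
s.write(1, 20)                # wakes the deferred reader
print(received)               # [10, 20]
```

A consumer of element 0 can proceed while element 2 has never been
produced, which is the reduction in data dependency described above.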
2.3. Machine Representation of Data Flow Graphs
Graph programs can be directly encoded to represent the
machine code for a specially-constructed processor (or can be
used as virtual machine code and interpreted on a conventional
processor). There are many ways to represent the coding of a
program, depending on the specific prototype of data flow
computer on which the program is intended to run. One method will be
demonstrated here. Data tokens will apply only to single values,
not structures.
The graphical program is represented as a set of contiguous
memory locations, one node or instruction per location, called a
code block (Figure 2.8) [Davis and Keller 1982]. Each
instruction has a location, a function or op code, an ordered listing of
the node's input arcs identified by the location of the nodes
from which the tokens originate, and a location (instruction) to
which the output token is sent. The destination of the result is
in the form i.j where i is the location of the node and j is the
input path [Treleaven 1979].
Equivalent Pascal Program:
    if x < 0 then
        x := x - 1
    else
        x := x + 1;
Machine Representation:
Instruction              Input Path          Destination
Location    Opcode       One   Two   Three   of Results
1           result       2
2           merge        3     4     11      1.1
3           -            7     5             2.1
4           +            5     7             2.2
5           duplicate    6                   3.2, 4.1
6           1                                5.1
7           switch       10    11            3.1 or 4.2
8           if (<)       10    9             11.1
9           0                                8.2
10          duplicate    12                  7.1, 8.1
11          duplicate    8                   7.2, 2.3
12          input                            10.1
Figure 2.8 Machine Representation of Data Flow Graphs
Each instruction also needs to keep an enabling count of the
number of input tokens it is waiting for before the instruction
can fire. Initially this count is set to its maximum number of
input arcs. When the count reaches zero, the node can fire.
Corresponding to the instructions in memory is the data
block, containing data tokens in contiguous locations which
parallel the instruction nodes from which they emanate. That is,
if location i in the code block represents a node of the graph,
location i in the data block represents the token value on the
arc leaving that node.
When a node becomes firable, i.e. its enabling count is zero
indicating all tokens are available, the processor can execute
the operation indicated by the instruction by fetching the values
on the node's arcs. The result of the calculation is then stored
on the node's output arc. The count of the receiving node is
decremented; if it becomes zero, implying it is now firable, the
cycle begins again. Any number of firable nodes can be processed
concurrently. A list of enabled instructions is kept, and
instruction execution is distributed over many processing units.
Firing of enabled nodes continues until no firable nodes are
left.
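The execution cycle described above (enabling counts, a code block,
a parallel data block, and a list of enabled instructions) can be
sketched in Python as follows. This is a minimal sketch under stated
assumptions: the code block encodes f(x) = x*x - 2*x + 3 with
hypothetical locations and opcodes, the constants are kept in a
separate table for brevity, and the ready list is processed
sequentially rather than by many processing units.

```python
import operator
from collections import deque

OPS = {"*": operator.mul, "-": operator.sub, "+": operator.add}

# Hypothetical code block for f(x) = x*x - 2*x + 3:
# location -> (opcode, input source locations, destinations as (i, j)).
CODE = {
    1: ("input", [], [(2, 1)]),
    2: ("duplicate", [1], [(4, 1), (4, 2), (5, 2)]),
    3: ("const", [], [(5, 1)]),
    4: ("*", [2, 2], [(6, 1)]),
    5: ("*", [3, 2], [(6, 2)]),
    6: ("-", [4, 5], [(8, 1)]),
    7: ("const", [], [(8, 2)]),
    8: ("+", [6, 7], [(9, 1)]),
    9: ("result", [8], []),
}
CONSTANTS = {3: 2, 7: 3}


def run(x):
    """Execute CODE for input x; return (result, order of firing)."""
    # Enabling count: number of input tokens each instruction still needs.
    count = {loc: len(ins) for loc, (_, ins, _) in CODE.items()}
    data = {}                                  # data block, indexed like CODE
    ready = deque(loc for loc in sorted(CODE) if count[loc] == 0)
    fired = []
    while ready:
        loc = ready.popleft()
        opcode, inputs, dests = CODE[loc]
        operands = [data[src] for src in inputs]
        if opcode == "input":
            data[loc] = x
        elif opcode == "const":
            data[loc] = CONSTANTS[loc]
        elif opcode in OPS:
            data[loc] = OPS[opcode](*operands)
        else:                                  # duplicate, result
            data[loc] = operands[0]
        fired.append(loc)
        for dest, _path in dests:              # one token per destination i.j
            count[dest] -= 1
            if count[dest] == 0:
                ready.append(dest)
    return data[9], fired


print(run(5))  # (18, [1, 3, 7, 2, 4, 5, 6, 8, 9])
```

Instructions 4 and 5 (the two multiplies) are on the ready list at
the same time, so a machine with more than one processing unit could
execute them concurrently.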
CHAPTER 3
Machine Organization
This section will describe how a data flow computer's
resources are configured and allocated to support execution of a
program.
3.1. Packet Communication
Data flow computers are most often based on a packet
communication machine organization, which consists of a circular
instruction execution pipeline in which processors,
communications and memories are interspersed with pools of work (Figure
3.1) [Treleaven 1982]. A program's instructions are stored in
memory, and execution of the program consists of independent
information packets, which may split and merge, traveling around
the pipeline. In this way, packets of work are allocated to
resources. Each packet to be processed is placed with similar
packets in one of the pools of work (e.g. all the packets of
enabled instructions are contained in the pool waiting for a free
processor). When a resource becomes idle, it takes a packet from
its input pool, processes it, places a modified packet in its
output pool, and returns to an idle state. A system could be
configured to have many identical resources between the pools, or
have many individual pipelines connected by a communications
network.
Figure 3.1 Packet Communication Organization
3.2. Synchronization of Instruction Execution
There are two schemes employed in synchronizing instruction
execution [Dennis 1979b] [Treleaven, Brownbridge and Hopkins
1982]: token storage and token matching.
Token storage is illustrated in Figure 3.2 [Treleaven,
Brownbridge and Hopkins 1982]. Data token packets arrive at the
input pool of the Update unit. The Update unit stores the input
tokens in their destination instructions in the Memory unit. At
this time, the Update unit determines if all the tokens for that
instruction have arrived, enabling it for execution. If so, the
address of that instruction is placed in its output pool. The
Fetch unit then uses these addresses to retrieve the instructions
from memory and places them in its output pool to wait for a free
processor. This basic scheme is essentially that implemented in
the MIT data flow computer [Dennis, Misunas and Leung 1977] and a
prototype data flow computer built by the Texas Instruments
Company [Cornish 1979].
Figure 3.2 Token Storage
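The token storage cycle can be sketched in Python as follows. This
is a minimal sketch, not the organization of any particular machine:
the three pools are modeled as queues, one unit acts per loop
iteration, and the two-instruction program computing (a + b) * c,
the dictionary layout of an instruction and the token format
(address, operand slot, value) are assumptions made for illustration.

```python
import operator
from collections import deque

OPS = {"+": operator.add, "*": operator.mul}


def make_memory():
    """Hypothetical two-instruction program computing (a + b) * c."""
    return {
        0: {"op": "+", "operands": [None, None], "dests": [(1, 0)]},
        1: {"op": "*", "operands": [None, None], "dests": []},
    }


def run(memory, tokens):
    """tokens are (instruction address, operand slot, value) triples."""
    token_pool = deque(tokens)   # input pool of the Update unit
    address_pool = deque()       # Update unit -> Fetch unit
    packet_pool = deque()        # Fetch unit -> processors
    outputs = []
    while token_pool or address_pool or packet_pool:
        if token_pool:           # Update unit: store token, test enabling
            addr, slot, value = token_pool.popleft()
            instr = memory[addr]
            instr["operands"][slot] = value
            if all(v is not None for v in instr["operands"]):
                address_pool.append(addr)
        elif address_pool:       # Fetch unit: build an executable packet
            instr = memory[address_pool.popleft()]
            packet_pool.append((instr["op"], instr["operands"], instr["dests"]))
            instr["operands"] = [None] * len(instr["operands"])
        else:                    # Processing unit: execute, emit tokens
            op, operands, dests = packet_pool.popleft()
            result = OPS[op](*operands)
            if dests:
                token_pool.extend((a, s, result) for a, s in dests)
            else:
                outputs.append(result)
    return outputs


print(run(make_memory(), [(0, 0, 3), (1, 1, 5), (0, 1, 4)]))  # [35]
```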
Token matching is illustrated in Figure 3.3 [Treleaven,
Brownbridge and Hopkins 1982]. The Matching unit takes data
tokens from its input pool and forms them into sets, using their
destination instruction address to determine set membership.
When a token arrives, the token store in the Matching unit is
searched for a token with the same destination address. If no
tokens are found with the same address, the enabling count
required by the destination instruction is decremented by one.
If the enabling count becomes zero, the instruction is now
enabled and the token is sent directly to the Fetch/Update unit.
Otherwise, the token is stored. If, however, tokens with the
same address are found in the token store, the count is
decremented by one, and if the count is still greater than zero,
the token is stored with the rest of its set; if the count
becomes zero, implying the instruction is now enabled, the whole
set of tokens is released to the Fetch/Update unit. This unit
forms a packet consisting of the instruction and its tokens, and
places it in its output pool to wait for a processor. Examples
of this scheme include Irvine Data Flow [Arvind, Kathail and
Pingali 1980], the Manchester Data Flow System [Watson and Gurd
1979], and the Newcastle Data-Control Flow Computer [Hopkins,
Rautenback and Treleaven 1979].
Figure 3.3 Token Matching
The main advantage of token matching over token storage is
that it allows the removal of the restriction that only one token
can be on an output arc at any one time. The arcs then become
FIFO queues, allowing tokens to be matched into sets, and
allowing a program's instructions to be used reentrantly.
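The matching step can be sketched in Python as follows. This is a
minimal sketch under stated assumptions: tokens are matched only by
destination address, the enabling count is derived from the number
of operands still missing, and the class name, the operand slot
argument and the addresses 10 and 11 are hypothetical.

```python
class MatchingUnit:
    """Groups tokens into sets by destination instruction address."""

    def __init__(self, input_counts):
        self.input_counts = input_counts   # address -> number of operands
        self.token_store = {}              # address -> tokens waiting

    def accept(self, address, slot, value):
        """Return the complete operand set, or None if still waiting."""
        waiting = self.token_store.setdefault(address, {})
        waiting[slot] = value
        # Enabling count: operands still missing for this instruction.
        enabling_count = self.input_counts[address] - len(waiting)
        if enabling_count > 0:
            return None
        del self.token_store[address]
        return [waiting[s] for s in sorted(waiting)]


unit = MatchingUnit({10: 2, 11: 1})
print(unit.accept(11, 0, 8))   # [8]     single-input: released at once
print(unit.accept(10, 1, 4))   # None    stored, partner not yet arrived
print(unit.accept(10, 0, 3))   # [3, 4]  set complete, released
```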
CHAPTER 4
Implementations
There are many data flow models and implementations in the
literature. This chapter will describe the major ones, starting
with Jack Dennis's data flow computer at MIT, which has formed
the basis for most other projects.
4.1. MIT Data Flow Computer
The organization of the M.I.T. machine uses token storage,
feedback and a cell block architecture. The structure of the
machine is shown in Figure 4.1 [Dennis 1980].
Only one token is allowed to exist on an arc at any given
time; i.e. the firing rule is that an instruction is enabled when
all of its operands are present and there is no token existing on
its output arc.
An enabled instruction packet enters the arbitration
(routing) network, which passes the packet on to the appropriate
processing unit according to its opcode. The processing unit
performs the necessary operation and sends a result packet to the
distribution network, which directs it to the cell block
containing the destination instruction.
Figure 4.1 MIT Data Flow Computer
Each cell block is structured as in Figure 4.2. The cell
block functions in the way described for token storage in section
3.2, with the program instructions stored in the Activity Store.
Figure 4.2 Cell Block
4.2. Manchester Data-Flow Computer
The Manchester Data-Flow Computer, shown in Figure 4.3
[Treleaven, Brownbridge and Hopkins 1982], is very similar to the
MIT design with two major exceptions.
The first is that it uses token matching. The token queue is
a FIFO buffer and leads to the Matching Store that is associative
in nature, implemented using RAM with hardware hashing
techniques. A token may be one of a pair or a single input to an
instruction. Token pairs from the matching store, or single
tokens for single input instructions that have bypassed the
Matching Store, are routed to the Instruction Store. Here the
tokens are combined with a copy of their destination instructions
to form an executable instruction packet that is passed to the
Processing Unit. The Processing Unit consists of an arbitration
and distribution system and microprocessors, any of which can be
assigned to an executable instruction.
Figure 4.3 Manchester Data-Flow Computer
The second major difference is the label field carried by
each token. Since the arcs of the program graph are viewed as
FIFO queues and consequently more than one token can be on an
output arc of an instruction, each data token must carry a label
field which identifies the process to which the token belongs,
the destination instruction address, and an iteration number
specifying which token on an arc it is. This allows a program's
instructions to be used as reentrant code, and greatly increases
concurrent execution of a program.
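Matching on the label field can be sketched in Python as follows.
This is a minimal sketch, not the Manchester hardware: the Token
fields follow the three label components named above, while the
operand slot field, the dictionary used as a matching store and the
sample values are assumptions made for illustration.

```python
from typing import NamedTuple


class Token(NamedTuple):
    process: int      # process to which the token belongs
    address: int      # destination instruction address
    iteration: int    # which token on the arc this is
    slot: int         # operand position at the destination (assumed field)
    value: int


def match_pairs(tokens):
    """Pair tokens whose labels agree; return (label, left, right) tuples."""
    store = {}        # label -> token waiting for its partner
    matched = []
    for tok in tokens:
        label = (tok.process, tok.address, tok.iteration)
        partner = store.pop(label, None)
        if partner is None:
            store[label] = tok
        else:
            left, right = sorted((partner, tok), key=lambda t: t.slot)
            matched.append((label, left.value, right.value))
    return matched


arrivals = [
    Token(1, 20, 0, 0, 10),
    Token(1, 20, 1, 0, 11),   # second iteration arrives before first pairs
    Token(1, 20, 1, 1, 21),
    Token(1, 20, 0, 1, 20),
]
print(match_pairs(arrivals))
# [((1, 20, 1), 11, 21), ((1, 20, 0), 10, 20)]
```

Because the iteration number is part of the label, tokens from two
iterations of the same instruction can be in flight together without
being paired with each other.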
In addition, the Manchester design has a built-in switch to
provide I/O for the system.
A complete description of this machine is found in [Watson
and Gurd 1982] and [Watson and Gurd 1979].
This project has included the design of LAPSE, a high level,
single assignment language, and the implementation of a
translator for LAPSE into data flow program graph language. It also
included translating conventional languages into data flow
graphs, which resulted in the development of an experimental
compiler for a subset of Pascal by P. J. Whitelock.
4.3. Irvine Data Flow Machine
The Irvine Data Flow Machine originated from research at the
University of California at Irvine, and is now located at MIT,
where research is continuing. It supports the ID high level data
flow language, makes use of VLSI with a multiprocessor design,
uses token matching, supports I-structures, and uses a sophisticated
token identification scheme [Arvind, Gostelow and Plouffe 1978],
[Treleaven, Brownbridge and Hopkins 1982].
It consists of N processing elements and a packet
communications network for routing a token from one physical element to
another. One processing element is shown in Figure 4.4. Each token
is routed to the element that holds its destination instruction
in its Program Memory. If a token is destined for the same
physical element that generated it, it can bypass the N x N network
and use a short-circuit path back to itself.
Figure 4.4 Irvine Data-Flow Processing Element (Input Section,
Waiting-Matching Section, Instruction Fetch Section, Program
Memory, Service Section, Data Structure Memory, Output Section)
4.4. Texas Instruments Distributed Data Processor (DDP)
The DDP was designed by Texas Instruments to investigate the
potential of data flow as the basis of a high-performance
computer. It was constructed using off-the-shelf technology. The
project began in 1976 and has been operational since 1978
[Cornish 1979].
It is not connected with any high-level data flow language.
Instead, a cross compiler, based on the Texas Instruments
Advanced Scientific Computer's optimizing FORTRAN compiler,
translates FORTRAN subprograms separately into directed graph
representations and a linkage editor combines them into a single
program.
The DDP has many similarities to the MIT machine, in that an
instruction is enabled when a token is present on all its input
arcs and no token is present on any of its output arcs. In
addition, it uses token storage and control tokens for feedback
signals. The machine organization, shown in Figure 4.5 [Treleaven,
Brownbridge and Hopkins 1982], is significantly different from
the MIT computer, however. It has four identical data flow
computers over which a program is distributed, and a Texas
Instruments 990/10 minicomputer acting as a front-end processor
for I/O, providing operating systems support and handling
collection of performance data. These five units are connected by the
DCLN ring, which is a variable-length, word-wide, circular shift
register. The ring may carry up to 5 variable-length packets in
parallel.
Figure 4.5 Texas Instruments Distributed Data Processor
Figure 4.6 Individual Data Flow Computer
Each data flow computer consists of four principal units, as
shown in Figure 4.6. Executable instructions are removed from
the Pending Instruction Queue by the Arithmetic Unit and processed.
Output token packets are released to the Update Controller,
which stores the token in the instruction in Program
Memory and decrements the instruction's Predecessor Count. If
that count becomes zero, the instruction is ready to execute and
a copy is placed in the Pending Instruction Queue.
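The update step just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names StoredInstruction, update and pending_queue are hypothetical, operands are kept in a dictionary keyed by port, and resetting the count after an instruction fires is not modelled because the text does not describe it.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class StoredInstruction:
    """An instruction as held in Program Memory (hypothetical layout)."""
    opcode: str
    predecessor_count: int                      # tokens still awaited
    operands: dict = field(default_factory=dict)


def update(program_memory, pending_queue, dest, port, value):
    """Store one output token and enable the instruction if it is complete."""
    instr = program_memory[dest]
    instr.operands[port] = value                # token is stored in the instruction
    instr.predecessor_count -= 1
    if instr.predecessor_count == 0:
        # A copy of the ready instruction goes to the Pending Instruction Queue.
        pending_queue.append((dest, instr.opcode, dict(instr.operands)))


program_memory = {7: StoredInstruction("add", predecessor_count=2)}
pending_queue = deque()
update(program_memory, pending_queue, dest=7, port=0, value=3)
update(program_memory, pending_queue, dest=7, port=1, value=4)
```

After the second call the queue holds one entry for instruction 7 with both operands; after the first call it is still empty.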
4.5. Utah Data-Driven Machine
The Utah Data-Driven Machine #1 (DDM1) was designed by Al
Davis and his colleagues while working at Burroughs Interactive
Research Center in La Jolla, California, and completed in 1976.
An improved version of DDM1 now resides at the University of
Utah, where the project is continuing under support from Burroughs
Corporation.
The recursive architecture of this machine is much different
from the ones outlined in the previous sections. Rather than a
packet communication organization, it has an expression manipulation
machine organization (identical resources organized into a
tree, where each resource contains a processor, communication and
memory capability). It has a VLSI implementation, and exploits
physical locality to decrease message frequency and increase
speed.
Its tree structure has a single root and a possibility for
up to eight sons at any node. A node is a processor-store element
(PSE) which consists of a processor module and its associated
local storage module. A block diagram of a node is shown in
Figure 4.7 [Davis 1979], [Treleaven, Brownbridge and Hopkins
1982].
[Figure 4.7 Utah Data Driven Machine: a processor-store element with Agenda Queue (AQ), Input Queue (IQ), Output Queue (OQ), Atomic Processor (AP), Atomic Storage Unit (ASU) and a switch, connected to the father PSE and the son PSEs]
4.6. LAU System
The LAU System is based at the CERT Laboratory at the
University of Toulouse, France. The project began with the
design of a high-level single assignment language, and a compiler
and simulator for that language. This led to the design and
construction of a powerful 32-processor data-driven computer.
Program representation is based on three logical types of
memory, one each for instructions, data, and control information.
The machine language has a three-address format which consists of
an operation code, two data memory addresses for input operands,
and a data memory address for the result operand.
The LAU machine has a packet communication organization with
token storage. Figure 4.8 [Treleaven, Brownbridge and Hopkins
1982] shows the structure of the processor. It consists basically
of three units. The Memory Unit provides storage for instructions
and data. The Control Unit, the truly original and unique part
of the processor, contains the Instruction Control Memory and the
Data Control Memory. The Processing Unit consists of 32 identical
processing elements, each element being a 16-bit microprocessor
[Syre, Comte and Hifdi 1977], [Treleaven, Brownbridge and Hopkins
1982].
[Figure 4.8 LAU System: Control Unit, Processing Unit (P0...P31) and Memory Unit]
4.7. Newcastle Data-Control Flow Computer
The group at the University of Newcastle upon Tyne were
interested in the data flow program organization only (not the
resulting machine architecture), the suitability of these programs
for a general-purpose decentralized computer, and the possibility
for combining them. The JUMBO computer which they
developed, built to study the integration of data-flow and
control-flow computation, is described here.
The JUMBO computer has a packet communication organization
with token matching. However, data can also be embedded in the
instruction. When an instruction is enabled, the token inputs and
the embedded inputs are merged to produce a set of up to eight
data values and addresses. The execution of an instruction can
produce data tokens, data to store in memory, and control tokens.
A block diagram of the computer is shown in Figure 4.9
[Treleaven, Brownbridge and Hopkins 1982].
[Figure 4.9 Newcastle Data-Control Flow Computer: Matching Unit, Processing Units and Memory Unit, exchanging token packets, token set packets, stored data packets and executable instruction packets]
CHAPTER 5
Project Description
As the main project for my thesis, a simulator has been
written which executes data flow program graphs. It was written
in Concurrent Euclid, chosen because of its support for concurrent
processes and monitors and its portability. Disadvantages
of Euclid are that it does not support real numbers and the
maximum value of an integer is only 32767; test data must be
chosen accordingly.
The overall approach to the simulator is based on the models
of Dennis [Dennis 1974], Watson and Gurd [Watson and Gurd 1979],
and Arvind [Arvind and Gostelow 1975], [Arvind and Gostelow
1982], where:
1. An instruction can execute whenever its operands become
available, and any number of enabled instructions can execute
concurrently.
2. All operators are free of side effects; that is, enabled
instructions can execute in any order or concurrently and the end
result will be the same with no error produced.
3. Tokens carry a tag which identifies not only the instruction
to which the token is going, but a code block identification
number (which identifies the instantiation of a subprogram or a
loop within a program or subprogram), and an iteration number.
In this way, many instantiations of an instruction can occur
concurrently.
4. Procedure calls can occur concurrently and loops can be
unfolded in the manner of Arvind's U-Interpreter [Arvind and
Gostelow 1982].
5.1. Description of Model
The conceptual design of the model is shown in figure 5.1.
It has a packet communication network to carry data tokens from
one unit to another. Tokens can be of type integer, boolean or
character. Again, reals have not been implemented because Euclid
does not support them.
Program instructions are read by the simulator and stored in
Program Memory. They can be referenced by the Fetch and Match
Units. Any constants which are to be used as input to an
instruction are "permanently" entered into the token store in
the Match Unit under that particular instruction number, making
the constant input always ready and available for use.
When an instruction has been executed, a packet containing a
data token is sent to the Match Queue, which is a queue of packets.
The Match Unit removes packets from this queue and, by
referencing the instruction to which the packet is destined,
determines if this one token enables the instruction. If it
does, a packet with the data token is immediately sent to the
Fetch Queue. Otherwise, the Match Unit must check its Token
Store for the presence of other tokens for that instruction. If
the instruction requires two data tokens, i.e. the enabling count
is two, the Match Unit then checks for the presence of a constant in
the Token Store for that instruction. If there is one present, a
data token is produced for that constant and added to the packet,
and the packet is then sent to the Fetch Queue. If there is no
constant, or if the enabling count is greater than two, then the
Match Unit must look in the Token Store for other tokens for
that instruction. If one or more are found, the enabling count
for that instruction is now decremented by one and checked. If
it is now zero, then the stored tokens are removed from the Token
Store and added to the packet, and the packet is sent to the Fetch
Queue. If the enabling count is not zero, the newly arrived
token is stored along with the other tokens for that instruction.
In the case where no other tokens are found in the Token
Store for that instruction, and the enabling count has not
reached zero, the newly arrived token is entered into the Token
Store under its destination instruction number to await the
arrival of tokens which will enable the instruction.
[Figure 5.1 Model of Simulator: Match Unit and Token Store, Match Queue, Program Memory, Fetch Queue, Fetch Unit, and Processors 1 to 10]
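The matching decision described above can be sketched as follows. This is a simplified illustration in Python, not the Concurrent Euclid code of the simulator: the names Instruction, Token and match are hypothetical, the Token Store is a flat dictionary keyed by instruction number and tag, port numbers are ignored, and the enabling count is tracked by counting stored tokens rather than by decrementing a stored counter.

```python
from dataclasses import dataclass


@dataclass
class Instruction:
    enabling_count: int          # number of tokens needed to fire
    constant: object = None      # program constant, if the instruction has one


@dataclass
class Token:
    dest: int                    # destination instruction number
    tag: tuple                   # (CID, IID)
    value: object


def match(token, program, store):
    """Return the values to forward to the Fetch Queue, or None if stored."""
    instr = program[token.dest]
    if instr.enabling_count == 1:
        return [token.value]                     # this token alone enables it
    if instr.enabling_count == 2 and instr.constant is not None:
        return [token.value, instr.constant]     # constant supplies the partner
    key = (token.dest, token.tag)
    waiting = store.setdefault(key, [])
    waiting.append(token.value)
    if len(waiting) == instr.enabling_count:     # last missing token arrived
        del store[key]
        return waiting
    return None                                  # keep waiting for partners


program = {1: Instruction(1), 2: Instruction(2, constant=5), 3: Instruction(2)}
store = {}
first = match(Token(3, (2, 1), 10), program, store)    # stored, returns None
second = match(Token(3, (2, 1), 11), program, store)   # completes the set
```

Tokens for the same instruction but a different tag are kept under a different key, so two iterations of a loop never mix their operands.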
The Fetch Unit repeatedly removes a packet from its queue.
It gets a copy of the packet's destination instruction from program
store and then sends the packet and the copy of the instruction
to a processor for execution of the instruction.
There is a pool of 10 processors in the simulator; each processor
can execute any opcode. However, in order to make it look
as if any one processor is dedicated to only one operation and
there are an "unlimited" number of processors for any one operation,
the Fetch Unit dynamically allocates the processors as they
are needed and assigns the op code at that time. Upon completion
of an instruction execution, the processor goes back into a
"pool" of available processors. The number of processors is
large enough that the Fetch Unit seldom has to wait for one to
become available.
When a processor executes an instruction, the incoming data
tokens are "consumed" by the processor and disappear. The resulting
data token(s), if any, which are produced by the operation
are sent in one or more packets to the Match Queue.
The Match Unit, Fetch Unit, and the processors are all
operating concurrently and therefore packets are constantly being
pipelined through the system.
There are two types of constants used in the model: program
constants and start constants. They will look the same in a
graph program but are handled differently by the simulator.
Start constants are constants which are created at the start of
the program as if they were produced dynamically by the execution
of a statement. They are used to execute an instruction only
once, and their presence alone enables the instruction. In this
way, execution of the program begins; start constants are the
"initial" conditions which "get the ball rolling." This is in
contrast to a program constant which must be available to an
instruction throughout the program and may be used in execution
many times. Examples of each are shown in the simple program in
figure 5.2 which loops five times and then stops. (The statements
identified as L, LI, D, and DI are used in loops to implement
the grouping of tokens into sets and are explained in detail
in section 5.6.1.9.) The constant 1 entering the L statement is a
start constant. It is the only input to that statement; its
arrival causes the statement to be enabled and thus executed.
This particular L statement will not execute again in the program.
In contrast, the constants 5 and 1 entering the "if <="
and "+" statements respectively are each only one of two input
tokens to their destination statements. Their arrival does not enable
the instruction; the instruction still must wait for the arrival
of the second data token to be enabled. These program constants
will be required on each execution of their statements, each
time the loop is executed.
[Figure 5.2 Program Constants and Start Constants]
5.2. Data Flow Program Organization
A data flow program can be broken up into code blocks. A
code block is made up of one or more operators. Each loop
within the program is a separate code block, and each subprogram
is a code block. Groups of instructions not belonging to any loop
or subprogram belong to the initial code block number which is 1.
Each code block, as the program executes, has its own unique
identifier.
As a program begins execution, its code block identifier
(CID) is 1. Each time a loop is initially entered, the CID is
assigned a new, unique integer, and the old CID is saved for use
again on exit from the loop. Also, on entry to each subprogram,
a new CID is assigned and the old one saved for use upon return
to the calling program. New CIDs are assigned sequentially.
Each time a new one is required, the last number assigned is
incremented by one and then assigned as the new CID.
An iteration count (IID) is also kept. It is initially set
to 1; with each succeeding iteration of a loop, the IID is
incremented by 1. Upon exit from a loop, the IID becomes whatever
it was upon entry to the loop.
Therefore, at any point in a program's execution, the tokens
of each instantiation of each instruction have a unique tag with
the current CID and IID.
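The bookkeeping for CID and IID can be illustrated with a small Python sketch. It is a sequential illustration only, under the assumption that a single allocator object tracks the current context; in the simulator the tag travels with the tokens themselves. The class and method names (TagAllocator, enter_block, next_iteration, exit_block) are hypothetical.

```python
class TagAllocator:
    """Hands out (CID, IID) tags following the rules described above."""

    def __init__(self):
        self.last_cid = 1            # last code block identifier assigned
        self.cid, self.iid = 1, 1    # a program starts with tag (1, 1)
        self._saved = []             # tags to restore on exit

    def current(self):
        return (self.cid, self.iid)

    def enter_block(self):
        """Entering a loop or calling a subprogram: save tag, take a new CID."""
        self._saved.append((self.cid, self.iid))
        self.last_cid += 1
        self.cid, self.iid = self.last_cid, 1
        return self.current()

    def next_iteration(self):
        """Each succeeding iteration of a loop increments the IID."""
        self.iid += 1
        return self.current()

    def exit_block(self):
        """Leaving the loop or subprogram restores the saved tag."""
        self.cid, self.iid = self._saved.pop()
        return self.current()


tags = TagAllocator()
loop_tags = [tags.enter_block()] + [tags.next_iteration() for _ in range(3)]
after_loop = tags.exit_block()
```

Here loop_tags runs from (2, 1) to (2, 4) and after_loop is (1, 1) again, which is the pattern of a first loop that executes four times.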
As an example, the outline of a program is presented in
figure 5.3 along with the tags of the tokens, represented as
(CID, IID).
This method of tagging data tokens enables the execution of
loops to be "unfolded" in the manner of Arvind and Gostelow's
U-Interpreter [Arvind and Gostelow 1982] and also allows concurrent
instantiations of subprograms. This would include calling of a
subprogram from different points in a program, or recursive calls
to a subprogram. Each instantiation of any instruction within a
loop or subprogram will have its own unique tag (CID, IID).
Tagging data tokens in this way greatly increases concurrency,
as shown in the graph program in Appendix D which computes
factorial 1 to factorial 7. As can be seen, the loop computing
the numbers 1 to 7 has very few instructions. This loop
will run very quickly compared to the part of the program which
actually computes the factorial recursively. It has been shown
by actually testing the program on the simulator that the loop
from 1 to 7 completes its 6th iteration before the first factorial
is produced, and it completes its 7th iteration almost
immediately after 1! is computed. This is possible since computation
in the loop does not depend on computation of the
factorial. This indicates that the factorial subprogram has more
than six instantiations running at once (more than six since
the factorial subprogram calls itself once for 1!, twice for 2!,
etc.). Therefore, six different factorials are being computed at
one time, and it is probable that there is more than
one data token in the system for many instructions in the factorial
subprogram. This can be done since the simulator dynamically
assigns a different tag to the data tokens which identifies
the context from which the instruction was called.
Since there is no data dependency between successive computations
of n!, the loop can unfold and compute all of the factorials
concurrently, thereby attaining maximum concurrency. It
is conceivable that n! may even complete before (n-1)!.
Figure 5.3 Division of Program into Code Blocks
   start
      (1,1)
      (1,1)
      (2,1)   All tokens for instructions
      (2,2)   in this loop have CID = 2.
      (2,3)   It loops 4 times.
      (2,4)
      (1,1)   Back to the original tag.
      (3,1)   Loop 3 times.
      (3,2)   Call subprogram F on
      (3,3)   each loop.
                 Call 1 (4,1)
                 Call 2 (5,1)
                 Call 3 (6,1)
      (1,1)   Back to original tag.
   end
If the subprogram computations were dependent on one
another, that would be the only constraint upon ordering the execution
of instructions within an unfolded loop. Consider, for
example, the following program written in ID (a high level data
flow language) taken from [Arvind and Gostelow 1982] which
integrates a function f by the trapezoidal rule.
   (initial s <- (f(a) + f(b)) / 2;
            x <- a + h;
    for i from 1 to n-1 do
       new s <- s + f(x);
       new x <- x + h;
    return s) * h
The computation of i from 1 to n-1 could execute independently
of the instructions within the loop, and the computation
of new x can be done independently of the computation of new s.
Each new s and new x, however, depend on the old x, and therefore
they must execute in order: new s and new x must always wait for
the x of the previous loop.
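For readers unfamiliar with ID, the same computation can be written as a sequential Python function. This is a sketch for illustration only: the step size h = (b - a) / n is an assumption, since the ID fragment does not show how h is defined, and the function name trapezoid is hypothetical. The tuple assignment mirrors the "new s" and "new x" bindings, both of which read the old x.

```python
def trapezoid(f, a, b, n):
    """Integrate f over [a, b] with n intervals by the trapezoidal rule."""
    h = (b - a) / n                  # assumed definition of the step size
    s = (f(a) + f(b)) / 2            # initial s
    x = a + h                        # initial x
    for _ in range(1, n):            # for i from 1 to n-1
        # new s and new x are both computed from the old x
        s, x = s + f(x), x + h
    return s * h


area = trapezoid(lambda t: t, 0.0, 1.0, 4)
```

For the straight line f(t) = t on [0, 1] the rule is exact, so area is 0.5.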
5.3. Data Structures
This section will describe in detail the data structures
used to implement the various elements of the data flow system;
in particular, the instructions, the packet communication network,
the Match and Fetch Queues, and the Token Store in the
Match Unit.
5.3.1. Instructions
The data flow program machine language instructions are read
in and stored in an array called "instruction" of maximum size
"maxinstr", which is presently set at 1000. Each element of the
array represents one instruction in record form, and contains the
following information:
(1) The opcode: an integer representing the mnemonic opcode.
(2) The enabling count of the instruction: the number of context
control tokens and the number of data tokens required
for execution.
(3) The string to be printed along with output. This is used
only for output statements; for other statements, it is
undefined.
(4) The file number from which data should be read. This
applies only to input statements; for other statements, it
is undefined.
(5) An array of input types. This array is of size 20, which is
the maximum number of inputs any instruction can have. In
fact, most instructions will have only one or two; only the
input and output statements and those statements dealing
with subprograms will be allowed to have up to the maximum
number of inputs. The type of the data token that will be
input to port 1 is the first element of this array, the type
of the token input to port 2 is the second element, and so
on. Refer to Appendix B for data type codes.
(6) A pointer to the beginning of a linked list of information
defining the destination(s) of output data tokens produced
by execution of the instruction. Each element of this list
is a record and contains:
a. the output port number of the data token to which this destination
information applies.
b. the type of the data token being sent.
c. the instruction number to which the token is being sent.
d. the port number of the destination instruction to which the
token is being sent.
e. a pointer to the next record in the list.
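The instruction record can be pictured with the following Python sketch. It is a hypothetical rendering, not the Concurrent Euclid declaration: the linked list of destinations is replaced by a Python list, type codes are written as strings instead of the codes of Appendix B, and all field and class names are illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MAX_INPUTS = 20   # maximum number of inputs any instruction can have


@dataclass
class Destination:
    out_port: int        # output port this destination applies to
    token_type: str      # type of the data token being sent
    instruction: int     # destination instruction number
    in_port: int         # input port of the destination instruction


@dataclass
class Instruction:
    opcode: int
    enabling_count: int
    input_types: List[str]
    destinations: List[Destination] = field(default_factory=list)
    out_string: Optional[str] = None    # output statements only
    file_number: Optional[int] = None   # input statements only

    def __post_init__(self):
        if len(self.input_types) > MAX_INPUTS:
            raise ValueError("an instruction may have at most 20 inputs")

    def targets(self, out_port):
        """All (instruction, port) pairs fed by one output port."""
        return [(d.instruction, d.in_port)
                for d in self.destinations if d.out_port == out_port]


# Opcode 1 is an arbitrary placeholder value for this example.
add = Instruction(opcode=1, enabling_count=2,
                  input_types=["integer", "integer"],
                  destinations=[Destination(1, "integer", 5, 1),
                                Destination(1, "integer", 9, 2)])
fan_out = add.targets(1)
```

In this example output port 1 feeds two instructions, so fan_out contains both destinations and no duplicate instruction is needed.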
One output port can have any number of destinations specified;
this removes the need for a duplicate instruction.
The destination information is read in along with the program
instructions. The destination information for each instruction
is placed immediately after the instruction itself and is ordered
by output port number.
5.3.2. Packet Communication Network
The network described here is the system of packets of
information which are "sent" from one unit of the simulated data
flow computer to another (see figure 5.4). The packets all look
the same, no matter where they are in the system. The difference
is only in how each unit handles them.
A packet has a header, which is a record containing the following
information:
(1) The instruction number to which the packet applies.
(2) The tag of the data token(s) in the packet; i.e. the code
block number and the iteration.
(3) A pointer to a single data token or a linked list of data
tokens, depending on what part of the system this packet is
headed for and what the instruction is.
[Figure 5.4 Packet Communication System: Fetch Queue, Match Queue and Processors]
The data tokens themselves are records and contain the following
information:
(1) The type of the data token; i.e. integer, character,
boolean, instruction address, or context control.
(2) The actual value of the token. The values are represented
as shown below.
If the type is:         the value is:
integer                 integer value
character               ordinal value of the character
boolean                 an integer: 1 = true, 0 = false.
instruction address     an integer representing the number of an
                        instruction in the array of instructions.
context control         two integers: the first representing the
                        code block number and the second, the
                        iteration number.
There is also a capability for implementing pointer type
tokens; in this case, the value would be a pointer to a
structure. However, at present, structures have not been
implemented.
(3) The port number through which the data token will enter its
destination instruction.
(4) A pointer to the next data token in the packet, or nil if
this is the only token or the last token in the list.
A packet with two data tokens looks like this:
   +------------------+
   | instruction nbr  |
   | tag              |
   | pointer to token |---> +-------+
   +------------------+     | type  |
                            | value |
                            | port  |
                            | next  |---> +-------+
                            +-------+     | type  |
                                          | value |
                                          | port  |
                                          | nil   |
                                          +-------+
Packets which are on their way to a processor would contain
the header and a linked list of all the data tokens necessary to
execute the instruction.
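A packet header pointing to a linked list of tokens might be sketched as below. This is an assumption-laden illustration in Python: the names Token, Packet, append_token and values are hypothetical, and None stands in for the nil pointer.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Token:
    type: str                        # e.g. "integer", "boolean"
    value: int
    port: int                        # input port on the destination
    next: Optional["Token"] = None   # next token in the packet, or None (nil)


@dataclass
class Packet:
    instruction: int                 # destination instruction number
    tag: Tuple[int, int]             # (code block number, iteration)
    first: Optional[Token] = None    # pointer to the list of tokens


def append_token(packet, token):
    """Link a token onto the end of the packet's token list."""
    token.next = None
    if packet.first is None:
        packet.first = token
        return
    node = packet.first
    while node.next is not None:
        node = node.next
    node.next = token


def values(packet):
    """Walk the list and collect (port, value) pairs in order."""
    out, node = [], packet.first
    while node is not None:
        out.append((node.port, node.value))
        node = node.next
    return out


packet = Packet(instruction=12, tag=(2, 3))
append_token(packet, Token("integer", 4, port=1))
append_token(packet, Token("boolean", 1, port=2))
```

Only the small header needs to move between queues; adding or removing a token changes a single pointer.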
Packets produced by a processor and going to the Match Queue
would in general contain only the header and one data token. The
exception to this is a packet produced by execution of the
activate or end statements, in which case all data tokens
required are sent in one packet to the next instruction to be
executed.
The Match Unit removes a packet from its queue and, using
the information in the header record, either stores the tokens in
its token store to await the arrival of other tokens which will
enable the instruction, or sends the packet on immediately to the
Fetch queue, sometimes with the addition of other data tokens
from the token store.
The Fetch Unit removes a packet from its queue and, by looking
at the header, determines which op code is to be executed.
It then sends the entire packet to a processor "created" for that
op code.
By using a small packet which points to a list of data
tokens, the packet alone can be sent around the system, keeping
overhead low, while the tokens remain "stationary". Tokens are
easily shifted around from one packet to another by simply redefining
the pointer in the packet or a pointer in the list of
tokens. They are also easily added to and removed from lists of
tokens in token store by simply setting pointers. This ability
makes the concept of a packet and tokens a very flexible and
efficient data structure.
5.3.3. Match and Fetch Queues
The Match and Fetch Queues actually share one pool of buffers,
which is an array of records representing headers of packets.
To store a packet in the Match Queue, a processor must
acquire a free buffer, i.e. a location in this array of records,
store the packet information in that array location, and then
enter that location number in the Match Queue. The Match Queue
itself is just an array of buffer locations (integers), managed
as a queue. Access to the Match Queue is shared between the processors,
which are the producers of packets, and the Match Unit,
which is the consumer. The Match Unit repeatedly removes a
buffer location from the Match Queue (as they become available),
gets a copy of the contents of the buffer, and then releases the
buffer location to the pool of free buffer locations.
Storing information in the Fetch Queue is a similar process,
the producer process being the Match Unit and the consumer process,
the Fetch Unit.
One can picture a queue to look as shown in figure 5.5, with
the packet headers pointing to the data tokens.
The relationship between the pool of buffers and the Match
and Fetch Queues is shown in figure 5.6. At any point in time,
the buffer numbers in the two queues are mutually exclusive.
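The sharing of one buffer pool by two queues of buffer numbers can be sketched in Python as follows. It is a single-threaded illustration under stated assumptions: the monitors that guard these structures in the simulator are omitted, an exhausted pool raises an error instead of blocking the caller, and the names BufferPool, enqueue and dequeue are hypothetical.

```python
from collections import deque


class BufferPool:
    """Array of packet buffers plus a list of free buffer numbers."""

    def __init__(self, size):
        self.slots = [None] * size
        self.free = deque(range(size))

    def acquire(self, packet):
        index = self.free.popleft()   # IndexError if no buffer is free
        self.slots[index] = packet
        return index

    def release(self, index):
        packet = self.slots[index]    # copy the contents out
        self.slots[index] = None
        self.free.append(index)       # buffer returns to the free pool
        return packet


def enqueue(pool, queue, packet):
    """Store the packet in a free buffer and queue the buffer number."""
    queue.append(pool.acquire(packet))


def dequeue(pool, queue):
    """Remove the oldest buffer number and free its buffer."""
    return pool.release(queue.popleft())


pool = BufferPool(4)
match_queue, fetch_queue = deque(), deque()
enqueue(pool, match_queue, "packet A")
enqueue(pool, match_queue, "packet B")
enqueue(pool, fetch_queue, "packet C")
oldest = dequeue(pool, match_queue)
```

Because every buffer number comes from the same free list, a number can never appear in both queues at once.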
5.3.4. The Match Unit
The basic data structure of the Match Unit is what has been
called the Token Store (see figure 5.7). It is an array of size
"maxinstr", which is the same size as the array of instructions;
this allows a place in the token store for each instruction in
the data flow program.
The token store is the structure which will store the data
tokens for each instantiation of each instruction while they are
waiting to be matched up with other tokens, thus enabling the
instruction. In order to keep tokens for each code block number
and each initiation separate from one another, and yet be readily
and easily accessible, the tokens are stored first by instruction
number, then code block number, and then iteration number.
[Figure 5.5 Queue: packet headers (instruction, tag), each pointing to its data tokens (type, value, port)]
[Figure 5.6 Queues and pool of buffers: the Match Queue holds buffer numbers 96, 97 and 99; the Fetch Queue holds 98 and 100]
[Figure 5.7 Token store in the Match Unit]
The basic information stored in the array for each instruction,
independent of code block number and iteration, is:
(1) A boolean indicating whether or not there is a constant
present as input to this instruction.
(2) A pointer to the constant data token, if one exists.
(3) A pointer to a list of code block numbers for which there
presently are tokens stored.
The list of code block pointers is also a linked list; for
each code block number, there is a pointer to a linked list of
iteration numbers which occurred under that CID. For example, in
a loop, each instruction could have several instantiations, where
each instantiation would have the same CID but different IID's.
The list of iteration pointers carries an enabling count for each
instantiation of the instruction, and a pointer to the list of
actual data token(s).
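The three-level organization of the token store (instruction, then CID, then IID) can be sketched with nested dictionaries. This Python fragment is only an approximation under stated assumptions: dictionaries replace the linked lists of CID and IID records, a value of None replaces the boolean constant indicator, and the names StoreEntry, IIDRecord and add_token are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class IIDRecord:
    enabling_count: int                          # tokens still needed
    tokens: list = field(default_factory=list)   # tokens stored so far


@dataclass
class StoreEntry:
    """Token store element for one instruction."""
    constant: object = None                      # None means no constant
    cids: dict = field(default_factory=dict)     # CID -> {IID -> IIDRecord}


def add_token(entry, cid, iid, token, needed):
    """Store a token; return the full token set once the count hits zero."""
    iids = entry.cids.setdefault(cid, {})
    record = iids.setdefault(iid, IIDRecord(enabling_count=needed))
    record.tokens.append(token)
    record.enabling_count -= 1
    if record.enabling_count > 0:
        return None
    del iids[iid]                    # remove the iteration record
    if not iids:
        del entry.cids[cid]          # no iterations left under this CID
    return record.tokens


entry = StoreEntry()
add_token(entry, cid=2, iid=1, token="a", needed=2)
add_token(entry, cid=2, iid=2, token="b", needed=2)
ready = add_token(entry, cid=2, iid=1, token="c", needed=2)
```

After these calls ready holds the two tokens of iteration 1, while the token of iteration 2 is still waiting under the same CID.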
5.4. Structure of the Simulator Program
The program consists of a main module, three modules which
are used by the main module, and five monitors. They are
described in detail in this section.
There are also two files of definitions of constants, types,
and data structures used throughout the program. The main module
needs to include the larger of these files; the other modules
include one or the other. The larger of these files is
"definitions1", the other is "definitions2".
5.4.1. The Main Module
The main module of the simulator is in file sim.e. This
module contains:
(1) The declarations for some additional data structures used in
the program.
(2) An "initially" section which reads in and sets the trace
indicator (refer to section 5.5). It also calls certain
procedures in the Load module to read in the data flow
machine language program, initialize the Match Unit, and set
up the start condition for data flow program execution.
(3) The processes which make up the data flow computer; i.e. the
Match Unit, the Fetch Unit, and the Processors.
5.4.2. Load Module
The Load module, file load.e, contains all the procedures
necessary to read in the machine language data flow program and
start execution.
In procedure Readprogram, the number of statements is first
read, then the statements themselves. If any input statements are
included in the program, the files specified are opened.
The program constants used in the data flow program are read
in procedure Initialize_Match. For each statement which has a
constant as input, the token store must reflect the fact that a
constant is present by setting the constant indicator to true,
and a token for that constant must be created and a pointer to
the token stored. All other statements which do not have a constant
as input are initialized to have the constant indicator
false, and the pointer to a constant token is set to nil.
The start constants are read in procedure
Get_the_ball_Rolling. For each constant, the procedure creates a
packet for its destination instruction containing the constant as
a data token. It then enters this packet into the Fetch Queue,
which will cause execution of the destination instruction.
5.4.3. Match Module
The Match Module, file "match.e", contains all the procedures
required to update and maintain the data structure
"tokenstore" in the Match Unit (refer to figure 5.7). These procedures
are called only from the Match Unit in the main module
and they are always called in the context of a particular
instruction.
Procedure Enter_all_into_FQ is called when the newly arrived
data token has caused the enabling count of the instruction to go
to zero. The procedure takes the data token along with the other
token(s) already stored in tokenstore for instruction m, and
enters them as one packet in the Fetch Queue. It then must
remove the iteration number record from token store. If there
are no other iteration numbers entered under that CID, it must
also remove the CID record.
Procedure Find_CID is called to find an incoming data
token's CID in token store; i.e. to see if there is an entry
under the CID of the token. It is given the code block number of
an incoming data token and looks in the tokenstore for an entry
for that particular CID. If there is one, a pointer to the list
of IID records is returned along with a flag set to true; otherwise
the flag is set to false, meaning the CID is not in token
store.
Procedure Find_IID is called when a new token has arrived
for which a CID record has already been found in token store. It
now wants to know if there is already an entry under the IID of
the token. The procedure is given the iteration number of the
incoming data token and a pointer to the correct CID record and
it looks in token store for an entry under the IID of the token.
If one is found, a pointer to the correct IID record is returned
along with a flag set to true; otherwise the flag is set to
false.
Procedure Add_to_tokenstore simply enters a token into the
token store under the correct CID and IID. The CID and IID
records may have to be created at this point.
Procedure Enter_IID is called when a token has arrived for
which it has already been established that there is no IID record
in the list matching the token's IID; i.e. this is the first
token to arrive with this particular IID. The procedure is given
an incoming data token and a pointer to the list of IID records
under the correct CID. A record for the IID of the token is
entered in the list, and a pointer to the token is entered in
the IID record.
Procedure Enter_CIDIID is called when it has been established
that there is no CID record in token store which matches
the CID of an incoming token. (Therefore there also is no IID
record.) CID and IID records are entered into token store along
with a pointer to the data token under the appropriate instruction.
5.4.4. Proc Module
The Proc module, file "proc.e", contains all the procedures
for executing the data flow machine language statements. The
execution of a statement (with the exception of the halt and output
statements) produces one or more data tokens which are then
sent in a packet to the Match Queue.
If, during the execution of an instruction, a data flow
error condition is detected, a flag is set which stops execution
of any more instructions. With no instructions being executed,
the production of data tokens is stopped. Once the Match Unit
clears all packets out of the Match Queue produced by previously
executed instructions, all action is stopped due to the lack of
packets in the system.
In Euclid, this is seen as the blocking of all processes
(Match Unit, Fetch Unit, and processors). This condition is
reported to the user, who then must press the delete key to end
the program.
Error conditions detected by the procedures executing
instructions are as follows:
(1) Tokens are not of the type expected. For example, the
instruction was coded to expect two integer inputs and
instead received two boolean input tokens.
(2) A token is missing for one or more input ports. There probably
were multiple tokens sent to a single port.
(3) Received more than one token for a port (non-fatal error for
output statement).
(4) Trying to print other than character or integer data in output
statement.
(5) No instruction address token received by a begin statement,
or no context control packet received by an end statement.
The above are all considered fatal error conditions which
stop execution except error 3 for an output statement. In this
case, only one of the values will be printed; the other is lost.
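The first three checks in the list above can be illustrated with a short Python function. This is a hypothetical sketch, not the procedure used in the Proc module: tokens are represented as (port, type, value) tuples, ports are numbered from 1, and the function name check_inputs and the wording of the messages are assumptions.

```python
def check_inputs(expected_types, tokens):
    """Return a list of error messages for one instruction's input tokens."""
    errors = []
    seen = {}                                    # port -> type of first token
    for port, token_type, _value in tokens:
        if port in seen:
            errors.append(f"more than one token for port {port}")
            continue
        seen[port] = token_type
    for port, expected in enumerate(expected_types, start=1):
        if port not in seen:
            errors.append(f"token missing for port {port}")
        elif seen[port] != expected:
            errors.append(
                f"port {port}: expected {expected}, got {seen[port]}")
    return errors


# Two tokens sent to port 1 leave port 2 empty, so both errors are reported.
problems = check_inputs(["integer", "integer"],
                        [(1, "integer", 3), (1, "integer", 4)])
```

The example shows why a missing token usually accompanies a duplicate: the token that should have gone to port 2 arrived at port 1 instead.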
5.4.5. Monitors
In addition to the modules described, the data flow simulator
contains five monitors. Monitors provide a convenient means
for guaranteeing mutual exclusion to a portion of the program and
data, and for blocking and waking up processes. They are
included in Concurrent Euclid, the language in which the
simulator is written. A complete description of the operation
and use of monitors is found in [Holt 1983].
The first monitor, the Qmgr monitor, manages the buffer
which actually holds the packets in the Match and Fetch queues.
The FQmgr, MQmgr and BufferMgr monitors are tied together in
their usage to maintain the queues. They are based on the
concept of producer/consumer pairs. A more detailed description of
their interaction is found in [Holt 1983]. The Process_Control
monitor controls the creation of processors.
5.4.5.1. Qmgr Monitor
The Qmgr monitor, file "Qmgr.e", adds packets to and deletes
them from the pool of buffers which the Match and Fetch Queues
reference. As mentioned before in section 5.3.3, these queues
actually share one single data structure, declared in this monitor
to be "queue" of type Q (an array of records). The only
access allowed to "queue" is through this monitor; no other
procedures can access the data structure.
5.4.5.2. FQmgr Monitor
The FQmgr monitor, file "FQmgr.e", is used to enter and
remove packets from the Fetch Queue. Refer to section 5.3.3 for
a complete description of the data structures making up the Fetch
Queue.
Procedure FEnter accepts a buffer number of the shared array
"queue" from the Match Unit and enters that number in the array
"Fbuffer", which is an array of integers managed as a queue. By
entering a buffer number into this queue, a packet has been
entered into the Fetch Queue. The actual packet must have been
entered into the array "queue" before the procedure FEnter was
called, through use of the monitor Qmgr.
Procedure FRemove removes the first buffer number from the
head of the Fetch queue (i.e. the array "Fbuffer") and passes it
back to the Fetch Unit, effectively removing a packet from the
queue. A copy of the information in the array "queue" must have
been acquired before calling FRemove.
5.4.5.3. MQmgr Monitor
The MQmgr monitor, file "MQmgr.e", works in exactly the same
way as the FQmgr monitor, except that it enters packets into the
Match Queue for the processors using procedure MEnter and removes
packets for the Match Unit using procedure MRemove.
5.4.5.4. BufferMgr Monitor
The BufferMgr monitor, file "bufmgr.e", contains the Acquire
and Release procedures which are called by the producers and
consumers of the Match and Fetch queues.
To enter a packet into the Match Queue, a processor must
first acquire a free buffer number in the array "queue" into
which to put the packet information. The consumer of the Match
Queue, the Match Unit, releases the buffer number after it has
removed a packet from the queue. A similar situation holds for
the Fetch Queue.
The list of available buffer locations is kept in an array
"pool", which is a stack.
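The interplay of the four queue monitors can be illustrated with a small single-threaded sketch. It is not the Concurrent Euclid code of the simulator: the class name PacketStore and the Python method names are hypothetical, and the blocking and mutual exclusion which the real monitors provide are left out. It shows only the bookkeeping: one shared array "queue", a stack "pool" of free buffer numbers, and queues that hold buffer numbers rather than packets.

```python
from collections import deque


class PacketStore:
    """Sketch of Qmgr, MQmgr, FQmgr and BufferMgr bookkeeping (names assumed)."""

    def __init__(self, size):
        self.queue = [None] * size                 # shared array "queue"
        self.pool = list(range(size - 1, -1, -1))  # free buffer numbers, a stack
        self.mbuffer = deque()                     # Match Queue: buffer numbers only
        self.fbuffer = deque()                     # Fetch Queue: buffer numbers only

    def acquire(self):
        """Take a free buffer number off the stack (BufferMgr Acquire)."""
        return self.pool.pop()

    def release(self, number):
        """Return a buffer number to the stack (BufferMgr Release)."""
        self.queue[number] = None
        self.pool.append(number)

    def m_enter(self, packet):
        """Producer side: store the packet first, then enqueue its number."""
        number = self.acquire()
        self.queue[number] = packet
        self.mbuffer.append(number)
        return number

    def m_remove(self):
        """Consumer side: copy the packet, dequeue its number, free the buffer."""
        number = self.mbuffer[0]
        packet = self.queue[number]   # copy acquired before the removal
        self.mbuffer.popleft()
        self.release(number)
        return packet
```

The Fetch Queue would be handled the same way with "fbuffer"; only the producer and the consumer differ.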
5.4.5.5. Process_Control Monitor
The Process_Control monitor, file "pcntrl.e", is the monitor
which controls the creation of processors when they are needed.
Procedure Spawn creates a processor; procedure Sip puts a
processor back to sleep when it has finished executing an instruction.
It works in the following manner.
When the simulation program begins to run, the processes in
the main module which work as the processors of the computer are
started and immediately call procedure Body_of_Processor in the
Proc module. This procedure immediately enters a loop and calls
procedure Sip, which puts the calling process to sleep on a
"wait" queue as a generic processor, waiting to be put to work.
When the Fetch queue receives a packet of data tokens and
the number of an instruction which is now to be executed, it
calls procedure Spawn with the opcode of the instruction, a copy
of the instruction itself, and the packet. This call results in
waking up a process and taking it off the "wait" queue. It is
given the opcode, instruction and packet, and returns to the
procedure Body_of_Processor in the Proc module, where it has now
become a processor for the specific opcode involved. The
instruction is then executed through procedures in the Proc
module.
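The Spawn and Sip protocol described above can be sketched with a condition variable standing in for the monitor's "wait" queue. This is only an illustration under stated assumptions: the Python names are hypothetical, pending work is held in a list if no processor happens to be asleep (the thesis does not say how that case is handled), and the None opcode used to shut the workers down is an addition for the sake of a terminating example.

```python
import threading
from collections import deque


class ProcessControl:
    """Sketch of the Process_Control monitor (method names assumed)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._work = deque()

    def sip(self):
        """Sleep as a generic processor until Spawn hands over work."""
        with self._cond:
            while not self._work:
                self._cond.wait()
            return self._work.popleft()

    def spawn(self, opcode, instruction, packet):
        """Wake one sleeping processor and give it an instruction."""
        with self._cond:
            self._work.append((opcode, instruction, packet))
            self._cond.notify()


def body_of_processor(control, execute):
    """Loop of a generic processor: sleep, execute, sleep again."""
    while True:
        opcode, instruction, packet = control.sip()
        if opcode is None:      # hypothetical shutdown signal, not in the thesis
            return
        execute(opcode, instruction, packet)
```

A fixed number of such worker threads, started once at the beginning of a run, plays the role of the processes in the main module.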
5.5. Trace Feature
In the case of execution error in a data flow program running
on the simulator, a trace capability has been built into the
simulation program to aid in finding where the error occurred.
The first line of any data flow program must contain the
string "trace" if a trace is desired, or "notrace" if it is not.
If "notrace" is selected, only the program output is written to
standard output; i.e. that output produced from an "output"
statement.
However, if "trace" is selected, the user is given a running
commentary of packets taken from the Match and Fetch Queues,
along with their destination instruction numbers. In this way,
the user can trace the number of packets produced for any
instruction, and which instructions have been enabled and passed
on to the Fetch Unit and processors. Then if an error does
occur, the user can pinpoint which data flow instruction caused
the error.
An example of a program trace is shown in Appendix E.
5.6. The Data Flow Language
The language executed by the simulator includes the following
operators:
+ - * /
absolute value
negate (unary)
modulo
logical and, or, not
if <, <=, >, >=, =, /=
input
output
L, LI, D, DI (used for loops)
T-gate, F-gate, switch
halt
activate, terminate, begin, end (for subprograms)
The graphical representation of these operators is shown in
section 5.6.1. A data flow program would first be written in the
graphical form, then written into a file in mnemonic form
(section 5.6.2) for input into an assembler. The assembler would
produce a machine language file (section 5.6.3), which is then
read and executed directly by the simulator.
The assembler has not been written at this point; it stands
as a project to be completed at some future time. It is a fairly
straightforward translation from mnemonics to machine code; the
translation is specified in section 5.6.5.
For purposes of testing the simulator, programs have been
written in graphical form and directly translated into machine
code. The advantage of using the assembler will be a built-in
check for certain programming errors, and error-free translation.
5.6.1. Graph Language
This section outlines the graphical representation of each
operator that has been implemented on the simulator. (Note that
the term "operator" in graph language is synonymous with the term
"instruction" in the written mnemonic or machine language form.)
Input and output ports are shown numbered in cases where there is
more than one port and the order of inputs and/or outputs is
critical to the correct execution of the program. The direction
of data flow on the arcs is indicated by arrows.
There is no "duplicate" instruction; if it is desired to
duplicate a token at several different locations, the token is
sent to multiple destinations directly from the instruction that
produced it. However, in data flow graph diagrams, the duplicate
operator (a small circle) will still be shown.
The symbol X also will be used in graph diagrams, although
there is no corresponding operator. This symbol was used in
[Arvind, Gostelow and Plouffe 1978] and represents a legal
merging of two lines where only one of the two lines will actually
receive a value.
Constants are represented as a number in a circle being
directed to an operator. The value of a program constant is
always available to the instruction; it has to wait only for the
non-constant data token to arrive for the instruction to be
enabled. The exception is a "start constant" which is a starting
condition for the program; i.e. it triggers the start of execution,
and it is the only input value required by an instruction
(see section 5.1). Start constants are not legal in subprograms,
only in the main program.
5.6.1.1. Arithmetic Operators
The arithmetic operators are addition, subtraction,
multiplication, truncating integer divide, modulo, absolute value and
unary negate (see figure 5.8).
[Figure: the binary operators +, -, *, / and mod each take inputs x and y and produce x+y, x-y, x*y, x/y and x mod y; abs and neg take x and produce |x| and -x.]
Figure 5.8 Arithmetic operators
Input:
+, -, *, /, mod: Two integer tokens, one each into
                 port 1 and port 2.
abs, neg:        One integer token into port 1.
Operation:
+, -, *, /, mod: (port 1) operator (port 2)
abs, neg:        operator (port 1)
The integer result of the arithmetic operation leaves from output
port 1.
There are no built-in checks for common arithmetic errors
such as exceeding maxint, going below minint, and division by
zero. If these errors occur, they are handled by the host
computer in its usual way.
Enabling Count:
Context control: 0
Data: +, -, *, /, mod   2
      abs, neg          1
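As an illustration of how an enabling count governs the firing of an arithmetic operator, the following sketch waits until the required number of data tokens is present and then produces the value for output port 1. The function and table names are hypothetical. The sign convention chosen for mod on negative operands is Python's and is only an assumption, since the text does not state the convention used by the simulator.

```python
import operator


def trunc_div(a, b):
    """Integer divide truncating toward zero (Python's // rounds down instead)."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


BINARY = {"+": operator.add, "-": operator.sub, "*": operator.mul,
          "/": trunc_div, "mod": operator.mod}
UNARY = {"abs": abs, "neg": operator.neg}


def fire(opcode, ports):
    """ports maps input port number to an integer value.

    Returns the value leaving output port 1, or None while the
    enabling count (2 for binary, 1 for unary operators) is not met.
    """
    needed = 2 if opcode in BINARY else 1
    if len(ports) < needed:
        return None
    if opcode in BINARY:
        return BINARY[opcode](ports[1], ports[2])
    return UNARY[opcode](ports[1])
```

As in the simulator, nothing here guards against division by zero; the error is simply raised by the host language.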
5.6.1.2. Logical Operators
The logical operators are "and", "or", and "not" (see figure
5.9).
Input:
and, or: Two boolean tokens, one each into ports 1 and 2.
not:     One boolean token into port 1.
[Figure: and and or each take inputs x and y and produce x and y, x or y; not takes x and produces not x.]
Figure 5.9 Logical operators
Operation:
and, or: (port 1) operator (port 2)
not:     not (port 1)
Enabling Count:
Context control: 0
Data: and, or   2
      not       1
5.6.1.3. Halt
Input: One token of any type. Type is not checked.
Operation: The input token is absorbed and no output is produced.
This instruction is actually a "sink" for a line of logic in a
program which is finished but yet produces a token which must be
sent somewhere. See Appendix D for an example of its use.
Enabling Count:
Context control: 0
Data: 1
5.6.1.4. Decider Operators
The Decider Operators make a decision based on a predicate
p. The possible values of p are <, <=, >, >=, = and /= (see
figure 5.10).
Input: Two integer tokens, one each into ports 1 and 2.
Figure 5.10 Decider operators
Operation: The integer inputs are compared according to the
predicate of the operator. One boolean value is the result of
the comparison, and it leaves on output port 1.
Enabling Count:
Context control: 0
Data: 2
5.6.1.5. Input
The input operator will read up to m values from file number
N (see figure 5.11). Values read can be integer or character.
The user can read from a maximum of five different files; the
file number must be between 1 and 5. If more than one input
statement specifies a certain file number, there is no guarantee
as to the order in which the lines are read from the file, due to
the asynchronous and non-sequential nature of a data flow
computer.
Input: One token of any type. The value has no significance and
is not looked at; it is merely a trigger to the execution of the
input statement.
Operation: The input token triggers the execution of the instruction,
which causes m integer values to be read from program argument
file N. Program argument files are file names which are
listed in the simulator run command after the file name which
contains the data flow program. For example, in the following
statement:
    % dfsim <programfile datafile1 datafile2
datafile1 defines file 1 and datafile2 defines file 2. Refer to
Appendix C for an example of a program using data files.
[Figure: input operator for file N with output ports 1 .. m, where
m = the number of values to be read (1 <= m <= 20) and
N = the file number (1 <= N <= 5).]
Figure 5.11 Input Statement
The first value read from the input file is sent from output
port 1, the second value from port 2, etc. That is, each
value read has its own destination; they are not sent as a group
to any one destination.
Enabling Count:
Context control: 0
Data: 1
5.6.1.6. Output
The output operator will cause the value of at most 20
tokens to be printed on standard output, along with an identifying
string of characters (see figure 5.12). Values can be
integer or character.
Input: From 1 to m integer or character tokens (m <= 20) arrive
on input ports 1 to m.
Operation: When input tokens have arrived on all m input arcs,
the operator is ready to execute. The string, which is included
as part of the instruction, is printed on standard output. Then
the values of all the tokens which were received at the input
ports are printed in order of input port number on standard
output. The values are printed on the same line as the string.
Integers are printed in a field of 10 spaces; characters in a
field of one space with no blanks on either side. No tokens are
produced on any output arcs.
[Figure: output operator with input ports 1 .. m and print string
'string', where 1 <= m <= 20 and string <= 20 characters.]
Figure 5.12 Output operator
Enabling Count:
Context control: 0
Data: m
5.6.1.7. Gate if true, Gate if false
These operators either pass on or absorb the input token,
depending on the value of a boolean control token (refer to
figure 5.13).
Input: One integer or character data token into port 1; one
boolean token into port 2.
Operation:
Gate if true: If the boolean value is true, the data token
is passed on from output port 1. If the boolean value is false,
the data token is absorbed and no output token is produced.
Gate if false: If the boolean value is false, the data
token is passed on from output port 1. If the boolean value is
true, the token is absorbed and no output token is produced.
Figure 5.13 Gate if true, Gate if false
Enabling Count:
Context control: 0
Data:
5.6.1.8. Switch
The switch operator causes data to flow from one output port
or the other depending on a boolean control value (see figure
5.14).
Input: One integer or character token into port 1; one boolean
token into port 2.
Figure 5.14 Switch Operator
Operation: If the input boolean token is true, then the data
token from input port 1 is sent from output port 1; otherwise it
is sent from output port 2.
Enabling Count:
Context control: 0
Data:
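The routing behaviour of the two gates and the switch can be summed up in a few lines. The sketch below is illustrative only; the function names are hypothetical, and each function returns a mapping from output port number to the data value sent from that port, so that an absorbed token appears as an empty mapping.

```python
def _check_control(control):
    """The control token on input port 2 must be boolean."""
    if not isinstance(control, bool):
        raise TypeError("control token must be boolean")


def t_gate(data, control):
    """Gate if true: pass the data token on port 1 only when control is true."""
    _check_control(control)
    return {1: data} if control else {}


def f_gate(data, control):
    """Gate if false: pass the data token on port 1 only when control is false."""
    _check_control(control)
    return {} if control else {1: data}


def switch(data, control):
    """Switch: output port 1 in the true case, output port 2 in the false case."""
    _check_control(control)
    return {1: data} if control else {2: data}
```

A switch therefore behaves like a T-gate and an F-gate fed with the same two tokens, except that both results belong to one instruction.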
5.6.1.9. Loops
Four operators are required to implement a loop: L, LI, D
and DI [Arvind and Gostelow 1982] (see figure 5.15).
L
Operator: This operator is required at the entry point to a
loop; it creates a new context for execution by giving the tag of
an input token a unique code block identifier and sets the iteration
number to 1.
Input: One boolean, integer or character input token with tag (C, i).
Operation: The input token is passed on from output port 2 to its
destination(s) with a new tag (C', 1), C' being the unique CID
dynamically assigned to the loop just entered. The IID = 1 since
this is the first iteration of the loop.
A context control token is sent from output port 1 to the LI
statement. The tag of the token is (C', 1); the old tag (C, i) is
carried as the data value.
Figure 5.15 Loop operators: L, LI, D, DI
Enabling Count:
Context control: 0
Data: 1
LI
Operator: This operator is required at the exit point of a
loop. It restores the tag of the data token to the value it
had on entry to the loop, which was (C, i).
Input:
Port 1: Context control token with new context as tag value
and old context stored as data value of the token.
Port 2: One boolean, integer or character data token with
tag of the new context.
Operation: The context control token is used only to restore the
tag of the data token to that of the old context. The context
control token is absorbed and not passed on. The data token from
input port 2 is passed on from output port 1 to its
destination(s) with the tag of the old context.
Enabling Count:
Context control: 1
Data: 1
D
Operator: The iteration count of a token in a loop has to be
incremented every time the token goes around the loop. The D
operator accomplishes this.
Input: One boolean, integer or character data token with tag (C', i).
Operation: The input token is passed on from output port 1 with
its data value unchanged but with its iteration number
incremented by one. In this way, tokens arriving at an instruction
within the loop but for different iterations can be matched up
correctly.
Enabling Count:
Context control: 0
Data: 1
DI
Operator: This operator is required at the exit point of a
loop, immediately before the LI instruction. It never receives
more than one token for any instantiation of a loop.
Input: One boolean, integer or character data token with tag (C', i).
Operation: The input token is sent from output port 1 to its
destination instruction (LI) with the iteration field of its tag set
to 1. Everything else remains the same.
Enabling Count:
Context control: 0
Data: 1
An example of a loop and the use of the L, LI, D and DI
operators is shown in figure 5.16. The code block and iteration
numbers are shown in parentheses. (C, i) is the context as the
token enters the loop. C' is the new code block number for the
loop domain; 2 is the last iteration of the loop.
Figure 5.16 Loop with tagged tokens
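The tag manipulation performed by the four loop operators can be followed in a short sketch. The Token record, the function names and the counter used to invent unique code block identifiers are all hypothetical; the simulator's own token layout is described elsewhere in this chapter. A token is represented here as (CID, IID, value).

```python
from itertools import count
from typing import NamedTuple


class Token(NamedTuple):
    cid: object    # code block identifier
    iid: int       # iteration identifier
    value: object  # data value


_new_cid = count(100)  # hypothetical source of unique CIDs


def op_L(tok):
    """Loop entry: returns (context control token, data token), both tagged (C', 1)."""
    new_cid = next(_new_cid)
    control = Token(new_cid, 1, (tok.cid, tok.iid))  # output port 1, old tag as data
    data = Token(new_cid, 1, tok.value)              # output port 2
    return control, data


def op_D(tok):
    """Going around the loop: iteration number incremented by one."""
    return tok._replace(iid=tok.iid + 1)


def op_DI(tok):
    """Just before LI: iteration number set back to 1."""
    return tok._replace(iid=1)


def op_LI(control, data):
    """Loop exit: restore the tag saved in the context control token."""
    old_cid, old_iid = control.value
    return Token(old_cid, old_iid, data.value)
```

Because DI resets the iteration number to 1, the data token leaving the loop carries the same tag (C', 1) as the context control token, which is what allows the two to be matched at LI.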
5.6.1.10. Apply Operator and Subprograms
The apply operator represents a subprogram call. Referring
to figure 5.17, a "symbolic" token carrying the name of the
subprogram to be executed is shown being sent to the apply operator.
This token is called symbolic because in the implementation
of apply in the simulator, there is no actual token used for this
purpose. The n input parameters to the subprogram arrive in
tokens at input ports 1 to n. There must be at least one input
parameter to a subprogram, if only to trigger its execution, and
at most 19 parameters. Input parameters can be of type boolean,
integer or character.
Figure 5.17 Apply Operator and Subprograms
The first statement of a subprogram is the begin operator,
with all input parameters to the subprogram being shown arriving
at ports 1 to n. The begin statement then sends the tokens to
the appropriate operators in the subprogram. All resulting
output values from the subprogram are directed to the end operator,
from which point they are assumed to be directed back to the
calling program.
This is a rather simplified picture of what is actually
happening in the application of a subprogram. However, it is an
adequate representation for writing graph programs, and a more
detailed description is deferred to section 5.6.3 on Machine
Language.
5.6.1.11. Completeness of Graph Language
With the instruction set described above, it is possible to
represent all control flow concepts which have been included in
high level languages, including high level data flow languages
such as ID [Arvind, Gostelow and Plouffe 1978]. The conditional
schema:
    if x then y else z
is shown in a data flow graph in figure 5.18. The while schema:
    while x do
        y
is also shown in figure 5.18. A for loop can be expressed as a
while schema:
    for i = 1 to 10 do      same as      i = 1
        f(x)                             while i <= 10 do
                                             f(x)
and a repeat statement is a while schema with the test at the
bottom of the loop instead of at the top. A case statement can
be expressed as a series or nest of conditionals.
[Figure: two panels, Conditional and While Schema.]
Figure 5.18 Conditional and While Schemas
5.6.2. Mnemonic Language
Suggestions for a possible mnemonic language are briefly
outlined in table 5.1. The mnemonic language is the statement
form of a graph program; i.e. the form in which a program would
be entered into a file for subsequent input into an assembler.
In the headings shown in the table, "# copies" refers to the
number of copies required of the token produced by the execution
of the statement. In this way, duplicate tokens are sent to
their destinations without the use of a duplicate operator. A
destination instruction number and input arc number must be
specified for each copy required.
Braces { } on both sides of a list mean that one item
from the list is to be used. Square brackets [ ] mean the items
inside are optional. One brace } with a number to the right
indicates the number of items in the list enclosed by the
brace.
Table 5.1. Mnemonic Language.
5.6.2.1. Form of Mnemonic Program
The first statement in any program is specification of a
trace. If a trace is desired, the first line should read
"trace"; otherwise "notrace".
The statements in the main program are numbered consecutively
from one with one exception. The statement number following
the apply statement must be two greater than the apply
statement number. That is, if the apply statement number is n,
then n+1 is unused and the next statement is number n+2. The
main program statements immediately follow the trace/notrace
statement. Constants used in the main program are specified
immediately after the numbered statements, in the same way they
are specified for machine language programs (see section 5.6.4).
Following the main program are the subprograms. The statements
in each subprogram are numbered from 0, starting with the
S/P statement. The second statement of a subprogram is the begin
statement; the last numbered statement is the end statement.
(Note there is no end statement for the main program.) Following
the numbered statements of each subprogram, the constant
statements for that subprogram are specified.
Refer to appendix C for an example of a program in mnemonic
form.
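The numbering rule for main program statements, with its gap after every apply statement, can be stated compactly. The function below is a hypothetical helper written only to illustrate the rule; it takes the mnemonics of the main program in order and returns them paired with their statement numbers.

```python
def number_main_statements(mnemonics):
    """Number statements from one, leaving n+1 unused after an apply at n."""
    numbered = []
    n = 1
    for mnemonic in mnemonics:
        numbered.append((n, mnemonic))
        # The number after an apply is reserved (see section 5.6.5).
        n += 2 if mnemonic == "apply" else 1
    return numbered


print(number_main_statements(["input", "apply", "output"]))
# prints [(1, 'input'), (2, 'apply'), (4, 'output')]
```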
5.6.3. Machine Language Statements
The form for machine language statements is as follows:
                  Enab. Cnt
instr                         File nbr or     input    nbr of
  #     opcode    CC   data   print string    types    dests
(1) Instruction number. In the machine language program, statements
must be numbered consecutively from one, with no
numbers skipped. Subprograms do not start over from one;
the statement numbers start with the next available number.
(2) Opcode. The opcode as listed in Appendix A.
(3) Enabling count. Two separate numbers are required, the
first for context control tokens and the second for data
tokens.
(4) File number or print string. This field applies only to
input and output statements. For an input statement, the
file number from which data is to be read should be shown
here. For an output statement, the string of characters to
be printed with the output values should be written here,
enclosed in single quotes. The string inside the quotes
must be a maximum of 20 characters.
(5) Input types. Twenty separate numbers are required here,
whether or not the statement can legally use that many input
tokens. The numbers correspond to the input port numbers of
the statements; i.e. the first input type corresponds to
port 1, the second to port 2, etc. The exception to this is
the input statement. The types for an input statement refer
to the type of the value which is to be read from the file.
Codes for input types are shown in Appendix B.
(6) Number of destinations. One integer is shown here,
representing the total number of destinations for all tokens
from all output ports of the instruction.
Entries in fields of a machine language statement must be
complete, separated by at least one blank, and in the correct
order. Other than that, the input can be spaced out in any form
and spread out over any number of lines. Refer to Appendix C for
specification of all machine language statements.
The specifications for the destinations of output tokens
produced by a statement follow the statement to which they apply.
They are entered in the following form:
    output port    type of data    instr. nbr.    input port
(1) Output port. The number of the output port from which the
token is produced. This number will be one in most cases.
The exceptions are:
switch      Tokens are produced from port 1 in the true case, port
            2 if false.
input       The first data value read goes out from port 1, the
            second from port 2, etc.
output      The number of destinations is zero since no tokens are
            produced by this statement.
L           The context control token leaves from port 1, the data
            token from port 2.
halt        The number of destinations is zero.
terminate,
end         These statements use as many output ports as there are
            parameters, e.g. parameter 1 leaves from port 1,
            parameter 2 from port 2, etc.
activate,
begin       These statements use one more output port than the
            number of parameters. The first output port is for
            either the instruction address or context control
            token, the rest are for the parameters.
(2) Type of data. The code for the type of the data which
leaves by this output port.
(3) Instruction number. The destination instruction number.
(4) Input port. The port number through which the token should
enter the destination instruction.
As with the machine language statements, the entries in the
above fields need only be separated by one blank. The destination
information for any one statement must, however, be ordered
by output port. Duplicate tokens can be sent to different
instructions by specifying the same output port and type but
different instruction numbers and input ports, in any order.
There is no machine language statement for a constant.
Program constants for the main program and subprograms are listed
together in one section immediately after the program, and start
constants for the main program and subprograms are listed
together after the program constants.
The apply operator for subprograms was introduced in a
previous section on graph language. In machine language, the
function of the apply operator is actually accomplished by two
separate operators: activate and terminate. (The terminate
operator must immediately follow activate in machine code.) The
activate, terminate, begin and end operators all work together to
allow tokens to be operated upon in the new context of the
subprogram (and with a new tag) and then to return to the correct
old context, thereby allowing more than one instantiation of a
subprogram at one time. The interaction of these operators is
shown in figure 5.19.
The n tokens which are input parameters to the subprogram
are the input to the activate operator. The activate operator
then sends a special token to the begin operator of the
subprogram, along with the tokens which are the input to the
subprogram. This special token is an "instruction address" token,
which is sent from output port 1. It contains as its data value
the address (statement number) of the terminate statement. The n
input tokens to the subprogram are sent from output ports 2 to
(n+1); i.e. the token which arrived at port 1 leaves from port 2,
and so on. These tokens are all sent with the same tag that they
had when they arrived at the activate statement. There must be
at least one input token to a subprogram since without it the
execution of the subprogram would not be triggered. There can be
at most 19 input parameters.
The begin statement, upon receiving these tokens which are
sent all in one packet, changes the context to allow execution of
the subprogram under its own unique context. The instruction
address token is changed to a context control token, the tag is
changed to the new tag, and the old tag as well as the address of
the terminate statement are both sent as data values to the end
statement from output port one. Input data tokens (the
parameters) are sent to their destinations in the body of the
subprogram with the new tag.
Figure 5.19 Subprogram activation
The end operator waits to receive the context control token
and y data tokens with the same tag as the context control token.
When they are all available, the tags of the data tokens are
changed to the old tag which has been saved in the context
control token. Then they are all sent in one packet to the address
of the terminate operator which is also contained in the context
control token. The data tokens are sent from a port number which
is one less than the port number at which they arrived; i.e., a
data token which arrived at input port 2 is sent from output port
1.
The terminate operator is enabled upon receiving this one
packet containing all the data tokens. This operator has all the
information as to where these returned values are to be sent in
the calling program; they are sent in separate packets to their
respective destination instruction(s).
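The passage of tokens through activate, begin and end can be traced with the following sketch. It is a simplification under stated assumptions: a packet is represented as a mapping from port number to a (tag, value) pair, the function names and the counter that invents new context identifiers are hypothetical, and the new tag is taken to be (C', 1) as suggested by figure 5.19.

```python
from itertools import count

_new_cid = count(1)  # hypothetical source of unique subprogram contexts


def activate(params, terminate_addr, tag):
    """params maps input port -> value. Port 1 carries the instruction address token."""
    out = {1: (tag, terminate_addr)}
    for port, value in params.items():
        out[port + 1] = (tag, value)   # parameter from port k leaves on port k+1
    return out


def begin(packet):
    """Switch to a new tag; port 1 becomes a context control token."""
    old_tag, terminate_addr = packet[1]
    new_tag = ("S%d" % next(_new_cid), 1)
    out = {1: (new_tag, (old_tag, terminate_addr))}
    for port, (_, value) in packet.items():
        if port != 1:
            out[port] = (new_tag, value)
    return out


def end(packet):
    """Restore the old tag and address the results to the terminate statement."""
    _, (old_tag, terminate_addr) = packet[1]
    results = {}
    for port, (_, value) in packet.items():
        if port != 1:
            results[port - 1] = (old_tag, value)  # port k leaves on port k-1
    return terminate_addr, results
```

The terminate operator would then forward each entry of "results" to its destination in the calling program.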
The enabling counts for these four instructions are summarized
in table 5.2.
             Enabling Count
             Context
Operator     Control     Data
activate        0          x
terminate       0          1
begin           0          1
end             1          y
Table 5.2. Enabling counts of subprogram operators.
Refer to Appendix C for an example of a program written in
machine language.
5.6.4. The Machine Language Program
The machine language program must be entered into a file in
a specific form, as outlined below.
(1) line 1: The specification of the trace feature: "trace" or
"notrace".
(2) line 2: One integer specifying the number of machine
language statements in the program. Since statements are
numbered consecutively, this number will be the same as the
statement number of the last statement in the program,
including subprograms.
(3) starting at line 3: the machine language statements,
immediately followed by the destination information for each
statement.
(4) An integer specifying the number of unique program constants
used throughout the program.
(5) Specification of program constants and their destinations,
as follows:
    value of     nbr of
    constant     destinations     instruction     arc
       N              m                i           k
As shown, one constant can be used as input to several
instructions. Refer to the example in figure 5.20.
(6) An integer specifying the number of start constants in the
program.
(7) Specification of start constants and their destinations.
They are entered in the same form as program constants, in
(5) above.
If a program contains these operators:
[Figure: instr 4 with inputs x and 5, instr 22 with inputs 5 and y, instr 10 with inputs z and 5.]
The constant 5 can be specified as follows:
    5    3    4    2
             10    2
             22    1
Figure 5.20 Specification of constants
Refer to Appendix C for an example of a complete program.
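Since the fields of a constant specification are separated only by blanks and may be spread over several lines, the section can be read as a flat stream of numbers. The following sketch parses the example of figure 5.20 in that way. The function name is hypothetical, and the sketch assumes integer constants only; how character or boolean constants would be written is not shown here.

```python
def parse_constants(text, n_constants):
    """Return {constant value: [(instruction, arc), ...]} from a constant section."""
    numbers = iter(int(field) for field in text.split())
    table = {}
    for _ in range(n_constants):
        value = next(numbers)
        n_dests = next(numbers)
        dests = []
        for _ in range(n_dests):
            instruction = next(numbers)
            arc = next(numbers)
            dests.append((instruction, arc))
        table[value] = dests
    return table


spec = """5 3 4 2
          10 2
          22 1"""
print(parse_constants(spec, 1))
# prints {5: [(4, 2), (10, 2), (22, 1)]}
```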
5.6.5. The Assembler
As stated earlier, the assembler would read a file containing
a data flow program in mnemonic form and produce a file
containing the program in machine language form, which is directly
executable by the simulator.
Translation of mnemonic statements into machine language is
fairly straightforward and is done line for line, with the
exception of the apply and subprogram statements. Mnemonic opcodes
and types are translated to numeric codes, and the total number
of copies specified becomes the number of destinations.
The translation is best done in two passes. On the first
pass, the "S/P" statements are dropped, and subprogram statements,
beginning with the "begin" statement, are renumbered using
a relocation factor (subprogram statement number + number of last
statement before the S/P statement). All references (destination
specifications) within the subprogram are changed in the
same way. A symbol table is created with the name of the
subprogram and the number of its begin statement.
On the second pass, apply operators are translated into two
operators: the "activate" and "terminate" operators (the reason
for leaving the statement number following "apply" unused). The
activate statement looks up the subprogram name in the symbol
table to get the statement number to use for its destination
information.
Constants specified for the main program and subprograms
must be combined into one group in the machine language program.
One way to do this would be to enter all constants and their
destination instruction numbers in a table on the first pass,
using relocated instruction numbers for the subprograms. There
would be one entry for each unique constant, with all the
destinations specified for that constant throughout the program.
After the second pass, the information in the table could be
printed out in the correct form for constant specification. Start
constants would, of course, have to be kept separate from program
constants and printed in a separate section after the program
constants.
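The constant table described above could look like the following
Python sketch. It is a hedged illustration under stated
assumptions: the function names are hypothetical, the output
layout (a count of unique constants, then one entry per constant
in the form of Figure 5.20) is inferred from the listings in the
appendices, and every constant is assumed to have at least one
destination. A second table of the same kind would hold the start
constants.

```python
def add_constant(table, value, destinations, offset=0):
    """Record destinations for `value`, relocating instruction numbers by `offset`."""
    entry = table.setdefault(value, [])
    entry.extend((instr + offset, port) for instr, port in destinations)


def format_constants(table):
    """Return the count of unique constants followed by one entry per constant."""
    lines = [str(len(table))]
    for value, dests in table.items():
        first_instr, first_port = dests[0]
        # First line: value, number of destinations, first destination.
        lines.append(f"{value} {len(dests)} {first_instr} {first_port}")
        # Remaining destinations, one per line.
        lines.extend(f"{instr} {port}" for instr, port in dests[1:])
    return lines


program_constants = {}
add_constant(program_constants, 5, [(4, 2), (10, 2)])   # main program
add_constant(program_constants, 5, [(2, 1)], offset=20)  # subprogram, relocated
print("\n".join(format_constants(program_constants)))
```

Because the table is keyed on the constant value, uses of the
same constant in the main program and in a subprogram collapse
into a single entry.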
An example of a translation from mnemonics to machine
language is shown in Appendix C.
CHAPTER 6
CONCLUSIONS
Programming the simulator in Euclid has worked out well in
many ways. For example, the "pseudo-concurrency" of Euclid has
allowed an accurate simulation of how a data flow computer might
actually operate. However, it has had its drawbacks, one being
the lack of helpful diagnostics when a run-time error occurs.
Also, because files are not closed after a run-time error, no
readable diagnostic information is available when output from
the various processes is written into separate files. This was
partially overcome by the use of the "trace" feature, which
allows a user of the simulator to watch execution of the program
and know exactly which data flow instructions generated the
error.
Because reals are not implemented in Euclid, and therefore not
in the simulator, and because integers have a maximum value of
32767, the use and testing of the simulator have been limited.
Many "real life" applications which would have been interesting
due to their high degree of concurrency could not be programmed
because of these limitations; e.g., integration of a function f
by the trapezoidal rule [Arvind and Gostelow 1982].
The next immediate step to aid in the use of the data flow
simulator is to write the assembler to convert mnemonic programs
to machine language; this would simplify the writing of test
programs.
There is also work which could be done in the area of
implementing data structures. At the moment, only single-valued
variables can be used as data tokens, which again limits the
applications of the simulator. The system could be expanded to
allow data tokens to carry a pointer to a whole structure, or to
certain parts of it. Work in the area of I-structures [Arvind and
Thomas 1980] would also be very valuable for future real-life
applications of a data flow system.
At the level of graph language, the capability to write a
program on a CRT using graphic symbols and having that program
automatically translated into machine language would be useful.
Efforts could also be made to optimize data flow graphs before
they are converted to machine language. Research is being done
at Manchester on a higher level macro-assembly language.
Implementing compilers for sequential programming languages
and high level data flow languages is also a subject under
research today. A similar compiler could be written to produce
machine language for this simulator.
Study could be given to the features required in a high
level data flow language, with regard to the types of
applications which would lend themselves best to the special
capabilities of a data flow computer, such as resource
management. A study of various types of languages could also be
undertaken, including nondeterministic, deterministic,
functional, and single-assignment languages, and their
implementation using a data flow computer.
BIBLIOGRAPHY
[Ackerman 1982]
Ackerman, W. B. "Data Flow Languages," Computer 15
(February 1982), 15-25.
[Agerwala and Arvind 1982]
Agerwala, T., and Arvind. "Data Flow Systems," Computer 15
(February 1982), 10-13.
[Arvind and Gostelow 1975]
Arvind and Gostelow, K. P. "A New Interpreter for
Dataflow Schemas and its Implications for Computer
Architecture," Technical Report 72, Department of
Information and Computer Science, University of
California, Irvine, October 1975.
[Arvind and Gostelow 1977]
Arvind and Gostelow, K. P. "A Computer Capable of
Exchanging Processors for Time," Proceedings IFIP
Congress (1977), 849-854.
[Arvind and Gostelow 1978]
Arvind and Gostelow, K. P. "Dataflow Computer
Architecture: Research and Goals," Technical Report 113,
Department of Information and Computer Science,
University of California, Irvine, February 6, 1978.
[Arvind and Gostelow 1982]
Arvind and Gostelow, K. P. "The U-Interpreter,"
Computer 15 (February 1982), 42-49.
[Arvind, Gostelow and Plouffe 1978]
Arvind, Gostelow, K. P., and Plouffe, W. "An
Asynchronous Programming Language and Computing Machine,"
Technical Report 114a, Department of Information and
Computer Science, University of California, Irvine,
December 1978.
[Arvind and Kathail 1981]
Arvind and Kathail, V. "A Multiple Processor Data Flow
Machine That Supports Generalized Procedures," Eighth
Annual Symposium on Computer Architecture, Minneapolis,
Mn., 12-14 May 1981 (New York: IEEE 1981), 291-302.
[Arvind and Thomas 1980]
Arvind and Thomas, R. E. "I-Structures: An Efficient
Data Type for Functional Languages," Report
MIT/LCS/TM-210, Laboratory for Computer Science,
M.I.T., September 1980.
[Backus 1978]
Backus, J. "Can Programming be Liberated from the von
Neumann Style? A Functional Style and its Algebra of
Programs," Communications of the ACM 21, 8 (August
1978), 613-641.
[Clark 1973]
Clark, B. "A Speed-Independent Implementation of Data
Flow Schemas," Computation Structures Group Memo 82,
Project MAC, M.I.T., June 1973.
[Davis 1978a]
Davis, A. L. "The Architecture and System Method of
DDM1: A Recursively Structured Data Driven Machine,"
Proceedings 5th Annual Symposium on Computer Architecture
(Palo Alto, California, April 3-5) ACM, New York, 1978,
210-215.
[Davis 1978b]
Davis, A. L. "Data Driven Nets: A Maximally
Concurrent, Procedural, Parallel Process Representation
for Distributed Control Systems," Technical Report
UUCS-78-108, Department of Computer Science, University
of Utah, July 1978.
[Davis 1979]
Davis, A. L. "A Data Flow Evaluation System Based on
the Concept of Recursive Locality," Proceedings 1979
National Computer Conference (New York, New York, June
4-7), volume 48, AFIPS Press, Arlington, Va., 1979,
1079-1086.
[Davis and Keller 1982]
Davis, A. L., and Keller, R. M. "Data Flow Program
Graphs," Computer 15 (February 1982), 26-41.
[Dennis 1974]
Dennis, J. B. "On Storage Management for Advanced
Programming Languages," Computation Structures Group Memo
109-1, Project MAC, M.I.T., October 1974 (revised
November 1, 1974).
[Dennis 1975]
Dennis, J. B. "First Version of a Data Flow Procedure
Language," MAC Technical Memorandum 61, Project MAC,
M.I.T., May 1975.
[Dennis 1977]
Dennis, J. B. "A Language Design for Structured
Concurrency," Computation Structures Note 28-1, Laboratory
for Computer Science, M.I.T., February 1977.
[Dennis 1979]
Dennis, J. B. "The Varieties of Data Flow Computers,"
Proceedings First International Conference on Distributed
Computing Systems (Toulouse, France, October 1979),
438-439.
[Dennis 1980]
Dennis, J. B. "Data Flow Supercomputers," Computer 13
(November 1980), 48-56.
[Dennis, Boughton, and Leung 1980]
Dennis, J. B., Boughton, G. A., and Leung, C. K. C.
"Building Blocks for Data Flow Prototypes," Seventh
Annual Symposium on Computer Architecture Conference
Proceedings, SIGARCH Newsletter Volume 8 Number 3, May
6-8, 1980, 1-8.
[Dennis and Misunas 1974]
Dennis, J. B., and Misunas, D. P. "A Computer
Architecture for Highly Parallel Signal Processing,"
Computation Structures Group Memo 108, Project MAC, M.I.T.,
August 1974.
[Dennis and Misunas 1975]
Dennis, J. B., and Misunas, D. P. "A Preliminary
Architecture for a Basic Data Flow Processor," Proceedings
2nd Annual Symposium on Computer Architecture
(Houston, Texas, January 20-22), IEEE, New York, 1975,
126-132.
[Dennis, Misunas and Leung 1977]
Dennis, J. B., Misunas, D. P., and Leung, C. K. "A
Highly Parallel Processor Using a Data Flow Machine
Language," Computation Structures Group Memo 134,
Laboratory for Computer Science, M.I.T., January 1977.
[Friedman and Wise 1976]
Friedman, D. P., and Wise, D. S. "The Impact of
Applicative Programming on Multiprocessing," Technical
Report No. 52, Computer Science Department, Indiana
University, Bloomington, July 1976.
[Gajski, Padua, Kuck and Kuhn 1982]
Gajski, D. D., Padua, D. A., Kuck, D. J., and Kuhn, R.
H. "A Second Opinion on Data Flow Machines and
Languages," Computer 15 (February 1982), 58-69.
[Gostelow and Thomas 1979]
Gostelow, K. P., and Thomas, R. E. "A View of
Dataflow," Proceedings National Computer Conference
(New York, New York, June 4-7), Volume 48, AFIPS Press,
Arlington, Va., 1979, 629-636.
[Gurd and Watson 1977]
Gurd, J., and Watson, I. "A Multilayered Data Flow
Computer Architecture," Proceedings 1977 International
Conference on Parallel Processing (August 1977), 94.
[Holt 1983]
Holt, R. C. Concurrent Euclid, the Unix System, and
Tunis, Addison-Wesley Publishing Company, Inc., USA,
1983.
[Ho and Irani 1983]
Ho, L. Y., and Irani, K. B. "An Algorithm for
Processor Allocation in a Dataflow Multiprocessing
Environment," Proceedings 1983 International Conference on
Parallel Processing, IEEE Computer Society Press,
Silver Spring, Maryland, August 1983, 338-340.
[Hogenauer, Newbold and Inn 1982]
Hogenauer, E. B., Newbold, R. F., and Inn, Y. J. "DDSP
- A Data Flow Computer for Signal Processing,"
Proceedings 1982 International Conference on Parallel
Processing (August 1982), 126-133.
[Keller 1977]
Keller, R. M. "Semantics of Parallel Program Graphs,"
Technical Report UUCS-77-110, Department of Computer
Science, University of Utah, July 1977.
[Keller 1980]
Keller, R. M. "Divide and CONCer: Data Structuring in
Applicative Multiprocessing Systems," Proceedings Lisp
Conference, August 1980, 196-202.
[Keller, Lindstrom and Patil 1978]
Keller, R. M., Lindstrom, G., and Patil, S. "An
Architecture for a Loosely-Coupled Parallel Processor,"
Technical Report UUCS-78-105, Department of Computer
Science, University of Utah, October 1978.
[Keller, Lindstrom and Patil 1979]
Keller, R. M., Lindstrom, G., and Patil, S. "A
Loosely-Coupled Applicative Multi-Processing System,"
Proceedings National Computer Conference, AFIPS Press,
New Jersey, 1979, 613-622.
[Keller and Yen 1981]
Keller, R. M., and Yen, W. J. "A Graphical Approach to
Software Development Using Function Graphs," Digest of
Papers Compcon Spring 81, February 1981, 156-161.
[Leler 1983]
Leler, W. "A Small, High-Speed Dataflow Processor,"
Proceedings 1983 International Conference on Parallel
Processing, IEEE Computer Society Press, Silver Spring,
Maryland, August 1983, 341-343.
[Lerner 1984]
Lerner, E. J. "Data Flow Architecture," IEEE Spectrum,
April 1984, 57-62.
[Leung 1975]
Leung, C. K. C. "Formal Properties of Well-Formed Data
Flow Schemas," MAC Technical Memorandum 66, Project
MAC, M.I.T., June 1975.
[Litvin 1983]
Litvin, Y. "Top Down Data Flow Programming,"
Proceedings 1983 International Conference on Parallel
Processing, IEEE Computer Society Press, Silver Spring,
Maryland, August 1983, 252-254.
[Myers 1982]
Myers, G. J. Advances in Computer Architecture, John
Wiley & Sons, New York, 1982.
[Plas, A. et al 1976]
Plas, A., et al. "LAU System Architecture: A Parallel
Data-Driven Processor Based on Single Assignment,"
Proceedings 1976 International Conference on Parallel
Processing (August 1976), 293-302.
[Rumbaugh 1975]
Rumbaugh, J. E. A Parallel Asynchronous Computer
Architecture for Data Flow Programs, Ph.D. Thesis,
Department of Electrical Engineering and Computer
Science, M.I.T., May 1975.
[Rumbaugh 1977]
Rumbaugh, J. E. "A Data Flow Multiprocessor," IEEE
Transactions on Computers C-26, 2 (February 1977),
138-146.
[Schwartz 1980]
Schwartz, J. T. "Ultracomputers," ACM Transactions on
Programming Languages and Systems, Vol. 2, No. 4,
October 1980, 484-521.
[Sharp 1980]
Sharp, J. A. "Some Thoughts on Data Flow
Architectures," Computer Architecture News 6 (15 June 1980),
11-21.
[Srini 1981]
Srini, V. P. "An Architecture for Extended Abstract
Data Flow," Eighth Annual Symposium on Computer
Architecture, Minneapolis, Mn., 12-14 May 1981 (New York:
IEEE 1981), 303-325.
[Syre, Comte and Hifdi 1977]
Syre, J. C., Comte, D., and Hifdi, N. "Pipelining,
Parallelism and Asynchronism in the LAU System,"
Proceedings 1977 International Conference on Parallel
Processing (August 1977), 87-92.
[Tanenbaum 1981]
Tanenbaum, A. S. Computer Networks, Englewood Cliffs,
New Jersey: Prentice-Hall, Inc., 1981.
[Todd 1982]
Todd, K. W. "Function Sharing in a Static Data Flow
Machine," Proceedings 1982 International Conference on
Parallel Processing (August 1982), 137-139.
[Treleaven 1979]
Treleaven, P. C. "Exploiting Program Concurrency in
Computing Systems," Computer 12, 1 (January 1979),
42-49.
[Treleaven 1980]
Treleaven, P. C. (Ed.) "VLSI: Machine Architecture and
Very High Level Languages," SIGARCH Computer
Architecture News 8 (15 December 1980), 27-38.
[Treleaven 1983]
Treleaven, P. C. "The New Generation of Computer
Architecture," Tenth Annual International Conference on
Computer Architecture Conference Proceedings,
Stockholm, Sweden, 13-16 June 1983 (New York: IEEE 1983),
402-409.
[Treleaven, Brownbridge and Hopkins 1982]
Treleaven, P. C., Brownbridge, D. R., and Hopkins, R.
P. "Data Driven and Demand Driven Computer
Architecture," Computing Surveys 14 (March 1982), 93-143.
[Watson and Gurd 1979]
Watson, I., and Gurd, J. "A Prototype Data Flow
Computer with Token Labelling," Proceedings National
Computer Conference, AFIPS Press, New Jersey, 1979,
623-628.
[Woo and Agrawala 1983]
Woo, N. S., and Agrawala, A. "The DC1 Flow Schema with
the Data/Control-Driven Evaluation," Proceedings 1983
International Conference on Parallel Processing, IEEE
Computer Society Press, Silver Spring, Maryland, August
1983, 244-251.
APPENDIX A
Opcodes
Operator     Code
abs          1
neg          2
input        3
not          4
halt         5
output       6
L            7
LI           8
D            9
DI           10
+            11
-            12
*            13
/            14
mod          15
and          16
or           17
Tgate        18
Fgate        19
switch       20
if <         21
if <=        22
if >         23
if >=        24
if =         25
if /=        26
begin        27
end          28
activate     29
terminate    30
APPENDIX B
Data Type Codes
Data Type              Code
no data                0
boolean                1
integer                2
character              3
instruction address    4
pointer                5
context control        7
APPENDIX C
[Table: machine language statement format for each operator. The
numbered notes (1) through (8) below refer to entries in this
table.]
(1) Types are listed for as many variables as will be read from
the file. The rest should be entered as 0. Note that types
refer to the variables to be read from a file, rather than
actual input to the statement as is the case for the other
opcodes. The actual input to the statement which triggers
its execution can be of any type and its type is not noted
in the statement.
(2) Types are listed for as many variables as will be printed in
the output statement (same number as enabling count). The
rest should be entered as 0.
(3) Both types must be the same.
(4) There must be at least one destination specified for each
output port. Destinations for output port 1 must be given
before those for port 2.
(5) Types are listed for k variables which will be passed to the
subprogram as parameters (1 <= k <= 19).
(6) Types are listed for m variables which will be passed back
from the subprogram (1 <= m <= 19). The enabling count
will not match this number. The enabling count is 1 since
all data tokens are sent in one packet to the terminate
statement.
(7) Types are listed for k variables which will be passed to the
subprogram as parameters. The number of types will not
equal the enabling count since all tokens are sent at once
in one packet. First data type shown must be a 4.
(8) Types are listed for m variables which are passed back to
the calling program, plus the first data type which refers
to the context control token and must be a 7.
The graph program shown below reads four integers from one
input file, multiplies the sum of the first and second by the sum
of the third and fourth integers, and outputs the product along
with the identifying comment "program result =". The numbers in
parentheses to the left of each operator indicate the statement
number in the mnemonic and machine language programs which will
contain this particular operator.
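[Figure: data flow graph. The input operator (5) reads from input
file 1 and feeds the two + operators (1) and (2); their sums feed
the * operator (3), whose product goes to the output operator (4)
with the comment 'program result ='.]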
This same graph program is shown below in mnemonic form:
notrace
1 + 1 3,1
2 + 1 3,2
3 * 1 4,1
4 output 1 'program result =' int
5 input 4 1 int 1 1,1
            int 1 1,2
            int 1 2,1
            int 1 2,2
1
1 1 5 1
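The data-driven execution of this program can be illustrated
with a short Python sketch. It is not the simulator and makes
several simplifying assumptions: only statements 1 through 4 are
modelled, the input statement (5) is replaced by a list of four
hypothetical input values, each operator has a single
destination, and tokens are processed from one queue rather than
by concurrent processes. The names PROGRAM and run are
hypothetical.

```python
import operator

# Statement number -> (function, destination (instruction, port)),
# transcribed from statements 1-3 of the mnemonic program.
PROGRAM = {
    1: (operator.add, (3, 1)),
    2: (operator.add, (3, 2)),
    3: (operator.mul, (4, 1)),
}


def run(inputs):
    a, b, c, d = inputs
    # Statement 5 (input) sends the four values to statements 1 and 2.
    tokens = [((1, 1), a), ((1, 2), b), ((2, 1), c), ((2, 2), d)]
    waiting = {}  # instruction -> {port: value} for tokens not yet matched
    while tokens:
        (instr, port), value = tokens.pop(0)
        if instr == 4:  # statement 4 is the output operator
            return value
        operands = waiting.setdefault(instr, {})
        operands[port] = value
        if len(operands) == 2:  # enabling count reached: the instruction fires
            func, dest = PROGRAM[instr]
            tokens.append((dest, func(operands[1], operands[2])))
            del waiting[instr]
    return None


print("program result =", run([1, 2, 3, 4]))
```

An instruction fires only when tokens have arrived on both of its
input ports, which mirrors the enabling count of 2 on the + and *
statements.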
The same program is again shown below in its machine
language form. This is the form that the simulator will accept
for execution.
notrace
5
1 11 0 2 22000000000000000000 1
1 2 3 1
2 11 0 2 22000000000000000000 1
1 2 3 2
3 13 0 2 22000000000000000000 1
1 2 4 1
4 6 0 1 'program result =' 20000000000000000000
5 3 0 1 1 22220000000000000000 4
1 2 1 1
2 2 1 2
3 2 2 1
4 2 2 2
0
1
1 1 5 1
APPENDIX D
Graph Program to compute factorials from 1! to 7! using
recursion.
[Figure: data flow graphs of the Main Program and the Factorial
Subprogram.]
Machine Language form of the factorial program:
notrace
20
1 7 0 1 20000000000000000000 3
1 7 7 1
2 2 5 1
2 2 2 1
2 22 0 2 22000000000000000000 1
1 1 5 2
3 11 0 2 22000000000000000000 1
1 2 4 1
4 9 0 1 20000000000000000000 2
1 2 5 1
1 2 2 1
5 20 0 2 21000000000000000000 4
1 2 3 1
1 2 9 1
1 2 10 1
2 2 6 1
6 10 0 1 20000000000000000000 1
1 2 7 2
7 8 1 1 72000000000000000000 1
1 2 8 1
8 5 0 1 20000000000000000000 0
9 6 0 2 'loop #, factorial:' 22000000000000000000
10 29 0 1 20000000000000000000 2
1 4 12 1
2 2 12 2
11 30 0 1 20000000000000000000 1
1 2 9 2
12 27 0 1 42000000000000000000 3
1 7 20 1
2 2 14 1
2 2 13 1
13 25 0 2 22000000000000000000 1
1 1 14 2
14 20 0 2 21000000000000000000 3
1 2 15 2
2 2 16 1
2 2 17 1
15 11 0 2 22000000000000000000 1
1 2 20 2
16 13 0 2 22000000000000000000 1
1 2 20 2
17 12 0 2 22000000000000000000 1
1 2 18 1
18 29 0 1 20000000000000000000 2
1 4 12 1
2 2 12 2
19 30 0 1 20000000000000000000 1
1 2 16 2
20 28 1 1 72000000000000000000 1
1 2 0 1
3
7 1 2 2
0 1 13 2
1 3 15 1
17 2
3 2
1
1 1 1 1
Output of the factorial program:
Read the program.
Start execution.
loop #, factorial: 1 1
loop #, factorial: 2 2
loop #, factorial: 3 6
loop #, factorial: 4 24
loop #, factorial: 5 120
loop #, factorial: 6 720
loop #, factorial: 7 5040
APPENDIX E
Trace of the program shown in Appendix C:
Read the program.
Finished reading in program.
open file 1
Echo print of program:
1 11 0 2 22000000000000000000
1 2 3 1
2 11 0 2 22000000000000000000
1 2 3 2
3 13 0 2 22000000000000000000
1 2 4 1
4 6 0 1 program result = 20000000000000000000
5 3 0 1 1 22220000000000000000
1 2 1 1
2 2 1 2
3 2 2 1
4 2 2 2
End of Program. Number of statements = 5
Constants:
Starting constants:
1 1 5 1
Start execution.
started p1
started p2
started p4
started p5
started p6
started p7
started p8
started p9
started process match
Started Fetch Unit
started p3
started p10
Match Unit: Got packet for instruction 5
Fetch Unit: Got instruction 5
Match Unit: Got packet for instruction 1
Match Unit: Got packet for instruction 1
Fetch Unit: Got instruction 1
Match Unit: Got packet for instruction 2
Match Unit: Got packet for instruction 2
Fetch Unit: Got instruction 2
Match Unit: Got packet for instruction 3
Match Unit: Got packet for instruction 3
Fetch Unit: Got instruction 3
Match Unit: Got packet for instruction 4
Fetch Unit: Got instruction 4
program result = 21
APPENDIX F
Error conditions detected by the procedures executing
instructions are as follows:
(1) Tokens are not of the type expected. For example, the
instruction was coded to expect two integer inputs and
instead received two boolean input tokens.
(2) A token is missing for one or more input ports. There
probably were multiple tokens sent to a single port.
(3) Received more than one token for a port (non-fatal error for
output statement).
(4) Trying to print other than character or integer data in
output statement.
(5) No instruction address token received by a begin statement,
or no context control packet received by an end statement.
USER MANUAL
In using the simulator, the easiest way to go about preparing
a program is to first write the program in graph language.
Then each operator should be numbered starting from 1,
remembering to skip one number after any apply statement. The
operators can be ordered in any sequence; order is not important.
What is important is that all destination information gives the
correct destination operator number. The program can then be
written in mnemonics or directly into machine language, and at
this point, the statements must be listed in the order in which
they were numbered.
The file containing the machine language program, say
pgmname, is then specified in the run statement, followed by any
data file names:
% dfsim <pgmname datafile1 datafile2 ... datafile5
The output in this case will appear on the terminal. Output can
be redirected to a file as follows:
% dfsim <pgmname datafile1 ... datafile5 >outputfile
If the program runs normally with no errors, the run will end
with a message to the effect that all processes are blocked.
This message will appear on the terminal whether or not program
output is redirected to a file. If a run-time error is
encountered, however, the output files will not be closed and
therefore are not accessible. All the user is given on the
terminal is a very cryptic message such as "Bus error - core
dumped" or "Memory Fault - core dumped". In that case, the
program should be rerun with output sent directly to the terminal
rather than redirected into a file, so that the user can see what
output, if any, was produced before the error was encountered.
The use of the trace function as described in section 5.5 is also
extremely helpful in finding the program error and is the only
way to watch execution statement by statement.
