Method for concurrent execution of primitive operations by dynamically assigning operations based upon computational marked graph and availability of data by Stoughton, John W. & Mielke, Roland V.
United States Patent [19]
Stoughton et al.
[11] Patent Number: 4,922,413
[45] Date of Patent: May 1, 1990
[54] METHOD FOR CONCURRENT EXECUTION
OF PRIMITIVE OPERATIONS BY
DYNAMICALLY ASSIGNING OPERATIONS
BASED UPON COMPUTATIONAL MARKED
GRAPH AND AVAILABILITY OF DATA
[75] Inventors: John W. Stoughton; Roland V. Mielke, both of Virginia Beach, Va.
[73] Assignee: Center for Innovative Technology, Herndon, Va.
[21] Appl. No.: 29,665
[22] Filed: Mar. 24, 1987
[51] Int. Cl.5 ......................... G06F 15/16; G06F 9/38
[52] U.S. Cl. ................................. 364/200; 364/230.3;
364/232.22; 364/281.4; 364/281.8
[58] Field of Search ... 364/200 MS File, 900 MS File,
364/300
[56] References Cited
U.S. PATENT DOCUMENTS 
4,149,240 4/1979 Misunas et al. .................... 364/200
4,153,932 5/1979 Dennis et al. ....................... 364/200 
4,156,910 5/1979 Barton et al. ....................... 364/200 
4,247,892 1/1981 Lawrence ........................... 364/200 
4,251,861 2/1981 Mago .................................. 364/200 
4,318,173 3/1982 Freedman et al. .................. 364/200 
4,319,321 3/1982 Anastas et al. ...................... 364/200 
4,384,324 5/1983 Kim et al. ........................... 364/200 
4,644,461 2/1987 Jennings .............................. 364/200 
4,644,464 2/1987 Logsdon et al. .................... 364/200 
4,733,347 3/1988 Fukuoka ............................. 364/200
OTHER PUBLICATIONS 
Peterson, Petri Net Theory and the Modeling of Systems, 
Prentice-Hall, Englewood Cliffs, N.J., 1981.
Commoner et al., “Marked Directed Graphs”, Journal
of Computer and System Sciences, 5, 511-523 (1971).
Murata, T., “Use of Resource-Time product concept to 
derive a performance measure of timed Petri nets”, 
Procs. of the 1985 Midwest Symposium on Circuits and 
Systems, Univ. of Louisville, Aug. 19-20, 1985. 
Primary Examiner-Thomas C. Lee 
Attorney, Agent, or Firm-Staas & Halsey 
[57] ABSTRACT
Computationally complex primitive operations of an 
algorithm are executed concurrently in a plurality of 
functional units under the control of an assignment 
manager. The algorithm is preferably defined as a
computational marked graph containing data status edges
(paths) corresponding to each of the data flow edges. 
The assignment manager assigns primitive operations to 
the functional units and monitors completion of the 
primitive operations to determine data availability using 
the computational marked graph of the algorithm. All 
data accessing of the primitive operations is performed 
by the functional units independently of the assignment 
manager. 
2 Claims, 9 Drawing Sheets 
[Front-page drawing: excerpt of the FIG. 6 flowchart. Legible steps: input algorithm defined as primitive operations and data; provide definitions of algorithms and primitive operations; assign primitive operation to available function unit.]
https://ntrs.nasa.gov/search.jsp?R=20080008773 2019-08-30T03:45:37+00:00Z
[Drawing Sheets 1 to 9 of U.S. Patent 4,922,413, May 1, 1990 (FIGS. 1, 2, 3A-3H, 4, 5 and 6). Only the following labels are recoverable from the extracted drawings:
FIGS. 2 and 3A-3H (node marked graphs): labels such as first input, first input status edge, data status edge, first output, first output status edge, process status edge, ready edge and busy edge; reference numerals 52, 56, 64 and 66.
FIG. 5 (block diagram): assignment manager, token bus, functional units #1 through #K, data bus.
FIG. 6 (flowchart): input algorithm defined as primitive operations and data flow in between; provide definitions of algorithms and primitive operations to assignment manager; assign primitive operation to available function unit.]

METHOD FOR CONCURRENT EXECUTION OF PRIMITIVE OPERATIONS BY DYNAMICALLY ASSIGNING OPERATIONS BASED UPON COMPUTATIONAL MARKED GRAPH AND AVAILABILITY OF DATA

ORIGIN OF THE INVENTION

The invention described herein was developed in part with U.S. Government support under contract 17993 with the National Aeronautics and Space Administration (NASA). The U.S. Government has certain rights in this invention.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention is related to the analysis of concurrently processed, computationally complex algorithms and, more particularly, to a data processing system that uses a Petri net model of concurrently processed computationally complex algorithms.

2. Description of the Related Art

One method which is currently being developed to increase the execution speed of computers is parallel processing of primitive operations in an algorithm. The hardware used in a data processing system having such a parallel architecture is relatively easy to develop by using, e.g., identical, special purpose computing elements each of which can access a shared memory. In comparison, it has been much more difficult to develop the software required to schedule, coordinate and communicate between the individual computing elements and other portions of the data processing system, such as external data input/output to and from terminals, printers, tape drives, etc.

Conventional multiprocessing systems typically execute a program in a single processor, while different program(s) execute in the other processor(s). The scheduling, coordination and communication problems in such multiprocessor systems are among the most complex addressed by existing computer systems. However, in the type of parallel processing discussed above, solution of coordination and communication problems is even more critical, because a single program or algorithm is being processed by more than one computing element. As a result, in addition to sharing the hardware, as in the conventional multiprocessor system, the computing elements also share data. Since a first instruction or task may be performed in a first computing element to generate intermediate data used in a second task performed by a second computing element, the flow of data in the system must be carefully controlled. Such data processing systems are referred to as performing concurrent parallel processing and may have a data-driven architecture, i.e., controlled by data flow, or a demand-driven architecture.

The differences between a data processing system performing concurrent operations and a system organized according to von Neumann principles render conventional methods of describing and defining computer operation inadequate. For example, flowcharts or conventional algorithm graphs which are useful in describing the operation of a von Neumann structure are insufficiently descriptive of the operation of a concurrent processing system. Techniques for defining and controlling the operation of a concurrent processing system are critical for the development of software to be executed on data processing systems having, e.g., a data-driven architecture.

Efforts have been made to define tasks to be performed in a data processing system for concurrent execution of primitive operations of an algorithm. For example, previous proposals for assignment of the computing elements or functional units to particular tasks have used either static or dynamic assignment. When static assignment is used, the task performed by each functional unit is determined during the design of a program. This requires a great deal of specificity in the program, increasing the amount of time required for development and thereby reducing the benefits gained from using data flow control in a processing system. Dynamic assignment, on the other hand, determines the task performed by each functional unit at the beginning of execution of an algorithm. This reduces the effort required during development but still results in fixed correspondence between a functional unit and a primitive operation during the processing of the algorithm.

SUMMARY OF THE INVENTION

An object of the present invention is to provide a data-driven architecture in which functional units are continuously assigned during concurrent execution of primitive operations of an algorithm.

Another object of the present invention is to provide scheduling, coordination and communication for a plurality of functional units concurrently executing computationally intensive primitive operations of an algorithm by providing a controlled path for each data path in a Petri net model of the algorithm.

Another object of the present invention is to provide a data processing system having multiple functional units concurrently processing computationally intensive primitive operations of an algorithm in which the operations performed by the functional units are continuously reassigned during execution of the algorithm.

Yet another object of the present invention is to provide a concurrent data processing system in which concurrency is maximized.

A further object of the present invention is to provide a data processing system having a plurality of functional units concurrently executing primitive operations of an algorithm which is slowed, but not prevented from executing an algorithm, if one or more, but not all, functional units become inoperative.

The above objects are attained by providing a method for concurrent execution of primitive operations of an algorithm in a data processing system having a data-driven architecture comprising an assignment manager, a plurality of functional units connected to the assignment manager and a global data memory connected to the functional units. The method comprises the steps of defining an algorithm in terms of primitive operations and data flow between the primitive operations; providing definitions of the algorithm and the primitive operations to the assignment manager; assigning each primitive operation to an available functional unit in dependence upon the definitions previously defined as the data for that primitive operation becomes available; and monitoring completion of each primitive operation to determine data availability.

These objects, together with other objects and advantages which will be subsequently apparent, reside in the details of construction and operation as more fully hereinafter described and claimed, reference being had to the accompanying drawings forming a part hereof,
wherein like reference numerals refer to like parts throughout.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is an example of an algorithm directed graph;
FIG. 2 is a node marked graph for a node in an algorithm marked graph having two inputs and two outputs;
FIG. 3A is the node marked graph of FIG. 2 with tokens representing the idle state;
FIG. 3B is the node marked graph of FIG. 2 with tokens representing the process ready state;
FIG. 3C is the node marked graph of FIG. 2 with tokens representing the data accepted state;
FIG. 3D is the node marked graph of FIG. 2 with tokens representing the processing function - outputs empty state;
FIG. 3E is the node marked graph of FIG. 2 with tokens representing the process complete - outputs delivered state;
FIG. 3F is the node marked graph of FIG. 2 with tokens representing the process complete - inputs available state;
FIG. 3G is the node marked graph of FIG. 2 with tokens representing the data consumed state;
FIG. 3H is the node marked graph of FIG. 2 with tokens representing the process function - output buffer full state;
FIG. 4 is a computational marked graph corresponding to the algorithm marked graph in FIG. 1;
FIG. 5 is a block diagram of a parallel processing system to which the present invention can be applied; and
FIG. 6 is a flowchart of a method according to the present invention.

DESCRIPTION OF THE PREFERRED EMBODIMENT

As discussed above, at the present time it is much easier to develop hardware for parallel processing than to develop software which takes advantage of such hardware to execute a single algorithm. While it is possible to specify the operations to be performed in each processing unit in a parallel processor, this requires a great deal of programming time and also requires that the specified number of processors be available at run time while making no use of any additional processors which might be available. This is referred to as static assignment of processing or functional units. Dynamic assignment of functional units at the time processing begins is only a slight improvement over static assignment in that it is difficult to compensate for a failure of a processing unit during execution of an algorithm. As a result, there is little fault tolerance in such systems and a great deal of programming effort is required.

According to the present invention, these obstacles are overcome by decomposing an algorithm into a sequence of primitive operations, some of which can be performed concurrently. Preferably, the algorithm is represented as a Petri net to clearly display both data flow and control flow in the execution of the algorithm. The decomposition process is performed in the three steps which follow. Each step will be described in more detail below.

First, according to a preferred embodiment of the present invention, an algorithm graph is developed which depicts primitive operations of the algorithm and the data flow between the primitive operations. Equation (1) is an example of a relatively simple algorithm which is a discrete system state recursion:

$$X_N = B \cdot X_{N-1} + A \cdot U_N \qquad (1)$$

An algorithm graph of this recursive function is illustrated in FIG. 1 as described below. The second step in the preferred embodiment is to define the activities which occur in each primitive operation in a graphical fashion using a node marked graph. The third step is to combine the node graphs for each node to form a computational marked graph which fully defines the operations performed by a parallel processor in executing the algorithm. The computational marked graph can be input to a data processing system in a manner discussed further below.

A Petri net is a useful mathematical tool for modeling and analyzing systems containing interacting concurrent components. Petri nets were first developed in 1962 by Carl Petri in “Kommunikation mit Automaten”, Ph.D. dissertation, University of Bonn, Bonn, West Germany, 1962, and were later identified as a useful analysis tool by Holt and Commoner in “Events and Conditions”, Applied Data Research, New York, 1970. A comprehensive treatment of Petri nets is presented in J. L. Peterson, Petri Net Theory and the Modeling of Systems, Prentice-Hall, Englewood Cliffs, N.J., 1981.

In the usual Petri net representation of a system, events or actions which take place in the system are modeled by “transitions”. The condition or status of the system is modeled by “places”. The existence of a condition is indicated by marking the corresponding “place” with one or more “tokens” and nonexistence of the condition is indicated by the absence of any tokens. A marking vector M is a vector whose elements identify the number of tokens marking each place. The execution of a Petri net is controlled by the number and distribution of tokens in the net. A transition is enabled if each arc from its input places has at least one token. An enabled transition “fires” by removing one token from each arc directed from an input place to the transition and depositing into each output place one token for each arc directed from the transition to the place. Transition firings continue as long as at least one transition is enabled. When there are no enabled transitions, the execution of the net halts.
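
The firing rule just described can be illustrated with a minimal sketch. It is not taken from the patent: the names `Transition`, `enabled`, `fire`, `run` and `max_firings` are hypothetical, and the sketch assumes at most one arc between any place and any transition, so that "one token per arc" reduces to "one token per listed place".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    """A transition with one arc per listed input and output place."""
    name: str
    inputs: tuple
    outputs: tuple


def enabled(marking, t):
    """A transition is enabled when every input place holds a token."""
    return all(marking.get(p, 0) >= 1 for p in t.inputs)


def fire(marking, t):
    """Return the marking after firing t: one token leaves each input
    place and one token enters each output place."""
    if not enabled(marking, t):
        raise ValueError(f"transition {t.name} is not enabled")
    new = dict(marking)
    for p in t.inputs:
        new[p] -= 1
    for p in t.outputs:
        new[p] = new.get(p, 0) + 1
    return new


def run(marking, transitions, max_firings=100):
    """Fire enabled transitions until none remains (the net halts) or
    the hypothetical max_firings safety bound is reached."""
    fired = []
    for _ in range(max_firings):
        ready = [t for t in transitions if enabled(marking, t)]
        if not ready:
            break  # no enabled transition: execution of the net halts
        marking = fire(marking, ready[0])
        fired.append(ready[0].name)
    return marking, fired


# Hypothetical three-place chain: p0 -> t1 -> p1 -> t2 -> p2.
CHAIN = [
    Transition("t1", ("p0",), ("p1",)),
    Transition("t2", ("p1",), ("p2",)),
]
final, fired = run({"p0": 1, "p1": 0, "p2": 0}, CHAIN)
print(fired, final)
```

In this example the single token moves from p0 to p2 through two firings, after which no transition is enabled and execution stops. The marking dictionary plays the role of the marking vector M.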
Two very important subclasses of Petri nets are state machines and marked graphs. A state machine is a Petri net in which each transition is restricted to having exactly one input place and one output place. A marked graph is a Petri net in which each place is restricted to having exactly one input transition and one output transition. Thus, a state machine can represent the conflicts by a place with several output transitions, but cannot model the creation and destruction of tokens required to model concurrency or the waiting which characterizes synchronization. Marked graphs, on the other hand, cannot model conflict or data-dependent decisions, but can model concurrency. For this reason, the present invention preferably uses marked graphs to model and analyze concurrent processing systems.
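
The two structural restrictions can be checked mechanically. The following is an illustrative sketch only: the function names and the dictionary encoding of a net (transition name mapped to a pair of input-place and output-place tuples) are assumptions made for this example, not part of the patent.

```python
from collections import Counter


def is_state_machine(transitions):
    """Every transition has exactly one input place and one output place."""
    return all(len(ins) == 1 and len(outs) == 1
               for ins, outs in transitions.values())


def is_marked_graph(places, transitions):
    """Every place has exactly one input transition (a producer) and
    exactly one output transition (a consumer)."""
    producers, consumers = Counter(), Counter()
    for ins, outs in transitions.values():
        consumers.update(ins)   # the transition consumes from its inputs
        producers.update(outs)  # and produces into its outputs
    return all(producers[p] == 1 and consumers[p] == 1 for p in places)


# Hypothetical fork/join net: concurrency, but no conflict.
FORK_JOIN = {
    "fork": (("a",), ("b", "c")),
    "join": (("b", "c"), ("a",)),
}

# Hypothetical choice net: place p has two competing output transitions.
CHOICE = {
    "t1": (("p",), ("q",)),
    "t2": (("p",), ("q",)),
    "t3": (("q",), ("p",)),
}

print(is_marked_graph("abc", FORK_JOIN), is_state_machine(FORK_JOIN))
print(is_marked_graph("pq", CHOICE), is_state_machine(CHOICE))
```

The fork/join net creates and merges tokens, which a state machine cannot express, while the choice net contains the kind of conflict that a marked graph excludes.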
The concept of time is not explicitly encoded in the definition of Petri nets. However, for performance evaluation and scheduling problems it is necessary to define time delays associated with transitions. Such a Petri net is known as a timed Petri net and is described in T. Murata, “Use of Resource-Time Product Concept to Derive a Performance Measure of Timed Petri Nets”, Proceedings of the 1985 Midwest Symposium on Circuits and Systems, Univ. of Louisville, Aug. 19-20, 1985.





In a timed Petri net, enabling tokens are encumbered or reserved when a transition fires. After a specified delay time associated with the transition, tokens are deposited at the output places of the transition. Thus, a timed Petri net is used, according to the present invention, to provide an effective method for modeling the finite delay time associated with the completion of a primitive operation by a processor.

As discussed above, the first step in executing an algorithm according to the present invention is to decompose the algorithm and describe it using an algorithm directed graph. A simple example of an algorithm directed graph is illustrated in FIG. 1. This graph is characterized by nodes which represent the operation or function to be performed on a data packet or vector incident to that node. Directed arcs or “edges” of the graph represent the data labels of inputs to the primitive operations. Edges leaving a node represent the output of the primitive operations which are passed on to the next primitive operation in the algorithm.

In FIG. 1, node 10 represents supplying the Nth value of input U. When this data is supplied, a token is indicated on edge 12. A constant value is supplied on edge 14 from node 15 to node 16. When tokens representing both $U_N$ and A are available, the primitive operation V*M is performed in node 16 by “firing” the node. The result is a token on edge 18 indicating that the data $A \cdot U_N$ is available. This is supplied to node 20 which performs the primitive operation V+V, adding the value of $B \cdot X_{N-1}$ to $A \cdot U_N$ and supplying the value $X_N$ on edges 22 and 24. While edge 22 supplies the value $X_N$ to an output node 26, edge 24 supplies the value $X_N$ to node 28 which performs the primitive operation V*M. Edge 30 supplies the constant B from node 31 to node 28 so that the value $B \cdot X_{N-1}$ can be supplied from node 28 to node 20 on edge 32.

Although the algorithm directed graph is useful in describing the computational flow, it does not address the procedures a particular computing structure must manifest in order to perform the task indicated on the graph. Further, the graph is ambiguous with regard to issues of protocol, deadlock and resource assignment when processing is to be achieved in a multiple processor environment. Of particular importance is how the sequence of data flow and operations is managed in a multiple computational resources environment which emphasizes maximum concurrent processing. An unambiguous representation of control/data flow is provided by a particular variety of Petri net termed a computational marked graph which permits data concerning control status and resource utilization to be monitored by the placement of “tokens” which may be tracked during execution of an algorithm. The development of the computational marked graph is accomplished by generating a node marked graph for each node in the algorithm directed graph.

In the preferred embodiment, the node marked graphs are generated after making certain assumptions about the system which will execute the algorithm. It is preferable to assume that the system comprises a plurality of functional units each of which includes processing capabilities, local memory for programmed storage and temporary input and output data containers. It is further assumed that a global data memory will be available to all functional units. Inputs associated with each edge in the algorithm directed graph correspond to fixed data containers in the global data memory. In the preferred embodiment, transfer of data from a functional unit through the global data memory is postponed until all successor computational nodes have accessed the data. This requirement helps reduce processing time, but leads to the necessity of providing data status information in the computational marked graph. Finally, a primitive operation will not be assigned to a functional unit unless the functional unit is available and all data required by the primitive operation is available.

As a result of the requirement in the preferred embodiment that output data from a primitive operation be accessed by subsequent nodes prior to being stored in the global data memory, the node marked graph is developed with data status information being directed along an edge corresponding to each edge providing data between two nodes, but in a direction opposite to that of the data flow. In addition, one embodiment of the present invention breaks down each node of the algorithm directed graph into three nodes: reading, processing and writing. Such a three node marked graph is illustrated in FIG. 2, where node 40 is a reading node, node 42 is a processing node and node 44 is a writing node. Edge 46 provides a first input I1 and edge 48 provides a second input I2. Edge 50 provides data status corresponding to the first input I1 on edge 46, while edge 52 provides data status corresponding to the second input data on edge 48. A data ready signal is supplied along edge 54 to the processing node 42 when inputs I1 and I2 are full as indicated by tokens on edges 46 and 48 (see FIG. 3B). When the process or computation performed by processing node 42 is completed, a token is placed along edge 56. Data output edges 58 and 60 are directed outward from output node 44, while edges 62 and 64 provide data status information regarding the use of the data output on edges 58 and 60, respectively, by subsequent primitive operations. When the output buffers have been emptied, a token is placed on edge 66 to indicate that the process is not busy. This state is represented by the tokens 68a-68e illustrated in FIGS. 2 and 3A.

Eight possible states are illustrated in FIGS. 3A-3H. They respectively illustrate the idle state (FIG. 3A) which is also illustrated in FIG. 2; process ready - inputs enabled (FIG. 3B); data accepted - inputs consumed (FIG. 3C); performing processing (FIG. 3D); processing completed - outputs available (FIG. 3E); processing completed - inputs available (FIG. 3F); data consumed - output buffers full (FIG. 3G); and performing processing - output buffers full (FIG. 3H).

As discussed above, a computational marked graph is provided by replacing each of the nodes of the algorithm marked graph illustrated in FIG. 1 with a node marked graph like those illustrated in FIGS. 2 and 3A-3H. For the algorithm directed graph illustrated in FIG. 1, the corresponding computational marked graph is illustrated in FIG. 4. In other words, the computational marked graph of a particular algorithm is composed of interconnected node marked graphs by joining the various input edges to corresponding output edges of predecessor node marked graphs.

Preferably, the computational marked graph according to the present invention has several features. First, no more than one token is allowed in each place (edge) at a time. This is significant in that the occurrence of more than one token in a place would indicate overrunning data buffers or perhaps multiple allocation of a process. Second, data flow and buffer-full conditions are represented by directed arcs corresponding to the directed data flow of the original algorithm directed
graph, while status and control information are provided on arcs corresponding to, but oppositely directed to, the original data flow. For example, edge 14 in FIG. 4 corresponds to edge 14 in FIG. 1 in that the data A is supplied from node 15 to the algorithm node 16. Therefore, edge 70 is included to provide data status information indicating whether the input provided on edge 14 has been read by reading node 71. Similarly, edge 72 provides data status corresponding to the input data $U_N$ on edge 12 and edge 74 provides data status information corresponding to data flow edge 18. Similarly, edges 76, 78, 80 and 82 provide data status information corresponding to data flow paths 22, 24, 30 and 32. In addition, each of the processing nodes 16, 20 and 28 of the algorithm directed graph in FIG. 1 has been replaced in FIG. 4 with a node marked graph including reading, processing and writing nodes interconnected by three edges (arcs) as illustrated in FIG. 2. Nodes 16 and 28 have been replaced with node marked graphs which have only a single output, but otherwise they are constructed the same as the node marked graph illustrated in FIG. 2.

As is known in the art, Petri nets have properties which include safeness and liveness. As noted above, the marked graphs generated according to the present invention are required to have no more than one “token” in each “place” during execution of the algorithm represented by the graph. Thus, according to the conventional terminology, the Petri net represented by the computational marked graphs generated according to the present invention is “safe”. A Petri net is considered to be “live” if every transition is enabled during execution of the net, i.e., the algorithm. It has been shown in Peterson (supra) that a marked graph is live if and only if the number of tokens on each cycle is at least one. Preferably, initialization of execution of an algorithm according to the present invention includes the requirement of a live marking.

One example of a parallel processing system which could be used to execute an algorithm defined as a computational marked graph, generated as discussed above, is illustrated in FIG. 5. In the parallel processing system embodiment illustrated in FIG. 5, an algorithm loader 102 is used to define algorithms as computational marked graphs and then load the computational marked graph to an assignment manager 104. The algorithm loader 102 is provided to simplify the user interface and permit diagnostics to be run to monitor the functioning of the parallel processors. However, once the algorithm or algorithms to be processed by the parallel processor have been supplied to the assignment manager 104, the algorithm loader 102 is no longer required and could be disconnected from the system.

In addition to the assignment manager 104, the parallel processing system illustrated in FIG. 5 includes K functional units 105-107 and a global data memory 108. The functional units 105-107 are connected to the assignment manager 104 via a token bus 110 and to the global data memory 108 via a data bus 112. The units 104-108 may be identical with their role in the system determined only by the program being executed therein. This permits a high degree of redundancy allowing any of the units to perform the functions of the assignment manager or the global memory provided the necessary programs are stored in its memory and the proper connections are provided. For example, for testing purposes, a parallel processing system has been constructed in which the algorithm loader is an IBM PC/XT and each of the units 104-108 includes an INTEL 8088 processor, 32 kbytes of memory and the required number of input/output ports in an S-100 bus enclosure with a power supply.

In the test system, the units 104-108 and 102 have been connected by RS-232 cables transmitting at 9600 baud. However, the present invention is by no means limited to such a system, but could be operated on any parallel processor system which includes a device to carry out the functions of the assignment manager described below. For example, the present invention could be applied to a system in which the functions of the assignment manager are performed by one or more of the functional units. Similarly, while the global memory 108 is illustrated as a separate unit, in an actual system, the memory associated with each of the functional units 105-107 might contain a copy of the contents of the global memory with transmission between the units occurring at the time that the data is to be stored in the global memory. In an actual system, it is expected that the units 104-108 may each comprise a single printed circuit board or even a single integrated circuit. In addition, instead of being identical units, the units 104-108 may be differentiated by the functions which they are intended to perform. For example, the global memory 108 may contain a larger amount of memory if a large amount of data is being processed. Also, one or more of the functional units, such as functional unit 107, might contain additional input/output ports for receiving or transmitting data to or from the external data unit 114.

According to the present invention, an algorithm, defined as a computational marked graph, is executed by a system, such as that illustrated in FIG. 5, as follows. The assignment manager 104 initiates execution of the algorithm by assigning nodes to the functional units in the manner described in the following paragraph. As the data produced by the primitive operation performed by an initially activated functional unit is made available, the assignment manager 104 assigns the primitive operation which uses the produced data to another functional unit. Preferably, only the primitive operations to be performed by a functional unit and the data labels are assigned by the assignment manager 104. All data access and processing operations are performed by the functional units 105-107. Data from the functional units 105-107 is stored in and supplied to the functional units 105-107 by the processor in the global memory 108. As execution of the algorithm proceeds, the assignment manager monitors the completion of each transition node by marking the appropriate edges or places in the computational marked graph stored in its memory with “tokens”.

As illustrated in FIG. 5, the algorithm is executed in K functional units 105-107 by accessing data stored in the global data memory 108 via the data bus 112. Control and status information are passed between the functional units 105-107 and the assignment manager 104 via the token bus 110. The assignment manager 104 is primarily engaged in managing the buffer status of the inputs and outputs of each node. The operation of the assignment manager 104 and the type of data passed over the token bus 110 will be described below with reference to FIG. 2.

For a node in the computational graph with two inputs, I1 and I2, and two outputs, O1 and O2, the node marked graph corresponds to the graph illustrated in FIG. 2. As noted above, the assignment manager 104
includes both processing capability and memory. The memory is used to store
the following information for each node of the graph: node number;
processing indicator; function name; and buffer status indicators for I1,
I2, O1 and O2.

The data passed between the assignment manager 104 and the functional units
105-107 may be viewed as asynchronous handshaking. In one embodiment of a
data processing system to which the invention can be applied, the following
messages are transmitted. These messages may be hard-wired or in the form of
data packets.

To initiate the start of a computation, the assignment manager 104 first
determines that node #1 is fireable by determining that the status bits
associated with the inputs of node #1 indicate that all input buffers are
full (I1-f, I2-f) and the process is not busy (NB) as illustrated in FIG. 2.
When these conditions are recognized, the assignment manager 104 sends a
READY message over the token bus 110 to all of the functional units 105-107.
The READY message preferably includes the name of the function to be
processed and the inputs to be used. The available functional units may
respond with an ACCEPT message containing the functional unit identification
number and the function name. The assignment manager 104 acknowledges one of
the ACCEPT messages with an IDACK response indicating the identification
number of the functional unit to be assigned the function, the function to
be performed, and the node number of the graph. The assignment manager 104
then updates the computational marked graph in its memory to indicate that
the node corresponding to that node number (#1) has been assigned to a
functional unit, i.e., functional unit 105. As soon as the indicated
functional unit 105 has retrieved the input data from the global data memory
108, it transmits an INPUT message indicating its identification number and
the input buffers which have been used. The assignment manager then updates
the input status bits of the corresponding node in the computational marked
graph stored in its memory and responds with an INACK message to acknowledge
receipt of the input message.

When processing is completed, the functional unit executing the primitive
operation transmits an output request message OUTRQ to the assignment
manager 104 over the token bus 110. The output request message would include
at least the functional unit identification number and the number of the
node being processed. The assignment manager 104 responds with an output
status message OUTSTAT indicating the identification number of the
functional unit and the labels of the global data memory 108 to be used for
the outputs O1 and O2. The functional unit 105 responds by transmitting the
output data to the global data memory 108 via the data bus 112. When this
process is complete, the functional unit 105 transmits an OUTPUT message to
the assignment manager 104 via the token bus 110. The assignment manager 104
responds by updating the computational marked graph in its memory and
transmitting an output acknowledge message OUTACK to the functional unit.

A system such as that described provides the benefits of highly efficient
concurrent processing of primitive operations in an algorithm while
minimizing the efforts involved in programming and providing a large amount
of fault tolerance, even upon the failure of a functional unit during
execution of the algorithm. In addition, the number of functional units need
not be known at the time that the algorithm is defined as a computational
marked graph.

Many other features and advantages of the present invention are apparent
from the detailed specification, and thus, it is intended by the appended
claims to cover all such features and advantages which fall within the
spirit and scope of the invention. Further, since numerous modifications and
changes will readily occur to those skilled in the art, from the disclosure
of the invention, it is not desired to limit the invention to the exact
construction and operation illustrated and described. For example, while
graph terminology has been used in describing the operation of the
assignment manager, the present invention is not limited to a graph oriented
system, but rather is applicable to any concurrent execution of an algorithm
in which primitive operations are continuously assigned to functional units
during execution of an algorithm by an assignment manager. Accordingly,
suitable modifications and equivalents may be resorted to, all falling
within the scope and spirit of the invention.

What is claimed is:
1. A method for concurrent execution of primitive operations in an algorithm
by a data processing system having a data-driven architecture comprising an
assignment manager, a plurality of functional units connected to the
assignment manager and a global data memory connected to the functional
units, said method comprising the steps of:
(a) inputting an algorithm defined in terms of primitive operations and
data flow between the primitive operations, including the steps of
(ai) generating an algorithm directed graph contain- 
ing nodes representing primitive operations and 
data flow arcs representing data flow in the algo- 
rithm; 
(aii) generating a node marked graph for each of 
the nodes in the algorithm directed graph gener- 
ated in step (ai), the node marked graph includ- 
ing at least one input edge and at least one output 
edge corresponding to the data flow arcs associ- 
ated with the corresponding node in the algo- 
rithm directed graph, and data status edges, each 
of the data status edges corresponding to one of 
the input and output edges; 
(aiii) generating a computational marked graph 
corresponding to the algorithm directed graph 
generated in step (ai) by combining the node 
marked graphs generated in step (aii), each out- 
put edge of a predecessor node being associated 
with an input edge of the successor node; and 
(aiv) storing data representing the computational 
marked graph in the data processing system; 
(b) automatically providing definitions of the algo- 
rithm and the primitive operations to the assign- 
ment manager; 
(c) assigning each primitive operation to an available 
functional unit in dependence upon the definitions
previously defined as the data for that primitive 
operation becomes available, said assigning per- 
formed by the assignment manager; and 
(d) monitoring in the assignment manager completion 
of each primitive operation to determine data avail- 
ability. 
2. A method as recited in claim 1, 
wherein the algorithm directed graph contains at 
least one processing node, and 
wherein step (aii) comprises the step of defining in- 
put, processing and output transitions for each of 
the processing nodes in the algorithm directed 
graph.
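
By way of illustration only, the message exchange described in the
specification between the assignment manager 104 and the functional units
105-107 may be sketched in Python as follows. The message names (READY,
ACCEPT, IDACK, INPUT, INACK, OUTRQ, OUTSTAT, OUTPUT, OUTACK) and the
fireable condition (all input buffers full, process not busy) are taken from
the description. Everything else is an assumption made for the sketch and is
not part of the disclosed embodiment: the names Node, AssignmentManager and
run_node, the selection of the first responding unit, the marking of input
buffers as empty after INPUT, and the marking of output buffers as full
after OUTPUT.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Per-node record held by the assignment manager (field names hypothetical)."""
    number: int
    function: str
    inputs: dict    # input buffer label -> True when the buffer is full
    outputs: dict   # output buffer label -> True when the buffer is full
    busy: bool = False  # processing indicator

    def fireable(self) -> bool:
        # From the description: all input buffers full and process not busy.
        return all(self.inputs.values()) and not self.busy


class AssignmentManager:
    """Hypothetical stand-in that records the messages placed on the token bus."""

    def __init__(self, nodes):
        self.nodes = {node.number: node for node in nodes}
        self.log = []  # ordered trace of token-bus messages

    def run_node(self, number, available_units):
        node = self.nodes[number]
        if not node.fireable() or not available_units:
            return False
        log = self.log
        # READY carries the function name and the inputs to be used.
        log.append(("READY", node.function, tuple(node.inputs)))
        # Every available unit may answer with ACCEPT.
        for unit in available_units:
            log.append(("ACCEPT", unit, node.function))
        # Assumption: the first ACCEPT is the one acknowledged.
        unit = available_units[0]
        log.append(("IDACK", unit, node.function, number))
        node.busy = True
        # The unit reports which input buffers it has read.
        log.append(("INPUT", unit, tuple(node.inputs)))
        for label in node.inputs:
            node.inputs[label] = False  # assumption: used inputs become empty
        log.append(("INACK", unit))
        # Output phase: request, status, data transfer, acknowledge.
        log.append(("OUTRQ", unit, number))
        log.append(("OUTSTAT", unit, tuple(node.outputs)))
        log.append(("OUTPUT", unit, number))
        for label in node.outputs:
            node.outputs[label] = True  # assumption: written outputs become full
        node.busy = False
        log.append(("OUTACK", unit))
        return True
```

In this sketch, a node whose inputs I1 and I2 are full and which is offered
to units 105, 106 and 107 produces one READY, three ACCEPT messages, and then
IDACK through OUTACK in the order given in the description. A second call
for the same node returns False, because its input buffers are no longer
full.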
