Increasing processor utilization during parallel computation rundown by Jones, W. H.
NASA Technical Memorandum 87349 
NASA-TM-87349 19860017442 
Increasing Processor Utilization During 
Parallel Computation Rundown 
William H. Jones 
Lewis Research Center 
Cleveland, Ohio 
Prepared for the 
1986 International Conference on Parallel Processing 
sponsored by the Institute of Electrical and Electronics Engineers, Inc. 
St. Charles, Illinois, August 19-22, 1986 
https://ntrs.nasa.gov/search.jsp?R=19860017442 2020-03-20T14:27:47+00:00Z
UTTL: Increasing processor utilization during parallel computation rundown 
AUTH: A/JONES, W. H.
CORP: National Aeronautics and Space Administration. Lewis Research Center,
Cleveland, Ohio. AVAIL.NTIS
SAP: HC A02/MF A01
CID: UNITED STATES Presented at the 1986 International Conference on Parallel
Processing, St. Charles, Ill., 19-22 Aug. 1986; sponsored by IEEE
MAJS: /*COMPUTATION/*PARALLEL PROCESSING (COMPUTERS)/*RUN TIME (COMPUTERS)
MINS: / ALGORITHMS/ NAVIER-STOKES EQUATION/ PROGRAMMING LANGUAGES/ SCHEDULING
INCREASING PROCESSOR UTILIZATION DURING PARALLEL COMPUTATION RUNDOWN 
William H. Jones 
National Aeronautics and Space Administration 
Lewis Research Center 
Cleveland, Ohio 44135 
SUMMARY 
Some parallel processing environments provide for asynchronous execution
and completion of general purpose parallel computations from a single
computational phase. When all the computations from such a phase are complete,
a new parallel computational phase is begun. Depending upon the granularity of
the parallel computations to be performed, there may be a shortage of available
work as a particular computational phase draws to a close (computational
rundown). This can result in the waste of computing resources and the delay of
the overall problem.
In many practical instances, strict sequential ordering of phases of
parallel computation is not totally required. In such cases, the "beginning"
of one phase can be correctly computed before the "end" of a previous phase is
completed. This allows additional work to be generated somewhat earlier to
keep computing resources busy during each computational rundown. This paper
identifies the conditions under which this can occur, reports the frequency of
occurrence of such overlapping in an actual parallel Navier-Stokes code,
suggests a language construct, and discusses possible control strategies
for the management of such computational phase overlapping.
INTRODUCTION
General purpose parallel computations are usually divided into phases 
that must execute sequentially in order to guarantee algorithmic integrity. 
For instance, the checkerboard approach to the successive over-relaxation 
solution of the potential field problem divides into two such phases: the
"odd" locations phase and the "even" locations phase. On the parallel phase 
level, the iterated values of the previous phase must be complete before the 
new values of the next phase can be correctly computed. 
In the checkerboard algorithm, the execution time of each location is 
definite (nominally, the time for four additions and a divide). Thus, the 
distribution of work among processors can be accurately planned. Under ideal 
conditions (involving the number of checkerboard locations in comparison to 
the number of processors), the distribution of work can be arranged so that 
each processor shares an exactly even portion of the work and, as a
consequence, each processor completes its work at exactly the same time. Perfect
computation resource utilization is realized (at least in a practical sense) 
since the next computational phase can begin immediately. 
Unfortunately, ideal conditions are infrequently found in real applica-
tions. Continuing with the checkerboard algorithm, consider the situation 
when the potential grid is 1024 points on a side (2**20 grid points) and 1000 
processors are available. Each computational phase will provide 524,288
individual computations, or 524 computations for each of the 1000 processors;
however, 288 computations will be left over for distribution among the 1000
processors. This will leave 712 processors with nothing to do while the final
288 computations are carried out.
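The arithmetic of this example can be checked with a short sketch. The Python below is illustrative only: the function name rundown_idle is hypothetical (it is not part of PAX/CASPER), and the sketch assumes that every computation takes the same time, as in the checkerboard algorithm.

```python
def rundown_idle(grid_side, processors):
    """Return (computations per phase, full rounds, leftover, idle processors).

    Assumes a square grid split evenly into "odd" and "even" locations and
    computations of equal duration, so work proceeds in lock-step rounds.
    """
    per_phase = grid_side * grid_side // 2          # half the grid per phase
    full_rounds, leftover = divmod(per_phase, processors)
    # During the final partial round only `leftover` processors have work.
    idle = processors - leftover if leftover else 0
    return per_phase, full_rounds, leftover, idle


per_phase, full_rounds, leftover, idle = rundown_idle(1024, 1000)
print(per_phase, full_rounds, leftover, idle)
```

With a grid side of 1024 and 1000 processors, the sketch reproduces the figures quoted in the text.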
The burden of experience gained by the author suggests that even this
example is optimistic. Most computations carried out by the author's parallel
Navier-Stokes solver (the Combined Aerodynamic and Structural Dynamic Problem
Emulating Routines or CASPER (ref. 1) which was controlled by the Parallel,
Asynchronous Executive or PAX (ref. 2)) could not even be ascribed with
definite execution times. In some instances, whether or not the computation was
even to be carried out in a particular instance was a conditional part of the
algorithm. No control over the computation-count-to-processor ratio was
attempted -- processors were allocated as they became available on a
the-more-the-merrier basis. Also, shared information access times were
unpredictable and unrepeatable from instance to instance. As a result, there
was no assurance that individual processors could be kept busy as a particular
computational phase drew to a close.
The PAX/CASPER project provided the experience base cited later in this
paper. PAX/CASPER was focused on a parallel, general purpose, Navier-Stokes
solver. Thus, this experience base is presented not as a grand generalization
for all of parallel processing, but as a specific
example in practical parallel processing.
Certain other situations that might seem of interest in the overlapping
of computational phases (for instance, the possibilities for overlapping in a
tight iterative loop) are not treated for the simple reason that they did not
occur in the PAX/CASPER project. PAX/CASPER was not so much a research project
in parallel processing as an exploratory development of a far-term aerodynamic
tool. Thus, the motivation was to solve the problems that occurred rather
than to solve the problems that one could imagine.
It has been suggested that scheduling and overhead problems will be a
particular problem in PAX/CASPER. So far, this has not been the case.
Operational experience shows that the ratio of computation to management has
been running at something in the neighborhood of 200. This paper is an effort
to chart a method of improving upon this situation so as to stave off any
backsliding that might occur as the ratio of computational to management
resources increases. There are additional strategies which have been
identified for development. These include a middle management scheme to
parallelize the serial management function, a direct worker-to-worker lateral
communication scheme, and a data-proximity work assignment algorithm. These
strategies combined with the overlapping of computational phases should enhance
the management overhead situation.
Various solutions to the computational rundown problem may be acceptable. 
Some parallel processing schemes for general purpose computation may choose 
simply to accept the lower processor utilization as a minor design flaw. 
Another alternative is to create a multi-parallel-job-stream environment that
allows computational work of one job stream to fill in when another job stream 
enters a computational rundown situation. This will bring processor utiliza-
tion up; however, it should be recognized that the primary goal of parallel 
processing is to reduce elapsed wall-clock time for a given job. The intro-
duction of such a "batch" environment will inevitably distribute processor 
resources among the several job streams and, thus, reduce the total processing 
power on any particular job and lengthen its elapsed wall-clock time. 
Overlapping Computational Phases 
The goal then is to find more ready-to-compute work from the parallel 
algorithm that is being computed. As mentioned previously, this is not pos-
sible at the parallel phase level: each phase must be completed before the 
next is begun in order to guarantee algorithmic integrity; however, if an 
examination is made at a deeper (sub-phase or, in the terminology of the 
author, task) level, it is frequently discovered that the completion of por-
tions (tasks) of one phase will allow the correct computation of portions 
(tasks) of the succeeding phase. 
Consider again the checkerboard algorithm. If all the "odd" locations 
adjacent to a particular "even" location have been updated with new values 
from the current computational phase, then the new value for that particular 
"even" location for the next computational phase can be correctly computed. 
Additionally, since all the computations requiring as an input the current 
value of that particular "even" location have been completed, the value for 
that "even" location can be updated without affecting the results of the 
current computational phase. 
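The checkerboard condition just described can be sketched in a few lines. The Python below is a minimal illustration, not code from PAX/CASPER: the function names are hypothetical, and it assumes a square grid in which a location's neighbors are the up, down, left, and right locations that lie inside the grid.

```python
def neighbors(i, j, n):
    """Up, down, left, and right neighbors of (i, j) inside an n-by-n grid."""
    candidates = [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
    return [(a, b) for a, b in candidates if 0 <= a < n and 0 <= b < n]


def is_odd(i, j):
    """Checkerboard coloring: a location is "odd" when i + j is odd."""
    return (i + j) % 2 == 1


def enabled_even_locations(n, updated_odd):
    """Even locations whose odd neighbors were all updated in this phase.

    Every neighbor of an even location is odd, so the check covers exactly
    the inputs needed to compute that location in the next phase.
    """
    return [(i, j)
            for i in range(n) for j in range(n)
            if not is_odd(i, j)
            and all(nb in updated_odd for nb in neighbors(i, j, n))]


# 3-by-3 grid: the odd locations are (0,1), (1,0), (1,2), and (2,1).
print(enabled_even_locations(3, {(0, 1), (1, 0)}))
```

In this small case only the corner location (0,0) is enabled, since it is the only even location whose odd neighbors have both been updated.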
At this point, it is necessary to make certain assumptions (or, alterna-
tively, set certain system design constraints) about the nature of computa-
tional phase rundown. Two basic situations arise: one in which task assignments 
and releases are statically determined and one in which such matters are 
dynamically determined. 
The static situation is much simpler from the standpoint of next-phase 
task release timing since everything is determined ahead of time. In this 
case, it can be acceptable for computational rundown to begin almost immedi-
ately since the scheduling of the next-phase task has already been statically 
determined. No completion processing of current-phase tasks is required to 
schedule the release of the next-phase task. (In fact, work in this area for 
the purposes of real-time simulation has been conducted for some years at NASA 
Lewis (refs. 3 and 4).)
The dynamic scheduling situation is substantially more interesting. Some 
time delay must be available between the completion of the first current-phase 
tasks and the onset of computational rundown. This delay is needed to provide 
time to process the completion of the early current-phase tasks and, in so 
doing, schedule the next-phase tasks that are thus enabled. During this delay, 
there must be enough current-phase tasks to keep the processing resources busy 
in order to avoid a computational load dip while the next-phase tasks are 
scheduled. 
In the dynamic scheduling situation, enablement relationships between the 
current-phase tasks and the next-phase tasks (i.e., the relationship that 
enables a next-phase task based upon the completion of a current-phase task) 
may be either static or dynamic. That is, the completion of a particular 
current-phase task may always enable the same next-phase task (the static 
enablement case) or it may enable some next-phase task that can only be identi-
fied at the time of execution (the dynamic enablement case). The nature of 
the enablement relationships is important because it is involved in setting 
the time delay from the completion of the first current-phase tasks to the 
availability of the first enabled next-phase tasks. 
Considering these characteristics of the dynamic scheduling situation 
(i.e., the time to process current-phase task completion, the time to recognize 
enablement relationships, and the time to schedule enabled next-phase tasks), 
it can be observed that the number of tasks should substantially outnumber the 
number of processors. Certainly, there should be at the outset of the current-
phase work at least two tasks for each processor so that at least one task 
execution time will be available to process the completion of the first task 
assigned to the processor and to schedule the enabled next-phase task. This 
presumes that completion processing and task scheduling time is small with 
respect to task execution time. In particular, it assumes that one such com-
pletion, enablement, and scheduling cycle for each of the processors in the 
system can be completed in a single task execution time. (The author's experi-
ence with PAX suggests that this is reasonable even for dynamic managerial 
style parallel processing systems. Systems that use hardware-level synchro-
nization primitives presumably would be at even greater advantage in this 
area.) 
The conditions under which this overlapping of computational phases can 
correctly occur are the same as those that allow parallel computations within
a particular phase. Let the logical predicate PARALLEL(x,y) return the
condition TRUE when x and y are such that parallel computations are allowed.
Clearly, PARALLEL(n,m) must always be TRUE if n and m are distinct
computational granules of the same parallel computational phase. Let Q be an
uncompleted granule of the current phase and r be a granule of the next phase that
has been enabled by some completed granule, p, of the current phase. If 
PARALLEL(Q,r) necessarily returns the value TRUE, then the current-phase and 
next-phase can be correctly overlapped. 
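One possible form of the predicate can be sketched as follows. This is only an assumed example: the text leaves PARALLEL(x,y) system-specific, and the read-set/write-set test below (together with the names Granule, parallel, and can_overlap) is a hypothetical choice made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Granule:
    name: str
    reads: frozenset    # data items the granule reads (assumed to be known)
    writes: frozenset   # data items the granule writes (assumed to be known)


def parallel(x, y):
    """Assumed PARALLEL(x, y): neither granule writes data the other touches."""
    return (x.writes.isdisjoint(y.reads | y.writes)
            and y.writes.isdisjoint(x.reads))


def can_overlap(uncompleted, enabled):
    """Overlap is correct if PARALLEL(Q, r) holds for every pairing of an
    uncompleted current-phase granule Q with an enabled next-phase granule r."""
    return all(parallel(q, r) for q in uncompleted for r in enabled)


# Phase 1 computes B(I)=A(I); phase 2 computes C(I)=B(I).
# Granule 1 of phase 1 has completed; granule 2 has not.
q2 = Granule("phase1[2]", frozenset({"A2"}), frozenset({"B2"}))
r1 = Granule("phase2[1]", frozenset({"B1"}), frozenset({"C1"}))
r2 = Granule("phase2[2]", frozenset({"B2"}), frozenset({"C2"}))

print(can_overlap([q2], [r1]))  # r1 touches nothing that q2 writes
print(can_overlap([q2], [r2]))  # r2 reads B2, which q2 has yet to write
```

Under this assumed predicate, r1 may run alongside the unfinished q2 while r2 may not.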
The exact nature of the logical predicate PARALLEL(x,y) is, of course, of
substantial practical interest; however, it has no direct impact upon the 
ability to overlap phases as outlined above. Different parallel systems may 
identify different logical predicates. 
Identifying Enabled Granules 
The first challenge to be met is to find a way of identifying enabled 
next-phase granules for overlapping. It is easy to postulate that some
mapping function exists either to map from the set of completed granules, p, to
the set of enabled granules, r, or to map from the set of uncompleted granules,
Q, to the set of enabled granules, r. It is very difficult to establish what
this mapping function might be in any general way. Fortunately, this mapping
function is much more easily identified when each concrete situation is faced.
First, consider the simplest imaginable case as represented by the
following Fortran code segment:
      DO 100 I=1,N
         B(I)=A(I)                 ! First computational phase
  100 CONTINUE
      DO 200 I=1,N
         D(I)=C(I)                 ! Second computational phase
  200 CONTINUE
Assuming that there are no shared output area constraints, it can be observed
that these two parallel computational phases can be computed in parallel with 
each other. This represents what might be called a universal mapping function 
wherein any granule of the second computational phase is enabled by any granule 
or set of granules (including the null set) of the first computational phase. 
PAX/CASPER experience shows that 6 out of 22 (or 27 percent) of the paral-
lel computational phases allow universal mapping enablement of the succeeding 
phases. This represents 266 out of 1188 lines (or 22 percent) of the code 
that is executed in parallel in PAX/CASPER.
This universal mapping usually occurs in PAX/CASPER when the nature of 
the larger computational process is changing. For instance, the change over 
from power of compression computations to interpolator matrix generation is 
one such character change. The two computations do not involve shared inform-
ation of any kind and, thus, they can be entirely overlapped. Of course, the 
two phases could be merged into one by a preprocessor of the parallel control 
stream; however, since the mechanisms necessary to handle this case would be a 
subset of those needed for the following case, it might well be simpler to 
support this enablement mapping. 
For the next case, consider the following Fortran fragment that is to be 
computed in parallel as two succeeding computational phases: 
      DO 100 I=1,N
         B(I)=A(I)                 ! First computational phase
  100 CONTINUE
      DO 200 I=1,N
         C(I)=B(I)                 ! Second computational phase
  200 CONTINUE
Again assuming that there are no shared output area constraints, it can be
observed by inspection that the identity mapping function (I = I) maps from 
completed granules, p, to enabled granules, r. This is also a simple and 
easily identified mapping. 
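The universal and identity mappings can be written down as small functions from the set of completed first-phase granules to the set of enabled second-phase granules. The sketch below is illustrative; the function names are hypothetical, and granules are simply numbered 1 to N as in the Fortran loops.

```python
def universal_mapping(completed, n):
    """Every second-phase granule is enabled by any set of completed
    first-phase granules, including the empty set."""
    return set(range(1, n + 1))


def identity_mapping(completed, n):
    """Second-phase granule I is enabled once first-phase granule I is done."""
    return {i for i in completed if 1 <= i <= n}


n = 8
completed = {2, 5}
print(sorted(universal_mapping(set(), n)))      # all N granules, at once
print(sorted(identity_mapping(completed, n)))   # only granules 2 and 5
```

The universal case needs no completion information at all, which is why its successor phase can be released at phase initiation.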
PAX/CASPER experience indicates that it applies in 9 out of 22 (or 41 
percent) of the parallel computational phases (representing 551 of 1188 code 
lines, or about 46 percent of the parallel code in PAX/CASPER). Combining 
this direct mapping with the simpler universal mapping above indicates that 
(at least in PAX/CASPER experience) 68 percent of the parallel computational 
phases and 68 percent of the code executed in parallel can be easily overlapped 
to defeat computational rundown. These two enablement mapping possibilities 
are the most frequently occurring situations in PAX/CASPER experience. 
The next most frequently occurring enablement mapping in PAX/CASPER 
experience is what could be called null mapping, that is, the situation in 
which no overlapping is possible. This occurs in 4 out of 22 (or 18 percent) 
of the computational phases and represents 262 out of 1188 (or 22 percent) of 
the lines of code executing in parallel. In all cases the cause was not that 
such an overlapping did not exist between the parallel computations but was, 
in fact, that serial actions and decisions had to occur between the phases. 
This is important since it allows one to assess how often the extra effort of 
supporting overlapping features will be entirely defeated, regardless of the 
sophistication of the overlapping phase support features. 
Another enablement mapping occurring in PAX/CASPER experience is a reverse 
indirect mapping. Consider the following Fortran fragment: 
      DO 10 I=1,N                  ! Set up source mapping
         DO 10 J=1,10              ! IRAND produces an integer
            IMAP(J,I)=IRAND()      ! in the range 1 to N
   10 CONTINUE
      DO 100 I=1,N                 ! First computational
         A(I)=FUNC(I)              ! phase generates some
  100 CONTINUE                     ! number in A(x)
      DO 200 I=1,N                 ! Second computational
         DO 200 J=1,10             ! phase sums subsets of
            B(I)=A(IMAP(J,I))      ! the results of the
  200 CONTINUE                     ! first computational
                                   ! phase
Clearly, this computation can be overlapped; however, determining the 
enablement mapping is very difficult. This is because knowing that a particu-
lar first phase granule is complete does not directly identify any distinct 
second phase granule as computable; however, a reverse mapping from desired 
second phase granule to required first phase granules is possible. 
In PAX/CASPER experience, this situation occurs in 2 of 22 (or 9 percent) 
of the computational phases representing 78 out of 1188 (or 7 percent) of the 
lines of code executing in parallel. While this is not a frequently occurring 
situation in PAX/CASPER experience, it cannot be ignored out of hand. Some 
engineering judgement must be made to weigh the cost (in terms of management 
overhead, computational resource transferred from workers to management, etc.) 
of some reverse enablement mapping solution against the cost of computational 
rundown in 9 percent of the parallel computational phases. 
Certainly, a solution exists for the reverse, indirect enablement mapping. 
Once the values of the information selection map (represented in the code frag-
ment by the array IMAP) have been determined, it is a simple matter to produce 
a composite map of first phase granules that must be completed in order to 
enable a particular second phase granule. The executive can then use this map 
upon each first phase granule completion to determine the computability of 
particular second phase granules. This map could also be used to direct a 
preferred order of first phase granule dispatching so as to enable a known
second phase granule as early as possible.
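A composite map of this kind might be built as sketched below. The Python is a minimal illustration under stated assumptions: the selection map is given as a dictionary from second-phase granule to the list of first-phase granules it reads, and the function names (composite_map, computable, preferred_order) are hypothetical rather than PAX facilities.

```python
def composite_map(imap):
    """Second-phase granule -> set of first-phase granules it requires."""
    return {i: set(sources) for i, sources in imap.items()}


def computable(required, completed):
    """Second-phase granules whose required first-phase granules are all done."""
    return sorted(i for i, need in required.items() if need <= completed)


def preferred_order(required, target, n):
    """Dispatch order for first-phase granules 1..n that enables the chosen
    second-phase granule `target` as early as possible."""
    first = sorted(required[target])
    rest = [g for g in range(1, n + 1) if g not in required[target]]
    return first + rest


# A small stand-in for IMAP: second-phase granule -> first-phase sources.
imap = {1: [2, 3], 2: [1, 1], 3: [3, 2], 4: [4, 1]}
required = composite_map(imap)

print(computable(required, {1}))         # after granule 1 completes
print(computable(required, {2, 3}))      # after granules 2 and 3 complete
print(preferred_order(required, 4, 4))   # run granules 1 and 4 first
```

Duplicate entries in the selection map collapse when the source lists become sets, so the composite map records each required first-phase granule once.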
Two important facts about this reverse enablement mapping must be
included. First, both occurrences of this situation involved a dynamically
generated information selection map. Thus, the composite granule map would 
have to be generated by the executive at or after first phase initiation but 
before any second phase enablements. Second, the impact of executive computa-
tion must be considered. In the PAX/CASPER UNIVAC 1100 test bed, executive 
computation was done at the direct expense of worker computation. Thus, exten-
sive composite granule map generation could be self defeating. Some real 
parallel machines may provide separate executive computing resources, in which 
case the generation and use of composite granule maps would not be out of the
question.
A final enablement form was observed in PAX/CASPER that could be charac-
terized as a forward, indirect mapped situation. Consider the following 
Fortran fragment: 
      DO 10 I=1,M                  ! Generate forward
         IMAP(I)=IRAND()           ! map
   10 CONTINUE
      DO 100 I=1,M                 ! Use forward map
         B(IMAP(I))=A(IMAP(I))     ! to operate on a
  100 CONTINUE                     ! subset of the
                                   ! arrays
      DO 200 I=1,N                 ! Perform some further
         C(I)=B(I)                 ! operation on the
  200 CONTINUE                     ! complete arrays
This situation is somewhat easier than the reverse, indirect mapping in
that the identification of a particular granule in the first phase can be 
directly mapped to an enabled granule in the successor phase; however, much of 
the complication of a mapped enablement remains. This form was the least fre-
quently occurring situation in PAX/CASPER showing up only once (5 percent of 
the phases) and accounting for only 31 of 1188 lines of code executed in 
parallel. 
No other forms of enablement mapping were observed in PAX/CASPER. 
Certainly, extensions of the forms already presented can be imagined. 
Additionally, a seam mapping problem (such as would be appropriate for the 
checkerboard approach to the successive over-relaxation problem) can be 
foreseen. These other forms are beyond the scope of the present paper. 
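The frequencies reported in this section can be collected and checked with a short tabulation. The counts below are the ones quoted in the text; the dictionary layout and variable names are only an illustrative arrangement.

```python
# (phases, lines of parallel code) for each enablement mapping observed
observed = {
    "universal":        (6, 266),
    "identity":         (9, 551),
    "null":             (4, 262),
    "reverse indirect": (2, 78),
    "forward indirect": (1, 31),
}

total_phases = sum(phases for phases, _ in observed.values())
total_lines = sum(lines for _, lines in observed.values())

for name, (phases, lines) in observed.items():
    print(f"{name:17s} {phases:2d} phases ({100 * phases / total_phases:4.1f}%)"
          f"  {lines:4d} lines ({100 * lines / total_lines:4.1f}%)")

easy_phases = observed["universal"][0] + observed["identity"][0]
easy_lines = observed["universal"][1] + observed["identity"][1]
print(easy_phases, "of", total_phases, "phases;",
      easy_lines, "of", total_lines, "lines")
```

The five categories account for all 22 phases and all 1188 lines; the universal and identity rows together cover 15 of the 22 phases and 817 of the 1188 lines.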
Language Construction 
The developing PAX/CASPER language is simple and requires the user to 
make specific statements concerning choices for the management of each parallel 
computational phase. Statements involving the enablement of a succeeding phase 
could be made at two times: during the definition of a computational phase to 
the management system and during the invocation of the phase for actual com-
putations. The difficulty to be faced is that the statements no longer apply 
solely to the phase being referenced, but rely also on the characteristics of 
the succeeding phase. 
The simplest approach is to require the user to specify the appropriate 
enablement mapping method when the phase is invoked. It might appear as in
the following PAX parallel language fragment: 
DISPATCH phase-name 
ENABLE/MAPPING=option 
This is simple and explicit; however, it leaves the door wide open to 
user mistakes. There is no interlock between this phase and the next that can 
be verified by the executive. A simple solution to this would be to identify 
the name of the enabled next phase so that the executive system (or language 
processor) can verify that, in fact, that phase is following. This might 
appear as follows: 
DISPATCH phase-name 
ENABLE [phase-name/MAPPING=option] 
This allows the desired verification, but also brings up a new possi-
bility. Occasionally, a conditional branch that is not dependent on the com-
putational phase separates that phase from two or more succeeding phases, each 
of which may (or may not) be overlappable. If each of these phases were iden-
tified in the above construct, the executive could preprocess the branch and 
overlap the appropriate phase. This could look as follows: 
DISPATCH phase-name
ENABLE/BRANCHINDEPENDENT
[phase-name-1/MAPPING=option
phase-name-2/MAPPING=option]
IF (IMOD(LOOPCOUNTER,10).NE.0)
THEN GO TO branch-target
DISPATCH phase-name-1
GO TO rejoin
branch-target:
DISPATCH phase-name-2
rejoin:
Finally, the matching of mapping selections and phases and the invocation 
of the appropriate overlapping services is something that could be done when 
the parallel phase is defined to the system; however, it would still be neces-
sary to identify preprocessable branches at the computation invocation site. 
This could appear as follows: 
DEFINE PHASE phase-name
ENABLE [
phase-name-1/MAPPING=option
phase-name-2/MAPPING=option
phase-name-3/MAPPING=option
]
DISPATCH phase-name
ENABLE/BRANCHINDEPENDENT
The ENABLE/BRANCHINDEPENDENT would be deleted when branch preprocessing 
was either not appropriate or not needed. The executive system could perform 
the appropriate lookahead to see whether any of the named succeeding phases 
was actually following and apply, as appropriate, the specified enablement 
mapping. 
Control Strategies 
Control strategies for enabling and scheduling overlapped parallel compu-
tational phases are, of course, highly dependent upon the overall parallel 
processing strategies. As alluded to earlier, some approaches to parallel 
processing may do all of this before any computations are begun. Indeed, the 
entire process may be done manually by a human being when the pattern of paral-
lel processing is fixed for the life of the system. 
Within the PAX system, the opposite is true: the identification and 
scheduling of computable granules is entirely automatic. A scheduling mecha-
nism for enabled computational granules already exists within the PAX system. 
It was developed to schedule dynamically created computations that conflicted 
(usually in terms of shared data access) with pre-existing computational 
granules. 
Within PAX, each internal description of one (or more) computational 
granules included a queue head for a double circularly-linked list of comput-
able but conflicting computational granules. Upon completion of the described 
computation, all the queued conflicting computations became unconditionally 
computable and were placed in the waiting computation queue. The waiting com-
putation queue was kept in a known order and, for the purposes of the conflict-
ing computation problem, it was determined that such conflicting computations 
would be placed ahead of the normal computations in the queue and, thus, given 
higher priority. 
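The queueing behavior described above can be sketched as follows. This is a simplified illustration, not the PAX data structure: a Python deque stands in for the double circularly-linked list, and the names Description, queue_conflict, and complete are hypothetical.

```python
from collections import deque


class Description:
    """Stand-in for a description of one or more computational granules."""

    def __init__(self, name):
        self.name = name
        self.conflicted = deque()   # computable but conflicting computations


def queue_conflict(current, successor):
    """Hold `successor` until the computation described by `current` completes."""
    current.conflicted.append(successor)


def complete(current, waiting):
    """On completion, release the queued conflicting computations ahead of
    the normal computations in the waiting queue, keeping their order."""
    # extendleft() pushes items one at a time, so reverse first to preserve order.
    waiting.extendleft(reversed(current.conflicted))
    current.conflicted.clear()


waiting = deque([Description("normal-1"), Description("normal-2")])
current = Description("current")
queue_conflict(current, Description("conflict-1"))
queue_conflict(current, Description("conflict-2"))

complete(current, waiting)
print([d.name for d in waiting])
```

After completion the two released computations sit at the head of the waiting queue, ahead of the normal work, which models the higher priority mentioned in the text.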
The scheduling of universally mapped successor phases within this system 
is very easy indeed. At the time of phase initiation, the successor phase is 
also initiated and the resulting computation description placed in the waiting 
computation queue behind the current phase description. 
The scheduling of directly enabled successor phases is similarly easy at 
first sight. At the time of phase initiation, the successor phase is also 
initiated and the resulting computation description placed in the conflicted 
computation queue of the current phase description. Thus, when the current 
phase computation is completed, the now-enabled successor computation will be 
placed in the waiting computation queue to be considered for scheduling. 
The above approach for directly enabled successor phases is fine if each 
indivisible granule of computation is described separately. Unfortunately, 
this is usually not economical (in terms of storage space and task search 
times, among other things) and was not the choice taken in PAX design. Compu-
tations were, instead, described as large, contiguous collections of granules. 
The descriptions were split apart as necessary to produce conveniently sized 
tasks for workers and then merged back into single descriptions when the work 
was completed. This splitting of descriptions requires that queued computation 
descriptions also be split so that each queued description will accurately 
reflect the enablement relationship between the computation and its queued 
successor computation. 
While this is certainly possible, it forces a further design decision for 
the executive software. PAX computation splitting was demand driven by the
presence of an idle worker. It was felt that the delay while splitting a task 
description was acceptable; however, the additional delays of splitting queued 
successor computation descriptions may represent an unacceptable situation. 
Two possible solutions exist. One possibility is to presplit the tasks before 
idle workers present themselves to the executive. This would allow the execu-
tive to work ahead in otherwise idle time. Alternatively, the splitting of a 
computation could generate a successor-splitting task that could be quickly 
queued for later attention when the executive would again be idle. 
The successor computation description could be removed from the current 
computation description and included in the successor-splitting task informa-
tion. When the successor-splitting task is executed the successor computation 
could be split and requeued to the appropriate current computation descriptions. 
Management of indirectly (both forward and reverse) mapped successor com-
putations is a good deal more interesting. The description of the successor 
computation cannot simply be queued to the description of the current computa-
tion since there is no guarantee of the enablement relationship. Additionally, 
it would seem wise to get the current phase into execution without the delay 
of constructing the necessary information for enabling successor computations. 
Both forward and reverse indirection would seem well handled by much the same 
mechanisms since the only significant difference is the direction of the 
indirection. Each leads naturally to a list of current phase granules that 
must be completed to enable a particular successor phase granule. 
It would seem appropriate to identify a subset group of successor-phase 
granules that are to be the subject of the enablement operation so as to avoid 
solving an unnecessarily large enablement problem. Once this subset has been 
identified, the current-phase granules that enable the successor subset can be 
identified. Since these are not necessarily the current phase granules that 
would be naturally selected by the scheduling mechanism, they should be split 
into individual descriptions and placed in the waiting computation queue in
such a manner as to elevate their computational priority.
It is important to note that the description of the successor subset
cannot simply be queued to any one of the identified current-phase granules
since it is enabled not by the completion of any one such granule but by the
completion of all the identified granules. This enablement on completion of all
identified current-phase granules can be handled by any number of simple
mechanisms. For instance, during completion processing, a status bit (set when
the current-phase granules were identified and split into individual
descriptions) can be checked and, if it is set, an enablement counter decremented.
When the enablement counter reaches zero, it can be taken as a signal that the
successor-phase granules are computable.
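The status bit and enablement counter might be arranged as in the sketch below. It is a minimal illustration under assumptions that go beyond the text: each identified current-phase granule belongs to at most one successor subset, and the status bit is modeled by membership in a dictionary. The class and method names are hypothetical.

```python
class SuccessorSubset:
    """A group of successor-phase granules awaiting enablement."""

    def __init__(self, name, enabling_granules):
        self.name = name
        self.counter = len(enabling_granules)   # enablement counter


class Executive:
    def __init__(self):
        self.flagged = {}    # granule -> subset; presence models the status bit
        self.waiting = []    # names of successor subsets that became computable

    def identify(self, name, enabling_granules):
        """Record the current-phase granules that enable a successor subset."""
        granules = set(enabling_granules)
        subset = SuccessorSubset(name, granules)
        for g in granules:
            self.flagged[g] = subset

    def complete(self, granule):
        """Completion processing for one current-phase granule."""
        subset = self.flagged.pop(granule, None)
        if subset is None:          # status bit not set: nothing to enable
            return
        subset.counter -= 1
        if subset.counter == 0:     # all identified granules are complete
            self.waiting.append(subset.name)


ex = Executive()
ex.identify("successor-subset", [3, 7, 9])
for granule in (3, 5, 7):
    ex.complete(granule)
print(ex.waiting)    # granule 9 is still outstanding
ex.complete(9)
print(ex.waiting)
```

Granule 5 carries no status bit and so leaves the counter untouched; the subset is released only when the last of granules 3, 7, and 9 completes.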
CONCLUDING REMARKS 
This paper has discussed the possibilities for overlapping parallel
computations in a general purpose parallel-computation environment so as to
minimize loss of computational resources. Practical experience with PAX/CASPER, a
parallel Navier-Stokes solver, suggests that simple and plausible steps could
provide such overlapping in 68 percent of the computational phases and that,
with extended effort, more than 90 percent of the computational phases are
amenable to some form of phase overlapping.
REFERENCES 
1. W.H. Jones, "Combined Aerodynamic and Structural Dynamic Problem Emulating
Routines (CASPER): Theory and Implementation," NASA TP-2418, 1985.
2. W.H. Jones, "Parallel, Asynchronous Executive (PAX): System Concepts,
Facilities, and Architecture," NASA TP-2179, 1983.
3. D.J. Arpasi, "Real-Time Multiprocessor Programming Language (RTMPL)
User's Manual," NASA TP-2422, 1985.
4. D.J. Arpasi and E.J. Milner, "Partitioning and Packing Mathematical
Simulation Models for Calculation on Parallel Computers," NASA TM-87170,
1986.
1. Report No.: NASA TM-87349
2. Government Accession No.:
3. Recipient's Catalog No.:
4. Title and Subtitle: Increasing Processor Utilization During
Parallel Computation Rundown
5. Report Date:
6. Performing Organization Code: 505-62-21
7. Author(s): William H. Jones
8. Performing Organization Report No.: E-3101
9. Performing Organization Name and Address:
National Aeronautics and Space Administration
Lewis Research Center
Cleveland, Ohio 44135
10. Work Unit No.:
11. Contract or Grant No.:
12. Sponsoring Agency Name and Address:
National Aeronautics and Space Administration
Washington, D.C. 20546
13. Type of Report and Period Covered: Technical Memorandum
14. Sponsoring Agency Code:
15. Supplementary Notes:
Prepared for the 1986 International Conference on Parallel Processing, sponsored
by the Institute of Electrical and Electronics Engineers, Inc., St. Charles,
Illinois, August 19-22, 1986.
17. Key Words (Suggested by Author(s)): Parallel processing; Parallel process
rundown; Parallel process overlapping
18. Distribution Statement: Unclassified - unlimited; STAR Category 59
19. Security Classif. (of this report): Unclassified
20. Security Classif. (of this page): Unclassified
21. No. of pages:
22. Price*:
*For sale by the National Technical Information Service, Springfield, Virginia 22161
