A compiler for recognizing statements of a FORTRAN program which are suited for fast execution on a parallel or pipeline machine such as ILLIAC-IV, STAR or ASC is described. The technique employs "interval analysis" to provide flow information to the vector/parallel recognizer. Where profitable the compiler changes scalar variables to subscripted variables. The output of the compiler is an extension to FORTRAN which shows parallel and vector operations explicitly.
INTRODUCTION
The very high performance computers being built today for delivery in the next several years (e.g. Texas Instruments' ASC, Control Data's STAR and Burroughs' ILLIAC-IV) rely on radically new machine organizations to attain their speed. They are based on pipeline (vector) or parallel processing concepts (1) which, however different they may be, appear quite similar to the user. Each of these computers requires that the same sequence of operations be applied to a large set of data items in a regular fashion. Each operation is specified in turn and it is applied to the entire set of data. Thus, these machines may be thought of as performing an operation simultaneously on all data items. The ILLIAC-IV operates on 64 data items in parallel. The STAR and ASC perform operations on data items sequentially, but with a very high degree of overlap.
However, programmers are not accustomed to stating problems in the form required by these machines. An effective means of problem statement is transformation of a standard high level language program into one which details parallel operations. This paper describes the techniques used in a compiler to perform this transformation. The compiler accepts a standard FORTRAN program (2), identifies implicit parallel or vector operations and produces a program which performs these operations explicitly. The program output by the compiler is 'almost' in the Burroughs ILLIAC-IV FORTRAN language (3). In this extension to FORTRAN an asterisk appearing in a subscript position indicates that the entire column takes part in an operation. The operation is performed in a manner analogous to the functioning of an I/O statement's "implied DO".
The next three sections of this paper describe three of the four compiler segments:
1. Statement Classification
2. Flow Analysis
3. Recognition of Parallel or Vector Operations
4. Optimization (not included; will be discussed in a later paper).
STATEMENT CLASSIFICATION
As each source statement is read it is classified as one of the approximately forty FORTRAN statement types (4). The appropriate routine is then called to process the statement. Two functions are performed at this time. The source statements are transformed to an intermediate text representation which is convenient for further analysis, and information is gathered which will later be used to determine program flow and data flow. Each appearance of a variable, constant, or label is entered in a "reference table" and dictionary. The format of this key pair of tables is given in Figure 1. When the entire FORTRAN program has been read, control is transferred to the flow analysis routines.
FLOW ANALYSIS
It is now possible to characterize the program flow in terms of basic blocks. The elementary flow relationship that we employ is called a predecessor. Block "i" is said to be a predecessor of block "j" if block "j" can be reached from block "i"; it is an immediate predecessor if block "j" can be reached in one step. A set of predecessor lists is used to represent the program flow. Each predecessor list gives all the immediate predecessor blocks, as in Figure 3.
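The predecessor relation can be illustrated with a small sketch. It is written in Python rather than in the compiler's own implementation language, and the flow graph, the edge-list input format and the function names are all assumptions made for illustration; the paper does not give the compiler's actual table layout.

```python
from collections import defaultdict

def immediate_predecessors(edges):
    """Map each block to the sorted list of its immediate predecessors."""
    preds = defaultdict(set)
    blocks = set()
    for src, dst in edges:
        preds[dst].add(src)
        blocks.update((src, dst))
    return {b: sorted(preds[b]) for b in sorted(blocks)}

def is_predecessor(preds, i, j):
    """True if block j can be reached from block i in one or more steps."""
    seen, stack = set(), [j]
    while stack:
        block = stack.pop()
        for p in preds[block]:
            if p == i:
                return True
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return False

# Hypothetical flow graph: 1 -> 2 -> 3, a backward branch 3 -> 2, and 3 -> 4.
EDGES = [(1, 2), (2, 3), (3, 2), (3, 4)]
PREDS = immediate_predecessors(EDGES)
```

Here `PREDS` plays the role of the set of predecessor lists, and `is_predecessor` walks those lists backwards from block "j" to decide the general (multi-step) predecessor relation.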
[Figure 2. Columns: Block, Predecessors]
We are ready to begin processing at this time. All that remains is to choose an ordering of the basic blocks. Clearly we wish an ordering that will identify loops, as that is a necessary condition for the existence of parallel or vector operations. The notion of a "Strongly Connected Region" (SCR), which is roughly equivalent to the extended range of a "DO" loop, is convenient at this time. An SCR is a set of basic blocks with the property that any pair of basic blocks of the set are predecessors of one another. Nested loops produce a nested set of SCRs when all blocks of inner loops are treated as single blocks. A construct which readily finds SCRs and has other useful ordering relations is called the "Cocke-Allen interval decomposition" (6, 7). By using this technique we will identify SCRs and determine a processing order.
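The SCR property can be checked directly on a small example. The following Python sketch is illustrative only: the edge list and function names are hypothetical, and it assumes that the paths making two blocks predecessors of one another stay inside the candidate set, which the text does not state explicitly.

```python
def reachable_within(edges, blocks, start):
    """Blocks reachable from `start` in one or more steps, following only
    edges whose two endpoints both lie in `blocks`."""
    succ = {b: [] for b in blocks}
    for src, dst in edges:
        if src in succ and dst in succ:
            succ[src].append(dst)
    seen, stack = set(), [start]
    while stack:
        for nxt in succ[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def is_scr(edges, blocks):
    """True if every block of the set can reach every block of the set,
    itself included, so that any pair are predecessors of one another."""
    blocks = set(blocks)
    return bool(blocks) and all(
        reachable_within(edges, blocks, b) == blocks for b in blocks
    )

# Hypothetical flow graph: 1 -> 2 -> 3, a backward branch 3 -> 2, and 3 -> 4.
EDGES = [(1, 2), (2, 3), (3, 2), (3, 4)]
```

This brute-force check is only meant to make the definition concrete; the interval decomposition is the construct the paper relies on to find SCRs.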
The "interval" construction arises naturally by extension of the concept of basic block. Because a basic block has only sequential flow it is not useful where consideration of flow is necessary. We extend the definition of basic block to permit divergent flow paths. This is still not adequate for handling SCRs so we extend the definition again to permit convergent flow paths from a common predecessor. This development is traced below.
We now define an "Extended Basic Block" as a section of code with one entry and any number of exits. An extended basic block is a set of basic blocks which can be amalgamated to a unit, as shown in Figure 4. The extended basic block permits us to work with larger units of code. For example, Figure 4 illustrates an extended basic block which is a set of basic blocks on three divergent paths. We may consider blocks 1-2-3, blocks 1-2-4 and blocks 1-5 as units, for the order of code is sequential along each path.
The definition is now extended to allow convergent paths, and so yields an "interval". An interval is the maximal set of basic blocks containing a distinguished basic block, called the "interval head", with the properties that:
a) all predecessors of blocks in the interval, except those of the interval head, must belong to the interval.
b) any SCR in the interval must include the head.
In constructing an interval we again begin by setting the interval equal to a given basic block, the interval head. A basic block may be added to the interval if and only if all of its immediate predecessors are already in the interval. This definition yields a partial order among the blocks of the interval, shown in Figure 6.
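The construction rule for a single interval can be sketched as follows. This is an illustrative Python rendering under stated assumptions: the predecessor-list dictionary, the example flow graph and the function name are hypothetical, and blocks with no predecessors (program entry points) are assumed never to join an interval except as its head.

```python
def build_interval(preds, head):
    """Grow an interval from `head`.

    preds maps every block to the list of its immediate predecessors.
    The returned list starts with the head; each later block was added
    only after all of its immediate predecessors were already present.
    """
    interval = [head]
    grew = True
    while grew:
        grew = False
        for block, block_preds in preds.items():
            if block in interval or not block_preds:
                continue  # already a member, or an entry block
            if all(p in interval for p in block_preds):
                interval.append(block)
                grew = True
    return interval

# Hypothetical flow graph:
# 1 -> 2, 2 -> 3, 2 -> 4, 3 -> 5, 4 -> 5, 5 -> 2 (backward branch), 5 -> 6
PREDS = {1: [], 2: [1, 5], 3: [2], 4: [2], 5: [3, 4], 6: [5]}
FROM_ENTRY = build_interval(PREDS, 1)
FROM_LOOP_HEAD = build_interval(PREDS, 2)
```

Starting at block 1 the interval cannot grow, because block 2 also has block 5 as an immediate predecessor. Starting at block 2 the interval absorbs the loop and the block that follows it, and the order in which blocks were appended is one listing consistent with the partial order.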
Since a basic block can be made part of an interval only if all its predecessors are already in the interval, it is clear that an SCR cannot be added to an interval. However, an interval will contain an SCR when a block of the SCR which could not be added to a prior interval becomes the head of the interval. Thus, intervals can be used to identify SCRs.
The interval construction uniquely partitions a program flow-graph (6). The resulting intervals are then treated as basic blocks and the process repeated as in Figure 7. The set of iterated intervals defines a processing order (with some blocks processed several times) for analysis of the source program. The iterated interval sequence either converges to a single block or takes on an irreducible form. By transformation of the source program the irreducible form could be eliminated (8). Currently either condition signals the completion of processing.
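The partitioning and iteration steps can be sketched as below. This is a minimal Python illustration, not the compiler's routine: the function names and the two example flow graphs are hypothetical, every block is assumed reachable from the entry block, and "irreducible" is detected here simply as a pass in which no two blocks could be merged.

```python
def interval_of(preds, head):
    """Blocks of the interval headed by `head`, in order of addition."""
    members = [head]
    grew = True
    while grew:
        grew = False
        for b, ps in preds.items():
            if b not in members and ps and all(p in members for p in ps):
                members.append(b)
                grew = True
    return members

def partition(preds, entry):
    """Partition the flow graph into intervals, starting at `entry`."""
    intervals, assigned, heads = [], set(), [entry]
    while heads:
        members = interval_of(preds, heads.pop(0))
        intervals.append(members)
        assigned.update(members)
        # An unassigned block with a predecessor in this interval heads a new one.
        for b, ps in preds.items():
            if b not in assigned and b not in heads and any(p in members for p in ps):
                heads.append(b)
    return intervals

def derive(preds, intervals):
    """Treat each interval as a single block named after its head."""
    owner = {b: members[0] for members in intervals for b in members}
    derived = {members[0]: [] for members in intervals}
    for b, ps in preds.items():
        for p in ps:
            if owner[p] != owner[b] and owner[p] not in derived[owner[b]]:
                derived[owner[b]].append(owner[p])
    return derived

def iterate_intervals(preds, entry):
    """Repeat the partition until one block remains or nothing merges."""
    sequence = []
    while True:
        intervals = partition(preds, entry)
        sequence.append(intervals)
        if len(intervals) == 1 or len(intervals) == len(preds):
            return sequence
        preds = derive(preds, intervals)

# Hypothetical reducible graph with one loop (blocks 2-5) and an exit block 6.
LOOP = {1: [], 2: [1, 5], 3: [2], 4: [2], 5: [3, 4], 6: [5]}
# Hypothetical irreducible graph: blocks 2 and 3 branch to each other and
# both are entered directly from block 1.
TANGLE = {1: [], 2: [1, 3], 3: [1, 2]}
```

For `LOOP` the first pass yields the intervals headed by blocks 1 and 2, and the second pass merges them into a single block. For `TANGLE` every block stays in its own interval, which is the irreducible case.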
RECOGNITION OF PARALLEL OR VECTOR OPERATIONS
The input to this segment of the compiler is the ordered set of blocks of an interval. The interval may contain the entire program or an SCR. The SCR is the item of interest and must be located within the interval. As observed earlier the target of the backward branch creating the SCR must be the interval head. The SCR consists of all interval predecessors of the interval head. It is in the SCR that parallel or vector operations can be performed when successive passages through the SCR are independent of one another.
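Locating the SCR inside an interval amounts to a backward walk from the head that never leaves the interval. The Python sketch below is illustrative; the function name, the predecessor-list dictionary and the example graph are assumptions, with the interval given as a list whose first element is the head.

```python
def scr_of_interval(preds, interval):
    """Blocks of the interval from which the interval head can be reached.

    Returns an empty set when no backward branch targets the head,
    i.e. when the interval contains no SCR.
    """
    head = interval[0]
    members = set(interval)
    scr, stack = set(), [head]
    while stack:
        for p in preds[stack.pop()]:
            if p in members and p not in scr:
                scr.add(p)
                stack.append(p)
    return scr

# Hypothetical flow graph:
# 1 -> 2, 2 -> 3, 2 -> 4, 3 -> 5, 4 -> 5, 5 -> 2 (backward branch), 5 -> 6
PREDS = {1: [], 2: [1, 5], 3: [2], 4: [2], 5: [3, 4], 6: [5]}
```

In this example the interval headed by block 2 contains blocks 2 through 6, but only blocks 2 through 5 lie on a path back to the head, so block 6 is excluded from the SCR.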
In order to execute computations within the SCR in a parallel or vector mode of operation, the number of iterations must not be dependent on computations within the SCR. Equivalently, the SCR must have only one exit, controlled by the variable used to count the number of iterations, called the induction variable.
The induction variable must now be identified. To accomplish this it is necessary to distinguish "relative constants", those variables (all constants are relative constants) whose values do not change within the SCR. This is done iteratively, by first marking all variables which are not defined within the SCR, as they are clearly relative constants. Next, all variables which have only a single unconditional definition that precedes all uses and is defined in terms of relative constants are marked relative constants. This process is repeated until no new relative constants are found.
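The iterative marking can be sketched in Python as follows. The data model is an assumption made for illustration: each definition inside the SCR is a dictionary with hypothetical keys `conditional`, `before_uses` and `operands`, and literal constants are simply omitted from the operand lists since they are always relative constants.

```python
def relative_constants(defs, used):
    """Return the set of relative constants of an SCR.

    defs: variable -> list of its definitions inside the SCR.
    used: every variable name referenced inside the SCR.
    """
    # Step 1: variables never defined inside the SCR.
    marked = {v for v in used if v not in defs}
    # Step 2: repeat until no new relative constants are found.
    changed = True
    while changed:
        changed = False
        for var, dlist in defs.items():
            if var in marked or len(dlist) != 1:
                continue
            d = dlist[0]
            if (not d["conditional"] and d["before_uses"]
                    and set(d["operands"]) <= marked):
                marked.add(var)
                changed = True
    return marked

# Hypothetical loop body:  T = A + B ;  U = T * C ;  I = I + 1
DEFS = {
    "T": [{"conditional": False, "before_uses": True, "operands": ["A", "B"]}],
    "U": [{"conditional": False, "before_uses": True, "operands": ["T", "C"]}],
    "I": [{"conditional": False, "before_uses": False, "operands": ["I"]}],
}
USED = {"A", "B", "C", "T", "U", "I", "N"}
RC = relative_constants(DEFS, USED)
```

`T` becomes a relative constant because it depends only on `A` and `B`, `U` then follows from `T` and `C`, and `I` is never marked because it is defined in terms of itself.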
If the exit branch of the SCR depends upon precisely one variable that is not a relative constant then that variable is the induction variable candidate. In order for the induction variable candidate to be the induction variable it may only be defined by addition or subtraction of a relative constant expression to itself within unconditional arithmetic statements. In an SCR corresponding to a "DO" loop the induction variable is the DO variable. Identification of the induction variable means it is possible to determine the number of iterations of the SCR before entry. If no induction variable can be identified then parallel or vector calculations cannot be performed and the next interval is processed.
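The candidate test and the check on its definitions can be sketched as below. This Python fragment is illustrative: the function name and the dictionary keys `op`, `base`, `step_vars` and `conditional` are hypothetical stand-ins for whatever the intermediate text actually records about each assignment.

```python
def find_induction_variable(exit_vars, rel_consts, defs):
    """Return the induction variable of an SCR, or None if there is none.

    exit_vars:  variables the exit branch depends on.
    rel_consts: the relative constants of the SCR.
    defs:       variable -> list of its definitions inside the SCR, each
                describing a statement of the form  var = base op step.
    """
    candidates = [v for v in exit_vars if v not in rel_consts]
    if len(candidates) != 1:
        return None
    cand = candidates[0]
    cand_defs = defs.get(cand, [])
    if not cand_defs:
        return None
    for d in cand_defs:
        if d["conditional"] or d["op"] not in ("+", "-"):
            return None
        if d["base"] != cand or not set(d["step_vars"]) <= rel_consts:
            return None
    return cand

# Hypothetical SCR: exit test compares I with N; the only definition is I = I + K.
STEP = {"op": "+", "base": "I", "step_vars": ["K"], "conditional": False}
FOUND = find_induction_variable(["I", "N"], {"N", "K"}, {"I": [STEP]})
```

A `None` result corresponds to the case in the text where no induction variable can be identified and the next interval is processed.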
When the induction variable has been identified it is used to determine which subscripted variables may participate in vector and parallel operations. Currently the compiler considers only subscripted variables in which the induction variable makes precisely one appearance in the same position in every subscript. Any other variable which appears in the subscript must be a relative constant, so that its value is available for the entire parallel or vector operation (corresponding to each iteration through the SCR). A subscripted variable may participate in vector and parallel operations only if there is no "feedback" between different iterations of the SCR, as shown in Figure 8.
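The restriction on subscripts can be written as a small predicate. The Python sketch below assumes a hypothetical representation in which each subscript is a list of positions and each position is a pair (coefficients by variable name, constant term); it covers only the form of the subscripts, not the feedback test.

```python
def subscripts_eligible(subscripts, iv, rel_consts):
    """True if `iv` appears in exactly one position of every subscript,
    that position is the same throughout, and every other variable
    appearing in a subscript is a relative constant."""
    positions = set()
    for sub in subscripts:
        hits = [k for k, (coeffs, _) in enumerate(sub) if coeffs.get(iv, 0) != 0]
        if len(hits) != 1:
            return False
        positions.add(hits[0])
        others = {v for coeffs, _ in sub for v in coeffs if v != iv}
        if not others <= rel_consts:
            return False
    return len(positions) == 1

# Hypothetical references to one array, with induction variable I:
A_I_K = [({"I": 1}, 0), ({"K": 1}, 0)]     # A(I, K)
A_I1_K = [({"I": 1}, 1), ({"K": 1}, 0)]    # A(I+1, K)
A_K_I = [({"K": 1}, 0), ({"I": 1}, 0)]     # A(K, I)
A_I_I = [({"I": 1}, 0), ({"I": 1}, 0)]     # A(I, I)
```

`A(I, K)` together with `A(I+1, K)` passes when `K` is a relative constant; mixing `A(I, K)` with `A(K, I)` fails because the induction variable changes position, and `A(I, I)` fails because it appears twice.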
In order to determine whether feedback can occur we must know in which direction the induction variable is changing. If the induction variable is not incremented by a constant, we must examine the exit branch of the SCR. If the loop exit occurs on a "greater" (>, ≥) condition the induction variable is increasing; if the loop exit occurs on a "less" (<, ≤) condition, the induction variable is decreasing. Other conditions make it impossible to determine the direction of change of the induction variable, therefore impossible to check for feedback, and so parallel and vector operations cannot be performed.
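The decision can be summarized in a few lines of Python. This is a sketch under assumptions: the function name and the relation codes `GT`, `GE`, `LT`, `LE` are hypothetical, and reading the direction from the sign of a constant increment is an inference from the text, which only says that the exit branch is examined when the increment is not a constant.

```python
def direction_of_change(step, exit_relation):
    """Return 'increasing', 'decreasing', or None when undeterminable.

    step:          the increment if it is a known nonzero constant, else None.
    exit_relation: relation on which the loop exit is taken.
    """
    if isinstance(step, (int, float)) and step != 0:
        return "increasing" if step > 0 else "decreasing"
    if exit_relation in ("GT", "GE"):
        return "increasing"
    if exit_relation in ("LT", "LE"):
        return "decreasing"
    return None  # direction unknown: feedback cannot be checked
```

A `None` result corresponds to the case in which parallel and vector operations cannot be performed.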
The condition for feedback is now described for an increasing induction variable. Let I(S) be the multiplier of the induction variable in subscript S, and let J(S) be the constant term of the induction variable in subscript S. For example, if "L" is the induction variable, then I(3*L+2, M-7, 2*N+4) = 3 and J(3*L+2, M-7, 2*N+4) = 2.
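The two quantities can be computed mechanically once a subscript is held as a list of linear forms. In the Python sketch below the representation (a pair of coefficient dictionary and constant term per subscript position) and the function name are assumptions for illustration; the example reproduces the subscript used in the text.

```python
def multiplier_and_constant(subscript, iv):
    """Return (I(S), J(S)) for subscript S and induction variable iv.

    subscript: list of positions, each a (coefficients, constant) pair.
    Raises ValueError unless iv appears in exactly one position.
    """
    hits = [(coeffs[iv], const)
            for coeffs, const in subscript
            if coeffs.get(iv, 0) != 0]
    if len(hits) != 1:
        raise ValueError("induction variable must appear in exactly one position")
    return hits[0]

# The subscript (3*L+2, M-7, 2*N+4) with induction variable L.
S = [({"L": 3}, 2), ({"M": 1}, -7), ({"N": 2}, 4)]
I_S, J_S = multiplier_and_constant(S, "L")
```

Only the position containing the induction variable contributes; the positions `M-7` and `2*N+4` are ignored because they involve only relative constants.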
Let I_L be the maximum of I(S) over all subscripts appearing in definitions of the variable and let I_R be the minimum of I(S) over all subscripts appearing in uses of the variable. Define J_L and J_R in a similar fashion. Feedback potentially exists if I_L > I_R or J_L > J_R. As illustrated in Figure 8, potential feedback does not always result in feedback. Potential feedback implies the existence of subscripts "s" in a definition and "t" in a use of the variable with I(s) > I(t) or J(s) > J(t).
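The screening test can be sketched in Python. The condition used here, I_L > I_R or J_L > J_R, is an assumption: it is the reading suggested by the statement about subscripts "s" and "t", and the function name and the (multiplier, constant) pair representation are likewise hypothetical. An increasing induction variable is assumed, as in the text.

```python
def potential_feedback(def_subs, use_subs):
    """Screen one subscripted variable for potential feedback.

    def_subs: (I, J) pairs for subscripts appearing in definitions.
    use_subs: (I, J) pairs for subscripts appearing in uses.
    """
    i_l = max(i for i, _ in def_subs)   # I_L: maximum multiplier over definitions
    j_l = max(j for _, j in def_subs)   # J_L: maximum constant over definitions
    i_r = min(i for i, _ in use_subs)   # I_R: minimum multiplier over uses
    j_r = min(j for _, j in use_subs)   # J_R: minimum constant over uses
    return i_l > i_r or j_l > j_r

# A(I) = A(I-1) + B(I): the definition A(I) runs ahead of the use A(I-1).
RECURRENCE = potential_feedback([(1, 0)], [(1, -1)])
# A(I) = A(I+1) + B(I): the use always refers to an element not yet redefined.
LOOKAHEAD = potential_feedback([(1, 0)], [(1, 1)])
```

A `True` result only flags the variable for closer inspection; as the text notes, potential feedback does not always result in feedback.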
In the case of potential feedback all definition/use pairs of subscripts must be checked to determine whether feedback actually occurs.
It is not sufficient merely to recognize vector and parallel operations among subscripted variables. It is desirable to replace sequential processes with vector and parallel processes whenever possible. Figure 9a illustrates a loop where temporary variables are used to hold common subexpressions; Figure 9b illustrates the same loop as transformed by the compiler. A scalar in an SCR is transformed into a vector when at least one of its unconditional definitions is in terms of a vector quantity.
Any calculation which is conditional, and thus loop dependent, cannot be performed in parallel or as a vector. This would at first seem to eliminate a great many operations but in fact does not. Even though flow information indicates that a calculation is conditional it may not be so in the context of an SCR. As illustrated in Figure 10, if a conditional calculation depends on a loop independent test then the calculation will either always or never be executed during the loop, as the test result does not vary within the loop. In this case both the loop independent test as well as the parallel and vector operations may be moved out of the loop as described below.
M - mongrel quantities; neither of the above.
Pairs of operands combine in the following manner:
The arithmetic analyzer moves 'V' subexpressions out of the loop to enable them to be executed in parallel or vector mode. It moves 'R' subexpressions to the next outer loop where they are again processed. On this second processing a former 'R' subexpression can assume any of the three possible types. 'M' subexpressions are left unchanged and are flagged so that they need not be re-examined since an 'M' subexpression cannot take on a different type in an outer loop. Figure 11 illustrates the action of the compiler on a sample program.
CONCLUSION
This paper has described the internal organization of a compiler used to recognize, create and translate vector or parallel executable operations into a form usable by the appropriate hardware.
ACKNOWLEDGEMENT
The author wishes to thank Ellinor Angel who designed and programmed the arithmetic analyzer.
