Figure 3.3 shows the data structures required to maintain a grid hierarchy. This structure has been designed keeping in view the potential parallelism in the algorithm.
In the next subsection, we explore how the HPF directives can be used to control the locality of such a collection of grids. That is, rather than constructing a single distribution which maps each grid as a whole to exactly one processor, we independently distribute the data arrays of each individual grid.
The HPF version of this code is given in Figure 3.6. The mapping is expressed by declaring the pointer, grid_data, in the derived type GRID_TYPE to be distributed by (*, BLOCK) ONTO P, where P is the set of all processors available to the program. The array GRID_COLL is not distributed, resulting in its replication across all processors.
This approach exploits the parallelism within each grid, but not the parallelism across the grids of a collection. Each processor may own a part of each grid, leading to a more even workload; however, some of the grids may not be large enough to effectively exploit all the processors in the system.
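The trade-off described above can be made concrete with a minimal sketch. It is written in Python rather than HPF, and the function names are hypothetical. It assumes the usual HPF BLOCK rule, in which n elements over p processors are split into blocks of size ceil(n/p), and it uses 0-based indices for both columns and processors.

```python
import math


def block_owner(j, n, p):
    """Owner (0-based) of 0-based index j when n elements are
    BLOCK-distributed over p processors."""
    block = math.ceil(n / p)  # assumed BLOCK rule: blocks of size ceil(n/p)
    return j // block


def columns_per_processor(n, p):
    """Count how many of the n columns of a grid each processor owns
    when the second dimension is distributed (*, BLOCK)."""
    counts = [0] * p
    for j in range(n):
        counts[block_owner(j, n, p)] += 1
    return counts


# A large grid spreads evenly; a small grid leaves a processor idle.
large = columns_per_processor(64, 4)
small = columns_per_processor(5, 4)
```

With 64 columns on 4 processors every processor owns 16 columns, whereas a grid with only 5 columns leaves the last processor with none, which is the situation the text warns about for small grids.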
The parallelism in the code is made explicit by using the INDEPENDENT directive to declare both levels of the nested loop in the solver routine to be parallel.
[Footnote 3] This is under the assumption that the underlying system does not support one-sided communication, since in that case the processor owning the data does not need to be involved in the communication.
Since the partitioning of the mesh is to be determined at runtime, the arrays constituting the mesh, GRID and EDGE, are declared to be DYNAMIC. As indicated above, the irregularity of the vertex numbering implies that the INDIRECT distribution is needed to map the vertices to the processors. Thus, the routine GRID_PARTITION not only partitions the grid but also returns the mapping array NODEMAP such that the value of its ith element represents the index of the processor on which the ith element of the GRID array is to be mapped.
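As an illustration of what an INDIRECT mapping expresses, the following Python sketch groups vertex indices by the processor named in a mapping array. It is not HPF and does not implement GRID_PARTITION; the NODEMAP values are made up for the example, the helper name is hypothetical, and indices are 0-based where the text counts from 1.

```python
def local_indices(nodemap, num_procs):
    """Group vertex indices by owning processor.

    nodemap[i] is the (0-based) processor on which vertex i is placed,
    mirroring the role of NODEMAP in the text.
    """
    owned = [[] for _ in range(num_procs)]
    for i, proc in enumerate(nodemap):
        owned[proc].append(i)
    return owned


# Made-up mapping for six vertices on three processors. Note that the
# vertices owned by one processor need not be contiguous.
nodemap = [0, 2, 1, 0, 2, 1]
owned = local_indices(nodemap, 3)
```

Because neighbouring vertices in the mesh need not have neighbouring indices, a regular distribution such as BLOCK cannot express this kind of placement, whereas an element-by-element mapping array can.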
Once the partitioning of the vertices has been determined, we can also determine the mapping of the array representing the edges. Given the structure of the computation, it would be useful to distribute EDGE in such a way that each edge resides on the same processor as the values at one or both of its nodes.
We have chosen to distribute the elements of EDGE to the processor which owns the values for the first of its nodes. We again use the INDIRECT distribution for this, assuming that the GRID_PARTITION routine will also set up the EDGEMAP array based on the values in the EDGE array.
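The rule for placing edges can be sketched in a few lines of Python. This is an assumption about how an edge mapping array could be derived from the node mapping, not the actual GRID_PARTITION routine; the function names and the sample mesh are hypothetical, and indices are 0-based.

```python
def build_edgemap(edge, nodemap):
    """Place each edge on the processor that owns its first node."""
    return [nodemap[first] for first, _second in edge]


def count_cut_edges(edge, nodemap):
    """Count edges whose second node lives on a different processor
    than the edge itself, so its value is not local to the edge."""
    return sum(1 for first, second in edge
               if nodemap[first] != nodemap[second])


# Made-up example: a ring of four vertices split over two processors.
edge = [(0, 1), (1, 2), (2, 3), (3, 0)]
nodemap = [0, 0, 1, 1]
edgemap = build_edgemap(edge, nodemap)
cut = count_cut_edges(edge, nodemap)
```

In this example two of the four edges have their second node on another processor. Under this placement rule, the first node of every edge is local by construction, and the quality of the vertex partition determines how often the second node is not.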
Note that the mapping arrays are as large as the unstructured mesh itself and hence have to be distributed themselves. This is in contrast to the mapping array used with multiblock codes in the last section, which was used to map the grids in a collection and hence was small and could be replicated across the processors.
The computation is specified using an INDEPENDENT loop, with an ON clause to specify where each iteration is to be performed. Thus the iterations of the loop, over the edges in this case, can be executed in parallel.
In Figure 4.1, the ON clause specifies that the Ith iteration should be performed on the processor that owns the (I, 1)th element of EDGE.
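The effect of the ON clause can be emulated sequentially, as in the Python sketch below. It is only an illustration under stated assumptions: the per-edge computation (a difference of node values) is a stand-in and not the loop body of Figure 4.1, the function names are hypothetical, indices are 0-based, and the owner of edge i is taken to be edgemap[i].

```python
def iterations_on(edgemap, num_procs):
    """Assign iteration i to the processor that owns edge i."""
    schedule = [[] for _ in range(num_procs)]
    for i, proc in enumerate(edgemap):
        schedule[proc].append(i)
    return schedule


def edge_sweep(edge, x, schedule):
    """Run a stand-in per-edge computation following the schedule.

    Each iteration writes only flux[i], so iterations do not interfere
    and the per-processor lists could be executed concurrently.
    """
    flux = [0.0] * len(edge)
    for proc_iters in schedule:      # one list per processor
        for i in proc_iters:
            n1, n2 = edge[i]
            flux[i] = x[n2] - x[n1]  # x[n2] may be owned elsewhere
    return flux


# Made-up data: four edges, two processors.
edge = [(0, 1), (1, 2), (2, 3), (3, 0)]
edgemap = [0, 0, 1, 1]
x = [1.0, 2.0, 4.0, 8.0]
schedule = iterations_on(edgemap, 2)
flux = edge_sweep(edge, x, schedule)
```

Since no iteration reads a value written by another, the result does not depend on the order in which the per-processor lists are processed, which is the property the INDEPENDENT directive asserts.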
