Graduation date: 1994

Parallel loops are one of the main sources of parallelism in scientific applications, and many parallel loops do not have a uniform iteration execution time. To achieve good performance for such applications on a parallel computer, iterations of a parallel loop have to be assigned to processors in such a way that each processor has roughly the same amount of work in terms of execution time. A parallel computer with a large number of processors tends to have distributed memory, and running a parallel loop on a distributed-memory machine also requires data distribution to be considered. This research investigates the scheduling of non-uniform parallel loops on both shared-memory and distributed-memory parallel computers.

We present Safe Self-Scheduling (SSS), a new scheduling scheme that combines the advantages of both static and dynamic scheduling schemes. SSS has two phases: a static scheduling phase and a dynamic self-scheduling phase that together reduce the scheduling overhead while achieving a well-balanced workload. The techniques introduced in SSS can be used by other self-scheduling schemes. The static scheduling phase further improves performance by maintaining a high cache hit ratio, resulting from the increased affinity of iterations to processors. SSS is also well suited to distributed-memory machines.

We introduce methods to duplicate data on a number of processors. These methods eliminate data movement during computation and improve scalability with problem size. We describe a systematic approach to implementing a given self-scheduling scheme on a distributed-memory machine, and we present a multilevel scheduling scheme that self-schedules parallel loops on a distributed-memory machine with a large number of processors, eliminating the bottleneck of a central scheduler.

We propose a method that uses abstractions to automate both self-scheduling methods and data distribution methods in parallel programming environments. The abstractions are tested using CHARM, a real parallel programming environment. Methods are also developed to tolerate processor faults caused both by physical failure and by reassignment of processors by the operating system during the execution of a parallel loop.

We tested the techniques discussed using simulations and real applications. Good results have been obtained on both shared-memory and distributed-memory parallel computers.
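To make the two-phase structure described above concrete, the following is a minimal sketch of a static-plus-self-scheduled loop, not the thesis's actual implementation. It assumes a loop of N independent iterations and P worker threads: each thread first executes a fixed block covering a fraction ALPHA of the iterations (the static phase), then repeatedly claims small chunks of the remaining iterations from a shared counter (the dynamic self-scheduling phase). The names N, P, ALPHA, CHUNK, and work(), and the use of POSIX threads, are all illustrative assumptions.

    /* Sketch of a two-phase (static + self-scheduled) parallel loop.
     * All constants and the loop body are assumptions for illustration. */
    #include <pthread.h>
    #include <stdio.h>

    #define N     1000   /* total loop iterations (assumed) */
    #define P     4      /* number of worker threads (assumed) */
    #define ALPHA 0.5    /* fraction of iterations assigned statically (assumed) */
    #define CHUNK 8      /* chunk size for the dynamic phase (assumed) */

    static long next_iter;  /* first unclaimed iteration in the dynamic phase */
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void work(long i) { (void)i; /* stand-in for a non-uniform loop body */ }

    static void *worker(void *arg)
    {
        long id = (long)arg;
        long static_per_proc = (long)(ALPHA * N) / P;

        /* Static phase: a fixed block per processor, decided before execution.
         * Repeated runs give each processor the same block, which is what
         * preserves iteration-to-processor affinity. */
        long lo = id * static_per_proc;
        long hi = lo + static_per_proc;
        for (long i = lo; i < hi; i++)
            work(i);

        /* Dynamic phase: idle processors self-schedule the remaining
         * iterations by claiming chunks from a shared counter. */
        for (;;) {
            pthread_mutex_lock(&lock);
            long start = next_iter;
            next_iter += CHUNK;
            pthread_mutex_unlock(&lock);
            if (start >= N)
                break;
            long end = start + CHUNK < N ? start + CHUNK : N;
            for (long i = start; i < end; i++)
                work(i);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t[P];
        next_iter = ((long)(ALPHA * N) / P) * P;  /* first dynamic iteration */
        for (long id = 0; id < P; id++)
            pthread_create(&t[id], NULL, worker, (void *)id);
        for (long id = 0; id < P; id++)
            pthread_join(t[id], NULL);
        printf("done\n");
        return 0;
    }

In this sketch the static phase carries no scheduling overhead at all, while the dynamic phase pays one lock acquisition per chunk; tuning ALPHA trades the low overhead and affinity of static assignment against the load balancing of self-scheduling, which is the trade-off SSS is designed around.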