
Scheduling non-uniform parallel loops on MIMD computers



Graduation date: 1994

Parallel loops are one of the main sources of parallelism in scientific applications, and many parallel loops do not have a uniform iteration execution time. To achieve good performance for such applications on a parallel computer, iterations of a parallel loop must be assigned to processors so that each processor has roughly the same amount of work in terms of execution time. A parallel computer with a large number of processors tends to have distributed memory. To run a parallel loop on a distributed-memory machine, data distribution also needs to be considered. This research investigates the scheduling of non-uniform parallel loops on both shared-memory and distributed-memory parallel computers.

We present Safe Self-Scheduling (SSS), a new scheduling scheme that combines the advantages of both static and dynamic scheduling schemes. SSS has two phases: a static scheduling phase and a dynamic self-scheduling phase that together reduce the scheduling overhead while achieving a well-balanced workload. The techniques introduced in SSS can be used by other self-scheduling schemes. The static scheduling phase further improves performance by maintaining a high cache-hit ratio resulting from the increased affinity of iterations to processors. SSS is also very well suited to distributed-memory machines.

We introduce methods to duplicate data on a number of processors. These methods eliminate data movement during computation and increase the scalability of problem size. We discuss a systematic approach to implementing a given self-scheduling scheme on a distributed-memory machine.
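The two-phase idea described above can be illustrated with a small sketch. This is not the thesis's actual algorithm; the `static_fraction` parameter and the one-iteration-at-a-time dynamic phase are simplifying assumptions chosen only to show how a static assignment and a shared work queue combine:

```python
import threading

def safe_self_schedule(num_iters, num_procs, static_fraction=0.5):
    """Illustrative two-phase scheme in the spirit of SSS (parameters
    hypothetical): each processor first executes a statically assigned
    block, then claims remaining iterations from a shared counter."""
    block = int(num_iters * static_fraction) // num_procs
    # Phase 1 assignment: contiguous blocks, fixed before execution.
    static = {p: list(range(p * block, (p + 1) * block))
              for p in range(num_procs)}
    next_iter = num_procs * block   # first iteration left for phase 2
    lock = threading.Lock()
    done = {p: [] for p in range(num_procs)}

    def worker(p):
        nonlocal next_iter
        for i in static[p]:          # phase 1: no scheduling overhead
            done[p].append(i)
        while True:                  # phase 2: dynamic self-scheduling
            with lock:               # central queue guarded by a lock
                if next_iter >= num_iters:
                    return
                i = next_iter
                next_iter += 1
            done[p].append(i)

    threads = [threading.Thread(target=worker, args=(p,))
               for p in range(num_procs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done
```

The static phase gives each processor a predictable share of the work (which also helps iteration-to-processor affinity), while the dynamic phase absorbs the load imbalance caused by non-uniform iteration times.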
We also show a multilevel scheduling scheme to self-schedule parallel loops on a distributed-memory machine with a large number of processors, eliminating the bottleneck that results from a central scheduler. We propose a method using abstractions to automate both self-scheduling methods and data distribution methods in parallel programming environments. The abstractions are tested using CHARM, a real parallel programming environment. Methods are also developed to tolerate processor faults caused by both physical failure and reassignment of processors by the operating system during the execution of a parallel loop.

We tested the techniques discussed using simulations and real applications. Good results have been obtained on both shared-memory and distributed-memory parallel computers.
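The multilevel idea can be sketched as follows. This is a simplified sequential simulation, not the scheme from the thesis: the two-level hierarchy, the `top_chunk` size, and the round-robin delivery within a cluster are all illustrative assumptions. The point is that processors only ever contact their cluster's local scheduler, so no single scheduler serves every processor:

```python
def multilevel_schedule(num_iters, num_clusters, procs_per_cluster,
                        top_chunk=16):
    """Sketch of hierarchical self-scheduling (chunk size hypothetical):
    a top-level scheduler hands chunks of iterations to per-cluster
    schedulers; each cluster scheduler hands single iterations to its
    own processors."""
    assignments = {(c, p): [] for c in range(num_clusters)
                   for p in range(procs_per_cluster)}
    next_top = 0                               # top-level scheduler state
    cluster_queues = [[] for _ in range(num_clusters)]
    proc_turn = [0] * num_clusters
    active = True
    while active:
        active = False
        for c in range(num_clusters):
            if not cluster_queues[c] and next_top < num_iters:
                # Cluster scheduler fetches one chunk from the top level;
                # this is the only traffic the top scheduler sees.
                end = min(next_top + top_chunk, num_iters)
                cluster_queues[c] = list(range(next_top, end))
                next_top = end
            if cluster_queues[c]:
                # Local scheduler gives one iteration to the next processor.
                i = cluster_queues[c].pop(0)
                p = proc_turn[c] % procs_per_cluster
                proc_turn[c] += 1
                assignments[(c, p)].append(i)
                active = True
    return assignments
```

Because the top scheduler is contacted once per chunk rather than once per iteration, its request rate drops by roughly the chunk size, which is what removes the central bottleneck as the processor count grows.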

Year: 1993
OAI identifier: oai:ir.library.oregonstate.edu:1957/35612
Provided by: ScholarsArchive@OSU

