A comparison of task scheduling algorithms on multicomputers
For many years, the von Neumann bottleneck has imposed speed limits on program execution. Because of their sequential nature, von Neumann computers can execute only a single instruction at a time; even side-effect-free instructions that could run in parallel must wait. To overcome this bottleneck, multicomputers, a class of computers built from multiple processors, have been developed and implemented. With multiple processors available, instructions can be processed in parallel, but interprocessor communication (IPC) delays must now be taken into account: a program run on several processors may take longer to execute than it would on a single processor. The speed limits now hinge on efficiently partitioning a program and allocating those partitions to processors.
Several algorithms have been developed to solve this partitioning and scheduling problem. Three of them, Internalization, the Balanced Layered Allocation Scheme (BLAS), and Dynamic Level Scheduling (DLS), were studied under uniform conditions to determine their relative efficiency. Simulation studies indicate that BLAS performs best overall. These algorithms base their communication costs on a simplified IPC cost model. A more realistic message-grouping IPC model was developed to test the accuracy of the algorithms built on the simplified model; these simulation studies indicated that the simplified model is a fairly accurate gauge of a more realistic system.
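To illustrate the kind of trade-off such schedulers weigh, the sketch below implements a generic greedy list scheduler over a task DAG in which a task placed on a different processor than its predecessor pays an IPC delay. The task graph, costs, and two-processor machine are invented for this example; this is not the algorithm studied in the thesis, only the general idea of scheduling against a simplified IPC cost model.

```python
# Minimal list scheduler over a task DAG with interprocessor
# communication (IPC) delays.  Graph, costs, and the 2-processor
# machine are hypothetical, for illustration only.

def schedule(tasks, deps, comm, n_procs=2):
    """tasks: {name: compute_cost}; deps: {name: [predecessors]};
    comm: {(pred, succ): ipc_cost}.  Greedy earliest-finish-time."""
    finish, placed = {}, {}            # task -> finish time / processor
    ready_at = [0.0] * n_procs         # when each processor frees up
    # Tasks must be listed in a topological order (deps before users).
    for t in tasks:
        best = None
        for p in range(n_procs):
            # Data from a predecessor on another processor pays IPC cost.
            data_ready = max(
                (finish[d] + (comm.get((d, t), 0.0)
                              if placed[d] != p else 0.0)
                 for d in deps.get(t, [])),
                default=0.0)
            start = max(ready_at[p], data_ready)
            if best is None or start + tasks[t] < best[0]:
                best = (start + tasks[t], p)
        finish[t], placed[t] = best
        ready_at[best[1]] = best[0]
    return finish, placed

tasks = {"a": 2.0, "b": 3.0, "c": 3.0, "d": 1.0}
deps  = {"c": ["a"], "d": ["b", "c"]}
comm  = {("a", "c"): 4.0, ("b", "d"): 4.0, ("c", "d"): 1.0}
finish, placed = schedule(tasks, deps, comm)
print(finish["d"], placed)
```

Note how the scheduler keeps `c` on the same processor as `a` (avoiding the 4-unit transfer) while running `b` concurrently elsewhere; with a heavier IPC model, clustering communicating tasks becomes even more attractive.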
Visualization of program performance on concurrent computers
A distributed memory concurrent computer (such as a hypercube computer) is inherently a complex system involving the collective and simultaneous interaction of many entities engaged in computation and communication activities. Program performance evaluation in concurrent computer systems requires methods and tools for observing, analyzing, and displaying system performance. This dissertation describes a methodology for collecting and displaying, via a unique graphical approach, performance measurement information from (possibly large) concurrent computer systems. Performance data are generated and collected via instrumentation. The data are then reduced via conventional cluster analysis techniques and converted into a pictorial form to highlight important aspects of program states during execution. Local and summary statistics are calculated. Included in the suite of defined metrics are measures for quantifying and comparing amounts of computation and communication. A novel kind of data plot is introduced to visually display both temporal and spatial information describing system activity. Phenomena such as hot spots of activity are easily observed, and in some cases, patterns inherent in the application algorithms being studied are highly visible. The approach also provides a framework for a visual solution to the problem of mapping a given parallel algorithm to an underlying parallel machine. A prototype implementation applied to several case studies is presented to demonstrate the feasibility and power of the approach.
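One of the simplest summary metrics in this family is a per-processor ratio of computation time to communication time, reduced from an event trace. The sketch below is a hypothetical reduction of that kind; the trace record format and the numbers are invented, not taken from the dissertation's instrumentation.

```python
# Hypothetical reduction of per-processor trace records into a
# compute-vs-communicate summary metric.  Trace format and values
# are invented for illustration.

# (processor, activity, duration) records from instrumentation.
trace = [
    (0, "compute", 8.0), (0, "comm", 2.0),
    (1, "compute", 4.0), (1, "comm", 6.0),
    (2, "compute", 7.0), (2, "comm", 3.0),
]

def comp_comm_ratios(trace):
    """Per-processor ratio of computation time to communication time."""
    totals = {}                          # proc -> [compute, comm]
    for proc, kind, dt in trace:
        t = totals.setdefault(proc, [0.0, 0.0])
        t[0 if kind == "compute" else 1] += dt
    return {p: comp / comm for p, (comp, comm) in totals.items()}

ratios = comp_comm_ratios(trace)
# A low ratio flags a communication-bound node, i.e. a hot spot.
hot_spot = min(ratios, key=ratios.get)
print(hot_spot, round(ratios[hot_spot], 2))
```

Plotting such ratios per node over time windows, rather than as a single total, is what lets spatial and temporal hot spots show up visually.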
Task mapping for non-contiguous allocations.
This paper examines task mapping algorithms for non-contiguously allocated parallel jobs. Several studies have shown that task placement affects job running time for both contiguously and non-contiguously allocated jobs. Traditionally, work on task mapping either uses a very general model where the job has an arbitrary communication pattern or assumes that jobs are allocated contiguously, making them completely isolated from each other. A middle ground between these two cases is the mapping problem for non-contiguous jobs having a specific communication pattern. We propose several task mapping algorithms for jobs with a stencil communication pattern and evaluate them using experiments and simulations. Our strategies improve the running time of a MiniApp by as much as 30% over a baseline strategy. Furthermore, this improvement increases markedly with job size, demonstrating the importance of task mapping as systems grow toward exascale.
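A small brute-force experiment makes the underlying point concrete: for a stencil job on scattered nodes, different task-to-node assignments can differ substantially in total communication distance. The 2x2 task grid, the mesh coordinates, and the hop-count cost model below are all invented for this sketch; they are not the paper's algorithms or benchmarks.

```python
# Brute-force illustration that task placement matters for a stencil
# job on non-contiguously allocated nodes.  Task grid, node
# coordinates, and cost model are hypothetical.
from itertools import permutations

def mapping_cost(order, nodes, w=2, h=2):
    """order[i] = node index for task i (row-major in a w x h grid).
    Cost = total mesh hops (Manhattan distance) between neighbours."""
    cost = 0
    for y in range(h):
        for x in range(w):
            for nx, ny in ((x + 1, y), (x, y + 1)):
                if nx < w and ny < h:
                    ax, ay = nodes[order[y * w + x]]
                    bx, by = nodes[order[ny * w + nx]]
                    cost += abs(ax - bx) + abs(ay - by)
    return cost

# Four allocated nodes: two far-apart pairs on the mesh, the kind of
# fragmentation a non-contiguous allocator can produce.
nodes = [(0, 0), (0, 1), (4, 0), (4, 1)]
costs = [mapping_cost(p, nodes) for p in permutations(range(4))]
print(min(costs), max(costs))   # best vs. worst placement
```

Even on four nodes the worst mapping costs nearly twice the best; real mapping algorithms must find good placements without enumerating the factorially many permutations.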