ReSHAPE: A Framework for Dynamic Resizing and Scheduling of Homogeneous Applications in a Parallel Environment
Applications in science and engineering often require huge computational
resources for solving problems within a reasonable time frame. Parallel
supercomputers provide the computational infrastructure for solving such
problems. A traditional application scheduler running on a parallel cluster
supports only static scheduling, where the number of processors allocated to an
application remains fixed throughout the job's execution. Due
to the unpredictability in job arrival times and varying resource requirements,
static scheduling can result in idle system resources thereby decreasing the
overall system throughput. In this paper we present a prototype framework
called ReSHAPE, which supports dynamic resizing of parallel MPI applications
executed on distributed memory platforms. The framework includes a scheduler
that supports resizing of applications, an API to enable applications to
interact with the scheduler, and a library that makes resizing viable.
Applications executed using the ReSHAPE scheduler framework can expand to take
advantage of additional free processors or can shrink to accommodate a high
priority application, without getting suspended. In our research, we have
mainly focused on structured applications that have two-dimensional data arrays
distributed across a two-dimensional processor grid. The resize library
includes algorithms for processor selection and processor mapping. Experimental
results show that the ReSHAPE framework can improve individual job turn-around
time and overall system throughput.
Comment: 15 pages, 10 figures, 5 tables. Submitted to the International Conference
on Parallel Processing (ICPP'07).
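The expand-or-shrink policy the abstract describes can be illustrated with a toy scheduler sketch. All names here (Job, expand, admit_high_priority, TOTAL_PROCS) are hypothetical and are not ReSHAPE's actual API; this only shows the idea of resizing a running job instead of suspending it.

```python
# Toy illustration of dynamic-resizing scheduling (not ReSHAPE's real API).
# A job's allocation grows when idle processors exist, and shrinks to make
# room for a high-priority arrival, instead of the job being suspended.

TOTAL_PROCS = 16

class Job:
    def __init__(self, name, procs, priority=0):
        self.name, self.procs, self.priority = name, procs, priority

def free_procs(jobs):
    return TOTAL_PROCS - sum(j.procs for j in jobs)

def expand(job, jobs):
    """Give a running job all currently idle processors."""
    job.procs += free_procs(jobs)

def admit_high_priority(new_job, jobs):
    """Shrink low-priority jobs (never below 1 proc) until new_job fits."""
    need = new_job.procs - free_procs(jobs)
    for j in sorted(jobs, key=lambda j: j.priority):
        while need > 0 and j.procs > 1:
            j.procs -= 1
            need -= 1
    if need <= 0:
        jobs.append(new_job)
        return True
    return False

jobs = [Job("A", 8)]
expand(jobs[0], jobs)                                  # A grows from 8 to 16
ok = admit_high_priority(Job("B", 4, priority=10), jobs)
# A shrinks to 12 procs, B runs on 4; no job is suspended
```

A real resize library would additionally redistribute the application's data arrays over the new processor grid, which is exactly what the redistribution work below addresses.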
Efficient Algorithms for Block-Cyclic Array Redistribution between Processor Sets
Run-time array redistribution is necessary to enhance the performance of parallel programs on distributed memory supercomputers. In this paper, we present an efficient algorithm for array redistribution from cyclic(x) on P processors to cyclic(Kx) on Q processors. The algorithm reduces the overall time for communication by considering the data transfer, communication schedule, and index computation costs. The proposed algorithm is based on a generalized circulant matrix formalism. Our algorithm generates a schedule that minimizes the number of communication steps and eliminates node contention in each communication step. The network bandwidth is fully utilized by ensuring that equal-sized messages are transferred in each communication step. Furthermore, the procedure to compute the schedule and the index sets is extremely fast. It takes O(max(P, Q)) time. Therefore, our proposed algorithm is suitable for run-time array redistribution. To evaluate the performance of our scheme, we have i..
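The index computation underlying such a redistribution can be sketched directly from the distribution definitions: under cyclic(b) on nprocs processors, global index i lives on processor (i // b) mod nprocs. The sketch below enumerates the (source, destination) message pattern for a cyclic(x) on P to cyclic(Kx) on Q move; it illustrates only the ownership mapping, not the paper's circulant-matrix scheduling algorithm.

```python
# Block-cyclic ownership and the resulting communication pattern when
# redistributing from cyclic(x) on P procs to cyclic(K*x) on Q procs.
# (Illustrative index computation only; the contention-free schedule
# from the paper is not reproduced here.)

def owner(i, block, nprocs):
    """Processor owning global index i under a cyclic(block) distribution."""
    return (i // block) % nprocs

def redistribution_messages(n, x, P, K, Q):
    """Map (source proc, dest proc) -> sorted list of global indices moved."""
    msgs = {}
    for i in range(n):
        src, dst = owner(i, x, P), owner(i, K * x, Q)
        msgs.setdefault((src, dst), []).append(i)
    return msgs

# Redistribute 24 elements from cyclic(2) on 3 procs to cyclic(4) on 2 procs.
msgs = redistribution_messages(24, x=2, P=3, K=2, Q=2)
```

Grouping the per-index ownership pairs like this yields the message sets that a scheduler must then order into contention-free, equal-sized communication steps.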
Adaptation of multiway-merge sorting algorithm to MIMD architectures with an experimental study
Ankara: The Department of Computer Engineering and the Institute of Engineering and Science of Bilkent University, 2002. Thesis (Master's), Bilkent University, 2002. Includes bibliographical references (leaves 73-78).
Sorting is perhaps one of the most widely studied problems of computing. Numerous
asymptotically optimal sequential algorithms have been discovered. Asymptotically
optimal algorithms have been presented for varying parallel models as well. Parallel
sorting algorithms have already been proposed for a variety of multiple instruction,
multiple data streams (MIMD) architectures. In this thesis, we adapt the multiway-merge
sorting algorithm, originally designed for product networks, to MIMD
architectures. It has good load-balancing properties, modest communication needs, and
good performance. The multiway-merge sort algorithm requires only two all-to-all
personalized communications (AAPC) and two one-to-one communications,
independent of the input size. In addition to evenly distributed load balancing, the
algorithm requires only 2N/P local memory per processor in the worst
case, where N is the number of items to be sorted and P is the number of processors.
We have implemented the algorithm on the PC cluster established at the
Computer Engineering Department of Bilkent University. For comparison, we have
implemented a sample sort algorithm (PSRS, Parallel Sorting by Regular
Sampling) by X. Liu et al. and a parallel quicksort algorithm (HyperQuickSort) on
the same cluster. In the experimental studies we used three benchmarks,
namely Uniformly, Gaussian, and Zero distributed inputs. Although the multiway-merge
algorithm did not achieve better results than the other two, which are
theoretically cost-optimal algorithms, there are cases, such as Zero distributed input,
where the multiway-merge algorithm outperforms them. The results of the experiments are reported in detail. The multiway-merge sort algorithm is not
necessarily the best parallel sorting algorithm, but it is expected to achieve acceptable
performance on a wide spectrum of MIMD architectures.
Cantürk, Levent. M.S.