4 research outputs found

    ReSHAPE: A Framework for Dynamic Resizing and Scheduling of Homogeneous Applications in a Parallel Environment

    Get PDF
    Applications in science and engineering often require huge computational resources for solving problems within a reasonable time frame. Parallel supercomputers provide the computational infrastructure for solving such problems. A traditional application scheduler running on a parallel cluster only supports static scheduling where the number of processors allocated to an application remains fixed throughout the lifetime of execution of the job. Due to the unpredictability in job arrival times and varying resource requirements, static scheduling can result in idle system resources thereby decreasing the overall system throughput. In this paper we present a prototype framework called ReSHAPE, which supports dynamic resizing of parallel MPI applications executed on distributed memory platforms. The framework includes a scheduler that supports resizing of applications, an API to enable applications to interact with the scheduler, and a library that makes resizing viable. Applications executed using the ReSHAPE scheduler framework can expand to take advantage of additional free processors or can shrink to accommodate a high priority application, without getting suspended. In our research, we have mainly focused on structured applications that have two-dimensional data arrays distributed across a two-dimensional processor grid. The resize library includes algorithms for processor selection and processor mapping. Experimental results show that the ReSHAPE framework can improve individual job turn-around time and overall system throughput.Comment: 15 pages, 10 figures, 5 tables Submitted to International Conference on Parallel Processing (ICPP'07

    Efficient Algorithms for Block-Cyclic Array Redistribution between Processor Sets

    No full text
    Run-time array redistribution is necessary to enhance the performance of parallel programs on distributed memory supercomputers. In this paper, we present an efficient algorithm for array redistribution from cyclic(x) on P processors to cyclic(Kx) on Q processors. The algorithm reduces the overall time for communication by considering the data transfer, communication schedule, and index computation costs. The proposed algorithm is based on a generalized circulant matrix formalism. Our algorithm generates a schedule that minimizes the number of communication steps and eliminates node contention in each communication step. The network bandwidth is fully utilized by ensuring that equal-sized messages are transferred in each communication step. Furthermore, the procedure to compute the schedule and the index sets is extremely fast. It takes O(max(P;Q)) time. Therefore, our proposed algorithm is suitable for run-time array redistribution. To evaluate the performance of our scheme, we have i..

    Efficient Algorithms for Block-Cyclic Array Redistribution between Processor Sets

    No full text
    Run-time array redistribution is necessary to enhance the performance of parallel programs on distributed memory supercomputers. In this paper, we present an efficient algorithm for array redistribution from cyclic(x) on P processors to cyclic(Kx) on Q processors. The algorithm reduces the overall time for communication by considering the data transfer, communication schedule, and index computation costs. The proposed algorithm is based on a generalized circulant matrix formalism. Our algorithm generates a schedule that minimizes the number of communication steps and eliminates node contention in each communication step. The network bandwidth is fully utilized by ensuring that equal-sized messages are transferred in each communication step. Furthermore, the procedure to compute the schedule and the index sets is extremely fast. It takes O(max(P;Q)) time. Therefore, our proposed algorithm is suitable for run-time array redistribution. To evaluate the performance of our scheme, we have i..

    Adaptation of multiway-merge sorting algorithm to MIMD architectures with an experimental study

    Get PDF
    Ankara : The Department of Computer Engineering and the Institute of Engineering and Science of Bilkent University, 2002.Thesis (Master's) -- Bilkent University, 2002.Includes bibliographical references leaves 73-78.Sorting is perhaps one of the most widely studied problems of computing. Numerous asymptotically optimal sequential algorithms have been discovered. Asymptotically optimal algorithms have been presented for varying parallel models as well. Parallel sorting algorithms have already been proposed for a variety of multiple instruction, multiple data streams (MIMD) architectures. In this thesis, we adapt the multiwaymerge sorting algorithm that is originally designed for product networks, to MIMD architectures. It has good load balancing properties, modest communication needs and well performance. The multiway-merge sort algorithm requires only two all-to-all personalized communication (AAPC) and two one-to-one communications independent from the input size. In addition to evenly distributed load balancing, the algorithm requires only size of 2N/P local memory for each processor in the worst case, where N is the number of items to be sorted and P is the number of processors. We have implemented the algorithm on the PC Cluster that is established at Computer Engineering Department of Bilkent University. To compare the results we have implemented a sample sort algorithm (PSRS Parallel Sorting by Regular Sampling) by X. Liu et all and a parallel quicksort algorithm (HyperQuickSort) on the same cluster. In the experimental studies we have used three different benchmarks namely Uniformly, Gaussian, and Zero distributed inputs. Although the multiwaymerge algorithm did not achieve better results than the other two, which are theoretically cost optimal algorithms, there are some cases that the multiway-merge algorithm outperforms the other two like in Zero distributed input. The results of the experiments are reported in detail. The multiway-merge sort algorithm is not necessarily the best parallel sorting algorithm, but it is expected to achieve acceptable performance on a wide spectrum of MIMD architectures.Cantürk, LeventM.S
    corecore