32 research outputs found

    A Randomized Parallel Sorting Algorithm With an Experimental Study

    Get PDF
    Previous schemes for sorting on general-purpose parallel machines have had to choose between poor load balancing and irregular communication or multiple rounds of all-to-all personalized communication. In this paper, we introduce a novel variation on sample sort which uses only two rounds of regular all-to-all personalized communication in a scheme that yields very good load balancing with virtually no overhead. Moreover, unlike previous variations, our algorithm efficiently handles the presence of duplicate values without the overhead of tagging each element with a unique identifier. This algorithm was implemented in Split-C and run on a variety of platforms, including the Thinking Machines CM-5, the IBM SP-2, and the Cray Research T3D. We ran our code using widely different benchmarks to examine the dependence of our algorithm on the input distribution. Our experimental results illustrate the efficiency and scalability of our algorithm across different platforms. In fact, it seems to..

    An Introduction to Parallel Algorithms

    No full text
    x, 566 tr. ; 21x30 cm

    An Introduction to Parallel Algorithms

    No full text
    x, 566 tr. ; 24 cm

    Efficient Image Processing Algorithms on the Scan Line Array Processor

    No full text
    We develop efficient algorithms for low and intermediate level image processing on the scan line array processor, a SIMD machine consisting of a linear array of cells that processes images in a scan line fashion. For low level processing, we present algorithms for block DFT, block DCT, convolution, template matching, shrinking, and expanding which run in real-time. By real-time, we mean that, if the required processing is based on neighborhoods of size m \Theta m, then the output lines are generated at a rate of O(m) operations per line and a latency of O(m) scan lines, which is the best that can be achieved on this model. We also develop an algorithm for median filtering which runs in almost real-time at a cost of O(m log m) time per scan line and a latency of b m 2 c scan lines. For intermediate level processing, we present optimal algorithms for translation, histogram computation, scaling, and rotation. We also develop efficient algorithms for labelling the connected components..

    SIMPLE: A Methodology for Programming High Performance Algorithms on Clusters of Symmetric Multiprocessors (SMPs)

    Get PDF
    We describe a methodology for developing high performance programs running on clusters of SMP nodes. The SMP cluster programming methodology is based on a small prototype kernel (SIMPLE) of collective communication primitives that make efficient use of the hybrid shared and message passing environment. We illustrate the power of our methodology by presenting experimental results for sorting integers, two-dimensional fast Fourier transforms (FFT), and constraint-satisfied searching. Our testbed is a cluster of DEC AlphaServer 2100 4/275 nodes interconnected by an ATM switch

    Practical Parallel Algorithms for Personalized Communication and Integer Sorting

    No full text
    A fundamental challenge for parallel computing is to obtain high-level, architecture independent, algorithms which efficiently execute on general-purpose parallel machines. With the emergence of message passing standards such as MPI, it has become easier to design efficient and portable parallel algorithms by making use of these communication primitives. While existing primitives allow an assortment of collective communication routines, they do not handle an important communication event when most or all processors have non-uniformly sized personalized messages to exchange with each other. We focus in this paper on the h-relation personalized communication whose efficient implementation will allow high performance implementations of a large class of algorithms. While most previous h-relation algorithms use randomization, this paper presents a new deterministic approach for h-relation personalized communication with asymptoticaly optimal complexity for h p². As an application, we ..
    corecore