32 research outputs found

A Randomized Parallel Sorting Algorithm With an Experimental Study

Authors: David A. Bader, David R. Helman, Joseph JáJá

Previous schemes for sorting on general-purpose parallel machines have had to choose between poor load balancing and irregular communication or multiple rounds of all-to-all personalized communication. In this paper, we introduce a novel variation on sample sort which uses only two rounds of regular all-to-all personalized communication in a scheme that yields very good load balancing with virtually no overhead. Moreover, unlike previous variations, our algorithm efficiently handles the presence of duplicate values without the overhead of tagging each element with a unique identifier. This algorithm was implemented in Split-C and run on a variety of platforms, including the Thinking Machines CM-5, the IBM SP-2, and the Cray Research T3D. We ran our code using widely different benchmarks to examine the dependence of our algorithm on the input distribution. Our experimental results illustrate the efficiency and scalability of our algorithm across different platforms. In fact, it seems to..

CiteSeerX
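
The sample sort idea mentioned in the abstract above can be illustrated with a minimal serial sketch. This is a generic textbook-style sample sort simulated on one machine, not the authors' two-round variation and not their Split-C code; the function name `sample_sort` and the parameters `p`, `oversample`, and `seed` are assumptions made for illustration.

```python
import bisect
import random


def sample_sort(data, p=4, oversample=8, seed=0):
    """Serial simulation of a generic sample sort with p buckets.

    Hypothetical helper for illustration only; `p` stands in for the
    number of processors and `oversample` for the samples per processor.
    """
    if len(data) <= 1 or p <= 1:
        return sorted(data)
    rng = random.Random(seed)
    # Draw a random sample and pick p - 1 evenly spaced splitters from it.
    k = min(len(data), p * oversample)
    sample = sorted(rng.sample(list(data), k))
    splitters = [sample[(i * k) // p] for i in range(1, p)]
    # Route each element to the bucket whose splitter range contains it.
    # In a parallel setting this routing is the all-to-all exchange.
    buckets = [[] for _ in range(p)]
    for x in data:
        buckets[bisect.bisect_right(splitters, x)].append(x)
    # Each simulated processor sorts locally; concatenation is sorted.
    result = []
    for bucket in buckets:
        result.extend(sorted(bucket))
    return result
```

In this naive version, an input with many duplicate values sends all equal keys to the same bucket, which is the kind of load imbalance the abstract says the authors' algorithm avoids.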

An Introduction to Parallel Algorithms

Author: Joseph JáJá
Publication venue: Addison-Wesley
Publication date: 10/05/2022

x, 566 pp. ; 21x30 cm

Dalat University Library

An Introduction to Parallel Algorithms

Author: Joseph JáJá
Publication venue: Addison-Wesley
Publication date: 10/05/2022

x, 566 pp. ; 24 cm

Dalat University Library

Efficient Image Processing Algorithms on the Scan Line Array Processor

Authors: David Helman, Joseph JáJá
Publication date: 01/01/1995

We develop efficient algorithms for low and intermediate level image processing on the scan line array processor, a SIMD machine consisting of a linear array of cells that processes images in a scan line fashion. For low level processing, we present algorithms for block DFT, block DCT, convolution, template matching, shrinking, and expanding which run in real-time. By real-time, we mean that, if the required processing is based on neighborhoods of size m × m, then the output lines are generated at a rate of O(m) operations per line and a latency of O(m) scan lines, which is the best that can be achieved on this model. We also develop an algorithm for median filtering which runs in almost real-time at a cost of O(m log m) time per scan line and a latency of ⌊m/2⌋ scan lines. For intermediate level processing, we present optimal algorithms for translation, histogram computation, scaling, and rotation. We also develop efficient algorithms for labelling the connected components..

CiteSeerX
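
As a rough illustration of the median filtering operation named in the abstract above, here is a naive sequential sketch over an m × m neighborhood. It is not the scan line array processor algorithm and makes no attempt to reach the stated O(m log m) cost per scan line; the function name `median_filter` and the border handling (using only in-bounds pixels) are assumptions.

```python
from statistics import median_low


def median_filter(image, m=3):
    """Naive m x m median filter over a list-of-lists image.

    Hypothetical helper for illustration. Border pixels use only the
    part of the window that lies inside the image (an assumption).
    """
    rows, cols = len(image), len(image[0])
    r = m // 2  # window radius, i.e. floor(m / 2)
    out = []
    for y in range(rows):
        line = []
        for x in range(cols):
            # Gather the in-bounds pixels of the window centered at (y, x).
            window = [
                image[j][i]
                for j in range(max(0, y - r), min(rows, y + r + 1))
                for i in range(max(0, x - r), min(cols, x + r + 1))
            ]
            # median_low returns an actual pixel value, never an average.
            line.append(median_low(window))
        out.append(line)
    return out
```

Output line y depends on input lines up to y + ⌊m/2⌋, which is one way to read the latency of ⌊m/2⌋ scan lines quoted in the abstract.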

SIMPLE: A Methodology for Programming High Performance Algorithms on Clusters of Symmetric Multiprocessors (SMPs)

Authors: David A. Bader, Joseph JáJá
Publication date: 01/01/1997

We describe a methodology for developing high performance programs running on clusters of SMP nodes. The SMP cluster programming methodology is based on a small prototype kernel (SIMPLE) of collective communication primitives that make efficient use of the hybrid shared and message passing environment. We illustrate the power of our methodology by presenting experimental results for sorting integers, two-dimensional fast Fourier transforms (FFT), and constraint-satisfied searching. Our testbed is a cluster of DEC AlphaServer 2100 4/275 nodes interconnected by an ATM switch.

CiteSeerX

Digital Repository at the University of Maryland

Practical Parallel Algorithms for Personalized Communication and Integer Sorting

Authors: David Bader, David R. Helman, Joseph JáJá
Publication date: 01/01/1995

A fundamental challenge for parallel computing is to obtain high-level, architecture-independent algorithms which execute efficiently on general-purpose parallel machines. With the emergence of message passing standards such as MPI, it has become easier to design efficient and portable parallel algorithms by making use of these communication primitives. While existing primitives allow an assortment of collective communication routines, they do not handle an important communication event in which most or all processors have non-uniformly sized personalized messages to exchange with each other. We focus in this paper on the h-relation personalized communication whose efficient implementation will allow high performance implementations of a large class of algorithms. While most previous h-relation algorithms use randomization, this paper presents a new deterministic approach for h-relation personalized communication with asymptotically optimal complexity for h ≥ p². As an application, we ..

CiteSeerX
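
To make the h-relation in the abstract above concrete: under the usual definition, a communication pattern among p processors is an h-relation when each processor sends at most h items and receives at most h items. The sketch below computes the smallest such h for a given matrix of message sizes. It only illustrates the definition and is not the deterministic routing algorithm from the paper; the function name `h_of_relation` and the matrix layout (`sizes[i][j]` is the number of items processor i sends to processor j) are assumptions.

```python
def h_of_relation(sizes):
    """Return the smallest h for which `sizes` describes an h-relation.

    Hypothetical helper for illustration. sizes[i][j] is the number of
    items that processor i sends to processor j.
    """
    p = len(sizes)
    if p == 0:
        return 0
    # Total items leaving each processor (row sums).
    sent = [sum(row) for row in sizes]
    # Total items arriving at each processor (column sums).
    received = [sum(sizes[i][j] for i in range(p)) for j in range(p)]
    return max(sent + received)
```

For example, with three processors and `sizes = [[0, 3, 1], [2, 0, 0], [5, 1, 0]]`, the row sums are 4, 2, 6 and the column sums are 7, 4, 1, so the pattern is a 7-relation.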