4 research outputs found

    Practical Parallel Algorithms for Personalized Communication and Integer Sorting

    A fundamental challenge for parallel computing is to obtain high-level, architecture-independent algorithms which execute efficiently on general-purpose parallel machines. With the emergence of message-passing standards such as MPI, it has become easier to design efficient and portable parallel algorithms by making use of these communication primitives. While existing primitives allow an assortment of collective communication routines, they do not handle an important communication event when most or all processors have non-uniformly sized personalized messages to exchange with each other. We focus in this paper on the h-relation personalized communication whose efficient implementation will allow high performance implementations of a large class of algorithms. While most previous h-relation algorithms use randomization, this paper presents a new deterministic approach for h-relation personalized communication. As an application, we present an efficient algorithm for stable integer sorting. The algorithms presented in this paper have been coded in Split-C and run on a variety of platforms, including the Thinking Machines CM-5, IBM SP-1 and SP-2, Cray Research T3D, Meiko Scientific CS-2, and the Intel Paragon. Our experimental results are consistent with the theoretical analysis and illustrate the scalability and efficiency of our algorithms across different platforms. In fact, they seem to outperform all similar algorithms known to the authors on these platforms. (Also cross-referenced as UMIACS-TR-95-101.)
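To make the term concrete, the sketch below computes the value of h for a given exchange pattern. It assumes the usual definition of an h-relation (every processor sends at most h elements and receives at most h elements); the abstract itself does not spell this out. The function name `h_relation_size` and the `sizes` matrix are hypothetical, chosen only for illustration, and this is not the paper's deterministic routing algorithm.

```python
def h_relation_size(sizes):
    """Return h for a personalized communication pattern.

    sizes[i][j] is the number of elements processor i sends to processor j.
    Assumption: h is the largest number of elements that any single
    processor has to send or to receive.
    """
    p = len(sizes)
    if any(len(row) != p for row in sizes):
        raise ValueError("sizes must be a square p x p matrix")
    # Total elements leaving each processor (row sums).
    sent = [sum(row) for row in sizes]
    # Total elements arriving at each processor (column sums).
    received = [sum(sizes[i][j] for i in range(p)) for j in range(p)]
    return max(max(sent), max(received))


# Three processors with non-uniformly sized messages.
pattern = [
    [0, 3, 1],
    [2, 0, 0],
    [5, 1, 0],
]
print(h_relation_size(pattern))
```

In this pattern, processor 0 receives 2 + 5 = 7 elements, more than any processor sends, so the exchange is a 7-relation.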

    Parallelization of Reconstructability Analysis Algorithms.

    Bush Jones published a series of papers providing sequential algorithms that are key to reconstructability analysis. These algorithms include the determination of unbiased reconstructions and a greedy algorithm for a generalization of the reconstruction problem. The implementation of these sequential algorithms provides scientists and mathematicians with the means of utilizing reconstructability analysis in systems modeling. The algorithms, however, are so computationally intensive that the system is limited to a very small set of variables. Many papers have been written applying reconstructability analysis and maximum entropy methods to various disciplines. Reconstructability analysis has the potential of dramatically impacting the scientific community, but the sequential algorithms leave the utilization of reconstructability analysis infeasible. The author has parallelized the reconstructability analysis algorithms developed by Jones, thereby bridging the gap between theoretical application and feasible implementation. Since the goal of parallelization of these reconstructability analysis algorithms is to make them feasible to as many researchers as possible, a specific architecture is not assumed. It is assumed that the architecture employed is a multiple data architecture. That is, the architectural design needed for the implementation of these algorithms must have memory local to each processing element (PE). The parallel algorithms developed and presented here do not address the problems of communications between processors of particular architectures. These algorithms assume a reconfigurable bus system, which is a bus system whose configuration can be dynamically altered, thus allowing broadcasting and long-distance communications to be completed in constant time. It is noted that processor arrays with such reconfigurable bus systems have been designed.
Frequently, parallel algorithms do not address the situation in which the number of values on which to operate is larger than the number of processors. However, since the purpose of the parallelization of these reconstructability analysis algorithms is to make them feasible for large structure systems, the parallelization given does address the situation in which the number of values on which to operate is larger than the number of processors available. Therefore, implementation of the algorithms involves simply incorporating the communication protocols between processors for the particular architecture employed.
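One common way to handle more values than processors is to give each PE a contiguous block of values in its local memory. The sketch below illustrates that idea only; the abstract does not say which distribution the author uses, so the block scheme and the name `block_partition` are assumptions made for illustration.

```python
def block_partition(values, num_pes):
    """Split values into num_pes contiguous blocks, one per PE.

    Assumption: a plain block distribution in which block sizes differ
    by at most one element. This is not taken from the source.
    """
    if num_pes <= 0:
        raise ValueError("num_pes must be positive")
    base, extra = divmod(len(values), num_pes)
    blocks = []
    start = 0
    for pe in range(num_pes):
        # The first `extra` PEs each take one additional value.
        size = base + (1 if pe < extra else 0)
        blocks.append(values[start:start + size])
        start += size
    return blocks


# Ten values spread over four PEs.
print(block_partition(list(range(10)), 4))
```

Each PE would then work on its own block locally, and only the exchange of partial results would depend on the architecture's communication protocol.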

    Integer Sorting on a Mesh-Connected Array of Processors

    Schnorr and Shamir, and independently Kunde, have shown that sorting N = n^2 inputs into snake-like ordering on an n × n mesh requires 3n - o(n) steps. Using a less restrictive, more realistic model, we show that sorting N = n^2 integers in the range [1 ... N] can be performed in 2n + o(n) steps on an n × n mesh.
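For readers unfamiliar with the target layout, the sketch below shows what snake-like ordering looks like on an n × n mesh: rows alternate between left-to-right and right-to-left, so consecutive ranks always sit in neighbouring cells. It uses a sequential sort to produce the final arrangement and is not the mesh algorithm from the paper; the name `snake_layout` is hypothetical.

```python
def snake_layout(values, n):
    """Arrange n*n values on an n x n grid in snake-like sorted order.

    Rows 0, 2, 4, ... run left to right and rows 1, 3, 5, ... run right
    to left. Illustration of the target ordering only: the sort itself
    is done sequentially, not with mesh communication steps.
    """
    if len(values) != n * n:
        raise ValueError("expected exactly n * n values")
    ordered = sorted(values)
    grid = []
    for r in range(n):
        row = ordered[r * n:(r + 1) * n]
        if r % 2 == 1:
            row.reverse()  # odd rows are laid out right to left
        grid.append(row)
    return grid


for row in snake_layout([9, 1, 8, 2, 7, 3, 6, 4, 5], 3):
    print(row)
```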