
    On the design and implementation of broadcast and global combine operations using the postal model

    A number of models have been proposed in recent years for message-passing parallel systems. Examples are the postal model and its generalization, the LogP model. In the postal model a parameter λ is used to model the communication latency of the message-passing system. During each round, each node can send one fixed-size message and, simultaneously, receive a message of the same size. Furthermore, a message sent out during round r incurs a latency of λ and arrives at the receiving node at round r + λ - 1. Our goal in this paper is to bridge the gap between the theoretical modeling and the practical implementation. In particular, we investigate a number of practical issues related to the design and implementation of two collective communication operations, namely the broadcast operation and the global combine operation. These practical issues include, for example, 1) techniques for measuring the value of λ on a given machine, 2) creating efficient broadcast algorithms that take the latency λ and the number of nodes n as parameters, and 3) creating efficient global combine algorithms for parallel machines whose λ is not an integer. We propose solutions that address these practical issues and present results of an experimental study of the new algorithms on the Intel Delta machine. Our main conclusion is that the postal model can help in performance prediction and tuning; for example, a properly tuned broadcast improves upon the known implementation by more than 20%.
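    To make the model concrete, here is a minimal sketch (not taken from the paper) that evaluates the standard postal-model reachability recurrence N(t) = 1 for 0 ≤ t < λ and N(t) = N(t-1) + N(t-λ) for t ≥ λ, i.e. the maximum number of nodes that can be informed within t rounds, and derives the optimal broadcast time from it. It assumes an integer λ; handling a non-integer λ, as the abstract notes, is one of the paper's practical concerns.

        // Sketch: reachability and broadcast time in the postal model, integer lambda.
        //   N(t) = 1                      for 0 <= t < lambda
        //   N(t) = N(t-1) + N(t-lambda)   for t >= lambda
        #include <cstdint>
        #include <iostream>
        #include <vector>

        std::uint64_t reachable(int t, int lambda) {
            std::vector<std::uint64_t> N(t + 1, 1);      // N(t) = 1 while t < lambda
            for (int i = lambda; i <= t; ++i)
                N[i] = N[i - 1] + N[i - lambda];
            return N[t];
        }

        // Smallest number of rounds t with N(t) >= n.
        int broadcast_time(std::uint64_t n, int lambda) {
            int t = 0;
            while (reachable(t, lambda) < n) ++t;
            return t;
        }

        int main() {
            std::cout << "Broadcast time for n = 512, lambda = 3: "
                      << broadcast_time(512, 3) << " rounds\n";
        }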

    Parallel Weighted Random Sampling

    Data structures for efficient sampling from a set of weighted items are an important building block of many applications. However, few parallel solutions are known. We close many of these gaps for both shared-memory and distributed-memory machines. We give efficient, fast, and practicable algorithms for sampling single items, k items with or without replacement, permutations, subsets, and reservoirs. We also give improved sequential algorithms for alias table construction and for sampling with replacement. Experiments on shared-memory parallel machines with up to 158 threads show near-linear speedups for both construction and queries.
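    The alias table mentioned above is the classic data structure for constant-time weighted sampling after linear-time preprocessing. As background, the following sketch shows a plain sequential construction in the Walker/Vose style together with a query; it is illustrative only and is not the paper's improved sequential or parallel construction.

        // Sketch: sequential alias-table construction (Walker/Vose style) and
        // O(1) weighted sampling.  Illustrative; not the paper's algorithms.
        #include <iostream>
        #include <random>
        #include <vector>

        struct AliasTable {
            std::vector<double> prob;   // acceptance probability per bucket
            std::vector<int>    alias;  // fallback item per bucket

            explicit AliasTable(const std::vector<double>& w) {
                const int n = static_cast<int>(w.size());
                prob.resize(n); alias.resize(n);
                double total = 0;
                for (double x : w) total += x;
                std::vector<double> scaled(n);
                std::vector<int> small, large;
                for (int i = 0; i < n; ++i) {
                    scaled[i] = w[i] * n / total;        // rescale so the mean is 1
                    (scaled[i] < 1.0 ? small : large).push_back(i);
                }
                while (!small.empty() && !large.empty()) {
                    int s = small.back(); small.pop_back();
                    int l = large.back(); large.pop_back();
                    prob[s] = scaled[s]; alias[s] = l;   // top up bucket s with item l
                    scaled[l] -= 1.0 - scaled[s];
                    (scaled[l] < 1.0 ? small : large).push_back(l);
                }
                for (int i : small) { prob[i] = 1.0; alias[i] = i; }
                for (int i : large) { prob[i] = 1.0; alias[i] = i; }
            }

            // Draw one item index with probability proportional to its weight.
            template <class RNG> int sample(RNG& rng) const {
                std::uniform_int_distribution<int> bucket(0, static_cast<int>(prob.size()) - 1);
                std::uniform_real_distribution<double> coin(0.0, 1.0);
                int b = bucket(rng);
                return coin(rng) < prob[b] ? b : alias[b];
            }
        };

        int main() {
            std::mt19937 rng(42);
            AliasTable t({0.1, 0.2, 0.7});
            std::vector<int> cnt(3, 0);
            for (int i = 0; i < 100000; ++i) ++cnt[t.sample(rng)];
            for (int c : cnt) std::cout << c << ' ';     // roughly 10000 20000 70000
            std::cout << '\n';
        }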

    Lightweight MPI Communicators with Applications to Perfectly Balanced Quicksort

    MPI uses the concept of communicators to connect groups of processes. It provides nonblocking collective operations on communicators to overlap communication and computation. Flexible algorithms demand flexible communicators: for example, a process can work on different subproblems within different process groups simultaneously, new process groups can be created, or the members of a process group can change. Depending on the number of communicators, the time for communicator creation can drastically increase the running time of the algorithm. Furthermore, creating a new communicator synchronizes all processes, since communicator creation routines are blocking collective operations. We present RBC, a communication library based on MPI that creates range-based communicators in constant time without communication. These RBC communicators support (non)blocking point-to-point communication as well as (non)blocking collective operations. Our experiments show that the library reduces the time to create a new communicator by a factor of more than 400, whereas the running time of collective operations remains about the same. We propose Janus Quicksort, a distributed sorting algorithm that avoids load imbalance. We improved the performance of this algorithm by a factor of 15 for moderate inputs by using RBC communicators. Finally, we discuss different approaches to bringing nonblocking (local) creation of lightweight (range-based) communicators into MPI.
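    For contrast with RBC's constant-time, communication-free communicator creation, the sketch below shows the conventional way to split off process groups in plain MPI: MPI_Comm_split is a blocking collective over the parent communicator, which is exactly the synchronization cost described above. It uses only standard MPI calls and is not the RBC interface.

        // Sketch: creating a sub-communicator with plain MPI.  MPI_Comm_split is a
        // blocking collective over MPI_COMM_WORLD, so every process must reach it;
        // this is the creation/synchronization cost that RBC communicators avoid.
        #include <mpi.h>
        #include <cstdio>

        int main(int argc, char** argv) {
            MPI_Init(&argc, &argv);
            int rank, size;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            // Split into a lower and an upper half, e.g. for the two recursive
            // subproblems of a distributed quicksort.
            int color = (rank < size / 2) ? 0 : 1;
            MPI_Comm half;
            MPI_Comm_split(MPI_COMM_WORLD, color, /*key=*/rank, &half);

            int sub_rank, sub_size;
            MPI_Comm_rank(half, &sub_rank);
            MPI_Comm_size(half, &sub_size);
            std::printf("world rank %d -> group %d, rank %d of %d\n",
                        rank, color, sub_rank, sub_size);

            MPI_Comm_free(&half);
            MPI_Finalize();
        }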

    Practical Massively Parallel Sorting


    Randomized Initialization of a Wireless Multihop Network

    Address autoconfiguration is an important mechanism required to set the IP address of a node automatically in a wireless network. Address autoconfiguration, also known as initialization or naming, consists in assigning a unique identifier ranging from 1 to n to each of a set of n indistinguishable nodes. We consider a wireless network in which n nodes (processors) are thrown into a square X uniformly and independently at random. We assume that the network is synchronous and that two nodes can communicate if they are within distance at most r of each other (r is the transmitting/receiving range). The model of this paper concerns nodes without the collision-detection ability: if two or more neighbors of a processor u transmit concurrently, then u receives none of the messages. We also assume that nodes know neither the topology of the network nor the number of nodes in the network. Moreover, they start indistinguishable, anonymous, and unnamed. Under this extremal scenario, we design and analyze a fully distributed protocol that achieves the initialization task for a wireless multihop network of n nodes uniformly scattered in a square X. We show how the transmitting range of the deployed stations affects typical characteristics of the network such as node degrees and diameter. By allowing the nodes to transmit at a range r = \sqrt{\frac{(1+\ell) \ln{n} \, |X|}{\pi n}} (slightly greater than the range required for the network to be connected), where |X| denotes the area of the square, we show how to design a randomized protocol running in expected time O(n^{3/2} \log^2{n}) that assigns a unique number ranging from 1 to n to each of the n participating nodes.
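    To make the geometry concrete, the following sketch (illustrative only, not the paper's protocol) scatters n nodes uniformly in a square of area |X|, sets r according to the formula above with an assumed value of \ell, and reports the resulting average node degree.

        // Sketch: scatter n nodes uniformly in a square of area |X|, set the range
        //   r = sqrt((1 + l) * ln(n) * |X| / (pi * n)),
        // and report the average node degree.  Model illustration only; this is not
        // the paper's randomized initialization protocol.
        #include <cmath>
        #include <iostream>
        #include <random>
        #include <vector>

        int main() {
            const int    n    = 1000;                     // number of nodes (assumed)
            const double ell  = 0.5;                      // parameter l (assumed)
            const double area = 1.0;                      // |X|, area of the square
            const double side = std::sqrt(area);
            const double pi   = std::acos(-1.0);
            const double r    = std::sqrt((1.0 + ell) * std::log(n) * area / (pi * n));

            std::mt19937 rng(1);
            std::uniform_real_distribution<double> coord(0.0, side);
            std::vector<double> x(n), y(n);
            for (int i = 0; i < n; ++i) { x[i] = coord(rng); y[i] = coord(rng); }

            long long edges = 0;
            for (int i = 0; i < n; ++i)
                for (int j = i + 1; j < n; ++j)
                    if (std::hypot(x[i] - x[j], y[i] - y[j]) <= r) ++edges;

            std::cout << "range r = " << r
                      << ", average degree = " << 2.0 * edges / n << '\n';
        }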