On the design and implementation of broadcast and global combine operations using the postal model
Several models have been proposed in recent years for message-passing parallel systems, for example the postal model and its generalization, the LogP model. In the postal model a parameter λ models the communication latency of the message-passing system. During each round, each node can send a fixed-size message and simultaneously receive a message of the same size. Furthermore, a message sent during round r incurs a latency of λ and arrives at the receiving node at round r + λ − 1.
Our goal in this paper is to bridge the gap between theoretical modeling and practical implementation. In particular, we investigate several practical issues related to the design and implementation of two collective communication operations, namely the broadcast operation and the global combine operation. These issues include 1) techniques for measuring the value of λ on a given machine, 2) efficient broadcast algorithms that take the latency λ and the number of nodes n as parameters, and 3) efficient global combine algorithms for parallel machines whose λ is not an integer. We propose solutions that address these issues and present results of an experimental study of the new algorithms on the Intel Delta machine. Our main conclusion is that the postal model can help in performance prediction and tuning; for example, a properly tuned broadcast improves on the known implementation by more than 20%.
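To make the role of λ concrete, here is a minimal Python sketch of the classic greedy broadcast bound in the postal model, in which every informed node keeps sending to a fresh uninformed node in every round. It assumes an integer λ ≥ 1 and the convention that a message sent at time t is usable by its receiver at time t + λ (the round numbering in the abstract, with arrival at round r + λ − 1, may shift indices by one). Under these assumptions the number of informed nodes F(t) follows the generalized Fibonacci recurrence F(t) = F(t − 1) + F(t − λ), with F(t) = 1 for t < λ. The function name `broadcast_time` is hypothetical, and this is not the paper's tuned algorithm.

```python
def broadcast_time(n: int, lam: int) -> int:
    """Smallest t such that greedy broadcast from one root has informed
    at least n nodes by time t (postal model, integer latency lam >= 1).

    Assumed convention: a message sent at time t arrives at time t + lam,
    and a node may start sending at the time it is informed.
    """
    if lam < 1:
        raise ValueError("lam must be a positive integer")
    if n <= 1:
        return 0  # the root alone is already informed
    # F[t] = number of nodes informed by time t; only the root for t < lam.
    F = [1] * lam
    t = lam - 1
    while F[t] < n:
        t += 1
        # Nodes informed by t - 1, plus one new node for every node that
        # was already informed (and therefore sending) at time t - lam.
        F.append(F[t - 1] + F[t - lam])
    return t
```

For λ = 1 this reduces to doubling (⌈log₂ n⌉ rounds); for λ = 2 the counts grow like the Fibonacci numbers, so `broadcast_time(8, 2)` returns 5 rather than 3.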
Parallel Weighted Random Sampling
Data structures for efficient sampling from a set of weighted items are an important building block of many applications. However, few parallel solutions are known. We close many of these gaps for both shared-memory and distributed-memory machines. We give efficient, fast, and practicable algorithms for sampling single items, k items with or without replacement, permutations, subsets, and reservoirs. We also give improved sequential algorithms for alias table construction and for sampling with replacement. Experiments on shared-memory parallel machines with up to 158 threads show near-linear speedups for both construction and queries.
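As a point of reference for the alias tables mentioned above, here is a minimal sequential sketch of the classic alias method (Walker's table, built with Vose's small/large worklists). It is not the paper's improved or parallel construction. It assumes non-negative weights with a positive sum; construction takes O(n) time and each query O(1): pick a bucket uniformly, then keep its own item or jump to its alias. The function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def build_alias(weights: Sequence[float]) -> Tuple[List[float], List[int]]:
    """Build an alias table in O(n) using Vose's small/large worklists."""
    n = len(weights)
    total = float(sum(weights))
    if n == 0 or total <= 0:
        raise ValueError("need at least one positive weight")
    # Scale so that an average bucket holds exactly 1.0 units of probability mass.
    scaled = [w * n / total for w in weights]
    prob = [0.0] * n          # probability of keeping bucket i's own item
    alias = list(range(n))    # item returned when bucket i is not kept
    small = [i for i, p in enumerate(scaled) if p < 1.0]
    large = [i for i, p in enumerate(scaled) if p >= 1.0]
    while small and large:
        s, big = small.pop(), large.pop()
        prob[s], alias[s] = scaled[s], big
        # Item `big` donates the mass that bucket s is missing.
        scaled[big] -= 1.0 - scaled[s]
        (small if scaled[big] < 1.0 else large).append(big)
    # Remaining buckets are full (exactly 1.0 up to rounding error).
    for i in small + large:
        prob[i] = 1.0
    return prob, alias


def sample(prob: List[float], alias: List[int], rng: random.Random) -> int:
    """Draw one item in O(1): pick a bucket uniformly, keep it or take its alias."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]
```

For example, `prob, alias = build_alias([1, 2, 3, 4])` followed by repeated calls to `sample(prob, alias, random.Random(0))` draws item i with probability proportional to its weight, here 0.1, 0.2, 0.3, and 0.4.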
Lightweight MPI Communicators with Applications to Perfectly Balanced Quicksort
MPI uses the concept of communicators to connect groups of processes. It
provides nonblocking collective operations on communicators to overlap
communication and computation. Flexible algorithms demand flexible
communicators. E.g., a process can work on different subproblems within
different process groups simultaneously, new process groups can be created, or
the members of a process group can change. Depending on the number of
communicators, the time for communicator creation can drastically increase the
running time of the algorithm. Furthermore, a new communicator synchronizes all
processes as communicator creation routines are blocking collective operations.
We present RBC, a communication library based on MPI, that creates
range-based communicators in constant time without communication. These RBC
communicators support (non)blocking point-to-point communication as well as
(non)blocking collective operations. Our experiments show that the library
reduces the time to create a new communicator by a factor of more than 400
whereas the running time of collective operations remains about the same. We
propose Janus Quicksort, a distributed sorting algorithm that avoids any load
imbalances. We improved the performance of this algorithm by a factor of 15 for
moderate inputs by using RBC communicators. Finally, we discuss different
approaches to bring nonblocking (local) communicator creation of lightweight
(range-based) communicators into MPI.
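A range-based communicator can be created locally because its membership is just an interval of ranks, so rank translation is plain arithmetic. The sketch below illustrates this idea in Python with a hypothetical `RangeComm` class; it is not RBC's actual API and performs no MPI communication. Creating a sub-communicator is an O(1) local operation with no messages and no synchronization, in contrast to MPI's blocking collective communicator-creation routines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeComm:
    """Hypothetical range-based communicator covering global ranks first..last."""
    first: int
    last: int

    def __post_init__(self) -> None:
        if self.first > self.last:
            raise ValueError("empty rank range")

    @property
    def size(self) -> int:
        return self.last - self.first + 1

    def contains(self, global_rank: int) -> bool:
        return self.first <= global_rank <= self.last

    def to_local(self, global_rank: int) -> int:
        """Translate a global rank into this communicator's local rank."""
        if not self.contains(global_rank):
            raise ValueError(f"rank {global_rank} is not a member")
        return global_rank - self.first

    def to_global(self, local_rank: int) -> int:
        """Translate a local rank back into a global rank."""
        if not 0 <= local_rank < self.size:
            raise ValueError(f"local rank {local_rank} out of range")
        return self.first + local_rank

    def split(self, local_first: int, local_last: int) -> "RangeComm":
        """Create a sub-communicator in O(1) time, locally, with no messages."""
        return RangeComm(self.to_global(local_first), self.to_global(local_last))
```

For instance, with `world = RangeComm(0, 15)`, the ranges `world.split(0, 8)` and `world.split(8, 15)` overlap at global rank 8, so that process belongs to both groups at once, the kind of flexibility the abstract describes for a process working on two subproblems simultaneously.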
Randomized Initialization of a Wireless Multihop Network
Address autoconfiguration is an important mechanism required to set the IP
address of a node automatically in a wireless network. Address
autoconfiguration, also known as initialization or naming, consists in assigning a
unique identifier ranging from 1 to n to each of a set of n indistinguishable
nodes. We consider a wireless network where n nodes (processors) are randomly
thrown in a square X, uniformly and independently. We assume that the network
is synchronous and that two nodes can communicate if they are within
distance at most r of each other (r is the transmitting/receiving
range). The model of this paper concerns nodes without collision detection
ability: if two or more neighbors of a processor u transmit at
the same time, then u receives neither message. We also assume that
nodes know neither the topology of the network nor the number of nodes in the
network. Moreover, they start indistinguishable, anonymous and unnamed. Under
this extremal scenario, we design and analyze a fully distributed protocol to
achieve the initialization task for a wireless multihop network of n nodes
uniformly scattered in a square X. We show how the transmitting range of the
deployed stations can affect typical characteristics such as the degrees
and the diameter of the network. By allowing the nodes to transmit at a range
r = \sqrt{\frac{(1+\ell)\,|X|\ln n}{\pi n}}, where |X| is the area of the deployment square X (slightly greater than the range
required to have a connected network), we show how to design a randomized
protocol running in expected time … in order to assign a
unique number ranging from 1 to n to each of the participating nodes
- …
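The transmission range in the abstract can be evaluated directly. The sketch below computes r from n, the area |X| of the square, and ℓ, and uses the fact that a node far from the border then expects about nπr²/|X| = (1 + ℓ) ln n neighbours; setting ℓ = 0 gives the classical connectivity threshold nπr²/|X| = ln n. Nodes near the border of X have fewer neighbours, so the empirical mean degree computed by `mean_degree` is typically somewhat lower than this interior estimate. All function names are hypothetical, and the sketch only examines degrees; it does not implement the paper's initialization protocol.

```python
import math
import random


def transmission_range(n: int, area: float, ell: float) -> float:
    """r = sqrt((1 + ell) * area * ln n / (pi * n)), as in the abstract."""
    return math.sqrt((1 + ell) * area * math.log(n) / (math.pi * n))


def interior_expected_degree(n: int, area: float, r: float) -> float:
    """Approximate neighbour count of a node far from the border: n*pi*r^2/area."""
    return n * math.pi * r * r / area


def mean_degree(points: list[tuple[float, float]], r: float) -> float:
    """Average number of neighbours within distance r (O(n^2) pair check)."""
    n = len(points)
    if n == 0:
        return 0.0
    degree_sum = 0
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) <= r:
                degree_sum += 2  # each edge adds 1 to both endpoints
    return degree_sum / n


if __name__ == "__main__":
    n, side, ell = 1000, 1.0, 0.5
    r = transmission_range(n, side * side, ell)
    rng = random.Random(1)
    pts = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(n)]
    print(f"r = {r:.4f}")
    print(f"interior estimate = {interior_expected_degree(n, side * side, r):.2f}")
    print(f"empirical mean degree = {mean_degree(pts, r):.2f}")
```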