
    On the design and implementation of broadcast and global combine operations using the postal model

    A number of models have been proposed in recent years for message-passing parallel systems. Examples are the postal model and its generalization, the LogP model. In the postal model a parameter λ is used to model the communication latency of the message-passing system. During each round, each node can send one fixed-size message and, simultaneously, receive a message of the same size. Furthermore, a message sent out during round r incurs a latency of λ and arrives at the receiving node at round r + λ - 1. Our goal in this paper is to bridge the gap between the theoretical modeling and the practical implementation. In particular, we investigate a number of practical issues related to the design and implementation of two collective communication operations, namely the broadcast operation and the global combine operation. These practical issues include, for example, 1) techniques for measuring the value of λ on a given machine, 2) creating efficient broadcast algorithms that take the latency λ and the number of nodes n as parameters, and 3) creating efficient global combine algorithms for parallel machines whose λ is not an integer. We propose solutions that address these practical issues and present results of an experimental study of the new algorithms on the Intel Delta machine. Our main conclusion is that the postal model can help in performance prediction and tuning; for example, a properly tuned broadcast improves upon the known implementation by more than 20%.
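    To make the model concrete, here is a minimal sketch (not taken from the paper) that evaluates the standard postal-model reachability recurrence N(t) = 1 for 0 ≤ t < λ and N(t) = N(t-1) + N(t-λ) for t ≥ λ, i.e. the maximum number of nodes that can be informed within t rounds, and derives the optimal broadcast time from it. It assumes an integer λ; handling a non-integer λ, as the abstract notes, is one of the paper's practical concerns.

        // Sketch: reachability and broadcast time in the postal model, integer lambda.
        //   N(t) = 1                      for 0 <= t < lambda
        //   N(t) = N(t-1) + N(t-lambda)   for t >= lambda
        #include <cstdint>
        #include <iostream>
        #include <vector>

        std::uint64_t reachable(int t, int lambda) {
            std::vector<std::uint64_t> N(t + 1, 1);      // N(t) = 1 while t < lambda
            for (int i = lambda; i <= t; ++i)
                N[i] = N[i - 1] + N[i - lambda];
            return N[t];
        }

        // Smallest number of rounds t with N(t) >= n.
        int broadcast_time(std::uint64_t n, int lambda) {
            int t = 0;
            while (reachable(t, lambda) < n) ++t;
            return t;
        }

        int main() {
            std::cout << "Broadcast time for n = 512, lambda = 3: "
                      << broadcast_time(512, 3) << " rounds\n";
        }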

    Parallel Weighted Random Sampling

    Data structures for efficient sampling from a set of weighted items are an important building block of many applications. However, few parallel solutions are known. We close many of these gaps for both shared-memory and distributed-memory machines. We give efficient, fast, and practicable algorithms for sampling single items, k items with or without replacement, permutations, subsets, and reservoirs. We also give improved sequential algorithms for alias table construction and for sampling with replacement. Experiments on shared-memory parallel machines with up to 158 threads show near-linear speedups for both construction and queries.
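    The alias table mentioned above is the classic data structure for constant-time weighted sampling after linear-time preprocessing. As background, the following sketch shows a plain sequential construction in the Walker/Vose style together with a query; it is illustrative only and is not the paper's improved sequential or parallel construction.

        // Sketch: sequential alias-table construction (Walker/Vose style) and
        // O(1) weighted sampling.  Illustrative; not the paper's algorithms.
        #include <iostream>
        #include <random>
        #include <vector>

        struct AliasTable {
            std::vector<double> prob;   // acceptance probability per bucket
            std::vector<int>    alias;  // fallback item per bucket

            explicit AliasTable(const std::vector<double>& w) {
                const int n = static_cast<int>(w.size());
                prob.resize(n); alias.resize(n);
                double total = 0;
                for (double x : w) total += x;
                std::vector<double> scaled(n);
                std::vector<int> small, large;
                for (int i = 0; i < n; ++i) {
                    scaled[i] = w[i] * n / total;        // rescale so the mean is 1
                    (scaled[i] < 1.0 ? small : large).push_back(i);
                }
                while (!small.empty() && !large.empty()) {
                    int s = small.back(); small.pop_back();
                    int l = large.back(); large.pop_back();
                    prob[s] = scaled[s]; alias[s] = l;   // top up bucket s with item l
                    scaled[l] -= 1.0 - scaled[s];
                    (scaled[l] < 1.0 ? small : large).push_back(l);
                }
                for (int i : small) { prob[i] = 1.0; alias[i] = i; }
                for (int i : large) { prob[i] = 1.0; alias[i] = i; }
            }

            // Draw one item index with probability proportional to its weight.
            template <class RNG> int sample(RNG& rng) const {
                std::uniform_int_distribution<int> bucket(0, static_cast<int>(prob.size()) - 1);
                std::uniform_real_distribution<double> coin(0.0, 1.0);
                int b = bucket(rng);
                return coin(rng) < prob[b] ? b : alias[b];
            }
        };

        int main() {
            std::mt19937 rng(42);
            AliasTable t({0.1, 0.2, 0.7});
            std::vector<int> cnt(3, 0);
            for (int i = 0; i < 100000; ++i) ++cnt[t.sample(rng)];
            for (int c : cnt) std::cout << c << ' ';     // roughly 10000 20000 70000
            std::cout << '\n';
        }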

    Lightweight MPI Communicators with Applications to Perfectly Balanced Quicksort

    MPI uses the concept of communicators to connect groups of processes. It provides nonblocking collective operations on communicators to overlap communication and computation. Flexible algorithms demand flexible communicators: for example, a process can work on different subproblems within different process groups simultaneously, new process groups can be created, or the members of a process group can change. Depending on the number of communicators, the time for communicator creation can drastically increase the running time of the algorithm. Furthermore, creating a new communicator synchronizes all processes, since communicator creation routines are blocking collective operations. We present RBC, a communication library based on MPI that creates range-based communicators in constant time without communication. These RBC communicators support (non)blocking point-to-point communication as well as (non)blocking collective operations. Our experiments show that the library reduces the time to create a new communicator by a factor of more than 400, whereas the running time of collective operations remains about the same. We propose Janus Quicksort, a distributed sorting algorithm that avoids load imbalance. We improved the performance of this algorithm by a factor of 15 for moderate inputs by using RBC communicators. Finally, we discuss different approaches to bringing nonblocking (local) creation of lightweight (range-based) communicators into MPI.
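    For contrast with RBC's constant-time, communication-free communicator creation, the sketch below shows the conventional way to split off process groups in plain MPI: MPI_Comm_split is a blocking collective over the parent communicator, which is exactly the synchronization cost described above. It uses only standard MPI calls and is not the RBC interface.

        // Sketch: creating a sub-communicator with plain MPI.  MPI_Comm_split is a
        // blocking collective over MPI_COMM_WORLD, so every process must reach it;
        // this is the creation/synchronization cost that RBC communicators avoid.
        #include <mpi.h>
        #include <cstdio>

        int main(int argc, char** argv) {
            MPI_Init(&argc, &argv);
            int rank, size;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            // Split into a lower and an upper half, e.g. for the two recursive
            // subproblems of a distributed quicksort.
            int color = (rank < size / 2) ? 0 : 1;
            MPI_Comm half;
            MPI_Comm_split(MPI_COMM_WORLD, color, /*key=*/rank, &half);

            int sub_rank, sub_size;
            MPI_Comm_rank(half, &sub_rank);
            MPI_Comm_size(half, &sub_size);
            std::printf("world rank %d -> group %d, rank %d of %d\n",
                        rank, color, sub_rank, sub_size);

            MPI_Comm_free(&half);
            MPI_Finalize();
        }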

    Practical Massively Parallel Sorting


    Randomized Initialization of a Wireless Multihop Network

    Address autoconfiguration is an important mechanism required to set the IP address of a node automatically in a wireless network. Address autoconfiguration, also known as initialization or naming, consists in assigning a unique identifier ranging from 1 to n to each of a set of n indistinguishable nodes. We consider a wireless network in which n nodes (processors) are thrown into a square X uniformly and independently at random. We assume that the network is synchronous and that two nodes can communicate if they are within distance at most r of each other (r is the transmitting/receiving range). The model of this paper concerns nodes without the collision-detection ability: if two or more neighbors of a processor u transmit concurrently, then u receives none of the messages. We also assume that nodes know neither the topology of the network nor the number of nodes in the network. Moreover, they start indistinguishable, anonymous, and unnamed. Under this extremal scenario, we design and analyze a fully distributed protocol that achieves the initialization task for a wireless multihop network of n nodes uniformly scattered in a square X. We show how the transmitting range of the deployed stations affects typical characteristics of the network such as node degrees and diameter. By allowing the nodes to transmit at a range r = \sqrt{\frac{(1+\ell) \ln{n} \, |X|}{\pi n}} (slightly greater than the range required for the network to be connected), where |X| denotes the area of the square, we show how to design a randomized protocol running in expected time O(n^{3/2} \log^2{n}) that assigns a unique number ranging from 1 to n to each of the n participating nodes.
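    To make the geometry concrete, the following sketch (illustrative only, not the paper's protocol) scatters n nodes uniformly in a square of area |X|, sets r according to the formula above with an assumed value of \ell, and reports the resulting average node degree.

        // Sketch: scatter n nodes uniformly in a square of area |X|, set the range
        //   r = sqrt((1 + l) * ln(n) * |X| / (pi * n)),
        // and report the average node degree.  Model illustration only; this is not
        // the paper's randomized initialization protocol.
        #include <cmath>
        #include <iostream>
        #include <random>
        #include <vector>

        int main() {
            const int    n    = 1000;                     // number of nodes (assumed)
            const double ell  = 0.5;                      // parameter l (assumed)
            const double area = 1.0;                      // |X|, area of the square
            const double side = std::sqrt(area);
            const double pi   = std::acos(-1.0);
            const double r    = std::sqrt((1.0 + ell) * std::log(n) * area / (pi * n));

            std::mt19937 rng(1);
            std::uniform_real_distribution<double> coord(0.0, side);
            std::vector<double> x(n), y(n);
            for (int i = 0; i < n; ++i) { x[i] = coord(rng); y[i] = coord(rng); }

            long long edges = 0;
            for (int i = 0; i < n; ++i)
                for (int j = i + 1; j < n; ++j)
                    if (std::hypot(x[i] - x[j], y[i] - y[j]) <= r) ++edges;

            std::cout << "range r = " << r
                      << ", average degree = " << 2.0 * edges / n << '\n';
        }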