11 research outputs found
Randomized Permutations in a Coarse Grained Parallel Environment [extended abstract]
International audienceWe show how to uniformly distribute data at random (not to be confounded with permutation routing) in a coarse grained parallel environment with p processors. In contrast to previously known work, our method is able to fulfill the three criteria of uniformity, work-optimality and balance among the processors simultaneously. To guarantee the uniformity we investigate the matrix of communication requests between the processors. We show that its distribution is a generalization of the multivariate hypergeometric distribution and we give algorithms to compute it efficiently
Optimal (Randomized) Parallel Algorithms in the Binary-Forking Model
In this paper we develop optimal algorithms in the binary-forking model for a
variety of fundamental problems, including sorting, semisorting, list ranking,
tree contraction, range minima, and ordered set union, intersection and
difference. In the binary-forking model, tasks can only fork into two child
tasks, but can do so recursively and asynchronously. The tasks share memory,
supporting reads, writes and test-and-sets. Costs are measured in terms of work
(total number of instructions), and span (longest dependence chain).
The binary-forking model is meant to capture both algorithm performance and
algorithm-design considerations on many existing multithreaded languages, which
are also asynchronous and rely on binary forks either explicitly or under the
covers. In contrast to the widely studied PRAM model, it does not assume
arbitrary-way forks nor synchronous operations, both of which are hard to
implement in modern hardware. While optimal PRAM algorithms are known for the
problems studied herein, it turns out that arbitrary-way forking and strict
synchronization are powerful, if unrealistic, capabilities. Natural simulations
of these PRAM algorithms in the binary-forking model (i.e., implementations in
existing parallel languages) incur an overhead in span. This
paper explores techniques for designing optimal algorithms when limited to
binary forking and assuming asynchrony. All algorithms described in this paper
are the first algorithms with optimal work and span in the binary-forking
model. Most of the algorithms are simple. Many are randomized
Fast Generation of Random Permutations via Networks Simulation
We consider the problem of generating random permutations with the uniform distribution. That is, we require that for an arbitrary permutation of n elements, with probability 1=n! the machine halts with the ith output cell containing (i), for 1 i n. We study this problem on two models of parallel computations: the CREW PRAM and the EREW PRAM. The main result of the paper is an algorithm for generating random permutations that runs in O(log log n) time and uses O(n 1+o(1) ) processors on the CREW PRAM. This is the first o(log n)-time CREW PRAM algorithm for this problem. On the EREW PRAM we present a simple algorithm that generates a random permutation in time O(log n) using n processors and O(n) space. This algorithm outperforms each of the previously known algorithms for the exclusive write PRAMs. The common and novel feature of both our algorithms is first to design a suitable random switching network generating a permutation and then to simulate this network on..
Fast Generation of Random Permutations via Networks Simulation
We consider the classical problem of generating random permutations with the uniform distribution. That is, we require that for an arbitrary permutation ß of n elements, with probability 1/n! the machine halts with the ith output cell containing ß(i), for 1 i n. We study this problem on two models of parallel computations: the CREW PRAM and the EREW PRAM. The main result of the paper is an algorithm for generating random permutations that runs in O(log log n) time and uses O(n 1+o(1) ) processors on the CREW PRAM. This is the first o(log n)-time CREW PRAM algorithm for this problem. On the EREW PRAM we present a simple algorithm that generates a random permutation in time O(log n) using n processors and O(n) space. This algorithm matches the running time and the number of processors used of the best previously known algorithms for the CREW PRAM, and performs better as far as the memory usage is considered. The common and novel feature of both our algorithms is to design first a s..
Fast Generation of Random Permutations via Networks Simulation
We consider the classical problem of generating random permutations with the uniform distribution. That is, we require that for an arbitrary permutation ß of n elements, with probability 1=n! the machine halts with the ith output cell containing ß(i), for 1 i n. We study this problem on two models of parallel computations: the CREW PRAM and the EREW PRAM. The main result of the paper is an algorithm for generating random permutations that runs in O(log log n) time and uses O(n 1+o(1) ) processors on the CREW PRAM. This is the first o(log n)-time CREW PRAM algorithm for this problem. On the EREW PRAM we present a simple algorithm that generates a random permutation in time O(log n) using n processors and O(n) space. This algorithm matches the running time and the number of processors used of the best previously known algorithms for the CREW PRAM, and performs better as far as the memory usage is considered. The common and novel feature of both our algorithms is to design first a s..