Methods for Partitioning Data to Improve Parallel Execution Time for Sorting on Heterogeneous Clusters
The aim of the paper is to introduce general techniques for optimizing the parallel execution time of sorting on distributed architectures with processors of various speeds. Such an application requires a partitioning step. For uniformly related processors (processor speeds are related by a constant factor), we develop a constant-time technique for mastering processor load and execution time in a heterogeneous environment, and also a technique to deal with unknown cost functions. For non-uniformly related processors, we use a technique based on dynamic programming. Most of the time, the solutions are in O(p) (p is the number of processors), independent of the problem size n. Consequently, the overhead is small relative to the problem we deal with, but the approach is inherently limited by knowledge of the time complexity of the portion of code following the partitioning.
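As a rough illustration of the partitioning step for uniformly related processors, the sketch below splits n items in proportion to integer relative speeds. It assumes a linear cost model, which is a simplification (sorting cost is not linear in the chunk size), and the function name `partition_sizes` is hypothetical; it is not the technique developed in the paper.

```python
def partition_sizes(n, speeds):
    """Split n items among processors in proportion to integer relative speeds.

    Assumption: the work on a chunk grows linearly with its size, so a
    processor that is twice as fast receives twice as many items.
    """
    total = sum(speeds)
    # Integer part of each proportional share.
    sizes = [n * s // total for s in speeds]
    # Give the leftover items to the largest fractional remainders.
    leftover = n - sum(sizes)
    order = sorted(range(len(speeds)),
                   key=lambda i: (n * speeds[i]) % total,
                   reverse=True)
    for i in order[:leftover]:
        sizes[i] += 1
    return sizes


if __name__ == "__main__":
    print(partition_sizes(10, [1, 2, 2]))
```

The cost of computing the split depends only on the number of processors, not on n, which matches the spirit of the "independent of the problem size" remark in the abstract.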
An Elegant Algorithm for the Construction of Suffix Arrays
The suffix array is a data structure that finds numerous applications in
string processing problems for both linguistic texts and biological data. It
has been introduced as a memory efficient alternative for suffix trees. The
suffix array consists of the sorted suffixes of a string. There are several
linear time suffix array construction algorithms (SACAs) known in the
literature. However, one of the fastest algorithms in practice has a worst case
run time of O(n^2). The problem of designing practically and theoretically
efficient techniques remains open. In this paper we present an elegant
algorithm for suffix array construction which takes linear time with high
probability; the probability is on the space of all possible inputs. Our
algorithm is one of the simplest of the known SACAs and it opens up a new
dimension of suffix array construction that has not been explored until now.
Our algorithm is easily parallelizable. We offer parallel implementations on
various parallel models of computing. We prove a lemma on the ℓ-mers of a
random string which might find independent applications. We also present
another algorithm that utilizes the above algorithm. This algorithm is called
RadixSA and has a worst case run time of O(n log n). RadixSA introduces an
idea that may find independent applications as a speedup technique for other
SACAs. An empirical comparison of RadixSA with other algorithms on various
datasets reveals that our algorithm is one of the fastest algorithms to date.
The C++ source code is freely available at
http://www.engr.uconn.edu/~man09004/radixSA.zi
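To make the definition concrete (the suffix array consists of the sorted suffixes of a string), here is a deliberately naive sketch that sorts suffix start positions by comparing the suffixes directly. It is not RadixSA or any linear-time SACA; the function name `suffix_array` is an assumption made for illustration.

```python
def suffix_array(text):
    """Return the start positions of all suffixes of text in sorted order.

    Naive construction: every comparison may scan a whole suffix, so this is
    far slower than the algorithms discussed in the abstract.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


if __name__ == "__main__":
    # Suffixes of "banana" in sorted order: a, ana, anana, banana, na, nana
    print(suffix_array("banana"))  # [5, 3, 1, 0, 4, 2]
```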
Deterministic Selection on the Mesh and Hypercube
In this paper we present efficient deterministic algorithms for selection on the mesh connected computers (referred to as the mesh from hereon) and the hypercube. Our algorithm on the mesh runs in time O([n/p] log log p + √p log n), where n is the input size and p is the number of processors. The time bound is significantly better than that of the best existing algorithms when n is large. The run time of our algorithm on the hypercube is O([n/p] log log p + Ts/p log n), where Ts/p is the time needed to sort p elements on a p-node hypercube. In fact, the same algorithm runs on any network in time O([n/p] log log p + Ts/p log n), where Ts/p is the time needed for sorting p keys using p processors (assuming that broadcast and prefix computations take time less than or equal to Ts/p).
Evaluating holistic aggregators efficiently for very large datasets
In data warehousing applications, numerous OLAP queries involve the processing of holistic aggregators such as computing the "top n," median, quantiles, etc. In this paper, we present a novel approach called dynamic bucketing to efficiently evaluate these aggregators. We partition data into equiwidth buckets and further partition dense buckets into sub-buckets as needed by allocating and reclaiming memory space. The bucketing process dynamically adapts to the input order and distribution of input datasets. The histograms of the buckets and sub-buckets are stored in our new data structure called structure trees. A recent selection algorithm based on regular sampling is generalized and its analysis extended. We have also compared our new algorithms with this generalized algorithm and several other recent algorithms. Experimental results show that our new algorithms significantly outperform prior ones not only in the runtime but also in accuracy.
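The general idea of narrowing down a quantile with equiwidth buckets can be sketched as follows. This is a simplified, in-memory, multi-pass illustration only: it does not model the structure trees, the memory allocation and reclamation, or the single-pass adaptation described in the abstract, and the name `bucket_select` and the default of 16 buckets are assumptions.

```python
def bucket_select(data, k, num_buckets=16):
    """Return the k-th smallest value (0-based) of a non-empty list of numbers.

    Each round histograms the candidates into equiwidth buckets, locates the
    bucket holding rank k, and keeps only that bucket's values.
    """
    candidates = list(data)
    lo, hi = min(candidates), max(candidates)
    while lo < hi and len(candidates) > num_buckets:
        width = (hi - lo) / num_buckets

        def index(x):
            # Clamp so the maximum value falls into the last bucket.
            return min(int((x - lo) / width), num_buckets - 1)

        counts = [0] * num_buckets
        for x in candidates:
            counts[index(x)] += 1
        # Walk the histogram to find the bucket that contains rank k.
        b = 0
        while k >= counts[b]:
            k -= counts[b]
            b += 1
        # Refine: only the values of the chosen bucket remain candidates.
        candidates = [x for x in candidates if index(x) == b]
        lo, hi = min(candidates), max(candidates)
    return sorted(candidates)[k]


if __name__ == "__main__":
    values = [(i * 37) % 101 for i in range(200)]
    print(bucket_select(values, len(values) // 2))
```

Calling it with `k = len(values) // 2` gives a median; other quantiles only change the rank.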
Selection, Routing and Sorting on the Star Graph
We consider the problems of selection, routing and sorting on an n-star graph (with n! nodes), an interconnection network which has been proven to possess many special properties. We identify a tree-like subgraph (which we call a '(k, l, k) chain network') of the star graph which enables us to design efficient algorithms for the above-mentioned problems. We present an algorithm that performs a sequence of n prefix computations in O(n^2) time. This algorithm is used as a subroutine in our other algorithms. In addition, we offer an efficient deterministic sorting algorithm that runs in O(n^3 lg n) steps. Though an algorithm with the same time bound has been proposed before, our algorithm is very simple and is based on a different approach. We also show that sorting can be performed on the n-star graph in time O(n^3) and that selection of a set of uniformly distributed n keys can be performed in O(n^2) time with high probability. Finally, we also present a deterministic (non-oblivious) routing algorithm that realizes any permutation in O(n^3) steps on the n-star graph. There exists an algorithm in the literature that can perform a single prefix computation in O(n lg n) time. The best known previous algorithm for sorting has a run time of O(n^3 lg n) and is deterministic. To our knowledge, the problem of selection has not been considered before on the star graph.
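For readers unfamiliar with the network, the sketch below enumerates the neighbours of a node under the standard definition of the n-star graph, which the abstract does not restate: nodes are the permutations of n symbols, and two nodes are adjacent when one is obtained from the other by swapping the first symbol with the symbol at some other position. The function name `star_neighbors` is hypothetical.

```python
from itertools import permutations


def star_neighbors(node):
    """Return the n-1 neighbours of a permutation in the n-star graph."""
    neighbors = []
    for i in range(1, len(node)):
        swapped = list(node)
        # An edge swaps the first symbol with the symbol at position i.
        swapped[0], swapped[i] = swapped[i], swapped[0]
        neighbors.append(tuple(swapped))
    return neighbors


if __name__ == "__main__":
    nodes = list(permutations(range(4)))
    print(len(nodes))                  # 4! = 24 nodes
    print(star_neighbors((0, 1, 2, 3)))
```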
Granularity of parallel memories
Consider algorithms which are designed for shared memory models of parallel computation in which processors are allowed to have fairly unrestricted access patterns to the shared memory. General fast simulations of such algorithms by parallel machines in which the shared memory is organized in modules where only one cell of each module can be accessed at a time are proposed. The paper provides a comprehensive study of the problem. The solution involves three stages:
(a) Before a simulation, distribute randomly the memory addresses among the memory modules.
(b) Keep several copies of each address and assign memory requests of processors to the "right" copies at any time.
(c) Satisfy these assigned memory requests according to specifications of the parallel machine.
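Stages (a) and (b) can be pictured with a toy sketch. It assumes that each address gets a fixed number of copies placed on distinct, randomly chosen modules, and that the "right" copy is simply the one on the currently least-loaded module; that greedy rule is an assumption for illustration, not the assignment scheme of the paper, and stage (c) is not modelled. All names are hypothetical.

```python
import random


def place_copies(num_addresses, num_modules, copies, seed=0):
    """Stage (a): place each address's copies on distinct random modules."""
    rng = random.Random(seed)
    return [rng.sample(range(num_modules), copies)
            for _ in range(num_addresses)]


def assign_requests(requests, placement, num_modules):
    """Stage (b), greedy assumption: serve each request from the copy whose
    module currently has the fewest assigned requests."""
    load = [0] * num_modules
    assignment = {}
    for addr in requests:
        module = min(placement[addr], key=lambda m: load[m])
        load[module] += 1
        assignment[addr] = module
    return assignment, load


if __name__ == "__main__":
    placement = place_copies(num_addresses=100, num_modules=10, copies=3)
    assignment, load = assign_requests(list(range(50)), placement, 10)
    print(load)  # number of requests each module must serve
```

Since a module serves one cell at a time, the largest entry of `load` indicates how many steps this batch of requests would need in the toy model.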