27 research outputs found

    Methods for Partitioning Data to Improve Parallel Execution Time for Sorting on Heterogeneous Clusters

    Get PDF
    International audienceThe aim of the paper is to introduce general techniques in order to optimize the parallel execution time of sorting on a distributed architectures with processors of various speeds. Such an application requires a partitioning step. For uniformly related processors (processors speeds are related by a constant factor), we develop a constant time technique for mastering processor load and execution time in an heterogeneous environment and also a technique to deal with unknown cost functions. For non uniformly related processors, we use a technique based on dynamic programming. Most of the time, the solutions are in O(p) (p is the number of processors), independent of the problem size n. Consequently, there is a small overhead regarding the problem we deal with but it is inherently limited by the knowing of time complexity of the portion of code following the partitioning

    An Elegant Algorithm for the Construction of Suffix Arrays

    Get PDF
    The suffix array is a data structure that finds numerous applications in string processing problems for both linguistic texts and biological data. It has been introduced as a memory efficient alternative for suffix trees. The suffix array consists of the sorted suffixes of a string. There are several linear time suffix array construction algorithms (SACAs) known in the literature. However, one of the fastest algorithms in practice has a worst case run time of O(n2)O(n^2). The problem of designing practically and theoretically efficient techniques remains open. In this paper we present an elegant algorithm for suffix array construction which takes linear time with high probability; the probability is on the space of all possible inputs. Our algorithm is one of the simplest of the known SACAs and it opens up a new dimension of suffix array construction that has not been explored until now. Our algorithm is easily parallelizable. We offer parallel implementations on various parallel models of computing. We prove a lemma on the ℓ\ell-mers of a random string which might find independent applications. We also present another algorithm that utilizes the above algorithm. This algorithm is called RadixSA and has a worst case run time of O(nlog⁡n)O(n\log{n}). RadixSA introduces an idea that may find independent applications as a speedup technique for other SACAs. An empirical comparison of RadixSA with other algorithms on various datasets reveals that our algorithm is one of the fastest algorithms to date. The C++ source code is freely available at http://www.engr.uconn.edu/~man09004/radixSA.zi

    Deterministic Selection on the Mesh and Hypercube

    Get PDF
    In this paper we present efficient deterministic algorithms for selection on the mesh connected computers (referred to as the mesh from hereon) and the hypercube. Our algorithm on the mesh runs in time O([n/p] log logp + √p logn) where n is the input size and p is the number of processors. The time bound is significantly better than that of the best existing algorithms when n is large. The run time of our algorithm on the hypercube is O ([n/p] log log p + Ts/p log nM/em\u3e), where Ts/p is the time needed to sort p element on a p-node hypercube. In fact, the same algorithm runs on an network in time O([n/p] log log p +Ts/p log), where Ts/p is the time needed for sorting p keys using p processors (assuming that broadcast and prefix computations take time less than or equal to Ts/p

    Evaluating holistic aggregators efficiently for very large datasets

    Get PDF
    In data warehousing applications, numerous OLAP queries involve the processing of holistic aggregators such as computing the “top n,” median, quantiles, etc. In this paper, we present a novel approach called dynamic bucketing to efficiently evaluate these aggregators. We partition data into equiwidth buckets and further partition dense buckets into sub-buckets as needed by allocating and reclaiming memory space. The bucketing process dynamically adapts to the input order and distribution of input datasets. The histograms of the buckets and subbuckets are stored in our new data structure called structure trees. A recent selection algorithm based on regular sampling is generalized and its analysis extended. We have also compared our new algorithms with this generalized algorithm and several other recent algorithms. Experimental results show that our new algorithms significantly outperform prior ones not only in the runtime but also in accuracy

    Selection, Routing and Sorting on the Star Graph

    Get PDF
    We consider the problems of selection, routing and sorting on an n-star graph (with n! nodes), an interconnection network which has been proven to possess many special properties. We identify a tree like subgraph (which we call as a \u27(k, l, k) chain network\u27) of the star graph which enables us to design efficient algorithms for the above mentioned problems. We present an algorithm that performs a sequence of n prefix computations in O(n2) time. This algorithm is used as a subroutine in our other algorithms. In addition we offer an efficient deterministic sorting algorithm that runs in O(n3lg n) steps. Though an algorithm with the same time bound has been proposed before, our algorithm is very simple and is based on a different approach. We also show that sorting can be performed on the n star graph in time O(n3) and that selection of a set of uniformly distributed n keys can be performed in O(n2) time with high probability. Finally, we also present a deterministic (non oblivious) routing algorithm that realizes any permutation in O(n3) steps on the n-star graph. There exists an algorithm in the literature that can perform a single prefix computation in O(n lg n) time. The best known previous algorithm for sorting has a run time of O(n3 lg n) and is deterministic. To our knowledge, the problem of selection has not been considered before on the star graph

    Granularity of parallel memories

    Get PDF
    Consider algorithms which are designed for shared memory models of parallel computation in which processors are allowed to have fairly unrestricted access patterns to the shared memory. General fast simulations of such algorithms by parallel machines in which the shared memory is organized in modules where only one cell of each module can be accessed at a time are proposed. The paper provides a comprehensive study of the problem. The solution involves three stages: (a) Before a simulation, distribute randomly the memory addresses among the memory modules. (b) Keep several copies of each address and assign memory requests of processors to the "right\u27; copies at any time. (c) Satisfy these assigned memory requests according to specifications of the parallel machine
    corecore