30 research outputs found

    Parallel Algorithmic Techniques for Combinatorial Computation

    Get PDF
    Parallel computation offers the promise of great improvements in the solution of problems that, if we were restricted to sequential computation, would take so much time that solution would be impractical. There is a drawback to the use of parallel computers, however, and that is that they seem to be harder to program. For this reason, parallel algorithms in practice are often restricted to simple problems such as matrix multiplication. Certainly this is useful, and in fact we shall see later some non-obvious uses of matrix manipulation, but many of the large problems requiring solution are of a more complex nature. In particular, an instance of a problem may be structured as an arbitrary graph or tree, rather than in the regular order of a matrix. In this paper we describe a number of algorithmic techniques that have been developed for solving such combinatorial problems. The intent of the paper is to show how the algorithmic tools we present can be used as building blocks for higher level algorithms, and to present pointers to the literature for the reader to look up the specifics of these algorithms. We make no claim to completeness; a number of techniques have been omitted for brevity or because their chief application is not combinatorial in nature. In particular we give very little attention to parallel sorting, although sorting is used as a subroutine in a number of the algorithms we describe. We also only describe algorithms, and not lower bounds, for solving problems in parallel

    Data Oblivious Algorithms for Multicores

    Get PDF
    As secure processors such as Intel SGX (with hyperthreading) become widely adopted, there is a growing appetite for private analytics on big data. Most prior works on data-oblivious algorithms adopt the classical PRAM model to capture parallelism. However, it is widely understood that PRAM does not best capture realistic multicore processors, nor does it reflect parallel programming models adopted in practice. In this paper, we initiate the study of parallel data oblivious algorithms on realistic multicores, best captured by the binary fork-join model of computation. We first show that data-oblivious sorting can be accomplished by a binary fork-join algorithm with optimal total work and optimal (cache-oblivious) cache complexity, and in O(log n log log n) span (i.e., parallel time) that matches the best-known insecure algorithm. Using our sorting algorithm as a core primitive, we show how to data-obliviously simulate general PRAM algorithms in the binary fork-join model with non-trivial efficiency. We also present results for several applications including list ranking, Euler tour, tree contraction, connected components, and minimum spanning forest. For a subset of these applications, our data-oblivious algorithms asymptotically outperform the best known insecure algorithms. For other applications, we show data oblivious algorithms whose performance bounds match the best known insecure algorithms. Complementing these asymptotically efficient results, we present a practical variant of our sorting algorithm that is self-contained and potentially implementable. It has optimal caching cost, and it is only a log log n factor off from optimal work and about a log n factor off in terms of span; moreover, it achieves small constant factors in its bounds

    Optimal parallel string algorithms: sorting, merching and computing the minimum

    No full text
    We study fundamental comparison problems on strings of characters, equipped with the usual lexicographical ordering. For each problem studied, we give a parallel algorithm that is optimal with respect to at least one criterion for which no optimal algorithm was previously known. Specifically, our main results are: % \begin{itemize} \item Two sorted sequences of strings, containing altogether nn~characters, can be merged in O(logn)O(\log n) time using O(n)O(n) operations on an EREW PRAM. This is optimal as regards both the running time and the number of operations. \item A sequence of strings, containing altogether nn~characters represented by integers of size polynomial in~nn, can be sorted in O(logn/loglogn)O({{\log n}/{\log\log n}}) time using O(nloglogn)O(n\log\log n) operations on a CRCW PRAM. The running time is optimal for any polynomial number of processors. \item The minimum string in a sequence of strings containing altogether nn characters can be found using (expected) O(n)O(n) operations in constant expected time on a randomized CRCW PRAM, in O(loglogn)O(\log\log n) time on a deterministic CRCW PRAM with a program depending on~nn, in O((loglogn)3)O((\log\log n)^3) time on a deterministic CRCW PRAM with a program not depending on~nn, in O(logn)O(\log n) expected time on a randomized EREW PRAM, and in O(lognloglogn)O(\log n\log\log n) time on a deterministic EREW PRAM. The number of operations is optimal, and the running time is optimal for the randomized algorithms and, if the number of processors is limited to~nn, for the nonuniform deterministic CRCW PRAM algorithm as we

    Aspects of practical implementations of PRAM algorithms

    Get PDF
    The PRAM is a shared memory model of parallel computation which abstracts away from inessential engineering details. It provides a very simple architecture independent model and provides a good programming environment. Theoreticians of the computer science community have proved that it is possible to emulate the theoretical PRAM model using current technology. Solutions have been found for effectively interconnecting processing elements, for routing data on these networks and for distributing the data among memory modules without hotspots. This thesis reviews this emulation and the possibilities it provides for large scale general purpose parallel computation. The emulation employs a bridging model which acts as an interface between the actual hardware and the PRAM model. We review the evidence that such a scheme crn achieve scalable parallel performance and portable parallel software and that PRAM algorithms can be optimally implemented on such practical models. In the course of this review we presented the following new results: 1. Concerning parallel approximation algorithms, we describe an NC algorithm for finding an approximation to a minimum weight perfect matching in a complete weighted graph. The algorithm is conceptually very simple and it is also the first NC-approximation algorithm for the task with a sub-linear performance ratio. 2. Concerning graph embedding, we describe dense edge-disjoint embeddings of the complete binary tree with n leaves in the following n-node communication networks: the hypercube, the de Bruijn and shuffle-exchange networks and the 2-dimcnsional mesh. In the embeddings the maximum distance from a leaf to the root of the tree is asymptotically optimally short. The embeddings facilitate efficient implementation of many PRAM algorithms on networks employing these graphs as interconnection networks. 3. Concerning bulk synchronous algorithmics, we describe scalable transportable algorithms for the following three commonly required types of computation; balanced tree computations. Fast Fourier Transforms and matrix multiplications

    Optimal (Randomized) Parallel Algorithms in the Binary-Forking Model

    Full text link
    In this paper we develop optimal algorithms in the binary-forking model for a variety of fundamental problems, including sorting, semisorting, list ranking, tree contraction, range minima, and ordered set union, intersection and difference. In the binary-forking model, tasks can only fork into two child tasks, but can do so recursively and asynchronously. The tasks share memory, supporting reads, writes and test-and-sets. Costs are measured in terms of work (total number of instructions), and span (longest dependence chain). The binary-forking model is meant to capture both algorithm performance and algorithm-design considerations on many existing multithreaded languages, which are also asynchronous and rely on binary forks either explicitly or under the covers. In contrast to the widely studied PRAM model, it does not assume arbitrary-way forks nor synchronous operations, both of which are hard to implement in modern hardware. While optimal PRAM algorithms are known for the problems studied herein, it turns out that arbitrary-way forking and strict synchronization are powerful, if unrealistic, capabilities. Natural simulations of these PRAM algorithms in the binary-forking model (i.e., implementations in existing parallel languages) incur an Ω(logn)\Omega(\log n) overhead in span. This paper explores techniques for designing optimal algorithms when limited to binary forking and assuming asynchrony. All algorithms described in this paper are the first algorithms with optimal work and span in the binary-forking model. Most of the algorithms are simple. Many are randomized

    Realizing degree sequences in parallel

    No full text
    A sequence dd of integers is a degree sequence if there exists a (simple) graph GG such that the components of dd are equal to the degrees of the vertices of GG. The graph GG is said to be a realization of dd. We provide an efficient parallel algorithm to realize dd. Before our result, it was not known if the problem of realizing dd is in NCNC

    Efficient Algorithms and Data Structures for Massive Data Sets

    Full text link
    For many algorithmic problems, traditional algorithms that optimise on the number of instructions executed prove expensive on I/Os. Novel and very different design techniques, when applied to these problems, can produce algorithms that are I/O efficient. This thesis adds to the growing chorus of such results. The computational models we use are the external memory model and the W-Stream model. On the external memory model, we obtain the following results. (1) An I/O efficient algorithm for computing minimum spanning trees of graphs that improves on the performance of the best known algorithm. (2) The first external memory version of soft heap, an approximate meldable priority queue. (3) Hard heap, the first meldable external memory priority queue that matches the amortised I/O performance of the known external memory priority queues, while allowing a meld operation at the same amortised cost. (4) I/O efficient exact, approximate and randomised algorithms for the minimum cut problem, which has not been explored before on the external memory model. (5) Some lower and upper bounds on I/Os for interval graphs. On the W-Stream model, we obtain the following results. (1) Algorithms for various tree problems and list ranking that match the performance of the best known algorithms and are easier to implement than them. (2) Pass efficient algorithms for sorting, and the maximal independent set problems, that improve on the best known algorithms. (3) Pass efficient algorithms for the graphs problems of finding vertex-colouring, approximate single source shortest paths, maximal matching, and approximate weighted vertex cover. (4) Lower bounds on passes for list ranking and maximal matching. We propose two variants of the W-Stream model, and design algorithms for the maximal independent set, vertex-colouring, and planar graph single source shortest paths problems on those models.Comment: PhD Thesis (144 pages
    corecore