Parallel Algorithmic Techniques for Combinatorial Computation

Author: Eppstein David
Galil Zvi
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/1988

Parallel computation offers the promise of great improvements in the solution of problems that, if we were restricted to sequential computation, would take so much time that solution would be impractical. There is a drawback to the use of parallel computers, however, and that is that they seem to be harder to program. For this reason, parallel algorithms in practice are often restricted to simple problems such as matrix multiplication. Certainly this is useful, and in fact we shall see later some non-obvious uses of matrix manipulation, but many of the large problems requiring solution are of a more complex nature. In particular, an instance of a problem may be structured as an arbitrary graph or tree, rather than in the regular order of a matrix. In this paper we describe a number of algorithmic techniques that have been developed for solving such combinatorial problems. The intent of the paper is to show how the algorithmic tools we present can be used as building blocks for higher level algorithms, and to present pointers to the literature for the reader to look up the specifics of these algorithms. We make no claim to completeness; a number of techniques have been omitted for brevity or because their chief application is not combinatorial in nature. In particular we give very little attention to parallel sorting, although sorting is used as a subroutine in a number of the algorithms we describe. We also only describe algorithms, and not lower bounds, for solving problems in parallel

CiteSeerX

Columbia University Academic Commons

Efficient Linked List Ranking Algorithms and Parentheses Matching as a New Strategy for Parallel Algorithm Design

Author: Halverson Ranette Hudson
Publication venue: 'University of North Texas Libraries'
Publication date: 01/12/1993

The goal of a parallel algorithm is to solve a single problem using multiple processors working together and to do so in an efficient manner. In this regard, there is a need to categorize strategies in order to solve broad classes of problems with similar structures and requirements. In this dissertation, two parallel algorithm design strategies are considered: linked list ranking and parentheses matching
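List ranking, the first of the two strategies named above, can be illustrated with the classic pointer-jumping technique. The sketch below is not taken from the dissertation: the function name `list_rank` and the `succ` array representation are assumptions made for illustration, and the inner loop simulates sequentially what a PRAM would do for all nodes at once.

```python
def list_rank(succ):
    """Rank every node of a linked list by pointer jumping.

    succ[i] is the index of the node that follows node i, or None if
    node i is the tail. Returns rank[i], the number of links between
    node i and the tail. (Hypothetical helper, for illustration only.)
    """
    n = len(succ)
    nxt = list(succ)
    rank = [0 if s is None else 1 for s in succ]
    # Every round doubles the distance each pointer spans, so about
    # log2(n) rounds are enough for a list of n nodes.
    while any(p is not None for p in nxt):
        new_rank = list(rank)
        new_nxt = list(nxt)
        for i in range(n):  # conceptually: all i in parallel
            if nxt[i] is not None:
                new_rank[i] = rank[i] + rank[nxt[i]]
                new_nxt[i] = nxt[nxt[i]]
        rank, nxt = new_rank, new_nxt
    return rank
```

Writing into fresh `new_rank` and `new_nxt` arrays in each round mimics the synchronous read-then-write step of a PRAM. This simple scheme performs O(n log n) total operations, which is why work-efficient list ranking is a research topic in its own right.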

UNT Digital Library

Data Oblivious Algorithms for Multicores

Author: Elaine Shi
Vijaya Ramachandran
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 29/06/2021

As secure processors such as Intel SGX (with hyperthreading) become widely adopted, there is a growing appetite for private analytics on big data. Most prior works on data-oblivious algorithms adopt the classical PRAM model to capture parallelism. However, it is widely understood that PRAM does not best capture realistic multicore processors, nor does it reflect parallel programming models adopted in practice. In this paper, we initiate the study of parallel data oblivious algorithms on realistic multicores, best captured by the binary fork-join model of computation. We first show that data-oblivious sorting can be accomplished by a binary fork-join algorithm with optimal total work and optimal (cache-oblivious) cache complexity, and in O(log n log log n) span (i.e., parallel time) that matches the best-known insecure algorithm. Using our sorting algorithm as a core primitive, we show how to data-obliviously simulate general PRAM algorithms in the binary fork-join model with non-trivial efficiency. We also present results for several applications including list ranking, Euler tour, tree contraction, connected components, and minimum spanning forest. For a subset of these applications, our data-oblivious algorithms asymptotically outperform the best known insecure algorithms. For other applications, we show data oblivious algorithms whose performance bounds match the best known insecure algorithms. Complementing these asymptotically efficient results, we present a practical variant of our sorting algorithm that is self-contained and potentially implementable. It has optimal caching cost, and it is only a log log n factor off from optimal work and about a log n factor off in terms of span; moreover, it achieves small constant factors in its bounds
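To make the notion of a data-oblivious sort concrete, here is a minimal sketch of a bitonic sorting network, a textbook example in which the sequence of compared positions depends only on the input length. It is not the algorithm of the paper (its work is O(n log^2 n), not optimal), and the name `bitonic_sort` and the returned `trace` are assumptions added for illustration.

```python
def bitonic_sort(values):
    """Sort a list whose length is a power of two with a bitonic network.

    Returns (sorted_list, trace), where trace lists the index pairs that
    were compared. The trace is the same for every input of a given
    length, which is the data-oblivious property.
    """
    a = list(values)
    n = len(a)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    trace = []
    k = 2
    while k <= n:              # size of the bitonic runs being merged
        j = k // 2
        while j >= 1:          # distance between compared positions
            for i in range(n):  # conceptually: all i in parallel
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    trace.append((i, partner))
                    lo = min(a[i], a[partner])
                    hi = max(a[i], a[partner])
                    a[i], a[partner] = (lo, hi) if ascending else (hi, lo)
            j //= 2
        k *= 2
    return a, trace
```

A real oblivious implementation would also need a compare-exchange whose timing and memory accesses do not depend on the values; Python's `min` and `max` give no such guarantee, so this sketch only demonstrates the fixed access pattern.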

arXiv.org e-Print Archive

Cryptology ePrint Archive

Optimal parallel string algorithms: sorting, merging and computing the minimum

Author: Hagerup T.
Publication venue: Max-Planck-Institut für Informatik
Publication date: 01/01/1993

We study fundamental comparison problems on strings of characters, equipped with the usual lexicographical ordering. For each problem studied, we give a parallel algorithm that is optimal with respect to at least one criterion for which no optimal algorithm was previously known. Specifically, our main results are: (1) Two sorted sequences of strings, containing altogether $n$ characters, can be merged in $O(\log n)$ time using $O(n)$ operations on an EREW PRAM. This is optimal as regards both the running time and the number of operations. (2) A sequence of strings, containing altogether $n$ characters represented by integers of size polynomial in $n$, can be sorted in $O(\log n/\log\log n)$ time using $O(n\log\log n)$ operations on a CRCW PRAM. The running time is optimal for any polynomial number of processors. (3) The minimum string in a sequence of strings containing altogether $n$ characters can be found using (expected) $O(n)$ operations in constant expected time on a randomized CRCW PRAM, in $O(\log\log n)$ time on a deterministic CRCW PRAM with a program depending on $n$, in $O((\log\log n)^3)$ time on a deterministic CRCW PRAM with a program not depending on $n$, in $O(\log n)$ expected time on a randomized EREW PRAM, and in $O(\log n\log\log n)$ time on a deterministic EREW PRAM. The number of operations is optimal, and the running time is optimal for the randomized algorithms and, if the number of processors is limited to $n$, for the nonuniform deterministic CRCW PRAM algorithm as we

MPG.PuRe

Aspects of practical implementations of PRAM algorithms

Author: Ravindran Somasundaram

The PRAM is a shared memory model of parallel computation which abstracts away from inessential engineering details. It provides a very simple architecture independent model and provides a good programming environment. Theoreticians of the computer science community have proved that it is possible to emulate the theoretical PRAM model using current technology. Solutions have been found for effectively interconnecting processing elements, for routing data on these networks and for distributing the data among memory modules without hotspots. This thesis reviews this emulation and the possibilities it provides for large scale general purpose parallel computation. The emulation employs a bridging model which acts as an interface between the actual hardware and the PRAM model. We review the evidence that such a scheme can achieve scalable parallel performance and portable parallel software and that PRAM algorithms can be optimally implemented on such practical models. In the course of this review we present the following new results: 1. Concerning parallel approximation algorithms, we describe an NC algorithm for finding an approximation to a minimum weight perfect matching in a complete weighted graph. The algorithm is conceptually very simple and it is also the first NC-approximation algorithm for the task with a sub-linear performance ratio. 2. Concerning graph embedding, we describe dense edge-disjoint embeddings of the complete binary tree with n leaves in the following n-node communication networks: the hypercube, the de Bruijn and shuffle-exchange networks and the 2-dimensional mesh. In the embeddings the maximum distance from a leaf to the root of the tree is asymptotically optimally short. The embeddings facilitate efficient implementation of many PRAM algorithms on networks employing these graphs as interconnection networks. 3. Concerning bulk synchronous algorithmics, we describe scalable transportable algorithms for the following three commonly required types of computation: balanced tree computations, Fast Fourier Transforms and matrix multiplications.

Warwick Research Archives Portal Repository

Optimal (Randomized) Parallel Algorithms in the Binary-Forking Model

Author: Acar U. A.
Acar Umut A.
Agrawal Kunal
Akhremtsev Yaroslav
Arora N. S.
Ben-David Naama
Blelloch Guy E.
Blumofe Robert D.
Cole Richard
Dhulipala Laxman
Gil J.
Goodrich Michael T.
Gustedt Jens
Guy
Miller G.L.
Nievergelt Jürg
Rajasekaran S.
Valiant L. G.
Vishkin Uzi
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 24/06/2020

In this paper we develop optimal algorithms in the binary-forking model for a variety of fundamental problems, including sorting, semisorting, list ranking, tree contraction, range minima, and ordered set union, intersection and difference. In the binary-forking model, tasks can only fork into two child tasks, but can do so recursively and asynchronously. The tasks share memory, supporting reads, writes and test-and-sets. Costs are measured in terms of work (total number of instructions), and span (longest dependence chain). The binary-forking model is meant to capture both algorithm performance and algorithm-design considerations on many existing multithreaded languages, which are also asynchronous and rely on binary forks either explicitly or under the covers. In contrast to the widely studied PRAM model, it does not assume arbitrary-way forks nor synchronous operations, both of which are hard to implement in modern hardware. While optimal PRAM algorithms are known for the problems studied herein, it turns out that arbitrary-way forking and strict synchronization are powerful, if unrealistic, capabilities. Natural simulations of these PRAM algorithms in the binary-forking model (i.e., implementations in existing parallel languages) incur an $\Omega(\log n)$ overhead in span. This paper explores techniques for designing optimal algorithms when limited to binary forking and assuming asynchrony. All algorithms described in this paper are the first algorithms with optimal work and span in the binary-forking model. Most of the algorithms are simple. Many are randomized.
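The work and span measures used in the binary-forking model can be illustrated with a small cost simulation. The sketch below is an assumption-laden toy, not an algorithm from the paper: the name `fork_join_sum` is hypothetical, each task is charged one unit of cost, and the two recursive calls stand in for two child tasks created by a binary fork.

```python
def fork_join_sum(xs, lo=0, hi=None):
    """Return (total, work, span) for a binary fork-join sum of xs[lo:hi].

    work counts every task once (unit cost per task is an assumption);
    span is the length of the longest chain of dependent tasks.
    """
    if hi is None:
        hi = len(xs)
    if hi - lo == 0:
        return 0, 1, 1
    if hi - lo == 1:
        return xs[lo], 1, 1
    mid = (lo + hi) // 2
    # A binary fork: the two halves could run as independent child tasks.
    left, w1, s1 = fork_join_sum(xs, lo, mid)
    right, w2, s2 = fork_join_sum(xs, mid, hi)
    # Work adds up across children; span follows the slower child.
    return left + right, w1 + w2 + 1, max(s1, s2) + 1
```

For n inputs the simulated work grows linearly while the span grows like log n. Merely spawning n tasks already costs logarithmic span when every fork is binary, which is the source of the overhead the abstract mentions for naive PRAM simulations.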

arXiv.org e-Print Archive

Crossref

Realizing degree sequences in parallel

Author: Arikati S.
Maheshwari A.
Publication venue: Max-Planck-Institut für Informatik
Publication date: 01/01/1994

A sequence $d$ of integers is a degree sequence if there exists a (simple) graph $G$ such that the components of $d$ are equal to the degrees of the vertices of $G$. The graph $G$ is said to be a realization of $d$. We provide an efficient parallel algorithm to realize $d$. Before our result, it was not known if the problem of realizing $d$ is in $NC$.
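To show what realizing a degree sequence means in practice, the following sketch uses the classic sequential Havel-Hakimi procedure. It is explicitly not the parallel algorithm of the paper, only a reference point for the problem being solved; the function name `realize` and the edge-list output format are assumptions.

```python
def realize(degrees):
    """Return an edge list of a simple graph with the given degrees,
    or None if the sequence is not a degree sequence.

    Sequential Havel-Hakimi: repeatedly take the vertex with the largest
    remaining degree d and join it to the d next-largest vertices.
    """
    remaining = [[d, v] for v, d in enumerate(degrees)]
    edges = []
    while remaining:
        remaining.sort(reverse=True)   # largest remaining degree first
        d, v = remaining.pop(0)
        if d < 0 or d > len(remaining):
            return None                # not enough vertices left to join
        for entry in remaining[:d]:
            entry[0] -= 1
            if entry[0] < 0:
                return None            # a neighbour ran out of degree
            edges.append((v, entry[1]))
    return edges
```

Each vertex is removed before it can be chosen as a neighbour again, so the output never contains loops or repeated edges. The procedure is inherently step-by-step, which hints at why membership of the problem in NC was not obvious.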

Carleton University's Institutional Repository

MPG.PuRe

Analysis and design of algorithms : double hashing and parallel graph searching

Author: Molodowitch Mariko
Publication venue: eScholarship, University of California
Publication date: 01/01/1990

The following is in two parts, corresponding to the two separate topics in the dissertation.

Probabilistic Analysis of Double Hashing

In [GS78], a deep and elegant analysis shows that double hashing is asymptotically equivalent to the ideal uniform hashing up to a load factor of about 0.319. In this paper we show how a resampling technique can be used to develop a surprisingly simple proof of the result that this equivalence holds for load factors arbitrarily close to 1.

Parallel Depth First Search of Planar Directed Acyclic Graphs

In 1988, Kao [Kao88] presented the first NC algorithm for the depth first search of a directed planar graph. Recently, Kao and Klein [KK90] reduced the number of processors required from O(n^4) to linear, but the time bound is O(log^8 n). We present an algorithm for the depth first search of a planar directed acyclic graph with k sources using O(n) processors and O(log k log n) time on a CRCW PRAM model. For planar dags with a single source and a single sink, we present a simple optimal algorithm which gives the depth first search in O(log n) time with O(n/log n) processors on an EREW PRAM. For a single-source multiple-sink planar dag, we have an O(log n) time O(n) processor EREW algorithm. The EREW algorithms assume that the embedding is given. A simplified variant of the depth first search of a multisource planar dag can be used to solve the single source reachability problem for a planar directed acyclic graph in O(log^2 n) time and O(n) processors on a CRCW PRAM. Since an O(log^4 n) algorithm for this problem is used as a subroutine by Kao and Klein in their depth first search for the general planar directed graph, this will lower their time bound by a factor of log^2 n. Our work uses the concept of a planar Euler tour depth first search, a depth first search in which the Euler tour around the tree is planar and crosses no tree edge. This concept may prove to be of use in other parallel algorithms for planar graphs.
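For readers unfamiliar with the scheme analysed in the first part, here is a minimal sketch of an open-addressing table that uses double hashing. The class name `DoubleHashTable`, the default size, and the particular choice of the two hash functions are assumptions for illustration and are not taken from [GS78] or the dissertation.

```python
class DoubleHashTable:
    """Open-addressing set that resolves collisions by double hashing."""

    def __init__(self, size=11):
        # size is assumed prime, so any step in 1..size-1 visits all slots
        self.size = size
        self.slots = [None] * size

    def _probe(self, key):
        """Yield the probe sequence h1, h1 + h2, h1 + 2*h2, ... (mod size)."""
        h1 = hash(key) % self.size
        h2 = 1 + (hash(key) // self.size) % (self.size - 1)
        for i in range(self.size):
            yield (h1 + i * h2) % self.size

    def insert(self, key):
        """Store key and return the number of slots probed."""
        for probes, slot in enumerate(self._probe(key), start=1):
            if self.slots[slot] is None or self.slots[slot] == key:
                self.slots[slot] = key
                return probes
        raise RuntimeError("table is full")

    def contains(self, key):
        """Follow the same probe sequence until the key or a gap is found."""
        for slot in self._probe(key):
            if self.slots[slot] is None:
                return False
            if self.slots[slot] == key:
                return True
        return False
```

Because the step `h2` differs between keys that share the same first slot, colliding keys follow different probe sequences. This is the intuition behind comparing double hashing with ideal uniform hashing, where every probe sequence is an independent random permutation of the slots.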

eScholarship - University of California

Efficient Algorithms and Data Structures for Massive Data Sets

Author: Alka
Publication venue
Publication date: 01/01/2010

For many algorithmic problems, traditional algorithms that optimise on the number of instructions executed prove expensive on I/Os. Novel and very different design techniques, when applied to these problems, can produce algorithms that are I/O efficient. This thesis adds to the growing chorus of such results. The computational models we use are the external memory model and the W-Stream model. On the external memory model, we obtain the following results. (1) An I/O efficient algorithm for computing minimum spanning trees of graphs that improves on the performance of the best known algorithm. (2) The first external memory version of soft heap, an approximate meldable priority queue. (3) Hard heap, the first meldable external memory priority queue that matches the amortised I/O performance of the known external memory priority queues, while allowing a meld operation at the same amortised cost. (4) I/O efficient exact, approximate and randomised algorithms for the minimum cut problem, which has not been explored before on the external memory model. (5) Some lower and upper bounds on I/Os for interval graphs. On the W-Stream model, we obtain the following results. (1) Algorithms for various tree problems and list ranking that match the performance of the best known algorithms and are easier to implement than them. (2) Pass efficient algorithms for sorting, and the maximal independent set problems, that improve on the best known algorithms. (3) Pass efficient algorithms for the graph problems of finding vertex-colouring, approximate single source shortest paths, maximal matching, and approximate weighted vertex cover. (4) Lower bounds on passes for list ranking and maximal matching. We propose two variants of the W-Stream model, and design algorithms for the maximal independent set, vertex-colouring, and planar graph single source shortest paths problems on those models. Comment: PhD Thesis (144 pages

arXiv.org e-Print Archive

CiteSeerX

CERN Document Server