Search CORE

39 research outputs found

Fast Parallel Operations on Search Trees

Author: Akhremtsev Yaroslav
Sanders Peter
Publication venue
Publication date: 11/05/2016
Field of study

Using (a,b)-trees as an example, we show how to perform a parallel split with logarithmic latency and parallel join, bulk updates, intersection, union (or merge), and (symmetric) set difference with logarithmic latency and with information theoretically optimal work. We present both asymptotically optimal solutions and simplified versions that perform well in practice - they are several times faster than previous implementations

arXiv.org e-Print Archive

Crossref

Fast Parallel Algorithms for Basic Problems

Author: Wen Zhaofang
Publication venue: ODU Digital Commons
Publication date: 01/07/1991
Field of study

Parallel processing is one of the most active research areas these days. We are interested in one aspect of parallel processing, i.e. the design and analysis of parallel algorithms. Here, we focus on non-numerical parallel algorithms for basic combinatorial problems, such as data structures, selection, searching, merging and sorting. The purposes of studying these types of problems are to obtain basic building blocks which will be useful in solving complex problems, and to develop fundamental algorithmic techniques. In this thesis, we study the following problems: priority queues, multiple search and multiple selection, and reconstruction of a binary tree from its traversals. The research on priority queue was motivated by its various applications. The purpose of studying multiple search and multiple selection is to explore the relationships between four of the most fundamental problems in algorithm design, that is, selection, searching, merging and sorting; while our parallel solutions can be used as subroutines in algorithms for other problems. The research on the last problem, reconstruction of a binary tree from its traversals, was stimulated by a challenge proposed in a recent paper by Berkman et al. ( Highly Parallelizable Problems, STOC 89) to design doubly logarithmic time optimal parallel algorithms because a remarkably small number of such parallel algorithms exist

Old Dominion University

Deterministic parallel algorithms for bilinear objective functions

Author: Harris David G.
Publication venue
Publication date: 21/06/2018
Field of study

Many randomized algorithms can be derandomized efficiently using either the method of conditional expectations or probability spaces with low independence. A series of papers, beginning with work by Luby (1988), showed that in many cases these techniques can be combined to give deterministic parallel (NC) algorithms for a variety of combinatorial optimization problems, with low time- and processor-complexity. We extend and generalize a technique of Luby for efficiently handling bilinear objective functions. One noteworthy application is an NC algorithm for maximal independent set. On a graph

G

with

m

edges and

n

vertices, this takes

\tilde O(\log^2 n)

time and

(m + n) n^{o(1)}

processors, nearly matching the best randomized parallel algorithms. Other applications include reduced processor counts for algorithms of Berger (1997) for maximum acyclic subgraph and Gale-Berlekamp switching games. This bilinear factorization also gives better algorithms for problems involving discrepancy. An important application of this is to automata-fooling probability spaces, which are the basis of a notable derandomization technique of Sivakumar (2002). Our method leads to large reduction in processor complexity for a number of derandomization algorithms based on automata-fooling, including set discrepancy and the Johnson-Lindenstrauss Lemma

arXiv.org e-Print Archive

Recommended from our members

Efficient Linked List Ranking Algorithms and Parentheses Matching as a New Strategy for Parallel Algorithm Design

Author: Halverson Ranette Hudson
Publication venue: 'University of North Texas Libraries'
Publication date: 01/12/1993
Field of study

The goal of a parallel algorithm is to solve a single problem using multiple processors working together and to do so in an efficient manner. In this regard, there is a need to categorize strategies in order to solve broad classes of problems with similar structures and requirements. In this dissertation, two parallel algorithm design strategies are considered: linked list ranking and parentheses matching

UNT Digital Library

Efficient parallel computation on multiprocessors with optical interconnection networks

Author: He Min
Publication venue: LSU Digital Commons
Publication date: 01/01/2002
Field of study

This dissertation studies optical interconnection networks, their architecture, address schemes, and computation and communication capabilities. We focus on a simple but powerful optical interconnection network model - the Linear Array with Reconfigurable pipelined Bus System (LARPBS). We extend the LARPBS model to a simplified higher dimensional LAPRBS and provide a set of basic computation operations. We then study the following two groups of parallel computation problems on both one dimensional LARPBS\u27s as well as multi-dimensional LARPBS\u27s: parallel comparison problems, including sorting, merging, and selection; Boolean matrix multiplication, transitive closure and their applications to connected component problems. We implement an optimal sorting algorithm on an n-processor LARPBS. With this optimal sorting algorithm at disposal, we study the sorting problem for higher dimensional LARPBS\u27s and obtain the following results: • An optimal basic Columnsort algorithm on a 2D LARPBS. • Two optimal two-way merge sort algorithms on a 2D LARPBS. • An optimal multi-way merge sorting algorithm on a 2D LARPBS. • An optimal generalized column sort algorithm on a 2D LARPBS. • An optimal generalized column sort algorithm on a 3D LARPBS. • An optimal 5-phase sorting algorithm on a 3D LARPBS. Results for selection problems are as follows: • A constant time maximum-finding algorithm on an LARPBS. • An optimal maximum-finding algorithm on an LARPBS. • An O((log log n)2) time parallel selection algorithm on an LARPBS. • An O(k(log log n)2) time parallel multi-selection algorithm on an LARPBS. While studying the computation and communication properties of the LARPBS model, we find Boolean matrix multiplication and its applications to the graph are another set of problem that can be solved efficiently on the LARPBS. Following is a list of results we have obtained in this area. • A constant time Boolean matrix multiplication algorithm. • An O(log n)-time transitive closure algorithm. • An O(log n)-time connected components algorithm. • An O(log n)-time strongly connected components algorithm. The results provided in this dissertation show the strong computation and communication power of optical interconnection networks

Louisiana State University

Fault-tolerant sorting networks

Author: Ma Yuan
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/1994
Field of study

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Mathematics, 1994.Includes bibliographical references (p. 148-150).by Yuan Ma.Ph.D

DSpace@MIT

Implicit Decomposition for Write-Efficient Connectivity Algorithms

Author: Ben-David Naama
Blelloch Guy E.
Fineman Jeremy T.
Gibbons Phillip B.
Gu Yan
McGuffey Charles
Shun Julian
Publication venue
Publication date: 07/10/2017
Field of study

The future of main memory appears to lie in the direction of new technologies that provide strong capacity-to-performance ratios, but have write operations that are much more expensive than reads in terms of latency, bandwidth, and energy. Motivated by this trend, we propose sequential and parallel algorithms to solve graph connectivity problems using significantly fewer writes than conventional algorithms. Our primary algorithmic tool is the construction of an

o(n)

-sized "implicit decomposition" of a bounded-degree graph

G

n

nodes, which combined with read-only access to

G

enables fast answers to connectivity and biconnectivity queries on

G

. The construction breaks the linear-write "barrier", resulting in costs that are asymptotically lower than conventional algorithms while adding only a modest cost to querying time. For general non-sparse graphs on

m

edges, we also provide the first

o(m)

writes and

O(m)

operations parallel algorithms for connectivity and biconnectivity. These algorithms provide insight into how applications can efficiently process computations on large graphs in systems with read-write asymmetry

arXiv.org e-Print Archive

Crossref

DSpace@MIT

Optimal (Randomized) Parallel Algorithms in the Binary-Forking Model

Author: Acar U. A.
Acar Umut A.
Agrawal Kunal
Agrawal Kunal
Akhremtsev Yaroslav
Arora N. S.
Ben-David Naama
Ben-David Naama
Blelloch Guy E
Blelloch Guy E
Blelloch Guy E
Blelloch Guy E.
Blelloch Guy E.
Blelloch Guy E.
Blumofe Robert D.
Cole Richard
Cole Richard
Cole Richard
Dhulipala Laxman
Dhulipala Laxman
Gil J.
Goodrich Michael T.
Gustedt Jens
Guy
Guy
Miller G.L.
Nievergelt Jürg
Rajasekaran S.
Valiant L. G.
Vishkin Uzi
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 24/06/2020
Field of study

In this paper we develop optimal algorithms in the binary-forking model for a variety of fundamental problems, including sorting, semisorting, list ranking, tree contraction, range minima, and ordered set union, intersection and difference. In the binary-forking model, tasks can only fork into two child tasks, but can do so recursively and asynchronously. The tasks share memory, supporting reads, writes and test-and-sets. Costs are measured in terms of work (total number of instructions), and span (longest dependence chain). The binary-forking model is meant to capture both algorithm performance and algorithm-design considerations on many existing multithreaded languages, which are also asynchronous and rely on binary forks either explicitly or under the covers. In contrast to the widely studied PRAM model, it does not assume arbitrary-way forks nor synchronous operations, both of which are hard to implement in modern hardware. While optimal PRAM algorithms are known for the problems studied herein, it turns out that arbitrary-way forking and strict synchronization are powerful, if unrealistic, capabilities. Natural simulations of these PRAM algorithms in the binary-forking model (i.e., implementations in existing parallel languages) incur an

\Omega(\log n)

overhead in span. This paper explores techniques for designing optimal algorithms when limited to binary forking and assuming asynchrony. All algorithms described in this paper are the first algorithms with optimal work and span in the binary-forking model. Most of the algorithms are simple. Many are randomized

arXiv.org e-Print Archive

Crossref

Communication-Efficient Probabilistic Algorithms: Selection, Sampling, and Checking

Author: Hübschle-Schneider Lorenz
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 14/12/2020
Field of study

Diese Dissertation behandelt drei grundlegende Klassen von Problemen in Big-Data-Systemen, für die wir kommunikationseffiziente probabilistische Algorithmen entwickeln. Im ersten Teil betrachten wir verschiedene Selektionsprobleme, im zweiten Teil das Ziehen gewichteter Stichproben (Weighted Sampling) und im dritten Teil die probabilistische Korrektheitsprüfung von Basisoperationen in Big-Data-Frameworks (Checking). Diese Arbeit ist durch einen wachsenden Bedarf an Kommunikationseffizienz motiviert, der daher rührt, dass der auf das Netzwerk und seine Nutzung zurückzuführende Anteil sowohl der Anschaffungskosten als auch des Energieverbrauchs von Supercomputern und der Laufzeit verteilter Anwendungen immer weiter wächst. Überraschend wenige kommunikationseffiziente Algorithmen sind für grundlegende Big-Data-Probleme bekannt. In dieser Arbeit schließen wir einige dieser Lücken. Zunächst betrachten wir verschiedene Selektionsprobleme, beginnend mit der verteilten Version des klassischen Selektionsproblems, d. h. dem Auffinden des Elements von Rang

k

in einer großen verteilten Eingabe. Wir zeigen, wie dieses Problem kommunikationseffizient gelöst werden kann, ohne anzunehmen, dass die Elemente der Eingabe zufällig verteilt seien. Hierzu ersetzen wir die Methode zur Pivotwahl in einem schon lange bekannten Algorithmus und zeigen, dass dies hinreichend ist. Anschließend zeigen wir, dass die Selektion aus lokal sortierten Folgen – multisequence selection – wesentlich schneller lösbar ist, wenn der genaue Rang des Ausgabeelements in einem gewissen Bereich variieren darf. Dies benutzen wir anschließend, um eine verteilte Prioritätswarteschlange mit Bulk-Operationen zu konstruieren. Später werden wir diese verwenden, um gewichtete Stichproben aus Datenströmen zu ziehen (Reservoir Sampling). Schließlich betrachten wir das Problem, die global häufigsten Objekte sowie die, deren zugehörige Werte die größten Summen ergeben, mit einem stichprobenbasierten Ansatz zu identifizieren. Im Kapitel über gewichtete Stichproben werden zunächst neue Konstruktionsalgorithmen für eine klassische Datenstruktur für dieses Problem, sogenannte Alias-Tabellen, vorgestellt. Zu Beginn stellen wir den ersten Linearzeit-Konstruktionsalgorithmus für diese Datenstruktur vor, der mit konstant viel Zusatzspeicher auskommt. Anschließend parallelisieren wir diesen Algorithmus für Shared Memory und erhalten so den ersten parallelen Konstruktionsalgorithmus für Aliastabellen. Hiernach zeigen wir, wie das Problem für verteilte Systeme mit einem zweistufigen Algorithmus angegangen werden kann. Anschließend stellen wir einen ausgabesensitiven Algorithmus für gewichtete Stichproben mit Zurücklegen vor. Ausgabesensitiv bedeutet, dass die Laufzeit des Algorithmus sich auf die Anzahl der eindeutigen Elemente in der Ausgabe bezieht und nicht auf die Größe der Stichprobe. Dieser Algorithmus kann sowohl sequentiell als auch auf Shared-Memory-Maschinen und verteilten Systemen eingesetzt werden und ist der erste derartige Algorithmus in allen drei Kategorien. Wir passen ihn anschließend an das Ziehen gewichteter Stichproben ohne Zurücklegen an, indem wir ihn mit einem Schätzer für die Anzahl der eindeutigen Elemente in einer Stichprobe mit Zurücklegen kombinieren. Poisson-Sampling, eine Verallgemeinerung des Bernoulli-Sampling auf gewichtete Elemente, kann auf ganzzahlige Sortierung zurückgeführt werden, und wir zeigen, wie ein bestehender Ansatz parallelisiert werden kann. Für das Sampling aus Datenströmen passen wir einen sequentiellen Algorithmus an und zeigen, wie er in einem Mini-Batch-Modell unter Verwendung unserer im Selektionskapitel eingeführten Bulk-Prioritätswarteschlange parallelisiert werden kann. Das Kapitel endet mit einer ausführlichen Evaluierung unserer Aliastabellen-Konstruktionsalgorithmen, unseres ausgabesensitiven Algorithmus für gewichtete Stichproben mit Zurücklegen und unseres Algorithmus für gewichtetes Reservoir-Sampling. Um die Korrektheit verteilter Algorithmen probabilistisch zu verifizieren, schlagen wir Checker für grundlegende Operationen von Big-Data-Frameworks vor. Wir zeigen, dass die Überprüfung zahlreicher Operationen auf zwei „Kern“-Checker reduziert werden kann, nämlich die Prüfung von Aggregationen und ob eine Folge eine Permutation einer anderen Folge ist. Während mehrere Ansätze für letzteres Problem seit geraumer Zeit bekannt sind und sich auch einfach parallelisieren lassen, ist unser Summenaggregations-Checker eine neuartige Anwendung der gleichen Datenstruktur, die auch zählenden Bloom-Filtern und dem Count-Min-Sketch zugrunde liegt. Wir haben beide Checker in Thrill, einem Big-Data-Framework, implementiert. Experimente mit absichtlich herbeigeführten Fehlern bestätigen die von unserer theoretischen Analyse vorhergesagte Erkennungsgenauigkeit. Dies gilt selbst dann, wenn wir häufig verwendete schnelle Hash-Funktionen mit in der Theorie suboptimalen Eigenschaften verwenden. Skalierungsexperimente auf einem Supercomputer zeigen, dass unsere Checker nur sehr geringen Laufzeit-Overhead haben, welcher im Bereich von

2\,\%

liegt und dabei die Korrektheit des Ergebnisses nahezu garantiert wird

KITopen

Efficient Algorithms and Data Structures for Massive Data Sets

Author: Alka
Publication venue
Publication date: 01/01/2010
Field of study

For many algorithmic problems, traditional algorithms that optimise on the number of instructions executed prove expensive on I/Os. Novel and very different design techniques, when applied to these problems, can produce algorithms that are I/O efficient. This thesis adds to the growing chorus of such results. The computational models we use are the external memory model and the W-Stream model. On the external memory model, we obtain the following results. (1) An I/O efficient algorithm for computing minimum spanning trees of graphs that improves on the performance of the best known algorithm. (2) The first external memory version of soft heap, an approximate meldable priority queue. (3) Hard heap, the first meldable external memory priority queue that matches the amortised I/O performance of the known external memory priority queues, while allowing a meld operation at the same amortised cost. (4) I/O efficient exact, approximate and randomised algorithms for the minimum cut problem, which has not been explored before on the external memory model. (5) Some lower and upper bounds on I/Os for interval graphs. On the W-Stream model, we obtain the following results. (1) Algorithms for various tree problems and list ranking that match the performance of the best known algorithms and are easier to implement than them. (2) Pass efficient algorithms for sorting, and the maximal independent set problems, that improve on the best known algorithms. (3) Pass efficient algorithms for the graphs problems of finding vertex-colouring, approximate single source shortest paths, maximal matching, and approximate weighted vertex cover. (4) Lower bounds on passes for list ranking and maximal matching. We propose two variants of the W-Stream model, and design algorithms for the maximal independent set, vertex-colouring, and planar graph single source shortest paths problems on those models.Comment: PhD Thesis (144 pages

arXiv.org e-Print Archive

CiteSeerX

CERN Document Server