39 research outputs found
Fast Parallel Operations on Search Trees
Using (a,b)-trees as an example, we show how to perform a parallel split with
logarithmic latency and parallel join, bulk updates, intersection, union (or
merge), and (symmetric) set difference with logarithmic latency and with
information theoretically optimal work. We present both asymptotically optimal
solutions and simplified versions that perform well in practice - they are
several times faster than previous implementations
Fast Parallel Algorithms for Basic Problems
Parallel processing is one of the most active research areas these days. We are interested in one aspect of parallel processing, i.e. the design and analysis of parallel algorithms. Here, we focus on non-numerical parallel algorithms for basic combinatorial problems, such as data structures, selection, searching, merging and sorting. The purposes of studying these types of problems are to obtain basic building blocks which will be useful in solving complex problems, and to develop fundamental algorithmic techniques.
In this thesis, we study the following problems: priority queues, multiple search and multiple selection, and reconstruction of a binary tree from its traversals. The research on priority queue was motivated by its various applications. The purpose of studying multiple search and multiple selection is to explore the relationships between four of the most fundamental problems in algorithm design, that is, selection, searching, merging and sorting; while our parallel solutions can be used as subroutines in algorithms for other problems. The research on the last problem, reconstruction of a binary tree from its traversals, was stimulated by a challenge proposed in a recent paper by Berkman et al. ( Highly Parallelizable Problems, STOC 89) to design doubly logarithmic time optimal parallel algorithms because a remarkably small number of such parallel algorithms exist
Deterministic parallel algorithms for bilinear objective functions
Many randomized algorithms can be derandomized efficiently using either the
method of conditional expectations or probability spaces with low independence.
A series of papers, beginning with work by Luby (1988), showed that in many
cases these techniques can be combined to give deterministic parallel (NC)
algorithms for a variety of combinatorial optimization problems, with low time-
and processor-complexity.
We extend and generalize a technique of Luby for efficiently handling
bilinear objective functions. One noteworthy application is an NC algorithm for
maximal independent set. On a graph with edges and vertices, this
takes time and processors, nearly
matching the best randomized parallel algorithms. Other applications include
reduced processor counts for algorithms of Berger (1997) for maximum acyclic
subgraph and Gale-Berlekamp switching games.
This bilinear factorization also gives better algorithms for problems
involving discrepancy. An important application of this is to automata-fooling
probability spaces, which are the basis of a notable derandomization technique
of Sivakumar (2002). Our method leads to large reduction in processor
complexity for a number of derandomization algorithms based on
automata-fooling, including set discrepancy and the Johnson-Lindenstrauss
Lemma
Recommended from our members
Efficient Linked List Ranking Algorithms and Parentheses Matching as a New Strategy for Parallel Algorithm Design
The goal of a parallel algorithm is to solve a single problem using multiple processors working together and to do so in an efficient manner. In this regard, there is a need to categorize strategies in order to solve broad classes of problems with similar structures and requirements. In this dissertation, two parallel algorithm design strategies are considered: linked list ranking and parentheses matching
Efficient parallel computation on multiprocessors with optical interconnection networks
This dissertation studies optical interconnection networks, their architecture, address schemes, and computation and communication capabilities. We focus on a simple but powerful optical interconnection network model - the Linear Array with Reconfigurable pipelined Bus System (LARPBS). We extend the LARPBS model to a simplified higher dimensional LAPRBS and provide a set of basic computation operations. We then study the following two groups of parallel computation problems on both one dimensional LARPBS\u27s as well as multi-dimensional LARPBS\u27s: parallel comparison problems, including sorting, merging, and selection; Boolean matrix multiplication, transitive closure and their applications to connected component problems. We implement an optimal sorting algorithm on an n-processor LARPBS. With this optimal sorting algorithm at disposal, we study the sorting problem for higher dimensional LARPBS\u27s and obtain the following results: âą An optimal basic Columnsort algorithm on a 2D LARPBS. âą Two optimal two-way merge sort algorithms on a 2D LARPBS. âą An optimal multi-way merge sorting algorithm on a 2D LARPBS. âą An optimal generalized column sort algorithm on a 2D LARPBS. âą An optimal generalized column sort algorithm on a 3D LARPBS. âą An optimal 5-phase sorting algorithm on a 3D LARPBS. Results for selection problems are as follows: âą A constant time maximum-finding algorithm on an LARPBS. âą An optimal maximum-finding algorithm on an LARPBS. âą An O((log log n)2) time parallel selection algorithm on an LARPBS. âą An O(k(log log n)2) time parallel multi-selection algorithm on an LARPBS. While studying the computation and communication properties of the LARPBS model, we find Boolean matrix multiplication and its applications to the graph are another set of problem that can be solved efficiently on the LARPBS. Following is a list of results we have obtained in this area. âą A constant time Boolean matrix multiplication algorithm. âą An O(log n)-time transitive closure algorithm. âą An O(log n)-time connected components algorithm. âą An O(log n)-time strongly connected components algorithm. The results provided in this dissertation show the strong computation and communication power of optical interconnection networks
Fault-tolerant sorting networks
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Mathematics, 1994.Includes bibliographical references (p. 148-150).by Yuan Ma.Ph.D
Implicit Decomposition for Write-Efficient Connectivity Algorithms
The future of main memory appears to lie in the direction of new technologies
that provide strong capacity-to-performance ratios, but have write operations
that are much more expensive than reads in terms of latency, bandwidth, and
energy. Motivated by this trend, we propose sequential and parallel algorithms
to solve graph connectivity problems using significantly fewer writes than
conventional algorithms. Our primary algorithmic tool is the construction of an
-sized "implicit decomposition" of a bounded-degree graph on
nodes, which combined with read-only access to enables fast answers to
connectivity and biconnectivity queries on . The construction breaks the
linear-write "barrier", resulting in costs that are asymptotically lower than
conventional algorithms while adding only a modest cost to querying time. For
general non-sparse graphs on edges, we also provide the first writes
and operations parallel algorithms for connectivity and biconnectivity.
These algorithms provide insight into how applications can efficiently process
computations on large graphs in systems with read-write asymmetry
Optimal (Randomized) Parallel Algorithms in the Binary-Forking Model
In this paper we develop optimal algorithms in the binary-forking model for a
variety of fundamental problems, including sorting, semisorting, list ranking,
tree contraction, range minima, and ordered set union, intersection and
difference. In the binary-forking model, tasks can only fork into two child
tasks, but can do so recursively and asynchronously. The tasks share memory,
supporting reads, writes and test-and-sets. Costs are measured in terms of work
(total number of instructions), and span (longest dependence chain).
The binary-forking model is meant to capture both algorithm performance and
algorithm-design considerations on many existing multithreaded languages, which
are also asynchronous and rely on binary forks either explicitly or under the
covers. In contrast to the widely studied PRAM model, it does not assume
arbitrary-way forks nor synchronous operations, both of which are hard to
implement in modern hardware. While optimal PRAM algorithms are known for the
problems studied herein, it turns out that arbitrary-way forking and strict
synchronization are powerful, if unrealistic, capabilities. Natural simulations
of these PRAM algorithms in the binary-forking model (i.e., implementations in
existing parallel languages) incur an overhead in span. This
paper explores techniques for designing optimal algorithms when limited to
binary forking and assuming asynchrony. All algorithms described in this paper
are the first algorithms with optimal work and span in the binary-forking
model. Most of the algorithms are simple. Many are randomized
Communication-Efficient Probabilistic Algorithms: Selection, Sampling, and Checking
Diese Dissertation behandelt drei grundlegende Klassen von Problemen in Big-Data-Systemen, fĂŒr die wir kommunikationseffiziente probabilistische Algorithmen entwickeln. Im ersten Teil betrachten wir verschiedene Selektionsprobleme, im zweiten Teil das Ziehen gewichteter Stichproben (Weighted Sampling) und im dritten Teil die probabilistische KorrektheitsprĂŒfung von Basisoperationen in Big-Data-Frameworks (Checking). Diese Arbeit ist durch einen wachsenden Bedarf an Kommunikationseffizienz motiviert, der daher rĂŒhrt, dass der auf das Netzwerk und seine Nutzung zurĂŒckzufĂŒhrende Anteil sowohl der Anschaffungskosten als auch des Energieverbrauchs von Supercomputern und der Laufzeit verteilter Anwendungen immer weiter wĂ€chst. Ăberraschend wenige kommunikationseffiziente Algorithmen sind fĂŒr grundlegende Big-Data-Probleme bekannt. In dieser Arbeit schlieĂen wir einige dieser LĂŒcken.
ZunĂ€chst betrachten wir verschiedene Selektionsprobleme, beginnend mit der verteilten Version des klassischen Selektionsproblems, d. h. dem Auffinden des Elements von Rang in einer groĂen verteilten Eingabe. Wir zeigen, wie dieses Problem kommunikationseffizient gelöst werden kann, ohne anzunehmen, dass die Elemente der Eingabe zufĂ€llig verteilt seien. Hierzu ersetzen wir die Methode zur Pivotwahl in einem schon lange bekannten Algorithmus und zeigen, dass dies hinreichend ist. AnschlieĂend zeigen wir, dass die Selektion aus lokal sortierten Folgen â multisequence selection â wesentlich schneller lösbar ist, wenn der genaue Rang des Ausgabeelements in einem gewissen Bereich variieren darf. Dies benutzen wir anschlieĂend, um eine verteilte PrioritĂ€tswarteschlange mit Bulk-Operationen zu konstruieren. SpĂ€ter werden wir diese verwenden, um gewichtete Stichproben aus Datenströmen zu ziehen (Reservoir Sampling). SchlieĂlich betrachten wir das Problem, die global hĂ€ufigsten Objekte sowie die, deren zugehörige Werte die gröĂten Summen ergeben, mit einem stichprobenbasierten Ansatz zu identifizieren.
Im Kapitel ĂŒber gewichtete Stichproben werden zunĂ€chst neue Konstruktionsalgorithmen fĂŒr eine klassische Datenstruktur fĂŒr dieses Problem, sogenannte Alias-Tabellen, vorgestellt. Zu Beginn stellen wir den ersten Linearzeit-Konstruktionsalgorithmus fĂŒr diese Datenstruktur vor, der mit konstant viel Zusatzspeicher auskommt. AnschlieĂend parallelisieren wir diesen Algorithmus fĂŒr Shared Memory und erhalten so den ersten parallelen Konstruktionsalgorithmus fĂŒr Aliastabellen. Hiernach zeigen wir, wie das Problem fĂŒr verteilte Systeme mit einem zweistufigen Algorithmus angegangen werden kann. AnschlieĂend stellen wir einen ausgabesensitiven Algorithmus fĂŒr gewichtete Stichproben mit ZurĂŒcklegen vor. Ausgabesensitiv bedeutet, dass die Laufzeit des Algorithmus sich auf die Anzahl der eindeutigen Elemente in der Ausgabe bezieht und nicht auf die GröĂe der Stichprobe. Dieser Algorithmus kann sowohl sequentiell als auch auf Shared-Memory-Maschinen und verteilten Systemen eingesetzt werden und ist der erste derartige Algorithmus in allen drei Kategorien. Wir passen ihn anschlieĂend an das Ziehen gewichteter Stichproben ohne ZurĂŒcklegen an, indem wir ihn mit einem SchĂ€tzer fĂŒr die Anzahl der eindeutigen Elemente in einer Stichprobe mit ZurĂŒcklegen kombinieren. Poisson-Sampling, eine Verallgemeinerung des Bernoulli-Sampling auf gewichtete Elemente, kann auf ganzzahlige Sortierung zurĂŒckgefĂŒhrt werden, und wir zeigen, wie ein bestehender Ansatz parallelisiert werden kann. FĂŒr das Sampling aus Datenströmen passen wir einen sequentiellen Algorithmus an und zeigen, wie er in einem Mini-Batch-Modell unter Verwendung unserer im Selektionskapitel eingefĂŒhrten Bulk-PrioritĂ€tswarteschlange parallelisiert werden kann. Das Kapitel endet mit einer ausfĂŒhrlichen Evaluierung unserer Aliastabellen-Konstruktionsalgorithmen, unseres ausgabesensitiven Algorithmus fĂŒr gewichtete Stichproben mit ZurĂŒcklegen und unseres Algorithmus fĂŒr gewichtetes Reservoir-Sampling.
Um die Korrektheit verteilter Algorithmen probabilistisch zu verifizieren, schlagen wir Checker fĂŒr grundlegende Operationen von Big-Data-Frameworks vor. Wir zeigen, dass die ĂberprĂŒfung zahlreicher Operationen auf zwei âKernâ-Checker reduziert werden kann, nĂ€mlich die PrĂŒfung von Aggregationen und ob eine Folge eine Permutation einer anderen Folge ist. WĂ€hrend mehrere AnsĂ€tze fĂŒr letzteres Problem seit geraumer Zeit bekannt sind und sich auch einfach parallelisieren lassen, ist unser Summenaggregations-Checker eine neuartige Anwendung der gleichen Datenstruktur, die auch zĂ€hlenden Bloom-Filtern und dem Count-Min-Sketch zugrunde liegt. Wir haben beide Checker in Thrill, einem Big-Data-Framework, implementiert. Experimente mit absichtlich herbeigefĂŒhrten Fehlern bestĂ€tigen die von unserer theoretischen Analyse vorhergesagte Erkennungsgenauigkeit. Dies gilt selbst dann, wenn wir hĂ€ufig verwendete schnelle Hash-Funktionen mit in der Theorie suboptimalen Eigenschaften verwenden. Skalierungsexperimente auf einem Supercomputer zeigen, dass unsere Checker nur sehr geringen Laufzeit-Overhead haben, welcher im Bereich von liegt und dabei die Korrektheit des Ergebnisses nahezu garantiert wird
Efficient Algorithms and Data Structures for Massive Data Sets
For many algorithmic problems, traditional algorithms that optimise on the
number of instructions executed prove expensive on I/Os. Novel and very
different design techniques, when applied to these problems, can produce
algorithms that are I/O efficient. This thesis adds to the growing chorus of
such results. The computational models we use are the external memory model and
the W-Stream model.
On the external memory model, we obtain the following results. (1) An I/O
efficient algorithm for computing minimum spanning trees of graphs that
improves on the performance of the best known algorithm. (2) The first external
memory version of soft heap, an approximate meldable priority queue. (3) Hard
heap, the first meldable external memory priority queue that matches the
amortised I/O performance of the known external memory priority queues, while
allowing a meld operation at the same amortised cost. (4) I/O efficient exact,
approximate and randomised algorithms for the minimum cut problem, which has
not been explored before on the external memory model. (5) Some lower and upper
bounds on I/Os for interval graphs.
On the W-Stream model, we obtain the following results. (1) Algorithms for
various tree problems and list ranking that match the performance of the best
known algorithms and are easier to implement than them. (2) Pass efficient
algorithms for sorting, and the maximal independent set problems, that improve
on the best known algorithms. (3) Pass efficient algorithms for the graphs
problems of finding vertex-colouring, approximate single source shortest paths,
maximal matching, and approximate weighted vertex cover. (4) Lower bounds on
passes for list ranking and maximal matching.
We propose two variants of the W-Stream model, and design algorithms for the
maximal independent set, vertex-colouring, and planar graph single source
shortest paths problems on those models.Comment: PhD Thesis (144 pages