
    A Bulk-Parallel Priority Queue in External Memory with STXXL

    We propose the design and an implementation of a bulk-parallel external-memory priority queue that takes advantage of both shared-memory parallelism and high external-memory transfer speeds to parallel disks. To achieve higher performance by decoupling item insertions and extractions, we offer two parallelization interfaces: one using "bulk" sequences, the other defining "limit" items. In the design, we discuss how to parallelize insertions using multiple heaps, and how to calculate a dynamic prediction sequence to prefetch blocks and apply parallel multiway merge for extraction. Our experimental results show that in the selected benchmarks the priority queue reaches 75% of the full parallel I/O bandwidth of rotational disks and 65% of SSDs, or the speed of sorting in external memory when bounded by computation. (Comment: extended version of the SEA'15 conference paper.)
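The decoupled insert/extract design described above can be caricatured in a few lines. The following is a minimal, purely illustrative Python sketch (the class and method names are our own, not the STXXL interface): bulk insertions are spread over several insertion heaps, standing in for per-thread heaps, and extraction performs a multiway merge over them.

```python
import heapq

class BulkPriorityQueue:
    """Toy sketch of a bulk-parallel priority queue: insertions are
    distributed over several insertion heaps; extraction merges the
    heaps with a multiway merge, mirroring the decoupled design."""

    def __init__(self, num_heaps=4):
        self.heaps = [[] for _ in range(num_heaps)]
        self.next_heap = 0

    def bulk_push(self, items):
        # Round-robin the bulk over the insertion heaps; in the real
        # design each heap would be filled by its own thread.
        for x in items:
            heapq.heappush(self.heaps[self.next_heap], x)
            self.next_heap = (self.next_heap + 1) % len(self.heaps)

    def extract_all_sorted(self):
        # Multiway merge of the sorted runs, as in the extraction phase.
        runs = [sorted(h) for h in self.heaps]
        return list(heapq.merge(*runs))
```

In the actual external-memory setting the runs would live in disk blocks fetched according to the prediction sequence; here everything stays in RAM for brevity.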

    Bulk-Parallel Priority Queue in External Memory

    We present a priority queue implementation with support for external memory. The focus of our work has been to derive a benefit from parallel shared-memory machines; it is the first parallel optimization of an external-memory priority queue. An additional bulk insertion interface accelerates longer sequences of homogeneous operations, as they are more likely to occur in applications that process large amounts of data. The algorithm will be available as an extension to the STXXL, a popular C++ template library for extra large data sets. Experiments have shown great improvements over the current external-memory priority queue of the STXXL for homogeneous bulk operations. However, the high overhead for spawning threads, as well as the need for cache synchronization in the global ExtractMin operation, reveals the inherent limits of the parallelizability of priority queues.

    Algorithm Libraries for Multi-Core Processors

    By providing parallelized versions of established algorithm libraries, we make it easier for programmers to exploit the multiple cores of modern processors. The Multi-Core STL provides basic algorithms for internal memory, while the parallelized STXXL enables multi-core acceleration for algorithms on large data sets stored on disk. Some parallelized geometric algorithms are introduced into CGAL. Further, we design and implement sorting algorithms for huge data in distributed external memory.

    An Experimental Study of External Memory Algorithms for Connected Components

    We empirically investigate algorithms for solving Connected Components in the external memory model. In particular, we study whether the randomized O(Sort(E)) algorithm by Karger, Klein, and Tarjan can be implemented to compete with practically promising and simpler algorithms having only slightly worse theoretical cost, namely Borůvka’s algorithm and the algorithm by Sibeyn and collaborators. For all algorithms, we develop and test a number of tuning options. Our experiments are executed on a large set of different graph classes including random graphs, grids, geometric graphs, and hyperbolic graphs. Among our findings are: The Sibeyn algorithm is a very strong contender due to its simplicity and due to an added degree of freedom in its internal workings when used in the Connected Components setting. With the right tunings, the Karger-Klein-Tarjan algorithm can be implemented to be competitive in many cases. Higher graph density seems to benefit Karger-Klein-Tarjan relative to Sibeyn. Borůvka’s algorithm is not competitive with the two others.
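Borůvka’s algorithm, the simplest of the three contenders, repeatedly lets every component pick one incident outgoing edge and contracts along it, roughly halving the number of components per round. A minimal in-memory Python sketch (ignoring all I/O-efficiency concerns that the study is actually about) might look like this:

```python
def boruvka_components(n, edges):
    """Borůvka-style contraction for connected components on n nodes:
    each round, every component selects one incident edge leaving it
    and the endpoints are merged, until no cross-component edge remains."""
    comp = list(range(n))  # union-find parent array

    def find(x):
        while comp[x] != x:
            comp[x] = comp[comp[x]]  # path halving
            x = comp[x]
        return x

    changed = True
    while changed:
        changed = False
        chosen = {}
        # Each component picks one edge that leaves it.
        for u, v in edges:
            ru, rv = find(u), find(v)
            if ru != rv:
                chosen.setdefault(ru, (u, v))
                chosen.setdefault(rv, (u, v))
        # Contract along the chosen edges.
        for u, v in chosen.values():
            ru, rv = find(u), find(v)
            if ru != rv:
                comp[ru] = rv
                changed = True
    return [find(x) for x in range(n)]
```

The external-memory variants studied in the paper replace the random-access union-find with sorting and scanning passes over the edge list; this sketch only conveys the contraction structure.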

    Algorithm Engineering for fundamental Sorting and Graph Problems

    Fundamental algorithms form the foundational knowledge of every computer science undergraduate and professional programmer. They are a set of basic techniques one can find in any (good) coursebook on algorithms and data structures. In this thesis we try to close the gap between theoretically worst-case-optimal classical algorithms and the real-world circumstances one faces under the constraints imposed by data size, limited main memory, or available parallelism.

    Traversing large graphs in realistic settings

    The notion of graph traversal is of fundamental importance to solving many computational problems. In many modern applications involving graph traversal, such as those arising in the domain of social networks, Internet-based services, fraud detection in telephone calls, etc., the underlying graph is very large and dynamically evolving. This thesis deals with the design and engineering of traversal algorithms for such graphs. We engineer various I/O-efficient Breadth First Search (BFS) algorithms for massive sparse undirected graphs. Our pipelined implementations with low constant factors, together with some heuristics preserving the worst-case guarantees, make BFS viable on massive graphs. We perform an extensive set of experiments to study the effect of various graph properties such as diameter, initial disk layouts, tuning parameters, disk parallelism, cache-obliviousness, etc. on the relative performance of these algorithms. We characterize the performance of NAND-flash-based storage devices, including many solid state disks. We show that despite the similarities between flash memory and RAM (fast random reads) and between flash disk and hard disk (both are block-based devices), the algorithms designed in the RAM model or the external memory model do not realize the full potential of flash memory devices. We also analyze the effect of misalignments, aging, past I/O patterns, etc. on the performance obtained on these devices. We also consider I/O-efficient BFS algorithms for the case when a hard disk and a solid state disk are used together. We present a simple algorithm which maintains the topological order of a directed acyclic graph with n nodes under an online edge insertion sequence in O(n^2.75) time, independent of the number m of edges inserted. For dense DAGs, this is an improvement over the previous best result of O(min{m^(3/2) log n, m^(3/2) + n^2 log n}). While our analysis holds only for the incremental setting, our algorithm itself is fully dynamic.
We also present the first average-case analysis of online topological ordering algorithms. We prove an expected runtime of O(n^2 polylog(n)) under insertion of the edges of a complete DAG in a random order for various incremental topological ordering algorithms.
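The flavor of online topological-order maintenance can be shown with a much simpler scheme than the O(n^2.75) algorithm above. The following Python sketch is in the spirit of the classic Pearce-Kelly approach, not the thesis's algorithm, and assumes insertions never create a cycle: when a new edge (u, v) violates the current order, only the nodes positioned between v and u are reordered.

```python
def make_inserter(n):
    """Maintain a topological order of n nodes under online edge
    insertions (Pearce-Kelly-style sketch; assumes the graph stays
    acyclic). Returns the insert function and the live order list."""
    order = list(range(n))            # order[i] = node at position i
    pos = list(range(n))              # pos[x]   = position of node x
    succ = [set() for _ in range(n)]  # adjacency: successors of each node

    def insert(u, v):
        succ[u].add(v)
        lo, hi = pos[v], pos[u]
        if lo >= hi:
            return  # order already consistent with the new edge
        # Nodes in positions lo..hi reachable from v must move after u.
        affected = {v}
        stack = [v]
        while stack:
            x = stack.pop()
            for y in succ[x]:
                if lo <= pos[y] <= hi and y not in affected:
                    affected.add(y)
                    stack.append(y)
        # Keep unaffected nodes (including u) first, then the affected ones,
        # preserving relative order within each group.
        window = order[lo:hi + 1]
        reordered = ([x for x in window if x not in affected]
                     + [x for x in window if x in affected])
        order[lo:hi + 1] = reordered
        for i in range(lo, hi + 1):
            pos[order[i]] = i

    return insert, order
```

Each insertion touches only the "affected region" between the endpoints, which is exactly the locality that the more refined algorithms exploit with better amortized bounds.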

    Large-Scale Data Management and Analysis (LSDMA) - Big Data in Science


    Designing algorithms for big graph datasets : a study of computing bisimulation and joins
