33 research outputs found

    Cache-Oblivious Data Structures and Algorithms for Undirected Breadth-First Search and Shortest Paths

    We present improved cache-oblivious data structures and algorithms for breadth-first search (BFS) on undirected graphs and the single-source shortest path (SSSP) problem on undirected graphs with non-negative edge weights. For the SSSP problem, our result closes the performance gap between the currently best cache-aware algorithm and the cache-oblivious counterpart. Our cache-oblivious SSSP algorithm takes nearly full advantage of block transfers for dense graphs. The algorithm relies on a new data structure, called bucket heap, which is the first cache-oblivious priority queue to efficiently support a weak DECREASEKEY operation. For the BFS problem, we reduce the number of I/Os for sparse graphs by a factor of nearly sqrt(B), where B is the cache-block size, nearly closing the performance gap between the currently best cache-aware and cache-oblivious algorithms.
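The bucket heap itself is beyond a short excerpt, but the role of a weak DECREASEKEY can be illustrated in the RAM model: rather than updating a key in place, re-insert the element with its smaller key and discard stale copies on extraction. A sketch with an ordinary binary heap (not the paper's cache-oblivious structure), applied to Dijkstra-style SSSP:

```python
import heapq

def dijkstra_lazy(adj, source):
    """SSSP with a binary heap and a *weak* decrease-key: instead of
    locating and updating an entry, push a duplicate with the smaller
    distance and skip stale entries when they are popped."""
    dist = {source: 0}
    settled = set()
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in settled:        # stale duplicate left by a re-insert
            continue
        settled.add(u)
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))  # "decrease" by re-insert
    return dist
```

The lazy-deletion trick trades heap size (duplicates accumulate) for avoiding an addressable decrease-key, which is exactly the weaker operation a bucket-heap-style queue needs to support.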

    Priority Queues with Multiple Time Fingers

    A priority queue is presented that supports the operations insert and find-min in worst-case constant time, and delete and delete-min on an element x in worst-case O(lg(min{w_x, q_x} + 2)) time, where w_x (respectively q_x) is the number of elements inserted after x (respectively before x) that are still present at the time of x's deletion. Our priority queue thus has both the working-set and the queueish properties; more strongly, it satisfies these properties in the worst-case sense. We also define a new distribution-sensitive property, the time-finger property, which encapsulates and generalizes both the working-set and queueish properties, and present a priority queue that satisfies it. In addition, we prove that the working-set property is equivalent to the unified bound (the minimum, per operation, among the static-finger, static-optimality, and working-set bounds). This latter result is of independent interest, as it had gone unnoticed since the introduction of these bounds by Sleator and Tarjan [JACM 1985]. Accordingly, our priority queue also satisfies other distribution-sensitive properties such as the static-finger, static-optimality, and unified bounds.
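The quantities in the bound can be made concrete: for each deleted element x, w_x counts the still-present elements inserted after x, and q_x those inserted before it. A small RAM-model helper (hypothetical, purely to illustrate the definitions) computes both from an operation trace:

```python
def time_fingers(ops):
    """For a trace of ('insert', x) / ('delete', x) operations, return
    {x: (w_x, q_x)} for every deleted x, where w_x (resp. q_x) counts
    elements inserted after (resp. before) x and still present at the
    time x is deleted."""
    present = []   # keys currently in the structure, in insertion order
    fingers = {}
    for op, x in ops:
        if op == "insert":
            present.append(x)
        else:  # delete
            i = present.index(x)
            w = len(present) - 1 - i   # inserted after x, still present
            q = i                      # inserted before x, still present
            fingers[x] = (w, q)
            present.pop(i)
    return fingers
```

The working-set property charges O(lg(w_x + 2)), the queueish property O(lg(q_x + 2)); the priority queue above achieves the minimum of the two.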

    Cache-Oblivious Iterated Predecessor Queries via Range Coalescing

    In this paper we develop an optimal cache-oblivious data structure that solves the iterated predecessor problem. Given k static sorted lists L_1, L_2, …, L_k of average length n and a query value q, the iterated predecessor problem is to find the largest element in each list that is less than q. Our solution to this problem, called "range coalescing", requires O(log_{B+1} n + k/B) memory transfers for a query on a cache of block size B, which is information-theoretically optimal. The range-coalescing data structure consumes O(kn) space, and preprocessing requires only O(kn/B) memory transfers with high probability, given a tall cache of size M = Ω(B^2).
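In RAM-model terms, the query is k independent predecessor searches; a naive baseline does one binary search per list, costing O(k log_B n) memory transfers rather than the coalesced O(log_{B+1} n + k/B). A sketch of that baseline (range coalescing improves the transfer count of this query, not its logic):

```python
from bisect import bisect_left

def iterated_predecessor(lists, q):
    """Baseline iterated predecessor query: for each sorted list,
    return its largest element strictly less than q, or None if the
    list has no such element."""
    out = []
    for lst in lists:
        i = bisect_left(lst, q)        # index of first element >= q
        out.append(lst[i - 1] if i > 0 else None)
    return out
```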

    External-Memory Dictionaries with Worst-Case Update Cost

    The Bϵ-tree [Brodal and Fagerberg 2003] is a simple I/O-efficient external-memory-model data structure that supports updates orders of magnitude faster than the B-tree, with query performance comparable to the B-tree: for any positive constant ϵ < 1, insertions and deletions take O((log_B N)/B^{1-ϵ}) time (rather than O(log_B N) time for the classic B-tree), queries take O(log_B N) time, and range queries returning k items take O(log_B N + k/B) time. Although the Bϵ-tree has an optimal update/query tradeoff, the runtimes are amortized. Another structure, the write-optimized skip list, introduced by Bender et al. [PODS 2017], has the same performance as the Bϵ-tree but with runtimes that are randomized rather than amortized. In this paper, we present a variant of the Bϵ-tree with deterministic worst-case running times that are identical to the original's amortized running times.
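The buffering idea behind such write-optimized structures can be sketched with a toy two-level dictionary: updates accumulate in a node buffer and are merged into the sorted level in batches, so the expensive reorganization is amortized over many cheap inserts. An illustrative sketch (not the Bϵ-tree itself, which cascades buffers down a multi-level tree):

```python
from bisect import bisect_left

class BufferedDictionary:
    """Toy write-optimized dictionary: inserts land in a small buffer
    and are merged into the sorted level in batches, amortizing the
    merge cost over buffer_size insertions."""

    def __init__(self, buffer_size=4):
        self.buffer_size = buffer_size
        self.buffer = []   # pending insert messages
        self.leaves = []   # sorted, batch-maintained level

    def insert(self, key):
        self.buffer.append(key)
        if len(self.buffer) >= self.buffer_size:
            # one batched merge pays for buffer_size cheap inserts
            self.leaves = sorted(self.leaves + self.buffer)
            self.buffer = []

    def contains(self, key):
        if key in self.buffer:   # queries must also inspect the buffer
            return True
        i = bisect_left(self.leaves, key)
        return i < len(self.leaves) and self.leaves[i] == key
```

Note the query-side cost of buffering: every lookup must check pending messages as well as the sorted level, which is the tradeoff the ϵ parameter tunes in the real structure.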

    Cache-oblivious index for approximate string matching

    This paper revisits the problem of indexing a text for approximate string matching. Specifically, given a text T of length n and a positive integer k, we want to construct an index of T such that for any input pattern P, we can find all its k-error matches in T efficiently. This problem is well studied in the internal-memory setting. Here, we extend some of these recent results to external-memory solutions, which are also cache-oblivious. Our first index occupies O((n log^k n)/B) disk pages and finds all k-error matches with O((|P| + occ)/B + log^k n · log log_B n) I/Os, where B denotes the number of words in a disk page. To the best of our knowledge, this index is the first external-memory data structure that does not require Ω(|P| + occ + poly(log n)) I/Os. The second index reduces the space to O((n log n)/B) disk pages, and the I/O complexity is O((|P| + occ)/B + log^{k(k+1)} n · log log n). © 2011 Elsevier B.V. All rights reserved.
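What counts as a k-error match is defined by the standard edit-distance dynamic program; the naive O(|T|·|P|) scan below (which the index exists to avoid) reports every end position in T of a substring within edit distance k of P:

```python
def k_error_matches(T, P, k):
    """Return end positions j (1-based) such that some substring of T
    ending at j matches P with at most k edits. Row 0 is all zeros, so
    a match may start anywhere in T (semi-global alignment)."""
    prev = [0] * (len(T) + 1)
    for i in range(1, len(P) + 1):
        curr = [i] + [0] * len(T)
        for j in range(1, len(T) + 1):
            cost = 0 if P[i - 1] == T[j - 1] else 1
            curr[j] = min(prev[j - 1] + cost,   # match / substitution
                          prev[j] + 1,          # deletion from T
                          curr[j - 1] + 1)      # insertion into T
        prev = curr
    return [j for j in range(1, len(T) + 1) if prev[j] <= k]
```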

    Fast Sorting on a Distributed-Memory Architecture

    We consider the often-studied problem of sorting, for a parallel computer. Given an input array distributed evenly over p processors, the task is to compute the sorted output array, also distributed over the p processors. Many existing algorithms take the approach of approximately load-balancing the output, leaving each processor with Θ(n/p) elements. However, in many cases, approximate load-balancing leads to inefficiencies in both the sorting itself and in further uses of the data after sorting. We provide a deterministic parallel sorting algorithm that uses parallel selection to produce any output distribution exactly, particularly one that is perfectly load-balanced. Furthermore, when using a comparison sort, this algorithm is 1-optimal in both computation and communication. We provide an empirical study that illustrates the efficiency of exact data splitting, and shows an improvement over two sample sort algorithms. Singapore-MIT Alliance (SMA)
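Exact splitting means choosing the elements of global rank n/p, 2n/p, … as splitters, so every processor ends with exactly n/p elements. The sequential sketch below computes those splitters by merging the locally sorted arrays; the paper obtains the same splitters with parallel selection, without materializing a full merge:

```python
from heapq import merge

def exact_splitters(local_arrays, p):
    """Given locally sorted arrays, return the p-1 elements of global
    rank n/p, 2n/p, ..., which split the data into p equal parts.
    Sequential illustration only; assumes p divides n evenly."""
    merged = list(merge(*local_arrays))   # global sorted order
    step = len(merged) // p
    return [merged[i * step - 1] for i in range(1, p)]
```

With these splitters, each processor can route its elements to their final owner in one all-to-all exchange, and the output distribution is perfectly balanced rather than Θ(n/p).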

    Traversing large graphs in realistic settings

    The notion of graph traversal is of fundamental importance to solving many computational problems. In many modern applications involving graph traversal, such as those arising in the domain of social networks, Internet-based services, fraud detection in telephone calls, etc., the underlying graph is very large and dynamically evolving. This thesis deals with the design and engineering of I/O-efficient Breadth-First Search (BFS) algorithms for massive sparse undirected graphs. Our pipelined implementations with low constant factors, together with some heuristics preserving the worst-case guarantees, make BFS viable on massive graphs. We perform an extensive set of experiments to study the effect of various graph properties such as diameter, initial disk layouts, tuning parameters, disk parallelism, cache-obliviousness, etc. on the relative performance of these algorithms. We characterize the performance of NAND-flash-based storage devices, including many solid state disks. We show that despite the similarities between flash memory and RAM (fast random reads) and between flash disk and hard disk (both are block-based devices), the algorithms designed in the RAM model or the external-memory model do not realize the full potential of flash memory devices. We also analyze the effect of misalignments, aging, past I/O patterns, etc. on the performance obtained on these devices. We also consider I/O-efficient BFS algorithms for the case when a hard disk and a solid state disk are used together. We present a simple algorithm which maintains the topological order of a directed acyclic graph with n nodes under an online edge insertion sequence in O(n^{2.75}) time, independent of the number m of edges inserted. For dense DAGs, this is an improvement over the previous best result of O(min{m^{3/2} log n, m^{3/2} + n^2 log n}). While our analysis holds only for the incremental setting, our algorithm itself is fully dynamic.
    We also present the first average-case analysis of online topological ordering algorithms. We prove an expected runtime of O(n^2 polylog(n)) under insertion of the edges of a complete DAG in a random order for various incremental topological ordering algorithms.
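A minimal form of online topological-order maintenance can be sketched in the style of Pearce and Kelly: when an inserted edge (u, v) violates the current order, gather the nodes reachable from v inside the affected window and move them past u. This illustrates the problem setting only, not the thesis's O(n^{2.75}) algorithm:

```python
def add_edge(adj, order, pos, u, v):
    """Insert edge u -> v into DAG `adj` while keeping `order` a valid
    topological order. `pos[x]` is x's index in `order`. Raises
    ValueError if the edge would create a cycle. Pearce-Kelly-style
    sketch: edges always point forward in `order`, so a DFS from v
    restricted to positions <= pos[u] finds every node that must move."""
    adj.setdefault(u, []).append(v)
    lb, ub = pos[v], pos[u]
    if lb >= ub:
        return  # current order already consistent with the new edge
    affected, stack, seen = [], [v], {v}
    while stack:
        x = stack.pop()
        affected.append(x)
        for y in adj.get(x, []):
            if y not in seen and pos[y] <= ub:
                seen.add(y)
                stack.append(y)
    if u in seen:
        raise ValueError("edge would create a cycle")
    # move affected nodes behind u, preserving their relative order
    affected.sort(key=pos.get)
    window = [x for x in order[lb:ub + 1] if x not in seen] + affected
    order[lb:ub + 1] = window
    for i in range(lb, ub + 1):
        pos[order[i]] = i
```

Restricting the search to positions at most pos[u] is sound because a valid topological order only has forward edges: a path leaving the window can never re-enter it.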

    Algorithm Engineering for fundamental Sorting and Graph Problems

    Fundamental algorithms form the basic knowledge expected of every computer science undergraduate or professional programmer. They are a set of core techniques found in any (good) textbook on algorithms and data structures. In this thesis we try to close the gap between theoretically worst-case-optimal classical algorithms and the real-world circumstances one faces under the constraints imposed by data size, limited main memory, and available parallelism.