182 research outputs found

    Optimal Hierarchical Layouts for Cache-Oblivious Search Trees

    This paper proposes a general framework for generating cache-oblivious layouts for binary search trees. A cache-oblivious layout attempts to minimize cache misses on any hierarchical memory, independent of the number of memory levels and of attributes at each level such as cache size, line size, and replacement policy. By recursively partitioning a tree into contiguous subtrees and prescribing an ordering among the subtrees, Hierarchical Layouts generalize many commonly used layouts for trees, such as in-order, pre-order and breadth-first. They also generalize the various flavors of the van Emde Boas layout, which have previously been used as cache-oblivious layouts. Hierarchical Layouts thus unify all previous attempts at deriving layouts for search trees. The paper then derives a new locality measure (the Weighted Edge Product) that mimics the probability of cache misses at multiple levels, and shows that layouts that reduce this measure perform better. We analyze the various degrees of freedom in the construction of Hierarchical Layouts, and investigate the relative effect of each of these decisions in the construction of cache-oblivious layouts. Optimizing the Weighted Edge Product for complete binary search trees, we introduce the MinWEP layout, and show that it outperforms previously used cache-oblivious layouts by almost 20%. Comment: Extended version with proofs added to the appendix.
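
The abstract above mentions the van Emde Boas layout as one instance of a hierarchical layout. As a rough illustration only (not the paper's MinWEP construction, and not taken from the paper), the sketch below computes one common flavor of that layout for a complete binary tree. It assumes nodes are named by breadth-first indices (root 1, children 2i and 2i+1) and that the top subtree gets the floor of half the levels; the function name `veb_order` and both conventions are assumptions made for this example, and other flavors split the height differently.

```python
def veb_order(height, root=1):
    """Return the breadth-first indices of a complete binary tree with
    `height` levels, listed in one van Emde Boas storage order.

    Assumption: the top subtree takes height // 2 levels; other
    variants of the layout round the split the other way.
    """
    if height == 1:
        return [root]
    top_h = height // 2          # levels in the top subtree
    bottom_h = height - top_h    # levels in each bottom subtree
    # Lay out the top subtree first, contiguously.
    order = veb_order(top_h, root)
    # Roots of the bottom subtrees sit top_h levels below `root`;
    # in breadth-first numbering they are (root << top_h) + k.
    first_bottom_root = root << top_h
    for k in range(1 << top_h):
        order.extend(veb_order(bottom_h, first_bottom_root + k))
    return order


if __name__ == "__main__":
    # Storage order for a 15-node tree (4 levels).
    print(veb_order(4))
```

Each recursive call emits one subtree as a contiguous run, which is the "contiguous subtrees plus an ordering among them" idea that the abstract attributes to Hierarchical Layouts in general.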

    Cache-oblivious algorithms

    Thesis (S.M.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1999. Includes bibliographical references (p. 67-70). By Harald Prokop.

    Cache-Oblivious Representation of B-Tree Structures

    We present a data structure CORoBTS for storing a search tree with all leaves at the same depth and vertices of arity between chosen constants $a$ and $b$ in a cache-oblivious way. It provides operations for inserting an $a$-ary subtree and removing a subtree; both have an amortized I/O complexity $\mathcal{O}(S\cdot(\log^2 N)/B + \log_B N \cdot \log\log S + 1)$ and amortized time complexity $\mathcal{O}(S\cdot\log^2 N)$, where $S$ is the size of the subtree and $N$ is the size of the whole stored tree. The tree allows searching with an optimal I/O complexity $\mathcal{O}(\log_B N)$ and is stored in linear space. We use the data structure as a top space-time tree in the cache-oblivious partially persistent array proposed by Davoodi et al. [DFIÖ14]. The space complexity of the persistent array is then improved from $\mathcal{O}(U^{\log_2 3} + V \log U)$ to $\mathcal{O}(U + V \log U)$, where $U$ is the maximal size of the array and $V$ is the number of versions. The data locality and I/O complexity of both present and persistent reads are kept unchanged; the I/O complexity of writes is worsened by a polylogarithmic factor. Comment: 26 pages + 7 pages of algorithms, 7 figures.

    Praktické datové struktury (Practical Data Structures)

    In this thesis, we implement several data structures for ordered and unordered dictionaries and benchmark their performance in main memory on synthetic and practical workloads. Our survey includes both well-known data structures (B-trees, red-black trees, splay trees and hashing) and more exotic approaches (k-splay trees and k-forests). Department of Applied Mathematics, Faculty of Mathematics and Physics.

    Prospects and limitations of full-text index structures in genome analysis

    The combination of incessant advances in sequencing technology producing large amounts of data and innovative bioinformatics approaches, designed to cope with this data flood, has led to new interesting results in the life sciences. Given the magnitude of sequence data to be processed, many bioinformatics tools rely on efficient solutions to a variety of complex string problems. These solutions include fast heuristic algorithms and advanced data structures, generally referred to as index structures. Although the importance of index structures is generally known to the bioinformatics community, the design and potency of these data structures, as well as their properties and limitations, are less understood. Moreover, the last decade has seen a boom in the number of variant index structures featuring complex and diverse memory-time trade-offs. This article presents a comprehensive state-of-the-art overview of the most popular index structures and their recently developed variants. Their features, interrelationships, the trade-offs they impose, and their practical limitations are explained and compared.