182 research outputs found
Optimal Hierarchical Layouts for Cache-Oblivious Search Trees
This paper proposes a general framework for generating cache-oblivious
layouts for binary search trees. A cache-oblivious layout attempts to minimize
cache misses on any hierarchical memory, independent of the number of memory
levels and attributes at each level such as cache size, line size, and
replacement policy. Recursively partitioning a tree into contiguous subtrees
and prescribing an ordering amongst the subtrees, Hierarchical Layouts
generalize many commonly used layouts for trees such as in-order, pre-order and
breadth-first. They also generalize the various flavors of the van Emde Boas
layout, which have previously been used as cache-oblivious layouts.
Hierarchical Layouts thus unify all previous attempts at deriving layouts for
search trees.
The paper then derives a new locality measure (the Weighted Edge Product)
that mimics the probability of cache misses at multiple levels, and shows that
layouts that reduce this measure perform better. We analyze the various degrees
of freedom in the construction of Hierarchical Layouts, and investigate the
relative effect of each of these decisions in the construction of
cache-oblivious layouts. Optimizing the Weighted Edge Product for complete
binary search trees, we introduce the MinWEP layout, and show that it
outperforms previously used cache-oblivious layouts by almost 20%.
Comment: Extended version with proofs added to the appendix
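As a point of reference for the layouts named in this abstract, the following is a minimal sketch of one common flavor of the van Emde Boas layout for a complete binary tree. It is not the paper's MinWEP layout or its Hierarchical Layout framework. The function name `veb_order`, the use of 1-based breadth-first (heap) indices to identify nodes, and the choice to give the top subtree the smaller half of the levels are all assumptions made for illustration.

```python
def veb_order(height, root=1):
    """List the nodes of a complete binary tree in van Emde Boas order.

    Nodes are named by their 1-based breadth-first (heap) index, so the
    children of node i are 2*i and 2*i + 1. `height` counts levels.
    Assumption: the top subtree gets floor(height / 2) levels; other
    flavors of the layout split the levels differently.
    """
    if height == 1:
        return [root]

    top = height // 2        # levels in the top subtree
    bottom = height - top    # levels in each bottom subtree

    # Lay out the top subtree contiguously first.
    order = veb_order(top, root)

    # The bottom subtrees are rooted `top` levels below `root`;
    # lay each one out contiguously, left to right.
    first = root << top
    for child in range(first, first + (1 << top)):
        order.extend(veb_order(bottom, child))
    return order


if __name__ == "__main__":
    # Position in the printed list = position in memory.
    print(veb_order(4))
```

For comparison, the breadth-first layout of the same tree is simply `list(range(1, 2**height))`; the recursive version keeps each small subtree contiguous in memory, which is the property that cache-oblivious layouts rely on.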
Cache-oblivious algorithms
Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1999. Includes bibliographical references (p. 67-70). By Harald Prokop. S.M.
Cache-Oblivious Representation of B-Tree Structures
We present a data structure, CORoBTS, for storing a search tree in a
cache-oblivious way, with all leaves at the same depth and vertices of arity
between two chosen constants. It provides operations for inserting a subtree
and removing a subtree; the amortized I/O complexity and amortized time
complexity of both are bounded in terms of the size of the subtree and the
size of the whole stored tree. The tree allows searching with optimal I/O
complexity and is stored in linear space.
We use the data structure as a top space-time tree in the cache-oblivious
partially persistent array proposed by Davoodi et al. [DFIÖ14]. This improves
the space complexity of the persistent array, expressed in terms of the
maximal size of the array and the number of versions. The data locality and
I/O complexity of both present and persistent reads are kept unchanged; the
I/O complexity of writes is worsened by a polylogarithmic factor.
Comment: 26 pages + 7 pages of algorithms, 7 figures
Practical Data Structures (Praktické datové struktury)
In this thesis, we implement several data structures for ordered and unordered dictionaries and we benchmark their performance in main memory on synthetic and practical workloads. Our survey includes both well-known data structures (B-trees, red-black trees, splay trees and hashing) and more exotic approaches (k-splay trees and k-forests).
Department of Applied Mathematics, Faculty of Mathematics and Physics
Prospects and limitations of full-text index structures in genome analysis
The combination of incessant advances in sequencing technology producing large amounts of data and innovative bioinformatics approaches, designed to cope with this data flood, has led to new interesting results in the life sciences. Given the magnitude of sequence data to be processed, many bioinformatics tools rely on efficient solutions to a variety of complex string problems. These solutions include fast heuristic algorithms and advanced data structures, generally referred to as index structures. Although the importance of index structures is generally known to the bioinformatics community, the design and potency of these data structures, as well as their properties and limitations, are less understood. Moreover, the last decade has seen a boom in the number of variant index structures featuring complex and diverse memory-time trade-offs. This article provides a comprehensive state-of-the-art overview of the most popular index structures and their recently developed variants. Their features, interrelationships, the trade-offs they impose, and their practical limitations are explained and compared.