466,564 research outputs found
Online Data Structures in External Memory
The original publication is available at www.springerlink.comThe data sets for many of today's computer applications are
too large to t within the computer's internal memory and must instead
be stored on external storage devices such as disks. A major performance
bottleneck can be the input/output communication (or I/O) between
the external and internal memories. In this paper we discuss a variety of
online data structures for external memory, some very old and some very
new, such as hashing (for dictionaries), B-trees (for dictionaries and 1-D
range search), bu er trees (for batched dynamic problems), interval trees
with weight-balanced B-trees (for stabbing queries), priority search trees
(for 3-sided 2-D range search), and R-trees and other spatial structures.
We also discuss several open problems along the way
Low Space External Memory Construction of the Succinct Permuted Longest Common Prefix Array
The longest common prefix (LCP) array is a versatile auxiliary data structure
in indexed string matching. It can be used to speed up searching using the
suffix array (SA) and provides an implicit representation of the topology of an
underlying suffix tree. The LCP array of a string of length can be
represented as an array of length words, or, in the presence of the SA, as
a bit vector of bits plus asymptotically negligible support data
structures. External memory construction algorithms for the LCP array have been
proposed, but those proposed so far have a space requirement of words
(i.e. bits) in external memory. This space requirement is in some
practical cases prohibitively expensive. We present an external memory
algorithm for constructing the bit version of the LCP array which uses
bits of additional space in external memory when given a
(compressed) BWT with alphabet size and a sampled inverse suffix array
at sampling rate . This is often a significant space gain in
practice where is usually much smaller than or even constant. We
also consider the case of computing succinct LCP arrays for circular strings
MKtree: generation and simulations
The problem to represent very complex systems has been studied by
several authors, obtaining solutions based on different data
structures. In this paper, a K dimensional tree
(Multirresolution Kdtree, MKtree) is introduced. The MKtree represents
a hierarchical subdivision of the scene objects that guarantees a
minimum space overlap between node regions. MKtrees are useful for
collision detection and for time-critical rendering in very large
environments requiring external memory storage. Examples in ship
design applications are described.Postprint (published version
What Does Dynamic Optimality Mean in External Memory?
A data structure A is said to be dynamically optimal over a class of data structures ? if A is constant-competitive with every data structure C ? ?. Much of the research on binary search trees in the past forty years has focused on studying dynamic optimality over the class of binary search trees that are modified via rotations (and indeed, the question of whether splay trees are dynamically optimal has gained notoriety as the so-called dynamic-optimality conjecture). Recently, researchers have extended this to consider dynamic optimality over certain classes of external-memory search trees. In particular, Demaine, Iacono, Koumoutsos, and Langerman propose a class of external-memory trees that support a notion of tree rotations, and then give an elegant data structure, called the Belga B-tree, that is within an O(log log N)-factor of being dynamically optimal over this class.
In this paper, we revisit the question of how dynamic optimality should be defined in external memory. A defining characteristic of external-memory data structures is that there is a stark asymmetry between queries and inserts/updates/deletes: by making the former slightly asymptotically slower, one can make the latter significantly asymptotically faster (even allowing for operations with sub-constant amortized I/Os). This asymmetry makes it so that rotation-based search trees are not optimal (or even close to optimal) in insert/update/delete-heavy external-memory workloads. To study dynamic optimality for such workloads, one must consider a different class of data structures.
The natural class of data structures to consider are what we call buffered-propagation trees. Such trees can adapt dynamically to the locality properties of an input sequence in order to optimize the interactions between different inserts/updates/deletes and queries. We also present a new form of beyond-worst-case analysis that allows for us to formally study a continuum between static and dynamic optimality. Finally, we give a novel data structure, called the J?llo Tree, that is statically optimal and that achieves dynamic optimality for a large natural class of inputs defined by our beyond-worst-case analysis
Predicting Memory Demands of BDD Operations using Maximum Graph Cuts (Extended Paper)
The BDD package Adiar manipulates Binary Decision Diagrams (BDDs) in external
memory. This enables handling big BDDs, but the performance suffers when
dealing with moderate-sized BDDs. This is mostly due to initializing expensive
external memory data structures, even if their contents can fit entirely inside
internal memory.
The contents of these auxiliary data structures always correspond to a graph
cut in an input or output BDD. Specifically, these cuts respect the levels of
the BDD. We formalise the shape of these cuts and prove sound upper bounds on
their maximum size for each BDD operation.
We have implemented these upper bounds within Adiar. With these bounds, it
can predict whether a faster internal memory variant of the auxiliary data
structures can be used. In practice, this improves Adiar's running time across
the board. Specifically for the moderate-sized BDDs, this results in an average
reduction of the computation time by 86.1% (median of 89.7%). In some cases,
the difference is even 99.9\%. When checking equivalence of hardware circuits
from the EPFL Benchmark Suite, for one of the instances the time was decreased
by 52 hours.Comment: 25 pages, 11 Figures, 2 Tables. Extended version of paper published
at ATVA 202
Worst-Case Optimal Algorithms for Parallel Query Processing
In this paper, we study the communication complexity for the problem of
computing a conjunctive query on a large database in a parallel setting with
servers. In contrast to previous work, where upper and lower bounds on the
communication were specified for particular structures of data (either data
without skew, or data with specific types of skew), in this work we focus on
worst-case analysis of the communication cost. The goal is to find worst-case
optimal parallel algorithms, similar to the work of [18] for sequential
algorithms.
We first show that for a single round we can obtain an optimal worst-case
algorithm. The optimal load for a conjunctive query when all relations have
size equal to is , where is a new query-related
quantity called the edge quasi-packing number, which is different from both the
edge packing number and edge cover number of the query hypergraph. For multiple
rounds, we present algorithms that are optimal for several classes of queries.
Finally, we show a surprising connection to the external memory model, which
allows us to translate parallel algorithms to external memory algorithms. This
technique allows us to recover (within a polylogarithmic factor) several recent
results on the I/O complexity for computing join queries, and also obtain
optimal algorithms for other classes of queries
I/O efficient bisimulation partitioning on very large directed acyclic graphs
In this paper we introduce the first efficient external-memory algorithm to
compute the bisimilarity equivalence classes of a directed acyclic graph (DAG).
DAGs are commonly used to model data in a wide variety of practical
applications, ranging from XML documents and data provenance models, to web
taxonomies and scientific workflows. In the study of efficient reasoning over
massive graphs, the notion of node bisimilarity plays a central role. For
example, grouping together bisimilar nodes in an XML data set is the first step
in many sophisticated approaches to building indexing data structures for
efficient XPath query evaluation. To date, however, only internal-memory
bisimulation algorithms have been investigated. As the size of real-world DAG
data sets often exceeds available main memory, storage in external memory
becomes necessary. Hence, there is a practical need for an efficient approach
to computing bisimulation in external memory.
Our general algorithm has a worst-case IO-complexity of O(Sort(|N| + |E|)),
where |N| and |E| are the numbers of nodes and edges, resp., in the data graph
and Sort(n) is the number of accesses to external memory needed to sort an
input of size n. We also study specializations of this algorithm to common
variations of bisimulation for tree-structured XML data sets. We empirically
verify efficient performance of the algorithms on graphs and XML documents
having billions of nodes and edges, and find that the algorithms can process
such graphs efficiently even when very limited internal memory is available.
The proposed algorithms are simple enough for practical implementation and use,
and open the door for further study of external-memory bisimulation algorithms.
To this end, the full open-source C++ implementation has been made freely
available
- …