New Algorithms and Lower Bounds for Sequential-Access Data Compression
This thesis concerns sequential-access data compression, i.e., by algorithms
that read the input one or more times from beginning to end. In one chapter we
consider adaptive prefix coding, for which we must read the input character by
character, outputting each character's self-delimiting codeword before reading
the next one. We show how to encode and decode each character in constant
worst-case time while producing an encoding whose length is worst-case optimal.
In another chapter we consider one-pass compression with memory bounded in
terms of the alphabet size and context length, and prove a nearly tight
tradeoff between the amount of memory we can use and the quality of the
compression we can achieve. In a third chapter we consider compression in the
read/write streams model, which allows us passes and memory both
polylogarithmic in the size of the input. We first show how to achieve
universal compression using only one pass over one stream. We then show that
one stream is not sufficient for achieving good grammar-based compression.
Finally, we show that two streams are necessary and sufficient for achieving
entropy-only bounds. Comment: draft of PhD thesis
Robust and Adaptive Search
Binary search finds a given element in a sorted array with an optimal number of log n queries. However, binary search fails even when the array is only slightly disordered or access to its elements is subject to errors. We study the worst-case query complexity of search algorithms that are robust to imprecise queries and that adapt to perturbations of the order of the elements. We give (almost) tight results for various parameters that quantify query errors and that measure array disorder. In particular, we exhibit settings where query complexities of log n + ck, (1+epsilon) log n + ck, and sqrt(cnk)+o(nk) are best-possible for parameter value k, any epsilon > 0, and constant c.
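To make the failure mode in the abstract above concrete, here is a minimal sketch in Python. It shows classic binary search, not the paper's robust algorithms. The 16-element array and the single adjacent swap are illustrative assumptions that do not come from the paper.

```python
def binary_search(arr, target):
    """Classic binary search; returns an index of target, or -1 if not found."""
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if arr[mid] == target:
            return mid
        if arr[mid] < target:
            lo = mid + 1   # target can only be to the right (if arr is sorted)
        else:
            hi = mid - 1   # target can only be to the left (if arr is sorted)
    return -1


# Hypothetical example data: a sorted array and a copy with one adjacent swap.
sorted_arr = list(range(16))
perturbed = sorted_arr.copy()
perturbed[7], perturbed[8] = perturbed[8], perturbed[7]

found_sorted = binary_search(sorted_arr, 7)    # succeeds on the sorted array
found_perturbed = binary_search(perturbed, 7)  # misses 7 although it is present
```

In the perturbed copy, the first probe at index 7 sees the value 8 and discards the right half, which is where 7 now sits. A single swap is therefore enough to make the search report a present element as missing.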
Optimal parallel string algorithms: sorting, merging and computing the minimum
We study fundamental comparison problems on strings of characters, equipped with the usual lexicographical ordering. For each problem studied, we give a parallel algorithm that is optimal with respect to at least one criterion for which no optimal algorithm was previously known. Specifically, our main results are: Two sorted sequences of strings, containing altogether […] characters, can be merged in […] time using […] operations on an EREW PRAM. This is optimal as regards both the running time and the number of operations. A sequence of strings, containing altogether […] characters represented by integers of size polynomial in […], can be sorted in […] time using […] operations on a CRCW PRAM. The running time is optimal for any polynomial number of processors. The minimum string in a sequence of strings containing altogether […] characters can be found using (expected) […] operations in constant expected time on a randomized CRCW PRAM, in […] time on a deterministic CRCW PRAM with a program depending on […], in […] time on a deterministic CRCW PRAM with a program not depending on […], in […] expected time on a randomized EREW PRAM, and in […] time on a deterministic EREW PRAM. The number of operations is optimal, and the running time is optimal for the randomized algorithms and, if the number of processors is limited to […], for the nonuniform deterministic CRCW PRAM algorithm as well.
Complexity of union-split-find problems
Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Includes bibliographical references (p. 45-46). In this thesis, we investigate various interpretations of the Union-Split-Find problem, an extension of the classic Union-Find problem. In the Union-Split-Find problem, we maintain disjoint sets of ordered elements subject to the operations of constructing singleton sets, merging two sets together, splitting a set by partitioning it around a specified value, and finding the set that contains a given element. The different interpretations of this problem arise from the different assumptions made regarding when sets can be merged and any special properties the sets may have. We define and analyze the Interval, Cyclic, Ordered, and General Union-Split-Find problems. Previous work implies optimal solutions to the Interval and Ordered Union-Split-Find problems and an Omega(log n / log log n) lower bound for the Cyclic Union-Split-Find problem in the cell-probe model. We present a new data structure that achieves a matching upper bound of O(log n / log log n) for Cyclic Union-Split-Find in the word RAM model. For General Union-Split-Find, no o(n) bound is known. We present a data structure which has an Omega(log^2 n) amortized lower bound in the worst case that we conjecture has polylogarithmic amortized performance. This thesis is the product of joint work with Erik Demaine. By Katherine Jane Lai. M.Eng.
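The four operations named in the abstract above can be sketched as a naive baseline. This is not the thesis's data structure. It stores each set as a sorted Python list, so union and split take linear time. The class name `NaiveUnionSplitFind` and its method names are hypothetical, chosen only for illustration.

```python
import bisect


class NaiveUnionSplitFind:
    """Linear-time baseline for the General Union-Split-Find interface.

    Assumption: elements are hashable and mutually comparable.
    """

    def __init__(self):
        self._sets = {}    # set id -> sorted list of elements
        self._owner = {}   # element -> id of the set containing it
        self._next_id = 0

    def _new_set(self, elements):
        sid = self._next_id
        self._next_id += 1
        self._sets[sid] = elements
        for x in elements:
            self._owner[x] = sid
        return sid

    def make_set(self, x):
        """Construct a singleton set {x} and return its id."""
        if x in self._owner:
            raise ValueError(f"{x!r} already belongs to a set")
        return self._new_set([x])

    def find(self, x):
        """Return the id of the set that contains x."""
        return self._owner[x]

    def union(self, a, b):
        """Merge sets a and b; returns the id of the merged set."""
        if a == b:
            return a
        merged = sorted(self._sets.pop(a) + self._sets.pop(b))
        return self._new_set(merged)

    def split(self, sid, pivot):
        """Partition set sid around pivot into (< pivot, >= pivot).

        Either side may be empty; both ids are returned regardless.
        """
        elements = self._sets.pop(sid)
        cut = bisect.bisect_left(elements, pivot)
        return self._new_set(elements[:cut]), self._new_set(elements[cut:])
```

A short usage example: after `make_set` on 1, 5, 3 and 9 and three `union` calls, all four elements share one set id. Calling `split` on that set with pivot 4 yields one set containing 1 and 3 and another containing 5 and 9.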
Benchmarking Learned Indexes
Recent advancements in learned index structures propose replacing existing
index structures, like B-Trees, with approximate learned models. In this work,
we present a unified benchmark that compares well-tuned implementations of
three learned index structures against several state-of-the-art "traditional"
baselines. Using four real-world datasets, we demonstrate that learned index
structures can indeed outperform non-learned indexes in read-only in-memory
workloads over a dense array. We also investigate the impact of caching,
pipelining, dataset size, and key size. We study the performance profile of
learned index structures, and build an explanation for why learned models
achieve such good performance. Finally, we investigate other important
properties of learned index structures, such as their performance in
multi-threaded systems and their build times.
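As a rough illustration of what replacing an index with an approximate learned model can mean for a read-only, in-memory sorted array, the sketch below fits one linear model from key to position, then corrects the prediction with a bounded binary search. It is a toy under stated assumptions, not one of the benchmarked structures. The class name `LinearLearnedIndex` and the single least-squares model are illustrative choices.

```python
import bisect


class LinearLearnedIndex:
    """Toy learned index: one linear model plus a bounded local search.

    Assumption: keys are numeric and the key set is non-empty and read-only.
    """

    def __init__(self, keys):
        if not keys:
            raise ValueError("keys must be non-empty")
        self.keys = sorted(keys)
        n = len(self.keys)
        # Least-squares fit: position ~ slope * key + intercept.
        mean_k = sum(self.keys) / n
        mean_p = (n - 1) / 2
        var = sum((k - mean_k) ** 2 for k in self.keys)
        cov = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(self.keys))
        self.slope = cov / var if var else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # Largest prediction error over the stored keys bounds the search window.
        self.max_err = max(
            abs(i - self._predict(k)) for i, k in enumerate(self.keys)
        )

    def _predict(self, key):
        """Predicted position for key, clamped to a valid index."""
        pos = int(round(self.slope * key + self.intercept))
        return min(max(pos, 0), len(self.keys) - 1)

    def lookup(self, key):
        """Return an index of key in the sorted array, or -1 if absent."""
        pos = self._predict(key)
        lo = max(pos - self.max_err, 0)
        hi = min(pos + self.max_err + 1, len(self.keys))
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < len(self.keys) and self.keys[i] == key:
            return i
        return -1
```

The design point is that the model only has to be approximately right. The recorded maximum error turns a full binary search into a search over a small window around the predicted position.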