
    The Log-Interleave Bound: Towards the Unification of Sorting and the BST Model

    We study the connections between sorting and the binary search tree (BST) model, with an aim towards showing that the fields are connected more deeply than is currently known. The main vehicle of our study is the log-interleave bound, a measure of the information-theoretic complexity of a permutation $\pi$. When viewed through the lens of adaptive sorting -- the study of lists which are nearly sorted according to some measure of disorder -- the log-interleave bound is comparable to the most powerful known measure of disorder. Many of these measures of disorder are themselves virtually identical to well-known upper bounds in the BST model, such as the working set bound or the dynamic finger bound, suggesting a connection between BSTs and sorting. We present three results about the log-interleave bound which solidify the aforementioned connections. The first is a proof that the log-interleave bound is always within a $\lg \lg n$ multiplicative factor of a known lower bound in the BST model, meaning that an online BST algorithm matching the log-interleave bound would perform within the same bounds as the state-of-the-art $\lg \lg n$-competitive BST. The second result is an offline algorithm in the BST model which uses $O(\mathrm{LIB}(\pi))$ accesses to search for any permutation $\pi$. The technique used to design this algorithm also serves as a general way to show whether a sorting algorithm can be transformed into an offline BST algorithm. The final result is a mergesort algorithm which performs work within the log-interleave bound of a permutation $\pi$. This mergesort also happens to be highly parallel, adding to a line of work in parallel BST operations.
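
    For readers unfamiliar with adaptive sorting, the sketch below shows a natural (run-detecting) mergesort in Python. It is only a minimal illustration of how a mergesort's work shrinks on nearly sorted input; it is not the LIB-matching or parallel mergesort from the paper, and all function names are ours.

```python
# Minimal sketch of adaptive (natural) mergesort: split the input into maximal
# non-decreasing runs, then merge the runs pairwise. Nearly sorted inputs have
# few runs, so far less merging work is needed than with a fixed split.
# Illustration only -- NOT the LIB-matching mergesort from the paper.

def split_into_runs(a):
    """Split a into maximal non-decreasing runs."""
    runs, start = [], 0
    for i in range(1, len(a)):
        if a[i] < a[i - 1]:
            runs.append(a[start:i])
            start = i
    runs.append(a[start:])
    return runs

def merge(x, y):
    """Standard two-way merge of two sorted lists."""
    out, i, j = [], 0, 0
    while i < len(x) and j < len(y):
        if x[i] <= y[j]:
            out.append(x[i]); i += 1
        else:
            out.append(y[j]); j += 1
    out.extend(x[i:]); out.extend(y[j:])
    return out

def natural_mergesort(a):
    if not a:
        return []
    runs = split_into_runs(a)
    while len(runs) > 1:
        # Merge runs pairwise; nearly sorted inputs need few rounds.
        runs = [merge(runs[k], runs[k + 1]) if k + 1 < len(runs) else runs[k]
                for k in range(0, len(runs), 2)]
    return runs[0]

print(natural_mergesort([3, 4, 5, 1, 2, 9, 8]))  # [1, 2, 3, 4, 5, 8, 9]
```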

    New Algorithms and Lower Bounds for Sequential-Access Data Compression

    This thesis concerns sequential-access data compression, i.e., compression by algorithms that read the input one or more times from beginning to end. In one chapter we consider adaptive prefix coding, for which we must read the input character by character, outputting each character's self-delimiting codeword before reading the next one. We show how to encode and decode each character in constant worst-case time while producing an encoding whose length is worst-case optimal. In another chapter we consider one-pass compression with memory bounded in terms of the alphabet size and context length, and prove a nearly tight tradeoff between the amount of memory we can use and the quality of the compression we can achieve. In a third chapter we consider compression in the read/write streams model, which allows us a number of passes and an amount of memory both polylogarithmic in the size of the input. We first show how to achieve universal compression using only one pass over one stream. We then show that one stream is not sufficient for achieving good grammar-based compression. Finally, we show that two streams are necessary and sufficient for achieving entropy-only bounds. Comment: draft of PhD thesis
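
    As a concrete illustration of one-pass, character-by-character coding with self-delimiting codewords, the toy sketch below combines a move-to-front table with Elias gamma codes. This is an assumed example of ours for flavor, not the constant-worst-case-time optimal coder described in the thesis.

```python
# Toy one-pass coder: each symbol is replaced by its move-to-front rank, and
# the rank is emitted as a self-delimiting Elias gamma codeword before the
# next symbol is read, so decoding can proceed online. Illustration only.

def elias_gamma(n):
    """Self-delimiting Elias gamma code for an integer n >= 1."""
    b = bin(n)[2:]                        # binary representation of n
    return "0" * (len(b) - 1) + b         # prefix of zeros encodes the length

def mtf_gamma_encode(text, alphabet):
    table = list(alphabet)                # adaptive table, recent symbols in front
    bits = []
    for ch in text:
        r = table.index(ch)               # current rank of the symbol
        bits.append(elias_gamma(r + 1))   # emit codeword before reading further
        table.insert(0, table.pop(r))     # move the symbol to the front
    return "".join(bits)

print(mtf_gamma_encode("banana", "abn"))
```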

    Universal lossless source coding with the Burrows Wheeler transform

    The Burrows Wheeler transform (1994) is a reversible sequence transformation used in a variety of practical lossless source-coding algorithms. In each, the BWT is followed by a lossless source code that attempts to exploit the natural ordering of the BWT coefficients. BWT-based compression schemes are widely touted as low-complexity algorithms giving lossless coding rates better than those of the Ziv-Lempel codes (commonly known as LZ'77 and LZ'78) and almost as good as those achieved by prediction by partial matching (PPM) algorithms. To date, the coding performance claims have been made primarily on the basis of experimental results. This work gives a theoretical evaluation of BWT-based coding. The main results of this theoretical evaluation include: (1) statistical characterizations of the BWT output on both finite strings and sequences of length n → ∞, (2) a variety of very simple new techniques for BWT-based lossless source coding, and (3) proofs of the universality and bounds on the rates of convergence of both new and existing BWT-based codes for finite-memory and stationary ergodic sources. The end result is a theoretical justification and validation of the experimentally derived conclusions: BWT-based lossless source codes achieve universal lossless coding performance that converges to the optimal coding performance more quickly than the rate of convergence observed in Ziv-Lempel style codes and, for some BWT-based codes, within a constant factor of the optimal rate of convergence for finite-memory sources.
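
    The transform itself is easy to state. The sketch below is the textbook quadratic-time construction (sorting all rotations of the input plus a '$' sentinel) together with the naive inversion; real BWT-based compressors use suffix-array constructions instead, so this is purely illustrative.

```python
# Naive Burrows-Wheeler transform sketch. Assumes the '$' sentinel is
# lexicographically smallest and does not occur in the input. Uses the
# O(n^2 log n) rotation sort for clarity, not a suffix-array construction.

def bwt(s):
    s += "$"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)  # last column of sorted rotations

def inverse_bwt(last):
    # Repeatedly prepend the BWT column and re-sort to rebuild the rotation table.
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(last[i] + table[i] for i in range(len(last)))
    return next(row for row in table if row.endswith("$"))[:-1]

s = "banana"
print(bwt(s))                 # annb$aa
print(inverse_bwt(bwt(s)))    # banana
```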

    Efficient and Error-Correcting Data Structures for Membership and Polynomial Evaluation

    We construct efficient data structures that are resilient against a constant fraction of adversarial noise. Our model requires that the decoder answers most queries correctly with high probability, and that for the remaining queries the decoder, with high probability, either answers correctly or declares "don't know." Furthermore, if there is no noise on the data structure, it answers all queries correctly with high probability. Our model is the common generalization of a model proposed recently by de Wolf and the notion of "relaxed locally decodable codes" developed in the PCP literature. We measure the efficiency of a data structure in terms of its length, measured by the number of bits in its representation, and query-answering time, measured by the number of bit-probes to the (possibly corrupted) representation. In this work, we study two data structure problems: membership and polynomial evaluation. We show that these two problems have constructions that are simultaneously efficient and error-correcting. Comment: An abridged version of this paper appears in STACS 2010
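
    To make the bit-probe model concrete, the toy sketch below stores a membership bit-vector in several copies and answers a query from a few probes, returning "don't know" when the probes disagree. This replication scheme is our own illustration of the query model; it is far less space-efficient than the constructions in the paper.

```python
# Toy error-correcting membership sketch: store k copies of the characteristic
# bit-vector of the set; a query probes the same bit in a few random copies,
# answers if the probes agree, and says "don't know" otherwise.
# Illustration of the model only, not the paper's construction.

import random

def build(universe_size, elements, k=5):
    bits = [1 if x in elements else 0 for x in range(universe_size)]
    return bits * k          # k concatenated copies of the bit-vector

def query(representation, universe_size, x, probes=3):
    k = len(representation) // universe_size
    copies = random.sample(range(k), probes)
    read = [representation[c * universe_size + x] for c in copies]  # bit-probes
    if all(b == read[0] for b in read):
        return bool(read[0])
    return "don't know"      # probes disagree, so this bit may be corrupted

rep = build(16, {2, 3, 5, 7, 11, 13})
rep[2] = 1 - rep[2]          # adversarially flip one stored bit
print(query(rep, 16, 2), query(rep, 16, 4))
```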