1,089 research outputs found

    New Algorithms and Lower Bounds for Sequential-Access Data Compression

    Get PDF
    This thesis concerns sequential-access data compression, i.e., by algorithms that read the input one or more times from beginning to end. In one chapter we consider adaptive prefix coding, for which we must read the input character by character, outputting each character's self-delimiting codeword before reading the next one. We show how to encode and decode each character in constant worst-case time while producing an encoding whose length is worst-case optimal. In another chapter we consider one-pass compression with memory bounded in terms of the alphabet size and context length, and prove a nearly tight tradeoff between the amount of memory we can use and the quality of the compression we can achieve. In a third chapter we consider compression in the read/write streams model, which allows us passes and memory both polylogarithmic in the size of the input. We first show how to achieve universal compression using only one pass over one stream. We then show that one stream is not sufficient for achieving good grammar-based compression. Finally, we show that two streams are necessary and sufficient for achieving entropy-only bounds.Comment: draft of PhD thesi

    Discrete sequence prediction and its applications

    Get PDF
    Learning from experience to predict sequences of discrete symbols is a fundamental problem in machine learning with many applications. We apply sequence prediction using a simple and practical sequence-prediction algorithm, called TDAG. The TDAG algorithm is first tested by comparing its performance with some common data compression algorithms. Then it is adapted to the detailed requirements of dynamic program optimization, with excellent results

    Probabilistic Shaping for Finite Blocklengths: Distribution Matching and Sphere Shaping

    Get PDF
    In this paper, we provide for the first time a systematic comparison of distribution matching (DM) and sphere shaping (SpSh) algorithms for short blocklength probabilistic amplitude shaping. For asymptotically large blocklengths, constant composition distribution matching (CCDM) is known to generate the target capacity-achieving distribution. As the blocklength decreases, however, the resulting rate loss diminishes the efficiency of CCDM. We claim that for such short blocklengths and over the additive white Gaussian channel (AWGN), the objective of shaping should be reformulated as obtaining the most energy-efficient signal space for a given rate (rather than matching distributions). In light of this interpretation, multiset-partition DM (MPDM), enumerative sphere shaping (ESS) and shell mapping (SM), are reviewed as energy-efficient shaping techniques. Numerical results show that MPDM and SpSh have smaller rate losses than CCDM. SpSh--whose sole objective is to maximize the energy efficiency--is shown to have the minimum rate loss amongst all. We provide simulation results of the end-to-end decoding performance showing that up to 1 dB improvement in power efficiency over uniform signaling can be obtained with MPDM and SpSh at blocklengths around 200. Finally, we present a discussion on the complexity of these algorithms from the perspective of latency, storage and computations.Comment: 18 pages, 10 figure
    corecore