5 research outputs found

    Compressing Sets and Multisets of Sequences

    Get PDF
    This is the accepted manuscript for a paper published in IEEE Transactions on Information Theory, Vol. 61, No. 3, March 2015, doi: 10.1109/TIT.2015.2392093. © 2015 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.This paper describes lossless compression algorithms for multisets of sequences, taking advantage of the multiset’s unordered structure. Multisets are a generalization of sets, where members are allowed to occur multiple times. A multiset can be encoded naïvely by simply storing its elements in some sequential order, but then information is wasted on the ordering. We propose a technique that transforms the multiset into an order-invariant tree representation, and derive an arithmetic code that optimally compresses the tree. Our method achieves compression even if the sequences in the multiset are individually incompressible (such as cryptographic hash sums). The algorithm is demonstrated practically by compressing collections of SHA-1 hash sums, and multisets of arbitrary, individually encodable objects.This work was supported in part by the Engineering and Physical Sciences Research Council under Grant EP/I036575 and in part by a Google Research Award. This paper was presented at the 2014 Data Compression Conferenc

    Unconditionally secure ciphers with a short key for a source with unknown statistics

    Get PDF
    We consider the problem of constructing an unconditionally secure cipher with a short key for the case where the probability distribution of encrypted messages is unknown. Note that unconditional security means that an adversary with no computational constraints can obtain only a negligible amount of information ("leakage") about an encrypted message (without knowing the key). Here we consider the case of a priori (partially) unknown message source statistics. More specifically, the message source probability distribution belongs to a given family of distributions. We propose an unconditionally secure cipher for this case. As an example, one can consider constructing a single cipher for texts written in any of the languages of the European Union. That is, the message to be encrypted could be written in any of these languages

    Integer Set Compression and Statistical Modeling

    Get PDF
    Compression of integer sets and sequences has been extensively studied for settings where elements follow a uniform probability distribution. In addition, methods exist that exploit clustering of elements in order to achieve higher compression performance. In this work, we address the case where enumeration of elements may be arbitrary or random, but where statistics is kept in order to estimate probabilities of elements. We present a recursive subset-size encoding method that is able to benefit from statistics, explore the effects of permuting the enumeration order based on element probabilities, and discuss general properties and possibilities for this class of compression problem

    Coding of Sets of Words

    No full text
    corecore