5 research outputs found
Maximum Likelihood Associative Memories
Associative memories are structures that store data in such a way that it can
later be retrieved given only a part of its content -- a sort-of
error/erasure-resilience property. They are used in applications ranging from
caches and memory management in CPUs to database engines. In this work we study
associative memories built on the maximum likelihood principle. First, we derive
minimum residual error rates when the stored data comes from a uniform binary
source. Second, we determine the minimum amount of memory required to store the
same data. Finally, we bound the computational complexity for message
retrieval. We then compare these bounds with two existing associative memory
architectures: the celebrated Hopfield neural networks and a neural network
architecture introduced more recently by Gripon and Berrou.
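As a toy illustration of the retrieval principle (not the architectures analyzed in the paper): when the source is uniform and the query is a stored message with some bits erased, maximum-likelihood retrieval reduces to returning every stored message that agrees with the query on the known positions.

```python
# Hypothetical sketch: ML retrieval of uniform binary messages observed through
# an erasure pattern. With a uniform prior, every stored message consistent with
# the known bits is equally likely, so ML retrieval is exact matching on the
# non-erased positions.
from typing import List, Optional

def ml_retrieve(memory: List[str], query: List[Optional[int]]) -> List[str]:
    """Return all stored messages consistent with a partially erased query."""
    return [msg for msg in memory
            if all(q is None or int(bit) == q for bit, q in zip(msg, query))]

memory = ["0110", "1011", "1100"]
query = [1, None, 1, None]            # bits at positions 1 and 3 are erased
print(ml_retrieve(memory, query))     # ['1011']
```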
Compressing Sets and Multisets of Sequences
This is the accepted manuscript for a paper published in IEEE Transactions on Information Theory, Vol. 61, No. 3, March 2015, doi: 10.1109/TIT.2015.2392093.
This paper describes lossless compression algorithms
for multisets of sequences, taking advantage of the
multiset's unordered structure. Multisets are a generalization of
sets, where members are allowed to occur multiple times. A multiset
can be encoded naïvely by simply storing its elements in some
sequential order, but then information is wasted on the ordering.
We propose a technique that transforms the multiset into an
order-invariant tree representation, and derive an arithmetic
code that optimally compresses the tree. Our method achieves
compression even if the sequences in the multiset are individually
incompressible (such as cryptographic hash sums). The algorithm
is demonstrated practically by compressing collections of SHA-1
hash sums, and multisets of arbitrary, individually encodable
objects.
This work was supported in part by the Engineering and Physical Sciences Research Council under Grant EP/I036575 and in part by a Google Research Award. This paper was presented at the 2014 Data Compression Conference.
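A rough sketch of the order-invariant idea, assuming fixed-length binary strings (the paper's arithmetic coder for the resulting tree is not reproduced here):

```python
# Hypothetical sketch: represent a multiset of fixed-length bit strings as a
# tree in which each node records only how many of its strings continue with
# a '0'. Two multisets with the same contents produce the same tree, whatever
# the insertion order, so no bits are spent on ordering.
def build_counts(strings, depth=0):
    if not strings or depth == len(strings[0]):
        return None
    zeros = [s for s in strings if s[depth] == "0"]
    ones  = [s for s in strings if s[depth] == "1"]
    return {"n0": len(zeros),                 # enough to split the multiset on decode
            "0": build_counts(zeros, depth + 1),
            "1": build_counts(ones, depth + 1)}

a = build_counts(["1010", "0110", "1010"])
b = build_counts(["0110", "1010", "1010"])
assert a == b                                 # same multiset, same tree
```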
Integer Set Compression and Statistical Modeling
Compression of integer sets and sequences has been extensively studied for
settings where elements follow a uniform probability distribution. In addition,
methods exist that exploit clustering of elements in order to achieve higher
compression performance. In this work, we address the case where enumeration of
elements may be arbitrary or random, but where statistics is kept in order to
estimate probabilities of elements. We present a recursive subset-size encoding
method that is able to benefit from statistics, explore the effects of
permuting the enumeration order based on element probabilities, and discuss
general properties and possibilities for this class of compression problems.
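One way such a recursive subset-size scheme can be sketched (the paper's exact construction may differ): split the universe in half, record how many chosen elements fall into the lower half, and recurse; the recorded counts are then entropy-coded, e.g. under a binomial model whose parameters come from the per-element statistics.

```python
# Hypothetical sketch of recursive subset-size encoding over the universe
# [lo, hi): emit the number of chosen elements in the lower half, then recurse
# on both halves. The emitted counts are what an entropy coder would compress.
def subset_sizes(elements, lo, hi, out):
    """elements: sorted distinct integers drawn from range(lo, hi)."""
    if hi - lo <= 1 or not elements:
        return
    mid = (lo + hi) // 2
    left = [x for x in elements if x < mid]
    right = [x for x in elements if x >= mid]
    out.append(len(left))                     # size of the left sub-subset
    subset_sizes(left, lo, mid, out)
    subset_sizes(right, mid, hi, out)

counts = []
subset_sizes([1, 4, 5, 7], 0, 8, counts)
print(counts)                                 # [1, 1, 0, 2, 1, 0]
```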
Compressing multisets using tries
We consider the problem of efficient and lossless representation of a multiset of m words drawn with repetition from a set of size 2^n. One expects that encoding the (unordered) multiset should lead to significant savings in rate as compared to encoding an (ordered) sequence with the same words, since information about the order of words in the sequence corresponds to a permutation. We propose and analyze a practical multiset encoder/decoder based on the trie data structure. The act of encoding requires O(m(n + log m)) operations, and decoding requires O(mn) operations. Of particular interest is the case where the cardinality of the multiset scales as m = 2^n/c for some c > 1, as n → ∞. Under this scaling, and when the words in the multiset are drawn independently and uniformly, we show that the proposed encoding leads to an arbitrary improvement in rate over encoding an ordered sequence with the same words. Moreover, the expected length of the proposed codes in this setting is asymptotically within a constant factor of 5/3 of the lower bound.
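A back-of-the-envelope check of the rate-savings claim under the stated scaling (an illustration, not the paper's analysis): an ordered sequence of m uniform n-bit words costs about mn bits, while discarding the order saves roughly log2(m!) bits, so with m = 2^n/c the per-word rate of the multiset stays bounded while the per-word rate of the sequence grows with n.

```python
# Rough illustration (ignores the small correction for repeated words):
# per-word rate of an ordered sequence vs. a multiset of m uniform n-bit
# words, with m = 2**n / c. The sequence rate is n; removing the order
# saves about log2(m!) bits in total.
from math import lgamma, log

def per_word_rates(n, c):
    m = 2 ** n / c
    order_bits = lgamma(m + 1) / log(2)       # log2(m!)
    return n, n - order_bits / m              # (sequence, multiset) bits per word

for n in (16, 24, 32):
    seq, ms = per_word_rates(n, c=2.0)
    print(n, round(seq, 2), round(ms, 2))     # multiset rate stays near log2(c) + log2(e)
```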