74 research outputs found

    Average Profile of the Lempel-Ziv Parsing Scheme for Markovian Source

    Distribution of tail symbols in a search tree with Markov sources [Distribution des symboles finaux dans un arbre de recherche avec des sources de Markov]

    Lempel-Ziv'78 is one of the most popular data compression algorithms on words. Over the last decades we have uncovered its fascinating behavior and come to better understand many of its beautiful properties. Among others, in 1995, by settling the Ziv conjecture, we proved that for a memoryless source (i.e., when a sequence is generated by a source without memory) the number of LZ'78 phrases satisfies the Central Limit Theorem (CLT). Since then, the quest has been to extend this result to Markov sources; however, despite several attempts, the problem is still open. In this conference paper, we revisit the issue and focus on a much simpler, but not trivial, problem that may lead to the resolution of the LZ'78 dilemma. We consider the associated Digital Search Tree (DST) version of the problem, in which the DST is built over a fixed number of Markov-generated sequences. In this model we count the number of so-called "tail symbols", that is, the symbols that follow the last inserted symbol. Our goal here is to analyze this new quantity under the Markovian assumption, since it plays a crucial role in the analysis of the original LZ'78 problem. We establish the mean, the variance, and the central limit theorem for the number of tail symbols. We accomplish this by applying techniques of analytic combinatorics on words, also known as analytic pattern matching.

    The expected profile of digital search trees

    A digital search tree (DST) is a fundamental data structure on words that finds various applications, from the popular Lempel–Zivʼ78 data compression scheme to distributed hash tables. The profile of a DST measures the number of nodes at the same distance from the root; it depends on the number of stored strings and the distance from the root. Most parameters of a DST (e.g., depth, height, fillup) can be expressed in terms of the profile. We study here the asymptotics of the average profile in a DST built from sequences generated independently by a memoryless source. After representing the average profile by a recurrence, we solve it using a wide range of analytic tools. This analysis is surprisingly demanding, but once it is carried out it reveals unusually intriguing behavior. The average profile undergoes phase transitions when moving from the root to the longest path: at first it resembles a full tree, until it abruptly starts growing polynomially and oscillating in this range. These results are derived by methods of analytic combinatorics such as generating functions, the Mellin transform, poissonization and depoissonization, the saddle point method, singularity analysis, and uniform asymptotic analysis.

    A Central Limit Theorem for non-overlapping return times

    Define the non-overlapping return time of a random process to be the number of blocks that we wait before a particular block reappears. We prove a Central Limit Theorem based on these return times. This result has applications to entropy estimation, and to the problem of determining whether digits have come from an independent equidistributed sequence. In the case of an equidistributed sequence, we use an argument based on negative association to prove convergence under weaker conditions.

    New analysis of the asymptotic behavior of the Lempel-Ziv compression algorithm

    We give a new analysis and proof of the Normal limiting distribution of the number of phrases in the 1978 Lempel-Ziv compression algorithm on random sequences built from a memoryless source. This work is a follow-up to our last paper on this subject in 1995. The analysis relies on the asymptotic behavior of a DST obtained by the insertion of random sequences. Our proofs are augmented with new results on moment convergence, moderate and large deviations, and redundancy analysis.

    The effect of flexible parsing for dynamic dictionary-based data compression

    Sequential Decoding of Low-Density Parity-Check Codes by Adaptive Reordering of Parity Checks

    Decoding algorithms are investigated in which unpruned codeword trees are generated from an ordered list of parity checks. The order is computed from the received message, and low-density parity-check codes are used to help control the growth of the tree. Simulation results are given for the binary erasure channel. © 1992 IEE

    On the variance of a class of inductive valuations of data structures for digital search

    Let an inductive valuation L on the family of binary tries or Patricia tries or digital search trees be defined in the following way: L(t) = L(t_l) + L(t_r) + R(t), where t_l and t_r denote the left and right subtrees of t and R depends only on the size (the number of records) |t| of t. Let L_N denote L restricted to the trees of size N. In Theorem 1 we give sufficient conditions on the sequence r_{|t|} := R(t) for the variance Var L_N to be of exact order N, if the family of tries (resp. Patricia tries, resp. digital search trees) is equipped with the Bernoulli model. For the symmetric Bernoulli model we prove the existence of a continuous periodic function δ with period 1, such that Var L_N ∼ δ(log_2 N) · N holds.