    Noiseless compression using non-Markov models

    Adaptive data compression techniques can be viewed as consisting of a model specified by a database common to the encoder and decoder, an encoding rule, and a rule for updating the model to ensure that the encoder and decoder always agree on the interpretation of the next transmission. The techniques which fit this framework range from run-length coding, to adaptive Huffman and arithmetic coding, to the string-matching techniques of Lempel and Ziv. The compression obtained by arithmetic coding depends on the generality of the source model. For many sources, an independent-letter model is clearly insufficient. Unfortunately, a straightforward implementation of a Markov model requires an amount of space exponential in the number of letters remembered. The Directed Acyclic Word Graph (DAWG) can be constructed in time and space proportional to the text encoded, and can be used to estimate the probabilities required for arithmetic coding using an amount of memory that adapts naturally to the encoded text. The context used is the longest suffix of the text encoded so far that has also occurred earlier; the frequencies of the letters following those earlier occurrences are used to estimate the probability distribution of the next letter. Experimental results indicate that the compression is often far better than that obtained with independent-letter models, and sometimes also significantly better than that of other non-independent techniques.
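
    The suffix-context idea can be illustrated without the DAWG machinery. The sketch below is a hypothetical, brute-force stand-in for the linear-time DAWG construction the abstract describes: it finds the longest suffix of the already-encoded text that has occurred earlier, then estimates the next-letter distribution from the letters that followed those earlier occurrences (the function name, the add-one smoothing, and the brute-force search are illustrative assumptions, not the paper's construction).

        from collections import Counter

        def next_letter_distribution(encoded: str, alphabet: str):
            """Estimate P(next letter) from the longest previously seen suffix of `encoded`.

            Brute-force sketch; a real implementation would use a DAWG (suffix automaton)
            to locate the context and its follower counts incrementally.
            """
            # Longest suffix of `encoded` that also occurs earlier in `encoded`.
            context = ""
            for length in range(len(encoded), 0, -1):
                suffix = encoded[-length:]
                if encoded.find(suffix, 0, len(encoded) - 1) != -1:
                    context = suffix
                    break

            # Count the letters that followed earlier occurrences of that context.
            counts = Counter()
            start = 0
            while True:
                pos = encoded.find(context, start, len(encoded) - 1)
                if pos == -1:
                    break
                follower = pos + len(context)
                if follower < len(encoded):
                    counts[encoded[follower]] += 1
                start = pos + 1

            # Add-one smoothing keeps every letter's probability nonzero for the coder.
            total = sum(counts.values()) + len(alphabet)
            return {a: (counts[a] + 1) / total for a in alphabet}

    For example, with encoded = "to be or not to " the longest repeated suffix is "to ", whose single earlier occurrence was followed by "b", so "b" receives the largest estimated probability for the next letter.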

    The Rényi Redundancy of Generalized Huffman Codes

    Huffman's algorithm gives optimal codes, as measured by average codeword length, and the redundancy can be measured as the difference between the average codeword length and Shannon's entropy. If the objective function is replaced by an exponentially weighted average, then a simple modification of Huffman's algorithm gives optimal codes. The redundancy can now be measured as the difference between this new average and A. Rényi's (1961) generalization of Shannon's entropy. By decreasing some of the codeword lengths in a Shannon code, the upper bound on the redundancy given in the standard proof of the noiseless source coding theorem is improved. The lower bound is improved by randomizing between codeword lengths, which allows linear programming techniques to be applied to an integer programming problem. These bounds are shown to be asymptotically equal. The results are generalized to the Rényi case and are related to R.G. Gallager's (1978) bound on the redundancy of Huffman codes.
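
    To make the objective concrete: assuming the usual Campbell-style definitions, in which the exponentially weighted average length is compared with the Rényi entropy of order 1/(1+s), the quantities in question can be sketched as

        L_s(p, w) = \frac{1}{s} \log_2 \Bigl( \sum_{i=1}^{m} p_i \, 2^{s w_i} \Bigr),
        \qquad
        H_{1/(1+s)}(p) = \frac{1+s}{s} \log_2 \Bigl( \sum_{i=1}^{m} p_i^{1/(1+s)} \Bigr),

        R_s(p, w) = L_s(p, w) - H_{1/(1+s)}(p) \;\ge\; 0 .

    As s → 0+, L_s tends to the ordinary average length Σ p_i w_i and H_{1/(1+s)} tends to Shannon's entropy, so R_s recovers the usual redundancy. The exact normalizations above are standard assumptions rather than quotations from the paper.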

    A Study of the Learnability of Relational Properties: Model Counting Meets Machine Learning (MCML)

    This paper introduces the MCML approach for empirically studying the learnability of relational properties that can be expressed in the well-known software design language Alloy. A key novelty of MCML is that it quantifies the performance of, and the semantic differences among, trained machine learning (ML) models, specifically decision trees, with respect to entire (bounded) input spaces, and not just for given training and test datasets (as is the common practice). MCML reduces the quantification problems to the classic complexity-theory problem of model counting, and employs state-of-the-art model counters. The results show that relatively simple ML models can achieve surprisingly high performance (accuracy and F1-score) when evaluated in the common setting of using training and test datasets, even when the training dataset is much smaller than the test dataset, suggesting that learning relational properties is deceptively simple. However, MCML metrics based on model counting show that the performance can degrade substantially when models are tested against the entire (bounded) input space, indicating the high complexity of precisely learning these properties and the usefulness of model counting in quantifying the true performance.
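
    The contrast the paper draws can be reproduced on a toy scale. The following sketch is purely illustrative (MCML itself works on properties written in Alloy and uses exact and approximate model counters rather than enumeration): it compares accuracy measured on a sampled test set against accuracy computed over an entire bounded input space of bit-vectors, using a deliberately skewed sample to mimic datasets that cover only part of the space.

        import itertools
        import random

        def ground_truth(x):
            """Toy stand-in for a relational property over 12-bit inputs."""
            return sum(x) >= 6 and x[0] == x[-1]

        def learned_model(x):
            """Stand-in for a trained decision tree that ignores one conjunct."""
            return sum(x) >= 6

        n = 12
        space = list(itertools.product([0, 1], repeat=n))

        # Skewed test sample: only inputs whose first and last bits agree,
        # mimicking data that covers just a slice of the bounded input space.
        random.seed(0)
        skewed = [x for x in space if x[0] == x[-1]]
        test_set = random.sample(skewed, 500)
        test_acc = sum(learned_model(x) == ground_truth(x) for x in test_set) / len(test_set)

        # Exact accuracy over the whole bounded space; MCML obtains such quantities
        # by model counting instead of explicit enumeration, which does not scale.
        exact_acc = sum(learned_model(x) == ground_truth(x) for x in space) / len(space)

        print(f"accuracy on the sampled test set:      {test_acc:.3f}")
        print(f"accuracy over the whole bounded space: {exact_acc:.3f}")

    On this toy example the sampled accuracy is perfect while the whole-space accuracy drops to roughly 0.7, which is the shape of the gap the MCML metrics are designed to expose.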

    A Formal Proof of PAC Learnability for Decision Stumps

    We present a formal proof in Lean of probably approximately correct (PAC) learnability of the concept class of decision stumps. This classic result in machine learning theory derives a bound on error probabilities for a simple type of classifier. Though such a proof appears simple on paper, analytic and measure-theoretic subtleties arise when it is carried out fully formally. Our proof is structured so as to separate reasoning about deterministic properties of a learning function from proofs of measurability and the analysis of probabilities. Comment: 13 pages, appeared in Certified Programs and Proofs (CPP) 2021.
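
    To indicate the flavor of the bound being formalized (this is the standard textbook sketch for the simplest case, a one-dimensional threshold with a realizable target, not the exact statement proved in the paper): the learner that outputs the tightest threshold consistent with m i.i.d. samples can have error mass greater than ε only if no sample fell in the region of mass ε adjacent to the true threshold, so

        \Pr\bigl[\operatorname{err}(\hat h) > \varepsilon\bigr] \;\le\; (1 - \varepsilon)^m \;\le\; e^{-\varepsilon m},
        \qquad\text{hence } m \ge \tfrac{1}{\varepsilon} \ln \tfrac{1}{\delta} \text{ samples give } \Pr\bigl[\operatorname{err}(\hat h) > \varepsilon\bigr] \le \delta .

    A fully formal development must also justify, for instance, that err(ĥ) is a measurable function of the sample, which is exactly the kind of subtlety the abstract alludes to.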

    Bounds on the Redundancy of Noiseless Source Coding

    59 p. Thesis (Ph.D.)--University of Illinois at Urbana-Champaign, 1982.
    The Rényi redundancy, R_s(p, w), is the difference between the exponentially weighted average codeword length [formula omitted in the source] and the Rényi entropy [formula omitted in the source]. (Here s > 0 is a parameter, p = (p_1, p_2, ..., p_m), and w = (w_1, w_2, ..., w_m), where p_i is the probability that the i-th codeword, consisting of w_i bits, is used.) As s → 0+ this approaches the usual redundancy. Huffman's algorithm generalizes in a natural way to the s > 0 case. Let R_s(p) be the Rényi redundancy of the Huffman code for p and s. The main result of Chapter II is a technique for computing bounds L_s(p) and U_s(p) satisfying 0 ≤ L_s(p) ≤ R_s(p) ≤ U_s(p) < 1. In the case of block-to-variable-length (BV) coding of a binary memoryless source, these bounds are shown to be asymptotically equal as the block length increases, generalizing a result mentioned by Krichevskii (1966).
    Chapter III treats the problem of minimizing the ordinary (s = 0) redundancy when p is not entirely known. Let p be a probability vector containing the probabilities of blocks of length n from some J-state unifilar (Markov) source with alphabet size A, let P denote the class of such probability vectors, and let W denote the class of uniquely decodable codes with A^n codewords. The minimax redundancy is then the minimum over codes in W of the maximum over probability vectors in P of the redundancy. The main result of Chapter III is a technique for generating a sequence of BV codes whose minimax redundancy is bounded above by an expression [omitted in the source] that is the best possible.
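
    One concrete reading of "Huffman's algorithm generalizes in a natural way to the s > 0 case" is the exponential-cost merge rule sketched below. This is a hedged reconstruction based on the usual generalized Huffman construction (merge the two smallest weights q_a and q_b into 2^s (q_a + q_b)); the thesis may present the algorithm differently, and the function names are illustrative.

        import heapq
        from math import log2

        def generalized_huffman_lengths(probs, s):
            """Codeword lengths from the exponentially weighted Huffman merge rule.

            Repeatedly merge the two smallest weights q_a, q_b into 2**s * (q_a + q_b);
            as s -> 0 this reduces to the ordinary Huffman merge.
            """
            heap = [(q, i, [i]) for i, q in enumerate(probs)]
            heapq.heapify(heap)
            lengths = [0] * len(probs)
            next_id = len(probs)  # tie-breaker so equal weights never compare the lists
            while len(heap) > 1:
                qa, _, leaves_a = heapq.heappop(heap)
                qb, _, leaves_b = heapq.heappop(heap)
                for i in leaves_a + leaves_b:
                    lengths[i] += 1  # every merge adds one bit to each codeword below it
                heapq.heappush(heap, (2 ** s * (qa + qb), next_id, leaves_a + leaves_b))
                next_id += 1
            return lengths

        def exponentially_weighted_length(probs, lengths, s):
            """L_s(p, w) = (1/s) * log2(sum_i p_i * 2**(s * w_i)); the Renyi redundancy
            compares this value with the Renyi entropy of order 1/(1+s)."""
            return (1 / s) * log2(sum(p * 2 ** (s * w) for p, w in zip(probs, lengths)))

    For small s the merged weight 2^s (q_a + q_b) approaches q_a + q_b, so the procedure and L_s reduce to the ordinary Huffman code and its expected length, matching the statement that the s → 0+ limit recovers the usual redundancy.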