825 research outputs found

    Lower Bounds on the Redundancy of Huffman Codes with Known and Unknown Probabilities

    Full text link
    In this paper we provide a method to obtain tight lower bounds on the minimum redundancy achievable by a Huffman code when the probability distribution underlying an alphabet is only partially known. In particular, we address the case where the occurrence probabilities are unknown for some of the symbols in an alphabet. Bounds can be obtained for alphabets of a given size, for alphabets of up to a given size, and for alphabets of arbitrary size. The method operates on a Computer Algebra System, yielding closed-form numbers for all results. Finally, we show the potential of the proposed method to shed some light on the structure of the minimum redundancy achievable by the Huffman code

    About Adaptive Coding on Countable Alphabets: Max-Stable Envelope Classes

    Full text link
    In this paper, we study the problem of lossless universal source coding for stationary memoryless sources on countably infinite alphabets. This task is generally not achievable without restricting the class of sources over which universality is desired. Building on our prior work, we propose natural families of sources characterized by a common dominating envelope. We particularly emphasize the notion of adaptivity, which is the ability to perform as well as an oracle knowing the envelope, without actually knowing it. This is closely related to the notion of hierarchical universal source coding, but with the important difference that families of envelope classes are not discretely indexed and not necessarily nested. Our contribution is to extend the classes of envelopes over which adaptive universal source coding is possible, namely by including max-stable (heavy-tailed) envelopes which are excellent models in many applications, such as natural language modeling. We derive a minimax lower bound on the redundancy of any code on such envelope classes, including an oracle that knows the envelope. We then propose a constructive code that does not use knowledge of the envelope. The code is computationally efficient and is structured to use an {E}xpanding {T}hreshold for {A}uto-{C}ensoring, and we therefore dub it the \textsc{ETAC}-code. We prove that the \textsc{ETAC}-code achieves the lower bound on the minimax redundancy within a factor logarithmic in the sequence length, and can be therefore qualified as a near-adaptive code over families of heavy-tailed envelopes. For finite and light-tailed envelopes the penalty is even less, and the same code follows closely previous results that explicitly made the light-tailed assumption. Our technical results are founded on methods from regular variation theory and concentration of measure

    Information Theoretic Methods For Biometrics, Clustering, And Stemmatology

    Get PDF
    This thesis consists of four parts, three of which study issues related to theories and applications of biometric systems, and one which focuses on clustering. We establish an information theoretic framework and the fundamental trade-off between utility of biometric systems and security of biometric systems. The utility includes person identification and secret binding, while template protection, privacy, and secrecy leakage are security issues addressed. A general model of biometric systems is proposed, in which secret binding and the use of passwords are incorporated. The system model captures major biometric system designs including biometric cryptosystems, cancelable biometrics, secret binding and secret generating systems, and salt biometric systems. In addition to attacks at the database, information leakage from communication links between sensor modules and databases is considered. A general information theoretic rate outer bound is derived for characterizing and comparing the fundamental capacity, and security risks and benefits of different system designs. We establish connections between linear codes to biometric systems, so that one can directly use a vast literature of coding theories of various noise and source random processes to achieve good performance in biometric systems. We develop two biometrics based on laser Doppler vibrometry: LDV) signals and electrocardiogram: ECG) signals. For both cases, changes in statistics of biometric traits of the same individual is the major challenge which obstructs many methods from producing satisfactory results. We propose a ii robust feature selection method that specifically accounts for changes in statistics. The method yields the best results both in LDV and ECG biometrics in terms of equal error rates in authentication scenarios. Finally, we address a different kind of learning problem from data called clustering. Instead of having a set of training data with true labels known as in identification problems, we study the problem of grouping data points without labels given, and its application to computational stemmatology. Since the problem itself has no true answer, the problem is in general ill-posed unless some regularization or norm is set to define the quality of a partition. We propose the use of minimum description length: MDL) principle for graphical based clustering. In the MDL framework, each data partitioning is viewed as a description of the data points, and the description that minimizes the total amount of bits to describe the data points and the model itself is considered the best model. We show that in synthesized data the MDL clustering works well and fits natural intuition of how data should be clustered. Furthermore, we developed a computational stemmatology method based on MDL, which achieves the best performance level in a large dataset

    Minimum Description Length Model Selection - Problems and Extensions

    Get PDF
    The thesis treats a number of open problems in Minimum Description Length model selection, especially prediction problems. It is shown how techniques from the "Prediction with Expert Advice" literature can be used to improve model selection performance, which is particularly useful in nonparametric settings

    Entropy of delta-coded speech

    Get PDF
    • …
    corecore