31 research outputs found

    Minimax Optimal Bayes Mixtures for Memoryless Sources

    Tasks such as data compression and prediction commonly require choosing a probability distribution over all possible sequences. To achieve an efficient prediction strategy, the chosen distribution should be a good approximation of the true distribution underlying the data. Similarly, an efficient compression strategy should assign shorter codes to more probable sequences. In particular, a compression strategy that minimizes the code-length can be shown to minimize the often-used logarithmic prediction loss. However, the optimal strategy requires knowing the true distribution, which is not available in most applications. In universal compression or prediction we assume that the true probability distribution is not known but belongs to a known class of distributions. A universal code is a code that can compress the data essentially as well as the best distribution in the class in hindsight. Similarly, a universal predictor achieves low prediction loss regardless of the distribution. We call a universal code minimax optimal if it minimizes the worst-case regret, i.e., the excess code-length or prediction loss compared to the best distribution in the class. In this thesis we take the known class to be the discrete memoryless sources. The minimax optimal code for this class is given by the normalized maximum likelihood (NML) distribution. In practice, however, computationally more efficient distributions such as Bayes mixtures have to be used. A Bayes mixture is a mixture of the probability distributions in the class weighted by a prior distribution. The conjugate prior for the multinomial distribution is the Dirichlet distribution, which has been used to construct asymptotically minimax codes. The Dirichlet distribution requires a hyperparameter that dictates the amount of prior mass given to the outcomes. The distribution given by the symmetric hyperparameter 1/2 has been widely studied and has been shown to minimize the worst-case expected regret asymptotically. Previous work on minimax optimal Bayes mixtures has mainly been concerned with sample sizes that are large in comparison to the alphabet size. In this thesis we investigate the minimax optimal Dirichlet prior in the large-alphabet setting. In particular, we find that when the alphabet size is large compared to the sample size, the optimal hyperparameter for the Dirichlet distribution is 1/3. The worst-case regret of this mixture turns out to approach the NML regret as the alphabet size grows, and the mixture provides an efficient approximation of the NML distribution. Furthermore, we develop an efficient algorithm for finding the optimal hyperparameter for any sample size and alphabet size.
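    To make the quantities above concrete, the following is a minimal numerical sketch (not the thesis code; all names and parameter choices are illustrative). For a memoryless source over an m-symbol alphabet and sample size n, it enumerates the count vectors, computes the worst-case regret of a symmetric Dirichlet(alpha) Bayes mixture, compares it with the NML regret, and grid-searches the regret-minimizing hyperparameter.

        from itertools import combinations
        from math import exp, lgamma, log

        def compositions(n, m):
            """All count vectors (k_1, ..., k_m) of non-negative integers summing to n."""
            for cuts in combinations(range(n + m - 1), m - 1):
                prev, ks = -1, []
                for c in cuts:
                    ks.append(c - prev - 1)
                    prev = c
                ks.append(n + m - 2 - prev)
                yield tuple(ks)

        def log_max_likelihood(ks):
            """log of the maximized likelihood prod_j (k_j/n)^(k_j) for counts ks."""
            n = sum(ks)
            return sum(k * log(k / n) for k in ks if k > 0)

        def nml_regret(n, m):
            """Worst-case regret of the NML code: log of the normalizing sum over all types."""
            total = 0.0
            for ks in compositions(n, m):
                log_sequences = lgamma(n + 1) - sum(lgamma(k + 1) for k in ks)  # multinomial coefficient
                total += exp(log_sequences + log_max_likelihood(ks))
            return log(total)

        def dirichlet_worst_regret(alpha, n, m):
            """Worst-case regret of the Bayes mixture with a symmetric Dirichlet(alpha) prior."""
            worst = float("-inf")
            for ks in compositions(n, m):
                log_mixture = (lgamma(m * alpha) - lgamma(n + m * alpha)
                               + sum(lgamma(k + alpha) - lgamma(alpha) for k in ks))
                worst = max(worst, log_max_likelihood(ks) - log_mixture)
            return worst

        if __name__ == "__main__":
            n, m = 4, 12                       # sample size small compared to the alphabet size
            alphas = [a / 100 for a in range(10, 101, 2)]
            best = min(alphas, key=lambda a: dirichlet_worst_regret(a, n, m))
            print(f"NML regret           : {nml_regret(n, m):.4f} nats")
            print(f"best alpha on grid   : {best:.2f}")   # the abstract predicts values near 1/3 when m >> n
            print(f"regret at best alpha : {dirichlet_worst_regret(best, n, m):.4f} nats")
            print(f"regret at alpha=1/2  : {dirichlet_worst_regret(0.5, n, m):.4f} nats")

    Because both the maximized likelihood and the mixture probability of a sequence depend only on its count vector, maximizing the per-type regret over all count vectors is equivalent to maximizing over all sequences, which keeps the search tractable for small n and m.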

    About adaptive coding on countable alphabets

    This paper sheds light on universal coding with respect to classes of memoryless sources over a countable alphabet defined by an envelope function with finite and non-decreasing hazard rate. We prove that the auto-censuring AC code introduced by Bontemps (2011) is adaptive with respect to the collection of such classes. The analysis builds on the tight characterization of the universal redundancy rate in terms of the metric entropy of small source classes by Opper and Haussler (1997), and on a careful analysis of the performance of the AC-coding algorithm. The latter relies on non-asymptotic bounds for maxima of samples from discrete distributions with finite and non-decreasing hazard rate.
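    As a toy illustration of the condition above (an assumed example, not code from the paper): the hazard rate of a discrete envelope f on {1, 2, ...} is h(k) = f(k) / P(X >= k), and the result covers classes whose envelope has a finite, non-decreasing hazard rate. A geometric envelope, for instance, has constant hazard rate, as the sketch below checks numerically.

        def hazard_rates(pmf, k_max=20, horizon=2000):
            """h(k) = pmf(k) / P(X >= k) for k = 1..k_max, with the tail truncated at `horizon`."""
            tail = sum(pmf(j) for j in range(1, horizon))   # approximates P(X >= 1) = 1
            rates = []
            for k in range(1, k_max + 1):
                rates.append(pmf(k) / tail)
                tail -= pmf(k)                              # tail now approximates P(X >= k + 1)
            return rates

        geometric = lambda k, q=0.7: (1 - q) * q ** (k - 1)   # geometric envelope on {1, 2, ...}
        rates = hazard_rates(geometric)
        # True: the hazard rate is (numerically) non-decreasing; the small tolerance absorbs round-off
        print(all(later >= earlier - 1e-9 for earlier, later in zip(rates, rates[1:])))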

    About Adaptive Coding on Countable Alphabets: Max-Stable Envelope Classes

    In this paper, we study the problem of lossless universal source coding for stationary memoryless sources on countably infinite alphabets. This task is generally not achievable without restricting the class of sources over which universality is desired. Building on our prior work, we propose natural families of sources characterized by a common dominating envelope. We particularly emphasize the notion of adaptivity, which is the ability to perform as well as an oracle knowing the envelope, without actually knowing it. This is closely related to the notion of hierarchical universal source coding, but with the important difference that families of envelope classes are not discretely indexed and not necessarily nested. Our contribution is to extend the classes of envelopes over which adaptive universal source coding is possible, namely by including max-stable (heavy-tailed) envelopes, which are excellent models in many applications such as natural language modeling. We derive a minimax lower bound on the redundancy of any code on such envelope classes, including an oracle that knows the envelope. We then propose a constructive code that does not use knowledge of the envelope. The code is computationally efficient and is structured to use an Expanding Threshold for Auto-Censoring, and we therefore dub it the ETAC-code. We prove that the ETAC-code achieves the lower bound on the minimax redundancy within a factor logarithmic in the sequence length, and can therefore be qualified as a near-adaptive code over families of heavy-tailed envelopes. For finite and light-tailed envelopes the penalty is even smaller, and the same code closely matches previous results that explicitly made the light-tailed assumption. Our technical results are founded on methods from regular variation theory and concentration of measure.
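    The sketch below is a deliberately simplified illustration of the auto-censoring idea the abstract refers to; it is not the ETAC code (in particular, it uses a fixed rather than an expanding threshold, and all names are assumptions). Symbols below the threshold are encoded with a Krichevsky-Trofimov mixture over the censored alphabet, while larger symbols are escaped and sent separately with Elias gamma codes.

        from math import log2

        def elias_gamma_length(n):
            """Code length in bits of the Elias gamma code for a positive integer n."""
            return 2 * n.bit_length() - 1

        def kt_code_length(symbols, alphabet_size):
            """Sequential Krichevsky-Trofimov (add-1/2) code length in bits over a finite alphabet."""
            counts, total, bits = [0] * alphabet_size, 0, 0.0
            for s in symbols:
                bits += -log2((counts[s] + 0.5) / (total + alphabet_size / 2))
                counts[s] += 1
                total += 1
            return bits

        def censored_code_length(xs, threshold):
            """Censored stream over {0, ..., threshold} (threshold acts as the escape symbol)
            plus the escaped values, re-indexed to positive integers for the gamma code."""
            censored = [min(x, threshold) for x in xs]
            escaped = [x - threshold + 1 for x in xs if x >= threshold]
            return kt_code_length(censored, threshold + 1) + sum(map(elias_gamma_length, escaped))

        xs = [0, 1, 0, 2, 7, 1, 0, 42, 1, 3]        # toy sample from a countable alphabet {0, 1, 2, ...}
        print(f"{censored_code_length(xs, threshold=4):.2f} bits")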

    Coding on countably infinite alphabets

    This paper describes universal lossless coding strategies for compressing sources on countably infinite alphabets. Classes of memoryless sources defined by an envelope condition on the marginal distribution provide benchmarks for coding techniques originating from the theory of universal coding over finite alphabets. We prove general upper bounds on the minimax regret and lower bounds on the minimax redundancy for such source classes. The general upper bounds emphasize the role of the Normalized Maximum Likelihood codes with respect to minimax regret in the infinite-alphabet context. Lower bounds are derived by tailoring sharp bounds on the redundancy of Krichevsky-Trofimov coders for sources over finite alphabets. Up to logarithmic (resp. constant) factors the bounds are matching for source classes defined by algebraically declining (resp. exponentially vanishing) envelopes. Effective and (almost) adaptive coding techniques are described for the collection of source classes defined by algebraically vanishing envelopes. Those results extend our knowledge concerning universal coding to contexts where the key tools from parametric inference are known to fail.
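    A minimal sketch of the envelope condition this abstract refers to (the names and constants below are illustrative assumptions): a class of memoryless sources over {1, 2, ...} is defined by requiring the marginal pmf p to satisfy p(k) <= f(k) for every symbol k, where f is, for example, an algebraically vanishing (power-law) envelope.

        def in_envelope_class(pmf, envelope, k_max=10_000):
            """Finite-horizon surrogate for the envelope condition p(k) <= f(k) for all k >= 1."""
            return all(pmf(k) <= envelope(k) for k in range(1, k_max + 1))

        envelope = lambda k: 2.5 * k ** -2.0                           # f(k) = C k^(-alpha), algebraically vanishing
        zeta_2 = lambda k: (6 / 3.141592653589793 ** 2) * k ** -2.0    # a normalized power-law source
        print(in_envelope_class(zeta_2, envelope))                     # True: the source is dominated by f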

    On the Compression of Unknown Sources

    Ph.D. thesis, University of Hawaiʻi at Mānoa, 2018.

    Divergence Measures

    Data science, information theory, probability theory, statistical learning and other related disciplines greatly benefit from non-negative measures of dissimilarity between pairs of probability measures. These are known as divergence measures, and exploring their mathematical foundations and diverse applications is of significant interest. The present Special Issue, entitled “Divergence Measures: Mathematical Foundations and Applications in Information-Theoretic and Statistical Problems”, includes eight original contributions, and it is focused on the study of the mathematical properties and applications of classical and generalized divergence measures from an information-theoretic perspective. It mainly deals with two key generalizations of the relative entropy: namely, the Rényi divergence and the important class of f-divergences. It is our hope that readers will find interest in this Special Issue, which will stimulate further research in the study of the mathematical foundations and applications of divergence measures.
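    As a pointer to the two families named above, here is a small numerical sketch (illustrative names; the two pmfs are chosen arbitrarily) computing the relative entropy, a Rényi divergence, and two f-divergences for a pair of distributions on a three-element set.

        from math import log

        P = [0.5, 0.3, 0.2]
        Q = [0.4, 0.4, 0.2]

        def kl(p, q):
            """Relative entropy D(P||Q) = sum p log(p/q), in nats."""
            return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

        def renyi(p, q, alpha):
            """Renyi divergence D_alpha(P||Q) = (1/(alpha-1)) log sum p^alpha q^(1-alpha)."""
            return log(sum(pi ** alpha * qi ** (1 - alpha) for pi, qi in zip(p, q))) / (alpha - 1)

        def f_divergence(p, q, f):
            """D_f(P||Q) = sum q * f(p/q), for convex f with f(1) = 0."""
            return sum(qi * f(pi / qi) for pi, qi in zip(p, q) if qi > 0)

        print(kl(P, Q))                                       # relative entropy
        print(renyi(P, Q, alpha=2.0))                         # Renyi divergence of order 2
        print(f_divergence(P, Q, f=lambda t: t * log(t)))     # f(t) = t log t recovers the relative entropy
        print(f_divergence(P, Q, f=lambda t: (t - 1) ** 2))   # chi-squared divergence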