
    Approximate Profile Maximum Likelihood

    We propose an efficient algorithm for approximate computation of the profile maximum likelihood (PML), a variant of maximum likelihood that maximizes the probability of observing a sufficient statistic rather than the empirical sample. The PML has appealing theoretical properties but is difficult to compute exactly. Inspired by observations gleaned from exactly solvable cases, we look for an approximate PML solution which, intuitively, clumps comparably frequent symbols into one symbol. This amounts to lower-bounding a certain matrix permanent by summing over a subgroup of the symmetric group rather than the whole group during the computation. We experiment extensively with the approximate solution and find that its empirical performance is competitive with, and sometimes significantly better than, state-of-the-art performance on various estimation problems.
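    The clumping intuition can be made concrete with a toy sketch (an illustration only, not the authors' algorithm; the `tol` threshold and the group-averaging rule are assumptions). It computes the profile, the sufficient statistic mentioned above, and a naive estimator that assigns one common probability to each clump of comparably frequent symbols:

        from collections import Counter

        def profile(sample):
            # Profile (the sufficient statistic): multiplicity k -> number
            # of distinct symbols appearing exactly k times in the sample.
            return Counter(Counter(sample).values())

        def clumped_estimate(sample, tol=1):
            # Toy clumping: symbols whose multiplicities differ by at most
            # `tol` (an assumed threshold) share one probability, namely
            # the clump's average empirical frequency.
            n = len(sample)
            items = sorted(Counter(sample).items(), key=lambda kv: kv[1])
            est, clump = {}, []
            for sym, k in items:
                if clump and k - clump[-1][1] > tol:
                    p = sum(c for _, c in clump) / (n * len(clump))
                    est.update({s: p for s, _ in clump})
                    clump = []
                clump.append((sym, k))
            if clump:
                p = sum(c for _, c in clump) / (n * len(clump))
                est.update({s: p for s, _ in clump})
            return est

        print(profile("abracadabra"))           # multiplicity -> symbol count
        print(clumped_estimate("abracadabra"))  # c, d, b, r form one clump

    Each clump redistributes exactly its own empirical mass, so the clumped estimate still sums to one.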

    The Roles of Polyploidy, Climate, and Genetic Architecture in the Evolution of Leaf Form in Viburnum (Adoxaceae)

    Plants exhibit extensive variation in leaf form, but the evolutionary drivers of this variation are not well understood. This dissertation leverages the wide diversity of leaf form, and instances of convergent leaf syndromes, in Viburnum to investigate trends in leaf trait evolution across the group. First, I explore how changes in chromosome number and genome size influence leaf characters of ecophysiological importance. Even with extensive variation in chromosome number and genome size across Viburnum, nucleotypic changes largely do not constrain leaf traits related to ecophysiological function. I then examine a case of convergent evolution of leaf syndromes in a radiation of Central and South American cloud forest species. I first test whether different species with the same leaf syndrome occupy similar climatic niches, and find that although all leaf syndromes occupy largely overlapping climatic zones, some do appear to sort along a few select climate variables. I then uncover hybridization between two Mexican species with different leaf syndromes and use this underlying genetic diversity to identify genetic markers associated with leaf trait differences. Some leaf traits are associated with many genomic regions, others with only a few, and some significant regions are associated with multiple traits. Overall, many factors, from genetic to environmental, shape the evolution of leaf form in Viburnum.

    Variable-Length Coding with Feedback: Finite-Length Codewords and Periodic Decoding

    Theoretical analysis has long indicated that feedback improves the error exponent but not the capacity of single-user memoryless channels. Recently, Polyanskiy et al. studied the benefit of variable-length feedback with termination (VLFT) codes in the non-asymptotic regime. In that work, achievability is based on an infinite-length random code, and decoding is attempted at every symbol. The coding-rate backoff from capacity due to channel dispersion is greatly reduced with feedback, allowing capacity to be approached with surprisingly small expected latency. This paper is mainly concerned with VLFT codes based on finite-length codes and decoding attempts only at certain specified decoding times. The penalties of using a finite blocklength $N$ and a sequence of specified decoding times are studied. This paper shows that properly scaling $N$ with the expected latency achieves, up to constant terms, the same performance as $N = \infty$. The penalty introduced by periodic decoding times is a term linear in the interval between decoding times; hence the performance approaches capacity as the expected latency grows, provided the interval between decoding times grows sub-linearly with the expected latency. Comment: 8 pages. A shortened version was submitted to ISIT 201
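    To see why periodic decoding costs a term linear in the decoding interval, a small Monte Carlo sketch helps (the latency model below is a stand-in assumption, not the paper's VLFT analysis): if decoding would first succeed at a random time T but attempts occur only every `interval` symbols, the realized latency rounds T up to the next attempt.

        import math, random

        def periodic_penalty(interval, trials=100_000, mean_latency=200.0):
            # Extra latency when decoding is attempted only every
            # `interval` symbols: the decoding-ready time T is rounded up
            # to the next attempt.  T's distribution is an assumed
            # exponential stand-in, not the paper's code-dependent model.
            extra = 0.0
            for _ in range(trials):
                t = random.expovariate(1.0 / mean_latency)
                extra += math.ceil(t / interval) * interval - t
            return extra / trials

        for interval in (1, 5, 10, 20, 50):
            print(interval, round(periodic_penalty(interval), 2))
        # The estimates grow roughly like interval / 2: a penalty linear
        # in the decoding interval, consistent with the paper's claim.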

    An Exact Characterization of the Generalization Error for the Gibbs Algorithm

    Various approaches have been developed to upper bound the generalization error of a supervised learning algorithm. However, existing bounds are often loose and come without tightness guarantees; as a result, they may fail to characterize the exact generalization ability of a learning algorithm. Our main contribution is an exact characterization of the expected generalization error of the well-known Gibbs algorithm (a.k.a. Gibbs posterior) using the symmetrized KL information between the input training samples and the output hypothesis. Our result can be applied to tighten existing expected generalization error and PAC-Bayesian bounds. Our approach is versatile, as it also characterizes the generalization error of the Gibbs algorithm with a data-dependent regularizer, and that of the Gibbs algorithm in the asymptotic regime, where it converges to the empirical risk minimization algorithm. Of particular relevance, our results highlight the role the symmetrized KL information plays in controlling the generalization error of the Gibbs algorithm.
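    For context, the Gibbs posterior and the shape of such an exact characterization can be written out as follows (a hedged reconstruction from the abstract's description; the notation and the placement of the inverse temperature α are my assumptions, not quoted from the paper):

        % Gibbs posterior: prior \pi, inverse temperature \alpha > 0,
        % empirical risk L_E(w, s) of hypothesis w on training sample s.
        P^{\alpha}_{W \mid S}(w \mid s)
          = \frac{\pi(w)\, e^{-\alpha L_E(w, s)}}
                 {\mathbb{E}_{\pi}\!\left[e^{-\alpha L_E(W, s)}\right]}

        % Exact characterization via the symmetrized KL information
        % between sample S and output hypothesis W, where I denotes
        % mutual information and L denotes lautum information:
        \overline{\mathrm{gen}} = \frac{I_{\mathrm{SKL}}(S; W)}{\alpha},
        \qquad
        I_{\mathrm{SKL}}(S; W) = I(S; W) + L(S; W)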

    Information-Theoretic Characterizations of Generalization Error for the Gibbs Algorithm

    Various approaches have been developed to upper bound the generalization error of a supervised learning algorithm. However, existing bounds are often loose and even vacuous when evaluated in practice. As a result, they may fail to characterize the exact generalization ability of a learning algorithm. Our main contributions are exact characterizations of the expected generalization error of the well-known Gibbs algorithm (a.k.a. Gibbs posterior) using different information measures, in particular the symmetrized KL information between the input training samples and the output hypothesis. Our result can be applied to tighten existing expected generalization error and PAC-Bayesian bounds. Our information-theoretic approach is versatile, as it also characterizes the generalization error of the Gibbs algorithm with a data-dependent regularizer, and that of the Gibbs algorithm in the asymptotic regime, where it converges to the standard empirical risk minimization algorithm. Of particular relevance, our results highlight the role the symmetrized KL information plays in controlling the generalization error of the Gibbs algorithm.
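    The asymptotic regime mentioned above is easy to visualize with a minimal sketch of the Gibbs posterior over a finite hypothesis class (the finite class, uniform prior, and risk values are illustrative assumptions):

        import numpy as np

        def gibbs_posterior(emp_risk, alpha, prior=None):
            # Gibbs posterior over a finite hypothesis class: a
            # prior-weighted softmax of the negative empirical risks.
            emp_risk = np.asarray(emp_risk, dtype=float)
            if prior is None:
                prior = np.ones_like(emp_risk)
            logits = np.log(prior) - alpha * emp_risk
            logits -= logits.max()          # numerical stability
            w = np.exp(logits)
            return w / w.sum()

        risks = [0.30, 0.10, 0.12, 0.45]    # illustrative empirical risks
        for alpha in (1, 10, 100, 1000):
            print(alpha, np.round(gibbs_posterior(risks, alpha), 3))
        # As alpha grows, the posterior concentrates on argmin(risks),
        # i.e. the Gibbs algorithm converges to empirical risk minimization.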

    Jensen-Shannon Information Based Characterization of the Generalization Error of Learning Algorithms

    Generalization error bounds are critical to understanding the performance of machine learning models. In this work, we propose a new information-theoretic generalization error upper bound applicable to supervised learning scenarios. We show that our general bound specializes to various previous bounds, and that, under some conditions, it specializes to a new bound involving the Jensen-Shannon information between a random variable modelling the set of training samples and another random variable modelling the hypothesis. We also prove that our bound can be tighter than mutual-information-based bounds under some conditions. Comment: Accepted at the ITW 2020 conference.
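    The Jensen-Shannon information underlying the new bound is built from the Jensen-Shannon divergence, whose defining property, a symmetrized and bounded KL to the mixture, can be sketched directly (a generic illustration of the quantity, not the paper's estimator):

        import numpy as np

        def kl(p, q):
            # KL divergence for discrete distributions; assumes q > 0
            # wherever p > 0 (true when q is a mixture containing p).
            p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
            mask = p > 0
            return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

        def jsd(p, q):
            # Jensen-Shannon divergence: average KL of p and q to their
            # mixture; always finite and bounded by log 2, unlike raw KL
            # or mutual information, which can blow up.
            m = 0.5 * (np.asarray(p, dtype=float) + np.asarray(q, dtype=float))
            return 0.5 * kl(p, m) + 0.5 * kl(q, m)

        print(jsd([0.7, 0.2, 0.1], [0.1, 0.2, 0.7]))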