    Consistency and convergence rate of phylogenetic inference via regularization

    It is common in phylogenetics to have some, perhaps partial, information about the overall evolutionary tree of a group of organisms and to wish to find an evolutionary tree of a specific gene for those organisms. There may not be enough information in the gene sequences alone to accurately reconstruct the correct "gene tree." Although the gene tree may deviate from the "species tree" due to a variety of genetic processes, in the absence of evidence to the contrary it is parsimonious to assume that they agree. A common statistical approach in these situations is to develop a likelihood penalty to incorporate such additional information. Recent studies using simulation and empirical data suggest that a likelihood penalty quantifying concordance with a species tree can significantly improve the accuracy of gene tree reconstruction compared to using sequence data alone. However, the consistency of such an approach has not yet been established, nor have convergence rates been bounded. Because phylogenetics is a non-standard inference problem, the standard theory does not apply. In this paper, we propose a penalized maximum likelihood estimator for gene tree reconstruction, where the penalty is the square of the Billera-Holmes-Vogtmann (BHV) geodesic distance from the gene tree to the species tree. We prove that this method is consistent, and derive its convergence rate for simultaneously estimating the discrete gene tree structure and the continuous edge lengths (representing the amount of evolution that has occurred on each branch). We find that the regularized estimator is "adaptive fast converging," meaning that it can reconstruct all edges of length greater than any given threshold from gene sequences of polynomial length. Our method does not require the species tree to be known exactly; in fact, our asymptotic theory holds for any such guide tree.
    Comment: 34 pages, 5 figures. To appear in The Annals of Statistics
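    To make the shape of the penalized estimator concrete, here is a minimal one-edge sketch, not the paper's implementation. It assumes two aligned DNA sequences under the Jukes-Cantor (JC69) model, so the "tree" is a single branch of length t, plus a hypothetical guide-tree length `guide_length`. In that one-dimensional case the BHV geodesic distance reduces to |t - guide_length|, so the objective is the JC69 log-likelihood minus `lam` times that distance squared. The grid search and all function names are illustrative choices.

```python
import math

def jc69_log_likelihood(t, n_sites, n_diff):
    """Log-likelihood (up to an additive constant) of two aligned DNA sequences
    separated by a branch of length t > 0 under the Jukes-Cantor (JC69) model."""
    p = 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))  # probability that a site differs
    return (n_sites - n_diff) * math.log(1.0 - p) + n_diff * math.log(p / 3.0)

def penalized_edge_estimate(n_sites, n_diff, guide_length, lam, t_max=3.0, step=1e-4):
    """Maximize loglik(t) - lam * (t - guide_length)**2 over a grid of t values.

    For a single edge the BHV distance to the guide edge is |t - guide_length|,
    so the penalty is the squared BHV distance in this toy setting."""
    best_t, best_val = None, -math.inf
    for k in range(1, int(round(t_max / step)) + 1):
        t = k * step
        val = jc69_log_likelihood(t, n_sites, n_diff) - lam * (t - guide_length) ** 2
        if val > best_val:
            best_t, best_val = t, val
    return best_t

# 100 sites, 20 differences, guide-tree branch length 0.5
for lam in (0.0, 50.0, 1e6):
    print(lam, round(penalized_edge_estimate(100, 20, 0.5, lam), 4))
```

    With `lam = 0` the estimate matches the closed-form JC69 distance -(3/4) ln(1 - (4/3) p̂); as `lam` grows it is pulled toward the guide length. The general case needs BHV geodesics between trees of different topologies, which this sketch does not attempt.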

    Uncovering latent structure in valued graphs: A variational approach

    As more and more network-structured data sets become available, the statistical analysis of valued graphs has become commonplace. Looking for a latent structure is one of the many strategies used to better understand the behavior of a network. Several methods already exist for the binary case. We present a model-based strategy to uncover groups of nodes in valued graphs. This framework can be used for a wide range of parametric random graph models and allows covariates to be included. Variational tools allow us to achieve approximate maximum likelihood estimation of the parameters of these models. We provide a simulation study showing that our estimation method performs well over a broad range of situations. We apply this method to analyze host–parasite interaction networks in forest ecosystems.
    Comment: Published at http://dx.doi.org/10.1214/10-AOAS361 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    The Markov blankets of life: autonomy, active inference and the free energy principle

    This work addresses the autonomous organization of biological systems. It does so by considering the boundaries of biological systems, from individual cells to Homo sapiens, in terms of the presence of Markov blankets under the active inference scheme, a corollary of the free energy principle. A Markov blanket defines the boundaries of a system in a statistical sense. Here we consider how a collective of Markov blankets can self-assemble into a global system that itself has a Markov blanket, thereby illustrating how autonomous systems can be understood as having layers of nested and self-sustaining boundaries. This allows us to show that (i) any living system is a Markov-blanketed system and (ii) the boundaries of such systems need not be co-extensive with the biophysical boundaries of a living organism. In other words, autonomous systems are hierarchically composed of Markov blankets of Markov blankets: all the way down to individual cells, all the way up to you and me, and all the way out to include elements of the local environment.
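    The term Markov blanket comes from Bayesian networks: in a directed acyclic graph, the blanket of a set of nodes consists of their parents, their children, and their children's other parents, and conditioning on it makes the set independent of all remaining nodes. The sketch below computes that standard graph-theoretic blanket for a set of nodes, which also shows how a group of nodes can have a blanket of its own. The example graph and its node names (`e` external, `s` sensory, `i` internal, `a` active, `e2` a later external state) are hypothetical labels chosen to echo free-energy terminology; the paper's own formulation is not reproduced here.

```python
def markov_blanket(parents, nodes):
    """Markov blanket of a set of nodes in a DAG.

    parents: dict mapping each node to the set of its parents (nodes without an
    entry are treated as roots). nodes: the set whose blanket is wanted.
    The blanket is parents, children, and children's other parents, minus the set itself."""
    nodes = set(nodes)
    children = {v for v, ps in parents.items() if ps & nodes}
    blanket = set()
    for v in nodes:
        blanket |= parents.get(v, set())  # parents of the set
    blanket |= children                    # children of the set
    for c in children:
        blanket |= parents[c]              # co-parents of those children
    return blanket - nodes

# Hypothetical chain: external -> sensory -> internal -> active -> later external state
EXAMPLE_PARENTS = {"s": {"e"}, "i": {"s"}, "a": {"i"}, "e2": {"a", "e"}}
print(sorted(markov_blanket(EXAMPLE_PARENTS, {"i"})))
```

    For this toy graph, the blanket of the internal node `i` is the sensory and active pair `s` and `a`, while the blanket of the larger set `{"s", "i"}` is `e` and `a`.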

    Template-Based Static Posterior Inference for Bayesian Probabilistic Programming

    In Bayesian probabilistic programming, a central problem is to estimate the normalised posterior distribution (NPD) of a probabilistic program with conditioning. Prominent approximate approaches to this problem include Markov chain Monte Carlo and variational inference, but neither can produce guaranteed results within a limited time. Moreover, most existing formal approaches that perform exact inference for NPD are restricted to programs with closed-form solutions or bounded loops/recursion. A recent work (Beutner et al., PLDI 2022) derived guaranteed bounds for NPD over programs with unbounded recursion. However, because this approach requires recursion unrolling, it suffers from the path explosion problem. Furthermore, previous approaches do not consider score-recursive probabilistic programs, which allow score statements inside loops; handling them is non-trivial and requires careful treatment to ensure the integrability of the normalising constant in NPD. In this work, we propose a novel automated approach that derives bounds for NPD via polynomial templates. Our approach can handle probabilistic programs with unbounded while loops and continuous distributions with infinite supports. The novelties in our approach are threefold. First, we use polynomial templates to circumvent the path explosion problem caused by recursion unrolling. Second, we derive a novel multiplicative variant of the Optional Stopping Theorem that addresses the integrability issue in score-recursive programs. Third, to increase the accuracy of the bounds derived via polynomial templates, we propose a novel truncation technique that truncates a program to a bounded range of program values. Experiments over a wide range of benchmarks demonstrate that our approach is time-efficient and can derive bounds for NPD that are comparable with (or tighter than) those of the recursion-unrolling approach (Beutner et al., PLDI 2022).
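    To illustrate what a guaranteed bound on a normalising constant looks like, and why unrolling becomes expensive, here is a toy sketch of the recursion-unrolling idea attributed above to earlier work; it is not the polynomial-template method. It assumes a hypothetical score-recursive program that loops while a coin with bias `p` comes up heads, counting iterations `n`, and then executes `score(exp(-n))`. Paths with fewer than `depth` iterations are enumerated exactly, which gives a lower bound on the normalising constant Z; the remaining probability mass, multiplied by the largest score those paths can receive, gives an upper bound.

```python
import math

def unrolled_bounds(depth, p=0.5):
    """Interval [lower, upper] containing Z = sum_{n>=0} P(loop runs n times) * score(n)
    for the toy program:
        n = 0
        while flip(p): n += 1
        score(exp(-n))
    Paths with n < depth are enumerated exactly; the rest are bounded using their
    total probability p**depth and the fact that score(n) <= exp(-depth) for n >= depth."""
    lower = sum((p ** n) * (1 - p) * math.exp(-n) for n in range(depth))
    tail_mass = p ** depth  # probability that the loop runs at least `depth` times
    upper = lower + tail_mass * math.exp(-depth)
    return lower, upper

for d in (1, 3, 6):
    print(d, unrolled_bounds(d))
```

    This loop has a single path per unrolling depth; when a loop body branches, the number of paths grows exponentially with depth, which is the path explosion problem the template approach is designed to avoid.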

    Variational Inference in Nonconjugate Models

    Mean-field variational methods are widely used for approximate posterior inference in many probabilistic models. In a typical application, mean-field methods approximately compute the posterior with a coordinate-ascent optimization algorithm. When the model is conditionally conjugate, the coordinate updates are easily derived and in closed form. However, many models of interest, such as the correlated topic model and Bayesian logistic regression, are nonconjugate. In these models, mean-field methods cannot be directly applied, and practitioners have had to develop variational algorithms on a case-by-case basis. In this paper, we develop two generic methods for nonconjugate models: Laplace variational inference and delta method variational inference. Our methods have several advantages: they allow for easily derived variational algorithms for a wide class of nonconjugate models; they extend and unify some of the existing algorithms that have been derived for specific models; and they work well on real-world datasets. We studied our methods on the correlated topic model, Bayesian logistic regression, and hierarchical Bayesian logistic regression.

    Prior Density Learning in Variational Bayesian Phylogenetic Parameters Inference

    Advances in variational inference are opening promising paths for Bayesian estimation problems. These advances make variational phylogenetic inference an alternative to Markov chain Monte Carlo methods for approximating the phylogenetic posterior. However, one of the main drawbacks of such approaches is that the prior is modelled through fixed distributions, which could bias the posterior approximation if they are distant from the current data distribution. In this paper, we propose an approach and an implementation framework that relax the rigidity of the prior densities by learning their parameters using a gradient-based method and a neural network-based parameterization. We applied this approach to the estimation of branch lengths and evolutionary parameters under several Markov chain substitution models. Simulation results show that the approach is effective at estimating branch lengths and evolutionary model parameters. They also show that a flexible prior model could provide better results than a predefined prior model. Finally, the results highlight that using neural networks improves the initialization of the optimization of the prior density parameters.
    Comment: Accepted as a full paper for publication at RECOMB-CG 2023 (camera-ready version). 15 pages (excluding references), 6 tables and 1 figure
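    A minimal illustration of learning a prior parameter by gradient ascent, under stated assumptions: each branch length b_i has a hypothetical exponential prior with rate `rate`, and the variational posterior for each branch is log-normal with parameters `(mu_i, sigma_i)`, so E_q[b_i] = exp(mu_i + sigma_i^2 / 2). Only the prior term of the ELBO depends on `rate`, namely n log(rate) - rate * sum_i E_q[b_i], so the sketch ascends that term in log-rate space. The paper's neural-network parameterization and substitution-model parameters are not modelled here.

```python
import math

def learn_exponential_prior_rate(q_means, q_sds, lr=0.1, steps=500):
    """Gradient ascent on the prior term of the ELBO with respect to the rate of a
    hypothetical Exponential(rate) prior on branch lengths.

    q_means, q_sds: log-normal variational parameters (mu_i, sigma_i), one pair per branch.
    Objective: n * log(rate) - rate * sum_i E_q[b_i], optimized over log(rate)."""
    expected_b = [math.exp(m + s * s / 2.0) for m, s in zip(q_means, q_sds)]
    n, total = len(expected_b), sum(expected_b)
    log_rate = 0.0
    for _ in range(steps):
        rate = math.exp(log_rate)
        grad = n - rate * total  # derivative of the objective with respect to log(rate)
        log_rate += lr * grad / n
    return math.exp(log_rate)

mus = [math.log(0.1), math.log(0.05), math.log(0.2)]
sds = [0.1, 0.2, 0.3]
print(learn_exponential_prior_rate(mus, sds))
```

    For this prior the optimum has the closed form rate = n / sum_i E_q[b_i], which makes a handy check; gradient-based learning matters when the prior family is more flexible and no closed form is available.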

    Phylogenetic information complexity: Is testing a tree easier than finding it?

    Phylogenetic trees describe the evolutionary history of a group of present-day species from a common ancestor. These trees are typically reconstructed from aligned DNA sequence data. In this paper we analytically address the following question: is the amount of sequence data required to accurately reconstruct a tree significantly more than the amount required to test whether or not a candidate tree is the 'true' tree? Here, two amounts are not significantly different if they behave the same way as a function of the number of species being considered. We prove that, for a certain type of model, the amount of information required is not significantly different, while for another type of model, the information required to test a tree is independent of the number of leaves, whereas that required to reconstruct it grows with this number. Our results combine probabilistic and combinatorial arguments.
    Comment: 15 pages, 3 figures