
    Phylogenetic mixtures and linear invariants for equal input models

    The reconstruction of phylogenetic trees from molecular sequence data relies on modelling site substitutions by a Markov process, or a mixture of such processes. In general, allowing mixed processes can result in different tree topologies becoming indistinguishable from the data, even for infinitely long sequences. However, when the underlying Markov process supports linear phylogenetic invariants, then, provided these are sufficiently informative, the identifiability of the tree topology can be restored. In this paper, we investigate a class of processes, the ‘equal input model’, that support linear invariants once the stationary distribution is fixed. This model generalizes the ‘Felsenstein 1981’ model (and thereby the Jukes–Cantor model) from four states to an arbitrary number of states (finite or infinite), and it can also be described by a ‘random cluster’ process. We describe the structure and dimension of the vector spaces of phylogenetic mixtures and of linear invariants for any fixed phylogenetic tree (and for all trees, the so-called ‘model invariants’), on any number n of leaves. We also provide a precise description of the space of mixtures and linear invariants for the special case of n = 4 leaves. Combining techniques from discrete random processes and (multi)linear algebra, our results build on a classic result first established by James Lake (Mol Biol Evol 4:167–191, 1987).
    Peer Reviewed. Postprint (author's final draft).
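One standard way to write the equal input model is with off-diagonal rates Q_ij = μπ_j, which gives the closed form P(t) = e^{-μt} I + (1 − e^{-μt}) 1π^T: with probability e^{-μt} nothing happens, and otherwise the new state is drawn afresh from π regardless of the current one. The minimal Python sketch below builds this matrix for any finite number of states, checks that π is stationary, and recovers the Jukes–Cantor matrix for four states with uniform π. The function name `equal_input_matrix` and the parameter `mu_t` (the product μt) are illustrative assumptions, not notation from the paper.

```python
import math

def equal_input_matrix(pi, mu_t):
    """Transition matrix of the equal input model over 'time' mu_t.

    Rates Q[i][j] = mu * pi[j] (i != j) give the closed form
    P = exp(-mu_t) * I + (1 - exp(-mu_t)) * 1 pi^T:
    with probability exp(-mu_t) no event occurs; otherwise the new state is
    drawn from pi, independently of the current state.
    """
    stay = math.exp(-mu_t)
    k = len(pi)
    return [[(stay if i == j else 0.0) + (1 - stay) * pi[j] for j in range(k)]
            for i in range(k)]

pi = [0.1, 0.2, 0.3, 0.4]
P = equal_input_matrix(pi, 0.5)

# Every row is a probability distribution ...
row_sums = [sum(row) for row in P]
# ... and pi is stationary: (pi P)_j = pi_j.
pi_P = [sum(pi[i] * P[i][j] for i in range(len(pi))) for j in range(len(pi))]

# Four states with uniform pi give the Jukes-Cantor matrix:
# diagonal 1/4 + 3/4 * exp(-mu_t), off-diagonal 1/4 - 1/4 * exp(-mu_t).
jc = equal_input_matrix([0.25] * 4, 1.0)
print(row_sums, pi_P, jc[0])
```

In this parameterization μ is the rate of resampling events. Since a resampling can return the current state, the expected number of actual substitutions per unit time at stationarity is μ(1 − Σ_j π_j²).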

    Algebraic methods in phylogenetics

    To those outside the field, and even to some focused on empirical applications, phylogenetics may appear to have little to do with algebra. Probability and statistics are clearly important ingredients, as modeling and inferring evolutionary relationships motivate the field. Combinatorics is also an obvious component, as the graph-theoretic notions of trees, and more recently networks, are used to describe the relationships. But where does the algebra arise? The models used in phylogenetics are necessarily complex. Even the simplest depend on a tree structure as well as on Markov matrices describing changes in nucleotide sequences along the edges. Together, these two components give probability distributions described by rather complicated polynomials in the model parameters, whose precise form reflects the structure of the tree.
    Peer Reviewed. Postprint (author's final draft).
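To make the polynomial structure concrete, here is a minimal sketch for the simplest case, a star ("claw") tree whose root is joined directly to n leaves. Each leaf-pattern probability is p(x_1, …, x_n) = Σ_r π_r ∏_i M_i[r][x_i], a polynomial of degree n + 1 in the root distribution π and the entries of the edge matrices M_i. The function name `claw_distribution` and the parameter values are arbitrary illustrative choices; exact `Fraction` arithmetic keeps the polynomial values exact.

```python
from fractions import Fraction as F
from itertools import product

def claw_distribution(root_dist, edge_matrices):
    """Joint distribution of leaf states on a star tree (root joined to each leaf).

    p(x_1, ..., x_n) = sum_r root_dist[r] * prod_i M_i[r][x_i], a polynomial in
    the root distribution and the entries of the Markov matrices M_i.
    """
    k = len(root_dist)
    n = len(edge_matrices)
    dist = {}
    for pattern in product(range(k), repeat=n):
        total = F(0)
        for r in range(k):
            term = root_dist[r]
            for M, x in zip(edge_matrices, pattern):
                term *= M[r][x]
            total += term
        dist[pattern] = total
    return dist

pi = [F(1, 3), F(2, 3)]                       # root distribution (2 states)
M = [[F(4, 5), F(1, 5)], [F(1, 10), F(9, 10)]]  # same Markov matrix on each edge
p = claw_distribution(pi, [M, M, M])
print(p[(0, 0, 0)], sum(p.values()))
```

With these values p(0,0,0) = (1/3)(4/5)³ + (2/3)(1/10)³ = 257/1500, and the eight pattern probabilities sum to exactly 1 because every row of M does.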

    GenNon-h: Generating multiple sequence alignments on nonhomogeneous phylogenetic trees

    Background: A number of software packages are available to generate DNA multiple sequence alignments (MSAs) evolved under continuous-time Markov processes on phylogenetic trees. However, no method exists for simulating DNA MSAs directly from the transition matrices. Moreover, existing software is restricted to time-reversible models and is not optimized to generate nonhomogeneous data (i.e. data with distinct substitution rates on different lineages). Results: We present the first package designed to generate MSAs evolving under discrete-time Markov processes on phylogenetic trees, directly from probability substitution matrices. Given the input model and a phylogenetic tree in Newick format (with branch lengths measured as the expected number of substitutions per site), the algorithm produces DNA alignments of the desired length. GenNon-h is publicly available for download. Conclusion: The software presented here is an efficient tool to generate DNA MSAs on a given phylogenetic tree. GenNon-h provides the user with nonstationary or nonhomogeneous phylogenetic data that are well suited for testing complex biological hypotheses and for exploring the limits of reconstruction algorithms and their robustness to such models.
    Postprint (published version).
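The core idea of generating an alignment directly from substitution matrices can be sketched as follows: draw each root site from a root distribution, then push the sequence down every edge, replacing each state i by a state drawn from row i of that edge's matrix. This is not GenNon-h's code, input format, or API. The nested-tuple tree encoding, the helper names (`jc`, `evolve`, `simulate_msa`), and the matrix values, including the hypothetical GC-biased lineage that makes the process nonhomogeneous, are assumptions for illustration; the sketch also takes a matrix per edge instead of parsing a Newick tree with branch lengths.

```python
import random

BASES = "ACGT"

def jc(p):
    """Jukes-Cantor-style matrix: keep the base with prob 1 - p, else move to one of 3 others."""
    return [[1 - p if i == j else p / 3 for j in range(4)] for i in range(4)]

# Hypothetical GC-biased lineage; each row sums to 1.
gc_biased = [[0.70, 0.12, 0.12, 0.06],
             [0.03, 0.85, 0.09, 0.03],
             [0.03, 0.09, 0.85, 0.03],
             [0.06, 0.12, 0.12, 0.70]]

def sample(row, rng):
    """Draw a state index from a probability vector (one row of a Markov matrix)."""
    return rng.choices(range(len(row)), weights=row)[0]

def evolve(node, parent_seq, rng, out):
    """Push a parent sequence down one edge, site by site, then recurse.

    node = (name, P, children), where P[i][j] is the probability that state i
    at the top of the edge becomes state j at the bottom.
    """
    name, P, children = node
    seq = [sample(P[s], rng) for s in parent_seq]
    if not children:  # leaf: record its sequence in the alignment
        out[name] = "".join(BASES[s] for s in seq)
    for child in children:
        evolve(child, seq, rng, out)

def simulate_msa(root_dist, root_children, length, seed=0):
    """Return {leaf name: sequence} for an alignment of the given length."""
    rng = random.Random(seed)
    root_seq = [sample(root_dist, rng) for _ in range(length)]
    out = {}
    for child in root_children:
        evolve(child, root_seq, rng, out)
    return out

# Tree ((t1, t2), t3) with a different matrix on each edge.
tree = [
    ("internal", jc(0.05), [("t1", jc(0.1), []), ("t2", gc_biased, [])]),
    ("t3", jc(0.2), []),
]
msa = simulate_msa([0.25] * 4, tree, length=20, seed=1)
for name, seq in sorted(msa.items()):
    print(name, seq)
```

Setting an edge's matrix to the identity (for example `jc(0.0)`) copies the parent sequence unchanged, which makes a quick sanity check.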

    Algebraic models in phylogenetics (Models algebraics en filogenètica)

    In this article we give an introduction to the applications of algebraic geometry in phylogenetics. Because many of the evolutionary models used in phylogenetics correspond to algebraic varieties, the ideal associated with these varieties can be used to give a new approach to phylogenetic inference.
    Peer Reviewed.

    Genetics and algebraic geometry (Genètica i geometria algebraica)

    "... Algebraic varieties arise naturally when considering the statistical models used in genomics and phylogenetics. We will explain the relationship between these statistical models and algebraic geometry. We will also see how to use these algebraic varieties to recover the ancestral relationships between species, that is, to recover the phylogenetic tree."
    Factoria FM

    Invariant versus classical quartet inference when evolution is heterogeneous across sites and lineages

    One reason classical phylogenetic reconstruction methods fail to infer the correct topology is that they assume oversimplified models. In this article, we propose a quartet reconstruction method consistent with the most general Markov model of nucleotide substitution, which can also deal with data coming from mixtures on the same topology. Our proposed method uses phylogenetic invariants and provides a system of weights that can be used as input for quartet-based methods. We study its performance on real data and on a wide range of simulated 4-taxon data (both time-homogeneous and nonhomogeneous, with or without among-site rate heterogeneity, and with different branch length settings). We compare it to the classical methods of neighbor-joining (with paralinear distance), maximum likelihood (with different underlying models), and maximum parsimony. Our results show that this method is accurate and robust, performs similarly to maximum likelihood when the data satisfy the assumptions of both methods, and outperforms the other methods when these are based on inappropriate substitution models. If alignments are long enough, it also outperforms other methods when some of its assumptions are violated.
    Peer Reviewed. Postprint (author's final draft).
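A toy illustration of how invariants can score quartet topologies: for a 2-state general Markov model on the quartet 12|34, the 4 × 4 flattening of the joint distribution along the true split has rank at most 2, so its 3 × 3 minors are phylogenetic invariants that vanish, while along the other two splits the flattening generically has rank 4. The sketch evaluates these minors for each split and turns the scores into weights that sum to 1. The binary model, the parameter values, the helper names, and the weighting rule w_T = (S − s_T)/(2S) (with S the sum of the three scores) are illustrative assumptions; the paper works with nucleotides (4 states) and defines its own system of weights.

```python
from fractions import Fraction as F
from itertools import combinations, product

def quartet_distribution(pi, M, N):
    """Toy 2-state general Markov model on the quartet 12|34.

    Internal node a joins leaves 1, 2; internal node b joins leaves 3, 4.
    pi is the distribution at a, M the a->b matrix, N[i] the matrix on leaf edge i.
    """
    p = {}
    for x in product(range(2), repeat=4):
        p[x] = sum(pi[a] * M[a][b] * N[0][a][x[0]] * N[1][a][x[1]]
                   * N[2][b][x[2]] * N[3][b][x[3]]
                   for a in range(2) for b in range(2))
    return p

def flattening(p, split):
    """4x4 matrix: rows indexed by the states of split[0], columns by split[1]."""
    (i, j), (k, l) = split
    pairs = list(product(range(2), repeat=2))
    matrix = []
    for r in pairs:
        row = []
        for c in pairs:
            x = [0] * 4
            x[i], x[j] = r
            x[k], x[l] = c
            row.append(p[tuple(x)])
        matrix.append(row)
    return matrix

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def invariant_score(p, split):
    """Sum of squared 3x3 minors: zero exactly when the flattening has rank <= 2."""
    A = flattening(p, split)
    return sum(det3([[A[r][c] for c in cols] for r in rows]) ** 2
               for rows in combinations(range(4), 3)
               for cols in combinations(range(4), 3))

SPLITS = {"12|34": ((0, 1), (2, 3)), "13|24": ((0, 2), (1, 3)), "14|23": ((0, 3), (1, 2))}

pi = [F(1, 2), F(1, 2)]
M = [[F(3, 4), F(1, 4)], [F(1, 4), F(3, 4)]]    # internal edge a -> b
N = [[F(9, 10), F(1, 10)], [F(1, 5), F(4, 5)]]  # same matrix on every leaf edge
p = quartet_distribution(pi, M, [N] * 4)

scores = {name: invariant_score(p, s) for name, s in SPLITS.items()}
total = sum(scores.values())
# Hypothetical weighting: smaller invariant score -> larger weight; weights sum to 1.
weights = {name: (total - s) / (2 * total) for name, s in scores.items()}
for name in SPLITS:
    print(name, float(scores[name]), float(weights[name]))
```

With exact, noise-free data the true split scores exactly 0. With an empirical distribution from a finite alignment none of the scores is exactly 0, which is why a practical method needs a careful normalization of the invariant values.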

    A new phylogenetic reconstruction method based on invariants

    Attempts to use phylogenetic invariants for tree reconstruction were made in the late 1980s and early 1990s by several authors (the initial idea is due to Lake [Lake, 1987] and to Cavender and Felsenstein [Cavender and Felsenstein, 1987]). However, the efficiency of invariant-based methods is still in doubt ([Huelsenbeck, 1995], [Jin and Nei, 1990]), probably because these methods used only a few generators of the set of phylogenetic invariants. The method studied in this paper was first introduced in [Casanellas et al., 2005], and it is the first invariant-based method that uses the whole set of generators for DNA data. The simulation studies performed in this paper show that it is a very competitive and highly efficient phylogenetic reconstruction method, especially for nonhomogeneous phylogenetic trees.

    Relevant phylogenetic invariants of evolutionary models

    Recently there have been several attempts to provide a complete set of generators of the ideal of the algebraic variety associated with a phylogenetic tree evolving under an algebraic model. These algebraic varieties have proven useful in phylogenetics. In this paper we prove that, for phylogenetic reconstruction purposes, it is enough to consider generators coming from the edges of the tree, the so-called edge invariants. This is the algebraic analogue of Buneman's Splits Equivalence Theorem. The interest of this result lies in its potential applications in phylogenetics for widely used evolutionary models such as Jukes–Cantor, Kimura 2- and 3-parameter, and general Markov models.
    Preprint.
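Buneman's Splits Equivalence Theorem, the combinatorial result the abstract refers to, says that a collection of splits of a taxon set is the set of splits of some phylogenetic tree exactly when the splits are pairwise compatible, where A|B and C|D are compatible if at least one of A∩C, A∩D, B∩C, B∩D is empty. The short sketch below checks that condition. The function names and the five-taxon example are illustrative, and the sketch covers only the combinatorial side, not the edge invariants themselves.

```python
from itertools import combinations

def compatible(split1, split2, taxa):
    """Splits A|B and C|D of the same taxon set are compatible iff at least
    one of A&C, A&D, B&C, B&D is empty. Each split is given by one side."""
    A = frozenset(split1)
    B = taxa - A
    C = frozenset(split2)
    D = taxa - C
    return any(not (X & Y) for X in (A, B) for Y in (C, D))

def pairwise_compatible(splits, taxa):
    """True iff every pair of splits in the collection is compatible."""
    return all(compatible(s, t, taxa) for s, t in combinations(splits, 2))

taxa = frozenset("abcde")
# Nontrivial splits ab|cde and abc|de of the tree ((a,b),c,(d,e)).
tree_splits = [{"a", "b"}, {"a", "b", "c"}]
# ab|cde and ac|bde cannot both come from one tree.
conflicting = [{"a", "b"}, {"a", "c"}]
print(pairwise_compatible(tree_splits, taxa), pairwise_compatible(conflicting, taxa))
```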

    Local equations for equivariant evolutionary models

    Phylogenetic varieties related to equivariant substitution models have been studied extensively in recent years. One of the main objectives has been to find a set of generators of the ideal of these varieties, but this has not yet been achieved in some cases (for example, for the general Markov model this involves the open “salmon conjecture”, see [2]), and it is not clear how to use all generators in practice. Motivated by applications in biology, we tackle the problem from another point of view. The elements of the ideal that could be useful for applications in phylogenetics only need to describe the variety around certain points of no evolution (see [13]). We produce a collection of explicit equations that describe the variety on a Zariski open neighborhood of these points (see Theorem 5.4). Specifically, for any tree T on any number of leaves (and any degrees at the interior nodes) and for any equivariant model on any set of states, we compute the codimension of the corresponding phylogenetic variety. We prove that this variety is smooth at general points of no evolution and, if a mild technical condition is satisfied (the “d-claw tree hypothesis”), we provide an algorithm to produce a complete intersection that describes the variety around these points.
    Peer Reviewed. Postprint (author's final draft).