9 research outputs found

    Capturing a phylogenetic tree when the number of character states varies with the number of leaves

    We show that for any two values $\alpha, \beta > 0$ for which $\alpha + \beta > 1$, there is a value $N$ so that for all $n \geq N$ the following holds. For any binary phylogenetic tree $T$ on $n$ leaves there is a set of $\lfloor n^\alpha \rfloor$ characters that capture $T$, and for which each character takes at most $\lfloor n^\beta \rfloor$ distinct states. Here "capture" means that $T$ is the unique perfect phylogeny for these characters. Our short proof of this combinatorial result is based on the probabilistic method. Comment: 3 pages, 0 figures

    Predicting Horizontal Gene Transfers with Perfect Transfer Networks

    Horizontal gene transfer inference approaches are usually based on gene sequences: parametric methods search for patterns that deviate from a particular genomic signature, while phylogenetic methods use sequences to reconstruct the gene and species trees. However, it is well known that sequences have difficulty identifying ancient transfers since mutations have enough time to erase all evidence of such events. In this work, we ask whether character-based methods can predict gene transfers. Their advantage over sequences is that homologous genes can have low DNA similarity, but still have retained enough important common motifs that allow them to have common character traits, for instance the same functional or expression profile. A phylogeny that has two separate clades that acquired the same character independently might indicate the presence of a transfer even in the absence of sequence similarity. We introduce perfect transfer networks, which are phylogenetic networks that can explain the character diversity of a set of taxa. This problem has been studied extensively in the form of ancestral recombination networks, but these only model hybridization events and do not differentiate between direct parents and lateral donors. We focus on tree-based networks, in which edges representing vertical descent are clearly distinguished from those that represent horizontal transmission. Our model is a direct generalization of perfect phylogeny models to such networks. Our goal is to initiate a study on the structural and algorithmic properties of perfect transfer networks. We then show that in polynomial time, one can decide whether a given network is a valid explanation for a set of taxa, and show how, for a given tree, one can add transfer edges to it so that it explains a set of taxa.

    Predicting Horizontal Gene Transfers with Perfect Transfer Networks

    Horizontal gene transfer inference approaches are usually based on gene sequences: parametric methods search for patterns that deviate from a particular genomic signature, while phylogenetic methods use sequences to reconstruct the gene and species trees. However, it is well known that sequences have difficulty identifying ancient transfers since mutations have enough time to erase all evidence of such events. In this work, we ask whether character-based methods can predict gene transfers. Their advantage over sequences is that homologous genes can have low DNA similarity, but still have retained enough important common motifs that allow them to have common character traits, for instance the same functional or expression profile. A phylogeny that has two separate clades that acquired the same character independently might indicate the presence of a transfer even in the absence of sequence similarity. We introduce perfect transfer networks, which are phylogenetic networks that can explain the character diversity of a set of taxa under the assumption that characters have unique births, and that once a character is gained it is rarely lost. Examples of such traits include transposable elements, biochemical markers and emergence of organelles, just to name a few. We study the differences between our model and two similar models: perfect phylogenetic networks and ancestral recombination networks. Our goals are to initiate a study on the structural and algorithmic properties of perfect transfer networks. We then show that in polynomial time, one can decide whether a given network is a valid explanation for a set of taxa, and show how, for a given tree, one can add transfer edges to it so that it explains a set of taxa. We finally provide lower and upper bounds on the number of transfers required to explain a set of taxa, in the worst case.
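
    As a point of comparison for the abstract above, the classical tree-only case can be sketched in a few lines. The code below is not the paper's network algorithm; it is the standard pairwise test for a rooted perfect phylogeny, assuming every character is absent at the root, is gained exactly once, and is never lost. Under those assumptions a tree explains the data exactly when the sets of taxa carrying each character are pairwise disjoint or nested. The function names and the toy data are hypothetical.

```python
from itertools import combinations


def conflicting_pairs(characters):
    """Return the pairs of characters that no single rooted tree can explain.

    `characters` maps a character name to the set of taxa possessing it.
    Assumption: each character is absent at the root, gained once, never lost.
    Two characters conflict when their taxon sets overlap without nesting.
    """
    conflicts = []
    for (c, a), (d, b) in combinations(characters.items(), 2):
        overlap = bool(a & b)
        nested = a <= b or b <= a
        if overlap and not nested:
            conflicts.append((c, d))
    return conflicts


def admits_perfect_phylogeny(characters):
    """True when the taxon sets are pairwise disjoint or nested."""
    return not conflicting_pairs(characters)


# Toy data (hypothetical): taxa A-D and three compatible characters.
traits = {"c1": {"A", "B"}, "c2": {"A", "B", "C"}, "c3": {"D"}}
tree_like = admits_perfect_phylogeny(traits)

# Adding a character shared by B and D breaks the nesting structure.
traits_with_conflict = dict(traits, c4={"B", "D"})
conflicts = conflicting_pairs(traits_with_conflict)
```

    When the test fails, no tree alone fits the characters under these assumptions, which is the kind of situation where the abstract proposes adding transfer edges.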

    Defining a Phylogenetic Tree with the Minimum Number of $r$-State Characters


    Discrete and statistical approaches to genetics

    This thesis presents a number of major innovations in related but different areas of research. The contributions range along a continuum from mathematical phylogenetics, to the development of statistical methodology for detecting recombination, and finally to the application of statistical techniques to understand Feline Immunodeficiency Virus (FIV), an important pathogen. An underlying theme is the application of combinatorial and statistical ideas to problems in evolutionary biology and genetics. Chapter 2 and Chapter 3 give a number of results relevant to mathematical phylogenetics, in particular maximum parsimony. Chapter 2 presents a new formulation of maximum parsimony in terms of character subdivision, providing a direct link with the character compatibility problem, also known as the perfect phylogeny problem. Specialization of this result to two characters gives a simple formula based on the intersection graph for calculating the parsimony score for a pair of characters. Chapter 3 further explores maximum parsimony. In particular, it is shown that a maximum parsimony tree for a sequence of characters minimizes a subtree prune and regraft (SPR) distance to the sets of trees on which each character is convex. Similar connections are also drawn between the Robinson-Foulds distance and a new variant of Dollo parsimony. Chapter 4 presents an application of the work in Chapters 2 and 3 to develop a statistical test for detecting recombination. An extensive coalescent-based simulation study shows that this new test is both robust and powerful in a variety of different circumstances compared to a number of current methods. In fact, a simple model of mutation rate correlation is shown to mislead a number of competing tests, causing recombination to be falsely inferred. Analysis of empirical data sets confirms that the new test is one of the best approaches to distinguish recurrent mutation from recombination. Finally, Chapter 5 uses the test developed in Chapter 4 to localize recombinant breakpoints in 14 genomic strains of FIV taken from a wild population of cougars. Based on the technique, three recombinant strains of FIV are identified. Previous studies have focused on the epidemiology and population structure of the virus, and this study shows that recombination has also played an important role in the evolution of FIV.

    Tree Reconstruction From Multi-State Characters

    In evolutionary biology, a character is a function $\chi$ from a set $X$ of present-day species into a finite set of states. Suppose the species in $X$ have evolved according to a bifurcating tree $T$. Biologists would like to use characters to infer this tree. Assume that $\chi$ is the result of an evolutionary process on $T$ that has not involved reverse or parallel transitions; such characters are called homoplasy-free. In this case, $\chi$ provides direct combinatorial information about the underlying evolutionary tree $T$ for $X$. We consider the question of how many homoplasy-free characters are required so that $T$ can be correctly reconstructed. We first establish lower bounds showing that, when the number of states is bounded, the number of homoplasy-free characters required to reconstruct $T$ grows (at least) linearly with the size of $X$. In contrast, our main result shows that, when the state space is sufficiently large, every bifurcating tree can be uniquely determined by just five homoplasy-free characters. We briefly describe the relevance of this result for some new types of genomic data, and for the amalgamation of evolutionary trees.
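
    The homoplasy-free condition in the abstract above can be checked mechanically. The sketch below relies on the standard characterization that a character is homoplasy-free (convex) on a tree exactly when the minimal subtrees connecting the leaves of each state are pairwise vertex-disjoint. The adjacency-dict tree encoding, the function names, and the four-leaf example are assumptions made for illustration, not taken from the paper.

```python
from itertools import combinations


def spanning_subtree(adj, targets):
    """Vertices of the minimal subtree of the tree `adj` connecting `targets`.

    `adj` maps each vertex to the set of its neighbours (an unrooted tree).
    Leaves that are not targets are pruned repeatedly until none remain.
    """
    targets = set(targets)
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    alive = set(adj)
    stack = [v for v in adj if degree[v] <= 1 and v not in targets]
    while stack:
        v = stack.pop()
        if v not in alive:
            continue  # already pruned via an earlier stack entry
        alive.remove(v)
        for u in adj[v]:
            if u in alive:
                degree[u] -= 1
                if degree[u] <= 1 and u not in targets:
                    stack.append(u)
    return alive


def is_homoplasy_free(adj, character):
    """True when the per-state spanning subtrees are pairwise vertex-disjoint.

    `character` maps each leaf to its state.
    """
    leaves_by_state = {}
    for leaf, state in character.items():
        leaves_by_state.setdefault(state, []).append(leaf)
    subtrees = [spanning_subtree(adj, ls) for ls in leaves_by_state.values()]
    return all(s.isdisjoint(t) for s, t in combinations(subtrees, 2))


# Hypothetical quartet tree ab|cd with internal vertices u and v.
quartet = {
    "a": {"u"}, "b": {"u"}, "c": {"v"}, "d": {"v"},
    "u": {"a", "b", "v"}, "v": {"c", "d", "u"},
}
fits = is_homoplasy_free(quartet, {"a": 0, "b": 0, "c": 1, "d": 1})
clashes = is_homoplasy_free(quartet, {"a": 0, "b": 1, "c": 0, "d": 1})
```

    In the second character, both states need the internal edge between u and v, so the two spanning subtrees share vertices and the character is not homoplasy-free on this tree.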