Capturing a phylogenetic tree when the number of character states varies with the number of leaves
We show that for any two values $\alpha, \beta > 0$ for which $\alpha + \beta > 1$
there is a value $N$ so that for all $n \geq N$ the
following holds. For any binary phylogenetic tree $T$ on $n$ leaves there is a
set of $\lfloor n^\alpha \rfloor$ characters that capture $T$, and for which
each character takes at most $\lfloor n^\beta \rfloor$ distinct states. Here
`capture' means that $T$ is the unique perfect phylogeny for these characters.
Our short proof of this combinatorial result is based on the probabilistic
method.
Comment: 3 pages, 0 figures
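The notion of a perfect phylogeny in the abstract above can be made concrete for a pair of characters. A minimal sketch, assuming the standard criterion that two multi-state characters admit a common perfect phylogeny exactly when their partition intersection graph is acyclic. The function names and the dict-based encoding of a character (taxon to state) are illustrative assumptions, not taken from the paper. Pairwise compatibility is necessary but, for multi-state characters, not sufficient for a whole set of characters to have a perfect phylogeny.

```python
def partition_intersection_graph(char_a, char_b):
    """Build the partition intersection graph of two characters.

    Each character is a dict mapping taxon -> state (hypothetical encoding).
    Vertices are the states of each character; a state of char_a is joined
    to a state of char_b when at least one taxon carries both.
    """
    taxa = char_a.keys() & char_b.keys()
    vertices = {("a", char_a[t]) for t in taxa} | {("b", char_b[t]) for t in taxa}
    # A set collapses parallel edges produced by taxa with the same state pair.
    edges = {(("a", char_a[t]), ("b", char_b[t])) for t in taxa}
    return vertices, edges


def pairwise_compatible(char_a, char_b):
    """Return True when the partition intersection graph is a forest."""
    vertices, edges = partition_intersection_graph(char_a, char_b)
    parent = {v: v for v in vertices}

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, w in edges:
        root_u, root_w = find(u), find(w)
        if root_u == root_w:
            return False  # this edge would close a cycle
        parent[root_u] = root_w
    return True
```

For example, the binary characters inducing the splits 12|34 and 13|24 on four taxa give a 4-cycle in the graph and are reported as incompatible.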
Predicting Horizontal Gene Transfers with Perfect Transfer Networks
Horizontal gene transfer inference approaches are usually based on gene sequences: parametric methods search for patterns that deviate from a particular genomic signature, while phylogenetic methods use sequences to reconstruct the gene and species trees. However, it is well known that sequences have difficulty identifying ancient transfers since mutations have enough time to erase all evidence of such events. In this work, we ask whether character-based methods can predict gene transfers. Their advantage over sequences is that homologous genes can have low DNA similarity, but still have retained enough important common motifs that allow them to have common character traits, for instance the same functional or expression profile. A phylogeny that has two separate clades that acquired the same character independently might indicate the presence of a transfer even in the absence of sequence similarity.
We introduce perfect transfer networks, which are phylogenetic networks that can explain the character diversity of a set of taxa. This problem has been studied extensively in the form of ancestral recombination networks, but these only model hybridization events and do not differentiate between direct parents and lateral donors. We focus on tree-based networks, in which edges representing vertical descent are clearly distinguished from those that represent horizontal transmission. Our model is a direct generalization of perfect phylogeny models to such networks. Our goal is to initiate a study of the structural and algorithmic properties of perfect transfer networks. We then show that in polynomial time, one can decide whether a given network is a valid explanation for a set of taxa, and show how, for a given tree, one can add transfer edges to it so that it explains a set of taxa.
Predicting Horizontal Gene Transfers with Perfect Transfer Networks
Horizontal gene transfer inference approaches are usually based on gene
sequences: parametric methods search for patterns that deviate from a
particular genomic signature, while phylogenetic methods use sequences to
reconstruct the gene and species trees. However, it is well known that
sequences have difficulty identifying ancient transfers since mutations have
enough time to erase all evidence of such events. In this work, we ask whether
character-based methods can predict gene transfers. Their advantage over
sequences is that homologous genes can have low DNA similarity, but still have
retained enough important common motifs that allow them to have common
character traits, for instance the same functional or expression profile. A
phylogeny that has two separate clades that acquired the same character
independently might indicate the presence of a transfer even in the absence of
sequence similarity. We introduce perfect transfer networks, which are
phylogenetic networks that can explain the character diversity of a set of taxa
under the assumption that characters have unique births, and that once a
character is gained it is rarely lost. Examples of such traits include
transposable elements, biochemical markers and emergence of organelles, just to
name a few. We study the differences between our model and two similar models:
perfect phylogenetic networks and ancestral recombination networks. Our goal
is to initiate a study of the structural and algorithmic properties of perfect
transfer networks. We then show that in polynomial time, one can decide whether
a given network is a valid explanation for a set of taxa, and show how, for a
given tree, one can add transfer edges to it so that it explains a set of taxa.
We finally provide lower and upper bounds on the number of transfers required
to explain a set of taxa, in the worst case.
Discrete and statistical approaches to genetics
This thesis presents a number of major innovations in related but different areas of research. The contributions range along a continuum from mathematical phylogenetics, to the development of statistical methodology for detecting recombination, and finally to the application of statistical techniques to understand Feline Immunodeficiency Virus (FIV), an important pathogen. An underlying theme is the application of combinatorial and statistical ideas to problems in evolutionary biology and genetics.
Chapter 2 and Chapter 3 give a number of results relevant to mathematical phylogenetics, in particular maximum parsimony. Chapter 2 presents a new formulation of maximum parsimony in terms of character subdivision, providing a direct link with the character compatibility problem, also known as the perfect phylogeny problem. Specialization of this result to two characters gives a simple formula based on the intersection graph for calculating the parsimony score for a pair of characters. Chapter 3 further explores maximum parsimony. In particular, it is shown that a maximum parsimony tree for a sequence of characters minimizes a subtree prune and regraft (SPR) distance to the sets of trees on which each character is convex. Similar connections are also drawn between the Robinson-Foulds distance and a new variant of Dollo parsimony.
Chapter 4 presents an application of the work in Chapters 2 and 3 to develop a statistical test for detecting recombination. An extensive coalescent-based simulation study shows that this new test is both robust and powerful in a variety of different circumstances compared to a number of current methods. In fact, a simple model of mutation rate correlation is shown to mislead a number of competing tests, causing recombination to be falsely inferred. Analysis of empirical data sets confirms that the new test is one of the best approaches to distinguish recurrent mutation from recombination.
Finally, Chapter 5 uses the test developed in Chapter 4 to localize recombinant breakpoints in 14 genomic strains of FIV taken from a wild population of cougars. Based on the technique, three recombinant strains of FIV are identified. Previous studies have focused on the epidemiology and population structure of the virus, and this study shows that recombination has also played an important role in the evolution of FIV.
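The parsimony score and the notion of a character being convex on a tree, both central to the thesis abstract above, can be illustrated with Fitch's algorithm. This is a minimal sketch of that standard textbook technique, not the character-subdivision formulation the thesis introduces. The nested-tuple tree encoding and the function names are assumptions made for illustration, and the character is assumed to assign a state to exactly the leaves of the tree.

```python
def fitch_score(tree, character):
    """Minimum number of state changes for a character on a rooted binary tree.

    tree: nested 2-tuples with leaf labels at the tips, e.g. (("a", "b"), "c").
    character: dict mapping each leaf label to its state.
    """
    changes = 0

    def state_set(node):
        nonlocal changes
        if not isinstance(node, tuple):
            return {character[node]}  # leaf: its observed state
        left, right = (state_set(child) for child in node)
        common = left & right
        if common:
            return common  # children agree on some state, no change needed
        changes += 1  # children disagree, one change is forced here
        return left | right

    state_set(tree)
    return changes


def is_convex(tree, character):
    """A character is convex (homoplasy-free) on the tree exactly when its
    parsimony score equals the number of states minus one."""
    return fitch_score(tree, character) == len(set(character.values())) - 1
```

On the tree `(("a", "b"), ("c", "d"))`, a character grouping a with b and c with d is convex, while one grouping a with c and b with d needs two changes and is not.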
Tree Reconstruction From Multi-State Characters
In evolutionary biology, a character is a function χ from a set X of present-day species into a finite set of states. Suppose the species in X have evolved according to a bifurcating tree T. Biologists would like to use characters to infer this tree. Assume that χ is the result of an evolutionary process on T that has not involved reverse or parallel transitions; such characters are called homoplasy-free. In this case, χ provides direct combinatorial information about the underlying evolutionary tree T for X. We consider the question of how many homoplasy-free characters are required so that T can be correctly reconstructed. We first establish lower bounds showing that, when the number of states is bounded, the number of homoplasy-free characters required to reconstruct T grows (at least) linearly with the size of X. In contrast, our main result shows that, when the state space is sufficiently large, every bifurcating tree can be uniquely determined by just five homoplasy-free characters. We briefly describe the relevance of this result for some new types of genomic data, and for the amalgamation of evolutionary trees.
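The linear lower bound mentioned in the abstract above can be motivated by a short counting argument. The following is a sketch of the usual reasoning, not a reproduction of the paper's proof, and the symbols n (number of species in X), r (maximum number of states per character) and k (number of characters) are introduced here for illustration. It assumes that a homoplasy-free character with at most r states can distinguish at most r - 1 interior edges of T, and that every interior edge must be distinguished by some character for T to be determined.

```latex
% n = number of leaves, r = maximum number of states, k = number of characters
% (all three symbols are assumptions introduced for this sketch).
\begin{align*}
\#\{\text{interior edges of a bifurcating tree } T\} &= n - 3, \\
\#\{\text{interior edges distinguished by one character}\} &\le r - 1, \\
k\,(r - 1) \ge n - 3 \quad &\Longrightarrow \quad k \ge \left\lceil \frac{n - 3}{r - 1} \right\rceil .
\end{align*}
% Example: n = 10 leaves and binary characters (r = 2) give k >= 7.
```

For fixed r the bound grows linearly in n, and it becomes weak only when r is allowed to grow with n, which is the regime in which a constant number of characters can suffice.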