2,808 research outputs found

    Detection of recombination in DNA multiple alignments with hidden markov models

    Conventional phylogenetic tree estimation methods assume that all sites in a DNA multiple alignment have the same evolutionary history. This assumption is violated in data sets from certain bacteria and viruses due to recombination, a process that creates mosaic sequences from different strains and, if undetected, causes systematic errors in phylogenetic tree estimation. In the current work, a hidden Markov model (HMM) is employed to detect recombination events in multiple alignments of DNA sequences. The emission probabilities in a given state are determined by the branching order (topology) and the branch lengths of the respective phylogenetic tree, while the transition probabilities depend on the global recombination probability. The present study improves on an earlier heuristic parameter optimization scheme and shows how the branch lengths and the recombination probability can be optimized in a maximum likelihood sense by applying the expectation maximization (EM) algorithm. The novel algorithm is tested on a synthetic benchmark problem and is found to clearly outperform the earlier heuristic approach. The paper concludes with an application of this scheme to a DNA sequence alignment of the argF gene from four Neisseria strains, where a likely recombination event is clearly detected.
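    The HMM described above can be illustrated with a minimal sketch, assuming the per-column likelihood under each candidate topology has already been computed (for example with Felsenstein's pruning algorithm, not shown). The function names and the "stay with probability 1 - recomb_prob, otherwise switch uniformly" transition scheme are our assumptions, not necessarily the paper's exact parameterization. The sketch shows how one recombination probability fixes the transition matrix and how scaled forward-backward posterior decoding locates a topology switch; the EM re-estimation of branch lengths and the recombination probability is not shown.

```python
import math
from typing import List, Tuple


def transition_matrix(num_topologies: int, recomb_prob: float) -> List[List[float]]:
    """Keep the current topology with probability 1 - recomb_prob, otherwise
    jump uniformly to one of the other topologies (assumed parameterization)."""
    off = recomb_prob / (num_topologies - 1)
    return [[1.0 - recomb_prob if i == j else off for j in range(num_topologies)]
            for i in range(num_topologies)]


def posterior_topologies(site_lik: List[List[float]],
                         recomb_prob: float) -> Tuple[List[List[float]], float]:
    """Scaled forward-backward algorithm.

    site_lik[t][s] is P(alignment column t | topology s); these values are
    assumed to be precomputed. Returns the per-site posterior over topologies
    and the log-likelihood of the whole alignment.
    """
    n, k = len(site_lik), len(site_lik[0])
    A = transition_matrix(k, recomb_prob)

    # Forward pass: normalize at every site to avoid numerical underflow.
    alpha, scales = [], []
    for t in range(n):
        if t == 0:
            a = [site_lik[0][s] / k for s in range(k)]  # uniform start distribution
        else:
            prev = alpha[-1]
            a = [sum(prev[r] * A[r][s] for r in range(k)) * site_lik[t][s]
                 for s in range(k)]
        c = sum(a)
        alpha.append([x / c for x in a])
        scales.append(c)

    # Backward pass, reusing the forward scaling constants.
    beta = [[1.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        beta[t] = [sum(A[s][r] * site_lik[t + 1][r] * beta[t + 1][r] for r in range(k))
                   / scales[t + 1] for s in range(k)]

    # Posterior P(topology s at site t | all sites), renormalized for safety.
    posterior = []
    for t in range(n):
        g = [alpha[t][s] * beta[t][s] for s in range(k)]
        z = sum(g)
        posterior.append([x / z for x in g])
    return posterior, sum(math.log(c) for c in scales)


if __name__ == "__main__":
    # Toy input: 3 topologies (the number of unrooted binary trees for four taxa);
    # the first three columns favour topology 0, the last three topology 1.
    site_lik = [[0.9, 0.05, 0.05]] * 3 + [[0.05, 0.9, 0.05]] * 3
    post, loglik = posterior_topologies(site_lik, recomb_prob=0.1)
    print([max(range(3), key=lambda s: row[s]) for row in post])  # [0, 0, 0, 1, 1, 1]
```

    A small recombination probability penalizes each switch, so the decoded topology changes only where the per-site evidence consistently favours another tree.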

    The inference of gene trees with species trees

    Molecular phylogeny has focused mainly on improving models for the reconstruction of gene trees based on sequence alignments. Yet, most phylogeneticists seek to reveal the history of species. Although the histories of genes and species are tightly linked, they are seldom identical, because genes duplicate, are lost or horizontally transferred, and because alleles can co-exist in populations for periods that may span several speciation events. Building models describing the relationship between gene and species trees can thus improve the reconstruction of gene trees when a species tree is known, and vice versa. Several approaches have been proposed to solve the problem in one direction or the other, but in general neither gene trees nor species trees are known. Only a few studies have attempted to jointly infer gene trees and species trees. In this article we review the various models that have been used to describe the relationship between gene trees and species trees. These models account for gene duplication and loss, transfer or incomplete lineage sorting. Some of them consider several types of events together, but none exists currently that considers the full repertoire of processes that generate gene trees along the species tree. Simulations as well as empirical studies on genomic data show that combining gene tree-species tree models with models of sequence evolution improves gene tree reconstruction. In turn, these better gene trees provide a better basis for studying genome evolution or reconstructing ancestral chromosomes and ancestral gene sequences. We predict that gene tree-species tree methods that can deal with genomic data sets will be instrumental to advancing our understanding of genomic evolution. Comment: Review article in relation to the "Mathematical and Computational Evolutionary Biology" conference, Montpellier, 201

    Phylogenetic networks: A tool to display character conflict and demographic history

    Evolutionary trees assume that evolution and phylogeny can be represented in a strictly bifurcating manner: from one ancestral taxon, two descendant taxa emerge. However, hybridization, recombination and horizontal gene transfer conflict with this simple concept. In such cases, evolutionary lineages do not only separate from each other but can also merge again; such events are called reticulations. Consequently, networks can represent evolutionary events more realistically than phylogenetic trees. Networks can display alternative topologies and the co-existence of ancestors and descendants, which are otherwise not obvious when several single trees or a consensus tree are compared. Networks can therefore visualize the conflicting information in a given data set. Moreover, the distribution, frequencies and arrangement of haplotypes in populations can reveal the phylogenetic histories of taxa when interpreted in light of predictions from coalescent theory. This review aims to: (1) give a brief comparison between phylogenetic trees and networks, (2) provide the overall concept of coalescent theory, (3) clarify how phylogenetic networks can be used to display conflicting data and evaluate phylogenetic histories, and (4) offer a useful starting point and guide for sequence analysis aimed at discovering population dynamics. Key words: Phylogenetic networks, reticulation, coalescent theory, population history, character conflict

    Predicting Horizontal Gene Transfers with Perfect Transfer Networks

    Horizontal gene transfer inference approaches are usually based on gene sequences: parametric methods search for patterns that deviate from a particular genomic signature, while phylogenetic methods use sequences to reconstruct the gene and species trees. However, it is well known that sequences have difficulty identifying ancient transfers, since mutations have had enough time to erase all evidence of such events. In this work, we ask whether character-based methods can predict gene transfers. Their advantage over sequences is that homologous genes can have low DNA similarity but still retain enough important common motifs to share character traits, for instance the same function or expression profile. A phylogeny with two separate clades that acquired the same character independently might indicate a transfer even in the absence of sequence similarity. We introduce perfect transfer networks, which are phylogenetic networks that can explain the character diversity of a set of taxa. This problem has been studied extensively in the form of ancestral recombination networks, but these only model hybridization events and do not differentiate between direct parents and lateral donors. We focus on tree-based networks, in which edges representing vertical descent are clearly distinguished from those that represent horizontal transmission. Our model is a direct generalization of perfect phylogeny models to such networks. Our goal is to initiate a study of the structural and algorithmic properties of perfect transfer networks. We then show that one can decide in polynomial time whether a given network is a valid explanation for a set of taxa, and show how, for a given tree, one can add transfer edges to it so that it explains a set of taxa.
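    The intuition that a character shared by separate clades may signal a transfer can be sketched on a plain rooted tree. This is a deliberately simplified model, not the paper's perfect transfer network algorithm: we assume the character is gained once, never lost, and that a transfer is inherited by every descendant of the recipient, while ignoring time consistency of donors. Under those assumptions, a character is a perfect-phylogeny character exactly when its taxa form one clade, and each additional maximal clade carrying it needs one transfer. The nested-tuple tree encoding and the function names are hypothetical.

```python
from typing import FrozenSet, List, Tuple, Union

# A rooted tree as nested tuples; leaves are taxon names.
Tree = Union[str, Tuple["Tree", ...]]


def leaves(tree: Tree) -> FrozenSet[str]:
    """All taxa below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def maximal_clades(tree: Tree, has_char: FrozenSet[str]) -> List[FrozenSet[str]]:
    """Largest clades in which every taxon carries the character."""
    clade = leaves(tree)
    if clade <= has_char:
        return [clade]
    if isinstance(tree, str):
        return []
    found: List[FrozenSet[str]] = []
    for child in tree:
        found.extend(maximal_clades(child, has_char))
    return found


def min_transfers(tree: Tree, has_char: FrozenSet[str]) -> int:
    """Transfers needed if the character is gained once, never lost, and every
    transfer is inherited by all descendants of the recipient (a simplified
    model; time consistency of donors is ignored)."""
    if not has_char:
        return 0
    return len(maximal_clades(tree, has_char)) - 1


tree = (("A", "B"), ("C", ("D", "E")))
print(min_transfers(tree, frozenset({"A", "B"})))       # 0: a perfect-phylogeny character
print(min_transfers(tree, frozenset({"A", "D", "E"})))  # 1: two separate clades share it
```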

    A Network Approach to Analyzing Highly Recombinant Malaria Parasite Genes

    The var genes of the human malaria parasite Plasmodium falciparum present a challenge to population geneticists due to their extreme diversity, which is generated by high rates of recombination. These genes encode a primary antigen protein called PfEMP1, which is expressed on the surface of infected red blood cells and elicits protective immune responses. Var gene sequences are characterized by pronounced mosaicism, precluding the use of traditional phylogenetic tools that require bifurcating tree-like evolutionary relationships. We present a new method that identifies highly variable regions (HVRs), and then maps each HVR to a complex network in which each sequence is a node and two nodes are linked if they share an exact match of significant length. Here, networks of var genes that recombine freely are expected to have a uniformly random structure, but constraints on recombination will produce network communities that we identify using a stochastic block model. We validate this method on synthetic data, showing that it correctly recovers populations of constrained recombination, before applying it to the Duffy Binding Like-α (DBLα) domain of var genes. We find nine HVRs whose network communities map in distinctive ways to known DBLα classifications and clinical phenotypes. We show that the recombinational constraints of some HVRs are correlated, while others are independent. These findings suggest that this micromodular structuring facilitates independent evolutionary trajectories of neighboring mosaic regions, allowing the parasite to retain protein function while generating enormous sequence diversity. Our approach therefore offers a rigorous method for analyzing evolutionary constraints in var genes, and is also flexible enough to be easily applied more generally to any highly recombinant sequences.
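    The network construction step (sequences as nodes, an edge when two sequences share an exact match of significant length) can be sketched as follows. The threshold `min_len` stands in for "significant length" and is a hypothetical parameter; HVR identification and the stochastic block model community detection are not shown. The sketch relies on the fact that any exact match of length at least `min_len` contains a substring of length exactly `min_len`, so indexing every `min_len`-mer is enough and avoids comparing all pairs of sequences directly.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Set, Tuple


def shared_match_network(seqs: Dict[str, str], min_len: int) -> Set[Tuple[str, str]]:
    """Link two sequences if they share an exact match of at least min_len.

    Any exact match of length >= min_len contains a substring of length exactly
    min_len, so indexing every min_len-mer is sufficient.
    """
    index: Dict[str, Set[str]] = defaultdict(set)
    for name, seq in seqs.items():
        for i in range(len(seq) - min_len + 1):
            index[seq[i:i + min_len]].add(name)

    edges: Set[Tuple[str, str]] = set()
    for names in index.values():
        # Every pair of sequences containing the same min_len-mer gets an edge.
        for a, b in combinations(sorted(names), 2):
            edges.add((a, b))
    return edges


seqs = {"s1": "ACDEFGHIK", "s2": "XXDEFGHYY", "s3": "MNPQRSTVW"}
print(shared_match_network(seqs, 5))  # {('s1', 's2')}: both contain "DEFGH"
print(shared_match_network(seqs, 6))  # set(): the longest shared match has length 5
```

    The resulting edge set could then be passed to a community detection method; the choice of `min_len` controls how sparse the network is.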

    Consequences of genetic recombination on protein folding stability

    Genetic recombination is a common evolutionary mechanism that produces molecular diversity. However, its consequences for protein folding stability have not attracted the same attention as those of point mutations. Here, we studied the effects of homologous recombination on the computationally predicted folding stability of several protein families, finding less detrimental effects than we had expected. Although recombination can affect multiple protein sites, we found that the fraction of recombined proteins eliminated by negative selection because of insufficient stability is not significantly larger than the corresponding fraction of proteins produced by mutation events. Indeed, although recombination disrupts epistatic interactions, the mean stability of recombinant proteins is not lower than that of their parents. On the other hand, differences in stability among recombined proteins are amplified relative to their parents, promoting phenotypic diversity. As a result, at least one third of recombined proteins have a stability intermediate between those of their parents, and a substantial fraction are more stable or less stable than both parents. As expected, parents with similar sequences tend to produce recombined proteins with stability close to that of the parents. Finally, simulating protein evolution along the ancestral recombination graph with empirical substitution models commonly used in phylogenetics, which ignore constraints on protein folding stability, showed that recombination favors a decrease in folding stability. This supports adopting structurally constrained models, when possible, to infer protein evolutionary histories with recombination.
    Funding: Agencia Estatal de Investigación, Ref. PID2019-107931GA-I00/AEI/10.13039/501100011033; Agencia Estatal de Investigación, Ref. PID2019-109041GBC22/10.13039/501100011033; Ministerio de Economía y Competitividad, Ref. RYC-2015-18241. Funded for open-access publication: Universidade de Vigo/CISU

    The inference of gene trees with species trees.

    This article reviews the various models that have been used to describe the relationships between gene trees and species trees. Molecular phylogeny has focused mainly on improving models for the reconstruction of gene trees based on sequence alignments. Yet, most phylogeneticists seek to reveal the history of species. Although the histories of genes and species are tightly linked, they are seldom identical, because genes duplicate, are lost or horizontally transferred, and because alleles can coexist in populations for periods that may span several speciation events. Building models describing the relationship between gene and species trees can thus improve the reconstruction of gene trees when a species tree is known, and vice versa. Several approaches have been proposed to solve the problem in one direction or the other, but in general neither gene trees nor species trees are known. Only a few studies have attempted to jointly infer gene trees and species trees. The models reviewed account for gene duplication and loss, transfer or incomplete lineage sorting. Some of them consider several types of events together, but none exists currently that considers the full repertoire of processes that generate gene trees along the species tree. Simulations as well as empirical studies on genomic data show that combining gene tree-species tree models with models of sequence evolution improves gene tree reconstruction. In turn, these better gene trees provide a more reliable basis for studying genome evolution or reconstructing ancestral chromosomes and ancestral gene sequences. We predict that gene tree-species tree methods that can deal with genomic data sets will be instrumental to advancing our understanding of genomic evolution.

    Network Analysis of Non-treelike Patterns in Evolution

    Molecular phylogenetic analysis of host use and biogeography within the genus Rhinusa and the related genus Gymnetron (Coleoptera : Curculionidae)

    EThOS - Electronic Theses Online Service, GB, United Kingdom

    Towards a Processual Microbial Ontology

    Standard microbial evolutionary ontology is organized according to a nested hierarchy of entities at various levels of biological organization. It typically detects and defines these entities in relation to the most stable aspects of evolutionary processes, by identifying lineages evolving by a process of vertical inheritance from an ancestral entity. However, recent advances in microbiology indicate that such an ontology has important limitations. The various dynamics detected within microbiological systems reveal that a focus on the most stable entities (or features of entities) over time inevitably underestimates the extent and nature of microbial diversity. These dynamics are not the outcome of the process of vertical descent alone. Other processes, often involving causal interactions between entities from distinct levels of biological organisation, or operating at different time scales, are responsible not only for the destabilisation of pre-existing entities, but also for the emergence and stabilisation of novel entities in the microbial world. In this article we consider microbial entities as more or less stabilised functional wholes, and sketch a network-based ontology that can represent a diverse set of processes including, for example, phylogenetic relations as well as interactions that stabilise or destabilise the interacting entities, spatial relations, ecological connections, and genetic exchanges. We use this pluralistic framework (i) for evaluating the existing ontological assumptions in evolution (e.g. whether currently recognized entities are adequate for understanding the causes of change and stabilisation in the microbial world), and (ii) for identifying hidden ontological kinds, essentially invisible from within a more limited perspective. We propose to recognize additional classes of entities that provide new insights into the structure of the microbial world, namely “processually equivalent” entities, “processually versatile” entities, and “stabilized” entities.
    Funding: Economic and Social Research Council, UK