5,432 research outputs found

Ambiguity Coding Allows Accurate Inference of Evolutionary Parameters from Alignments in an Aggregated State-Space

Author: Casey D
Goldman N
Perron U
Weber CC
Yang Z
Publication date: 30/04/2020

How can we best learn the history of a protein’s evolution? Ideally, a model of sequence evolution should capture both the process that generates genetic variation and the functional constraints determining which changes are fixed. However, in practical terms the most suitable approach may simply be the one that combines the convenience of easily available input data with the ability to return useful parameter estimates. For example, we might be interested in a measure of the strength of selection (typically obtained using a codon model) or an ancestral structure (obtained using structural modelling based on inferred amino acid sequence and side chain configuration). But what if data in the relevant state-space are not readily available? We show that it is possible to obtain accurate estimates of the outputs of interest using an established method for handling missing data. Encoding observed characters in an alignment as ambiguous representations of characters in a larger state-space allows the application of models with the desired features to data that lack the resolution that is normally required. This strategy is viable because the evolutionary path taken through the observed space contains information about states that were likely visited in the “unseen” state-space. To illustrate this, we consider two examples with amino acid sequences as input. We show that ω, a parameter describing the relative strength of selection on non-synonymous and synonymous changes, can be estimated in an unbiased manner using an adapted version of a standard 61-state codon model. Using simulated and empirical data, we find that ancestral amino acid side chain configuration can be inferred by applying a 55-state empirical model to 20-state amino acid data. Where feasible, combining inputs from both ambiguity-coded and fully resolved data improves accuracy. 
Adding structural information to as few as 12.5% of the sequences in an amino acid alignment results in remarkable ancestral reconstruction performance compared to a benchmark that considers the full rotamer state information. These examples show that our methods permit the recovery of evolutionary information from sequences where it has previously been inaccessible
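
The ambiguity-coding idea in this abstract can be illustrated with a minimal sketch. It assumes the standard genetic code and a 61-state sense-codon space, and it represents each observed amino acid as a 0/1 tip vector over the codons that could encode it, which is the usual way missing or ambiguous data enter a likelihood calculation. The names `CODON_TABLE`, `SENSE_CODONS` and `tip_vector` are hypothetical and chosen for illustration; this is not the authors' implementation.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code listed in TCAG order; '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each of the 64 codons to its amino acid (or '*').
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# The 61 sense codons form the larger state-space of the codon model.
SENSE_CODONS = [codon for codon, aa in CODON_TABLE.items() if aa != "*"]

# Symbols treated as fully missing data (an assumption for this sketch).
MISSING = {"-", "?", "X"}


def tip_vector(amino_acid):
    """Encode one observed amino acid as an ambiguous codon state.

    Returns a list of 61 values: 1.0 for every sense codon compatible
    with the observed amino acid and 0.0 otherwise. Missing data are
    compatible with every codon.
    """
    if amino_acid in MISSING:
        return [1.0] * len(SENSE_CODONS)
    vector = [
        1.0 if CODON_TABLE[codon] == amino_acid else 0.0
        for codon in SENSE_CODONS
    ]
    if not any(vector):
        raise ValueError(f"unknown amino acid symbol: {amino_acid!r}")
    return vector


if __name__ == "__main__":
    # Leucine is encoded by six codons, methionine by one.
    for aa in "LM":
        compatible = [c for c, v in zip(SENSE_CODONS, tip_vector(aa)) if v]
        print(aa, compatible)
```

In a likelihood calculation, vectors like these would replace the single-state tip vectors normally built from fully resolved codon data, so a 61-state model can be applied to a 20-state alignment.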

UCL Discovery

Phylogenetic influence of complex, evolutionary models: a Bayesian approach

Author: Krishnan Neeraja M
Publication venue: LSU Digital Commons
Publication date: 01/01/2004

Molecular evolution recovers the history of living species by comparing genetic information, exploring genome structure and function from an evolutionary perspective. Here we infer substitution rates and ancestral reconstructions, to better understand mutation responses to some known biochemical phenomena. Mutation processes are commonly inferred using parsimony, maximum likelihood and Bayesian methods. Parsimony is not explicitly model-based, and is statistically biased due to unrealistic assumptions. The model-based maximum likelihood approaches become computationally inefficient when analyzing large or high-dimensional datasets, leaving little opportunity to incorporate complex evolutionary models. We implemented a posterior probability (Bayesian) approach that evaluates evolutionary models, applying it to primate mitochondrial genomes. The species nucleotide sequence data were augmented with ancestral states at the internal nodes of the phylogeny. We simplified probability calculations for substitution events along the branches by assuming that only up to one or two substitution events occurred per branch per site. These conditional pathway calculations introduce very little bias into the inferred reconstructions, while increasing the feasibility of incorporating complex evolutionary models with higher dimensions. Compositional bias tests, including functional predictions of ancestral tRNAs, show that ancestral sequences from the Bayesian approach are more biologically realistic than those reconstructed by maximum likelihood. To explore other model complexity, we allowed substitution rates to vary among sites by having a different model at each site. With a strand-symmetric model as the base model, asymmetric substitution probabilities for specific substitution types were varied among sites. This model would not be feasible with standard matrix exponentiation methods, particularly maximum likelihood. 
We observed almost linear responses for A-->G substitutions and almost asymptotic responses for C-->T substitutions (with some regional deviations). Note that the HMM models had no a priori response built into them. Observed responses fitted predictions from earlier gene-by-gene likelihood analyses. For A-->G substitutions, deviations from the expected linear response correlated positively with the loop-forming propensity of the corresponding site in the mRNA secondary structure. In the COI region, C-->T substitutions have a prominent dip, suggesting protection against mutations. The C-->T substitution responses differed significantly between primate sub-groups defined based on their single genome A-->G responses

Louisiana State University

The Influence of Structural Constraints on Protein Evolution

Author: Perron Umberto
Publication venue: University of Cambridge
Publication date: 01/05/2020

Few mathematical models of sequence evolution incorporate parameters describing protein structure, despite its high conservation, essential functional role and the increasing availability of structural data. The primary goal of my PhD project was to create a structurally aware amino acid substitution model in which proteins are represented using an expanded alphabet that relays both amino acid identity and structural information. Each character in this alphabet specifies an amino acid as well as information about the rotamer configuration of its side chain: the discrete geometric pattern of permitted side chain atomic positions, as defined by the dihedral angles between covalently linked atoms. I generated a 55-state “Dayhoff-like” substitution model (RAM55) by assigning rotamer states in 79,558 structures (∼50% of all PDBe entries) and identifying substitutions between closely related sequences. RAM55’s rotamer state exchange patterns clearly show that the evolutionary properties of amino acids depend strongly upon side chain geometry. Exploiting knowledge of these patterns assists in phylogenetic analyses: I show that RAM55 performs as well as or better than traditional 20-state models on simulated and empirical data for divergence time estimation, tree inference, side chain configuration prediction and ancestral sequence reconstruction. Further, encoding observed characters in an alignment as ambiguous representations of characters in a larger state-space allows the application of RAM55 to 20-state amino acid data for which structures are not known. 
Adding structural information to as few as 12.5% of the sequences in an amino acid alignment results in excellent ancestral reconstruction performance compared to a benchmark that considers the full rotamer state information. This strategy significantly expands the applicability of RAM55 to real-world scenarios where structure might only be available for some of the sequences of interest. Thus, not only is rotamer configuration a valuable source of information for phylogenetic studies, but modelling the concomitant evolution of sequence and structure may have important implications for understanding protein folding and function

Apollo (Cambridge)

The inference of gene trees with species trees

Author: Bastien Boussau
Eric Tannier
Gergely J. Szöllősi
Montbonnot France
Vincent Daubin
Publication date: 04/11/2013

Molecular phylogeny has focused mainly on improving models for the reconstruction of gene trees based on sequence alignments. Yet, most phylogeneticists seek to reveal the history of species. Although the histories of genes and species are tightly linked, they are seldom identical, because genes duplicate, are lost or horizontally transferred, and because alleles can co-exist in populations for periods that may span several speciation events. Building models describing the relationship between gene and species trees can thus improve the reconstruction of gene trees when a species tree is known, and vice-versa. Several approaches have been proposed to solve the problem in one direction or the other, but in general neither gene trees nor species trees are known. Only a few studies have attempted to jointly infer gene trees and species trees. In this article we review the various models that have been used to describe the relationship between gene trees and species trees. These models account for gene duplication and loss, transfer or incomplete lineage sorting. Some of them consider several types of events together, but none exists currently that considers the full repertoire of processes that generate gene trees along the species tree. Simulations as well as empirical studies on genomic data show that combining gene tree-species tree models with models of sequence evolution improves gene tree reconstruction. In turn, these better gene trees provide a better basis for studying genome evolution or reconstructing ancestral chromosomes and ancestral gene sequences. We predict that gene tree-species tree methods that can deal with genomic data sets will be instrumental to advancing our understanding of genomic evolution. Comment: Review article in relation to the "Mathematical and Computational Evolutionary Biology" conference, Montpellier, 201

arXiv.org e-Print Archive

CiteSeerX

INRIA a CCSD electronic archive server

PubMed Central

HAL

Repository of the Academy's Library

ELTE Digital Institutional Repository (EDIT)

Hal-Diderot

Combining genomics and epidemiology to track mumps virus transmission in the United States.

Author: Bankamp Bettina
Barreira Paul
Burns Meagan
Byrne Elizabeth H
Chak Bridget
Fitzgerald Susan
Fleming Stephen
Gharib Soheyla
Grad Yonatan H
Hennigan Scott
Krasilnikova Lydia A
Lett Susan
Lewnard Joseph A
MacInnis Bronwyn L
Madoff Lawrence C
Matranga Christian B
McNall Rebecca J
Metsky Hayden C
Park Daniel J
Piantadosi Anne
Qu James
Rota Paul A
Sabeti Pardis C
Sabina Brandon
Schaffner Stephen F
Shah Rickey R
Siddle Katherine J
Smole Sandra
Wohl Shirlee
Yozwiak Nathan L
Publication venue: eScholarship, University of California
Publication date: 01/01/2020

Unusually large outbreaks of mumps across the United States in 2016 and 2017 raised questions about the extent of mumps circulation and the relationship between these and prior outbreaks. We paired epidemiological data from public health investigations with analysis of mumps virus whole genome sequences from 201 infected individuals, focusing on Massachusetts university communities. Our analysis suggests continuous, undetected circulation of mumps locally and nationally, including multiple independent introductions into Massachusetts and into individual communities. Despite the presence of these multiple mumps virus lineages, the genomic data show that one lineage has dominated in the US since at least 2006. Widespread transmission was surprising given high vaccination rates, but we found no genetic evidence that variants arising during this outbreak contributed to vaccine escape. Viral genomic data allowed us to reconstruct mumps transmission links not evident from epidemiological data or standard single-gene surveillance efforts and also revealed connections between apparently unrelated mumps outbreaks

Directory of Open Access Journals

eScholarship - University of California

Evolution of substrate specificity in a recipient's enzyme following horizontal gene transfer

Author: Abascal
Aldo R. Camacho-Zarco
Barona-Gomez
Brune
Carver
Collaborative Computational Project Number 4 (CCP4)
Dean
Depristo
Des Marais
Dittmar
Due
Francisco Barona-Gómez
Glykos
Guindon
Henn-Sax
Hess
Hodgson
Hu
Hughes
Humphrey
Ikeda
James
Jensen
Jensen
Jones
Jorgensen
Jung
Kane
Klassen
Kuper
Leaver-Fay
Lerat
Lianet Noda-García
MacKerell
Mauricio Carrillo-Tripp
McCoy
Murshudov
Nester
Noda-Garcia
Ohno
Pal
Parish
Paul Gaytán
Perrakis
Piatigorsky
Sofía Medina-Ruíz
Sterner
Tokuriki
Tokuriki
Treangen
Vilmos Fülöp
Vriend
Wright
Xie
Xie
Publication venue: 'Oxford University Press (OUP)'
Publication date: 25/06/2013

Despite the prominent role of horizontal gene transfer (HGT) in shaping bacterial metabolism, little is known about the impact of HGT on the evolution of enzyme function. Specifically, what is the influence of a recently acquired gene on the function of an existing gene? For example, certain members of the genus Corynebacterium have horizontally acquired a whole L-tryptophan biosynthetic operon, whereas in certain closely related actinobacteria, for example, Mycobacterium, the trpF gene is missing. In Mycobacterium, the function of the trpF gene is performed by a dual-substrate (βα)8 phosphoribosyl isomerase (priA gene) also involved in L-histidine (hisA gene) biosynthesis. We investigated the effect of a HGT-acquired TrpF enzyme upon PriA’s substrate specificity in Corynebacterium through comparative genomics and phylogenetic reconstructions. After comprehensive in vivo and enzyme kinetic analyses of selected PriA homologs, a novel (βα)8 isomerase subfamily with a specialized function in L-histidine biosynthesis, termed subHisA, was confirmed. X-ray crystallography was used to reveal active-site mutations in subHisA important for narrowing of substrate specificity, which when mutated to the naturally occurring amino acid in PriA led to gain of function. Moreover, in silico molecular dynamic analyses demonstrated that the narrowing of substrate specificity of subHisA is concomitant with loss of ancestral protein conformational states. Our results show the importance of HGT in shaping enzyme evolution and metabolism

Crossref

Warwick Research Archives Portal Repository

Phylogenetic systematics, biogeography, and evolutionary ecology of the true crocodiles (Eusuchia: Crocodylidae: Crocodylus)

Author: Oaks Jamie Richard
Publication venue: LSU Digital Commons
Publication date: 01/01/2007

Modern crocodylian systematics has been dominated by investigations of higher-level relationships aimed at resolving the disparity between morphological and molecular data, especially regarding the true gharial (Gavialis). Consequently, no studies to date have provided adequate resolution of the interspecific relationships within the most broadly distributed, ecologically diverse, and species-rich crocodylian genus, Crocodylus. In this study, Bayesian and ML partitioned phylogenetic analyses were performed on a DNA sequence dataset of 7,282 base pairs representing four mitochondrial regions, nine nuclear loci, and all 23 crocodylian species. The analyses were performed on a suite of partitioning strategies to investigate the modeling effects of partition choice in phylogenetic analyses. Bayesian lognormal relaxed-clock dating analyses also were performed on the dataset, calibrated from the rich crocodylian fossil record. A robust interspecific phylogeny of Crocodylus is reconstructed, and subsequently used in ML and Bayesian ancestral character-state reconstructions to test hypotheses about the biogeographic history and evolutionary ecology of the genus. The results demonstrate that the genus originated from an ancestor in the tropics of the Late Miocene Indo-Pacific, and rapidly radiated and dispersed around the globe during a period marked by mass extinctions of fellow crocodylians. The results also prove paraphyly of Crocodylus, and reveal more diversity within the genus than recognized by current taxonomy. This study also establishes a baseline for assessing the utility of various model selection criteria for objectively selecting the optimal partitioning strategy within ML and Bayesian frameworks. The results indicate that gene identity is a poor method of partition choice. Furthermore, the results of the ancestral character-state reconstructions suggest ML and Bayesian methods produce more realistic and reliable results than parsimony

Louisiana State University

A Comparison of Phylogenetic Network Methods Using Computer Simulation

Author: A Rzhetsky
A Shioura
AR Templeton
AR Templeton
AR Templeton
B Holland
B Rannala
BA Schaal
BME Moret
D Posada
D Posada
David Posada
DF Robinson
DH Huson
DH Huson
DL Swofford
DM Hillis
DM Hillis
FT Bakker
G Cardona
G Jin
HJ Bandelt
I Cassens
I Cassens
Jason E. Stajich
JS Song
KA Crandall
Keith A. Crandall
L Excoffier
LL Cavalli-Sforza
M Clement
M Forster
M Pagel
M Perez-Losada
MH Schierup
MK Kuhner
N Nguyen
N Saitou
RC Griffiths
RR Hudson
RR Hudson
S Schneider
S Wain-Hobson
Steven M. Woolley
TH Jukes
W-H Li
Z Yang
Publication venue: Public Library of Science
Publication date: 01/01/2008

Background: We present a series of simulation studies that explore the relative performance of several phylogenetic network approaches (statistical parsimony, split decomposition, union of maximum parsimony trees, neighbor-net, simulated history recombination upper bound, median-joining, reduced median joining and minimum spanning network) compared to standard tree approaches (neighbor-joining and maximum parsimony) in the presence and absence of recombination. Principal Findings: In the absence of recombination, all methods recovered the correct topology and branch lengths nearly all of the time when the substitution rate was low, except for minimum spanning networks, which did considerably worse. At a higher substitution rate, maximum parsimony and union of maximum parsimony trees were the most accurate. With recombination, the ability to infer the correct topology was halved for all methods and no method could accurately estimate branch lengths. Conclusions: Our results highlight the need for more accurate phylogenetic network methods and the importance of detecting and accounting for recombination in phylogenetic studies. Furthermore, we provide useful information for choosing a network algorithm and a framework in which to evaluate improvements to existing methods and nove

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Digital Commons@Becker

A Methodological Framework for the Reconstruction of Contiguous Regions of Ancestral Genomes and Its Application to Mammalian Genomes

Author: A Bergeron
A Bergeron
A Bhutkar
A Caprara
A Darling
A Sinha
A Sturtevant
C Kemkemer
Cedric Chauve
D Fulkerson
D Karolchik
D Sankoff
Eric Tannier
F Alizadeh
F Richard
F Swidan
F Yang
G Bourque
G Bourque
G Bourque
G Bourque
G Landau
J Earnest-DeYoung
J Ma
J Ma
J Meidanis
J Tang
J Wienberg
Jens Stoye
K Booth
K Lindblad-Toh
L Froenicke
L Froenicke
M Alekseyev
M Belcaid
M Blanchette
M Dom
M Dom
M Habib
M Hajiaghayi
M Muffato
M Rocchi
M Svartman
M Svartman
MJ Benton
MP Beal
N El-Mabrouk
N Eriksen
N Luc
P Goldberg
P Pevzner
Pavel A. Pevzner
R Karp
R McConnell
S Bérard
S Bérard
S Pasek
T Christof
T Faraut
T Mikkelsen
VL Rascol
W Murphy
W Murphy
Y Nakatani
Y van de Peer
Z Adam
Publication venue: Public Library of Science
Publication date: 01/01/2008

The reconstruction of ancestral genome architectures and gene orders from homologies between extant species is a long-standing problem, considered by both cytogeneticists and bioinformaticians. A comparison of the two approaches was recently investigated and discussed in a series of papers, sometimes with diverging points of view regarding the performance of these two approaches. We describe a general methodological framework for reconstructing ancestral genome segments from conserved syntenies in extant genomes. We show that this problem, from a computational point of view, is naturally related to physical mapping of chromosomes and benefits from using combinatorial tools developed in this scope. We develop this framework into a new reconstruction method considering conserved gene clusters with similar gene content, mimicking principles used in most cytogenetic studies, although on a different kind of data. We implement and apply it to datasets of mammalian genomes. We perform intensive theoretical and experimental comparisons with other bioinformatics methods for ancestral genome segment reconstruction. We show that the method that we propose is stable and reliable: it gives convergent results using several kinds of data at different levels of resolution, and all predicted ancestral regions are well supported. The results ultimately come very close to those of cytogenetic studies. This suggests that the comparison of methods for ancestral genome reconstruction should include the algorithmic aspects of the methods as well as the disciplinary differences in data acquisition

Public Library of Science (PLOS)

Crossref

INRIA a CCSD electronic archive server

Directory of Open Access Journals

PubMed Central

Simon Fraser University Institutional Repository

HAL Descartes

Hal-Diderot