
Selective Constraints on Amino Acids Estimated by a Mechanistic Codon Substitution Model with Multiple Nucleotide Changes

Author: A Doron-Faigenboim
A Schneider
AL Halpern
AR Kinjo
C Kosiol
Darren Martin
DT Jones
G Bazykin
GC Conant
H Akaike
I Keller
J Adachi
JP Huelsenbeck
K Tamura
L Jin
M Anisimova
M Averof
M Hasegawa
M Kimura
MA Larkin
MO Dayhoff
MW Dimmic
N Goldman
N Rodrigue
N Takahata
NGC Smith
R Grantham
S Guindon
S Miyazawa
S Whelan
Sanzo Miyazawa
SC Choi
SQ Le
SV Muse
T Miyata
TK Seo
W Delport
Z Yang
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 18/03/2011

Empirical substitution matrices represent the average tendencies of substitutions over various protein families, at the cost of gene-level resolution. We develop a codon-based model in which the mutational tendencies of codons, the genetic code, and the strength of selective constraints against amino acid replacements can be tailored to a given gene. First, selective constraints averaged over proteins are estimated by maximizing the likelihood of each 1-PAM matrix of empirical amino acid (JTT, WAG, and LG) and codon (KHG) substitution matrices. Then, selective constraints specific to given proteins are approximated as a linear function of those estimated from the empirical substitution matrices. Akaike information criterion (AIC) values indicate that a model allowing multiple nucleotide changes fits the empirical substitution matrices significantly better. Also, the ML estimates of transition-transversion bias obtained from these empirical matrices are not as large as previously estimated. The selective constraints are characteristic of proteins rather than species. However, their relative strengths among amino acid pairs can be approximated as depending mainly on the amino acid pair rather than on the protein family, because the present model, in which selective constraints are approximated as a linear function of those estimated from the JTT/WAG/LG/KHG matrices, provides a good fit to other empirical substitution matrices, including cpREV for chloroplast proteins and mtREV for vertebrate mitochondrial proteins.
The present codon-based model, with the ML estimates of selective constraints and with adjustable nucleotide mutation rates, would be useful as a simple substitution model in ML and Bayesian inferences of molecular phylogenetic trees, and enables us to obtain biologically meaningful information at both the nucleotide and amino acid levels from codon and protein sequences. Comment: Table 9 in this article includes corrections for errata in the Table 9 published in 10.1371/journal.pone.0017244. Supporting information is attached at the end of the article, and a computer-readable dataset of the ML estimates of selective constraints is available from 10.1371/journal.pone.001724

Public Library of Science (PLOS)

Directory of Open Access Journals

A generalized mechanistic codon model.

Author: Dib L.
Salamin N.
Zaheri M.
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2014

Models of codon evolution have attracted particular interest because of their unique ability to detect selection forces and their good fit when applied to sequence evolution. We describe here a novel approach for modeling codon evolution, which is based on the Kronecker product of matrices. The 61 × 61 codon substitution rate matrix is created using the Kronecker product of three 4 × 4 nucleotide substitution matrices, the equilibrium frequencies of codons, and the selection rate parameter. The entries of the nucleotide substitution matrices and the selection rate are treated as parameters of the model, which are optimized by maximum likelihood. Our fully mechanistic model allows the instantaneous substitution matrix between codons to be fully estimated with only 19 parameters instead of 3,721, by using the biological interdependence existing between positions within codons. We illustrate the properties of our model using computer simulations and assess its relevance by comparing the AICc measures of our model and other models of codon evolution on simulations and a large range of empirical data sets. We show that our model fits most biological data better than current codon models. Furthermore, the parameters in our model can be interpreted in a similar way to the exchangeability rates found in empirical codon models.
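
The Kronecker construction mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' parameterization: the toy 4 × 4 matrices, their values, and the helper name `toy_nucleotide_matrix` are assumptions made for illustration, and the equilibrium frequencies and selection rate parameter from the abstract are left out. It only shows how three position-specific 4 × 4 matrices combine into a 64 × 64 matrix that is then restricted to the 61 sense codons of the standard genetic code.

```python
import itertools
import numpy as np

BASES = "TCAG"
STOPS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code

def toy_nucleotide_matrix(p_change):
    """Hypothetical row-stochastic 4 x 4 matrix: keep the base with
    probability 1 - p_change, otherwise move uniformly to another base."""
    m = np.full((4, 4), p_change / 3.0)
    np.fill_diagonal(m, 1.0 - p_change)
    return m

# One toy matrix per codon position (the values are arbitrary assumptions).
p1, p2, p3 = (toy_nucleotide_matrix(p) for p in (0.02, 0.01, 0.05))

# Kronecker product: entry (ijk, lmn) equals p1[i, l] * p2[j, m] * p3[k, n],
# so changes at two or three positions get a nonzero value as well.
full = np.kron(np.kron(p1, p2), p3)  # shape (64, 64)

# Codon order matches the Kronecker index: first position varies slowest.
codons = ["".join(c) for c in itertools.product(BASES, repeat=3)]
sense = [i for i, c in enumerate(codons) if c not in STOPS]

# Drop the stop codons and renormalise each row to obtain a 61 x 61 matrix.
codon_matrix = full[np.ix_(sense, sense)]
codon_matrix = codon_matrix / codon_matrix.sum(axis=1, keepdims=True)
```

Three 4 × 4 matrices hold far fewer free entries than a full 61 × 61 matrix, which is the kind of saving the abstract refers to when it contrasts 19 parameters with 3,721.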

Serveur académique lausannois

Empirical Analysis of the Most Relevant Parameters of Codon Substitution Models

Author: Schneider Adrian
Zoller Stefan
Publication date: 18/06/2018

Traditionally, codon models of evolution have been parametric, meaning that the 61 × 61 substitution rate matrix was derived from only a handful of parameters, typically the equilibrium frequencies, the ratio of nonsynonymous to synonymous substitution rates and the ratio between transition and transversion rates. These parameters are reasonable choices and are based on observations of what aspects of evolution often vary in coding DNA. However, the choices are relatively arbitrary and no systematic empirical search has ever been performed to identify the best parameters for a codon model. Even for the empirical or semi-empirical models that have been presented recently, only the average substitution rates have been estimated from databases of real coding DNA, but the parameters used were essentially the same as before. In this study we attempted to investigate empirically what the most relevant parameters for a codon model are. By performing a principal component analysis (PCA) on 3666 substitution rate matrices estimated from single gene families, the sets of the most co-varying substitution rates were determined. Interestingly, the two most significant principal components (PCs) describe clearly identifiable parameters: the first PC separates synonymous and nonsynonymous substitutions while the second PC distinguishes between substitutions where only one nucleotide changes and substitutions with two or three nucleotide changes. For the third and subsequent PCs no simple descriptions could be found.
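
The PCA step described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the rate matrices are random placeholder data rather than estimates from gene families, 200 matrices stand in for the 3666 used in the study, and the choice to flatten the off-diagonal entries into one feature vector per family is an assumption about how such data might be arranged.

```python
import numpy as np

rng = np.random.default_rng(0)

n_families, n_codons = 200, 61  # the study used 3666 families; 200 keeps this quick

# Placeholder data: random positive "rates", one 61 x 61 matrix per family.
rates = rng.gamma(shape=2.0, scale=1.0, size=(n_families, n_codons, n_codons))

# Keep the off-diagonal entries only and flatten them to one row per family.
off_diag = ~np.eye(n_codons, dtype=bool)
features = rates[:, off_diag]  # shape (n_families, 61 * 60)

# PCA via SVD of the centered data: rows of vt are the principal components,
# i.e. sets of substitution rates that co-vary across families.
centered = features - features.mean(axis=0)
_, singular_values, vt = np.linalg.svd(centered, full_matrices=False)

# Fraction of the total variance carried by each component.
explained = singular_values**2 / np.sum(singular_values**2)
```

With real data, inspecting which codon pairs carry the largest loadings in `vt[0]` and `vt[1]` is how one would check whether the leading components line up with interpretable categories such as those named in the abstract.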

RERO DOC Digital Library

PhyloCSF: a comparative genomics method to distinguish protein-coding and non-coding regions

Author: Arvestad
Blanchette
Brent
Butler
Clark
Goldman
Guttman
Holmes
I. Jungreis
Kellis
Lin
M. F. Lin
M. Kellis
Ota
Ozsolak
Stark
Whelan
Yang
Publication date: 17/08/2010

As high-throughput transcriptome sequencing provides evidence for novel transcripts in many species, there is a renewed need for accurate methods to classify small genomic regions as protein-coding or non-coding. We present PhyloCSF, a novel comparative genomics method that analyzes a multi-species nucleotide sequence alignment to determine whether it is likely to represent a conserved protein-coding region, based on a formal statistical comparison of phylogenetic codon models. We show that PhyloCSF's classification performance in 12-species _Drosophila_ genome alignments exceeds that of all other methods we compared in a previous study, and we provide a software implementation for use by the community. We anticipate that this method will be widely applicable as the transcriptomes of many additional species, tissues, and subcellular compartments are sequenced, particularly in the context of ENCODE and modENCODE.

Quantifying evolutionary constraints on B cell affinity maturation

Author: Bedford Trevor
Bradley Philip
Matsen IV Frederick A.
McCoy Connor O.
Minin Vladimir N.
Robins Harlan
Publication venue: 'The Royal Society'
Publication date: 12/03/2014

The antibody repertoire of each individual is continuously updated by the evolutionary process of B cell receptor mutation and selection. It has recently become possible to gain detailed information concerning this process through high-throughput sequencing. Here, we develop modern statistical molecular evolution methods for the analysis of B cell sequence data, and then apply them to a very deep short-read data set of B cell receptors. We find that the substitution process is conserved across individuals but varies significantly across gene segments. We investigate selection on B cell receptors using a novel method that side-steps the difficulties encountered by previous work in differentiating between selection and motif-driven mutation; this is done through stochastic mapping and empirical Bayes estimators that compare the evolution of in-frame and out-of-frame rearrangements. We use this new method to derive a per-residue map of selection, which provides a more nuanced view of the constraints on framework and variable regions. Comment: Previously entitled "Substitution and site-specific selection driving B cell affinity maturation is consistent across individuals"

eScholarship - University of California

FigShare

πBUSS: a parallel BEAST/BEAGLE utility for sequence simulation under complex evolutionary scenarios

Author: Baele Guy
Bielejec Filip
Carvalho Luiz Max
Lemey Philippe
Rambaut Andrew
Suchard Marc A.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 17/12/2013

Background: Simulated nucleotide or amino acid sequences are frequently used to assess the performance of phylogenetic reconstruction methods. BEAST, a Bayesian statistical framework that focuses on reconstructing time-calibrated molecular evolutionary processes, supports a wide array of evolutionary models, but lacked matching machinery for simulation of character evolution along phylogenies. Results: We present a flexible Monte Carlo simulation tool, called piBUSS, that employs the BEAGLE high performance library for phylogenetic computations within BEAST to rapidly generate large sequence alignments under complex evolutionary models. piBUSS sports a user-friendly graphical user interface (GUI) that allows combining a rich array of models across an arbitrary number of partitions. A command-line interface mirrors the options available through the GUI and facilitates scripting in large-scale simulation studies. Analogous to BEAST model and analysis setup, more advanced simulation options are supported through an extensible markup language (XML) specification, which in addition to generating sequence output, also allows users to combine simulation and analysis in a single BEAST run. Conclusions: piBUSS offers a unique combination of flexibility and ease-of-use for sequence simulation under realistic evolutionary scenarios. Through different interfaces, piBUSS supports simulation studies ranging from modest endeavors for illustrative purposes to complex and large-scale assessments of evolutionary inference procedures. The software aims at implementing new models and data types that are continuously being developed as part of BEAST/BEAGLE. Comment: 13 pages, 2 figures, 1 table

Lirias

Springer - Publisher Connector

Edinburgh Research Explorer

eScholarship - University of California

Back-translation for discovering distant protein homologies

Author: A. Pedersen
B. Oostra
C. Kosiol
J. Leluk
J. Raes
K. Okamura
L. Arvestad
L. Delaye
M. Clamp
M. Pellegrini
P. Harrison
P. Lio
R. Blake
S. Altschul
Y. Hahn
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009

Frameshift mutations in protein-coding DNA sequences produce a drastic change in the resulting protein sequence, which prevents classic protein alignment methods from revealing the proteins' common origin. Moreover, when a large number of substitutions are also involved in the divergence, homology detection becomes difficult even at the DNA level. To cope with this situation, we propose a novel method to infer distant homology relations between two proteins that accounts for frameshift and point mutations that may have affected the coding sequences. We design a dynamic programming alignment algorithm over memory-efficient graph representations of the complete set of putative DNA sequences of each protein, with the goal of determining the two putative DNA sequences which have the best scoring alignment under a powerful scoring system designed to reflect the most probable evolutionary process. This allows us to uncover evolutionary information that is not captured by traditional alignment methods, which is confirmed by biologically significant examples. Comment: The 9th International Workshop in Algorithms in Bioinformatics (WABI), Philadelphia, United States of America (2009)
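
To make "the complete set of putative DNA sequences of each protein" concrete, here is a minimal sketch that back-translates a protein under the standard genetic code. It is not the graph representation or alignment algorithm from the paper: the function names `count_back_translations` and `back_translations` are hypothetical, and the brute-force enumeration is used only to show why a compact representation is needed.

```python
import itertools

BASES = "TCAG"
# Standard genetic code in TCAG order ("*" marks a stop codon).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each amino acid to the list of codons that encode it.
CODONS_FOR = {}
all_codons = ("".join(c) for c in itertools.product(BASES, repeat=3))
for codon, aa in zip(all_codons, AMINO):
    if aa != "*":
        CODONS_FOR.setdefault(aa, []).append(codon)

def count_back_translations(protein):
    """Number of DNA sequences that translate to `protein`."""
    total = 1
    for aa in protein:
        total *= len(CODONS_FOR[aa])
    return total

def back_translations(protein):
    """Yield every DNA sequence that translates to `protein`.
    Only practical for very short proteins."""
    for combo in itertools.product(*(CODONS_FOR[aa] for aa in protein)):
        yield "".join(combo)
```

The count is the product of the codon degeneracies of the residues, so it grows exponentially with protein length; listing the sequences one by one quickly becomes infeasible, which is consistent with the abstract's use of memory-efficient graph representations instead.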

CiteSeerX

HAL - Lille 3