63 research outputs found
Selective Constraints on Amino Acids Estimated by a Mechanistic Codon Substitution Model with Multiple Nucleotide Changes
Empirical substitution matrices represent the average tendencies of
substitutions over various protein families by sacrificing gene-level
resolution. We develop a codon-based model, in which mutational tendencies of
codons, a genetic code, and the strength of selective constraints against amino
acid replacements can be tailored to a given gene. First, selective constraints
averaged over proteins are estimated by maximizing the likelihood of each 1-PAM
matrix of empirical amino acid (JTT, WAG, and LG) and codon (KHG) substitution
matrices. Then, selective constraints specific to given proteins are
approximated as a linear function of those estimated from the empirical
substitution matrices.
Akaike information criterion (AIC) values indicate that a model allowing
multiple nucleotide changes fits the empirical substitution matrices
significantly better. Also, the ML estimates of transition-transversion bias
obtained from these empirical matrices are not as large as previously
estimated. The selective constraints are characteristic of proteins rather than
species. However, their relative strengths among amino acid pairs can be
approximated as depending mainly on the amino acid pairs rather than on protein families,
because the present model, in which selective constraints are approximated to
be a linear function of those estimated from the JTT/WAG/LG/KHG matrices, can
provide a good fit to other empirical substitution matrices including cpREV for
chloroplast proteins and mtREV for vertebrate mitochondrial proteins.
The present codon-based model with the ML estimates of selective constraints
and with adjustable nucleotide mutation rates would be useful as a simple
substitution model in ML and Bayesian inferences of molecular phylogenetic
trees, and enables us to obtain biologically meaningful information at both
nucleotide and amino acid levels from codon and protein sequences.
Comment: Table 9 in this article includes corrections for errata in the Table
9 published in 10.1371/journal.pone.0017244. Supporting information is
attached at the end of the article, and a computer-readable dataset of the ML
estimates of selective constraints is available from
10.1371/journal.pone.001724
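The AIC comparison mentioned in the abstract above can be illustrated with a minimal sketch. The log-likelihoods and parameter counts below are hypothetical placeholders, not values from the paper; only the formula AIC = 2k - 2 ln L is standard.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: AIC = 2k - 2 ln L (lower is better)."""
    return 2.0 * n_params - 2.0 * log_likelihood


# Hypothetical fits, for illustration only (not taken from the paper).
single_step = aic(log_likelihood=-1050.0, n_params=10)  # single nucleotide changes only
multi_step = aic(log_likelihood=-1020.0, n_params=12)   # multiple nucleotide changes allowed

# A positive difference favours the model that allows multiple changes.
delta = single_step - multi_step
print(single_step, multi_step, delta)
```

The extra parameters of the richer model are penalized by the 2k term, so it is preferred only when its likelihood gain outweighs that penalty.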
Recombination rate and selection strength in HIV intra-patient evolution
The evolutionary dynamics of HIV during the chronic phase of infection is
driven by the host immune response and by selective pressures exerted through
drug treatment. To understand and model the evolution of HIV quantitatively,
the parameters governing genetic diversification and the strength of selection
need to be known. While mutation rates can be measured in single replication
cycles, the relevant effective recombination rate depends on the probability of
coinfection of a cell with more than one virus and can only be inferred from
population data. However, most population genetic estimators for recombination
rates assume absence of selection and are hence of limited applicability to
HIV, since positive and purifying selection are important in HIV evolution.
Here, we estimate the rate of recombination and the distribution of selection
coefficients from time-resolved sequence data tracking the evolution of HIV
within single patients. By examining temporal changes in the genetic
composition of the population, we estimate the effective recombination rate to be
r=1.4e-5 recombinations per site and generation. Furthermore, we provide
evidence that selection coefficients of at least 15% of the observed
non-synonymous polymorphisms exceed 0.8% per generation. These results provide
a basis for a more detailed understanding of the evolution of HIV. A
particularly interesting case is evolution in response to drug treatment, where
recombination can facilitate the rapid acquisition of multiple resistance
mutations. With the methods developed here, more precise and more detailed
studies will be possible, as soon as data with higher time resolution and
greater sample sizes are available.
Comment: to appear in PLoS Computational Biology
A mathematical framework for critical transitions: normal forms, variance and applications
Critical transitions occur in a wide variety of applications including
mathematical biology, climate change, human physiology and economics. Therefore
it is highly desirable to find early-warning signs. We show that it is possible
to classify critical transitions by using bifurcation theory and normal forms
in the singular limit. Based on this elementary classification, we analyze
stochastic fluctuations and calculate scaling laws of the variance of
stochastic sample paths near critical transitions for fast subsystem
bifurcations up to codimension two. The theory is applied to several models:
the Stommel-Cessi box model for the thermohaline circulation from geoscience,
an epidemic-spreading model on an adaptive network, an activator-inhibitor
switch from systems biology, a predator-prey system from ecology and the
Euler buckling problem from classical mechanics. For the Stommel-Cessi model we
compare different detrending techniques to calculate early-warning signs. In
the epidemics model we show that link densities could be better variables for
prediction than population densities. The activator-inhibitor switch
demonstrates effects in three time-scale systems and points out that excitable
cells and molecular units have information for subthreshold prediction. In the
predator-prey model explosive population growth near a codimension two
bifurcation is investigated and we show that early-warnings from normal forms
can be misleading in this context. In the biomechanical model we demonstrate
that early-warning signs for buckling depend crucially on the control strategy
near the instability, which illustrates the effect of multiplicative noise.
Comment: minor corrections to previous version
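To make the idea of a variance scaling law concrete, here is a minimal sketch for the simplest case, a fold bifurcation. It assumes the normal form dx = (-y - x^2) dt + sigma dW with the slow variable y frozen at a negative value, and uses the standard Ornstein-Uhlenbeck result for the linearized dynamics. The function name and parameter values are illustrative, not taken from the paper.

```python
import math


def fold_variance(y: float, sigma: float) -> float:
    """Stationary variance of the linearization of
    dx = (-y - x**2) dt + sigma dW around the stable equilibrium
    x* = sqrt(-y), for a fixed parameter y < 0 (an assumption)."""
    if y >= 0:
        raise ValueError("no stable equilibrium for y >= 0")
    decay_rate = 2.0 * math.sqrt(-y)        # |f'(x*)| at x* = sqrt(-y)
    return sigma ** 2 / (2.0 * decay_rate)  # OU variance: sigma^2 / (2 * lambda)


# The variance grows like 1 / sqrt(-y) as y approaches the fold at y = 0.
for y in (-1.0, -0.25, -0.01):
    print(y, fold_variance(y, sigma=0.1))
```

The growth of the variance as y approaches zero is the kind of early-warning sign the abstract refers to; the linearization itself stops being valid very close to the bifurcation.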
Correlated Evolution of Nearby Residues in Drosophilid Proteins
Here we investigate the correlations between coding sequence substitutions as a function of their separation along the protein sequence. We consider both substitutions between the reference genomes of several Drosophilids as well as polymorphisms in a population sample of Zimbabwean Drosophila melanogaster. We find that amino acid substitutions are “clustered” along the protein sequence, that is, the frequency of additional substitutions is strongly enhanced within ≈10 residues of a first such substitution. No such clustering is observed for synonymous substitutions, supporting a “correlation length” associated with selection on proteins as the causative mechanism. Clustering is stronger between substitutions that arose in the same lineage than it is between substitutions that arose in different lineages. We consider several possible origins of clustering, concluding that epistasis (interactions between amino acids within a protein that affect function) and positional heterogeneity in the strength of purifying selection are primarily responsible. The role of epistasis is directly supported by the tendency of nearby substitutions that arose on the same lineage to preserve the total charge of the residues within the correlation length and by the preferential cosegregation of neighboring derived alleles in our population sample. We interpret the observed length scale of clustering as a statistical reflection of the functional locality (or modularity) of proteins: amino acids that are near each other on the protein backbone are more likely to contribute to, and collaborate toward, a common subfunction
Genetic Diversity in the SIR Model of Pathogen Evolution
We introduce a model for assessing the levels and patterns of genetic diversity in pathogen populations, whose epidemiology follows a susceptible-infected-recovered model (SIR). We model the population of pathogens as a metapopulation composed of subpopulations (infected hosts), where pathogens replicate and mutate. Hosts transmit pathogens to uninfected hosts. We show that the level of pathogen variation is well predicted by analytical expressions, such that pathogen neutral molecular variation is bounded by the level of infection and increases with the duration of infection. We then introduce selection in the model and study the invasion probability of a new pathogenic strain whose fitness (R0(1+s)) is higher than the fitness of the resident strain (R0). We show that this invasion probability is given by the relative increment in R0 of the new pathogen (s). By analyzing the patterns of genetic diversity in this framework, we identify the molecular signatures during the replacement and compare these with those observed in sequences of influenza A
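For orientation, the host-level SIR dynamics that the abstract above builds on can be sketched as follows. This is a plain deterministic SIR model in a closed population, integrated with a forward Euler step; it does not include the within-host replication, mutation, or selection described in the abstract, and all parameter values are hypothetical.

```python
def simulate_sir(beta, gamma, i0=0.001, dt=0.01, steps=20000):
    """Forward-Euler integration of dS/dt = -beta*S*I,
    dI/dt = beta*S*I - gamma*I, dR/dt = gamma*I (fractions of hosts)."""
    s, i, r = 1.0 - i0, i0, 0.0
    history = [(s, i, r)]
    for _ in range(steps):
        new_infections = beta * s * i * dt
        new_recoveries = gamma * i * dt
        s, i, r = s - new_infections, i + new_infections - new_recoveries, r + new_recoveries
        history.append((s, i, r))
    return history


# Hypothetical parameters: R0 = beta / gamma = 2.
history = simulate_sir(beta=0.5, gamma=0.25)
s_end, i_end, r_end = history[-1]
print(s_end, i_end, r_end)
```

In this sketch R0 is the ratio beta / gamma, so a strain with fitness R0(1+s) corresponds to scaling beta by (1+s) at fixed gamma.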
The stability of ecosystems: a brief overview of the paradox of enrichment
In theory, enrichment of resource in a predator-prey model leads to destabilization of the system, thereby collapsing the trophic interaction, a phenomenon referred to as "the paradox of enrichment". After it was first proposed by Rosenzweig (1971), a number of subsequent studies were carried out on this dilemma over many decades. In this article, we review these theoretical and experimental works and give a brief overview of the proposed solutions to the paradox. The mechanisms that have been discussed are modifications of simple predator-prey models in the presence of prey that is inedible, invulnerable, unpalatable and toxic. Another class of mechanisms includes an incorporation of a ratio-dependent functional form, inducible defence of prey and density-dependent mortality of the predator. Moreover, we find a third set of explanations based on complex population dynamics including chaos in space and time. We conclude that, although any one of the various mechanisms proposed so far might potentially prevent destabilization of the predator-prey dynamics following enrichment, in nature different mechanisms may combine to cause stability, even when a system is enriched. The exact mechanisms, which may differ among systems, need to be disentangled through extensive field studies and laboratory experiments coupled with realistic theoretical models
Reinvestigation of aminoacyl-TRNA synthetase core complex by affinity purification-mass spectrometry reveals TARSL2 as a potential member of the complex
doi:10.1371/journal.pone.0081734. PLoS ONE 8(12).
Advantages of a Mechanistic Codon Substitution Model for Evolutionary Analysis of Protein-Coding Sequences
A mechanistic codon substitution model, in which each codon substitution rate is proportional to the product of a codon mutation rate and the average fixation probability depending on the type of amino acid replacement, has advantages over nucleotide, amino acid, and empirical codon substitution models in evolutionary analysis of protein-coding sequences. It can approximate a wide range of codon substitution processes. If no selection pressure on amino acids is taken into account, it becomes equivalent to a nucleotide substitution model. If mutation rates are assumed not to depend on the codon type, then it becomes essentially equivalent to an amino acid substitution model. Mutation at the nucleotide level and selection at the amino acid level can be separately evaluated. The present scheme for single nucleotide mutations is equivalent to the general time-reversible model, but multiple nucleotide changes in infinitesimal time are allowed. Selective constraints on the respective types of amino acid replacements are tailored to each gene as a linear function of a given estimate of selective constraints. Good estimates of these constraints are those obtained by maximizing the respective likelihoods of empirical amino acid or codon substitution frequency matrices. Akaike and Bayesian information criteria indicate that the present model performs far better than the other substitution models for all five phylogenetic trees of highly-divergent to highly-homologous sequences of chloroplast, mitochondrial, and nuclear genes. It is also shown that multiple nucleotide changes in infinitesimal time are significant in long branches, although they may be caused by compensatory substitutions or other mechanisms. The variation of selective constraint over sites fits the datasets significantly better than variable mutation rates, except for 10 slow-evolving nuclear genes of 10 mammals.
A critical finding for phylogenetic analysis is that assuming variable mutation rates over sites leads to the overestimation of branch lengths.
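The proportionality stated at the start of this abstract (substitution rate proportional to mutation rate times average fixation probability) can be sketched as below. This is a deliberately simplified illustration: the function name, the treatment of synonymous changes as unconstrained, and the numeric values are all assumptions, and the paper's full parameterization is not reproduced.

```python
def codon_rate(mutation_rate: float, same_amino_acid: bool, fixation_factor: float) -> float:
    """Relative codon substitution rate: mutation rate times a fixation
    factor that depends on the type of amino acid replacement.
    Assumption: synonymous changes carry a factor of 1 (no constraint)."""
    return mutation_rate * (1.0 if same_amino_acid else fixation_factor)


# Hypothetical values, for illustration only.
synonymous = codon_rate(0.2, same_amino_acid=True, fixation_factor=0.1)
conservative = codon_rate(0.2, same_amino_acid=False, fixation_factor=0.5)
radical = codon_rate(0.2, same_amino_acid=False, fixation_factor=0.05)
print(synonymous, conservative, radical)
```

Setting every fixation factor to 1 leaves only the mutation rates, which mirrors the abstract's remark that the model reduces to a nucleotide substitution model when selection on amino acids is ignored.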
Inference of Co-Evolving Site Pairs: an Excellent Predictor of Contact Residue Pairs in Protein 3D structures
Residue-residue interactions that fold a protein into a unique
three-dimensional structure and enable it to perform a specific function impose
structural and functional constraints on each residue site. Selective
constraints on residue sites are recorded in amino acid orders in homologous
sequences and also in the evolutionary trace of amino acid substitutions. A
challenge is to extract direct dependences between residue sites by removing
indirect dependences through other residues within a protein or even through
other molecules. Recent attempts at disentangling direct from indirect
dependences of amino acid types between residue positions in multiple sequence
alignments have revealed that the strength of inferred residue pair couplings
is an excellent predictor of residue-residue proximity in folded structures.
Here, we report an alternative attempt at inferring co-evolving site pairs from
concurrent and compensatory substitutions between sites in each branch of a
phylogenetic tree. First, branch lengths of a phylogenetic tree inferred by the
neighbor-joining method are optimized as well as other parameters by maximizing
a likelihood of the tree in a mechanistic codon substitution model. Mean
changes of quantities, which are characteristic of concurrent and compensatory
substitutions, accompanied by substitutions at each site in each branch of the
tree are estimated with the likelihood of each substitution. Partial
correlation coefficients of the characteristic changes along branches between
sites are calculated and used to rank co-evolving site pairs. Accuracy of
contact prediction based on the present co-evolution score is comparable to
that achieved by a maximum entropy model of protein sequences for 15 protein
families taken from the Pfam release 26.0. Moreover, this excellent accuracy
indicates that compensatory substitutions are significant in protein evolution.
Comment: 17 pages, 4 figures, and 4 tables with supplementary information of 5
figures
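The role of partial correlation in separating direct from indirect dependences can be illustrated with the textbook first-order formula for three variables. This is only a sketch of the general idea; the paper computes partial correlations across many sites, and the function name and input values here are hypothetical.

```python
import math


def partial_corr(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))


# Sites x and y appear correlated (0.25) only because both track site z:
indirect = partial_corr(r_xy=0.25, r_xz=0.5, r_yz=0.5)

# Here x and y are correlated and z explains none of it:
direct = partial_corr(r_xy=0.5, r_xz=0.0, r_yz=0.0)
print(indirect, direct)
```

In the first case the raw correlation is fully accounted for by the third site, so the partial correlation vanishes; in the second it is unchanged.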
COMPARATIVE GENOMICS OF TRANSCRIPTIONAL REGULATION IN YEASTS AND ITS APPLICATION TO IDENTIFICATION OF A CANDIDATE ALPHA-ISOPROPYLMALATE TRANSPORTER