
Inferring Stabilizing Mutations from Protein Phylogenies: Application to Influenza Hemagglutinin

By Jesse D. Bloom and Matthew J. Glassman


One selection pressure shaping sequence evolution is the requirement that a protein fold with sufficient stability to perform its biological functions. We present a conceptual framework that explains how this requirement causes the probability that a particular amino acid mutation is fixed during evolution to depend on its effect on protein stability. We mathematically formalize this framework to develop a Bayesian approach for inferring the stability effects of individual mutations from homologous protein sequences of known phylogeny. This approach is able to predict published experimentally measured mutational stability effects (ΔΔG values) with an accuracy that exceeds both a state-of-the-art physicochemical modeling program and the sequence-based consensus approach. As a further test, we use our phylogenetic inference approach to predict stabilizing mutations to influenza hemagglutinin. We introduce these mutations into a temperature-sensitive influenza virus with a defect in its hemagglutinin gene and experimentally demonstrate that some of the mutations allow the virus to grow at higher temperatures. Our work therefore describes a powerful new approach for predicting stabilizing mutations that can be successfully applied even to large, complex proteins such as hemagglutinin. This approach also makes a mathematical link between phylogenetics and experimentally measurable protein properties, potentially paving the way for more accurate analyses of molecular evolution.
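
For orientation, the sequence-based consensus approach that the abstract uses as a baseline can be sketched in a few lines. This is only an illustration of that general heuristic, not the Bayesian phylogenetic method of the paper, which the abstract does not specify in enough detail to reproduce. The function names, the toy alignment, the pseudocount, and the sign convention (a negative score means the mutant residue is more common among homologs than the wild-type residue) are all assumptions made for this example.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def site_frequencies(alignment, site, pseudocount=1.0):
    """Pseudocount-smoothed amino acid frequencies in one alignment column."""
    counts = Counter(seq[site] for seq in alignment if seq[site] in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}


def consensus_score(alignment, site, wildtype, mutant, pseudocount=1.0):
    """Hypothetical consensus score: -ln(f_mutant / f_wildtype).

    Under the consensus heuristic, residues that are more frequent among
    homologs are assumed to be more stabilizing, so a negative score flags
    a candidate stabilizing mutation (assumed convention, in units of RT).
    """
    freqs = site_frequencies(alignment, site, pseudocount)
    return -math.log(freqs[mutant] / freqs[wildtype])


# Toy alignment (made-up sequences, not hemagglutinin data).
toy_alignment = ["MKVLA", "MKILA", "MRVLA", "MKVLG", "MKVLA"]

# At column 2, V appears four times and I once.
score_to_consensus = consensus_score(toy_alignment, 2, "I", "V")
score_from_consensus = consensus_score(toy_alignment, 2, "V", "I")
print(score_to_consensus, score_from_consensus)
```

Note that this heuristic treats every sequence in the alignment as an independent observation. The abstract's approach instead takes the known phylogeny of the homologous sequences into account.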

Topics: Research Article
Publisher: Public Library of Science
Provided by: PubMed Central

