
    Strong Selection Significantly Increases Epistatic Interactions in the Long-Term Evolution of a Protein

    Epistatic interactions between residues determine a protein's adaptability and shape its evolutionary trajectory. When a protein experiences a changed environment, it is under strong selection to find a peak in the new fitness landscape. It has been shown that strong selection increases epistatic interactions as well as the ruggedness of the fitness landscape, but little is known about how epistatic interactions change under selection in the long-term evolution of a protein. Here we analyze the evolution of epistasis in the protease of human immunodeficiency virus type 1 (HIV-1), using protease sequences collected over almost a decade from both treated and untreated patients, to understand how epistasis changes and how those changes affect the long-term evolvability of a protein. We use an information-theoretic proxy for epistasis that quantifies the co-variation between sites, and show that positive information is a necessary (but not sufficient) condition for epistasis and detects it in most cases. We analyze the "fossils" of the protein's evolutionary trajectories contained in the sequence data, and show that epistasis continues to increase under strong selection, but not in proteins whose environment is unchanged. The increase in epistasis compensates for the information loss due to the sequence variability brought about by treatment, and facilitates adaptation in the increasingly rugged fitness landscape of treatment. While epistasis is thought to enhance evolvability via valley-crossing early in adaptation, it can hinder adaptation later, once the landscape has turned rugged. However, we find no evidence that the HIV-1 protease has reached its evolutionary potential after 9 years of adapting to a drug environment that is itself constantly changing.
    Comment: 25 pages, 9 figures, plus Supplementary Material including Supplementary Text S1-S7, Supplementary Tables S1-S2, and Supplementary Figures S1-S2. Version that appears in PLoS Genetics
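    The abstract describes an information-theoretic proxy for epistasis that quantifies co-variation between sites. A standard quantity of this kind is the mutual information between two alignment columns, I(i; j) = H(i) + H(j) - H(i, j). The sketch below computes a plain-count estimate of it; this is an illustration under that assumption, not the authors' exact estimator, which may apply gap handling or finite-sample corrections. The function names and the toy alignment are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(symbols):
    """Shannon entropy (bits) of the empirical distribution of `symbols`."""
    n = len(symbols)
    return -sum((c / n) * log2(c / n) for c in Counter(symbols).values())


def mutual_information(alignment, i, j):
    """Plug-in estimate of I(i; j) = H(i) + H(j) - H(i, j) for columns i and j.

    `alignment` is a list of equal-length sequences; no gap handling or
    finite-sample correction is applied.
    """
    col_i = [seq[i] for seq in alignment]
    col_j = [seq[j] for seq in alignment]
    joint = list(zip(col_i, col_j))
    return entropy(col_i) + entropy(col_j) - entropy(joint)


if __name__ == "__main__":
    # Hypothetical toy alignment: sites 0 and 1 co-vary perfectly, site 2 is invariant.
    toy = ["ALV", "ALV", "GFV", "GFV"]
    print(mutual_information(toy, 0, 1))  # 1.0
    print(mutual_information(toy, 0, 2))  # 0.0
```

    Positive I(i; j) only says that two sites co-vary; as the abstract notes, it is necessary but not sufficient evidence of an epistatic interaction.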

    Inferring stabilizing mutations from protein phylogenies: application to influenza hemagglutinin

    One selection pressure shaping sequence evolution is the requirement that a protein fold with sufficient stability to perform its biological functions. We present a conceptual framework that explains how this requirement causes the probability that a particular amino acid mutation is fixed during evolution to depend on its effect on protein stability. We mathematically formalize this framework to develop a Bayesian approach for inferring the stability effects of individual mutations from homologous protein sequences of known phylogeny. This approach predicts published experimentally measured mutational stability effects (ΔΔG values) with an accuracy that exceeds both a state-of-the-art physicochemical modeling program and the sequence-based consensus approach. As a further test, we use our phylogenetic inference approach to predict stabilizing mutations to influenza hemagglutinin. We introduce these mutations into a temperature-sensitive influenza virus with a defect in its hemagglutinin gene and experimentally demonstrate that some of the mutations allow the virus to grow at higher temperatures. Our work therefore describes a powerful new approach for predicting stabilizing mutations that can be successfully applied even to large, complex proteins such as hemagglutinin. This approach also makes a mathematical link between phylogenetics and experimentally measurable protein properties, potentially paving the way for more accurate analyses of molecular evolution.
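    To make the link between stability and fixation concrete, here is a toy model, not the authors' Bayesian method. It assumes (1) a two-state folding model in which the fraction of folded protein is 1 / (1 + exp(ΔG/RT)), (2) fitness proportional to that fraction, and (3) Kimura's fixation probability for a single new mutant in a haploid population of size N. The function names, the ΔG convention (ΔΔG > 0 destabilizes) and the parameter values are hypothetical.

```python
import math

R = 1.987e-3  # gas constant, kcal/(mol*K)


def fraction_folded(dG_fold, T=310.0):
    """Two-state model: fraction folded; dG_fold < 0 (kcal/mol) means the fold is stable."""
    return 1.0 / (1.0 + math.exp(dG_fold / (R * T)))


def fixation_probability(s, N):
    """Kimura-type fixation probability of one new mutant in a haploid population of size N.

    pi = (1 - exp(-2s)) / (1 - exp(-2Ns)), rearranged for s < 0 to avoid overflow.
    """
    if s == 0.0:
        return 1.0 / N  # neutral limit
    if s > 0.0:
        return math.expm1(-2.0 * s) / math.expm1(-2.0 * N * s)
    # Multiply numerator and denominator by -exp(2Ns), which is tiny when s < 0.
    return math.expm1(-2.0 * s) * math.exp(2.0 * N * s) / -math.expm1(2.0 * N * s)


def fixation_given_ddG(dG_wt, ddG, N=1000, T=310.0):
    """Fixation probability of a mutation with stability effect ddG (> 0 destabilizes).

    Assumes fitness is proportional to the fraction of folded protein.
    """
    w_wt = fraction_folded(dG_wt, T)
    w_mut = fraction_folded(dG_wt + ddG, T)
    s = w_mut / w_wt - 1.0
    return fixation_probability(s, N)


if __name__ == "__main__":
    # Hypothetical marginally stable protein (dG_fold = -2 kcal/mol), N = 1000.
    for ddG in (-1.0, 0.0, 1.5):
        print(ddG, fixation_given_ddG(-2.0, ddG))
```

    In this toy model a stabilizing mutation fixes more often than the neutral rate 1/N and a destabilizing one much less often, which is the qualitative dependence the framework exploits when reading stability effects off a phylogeny.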

    FoxO gene family evolution in vertebrates

    Background: Forkhead box, class O (FoxO) belongs to the large family of forkhead transcription factors, which are characterized by a conserved forkhead box DNA-binding domain. To date, the FoxO group has four mammalian members, FoxO1, FoxO3a, FoxO4 and FoxO6, which are orthologs of DAF16, an insulin-responsive transcription factor involved in regulating longevity in worms and flies. The degree of homology among these four members is high, especially in the forkhead domain, which contains the DNA-binding interface. Yet mouse FoxO knockouts have revealed that each FoxO gene has a unique physiological role. Whether these functional divergences are primarily due to adaptive selection pressure or to relaxed selective constraint remains an open question. This study therefore addresses the mode of FoxO evolution that may have led to the functional divergence.
    Results: Sequence similarity searches were performed on genome and scaffold data to identify FoxO homologues in vertebrates. Phylogenetic analysis was used to characterize the evolutionary history of the family and identified two duplications early in vertebrate evolution. To determine the mode of evolution in vertebrates, we performed a rigorous statistical analysis of FoxO gene sequences, including relative rate ratio tests, branch-specific dN/dS ratio tests, site-specific dN/dS ratio tests, branch-site dN/dS ratio tests and an analysis of clade-level amino acid conservation/variation patterns. Our results suggest that FoxO is constrained by strong purifying selection, except at four sites in FoxO6 that have undergone positive Darwinian selection. The functional divergence in this family is best explained by either relaxed purifying selection or positive selection.
    Conclusion: We present a phylogeny describing the evolutionary history of the FoxO gene family and show that the genes have evolved through duplications followed by purifying selection, except for four sites in FoxO6 that were fixed by positive selection and lie mostly within the non-conserved optimal PKB motif in the C-terminal part. Relaxed selection may also play an important role in the functional differentiation that followed the gene duplications.
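    The dN/dS tests listed above are likelihood ratio tests between nested codon models. The abstract does not name the software or the null distributions used, so the following is a minimal sketch under common assumptions: the statistic 2ΔlnL is compared with a chi-square distribution (df = 1 or 2), and, for a branch-site test, optionally with a 50:50 mixture of a point mass at 0 and chi-square(1), a null often used with PAML's codeml. The function names and the log-likelihood values are hypothetical.

```python
import math


def chi2_sf(x, df):
    """P(X > x) for a chi-square variable with df = 1 or df = 2."""
    if x <= 0.0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only implements df = 1 or df = 2")


def likelihood_ratio_test(lnL_null, lnL_alt, df=1, mixture=False):
    """Return (2*delta lnL, p-value) for nested models.

    With mixture=True (df = 1 only), the null distribution is taken as a 50:50
    mixture of a point mass at 0 and chi-square(1), which halves the p-value.
    """
    if mixture and df != 1:
        raise ValueError("the 50:50 mixture is only implemented for df = 1")
    stat = 2.0 * (lnL_alt - lnL_null)
    if stat <= 0.0:
        return 0.0, 1.0  # the alternative model fits no better than the null
    p = chi2_sf(stat, df)
    return stat, (0.5 * p if mixture else p)


if __name__ == "__main__":
    # Hypothetical log-likelihoods, e.g. from a branch-site model with and without positive selection.
    print(likelihood_ratio_test(-1000.0, -995.0, mixture=True))
```

    The closed forms for df = 1 (via erfc) and df = 2 (an exponential) keep the sketch dependency-free; other degrees of freedom would need a general chi-square survival function.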

    A thermodynamic basis for prebiotic amino acid synthesis and the nature of the first genetic code

    Of the twenty amino acids used in proteins, ten were formed in Miller's atmospheric discharge experiments. The two other major proposed sources of prebiotic amino acid synthesis are formation in hydrothermal vents and delivery to Earth via meteorites. We combine observational and experimental data on the frequencies of amino acids formed by these diverse mechanisms and show that, regardless of the source, these ten early amino acids can be ranked in order of decreasing abundance in prebiotic contexts. This order can be predicted by thermodynamics. The relative abundances of the early amino acids were most likely reflected in the composition of the first proteins at the time the genetic code originated. The remaining amino acids were incorporated into proteins after pathways for their biochemical synthesis evolved. This is consistent with theories of the evolution of the genetic code by stepwise addition of new amino acids. These results hint that key aspects of early biochemistry may be universal.
    Comment: 16 pages, 2 tables, 4 figures. Accepted for publication in Astrobiology
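    The claim that different sources rank the ten early amino acids in the same order can be checked with a rank correlation. The abstract does not say which statistic was used; as an assumption, the sketch below uses Spearman's rho (Pearson correlation of tie-averaged ranks). The function names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ascending ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the (tie-averaged) ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

    For example, `spearman(source_a, source_b)` on two hypothetical lists of abundances for the same ten amino acids, listed in the same order, gives a value near 1 when both sources rank them alike; the same function can compare an abundance list with a list of formation free energies, where a strongly negative rho would indicate that lower free energy goes with higher abundance.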

    Directional Darwinian Selection in proteins


    On the construction and interpretation of fitness landscapes for HIV: a computational perspective

    To identify vulnerable viral targets for inclusion in an immunogen, fitness landscapes have been constructed for the viral proteome. These landscapes describe the additive or synergistic replicative cost exacted on the virus by any combination of non-synonymous mutations. Here we attempt to assess the robustness of current computational methods for measuring the fitness cost of HIV polymorphisms in these landscapes. In the following chapters we also address assumptions and shortcomings that may underlie the uneven ability of current landscapes to predict fitness effects. In the first chapter, I appraise the robustness of current frameworks that derive fitness costs from patient sequence data. In this chapter I also address the field's over-reliance on cross-sectional data, which is justified by the assumptions that viral populations 1) can be regarded as ideal populations at equilibrium and 2) are by and large unmarred by host pressures. To explore how these problematic assumptions may undermine landscape construction, I assemble an alternative landscape in which fitness costs are measured directly from temporal population fluxes using a dynamical systems framework. This landscape paints a very different picture of the fitness topography. In the following chapter, I tackle another problematic aspect of current landscapes: their neglect of physicochemical detail. I demonstrate that this model simplification leads us to underestimate or overestimate fitness costs at positions with highly divergent or similar physicochemical character. In response, I adapt a population genetics model to account for the functional impact of each residue mutation, and show that it improves our ability to predict in vitro viral fitness. Finally, in the last chapter, we employ several different fitness metrics to determine whether the overall topography of the fitness landscape shifts over the course of early infection. Research has suggested that the replicative capacity of the virus increases over time and that viral populations continuously evolve in response to immune pressures. We found that, although the protein was not mutationally static at residue resolution, at the regional and protein level it remained static due to compensating mutations.
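    The contrast between cross-sectional and temporal estimates of fitness cost can be illustrated with two textbook formulas. This is not the dynamical-systems framework used in the work; it is a sketch assuming (1) a haploid population at deterministic mutation-selection balance, where a costly variant's frequency is roughly μ/s, and (2) constant selection in continuous time, dp/dt = s·p·(1 - p), under which logit(p(t)) = logit(p(0)) + s·t. The function names and all numbers are hypothetical.

```python
import math


def logit(p):
    """Log-odds of a frequency 0 < p < 1."""
    return math.log(p / (1.0 - p))


def selection_from_time_series(times, freqs):
    """Least-squares slope of logit(mutant frequency) against time.

    Under dp/dt = s * p * (1 - p) (constant Malthusian selection coefficient s),
    logit(p(t)) = logit(p(0)) + s * t, so the slope estimates s; the cost is -s.
    Frequencies must lie strictly between 0 and 1.
    """
    ys = [logit(p) for p in freqs]
    n = len(times)
    mt, my = sum(times) / n, sum(ys) / n
    num = sum((t - mt) * (y - my) for t, y in zip(times, ys))
    den = sum((t - mt) ** 2 for t in times)
    return num / den


def cost_from_equilibrium(freq, mu):
    """Cross-sectional estimate assuming deterministic mutation-selection balance.

    For a haploid population at equilibrium, freq ~= mu / s, so s ~= mu / freq.
    Valid only if the population really is at equilibrium and mu << s.
    """
    return mu / freq
```

    The equilibrium estimate needs only one snapshot but inherits the equilibrium and no-host-pressure assumptions criticized above; the temporal estimate needs repeated sampling but reads the cost directly from how fast the variant's frequency changes.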

    The adaptive evolution of the mammalian mitochondrial genome

    Background: The mitochondria produce up to 95% of a eukaryotic cell's energy through oxidative phosphorylation. The proteins involved in this vital process are under high functional constraints. However, metabolic requirements vary across species, potentially modifying selective pressures. We evaluate the adaptive evolution of 12 protein-coding mitochondrial genes in 41 placental mammalian species by assessing amino acid sequence variation and exploring the functional implications of the observed variation in secondary and tertiary protein structures.
    Results: Wide variation in amino acid properties was observed at functionally important regions of cytochrome b in species with more specialized metabolic requirements, such as adaptation to a low-energy diet or large body size (elephant, dugong, sloth and pangolin) and adaptation to unusual oxygen requirements (diving in cetaceans, flying in bats and living at high altitude in alpacas). Signatures of adaptive variation in the NADH dehydrogenase complex were restricted to the loop regions of the transmembrane units, which likely function as proton pumps. Evidence of adaptive variation in the cytochrome c oxidase complex was observed mostly at the interface between the mitochondrial and nuclear-encoded subunits, possibly reflecting co-evolution. The ATP8 subunit, which has an important role in the assembly of F₀, exhibited the highest signal of adaptive variation. ATP6, which has an essential role in rotor performance, showed high adaptive variation in predicted loop areas.
    Conclusion: Our study provides insight into the adaptive evolution of the mtDNA genome in mammals and its implications for the molecular mechanism of oxidative phosphorylation. We present a framework for future experimental characterization of the impact of specific mutations on the function, physiology and interactions of the mtDNA-encoded proteins involved in oxidative phosphorylation.
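    The results above rest on how much amino acid properties vary at each site. The abstract does not say which properties or software were used; as an assumption, the sketch below scores each alignment column by the variance of the Kyte-Doolittle hydropathy index across sequences. This is far simpler than a phylogeny-aware property test and only shows the idea of a per-site property-variation score. The function names are hypothetical.

```python
from statistics import pvariance

# Kyte-Doolittle hydropathy index (assumed here as the example property scale).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def site_property_variance(alignment, scale=KD):
    """Population variance of a property scale at each alignment column.

    Gaps and non-standard residues are skipped; columns with fewer than two
    scored residues get 0.0.
    """
    out = []
    for i in range(len(alignment[0])):
        vals = [scale[seq[i]] for seq in alignment if seq[i] in scale]
        out.append(pvariance(vals) if len(vals) >= 2 else 0.0)
    return out


def most_variable_sites(alignment, top=5, scale=KD):
    """Indices of the `top` columns with the largest property variance."""
    v = site_property_variance(alignment, scale)
    return sorted(range(len(v)), key=lambda i: v[i], reverse=True)[:top]
```

    Swapping `scale` for another per-residue property table (volume, charge, polarity) reuses the same scoring; mapping the top-scoring columns onto loop or interface regions would mirror the structural interpretation described above.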

    The investigation of type-specific features of the copper coordinating AA9 proteins and their effect on the interaction with crystalline cellulose using molecular dynamics studies

    AA9 proteins are metalloenzymes that are crucial for the early stages of cellulose degradation. AA9 proteins have been suggested to cleave the glycosidic bonds of cellulose using their Cu2+-coordinating active site. AA9 proteins differ in regioselectivity, depending on which cleavage they perform, and are grouped accordingly: Type 1 AA9 proteins cleave at the C1 carbon of cellulose, Type 2 AA9 proteins cleave at the C4 carbon, and Type 3 AA9 proteins cleave at either the C1 or the C4 carbon. Steric congestion of the AA9 active site has been proposed to contribute to the observed regioselectivity. A bioinformatics characterisation of type-specific sequence and structural features was therefore performed. Initially, AA9 protein sequences were obtained from the Pfam database and a multiple sequence alignment was performed. The sequences were characterised phylogenetically and grouped into their respective types, and sub-groups were identified. A selection analysis was performed on the AA9 lytic polysaccharide monooxygenase (LPMO) types to determine the selective pressure acting on AA9 protein residues. Motif discovery was then performed to identify conserved sequence motifs in AA9 proteins. Once type-specific sequence features were identified, structural mapping was performed to assess their possible effects on substrate interaction. A physicochemical property analysis was also performed to assess biochemical differences between AA9 LPMO types. Molecular dynamics (MD) simulations were then employed to assess dynamically the consequences of the discovered type-specific features for AA9-cellulose interaction. Because AA9-specific force field parameters were lacking, MD simulations were not readily applicable. Potential Energy Surface (PES) scans were therefore performed to evaluate force field parameters for the AA9 active site using the PM6 semi-empirical approach and least-squares fitting.
    A Type 1 AA9 active site was constructed from the crystal structure 4B5Q, encompassing only the Cu2+-coordinating residues, the Cu2+ ion and two water molecules. Owing to the similarity of AA9 active sites, the Type 1 force field parameters were validated on all three AA9 LPMO types. Two MD simulations were conducted for each AA9 LPMO type, using two separate Lennard-Jones parameter sets. Once completed, the MD trajectories were analysed for various features, including RMSD, RMSF, radius of gyration, coordination during the simulation, hydrogen bonding, secondary structure conservation and overall protein movement. Force field parameters were successfully evaluated and validated for AA9 proteins. MD simulations of AA9 proteins revealed unique type-specific binding modes of AA9 active sites to cellulose. These binding modes were characterised by unique type-specific loops present in Type 2 and 3 AA9 proteins but not in Type 1 AA9 proteins. The loops were found to cause steric congestion that affects how the Cu2+ ion interacts with cellulose. As a result, Cu2+ binding to cellulose was observed for Type 1 but not for Type 2 and 3 AA9 proteins. In this study, force field parameters have been evaluated for the Type 1 active site of AA9 proteins, and these parameters were validated on all three types and used to study their binding to cellulose. Future work will focus on identifying the nature of the reactive oxygen species and on performing QM/MM calculations to elucidate the reaction mechanism of all three AA9 LPMO types.
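    The parameterisation step fits force-field terms to PES scan energies by least squares. As a minimal sketch, assuming a single harmonic bond term of the AMBER-style form E(r) = k(r - r0)^2 + c and scan energies already computed (for example with PM6), the fit reduces to a quadratic regression from which k and r0 follow directly. Real active-site parameterisation fits many coupled bond, angle and dihedral terms; the function names here are hypothetical.

```python
def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def _solve3(a, rhs):
    """Solve a 3x3 linear system with Cramer's rule."""
    d = _det3(a)
    if d == 0.0:
        raise ValueError("singular normal equations (need >= 3 distinct points)")
    sol = []
    for col in range(3):
        m = [row[:] for row in a]
        for i in range(3):
            m[i][col] = rhs[i]
        sol.append(_det3(m) / d)
    return sol


def fit_harmonic(r, energy):
    """Least-squares fit of E(r) = k * (r - r0)**2 + c to scan points.

    Returns (k, r0). Uses the AMBER-style form without a factor of 1/2.
    """
    n = len(r)
    r_mean = sum(r) / n
    x = [ri - r_mean for ri in r]  # centre coordinates for numerical stability
    s = [sum(xi ** p for xi in x) for p in range(5)]  # s[p] = sum of x**p
    t = [sum(e * xi ** p for xi, e in zip(x, energy)) for p in range(3)]
    # Normal equations for E ~= a*x**2 + b*x + c.
    a_mat = [[s[4], s[3], s[2]],
             [s[3], s[2], s[1]],
             [s[2], s[1], s[0]]]
    a, b, _c = _solve3(a_mat, [t[2], t[1], t[0]])
    if a <= 0.0:
        raise ValueError("scan is not convex; a harmonic term does not fit")
    return a, r_mean - b / (2.0 * a)
```

    Expanding k(x - x0)^2 + c gives a quadratic in x, so its leading coefficient is k and its vertex is the equilibrium value; the same fit applies to a harmonic angle term if r is replaced by the scanned angle.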