
    PROTS: A fragment-based protein thermo‐stability potential

    Designing proteins with enhanced thermo‐stability has been a main focus of protein engineering because of its theoretical and practical significance. Despite extensive studies in past years, a general strategy for stabilizing proteins remains elusive. Thus, effective and robust computational algorithms for designing thermo‐stable proteins are in critical demand. Here we report PROTS, a sequential and structural four‐residue fragment-based protein thermo‐stability potential. PROTS is derived from a nonredundant representative collection of thousands of thermophilic and mesophilic protein structures and a large set of point mutations with experimentally determined changes in melting temperature. To the best of our knowledge, PROTS is the first protein stability predictor based on integrated analysis and mining of these two types of data. Besides conventional cross-validation and blind testing, we introduce hypothetical reverse mutations as a means of testing the robustness of protein thermo‐stability predictors. In all tests, PROTS demonstrated the ability to reliably predict mutation-induced thermo‐stability changes and to classify thermophilic and mesophilic proteins. In addition, this white‐box predictor allows easy interpretation, at the residue level, of the factors that influence mutation-induced protein stability changes. Proteins 2012; © 2011 Wiley Periodicals, Inc. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/89526/1/23163_ftp.pd
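The abstract describes a potential built from four-residue fragments, plus a robustness test based on hypothetical reverse mutations. Below is a minimal, sequence-only sketch of the idea. It assumes a simple log-odds fragment score with a pseudocount. PROTS itself also uses structural fragments and melting-temperature data, which this sketch ignores. All function names and the toy sequences are hypothetical. Because the predicted change here is a difference of two scores, the reverse mutation returns exactly the negated value. That antisymmetry is the property a reverse-mutation test checks, and a difference-based score satisfies it by construction.

```python
import math
from collections import Counter

K = 4  # fragment length: four residues, as in PROTS

def fragments(seq, k=K):
    """All overlapping length-k windows of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def train_potential(thermo_seqs, meso_seqs, pseudocount=1.0):
    """Hypothetical log-odds score per fragment: log(freq in thermophiles / freq in mesophiles)."""
    t_counts = Counter(f for s in thermo_seqs for f in fragments(s))
    m_counts = Counter(f for s in meso_seqs for f in fragments(s))
    vocab = set(t_counts) | set(m_counts)
    t_total = sum(t_counts.values()) + pseudocount * len(vocab)
    m_total = sum(m_counts.values()) + pseudocount * len(vocab)
    return {
        f: math.log((t_counts[f] + pseudocount) / t_total)
        - math.log((m_counts[f] + pseudocount) / m_total)
        for f in vocab
    }

def score(seq, potential):
    """Sum of fragment scores; fragments never seen in training contribute 0."""
    return sum(potential.get(f, 0.0) for f in fragments(seq))

def mutation_delta(seq, pos, new_aa, potential):
    """Score change for a point mutation at 0-based pos (positive = more thermophile-like)."""
    mutant = seq[:pos] + new_aa + seq[pos + 1:]
    return score(mutant, potential) - score(seq, potential)

# Toy, made-up training sequences (not real data).
thermo = ["KIEVKRLEEK", "RVEIKKLEER"]
meso = ["SNTQSGATSN", "NQTSGSANTQ"]
pot = train_potential(thermo, meso)

wt = "KIEVSGLEEK"
forward = mutation_delta(wt, 4, "K", pot)        # S5K
mutant = wt[:4] + "K" + wt[5:]
reverse = mutation_delta(mutant, 4, wt[4], pot)  # hypothetical reverse mutation K5S
```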

    Predicting protein thermostability changes from sequence upon multiple mutations

    Motivation: A basic question in protein science is to what extent mutations affect protein thermostability. This knowledge would be particularly relevant for engineering thermostable enzymes. Several experimental approaches have addressed this issue, often serendipitously. It would therefore be useful to have a computational method that predicts when a given protein mutant is more thermostable than its corresponding wild-type

    Role of Proteome Physical Chemistry in Cell Behavior

    We review how major cell behaviors, such as bacterial growth laws, are derived from the physical chemistry of the cell's proteins. On one hand, cell actions depend on the individual biological functionalities of their many genes and proteins. On the other hand, the common physics among proteins can be as important as the unique biology that distinguishes them. For example, bacterial growth rates depend strongly on temperature. This dependence can be explained by the folding stabilities across a cell's proteome. Such modeling explains how thermophilic and mesophilic organisms differ, and how oxidative damage of highly charged proteins can lead to unfolding and aggregation in aging cells. Cells also have characteristic time scales. For example, E. coli can divide as often as 2–3 times per hour. These time scales can be explained by protein dynamics (the rates of synthesis and degradation, folding, and diffusional transport), which also rationalize how added salt slows bacterial growth. In the same way that the behaviors of inanimate materials can be expressed in terms of the statistical distributions of atoms and molecules, some cell behaviors can be expressed in terms of distributions of protein properties, giving insights into the microscopic basis of growth laws in simple cells.
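The claim that temperature-dependent growth follows from proteome folding stabilities can be illustrated with the standard two-state Gibbs-Helmholtz model of unfolding: ΔG(T) = ΔH_m(1 − T/T_m) + ΔC_p[(T − T_m) − T ln(T/T_m)], with folded fraction 1/(1 + exp(−ΔG/RT)). The sketch below rests on several assumptions. It treats proteins as independent two-state folders and uses the probability that every protein is folded as a crude stand-in for viability. The three-protein "proteome" and its parameters are hypothetical. This is not necessarily the model used in the review.

```python
import math

R = 8.314  # gas constant, J/(mol K)

def delta_g_unfold(T, Tm, dH, dCp):
    """Two-state free energy of unfolding (J/mol); positive means the folded state is favoured."""
    return dH * (1 - T / Tm) + dCp * ((T - Tm) - T * math.log(T / Tm))

def fraction_folded(T, Tm, dH, dCp):
    """Equilibrium folded fraction of a two-state protein at temperature T (K)."""
    dG = delta_g_unfold(T, Tm, dH, dCp)
    return 1.0 / (1.0 + math.exp(-dG / (R * T)))

def proteome_folded_probability(T, proteome):
    """Probability that all proteins are folded, assuming they fold independently (assumption)."""
    p = 1.0
    for Tm, dH, dCp in proteome:
        p *= fraction_folded(T, Tm, dH, dCp)
    return p

# Hypothetical proteome: (Tm in K, dH at Tm in J/mol, dCp in J/(mol K)).
proteome = [(330.0, 400e3, 8e3), (335.0, 450e3, 9e3), (345.0, 500e3, 10e3)]
p_body = proteome_folded_probability(310.0, proteome)
p_hot = proteome_folded_probability(330.0, proteome)
```

In this toy model the least stable protein sets the ceiling: once T reaches its T_m, at most half of the "cells" have it folded.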

    A novel scoring function for discriminating hyperthermophilic and mesophilic proteins with application to predicting relative thermostability of protein mutants

    Background: The ability to design thermostable proteins is theoretically important and practically useful. Robust and accurate algorithms, however, remain elusive. One critical problem is the lack of reliable methods to estimate the relative thermostability of possible mutants. Results: We report a novel scoring function for discriminating hyperthermophilic and mesophilic proteins, with application to predicting the relative thermostability of protein mutants. The scoring function was developed from a detailed analysis of a set of features calculated or predicted from 540 pairs of hyperthermophilic and mesophilic protein ortholog sequences. It is a linear combination of ten important features identified by a feature-ranking procedure based on the random forest classification algorithm, and the weights of these features were fitted by a hill-climbing algorithm. The scoring function discriminates hyperthermophilic from mesophilic sequences very well: prediction accuracies reached 98.9% and 97.3% in discriminating orthologous pairs in the training and holdout testing datasets, respectively. Moreover, the scoring function can distinguish non-homologous sequences with an accuracy of 88.4%. Additional blind tests on two datasets of experimentally investigated mutations showed that the scoring function can predict the relative thermostability of proteins and their mutants with high accuracy (92.9% and 94.4%). We also developed an amino acid substitution preference matrix between mesophilic and hyperthermophilic proteins, which may be useful in designing more thermostable proteins.
    Conclusions: We have presented a novel scoring function that distinguishes, with high accuracy, not only hyperthermophilic/mesophilic (HP/MP) ortholog pairs but also non-homologous pairs. Most importantly, it can be used to accurately predict the relative stability of proteins and their mutants, as demonstrated in two blind tests. In addition, the residue substitution preference matrix assembled in this study may reflect substitution biases induced by thermal adaptation. A web server implementing the scoring function and the dataset used in this study are freely available at http://www.abl.ku.edu/thermorank/
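The abstract describes a score that is a weighted sum of ten features, with weights fitted by hill climbing. Below is a minimal sketch of that kind of fit. It assumes the objective is the fraction of ortholog pairs in which the hyperthermophilic member scores higher than its mesophilic partner. It uses a deterministic, coordinate-wise hill climb with a fixed step. The two toy features and their values are made up. The paper's actual features, objective and search settings are not given here.

```python
def score(weights, features):
    """Linear scoring function: weighted sum of (hypothetical) sequence features."""
    return sum(w * x for w, x in zip(weights, features))

def pair_accuracy(weights, pairs):
    """Fraction of (hyperthermophile, mesophile) pairs in which the HP member scores higher."""
    correct = sum(score(weights, hp) > score(weights, mp) for hp, mp in pairs)
    return correct / len(pairs)

def hill_climb(pairs, n_features, step=0.1, max_iter=100):
    """Greedy coordinate hill climbing: keep any single-weight move that raises accuracy."""
    weights = [0.0] * n_features
    best = pair_accuracy(weights, pairs)
    for _ in range(max_iter):
        improved = False
        for i in range(n_features):
            for delta in (step, -step):
                candidate = list(weights)
                candidate[i] += delta
                acc = pair_accuracy(candidate, pairs)
                if acc > best:
                    weights, best, improved = candidate, acc, True
        if not improved:  # local optimum for this step size
            break
    return weights, best

# Toy ortholog pairs: (HP features, MP features); feature 0 is higher in every HP member.
pairs = [
    ([0.30, 0.5], [0.20, 0.6]),
    ([0.28, 0.4], [0.22, 0.3]),
    ([0.35, 0.7], [0.25, 0.7]),
    ([0.32, 0.2], [0.21, 0.5]),
]
weights, best = hill_climb(pairs, n_features=2)
```

A deterministic search is used here only so the result is reproducible. Randomized perturbations or restarts are common alternatives when the objective has many plateaus.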

    MACHINE LEARNING AND BIOINFORMATIC INSIGHTS INTO KEY ENZYMES FOR A BIO-BASED CIRCULAR ECONOMY

    The world is presently faced with a sustainability crisis; it is becoming increasingly difficult to meet the energy and material needs of a growing global population without depleting and polluting our planet. Greenhouse gases released by the continuous combustion of fossil fuels accelerate climate change, and plastic waste accumulates in the environment. There is a need for a circular economy, in which energy and materials are renewably derived from waste items rather than by consuming limited resources. Deconstruction of the recalcitrant linkages in natural and synthetic polymers is crucial for a circular economy, as deconstructed monomers can be used to manufacture new products. In Nature, organisms utilize enzymes for the efficient depolymerization and conversion of macromolecules. Consequently, by employing enzymes industrially, biotechnology holds great promise for energy- and cost-efficient conversion of materials for a circular economy. However, an enhanced molecular-level understanding of enzymes is needed to enable economically viable technologies that can be applied on a global scale. This work is a computational study of key enzymes that catalyze reactions important for a bio-based circular economy. Specifically, bioinformatics and data-mining approaches were employed to study family 7 glycoside hydrolases (GH7s), which are the principal enzymes in Nature for deconstructing cellulose to simple sugars; a cytochrome P450 enzyme (GcoA) that catalyzes the demethylation of lignin subunits; and MHETase, a tannase-family enzyme utilized by the bacterium Ideonella sakaiensis in the degradation and assimilation of polyethylene terephthalate (PET).
Since enzyme function is fundamentally dependent on the primary amino-acid sequence, we hypothesize that machine-learning algorithms can be trained on an ensemble of functionally related enzymes to reveal functional patterns in the enzyme family, and to map the primary sequence to enzyme function such that functional properties can be predicted for a new enzyme sequence with significant accuracy. We find that supervised machine learning identifies important residues for processivity and accurately predicts functional subtypes and domain architectures in GH7s. Bioinformatic analyses revealed conserved active-site residues in GcoA and informed protein engineering that enabled expanded enzyme specificity and improved activity. Similarly, bioinformatic studies and phylogenetic analysis provided evolutionary context and identified crucial residues for MHET-hydrolase activity in a tannase-family enzyme (MHETase). Lastly, we developed machine-learning models to predict enzyme thermostability, allowing for high-throughput screening of enzymes that can catalyze reactions at elevated temperatures. Altogether, this work provides a solid basis for a computational data-driven approach to understanding, identifying, and engineering enzymes for biotechnological applications towards a more sustainable world
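The thesis describes mapping primary sequence to a functional property such as thermostability with supervised learning, but the abstract does not state the features or model. The sketch below illustrates the general idea only. It assumes amino-acid composition as the feature set and logistic regression trained by plain batch gradient descent. The training sequences and labels are made up, and all function names are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq):
    """20-dimensional amino-acid composition (fraction of each residue type)."""
    seq = seq.upper()
    n = len(seq)
    return [seq.count(a) / n for a in AMINO_ACIDS]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=1.0, epochs=1000):
    """Batch gradient descent on the mean logistic (cross-entropy) loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict_proba(w, b, seq):
    """Predicted probability that seq belongs to the thermostable class (label 1)."""
    x = composition(seq)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Made-up toy data: label 1 = thermostable, 0 = not.
thermo = ["KEEIRKLEKV", "RKEEIVKKLE"]
meso = ["STNQSGTSNA", "NQSTGSTQNA"]
X = [composition(s) for s in thermo + meso]
y = [1, 1, 0, 0]
w, b = train_logistic(X, y)
```

Once trained on real labelled enzymes, a model like this can score many candidate sequences cheaply. That is the kind of high-throughput screening the abstract describes.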

    Mechanisms for stabilisation and the maintenance of solubility in proteins from thermophiles

    BACKGROUND: The database of protein structures contains representatives from organisms with a range of growth temperatures. Various properties have been studied in a search for the molecular basis of protein adaptation to higher growth temperature. Charged groups have emerged as key distinguishing factors for proteins from thermophiles and mesophiles. RESULTS: A dataset of 291 thermophile-derived protein structures is compared with mesophile proteins. Calculations of electrostatic interactions support the importance of charges, but indicate that increases in charge contribution to folded state stabilisation do not generally correlate with the numbers of charged groups. The relative propensities of charged groups also vary, as in the substitution of glutamic acid for aspartic acid sidechains. Calculations suggest an energetic basis for this substitution, with less dehydration for longer sidechains. Most other properties studied show weak or insignificant separation between mesophile proteins and those from moderate thermophiles or hyperthermophiles, including an estimate of the difference in sidechain rotameric entropy upon protein folding. An exception is increased burial of alanine and proline residues and decreased burial of phenylalanine, methionine, tyrosine and tryptophan in hyperthermophile proteins compared to those from mesophiles. CONCLUSION: Since an increase in the number of charged groups for hyperthermophile proteins is separable from charged group contribution to folded state stability, we hypothesise that charged group propensity is important in the context of protein solubility and the prevention of aggregation. Accordingly, we find some separation between mesophile and hyperthermophile proteins when looking at the largest surface patch that does not contain a charged sidechain.
With regard to our observation that aromatic sidechains are less buried in hyperthermophile proteins, further analysis indicates that the placement of some of these groups may facilitate the reduction of folding fluctuations in proteins of the higher growth temperature organisms
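The conclusion refers to the largest surface patch that contains no charged sidechain. The sketch below shows one way to compute such a quantity under stated assumptions. Surface residues are given as one representative coordinate each, and Asp, Glu, Lys and Arg count as charged (His is treated as neutral). Two uncharged residues belong to the same patch if they lie within a hypothetical 7 Å cutoff. Patch size is measured in residues rather than surface area. The paper's own patch definition is not given in the abstract.

```python
import math
from collections import deque

CHARGED = {"ASP", "GLU", "LYS", "ARG"}  # His treated as neutral here (assumption)

def largest_uncharged_patch(residues, cutoff=7.0):
    """Size, in residues, of the largest connected set of uncharged surface residues.

    residues: list of (resname, (x, y, z)) for surface-exposed residues only.
    Two uncharged residues are neighbours if their coordinates are within `cutoff` Angstrom.
    """
    nodes = [xyz for name, xyz in residues if name not in CHARGED]
    seen = [False] * len(nodes)
    best = 0
    for start in range(len(nodes)):
        if seen[start]:
            continue
        # Breadth-first search over the neighbour graph from this unvisited residue.
        seen[start] = True
        queue, size = deque([start]), 0
        while queue:
            i = queue.popleft()
            size += 1
            for j in range(len(nodes)):
                if not seen[j] and math.dist(nodes[i], nodes[j]) <= cutoff:
                    seen[j] = True
                    queue.append(j)
        best = max(best, size)
    return best

# Hypothetical toy surface: a Glu splits a line of residues into patches of 2 and 3.
residues = [
    ("ALA", (0.0, 0.0, 0.0)),
    ("LEU", (5.0, 0.0, 0.0)),
    ("GLU", (10.0, 0.0, 0.0)),
    ("VAL", (15.0, 0.0, 0.0)),
    ("SER", (20.0, 0.0, 0.0)),
    ("THR", (25.0, 0.0, 0.0)),
]
```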

    Structural analysis of protein complexes associated with DNA maintenance

    Genomics and genetics of Sulfolobus islandicus LAL14/1, a model hyperthermophilic archaeon

    The 2,465,177 bp genome of Sulfolobus islandicus LAL14/1, host of the model rudivirus SIRV2, was sequenced. Exhaustive comparative genomic analysis of S. islandicus LAL14/1 and the nine other completely sequenced S. islandicus strains isolated from Iceland, Russia and the USA revealed a highly syntenic common core genome of approximately 2 Mb and a long hyperplastic region containing most of the strain-specific genes. In LAL14/1, the latter region is enriched in insertion sequences, CRISPR (clustered regularly interspaced short palindromic repeats), glycosyl transferase genes, toxin–antitoxin genes and MITE (miniature inverted-repeat transposable elements). The tRNA genes of LAL14/1 are preferential targets for the integration of mobile elements, but clusters of atypical genes (CAG) are also integrated elsewhere in the genome. LAL14/1 carries five CRISPR loci, and 10 per cent of their spacers match, perfectly or imperfectly, the genomes of archaeal viruses and plasmids found in Icelandic hot springs. Strikingly, the CRISPR_2 region of LAL14/1 carries an unusually long 1.9 kb spacer interspersed between two repeat regions and displays high similarity to pING1-like conjugative plasmids. Finally, we have developed a genetic system for S. islandicus LAL14/1 and created ΔpyrEF and ΔCRISPR_1 mutants using double cross-over and pop-in/pop-out approaches, respectively. Thus, LAL14/1 is a promising model for studying virus–host interactions and the CRISPR/Cas defence mechanism in Archaea.
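Counting spacers that match viral or plasmid genomes "perfectly or imperfectly" amounts to a mismatch-tolerant search on both DNA strands. Below is a minimal brute-force sketch. It assumes ungapped matches scored by Hamming distance, with a hypothetical cutoff of three mismatches for an imperfect match. It ignores PAM motifs and gaps; real analyses typically rely on dedicated alignment tools. The genome and spacers are made up.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an uppercase ACGT DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def best_match(spacer, genome):
    """Minimum Hamming distance of spacer to any same-length window on either strand."""
    k = len(spacer)
    best = k  # worst case: every position mismatches (also covers genomes shorter than k)
    for strand in (genome, reverse_complement(genome)):
        for i in range(len(strand) - k + 1):
            mismatches = sum(a != b for a, b in zip(spacer, strand[i:i + k]))
            best = min(best, mismatches)
    return best

def classify_spacer(spacer, genome, max_mismatches=3):
    """'perfect', 'imperfect' (hypothetical mismatch cutoff) or 'no match'."""
    d = best_match(spacer, genome)
    if d == 0:
        return "perfect"
    if d <= max_mismatches:
        return "imperfect"
    return "no match"

# Made-up target sequence standing in for a viral or plasmid genome.
genome = "TTTTACGTACGGATCCTTTT"
```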