306 research outputs found

    Prots: A fragment based protein thermo‐stability potential

    Designing proteins with enhanced thermo‐stability has been a main focus of protein engineering because of its theoretical and practical significance. Despite extensive study in past years, a general strategy for stabilizing proteins remains elusive. Effective and robust computational algorithms for designing thermo‐stable proteins are therefore in critical demand. Here we report PROTS, a protein thermo‐stability potential based on sequential and structural four‐residue fragments. PROTS is derived from a nonredundant representative collection of thousands of thermophilic and mesophilic protein structures and a large set of point mutations with experimentally determined changes in melting temperature. To the best of our knowledge, PROTS is the first protein stability predictor based on integrated analysis and mining of these two types of data. Besides conventional cross validation and blind testing, we introduce hypothetical reverse mutations as a means of testing the robustness of protein thermo‐stability predictors. In all tests, PROTS demonstrates the ability to reliably predict mutation‐induced thermo‐stability changes as well as to classify thermophilic and mesophilic proteins. In addition, this white‐box predictor allows easy interpretation of the factors that influence mutation‐induced protein stability changes at the residue level. Proteins 2012; © 2011 Wiley Periodicals, Inc. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/89526/1/23163_ftp.pd
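The reverse-mutation test mentioned in the abstract can be illustrated with a minimal sketch. The idea, as commonly understood, is that if a mutation X→Y changes the melting temperature by ΔTm, a robust predictor should assign roughly −ΔTm to the hypothetical reverse mutation Y→X. Everything below is an assumption made for illustration: `TOY_SCORE`, `toy_predict`, and `mean_antisymmetry_error` are hypothetical names, the scores are invented, and the toy predictor ignores structure entirely, so this is not the PROTS potential or its evaluation protocol.

```python
from typing import Callable, Iterable, Tuple

# (wild-type residue, position, mutant residue)
Mutation = Tuple[str, int, str]

# Hypothetical per-residue scores, invented for illustration only
# (these are NOT PROTS parameters).
TOY_SCORE = {"A": 0.0, "G": -1.0, "P": 1.5, "R": 0.8}


def toy_predict(mutation: Mutation) -> float:
    """Toy predicted change in melting temperature for a point mutation."""
    wild_type, _position, mutant = mutation
    return TOY_SCORE[mutant] - TOY_SCORE[wild_type]


def reverse(mutation: Mutation) -> Mutation:
    """Return the hypothetical reverse mutation (mutant back to wild type)."""
    wild_type, position, mutant = mutation
    return (mutant, position, wild_type)


def mean_antisymmetry_error(
    predict: Callable[[Mutation], float], mutations: Iterable[Mutation]
) -> float:
    """Mean |f(forward) + f(reverse)|; zero for a perfectly antisymmetric predictor."""
    errors = [abs(predict(m) + predict(reverse(m))) for m in mutations]
    if not errors:
        raise ValueError("at least one mutation is required")
    return sum(errors) / len(errors)


if __name__ == "__main__":
    test_set = [("A", 10, "P"), ("G", 5, "R")]
    print(mean_antisymmetry_error(toy_predict, test_set))
```

A predictor that always reports stabilization, whatever the direction of the mutation, would score poorly on this check, which is why the test is useful for exposing bias toward one class of mutation.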

    MACHINE LEARNING AND BIOINFORMATIC INSIGHTS INTO KEY ENZYMES FOR A BIO-BASED CIRCULAR ECONOMY

    The world is presently faced with a sustainability crisis; it is becoming increasingly difficult to meet the energy and material needs of a growing global population without depleting and polluting our planet. Greenhouse gases released from the continuous combustion of fossil fuels accelerate climate change, and plastic waste accumulates in the environment. There is a need for a circular economy, where energy and materials are renewably derived from waste items rather than by consuming limited resources. Deconstruction of the recalcitrant linkages in natural and synthetic polymers is crucial for a circular economy, as deconstructed monomers can be used to manufacture new products. In Nature, organisms utilize enzymes for the efficient depolymerization and conversion of macromolecules. Consequently, by employing enzymes industrially, biotechnology holds great promise for energy- and cost-efficient conversion of materials for a circular economy. However, an enhanced molecular-level understanding of enzymes is needed to enable economically viable technologies that can be applied on a global scale. This work is a computational study of key enzymes that catalyze important reactions that can be utilized for a bio-based circular economy. Specifically, bioinformatics and data-mining approaches were employed to study family 7 glycoside hydrolases (GH7s), which are the principal enzymes in Nature for deconstructing cellulose to simple sugars; a cytochrome P450 enzyme (GcoA) that catalyzes the demethylation of lignin subunits; and MHETase, a tannase-family enzyme utilized by the bacterium Ideonella sakaiensis in the degradation and assimilation of polyethylene terephthalate (PET).
Since enzyme function is fundamentally dependent on the primary amino-acid sequence, we hypothesize that machine-learning algorithms can be trained on an ensemble of functionally related enzymes to reveal functional patterns in the enzyme family, and to map the primary sequence to enzyme function such that functional properties can be predicted for a new enzyme sequence with significant accuracy. We find that supervised machine learning identifies important residues for processivity and accurately predicts functional subtypes and domain architectures in GH7s. Bioinformatic analyses revealed conserved active-site residues in GcoA and informed protein engineering that enabled expanded enzyme specificity and improved activity. Similarly, bioinformatic studies and phylogenetic analysis provided evolutionary context and identified crucial residues for MHET-hydrolase activity in a tannase-family enzyme (MHETase). Lastly, we developed machine-learning models to predict enzyme thermostability, allowing for high-throughput screening of enzymes that can catalyze reactions at elevated temperatures. Altogether, this work provides a solid basis for a computational, data-driven approach to understanding, identifying, and engineering enzymes for biotechnological applications towards a more sustainable world.

    Novel DNA ligases from the Red Sea brine pools: Cloning, expression, in silico characterization and comparative thermostability

    Extreme physicochemical conditions such as high temperature, high salinity, and the presence of heavy metals are characteristic of some Red Sea brine pool environments. We screened two Red Sea brine pools, Atlantis II (AT-II) and Discovery Deep (DD), and one interface layer (Kebrit Deep, KB) to identify novel DNA ligases with potentially extreme biochemical properties. Furthermore, we carried out an in silico comparative thermostability study by examining the stabilizing role of proline and arginine residues in the loop conformations and exposed regions of ligase sequences from metagenomic assemblies of different extreme environments, including the Red Sea metagenomes. A sequence-based metagenomics approach was used to identify putative DNA ligase sequences in the Red Sea brine pool and interface layer metagenomes downloaded from the NCBI database. A total of 6,148,453 metagenomic reads were assembled using MEGAHIT, which generated 783,176 contigs. A concatenated HMM built from the raw HMMs of the ATP and NAD+ ligase domains available in the Pfam database was used to scan ORFs predicted from the contigs. A total of 18 ORFs were identified, and two of them, LigATL1 (ATP type) from AT-II and LigKDU4 (NAD+ type) from KB, were selected for synthesis, phylogenetic study, and further preliminary characterization. LigATL1 was cloned, expressed, and partially purified. Additionally, ligase sequences from psychrophilic, mesophilic, thermophilic, and hyperthermophilic environments were retrieved from the NCBI database for a comparative thermostability study with some of the putative Red Sea ligase sequences. The 22 retrieved ligase sequences were divided into five groups by closest taxonomy. The ConSurf and DisEMBL servers were used to analyze proline (Pro) and arginine (Arg) residues in the exposed/buried regions and in the loop and hot loop regions, respectively, of the putative ligases (retrieved + Red Sea).
The putative LigATL1 showed 38% identity to an ATP-dependent DNA ligase from an Erysipelotrichaceae bacterium, while LigKDU4 showed 60% identity to an NAD+-dependent DNA ligase from a Candidatus Marinimicrobia bacterium. The phylogenetic analysis suggests that LigATL1 belongs to the LigD (ATP type) family, while LigKDU4 belongs to the LigA (NAD+ type) family. LigATL1 was modeled with 100% confidence using bound-adenylated nicked human DNA ligase as a template, and superimposed with the highest similarity (template modeling (TM) score = 1.0) on the thermostable DNA ligase from S. solfataricus. LigKDU4 was modeled with 100% confidence using bound-adenylated nicked E. coli DNA ligase, and also superimposed with the highest similarity (TM score = 1.0) on thermostable t2 filiform DNA ligase. In vitro functional assays and biochemical characterization are still required to confirm both enzyme activity and thermostability. In the comparative thermostability analysis, many ligase sequences from thermophilic or hyperthermophilic environments had more Pro and Arg residues in both the exposed and the hot loop regions than those from mesophilic and psychrophilic environments. The highest numbers of buried Pro and Arg residues were found in ligase sequences from psychrophilic environments in almost all groups. Two of the five putative ligase sequences selected from the thermophilic AT-II environment had more hot loops and fewer buried Pro and Arg residues than the other sequences in their respective groups. LigKDU4 (MLK) had more hot loop and exposed Arg residues than any other sequence in its group, which is unusual compared with the Arg analysis in other groups. This comparative study can give insight into improving the thermal stability of enzymes in general.
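The residue-level comparison described above comes down to counting Pro and Arg residues within selected regions of each sequence. The following is a minimal sketch of that counting step only, assuming the region assignment (exposed, buried, loop, or hot loop) has already been obtained elsewhere and is supplied as a per-position boolean mask. The function name `count_residues`, the example sequence, and the mask are hypothetical, and nothing here calls or reproduces the ConSurf or DisEMBL servers.

```python
from typing import Dict, Optional, Sequence


def count_residues(
    sequence: str,
    targets: str = "PR",
    mask: Optional[Sequence[bool]] = None,
) -> Dict[str, int]:
    """Count target residues (one-letter codes) at positions where mask is True.

    With mask=None every position is counted. The mask is assumed to come
    from an external annotation step (for example, positions flagged as
    exposed or as belonging to a loop); it is not computed here.
    """
    if mask is None:
        mask = [True] * len(sequence)
    if len(mask) != len(sequence):
        raise ValueError("mask and sequence must have the same length")

    counts = {residue: 0 for residue in targets}
    for residue, keep in zip(sequence.upper(), mask):
        if keep and residue in counts:
            counts[residue] += 1
    return counts


if __name__ == "__main__":
    # Invented seven-residue example, not a real ligase sequence.
    example = "MPRRAPG"
    print(count_residues(example))
    # Hypothetical mask marking only positions 2 and 3 (1-based) as exposed.
    exposed = [False, True, True, False, False, False, False]
    print(count_residues(example, mask=exposed))
```

Dividing each count by the number of masked positions would give a composition fraction, which is easier to compare across sequences of different lengths.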

    Step-by-step design of proteins for small molecule interaction: a review on recent milestones

    Protein design is the field of synthetic biology that aims at developing de novo, custom-made proteins and peptides for specific applications. Although the goal is ambitious, recent computational advances in both hardware and software technologies have paved the way to high-throughput screening and detailed design of novel folds and improved functionalities. Modern advances in the field of protein design for small molecule targeting are described in this review, organized in a step-by-step fashion: from the conception of a new or upgraded active binding site, to scaffold design, sequence optimization and experimental expression of the custom protein. In each step, contemporary examples are described, and state-of-the-art software is briefly explored.

    DrugMiner: comparative analysis of machine learning algorithms for prediction of potential druggable proteins

    Abstract not available. Ali Akbar Jamali, Reza Ferdousi, Saeed Razzaghi, Jiuyong Li, Reza Safdari, and Esmaeil Ebrahimi

    Prediction of lung tumor types based on protein attributes by machine learning algorithms

    Ancestral sequence reconstruction as an accessible tool for the engineering of biocatalyst stability

    Synthetic biology is the engineering of life to imbue non-natural functionality. As such, synthetic biology has considerable commercial potential, where synthetic metabolic pathways are utilised to convert low value substrates into high value products. High temperature biocatalysis offers several system-level benefits to synthetic biology, including increased dilution of substrate, increased reaction rates and decreased contamination risk. However, the tools currently available for the engineering of thermostable proteins are either expensive, unreliable, or poorly understood, meaning their adoption into synthetic biology workflows is treacherous. This thesis focuses on the development of an accessible tool for the engineering of protein thermostability, based on the evolutionary biology tool ancestral sequence reconstruction (ASR). ASR allows researchers to walk back in time along the branches of a phylogeny and predict the most likely representation of a protein family’s ancestral state. It also has simple input requirements, and its output proteins are often observed to be thermostable, making ASR tractable to protein engineering. Chapter 2 explores the applicability of multiple ASR methods to the engineering of a carboxylic acid reductase (CAR) biocatalyst. Despite the family emerging only 500 million years ago, ancestors presented considerable improvements in thermostability over their modern counterparts. We proceed to thoroughly characterise the ancestral enzymes for their inclusion into the CAR biocatalytic toolbox. Chapter 3 explores why ASR-derived proteins may be thermostable despite a mesophilic history. An in silico toolbox is built for tracking models of protein stability over simulated evolutionary time at the sequence, protein and population level. We provide considerable evidence that the sequence alignments of simulated protein families that evolved at marginal stability are saturated with stabilising residues.
ASR therefore derives sequences from a dataset biased toward stabilisation. Importantly, while ASR is accessible, it still has a steep learning curve because it requires phylogenetic expertise. In chapter 4, we utilise the evolutionary model produced in chapter 3 to develop a highly simplified and accessible ASR protocol. This protocol was then applied to engineer CAR enzymes that displayed dramatic increases in thermostability compared to both modern CARs and the thermostable AncCARs presented in chapter 2.