306 research outputs found
Prots: A fragment based protein thermo‐stability potential
Designing proteins with enhanced thermo‐stability has been a main focus of protein engineering because of its theoretical and practical significance. Despite extensive studies in the past years, a general strategy for stabilizing proteins still remains elusive. Thus, effective and robust computational algorithms for designing thermo‐stable proteins are in critical demand. Here we report PROTS, a sequential and structural four‐residue fragment based protein thermo‐stability potential. PROTS is derived from a nonredundant representative collection of thousands of thermophilic and mesophilic protein structures and a large set of point mutations with experimentally determined changes of melting temperatures. To the best of our knowledge, PROTS is the first protein stability predictor based on integrated analysis and mining of these two types of data. Besides conventional cross validation and blind testing, we introduce hypothetical reverse mutations as a means of testing the robustness of protein thermo‐stability predictors. In all tests, PROTS demonstrates the ability to reliably predict mutation induced thermo‐stability changes as well as classify thermophilic and mesophilic proteins. In addition, this white‐box predictor allows easy interpretation of the factors that influence mutation induced protein stability changes at the residue level. Proteins 2012; © 2011 Wiley Periodicals, Inc. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/89526/1/23163_ftp.pd
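The idea of a fragment-based potential and of the reverse-mutation test can be illustrated with a minimal sketch. It assumes a plain log-odds score over sequence-only four-residue fragments; the abstract does not give the functional form of PROTS, which also uses structural information and melting-temperature data, so every function name and the toy sequences below are hypothetical.

```python
import math
from collections import Counter


def fragment_counts(sequences, k=4):
    """Count every overlapping k-residue fragment in a list of sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def log_odds_potential(thermo_seqs, meso_seqs, k=4, pseudocount=1.0):
    """Hypothetical potential: log ratio of smoothed fragment frequencies
    in thermophilic versus mesophilic sequences (an assumption, not PROTS)."""
    thermo = fragment_counts(thermo_seqs, k)
    meso = fragment_counts(meso_seqs, k)
    fragments = set(thermo) | set(meso)
    n_thermo = sum(thermo.values()) + pseudocount * len(fragments)
    n_meso = sum(meso.values()) + pseudocount * len(fragments)
    return {
        f: math.log(((thermo[f] + pseudocount) / n_thermo)
                    / ((meso[f] + pseudocount) / n_meso))
        for f in fragments
    }


def score_sequence(seq, potential, k=4):
    """Sum fragment scores; fragments never seen in training contribute 0."""
    return sum(potential.get(seq[i:i + k], 0.0)
               for i in range(len(seq) - k + 1))


def mutation_delta(seq, position, new_residue, potential, k=4):
    """Score change caused by a point mutation at a 0-based position."""
    mutant = seq[:position] + new_residue + seq[position + 1:]
    return score_sequence(mutant, potential, k) - score_sequence(seq, potential, k)


# Toy data, invented purely for illustration.
potential = log_odds_potential(["AAAAK"], ["AAAAG"])
forward = mutation_delta("AAAAG", 4, "K", potential)   # G -> K
reverse = mutation_delta("AAAAK", 4, "G", potential)   # K -> G (reverse mutation)
```

In this sketch a mutation and its hypothetical reverse give score changes of equal size and opposite sign by construction, which is the consistency property a reverse-mutation test would probe in a real predictor.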
MACHINE LEARNING AND BIOINFORMATIC INSIGHTS INTO KEY ENZYMES FOR A BIO-BASED CIRCULAR ECONOMY
The world is presently faced with a sustainability crisis; it is becoming increasingly difficult to meet the energy and material needs of a growing global population without depleting and polluting our planet. Greenhouse gases released from the continuous combustion of fossil fuels engender accelerated climate change, and plastic waste accumulates in the environment. There is a need for a circular economy, where energy and materials are renewably derived from waste items, rather than by consuming limited resources. Deconstruction of the recalcitrant linkages in natural and synthetic polymers is crucial for a circular economy, as deconstructed monomers can be used to manufacture new products. In Nature, organisms utilize enzymes for the efficient depolymerization and conversion of macromolecules. Consequently, by employing enzymes industrially, biotechnology holds great promise for energy- and cost-efficient conversion of materials for a circular economy. However, there is a need for enhanced molecular-level understanding of enzymes to enable economically viable technologies that can be applied on a global scale. This work is a computational study of key enzymes that catalyze important reactions that can be utilized for a bio-based circular economy. Specifically, bioinformatics and data-mining approaches were employed to study family 7 glycoside hydrolases (GH7s), which are the principal enzymes in Nature for deconstructing cellulose to simple sugars; a cytochrome P450 enzyme (GcoA) that catalyzes the demethylation of lignin subunits; and MHETase, a tannase-family enzyme utilized by the bacterium Ideonella sakaiensis in the degradation and assimilation of polyethylene terephthalate (PET).
Since enzyme function is fundamentally dependent on the primary amino-acid sequence, we hypothesize that machine-learning algorithms can be trained on an ensemble of functionally related enzymes to reveal functional patterns in the enzyme family, and to map the primary sequence to enzyme function such that functional properties can be predicted for a new enzyme sequence with significant accuracy. We find that supervised machine learning identifies important residues for processivity and accurately predicts functional subtypes and domain architectures in GH7s. Bioinformatic analyses revealed conserved active-site residues in GcoA and informed protein engineering that enabled expanded enzyme specificity and improved activity. Similarly, bioinformatic studies and phylogenetic analysis provided evolutionary context and identified crucial residues for MHET-hydrolase activity in a tannase-family enzyme (MHETase). Lastly, we developed machine-learning models to predict enzyme thermostability, allowing for high-throughput screening of enzymes that can catalyze reactions at elevated temperatures. Altogether, this work provides a solid basis for a computational data-driven approach to understanding, identifying, and engineering enzymes for biotechnological applications towards a more sustainable world.
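The sequence-to-function mapping described above can be sketched in a few lines. The abstract does not say which features or algorithms were used, so this example assumes amino-acid composition features and a nearest-centroid classifier as stand-ins; the sequences and the labels "A" and "B" are invented toy data, not GH7 subtypes.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence):
    """Return the 20-dimensional amino-acid composition of a sequence."""
    seq = sequence.upper()
    total = sum(seq.count(a) for a in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [seq.count(a) / total for a in AMINO_ACIDS]


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def squared_distance(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def train(labelled_sequences):
    """Build one composition centroid per label from (sequence, label) pairs."""
    by_label = {}
    for seq, label in labelled_sequences:
        by_label.setdefault(label, []).append(composition(seq))
    return {label: centroid(vectors) for label, vectors in by_label.items()}


def predict(model, sequence):
    """Assign the label whose centroid is closest to the sequence's composition."""
    features = composition(sequence)
    return min(model, key=lambda label: squared_distance(model[label], features))


# Toy training set, invented for illustration only.
model = train([
    ("KKKEEE", "A"), ("KKEEEE", "A"),
    ("GGGSSS", "B"), ("GGSSSS", "B"),
])
prediction_1 = predict(model, "KEKEKE")
prediction_2 = predict(model, "GSGSGS")
```

Composition features discard residue order, so a model of this kind could not point to individual positions such as the processivity residues mentioned in the abstract; position-aware features from an alignment would be needed for that.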
Novel DNA ligases from the Red Sea brine pools: Cloning, expression, in silico characterization and comparative thermostability
Extreme physicochemical conditions such as high temperature, high salinity, and the presence of heavy metals are characteristic of some Red Sea brine pool environments. We screened two Red Sea brine pools, Atlantis II (AT-II) and Discovery Deep (DD), and one interface layer (Kebrit Deep, KB) to identify novel DNA ligases with potentially extreme biochemical properties. Furthermore, we carried out an in silico comparative thermostability study by examining the stabilizing role of proline and arginine residues in the loop conformations and exposed regions of ligase sequences from metagenomic assemblies of different extreme environments, including the Red Sea metagenomes. A sequence-based metagenomics approach was used to identify putative DNA ligase sequences from the Red Sea brine pool and interface layer metagenomes downloaded from the NCBI database. A total of 6,148,453 metagenomic reads were assembled using MEGAHIT, which generated 783,176 contigs. A concatenated HMM model built from raw HMM models of ATP and NAD+ ligase domains available from the Pfam database was used to scan predicted ORFs from the contigs. A total of 18 ORFs were identified, and two of them, LigATL1 (ATP type) from AT-II and LigKDU4 (NAD+ type) from KB, were selected for synthesis, phylogenetic study, and further preliminary characterization. LigATL1 was cloned, expressed, and partially purified. Additionally, ligase sequences from psychrophilic, mesophilic, thermophilic, and hyperthermophilic environments were retrieved from the NCBI database for a comparative thermostability study with some of the putative Red Sea ligase sequences. The 22 retrieved ligase sequences were divided into five groups of closest taxonomic relatives. The ConSurf and DisEMBL servers were used to analyze proline (Pro) and arginine (Arg) residues in the exposed/buried regions and in the loop and hot-loop regions, respectively, of the putative ligases (retrieved + Red Sea).
The putative LigATL1 showed 38% identity to an ATP-dependent DNA ligase from Erysipelotrichaceae bacterium, while LigKDU4 showed 60% identity to an NAD+-dependent DNA ligase from Candidatus Marinimicrobia bacterium. The phylogenetic analysis suggests that LigATL1 belongs to the LigD family (ATP type), while LigKDU4 belongs to the LigA family (NAD+ type). LigATL1 was modeled with 100% confidence using bound-adenylated nicked human DNA ligase as a template, and superimposed with the highest similarity (template modeling (TM) score = 1.0) to the thermostable DNA ligase from S. solfataricus. LigKDU4 was modeled with 100% confidence using bound-adenylated nicked E. coli DNA ligase, and also superimposed with the highest similarity (TM score = 1.0) to thermostable t2 filiform DNA ligase. In vitro functional assays and biochemical characterization are still required to confirm both enzyme activity and thermostability. In the comparative thermostability analysis, many ligase sequences from thermophilic or hyperthermophilic environments had more Pro and Arg residues in both the exposed and the hot-loop regions than those from mesophilic and psychrophilic environments. The highest numbers of buried Pro and Arg residues were found in ligase sequences from psychrophilic environments in almost all groups. Two of the five putative ligase sequences selected from the thermophilic AT-II environment had more hot-loop and fewer buried Pro and Arg residues than the other sequences in their respective groups. LigKDU4 (MLK) had more hot-loop and exposed Arg residues than the other sequences in its group, which is unusual compared with the Arg analysis in the other groups. This comparative study can give insight into improving the thermal stability of enzymes in general.
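The Pro/Arg comparison described above comes down to counting two residue types inside annotated regions of each sequence. The sketch below assumes the region boundaries are already available as 0-based half-open intervals; in the study they came from the ConSurf and DisEMBL servers, whereas the toy sequence, interval values, and function names here are hypothetical.

```python
def residue_fraction(sequence, residues=("P", "R")):
    """Fraction of the whole sequence made up by each residue of interest."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    return {r: seq.count(r) / len(seq) for r in residues}


def residue_counts_in_regions(sequence, regions, residues=("P", "R")):
    """Count residues of interest inside each named region.

    regions maps a region name (for example "hot_loop") to a list of
    (start, end) intervals, 0-based and half-open, supplied by the caller.
    """
    seq = sequence.upper()
    result = {}
    for name, intervals in regions.items():
        segment = "".join(seq[start:end] for start, end in intervals)
        result[name] = {r: segment.count(r) for r in residues}
    return result


# Toy sequence and hand-made intervals, invented for illustration only.
toy_sequence = "MPPRAGRK"
toy_regions = {"hot_loop": [(1, 4)], "other": [(4, 8)]}
fractions = residue_fraction(toy_sequence)
counts = residue_counts_in_regions(toy_sequence, toy_regions)
```

Counts per region are only comparable between sequences of similar length and region size, so normalizing by region length may be preferable when comparing across taxonomic groups.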
Step-by-step design of proteins for small molecule interaction: a review on recent milestones
Protein design is the field of synthetic biology that aims at developing de novo custom-made proteins and peptides for specific applications. Although the goal is ambitious, recent computational advances in both hardware and software technologies have paved the way to high-throughput screening and detailed design of novel folds and improved functionalities. Modern advances in the field of protein design for small molecule targeting are described in this review, organized in a step-by-step fashion: from the conception of a new or upgraded active binding site, to scaffold design, sequence optimization and experimental expression of the custom protein. In each step, contemporary examples are described, and state-of-the-art software is briefly explored.
DrugMiner: comparative analysis of machine learning algorithms for prediction of potential druggable proteins
Abstract not available. Ali Akbar Jamali, Reza Ferdousi, Saeed Razzaghi, Jiuyong Li, Reza Safdari, and Esmaeil Ebrahimi
Biology in the information age : computational methods to understand and engineer the central dogma
The rise of NGS, big data, and ‘-omics’ has ushered biology into a new age, with the power to fundamentally change how research is approached. Rather than using a singular hypothesis, we can now incorporate more data-driven methods that drive new biological insights, explain emergent biological phenomena, and/or derive novel functionality. This thesis highlights the changing role of computation, both in learning more about biological systems and in leveraging data-intensive computational techniques to create new proteins and enzymes.
The ability of computational approaches to drive biological understanding is presented in three studies. First, the laboratory evolution of DNA polymerases, the workhorses of replication, towards novel functionality is explored. In the three polymerases created, modeling and large-scale approaches are used to demonstrate the additional capability of each new enzyme. Next, two independent studies of the genomic adaptations needed for E. coli cells to adopt a 21st amino acid (selenocysteine and nitrotyrosine) are presented. Next-generation sequencing is used to better understand the mechanisms by which cells accommodate the increased fitness burden imposed by an orthogonal translation system. Lastly, community-wide changes in the oral microbiome are studied in the progression towards periodontitis, with implications for potential therapeutic targets.
The capstone of this thesis leverages big data techniques to engineer novel proteins, the chief functional units within cells. Protein structural data is implemented into a convolutional neural network to associate amino acids with neighboring chemical microenvironments at state-of-the-art accuracy. This algorithm enables identification of gain-of-function mutations, and subsequent experiments confirm substantive improvements in stability-associated phenotypes in vivo across three diverse proteins. This work is the first demonstration of using deep learning to empirically improve protein function and opens a new avenue for protein engineering. Cellular and Molecular Biolog
Ancestral sequence reconstruction as an accessible tool for the engineering of biocatalyst stability
Synthetic biology is the engineering of life to imbue non-natural functionality. As such, synthetic biology has considerable commercial potential, where synthetic metabolic pathways are utilised to convert low value substrates into high value products. High temperature biocatalysis offers several system-level benefits to synthetic biology, including increased dilution of substrate, increased reaction rates and decreased contamination risk. However, the tools currently available for the engineering of thermostable proteins are either expensive, unreliable, or poorly understood, meaning their adoption into synthetic biology workflows is treacherous. This thesis focuses on the development of an accessible tool for the engineering of protein thermostability, based on the evolutionary biology tool ancestral sequence reconstruction (ASR). ASR allows researchers to walk back in time along the branches of a phylogeny and predict the most likely representation of a protein family’s ancestral state. It also has simple input requirements, and its output proteins are often observed to be thermostable, making ASR tractable to protein engineering. Chapter 2 explores the applicability of multiple ASR methods to the engineering of a carboxylic acid reductase (CAR) biocatalyst. Despite the family emerging only 500 million years ago, ancestors presented considerable improvements in thermostability over their modern counterparts. We proceed to thoroughly characterise the ancestral enzymes for their inclusion into the CAR biocatalytic toolbox. Chapter 3 explores why ASR derived proteins may be thermostable despite a mesophilic history. An in silico toolbox for tracking models of protein stability over simulated evolutionary time at the sequence, protein and population level is built. We provide considerable evidence that the sequence alignments of simulated protein families that evolved at marginal stability are saturated with stabilising residues.
ASR therefore derives sequences from a dataset biased toward stabilisation. Importantly, while ASR is accessible, it still has a steep learning curve because it requires phylogenetic expertise. In chapter 4, we utilise the evolutionary model produced in chapter 3 to develop a highly simplified and accessible ASR protocol. This protocol was then applied to engineer CAR enzymes that displayed dramatic increases in thermostability compared to both modern CARs and the thermostable AncCARs presented in chapter 2.