169 research outputs found

    MACHINE LEARNING AND BIOINFORMATIC INSIGHTS INTO KEY ENZYMES FOR A BIO-BASED CIRCULAR ECONOMY

    The world is presently faced with a sustainability crisis: it is becoming increasingly difficult to meet the energy and material needs of a growing global population without depleting and polluting our planet. Greenhouse gases released from the continuous combustion of fossil fuels engender accelerated climate change, and plastic waste accumulates in the environment. There is a need for a circular economy, where energy and materials are renewably derived from waste items rather than by consuming limited resources. Deconstruction of the recalcitrant linkages in natural and synthetic polymers is crucial for a circular economy, as deconstructed monomers can be used to manufacture new products. In Nature, organisms utilize enzymes for the efficient depolymerization and conversion of macromolecules. Consequently, by employing enzymes industrially, biotechnology holds great promise for energy- and cost-efficient conversion of materials for a circular economy. However, there is a need for enhanced molecular-level understanding of enzymes to enable economically viable technologies that can be applied on a global scale. This work is a computational study of key enzymes that catalyze important reactions that can be utilized for a bio-based circular economy. Specifically, bioinformatics and data-mining approaches were employed to study family 7 glycoside hydrolases (GH7s), which are the principal enzymes in Nature for deconstructing cellulose to simple sugars; a cytochrome P450 enzyme (GcoA) that catalyzes the demethylation of lignin subunits; and MHETase, a tannase-family enzyme utilized by the bacterium Ideonella sakaiensis in the degradation and assimilation of polyethylene terephthalate (PET).
Since enzyme function is fundamentally dependent on the primary amino-acid sequence, we hypothesize that machine-learning algorithms can be trained on an ensemble of functionally related enzymes to reveal functional patterns in the enzyme family, and to map the primary sequence to enzyme function such that functional properties can be predicted for a new enzyme sequence with significant accuracy. We find that supervised machine learning identifies important residues for processivity and accurately predicts functional subtypes and domain architectures in GH7s. Bioinformatic analyses revealed conserved active-site residues in GcoA and informed protein engineering that enabled expanded enzyme specificity and improved activity. Similarly, bioinformatic studies and phylogenetic analysis provided evolutionary context and identified crucial residues for MHET-hydrolase activity in a tannase-family enzyme (MHETase). Lastly, we developed machine-learning models to predict enzyme thermostability, allowing for high-throughput screening of enzymes that can catalyze reactions at elevated temperatures. Altogether, this work provides a solid basis for a computational, data-driven approach to understanding, identifying, and engineering enzymes for biotechnological applications towards a more sustainable world.

    Novel DNA ligases from the Red Sea brine pools: Cloning, expression, in silico characterization and comparative thermostability

    Extreme physicochemical conditions such as high temperature, salinity, and the presence of heavy metals are characteristic of some of the Red Sea brine pool environments. We screened two Red Sea brine pools, Atlantis II (AT-II) and Discovery Deep (DD), and one interface layer (Kebrit Deep) to identify novel DNA ligases with potentially extreme biochemical properties. Furthermore, we carried out an in silico comparative thermostability study by examining the stabilizing role of proline and arginine residues in the loop conformations and exposed regions of ligase sequences from metagenomic assemblies of different extreme environments, including the Red Sea metagenomes. A sequence-based metagenomics approach was used to identify putative DNA ligase sequences from the Red Sea brine pool and interface layer metagenomes downloaded from the NCBI database. A total of 6,148,453 metagenomic reads were assembled using MEGAHIT, which generated 783,176 contigs. A concatenated HMM built from the raw HMMs of ATP- and NAD+-dependent ligase domains available from the Pfam database was used to scan ORFs predicted from the contigs. A total of 18 ORFs were identified, and two of them, LigATL1 (ATP type) from AT-II and LigKDU4 (NAD+ type) from KB, were selected for synthesis, phylogenetic study, and further preliminary characterization. LigATL1 was cloned, expressed, and partially purified. Additionally, ligase sequences from psychrophilic, mesophilic, thermophilic, and hyperthermophilic environments were retrieved from the NCBI database for a comparative thermostability study with some of the putative Red Sea ligase sequences. The 22 retrieved ligase sequences were divided into five groups by closest taxonomic affiliation. The ConSurf and DisEMBL servers were used to analyze proline (Pro) and arginine (Arg) residues in the exposed/buried regions and in the loop and hot-loop regions, respectively, of the putative ligases (retrieved + Red Sea).
The putative LigATL1 showed 38% identity to an ATP-dependent DNA ligase from an Erysipelotrichaceae bacterium, while LigKDU4 showed 60% identity to an NAD+-dependent DNA ligase from a Candidatus Marinimicrobia bacterium. The phylogenetic analysis suggests that LigATL1 belongs to the LigD (ATP type) family, while LigKDU4 belongs to the LigA (NAD+ type) family. LigATL1 was modeled with 100% confidence using bound-adenylated nicked human DNA ligase as a template, and superimposed with the highest similarity (template modeling (TM) score = 1.0) onto a thermostable DNA ligase from S. solfataricus. LigKDU4 was modeled with 100% confidence using bound-adenylated nicked E. coli DNA ligase, and also superimposed with the highest similarity (TM score = 1.0) onto thermostable t2 filiform DNA ligase. In vitro functional assays and biochemical characterization are still required to confirm both enzyme activity and thermostability. In the comparative thermostability analysis, many ligase sequences from thermophilic or hyperthermophilic environments had more Pro and Arg residues in both the exposed and the hot-loop regions than those from mesophilic and psychrophilic environments. The highest numbers of buried Pro and Arg residues were found for ligase sequences from psychrophilic environments in almost all the groups. Two out of five putative ligase sequences selected from the thermophilic AT-II environment had more hot-loop and fewer buried Pro and Arg residues than their counterparts in their respective groups. LigKDU4 (MLK) had more hot-loop and exposed Arg residues than its counterparts in its group, which is unusual when compared with the Arg analysis in other groups. This comparative study can give insight into improving the thermal stability of enzymes in general.
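
The Pro/Arg comparison described in this abstract comes down to counting selected residues within a chosen set of positions. The sketch below is a minimal illustration in Python, not the authors' pipeline: the function name `residue_fraction`, the toy sequence, and the position mask are all hypothetical, and in practice the positions would come from an external predictor such as the ConSurf or DisEMBL output mentioned above.

```python
def residue_fraction(sequence, residues="PR", positions=None):
    """Fraction of the selected positions occupied by any residue in `residues`.

    positions: optional iterable of 0-based indices (for example, loop or
    exposed positions reported by an external tool); None means the
    whole sequence is considered.
    """
    seq = sequence.upper()
    idx = range(len(seq)) if positions is None else list(positions)
    if not idx:
        return 0.0
    hits = sum(1 for i in idx if seq[i] in residues)
    return hits / len(idx)


# Hypothetical toy data, for illustration only.
toy_sequence = "MPRKLAPRGG"
toy_loop_positions = [1, 2, 6, 7]

print(residue_fraction(toy_sequence))                                # whole sequence
print(residue_fraction(toy_sequence, positions=toy_loop_positions))  # masked positions only
```

Comparing such fractions between sequences from different temperature classes would mirror the kind of comparison the abstract reports, under the assumption that region assignments are available per residue.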

    Protein structure and function relationships: application of computational approaches to biological and biomedical problems

    In this work we studied several cases using different computational approaches to analyze structure-function relationships. In chapter 2 we describe a method, based on multiple neural networks, that we developed to evaluate the accuracy of predicted three-dimensional protein structures. This tool was used in the studies described in this work in which prediction of the 3D structure of the protein under study was necessary. In chapter 3, the interaction between a new class of natural sweeteners (steviol glycosides) and the human sweet taste receptor was analyzed by means of an in silico docking study, which identified the preferential binding site for the steviol glycosides. In chapter 4, the relationship between the dynamical properties and the function of some psychrophilic enzymes was studied. A comparative study (psychrophile vs. mesophile) of the thermodynamic properties of two different enzymes belonging to the elastase and uracil-DNA glycosylase families was carried out. This study, performed with molecular dynamics simulations, revealed that low-temperature adaptation is related to the different flexibility of the psychrophilic enzyme compared to the mesophilic one. In chapter 5, we studied the structural and functional impact of point mutations on three different proteins involved in serious rare diseases that cause grave metabolic disorders.

    Bayesian prediction of bacterial growth temperature range based on genome sequences

    Background: The preferred habitat of a given bacterium can provide a hint of which types of enzymes of potential industrial interest it might produce. These might include enzymes that are stable and active at very high or very low temperatures. Being able to accurately predict this based on a genomic sequence would thus allow for an efficient and targeted search for production organisms, reducing the need for culturing experiments. Results: This study found a total of 40 protein families useful for distinguishing between three thermophilicity classes (thermophiles, mesophiles and psychrophiles). The predictive performance of these protein families was compared to that of 87 basic sequence features (relative use of amino acids and codons, genomic and 16S rDNA AT content and genome size). When using naïve Bayesian inference, it was possible to correctly predict the optimal temperature range with a Matthews correlation coefficient of up to 0.68. The best predictive performance was always achieved by including protein families as well as structural features, compared to either of these alone. A dedicated computer program was created to perform these predictions. Conclusions: This study shows that protein families associated with specific thermophilicity classes can provide effective input data for thermophilicity prediction, and that the naïve Bayesian approach is effective for such a task. The program created for this study is able to efficiently distinguish between thermophilic, mesophilic and psychrophilic adapted bacterial genomes.
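
For readers unfamiliar with the metric quoted in this abstract, the sketch below computes a Matthews correlation coefficient for a three-class problem in Python. It uses the standard multiclass generalization of the coefficient; whether the study used this generalization or per-class binary coefficients is not stated in the abstract, so that choice is an assumption. The function name and the toy labels are hypothetical.

```python
from collections import Counter
from math import sqrt


def multiclass_mcc(y_true, y_pred):
    """Multiclass Matthews correlation coefficient from two label lists."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    total = len(y_true)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    true_counts = Counter(y_true)   # how often each class truly occurs
    pred_counts = Counter(y_pred)   # how often each class is predicted
    labels = set(true_counts) | set(pred_counts)
    numerator = correct * total - sum(true_counts[k] * pred_counts[k] for k in labels)
    var_true = total * total - sum(v * v for v in true_counts.values())
    var_pred = total * total - sum(v * v for v in pred_counts.values())
    denominator = sqrt(var_true * var_pred)
    # By convention, return 0.0 when the coefficient is undefined.
    return numerator / denominator if denominator else 0.0


# Hypothetical toy labels: T = thermophile, M = mesophile, P = psychrophile.
truth = ["T", "T", "M", "M", "P", "P"]
guess = ["T", "M", "M", "M", "P", "P"]
print(multiclass_mcc(truth, guess))
```

A value of 1 indicates perfect agreement and 0 indicates performance no better than chance, which puts the reported maximum of 0.68 in context.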

    Modelling the catabolite and microbiological profile of Cheddar cheese manufactured from Ayrshire milk

    Thesis (D. Tech.) -- Central University of Technology, Free State, 2010. Branded dairy products have lately become a global trend. As a result, the origin of the milk used in the manufacturing of branded cheeses must be declared by the producer, since it is known that these products are highly adulterated with foreign milk. In South Africa, branded Ayrshire Cheddar cheese has become highly popular due to its unique organoleptic properties and in light of claims that it ripens much faster than cheese made from other milk (not including Ayrshire). This study was therefore directed at investigating the unique properties of branded Ayrshire Cheddar cheese versus Cheddar cheese manufactured from a mixture of other breeds' milk (not including Ayrshire milk) and at establishing a catabolite profile for each cheese type. The thesis was organized into six chapters, each with its own outcomes. The first chapter focused on the variations between the two Cheddar cheese batches (produced from Ayrshire and other breeds' milk) with regard to organic acids, selected chemical parameters and starter microbiota. In the following three chapters, mathematical models were developed to predict organic, fatty and amino acid fluctuations, respectively, in the cheese made from Ayrshire and other milk. In the last chapter, two artificial neural networks were designed with the two starter organisms, Lactococcus lactis and Streptococcus thermophilus, as variable indicators, respectively. Thirty-two cheese samples of each batch (pure Ayrshire (4) / mixed breed with no Ayrshire (4)) were ripened, and samples were analysed under the same conditions on the following days after production: 2, 10, 22, 36, 50, 64, 78, and 92.
In the subsequent chapters, the following analyses were done on each day of analysis: organic acids by means of high-performance liquid chromatography (HPLC); fatty acids by means of gas chromatography-mass spectrometry (GC-MS); amino acids by means of GC-MS; microbial analysis by means of traditional methods, total DNA extraction and polymerase chain reaction (PCR); and standard chemical analysis for moisture, NaCl and pH. In the first research chapter, the minimum and maximum (min/max) values, standard deviations and proposed rel X values of organic acids were evaluated in Ayrshire and the mixed-breed Cheddar cheese. These showed that isovaleric acid is the organic acid with the least variation relative to concentration in both cheeses, and it was assumed that this organic acid is the most effective indicator of cheese uniformity. Clear differences in organic acids, chemical variables and starter micro-organisms were also evident between the two cheese batches. Results obtained from the regression models, which were defined for each organic, amino and fatty acid by means of mathematical equations, can be used by the manufacturer for purposes such as the selection of cheese for specialist lines, the early exclusion of defective cheeses, and the establishment of brand origin (Ayrshire vs. mixed-breed Cheddar cheeses). The regression graphs also illustrate unique flux patterns in Ayrshire and the mixed-breed cheese in terms of organic, fatty and amino acid content. In the last chapter, discrimination between the two batches was done via artificial neural network (ANN) modelling with Lactococcus lactis and Streptococcus thermophilus, respectively, as indicator organisms. The ANN consisted of a multilayered network with supervised training arranged into an ordered hierarchy of layers, in which connections were allowed only between nodes in immediately adjacent layers.
Its construction allowed for two output nodes, connected to an input layer consisting of two nodes to which the inputs were connected. For both cheeses, the ANN results showed acceptable classification of the cheeses based on the counts of L. lactis and S. thermophilus.
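
The layered network described in this abstract, with two input nodes, two output nodes, and connections only between adjacent layers, can be illustrated with a small forward pass in Python. This is a sketch under stated assumptions: the hidden-layer size, the sigmoid activation, the input scaling, and every weight below are arbitrary placeholders, since the abstract gives none of these details and no training is shown.

```python
from math import exp


def sigmoid(x):
    return 1.0 / (1.0 + exp(-x))


def dense(inputs, weights, biases):
    """One fully connected layer: each node sees only the previous layer."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(inputs, layers):
    """Propagate inputs through the layers in order, one layer at a time."""
    activations = list(inputs)
    for weights, biases in layers:
        activations = dense(activations, weights, biases)
    return activations


# Arbitrary illustrative parameters (NOT fitted to any cheese data).
LAYERS = [
    ([[0.8, -0.4], [-0.3, 0.9], [0.5, 0.5]], [0.0, 0.1, -0.2]),  # 2 inputs -> 3 hidden (assumed size)
    ([[1.0, -1.0, 0.5], [-1.0, 1.0, -0.5]], [0.0, 0.0]),         # 3 hidden -> 2 outputs
]

# Hypothetical scaled counts of the two indicator organisms.
scores = forward([0.7, 0.2], LAYERS)
predicted_class = scores.index(max(scores))  # index of the larger output node
print(scores, predicted_class)
```

In a supervised setting, the weights would be fitted to labelled samples so that the larger output node indicates the cheese batch.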

    The interplay of descriptor-based computational analysis with pharmacophore modeling builds the basis for a novel classification scheme for feruloyl esterases

    One of the most intriguing groups of enzymes, the feruloyl esterases (FAEs), is ubiquitous in both simple and complex organisms. FAEs have gained importance in the biofuel, medicine and food industries due to their capability of acting on a large range of substrates for cleaving ester bonds and synthesizing high-added-value molecules through esterification and transesterification reactions. During the past two decades, extensive studies have been carried out on the production and partial characterization of FAEs from fungi, while much less is known about FAEs of bacterial or plant origin. Initial classification studies on FAEs were restricted to sequence similarity and substrate specificity on just four model substrates and considered only a handful of FAEs belonging to the fungal kingdom. This study centers on the descriptor-based classification and structural analysis of experimentally verified and putative FAEs; nevertheless, the framework presented here is applicable to every poorly characterized enzyme family. A total of 365 FAE-related sequences of fungal, bacterial and plant origin were collected and clustered, using Self Organizing Maps followed by k-means clustering, into distinct groups based on amino acid composition and physico-chemical composition descriptors derived from the respective amino acid sequences. A Support Vector Machine model was subsequently constructed for the classification of new FAEs into the pre-assigned clusters. The model successfully recognized 98.2% of the training sequences and all the sequences of the blind test. The underlying functionality of the 12 proposed FAE families was validated against a combination of prediction tools and published experimental data. Another important aspect of the present work involves the development of pharmacophore models for the new FAE families for which sufficient information on known substrates existed.
Knowing the pharmacophoric features of a small molecule that are essential for binding to the members of a certain family opens a window of opportunities for tailored applications of FAEs.
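
The descriptor-based grouping described in this abstract starts from amino acid composition vectors. The Python sketch below computes such a vector and assigns a query to the nearest of several cluster centres, which is the assignment step of k-means. It is a simplified illustration only: it omits the Self Organizing Map, the physico-chemical descriptors, and the Support Vector Machine used in the study, and the helper names and toy sequences are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aa_composition(sequence):
    """Return the 20-dimensional amino acid composition (fractions summing to 1)."""
    residues = [r for r in sequence.upper() if r in AMINO_ACIDS]
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(a) / len(residues) for a in AMINO_ACIDS]


def nearest_centroid(vector, centroids):
    """Index of the centroid closest to `vector` by squared Euclidean distance."""
    def sq_dist(centroid):
        return sum((v - c) ** 2 for v, c in zip(vector, centroid))
    return min(range(len(centroids)), key=lambda i: sq_dist(centroids[i]))


# Made-up toy sequences standing in for two cluster centres.
centroids = [aa_composition("AAAAGGGG"), aa_composition("KKKKRRRR")]
query = aa_composition("AAGGAGKA")
print(nearest_centroid(query, centroids))
```

Full k-means would alternate this assignment step with recomputing each centre as the mean of its assigned vectors until the assignments stop changing.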

    Polyphasic taxonomy of thermophilic actinomycetes

    PhD Thesis. Molecular systematic methods were applied in a series of studies designed to resolve the taxonomic relationships of thermophilic actinomycetes known to be difficult to classify using standard taxonomic procedures. The test strains included representatives of clusters defined in an extensive numerical phenetic survey of thermophilic streptomycetes and twelve marker strains. The resultant genotypic data, together with the results of corresponding phenotypic studies, were used to highlight novel taxa and to improve the circumscription of validly described species. The most comprehensive study was undertaken to clarify relationships within and between representative alkalitolerant, thermophilic and neutrophilic, thermophilic streptomycetes isolated from soil and appropriate marker strains. The resultant data, notably those from DNA:DNA relatedness studies, supported the taxonomic integrity of the validly described species Streptomyces thermodiastaticus, Streptomyces thermoviolaceus and Streptomyces thermovulgaris. However, the genotypic and phenotypic data clearly show that Streptomyces thermonitrificans Desai and Dhala 1967 and Streptomyces thermovulgaris (Henssen 1957) Goodfellow et al. 1987 represent a single species. On the basis of priority, Streptomyces thermonitrificans is a later subjective synonym of Streptomyces thermovulgaris. Similarly, eight out of eleven representative alkalitolerant, thermophilic isolates and three out of sixteen representative neutrophilic, thermophilic isolates had a combination of properties consistent with their classification as Streptomyces thermovulgaris. One of the remaining alkalitolerant, thermophilic isolates, Streptomyces strain TA56, merited species status. The name Streptomyces thermoalcalitolerans sp. nov. is proposed for this strain. A neutrophilic, thermophilic isolate, Streptomyces strain NAR85, was identified as Streptomyces thermodiastaticus.
Four other neutrophilic, thermophilic isolates assigned to a numerical phenetic cluster and a thermophilic isolate from poultry faeces were also considered to warrant species status; the names Streptomyces eurythermophilus sp. nov. and Streptomyces thermocoprophilus sp. nov. are proposed to accommodate these strains. It was also concluded that additional comparative taxonomic studies are required to clarify the relationships between additional thermophilic streptomycete strains included in the present investigation. A corresponding polyphasic approach was used to clarify the taxonomy of six thermophilic isolates provisionally assigned to either the genus Amycolatopsis or the genus Excellospora. Two of the isolates, strains NT202 and NT303, had properties consistent with their classification in the genus Amycolatopsis. However, the genotypic and phenotypic data also showed that these strains formed a new centre of taxonomic variation, for which the name Amycolatopsis eurythermus sp. nov. is proposed. Similarly, the four remaining strains formed two new centres of taxonomic variation within the genus Excellospora. It is proposed that isolates TA113 and TA114 be designated Excellospora alcalithermophilus sp. nov. Similarly, the name Excellospora thermoalcalitolerans sp. nov. is proposed for strains TA86 and TA111. An emended description is also given for the genus Excellospora. Overseas Research Student Award.

    Development, validation and new applications of a normal mode analysis model based on protein sequence and structure

    Proteins exist in different functional states that are precisely regulated by their environment in order to maintain the homeostasis of the cell and of the living organism. The prevalence of these protein states is dictated by their Gibbs free energy, whereas the rate of transition between these biologically relevant states is determined by the free energy landscape. These parameters are of particular interest in therapeutic and biotechnological contexts, where their perturbation through mutations that alter the protein sequence affects protein function. Although new experimental approaches make it possible to study the effect of mutations on a protein at high throughput, these methods are laborious and cover only a fraction of all the primary structures of interest. Bioinformatic models make it possible to test and generate different hypotheses in silico in order to guide experimental approaches. However, these structure-based methods focus mainly on predicting the enthalpy of a state, whereas several lines of experimental evidence have demonstrated the importance of the entropic contribution. Moreover, these approaches ignore the importance of the protein conformational space dictated by the energy landscape, which is crucial to protein function. A normal mode analysis can be performed to explore this space under the approximation that the protein is in an equilibrium conformation in which each amino acid is represented by a mass governed by a harmonic potential. Current approaches ignore the identity of the residues and cannot predict the effect of mutations on dynamic properties. We developed a new model called ENCoM that addresses this shortcoming by integrating specific physical information about the contacts between side-chain atoms.
This addition allows a better description of conformational changes in enzymes, the prediction of the effect of an allosteric mutation in the protein DHFR, and the prediction of the effect of mutations on protein stability through an entropic value. Compared with approaches developed specifically for this application, ENCoM is more consistent and better predicts the effect of stabilizing mutations. Our approach was also able to capture the evolutionary pressure that gives proteins from thermophilic organisms increased thermal resistance.