
    The interplay of descriptor-based computational analysis with pharmacophore modeling builds the basis for a novel classification scheme for feruloyl esterases

    One of the most intriguing groups of enzymes, the feruloyl esterases (FAEs), is ubiquitous in both simple and complex organisms. FAEs have gained importance in the biofuel, medicine and food industries due to their capability of acting on a large range of substrates for cleaving ester bonds and synthesizing high-added-value molecules through esterification and transesterification reactions. During the past two decades, extensive studies have been carried out on the production and partial characterization of FAEs from fungi, while much less is known about FAEs of bacterial or plant origin. Initial classification studies on FAEs were restricted to sequence similarity and substrate specificity on just four model substrates and considered only a handful of FAEs belonging to the fungal kingdom. This study centers on the descriptor-based classification and structural analysis of experimentally verified and putative FAEs; nevertheless, the framework presented here is applicable to every poorly characterized enzyme family. A total of 365 FAE-related sequences of fungal, bacterial and plant origin were collected and clustered into distinct groups using Self Organizing Maps followed by k-means clustering, based on amino acid composition and physico-chemical composition descriptors derived from the respective amino acid sequences. A Support Vector Machine model was subsequently constructed for the classification of new FAEs into the pre-assigned clusters. The model successfully recognized 98.2% of the training sequences and all the sequences of the blind test. The underlying functionality of the 12 proposed FAE families was validated against a combination of prediction tools and published experimental data. Another important aspect of the present work involves the development of pharmacophore models for the new FAE families for which sufficient information on known substrates existed. Knowing the pharmacophoric features of a small molecule that are essential for binding to the members of a certain family opens a window of opportunities for tailored applications of FAEs.
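
The abstract above mentions amino acid composition descriptors derived from each sequence. A minimal sketch of such a descriptor is given below, assuming the simplest definition (the fraction of each of the 20 standard residues); the function name `aa_composition` and the residue ordering are illustrative choices, not taken from the paper, and the physico-chemical descriptors, Self Organizing Maps, k-means and SVM steps are not reproduced here.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code (alphabetical order).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(sequence: str) -> list[float]:
    """Return the fraction of each standard residue, in AMINO_ACIDS order.

    Characters outside the 20 standard residues (gaps, X, etc.) are ignored.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]
```

Each sequence then becomes a fixed-length 20-dimensional vector, which is the kind of input that clustering and classification methods generally expect.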

    Inferring stabilizing mutations from protein phylogenies: application to influenza hemagglutinin

    One selection pressure shaping sequence evolution is the requirement that a protein fold with sufficient stability to perform its biological functions. We present a conceptual framework that explains how this requirement causes the probability that a particular amino acid mutation is fixed during evolution to depend on its effect on protein stability. We mathematically formalize this framework to develop a Bayesian approach for inferring the stability effects of individual mutations from homologous protein sequences of known phylogeny. This approach is able to predict published experimentally measured mutational stability effects (ΔΔG values) with an accuracy that exceeds both a state-of-the-art physicochemical modeling program and the sequence-based consensus approach. As a further test, we use our phylogenetic inference approach to predict stabilizing mutations to influenza hemagglutinin. We introduce these mutations into a temperature-sensitive influenza virus with a defect in its hemagglutinin gene and experimentally demonstrate that some of the mutations allow the virus to grow at higher temperatures. Our work therefore describes a powerful new approach for predicting stabilizing mutations that can be successfully applied even to large, complex proteins such as hemagglutinin. This approach also makes a mathematical link between phylogenetics and experimentally measurable protein properties, potentially paving the way for more accurate analyses of molecular evolution.

    Systematic analysis of primary sequence domain segments for the discrimination between class C GPCR subtypes

    G-protein-coupled receptors (GPCRs) are a large and diverse super-family of eukaryotic cell membrane proteins that play an important physiological role as transmitters of extracellular signals. In this paper, we investigate Class C, a member of this super-family that has attracted much attention in pharmacology. The limited knowledge about the complete 3D crystal structure of Class C receptors makes it necessary to use their primary amino acid sequences for analytical purposes. Here, we provide a systematic analysis of distinct receptor sequence segments with regard to their ability to differentiate between seven class C GPCR subtypes according to their topological location in the extracellular, transmembrane, or intracellular domains. We build on the results of previous research that provided preliminary evidence of the potential use of separated domains of complete class C GPCR sequences as the basis for subtype classification. The use of the extracellular N-terminus domain alone was shown to result in a minor decrease in subtype discrimination in comparison with the complete sequence, despite discarding much of the sequence information. In this paper, we describe the use of Support Vector Machine-based classification models to evaluate the subtype-discriminating capacity of the specific topological sequence segments.

    Mutagenesis Objective Search and Selection Tool (MOSST): an algorithm to predict structure-function related mutations in proteins

    Background: Functionally relevant artificial or natural mutations are difficult to assess or predict if no structure-function information is available for a protein. This is especially important for correctly identifying functionally significant non-synonymous single nucleotide polymorphisms (nsSNPs) or for designing a site-directed mutagenesis strategy for a target protein. A new and powerful methodology is proposed to guide these two decision strategies, based only on conservation rules of physicochemical properties of amino acids extracted from a multiple alignment of the protein family to which the target protein belongs, with no need for explicit structure-function relationships. Results: A statistical analysis is performed over each amino acid position in the multiple protein alignment, based on different amino acid physical or chemical characteristics, including hydrophobicity, side-chain volume, charge and protein conformational parameters. The variances of each of these properties at each position are combined to obtain a global statistical indicator of the conservation degree of each property. Different types of physicochemical conservation are defined to characterize relevant and irrelevant positions. The differences between statistical variances are taken together as the basis of hypothesis tests at each position to search for functionally significant mutable sites and to identify specific mutagenesis targets. The outcome is used to statistically predict physicochemical consensus sequences based on different properties and to calculate the amino acid propensities at each position in a given protein. Hence, amino acid positions are identified that are putatively responsible for function, specificity, stability or binding interactions in a family of proteins. Once these key functional positions are identified, position-specific statistical distributions are applied to divide the 20 common protein amino acids at each position of the protein's primary sequence into a group of functionally non-disruptive amino acids and a second group of functionally deleterious amino acids. Conclusions: With this approach, not only can conserved amino acid positions in a protein family be labeled as functionally relevant, but non-conserved amino acid positions can also be identified as having a physicochemically meaningful functional effect. These results become a discriminative tool in the selection and elaboration of rational mutagenesis strategies for the protein. They can also be used to predict whether a given nsSNP, identified, for instance, in a genomic-scale analysis, can have a functional implication for a particular protein and which nsSNPs are most likely to be functionally silent for a protein. This analytical tool could be used to rapidly and automatically discard any irrelevant nsSNP and guide the research focus toward functionally significant mutations. Based on preliminary results and applications, this technique shows promising performance as a valuable bioinformatics tool to aid in the development of new protein variants and in the understanding of function-structure relationships in proteins.
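
The per-position variance of a physicochemical property, which the abstract above describes as the starting point of the analysis, can be illustrated with a short sketch. It assumes a single property (the Kyte-Doolittle hydropathy scale, chosen here only for illustration), a gapped alignment given as equal-length strings, and the population variance; the function name `column_hydropathy_variance` is hypothetical, and the way MOSST combines several properties and runs its hypothesis tests is not reproduced.

```python
from statistics import pvariance
from typing import List, Optional

# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def column_hydropathy_variance(alignment: List[str]) -> List[Optional[float]]:
    """Population variance of hydropathy at each alignment column.

    Gaps and non-standard characters are skipped; a column with no
    standard residues yields None.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")

    variances: List[Optional[float]] = []
    for column in zip(*alignment):
        values = [KYTE_DOOLITTLE[aa] for aa in column if aa in KYTE_DOOLITTLE]
        variances.append(pvariance(values) if values else None)
    return variances
```

A variance of zero marks a column where the property is fully conserved, even if the residues themselves could differ while sharing the same property value.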

    Detection of discriminative sequence patterns in the neighborhood of proline cis peptide bonds and their functional annotation

    Background: Polypeptides are composed of amino acids covalently bonded via peptide bonds. The majority of peptide bonds in proteins occur in the trans conformation. In spite of their infrequent occurrence, cis peptide bonds play a key role in protein structure and function, as well as in many significant biological processes. Results: We perform a systematic analysis of regions in protein sequences that contain a proline cis peptide bond in order to discover non-random associations between the primary sequence and the nature of proline cis/trans isomerization. For this purpose, an efficient pattern discovery algorithm is employed, which discovers regular expression-type patterns that are overrepresented (i.e. appear frequently repeated) in a set of sequences. Four types of pattern discovery are performed: i) exact pattern discovery, ii) pattern discovery using a chemical equivalency set, iii) pattern discovery using a structural equivalency set and iv) pattern discovery using certain amino acids' physicochemical properties. The extracted patterns are carefully validated using a specially implemented scoring function and a significance measure (i.e. log-probability estimate) indicative of their specificity. The score threshold for the first three types of pattern discovery is 0.90, while for the last type it is 0.80. Regarding the significance measure, all patterns yielded values in the range [-9, -31], which ensures that the derived patterns are highly unlikely to have emerged by chance. Most of the highest scoring patterns are consistent with previous investigations concerning the neighborhood of cis proline peptide bonds, and many new ones are identified. Finally, the extracted patterns are systematically compared against the PROSITE database in order to gain insight into the functional implications of cis prolyl bonds. Conclusion: Cis patterns with matches in the PROSITE database fell mostly into two main functional clusters: family signatures and protein signatures. However, considerable propensity was also observed for targeting signals, active and phosphorylation sites, as well as domain signatures.
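
To make the idea of a regular expression-type pattern with an equivalency set concrete, the sketch below measures what fraction of a sequence set contains a given pattern. The pattern `[FWY]P.{2}[ILV]` and the function name `pattern_support` are invented for illustration and are not patterns or procedures reported in the paper, whose discovery algorithm, scoring function and significance measure are not reproduced here.

```python
import re
from typing import List


def pattern_support(pattern: str, sequences: List[str]) -> float:
    """Fraction of sequences that contain at least one match of the pattern."""
    if not sequences:
        raise ValueError("no sequences given")
    regex = re.compile(pattern)
    hits = sum(1 for seq in sequences if regex.search(seq))
    return hits / len(sequences)


# Hypothetical pattern: an aromatic residue (equivalency set F/W/Y),
# then proline, any two residues, and an aliphatic residue (I/L/V).
EXAMPLE_PATTERN = r"[FWY]P.{2}[ILV]"
```

A bracketed character class plays the role of an equivalency set: any member of the set is accepted at that position, which is how chemically or structurally similar residues can be treated as interchangeable.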

    Multiple Property Tolerance Analysis for the Evaluation of Missense Mutations

    Computational prediction of the impact of a mutation on protein function is still not accurate enough for clinical diagnostics without additional human expert analysis. Sequence alignment-based methods have been used extensively, but their results depend heavily on the quality of the input alignments and the choice of sequences. Incorporating structural information with alignments improves prediction accuracy. Here, we present a method for mutation prediction based on the conservation of amino acid properties, Multiple Properties Tolerance Analysis (MuTA), and a new strategy, MuTA/S, to incorporate the solvent accessible surface (SAS) property into MuTA. Instead of combining multiple features by machine learning or mathematical methods, an intuitive strategy is used to divide the residues of a protein into different groups, and in each group the properties used are adjusted.

    In silico modeling of chemical and biological interactions at different scales

    In the past decades, government, society and industry at large have taken a keen interest in the impact, at different scales, that exposure to chemicals has on humans and the environment. Many governments have imposed regulations that make it important to establish the potential effects of these chemical entities with respect to human health and environmental endpoints. Given the time taken by traditional tests, their cost and the large number of chemicals to be evaluated, there has been a rapid growth in the number of computational models that link the structure of chemicals to their biological activity. To extend the basis of knowledge that currently exists in Structure Activity Relationship (SAR) models for chemicals, a similar approach was used to develop a new model and generate sets of metabolic triggers which can be used together with Q(SAR) methods. This thesis presents SAR rules for prediction of mutagenicity in vitro, along with metabolic triggers for prediction of mutagenicity in vitro and in vivo. Along with chemical compounds, nanoparticles are also being used increasingly across different classes of consumer products. Since, in a physiological context, the protein corona constitutes the interface between the nanoparticle and cells, it plays a fundamental role in nanoparticle-cell association. In this thesis, the physicochemical properties of the protein corona were used to develop a model to predict cell association. Lastly, this thesis focuses on the topic of drug resistance in bacteria, which has become a matter of global concern. With bacteria growing resistant to antibiotics faster than new antibiotics are discovered, information is needed on how new bacterial proteins would respond to the currently available antibiotics, based on their similarity to known antibiotic-resistant proteins. An alignment-free method was developed to improve the resistance profile classification of bacterial proteins based on their physicochemical properties.

    Specialized dynamical properties of promiscuous residues revealed by simulated conformational ensembles

    The ability to interact with different partners is one of the most important features of proteins. Proteins that bind a large number of partners (hubs) have often been associated with intrinsic disorder. However, many examples exist of hubs with an ordered structure, and evidence of a general mechanism promoting promiscuity in ordered proteins is still elusive. An intriguing hypothesis is that promiscuous binding sites have specific dynamical properties, distinct from the rest of the interface and pre-existing in the isolated state of the protein. Here, we present the first comprehensive study of the intrinsic dynamics of promiscuous residues in a large protein data set. Different computational methods, from coarse-grained elastic models to geometry-based sampling methods and full-atom Molecular Dynamics simulations, were used to generate conformational ensembles for the isolated proteins. The flexibility and dynamic correlations of interface residues with different degrees of binding promiscuity were calculated and compared considering side chain and backbone motions, the latter on both a local and a global scale. The study revealed that (a) promiscuous residues tend to be more flexible than nonpromiscuous ones, (b) this additional flexibility has a higher degree of organization, and (c) evolutionary conservation and binding promiscuity have opposite effects on intrinsic dynamics. Findings on simulated ensembles were also validated on ensembles of experimental structures extracted from the Protein Data Bank (PDB). Additionally, the low occurrence of single nucleotide polymorphisms observed for promiscuous residues indicated a tendency to preserve binding diversity at these positions. A case study on two ubiquitin-like proteins exemplifies how binding promiscuity in evolutionarily related proteins can be modulated by the fine-tuning of the interface dynamics. The interplay between promiscuity and flexibility highlighted here can inspire new directions in protein-protein interaction prediction and design methods. © 2013 American Chemical Society