72 research outputs found

    A Structure-Based Approach for Detection of Thiol Oxidoreductases and Their Catalytic Redox-Active Cysteine Residues

    Cysteine (Cys) residues often play critical roles in proteins, for example, in the formation of structural disulfide bonds, metal binding, targeting proteins to membranes, and various catalytic functions. However, the structural determinants of these different Cys functions are not clear. Thiol oxidoreductases, enzymes that contain catalytic redox-active Cys residues, have been studied extensively, but even for these proteins little is understood about what distinguishes their catalytic redox Cys from Cys residues with other functions. Herein, we characterized thiol oxidoreductases at the structural level and developed an algorithm that recognizes these enzymes by (i) analyzing the amino acid and secondary structure composition of the active site and its similarity to known active sites containing redox Cys and (ii) calculating the accessibility, active-site location, and reactivity of Cys. For proteins with known or modeled structures, this method can identify proteins with catalytic Cys residues and distinguish thiol oxidoreductases from enzymes containing other types of catalytic Cys. Furthermore, by applying this procedure to Saccharomyces cerevisiae proteins containing conserved Cys, we identified the majority of known yeast thiol oxidoreductases. This study provides insights into the structural properties of catalytic redox-active Cys and should help to recognize thiol oxidoreductases in protein sequence and structure databases.
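    One ingredient of the method above is the amino acid composition of the active site around a candidate Cys. The sketch below shows only that step, under stated assumptions: the structure is a hypothetical list of residues with one representative coordinate each (for example the C-alpha atom), and the "active site" is approximated as every residue within a fixed radius of the Cys. The 8 Å cutoff and the names `active_site_composition` and `composition_fractions` are illustrative inventions, not the authors' parameters or code; secondary structure, accessibility, and reactivity are not modeled.

```python
from collections import Counter
from math import dist
from typing import List, Tuple

# Hypothetical input format: (residue number, one-letter code, (x, y, z) of a
# representative atom such as C-alpha). Not the authors' data model.
Residue = Tuple[int, str, Tuple[float, float, float]]


def active_site_composition(residues: List[Residue], cys_number: int,
                            radius: float = 8.0) -> Counter:
    """Count residue types within `radius` angstroms of the given Cys.

    The 8 A default is an illustrative cutoff, not a value from the study.
    """
    center = next((xyz for num, aa, xyz in residues
                   if num == cys_number and aa == "C"), None)
    if center is None:
        raise ValueError(f"no Cys with residue number {cys_number}")
    # Spherical neighbourhood around the Cys; the Cys itself is excluded.
    return Counter(aa for num, aa, xyz in residues
                   if num != cys_number and dist(center, xyz) <= radius)


def composition_fractions(counts: Counter) -> dict:
    """Convert raw counts into fractions so sites of different size compare."""
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()} if total else {}
```

    Fractions rather than raw counts make neighbourhoods of different sizes comparable, which is what a similarity comparison against known redox active sites would need.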

    Protein structure search and local structure characterization

    Background: Structural similarities among proteins can provide valuable insight into their functional mechanisms and relationships. As the number of available three-dimensional (3D) protein structures grows, a wider variety of studies becomes feasible, including the design of protein structural alphabets. Structural alphabets characterize the local structures of proteins and describe the global fold of a protein as a one-dimensional (1D) sequence. These 1D sequences can then be searched for structural similarities with standard sequence alignment tools such as BLAST or FASTA.
    Results: We used self-organizing maps in combination with a minimum spanning tree algorithm to determine the optimum size of a structural alphabet and applied the k-means algorithm to group protein fragments into clusters. The centroids of these clusters defined the structural alphabet. We also developed a flexible matrix training system to build a substitution matrix (TRISUM-169) for our alphabet. Based on FASTA and using TRISUM-169 as the substitution matrix, we developed the SA-FAST alignment tool. We compared the performance of SA-FAST with that of various search tools in database-scale search tasks and found that SA-FAST was highly competitive in all tests conducted. We further evaluated how well our structural alphabet recognizes specific structural domains of EGF and EGF-like proteins: our method recovered more EGF subdomains with our structural alphabet than with other structural alphabets. SA-FAST can be found at http://140.113.166.178/safast/.
    Conclusion: The goal of this project was twofold. First, we wanted to introduce a modular design pipeline to those who have been working with structural alphabets. Second, we wanted to open the door to researchers who have done substantial work on biological sequences but have yet to enter the field of protein structure research. Our experiments showed that, by transforming structural representations from 3D to 1D, several 1D-based tools can be applied to structural analysis, including similarity searches and structural motif finding.
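    The core idea above, where k-means centroids of protein fragments become the letters of a structural alphabet and a structure is then rewritten as a 1D string, can be sketched as follows. This is a minimal illustration under assumptions: fragments are 4 consecutive C-alpha atoms described by their non-adjacent internal distances, centroids are initialized at random, and letters are drawn from A to Z (so at most 26 clusters). None of these choices are taken from the paper, and the self-organizing-map sizing step, the TRISUM-169 matrix, and SA-FAST itself are not reproduced.

```python
import math
import random
from typing import List, Sequence

Point = Sequence[float]
LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def fragment_descriptor(ca: List[Point], start: int, length: int = 4) -> List[float]:
    """Non-adjacent C-alpha distances inside one fragment.

    Distances are invariant to rotation and translation; adjacent pairs are
    skipped because consecutive C-alpha atoms sit ~3.8 A apart in any fold.
    """
    frag = ca[start:start + length]
    return [math.dist(frag[i], frag[j])
            for i in range(length) for j in range(i + 2, length)]


def nearest(p: Point, centroids: List[List[float]]) -> int:
    """Index of the centroid closest to `p` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))


def kmeans(points: List[Point], k: int, iters: int = 100, seed: int = 0) -> List[List[float]]:
    """Plain Lloyd's k-means; the centroids play the role of structural letters."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Recompute each centroid as its cluster mean; keep empty clusters unchanged.
        updated = [[sum(c) / len(cl) for c in zip(*cl)] if cl else centroids[i]
                   for i, cl in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def encode(ca: List[Point], centroids: List[List[float]], length: int = 4) -> str:
    """Translate a C-alpha trace into a 1D string, one letter per fragment."""
    return "".join(LETTERS[nearest(fragment_descriptor(ca, s, length), centroids)]
                   for s in range(len(ca) - length + 1))
```

    Once structures are encoded this way, the resulting strings can be handed to an ordinary sequence aligner, which is the 3D-to-1D transformation the abstract describes.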

    Recruitment of rare 3-grams at functional sites: Is this a mechanism for increasing enzyme specificity?

    Background: A wealth of unannotated protein sequences of unknown function has accumulated in recent years with rapid progress in genome sequencing, creating an ever-increasing demand for methods that efficiently assess functional sites. Sequence and structure conservation have traditionally been the major criteria adopted by algorithms that identify functional sites. Here, we focus on the distributions of the 20^3 (8,000) different types of 3-grams (triplets of sequentially contiguous amino acids) across the entire space of sequences accumulated to date in the UniProt database, and in particular on the rare 3-grams, distinguished by their high entropy-based information content.
    Results: Comparing the UniProt distributions with those observed near or at the active sites in a non-redundant dataset of 59 enzyme/ligand complexes shows that active sites preferentially recruit 3-grams that occur with low frequency in UniProt. Three cases, Src kinase, hemoglobin, and tyrosyl-tRNA synthetase, are discussed in detail to illustrate the biological significance of the results.
    Conclusion: The results suggest that recruitment of rare 3-grams may be an efficient mechanism for increasing specificity at functional sites. Rarity emerges as a feature that may help identify key sites for protein function, providing information complementary to that derived from sequence alignments. In addition, it provides (for the first time) a means of identifying potentially functional sites from sequence information alone, when sequence conservation properties are not available.
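    The rarity measure above can be illustrated by counting overlapping 3-grams in a background set of sequences and scoring each 3-gram by its self-information, -log2 of its background frequency, so rare 3-grams score high. This is a hedged sketch: the abstract says "entropy-based information content" without giving the formula, so self-information with add-one smoothing over the 8,000 possible 3-grams is an assumption, and the function names are hypothetical. Counting all of UniProt would in practice need streaming input, and the comparison with the 59 enzyme/ligand complexes is not shown.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple

N_POSSIBLE = 20 ** 3  # 8,000 possible 3-grams over the 20 standard amino acids


def count_3grams(sequences: Iterable[str]) -> Counter:
    """Count overlapping 3-grams (triplets of contiguous residues)."""
    counts: Counter = Counter()
    for seq in sequences:
        counts.update(seq[i:i + 3] for i in range(len(seq) - 2))
    return counts


def self_information(gram: str, counts: Counter, pseudocount: float = 1.0) -> float:
    """-log2 of the 3-gram's smoothed background frequency, in bits.

    The add-`pseudocount` smoothing is an assumption that keeps 3-grams
    absent from the background finite; pass 0 for raw frequencies.
    """
    numerator = counts[gram] + pseudocount
    denominator = sum(counts.values()) + pseudocount * N_POSSIBLE
    if numerator == 0 or denominator == 0:
        return math.inf  # never observed and no smoothing: maximally rare
    return -math.log2(numerator / denominator)


def profile(query: str, counts: Counter,
            pseudocount: float = 1.0) -> List[Tuple[int, str, float]]:
    """(start position, 3-gram, bits) for every 3-gram in `query`."""
    return [(i, query[i:i + 3], self_information(query[i:i + 3], counts, pseudocount))
            for i in range(len(query) - 2)]
```

    Positions whose 3-grams carry the most bits in `profile` are the candidates this rarity criterion would flag, without any alignment or conservation data.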

    Mining protein loops using a structural alphabet and statistical exceptionality

    Background: Protein loops encompass 50% of protein residues in available three-dimensional structures. These regions are often involved in protein function, e.g., in binding sites and catalytic pockets. However, describing protein loops with conventional tools is difficult. Regular secondary structures (helices and strands) have been widely studied, whereas loops, being highly variable in sequence and structure, are difficult to analyze. Because of data sparsity, long loops have rarely been studied systematically.
    Results: We developed a simple and accurate method that allows the description and analysis of the structures of short and long loops using structural motifs, without restriction on loop length. This method is based on the structural alphabet HMM-SA, which simplifies a three-dimensional protein structure into a one-dimensional string of states, where each state is a four-residue prototype fragment called a structural letter. The difficult task of structurally grouping huge data sets is thus easily accomplished by handling structural-letter strings as in conventional protein sequence analysis. We systematically extracted all seven-residue fragments in a bank of 93,000 protein loops and grouped them according to their structural-letter sequence, called a structural word. Because we consider structural motifs of seven residues rather than complete loops, this approach permits a systematic analysis of loops of all sizes. We focused the analysis on highly recurrent words in loops (observed more than 30 times). Our study reveals that 73% of loop lengths are covered by only 3,310 highly recurrent structural words out of 28,274 observed words. These structural words have low structural variability (mean RMSD of 0.85 Å). As expected, half of these motifs display a flanking-region preference, but, interestingly, two thirds are shared by short (fewer than 12 residues) and long loops. Moreover, half of the recurrent motifs exhibit a significant level of amino acid conservation, with at least four significant positions, and 87% of long loops contain at least one such word. We complement our analysis with the detection of statistically over-represented patterns of structural letters, as in conventional DNA sequence analysis. About 30% (930) of the highly recurrent structural words are over-represented, and they cover about 40% of loop lengths. Interestingly, these words exhibit lower structural variability and higher sequence specificity, suggesting structural or functional constraints.
    Conclusions: We developed a method to systematically decompose and study protein loops using recurrent structural motifs. This method is based on the structural alphabet HMM-SA rather than on structural alignment and geometrical parameters. We extracted meaningful structural motifs that are found in both short and long loops. To our knowledge, this is the first time that pattern mining has been used to increase the signal-to-noise ratio in protein loops. This finding helps to better describe protein loops and might help reduce the complexity of long-loop analysis. Detailed results are available at http://www.mti.univ-paris-diderot.fr/publication/supplementary/2009/ACCLoop/.
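    Because each structural letter stands for a four-residue fragment and consecutive letters overlap by three residues, a seven-residue fragment corresponds to a word of four consecutive letters. The sketch below extracts those words, keeps the ones observed more than 30 times (the threshold stated above), and measures the share of loop residues covered by recurrent words. It assumes loops are already given as HMM-SA letter strings with letter i describing residues i to i+3. The function names are hypothetical, and the over-representation statistics are not reproduced.

```python
from collections import Counter
from typing import Iterable, List, Set

LETTER_SPAN = 4   # residues covered by one structural letter
WORD_LEN = 4      # 4 overlapping letters span 4 + 3 = 7 residues
WORD_SPAN = LETTER_SPAN + WORD_LEN - 1


def structural_words(letters: str) -> List[str]:
    """All overlapping words of WORD_LEN letters (one per 7-residue fragment)."""
    return [letters[i:i + WORD_LEN] for i in range(len(letters) - WORD_LEN + 1)]


def recurrent_words(loops: Iterable[str], min_count: int = 30) -> Set[str]:
    """Words observed more than `min_count` times across all loops."""
    counts = Counter(w for loop in loops for w in structural_words(loop))
    return {w for w, c in counts.items() if c > min_count}


def residue_coverage(loops: Iterable[str], recurrent: Set[str]) -> float:
    """Fraction of loop residues lying inside at least one recurrent word.

    Assumes letter i of a loop string describes residues i..i+3.
    """
    covered = total = 0
    for loop in loops:
        if not loop:
            continue
        n_res = len(loop) + LETTER_SPAN - 1
        mask = [False] * n_res
        for i, word in enumerate(structural_words(loop)):
            if word in recurrent:
                for r in range(i, i + WORD_SPAN):
                    mask[r] = True
        covered += sum(mask)
        total += n_res
    return covered / total if total else 0.0
```

    A coverage figure such as the reported 73% of loop lengths is, under these assumptions, the value `residue_coverage` returns for the full loop bank.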

    Annotation Error in Public Databases: Misannotation of Molecular Function in Enzyme Superfamilies

    Due to the rapid release of new data from genome sequencing projects, the majority of protein sequences in public databases have not been experimentally characterized; rather, sequences are annotated using computational analysis. The level and types of misannotation in large public databases are currently unknown and have not been analyzed in depth. We have investigated the misannotation levels for molecular function in four public protein sequence databases (UniProtKB/Swiss-Prot, GenBank NR, UniProtKB/TrEMBL, and KEGG) for a model set of 37 enzyme families for which extensive experimental information is available. The manually curated database Swiss-Prot shows the lowest annotation error levels (close to 0% for most families); the two other protein sequence databases (GenBank NR and TrEMBL) and the protein sequences in the KEGG pathways database exhibit similar and surprisingly high levels of misannotation that average 5%–63% across the six superfamilies studied. For 10 of the 37 families examined, the level of misannotation in one or more of these databases is >80%. Examination of the NR database over time shows that misannotation increased between 1993 and 2005. The types of misannotation found fall into several categories, most of them associated with "overprediction" of molecular function. These results suggest that misannotation in enzyme superfamilies containing multiple families that catalyze different reactions is a larger problem than has been recognized. Strategies are suggested for addressing some of the systematic problems contributing to these high levels of misannotation.

    Use of plants for medicinal purposes by people living with HIV/AIDS on antiretroviral therapy

    This was an observational, cross-sectional analytical study conducted at a referral outpatient clinic in the state of Maranhão, Brazil, from May 2009 to February 2010, with the objective of studying the use of plants for medicinal purposes among people living with HIV/AIDS who were receiving antiretroviral therapy. A total of 339 people answered a questionnaire covering plant use and demographic, socioeconomic, and behavioral characteristics, including those related to HIV status and the use of antiretroviral drugs. The prevalence of plant use was 34.81%. The most frequently used plants were Turnera ulmifolia (chanana) (12.09%), Melissa officinalis (erva cidreira) (10.62%), Plectranthus barbatus (boldo) (7.67%), Cymbopogon citratus (capim limão, lemongrass) (4.72%), and Mentha spp. (hortelã, mint) (2.36%). Most respondents (96.61%) reported improvement after use. Of the plant users, 75.42% had not told their physician about this practice. Among those who had, 55.17% said their physician agreed with it, and only one person (3.45%) was advised to stop; only one physician (3.45%) had recommended plant use. The adjusted (multivariate) analysis showed differences in plant use by female sex (PR = 1.58, 95% CI 1.15–2.15, p = 0.004) and homosexual orientation (PR = 0.63, CI 0.44–0.90, p = 0.012). This study points to the need for better dialogue between physicians and patients about the use of plants for medicinal purposes, including warnings about possible dangers when plants are combined with antiretroviral drugs, especially among female users and users with homosexual practices.

    InterCarb: a community effort to improve interlaboratory standardization of the carbonate clumped isotope thermometer using carbonate standards

    Increased use and improved methodology of carbonate clumped isotope thermometry have greatly enhanced our ability to interrogate a suite of Earth-system processes. However, interlaboratory discrepancies in quantifying carbonate clumped isotope (Δ47) measurements persist, and their specific sources remain unclear. To address interlaboratory differences, we first provide consensus values from the clumped isotope community for four carbonate standards relative to heated and equilibrated gases, based on 1,819 individual analyses from 10 laboratories. We then analyzed the four carbonate standards along with three additional standards, spanning a broad range of δ47 and Δ47 values, for a total of 5,329 analyses on 25 individual mass spectrometers from 22 different laboratories. Treating three of the materials as known standards and the other four as unknowns, we find that the use of carbonate reference materials is a robust method for standardization that yields interlaboratory discrepancies entirely consistent with intralaboratory analytical uncertainties. Carbonate reference materials, along with the measurement and data processing practices described herein, provide the carbonate clumped isotope community with a robust approach to achieve interlaboratory agreement as we continue to use and improve this powerful geochemical tool. We propose that carbonate clumped isotope data normalized to the carbonate reference materials described in this publication be reported as Δ47 (I-CDES) values, where I-CDES stands for InterCarb-Carbon Dioxide Equilibrium Scale.
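    The principle of standardizing with carbonate reference materials, where anchors of known value define a mapping that is then applied to unknowns, can be sketched as an ordinary least-squares affine fit from raw to nominal Δ47. This is a simplified illustration, not the processing described in the publication, which may include additional terms (for example a δ47 dependence). The function names are hypothetical, and the test numbers are toy values, not consensus values for any real standard.

```python
from statistics import mean
from typing import List, Tuple


def fit_transfer(raw: List[float], nominal: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for nominal = slope * raw + intercept over anchor analyses."""
    if len(raw) != len(nominal) or len(set(raw)) < 2:
        raise ValueError("need paired values spanning at least two raw levels")
    mx, my = mean(raw), mean(nominal)
    sxx = sum((x - mx) ** 2 for x in raw)
    sxy = sum((x - mx) * (y - my) for x, y in zip(raw, nominal))
    slope = sxy / sxx
    return slope, my - slope * mx


def standardize(raw_value: float, slope: float, intercept: float) -> float:
    """Map a raw Delta47 of an unknown onto the anchors' reference frame."""
    return slope * raw_value + intercept
```

    Each anchor analysis enters the fit individually, so replicate analyses of the same standard simply add more points; the unknowns never influence the transfer function.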

    A horizontal alignment tool for numerical trend discovery in sequence data: application to protein hydropathy.

    PMC3794901
    An algorithm is presented that returns the optimal pairwise gapped alignment of two sets of signed numerical sequence values. One distinguishing feature of this algorithm is a flexible comparison engine (based on both relative shape and absolute similarity measures) that does not rely on explicit gap penalties. Additionally, an empirical probability model is developed to estimate the significance of the returned alignment with respect to randomized data. The algorithm's utility for biological hypothesis formulation is demonstrated with test cases including database search and pairwise alignment of protein hydropathy. The algorithm requires only numerical values as input and will readily compare data other than protein hydropathy; the algorithm and probability model could also be extended to accommodate other diverse types of protein or nucleic acid data, including positional thermodynamic stability and mRNA translation efficiency. The tool is therefore expected to complement, rather than replace, existing sequence- and structure-based tools and may inform medical discovery, as exemplified by a proposed similarity between a chlamydial ORFan protein and a bacterial colicin pore-forming domain. The source code, documentation, and a basic web-server application are available.
    JH Libraries Open Access Fund
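    To illustrate aligning numerical profiles without explicit gap penalties, the sketch below substitutes a different, well-known technique, dynamic time warping (DTW), for the paper's comparison engine. It is not the authors' algorithm. Profiles come from the standard Kyte-Doolittle hydropathy scale with an assumed 9-residue window, and significance is estimated by shuffling one profile, a simple permutation analogue of the empirical probability model mentioned above. Function names and defaults are hypothetical.

```python
import random
from typing import Dict, List, Sequence

# Kyte-Doolittle hydropathy scale (a standard published scale).
KD: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy(seq: str, window: int = 9) -> List[float]:
    """Sliding-window mean hydropathy: one signed value per window."""
    vals = [KD[aa] for aa in seq.upper()]
    return [sum(vals[i:i + window]) / window for i in range(len(vals) - window + 1)]


def dtw(a: Sequence[float], b: Sequence[float]) -> float:
    """Dynamic time warping distance with absolute-difference cost.

    Stretching either profile costs only the extra mismatches it creates;
    there is no separate gap-open or gap-extension penalty.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def empirical_p(a: Sequence[float], b: Sequence[float],
                trials: int = 200, seed: int = 0) -> float:
    """Share of shuffled versions of `b` that align to `a` at least as well."""
    rng = random.Random(seed)
    observed = dtw(a, b)
    shuffled = list(b)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if dtw(a, shuffled) <= observed:
            hits += 1
    return (hits + 1) / (trials + 1)  # add-one keeps the estimate above zero
```

    DTW runs in O(nm) time per comparison, so the permutation test multiplies that cost by `trials`; for database-scale searches a faster scoring scheme would be needed.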

    Using decision trees to extract patterns for dairy culling management

    Paper presented at the 14th Artificial Intelligence Applications and Innovations conference (AIAI 2018), held in Rhodes, Greece, 25–27 May 2018.
    The management of a dairy farm involves difficult technical and economic decisions, such as replacing some cows to maintain or increase the productivity of the farm. However, there is no standard method for selecting which animals should be culled. In the present study, we used decision trees to develop a model that classifies a cow relative to the average productivity of its herd. This model, obtained from a database of around 98,000 cows, predicts the average milk production of a cow's first lactation from the monthly milk controls corresponding to the lactation peak. Our goal is to identify low-producing cows during their first lactation in order to make more accurate decisions about which cows should be culled.
    This research was partially funded by project AGL2015-67409-C2-01-R from the Spanish Ministry of Economy and Competitiveness, RPREF (CSIC Intramural 201650E044), and grant 2014-SGR-118 from the Generalitat de Catalunya.
    Peer reviewed.
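    The decision-tree idea above can be sketched with a small CART-style classifier that splits on numeric features by minimizing Gini impurity. This is an illustration under assumptions, not the authors' model or software. Each row is assumed to hold monthly milk-control yields around the lactation peak, and the label is 1 when the cow's first-lactation average falls below the herd average (a culling candidate). The abstract describes predicting average production and then classifying relative to the herd, so framing it as a direct binary label is a simplification. The function names, depth limit, and test data are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple, Union

Row = Sequence[float]
# A tree is either a leaf label or (feature, threshold, left subtree, right subtree).
Tree = Union[int, Tuple[int, float, "Tree", "Tree"]]


def gini(labels: List[int]) -> float:
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def best_split(X: List[Row], y: List[int]) -> Optional[Tuple[float, int, float]]:
    """Exhaustive search for the (feature, threshold) with lowest weighted Gini."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(X: List[Row], y: List[int], depth: int = 0,
               max_depth: int = 3, min_samples: int = 2) -> Tree:
    """Grow the tree recursively; stop at pure nodes, small nodes, or max depth."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(y) < min_samples or len(set(y)) == 1:
        return majority
    split = best_split(X, y)
    if split is None:
        return majority
    _, f, t = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    left_tree = build_tree([X[i] for i in left], [y[i] for i in left],
                           depth + 1, max_depth, min_samples)
    right_tree = build_tree([X[i] for i in right], [y[i] for i in right],
                            depth + 1, max_depth, min_samples)
    return (f, t, left_tree, right_tree)


def predict(tree: Tree, row: Row) -> int:
    """Follow thresholds from the root down to a leaf label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree
```

    A shallow depth limit keeps the learned rules readable as simple thresholds on peak-lactation yields, which suits a decision-support setting for farm managers.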