3,670 research outputs found

    EFICAz²: enzyme function inference by a combined approach enhanced by machine learning

    ©2009 Arakaki et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The electronic version of this article is the complete one and can be found online at: http://www.biomedcentral.com/1471-2105/10/107 (doi:10.1186/1471-2105-10-107).
    Background: We previously developed EFICAz, an enzyme function inference approach that combines predictions from non-completely overlapping component methods. Two of the four components in the original EFICAz are based on the detection of functionally discriminating residues (FDRs). FDRs distinguish between members of an enzyme family that are homofunctional (classified under the EC number of interest) or heterofunctional (annotated with another EC number or lacking enzymatic activity). Each of the two FDR-based components is associated with one of two specific kinds of enzyme families. EFICAz exhibits high precision, except when the maximal test to training sequence identity (MTTSI) is lower than 30%. To improve EFICAz's performance in this regime, we: i) increased the number of predictive components and ii) took advantage of consensus information from the different components to make the final EC number assignment.
    Results: We have developed two new EFICAz components, analogous to the two FDR-based components, where the discrimination between homo- and heterofunctional members is based on the evaluation, via Support Vector Machine (SVM) models, of all the aligned positions between the query sequence and the multiple sequence alignments associated with the enzyme families. Benchmark results indicate that: i) the new SVM-based components outperform their FDR-based counterparts, and ii) both SVM-based and FDR-based components generate unique predictions. We developed classification tree models to optimally combine the results from the six EFICAz components into a final EC number prediction. The new implementation of our approach, EFICAz², exhibits a highly improved prediction precision at MTTSI < 30% compared to the original EFICAz, with only a slight decrease in prediction recall. A comparative analysis of enzyme function annotation of the human proteome by EFICAz² and KEGG shows that: i) when both sources make EC number assignments for the same protein sequence, the assignments tend to be consistent and ii) EFICAz² generates considerably more unique assignments than KEGG.
    Conclusion: Performance benchmarks and the comparison with KEGG demonstrate that EFICAz² is a powerful and precise tool for enzyme function annotation, with multiple applications in genome analysis and metabolic pathway reconstruction. The EFICAz² web service is available at: http://cssb.biology.gatech.edu/skolnick/webservice/EFICAz2/index.htm
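The abstract above reports results in terms of prediction precision and recall. As a rough illustration only, the sketch below computes both quantities for EC number assignments using the standard definitions; the function name, the protein identifiers, and the toy data are hypothetical, and the benchmark protocol actually used for EFICAz² is not described in this listing.

```python
def precision_recall(predicted, actual):
    """Compute precision and recall over (protein, EC number) pairs.

    predicted, actual: dicts mapping a protein id to a set of EC numbers.
    This uses the textbook definitions, not necessarily the paper's protocol.
    """
    tp = fp = fn = 0
    for protein in set(predicted) | set(actual):
        p = predicted.get(protein, set())
        a = actual.get(protein, set())
        tp += len(p & a)   # EC numbers predicted and annotated
        fp += len(p - a)   # EC numbers predicted but not annotated
        fn += len(a - p)   # EC numbers annotated but not predicted
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy data: three predictions, four annotated proteins.
predicted = {"P1": {"1.1.1.1"}, "P2": {"2.7.1.1"}, "P3": {"3.4.21.4"}}
actual = {"P1": {"1.1.1.1"}, "P2": {"2.7.1.2"},
          "P3": {"3.4.21.4"}, "P4": {"4.2.1.1"}}
precision, recall = precision_recall(predicted, actual)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

With this toy data two of three predictions are correct and two of four annotations are recovered, which shows how precision and recall can move independently of each other.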

    Automatic Assignment of EC Numbers

    A wide range of research areas in molecular biology and medical biochemistry, e.g., drug design, metabolic network reconstruction and systems biology, require a reliable enzyme classification system. When researchers in these areas wish to refer unambiguously to an enzyme and its function, they use the EC number introduced by the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (IUBMB). However, every one of these applications depends critically on the consistency and reliability of the underlying data. We have developed tools for the validation of the EC number classification scheme. In this paper, we present validated data for 3788 enzymatic reactions including 229 sub-subclasses of the EC classification system. Over 80% agreement was found between our assignment and the EC classification. For 61 (i.e., only 2.5%) reactions we found that their assignment was inconsistent with the rules of the nomenclature committee; they have to be transferred to other sub-subclasses. We demonstrate that our validation results can be used to initiate corrections and improvements to the EC number classification scheme.
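Since the abstract counts reactions per sub-subclass, it may help to see how an EC number decomposes into its hierarchy. The sketch below assumes the usual four-field form (class.subclass.sub-subclass.serial number); the helper names and the sample list are hypothetical and are not taken from the paper's tools.

```python
def parse_ec(ec):
    """Split an EC number such as 'EC 3.4.21.4' into its four fields."""
    text = ec.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()   # drop an optional 'EC' prefix
    fields = text.split(".")
    if len(fields) != 4:
        raise ValueError(f"expected four dot-separated fields, got {ec!r}")
    return tuple(fields)


def sub_subclass(ec):
    """Return the first three fields, e.g. '3.4.21' for '3.4.21.4'."""
    return ".".join(parse_ec(ec)[:3])


# Hypothetical sample: count the distinct sub-subclasses it covers.
sample = ["1.1.1.1", "1.1.1.2", "2.7.1.1", "EC 3.4.21.4"]
distinct = {sub_subclass(ec) for ec in sample}
print(sorted(distinct))
```

Grouping by the first three fields is what allows a statement such as "229 sub-subclasses" to be checked against a list of annotated reactions.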

    EFICAz2: enzyme function inference by a combined approach enhanced by machine learning

    Background: We previously developed EFICAz, an enzyme function inference approach that combines predictions from non-completely overlapping component methods. Two of the four components in the original EFICAz are based on the detection of functionally discriminating residues (FDRs). FDRs distinguish between members of an enzyme family that are homofunctional (classified under the EC number of interest) or heterofunctional (annotated with another EC number or lacking enzymatic activity). Each of the two FDR-based components is associated with one of two specific kinds of enzyme families. EFICAz exhibits high precision, except when the maximal test to training sequence identity (MTTSI) is lower than 30%. To improve EFICAz's performance in this regime, we: i) increased the number of predictive components and ii) took advantage of consensus information from the different components to make the final EC number assignment.
    Results: We have developed two new EFICAz components, analogous to the two FDR-based components, where the discrimination between homo- and heterofunctional members is based on the evaluation, via Support Vector Machine (SVM) models, of all the aligned positions between the query sequence and the multiple sequence alignments associated with the enzyme families. Benchmark results indicate that: i) the new SVM-based components outperform their FDR-based counterparts, and ii) both SVM-based and FDR-based components generate unique predictions. We developed classification tree models to optimally combine the results from the six EFICAz components into a final EC number prediction. The new implementation of our approach, EFICAz², exhibits a highly improved prediction precision at MTTSI < 30% compared to the original EFICAz, with only a slight decrease in prediction recall. A comparative analysis of enzyme function annotation of the human proteome by EFICAz² and KEGG shows that: i) when both sources make EC number assignments for the same protein sequence, the assignments tend to be consistent and ii) EFICAz² generates considerably more unique assignments than KEGG.
    Conclusion: Performance benchmarks and the comparison with KEGG demonstrate that EFICAz² is a powerful and precise tool for enzyme function annotation, with multiple applications in genome analysis and metabolic pathway reconstruction. The EFICAz² web service is available at: http://cssb.biology.gatech.edu/skolnick/webservice/EFICAz2/index.html

    The effect of absent blood flow on the zebrafish cerebral and trunk vasculature

    The role of blood flow in vascular development is complex and context-dependent. In this study, we quantify the effect of the lack of blood flow on embryonic vascular development in two vascular beds, namely the cerebral and trunk vasculature in zebrafish. We do this by analysing vascular topology, endothelial cell (EC) number, EC distribution, apoptosis, and inflammatory response in animals with normal blood flow or absent blood flow. We find that absent blood flow reduces vascular area and EC number significantly in both examined vascular beds, but the effect is more severe in the cerebral vasculature, and severity increases over time. Absent blood flow leads to an increase in non-EC-specific apoptosis without increasing tissue inflammation, as quantified by cerebral immune cell numbers and nitric oxide. Similarly, while stereotypic vascular patterning in the trunk is maintained, intra-cerebral vessels show altered patterning, which is likely to be due to vessels failing to initiate effective fusion and anastomosis rather than sprouting or path-seeking. In conclusion, blood flow is essential for cellular survival in both the trunk and cerebral vasculature, but intra-cerebral vessels are particularly affected by the lack of blood flow, suggesting that responses to blood flow differ between these two vascular beds.

    Identification of functionally related enzymes by learning-to-rank methods

    Enzyme sequences and structures are routinely used in the biological sciences as queries to search for functionally related enzymes in online databases. To this end, one usually starts from some notion of similarity, comparing two enzymes by looking for correspondences in their sequences, structures or surfaces. For a given query, the search operation results in a ranking of the enzymes in the database, from very similar to dissimilar enzymes, while information about the biological function of annotated database enzymes is ignored. In this work we show that rankings of that kind can be substantially improved by applying kernel-based learning algorithms. This approach enables the detection of statistical dependencies between similarities of the active cleft and the biological function of annotated enzymes. This is in contrast to search-based approaches, which do not take annotated training data into account. Similarity measures based on the active cleft are known to outperform sequence-based or structure-based measures under certain conditions. We use the Enzyme Commission (EC) classification hierarchy to obtain annotated enzymes during the training phase. The results of a set of sizeable experiments indicate a consistent and significant improvement for a set of similarity measures that exploit information about small cavities in the surface of enzymes.

    Machine learning applied to enzyme turnover numbers reveals protein structural correlates and improves metabolic models.

    Knowing the catalytic turnover numbers of enzymes is essential for understanding the growth rate, proteome composition, and physiology of organisms, but experimental data on enzyme turnover numbers are sparse and noisy. Here, we demonstrate that machine learning can successfully predict catalytic turnover numbers in Escherichia coli based on integrated data on enzyme biochemistry, protein structure, and network context. We identify a diverse set of features that are consistently predictive for both in vivo and in vitro enzyme turnover rates, revealing novel protein structural correlates of catalytic turnover. We use our predictions to parameterize two mechanistic genome-scale modelling frameworks for proteome-limited metabolism, leading to significantly higher accuracy in the prediction of quantitative proteome data than previous approaches. The presented machine learning models thus provide a valuable tool for understanding metabolism and the proteome at the genome scale, and elucidate structural, biochemical, and network properties that underlie enzyme kinetics.

    Prediction of enzyme kinetic parameters based on statistical learning

    Values of enzyme kinetic parameters are a key requisite for the kinetic modelling of biochemical systems. For most kinetic parameters, however, not even an order of magnitude is known, so the estimation of model parameters from experimental data remains a major task in systems biology. We propose a statistical approach to infer values for kinetic parameters across species and enzymes, making use of parameter values that have been measured under various conditions and are now stored in databases. We fit the data with a statistical regression model in which the substrate, the enzyme-substrate combination and the organism-substrate combination have a linear effect on the logarithmic parameter value. As a result, we obtain predictions and error ranges for unknown enzyme parameters. We apply our method to decadic logarithmic Michaelis-Menten constants from the BRENDA database and confirm the results with leave-one-out cross-validation, in which we mask one value at a time and predict it from the remaining data. For a set of 8 metabolites we obtain a standard prediction error of 1.01 for the deviation of the predicted values from the true values, while the standard deviation of the experimental values is 1.16. The method is applicable to other types of kinetic parameters for which many experimental data are available.
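The leave-one-out procedure described above can be sketched in a few lines. This is a deliberately simplified stand-in, not the paper's model: it keeps only a per-substrate effect (the mean of the remaining log10 values for the same substrate, falling back to the overall mean) and omits the enzyme-substrate and organism-substrate terms. The function name and the data values are hypothetical.

```python
import math


def loo_prediction_error(records):
    """Root-mean-square leave-one-out error for (substrate, log10 value) pairs.

    Each value is masked in turn and predicted from the remaining data.
    Requires at least two records.
    """
    squared_errors = []
    for i, (substrate, value) in enumerate(records):
        rest = [r for j, r in enumerate(records) if j != i]
        same = [v for s, v in rest if s == substrate]
        # Fall back to the overall mean if no other value shares the substrate.
        pool = same if same else [v for _, v in rest]
        prediction = sum(pool) / len(pool)
        squared_errors.append((prediction - value) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical log10 Michaelis-Menten constants, not BRENDA data.
records = [("ATP", -1.0), ("ATP", -1.2), ("ATP", -0.8),
           ("NADH", -2.0), ("NADH", -2.4)]
error = loo_prediction_error(records)
print(f"leave-one-out prediction error: {error:.3f}")
```

Comparing this error with the standard deviation of the measured values is the same kind of check the abstract reports (1.01 against 1.16).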

    A Byzantine Fault-Tolerant Ordering Service for the Hyperledger Fabric Blockchain Platform

    Hyperledger Fabric (HLF) is a flexible permissioned blockchain platform designed for business applications beyond the basic digital coin addressed by Bitcoin and other existing networks. A key property of HLF is its extensibility, in particular its support for multiple ordering services for building the blockchain. Nonetheless, version 1.0 was launched in early 2017 without an implementation of a Byzantine fault-tolerant (BFT) ordering service. To overcome this limitation, we designed, implemented, and evaluated a BFT ordering service for HLF on top of the BFT-SMaRt state machine replication/consensus library, also implementing optimizations for wide-area deployment. Our results show that HLF with our ordering service can achieve up to ten thousand transactions per second and write a transaction irrevocably in the blockchain in half a second, even with peers spread across different continents.

    Data mining of protein families using common peptides

    Predicting the function of a protein from its sequence is typically addressed using sequence similarity. Here we propose a motif-based approach, using supervised motif extraction from protein sequences belonging to one functional family. The resulting deterministic motifs form Common Peptides (CPs) that characterize this family, allow for data mining of its proteins and facilitate further partition of the family into clusters.