
    A critical evaluation of automatic atom mapping algorithms and tools

    The identification of the atoms that change their position in chemical reactions is important knowledge in the field of Metabolic Engineering. It can lead to new advances at different levels, from the reconstruction of metabolic networks to the classification of chemical reactions, through the identification of the atomic changes within a reaction. The atom mapping approach was first developed in the 1960s but has recently undergone important advances, and it is now used in diverse biological and biotechnological studies. The main methodologies used for atom mapping are the Maximum Common Substructure and the Linear Optimization methods, both of which require computational know-how and powerful resources to run the underlying tools. In this work, we assessed a number of previously implemented atom mapping frameworks and built a framework capable of managing the different data inputs and outputs, as well as the mapping process provided by each of these third-party tools. We evaluated the admissibility of the atom maps calculated by different algorithms, and also assessed whether different approaches return equivalent atom maps for the same chemical reaction. Funding: ERDF (European Regional Development Fund), UID/BIO/04469/2013.
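
    To make the idea of "equivalent atom maps" concrete, here is a minimal Python sketch, assuming a hypothetical representation in which each tool's output is a dictionary from reactant atom indices to product atom indices. It is not the framework described in this work: it only scores exact agreement between two maps and ignores symmetry-equivalent atoms, which a real comparison would need to handle.

```python
from typing import Dict

# Hypothetical representation: an atom map sends each reactant-side atom
# index to a product-side atom index (indices are purely illustrative).
AtomMap = Dict[int, int]


def map_agreement(map_a: AtomMap, map_b: AtomMap) -> float:
    """Fraction of reactant atoms that both maps send to the same product atom.

    Atoms mapped by only one of the two tools count as disagreements.
    Symmetry-equivalent atoms are NOT treated as interchangeable here.
    """
    atoms = set(map_a) | set(map_b)
    if not atoms:
        return 1.0
    same = sum(
        1 for a in atoms if a in map_a and a in map_b and map_a[a] == map_b[a]
    )
    return same / len(atoms)


def equivalent(map_a: AtomMap, map_b: AtomMap) -> bool:
    """Strict equivalence: identical assignments for every atom."""
    return map_a == map_b


# Two hypothetical tool outputs that swap the targets of atoms 1 and 2.
tool_1 = {0: 2, 1: 0, 2: 1, 3: 3}
tool_2 = {0: 2, 1: 1, 2: 0, 3: 3}
print(map_agreement(tool_1, tool_2))  # 0.5
print(equivalent(tool_1, tool_2))     # False
```

    If atoms 1 and 2 were chemically equivalent (for example, the two oxygens of a carboxylate), these two maps would describe the same mechanism, which is why practical comparisons canonicalize symmetric atoms first.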

    Comparative evaluation of atom mapping algorithms for balanced metabolic reactions: application to Recon 3D

    The mechanism of each chemical reaction in a metabolic network can be represented as a set of atom mappings, each of which relates an atom in a substrate metabolite to an atom of the same element in a product metabolite. Genome-scale metabolic network reconstructions typically represent biochemistry at the level of reaction stoichiometry. However, a more detailed representation at the underlying level of atom mappings opens up a broader range of biological, biomedical and biotechnological applications than stoichiometry alone. Complete manual acquisition of atom mapping data for a genome-scale metabolic network is a laborious process, but many algorithms exist to predict atom mappings. How do their predictions compare to each other and to manually curated atom mappings? For more than four thousand metabolic reactions in the latest human metabolic reconstruction, Recon 3D, we compared the atom mappings predicted by six atom mapping algorithms. We also compared these predictions to those obtained by manual curation of atom mappings for over five hundred reactions distributed among all top-level Enzyme Commission number classes. Five of the evaluated algorithms had similarly high prediction accuracy of over 91% when compared to manually curated atom mapped reactions. On average, prediction accuracy was highest for reactions catalysed by oxidoreductases and lowest for reactions catalysed by ligases. In addition to prediction accuracy, the algorithms were evaluated on their accessibility, their advanced features, such as the ability to identify equivalent atoms, and their ability to map hydrogen atoms. Beyond prediction accuracy, we found that software accessibility and advanced features were fundamental to the selection of an atom mapping algorithm in practice.
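
    The definition above (each substrate atom related to an atom of the same element in a product) implies a few simple consistency checks for a balanced reaction. The following is a minimal sketch, assuming a hypothetical flat indexing of all atoms on each side of the reaction; it is not taken from any of the six evaluated algorithms.

```python
from collections import Counter
from typing import Dict, List


def is_valid_atom_mapping(
    substrate_elements: List[str],
    product_elements: List[str],
    mapping: Dict[int, int],
) -> bool:
    """Check a candidate atom mapping for a balanced reaction.

    substrate_elements[i] is the element of substrate-side atom i and
    product_elements[j] the element of product-side atom j (hypothetical
    flat indexing across all metabolites on each side).
    """
    # A balanced reaction has the same multiset of elements on both sides.
    if Counter(substrate_elements) != Counter(product_elements):
        return False
    # Every substrate atom must be mapped exactly once ...
    if sorted(mapping) != list(range(len(substrate_elements))):
        return False
    # ... to a distinct, existing product atom (one-to-one) ...
    targets = list(mapping.values())
    if len(set(targets)) != len(targets):
        return False
    if any(not 0 <= j < len(product_elements) for j in targets):
        return False
    # ... of the same element.
    return all(substrate_elements[i] == product_elements[j] for i, j in mapping.items())


# Toy example: three heavy atoms listed in a different order on each side.
print(is_valid_atom_mapping(["C", "O", "O"], ["O", "C", "O"], {0: 1, 1: 0, 2: 2}))  # True
print(is_valid_atom_mapping(["C", "O", "O"], ["O", "C", "O"], {0: 0, 1: 1, 2: 2}))  # False
```

    These checks only establish admissibility; they say nothing about whether the mapping is chemically correct, which is what comparison against manual curation measures.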

    enviRule: an end-to-end system for automatic extraction of reaction patterns from environmental contaminant biotransformation pathways

    Transformation products (TPs) of man-made chemicals, formed through microbially mediated transformation in the environment, can have serious adverse environmental effects, yet the analytical identification of TPs is challenging. Rule-based prediction tools are successful in predicting TPs, especially in environmental chemistry applications that typically have to rely on small datasets, because they encode existing knowledge of enzyme-mediated biotransformation reactions. However, rules extracted from biotransformation reaction databases are often over- or under-generalized and are not easily updated with new reactions. We developed an automatic rule extraction tool called enviRule. It clusters biotransformation reactions into groups based on the similarity of their reaction fingerprints, and then automatically extracts and generalizes rules for each reaction group in SMARTS format. It optimizes the genericity of the automatic rules against the downstream TP prediction task. Models trained with automatic rules outperformed models trained with manually curated rules by 30% in area under the curve (AUC) score. Moreover, automatic rules can be easily updated with new reactions, highlighting enviRule's strengths for both the automatic extraction of optimized reaction rules and their automated updating.
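
    Grouping reactions by fingerprint similarity can be illustrated with Tanimoto similarity over bit-set fingerprints and a greedy leader clustering. This is a generic stand-in, not enviRule's actual fingerprint or clustering procedure; the bit sets and the 0.5 threshold below are hypothetical.

```python
from typing import FrozenSet, List

# Hypothetical encoding: a reaction fingerprint as the set of its "on" bits.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two bit-set fingerprints."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def leader_cluster(fps: List[Fingerprint], threshold: float) -> List[List[int]]:
    """Greedy leader clustering: each reaction joins the first cluster whose
    leader (first member) is at least `threshold` similar; otherwise it
    starts a new cluster."""
    clusters: List[List[int]] = []
    for idx, fp in enumerate(fps):
        for cluster in clusters:
            if tanimoto(fps[cluster[0]], fp) >= threshold:
                cluster.append(idx)
                break
        else:
            clusters.append([idx])
    return clusters


reactions = [
    frozenset({1, 2, 3, 4}),
    frozenset({1, 2, 3, 5}),
    frozenset({10, 11, 12}),
    frozenset({10, 11, 13}),
]
print(leader_cluster(reactions, threshold=0.5))  # [[0, 1], [2, 3]]
```

    Raising the threshold splits groups into more specific clusters, from which more specific rules would be extracted; this mirrors, in a very simplified way, the genericity trade-off the abstract describes.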

    Mechanism and Catalytic Site Atlas (M-CSA): a database of enzyme reaction mechanisms and active sites

    M-CSA (Mechanism and Catalytic Site Atlas) is a database of enzyme active sites and reaction mechanisms that can be accessed at www.ebi.ac.uk/thornton-srv/m-csa. Our objectives with M-CSA are to provide an open data resource for the community to browse known enzyme reaction mechanisms and catalytic sites, and to use the dataset to understand enzyme function and evolution. M-CSA results from the merging of two existing databases: MACiE (Mechanism, Annotation and Classification in Enzymes), a database of enzyme mechanisms, and CSA (Catalytic Site Atlas), a database of catalytic sites of enzymes. We are releasing M-CSA as a new website and underlying database architecture. Currently, M-CSA contains 961 entries, 423 of these with detailed mechanism information and 538 with information on the catalytic site residues only. In total, these cover 81% (195/241) of third-level EC numbers with a PDB structure, and 30% (840/2793) of fourth-level EC numbers with a PDB structure, out of 6028 in total. By searching for close homologues, we are able to extend M-CSA coverage of PDB and UniProtKB to 51 993 structures and to over five million sequences, respectively, of which about 40% and 30% have a conserved active site.
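
    The entry counts and coverage percentages quoted above can be recomputed from the stated figures. This short Python check only restates numbers from the abstract and confirms that the rounding is consistent.

```python
# Counts restated from the M-CSA abstract above.
mechanism_entries = 423
site_only_entries = 538
total_entries = 961
assert mechanism_entries + site_only_entries == total_entries

ec3 = 195 / 241   # third-level EC numbers with a PDB structure that are covered
ec4 = 840 / 2793  # fourth-level EC numbers with a PDB structure that are covered
print(f"{ec3:.1%} {ec4:.1%}")  # 80.9% 30.1%
```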

    A critical evaluation of automatic atom mapping algorithms and tools

    Master's dissertation in Bioinformatics. The identification of the atoms that change their position in chemical reactions is important knowledge in the field of Metabolic Engineering (ME). It can lead to new advances at different levels, from the reconstruction of metabolic networks to the classification of chemical reactions, through the identification of the atomic changes within a reaction. The atom mapping approach was first developed in the 1960s but has recently undergone important advances, and it is now used in diverse biological and biotechnological studies. The main methodologies used for the atom mapping process are the Maximum Common Substructure (MCS) and the Linear Optimization methods, both of which require computational know-how and powerful resources to run the underlying tools. In this work, we assessed a number of previously implemented atom mapping frameworks and built a framework capable of managing the different data inputs and outputs, as well as the mapping process provided by each of these third-party tools. We also evaluated the admissibility of the atom maps calculated by different algorithms, assessing whether different approaches were capable of returning equivalent atom maps for the same chemical reaction.

    Understanding enzyme function evolution from a computational perspective

    In this review, we will explore recent computational approaches to understand enzyme evolution from the perspective of protein structure, dynamics and promiscuity. We will present quantitative methods to measure the size of evolutionary steps within a structural domain, allowing the correlation between change in substrate and domain structure to be assessed, and giving insights into the evolvability of different domains in terms of the number, types and sizes of evolutionary steps observed. These approaches will help to understand the evolution of new catalytic and non-catalytic functionality in response to environmental demands, showing potential to guide de novo enzyme design and directed evolution experiments.

    The Chemistry Development Kit (CDK) v2.0: atom typing, depiction, molecular formulas, and substructure searching

    Background: The Chemistry Development Kit (CDK) is a widely used open source cheminformatics toolkit, providing data structures to represent chemical concepts along with methods to manipulate such structures and perform computations on them. The library implements a wide variety of cheminformatics algorithms ranging from chemical structure canonicalization to molecular descriptor calculations and pharmacophore perception. It is used in drug discovery, metabolomics, and toxicology. Over the last 10 years, however, the code base has grown significantly, resulting in many complex interdependencies among components and poor performance of many algorithms. Results: We report improvements to the CDK v2.0 since the v1.2 release series, specifically addressing the increased functional complexity and poor performance. We first summarize the addition of new functionality, such as atom typing and molecular formula handling, and improvements to existing functionality that have led to significantly better performance for substructure searching, molecular fingerprints, and rendering of molecules. Second, we outline how the CDK has evolved with respect to quality control and the approaches we have adopted to ensure stability, including a code review mechanism. Conclusions: This paper highlights our continued efforts to provide a community-driven, open source cheminformatics library, and shows that such collaborative projects can thrive over extended periods of time, resulting in a high-quality and performant library. By taking advantage of community support and contributions, we show that an open source cheminformatics project can act as a peer-reviewed publishing platform for scientific computing software.
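
    Substructure searching amounts to finding a mapping of query atoms onto target atoms that preserves element labels and bonds. The Python sketch below is a naive backtracking matcher on hypothetical hydrogen-suppressed graphs with bond orders ignored. It only illustrates the problem; it is not the CDK's (Java) implementation, which uses far more efficient matching.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical encoding: element labels per atom plus undirected bonds
# (hydrogens suppressed, bond orders ignored).
Molecule = Tuple[List[str], Set[Tuple[int, int]]]


def bonded(bonds: Set[Tuple[int, int]], a: int, b: int) -> bool:
    """True if atoms a and b share a bond, in either stored direction."""
    return (a, b) in bonds or (b, a) in bonds


def find_substructure(query: Molecule, target: Molecule) -> Optional[Dict[int, int]]:
    """Return one query-atom -> target-atom embedding, or None if absent."""
    q_atoms, q_bonds = query
    t_atoms, t_bonds = target
    mapping: Dict[int, int] = {}

    def extend(qi: int) -> bool:
        if qi == len(q_atoms):
            return True  # every query atom has been placed
        for ti, element in enumerate(t_atoms):
            if ti in mapping.values() or element != q_atoms[qi]:
                continue
            # Each query bond to an already-placed atom must exist in the target.
            if all(bonded(t_bonds, ti, mapping[qj])
                   for qj in mapping if bonded(q_bonds, qi, qj)):
                mapping[qi] = ti
                if extend(qi + 1):
                    return True
                del mapping[qi]  # backtrack and try the next target atom
        return False

    return dict(mapping) if extend(0) else None


ethanol: Molecule = (["C", "C", "O"], {(0, 1), (1, 2)})
c_o: Molecule = (["C", "O"], {(0, 1)})
ethane: Molecule = (["C", "C"], {(0, 1)})
print(find_substructure(c_o, ethanol))  # {0: 1, 1: 2}
print(find_substructure(c_o, ethane))   # None
```

    The search is exponential in the worst case, which is one reason toolkit-level performance work on substructure searching matters in practice.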

    Recon3D enables a three-dimensional view of gene variation in human metabolism

    Genome-scale network reconstructions have helped uncover the molecular basis of metabolism. Here we present Recon3D, a computational resource that includes three-dimensional (3D) metabolite and protein structure data and enables integrated analyses of metabolic functions in humans. We use Recon3D to functionally characterize mutations associated with disease, and identify metabolic response signatures that are caused by exposure to certain drugs. Recon3D represents the most comprehensive human metabolic network model to date, accounting for 3,288 open reading frames (representing 17% of functionally annotated human genes), 13,543 metabolic reactions involving 4,140 unique metabolites, and 12,890 protein structures. These data provide a unique resource for investigating molecular mechanisms of human metabolism. Recon3D is available at http://vmh.life