
    MOLGEN–CID, A Canonizer for Molecules and Graphs Accessible through the Internet

    The MOLGEN Chemical Identifier (MOLGEN-CID) is a software module freely accessible via the Internet. For a molecule or graph entered in molfile format, it produces, by a canonical renumbering procedure, a canonical molfile and a unique character string that a computer can easily compare with a similar string. The mode of operation of MOLGEN-CID is detailed and visualized with examples.
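    The idea of canonical renumbering can be illustrated with a minimal sketch. This is not the MOLGEN-CID procedure, which the abstract does not specify; it is a brute-force stand-in that tries every atom numbering and keeps the lexicographically smallest description, so it is only practical for very small graphs. The function name `canonical_form` and the output string layout are assumptions made for illustration.

```python
from itertools import permutations


def canonical_form(labels, bonds):
    """Brute-force canonical renumbering of a small labelled graph.

    labels: list of atom labels; the list index is the original atom number.
    bonds:  iterable of (i, j) pairs of original atom numbers (0-based).
    Returns (canonical_string, numbering), where numbering[new] is the
    original number of the atom placed at position `new`.
    NOTE: illustrative sketch only; cost grows factorially with atom count.
    """
    bond_list = [tuple(b) for b in bonds]
    best = None
    for perm in permutations(range(len(labels))):
        # perm[new] = old, so invert it to map old numbers to new ones.
        position = {old: new for new, old in enumerate(perm)}
        new_labels = tuple(labels[old] for old in perm)
        new_bonds = tuple(sorted(
            tuple(sorted((position[i], position[j]))) for i, j in bond_list
        ))
        candidate = (new_labels, new_bonds)
        # Keep the smallest (labels, bonds) description seen so far.
        if best is None or candidate < best[0]:
            best = (candidate, perm)
    (new_labels, new_bonds), perm = best
    text = "".join(new_labels) + "|" + ",".join(f"{a}-{b}" for a, b in new_bonds)
    return text, perm


# Two different numberings of the same C-C-O skeleton.
ethanol_a, _ = canonical_form(["C", "C", "O"], [(0, 1), (1, 2)])
ethanol_b, _ = canonical_form(["O", "C", "C"], [(0, 1), (1, 2)])
# A different skeleton with the same atoms: C-O-C.
ether, _ = canonical_form(["C", "O", "C"], [(0, 1), (1, 2)])
```

    Because both numberings of the same skeleton reduce to one string, comparing two structures becomes a plain string comparison, which is the property the abstract describes.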

    Ab initio machine learning in chemical compound space

    Chemical compound space (CCS), the set of all theoretically conceivable combinations of chemical elements and (meta-)stable geometries that make up matter, is colossal. The first principles based virtual sampling of this space, for example in search of novel molecules or materials which exhibit desirable properties, is therefore prohibitive for all but the smallest sub-sets and simplest properties. We review studies aimed at tackling this challenge using modern machine learning techniques based on (i) synthetic data, typically generated using quantum mechanics based methods, and (ii) model architectures inspired by quantum mechanics. Such Quantum mechanics based Machine Learning (QML) approaches combine the numerical efficiency of statistical surrogate models with an ab initio view on matter. They rigorously reflect the underlying physics in order to reach universality and transferability across CCS. While state-of-the-art approximations to quantum problems impose severe computational bottlenecks, recent QML based developments indicate the possibility of substantial acceleration without sacrificing the predictive power of quantum mechanics.

    Automated de novo metabolite identification with mass spectrometry and cheminformatics

    In this thesis, new algorithms and methods that enable the de novo identification of metabolites have been developed. The aim was to find methods to propose candidate structures for unknown metabolites using MSn data as a starting point. These methods have been integrated into a semi-automated pipeline to identify new human metabolites. The discovery of new metabolites will improve our capability to understand disease via its metabolic fingerprint, to develop personalized treatments and to discover new drugs. In addition, the cheminformatics methods presented in this thesis increase our understanding of the properties of human metabolites. The research described in this thesis has shown that the success of de novo metabolite identification relies on the synergy between analytical chemistry methods (i.e. LC-MSn) and cheminformatics tools. Netherlands Organization for Applied Scientific Research (TNO); Netherlands Metabolomics Centre

    Towards automated identification of metabolites using mass spectral trees

    The detailed description of the chemical compounds present in organisms, organs/tissues, biofluids and cells is the key to understanding the complexity of biological systems. The small molecules (metabolites) are known to be very diverse in structure and function. However, the identification of the chemical structure of metabolites is one of the major bottlenecks in metabolomics research. Hence, the annotation and the structure elucidation of the metabolites are essential to understand the biological system under study. At present, no single analytical platform exists that can measure and identify all existing metabolites. Multistage mass spectrometry (MSn) is a powerful analytical technique that helps to identify these metabolites. This technique provides detailed structural information on the unknown metabolite by fragmenting the metabolite and its fragments recursively. However, only computational tools can provide a fast and straightforward analysis of the large amount of complex data that is generated by MSn spectrometry. The aim of this thesis was to develop a novel semi-automatic approach for the identification of metabolites using MSn data. Furthermore, these tools were to be integrated into a pipeline to assign identities to unknown metabolites present in databases, but especially to unknown metabolites not present in a database.
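    The recursive fragmentation described above naturally yields a tree of ions. A minimal sketch of such a mass spectral tree follows; it is not the data model used in the thesis, which the abstract does not describe. The class name `FragmentNode`, its fields and the m/z values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class FragmentNode:
    """One ion in a hypothetical MSn fragmentation tree."""
    mz: float                      # mass-to-charge ratio of this ion
    level: int = 1                 # MS level: 1 = precursor, 2 = MS2, ...
    children: List["FragmentNode"] = field(default_factory=list)

    def add_fragment(self, mz: float) -> "FragmentNode":
        """Attach a fragment produced by fragmenting this ion."""
        child = FragmentNode(mz=mz, level=self.level + 1)
        self.children.append(child)
        return child

    def walk(self) -> Iterator["FragmentNode"]:
        """Yield this node and all descendants in depth-first order."""
        yield self
        for child in self.children:
            yield from child.walk()


# Made-up m/z values, not measured data.
precursor = FragmentNode(mz=300.0)
ms2_ion = precursor.add_fragment(200.0)   # MS2 fragment of the precursor
precursor.add_fragment(120.0)             # second MS2 fragment
ms2_ion.add_fragment(150.0)               # MS3 fragment of the first MS2 ion

summary = [(node.level, node.mz) for node in precursor.walk()]
```

    Each node records its MS level, so a depth-first walk recovers which fragment came from which parent ion.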