11,668 research outputs found

    A critical evaluation of automatic atom mapping algorithms and tools

    The identification of the atoms that change their position in chemical reactions is important knowledge within the field of Metabolic Engineering. It can lead to advances at several levels, from the reconstruction of metabolic networks to the classification of chemical reactions, through the identification of the atomic changes inside a reaction. The atom mapping approach was initially developed in the 1960s and has recently undergone important advances, being used in diverse biological and biotechnological studies. The main methodologies used for atom mapping are the Maximum Common Substructure and Linear Optimization methods, both of which require computational know-how and powerful resources to run the underlying tools. In this work, we assessed a number of previously implemented atom mapping frameworks and built a framework capable of managing the different data inputs and outputs, as well as the mapping process provided by each of these third-party tools. We evaluated the admissibility of the atom maps calculated by the different algorithms, also assessing whether different approaches were capable of returning equivalent atom maps for the same chemical reaction. (ERDF, European Regional Development Fund, UID/BIO/04469/2013)
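
    As a rough illustration of the Maximum Common Substructure idea mentioned above, the sketch below uses RDKit (an assumed dependency, not necessarily one of the tools evaluated in this work) to pair the atoms that a reactant and a product have in common; the esterification reaction and the resulting partial atom map are purely illustrative.

        # Minimal MCS-based atom mapping sketch with RDKit (assumed available).
        from rdkit import Chem
        from rdkit.Chem import rdFMCS

        reactant = Chem.MolFromSmiles("CC(=O)O")    # acetic acid (illustrative)
        product = Chem.MolFromSmiles("CC(=O)OC")    # methyl acetate (illustrative)

        # Maximum common substructure shared by reactant and product.
        mcs = rdFMCS.FindMCS([reactant, product])
        query = Chem.MolFromSmarts(mcs.smartsString)

        # Atoms matched by the MCS are assumed to keep their identity across
        # the reaction; pairing the two matches gives a (partial) atom map.
        r_match = reactant.GetSubstructMatch(query)
        p_match = product.GetSubstructMatch(query)
        atom_map = dict(zip(r_match, p_match))
        print(atom_map)    # reactant atom index -> product atom index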

    A critical evaluation of automatic atom mapping algorithms and tools

    Master's dissertation in Bioinformatics. The identification of the atoms that change their position in chemical reactions is important knowledge within the field of Metabolic Engineering (ME). It can lead to advances at several levels, from the reconstruction of metabolic networks to the classification of chemical reactions, through the identification of the atomic changes inside a reaction. The atom mapping approach was initially developed in the 1960s and has recently undergone important advances, being used in diverse biological and biotechnological studies. The main methodologies used for the atom mapping process are the Maximum Common Substructure (MCS) and Linear Optimization methods, both of which require computational know-how and powerful resources to run the underlying tools. In this work, we assessed a number of previously implemented atom mapping frameworks and built a framework capable of managing the different data inputs and outputs, as well as the mapping process provided by each of these third-party tools. We also evaluated the admissibility of the atom maps calculated by the different algorithms, assessing whether different approaches were capable of returning equivalent atom maps for the same chemical reaction.

    Retrosynthetic reaction prediction using neural sequence-to-sequence models

    We describe a fully data-driven model that learns to perform a retrosynthetic reaction prediction task, treated as a sequence-to-sequence mapping problem. The end-to-end trained model has an encoder-decoder architecture consisting of two recurrent neural networks, an architecture that has previously shown great success in solving other sequence-to-sequence prediction tasks such as machine translation. The model is trained on 50,000 experimental reaction examples from the United States patent literature, spanning 10 broad reaction types that are commonly used by medicinal chemists. We find that our model performs comparably with a rule-based expert system baseline model, and also overcomes certain limitations associated with rule-based expert systems and with any machine learning approach that contains a rule-based expert system component. Our model provides an important first step towards solving the challenging problem of computational retrosynthetic analysis.
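
    The following is a minimal sketch of the encoder-decoder idea described above, assuming PyTorch is available: one GRU encodes the product token sequence and a second GRU decodes the reactant tokens. The vocabulary size, layer dimensions, and random token batches are placeholders, not the full setup used in the paper.

        import torch
        import torch.nn as nn

        class Seq2SeqRetro(nn.Module):
            def __init__(self, vocab_size=64, emb_dim=128, hidden_dim=256):
                super().__init__()
                self.embed = nn.Embedding(vocab_size, emb_dim)
                self.encoder = nn.GRU(emb_dim, hidden_dim, batch_first=True)
                self.decoder = nn.GRU(emb_dim, hidden_dim, batch_first=True)
                self.out = nn.Linear(hidden_dim, vocab_size)

            def forward(self, product_tokens, reactant_tokens):
                # Encode the product sequence into a final hidden state.
                _, h = self.encoder(self.embed(product_tokens))
                # Decode the reactant sequence conditioned on that state
                # (teacher forcing during training).
                dec_out, _ = self.decoder(self.embed(reactant_tokens), h)
                return self.out(dec_out)    # logits over the token vocabulary

        model = Seq2SeqRetro()
        product = torch.randint(0, 64, (2, 40))     # batch of 2 product token sequences
        reactant = torch.randint(0, 64, (2, 35))    # corresponding reactant targets
        logits = model(product, reactant)           # shape: (2, 35, 64)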

    BlogForever D2.6: Data Extraction Methodology

    This report outlines an inquiry into the area of web data extraction, conducted within the context of blog preservation. It reviews theoretical advances and practical developments for implementing data extraction, and extends the inquiry through an experiment that demonstrates the effectiveness and feasibility of implementing some of the suggested approaches. More specifically, the report discusses an approach based on unsupervised machine learning that employs the RSS feeds and HTML representations of blogs. It outlines the possibilities of extracting the semantics available in blogs and demonstrates the benefits of exploiting available standards such as microformats and microdata. The report then proposes a methodology for extracting and processing blog data to further inform the design and development of the BlogForever platform.
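
    As a small illustration of the microdata extraction mentioned above, the sketch below (assuming BeautifulSoup is available) collects schema.org itemprop values from a blog post's HTML; the sample HTML is invented for the example, and the BlogForever pipeline itself is considerably more involved.

        from bs4 import BeautifulSoup

        html = """
        <article itemscope itemtype="http://schema.org/BlogPosting">
          <h1 itemprop="headline">Example post</h1>
          <span itemprop="author">Jane Doe</span>
          <time itemprop="datePublished" datetime="2012-05-01">1 May 2012</time>
        </article>
        """

        soup = BeautifulSoup(html, "html.parser")
        properties = {}
        for tag in soup.find_all(attrs={"itemprop": True}):
            # Prefer machine-readable attribute values (datetime, content)
            # over the element's visible text.
            value = tag.get("datetime") or tag.get("content") or tag.get_text(strip=True)
            properties[tag["itemprop"]] = value

        print(properties)    # {'headline': 'Example post', 'author': 'Jane Doe', ...}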

    An MDL framework for sparse coding and dictionary learning

    The power of sparse signal modeling with learned over-complete dictionaries has been demonstrated in a variety of applications and fields, from signal processing to statistical inference and machine learning. However, the statistical properties of these models, such as under-fitting or over-fitting of given sets of data, are still not well characterized in the literature. As a result, the success of sparse modeling depends on hand-tuning critical parameters for each data set and application. This work addresses this issue by providing a practical and objective characterization of sparse models by means of the Minimum Description Length (MDL) principle -- a well-established information-theoretic approach to model selection in statistical inference. The resulting framework yields a family of efficient sparse coding and dictionary learning algorithms which, by virtue of the MDL principle, are completely parameter free. Furthermore, the framework makes it possible to incorporate additional prior information into existing models, such as Markovian dependencies, or to define completely new problem formulations, including in the area of matrix analysis, in a natural way. These virtues are demonstrated with parameter-free algorithms for the classic image denoising and classification problems, and for low-rank matrix recovery in video applications.
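
    To make the MDL idea concrete, the sketch below (NumPy assumed) encodes a signal greedily over a fixed dictionary and stops at the sparsity level that minimizes a crude two-part codelength: bits for a Gaussian-modeled residual plus bits for the coefficient indices and values. It is a simplified illustration under these assumptions, not the codelength formulation actually derived in the paper.

        import numpy as np

        def mdl_sparse_code(x, D, max_k=20):
            """Greedy sparse coding of x over dictionary D (columns are atoms),
            keeping the sparsity level with the smallest two-part codelength."""
            n, m = D.shape
            residual = x.copy()
            support = []
            best_codelength, best_coeffs = np.inf, np.zeros(m)
            for _ in range(max_k):
                # Select the atom most correlated with the current residual.
                idx = int(np.argmax(np.abs(D.T @ residual)))
                if idx not in support:
                    support.append(idx)
                # Re-fit coefficients on the current support by least squares.
                coeffs, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
                residual = x - D[:, support] @ coeffs
                # Two-part codelength: residual bits under a Gaussian model plus
                # index bits and a fixed 16-bit precision budget per coefficient
                # (an arbitrary choice for this sketch).
                var = max(np.mean(residual ** 2), 1e-12)
                codelength = 0.5 * n * np.log2(var) + len(support) * (np.log2(m) + 16)
                if codelength < best_codelength:
                    a = np.zeros(m)
                    a[support] = coeffs
                    best_codelength, best_coeffs = codelength, a
            return best_coeffs

        # Example with a random normalized dictionary (illustrative only).
        rng = np.random.default_rng(0)
        D = rng.standard_normal((64, 256))
        D /= np.linalg.norm(D, axis=0)
        x = D[:, [3, 40, 100]] @ np.array([1.0, -0.5, 0.8])
        a = mdl_sparse_code(x, D)
        print(np.flatnonzero(a))    # atoms retained by the MDL criterion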

    Towards automatic Markov reliability modeling of computer architectures

    The analysis and evaluation of reliability measures using time-varying Markov models is required for Processor-Memory-Switch (PMS) structures that have competing processes such as standby redundancy and repair, or renewal processes such as transient or intermittent faults. The task of generating these models is tedious and prone to human error due to the large number of states and transitions involved in any reasonable system. Therefore, model formulation is a major analysis bottleneck, and model verification is a major validation problem. The general unfamiliarity of computer architects with Markov modeling techniques further increases the need to automate model formulation. This paper presents an overview of the Automated Reliability Modeling (ARM) program, under development at NASA Langley Research Center. ARM will accept as input a description of the PMS interconnection graph, the behavior of the PMS components, the fault-tolerant strategies, and the operational requirements. The output of ARM will be the reliability or availability Markov model, formulated for direct use by evaluation programs. The advantages of such an approach are (a) utility to a large class of users, not necessarily expert in reliability analysis, and (b) a lower probability of human error in the computation.
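
    The models ARM formulates are meant to be handed to numerical evaluation programs. As a toy-scale illustration of that evaluation step (not the ARM program itself), the sketch below assumes NumPy and SciPy, builds the generator matrix of a two-unit standby system with repair by hand, and computes mission reliability from the transient state probabilities; the rates and mission time are invented for the example.

        import numpy as np
        from scipy.linalg import expm

        # States: 0 = both units up, 1 = one unit failed (under repair),
        # 2 = system failed (absorbing).
        lam, mu = 1e-3, 1e-1    # failure and repair rates per hour (illustrative)

        # Generator matrix Q: Q[i, j] is the rate from state i to state j,
        # and each row sums to zero.
        Q = np.array([
            [-lam,         lam,  0.0],
            [  mu, -(mu + lam),  lam],
            [ 0.0,         0.0,  0.0],
        ])

        p0 = np.array([1.0, 0.0, 0.0])    # start with both units up
        t = 1000.0                        # mission time in hours
        pt = p0 @ expm(Q * t)             # transient state probabilities at time t
        reliability = pt[:2].sum()        # probability the system has not failed
        print(f"R({t:.0f} h) = {reliability:.6f}")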

    Automated computation of materials properties

    Materials informatics offers a promising pathway towards rational materials design, replacing the current trial-and-error approach and accelerating the development of new functional materials. Through the use of sophisticated data analysis techniques, underlying property trends can be identified, facilitating the formulation of new design rules. Such methods require large sets of consistently generated, programmatically accessible materials data. Computational materials design frameworks using standardized parameter sets are the ideal tools for producing such data. This work reviews the state of the art in computational materials design, with a focus on these automated ab-initio frameworks. Features such as structural prototyping and automated error correction that enable rapid generation of large datasets are discussed, and the way in which integrated workflows can simplify the calculation of complex properties, such as thermal conductivity and mechanical stability, is demonstrated. The organization of large datasets composed of ab-initio calculations, and the tools that render them programmatically accessible for use in statistical learning applications, are also described. Finally, recent advances in leveraging existing data to predict novel functional materials, such as entropy-stabilized ceramics, bulk metallic glasses, thermoelectrics, superalloys, and magnets, are surveyed.
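
    As a hedged illustration of the statistical-learning use of such datasets, the sketch below fits an off-the-shelf regressor with scikit-learn to synthetic data standing in for a table of ab-initio computed descriptors and a target property; it is a stand-in for the far richer workflows the review describes, and every quantity in it is invented.

        from sklearn.datasets import make_regression
        from sklearn.ensemble import RandomForestRegressor
        from sklearn.model_selection import train_test_split

        # Synthetic stand-in for an ab-initio dataset: rows are materials,
        # columns are numeric descriptors, y is a computed target property.
        X, y = make_regression(n_samples=500, n_features=20, noise=0.1, random_state=0)

        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.2, random_state=0)

        model = RandomForestRegressor(n_estimators=200, random_state=0)
        model.fit(X_train, y_train)
        print("held-out R^2:", model.score(X_test, y_test))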
