A critical evaluation of automatic atom mapping algorithms and tools
The identification of the atoms that change their position in chemical reactions is important knowledge within the field of Metabolic Engineering. It can lead to advances at several levels, from the reconstruction of metabolic networks to the classification of chemical reactions, through the identification of the atomic changes inside a reaction. The atom mapping approach was initially developed in the 1960s but has recently undergone important advances, being used in diverse biological and biotechnological studies. The main methodologies used for atom mapping are the Maximum Common Substructure and Linear Optimization methods, both of which require computational know-how and powerful resources to run the underlying tools.
In this work, we assessed a number of previously implemented atom mapping frameworks and built a framework capable of managing the different data inputs and outputs, as well as the mapping process provided by each of these third-party tools. We evaluated the admissibility of the atom maps calculated by the different algorithms, also assessing whether different approaches return equivalent atom maps for the same chemical reaction.
Funding: ERDF - European Regional Development Fund (UID/BIO/04469/2013)
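The admissibility check described above can be made concrete. As a rough sketch (not the framework's actual code), an atom-mapped reaction SMILES is admissible only if every atom-map number occurs exactly once on each side of the reaction and always on an atom of the same element. The helper below, with an invented name and a deliberately simplified pattern (uppercase-initial elements only, plain `>>` reactions), illustrates the idea:

```python
import re

# Simplified pattern for a mapped bracket atom such as [CH4:1]:
# capture the element symbol and the atom-map number.
MAPPED_ATOM = re.compile(r"\[([A-Z][a-z]?)[^\]]*?:(\d+)\]")

def atom_map_admissible(rxn_smiles: str) -> bool:
    """Toy admissibility check: each map number must appear exactly once
    per side, and on the same element on both sides. Assumes a plain
    'reactants>>products' string, no agent field."""
    reactants, products = rxn_smiles.split(">>")

    def side(s):
        pairs = [(int(num), elem) for elem, num in MAPPED_ATOM.findall(s)]
        nums = [n for n, _ in pairs]
        if len(nums) != len(set(nums)):  # a map number used twice on one side
            return None
        return dict(pairs)

    r, p = side(reactants), side(products)
    if r is None or p is None:
        return False
    return r.keys() == p.keys() and all(r[n] == p[n] for n in r)

print(atom_map_admissible("[CH4:1]>>[CH3:1]O"))  # True: map 1 is C on both sides
print(atom_map_admissible("[CH4:1]>>[OH2:1]"))   # False: map 1 changes element
```

Real tools must also handle aromatic lowercase atoms, agents, and unmapped atoms; this sketch only captures the one-to-one, element-preserving core of the check.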
A critical evaluation of automatic atom mapping algorithms and tools
Master's dissertation in Bioinformatics. The identification of the atoms that change their position in chemical reactions is important knowledge within the field of Metabolic Engineering (ME). It can lead to advances at several levels, from the reconstruction of metabolic networks to the classification of chemical reactions, through the identification of the atomic changes inside a reaction. The atom mapping approach was initially developed in the 1960s but has recently undergone important advances, being used in diverse biological and biotechnological studies. The main methodologies used for the atom mapping process are the Maximum Common Substructure (MCS) and Linear Optimization methods, both of which require computational know-how and powerful resources to run the underlying tools.
In this work, we assessed a number of previously implemented atom mapping frameworks and built a framework capable of managing the different data inputs and outputs, as well as the mapping process provided by each of these third-party tools. We also evaluated the admissibility of the calculated atom maps and whether different algorithms, with different approaches, are capable of calculating equivalent atom maps for the same chemical reaction.
Retrosynthetic reaction prediction using neural sequence-to-sequence models
We describe a fully data driven model that learns to perform a retrosynthetic
reaction prediction task, which is treated as a sequence-to-sequence mapping
problem. The end-to-end trained model has an encoder-decoder architecture
consisting of two recurrent neural networks, an architecture that has
previously shown great success in solving other sequence-to-sequence
prediction tasks such as machine
translation. The model is trained on 50,000 experimental reaction examples from
the United States patent literature, which span 10 broad reaction types that
are commonly used by medicinal chemists. We find that our model performs
comparably with a rule-based expert system baseline model, and also overcomes
certain limitations associated with rule-based expert systems and with any
machine learning approach that contains a rule-based expert system component.
Our model provides an important first step towards solving the challenging
problem of computational retrosynthetic analysis.
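A sequence-to-sequence formulation like the one described treats reactions as text, so a first practical step is turning SMILES strings into token sequences for the encoder and decoder. The regex below is a simplified, assumed tokenizer, not the paper's exact preprocessing: it keeps bracket atoms and two-letter halogens whole and makes every other symbol its own token.

```python
import re

# Simplified SMILES tokenizer (an assumption, not the paper's exact rules):
# bracket atoms stay intact, Br/Cl are not split into two tokens, and ring
# closures, bonds, and branches each become single tokens.
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]|Br|Cl|%[0-9]{2}|[BCNOPSFIbcnops]|[0-9]|[=#\-\+\(\)\./\\@]"
)

def tokenize(smiles: str) -> list[str]:
    """Split a SMILES string into model-ready tokens."""
    return SMILES_TOKEN.findall(smiles)

print(tokenize("CC(=O)Oc1ccccc1"))
```

In a seq2seq pipeline, such token lists would then be mapped to integer ids and fed to the recurrent encoder, with the decoder emitting reactant tokens one at a time.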
BlogForever D2.6: Data Extraction Methodology
This report outlines an inquiry into the area of web data extraction, conducted within the context of blog preservation. The report reviews theoretical advances and practical developments for implementing data extraction. The inquiry is extended through an experiment that demonstrates the effectiveness and feasibility of implementing some of the suggested approaches. More specifically, the report discusses an approach based on unsupervised machine learning that employs the RSS feeds and HTML representations of blogs. It outlines the possibilities of extracting semantics available in blogs and demonstrates the benefits of exploiting available standards such as microformats and microdata. The report proceeds to propose a methodology for extracting and processing blog data to further inform the design and development of the BlogForever platform.
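The RSS-based strategy the report discusses can be sketched in a few lines: a feed already segments a blog into items with structured fields, so extraction can start from the feed and fall back to HTML analysis only for the full post content. The snippet below uses an invented example feed and Python's standard XML parser; it is an illustration of the idea, not the BlogForever implementation:

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal RSS feed for illustration only.
RSS = """<rss><channel>
  <item><title>Post A</title><link>http://blog.example/a</link>
        <description>Body of A</description></item>
  <item><title>Post B</title><link>http://blog.example/b</link>
        <description>Body of B</description></item>
</channel></rss>"""

def extract_items(rss_xml: str) -> list[dict]:
    """Pull the structured fields of each feed item; the HTML behind
    each link would be fetched separately for full-content extraction."""
    root = ET.fromstring(rss_xml)
    return [
        {
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            "description": item.findtext("description"),
        }
        for item in root.iter("item")
    ]

print(extract_items(RSS)[0]["title"])  # prints "Post A"
```

The feed thus supplies training signal (titles, dates, summaries) against which an unsupervised learner can align the blog's HTML templates.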
An MDL framework for sparse coding and dictionary learning
The power of sparse signal modeling with learned over-complete dictionaries
has been demonstrated in a variety of applications and fields, from signal
processing to statistical inference and machine learning. However, the
statistical properties of these models, such as under-fitting or over-fitting
given sets of data, are still not well characterized in the literature. As a
result, the success of sparse modeling depends on hand-tuning critical
parameters for each data and application. This work aims at addressing this by
providing a practical and objective characterization of sparse models by means
of the Minimum Description Length (MDL) principle -- a well established
information-theoretic approach to model selection in statistical inference. The
resulting framework derives a family of efficient sparse coding and dictionary
learning algorithms which, by virtue of the MDL principle, are completely
parameter free. Furthermore, the framework makes it possible to incorporate
additional prior information into existing models, such as Markovian
dependencies, or to define completely new problem formulations, including in
the matrix analysis area, in a natural way. These virtues will be demonstrated
with parameter-free algorithms for the classic image denoising and
classification problems, and for low-rank matrix recovery in video
applications.
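The core idea, that MDL turns sparsity-level selection into minimizing a total codelength, can be illustrated with a deliberately tiny toy: an identity "dictionary", an idealized two-part code, and a hypothetical fixed bit budget per stored coefficient. None of this is the paper's actual codelength formula; it only shows how the parameter (here, the number of nonzeros k) falls out of the minimization instead of being hand-tuned.

```python
import math

def mdl_codelength(signal, k, coeff_bits=16):
    """Two-part codelength (in bits) for keeping the k largest-magnitude
    samples of `signal` and coding the rest as Gaussian residual.
    Identity dictionary and idealized codes: a toy, not the paper's model."""
    n = len(signal)
    kept = sorted(signal, key=abs, reverse=True)[:k]
    rss = sum(x * x for x in signal) - sum(x * x for x in kept)
    # residual bits: idealized Gaussian code, floored at coefficient precision
    data_bits = 0.5 * n * math.log2(max(rss / n, 4.0 ** (-coeff_bits)))
    # model bits: each kept coefficient costs an index plus a quantized value
    model_bits = k * (math.log2(n) + coeff_bits)
    return model_bits + data_bits

def best_k(signal):
    # the "parameter-free" part: k is chosen by the codelength itself
    return min(range(len(signal) + 1), key=lambda k: mdl_codelength(signal, k))

# two strong components buried in near-zero noise: MDL keeps exactly those two
print(best_k([10.0, -8.0, 0.01, 0.02, -0.01, 0.0, 0.01, 0.02]))  # prints 2
```

Adding another coefficient beyond the two genuine components buys almost no residual reduction but still costs index-plus-value bits, so the codelength minimum lands at the true sparsity.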
Towards automatic Markov reliability modeling of computer architectures
The analysis and evaluation of reliability measures using time-varying Markov models is required for Processor-Memory-Switch (PMS) structures that have competing processes such as standby redundancy and repair, or renewal processes such as transient or intermittent faults. The task of generating these models is tedious and prone to human error due to the large number of states and transitions involved in any reasonable system. Therefore, model formulation is a major analysis bottleneck, and model verification is a major validation problem. The general unfamiliarity of computer architects with Markov modeling techniques further increases the necessity of automating the model formulation. This paper presents an overview of the Automated Reliability Modeling (ARM) program, under development at NASA Langley Research Center. ARM will accept as input a description of the PMS interconnection graph, the behavior of the PMS components, the fault-tolerant strategies, and the operational requirements. The output of ARM will be the reliability or availability Markov model formulated for direct use by evaluation programs. The advantages of such an approach are (a) utility to a large class of users, not necessarily expert in reliability analysis, and (b) a lower probability of human error in the computation.
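The kind of model ARM is meant to formulate automatically can be shown in miniature: for a pool of identical components with a single repair facility, the Markov chain on the number of failed units is a birth-death process whose steady-state probabilities follow directly from the balance equations. The component-pool structure and rates below are assumptions for illustration, not ARM's actual input language:

```python
def steady_state_availability(n, k, fail_rate, repair_rate):
    """Birth-death Markov model for n identical units with one repairer.
    State i = number of failed units; the system is available while at
    least k units are up. Toy illustration, not the ARM formulation."""
    # detailed balance: pi[i+1] / pi[i] = (n - i) * lambda / mu
    weights = [1.0]
    for i in range(n):
        weights.append(weights[-1] * (n - i) * fail_rate / repair_rate)
    total = sum(weights)
    pi = [w / total for w in weights]
    # available states: 0 .. n-k failed units
    return sum(pi[: n - k + 1])

# single repairable unit: availability = mu / (lambda + mu) = 0.9
print(round(steady_state_availability(1, 1, 0.1, 0.9), 6))  # prints 0.9
```

Even this toy makes the automation argument concrete: the state space and transition rates are generated from a short structural description, so the human never enumerates states by hand.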
Automated computation of materials properties
Materials informatics offers a promising pathway towards rational materials
design, replacing the current trial-and-error approach and accelerating the
development of new functional materials. Through the use of sophisticated data
analysis techniques, underlying property trends can be identified, facilitating
the formulation of new design rules. Such methods require large sets of
consistently generated, programmatically accessible materials data.
Computational materials design frameworks using standardized parameter sets are
the ideal tools for producing such data. This work reviews the state-of-the-art
in computational materials design, with a focus on these automated
frameworks. Features such as structural prototyping and
automated error correction that enable rapid generation of large datasets are
discussed, and the way in which integrated workflows can simplify the
calculation of complex properties, such as thermal conductivity and mechanical
stability, is demonstrated. The organization of large datasets composed of
calculations, and the tools that render them
programmatically accessible for use in statistical learning applications, are
also described. Finally, recent advances in leveraging existing data to predict
novel functional materials, such as entropy stabilized ceramics, bulk metallic
glasses, thermoelectrics, superalloys, and magnets, are surveyed.
Comment: 25 pages, 7 figures, chapter in a book