
    Kinetic model construction using chemoinformatics

    Kinetic models of chemical processes not only provide an alternative to costly experiments; they can also accelerate innovation, both in developing new chemical processes and in improving existing ones. Kinetic models are most powerful when they reflect the underlying chemistry by incorporating elementary pathways between individual molecules. The downside of this level of detail is that the size and complexity of the models grow steadily, until the models become too large to construct manually. Instead, computers are programmed to automate the construction of these models, using graph theory to translate chemical entities such as molecules and reactions into computer-readable representations. This work studies automated methods for constructing kinetic models. In particular, it investigates the need to account for the three-dimensional arrangement of atoms in the molecules and reactions of kinetic models, illustrated by two case studies. First, the thermal rearrangement of two monoterpenoids, cis- and trans-2-pinanol, is studied, and a kinetic model that accounts for the differences in reactivity and selectivity between the two pinanol diastereomers is proposed. Second, a kinetic model for the pyrolysis of the fuel “JP-10” is constructed, highlighting state-of-the-art techniques for the automated estimation of the thermochemistry of polycyclic molecules. A new code is developed for the automated construction of kinetic models; it takes advantage of advances in chemoinformatics to tackle fundamental issues of previous approaches. Novel algorithms are developed for three important aspects of automated kinetic model construction: the estimation of the symmetry of molecules and reactions, the incorporation of stereochemistry in kinetic models, and the estimation of thermochemical and kinetic data using scalable structure-property methods. Finally, the application of the code is illustrated by the automated construction of a kinetic model for alkylsulfide pyrolysis.

    Enhancing Reaction-based de novo Design using Machine Learning

    De novo design is a branch of chemoinformatics concerned with the rational design of molecular structures with desired properties; when applied to drug design, it specifically aims at achieving suitable pharmacological and safety profiles. Scoring, construction, and search methods are the main components exploited by de novo design programs to explore chemical space and support the cost-effective design of new chemical entities. In particular, construction methods provide strategies for compound generation that address issues such as drug-likeness and synthetic accessibility. Reaction-based de novo design combines building blocks according to transformation rules extracted from collections of known reactions, with the aim of restricting the enumerated chemical space to a manageable number of synthetically accessible structures. The reaction vector is an example of a representation that encodes the topological changes occurring in reactions; it has been integrated within a structure generation algorithm to increase the chances of generating synthesisable molecules. The general aim of this study was to enhance reaction-based de novo design by developing machine learning approaches that exploit publicly available reaction data. A series of algorithms for reaction standardisation, fingerprinting, and reaction vector database validation was introduced and applied to generate the new data on which the entirety of this work relies. First, these collections were applied to the validation of a new ligand-based design tool. The tool was then used in a case study to design compounds that were eventually synthesised using procedures very similar to those suggested by the structure generator. A reaction classification model and a novel hierarchical labelling system were then developed to make it possible to apply transformations by class.
The model was augmented with an algorithm for confidence estimation and was used to classify two datasets from industry and the literature. The classification results suggest that the model can be used effectively to gain insight into the nature of reaction collections. Classified reactions were further processed to build a reaction class recommendation model capable of suggesting appropriate reaction classes to apply to molecules according to their fingerprints. The model was validated, then integrated within the reaction vector-based design framework, whose performance was assessed against the baseline algorithm. Results from the de novo design experiments indicate that using the recommendation model leads to higher synthetic accessibility and more efficient management of computational resources.
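One simple way to encode the topological changes of a reaction, in the spirit of the reaction vector described above, is to subtract a structural descriptor of the reactants from that of the products. The sketch below is an illustration only. It assumes that each molecule has already been reduced to counts of bond-type keys. The keys, the example amide coupling, and the `reaction_vector` helper are hypothetical and do not represent the descriptor used in this work.

```python
from collections import Counter

def reaction_vector(reactants, products):
    """Products-minus-reactants difference of bond-type counts.

    Hypothetical helper: each molecule is a Counter of bond-type keys,
    a simplified stand-in for a real structural descriptor.
    """
    total = Counter()
    for mol in products:
        total.update(mol)    # add product counts
    for mol in reactants:
        total.subtract(mol)  # subtract reactant counts (may go negative)
    # Bonds present on both sides cancel; keep only the changes.
    return {key: n for key, n in total.items() if n != 0}

# Hypothetical bond counts for acetic acid + methylamine -> N-methylacetamide + water
acid = Counter({"C-C": 1, "C=O": 1, "C-O": 1, "O-H": 1})
amine = Counter({"C-N": 1, "N-H": 2})
amide = Counter({"C-C": 1, "C=O": 1, "C-N": 2, "N-H": 1})
water = Counter({"O-H": 2})

print(reaction_vector([acid, amine], [amide, water]))
```

Unchanged bonds cancel out. Only the broken bond types (negative entries) and the formed bond types (positive entries) remain, and these are the changes such a representation is meant to capture.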

    Automatic learning for the classification of chemical reactions and in statistical thermodynamics

    This thesis describes the application of automatic learning methods to a) the classification of organic and metabolic reactions and b) the mapping of potential energy surfaces (PES). The classification of reactions was approached with two distinct methodologies: a representation of chemical reactions based on NMR data, and a representation derived from the reaction equation based on the physico-chemical and topological features of chemical bonds.
NMR-based classification of photochemical and enzymatic reactions. Photochemical and metabolic reactions were classified by Kohonen Self-Organizing Maps (Kohonen SOMs) and Random Forests (RFs), taking as input the difference between the ¹H NMR spectra of the products and those of the reactants. Such a representation can be applied to the automatic analysis of changes in the ¹H NMR spectrum of a mixture and to their interpretation in terms of the chemical reactions taking place. Possible applications include monitoring reaction processes, evaluating the stability of chemicals, and interpreting metabonomic data. A Kohonen SOM trained with a data set of metabolic reactions catalysed by transferases correctly classified 75% of an independent test set at the level of the EC number subclass. Random Forests raised the rate of correct predictions to 79%. With photochemical reactions classified into 7 groups, an independent test set was classified with 86–93% accuracy. The data set of photochemical reactions was also used to simulate mixtures in which two reactions occur simultaneously. Kohonen SOMs and Feed-Forward Neural Networks (FFNNs) were trained to classify the reactions occurring in a mixture based on the ¹H NMR spectra of the products and reactants. Kohonen SOMs correctly assigned 53–63% of the mixtures in a test set, and Counter-Propagation Neural Networks (CPNNs) gave similar results.
Supervised learning techniques improved the results: correct assignments rose to 77% with an ensemble of ten FFNNs and to 80% with Random Forests. This study was performed with NMR data simulated from the molecular structure by the SPINUS program. In the design of one test set, simulated data were combined with experimental data. The results support the proposal of linking databases of chemical reactions to experimental or simulated NMR data for the automatic classification of reactions and mixtures of reactions.
Genome-scale classification of enzymatic reactions from their reaction equation. The MOLMAP descriptor relies on a Kohonen SOM that defines types of bonds on the basis of their physico-chemical and topological properties. The MOLMAP descriptor of a molecule represents the types of bonds available in that molecule. The MOLMAP descriptor of a reaction is defined as the difference between the MOLMAPs of the products and those of the reactants, and it numerically encodes the pattern of bonds that are broken, changed, and made during a chemical reaction. The automatic perception of chemical similarities between metabolic reactions is required for a variety of applications, ranging from the computer validation of classification systems and the genome-scale reconstruction (or comparison) of metabolic pathways to the classification of enzymatic mechanisms. Catalytic functions of proteins are generally described by EC numbers, which are simultaneously employed as identifiers of reactions, enzymes, and enzyme genes, thus linking metabolic and genomic information. Methods are therefore needed to compare metabolic reactions automatically and to assign EC numbers automatically to reactions that have not yet been officially classified.
In this study, the genome-scale data set of enzymatic reactions available in the KEGG database was encoded by MOLMAP descriptors and submitted to Kohonen SOMs. The aims were to compare the resulting map with the official EC number classification, to explore the possibility of predicting EC numbers from the reaction equation, and to assess the internal consistency of the EC classification at the class level. General agreement with the EC classification was observed, i.e. there was a relationship between the similarity of MOLMAPs and the similarity of EC numbers. At the same time, MOLMAPs were able to discriminate between EC sub-subclasses. For independent test sets, EC numbers could be assigned at the class, subclass, and sub-subclass levels with accuracies of up to 92%, 80%, and 70%, respectively. The correspondence between the chemical similarity of metabolic reactions and their MOLMAP descriptors was used to identify a number of reactions that were mapped to the same neuron but belong to different EC classes. This demonstrated the ability of the MOLMAP/SOM approach to verify the internal consistency of classifications in databases of metabolic reactions. RFs were also used to assign the four levels of the EC hierarchy from the reaction equation. For independent test sets, EC numbers were correctly assigned in 95%, 90%, 85%, and 86% of cases at the class, subclass, sub-subclass, and full EC number levels, respectively. In experiments classifying reactions from the main reactants and products, RFs assigned EC numbers at the class, subclass, and sub-subclass levels with accuracies of 78%, 74%, and 63%, respectively. In the course of the experiments with metabolic reactions, we suggested that the MOLMAP/SOM concept could be extended to the representation of other levels of metabolic information, such as metabolic pathways.
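The MOLMAP construction described above can be sketched in three steps: map each bond onto a Kohonen SOM, count which neurons are activated, and then take products minus reactants. This is a minimal illustration that rests on explicit assumptions:

- the 2×2 map, its weights, and the two bond features are hypothetical stand-ins for a trained SOM and real physico-chemical bond descriptors;
- only the winning neuron is counted for each bond.

```python
import numpy as np

# Hypothetical "already trained" 2x2 Kohonen map: one weight vector per neuron,
# defined over two made-up bond features.
SOM_WEIGHTS = np.array([
    [[0.0, 1.0], [0.5, 1.0]],
    [[0.0, 2.0], [0.5, 2.0]],
])

def winning_neuron(bond):
    """(row, col) of the neuron whose weight vector is closest to the bond features."""
    dist = np.linalg.norm(SOM_WEIGHTS - np.asarray(bond, dtype=float), axis=-1)
    return np.unravel_index(np.argmin(dist), dist.shape)

def molmap(bonds):
    """Simplified MOLMAP of a molecule: how many of its bonds activate each neuron."""
    counts = np.zeros(SOM_WEIGHTS.shape[:2])
    for bond in bonds:
        counts[winning_neuron(bond)] += 1
    return counts

def reaction_molmap(reactants, products):
    """Reaction descriptor: summed product MOLMAPs minus summed reactant MOLMAPs."""
    return sum(molmap(m) for m in products) - sum(molmap(m) for m in reactants)

# Hypothetical bond feature vectors; the shared bond (0.0, 1.0) cancels,
# so one neuron gains an activation and another loses one.
reactant = [(0.0, 1.0), (0.5, 2.0)]
product = [(0.0, 1.0), (0.5, 1.0)]
print(reaction_molmap([reactant], [product]))
```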
Following the MOLMAP idea, the pattern of neurons activated by the reactions of a metabolic pathway represents the reactions involved in that pathway, i.e. it is a descriptor of the metabolic pathway. This reasoning enabled the comparison of different pathways, the automatic classification of pathways, and a classification of organisms based on their biochemical machinery. The three levels of classification (from bonds to metabolic pathways) made it possible to map and perceive chemical similarities between metabolic pathways, even for pathways of different types of metabolism and for pathways that share no similarity in terms of EC numbers.
Mapping of PES by neural networks (NNs). In a first series of experiments, ensembles of Feed-Forward NNs (EnsFFNNs) and Associative Neural Networks (ASNNs) were trained to reproduce PES represented by the Lennard-Jones (LJ) analytical potential function. The accuracy of the method was assessed by comparing the results of molecular dynamics simulations (thermal, structural, and dynamic properties) obtained from the NN-based PES with those obtained from the LJ function. The results indicated that, for LJ-type potentials, NNs can be trained to generate accurate PES for use in molecular simulations. EnsFFNNs and ASNNs gave better results than single FFNNs. The NN models showed a remarkable ability to interpolate between distant curves and to accurately reproduce potentials for use in molecular simulations. The purpose of this first study was to systematically analyse the accuracy of different NNs. Our main motivation, however, is reflected in the next study: mapping multidimensional PES with NNs in order to simulate, by molecular dynamics or Monte Carlo, the adsorption and self-assembly of solvated organic molecules on noble-metal electrodes. For such complex and heterogeneous systems, developing suitable analytical functions that fit quantum mechanical interaction energies is a non-trivial or even impossible task.
The data consisted of energy values from Density Functional Theory (DFT) calculations at different distances, for several molecular orientations and three electrode adsorption sites. The results indicate that NNs require a data set large enough to cover the diversity of possible interaction sites, distances, and orientations well. NNs trained with such data sets can perform as well as, or even better than, analytical functions. They can therefore be used in molecular simulations, particularly for the ethanol/Au(111) interface, which is the case studied in this thesis. Once properly trained, the networks can produce any required number of energy points for accurate interpolation.
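In the first series of experiments, the networks were trained to reproduce an analytical reference function, the 12-6 Lennard-Jones potential V(r) = 4ε[(σ/r)^12 − (σ/r)^6]. The sketch below evaluates this function in reduced units and tabulates (r, V) pairs of the kind a network could be fitted to. The grid range and size are illustrative assumptions, and the network itself is not shown.

```python
import numpy as np

def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """12-6 Lennard-Jones pair energy V(r) = 4*eps*[(sigma/r)**12 - (sigma/r)**6]."""
    sr6 = (sigma / np.asarray(r, dtype=float)) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

# Reduced units (epsilon = sigma = 1) and this grid are illustrative assumptions.
r_grid = np.linspace(0.9, 3.0, 200)
# Columns: separation r, reference energy V(r); a possible NN training set.
training_pairs = np.column_stack([r_grid, lennard_jones(r_grid)])
```

The function crosses zero at r = σ and reaches its minimum of −ε at r = 2^(1/6)σ. These two points provide quick checks for any surrogate fitted to it.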

    Development of a universal alignment medium for the extraction of RDCs and structure elucidation with tensorial constraints

    In high-resolution nuclear magnetic resonance spectroscopy, residual anisotropic parameters such as residual dipolar couplings (RDCs), residual chemical shift anisotropies (RCSAs), and residual quadrupolar couplings (RQCs) provide valuable information for structure refinement and elucidation. This information complements the NMR parameters measured under isotropic (standard) conditions. Obtaining these parameters requires a homogeneous and sufficiently weak "alignment" in so-called alignment media, which induces a small anisotropy in the sample. Although a variety of alignment media and methods exist, they are rarely used. The common alignment media currently available were developed either specifically for small organic molecules or for large (bio)macromolecules, and their use is restricted to certain solvents. Besides liquid-crystalline phases, mechanically stretched or compressed polymer gels are used to align solutes adequately. Sample preparation up to equilibration can often take weeks or even months, which hampers the use of these methods in a commercial setting. Earlier studies of cross-linked poly(ethylene oxide) (PEO) hydrogels showed considerable swelling in a wide range of solvents. Unfortunately, the cross-linking required β- or γ-radiation or weeks of irradiation with ultraviolet light. This work optimises and presents a fast synthetic route to homogeneous PEO-based gel sticks for use as alignment media, and the route can be reproduced with the means available in any modern laboratory. In addition, the swelling behaviour in a variety of pure solvents and mixtures is shown, and the influence of the mass content during cross-linking is investigated.
Using a starting material of low dispersity gives control over the distribution of chain lengths between cross-linking points, which yields homogeneous gels with narrow line widths. Swollen poly(ethylene oxide) diacrylate (PEODA) gels cross-linked at different mass fractions are additionally studied by double-quantum NMR (DQ-NMR). These measurements determine the ratio of the cross-linked fraction to the sol fraction, including network defects. The applicability of PEODA in scalable stretching and compression devices is presented and successfully demonstrated with pure solvents and mixtures. A method is introduced that allows fast external equilibration and transfer into the sample tubes. It is shown that, with fine-tuning of various parameters, cross-linked PEODA is suitable as a universal alignment medium for solutes ranging from small natural products to proteins. The manual extraction of couplings used so far is time-consuming and subjective. A procedure for the semi-automatic extraction of couplings is therefore presented, which is mathematically based on cross- and autocorrelations. For the evaluation of RDCs of small organic molecules, molecular dynamics with orientational constraints (MDOC) was evaluated and applied. MDOC is a molecular mechanics method that requires no assumptions about the starting conformation. The method determines the conformational space from experimental anisotropic data, which enter the force field as tensorial orientational constraints.
When solute molecules are flexible and occur in different conformations, the averaged nature of the extracted data makes their interpretation with previous methods difficult. Assumptions must then be made about the conformers present, and these may not be justified. In such cases the MDOC approach is preferable, because the interpretation of the RDCs is not influenced by the choice of model.
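The principle behind correlation-based coupling extraction can be illustrated with a doublet. Correlating a spectrum with itself yields the zero-lag maximum and, in addition, a side maximum at a lag equal to the line separation. The sketch below is not the procedure developed in this work. It assumes an ideal, noise-free doublet of Lorentzian lines on a uniform frequency grid, and `estimate_splitting` and its `min_split` cut-off are hypothetical names.

```python
import numpy as np

def lorentzian(freq, center, hwhm):
    """Unit-height Lorentzian line with half width at half maximum `hwhm`."""
    return 1.0 / (1.0 + ((freq - center) / hwhm) ** 2)

def estimate_splitting(freq, spectrum, min_split):
    """Estimate a line splitting (Hz) from the first autocorrelation side maximum.

    Assumes a uniform frequency grid; lags below `min_split` Hz are skipped so
    that the central (zero-lag) lobe is not mistaken for the side peak.
    """
    step = freq[1] - freq[0]
    ac = np.correlate(spectrum, spectrum, mode="full")
    positive = ac[len(spectrum) - 1:]            # lags 0, 1, 2, ... in grid points
    start = int(np.ceil(min_split / step))
    lag = start + int(np.argmax(positive[start:]))
    return lag * step

# Illustrative doublet: lines 7 Hz apart, 0.5 Hz half width, 0.1 Hz grid.
freq = np.linspace(-50.0, 50.0, 1001)
doublet = lorentzian(freq, -3.5, 0.5) + lorentzian(freq, 3.5, 0.5)
print(estimate_splitting(freq, doublet, min_split=2.0))
```

On real spectra, noise, overlapping multiplets, and baseline distortions would make the choice of search window and any preprocessing far more important than in this idealised case.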

    Using MapReduce Streaming for Distributed Life Simulation on the Cloud

    Distributed software simulations are indispensable in the study of large-scale life models, but they often require technically complex lower-level distributed computing frameworks such as MPI. We propose to overcome this complexity by applying the emerging MapReduce (MR) model to distributed life simulations and by running such simulations on the cloud. Technically, we design optimized MR streaming algorithms for discrete and continuous versions of Conway’s Game of Life according to a general MR streaming pattern. We chose Life because it is simple enough to serve as a testbed for MR’s applicability to a-life simulations, yet general enough for our results to apply to various lattice-based a-life models. We implement our algorithms and empirically evaluate their performance on Amazon’s Elastic MR cloud. Our experiments demonstrate that a single MR optimization technique, strip partitioning, can reduce the execution time of continuous Life simulations by 64%. To the best of our knowledge, we are the first to propose and evaluate MR streaming algorithms for lattice-based simulations. Our algorithms can serve as prototypes in the development of novel MR simulation algorithms for large-scale lattice-based a-life models.
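The MR pattern can be made concrete for one generation of discrete Life in three steps:

- a map step sends each live cell's neighbours a count of 1;
- a shuffle groups the emitted values by cell;
- a reduce step applies the birth and survival rules.

This is a minimal in-memory sketch that assumes a sparse set of live cells. It is not the authors' streaming implementation. The shuffle that Hadoop streaming would perform is simulated here with a dictionary, and the strip-partitioning optimization is not shown.

```python
from collections import defaultdict

def mapper(cell):
    """Emit (cell, 'alive') for the live cell and (neighbour, 1) for its 8 neighbours."""
    x, y = cell
    yield (x, y), "alive"
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            if dx or dy:
                yield (x + dx, y + dy), 1

def reducer(cell, values):
    """Apply Conway's B3/S23 rules; return the cell if it is alive next generation."""
    alive = "alive" in values
    neighbours = sum(v for v in values if v != "alive")
    if neighbours == 3 or (alive and neighbours == 2):
        return cell
    return None

def step(live_cells):
    """One generation: map, shuffle (group values by key), reduce."""
    groups = defaultdict(list)
    for cell in live_cells:
        for key, value in mapper(cell):
            groups[key].append(value)
    results = (reducer(key, values) for key, values in groups.items())
    return {cell for cell in results if cell is not None}

blinker = {(0, -1), (0, 0), (0, 1)}
print(sorted(step(blinker)))
```

A vertical blinker becomes horizontal after one step and returns to its original form after two, which makes it a convenient sanity check for any distributed variant.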

    Uncovering the Electronic Structure of Systems Vital to Nuclear Energy, Life, and Space Exploration via Anion Photoelectron Spectroscopy

    Photoelectron spectroscopy provides a window into the complicated world of electronic structure and interactions at the molecular level. More specifically, anion photoelectron spectroscopy (aPES) yields information about an anionic atom, molecule, or cluster as well as its neutral counterpart. It also offers the additional advantages of improved resolution and species selection through direct manipulation of the ion beam. In this work, mass spectrometry, aPES, and theory were used together to study the electronic structure of a range of anions, from atomic anions to large heterogeneous metal cluster anions. Studies of biological molecules are presented in Chapter 2, including the correlation-bound p-chloroaniline anion and several antioxidants. Chapter 3 (together with Appendix A) focuses on uranium and thorium species, including the experimentally measured electron affinity of the uranium atom, a fundamental property of the element. Chapter 4 focuses on systems related to propulsion applications. These include systematic studies of Al₃Hₙ⁻ (n = 1–9) clusters and of Irₙ⁻ with hydroxylamine, which offer a wealth of information about reactions leading to combustion and ignition in jet engines.
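The quantity extracted from an anion photoelectron spectrum is the electron binding energy. It follows from energy conservation: eBE = hν − eKE, where hν is the photon energy and eKE is the measured kinetic energy of the detached electron. The sketch below applies this relation. The 355 nm wavelength and the 1.0 eV kinetic energy are illustrative values, not measurements from this work.

```python
# h*c expressed in eV*nm (CODATA: 1239.841984... eV nm)
HC_EV_NM = 1239.841984

def photon_energy_ev(wavelength_nm):
    """Photon energy h*nu in eV for a wavelength given in nm."""
    return HC_EV_NM / wavelength_nm

def electron_binding_energy(wavelength_nm, electron_kinetic_ev):
    """Electron binding energy eBE = h*nu - eKE, in eV."""
    return photon_energy_ev(wavelength_nm) - electron_kinetic_ev

# Illustrative values only: 355 nm photons, electrons detected at 1.0 eV.
print(round(electron_binding_energy(355.0, 1.0), 4))
```

The binding energy of the transition from the ground state of the anion to the ground state of the neutral corresponds to the adiabatic electron affinity of the neutral species.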

    The Material Theory of Induction

    The fundamental burden of a theory of inductive inference is to determine which inductive inferences, or relations of inductive support, are good, and why they are so. The traditional approach is modeled on accounts of deductive inference. It seeks universally applicable schemas or rules, or a single formal device such as the probability calculus. After millennia of halting efforts, none of these approaches has been unequivocally successful, and debates between them persist. The Material Theory of Induction locates the source of these enduring problems in an assumption made at the outset: that inductive inference can be accommodated by a single formal account with universal applicability. Instead, it argues that there is no single, universally applicable formal account. Rather, each domain has an inductive logic native to it. The content of that logic and where it can be applied are determined by the facts prevailing in that domain. Paying close attention to how inductive inference is conducted in science, and copiously illustrated with real-world examples, The Material Theory of Induction will initiate a new tradition in the analysis of inductive inference.

    Report / Institut für Physik

    The 2015 Report of the Physics Institutes of the Universität Leipzig presents an overview of our research activities in the past year. It also testifies to our scientific interaction with colleagues and partners worldwide.