158,029 research outputs found

Recommended from our members

Extraction of chemical structures and reactions from the literature

Author: Lowe Daniel Mark
Publication venue: University of Cambridge
Publication date: 09/10/2012

The ever-increasing quantity of chemical literature necessitates the creation of automated techniques for extracting relevant information. This work focuses on two aspects: the conversion of chemical names to computer-readable structure representations and the extraction of chemical reactions from text. Chemical names are a common way of communicating chemical structure information. OPSIN (Open Parser for Systematic IUPAC Nomenclature), an open source, freely available algorithm for converting chemical names to structures, was developed. OPSIN employs a regular grammar to direct tokenisation and parsing, leading to the generation of an XML parse tree. Nomenclature operations are applied successively to the tree, with many requiring the manipulation of an in-memory connection table representation of the structure under construction. Areas of nomenclature supported are described, with attention drawn to difficulties that may be encountered in name-to-structure conversion. Results on sets of generated names and names extracted from patents are presented. On generated names, recall of between 96.2% and 99.0% was achieved with a lower bound of 97.9% on precision, with all results being either comparable or superior to the tested commercial solutions. On the patent names, OPSIN's recall was 2-10% higher than the tested solutions when the patent names were processed as found in the patents. The uses of OPSIN as a web service and as a tool for identifying chemical names in text are shown to demonstrate the direct utility of this algorithm. A software system for extracting chemical reactions from the text of chemical patents was developed. The system relies on the output of ChemicalTagger, a tool for tagging words and identifying phrases of importance in experimental chemistry text. Improvements to this tool required to facilitate this task are documented.
The structures of chemical entities are, where possible, determined using OPSIN in conjunction with a dictionary of name-to-structure relationships. Extracted reactions are atom mapped to confirm that they are chemically consistent. 424,621 atom-mapped reactions were extracted from 65,034 organic chemistry USPTO patents. On a sample of 100 of these extracted reactions, chemical entities were identified with 96.4% recall and 88.9% precision. Quantities could be associated with reagents in 98.8% of cases and with products in 64.9% of cases, whilst the correct role was assigned to chemical entities in 91.8% of cases. Qualitatively, the system captured the essence of the reaction in 95% of cases. This system is expected to be useful in the creation of searchable databases of reactions from chemical patents and in facilitating analysis of the properties of large populations of reactions.
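
The recall and precision figures quoted in this abstract follow the usual set-based definitions. A minimal sketch of that calculation is given here for orientation only; the function name and the toy entity sets are hypothetical and are not taken from the thesis.

```python
def precision_recall(extracted, gold):
    """Return (precision, recall) for extracted items judged against a gold set."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    # Precision: fraction of extracted items that are correct.
    precision = true_positives / len(extracted) if extracted else 0.0
    # Recall: fraction of gold items that were found.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy data, purely for illustration.
gold = {"benzene", "toluene", "ethanol", "acetone"}
extracted = {"benzene", "toluene", "ethanol", "water"}
p, r = precision_recall(extracted, gold)
```

Here three of the four extracted names are correct and three of the four gold names are found, so both values come out at 0.75.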

Apollo (Cambridge)

Information extraction from chemical patents

Author: Jessop David M
Publication venue: University of Cambridge
Publication date: 15/03/2011

The automated extraction of semantic chemical data from the existing literature is demonstrated. For reasons of copyright, the work is focused on the patent literature, though the methods are expected to apply equally to other areas of the chemical literature. Hearst Patterns are applied to the patent literature in order to discover hyponymic relations describing chemical species. The acquired relations are manually validated to determine the precision of the determined hypernyms (85.0%) and of the asserted hyponymic relations (94.3%). It is demonstrated that the system acquires relations that are not present in the ChEBI ontology, suggesting that it could function as a valuable aid to the ChEBI curators. The relations discovered by this process are formalised using the Web Ontology Language (OWL) to enable re-use. PatentEye, an automated system for the extraction of reactions from chemical patents and their conversion to Chemical Markup Language (CML), is presented. Chemical patents published by the European Patent Office over a ten-week period are used to demonstrate the capability of PatentEye: 4444 reactions are extracted with a precision of 78% and recall of 64% with regard to determining the identity and amount of reactants employed, and an accuracy of 92% with regard to product identification. NMR spectra are extracted from the text using OSCAR3, which is developed to greatly increase recall. The resulting system is presented as a significant advancement towards the large-scale and automated extraction of high-quality reaction information. Extended Polymer Markup Language (EPML), a CML dialect for the description of Markush structures as they are presented in the literature, is developed. Software to exemplify and to enable substructure searching of EPML documents is presented. Further work is recommended to refine the language and code to publication quality before they are presented to the community.
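
Hearst patterns are lexical templates such as "X such as A, B and C" that signal hyponymic (is-a) relations. The sketch below illustrates one such pattern with a plain regular expression. It is a simplified illustration, not the system described in the thesis: the pattern, the function name `hearst_such_as`, and the example sentence are all assumptions, and the hypernym is crudely taken to be the single word before "such as".

```python
import re

# One classic Hearst pattern: "<hypernym> such as <hyponym>, <hyponym> and <hyponym>".
# Simplification: the hypernym is the single word immediately before "such as".
PATTERN = re.compile(r"(\w+),? such as ([^.;]+)")


def hearst_such_as(sentence):
    """Return (hyponym, hypernym) pairs found by the 'such as' pattern."""
    pairs = []
    for match in PATTERN.finditer(sentence):
        hypernym = match.group(1)
        # Split the enumerated list on commas and on standalone "and"/"or".
        items = re.split(r",\s*(?:and |or )?|\s+(?:and|or)\s+", match.group(2))
        pairs.extend((item.strip(), hypernym) for item in items if item.strip())
    return pairs


# Hypothetical example sentence.
sentence = "The coupling was run in solvents such as ethanol, acetone and toluene."
relations = hearst_such_as(sentence)
```

For the example sentence, `relations` pairs each of ethanol, acetone and toluene with the hypernym "solvents". A real system would need noun-phrase chunking and chemical name recognition rather than single-word matching.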

Apollo (Cambridge)

A text-mining system for extracting metabolic reactions from full-text articles

Author: Czarnecki Jan M.
Nobeli Irene
Shepherd Adrian J.
Smith Adrian M.L.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012

Background: Increasingly, biological text mining research is focusing on the extraction of complex relationships relevant to the construction and curation of biological networks and pathways. However, one important category of pathway, metabolic pathways, has been largely neglected. Here we present a relatively simple method for extracting metabolic reaction information from free text that scores different permutations of assigned entities (enzymes and metabolites) within a given sentence based on the presence and location of stemmed keywords. This method extends an approach that has proved effective in the context of the extraction of protein–protein interactions. Results: When evaluated on a set of manually curated metabolic pathways using standard performance criteria, our method performs surprisingly well. Precision and recall rates are comparable to those previously achieved for the well-known protein–protein interaction extraction task. Conclusions: We conclude that automated metabolic pathway construction is more tractable than has often been assumed, and that (as in the case of protein–protein interaction extraction) relatively simple text-mining approaches can prove surprisingly effective. It is hoped that these results will provide an impetus to further research and act as a useful benchmark for judging the performance of more sophisticated methods that are yet to be developed.
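
The idea of scoring permutations of entities by the presence and location of stemmed keywords can be illustrated with a toy example. Everything in the sketch below is an assumption made for illustration: the keyword stems, the weights, the prefix-based stand-in for a stemmer, and the function names are hypothetical, and the paper's actual scoring scheme is not reproduced here.

```python
from itertools import permutations

# Hypothetical stemmed keywords; the paper's real keyword list is not given here.
KEYWORD_STEMS = ("convert", "catalys", "produc")


def crude_stem(word):
    """Rough stand-in for a real stemmer: match a known stem as a prefix."""
    word = word.lower()
    for stem in KEYWORD_STEMS:
        if word.startswith(stem):
            return stem
    return None


def score_assignment(tokens, substrate_idx, product_idx):
    """Score one (substrate, product) ordering by keyword presence and position."""
    score = 0
    for i, token in enumerate(tokens):
        if crude_stem(token) is None:
            continue
        score += 1  # a keyword is present in the sentence
        if substrate_idx < i < product_idx:
            score += 2  # keyword sits between substrate and product
    return score


def best_assignment(tokens, metabolite_indices):
    """Try every ordered pair of metabolites and return the highest-scoring one."""
    candidates = permutations(metabolite_indices, 2)
    return max(candidates, key=lambda pair: score_assignment(tokens, *pair))


# Hypothetical sentence with metabolites at token positions 0 and 4.
tokens = "Glucose is converted to glucose-6-phosphate by hexokinase".split()
best = best_assignment(tokens, metabolite_indices=[0, 4])
```

With these toy weights the ordering (0, 4), glucose as substrate and glucose-6-phosphate as product, scores 3 against 1 for the reverse ordering, so it is selected.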

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Birkbeck Institutional Research Online

Early maturation processes in coal. Part 1: Pyrolysis mass balances and structural evolution of coalified wood from the Morwell Brown Coal seam

Author: Adler
Al Darouich
Albrecht
Bates
Behar
Dria
Elodie Salmon
Fowler
François Lorant
Françoise Behar
Hatcher
Holdgate
McKinney
Mukhopadhyay
Nelson
Nimz
Patrick G. Hatcher
Paul-Marie Marquaire
Payne
Philippi
Philp
Rappé
Root
Salmon
Solomon
Spackman
Stout
Publication venue: 'Elsevier BV'
Publication date: 28/03/2009

In this work, we develop a theoretical approach to evaluate the maturation process of kerogen-like material, involving molecular dynamics reactive modeling with a reactive force field to simulate the thermal stress. The Morwell coal has been selected to study the thermal evolution of terrestrial organic matter. To achieve this, a structural model is first constructed based on models from the literature and analytical characterization of our samples by modern 1-D and 2-D NMR, FTIR, and elemental analysis. Then, artificial maturation of the Morwell coal is performed at low conversions in order to obtain detailed quantitative and qualitative evidence of the structural evolution of the kerogen upon maturation. The observed chemical changes are a defunctionalization of the carboxyl, carbonyl and methoxy functional groups, coupled with an increase in cross-linking in the residual mature kerogen. Gaseous and liquid hydrocarbons, essentially CH4, C4H8 and C14+ liquid hydrocarbons, are generated in low amounts, merely by cleavage of the lignin side chain.

arXiv.org e-Print Archive

Crossref

Computer-aided solvent selection and design for efficient chemical processes

Author: Linke S.
McBride K.
Song Z.
Sundmacher K.
Zhou T.
Publication venue: 'Elsevier BV'
Publication date: 01/01/2020

MPG.PuRe

Alkali release from aggregates in long-service concrete structures. Laboratory test evaluation and ASR prediction

Author: Mangialardi Teresa
Mario Berra
Paolini Antonio Evangelista
Publication venue: 'MDPI AG'
Publication date: 01/01/2018

This paper proposes a simple model for predicting the development of deleterious expansion from alkali-silica reaction (ASR) in long-service concrete structures. This model is based on some composition and reactivity parameters related to ASR, including the long-term alkali contribution by aggregates to concrete structures. This alkali contribution was estimated by means of a laboratory extraction test, specifically developed in this study in order to maximize the alkali extraction within relatively short testing times and with low leaching solution/aggregate ratios. The proposed test is a modification of the Italian Standard test method UNI 11417-2 (Ente Nazionale Italiano di Normazione) and consists of subjecting an aggregate sample to leaching with saturated calcium hydroxide solution in a laboratory autoclave at 105 °C. Nine natural ASR-susceptible aggregates (seven sands and two coarse aggregates) were tested and the following optimized test conditions were found: leaching solution/aggregate weight ratio = 0.6; solid calcium hydroxide/aggregate weight ratio = 0.05; test duration = 120 h. The results of the optimized alkali extraction tests were used in the proposed model for predicting the potential development of long-term ASR expansion in concrete dams. ASR predictions congruent with both the field experience and the ASR prevention criteria recommended by European Committee for Standardization Technical Report CEN/TR 16349:2012 were found, thus indicating the suitability of the proposed model.
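
The optimized test conditions quoted in this abstract are simple weight ratios relative to the aggregate sample, so the quantities for a given sample follow by multiplication. A small sketch of that arithmetic follows; the function name, the dictionary keys, and the 1000 g sample mass are illustrative assumptions, while the ratios, temperature and duration are those reported in the abstract.

```python
def leaching_test_quantities(aggregate_mass_g):
    """Quantities for the extraction test, using the ratios reported in the abstract."""
    return {
        # leaching solution/aggregate weight ratio = 0.6
        "leaching_solution_g": 0.6 * aggregate_mass_g,
        # solid calcium hydroxide/aggregate weight ratio = 0.05
        "solid_calcium_hydroxide_g": 0.05 * aggregate_mass_g,
        "temperature_c": 105,  # autoclave temperature
        "duration_h": 120,     # test duration
    }


# The 1000 g sample mass is an arbitrary example, not a value from the paper.
quantities = leaching_test_quantities(1000.0)
```

For a 1000 g sample this gives about 600 g of leaching solution and 50 g of solid calcium hydroxide.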

Multidisciplinary Digital Publishing Institute

Directory of Open Access Journals

Archivio della ricerca- Università di Roma La Sapienza

The Materials Science Procedural Text Corpus: Annotating Materials Synthesis Procedures with Shallow Semantic Structures

Author: Chang Haw-Shiuan
Flanigan Jeffrey
Huang Kevin
Jensen Zach
Kim Edward
McCallum Andrew
Mysore Sheshera
Olivetti Elsa
Strubell Emma
Publication date: 01/01/2019

Materials science literature contains millions of materials synthesis procedures described in unstructured natural language text. Large-scale analysis of these synthesis procedures would facilitate deeper scientific understanding of materials synthesis and enable automated synthesis planning. Such analysis requires extracting structured representations of synthesis procedures from the raw text as a first step. To facilitate the training and evaluation of synthesis extraction models, we introduce a dataset of 230 synthesis procedures annotated by domain experts with labeled graphs that express the semantics of the synthesis sentences. The nodes in this graph are synthesis operations and their typed arguments, and labeled edges specify relations between the nodes. We describe this new resource in detail and highlight some specific challenges to annotating scientific text with shallow semantic structure. We make the corpus available to the community to promote further research and development of scientific information extraction systems.
Comment: Accepted as a long paper at the Linguistic Annotation Workshop (LAW) at ACL 201
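
The annotation scheme described here, with operation nodes, typed argument nodes, and labeled edges between them, maps naturally onto a small graph data structure. The sketch below is one possible in-memory representation and nothing more: the class names, the node labels ("Operation", "Material", "Condition"), the relation names, and the example sentence are hypothetical and are not the corpus's actual label set or file format.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    text: str
    label: str  # e.g. "Operation" or an argument type (hypothetical label names)


@dataclass
class SynthesisGraph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)  # (source_id, relation, target_id)

    def add_node(self, node_id, text, label):
        self.nodes[node_id] = Node(node_id, text, label)

    def add_edge(self, source_id, relation, target_id):
        self.edges.append((source_id, relation, target_id))

    def arguments_of(self, operation_id):
        """Return (relation, argument text) pairs for edges leaving an operation."""
        return [(rel, self.nodes[dst].text)
                for src, rel, dst in self.edges if src == operation_id]


# Hypothetical sentence: "The powder was heated at 500 C."
graph = SynthesisGraph()
graph.add_node(0, "heated", "Operation")
graph.add_node(1, "powder", "Material")
graph.add_node(2, "500 C", "Condition")
graph.add_edge(0, "acts_on", 1)
graph.add_edge(0, "condition_of", 2)
args = graph.arguments_of(0)
```

Querying the "heated" operation returns its two labeled arguments, the material it acts on and the condition under which it is performed.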

arXiv.org e-Print Archive

Crossref

DSpace@MIT

Cellulosic materials as biopolymers and supercritical CO2 as a green process: chemistry and applications

Author: Camy Séverine
Condoret Jean-Stéphane
Medina-Gonzalez Yaocihuatl
Publication venue: 'Informa UK Limited'
Publication date: 01/01/2012

In this review, we describe the use of supercritical CO2 (scCO2) in several cellulose applications. The focus is on different technologies that either exist or are expected to emerge in the near future. The applications range widely, from the extraction of hazardous wastes to the cleaning and reuse of paper or the production of glucose. To put this topic in context, cellulose chemistry and its interactions with scCO2 are described. The aim of this study was to discuss the new emerging technologies and trends concerning cellulosic materials processed in scCO2, such as cellulose drying to obtain aerogels, foams and other microporous materials, impregnation of cellulose, and extraction of highly valuable compounds from plants and of metallic residues from treated wood. In the biofuel production field in particular, we address the pre-treatment of cellulose in scCO2 to improve fermentation to ethanol by cellulase enzymes. Other reactions of cellulosic materials, such as organic-inorganic composite fabrication and depolymerisation, have been considered. Cellulose treatment by scCO2 has been discussed as well. Finally, other applications such as deacidification of paper and fabrication of cellulosic membranes in scCO2 have been reviewed. Examples of the discussed technologies are included as well.

Open Archive Toulouse Archive Ouverte

Retrosynthetic reaction prediction using neural sequence-to-sequence models

Author: Gomes Joseph
Ho Stephen
Kawthekar Prasad
Liu Bowen
Nguyen Quang Luu
Pande Vijay
Ramsundar Bharath
Shi Jade
Sloane Jack
Wender Paul
Publication date: 06/06/2017

We describe a fully data-driven model that learns to perform a retrosynthetic reaction prediction task, which is treated as a sequence-to-sequence mapping problem. The end-to-end trained model has an encoder-decoder architecture consisting of two recurrent neural networks, an architecture that has previously shown great success in solving other sequence-to-sequence prediction tasks such as machine translation. The model is trained on 50,000 experimental reaction examples from the United States patent literature, which span 10 broad reaction types that are commonly used by medicinal chemists. We find that our model performs comparably with a rule-based expert system baseline model, and also overcomes certain limitations associated with rule-based expert systems and with any machine learning approach that contains a rule-based expert system component. Our model provides an important first step towards solving the challenging problem of computational retrosynthetic analysis.

arXiv.org e-Print Archive

Directory of Open Access Journals