The Materials Science Procedural Text Corpus: Annotating Materials Synthesis Procedures with Shallow Semantic Structures

Author: Chang Haw-Shiuan
Flanigan Jeffrey
Huang Kevin
Jensen Zach
Kim Edward
McCallum Andrew
Mysore Sheshera
Olivetti Elsa
Strubell Emma
Publication date: 01/01/2019

Materials science literature contains millions of materials synthesis procedures described in unstructured natural language text. Large-scale analysis of these synthesis procedures would facilitate deeper scientific understanding of materials synthesis and enable automated synthesis planning. Such analysis requires extracting structured representations of synthesis procedures from the raw text as a first step. To facilitate the training and evaluation of synthesis extraction models, we introduce a dataset of 230 synthesis procedures annotated by domain experts with labeled graphs that express the semantics of the synthesis sentences. The nodes in this graph are synthesis operations and their typed arguments, and labeled edges specify relations between the nodes. We describe this new resource in detail and highlight some specific challenges to annotating scientific text with shallow semantic structure. We make the corpus available to the community to promote further research and development of scientific information extraction systems.

Comment: Accepted as a long paper at the Linguistic Annotation Workshop (LAW) at ACL 201

arXiv.org e-Print Archive

Crossref

DSpace@MIT

SynKB: Semantic Search for Synthetic Procedures

Author: Bai Fan
Freitag Dayne
Madrid Peter
Niekrasz John
Ritter Alan
Publication date: 06/10/2022

In this paper we present SynKB, an open-source, automatically extracted knowledge base of chemical synthesis protocols. Similar to proprietary chemistry databases such as Reaxys, SynKB allows chemists to retrieve structured knowledge about synthetic procedures. By taking advantage of recent advances in natural language processing for procedural texts, SynKB supports more flexible queries about reaction conditions, and thus has the potential to help chemists search the literature for conditions used in relevant reactions as they design new synthetic routes. Using customized Transformer models to automatically extract information from 6 million synthesis procedures described in U.S. and EU patents, we show that for many queries, SynKB has higher recall than Reaxys, while maintaining high precision. We plan to make SynKB available as an open-source tool; in contrast, proprietary chemistry databases require costly subscriptions.

Comment: Accepted to EMNLP 2022 Demo trac

arXiv.org e-Print Archive

Building Open Knowledge Graph for Metal-Organic Frameworks (MOF-KG): Challenges and Case Studies

Author: An Yuan
Ardila Katherine
Fajardo-Rojas Fernando
Furst Jacob
Greenberg Jane
Gómez-Gualdrón Diego A.
Hu Xiaohua
Kalinowski Alex
Langlois Kyle
McClellan Scott
Uribe-Romo Fernando J.
Zhao Xintong
Publication date: 29/11/2023

Metal-Organic Frameworks (MOFs) are a class of modular, porous crystalline materials that have great potential to revolutionize applications such as gas storage, molecular separations, chemical sensing, catalysis, and drug delivery. The Cambridge Structural Database (CSD) reports 10,636 synthesized MOF crystals and additionally contains ca. 114,373 MOF-like structures. The sheer number of synthesized (plus potentially synthesizable) MOF structures requires researchers to pursue computational techniques to screen and isolate MOF candidates. In this demo paper, we describe our effort to leverage knowledge graph methods to facilitate MOF prediction, discovery, and synthesis. We present challenges and case studies about (1) construction of a MOF knowledge graph (MOF-KG) from structured and unstructured sources and (2) leveraging the MOF-KG for discovery of new or missing knowledge.

Comment: Accepted by the International Workshop on Knowledge Graphs and Open Knowledge Network (OKN'22) Co-located with the 28th ACM SIGKDD Conferenc

arXiv.org e-Print Archive

The SOFC-Exp Corpus and Neural Approaches to Information Extraction in the Materials Science Domain

Author: Adel Heike
Benteau Renou
Friedrich Annemarie
Hingerl Johannes
Lange Lukas
Maruscyk Anika
Tomazic Federico
Publication date: 01/01/2020

This paper presents a new challenging information extraction task in the domain of materials science. We develop an annotation scheme for marking information on experiments related to solid oxide fuel cells in scientific publications, such as involved materials and measurement conditions. With this paper, we publish our annotation guidelines, as well as our SOFC-Exp corpus consisting of 45 open-access scholarly articles annotated by domain experts. A corpus and an inter-annotator agreement study demonstrate the complexity of the suggested named entity recognition and slot filling tasks as well as high annotation quality. We also present strong neural-network-based models for a variety of tasks that can be addressed on the basis of our new data set. On all tasks, using BERT embeddings leads to large performance gains, but with increasing task complexity, adding a recurrent neural network on top seems beneficial. Our models will serve as competitive baselines in future work, and analysis of their performance highlights difficult cases when modeling the data and suggests promising research directions.

Comment: Accepted for publication at ACL 202

arXiv.org e-Print Archive

OPUS Augsburg

Crossref

Fourteenth Biennial Status Report: March 2017 - February 2019

Author
Publication venue: Max-Planck-Institut für Informatik
Publication date: 01/01/2019

MPG.PuRe

Pipelines for Procedural Information Extraction from Scientific Literature: Towards Recipes using Machine Learning and Data Science

Author: Aguirre Carlos A.
Bobadilla Luis
Buttler David
Christensen Derek
Davich Emily
De La Torre Maria F.
Han T. Yong-Jin
Hsu William H.
Lam Alice
Luo Lei
Roth Jordan
Theis Yihong
Yang Huichen
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 16/12/2019

This paper describes a machine learning and data science pipeline for structured information extraction from documents, implemented as a suite of open-source tools and extensions to existing tools. It centers around a methodology for extracting procedural information in the form of recipes, stepwise procedures for creating an artifact (in this case synthesizing a nanomaterial), from published scientific literature. From our overall goal of producing recipes from free text, we derive the technical objectives of a system consisting of pipeline stages: document acquisition and filtering, payload extraction, recipe step extraction as a relationship extraction task, recipe assembly, and presentation through an information retrieval interface with question answering (QA) functionality. This system meets computational information and knowledge management (CIKM) requirements of metadata-driven payload extraction, named entity extraction, and relationship extraction from text. Functional contributions described in this paper include semi-supervised machine learning methods for PDF filtering and payload extraction tasks, followed by structured extraction and data transformation tasks beginning with section extraction, recipe steps as information tuples, and finally assembled recipes. Measurable objective criteria for extraction quality include precision and recall of recipe steps, ordering constraints, and QA accuracy, precision, and recall. Results, key novel contributions, and significant open problems derived from this work center around the attribution of these holistic quality measures to specific machine learning and inference stages of the pipeline, each with their performance measures. 
The desired recipes contain identified preconditions, material inputs, and operations, and constitute the overall output generated by our computational information and knowledge management (CIKM) system.

Comment: 15th International Conference on Document Analysis and Recognition Workshops (ICDARW 2019

arXiv.org e-Print Archive

Crossref

Supporting Story Synthesis: Bridging the Gap between Visual Analytics and Storytelling

Author: Andrienko G.
Andrienko N.
Chen S.
Li J.
Nguyen P.
Turkay C.
Wang Y.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/07/2020

Visual analytics usually deals with complex data and uses sophisticated algorithmic, visual, and interactive techniques. Findings of the analysis often need to be communicated to an audience that lacks visual analytics expertise. This requires analysis outcomes to be presented in simpler ways than those typically used in visual analytics systems. However, not only the analytical visualizations but also the information that needs to be presented may be too complex for the target audience. Hence, there exists a gap on the path from obtaining analysis findings to communicating them, which involves two aspects: information and display complexity. We propose a general framework where data analysis and result presentation are linked by story synthesis, in which the analyst creates and organizes story contents. Unlike previous research, where analytic findings are represented by stored display states, we treat findings as data constructs. In story synthesis, findings are selected, assembled, and arranged in views using meaningful layouts that take into account the structure of information and inherent properties of its components. We propose a workflow for applying the proposed framework in designing visual analytics systems and demonstrate the generality of the approach by applying it to two domains: social media and movement analysis.

City Research Online

Crossref

Warwick Research Archives Portal Repository