5,430 research outputs found
Inorganic Materials Synthesis Planning with Literature-Trained Neural Networks
Leveraging new data sources is a key step in accelerating the pace of
materials design and discovery. To complement the strides in synthesis planning
driven by historical, experimental, and computed data, we present an automated
method for connecting scientific literature to synthesis insights. Starting
from natural language text, we apply word embeddings from language models,
which are fed into a named entity recognition model, upon which a conditional
variational autoencoder is trained to generate syntheses for arbitrary
materials. We show the potential of this technique by predicting precursors for
two perovskite materials, using only training data published over a decade
prior to their first reported syntheses. We demonstrate that the model learns
representations of materials corresponding to synthesis-related properties, and
that the model's behavior complements existing thermodynamic knowledge.
Finally, we apply the model to perform synthesizability screening for proposed
novel perovskite compounds.Comment: Added new funding support to the acknowledgments section in this
versio
The Materials Science Procedural Text Corpus: Annotating Materials Synthesis Procedures with Shallow Semantic Structures
Materials science literature contains millions of materials synthesis
procedures described in unstructured natural language text. Large-scale
analysis of these synthesis procedures would facilitate deeper scientific
understanding of materials synthesis and enable automated synthesis planning.
Such analysis requires extracting structured representations of synthesis
procedures from the raw text as a first step. To facilitate the training and
evaluation of synthesis extraction models, we introduce a dataset of 230
synthesis procedures annotated by domain experts with labeled graphs that
express the semantics of the synthesis sentences. The nodes in this graph are
synthesis operations and their typed arguments, and labeled edges specify
relations between the nodes. We describe this new resource in detail and
highlight some specific challenges to annotating scientific text with shallow
semantic structure. We make the corpus available to the community to promote
further research and development of scientific information extraction systems.Comment: Accepted as a long paper at the Linguistic Annotation Workshop (LAW)
at ACL 201
Retrosynthetic reaction prediction using neural sequence-to-sequence models
We describe a fully data driven model that learns to perform a retrosynthetic
reaction prediction task, which is treated as a sequence-to-sequence mapping
problem. The end-to-end trained model has an encoder-decoder architecture that
consists of two recurrent neural networks, which has previously shown great
success in solving other sequence-to-sequence prediction tasks such as machine
translation. The model is trained on 50,000 experimental reaction examples from
the United States patent literature, which span 10 broad reaction types that
are commonly used by medicinal chemists. We find that our model performs
comparably with a rule-based expert system baseline model, and also overcomes
certain limitations associated with rule-based expert systems and with any
machine learning approach that contains a rule-based expert system component.
Our model provides an important first step towards solving the challenging
problem of computational retrosynthetic analysis
Designing algorithms to aid discovery by chemical robots
Recently, automated robotic systems have become very efficient, thanks to improved coupling between sensor systems and algorithms, of which the latter have been gaining significance thanks to the increase in computing power over the past few decades. However, intelligent automated chemistry platforms for discovery orientated tasks need to be able to cope with the unknown, which is a profoundly hard problem. In this Outlook, we describe how recent advances in the design and application of algorithms, coupled with the increased amount of chemical data available, and automation and control systems may allow more productive chemical research and the development of chemical robots able to target discovery. This is shown through examples of workflow and data processing with automation and control, and through the use of both well-used and cutting-edge algorithms illustrated using recent studies in chemistry. Finally, several algorithms are presented in relation to chemical robots and chemical intelligence for knowledge discovery
Into the Unknown: How Computation Can Help Explore Uncharted Material Space
Novel functional materials are urgently needed to help combat the major global challenges facing humanity, such as climate change and resource scarcity. Yet, the traditional experimental materials discovery process is slow and the material space at our disposal is too vast to effectively explore using intuition-guided experimentation alone. Most experimental materials discovery programs necessarily focus on exploring the local space of known materials, so we are not fully exploiting the enormous potential material space, where more novel materials with unique properties may exist. Computation, facilitated by improvements in open-source software and databases, as well as computer hardware has the potential to significantly accelerate the rational development of materials, but all too often is only used to postrationalize experimental observations. Thus, the true predictive power of computation, where theory leads experimentation, is not fully utilized. Here, we discuss the challenges to successful implementation of computation-driven materials discovery workflows, and then focus on the progress of the field, with a particular emphasis on the challenges to reaching novel materials
Recommended from our members
Accelerating Materials Discovery with Machine Learning
As we enter the data age, ever-increasing amounts of human knowledge are being recorded in machine-readable formats.
This has opened up new opportunities to leverage data to accelerate scientific discovery.
This thesis focuses on how we can use historical and computational data to aid the discovery and development of new materials.
We begin by looking at a traditional materials informatics task -- elucidating the structure-function relationships of high-temperature cuprate superconductors.
One of the most significant challenges for materials informatics is the limited availability of relevant data.
We propose a simple calibration-based approach to estimate the apical and in-plane copper-oxygen distances from more readily available lattice parameter data to address this challenge for cuprate superconductors.
Our investigation uncovers a large, unexplored region of materials space that may yield cuprates with higher critical temperatures.
We propose two experimental avenues that may enable this region to be accessed.
Computational materials exploration is bottle-necked by our ability to provide input structures to feed our workflows.
Whilst \textit{ab-intio} structure identification is possible, it is computationally burdensome and we lack design rules for deciding where to target searches in high-throughput setups.
To address this, there is a need to develop tools that suggest promising candidates, enabling automated deployment and increased efficiency.
Machine learning models are well suited to this task, however, current approaches typically use hand-engineered inputs.
This means that their performance is circumscribed by the intuitions reflected in the chosen inputs.
We propose a novel way to formulate the machine learning task as a set regression problem over the elements in a material.
We show that our approach leads to higher sample efficiency than other well-established composition-based approaches.
Having demonstrated the ability of machine learning to aid in the selection of promising compound compositions, we next explore how useful machine learning might be for identifying fabrication routes.
Using a recently released data-mined data set of solid-state synthesis reactions, we design a two-stage model to predict the products of inorganic reactions.
We critically explore the performance of this model, showing that whilst the predictions fall short of the accuracy required to be chemically discriminative, the model provides valuable insights into understanding inorganic reactions.
Through careful investigation of the model's failure modes, we explore the challenges that remain in the construction of forward inorganic reaction prediction models and suggest some pathways to tackle the identified issues.
One of the principal ways that material scientists understand and categorise materials is in terms of their symmetries.
Crystal structure prototypes are assigned based on the presence of symmetrically equivalent sites known as Wyckoff positions.
We show that a powerful coarse-grained representation of materials structures can be constructed from the Wyckoff positions by discarding information about their coordinates within crystal structures.
One of the strengths of this representation is that it maintains the ability of structure-based methods to distinguish polymorphs whilst also allowing combinatorial enumeration akin to composition-based approaches.
We construct an end-to-end differentiable model that takes our proposed Wyckoff representation as input.
The performance of this approach is examined on a suite of materials discovery experiments showing that it leads to strong levels of enrichment in materials discovery tasks.
The research presented in this thesis highlights the promise of applying data-driven workflows and machine learning in materials discovery and development.
This thesis concludes by speculating about promising research directions for applying machine learning within materials discovery
- …