Unified Representation of Molecules and Crystals for Machine Learning
Accurate simulations of atomistic systems from first principles are limited
by computational cost. In high-throughput settings, machine learning can
potentially reduce these costs significantly by accurately interpolating
between reference calculations. For this, kernel learning approaches crucially
require a single Hilbert space accommodating arbitrary atomistic systems. We
introduce a many-body tensor representation that is invariant to translations,
rotations and nuclear permutations of same elements, unique, differentiable,
can represent molecules and crystals, and is fast to compute. Empirical
evidence is presented for energy prediction errors below 1 kcal/mol for 7k
organic molecules and 5 meV/atom for 11k elpasolite crystals. Applicability is
demonstrated for phase diagrams of Pt-group/transition-metal binary systems.
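The idea of interpolating between reference calculations with a kernel method can be illustrated with a minimal kernel ridge regression sketch. This is not the many-body tensor representation from the abstract: it assumes plain numeric feature vectors, a Gaussian kernel, and hypothetical helper names (`gaussian_kernel`, `solve`, `fit`, `predict`), and it uses a toy one-dimensional target in place of real reference energies.

```python
import math

def gaussian_kernel(x, y, sigma):
    """Gaussian kernel between two feature vectors (assumed representation)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def solve(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    # Back substitution on the upper-triangular system.
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - tail) / aug[row][row]
    return solution

def fit(train_x, train_y, sigma, ridge):
    """Return weights alpha that solve (K + ridge * I) alpha = y."""
    n = len(train_x)
    kernel = [
        [gaussian_kernel(train_x[i], train_x[j], sigma) + (ridge if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]
    return solve(kernel, train_y)

def predict(x, train_x, alpha, sigma):
    """Predict as a weighted sum of kernel similarities to the training points."""
    return sum(a * gaussian_kernel(x, xi, sigma) for a, xi in zip(alpha, train_x))

# Toy stand-in for reference calculations: sample sin(x) at five points.
train_x = [[0.0], [0.5], [1.0], [1.5], [2.0]]
train_y = [math.sin(x[0]) for x in train_x]
alpha = fit(train_x, train_y, sigma=0.5, ridge=1e-8)
estimate = predict([0.75], train_x, alpha, sigma=0.5)
```

With a very small ridge term the model reproduces the training values almost exactly and interpolates smoothly between them; in practice the kernel width and ridge strength would be chosen by cross-validation.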
Author Correction: Text-mined dataset of inorganic materials synthesis recipes.
An amendment to this paper has been published and can be accessed via a link at the top of the paper.
Machine-learning rationalization and prediction of solid-state synthesis conditions
There currently exist no quantitative methods to determine the appropriate
conditions for solid-state synthesis. This not only hinders the experimental
realization of novel materials but also complicates the interpretation and
understanding of solid-state reaction mechanisms. Here, we demonstrate a
machine-learning approach that predicts synthesis conditions using large
solid-state synthesis datasets text-mined from scientific journal articles.
Using feature importance ranking analysis, we discovered that optimal heating
temperatures have strong correlations with the stability of precursor materials
quantified using melting points and formation energies. In contrast, features derived from the thermodynamics of
synthesis-related reactions did not directly correlate to the chosen heating
temperatures. This correlation between optimal solid-state heating temperature
and precursor stability extends Tammann's rule from intermetallics to oxide
systems, suggesting the importance of reaction kinetics in determining
synthesis conditions. Heating times are shown to be strongly correlated with
the chosen experimental procedures and instrument setups, which may be
indicative of human bias in the dataset. Using these predictive features, we
constructed machine-learning models with good performance and general
applicability to predict the conditions required to synthesize diverse chemical
systems. Codes and data used in this work can be found at:
https://github.com/CederGroupHub/s4
Text-mining and machine-learning solid-state synthesis from the scientific literature
Innovations in novel materials often involve synthesizing new compounds with better materials properties. However, computationally designing synthesis methods for these new compounds remains an uncharted area of research. This thesis proposes to use machine-learning approaches to predict materials synthesis routes by training on synthesis information from the published scientific literature. However, most inorganic materials synthesis information in the scientific literature is locked up in written natural language and must be parsed using natural language processing and information retrieval techniques. Therefore, this thesis aims to achieve two objectives: 1) constructing a text-mining pipeline that extracts solid-state synthesis datasets from scientific papers, and 2) implementing an interpretable machine-learning method to predict solid-state synthesis conditions.
Training information retrieval systems usually requires large manually labeled datasets, which are not widely available in materials informatics. To alleviate the lack of labeled datasets, we demonstrate a semi-supervised machine-learning method (Chapter 3), which is implemented for the classification of paragraphs in papers. Without any human labeling efforts, latent Dirichlet allocation can cluster keywords into topics corresponding to specific experimental synthesis steps. Guided by a small amount of annotation, supervised training methods, such as random forest, can then associate these steps with different synthesis methods, such as solid-state or hydrothermal synthesis. Using the topic modeling results, we also show a Markov chain representation of the order of experimental steps, which reconstructs a flowchart of synthesis procedures.
To fulfill the first objective, we have extracted a dataset of "codified recipes" for solid-state synthesis using an automated text-mining pipeline (Chapter 4). The dataset currently consists of over 30,000 solid-state synthesis entries. Every entry contains synthesis information including input materials, target materials, experimental operations, the associated processing parameters and synthesis conditions, and the balanced synthesis reaction equation. This dataset is the first-ever collection of machine-readable solid-state synthesis experiments and enables data mining of various aspects of inorganic materials synthesis.
To fulfill the second objective, we have built a machine-learning approach that predicts solid-state synthesis conditions (heating temperature and heating time) using the above-mentioned dataset (Chapter 5). We used dominance importance ranking analysis and discovered that optimal heating temperatures have strong correlations with the stability of precursor materials. This correlation extends Tammann's rule from intermetallics to oxide systems, suggesting the importance of reaction kinetics in solid-state synthesis. Heating times are shown to be strongly correlated with the chosen experimental procedures and instrument setups, which may be indicative of selection bias in the dataset. Our machine-learning models achieve good synthesis prediction performance and general applicability for diverse chemical systems. While focusing particularly on solid-state synthesis, this thesis demonstrates a scalable framework to unlock the large amount of inorganic materials synthesis information from the literature and to machine-learn robust and interpretable synthesis predictors. At the end of this thesis, we outline several interesting future research topics which expand the work into a broader context of materials informatics and synthesis science.
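The Markov chain representation of step order mentioned in the thesis abstract can be sketched as a table of first-order transition probabilities estimated from observed step sequences. This is only an illustrative sketch: the function name `transition_probabilities` and the step labels are hypothetical, and the toy recipes are not taken from the thesis dataset.

```python
from collections import Counter, defaultdict

def transition_probabilities(sequences):
    """Estimate first-order Markov transition probabilities from step sequences."""
    counts = defaultdict(Counter)
    for steps in sequences:
        # Count each observed transition between consecutive steps.
        for current, following in zip(steps, steps[1:]):
            counts[current][following] += 1
    probabilities = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        # Normalize counts so the outgoing probabilities of each step sum to 1.
        probabilities[current] = {step: n / total for step, n in followers.items()}
    return probabilities

# Hypothetical step labels for three toy recipes (assumed, for illustration only).
recipes = [
    ["mix", "grind", "heat", "cool"],
    ["mix", "heat", "cool"],
    ["mix", "grind", "heat", "grind", "heat", "cool"],
]
probs = transition_probabilities(recipes)
```

Reading the most probable transitions out of `probs` step by step gives a flowchart-like view of a typical procedure; in this toy data, "heat" is followed by "cool" three times out of four and by "grind" once.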
Unified representation of molecules and crystals for machine learning
Accurate simulations of atomistic systems from first principles are limited by computational cost. In high-throughput settings, machine learning can reduce these costs significantly by accurately interpolating between reference calculations. For this, kernel learning approaches crucially require a representation that accommodates arbitrary atomistic systems. We introduce a many-body tensor representation that is invariant to translations, rotations, and nuclear permutations of same elements, unique, differentiable, can represent molecules and crystals, and is fast to compute. Empirical evidence for competitive energy and force prediction errors is presented for changes in molecular structure, crystal chemistry, and molecular dynamics using kernel regression and symmetric gradient-domain machine learning as models. Applicability is demonstrated for phase diagrams of Pt-group/transition-metal binary systems.
Hydrophobicity Improvement of Cement-Based Materials Incorporated with Ionic Paraffin Emulsions (IPEs)
Cement-based materials are non-uniform porous materials that are easily permeated by harmful substances, which degrades their structural durability. In this work, three ionic paraffin emulsions (IPEs) were prepared: an anionic paraffin emulsion (APE), a cationic paraffin emulsion (CPE), and a non-ionic paraffin emulsion (NPE). The effects of incorporating IPEs into cement-based materials on hydrophobicity improvement were investigated by environmental scanning electron microscopy (ESEM), Fourier transform infrared (FTIR) spectroscopy, transmission and reflection polarizing microscope (TRPM) tests and correlation analyses, as well as by compressive strength, impermeability, and apparent contact angle tests. Finally, the optimal type and the recommended dose of IPEs were suggested. Results reveal that the impermeability pressure and apparent contact angle value of cement-based materials incorporated with IPEs are significantly higher than those of the control group. Thus, the hydrophobicity of cement-based materials is significantly improved. However, IPEs adversely affect the compressive strength of cement-based materials. The apparent contact angle mainly affects impermeability. All three IPEs impart hydrophobicity to cement-based materials. In addition, the optimal NPE dose can significantly improve the hydrophobicity of cement-based materials.
Synthetic accessibility and stability rules of NASICONs
In this paper we develop the stability rules for NASICON-structured
materials, as an example of compounds with complex bond topology and
composition. By applying machine learning to the ab-initio computed phase
stability of 3881 potential NASICONs we can extract a simple two-dimensional
descriptor that is extremely good at separating stable from unstable NASICONs.
This machine-learned "tolerance factor" contains information on the Na content,
the radii and electronegativities of the elements, and the Madelung energy. We
test the predictive capability of this approach by selecting six predicted
NASICON compositions. Five out of the six resulted in a phase-pure NASICON
while the sixth composition led to a NASICON that coexisted with other phases,
validating the efficacy of this approach. This work not only provides tools to
understand the synthetic accessibility of NASICON-type materials, but also
demonstrates an efficient paradigm for discovering new materials with complicated
composition and atomic structure.