Predicting materials properties without crystal structure: deep representation learning from stoichiometry
Abstract: Machine learning has the potential to accelerate materials discovery by accurately predicting materials properties at a low computational cost. However, the model inputs remain a key stumbling block. Current methods typically use descriptors constructed either from knowledge of the full crystal structure, which restricts them to materials with already characterised structures, or from structure-agnostic, fixed-length representations hand-engineered from the stoichiometry. We develop a machine learning approach that takes only the stoichiometry as input and automatically learns appropriate and systematically improvable descriptors from data. Our key insight is to treat the stoichiometric formula as a dense weighted graph between elements. Compared to the state of the art for structure-agnostic methods, our approach achieves lower errors with less data.
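The "dense weighted graph" idea can be made concrete with a short sketch. It is an illustration under stated assumptions, not the authors' implementation: each element becomes a node weighted by its fractional amount, every ordered pair of nodes is connected (self-pairs included here, an assumption), and a weighted softmax scales each neighbour's contribution by its stoichiometric weight. The learned pairwise scores are replaced by hypothetical fixed numbers, and the helper names (`parse_formula`, `stoichiometry_graph`, `weighted_attention`) are our own.

```python
import math
import re
from itertools import product


def parse_formula(formula: str) -> dict[str, float]:
    """Parse a flat formula such as 'Fe2O3' into {'Fe': 2.0, 'O': 3.0}.

    Brackets and hydrates are not handled in this sketch.
    """
    counts: dict[str, float] = {}
    for element, amount in re.findall(r"([A-Z][a-z]?)(\d*\.?\d*)", formula):
        counts[element] = counts.get(element, 0.0) + (float(amount) if amount else 1.0)
    return counts


def stoichiometry_graph(formula: str):
    """Dense weighted graph: one node per element, weighted by its fractional amount."""
    counts = parse_formula(formula)
    nodes = list(counts)
    total = sum(counts.values())
    weights = [counts[el] / total for el in nodes]
    # Every ordered pair of nodes is connected (self-pairs included here).
    edges = list(product(range(len(nodes)), repeat=2))
    return nodes, weights, edges


def weighted_attention(weights, scores):
    """Softmax over neighbours j of node i, with each term scaled by weight w_j.

    a_ij = w_j * exp(s_ij) / sum_k w_k * exp(s_ik)
    `scores[i][j]` stands in for a learned pairwise score (hypothetical here).
    """
    coeffs = []
    for row in scores:
        raw = [w * math.exp(s) for w, s in zip(weights, row)]
        norm = sum(raw)
        coeffs.append([r / norm for r in raw])
    return coeffs


nodes, weights, edges = stoichiometry_graph("Fe2O3")
# With all-zero scores, the attention coefficients reduce to the stoichiometric weights.
attn = weighted_attention(weights, [[0.0] * len(nodes) for _ in nodes])
```

Weighting the softmax by fractional amounts lets a majority element dominate the messages a node receives while still letting learned scores re-rank neighbours; with uniform scores the coefficients fall back to the composition itself (0.4 Fe, 0.6 O for Fe2O3).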
Identifying Crystal Structures Beyond Known Prototypes from X-ray Powder Diffraction Spectra
The large amount of powder diffraction data for which the corresponding crystal structures have not yet been identified suggests the existence of numerous undiscovered, physically relevant crystal structure prototypes. In this paper, we present a scheme to resolve powder diffraction data into crystal structures with precise atomic coordinates by screening the space of all possible atomic arrangements, i.e., structural prototypes, including those not previously observed, using a pre-trained machine learning (ML) model. This involves: (i) enumerating all possible symmetry-confined ways in which a given composition can be accommodated in a given space group; (ii) ranking the element-assigned prototype representations using energies predicted by the Wren ML model [Sci. Adv. 8, eabn4117 (2022)]; (iii) assigning atoms and perturbing them along the degrees of freedom allowed by the Wyckoff positions to match the experimental diffraction data; and (iv) validating the thermodynamic stability of the material using density-functional theory (DFT). An advantage of the presented method is that it does not rely on a database of previously observed prototypes and is therefore capable of finding crystal structures with entirely new symmetric arrangements of atoms. We demonstrate the workflow on unidentified XRD spectra from the ICDD database and identify a number of stable structures. The majority of these turn out to be derivable from known prototypes, but at least two are not part of our prior structural data sets.
Comment: 18 pages including citations and supplementary materials, 4 figures; overall text improvement; revision of some results in Page
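Step (i) of the workflow is a combinatorial search: for each element, find the sets of Wyckoff positions whose multiplicities add up to that element's count, with the constraint that a position without free coordinates holds only one orbit and so can be used once. The sketch below shows that search under loud assumptions. The site table is a hypothetical, illustrative subset for space group Pm-3m (No. 221) rather than a complete Wyckoff table, the function names are our own, and no deduplication of symmetry-equivalent assignments is attempted. Ranking the candidates with Wren (step ii) is not shown.

```python
# Hypothetical, illustrative subset of the Wyckoff positions of space group
# Pm-3m (No. 221): label -> (multiplicity, has free coordinates).
# A real enumerator would load the full table for every space group.
PM3M_SUBSET = {
    "1a": (1, False),
    "1b": (1, False),
    "3c": (3, False),
    "3d": (3, False),
    "6e": (6, True),
    "8g": (8, True),
}


def site_combinations(n_atoms, sites):
    """All multisets of Wyckoff labels whose multiplicities sum to n_atoms."""
    labels = sorted(sites)
    found = []

    def extend(start, remaining, chosen):
        if remaining == 0:
            found.append(tuple(chosen))
            return
        for idx in range(start, len(labels)):
            label = labels[idx]
            multiplicity, free = sites[label]
            if multiplicity > remaining:
                continue
            if not free and label in chosen:
                continue  # a fixed site has only one orbit, so it can be used once
            chosen.append(label)
            extend(idx, remaining - multiplicity, chosen)
            chosen.pop()

    extend(0, n_atoms, [])
    return found


def enumerate_assignments(composition, sites):
    """Assign each element a site combination; fixed sites may not be shared."""
    elements = sorted(composition)
    results = []

    def assign(i, used_fixed, current):
        if i == len(elements):
            results.append(dict(current))
            return
        element = elements[i]
        for combo in site_combinations(composition[element], sites):
            fixed = {label for label in combo if not sites[label][1]}
            if fixed & used_fixed:
                continue  # two elements cannot occupy the same fixed site
            current[element] = combo
            assign(i + 1, used_fixed | fixed, current)
            del current[element]

    assign(0, frozenset(), {})
    return results


# One formula unit of BaTiO3 per cell.
candidates = enumerate_assignments({"Ba": 1, "Ti": 1, "O": 3}, PM3M_SUBSET)
```

For BaTiO3 this yields four assignments, one of which (Ba 1a, Ti 1b, O 3c) is the familiar cubic perovskite. The four collapse to two distinct arrangements, because the origin shift by (1/2, 1/2, 1/2) swaps 1a with 1b and 3c with 3d; a production enumerator would remove such duplicates before ranking.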
Matbench Discovery -- An evaluation framework for machine learning crystal stability prediction
Matbench Discovery simulates the deployment of machine learning (ML) energy models in a high-throughput search for stable inorganic crystals. We address the disconnect between (i) thermodynamic stability and formation energy and (ii) in-domain vs out-of-distribution performance. Alongside this paper, we publish a Python package to aid with future model submissions and a growing online leaderboard with further insights into trade-offs between various performance metrics. To answer the question of which ML methodology performs best at materials discovery, our initial release explores a variety of models including random forests, graph neural networks (GNN), one-shot predictors, iterative Bayesian optimizers and universal interatomic potentials (UIP). Ranked best-to-worst by their test set F1 score on thermodynamic stability prediction, we find CHGNet > M3GNet > MACE > ALIGNN > MEGNet > CGCNN > CGCNN+P > Wrenformer > BOWSR > Voronoi tessellation fingerprints with random forest. The top 3 models are UIPs, the winning methodology for ML-guided materials discovery, achieving F1 scores of ~0.6 for crystal stability classification and discovery acceleration factors (DAF) of up to 5x on the first 10k most stable predictions compared to dummy selection from our test set. We also highlight a sharp disconnect between commonly used global regression metrics and more task-relevant classification metrics. Accurate regressors are susceptible to unexpectedly high false-positive rates if those accurate predictions lie close to the decision boundary at 0 eV/atom above the convex hull, where most materials lie. Our results highlight the need to focus on classification metrics that actually correlate with improved stability hit rate.
Comment: 18 pages, 9 figures, 3 tables
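The gap between regression and classification metrics is easy to reproduce on toy numbers. The sketch below assumes a material counts as stable when its energy above the convex hull is at most 0 eV/atom, and computes DAF as precision divided by the fraction of truly stable materials (the hit rate relative to random selection). Unlike the paper's headline figure, it scores all predicted-stable materials rather than only the 10k ranked most stable. The data are hypothetical values chosen for illustration, and `stability_metrics` is our own name, not the Matbench Discovery package API.

```python
def stability_metrics(e_hull_true, e_hull_pred, threshold=0.0):
    """Treat E_above_hull <= threshold (eV/atom) as stable and score the predictions."""
    pairs = list(zip(e_hull_true, e_hull_pred))
    tp = sum(1 for t, p in pairs if t <= threshold and p <= threshold)
    fp = sum(1 for t, p in pairs if t > threshold and p <= threshold)
    fn = sum(1 for t, p in pairs if t <= threshold and p > threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Prevalence = hit rate of a dummy selector that picks materials at random.
    prevalence = sum(1 for t, _ in pairs if t <= threshold) / len(pairs)
    daf = precision / prevalence if prevalence else 0.0
    mae = sum(abs(t - p) for t, p in pairs) / len(pairs)
    return {"f1": f1, "precision": precision, "recall": recall, "daf": daf, "mae": mae}


# Hypothetical energies above hull (eV/atom); most sit just above 0.
true_e = [-0.02, 0.01, 0.02, 0.03, 0.25, 0.40]
close_but_wrong = [-0.05, -0.02, -0.01, 0.00, 0.22, 0.37]  # MAE 0.03, wrong side of 0
far_but_right = [-0.12, 0.11, 0.12, 0.13, 0.35, 0.50]  # MAE 0.10, right side of 0
metrics_a = stability_metrics(true_e, close_but_wrong)
metrics_b = stability_metrics(true_e, far_but_right)
```

The first model has a third of the second model's MAE, yet its small errors push three unstable materials across the 0 eV/atom boundary, giving F1 = 0.4 and DAF = 1.5 against F1 = 1.0 and DAF = 6 for the less accurate regressor.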
DeePMD-kit v2: A software package for Deep Potential models
DeePMD-kit is a powerful open-source software package that facilitates molecular dynamics simulations using machine learning potentials (MLP) known as Deep Potential (DP) models. This package, which was released in 2017, has been widely used in the fields of physics, chemistry, biology, and materials science for studying atomistic systems. The current version of DeePMD-kit offers numerous advanced features such as DeepPot-SE, attention-based and hybrid descriptors, the ability to fit tensorial properties, type embedding, model deviation, Deep Potential - Range Correction (DPRc), Deep Potential Long Range (DPLR), GPU support for customized operators, model compression, non-von Neumann molecular dynamics (NVNMD), and improved usability, including documentation, compiled binary packages, graphical user interfaces (GUI), and application programming interfaces (API). This article presents an overview of the current major version of the DeePMD-kit package, highlighting its features and technical details. Additionally, the article benchmarks the accuracy and efficiency of different models and discusses ongoing developments.
Comment: 51 pages, 2 figures
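"Model deviation" refers to the spread of predictions across an ensemble of independently trained models, which flags configurations where the potential is uncertain. The sketch below follows the commonly used force-deviation definition: for each atom, the root-mean-square distance of each model's force from the ensemble-mean force, summarised by its maximum, minimum and average over atoms. Treat it as an assumption-laden illustration; the exact quantities DeePMD-kit reports should be checked against its documentation, and `force_model_deviation` is a hypothetical name, not part of the package.

```python
import math


def force_model_deviation(forces_per_model):
    """forces_per_model[m][i] = (fx, fy, fz) predicted by model m for atom i.

    Per atom: sqrt(mean over models of |F_i^m - <F_i>|^2).
    Returns (max, min, average) of that quantity over atoms.
    """
    n_models = len(forces_per_model)
    n_atoms = len(forces_per_model[0])
    per_atom = []
    for i in range(n_atoms):
        # Ensemble-mean force on atom i, component by component.
        mean = [sum(forces_per_model[m][i][k] for m in range(n_models)) / n_models
                for k in range(3)]
        # Mean squared distance of each model's force from the ensemble mean.
        var = sum(
            sum((forces_per_model[m][i][k] - mean[k]) ** 2 for k in range(3))
            for m in range(n_models)
        ) / n_models
        per_atom.append(math.sqrt(var))
    return max(per_atom), min(per_atom), sum(per_atom) / n_atoms


# Two hypothetical models that disagree on atom 0 and agree on atom 1.
ensemble = [
    [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
    [(-1.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
]
max_dev, min_dev, avg_dev = force_model_deviation(ensemble)
```

Taking the maximum over atoms makes the indicator sensitive to a single poorly described local environment, which is why it is typically compared against a threshold to decide whether a configuration should be relabelled with DFT.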
A foundation model for atomistic materials chemistry
Machine-learned force fields have transformed the atomistic modelling of materials by enabling simulations of ab initio quality on unprecedented time and length scales. However, they are currently limited by (i) the significant computational and human effort that must go into the development and validation of potentials for each particular system of interest, and (ii) a general lack of transferability from one chemical system to the next. Here, using the state-of-the-art MACE architecture, we introduce a single general-purpose ML model, trained on a public database of 150k inorganic crystals, that is capable of running stable molecular dynamics on molecules and materials. We demonstrate the power of the MACE-MP-0 model, and its qualitative and at times quantitative accuracy, on a diverse set of problems in the physical sciences, including the properties of solids, liquids, gases, chemical reactions, interfaces and even the dynamics of a small protein. The model can be applied out of the box and as a starting or "foundation model" for any atomistic system of interest, and is thus a step towards democratising the revolution of ML force fields by lowering the barriers to entry.
Comment: 119 pages, 63 figures, 37MB PDF
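Running "stable molecular dynamics" with a machine-learned force field means plugging its forces into a standard integrator and checking that quantities such as total energy stay bounded. The sketch below shows the velocity Verlet scheme with a harmonic spring as a hypothetical stand-in for the learned potential. In practice one would attach a trained model instead (mace-torch, for example, provides a `mace_mp` helper that returns an ASE calculator), but nothing here is MACE-specific, and the function names are our own.

```python
from typing import Callable, List

# Maps flat coordinates to flat forces; a trained potential would fill this role.
ForceFn = Callable[[List[float]], List[float]]


def velocity_verlet(x, v, masses, force_fn: ForceFn, dt: float, n_steps: int):
    """Integrate Newton's equations of motion with the velocity Verlet scheme."""
    x, v = list(x), list(v)
    f = force_fn(x)
    for _ in range(n_steps):
        # Half-kick, drift, recompute forces, half-kick.
        v = [vi + 0.5 * dt * fi / m for vi, fi, m in zip(v, f, masses)]
        x = [xi + dt * vi for xi, vi in zip(x, v)]
        f = force_fn(x)
        v = [vi + 0.5 * dt * fi / m for vi, fi, m in zip(v, f, masses)]
    return x, v


def harmonic_force(x, k=1.0):
    """Hypothetical stand-in potential U = k x^2 / 2 per coordinate, so F = -k x."""
    return [-k * xi for xi in x]


x_final, v_final = velocity_verlet([1.0], [0.0], [1.0], harmonic_force, dt=0.01, n_steps=1000)
# Total energy (unit mass, k = 1) should stay close to its initial value of 0.5.
energy = 0.5 * v_final[0] ** 2 + 0.5 * x_final[0] ** 2
```

Velocity Verlet is time-reversible and symplectic, so with a smooth force field the energy error stays bounded instead of drifting. A steady drift in a real run usually points to non-smooth or unphysical forces from the potential rather than to the integrator.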
Genomic epidemiology of SARS-CoV-2 in a UK university identifies dynamics of transmission
Abstract: Understanding SARS-CoV-2 transmission in higher education settings is important to limit spread among students and into at-risk populations. In this study, we sequenced 482 SARS-CoV-2 isolates from the University of Cambridge from 5 October to 6 December 2020. We performed a detailed phylogenetic comparison with 972 isolates from the surrounding community, complemented with epidemiological and contact tracing data, to determine transmission dynamics. We observe limited viral introductions into the university; the majority of student cases were linked to a single genetic cluster, likely following social gatherings at a venue outside the university. We identify considerable onward transmission associated with student accommodation and courses; this was effectively contained using local infection control measures and following a national lockdown. Transmission clusters were largely segregated within either the university or the community. Our study highlights key determinants of SARS-CoV-2 transmission and effective interventions in a higher education setting that will inform public health policy during pandemics.