9 research outputs found

    Predicting materials properties without crystal structure: deep representation learning from stoichiometry

    Get PDF
    Abstract: Machine learning has the potential to accelerate materials discovery by accurately predicting materials properties at a low computational cost. However, the model inputs remain a key stumbling block. Current methods typically use descriptors constructed from knowledge of either the full crystal structure — therefore only applicable to materials with already characterised structures — or structure-agnostic fixed-length representations hand-engineered from the stoichiometry. We develop a machine learning approach that takes only the stoichiometry as input and automatically learns appropriate and systematically improvable descriptors from data. Our key insight is to treat the stoichiometric formula as a dense weighted graph between elements. Compared to the state of the art for structure-agnostic methods, our approach achieves lower errors with less data

    Identifying Crystal Structures Beyond Known Prototypes from X-ray Powder Diffraction Spectra

    Full text link
    The large amount of powder diffraction data for which the corresponding crystal structures have not yet been identified suggests the existence of numerous undiscovered, physically relevant crystal structure prototypes. In this paper, we present a scheme to resolve powder diffraction data into crystal structures with precise atomic coordinates by screening the space of all possible atomic arrangements, i.e., structural prototypes, including those not previously observed, using a pre-trained machine learning (ML) model. This involves: (i) enumerating all possible symmetry-confined ways in which a given composition can be accommodated in a given space group, (ii) ranking the element-assigned prototype representations using energies predicted using Wren ML model [Sci.\ Adv.\ 8, eabn4117 (2022)], (iii) assigning and perturbing atoms along the degree of freedom allowed by the Wyckoff positions to match the experimental diffraction data (iv) validating the thermodynamic stability of the material using density-functional theory (DFT). An advantage of the presented method is that it does not rely on a database of previously observed prototypes and, therefore is capable of finding crystal structures with entirely new symmetric arrangements of atoms. We demonstrate the workflow on unidentified XRD spectra from the ICDD database and identify a number of stable structures, where a majority turns out to be derivable from known prototypes, but at least two are found to not be part of our prior structural data sets.Comment: 18 pages including citations and supplementary materials, 4 figures; overall text improvement; revision of some results in Page

    Matbench Discovery -- An evaluation framework for machine learning crystal stability prediction

    Full text link
    Matbench Discovery simulates the deployment of machine learning (ML) energy models in a high-throughput search for stable inorganic crystals. We address the disconnect between (i) thermodynamic stability and formation energy and (ii) in-domain vs out-of-distribution performance. Alongside this paper, we publish a Python package to aid with future model submissions and a growing online leaderboard with further insights into trade-offs between various performance metrics. To answer the question which ML methodology performs best at materials discovery, our initial release explores a variety of models including random forests, graph neural networks (GNN), one-shot predictors, iterative Bayesian optimizers and universal interatomic potentials (UIP). Ranked best-to-worst by their test set F1 score on thermodynamic stability prediction, we find CHGNet > M3GNet > MACE > ALIGNN > MEGNet > CGCNN > CGCNN+P > Wrenformer > BOWSR > Voronoi tessellation fingerprints with random forest. The top 3 models are UIPs, the winning methodology for ML-guided materials discovery, achieving F1 scores of ~0.6 for crystal stability classification and discovery acceleration factors (DAF) of up to 5x on the first 10k most stable predictions compared to dummy selection from our test set. We also highlight a sharp disconnect between commonly used global regression metrics and more task-relevant classification metrics. Accurate regressors are susceptible to unexpectedly high false-positive rates if those accurate predictions lie close to the decision boundary at 0 eV/atom above the convex hull where most materials are. Our results highlight the need to focus on classification metrics that actually correlate with improved stability hit rate.Comment: 18 pages, 9 figures, 3 table

    DeePMD-kit v2: A software package for Deep Potential models

    Full text link
    DeePMD-kit is a powerful open-source software package that facilitates molecular dynamics simulations using machine learning potentials (MLP) known as Deep Potential (DP) models. This package, which was released in 2017, has been widely used in the fields of physics, chemistry, biology, and material science for studying atomistic systems. The current version of DeePMD-kit offers numerous advanced features such as DeepPot-SE, attention-based and hybrid descriptors, the ability to fit tensile properties, type embedding, model deviation, Deep Potential - Range Correction (DPRc), Deep Potential Long Range (DPLR), GPU support for customized operators, model compression, non-von Neumann molecular dynamics (NVNMD), and improved usability, including documentation, compiled binary packages, graphical user interfaces (GUI), and application programming interfaces (API). This article presents an overview of the current major version of the DeePMD-kit package, highlighting its features and technical details. Additionally, the article benchmarks the accuracy and efficiency of different models and discusses ongoing developments.Comment: 51 pages, 2 figure

    A foundation model for atomistic materials chemistry

    Full text link
    Machine-learned force fields have transformed the atomistic modelling of materials by enabling simulations of ab initio quality on unprecedented time and length scales. However, they are currently limited by: (i) the significant computational and human effort that must go into development and validation of potentials for each particular system of interest; and (ii) a general lack of transferability from one chemical system to the next. Here, using the state-of-the-art MACE architecture we introduce a single general-purpose ML model, trained on a public database of 150k inorganic crystals, that is capable of running stable molecular dynamics on molecules and materials. We demonstrate the power of the MACE-MP-0 model - and its qualitative and at times quantitative accuracy - on a diverse set problems in the physical sciences, including the properties of solids, liquids, gases, chemical reactions, interfaces and even the dynamics of a small protein. The model can be applied out of the box and as a starting or "foundation model" for any atomistic system of interest and is thus a step towards democratising the revolution of ML force fields by lowering the barriers to entry.Comment: 119 pages, 63 figures, 37MB PD

    Genomic epidemiology of SARS-CoV-2 in a UK university identifies dynamics of transmission

    Get PDF
    AbstractUnderstanding SARS-CoV-2 transmission in higher education settings is important to limit spread between students, and into at-risk populations. In this study, we sequenced 482 SARS-CoV-2 isolates from the University of Cambridge from 5 October to 6 December 2020. We perform a detailed phylogenetic comparison with 972 isolates from the surrounding community, complemented with epidemiological and contact tracing data, to determine transmission dynamics. We observe limited viral introductions into the university; the majority of student cases were linked to a single genetic cluster, likely following social gatherings at a venue outside the university. We identify considerable onward transmission associated with student accommodation and courses; this was effectively contained using local infection control measures and following a national lockdown. Transmission clusters were largely segregated within the university or the community. Our study highlights key determinants of SARS-CoV-2 transmission and effective interventions in a higher education setting that will inform public health policy during pandemics.</jats:p
    corecore