159 research outputs found
Atom-Density Representations for Machine Learning
The applications of machine learning techniques to chemistry and materials
science become more numerous by the day. The main challenge is to devise
representations of atomic systems that are at the same time complete and
concise, so as to reduce the number of reference calculations that are needed
to predict the properties of different types of materials reliably. This has
led to a proliferation of alternative ways to convert an atomic structure into
an input for a machine-learning model. We introduce an abstract definition of
chemical environments that is based on a smoothed atomic density, using a
bra-ket notation to emphasize basis set independence and to highlight the
connections with some popular choices of representations for describing atomic
systems. The correlations between the spatial distribution of atoms and their
chemical identities are computed as inner products between these feature kets,
which can be given an explicit representation in terms of the expansion of the
atom density on orthogonal basis functions, that is equivalent to the smooth
overlap of atomic positions (SOAP) power spectrum, but also in real space,
corresponding to n-body correlations of the atom density. This formalism lays
the foundations for a more systematic tuning of the behavior of the
representations, by introducing operators that represent the correlations
between structure, composition, and the target properties. It provides a
unifying picture of recent developments in the field and indicates a way
forward towards more effective and computationally affordable machine-learning
schemes for molecules and materials.
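As a rough illustration of the construction this abstract describes, the following LaTeX sketch writes out a smoothed atom density and its expansion on an orthogonal basis. The Gaussian smoothing function g, the radial functions R_n, and the single-species simplification are assumptions made here for brevity; they are not taken from the abstract.

```latex
% Smoothed density of neighbours j around a central atom i
% (g is an assumed Gaussian of fixed width; one chemical species only)
\rho_i(\mathbf{r}) = \sum_{j} g(\mathbf{r} - \mathbf{r}_{ij})

% Expansion coefficients on orthogonal radial functions R_n
% and spherical harmonics Y_{lm}
c^{i}_{nlm} = \int \mathrm{d}\mathbf{r}\, R_n(r)\,
              Y^{*}_{lm}(\hat{\mathbf{r}})\, \rho_i(\mathbf{r})

% Rotationally invariant power spectrum (up to a normalisation convention)
p^{i}_{nn'l} \propto \sum_{m=-l}^{l} c^{i}_{nlm} \left( c^{i}_{n'lm} \right)^{*}
```

Summing over m removes the dependence on the orientation of the environment, which is why the power spectrum can serve as a rotationally invariant feature vector.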
Feature Optimization for Atomistic Machine Learning Yields A Data-Driven Construction of the Periodic Table of the Elements
Machine-learning of atomic-scale properties amounts to extracting
correlations between structure, composition and the quantity that one wants to
predict. Representing the input structure in a way that best reflects such
correlations makes it possible to improve the accuracy of the model for a given
amount of reference data. When using a description of the structures that is
transparent and well-principled, optimizing the representation might reveal
insights into the chemistry of the data set. Here we show how one can
generalize the SOAP kernel to introduce a distance-dependent weight that
accounts for the multi-scale nature of the interactions, and a description of
correlations between chemical species. We show that this improves substantially
the performance of ML models of molecular and materials stability, while making
it easier to work with complex, multi-component systems and to extend SOAP to
coarse-grained intermolecular potentials. The element correlations that give
the best performing model show striking similarities with the conventional
periodic table of the elements, providing an inspiring example of how machine
learning can rediscover, and generalize, intuitive concepts that constitute the
foundations of chemistry.
Comment: 9 pages, 4 figures
Atomic-scale representation and statistical learning of tensorial properties
This chapter discusses the importance of incorporating three-dimensional
symmetries in the context of statistical learning models geared towards the
interpolation of the tensorial properties of atomic-scale structures. We focus
on Gaussian process regression, and in particular on the construction of
structural representations, and the associated kernel functions, that are
endowed with the geometric covariance properties compatible with those of the
learning targets. We summarize the general formulation of such a
symmetry-adapted Gaussian process regression model, and how it can be
implemented based on a scheme that generalizes the popular smooth overlap of
atomic positions representation. We give examples of the performance of this
framework when learning the polarizability and the ground-state electron
density of a molecule.
Efficient implementation of atom-density representations
Physically motivated and mathematically robust atom-centered representations of molecular structures are key to the success of modern atomistic machine learning. They lie at the foundation of a wide range of methods to predict the properties of both materials and molecules and to explore and visualize their chemical structures and compositions. Recently, it has become clear that many of the most effective representations share a fundamental formal connection. They can all be expressed as a discretization of n-body correlation functions of the local atom density, suggesting the opportunity of standardizing and, more importantly, optimizing their evaluation. We present an implementation, named librascal, whose modular design lends itself both to developing refinements to the density-based formalism and to rapid prototyping for new developments of rotationally equivariant atomistic representations. As an example, we discuss smooth overlap of atomic positions (SOAP) features, perhaps the most widely used member of this family of representations, to show how the expansion of the local density can be optimized for any choice of radial basis sets. We discuss the representation in the context of a kernel ridge regression model, commonly used with SOAP features, and analyze how the computational effort scales for each of the individual steps of the calculation. By applying data reduction techniques in feature space, we show how to reduce the total computational cost by a factor of up to 4 without affecting the model’s symmetry properties and without significantly impacting its accuracy.
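The kernel ridge regression step mentioned in this abstract can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions: it does not use librascal, the random feature matrix is a stand-in for real SOAP features, and the function names, the exponent `zeta`, and the regulariser `reg` are all hypothetical choices.

```python
import numpy as np


def polynomial_kernel(X, Y, zeta=2):
    """Normalised dot-product kernel raised to an integer power zeta."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    return (Xn @ Yn.T) ** zeta


def krr_fit(K, y, reg=1e-8):
    """Solve (K + reg * I) w = y for the regression weights w."""
    return np.linalg.solve(K + reg * np.eye(len(y)), y)


def krr_predict(K_test_train, weights):
    """Predict targets from the kernel between test and training points."""
    return K_test_train @ weights


# Stand-in data: 20 "environments" with 8 features each (NOT real SOAP vectors).
rng = np.random.default_rng(0)
X_train = rng.standard_normal((20, 8))
y_train = rng.standard_normal(20)

reg = 1e-8
K_train = polynomial_kernel(X_train, X_train)
weights = krr_fit(K_train, y_train, reg=reg)
y_fit = krr_predict(K_train, weights)
```

Building the kernel matrix scales with the number of training points squared times the feature dimension, which is one reason reducing the number of features can lower the total cost.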
Boltzmann-conserving classical dynamics in quantum time-correlation functions: "Matsubara dynamics".
We show that a single change in the derivation of the linearized semiclassical-initial value representation (LSC-IVR or "classical Wigner approximation") results in a classical dynamics which conserves the quantum Boltzmann distribution. We rederive the (standard) LSC-IVR approach by writing the (exact) quantum time-correlation function in terms of the normal modes of a free ring-polymer (i.e., a discrete imaginary-time Feynman path), taking the limit that the number of polymer beads N → ∞, such that the lowest normal-mode frequencies take their "Matsubara" values. The change we propose is to truncate the quantum Liouvillian, not explicitly in powers of ħ² at ħ⁰ (which gives back the standard LSC-IVR approximation), but in the normal-mode derivatives corresponding to the lowest Matsubara frequencies. The resulting "Matsubara" dynamics is inherently classical (since all terms O(ħ²) disappear from the Matsubara Liouvillian in the limit N → ∞) and conserves the quantum Boltzmann distribution because the Matsubara Hamiltonian is symmetric with respect to imaginary-time translation. Numerical tests show that the Matsubara approximation to the quantum time-correlation function converges with respect to the number of modes and gives better agreement than LSC-IVR with the exact quantum result. Matsubara dynamics is too computationally expensive to be applied to complex systems, but its further approximation may lead to practical methods.
T.J.H.H., M.J.W., and S.C.A. acknowledge funding from the U.K. Engineering and Physical Sciences Research Council. A.M. acknowledges the European Lifelong Learning Programme (LLP) for an Erasmus student placement scholarship. T.J.H.H. also acknowledges a Research Fellowship from Jesus College, Cambridge and helpful discussions with Dr. Adam Harper.
This is the author accepted manuscript. The final version is available from AIP via http://dx.doi.org/10.1063/1.491631
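The N → ∞ limit in the Matsubara dynamics abstract above, where the lowest normal-mode frequencies take their "Matsubara" values, can be made explicit with the standard free ring-polymer normal-mode frequencies. The notation (mode index k, inverse temperature β) is an assumption introduced here, since the abstract defines no symbols for these quantities.

```latex
% Normal-mode frequencies of a free ring polymer with N beads
\omega_k = \frac{2N}{\beta\hbar} \sin\!\left( \frac{k\pi}{N} \right),
\qquad k = 0, \pm 1, \pm 2, \dots

% For fixed k and N \to \infty, \sin(k\pi/N) \approx k\pi/N, so
\lim_{N \to \infty} \omega_k = \frac{2\pi k}{\beta\hbar}
```

The limiting values 2πk/(βħ) are the Matsubara frequencies. Only modes with |k| much smaller than N reach this limit, which is consistent with the abstract's focus on the lowest frequencies.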
Path-integral dynamics of water using curvilinear centroids
We develop a path-integral dynamics method for water that resembles centroid
molecular dynamics (CMD), except that the centroids are averages of
curvilinear, rather than cartesian, bead coordinates. The curvilinear
coordinates are used explicitly only when computing the potential of mean
force, the components of which are re-expressed in terms of cartesian
'quasi-centroids' (so-called because they are close to the cartesian
centroids). Cartesian equations of motion are obtained by making small
approximations to the quantum Boltzmann distribution. Simulations of the
infrared spectra of various water models over 150-600 K show these
approximations to be justified: for a two-dimensional OH-bond model, the
quasi-centroid molecular dynamics (QCMD) spectra lie close to the exact quantum
spectra, and almost on top of the Matsubara dynamics spectra; for gas-phase
water, the QCMD spectra are close to the exact quantum spectra; for liquid
water and ice (using the q-TIP4P/F surface), the QCMD spectra are close to the
CMD spectra at 600 K, and line up with the results of thermostatted
ring-polymer molecular dynamics and approximate quantum calculations at 300 and
150 K. The QCMD spectra show no sign of the CMD 'curvature problem' (of
erroneous red shifts and broadening). In the liquid and ice simulations, the
potential of mean force was evaluated on the fly by generalising an adiabatic
CMD algorithm to curvilinear coordinates; the full limit of adiabatic
separation needed to be taken, which made the QCMD calculations 8 times more
expensive than partially adiabatic CMD at 300 K, and 32 times at 150 K (and the
intensities may still not be converged at this temperature). The QCMD method is
probably generalisable to many other systems, provided collective
bead-coordinates can be identified that yield compact mean-field ring-polymer
distributions.
Funding: Cambridge University Vice Chancellor's award
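The difference between cartesian and curvilinear centroids in the water path-integral abstract above can be illustrated with a two-dimensional toy model. This is only a sketch under stated assumptions: the beads are placed by hand on an arc of fixed bond length rather than sampled from a ring-polymer distribution, and it is not the QCMD algorithm itself.

```python
import numpy as np


def cartesian_centroid(beads):
    """Average of the cartesian bead positions."""
    return beads.mean(axis=0)


def polar_centroid(beads):
    """Average of the curvilinear (radius, angle) bead coordinates.

    The plain angle average is only meaningful when the beads do not
    wrap around the branch cut of arctan2, as in this toy example.
    """
    radii = np.linalg.norm(beads, axis=1)
    angles = np.arctan2(beads[:, 1], beads[:, 0])
    return radii.mean(), angles.mean()


# Toy ring polymer: 32 beads at fixed bond length, spread over an arc of 2 rad.
bond_length = 1.0
bead_angles = np.linspace(-1.0, 1.0, 32)
beads = bond_length * np.column_stack([np.cos(bead_angles), np.sin(bead_angles)])

cart_radius = np.linalg.norm(cartesian_centroid(beads))
mean_radius, mean_angle = polar_centroid(beads)
```

In this toy model the cartesian centroid falls inside the arc, so its distance from the origin is shorter than the bond length, while the curvilinear average keeps the bond length. A bead distribution that spreads around a curved coordinate in this way is commonly given as the origin of the CMD curvature problem.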
Snow property controls on modelled Ku-band altimeter estimates of first-year sea ice thickness: Case studies from the Canadian and Norwegian Arctic
Uncertainty in snow properties impacts the accuracy
of Arctic sea ice thickness estimates from radar altimetry. On first-year sea ice (FYI), spatiotemporal variations in snow properties
can cause the Ku-band main radar scattering horizon to appear
above the snow/sea ice interface. This can increase the estimated
sea ice freeboard by several centimeters, leading to FYI thickness
overestimations. This study examines the expected changes in Ku-band main scattering horizon and its impact on FYI thickness
estimates, with variations in snow temperature, salinity and
density derived from 10 naturally occurring Arctic FYI Cases
encompassing saline/non-saline, warm/cold, simple/complexly
layered snow (4 cm to 45 cm) overlying FYI (48 cm to 170 cm).
Using a semi-empirical modeling approach, snow properties from
these Cases are used to derive layer-wise brine volume and
dielectric constant estimates, to simulate the Ku-band main
scattering horizon and delays in radar propagation speed.
Differences between modeled and observed FYI thickness are
calculated to assess sources of error. Under both cold and warm
conditions, saline snow covers are shown to shift the main
scattering horizon above the snow/sea ice interface, causing
thickness retrieval errors. Overestimates in FYI thicknesses of up
to 65% are found for warm, saline snow overlying thin sea ice.
Our simulations exhibited a distinct shift in the main scattering
horizon when the snow layer densities became greater than 440
kg/m³, especially under warmer snow conditions. Our simulations
suggest a mean Ku-band propagation delay for snow of 39%,
which is higher than 25%, suggested in previous studies.
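To give a feel for the propagation delay figures quoted above, the following sketch uses a widely cited empirical relation for the wave speed in dry snow, c_s = c (1 + 0.51 ρ_s)^(-1.5) with ρ_s in g/cm³. This relation, the dry-snow simplification, and all function names are assumptions made for illustration; the study's own semi-empirical model accounts for brine and layering and is not reproduced here.

```python
C_VACUUM = 299_792_458.0  # speed of light in vacuum, m/s


def snow_wave_speed(snow_density_kg_m3):
    """Wave speed in dry snow from an assumed empirical density relation."""
    rho_g_cm3 = snow_density_kg_m3 / 1000.0  # convert kg/m^3 to g/cm^3
    return C_VACUUM * (1.0 + 0.51 * rho_g_cm3) ** -1.5


def propagation_delay_fraction(snow_density_kg_m3):
    """Extra apparent path length per unit snow depth, c / c_s - 1."""
    return C_VACUUM / snow_wave_speed(snow_density_kg_m3) - 1.0


def freeboard_correction(snow_depth_m, snow_density_kg_m3):
    """Range correction (m) for a snow layer of the given depth and density."""
    return snow_depth_m * propagation_delay_fraction(snow_density_kg_m3)


delay_320 = propagation_delay_fraction(320.0)
delay_440 = propagation_delay_fraction(440.0)
correction_20cm = freeboard_correction(0.20, 320.0)
```

Under these assumptions a dry snow density of 320 kg/m³ gives a delay of roughly 25% of the snow depth, and the delay grows with density. The larger 39% mean reported above comes from the study's own simulations of saline and layered snow.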
Machine-learning of atomic-scale properties based on physical principles
We briefly summarize the kernel regression approach, as used recently in
materials modelling, to fitting functions, particularly potential energy
surfaces, and highlight how the linear algebra framework can be used to both
predict and train from linear functionals of the potential energy, such as the
total energy and atomic forces. We then give a detailed account of the Smooth
Overlap of Atomic Positions (SOAP) representation and kernel, showing how it
arises from an abstract representation of smooth atomic densities, and how it
is related to several popular density-based representations of atomic
structure. We also discuss recent generalisations that allow fine control of
correlations between different atomic species, prediction and fitting of
tensorial properties, and also how to construct structural kernels---applicable
to comparing entire molecules or periodic systems---that go beyond an additive
combination of local environments.
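The "additive combination of local environments" that this abstract takes as its starting point can be sketched as an average of local kernels over all pairs of environments in two structures. This is a minimal sketch under stated assumptions: the random feature matrices are stand-ins for real SOAP vectors, and the function names and the exponent `zeta` are hypothetical. The generalisations beyond the additive form are not shown.

```python
import numpy as np


def local_kernel(A, B, zeta=2):
    """Kernel between every environment in A and every environment in B."""
    An = A / np.linalg.norm(A, axis=1, keepdims=True)
    Bn = B / np.linalg.norm(B, axis=1, keepdims=True)
    return (An @ Bn.T) ** zeta


def average_structure_kernel(A, B, zeta=2):
    """Additive structural kernel: mean of the local kernel over all pairs."""
    return local_kernel(A, B, zeta).mean()


# Stand-in structures: rows are per-atom feature vectors (NOT real SOAP vectors).
rng = np.random.default_rng(1)
structure_a = rng.standard_normal((3, 5))  # 3 atoms, 5 features each
structure_b = rng.standard_normal((4, 5))  # 4 atoms, 5 features each

k_ab = average_structure_kernel(structure_a, structure_b)
k_ba = average_structure_kernel(structure_b, structure_a)
k_single = average_structure_kernel(structure_a[:1], structure_a[:1])
```

Because the average runs over all pairs, structures with different numbers of atoms can be compared directly, which is what makes this form convenient for entire molecules or periodic cells.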