Recursive evaluation and iterative contraction of N-body equivariant features
Mapping an atomistic configuration to an N-point correlation of a field
associated with the atomic positions (e.g. an atomic density) has emerged as an
elegant and effective solution to represent structures as the input of
machine-learning algorithms. While it has become clear that low-order density
correlations do not provide a complete representation of an atomic environment,
the exponential increase in the number of possible N-body invariants makes it
difficult to design a concise and effective representation. We discuss how to
exploit recursion relations between equivariant features of different orders
(generalizations of N-body invariants that provide a complete representation
of the symmetries of improper rotations) to compute high-order terms
efficiently. In combination with the automatic selection of the most expressive
combination of features at each order, this approach provides a conceptual and
practical framework to generate systematically improvable, symmetry-adapted
representations for atomistic machine learning.
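As a schematic illustration of the recursion referred to above (the notation is ours, radial and chemical indices are suppressed, and the paper's own conventions may differ): if A^{(nu)}_{lambda mu}(i) denotes an order-nu equivariant feature of environment i that transforms like the spherical harmonic Y_lambda^mu, and c_{kq}(i) are the spherical-harmonic expansion coefficients of the neighbor density, with A^{(1)}_{lambda mu}(i) = c_{lambda mu}(i), then higher orders follow by Clebsch-Gordan coupling:

$$ A^{(\nu+1)}_{\lambda\mu}(i) \;\propto\; \sum_{l,k}\,\sum_{m,q} \langle l\,m;\,k\,q \,|\, \lambda\,\mu\rangle\; A^{(\nu)}_{lm}(i)\; c_{kq}(i) $$

The generalized N-body invariants are the lambda = 0 components obtained at each iteration; iterating and contracting a selected subset of features at each order, rather than enumerating all products, is what keeps the construction tractable.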
Optimal radial basis for density-based atomic representations
The input of almost every machine learning algorithm targeting the properties of matter at the atomic scale involves a transformation of the list of Cartesian atomic coordinates into a more symmetric representation. Many of the most popular representations can be seen as an expansion of the symmetrized correlations of the atom density and differ mainly by the choice of basis. Considerable effort has been dedicated to the optimization of the basis set, typically driven by heuristic considerations on the behavior of the regression target. Here, we take a different, unsupervised viewpoint, aiming to determine the basis that encodes in the most compact way possible the structural information that is relevant for the dataset at hand. For each training dataset and number of basis functions, one can build a unique basis that is optimal in this sense and can be computed at no additional cost with respect to the primitive basis by approximating it with splines. We demonstrate that this construction yields representations that are accurate and computationally efficient, particularly when working with representations that correspond to high body-order correlations. We present examples that involve both molecular and condensed-phase machine-learning models.
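A minimal NumPy sketch of how such a data-driven basis can be obtained, under our reading of the abstract (array names, shapes, and the per-channel treatment are illustrative assumptions, not the paper's code): for each angular channel, compute the covariance of the primitive-basis expansion coefficients over the training set and keep its leading eigenvectors as the contracted, optimal radial functions.

    import numpy as np

    def optimal_radial_projections(coeffs, n_optimal):
        # coeffs: density expansion coefficients in the primitive radial basis,
        # shape (n_environments, n_radial, n_l, 2*l_max + 1), zero-padded in m.
        n_env, n_radial, n_l, _ = coeffs.shape
        projections = []
        for l in range(n_l):
            # Collect the radial vectors of every environment and every m
            # component of this angular channel as rows of a single matrix.
            c_l = np.moveaxis(coeffs[:, :, l, : 2 * l + 1], 1, -1)
            c_l = c_l.reshape(-1, n_radial)
            cov = c_l.T @ c_l / c_l.shape[0]       # radial covariance
            eigval, eigvec = np.linalg.eigh(cov)   # ascending eigenvalues
            keep = np.argsort(eigval)[::-1][:n_optimal]
            projections.append(eigvec[:, keep])    # (n_radial, n_optimal)
        return projections

The contracted coefficients for channel l are obtained by projecting the primitive ones onto projections[l]; as the abstract notes, the resulting optimal radial functions can be tabulated with splines, so the contraction adds essentially no cost at evaluation time.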
Expanding Density-Correlation Machine Learning Representations for Anisotropic Coarse-Grained Particles
Physics-based, atom-centered machine learning (ML) representations have been
instrumental to the effective integration of ML within the atomistic simulation
community. Many of these representations build off the idea of atoms as having
spherical, or isotropic, interactions. In many communities, there is often a
need to represent groups of atoms, either to increase the computational
efficiency of simulation via coarse-graining or to understand molecular
influences on system behavior. In such cases, atom-centered representations
will have limited utility, as groups of atoms may not be well-approximated as
spheres. In this work, we extend the popular Smooth Overlap of Atomic Positions
(SOAP) ML representation for systems consisting of non-spherical anisotropic
particles or clusters of atoms. We show the power of this anisotropic extension
of SOAP, which we deem AniSOAP, in accurately characterizing liquid crystal
systems and predicting the energetics of Gay-Berne ellipsoids and
coarse-grained benzene crystals. With our study of these prototypical
anisotropic systems, we derive fundamental insights into how molecular shape
influences mesoscale behavior and explain how to reincorporate important
atom-atom interactions typically not captured by coarse-grained models. Moving
forward, we propose AniSOAP as a flexible, unified framework for
coarse-graining in complex, multiscale simulation.
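To make the key generalization concrete, here is a minimal sketch of an anisotropic neighbor density (our own illustrative parametrization, not the AniSOAP code): each coarse-grained particle carries a position, an orientation, and semiaxes, and contributes a multivariate Gaussian rather than a spherical one.

    import numpy as np

    def anisotropic_neighbor_density(points, centers, rotations, semiaxes):
        # points: (P, 3) evaluation points; centers: (N, 3) particle positions;
        # rotations: (N, 3, 3) orientation matrices; semiaxes: (N, 3) Gaussian
        # semiaxes. Each particle contributes an anisotropic Gaussian with
        # covariance R diag(a**2) R^T.
        density = np.zeros(len(points))
        for mu, rot, ax in zip(centers, rotations, semiaxes):
            cov = rot @ np.diag(ax ** 2) @ rot.T
            prec = np.linalg.inv(cov)
            diff = points - mu
            norm = 1.0 / np.sqrt((2.0 * np.pi) ** 3 * np.linalg.det(cov))
            density += norm * np.exp(-0.5 * np.einsum("pi,ij,pj->p", diff, prec, diff))
        return density

In the actual framework, such a density would be expanded on a radial and spherical-harmonic basis to build SOAP-like symmetrized correlations; the grid evaluation above only serves to make the particle parametrization concrete.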
Roadmap on machine learning in electronic structure
In recent years, we have been witnessing a paradigm shift in computational materials science. In fact, traditional methods, mostly developed in the second half of the XXth century, are being complemented, extended, and sometimes even completely replaced by faster, simpler, and often more accurate approaches. The new approaches, which we collectively label machine learning, have their origins in the fields of informatics and artificial intelligence, but are making rapid inroads in all other branches of science. With this in mind, this Roadmap article, consisting of multiple contributions from experts across the field, discusses the use of machine learning in materials science, and shares perspectives on current and future challenges in problems as diverse as the prediction of materials properties, the construction of force fields, the development of exchange-correlation functionals for density-functional theory, the solution of the many-body problem, and more. In spite of the already numerous and exciting success stories, we are just at the beginning of a long path that will reshape materials science for the many challenges of the XXIst century.
Unified theory of atom-centered representations and message-passing machine-learning schemes
Data-driven schemes that associate molecular and crystal structures with
their microscopic properties share the need for a concise, effective
description of the arrangement of their atomic constituents. Many types of
models rely on descriptions of atom-centered environments that are associated
with an atomic property or with an atomic contribution to an extensive
macroscopic quantity. Frameworks in this class can be understood in terms of
atom-centered density correlations (ACDC), which are used as a basis for a
body-ordered, symmetry-adapted expansion of the targets. Several other schemes,
which gather information on the relationship between neighboring atoms using
"message-passing" ideas, cannot be directly mapped to correlations centered
around a single atom. We generalize the ACDC framework to include
multi-centered information, generating representations that provide a complete
linear basis to regress symmetric functions of atomic coordinates, and provide
a coherent foundation to systematize our understanding of both atom-centered
and message-passing, invariant and equivariant machine-learning schemes.
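For reference, a schematic of the single-center starting point that the paper generalizes, in a notation of our choosing: the atom-centered density and its symmetrized nu-point correlations are

$$ \rho_i(\mathbf{x}) = \sum_{j \in A_i} g(\mathbf{x} - \mathbf{r}_{ji}), \qquad \overline{\rho_i^{\otimes\nu}}(\mathbf{x}_1,\dots,\mathbf{x}_\nu) = \int \mathrm{d}\hat{R}\, \prod_{k=1}^{\nu} \rho_i(\hat{R}\,\mathbf{x}_k), $$

where g is a localized (e.g. Gaussian) function, r_{ji} = r_j - r_i, and the integral over rotations R yields invariant features; retaining the covariant components instead gives their equivariant counterparts. As we read the abstract, the multi-centered generalization extends this construction beyond a single center i, which is what allows message-passing-style information to be captured in the same language.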
Completeness of atomic structure representations
In this paper, we address the challenge of obtaining a comprehensive and symmetric representation of point particle groups, such as atoms in a molecule, which is crucial in physics and theoretical chemistry. The problem has become even more important with the widespread adoption of machine-learning techniques in science, as it underpins the capacity of models to accurately reproduce physical relationships while being consistent with fundamental symmetries and conservation laws. However, some of the descriptors that are commonly used to represent point clouds—notably those based on discretized correlations of the neighbor density that power most of the existing ML models of matter at the atomic scale—are unable to distinguish between special arrangements of particles in three dimensions. This makes it impossible to machine learn their properties. Atom-density correlations are provably complete in the limit in which they simultaneously describe the mutual relationship between all atoms, which is impractical. We present a novel approach to construct descriptors of finite correlations based on the relative arrangement of particle triplets, which can be employed to create symmetry-adapted models with universal approximation capabilities, and have the resolution of the neighbor discretization as the sole convergence parameter. Our strategy is demonstrated on a class of atomic arrangements that are specifically built to defy a broad class of conventional symmetric descriptors, showing its potential for addressing their limitations.
Electronic Excited States from Physically Constrained Machine Learning
Data-driven techniques are increasingly used to replace electronic-structure calculations of matter. In this context, a relevant question is whether machine learning (ML) should be applied directly to predict the desired properties or combined explicitly with physically grounded operations. We present an example of an integrated modeling approach in which a symmetry-adapted ML model of an effective Hamiltonian is trained to reproduce electronic excitations from a quantum-mechanical calculation. The resulting model can make predictions for molecules that are much larger and more complex than those on which it is trained and allows for dramatic computational savings by indirectly targeting the outputs of well-converged calculations while using a parametrization corresponding to a minimal atom-centered basis. These results emphasize the merits of intertwining data-driven techniques with physical approximations, improving the transferability and interpretability of ML models without affecting their accuracy and computational efficiency and providing a blueprint for developing ML-augmented electronic-structure methods.
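A minimal sketch of the indirect prediction step described above, with hypothetical inputs (a predicted effective Hamiltonian and the overlap of a minimal atom-centered basis); the actual model targets well-converged quantum-mechanical excitations, and this snippet only illustrates how a spectrum follows from a predicted matrix, not the paper's exact mapping.

    from scipy.linalg import eigh

    def spectrum_from_predicted_hamiltonian(h_eff, overlap, n_occ):
        # h_eff: symmetric effective Hamiltonian predicted by the ML model in a
        # minimal atom-centered basis; overlap: basis overlap matrix (use the
        # identity for an orthonormal basis); n_occ: number of occupied levels.
        eps, _ = eigh(h_eff, overlap)        # generalized eigenvalue problem
        gaps = eps[n_occ:] - eps[n_occ - 1]  # highest-occupied -> virtual gaps
        return eps, gaps

Because the matrix is expressed on an atom-centered basis, the same model can assemble and diagonalize a predicted Hamiltonian for molecules larger than those in the training set, which is where the computational savings mentioned in the abstract come from.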