    Coarse-Graining Auto-Encoders for Molecular Dynamics

    Molecular dynamics simulations provide theoretical insight into the microscopic behavior of materials in the condensed phase and, as a predictive tool, enable computational design of new compounds. However, because of the large temporal and spatial scales involved in thermodynamic and kinetic phenomena in materials, atomistic simulations are often computationally infeasible. Coarse-graining methods allow simulating larger systems, by reducing the dimensionality of the simulation, and taking longer timesteps, by averaging out fast motions. Coarse-graining involves two coupled learning problems: defining the mapping from an all-atom to a reduced representation, and parametrizing a Hamiltonian over the coarse-grained coordinates. Multiple statistical mechanics approaches have addressed the latter, but the former is generally a hand-tuned process based on chemical intuition. Here we present Autograin, an optimization framework based on auto-encoders to learn both tasks simultaneously. Autograin is trained to learn the optimal mapping between the all-atom and reduced representations, using the reconstruction loss to facilitate the learning of coarse-grained variables. In addition, a force-matching method is applied to variationally determine the coarse-grained potential energy function. This procedure is tested on a number of model systems, including single-molecule and bulk-phase periodic simulations. Comment: 8 pages, 6 figures
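
    As a rough illustration of the two ingredients named in the abstract above (a mapping from atoms to coarse-grained beads, and a force-matching objective), the following minimal sketch uses a fixed, hand-specified center-of-mass mapping. This is an assumption made for illustration: Autograin learns its mapping with an auto-encoder, which is not reproduced here, and the function names `coarse_grain`, `map_forces`, and `force_matching_loss` are hypothetical.

```python
def coarse_grain(positions, masses, assignment, n_beads):
    """Map atoms to beads using the center of mass of each bead.

    assignment[i] is the bead index of atom i (hand-specified here,
    NOT learned as in the auto-encoder approach described above).
    """
    bead_mass = [0.0] * n_beads
    bead_pos = [[0.0, 0.0, 0.0] for _ in range(n_beads)]
    for r, m, b in zip(positions, masses, assignment):
        bead_mass[b] += m
        for k in range(3):
            bead_pos[b][k] += m * r[k]
    for b in range(n_beads):
        for k in range(3):
            bead_pos[b][k] /= bead_mass[b]
    return bead_pos


def map_forces(forces, assignment, n_beads):
    """Sum atomic forces per bead (a common choice for center-of-mass mappings)."""
    bead_forces = [[0.0, 0.0, 0.0] for _ in range(n_beads)]
    for f, b in zip(forces, assignment):
        for k in range(3):
            bead_forces[b][k] += f[k]
    return bead_forces


def force_matching_loss(predicted, reference):
    """Mean squared difference between predicted and reference bead forces."""
    total, count = 0.0, 0
    for fp, fr in zip(predicted, reference):
        for a, b in zip(fp, fr):
            total += (a - b) ** 2
            count += 1
    return total / count
```

    In a force-matching fit, `predicted` would come from the gradient of a parametrized coarse-grained potential and `reference` from `map_forces` applied to all-atom forces; minimizing the loss over the potential parameters is the variational step.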

    Simulations with machine learning potentials identify the ion conduction mechanism mediating non-Arrhenius behavior in LGPS

    Li$_{10}$Ge(PS$_6$)$_2$ (LGPS) is a highly concentrated solid electrolyte, in which Coulombic repulsion between neighboring cations is hypothesized as the underlying reason for concerted ion hopping, a mechanism common among superionic conductors such as Li$_7$La$_3$Zr$_2$O$_{12}$ (LLZO) and Li$_{1.3}$Al$_{0.3}$Ti$_{1.7}$(PO$_4$)$_3$ (LATP). While first-principles simulations using molecular dynamics (MD) provide insight into the Li$^+$ transport mechanism, historically, there has been a gap between the temperature ranges studied in simulations and in experiments. Here, we used a neural network (NN) potential trained on density functional theory (DFT) simulations to run MD simulations of up to 40 nanoseconds at DFT-like accuracy, in order to characterize the ion conduction mechanisms across a range of temperatures that includes previous simulations and experimental studies. We have confirmed a Li$^+$ sublattice phase transition in LGPS around 400 K, below which the \textit{ab}-plane diffusivity $D^*_{ab}$ is drastically reduced. Concomitant with the sublattice phase transition near 400 K, there is less cation-cation (cross) correlation, as characterized by Haven ratios closer to 1, and the vibrations in the system are more harmonic at lower temperature. Intuitively, at high temperature, the collection of vibrational modes may be sufficient to drive concerted ion hops. However, near room temperature, the vibrational modes available may be insufficient to overcome electrostatic repulsion, thus resulting in less correlated ion motion and comparatively slower ion conduction. Such phenomena of a sublattice phase transition, below which concerted hopping plays a less significant role, may be extended to other highly concentrated solid electrolytes such as LLZO and LATP.
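
    For readers unfamiliar with the Haven ratio mentioned above, the standard textbook definition is sketched below. The notation ($N$ mobile ions, dimensionality $d$, displacement $\Delta\mathbf{r}_i(t)$ of ion $i$ over time $t$, charge diffusivity $D_\sigma$) is assumed here for illustration and is not taken from the paper itself.

```latex
D^{*} = \lim_{t\to\infty} \frac{1}{2dt}\,\frac{1}{N}
        \sum_{i=1}^{N} \left\langle \left|\Delta\mathbf{r}_i(t)\right|^{2} \right\rangle,
\qquad
D_{\sigma} = \lim_{t\to\infty} \frac{1}{2dt}\,\frac{1}{N}
        \left\langle \Big| \sum_{i=1}^{N} \Delta\mathbf{r}_i(t) \Big|^{2} \right\rangle,
\qquad
H_R = \frac{D^{*}}{D_{\sigma}}

\Big| \sum_{i} \Delta\mathbf{r}_i \Big|^{2}
  = \sum_{i} \left|\Delta\mathbf{r}_i\right|^{2}
  + \sum_{i \neq j} \Delta\mathbf{r}_i \cdot \Delta\mathbf{r}_j
```

    When the cross terms $\langle \Delta\mathbf{r}_i \cdot \Delta\mathbf{r}_j \rangle$ with $i \neq j$ vanish, $D_\sigma = D^*$ and $H_R = 1$; positively correlated (concerted) hops increase $D_\sigma$ and push $H_R$ below 1, which is why Haven ratios closer to 1 indicate less cation-cation correlation.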

    Chemistry-informed Macromolecule Graph Representation for Similarity Computation and Supervised Learning

    Macromolecules are large, complex molecules composed of covalently bonded monomer units, existing in different stereochemical configurations and topologies. As a result of such chemical diversity, representing, comparing, and learning over macromolecules emerge as critical challenges. To address this, we developed a macromolecule graph representation, with monomers and bonds as nodes and edges, respectively. We captured the inherent chemistry of the macromolecule by using molecular fingerprints for node and edge attributes. For the first time, we demonstrated computation of chemical similarity between two macromolecules of varying chemistry and topology, using exact graph edit distances and graph kernels. We also trained graph neural networks for a variety of glycan classification tasks, achieving state-of-the-art results. Our work has two-fold implications: it provides a general framework for representation, comparison, and learning of macromolecules, and it enables quantitative chemistry-informed decision-making and iterative design in the macromolecular chemical space. Comment: Main text: 4 pages, 2 figures, 1 table; Appendix: 18 pages, 25 figures, 3 tables

    Learning Pair Potentials using Differentiable Simulations

    Learning pair interactions from experimental or simulation data is of great interest for molecular simulations. We propose a general stochastic method for learning pair interactions from data using differentiable simulations (DiffSim). DiffSim defines a loss function based on structural observables, such as the radial distribution function, through molecular dynamics (MD) simulations. The interaction potentials are then learned directly by stochastic gradient descent, using backpropagation to calculate the gradient of the structural loss metric with respect to the interaction potential through the MD simulation. This gradient-based method is flexible and can be configured to simulate and optimize multiple systems simultaneously. For example, it is possible to simultaneously learn potentials for different temperatures or for different compositions. We demonstrate the approach by recovering simple pair potentials, such as the Lennard-Jones potential, from radial distribution functions. We find that DiffSim can be used to probe a wider functional space of pair potentials than traditional methods such as Iterative Boltzmann Inversion. We show that our methods can be used to simultaneously fit potentials for simulations at different compositions and temperatures to improve the transferability of the learned potentials. Comment: 12 pages, 10 figures
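
    To make the comparison in the abstract above concrete, the sketch below shows the Lennard-Jones pair potential and a single update step of Iterative Boltzmann Inversion, the traditional baseline it names. It does not implement DiffSim, which backpropagates through the MD simulation; the function names, the tabulated-potential representation, and the `floor` cutoff for empty bins are assumptions made for illustration.

```python
import math


def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair energy 4*eps*((sigma/r)^12 - (sigma/r)^6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def ibi_update(potential, g_sim, g_target, kT=1.0, floor=1e-8):
    """One Iterative Boltzmann Inversion step on a tabulated potential.

    V_new(r) = V_old(r) + kT * ln(g_sim(r) / g_target(r))

    Bins where either radial distribution function is (near) zero are
    left unchanged to avoid taking the logarithm of zero.
    """
    updated = []
    for v, gs, gt in zip(potential, g_sim, g_target):
        if gs > floor and gt > floor:
            updated.append(v + kT * math.log(gs / gt))
        else:
            updated.append(v)
    return updated
```

    In practice the update is repeated: simulate with the current tabulated potential, measure `g_sim`, apply `ibi_update`, and stop once `g_sim` matches `g_target` within tolerance.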

    Differentiable sampling of molecular geometries with uncertainty-based adversarial attacks

    Neural network (NN) interatomic potentials provide fast prediction of potential energy surfaces, closely matching the accuracy of the electronic structure methods used to produce the training data. However, NN predictions are only reliable within well-learned training domains, and show volatile behavior when extrapolating. Uncertainty quantification approaches can flag atomic configurations for which prediction confidence is low, but arriving at such uncertain regions requires expensive sampling of the NN phase space, often using atomistic simulations. Here, we exploit automatic differentiation to drive atomistic systems towards high-likelihood, high-uncertainty configurations without the need for molecular dynamics simulations. By performing adversarial attacks on an uncertainty metric, informative geometries that expand the training domain of NNs are sampled. When combined with an active learning loop, this approach bootstraps and improves NN potentials while decreasing the number of calls to the ground truth method. This efficiency is demonstrated on sampling of kinetic barriers and collective variables in molecules, and can be extended to any NN potential architecture and materials system. Comment: 12 pages, 4 figures, supporting information

    Entropy and Energy Profiles of Chemical Reactions

    The description of chemical processes at the molecular level is often facilitated by use of reaction coordinates, or collective variables (CVs). The CV measures the progress of the reaction and allows the construction of profiles that track the evolution of a specific property as the reaction progresses. Whereas CVs are routinely used, especially alongside enhanced sampling techniques, links between profiles and thermodynamic state functions and reaction rate constants are not rigorously exploited. Here, we report a unified treatment of such reaction profiles. Tractable expressions are derived for the free-energy, internal-energy, and entropy profiles as functions of only the CV. We demonstrate the ability of this treatment to extract quantitative insight from the entropy and internal-energy profiles of various real-world physicochemical processes, including intramolecular organic reactions, ionic transport in superionic electrolytes, and molecular transport in nanoporous materials.
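
    As a simple point of reference for the profiles discussed above, the sketch below builds naive histogram estimates from unbiased samples: a free-energy profile F(s) = -kT ln p(s), an internal-energy profile as the mean potential energy in each CV bin, and an entropy term from their difference. These are textbook estimators, defined up to an additive constant, and are an assumption made for illustration; they are not the expressions derived in the paper, and the function name `reaction_profiles` is hypothetical.

```python
import math


def reaction_profiles(cv_samples, energies, s_min, s_max, n_bins, kT=1.0):
    """Histogram-based profiles along a collective variable (CV).

    Returns one entry per bin: None for empty bins, otherwise a dict with
    the bin center "s", free energy "F", mean energy "U", and "minus_TS".
    """
    width = (s_max - s_min) / n_bins
    counts = [0] * n_bins
    energy_sum = [0.0] * n_bins
    for s, u in zip(cv_samples, energies):
        if not (s_min <= s < s_max):
            continue  # ignore samples outside the histogram range
        b = min(int((s - s_min) / width), n_bins - 1)
        counts[b] += 1
        energy_sum[b] += u
    total = sum(counts)
    rows = []
    for b in range(n_bins):
        if counts[b] == 0:
            rows.append(None)
            continue
        free = -kT * math.log(counts[b] / total)  # F(s) = -kT ln p(s)
        internal = energy_sum[b] / counts[b]      # U(s) = <U> within the bin
        rows.append({
            "s": s_min + (b + 0.5) * width,
            "F": free,
            "U": internal,
            "minus_TS": free - internal,          # from F = U - TS
        })
    return rows
```

    Only differences between bins are meaningful here, since the bin width and normalization shift F(s) by a constant.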

    Photocell optimization using dark state protection

    Conventional photocells suffer from a fundamental efficiency threshold imposed by the principle of detailed balance, reflecting the fact that good absorbers must necessarily also be fast emitters. This limitation can be overcome by "parking" the energy of an absorbed photon in a dark state which neither absorbs nor emits light. Here we argue that suitable dark states occur naturally as a consequence of the dipole-dipole interaction between two proximal optical dipoles for a wide range of realistic molecular dimers. We develop an intuitive model of a photocell comprising two light-absorbing molecules coupled to an idealized reaction centre, showing that asymmetric dimers are capable of providing a significant enhancement of light-to-current conversion under ambient conditions. We conclude by describing a road map for identifying suitable molecular dimers for demonstrating this effect by screening a very large set of possible candidate molecules. This work was supported by the Leverhulme Trust (RPG-080). EMG is supported by the Royal Society of Edinburgh/Scottish Government. RGB thanks Samsung Advanced Institute of Technology for funding. AF thanks the Anglo-Israeli association and the Anglo-Jewish association for funding.

    Automated patent extraction powers generative modeling in focused chemical spaces

    Deep generative models have emerged as an exciting avenue for inverse molecular design, with progress coming from the interplay between training algorithms and molecular representations. One of the key challenges in their applicability to materials science and chemistry has been the lack of access to sizeable training datasets with property labels. Published patents contain the first disclosure of new materials prior to their publication in journals, and are a vast source of scientific knowledge that has remained relatively untapped in the field of data-driven molecular design. Because patents are filed seeking to protect specific uses, molecules in patents can be considered to be weakly labeled into application classes. Furthermore, patents published by the US Patent and Trademark Office (USPTO) are downloadable and have machine-readable text and molecular structures. In this work, we train domain-specific generative models using patent data sources by developing an automated pipeline to go from USPTO patent digital files to the generation of novel candidates with minimal human intervention. We test the approach on two in-class extracted datasets, one in organic electronics and another in tyrosine kinase inhibitors. We then evaluate the generative models trained on these in-class datasets on two categories of tasks (distribution learning and property optimization), identify strengths and limitations, and suggest possible explanations and remedies that could be used to overcome these limitations in practice.
