1,416 research outputs found

    First principles view on chemical compound space: Gaining rigorous atomistic control of molecular properties

    A well-defined notion of chemical compound space (CCS) is essential for gaining rigorous control of properties through variation of elemental composition and atomic configurations. Here, we review an atomistic first principles perspective on CCS. First, CCS is discussed in terms of variational nuclear charges in the context of conceptual density functional and molecular grand-canonical ensemble theory. Thereafter, we revisit the notion of compound pairs, related to each other via "alchemical" interpolations involving fractional nuclear charges in the electronic Hamiltonian. We address Taylor expansions in CCS, property non-linearity, improved predictions using reference compound pairs, and the ounce-of-gold prize challenge to linearize CCS. Finally, we turn to machine learning of analytical structure property relationships in CCS. These relationships correspond to inferred, rather than derived through variational principle, solutions of the electronic Schrödinger equation.

    Coarse-grained interaction potentials for polyaromatic hydrocarbons

    Using Kohn-Sham density functional theory (KS-DFT), we have studied the interaction between various polyaromatic hydrocarbon molecules. The systems range from mono-cyclic benzene up to hexabenzocoronene (hbc). For several conventional exchange-correlation functionals, potential energy curves of interaction of the π-π stacking hbc dimer are reported. It is found that all pure local density or generalized gradient approximated functionals yield qualitatively incorrect predictions regarding structure and interaction. Inclusion of a non-local, atom-centered correction to the KS-Hamiltonian enables quantitative predictions. The computed potential energy surfaces of interaction yield parameters for a coarse-grained potential, which can be employed to study discotic liquid-crystalline mesophases of derived polyaromatic macromolecules.

    Understanding molecular representations in machine learning: The role of uniqueness and target similarity

    The predictive accuracy of Machine Learning (ML) models of molecular properties depends on the choice of the molecular representation. Based on the postulates of quantum mechanics, we introduce a hierarchy of representations which meet uniqueness and target similarity criteria. To systematically control target similarity, we rely on interatomic many body expansions, as implemented in universal force-fields, including Bonding, Angular, and higher order terms (BA). Addition of higher order contributions systematically increases similarity to the true potential energy and predictive accuracy of the resulting ML models. We report numerical evidence for the performance of BAML models trained on molecular properties pre-calculated at the electron-correlated and density functional levels of theory for thousands of small organic molecules. Properties studied include enthalpies and free energies of atomization, heat capacity, zero-point vibrational energies, dipole moment, polarizability, HOMO/LUMO energies and gap, ionization potential, electron affinity, and electronic excitations. After training, BAML predicts energies or electronic properties of out-of-sample molecules with unprecedented accuracy and speed.

    Alchemical and structural distribution based representation for improved QML

    We introduce a representation of any atom in any chemical environment for the generation of efficient quantum machine learning (QML) models of common electronic ground-state properties. The representation is based on scaled distribution functions explicitly accounting for elemental and structural degrees of freedom. Resulting QML models afford very favorable learning curves for properties of out-of-sample systems including organic molecules, non-covalently bonded protein side-chains, (H₂O)₄₀-clusters, as well as diverse crystals. The elemental components help to lower the learning curves, and, through interpolation across the periodic table, even enable "alchemical extrapolation" to covalent bonding between elements not part of training, as evinced for single, double, and triple bonds among main-group elements.

    Toward transferable interatomic van der Waals interactions without electrons: The role of multipole electrostatics and many-body dispersion

    We estimate polarizabilities of atoms in molecules without electron density, using a Voronoi tessellation approach instead of conventional density partitioning schemes. The resulting atomic dispersion coefficients are calculated, as well as many-body dispersion effects on intermolecular potential energies. We also estimate contributions from multipole electrostatics and compare them to dispersion. We assess the performance of the resulting intermolecular interaction model from dispersion and electrostatics for more than 1,300 neutral and charged, small organic molecular dimers. Applications to water clusters, the benzene crystal, the anti-cancer drug ellipticine intercalated between two Watson-Crick DNA base pairs, as well as six macro-molecular host-guest complexes highlight the potential of this method and help to identify points of future improvement. The mean absolute error made by the combination of static electrostatics with many-body dispersion decreases at larger distances, while it plateaus for two-body dispersion, in conflict with the common assumption that the simple 1/R^6 correction will yield proper dissociative tails. Overall, the method achieves an accuracy well within that of conventional molecular force fields while exhibiting a simple parametrization protocol. Comment: 13 pages, 8 figures.

    Structure and band gaps of Ga-(V) semiconductors: The challenge of Ga pseudopotentials

    Design of gallium pseudopotentials has been investigated for use in density functional calculations of zinc-blende-type cubic phases of GaAs, GaP, and GaN. A converged construction with respect to all-electron results is described. Computed lattice constants, bulk moduli, and band gaps vary significantly depending on pseudopotential construction or exchange-correlation functional. The Kohn-Sham band gap of the Ga-(V) semiconductors exhibits a distinctive and strong sensitivity to lattice constant, with near-linear dependence of gap on lattice constant for larger lattice constants and a Gamma-X crossover that changes the slope of the dependence. This crossover occurs at approximately 98, 101, and 95% deviation from the equilibrium lattice constant for GaAs, GaP, and GaN, respectively.

    Non-covalent interactions across organic and biological subsets of chemical space: Physics-based potentials parametrized from machine learning

    Classical intermolecular potentials typically require an extensive parametrization procedure for any new compound considered. To do away with prior parametrization, we propose a combination of physics-based potentials with machine learning (ML), coined IPML, which is transferable across small neutral organic and biologically-relevant molecules. ML models provide on-the-fly predictions for environment-dependent local atomic properties: electrostatic multipole coefficients (with significant error reduction compared to previously reported values), the population and decay rate of valence atomic densities, and polarizabilities across conformations and chemical compositions of H, C, N, and O atoms. These parameters enable accurate calculations of intermolecular contributions: electrostatics, charge penetration, repulsion, induction/polarization, and many-body dispersion. Unlike other potentials, this model is transferable in its ability to handle new molecules and conformations without explicit prior parametrization: All local atomic properties are predicted from ML, leaving only eight global parameters, optimized once and for all across compounds. We validate IPML on various gas-phase dimers at and away from equilibrium separation, where we obtain mean absolute errors between 0.4 and 0.7 kcal/mol for several chemically and conformationally diverse datasets representative of non-covalent interactions in biologically-relevant molecules. We further focus on hydrogen-bonded complexes, essential but challenging due to their directional nature, where datasets of DNA base pairs and amino acids yield an extremely encouraging 1.4 kcal/mol error. Finally, and as a first look, we consider IPML in denser systems: water clusters, supramolecular host-guest complexes, and the benzene crystal. Comment: 15 pages, 9 figures.