
    Big Data meets Quantum Chemistry Approximations: The Δ-Machine Learning Approach

    Chemically accurate and comprehensive studies of the virtual space of all possible molecules are severely limited by the computational cost of quantum chemistry. We introduce a composite strategy that adds machine learning corrections to computationally inexpensive approximate legacy quantum methods. After training, highly accurate predictions of enthalpies, free energies, entropies, and electron correlation energies are possible for molecular sets significantly larger than those used for training. For thermochemical properties of up to 16k constitutional isomers of C₇H₁₀O₂, we present numerical evidence that chemical accuracy can be reached. We also predict electron correlation energies of post-Hartree-Fock methods at the computational cost of Hartree-Fock, and we establish a qualitative relationship between molecular entropy and electron correlation. We demonstrate the transferability of our approach by using semi-empirical quantum chemistry and machine learning models trained on 1% and 10% of 134k organic molecules to reproduce the enthalpies of all remaining molecules at the accuracy of density functional theory.
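    The Δ-learning idea can be illustrated with a small sketch: a regression model is trained only on the difference between an expensive reference property and a cheap baseline, and each prediction adds that learned correction back onto the baseline value. The learner is assumed here to be kernel ridge regression with a Gaussian kernel; the descriptor vectors, the kernel width `sigma`, the regularization `lam`, and all numbers are hypothetical toy values, not the authors' data, descriptors, or settings.

```python
import math

# Assumption: kernel ridge regression (KRR) with a Gaussian kernel is the learner;
# descriptors, sigma and lam below are illustrative toy choices, not published settings.

def gaussian_kernel(a, b, sigma):
    """Gaussian (RBF) kernel between two descriptor vectors."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def fit_delta(X, baseline, target, sigma=1.0, lam=1e-8):
    """Fit KRR to the correction (target - baseline) rather than to target itself."""
    delta = [t - b for t, b in zip(target, baseline)]
    n = len(X)
    K = [[gaussian_kernel(X[i], X[j], sigma) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(K, delta)  # regression weights from (K + lam*I) alpha = delta
    return {"X": X, "alpha": alpha, "sigma": sigma}

def predict(model, x, baseline_value):
    """Cheap baseline value plus the learned Δ correction."""
    correction = sum(a * gaussian_kernel(xi, x, model["sigma"])
                     for a, xi in zip(model["alpha"], model["X"]))
    return baseline_value + correction

# Toy usage: 1-D descriptors, a cheap "baseline" and an expensive "target" property.
X = [[0.0], [1.0], [2.0], [3.0]]
baseline = [1.0, 2.0, 3.0, 4.0]
target = [1.1, 2.3, 2.9, 4.2]
model = fit_delta(X, baseline, target)
print(predict(model, [1.5], 2.5))
```

    Learning the correction instead of the full property is the design choice that matters here: far from any training point the kernel terms vanish and the prediction falls back to the baseline method rather than to zero.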

    Fast and Accurate Modeling of Molecular Atomization Energies with Machine Learning

    We introduce a machine learning model to predict atomization energies of a diverse set of organic molecules, based on nuclear charges and atomic positions only. The problem of solving the molecular Schrödinger equation is mapped onto a non-linear statistical regression problem of reduced complexity. Regression models are trained on and compared to atomization energies computed with hybrid density-functional theory. Cross-validation over more than seven thousand small organic molecules yields a mean absolute error of ~10 kcal/mol. Applicability is demonstrated for the prediction of molecular atomization potential energy curves.

    Transcription Factor Efg1 Shows a Haploinsufficiency Phenotype in Modulating the Cell Wall Architecture and Immunogenicity of Candida albicans

    The Candida albicans transcription factor Efg1 is known to be involved in many different cellular processes, including morphogenesis, general metabolism, and virulence. Here we show that, besides its manifold roles, Efg1 also has a prominent effect on cell wall structure and composition, strongly affecting the structural glucan part. Deletion of only one allele of EFG1 already results in severe cell wall biogenesis phenotypes, comparable to those caused by deletion of both alleles, which indicates a severe haploinsufficiency for EFG1. The observed defects in the structural setup of the cell wall, together with previously reported alterations in the expression of cell surface proteins, result in altered immunogenic properties of strains with compromised Efg1 function. This is shown by interaction studies with macrophages and primary dendritic cells. The structural changes in the cell wall carbohydrate meshwork presented here, together with the manifold changes in cell wall protein composition and metabolism reported in other studies, contribute to the altered immune response mounted by innate immune cells and to the altered virulence phenotypes observed for strains lacking EFG1.

    Machine Learning for Quantum Mechanical Properties of Atoms in Molecules

    We introduce machine learning models of quantum mechanical observables of atoms in molecules. Instant out-of-sample predictions for proton and carbon nuclear chemical shifts, atomic core-level excitations, and forces on atoms reach accuracies on par with the density functional theory reference. Locality is exploited within non-linear regression via local atom-centered coordinate systems. The approach is validated on a diverse set of 9k small organic molecules. Linear scaling of computational cost with system size is demonstrated for saturated polymers with lengths up to the sub-mesoscale.

    Amazon.com, Inc.: Retailing Giant to High-Tech Player?


    Machine Learning of Molecular Electronic Properties in Chemical Compound Space

    The combination of modern scientific computing with electronic structure theory can produce an unprecedented amount of data amenable to intelligent data analysis for identifying meaningful, novel, and predictive structure-property relationships. Such relationships enable high-throughput screening for relevant properties in an exponentially growing pool of synthetically accessible virtual compounds. Here, we present a machine learning (ML) model, trained on a database of ab initio calculation results for thousands of organic molecules, that simultaneously predicts multiple electronic ground- and excited-state properties. The properties include atomization energy, polarizability, frontier orbital eigenvalues, ionization potential, electron affinity, and excitation energies. The ML model is based on a deep multi-task artificial neural network that exploits underlying correlations between the various molecular properties. The input is identical to that of ab initio methods, i.e., the nuclear charges and Cartesian coordinates of all atoms. For small organic molecules, the accuracy of such a "Quantum Machine" is similar to, and sometimes better than, that of modern quantum-chemical methods, at negligible computational cost.

    Fourier series of atomic radial distribution functions: A molecular fingerprint for machine learning models of quantum chemical properties

    We introduce a fingerprint representation of molecules based on a Fourier series of atomic radial distribution functions. This fingerprint is unique (except for chirality), continuous, and differentiable with respect to atomic coordinates and nuclear charges. It is invariant with respect to translation, rotation, and nuclear permutation, and requires no preconceived knowledge about chemical bonding, topology, or electronic orbitals. As such, it meets many important criteria for a good molecular representation, suggesting its usefulness for machine learning models of molecular properties trained across chemical compound space. To assess the performance of this new descriptor, we have trained machine learning models of molecular enthalpies of atomization for training sets of up to 10k organic molecules, drawn at random from a published set of 134k organic molecules. We validate the descriptor on all remaining molecules of the 134k set. For a training set of 5k molecules, the fingerprint descriptor achieves a mean absolute error of 8.0 kcal/mol. This is slightly worse than the performance attained with the Coulomb matrix, another popular representation, which reaches 6.2 kcal/mol for the same training and test sets.
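    The abstract benchmarks the new fingerprint against the Coulomb matrix. As a reference point, the sketch below builds the Coulomb matrix as it is commonly defined (diagonal entries 0.5·Z_I^2.4, off-diagonal entries Z_I·Z_J/|R_I − R_J|, atomic units assumed) and makes it permutation-invariant by sorting atoms by row norm. The row-norm sorting is one of several conventions in use and is an assumption here, the three-atom geometry is hypothetical, and the code does not implement the Fourier-series fingerprint itself.

```python
import math

def coulomb_matrix(charges, coords):
    """Coulomb matrix: 0.5*Z_I**2.4 on the diagonal, Z_I*Z_J/|R_I - R_J| elsewhere."""
    n = len(charges)
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                M[i][j] = 0.5 * charges[i] ** 2.4
            else:
                M[i][j] = charges[i] * charges[j] / math.dist(coords[i], coords[j])
    return M

def sorted_coulomb_vector(charges, coords):
    """Permutation-invariant descriptor (assumed convention): reorder atoms by
    descending row norm, then flatten the upper triangle including the diagonal."""
    M = coulomb_matrix(charges, coords)
    norms = [math.sqrt(sum(v * v for v in row)) for row in M]
    order = sorted(range(len(M)), key=lambda i: -norms[i])
    return [M[order[i]][order[j]] for i in range(len(M)) for j in range(i, len(M))]

# Toy usage: a hypothetical, asymmetric three-atom geometry (coordinates in bohr).
charges = [6, 7, 1]
coords = [(0.0, 0.0, 0.0), (2.2, 0.0, 0.0), (-1.0, 1.5, 0.3)]
print(sorted_coulomb_vector(charges, coords))
```

    Because only interatomic distances enter, the result is invariant to translation and rotation; the sorting step adds permutation invariance, although atoms with exactly equal row norms can make that ordering ambiguous.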

    The Separable Kernel of Nucleon-Nucleon Interaction in the Bethe-Salpeter Approach

    The dispersion relations for the nucleon-nucleon (NN) T-matrix are considered in the framework of the Bethe-Salpeter equation for a system of two spin-one-half particles with a separable interaction kernel. The resulting expressions are applied to construct a separable interaction kernel for the S partial waves in the singlet and triplet channels. With this separable interaction, we calculate the low-energy scattering parameters, the phase shifts, and the deuteron binding energy. The approach can easily be extended to higher partial waves in NN scattering and to other reactions (antinucleon-nucleon and pion-nucleon scattering). Comment: RevTeX 4 style, 9 pages, 1 figure.

    Kernel learning for ligand-based virtual screening: discovery of a new PPARγ agonist

    Poster presentation at the 5th German Conference on Cheminformatics (23rd CIC Workshop), Goslar, Germany, 8-10 November 2009.
    We demonstrate the theoretical and practical application of modern kernel-based machine learning methods to ligand-based virtual screening through the successful prospective screening for novel agonists of the peroxisome proliferator-activated receptor gamma (PPARγ) [1]. PPARγ is a nuclear receptor involved in lipid and glucose metabolism and is associated with type 2 diabetes and dyslipidemia. The methods applied included a graph kernel designed for molecular similarity analysis [2], kernel principal component analysis [3], multiple kernel learning [4], and Gaussian process regression [5]. In the machine learning approach to ligand-based virtual screening, the similarity principle [6] is used to identify potentially active compounds based on their similarity to known reference ligands. Kernel-based machine learning [7] uses the "kernel trick", a systematic approach to deriving non-linear versions of linear algorithms such as separating hyperplanes and regression. A prerequisite for kernel learning is a similarity measure with the mathematical property of positive semidefiniteness (a kernel). The iterative similarity optimal assignment graph kernel (ISOAK) [2] is defined directly on the annotated structure graph and was designed specifically for comparing small molecules. In our virtual screening study, its use improved results, e.g., in principal component analysis-based visualization and in Gaussian process regression. Following a thorough retrospective validation using a data set of 176 published PPARγ agonists [8], we screened a vendor library for novel agonists. Subsequent testing of 15 compounds in a cell-based transactivation assay [9] yielded four active compounds. The most interesting hit, a natural product derivative with a cyclobutane scaffold, is a selective full PPARγ agonist (EC₅₀ = 10 ± 0.2 µM; inactive on PPARα and PPARβ/δ at 10 µM). We demonstrate how the interplay of several modern kernel-based machine learning approaches can successfully improve ligand-based virtual screening results.
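    Gaussian process regression and the positive-semidefiniteness prerequisite named above can be illustrated together. The sketch below is a minimal GP regressor that returns a predictive mean and variance via a Cholesky factorization, which fails exactly when the regularized kernel matrix is not positive definite. It assumes molecules are already encoded as numeric descriptor vectors and substitutes a plain RBF kernel for the ISOAK graph kernel used in the study; the length scale, noise level, and toy data are hypothetical.

```python
import math

# Assumption: an RBF kernel on numeric descriptors stands in for the ISOAK graph kernel.

def rbf(a, b, length=1.0):
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * d2 / length ** 2)

def cholesky(A):
    """Lower-triangular L with A = L L^T; raises if A is not positive definite."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("kernel matrix is not positive definite")
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L

def forward(L, b):
    """Solve L z = b (forward substitution)."""
    z = []
    for i in range(len(L)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z

def backward(L, z):
    """Solve L^T x = z (back substitution)."""
    n = len(L)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_predict(X, y, x_star, noise=1e-2, length=1.0):
    """Posterior mean and latent variance of a zero-mean GP at x_star."""
    n = len(X)
    K = [[rbf(X[i], X[j], length) + (noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    L = cholesky(K)
    alpha = backward(L, forward(L, y))  # alpha = K^{-1} y
    k_star = [rbf(xi, x_star, length) for xi in X]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = forward(L, k_star)
    var = rbf(x_star, x_star, length) - sum(t * t for t in v)
    return mean, var

# Toy usage: two hypothetical descriptors with made-up activity values.
X = [[0.0], [1.0]]
y = [1.0, 2.0]
print(gp_predict(X, y, [0.5]))
```

    The predictive variance is what makes GP regression attractive for screening: it is small near known ligands and approaches the prior variance for compounds unlike anything in the training set.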