Big Data meets Quantum Chemistry Approximations: The Δ-Machine Learning Approach
Chemically accurate and comprehensive studies of the virtual space of all
possible molecules are severely limited by the computational cost of quantum
chemistry. We introduce a composite strategy that adds machine learning
corrections to computationally inexpensive approximate legacy quantum methods.
After training, highly accurate predictions of enthalpies, free energies,
entropies, and electron correlation energies are possible, for significantly
larger molecular sets than used for training. For thermochemical properties of
up to 16k constitutional isomers of C₇H₁₀O₂ we present numerical
evidence that chemical accuracy can be reached. We also predict electron
correlation energy in post Hartree-Fock methods, at the computational cost of
Hartree-Fock, and we establish a qualitative relationship between molecular
entropy and electron correlation. The transferability of our approach is
demonstrated, using semi-empirical quantum chemistry and machine learning
models trained on 1 and 10% of 134k organic molecules, to reproduce enthalpies
of all remaining molecules at density functional theory level of accuracy
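The composite strategy described in this abstract, a cheap baseline plus a learned correction, can be illustrated with a minimal sketch. It assumes kernel ridge regression with a Gaussian kernel and a synthetic one-dimensional descriptor. The function names, the kernel choice, the hyperparameters, and the toy "cheap" and "reference" functions are all illustrative assumptions, not the authors' setup.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """Gaussian kernel matrix between the row vectors of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def fit_delta(X, y_cheap, y_ref, sigma=1.0, lam=1e-6):
    """Fit kernel ridge weights on the difference (reference - cheap)."""
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y_ref - y_cheap)

def predict_delta(X_new, y_cheap_new, X, alpha, sigma=1.0):
    """Cheap baseline value plus the learned correction."""
    return y_cheap_new + gaussian_kernel(X_new, X, sigma) @ alpha

# Toy stand-ins (assumptions): a cheap method and a costly reference method.
def cheap(x):
    return x ** 2

def reference(x):
    return x ** 2 + 0.3 * np.sin(x)

x_train = np.linspace(0.0, 6.0, 20)
x_test = np.linspace(0.1, 5.9, 15)
X_train, X_test = x_train[:, None], x_test[:, None]

alpha = fit_delta(X_train, cheap(x_train), reference(x_train))
y_pred = predict_delta(X_test, cheap(x_test), X_train, alpha)

# Compare the uncorrected baseline with the corrected prediction.
mae_cheap = np.mean(np.abs(cheap(x_test) - reference(x_test)))
mae_delta = np.mean(np.abs(y_pred - reference(x_test)))
print(f"baseline MAE: {mae_cheap:.4f}, corrected MAE: {mae_delta:.4f}")
```

The model only has to learn the difference between the two levels of theory, which is typically smoother and smaller than the target property itself.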
Fast and Accurate Modeling of Molecular Atomization Energies with Machine Learning
We introduce a machine learning model to predict atomization energies of a
diverse set of organic molecules, based on nuclear charges and atomic positions
only. The problem of solving the molecular Schrödinger equation is mapped
onto a non-linear statistical regression problem of reduced complexity.
Regression models are trained on and compared to atomization energies computed
with hybrid density-functional theory. Cross-validation over more than seven
thousand small organic molecules yields a mean absolute error of ~10 kcal/mol.
Applicability is demonstrated for the prediction of molecular atomization
potential energy curves
Transcription Factor Efg1 Shows a Haploinsufficiency Phenotype in Modulating the Cell Wall Architecture and Immunogenicity of Candida albicans
The Candida albicans transcription factor Efg1 is known to be involved in many different cellular processes, including morphogenesis, general metabolism, and virulence. Here we show that besides its manifold roles, Efg1 also has a prominent effect on cell wall structure and composition, strongly affecting the structural glucan part. Deletion of only one allele of EFG1 already results in severe phenotypes for cell wall biogenesis, comparable to those with deletion of both alleles, indicative of a severe haploinsufficiency for EFG1. The observed defects in structural setup of the cell wall, together with previously reported alterations in expression of cell surface proteins, result in altered immunogenic properties of strains with compromised Efg1 function. This is shown by interaction studies with macrophages and primary dendritic cells. The structural changes in the cell wall carbohydrate meshwork presented here, together with the manifold changes in cell wall protein composition and metabolism reported in other studies, contribute to the altered immune response mounted by innate immune cells and to the altered virulence phenotypes observed for strains lacking EFG1
Machine Learning for Quantum Mechanical Properties of Atoms in Molecules
We introduce machine learning models of quantum mechanical observables of
atoms in molecules. Instant out-of-sample predictions for proton and carbon
nuclear chemical shifts, atomic core level excitations, and forces on atoms
reach accuracies on par with density functional theory reference. Locality is
exploited within non-linear regression via local atom-centered coordinate
systems. The approach is validated on a diverse set of 9k small organic
molecules. Linear scaling of computational cost in system size is demonstrated
for saturated polymers with up to sub-mesoscale lengths
Machine Learning of Molecular Electronic Properties in Chemical Compound Space
The combination of modern scientific computing with electronic structure
theory can lead to an unprecedented amount of data amenable to intelligent data
analysis for the identification of meaningful, novel, and predictive
structure-property relationships. Such relationships enable high-throughput
screening for relevant properties in an exponentially growing pool of virtual
compounds that are synthetically accessible. Here, we present a machine
learning (ML) model, trained on a database of ab initio calculation
results for thousands of organic molecules, that simultaneously predicts
multiple electronic ground- and excited-state properties. The properties
include atomization energy, polarizability, frontier orbital eigenvalues,
ionization potential, electron affinity, and excitation energies. The ML model
is based on a deep multi-task artificial neural network, exploiting underlying
correlations between various molecular properties. The input is identical to
ab initio methods, i.e., nuclear charges and Cartesian coordinates
of all atoms. For small organic molecules the accuracy of such a "Quantum
Machine" is similar, and sometimes superior, to modern quantum-chemical
methods, at negligible computational cost
Fourier series of atomic radial distribution functions: A molecular fingerprint for machine learning models of quantum chemical properties
We introduce a fingerprint representation of molecules based on a Fourier
series of atomic radial distribution functions. This fingerprint is unique
(except for chirality), continuous, and differentiable with respect to atomic
coordinates and nuclear charges. It is invariant with respect to translation,
rotation, and nuclear permutation, and requires no pre-conceived knowledge
about chemical bonding, topology, or electronic orbitals. As such it meets many
important criteria for a good molecular representation, suggesting its
usefulness for machine learning models of molecular properties trained across
chemical compound space. To assess the performance of this new descriptor we
have trained machine learning models of molecular enthalpies of atomization for
training sets with up to 10k organic molecules, drawn at random from a
published set of 134k organic molecules. We validate the descriptor on all
remaining molecules of the 134k set. For a training set of 5k molecules the
fingerprint descriptor achieves a mean absolute error of 8.0 kcal/mol.
This is slightly worse than the performance attained using the
Coulomb matrix, another popular alternative, reaching 6.2 kcal/mol for the same
training and test sets
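For reference, the Coulomb matrix used as the comparison descriptor is commonly defined with diagonal entries 0.5 Z_i^2.4 and off-diagonal entries Z_i Z_j / |R_i - R_j|. The following is a minimal sketch assuming atomic units. The helper name and the rough water-like geometry are illustrative assumptions, not taken from the abstract.

```python
import numpy as np

def coulomb_matrix(charges, coords):
    """Coulomb matrix from nuclear charges Z and Cartesian coordinates R."""
    Z = np.asarray(charges, dtype=float)
    R = np.asarray(coords, dtype=float)
    dist = np.linalg.norm(R[:, None, :] - R[None, :, :], axis=-1)
    np.fill_diagonal(dist, 1.0)          # placeholder to avoid division by zero
    M = np.outer(Z, Z) / dist            # off-diagonal: Z_i Z_j / |R_i - R_j|
    np.fill_diagonal(M, 0.5 * Z ** 2.4)  # diagonal: 0.5 * Z_i^2.4
    return M

# Rough water-like geometry in bohr (illustrative values only).
Z = [8, 1, 1]
R = np.array([[0.0, 0.0, 0.0],
              [1.43, 1.11, 0.0],
              [-1.43, 1.11, 0.0]])

M = coulomb_matrix(Z, R)

# The sorted eigenvalue spectrum does not depend on the atom ordering.
perm = [2, 0, 1]
M_perm = coulomb_matrix([Z[i] for i in perm], R[perm])
spectrum = np.sort(np.linalg.eigvalsh(M))
spectrum_perm = np.sort(np.linalg.eigvalsh(M_perm))
```

The matrix itself changes when atoms are reordered, which is why a sorted eigenvalue spectrum or a sorted row ordering is often used to obtain a permutation-invariant input.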
The Separable Kernel of Nucleon-Nucleon Interaction in the Bethe-Salpeter Approach
The dispersion relations for the nucleon-nucleon (NN) T-matrix are considered
in the framework of the Bethe-Salpeter equation for a system of two spin
one-half particles with a separable interaction kernel. The resulting
expressions are applied to construct the separable interaction kernel for S
partial waves in the singlet and triplet channels. We calculate the low-energy
scattering parameters, the phase shifts, and the deuteron binding energy with
the separable interaction. The approach can be easily extended to higher
partial waves for NN scattering and to other reactions (anti-N N and pi N
scattering). Comment: RevTeX 4 style, 9 pages, 1 figure
Kernel learning for ligand-based virtual screening: discovery of a new PPARγ agonist
Poster presentation at 5th German Conference on Cheminformatics: 23. CIC-Workshop, Goslar, Germany, 8-10 November 2009. We demonstrate the theoretical and practical application of modern kernel-based machine learning methods to ligand-based virtual screening by successful prospective screening for novel agonists of the peroxisome proliferator-activated receptor gamma (PPARgamma) [1]. PPARgamma is a nuclear receptor involved in lipid and glucose metabolism, and related to type-2 diabetes and dyslipidemia. Applied methods included a graph kernel designed for molecular similarity analysis [2], kernel principal component analysis [3], multiple kernel learning [4], and Gaussian process regression [5]. In the machine learning approach to ligand-based virtual screening, one uses the similarity principle [6] to identify potentially active compounds based on their similarity to known reference ligands. Kernel-based machine learning [7] uses the "kernel trick", a systematic approach to the derivation of non-linear versions of linear algorithms like separating hyperplanes and regression. Prerequisites for kernel learning are similarity measures with the mathematical property of positive semidefiniteness (kernels). The iterative similarity optimal assignment graph kernel (ISOAK) [2] is defined directly on the annotated structure graph, and was designed specifically for the comparison of small molecules. In our virtual screening study, its use improved results, e.g., in principal component analysis-based visualization and Gaussian process regression. Following a thorough retrospective validation using a data set of 176 published PPARgamma agonists [8], we screened a vendor library for novel agonists. Subsequent testing of 15 compounds in a cell-based transactivation assay [9] yielded four active compounds.
The most interesting hit, a natural product derivative with cyclobutane scaffold, is a full selective PPARgamma agonist (EC50 = 10 ± 0.2 microM, inactive on PPARalpha and PPARbeta/delta at 10 microM). We demonstrate how the interplay of several modern kernel-based machine learning approaches can successfully improve ligand-based virtual screening results
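The positive semidefiniteness requirement stated in this abstract can be checked numerically on a Gram matrix. The sketch below is an illustration only: it uses a Gaussian kernel on random vectors as a stand-in (not the ISOAK graph kernel), and the function names and tolerance are assumptions.

```python
import numpy as np

def gaussian_gram(X, sigma=1.0):
    """Gram matrix of the Gaussian kernel for the rows of X."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def is_psd(K, tol=1e-10):
    """True if the symmetrized matrix has no eigenvalue below -tol."""
    K = 0.5 * (K + K.T)
    return bool(np.linalg.eigvalsh(K).min() >= -tol)

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))  # six hypothetical compounds, three features each

# A valid kernel: the Gaussian Gram matrix is positive semidefinite.
K_valid = gaussian_gram(X)

# A plausible-looking similarity that is not a kernel: negative distance.
dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
K_invalid = -dist

print(is_psd(K_valid), is_psd(K_invalid))
```

A similarity measure that fails this check cannot be used directly with the kernel trick, because the implicit feature space it relies on does not exist.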