11 research outputs found

    FCHL revisited: Faster and more accurate quantum machine learning

    We introduce the FCHL19 representation for atomic environments in molecules or condensed-phase systems. Machine learning models based on FCHL19 are able to yield predictions of atomic forces and energies of query compounds with chemical accuracy on the scale of milliseconds. FCHL19 is a revision of our previous work [Faber et al. 2018] where the representation is discretized and the individual features are rigorously optimized using Monte Carlo optimization. Combined with a Gaussian kernel function that incorporates elemental screening, chemical accuracy is reached for energy learning on the QM7b and QM9 datasets after training for minutes and hours, respectively. The model also shows good performance for non-bonded interactions in the condensed phase for a set of water clusters with an MAE binding energy error of less than 0.1 kcal/mol/molecule after training on 3,200 samples. For force learning on the MD17 dataset, our optimized model similarly displays state-of-the-art accuracy with a regressor based on Gaussian process regression. When the revised FCHL19 representation is combined with the operator quantum machine learning regressor, forces and energies can be predicted in only a few milliseconds per atom. The model presented herein is fast and lightweight enough for use in general chemistry problems as well as molecular dynamics simulations
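As a rough illustration of regression with a Gaussian kernel, the sketch below fits kernel ridge regression to a toy one-dimensional function. It does not implement the FCHL19 representation or its elemental screening. The function names, the width `sigma`, the regularizer `reg`, and the sine target are all assumptions chosen for the example.

```python
import numpy as np

def gaussian_kernel(a, b, sigma):
    """Gaussian kernel matrix between row vectors of a and b."""
    sq_dist = (np.sum(a**2, axis=1)[:, None]
               + np.sum(b**2, axis=1)[None, :]
               - 2.0 * a @ b.T)
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma**2))

def fit_krr(x_train, y_train, sigma, reg=1e-6):
    """Solve (K + reg * I) alpha = y for the regression coefficients."""
    k = gaussian_kernel(x_train, x_train, sigma)
    return np.linalg.solve(k + reg * np.eye(len(x_train)), y_train)

def predict_krr(x_query, x_train, alphas, sigma):
    """Predict as a kernel-weighted sum over the training points."""
    return gaussian_kernel(x_query, x_train, sigma) @ alphas

# Toy data (hypothetical): learn sin(x) from 20 samples on [0, 3].
sigma = 0.5
x_train = np.linspace(0.0, 3.0, 20)[:, None]
y_train = np.sin(x_train[:, 0])
alphas = fit_krr(x_train, y_train, sigma)

x_query = np.linspace(0.1, 2.9, 50)[:, None]
y_pred = predict_krr(x_query, x_train, alphas, sigma)
```

In a molecular setting the rows of `x_train` would be replaced by representation vectors of atomic environments, and the targets by energies or forces.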

    IMPRESSION – prediction of NMR parameters for 3-dimensional chemical structures using machine learning with near quantum chemical accuracy

    The IMPRESSION (Intelligent Machine PREdiction of Shift and Scalar Information Of Nuclei) machine learning system provides an efficient and accurate route to the prediction of NMR parameters from 3-dimensional chemical structures. Here we demonstrate that machine learning predictions, trained on quantum chemical computed values for NMR parameters, are essentially as accurate as quantum chemical calculations (hours/days per molecule) but computationally much more efficient (tens of milliseconds per molecule). Training the machine learning systems on quantum chemical, rather than experimental, data circumvents the need for large, structurally diverse, error-free experimental databases and makes IMPRESSION applicable to solving 3-dimensional problems such as molecular conformation and isomerism

    Sur un théorème de Poncelet et sa généralisation par M. Horvarth

    Most chemical transformations (reactions or conformational changes) that are of interest to researchers have many degrees of freedom, usually too many to visualize without reducing the dimensionality of the system to include only the most important atomic motions. In this article, we describe a method of using Principal Component Analysis (PCA) for analyzing a series of molecular geometries (e.g., a reaction pathway or molecular dynamics trajectory) and determining the reduced dimensional space that captures the most structural variance in the fewest dimensions. The software written to carry out this method is called PathReducer, which permits (1) visualizing the geometries in a reduced dimensional space, (2) determining the axes that make up the reduced dimensional space, and (3) projecting the series of geometries into the low-dimensional space for visualization. We investigated two options to represent molecular structures within PathReducer: aligned Cartesian coordinates and matrices of interatomic distances. We found that interatomic distance matrices better captured non-linear motions in a smaller number of dimensions. To demonstrate the utility of PathReducer, we have carried out a number of applications where we have projected molecular dynamics trajectories into a reduced dimensional space defined by an intrinsic reaction coordinate. The visualizations provided by this analysis show that dynamic paths can differ greatly from the minimum energy pathway on a potential energy surface. Viewing intrinsic reaction coordinates and trajectories in this way provides a quick way to gather qualitative information about the pathways trajectories take relative to a minimum energy path. 
    Given that the outputs from PCA are linear combinations of the input molecular structure coordinates (i.e., Cartesian coordinates or interatomic distances), they can be easily transferred to other types of calculations that require the definition of a reduced dimensional space (e.g., biased molecular dynamics simulations)
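The PCA workflow described in this abstract can be sketched as follows, using interatomic distances as the structural representation. This is not the PathReducer code. The function names and the toy triatomic path are hypothetical, and the PCA is computed with a plain SVD.

```python
import numpy as np

def distance_features(geometries):
    """Convert geometries (n_frames, n_atoms, 3) into rows of unique interatomic distances."""
    geometries = np.asarray(geometries, dtype=float)
    n_atoms = geometries.shape[1]
    i, j = np.triu_indices(n_atoms, k=1)      # each atom pair once
    diffs = geometries[:, i, :] - geometries[:, j, :]
    return np.linalg.norm(diffs, axis=-1)     # shape (n_frames, n_pairs)

def pca(features, n_components=2):
    """Return projections, principal axes, and explained variance ratios."""
    centered = features - features.mean(axis=0)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]            # axes are linear combinations of the inputs
    projections = centered @ components.T
    explained = (s**2 / np.sum(s**2))[:n_components]
    return projections, components, explained

# Toy "pathway" (hypothetical): a triatomic in which two bonds stretch at different rates.
frames = np.array([
    [[0.0, 0.0, 0.0], [1.0 + x, 0.0, 0.0], [0.0, 1.0 + 0.5 * x * x, 0.0]]
    for x in np.linspace(0.0, 1.0, 20)
])
features = distance_features(frames)
projections, components, explained = pca(features, n_components=2)
```

Because each row of `components` is a set of coefficients over the input distances, a new geometry can be projected into the same space by centering its distance vector with the training mean and multiplying by `components.T`.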

    A community-powered search of machine learning strategy space to find NMR property prediction models

    The rise of machine learning (ML) has created an explosion in the potential strategies for using data to make scientific predictions. For physical scientists wishing to apply ML strategies to a particular domain, it can be difficult to assess in advance what strategy to adopt within a vast space of possibilities. Here we outline the results of an online community-powered effort to swarm search the space of ML strategies and develop algorithms for predicting atomic-pairwise nuclear magnetic resonance (NMR) properties in molecules. Using an open-source dataset, we worked with Kaggle to design and host a 3-month competition which received 47,800 ML model predictions from 2,700 teams in 84 countries. Within 3 weeks, the Kaggle community produced models with comparable accuracy to our best previously published "in-house" efforts. A meta-ensemble model constructed as a linear combination of the top predictions has a prediction accuracy which exceeds that of any individual model, 7-19x better than our previous state-of-the-art. The results highlight the potential of transformer architectures for predicting quantum mechanical (QM) molecular properties
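A meta-ensemble built as a linear combination of model predictions can be sketched as below. How the weights were actually chosen is not stated in the abstract, so fitting them by ordinary least squares on held-out targets is an assumption, as are the synthetic data and the function names.

```python
import numpy as np

def fit_blend_weights(predictions, targets):
    """Least-squares weights for combining the columns of predictions (n_samples, n_models)."""
    weights, *_ = np.linalg.lstsq(predictions, targets, rcond=None)
    return weights

def blend(predictions, weights):
    """Linear combination of the individual model predictions."""
    return predictions @ weights

def mse(a, b):
    """Mean squared error between two arrays."""
    return float(np.mean((a - b) ** 2))

# Synthetic example (hypothetical): three models with independent noise of different size.
rng = np.random.default_rng(0)
targets = rng.normal(size=500)
predictions = np.column_stack(
    [targets + rng.normal(scale=s, size=500) for s in (0.3, 0.4, 0.5)]
)

weights = fit_blend_weights(predictions, targets)
blended = blend(predictions, weights)
individual_errors = [mse(predictions[:, k], targets) for k in range(predictions.shape[1])]
blended_error = mse(blended, targets)
```

On the data used to fit the weights, the blended squared error cannot exceed that of any single model, since each single model corresponds to one particular choice of weights. Whether the gain carries over to unseen data depends on how correlated the models' errors are.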

    Sonifying stochastic walks on biomolecular energy landscapes

    Translating the complex, multi-dimensional data produced by simulations of biomolecules into an intelligible form is a major challenge in computational chemistry and biology. The so-called “free energy landscape” is amongst the most fundamental concepts used by scientists to understand both static and dynamic properties of biomolecular systems. In this paper we use Markov models to design a strategy for mapping features of this landscape to sonic parameters, for use in conjunction with visual display techniques such as structural animations and free energy diagrams. This allows for concurrent visual display of the physical configuration of a biomolecule and auditory display of characteristics of the corresponding free energy landscape. The resulting sonification provides information about the relative free energy features of a given configuration including its stability

    ProCS15: a DFT-based chemical shift predictor for backbone and Cβ atoms in proteins

    We present ProCS15: a program that computes the isotropic chemical shielding values of backbone and Cβ atoms given a protein structure in less than a second. ProCS15 is based on around 2.35 million OPBE/6-31G(d,p)//PM6 calculations on tripeptides and small structural models of hydrogen-bonding. The ProCS15-predicted chemical shielding values are compared to experimentally measured chemical shifts for Ubiquitin and the third IgG-binding domain of Protein G through linear regression and yield RMSD values of up to 2.2, 0.7, and 4.8 ppm for carbon, hydrogen, and nitrogen atoms, respectively. These RMSD values are very similar to the corresponding RMSD values computed using OPBE/6-31G(d,p) for the entire structure of each protein. These maximum RMSD values can be reduced by using NMR-derived structural ensembles of Ubiquitin. For example, for the largest ensemble the largest RMSD values are 1.7, 0.5, and 3.5 ppm for carbon, hydrogen, and nitrogen, respectively. The corresponding RMSD values predicted by several empirical chemical shift predictors range between 0.7–1.1, 0.2–0.4, and 1.8–2.8 ppm for carbon, hydrogen, and nitrogen atoms, respectively
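The comparison through linear regression mentioned in this abstract can be illustrated with a minimal sketch: fit a line mapping computed shieldings to measured shifts, then report the RMSD of the fit. This is not the ProCS15 code, the function name is hypothetical, and the numbers are made up for the example (chosen to be exactly linear so the expected result is unambiguous).

```python
import numpy as np

def regression_rmsd(shieldings, shifts):
    """Fit shifts = slope * shieldings + intercept and return (slope, intercept, rmsd)."""
    slope, intercept = np.polyfit(shieldings, shifts, deg=1)
    fitted = slope * shieldings + intercept
    rmsd = float(np.sqrt(np.mean((fitted - shifts) ** 2)))
    return float(slope), float(intercept), rmsd

# Made-up shieldings (ppm) and shifts generated from an exact linear relation.
shieldings = np.array([120.0, 125.5, 131.0, 140.2, 152.7])
shifts = 190.0 - 1.0 * shieldings

slope, intercept, rmsd = regression_rmsd(shieldings, shifts)
```

With real data the points scatter around the fitted line, and the RMSD of that scatter is the kind of quantity quoted per nucleus type in the abstract.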

    Training Neural Nets to Learn Reactive Potential Energy Surfaces using Interactive Quantum Chemistry in Virtual Reality

    Whilst the primary bottleneck in a number of computational workflows was, not so long ago, processing power, the rise of machine learning technologies has resulted in a paradigm shift which places increasing value on issues related to data curation - i.e., data size, quality, bias, format, and coverage. Increasingly, data-related issues are as important as the algorithmic methods used to process and learn from the data. Here we introduce an open source GPU-accelerated neural network (NN) framework for learning reactive potential energy surfaces (PESs), and investigate the use of real-time interactive ab initio molecular dynamics in virtual reality (iMD-VR) as a new strategy for rapidly sampling geometries along reaction pathways which can be used to train NNs to learn reactive PESs. Focussing on hydrogen abstraction reactions of CN radical with isopentane, we compare the performance of NNs trained using iMD-VR data versus NNs trained using a more traditional method, namely molecular dynamics (MD) constrained to sample a predefined grid of points along hydrogen abstraction reaction coordinates. Both the NN trained using iMD-VR data and the NN trained using the constrained MD data reproduce important qualitative features of the reactive PESs, such as a low and early barrier to abstraction. Quantitatively, learning is sensitive to the training dataset. Our results show that user-sampled structures obtained with the quantum chemical iMD-VR machinery enable better sampling in the vicinity of the minimum energy path (MEP). As a result, the NN trained on the iMD-VR data does very well predicting energies in the vicinity of the MEP, but less well predicting energies for 'off-path' structures. The NN trained on the constrained MD data does better in predicting energies for 'off-path' structures, given that it included a number of such structures in its training set