FCHL revisited: Faster and more accurate quantum machine learning
We introduce the FCHL19 representation for atomic environments in molecules
or condensed-phase systems. Machine learning models based on FCHL19 are able to
yield predictions of atomic forces and energies of query compounds with
chemical accuracy on the scale of milliseconds. FCHL19 is a revision of our
previous work [Faber et al. 2018] where the representation is discretized and
the individual features are rigorously optimized using Monte Carlo
optimization. Combined with a Gaussian kernel function that incorporates
elemental screening, chemical accuracy is reached for energy learning on the
QM7b and QM9 datasets after training for minutes and hours, respectively. The
model also shows good performance for non-bonded interactions in the condensed
phase for a set of water clusters with an MAE binding energy error of less than
0.1 kcal/mol/molecule after training on 3,200 samples. For force learning on
the MD17 dataset, our optimized model similarly displays state-of-the-art
accuracy with a regressor based on Gaussian process regression. When the
revised FCHL19 representation is combined with the operator quantum machine
learning regressor, forces and energies can be predicted in only a few
milliseconds per atom. The model presented herein is fast and lightweight
enough for use in general chemistry problems as well as molecular dynamics
simulations.
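The regression step described in this abstract, a Gaussian kernel applied to representation vectors, can be illustrated with a generic kernel ridge regression sketch. This is not the FCHL19 representation or the authors' code: the one-dimensional input vectors stand in for real atomic-environment features, the elemental screening is omitted, and the function names, `sigma`, and `reg` are assumptions chosen for illustration.

```python
import numpy as np

def gaussian_kernel(a, b, sigma):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma**2))

def fit_krr(x_train, y_train, sigma, reg=1e-6):
    """Solve (K + reg * I) alpha = y for the regression coefficients."""
    k = gaussian_kernel(x_train, x_train, sigma)
    return np.linalg.solve(k + reg * np.eye(len(x_train)), y_train)

def predict_krr(x_train, alphas, x_query, sigma):
    """Predict a property for query vectors from the fitted coefficients."""
    return gaussian_kernel(x_query, x_train, sigma) @ alphas

# Placeholder data (an assumption): a smooth 1-D function plays the role of
# the energy as a function of the representation.
x_train = np.linspace(0.0, 2.0 * np.pi, 30)[:, None]
y_train = np.sin(x_train[:, 0])
alphas = fit_krr(x_train, y_train, sigma=0.5)

x_query = np.array([[1.0], [2.5]])
predictions = predict_krr(x_train, alphas, x_query, sigma=0.5)
errors = np.abs(predictions - np.sin(x_query[:, 0]))
```

Once the coefficients are fitted, each prediction costs one kernel row and one dot product, which is why prediction is much cheaper than training in this kind of model.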
IMPRESSION – prediction of NMR parameters for 3-dimensional chemical structures using machine learning with near quantum chemical accuracy
The IMPRESSION (Intelligent Machine PREdiction of Shift and Scalar
Information Of Nuclei) machine learning system provides an efficient and
accurate route to the prediction of NMR parameters from 3-dimensional chemical
structures. Here we demonstrate that machine learning predictions, trained on
quantum chemical computed values for NMR parameters, are essentially as
accurate but computationally much more efficient (tens of milliseconds per
molecule) than quantum chemical calculations (hours/days per molecule).
Training the machine learning systems on quantum chemical, rather than
experimental, data circumvents the need for existence of large, structurally
diverse, error-free experimental databases and makes IMPRESSION applicable to
solving 3-dimensional problems such as molecular conformation and isomerism.
Sur un théorème de Poncelet et sa généralisation par M. Horvarth
Most chemical transformations (reactions or conformational changes) that are of interest to researchers have many degrees of freedom, usually too many to visualize without reducing the dimensionality of the system to include only the most important atomic motions. In this article, we describe a method of using Principal Component Analysis (PCA) for analyzing a series of molecular geometries (e.g., a reaction pathway or molecular dynamics trajectory) and determining the reduced dimensional space that captures the most structural variance in the fewest dimensions. The software written to carry out this method is called PathReducer, which permits (1) visualizing the geometries in a reduced dimensional space, (2) determining the axes that make up the reduced dimensional space, and (3) projecting the series of geometries into the low-dimensional space for visualization. We investigated two options to represent molecular structures within PathReducer: aligned Cartesian coordinates and matrices of interatomic distances. We found that interatomic distance matrices better captured non-linear motions in a smaller number of dimensions. To demonstrate the utility of PathReducer, we have carried out a number of applications where we have projected molecular dynamics trajectories into a reduced dimensional space defined by an intrinsic reaction coordinate. The visualizations provided by this analysis show that dynamic paths can differ greatly from the minimum energy pathway on a potential energy surface. Viewing intrinsic reaction coordinates and trajectories in this way provides a quick way to gather qualitative information about the pathways trajectories take relative to a minimum energy path. 
Given that the outputs from PCA are linear combinations of the input molecular structure coordinates (i.e., Cartesian coordinates or interatomic distances), they can be easily transferred to other types of calculations that require the definition of a reduced dimensional space (e.g., biased molecular dynamics simulations).
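The core idea of this abstract, PCA applied to interatomic distances across a series of geometries, can be sketched in a few lines of NumPy. This is not the PathReducer code: the function names and the toy three-atom trajectory are assumptions for illustration, and PCA is computed here with a plain singular value decomposition.

```python
import numpy as np

def distance_features(coords):
    """Flatten the upper triangle of each frame's interatomic distance matrix.

    coords has shape (n_frames, n_atoms, 3).
    """
    diffs = coords[:, :, None, :] - coords[:, None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    i, j = np.triu_indices(coords.shape[1], k=1)
    return dists[:, i, j]

def pca(features, n_components=2):
    """Return projections, principal axes, and explained-variance ratios."""
    centered = features - features.mean(axis=0)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    ratio = s**2 / np.sum(s**2)
    axes = vt[:n_components]
    # Each axis is a linear combination of the input distances.
    projections = centered @ axes.T
    return projections, axes, ratio[:n_components]

# Toy trajectory (an assumption): a linear three-atom chain in which the
# first bond stretches from 1.0 to 2.0 over 20 frames.
frames = np.zeros((20, 3, 3))
frames[:, 0, 0] = -np.linspace(1.0, 2.0, 20)
frames[:, 2, 0] = 1.0

features = distance_features(frames)
projections, axes, ratio = pca(features, n_components=1)
```

In this toy case only one motion is present, so a single component carries essentially all of the variance, and its axis has no weight on the distance that stays fixed.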
A community-powered search of machine learning strategy space to find NMR property prediction models
The rise of machine learning (ML) has created an explosion in the potential
strategies for using data to make scientific predictions. For physical
scientists wishing to apply ML strategies to a particular domain, it can be
difficult to assess in advance what strategy to adopt within a vast space of
possibilities. Here we outline the results of an online community-powered
effort to swarm search the space of ML strategies and develop algorithms for
predicting atomic-pairwise nuclear magnetic resonance (NMR) properties in
molecules. Using an open-source dataset, we worked with Kaggle to design and
host a 3-month competition which received 47,800 ML model predictions from
2,700 teams in 84 countries. Within 3 weeks, the Kaggle community produced
models with comparable accuracy to our best previously published "in-house"
efforts. A meta-ensemble model constructed as a linear combination of the top
predictions has a prediction accuracy which exceeds that of any individual
model, 7-19x better than our previous state-of-the-art. The results highlight
the potential of transformer architectures for predicting quantum mechanical
(QM) molecular properties.
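A meta-ensemble built as a linear combination of model predictions can be sketched as a least-squares fit of one weight per model. The abstract does not say how the weights were obtained, so the fitting procedure, the synthetic data, and the three hypothetical models below are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic targets and three hypothetical models with independent noise.
truth = rng.normal(size=500)
noise_levels = (0.3, 0.4, 0.5)
preds = np.stack(
    [truth + rng.normal(scale=s, size=truth.size) for s in noise_levels],
    axis=1,
)

# Fit the combination weights on one half and evaluate on the other half.
fit, holdout = slice(0, 250), slice(250, 500)
weights, *_ = np.linalg.lstsq(preds[fit], truth[fit], rcond=None)
ensemble = preds[holdout] @ weights

def mae(predicted, target):
    """Mean absolute error between two arrays."""
    return float(np.mean(np.abs(predicted - target)))

individual_mae = [mae(preds[holdout, k], truth[holdout]) for k in range(3)]
ensemble_mae = mae(ensemble, truth[holdout])
```

When the models' errors are only partly correlated, the weighted combination averages some of the noise away, which is the usual reason an ensemble can beat its best member.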
Sonifying stochastic walks on biomolecular energy landscapes
Translating the complex, multi-dimensional data produced by simulations
of biomolecules into an intelligible form is a major challenge
in computational chemistry and biology. The so-called “free
energy landscape” is amongst the most fundamental concepts used
by scientists to understand both static and dynamic properties of
biomolecular systems. In this paper we use Markov models to
design a strategy for mapping features of this landscape to sonic
parameters, for use in conjunction with visual display techniques
such as structural animations and free energy diagrams. This allows
for concurrent visual display of the physical configuration of
a biomolecule and auditory display of characteristics of the corresponding
free energy landscape. The resulting sonification provides
information about the relative free energy features of a given
configuration, including its stability.
ProCS15: a DFT-based chemical shift predictor for backbone and Cβ atoms in proteins
We present ProCS15: a program that computes the isotropic chemical shielding values of backbone and Cβ atoms given a protein structure in less than a second. ProCS15 is based on around 2.35 million OPBE/6-31G(d,p)//PM6 calculations on tripeptides and small structural models of hydrogen bonding. The ProCS15-predicted chemical shielding values are compared to experimentally measured chemical shifts for Ubiquitin and the third IgG-binding domain of Protein G through linear regression and yield RMSD values of up to 2.2, 0.7, and 4.8 ppm for carbon, hydrogen, and nitrogen atoms, respectively. These RMSD values are very similar to the corresponding RMSD values computed using OPBE/6-31G(d,p) for the entire structure of each protein. These maximum RMSD values can be reduced by using NMR-derived structural ensembles of Ubiquitin. For example, for the largest ensemble the largest RMSD values are 1.7, 0.5, and 3.5 ppm for carbon, hydrogen, and nitrogen, respectively. The corresponding RMSD values predicted by several empirical chemical shift predictors range between 0.7–1.1, 0.2–0.4, and 1.8–2.8 ppm for carbon, hydrogen, and nitrogen atoms, respectively.
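The comparison described here, regressing experimental shifts on computed shieldings and reporting the RMSD of the fit, can be sketched as follows. The numbers are invented placeholders rather than ProCS15 output, and the function name is an assumption.

```python
import numpy as np

def regression_rmsd(shieldings, shifts):
    """Fit shifts = slope * shieldings + intercept and return the fit RMSD."""
    slope, intercept = np.polyfit(shieldings, shifts, deg=1)
    fitted = slope * shieldings + intercept
    rmsd = float(np.sqrt(np.mean((fitted - shifts) ** 2)))
    return float(slope), float(intercept), rmsd

# Placeholder values in ppm (an assumption, for illustration only).
shieldings = np.array([130.0, 125.0, 120.0, 115.0, 110.0])
shifts = np.array([52.1, 56.8, 62.3, 66.9, 72.0])

slope, intercept, rmsd = regression_rmsd(shieldings, shifts)
```

Because a chemical shift is measured relative to a reference shielding, the fitted slope is expected to be negative and close to -1; the regression absorbs the reference value and any systematic scaling error, so the RMSD reflects only the remaining scatter.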
Training Neural Nets to Learn Reactive Potential Energy Surfaces using Interactive Quantum Chemistry in Virtual Reality
Whilst the primary bottleneck in a number of computational workflows was, not
so long ago, processing power, the rise of machine learning
technologies has resulted in a paradigm shift which places increasing value on
issues related to data curation, i.e., data size, quality, bias, format, and
coverage. Increasingly, data-related issues are as important as the
algorithmic methods used to process and learn from the data. Here we introduce
an open source GPU-accelerated neural network (NN) framework for learning
reactive potential energy surfaces (PESs), and investigate the use of real-time
interactive ab initio molecular dynamics in virtual reality (iMD-VR) as a new
strategy for rapidly sampling geometries along reaction pathways which can be
used to train NNs to learn reactive PESs. Focussing on hydrogen abstraction
reactions of CN radical with isopentane, we compare the performance of NNs
trained using iMD-VR data versus NNs trained using a more traditional method,
namely molecular dynamics (MD) constrained to sample a predefined grid of
points along hydrogen abstraction reaction coordinates. Both the NN trained
using iMD-VR data and the NN trained using the constrained MD data reproduce
important qualitative features of the reactive PESs, such as a low and early
barrier to abstraction. Quantitatively, learning is sensitive to the training
dataset. Our results show that user-sampled structures obtained with the
quantum chemical iMD-VR machinery enable better sampling in the vicinity of the
minimum energy path (MEP). As a result, the NN trained on the iMD-VR data does
very well predicting energies in the vicinity of the MEP, but less well
predicting energies for 'off-path' structures. The NN trained on the
constrained MD data does better in predicting energies for 'off-path'
structures, given that it included a number of such structures in its training
set.