Genetic optimization of training sets for improved machine learning models of molecular properties
Training statistical machine learning models of quantum mechanical molecular properties requires large datasets that exemplify the map from chemical structure to molecular property. Intelligent a priori selection of training examples is often difficult or impossible, because prior knowledge may be sparse or unavailable. Ordinarily, representative training molecules are therefore selected from such datasets by random sampling. We use genetic algorithms to optimize the composition of training sets drawn from tens of thousands of small organic molecules. The resulting machine learning models are considerably more accurate than models trained on small, randomly selected training sets: mean absolute errors of out-of-sample predictions are reduced to ~25% for enthalpies, free energies, and zero-point vibrational energy, to ~50% for heat capacity, electron spread, and polarizability, and by more than ~20% for electronic properties such as frontier orbital eigenvalues and dipole moments. We present and discuss optimized training sets consisting of 10 molecular classes for all molecular properties studied. We show that these classes can be used to design improved training sets for machine learning models of the same properties in similar but distinct molecular datasets.
Comment: 9 pages, 6 figures
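The genetic-algorithm idea above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: each individual is a fixed-size subset of molecule indices, children inherit molecules from the union of two parents, mutation swaps members for random molecules outside the subset, and the best half of each generation survives unchanged. The function name `evolve_training_set` and all of its parameters are hypothetical, and `fitness` is a placeholder for training a model on the subset and returning its out-of-sample mean absolute error.

```python
import random
from typing import Callable, List, Sequence


def evolve_training_set(
    pool_size: int,
    subset_size: int,
    fitness: Callable[[Sequence[int]], float],
    population: int = 30,
    generations: int = 50,
    mutation_rate: float = 0.1,
    seed: int = 0,
) -> List[int]:
    """Hypothetical GA: evolve `subset_size` indices from range(pool_size), minimizing `fitness`."""
    rng = random.Random(seed)

    def random_subset() -> List[int]:
        return sorted(rng.sample(range(pool_size), subset_size))

    def crossover(a: Sequence[int], b: Sequence[int]) -> List[int]:
        # The child draws subset_size molecules from the union of both parents.
        union = sorted(set(a) | set(b))
        return sorted(rng.sample(union, subset_size))

    def mutate(subset: Sequence[int]) -> List[int]:
        # Each member is swapped, with probability mutation_rate, for a random
        # molecule not yet in the subset; the subset size stays constant.
        members = set(subset)
        outside = [i for i in range(pool_size) if i not in members]
        for member in sorted(members):
            if outside and rng.random() < mutation_rate:
                members.remove(member)
                members.add(outside.pop(rng.randrange(len(outside))))
        return sorted(members)

    pop = [random_subset() for _ in range(population)]
    for _ in range(generations):
        pop.sort(key=fitness)  # lower is better, e.g. validation MAE
        elite = pop[: max(2, population // 2)]  # elitism: best half survives
        children = []
        while len(elite) + len(children) < population:
            a, b = rng.sample(elite, 2)
            children.append(mutate(crossover(a, b)))
        pop = elite + children
    return min(pop, key=fitness)


# Toy stand-in fitness (prefers small indices); in practice, train a model on
# the subset and return its out-of-sample error.
best = evolve_training_set(100, 10, fitness=sum)
print(best)
```

Elitism keeps the best subset found so far, so the returned fitness never worsens across generations; in the setting described above, the real cost lies in `fitness`, which would mean retraining a model for every candidate subset.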
Cryogenic Spectroscopy and Quantum Molecular Dynamics Determine the Structure of Cyclic Intermediates Involved in Peptide Sequence Scrambling
Collision-induced dissociation (CID) is a key technique in mass spectrometry-based peptide sequencing. Collisionally activated peptides undergo statistical dissociation, forming a series of backbone fragment ions that reflect their amino acid (AA) sequence. Some of these fragments may undergo “head-to-tail” cyclization, which, after proton migration, can lead to the ring opening at a different bond than the one initially formed. This process scrambles the AA sequence and may hinder sequencing of the original peptide. Here we combine cryogenic ion spectroscopy and ab initio molecular dynamics simulations to isolate and characterize the precise structures of key intermediates in the scrambling process. The most stable peptide fragments show intriguing symmetric cyclic structures in which the proton lies on a C2 symmetry axis and forms exceptionally short H-bonds (1.20 Å) with two backbone oxygens. Other, non-symmetric cyclic structures also exist, one of which is protonated on the amide nitrogen, where ring opening is likely to occur.
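The scrambling mechanism described above has a simple combinatorial core: once a fragment cyclizes head to tail, reopening it at another amide bond gives a cyclic rotation of the original residue order. The sketch below assumes one-letter residue codes and ignores masses, protonation sites, and energetics, so it lists every possible opening rather than the ones that actually occur. It enumerates those rotations and the residue pairs that appear only in scrambled forms. The function names and the example sequence YAGFL are illustrative choices, not taken from the study.

```python
from typing import List, Set


def ring_openings(sequence: str) -> List[str]:
    """Linear sequences from reopening a head-to-tail cyclized fragment at each amide bond."""
    # Opening the ring before residue i yields the rotation starting at residue i;
    # i = 0 reproduces the original order.
    return [sequence[i:] + sequence[:i] for i in range(len(sequence))]


def scrambled_motifs(sequence: str, length: int = 2) -> Set[str]:
    """Contiguous motifs present only in reopened (scrambled) forms, never in the original."""
    original = {sequence[i:i + length] for i in range(len(sequence) - length + 1)}
    motifs = set()
    for linear in ring_openings(sequence)[1:]:
        for i in range(len(linear) - length + 1):
            motif = linear[i:i + length]
            if motif not in original:
                motifs.add(motif)
    return motifs


print(ring_openings("GPGG"))      # ['GPGG', 'PGGG', 'GGGP', 'GGPG']
print(scrambled_motifs("YAGFL"))  # {'LY'}
```

The motif "LY" joins the original C- and N-terminal residues, a pairing that a sequencing algorithm assuming a linear peptide would misread; this is the kind of ambiguity the abstract says scrambling introduces.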
Infrared Spectroscopy of Mobility-Selected H+-Gly-Pro-Gly-Gly (GPGG)
We report the first results from a new instrument capable of acquiring infrared spectra of mobility-selected ions. As a demonstration, we use ion mobility to first separate the protonated peptide Gly-Pro-Gly-Gly (GPGG) into two conformational families with collision cross sections of 93.8 and 96.8 Å². After separation, each family is independently analyzed by acquiring the infrared predissociation spectrum of the H2-tagged molecules. The ion mobility and spectroscopic data, combined with density functional theory (DFT)-based molecular dynamics simulations, confirm the presence of one major conformer per family, arising from cis/trans isomerization about the proline residue. We induce isomerization between the two conformers by collisional activation in the drift tube and monitor the evolution of the ion distribution with ion mobility and infrared spectroscopy. Although the cis-proline species is the preferred gas-phase structure, its relative population in the initial ion mobility drift distribution is smaller than that of the trans-proline species. This suggests that a portion of the trans-proline ion population is kinetically trapped as a higher-energy conformer and may retain structural elements from solution.
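The collision cross sections quoted above are derived from measured ion mobilities. The abstract does not say how; assuming a drift-tube measurement in the low-field limit, the standard Mason–Schamp relation gives the conversion, sketched here. The function name and argument conventions are hypothetical, and no experimental conditions from the study are implied.

```python
import math

K_B = 1.380649e-23          # Boltzmann constant, J/K
E_CHARGE = 1.602176634e-19  # elementary charge, C
DALTON = 1.66053906660e-27  # unified atomic mass unit, kg


def mason_schamp_ccs(mobility, charge, ion_mass_da, gas_mass_da, temperature_k, pressure_pa):
    """Collision cross section in Å^2 from a measured mobility K in m^2 V^-1 s^-1.

    Low-field Mason-Schamp relation (assumed here):
        CCS = (3 z e / (16 N)) * sqrt(2 pi / (mu k_B T)) / K
    where N = p / (k_B T) is the buffer-gas number density and mu the ion-gas reduced mass.
    """
    number_density = pressure_pa / (K_B * temperature_k)
    mu = ion_mass_da * gas_mass_da / (ion_mass_da + gas_mass_da) * DALTON
    ccs_m2 = (3 * charge * E_CHARGE / (16 * number_density)) \
        * math.sqrt(2 * math.pi / (mu * K_B * temperature_k)) / mobility
    return ccs_m2 * 1e20  # 1 m^2 = 1e20 Å^2
```

Because the cross section is inversely proportional to the mobility, the roughly 3% difference between 93.8 and 96.8 Å² corresponds to a roughly 3% difference in mobility for two conformers of the same ion under identical drift conditions.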
From a week to less than a day: Speedup and scaling of coordinate-scaled exact exchange calculations in plane waves
Exact exchange is an essential ingredient in Kohn–Sham density functional theory-based molecular dynamics (MD) simulations whenever thermodynamic properties, kinetics, barrier heights, or excitation energies must be predicted with high accuracy. However, the cost of such calculations is often prohibitive, restricting first-principles MD to (semi-)local density functionals, particularly in a plane-wave basis. We have recently proposed using coordinate-scaled orbitals to reduce the cost of the most expensive Fourier transforms in the calculation of the exact exchange potential of isolated systems. Here, we present the implementation and parallelisation of this coordinate-scaling approach in the CPMD code and analyse its performance under different parallelisation schemes. We show that speedups increase with system size and that, with an optimal configuration, speedups of up to one order of magnitude over conventional calculations are possible. Simulations that previously took one week can therefore be finished in less than a day.
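A rough cost model helps read the speedup claims. It assumes, as is standard for isolated systems with real orbitals, that exact exchange needs one Poisson solve (a forward and an inverse FFT) per distinct orbital pair, and it uses Amdahl's law to combine an exchange speedup with the unaccelerated remainder of the run. The exchange fraction and speedup values are hypothetical inputs, not figures from the paper, and the function names are illustrative.

```python
def exchange_pair_count(n_orbitals: int) -> int:
    """Distinct orbital-pair densities (i <= j) for real orbitals; each needs one Poisson solve."""
    return n_orbitals * (n_orbitals + 1) // 2


def overall_speedup(exchange_fraction: float, exchange_speedup: float) -> float:
    """Amdahl's law: only the exact-exchange share of the run time is accelerated."""
    return 1.0 / ((1.0 - exchange_fraction) + exchange_fraction / exchange_speedup)


def wall_time_days(baseline_days: float, speedup: float) -> float:
    """Wall time after applying an overall speedup to a baseline run."""
    return baseline_days / speedup


for n in (32, 64, 128):
    print(n, exchange_pair_count(n))  # pair count grows roughly as n^2 / 2
print(round(overall_speedup(0.95, 10.0), 2))  # hypothetical: 95% of time in exchange
print(wall_time_days(7.0, 10.0))  # 0.7 (a one-week run at a tenfold speedup)
```

Because the pair count grows quadratically with the number of orbitals, exchange takes a larger share of the run time in bigger systems, and the overall speedup then approaches the exchange speedup; this is one plausible reading of why the reported speedups grow with system size, not a statement from the source.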
Time and length scales in ab initio molecular dynamics
A review. The time and length scales accessible to ab initio molecular dynamics simulations are necessarily limited; this is the price one pays for an "in principle" unbiased description of chemical reactivity. We address the problem of time scales, focusing on the determination of reaction paths and mechanisms of chemical reactions with advanced sampling techniques. The discussion of the calculation of thermochemical constants then covers both time- and length-scale problems. Finally, new techniques for extending the system sizes accessible to density functional theory are presented.