Genetic optimization of training sets for improved machine learning models of molecular properties
The training of molecular models of quantum mechanical properties based on
statistical machine learning requires large datasets which exemplify the map
from chemical structure to molecular property. Intelligent a priori selection
of training examples is often difficult or impossible to achieve as prior
knowledge may be sparse or unavailable. Ordinarily, representative selection of
training molecules from such datasets is achieved through random sampling. We
use genetic algorithms for the optimization of training set composition
consisting of tens of thousands of small organic molecules. The resulting
machine learning models are considerably more accurate with respect to small
randomly selected training sets: mean absolute errors for out-of-sample
predictions are reduced to ~25% for enthalpies, free energies, and zero-point
vibrational energy, to ~50% for heat-capacity, electron-spread, and
polarizability, and by more than ~20% for electronic properties such as
frontier orbital eigenvalues or dipole-moments. We discuss and present
optimized training sets consisting of 10 molecular classes for all molecular
properties studied. We show that these classes can be used to design improved
training sets for the generation of machine learning models of the same
properties in similar but unrelated molecular sets. Comment: 9 pages, 6 figures
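The idea of evolving a training set's composition can be sketched on a toy problem. The block below is a minimal illustration under stated assumptions, not the authors' implementation: the 1D data from `toy_data`, the 1-nearest-neighbour model, the truncation selection, and all function names and parameter values are hypothetical stand-ins chosen to keep the example self-contained.

```python
import math
import random

def toy_data(n=100):
    """Hypothetical 1D 'molecules': descriptor x, property y = sin(6x) + x^2."""
    xs = [i / (n - 1) for i in range(n)]
    ys = [math.sin(6 * x) + x * x for x in xs]
    pool = list(range(0, n, 2))      # candidates for the training set
    val_idx = list(range(1, n, 2))   # out-of-sample points used for scoring
    return xs, ys, pool, val_idx

def mae_1nn(train_idx, xs, ys, val_idx):
    """Mean absolute error of a 1-nearest-neighbour model on the validation points."""
    total = 0.0
    for v in val_idx:
        nearest = min(train_idx, key=lambda t: abs(xs[t] - xs[v]))
        total += abs(ys[nearest] - ys[v])
    return total / len(val_idx)

def crossover(a, b, k, rng):
    """Child draws k distinct indices from the union of both parents."""
    return frozenset(rng.sample(sorted(a | b), k))

def mutate(ind, pool, rng):
    """Swap one selected index for one that is currently unselected."""
    removed = rng.choice(sorted(ind))
    added = rng.choice(sorted(set(pool) - ind))
    return frozenset((ind - {removed}) | {added})

def optimize_training_set(xs, ys, pool, val_idx, k=8, pop_size=20,
                          generations=30, seed=0):
    """Evolve size-k training sets; returns (best set, its MAE, best initial MAE)."""
    rng = random.Random(seed)

    def fitness(ind):
        return mae_1nn(ind, xs, ys, val_idx)

    population = [frozenset(rng.sample(pool, k)) for _ in range(pop_size)]
    initial_best = min(fitness(ind) for ind in population)
    for _ in range(generations):
        ranked = sorted(population, key=fitness)
        parents = ranked[: pop_size // 2]   # survivors are kept, so the best never worsens
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, k, rng), pool, rng))
        population = parents + children
    best = min(population, key=fitness)
    return best, fitness(best), initial_best
```

Because the better half of each generation survives unchanged, the returned error cannot exceed that of the best randomly drawn initial set; how large the improvement is depends on the data and the model.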
Molecular dynamics simulations of the interactions of potential foulant molecules and a reverse osmosis membrane
Reverse osmosis (RO) is increasingly one of the most common technologies for desalination worldwide. However, fouling of the membranes used in the RO process remains one of the main challenges. In order to better understand the molecular basis of fouling, the interactions of a fully atomistic model of a polyamide membrane with three different foulant molecules, oxygen gas, glucose and phenol, are investigated using molecular dynamics simulations. In addition to unbiased simulations, umbrella sampling methods have been used to calculate the free energy profiles of the membrane-foulant interactions. The results show that each of the three foulants interacts with the membrane in a different manner. It is found that a buildup of the two organic foulants, glucose and phenol, occurs at the membrane-saline solution interface, due to the favourable nature of the interaction in this region, and that the presence of these foulants reduces the rate of flow of water molecules over the membrane-solution interface. However, analysis of the hydrogen bonding shows that the origin of the foulants' attraction to the membrane differs. In the case of oxygen gas, the simulations show that a buildup of gas within the membrane is likely, although no deterioration in the membrane performance was observed.
First-principles molecular structure search with a genetic algorithm
The identification of low-energy conformers for a given molecule is a
fundamental problem in computational chemistry and cheminformatics. We assess
here a conformer search that employs a genetic algorithm for sampling the
low-energy segment of the conformation space of molecules. The algorithm is
designed to work with first-principles methods, facilitated by the
incorporation of local optimization and blacklisting conformers to prevent
repeated evaluations of very similar solutions. The aim of the search is not
only to find the global minimum, but to predict all conformers within an energy
window above the global minimum. The performance of the search strategy is: (i)
evaluated for a reference data set extracted from a database with amino acid
dipeptide conformers obtained by an extensive combined force field and
first-principles search and (ii) compared to the performance of a systematic
search and a random conformer generator for the example of a drug-like ligand
with 43 atoms, 8 rotatable bonds and 1 cis/trans bond.
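The combination of a genetic algorithm with local optimization, blacklisting, and an energy window can be sketched as follows. This is a toy illustration only: the separable torsional `energy` stands in for a first-principles calculation, torsions live on a hypothetical 10-degree grid (so "very similar" reduces to "identical"), and every name and parameter value is an assumption rather than part of the method described above.

```python
import math
import random

STEP = 10  # torsion grid spacing in degrees (assumption of this toy model)

def energy(conf):
    """Toy torsional energy; a stand-in for an expensive first-principles call."""
    return sum((1 + math.cos(math.radians(3 * t)))
               + 0.5 * (1 - math.cos(math.radians(t))) for t in conf)

def relax(conf):
    """Greedy local optimization: shift single torsions by +/-STEP while energy drops."""
    conf = tuple(conf)
    improved = True
    while improved:
        improved = False
        for i in range(len(conf)):
            for delta in (-STEP, STEP):
                trial = conf[:i] + ((conf[i] + delta) % 360,) + conf[i + 1:]
                if energy(trial) < energy(conf) - 1e-12:
                    conf, improved = trial, True
    return conf

def ga_conformer_search(n_torsions=3, pop_size=8, generations=25,
                        window=1.0, seed=0):
    """Return sorted (energy, conformer) pairs within `window` of the lowest energy found.

    Requires n_torsions >= 2 so that a crossover cut point exists.
    """
    rng = random.Random(seed)
    blacklist = set()   # geometries already evaluated; never relaxed twice
    found = {}          # relaxed conformer -> energy

    def evaluate(start):
        if start in blacklist:
            return None             # skip a repeated evaluation
        blacklist.add(start)
        relaxed = relax(start)
        blacklist.add(relaxed)
        found[relaxed] = energy(relaxed)
        return relaxed

    population = []
    while len(population) < pop_size:
        start = tuple(rng.randrange(0, 360, STEP) for _ in range(n_torsions))
        relaxed = evaluate(start)
        if relaxed is not None:
            population.append(relaxed)

    for _ in range(generations):
        a, b = rng.sample(population, 2)
        cut = rng.randrange(1, n_torsions)          # one-point crossover
        child = list(a[:cut] + b[cut:])
        child[rng.randrange(n_torsions)] = rng.randrange(0, 360, STEP)  # mutation
        relaxed = evaluate(tuple(child))
        if relaxed is None:
            continue
        population.append(relaxed)
        population = sorted(population, key=energy)[:pop_size]

    e_min = min(found.values())
    return sorted((e, c) for c, e in found.items() if e <= e_min + window)
```

The search returns every relaxed conformer within the window rather than only the lowest one, mirroring the stated aim of predicting all conformers in an energy window above the global minimum.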
Fabrication of a Self-Assembled and Flexible SERS Nanosensor for Explosive Detection at Parts-Per-Quadrillion Levels from Fingerprints
Apart from high sensitivity and selectivity of surface-enhanced Raman scattering (SERS)-based trace explosive detection, efficient sampling of explosive residue from real world surfaces is very important for homeland security applications. Herein, we demonstrate an entirely new SERS nanosensor fabrication approach. The SERS nanosensor was prepared by self-assembling chemically synthesized gold triangular nanoprisms (Au TNPs), which we show display strong electromagnetic field enhancements at the sharp tips and edges, onto a pressure-sensitive flexible adhesive film. Our SERS nanosensor provides excellent SERS activity (enhancement factor = ∼6.0 × 10^6) and limit of detection (as low as 56 parts-per-quadrillion) with high selectivity by chemometric analyses among three common military high explosives (TNT, RDX, and PETN). Furthermore, the SERS nanosensors present excellent reproducibility (<4.0% relative standard deviation at 1.0 μM concentration) and unprecedentedly high stability with a “shelf life” of at least 5 months. Finally, TNT and PETN were analyzed and quantified by transferring solid explosive residues from fingerprints left on solid surfaces to the SERS nanosensor. Taken together, the demonstrated sensitivity, selectivity, and reliability of the measurements, together with the excellent shelf life of our SERS nanosensors, obviate the need for complicated sample processing steps required for other analytical techniques, and thus these nanosensors have tremendous potential not only in the field of measurement science but also for homeland security applications to combat acts of terror and military threats.
Generating Focussed Molecule Libraries for Drug Discovery with Recurrent Neural Networks
In de novo drug design, computational strategies are used to generate novel
molecules with good affinity to the desired biological target. In this work, we
show that recurrent neural networks can be trained as generative models for
molecular structures, similar to statistical language models in natural
language processing. We demonstrate that the properties of the generated
molecules correlate very well with the properties of the molecules used to
train the model. In order to enrich libraries with molecules active towards a
given biological target, we propose to fine-tune the model with small sets of
molecules, which are known to be active against that target.
Against Staphylococcus aureus, the model reproduced 14% of 6051 hold-out test
molecules that medicinal chemists designed, whereas against Plasmodium
falciparum (Malaria) it reproduced 28% of 1240 test molecules. When coupled
with a scoring function, our model can perform the complete de novo drug design
cycle to generate large sets of novel molecules for drug discovery. Comment: 17 pages, 17 figures
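The "reproduced 14% of 6051 hold-out test molecules" figures are recovery fractions: the share of held-out molecules that also appear among the generated ones. A minimal sketch of that metric is given below. The function name and the plain string comparison are assumptions of this example; a real evaluation would presumably canonicalize the molecular representations first, which is not done here.

```python
def holdout_recovery(generated, holdout):
    """Fraction of distinct hold-out molecules that also appear among the generated ones.

    Molecules are compared as plain strings (e.g. SMILES); no canonicalization
    is performed in this sketch.
    """
    holdout_set = set(holdout)
    if not holdout_set:
        raise ValueError("hold-out set must not be empty")
    return len(holdout_set & set(generated)) / len(holdout_set)
```

For example, `holdout_recovery(["CCO", "c1ccccc1", "CCN", "CCO"], ["CCO", "CC(=O)O", "CCN", "C1CCCCC1"])` evaluates to 0.5, since two of the four held-out strings were generated.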
Long-wavelength density fluctuations as nucleation precursors
Recent theories of nucleation that go beyond Classical Nucleation Theory
predict that diffusion-limited nucleation of both liquid droplets and of
crystals from a low-density vapor (or weak solution) begins with
long-wavelength density fluctuations. This means that in the early stages of
nucleation, 'clusters' can have low density but large spatial extent, which is
at odds with the classical picture of arbitrarily small clusters of the
condensed phase. We present the results of kinetic Monte Carlo simulations
using Forward Flux Sampling to show that these predictions are confirmed:
namely, that on average nucleation begins in the presence of low-amplitude but
spatially extended density fluctuations, thus confirming a significant
prediction of the non-classical theory.
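Forward Flux Sampling estimates the probability of a rare transition as a product of conditional probabilities of advancing from one interface to the next. The sketch below applies that staging to a deliberately simple system: a biased random walk in a hypothetical "cluster size" n, which is not the kinetic Monte Carlo model of the paper. All names, the interface positions, and the bias `p_up` are assumptions; the walk was chosen because its exact answer (the gambler's-ruin formula) is known and can be compared with the estimate.

```python
import math
import random

def run_trial(n, target, p_up, rng):
    """Propagate the walk from size n until it reaches `target` or shrinks back to 0."""
    while 0 < n < target:
        n += 1 if rng.random() < p_up else -1
    return n

def forward_flux(interfaces, p_up=0.4, trials=2000, seed=0):
    """Estimate P(reach interfaces[-1] before 0 | start at interfaces[0]).

    Returns (overall probability, list of per-stage probabilities).
    """
    rng = random.Random(seed)
    configs = [interfaces[0]]      # stored configurations at the current interface
    stage_probs = []
    for target in interfaces[1:]:
        successes = []
        for _ in range(trials):
            end = run_trial(rng.choice(configs), target, p_up, rng)
            if end == target:
                successes.append(end)   # keep configurations that crossed
        stage_probs.append(len(successes) / trials)
        if not successes:
            break                       # no trial advanced; the product is zero
        configs = successes
    return math.prod(stage_probs), stage_probs

def exact_probability(start, end, p_up):
    """Gambler's-ruin result for reaching `end` before 0 (valid for p_up != 0.5)."""
    r = (1 - p_up) / p_up
    return (1 - r ** start) / (1 - r ** end)
```

With interfaces `[2, 4, 6, 8, 10]` and `p_up=0.4`, the exact probability is about 0.022, and the staged estimate should agree with it to within statistical error while needing far fewer steps than waiting for the full transition in a single unbiased run.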