
    Genetic optimization of training sets for improved machine learning models of molecular properties

    The training of molecular models of quantum mechanical properties based on statistical machine learning requires large datasets which exemplify the map from chemical structure to molecular property. Intelligent a priori selection of training examples is often difficult or impossible to achieve, as prior knowledge may be sparse or unavailable. Ordinarily, representative selection of training molecules from such datasets is achieved through random sampling. We use genetic algorithms to optimize the composition of training sets drawn from tens of thousands of small organic molecules. The resulting machine learning models are considerably more accurate than models trained on small randomly selected training sets: mean absolute errors for out-of-sample predictions are reduced to ~25% for enthalpies, free energies, and zero-point vibrational energy, to ~50% for heat capacity, electron spread, and polarizability, and by more than ~20% for electronic properties such as frontier orbital eigenvalues or dipole moments. We discuss and present optimized training sets consisting of 10 molecular classes for all molecular properties studied. We show that these classes can be used to design improved training sets for the generation of machine learning models of the same properties in similar but unrelated molecular sets.
    Comment: 9 pages, 6 figures
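
The idea of optimizing training set composition with a genetic algorithm can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy property `target`, the 1-nearest-neighbour stand-in model `knn_predict`, the subset size `k`, and the crossover and mutation operators. Note that the fitness here is the error on the same validation set that is reported at the end, so a genuinely out-of-sample estimate would need a further held-out set.

```python
import random

def target(x):
    # Toy "property" standing in for a molecular property (assumption).
    return x ** 3 - x

def knn_predict(train, x):
    # 1-nearest-neighbour prediction; a stand-in for the ML model (assumption).
    nearest = min(train, key=lambda p: abs(p[0] - x))
    return nearest[1]

def mae(train_idx, pool, validation):
    # Mean absolute error on the validation points for a given training subset.
    train = [pool[i] for i in train_idx]
    return sum(abs(knn_predict(train, x) - y) for x, y in validation) / len(validation)

def crossover(a, b, k, rng):
    # Child draws k distinct indices from the union of both parents.
    return tuple(sorted(rng.sample(sorted(set(a) | set(b)), k)))

def mutate(ind, n_pool, rng):
    # Drop one selected index and add one index that is not in the reduced set
    # (this may re-pick the index that was just dropped).
    chosen = set(ind)
    chosen.remove(rng.choice(sorted(chosen)))
    candidates = [i for i in range(n_pool) if i not in chosen]
    chosen.add(rng.choice(candidates))
    return tuple(sorted(chosen))

def optimise_training_set(pool, validation, k=8, pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    n = len(pool)
    population = [tuple(sorted(rng.sample(range(n), k))) for _ in range(pop_size)]

    def fitness(ind):
        return mae(ind, pool, validation)

    # Best of the initial random subsets, kept as the random-sampling baseline.
    initial_best = min(fitness(ind) for ind in population)
    for _ in range(generations):
        population.sort(key=fitness)
        elite = population[: pop_size // 2]  # elitism: the best half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            children.append(mutate(crossover(a, b, k, rng), n, rng))
        population = elite + children
    best = min(population, key=fitness)
    return best, fitness(best), initial_best

data_rng = random.Random(1)
pool = [(x, target(x)) for x in (data_rng.uniform(-2, 2) for _ in range(60))]
validation = [(x, target(x)) for x in (data_rng.uniform(-2, 2) for _ in range(40))]

best_idx, best_mae, random_mae = optimise_training_set(pool, validation)
print("best random subset MAE:", random_mae)
print("GA-optimised subset MAE:", best_mae)
```

Because the better half of each generation is carried over unchanged, the best error in the population can never get worse than that of the initial random subsets; how much it improves depends entirely on the toy data and settings.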

    Molecular dynamics simulations of the interactions of potential foulant molecules and a reverse osmosis membrane

    Reverse osmosis (RO) is increasingly one of the most common technologies for desalination worldwide. However, fouling of the membranes used in the RO process remains one of the main challenges. In order to better understand the molecular basis of fouling, the interactions of a fully atomistic model of a polyamide membrane with three different foulant molecules (oxygen gas, glucose and phenol) are investigated using molecular dynamics simulations. In addition to unbiased simulations, umbrella sampling methods have been used to calculate the free energy profiles of the membrane-foulant interactions. The results show that each of the three foulants interacts with the membrane in a different manner. It is found that a build-up of the two organic foulants, glucose and phenol, occurs at the membrane-saline solution interface, due to the favourable nature of the interaction in this region, and that the presence of these foulants reduces the rate of flow of water molecules over the membrane-solution interface. However, analysis of the hydrogen bonding shows that the origin of the attraction to the membrane differs between these two foulants. In the case of oxygen gas, the simulations show that a build-up of gas within the membrane is likely, although no deterioration in the membrane performance was observed.

    First-principles molecular structure search with a genetic algorithm

    The identification of low-energy conformers for a given molecule is a fundamental problem in computational chemistry and cheminformatics. We assess here a conformer search that employs a genetic algorithm for sampling the low-energy segment of the conformation space of molecules. The algorithm is designed to work with first-principles methods, facilitated by the incorporation of local optimization and the blacklisting of conformers to prevent repeated evaluations of very similar solutions. The aim of the search is not only to find the global minimum, but to predict all conformers within an energy window above the global minimum. The performance of the search strategy is (i) evaluated for a reference data set extracted from a database with amino acid dipeptide conformers obtained by an extensive combined force field and first-principles search, and (ii) compared to the performance of a systematic search and a random conformer generator for the example of a drug-like ligand with 43 atoms, 8 rotatable bonds and 1 cis/trans bond.
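
The combination of a genetic algorithm, local optimization, and blacklisting described in this abstract can be sketched roughly as follows. This is an illustrative toy under stated assumptions, not the authors' code: the torsional potential `energy` replaces a first-principles calculation, `relax` is a simple coordinate descent, and the tolerance, energy window, and genetic operators are invented for the example.

```python
import math
import random

def energy(torsions):
    # Toy torsional potential (assumption), not a first-principles energy.
    return sum(1 + math.cos(3 * math.radians(t)) + 0.3 * (1 - math.cos(math.radians(t)))
               for t in torsions)

def angle_diff(a, b):
    # Smallest separation between two angles in degrees, respecting periodicity.
    d = abs(a - b) % 360
    return min(d, 360 - d)

def is_blacklisted(conf, blacklist, tol=10.0):
    # A conformer is "very similar" if every torsion lies within tol of a stored one.
    return any(all(angle_diff(a, b) <= tol for a, b in zip(conf, seen)) for seen in blacklist)

def relax(conf, step=5.0, max_iter=200):
    # Local optimization by coordinate descent on the torsion angles.
    conf = list(conf)
    e = energy(conf)
    for _ in range(max_iter):
        improved = False
        for i in range(len(conf)):
            for delta in (step, -step):
                trial = conf[:]
                trial[i] = (trial[i] + delta) % 360
                e_trial = energy(trial)
                if e_trial < e - 1e-12:
                    conf, e, improved = trial, e_trial, True
        if not improved:
            break
    return tuple(conf), e

def ga_search(n_torsions=3, pop_size=8, generations=20, window=1.0, seed=0):
    rng = random.Random(seed)
    blacklist, found = [], []  # found holds (energy, relaxed conformer) pairs

    def evaluate(conf):
        # Skip the costly relaxation if a very similar geometry was already seen.
        if is_blacklisted(conf, blacklist):
            return None
        relaxed, e = relax(conf)
        blacklist.extend([tuple(conf), relaxed])
        if not is_blacklisted(relaxed, [c for _, c in found]):
            found.append((e, relaxed))
        return e, relaxed

    population = []
    while len(population) < pop_size:
        result = evaluate(tuple(rng.uniform(0, 360) for _ in range(n_torsions)))
        if result is not None:
            population.append(result)

    for _ in range(generations):
        parents = sorted(population)[: pop_size // 2]
        for _ in range(pop_size):
            (_, a), (_, b) = rng.sample(parents, 2)
            # Uniform crossover followed by an occasional Gaussian kick per torsion.
            child = tuple(rng.choice(pair) for pair in zip(a, b))
            child = tuple((t + rng.gauss(0, 30)) % 360 if rng.random() < 0.3 else t
                          for t in child)
            result = evaluate(child)
            if result is not None:
                population.append(result)
        population = sorted(population)[:pop_size]

    # Report every distinct conformer within the energy window above the minimum.
    e_min = min(e for e, _ in found)
    return sorted((e, c) for e, c in found if e - e_min <= window)

low_energy = ga_search()
for e, conformer in low_energy:
    print(round(e, 3), conformer)
```

The blacklist is checked before relaxation, since the relaxation is the step that would be expensive with a first-principles method; the final filter returns all stored conformers inside the window rather than only the lowest one.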

    Fabrication of a Self-Assembled and Flexible SERS Nanosensor for Explosive Detection at Parts-Per-Quadrillion Levels from Fingerprints

    Apart from high sensitivity and selectivity of surface-enhanced Raman scattering (SERS)-based trace explosive detection, efficient sampling of explosive residue from real-world surfaces is very important for homeland security applications. Herein, we demonstrate an entirely new SERS nanosensor fabrication approach. The SERS nanosensor was prepared by self-assembling chemically synthesized gold triangular nanoprisms (Au TNPs), which we show display strong electromagnetic field enhancements at the sharp tips and edges, onto a pressure-sensitive flexible adhesive film. Our SERS nanosensor provides excellent SERS activity (enhancement factor = ∼6.0 × 10^6) and limit of detection (as low as 56 parts-per-quadrillion) with high selectivity by chemometric analyses among three commonly used military high explosives (TNT, RDX, and PETN). Furthermore, the SERS nanosensors present excellent reproducibility (<4.0% relative standard deviation at 1.0 μM concentration) and unprecedentedly high stability with a “shelf life” of at least 5 months. Finally, TNT and PETN were analyzed and quantified by transferring solid explosive residues from fingerprints left on solid surfaces to the SERS nanosensor. Taken together, the demonstrated sensitivity, selectivity, and reliability of the measurements, as well as the excellent shelf life of our SERS nanosensors, obviate the need for complicated sample processing steps required for other analytical techniques, and thus these nanosensors have tremendous potential not only in the field of measurement science but also for homeland security applications to combat acts of terror and military threats.

    Generating Focussed Molecule Libraries for Drug Discovery with Recurrent Neural Networks

    In de novo drug design, computational strategies are used to generate novel molecules with good affinity to the desired biological target. In this work, we show that recurrent neural networks can be trained as generative models for molecular structures, similar to statistical language models in natural language processing. We demonstrate that the properties of the generated molecules correlate very well with the properties of the molecules used to train the model. In order to enrich libraries with molecules active towards a given biological target, we propose to fine-tune the model with small sets of molecules, which are known to be active against that target. Against Staphylococcus aureus, the model reproduced 14% of 6051 hold-out test molecules that medicinal chemists designed, whereas against Plasmodium falciparum (Malaria) it reproduced 28% of 1240 test molecules. When coupled with a scoring function, our model can perform the complete de novo drug design cycle to generate large sets of novel molecules for drug discovery.
    Comment: 17 pages, 17 figures

    Long-wavelength density fluctuations as nucleation precursors

    Recent theories of nucleation that go beyond Classical Nucleation Theory predict that diffusion-limited nucleation of both liquid droplets and crystals from a low-density vapor (or weak solution) begins with long-wavelength density fluctuations. This means that in the early stages of nucleation, 'clusters' can have low density but large spatial extent, which is at odds with the classical picture of arbitrarily small clusters of the condensed phase. We present the results of kinetic Monte Carlo simulations using Forward Flux Sampling to show that these predictions hold: on average, nucleation begins in the presence of low-amplitude but spatially extended density fluctuations, confirming a significant prediction of the non-classical theory.