11 research outputs found

    Towards the simulation of biomolecules: optimisation of peptide-capped glycine using FFLUX

    The optimisation of a peptide-capped glycine using the novel force field FFLUX is presented. FFLUX is a force field based on the machine-learning method kriging and the topological energy partitioning method called Interacting Quantum Atoms. FFLUX has a completely different architecture from that of traditional force fields, avoiding (harmonic) potentials for bonds, valence angles and torsion angles. In this study, FFLUX performs an optimisation on a glycine molecule and successfully recovers the target density-functional theory energy with an error of 0.89 ± 0.03 kJ mol−1. It also recovers the structure of the global minimum with a root-mean-squared deviation of 0.05 Å (excluding hydrogen atoms). We also show that the geometry of the intra-molecular hydrogen bond in glycine is recovered accurately. EPSRC Established Career Fellowship [grant number EP/K005472].
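Kriging, the regression method underlying FFLUX, is in its simplest form Gaussian-process interpolation: predictions are kernel-weighted combinations of training energies. The sketch below is a minimal illustration of that mechanism only, not the FFLUX implementation; the function names, the squared-exponential kernel, and the toy 1-D "energy scan" are all assumptions for the example.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0):
    # squared-exponential covariance between two 1-D point sets
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * d2 / lengthscale**2)

def kriging_predict(x_train, y_train, x_query, lengthscale=1.0, noise=1e-10):
    # solve K w = y once, then predict as a kernel-weighted sum k(x*, X) w
    K = rbf_kernel(x_train, x_train, lengthscale) + noise * np.eye(len(x_train))
    weights = np.linalg.solve(K, y_train)
    return rbf_kernel(x_query, x_train, lengthscale) @ weights

# toy 1-D "energy surface": training geometries and energies
x = np.linspace(-2.0, 2.0, 8)
y = x ** 2  # stand-in for an ab initio energy scan
print(kriging_predict(x, y, np.array([0.5])))  # interpolates close to 0.25
```

With negligible noise, the predictor reproduces the training energies exactly at the training geometries, which is why kriging models can recover target energies so closely once trained.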

    Gaussian process models of potential energy surfaces with boundary optimization

    A strategy is outlined to reduce the number of training points required to model intermolecular potentials using Gaussian processes, without reducing accuracy. An asymptotic function is used at long range, and the crossover distance between this model and the Gaussian process is learnt from the training data. The results are presented for different implementations of this procedure, known as boundary optimization, across the following dimer systems: CO-Ne, HF-Ne, HF-Na+, CO2-Ne, and (CO2)2. The technique reduces the number of training points, at fixed accuracy, by up to ∼49% compared to our previous work based on a sequential learning technique. The approach is readily transferable to other statistical methods of prediction or modeling problems.
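The boundary-optimization idea can be sketched in a few lines of numpy: fit a GP to the short-range training points, switch to the known asymptotic form beyond a candidate crossover distance, and keep the crossover that minimises error on a validation grid. Everything below (the toy potential, the -C6/r^6 tail, the candidate scan) is an illustrative assumption, not the paper's actual procedure, which learns the crossover from the training data itself.

```python
import numpy as np

def true_energy(r, c6=1.0):
    # toy dimer curve: short-range repulsion plus a -C6/r^6 dispersion tail
    return np.exp(-2.0 * (r - 1.0)) - c6 / r**6

def asymptotic(r, c6=1.0):
    # the known long-range form used beyond the crossover distance
    return -c6 / r**6

def gp_fit_predict(r_tr, e_tr, r_q, ls=0.7, noise=1e-8):
    # minimal GP regression with a squared-exponential kernel
    k = lambda a, b: np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ls**2)
    alpha = np.linalg.solve(k(r_tr, r_tr) + noise * np.eye(len(r_tr)), e_tr)
    return k(r_q, r_tr) @ alpha

r_train = np.linspace(0.8, 5.0, 25)
e_train = true_energy(r_train)
r_val = np.linspace(0.8, 5.0, 200)

def piecewise_rmse(r_cross):
    # GP models the short range; the asymptotic form takes over past r_cross
    mask = r_train <= r_cross
    pred = np.where(r_val <= r_cross,
                    gp_fit_predict(r_train[mask], e_train[mask], r_val),
                    asymptotic(r_val))
    return np.sqrt(np.mean((pred - true_energy(r_val))**2))

# "learn" the crossover by scanning candidate distances
candidates = np.linspace(2.0, 4.5, 6)
best = min(candidates, key=piecewise_rmse)
```

The training-point saving comes from the mask: points beyond the chosen crossover never need to be modelled by the GP at all, since the cheap asymptotic function covers that region.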

    Representability of algebraic topology for biomolecules in machine learning based scoring and virtual screening

    This work introduces a number of algebraic topology approaches, such as multicomponent persistent homology, multi-level persistent homology and electrostatic persistence, for the representation, characterization, and description of small molecules and biomolecular complexes. Multicomponent persistent homology retains critical chemical and biological information during the topological simplification of biomolecular geometric complexity. Multi-level persistent homology enables a tailored topological description of inter- and/or intra-molecular interactions of interest. Electrostatic persistence incorporates partial charge information into topological invariants. These topological methods are paired with the Wasserstein distance to characterize similarities between molecules, and are further integrated with a variety of machine learning algorithms, including k-nearest neighbors, ensembles of trees, and deep convolutional neural networks, to manifest their descriptive and predictive powers for chemical and biological problems. Extensive numerical experiments involving more than 4,000 protein-ligand complexes from the PDBBind database and nearly 100,000 ligands and decoys in the DUD database are performed to test, respectively, the scoring power and the virtual screening power of the proposed topological approaches. It is demonstrated that the present approaches outperform modern machine learning based methods in protein-ligand binding affinity prediction and ligand-decoy discrimination.
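The pairing of persistence information with the Wasserstein distance can be illustrated with a small numpy/scipy sketch. It relies on the standard fact that the 0-dimensional persistence barcode of a point cloud (all features born at filtration value 0) has death times equal to single-linkage merge heights. The random "molecules", the function names, and the use of scipy's 1-D `wasserstein_distance` as a stand-in for a distance between full persistence diagrams are all assumptions for illustration, not the paper's multicomponent or electrostatic constructions.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.stats import wasserstein_distance

def h0_death_times(points):
    # 0-dimensional persistent homology of a point cloud: every connected
    # component is born at 0 and dies at a single-linkage merge height
    # (column 2 of the linkage matrix)
    return np.sort(linkage(points, method="single")[:, 2])

rng = np.random.default_rng(0)
mol_a = rng.normal(size=(20, 3))          # stand-in "molecule" A (20 atoms)
mol_b = rng.normal(size=(20, 3)) + 0.1    # a perturbed point cloud

# compare the two barcodes via the 1-D Wasserstein distance
d = wasserstein_distance(h0_death_times(mol_a), h0_death_times(mol_b))
```

A distance of zero means identical barcodes; small values indicate topologically similar clouds, which is the property the paper's machine learning models exploit as a similarity feature.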

    Efficient Training and Implementation of Gaussian Process Potentials

    Molecular simulations are a powerful tool for translating information about the intermolecular interactions within a system into thermophysical properties via statistical mechanics. However, the accuracy of any simulation is limited by the potentials that model the microscopic interactions. Most first principles methods are too computationally expensive for use at every time-step or cycle of a simulation, which typically requires thousands of energy evaluations. Meanwhile, cheaper semi-empirical potentials give rise to only qualitatively accurate simulations. Consequently, methods for efficient first principles predictions in simulations are of interest. Machine-learned potentials (MLPs) have shown promise in this area, offering first principles predictions at a fraction of the cost of ab initio calculation. Of particular interest are Gaussian process (GP) potentials, which achieve equivalent accuracy to other MLPs with smaller training sets. They therefore offer the best route to employing information from expensive ab initio calculations, for which building a large data set is time-consuming. GP potentials, however, are among the most computationally intensive MLPs. Thus, they are far more costly to employ in simulations than semi-empirical potentials. This work addresses the computational expense of GP potentials both by reducing the training set size at a given accuracy and by developing a method to invoke GP potentials efficiently for first principles prediction in simulations. By varying the cross-over distance between the GP and a long-range function with the accuracy of the former, training by sequential design requires up to 40% fewer training points at fixed accuracy. This method was applied successfully to the CO-Ne, HF-Ne, HF-Na+, CO2-Ne, (CO)2, (HF)2 and (HCl)2 systems, and can be extended easily to other interactions and methods of prediction.
Meanwhile, a significant reduction in the time taken for Monte Carlo displacement and volume change moves is achieved by parallelisation of the requisite GP calculations. Though this exploits in part the framework of GP regression, the distribution of the calculations themselves is general to other methods of prediction. The work also shows that current kernels and input transforms for modelling intermolecular interactions are not improved easily.
