6 research outputs found

    PLAS-5k: Dataset of Protein-Ligand Affinities from Molecular Dynamics for Machine Learning Applications

    Computational methods and, more recently, machine learning methods have played a key role in structure-based drug design. Though several benchmarking datasets are available for machine learning applications in virtual screening, accurate prediction of binding affinity for a protein-ligand complex remains a major challenge. New datasets are needed that allow the development of models that predict binding affinities better than state-of-the-art scoring functions. For the first time, we have developed a dataset, PLAS-5k, comprising 5000 protein-ligand complexes chosen from the PDB database. The dataset consists of binding affinities along with energy components such as electrostatic, van der Waals, polar and non-polar solvation energies, calculated from molecular dynamics simulations using the MMPBSA (Molecular Mechanics Poisson-Boltzmann Surface Area) method. The calculated binding affinities outperformed docking scores and showed a good correlation with the available experimental values. The availability of energy components may enable optimization of desired components during machine learning-based drug design. Further, the OnionNet model has been retrained on the PLAS-5k dataset and is provided as a baseline for the prediction of binding affinities.
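As a rough illustration of how the listed energy components relate to a binding affinity, the sketch below sums the four MMPBSA terms named in the abstract. It assumes the affinity is the plain sum of these terms with no entropy contribution, which the abstract does not state. The class name, field names, numeric values, and the kcal/mol unit are hypothetical and are not taken from PLAS-5k.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnergyComponents:
    """Hypothetical container for MMPBSA terms (assumed units: kcal/mol)."""

    electrostatic: float
    van_der_waals: float
    polar_solvation: float
    nonpolar_solvation: float

    def binding_affinity(self) -> float:
        # Assumption: the affinity is the sum of the four terms,
        # with any entropy contribution left out.
        return (
            self.electrostatic
            + self.van_der_waals
            + self.polar_solvation
            + self.nonpolar_solvation
        )


# Invented values for illustration only, not real PLAS-5k entries.
example = EnergyComponents(
    electrostatic=-30.0,
    van_der_waals=-40.0,
    polar_solvation=45.0,
    nonpolar_solvation=-5.0,
)
total = example.binding_affinity()
print(f"Estimated binding affinity: {total:.1f} kcal/mol")
```

Keeping the components as separate fields rather than a single number mirrors the abstract's point that individual terms could be targeted during model-driven design.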

    Supramolecular Polymerization of N,N′,N″,N‴-tetra-(Tetradecyl)-1,3,6,8-pyrenetetracarboxamide: A Computational Study

    The role of molecular dipole orientations and intermolecular interactions in a derivative of pyrene on its supramolecular self-assembly in solution has been investigated using quantum chemical and force-field-based computational approaches. Five possible dipole configurations of the molecule have been examined, among which the one in which adjacent dipole vectors are antiparallel to each other is determined to be the ground state, on electrostatic grounds. Self-assembly of this molecule under realistic conditions has been studied using MD simulations. Dipolar relaxation in its liquid crystalline (LC) phase has been investigated and contrasted against that in the well-established benzene-1,3,5-tricarboxamide (BTA) family. The dihedral barrier related to the amide dipole flip is larger in the pyrene system than in BTA, which explains the differences in their dipolar relaxation behaviors. The mechanism underlying polarization switching upon the application of an external electric field in the LC phase is investigated. Unlike in BTA, this switching is not associated with a reversal of the helical sense of the hydrogen-bonded chains, due to differences in molecular symmetry. The observations enable general conclusions to be drawn on the relationship between electric-field-induced chiral enhancement and symmetry.

    PLAS-20k: Extended Dataset of Protein-Ligand Affinities from MD Simulations for Machine Learning Applications

    Computing binding affinities is of great importance in the drug discovery pipeline, and their prediction using advanced machine learning methods remains a major challenge, as existing datasets and models do not consider the dynamic features of protein-ligand interactions. To this end, we have developed the PLAS-20k dataset, an extension of the previously developed PLAS-5k, with 97,500 independent simulations on a total of 19,500 different protein-ligand complexes. Our results show good correlation with the available experimental values, performing better than docking scores. This holds true even for a subset of ligands that follow Lipinski’s rule, and for diverse clusters of complex structures, thereby highlighting the importance of the PLAS-20k dataset in developing new ML models. Our dataset also performs better than docking at classifying strong and weak binders. Further, the OnionNet model has been retrained on the PLAS-20k dataset and is provided as a baseline for the prediction of binding affinities. We believe that large-scale MD-based datasets, together with trajectories, will create new synergies and pave the way for accelerating drug discovery.
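The figures quoted above imply 97,500 / 19,500 = 5 independent simulations per complex. The sketch below shows one plausible way such runs could be combined and compared with experiment: average the per-run affinities for each complex, then compute a Pearson correlation against experimental values. Averaging over runs is an assumption here, since the abstract does not say how the simulations are combined. All complex names and numbers are invented for illustration.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Simulations per complex implied by the abstract's totals.
runs_per_complex = 97_500 // 19_500

# Hypothetical per-run calculated affinities (assumed units: kcal/mol).
calculated_runs = {
    "complex_a": [-30.0, -32.0, -31.0, -29.0, -33.0],
    "complex_b": [-20.0, -21.0, -19.0, -20.0, -20.0],
    "complex_c": [-10.0, -9.0, -11.0, -10.0, -10.0],
}
# Hypothetical experimental affinities for the same complexes.
experimental = {"complex_a": -10.0, "complex_b": -7.0, "complex_c": -4.0}

# Assumption: one value per complex is obtained by averaging its runs.
averaged = {name: mean(runs) for name, runs in calculated_runs.items()}

names = sorted(averaged)
r = pearson([averaged[n] for n in names], [experimental[n] for n in names])
print(f"{runs_per_complex} runs per complex, Pearson r = {r:.3f}")
```

A rank-based measure such as Spearman correlation could be substituted where only the ordering of binders matters.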
