PLAS-5k: Dataset of Protein-Ligand Affinities from Molecular Dynamics for Machine Learning Applications
Computational methods and, more recently, machine learning methods have played a key role in structure-based drug design. Though several benchmarking datasets are available for machine learning applications in virtual screening, accurate prediction of binding affinity for a protein-ligand complex remains a major challenge. New datasets that allow the development of models that predict binding affinities better than state-of-the-art scoring functions are therefore important. For the first time, we have developed a dataset, PLAS-5k, comprising 5000 protein-ligand complexes chosen from the PDB database. The dataset consists of binding affinities along with energy components (electrostatic, van der Waals, polar and non-polar solvation energies) calculated from molecular dynamics simulations using the MMPBSA (Molecular Mechanics Poisson-Boltzmann Surface Area) method. The calculated binding affinities outperformed docking scores and showed a good correlation with the available experimental values. The availability of energy components may enable optimization of desired components during machine learning-based drug design. Further, the OnionNet model has been retrained on the PLAS-5k dataset and is provided as a baseline for the prediction of binding affinities.
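The energy components listed in the abstract can be combined into a single binding estimate. The following is a minimal sketch, assuming the usual MMPBSA decomposition in which the binding energy is the sum of the electrostatic, van der Waals, polar solvation and non-polar solvation terms, with any entropic contribution left out. The class name, field names and numerical values are hypothetical and are not taken from the PLAS-5k files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnergyComponents:
    """Per-complex energy terms in kcal/mol (hypothetical field names)."""

    electrostatic: float
    van_der_waals: float
    polar_solvation: float
    nonpolar_solvation: float

    def binding_energy(self) -> float:
        # Assumed decomposition: sum of the four terms, no entropy correction.
        return (
            self.electrostatic
            + self.van_der_waals
            + self.polar_solvation
            + self.nonpolar_solvation
        )


# Illustrative values only, not real data from the dataset.
example = EnergyComponents(
    electrostatic=-20.0,
    van_der_waals=-35.0,
    polar_solvation=30.0,
    nonpolar_solvation=-4.0,
)
total = example.binding_energy()
```

Keeping the components separate, rather than storing only the total, is what would let a model be trained on or optimized for an individual term.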
Supramolecular Polymerization of N,N′,N″,N‴-tetra-(Tetradecyl)-1,3,6,8-pyrenetetracarboxamide: A Computational Study
The role of molecular dipole orientations and intermolecular interactions in a derivative of pyrene on its supramolecular self-assembly in solution has been investigated using quantum chemical and force field based computational approaches. Five possible dipole configurations of the molecule have been examined, among which the one in which adjacent dipole vectors are antiparallel to each other is determined to be the ground state, on electrostatic grounds. Self-assembly of this molecule under realistic conditions has been studied using MD simulations. Dipolar relaxation in its liquid crystalline (LC) phase has been investigated and contrasted against that in the well-established benzene-1,3,5-tricarboxamide (BTA) family. The dihedral barrier related to the amide dipole flip is larger in the pyrene system than in BTA, which explains the differences in their dipolar relaxation behaviors. The mechanism underlying polarization switching upon the application of an external electric field in the LC phase is investigated. Unlike in BTA, this switching is not associated with a reversal of the helical sense of the hydrogen bonded chains, due to differences in molecular symmetry. The observations enable general conclusions on the relationship between electric field induced chiral enhancement and symmetry to be drawn.
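One way to arrive at a count of five dipole configurations is a simple symmetry argument, sketched below. It rests on assumptions that the abstract does not state: each of the four amide dipoles (at positions 1, 3, 6 and 8) is treated as pointing either up (+1) or down (-1) relative to the pyrene plane, and two configurations are considered equivalent if they are related by the rectangular symmetry of those four positions or by reversing all dipoles at once. The function names are hypothetical, and this may not be how the authors enumerated their configurations.

```python
from itertools import product

# Indices 0..3 stand for ring positions 1, 3, 6, 8 (corners of a rectangle).
# Assumed symmetry operations, written as permutations of those corners.
PERMUTATIONS = [
    (0, 1, 2, 3),  # identity
    (1, 0, 3, 2),  # swaps 1<->3 and 6<->8
    (3, 2, 1, 0),  # swaps 1<->8 and 3<->6
    (2, 3, 0, 1),  # swaps 1<->6 and 3<->8
]


def canonical(config):
    """Return a representative of the symmetry class of a configuration."""
    images = []
    for perm in PERMUTATIONS:
        permuted = tuple(config[i] for i in perm)
        images.append(permuted)
        # Reversing every dipole is assumed to give an equivalent state.
        images.append(tuple(-s for s in permuted))
    return min(images)


def distinct_configurations():
    """Enumerate all 2**4 up/down patterns and keep one per symmetry class."""
    return sorted({canonical(c) for c in product((1, -1), repeat=4)})


configs = distinct_configurations()
n_configs = len(configs)
```

Under these assumptions the 16 up/down patterns fall into five classes: all dipoles parallel, one dipole reversed, and three distinct ways of reversing two dipoles, one of which is the alternating pattern.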
PLAS-20k: Extended Dataset of Protein-Ligand Affinities from MD Simulations for Machine Learning Applications
Computing binding affinities is of great importance in the drug discovery pipeline, and their prediction using advanced machine learning methods remains a major challenge because existing datasets and models do not consider the dynamic features of protein-ligand interactions. To this end, we have developed the PLAS-20k dataset, an extension of the previously developed PLAS-5k, with 97,500 independent simulations on a total of 19,500 different protein-ligand complexes. Our results show good correlation with the available experimental values, performing better than docking scores. This holds true even for a subset of ligands that follow Lipinski’s rule, and for diverse clusters of complex structures, thereby highlighting the importance of the PLAS-20k dataset in developing new ML models. In addition, our dataset is more useful than docking for classifying strong and weak binders. Further, the OnionNet model has been retrained on the PLAS-20k dataset and is provided as a baseline for the prediction of binding affinities. We believe that large-scale MD-based datasets along with trajectories will form a new synergy, paving the way for accelerating drug discovery.
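The figures quoted above work out to 97,500 / 19,500 = 5 independent simulations per complex. The following is a minimal sketch of how such replicate results might be averaged per complex and compared with experimental values, assuming a Pearson correlation is the measure of interest. The function names, complex identifiers and numbers are hypothetical and do not come from the PLAS-20k files.

```python
from math import sqrt
from statistics import mean


def mean_affinity_per_complex(runs):
    """Average the binding affinities of the independent runs of each complex."""
    return {complex_id: mean(values) for complex_id, values in runs.items()}


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    covariance = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mx) ** 2 for x in xs))
    spread_y = sqrt(sum((y - my) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Illustrative values only (kcal/mol), five runs per hypothetical complex.
runs = {
    "complex_a": [-10.0, -12.0, -11.0, -9.0, -13.0],
    "complex_b": [-20.0, -22.0, -21.0, -19.0, -23.0],
    "complex_c": [-30.0, -32.0, -31.0, -29.0, -33.0],
}
experimental = {"complex_a": -6.0, "complex_b": -8.0, "complex_c": -10.0}

calculated = mean_affinity_per_complex(runs)
ids = sorted(calculated)
r = pearson_r([calculated[i] for i in ids], [experimental[i] for i in ids])
simulations_per_complex = 97_500 // 19_500
```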