3 research outputs found

PLAS-5k: Dataset of Protein-Ligand Affinities from Molecular Dynamics for Machine Learning Applications

Author: Bhati Agastya P
Garg Akshit
Jeurkar Shruti
Korlepara Divya B
Kumar Vishal
Mehta Sarvesh
Modee Rohit
Muvva Charuvaka
Nayar Divya
Pal Pradeep Kumar
Priyakumar U Deva
Roy Subhajit
Sharma Shubham
Sridharan Bhuvanesh
Vasavi CS
Publication venue: Nature Portfolio
Publication date: 07/09/2022

Computational methods, and more recently machine learning methods, have played a key role in structure-based drug design. Although several benchmarking datasets are available for machine learning applications in virtual screening, accurate prediction of binding affinity for a protein-ligand complex remains a major challenge. New datasets that enable models to predict binding affinities better than state-of-the-art scoring functions are therefore important. For the first time, we have developed a dataset, PLAS-5k, comprising 5000 protein-ligand complexes chosen from the PDB database. The dataset consists of binding affinities along with energy components (electrostatic, van der Waals, polar solvation, and non-polar solvation energies) calculated from molecular dynamics simulations using the MMPBSA (Molecular Mechanics Poisson-Boltzmann Surface Area) method. The calculated binding affinities outperformed docking scores and showed a good correlation with the available experimental values. The availability of energy components may enable optimization of desired components during machine learning-based drug design. Further, the OnionNet model has been retrained on the PLAS-5k dataset and is provided as a baseline for the prediction of binding affinities.
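
As a rough illustration of how the energy components named in the abstract above relate to a binding affinity, the sketch below sums the four MMPBSA terms and computes a Pearson correlation between calculated and experimental values. The class name `EnergyComponents`, the helper `pearson_r`, and every number are hypothetical placeholders, not code or data from PLAS-5k. The sketch also assumes the entropy contribution is ignored.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class EnergyComponents:
    """Per-complex energy terms in kcal/mol (hypothetical container)."""
    electrostatic: float
    van_der_waals: float
    polar_solvation: float
    nonpolar_solvation: float

    def binding_affinity(self) -> float:
        # Assumption: the affinity is the plain sum of the four terms,
        # with no entropy correction.
        return (self.electrostatic + self.van_der_waals
                + self.polar_solvation + self.nonpolar_solvation)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


# Made-up numbers for illustration only; they are not PLAS-5k values.
toy_complexes = [
    EnergyComponents(-10.0, -30.0, 25.0, -5.0),
    EnergyComponents(-15.0, -40.0, 30.0, -6.0),
    EnergyComponents(-5.0, -20.0, 15.0, -3.0),
]
calculated = [c.binding_affinity() for c in toy_complexes]
experimental = [-8.0, -10.5, -6.0]  # placeholder experimental affinities
print(calculated, pearson_r(calculated, experimental))
```

Keeping the components separate, rather than storing only their sum, is what would let a model target one term at a time, as the abstract suggests.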

UCL Discovery

PubMed Central

PLAS-20k: Extended Dataset of Protein-Ligand Affinities from MD Simulations for Machine Learning Applications

Author: Aathira G. Nair
Divya B Korlepara
Divya Nayar
Indhu Ramachandran
Kavita Thakran
Pradeep Kumar Pal
Prathit Chatterjee
Rakesh Srivastava
Reena Jaglan
Saalim H. Raza
Sanjana Pandey
Shivam Pandit
Shivangi Verma
Shruti Jeurkar
Shubham Sharma
U. Deva Priyakumar
Vasavi C.S.
Vishal Kumar
Publication date: 07/08/2023

Computing binding affinities is of great importance in the drug discovery pipeline, and predicting them with advanced machine learning methods remains a major challenge because existing datasets and models do not consider the dynamic features of protein-ligand interactions. To this end, we have developed the PLAS-20k dataset, an extension of the previously developed PLAS-5k, with 97,500 independent simulations on a total of 19,500 different protein-ligand complexes. Our results show good correlation with the available experimental values and perform better than docking scores. This holds true even for a subset of ligands that follow Lipinski’s rule, and for diverse clusters of complex structures, highlighting the importance of the PLAS-20k dataset in developing new ML models. In addition, our dataset is more useful than docking for classifying strong and weak binders. Further, the OnionNet model has been retrained on the PLAS-20k dataset and is provided as a baseline for the prediction of binding affinities. We believe that large-scale MD-based datasets, along with trajectories, will form a new synergy, paving the way for accelerating drug discovery.
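
The counts in the abstract above imply five independent simulations per complex (97,500 / 19,500). A minimal sketch of how per-complex estimates from such replicas might be summarized follows. Averaging over replicas is an assumption of this sketch, not something the abstract states; the function name `summarize_replicas` and the sample values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

# Counts taken from the abstract.
SIMULATIONS = 97_500
COMPLEXES = 19_500
replicas_per_complex = SIMULATIONS // COMPLEXES  # 5


def summarize_replicas(affinities):
    """Return (mean, standard error) over independent replica estimates.

    Assumption: replicas are combined by a simple unweighted mean.
    """
    avg = mean(affinities)
    sem = stdev(affinities) / sqrt(len(affinities))
    return avg, sem


# Made-up replica values in kcal/mol; they are not PLAS-20k data.
toy_replicas = [-20.0, -22.0, -18.0, -21.0, -19.0]
avg, sem = summarize_replicas(toy_replicas)
print(replicas_per_complex, avg, sem)
```

Reporting the standard error alongside the mean gives a rough sense of how much an affinity estimate varies between independent runs.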

ChemRxiv

PLAS-20k: Extended Dataset of Protein-Ligand Affinities from MD Simulations for Machine Learning Applications

Author: Aathira G. Nair
Divya B. Korlepara
Divya Nayar
Indhu Ramachandran
Kavita Thakran
Pradeep Kumar Pal
Prathit Chatterjee
Rakesh Srivastava
Reena Jaglan
Saalim H. Raza
Sanjana Pandey
Shivam Pandit
Shivangi Verma
Shruti Jeurkar
Shubham Sharma
U. Deva Priyakumar
Vasavi C. S.
Vishal Kumar
Publication venue: Nature Portfolio
Publication date: 01/02/2024

Computing binding affinities is of great importance in the drug discovery pipeline, and predicting them with advanced machine learning methods remains a major challenge because existing datasets and models do not consider the dynamic features of protein-ligand interactions. To this end, we have developed the PLAS-20k dataset, an extension of the previously developed PLAS-5k, with 97,500 independent simulations on a total of 19,500 different protein-ligand complexes. Our results show good correlation with the available experimental values and perform better than docking scores. This holds true even for a subset of ligands that follow Lipinski’s rule, and for diverse clusters of complex structures, highlighting the importance of the PLAS-20k dataset in developing new ML models. In addition, our dataset is more useful than docking for classifying strong and weak binders. Further, the OnionNet model has been retrained on the PLAS-20k dataset and is provided as a baseline for the prediction of binding affinities. We believe that large-scale MD-based datasets, along with trajectories, will form a new synergy, paving the way for accelerating drug discovery.

Directory of Open Access Journals