5 research outputs found

    Do Deep Learning Methods Really Perform Better in Molecular Conformation Generation?

    Molecular conformation generation (MCG) is a fundamental problem in drug discovery. Many traditional methods have been developed to solve it, such as systematic searching, model building, random searching, distance geometry, molecular dynamics, and Monte Carlo methods. However, they have limitations depending on the molecular structure. Recently, many deep learning based MCG methods have appeared, claiming to largely outperform the traditional methods. To our surprise, we design a simple, cheap, parameter-free algorithm based on the traditional methods and find that it is comparable to, or even outperforms, deep learning based MCG methods on the widely used GEOM-QM9 and GEOM-Drugs benchmarks. In particular, our algorithm is simply a clustering of RDKit-generated conformations. We hope our findings can help the community revise the deep learning methods for MCG. The code of the proposed algorithm can be found at https://gist.github.com/ZhouGengmo/5b565f51adafcd911c0bc115b2ef027c
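
The idea of clustering a pool of generated conformations can be illustrated with a minimal sketch. The abstract does not say which clustering algorithm the authors use, so the leader-style clustering below is a swapped-in stand-in, not their method; the linked gist is the authoritative source. The sketch assumes the conformers were already generated (for example with RDKit) and aligned, and the names `rmsd`, `cluster_conformers`, and the `threshold` parameter are hypothetical. Note that `threshold` makes this toy version parametric, unlike the parameter-free algorithm described above.

```python
import math
from typing import List, Sequence, Tuple

Coords = Sequence[Tuple[float, float, float]]


def rmsd(a: Coords, b: Coords) -> float:
    """Plain coordinate RMSD; assumes both conformers are already aligned."""
    if len(a) != len(b):
        raise ValueError("conformers must have the same number of atoms")
    sq = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(sq / len(a))


def cluster_conformers(confs: List[Coords], threshold: float) -> List[List[int]]:
    """Leader clustering (an assumption, not the authors' algorithm).

    A conformer joins the first cluster whose representative (its first
    member) lies within `threshold` RMSD; otherwise it starts a new cluster.
    Returns clusters as lists of conformer indices.
    """
    clusters: List[List[int]] = []
    for i, conf in enumerate(confs):
        for members in clusters:
            if rmsd(conf, confs[members[0]]) <= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Toy two-atom "conformers": two near-duplicates along x, two along y.
confs = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.05, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    [(0.0, 0.0, 0.0), (0.0, 1.1, 0.0)],
]
clusters = cluster_conformers(confs, threshold=0.5)
print(clusters)                    # [[0, 1], [2, 3]]
print([c[0] for c in clusters])    # representatives: [0, 2]
```

Keeping one representative per cluster reduces a large, redundant pool to a small, diverse set of conformations, which is the general effect a clustering step is meant to achieve.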

    Uni-pKa: An Accurate and Physically Consistent pKa Prediction through Protonation Ensemble Modeling

    Predicting pKa values of small molecules has key applications in drug discovery and molecular simulation. However, current methods face challenges in rigorously interpreting experimental data and ensuring thermodynamic consistency between successive pKa values. This study puts forward a protonation ensemble framework to address these limitations by modeling the full space of possible protonation microstates. Within this framework, we derive rigorous definitions connecting experimental macro-pKas to underlying micro-pKa equilibria. Under this new framework, we develop Uni-pKa, an accurate and reliable pKa predictor. Uni-pKa first pretrains on over 1 million predicted pKas from ChEMBL to learn expressive molecular representations. It is then finetuned on experimental datasets that enforce consistency with the protonation ensemble definitions. The high-quality experimental pKa datasets are fitted to this framework by recovering underlying microstates from macro-pKas. Modeling the complete ensemble enables rigorous interpretation of macro-pKa data, and inherently preserves thermodynamic consistency, improving the prediction accuracy of Uni-pKa. Experiments demonstrate that Uni-pKa achieves state-of-the-art performance, outperforming previous methods. This novel protonation ensemble approach significantly advances machine learning for pKa prediction and molecular property modeling. Uni-pKa provides a good example of how to combine chemical knowledge and machine learning methods. Users can utilize Uni-pKa for predicting and ranking the protonation states of molecules under various pH conditions via https://app.bohrium.dp.tech/uni-pka
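
The link between macro-pKas and micro-pKa equilibria can be made concrete with the textbook relations for a diprotic acid that has two mono-deprotonated microstates. This is a sketch of the standard chemistry only, not the derivation or code from Uni-pKa; the function names and the numeric micro-pKa values are hypothetical and chosen for illustration.

```python
import math
from typing import Iterable


def macro_pka_diverging(micro_pkas: Iterable[float]) -> float:
    """One protonated microstate deprotonates into several microstates.

    K_macro = sum_i k_i, so pKa_macro = -log10(sum_i 10**(-pk_i)).
    """
    return -math.log10(sum(10.0 ** (-pk) for pk in micro_pkas))


def macro_pka_converging(micro_pkas: Iterable[float]) -> float:
    """Several protonated microstates deprotonate into one microstate.

    1 / K_macro = sum_i 1 / k_i, so pKa_macro = log10(sum_i 10**pk_i).
    """
    return math.log10(sum(10.0 ** pk for pk in micro_pkas))


# Hypothetical micro-pKas for H2A -> {HA(a), HA(b)} -> A.
# Thermodynamic consistency requires both paths around the cycle to agree:
# pk1 + pk12 == pk2 + pk21.
pk1, pk12 = 4.0, 9.0   # H2A -> HA(a) -> A
pk2, pk21 = 5.0, 8.0   # H2A -> HA(b) -> A

pka1 = macro_pka_diverging([pk1, pk2])
pka2 = macro_pka_converging([pk12, pk21])
print(round(pka1, 2), round(pka2, 2))   # 3.96 9.04
print(round(pka1 + pka2, 6))            # 13.0, equal to pk1 + pk12
```

The first macro-pKa is lower than either micro-pKa because two deprotonation routes are available, and the sum of the two macro-pKas equals the sum along either micro path when the cycle closes. This is the kind of consistency between successive pKa values that the abstract refers to.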

    Synergistic application of molecular docking and machine learning for improved binding pose

    Accurate prediction of protein-ligand complex structures is a crucial step in structure-based drug design. Traditional molecular docking methods are limited in accuracy and sampling space, while relying on machine learning approaches alone may lead to invalid conformations. In this study, we propose a novel strategy that combines molecular docking and machine learning methods. First, the protein-ligand binding poses are predicted using a deep learning model. Subsequently, position-restricted docking on the predicted binding poses is performed using Uni-Dock, generating physically constrained and valid binding poses. Finally, the binding poses are re-scored and ranked using machine learning scoring functions. This strategy harnesses the predictive power of machine learning and the physical constraints of molecular docking. Evaluation experiments on multiple datasets demonstrate that, compared to using molecular docking or machine learning methods alone, our proposed strategy can significantly improve the success rate and accuracy of protein-ligand complex structure predictions.

    Synergistic Application of Molecular Docking and Machine Learning for Improved Protein-Ligand Binding Pose Prediction

    Accurate prediction of protein-ligand complex structures is a crucial step in structure-based drug design. Traditional molecular docking methods are limited in accuracy and sampling space, while relying on machine learning approaches alone may lead to invalid conformations. In this study, we propose a novel strategy that combines molecular docking and machine learning methods. First, the protein-ligand binding poses are predicted using the Uni-Mol Docking machine learning approach. Subsequently, position-restricted docking (PR Docking) on the predicted binding poses is performed using Uni-Dock, generating physically constrained and valid binding poses. Finally, the binding poses are re-scored and ranked using machine learning scoring functions. This strategy harnesses the predictive power of machine learning and the physical constraints of molecular docking. Evaluation experiments on multiple datasets demonstrate that, compared to using molecular docking or machine learning methods alone, our proposed strategy can significantly improve the success rate and accuracy of protein-ligand complex structure predictions. This strategy is available at https://github.com/dptech-corp/Uni-Dock
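
The three stages described in this abstract (predict a pose, refine it with position-restricted docking, then re-score and rank) can be sketched as a small pipeline. Everything here is hypothetical scaffolding: `Pose`, `predict_refine_rescore`, and the three callables are illustrative names, not the Uni-Mol Docking or Uni-Dock APIs, and the toy stubs stand in for the real models. The sketch also assumes that a higher score is better.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Coords = List[Tuple[float, float, float]]


@dataclass
class Pose:
    coords: Coords
    score: float = 0.0


def predict_refine_rescore(
    predict: Callable[[], Pose],
    refine: Callable[[Pose], Sequence[Pose]],
    rescore: Callable[[Pose], float],
) -> List[Pose]:
    """Run the three stages and return poses ranked best-first.

    predict: stage 1, an ML model proposing a binding pose.
    refine:  stage 2, docking restricted to the neighbourhood of that pose.
    rescore: stage 3, an ML scoring function (higher is better, by assumption).
    """
    seed = predict()
    candidates = list(refine(seed))
    for pose in candidates:
        pose.score = rescore(pose)
    return sorted(candidates, key=lambda p: p.score, reverse=True)


# Toy stand-ins so the sketch runs end to end (one-atom "ligand").
def toy_predict() -> Pose:
    return Pose(coords=[(0.0, 0.0, 0.0)])


def toy_refine(seed: Pose) -> List[Pose]:
    # Sample small shifts around the predicted position only.
    return [
        Pose(coords=[(x + dx, y, z) for x, y, z in seed.coords])
        for dx in (-0.5, 0.0, 0.5)
    ]


def toy_rescore(pose: Pose) -> float:
    # Pretend the true binding position is x = 0.5.
    return -abs(pose.coords[0][0] - 0.5)


ranked = predict_refine_rescore(toy_predict, toy_refine, toy_rescore)
print(ranked[0].coords)   # [(0.5, 0.0, 0.0)]
```

Passing the stages in as callables keeps the control flow independent of any particular model or docking engine, so each stage can be swapped or tested in isolation.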

    Uni-Mol: A Universal 3D Molecular Representation Learning Framework

    Molecular representation learning (MRL) has gained tremendous attention due to its critical role in learning from limited supervised data for applications like drug design. In most MRL methods, molecules are treated as 1D sequential tokens or 2D topology graphs, limiting their ability to incorporate 3D information for downstream tasks and, in particular, making 3D geometry prediction/generation almost impossible. In this paper, we propose a universal 3D MRL framework, called Uni-Mol, that significantly enlarges the representation ability and application scope of MRL schemes. Uni-Mol contains two pretrained models with the same SE(3) Transformer architecture: a molecular model pretrained on 209M molecular conformations and a pocket model pretrained on 3M candidate protein pockets. In addition, Uni-Mol contains several finetuning strategies to apply the pretrained models to various downstream tasks. By properly incorporating 3D information, Uni-Mol outperforms SOTA in 14/15 molecular property prediction tasks. Moreover, Uni-Mol achieves superior performance in 3D spatial tasks, including protein-ligand binding pose prediction and molecular conformation generation. The code, model, and data are made publicly available at https://github.com/dptech-corp/Uni-Mol