13 research outputs found

    Machine Learning and Solvation Theory for Drug Discovery

    Full text link
    Drug discovery is a notoriously expensive and time-consuming process; hence, developing computational methods to facilitate the discovery process and lower the associated costs is a long-sought goal of computational chemists. Protein-ligand binding, which provides the physical and chemical basis for the mechanism of action of most drugs, occurs in an aqueous environment, and binding affinity is determined not only by atomic interactions between the protein and ligand but also by changes in their interactions with surrounding water molecules that occur upon binding. Thus, a quantitative understanding of the roles water molecules play in the protein-ligand binding process is an essential foundation for developing computational methods and tools to aid the drug discovery process. Grid inhomogeneous solvation theory (GIST) is a tool that measures the thermodynamic and structural properties of water molecules on protein surfaces. Since its implementation, GIST has been used to study water behavior upon protein-ligand binding and to account for solvent effects in scoring functions used in virtual screening. This thesis comprises two research projects that extend the applications and functionality of GIST. In the first project, we investigated whether the water properties measured by GIST could improve the performance of machine learning models, specifically convolutional neural networks (CNNs) applied to virtual screening (GIST-CNN project). In the second project, we implemented the particle mesh Ewald (PME) algorithm for energy calculation in GIST, enabling GIST to become a more accurate and more efficient tool for end-state free energy calculation (PME-GIST project). The GIST-CNN project arose in response to reports indicating that CNN models were able to outperform classical scoring functions in virtual screening.
We noticed that all the reported machine learning models had been trained only on protein-ligand structures, while water molecules were completely neglected. Given that water molecules play essential roles in protein-ligand binding, we hypothesized that we could further improve the performance of CNN models in terms of enrichment efficiency by adding water features, measured by GIST, to the data used to train the model. Contrary to our hypothesis, we found that adding water features could not further improve the performance of a CNN model trained on protein-ligand structures, which was already very high. However, further investigation revealed that the high performance and reported enrichment efficiency of a CNN model trained on protein-ligand information was solely attributable to biases in the Directory of Useful Decoys, Enhanced (DUD-E), which was used to train and test the model. In this project, we also established a suite of methods to investigate what a model learns from the input during training and argued that machine learning models should be thoroughly validated before being applied in real drug discovery projects. The motivations for the PME-GIST project were twofold. First, although GIST provides the statistical thermodynamic framework for end-state free energy calculation, inconsistencies in energy calculations between the previous GIST implementation (GIST-2016) and modern molecular dynamics engines prevent precise comparison of the GIST end-state method to other reference free energy calculation methods such as thermodynamic integration (TI). Second, the O(N^2) nonbonded energy calculation is the most expensive step in the entire GIST calculation process.
By implementing the PME algorithm in GIST, we aimed to achieve GIST energy calculations consistent with those of modern molecular dynamics engines and to accelerate the energy calculation to O(N log N), which is highly desirable when applying GIST to the measurement of water properties across an entire protein surface. In addition to implementing PME, we derived a simple empirical estimator for high order entropies, which are truncated in GIST. After incorporating PME-based energy calculation and the high order entropy estimator, we used PME-GIST to calculate end-state solvation free energy for a wide range of small molecules and achieved results highly consistent with TI (R2 = 0.99, mean unsigned difference = 0.44 kcal/mol). The PME-GIST code we developed in this project was integrated into the open-source molecular dynamics analysis software CPPTRAJ for easy access by others in the drug discovery community. In summary, in this thesis, we explored the potential of adding solvation thermodynamics to machine learning-based virtual screening and found that the high performance reported for machine learning models in this application reflected biases in the dataset used to construct and test them rather than successful generalization of the physical principles that govern molecular interactions. We also addressed the inconsistent energy calculation between GIST and modern molecular simulation engines by developing PME-GIST. We hope the research work presented in this thesis will further expand and accelerate the application of GIST to drug discovery.
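
    The O(N^2) cost mentioned in the abstract above comes from visiting every pair of atoms. The following is a minimal Python sketch of a direct pairwise Coulomb sum, not the GIST or PME-GIST implementation: it ignores periodic boundaries, cutoffs, and Lennard-Jones terms, and the names `direct_coulomb_energy`, `pair_count`, and `COULOMB_CONSTANT` are hypothetical. The constant is an approximate value in kcal*Angstrom/(mol*e^2); molecular dynamics packages use slightly different values.

```python
import math
from itertools import combinations

# Approximate Coulomb constant in kcal*Angstrom/(mol*e^2) (assumed value for this sketch).
COULOMB_CONSTANT = 332.0637


def pair_count(n_atoms):
    """Number of unique atom pairs, N*(N-1)/2, which grows quadratically with N."""
    return n_atoms * (n_atoms - 1) // 2


def direct_coulomb_energy(positions, charges):
    """Sum k*q_i*q_j/r_ij over all unique pairs (no cutoff, no periodicity)."""
    if len(positions) != len(charges):
        raise ValueError("positions and charges must have the same length")
    energy = 0.0
    for i, j in combinations(range(len(charges)), 2):
        r_ij = math.dist(positions[i], positions[j])  # distance in Angstrom
        energy += COULOMB_CONSTANT * charges[i] * charges[j] / r_ij
    return energy


if __name__ == "__main__":
    # Two opposite unit charges placed 1 Angstrom apart.
    print(direct_coulomb_energy([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [1.0, -1.0]))
    # Pair count for a system of 1000 atoms.
    print(pair_count(1000))
```

    Doubling the number of atoms roughly quadruples the number of pairs in this direct sum, which is the scaling that a mesh-based method such as PME is meant to avoid.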

    Performance and Analysis of the Alchemical Transfer Method for Binding Free Energy Predictions of Diverse Ligands

    Full text link
    The Alchemical Transfer Method (ATM) is herein validated against the relative binding free energies of a diverse set of protein-ligand complexes. We employed a streamlined setup workflow, a bespoke force field, and the AToM-OpenMM software to compute the relative binding free energies (RBFE) of the benchmark set prepared by Schindler and collaborators at Merck KGaA. This benchmark set includes examples of standard small R-group ligand modifications as well as more challenging scenarios, such as large R-group changes, scaffold hopping, formal charge changes, and charge-shifting transformations. The novel coordinate perturbation scheme and dual-topology approach of ATM address some of the challenges of single-topology alchemical relative binding free energy methods. Specifically, ATM eliminates the need for splitting electrostatic and Lennard-Jones interactions, atom mapping, defining ligand regions, and post-corrections for charge-changing perturbations. Thus, ATM is simpler and more broadly applicable than conventional alchemical methods, especially for scaffold-hopping and charge-changing transformations. Here, we performed well over 500 relative binding free energy calculations for eight protein targets and found that ATM achieves accuracy comparable to existing state-of-the-art methods, albeit with larger statistical fluctuations. We discuss insights into specific strengths and weaknesses of ATM that will inform future deployments. This study confirms that ATM is applicable as a production tool for RBFE predictions across a wide range of perturbation types within a unified, open-source framework.

    Hidden bias in the DUD-E dataset leads to misleading performance of deep learning in structure-based virtual screening

    Full text link
    Recently, much effort has been invested in using convolutional neural network (CNN) models trained on 3D structural images of protein-ligand complexes to distinguish binding from non-binding ligands for virtual screening. However, the dearth of reliable protein-ligand X-ray structures and binding affinity data has required the use of constructed datasets for the training and evaluation of CNN molecular recognition models. Here, we outline various sources of bias in one such widely used dataset, the Directory of Useful Decoys: Enhanced (DUD-E). We have constructed and performed tests to investigate whether CNN models developed using DUD-E are properly learning the underlying physics of molecular recognition, as intended, or are instead learning biases inherent in the dataset itself. We find that superior enrichment efficiency in CNN models can be attributed to the analogue and decoy bias hidden in the DUD-E dataset rather than successful generalization of the pattern of protein-ligand interactions. Comparing additional deep learning models trained on PDBbind datasets, we found that their enrichment performances using DUD-E are not superior to the performance of the docking program AutoDock Vina. Together, these results suggest that biases that could be present in constructed datasets should be thoroughly evaluated before applying them to machine learning based methodology development.
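
    Enrichment in virtual screening is often summarized by an enrichment factor: the hit rate among the top-ranked fraction of a library divided by the hit rate in the whole library. The sketch below uses that common definition, which is an assumption here since the abstract does not state how enrichment efficiency was computed. The function name `enrichment_factor` and the convention that higher scores mean more likely binders are also assumptions.

```python
def enrichment_factor(scores, labels, fraction=0.01):
    """Hit rate in the top-ranked fraction divided by the overall hit rate.

    scores: model scores, higher meaning more likely to bind (assumed convention).
    labels: 1 for an active ligand, 0 for a decoy.
    """
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and equal in length")
    n_total = len(scores)
    n_actives = sum(labels)
    if n_actives == 0:
        raise ValueError("at least one active is required")
    # Size of the top-ranked subset, never smaller than one compound.
    n_top = max(1, int(round(n_total * fraction)))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits_in_top = sum(label for _, label in ranked[:n_top])
    return (hits_in_top / n_top) / (n_actives / n_total)


if __name__ == "__main__":
    # Toy library of ten compounds; the two actives are ranked first.
    toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
    toy_labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    print(enrichment_factor(toy_scores, toy_labels, fraction=0.2))
```

    A high value of this metric says only that actives rank above decoys in the test set. As the abstract argues, it does not by itself show that a model has learned protein-ligand interactions rather than dataset bias.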

    Macrophage in Sporadic Thoracic Aortic Aneurysm and Dissection: Potential Therapeutic and Preventing Target

    No full text
    Thoracic aortic aneurysm and dissection (TAAD) is a life-threatening cardiovascular disorder lacking effective clinical pharmacological therapies. The underlying molecular mechanisms of TAAD remain elusive and involve diverse cell types and components, including endothelial cells (ECs), smooth muscle cells (SMCs), fibroblasts, immune cells, and the extracellular matrix (ECM). The main pathological features of TAAD include SMC dysfunction, phenotypic switching, and ECM degradation, which are closely associated with inflammation and immune cell infiltration. Among the various types of immune cells, macrophages play a distinct part in the formation and progression of TAAD. In this review, we first highlight the important role of inflammation and immune cell infiltration in TAAD. Furthermore, we discuss the role of macrophages in TAAD in terms of macrophage origin, classification, and function. On the basis of experimental and clinical studies, we summarize key regulators of macrophages in TAAD. Finally, we review how targeting macrophages can reduce TAAD in murine models. A better understanding of the molecular and cellular mechanisms of TAAD may provide novel insights into preventing and treating the condition.

    Thermodynamic Decomposition of Solvation Free Energies with Particle Mesh Ewald and Long-Range Lennard-Jones Interactions in Grid Inhomogeneous Solvation Theory

    No full text
    Grid Inhomogeneous Solvation Theory (GIST) maps out solvation thermodynamic properties on a fine meshed grid and provides a statistical mechanical formalism for thermodynamic end-state calculations. However, differences in how long-range non-bonded interactions are calculated in molecular dynamics engines and in the current implementation of GIST have prevented precise comparisons between free energies estimated using GIST and those from other free energy methods such as thermodynamic integration (TI). Here, we address this by presenting PME-GIST, a formalism by which particle mesh Ewald (PME) based electrostatic energies and long-range Lennard-Jones (LJ) energies are decomposed and assigned to individual atoms and the corresponding voxels they occupy in a manner consistent with the GIST approach. PME-GIST yields potential energy calculations that are precisely consistent with modern simulation engines and performs these calculations at a dramatically faster speed than prior implementations. Here, we apply PME-GIST end-states analyses to 32 small molecules whose solvation free energies are close to evenly distributed from 2 kcal/mol to -17 kcal/mol and obtain solvation energies consistent with TI calculations (R2 = 0.99, mean unsigned difference 0.8 kcal/mol). We also estimate the entropy contribution from the 2nd and higher order entropy terms that are truncated in GIST by the differences between entropies calculated in TI and GIST. With a simple correction for the high order entropy terms, PME-GIST obtains solvation free energies that are highly consistent with TI calculations (R2 = 0.99, mean unsigned difference = 0.4 kcal/mol) and experimental results (R2 = 0.88, mean unsigned difference = 1.4 kcal/mol). The precision of PME-GIST also enables us to show that the solvation free energy of small hydrophobic and hydrophilic molecules can be largely understood based on perturbations of the solvent in a region extending a few solvation shells from the solute. 
We have integrated PME-GIST into the open-source molecular dynamics analysis software CPPTRAJ.
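
    The agreement statistics quoted in the abstract above (R2 and mean unsigned difference) can be computed from two lists of paired free energies. The sketch below assumes R2 is the squared Pearson correlation coefficient, which the abstract does not specify. The function names are hypothetical and the numbers in the demo are made-up placeholders, not results from the paper.

```python
def pearson_r2(x, y):
    """Squared Pearson correlation coefficient between two paired lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least two values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return (s_xy * s_xy) / (s_xx * s_yy)


def mean_unsigned_difference(x, y):
    """Average absolute difference between paired values (same units as the inputs)."""
    if len(x) != len(y) or not x:
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


if __name__ == "__main__":
    # Placeholder solvation free energies in kcal/mol (illustrative only).
    reference = [-1.0, -5.0, -9.0, -13.0]
    estimate = [-1.4, -5.3, -9.6, -13.2]
    print(pearson_r2(reference, estimate))
    print(mean_unsigned_difference(reference, estimate))
```

    The two statistics answer different questions: a constant offset between methods leaves R2 at 1 while the mean unsigned difference equals the offset, which is why abstracts such as this one report both.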

    Impact of body mass index on perioperative mortality of acute stanford type A aortic dissection: a systematic review and meta-analysis

    No full text
    Background: Obesity may increase perioperative mortality of acute Stanford type A aortic dissection (ATAAD). However, the available evidence is limited. This study aimed to systematically review the published literature on body mass index (BMI) and perioperative mortality of ATAAD. Methods: An electronic literature search was conducted in the PubMed, Medline, Embase and Cochrane Library databases. All observational studies that investigated BMI and perioperative mortality of ATAAD were included. Pooled odds ratio (OR) and 95% confidence interval (CI) were calculated using a random-effects model. Meta-regression analysis was performed to assess the effects of different clinical variables on BMI and perioperative mortality of ATAAD. Sensitivity analysis was performed to determine the sources of heterogeneity. Egger’s linear regression method and funnel plot were used to determine the publication bias. Results: A total of 12 studies with 5,522 patients were eligible and included in this meta-analysis. Pooled analysis showed that perioperative mortality of ATAAD increased by 22% for each 1 kg/m2 increase in BMI (OR = 1.22, 95% CI: 1.10–1.35). Univariable meta-regression analysis indicated that age and female gender significantly modified the association between BMI and perioperative mortality of ATAAD in a positive manner (meta-regression on age: coefficient = 0.04, P = 0.04; meta-regression on female gender: coefficient = 0.02, P = 0.03). Neither significant heterogeneity nor publication bias was found among the included studies. Conclusions: BMI is closely associated with perioperative mortality of ATAAD. Optimal perioperative management needs to be further explored and individualized for obese patients with ATAAD, especially in elderly and female populations. Trial registration: PROSPERO (CRD42022358619).
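
    A pooled odds ratio under a random-effects model can be obtained in several ways. The sketch below uses the DerSimonian-Laird estimator on log odds ratios, which is one common choice and an assumption here, since the abstract does not name the estimator. The function name `pool_odds_ratios` is hypothetical, and the study values in the demo are invented for illustration, not taken from the meta-analysis.

```python
import math

Z_95 = 1.96  # normal quantile used for 95% confidence intervals


def pool_odds_ratios(studies):
    """DerSimonian-Laird random-effects pooling.

    studies: list of (odds_ratio, ci_lower, ci_upper) tuples with 95% CIs.
    Returns (pooled_or, ci_lower, ci_upper, tau_squared).
    """
    if not studies:
        raise ValueError("at least one study is required")
    # Work on the log scale; recover each standard error from its CI width.
    log_or = [math.log(or_) for or_, _, _ in studies]
    var = [((math.log(hi) - math.log(lo)) / (2 * Z_95)) ** 2 for _, lo, hi in studies]

    # Fixed-effect (inverse-variance) weights and Cochran's Q.
    w = [1.0 / v for v in var]
    sum_w = sum(w)
    fixed_mean = sum(wi * yi for wi, yi in zip(w, log_or)) / sum_w
    q = sum(wi * (yi - fixed_mean) ** 2 for wi, yi in zip(w, log_or))

    # Between-study variance tau^2, truncated at zero.
    c = sum_w - sum(wi * wi for wi in w) / sum_w
    k = len(studies)
    tau2 = max(0.0, (q - (k - 1)) / c) if k > 1 and c > 0 else 0.0

    # Random-effects weights include tau^2 in each study's variance.
    w_star = [1.0 / (v + tau2) for v in var]
    pooled = sum(wi * yi for wi, yi in zip(w_star, log_or)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return (math.exp(pooled), math.exp(pooled - Z_95 * se),
            math.exp(pooled + Z_95 * se), tau2)


if __name__ == "__main__":
    # Invented per-study odds ratios with 95% confidence intervals.
    demo = [(1.10, 0.95, 1.27), (1.30, 1.05, 1.61), (1.25, 1.08, 1.45)]
    print(pool_odds_ratios(demo))
```

    When the studies agree closely, the estimated between-study variance is zero and the result matches a fixed-effect inverse-variance pooling. Larger heterogeneity widens the pooled confidence interval.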

    An Online Repository of Solvation Thermodynamic and Structural Maps of SARS-CoV-2 Targets

    No full text
    SARS-CoV-2 recently jumped species and rapidly spread via human-to-human transmission to cause a global outbreak of COVID-19. The lack of an effective vaccine, combined with the severity of the disease, necessitates attempts to develop small molecule drugs to combat the virus. COVID19_GIST_HSA is a freely available online repository that provides solvation thermodynamic maps of COVID-19-related protein targets for small molecule drugs. Grid Inhomogeneous Solvation Theory maps were generated using AmberTools cpptraj-GIST, and Hydration Site Analysis maps were created using the SSTmap code. The resultant data can be applied to drug design efforts: scoring solvent displacement for docking, rational lead modification, prioritization of ligand- and protein-based pharmacophore elements, and creation of water-based pharmacophores. Herein, we demonstrate the use of the solvation thermodynamic mapping data. It is hoped that this freely provided data will aid in small molecule drug discovery efforts to defeat SARS-CoV-2.

    Associations between urinary phthalate metabolite concentrations and markers of liver injury in the US adult population

    No full text
    Background: Phthalates have been widely used for years in a variety of products worldwide. However, research on the joint toxic effect of exposure to multiple phthalates on the liver is lacking. Objectives: We aimed to assess the associations of phthalate exposure with liver function tests (LFTs). Methods: This analysis included data on 6046 adults (≥20 years old) who participated in the National Health and Nutrition Examination Survey (NHANES) in 2007–2016. We employed linear regression and Bayesian kernel machine regression (BKMR) to explore the associations of urinary phthalate metabolites with 8 indicators of LFTs. Results: Di(2-ethylhexyl) phthalate (ΣDEHP) was found to be positively associated with serum alanine aminotransferase (ALT), gamma-glutamyl transferase (GGT) and alkaline phosphatase (ALP) (all P FDR < 0.05). We found significant positive associations of ΣDEHP, mono-ethyl phthalate (MEP) and mono-(carboxyisononyl) phthalate (MCNP) with total bilirubin (TBIL) (all P FDR < 0.05). ΣDEHP, mono-n-butyl phthalate (MBP), mono-(3-carboxypropyl) phthalate (MCPP) and mono-benzyl phthalate (MBzP) were negatively associated with serum ALB (all P FDR < 0.05). The BKMR analyses showed a significantly positive overall effect on ALT, AST, ALP and TBIL levels at high concentrations of phthalate metabolites and a significantly negative overall effect on ALB and TP when all the chemicals were at low concentrations. Conclusions: Our results add novel evidence that exposure to phthalates might be adversely associated with the indicators of LFTs, indicating a potential toxic effect of phthalate exposure on the human liver.

    Oxidative stress drives vascular smooth muscle cell damage in acute Stanford type A aortic dissection through HIF-1α/HO-1 mediated ferroptosis

    No full text
    Background: Acute Stanford type A aortic dissection (ATAAD) is characterized by intimal tearing and the formation of a false lumen containing large amounts of erythrocytes with heme. Heme oxygenase 1 (HO-1) is the key enzyme that degrades heme, leading to iron accumulation and subsequent ferroptosis. The current study aimed to investigate the role of HO-1 in the dissection progression of ATAAD. Methods: Bioinformatic analyses and experimental validation were performed to reveal ferroptosis and HO-1 expression in ATAAD. Human aortic vascular smooth muscle cells (HA-VSMCs) were used to explore the underlying molecular mechanisms and the role of HO-1 overexpression in ATAAD. Results: Ferroptosis was identified as a critical form of regulated cell death in ATAAD. HO-1 was screened as a key signature of ferroptosis in ATAAD, which was closely associated with oxidative stress. Single cell/nucleus transcriptomic analysis and histological staining revealed that HO-1 and HIF-1α were upregulated in vascular smooth muscle cells (VSMCs) of ATAAD. Further in vitro experiments showed that H2O2-induced oxidative stress increased VSMC ferroptosis with the overexpression of HO-1, which could be suppressed by the HIF-1α inhibitor PX-478. HIF-1α could transcriptionally regulate the expression of HO-1 by binding to its promoter region. Pharmacological inhibition of HO-1 by zinc protoporphyrin (ZnPP) did not reduce H2O2-induced HA-VSMC damage without heme co-incubation. However, H2O2-induced HA-VSMC damage was worsened when heme was added to the medium, and ZnPP could reduce HA-VSMC damage under this condition. Conclusion: HO-1 is a key signature of VSMC ferroptosis in ATAAD. HIF-1α/HO-1 mediated ferroptosis might participate in oxidative stress induced VSMC damage.