Hidden bias in the DUD-E dataset leads to misleading performance of deep learning in structure-based virtual screening
Recently, much effort has been invested in using convolutional neural network (CNN) models trained on 3D structural images of protein-ligand complexes to distinguish binding from non-binding ligands for virtual screening. However, the dearth of reliable protein-ligand X-ray structures and binding affinity data has required the use of constructed datasets for the training and evaluation of CNN molecular recognition models. Here, we outline various sources of bias in one such widely used dataset, the Directory of Useful Decoys: Enhanced (DUD-E). We constructed and performed tests to investigate whether CNN models developed using DUD-E are properly learning the underlying physics of molecular recognition, as intended, or are instead learning biases inherent in the dataset itself. We find that the superior enrichment efficiency of CNN models can be attributed to the analogue and decoy bias hidden in the DUD-E dataset rather than to successful generalization of the pattern of protein-ligand interactions. Comparing additional deep learning models trained on PDBbind datasets, we found that their enrichment performance on DUD-E is not superior to that of the docking program AutoDock Vina. Together, these results suggest that biases that could be present in constructed datasets should be thoroughly evaluated before applying them to machine-learning-based methodology development.
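The abstract compares models by "enrichment", without naming the exact metric. A common choice in DUD-E benchmarks is the enrichment factor (EF) at a top fraction of the ranked library: the hit rate among the top-ranked compounds divided by the hit rate expected at random. The sketch below assumes that definition; the function name and toy data are illustrative, not taken from the paper.

```python
def enrichment_factor(scores, labels, fraction=0.01):
    """Enrichment factor at the top `fraction` of a ranked screening library.

    scores: higher = predicted more likely to bind. AutoDock Vina reports
            more negative energies for better poses, so negate those first.
    labels: 1 for an annotated active, 0 for a decoy.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    n_total = len(scores)
    n_actives = sum(labels)
    if n_actives == 0:
        raise ValueError("at least one active is required")
    # Size of the top slice, at least one compound.
    n_top = max(1, round(n_total * fraction))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    actives_in_top = sum(label for _, label in ranked[:n_top])
    # Hit rate in the top slice divided by the hit rate expected at random.
    return (actives_in_top / n_top) / (n_actives / n_total)


# Toy example: 100 compounds, 10 actives, 5 of which rank in the top 10%.
scores = list(range(100, 0, -1))  # already sorted best-first
labels = [0] * 100
for i in (0, 2, 4, 6, 8, 50, 51, 52, 53, 54):
    labels[i] = 1
print(enrichment_factor(scores, labels, fraction=0.10))  # 5.0
```

Because EF only looks at which compounds rise to the top, it rewards any signal that separates actives from decoys, including the analogue and decoy biases the abstract describes, not only genuine protein-ligand complementarity.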
Uncoupling the structure–activity relationships of β2 adrenergic receptor ligands from membrane binding
Ligand binding to membrane proteins may be significantly influenced by the interaction of ligands with the membrane. In particular, the microscopic ligand concentration within the membrane surface solvation layer may exceed that in bulk solvent, resulting in overestimation of the intrinsic protein–ligand binding contribution to the apparent (measured) affinity. Using published binding data for a set of small molecules with the β2 adrenergic receptor, we demonstrate that deconvolution of membrane and protein binding contributions allows for improved structure–activity relationship analysis and structure-based drug design. Molecular dynamics simulations of ligand-bound membrane protein complexes were used to validate binding poses, allowing analysis of key interactions and binding-site solvation to develop structure–activity relationships of β2 ligand binding. The resulting relationships are consistent with intrinsic binding affinity (corrected for membrane interaction). The successful structure-based design of ligands targeting membrane proteins may require an assessment of membrane affinity to uncouple protein binding from membrane interactions.
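The abstract does not state the deconvolution formula. A minimal sketch, assuming the simplest model in which the membrane raises the ligand concentration seen by the binding site by a constant factor (the hypothetical `local_enrichment` parameter, not a quantity reported in the paper), shows how an apparent Kd and its free energy would shift once that enrichment is removed:

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def dg_from_kd(kd_molar, temp_k=298.15):
    """Binding free energy (kcal/mol) from a dissociation constant (1 M standard state)."""
    return R_KCAL * temp_k * math.log(kd_molar)


def intrinsic_kd(kd_apparent, local_enrichment):
    """Hypothetical correction of a measured Kd for membrane enrichment of the ligand.

    Assumes the binding site sees `local_enrichment` times the bulk ligand
    concentration, so the bulk-referenced (apparent) Kd is smaller than the
    intrinsic protein-ligand Kd by exactly that factor.
    """
    return kd_apparent * local_enrichment


kd_app = 1e-9       # 1 nM apparent affinity (illustrative value)
enrichment = 100.0  # illustrative local/bulk concentration ratio
kd_int = intrinsic_kd(kd_app, enrichment)
shift = dg_from_kd(kd_int) - dg_from_kd(kd_app)  # equals RT ln(enrichment)
print(f"intrinsic Kd = {kd_int:.1e} M, intrinsic minus apparent dG = {shift:.2f} kcal/mol")
```

Under these assumptions a 100-fold local enrichment makes a ligand look about 2.73 kcal/mol stronger at 298 K than its intrinsic protein affinity, which is the kind of offset that can distort structure–activity comparisons across ligands with different membrane affinities.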
Common Genetic Polymorphisms Influence Blood Biomarker Measurements in COPD
Implementing precision medicine for complex diseases such as chronic obstructive lung disease (COPD) will require extensive use of biomarkers and an in-depth understanding of how genetic, epigenetic, and environmental variations contribute to phenotypic diversity and disease progression. A meta-analysis of two large cohorts of current and former smokers with and without COPD [SPIROMICS (N = 750); COPDGene (N = 590)] was used to identify single nucleotide polymorphisms (SNPs) associated with measurement of 88 blood proteins (protein quantitative trait loci; pQTLs). pQTLs consistently replicated between the two cohorts. Features of pQTLs were compared to previously reported expression QTLs (eQTLs). Inference of causal relations among pQTL genotypes, biomarker measurements, and four clinical COPD phenotypes (airflow obstruction, emphysema, exacerbation history, and chronic bronchitis) was explored using conditional independence tests. We identified 527 highly significant (p 10% of measured variation in 13 protein biomarkers, with a single SNP (rs7041; p = 10⁻³⁹²) explaining 71%-75% of the measured variation in vitamin D binding protein (gene = GC). Some of these pQTLs [e.g., pQTLs for VDBP, sRAGE (gene = AGER), surfactant protein D (gene = SFTPD), and TNFRSF10C] have been previously associated with COPD phenotypes. Most pQTLs were local (cis), but distant (trans) pQTL SNPs in the ABO blood group locus were the top pQTL SNPs for five proteins. The inclusion of pQTL SNPs improved the clinical predictive value for the established association of sRAGE and emphysema, and the explained variance (R²) for emphysema improved from 0.3 to 0.4 when the pQTL SNP was included in the model along with clinical covariates. Causal modeling provided insight into specific pQTL-disease relationships for airflow obstruction and emphysema.
In conclusion, given the frequency of highly significant local pQTLs, the large amount of variance potentially explained by pQTLs, and the differences observed between pQTL and eQTL SNPs, we recommend that protein biomarker-disease association studies take into account the potential effect of common local SNPs and that pQTLs be integrated with eQTLs to uncover disease mechanisms. Large-scale blood biomarker studies would also benefit from close attention to the ABO blood group.
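The abstract reports that adding a pQTL SNP to a model with the biomarker and clinical covariates raised R² for emphysema. The sketch below illustrates that nested-model comparison on a synthetic cohort (not study data), assuming one possible mechanism: the genotype shifts the measured protein level without changing the biologically relevant level that drives the phenotype. Effect sizes, the allele frequency of 0.3, and all names are hypothetical; the OLS solver is plain Python rather than any particular statistics package.

```python
import random


def ols_r2(X, y):
    """R^2 of an ordinary least-squares fit of y on the columns of X.

    X is a list of rows and must include a column of 1.0 for the intercept.
    Solves the normal equations (X^T X) b = X^T y by Gaussian elimination.
    """
    p = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    c = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    for k in range(p):  # forward elimination with partial pivoting
        piv = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[piv] = A[piv], A[k]
        c[k], c[piv] = c[piv], c[k]
        for r in range(k + 1, p):
            f = A[r][k] / A[k][k]
            for j in range(k, p):
                A[r][j] -= f * A[k][j]
            c[r] -= f * c[k]
    b = [0.0] * p
    for k in reversed(range(p)):  # back substitution
        b[k] = (c[k] - sum(A[k][j] * b[j] for j in range(k + 1, p))) / A[k][k]
    fitted = [sum(bj * xj for bj, xj in zip(b, row)) for row in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Synthetic cohort: genotype shifts the *measured* biomarker only.
random.seed(0)
base_rows, full_rows, phenotype = [], [], []
for _ in range(2000):
    covariate = random.gauss(0.0, 1.0)                     # standardized clinical covariate
    dosage = sum(random.random() < 0.3 for _ in range(2))  # 0, 1 or 2 copies of the pQTL allele
    true_level = random.gauss(0.0, 1.0)
    measured = true_level + 0.8 * dosage                   # hypothetical pQTL effect on the assay
    phenotype.append(0.6 * true_level + 0.3 * covariate + random.gauss(0.0, 1.0))
    base_rows.append([1.0, measured, covariate])
    full_rows.append([1.0, measured, covariate, float(dosage)])

r2_base = ols_r2(base_rows, phenotype)
r2_full = ols_r2(full_rows, phenotype)
print(f"R^2 without SNP: {r2_base:.3f}, with SNP: {r2_full:.3f}")
```

For nested OLS models R² can never decrease when a column is added; the point of the simulation is that the gain is substantive when the SNP absorbs a genotype-driven component of the measurement.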
Global patient outcomes after elective surgery: prospective cohort study in 27 low-, middle- and high-income countries.
BACKGROUND: As global initiatives increase patient access to surgical treatments, there remains a need to understand the adverse effects of surgery and define appropriate levels of perioperative care. METHODS: We designed a prospective international 7-day cohort study of outcomes following elective adult inpatient surgery in 27 countries. The primary outcome was in-hospital complications. Secondary outcomes were death following a complication (failure to rescue) and death in hospital. Process measures were admission to critical care immediately after surgery or to treat a complication and duration of hospital stay. A single definition of critical care was used for all countries. RESULTS: A total of 474 hospitals in 19 high-, 7 middle- and 1 low-income country were included in the primary analysis. Data included 44 814 patients with a median hospital stay of 4 (range 2-7) days. A total of 7508 patients (16.8%) developed one or more postoperative complications and 207 died (0.5%). The overall mortality among patients who developed complications was 2.8%. Mortality following complications ranged from 2.4% for pulmonary embolism to 43.9% for cardiac arrest. A total of 4360 (9.7%) patients were admitted to a critical care unit as routine immediately after surgery, of whom 2198 (50.4%) developed a complication, with 105 (2.4%) deaths. A total of 1233 patients (16.4%) were admitted to a critical care unit to treat complications, with 119 (9.7%) deaths. Despite lower baseline risk, outcomes were similar in low- and middle-income compared with high-income countries. CONCLUSIONS: Poor patient outcomes are common after inpatient surgery. Global initiatives to increase access to surgical treatments should also address the need for safe perioperative care. STUDY REGISTRATION: ISRCTN5181700
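The reported percentages use different denominators, which is easy to misread: for example, the 16.4% for critical care admission to treat complications is relative to the 7508 patients with complications, not to all 44 814 patients. The short sketch below recomputes each rate from the counts given in the abstract; the constant names are illustrative, and failure to rescue is omitted because its numerator is not reported.

```python
def pct(numerator, denominator):
    """Percentage rounded to one decimal place."""
    return round(100.0 * numerator / denominator, 1)


TOTAL = 44814         # patients in the primary analysis
COMPLICATIONS = 7508  # patients with one or more complications
ROUTINE_CC = 4360     # routine critical care admissions after surgery
TREATMENT_CC = 1233   # critical care admissions to treat a complication

rates = {
    "any complication (of all patients)": pct(COMPLICATIONS, TOTAL),
    "in-hospital death (of all patients)": pct(207, TOTAL),
    "routine critical care (of all patients)": pct(ROUTINE_CC, TOTAL),
    "complication (of routine critical care)": pct(2198, ROUTINE_CC),
    "death (of routine critical care)": pct(105, ROUTINE_CC),
    "critical care to treat (of patients with complications)": pct(TREATMENT_CC, COMPLICATIONS),
    "death (of critical care to treat)": pct(119, TREATMENT_CC),
}
for label, value in rates.items():
    print(f"{label}: {value}%")
```

Each computed value matches the corresponding figure in the abstract, which confirms the denominators stated in the labels.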
2015/16 seasonal vaccine effectiveness against hospitalisation with influenza A(H1N1)pdm09 and B among elderly people in Europe: Results from the I-MOVE+ project
We conducted a multicentre test-negative case–control study in 27 hospitals in 11 European countries to measure 2015/16 influenza vaccine effectiveness (IVE) against hospitalised influenza A(H1N1)pdm09 and B among people aged ≥ 65 years. Patients swabbed within 7 days after onset of symptoms compatible with severe acute respiratory infection were included. Information on demographics, vaccination and underlying conditions was collected. Using logistic regression, we measured IVE adjusted for potential confounders. We included 355 influenza A(H1N1)pdm09 cases, 110 influenza B cases, and 1,274 controls. Adjusted IVE against influenza A(H1N1)pdm09 was 42% (95% confidence interval (CI): 22 to 57). It was 59% (95% CI: 23 to 78), 48% (95% CI: 5 to 71), 43% (95% CI: 8 to 65) and 39% (95% CI: 7 to 60) in patients with diabetes mellitus, cancer, lung and heart disease, respectively. Adjusted IVE against influenza B was 52% (95% CI: 24 to 70). It was 62% (95% CI: 5 to 85), 60% (95% CI: 18 to 80) and 36% (95% CI: -23 to 67) in patients with diabetes mellitus, lung and heart disease, respectively. 2015/16 IVE estimates against hospitalised influenza in elderly people were moderate against influenza A(H1N1)pdm09 and B, including among those with diabetes mellitus, cancer, lung or heart diseases.
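In a test-negative design, vaccine effectiveness is conventionally derived from the adjusted odds ratio of vaccination among cases versus controls as VE = (1 − OR) × 100%, where the logistic-regression coefficient for vaccination is log(OR). A minimal sketch of that mapping follows; the function names are illustrative, and the odds ratios shown are back-calculated from the reported IVE values rather than reported in the abstract.

```python
import math


def ve_from_or(odds_ratio):
    """Vaccine effectiveness (%) in a test-negative design: VE = (1 - OR) * 100."""
    return 100.0 * (1.0 - odds_ratio)


def or_from_ve(ve_percent):
    """Inverse mapping: the odds ratio implied by a VE percentage."""
    return 1.0 - ve_percent / 100.0


def ve_interval(or_low, or_high):
    """Map an OR confidence interval to a VE interval.

    The mapping is decreasing, so the upper OR bound gives the lower VE bound.
    """
    return ve_from_or(or_high), ve_from_or(or_low)


# The logistic-regression coefficient for vaccination is log(OR).
beta_vacc = math.log(or_from_ve(42.0))  # adjusted IVE of 42% implies OR = 0.58
print(round(math.exp(beta_vacc), 2))    # 0.58
print([round(v, 1) for v in ve_interval(0.43, 0.78)])  # [22.0, 57.0]
print(round(or_from_ve(-23.0), 2))      # 1.23
```

The last line shows why a lower VE bound below zero (such as -23% for influenza B in patients with heart disease) matters: it corresponds to an upper OR bound above 1, so the interval is compatible with no protective effect.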
Quantum Mechanics Approaches to Structurally Informed Design
This manuscript focuses on the application of molecular modeling and structure-informed design (SID) to drug discovery. Routine use of quantum mechanics techniques allows SID hypotheses to be generated and tested from first principles. We introduce the concept of combining electrostatic potential surfaces and non-bonding orbitals to determine the nature and directionality of intermolecular interactions, particularly those that are rarely exemplified in the Protein Data Bank.
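The electrostatic potential surfaces referred to here are built on a standard quantity, the molecular electrostatic potential. The textbook definition in atomic units is given below, assuming nuclear charges $Z_A$ at positions $\mathbf{R}_A$ and electron density $\rho$. Mapping $V$ onto an electron-density isosurface (commonly 0.001 a.u.) is a widespread convention, not necessarily the one used in this manuscript.

```latex
V(\mathbf{r}) = \sum_{A} \frac{Z_A}{\lvert \mathbf{R}_A - \mathbf{r} \rvert}
  - \int \frac{\rho(\mathbf{r}')}{\lvert \mathbf{r}' - \mathbf{r} \rvert}\, d\mathbf{r}'
```

Regions where $V > 0$ attract electron-rich partners such as hydrogen-bond acceptors, while regions where $V < 0$ attract electron-poor partners, which is what makes the surface useful for judging the directionality of an interaction.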
Relative Binding Free-Energy Calculations at Lipid-Exposed Sites: Deciphering Hot Spots.
Relative binding free-energy (RBFE) calculations are experiencing a resurgence in the computer-aided drug design of novel small molecules due to performance gains enabled by cutting-edge molecular mechanics force fields and computer hardware. Application of RBFE to soluble proteins is becoming routine, while recent studies outline the steps necessary to successfully apply RBFE at the orthosteric site of membrane-embedded G-protein-coupled receptors (GPCRs). In this work, we apply RBFE to a congeneric series of antagonists that bind to a lipid-exposed, extra-helical site of the P2Y1 receptor. We find that RBFE performs promisingly, such that it may be applied predictively in drug discovery programs targeting lipid-exposed sites. Further, by applying the microkinetic model, binding at a lipid-exposed site can be split into (1) membrane partitioning of the drug molecule followed by (2) binding at the extra-helical site. We find that RBFE can be applied to calculate the free energy of each step, allowing the observed binding free energy to be uncoupled from the influence of membrane affinity. This protocol may be used to identify binding hot spots at extra-helical sites and guide drug discovery programs toward optimizing intrinsic activity at the target.
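The two-step decomposition can be written as a free-energy bookkeeping sketch. The first line is the standard RBFE thermodynamic cycle for an alchemical transformation of ligand $A$ into ligand $B$ in the bound and reference environments. The remaining lines assume the two-step path described in the abstract (bulk water to membrane, then membrane to the extra-helical site); because free energy is a state function, the legs add. The subscripts are illustrative notation, and standard-state corrections are omitted.

```latex
\begin{aligned}
\Delta\Delta G_{\mathrm{bind}}^{A \to B} &= \Delta G_{\mathrm{complex}}^{A \to B} - \Delta G_{\mathrm{solvent}}^{A \to B} \\
\Delta G_{\mathrm{obs}} &= \Delta G_{\mathrm{part}} + \Delta G_{\mathrm{site}} \\
\Delta\Delta G_{\mathrm{site}}^{A \to B} &= \Delta\Delta G_{\mathrm{obs}}^{A \to B} - \Delta\Delta G_{\mathrm{part}}^{A \to B}
\end{aligned}
```

Under this scheme a ligand modification that mainly increases membrane partitioning shows up in $\Delta\Delta G_{\mathrm{part}}$ rather than $\Delta\Delta G_{\mathrm{site}}$, which is how the observed affinity is separated from intrinsic activity at the site.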
- …